=Paper= {{Paper |id=Vol-2143/paper1 |storemode=property |title=Semi-automated creation of regulation rule bases using generic template-driven rule extraction |pdfUrl=https://ceur-ws.org/Vol-2143/paper1.pdf |volume=Vol-2143 |authors=Deepali Kholkar,Sagar Sunkle,Vinay Kulkarni |dblpUrl=https://dblp.org/rec/conf/icail/KholkarSK17 }} ==Semi-automated creation of regulation rule bases using generic template-driven rule extraction== https://ceur-ws.org/Vol-2143/paper1.pdf
 Semi-automated creation of regulation rule bases using generic
               template-driven rule extraction
                Deepali Kholkar                                           Sagar Sunkle                               Vinay Kulkarni
                TCS Research                                           TCS Research                                   TCS Research
        54B, Hadapsar Industrial Estate                        54B, Hadapsar Industrial Estate                54B, Hadapsar Industrial Estate
          Pune, Maharashtra 411013                               Pune, Maharashtra 411013                       Pune, Maharashtra 411013
          deepali.kholkar@tcs.com                                  sagar.sunkle@tcs.com                         vinay.vkulkarni@tcs.com
ABSTRACT                                                                               Formal approaches in research usually encode a small subset of
Formal approaches to checking compliance manually encode in-                        rules from regulation text and demonstrate compliance to individual
dividual obligations from the regulation text as rules. Automated                   rules. Encoding rules manually from the entire natural language
extraction approaches identify key elements in regulatory text, and                 text of a regulation is a complex endeavor due to the volume of
create annotated, in some cases structured, representations of regula-              text, legal language, and abstract nature of guidelines described. A
tion text. It is desirable to combine the two approaches to automate                bigger knowledge engineering problem is creating an isomorphic
creation of a regulation rule base that can be used for inferenc-                   rule base structured such that it is an accurate representation of the
ing and reasoning about compliance. In this paper we present a                      regulation, necessary for it to be usable by its users, and also easier
semi-automated approach that uses a generic semantic model of                       to maintain [4]. Getting the rule hierarchy, predicates, and arguments
regulations to guide automated extraction of rule suggestions. The                  of each rule right, necessary for correct inferencing, presents the
suggestions help domain experts author rules in Structured English                  greatest complexity in manual rule creation. All of these require the
using the generic model as template. Rules are translated automati-                 person(s) writing the rules to be an expert in the regulation domain
cally into a Semantics of Business Vocabulary and Rules (SBVR)                      as well as formal logic, which is hard in practice. Building a rule
model and defeasible logic rules, creating a hierarchical knowledge                 base of an entire regulation thus becomes a daunting task. With mul-
base that reflects the regulation structure and enables querying and                tiple regulatory document sources to be considered for compliance,
reasoning about compliance.                                                         (semi-) automated information extraction becomes desirable [26, 27].
                                                                                    Even then, building and structuring the rule base from extracted
KEYWORDS                                                                            information remains a major challenge.
                                                                                       We carried out case study experiments of building rule bases for
Information extraction, knowledge extraction, knowledge base, au-
                                                                                    two large real-life regulations, viz. MiFID-2 (Markets in Financial
tomated compliance checking, defeasible logic, Structured English,
                                                                                    Instruments Directive) and KYC (Know Your Customer) regulations
controlled natural language, SBVR, semantic model
                                                                                    [18, 26, 27]. Although we were helped by domain experts in under-
                                                                                    standing the business domain of each regulation, it was a complex
1    INTRODUCTION                                                                   task to encode formal rules from the natural language regulation
Enterprises need to comply with a plethora of regulations. The pro-                 text, both with and without the help of (semi-) automated extrac-
cess of compliance is made more complex by the fact that regulatory                 tion [26, 27]. Following our own approach of (semi-) automated
bodies publish guidelines pertaining to a single legislation in multi-              extraction [26, 27], it took several iterations before we got the rule
ple forms and regulatory documents such as directives, regulations,                 hierarchy and parameters of predicates right. Most importantly, it
and annexures containing supporting information such as reporting                   was hard to pinpoint rules modularly in the regulation text. By mod-
formats, data descriptions, and example cases. It is a great deal of                ularly, we mean that our rule extraction process [27], based on the
effort for domain experts to manually compile, correlate, and in-                   generation of a domain model and a dictionary [26], enabled us to
terpret information from all of these sources and translate it into                 classify the legal text sentences into those that pertain to regula-
implementation of compliance.                                                       tory rules and those that do not. Unaware of the greater structure
   Several approaches exist for automated legal information extrac-                 in which a regulatory body organizes the regulations, we ended up
tion that identify patterns and classify information available in un-               overlooking some critical rules that relate to such organization. It is
structured natural language text, annotate it, and in some cases con-               in this context that the work presented in this paper becomes relevant.
vert it into structured representations such as XML [10, 21, 28, 29].               Examples of these problems are provided in the case study section.
The resultant rules are, however, not in a logic form that can be                      This paper presents an approach to address the challenge of
rigorously reasoned with. Formal compliance checking approaches, on the             building structured rule bases for large regulations guided by a generic
other hand, use logic formalisms to represent regulation rules; however,            semantic model. We use our approach for (semi-) automated
these need to be encoded manually by human experts [26, 27].                        extraction of a domain model, dictionary, and rule suggestions to get to
                                                                                    the rules [26, 27]. The generic model also serves as a hierarchical
                                                                                    template for creating the rule base, wherein the domain expert fills in
In: Proceedings of the Second Workshop on Automated Semantic Analysis of Informa-
tion in Legal Text (ASAIL 2017), June 16, 2017, London, UK.                         extracted information to create rules in a controlled natural language.
Copyright © 2017 held by the authors. Copying permitted for private and academic    The template helps create a coherent knowledge base of rules with
purposes.                                                                           an inference hierarchy that makes reasoning about higher-level goals
Published at http://ceur-ws.org
ASAIL 2017, June 16, 2017, London, UK                                                                          Deepali Kholkar, Sagar Sunkle, and Vinay Kulkarni


[Figure 1 diagram. Labels: Large regulation document sources; Information Extraction; Extracted rule suggestions; Extracted instances; Generic semantic model for regulations; Generic regulation concepts; Generic regulation rules; Rule template; Semantic model of specific regulation; Translation; Regulation rule base]




                              Figure 1: Our method for automated rule extraction and regulation rule base creation


of the regulation possible. Most importantly, the template gives a                              A domain model and a dictionary of the concepts in the model
skeletal structure that ensures inclusion of principal categories of                         could be used as the central artifacts to drive the compliance process,
rules. The rules are translated automatically into a Semantics of                            giving the domain experts a more principled way of managing
Business Vocabulary and Rules (SBVR) model1 and further into the                             compliance. Such a domain model would also be helpful if one were to
defeasible logic formalism DR-Prolog [1], as we detailed in [18].                            introduce the benefits of formal compliance checking in an industry
   Our overall approach is depicted in Figure 1. We first briefly                            setting [25]. This observation led us to come up with a method and
review our (semi-) automated extraction approach in Section 2, fol-                          a tool for generation of a domain model and a dictionary, detailed in
lowed by the description of the generic semantic model in Section 2.                         [27], revisited below.
Section 3 describes rule base creation and querying for compliance,                             Using Distributional Semantics for Building Domain Model
Section 4 discusses the utility of our approach, Section 5 describes                         and Dictionary. Instead of using natural language processing (NLP)
related work, and Section 6 concludes the paper. We illustrate our                           for syntactic analysis of legal text, we chose to use NLP to imple-
approach using a real-life case study from the MiFID-2 regulation                            ment distributional semantics in the process of building the domain
applicable in the European Union (EU).                                                       model and the dictionary. Most of the state of the art NLP approaches
                                                                                             in creating domain models or ontologies rely on syntactic features of
2     OUR APPROACH FOR RULE EXTRACTION                                                       the tokens in the text [26, 27]. These approaches tend to use heuris-
                                                                                             tics, for instance, every noun phrase is a candidate for a concept,
In this section, we first review the generation of the domain model
                                                                                             every verb phrase is a candidate for a relation, and every adjective
and dictionary. We also go over the creation of a classifier that
                                                                                             is a candidate for a characteristic of a concept, etc. In our
uses these artifacts to classify legal text sentences into those that
                                                                                             experience, such approaches are feasible when (a) the sentences in the
contribute to rules and those that do not. We then proceed to elaborate
                                                                                             given text are small2 , (b) the sentences possess simple phrasal and
on our approach for (semi-) automated creation of hierarchical rule
                                                                                             clausal structures that do not lead to multiple parses, and (c) the
bases. Note that we only expound the key ideas without restating
                                                                                             overall number of sentences in the text under consideration is a few
the results already published in [27]. We proceed by revisiting the
                                                                                             hundred. For several hundred long and complex
motivation behind the domain modeling first.
                                                                                             sentences3 , which is the usual case in business domains like banking
                                                                                             and financial regulations4,5 , we needed to use techniques that did
2.1     Domain Modeling for Regulatory Compliance                                            not specifically depend on the syntactic features for constructing the
In our engagements and interactions with the domain experts from                             domain models.
enterprises active in the banking and financial services, we found                              We chose to use the distributional semantics hypothesis [14] to help
that the domain experts would encode their knowledge in the form of                          the domain expert discover the domain model and the dictionary of
descriptive artifacts, within which they would establish some form                           concepts. The distributional semantics hypothesis states that words
of traceability. But in most cases, the backbone of this activity was                        that occur in the same contexts tend to have similar meanings. Since
a mental model of the regulation, which the domain experts had to
                                                                                                 2 Examples from most of these approaches contain sentences with 5-15 tokens
somehow corroborate with the artifacts, that the governance, risk,
                                                                                             (words). The Penn TreeBank, on which the statistical parsers like Stanford PCFG parser
and compliance (GRC) frameworks or the in-house solutions would                              and Malt parser are trained, has sentences with average length of 25.6 tokens [12].
let them create. However, the solutions did not offer the domain                                 3 In our Know Your Customer (KYC) for Indian Banks case study, we found that

experts a way to formalize their knowledge.                                                  average length of the sentences was 31.7 tokens. For the MiFID-2 text, it is 38.27 tokens.
                                                                                                 4 The number of sentences in the text offered online for KYC is 526, while in the
                                                                                             MiFID-2 is 4069. The sentences are obtained using heuristics based sentence detection
                                                                                             model and do not consider additional text from relevant documents that an enterprise
   1 Semantics of Business Vocabulary and Business Rules, http://www.omg.org/spec/
                                                                                             may have to consider for enacting compliance to these regulations.
SBVR/1.2/                                                                                        5 The KYC and MiFID-2 links are provided at the end of this section.


this hypothesis is independent of syntactic features, the length or                            The interested reader is invited to refer to [26] and [27] for ex-
the phrasal or clausal complexity of the sentences do not restrict                          perimental results in both the generation of the domain model and
either the scope or the scale of its application. The following                             dictionary as well as the rule classifier.
observations helped us in designing the implementation of the distributional                   From Classified Rules to Structured Regulations. In the
semantics hypothesis for domain modeling:                                                   following section, we describe how this approach of building a domain
        ∙ All regulations constrain the interaction of domain concepts                      model and a dictionary, and then constituting a rule classifier on
           in some manner [27]. To do so, the text of the regulations                       top of these, helps in hierarchical structuring of the regulations. We
           uses mentions of domain concepts. By getting a handle on                         describe the structure of a regulation and the regulatory compliance
           concepts and their mentions, it becomes intuitively easy to                      problem context, then briefly outline the SBVR standard by Object
           understand what the regulation is trying to do and how to                        Management Group (OMG), that we use for creating a semantic
           specify it [26].                                                                 model of regulation rules. The rules in SBVR are translated to logic
        ∙ Fact-orientation (FO), a domain modeling method used                              form, that can be used for querying and checking compliance. SBVR
           for constructing vocabularies in SBVR [13, 22], uses the                         allows rules to be defined in its variant of controlled natural language
           same principle as the distributional semantics hypothesis                        called Structured English (SE). We first create a generic rule model
           in its conceptual schema design procedure (CSDP). The                            for regulations that serves as a template for the domain expert to
           very first step in CSDP is to transform familiar information                     construct the rule base for a specific regulation using extracted
           examples into elementary facts. When performed manually,                         information, as depicted in Figure 1. The detailed process is described in
           a modeler essentially strives to check whether the contexts                      the next few sections.
           of familiar examples contain some hints to obtain concepts
           and relations [26].                                                              2.2    The Regulatory Compliance Context
    We refer to occurrences of the instances and the synonyms of                            Regulatory bodies introduce legislation to mitigate risks faced by
the domain concepts as mentions. Based on above observations, we                            individuals or enterprises. Introduction of new legislation often in-
compute the spans of texts of a configurable length, around (both to                        volves issue of a directive that gives abstract guidelines, followed by
the left and to the right of) the mentions of the domain entities. We                       a regulation that makes concrete recommendations for the guidelines.
cluster the contexts of each concept discovered so far, so as to find                       The regulatory body usually also makes available other supporting
its other mentions and the mentions of the concepts, to which it is                         documents such as regulatory technical specifications (RTS), and
likely related [26, 27].                                                                    consultation papers giving guidelines for implementation through
    The domain expert has the option to provide a seed set of domain                        data and reporting formats, explanatory use case scenarios, etc.7 .
concepts and their mentions to the system, generally found in the                               The directive and the regulation both define goals that aim to
definitions section of most industry regulations6 , or build the                            mitigate risks. The regulation typically applies in conjunction with the
domain model from scratch, starting with a single concept and its                           parent directive if it exists. Regulations always have a well-defined
mention.                                                                                    scope within which they are applicable. They include detailed scope
    Using Informed Active Learning for a Rule Classifier. Our                               rules defining this scope, such as entities to which the regulation
choice of active learning technique was motivated by the fact that the                      applies, conditions under which it applies, and exemption condi-
active learning process aims at keeping the domain expert annotation                        tions. They lay down obligations for entities that fall within the
effort to a minimum, only asking for advice where the training utility                      scope. Obligations are individual regulatory rules that apply to enter-
of the result of such a query is high [24].                                                 prises. Obligations are usually grouped into sections based on the
    For the purpose of classification of legal text sentences, it is pos-                   domain functions that they govern. In the prevalent manual practice
sible to use features based on various n-grams (n items like letters                        of regulatory compliance, enterprises that need to comply with the
or words), and part of speech classes like verbs, modal auxiliaries,                        regulation, legal and compliance experts, auditors, and even regula-
word couples and so on. Such features do provide acceptable results                         tors spend huge effort in understanding and interpreting the contents
for detecting arguments in legal text [21]. Instead of such features,                       of regulations in the context of enterprise compliance.
we make use of the domain model and the dictionary obtained previ-                              If a knowledge base that encoded all the obligations of a reg-
ously. A dictionary-based feature is activated whenever a mention                           ulation were available, the various stakeholders would be able to
of a domain concept is found in a given sentence. During the ac-                            query the same, for the purpose of implementing compliance, or to
tive learning sessions, the role of the domain expert is essentially                        ascertain enterprises’ compliance to the regulation. Queries could
to provide a judgment over classification suggested by the active                           include: ’What are the goals of the regulation?’, ’What are the risks
learner. The domain expert is queried for the top-k sentences one                           it aims to mitigate?’, ’What is the scope of the regulation?’, ’What
by one in each session in a console-based application, whereby the                          sort of entities does it apply to?’, ’What are the broad groups of
domain expert inputs the true class of the sentence queried by the                          obligations it describes?’, ’What are the obligations impacting en-
active learner [27].                                                                        terprises of type X?’, ’Given a certain set of data from the enterprise,
                                                                                            is it compliant?’. A knowledge base would make it possible to query
                                                                                            compliance to goals or sub-goals at various levels, to the regula-
     6 See the definitions section in European MiFID-2 regulations, Article 4 Definitions   tion as a whole, or to groups of obligations, as also to individual
at http://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32014L0065&                obligations, at various stages in the compliance process.
from=EN and the definitions section of Indian Know Your Customer regulations, Section
2 Definitions at https://rbi.org.in/scripts/BS_ViewMasCirculardetails.aspx?id=9848             7 MiFID documents
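The context spans around concept mentions and the dictionary-based classifier features described in this section can be illustrated with a minimal Python sketch. It is not the authors' tool: the dictionary contents, the function names, and the window size are hypothetical, and the query strategy shown (uncertainty sampling, picking sentences whose predicted probability of being a rule is closest to 0.5) is an assumption, since the text only says that the expert is queried for the top-k sentences.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical dictionary: domain concept -> its mentions (instances and synonyms).
DICTIONARY: Dict[str, List[str]] = {
    "investment_firm": ["investment firm"],
    "transaction": ["transaction", "trade"],
    "competent_authority": ["competent authority"],
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mention_contexts(
    tokens: List[str], mention: str, window: int = 3
) -> List[Tuple[List[str], List[str]]]:
    """Return (left, right) spans of up to `window` tokens around each occurrence of `mention`."""
    needle = tokenize(mention)
    spans = []
    for i in range(len(tokens) - len(needle) + 1):
        if tokens[i:i + len(needle)] == needle:
            end = i + len(needle)
            spans.append((tokens[max(0, i - window):i], tokens[end:end + window]))
    return spans


def dictionary_features(sentence: str, dictionary: Dict[str, List[str]]) -> Dict[str, int]:
    """One binary feature per concept: 1 if any of its mentions occurs in the sentence."""
    tokens = tokenize(sentence)
    return {
        concept: int(any(mention_contexts(tokens, m, window=0) for m in mentions))
        for concept, mentions in dictionary.items()
    }


def top_k_queries(scored: List[Tuple[str, float]], k: int) -> List[str]:
    """Assumed strategy: pick the k sentences whose rule probability is closest to 0.5."""
    ranked = sorted(scored, key=lambda pair: abs(pair[1] - 0.5))
    return [sentence for sentence, _ in ranked[:k]]


sentence = "An investment firm shall report each transaction to the competent authority."
print(mention_contexts(tokenize(sentence), "transaction", window=2))
print(dictionary_features(sentence, DICTIONARY))
```

In a fuller pipeline the collected contexts would be clustered to propose further mentions and related concepts, and the sentences returned by `top_k_queries` would be shown to the domain expert for labeling.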


[Figure 2 diagram. Concepts: Directive, Regulation, Goal, Risk, Scope, Obligation, Enterprise. Relation labels: aims for, mitigates, is supported by, addresses, faced by, has scope, lays down, fulfils, applies to]


                                                        Figure 2: Generic set of regulation concepts


   The key elements goals, risks, directives, regulations, scope,                    rule gen001 It is obligatory that enterprise
obligations, and their relationships define the semantic structure of a                  complies_with directive if directive
                                                                                         is_supported_by regulation && enterprise
regulation, depicted in Figure 2. We define a generic set of
                                                                                         complies_with regulation
compliance rules based on this structure, and term it our generic semantic           rule gen002 It is obligatory that enterprise
(rule) model for regulations. We use this generic model for both                         complies_with regulation if enterprise
extraction of rules from the regulation text as well as a template for                   falls_within_scope_of regulation && regulation
creating a regulation rule base. The next section describes SBVR                         lays_down regulation_obligations && enterprise
                                                                                         fulfils regulation_obligations                    (label: Compliance meta-model)
and SBVR SE, used to capture our semantic rule model.                                rule gen003 It is obligatory that enterprise
                                                                                         falls_within_scope_of regulation if regulation
2.3     Semantic modeling of rules using SBVR                                            has_scope regulation_scope && regulation_scope
                                                                                         applies_to enterprise
SBVR is an OMG standard that helps define a semantic model of                        rule gen004 It is necessary that regulation addresses
rules, where rules are defined as compositions of fact types. Fact                       risk if regulation aims_for goal && goal mitigates
types are relations between concepts. This is called a semantic model                    risk
because the meaning of the rule is explicated through its component                  rule gen005 It is obligatory that enterprise
                                                                                         falls_within_scope_of directive if directive
facts and concepts.                                                                      is_supported_by regulation && enterprise
   Since SBVR is intended to capture the vocabulary and rules of                         falls_within_scope_of regulation
a business domain, OMG provides a controlled natural language
notation for specifying the model, called SBVR Structured English                  Rules gen001 and gen002 define compliance for an enterprise to the
(SE)8.                                                                             directive and regulation respectively. Rule gen003 evaluates whether
   Rules in SE are written by imposing modalities such as obligation               the enterprise falls within the scope of the regulation. Rule gen004
and necessity onto compositions of fact types, for example: It is obligatory       defines the relation between goals and risks. These rules define a
that account has balance if customer holds account. Here, customer,                generic template that can be instantiated to create a rule template
account and balance are concepts, and ’customer holds account’ and                 for a specific regulation, by substituting generic concepts with their
’account has balance’ are fact types. SBVR SE being a restricted                   instances from the regulation text. The specific rule template is
subset of natural language, can be understood and used with ease by                then filled in with rules from the regulation text to create its rule
domain experts. We use SBVR SE to define some generic rules for                    base. Instances of generic concepts, and rule suggestions are found
regulations, detailed in the next subsection.                                      through automated extraction from the regulation text, using the
                                                                                   techniques detailed in earlier sections.
2.4     Generic Semantic Model for Regulations                                        We illustrate our approach of rule base creation using a subset of
The key elements of a regulation and their relations depicted in                   the MiFID-2 regulation. The next section gives a brief description
Figure 2 are concepts and fact types in SBVR terminology, in other                 of the regulation.
words, the conceptual model of a regulation. We define generic rules
for checking compliance based on these concepts and fact types,                    2.5     MiFID-2 example
depicted in Listing1.                                                              MiFID-2 is a directive introduced in the European Union to regulate
                                                                                   the functioning of financial markets and bring in greater transparency
               Listing 1: Generic rules for compliance                             in their operation, for safeguarding the interests of customers of
                                                                                   investment firms. It mainly lays down obligations on investment
     8 Semantics of Business Vocabulary and Business Rules: Annex A: SBVR Struc-   firms to report trading transactions carried out on secondary markets,
tured English, http://www.omg.org/spec/SBVR/1.2/                                   to the appropriate authorities, to enable oversight. The directive is
Semi-automated creation of regulation rule bases using generic template-driven rule extraction
                                                                                             ASAIL 2017, June 16, 2017, London, UK


supported by the MiFIR regulation, an RTS, and several consultation papers9.
   Broadly, the level of detail increases from directive to regulation to RTS. We used a subset of
the text from each regulatory document for our case study. Elements of regulatory information,
their document sources, and the criteria applied in picking the subset of document text for the
case study are listed below.

       ∙ Risks, goals, scope, definitions, and high-level obligations from Directive. The
          Introduction and Article 1 (Scope and Definition section) from the Directive were used
          as source text.
       ∙ Scope and definitions from regulation. Article 1 (Scope and Definitions) was used as
          source text.
       ∙ Obligations from regulation. Here, we scoped the text by selecting one high-level
          obligation from the directive and picking the corresponding sections from directive,
          regulation, and RTS, viz. Article 26.
       ∙ Detailed specification of obligations from RTS (Sections with references to Article 26).
       ∙ Data definitions from regulation appendix.

Sample text from the directive document is reproduced here to illustrate rule and non-rule text.
The first paragraph is non-rule text while the second paragraph gives a scope rule.
   The financial crisis has exposed weaknesses in the functioning and in the transparency of
financial markets. The evolution of financial markets has exposed the need to strengthen the
framework for the regulation of markets in financial instruments, including where trading in such
markets takes place over-the-counter (OTC), in order to increase transparency, better protect
investors, reinforce confidence, address unregulated areas, and ensure that supervisors are
granted adequate powers to fulfil their tasks......
   Article 1: Scope
1. This Directive shall apply to investment firms, market operators, data reporting services
providers, and third-country firms providing investment services or performing investment
activities through the establishment of a branch in the Union....
   The next section describes automated extraction of these elements from the document sources.

2.6        Rule Extraction from Regulatory Documents
In the first iteration, the generic concepts of Figure 2 and their mentions are given as seed
concepts for extraction. These are shown in Listing 2. Concepts and mentions are given as
mention:CONCEPT, and relations as CONCEPT>relation>CONCEPT.

Listing 2: Generic concepts and mentions input to rule extractor
    directive:DIRECTIVE directives:DIRECTIVE risks:RISK risk:RISK regulation:REGULATION
            regulations:REGULATION
    aim:GOAL goal:GOAL aims:GOAL goals:GOAL need:OBLIGATION necessary:OBLIGATION
            necessity:OBLIGATION requirements:OBLIGATION requirement:OBLIGATION policy:POLICY
            policies:POLICY regulatory technical standards:RTS
    scope:SCOPE obligations:OBLIGATION obligation:OBLIGATION definitions:DEFINITION
            definition:DEFINITION rule:RULE controls:CONTROL
    competent authority:REGULATOR legal framework:REGULATORY FRAMEWORK
            regulatory framework:REGULATORY FRAMEWORK
    enterprise:ENTERPRISE enterprises:ENTERPRISE entity:ENTERPRISE entities:ENTERPRISE
            organization:ENTERPRISE organizations:ENTERPRISE institution:ENTERPRISE
            institutions:ENTERPRISE firm:ENTERPRISE firms:ENTERPRISE
    DIRECTIVE>aims for>GOAL DIRECTIVE>is supported by>REGULATION GOAL>mitigates>RISK
            REGULATION>has scope>SCOPE REGULATION>lays down>OBLIGATION
            ENTERPRISE>fulfils>OBLIGATION SCOPE>applies to>ENTERPRISE RISK>faced by>ENTERPRISE
            REGULATION>addresses>RISK

The rule extractor is run on all the available documents, viz. directive, regulation, and RTS.
This brings up rule suggestions from the regulatory texts that contain mentions of these key
concepts. Instances of concepts can be found in these, e.g. MiFID as instance of directive,
transparency in financial markets as instance of goal. Examples of rule suggestions that come up
in the first iteration are scope rules and high-level obligations, due to the seed concepts scope,
enterprise, requirements and their mentions given as input. These are illustrated in Listing 3.

Listing 3: Extracted rule suggestions from MiFID-2 documents
    // From Directive
    rule a5697 It is obligatory that This Directive applies to investment firms market operators
            data reporting services providers third country firms providing investment services
            or performing investment activities establishment of branch in Union
    rule a1292 This Directive establishes requirements to authorisation operating conditions
            investment firms provision of investment services or activities third country firms.
    // From Regulation
    rule a7790 This Regulation establishes uniform requirements to disclosure of trade data to
            public reporting of transactions to competent authorities trading of derivatives
            organised venues discriminatory access to clearing discriminatory access to trading
            in benchmarks product intervention powers of competent authorities ESMA EBA powers
            of ESMA position management controls position limits provision of investment
            services or activities third country firms or branch

The suggestions are in a format very close to Structured English. The domain expert can use them
to write SE rules with very little editing, illustrated in the next section. In subsequent
iterations, specific concepts from obligations that need to be explicated further are given to the
rule extraction engine, to extract detailed obligations. The next section describes the steps to
create the rule base using extracted information.

3       RULE BASE CREATION
Rule base creation using the template and extracted rule suggestions is described here as a set of
steps, illustrated in Figure 3.
Step 1: Identify instances of generic concepts Instances of concepts found in the rule
suggestions are listed by the experts as instances in the rule base, using is_a facts, as shown in
Listing 4.

       9 MiFID-2: http://ec.europa.eu/finance/securities/isd/mifid2/index_en.htm
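The seed input format of Listing 2 (mention:CONCEPT for mentions, CONCEPT>relation>CONCEPT for relations) can be read mechanically. The following Python sketch is only an illustration and is not the authors' extractor: the function name parse_seed_entries is hypothetical, and it assumes one entry per list item, since mentions and concept names may contain spaces (e.g. "legal framework:REGULATORY FRAMEWORK").

```python
from collections import defaultdict


def parse_seed_entries(entries):
    """Split seed entries into a concept->mentions map and relation triples.

    Assumption: each entry is either 'mention:CONCEPT' or
    'CONCEPT>relation>CONCEPT', one entry per list item.
    """
    mentions = defaultdict(set)   # CONCEPT -> set of surface mentions
    relations = []                # (SOURCE, relation, TARGET) triples
    for raw in entries:
        entry = raw.strip()
        if not entry:
            continue
        if ">" in entry:
            # A relation entry must have exactly three '>'-separated parts.
            source, relation, target = (part.strip() for part in entry.split(">"))
            relations.append((source, relation, target))
        else:
            mention, _, concept = entry.partition(":")
            mentions[concept.strip()].add(mention.strip())
    return dict(mentions), relations


# A few entries copied from Listing 2.
seeds = [
    "directive:DIRECTIVE",
    "directives:DIRECTIVE",
    "regulatory technical standards:RTS",
    "legal framework:REGULATORY FRAMEWORK",
    "DIRECTIVE>aims for>GOAL",
    "REGULATION>has scope>SCOPE",
]
mentions, relations = parse_seed_entries(seeds)
print(mentions["DIRECTIVE"])
print(relations)
```

Keeping mentions grouped by concept makes it straightforward to add regulation-specific concepts in later iterations without touching the generic seed set.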
[Figure 3 (diagram): Steps in rule base creation. Iteration 1: generic concept-driven extraction
over the regulatory documents yields extracted rule suggestions. Step 1: Identify instances of
generic concepts (domain expert). Step 2: Create specific rule template (generic regulation rule
template plus instances, giving the specific rule template in SE). Step 3: Define scope rules and
obligations using template (domain expert). Iteration 2: regulation-specific concept-driven
extraction. Step 4: Identify concepts and detail obligations (domain expert). Step 5: Generate
logic specification of rules (automated translation of the regulation rule model into the
regulation rule base).]

                                                                      Figure 3: Steps in our rule base creation process
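Steps 1 and 2 of Figure 3 amount to collecting is_a facts and substituting instance names for generic concept names. The Python sketch below illustrates that substitution for rule gen005 only; it is not the authors' tooling, the function names are hypothetical, and it assumes each generic concept used in the rule has exactly one instance (true for directive and regulation in Listing 4, not for goal).

```python
# Generic rule gen005 as given in Listing 1.
GENERIC_GEN005 = (
    "It is obligatory that enterprise falls_within_scope_of directive "
    "if directive is_supported_by regulation && "
    "enterprise falls_within_scope_of regulation"
)


def bindings_from_is_a(facts):
    """Map each generic concept to its instance, using only is_a facts.

    Assumption: every fact is three whitespace-separated tokens and each
    concept has a single instance (later facts would overwrite earlier ones).
    """
    bindings = {}
    for fact in facts:
        instance, relation, concept = fact.split()
        if relation == "is_a":
            bindings[concept] = instance
    return bindings


def instantiate(rule_text, bindings):
    """Replace whole tokens naming a generic concept with the bound instance."""
    return " ".join(bindings.get(token, token) for token in rule_text.split())


# Facts taken from Listing 4 (um001 to um003).
facts = [
    "MiFID is_a directive",
    "MiFIR is_a regulation",
    "MiFID is_supported_by MiFIR",
]
bindings = bindings_from_is_a(facts)
specific_gen005 = instantiate(GENERIC_GEN005, bindings)
print(specific_gen005)
```

Substituting whole tokens rather than substrings keeps identifiers such as falls_within_scope_of untouched; the printed rule corresponds to gen005 in Listing 5.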

Listing 4: Instances of generic regulation concepts extracted from MiFID-2 documents
    rule um001 MiFID is_a directive
    rule um002 MiFIR is_a regulation
    rule um003 MiFID is_supported_by MiFIR
    rule um004 transparency_in_financial_markets is_a goal
    rule um005 weakness_in_functioning_of_financial_markets is_a risk
    rule um006 regulation_of_financial_markets is_a goal
    rule um007 MiFIR_scope is_a regulation_scope
    rule um008 MiFIR aims_for transparency_in_financial_markets
    rule um009 MiFIR has_scope MiFIR_scope
    rule um010 transparency_in_financial_markets mitigates
            weakness_in_functioning_of_financial_markets
    rule um011 regulation_of_financial_markets mitigates
            weakness_in_functioning_of_financial_markets
    rule um012 MiFIR_obligations is_a regulation_obligations
    rule um013 MiFIR aims_for regulation_of_financial_markets
    rule um014 MiFIR lays_down MiFIR_obligations
    rule um015 MiFIR has_scope MiFIR_scope

Step 2: Create specific rule template
The generic rule template of Listing 1 is instantiated by replacing concept names with instance
names in the rules to generate the specific rule template for the regulation shown in Listing 5.

 Listing 5: Specific rule template for the MiFID-2 regulation
    // Instantiated rules
    rule gen001 It is obligatory that enterprise complies_with MiFID
            if MiFID is_supported_by MiFIR && enterprise complies_with MiFIR
    rule gen002 It is obligatory that enterprise complies_with MiFIR
            if enterprise falls_within_scope_of MiFIR && MiFIR lays_down MiFIR_obligations
            && enterprise fulfils MiFIR_obligations
    rule gen003 It is obligatory that enterprise falls_within_scope_of MiFIR
            if MiFIR has_scope MiFIR_scope && MiFIR_scope applies_to enterprise
    rule gen004 It is necessary that MiFIR addresses risk
            if MiFIR aims_for goal && goal mitigates risk
    rule gen005 It is obligatory that enterprise falls_within_scope_of MiFID
            if MiFID is_supported_by MiFIR && enterprise falls_within_scope_of MiFIR

Step 3: Define scope rules and obligations using the template
The specific template contains placeholders for scope rules and obligations, in rules gen003 and
gen002, in the predicates MiFIR has_scope MiFIR_scope, MiFIR_scope applies_to enterprise, MiFIR
lays_down MiFIR_obligations, and enterprise fulfils MiFIR_obligations respectively. These need to
be further detailed in order to complete the definition of the rule base. Their details are
obtained from the extracted rule suggestions illustrated in Listing 3. Rules written using the
suggestions can be seen in Listing 6, with the same rule numbers.

Listing 6: Rule base with scope rules and obligations
    rule a5697 It is obligatory that MiFIR_scope is_for enterprise
            if enterprise is_a investment_firm || enterprise is_a regulated_markets
            || enterprise is_a reporting_firm
            || enterprise is_a third_country_investment_firms_operating_in_EU
            && enterprise has_established branch_in_EU
    rule a7790 It is obligatory that enterprise fulfils MiFIR_requirements
            if enterprise fulfils requirements_for disclosure_of_trade_data_to_public &&
            enterprise fulfils requirements_for_reporting_of transactions &&
            enterprise fulfils requirements_for_trading_of_derivatives_on organised_venues &&
            enterprise fulfils requirements_for_non_discriminatory access_to_clearing &&
            enterprise fulfils requirements_for non_discriminatory_access_to_trading_benchmarks &&
            enterprise fulfils requirements_for_product intervention_powers &&
            enterprise fulfils requirements_for_activities_by_third_party_firms

takes away the complexity associated with manual construction of formal logic rules. Since
extracted rules depend completely on seed concepts given, which may vary from user to user, guided
extraction with the generic set of seed concepts gives uniformity and assurance. An example of
omission during our earlier manual rule base creation experiment is that we had missed encoding
scope rules regarding enterprises to which the MiFID-2 regulation is applicable.
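To make the condition of scope rule a5697 in Listing 6 concrete, the following Python sketch checks it against a small set of ground facts. It is a plain illustration and not the DR-Prolog translation used in the paper: the function name in_mifir_scope and the enterprises acme_bank and delta_ltd are hypothetical, and it assumes that && binds more tightly than || in the rule, which the listing does not state.

```python
def in_mifir_scope(enterprise, facts):
    """Evaluate the condition of rule a5697 for one enterprise.

    facts is a set of (subject, verb, object) triples.
    Assumption: '&&' takes precedence over '||' in the SE rule.
    """
    def holds(verb, obj):
        return (enterprise, verb, obj) in facts

    return (
        holds("is_a", "investment_firm")
        or holds("is_a", "regulated_markets")
        or holds("is_a", "reporting_firm")
        or (
            holds("is_a", "third_country_investment_firms_operating_in_EU")
            and holds("has_established", "branch_in_EU")
        )
    )


# Hypothetical enterprise data facts, for illustration only.
facts = {
    ("acme_bank", "is_a", "investment_firm"),
    ("delta_ltd", "is_a", "third_country_investment_firms_operating_in_EU"),
}

print(in_mifir_scope("acme_bank", facts))
print(in_mifir_scope("delta_ltd", facts))
```

Under the stated precedence assumption, a third-country firm falls in scope only once a has_established branch_in_EU fact is also present for it.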
                                                                                                                    Reduced burden on domain experts and faster knowledge engi-
rule a5697 defines MiFIR_scope is_for enterprise in terms of the                                                neering seem to justify the development cost of our rule generation
scope rule rule a5697 extracted from the directive text, that details                                           framework. However, empirical evidence is crucial to support this
the kinds of enterprises the regulation applies to. rule a7790 defines                                          claim. We are in the process of conducting a systematic empir-
enterprise fulfils MiFIR_requirements as a set of high-level obliga-                                            ical evaluation of our approach. It must be mentioned here that
tions given in the regulation, again obtained from the extracted rule                                           the problems mentioned in the pioneering and extensive work on
suggestion.                                                                                                     encoding regulations in formal logic [5, 23] such as need for simpli-
Step 4: Identify concepts and detail obligations The obligation                                                 fication when encoding legislations and handling cross-references
rules are detailed further, using suggestions extracted from                                                    within regulations, remain. We have dealt with some of these such
regulation or RTS documents, e.g. rules a8033, a9259, a8133, and a8233                                          as simplification of complex sentence constructs, bulleted lists, and
define enterprise fulfils requirements_for_reporting_of transactions                                            cross-references in our work on extraction.
using a hierarchy of rules expressing obligations, shown in Listing 7.                                              The important objective of having a formal rule base is being able
Concepts from the obligations are then given as input to the rule                                               to answer queries about the regulation. Using our resultant rule base
extractor to extract the next set of rule suggestions, which can be used                                        structure, we are able to answer the queries listed in Section 2.2 with
to detail obligations still further. This process is followed iteratively.                                      regard to goals addressed by the regulation, the kinds of enterprises
Step 5: Generate logic specification of rules The SE rules written                                              it applies to, as well as compliance of all or specific entities whose
by the domain expert are automatically translated into an SBVR                                                  ground data is provided, to all or specific rules.
model of rules. The description of this work is outside the scope                                                   Pros of our approach include automated extraction of necessary
of this paper. Rules in SBVR are translated to the defeasible logic                                             supporting rules such as investment_firm executes transaction and
formalism DR-Prolog [1], using the translation mechanism described in                                           transaction trades financial_instrument, that were not retrieved even
[18]. The translated rules in DR-Prolog are shown in Listing 8.                                                 in non-guided extraction. We are currently testing this hypothesis
    These rules can be directly checked against enterprise data facts.                                          on larger examples. We have not so far encountered any problems
It is seen from the listing in DR-Prolog that the lowest-level rules contain data definitions. Our earlier work [17] deals with the process of arriving at these rules, as well as the necessary enterprise data facts, hence it is not explained here. The term obligations has been used throughout the text to mean rules that have any of the modalities obligation, permission, prohibition and necessity. Each of these modalities has been implemented using the defeasible logic metaprogram of DR-Prolog [1], which provides an implementation for each modality using the constructs available in standard Prolog. In the next section, we discuss whether the described approach meets its objectives.

4       DISCUSSION
One of the key objectives of using the template-based approach is isomorphism, i.e. imparting a structure to the rule base similar to that of the original regulation document sources, discussed in [4]. The generic concept model we have used to guide the extraction refers to the regulatory sources, viz. directive and regulation, and embodies the structure of a regulation. This structure and traceability are maintained right from initial rule specification in Structured English to the model and formal rule specification in DR-Prolog. In the absence of a process such as this, the onus is on experts to ensure both structure and correctness. Structuring and achieving isomorphism becomes subjective. A manual process of rule construction goes through several iterations to achieve correctness in the rule hierarchy. Our generic model-guided extraction seeks to avoid omission of key elements and reduce errors through automated generation of the rule hierarchy, predicates, and parameters. This is a benefit of using automated rule generation. A difference from the manual encoding approach is that the user needs to become familiar with generated rule names and some indirections in the generated rules when tracing rule execution during compliance checking. Rule expressiveness in our approach is adequate and scales well, since SBVR has a very rich meta-model with a direct correspondence to SBVR Structured English. Being fact-oriented, SBVR maps directly to DR-Prolog, which scales for large datasets as well.
   The generic model described here is fairly basic, but can and should be altered as required, if the regulation being worked on has a different structure or important sections or elements that need to be incorporated into the structure of the rule base. We also plan to enhance the model using what we learn from applying the approach to several regulations. The next section reviews related work.

5   RELATED WORK
We survey work with objectives similar to ours, namely structured representation of regulatory content to allow formal checking and analysis. Encoding the British Nationality Act as a logic program in Prolog [23] was pioneering work in encoding regulations in formal logic, as is [5]. The need for an intermediate representation of natural language regulations was underlined in [4]. These and later formal approaches to encoding regulations [3, 11] require regulation rules to be encoded manually in the logic formalism.
   The important requirement of integrating compliance checking and accessing related regulatory documents is addressed in [16].
ASAIL 2017, June 16, 2017, London, UK                                                                                                      Deepali Kholkar, Sagar Sunkle, and Vinay Kulkarni


Listing 7: Rule base with detailed obligations
// Detailed obligations from directive/regulation
rule a8033 It is obligatory that enterprise fulfils requirements_for_reporting_of_transactions if enterprise is_a investment_firm &&
    enterprise reports complete_accurate_details_of_transactions_to_competent_authority &&
    investment_firm executes transaction && transaction trades financial_instrument &&
    financial_instrument traded_at_or_admitted_to trading_venue
rule a8037 It is obligatory that enterprise reports complete_accurate_details_of_transactions_to_competent_authority if
    enterprise reports transaction_details && enterprise reports_to trade_repository &&
    enterprise reports before_close_of_working_day
rule a9259 It is necessary that financial_instrument traded_at_or_admitted_to trading_venue if
    financial_instrument traded_at trading_venue ||
    financial_instrument admitted_to trading || underlying_instrument traded_at trading_venue ||
    underlying_index_or_basket traded_at trading_venue
rule a8133 It is obligatory that enterprise reports transaction_details if enterprise reports reportable_transactions &&
    enterprise does_not_report excluded_transactions
rule a8233 It is obligatory that enterprise reports reportable_transactions if trade is_constituted_for reporting_purpose
rule a7029 It is obligatory that trade is_constituted_for reporting_purpose if trade is acquisition || trade is disposal ||
    trade is simultaneous_acquisition_disposal_no_change_in_ownership
rule a2467 trade is acquisition if trade is purchase_of_financial_instrument ||
    trade is entering_into_derivative_contract || trade is increase_in_notional
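The Structured English rules in Listing 7 follow a regular shape: an optional modality phrase, a conclusion, the keyword if, and conditions joined by && or ||. A minimal Python sketch of splitting one such rule into these parts is given here. The function name parse_rule, the modality labels, and the dictionary layout are assumptions made for illustration and are not part of the tooling described in this paper; bodies that mix && and || are not handled.

```python
import re

# Modality phrases as they appear in Listing 7; the labels on the right are
# illustrative and not identifiers taken from the authors' tool.
MODALITIES = {
    "It is obligatory that": "obligation",
    "It is necessary that": "necessity",
}

RULE_RE = re.compile(r"^rule\s+(?P<id>\w+)\s+(?P<rest>.+)$")


def parse_rule(text):
    """Split one Structured English rule into id, modality, head and conditions."""
    match = RULE_RE.match(" ".join(text.split()))  # normalise whitespace first
    if match is None:
        raise ValueError("not a rule: " + text)
    rest = match.group("rest")
    modality = None
    for phrase, label in MODALITIES.items():
        if rest.startswith(phrase):
            modality = label
            rest = rest[len(phrase):].strip()
            break
    head, _, body = rest.partition(" if ")
    # Assumption: a body uses either && (all must hold) or || (any may hold).
    if "||" in body:
        connective, parts = "or", body.split("||")
    else:
        connective, parts = "and", body.split("&&")
    return {
        "id": match.group("id"),
        "modality": modality,
        "head": head.strip(),
        "connective": connective,
        "conditions": [p.strip() for p in parts if p.strip()],
    }


obligation_rule = parse_rule(
    "rule a8133 It is obligatory that enterprise reports transaction_details "
    "if enterprise reports reportable_transactions && "
    "enterprise does_not_report excluded_transactions"
)
definition_rule = parse_rule(
    "rule a2467 trade is acquisition if trade is purchase_of_financial_instrument || "
    "trade is entering_into_derivative_contract || trade is increase_in_notional"
)
```

Rule a2467 carries no modality phrase, so the sketch returns None for its modality, which matches its role as a plain definition in the listing.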




They build a compliance assistance framework that uses first-order logic (FOL) representations of regulation rules, as well as related questions and answers. The FOL rules are linked to their occurrences in regulation documents through an XML representation of tagged regulation information. Writing of the FOL rules and tagging to regulation text is done manually. Our endeavour is semi-automated creation of a rule base of the entire regulation, and logical structuring of the rule base to be able to reason about compliance to the higher-level goals of the regulation.
   Most of the current state of the art in legal rule extraction contains an implicit step of rule identification. This step often encompasses several other constituent steps, like identifying segmentation of regulations, ascertaining modality of the regulations such as whether the rule is an obligation or a permission, and so on [7, 28–30]. The final constituent step is often writing the chosen logical specification of the NL rule. We believe that by separating these concerns from rule identification, it is possible to defer their treatment until we obtain logical specifications, albeit partial ones. We believe that the logical specification language itself, for instance DR-Prolog [1, 2], can be used to take care of the segmentation and cross referencing, because at that level of abstraction, we already have access to the schema of the information required for regulations.
   In contrast, the current state of the art often focuses on the treatment of legal syntactical specifics early on. This is evident in the governance extraction model which manually classifies and attempts to extract regulations as legal requirements in terms of procedural, declarative, and ontology statements [15]. Further fine-level classification includes access-rights statements and delegation of authority rights. Another work proposes to group legal sentences into a few categories referred to as juridical natural language constructs (JNLCs). JNLCs are proposed to be parsed using unification grammars [28]. In a similar vein, legal concepts are proposed to be classified into rights, obligations, privileges, no-rights, powers, liabilities, immunities, and disabilities using a production rule model in [20]. Another work proposes to use a categorization of provisions and an ML classifier trained to identify the provisions in [6].
   GaiusT, a tool based on the Cerno framework, and related research present semi-automated rule extraction with precision and recall numbers similar to ours [7, 8, 19, 30, 31]. In contrast to our approach, this tool and the framework use a number of intermediate artifacts, namely form simplification through semantic parameterization [7, 8], structural comprehension [30], as well as semantic annotation [30]. Their approach seems to be restricted in applicability as well, since all of the above activities have to be performed for any new regulation to which they apply the framework. It is likely that when the regulation is large, like MiFID-2 or FATCA¹⁰, the interaction between various components becomes hard to handle. At the same time, this approach presents some ideas around a conceptual (meta-) model of deontic concepts and a rules generator which could be of use to us.
   An approach presented in [9] contrasts ML-based text classification with knowledge engineering-based (KE-based) text classification. The idea behind KE is that definitions of legal terms are formulated using specific phrases; presuming that only a few clear and easily observable patterns are used for each type of legal sentence or provision, these so-called classification patterns can be used for classification. It was found that such an approach is susceptible to the same complexities of legal sentences that also affect ML-based classification negatively. Some of these complexities are classification keywords appearing in auxiliary sentences rather than the main sentence to be classified, missing standard phrases, syntactical and lexical variation in the standard phrases, and so on. In our own experiments, we too included certain phrases as indicating definitional regulations (those regulations which define domain entities and their specializations) as well as rules in the approximate dictionary chunker. Like the results in [9], our own experiments indicated that there are no perceptible differences in the performance of the classifier when these additional phrases are considered as well. In contrast to [9], if we liken our approach to KE-based classification, then we go one step further and actually make use of the domain model entities and the dictionary in the classification.

¹⁰ FATCA: Foreign Account Tax Compliance Act, https://www.irs.gov/businesses/corporations/foreign-account-tax-compliance-act-fatca
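The KE-based classification discussed in the related work above can be pictured as matching each sentence against a small, ordered set of phrase patterns. The Python sketch below is only an illustration of that idea: the patterns, labels, and example sentences are hypothetical and are neither the classification patterns of [9] nor the phrases used in the dictionary chunker. It also reproduces one reported failure mode, where a keyword in an auxiliary clause decides the label.

```python
import re

# Hypothetical classification patterns, checked in order; the first match wins.
PATTERNS = [
    ("definition", re.compile(r"\b(means|is defined as|shall be understood as)\b", re.I)),
    ("prohibition", re.compile(r"\b(shall not|must not|is prohibited)\b", re.I)),
    ("obligation", re.compile(r"\b(shall|must|is required to)\b", re.I)),
    ("permission", re.compile(r"\b(may|is permitted to)\b", re.I)),
]


def classify(sentence):
    """Return the label of the first pattern found in the sentence, else 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.search(sentence):
            return label
    return "unknown"


# A standard phrase in the main clause is classified as expected.
clear_case = classify(
    "An investment firm shall report transactions to the competent authority."
)
# The main clause is a permission, but 'means' in the auxiliary clause matches first.
auxiliary_case = classify(
    "A firm may use an agent, which for this purpose means a registered entity."
)
# No standard phrase is present, so no pattern applies.
no_phrase = classify(
    "Investment firms report complete and accurate details of transactions."
)
```

The second and third examples correspond to the complexities named above: keywords appearing in auxiliary sentences, and missing standard phrases.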


Listing 8: A section of the translated rules in DR-Prolog
defeasible(gen001, obligation, complies_withMiFID(LEI), [rule210(LEI)]).
defeasible(gen002, obligation, complies_withMiFIR(LEI), [rule212(LEI)]).
defeasible(gen003, obligation, falls_within_scope_ofMiFIR(LEI), [rule215(LEI)]).
defeasible(a5697, obligation, miFIR_scopeApplies_toEnterprise(LEI), [rule223(LEI)]).
defeasible(a7790, obligation, fulfilsMiFIR_obligations(LEI), [rule225(LEI)]).
defeasible(a8033, obligation, fulfilsRequirements_for_reporting_of_transactions(LEI), [rule232(LEI)]).
defeasible(a8037, obligation, reportsComplete_accurate_details_of_transactions_to_competent_authority(LEI), [rule237(LEI)]).
defeasible(a8133, obligation, reportsTransaction_details(LEI), [rule240(LEI)]).
defeasible(a8233, obligation, reportsReportable_transactions(LEI), [rule242(LEI)]).
defeasible(a7029, obligation, is_constituted_forReporting_purpose(Transref), [rule302(Transref)]).
fact(rule211(LEI)) :- fact(miFIDIs_supported_byMiFIR(LEI)), fact(complies_withMiFIR(LEI)).
fact(rule210(LEI)) :- fact(rule211(LEI)).
fact(rule213(LEI)) :- fact(falls_within_scope_ofMiFIR(LEI)), fact(miFIRLays_downMiFIR_obligations(LEI)),
    fact(fulfilsMiFIR_obligations(LEI)).
fact(rule212(LEI)) :- fact(rule213(LEI)).
fact(rule216(LEI)) :- fact(miFIRHas_scopeMiFIR_scope(LEI)), fact(miFIR_scopeApplies_toEnterprise(LEI)).
fact(rule215(LEI)) :- fact(rule216(LEI)).
fact(rule223(LEI)) :- fact(is_aInvestment_firm(LEI)).
fact(rule223(LEI)) :- fact(is_aMarket_operator(LEI)).
fact(rule223(LEI)) :- fact(is_aReporting_firm(LEI)).
fact(rule224(LEI)) :- fact(is_aThird_country_investment_firms_operating_in_EU(LEI)), fact(has_establishedBranch_in_EU(LEI)).
fact(rule223(LEI)) :- fact(rule224(LEI)).
fact(rule226(LEI)) :- fact(fulfilsRequirements_for_disclosure_of_trade_data_to_public(LEI)),
    fact(fulfilsRequirements_for_reporting_of_transactions(LEI)),
    fact(fulfilsRequirements_for_trading_of_derivatives_on_organised_venues(LEI)),
    fact(fulfilsRequirements_for_non_discriminatory_access_to_clearing_and_trading_in_benchmarks(LEI)),
    fact(fulfilsRequirements_for_product_intervention_powers(LEI)),
    fact(fulfilsRequirements_for_activities_by_third_party_firms(LEI)).
fact(rule225(LEI)) :- fact(rule226(LEI)).
fact(rule233(LEI)) :- fact(is_aInvestment_firm(LEI)),
    fact(reportsComplete_accurate_details_of_transactions_to_competent_authority(LEI)),
    fact(investment_firmExecutesTransaction(LEI)), fact(transactionTradesFinancial_instrument(LEI)),
    fact(financial_instrumentTraded_at_or_admitted_toTrading_venue(LEI)).
fact(rule238(LEI)) :- fact(reportsTransaction_details(LEI)), fact(reports_toTrade_repository(LEI)),
    fact(reportsBefore_close_of_working_day(LEI)).
fact(rule237(LEI)) :- fact(rule238(LEI)).
fact(rule241(LEI)) :- fact(reportsReportable_transactions(LEI)), fact(does_not_reportExcluded_transactions(LEI)).
fact(rule240(LEI)) :- fact(rule241(LEI)).
fact(rule242(LEI)) :- fact(is_constituted_forReporting_purpose(Transref)).
fact(rule302(Transref)) :- fact(isAcquisition(Transref)).
fact(rule302(Transref)) :- fact(isDisposal(Transref)).
fact(rule302(Transref)) :- fact(isSimultaneous_acquisition_disposal_no_change_in_ownership(Transref)).
fact(rule303(Transref)) :- fact(isPurchase_of_financial_instrument(Transref)).
fact(rule303(Transref)) :- fact(isEntering_into_derivative_contract(Transref)).
fact(rule303(Transref)) :- fact(isIncrease_in_notional(Transref)).
fact(isAcquisition(Transref)) :- fact(rule303(Transref)).
fact(rule304(Transref)) :- fact(isSale_of_financial_instrument(Transref)).
fact(rule304(Transref)) :- fact(isClosing_out_of_derivative_contract(Transref)).
fact(rule304(Transref)) :- fact(isDecrease_in_notional(Transref)).
fact(isDisposal(Transref)) :- fact(rule304(Transref)).
fact(hasTradetype_buy(Transref)) :- fact(hasTradetype(Transref, 'buy')).
fact(hasTradetype(Transref, Tradetype)) :- fact(trade(Transref, Tradetype)).
fact(isPurchase_of_financial_instrument(Transref)) :- fact(hasTradetype_buy(Transref)).
fact(hasTradetype_buy(Transref)) :- fact(hasTradetype(Transref, 'buy')).
fact(hasTradetype(Transref, Tradetype)) :- fact(trade(Transref, Tradetype)).
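To see how the clauses in Listing 8 connect ground data to higher-level conclusions, the transaction-level chain can be traced step by step. The Python sketch below re-encodes a few of the listed clauses as (head, body) pairs for a single transaction and applies naive forward chaining. It is an illustration under stated simplifications, not the way DR-Prolog evaluates the rule base: variables are dropped, the two-argument hasTradetype and trade facts are collapsed into one atom, and the defeasible rule a7029 is treated as a strict rule. The name derive and the tuple encoding are assumptions of this sketch.

```python
# Each rule is (head, [body atoms]); all body atoms must hold for the head to hold.
# Atom names are copied from Listing 8 with the Transref argument dropped, since
# the sketch reasons about one transaction at a time.
RULES = [
    # hasTradetype(Transref, 'buy') and trade(Transref, 'buy') collapsed into one atom.
    ("hasTradetype_buy", ["hasTradetype:buy"]),
    ("isPurchase_of_financial_instrument", ["hasTradetype_buy"]),
    ("rule303", ["isPurchase_of_financial_instrument"]),
    ("rule303", ["isEntering_into_derivative_contract"]),
    ("rule303", ["isIncrease_in_notional"]),
    ("isAcquisition", ["rule303"]),
    ("rule304", ["isSale_of_financial_instrument"]),
    ("rule304", ["isClosing_out_of_derivative_contract"]),
    ("rule304", ["isDecrease_in_notional"]),
    ("isDisposal", ["rule304"]),
    ("rule302", ["isAcquisition"]),
    ("rule302", ["isDisposal"]),
    # Defeasible rule a7029, simplified here to a strict rule.
    ("is_constituted_forReporting_purpose", ["rule302"]),
]


def derive(ground_facts, rules=RULES):
    """Return every atom derivable from the ground facts by forward chaining."""
    known = set(ground_facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in known and all(atom in known for atom in body):
                known.add(head)
                changed = True
    return known


# Ground datum for one transaction with trade type 'buy'.
buy_trade = derive({"hasTradetype:buy"})
# A hypothetical trade type that no rule mentions derives nothing further.
other_trade = derive({"hasTradetype:transfer"})
```

With the 'buy' datum, the derived set contains hasTradetype_buy, isPurchase_of_financial_instrument, rule303, isAcquisition, rule302, and finally is_constituted_forReporting_purpose, which mirrors the chain of clauses in the listing.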




6        CONCLUSIONS AND FUTURE WORK
We used a generic conceptual model of regulations to extract specific concepts, relations, and rules from the regulation text. This gave us a better-directed approach to rule extraction and more structured rule suggestions. We used a generic model of regulation rules based on the conceptual model as a template for the regulation rule base and found that it gave us a method and a structured rule base with less rework, as well as some assurance that vital sections of information about the regulation are included. It created a rule hierarchy that helps reason all the way from ground data to high-level goals of the regulation. We believe this principled approach gives us a more accurate and functional model of the regulation.


   We have experimented with using this generic concept-driven extraction on sections of the KYC regulation. We plan to further test the method on the entire KYC regulation and two more regulations, and enhance the generic model and template as required. We also plan to conduct an empirical study comparing this approach of rule base construction to the manual one, as well as to the conventional industry approach to compliance.

REFERENCES
 [1] Grigoris Antoniou, Nikos Dimaresis, and Guido Governatori. 2007. A System for Modal and Deontic Defeasible Reasoning. In AI 2007: Advances in Artificial Intelligence, 20th Australian Joint Conference on Artificial Intelligence, Gold Coast, Australia, December 2-6, 2007, Proceedings. 609–613. https://doi.org/10.1007/978-3-540-76928-6_62
 [2] Grigoris Antoniou, Nikos Dimaresis, and Guido Governatori. 2009. A modal and deontic defeasible reasoning system for modelling policies and multi-agent systems. Expert Syst. Appl. 36, 2 (2009), 4125–4134. https://doi.org/10.1016/j.eswa.2008.03.009
 [3] Ahmed Awad, Sergey Smirnov, and Mathias Weske. 2009. Resolution of Compliance Violation in Business Process Models: A Planning-Based Approach. OTM Conferences (1) 2009: 6-23.
 [4] Trevor J. M. Bench-Capon and Frans Coenen. 1992. Isomorphism and legal knowledge based systems. Artif. Intell. Law 1, 1 (1992), 65–86. https://doi.org/10.1007/BF00118479
 [5] Trevor J. M. Bench-Capon, G. O. Robinson, Tom Routen, and Marek J. Sergot. 1987. Logic Programming for Large Scale Applications in Law: A Formalisation of Supplementary Benefit Legislation. In Proceedings of the First International Conference on Artificial Intelligence and Law, ICAIL ’87, Boston, MA, USA, May 27-29, 1987. 190–198. https://doi.org/10.1145/41735.41757
 [6] Carlo Biagioli, Enrico Francesconi, Andrea Passerini, Simonetta Montemagni, and Claudia Soria. 2005. Automatic Semantics Extraction In Law Documents. In ICAIL, June 6-11, 2005, Italy. 133–140. https://doi.org/10.1145/1165485.1165506
 [7] Travis D. Breaux and Annie I. Antón. 2005. Deriving Semantic Models from Privacy Policies. In 6th POLICY Workshop, Sweden. 67–76. https://doi.org/10.1109/POLICY.2005.12
 [8] Travis D. Breaux, Annie I. Antón, and Jon Doyle. 2008. Semantic parameterization: A process for modeling domain descriptions. ACM Trans. Softw. Eng. Methodol. 18, 2 (2008). https://doi.org/10.1145/1416563.1416565
 [9] Emile de Maat, Kai Krabben, and Radboud Winkels. 2010. Machine Learning Versus Knowledge Based Classification of Legal Texts. In Proceedings JURIX 2010. IOS Press, Amsterdam, The Netherlands, 87–96.
[10] Mauro Dragoni, Guido Governatori, and Serena Villata. 2015. Automated Rules Generation from Natural Language Legal Texts. In Workshop on Automated Detection, Extraction and Analysis of Semantic Information in Legal Texts. San Diego, USA, 1–6.
[11] Guido Governatori and Antonino Rotolo. 2013. A conceptually rich model of business process compliance. In APCCM 2010: 3-12.
[12] Jan Hajic, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Stepánek, Pavel Stranák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning: Shared Task, CoNLL 2009, Boulder, Colorado, USA, June 4, 2009. 1–18. http://aclweb.org/anthology/W/W09/W09-1201.pdf
[13] Terry A. Halpin. 2011. Fact-Orientation and Conceptual Logic. In Proceedings EDOC 2011, Finland. 14–19. https://doi.org/10.1109/EDOC.2011.28
[14] Zellig Harris. 1954. Distributional structure. Word 10, 23 (1954), 146–162.
[15] Waël Hassan and Luigi Logrippo. 2009. Governance Requirements Extraction Model for Legal Compliance Validation. In RELAW 2009, USA. 7–12. https://doi.org/10.1109/RELAW.2009.4
[16] Shawn Kerrigan and Kincho H. Law. 2003. Logic-Based Regulation Compliance-Assistance. In Proceedings of the 9th International Conference on Artificial Intelligence and Law, ICAIL 2003, Edinburgh, Scotland, UK, June 24-28, 2003. 126–135. https://doi.org/10.1145/1047788.1047820
[17] Deepali Kholkar, Sagar Sunkle, and Vinay Kulkarni. 2016. From Natural-language Regulations to Enterprise Data using Knowledge Representation and Model Transformations. In Proceedings of the 11th International Joint Conference on Software Technologies (ICSOFT 2016) - Volume 2: ICSOFT-PT, Lisbon, Portugal, July 24 - 26, 2016. 60–71. https://doi.org/10.5220/0006002600600071
[18] Deepali Kholkar, Sagar Sunkle, and Vinay Kulkarni. 2017. Towards Automated Generation of Regulation Rule Bases using MDA. In MODELSWARD 2017 - Accepted.
[19] Nadzeya Kiyavitskaya, Nicola Zeni, Travis D. Breaux, Annie I. Antón, James R. Cordy, Luisa Mich, and John Mylopoulos. 2008. Automating the Extraction of Rights and Obligations for Regulatory Compliance. In Conceptual Modeling - ER. 154–168. https://doi.org/10.1007/978-3-540-87877-3_13
[20] Jeremy C. Maxwell and Annie I. Antón. 2010. The Production Rule Framework: Developing a Canonical Set of Software Requirements for Compliance with Law. In Proceedings of the 1st ACM International Health Informatics Symposium (IHI ’10). ACM, New York, NY, USA, 629–636. https://doi.org/10.1145/1882992.1883092
[21] Marie-Francine Moens, Erik Boiy, Raquel Mochales Palau, and Chris Reed. 2007. Automatic Detection of Arguments in Legal Texts (ICAIL ’07). ACM, New York, NY, USA, 225–230. https://doi.org/10.1145/1276318.1276362
[22] Sjir Nijssen. 2007. SBVR: Semantics for business. (2007).
[23] Marek J. Sergot, Fariba Sadri, Robert A. Kowalski, F. Kriwaczek, Peter Hammond, and H. T. Cory. 1986. The British Nationality Act as a Logic Program. Commun. ACM 29, 5 (1986), 370–386. https://doi.org/10.1145/5689.5920
[24] Burr Settles. 2009. Active Learning Literature Survey. Computer Sciences Technical Report 1648. University of Wisconsin–Madison.
[25] Sagar Sunkle, Deepali Kholkar, and Vinay Kulkarni. 2015. Model-driven regulatory compliance: A case study of “Know Your Customer” regulations. In 18th ACM/IEEE International Conference on Model Driven Engineering Languages and Systems, MoDELS 2015, Ottawa, ON, Canada, September 30 - October 2, 2015. 436–445. https://doi.org/10.1109/MODELS.2015.7338275
[26] Sagar Sunkle, Deepali Kholkar, and Vinay Kulkarni. 2016. Comparison and Synergy Between Fact-Orientation and Relation Extraction for Domain Model Generation in Regulatory Compliance. In Conceptual Modeling - 35th International Conference, ER 2016, Gifu, Japan, November 14-17, 2016, Proceedings (Lecture Notes in Computer Science), Isabelle Comyn-Wattiau, Katsumi Tanaka, Il-Yeol Song, Shuichiro Yamamoto, and Motoshi Saeki (Eds.), Vol. 9974. 381–395. https://doi.org/10.1007/978-3-319-46397-1_29
[27] Sagar Sunkle, Deepali Kholkar, and Vinay Kulkarni. 2016. Informed Active Learning to Aid Domain Experts in Modeling Compliance. In 20th IEEE International Enterprise Distributed Object Computing Conference, EDOC 2016, Vienna, Austria, September 5-9, 2016, Florian Matthes, Jan Mendling, and Stefanie Rinderle-Ma (Eds.). IEEE Computer Society, 1–10. https://doi.org/10.1109/EDOC.2016.7579382
[28] Tom M. van Engers, Ron van Gog, and Kamal Sayah. 2004. A Case Study on Automated Norm Extraction. In Legal Knowledge and Information Systems. Jurix 2004: The Seventeenth Annual Conference. (Frontiers in Artificial Intelligence and Applications), T. Gordon (Ed.). IOS Press, Amsterdam, 49–58.
[29] Adam Z. Wyner and Wim Peters. 2011. On Rule Extraction from Regulations. In Legal Knowledge and Information Systems - JURIX 2011: The Twenty-Fourth Annual Conference, University of Vienna, Austria, 14th-16th December 2011. 113–122. https://doi.org/10.3233/978-1-60750-981-3-113
[30] Nicola Zeni, Nadzeya Kiyavitskaya, Luisa Mich, James R. Cordy, and John Mylopoulos. 2015. GaiusT Supporting The Extraction Of Rights And Obligations For Regulatory Compliance. Requir. Eng. 20, 1 (2015), 1–22. https://doi.org/10.1007/s00766-013-0181-8
[31] Nicola Zeni, Luisa Mich, John Mylopoulos, and James R. Cordy. 2013. Applying GaiusT For Extracting Requirements From Legal Documents. In Sixth International Workshop on Requirements Engineering and Law, RELAW 2013, 16 July, 2013, Rio de Janeiro, Brasil. 65–68. https://doi.org/10.1109/RELAW.2013.6671349