=Paper=
{{Paper
|id=None
|storemode=property
|title=Extension of Business Rule Sets Using Data Mining of GUHA Association Rules
|pdfUrl=https://ceur-ws.org/Vol-1422/59.pdf
|volume=Vol-1422
|dblpUrl=https://dblp.org/rec/conf/itat/Vojir15
}}
==Extension of Business Rule Sets Using Data Mining of GUHA Association Rules==
J. Yaghob (Ed.): ITAT 2015, pp. 59–64, Charles University in Prague, Prague, 2015

Stanislav Vojíř

Department of Information and Knowledge Engineering, University of Economics, Prague, W. Churchill Sq. 4, Prague 3, 130 67, Czech Republic

Abstract. This paper introduces three suitable ways of using data mining of GUHA association rules in conjunction with an existing set of business rules. The integration can be realized as full integration, as a black box classification model, or as dynamic integration with a data mining system. These ways are illustrated by a demo use case based on data from a health insurance company.

===1 Introduction===

Business rules are not only an effective way of modeling the business structure and describing operations, definitions and constraints in an organization, but also an efficient way of separating business logic from the application code of information systems. Separating business logic, mainly the "decision-making points", from the implementation of applications is very important, especially in today's rapidly changing world. For this reason, an increasing number of applications of rule engines and business rules systems can be observed.

In this paper, the presented approach to extending a business rule base is illustrated using examples from a health insurance company. Examples of business rules from this domain are: "If the doctor has specialization 001, then the diagnosis AAA is OK." or "The child emergency cannot treat adult patients." Such rules are usually stored and managed by a business rule management system. The rule set together with the related terms dictionary can be called a "knowledge base".

However, the applicability of rule-based systems greatly depends on the complexity and completeness of their knowledge base. In addition to the manual input of business rules by domain experts, some methods of obtaining business rules from business data have been developed, for example from unstructured texts or from the operational data store of the company. A suitable method for "learning" business rules from working or historical business data is the application of data mining methods and the reuse of the resulting data mining models.

====1.1 Related Work====

The "semi-automatic learning of business rules" has been a subject of research for a relatively long period, but there are still not many real applications. The most relevant existing application of "data mining of business rules" is the component RuleLearner, which is a part of the business rules system OpenRules. [1] This system works with a knowledge base in the form of decision tables in Excel worksheets. According to information from the company OpenRules, Inc., the component RuleLearner is still non-public. It is based on data mining using the open source system Weka, but the conversion from data mining results to the form of classification tables for the OpenRules system can be realized only by experts from the authors' company.

====1.2 Business Rules====

In this paper, the author describes three suitable ways of direct integration of data mining results into an existing business rule set. "Business rules" is not the name of one specification or system. The term covers a relatively large area of rule-based systems and applications. It is mainly the name of a modeling approach. In this approach, the modeling of business behavior and decisions leads from the definition of basic entities and terms to the definition of standalone business rules. These rules are collected into rule sets in one complex knowledge base of the company.

The business rules approach has been applied in many specifications of languages for the definition of business rules. The specifications can be divided by their main focus into two groups: specifications suitable for inference engines and specifications suitable for sharing knowledge in a human-friendly form. The work presented in this paper is more suitable for implementation in automatic inference (business rules) engines such as JBoss Drools, Jess or Jena. The execution component takes the set of business rules and the base of facts, evaluates the conditions of the business rules and activates the appropriate rules.

====1.3 GUHA Association Rules====

One of the possible and suitable methods for extending a knowledge base in the form of business rules is the application of data mining methods to the historical data of the company. Suitable data mining models appear to be association rules and decision rules. Association rules can be discovered not only using the best-known algorithm APRIORI, but also using the procedure ASSOC of the GUHA method. (In this paper, the rules found using the GUHA procedure ASSOC are called "GUHA association rules".) The GUHA method is an original Czech data mining method for mining association rules with "rich semantics". The basic form of a GUHA association rule is

φ ≈ ψ

where φ (antecedent), ψ (consequent; in the GUHA method called succedent) and possibly a condition are logical combinations of attributes (with concrete values), and ≈ is the quantifier, a function defined on the four-fold table (4ft). Examples of 4ft-quantifiers are founded implication (a combination of the interest measures confidence and support) and above average dependence (this quantifier is convertible to a combination of the interest measures lift and support). [5]

The GUHA association rules for the approaches presented in this paper are discovered using the data mining system LISp-Miner (http://lisp-miner.vse.cz). This software supports data mining of GUHA association rules also with "dynamic binning of values in attributes". This feature extends the pattern of requested association rules (the task definition). An attribute can contain a set of values. For example, the rule attribute age([0;1),[1;5)) is interpreted as age in the interval from 0 to 5 years, without any need to redefine the data preprocessing. The dynamic binning can be defined as subsets of a given length, left or right cuts, intervals etc.

An example of a discovered GUHA association rule:

age([20;40]) & city(Prague) & clinic(A, B) ≈ procedure(C) | confidence 0.6, support 0.01

The interpretation of this rule: if the age is in the interval from 20 to 40 years, the city is Prague and the clinic is A or B, then the applied procedure is C. The confidence of this rule is 60% and the support is 1%.

====1.4 Structure of this Paper====

This work is focused on the use of association rules obtained by application of the GUHA method (below called "GUHA association rules"), but the principles can be generalized to simpler association rules obtained using the algorithm APRIORI (for example in the system R). This paper follows previous work on the preparation of classification business rule sets using GUHA association rules [2] and is also related to the ongoing TAČR project TA04011691 "Automated extraction of business rules with feedback" [3].

The paper is organized as follows. Section 2 walks through three suitable models of integrating a data mining model into a business rule set. Section 3 contains example use cases motivated by real data. The conclusion summarizes the paper and outlines future work.

===2 Integration of Data Mining Models into Existing Business Rule Set===

This section describes three model ways of integrating GUHA association rules into an existing business rule set. Their suitability differs according to the requested level of integration and also according to the analytical question solved by the data mining task. All these ways are fully implementable (and have been practically verified) using the business rule engine JBoss Drools [4] and the data mining system LISp-Miner [5].

====2.1 Direct Transformation of GUHA Association Rules into Business Rules====

The first variant of involving discovered association rules in an existing business rule set is their direct transformation. Within this transformation, every discovered GUHA association rule is transformed into a separate business rule. The antecedent and condition parts of the GUHA association rule are transformed into the condition of the business rule, and the consequent (succedent in the GUHA method) of the association rule is "implemented" in the body of the business rule. The body of the business rule executes the requested action: it returns the result of the classification task in a suitable form (a set of attributes with values, new data added to the base of facts etc.). For this transformation, some constraints on the solved data mining tasks have to be considered.

The antecedent, condition and even consequent of a GUHA association rule can consist of multiple "partial cedents" (brackets in the logical representation), containing conjunctions, disjunctions and negations. When mining with the LISp-Miner system, every attribute in the rule can also contain multiple values, connected during the mining process using the "dynamic binning" feature. To make the transformation from association rules to business rules possible, it is not necessary to apply any limits or constraints to the antecedent and condition parts of association rules. However, it is necessary to solve the problem of the data dictionary. The data dictionary has to be mapped to the shared terms dictionary used in the organization. If the data mining process has been initialized using data from the operational data store of the organization, it is possible to use the default names of data attributes (columns) in the operational data store as the terms dictionary for the definition of business rules. (Alternatively, the organization may already have a mapping of data attributes from the operational data store to an ontology or another "terms dictionary".)

From the perspective of transformation to business rules for the system JBoss Drools, the condition of a rule can consist of logical expressions similar to native Java code. The transformation consists of these steps:

1. Perform reverse preprocessing of the used data. In the data mining process it is common to prepare attributes from the data columns of the original data matrix. These attributes have different names and preprocessed values (during the preprocessing phase of the data mining process, the original data values are grouped into named sets or intervals of original data values). The transformation expands the attributes used in the association rules back into the original column names and values.

2. Remove unnecessary cedents from the antecedent and condition parts of the GUHA association rule. Because of the data mining task configuration and the LISp-Miner export, GUHA association rules saved in PMML form (Predictive Model Markup Language, an XML-based technical standard for saving data mining models, developed by the Data Mining Group) often contain unnecessary partial cedents (multiple brackets without any added logical expression).

3. Transform the antecedent and condition of every GUHA association rule into the condition of a business rule. Depending on how null values are handled in the data set for the data mining task, a negation in an association rule can be interpreted either as inequality or as negation of the checking condition. According to test results, the interpretation as inequality is more suitable for the preparation of a classification business rule set. When mining GUHA association rules with a condition, the condition can be appended to the antecedent part (using conjunction), or it can be interpreted as a group condition for a conditioned subset of business rules.

4. Prepare the bodies of the business rules from the consequents of the association rules. Semi-automatic acquisition of business rules from data mining results is suitable for solving "classification" tasks. Such tasks are not restricted to returning the value of a single "result" attribute. The limitations on the consequent of the association rules for subsequent automatic processing of results are as follows: each consequent should contain one or more attributes with values that were not preprocessed in the data mining process. If the consequent part of an association rule contains more attributes, these attributes should be connected by conjunction.

5. Use the requested conflict resolution strategy.

Business rules in DRL form (the format suitable for JBoss Drools) are based on Java classes, which represent the terminological dictionary. To support solving classification problems using association rules, in most cases it is necessary to select the best resulting consequent (the resulting recommendation) when more business rules have a matching antecedent/condition. A good conflict resolution strategy is to prefer classification rules with better values of confidence and support and with a shorter condition. [6] In Drools, a suitable strategy can be implemented as one conflict resolution function written in DRL.

The result of a recommendation/classification task can be processed by another part of the information system of the organization, or by other business rules. Based on the tested use cases, subsequent processing of the results using other business rules contributes to the clarity of the full knowledge base of the organization. From the perspective of knowledge management in an organization in the context of business rules, it is appropriate to build one shared knowledge base in the form of business rules based on one shared terms dictionary. [7] In an implementation using JBoss Drools, it is suitable to (temporarily) insert the results of the classification subtask into the base of facts and continue the execution of business rules.

A great advantage of transforming each association rule into a separate business rule is the possibility of their subsequent management and administration using tools from the business rules management system. It is easy to edit these rules, their priority and their behavior.

The automatic transfer of the complete results of data mining of GUHA association rules into business rules also has some disadvantages. The first big disadvantage of full integration is a large increase in the number of business rules. For solving classification tasks using association rules without pruning algorithms, it is suitable to use data mining tasks with a really low requested minimal threshold of support. Such tasks, however, return a lot of rules (possibly thousands). If they are integrated into the main knowledge base, it is appropriate to identify these rules with a specific "tag".

In terms of practical evaluation, the options of this model of integration were verified in [2] and [8]. It is suitable to generate a business rule set in DRL form from GUHA association rules. The classifier obtained by this method can achieve even better results than the reference classifiers. [2] According to the performed tests, and depending on the data set, the richer "expression language" of GUHA association rules can contribute to better results (but at the cost of more rules).

====2.2 Black Box Classification Component====

The second suitable variant for including data mining results in an existing knowledge base in the form of a business rule set is integration as a "black box". In this way, the connected component is suitable for solving classification tasks. The integration schema should be as follows:

[Figure: diagram with the elements "Wrapper business rule", "Black box component", "Association rules", "Information about required facts", "Information from eval. base", "Input values mapping", "List of variables" and "Classification result".]

Fig. 1. Schema of black box component integration.

The user connects the black box component as one "part" of the knowledge base. It can be connected to the body of a business rule, or as a partial condition. In the definition phase of the business rule in question, the user has to follow the steps of a simple connection process:

1. Select the results of a data mining task and export them into a standardized form (usually PMML).

2. Define the wrapper rule: one rule which initializes the evaluation of the classification black box component.

3. Import the data mining results into the classification black box component. The component checks the structure of the uploaded model and detects all connecting points. The connecting points can be defined as input, output or shared. Input connecting points should include a definition of the mapping between facts in the evaluation base and attributes used in the conditions of the classification model.

4. Define the mapping for the connecting points. For a classification model based on GUHA association rules, the user defines a 1:1 mapping between the attributes used in the antecedents and conditions of the association rules and fields from the terms dictionary. The output connecting point is usually a variable for the result of classification. The result variable can be immediately captured and processed in the wrapper business rule, or added into the evaluation base of facts used in the inference algorithm. For all the mappings, the black box component detects the required data types for individual attributes and checks the mapping at least at the level of data type, at best at the level of the definition range.

The involvement of a data mining model as a black box component brings many benefits. This way of integration has the lowest requirements for interaction with other rules in the knowledge base and it is applicable not only to data mining models consisting of rules but also to other suitable types of data mining models, for example decision trees or neural networks.

From the perspective of management or domain experts, this integration does not have a big impact on the other business rules saved in the knowledge base. It is really easy to interpret: "If the condition of this rule matches the characteristics of the client, the body of the rule returns the statistically most probable next offer for the client." The "most probable next offer" is determined by the black box component, so the management expert does not have to know the hidden algorithms used for this recommendation.

This integration also has disadvantages. The most significant of them is the problem of "recycling" a given data mining model for use in more business rules. The data mining model is usually connected at only one point (in the black box component integrated in the wrapper rule). For models based on rules, another disadvantage is that the evaluation of the contained rules is excluded from the main RETE network. (Most systems for the execution of business rules are based on the RETE algorithm, which allows fast inference evaluation.)

When the black box component is implemented in the system JBoss Drools, it is possible to use an external implementation in Java code, or an implementation using a separated, conditioned subset of business rules, which is evaluated only "on demand" (separated by a special condition).

This model of integration has recently been implemented in the TAČR project mentioned in the Introduction of this paper.

====2.3 Data Mining Initialized by Business Rules====

Although the use of data mining models for solving classification tasks integrated in a business rule set is well interpretable and comprehensible to users, it is not suitable to limit the possible use cases to this way only. The main reason for finding another, alternative approach is the absence of a target attribute for classification in the operational data of the organization. Particularly when data mining methods are used for finding exceptions, it is suitable to use dynamic data mining initialized by business rules. The process of defining an appropriate wrapper for the initialization of data mining through a business rules engine in combination with the LISp-Miner system can be defined as follows:

1. Define the export from the operational data store of the organization. This export can be realized for example using an SQL query and should be "repeatable" for later use. The best way is the definition of a view.

2. Define the data mining task for the selection of GUHA association rules in the data mining system LISp-Miner. Execute the task and check that the results have the corresponding form. There are no limits on the definition of the data mining task except for the "final attribute", which should be returned as the result. This attribute should contain values from the original data matrix (without grouping of values in the preprocessing phase or dynamic binning).

3. Export the definition of the data mining task in PMML.

4. Define the wrapper business rule including the definition of the data mining task, the mapping of the terms dictionary at least for the "final attribute", the database connection string and the limits for the count of requested results. For some use cases, it is possible to map not only the final attribute, but also other attributes with a fixed value for the definition of a condition.

5. Define the period or condition for activation of the defined wrapper business rule. In an implementation using JBoss Drools, both these options are possible.

The wrapper business rule initializes the execution of the data mining task. It is possible to run the LISp-Miner system not only from the graphical user interface, but also from the command line. After receiving the results from the data mining system, the wrapper rule checks the count of discovered association rules. If the count is within the requested interval, the wrapper business rule extracts the values of the final attribute in the discovered rules and adds them as new facts to the evaluation base for processing by other business rules.

If the count of discovered data mining results is inappropriate, the wrapper business rule can reinitialize the data mining task with modified thresholds of interest measures. To find association rules, the user usually defines thresholds of two interest measures (usually confidence and support, in some cases lift and support). (In the GUHA method, confidence and support are included in the 4ft-quantifier "founded implication"; lift is compatible with the "above average dependence" quantifier, AAD.) If the system finds too many rules, it is possible to increase the minimal requested thresholds of the interest measures and execute the data mining task again.

This method of integration is suitable for interaction between business rules saved in the knowledge base and data mining systems for the detection of exceptions in the operational data. The exception can be either negative or positive, for example the detection of an increase in staff performance. The advantage of applying data mining methods is better performance compared with evaluating the data matrix using a set of specific business rules. However, a disadvantage also has to be considered. The separation of the statistical evaluation of the operational data matrix from the knowledge base, with execution in an external system, can be seen either as an advantage or as a disadvantage. It depends on the specialization of the domain expert. From the point of view of marketing or business specialists, it will probably be evaluated as an advantage, because it really simplifies the knowledge base.

====2.4 Terms Dictionary for Definition of Business Rules====

The definition of business rules requires a terms dictionary. This terms dictionary should contain the declaration of basic entities used in the organization. These terms are composed into facts, and facts are composed into business rules. For extending a business rule set using data mining results, a "good" terms dictionary can be the schema of the main operational database used in the organization.

The mapping techniques are not the subject of this paper. For the integration of data mining results into an existing business rule set, the best way is the definition of a 1:1 mapping not only at the level of data attributes, but also at the level of their values.

When business rules are used, the mapping can be realized by specific mapping rules. In JBoss Drools, it is possible to define rules with conditional validity. If the "mapping rule" detects a source fact in the evaluation base, it adds one or more other facts (instances of Java objects) representing the mapped fact. The added fact is present and valid only while the mapping rule is active (its condition is evaluated as true).

===3 Demo use case===

To better illustrate the appropriateness of the ways of integrating data mining results into a business rule set, it is suitable to explain them on a demo use case. In this paper, the author presents them on use cases defined on data from a health insurance company.

Every insurance company needs to collect as much data from real life as possible and reuse it for risk analysis and for the detection of fraud techniques. In the domain of health insurance, medical facilities send lists of performed procedures and request financial compensation for them. Every request consists of the identification of the medical facility, the concrete medical worker, the identification of the patient and details about the diagnosis and the performed procedures. After composition into one "data row", or one data matrix, there are tens of data attributes.

The health insurance company has contracts with individual medical facilities, but this does not mean that every facility requests compensation only for procedures that were really performed. The reason may of course be a mistake, but also an attempt to fraudulently acquire money. The insurance company should have a list of rules (optionally a knowledge base in the form of business rules) for the detection of patently false requests. An example is a family doctor requesting payment for a surgical operation. However, it is necessary to detect not only obvious errors in requests. The insurance company also wants to detect unusual growth in performed procedures, which could potentially be evaluated as untruthful.

====3.1 Direct acquisition of business rules====

Most business rules for evaluating the correctness of requests from medical facilities are entered manually by domain experts. For finding non-obvious relations in data, it is suitable to use data mining methods. The user can select some discovered association rules, convert them into business rules and use them for subsequent manual editing of the knowledge base.

To use data mining techniques, it is necessary to have access to the archive of operational data received in the past. For medical procedures it is also necessary to respect the specifics of different seasons and the impact of weather. For example, there are differences in the frequency and types of illnesses and injuries between summer and winter.

On the basis of this data mining analysis, it is also possible to detect potentially interesting areas for the application of models for automatic learning of business rules.

====3.2 Classification model learning====

A suitable analytical question for processing the incoming data is the detection of facilities that probably request too many procedures or unusual combinations of them. A concrete example could be redundant laboratory analysis of blood or automatically requesting an X-ray (RTG) for all patients of a surgery. These unnecessary procedures bring no benefit either to the insurance company or to the patients.

To solve this task, it is possible to use historical data about the checks previously made in medical facilities in combination with the results of these checks. Based on these data, it is possible to prepare a classification model for recommending suitable facilities for a future check.

The classification model can be included in the knowledge base as native business rules, or better in the form of a black box component. The advantage of a separated black box component is the simpler replacement of the full classification model with a newer version.

====3.3 Periodically solved data mining task====

Another interesting task suitable for data mining methods is the detection of an unusual increase or decrease of performed medical procedures in a concrete medical facility compared to other facilities of the same type. This task cannot be resolved in the "flow check" system, but it is possible to solve it using the archive of the incoming data.

It is a suitable use case for the periodical solving of a predefined data mining task. The domain expert defines a data mining task for finding GUHA association rules, for example in the form:

diagnosis(A) & facility(*) ≈ procedure(B) / clinicType(A)

where clinicType(A) is the condition of the discovered rules, the task is defined using the AAD quantifier (the interest measures are lift and support) and the expert wants to process the values of the attribute facility as results. The expert defines the interval of minimal thresholds of the interest measures and the maximal count of requested rules. The data mining is then executed periodically once per month and the business rules system initializes the request for a check in the indicated medical facilities.

===4 Conclusion and future work===

In this paper, the author presented three suitable ways of integrating data mining results (mainly GUHA association rules) into a knowledge base in the form of business rules suitable for automatic execution. These models are applicable not only in conjunction with the JBoss Drools system; they are generally applicable to all "execution oriented" business rules systems, for example Jess, Jena or ERIAN.

In further work, it is necessary to promote methods of automatic integration of data mining results into business rule sets. Another task is the finalization of a model of a knowledge base for combining data mining tasks with definitions of business rules. The demo implementation of the knowledge base, whose concept was presented in [9], should be extended into a public methodology.

===Acknowledgment===

This paper was processed with the contribution of long term institutional support of research activities and by IGA project 20/2013 by the Faculty of Informatics and Statistics, University of Economics, Prague.

===References===

[1] OpenRules, Inc.: "Rule Learner," Open Rules [online] http://openrules.com/rulelearner.htm [cit. 2015-01-28]

[2] Kliegr, T., Kuchař, J., Sottara, D., Vojíř, S.: Learning business rules with association rule classifiers. Rules on the Web. From Theory to Applications, Springer, 2014, 236-250

[3] Vysoká škola ekonomická v Praze and KOMIX s.r.o.: TA04011691 - Automatizovaná extrakce byznys pravidel se zpětnou vazbou (2014-2016, TA0/TA), 2013

[4] Red Hat, Inc.: "Drools," Drools - Business Rules Management System (Java™, Open Source) [online] http://www.drools.org/ [cit. 2015-04-21]

[5] Rauch, J., Šimůnek, M.: Dobývání znalostí z databází LISp-Miner a GUHA. Oeconomica Praha, 2015

[6] Thabtah, F. A.: A review of associative classification mining. Knowledge Engineering Review 22(1) (2007), 37-65

[7] Ross, R.G.: Principles of the Business Rule Approach. Addison-Wesley Professional, 2003

[8] Vojíř, S., Kliegr, T., Hazucha, A., Škrabal, R., Šimůnek, M.: Transforming association rules to business rules: EasyMiner meets Drools. RuleML Challenge 2013, CEUR-WS.org, vol. 1004, 2013

[9] Vojíř, S.: Concept of semantic knowledge base for data mining of business rules. Znalosti 2014 Exhibice, Edukace a nacházení Expertů - Exhibition, Education and Expert finding. Praha: KIZI FIS, 2014, 132-136
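As an illustration of the interest measures mentioned in Section 1.3, the following minimal Python sketch computes confidence, support and lift from a four-fold table. It is not part of the original paper. The cell names a, b, c, d follow the usual four-fold table convention, and the two threshold checks are simplified stand-ins for the "founded implication" and "above average dependence" quantifiers under the assumption that they reduce to confidence/support and lift/support thresholds, as the text describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourFoldTable:
    """Row counts for a rule 'antecedent ≈ consequent' (assumed non-empty)."""
    a: int  # antecedent holds, consequent holds
    b: int  # antecedent holds, consequent does not hold
    c: int  # antecedent does not hold, consequent holds
    d: int  # neither holds

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d

    def confidence(self) -> float:
        # Share of rows satisfying the antecedent that also satisfy the consequent.
        return self.a / (self.a + self.b)

    def support(self) -> float:
        # Share of all rows satisfying both antecedent and consequent.
        return self.a / self.n

    def lift(self) -> float:
        # Confidence relative to the overall frequency of the consequent.
        return self.confidence() / ((self.a + self.c) / self.n)


def founded_implication(t: FourFoldTable, min_conf: float, min_supp: float) -> bool:
    """Simplified check: confidence and support thresholds."""
    return t.confidence() >= min_conf and t.support() >= min_supp


def above_average(t: FourFoldTable, min_lift: float, min_supp: float) -> bool:
    """Simplified check: lift and support thresholds."""
    return t.lift() >= min_lift and t.support() >= min_supp
```

For instance, a table with a=60, b=40 over 6000 rows corresponds to the confidence 0.6 and support 0.01 of the example rule in Section 1.3; the values of c and d used in any such test are illustrative only.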
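The transformation steps of Section 2.1 can be sketched as a small Python generator that turns one already reverse-preprocessed association rule into DRL text. This is a hedged illustration, not the author's implementation: the fact classes `Claim` and `Recommendation`, the `Literal` structure and the function names are hypothetical, negation is rendered as inequality as the text recommends, and multi-valued attributes from dynamic binning are rendered with the DRL `in` operator.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Literal:
    """One attribute of a rule after reverse preprocessing (hypothetical structure)."""
    attribute: str
    values: Tuple[str, ...] = ()                   # categorical values
    interval: Optional[Tuple[float, float]] = None  # closed numeric interval
    negated: bool = False


def constraint(lit: Literal) -> str:
    """Render one literal as a DRL pattern constraint."""
    if lit.interval is not None:
        low, high = lit.interval
        if lit.negated:
            return f"({lit.attribute} < {low} || {lit.attribute} > {high})"
        return f"{lit.attribute} >= {low} && {lit.attribute} <= {high}"
    quoted = ", ".join(f'"{v}"' for v in lit.values)
    if len(lit.values) == 1:
        # Negation interpreted as inequality.
        return f"{lit.attribute} {'!=' if lit.negated else '=='} {quoted}"
    return f"{lit.attribute} {'not in' if lit.negated else 'in'} ({quoted})"


def to_drl(name: str, antecedent: Sequence[Literal], consequent: Tuple[str, str],
           confidence: float, support: float, fact_class: str = "Claim") -> str:
    """Build the DRL text of one business rule from one association rule."""
    when = ", ".join(constraint(lit) for lit in antecedent)
    attr, value = consequent
    return (
        f'rule "{name}"\n'
        "when\n"
        f"    $f : {fact_class}({when})\n"
        "then\n"
        # The body inserts the classification result as a new fact.
        f'    insert(new Recommendation("{attr}", "{value}", {confidence}, {support}));\n'
        "end\n"
    )


example = to_drl(
    "guha_1",
    [Literal("age", interval=(20, 40)),
     Literal("city", values=("Prague",)),
     Literal("clinic", values=("A", "B"))],
    ("procedure", "C"), 0.6, 0.01,
)
```

Carrying confidence and support into the inserted fact is a design choice of this sketch; it keeps the values available for a later conflict resolution step.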
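The conflict resolution strategy described in Section 2.1 (prefer higher confidence, then higher support, then a shorter condition) amounts to a lexicographic ordering. The Python sketch below shows only that ordering; the paper states that in Drools the strategy is written as a function in DRL, so the `FiredRule` structure and the function name here are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class FiredRule(NamedTuple):
    """A business rule whose condition matched the current facts (hypothetical)."""
    consequent: str
    confidence: float
    support: float
    condition_length: int  # number of attributes in the condition


def best_consequent(fired: Sequence[FiredRule]) -> Optional[str]:
    """Pick the recommendation of the preferred matching rule, or None."""
    if not fired:
        return None
    winner = min(
        fired,
        # Higher confidence first, then higher support, then shorter condition.
        key=lambda r: (-r.confidence, -r.support, r.condition_length),
    )
    return winner.consequent
```

With two matching rules of equal confidence and support, the one with fewer attributes in its condition wins.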
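The behavior of the wrapper business rule in Section 2.3 (run the task, check the rule count, adjust thresholds and run again) can be sketched as a simple loop. This is an assumption-laden illustration: `run_task` is a hypothetical callable standing in for launching the data mining system, the multiplicative `step` and the attempt limit are invented for the example, and relaxing thresholds when too few rules are found is an extension of the text, which only describes increasing them when too many rules are found.

```python
from typing import Callable, List, Optional, Tuple

# (value of the final attribute, lift, support) of one discovered rule; hypothetical.
MinedRule = Tuple[str, float, float]


def mine_with_bounds(
    run_task: Callable[[float, float], List[MinedRule]],
    min_lift: float,
    min_support: float,
    min_rules: int,
    max_rules: int,
    step: float = 1.25,
    max_attempts: int = 5,
) -> Optional[List[str]]:
    """Return final-attribute values once the rule count fits the interval."""
    for _ in range(max_attempts):
        rules = run_task(min_lift, min_support)
        if len(rules) > max_rules:
            # Too many rules: tighten both thresholds and run again.
            min_lift *= step
            min_support *= step
        elif len(rules) < min_rules:
            # Too few rules: relax both thresholds (assumption of this sketch).
            min_lift /= step
            min_support /= step
        else:
            # These values would be added as new facts to the evaluation base.
            return sorted({value for value, _, _ in rules})
    return None
```

Returning `None` after the attempt limit leaves the decision about what to do next to the calling rule.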