The Associative Rules Constructing on the Example of Patient’s Physical Characteristics

Nataliya Shakhovska, Iryna Zhelizniak
Department of Artificial Intelligence, Lviv Polytechnic National University, UKRAINE, Lviv, 12 S.Bandera str., email: nataliya.b.shakhovska@lpnu.ua
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic

Abstract: A method for constructing associative rules is described. Associative rules for a patient's tests and indicators have been constructed. The set of transactions available for the medical analysis of a patient is considered. It has been found that the correct assessment of the utility of an associative rule affects the volume and speed of access to information. A unique patient identifier for the set of a patient's tests has been introduced. Additional numerical attributes of the investigated objects are specified. Transactions that contain additional attributes, together with comparison operations on those attributes, are considered. The distinction between associative rules and sequential analysis is given.

Keywords: associative rules, data mining, support, patient’s physical characteristics, sequential analysis

I. INTRODUCTION

In medical and biological research, as well as in practical medicine, the range of tasks to be solved is so wide that any of the Data Mining methodologies can be applied. Examples include the construction of a diagnostic system and the study of the effectiveness of surgical intervention.

One of the most advanced areas of medicine is bioinformatics. The object of bioinformatics research is the huge amount of information about DNA sequences and the primary structure of proteins. This information has arisen from studying the genomes of microorganisms, mammals and humans. Abstracted from its specific content, this information can be regarded as a set of genetic texts consisting of long character sequences. Detecting structural regularities in such sequences is a group of tasks that Data Mining solves effectively, for example, by means of sequential and associative analysis [1, 2].

The purpose of the study is to identify the most important principles for constructing associative rules. This means determining the patterns of constructing associative rules and dividing physical indicators into different levels of a hierarchy.

II. OBJECTS AND METHODS OF RESEARCH

One of the most common data analysis tasks is to identify sets of objects that are frequently encountered in a large set of objects. We describe this problem in a generalized form. To do this, we denote the objects that make up the studied sets (itemsets) as follows [2, 3]:

$$I = \{i_1, i_2, \dots, i_j, \dots, i_n\}, \tag{1}$$

where $i_j$ are the objects included in the studied sets and $n$ is the total number of objects.

In the field of medicine, such objects are, for example, the indicators and tests of a patient (Table 1).

TABLE 1. OBJECTS INCLUDED IN THE STUDY SET
Id | Indicator | Value
0 | Blood pressure | 120/80 mm Hg
1 | Venous pressure | 70 mm H2O
2 | Capillary pressure | 70 mm Hg
3 | Pulse | 85 beats/min
4 | Temperature | 36.6 °C
5 | Level of hemoglobin in the blood | 145 g/L
6 | pH | 7.35

In this way, they correspond to the following set of objects:

I = {arterial pressure, venous pressure, capillary pressure, pulse, temperature, hemoglobin level in blood, pH}.

Sets of objects from the set I that are stored in a database and subject to analysis are called transactions. We describe a transaction as a subset of the set I:

$$T = \{ i_j \mid i_j \in I \}. \tag{2}$$

In a hospital, such transactions correspond to the medical examinations a patient undergoes. They are stored in the database in the form of a medical card and list the tests that the patient took for the medical history and diagnosis.

The set of transactions available for analysis is described by the following set:

$$D = \{T_1, T_2, \dots, T_r, \dots, T_m\}, \tag{3}$$

where $m$ is the number of transactions available for analysis.

III. RESEARCH RESULTS

To apply Data Mining methods, the set D can be represented as a table (Table 2).

TABLE 2. A SET OF INVESTIGATED OBJECTS
Transaction number | Indicator number | Indicator | Value
0 | 0 | Blood pressure | 110/75 mm Hg
0 | 3 | Pulse | 110 beats/min
0 | 1 | Venous pressure | 58 mm H2O
1 | 4 | Temperature | 37.4 °C
1 | 6 | pH | 7.46
2 | 1 | Venous pressure | 72 mm H2O
2 | 6 | pH | 7.81
2 | 4 | Temperature | 37.2 °C

The set of transactions that include the object $i_j$ is denoted as follows [3]:

$$D_{i_j} = \{ T_r \mid i_j \in T_r;\ j = 1..n;\ r = 1..m \} \subseteq D. \tag{4}$$

In this example, the set of transactions containing the object Temperature is the following:

D_Temperature = {{Temperature, pH}, {Venous pressure, pH, Temperature}}.

An arbitrary set of objects (itemset) is denoted as follows:

$$F = \{ i_j \mid i_j \in I;\ j = 1..n \}. \tag{5}$$

The set of transactions that include the set F is denoted as follows:

$$D_F = \{ T_r \mid F \subseteq T_r;\ r = 1..m \} \subseteq D. \tag{6}$$

The support of the set F, denoted $\mathrm{Supp}(F)$, is the ratio of the number of transactions that include F to the total number of transactions:

$$\mathrm{Supp}(F) = \frac{|D_F|}{|D|}. \tag{7}$$

For example, the support of the set {pH, Temperature} is 2/3. This set is included in two of the three transactions (numbers 1 and 2).

When searching, an analyst can specify the minimum support of interesting sets, $\mathrm{Supp}_{min}$. A set is called large (frequent) if its support is not less than the minimum support specified by the user:

$$\mathrm{Supp}(F) \ge \mathrm{Supp}_{min}. \tag{8}$$

So, when searching for associative rules, one needs to find the set of all frequent sets:

$$L = \{ F \mid \mathrm{Supp}(F) \ge \mathrm{Supp}_{min} \}. \tag{9}$$

In this case, with $\mathrm{Supp}_{min} = 2/3$, the frequent sets are the following:
{Venous pressure}, Supp = 2/3;
{Temperature}, Supp = 2/3;
{pH}, Supp = 2/3;
{pH, Temperature}, Supp = 2/3.

In an analysis, the sequence of events is often of interest. Detecting regularities in such sequences makes it possible to predict, with some degree of certainty, the occurrence of future events, which allows more correct decisions to be made. A sequence is an ordered set of objects; to obtain one, an order must be defined on the set [4]. Then the sequence of objects can be described as follows:

$$S = \{ \dots, i_p, \dots, i_q, \dots \}, \quad \text{where } p < q. \tag{10}$$

For example, in the case of tests, the order may be given by the date on which each test was taken. The sequence

S = {(hemoglobin level, 10.10.2017), (venous pressure, 25.09.2017), (pH, 28.09.2017)}

can be interpreted as a sequence of tests taken by one person at different times. Venous pressure was measured first, then the pH level, and finally the hemoglobin level.

There are two types of sequences: with cycles and without cycles. In the first case, the same object may appear in the sequence at different positions:

$$S = \{ \dots, i_p, \dots, i_q, \dots \}, \quad \text{where } p < q,\ i_q = i_p. \tag{11}$$

A transaction T is said to contain the sequence S if $S \subseteq T$, that is, the objects included in S also belong to T, with their order preserved. Other objects of T may occur between the objects of the sequence S.

The support of the sequence S is the ratio of the number of transactions that contain S to the total number of transactions. A sequence is frequent if its support is not less than the minimum support specified by the user:

$$\mathrm{Supp}(S) \ge \mathrm{Supp}_{min}. \tag{12}$$

The task of sequential analysis is to find all frequent sequences:

$$L = \{ S \mid \mathrm{Supp}(S) \ge \mathrm{Supp}_{min} \}. \tag{13}$$

Sequential analysis differs from the search for associative rules mainly in that it establishes an order relation between the objects of the set I. This relation can be defined in different ways. When analyzing a sequence of events occurring in time, the objects of the set I are events, and the order relation corresponds to the chronology of their appearance.

For example, when analyzing sequences of tests in a hospital, the transactions are the sets of tests that a patient takes at different times. The order relation is the time at which these tests were performed:

D = {{(temperature, blood pressure, capillary pressure), (pH, temperature, pulse)}, {(hemoglobin level in blood, temperature), (blood pressure, temperature), (temperature, venous pressure)}, {(hemoglobin level in the blood)}}.

Of course, this raises the problem of identifying patients. In practice, it is solved by introducing medical cards that have a unique identifier (Table 3).

TABLE 3. A UNIQUE IDENTIFIER FOR THE SET OF TESTS
Patient ID | Sequence of tests taken
0 | (temperature, arterial pressure, capillary pressure), (pH, temperature, pulse)
1 | (hemoglobin level in the blood, temperature), (blood pressure, temperature), (temperature, venous pressure)
2 | (hemoglobin level in the blood)

The sequences in Table 3 can be interpreted as follows. The patient with ID 0 first had temperature, arterial pressure and capillary pressure measured. During the next visit, the same patient had pH, temperature and pulse rate measured. The support of the sequence {(blood pressure, temperature)} is 2/3, since it occurs for the patients with identifiers 0 and 1.

In many applications, the objects of the set I naturally combine into groups, which in turn can be grouped into more general groups, and so on. Thus, a hierarchical structure of objects is obtained. An example of such a hierarchy is the following categorization of tests:

Pressure:
· Arterial;
· Venous;
· Capillary.
Physical indicators:
· Temperature.
Blood test:
· Hemoglobin level;
· pH.

The presence of a hierarchy changes what it means for an object i to be present in a transaction T. Obviously, the support of a group is not less than the support of any individual object it includes:

$$\mathrm{Supp}(I_q) \ge \mathrm{Supp}(i_j), \tag{14}$$

where $i_j \in I_q$. When groups are analyzed, a transaction is counted if it includes any object of the analyzed group, not only one particular object. For example, Supp{blood pressure, temperature} = 2/3, and Supp{pressure, physical indicators} is also 2/3. Objects of both groups occur together only in the transactions with identifiers 0 and 1.

Using the hierarchy makes it possible to discover connections at higher levels of the hierarchy. The support of a set can increase when the occurrence of a group, rather than of its individual object, is counted. One can search for frequent sets that consist of objects, $F = \{ i \mid i \in I \}$, or of groups of the same level of the hierarchy:

$$F = \{ I^{g} \mid I^{g} \in I^{g+1} \}. \tag{15}$$

One can also consider mixed sets of objects and groups:

$$F = \{ i, I^{g} \mid i \in I^{g} \in I^{g+1} \}. \tag{16}$$

This allows the analysis to be extended and additional knowledge to be gained.

In a hierarchical structure of objects, the nature of the search can be changed by changing the analyzed level. Obviously, the more objects the set I contains, the more objects there are in the transactions T and in the frequent sets. This in turn increases the search time and complicates the analysis of the results.

The amount of data can be reduced or increased by using the hierarchical representation of the analyzed objects. Moving up the hierarchy summarizes the data and reduces its volume, and moving down does the opposite. The disadvantage of generalizing objects is that the knowledge gained is less useful, since it relates to groups, which do not always carry useful information.

To achieve a compromise between the analysis of groups and the analysis of individual objects, the following is often done. The groups are analyzed first. Then, depending on the results, the analyst investigates the objects of the groups that are of interest. In any case, a hierarchy of objects used in the search for associative rules allows a more flexible analysis and yields additional knowledge.

In the problem considered so far, the presence of an object in a transaction was determined only by its presence ($i_j \in T$) or absence ($i_j \notin T$). Often, objects have additional attributes, usually numeric. For example, tests in a transaction have the attributes value and duration. In this case, an object's presence in a set can depend not only on whether it occurs but also on whether a condition on a certain attribute holds. For example, when analyzing patients' transactions, one is interested not only in the value of a test but also in how stable (long-term) this indicator is.

Additional objects can be added to the studied sets in order to extend the capabilities of the associative-rule search. In general, they may differ in nature from the main objects. For example, in the case of tests, one can introduce a field for the frequency of tests or for the symptoms that preceded these particular tests.

Solving the problem of finding associative rules, like any task, consists of processing the initial data and obtaining the results. The initial data are processed by a Data Mining algorithm, and the results are presented in the form of associative rules. Accordingly, the search for them has two main stages:
1. Finding all large sets of objects;
2. Generating associative rules from the found large sets of objects.

Associative rules have the following form:

If (condition) then (result),

where the condition is usually not a logical expression (as in classification rules). Instead, it is a set of objects from the set I, with which the objects in the result of the rule are associated. For example, the associative rule

If (blood pressure, pH) then (hemoglobin level)

means that if a patient's arterial pressure and pH level are measured, the patient's hemoglobin level is also measured.

As already noted, in associative rules the condition and the result are sets of objects from I:

If X then Y, where $X \subseteq I$, $Y \subseteq I$, $X \cap Y = \varnothing$.

The main advantage of associative rules is that people perceive them easily and they are easy to express in programming languages. However, they are not always useful. There are three types of rules:
1. Useful rules contain valid information that was previously unknown but has a logical explanation. Such rules can be used to make beneficial decisions;
2. Trivial rules contain valid and easily understandable information that is already known. Although such rules can be explained, they bring no benefit, since they reflect either known laws of the studied area or the results of past activity. Sometimes such rules can be used to verify the implementation of decisions taken on the basis of a preliminary analysis;
3. Unclear rules contain information that cannot be explained. Such rules can arise either from abnormal values or from deeply hidden knowledge. They cannot be used directly for decision making, since their lack of clarity can lead to unpredictable results; further analysis is required to understand them.

Associative rules are built on the basis of large sets: the rules built on a set F are all possible combinations of the objects included in it. For example, for the set {arterial pressure, temperature, pulse} the following associative rules can be constructed:
If (arterial pressure) then (temperature);
If (arterial pressure) then (pulse);
If (arterial pressure) then (temperature, pulse);
If (temperature, pulse) then (arterial pressure);
and so on.

Thus, the number of associative rules can be very large and hard for a person to perceive. In addition, not all of the built rules carry useful information. To assess their usefulness, the following measures are introduced:
• Support shows the percentage of transactions that support the rule (in this study, rules with support above 75% were found);
• Confidence shows the probability that a transaction containing the set X also contains the set Y, i.e. $\mathrm{Conf}(X \Rightarrow Y) = \mathrm{Supp}(X \cup Y)/\mathrm{Supp}(X)$ (rules with confidence above 0.5 were found);
• Improvement indicates whether the rule is useful for the research.

These estimates are used when generating rules. When searching for associative rules, the analyst specifies the minimum values of these measures. Rules that do not satisfy these conditions are discarded and are not included in the solution of the problem.

If objects have additional attributes that affect the composition of objects in transactions, and therefore in sets, these attributes should be taken into account in the generated rules. The conditional part of a rule then includes more than a check of whether an object exists in a transaction. It also includes more complex comparison operations: greater than, less than, includes, etc. The resulting part of the rules may also contain statements about attribute values. For example, if the relevance (topicality) of an indicator is considered, the rules may look like this:

If pH.relevance > 10 days then hemoglobin level in the blood.relevance < 3 days.

This rule states the following: if the patient took the pH test more than 10 days ago, then the patient's blood hemoglobin test is probably no more than 3 days old.

The main differences between static and dynamic XML documents are:
• Availability of a validity period. A static XML document does not contain elements that indicate the expiration date of the document. In contrast, a dynamic XML document initially contains at least one element that indicates the validity period of a particular version of the document.
• Persistence of the displayed information. Once created, the information of a static XML document remains valid at all times. Conversely, a version of a dynamic XML document is valid only for the period specified in the corresponding elements. As soon as a new version appears, the information contained in the previous version is replaced.

Most work on finding associative rules in static XML documents uses XML-oriented algorithms based on the Apriori algorithm. However, there are a number of other approaches.

TABLE 4. REPRESENTATION OF STATIC XML DOCUMENT
Tom Johnson
Bandery
Lviv
15/01/2017
38.2
72
110
7.46
145

IV. CONCLUSION

The task of finding associative rules is to identify sets of objects that are frequently encountered in a large number of objects. The task of sequential analysis is to search for frequent sequences. The two tasks differ mainly in that sequential analysis establishes an order relation between objects. A hierarchy of objects used in the search for associative rules allows a more flexible analysis and yields additional knowledge. The results are presented in the form of associative rules whose conditional and resulting parts contain sets of objects.

REFERENCES
[1] Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7), 107-117.
[2] Negnevitsky, M. (2005). Artificial intelligence: a guide to intelligent systems. Pearson Education.
[3] Jain, V., Benyoucef, L., & Deshmukh, S. G. (2008). A new approach for evaluating agility in supply chains using fuzzy association rules mining. Engineering Applications of Artificial Intelligence, 21(3), 367-385.
[4] Shakhovska, N., Kaminskyy, R., Zasoba, E., & Tsiutsiura, M. (2018). Association rules mining in big data. International Journal of Computing, 17(1), 25-32.
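The support measure of Eq. (7) and the search for all frequent sets of Eqs. (8)-(9) can be illustrated on the three transactions of Table 2. The sketch below is a minimal level-wise (Apriori-style) search written for this illustration, not the authors' implementation. The names `TRANSACTIONS`, `support` and `frequent_itemsets` are hypothetical. The sketch assumes an inclusive threshold, Supp(F) ≥ Supp_min, so that sets with support exactly 2/3 qualify.

```python
from fractions import Fraction
from itertools import combinations

# Transactions of Table 2 (transaction number -> indicators it contains).
TRANSACTIONS = [
    {"Blood pressure", "Pulse", "Venous pressure"},  # transaction 0
    {"Temperature", "pH"},                           # transaction 1
    {"Venous pressure", "pH", "Temperature"},        # transaction 2
]


def support(itemset, transactions):
    """Supp(F) = |D_F| / |D|: share of transactions that contain every object of F."""
    covering = [t for t in transactions if set(itemset) <= t]
    return Fraction(len(covering), len(transactions))


def frequent_itemsets(transactions, supp_min):
    """Return {F: Supp(F)} for every F with Supp(F) >= supp_min (level-wise search)."""
    items = sorted(set().union(*transactions))
    level = [frozenset([item]) for item in items]  # candidate 1-sets
    result = {}
    while level:
        # Keep the candidates of the current size that reach the minimum support.
        kept = [f for f in level if support(f, transactions) >= supp_min]
        for f in kept:
            result[f] = support(f, transactions)
        # Join pairs of frequent k-sets into (k+1)-candidates; a candidate survives
        # only if all of its k-subsets are frequent (Apriori pruning).
        kept_set = set(kept)
        candidates = set()
        for a, b in combinations(kept, 2):
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in kept_set for sub in combinations(union, len(a))
            ):
                candidates.add(union)
        level = sorted(candidates, key=sorted)
    return result


if __name__ == "__main__":
    for itemset, supp in frequent_itemsets(TRANSACTIONS, Fraction(2, 3)).items():
        print(sorted(itemset), supp)
```

With Supp_min = 2/3, the search returns {Temperature}, {Venous pressure}, {pH} and {Temperature, pH}, each with support 2/3. {pH} qualifies too because it occurs in transactions 1 and 2. Exact fractions avoid floating-point comparisons at the threshold.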
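The second stage, generating rules "If X then Y" from a large set F and filtering them by confidence, can be sketched on the Table 2 transactions. This is an illustrative sketch, not the authors' code. It uses the standard definition Conf(X ⇒ Y) = Supp(X ∪ Y) / Supp(X), and the names `rules_from_large_set` and `conf_min` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Transactions of Table 2 (transaction number -> indicators it contains).
TRANSACTIONS = [
    {"Blood pressure", "Pulse", "Venous pressure"},
    {"Temperature", "pH"},
    {"Venous pressure", "pH", "Temperature"},
]


def support(itemset, transactions):
    """Share of transactions that contain every object of the itemset."""
    return Fraction(sum(set(itemset) <= t for t in transactions), len(transactions))


def rules_from_large_set(large_set, transactions, conf_min):
    """Generate rules 'If X then Y' with X ∪ Y = F, X ∩ Y = ∅ and X, Y non-empty,
    keeping those whose confidence Supp(F) / Supp(X) is at least conf_min."""
    large_set = frozenset(large_set)
    supp_f = support(large_set, transactions)
    if supp_f == 0:
        return []  # F never occurs, so no rule built from it is supported
    rules = []
    for size in range(1, len(large_set)):
        for condition in combinations(sorted(large_set), size):
            x = frozenset(condition)
            y = large_set - x
            conf = supp_f / support(x, transactions)  # Supp(X) >= Supp(F) > 0
            if conf >= conf_min:
                rules.append((x, y, supp_f, conf))
    return rules


if __name__ == "__main__":
    for x, y, supp, conf in rules_from_large_set({"pH", "Temperature"}, TRANSACTIONS, Fraction(1, 2)):
        print(f"If {sorted(x)} then {sorted(y)}: support {supp}, confidence {conf}")
```

For {pH, Temperature}, both rules have support 2/3 and confidence 1 (pH ⇒ Temperature and Temperature ⇒ pH), because pH and temperature always occur together in Table 2. A three-object set such as {arterial pressure, temperature, pulse} yields 2^3 - 2 = 6 candidate rules before filtering, which illustrates how quickly the number of rules grows.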
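Sequence containment and support (Eqs. (10)-(13)) can be sketched on the patient histories of Table 3. Each patient's ordered visits form one data sequence, and each visit is a set of tests. This is a minimal illustration under two assumptions. First, "blood pressure" and "arterial pressure" are treated as the same indicator, as Table 1 and the set I suggest. Second, the names `PATIENTS`, `contains` and `sequence_support` are hypothetical.

```python
from fractions import Fraction

# Table 3: patient ID -> ordered visits, each visit being the set of tests taken.
PATIENTS = {
    0: [{"temperature", "arterial pressure", "capillary pressure"},
        {"pH", "temperature", "pulse"}],
    1: [{"hemoglobin", "temperature"},
        {"arterial pressure", "temperature"},
        {"temperature", "venous pressure"}],
    2: [{"hemoglobin"}],
}


def contains(history, pattern):
    """True if the visits of `history` contain the elements of `pattern` in order.

    Each pattern element must be a subset of a distinct visit, later elements
    matching later visits; other visits may lie in between. Matching every
    element to the earliest possible visit (greedy) is sufficient.
    """
    pos = 0
    for element in pattern:
        while pos < len(history) and not set(element) <= history[pos]:
            pos += 1
        if pos == len(history):
            return False
        pos += 1  # the next element must come from a strictly later visit
    return True


def sequence_support(pattern, database):
    """Supp(S): share of data sequences (patients) that contain the pattern S."""
    hits = sum(contains(history, pattern) for history in database.values())
    return Fraction(hits, len(database))


if __name__ == "__main__":
    print(sequence_support([{"arterial pressure", "temperature"}], PATIENTS))
```

The call above gives 2/3 for {(blood pressure, temperature)}, matching the value in the text. Order matters. The pattern [{"hemoglobin"}, {"venous pressure"}] has support 1/3 (patient 1), whereas the reversed pattern has support 0. A pattern with a repeated object, such as [{"temperature"}, {"temperature"}], is a sequence with a cycle in the sense of Eq. (11) and has support 2/3.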
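The effect of the hierarchy on support (Eq. (14)) can be sketched in two steps. First, each transaction is extended with the names of the groups that have at least one member in it. Second, supports are counted as before, which also covers mixed sets of objects and groups (Eq. (16)). The sketch rests on several assumptions:
- Transactions are patient-level, formed by uniting each patient's visits from Table 3.
- The grouping is Pressure / Physical indicators / Blood test, as given in the text, where pulse belongs to no group.
- Blood pressure is treated as arterial pressure.
- `HIERARCHY`, `with_groups` and `group_support` are hypothetical names.

```python
from fractions import Fraction

# Hierarchy from the text: group -> objects of the lower level.
HIERARCHY = {
    "Pressure": {"arterial pressure", "venous pressure", "capillary pressure"},
    "Physical indicators": {"temperature"},
    "Blood test": {"hemoglobin", "pH"},
}

# Assumption: one transaction per patient, the union of the visits in Table 3.
PATIENT_TRANSACTIONS = [
    {"temperature", "arterial pressure", "capillary pressure", "pH", "pulse"},  # ID 0
    {"hemoglobin", "temperature", "arterial pressure", "venous pressure"},      # ID 1
    {"hemoglobin"},                                                             # ID 2
]


def with_groups(transaction):
    """Add every group that has at least one of its objects in the transaction."""
    groups = {name for name, members in HIERARCHY.items() if members & transaction}
    return transaction | groups


def group_support(itemset, transactions):
    """Support of a set of objects, groups, or a mix of both (Eqs. (15)-(16))."""
    extended = [with_groups(t) for t in transactions]
    return Fraction(sum(set(itemset) <= t for t in extended), len(transactions))


if __name__ == "__main__":
    # Eq. (14): a group is never less supported than any of its members.
    for name, members in HIERARCHY.items():
        for member in sorted(members):
            assert group_support({name}, PATIENT_TRANSACTIONS) >= group_support({member}, PATIENT_TRANSACTIONS)
    print(group_support({"Pressure", "Physical indicators"}, PATIENT_TRANSACTIONS))
```

Under these assumptions, {venous pressure} has support 1/3 while its group Pressure has support 2/3. This shows how generalization raises support at the cost of less specific knowledge, the trade-off discussed in the text.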