
Constructing on the Example of Patient's Physical Characteristics

Nataliya Shakhovska

nataliya.b.shakhovska@lpnu.ua

Iryna Zhelizniak

Department of Artificial Intelligence, Lviv Polytechnic National University, Ukraine, Lviv, 12 S. Bandera str.

2018


The method of constructing associative rules is described. Associative rules for the analyses of a patient have been constructed. The set of transactions available for the medical analysis of a patient is considered. It has been found that the correct assessment of the utility of an associative rule affects the volume of information and the speed of access to it. A unique identifier for the set of a patient's analyses has been introduced. Additional numerical attributes of the investigated objects are indicated.

Keywords: characteristics, associative rules, data

I. INTRODUCTION

In medical and biological research, as well as in practical medicine, the range of tasks to be solved is so wide that it is possible to use any of the methodologies of Data Mining. An example can be the construction of a diagnostic system or the study of the effectiveness of surgical intervention.

One of the most advanced areas of medicine is bioinformatics. The object of bioinformatics research is the huge amount of information about DNA sequences and the primary structure of proteins that has arisen from studying the structure of the genomes of microorganisms, mammals and humans. Without going into the specific content of this information, it can be regarded as a set of genetic texts consisting of extended character sequences. Detecting structural patterns in such sequences is one of the tasks that are effectively solved by means of Data Mining, for example by sequential and associative analysis [1, 2].

The purpose of the study is to identify the most important principles of constructing associative rules, and to determine the patterns of constructing associative rules and of dividing physical indicators between different levels of the hierarchy.

II. OBJECTS AND METHODS OF RESEARCH

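One of the most common data analysis tasks is to identify sets of objects that frequently occur together in a large collection of such sets. We describe this problem in a generalized form. To do this, we denote the objects that make up the studied sets (itemsets) as follows [2, 3]:

$I = \{i_1, i_2, \ldots, i_j, \ldots, i_n\}$,

where $i_j$ are the objects that make up the analyzed sets and $n$ is the total number of objects.

In the field of medicine, such objects are, for example, the indicators and analyses of the patient (Table 1).

Table 1
Object                Value
Arterial pressure     120/80 mm Hg
Venous pressure       70 mm H2O
Capillary pressure    70 mm Hg
Pulse                 85 beats/min
Temperature           36.6 °C

In this way they correspond to the following set of objects: I = {arterial pressure, venous pressure, capillary pressure, pulse, temperature, hemoglobin level in blood, pH}.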

Sets of objects from the set I that are stored in a database and are subject to analysis are called transactions. We describe a transaction as a subset of the set I:

$T = \{i_j \mid i_j \in I\}$.

In a hospital, such transactions correspond to the medical examinations taken by a patient and are stored in the database in the form of a medical card. They list the tests that the patient took for the medical history and the diagnosis.

The set of transactions available for analysis is described by the following set:

$D = \{T_1, T_2, \ldots, T_r, \ldots, T_m\}$,

where $m$ is the number of transactions available for analysis.

III. RESEARCH RESULTS

To use Data Mining methods, the set D can be represented as a table (Table 2). The set of transactions that include the object $i_j$ is denoted as follows [3]:

$D_{i_j} = \{T_r \mid i_j \in T_r;\ j = 1..n;\ r = 1..m\} \subseteq D$.

For example, for the object Temperature (Table 1), this is the set of all transactions that contain a temperature measurement.
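The definitions of the set I, the transactions and the set of transactions that include a given object can be illustrated by a minimal sketch. The objects are those listed in the text; the four transactions and the helper name `transactions_with` are hypothetical and serve only as an illustration.

```python
# Objects of the set I (the list given in the text).
I = {"arterial pressure", "venous pressure", "capillary pressure", "pulse",
     "temperature", "hemoglobin level", "pH"}

# Hypothetical transactions: each one is a subset of I (one entry of a medical card).
D = [
    {"temperature", "arterial pressure", "pulse"},
    {"hemoglobin level", "pH"},
    {"temperature", "venous pressure"},
    {"pulse", "capillary pressure", "temperature"},
]

def transactions_with(obj, transactions):
    """Return the transactions that include the object obj (the set D_{i_j})."""
    return [t for t in transactions if obj in t]

# Every transaction must be a subset of I.
assert all(t <= I for t in D)

d_temperature = transactions_with("temperature", D)
print(len(d_temperature), "of", len(D), "transactions contain temperature")
```

Representing transactions as Python sets keeps the membership test and the subset test close to the notation used in the formulas.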

If an order relation is defined on the objects, a sequence of objects can be described as follows:

$S = \{\ldots, i_p, \ldots, i_q\}$, where $p < q$. (10)

For example, in the case of analyses, the order may be given by the date on which the analyses were taken. The sequence S = {(hemoglobin level, 10.10.2017), (venous pressure, 25.09.2017), (pH, 28.09.2017)} can be interpreted as a sequence of tests taken by one person at different times (first the venous pressure was measured, then the pH level, and finally the level of hemoglobin).

There are two types of sequences: with cycles and without cycles. In the first case, the same object is allowed to enter the sequence at different positions:

$S = \{\ldots, i_p, \ldots, i_q, \ldots\}$, where $p < q$, $i_p = i_q$. (11)

A transaction T is said to contain the sequence S if S ⊆ T and the objects included in S also belong to T with the order relation preserved. Other objects may occur in T between the objects of the sequence S.

The support of the sequence S is the ratio of the number of transactions that contain the sequence S to the total number of transactions. A sequence is frequent if its support exceeds the minimum support given by the user:

$\mathrm{Supp}(S) > \mathrm{Supp}_{min}$.

The task of sequential analysis is to find all frequent sequences:

$L = \{S \mid \mathrm{Supp}(S) > \mathrm{Supp}_{min}\}$.

The main difference between sequential analysis and the search for associative rules is that an order relation is established between the objects of the set I. This relation can be defined in different ways. In the analysis of sequences of events occurring in time, the objects of the set I are events, and the order relation corresponds to the chronology of their appearance. For example, when sequences of analyses in a hospital are studied, the sets are the analyses that the patient takes at different times, and the order relation is the time at which these analyses are performed.
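The containment test, the support of a sequence and the selection of frequent sequences can be sketched as follows. This is a minimal illustration under stated assumptions: each transaction is a time-ordered list of single objects, and the transactions, the candidate sequences and the threshold `supp_min` are hypothetical. A real algorithm would also generate the candidates itself.

```python
def contains_sequence(transaction, sequence):
    """True if the objects of `sequence` occur in `transaction` in the same
    order; other objects may appear between them."""
    it = iter(transaction)
    # `obj in it` advances the iterator, so each object is searched for
    # only after the position where the previous one was found.
    return all(obj in it for obj in sequence)

def support(sequence, transactions):
    """Share of transactions that contain the sequence."""
    hits = sum(contains_sequence(t, sequence) for t in transactions)
    return hits / len(transactions)

# Hypothetical time-ordered transactions (tests taken by four patients).
D_seq = [
    ["venous pressure", "pH", "hemoglobin level"],
    ["temperature", "venous pressure", "pulse", "hemoglobin level"],
    ["hemoglobin level", "venous pressure"],
    ["pH", "temperature"],
]

# Hypothetical candidate sequences.
candidates = [
    ("venous pressure", "hemoglobin level"),
    ("hemoglobin level", "venous pressure"),
    ("pH", "temperature"),
]

supp_min = 0.4  # minimum support chosen by the user (illustrative value)
frequent = [s for s in candidates if support(s, D_seq) > supp_min]
print(frequent)
```

Only the first candidate is frequent here: it occurs in two of the four transactions, while each of the other two occurs in one.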

For example, consider the following set of transactions, grouped by patient:

D = {{(temperature, blood pressure, capillary pressure), (pH, temperature, pulse)}, {(hemoglobin level in blood, temperature), (blood pressure, temperature), (temperature, venous pressure)}, {(hemoglobin level in the blood)}}.

Of course, there is a problem of identifying patients. In practice, it is solved by introducing medical cards that have a unique identifier (Table 3). The patient with the ID 0 first had the temperature, the arterial pressure and the capillary pressure measured, and then, on a later visit, the pH, the temperature and the pulse rate. For example, the support of the sequence {(blood pressure, temperature)} is 2/3, since it is found for the patients with identifiers 0 and 1.

In many applications, the objects of the set I naturally combine into groups, which in turn can be combined into more general groups, and so on. Thus, a hierarchical structure of objects is obtained. An example of such a hierarchy is the following categorization of analyses:

Pressure:

· Arterial; · Venous; · Capillary

Physical indicators:

· Temperature

Blood test:

· Hemoglobin level; · pH

The presence of a hierarchy changes the notion of when an object i is present in a transaction T. Obviously, the support of a group is not less than the support of a separate object included in it:

$\mathrm{Supp}(I_q) \ge \mathrm{Supp}(i_j)$, where $i_j \in I_q$.

This is because not only the transactions that include the separate object are counted, but also the transactions that contain the other objects of the analyzed group. For example, if Supp{blood pressure, temperature} = 2/3, then Supp{pressure, physical parameters} = 2/3, since objects of the groups Pressure and Physical indicators occur in the transactions with identifiers 0 and 1.

Using the hierarchy makes it possible to find relationships at higher levels of the hierarchy, since the support of a set can increase when the occurrence of a group, rather than of its individual object, is counted. Besides the search for frequent sets that consist of objects, $F = \{i_j \mid i_j \in I\}$, or of groups of the same level of the hierarchy, mixed sets of objects and groups can also be considered. For example, one can first analyze the groups and then, depending on the results, investigate the objects of the group that interests the analyst. In any case, it can be argued that the presence of a hierarchy in objects and its use in the task of finding associative rules allows a more flexible analysis and yields additional knowledge.
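The effect of the hierarchy on support can be checked with a short sketch. The data reproduce the example set D of three patients. The `GROUP` mapping follows the categorization above, with two assumptions: "blood pressure" is treated as arterial pressure, and the pulse, which is not listed in the hierarchy, is left ungrouped. The function names are hypothetical.

```python
# Example set D: one entry per patient (identifiers 0, 1, 2); each entry is a
# time-ordered list of visits, and each visit is a set of objects.
D = [
    [{"temperature", "blood pressure", "capillary pressure"},
     {"pH", "temperature", "pulse"}],
    [{"hemoglobin level", "temperature"},
     {"blood pressure", "temperature"},
     {"temperature", "venous pressure"}],
    [{"hemoglobin level"}],
]

# Assumed object-to-group mapping ("blood pressure" is taken as arterial).
GROUP = {
    "blood pressure": "pressure",
    "venous pressure": "pressure",
    "capillary pressure": "pressure",
    "temperature": "physical indicators",
    "hemoglobin level": "blood test",
    "pH": "blood test",
}

def generalize(visits):
    """Replace every object by its group; ungrouped objects stay unchanged."""
    return [{GROUP.get(obj, obj) for obj in visit} for visit in visits]

def contains(visits, sequence):
    """True if each element of `sequence` is a subset of some visit, with the
    matched visits occurring in the same order as the elements."""
    it = iter(visits)
    return all(any(element <= visit for visit in it) for element in sequence)

def support(sequence, patients):
    return sum(contains(p, sequence) for p in patients) / len(patients)

D_groups = [generalize(p) for p in D]

supp_objects = support([{"blood pressure", "temperature"}], D)
supp_groups = support([{"pressure", "physical indicators"}], D_groups)
print(supp_objects, supp_groups)

# A single object versus its group: the group support is not smaller.
print(support([{"venous pressure"}], D), support([{"pressure"}], D_groups))
```

Both supports in the first comparison are 2/3, as stated in the text, while the second comparison shows the group Pressure (2/3) being supported more often than the single object venous pressure (1/3).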

In the problem of searching for associative rules considered so far, the presence of an object in a transaction was determined only by whether it occurs in the transaction ($i_j \in T$) or not ($i_j \notin T$). Often, objects have additional attributes, usually numeric. For example, analyses in a transaction have the attributes value and duration. In this case, the presence of an object in a set can be determined not only by the fact of its occurrence, but also by the fulfilment of a condition on a certain attribute. For example, when analyzing the transactions of patients, the analyst is interested not only in the value of an analysis, but also in how stable (long-lasting) this indicator is.
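Presence defined by a condition on an attribute can be sketched as follows. The class `Analysis`, its field names, the unit of the duration and the sample values are assumptions made for this illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Analysis:
    name: str
    value: float         # measured value of the indicator
    duration_days: int   # how long the indicator stayed stable (assumed unit)

# Hypothetical transaction of one patient.
transaction = [
    Analysis("temperature", 36.6, 14),
    Analysis("pulse", 85, 2),
]

def present(transaction, name, condition=lambda a: True):
    """The object is present if it occurs in the transaction and its
    attributes satisfy the condition (by default, occurrence is enough)."""
    return any(a.name == name and condition(a) for a in transaction)

def stable(a):
    """Illustrative condition: the indicator was stable for at least a week."""
    return a.duration_days >= 7

print(present(transaction, "pulse"))            # plain occurrence
print(present(transaction, "pulse", stable))    # occurrence plus condition
print(present(transaction, "temperature", stable))
```

With the default condition the function reduces to the plain membership test used before, so the attribute-based definition extends the earlier one rather than replacing it.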

Additional objects can be added to the studied sets in order to extend the capabilities of the analysis by searching for associative rules. In the general case, they may be of a different nature from the main objects. For example, in the case of the delivery of tests, one can add the frequency of delivery or the symptoms that preceded these particular analyses.

Solving the problem of finding associative rules, like any other task, comes down to processing the initial data and obtaining the results. The initial data are processed by a certain Data Mining algorithm.
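As an illustration of such an algorithm, the sketch below performs a level-wise (Apriori-style) search for sets of objects whose support exceeds a threshold. It is a minimal sketch, not the implementation used in the study; the transactions, the threshold and the function name `frequent_itemsets` are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, supp_min):
    """Return {itemset: support} for all itemsets with support > supp_min."""
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({obj for t in transactions for obj in t})
    level = [frozenset([obj]) for obj in items]   # candidates of size 1
    result = {}
    k = 1
    while level:
        # Keep only the candidates that are frequent at the current size.
        kept = {s: v for s in level if (v := supp(s)) > supp_min}
        result.update(kept)
        k += 1
        # Join frequent sets into candidates of size k and prune those that
        # have an infrequent subset of size k - 1.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [c for c in candidates
                 if all(frozenset(sub) in kept for sub in combinations(c, k - 1))]
    return result

# Hypothetical transactions.
D = [
    {"arterial pressure", "temperature", "pulse"},
    {"arterial pressure", "temperature"},
    {"temperature", "pulse"},
    {"arterial pressure", "temperature", "pulse"},
]

large = frequent_itemsets(D, supp_min=0.5)
for itemset, value in sorted(large.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

The pruning step relies on the fact that a set cannot be frequent if any of its subsets is not, which is why the three-object set is never counted here: the pair {arterial pressure, pulse} does not exceed the threshold.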

The results obtained in solving this problem are usually presented in the form of associative rules. Accordingly, the search for them has two main stages:

1. Finding all large sets of objects;
2. Generating associative rules from the large sets of objects found.

Associative rules have the following form: If (condition) then (result), where the condition and the result are sets of objects. Such rules are easily interpreted by programming languages. However, they are not always useful. There are three types of rules:

1. Useful rules - contain valid information that was previously unknown but has a logical explanation. Such rules can be used for making beneficial decisions;
2. Trivial rules - contain valid and easily understandable information that is already known. Although such rules can be explained, they bring no benefit, since they reflect either known laws of the studied area or the results of past activity. Sometimes such rules can be used to verify the implementation of decisions taken on the basis of a preliminary analysis;
3. Unclear rules - contain information that cannot be explained. Such rules can arise either from abnormal values or from deeply hidden knowledge. They cannot be used directly for decision making, since their lack of clarity can lead to unpredictable results. For a better understanding, further analysis is required.

Associative rules are built on the basis of large sets. Thus, the rules built on the basis of a set F are all possible combinations of the objects included in it.

For example, for the set {arterial pressure, temperature, pulse} the following associative rules can be constructed: If (arterial pressure) then (temperature); If (arterial pressure) then (pulse); If (arterial pressure) then (temperature, pulse); If (temperature, pulse) then (arterial pressure); and so on.
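The enumeration of rules from one set can be sketched as follows. The sketch assumes that a rule is any pair of non-empty, non-overlapping sets of objects taken from the given set, which matches the examples listed above; the function name `rules_from` is hypothetical.

```python
from itertools import combinations

def rules_from(itemset):
    """Return all rules (condition, result) with non-empty, disjoint parts
    whose objects are taken from `itemset`."""
    items = sorted(itemset)
    rules = []
    for size in range(2, len(items) + 1):
        for subset in combinations(items, size):
            # Split each subset into a condition of k objects and a result
            # made of the remaining objects.
            for k in range(1, size):
                for condition in combinations(subset, k):
                    result = tuple(o for o in subset if o not in condition)
                    rules.append((condition, result))
    return rules

rules = rules_from({"arterial pressure", "temperature", "pulse"})
for condition, result in rules:
    print(f"If {condition} then {result}")
print(len(rules), "rules in total")
```

Even for three objects this yields 12 rules, and the count grows exponentially with the size of the set, which is why measures of usefulness are needed to filter them.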

Thus, the number of associative rules can be very large and hard for a human to perceive. In addition, not all of the constructed rules carry useful information. To assess their usefulness, the following values are introduced:

• Support - shows what percentage of transactions supports the rule (we selected rules with Support greater than 75%).
• Confidence - shows the probability that the presence of the set X in a transaction implies the presence of the set Y (we selected rules with Confidence greater than 0.5).
• Improvement - indicates whether the rule is useful for research.

These estimates are used when generating rules. An analyst when searching for associative rules specifies the minimum values of these variables. As a result, those rules that do not satisfy these conditions are discarded and are not included in the solution of the problem.
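The three estimates and the filtering step can be sketched as follows. The thresholds are the ones named in the text (Support above 75%, Confidence above 0.5). The formula for Improvement is an assumption: the text does not define it, so the sketch uses the common definition Supp(X ∪ Y) / (Supp(X) · Supp(Y)). The transactions and rule candidates are hypothetical.

```python
def supp(itemset, transactions):
    """Share of transactions that contain every object of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def evaluate(x, y, transactions):
    """Return (support, confidence, improvement) of the rule X -> Y.
    Assumes that X and Y each occur in at least one transaction."""
    s_xy = supp(x | y, transactions)
    s_x = supp(x, transactions)
    s_y = supp(y, transactions)
    confidence = s_xy / s_x
    improvement = s_xy / (s_x * s_y)   # assumed definition, see the lead-in
    return s_xy, confidence, improvement

AP, TEMP, PULSE = "arterial pressure", "temperature", "pulse"

# Hypothetical transactions.
D = [
    {AP, TEMP, PULSE},
    {AP, TEMP},
    {TEMP, PULSE},
    {AP, TEMP, PULSE},
    {AP, TEMP},
]

candidates = [({AP}, {TEMP}), ({TEMP}, {AP}), ({TEMP}, {PULSE}), ({AP}, {PULSE})]

MIN_SUPPORT = 0.75     # thresholds named in the text
MIN_CONFIDENCE = 0.5

kept = []
for x, y in candidates:
    s, c, imp = evaluate(x, y, D)
    if s > MIN_SUPPORT and c > MIN_CONFIDENCE:
        kept.append((x, y, s, c, imp))

for x, y, s, c, imp in kept:
    print(sorted(x), "->", sorted(y), s, c, imp)
```

Under the assumed definition, both surviving rules have an improvement of 1, which means that the condition does not change how often the result occurs. Here this happens because temperature is present in every transaction, so such rules would count as trivial despite their high support and confidence.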

If objects have additional attributes that affect the composition of objects in transactions, and therefore in sets, then these attributes should be taken into account in the generated rules. In this case, the conditional part of a rule will include not only a check of the presence of an object in a transaction, but also more complex comparison operations: greater than, less than, includes, etc. The resulting part of a rule may also contain statements about attribute values. For example, if the relevance of an indicator is considered, a rule may look like this: If pH.relevance > 10 days then the level of hemoglobin in the blood.relevance < 3 days.

This rule states that if the patient had the pH analysis more than 10 days ago, then his analysis of hemoglobin in the blood is probably valid for less than 3 days.
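A rule with attribute conditions can be evaluated in the same way as a plain rule. In the sketch below, the relevance of each analysis is stored as a number of days; the four patient records are hypothetical and only illustrate how the support and confidence of the rule from the text would be computed.

```python
# Hypothetical records: relevance of each analysis in days, per patient.
patients = [
    {"pH": 12, "hemoglobin level": 2},
    {"pH": 15, "hemoglobin level": 1},
    {"pH": 11, "hemoglobin level": 5},
    {"pH": 4,  "hemoglobin level": 2},
]

def condition(p):
    """Conditional part of the rule: pH.relevance > 10 days."""
    return p["pH"] > 10

def result(p):
    """Resulting part: hemoglobin level.relevance < 3 days."""
    return p["hemoglobin level"] < 3

matching = [p for p in patients if condition(p)]
both = sum(result(p) for p in matching)

rule_support = both / len(patients)      # share of all records satisfying both parts
rule_confidence = both / len(matching)   # share of matching records satisfying the result
print(rule_support, rule_confidence)
```

Here the condition holds for three records and the result holds for two of them, so the rule has a support of 0.5 and a confidence of 2/3.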

The main differences between static and dynamic XML documents are:

• Availability of a validity period. A static XML document does not contain elements that indicate the expiration date of the document. In contrast, a dynamic XML document initially contains at least one element that indicates the validity period of a particular version of the document.

• Persistence of the displayed information. Once created, the information in a static XML document remains valid at all times. Conversely, a version of a dynamic XML document is valid only for the period specified in the corresponding elements. As soon as a new version appears, the information contained in the previous version is replaced.
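The distinction can be illustrated by a small check of the validity period. The element names `card`, `pulse` and `valid_until`, as well as both sample documents, are hypothetical; the text does not specify how the validity period is encoded.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical documents: the static one has no validity element.
STATIC = "<card><pulse>85</pulse></card>"
DYNAMIC = "<card><valid_until>2017-10-10</valid_until><pulse>85</pulse></card>"

def is_valid_on(xml_text, day):
    """A document without a validity element is treated as static and always
    valid; otherwise it is valid up to and including the stated date."""
    root = ET.fromstring(xml_text)
    node = root.find("valid_until")
    if node is None:
        return True
    return day <= date.fromisoformat(node.text)

print(is_valid_on(STATIC, date(2030, 1, 1)))
print(is_valid_on(DYNAMIC, date(2017, 10, 1)))
print(is_valid_on(DYNAMIC, date(2017, 11, 1)))
```

A rule-mining procedure for dynamic documents would need such a check before counting a version as a transaction, since an expired version no longer carries valid information.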

Most of the work on finding associative rules in static XML documents relies on XML-oriented algorithms derived from the Apriori algorithm. However, there are a number of other approaches.

IV. CONCLUSION

The task of finding associative rules is to identify sets of objects that frequently occur together in a large collection of such sets. The task of sequential analysis is to search for frequent sequences. The main difference between sequential analysis and the search for associative rules is that an order relation is established between the objects. The presence of a hierarchy in objects and its use in the task of finding associative rules allows a more flexible analysis and yields additional knowledge. The results of solving the problem are presented in the form of associative rules whose conditional and resulting parts contain sets of objects.

[1] Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems, 30(1-7), 107-117.

[2] Negnevitsky, M. (2005). Artificial intelligence: a guide to intelligent systems. Pearson Education.

[3] Jain, V., Benyoucef, L., & Deshmukh, S. G. (2008). A new approach for evaluating agility in supply chains using fuzzy association rules mining. Engineering Applications of Artificial Intelligence, 21(3), 367-385.

[4] Shakhovska, N., Kaminskyy, R., Zasoba, E., & Tsiutsiura, M. (2018). Association rules mining in big data. International Journal of Computing, 17(1), 25-32.