              Optimization Models for Detection
                    of Patterns in Data

                  Igor Masich, Lev Kazakovtsev, and Alena Stupina

       Siberian State University of Science and Technology, Krasnoyarsk, Russia
                                 is.masich@gmail.com



         Abstract. The paper reviews the detection of hidden rules (or patterns)
         in data sets and their use to support decision making in recognition.
         The problem of finding patterns is considered as a problem of conditional
         optimization of monotone pseudo-Boolean functions. Three criteria are
         used to compare patterns: simplicity, selectivity and evidence, as well
         as their possible combinations. We consider the types of patterns obtained
         in accordance with these criteria that are of the greatest interest for
         supporting decision making in recognition. The problem of searching for
         informative patterns is investigated by formalizing this search as a
         conditional pseudo-Boolean optimization problem. The properties of the
         existing optimization model are analyzed, and a new alternative
         optimization model is proposed for the search for strong spanned patterns.

         Keywords: Pseudo-Boolean optimization · Recognition · Rule-based algorithms




1     Introduction
When solving classification problems, in addition to the requirement of high
accuracy, there is often a need for interpretability and validity of the obtained
solutions. Interpretability and validity are especially important in practical
problems where the losses from making a wrong decision can be great. Therefore,
a decision support system used for such problems should justify possible
solutions and interpret the result.
    To create such a system, data classification algorithms are required that,
in addition to the solution itself, provide an explicit rule; that is, they reveal
knowledge from the available data. This is true for logical classification
algorithms, whose working principle is to identify patterns in data and formalize
them as a set of rules, i.e. a set of patterns described by simple logical formulas.
Results were obtained in the framework of the state task 2.5527.2017/8.9 of the
Ministry of Education and Science of the Russian Federation.
Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: S. Belim et al. (eds.): OPTA-SCL 2018, Omsk, Russia, published at http://ceur-ws.org

     The process of forming logical rules is accompanied by solving problems of
choosing the best alternatives in accordance with some criteria. In the studied
method, the formation of logical rules is formalized as a number of combinatorial
optimization problems, which yields a flexible and efficient algorithm for
finding patterns in the primary data. By combining a number of patterns into a
composition, the obtained classifier solves the problem.
     However, a number of problems remain in applying the method of logical data
analysis to practical classification problems. One of them is the construction
of optimization models for forming informative patterns. In considering this
problem, it is first necessary to determine the criteria and constraints on
which these optimization models are founded.
     The main objective is to increase the interpretability of the classifier and
the quality of recognition of new observations, that is, to improve the
classifier's generalizing ability.
     The main object of the research conducted by the authors is the method of
logical analysis of data, which derives from the theory of combinatorial
optimization [3]. This method belongs to rule-based classification algorithms
and is based on identifying logical patterns in the data sample. This paper
analyses the existing optimization model for searching for patterns and offers
an alternative optimization model for constructing strong spanned patterns.


2   The Concept of Pattern
The creation and use of logical classification algorithms is based on identifying
in the primary data the patterns from which a decision function is formed. The
search for patterns can be considered as a combinatorial optimization task. To
obtain a more efficient solution, the optimization algorithm should be chosen
based on the characteristic properties of the optimization problem under
consideration. In this paper, some properties of the optimization problems that
arise in the search for logical patterns in data are considered.
    Let us consider the problem of recognizing objects described by binary
variables and divided into two classes
$$K = K^+ \cup K^- \subset B_2^n,$$
where $B_2^n = B_2 \times B_2 \times \dots \times B_2$, $B_2 = \{0, 1\}$.
    In this case, the classes do not intersect:
$$K^+ \cap K^- = \emptyset.$$
    An observation $X \in K$ is described by a binary vector $X = (x_1, x_2, \dots, x_n)$
and can be represented as a point in the hypercube of the space of binary
variables $B_2^n$. Observations of the class $K^+$ will be called positive points of
the sample $K$, and observations of the class $K^-$ will be called negative points
of the sample.
    Let us consider a subset of points from $B_2^n$ for which some variables are
fixed and identical, and the rest take arbitrary values:
$$T = \{x \in B_2^n \mid x_i = 1 \ \text{for all}\ i \in A \ \text{and}\ x_j = 0 \ \text{for all}\ j \in B\}$$
for some subsets $A, B \subseteq \{1, 2, \dots, n\}$, $A \cap B = \emptyset$. This set can also be defined
as a Boolean function that takes a true value on the elements of the set:
$$t(x) = \left(\bigwedge_{i \in A} x_i\right) \wedge \left(\bigwedge_{j \in B} \bar{x}_j\right).$$
    The set of points $x$ for which $t(x) = 1$ is denoted $S(t)$. $S(t)$ is a subcube in
the Boolean hypercube $B_2^n$; the number of points of the subcube is $2^{n-|A|-|B|}$.
    The binary variable $x_i$ or its negation $\bar{x}_i$ in a term is called a literal. The
notation $x_i^\alpha$ denotes $x_i$ if $\alpha = 1$ and $\bar{x}_i$ if $\alpha = 0$. Thus, a term is a conjunction
of distinct literals that does not contain both a variable and its negation. The
set of literals of the term $t$ is denoted $\mathrm{Lit}(t)$.
    We say that the term $t$ covers the point $a \in B_2^n$ if $t(a) = 1$, that is, if the
point belongs to the corresponding subcube.
    In this context, a pattern $P$ (or rule) is a term that covers at least one
observation of a certain class and does not cover any observations of the other
class [1]. That is, a pattern corresponds to a subcube having a non-empty
intersection with one of the sets ($K^+$ or $K^-$) and an empty intersection with
the other set ($K^-$ or $K^+$, respectively). A pattern $P$ that does not intersect
$K^-$ will be called positive, and a pattern $P'$ that does not intersect $K^+$ will be
called negative. Below, for definiteness, only positive patterns are considered.
The set of observations covered by the pattern $P$ is denoted $\mathrm{Cov}(P)$.
    Patterns are the elementary blocks for constructing logical recognition
algorithms. The most useful for this purpose are patterns with the largest
coverage (maximum patterns), that is, those for which $|\mathrm{Cov}(P)|$ is maximal.
    One way of constructing a set of patterns for the recognition algorithm is
to search for patterns based on the attribute values of specific objects. The
sketch below illustrates these notions on toy data.
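    To make the definitions concrete, here is a minimal Python sketch, not taken
from the paper: the data, the function names and the literal encoding are
illustrative assumptions only.

```python
def covers(A, B, x):
    """True if the term with positive literals A and negative literals B covers x."""
    return all(x[i] == 1 for i in A) and all(x[j] == 0 for j in B)

def coverage(A, B, points):
    """Cov(P): the observations among `points` covered by the term (A, B)."""
    return [x for x in points if covers(A, B, x)]

# Toy sample with n = 3; indices are 0-based here, unlike the 1-based text.
K_pos = [(1, 0, 0), (1, 1, 0)]        # positive observations
K_neg = [(0, 1, 1), (1, 0, 1)]        # negative observations
A, B = {0}, {2}                       # term t(x) = x_0 AND NOT x_2

assert coverage(A, B, K_neg) == []    # pure positive pattern: no negatives covered
print(coverage(A, B, K_pos))          # covers both positive points
print(2 ** (3 - len(A) - len(B)))     # size of the subcube S(t): 2 points
```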


3     Criteria for Selection and Optimality of Patterns
There is no single unambiguous criterion for comparing logical patterns with
each other. When analyzing different data, different requirements can be imposed
on the quality and features of the formed patterns. In accordance with [5], three
criteria are used to evaluate the quality of pure patterns (homogeneous patterns
that cover no observations of other classes): simplicity, selectivity and
evidence, as well as their possible combinations.

    The criterion of simplicity (or compactness) is often used to compare
patterns with one another, including patterns obtained by different learning
algorithms.
    A pattern $P_1$ is preferable to $P_2$ by the criterion of simplicity (denoted
$P_1 \succeq_\sigma P_2$) if $\mathrm{Lit}(P_1) \subseteq \mathrm{Lit}(P_2)$.
    In [2], a pattern $P$ is called prime if removing any literal from $\mathrm{Lit}(P)$
produces a term that is no longer a (pure) pattern (that is, a term that covers
observations of the other class).
    It is obvious that optimality of a pattern by the criterion of simplicity is
identical to the assertion that this pattern is prime.
    The search for simpler patterns has well-founded premises. First, such
patterns are easier to interpret and more understandable to humans when used in
decision making. Second, it is often believed that simpler patterns have better
generalizing ability and that their use leads to better recognition accuracy.
However, this assertion is debatable; moreover, it has been shown in [6] that a
decrease in simplicity can lead to greater accuracy.
    The use of simple and, therefore, short patterns decreases the number of
incorrectly recognized positive observations (false negatives), but at the same
time it can increase the number of incorrectly recognized negative observations
(false positives). The natural way to reduce the number of false positives is to
form more selective patterns, which is achieved by reducing the size of the
subcube that determines the pattern.
    A pattern $P_1$ is preferable to $P_2$ by the criterion of selectivity (denoted
$P_1 \succeq_\Sigma P_2$) if $S(P_1) \subseteq S(P_2)$.
    It should be noted that the two criteria considered above are opposite to
each other, that is, $\mathrm{Lit}(P_1) \subseteq \mathrm{Lit}(P_2) \Leftrightarrow S(P_1) \supseteq S(P_2)$.
    A pattern that is optimal by the selectivity criterion is a minterm, that is,
a pattern covering a single positive observation. The use of this criterion by
itself is, of course, not effective, since the resulting patterns (minterms)
possess no generalizing ability. But the selectivity criterion is extremely
useful in conjunction with other criteria, as discussed below.
    Another useful criterion is based on coverage, that is, on the number of
positive observations of the training sample that satisfy the conditions of the
pattern. Undoubtedly, patterns with greater coverage have greater generalizing
ability, and the observations of the training sample covered by the pattern are
a kind of evidence of the applicability of this pattern to the decision.
    A pattern $P_1$ is preferable to $P_2$ by the criterion of evidence (denoted
$P_1 \succeq_\varepsilon P_2$) if $\mathrm{Cov}(P_1) \supseteq \mathrm{Cov}(P_2)$.
    Patterns that are optimal by the criterion of evidence are called strong.
That is, a pattern $P$ is strong if there is no pattern $P'$ such that
$\mathrm{Cov}(P') \supset \mathrm{Cov}(P)$.
    It is important to note that the considered criteria are not completely
independent. As noted, the criteria of simplicity and selectivity are opposite
to each other. Moreover, the following implications hold:
$$P_1 \succeq_\sigma P_2 \Rightarrow P_1 \succeq_\varepsilon P_2,$$
$$P_1 \succeq_\Sigma P_2 \Rightarrow P_2 \succeq_\varepsilon P_1.$$

    Two methods are used to combine the considered criteria with each other.
    For given criteria $\pi$ and $\rho$, a pattern $P_1$ is preferable to $P_2$ by the
intersection $\pi \wedge \rho$ (denoted $P_1 \succeq_{\pi \wedge \rho} P_2$) if and only if $P_1 \succeq_\pi P_2$ and $P_1 \succeq_\rho P_2$.
    For given criteria $\pi$ and $\rho$, a pattern $P_1$ is preferable to $P_2$ by the
lexicographic refinement $\pi|\rho$ (denoted $P_1 \succeq_{\pi|\rho} P_2$) if and only if $P_1 \succ_\pi P_2$ or
($P_1 \approx_\pi P_2$ and $P_1 \succeq_\rho P_2$).
    In [2] it is shown that only three of all possible combinations of criteria
deserve attention. Patterns that are optimal by the criterion $\Sigma \wedge \varepsilon$ are called
spanned. Patterns that are optimal by the criterion $\varepsilon|\sigma$ are called strong prime.
And patterns that are optimal by the criterion $\varepsilon|\Sigma$ are called strong spanned.
    Among all the types of patterns obtained in accordance with the above
criteria and their combinations, prime, strong prime and strong spanned patterns
are the most useful for identifying informative patterns and using them to
support decision making in recognition. The sketch below shows how the three
basic preference relations can be checked directly.
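    As an illustration only (this encoding is an assumption, not the authors'
implementation), a pattern can be represented by its set of literals, each
literal being an (index, value) pair, so that the relations $\succeq_\sigma$, $\succeq_\Sigma$ and $\succeq_\varepsilon$
become simple set comparisons:

```python
def lit_covers(lits, x):
    """True if the binary point x satisfies every literal (index, value) in lits."""
    return all(x[i] == v for i, v in lits)

def prefer_simplicity(lits1, lits2):
    """P1 >=_sigma P2: Lit(P1) is a subset of Lit(P2)."""
    return lits1 <= lits2

def prefer_selectivity(lits1, lits2):
    """P1 >=_Sigma P2: S(P1) subset of S(P2), i.e. Lit(P1) superset of Lit(P2)."""
    return lits1 >= lits2

def prefer_evidence(lits1, lits2, K_pos):
    """P1 >=_eps P2: Cov(P1) is a superset of Cov(P2) on the positive sample."""
    cov1 = {x for x in K_pos if lit_covers(lits1, x)}
    cov2 = {x for x in K_pos if lit_covers(lits2, x)}
    return cov1 >= cov2

K_pos = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
P1 = {(0, 1)}                # x_0 = 1
P2 = {(0, 1), (2, 0)}        # x_0 = 1 and x_2 = 0
# P1 >=_sigma P2 implies P1 >=_eps P2, in line with the implications above:
assert prefer_simplicity(P1, P2) and prefer_evidence(P1, P2, K_pos)
```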


4     Search for Optimal Patterns
Let us single out an observation $a \in K^+$ and define a pattern $P^a$ covering the
observation $a$. The variables that are fixed in $P^a$ are set equal to the
corresponding attribute values of the object $a$.
    To define the pattern $P^a$, the binary variables $Y = (y_1, y_2, \dots, y_n)$ are
introduced:
$$y_i = \begin{cases} 1, & \text{if the $i$th attribute is fixed in } P^a, \\ 0, & \text{otherwise.} \end{cases}$$
    A point $b \in K^+$ is covered by the pattern $P^a$ only if $y_i = 0$ for all $i$ for
which $b_i \neq a_i$. On the other hand, a point $c \in K^-$ is not covered by the pattern
$P^a$ if $y_i = 1$ for at least one variable $i$ for which $c_i \neq a_i$.
    The condition that a positive pattern must not contain any point of $K^-$
requires that for each observation $c \in K^-$ the variable $y_i$ take the value 1 for
at least one $i$ for which $c_i \neq a_i$ [2]:
$$\sum_{\substack{i=1 \\ c_i \neq a_i}}^{n} y_i \geq 1.$$

    The constraint can be strengthened to increase resistance to errors by
replacing the number 1 on the right-hand side of the inequality with a positive
integer $d$.
    On the other hand, a positive observation $b \in K^+$ belongs to the considered
subcube when the variable $y_i$ takes the value 0 for all indices $i$ for which
$b_i \neq a_i$. Thus, the number of positive observations covered by the $a$-pattern
can be calculated as
$$\sum_{b \in K^+} \prod_{\substack{i=1 \\ b_i \neq a_i}}^{n} (1 - y_i).$$
    Thus, the problem of finding the maximum pattern can be written as the
problem of finding values $Y = (y_1, y_2, \dots, y_n)$ for which the obtained pattern
$P^a$ covers as many points $b \in K^+$ as possible and does not cover any point
$c \in K^-$ [4]:
$$\sum_{b \in K^+} \prod_{\substack{i=1 \\ b_i \neq a_i}}^{n} (1 - y_i) \to \max_Y, \tag{1}$$
$$\sum_{\substack{i=1 \\ c_i \neq a_i}}^{n} y_i \geq 1 \quad \text{for all } c \in K^-. \tag{2}$$
    This is a conditional pseudo-Boolean optimization problem, that is, an
optimization problem in which the objective function and the functions on the
left-hand side of the constraints are pseudo-Boolean functions (real-valued
functions of Boolean variables).
    The problem of finding maximum negative patterns is formulated similarly.
    It should be noted that any point $Y = (y_1, y_2, \dots, y_n)$ corresponds to a
subcube in the space of binary variables $X = (x_1, x_2, \dots, x_n)$ that includes the
basic observation $a$. When $Y \in O_k(Y^1)$ (in other words, $Y$ differs from $Y^1$ in
the values of $k$ coordinates), where $Y^1 = (1, 1, \dots, 1)$, the number of points of
this subcube is $2^k$.
    Each found pattern is characterized by its coverage (the number of covered
observations of the given class) and its degree (the number of fixed variables
that determine the pattern). According to the optimization model (1)-(2), the
obtained patterns cover no observations of the other class (from the training
sample).
    The most valuable patterns are those with the greatest coverage: the larger
the coverage, the better the pattern reflects the image of the class. For small
problems, model (1)-(2) can be checked by direct enumeration, as sketched below.
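    The following brute-force sketch is illustrative only (toy data, assumed
names); it is not the conditional optimization algorithms of [1], but it makes
objective (1) and constraint (2) explicit:

```python
from itertools import product

def solve_max_pattern(a, K_pos, K_neg):
    """Enumerate Y in {0,1}^n; return a feasible Y maximizing objective (1)."""
    n = len(a)
    best_Y, best_cov = None, -1
    for Y in product((0, 1), repeat=n):
        # Constraint (2): every c in K- must differ from a on some fixed attribute.
        if any(not any(Y[i] for i in range(n) if c[i] != a[i]) for c in K_neg):
            continue
        # Objective (1): b is covered iff y_i = 0 wherever b_i != a_i.
        cov = sum(all(Y[i] == 0 for i in range(n) if b[i] != a[i]) for b in K_pos)
        if cov > best_cov:
            best_Y, best_cov = Y, cov
    return best_Y, best_cov

a = (1, 1, 0)
K_pos = [(1, 1, 0), (1, 0, 0), (1, 1, 1)]
K_neg = [(0, 0, 1)]
print(solve_max_pattern(a, K_pos, K_neg))  # ((1, 0, 0), 3): fixing x_0 = 1 covers all of K+
```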


5    Optimization Problem Properties
Let us consider the properties of optimization problem (1)-(2). First of all, we
recall the basic concepts [1].

 - Points $X^1, X^2 \in B_2^n$ are called $k$-adjacent if they differ in the values of $k$
   coordinates, $k = 1, \dots, n$.
 - The set $O_k(X)$, $k = 0, \dots, n$, of all points of $B_2^n$ that are $k$-adjacent to the
   point $X$ is called the $k$-th level of the point $X$.
 - A point $X \in B_2^n$ is called $k$-adjacent to the set $A \subset B_2^n$ if $A \cap O_k(X) \neq \emptyset$ and
   $A \cap O_l(X) = \emptyset$ for all $l = 0, \dots, k-1$.
 - A point $X^* \in B_2^n$ for which $f(X^*) < f(X)$ for all $X \in O_1(X^*)$ is called a local
   minimum of a pseudo-Boolean function $f$.
 - A pseudo-Boolean function having only one local minimum is called unimodal
   on $B_2^n$.
 - A unimodal function $f$ is called monotone on $B_2^n$ if for all $X^k \in O_k(X^*)$,
   $k = 1, \dots, n$: $f(X^{k-1}) \leq f(X^k)$ for all $X^{k-1} \in O_{k-1}(X^*) \cap O_1(X^k)$, and strictly
   monotone if this condition is satisfied with the strict inequality sign.
 - A set of points $W(X^0, X^l) = \{X^0, X^1, \dots, X^l\} \subset B_2^n$ is called a path between
   the points $X^0$ and $X^l$ if for all $i = 1, \dots, l$ the point $X^i$ is adjacent to $X^{i-1}$.
 - A path $W(X^0, X^l) \subset B_2^n$ between $k$-adjacent points $X^0$ and $X^l$ is called
   shortest if $l = k$.
 - For any $X, Y \in B_2^n$, the union of all shortest paths $W(X, Y)$ forms a subcube
   of $B_2^n$ and is denoted $K(X, Y)$.
    Let us consider the basic properties of the set of feasible solutions of the
conditional pseudo-Boolean optimization problem. Consider a problem of the
following form:
$$C(X) \to \max_{X \in S \subset B_2^n}, \tag{3}$$
where $C(X)$ is a pseudo-Boolean function increasing monotonically from $X^0$, and
$S \subset B_2^n$ is a feasible subspace of Boolean variables defined by a given system
of constraints, for example:
$$A_j(X) \leq H_j, \quad j = 1, \dots, m.$$
    Let us introduce a number of concepts for a subset of points of the space of
Boolean variables [1].

 - A point $Y \in S$ is a boundary point of the set $S$ if there is $X \in O_1(Y)$ for
   which $X \notin S$.
 - A point $Y \in O_i(X^0) \cap S$ is called a limiting point of the set $S$ with base
   point $X^0 \in S$ if for all $X \in O_1(Y) \cap O_{i+1}(X^0)$ the condition $X \notin S$ is fulfilled.
 - A constraint defining a subarea of the space of Boolean variables is called
   active if the optimal solution of the conditional optimization problem (3)
   does not coincide with the optimal solution of the corresponding optimization
   problem without this constraint.
    One of the properties of the set of feasible solutions is as follows.

Property 1. If the objective function is a strictly monotone unimodal function
and the constraint is active, then the optimal solution of problem (3) is a point
belonging to the subset of limiting points of the set of feasible solutions $S$
with the base point $X^0$ at which the objective function takes its minimum value:
$$C(X^0) = \min_{X \in B_2^n} C(X).$$

    Let us consider a single constraint of optimization problem (1)-(2):
$$A_j(Y) \geq 1 \quad \text{for some } c^j \in K^-, \ j \in \{1, 2, \dots, |K^-|\},$$
where
$$A_j(Y) = \sum_{\substack{i=1 \\ c_i^j \neq a_i}}^{n} y_i.$$
    Introduce the notation
$$\delta_i^j = \begin{cases} 1, & \text{if } c_i^j \neq a_i, \\ 0, & \text{if } c_i^j = a_i. \end{cases}$$

    Then
$$A_j(Y) = \sum_{i=1}^{n} \delta_i^j y_i.$$
    The function $A_j(Y)$ decreases monotonically from the point $Y^1 = (1, 1, \dots, 1)$.
    The limiting points of the feasible domain are the points $Y_k \in O_{n-1}(Y^1)$
(or, which is the same thing, $Y_k \in O_1(Y^0)$) such that $Y_k$ differs from
$Y^0 = (0, 0, \dots, 0)$ in the value of the $k$-th component, for which $\delta_k^j = 1$.
    The set of feasible solutions is the union of the subcubes formed by the
limiting points of the feasible space and the point $Y^1$:
$$\bigcup_{k:\, \delta_k^j = 1} K(Y_k, Y^1).$$

    Now let us move on to the entire system of constraints
$$A_j(Y) \geq 1 \quad \text{for all } j = 1, 2, \dots, |K^-|.$$
    This system is satisfied by the points belonging to the set
$$\bigcap_{j=1}^{|K^-|} \ \bigcup_{k:\, \delta_k^j = 1} K(Y_k^j, Y^1),$$
which, in the final analysis, is the union of a finite number of subcubes. The
limiting points of the feasible space can lie at completely different levels
relative to the point $Y^1$, and their number in the worst case can reach
$C_n^{[n/2]}$, the size of the middle level of the hypercube.
    Next, let us consider the objective function
$$C(Y) = \sum_{b \in K^+} \prod_{\substack{i=1 \\ b_i \neq a_i}}^{n} (1 - y_i)$$
for some "basic" observation $a \in K^+$. It can also be written as
$$C(Y) = \sum_{j=1}^{|K^+|} \prod_{i=1}^{n} (1 - \Delta_i^j y_i), \quad \text{where } \Delta_i^j = \begin{cases} 1, & \text{if } b_i^j \neq a_i, \\ 0, & \text{if } b_i^j = a_i. \end{cases}$$

    The function $C(Y)$ increases monotonically from the point $Y^1 = (1, 1, \dots, 1)$,
at which it takes the value 1, corresponding to covering only the "basic"
observation $a$. The function takes its largest value, $|K^+|$, at the point
$Y^0 = (0, 0, \dots, 0)$.
    Suppose that the observation $b \in K^+$ closest to the observation $a$ differs
from it in the values of $s$ components:
$$s = \min_{b \in K^+ \setminus \{a\}} \sum_{i=1}^{n} |a_i - b_i|.$$

    At all points $Y \in O_k(Y^1)$, $k = 0, 1, \dots, s-1$, the values of the objective
function are the same and equal to 1. The presence of such a constancy set
complicates the work of optimization algorithms that start the search from the
feasible point $Y^1$ and move through adjacent points, since calculating the
objective function in a neighborhood of adjacent points gives no information
about the best direction of the search.
    In practical problems of large dimension this constancy set can be so large
that most of the points of the feasible space belong to it. This complicates, or
makes impossible, the operation of algorithms such as genetic algorithms or
multistart local search.
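    The plateau is easy to observe numerically. In the assumed toy setup below
(illustrative data only), the nearest positive differs from the base observation
in $s = 3$ coordinates, so the objective equals 1 on every level $O_k(Y^1)$, $k \le 2$:

```python
from itertools import combinations

def C(Y, a, K_pos):
    """Objective (1): the number of positives covered by the pattern defined by Y."""
    n = len(a)
    return sum(all(Y[i] == 0 for i in range(n) if b[i] != a[i]) for b in K_pos)

a = (0, 0, 0, 0)
K_pos = [a, (1, 1, 1, 0)]   # the nearest other positive differs in s = 3 coordinates

for k in range(3):          # levels O_0(Y^1), O_1(Y^1), O_2(Y^1)
    values = {C(tuple(0 if i in zeros else 1 for i in range(4)), a, K_pos)
              for zeros in combinations(range(4), k)}
    print(k, values)        # each level prints {1}: a constancy set of the objective
```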
    Another consequence of the presence of constancy sets of the objective
function is that the optimal solution is not necessarily a single point belonging
to the subset of limiting points of the feasible space; it can be an entire set
of points representing a constancy set of the objective function. The following
statement is true.

Proposition 1. The limiting points of the feasible space of problem (1)-(2)
correspond to prime patterns.

    The proof follows from the definitions of a limiting point and a prime
pattern.
    It should be noted that a found prime pattern will not necessarily be a
strong pattern. This property is guaranteed only for the optimal solution of the
problem.

Proposition 2. The optimal solution of problem (1)-(2) corresponds to a strong
prime pattern.

Proof. From the previous proposition it follows that the optimal solution
corresponds to a prime pattern. The value of the objective function is the
coverage of the pattern. If the solution is optimal, then there is no pattern
$P^a$ (covering the basic observation $a$) that is better by the criterion of
evidence, and hence this solution corresponds to a strong pattern.

    Thus, when approximate optimization algorithms are applied, it can only be
guaranteed that the found pattern is prime, but not necessarily strong. If an
exact optimization algorithm is used, the found pattern will be a strong prime
pattern.


6   Search for Strong Spanned Patterns

The optimization model is determined, first of all, by the way in which the
variables that define the alternatives of the optimization problem are introduced.
In the optimization model discussed above, the alternatives, that is, the patterns,
are determined by including or not including in the pattern the conditions that
are satisfied in some basic observation. Let us consider an alternative way of
specifying the pattern.
    Let us introduce the binary variables
$$z_b = \begin{cases} 1, & \text{if observation } b \in K^+ \text{ is covered by } P^a, \\ 0, & \text{otherwise.} \end{cases}$$

    To ensure that the resulting pattern is spanned, let us include in it all the
literals shared by all the positive observations that the pattern covers, that
is, those literals for which the condition
$$\prod_{\substack{b \in K^+: \\ z_b = 1}} (1 - |b_i - a_i|) = 1$$
is true.

    Now let us formulate the problem of finding the maximum pattern as
maximization of the coverage of positive observations under the condition that
no negative observation is covered:
$$\sum_{b \in K^+} z_b \to \max_Z, \tag{4}$$
$$\sum_{\substack{i=1 \\ c_i \neq a_i}}^{n} \ \prod_{\substack{b \in K^+: \\ z_b = 1}} (1 - |b_i - a_i|) \geq 1 \quad \text{for all } c \in K^-. \tag{5}$$

    Having solved this optimization problem, we obtain the set of positive
observations covered by the desired pattern. The pattern itself can be recovered
using the characteristic variables $Y = (y_1, y_2, \dots, y_n)$:
$$y_i = \prod_{\substack{b \in K^+: \\ z_b = 1}} (1 - |b_i - a_i|), \quad i = 1, \dots, n.$$
    In this formulation of the optimization problem, the objective function is
strictly monotone, and there are no constancy sets of the objective function. A
brute-force sketch of this model on toy data is given below.
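    As before, the sketch is illustrative only (toy data, assumed names); it
enumerates candidate covered sets $Z$ from the largest downwards rather than
applying the conditional optimization algorithms of [1]:

```python
from itertools import combinations

def spanned_fixed(a, chosen):
    """y_i = 1 iff attribute i takes the same value a_i in all chosen positives."""
    n = len(a)
    return tuple(int(all(b[i] == a[i] for b in chosen)) for i in range(n))

def solve_strong_spanned(a, K_pos, K_neg):
    """Enumerate covered subsets (largest first); return the first feasible spanned pattern."""
    n = len(a)
    others = [b for b in K_pos if b != a]
    for k in range(len(others), -1, -1):
        for subset in combinations(others, k):
            y = spanned_fixed(a, [a, *subset])
            # Constraint (5): each negative must differ from a on some fixed attribute.
            if all(any(y[i] and c[i] != a[i] for i in range(n)) for c in K_neg):
                return y, k + 1      # fixed attributes and size of the chosen set
    return None, 0

a = (1, 1, 0)
K_pos = [(1, 1, 0), (1, 0, 0), (1, 1, 1)]
K_neg = [(0, 0, 1)]
print(solve_strong_spanned(a, K_pos, K_neg))  # ((1, 0, 0), 3): spanned pattern x_0 = 1
```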

Proposition 3. In problem (4)-(5), the limiting points of the feasible space, and
only they, correspond to strong spanned patterns.

Proof. Let us take a point $Z_k \in O_k(Z^0)$, where $Z^0 = (0, \dots, 0)$ is the zero
point of the space of variables $Z = (z_1, \dots, z_{|K^+|})$. For any point
$Z_{k+1} \in O_1(Z_k) \cap O_{k+1}(Z^0)$, the inclusion $\mathrm{Cov}(T_k) \subset \mathrm{Cov}(T_{k+1})$ is fulfilled,
where $T_k$ and $T_{k+1}$ are the terms corresponding to the points $Z_k$ and $Z_{k+1}$
respectively. If $Z_k$ is a limiting point of the feasible set, then every point
$Z_{k+1} \in O_1(Z_k) \cap O_{k+1}(Z^0)$ is infeasible and the term $T_{k+1}$ is not a pattern;
consequently, the term $T_k$ is a strong pattern (by the definition of a strong
pattern).
    In turn, if the point $Z_k$ is not a limiting point, then there is a feasible
point $Z_{k+1} \in O_1(Z_k) \cap O_{k+1}(Z^0)$ and the corresponding term $T_{k+1}$, which is a
pattern with larger coverage $\mathrm{Cov}(T_{k+1}) \supset \mathrm{Cov}(T_k)$, so the term $T_k$ is not a
strong pattern.


7     Experimental Research

The results of experimental studies conducted on the problem of predicting
complications of myocardial infarction are presented in [4]. The sample consists
of 1700 observations (patients), each of which is described by 112 attributes of
various types. From these input attributes it is required to predict the onset
of four complications of the disease: ventricular fibrillation (VF), atrial
fibrillation (AF), pulmonary edema (PE) and cardiac rupture (CR). 85% of the
sample observations were used for training, the rest for control. The search for
patterns was implemented with the help of two optimization models: the search
for maximum prime patterns (model 1) and the search for strong spanned patterns
(model 2).
    The optimization problems were solved using the algorithms of conditional
pseudo-Boolean optimization described in [1].
    Table 1 presents comparative results for the average degree of the obtained
patterns (the number of literals in a term) and the accuracy of recognition of
the control observations. As can be seen from the results, the use of strong
spanned patterns makes it possible to obtain classifiers with better generalizing
ability.

                      Table 1. Results of experimental research

    Forecasting problem   Class       Model 1               Model 2
                                  Degree  Precision     Degree  Precision
    VF                    neg.      5       0.90          6       0.90
                          pos.      3       0.78          4       0.83
    AF                    neg.      7       0.80          9       0.80
                          pos.      6       0.68          6       0.68
    PE                    neg.      7       0.68          9       0.68
                          pos.      4       0.89          5       0.89
    CR                    neg.      5       1.00          6       1.00
                          pos.      3       0.79          4       0.93



8    Conclusions
The paper examines the problem of searching for informative patterns by
formalizing this search as a conditional pseudo-Boolean optimization problem.
The properties of the existing optimization model are analyzed, and a new
alternative optimization model designed to search for strong spanned patterns
is proposed.


References
 1. Antamoshkin, A.N., Masich, I.S.: Search algorithms for conditional pseudo-Boolean
    optimization. Systems of Control, Communication and Security 1, 103–145 (2016)
 2. Bonates, T.O., Hammer, P.L., Kogan, A.: Maximum patterns in datasets. Discrete
    Applied Mathematics 156(6), 846–861 (2008)
 3. Crama, Y., Hammer, P.L., Ibaraki, T.: Cause-effect Relationships and Partially
    Defined Boolean Functions. Annals of Operations Research 16, 299–325 (1988)
 4. Golovenkin, S.E., Gorban, A.N., Shulman, V.A.: Complications of myocardial in-
    farction: a database for approbation of recognition and prognosis systems: Preprint
    6; Krasnoyarsk: Computing Center of the SB RAS (1997)
 5. Hammer, P.L., Kogan, A., Simeone, B., Szedmak, S.: Pareto-optimal patterns in
    Logical Analysis of Data. Discrete Applied Mathematics 144(1–2), 79–102 (2004)
 6. Holte, R.C., Acker, L., Porter, B.: Concept learning and the problem of small dis-
    juncts. In: Proceedings of the Eleventh International Joint Conference on Artificial
    Intelligence (IJCAI-89). pp. 813–818. Morgan Kaufmann, Detroit, MI (1989)