=Paper= {{Paper |id=Vol-2416/paper37 |storemode=property |title=Analysis of the structure of the relationship between the descriptions of objects of classes and evaluation of their compactness |pdfUrl=https://ceur-ws.org/Vol-2416/paper37.pdf |volume=Vol-2416 |authors=Ekaterina Zguralskaya }} ==Analysis of the structure of the relationship between the descriptions of objects of classes and evaluation of their compactness == https://ceur-ws.org/Vol-2416/paper37.pdf
Analysis of the structure of the relationship between the
descriptions of objects of classes and evaluation of their
compactness

               E N Zguralskaya1


               1 Ulyanovsk Technical University, Institute of Aviation Technologies and Management,
               Sozidateley avenue, 13A, Ulyanovsk, Russia, 432072



               e-mail: iatu@inbox.ru


               Abstract. The study assesses the compactness of descriptions of objects of
               classes on the numerical axis and in the multidimensional attribute space. Compactness can
               be computed only within defined boundaries of regions of the attribute space. In the
               one-dimensional case, the boundaries are calculated from the frequency of occurrence of the
               values of features of objects of classes in an interval. In the multidimensional case, a subset of
               the boundary objects of the classes is used for a given metric. The values of the compactness
               measure for latent attributes on the numerical axis are compared with those for the
               sets of initial features from which they are synthesized.



1. Introduction
In pattern recognition theory, objects are structured into classes on the basis of the compactness
hypothesis. Under this hypothesis, “close” objects should belong to the same class. The terms
“closeness” and “compactness” of objects therefore need to be clarified (interpreted).
    No common definition of the term “compactness” has been adopted. In [1] a compactness measure
of disjoint groups is postulated; its admissible values lie in (0, 1] and depend on the structure of
relations between objects. The following factors affecting the values of compactness are pointed
out:
     • the choice of the metric to compute distances between objects;
     • the dimension of the attribute space;
     • the choice of the way to scale and normalize data;
     • the usage of methods to select informative collections of attributes;
     • conditions to select and remove noise objects from the sample;
     • the number of standard objects of the minimal coverage of the learning sample;
     • linear and nonlinear transformations of the attribute space for the description of the objects.
    The aim of searching for extremal values of compactness measures over the parameters listed
above is to improve the generalizing ability of recognition algorithms. The method for obtaining a
quantitative estimate of pattern compactness described in [2] is based on the
function of competitive similarity between objects (the FRiS-function). Using the FRiS-function, one can


                   V International Conference on "Information Technology and Nanotechnology" (ITNT-2019)
Data Science
E N Zguralskaya




describe all distributions of classes by collections of standard objects. Such a collection allows
one to find the compactness measure of the whole sample or of each separate object of a class, and to
clear the learning sample of objects that contribute negatively to the compactness value.
    The implementation of machine learning algorithms becomes significantly more complicated when the
dimension of the data is large. A geometric interpretation of the origin of the curse of
dimensionality is given in [3]. The curse of dimensionality arises from the fact that the
number of possible sets of attributes in the description of objects significantly exceeds the number of
training examples. A learning algorithm can generalize correctly only if the learning sample
contains enough examples.
    Compactness implies the existence of a boundary between regions of the attribute space that
contain descriptions of objects of different classes.
    The numerical methods for obtaining a quantitative estimate of compactness differ accordingly. In
the one-dimensional case, interval methods are used. In the multidimensional case, the measure of
compactness of objects of classes and of the sample as a whole is computed for a given metric. What
the one-dimensional and multidimensional cases have in common is the existence of regions of the
attribute space on whose boundaries the measure of compactness is computed.
    In the one-dimensional case, objects can be compared on the numerical axis by the values of their
initial and latent attributes using the relations “greater than”, “less than” or “equal to”.
    When the measure of compactness is computed for the multidimensional case in [1], the property of
connectedness of objects through a subset of boundary objects (spans) of disjoint groups is used. Based
on this property, the objects are decomposed into disjoint groups. Connectedness of objects Si, Sj is
treated as a property of logical regularities in the form of hyperballs centred at these objects.
Objects Si and Sj are considered connected if the intersection of their hyperballs contains span
objects. Any pair of objects (Si, Sj) of one group can always be linked by a chain of connected
objects. Ideally, all objects of a class form a single group of connected objects.
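    The decomposition into groups of connected objects can be illustrated with a minimal sketch. It
assumes, beyond what is stated here, that the metric is Euclidean, that the hyperball of an object
contains every object strictly closer to it than its nearest object of another class, and that the
span objects are supplied as a set of indices. The function name `connected_groups` is hypothetical
and the code is not the implementation from [1].

```python
import math

def connected_groups(points, labels, spans):
    """Group same-class objects whose hyperballs share a span object.

    Assumptions (not taken from the paper): Euclidean metric; the hyperball
    of object i holds every object strictly closer to i than the nearest
    object of another class; `spans` is a set of indices of boundary objects.
    At least two classes must be present.
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    # Hyperball of each object, stored as a set of object indices.
    balls = []
    for i in range(n):
        radius = min(dist[i][j] for j in range(n) if labels[j] != labels[i])
        balls.append({j for j in range(n) if dist[i][j] < radius})

    # Union-find: chains of connected objects end up in one group.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Connected: same class and the balls intersect in a span object.
            if labels[i] == labels[j] and balls[i] & balls[j] & spans:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [(0,), (1,), (2,), (10,), (4,), (5,)]
labels = ["A", "A", "A", "A", "B", "B"]
groups = connected_groups(points, labels, spans={2, 4, 5})
```

    In this toy sample the class A object at 10 shares no span object with the other objects of its
class, so class A splits into two groups instead of the ideal single group.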
    This paper examines the structure of relations between objects of classes on the numerical axis.
Measures of compactness, computed by decomposing either attribute values (initial and latent) or
distances between objects into intervals, are proposed as a research tool.
The values of the compactness measure are used to detect latent patterns in data. Such patterns can be
regarded as new knowledge obtained within information models of ill-structured subject
areas.

2. Criteria for decomposition of attributes into intervals
Let us consider two computational algorithms proposed in [4, 5] that optimize criteria for decomposing
attribute values into intervals. For convenience, these criteria are referred to as CR1 and CR2.
   With CR1, the number of intervals on the ordered sequence of attribute values equals the number
of disjoint classes. The interval boundaries are chosen to maximize the product of intraclass
similarity and interclass difference. Ideally, every interval consists of all attribute values of the
objects of one class.
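   The search for a CR1 boundary can be sketched for the two-class case, where a single boundary
splits the ordered values into two intervals. The exact formulas of [4, 5] are not given here, so the
definitions of intraclass similarity and interclass difference below are assumptions chosen only so
that a perfect split scores 1; the function name `cr1_two_classes` is hypothetical.

```python
def cr1_two_classes(values, labels):
    """Best single boundary for two classes under an assumed CR1-style score.

    Assumed (hypothetical) definitions, with u[p][k] the number of values of
    class k in interval p and size[k] the size of class k:
      similarity = sum_p sum_k u[p][k] * (u[p][k] - 1)
                   / sum_k size[k] * (size[k] - 1)
      difference = sum_p sum_k u[p][k] * (size[other] - u[p][other])
                   / (2 * size[0] * size[1])
    Each class needs at least two objects. Equal values are never split.
    Returns (score, boundary).
    """
    pairs = sorted(zip(values, labels))
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("exactly two classes are expected")
    idx = {c: k for k, c in enumerate(classes)}
    size = [0, 0]
    for _, c in pairs:
        size[idx[c]] += 1
    sim_norm = sum(s * (s - 1) for s in size)
    diff_norm = 2 * size[0] * size[1]

    best = (-1.0, None)
    left = [0, 0]  # class counts in the left interval
    for pos in range(len(pairs) - 1):
        left[idx[pairs[pos][1]]] += 1
        if pairs[pos][0] == pairs[pos + 1][0]:
            continue  # a boundary cannot separate equal values
        u = [left, [size[0] - left[0], size[1] - left[1]]]
        sim = sum(c * (c - 1) for row in u for c in row) / sim_norm
        diff = sum(row[k] * (size[1 - k] - row[1 - k])
                   for row in u for k in (0, 1)) / diff_norm
        score = sim * diff
        if score > best[0]:
            best = (score, (pairs[pos][0] + pairs[pos + 1][0]) / 2)
    return best


score, boundary = cr1_two_classes([1, 2, 3, 7, 8, 9], ["a", "a", "a", "b", "b", "b"])
```

   For the separable sample above the product reaches 1 at the boundary between 3 and 7; when the
classes interleave on the axis, every candidate boundary scores below 1.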
   For the CR2 criterion the number of classes is 2 and the number of intervals is at least 2. The
number of disjoint intervals is not known in advance; their boundaries are computed from the
absolute difference in the frequency of occurrence of attribute values (both initial and latent) in the
descriptions of objects of the two classes. The values of an attribute on the numerical axis form a
sequence of clusters (intervals). No two neighboring clusters may be dominated (in terms of
frequency of occurrence) by representatives of the same class. A decomposition is considered ideal
in the sense of consistency if each interval contains values of objects of only one class (not
necessarily all of them).
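   The requirement that neighboring intervals be dominated by different classes can be checked with
a short sketch. It does not reproduce the boundary search of [5]; the boundaries are taken as given,
and dominance is assumed to be judged by relative frequency (the count of a class in the interval
divided by the size of that class). The function names are hypothetical.

```python
from bisect import bisect_left

def dominant_classes(values, labels, boundaries):
    """Return the dominating class of each interval, or None on a tie.

    Assumption: dominance is judged by relative frequency, i.e. the count of
    a class in the interval divided by the size of that class.
    `boundaries` is a sorted list of cut points; interval p holds the values
    v with boundaries[p-1] < v <= boundaries[p]. Exactly two classes expected.
    """
    labels = list(labels)
    classes = sorted(set(labels))
    size = {c: labels.count(c) for c in classes}
    counts = [dict.fromkeys(classes, 0) for _ in range(len(boundaries) + 1)]
    for v, c in zip(values, labels):
        counts[bisect_left(boundaries, v)][c] += 1

    result = []
    for cnt in counts:
        a, b = (cnt[c] / size[c] for c in classes)
        result.append(classes[0] if a > b else classes[1] if b > a else None)
    return result

def is_alternating(dominant):
    """True if every interval has a dominating class and no two neighbouring
    intervals are dominated by the same class."""
    return None not in dominant and all(
        x != y for x, y in zip(dominant, dominant[1:]))


values = [1, 2, 3, 7, 8, 9, 12, 13]
labels = ["a", "a", "b", "b", "b", "b", "a", "a"]
good = dominant_classes(values, labels, [2.5, 10])
bad = dominant_classes(values, labels, [1.5, 2.5])
```

   With the boundaries 2.5 and 10 the dominating classes alternate and every interval holds values of
one class only; with 1.5 and 2.5 the first two intervals are both dominated by class "a", so the
decomposition violates the requirement.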
   The admissible values of the CR1 criterion and of the consistency of the decomposition of
attributes into intervals by CR2 lie in the segment [0; 1] and are further treated as measures of
compactness. The value 1 corresponds to a perfect decomposition with respect to CR1 and CR2;
values less than 1 indicate the degree of deviation from the ideal.




    The combined use of the CR1 and CR2 criteria is necessary to detect latent patterns in data. The
search for patterns is based on the results of a computational experiment. To interpret these results,
known forms of logical regularities are used (hyperball, half-plane, parallelepiped).
    Let a set of objects E0={S1,…,Sm} be given, containing representatives of d disjoint classes K1,...,Kd.
The objects are described by a set X(n) of n attributes of different types, δ (δ