Published in CEUR Workshop Proceedings Vol-3308, https://ceur-ws.org/Vol-3308/Paper10.pdf (dblp: https://dblp.org/rec/conf/cla/BuzmakovDKMN22)
Experimental Study of Concise Representations of
Concepts and Dependencies
Aleksey Buzmakov1 , Egor Dudyrev2 , Sergei O. Kuznetsov2 , Tatiana Makhalova3 and
Amedeo Napoli3,1,∗
1 HSE University, Perm, Russia
2 HSE University, Moscow, Russia
3 Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France


                                         Abstract
In this paper we are interested in studying concise representations of concepts and dependencies, i.e., implications and association rules. Such representations are based on equivalence classes and their elements, i.e., keys (minimal generators), passkeys (minimum generators), proper premises, and pseudo-intents. All these sets of attributes are significant and well studied from the computational point of view, while their statistical properties remain to be investigated. The purpose of this paper is to study these particular attribute sets and, in parallel, to investigate how to evaluate the complexity of a dataset from an FCA point of view. We analyze the empirical distributions and the sizes of these attribute sets, and we propose several measures of data complexity relying on them, considering real-world and related randomized datasets.

                                         Keywords
                                         Formal Concept Analysis, attribute sets, equivalence classes, closed sets, generators, keys, data complexity




1. Introduction
In this paper we are interested in measuring “complexity” of a dataset in terms of Formal
Concept Analysis (FCA [1]). On the one hand, we follow the lines of [2] where the “closure
structure” and the “closure index” are introduced and based on the so-called passkeys, i.e.,
minimum generators in an equivalence class of itemsets. On the other hand, we would like to
capture statistical properties of a dataset, not just extremal characteristics such as the size of
a passkey. In the following we introduce an alternative approach and we try to measure the
complexity of a dataset in terms of five main elements that can be computed in a concept lattice,
namely intents (closed sets), pseudo-intents, proper premises, keys (minimal generators), and
passkeys (minimum generators). We follow a more practical point of view and we study the
distribution of these different elements in various datasets. We also investigate the relations that


Published in Pablo Cordero, Ondrej Kridlo (Eds.): The 16𝑡ℎ International Conference on Concept Lattices and Their
Applications, CLA 2022, Tallinn, Estonia, June 20–22, 2022, Proceedings, pp. 117–132.
∗ Corresponding author.
✉ amedeo.napoli@loria.fr (A. Napoli)
ORCID: 0000-0002-9317-8785 (A. Buzmakov); 0000-0002-2144-3308 (E. Dudyrev); 0000-0003-3284-9001 (S. O. Kuznetsov); 0000-0002-6724-3803 (T. Makhalova); 0000-0001-5236-9561 (A. Napoli)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
these five elements have with one another, and the relations with implications and association
rules.
   For example, the number of intents gives the size of the lattice, while the number of pseudo-
intents gives the size of the Duquenne-Guigues basis [3], and thus the size of the minimal
implication basis representing the whole lattice. The size of the covering relation of the concept
lattice gives the size of the “base” of association rules. Moreover, passkeys are indicators related
to the closure structure and the closure index indicates the number of levels in the structure.
The closure structure represents a dataset, so that closed itemsets are assigned to the level of the
structure given by the size of their passkeys. The complexity of the dataset can then be read from
the number of levels and from the distribution of itemsets w.r.t. frequency at each level.
The most interesting levels are the “lower” ones, i.e., those with the lowest closure index, as
they usually contain itemsets with high frequency, in contrast to the higher levels, which contain
itemsets with quite low frequency. Indeed, short minimum keys or passkeys correspond to
implications in the related equivalence class with minimal left-hand side (LHS) and maximal
right-hand side (RHS), which are the most informative implications [4, 5].
   In this paper we discuss alternative ways of defining the “complexity” of a dataset and how it
can be measured in the related concept lattice that can be computed from this dataset. For doing
so, we introduce two main indicators, namely (i) the probability that two concepts 𝐶1 and 𝐶2 are
comparable, (ii) given two intents 𝐴 and 𝐵, the probability that the union of these two intents is
again an intent. The first indicator “measures” how close the lattice is to a chain, and the second
indicator “measures” how close the lattice is to a distributive one [6, 7]. Indeed, a distributive
lattice may be considered as less complex than an arbitrary lattice, since, given two intents 𝐴
and 𝐵, their meet 𝐴 ∩ 𝐵 and their join 𝐴 ∪ 𝐵 are also intents. Moreover, in a distributive lattice,
all pseudo-intents are of size 1, meaning that every implication in the Duquenne-Guigues base
has a premise of size 1. Following the same line, given a set of 𝑛 attributes, the Boolean lattice
℘(𝑛) is the largest lattice that one can build from a context of size 𝑛 × 𝑛, but ℘(𝑛) can also be
considered as a simple lattice, since it can be represented by the set of its 𝑛 atoms. In addition,
the Duquenne-Guigues implication base is empty, so there are no nontrivial implications in
this lattice. Finally, a Boolean lattice is also distributive, thus it is simple in terms of the join of
intents.
   This paper presents an original and practical study about the complexity of a dataset through
an analysis of specific elements in the related concept lattice, namely intents, pseudo-intents,
proper premises, keys, and passkeys. Direct links are drawn with implications and association
rules, making also a bridge between the present study in the framework of FCA, and approaches
more related to data mining, actually pattern mining and association rule discovery. Indeed, the
covering relation of the concept lattice provides a concise representation of the set of association
rules of the context [4, 5], so that every element of the covering relation, i.e., a pair of neighboring
concepts or an edge of the concept lattice, stands for an association rule, and reciprocally, every
association rule can be given by a set of such edges. The frequency distribution of the confidence
of these edges can be considered as an important feature of the lattice as a collection of association rules.
   To study this complexity in practice, we have conducted a series of experiments where we
measure the distribution of the different elements for real-world datasets and then for related
randomized datasets. Actually these randomized datasets are based on corresponding real-world
datasets where either the distribution of crosses in columns is randomized or the whole set of
crosses is randomized while keeping the density of the dataset. We can observe that randomized
datasets are usually more complex in terms of our indicators than real-world datasets. This
means that, in general, the set of “interesting elements” in the lattice is smaller in real-world
datasets.
  The paper is organized as follows. In the second section we introduce the theoretical back-
ground and necessary definitions. Then the next section presents a range of experiments
involving real-world and randomized datasets. Finally, the experimental results are discussed
and a conclusion is proposed.


2. Theoretical Background
2.1. Classes of Characteristic Attribute Sets
Here we recall basic FCA definitions related to concepts, dependencies, and their minimal
representations. After that we illustrate the definitions with a toy example. Let us consider a
formal context 𝐾 = (𝐺, 𝑀, 𝐼 ) and prime operators:

                              𝐴′ = {𝑚 ∈ 𝑀 ∣ ∀𝑔 ∈ 𝐴 ∶ 𝑔𝐼 𝑚},        𝐴⊆𝐺                               (1)
                               𝐵′ = {𝑔 ∈ 𝐺 ∣ ∀𝑚 ∈ 𝐵 ∶ 𝑔𝐼 𝑚},      𝐵⊆𝑀                                (2)

   We illustrate the next definitions using an adaptation of the “four geometrical figures and their
properties” context [8], which is presented in Table 1. The set of objects 𝐺 = {𝑔1 , 𝑔2 , 𝑔3 , 𝑔4 } corre-
sponds to {equilateral triangle, right triangle, rectangle, square} and the set of attributes
𝑀 = {𝑎, 𝑏, 𝑐, 𝑑, 𝑒} corresponds to {has 3 vertices, has 4 vertices, has a right angle, equilateral,
e} (“e” is empty and introduced for the needs of our examples). The related concept lattice is
shown in Figure 1.
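The two prime operators of Eqs. (1)-(2) can be illustrated on the context of Table 1. The following sketch encodes the context as a dictionary (the data encoding and function names are ours, not from the paper):

```python
# The context of Table 1: object -> set of attributes.
CONTEXT = {
    "g1": {"a", "d"},
    "g2": {"a", "c"},
    "g3": {"b", "c"},
}
CONTEXT["g4"] = {"b", "c", "d"}
M = {"a", "b", "c", "d", "e"}   # attribute "e" holds for no object

def prime_objects(A):
    """A' for A ⊆ G: attributes shared by all objects in A (Eq. 1)."""
    attrs = set(M)
    for g in A:
        attrs &= CONTEXT[g]
    return attrs

def prime_attrs(B):
    """B' for B ⊆ M: objects having all attributes in B (Eq. 2)."""
    return {g for g, attrs in CONTEXT.items() if B <= attrs}

print(sorted(prime_attrs({"b", "c"})))      # ['g3', 'g4']
print(sorted(prime_objects({"g3", "g4"})))  # ['b', 'c']
```

Applying the two operators in sequence gives the closure operator (·)″ used throughout the paper.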

Definition 1 (Intent or closed description). A subset of attributes 𝐵 ⊆ 𝑀 is an intent or is closed
iff 𝐵″ = 𝐵.

   In the running example (Table 1), 𝐵 = {𝑏, 𝑐} = 𝐵″ is an intent and is the maximal subset of
attributes describing the subset of objects 𝐵′ = {𝑔3 , 𝑔4 }.

Definition 2 (Pseudo-intent). A subset of attributes 𝑃 ⊆ 𝑀 is a pseudo-intent iff:
   1. 𝑃 ≠ 𝑃 ″
   2. 𝑄 ″ ⊂ 𝑃 for every pseudo-intent 𝑄 ⊂ 𝑃

    Pseudo-intents are premises of implications of the cardinality-minimal implication basis
called “Duquenne-Guigues basis” [3] (DG-basis, also known as “canonical basis” or “stembase”
[1]). In the current example (Table 1), the set of pseudo-intents is {{𝑏}, {𝑒}, {𝑐, 𝑑}, {𝑎, 𝑏, 𝑐}} since:
(i) {𝑏}, {𝑒}, {𝑐, 𝑑} are minimal non-closed subsets of attributes, and (ii) {𝑎, 𝑏, 𝑐} is both non-closed
and contains the closure {𝑏, 𝑐} of the pseudo-intent {𝑏}.
         a   b   c   d   e
    𝑔1   x           x
    𝑔2   x       x
    𝑔3       x   x
    𝑔4       x   x   x

Table 1: The adapted context of geometrical figures [8].

Figure 1: The corresponding lattice of geometrical figures (line diagram omitted; its nodes are labeled with the attributes a, b, c, d, e and the objects g1, g2, g3, g4).

Definition 3 (Proper premise). A set of attributes 𝐴 ⊆ 𝑀 is a proper premise iff:

                                       𝐴 ∪ ⋃_{𝑛∈𝐴} (𝐴 ⧵ {𝑛})″ ≠ 𝐴″

    In the running example (Table 1), 𝑄 = {𝑎, 𝑏} is a proper premise since the union of 𝑄 with the
closures of its subsets does not result in the closure of 𝑄, i.e., {𝑎, 𝑏}∪{𝑎}″ ∪{𝑏}″ = {𝑎, 𝑏}∪{𝑎}∪{𝑏, 𝑐} =
{𝑎, 𝑏, 𝑐} ≠ {𝑎, 𝑏, 𝑐, 𝑑, 𝑒}.
    Proper premises are premises of the so-called “proper-premise base” (PP-base, see [1, 9]) or
“direct canonical base” [10, 11]. The PP-base is a “direct” or “iteration-free base of implications”,
meaning that we can obtain all possible implications with a single application of Armstrong
rules to implications in PP-base.

Definition 4 (Generator). A set of attributes 𝐷 ⊆ 𝑀 is a generator of an intent 𝐵 ⊆ 𝑀 iff 𝐷 ″ = 𝐵.

   In this paper, every subset of attributes is a generator of a concept intent. A generator is
called non-trivial if it is not closed. In the current example (Table 1), 𝐷 = {𝑎, 𝑏, 𝑑} is a generator
of 𝐵 = {𝑎, 𝑏, 𝑐, 𝑑, 𝑒} since 𝐵 is an intent, 𝐷 ⊆ 𝐵, and 𝐷 ″ = 𝐵.

Definition 5 (Minimal generator, key). A set of attributes 𝐷 ⊆ 𝑀 is a key or a minimal generator
of 𝐷 ″ iff ∄𝑚 ∈ 𝐷 ∶ (𝐷 ⧵ {𝑚})″ = 𝐷 ″ .

   In the following we will use “key” rather than “minimal generator”. A key is inclusion-minimal
in the equivalence class of subsets of attributes having the same closure [4, 5]. In the current
example (Table 1), 𝐷 = {𝑎, 𝑐, 𝑑} is a key since none of its subsets {𝑎, 𝑐}, {𝑎, 𝑑}, {𝑐, 𝑑} generates the
intent 𝐷 ″ = {𝑎, 𝑏, 𝑐, 𝑑, 𝑒}. Every proper premise is a key, however the converse does not hold in
general.

Definition 6 (Minimum generator, passkey). A set of attributes 𝐷 ⊆ 𝑀 is a passkey or a minimum
generator iff 𝐷 is a minimal generator of 𝐷 ″ and 𝐷 has the minimal size among all minimal
generators of 𝐷 ″ .
   In the following we will use “passkey” rather than “minimum generator”. A passkey is
cardinality-minimal in the equivalence class of subsets of attributes having the same closure. It
should be noticed that there can be several minimum generators, one of which is chosen as a passkey,
but the minimal size is unique. In [2] the maximal size of a passkey of a given context was
studied as an index of the context complexity. In the current example (Table 1), 𝐷 = {𝑏, 𝑑} is a
passkey of the intent {𝑏, 𝑐, 𝑑} since there is no other generator of smaller cardinality generating
𝐷 ″ . Meanwhile 𝐷 = {𝑎, 𝑐, 𝑑} is not a passkey of 𝐷 ″ = {𝑎, 𝑏, 𝑐, 𝑑, 𝑒} since the subset 𝐸 = {𝑒} has a
smaller size and the same closure, i.e., 𝐸 ″ = 𝐷 ″ .
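For a context as small as that of Table 1, all these attribute sets can be enumerated by brute force over the 2^|𝑀| subsets. The sketch below (encoding and names are ours) computes the intents, keys, and passkeys; note that it collects all minimum-size keys of an equivalence class, whereas the paper chooses one of them as the passkey:

```python
from itertools import combinations

CONTEXT = {"g1": {"a", "d"}, "g2": {"a", "c"},
           "g3": {"b", "c"}, "g4": {"b", "c", "d"}}
M = frozenset("abcde")

def closure(B):
    """B'' : attributes common to all objects having every attribute of B."""
    intent = set(M)
    for g, attrs in CONTEXT.items():
        if B <= attrs:
            intent &= attrs
    return frozenset(intent)   # stays M when no object contains B

subsets = [frozenset(c) for r in range(len(M) + 1)
           for c in combinations(sorted(M), r)]
intents = {B for B in subsets if closure(B) == B}
# a key is minimal in its equivalence class: no attribute can be dropped
keys = {B for B in subsets
        if all(closure(B - {m}) != closure(B) for m in B)}

# a passkey is a key of minimum cardinality within its equivalence class
min_size = {}
for k in keys:
    c = closure(k)
    min_size[c] = min(min_size.get(c, len(M)), len(k))
passkeys = {k for k in keys if len(k) == min_size[closure(k)]}

print(len(intents), len(keys), len(passkeys))   # 9 12 10
```

The counts agree with Figure 2: 9 intents and 12 keys, e.g., {𝑎, 𝑐, 𝑑} is a key but not a passkey of its closure, while {𝑏, 𝑑} is a passkey of {𝑏, 𝑐, 𝑑}.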
   Finally, for illustrating all these definitions, we form the context (2𝑀 , 𝑀𝑑 , 𝐼𝑑 ) of all
classes of “characteristic attribute sets” of 𝑀 as they are introduced above. 𝑀𝑑 =
{intent, pseudo-intent, proper premise, key, passkey}, while 𝐼𝑑 states that a given subset of at-
tributes in 2𝑀 is a characteristic attribute set in 𝑀𝑑 . The concept lattice of this context is shown
in Figure 2.

[Line diagram omitted. Node labels, from top to bottom: “subset of attributes / generator / description” (ae, be, ce, de, abd, abe, ace, ade, bce, bde, cde, abcd, abce, abde, acde, bcde); “closed description / intent” (bc, bcd, abcde); “minimal gen / key”; “pseudo-intent” (abc); “minimum gen / passkey” (bd); “proper premise” (ab, acd); bottom nodes: (a, c, d, ac, ad, ∅) and (b, e, cd).]
Figure 2: The concept lattice of “characteristic attribute sets” of the context introduced in Table 1.



2.2. Towards Measuring Data Complexity
“Data complexity” can mean many different things depending on the particular data analysis
problem under study. For example, data can be claimed to be complex when data processing takes
a very long time, and this could be termed as “computational complexity” of data. Alternatively,
data can be considered as complex when data are hard to analyze and to interpret, and this could
be termed as “interpretability complexity” of data. For example, it can be hard to apply FCA or
machine learning algorithms, such as clustering, classification, or regression. Accordingly, it is
quite hard to define data complexity in general terms.
   If we consider the dimension of interpretability, then the size of the “patterns” to interpret
and their number are definitely important elements to take into account. In the following, the
expression “pattern” refers to “interesting subsets of attributes” in a broad sense. In an ideal
case, one prefers a small number of patterns, to facilitate interpretation. Indeed, a small number
of rules with a few attributes in the premises and in the conclusions is simpler to interpret than
hundreds of rules with more than ten attributes in the premises and conclusions. Thus, it is
natural to study how the number of patterns is distributed w.r.t. their size. In most of the cases,
large numbers of patterns are associated with computational complexity. Then controlling the
size and the number of patterns is also a way to control computational complexity.
   It should also be mentioned that the number of patterns is related to the so-called “VC-
dimension” of a context [12], i.e., the maximal size of a Boolean sublattice generated from the
context. Accordingly, in this study about data complexity, we decided to count the number
of concepts, pseudo-intents, proper premises, keys, and passkeys, in order to understand and
evaluate the complexity of data. For all these “pattern types”, we also study the distribution of
pattern sizes.
   Additionally, we decided to measure the “lattice complexity”, i.e., the complexity of the
corresponding concept lattice, with two new measures related to what could be termed the
“linearity” of the lattice. Indeed, the simplest lattice structure that can be imagined is a chain,
while the counterpart is represented by the Boolean lattice, i.e., the lattice with the largest
amount of connections and concepts. However, it should be noticed that the Boolean lattice
may be considered as complex from the point of view of interpretability, but very simple from
the point of view of implication base, which is empty in such a lattice.
   Then, a first way to measure the closeness of the lattice to a chain is the “linearity index”
which is formally defined below as the probability that two random concepts are comparable in
the lattice.
Definition 7. Let us consider a concept lattice ℒ and ℒ ∗ = ℒ ⧵ {⊤, ⊥}, where |ℒ | denotes the number
of concepts in ℒ, ⊤ and ⊥ being the top and bottom elements of ℒ. The linearity index LIN (ℒ )
is defined as:

    LIN (ℒ ) = (2 / (|ℒ ∗ | ⋅ (|ℒ ∗ | − 1))) ⋅ ∑_{{𝑐𝑖 ,𝑐𝑗 }⊂ℒ ∗ } 1(𝑐𝑖 < 𝑐𝑗 ∨ 𝑐𝑖 > 𝑐𝑗 )  if |ℒ | > 3,  and  LIN (ℒ ) = 1  otherwise,      (3)
where {𝑐𝑖 , 𝑐𝑗 } is an unordered pair of concepts, and 1 is the indicator function taking the value 1
when the related constraint is true.
   The linearity index is maximal for a chain, i.e., the lattice related to a linear order. It is
minimal for the lattice related to a nominal scale which is also the lattice related to a bijection.
One example of such a lattice is given by the so-called M3 lattice which includes a top and
a bottom element, and three incomparable elements. In particular, when a lattice includes a
sublattice such as M3 it is not distributive [6, 7].
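Definition 7 can be read directly as code when concepts are represented by their intents, comparability being (reverse) inclusion of intents. The sketch below uses our own encoding and the hand-computed list of intents of the Table 1 context as input; it assumes a lattice with a unique smallest and a unique largest intent:

```python
from itertools import combinations

def linearity(intents):
    """LIN of Definition 7, over the lattice without its top and bottom."""
    if len(intents) <= 3:
        return 1.0
    top = min(intents, key=len)   # intent of the top concept (smallest intent)
    bot = max(intents, key=len)   # intent of the bottom concept (largest intent)
    inner = [B for B in intents if B not in (top, bot)]
    pairs = list(combinations(inner, 2))
    # frozensets compare with < / > as proper inclusion
    comparable = sum(1 for a, b in pairs if a < b or b < a)
    return comparable / len(pairs)

# intents of the Table 1 context (computed by hand)
INTENTS = [frozenset(s) for s in
           ["", "a", "c", "d", "ac", "ad", "bc", "bcd", "abcde"]]
print(round(linearity(INTENTS), 3))   # 0.381
```

Among the 21 pairs of inner concepts of the toy lattice, 8 are comparable, hence LIN ≈ 0.38, far from the value 1 of a chain.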
   This index does not directly measure how well the lattice is interpretable. One of the main
interpretability properties is the size of some particular sets, such as the size and the structure
of the implication basis. One of the simplest structures for the implication basis can be found in
distributive lattices, where pseudo-intents are all of size 1 [6]. Accordingly, the “distributivity
index” measures how close a lattice is to a distributive one.
Definition 8. Given a lattice ℒ, the distributivity index DIST (ℒ ) is defined as
    DIST (ℒ ) = (2 / (|ℒ | ⋅ (|ℒ | − 1))) ⋅ ∑_{{𝑘𝑖 ,𝑘𝑗 }⊂𝐼 𝑛𝑡𝑒𝑛𝑡𝑠(ℒ ), 𝑖≠𝑗} 1(𝑘𝑖 ∪ 𝑘𝑗 ∈ 𝐼 𝑛𝑡𝑒𝑛𝑡𝑠(ℒ )),      (4)

where 𝐼 𝑛𝑡𝑒𝑛𝑡𝑠(ℒ ) is the set of concept intents in ℒ.
   The distributivity index is maximal for distributive lattices, and this includes chain lattices
which are distributive lattices [6, 7]. Again it is minimal for lattices of nominal scales which
are not distributive. Although it may sound strange to consider the lattices of nominal scales
as complex, they are not simple from the viewpoint of implications. For example, any pair of
attributes from the M3 lattice –introduced above– can form the premise of an implication with
a non-empty conclusion. This introduces many implications in the basis and makes the
DG-basis hard to interpret.
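Definition 8 can be sketched in the same style (our encoding; the intents of the Table 1 context are hand-computed):

```python
from itertools import combinations

def distributivity(intents):
    """DIST of Definition 8: fraction of intent pairs whose union is an intent."""
    intent_set = set(intents)
    pairs = list(combinations(intent_set, 2))
    closed_unions = sum(1 for a, b in pairs if (a | b) in intent_set)
    return closed_unions / len(pairs)

# intents of the Table 1 context (computed by hand)
INTENTS = [frozenset(s) for s in
           ["", "a", "c", "d", "ac", "ad", "bc", "bcd", "abcde"]]
print(round(distributivity(INTENTS), 3))   # 0.722
```

For the toy lattice, 26 of the 36 intent pairs have a closed union (e.g., {𝑎} ∪ {𝑐} = {𝑎, 𝑐} is an intent, while {𝑎} ∪ {𝑏, 𝑐} = {𝑎, 𝑏, 𝑐} is not), so DIST ≈ 0.72.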

2.3. Synthetic Complex Data
In order to study some ways of measuring data complexity, we need to compare the behavior of
different complexity indices for “simple” and “complex” data. However, beforehand we cannot
know which dataset is complex. Accordingly, we will generate synthetic complex datasets and
compare them with real-world datasets. One way of generating complex data is “randomization”.
Actually, randomized data cannot be interpreted since any possible result is an artifact of the
method. For randomized data we know beforehand that there cannot exist any rule or concept
that has some meaning. Thus, randomized data are good candidates for being considered
as “complex”.
   Now we discuss which randomization strategy should be used for generating such data. A
natural way is making randomized data similar in a certain sense to the real-world data they are
compared to. Firstly, when considering reference real-world data, it seems natural to keep the
same number of objects and attributes as they are in the real-world data. Moreover, the “density”
of the context, i.e., the number of crosses, significantly affects the size and the structure of the
related concept lattice. Thus, to ensure that randomized data are “similar” to the real-world
data it is also natural to keep the density of data. This gives us the first randomization strategy,
i.e., for any real-world dataset we can generate a randomized dataset with the same number of
objects and attributes, and with the same density. Then, the crosses of the original context are
randomly spread over the new context, ensuring that the density is the same as in the
original real-world data.
   For example, let us consider the context given in Table 2, where there are 8 objects, 6 attributes,
and 25 crosses. Thus, any context with 8 objects, 6 attributes, and 25 crosses can be considered
as a randomization of this original context. In our first randomization strategy, we suppose that
all such randomized contexts are equally probable.
 descriptions   generator   intent   key   passkey   pseudo-intent   proper premise
           67       x          x      x       x
           45       x          x
           41       x                 x       x            x               x
          125       x                 x       x                            x
            1       x                 x                    x               x
           25       x                 x                                    x
           33       x                                      x
      1048239       x

Table 2
The context corresponding to the lattice of descriptions for the Bob Ross dataset (https://datahub.io/five-thirty-eight/bob-ross).
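This first, density-preserving strategy can be sketched as follows (a minimal illustration assuming a uniform choice of cross positions; the function name and the fixed seed are ours):

```python
import random

def randomize_density(n_objects, n_attrs, n_crosses, rng=None):
    """Scatter n_crosses uniformly over a fresh n_objects x n_attrs grid."""
    rng = rng or random.Random(0)
    cells = [(g, m) for g in range(n_objects) for m in range(n_attrs)]
    # sample without replacement: each cell holds at most one cross
    return set(rng.sample(cells, n_crosses))

# a randomization of a context with 8 objects, 6 attributes, 25 crosses
crosses = randomize_density(8, 6, 25)
```

Since `random.sample` draws cells without replacement, every context with the prescribed size and density is generated with equal probability, as the strategy requires.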



   The randomized formal contexts for such a strategy were studied in [13]. The authors have
shown that the correlation between the number of concepts and the number of pseudo-intents
has a non-random structure, suggesting that fixing density is not enough to generate randomized
data which are similar to the real-world ones. Accordingly, we also studied a randomization
strategy that keeps the number of objects as follows. A randomized context is generated attribute
by attribute. The number of crosses in every column remains the same as in the corresponding
“real-world attribute” but the crosses are randomly assigned. This can be viewed as a permutation
of the crosses within every column in the randomized context, every column being permuted
independently of the others. Such a procedure corresponds, in statistical terms, to the null
hypothesis of independence between attributes.
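This column-wise permutation strategy can be sketched as follows (a minimal illustration on a small binary table; the names and the fixed seed are ours):

```python
import random

def randomize_columns(rows, rng=None):
    """Permute the crosses within every column independently,
    keeping the per-attribute cross counts of the original context."""
    rng = rng or random.Random(0)
    cols = [list(col) for col in zip(*rows)]   # attribute columns
    for col in cols:
        rng.shuffle(col)                        # permute one column
    return [list(row) for row in zip(*cols)]    # back to object rows

table = [[1, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
shuffled = randomize_columns(table)
```

By construction every column of `shuffled` has the same number of crosses as the corresponding column of `table`; only the assignment of crosses to objects changes.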
   Although such a randomization strategy considers objects and attributes differently, it corresponds
to typical cases in data analysis. Indeed, in typical datasets, objects stand for observations
that are described by different attributes. The attributes correspond to any hypothesis in the
data domain. Then, analysts are usually interested in discovering some relations between
attributes, and the hypothesis of attribute independence is a natural null hypothesis in such a
setting.
   For example, let us consider again the context in Table 2, and assume that the numbers of objects
and attributes remain the same in the randomized context. Then a randomized context following the
second strategy is any context having 8 crosses for the first attribute, 2 crosses for the second
attribute, 5 crosses for the third attribute, etc. Accordingly, when randomizing data
from a given real-world dataset, we should keep in mind that many randomized datasets can
be generated w.r.t. the same randomization strategy. Thus, for the sake of objectivity, it is not
enough to study only one random dataset for a given real-world dataset, but it is necessary to
generate several randomized datasets and then to estimate the distribution of a characteristic
under study within all randomized datasets.
   In the next section we study different ways of measuring the complexity of a dataset and we
observe that the complexity of randomized datasets is generally higher than the complexity of
the corresponding real-world dataset.
3. Experiments
3.1. Datasets
For this preliminary study we selected 4 small real-world datasets in order to allow efficient
computation of all necessary elements of the lattice. Efficiency matters here because we involve
randomization and computations are repeated hundreds of times for each dataset. The study
includes the following datasets: “Live in water1 ”, “Tea ladies2 ”, “Lattice of lattice properties3 ”,
and “Bob Ross episodes4 ”. For the sake of efficiency the fourth dataset was restricted to the
first 20 attributes.
   The datasets are analyzed in two different ways. Firstly, the characteristic attribute sets are
computed, i.e., concepts, keys, passkeys, pseudo-intents, and proper premises, and the
relations existing between these elements are discussed. Secondly, we study the complexity
of a dataset through its characteristic attribute sets and their distributions w.r.t. the size of
these attribute sets. We also compare the two numerical indicators, namely the linearity
and the distributivity indices, for real-world and randomized datasets.

3.2. Characteristic Attribute Sets in a Lattice
In this section we study the relations between the different characteristic attribute sets. The
experiment pipeline reads as follows.
     • Given a context 𝐾 = (𝐺, 𝑀, 𝐼 ), we compute all possible “attribute descriptions”, i.e., subsets
       of attributes in 2𝑀 , and we check whether a description verifies some characteristic such
       as being an intent, a key, a passkey….
     • Given 𝐴𝐷 = {𝑔𝑒𝑛, 𝑖𝑛𝑡𝑒𝑛𝑡, 𝑘𝑒𝑦, 𝑝𝑎𝑠𝑠𝑘𝑒𝑦, 𝑝𝑠𝑒𝑢𝑑𝑜-𝑖𝑛𝑡𝑒𝑛𝑡, 𝑝𝑟𝑜𝑝𝑒𝑟-𝑝𝑟𝑒𝑚𝑖𝑠𝑒}, we construct the
       context (2𝑀 , 𝐴𝐷, 𝐼𝐴𝐷 ) where 𝐼𝐴𝐷 is the relation indicating that a given set of attributes
       has a characteristic in 𝐴𝐷.
     • Finally, we construct the “lattice of descriptions” based on the context (2𝑀 , 𝐴𝐷, 𝐼𝐴𝐷 ). This
       lattice shows the relations existing between all generators –subsets of attributes– in a
       given dataset.
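The steps above can be sketched for the toy context of Table 1 (brute force over all subsets of 𝑀; pseudo-intents are omitted here since computing them requires the Duquenne-Guigues recursion; the encoding and names are ours):

```python
from itertools import combinations

CONTEXT = {"g1": {"a", "d"}, "g2": {"a", "c"},
           "g3": {"b", "c"}, "g4": {"b", "c", "d"}}
M = frozenset("abcde")

def closure(B):
    """B'' in the context above (stays M when no object contains B)."""
    intent = set(M)
    for g, attrs in CONTEXT.items():
        if B <= attrs:
            intent &= attrs
    return frozenset(intent)

def is_intent(B):
    return closure(B) == B

def is_key(B):
    return all(closure(B - {m}) != closure(B) for m in B)

def is_proper_premise(B):
    """Definition 3: B together with the closures of its maximal
    proper subsets does not reach the closure of B."""
    union = set(B)
    for m in B:
        union |= closure(B - {m})
    return frozenset(union) != closure(B)

# classify every description, as in the (2^M, AD, I_AD) context
subsets = [frozenset(c) for r in range(len(M) + 1)
           for c in combinations(sorted(M), r)]
counts = {
    "generator": len(subsets),       # every subset is a generator
    "intent": sum(map(is_intent, subsets)),
    "key": sum(map(is_key, subsets)),
    "proper premise": sum(map(is_proper_premise, subsets)),
}
print(counts)   # {'generator': 32, 'intent': 9, 'key': 12, 'proper premise': 5}
```

The resulting counts reproduce the class sizes visible in Figure 2 for the toy context; the same tallies over the 2²⁰ descriptions of the restricted “Bob Ross” context yield the sizes shown in Figure 3.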
   The “description context” for the “Bob Ross” dataset is given in Table 2 and the corresponding
lattice is shown in Figure 3. From the lattice we can check that any two classes of descriptions
may intersect if this is not forbidden by their definition (e.g., a description cannot be both an
intent and a pseudo-intent). Although such a lattice is computed for a particular dataset, this is
the general lattice structure which is obtained most of the time. In some small datasets it may
happen that some characteristic attribute sets are missing. For example, in the “Live in water”
lattice, the properties of being a key, being a passkey, and being a proper premise, all coincide
and collapse into one single node.
   It is also very interesting to analyze the proportions of the sizes of classes of descriptions. For
example, in the “Bob Ross” context restricted to 20 attributes, there are 2²⁰ = 1 048 576 possible descriptions,
1 https://upriss.github.io/fca/examples.html
2 https://upriss.github.io/fca/examples.html
3 https://upriss.github.io/fca/examples.html
4 https://datahub.io/five-thirty-eight/bob-ross
[Line diagram omitted. Node labels and sizes: “generator / subset of attributes / description”: # 1048576 (100.00%); “closed description / intent”: # 112 (0.01%); “key / minimal gen”: # 259 (0.02%); “pseudo-intent”: # 75 (0.01%); “passkey / minimum gen”: # 233 (0.02%); “proper premise”: # 175 (0.02%); lower nodes: # 67 (0.01%), # 149 (0.01%), # 42 (0.00%), # 41 (0.00%).]
Figure 3: The lattice of “attribute descriptions” for the “Bob Ross” dataset and their distributions.


but there are only 112 of them which are closed, and only 259 of them which are minimal
generators. Thus, the large majority of the descriptions are “useless” in the sense that they
do not correspond to any of the characteristic attribute subsets introduced above. In the next
subsection we consider the distributions of these characteristic attribute subsets.

3.3. Data Complexity
For analyzing data complexity, we start by comparing the numbers of elements in real-world data
and in randomized data. Figures 4 and 5 show the distributions of the different characteristic
elements for the “Bob Ross” dataset. Along the horizontal axis, the sizes of the elements
are shown, i.e., the number of attributes in the intent, pseudo-intent, key, etc. Along the vertical
axis, the numbers of elements of the corresponding sizes are shown.
   Red crosses show the values corresponding to the real-world data and the boxplots visualize the
values found in the randomized data. There were 100 randomizations and thus the boxplots are based
on these 100 values. A box corresponds to the middle 50% of these 100 values. In addition,
it should be noticed that these two figures differ in the randomization strategy. Figure 4
corresponds to density-based randomization while Figure 5 shows randomization based on
column permutations.
   From both figures we can observe that randomized data contain significantly larger numbers
of elements than the real-world data. Moreover, the sizes of the elements for randomized data
are larger than those for the real-world data. Similar figures can be built for “Tea Ladies” and
“Lattice of lattice properties” datasets. However, it is not possible to distinguish the real-world
dataset and the randomized data for the “Live in Water” dataset5 . This can be explained by the
fact that either the dataset does not contain deep dependencies, or that the dataset is too small,
i.e., the randomized dataset cannot be substantially different from the original one.
   Let us now study how the linearity index and the distributivity index measure the complexity
of a dataset. Figures 7 and 6 show the values of the linearity and distributivity indices,
respectively, w.r.t. different randomizations. From these figures we can see that the datasets
built by density-based randomization differ more from the real-world datasets than the
randomized datasets built by column-wise permutations. We also notice that the values
of the linearity and distributivity indices depend substantially on the density of the
corresponding context. Indeed, if we examine the randomized datasets, we can see that the
distributions of the linearity and distributivity indices are different. This can be explained both
by the context density and by the context size. Thus, there are no reference values for
these indices that would separate “complex” from “simple” data. However, comparing the value
of an index to the distribution of these indices over randomizations allows one to decide on the
complexity of the data. Finally, in all datasets but “Live in Water”, both the linearity and
distributivity indices have higher values for real-world datasets than for randomized datasets.
This shows again that real-world datasets are more structured than their randomized counterparts.
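The comparison of an observed index value against its distribution over randomizations can be made precise with an empirical p-value. The sketch below uses hypothetical index values purely for illustration (the actual linearity and distributivity values come from the experiments above):

```python
# Hypothetical index values: one value for the real dataset and 100
# values from randomized copies (illustrative numbers only).
real_index = 0.82
randomized = [0.55 + 0.002 * i for i in range(100)]  # 0.55 .. 0.748

# Empirical p-value: the fraction of randomized datasets whose index
# is at least as high as the real one. A small value suggests the
# real dataset is more structured than chance.
p_value = sum(v >= real_index for v in randomized) / len(randomized)
print(p_value)  # 0.0: no randomized value reaches the real one
```

This avoids fixing any absolute threshold on the index itself, which, as noted above, would not be meaningful across contexts of different density and size.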


4. Conclusion
In this paper we have introduced and studied “concise representations” of datasets given by
related contexts and concept lattices, and characteristic attribute sets based on equivalence
classes, i.e., intents, keys (minimal generators), passkeys (minimum generators), proper premises,
and pseudo-intents. We have also defined two new indices for measuring the complexity of
a dataset, namely the linearity index, which checks the direct dependencies between concepts,
i.e., how close a concept lattice is to a chain, and the distributivity index, which measures how
close a concept lattice is to a distributive lattice. In the latter, all pseudo-intents are of length 1,
leading to sets of simple implications. We have also carried out a series of experiments in which
we analyze real-world datasets and their randomized counterparts. As expected, the randomized
datasets are more complex than the real-world ones.
   Future work will improve this study in several directions: by studying more deeply
the role of both indices, the linearity index and the distributivity index; by analyzing
larger datasets; and, more importantly, by analyzing the complexity from the point of view of
the generated implications and association rules. This paper is another step in the analysis
of the complexity of datasets in the framework of FCA, and we believe that FCA can bring
substantial support for analyzing data complexity in general.



5
    All the figures that cannot be shown in this paper are available in the supplementary materials, see https://yadi.sk/i/8_5EEvY4zNi82g
Acknowledgments
The work of Sergei O. Kuznetsov on this paper was supported by the Russian Science Foundation
under grant 22-11-00323 and performed at HSE University, Moscow, Russia.




5. Appendix: Figures Related to Experiments
Figure 4: The numbers of elements for the “Bob Ross” dataset w.r.t. context randomization based on
density. Panels: (a) Intents, (b) Keys, (c) Passkeys, (d) Pseudo-intents, (e) Proper premises. Along the
horizontal axes the sizes of the elements are shown, e.g., the number of attributes in the intents,
pseudo-intents, keys, etc. Along the vertical axis the number of elements of the corresponding size is
shown.
Figure 5: The numbers of elements for the “Bob Ross” dataset w.r.t. context randomization based on
column-wise permutations. Panels: (a) Intents, (b) Keys, (c) Passkeys, (d) Pseudo-intents, (e) Proper
premises.
Figure 6: The distributivity index for different datasets and different randomizations. Top row (a–d):
density-based randomization; bottom row (e–h): randomization based on column permutations.
Datasets: Live in Water, Lattice, Tea Lady, Bob Ross.
Figure 7: The linearity index for different datasets and different randomizations. Top row (a–d):
density-based randomization; bottom row (e–h): randomization based on column permutations.
Datasets: Live in Water, Lattice, Tea Lady, Bob Ross.