=Paper=
{{Paper
|id=Vol-2123/paper21
|storemode=property
|title=A New Method for Inheriting Canonicity Test Failures in Close-by-One
Type Algorithms
|pdfUrl=https://ceur-ws.org/Vol-2123/paper21.pdf
|volume=Vol-2123
|authors=Simon Andrews
|dblpUrl=https://dblp.org/rec/conf/cla/Andrews18
}}
==A New Method for Inheriting Canonicity Test Failures in Close-by-One
Type Algorithms==
A New Method for Inheriting Canonicity Test
Failures in Close-by-One Type Algorithms
Simon Andrews[0000−0003−2094−7456]
Conceptual Structures Research Group
Communication and Computing Research Centre
Department of Computing
Faculty of Arts, Computing, Engineering and Sciences
Sheffield Hallam University, Sheffield, UK
s.andrews@shu.ac.uk
Abstract. Close-by-One type algorithms are efficient algorithms for
computing formal concepts. They use a mathematical canonicity test to
avoid the repeated computation of the same concept, which is far more
efficient than methods based on searching. Nevertheless, the canonicity
test is still the most labour intensive part of Close-by-One algorithms
and various means of avoiding the test have been devised, including the
ability to inherit test failures at the next level of recursion. This paper
presents a new method for inheriting canonicity test failures in Close-
by-One type algorithms. The new method is simpler than the existing
method and can be amalgamated with other algorithm features to fur-
ther improve efficiency. The paper recaps an existing algorithm that does
not feature test failure inheritance and an algorithm that features the
existing method. The paper then presents the new method and a new
algorithm that incorporates it. The three algorithms are implemented on
a ‘level playing field’ with the same level of optimisation. Experiments
are carried out on the implemented algorithms, using a representative
range of data sets, to compare the number of inherited canonicity test
failures and the computation times. It is shown that the new algorithm,
incorporating the new method of inheriting canonicity test failures, gives
the best performance.
Keywords: Formal Concept Analysis · FCA · FCA algorithms · Com-
puting formal concepts · Canonicity test · Inheriting canonicity test
failures· Close-by-One · FCbO · In-Close
1 Introduction
In the development o fast algorithms to compute formal concepts, the discovery
of the so-called ‘canonicity test’, whereby the attributes in a concept could be
examined to determine its newness in the computation, gave rise to the orig-
inal Close-by-One (CbO) algorithm [8]. The canonicity test has proved to be
fundamental in the efficient computation of formal concepts, being far more ef-
ficient than previous methods of determining the newness of a concept based on
c paper author(s), 2018. Proceedings volume published and copyrighted by its editors.
Paper published in Dmitry I. Ignatov, Lhouari Nourine (Eds.): CLA 2018, pp.
255–266, Department of Computer Science, Palacký University Olomouc, 2018.
Copying permitted only for private and academic purposes.
256 Simon Andrews
searching, and was integral to the subsequent CbO algorithm presented in [6].
Nevertheless, the canonicity test is still the most labour intensive part of CbO-
type algorithms and various means of avoiding or improving the test have been
devised, giving rise to a number of advances in CbO-type algorithms includ-
ing FCbO [7, 9], In-Close2 [3] and In-Close4 [4]. FCbO introduced a combined
‘breadth and depth’ approach to computation that allowed child concepts to
fully inherit their parent’s attributes. In-Close2 then added a modified, ‘partial-
closure’, canonicity test to reduce the computation required in the test. FCbO
also introduced a technique whereby failed canonicity tests could be inherited,
thereby avoiding many canonicity tests. In-Close4 made use of empty intersec-
tions between the current concept extent and attribute-extents in the computa-
tion to also avoid canonicity tests.
This paper describes a new method of inheriting failed canonicity tests that
is simpler than the method used by FCbO. Furthermore, the method can be
amalgamated with existing efficiency features to further improve performance.
The rest of this paper is structured as follows: The paper will use the algo-
rithm In-Close4 [4] as the framework in which to incorporate the new inheri-
tance method, so Section 2 is a recap of that algorithm. Section 3 is a recap of
the FCbO algorithm, describing its method of inheriting failed canonicity tests.
Section 4 describes the new method of inheriting failed canonicity tests and in-
corporates it into In-Close4, creating a new algorithm, In-Close5. It should be
noted that In-Close1, In-Close2 and In-Close3 are previous versions of In-Close,
as presented in [2]. Section 5 describes the implementation of In-Close4, FCbO
and In-Close5 on a ‘level playing field’ using the same programming optimisa-
tions. Section 5 also shows how the new method of inheriting failed canonicity
tests can be amalgamated with existing efficiency features to further improve
performance. Section 6 presents a series of experiments and results, comparing
the performance of In-Close4, FCbO and In-Close5. Finally, Section 7 provides
some concluding remarks and ideas for further work.
2 Recap of the In-Close4 Algorithm
Below is a recap of the In-Close4 algorithm, as presented in [4]. In-Close4 com-
bines the efficiency of a partial-closure canonicity test [2] with full inheritance of
the parent intent. The full inheritance is achieved by adapting and incorporating
the combined breadth-first and depth-first approach of FCbO [7, 9]. The main
cycle is completed before passing to the next level, so that all the attributes of
a parent intent can be passed down to the next level. Child intents only have to
be finished off by adding attributes that are not in the parent intent. During the
main cycle, whilst the current intent is being closed, new extents that pass the
canonicity test are stored in a queue, similar to the queue in FCbO, to be pro-
cessed after the main cycle has completed. In-Close4 also makes use of empty
intersections when the current extent is intersected with the next attribute-
extent (next column) in the formal context: empty intersections are inherited so
New Method of Inheriting Test Failures 257
that they can be skipped at subsequent levels in the computation and, whenever
an empty intersection occurs, the algorithm forgoes the canonicity test.
The In-Close4 algorithm is invoked with an initial pair (A, B) = (X, ∅), where
A is a set of objects (extent) and B is a set of attributes (intent) and X is the
set of all objects in the formal context, and initial attribute y = 0. Y is the set of
all attributes in the formal context and Yj is the set of all attributes up to (but
not including) j. The algorithm is also invoked with an empty set of attributes,
P = ∅, in which to store subsequent empty intersections.
Note that forgoing the canonicity test after an empty intersection means
that the algorithm is incomplete, in that it will not compute the concept (Y, ∅).
However, it is a simple task to add it afterwards, if it exists: If Y ↓ = ∅ then add
(∅, Y ) to the set of computed concepts.
In-Close4
ComputeConceptsFrom((A, B), y, P )
1 for j ← y upto n − 1 do
2 if j ∈
/ B and j ∈ / P then
3 C ← A ∩ {j}↓
4 if C 6= ∅ then
5 if C = A then
6 B ← B ∪ {j}
7 else
8 if B ∩ Yj = C ↑j then
9 PutInQueue(C, j)
10 else
11 P ← P ∪ {j}
12 ProcessConcept((A, B))
13 Q←P
14 while GetFromQueue(C, j) do
15 D ← B ∪ {j}
16 ComputeConceptsFrom((C, D), j + 1, Q)
A line by line explanation of In-Close4 is as follows:
Line 1 - Iterate across the formal context, from a starting attribute y up to
attribute n − 1, where n is the number of attributes in the context.
Line 2 - Skip attributes already in B. Because intents inherit all of their
parent’s attributes, these can be skipped. Also skip any attributes in P as these
are inherited empty intersections - if the parent extent resulted in an empty
intersection, so will its children since they are all subsets of the parent.
Line 3 - Form an extent, C, by intersecting the current extent, A, with the
next attribute-extent (column of objects) in the context.
Line 4 - If the extent, C, is not empty...
258 Simon Andrews
Line 5 - If the extent, C, equals the extent of the concept whose intent is
currently being closed, A, then...
Line 6 - ...add the current attribute, j, to the intent being closed, B.
Line 7 - Otherwise, test the canonicity using the partial-closure canonicity
test [1]: ↑ is the standard closure operator in FCA and ↑j is a modification
meaning “close up to, but not including, attribute j”.
Line 8 - If the canonicity test is passed...
Line 9 - ...place the new extent, C, and the location where it was found, j,
in a queue for later processing.
Line 10 - If the extent, C, is empty...
Line 11 - ... add the current attribute to P so that the empty intersection
can be inherited.
Line 12 - Pass concept (A, B) to the notional procedure ProcessConcept to
process it in some way (for example, storing it in a data base of concepts).
Line 13 - Store P in Q ready to pass the attributes resulting in empty
intersections to the next level.
Line 14 - The queue is processed by obtaining from the queue each new
extent and the location it was found.
Line 15 - Each new partial intent, D, inherits all the attributes from its
completed parent intent, B, along with the attribute, j, where its extent was
found.
Line 16 - Recursively call ComputeConceptsFrom to compute child concepts
from j + 1 and to complete the intent D.
3 Recap of the FCbO Algorithm
Below is a recap of the FCbO algorithm [7, 9] as presented in [2]. FCbO intro-
duced the feature of inherited canonicity test failures to improve the performance
of CbO-type algorithms, along with the combined breadth/depth first approach
to enable full inheritance of parent intents. The inheritance of test failures is
achieved by recording intents that are not canonical as N j s, where j is the cur-
rent attribute, thus enabling subsequent levels to compare these failed intents
against the current one and thus avoid the computation of a repeated concept
without the need for the original canonicity test. FCbO is invoked with the ini-
tial concept (A, B) = (X, X ↑ ), initial attribute y = 0 and a set of empty N y s,
{N y = ∅ | y ∈ Y }.
Line 1 - Pass concept (A, B) to the notional procedure ProcessConcept to
process it in some way (for example, storing it in a set of concepts).
Line 2 - Iterate across the context, from starting attribute y up to attribute
n − 1.
Line 3 - M j is set to the latest intent that failed the canonicity test at
attribute j, N j .
Line 4 - Skip attributes in B and those that have an inherited record of
failure.
New Method of Inheriting Test Failures 259
FCbO
ComputeConceptsFrom((A, B), y, {N y | y ∈ Y })
1 ProcessConcept((A, B))
2 for j ← y upto n − 1 do
3 Mj ← Nj
4 / B and N j ∩ Yj ⊆ B ∩ Yj then
if j ∈
5 C ← A ∩ {j}↓
6 D ← C↑
7 if B ∩ Yj = D ∩ Yj then
8 PutInQueue ((C, D), j)
9 else
10 Mj ← D
11 while GetFromQueue((C, D), j) do
12 ComputeConceptsFrom((C, D), j + 1, {M y | y ∈ Y })
Line 5 - Otherwise, form an extent, C, by intersecting the current extent, A,
with the next column of objects in the context.
Line 6 - Close the extent to form an intent, D.
Line 7 - Perform the canonicity test.
Line 8 - If the concept is a new one, store it in a queue along with the
attribute it was computed at.
Line 10 - Otherwise set the record of failure for attribute j, M j , to the intent
that failed the canonicity test.
Line 11 - Get each stored concept from the queue...
Line 12 - ...and pass it to the next level, along with the stored starting at-
tribute for the next level and the failed intents from this level.
4 A New Method of Inheriting Failed Canonicity Tests
The method of inheriting failed canonicity tests employed by FCbO requires the
manipulation and storage of a two-dimensional array to represent intents that
fail the canonicity test. A total of n intents are required, and, although the use of
pointers in a optimised implementation avoids the need for copying intents, they
still need to be computed and stored. This results in computational overheads
so that, even though a significant number of canonicity test are avoided [9],
algorithms such as In-Close4 are still able to outperform FCbO [2, 4].
However, it is possible to obtain the inheritance of failed canonicity tests
with a simpler method. Firstly consider the criteria for failure in In-Close4: the
test will fail if there exists an attribute in C ↑j that is not in B ∩ Yj . In other
words, when there is an attribute before j (but not in the current intent, B)
who’s attribute-extent contains the extent, C - in which case the extent, C,
260 Simon Andrews
will already have been computed. Now consider the starting attribute, y, for
the current cycle (Line 1 of In-Close4). Let us say, in a failed canonicity test,
that the smallest attribute in C ↑j that is not in B ∩ Yj is i. If i ≥ y then an
extent, H, where C ⊆ H, will have been discovered in the current cycle at i (and
be waiting in the current queue). And there may be other extents, discovered
after i but before j that are also supersets of C and also in the queue. Thus, if
i ≥ y, the current attribute, j, will be required at the next level to be examined
by the children in the queue: C may be canonical with respect to one of the
children or j may be an attribute in the intent of a child and thus required to be
added. However, if i < y, the concept with extent C and its children will have
already been computed and processed. Thus no children in the current queue,
or subsequent children, need examine j. In other words, if i < y then j can be
inherited as a canonicity test failure - all subsequent children can skip j in the
cycle. All that is required is to maintain a set of such attributes that can be
passed down to the next level in the algorithm.
The new algorithm, In-Close5, below, is In-Close4 with the new method of
inheriting failed canonicity tests added. It is invoked in the same way as In-
Close4 but with the addition of an initially empty set of attributes, N = ∅, in
which to store canonicity test failures.
In-Close5
ComputeConceptsFrom((A, B), y, P, N )
1 for j ← y upto n − 1 do
2 if j ∈
/ B and j ∈ / P and j ∈ / N then
3 C ← A ∩ {j}↓
4 if C 6= ∅ then
5 if C = A then
6 B ← B ∪ {j}
7 else
8 if B ∩ Yj = C ↑j then
9 PutInQueue(C, j)
10 else
11 if min(C ↑j ) < y then
12 N ← N ∪ {j}
13 else
14 P ← P ∪ {j}
15 ProcessConcept((A, B))
16 Q←P
17 M ← N
18 while GetFromQueue(C, j) do
19 D ← B ∪ {j}
20 ComputeConceptsFrom((C, D), j + 1, Q, M )
New Method of Inheriting Test Failures 261
The new lines in In-Close5 are as follows:
Line 2 - As well as skipping inherited attributes in the intent, j ∈ / B, and
inherited empty intersections, j ∈ / P , the algorithm now also skips inherited
canonicity test failures, j ∈
/ N.
Line 11 - If the canonicity test (Line 8) is failed, a test is carried out com-
paring the smallest attribute in C ↑j with y. If the attribute is smaller than y
then...
Line 12 - ...j is added to the set of canonicity test failures, N .
Line 17 - Store N in M ready to pass the canonicity test failures to the next
level.
5 Implementation
The three algorithms, In-Close4, FCbO and In-Close5, were implemented in
ANCII C using the same data structures, data pre-processing and level of opti-
misation to create a ‘level playing field’ for comparing their performance. The
key optimisations are described below.
The use of Bit-Arrays Implementations of CbO-type algorithms, such as
In-Close and FCbO, typically use a bit-array to represent the formal context.
This allows operations on the formal context, such as closure operations, to
be implemented using bit-wise operators in the manner of fine-grained parallel
processing. In a typical 64-bit architecture, this means that 64 cells of the formal
context can be operated on simultaneously. Using bits to represent cells of the
formal context also allows more of the context to be retained in cache memory.
Using a Local Boolean Copy of the Current Intent Typical implementa-
tions of CbO-type algorithms maintain a global data structure to store integer
representations of concept intents (integers mapping to formal attributes) but,
at the same time, also use a Boolean (bit-array) representation of the current
intent to facilitate an efficient implementation of the test for inherited attributes,
j∈/ B.
Efficient Implementation of the Partial-Closure Canonicity Test in
In-Close Algorithms In practice, it is not necessary to always close the new
extent up to the current attribute. It is only necessary to find the first instance
where B ∩ Yj and C ↑j do not agree. Thus failure is typically detected before
j is reached, thus saving additional time. In FCbO, however, a full-closure, C ↑
is always required because, if the test is passed, it provides the closure of the
concept intent, or, if the test is failed, it provides the failed intent to be stored in
M j . In In-Close, new concept intents are closed at the next level, during the main
cycle, whenever C = A by B ← B ∪{j} (Lines 5 and 6 of In-Close4, for example).
Furthermore, given that the test C = A is provided at no computational cost, as
a by-product of the intersection in C ← A ∩ {j}↓ , the overheads of the closure
262 Simon Andrews
are close to zero. This also means that savings are made by In-Close algorithms
when canonicity tests succeed. Here, the partial closure, C ↑j is carried out up
to j, compared to the full closure, C ↑ , in FCbO.
Amalgamation of Efficiency Features in In-Close5 In implementation, the
set of inherited empty intersections, the set of inherited canonicity test failures
and the local, Boolean, copy of the current intent can be amalgamated into a
single bit-array, in effect reducing the test in Line 2 of In-Close5, j ∈
/ B And j ∈
/P
And j ∈ / N to a single test, j ∈/ Z, where Z = B ∪ P ∪ N . Lines 6, 12 and 14 will
all become Z ← Z ∪{j}, thus updating the same bit-array in the implementation
(of course the update of the global set of intents in the implementation, required
by Line 6, remains unchanged). Amalgamating the three sets of attributes also
means there are overhead savings made from reduced parameter passing.
6 Evaluation of Performance
In this section, In-Close4, FCbO and In-Close5 are evaluated by comparing their
performance over a varied range of data sets. The experiments are divided into
three groups: 1) real data sets, 2) artificial data sets, and 3) randomised data
sets. In each case, the time taken to compute all formal concepts is measured
along with the number of canonicity tests carried out.
The experiments were conducted on a standard 64-bit Intel architecture,
using a PC with an Intel Core i7-2600 3.40GHz CPU and 8GB of RAM. To
cater for any inconsistency of system performance, due to background system
processes, for example, each experiment was conducted multiple times and the
average time taken for each.
Real Data Set Experiments. Four real data sets were used in the experi-
ments: Mushroom, Adult and Internet Ads, taken from the UCI Machine Learn-
ing Repository [5] and Student, an anonymised data set from an internal student
experience survey carried out at Sheffield Hallam University, UK. The data sets
were selected to represent a broad range of features, in terms of size and density,
and the UCI ones, in particular, are well known and used in FCA work.
The results of the experiments are given in Table 1 (timings) and Table 2
(canonicity tests).
In-Close5 was fastest for the Mushroom, Adult and Student data sets, and
equal fastest, with In-Close4, for the Internet Ads data set. In-Close5 used the
fewest canonicity tests for the Adult and Internet Ads data sets and was not far
behind FCbO for the Mushroom and Student data sets.
Artificial Data Set Experiments. Artificial data sets were used that, al-
though randomised, the randomisation was constrained by properties of real
data sets, such as many-valued attributes having a pre-defined number of unique
New Method of Inheriting Test Failures 263
Table 1. Real data set results (timings in seconds).
Data set Mushroom Adult Internet Ads Student
|G| × |M | 8, 124 × 126 32, 561 × 124 3, 279 × 1, 565 587 × 145
Density 17.36% 11.29% 0.97% 24.50%
#Concepts 233,116 1,388,469 16,570 22, 760, 243
FCbO 0.23 1.46 0.21 8.80
In-Close4 0.19 0.88 0.07 4.65
In-Close5 0.18 0.85 0.07 4.31
Table 2. Real data set results (canonicity tests).
Data set Mushroom Adult Internet Ads Student
FCbO 331,106 2,029,933 363,568 40,630,663
In-Close4 429,974 1,707,707 91,029 53,162,649
In-Close5 332,449 1,667,052 67,715 41,048,752
values. Three data sets, M7X10G120K, M10X30G120K and T10I4D100K, were
used to provide a range of features in terms of size and density.
The timing results of the artificial data set experiments are given in Table
3 and the comparison of the number of canonicity tests carried out is given in
Table 4. For all three data sets, In-Close5 was quickest and performed the fewest
canonicity tests.
Table 3. Artificial data set results (timings in seconds).
Data set M7X10G120K M10X30G120K T10I4D100K
|G| × |M | 120, 000 × 70 120, 000 × 300 100, 000 × 1, 000
Density 10.00% 3.33% 1.01%
#Concepts 1,166,326 4,570,493 2,347,376
FCbO 1.35 15.45 23.83
In-Close4 0.77 5.60 6.56
In-Close5 0.69 5.35 5.81
Random Data Set Experiments. Three series of random data experiments
were carried out, testing the effect of variation of the number of attributes,
context density, and number of objects, respectively:
– Attributes series - with 5% density and 5,000 objects, the number of at-
tributes was varied between 300 and 1,000. The number of concepts varied
from approximately 1,000,000 to 22,000,000.
264 Simon Andrews
Table 4. Artificial data set results (canonicity tests).
Data set M7X10G120K M10X30G120K T10I4D100K
FCbO 4,640,906 167,814,522 75,281,105
In-Close4 2,360,015 29,686,007 21,262,544
In-Close5 2,339,951 26,593,944 14,907,484
– Objects series - with 5% density and 200 attributes, the number of objects
was varied between 30,000 and 100,000. The number of concepts varied from
approximately 4,000,000 to 22,000,000.
– Density series - with 200 attributes and 10,000 objects, the density of 1s in
the context was varied between 3 and 10%. The number of concepts varied
from approximately 200,000 to 19,000,000.
The results of the random data set timings are shown in the plots below. In all
three series, In-Close5 performed the fewest canonicity tests and was fastest. It
is interesting to note that In-Close4 often performed fewer canonicity tests than
FCbO (particularly apparent in the Object series). One might therefore deduce
that the Object series data sets gave rise to large numbers of empty intersections
- perhaps not surprising as the number of objects is increased at a relatively low
density in a randomised formal context.
Attribute Series (time) Attribute Series (tests)
#Canonicity tests ×108
300 FCbO 10 FCbO
In-Close4 In-Close4
Time (sec)
200 In-Close5 In-Close5
5
100
0 0
4 6 8 10 4 6 8 10
#Attributes ×100 #Attributes ×100
New Method of Inheriting Test Failures 265
Object Series (time) Object Series (tests)
#Canonicity tests ×108
FCbO FCbO
Time (sec) 30 In-Close4 In-Close4
In-Close5 2 In-Close5
20
1
10
0
4 6 8 10 4 6 8 10
#Objects ×10, 000 #Objects ×10, 000
Density Series (time) Density Series (tests)
#Canonicity tests ×107
FCbO FCbO
20 In-Close4 In-Close4
20
Time (sec)
In-Close5 In-Close5
10 10
0 0
4 6 8 10 4 6 8 10
Density (%) Density (%)
7 Conclusions
In conclusion, the performance of In-Close5 clearly demonstrates the efficiency
savings provided by the new method of inheriting canonicity test failures when
its results are compared to those of In-Close4 (the same algorithm but without
canonicity test failure inheritance). In-Close5 clearly outperforms FCbO, the
algorithm that features the existing method of inheriting canonicity test failures.
Although FCbO’s method inherits more test failures than the new method, the
simplicity of the new method warrants its attention as a useful contribution
to the area. It was shown in In-Close3 [2] that incorporating FCbO’s method
gave little improvement of performance, due to the computational overheads
of implementing it, whereas it is show here that the incorporation of the new
method does improve performance significantly.
An implementation of In-Close5 is available, free and open source, at https:
//sourceforge.net/projects/inclose/.
266 Simon Andrews
References
1. Andrews, S.: A partial-closure canonicity test to increase the efficiency of cbo-type
algorithms. In: Proceedings of the 21st International Conference on Conceptual
Structures. pp. 37–50. Springer (2014)
2. Andrews, S.: A best-of-breed approach for designing a fast algorithm for computing
fixpoints of galois connections. Information Sciences 295, 633–649 (2015)
3. Andrews, S.: In-close2, a high performance formal concept miner. In: Andrews,
S., Polovina, S., Hill, R., Akhgar, B. (eds.) Conceptual Structures for Discover-
ing Knowledge - Proceedings of the 19th International Conference on Conceptual
Structures (ICCS). pp. 50–62. Springer (2011)
4. Andrews, S.: Making use of empty intersections to improve the performance of cbo-
type algorithms. In: Proceedings of the 14th International Conference on Formal
Concept Analysis. pp. 56–71. Springer (2017)
5. Frank, A., Asuncion, A.: UCI machine learning repository:
http://archive.ics.uci.edu/ml (2010)
6. Krajca, P., Outrata, J., Vychodil, V.: Parallel recursive algorithm for FCA. In:
Belohavlek, R., Kuznetsov, S. (eds.) Proceedings of Concept Lattices and their Ap-
plications, 2008. pp. 71–82 (2008)
7. Krajca, P., Vychodil, V., Outrata, J.: Advances in algorithms based on CbO. In:
Kryszkiewicz, M., Obiedkov, S. (eds.) CLA 2010. pp. 325–337. University of Sevilla
(2010)
8. Kuznetsov, S.O.: Mathematical aspects of concept analysis. Mathematical Science
80(2), 1654–1698 (1996)
9. Outrata, J., Vychodil, V.: Fast algorithm for computing fixpoints of Galois con-
nections induced by object-attribute relational data. Information Sciences 185(1),
114–127 (Feb 2012). https://doi.org/10.1016/j.ins.2011.09.023