Hermes: an efficient algorithm for building Galois sub-hierarchies Anne Berry1 , Marianne Huchard2 , Amedeo Napoli3 , and Alain Sigayret1 LIMOS - CNRS UMR 6158 - Université Clermont-Ferrand II (France)⋆⋆ 1 {berry, sigayret}@isima.fr 2 LIRMM - CNRS UMR 5506 - Université de Montpellier II - Montpellier (France) huchard@lirmm.fr 3 LORIA - CNRS UMR7503 - Vandoeuvre-lès-Nancy (France) amedeo.napoli@loria.fr Abstract. Given a relation R ⊆ O × A on a set O of objects and a set A of attributes, the Galois sub-hierarchy (also called AOC-poset) is the partial order on the introducers of objects and attributes in the corresponding concept lattice. We present a new efficient algorithm for building a Galois sub-hierarchy which runs in O(min{nm, nα }), where n is the number of objects or attributes, m is the size of the relation, and nα is the time required to perform matrix multiplication (currently α = 2.376). 1 Introduction Galois lattices (also called concept lattices) are a powerful tool for data modeling. Such a lattice is built on a relation between a set of objects and a set of attributes. The main drawback of this structure is that it may have an exponential size in the number n of objects or attributes. A canonical sub-order of the lattice, its Galois sub-hierarchy (GSH, also called AOC-Poset), of much smaller size, is recommended whenever possible. This GSH preserves only the key elements of the lattice: object-concepts and attribute-concepts (also called introducers); the number of these key elements is at most equal to the total number n of objects and attributes. Galois sub-hierarchies were introduced in software engineering by Godin et al. [10] for class hierarchy reconstruction and successfully applied in later research work (see e.g. [15]). The AOC-poset (Attribute/Object Concepts poset [9]) has been used in applications of FCA to non-monotonic reasoning and domain theory [12], and to produce classifications from linguistic data [18, 20]. Specific parts of the GSH (mainly attribute-concepts) have been used in several works, including approaches for refactoring a class hierarchy [14] and recently for extracting a feature tree from a set of products in Software Product Lines [21]. ⋆⋆ Research partially supported by the French Agency for Research under the DEFIS program TODO, ANR-09-EMER-010. c 2012 by the paper authors. CLA 2012, pp. 21–32. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–84–695–5252–0, Universidad de Málaga (Dept. Matemática Aplicada), Spain. 22 Anne Berry, Marianne Huchard, Amedeo Napoli and Alain Sigayret Three algorithms for building GSH already exist: Ares [7], Ceres [14], and Pluton [1]. Each of them has a time complexity of O(n3 ), and they are some- what complicated. Their comparative experimental running times were investi- gated in [1]. In this paper, we present a new algorithm for building Galois sub-hierarchies, which we call Hermes, with a better complexity. Hermes runs in O(nm) time, where m is the size of the relation, and is very easy to understand and imple- ment. With more effort invested in the implementation, Hermes can be made to run in O(nα ) (i.e. O(n2.376 )) time, which is the time for performing matrix mul- tiplication. Hermes works by simplifying and then extending the input relation into a relation which contains in a compact fashion all the necessary information on the elements of the GSH. The paper is organized as follows: after this introduction, we give some no- tations and definitions. Section 3 briefly outlines how previous algorithms work. Section 4 proves some preliminary results and presents the algorithmic tools nec- essary to ensure our good complexity. Section 5 describes and analyzes in detail the successive steps of our algorithmic process. Section 6 gives the algorithm. Section 7 describes the special case for chordal bipartite relations, where the final relation can easily be obtained in O(n2 ) time. We conclude in Section 8. 2 Definitions and notations The different communities handling Galois lattices and Galois sub-hierarchies use various notations. Here, we will use algebraic notations detailed below. Given two finite sets O (of ’objects’) and A (of ’attributes’), a binary relation R ⊆ O × A indicates which objects of O are associated with which attributes of A. O is called the starting set of the relation. We will denote n = |O| + |A| and m = |R|. For (x, y) ∈ R, we will say that x is an antecedent of y, and y is an image of x. For x ∈ O, R(x) = {y ∈ A | (x, y) ∈ R} is the image set (row) of x, and for y ∈ A, R−1 (y) = {x ∈ O | (x, y) ∈ R} is the antecedent set (column) of y. Note that notation x′ is used in FCA [9] for R(x) and R−1 (x). The term line will be indifferently used for row and column. The triple (O, A, R) is called a context. A maximal rectangle of R, also called a concept, is a maximal Cartesian sub-product of R, i.e. X × Y such that ∀x ∈ X, ∀y ∈ Y , (x, y) ∈ R and ∀w ∈ O − X, ∃y ∈ Y | (w, y) 6∈ R and ∀z ∈ A − Y , ∃x ∈ X | (x, z) 6∈ R. X is called the extent and Y the intent of concept X × Y , which is denoted (X, Y ). In our examples, we may omit set brackets when the meaning is clear. The extent and intent of concept C will be denoted Extent(C) and Intent(C). The concepts, ordered by inclusion on their extents (or dually by inclusion on their intents) form a lattice L(R) called a Galois lattice or a concept lattice. For two concepts C and C ′ , C L(R) Cy . Thus, the intent of object-concept Cx is R(x), and the extent of attribute- concept Cy is R−1 (y). Object-concepts and attribute-concepts are also called introducer concepts or simply introducers. Objects are introduced from bottom to top and attributes from top to bottom in L(R). A given concept may introduce several objects and/or attributes. Note that [9] uses arrow relations to characterize the relationship between attribute-concepts and object-concepts, but without referring to Galois sub-hierarchies. A relation is said to be clarified when it has no identical lines. A relation is said to be reduced when it is clarified and has no line which is the intersection of several other lines. When a relation is reduced, the irreducible elements of the lattice are exactly the introducers, whereas in a non-reduced relation there will be extra introducers. H(R) denotes the Galois sub-hierarchy (GSH) of relation R, defined by the set of introducer concepts ordered as in L(R). H(R) is then a sub-order of L(R). The elements of H(R) are generally labeled by the objects and/or attributes they introduce, defining the simplified labeling. The same simplified labeling applies to L(R), in which some concepts may have an empty label.