=Paper=
{{Paper
|id=None
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-959/proceedings-cla2011.pdf
|volume=Vol-959
}}
==None==
CLA 2011 Proceedings of the Eighth International Conference on Concept Lattices and Their Applications CLA Conference Series http://cla.inf.upol.cz INRIA Nancy – Grand Est and LORIA, France The Eighth International Conference on Concept Lattices and Their Applications CLA 2011 Nancy, France October 17–20, 2011 Edited by Amedeo Napoli Vilem Vychodil CLA 2011, October 17–20, 2011, Nancy, France. Copyright c 2011 by paper authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. Technical Editors: Jan Outrata, jan.outrata@upol.cz Vilem Vychodil, vychodil@acm.org Page count: xii+419 Impression: 100 Edition: 1st First published: 2011 Printed version published by INRIA Nancy – Grand Est and LORIA, France ISBN 978–2–905267–78–8 Organization CLA 2011 was organized by the INRIA Nancy – Grand Est and LORIA Steering Committee Radim Belohlavek Palacky University, Olomouc, Czech Republic Sadok Ben Yahia Faculté des Sciences de Tunis, Tunisia Jean Diatta Université de la Réunion, France Peter Eklund University of Wollongong, Australia Sergei O. Kuznetsov State University HSE, Moscow, Russia Michel Liquière LIRMM, Montpellier, France Engelbert Mephu Nguifo LIMOS, Clermont-Ferrand, France Program Chairs Amedeo Napoli INRIA NGE/LORIA, Nancy, France Vilem Vychodil Palacky University, Olomouc, Czech Republic Program Committee Jaume Baixeries Polytechnical University of Catalonia Jose Balcazar University of Cantabria and UPC Barcelona, Spain Radim Belohlavek Palacky University, Olomouc, Czech Republic Karell Bertet University of La Rochelle, France François Brucker University of Marseille, France Claudio Carpineto Fondazione Ugo Bordoni, Roma, Italy Jean Diatta Université de la Réunion, France Felix Distel TU Dresden, Germany Florent Domenach University of Nicosia, Cyprus Mireille Ducassé IRISA Rennes, France Alain Gély University of Metz, France Cynthia Vera Glodeanu TU Dresden, Germany Marianne Huchard LIRMM, Montpellier, France Vassilis G. Kaburlasos TEI, Kavala, Greece Stanislav Krajci University of P.J. Safarik, Kosice, Slovakia Sergei O. 
Kuznetsov State University HSE, Moscow, Russia Léonard Kwuida Zurich University of Applied Sciences, Switzerland Mondher Maddouri URPAH, University of Gafsa, Tunisie Rokia Missaoui UQO, Gatineau, Canada Lhouari Nourine LIMOS, University of Clermont Ferrand, France Sergei Obiedkov State University HSE, Moscow, Russia Manuel Ojeda-Aciego University of Malaga, Spain Jan Outrata Palacky University, Olomouc, Czech Republic Pascal Poncelet LIRMM, Montpellier, France Uta Priss Napier University, Edinburgh, United Kingdom Olivier Raynaud LIMOS, University of Clermont Ferrand, France Camille Roth EHESS, Paris, France Stefan Schmidt TU Dresden, Germany Baris Sertkaya SAP Research Center, Dresden, Germany Henry Soldano Université of Paris 13, France Gerd Stumme University of Kassel, Germany Petko Valtchev Université du Québec à Montréal, Canada Additional Reviewers Mikhail Babin State University HSE, Moscow, Russia Daniel Borchmann TU Dresden, Germany Peggy Cellier IRISA Rennes, France Sebastien Ferre IRISA Rennes, France Nathalie Girard University of La Rochelle, France Alice Hermann IRISA Rennes, France Mehdi Kaytoue INRIA NGE/LORIA, Nancy, France Petr Krajca Palacky University, Olomouc, Czech Republic Christian Meschke TU Dresden, Germany Petr Osicka Palacky University, Olomouc, Czech Republic Violaine Prince LIRMM, Montpellier, France Chedy Raissy INRIA NGE/LORIA, Nancy, France Yoan Renaud LIRIS, Lyon, France Heiko Reppe TU Dresden, Germany Lucie Urbanova Palacky University, Olomouc, Czech Republic Jean Villerd ENSAIA, Nancy, France Organization Committee Mehdi Kaytoue (chair) INRIA NGE/LORIA, Nancy, France Elias Egho INRIA NGE/LORIA, Nancy, France Felipe Melo INRIA NGE/LORIA, Nancy, France Amedeo Napoli INRIA NGE/LORIA, Nancy, France Chedy Raı̈ssi INRIA NGE/LORIA, Nancy, France Jean Villerd ENSAIA, Nancy, France Table of Contents Preface Invited Contributions Mathematical Morphology, Lattices, and Formal Concept Analysis . . . . . . 1 Isabelle Bloch Random concept lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Richard Emilion Galois and his Connections—A retrospective on the 200th birthday of Evariste Galois . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Marcel Erné Canonical extensions, Duality theory, and Formal Concept Analysis . . . . . 7 Mai Gehrke Galois connections and residuation: origins and developments II . . . . . . . . . 9 Bruno Leclerc Galois connections and residuation: origins and developments I . . . . . . . . . 11 Bernard Monjardet Metrics, Betweeness Relations, and Entropies on Lattices and Applications 13 Dan Simovici Long Papers Vertical decomposition of a lattice using clique separators . . . . . . . . . . . . . . 15 Anne Berry, Romain Pogorelcnik and Alain Sigayret Building up Shared Knowledge with Logical Information Systems . . . . . . . 31 Mireille Ducasse, Sebastien Ferre and Peggy Cellier Comparing performance of algorithms for generating the Duquenne- Guigues basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Konstantin Bazhanov and Sergei Obiedkov Filtering Machine Translation Results with Automatically Constructed Concept Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Yılmaz Kılıçaslan and Edip Serdar Güner Concept lattices in fuzzy relation equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 
75 Juan Carlos Dı́az and Jesús Medina-Moreno Adaptation knowledge discovery for cooking using closed itemset extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer Fast Computation of Proper Premises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 Uwe Ryssel, Felix Distel and Daniel Borchmann Block relations in fuzzy setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Jan Konecny and Michal Krupka A closure algorithm using a recursive decomposition of the set of Moore co-families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Pierre Colomb, Alexis Irlande, Olivier Raynaud and Yoan Renaud Iterative Software Design of Computer Games through FCA . . . . . . . . . . . . 143 David Llansó, Marco Antonio Gómez-Martı́n, Pedro Pablo Gomez-Martin and Pedro Antonio González-Calero Fuzzy-valued Triadic Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Cynthia Vera Glodeanu Mining bicluster of similar values with triadic concept analysis . . . . . . . . . . 175 Mehdi Kaytoue, Sergei Kuznetsov, Juraj Macko, Wagner Meira and Amedeo Napoli Fast Mining of Iceberg Lattices: A Modular Approach Using Generators . 191 Laszlo Szathmary, Petko Valtchev, Amedeo Napoli, Robert Godin, Alix Boc and Vladimir Makarenkov Boolean factors as a means of clustering of interestingness measures of association rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 Radim Belohlavek, Dhouha Grissa, Sylvie Guillaume, Engelbert Mephu Nguifo and Jan Outrata Combining Formal Concept Analysis and Translation to Assign Frames and Thematic Grids to French Verbs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 Ingrid Falk and Claire Gardent Generation algorithm of a concept lattice with limited access to objects . . 239 Christophe Demko and Karell Bertet Homogeneity and Stability in Conceptual Analysis . . . . . . . . . . . . . . . . . . . . 251 Paula Brito and Géraldine Polaillon A lattice-based query system for assessing the quality of hydro-ecosystems 265 Agnés Braud, Cristina Nica, Corinne Grac and Florence Le Ber The word problem in semiconcept algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 Philippe Balbiani Looking for analogical proportions in a formal concept analysis setting . . . 295 Laurent Miclet, Henri Prade and David Guennec Random extents and random closure systems . . . . . . . . . . . . . . . . . . . . . . . . . 309 Bernhard Ganter Extracting Decision Trees From Interval Pattern Concept Lattices . . . . . . 319 Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd A New Formal Context for Symmetric Dependencies . . . . . . . . . . . . . . . . . . 333 Jaume Baixeries Cheating to achieve Formal Concept Analysis over a large formal context 349 Vı́ctor Codocedo, Carla Taramasco and Hernán Astudillo A FCA-based analysis of sequential care trajectories . . . . . . . . . . . . . . . . . . . 363 Elias Egho, Nicolas Jay, Chedy Raissi and Amedeo Napoli Querying Relational Concept Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 Zeina Azmeh, Mohamed Hacéne-Rouane, Marianne Huchard, Amedeo Napoli and Petko Valtchev Links between modular decomposition of concept lattice and bimodular decomposition of a context . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393 Alain Gély Short Papers Abduction in Description Logics using Formal Concept Analysis and Mathematical Morphology: application to image interpretation . . . . . . . . . 405 Jamal Atif, Céline Hudelot and Isabelle Bloch A local discretization of continuous data for lattices: Technical aspects . . . 409 Nathalie Girard, Karell Bertet and Muriel Visani Formal Concept Analysis on Graphics Hardware . . . . . . . . . . . . . . . . . . . . . . 413 W. B. Langdon, Shin Yoo, and Mark Harman Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417 Preface The Eighth International Conference “Concept Lattices and Applications (CLA 2011)” is held in Nancy, France, from October 17th until October 20th, 2011. CLA 2011 is aimed at providing everyone interested in Formal Concept Analysis, and more generally in Concept Lattices or Galois Lattices (students, professors, researchers and engineers), with a global and advanced view of some of the latest research trends and applications in this field. As the diversity of the selected papers shows, there is a wide range of theoretical and practical research directions, around data and knowledge processing, e.g. data mining, knowledge discovery, knowledge representation, reasoning, pattern recognition, together with logic, algebra and lattice theory. This volume includes the selected papers and the abstracts of the 7 invited talks. This year there were initially 47 submissions, from which 27 papers were accepted as full papers and 3 papers as posters. We would like to thank here the authors for their work, often of very good quality, as well as the members of the program committee and the external reviewers, who did a great job, as can be seen in their reviews. This is one more witness of the growing quality and importance of CLA, highlighting its leading position in the field. Next, this year is a little bit special, as the bicentennial of the birth of Evariste Galois (1811–1832) is celebrated, particularly in France. Evariste Galois has something to do with Concept Lattices, as they are based on a so-called “Galois connection”, and some of the invited speakers will discuss these fundamental aspects of Concept Lattices. Moreover, this is also the occasion to thank the seven invited speakers who, we hope, will meet the wishes of the attendees. We would like to thank our sponsors, namely the CNRS GDR I3 and the Institut National Polytechnique de Lorraine (INPL). Then we would like to thank the steering committee of CLA for giving us the opportunity of leading this edition of CLA, the conference participants for their participation and support, and the people in charge of the organization, especially Anne-Lise Charbonnier, Nicolas Alcaraz and Mehdi Kaytoue, whose help was very precious on many occasions. Finally, we also do not forget that the conference was managed (quite easily) with the EasyChair system for paper submission, selection and reviewing, and that Jan Outrata has offered his files for preparing the proceedings. October 2011 Amedeo Napoli Vilem Vychodil Program Chairs of CLA 2011 Mathematical Morphology, Lattices, and Formal Concept Analysis Isabelle Bloch Telecom ParisTech, CNRS LTCI, Paris, France Abstract. Lattice theory has become a popular mathematical framework in different domains of information processing, and various communities employ its features and properties, e.g.
in knowledge representation, in logics, automated reasoning and decision making, in image processing, in information retrieval, in soft computing, in formal concept analysis. Mathematical morphology is based on adjunctions and on the algebraic framework of posets, and more specifically of complete lattices, which endows it with strong properties and allows for multiple applications and extensions. In this talk we will summarize the main concepts of mathematical morphology and show their instantiations in different settings, where a complete lattice can be built, such as sets, functions, partitions, fuzzy sets, bipolar fuzzy sets, formal logics . . . We will detail in particular the links between morphological operations and formal concept analysis, thus initiating links between two domains that were quite disconnected until now, which could therefore open new interesting perspectives. Random concept lattices Richard Emilion MAPMO, University of Orléans, France Abstract. After presenting an algorithm providing concepts and frequent concepts, we will study the random size of concept lattices in the case of a Bernoulli(p) context. Next, for random lines which are independent and identically distributed, or more generally outcomes of a Markov chain, we will show the almost everywhere convergence of the random closed intents towards deterministic intents. Finally we will consider the problem of predicting the number of concepts before choosing any algorithm. Galois and his Connections—A retrospective on the 200th birthday of Evariste Galois Marcel Erné University of Hannover, Germany Abstract. A frequently used tool in mathematics is what Oystein Ore called “Galois connections” (also “Galois connexions”, “Galois correspondences” or “dual adjunctions”). These are pairs (ϕ, ψ) of maps between ordered sets in opposite directions such that x ≤ ψ(y) is equivalent to y ≤ ϕ(x). This concept looks rather simple but proves very effective. The primary gain of such “dually adjoint situations” is that the ranges of the involved maps are dually isomorphic: thus, Galois connections present two sides of the same coin. Many concrete instances are given by what Garrett Birkhoff termed “polarities”: these are nothing but Galois connections between power sets. In slightly different terminology, the fundamental observation of modern Formal Concept Analysis is that every “formal context”, that is, any triple (J, M, I) where I is a relation between (the elements of) J and M, gives rise to a Galois connection (assigning to each subset of one side its “polar”, “extent” or “intent” on the other side), such that the resulting two closure systems of polars are dually isomorphic; more surprising is the fact that, conversely, every dual isomorphism between two closure systems arises in a unique fashion from a relation between the underlying sets. In other words: the complete Boolean algebra of all relations between J and M is isomorphic to that of all Galois connections between the power sets ℘(J) and ℘(M), and also to that of all dual isomorphisms between closure systems on J and M, respectively. The classical example is the Fundamental Theorem of Galois Theory, establishing a dual isomorphism between the complete lattice of all intermediate fields of a Galois extension and that of the corresponding automorphism groups, due to Richard Dedekind and Emil Artin.
In contrast to that correspondence, which does not occur explicitly in Galois’ succinct original articles, a few other closely related Galois connections may be discovered in his work (of course not under that name). Besides these historical forerunners, we discuss a few other highlights of mathematical theories where Galois connections enter in a convincing way through certain “orthogonality” relations, and show how the Galois approach considerably facilitates the proofs. For example, each of the following important structural isomorphisms arises from a rather simple relation on the respective ground sets: – the dual isomorphism between the subspace lattice of a finite-dimensional linear space and the left ideal lattice of its endomorphism ring – the duality between algebraic varieties and radical ideals – the categorical equivalence between ordered sets and Alexandroff spaces – the representation of complete Boolean algebras as systems of polars. Canonical extensions, Duality theory, and Formal Concept Analysis Mai Gehrke LIAFA CNRS – University of Paris 7, France Abstract. The theory of canonical extensions, developed by Jonsson and Tarski in the setting of Boolean algebras with operators, provides an algebraic approach to duality theory. Recent developments in this theory have revealed that in this algebraic guise duality theory is no more complicated outside than within the setting of Boolean algebras or distributive lattices. This has opened the possibility of exporting the highly developed machinery and knowledge available in the classical setting (e.g. in modal logic) to the completely general setting of partially ordered and non-distributive lattice ordered algebras. Duality theory in this setting is a special instance of the connection between formal contexts and concept lattices and thus allows methods of classical algebraic logic to be imported into FCA. This will be an introductory talk on the subject of canonical extensions with the purpose of outlining the relationship between the three topics of the title. Galois connections and residuation: origins and developments II Bruno Leclerc CAMS – École des Hautes Études en Sciences Sociales, Paris, France Abstract. From the seventies, the uses of Galois connections (and residuated/residual maps) multiplied in applied fields. Indeed Galois connections have been several times rediscovered for one or another purpose, for instance in fuzzy set theory or aggregation problems. In this talk, we illustrate the diversity of such applications. Of course, the many developments in Galois lattices and Formal Concept Analysis, with their rela- tion with Data Mining, will be only briefly evoked. Besides these developments, one finds, among other uses, alternative methods to study a correspondence (binary rela- tion) between two sets, models of classification and preferences, fitting and aggregation problems. Galois connections and residuation: origins and developments I Bernard Monjardet Centre d’Economie de la Sorbonne (University of Paris 1) and CAMS (Centre Analyse et Mathmatique Sociale), France Abstract. The equivalent notions of Galois connexions, and of residual and residuated maps occur in a great varieties of “pure” as well as “applied” mathematical theories. They explicitly appeared in the framework of lattice theory and the first of these talks is devoted to the history of their appearance and of the revealing of their links in this framework. 
So this talk covers more or less the period between 1940 (with the notion of polarity defined in the first edition of Birkhoff’s book Lattice theory) and 1972 (with Blyth and Janowitz’s book Residuation theory), a period containing fundamental works like Ore’s 1944 paper Galois connexions or Croisot’s 1956 paper Applications résiduées. Metrics, Betweenness Relations, and Entropies on Lattices and Applications Dan Simovici Department of Computer Science, University of Massachusetts at Boston, USA Abstract. We discuss an algebraic axiomatization of the notion of entropy in the framework of lattices, as well as characterizations of metric structures induced by such entropies. The proposed new framework takes advantage of the partial orders defined on lattices, in particular the semimodular lattice of partitions of a finite set, to allow multiple applications in data mining: data discretization, recommendation systems, classification, and feature selection. Vertical decomposition of a lattice using clique separators Anne Berry, Romain Pogorelcnik, Alain Sigayret LIMOS UMR CNRS 6158*, Ensemble Scientifique des Cézeaux, Université Blaise Pascal, F-63 173 Aubière, France. berry@isima.fr, romain.pogorelcnik@isima.fr, sigayret@isima.fr Abstract. A concept (or Galois) lattice is built on a binary relation; this relation can be represented by a bipartite graph. We explain how we can use the graph tool of clique minimal separator decomposition to decompose some bipartite graphs into subgraphs in linear time; each subgraph corresponds to a subrelation. We show that the lattices of these subrelations easily yield the lattice of the global relation. We also illustrate how this decomposition is a tool to help displaying the lattice. Keywords: lattice decomposition, clique separator decomposition, lattice drawing 1 Introduction In many algorithms dealing with hard problems, a divide-and-conquer approach is helpful in practical applications. Computing the set of concepts associated with a given context (or the set of maximal rectangles associated with a binary relation) is time-consuming, as there may be an exponential number of concepts. It would be interesting to decompose the lattice into smaller sublattices. What we propose here is to decompose the relation into smaller subrelations, compute the lattice of each subrelation, and then use these lattices to reconstruct the lattice of the global relation. For this, we use a graph decomposition, called “clique separator decomposition”, introduced by Tarjan [9] and refined afterwards (see [3] for an extensive introduction to this decomposition). The general principle is roughly the following: repeatedly find a set of vertices which are pairwise adjacent (called a clique) and whose removal disconnects the graph (called a separator), then copy this clique separator into the different connected components obtained. When the decomposition is completed, a set of subgraphs is obtained, inconveniently called ‘atoms’: each subgraph is a maximal subgraph containing no clique separator. * Research partially supported by the French Agency for Research under the DEFIS program TODO, ANR-09-EMER-010. © 2011 by the paper authors. CLA 2011, pp. 15–29. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France.
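To make the decomposition principle just described concrete, here is a minimal generic sketch in Python. It assumes the graph is given as plain adjacency sets and that a clique separator has already been found; it is not the authors' implementation and ignores the linear-time machinery discussed later in the paper.

```python
# One clique-separator decomposition step, as described above: remove the
# separator S, compute the connected components, and copy S back into each.
def components(adj, removed):
    """Connected components of the graph induced on the vertices outside `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(w for w in adj[v] if w not in seen)
        seen |= comp
        comps.append(comp)
    return comps

def decomposition_step(adj, separator):
    """Replace the graph by one vertex set per component, each keeping a copy of the separator."""
    return [comp | set(separator) for comp in components(adj, set(separator))]
```

On a graph such as the one of Figure 3 below, for instance, calling decomposition_step with separator {1} should return the two vertex sets of Example 1.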
In a previous work [2], we used graph algorithms on the complement of the bipartite graph associated with the relation. In this paper, we will apply this decomposition directly to the bipartite graph itself. It turns out upon investigation that the subgraphs obtained divide not only the graph, but in a very similar fashion divide the matrix of the relation, the set of concepts and the lattice. When the relation has a clique separator of size two, the lattice, as we will explain further on, is divided along a vertical axis by an atom and a co-atom which correspond to the two vertices of the separator. Thus not only can the concepts be computed on the subrelations, but the Hasse diagram of the lattice can be drawn better, as no edge need cross this vertical line. Moreover, in a bipartite graph, this decomposition can be implemented with a better worst-case time complexity than in the general case, as the clique separators can be of size one (in this case they are called articulation points) or of size two. In both cases, the entire decomposition can be computed in linear time, i.e. in the size of the relation, thanks to the works of [6] and [3]. Although some graphs do not have a clique separator, when there is one, the decomposition is thus a useful and non-expensive pre-processing step. The paper is organized as follows: we will first give some more preliminaries in Section 2. In Section 3, we explain how a bipartite graph is decomposed. In Section 4, we show how to reconstruct the global lattice from the concepts obtained on the subrelations. In Section 5, we discuss using vertical decomposition as a tool for layout help. Finally, in Section 6, we conclude with some general remarks. 2 Preliminaries We will first recall essential definitions and properties. All the graphs will be undirected and finite. For a graph G = (V, E), V is the vertex set and E is the edge set. For xy ∈ E, x ≠ y, x and y are said to be adjacent; we say that x sees y. A graph is connected if, for every pair {x, y} of vertices, there is a path from x to y. When a graph is not connected, the maximal connected subgraphs are called the connected components. For C ⊂ V, G(C) denotes the subgraph induced by C. In a graph G = (V, E), the neighborhood of a vertex x ∈ V is the set NG(x) = {y ∈ V, y ≠ x | xy ∈ E}. NG(x) is denoted N(x) when there is no ambiguity. A clique is a set of vertices which induces a complete graph, i.e. with all possible edges. A bipartite graph G = (X + Y, E), where + stands for disjoint union, is built on two vertex sets, X and Y, with no edge between vertices of X and no edge between vertices of Y. A maximal biclique of a bipartite graph G = (X + Y, E) is a subgraph G(X′ + Y′) with all possible edges between the vertices of X′ and the vertices of Y′. A relation R ⊆ O × A on a set O of objects and a set A of attributes is associated with a bipartite graph G = (O + A, E), which we will denote Bip(R); thus, for x ∈ O and y ∈ A, (x, y) is in R iff xy is an edge of G. The maximal rectangles of the relation correspond exactly to the maximal bicliques (maximal complete bipartite subgraphs) of Bip(R) and to the elements (the concepts) of the concept lattice L(R) associated with context (O, A, R). If O1 × A1 and O2 × A2 are concepts of L(R), then O1 × A1 ≤ O2 × A2 iff O1 ⊆ O2 iff A1 ⊇ A2; the corresponding bicliques on vertex sets O1 + A1 and O2 + A2 of Bip(R) are comparable the same way.
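The correspondence just recalled (concepts of L(R) = maximal rectangles of R = maximal bicliques of Bip(R)) can be illustrated with a brute-force Python sketch that closes every subset of objects with the usual derivation operators. The relation below is only an assumed stand-in for the paper's Figure 1, which is given graphically; it was reconstructed here from the concepts listed in Example 9, so treat it as illustrative.

```python
from itertools import combinations

# Assumed stand-in for the relation of Figure 1 (objects 1..6, attributes a..i).
R = {1: {"a", "c", "d", "e"}, 2: {"b", "c", "f", "h", "i"}, 3: {"a", "b", "f"},
     4: {"d", "e"}, 5: {"g", "h"}, 6: {"b", "g"}}
objects = set(R)
attributes = set.union(*R.values())

def derive_attrs(objs):      # O' -> A': attributes shared by all objects of objs
    return set.intersection(*(R[o] for o in objs)) if objs else set(attributes)

def derive_objs(attrs):      # A' -> O': objects having all attributes of attrs
    return {o for o in objects if attrs <= R[o]}

def concepts():
    """All concepts (extent, intent) of L(R), i.e. all maximal bicliques of Bip(R)."""
    found = set()
    for k in range(len(objects) + 1):
        for objs in combinations(sorted(objects), k):
            intent = derive_attrs(set(objs))
            extent = derive_objs(intent)           # closure: a maximal rectangle
            found.add((frozenset(extent), frozenset(intent)))
    return found

for extent, intent in sorted(concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "x", sorted(intent))
```

With this stand-in, the printed pairs include 14 × de, 23 × bf and 236 × b, consistent with the concept table of Example 9.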
An atom (resp. co-atom) of L(R) is a concept covering the minimum element (resp. covered by the maximum element). In the bipartite graph Bip(R), the neighborhoods are defined as follows: for x ∈ O, N(x) = R(x) = {y ∈ A | (x, y) ∈ R}, and for x ∈ A, N(x) = R−1(x) = {y ∈ O | (y, x) ∈ R}. A separator in a connected graph is a set of vertices, the removal of which disconnects the graph. A clique separator is a separator which is a clique. Clique separator decomposition [9] of a graph G = (V, E) is a process which repeatedly finds a clique separator S and copies it into the connected components of G(V − S). When only minimal separators are used (see [3] for extensive general definitions), the decomposition is unique and the subgraphs obtained in the end are exactly the maximal subgraphs containing no clique separator [8], [3]. In a bipartite graph, the clique minimal separators are of size one or two. A separator of size one is called an articulation point. A clique separator S = {x, y} of size two is minimal if there are two components C1 and C2 of G(V − S) such that x and y both have at least one neighbor in C1 as well as in C2. 3 Decomposing the bipartite graph and the relation In the rest of the paper, we will use the bipartite graph Bip(R) defined by a relation R ⊆ O × A. Figure 1 shows an example of a relation with the corresponding bipartite graph. Fig. 1. A relation and the corresponding bipartite graph In this section, we will first discuss connectivity issues, then illustrate and give our process to decompose the bipartite graph. 3.1 Decomposing the bipartite graph into connected components When the bipartite graph Bip(R) is not connected, our process can be applied separately (or in parallel) to each connected component. The lattice obtained is characteristic: when the top and bottom elements are removed from the Hasse diagram, the resulting diagram is a set of disjoint lattices, with a one-to-one correspondence between the connected components of Bip(R) and the lattices obtained. Figure 2 shows such a disconnected bipartite graph, its relation, and the corresponding lattice. Note that trivially, if a connected component has only one vertex, this means that the corresponding row or column of the relation is empty: such a component corresponds to a lattice with only one element. In the rest of the paper, we will consider only relations whose bipartite graph is connected. Fig. 2. A disconnected bipartite graph, its relation, and the corresponding characteristic lattice. 3.2 Illustrating the two decomposition steps In order to make the process we use as clear as possible, we will first describe what happens when one decomposition step is applied for each of the two decompositions involved (using clique separators of size one or of size two). It is important to understand, however, that to ensure a good (linear) time complexity, each of the two decompositions is computed globally in a single linear-time pass. Step with an articulation point The removal of an articulation point {x} in a connected bipartite graph G results in components C1, ..., Ck, which correspond to a partition V = C1 + ... + Ck + {x} of Bip(R). After a decomposition step using {x}, x is preserved, with its local neighborhood, in each component, so that G is replaced by k subgraphs G(C1 ∪ {x}), ..., G(Ck ∪ {x}). Example 1.
In the input graph of Figure 3, vertex 1 defines an articulation point that induces two connected components {2, 3, a, b, c} and {4, d, e}. The decom- position step results into subgraphs G({1, 2, 3, a, b, c}) and G({1, 4, d, e}). Fig. 3. Decomposition by articulation point {1}. Step with a separator of size two When the removal of a clique minimal separator {x, y} in a connected bi- partite graph G results into components C1 ,..., Ck , corresponding to a partition V = C1 +...+Ck +{x, y}. The decomposition step replaces G with G(C1 ∪{x, y}), ..., G(Ck ∪ {x, y}). Example 2. In the input graph of Figure 4, {2, b} is a clique minimal separa- tor of size two that induces two connected components {1, 2, 3, a, b, c, f } and {2, 5, 6, b, g, h}. 20 Anne Berry, Romain Pogorelcnik and Alain Sigayret Fig. 4. Decomposition by clique separator {2,b} 3.3 Ordering the steps A clique minimal separator of size two may include an articulation point. Thus it is important to complete the decomposition by the articulation points first, and then go on to decompose the obtained subgraphs using their clique separators of size two. Example 3. In the input graph of Figure 5, {2} is an articulation point included in clique minimal separator {2, b}. The decomposition will begin with {2}, in- ducing components {2, i} and {1, 2, 3, 5, 6, a, b, c, f, g, h}. As there remains no articulation point in these resulting components, the second component will be then decomposed by {2, b} into {1, 2, 3, a, b, c, f } and {2, 5, 6, b, g, h}. Fig. 5. Articulation point {2} is processed before clique separator {2,b} After the bipartite graph decomposition is completed, we will obtain sub- graphs with no remaining clique minimal separator, and the corresponding sub- relations with their associated lattices. Example 4. Figure 6 shows that the input graph of Figure 1 is decomposable into four bipartite subgraphs: G1 = G({1, 2, i}), G2 = G({2, 5, 6, b, g, h}), G3 = G({1, 2, 3, a, b, c, f }) and G4 = G({1, 4, d, e}). Note that in the general case all subgraphs obtained have at least two vertices, since at least one vertex of a separator is copied into a component which has at least one vertex. Vertical decomposition of a lattice using clique separators 21 Fig. 6. Complete decomposition of a bipartite graph 3.4 The global decomposition process To obtain the entire decomposition of a connected bipartite graph, we will thus first decompose the graph using all articulation points, and then decompose each of the subgraphs obtained using all its clique separators of size 2. The articulation points (clique minimal separators of size one) can be found by a simple depth-first search [6], as well as the corresponding decomposition of the graph (called decomposition into biconnected components). The search for clique separators of size two corresponds to a more complicated algorithm, described in [5]: all separators of size 2 are obtained, whether they are cliques or not. Once this list of separators is obtained, it is easy to check which are joined by an edge. The desired decomposition can then be obtained easily. In both cases, the set of clique separators is output. Both algorithms run in linear time, so the global complexity is in O(|R|) to obtain both the set of subgraphs and the set of clique separators of the original graph. 
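The first, linear-time half of this global process (finding the articulation points and splitting the graph accordingly) can be sketched with a standard graph library; the snippet below is not the authors' code, it reuses the assumed stand-in relation for Figure 1 introduced earlier, and it does not handle the size-two clique separators, which require the triconnected-components algorithm of [5].

```python
import networkx as nx

# Assumed stand-in for Figure 1: objects 1..6 with their attribute strings.
R = {1: "acde", 2: "bcfhi", 3: "abf", 4: "de", 5: "gh", 6: "bg"}
G = nx.Graph([(o, a) for o, attrs in R.items() for a in attrs])   # Bip(R)

# Clique minimal separators of size one (articulation points), found by the
# depth-first-search approach that networkx implements.
print("articulation points:", sorted(nx.articulation_points(G), key=str))

# Decomposition into biconnected components: each component keeps a copy of the
# articulation points it touches, as in the steps of Section 3.2.
for comp in nx.biconnected_components(G):
    print("subgraph on", sorted(map(str, comp)))
```

With this stand-in, the articulation points come out as 1 and 2, in line with Examples 1 and 3.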
3.5 Sub-patterns defined in the matrix When the clique separators involved do not overlap and each defines exactly two connected components, this decomposition of the bipartite graph partitions the graph and the underlying relation. This results in a significant pattern of their binary matrix. As the components obtained are pairwise disconnected, the matrix can be reorganized in such a way that zeroes are gathered into blocks. Two components may appear as consecutive blocks, linked by a row corresponding to the articulation point that has been used to split them, or linked by one cell giving the edge between the two vertices of a size two clique minimal separator. In the general case, this pattern can occur in only some parts of the matrix, and different patterns can be defined according to the different separators which the user chooses to represent. Example 5. The first of the two matrices below corresponds to our running example from Figure 1 and has been reorganized, following the decomposition, which results in the second matrix. Notice how {1} is an articulation point, so row 1 is shared by blocks 231×bacf and 14×de; and how {2, b} is a clique separator of size two, so cell [2, b] is the intersection of blocks 562 × ghb and 231 × bacf. [2, i] is not integrated into the pattern, because separator {2, b} of Bip(R) defines 3 connected components: {2, 5, 6, b, g, h}, {i} and {1, 3, 4, a, c, d, e, f}. We will now describe a process to organize the rows and columns of the matrix with such patterns. We will construct a meta-graph (introduced in [7] as the ‘atom graph’), whose vertices represent the subgraphs obtained by our decomposition, and where there is an edge between two such vertices if the two subgraphs which are the endpoints have a non-empty intersection which is a clique minimal separator separating the corresponding two subgraphs in the original bipartite graph. In this meta-graph, choose a chordless path; the succession of subgraphs along this path will yield a succession of rectangles in the matrix which correspond to a pattern. Example 6. Figure 7 gives the meta-graph for our running example from Figure 1. Chordless path ({2, 5, 6, b, g, h}, {1, 2, 3, a, b, c, f}, {1, 4, d, e}) was chosen for the patterning. Another possible chordless path would be ({2, i}, {1, 2, 3, a, b, c, f}, {1, 4, d, e}). Finding a chordless path in a graph can be done in linear time; the meta-graph has less than min(|A|, |O|) elements, so finding such a path costs less than (min(|A|, |O|))². Fig. 7. Meta-graph for graph from Figure 1 3.6 Decomposing the lattice We will now examine how the set of concepts is modified and partitioned into the subgraphs obtained. As clique minimal separators are copied in all the components induced, most of the concepts will be preserved by the decomposition. Furthermore, only initial concepts including a vertex of a clique minimal separator may be affected by the decomposition. Definition 1. We will say that a maximal biclique is a star maximal biclique if it contains either exactly one object or exactly one attribute. This single object or attribute will be called the center of the star. Lemma 1. A star maximal biclique {x} ∪ N(x) of Bip(R) is an atomic concept of L(R) (atom or co-atom), unless x is universal in Bip(R). More precisely, {x} × N(x) is an atom if x ∈ O and N(x) ≠ A, or N(x) × {x} is a co-atom if x ∈ A and N(x) ≠ O. Proof.
Let {x} ∪ N(x) be a star maximal biclique of Bip(R). As a maximal biclique, it corresponds to a concept of L(R). Suppose the star has x ∈ O as center. As a star, it contains no other element of O; as a biclique, it includes all of N(x) ⊆ A, and no other element of A by maximality. The corresponding concept is {x} × N(x), which is obviously the first concept from bottom to top including x. As the biclique is maximal, and as x is not universal, this concept cannot be the bottom of L(R) but only an atom. A similar proof holds for x ∈ A and co-atomicity. We will now give the property which describes how the maximal bicliques are dispatched or modified by the decomposition. In the next section, we will give a general theorem and its proof, from which these properties can be deduced. Property 1. Let G = (X + Y, E) be a bipartite graph, let S be a clique minimal separator of G which decomposes G into subgraphs G1, ..., Gk. Then: 1. ∀x ∈ S, {x} ∪ NG(x) is a star maximal biclique of G. 2. ∀x ∈ S, {x} ∪ NG(x) is not a maximal biclique of any Gi. 3. ∀x ∈ S, {x} ∪ NGi(x) is a biclique of Gi, but it is maximal in Gi iff it is not strictly contained in any other biclique of Gi. 4. All the maximal bicliques of G which are not star bicliques with any x ∈ S as a center are partitioned into the corresponding subgraphs. With the help of Lemma 1, this property may be translated in terms of lattices. Given a relation R, its associated graph G, its lattice L(R), and a decomposition step of G into some subgraphs Gi by articulation point {x}: If x ∈ O (resp. x ∈ A) is an articulation point of G, {x} × NG(x) (resp. NG(x) × {x}) is a concept of L(R). After the decomposition step, in each subgraph Gi of G, either this concept becomes {x} × NGi(x), or this concept disappears from Gi; this latter case occurs when there is in Gi some x′ ∈ O, the introducer of which appears after the introducer of x in L(R), from bottom to top (resp. from top to bottom if x, x′ ∈ A). Every other concept will appear unchanged in exactly one lattice associated with a subgraph Gi. The same holds for each vertex of a size two clique minimal separator. Example 7. Figure 8 illustrates a decomposition step with articulation point {1} using the graph from Figure 3. Concept {1, 4} × {d, e} disappears from the first component {1, 2, 3, a, b, c}, but remains in the second component {1, 4, d, e}. Fig. 8. Example of lattice decomposition using articulation point {1}. Example 8. Figure 9 illustrates a decomposition step with clique separator {2, b} using the graph from Figure 4. Concept {2} × N(2) is duplicated into components {2, 5, 6, b, g, h} and {1, 2, 3, a, b, c, f}; concept N(b) × {b} will appear as {2, 6} × {b} in the first component, but not in the second one, as {2, 3, b} is a biclique included in maximal biclique {2, 3, b, f} of G. Remark 1. The smaller lattices obtained cannot be called sublattices of the initial lattice, as some of their elements may not be the same: for example, in Figure 9, {2} × {b, c, f} is an element of the third smaller lattice L(G3) but is not an element of the initial lattice L(G). 4 Reconstructing the lattice We will now explain how, given the subgraphs obtained by clique decomposition, as well as the corresponding subrelations and subsets of concepts, we can reconstruct the set of concepts of the global input bipartite graph. We will then go on to explain how to obtain the Hasse diagram of the reconstructed lattice.
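Before turning to the reconstruction in Section 4.1, here is a small sketch of the star maximal bicliques {x} ∪ N(x) on which Lemma 1 and the reconstruction rely: a vertex's star is maximal, hence an atom or co-atom, exactly when no other vertex on the same side dominates its neighborhood. The relation is again the assumed stand-in for Figure 1, not the authors' code.

```python
# Assumed stand-in for the relation of Figure 1.
R = {1: {"a", "c", "d", "e"}, 2: {"b", "c", "f", "h", "i"}, 3: {"a", "b", "f"},
     4: {"d", "e"}, 5: {"g", "h"}, 6: {"b", "g"}}
attrs_of = R
objs_of = {}                                   # inverse relation: N(a) for each attribute a
for o, attrs in R.items():
    for a in attrs:
        objs_of.setdefault(a, set()).add(o)

def object_star_is_maximal(o):
    # {o} ∪ N(o) is a maximal biclique iff no other object sees every attribute o sees
    return not any(attrs_of[o] <= attrs_of[p] for p in attrs_of if p != o)

def attribute_star_is_maximal(a):
    return not any(objs_of[a] <= objs_of[b] for b in objs_of if b != a)

atoms = [({o}, sorted(attrs_of[o])) for o in attrs_of if object_star_is_maximal(o)]
coatoms = [(sorted(objs_of[a]), {a}) for a in objs_of if attribute_star_is_maximal(a)]
print("atoms:", atoms)       # includes the star of object 1 (1 x acde), cf. Example 9
print("co-atoms:", coatoms)  # includes the star of attribute b (236 x b)
```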
Fig. 9. Example of lattice decomposition using clique separator {2, b}. 4.1 Reconstructing the set of concepts We will use the following theorem, which describes the concepts of the global lattice. Theorem 1. Let G = (X + Y, E) be a bipartite graph, let Σ = {s1, ..., sh} be the set of all the vertices which belong to a clique separator of G, let G1, ..., Gk be the set of subgraphs obtained by the complete corresponding clique separator decomposition. Then: 1. For every s ∈ Σ, {s} ∪ NG(s) is a star maximal biclique of G. 2. Any maximal biclique of a subgraph Gi which is not a star with a vertex of Σ as center is also a maximal biclique of G. 3. There are no other maximal bicliques in G: ∀s ∈ Σ, no other star maximal biclique of Gi with center s is a star maximal biclique of G, and these are the only maximal bicliques of some graph Gi which are not also maximal bicliques in G. Proof. 1. For every s ∈ Σ, {s} ∪ NG(s) is a star maximal biclique of G: Case 1: s is an articulation point; let Gi, Gj be two graphs which s belongs to; s must be adjacent to some vertex yi in Gi and to some vertex yj in Gj. Suppose {s} ∪ NG(s) is not a maximal biclique: there must be a vertex z in G which sees yi and yj, but then {s} cannot separate yi from yj, a contradiction. Case 2: s is not an articulation point; let s′ be a vertex of S such that {s, s′} is a clique separator of G, separating Gi from Gj. s must as above see some vertex yi in Gi and some vertex yj in Gj. Suppose {s} ∪ NG(s) is not maximal: there must be some vertex w in G which sees all of NG(s), but w must see yi and yj, so {s, s′} cannot separate Gi from Gj. 2. Let B be a non-star maximal biclique of Gi, containing o1, o2 ∈ O and a1, a2 ∈ A. Suppose B is not maximal in G: there must be a vertex y in G − B which augments B. Let y be in Gj, wlog y ∈ A: y must see o1 and o2. Since Gi is a maximal subgraph with no clique separator, Gi + {y} must have a clique separator. Therefore N(y) must be a clique separator of this subgraph, but this is impossible, since y sees two non-adjacent vertices of Gi. 3. Any star maximal biclique B of Gi whose center v is not in Σ is also a star maximal biclique of G: suppose we can augment B in G. Case 1: v sees an extra vertex w; Gi + {w} contains as above a clique separator, which is impossible since N(w) = {v} and v ∉ S. Case 2: A vertex z of Gj is adjacent to all of N(v): again, Gi + {z} contains a clique separator, so N(z) is a clique separator, but that is impossible since N(z) contains at least two non-adjacent vertices. 4. For s ∈ Σ, no star maximal biclique of Gi is a star maximal biclique of G: let B be a star maximal biclique of Gi, with s ∈ Σ as center. s ∈ Σ, so s belongs to some clique separator which separates Gi from some graph Gj. s must see a vertex yj in Gj, so B + {yj} is a larger star including B: B cannot be maximal in G. Example 9. We illustrate Theorem 1 using graph G from Figure 6, whose decomposition yields subgraphs G1, ..., G4, with G1 = G({1, 2, i}), G2 = G({2, 5, 6, b, g, h}), G3 = G({1, 2, 3, a, b, c, f}) and G4 = G({1, 4, d, e}). Finally, Σ = {1, 2, b}. The corresponding lattices are shown in Figure 10, and their concepts are presented in the table below.
In this table, braces have been omitted; the symbol ⇒ represents a concept of the considered subgraph Gi which is identical to a concept of G (there can be only one ⇒ per row); the other concepts of the subgraphs will not be preserved in G while recomposing.

L(G)      | L(G1) | L(G2)  | L(G3)   | L(G4) | star max. biclique of G?
1 × acde  |       |        | 1 × ac  |       | yes
2 × bcfhi | 2 × i | 2 × bh | 2 × bcf |       | yes
3 × abf   |       |        | ⇒       |       |
14 × de   |       |        |         | ⇒     |
5 × gh    |       | ⇒      |         |       |
6 × bg    |       | ⇒      |         |       |
13 × a    |       |        | ⇒       |       |
236 × b   |       | 26 × b |         |       | yes
12 × c    |       |        | ⇒       |       |
23 × bf   |       |        | ⇒       |       |
56 × g    |       | ⇒      |         |       |
25 × h    |       | ⇒      |         |       |

Fig. 10. Reconstruction of a lattice According to Theorem 1, the steps to reconstruct the maximal concepts of the global lattice from the concepts of the smaller lattices are: 1. Compute Σ, the set of attributes and objects involved in a clique minimal separator. (In our example, Σ = {1, 2, b}.) 2. Compute the maximal star bicliques for all the elements of Σ. (In our example, we will compute star maximal bicliques 1 × acde, 2 × bcfhi and 26 × b.) 3. For each smaller lattice, remove from the set of concepts the atoms or co-atoms corresponding to elements of Σ; maintain all the other concepts as concepts of the global lattice. (In our example, for L(G3), we will remove 1 × ac and 2 × bcf, and maintain 3 × abf, 13 × a, 12 × c and 23 × bf as concepts of L(G).) Step 1 requires O(|R|) time. Step 2 can be done while computing the smaller lattices; Step 3 costs constant time per concept. Thus the overall complexity of the reconstruction is in O(|R|) time. 4.2 Reconstructing the edges of the Hasse diagram According to Theorem 1, the maximal bicliques which are not star maximal bicliques with a vertex of Σ as center are preserved; therefore, the corresponding edges between the elements of the lattice are also preserved. In the process described below, we will refer to labels in the lattice as being the ‘reduced’ labels, such as the ones used in our lattice figures throughout this paper. To define the edges of the Hasse diagram of lattice L(G), we will, for each smaller lattice L(Gi): – find each atom (or co-atom) which corresponds to an element σ of Σ (such as 2 or b for L(G3) in our example). – If σ shares its label with some non-elements of Σ, remove all elements of Σ from the label. (In our example for L(G3), bf becomes f.) If σ does not share its label with some non-elements of Σ, remove the atom or co-atom. (In our example for L(G3), remove element 2.) – Maintain the remaining edges as edges of L(G). – Compute the neighborhood in L(G) of each atom or co-atom which corresponds to an element of Σ. All this can be done in polynomial time: there are at most |A| + |O| vertices in Σ, and the corresponding edges can be added in O((|A| + |O|)² |R|). 5 Vertical decomposition as a layout tool When there is a size two clique separator in the bipartite graph which divides the graph into two components, the concepts which are not involved in the separator can be displayed on the two sides of the separator, thus helping to minimize the number of line crossings in the Hasse diagram. Fig. 11. (a) Lattice constructed by Concept Explorer using the minimal intersection layout option (8 crossings). (b) Lattice re-drawn using the information on clique separators (5 crossings). To illustrate this, we have used our running example with ‘Concept Explorer’ [1], which is a very nice and user-friendly tool for handling lattices. Notice however how clique separator {1, d} is better displayed when put at the right extremity.
Figure 11 shows the lattice as proposed by Concept Explorer, and then re-drawn with insight on the clique separators of the bipartite graph. The same technique of course also applies when there is a succession of such clique separators. Let us add that if, moreover, both lattices are planar, as discussed in [4], merging the two lattices obtained, using the clique separator as central, will preserve planarity. 6 Conclusion and perspectives We have used a graph method, clique minimal separator decomposition, to provide simple tools which can help reduce the time spent computing the elements of a lattice, as well as improve the drawing of its Hasse diagram. When there is no clique separator in the bipartite graph, it could be interesting to investigate restricting the relation to a subgraph or partial subgraph which does have one. We leave open the question of characterizing, without computing the relation, the lattices whose underlying bipartite graph has a clique minimal separator. Acknowledgments The authors sincerely thank all the referees for their useful suggestions and questions. References 1. Concept Explorer. Downloadable at http://sourceforge.net/projects/conexp/, version 1.3 (Java), 20/12/2009. 2. Berry A., Sigayret A.: Representing a concept lattice by a graph. Discrete Applied Mathematics, 144(1-2):27–42, 2004. 3. Berry A., Pogorelcnik R., Simonet G.: An introduction to clique minimal separator decomposition. Algorithms, 3(2):197–215, 2010. 4. Eschen E.M., Pinet N., Sigayret A.: Consecutive-ones: handling lattice planarity efficiently. CLA’07, Montpellier (Fr), 2007. 5. Hopcroft J. E., Tarjan R. E.: Dividing a graph into triconnected components. SIAM J. Comput., 2(3):135–158, 1973. 6. Hopcroft J. E., Tarjan R. E.: Efficient algorithms for graph manipulation [H] (Algorithm 447). Commun. ACM, 16(6):372–378, 1973. 7. Kaba B., Pinet N., Lelandais G., Sigayret A., Berry A.: Clustering gene expression data using graph separators. In Silico Biology, 7(4-5):433–52, 2007. 8. Leimer H.-G.: Optimal decomposition by clique separators. Discrete Mathematics, 113(1-3):99–123, 1993. 9. Tarjan R. E.: Decomposition by clique separators. Discrete Mathematics, 55(2):221–232, 1985. Building up Shared Knowledge with Logical Information Systems Mireille Ducassé1, Sébastien Ferré2, and Peggy Cellier1 1 IRISA-INSA de Rennes, France, {ducasse, cellier}@irisa.fr 2 IRISA-University of Rennes 1, France, ferre@irisa.fr Abstract. Logical Information Systems (LIS) are based on Logical Concept Analysis, an extension of Formal Concept Analysis. This paper describes an application of LIS to support group decision. A case study gathered a research team. The objective was to decide on a set of potential conferences to which to send submissions. People individually used Abilis, a LIS web server, to preselect a set of conferences. Starting from 1041 calls for papers, the individual participants preselected 63 conferences. They met and collectively used Abilis to select a shared set of 42 target conferences. The team could then sketch a publication plan. The case study provides evidence that LIS cover at least three of the collaboration patterns identified by Kolfschoten, de Vreede and Briggs.
Abilis helped the team to build a more complete and relevant set of information (Generate/Gathering pattern); to build a shared understanding of the relevant information (Clarify/Building Shared Understanding); and to quickly reduce the number of target conferences (Reduce/Filtering pattern). 1 Introduction Group work represents a large amount of time in professional life, while many people feel that much of that time is wasted. Lewis [13] argues that this amount of time is even going to increase, because problems are becoming more complex and are meant to be solved in a distributed way. Each involved person has a local and partial view of the problem; no one embraces the whole required knowledge. Lewis also emphasizes that it is common that “groups fail to adequately define a problem before rushing to judgment”. Building up shared knowledge in order to gather relevant distributed knowledge of a problem is therefore a crucial issue. Logical Information Systems (LIS) are based on Logical Concept Analysis (LCA), an extension of Formal Concept Analysis (FCA). In a previous work [5], Camelis, a single-user logical information system, has been shown useful to support serene and fair meetings. This paper shows how Abilis, a LIS web server that implements OnLine Analytical Processing (OLAP [3]) features, can be applied to help build shared knowledge among a group of skilled users. The presented case study gathered a research team to decide on a publication strategy. Starting from 1041 calls for papers, each team member on his own preselected a set of conferences matching his own focus of interest. © 2011 by the paper authors. CLA 2011, pp. 31–42. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. The union of individual preselections still contained 63 conferences. Then, participants met for an hour and a half and collectively built a shared set of 42 target conferences. For each conference, the team shared a deep understanding of why it was relevant. The team could sketch a publication plan in a non-conflictual way. Kolfschoten, de Vreede and Briggs have classified collaboration tasks into 16 collaboration patterns [12]. The contribution of this paper is to give evidence that LIS can significantly support three of these patterns, which are important aspects of decision making, namely Generate/Gathering, Clarify/Building Shared Understanding and Reduce/Filtering. Firstly, the navigation and filtering capabilities of LIS were helpful to detect inconsistencies and missing knowledge. The updating capabilities of LIS enabled participants to add objects, features and links between them on the fly. As a result the group had a more complete and relevant set of information (Generate/Gathering pattern). Secondly, the compact views provided by LIS and the OLAP features helped participants embrace the whole required knowledge. The group could therefore build a shared understanding of the relevant information which was previously distributed amongst the participants (Clarify/Building Shared Understanding pattern). Thirdly, the navigation and filtering capabilities of LIS were relevant to quickly converge on a reduced number of target conferences (Reduce/Filtering pattern). In the following, Section 2 briefly introduces logical information systems. Section 3 describes the case study.
Section 4 gives detailed arguments to support the claim that logical information systems help build up shared knowledge. Section 5 discusses related work. 2 Logical Information Systems Logical Information Systems (LIS) [7] belong to a paradigm of information retrieval that combines querying and navigation. They are formally based on a logical generalization of Formal Concept Analysis (FCA) [8], namely Logical Concept Analysis (LCA) [6]. In LCA, logical formulas are used instead of sets of attributes to describe objects. LCA and LIS are generic in that the logic is not fixed, but is a parameter of those formalisms. Logical formulas are also used to represent queries and navigation links. The concept lattice serves as the navigation structure: every query leads to a concept, and every navigation link leads from one concept to another. The query can be modified in three ways: by formula edition, by navigation (selecting features in the index in order to modify the query) or by examples. Annotations can be performed in the same interface. Camelis3 has been developed since 2002; a web interface, Abilis4, has recently been added. It incorporates display paradigms based on On-Line Analytical Processing (OLAP). Instead of being presented as a list of objects, an extent can be partitioned as an OLAP cube, namely a multi-dimensional array [1]. 3 see http://www.irisa.fr/LIS/ferre/camelis/ 4 http://ledenez.insa-rennes.fr/abilis/ 3 The Case Study The reported case study gathered 6 participants, including the 3 authors, 4 academics and 2 PhD students. All participants were familiar with LIS; 4 of them had not previously used a LIS tool as a decision support system. The objective was to identify the publishing strategy of the team: in which conferences to submit and why. This was not a conflictual decision: the group admitted very early that the set of selected conferences could be rather large, provided that there was a good reason to keep each of them. One person, the facilitator, spent an afternoon organizing the meeting and preparing the raw data as well as a logical context according to the objective. She collected data about conference calls for papers over about a year, related to themes corresponding to the potential area of the team, from WikiCFP, a semantic wiki for Calls For Papers in science and technology fields5. There were 1041 events: conferences, symposiums, workshops, but also special issues of journals. Then every participant, on his own, spent between half an hour and two hours browsing the context, updating it if necessary and preselecting a number of conferences (Section 3.1). The group met for one hour and a half. It collaboratively explored the data and selected a restricted set of conferences (Section 3.2). After the meeting, every participant filled in a questionnaire. The context used for the case study can be freely accessed6. 3.1 Distributed Individual Preselection and Update When the context was ready, every participant was asked to preselect a set of conferences that could be possible submission targets. The instruction was to be as liberal as desired and, in case of doubt, to label the conference as a possible target. During this phase, each of the academics preselected 20 to 30 conferences and each of the PhD students preselected around 10 conferences. Each participant had his own “basket”. There were overlaps; altogether, 63 conferences were preselected.
Participants also introduced new conferences and new features, for example, the ranking of the Australian CORE association 7 (Ranking), and the person expected to be a possible first author for the target conference (Main author). Figure 1 shows a state of Abilis during the preselection phase. LIS user in- terfaces give a local view of the concept lattice, centered on a focus concept. The local view is made of three parts: (1) the query (top left), (2) the extent (bottom right), and (3) the index (bottom left). The query is a logical formula that typi- cally combines attributes (e.g., Name), patterns (e.g., contains "conference"), and Boolean connectors (and, or, not). The extent is the set of objects that are matched by the query, according to logical subsumption. The extent identifies 5 http://www.wikicfp.com/cfp/ 6 http://ledenez.insa-rennes.fr/abilis/, connect as guest, load Call for papers. 7 http://core.edu.au/index.php/categories/conference rankings 34 Mireille Ducasse, Sebastien Ferre and Peggy Cellier Fig. 1. Snapshot of Abilis during preselection: a powerful query Building up Shared Knowledge with Logical Information Systems 35 the focus concept. Finally, the index is a set of features, taken from a finite sub- set of the logic, and is restricted to features associated to at least one object in the extent. The index plays the role of a summary or inventory of the extent, showing which kinds of objects there are, and how many of each kind there are (e.g., in Figure 1, 8 objects in the extent have data mining as a theme). In the index, features are organized as a taxonomy according to logical subsumption. The query area (top left) shows the current selection criteria: (Name contains "conference" or Name contains "symposium") and not (Name contains "agent" or Name contains "challenge" or Name contains "workshop") and (Theme is "Knowledge Discovery" or Theme is "Knowledge Engineering" or Theme is "Knowledge Management"). Note that the query had been obtained solely by clicking on features of the index (bottom left). Let us describe how it had been produced. Initially there were 1041 objects. Firstly, opening the Name ? feature, the participant had noticed that names could con- tain “conference” or “symposium” but also other keywords such as “special issue”. He decided to concentrate on conferences and symposiums by clicking on the two features and then on the zoom button. The resulting query was (Name contains "conference" or Name contains "symposium") and there were 495 objects in the extent. However, the displayed features under Name ? showed that there were still objects whose name in addition to “conference” or “symposium” also contained “agent”, “challenge” or “workshop”. He decided to filter them out by clicking on the three features then on the Not button then on the zoom button. The resulting query was (Name contains "conference" or Name contains "symposium") and not (Name contains "agent" or Name contains "challenge" or Name contains "workshop") and there were 475 objects in the extent. He opened the Theme ? feature, clicked on the three sub- features containing “Knowledge”, then on the zoom button. The resulting query is the one displayed on Figure 1 and there are 48 objects in the displayed extent. In the extent area (bottom right), the 48 acronyms of the selected conferences are displayed. In the index area, one can see which of the features are filled for these objects. The displayed features have at least 1 object attached to them. 
The number of objects actually attached to them is shown in parentheses. For example, only 14 of the preselected conferences have an abstract deadline. All of them have an acronym, a date of beginning, a date of end, a date for the paper deadline, a name, some other (not very relevant) information, as well as at least a theme and a town. The features shared by all selected objects have that number in parentheses (48 in this case). For the readers who have a color printout, these features are in green. The other features are attached to only some of the objects. For example, only 16 objects have a ranking attached to them: 4 core A, 6 core B, 2 core C, 1 ‘too recent event’, 4 unknown (to the Core ranking). One way to pursue the navigation could be, for example, to click on Ranking ? to select the conferences for which the information is filled. Alternatively, one could concentrate on the ones for which the ranking is not filled, for example 36 Mireille Ducasse, Sebastien Ferre and Peggy Cellier to fill in this information on the fly for the conferences which are considered interesting. Another way to pursue the navigation could be, for example, to notice that under the Theme ? feature, there are more than the selected themes. One can see that among the selected conferences, one conference is also relevant to the Decision Support Systems theme. One could zoom into it, this would add and Theme is "Decision Support Systems" to the query ; the object area would then display the relevant conference (namely GDN2011). 3.2 Collaborative Data Exploration, Update and Selection The group eventually had a physical meeting where the current state of the context was constantly displayed on a screen. Using the navigation facilities of Abilis, the conferences were examined by decreasing ranking. Initially, the group put in the selection all the A and A+ preselected conferences. After some discussions, it had, however, been decided that Human Computer Interaction (HCI) was too far away from the core of the team’s research. Subsequently, the HCI conferences already selected were removed from the selection. For the conferences of rank B, the team decided that most of them were pretty good and deserved to be kept in the selection. For the others, the group investigated first the conferences without ranking and very recent, trying to identify the ones with high selection rate or other good reasons to put them in the selection. Some of the arguments have been added into the context. Some others were taken into account on the fly to select some conferences but they seemed so obvious at the time of the discussion that they were not added in the context. Figure 2 shows the selection made by the group at a given point. In the extent area, on the right hand side, the selected objects are partitioned according to the deadline month and the anticipated main author thanks to the OLAP like facilities of Abilis [1]. Instead of being presented as a list of objects, an extent can be partitioned as an OLAP cube, namely a multi-dimensional array. Assuming object features are valued attributes, each attribute can play the role of a dimension, whose values play the role of indices along this dimension. Users can freely interleave changes to the query and changes to the extent view. The query is SelectionLIS02Dec and scope international. Note that the partition display is consistent with the query. When the group added and scope international to the query, the national conferences disappeared from the array. 
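To make the partitioning idea concrete, here is a minimal Python sketch — not Abilis code; the conference acronyms, attribute names and query are invented for illustration — that filters a set of objects with a simple conjunctive query and then lays the resulting extent out as a two-dimensional array indexed by two valued attributes, here the deadline month and the anticipated main author.

    from collections import defaultdict

    # Toy data: each object (a conference) is a dictionary of valued attributes.
    # All names and values are invented for illustration.
    conferences = [
        {"acronym": "C1", "scope": "international", "deadline_month": "May",  "main_author": "PC"},
        {"acronym": "C2", "scope": "national",      "deadline_month": "May",  "main_author": "MD"},
        {"acronym": "C3", "scope": "international", "deadline_month": "June", "main_author": "PC"},
        {"acronym": "C4", "scope": "international", "deadline_month": "June", "main_author": "SF"},
    ]

    def extent(objects, query):
        """Return the objects satisfying a conjunctive query given as attribute=value pairs."""
        return [o for o in objects if all(o.get(a) == v for a, v in query.items())]

    def partition(objects, dim1, dim2):
        """Lay an extent out as a two-dimensional array (an OLAP-like cube with two dimensions)."""
        cube = defaultdict(list)
        for o in objects:
            cube[(o[dim1], o[dim2])].append(o["acronym"])
        return cube

    selection = extent(conferences, {"scope": "international"})
    for (month, author), acronyms in sorted(partition(selection, "deadline_month", "main_author").items()):
        print(f"{month:>5} / {author}: {', '.join(acronyms)}")

As in Abilis, changing the query (here the dictionary passed to extent) and changing the partition dimensions are independent operations, so the displayed array always stays consistent with the current selection.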
Some conferences, absent from WikiCFP have been entered on the fly at that stage (for example, ICCS 2011 - Concept). Not all the features had been entered for all of them. In particular, one can see in the feature area that only 28 out of 29 had been preselected. Nevertheless, the group judged that the deadline month, the potential main author and the ranking were crucial for the decision process and added them systematically. It is easy to find which objects do not have a feature using Not and zoom, and then to attach features to them. Building up Shared Knowledge with Logical Information Systems 37 Fig. 2. Snapshot of Abilis during collaborative data exploration: a partition deadline month/mainAuthor 38 Mireille Ducasse, Sebastien Ferre and Peggy Cellier One can see that there are enough opportunities for each participant to pub- lish round the year. One can also see at a glance where compromises and decisions will have to be made. For example, PC will probably not be in a position to pub- lish at IDA, ISSTA, KDD and ICCS the same year. Thanks to this global view PC can discuss with potential co-authors what the best strategy could be. A follow up to the meeting was that participants made a personal publication planning, knowing that their target conferences were approved by the group. 4 Discussion In this section, we discuss how the reported case study provides evidences that LIS help keep the group focused (Section 4.1) and that LIS also help build up shared knowledge (Section 4.2). As already mentioned, participants filled up a questionnaire after the meeting. In the following, for each item, we introduce the arguments, we present a summary of relevant parts of participant feedbacks, followed by an analysis of the features of LIS that are crucial for the arguments. 4.1 Logical Information Systems Help Keep the Group Focused It is recognized that an expert facilitator can significantly increase the efficiency of a meeting (see for example [2]). A study made by den Hengst and Adkins [10] investigated which facilitation functions were found the most challenging by facilitators around the world. It provides evidences that facilitators find that “the most difficult facilitation function in meeting procedures is keeping the group outcome focused.” In our case study, all participants reported that they could very easily stay focused on the point currently discussed thanks to the query and the consistency between the three views. As the objective was to construct a selection explicitly identified in Abilis by a feature, the objective of the meeting was always present to everybody and straightforward to bring back in case of digression. Furthermore, even if the context contained over a thousand conferences, thanks to the navigation facilities of LIS, only relevant information was displayed at a given time. Therefore, there was no “noise” and no dispersion of attention, the displayed information was always closely connected to the focus of the discussion. 4.2 Logical Information Systems Help Build Up Shared Knowledge Kolfschoten, de Vreede and Briggs have identified 6 collaboration patterns: Gen- erate, Reduce, Clarify, Organize, Evaluate, and Consensus Building [12]. We discuss in the following three of their 16 sub-patterns for which all participants agreed that they are supported by Abilis in its current stage. For the other sub- patterns, the situation did not demand much with respect to them. 
For example, the decision to make was not conflictual, and the set of selected conferences could be rather large; there was, therefore, not much to experiment about "consensus building." The descriptions of the patterns in italics are from Kolfschoten, de Vreede and Briggs.

Generate/Gathering: move from having fewer to having more complete and relevant information shared by the group.

Before and during the meeting, information was added to the shared knowledge repository of the group, namely the logical context. A new theme, important for the team and missing from WikiCFP, was added: Decision Support Systems. New conferences were added into the context, either by individual participants in the preselection phase or by the group during the selection phase. New features were added as well. For example, it soon appeared that some sort of conference ranking was necessary. The group added by hand, for the conferences that were selected, the ranking of the Australian CORE association. Some conferences were added subsequently, and sometimes the ranking was not added at once. All participants acknowledged that the tool helped the group set up a set of features that was relevant and reflected the group's point of view.

The crucial characteristics of LIS for this aspect are those which enable integrated navigation and update. Firstly, the possibility to update the context while navigating in it enables participants to enhance it on the fly, adding small pieces of relevant information at a time. Secondly, for each feature, Abilis displays the number of objects which have it. It is therefore immediate to detect when a feature is not systematically filled. A query using Not selects the objects that do not have the feature; users can then decide if they want to update them. Thanks to the query, as soon as an object is updated, it disappears from the extent, so users can immediately see what remains to be updated. Thirdly, updating the context does not divert from the initial objective: the Back button allows users to go back to previous queries. Fourthly, the three views (query, features, objects) are always consistent and provide a "global" understanding of the relevant objects. Lastly, in the shared web server, participants can see what information the others have entered, so each participant can inspire the others.

For the last aspect, the facilitator's inputs were decisive. Participants reported that they did not invent much; they imitated and adapted what the facilitator had initiated. This is consistent with the literature on group decision and negotiation, which emphasizes the key role of facilitators [2].

Clarify/Building Shared Understanding: move from having less to more shared understanding of the concepts shared by the group and the words and phrases used to express them.

Participants, even senior ones, discovered new conferences. Some were surprised by the ranking of conferences that they had previously overlooked. Participants had a much clearer idea of who was interested in what. All participants found that the tool helped them understand the points of view of the others. The crucial characteristics of LIS for this aspect are those which enable participants to grasp a global understanding at a glance. Firstly, the query, as discussed earlier, helps keep the group focused. Secondly, the consistency between the three views helps participants to grasp the situation.
Thirdly, irrelevant features are not in the index; the features in the index thus reflect the current state of the group decision. Fourthly, the partitions à la OLAP sort the information according to the criteria under investigation. Lastly, the shared web server enables participants to know before the meeting what the others have entered.

Reduce/Filtering: move from having many concepts to fewer concepts that meet specific criteria according to the group members.

Both at preselection time and during the meeting, participants could quickly strip down the set of conferences of interest according to the most relevant criteria. All participants said that the filtering criteria were relevant and reflected the group's point of view. They also all thought that the group was satisfied with the selected set of conferences.

The crucial characteristics of LIS for this aspect are those of the navigation core of LIS. Firstly, the features of the index propose filtering criteria; they are dynamically computed and relevant for the current selection of objects. Secondly, the query, with its powerful logic capabilities, enables participants to express sophisticated selections. Thirdly, the navigation facilities enable participants to build powerful queries, even without knowing anything about the syntax. Lastly, users do not have to worry about the consistency of the set of selected objects: the view consistency of Abilis guarantees that all conferences fulfilling the expressed query are indeed present.

This aspect is especially important. As claimed by Davis et al. [4], convergence in meetings is a slow and painful process for groups. Vogel and Coombes [16] present an experiment that supports the hypothesis that groups selecting ideas from a multicriteria task formulation will converge better than groups working on a single-criterion formulation, where convergence is defined as moving from many ideas to a focus on a few ideas that are worthy of further attention. Convergence is very close to the Reduce/Filtering collaboration pattern. They also underline that people try to minimize the effects of information overload by employing conscious or even unconscious heuristic strategies in order to reduce information load, where information overload is defined as having too many things to do at once. With their powerful navigation facilities, LIS enable users to address a large number of criteria and objects with limited information overload: one can concentrate on local aspects, while global consistency is maintained automatically by the concept lattice.

5 Related work

Abilis in its current stage does not pretend to match up to operational group support systems (GSS), which have a much broader scope. LIS, however, could be integrated in some of the modules of GSS. For example, Meetingworks™ [13], one of the most established GSS, is a modular toolkit that can be configured to support a wide variety of group tasks. Its "Organize" module proposes a tree structure to help analyze and sort ideas. That structure looks much like the index of LIS; it can be edited by hand, and some limited selection is possible. The navigation capabilities of LIS, based on the concept lattice, are, however, more powerful.

Concept analysis has been applied to numerous social contexts, such as social networks [15], computer-mediated communication [9] and domestic violence detection [14].
Most of those applications are intended to be applied a posteri- ori, in order to get some understanding of the studied social phenomena. On the contrary, we propose to use Logical Concept Analysis in the course and as a support of the social phenomena itself. In our case, the purpose is to support a collaborative decision process. Our approach is to other social applications, what information retrieval is to data mining. Whereas data mining automatically com- putes a global and static view on a posteriori data, information retrieval (i.e. navigation in and update of the concept lattice) presents the user with a local and dynamic view on live data, and only guides users in their choice. A specificity of LIS is the use of logics. This has consequences both on the queries that can be expressed, and on the feature taxonomy. The use of logics al- lows to express inequalities on numerical attributes, disjunctions and negations in queries. In pure FCA, only conjunctions of Boolean attributes can be expressed. Previous sections have shown how disjunction and negation are important to express selection criteria. In the taxonomy, criteria are organized according to the logical subsumption relation between them in pure FCA, criteria would be presented as a long flat list. Logics help to make the taxonomy more concise and readable by grouping and hierarchizing together similar criteria. The taxonomy can be dynamically updated by end-users. 6 Conclusion In this paper we have shown that a Logical Information System web server could be used to support a group decision process consisting of 1) data preparation 2) distributed individual preselection and update and 3) collaborative data ex- ploration, update and selection. We have presented evidences that the navigation and filtering capabilities of LIS were relevant to quickly reduce the number of target conferences. Secondly, the same capabilities were also helpful to detect inconsistencies and missing knowledge. The updating capabilities of LIS enabled participants to add objects, features and links between them on the fly. As a result the group had a more complete and relevant set of information. Thirdly, the group had built a shared understanding of the relevant information. Acknowledgments The authors thank Pierre Allard and Benjamin Sigonneau for the development and maintenance of Abilis. They thank Pierre Allard, Annie Foret and Alice Hermann for attending the experiment and giving many insight- ful feedbacks. 42 Mireille Ducasse, Sebastien Ferre and Peggy Cellier References 1. Allard, P., Ferré, S., Ridoux, O.: Discovering functional dependencies and associa- tion rules by navigating in a lattice of OLAP views. In: Kryszkiewicz, M., Obiedkov, S. (eds.) Concept Lattices and Their Applications. pp. 199–210. CEUR-WS (2010) 2. Briggs, R.O., Kolfschoten, G.L., de Vreede, G.J., Albrecht, C.C., Lukosch, S.G.: Facilitator in a box: Computer assisted collaboration engineering and process sup- port systems for rapid development of collaborative applications for high-value tasks. In: HICSS. pp. 1–10. IEEE Computer Society (2010) 3. Codd, E., Codd, S., Salley, C.: Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate. Codd & Date, Inc, San Jose (1993) 4. Davis, A., de Vreede, G.J., Briggs, R.: Designing thinklets for convergence. In: AMCIS 2007 Proceedings (2007), http://aisel.aisnet.org/amcis2007/358 5. Ducassé, M., Ferré, S.: Fair(er) and (almost) serene committee meetings with logi- cal and formal concept analysis. 
In: Eklund, P., Haemmerlé, O. (eds.) Proceedings of the International Conference on Conceptual Structures. Springer-Verlag (July 2008), lecture Notes in Artificial Intelligence 5113 6. Ferré, S., Ridoux, O.: A logical generalization of formal concept analysis. In: Mineau, G., Ganter, B. (eds.) International Conference on Conceptual Structures. pp. 371–384. No. 1867 in Lecture Notes in Computer Science, Springer (Aug 2000) 7. Ferré, S., Ridoux, O.: An introduction to logical information systems. Information Processing & Management 40(3), 383–419 (2004) 8. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Heidelberg (1999) 9. Hara, N.: Analysis of computer-mediated communication: Using formal concept analysis as a visualizing methodology. Journal of Educational Computing Research 26(1), 25–49 (2002) 10. den Hengst, M., Adkins, M.: Which collaboration patterns are most challenging: A global survey of facilitators. In: HICSS. p. 17. IEEE Computer Society (2007) 11. Kilgour, D.M., Eden, C.: Handbook of Group Decision and Negotiation, Advances in Group Decision and Negotiation, vol. 4. Springer Netherlands (2010) 12. Kolfschoten, G.L., de Vreede, G.J., Briggs, R.O.: Collaboration engineering. In: Kilgour and Eden [11], chap. 20, pp. 339–357 13. Lewis, L.F.: Group support systems: Overview and guided tour. In: Kilgour and Eden [11], chap. 14, pp. 249–268 14. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G.: A case of using formal con- cept analysis in combination with emergent self organizing maps for detecting domestic violence. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects, Lecture Notes in Computer Science, vol. 5633, pp. 247–260. Springer Berlin / Heidelberg (2009), http://dx.doi.org/10.1007/978-3-642-03067- 3 20, 10.1007/978-3-642-03067-3 20 15. Roth, C., Bourgine, P.: Lattice-based dynamic and overlapping taxonomies: The case of epistemic communities. Scientometrics 69, 429–447 (2006), http://dx.doi.org/10.1007/s11192-006-0161-6, 10.1007/s11192-006-0161-6 16. Vogel, D., Coombes, J.: The effect of structure on convergence activities using group support systems. In: Kilgour and Eden [11], chap. 17, pp. 301–311 Comparing Performance of Algorithms for Generating the Duquenne–Guigues Basis Konstantin Bazhanov and Sergei Obiedkov Higher School of Economics, Moscow, Russia, kostyabazhanov@mail.ru, sergei.obj@gmail.com Abstract. In this paper, we take a look at algorithms involved in the computation of the Duquenne–Guigues basis of implications. The most widely used algorithm for constructing the basis is Ganter’s Next Clo- sure, designed for generating closed sets of an arbitrary closure system. We show that, for the purpose of generating the basis, the algorithm can be optimized. We compare the performance of the original algorithm and its optimized version in a series of experiments using artificially generated and real-life datasets. An important computationally expen- sive subroutine of the algorithm generates the closure of an attribute set with respect to a set of implications. We compare the performance of three algorithms for this task on their own, as well as in conjunction with each of the two versions of Next Closure. 1 Introduction Implications are among the most important tools of formal concept analysis (FCA) [9]. The set of all attribute implications valid in a formal context defines a closure operator mapping attribute sets to concept intents of the context (this mapping is surjective). 
The following two algorithmic problems arise with respect to implications: 1. Given a set L of implications and an attribute set A, compute the closure L(A). 2. Given a formal context K, compute a set of implications equivalent to the set of all implications valid in K, i.e., the cover of valid implications. The first of these problems has received considerable attention in the database literature in application to functional dependencies [14]. Although functional dependencies are interpreted differently than implications, the two are in many ways similar: in particular, they share the notion of semantic consequence and the syntactic inference mechanism (Armstrong rules [1]). A linear-time algo- rithm, LinClosure, has been proposed for computing the closure of a set with respect to a set of functional dependencies (or implications) [3], i.e., for solving the first of the two problems stated above. However, the asymptotic complexity estimates may not always be good indicators for relative performance of algo- rithms in practical situations. In Sect. 3, we compare LinClosure with two c 2011 by the paper authors. CLA 2011, pp. 43–57. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 44 Konstantin Bazhanov and Sergei Obiedkov other algorithms—a “naı̈ve” algorithm, Closure [14], and the algorithm pro- posed in [20]—both of which are non-linear. We analyze their performance in several particular cases and compare them experimentally on several datasets. For the second problem, an obvious choice of the cover is the Duquenne– Guigues, or canonical, basis of implications, which is the smallest set equivalent to the set of valid implications [11]. Unlike for the other frequently occurring FCA algorithmic task, the computation of all formal concepts of a formal con- text [12], only few algorithms have been proposed for the calculation of the canonical basis. The most widely-used algorithm was proposed by Ganter in [10]. Another, attribute-incremental, algorithm for the same problem was de- scribed in [17]. It is claimed to be much faster than Ganter’s algorithm for most practical situations. The Concept Explorer software system [21] uses this algo- rithm to generate the Duquenne–Guigues basis of a formal context. However, we do not discuss it here, for we choose to concentrate on the computation of implications in the lectic order (see Sect. 4). The lectic order is important in the interactive knowledge-acquisition procedure of attribute exploration [8], where implications are output one by one and the user is requested to confirm or reject (by providing a counterexample) each implication. Ganter’s algorithm repeatedly computes the closure of an attribute set with respect to a set of implications; therefore, it relies heavily on a subprocedure implementing a solution to the first problem. In Sect. 4, we describe possible optimizations of Ganter’s algorithm and experimentally compare the original and optimized versions in conjunction with each of the three algorithms for solving the first problem. A systematic comparison with the algorithm from [17] is left for further work. 2 The Duquenne–Guigues Basis of Implications Before proceeding, we quickly recall the definition of the Duquenne–Guigues basis and related notions. 
Given a (formal) context K = (G, M, I), where G is called a set of objects, M is called a set of attributes, and the binary relation I ⊆ G × M specifies which objects have which attributes, the derivation operators (·)′ are defined for A ⊆ G and B ⊆ M as follows:

  A′ = {m ∈ M | ∀g ∈ A : gIm}
  B′ = {g ∈ G | ∀m ∈ B : gIm}

In words, A′ is the set of attributes common to all objects of A and B′ is the set of objects sharing all attributes of B. The double application of (·)′ is a closure operator, i.e., (·)′′ is extensive, idempotent, and monotone. Therefore, sets A′′ and B′′ are said to be closed. Closed object sets are called concept extents and closed attribute sets are called concept intents of the formal context K. In discussing the algorithms later in the paper, we assume that the sets G and M are finite.

An implication over M is an expression A → B, where A, B ⊆ M are attribute subsets. It holds in the context if A′ ⊆ B′, i.e., every object of the context that has all attributes from A also has all attributes from B.

An attribute subset X ⊆ M respects (or is a model of) an implication A → B if A ⊈ X or B ⊆ X. Obviously, an implication holds in a context (G, M, I) if and only if {g}′ respects the implication for all g ∈ G. A set L of implications over M defines the closure operator X ↦ L(X) that maps X ⊆ M to the smallest set respecting all the implications in L:

  L(X) = ⋂ {Y | X ⊆ Y ⊆ M, ∀(A → B) ∈ L : A ⊈ Y or B ⊆ Y}.

We discuss algorithms for computing L(X) in Sect. 3. Note that, if L is the set of all valid implications of a formal context, then L(X) = X′′ for all X ⊆ M.

Two implication sets over M are equivalent if they are respected by exactly the same subsets of M. Equivalent implication sets define the same closure operator. A minimum cover of an implication set L is a set of minimal size among all implication sets equivalent to L. One particular minimum cover described in [11] is defined using the notion of a pseudo-closed set, which we introduce next.

A set P ⊆ M is called pseudo-closed (with respect to a closure operator (·)′′) if P ≠ P′′ and Q′′ ⊂ P for every pseudo-closed Q ⊂ P. In particular, all minimal non-closed sets are pseudo-closed. A pseudo-closed attribute set of a formal context is also called a pseudo-intent.

The Duquenne–Guigues or canonical basis of implications (with respect to a closure operator (·)′′) is the set of all implications of the form P → P′′, where P is pseudo-closed. This set of implications is of minimal size among those defining the closure operator (·)′′. If (·)′′ is the closure operator associated with a formal context, the Duquenne–Guigues basis is a minimum cover of valid implications of this context. The computation of the Duquenne–Guigues basis of a formal context is hard, since even recognizing pseudo-intents is a coNP-complete problem [2], see also [13, 7]. We discuss algorithms for computing the basis in Sect. 4.

3 Computing the Closure of an Attribute Set

In this section, we compare the performance of algorithms computing the closure of an attribute set X with respect to a set L of implications. Algorithm 1 [14] checks every implication A → B ∈ L and enlarges X with attributes from B if A ⊆ X. The algorithm terminates when a fixed point is reached, that is, when the set X cannot be enlarged any further (which always happens at some moment, since both L and M are assumed finite).
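Before turning to its complexity, here is a minimal executable sketch of this naive closure computation — our Python rendering, not the authors' C++ implementation — with implications given as (premise, conclusion) pairs of sets. It mirrors the repeat/for-all structure of Algorithm 1 shown below.

    def closure(X, implications):
        """Naive closure of attribute set X under a list of implications.

        Each implication is a pair (A, B) of sets, read as "A -> B".
        Keep firing implications whose premise is contained in X, discarding
        fired implications, until a pass over the list changes nothing.
        """
        X = set(X)
        remaining = list(implications)
        stable = False
        while not stable:
            stable = True
            still_unused = []
            for A, B in remaining:
                if A <= X:          # premise satisfied: fire the implication
                    X |= B
                    stable = False  # the implication is spent; X may have grown
                else:
                    still_unused.append((A, B))
            remaining = still_unused
        return X

    # Small test: the chain of implications {i} -> {i+1} discussed as Example 1 below, with n = 5.
    L = [({i}, {i + 1}) for i in range(4, 0, -1)]   # descending order of premises
    print(sorted(closure({1}, L)))                  # -> [1, 2, 3, 4, 5]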
The algorithm is obviously quadratic in the number of implications in L in the worst case. The worst case happens when exactly one implication is applied at each iteration (but the last one) of the repeat loop, resulting in |L|(|L| + 1)/2 iterations of the for all loop, each requiring O(|M|) time.

Example 1. A simple example is when X = {1} and the implications in L = {{i} → {i + 1} | i ∈ N, 0 < i < n} for some n are arranged in the descending order of their one-element premises.

Algorithm 1 Closure(X, L)
Input: An attribute set X ⊆ M and a set L of implications over M.
Output: The closure of X w.r.t. implications in L.
  repeat
    stable := true
    for all A → B ∈ L do
      if A ⊆ X then
        X := X ∪ B
        stable := false
        L := L \ {A → B}
  until stable
  return X

In [3], a linear-time algorithm, LinClosure, is proposed for the same problem. Algorithm 2 is identical to the version of LinClosure from [14] except for one modification designed to allow implications with empty premises in L. LinClosure associates a counter with each implication, initializing it with the size of the implication premise. Also, each attribute is linked to a list of implications that have it in their premises. The algorithm then checks every attribute m of X (the set whose closure must be computed) and decrements the counters for all implications linked to m. If the counter of some implication A → B reaches zero, attributes from B are added to X. Afterwards, they are used to decrement counters along with the original attributes of X. When all attributes in X have been checked in this way, the algorithm stops with X containing the closure of the input attribute set.

It can be shown that the algorithm is linear in the length of the input, assuming that each attribute in the premise or conclusion of any implication in L requires a constant amount of memory [14].

Example 2. The worst case for LinClosure occurs, for instance, when X ⊂ N, M = X ∪ {1, 2, . . . , n} for some n such that X ∩ {1, 2, . . . , n} = ∅ and L consists of implications of the form X ∪ {i | 0 < i < k} → {k} for all k such that 1 ≤ k ≤ n. During each of the first |X| iterations of the for all loop, the counters of all implications will have to be updated, with only the last iteration adding one attribute to X using the implication X → {1}. At each of the subsequent n − 1 iterations, the counter for every so far "unused" implication will be updated and one attribute will be added to X. The next, (|X| + n)th, iteration will terminate the algorithm. Note that, if the implications in L are arranged in the superset-inclusion order of their premises, this example will present the worst case for Algorithm 1, requiring n iterations of the main loop. However, if the implications are arranged in the subset-inclusion order of their premises, one iteration will be sufficient.

Algorithm 2 LinClosure(X, L)
Input: An attribute set X ⊆ M and a set L of implications over M.
Output: The closure of X w.r.t. implications in L.
  for all A → B ∈ L do
    count[A → B] := |A|
    if |A| = 0 then
      X := X ∪ B
    for all a ∈ A do
      add A → B to list[a]
  update := X
  while update ≠ ∅ do
    choose m ∈ update
    update := update \ {m}
    for all A → B ∈ list[m] do
      count[A → B] := count[A → B] − 1
      if count[A → B] = 0 then
        add := B \ X
        X := X ∪ add
        update := update ∪ add
  return X

Inspired by the mechanism used in LinClosure to obtain linear asymptotic complexity, but somewhat disappointed by the poor performance of the algorithm relative to Closure, which was revealed in his experiments, Wild proposed a new algorithm in [20]. We present this algorithm (in a slightly more compact form) as Algorithm 3. The idea is to maintain implication lists similar to those used in LinClosure, but get rid of the counters. Instead, at each step, the algorithm combines the implications in the lists associated with attributes not occurring in X and "fires" the remaining implications (i.e., uses them to enlarge X). When there is no implication to fire, the algorithm terminates with X containing the desired result.

Wild claims that his algorithm is faster than both LinClosure and Closure, even though it has the same asymptotic complexity as the latter. The worst case for Algorithm 3 is when L \ L1 contains exactly one implication A → B and B \ X contains exactly one attribute at each iteration of the repeat . . . until loop. Example 1 presents the worst case for Algorithm 3, but, unlike for Closure, the order of implications in L is irrelevant. The worst case for LinClosure (see Example 2) is also the worst case for Algorithm 3, but it deals with it, perhaps, in a more efficient way, using n iterations of the main loop compared to n + |X| iterations of the main loop in LinClosure.

Algorithm 3 Wild's Closure(X, L)
Input: An attribute set X ⊆ M and a set L of implications over M.
Output: The closure of X w.r.t. implications in L.
  for all m ∈ M do
    for all A → B ∈ L do
      if m ∈ A then
        add A → B to list[m]
  repeat
    stable := true
    L1 := ⋃ {list[m] | m ∈ M \ X}
    for all A → B ∈ L \ L1 do
      X := X ∪ B
      stable := false
    L := L1
  until stable
  return X

Experimental Comparison

We implemented the algorithms in C++ using Microsoft Visual Studio 2010. For the implementation of attribute sets, as well as sets of implications in Algorithm 3, we used dynamic bit sets from the Boost library [6]. All the tests described in the following sections were carried out on an Intel Core i5 2.67 GHz computer with 4 Gb of memory running under Windows 7 Home Premium x64.

Figure 1 shows the performance of the three algorithms on Example 1. Algorithm 2 is the fastest algorithm in this case: for a given n, it needs n iterations of the outer loop—the same as the other two algorithms, but the inner loop of Algorithm 2 checks exactly one implication at each iteration, whereas the inner loop of Algorithm 1 checks n − i implications at the ith iteration. Although the inner loop of Algorithm 3 checks only one implication at the ith iteration, it has to compute the union of n − i lists in addition.

[Line chart: time in seconds per 1000 tests against n for Closure, LinClosure, and Wild's Closure.]
Fig. 1. The performance of Algorithms 1–3 for Example 1.

Figure 2 shows the performance of the algorithms on Example 2. Here, the behavior of Algorithm 2 is similar to that of Algorithm 1, but Algorithm 2 takes more time due to the complicated initialization step.

[Line chart: time in seconds per 1000 tests against n for Closure, LinClosure, and Wild's Closure.]
Fig. 2.
The performance of Algorithms 1–3 for Example 2 with implications in L arranged in the superset-inclusion order of their premises and |X| = 50.

Interestingly, Algorithm 1 works almost twice as fast on Example 2 as it does on Example 1. This may seem surprising, since it is easy to see that the algorithm performs essentially the same computations in both cases, the difference being that the implications of Example 1 have single-element premises. However, this turns out to be a source of inefficiency: at each iteration of the main loop, all implications but the last fail to fire, but, for each of them, the algorithm checks if their premises are included in the set X. Generally, when A ⊈ X, this can be established more easily if A is large, for, in this case, A is likely to contain more elements outside X. This effect is reinforced by the implementation of sets as bit strings: roughly speaking, to verify that {i} ⊈ {1}, it is necessary to check all bits up to i, whereas {i | 0 < i < k} ⊈ {k + 1} can be established by checking only one bit (assuming that bits are checked from left to right). Alternative data structures for set implementation might have less dramatic consequences for performance in this setting. On the other hand, the example shows that performance may be affected by issues not so obviously related to the structure of the algorithm, thus suggesting additional paths to obtain an optimal behavior (e.g., by rearranging attributes or otherwise preprocessing the input data).

We have experimented with computing closures using the Duquenne–Guigues bases of formal contexts as input implication sets. Table 1 shows the results for randomly generated contexts. The first two columns indicate the size of the attribute set and the number of implications, respectively. The remaining three columns record the time (in seconds) for computing the closures of 1000 randomly generated subsets of M by each of the three algorithms. Table 3 presents similar results for datasets taken from the UCI repository [5] and, if necessary, transformed into formal contexts using FCA scaling [9].1 The contexts are described in Table 2, where the last four columns correspond to the number of objects, number of attributes, number of intents, and number of pseudo-intents (i.e., the size of the canonical basis) of the context named in the first column.

1 The breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia (now Slovenia). Thanks go to M. Zwitter and M. Soklic for providing the data.

Table 1. Performance on randomly generated tests (time in seconds per 1000 closures)

  |M|   |L|    Algorithm 1   Algorithm 2   Algorithm 3
   30    557      0.0051        0.2593        0.0590
   50   1115      0.0118        0.5926        0.1502
  100    380      0.0055        0.2887        0.0900
  100    546      0.0086        0.4229        0.1350
  100   2269      0.0334        1.5742        0.5023
  100   3893      0.0562        2.6186        0.8380
  100   7994      0.1134        5.3768        1.7152
  100   8136      0.1159        5.6611        1.8412

Table 2. Contexts obtained from UCI datasets

  Context                    |G|   |M|   # intents   # pseudo-intents
  Zoo                        101    28        379            141
  Postoperative Patient       90    26       2378            619
  Congressional Voting       435    18      10644            849
  SPECT                      267    23      21550           2169
  Breast Cancer              286    43       9918           3354
  Solar Flare               1389    49      28742           3382
  Wisconsin Breast Cancer    699    91       9824          10666

In these experiments, Algorithm 1 was the fastest and Algorithm 2 was the slowest, even though it has the best asymptotic complexity. This can be partly explained by the large overhead of the initialization step (setting up counters and implication lists). Therefore, these results can be used as a reference only when the task is to compute one closure for a given set of implications. When a large number of closures must be computed with respect to the same set of implications, Algorithms 2 and 3 may be more appropriate.

Table 3. Performance on the canonical bases of contexts from Table 2 (time in seconds per 1000 closures)

  Context                    Algorithm 1   Algorithm 2   Algorithm 3
  Zoo                           0.0036        0.0905        0.0182
  Postoperative Patient         0.0054        0.2980        0.0722
  Congressional Voting          0.0075        0.1505        0.0883
  SPECT                         0.0251        0.9848        0.2570
  Breast Cancer                 0.0361        1.7912        0.5028
  Solar Flare                   0.0370        2.1165        0.6317
  Wisconsin Breast Cancer       0.1368        8.4984        2.4730

4 Computing the Basis in the Lectic Order

The best-known algorithm for computing the Duquenne–Guigues basis was developed by Ganter in [10]. The algorithm is based on the fact that the intents and pseudo-intents of a context taken together form a closure system. This makes it possible to iteratively generate all intents and pseudo-intents using Next Closure (see Algorithm 4), a generic algorithm for enumerating closed sets of an arbitrary closure operator (also proposed in [10]). For every generated pseudo-intent P, an implication P → P′′ is added to the basis. The intents, which are also generated, are simply discarded.

Algorithm 4 Next Closure(A, M, L)
Input: A closure operator X ↦ L(X) on M and a subset A ⊆ M.
Output: The lectically next closed set after A.
  for all m ∈ M in reverse order do
    if m ∈ A then
      A := A \ {m}
    else
      B := L(A ∪ {m})
      if B \ A contains no element < m then
        return B
  return ⊥

Next Closure takes a closed set as input and outputs the next closed set according to a particular lectic order, which is a linear extension of the subset-inclusion order. Assuming a linear order < on attributes in M, we say that a set A ⊆ M is lectically smaller than a set B ⊆ M if

  ∃b ∈ B \ A  ∀a ∈ A (a < b ⇒ a ∈ B).

In other words, the lectically larger of two sets is the one containing the smallest element in which they differ.

Example 3. Let M = {a < b < c < d < e < f}, A = {a, c, e} and B = {a, b, f}. Then, A is lectically smaller than B, since the first attribute in which they differ, b, is in B. Note that, if we represent sets by bit strings with smaller attributes corresponding to higher-order bits (in our example, A = 101010 and B = 110001), the lectic order will match the usual less-than order on binary numbers.

To be able to use Next Closure for iterating over intents and pseudo-intents, we need access to the corresponding closure operator. This operator, which we denote by •, is defined via the Duquenne–Guigues basis L as follows.2 For a subset A ⊆ M, put

  A+ = A ∪ ⋃ {P′′ | P → P′′ ∈ L, P ⊂ A}.

Then, A• = A++···+, where A•+ = A•; i.e., • is the transitive closure of +. The problem is that L is not available when we start; in fact, this is precisely what we want to generate. Fortunately, for computing a pseudo-closed set A, it is sufficient to know only implications with premises that are proper subsets of A. Generating pseudo-closed sets in the lectic order, which is compatible with the subset-inclusion order, we ensure that, at each step, we have at hand the required part of the basis. Therefore, we can use any of the three algorithms from Sect. 3 to compute A• (provided that the implication A• → A′′ has not been added to L yet).
Algorithm 5 uses Next Closure to generate the canonical basis. It passes Next Closure the part of the basis computed so far; Next Closure may call any of the Algorithms 1–3 to compute the closure, L(A ∪ {m}), with respect to this set of implications. After Next Closure computes A•, the implication A• → A′′ may be added to the basis. Algorithm 5 will then pass A• as the input to Next Closure, but there is some room for optimizations here. Let i be the maximal element of A and j be the minimal element of A′′ \ A. Consider the following two cases:

j < i: As long as m > i, the set L(A• ∪ {m}) will be rejected by Next Closure, since it will contain j. Hence, it makes sense to skip all m > i and continue as if A• had been rejected by Next Closure. This optimization has already been proposed in [17].

i < j: It can be shown that, in this case, the lectically next intent or pseudo-intent after A• is A′′. Hence, A′′ could be used at the next step instead of A•.

Algorithm 6 takes these considerations into account.

2 We deliberately use the same letter L for an implication set and the closure operator it defines.

Algorithm 5 Canonical Basis(M, ′′)
Input: A closure operator X ↦ X′′ on M, e.g., given by a formal context (G, M, I).
Output: The canonical basis for the closure operator.
  L := ∅
  A := ∅
  while A ≠ M do
    if A ≠ A′′ then
      L := L ∪ {A → A′′}
    A := Next Closure(A, M, L)
  return L

Algorithm 6 Canonical Basis(M, ′′), an optimized version
Input: A closure operator X ↦ X′′ on M, e.g., given by a formal context (G, M, I).
Output: The canonical basis for the closure operator.
  L := ∅
  A := ∅
  i := the smallest element of M
  while A ≠ M do
    if A ≠ A′′ then
      L := L ∪ {A → A′′}
    if A′′ \ A contains no element < i then
      A := A′′
      i := the largest element of M
    else
      A := {m ∈ A | m ≤ i}
    for all j ≤ i in M in reverse order do
      if j ∈ A then
        A := A \ {j}
      else
        B := L(A ∪ {j})
        if B \ A contains no element < j then
          A := B
          i := j
          exit for
  return L

Experimental Comparison

We used Algorithms 5 and 6 for constructing the canonical bases of the contexts involved in testing the performance of the algorithms from Sect. 3, as well as the context (M, M, ≠) with |M| = 18, which is special in that every subset of M is closed (and hence there are no valid implications). Both algorithms have been tested in conjunction with each of the three procedures for computing closures (Algorithms 1–3). The results are presented in Table 4 and Fig. 3. It can be seen that Algorithm 6 indeed improves on the performance of Algorithm 5. Among the three algorithms computing the closure, the simpler Algorithm 1 is generally more efficient, even though, in our implementation, we do not perform the initialization step of Algorithms 2 and 3 from scratch each time we need to compute a closure of a new set; instead, we reuse the previously constructed counters and implication lists and update them incrementally with the addition of each new implication. We prefer to treat these results as preliminary: it still remains to see whether the asymptotic behavior of LinClosure will give it an advantage over the other algorithms on larger contexts.

Table 4. Time (in seconds) for building the canonical bases of artificial contexts

  Context        # intents   # pseudo-intents     5+1     5+2     5+3     6+1     6+2     6+3
  100 × 30, 4         307          557          0.0088  0.0145  0.0119  0.0044  0.0065  0.0059
  10 × 100, 25        129          380          0.0330  0.0365  0.0431  0.0073  0.0150  0.0169
  100 × 50, 4         251         1115          0.0442  0.0549  0.0617  0.0138  0.0152  0.0176
  10 × 100, 50        559          546          0.0542  0.1312  0.1506  0.0382  0.0932  0.0954
  20 × 100, 25        716         2269          0.3814  0.3920  0.7380  0.1219  0.1312  0.2504
  50 × 100, 10        420         3893          1.1354  0.7291  1.6456  0.1640  0.1003  0.2299
  900 × 100, 4       2472         7994          4.6313  2.7893  6.3140  1.5594  0.8980  2.0503
  20 × 100, 50      12394         8136          7.3097  8.1432  14.955  5.1091  6.0182  10.867
  (M, M, ≠)        262144            0          0.1578  0.3698  0.1936  0.1333  0.2717  0.1656

Fig. 3. Time (in seconds) for building the canonical bases of contexts from Table 2.
[Three bar charts over the six combinations 5+1, 5+2, 5+3, 6+1, 6+2, 6+3: one for Zoo, Postoperative Patient and Congressional Voting; one for SPECT and Breast Cancer; one for Solar Flare and Wisconsin Breast Cancer.]

5 Conclusion

In this paper, we compared the performance of several algorithms computing the closure of an attribute set with respect to a set of implications. Each of these algorithms can be used as a (frequently called) subroutine while computing the Duquenne–Guigues basis of a formal context. We tested them in conjunction with Ganter's algorithm and its optimized version.

In our future work, we plan to extend the comparison to algorithms generating the Duquenne–Guigues basis in a different (non-lectic) order, in particular, to incremental [17] and divide-and-conquer [19] approaches, probably in conjunction with newer algorithms for computing the closure of a set [16]. In addition, we are going to consider algorithms that generate other implication covers: for example, the direct basis [15, 20, 4] or the proper basis [18]. They can be used as an intermediate step in the computation of the Duquenne–Guigues basis. If the number of intents is much larger than the number of pseudo-intents, this two-step approach may be more efficient than direct generation of the Duquenne–Guigues basis with Algorithms 5 or 6, which produce all intents as a side effect.

Acknowledgements The second author was supported by the Academic Fund Program of the Higher School of Economics (project 10-04-0017) and the Russian Foundation for Basic Research (grant no. 08-07-92497-NTsNIL a).

References

1. Armstrong, W.: Dependency structures of data base relationships. Proc. IFIP Congress, pp. 580–583 (1974)
2. Babin, M.A., Kuznetsov, S.O.: Recognizing pseudo-intents is coNP-complete. In: Kryszkiewicz, M., Obiedkov, S. (eds.) Proceedings of the 7th International Conference on Concept Lattices and Their Applications. pp. 294–301. University of Sevilla, Spain (2010)
3. Beeri, C., Bernstein, P.: Computational problems related to the design of normal form relational schemas. ACM TODS 4(1), 30–59 (March 1979)
4. Bertet, K., Monjardet, B.: The multiple facets of the canonical direct unit implicational basis. Theor. Comput. Sci. 411(22-24), 2155–2166 (2010)
5. Blake, C., Merz, C.: UCI repository of machine learning databases (1998), http://archive.ics.uci.edu/ml
6. Demming, R., Duffy, D.: Introduction to the Boost C++ Libraries. Datasim Education Bv (2010), see http://www.boost.org
7.
Distel, F., Sertkaya, B.: On the complexity of enumerating pseudo-intents. Discrete Appl. Math. 159, 450–466 (March 2011) 8. Ganter, B.: Attribute exploration with background knowledge. Theor. Comput. Sci. pp. 215–233 (1999) 9. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Berlin (1999) 10. Ganter, B.: Two basic algorithms in concept analysis. Preprint 831, Technische Hochschule Darmstadt, Germany (1984) 11. Guigues, J.L., Duquenne, V.: Familles minimales d’implications informatives re- sultant d’un tableau de donnees binaires. Math. Sci. Hum. 95(1), 5–18 (1986) 12. Kuznetsov, S., Obiedkov, S.: Comparing performance of algorithms for generating concept lattices. Journal of Experimental and Theoretical Artificial Intelligence 14(2/3), 189–216 (2002) 13. Kuznetsov, S.O., Obiedkov, S.: Some decision and counting problems of the Duquenne–Guigues basis of implications. Discrete Appl. Math. 156(11), 1994–2003 (2008) 14. Maier, D.: The theory of relational databases. Computer software engineering se- ries, Computer Science Press (1983) Comparing performance of algorithms for generating the DG basis 57 15. Mannila, H., Räihä, K.J.: The design of relational databases. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (1992) 16. Mora, A., Aguilera, G., Enciso, M., Cordero, P., de Guzman, I.P.: A new closure algorithm based in logic: SLFD-Closure versus classical closures. Inteligencia Ar- tificial, Revista Iberoamericana de IA 10(31), 31–40 (2006) 17. Obiedkov, S., Duquenne, V.: Attribute-incremental construction of the canonical implication basis. Annals of Mathematics and Artificial Intelligence 49(1-4), 77–99 (April 2007) 18. Taouil, R., Bastide, Y.: Computing proper implications. In Proc. ICCS-2001 In- ternational Workshop on Concept Lattices-Based Theory, Methods and Tools for Knowledge Discovery in Databases pp. 290–303 (2001) 19. Valtchev, P., Duquenne, V.: On the merge of factor canonical bases. In: Medina, R., Obiedkov, S. (eds.) ICFCA. Lecture Notes in Computer Science, vol. 4933, pp. 182–198. Springer (2008) 20. Wild, M.: Computations with finite closure systems and implications. In: Comput- ing and Combinatorics. pp. 111–120 (1995) 21. Yevtushenko, S.A.: System of data analysis “Concept Explorer” (in Russian). In: Proceedings of the 7th national conference on Artificial Intelligence KII-2000. pp. 127–134. Russia (2000), http://conexp.sourceforge.net/ Filtering Machine Translation Results with Automatically Constructed Concept Lattices Yılmaz Kılıçaslan1 and Edip Serdar Güner1, 1 Trakya University, Department of Computer Engineering, 22100 Edirne, Turkey {yilmazk, eserdarguner}@trakya.edu.tr Abstract. Concept lattices can significantly improve machine translation systems when applied as filters to their results. We have developed a rule-based machine translator from Turkish to English in a unification-based programming paradigm and supplemented it with an automatically constructed concept lattice. The test results achieved by applying this translation system to a Turkish child story reveals that lattices used as filters to translation results have a promising potential to improve machine translation. We have compared our system with Google Translate on the data. The comparison suggests that a rule- based system can even compete with this statistical machine translation system that stands out with its wide range of users. Keywords: Concept Lattices, Rule-based Machine Translation, Evaluation of MT systems. 
1 Introduction Paradigms of Machine translation (MT) can be classified into two major categories depending on their focus: result-oriented paradigms and process-oriented ones. Statistical MT focuses on the result of the translation, not the translation process itself. In this paradigm, translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. Rule-based MT, a more classical paradigm, focuses on the selection of representations to be used and steps to be performed during the translation process. It is the rule-based paradigm that will be the concern of this paper. We argue for the viability of a rule-based translation model where a concept lattice functions as a filter for its results. In what follows, we first introduce the classical models for doing rule-based MT, illustrating particular problematic cases with translation pairs between Turkish and English (cf. Section 2). Then, we briefly introduce the basic notions of Formal Concept Analysis (FCA) and touch upon the question of how lattices built using FCA can serve as a bridge between two languages (cf. Section 3). This is followed by the presentation of our translation system (cf. Section 4). Subsequently, we report on and evaluate several experiments which we have performed by feeding our translation system with a Turkish child story text (cf. Section 5). The discussion ends with some remarks and with a summary of the paper (cf. Section 6). c 2011 by the paper authors. CLA 2011, pp. 59–73. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 60 Yılmaz Kılıçaslan and Edip Serdar Güner 2 Models for Rule-Based Translation 2.1 Direct Translation The most straightforward MT strategy is the so-called direct translation. Basically, the strategy is to translate each word into its target language counterpart while proceeding word-by-word through the source language text or speech. If the only difference between two languages were due to their lexical choices, this approach could be a very easy way of producing high quality translation. However, languages differ from each other not only lexically but also structurally. In fact, the direct translation strategy works very well only for very simple cases like the following: (1) Turkish: Direct Translation to English: Köpek-ler havlar-lar. Dogs bark. dog-pl bark-3pl In this example, the direct translation strategy provides us with a perfect translation of the Turkish sentence (interpreted as a kind-level statement about dogs). But, consider now the following example: (2) Turkish: Direct Translation to English: Supposing that the referent of the pronoun is a male person, the expected translation for the given Turkish sentence would be the following: (3) Correct Translation: The woman knows him. The direct translation approach fails in this example in the following respects: First, the translation results in a subject-object-verb (SOV) ordering, which does not comply with the canonical SVO ordering in English. SOV is the basic word order in Turkish. Second, the subject does not have the required definite article in the translation. The reason for this is another typological difference between the two languages: Turkish lacks a definite article. 
Third, the word-by-word translation leaves the English auxiliary verb ambiguous with respect to number, as the Turkish verb does not carry the number information. Fourth, the verb know is encoded in the progressive aspect in the translation, which is unacceptable as it denotes a mental state. This anomaly is the result of directly translating the Turkish continuous suffix –yor to the English suffix –ing. Fifth, the pronoun is left ambiguous with respect to gender in the translation, as Turkish pronouns do not bear this information. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 61 2.2 Transfer Approach 2.2.1 Syntactic Transfer As Jurafsky and Martin [6] point out, examples like those above suggest that the direct approach to MT is too focused on individual words and that we need to add phrasal and structural knowledge into our MT models to achieve better results. It is through the transfer approach that a rule-based strategy incorporates the structural knowledge into the MT model. In this approach, MT involves three phases: analysis, transfer, and generation. In the analysis phase, the source language text is parsed into a syntactic and/or semantic structure. In the transfer phase, the structure of the source language is transformed to a structure of the target language. The generation phase takes this latter structure as input and turns it to an actual text of the target language. Let us first see how the transfer technique can make use of syntactic knowledge to improve the translation result of the example discussed above. Assuming a simple syntactic paradigm, the input sentence can be parsed into the following structure: (4) Once the sentence has been parsed, the resulting tree will undergo a syntactic transfer operation to resemble the target parse tree and this will be followed by a lexical transfer operation to generate the target text: (5) The syntactic transfer exploits the following facts about English: a singular count noun must have a determiner and the subject agrees in number and person with the verb. Collecting the leaves of the target parse tree, we get the following output: (6) Translation via Syntactic Transfer: This output is free from the first three defects noted with the direct translation. However, the problem of encoding the mental state verb in progressive aspect and the 62 Yılmaz Kılıçaslan and Edip Serdar Güner gender ambiguity of the pronoun still await to be resolved. These require meaning- related knowledge to be incorporated into the MT model. 2.2.2 Semantic Transfer The context-independent aspect of meaning is called semantic meaning. A crucial component of the semantic meaning of a natural language sentence is its lexical aspect, which determines whether the situation that the sentence describes is a (punctual) event, a process or a state. This information is argued to be inherently encoded in the verb. Obviously, knowing is a mental state and, hence, cannot be realized in the progressive aspect. We can apply a shallow semantic analysis to our previously obtained syntactic structure, which will give us a tree structure enriched with aspectual information, and thereby achieve a more satisfactory transfer: (7) The resulting translation is the following: (8) Translation via Semantic Transfer: 2.3 Interlingua Approach There are two problems with the transfer model: it requires contrastive knowledge about languages and it requires such knowledge for every pair of languages. 
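To make the transfer machinery of Section 2.2 concrete, the following Python fragment is a minimal sketch, not the authors' unification-based Prolog implementation described later: it reorders the SOV parse of (2) (Kadın onu tanıyor) into SVO, performs lexical transfer, and applies the two English-specific repairs discussed above (determiner insertion and subject-verb agreement). Aspect and gender are deliberately ignored, so the output corresponds to (8) with the pronoun still ambiguous. The lexicon, the tuple representation of the parse and the function names are all hypothetical.

# A toy sketch of syntactic + lexical transfer for "Kadın onu tanıyor".
# Nothing here reflects the actual system of Section 4.3; the inflected
# Turkish forms are mapped directly to English stems, so the progressive
# aspect problem discussed above does not arise in this toy version.
LEXICON = {"kadın": "woman", "onu": "him/her/it", "tanıyor": "know"}

def transfer(sov_parse):
    subj, obj, verb = sov_parse              # Turkish canonical order: SOV
    svo = [subj, verb, obj]                  # structural transfer: SOV -> SVO
    words = [LEXICON[w] for w in svo]        # lexical transfer
    words[0] = "the " + words[0]             # singular count-noun subject needs a determiner
    words[1] = words[1] + "s"                # 3rd person singular agreement with the subject
    return (" ".join(words) + ".").capitalize()

print(transfer(("kadın", "onu", "tanıyor")))   # The woman knows him/her/it.

A semantic transfer step would additionally inspect the lexical aspect of the verb to rule out the progressive form, while the gender of the pronoun is only resolved by the lattice-based filtering of Section 4.4.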
If the meaning of the input can be extracted and encoded in a language-independent form and the output can, in turn, be generated out of this form, there will be no need for any kind of contrastive knowledge. A language-independent meaning representation language to be used in such a scheme is usually referred to as an interlingua. A common way to visualize the three approaches to rule-based MT is with Vauquois triangle shown below (adopted from [6]): Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 63 Fig. 1. The Vauquois triangle. As Jurafsky and Martin point out: [t]he triangle shows the increasing depth of analysis required (on both the analysis and generation end) as we move from the direct approach through transfer approaches, to interlingual approaches. In addition, it shows the decreasing amount of transfer knowledge needed as we move up the triangle, from huge amounts of transfer at the direct level (almost all knowledge is transfer knowledge for each word) through transfer (transfer rules only for parse trees or thematic roles) through interlingua (no specific transfer knowledge). (p. 867) 3 Lattice-Based Interlingua Strategy A question left open above is that of what kind of representation scheme can be used as an interlingua. There are many possible alternatives such as predicate calculus, Minimal Recursion Semantics or an event-based representation. Another interesting possibility is to use lattices built using Formal Concept Analysis (FCA) as meaning representations to this effect. FCA, developed by Ganter & Wille [5], assumes that data from an application are given by a formal context, a triple (G, M, I) consisting of two sets G and M and a so called incidence relation I between these sets. The elements of G are called the objects and the elements of M are called the attributes. The relation I holds between g and m, (g, m) ∈ I if and only if the object g has the attribute m. A formal context induces two operators, both of which usually denoted by ʹ. One of these operators maps each set of objects A to the set of attributes Aʹ which these objects have in common. The other operator maps each set of attributes B to the set of objects Bʹ which satisfies these attributes. FCA is in fact an attempt to give a formal definition of the notion of a ‘concept’. A formal concept of the context (G, M, I) is a pair (A, B) such that G ⊇ A = Aʹ and M ⊇ B Bʹ. A is called the extent and B the intent of the concept (A, B). The set of all concepts of the context (G, M, I) is denoted by C(G, M, I). This set is ordered by a subconcept – superconcept relation, which is a partial order relation denoted by ≤. If (A1, B1) and (A2, B2) are concepts in C(G, M, I), the former is said to 64 Yılmaz Kılıçaslan and Edip Serdar Güner be a subconcept of the latter (or, the latter a superconcept of the former), i.e., (A1, B1) ≤ (A2, B2), if and only if A1 ⊆ A2 (which is equivalent to B1 ⊇ B2). The ordered set C(G, M, I; ≤) is called the concept lattice or (Galois lattice) of the context (G, M, I). A concept lattice can be drawn as a (Hasse) diagram in which concepts are represented by nodes interconnected by lines going down from superconcept nodes to subconcept ones. Priss [15], rewording an idea first mentioned by Kipke & Wille [8], suggests that once linguistic databases are formalized as concept lattices, the lattices can serve as an interlingua. She explains how a concept lattice can serve as a bridge between two languages with the aid of the figure below (taken from [13]): Fig. 2. 
– A concept lattice as an interlingua. [This figure] shows separate concept lattices for English and German words for “building”. The main difference between English and German is that in English “house” only applies to small residential buildings (denoted by letter “H”), whereas in German even small office buildings (denoted by letter “O”) and larger residential buildings can be called “Haus”. Only factories would not normally be called “Haus” in German. The lattice in the top of the figure constitutes an information channel in the sense of Barwise & Seligman [2] between the German and the English concept lattice. ([15] p. 158) We consider Priss’s approach a promising avenue for interlingua-based translation strategies. We suggest that this approach can work not only for isolated words but also even for text fragments. In what follows, we will sketch out a strategy with interlingual concept lattices serving as filters for refining translation results. The strategy proceeds as follows: 1) Compile a concept lattice from a data source like WordNet. 2) Link the nodes of the lattice to their possibly corresponding expressions in the source and target language. 3) Translate the input text into the target language with no consideration of the pragmatic aspects of its meaning. 4) Integrate the concepts derived from the input text into the concept lattice. The main motivation behind this strategy is to refine the translation results to a certain extent by means of pragmatic knowledge structured as formal contexts. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 65 4 A Translation System with Interlingual Concept Lattices 4.1 A Concept Lattice Generator Concept lattices to be used as machine translation filters should contain concept nodes associated with both functional and substantive words. All languages have a finite number of functional words. Therefore, a manual construction of the lattice fragments that would contain them would be reasonable. However, manually constructing a concept lattice for lexical words would have considerable drawbacks such as the following: • It is labor intensive. • It is prone to yielding errors which are difficult to detect automatically. • It generates incomplete lists that are costly to extend to cover missing information. • It is not easy to adapt to changes and domain-specific needs. Taking these potential problems into consideration, we have developed a tool for generating concept lattices for lexical words automatically. As this is an FCA application, it is crucial to decide on which formal context to use before delving its implementation details. Priss & Old [16] propose to construct concept neighborhoods in WordNet with a formal context where the formal objects are the words of the synsets belonging to all senses of a word, the formal attributes are the words of the hypernymic synsets and the incidence relation is the semantic relation between the synsets and their hypernymic synsets. The neighborhood lattice of a word in WordNet consists of all words that share some senses with that word.1 Below is the neighborhood lattice their method yields for the word volume: Fig. 3. – Priss and Old’s neighborhood lattice for the word volume. 1 As lattices often grow very rapidly to a size too large to be visualized, Wille [18] describes a method for constructing smaller, so-called “neighborhood” lattices. 66 Yılmaz Kılıçaslan and Edip Serdar Güner Consider the bottom node. The concept represented by this node is not a naturally occurring one. 
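To keep the FCA notation of Section 3 concrete while discussing such neighborhood lattices, the following is a generic toy sketch (invented data, not the WordNet-based tool of this section) of the two derivation operators and of the test for a pair (A, B) being a formal concept.

# Toy formal context (G, M, I): objects are word senses, attributes their
# hypernyms; the incidence relation I is a set of (object, attribute) pairs.
I = {("volume_publication", "book"), ("volume_publication", "product"),
     ("volume_amount", "magnitude"), ("loudness", "magnitude"),
     ("loudness", "sound_property")}
G = {g for g, _ in I}                     # objects
M = {m for _, m in I}                     # attributes

def common_attributes(A):                 # A' : attributes shared by all objects of A
    return {m for m in M if all((g, m) in I for g in A)}

def common_objects(B):                    # B' : objects having all attributes of B
    return {g for g in G if all((g, m) in I for m in B)}

def is_concept(A, B):
    return common_attributes(A) == B and common_objects(B) == A

B = common_attributes({"volume_amount", "loudness"})
print(B, is_concept(common_objects(B), B))    # {'magnitude'} True

Enumerating all such pairs, for instance with the parallel Close-by-One-style algorithm of the FCALGS library used later in this section, yields the concept lattice drawn in the Hasse diagrams above.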
Obviously, the adopted formal context causes two distinct natural concepts to collapse into one single formal concept here. The reason is simply that WordNet employs one single word, i.e., volume, for two distinct senses, i.e., publication and amount. This could leave a translation attempt with the task of disambiguating this word. In fact, WordNet marks each sense with a single so-called synset number. When constructing concept lattices in WordNet, we suggest two amendments to the formal context adopted by Priss and Old. First, the formal objects are to be the synset numbers. Second, the formal attributes are to include also some information compiled from the glosses of the words. The first change allows us to distinguish between the two senses of the word volume, as shown in Fig. 4a. But, we are still far from resolving all ambiguities concerning this word, as indicated by the presence of two objects in the leftmost node. The problem is that the hypernymic attributes are not sufficiently informative to differentiate the 3-D space sense of the word volume from its relative amount sense. This extra information resides in the glosses of the word and once encoded as attributes it evokes the required effect, as shown in Fig. 4b. Fig. 4a. – A neighborhood lattice with the Fig. 4b. – A more fine-grained neighborhood objects being synset numbers. lattice with the objects being synset numbers. Each gloss, which is most likely a noun phrase, is parsed by means of a shift-reduce parser to extract a set of attributes. Having collected the objects (i.e. the synset numbers) and the associated attributes, the FCA algorithm that comes with the FCALGS library [9] is used for deriving a lattice-based ontology from that collection. FCALGS employs a parallel and recursive algorithm. Apart from its being parallel, it is very similar to Kuznetsov’s [10] Close-by-One algorithm. However, even the lattice in Fig4.b is still defective in at least one respect. The names of the objects denoted are lost. To remedy this problem, we suggest to encode the objects as tuples of synset numbers and sets of names, as illustrated below. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 67 Fig. 5. – A neighborhood lattice including the names of the objects. Another point to note is that the name of a synset serves as the attribute of a subconcept. For example, ‘entity’ is the name of the topmost synset. But, as everything is an entity, any subconcept must treat it as an element of its set of attributes. 4.2 A Sense Translator Each WordNet node is associated with a set of synonymous English words, which is referred to as its synset. Each synset, in effect, denotes a sense in English. Thus, one task to accomplish is to translate synsets into Turkish to the furthest possible extent. We should, of course, keep in mind that some synsets (i.e. some senses encoded in English) may not have a counterpart in the target language. To find the Turkish translation of a particular synset, the Sense Translator first downloads a set of relevant articles via the links given in the disambiguation pages Wikipedia provides for the words in this set. It searches for the hypernyms of the synset in these articles. It assigns each article a score in accordance with the sum of the weighted points of the hypernyms found in this article. More specifically, if a synset has N hypernyms, the Kth hypernym starting from the top is assigned WeightK = K/N. 
Let FrequencyK be the number of occurrences of an item in a given article, then the score of the article is calculated as follows: Article Score = Weight1 * Frequency1 + ... + WeightN * FrequencyN. (1) If the article with the highest score has a link to a Turkish article, the title of the article will be the translation of the English word under examination. Otherwise, the word will be left unpaired with a Turkish counterpart. Figure 6 visualizes how the word cat in WordNet is translated into its Turkish counterpart, kedi, via Wikipedia. 68 Yılmaz Kılıçaslan and Edip Serdar Güner Fig. 6. - Translating the word cat into Turkish via Wikipedia. The Turkish counterparts will be added next to the English names, as shown below: Fig. 7. - A neighborhood lattice including the Turkish counterparts of the English names. 4.3 A Rule-Based Machine Translator We have designed a transfer-based architecture for Turkish-English translation and implemented the translator in SWI-Prolog which is an open-source implementation of the Prolog programming language. Below is a figure representing the main modules of the translator: Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 69 Fig. 8. - The main modules of the rule-based machine translator. The word list extracted by the Preprocessor is used as an input to the Analysis Module. We have devised a shift-reduce parser in the analysis phase for building up the grammatical structure of expressions. Briefly, a shift-reduce parser uses a bottom- up strategy with an ultimate goal of building trees rooted with a start symbol [1]. The Generation Module first rearranges the constituents using transformation rules. Afterwards, all the structures are lexically transferred into English using a bilingual dictionary. 4.4 Filtering Translation Results with the Concept Lattice Let us turn to our exemplary sentence introduced in (2) (i.e. Kadın onu tanıyor). Failing to take the context of the sentence into account, the rule-based translator generates the result in (8) (i.e. The woman knows him/her/it), where the pronoun is left ambiguous with respect to gender. Our claim is that we can resolve such ambiguities using FCA and thereby refine our translations. To this effect, we propose to generate transient formal concepts for noun phrases. We make the following assumptions. Basically, personal pronouns, determiners and proper names introduce formal objects whereas adjectives and nouns encode formal attributes. Suppose that our sentence is preceded by (the Turkish paraphrase of) a sentence like ‘A man has arrived’. The indefinite determiner evokes a new formal object, say obj1. As the source text is in Turkish, all attributes will be Turkish words. The Turkish counterpart of the word man is adam. Thus, the transient concept for the subject of this sentence will be ({obj1}, {adam}). The task is now to embed this transient concept into the big permanent concept lattice. To do this, a node where the Turkish counterpart of the synset name is ‘adam’ is searched for. Immediately below this node is placed a new node with its set of objects being {obj1} and with no additional attributes. As this is a physical object, the subconcept of this new node has to be the 70 Yılmaz Kılıçaslan and Edip Serdar Güner lowest one. As for the second sentence, the NP kadın (the woman) will be associated with the transient concept ({X},{kadın}) and the pronoun onu (him/her/it) with the transient concept ({Y},{entity}). 
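As an aside on Section 4.2, the article-scoring step of the Sense Translator can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions: the hypernym frequencies are supplied directly rather than counted in downloaded Wikipedia articles, the article names and numbers are invented, and only formula (1) with WeightK = K/N is reproduced.

# Hypothetical sketch of formula (1) used by the Sense Translator.
def article_score(hypernyms, frequencies):
    # hypernyms: chain from the top (most general) downwards;
    # the K-th hypernym gets Weight_K = K / N.
    n = len(hypernyms)
    return sum((k / n) * frequencies.get(h, 0)
               for k, h in enumerate(hypernyms, start=1))

hypernyms = ["entity", "animal", "carnivore", "feline"]       # invented chain for 'cat'
articles = {                                                   # invented occurrence counts
    "Cat":                  {"entity": 1, "animal": 12, "carnivore": 4, "feline": 9},
    "Cat (disambiguation)": {"entity": 2, "animal": 1},
}
best = max(articles, key=lambda a: article_score(hypernyms, articles[a]))
print(best)   # 'Cat'; its Turkish interlanguage link would yield the translation 'kedi'

If the highest-scoring article has no Turkish counterpart, the word is left unpaired, exactly as described above.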
X and Y are parameters to be anchored to particular formal objects. In other words, they are anaphoric. It seems plausible to assert that the attributes of an anaphoric object must constitute a (generally proper) subset or hypernym set of the attributes of the object serving as the antecedent. Assume that X is somehow anaphorically linked to an object obj2. Now, there are two candidate antecedents for Y. The woman, i.e. the object obj2, is barred from being the antecedent of the pronoun by a locality principle like the one stated in Chomsky's [3] Binding Theory: roughly stated, a pronoun and its antecedent cannot occur in the same clause. There remains one single candidate antecedent, obj1. As its attribute set is a hyponym set of {entity}, it can be selected as a legitimate antecedent. The concept node created for the man will also be the one denoted by the pronoun, with Y being instantiated with obj1. In the concept lattice constructed in WordNet, the concept named 'man' includes 'male person' in its set of attributes. Hence, the ambiguity is resolved and the pronoun translates into English as 'him'. It is worth noting that, in case there is more than one candidate antecedent, an anaphora resolution technique, especially a statistical one, can be employed to pick out the candidate most likely to be the antecedent. The interested reader is referred to Mitkov [12] for a survey of anaphora resolution approaches in general and to Kılıçaslan et al. [7] for anaphora resolution in Turkish.

The gender disambiguation process can also be carried out for common nouns. Consider the following fragment taken from a child story:

(9)

Turkish leaves not only pronouns but also many other words ambiguous with respect to the gender feature. The word 'kardeş' in this example is ambiguous between the translations sister and brother. This ambiguity will be resolved in favor of the former interpretation in a way similar to the disambiguation process sketched out for pronouns above.

In fact, the problem of sense disambiguation is a kind of specification problem. Therefore, it cannot be confined to gender disambiguation. For example, given that we have somehow managed to compile the attributes listed on the left-hand side below, our FCA-based system generates the translations listed on the right-hand side:

zehirli, diş ('poisonous, tooth'): fang
zehirli, mantar ('poisonous, mushroom'): toadstool
sivri, diş ('sharp, tooth'): fang
arka, koltuk ('rear, seat'): rumble
acemi, asker ('inexperienced, soldier'): recruit

It will, of course, be interesting to try to solve other kinds of translation problems with FCA-based techniques. We leave this task for future research.

5 Results and Evaluation

In the early years of MT, the quality of an MT system was determined by human judgment. Though specially trained for the purpose, human judges are prone, at the very least, to subjectivity. Besides, such an evaluation exercise is almost always more costly and time-consuming than an automated one. Some automated evaluation metrics have been developed in order to overcome these problems. Among these are BLEU, NIST, WER and PER. BLEU [14] and NIST [4] are rather specialized metrics. They are computed by considering the fraction of output n-grams that also appear in a set of human translations (n-gram precision). This allows the acknowledgment of a greater diversity of acceptable MT results.
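The n-gram precision underlying BLEU and NIST can be illustrated with the short sketch below; it computes plain n-gram precision against a set of references and omits the count clipping, length penalties and weighting schemes of the real metrics. The example strings are invented.

# Plain n-gram precision: the fraction of n-grams of the MT output that also
# occur in at least one human reference translation (BLEU/NIST refine this).
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, references, n=1):
    cand = ngrams(candidate.split(), n)
    ref = {g for r in references for g in ngrams(r.split(), n)}
    return sum(g in ref for g in cand) / len(cand) if cand else 0.0

refs = ["the woman knows him"]
print(ngram_precision("the woman knows him her it", refs, n=1))   # 4/6
print(ngram_precision("the woman knows him her it", refs, n=2))   # 3/5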
As for WER (Word Error Rate) and PER (Position-independent Word Error Rate), they are more general-purpose measures and they rely on a direct correspondence between the machine translation and a single human-produced reference. WER is based on the Levenshtein distance [11], i.e. the edit distance between a reference translation and its automatic translation, normalized by the length of the reference translation. This metric is formulated as:

WER = (S + D + I) / N    (2)

where N is the total number of words in the reference translation, S is the number of substituted words in the automatic translation, D is the number of reference words missing from (deleted in) the automatic translation, and I is the number of words inserted in the automatic translation that do not appear in the reference. Whereas WER requires exactly the same order of the words in the automatic translation and the reference, PER neglects word order completely [17]. It measures the difference in the counts of the words occurring in the automatic and reference translations, and the resulting number is divided by the number of words in the reference. It is worth noting that PER is technically not a distance measure, as it uses a position-independent Levenshtein distance where the distance between a sentence and one of its permutations is always taken to be zero.

We used WER to evaluate the performance of our MT system. This is probably the metric most commonly used for such purposes and, as we employed a single human-produced reference, it suits our evaluation setup well. We fed our system with a Turkish child story involving 91 sentences (970 words).2 We post-edited the resulting translation in order to generate a reference. When the necessary calculations were done in accordance with formula (2), the WER turned out to be 38%.

The next step was to see the extent to which the performance of our MT system could be improved using concept lattices as filters for the raw results. To this effect, we devised several concept lattices like the one in Fig. 3 and filtered the lexical constituents of each automatic translation with them. A considerable reduction in error rate is observed in our system supplemented with concept lattices: the WER score drops to around 30%.

One question that comes to mind at this point is whether the improvement achieved is statistically significant. To answer it, we had recourse to the Wilcoxon signed-rank test, which analyzes matched-pair numeric data by looking at the difference between the two values in each matched pair. When applied to the WER scores of the non-filtered and filtered translation results, the test shows that the difference is statistically significant (p < 0.005).

Another question is whether the results are practically satisfactory. To get some insight into this question, we need a baseline system for a comparison on usability. Google Translate, a statistical MT system that stands out with its wide range of users, can serve this purpose. The WER score obtained by employing Google Translate on our data is 34%. Recalling that the WER score of our system supplemented with concept lattices is 30%, we seem entitled to argue for the viability of rule-based MT systems. Of course, this claim must be made tentatively, since the size of the data on which the comparisons are made is relatively small. However, it should also be noted that we have employed a limited number of concept lattices of considerably small sizes.
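For completeness, the WER computation of formula (2) can be sketched as below: a standard word-level Levenshtein alignment between a single reference and a system output, normalized by the reference length. The example sentences are invented; the child-story data itself is not reproduced.

# Minimal sketch of WER (formula (2)) via word-level Levenshtein distance.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / len(r)

print(wer("the woman knows him", "woman knows him her"))   # 2 edits / 4 words = 0.5

Applying this computation to the filtered and non-filtered outputs, sentence pair by sentence pair, yields the WER scores compared above.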
It is of no doubt that increasing the number and size of filtering lattices would improve the performance of our MT system. More importantly, we do not primarily have an NLP concern in this work. Rather, we would like the results to be evaluated from a computational linguistics perspective. Everything aside, the results show that even a toy lattice based ontology can yield statistically significant improvement for an MT system. 6 Conclusion In this paper, we have illustrated some translation problems caused by some typological divergences between Turkish and English using a particular example. We have gone through the direct translation, syntactic transfer and semantic transfer phases of the rule-based translation model to see what problem is dealt with in what phase. We have seen that a context-dependent pragmatic process is necessary to get to a satisfactory result. Concept lattices appear to be very efficient tools for accomplishing this pragmatic disambiguation task. Supplementing a rule-based MT system with concept lattices not only yields statistically significant improvement on the results of the system but also enables it to compete with a statistical MT system like Google Translate. 2 This is the story where the example in (9) comes from. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 73 References 1. Aho, A.V., Ullman, J.D.: The Theory of Parsing, Translation, and Compiling, Vol. 1., Prentice Hall (1972) 2. Barwise J., Seligman, J.: Information Flow. The Logic of Distributed Systems. Cambridge University Press (1997) 3. Chomsky, N.: Lectures on Government and Binding, Foris, Dordrecht (1981). 4. Doddington, G.: “Automatic Evaluation of Machine Translation Quality Using N-gram Co- occurrence Statistics”. In Proceedings of HLT 2002 (2nd Conference on Human Language Technology). San Diego, California, 128-132 (2002) 5. Ganter, B., Wille, R.: Formale Begriffsanalyse: Mathematische Grundlagen. Berlin: Springer (1996) 6. Jurafsky, D., Martin, J. H.: Speech and Language Processing, 2nd Edition, Prentice Hall (2009) 7. Kılıçaslan, Y., Güner, E. S., Yıldırım, S.: Learning-based pronoun resolution for Turkish with a comparative evaluation, Computer Speech & Language, Volume 23, Issue 3, p. 311-331 (2009) 8. Kipke, U., Wille, R.: Formale Begriffsanalyse erläutert an einem Wortfeld. LDV–Forum, 5 (1987) 9. Krajca, P., Outrata, J., Vychodil, V.: Parallel Recursive Algorithm for FCA. In: Belohlavek R., Kuznetsov S. O. (Eds.): Proc. CLA 2008, CEUR WS, 433, 71–82 (2008) 10. Kuznetsov, S.: Learning of Simple Conceptual Graphs from Positive and Negative Examples. PKDD 1999, pp. 384–391 (1999) 11. Levenshtein, V. I.: "Binary codes capable of correcting deletions, insertions, and reversals," Tech. Rep. 8. (1966) 12. Mitkov, R.: Anaphora Resolution: The State of the Art. Technical Report, University of Wolverhampton (1999) 13. Old, L. J., Priss, U.: Metaphor and Information Flow. In Proceedings of the 12th Midwest Artificial Intelligence and Cognitive Science Conference, pp. 99-104 (2001) 14. Papineni, K., Roukos, S., Ward, T., Zhu, W. J.: "BLEU: a method for automatic evaluation of machine translation" in ACL-2002: 40th Annual meeting of the Association for Computational Linguistics pp. 311–318 (2002) 15. Priss, U.: Linguistic Applications of Formal Concept Analysis, Ganter; Stumme; Wille (eds.), Formal Concept Analysis, Foundations and Applications, Springer Verlag, LNAI 3626, pp. 149-160 (2005) 16. Priss, U., Old, L. 
J.: "Concept Neighbourhoods in Lexical Databases.", In Proceedings of the 8th International Conference on Formal Concept Analysis, ICFCA'10, Springer Verlag, LNCS 5986, p. 283-295 (2010) 17. Tillmann C., Vogel, S., Ney, H., Zubiaga A., Sawaf, H.: Accelerated DP based search for statistical translation. In European Conf. on Speech Communication and Technology, pages 2667–2670, Rhodes, Greece, September (1997) 18. Wille, R.: The Formalization of Roget’s International Thesaurus. Unpublished manuscript (1993) Concept lattices in fuzzy relation equations? Juan Carlos Dı́az and Jesús Medina?? Department of Mathematics. University of Cádiz Email: {juancarlos.diaz,jesus.medina}@uca.es Abstract. Fuzzy relation equations are used to investigate theoretical and applicational aspects of fuzzy set theory, e.g., approximate reasoning, time series forecast, decision making and fuzzy control, etc.. This paper relates these equations to a particular kind of concept lattices. 1 Introduction Recently, multi-adjoint property-oriented concept lattices have been introduced in [16] as a generalization of property-oriented concept lattices [10,11] to a fuzzy environment. These concept lattices are a new point of view of rough set the- ory [23] that considers two different sets: the set of objects and the set of at- tributes. On the other hand, fuzzy relation equations, introduced by E. Sanchez [28], are associated to the composition of fuzzy relations and have been used to in- vestigate theoretical and applicational aspects of fuzzy set theory [22], e.g., ap- proximate reasoning, time series forecast, decision making, fuzzy control, as an appropriate tool for handling and modeling of nonprobabilistic form of uncer- tainty, etc. Many papers have investigated the capacity to solve (systems) of fuzzy relation equations, e.g., in [1, 8, 9, 25, 26]. In this paper, the multi-adjoint relation equations are presented as a general- ization of the fuzzy relation equations [24,28]. This general environment inherits the properties of the multi-adjoint philosophy, consequently, e.g., several con- junctors and residuated implications defined on general carriers as lattice struc- tures can be used, which provide more flexibility in order to relate the variables considered in the system. Moreover, multi-adjoint property-oriented concept lattices and systems of multi-adjoint relation equations have been related in order to obtain results that ensure the existence of solutions in these systems. These definitions and results are illustrated by a toy example to improve the readability and comprehension of the paper. Among all concept lattice frameworks, we have related the multi-adjoint property-oriented concept lattices to the systems of multi-adjoint relation equa- tions, e.g., the extension and intension operators of this concept lattice can be ? Partially supported by the Spanish Science Ministry TIN2009-14562-C05-03 and by Junta de Andalucı́a project P09-FQM-5233. ?? Corresponding author. c 2011 by the paper authors. CLA 2011, pp. 75–86. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 76 Juan Carlos Dı́az and Jesús Medina-Moreno used to represent multi-adjoint relation equations, and, as a result, the solu- tions of these systems of relation equations can be related to the concepts of the corresponding concept lattice. 
The more important consequence is that this relation provides that the prop- erties given, e.g., in [2–4,12,14,17,18,27] can be applied to obtain many properties of these systems. Indeed, it can be considered that the algorithms presented, e.g., in [5, 6, 15] obtain solutions for these systems. The plan of this paper is the following: in Section 2 we will recall the multi- adjoint property-oriented concept lattices as well as the basic operators used and some properties; later, in Section 3, an example will be introduced to motivate the multi-adjoint relation equations. Once these equations have been presented, in Section 4 the multi-adjoint property-oriented concept lattices and the systems of multi-adjoint relation equations will be related in order to obtain results which ensure the existence of solutions in these systems; the paper ends with some conclusions and prospects for future work. 2 Multi-adjoint property-oriented concept lattices The basic operators in this environment are the adjoint triples, which are formed by three mappings: a non-commutativity conjunctor and two residuated impli- cations [13], which satisfy the well-known adjoint property. Definition 1. Let (P1 , ≤1 ), (P2 , ≤2 ), (P3 , ≤3 ) be posets and & : P1 × P2 → P3 , . : P3 × P2 → P1 , - : P3 × P1 → P2 be mappings, then (&, ., -) is an adjoint triple with respect to P1 , P2 , P3 if: 1. & is order-preserving in both arguments. 2. . and - are order-preserving on the first argument1 and order-reversing on the second argument. 3. x ≤1 z . y iff x & y ≤3 z iff y ≤2 z - x, where x ∈ P1 , y ∈ P2 and z ∈ P3 . Example of adjoint triples are the Gödel, product and Lukasiewicz t-norms together with their residuated implications. Example 1. Since the Gödel, product and Lukasiewicz t-norms are commuta- tive, the residuated implications satisfy that .G =-G , .P =-P and .L =-L . Therefore, the Gödel, product and Lukasiewicz adjoint triples are defined on [0, 1] as: &P (x, y) = x · y z -P x = min(1, z/x) ( 1 if x ≤ z &G (x, y) = min(x, y) z -G x = z otherwise &L (x, y) = max(0, x + y − 1) z -G x = min(1, 1 − x + z) 1 Note that the antecedent will be evaluated on the right side, while the consequent will be evaluated on the left side, as in logic programming framework. 2 Concept lattices in fuzzy relation equations 77 In [19] more general examples of adjoint triples are given. The basic structure, which allows the existence of several adjoint triples for a given triplet of lattices, is the multi-adjoint property-oriented frame. Definition 2. Given two complete lattices (L1 , 1 ) and (L2 , 2 ), a poset (P, ≤) and adjoint triples with respect to P, L2 , L1 , (&i , .i , -i ), for all i = 1, . . . , l, a multi-adjoint property-oriented frame is the tuple (L1 , L2 , P, 1 , 2 , ≤, &1 , .1 , -1 , . . . , &l , .l , -l ) Multi-adjoint property-oriented frames are denoted as (L1 , L2 , P, &1 , . . . , &l ). Note that the notation is similar to a multi-adjoint frame [18], although the adjoint triples are defined on different carriers. The definition of context is analogous to the one given in [18]. Definition 3. Let (L1 , L2 , P, &1 , . . . , &l ) be a multi-adjoint property-oriented frame. A context is a tuple (A, B, R, σ) such that A and B are non-empty sets (usually interpreted as attributes and objects, respectively), R is a P -fuzzy rela- tion R : A × B → P and σ : B → {1, . . . 
, l} is a mapping which associates any element in B with some particular adjoint triple in the frame.2 From now on, we will fix a multi-adjoint property-oriented frame and context, (L1 , L2 , P, &1 , . . . , &l ), (A, B, R, σ). ↓N Now we define the following mappings ↑π : LB A 2 → L1 and : LA B 1 → L2 as g ↑π (a) = sup{R(a, b) &σ(b) g(b) | b ∈ B} (1) N f ↓ (b) = inf{f (a) -σ(b) R(a, b) | a ∈ A} (2) Clearly, these definitions3 generalize the classical possibility and necessity operators [11] and they form an isotone Galois connection [16]. There are two dual versions of the notion of Galois connetion. The most famous Galois connec- tion, where the maps are order-reversing, is properly called Galois connection, and the other in which the maps are order-preserving, will be called isotone Ga- lois connection. In order to make this contribution self-contained, we recall their formal definitions: Let (P1 , ≤1 ) and (P2 , ≤2 ) be posets, and ↓ : P1 → P2 , ↑ : P2 → P1 mappings, the pair (↑ , ↓ ) forms a Galois connection between P1 and P2 if and only if: ↑ and ↓ are order-reversing; x ≤1 x↓↑ , for all x ∈ P1 , and y ≤2 y ↑↓ , for all y ∈ P2 . The one we adopt here is the dual definition: Let (P1 , ≤1 ) and (P2 , ≤2 ) be posets, and ↓ : P1 → P2 , ↑ : P2 → P1 mappings, the pair (↑ , ↓ ) forms an isotone Galois connection between P1 and P2 if and only if: ↑ and ↓ are order-preserving; x ≤1 x↓↑ , for all x ∈ P1 , and y ↑↓ ≤2 y, for all y ∈ P2 . 2 A similar theory could be developed by considering a mapping τ : A → {1, . . . , l} which associates any element in A with some particular adjoint triple in the frame. 3 From now on, to improve readability, we will write &b , -b instead of &σ(b) , -σ(b) . 3 78 Juan Carlos Dı́az and Jesús Medina-Moreno A concept, in this environment, is a pair of mappings hg, f i, with g ∈ LB , f ∈ N L , such that g ↑π = f and f ↓ = g, which will be called multi-adjoint property- A oriented formal concept. In that case, g is called the extension and f , the inten- sion of the concept. The set of all these concepts will be denoted as MπN [16]. Definition 4. A multi-adjoint property-oriented concept lattice is the set N ↑π MπN = {hg, f i | g ∈ LB A 2 , f ∈ L1 and g = f, f ↓ = g} in which the ordering is defined by hg1 , f1 i hg2 , f2 i iff g1 2 g2 (or equivalently f1 1 f2 ). The pair (MπN , ) is a complete lattice [16], which generalize the concept lattice introduced in [7] to a fuzzy environment. 3 Multi-adjoint relation equations This section begins with an example that motivates the definition of multi- adjoint relation equations, which will be introduced later. 3.1 Multi-adjoint logic programming A short summary of the main features of multi-adjoint languages will be pre- sented. The reader is referred to [20, 21] for a complete formulation. A language L contains propositional variables, constants, and a set of logical connectives. In this fuzzy setting, the usual connectives are adjoint triples and a number of aggregators. The language L is interpreted on a (biresiduated) multi-adjoint lattice,4 hL, , .1 , -1 , &1 , . . . , .n , -n , &n i, which is a complete lattice L equipped with a collection of adjoint triples h&i , .i , -i i, where each &i is a conjunctor in- tended to provide a modus ponens-rule with respect to .i and -i . A rule is a formula A .i B or A -i B, where A is a propositional symbol (usually called the head) and B (which is called the body) is a formula built from propositional symbols B1 , . . . 
, Bn (n ≥ 0), truth values of L and conjunctions, disjunctions and aggregations. Rules with an empty body are called facts. A multi-adjoint logic program is a set of pairs hR, αi, where R is a rule and α is a value of L, which may express the confidence which the user of the system has in the truth of the rule R. Note that the truth degrees in a given program are expected to be assigned by an expert. Example 2. Let us to consider a multi-adjoint lattice h[0, 1], ≤, ←G , &G , ←P , &P , ∧L i 4 Note that a multi-adjoint lattice is a particular case of a multi-adjoint property- oriented frame. 4 Concept lattices in fuzzy relation equations 79 where &G and &P are the Gödel and product conjunctors, respectively, and ←G , ←P their corresponding residuated implications. Moreover, the Lukasie- wicz conjunctor ∧L will be used in the program [13]. Given the set of variables (propositional symbols) Π = {low oil, low water, rich mixture, overheating, noisy behaviour, high fuel consumption} the following set of multi-adjoint rules form a multi-adjoint program, which may represent the behaviour of a motor. hhigh fuel consumption ←G rich mixture ∧L low oil, 0.8i hoverheating ←G low oil, 0.5i hnoisy behaviour ←P rich mixture, 0.8i hoverheating ←P low water, 0.9i hnoisy behaviour ←G low oil, 1i The usual procedural is to measure the levels of “oil”, “water” and “mix- ture” of a specific motor, after that the values for low oil, low water and rich mixture are obtained, which are represented in the program as facts, for instance, the next ones can be added to the program: hlow oil ←P >, 0.2i hlow water ←P >, 0.2i hrich mixture ←P >, 0.5i Finally, the values for the rest of variables (propositional symbols) are com- puted [20]. For instance, in order to attain the value for overheating(o, w), for a level of oil, o, and water, w, the rules hoverheating ←G low oil, ϑ1 i and hoverheating ←P low water, ϑ2 i are considered and its value is obtained as: overheating(o, w) = (low oil(o) &G ϑ1 ) ∨ (low water(w) &P ϑ2 ) (3) Now, the problem could be to recompute the weights of the rules from experimental instances of the variables, that is, the values of overheating, noisy behaviour and high fuel consumption are known for particular mea- sures of low oil, low water and rich mixture. Specifically, given the levels of oil, o1 , . . . , on , the levels of water, w1 , . . . , wn , and the measures of mixture, t1 , . . . , tn , we may experimentally know the values of the variables: noisy behaviour(ti , oi ), high fuel consumption(ti , oi ) and overheating(oi , wi ), for all i ∈ {1, . . . , n}. Considering Equation (3), the unknown elements could be ϑ1 and ϑ2 instead of overheating(o, w). Therefore, the problem now is to look for the values of ϑ1 and ϑ2 , which solve the following system obtained after assuming the exper- imental data for the propositional symbols, ov1 , o1 , w1 , . . . , ovn , on , wn . overheating(ov1 ) = (low oil(o1 ) &G ϑ1 ) ∨ (low water(w1 ) &P ϑ2 ) .. .. .. .. . . . . overheating(ovn ) = (low oil(on ) &G ϑ1 ) ∨ (low water(wn ) &P ϑ2 ) 5 80 Juan Carlos Dı́az and Jesús Medina-Moreno This system can be interpreted as a system of fuzzy relation equations in which several conjunctors, &G and &P , are assumed. Moreover, these conjunctors could be neither non-commutative nor associative and defined in general lattices, as permit the multi-adjoint framework. Next sections introduce when these systems have solutions and a novel method to obtain them using concept lattice theory. 
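Before these systems are formalised, the evaluation scheme of Equation (3) can be made concrete. The short sketch below is an illustration, not part of any actual multi-adjoint implementation: it codes the Gödel and product conjunctors with their residuated implications from Example 1 and evaluates overheating for the facts and rule weights of the program above.

# Gödel and product adjoint pairs on [0,1] (Example 1) and Equation (3):
# overheating(o, w) = (low_oil(o) &_G theta1) ∨ (low_water(w) &_P theta2).
def and_godel(x, y):            # &_G
    return min(x, y)

def impl_godel(z, x):           # residuated implication of &_G
    return 1.0 if x <= z else z

def and_product(x, y):          # &_P
    return x * y

def impl_product(z, x):         # residuated implication of &_P
    return 1.0 if x == 0 else min(1.0, z / x)

def overheating(low_oil, low_water, theta1, theta2):
    # the join ∨ on [0,1] is the maximum
    return max(and_godel(low_oil, theta1), and_product(low_water, theta2))

# Facts and overheating-rule weights taken from the program of Example 2:
print(overheating(low_oil=0.2, low_water=0.2, theta1=0.5, theta2=0.9))  # max(0.2, 0.18) = 0.2
# The adjoint property gives the greatest theta with x &_G theta <= z (illustrative values):
print(impl_godel(0.5, 0.3))                                             # 1.0

Solving a whole system then amounts to finding weights that reproduce every observed value simultaneously, which is what the following sections characterise in terms of concept lattices.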
3.2 Systems of multi-adjoint relation equations The operators used in order to obtain the systems will be the generalization of the sup-∗-composition, introduced in [29], and inf-→-composition, introduced in [1]. From now on, a multi-adjoint property-oriented frame, (L1 , L2 , P, &1 , . . . , &l ) will be fixed. In the definition of a multi-adjoint relation equation an interesting mapping σ : U → {1, . . . , l} will be considered, which relates each element in U to an adjoint triple. This mapping will play a similar role as the one given in a multi- adjoint context, defined in the previous section, for instance, this map provides a partition of U in preference sets. A similar theory may be developed for V instead of U . Let U = {u1 , . . . , um } and V = {v1 , . . . , vn } be two universes, R ∈ L2 U ×V an unknown fuzzy relation, σ : U → {1, . . . , l} a map that relates each element in U to an adjoint triple, and K1 , . . . , Kn ∈ P U , D1 , . . . , Dn ∈ L1 V arbitrarily chosen fuzzy subsets of the respective universes. A system of multi-adjoint relation equations with sup-&-composition, is the following system of equations _ (Ki (u) &u R(u, v)) = Di (v), i ∈ {1, . . . , n} (4) u∈U where &u represents the adjoint conjunctor associated to u by σ, that is, if σ(u) = (&s , .s , -s ), for s ∈ {1, . . . , l}, then &u is exactly &s . If an element v of V is fixed and the elements Ki (uj ), R(uj , v) and Di (v) are written as kij , xj and di , respectively, for each i ∈ {1, . . . , n}, j ∈ {1, . . . , m}, then System (4) can particularly be written as k11 &u1 x1 ∨ · · · ∨ k1m &um xm = d1 .. .. .. .. (5) . . . . kn1 &u1 x1 ∨ · · · ∨ knm &um xm = dn where kij and di are known and xj must be obtained. Hence, for each v ∈ V , if we solve System (5), then we obtain a “column” of R (i.e. the elements R(uj , v), with j ∈ {1, . . . , m}), thus, solving n similar systems, one for each v ∈ V , the unknown relation R is obtained. Example 3. Assuming Example 2, in this case, we will try to solve the problem about to obtain the weights associated to the rules from particular observed data for the propositional symbols. 6 Concept lattices in fuzzy relation equations 81 The propositional symbols (variables) will be written in short as: hfc, nb, oh, rm, lo and lw, and the measures of particular cases of the behaviour of the motor will be: hi , ni , ovi , ri , oi , wi , for hfc, nb, oh, rm, lo and lw, respectively, in each case i, with i ∈ {1, 2, 3}. For instance, the next system associated to overheating is obtained from the computation provided in Example 2. oh(ov1 ) = (lo(o1 ) &G ϑoh oh lo ) ∨ (lw(w1 ) &P ϑlw ) oh(ov2 ) = (lo(o2 ) &G ϑoh oh lo ) ∨ (lw(w2 ) &P ϑlw ) oh(ov3 ) = (lo(o3 ) &G ϑoh oh lo ) ∨ (lw(w3 ) &P ϑlw ) where ϑoh oh lo and ϑlw are the weights associated to the rules with head oh. Similar systems can be obtained to high fuel consumption and noisy behaviour. Assuming the multi-adjoint frame with carrier L = [0, 1] and the Gödel and product triples, these systems are particular systems of multi-adjoint relational equations. The corresponding context is formed by the sets U = {rm, lo, lw, rm∧L lo}, V = {hfc, nb, oh}; the mapping σ that relates the elements lo, rm ∧L lo to the Gödel triple, and rm, lw to the product triple; the mappings K1 , . . . , Kn ∈ P U , defined as the values given by the propositional symbols in U on the ex- perimental data, for instance, if u = lo, then K1 (lo) = lo(o1 ), . . . , Kn (lo) = lo(on ); and the mappings D1 , . . . 
, Dn ∈ L1 V , defined analogously, for instance, if v = rm, then D1 (rm) = rm(r1 ), . . . , Dn (rm) = rm(rn ). Finally, the unknown fuzzy relation R ∈ L2 U ×V is formed by the weights of the rules in the program. In the system above, oh has been the element v ∈ V fixed. Moreover, as there do not exist rules with body rm and rm ∧L lo, that is, the weights for that hypothetical rules are 0, then the terms (rm(ri ) &G 0 = 0 and (rm(ri ) ∧L lo(oi ) &P 0 = 0 do not appear. Its counterpart is a system of multi-adjoint relation equations with inf--- composition, that is, ^ (R(u, v) -uj Kj∗ (v)) = Ej (u), j ∈ {1, . . . , m} (6) v∈V considered with respect to unknown fuzzy relation R ∈ L1 U ×V , and where K1∗ , . . . , Km ∗ ∈ P V and E1 , . . . , Em ∈ L2 U . Note that -uj represents the corre- sponding adjoint implication associated to uj by σ, that is, if σ(uj ) = (&s , .s , -s ), for s ∈ {1, . . . , l}, then -uj is exactly -s . Remark that in System 6, the implication -uj does not depend of the element u, but of j. Hence, the implications used in each equation of the system are the same. If an element u of U is fixed, fuzzy subsets K1∗ , . . . , Km∗ ∈ P V , E1 , . . . , Em ∈ U ∗ L2 are assumed, such that Kj (vi ) = kij , R(u, vi ) = yi and Ej (u) = ej , for each i ∈ {1, . . . , n}, j ∈ {1, . . . , m}, then System (6) can particularly be written as y1 -u1 k11 ∧ · · · ∧ yn -u1 kn1 = e1 .. .. .. .. (7) . . . . y1 -um k1m ∧ · · · ∧ yn -um knm = em 7 82 Juan Carlos Dı́az and Jesús Medina-Moreno Therefore, for each u ∈ U , we obtain a “row” of R (i.e. the elements R(u, vi ), with i ∈ {1, . . . , n}), consequently, solving m similar systems, the unknown relation R is obtained. Systems (5) and (7) have the same goal, searching for the unknown relation R although the mechanism is different. Analyzing these systems, we have that the left side of these systems can be represented by the mappings CK : Lm n n m 2 → L1 , IK ∗ : L1 → L2 , defined as: CK (x̄)i = ki1 &u1 x1 ∨ · · · ∨ kim &um xm , for all i ∈ {1, . . . , n} (8) IK ∗ (ȳ)j = y1 -uj k1j ∧ · · · ∧ yn -uj knj , for all j ∈ {1, . . . , m} (9) where x̄ = (x1 , . . . , xm ) ∈ Lm n 2 , ȳ = (y1 , . . . , yn ) ∈ L1 , and CK (x̄)i , IK ∗ (ȳ)j are the components of CK (x̄), IK ∗ (ȳ), respectively, for each i ∈ {1, . . . , n} and j ∈ {1, . . . , m}. Hence, Systems (5) and (7) can be written as: CK (x1 , . . . , xm ) = (d1 , . . . , dn ) (10) IK ∗ (y1 , . . . , yn ) = (e1 , . . . , em ) (11) respectively. 4 Relation between multi-adjoint property-oriented concept lattices and multi-adjoint relation equation This section shows that Systems (5) and (7) can be interpreted in a multi- adjoint property-oriented concept lattice. And so, the properties given to the N isotone Galois connection ↑π and ↓ , as well as to the complete lattice MπN can be used in the resolution of these systems. First of all, the environment must be fixed. Hence, a multi-adjoint context (A, B, S, σ) will be considered, such that A = V 0 , B = U , where V 0 has the same cardinality as V , σ will be the mapping given by the systems and S : A × B → P is defined as S(vi0 , uj ) = kij . Note that A = V 0 is related to the mappings Ki , since S(vi0 , uj ) = kij = Ki (uj ); Now, we will prove that the mappings defined at the end of the previous section are related to the isotone Galois connection. Given µ ∈ LB 2 , such that µ(uj ) = xj , for all j ∈ {1, . . . , m}, the following equalities are obtained, for each i ∈ {1, . . . 
, n}: CK (x̄)i = ki1 &u1 x1 ∨ · · · ∨ kim &um xm = S(vi0 , u1 ) &u1 µ(u1 ) ∨ · · · ∨ S(vi0 , um ) &um µ(um ) = sup{S(vi0 , uj ) &uj µ(uj ) | j ∈ {1, . . . , m}} = µ↑π (vi0 ) ↑π Therefore, the mapping CK : Lm n 2 → L1 is equivalent to the mapping : LB 2 → A m B L1 , where an element x̄ in L2 can be interpreted as a map µ in L2 , such that 8 Concept lattices in fuzzy relation equations 83 µ(uj ) = xj , for all j ∈ {1, . . . , m}, and the element CK (x̄) as the mapping µ↑π , such that µ↑π (vi0 ) = CK (x̄)i , for all i ∈ {1, . . . , n}. An analogy can be developed applying the above procedure to mappings IK ∗ N ↓N and ↓ , obtaining that the mappings IK ∗ : Ln1 → Lm 2 and : LA B 1 → L2 are equivalent. As a consequence, the following result holds: Theorem 1. The mappings CK : Lm n n m 2 → L1 , IK ∗ : L1 → L2 , establish an iso- m m tone Galois connection. Therefore, IK ∗ ◦ CK : L2 → L2 is a closure operator and CK ◦ IK ∗ : Ln1 → Ln1 is an interior operator. As (CA , IK ∗ ) is an isotone Galois connection, any result about the solvability of one system has its dual counterpart. The following result explains when these systems can be solved and how a solution can be obtained. N Theorem 2. System (5) can be solved if and only if hλ↓d¯ , λd¯i is a concept of MπN , where λd¯ : A = {v1 , . . . , vn } → L1 , defined as λd¯(vi ) = di , for all N i ∈ {1, . . . , n}. Moreover, if System (5) has a solution, then λ↓d¯ is the greatest solution of the system. Similarly, System (7) can be solved if and only if hµē , µē↑π i is a concept of MπN , where µē : B = {u1 , . . . , um } → L2 , defined as µē (uj ) = ej , for all j ∈ {1, . . . , m}. Furthermore, if System (7) has a solution, then µ↑ē π is the smallest solution of the system. The main contribution of the relation introduced in this paper is not only the above consequences, but a lot of other properties for Systems (5) and (7) that can be stabilized from the results proved, for example, in [2–4, 12, 14, 17, 18, 27]. Next example studies the system of multi-adjoint relation equations presented in Example 3. Example 4. The aim will be to solve a small system in order to improve the understanding of the method. In the environment of Example 3, the following system will be solved assuming the experimental data: oh(ov1 ) = 0.5, lo(o1 ) = 0.3, lw(w1 ) = 0.3, oh(ov2 ) = 0.7, lo(o2 ) = 0.6, lw(w2 ) = 0.8, oh(ov3 ) = 0.4, lo(o3 ) = 0.5, lw(w3 ) = 0.2. oh(ov1 ) = (lo(o1 ) &G ϑoh oh lo ) ∨ (lw(w1 ) &P ϑlw ) oh(ov2 ) = (lo(o2 ) &G ϑoh oh lo ) ∨ (lw(w2 ) &P ϑlw ) oh(ov3 ) = (lo(o3 ) &G ϑoh oh lo ) ∨ (lw(w3 ) &P ϑlw ) where ϑoh oh lo and ϑlw are the variables. The context is: A = V 0 = {1, 2, 3}, the set of observations, B = U = {lo, lw}, σ associates the propositional symbol lo to the Gödel triple and lw to the product triple. The relation S : A × B → [0, 1] is defined in Table 1. Therefore, considering the mapping λoh : A → [0, 1] associated to the values of overheating in each experimental case, that is λoh (1) = 0.5, λoh (2) = 0.7, 9 84 Juan Carlos Dı́az and Jesús Medina-Moreno Table 1. Relation S. 
low oil low water 1 0.3 0.3 2 0.6 0.8 3 0.5 0.2 and λoh (3) = 0.4; and the mapping CK : [0, 1]2 → [0, 1]3 , defined in Equation (8), the system above can be written as CK (ϑoh oh lo , ϑlw ) = λoh Since, by the comment above, there exists µ ∈ [0, 1]B , such that CK (ϑoh oh lo , ϑlw ) = ↑π B ↑π µ , the goal will be to attain the mapping µ ∈ [0, 1] , such that µ = λoh , N which can be found if and only if ((λoh )↓ , λoh ) is a multi-adjoint property- oriented concept in the considered context, by Theorem 2. N First of all, we compute (λoh )↓ . N (λoh )↓ (lo) = inf{λoh (1) -G S(1, lo), λoh (2) -G S(2, lo), λoh (3) -G S(3, lo)} = inf{0.5 -G 0.3, 0.7 -G 0.6, 0.4 -G 0.5} = inf{1, 1, 0.4} = 0.4 ↓N (λoh ) (lw) = inf{0.5 -P 0.3, 0.7 -P 0.8, 0.4 -P 0.2} = inf{1, 0.875, 1} = 0.875 N Now, the mapping (λoh )↓ ↑π is obtained. N N N (λoh )↓ ↑π (1) = sup{S(1, lo) &G (λoh )↓ (lo), S(1, lw) &P (λoh )↓ (lw)} = sup{0.3 &G 0.4, 0.3 &P 0.875} = sup{0.3, 0.2625} = 0.3 ↓N ↑π (λoh ) (2) = sup{0.6 &G 0.4, 0.8 &P 0.875} = 0.7 ↓N ↑π (λoh ) (3) = sup{0.5 &G 0.4, 0.2 &P 0.875} = 0.4 N Therefore, ((λoh )↓ , λoh ) is not a multi-adjoint property-oriented concept and thus, the considered system has no solution, although if the experimental value for oh had been 0.3 instead of 0.5, the system would have had a solution. These changes could be considered in several applications where noisy vari- ables exist and their values can be conveniently changed to obtain approximate solutions for the systems. Thus, if the experimental data for overheating are oh(ov1 ) = 0.3, oh(ov2 ) = 0.7 and oh(ov2 ) = 0.4, then the original system will have at least one solution and the values ϑoh oh lo , ϑlw will be 0.4, 0.875, respectively for a solution. Consequently, the truth for the first rule is lower than for the second or it might be thought that it is more determinant in obtaining higher 10 Concept lattices in fuzzy relation equations 85 values for lw than for lo. Another possibility is to consider that this conclusion about the certainty of the rules is not correct, in which case another adjoint triple might be associate to lo. As a result, the properties introduced in several fuzzy formal concept anal- ysis frameworks can be applied in order to obtain solutions of fuzzy relation equations, as well as in the multi-adjoint general framework. Furthermore, in order to obtain the solutions of Systems (5) and (7), the algorithms developed, e.g., in [5, 6, 15], can be used. 5 Conclusions and future work Multi-adjoint relation equations have been presented that generalize the existing definitions presented at this time. In this general environment, different conjunc- tors and residuated implications can be used, which provide more flexibility in order to relate the variables considered in the system. A toy example has been introduced in the paper in order to improve its readability and reduce the complexity of the definitions and results. As a consequence of the results presented in this paper, several of the prop- erties provided, e.g., in [2–4, 12, 14, 17, 18, 27], can be used to obtain additional characteristics of these systems. In the future, we will apply the results provided in the fuzzy formal con- cept analysis environments to the general systems of fuzzy relational equations presented here. References 1. W. Bandler and L. Kohout. Semantics of implication operators and fuzzy relational products. Int. J. Man-Machine Studies, 12:89–116, 1980. 2. E. Bartl, R. Bělohlávek, J. Konecny, and V. Vychodil. 
Isotone galois connections and concept lattices with hedges. In 4th International IEEE Conference “Intelli- gent Systems”, pages 15.24–15.28, 2008. 3. R. Bělohlávek. Lattices of fixed points of fuzzy Galois connections. Mathematical Logic Quartely, 47(1):111–116, 2001. 4. R. Bělohlávek. Concept lattices and order in fuzzy logic. Annals of Pure and Applied Logic, 128:277–298, 2004. 5. R. Bělohlávek, B. D. Baets, J. Outrata, and V. Vychodil. Lindig’s algorithm for concept lattices over graded attributes. Lecture Notes in Computer Science, 4617:156–167, 2007. 6. R. Bělohlávek, B. D. Baets, J. Outrata, and V. Vychodil. Computing the lattice of all fixpoints of a fuzzy closure operator. IEEE Transactions on Fuzzy Systems, 18(3):546–557, 2010. 7. Y. Chen and Y. Yao. A multiview approach for intelligent data analysis based on data operators. Information Sciences, 178(1):1–20, 2008. 8. B. De Baets. Analytical solution methods for fuzzy relation equations. In D. Dubois and H. Prade, editors, The Handbooks of Fuzzy Sets Series, volume 1, pages 291– 340. Kluwer, Dordrecht, 1999. 11 86 Juan Carlos Dı́az and Jesús Medina-Moreno 9. A. Di Nola, S. Sessa, W. Pedrycz, and E. Sanchez. Fuzzy Relation Equations and Their Applications to Knowledge Engineering. Kluwer, 1989. 10. I. Düntsch and G. Gediga. Approximation operators in qualitative data analysis. In Theory and Applications of Relational Structures as Knowledge Instruments, pages 214–230, 2003. 11. G. Gediga and I. Düntsch. Modal-style operators in qualitative data analysis. In Proc. IEEE Int. Conf. on Data Mining, pages 155–162, 2002. 12. G. Georgescu and A. Popescu. Non-dual fuzzy connections. Arch. Math. Log., 43(8):1009–1039, 2004. 13. P. Hájek. Metamathematics of Fuzzy Logic. Trends in Logic. Kluwer Academic, 1998. 14. H. Lai and D. Zhang. Concept lattices of fuzzy contexts: Formal concept analysis vs. rough set theory. International Journal of Approximate Reasoning, 50(5):695– 707, 2009. 15. C. Lindig. Fast concept analysis. In G. Stumme, editor, Working with Conceptual Structures-Contributions to ICCS 2000, pages 152–161, 2000. 16. J. Medina. Towards multi-adjoint property-oriented concept lattices. Lect. Notes in Artificial Intelligence, 6401:159–166, 2010. 17. J. Medina and M. Ojeda-Aciego. Multi-adjoint t-concept lattices. Information Sciences, 180(5):712–725, 2010. 18. J. Medina, M. Ojeda-Aciego, and J. Ruiz-Calviño. Formal concept analysis via multi-adjoint concept lattices. Fuzzy Sets and Systems, 160(2):130–144, 2009. 19. J. Medina, M. Ojeda-Aciego, A. Valverde, and P. Vojtáš. Towards biresiduated multi-adjoint logic programming. Lect. Notes in Artificial Intelligence, 3040:608– 617, 2004. 20. J. Medina, M. Ojeda-Aciego, and P. Vojtáš. Multi-adjoint logic programming with continuous semantics. In Logic Programming and Non-Monotonic Reasoning, LPNMR’01, pages 351–364. Lect. Notes in Artificial Intelligence 2173, 2001. 21. J. Medina, M. Ojeda-Aciego, and P. Vojtáš. Similarity-based unification: a multi- adjoint approach. Fuzzy Sets and Systems, 146:43–62, 2004. 22. A. D. Nola, E. Sanchez, W. Pedrycz, and S. Sessa. Fuzzy Relation Equations and Their Applications to Knowledge Engineering. Kluwer Academic Publishers, Norwell, MA, USA, 1989. 23. Z. Pawlak. Rough sets. International Journal of Computer and Information Sci- ence, 11:341–356, 1982. 24. W. Pedrycz. Fuzzy relational equations with generalized connectives and their applications. Fuzzy Sets and Systems, 10(1-3):185 – 201, 1983. 25. I. Perfilieva. 
Fuzzy function as an approximate solution to a system of fuzzy relation equations. Fuzzy Sets and Systems, 147(3):363–383, 2004. 26. I. Perfilieva and L. Nosková. System of fuzzy relation equations with inf-→ com- position: Complete set of solutions. Fuzzy Sets and Systems, 159(17):2256–2271, 2008. 27. A. M. Radzikowska and E. E. Kerre. A comparative study of fuzzy rough sets. Fuzzy Sets and Systems, 126(2):137–155, 2002. 28. E. Sanchez. Resolution of composite fuzzy relation equations. Information and Control, 30(1):38–48, 1976. 29. L. A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning I, II, III. Information Sciences, 8–9:199–257, 301–357, 43–80, 1975. 12 Adaptation knowledge discovery for cooking using closed itemset extraction Emmanuelle Gaillard, Jean Lieber, and Emmanuel Nauer LORIA (UMR 7503—CNRS, INRIA, Nancy University) BP 239, 54506 Vandœuvre-lès-Nancy, France, First-Name.Last-Name@loria.fr Abstract. This paper is about the adaptation knowledge (AK) discov- ery for the Taaable system, a case-based reasoning system that adapts cooking recipes to user constraints. The AK comes from the interpreta- tion of closed itemsets (CIs) whose items correspond to the ingredients that have to be removed, kept, or added. An original approach is pro- posed for building the context on which CI extraction is performed. This approach focuses on a restrictive selection of objects and on a specific ranking based on the form of the CIs. Several experimentations are pro- posed in order to improve the quality of the AK being extracted and to decrease the computation time. This chain of experiments can be seen as an iterative knowledge discovery process: the analysis following each experiment leads to a more sophisticated experiment until some concrete and useful results are obtained. Keywords: adaptation knowledge discovery, closed itemset, data preprocess- ing, case-based reasoning, cooking. 1 Introduction This paper addresses the adaptation challenge proposed by the Computer Cook- ing Contest (http://computercookingcontest.net/) which consists in adapt- ing a given cooking recipe to specific constraints. For example, the user wants to adapt a strawberry pie recipe, because she has no strawberry. The underlying question is: which ingredient(s) will the strawberries be replaced with? Adapting a recipe by substituting some ingredients by others requires cook- ing knowledge and adaptation knowledge in particular. Taaable, a case-based reasoning (CBR) system, addresses this problem using an ingredient ontology. This ontology is used for searching which is/are the closest ingredient(s) to the one that has to be replaced. In this approach the notion of “being close to” is given by the distance between ingredients in the ontology. In the previous example, Taaable proposes to replace the strawberries by other berries (e.g. raspberries, blueberries, etc.). However, this approach is limited because two in- gredients which are close in the ontology are not necessarily interchangeable and because introducing a new ingredient in a recipe may be incompatible with some other ingredient(s) of the recipe or may required to add other ingredients. c 2011 by the paper authors. CLA 2011, pp. 87–99. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 
88 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer This paper extends the approach proposed in [2] for extracting this kind of adaptation knowledge (AK). The approach is based on closed itemset (CI) extraction, in which items are the ingredients that have to be removed, kept, or added for adapting the recipe. This paper introduces two originalities. The first one concerns the way the binary context, on which the CI extraction is performed, is built, by focusing on a restrictive selection of objects according to the objectives of the knowledge discovery process. The second one concerns the way the CIs are filtered and ranked, according to their form. The paper is organised as follows: Section 2 specifies the problem in its whole context and introduces Taaable which will integrate the discovered AK in its reasoning process. Section 3 gives preliminaries for this work, introducing CI extraction, case-based reasoning, and related work. Section 4 explains our approach; several experiments and evaluations are described and discussed. 2 Context and motivations 2.1 Taaable The Computer Cooking Contest is an international contest that aims at compar- ing systems that make inferences about cooking. A candidate system has to use the recipe base given by the contest to propose a recipe matching the user query. This query is a set of constraints such as inclusion or rejection of ingredients, the type or the origin of the dish, and the compatibility with some diets (vegetarian, nut-free, etc.). Taaable [1] is a system that has been originally designed as a candidate of the Computer Cooking Contest. It is also used as a brain teaser for research in knowledge based systems, including knowledge discovery, ontology engineer- ing, and CBR. Like many CBR systems, Taaable uses an ontology to retrieve recipes that are the most similar to the query. Taaable retrieves and creates cooking recipes by adaptation. If there exist recipes exactly matching the query, they are returned to the user; otherwise the system is able to retrieve similar recipes (i.e. recipes that partially match the target query) and adapts these recipes, creating new ones. Searching similar recipes is guided by several ontolo- gies, i.e. hierarchies of classes (ingredient hierarchy, dish type hierarchy and dish origin hierarchy), in order to relax constraints by generalising the user query. The goal is to find the most specific generalisation of the query (the one with the minimal cost) for which recipes exist in the recipe base. Adaptation consists in substituting some ingredients of the retrieved recipes by the ones required by the query. Taaable retrieves recipes using query generalisation, then adapts them by substitution. This section gives a simplified description of the Taaable system. For more details about the Taaable inference engine, see e.g. [1]. For example, for adapting the “My Strawberry Pie” recipe to the no Strawberry constraint, the system first generalises Strawberry into Berry, then specialises Berry into, say, Raspberry. Adaptation knowledge discovery for cooking using closed itemset extraction 89 2.2 Domain ontology An ontology O defines the main classes and relations relevant to cooking. O is a set of atomic classes organised into several hierarchies (ingredient, dish type, dish origin, etc.). Given two classes B and A of this ontology, A is subsumed by B, denoted by B w A, if the set of instances of A is included in the set of instances of B. For instance, Berry w Blueberry and Berry w Raspberry. 
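This subsumption test can be pictured with a small sketch. The Python fragment below is purely illustrative and is not Taaable code: the child-to-parent table is an assumption (only Berry, Blueberry, Raspberry and Strawberry come from the example above; Fruit and Ingredient are hypothetical upper classes).

# Illustrative sketch: an ontology as a child -> parent map, with subsumption
# B ⊒ A checked by walking from A up to the root.
PARENT = {
    "Blueberry": "Berry",
    "Raspberry": "Berry",
    "Strawberry": "Berry",
    "Berry": "Fruit",          # hypothetical upper classes
    "Fruit": "Ingredient",
}

def subsumes(b, a):
    """Return True if b ⊒ a, i.e. every instance of a is an instance of b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False

assert subsumes("Berry", "Blueberry")
assert not subsumes("Blueberry", "Berry")

Walking up the parent links in this way also yields the generalisation path (e.g. Strawberry, Berry, Fruit) along which a query can be relaxed.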
2.3 Taaable adaptation principle Let R be a recipe and Q be a query such that R does not exactly match Q (oth- erwise, no adaptation would be needed). For example, Q = no Strawberry and R = “My Strawberry Pie”.The basic ontology-driven adaptation in Taaable follows the generalisation/specialisation principle explained hereafter (in a sim- plified way). First, R is generalised (in a minimal way) into Γ (R) that matches Q. For example, Γ may be the substitution Strawberry Berry. Second, Γ (R) is specialised into Σ(Γ (R)) that still matches Q. For example, Σ is the substitu- tion Berry Raspberry (the class Berry is too abstract for a recipe and must be made precise). This adaptation approach has at least two limits. First, the choice of Σ is at random: there is no reason to choose raspberries instead of blue- berries, unless additional knowledge is given. Second, when such a substitution of ingredient is made, it may occur that some ingredients should be added or removed from R. These limits point out the usefulness of additional knowledge for adaptation. 3 Preliminaries 3.1 Itemset extraction Itemset extraction is a set of data-mining methods for extracting regularities into data, by aggregating object items appearing together. Like FCA [8], itemset extraction algorithms start from a formal context K, defined by K = (G, M, r), where G is a set of objects, M is a set of items, and r is the relation on G × M stating that an object is described by an item [8]. Table 1 shows an example of context, in which recipes are described by the ingredients they require: G is a set of 5 objects (recipes R, R1 , R2 , R3 , and R4 ), M is a set of 7 items (ingredients Sugar, Water, Strawberry, etc.). An itemset I is a set of items, and the support of I, support(I), is the number of objects of the formal context having every item of I. I is frequent, with respect to a threshold σ, whenever support(I) ≥ σ. I is closed if it has no proper superset J (I ( J) with the same support. For example, {Sugar, Raspberry} is an item- set and support({Sugar, Raspberry}) = 2 because 2 recipes require both Sugar and Raspberry. However, {Sugar, Raspberry} is not a closed itemset, because {Sugar, PieCrust, Raspberry} has the same support. Another, equivalent, defi- nition of closed itemsets can be given on the basis of a closure operator ·00 defined as follows. Let I be an itemset and I 0 be the set of objects that have all the items 90 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer y h e Pi err Co arc Ge rry Ci uic Co st p Pi on ll i Ap n wb ru st Wh be ti eJ am he r r e ga te ra eC rn ol sp la pl pl nn eS Su Wa St Ra Ap R × × × × × × R1 × × × × × R2 × × × × R3 × × × × × R4 × × × × × Table 1. Formal context representing ingredients used in recipes. of I: I 0 = {x ∈ G | ∀i ∈ I, x r i}. In a dual way, let X be a set of objects and X 0 be the set of properties shared by all objects of X: X 0 = {i ∈ M | ∀x ∈ X, x r i}. This defines two operators: ·0 : I ∈ 2M 7→ I 0 ∈ 2G and ·0 : X ∈ 2G 7→ X 0 ∈ 2M . These operators can be composed in an operator ·00 : I ∈ 2M 7→ I 00 ∈ 2M . An itemset I is said to be closed if it is a fixed point of ·00 , i.e., I 00 = I. In the following, “CIs” stands for closed itemsets, and “FCIs” stands for frequent CIs. For σ = 3, the FCIs of this context are {Sugar, PieCrust}, {Sugar, PieCrust, Cornstarch}, {Sugar, Water}, {Sugar}, {Water}, {PieCrust}, and {Cornstarch}. 
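These operators are easy to reproduce programmatically. The sketch below is a brute-force illustration written for this section, not the tooling used in the paper; the ingredients of R come from the text, while the other recipes are only a stand-in for Table 1 (the exact rows of the table are not reproduced here).

from itertools import combinations

# Brute-force sketch of the derivation operators and of the closure I'' of Section 3.1.
# The context is a stand-in: R is taken from the text, R1-R4 are illustrative.
CONTEXT = {
    "R":  {"Sugar", "Water", "Strawberry", "PieCrust", "Cornstarch", "CoolWhip"},
    "R1": {"Sugar", "PieCrust", "Cornstarch", "Raspberry", "Gelatin"},
    "R2": {"Sugar", "Water", "PieCrust", "Raspberry"},
    "R3": {"Sugar", "Water", "PieCrust", "Cornstarch", "Apple"},
    "R4": {"Water", "Cornstarch", "PieShell", "Cinnamon", "Apple"},
}
ITEMS = set().union(*CONTEXT.values())

def extent(itemset):                 # I': objects having every item of I
    return {g for g, row in CONTEXT.items() if itemset <= row}

def intent(objects):                 # X': items shared by all objects of X
    rows = [CONTEXT[g] for g in objects]
    return set.intersection(*rows) if rows else set(ITEMS)

def closure(itemset):                # I'' = (I')'
    return intent(extent(itemset))

def frequent_closed_itemsets(sigma):
    """Enumerate every itemset once and keep the frequent closed ones (toy sizes only)."""
    fcis = []
    for k in range(len(ITEMS) + 1):
        for combo in combinations(sorted(ITEMS), k):
            i = set(combo)
            if len(extent(i)) >= sigma and closure(i) == i:
                fcis.append(i)
    return fcis

print(len(extent({"Sugar", "Raspberry"})))   # support 2 in this stand-in context
print(closure({"Sugar", "Raspberry"}))       # also contains PieCrust, so the set is not closed

On this stand-in context the itemset {Sugar, Raspberry} has support 2 and its closure also contains PieCrust, mirroring the example discussed above; frequent_closed_itemsets(3) plays the role of the FCI enumeration, here by exhaustive search.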
For the following experiments, the Charm algorithm [12] that efficiently computes the FCIs is used thanks to Coron a software platform implementing a rich set of algorithmic methods for symbolic data mining [11]. 3.2 Case-based reasoning Case-based reasoning (CBR [10]) consists in answering queries with the help of previous experience units called cases. In Taaable, a case is a recipe and a query represents user constraints. In many systems, including Taaable, CBR consists in the retrieval of a case from the case base and in the adaptation of the retrieved case in an adapted case that solves the query. Retrieval in Taaable is performed by minimal generalisation of the query (cf. section 2.3). Adaptation can be a simple substitution (e.g., substitute strawberry with any berry) but it can be improved thanks to the use of some domain specific AK. This motivates the research on AK acquisition. 3.3 Related work The AK may be acquired in various way. It may be collected from experts [6], it may be acquired using machine learning techniques [9], or be semi-automatic, using data-mining techniques and knowledge discovery principles [3,4]. This paper addresses automatic AK discovery. Previous works, such as the ones proposed by d’Aquin et al. with the Kasimir project in the medical do- Adaptation knowledge discovery for cooking using closed itemset extraction 91 main [5], and by Badra et al. in the context of a previous work on Taaable [2], are the foundations of our work. Kasimir is a CBR system applied to decision support for breast cancer treatment. In Kasimir, a case is a treatment used for a given patient. The patient is described by characteristics (age, tumour size and location, etc.) and the treatment consists in applying medical instructions. In order to discover AK, cases that are similar to the target case are first selected. Then, FCIs are computed on the variations between the target case and the similar cases. FCIs matching a specific form are interpreted for generating AK [5]. Badra et al. use this approach to make cooking adaptations in Taaable [2]. Their work aims at comparing pairs of recipes depending on the ingredients they contain. A recipe R is represented by the set of its ingredients: Ingredients(R). For example, the recipe “My Strawberry Pie” is represented by Ingredients(“My Strawberry Pie”) = {Sugar, Water, Strawberry, PieCrust, Cornstarch, CoolWhip} Let (R, R0 ) be a pair of recipes which is selected. According to [2], the represen- tation of a pair is denoted by ∆, where ∆ represents the variation of ingredients between R and R0 . Each ingredient ing is marked by −, =, or +: – ing − ∈ ∆ if ing ∈ Ingredients(R) and ing ∈ / Ingredients(R0 ), meaning that ing appears in R but not in R0 . – ing + ∈ ∆ if ing ∈ / Ingredients(R) and ing ∈ Ingredients(R0 ), meaning that ing appears in R0 but not in R. – ing = ∈ ∆ if ing ∈ Ingredients(R) and ing ∈ Ingredients(R0 ), meaning that ing appears both in R in R0 . Building a formal context about ingredient variations in cooking reci- pes. Suppose we want to compare the recipe R with the four recipes (R1 , R2 , R3 , R4 ) given in Table 1. V Variations between R = “My Strawberry Pie” and a recipe Ri have the form j ingi,jmark . For example: ∆R,R1 = Sugar= ∧ Water− ∧ Strawberry− ∧ PieCrust= ∧ Cornstarch= ∧ CoolWhip− ∧ Raspberry+ ∧ Gelatin+ (1) According to these variations, a formal context K = (G, M, I) can be built (cf. Table 2, for the running example): – G = {∆R,Ri }i – M is the set of ingredient variations: M = {ingi,j mark }i,j . 
In particular, M contains all the conjuncts of ∆R,R1 (Strawberry , etc., cf.(1)). − – (g, m) ∈ I, if g ∈ G, m ∈ M , and m is a conjunct of g, for example (∆R,R1 , Strawberry− ) ∈ I. 92 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer eC ry − ol ch − rn ch = nn ce + la y + Co ust − Ra hip − Pi ust = Pi mon + ll + Ap in + Pi ber Co tar Co tar Ge err Ci Jui r− r= r= e+ he w r r s s W b t e a ga te te ra eC rn sp pl pl eS Su Wa Wa St Ap ∆R,R1 × × × × × × × × ∆R,R2 × × × × × × × ∆R,R3 × × × × × × × × ∆R,R4 × × × × × × × × × Table 2. Formal context for ingredient variations in pairs of recipes (R, Rj ). Interpretation. In the formal context, an ingredient marked with + (resp. −) is an ingredient that has to be added (resp. removed). An ingredient marked with = is an ingredient common to R and Ri . 4 Adaptation Knowledge discovery AK discovery is based on the same scheme as knowledge discovery in databases (KDD [7]). The main steps of the KDD process are data preparation, data- mining, and interpretation of the extracted units of information. Data prepara- tion relies on formatting data for being used by data-mining tools and on filtering operations for focusing on special subsets of objects and/or items, according to the objectives of KDD. Data-mining tools are applied for extracting regularities into the data. These regularities have then to be interpreted; filtering operations may also be performed on this step because of the (often) huge size of the data- mining results or of the noise included in these results. All the steps are guided by an analyst. The objective of our work is the extraction of some AK useful for adapt- ing a given recipe to a query. The work presented in the following focuses on filtering operations, in order to extract from a formal context encoding ingredient variations between pairs of recipes, the cooking adaptations. The database used as entry point of the process is the Recipe Source database (http://www.recipesource.com/) which contains 73795 cooking recipes. For the sake of simplicity, we consider in the following, the problem of adapting R by substituting one or several ingredient(s) with one or several ingredient(s) (but the approach can be generalised for removing more ingredients, and also be used for adding ingredient(s) in a recipe). Three experiments are presented; they address the same adaptation problem: adapting the R = “My Strawberry Pie” recipe, with Ingredients(“My Strawberry Pie”) = {Sugar, Water, Strawberry, PieCrust, Cornstarch, CoolWhip}, to the query no Strawberry. In each ex- periment, a formal context about ingredient variations in recipes is built. Then, FCIs are extracted and filtered for proposing cooking adaptation. The two first Adaptation knowledge discovery for cooking using closed itemset extraction 93 experiments focus on object filtering, selecting recipes which are more and more similar to the “My Strawberry Pie” recipe: the first experiment uses recipe from the same type (i.e. pie dish) as “My Strawberry Pie” instead of choosing recipes of any type; the second experiment focuses on a more precise filtering based on similarity between the “My Strawberry Pie” recipe and recipes used for gener- ating the formal context on ingredient variations. 4.1 A first approach with closed itemsets As introduced in [2], a formal context is defined, where objects are ordered pairs of recipes (R, R0 ) and properties are ingredients marked with +, =, − for representing the ingredient variations from R to R0 . 
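A small sketch of this construction is given below (the helper names are ours; R is the recipe from the text and R1 is reconstructed from the variation ∆R,R1 shown in (1); ASCII marks '+', '=', '-' stand for the marks used above).

# Sketch of the Δ(R, R') construction of Section 3.3 and of the resulting variation context.
def delta(r_ingredients, other_ingredients):
    """Set of marked ingredients describing the variation from R to R'."""
    marks = set()
    for ing in r_ingredients | other_ingredients:
        if ing in r_ingredients and ing in other_ingredients:
            marks.add(ing + "=")       # kept in both recipes
        elif ing in r_ingredients:
            marks.add(ing + "-")       # appears in R only
        else:
            marks.add(ing + "+")       # appears in R' only
    return marks

def variation_context(r_ingredients, candidates):
    """Objects are the pairs (R, Ri); attributes are the marked ingredients."""
    return {name: delta(r_ingredients, ings) for name, ings in candidates.items()}

R  = {"Sugar", "Water", "Strawberry", "PieCrust", "Cornstarch", "CoolWhip"}
R1 = {"Sugar", "PieCrust", "Cornstarch", "Raspberry", "Gelatin"}
print(sorted(delta(R, R1)))
# ['CoolWhip-', 'Cornstarch=', 'Gelatin+', 'PieCrust=', 'Raspberry+',
#  'Strawberry-', 'Sugar=', 'Water-']

Each ∆R,Ri becomes one object of the resulting context, and its marked ingredients are the attributes it is incident to.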
The formal context which is build is similar to the example given in Table 2. In each pair of recipes, the first element is the recipe R =“My Strawberry Pie” that must be adapted; the second element is a recipe of the same dish type as R which, moreover, does not contain the ingredient which has to be removed. In our example, it corresponds to pie dish recipes which do not contain strawberry. This formal context allows to build CIs which have to be interpreted in order to acquire adaptation rules. Experiment. 3653 pie dish recipes that do not contain strawberry are found in the Recipe Source database. The formal context, with 3653 objects × 1355 items produces 107,837 CIs (no minimal support is used). Analysis. Some interesting CIs can be found. For example, {PieCrust− , Strawberry− , Cornstarch− , CoolWhip− , Water− , Sugar− } with support of 1657, contains all the ingredients of R with a − mark, meaning that there are 1657 recipes which have no common ingredients with the R recipe. In the same way, {PieCrust− , Strawberry− , Cornstarch− , CoolWhip− , Water− } with sup- port 2590, means that 2590 recipes share only the Sugar ingredient with R because the sugar is the sole ingredient of R which is not included in this CI. The same analysis can be done for {PieCrust− , Strawberry− , Cornstarch− , CoolWhip− , Sugar− } (support of 1900), for water, etc. Conclusion. The CIs are too numerous for being presented to the analyst. Only 1996 of the 3653 pie dish without strawberry recipes share at least one ingredient with R. There are too many recipes without anything in common. A first filter can be used to limit the size of the formal context in number of objects. 4.2 Filtering recipes with at least one common ingredient Experiment. The formal context, with 1996 objects × 813 items, produces 22,408 CIs (no minimal support is used), ranked by decreasing support. 94 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer Results. The top five FCIs are: – {Strawberry− } with support of 1996; – {Strawberry− , CoolWhip− } with support of 1916; – {Strawberry− , PieCrust− } with support of 1757; – {Strawberry− , PieCrust− , CoolWhip− } with support of 1679; – {Strawberry− , Cornstarch− } with support of 1631. Analysis. Several observations can be made. The first FCI containing an ingre- dient marked by + ({Strawberry− , Egg+ }, with support of 849) appears only at the 46th position. Moreover, there are 45 FCIs with one ingredient marked by + in the first 100 FCIs, and no FCI with more than one ingredient marked by +. A substituting ingredient ing can only be found in CIs containing ing + meaning that there exists a recipe containing ing, which is not in R. So, FCIs that do not contain the + mark cannot be used for finding a substitution proposition, and they are numerous in the first 100 ones, based on a support ranking (recall that it has been chosen not to consider adaptation by simply removing ingredient). In the first 100 FCIs, there is only 15 FCIs containing both an ingredient marked by + and an ingredient marked by =. In a FCI I, the = mark on a ingredient ing means that ing is common to R and to recipe(s) involved by the creation of I. So, an ingredient marked by = guarantees a certain similarity (based on ingredients that are used) between the recipes R and R0 compared by ∆R,R0 . 
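The observations above can be made mechanically by classifying the items of each FCI according to their mark; the helper below is a hypothetical sketch written for this discussion (the example FCI {Strawberry−, Egg+} is the 46th one mentioned above).

# Hypothetical sketch: read the candidate substitutions off a FCI by its marks.
def classify(fci):
    """Return (added, kept, removed) ingredient names of a FCI over marked items."""
    added   = {i[:-1] for i in fci if i.endswith("+")}
    kept    = {i[:-1] for i in fci if i.endswith("=")}
    removed = {i[:-1] for i in fci if i.endswith("-")}
    return added, kept, removed

added, kept, removed = classify({"Strawberry-", "Egg+"})
print(added, kept)   # {'Egg'} set(): a candidate substitute, but nothing shared with R

FCIs for which kept is empty are exactly the risky ones discussed next.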
If a FCI I contains a potential substituting ingredient, marked by +, but does not contain any =, the risk for proposing a cooking adaptation from I is very high, because there is no common ingredient with R in the recipe the potential substituting ingredient comes from. In the first 100 recipes, the only potential substituting ingredients (so, the ingredients marked by +) are egg, salt, and butter, which are not satisfactory from a cooking viewpoint for substituting the strawberries. We have conducted similar experiments with other R and queries, and the same observations as above can be made. Conclusion. From these observations, it can be concluded that the sole rank- ing based on support is not efficient to find relevant cooking adaptation rules, because the most frequent CIs do no contain potential substituting ingredients and, moreover, have no common ingredient with R. 4.3 Filtering and ranking CIs according to their forms To extract realistic adaptation, CIs with a maximum of ingredients marked by = are searched. We consider that a substitution is acceptable, if 50% of ingredients of R are preserved and if the adaptation does not introduce too many ingredients; we also limit the number of ingredients introduced to 50% of the initial number of ingredients in R. For the experiment with the R = “My Strawberry Pie”, containing initially 6 ingredients, it means that at least 3 ingredients must be preserved and at most 3 ingredients can be added. In term of CIs, it corresponds to CIs containing at least 3 ingredients marked with = and at most 3 ingredients marked with +. Adaptation knowledge discovery for cooking using closed itemset extraction 95 Experiment. Using this filter on CIs produced by the previous experiment re- duces the number of CIs to 505. However, because some CIs are more relevant than others, they must be ranked according to several criteria. We use the fol- lowing rules, given by priority order: 1. A CI must have a + in order to find a potential substituting ingredient. 2. A CI which has more = than another one is more relevant. This criterion promotes the pairs which have a largest set of common ingredient. 3. A CI which has less − than another one is more relevant. This criterion promotes adaptations which remove less ingredients. 4. A CI which has less + than another one is more relevant. This criterion promotes adaptations which add less ingredients. 5. If two CIs cannot be ranked according to the 4 criteria above, the CI the more frequent is considered to be the more relevant. Results. The 5 first CIs ranked according to the previous criteria are: – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , Salt+ } with support of 5; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ } with support of 4; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ , CreamCheese+ } with support of 2; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ , WhippingCream+ } with support of 2; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ , LemonPeel+ } with support of 2. Analysis. One can observe that potential substituting ingredients take part of the first 5 CIs and each CIs preserve 4 (of 6) ingredients. The low supports of these CIs confirm that searching frequent CIs is not compatible with our need, which is to extract CIs with a specific form. Conclusion. 
Ranking the CIs according to our particular criteria is more efficient than using a support-based ranking. This kind of ranking can also be seen as a filter on CIs. However, this approach requires computing all CIs, because the support of the interesting CIs is low.

4.4 More restrictive formal context building according to the form of interesting CIs

The computation time can be improved by applying a more restrictive selection of recipe pairs at the formal context building step, decreasing drastically the size of the formal context. Indeed, as the expected form of CIs is known, recipe pairs that cannot produce CIs of the expected form can be removed. This can also be seen as a selection of recipes that are similar enough to R. R′ is considered as similar enough to R if R′ has at least a minimal threshold σ= = 50% of ingredients in common with R (cf. (2)) and if R′ has at most a maximal threshold σ+ = 50% of ingredients that are not used in R (cf. (3)). These two conditions express for ∆R,R′ the same similarity conditions as those considered on CIs in Section 4.3.

  |Ingredients(R) ∩ Ingredients(R′)| / |Ingredients(R)| ≥ σ=    (2)

  |Ingredients(R′) \ Ingredients(R)| / |Ingredients(R)| ≤ σ+    (3)

Experiment. Among the 1996 pie dish recipes not containing Strawberry, only 20 recipes satisfy the two conditions. The formal context, with 20 objects × 40 items, produces only 21 CIs (no minimal support is used).

Results. The 5 first CIs, satisfying the form introduced in the previous section and ranked according to the previous criteria, are:

– {Water=, Sugar=, Cornstarch=, PieCrust=, Strawberry−, CoolWhip−, RedFoodColoring+, Cherry+} with support of 1;
– {Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, PieShell+} with support of 6;
– {Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+} with support of 3;
– {Water=, Sugar−, Cornstarch=, PieCrust=, Strawberry−, CoolWhip−, Apple+, AppleJuice+} with support of 3;
– {Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Peach+, PieShell+} with support of 2.

Analysis. According to these CIs, the first potential substituting ingredients are RedFoodColoring, Cherry, PieShell, Raspberry, Apple, and Peach. Each CI preserves 3 or 4 of the 6 ingredients of R, and two CIs add 2 ingredients.

Conclusion. This approach reduces the computation time without reducing the result quality. Moreover, it yields the best potential adaptations among the first CIs.

4.5 From CIs to adaptation rules

As Taaable must propose a recipe adaptation, the CIs containing potential substituting ingredients must be transformed: indeed, a CI does not represent a direct cooking adaptation. For example, the third CI of the last experiment contains Raspberry+ simultaneously with CoolWhip− and PieCrust−. Removing the pie crust (i.e. PieCrust−) can look surprising for a pie dish, but one must keep in mind that a CI does not correspond to a real recipe, but to an abstraction of the variations between R and a set of recipes. So, producing a complete adaptation requires going back to the ∆R,Ri in order to obtain all the ingredient variations that will take part in the adaptation. For example, for

  ∆R,R1 : Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+, Gelatin+, GCPieCrust+
  ∆R,R2 : Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+, FoodColor+, PieShell+
  ∆R,R3 : Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+, PieShell+

Table 3.
Formal context for ingredient variations in pairs of recipes (R, Rj ). the CI {Water= , Sugar= , Cornstarch= , PieCrust− , Strawberry− , CoolWhip− , Raspberry+ }, the ∆R,Ri (with i ∈ [1; 3]) are the ones given by Table 3. The adaptation rules extracted from these 3 recipe variations are: – {CoolWhip, PieCrust, Strawberry} ; {Gelatin, GCPieCrust, Raspberry}; – {CoolWhip, PieCrust, Strawberry} ; {FoodColor, PieShell, Raspberry}; – {CoolWhip, PieCrust, Strawberry} ; {PieShell, Raspberry}. For R2 and R3 , PieShell is added in replacement of PieCrust; in R1 , GCPieCrust plays the role of PieCrust. These three recipe variations propose to replace Strawberry by Raspberry. For R1 (resp. R2 ), Gelatin (resp. FoodColor) is also added. Finally, the three recipe variations propose to remove the CoolWhip. Our approach guarantees the ingredient compatibility, with the assumption that the recipe base used for the adaptation rule extraction process contains only good recipes, i.e. recipes which do not contain ingredient incompatibility. Indeed, as adaptation rules are extracted from real recipes, the good combination of ingredients is preserved. So, when introducing a new ingredient ing1 (marked by ing1+ ), removing another ingredient ing2 (marked by ing2− ) could be required. The reason is that there is no recipe, entailed in the creation of the CI from which the adaptation rules are extracted, using both ing1 and ing2 . In the same way, adding a supplementary ingredient ing3 (marked by ing3+ ) in addition of ing1 , is obtained from recipes which use both ing1 and ing3 . Applying FCA on these ∆R,Ri produces the concept lattice presented in Fig. 1 in which the top node is the CI retained. This node can be seen as a generic cooking adaptation, and navigating into the lattice will conduct to more specific adaptation. The KDD loop is closed: after having (1) selected and formatting the data, (2) applying a data-mining CI extraction algorithm, and (3) interpreting the results, a new set of data is selected on which a data-mining –FCA– algorithm could then be applied. We have chosen to return the adaptation rules generated from the 5 first CIs to the user. So, the system proposes results where Strawberry could be replaced (in addition of some other ingredient adding or removing) by “RedFoodColoring and Cherry”, by Raspberry with optional Gelatin or FoodColor, by Peach 98 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer Fig. 1. The lattice computed on the formal context given in Table 3. with optional FoodColor or LemonJuice, by “HeavyCream and LemonRind”, or by “Apple and AppleJuice”. 5 Conclusion This paper shows how adaptation knowledge can be extracted efficiently for ad- dressing a cooking adaptation challenge. Our approach focuses on CIs with a particular form, because the support is not a good ranking measure for this problem. A ranking method based on 5 criteria explicitly specified for this adap- tation problem is proposed; the support is used in addition to distinguish CIs which satisfy in the same way the 5 criteria. Beyond the application domain, this study points out that KD is not only a data-mining issue: the preparation and interpretation steps are also important. Moreover, it highlights the iterative nature of KD: starting from a first experi- ment with few a priori about the form of the results which are too numerous to be interpreted, it arrives to an experiment with a precise aim that gives results that are easy to interpret as adaptation rules. 
It has been argued in the paper that this approach is better than the basic adaptation approach (based on substituting an ingredient by another one, on the basis of the ontology), in that it avoids some ingredient incompatibilities and makes some specialisation choices. However, a careful study remains to be made in order to compare experimentally these approaches. A short-term future work is to integrate this AK discovery into the online system Taaable, following the principles of opportunistic KD [2]. A mid-term future work consists in using the ontology during the KD process. The idea is to add new items, deduced thanks to the ontology (e.g. the properties Cream− and Milk+ entail the variation Dairy= ). First experiments have already been conducted but they raise interpretation difficulties. Indeed, the extracted CIs contain abstract terms (such as Dairy= or Flavoring+ ) that are not easy to interpret. Adaptation knowledge discovery for cooking using closed itemset extraction 99 References 1. F. Badra, R. Bendaoud, R. Bentebitel, P.-A. Champin, J. Cojan, A. Cordier, S. De- sprés, S. Jean-Daubias, J. Lieber, T. Meilender, A. Mille, E. Nauer, A. Napoli, and Y. Toussaint. Taaable: Text Mining, Ontology Engineering, and Hierarchical Clas- sification for Textual Case-Based Cooking. In ECCBR Workshops, Workshop of the First Computer Cooking Contest, pages 219–228, 2008. 2. F. Badra, A. Cordier, and J. Lieber. Opportunistic Adaptation Knowledge Dis- covery. In Lorraine McGinty and David C. Wilson, editors, 8th International Con- ference on Case-Based Reasoning - ICCBR 2009, volume 5650 of Lecture Notes in Computer Science, pages 60–74, Seattle, États-Unis, July 2009. Springer. The original publication is available at www.springerlink.com. 3. S. Craw, N. Wiratunga, and R. C. Rowe. Learning adaptation knowledge to im- prove case-based reasoning. Artificial Intelligence, 170(16-17):1175–1192, 2006. 4. M. d’Aquin, F. Badra, S. Lafrogne, J. Lieber, A. Napoli, and L. Szathmary. Case base mining for adaptation knowledge acquisition. In International Joint Confer- ence on Artificial Intelligence, IJCAI’07, pages 750–756, 2007. 5. M. D’Aquin, S. Brachais, J. Lieber, and A. Napoli. Decision Support and Knowl- edge Management in Oncology using Hierarchical Classification. In Katherina Kaiser, Silvia Miksch, and Samson W. Tu, editors, Proceedings of the Symposium on Computerized Guidelines and Protocols - CGP-2004, volume 101 of Studies in Health Technology and Informatics, pages 16–30, Prague, Czech Republic, 2004. Silvia Miksch and Samson W. Tu, IOS Press. 6. M. d’Aquin, J. Lieber, and A. Napoli. Adaptation Knowledge Acquisition: a Case Study for Case-Based Decision Support in Oncology. Computational Intelligence (an International Journal), 22(3/4):161–176, 2006. 7. U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery in databases. AI Magazine, pages 37–54, 1996. 8. B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999. 9. K. Hanney and M. T. Keane. Learning Adaptation Rules From a Case-Base. In I. Smith and B. Faltings, editors, Advances in Case-Based Reasoning – Third European Workshop, EWCBR’96, LNAI 1168, pages 179–192. Springer, 1996. 10. C. K. Riesbeck and R. C. Schank. Inside Case-Based Reasoning. Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey, 1989. 11. L. Szathmary and A. Napoli. CORON: A Framework for Levelwise Itemset Min- ing Algorithms. Supplementary Proc. 
of The Third International Conference on Formal Concept Analysis (ICFCA ’05), Lens, France, pages 110–113, 2005. 12. M. J. Zaki and C.-J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In SIAM International Conference on Data Mining SDM’02, pages 33–43, 2002. Fast Computation of Proper Premises Uwe Ryssel1 , Felix Distel2 , and Daniel Borchmann3 1 Institute of Applied Computer Science, Technische Universität Dresden, Dresden, Germany, uwe.ryssel@tu-dresden.de 2 Institute of Theoretical Computer Science, Technische Universität Dresden, Dresden, Germany, felix@tcs.inf.tu-dresden.de 3 Institute of Algebra, Technische Universität Dresden, Dresden, Germany, borch@tcs.inf.tu-dresden.de Abstract. This work is motivated by an application related to refactor- ing of model variants. In this application an implicational base needs to be computed, and runtime is more crucial than minimal cardinality. Since the usual stem base algorithms have proven to be too costly in terms of runtime, we have developed a new algorithm for the fast computation of proper premises. It is based on a known link between proper premises and minimal hypergraph transversals. Two further improvements are made, which reduce the number of proper premises that are obtained multiple times and redundancies within the set of proper premises. We provide heuristic evidence that an approach based on proper premises will also be beneficial for other applications. 1 Introduction Today, graph-like structures are used in many model languages to specify al- gorithms or problems in a more readable way. Examples are data-flow-oriented simulation models, such as MATLAB/Simulink, state diagrams, and diagrams of electrical networks. Generally, such models consist of blocks or elements and connections among them. Using techniques described in Section 5.2, a formal context can be obtained from such models. By computing an implicational base of this context, dependencies among model artifacts can be uncovered. These can help to represent a large number of model variants in a structured way. For many years, computing the stem base has been the default method for extracting a small but complete set of implications from a formal context. There exist mainly two algorithms to achieve this [10,15], and both of them compute not only the implications from the stem base, but also concept intents. This is problematic as a context may have exponentially many concept intents. Recent theoretical results suggest that existing approaches for computing the stem base may not lead to algorithms with better worst-case complexity [6,1]. Bearing this in mind, we focus on proper premises. Just like pseudo-intents, that are used to obtain the stem base, proper premises yield a sound and com- plete set of implications. Because this set of implications does not have minimal cardinality, proper premises have been outside the focus of the FCA community c 2011 by the paper authors. CLA 2011, pp. 101–113. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 102 Uwe Ryssel, Felix Distel and Daniel Borchmann for many years. However, there are substantial arguments to reconsider using them. Existing methods for computing proper premises avoid computing con- cept intents. Thus, in contexts with many concept intents they may have a clear advantage in runtime over the stem base algorithms. 
This is particularly true for our application where the number of concept intents is often close to the theoretical maximum. Here, attributes often occur together with their negated counterparts, and the concept lattice can contain several millions of elements. In Section 5.1 we provide arguments that we can expect the number of con- cept intents to be larger than the number of proper premises in most contexts, assuming a uniform random distribution. Often, in applications, runtime is the limiting factor, not the size of the basis. But even where minimal cardinality is a requirement, computing proper premises is worth considering, since there are methods to transform a base into the stem base in polynomial time [16]. In this paper we present an algorithm for the fast computation of proper premises. It is based on three ideas. The first idea is to use a simple connection between proper premises and minimal hypergraph transversals. The problem of enumerating minimal hypergraph transversals is well-researched. Exploiting the link to proper premises allows us to use existing algorithms that are known to behave well in practice. A first, naïve algorithm iterates over all attributes and uses a black-box hypergraph algorithm to compute proper premises of each attribute. A drawback when iterating over all attributes is that the same proper premise may be computed several times for different attributes. So we introduce a can- didate filter in the second step: For each attribute m, the attribute set is filtered and proper premises are searched only among the candidate attributes. We show that this filtering method significantly reduces the number of multiple-computed proper premises while maintaining completeness. In a third step we exploit the fact that there are obvious redundancies within the proper premises. These can be removed by searching for proper premises only among the meet-irreducible attributes. We argue that our algorithms are trivial to parallelize, leading to further speedups. Due to their incremental nature, parallelized versions of the stem base algorithms are not known to date. We conclude by providing experimental re- sults. These show highly significant improvements for the contexts obtained from the model refactoring application. For a sample context, where Next-Closure re- quired several hours to compute the stem base, runtime has dropped to fractions of a second. For contexts from other applications the improvements are not as impressive but still large. 2 Preliminaries We provide a short summary of the most common definitions in formal concept analysis. A formal context is a triple K = (G, M, I) where G is a set of objects, M a set of attributes, and I ⊆ G × M is a relation that expresses whether an Fast Computation of Proper Premises 103 object g ∈ G has an attribute m ∈ M . If A ⊆ G is a set of objects then A0 denotes the set of all attributes that are shared among all objects in A, i.e., A0 = { m ∈ M | ∀g ∈ G : gIm }. Likewise, for some set B ⊆ M we define B 0 = { g ∈ G | ∀m ∈ B : gIm }. Pairs of the form (A, B) where A0 = B and B 0 = A are called formal concepts. Formal concepts of the form ({ m }0 , { m }00 ) for some attribute m ∈ M are called attribute concept and are denoted by µm. We define the partial order ≤ on the set of all formal concepts of a context to be the subset order on the first component. The first component of a formal concept is called the concept extent while the second component is called the concept intent. 
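These derivation operators translate directly into code. The following sketch assumes a context stored as an object-to-attribute-set dictionary, which is our own illustrative encoding and not the data structure used by the algorithms below.

# Sketch of the two derivation operators and of the attribute concept µm.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "c"},
}
ATTRIBUTES = set().union(*CONTEXT.values())

def common_attributes(objects):        # A' = { m | every g in A has m }
    rows = [CONTEXT[g] for g in objects]
    return set.intersection(*rows) if rows else set(ATTRIBUTES)

def common_objects(attributes):        # B' = { g | g has every m in B }
    return {g for g, row in CONTEXT.items() if attributes <= row}

def attribute_concept(m):              # µm = ({m}', {m}'')
    ext = common_objects({m})
    return ext, common_attributes(ext)

print(attribute_concept("a"))          # extent {'g1', 'g3'}, intent {'a', 'b'}

With this encoding, (A, B) is a formal concept precisely when common_attributes(A) == B and common_objects(B) == A.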
Formal concept analysis provides methods to mine implicational knowledge from formal contexts. An implication is a pair (B1 , B2 ) where B1 , B2 ⊆ M , usually denoted by B1 → B2 . We say that the implication B1 → B2 holds in a context K if B10 ⊆ B20 . An implication B1 → B2 follows from a set of implications L if for every context K in which all implications from L hold, B1 → B2 also holds. We say that L is sound for K if all implications from L hold in K, and we say that L is complete for K if all implications that hold in K follow from L. There exists a sound and complete set of implications for each context which has minimal cardinality [12]. This is called the stem base. The exact definition of the stem base is outside the scope of this work. A sound and complete set of implications can also be obtained using proper premises. For a given set of attributes B ⊆ M , define B • to be the set of those attributes in M \ B that follow from B but not from a strict subset of B, i.e., [ B • = B 00 \ B ∪ S 00 . S(B B is called a proper premise if B • is not empty. It is called a proper premise for m ∈ M if m ∈ B • . It can be shown that L = { B → B • | B proper premise } is sound and complete [11]. Several alternative ways to define this sound and complete set of implications can be found in [2]. We write g $ m if g 0 is maximal with respect to the subset order among all object intents which do not contain m. 3 Proper Premises as Minimal Hypergraph Transversals We present a connection between proper premises and minimal hypergraph transversals, which forms the foundation for our enumeration algorithms. It has been exploited before in database theory to the purpose of mining functional dependencies from a database relation [14]. Implicitly, it has also been known for a long time within the FCA community. However, the term hypergraph has not been used in this context (cf. Prop. 23 from [11]). Let V be a finite set of vertices. Then a hypergraph on V is simply a pair (V, H) where H is a subset of the power set 2V . Intuitively, each set E ∈ H represents an edge of the hypergraph, which, in contrast to classical graph theory, 104 Uwe Ryssel, Felix Distel and Daniel Borchmann may be incident to more or less than two vertices. A set S ⊆ V is called a hypergraph transversal of H if it intersects every edge E ∈ H, i.e., ∀E ∈ H : S ∩ E 6= ∅. S is called a minimal hypergraph transversal of H if it is minimal with respect to the subset order among all hypergraph transversals of H. The transversal hyper- graph of H is the set of all minimal hypergraph transversals of H. It is denoted by Tr (H). The problem of deciding for two hypergraphs G and H whether H is the transversal hypergraph of G is called TransHyp. The problem of enumerating all minimal hypergraph transversals of a hypergraph G is called TransEnum. Both problems are relevant to a large number of fields and therefore have been well-researched. TransHyp is known to be contained in coNP. Since it has been shown that TransHyp can be decided in quasipolynomial time [9], it is not believed to be coNP-complete. Furthermore, it has been shown that it can be decided using only limited non-determinism [8]. For the enumeration problem it is not known to date whether an output-polynomial algorithm exists. However, efficient algorithms have been developed for several classes of hypergraphs [8,4]. The following proposition can be found in [11] among others. Proposition 1. P ⊆ M is a premise of m ∈ M iff (M \ g 0 ) ∩ P 6= ∅ holds for all g ∈ G with g $ m. 
P is a proper premise for m iff P is minimal (with respect to ⊆) with this property. We immediately obtain the following corollary. Corollary 1. P is a premise of m iff P is a hypergraph transversal of (M, H) where H := {M \ g 0 | g ∈ G, g $ m}. The set of all proper premises of m is exactly the transversal hypergraph Tr ({M \ g 0 | g ∈ G, g $ m}). In particular this proves that enumerating the proper premises of a given attribute m is polynomially equivalent to TransEnum. This can be exploited in a naïve algorithm for computing all proper premises of a formal context (Al- gorithm 1). Being aware of the link to hypergraph transversals, we can benefit from existing efficient algorithms for TransEnum in order to enumerate proper premises similar to what has been proposed in [14]. Of course, it is also possible to use other enumeration problems to which TransEnum can be reduced. Ex- amples are the enumeration of prime implicants of Horn functions [2] and the enumeration of set covers. Fast Computation of Proper Premises 105 4 Improvements to the Algorithm 4.1 Avoiding Duplicates using Candidate Sets We can further optimize Algorithm 1 by reducing the search space. In the naïve algorithm proper premises are typically computed multiple times since they can be proper premises of more than one attribute. Our goal is to avoid this wherever possible. The first idea is shown in Algorithm 2. There we introduce a candidate set C of particular attributes, depending on the current attribute m. We claim now that we only have to search for minimal hypergraph transversals P of { M \ g 0 | g $ m } with P ⊆ C. We provide some intuition for this idea. Algorithm 1 Naïve Algorithm for Enumerating All Proper Premises Input: K = (G, M, I) P=∅ for all m ∈ M do P = P ∪ Tr ({M \ g 0 | g ∈ G, g $ m}) end for return P Algorithm 2 A Better Algorithm for Enumerating All Proper Premises Input: K = (G, M, I) P = { { m } | m ∈ M, { m } is a proper premise of K } for all m ∈ M do C = { u ∈ M \ { m } | 6 ∃v ∈ M : µu ∧ µm ≤ µv < µm } P = P ∪ { P ⊆ C | P minimal hypergraph transversal of { M \ g 0 | g $ m } } end for return P Let us fix a formal context K = (G, M, I), choose m ∈ M and let P ⊆ M be a proper premise for m. Then we know that m ∈ P 00 , which is equivalent to ^ µp ≤ µm. p∈P If we now find another attribute n ∈ M \ { m } with ^ µp ≤ µn < µm p∈P it suffices to find the set P as a proper premise for n, because from µn < µm we can already infer m ∈ P 00 . Conversely, if we search for all proper premises for m, 106 Uwe Ryssel, Felix Distel and Daniel Borchmann we only have to search for those who are not proper premises for attributes n with µn < µm. Now suppose that there exists an element u ∈ P and an attribute v ∈ M such that µm ∧ µu ≤ µv < µm. (1) Then we know ^ ^ ( µp) ∧ µm = µp ≤ µv < µm, p∈P p∈P i.e., P is already a proper premise for v. In this case, we do not have to search for P , since it will be found in another iteration. On the other hand, if P is a proper premise for m but not for any other attribute n ∈ M with µn < µm, the argument given above shows that an element u ∈ P and an attribute v ∈ M satisfying (1) cannot exist. Lemma 1. Algorithm 2 enumerates for a given formal context K = (G, M, I) all proper premises of K. Proof. Let P be a proper premise of K for the attribute m. P is a proper premise and therefore m ∈ P 00 holds, which is equivalent to µm ≥ (P 0 , P 00 ). Let c ∈ M be such that µm ≥ µc ≥ (P 0 , P 00 ) and µc is minimal with this property. 
We claim that either P = { c } or P is found in the iteration for c of Algorithm 2. Suppose c ∈ P . Then m ∈ { c }00 follows from µm ≥ µc. As a proper premise, P is minimal with the property m ∈ P 00 . It follows P = { c } and P is found by Algorithm 2 during the initialization. Now suppose c 6∈ P . Consider C := { u ∈ M \ { c } | 6 ∃v ∈ M : µu ∧ µc ≤ µv < µc }. We shall show P ⊆ C. To see this, consider some p ∈ P . Then p 6= c holds by assumption. Suppose that p 6∈ C, i.e., there is some v ∈ M such that µp ∧ µc ≤ µv < µc. Because of p ∈ P , µp ≥ (P 0 , P 00 ) and together with µc ≥ (P 0 , P 00 ) we have (P 0 , P 00 ) ≤ µp ∧ µc ≤ µv < µc in contradiction to the minimality of µc. This shows p ∈ C and all together P ⊆ C. To complete the proof it remains to show that P is a minimal hypergraph transversal of { M \ { g }0 | g $ c }, i.e., that P is also a proper premise for c, not only for m. Consider n ∈ P . Assume c ∈ (P \ { n })00 . Since {c} implies m, then P \ { n } would be a premise for m in contradiction to the minimality of P . Thus c 6∈ (P \ { n })00 holds for all n ∈ P and therefore P is a proper premise for c. 4.2 Irreducible Attributes We go one step further and also remove attributes m from our candidate set C whose attribute concept µm is the V meet of other attribute concepts µx1 , . . . , µxn , n where x1 , . . . , xn ∈ C, i.e., µm = i=1 µxi . This results in Algorithm 3 that no Fast Computation of Proper Premises 107 longer computes all proper premises, but a subset that still yields a complete implicational base. We show that we only have to search for proper premises P with P ⊆ N where N is the set of irreducible attributes of K. To ease the presentation, let us assume for the rest of this paper that the formal context K is attribute-clarified. Algorithm 3 Computing Enough Proper Premises Input: K = (G, M, I) P = { { m } | m ∈ M, { m } Vis a proper premise of K } N = M \ { x ∈ M | µx = n i=1 µxi for an n ∈ N and xi ∈ M for 1 ≤ i ≤ n } for all m ∈ M do C = { u ∈ N \ { m } | 6 ∃v ∈ M : µu ∧ µm ≤ µv < µm } P = P ∪ { P ⊆ C | P minimal hypergraph transversal of { M \ g 0 | g $ m } } end for return P Proposition 2. Let m be an attribute and let P be a proper premise for m. Let x ∈ P , n ∈ N, and for 1 ≤ i ≤ n let xi ∈ M be attributes satisfying – m∈ / {Vx1 , . . . , xn }, n – µx = i=1 µxi , – xi ∈ / ∅ for all 1 ≤ i ≤ n and 00 – µx < µxi for all 1 ≤ i ≤ n. Then { x } is a proper premise for all xi and there exists a nonempty set Y ⊆ { x1 , . . . , xn } such that (P \ { x }) ∪ Y is a proper premise for m. Proof. It is clear that { x } is a proper premise for all xi , since xi ∈ { x }00 and / ∅00 . Define xi ∈ QY := (P \ { x }) ∪ Y for Y ⊆ { x1 , . . . , xn }. We choose Y ⊆ { x1 , . . . , xn } such that Y is minimal with respect to m ∈ Q00Y . Such a set exists, since m ∈ ((P \ { x }) ∪ { x1 , . . . , xn })00 because of { x1 , . . . , xn } → { x }. Furthermore, Y 6= ∅, since m ∈ / (P \ { x })00 . We now claim that QY is a proper premise for m. Clearly m ∈ / QY , since m∈ / Y . For all y ∈ Y it holds that m ∈ / (QY \ { y })00 or otherwise minimality of Y would be violated. It therefore remains to show that m ∈ / (QY \ { y })00 for all y ∈ QY \ Y = P \ { x }. (QY \ { y })00 = ((P \ { x, y }) ∪ Y )00 ⊆ ((P \ { y }) ∪ Y )00 = (P \ { y })00 since { x } → Y and x ∈ P \{ y }. Since m ∈ / (P \{ y })00 , we get m ∈ / (QY \{ y })00 as required. In sum, QY is a proper premise for m. 108 Uwe Ryssel, Felix Distel and Daniel Borchmann Lemma 2. 
Let N be the set of all meet-irreducible attributes of a context K. Define P = { X ⊆ M | |X| ≤ 1, X proper premise } ∪ { X ⊆ N | X proper premise } Then the set L = { P → P • | P ∈ P } is sound and complete for K. Proof. Let m be an attribute and let P be a proper premise for m. If P ∈ / P then it follows that P 6⊆ N . Thus we can find y1 ∈ P \N and elements x1 , . . . , xn ∈ M with n ≥ 1 such that – m∈ / { xV1 , . . . , xn }, n – µy1 = i=1 µxi , – xi ∈ / ∅00 for all 1 ≤ i ≤ n and – µx < µxi for all 1 ≤ i ≤ n. By Proposition 2 we can find a proper premise P1 such that P → { m } fol- lows from { y1 } → { x1 , . . . , xn } and P1 → { m }. Clearly { y1 } ∈ P, since all singleton proper premises are contained in P. If P1 ∈ / P then we can apply Proposition 2 again and obtain a new proper premise P2 , etc. To see that this process terminates consider the strict partial order ≺ defined as P ≺ Q iff ∀q ∈ Q : ∃p ∈ P : µp < µq. It is easy to see that with each application of Proposition 2 we obtain a new proper premise that is strictly larger than the previous with respect to ≺. Hence, the process must terminate. This yields a set P 0 = { { y1 }, . . . , { yk }, Pk } ⊆ P such that P → { m } follows from { Q → Q• | Q ∈ P 0 }. Thus L is a sound and complete set of implications. Together with Lemma 1 this yields correctness of Algorithm 3. Corollary 2. The set of proper premises computed by Algorithm 3 yields a sound and complete set of implications for the given formal context. 5 Evaluation 5.1 Computing Proper Premises Instead of Intents In both the stem base algorithms and our algorithms, runtime can be exponential in the size of the input. In the classical case the reason is that the number of intents can be exponential in the size of the stem base [13]. In the case of our algorithms there are two reasons: the computation of proper premises is TransEnum-complete, and there can be exponentially many proper premises. The first issue is less relevant in practice because algorithms for TransEnum, while still exponential in the worst case, behave well for most instances. To see that there can be exponentially many proper premises in the size of the stem base, let us look at the context Kn from Table 1 for some n ≥ 2, consisting Fast Computation of Proper Premises 109 of two contranominal scales of dimension n × n and one attribute a with empty extent. It can be verified that the proper premises of the attribute a are exactly the sets of the form {mi | i ∈ I} ∪ {m0i | i ∈ / I} for some I ⊆ {1, . . . , n}, while the only pseudo-intents are the singleton sets and {m1 , . . . , mn , m01 , . . . , m0n }. Hence there are 2n proper premises for a, while there are only 2n + 2 pseudo-intents. Table 1. Context Kn with Exponentially Many Proper Premises m1 . . . mn m01 . . . m0n a g1 .. . I6= I6= gn Next-Closure behaves poorly on contexts with many intents while our algo- rithms behave poorly on contexts with many proper premises. In order to provide evidence that our algorithm should behave better in practice we use formulae for the expectation of the number of intents and proper premises in a formal context that is chosen uniformly at random among all n × m-contexts for fixed natural numbers n and m.4 Derivations of these formulae can be found in [7]. 
The expected value for the number of intents in an n × m-context is

  E_intent = \sum_{q=0}^{m} \sum_{r=0}^{n} \binom{m}{q} \binom{n}{r} 2^{-rq} (1 - 2^{-r})^{m-q} (1 - 2^{-q})^{n-r},

while the expected value for the number of proper premises for a fixed attribute a in an n × m-context is Xn m−1 q n X m 2 X Y pi+1 −pi −1 Epp = 2−n q! 2−q 1 − 2−q (1 + i) . r=0 r q=0 q q i=0 (p1 ,...,pq )∈N 1≤p1 <···

then continue
Br = reducedIntent(C)
if Br is empty then continue
add(P, component(Br))
end for

All the lines are self-explanatory except the one with the add call. The component function receives the reduced intent of the formal concept and builds the component representation that holds its attributes and functionalities. In some cases, the top concept (⊤) has a non-empty intent, so it would also generate a component with all its features (name, position and orientation in our example of Figure 4). That component would be added to all entities, so instead of keeping a pure component-based architecture with an empty generic Entity class, we can move all those top features into it. Figure 5 shows the components extracted by Rosette using the lattice from Figure 4. The components have been automatically named by concatenating the attribute names of the component or, when none is available, by concatenating the names of the messages the component is able to carry out. For example, let us say that the original name of the FightComp component was C health aim.

(Figure: UML-like diagram showing the generic Entity class with _name, _position and _orientation and the operations setPosition(), setOrientation(), update() and emmitMessage(); the IComponent interface; and the candidate components PlayerControllerComp, AIAndMovementComp, FightComp, PhysicsComp, TriggerComp, DoorComp, GraphicsComp, SpeakerComp and SpeedAttComp.)
Fig. 5. The candidate components proposed by Rosette

Summarizing the whole process: when analysing a concept lattice, every formal concept that provides a new feature (i.e. that has a non-empty reduced intent) does not represent a new entity type but a new component. The only exception is the formal concept at the top of the lattice, which represents the generic entity class and has data and functionality shared by all the entity types. Both the generic entity and every new component have the ability of carrying out the actions in the reduced intent of the formal concept, and they are populated with the corresponding attributes. This way, we have easily obtained the candidate generic entity class and components, but we still have to describe the entity types. Starting from every concept whose reduced extent contains an entity type, Rosette uses the superconcept relation and goes up until reaching the concept at the top of the lattice. For example, the Persona entity type (Figure 4) would have components represented by formal concepts number
Keep in mind that the final component distribution does not include information about what components are needed for each entity. This knowledge is not thrown away: Rosette stores all the information in the original lattice using OWL, which provides a knowledge-rich representation that will let it provide some extra functionalities de- scribed in the next sections. 5 Expert Tuning The automatic process detailed above ends up with a collection of proposed compo- nents with a generated name, and the Entity base class that may have some common functionality. This result is presented to developers, who will be able to modify it using their prior experience. Some of the changes will affect to the underlying formal lattice (that is never shown to the users) in such a way that the relationship between it and the initial formal context extracted from the class hierarchy will be broken. At this stage of the process this does not represent an issue, because we will not use FCA anymore over it. On the other hand, changes could be so dramatic that the lattice could even become an invalid one. Fortunately, Rosette uses OWL as the underlying representation, that can be used to represent richer structures than mere partially ordered sets. In any case, for simplicity, in the rest of the paper we will keep talking about lattices although internally our tool will not be using them directly. Users will be able to perform the next four operators over the proposed component distribution: 1. Rename: proposed components are automatically named according to their at- tribute names. The first operator users may perform is to rename them in order to clarify its purpose. 2. Split: in some cases, two functionalities not related to each other may end up in the same component due to the entity type definitions (FCA will group two func- tionalities when both of them appears together in every entity type created in the formal hierarchy). In that case, Rosette gives developers the chance of splitting them in two different components. The expert will then decide which features re- main in the original component and which ones are moved to the new one (which is manually named). Formally speaking, this operator would modify the underly- ing concept lattice creating two concepts (A1, B1) and (A2, B2) that will have the same subconcepts and superconcepts than the original formal concept (A, B) where A ≡ A1 ≡ A2 and B ≡ B1 ∪ B2. The original concept is removed. Al- though this is not correct mathematically speaking, since with this operation we do not have concepts anymore, we still use the term in this and in the other operators for simplicity. 3. Move features: this is the opposite operator. Sometimes some features lie in dif- ferent components but the expert considers that they must belong to the same com- ponent. In this context, features of one component (some elements of the reduced Iterative Software Design of Computer Games through FCA 153 intent) can be transferred to a different component. In the lattice, this means that some attributes are moved from a node to another one. When this movement goes up-down (for example from node 9 to node 10), Rosette will detect the possible in- consistency (entities extracted from node 11 would end with missed features) and warns the user to clone the feature also in the component generated from node 11. If the developer moves all the features of a component the result is an useless and empty component that is therefore removed from the system. 4. 
Add features: some times features must be copied from one component to an- other one when FCA detects relationships that will not be valid in the long run. In our example, the dependency between node 3 and 4 indicates that all entities with a graphic model (4, GraphicsComp) will have physics (3, PhysicsComp), some- thing valid in the initial hierarchy but that is likely to change afterwards. With the initial distribution, all graphical entities will have an scale thanks to the physic component, but experts could envision that this should be a native feature of the GraphicsComp too. This operator let them to add those “missing” features to any component to avoid dependencies with other ones. The expert interaction is totally necessary, first of all because she has to name the components but also because the system ignores some semantic knowledge and infor- mation based in the developer experience. However, the bigger the example is, with more entity types, the more alike is the proposed and the final set of components, just because the system has more knowledge to distribute responsibilities. While using operators, coherence is granted because of the knowledge-rich OWL representation that contains semantic information about entities, components, and fea- tures (attributes and actions). This knowledge is useful while users tune the component distribution, but also to check errors in the domain and in future steps of the game development (as creating AIs that reason over the domain). Once users validate the final distribution, Rosette generates a big amount of source code for all the components, that programmers will be fill up with the concrete be- haviours. 5.1 Example Figure 5 showed the resultant candidate of components proposed by Rosette for the hierarchy of Figure 1, that can now be manipulated by the expert to tune some aspects. The first performed changes are component rename (rename operator) that is, in fact, applied in the figure. A hand-made component distribution of the original hierarchy would have ended with that one shown in Figure 3, that is quite similar to the distribution provided by Rosette. When using a richer hierarchy, both distributions are even more similar. With the purpose of demonstrating how the expert would use the available opera- tors to transform the proposed set of components, we apply some modifications to the automatically proposed distribution in order to turn it into the other one. First of all, we can consider the SpeedAttComp that has the speed attribute but no functionalities. In designing terms this is acceptable, but rarely has sense from the im- plementation point of view. Speed is used separately by PlayerControllerComp and 154 David Llansó et al. AIAndMovementComp to adjust the movement, so we will apply the move features operator moving (and cloning) the speed feature to both components, and removing SpeedAttComp completely. This operator is coherent with the lattice (Figure 4): we are moving the intent of the node labelled 9 to both subconcepts (10 and 11). After that, another application of the move features operator results in the movement of the touched message interpretation from the TriggerComp to the PhysicsComp. This is done for technical reasons in order to maintain all physic information in the same component. Then, the split operator, which split components, is applied over the AIAndMove- mentComp component twice. 
Due to the lack of entity types in the example, some fea- tures resides in the same component though in the real implementation are divided. In the first application of the split operator, the goToEntity and the goToPosition message interpretations are moved to a new component, which is named GoToComp. The second application results in the new SteeringToComp component with the steeringTo message interpretation and the speed attribute. The original component is renamed as AIComp by the rename operator and keeps the aiscript attribute. Finally, although the Entity class has received some generic features (from the top concept, >), they are especially important in other components. Instead of just use those features from the entity, programmers would prefer to maintain them also in those other components. For this reason, we have to apply the add features operator over the GraphicsComp, PhysicsComp and SpeakerComp components in order to add the setPosition and the setOrientation functionalities to them. 6 Iterative Software Development with FCA In the previous section we have presented a semi-automatic technique for moving from class hierarchies to components. The target purpose is helping programmers facing up to this kind of distributed system, which is widely used in computer game develop- ments. Through the use of FCA, this technique splits entity behaviours in candidate components but also provides experts with mechanisms for modifying these component candidates. These mechanisms are the operators defined in Section 5, which execution in the domain alter somehow the underlying formal lattice generated during the FCA process. Attentive readers will have realized that the previous technique is valid for the first step of the development but not for further development steps. Due to computer game requirements change throughout the game development, the entity distribution is al- ways changing. When the experts face up to this situation, they may decide to change the entity hierarchy in order to use Rosette for generating a new set of components. The application of FCA results in a new lattice that probably does not change a lot from the previous one. However, the experts usually would have performed some modifications in the proposed component distribution using our operators. As the process is now re- peated, these changes would be lost every time the expert request a new candidate set of components. Our intention in this section is to extend the previous technique in order to allow an iterative software design. In this new approach, the modifications applied over one Iterative Software Design of Computer Games through FCA 155 lattice can be extrapolated to other lattices in future iterations. Keep in mind that the domain operators (Section 5) are applied over components that has been created from a formal concept. So, these operators could be applied on similar formal concepts, of another domain, in case that both domains share the part of the lattice affected by the operators. From a high-level point of view, in order to preserve changes applied over the pre- vious component suggestions, the system compares the new formal lattice, obtained through FCA, with the previous one. The methodology identifies the part of the lattice that does not significantly change between the two FCA applications. This way the tun- ing operators executed in concepts of this part of the lattice could be reapplied in the new lattice. 
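Seen from the implementation side, the four tuning operators of Section 5 are little more than edits of a component-to-features map, and reapplying them after a new FCA run amounts to replaying those edits. The sketch below is purely illustrative, with hypothetical helper names; Rosette's OWL representation is richer than a plain dictionary.

def rename(components, old_name, new_name):
    components[new_name] = components.pop(old_name)

def split(components, name, new_name, moved_features):
    components[new_name] = set(moved_features)
    components[name] -= moved_features

def move_features(components, source, target, features):
    components[target] |= features
    components[source] -= features
    if not components[source]:               # an emptied component is removed
        del components[source]

def add_features(components, target, features):
    components[target] |= features           # the source component keeps them

# Simplified instance of the example of Section 5.1 (the paper clones _speed
# into both movement-related components; here only one target is shown).
components = {"SpeedAttComp": {"_speed"},
              "PlayerControllerComp": {"walk", "stopWalk", "turn"}}
move_features(components, "SpeedAttComp", "PlayerControllerComp", {"_speed"})
print(components)   # SpeedAttComp disappears, _speed moves to the controller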
The identification of the target part of the lattice is a semi-automatic process, where formal concepts are related in pairs. Rosette automatically identifies the constant part of the lattice, which for our purpose is the set of pairs of formal concepts that have the same reduced intent. We do not care about the extent in our approach since the component suggestion lays its foundations in the reduced intent. The components extrated from the formal concepts that have not been matched up are presented to the expert. Then she can provide matches between old components and new ones to the considered constant part of the lattice. It is worth mentioning that some of the operators could not be executed in the new domains due to component distribution may vary a lot after various domain iterations but it is just because these operators become obsoleted. 6.1 Example In Section 5.1 FCA is applied to a hierarchy and the automatic part of the proposed methodology leads us to the set of components in Figure 5. The resultant domain was modified by the expert, by using the tuning operators, and the component-based system developed ends up with the components in Figure 3. Now, let us recover the example and suppose that the game design has new require- ments. The game designers propose the addition of two new entity types: the Break- ableDoor, which is a door that can be broken using weapons, and a Teleporter, which moves entities that enter in them to a far target place. Designers also require the modifi- cation of the ResourceBearer entity, which must have a currentEnemy attribute for the artificial intelligence. The Rosette expert captures these domain changes by modifying the entity hierarchy and uses the component suggestion module to distribute responsi- bilities. The application of FCA to the current domain results in the lattice in Figure 6, where formal concepts are tagged with letters from a to n. Comparing the new lattice with the lattice of the previous FCA application (Fig- ure 4), Rosette determines that the pairs of formal concepts <1,a>, <2,b>, <4,d>, <7,f>, <9,k> and <11,m> remain from the previous to the current iteration. When Rosette finishes this automatic match, the formal concepts that were not put into pairs and with no empty reduced intent are presented in the screen. In this moment, the expert put the formal concepts <3,c>, <5,e>, <8,j> and <10,l> into pairs, based on their experience and in the fact that these concepts are very similar (only some attributes 156 David Llansó et al. Fig. 6. New concept lattice changes). Just the g and h formal concepts have no pairs and will become new compo- nents. So, in these steps, the part of the lattice that does not significantly change has been identified and Rosette can extrapolate the modifications applied in the previous lattice to the new one. After applying the operators to the new domain, the new set of candidate components are finally given to the expert. Figure 7 shows these components, where we can compare the result with the components in Figure 5. The general structure is maintained but some actions and attributes has been moved between components. Fur- thermore two new components have arisen. The stressed features denote new elements (or moved ones) whilst the crossed out features mean that they do not belong to this component anymore (FightComp. At this point the expert could continue with the itera- tion by applying new operators to this set of components (i.e change the auto-generated names of the new components). 
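The automatic half of this pairing reduces to comparing reduced intents between the two lattices; whatever cannot be paired automatically is shown to the expert, and the recorded operators are then replayed on the pairs. Again a purely illustrative sketch with hypothetical structures: concepts as in the earlier sketch, and an operator log whose entries pair a ready-to-apply closure with the old concept it was applied to.

def match_concepts(old_concepts, new_concepts):
    # Pair concepts of the old and the new lattice that share the same
    # (non-empty) reduced intent; the rest is left to the expert.
    pairs, unmatched = {}, []
    by_intent = {frozenset(c.reduced_intent): c for c in old_concepts}
    for c in new_concepts:
        old = by_intent.get(frozenset(c.reduced_intent))
        if old is not None and c.reduced_intent:
            pairs[id(old)] = c
        else:
            unmatched.append(c)              # presented on screen for manual pairing
    return pairs, unmatched

def replay(operator_log, pairs):
    # Re-apply recorded tuning operators to the concepts that survived the iteration.
    for apply_op, old_concept in operator_log:
        new_concept = pairs.get(id(old_concept))
        if new_concept is not None:
            apply_op(new_concept)            # unmatched targets: the operator is obsolete

In the example of Figure 6 the automatic step would produce the pairs <1,a>, <2,b>, <4,d>, <7,f>, <9,k> and <11,m>, leaving <3,c>, <5,e>, <8,j> and <10,l> to the expert.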
7 Related Work and Conclusions Regarding related work, we can mention other applications of FCA to software engi- neering. The work described in [12] focuses on the use of FCA during the early phases of software development. They propose a method for finding or deriving class can- didates from a given use case description. Also closely related is the work described in [10], where they propose a general framework for applying FCA to obtain a class hierarchy in different points of the software life-cycle: design from scratch using a set Iterative Software Design of Computer Games through FCA 157 Entity - _name - _position - _orientation + setPosition() + setOrientation() + update() + emmitMessage() GoToComp 0...* - goToEntity() PlayerControllerComp IComponent - goToPosition() - _speed - _entity SteeringToComp - walk() - _messages - _speed - stopWalk() + update() - turn() + handleMessage() - steeringTo() C_destination PhysicsComp TriggerComp AIComp GraphicsComp C_health FightComp SpeakerComp - destination - _target - _graphicmodel - health - _physicmodel - _aiscript - _health - _scale - _soundfile - teleportTo() - _physicclass - trigger() - _currentEnemy - hurt() - _aim - setPosition() - setPosition() - _scale - hurt() - setOrientation() - setOrientation() - setPosition() - shootTo() - setAnimation() - playSound() - setOrientation() DoorComp - applyForce() - stopAnimation() - stopSound() - touched() - _isOpen - open() - close() Fig. 7. The new candidate components proposed by Rosette of class specifications, refactoring from the observation of the actual use of the classes in applications, and hierarchy evolution by incrementally adding new classes. The main difference with the approach presented here is that they try to build a class hierarchy while we intend to distribute functionality among sibling components, which solve the problem with multiple inheritance in FCA lattices. The process of identifying components with FCA is not very different of identifying traits [13] and aspects [17]. In [13] Lienhard et al. present a process that identifies traits from inheritance hierarchies that is bases in the same principles than our system but is not exactly the same due to components are more autonomous pieces of software than traits. Components save their own state whilst traits are just a set of methods. However, which makes the difference between both proposals is the iterability. A possible scenario for applying the techniques described in the paper is to re- engineer a game from class hierarchy to components. In the last years, we have been working on Javy 2 [11], a educational game that was initially developed using an en- tity hierarchy (a portion was shown in Figure 1), and afterwards manually converted to a component-based architecture (Figure 3). When Rosette was available, we tested it using the original Javy 2 hierarchy, and the initial component distribution was quite ac- ceptable when compared with the human-made one. We could have saved a significant amount of time if it had been available on time. In the long term, our goal is to support the up-front development of games with a component-based architecture where entities are connected to a logical hierarchical view. In this paper we have shown how we allow an iterative process when defining the class hierarchy, so operators applied to the early versions of the component distribution are automatically reapplied in the late ones. Nevertheless, more work must be done in the code generation phase to do it reversible. 
Changes in the autogenerated source code 158 David Llansó et al. are still, unfortunately, out of the scope of Rosette so they must be manually redone for each class hierarchy iteration. References 1. K. Beck. Embracing change with extreme programming. Computer, 32:70–77, October 1999. 2. K. Beck and C. Andres. Extreme Programming Explained: Embrace Change (2nd Edition). Addison-Wesley Professional, 2004. 3. G. Birkhoff. Lattice Theory, third editon. American Math. Society Coll. Publ. 25, Provi- dence, R.I, 1973. 4. W. Buchanan. Game Programming Gems 5, chapter A Generic Component Library. Charles River Media, 2005. 5. M. Chady. Theory and practice of game object component architecture. In Game Developers Conference, 2009. 6. M. Dao, M. Huchard, T. Libourel, A. Pons, and J. Villerd. Proposals for Multiple to Single Inheritance Transformation. In MASPEGHI’04: 3rd Workshop on Managing SPEcializa- tion/Generalization Hierarchies, pages 21–26, Oslo (Norway), 2004. 7. S. Ducasse, O. Nierstrasz, N. Schärli, R. Wuyts, and A. P. Black. Traits: A mechanism for fine-grained reuse. ACM Trans. Program. Lang. Syst., 28:331–388, March 2006. 8. B. Ganter and R. Wille. Formal concept analysis. Mathematical Foundations, 1997. 9. S. Garcés. AI Game Programming Wisdom III, chapter Flexible Object-Composition Archi- tecture. Charles River Media, 2006. 10. R. Godin and P. Valtchev. Formal Concept Analysis, chapter Formal Concept Analysis-Based Class Hierarchy Design in Object-Oriented Software Development, pages 304–323. Springer Berlin / Heidelberg, 2005. 11. P. P. Gómez-Martı́n, M. A. Gómez-Martı́n, P. A. González-Calero, and P. Palmier-Campos. Using metaphors in game-based education. In K. chuen Hui, Z. Pan, R. C. kit Chung, C. C. Wang, X. Jin, S. Göbel, and E. C.-L. Li, editors, Technologies for E-Learning and Digital En- tertainment. Second International Conference of E-Learning and Games (Edutainment’07), volume 4469 of Lecture Notes in Computer Science, pages 477–488. Springer Verlag, 2007. 12. W. Hesse and T. A. Tilley. Formal Concept Analysis used for Software Analysis and Mod- elling, volume 3626 of LNAI, pages 288–303. Springer, 2005. 13. A. Lienhard, S. Ducasse, and G. Arévalo. Identifying traits with formal concept analysis. In Proceedings of the 20th IEEE/ACM international Conference on Automated software engi- neering, ASE ’05, pages 66–75, New York, NY, USA, 2005. ACM. 14. D. Llansó, M. A. Gómez-Martı́n, P. P. Gómez-Martı́n, and P. A. González-Calero. Explicit domain modelling in video games. In International Conference on the Foundations of Digital Games (FDG), Bordeaux, France, June 2011. ACM. 15. B. Rene. Game Programming Gems 5, chapter Component Based Object Management. Charles River Media, 2005. 16. K. Schwaber and M. Beedle. Agile Software Development with Scrum. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2001. 17. T. Tourwe and K. Mens. Mining aspectual views using formal concept analysis. In Proceed- ings of the Source Code Analysis and Manipulation, Fourth IEEE International Workshop, pages 97–106, Washington, DC, USA, 2004. IEEE Computer Society. 18. P. Valtchev, D. Grosser, C. Roume, and M. R. Hacene. Galicia: An open platform for lattices. In In Using Conceptual Structures: Contributions to the 11th Intl. Conference on Conceptual Structures (ICCS’03, pages 241–254. Shaker Verlag, 2003. 19. M. West. Evolve your hiearchy. Game Developer, 13(3):51–54, Mar. 2006. 
Fuzzy-Valued Triadic Implications

Cynthia Vera Glodeanu

Technische Universität Dresden, 01062 Dresden, Germany
Cynthia_Vera.Glodeanu@mailbox.tu-dresden.de

Abstract. We present a new approach for handling fuzzy triadic data in the setting of Formal Concept Analysis. The starting point is a fuzzy-valued triadic context (K1, K2, K3, Y), where K1, K2 and K3 are sets and Y is a ternary fuzzy relation between these sets. First, we generalise the methods of Triadic Concept Analysis to our setting and show how they fit other approaches to Fuzzy Triadic Concept Analysis. Afterwards, we develop the fuzzy-valued triadic implications as counterparts of the various triadic implications studied in the literature. These are of major importance for the integrity of Fuzzy and Fuzzy-Valued Triadic Concept Analysis.

Keywords: Formal Concept Analysis, fuzzy data, three-way data

1 Introduction

So far, the fuzzy approaches to Triadic Concept Analysis have considered all three components of a triadic concept as fuzzy sets. In [1] the methods of Triadic Concept Analysis were generalised to the fuzzy setting. A more general approach was presented in [2], where different residuated lattices were considered for each fuzzy set. A somewhat different strategy was followed in [3] using alpha-cuts. Our approach differs from these in considering just two components of a triadic concept as fuzzy and one as crisp. This is motivated by the fact that in some situations it is not appropriate to regard all sets as fuzzy. For example, it is not natural to say that half of a person is old; however, we may say that a person is half old.

First, we translate the methods of Triadic Concept Analysis to our setting. Compared to other works, we generalise all triadic derivation operators and show how they change for the fuzzy approaches considered by other authors. Besides these results, the main achievement of this paper is the generalisation of the various triadic implications presented in [4]. Due to the large number of results in this paper, we concentrate on giving an intuition of the methods and omit proofs whenever they do not influence the understanding. The missing proofs, as well as further results concerning fuzzy-valued triadic concepts and trilattices, can be found in [5]. There, we also study the fuzzy-valued triadic approach to Factor Analysis.

The paper is structured as follows: In Section 2 we give brief introductions to Triadic and Formal Fuzzy Concept Analysis. In Section 3 we develop our fuzzy-valued setting, defining context, concept and derivation operators, and show how they correspond to other approaches to Fuzzy Triadic Concept Analysis. We also comment on the reasons why our setting is a proper generalisation. In Section 4 we present the fuzzy-valued triadic implications. The developed methods are accompanied by illustrative examples. The last section contains concluding remarks and further topics of research.

2 Preliminaries

We assume basic familiarity with Formal Concept Analysis and refer the reader to [6]. In the following we give brief introductions to Triadic Concept Analysis [7, 8] and Formal Fuzzy Concept Analysis [9, 10].
2.1 Triadic Concept Analysis As introduced in [7], the underlying structure of Triadic Concept Analysis is a triadic context defined as a quadruple (K1 , K2 , K3 , Y ) where K1 , K2 and K3 are sets and Y is a ternary relation, i.e., Y ⊆ K1 × K2 × K3 . The elements of K1 , K2 and K3 are called (formal) objects, attributes and conditions, respectively, and (g, m, b) ∈ Y is read: object g has attribute m under condition b. A triadic concept (shortly triconcept) of a triadic context (K1 , K2 , K3 , Y ) is defined as a triple (A1 , A2 , A3 ) with Ai ⊆ Ki , i ∈ {1, 2, 3} that is maximal with respect to component-wise set inclusion. For a triconcept (A1 , A2 , A3 ), the components A1 , A2 and A3 are called the extent, the intent, and the modus of (A1 , A2 , A3 ), respectively. Small triadic contexts can be represented through three-dimensional cross tables (see Example 1). Pictorially, a triconcept is a rectangular box full of crosses in the three-dimensional cross table representation of (K1 , K2 , K3 , Y ), where this “box” is maximal under proper permutation of rows, columns and layers of the cross table. For {i, j, k} = {1, 2, 3} with j < k and for X ⊆ Ki and Z ⊆ Kj × Kk , the (−)(i) -derivation operators are defined by X 7→ X (i) := {(kj , kk ) ∈ Kj × Kk | (ki , kj , kk ) ∈ Y for all ki ∈ X}, (1) (i) Z 7→ Z := {ki ∈ Ki | (ki , kj , kk ) ∈ Y for all (kj , kk ) ∈ Z}. (2) These derivation operators correspond to the derivation operators of the dyadic contexts defined by K(i) := (Ki , Kj × Kk , Y (i) ) for {i, j, k} = {1, 2, 3}, where k1 Y (1) (k2 , k3 ) :⇐⇒ k2 Y (2) (k1 , k3 ) :⇐⇒ k3 Y (3) (k1 , k2 ) :⇐⇒ (k1 , k2 , k3 ) ∈ Y . Due to the structure of triadic contexts further derivation operators can be defined. For {i, j, k} = {1, 2, 3} and for Xi ⊆ Ki , Xj ⊆ Kj and Xk ⊆ Kk the (−)Xk -derivation operators are defined by Xi 7→ XiXk := {kj ∈ Kj | (ki , kj , kk ) ∈ Y for all (ki , kk ) ∈ Xi × Xk }, (3) Xj 7→ XjXk := {ki ∈ Ki | (ki , kj , kk ) ∈ Y for all (kj , kk ) ∈ Xj × Xk }. (4) Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 3 161 These derivation operators correspond to the derivation operators of the dyadic contexts defined by Kij ij ij Xk := (Ki , Kj , YXk ) where (ki , kj ) ∈ YXk if and only if (ki , kj , kk ) ∈ Y for all kk ∈ Xk . The structure on the set of all triconcepts T(K) is the set inclusion in each component of the triconcept. For each i ∈ {1, 2, 3} there is a quasiorder .i and its corresponding equivalence relation ∼i defined by (A1 , A2 , A3 ) .i (B1 , B2 , B3 ) :⇐⇒ Ai ⊆ Bi and (A1 , A2 , A3 ) ∼i (B1 , B2 , B3 ) :⇐⇒ Ai = Bi (i = 1, 2, 3). The triconcepts ordered in this way form complete trilattices, the triadic coun- terparts of concept lattices, as proved in the Basic Theorem of Triadic Concept Analysis [8]. However, unlike the dyadic case, the extents, intents and modi, respectively, do not form a closure system in general. Example 1. The triadic context displayed below consists of the object set K1 = {1, 2, 3}, the attribute set K2 = {a, b, c} and the condition set K3 = {A, B}. The context has 12 triconcepts which are displayed in the same figure on the right. For example, the first concept means that object 1 has attributes a and b under No. Extent Intent Modus No. Extent Intent Modus A B 1 {1} {a, b} {K3 } 7 {3} {K2 } {B} a b c a b c 2 {K1 } {b} {A} 8 {K1 } {a} {B} 1 ×× ×× 3 {2, 3} {b, c} {A} 9 {2, 3} {c} {K3 } 2 ×× × × 4 {∅} {K2 } {K3 } 10 {3} {b, c} {K3 } 3 ×× ××× 5 {1, 3} {a, b} {B} 11 {K1 } {K2 } {∅} 6 {2, 3} {a, c} {B} 12 {K1 } {∅} {K3 } Fig. 1. 
Triadic context and the associated triconcepts all conditions from K3 . However, as two components of a triconcept are necessary to determine the third one, {a, b} is also an intent of another triconcept, namely of the fifth one. 2.2 Formal Fuzzy Concept Analysis A complete residuated lattice L := (L, ∧, ∨, ⊗, →, 0, 1) is an algebra such that: (1) (L, ∧, ∨, 0, 1) is a complete lattice, (2) (L, ⊗, 1) is a commutative monoid, (3) 0 is the least and 1 the greatest element, (4) the adjointness property holds for all a, b, c ∈ L, i.e., a ⊗ b ≤ c ⇔ a ≤ b → c. Then, ⊗ is called mul- tiplication, → residuum and (⊗, →) adjoint couple. Each of the following adjoint couples make L a complete residuated lattice: 162 4 Fuzzy-Valued Cynthia Triadic Implications Vera Glodeanu Lukasiewicz: a ⊗ b := max(0, a + b − 1) with a → b := min(1, 1 − a + b) 1, a ≤ b Gödel: a ⊗ b := min(a, b) with a → b := b, a b 1, a ≤ b Product: a ⊗ b := ab with a → b := b/a, a b The hedge operator is defined as a unary function ∗ : L → L which satisfies the following properties: (1) 1∗ = 1, (2) a∗ ≤ a, (3) (a → b)∗ ≤ a∗ → b∗ , and (4) a∗∗ = a∗ . Typical examples are the identity, i.e., for all a ∈ L it holds that a∗ = a, and the globalization, i.e., a∗ = 0 for all a ∈ L \ {1} and a∗ = 1 if and only if a = 1. A triple (G, M, I) is called a formal fuzzy context if I : G × M → L is a fuzzy relation between the sets G and M and L is the support set of some residuated lattice. Elements from G and M are called objects and attributes, respectively. The fuzzy relation I assigns to each g ∈ G and each m ∈ M a truth degree I(g, m) ∈ L to which the object g has the attribute m. For fuzzy sets A ∈ LG and B ∈ LM the derivation operators are defined by ^ ^ Ap (m) := (A(g)∗ → I(g, m)), B p (g) := (B(m) → I(g, m)), (5) g∈G m∈M for g ∈ G and m ∈ M . Then, Ap (m) is the truth degree of the statement “m is shared by all objects from A” and B p (g) is the truth degree of “g has all attributes from B”. For now, we take for ∗ the identity. It plays an important role in the computation of the stem base, as we will see later. A fuzzy concept is a tuple (A, B) ∈ LG × LM such that Ap = B and p B = A. Then, A is called the (fuzzy) extent and B the (fuzzy) intent of (A, B). Fuzzy concepts represent maximal rectangles with truth values different from zero in the fuzzy context. The fuzzy concepts ordered by the fuzzy set inclusion form fuzzy concept lattices [9, 10]. Taking in (5) for ∗ hedges different from the identity, we obtain the so-called fuzzy concept lattices with hedges [11]. Example 2. The fuzzy context displayed below has the object set G = {x, y, z}, the attribute set M = {a, b, c, d} and the set of truth values is the 3-element chain L = {0, 0.5, 1}. Using the Gödel logic and the derivation operators defined in Equation 5 with the hedge ∗ being the iden- a b c d tity we obtain 10 fuzzy concepts. For example x 1 1 0.5 0 ({1, 0.5, 0}, {1, 1, 0, 0}) is a fuzzy concept. The extent y 1 0.5 0 1 contains the truth values of each object belonging to z 1 0 0 0.5 the extent, i.e., in this case x belongs fully to the set, y belongs to it with a truth value 0.5 and z does not belong to the extent. Similar affirmations can be done for the intent. Using the Lukasiewicz logic in the same setting we obtain 13 fuzzy concepts. On this set of truth values the only possible hedge operators are the identity and globalization. 
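Before turning to the role of the hedges, the fuzzy concept ({1, 0.5, 0}, {1, 1, 0, 0}) claimed in Example 2 can be re-checked mechanically with the derivation operators (5), the Gödel residuum and the identity hedge. The code and its names are ours; only the data and the expected result come from the example.

# Verify that ({1, 0.5, 0}, {1, 1, 0, 0}) is a fuzzy concept of the context
# of Example 2 under Goedel logic with the identity hedge.  Illustrative sketch.
G, M = ["x", "y", "z"], ["a", "b", "c", "d"]
I = {("x", "a"): 1, ("x", "b"): 1,   ("x", "c"): 0.5, ("x", "d"): 0,
     ("y", "a"): 1, ("y", "b"): 0.5, ("y", "c"): 0,   ("y", "d"): 1,
     ("z", "a"): 1, ("z", "b"): 0,   ("z", "c"): 0,   ("z", "d"): 0.5}

def impl(a, b):                     # Goedel residuum: a -> b
    return 1 if a <= b else b

def up(A):                          # first operator of (5), identity hedge
    return {m: min(impl(A[g], I[(g, m)]) for g in G) for m in M}

def down(B):                        # second operator of (5)
    return {g: min(impl(B[m], I[(g, m)]) for m in M) for g in G}

A = {"x": 1, "y": 0.5, "z": 0}
B = {"a": 1, "b": 1, "c": 0, "d": 0}
print(up(A) == B, down(B) == A)     # expected: True True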
As one of the major roles of the hedge operators is to control the size of the fuzzy concept lattice, the Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 5 163 number of fuzzy concepts will be smaller, when using in (5) a hedge different from the identity. In our example, using the globalization operator as the hedge, we obtain 6 fuzzy concepts both with the Gödel and Lukasiewicz logic. As we will see immediately, the hedges play also an important role for the attribute implications, especially for the stem base. Fuzzy implications were studied in a series of papers by R. Belohlavek and V. Vychodil, as for example in [12, 13]. For fuzzy sets A, B ∈VLX the subsethood degree of A being a subset of B is given by tv(A ⊆ B) = x∈X (A(x) → B(x)). Let A and B be fuzzy attribute sets, then the truth value of the implication A → B is given by tv(A → B) := tv(∀g ∈ G((∀m ∈ A, (g, m) ∈ I) → (∀n ∈ B, (g, n) ∈ I))) ^ ^ ^ = ( (A(m) → I(g, m)) → (B(n) → I(g, n))) g∈G m∈M n∈M pp = tv(B ⊆ A ). Example 3. Let us go back to our fuzzy context from Example 2. Consider the Gödel logic and the derivation operators from (5) with ∗ being the identity. Then, b(1)pp = {1, 0.5, 0}p = {1, 1, 0, 0}. Now, tv(b(1) → a(1)) = tv({a(1)} ⊆ b(1)pp ) = 1 and tv(b(1) → {a(1), c(0.5)}) = 0 because c(0.5) ∈ / b(1)pp . On the other hand, considering in (5) the globalization as the hedge, we obtain b(1)pp = {1, 0.5, 0}p = {1, 1, 0.5, 0} and therefore tv(b(1) → {a(1), c(0.5)}) = 1. Yet another example is tv(b(0.5) → b(1)) = tv({b(1)} ⊆ b(0.5)pp ) = tv({b(1)} ⊆ {1, 0.5, 0, 0}) = 0.5. Due to the large number of implications in a fuzzy and even in a crisp formal context, one is intrested in the stem base of the implications. The stem base is a set of implications which is non-redundant and complete. The existence and construction of the stem base for the discrete case was studied in [14], see also [6]. The problem for the fuzzy case was studied in [13]. There, the authors showed that using in (5) the globalization, the stem base of a fuzzy context is uniquely determined. Using hedges different from the globalization, a fuzzy context may have more than one stem base. 3 Fuzzy-Valued Triadic Concept Analysis Now, we are ready to develop our fuzzy-valued triadicsetting. We will define fuzzy-valued triadic contexts, concepts and derivation operators. For a triadic context K = (K1 , K2 , K3 , Y ) a dyadic-cut (shortly d-cut) is defined as ciα := (Kj , Kk , Yαjk ), where {i, j, k} = {1, 2, 3} and α ∈ Ki . A d-cut is actually a special case of Kij ij Xk = (Ki , Kj , YXk ) for Xk ⊆ Kk and |Xk | = 1. Each d-cut is itself a dyadic context. Definition 1. A fuzzy-valued triadic context ( f-valued triadic context) is a quadruple K := (K1 , K2 , K3 , Y ), where Y is a ternary fuzzy relation between the sets Ki with i ∈ {1, 2, 3}, i.e., Y : K1 ×K2 ×K3 → L and L is the support set 164 6 Fuzzy-Valued Cynthia Triadic Implications Vera Glodeanu of some residuated lattice. The elements of K1 , K2 and K3 are called objects, attributes and conditions, respectively. To every triple (k1 , k2 , k3 ) ∈ K1 × K2 × K3 , Y assigns a truth value tvk3 (k1 , k2 ) to which object k1 has attribute k2 under condition k3 . The f-valued triadic context can be represented as a three-dimensional table, the entries of which are fuzzy values (see Example 4). 
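Concretely, an f-valued triadic context is just a finite map from object–attribute–condition triples to truth degrees, and a condition d-cut fixes one condition and returns a dyadic fuzzy context, to which the derivation operators of Section 2.2 then apply. A minimal sketch over made-up toy data (all names are ours):

# An f-valued triadic context Y : K1 x K2 x K3 -> L stored as a dictionary,
# and the d-cut obtained by fixing one condition.  Toy data, illustration only.
K1, K2, K3 = ["g1", "g2"], ["m1", "m2"], ["c1", "c2"]
Y = {("g1", "m1", "c1"): 1,   ("g1", "m2", "c1"): 0.5,
     ("g2", "m1", "c1"): 0.5, ("g2", "m2", "c1"): 0,
     ("g1", "m1", "c2"): 1,   ("g1", "m2", "c2"): 1,
     ("g2", "m1", "c2"): 0,   ("g2", "m2", "c2"): 0.5}

def d_cut(condition):
    # The dyadic fuzzy context (K1, K2, Y^12_condition); a fuzzy context as noted above.
    return {(g, m): Y[(g, m, condition)] for g in K1 for m in K2}

print(d_cut("c1"))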
In K one can interchange the roles played by the sets K1 , K2 and K3 requiring, for example, that Y assigns to every triple (k2 , k3 , k1 ) a truth value tvk1 (k2 , k3 ) to which attribute k2 exists under condition k3 having object k1 . Definition 2. A fuzzy-valued triadic concept (shortly f-valued tricon- cept) of an f-valued triadic context (K1 , K2 , K3 , Y ) is a triple (A1 , A2 , A3 ) with A1 ⊆ LK1 , A2 ⊆ LK2 and A3 ⊆ K3 that is maximal with respect to component- wise set inclusion. The components A1 , A2 and A3 are called (f-valued) extent, (f-valued) intent, and the modus of (A1 , A2 , A3 ), respectively. We denote by T(K) the set of all f-valued triconcepts. This definition immediately implies that the d-cut (K1 , K2 , Yk12 3 ) is a fuzzy context for every k3 ∈ K3 . Example 4. We consider an f-valued triadic context with values from the 3- element chain {0, 0.5, 1}. The object set K1 = {1, 2, 3, 4, 5} contains 5 groups of students, the attribute set K2 = {f, s, v} contains 3 feelings, namely, fevered (f), serious (s), vigilant (v) and the condition set K3 = {E, P, F } contains the events: Doing an exam (E), giving a presentation (P) and meeting friends (F). Using the Lukasiewicz logic, we obtain 30 f-valued triconcepts and with the Gödel E P F f s v f s v f s v 1 1 1 1 1 0.5 0.5 0 0.5 1 2 1 0.5 1 0.5 0 0 0 0 0.5 3 0.5 0.5 0.5 0.5 0.5 0 0 0 0.5 4 0.5 0 0.5 0.5 0.5 0.5 0 0.5 0.5 5 1 1 1 1 0.5 0.5 0 0.5 1 Fig. 2. F-valued triadic context logic 34. For example, ({1, 1, 0.5, 0, 0}, {1, 0.5, 1}, {E}) is an f-valued triconcept meaning that while doing an exam the first two student groups and half of the third one are fevered, vigilant and moderately serious. Another example is ({1, 1, 1, 1, 1}, {0.5, 0, 0}, {E, P }) meaning that all students are moderately fevered while giving a presentation. Yet another example is ({1, 0, 0, 0.5, 1}, {1, 0, 0.5}, {E, P }) signifying that the first, the last and half of the 4-th group of students are fevered and moderately vigilant while doing an exam and giving a presentation. Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 7 165 Lemma 1. Every f-valued triadic context is isomorphic to a triadic context. Proof. According to [9], every fuzzy context is isomorphic to a formal context, namely to its double-scaled context. Every condition d-cut is a fuzzy context. By double-scaling each condition d-cut we obtain the corresponding double-scaled e for an f-valued triadic context K. triadic context K Formally, suppose that K e := (K + , K + , K3 , Ye ) is the double-scaled triadic 1 2 context of K := (K1 , K2 , K3 , Y ), the construction of which is given below. We have to show that the considered isomorphism is given by e with ϕ(A1 , A2 , A3 ) := (A+ , A+ , A3 ) ϕ : T(K) → T(K) 1 2 and that the inverse map is given by e → T(K) with ψ(A1 , A2 , A3 ) := (A♦ , A♦ , A3 ). ψ : T(K) 1 2 Therefore, we have to prove the following statements: For all f-valued triconcepts e : (A1 , A2 , A3 ), (B1 , B2 , B3 ) ∈ T(K) and for all (X1 , X2 , X3 ) ∈ T(K) e ϕ(A1 , A2 , A3 ) ∈ T(K), ψ(X1 , X2 , X3 ) ∈ T(K), ψϕ(A1 , A2 , A3 ) = (A1 , A2 , A3 ), ϕψ(X1 , X2 , X3 ) = (X1 , X2 , X3 ), (A1 , A2 , A3 ) .i (B1 , B2 , B3 ) ⇔ ϕ(A1 , A2 , A3 ) .i ϕ(B1 , B2 , B3 ), for all i ∈ {1, 2, 3}. These statements can be proven by basic properties of fuzzy sets and triadic derivation operators. Due to limitation of space, we skip the proof. 
t u We present the construction of K e := (K + , K + , K3 , Ye ), the double-scaled 1 2 triadic context, for a given f-valued triadic context K = (K1 , K2 , K3 , Y ). Let Xi ⊆ LKi with i ∈ {1, 2} and let L be the support set of some residuated lattice. We define Xi+ := {(ki , µ) | ki ∈ Ki , µ ∈ L, µ ≤ Xi (ki )} ⊆ Ki∗ := Ki × L, _ Xi♦ := {µ | (ki , µ) ∈ Xi } ⊆ LKi . Then, Ye ⊆ K1+ × K2+ × K3 and ((k1 , µ), (k2 , λ), k3 ) ∈ Ye :⇐⇒ µ ⊗ λ ≤ tvk3 (k1 , k2 ) ⇐⇒ µ ⊗ λ ≤ Y (k1 , k2 , k3 ). According to the above lemma, the f-valued triadic contexts fulfill all the properties the triadic contexts have. The f-valued triconcepts ordered by the (fuzzy) set inclusion form a complete fuzzy trilattice. Due to limitation of space we omit the proofs. For our f-valued setting we want to obtain the corresponding (−)Ak and (−)(i) derivation operators. However, these can be defined in various ways. We 166 8 Fuzzy-Valued Cynthia Triadic Implications Vera Glodeanu distinguish between more cases for the (−)(i) -derivation operators. In case of the (−)(i) -derivation operators with Z = Xj × X3 ⊆ LKj × K3 and Xi ⊆ LKi for {i, j} = {1, 2} the situation is easy. They are defined as ^ Z 7→ Z (i) := {Xj (kj ) → Y (i) (ki , (kj , k3 )) | ∀k3 ∈ K3 }, kj ∈Kj (i) Xi 7→ Xi := (Tl3 , {k3 ∈ K3 | Tk3 ⊆ Tl3 }) for l3 ∈ K3 , V where Tl3 := ki ∈Ki (Xi (ki ) → Y (i) (ki , (kj , l3 )) with the derivation operators from the fuzzy dyadic context K(i) := (Ki , Kj ×K3 , Y (i) ) and Y (i) (ki , (kj , k3 )) := Y (ki , kj , k3 ). The (−)(3) -derivation operator for Z := X1 × X2 ⊆ LK1 × LK2 and X3 ⊆ K3 is defined by Z 7→ Z (3) : = {k3 ∈ K3 | k1 ⊗ k2 ≤ tvk3 (k1 , k2 ), ∀(k1 , k2 ) ∈ Z} (6) ^ = (Z(k1 , k2 ) → Y (3) ((k1 , k2 ), k3 ))∗ , (7) (k1 ,k2 )∈K1 ×K2 where Z(k1 , k2 ) := X1 (k1 ) ⊗ X2 (k2 ), ∗ is the globalization in order to assure that Z (3) is crisp and we have the dyadic fuzzy context K(3) := (K1 × K2 , K3 , Y (3) ) with Y (3) ((k1 , k2 ), k3 ) := Y (k1 , k2 , k3 ). We search for the conditions which con- tain the maximal rectangle generated by Z. (3) The situation for X3 is quite tricky. Applying the derivation operators in K for X3 , we get a truth value l ∈ L such that l = k1 ⊗ k2 instead of a tuple (3) (k1 , k2 ). To obtain such a tuple, we first have to compute the double-scaled con- text K.e Afterwards, we use the crisp (−)(3) -derivation operator in K e to find the components of the triconcept. Finally, we transform these into fuzzy sets as de- scribed in the construction of K. e This way, we obtain the tuples ((k1 , µ), (k2 , ν)) consisting of objects and attributes with their truth values instead of the truth value k1 ⊗ k2 . For other approaches of fuzzy triadic data the derivation operators given in (7) and the above construction suffice for any (−)(i) derivation operator. Proposition 1. The (−)(i) -derivation operators with i ∈ {1, 2, 3} yield f-valued triconcepts. (1) Proof. Suppose X1 ⊆ LK1 , X2 ⊆ LK2 and X3 ⊆ K3 . We have X1 = (Tl3 , {k3 | Tk3 ⊆ Tl3 }), where ^ Tl3 = (X1 (k1 ) → Y (1) (k1 , (k2 , l3 ))) k1 ∈K1 ^ = (X1 (k1 ) → Yl12 3 (k1 , k2 )). k1 ∈K1 Since K12l3 is a dyadic fuzzy context, (X1 , Tl3 ) =: (X1 , A2 ) is a fuzzy preconcept in Kl3 , i.e., X1p ⊆ A2 and Ap2 ⊆ X1 with the derivation operators of K12 12 l3 given Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 9 167 by Equation (5). In particular we have X1p ⊆ A2 . For any k3 ∈ K3 if Tk3 ⊆ Tl3 , then (X1 , A2 ) is a fuzzy preconcept also in K12 l3 ∪k3 . Proceeding alike, we obtain the largest set A3 ⊆ K3 containing l3 such that TA3 ⊆ Tl3 . 
Then, (X1 , A2 ) is a fuzzy preconcept in K12 A3 . So far, we obtained the last two components of the f-valued triconcept and apply on them the (−)(1) -derivation operator to obtain the first one. Now, we have ^ (A2 × A3 )(1) = {A2 (k2 ) → Y 1 (k1 , (k2 , k3 )) | ∀k3 ∈ A3 } k2 ∈K2 ^ = (A2 (k2 ) → YA123 (k1 , k2 )), k2 ∈K2 which is A2 derivated in K12 A3 , i.e., the fist component of the triconcept, namely A1 . Since (A1 , A2 ) is a fuzzy concept, it is a maximal rectangle and A3 is the largest set containing this maximal rectangle. We still have to check the other pair of derivation operators. Let X3 ⊆ K3 , (3) then the maximality of X3 = (A1 , A2 ) is automatically satisfied, as we obtain (3) X3 from the double scaled context. The maximality of (A1 × A2 )(3) follows analogously to the first case. tu As a direct consequence of this proposition, we have the following statement: Proposition 2. For an f-valued triconcept (A1 , A2 , A3 ) it holds that Ai = (Aj × Ak )(i) for {i, j, k} = {1, 2, 3} with j < k. t u For the (−)Ak -derivation operators we also distinguish between two cases, namely when Ak is a crisp set and when it is fuzzy. When Ak is crisp, i.e., Ak := A3 we proceed as follows: For Xi ⊆ LKi with i ∈ {1, 2} and A3 ⊆ K3 we define ^ X1 7→ X1A3 := (X1 (k1 )• → YA123 (k1 , k2 )), (8) k1 ∈K1 ^ X2 7→ X2A3 := (X2 (k2 ) → YA123 (k1 , k2 )) (9) k2 ∈K2 for the dyadic fuzzy context K12 12 A3 := (K1 , K2 , YA3 ). where YA123 : K1 × K2 × A3 → L, ^ YA123 (k1 , k2 ) := {tvk3 (k1 , k2 ) | ∀(k1 , k2 , k3 ) ∈ K1 × K2 × A3 }. These derivation operators are the fuzzy counterparts of the (−)Ak -derivation operators, because Ak is crisp. In the discrete case we have (ki , kj ) ∈ YAi,jk if and only if for all kk ∈ Ak it holds that (ki , kj , kk ) ∈ Y . Therefore, in the fuzzy setting for YAij3 (ki , kj ), we take the minimum of the values tvk3 (ki , kj ). Since K12 A3 is a fuzzy context, the (−)A3 -derivation operators form fuzzy Galois connections. 168 10 Cynthia Fuzzy-Valued Triadic Implications Vera Glodeanu In (8) we will need the hedge • for the computation of the unique stem base, however in general we take the identity for this hedge. For the (−)Aj -derivation operators with {i, j} = {1, 2} the situation is dif- ferent, because Aj is a fuzzy set. In the following we discuss more possibilities to obtain these derivation operators. In such cases we are interested in the relation between Ki and K3 for the values of Aj . This means that we are interested in just a part of the double-scaled context K,e namely in K e A := V + e j aj ∈Aj (Ki , K3 , aj , Y ). So, we could use discrete derivation operators to compute the concepts of K eA j and afterwards transform them into fuzzy concepts. However, this is a laborious task and was presented just for a better understanding of the problem. Another approach for the (−)Aj -derivation operators is the following: A Xi 7→ Xi j := {k3 ∈ K3 | ki ⊗ kj ≤ tvk3 (ki , kj ), ∀(ki , kj ) ∈ Xi × Aj }, A _ X3 7→ X3 j := {ki ∈ LKi | ki ⊗ kj ≤ tvk3 (ki , kj ), ∀(k3 , kj ) ∈ X3 × Aj }. In this case we do not need to double-scale the context. We compute the fuzzy concept induced by Xi and Aj and check under which conditions it exists. This A way we obtain Xi j , i.e., the third component of the f-valued triconcept that is A induced by Xi and Aj . To obtain X3 j we consider each ki ∈ LKi and check whether the maximal rectangle ki ⊗ Aj exists under the fixed conditions of X3 . Afterwards, we take the maximum of these ki ’s due to the maximality property of f-valued triconcepts. 
This approach is laborious, especially the computation A of X3 j due to the large number of ki ’s we have to check. We will consider a more straight-forward approach by computing the fuzzy context induced by Aj . A similar approach was presented in [1]. For Xi ∈ LKi , Aj ∈ LKj with {i, j} = {1, 2} and A3 ⊆ K3 we have A ^ Xi 7→ Xi j := (Xi (ki )• → YAi3j (ki , k3 ))∗ , (10) ki ∈Ki A ^ X3 7→ X3 j := (Xj (kj ) → YAi3j (ki , k3 )), (11) kj ∈Kj where • and ∗ are hedge operators. The • operator is optional, as it is needed just for the computation of the stem base. It is the identity if i = 1. The ∗ A hedge is always a compulsory globalization in order to assure that Xi j yields a crisp set. Then, (10) and (11) are the V derivation operators of the fuzzy context (Ki , K3 , YAi3j ) where YAi3j (ki , k3 ) := kj ∈Kj (Aj (kj ) → Y (ki , kj , kk )). Considering in (10) and (11) all values for the indices, i.e., instead of (−)Aj we take (−)Ak for {i, j, k} = {1, 2, 3}, and ignoring ∗ , these derivation operators suffice for other approaches to Fuzzy Triadic Concept Analysis. This happens due to the fact that such derivation operators yield triconcepts in which all three components are fuzzy sets. Proposition 3. For {i, j, k} = {1, 2, 3} there are (fuzzy) sets Xi ∈ LKi (Xi ∈ Ki , if i = 3) and Xk ∈ LKk (Xk ∈ Kk , if k = 3) such that Aj := XiXk , Ai := Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 11 169 AXj k and Ak := (Ai × Aj )(k) (if i < j) or Ak := (Aj × Ai )(k) (if i > j). Then, (A1 , A2 , A3 ) is an f-valued triconcept denoted by bik (Xi , Xk ) having the smallest k-th component under all f-valued triconcepts (B1 , B2 , B3 ) with the largest j- th component satisfying Xi ⊆ Bi and Xk ⊆ Bk . Particularly, bik (Ai , Ak ) = (A1 , A2 , A3 ) for each f-valued triconcept (A1 , A2 , A3 ) of K. Proof. Without loss of generality we can assume (i, j, k) = (1, 2, 3). Obviously, X1 ⊆ A1 and X3 ⊆ A3 . We start by proving that (A1 , A2 , A3 ) is indeed an f-valued triconcept. From Proposition 1 we have A3 = (A1 × A2 ). Then, (A ×A )(3) A2 ⊆ A1 1 2 = AA 1 3 ⊆ X1X3 = A2 . Hence, A2 = AA 1 3 = (A1 × A3 )(2) , (1) similarly A1 = (A2 × A3 ) and together with Proposition 2 they yield an f-valued triconcept. The rest of the proof is analogous to the crisp case. Let (B1 , B2 , B3 ) ∈ T(K) with X1 ⊆ B1 and X3 ⊆ B3 . Then, B2 ⊆ A2 , because B2 = (B1 ×B3 )(2) = B1B3 ⊆ X1X3 = A2 . If B2 = A2 , by similar consideration as before, we obtain B1 ⊆ A1 . Therefore, we have A3 = (A1 × A2 )(3) ⊆ (B1 × B2 )(3) = B3 , finishing the first part of the proof. Now, if (A1 , A2 , A3 ) is an f-valued tricon- cept, then AA 1 = (A1 × A3 ) 3 (2) = A2 and AA2 = (A2 × A3 ) 3 (1) = A1 . Therefore, bik (A1 , A3 ) = (A1 , A2 , A3 ) follows by the first part of the proposition. t u 4 F-valued Implications In this section we will study f-valued implications, as generalisations of those elaborated for the discrete case in [4]. There, the authors presented various triadic implications, which are stronger than the ones developed in [15]. For a given discrete triadic context K = (K1 , K2 , K3 , Y ) and for R, S ⊆ K2 and C C ⊆ K3 the expression R → S was called conditional attribute implication. For C R, S ⊆ K3 and C ⊆ K2 the expression R → S was called attributional condition implication. Implications of the form R → S with R, S ⊆ K2 × K3 were called attribute×condition implications. Our main aim in the upcoming subsections is to generalise such implications to our setting. 4.1 F-valued Conditional Attribute vs. 
Attributional Condition Implications In this subsection we study implications of the form: If we are moderately vigilant during an exam, then we are also fevered and If we are serious during an exam, then we feel the same during our presentation. Definition 3. For R, S ⊆ LK2 , C ⊆ K3 and globalization • we call the expres- C sion R → S f-valued conditional attribute implication and its truth value is given by C R → S := tv(∀g ∈ K1 ((∀m ∈ R, (g, m) × C ∈ Y )• → (∀n ∈ S, (g, n) × C ∈ Y ))) ^ ^ ^ = ( (R(m) → YC12 (g, m)) → (S(n) → YC12 (g, n))) g∈K1 m∈K2 n∈K2 CC = tv(S ⊆ R ). 170 12 Cynthia Fuzzy-Valued Triadic Implications Vera Glodeanu Note that these implications are ordinary fuzzy implications since we are working in the fuzzy context K12 C. Example 5. For the context given in Figure 2 we have, for example, the f-valued E P conditional attribute implication s(0.5) → f (1) = s(0.5) → f (1) = 0.5 and yet F another is s(0.5) → f (1) = 0. The first implication means that whenever the students are partially serious during an exam then they are also fevered. The same holds for this implication during a presentation given by the students. The implication does not hold when they are meeting their friends. In such situations the students can be serious but have a relaxed attitude. For an f-valued triadic context K we denote by Imp(K2 ) := {R → S | R, S ∈ LK2 } the set of all fuzzy implications on K2 . We construct the dyadic context Cimp (K) := (Imp(K2 ), K3 , I) c where Imp(K2 ) is a fuzzy set, K3 is a crisp set and I(R → S, c) := R → S. In order to keep the condition set crisp, we use in Cimp (K) a slightly different version of the dyadic fuzzy derivation operators defined in (5), namely ^ ^ Ap (m) := (A(g)∗ → I(g, m)), B p (g) := ( (B(m) → I(g, m)))• g∈Imp(K2 ) m∈K3 for A ∈ Imp(K2 ), B ∈ K3 and ∗ is the globalization. Then, (A, B) ∈ B(Cimp (K)) contains in its extent all the implications that hold under all conditions of B. As in the crisp case, each extent is an implicational theory and hence, every extent has a stem base. In the concept lattice of Cimp (K) the implicational theories are hierarchically ordered by the conditions under which they hold. The extent A is the set of all implications that hold in (K1 , K2 , Yc12 ) with c ∈ C. The number of fuzzy implications can be very large, since we have all impli- cations A → B with A, B ⊆ LK2 . In the crisp case an implication either holds or not, whereas in the fuzzy case an implication holds V with a given truth value, i.e., with tv(A → V B). We have tv(a → b, c) = {tv(a → b), tv(a → c)} and tv(a, b → c) = {tv(a → c), tv(b → c)} for all a, b, c ∈ LK2 . Hence, for the structure of Cimp (K) it is enough to compute implications of the form a → b and a(µ) → a(ν) for all a, b ∈ LK2 with b 6= a and µ, ν ∈ L with µ ν. As discussed before, the other implications are infimum reducible elements in the lattice. In accordance with the idea presented in [4] we label the concept lattice of Cimp (K) as follows: The attribute labelling is done in the usual way. For the object labelling the situation is more cumbersome. Each set of implications from Imp(K2 ) is an extent of Cimp (K) and an implicational theory, as discussed above. The object labels shall be distributed such that every extent is generated as an implicational theory by the labels attached to it and to its subconcepts. Therefore, the bottom element of the lattice will contain the stem base of all f-valued conditional attribute implications. 
Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 13 171 18 v(0.5) f (0.5) 2 17 s(1) 15 16 Friends Exam Presentation v(1) 13 f (1) 4 3 5 14 12 s(0.5) 6 7 11 10 v(1) → f (1) E, F → P 8 9 Fig. 3. Conditional attribute vs. attributional condition implications On the left part in Figure 3 the lattice of Cimp (K) is displayed. For better legibility we used just the attribute labels (the conditions) and one object label (conditional attribute implication). The implication v(1) → f (1) from the lattice means that whenever the students are vigilant in degree (truth value) 1 during an exam and presentation they are also fevered in degree 1 in these situations. C An implication C → D between the intents of Cimp (K) means that if R → S D holds, then R → S must hold as well. For our example the stem base of Cimp (K) is P, F → E. We could perform a condition attribute exploration as proposed in [4] for the discrete case, however this would go beyond the scope of this paper. In a triadic context we may arbitrarily interchange the roles of objects, at- tributes and conditions. Therefore, a triadic context has a sixfold symmetry. By interchanging attributes with conditions in Definition 3, we obtain the attribu- tional condition implications defined as follows: Definition 4. For R, S ⊆ K3 and M ⊆ LK2 the expression M R → S := tv(∀g ∈ K1 ((∀a ∈ R, g × M × a ∈ Y ∗ ) → (∀b ∈ S, g × M × b ∈ Y ∗ ))) ^ ^ ^ = ( (R(a) → YM 13 (g, a))∗ → (S(b) → YM13 (g, a))∗ ), g∈K1 a∈K3 b∈K3 is called f-valued attributional condition implication, where ∗ is the glob- alization. M We use the globalization hedge operator because this time R → S is a crisp implication. For example, for the f-valued triadic context from Table 2 we have v(1) the attributional condition implication P → E, F = 1, meaning that students who are vigilant during a presentation are also vigilant during an exam and while f (1) meeting friends. On the other hand, P → E, F = 0 means that a student being fevered during a presentation does not imply that he/she is fevered during an exam and while spending time with friends. 172 14 Cynthia Fuzzy-Valued Triadic Implications Vera Glodeanu In analogy to the conditional attribute implications, we can also build the e := (Imp(K3 ), K2 ×L, I) for the attributional condition implica- context Cimp (K) tions. This time we have Imp(K3 ) := {R → S | R, S ∈ K3 }, i.e., all implications m e consist of all impli- on K3 and I(R → S, m) := R → S. The extents of Cimp (K) 13 cations that hold in (K1 , K3 , Ym ) with m ∈ K2 . The concept lattice is displayed on the right in Figure 3. For example the implication E, F → P means that if the students during an exam and while meeting friends are (partially) fevered and (partially) serious, then they have the same feelings during their presentation. The connection between the two classes of implications is an open question even for the discrete case and it remains open for the f-valued triadic case as well. 4.2 F-valued Attribute×Condition Implications As presented for the discrete case, the two classes of implications studied so far are not powerful enough to express all possible kinds of implications in a triadic context. Therefore, we will generalise the so-called attribute×condition implications to our setting. These express implications of the form If we are serious during our presentation, then we are moderately fevered during the exam. Definition 5. 
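The two truth values claimed in Example 5 can be recomputed directly from Definition 3. The sketch below assumes the Gödel residuum together with the globalization hedge •; the code and its names are ours, while the data are those of Figure 2.

# Truth value of the f-valued conditional attribute implication (Definition 3)
# on the context of Figure 2, for the single conditions E and F.
K1 = [1, 2, 3, 4, 5]
K2 = ["f", "s", "v"]
Y12 = {                    # Y^12_C for C = {E} and C = {F}
  "E": {1: {"f": 1, "s": 1, "v": 1},     2: {"f": 1, "s": 0.5, "v": 1},
        3: {"f": 0.5, "s": 0.5, "v": 0.5}, 4: {"f": 0.5, "s": 0, "v": 0.5},
        5: {"f": 1, "s": 1, "v": 1}},
  "F": {1: {"f": 0, "s": 0.5, "v": 1},   2: {"f": 0, "s": 0, "v": 0.5},
        3: {"f": 0, "s": 0, "v": 0.5},   4: {"f": 0, "s": 0.5, "v": 0.5},
        5: {"f": 0, "s": 0.5, "v": 1}}}

def impl(a, b):                       # Goedel residuum
    return 1 if a <= b else b

def glob(a):                          # globalization hedge
    return 1 if a == 1 else 0

def tv(R, S, condition):              # truth value of R -condition-> S
    table = Y12[condition]
    values = []
    for g in K1:
        premise = min(impl(R.get(m, 0), table[g][m]) for m in K2)
        conclusion = min(impl(S.get(m, 0), table[g][m]) for m in K2)
        values.append(impl(glob(premise), conclusion))
    return min(values)

print(tv({"s": 0.5}, {"f": 1}, "E"))  # expected: 0.5, as in Example 5
print(tv({"s": 0.5}, {"f": 1}, "F"))  # expected: 0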
For R, S ⊆ LK2 × K3 the expression R → S is an f-valued attribute×condition implication and its truth value is given by ^ ^ ^ ( (R(m, b) → Y (g, m, b))• → (S(n, c) → Y (g, n, c))), g∈K1 (m,b)∈K2 ×K3 (n,c)∈K2 ×K3 where • is the globalization, if we want to compute the unique stem base, other- wise the identity. These are the attribute implications of the fuzzy context (K1 , K2 ×K3 , Y (1) ). Their stem base is given by the stem base of the attribute implications from (K1 , K2 × K3 , Y (1) ). Obviously, such implications can be easily obtained by the f-valued condi- C tional attribute and attributional condition implications, i.e., if we have R → K2 S for R, S ⊆ L , C ⊆ K3 , then we can compute R × {c} → S × {c} for all c ∈ C. Going the other way around, namely transforming the f-valued attribute×condition implications into f-valued conditional attribute and attri- butional condition implications, is of course also possible. One could also be interested in f-valued object×attribute or object×condi- tion implications. For our example this would mean If the first group of students is fevered, then the second one is serious. 5 Conclusion and Further Research First, we presented a new framework for treating triadic fuzzy data. For this setting we generalised the notions of the (−)Ak and (−)(i) derivation operators, Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 15 173 triconcepts and trilattices. We also showed how our notions can be translated into different approaches to Fuzzy Triadic Concept Analysis studied by other authors. One of our main results is the generalisation of the (−)(i) derivation operator for the f-valued triadic and fuzzy triadic setting, since it is absent in other works dealing with fuzzy triadic data. Second, we generalised triadic implications to our f-valued setting. These are of major importance for the development of Fuzzy and Fuzzy-Valued Triadic Concept Analysis. Future research will focus on the connection between the different classes of f-valued triadic implications. As mentioned at the beginning, [5] is an extended version of this paper including the factorization problem. In the future we want to apply the f-valued triadic factorization to real world data. References 1. Belohlávek, R., Osicka, P.: Triadic concept analysis of data with fuzzy attributes. In Hu, X., Lin, T.Y., Raghavan, V.V., Grzymala-Busse, J.W., Liu, Q., Broder, A.Z., eds.: GrC, IEEE Computer Society (2010) 661–665 2. Osicka, P., Konecny, J.: General approach to triadic concept analysis 116-126. In Kryszkiewicz, M., Obiedkov, S.A., eds.: Proc. CLA 2010, University of Sevilla (2010) 116–126 3. Clara, N.: Hierarchies generated for data represented by fuzzy ternary relations. In: Proceedings of the 13th WSEAS international conference on Systems, Stevens Point, Wisconsin, USA, World Scientific and Engineering Academy and Society (WSEAS) (2009) 121–126 4. Ganter, B., Obiedkov, S.A.: Implications in triadic formal contexts. In: ICCS. (2004) 186–195 5. Glodeanu, C.: Fuzzy-valued triadic concept analysis and its applications. Technical Report MATH-AL-07-2011, Technische Universitat Dresden (September 2011) 6. Ganter, B., Wille, R.: Formale Begriffsanalyse: Mathematische Grundlagen. (1996) 7. Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In Ellis, G., Levinson, R., Rich, W., Sowa, J.F., eds.: ICCS. Volume 954 of Lecture Notes in Computer Science., Springer (1995) 32–43 8. Wille, R.: The basic theorem of triadic concept analysis. Order 12 (1995) 149–158 9. 
Pollandt, S.: Fuzzy Begriffe. Springer Verlag, Berlin Heidelberg New York (1997) 10. Belohlávek, R.: Fuzzy Relational Systems: Foundations and Principles. Volume 20 of IFSR Int. Series on Systems Science and Engineering. Kluwer Academic/Plenum Press (2002) 11. Belohlávek, R., Vychodil, V.: Fuzzy concept lattices constrained by hedges. JACIII 11(6) (2007) 536–545 12. Belohlávek, R., Vychodil, V.: Attribute implications in a fuzzy setting. In: ICFCA. (2006) 45–60 13. Belohlávek, R., Vychodil, V., Chlupová, M.: Implications from data with fuzzy attributes. In: AISTA 2004 in Cooperation with the IEEE Computer Society Proceedings. (2004) 14. Guigues, J.L., Duquenne, V.: Familles minimales d’implications informatives re- sultant d’un tableau de donnes binaires. Math. Sci. Humaines (95) (1986) 15. Biedermann, K.: A Foundation of the Theory of Trilattices. Shaker Verlag, Aachen (1988) Mining Biclusters of Similar Values with Triadic Concept Analysis Mehdi Kaytoue1 , Sergei O. Kuznetsov2 , Juraj Macko3 , Wagner Meira Jr.1 and Amedeo Napoli4 1 Universidade Fereral de Minas Gerais – Belo Horizonte – Brazil 2 HSE – Pokrovskiy Bd. 11 – 109028 Moscow – Russia 3 Palacky University – 17. listopadu – 77146 Olomouc – Czech Republic 4 INRIA/LORIA – Campus Scientifique, B.P. 239 – Vandœuvre-lès-Nancy – France kaytoue@dcc.ufmg.br, kuznetsovs@yandex.ru, juraj.macko@upol.cz, meira@dcc.ufmg.br, napoli@loria.fr Abstract. Biclustering numerical data became a popular data-mining task in the beginning of 2000’s, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So called biclusters of similar values can be thought as maximal sub-tables with close values. Only few methods address a complete, correct and non redundant enumeration of such patterns, which is a well-known in- tractable problem, while no formal framework exists. In this paper, we introduce important links between biclustering and formal concept anal- ysis. More specifically, we originally show that Triadic Concept Analysis (TCA), provides a nice mathematical framework for biclustering. Inter- estingly, existing algorithms of TCA, that usually apply on binary data, can be used (directly or with slight modifications) after a preprocessing step for extracting maximal biclusters of similar values. Keywords: Triadic concept analysis, numerical biclustering, scaling 1 Introduction Numerical data biclustering mainly appeared in the beginning of 2000’s as a first answer to new challenges raised by biological data analysis, and especially gene expression data analysis [13]. Starting from an object/attribute numerical data-table (e.g. Table 1), the goal is to group together some objects with some attributes according to the values taken by these attributes for these objects [13]. Accordingly, a bicluster is formally defined as a pair composed of a set of ob- jects and a set of attributes. Such pair can be represented as a rectangle in the numerical table, modulo lines and columns permutations. Table 1 is a numerical dataset with objects in lines and attributes in columns, while each table entry corresponds to the value taken by the attribute in column for the object in line. Table 2 illustrates bicluster ({g1 , g2 , g3 }, {m1 , m2 , m3 }) as a grey rectangle. 
There are several types of biclusters in the literature (see [13] for a survey), depending on the relation between the values taken by their attributes for their c 2011 by the paper authors. CLA 2011, pp. 175–190. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 176 Mehdi Kaytoue et al. objects. The most simple case can be understood as rectangles of equal val- ues: a bicluster corresponds to a set of objects whose values taken by a same set of attributes are exactly the same, e.g. ({g1 , g2 , g3 }, {m5 }). Constant bi- clusters only appear in idyllic situations: generally numerical data are noisy. Accordingly, a straightforward generalization of such biclusters lies in so called biclusters of similar values: they are represented by rectangles with almost iden- tical, say similar, values [13, 1, 7]. Table 2 illustrates a bicluster of similar values ({g1 , g2 , g3 }, {m1 , m2 , m3 }) where two values are said to be similar if their dif- ference is no more than 1. Moreover, this bicluster is maximal: neither an object nor an attribute can be added without violating the similarity condition. Only few methods address a complete, correct and non redundant enumer- ation of such patterns [1, 7], which is a well-known intractable problem [13], while no formal framework exists. In this paper, we show that Formal Concept Analysis (FCA) [3], and especially Triadic Concept Analysis (TCA) [12] pro- vides a suitable and well defined framework for this task: Basically, an object has an attribute under a condition (a value). After a simple scaling procedure (turning original data into binary), a bicluster is represented as a triadic con- cept, composed of a set of objects, a set of attributes (both characterizing the corresponding “rectangle”) and a set of values. All sets are maximal thanks to existing concept forming derivation operators of TCA. This comes with several advantages: – Two values w1 , w2 of the original data are said to be similar iff their difference does not exceed a given parameter θ. In this case, we write w1 'θ w2 ⇐⇒ |w1 − w2 | ≤ θ. Otherwise, we write w1 6'θ w2 . The trilattice produced with TCA after scaling gives all maximal biclusters of similar values for any θ ordered w.r.t. similarity of their values. – The well known notion of frequency takes a semantics w.r.t. similarity of values. For example, let (A, B, C) be a triconcept, where A is a set of objects, B a set of attributes, and C a set of similar values. Assume (A, B) to be the corresponding bicluster. The higher |C|, the more similar are the values of the bicluster. If all |A|, |B|, and |C| are high we obtain a bicluster represented as a large rectangle of close values. – Existing algorithms from TCA [4] and n-ary closed set mining [2] can be used directly after scaling. We also provide a new algorithm to compute biclusters maximal only for a given θ (see algorithm TriMax later on). – Both scaling procedure and algorithm TriMax computations can be directly distributed to several computing cores. – The method can be adapted to n-ary numerical datasets. For example, with n = 3, a n-cluster would be a maximal 3D-box of similar values. It can be applied to 3D gene expression data, monitoring the behaviour of genes in different samples over time. It follows that mining n-dimensional clusters can be achieved with n + 1-adic concept analysis. The paper is organized as follows. 
Firstly, preliminaries regarding TCA are presented in Section 2. Then Section 3 formally states the problem. It is followed by the description of our two methods, respectively in Section 4 and 5. The Mining bicluster of similar values with triadic concept analysis 177 first shows how TCA can help characterizing all maximal biclusters for any θ, while the second restricts the problem to a user-given θ. This is followed by experiments on the proposed approaches. Finally, the paper ends with a discussion and perspectives of further research. Table 1: A numerical dataset Table 2: A bicluster of similar values m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 g1 1 2 2 1 6 g1 1 2 2 1 6 g2 2 1 1 0 6 g2 2 1 1 0 6 g3 2 2 1 7 6 g3 2 2 1 7 6 g4 8 9 2 6 7 g4 8 9 2 6 7 2 Triadic Concept Analysis We assume that the reader is familiar with basic notions of Formal Concept Anal- ysis [3]. Lehmann and Wille introduced Triadic Concept Analysis (TCA [12]). Data are represented by a triadic context, given by (G, M, B, Y ). G, M , and B are respectively called sets of objects, attributes and conditions, and Y ⊆ G × M × B. The fact (g, m, b) ∈ Y is interpreted as the statement “Object g has the attribute m under condition b”. A (triadic) concept of (G, M, B, Y ) is a triple (A1 , A2 , A3 ) with A1 ⊆ G, A2 ⊆ M and A3 ⊆ B satisfying the two following statements: (i) A1 × A2 × A3 ⊆ Y , X1 × X2 × X3 ⊆ Y and (ii) A1 ⊆ X1 , A2 ⊆ X2 and A3 ⊆ X3 implies A1 = X1 , A2 = X2 and A3 = X3 . If (G, M, B, Y ) is represented by a three dimensional table, (i) means that a concept stands for a 3-dimensional rectangle full of crosses while (ii) characterises component-wise maximality of concepts. For a triadic concept (A1 , A2 , A3 ), A1 is called the extent, A2 the intent and A3 the modus. To describe the derivation operators, it is convenient to alternatively repre- sent a triadic context as (K1 , K2 , K3 , Y ). Then, for {i, j, k} = {1, 2, 3}, j < k, X ⊆ Ki and Z ⊆ Kj × Kk , (i)-derivation operators are defined by: Φ : X → X (i) : {(aj , ak ) ∈ Kj × Kk | (ai , aj , ak ) ∈ Y for all ai ∈ X} 0 Φ : Z → Z (i) : {ai ∈ Ki | (ai , aj , ak ) ∈ Y for all (aj , ak ) ∈ Z} This definition leads to derivation operator K(3) and dyadic context K(3) = hK3 , K1 × K2 , Y (3) i. Further derivation operators are defined as follows: for {i, j, k} = {1, 2, 3}, Xi ⊆ Ki , Xj ⊆ Kj and Ak ⊆ Kk , the (i, j, Ak )-derivation operators are defined by: (i,j,Ak ) Ψ : Xi → Xi : {aj ∈ Kj | (ai , aj , ak ) ∈ Y for all (ai , ak ) ∈ Xi × Ak } 0 (i,j,Ak ) Ψ : Xj → Xj : {ai ∈ Ki | (ai , aj , ak ) ∈ Y for all (aj , ak ) ∈ Xj × Ak } 0 Operators Φ and Φ will be called outer operators, pair of both operators outer 0 closure and dyadic operators Ψ and Ψ inner operators or inner closure when pair of both is used. Derivation operators of dyadic context are defined by Kij Ak = hKi , Kj , YAijk i, where (ai , aj ) ∈ YAijk iff ai , aj , ak are related by Y for all ak ∈ Ak . From a computational point of view, [4] developed the algorithm Trias for extracting frequent triadic concepts, i.e. whose extent, intent and modus cardi- nalities are higher than user-defined thresholds (see also [5]). Cerf et al. presented 178 Mehdi Kaytoue et al. a more efficient algorithm called Data-peeler able to handle n-ary relations [2] while formal definitions lie in so called Polyadic Concept Analysis [14]. 3 Notations and problem settings A numerical dataset is realized by a many-valued context [3] and we define accordingly (maximal) biclusters of similar values. Definition 1 (Many-valued context). 
Let G be a set of objects, M be a set of attributes, W be the set of attribute values and I be a ternary relation defined on the Cartesian product G × M × W . The fact (g, m, w) ∈ I, also written m(g) = w, means that “Attribute m takes the value w for the object g”. The tuple (G, M, W, I) is called many-valued context, or simply numerical dataset in this paper. Example 1. Table 1 is a numerical dataset, or many-valued context, with objects G = {g1 , g2 , g3 , g4 }, attributes M = {m1 , m2 , m3 , m4 , m5 }, W = {0, 1, 2, 6, 7, 8, 9} and for example m5 (g2 ) = 6. Definition 2 (Bicluster). In a numerical dataset (G, M, W, I), a bicluster is a tuple (A, B) with A ⊆ G and B ⊆ M . Definition 3 (Similarity relation and bicluster of similar values). Let w1 , w2 ∈ W be two attribute values and θ ∈ N be a user-defined parameter, called similarity parameter. w1 and w2 are said to be similar iff |w1 − w2 | ≤ θ and we note w1 'θ w2 . (A, B) is bicluster of similar values if m(g) 'θ n(h) for all g, h ∈ A and for all m, n ∈ B. Definition 4 (Maximal bicluster of similar values). A bicluster of similar values (A, B) is maximal if adding either an object in A or an attribute in B does not result in a bicluster of similar values. Example 2 (From Table 1). ({g1 , g4 }, {m2 , m4 }) is a bicluster. ({g1 , g2 }, {m2 }) is a bicluster of similar values with θ ≥ 1. However, it is not maximal. With 1 ≤ θ < 5, ({g1 , g2 , g3 }, {m1 , m2 , m3 }) is maximal. Finally, with θ = 7 the biclus- ter ({g1 , g2 , g3 }, {m1 , m2 , m3 , m4 , m5 }) is maximal. Note that a constant (max- imal) bicluster is a (maximal) bicluster of similar values with θ = 0. Thus the problem that we address in this paper is the extraction of all max- imal biclusters of similar values from a numerical dataset. We desire the extrac- tion to be complete, correct and non-redundant compared to several existing methods of the literature based on heuristics [13]. For that matter, we pro- pose in the next section a first method aiming at extracting biclusters for any similarity parameter θ. This method establishes new links between biclustering and FCA in general, and TCA in particular. Then, the present methodology is adapted to characterize and extract biclusters that are maximal for a given θ only as usually done in the literature [1, 7, 13]. Mining bicluster of similar values with triadic concept analysis 179 4 Biclusters of similar values in Triadic Concept Analysis Firstly, we consider the problem of generating maximal biclusters for any θ. Starting from a numerical dataset (G, M, W, I), the basic idea lies in building a triadic context (G, M, T, Y ) where the two first dimensions remain formal objects and formal attributes, while W is scaled into a third dimension denoted by T . This new dimension T is called the scale dimension: intuitively, it gives different “spaces of values” that each object-attribute pair (g, m) ∈ G×M can take. Once the scale is given, a triadic context is derived from which triadic concepts are characterized. We use the interordinal scaling [3] to build the scale dimension. It allows to encode in 2T all possible intervals of values in W . This scale allows to derive a triadic context from which any bicluster of similar values can be characterized as a triadic concept. We made more precise these statements and illustrate the whole procedure with examples. Definition 5 (Interordinal Scaling). A scale is a binary relation J ⊆ W × T associating original elements from the set of values W to their derived ele- ments in T . 
In the case of interordinal scaling, T = {[min(W ), w], ∀w ∈ W } ∪ {[w, max(W )], ∀w ∈ W }. Then (w, t) ∈ J iff w ∈ t. Example 3. Table 3 gives the tabular representation of the interordinal scale for Table 1. Intuitively, each line describes a single value, while dyadic concepts represent all possible intervals over W . An example of dyadic concept in this table is given by ({6, 7, 8}, {t6 , t7 , t8 , t9 , t10 }), rewritten as ({6, 7, 8}, {[6, 8]}) since {t6 , t7 , t8 , t9 , t10 } represents the interval [0, 8] ∩ [0, 9] ∩ [1, 9] ∩ [2, 9] ∩ [6, 9] = [6, 8]. t10 = [6, 9] t11 = [7, 9] t12 = [8, 9] t13 = [9, 9] t1 = [0, 0] t2 = [0, 1] t3 = [0, 2] t4 = [0, 6] t5 = [0, 7] t6 = [0, 8] t7 = [0, 9] t8 = [1, 9] t9 = [2, 9] J 0 × × × × × × × 1 × × × × × × × 2 × × × × × × × 6 × × × × × × × 7 × × × × × × × 8 × × × × × × × 9 × × × × × × × Table 3: Interordinal scale of the set of attribute values W . Once the scale is defined, we can derive the triadic context w.r.t. this scale. Definition 6 (Triadic scaled context). Let Y be ternary relation Y ⊆ G × M ×T . Then (g, m, t) ∈ Y iff (m(g), t) ∈ J, or simply m(g) ∈ t. We call the tuple (G, M, T, Y ) the triadic scaled context of the numerical dataset (G, M, W, I). Example 4. The object-attribute pair (g1 , m1 ) taking value m1 (g1 ) = 1 is scaled into triples (g1 , m1 , t) ∈ Y where t takes any interval in {[0, 1], [0, 2], [0, 6], [0, 7], 180 Mehdi Kaytoue et al. [0, 8], [0, 9], [1, 9]}. The intersection of intervals in this set is the original value itself, i.e. m1 (g1 ) = 1, a basic property of interordinal scaling. As a result, Table 4 illustrates the whole scaled triadic context derived from the numerical dataset given in Table 1 using interordinal scale. The very first cross (×) in this table (upper left) represents the tuple (g2 , m4 , t1 ), meaning that m4 (g2 ) ∈ [0, 0]. We present now our first main result: there is a one-to-one correspondence between (i) the set of maximal biclusters of similar values in a given numerical dataset for any similarity parameter θ and (ii) the set of all triadic concepts in the triadic context derived with interordinal scaling. Proposition 1. Tuple hA, B, U i, where A ⊆ G, B ⊆ G and U ⊆ T is triadic concept iff (A, B) is a maximal bicluster of similar values for some θ ≥ 0. Proof. We leave the proof in the Appendix of the paper since we need to intro- duce notations and propositions not necessary in the rest of the paper. Example 5. For example, ({g1 , g2 , g3 }, {m1 , m2 , m3 }, {t3 , t4 , t5 , t6 , t7 , t8 }) is a tri- adic concept from the context depicted in Table 4. It corresponds to the maximal bicluster ({g1 , g2 , g3 }, {m1 , m2 , m3 }) with θ = 1. θ = 1 since {t3 , t4 , t5 , t6 , t7 , t8 } is maximal (it is a modus), it corresponds to interval [1, 2] and naturally 2−1 = 1 is the length of this interval. 
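To make the scaling concrete, the following Python sketch (our illustration, not code from the paper) builds the interordinal scale of Definition 5 for the value set W of Table 1 and derives the ternary relation Y of Definition 6; the helper name interordinal_scale and the dictionary encoding of Table 1 are our own.

```python
# Sketch (not the authors' code): interordinal scaling of W and derivation
# of the triadic scaled context Y from the numerical dataset of Table 1.
data = {                      # m(g) = value, as in Table 1
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}
W = sorted({v for row in data.values() for v in row.values()})

def interordinal_scale(W):
    """T = {[min(W), w] | w in W} union {[w, max(W)] | w in W} (Definition 5)."""
    lo, hi = min(W), max(W)
    return sorted({(lo, w) for w in W} | {(w, hi) for w in W})

T = interordinal_scale(W)     # the 13 intervals t1, ..., t13 for Table 1

# Triadic scaled context (Definition 6): (g, m, t) in Y iff m(g) lies in t.
Y = {(g, m, t) for g, row in data.items()
               for m, w in row.items()
               for t in T if t[0] <= w <= t[1]}

# Sanity check from Example 4: m1(g1) = 1 is scaled into 7 intervals, and
# their intersection recovers the original value 1.
ts = [t for (g, m, t) in Y if (g, m) == ("g1", "m1")]
assert len(ts) == 7
```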
t1 = [0, 0] t2 = [0, 1] t3 = [0, 2] t4 = [0, 6] t5 = [0, 7] m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 g1 × × × × × × × × × × × × × × × × g2 × × × × × × × × × × × × × × × × × × g3 × × × × × × × × × × × × × g4 × × × × × × t6 = [0, 8] t7 = [0, 9] t8 = [1, 9] t9 = [2, 9] t10 = [6, 9] m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 g1 × × × × × × × × × × × × × × × × × × × g2 × × × × × × × × × × × × × × × × × g3 × × × × × × × × × × × × × × × × × × × × × × g4 × × × × × × × × × × × × × × × × × × × × × × × t11 = [7, 9] t12 = [8, 9] t13 = [9, 9] m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 g1 g2 g3 × g4 × × × × × × Table 4: Triadic scaled context for Table 1 with interordinal scaling. Hence we showed that extracting biclusters of similar values for any θ in a numerical dataset can be achieved by (i) scaling the attribute value dimension and (ii) extracting the triadic concepts in the resulting derived triadic context. Interestingly, triadic concepts (A, B, U ) with the largest sets A, B or C rep- resent large biclusters of close values. Indeed, the larger |A| and |B| the larger the data covering of the corresponding bicluster. Furthermore, the larger |U |, the more similar values for bicluster (A, B). Indeed, by the properties of interordinal Mining bicluster of similar values with triadic concept analysis 181 scaling, the more intervals in U , the smaller their interval intersection. Mining so called top-k frequent triadic concepts can accordingly be achieved with the existing algorithm Data-Peeler [2]. On another hand, extracting maximal biclusters for all θ may be neither efficient nor effective with large numerical data: their number tends to be very large and all biclusters are not relevant for a given analysis. Furthermore, both size and density of contexts derived with interordinal scaling are known to be problematic w.r.t algorithmic scalability, see e.g. [9]. In existing methods of the literature, θ is set a priori. We show now how to handle this case with slight modifications, our second main result. 5 Extracting biclusters of similar values for a given θ In this section we consider the problem of extracting maximal biclusters of sim- ilar values in TCA for a given θ only. It comes with slight modifications of the methodology presented in last section. Intuitively, consider the previous scaling applied on a numerical dataset (G, M, W, I). It scales W into dimension T and subsets of T characterize all intervals of values over W . To get maximal biclusters for a given θ only, we should not consider all possible intervals in W , but rather all intervals (i) having a range size that is less or equal than θ to avoid biclusters with non similar values, and (ii) having a range size the closest as possible to θ to avoid non-maximal biclusters. For example, if we set θ = 2, it is probably not interesting to consider interval [0, 8] in the scale dimension since 8 − 0 > θ. Similarly, considering the interval [6, 6] may not be interesting as well, since a bicluster with all its values equal to 6 may not be maximal. As introduced in [6], those maximal intervals of similar values used for the scale are called blocks of tolerance over the set of numbers W with respect to the tolerance relation 'θ . Therefore we firstly recall basics on tolerance relations over a set of numbers. It allows us to define a simpler scaling procedure. 
The resulting triadic context is then mined with a new TCA algorithm called TriMax to extract maximal biclusters of similar values for a given θ. Blocks of tolerance over W are defined as maximal sets of pairwise similar values from W : Definition 7 (Tolerance blocks from a set of numbers). The similarity relation 'θ is called a tolerance relation, i.e. reflexive, symmetric but not tran- sitive. Given a set W of values, a subset V ⊆ W , and a tolerance relation 'θ over W , V is a block of tolerance if: (i) ∀w1 , w2 ∈ V, w1 'θ w2 (pairwise similarity) (ii) ∀w1 6∈ V, ∃w2 ∈ V, w1 6'θ w2 (maximality). From Table 1 we have W = {0, 1, 2, 6, 7, 8, 9}. With θ = 2, one has 0 '2 2 but 2 6'2 6. Accordingly, one obtains 3 blocks of tolerance, namely the sets {0, 1, 2}, {6, 7, 8} and {7, 8, 9}. These three sets can be renamed as the convex hull of their elements on N: respectively, [0, 2], [6, 8] and [7, 9]: any number lying between the 182 Mehdi Kaytoue et al. minimal and the maximal elements (w.r.t. natural number ordering) of a block of tolerance is naturally similar to any other element of the block. To derive a triadic context from a numerical dataset, we simply use tolerance blocks over W to define the scale dimension. Definition 8 (Trimax scale relation). The scale relation is a binary relation J ⊆ W × C, where C is the set of blocks of tolerance over W renamed as their convex hulls. Then, (w, c) ∈ J iff w ∈ c. Example 6. From Table 1 we have: C = {[0, 1], [1, 2], [6, 7], [7, 8], [8, 9]} with θ = 1, and C = {[0, 2], [6, 8], [7, 9]} with θ = 2. Then, we can apply the same context derivation as in previous section: scaling is still based on intervals, but this time it uses tolerance blocks. Definition 9 (TriMax triadic scaled context). Let Y ⊆ G × M × C be a ternary relation. Then (g, m, c) ∈ Y iff (m(g), c) ∈ J, or simply m(g) ∈ c, where J is the scale relation. (G, M, C, Y ) is called the TriMax triadic scaled context. Example 7. Table 5 is the Trimax triadic scaled concept derived from the nu- merical dataset lying in Table 1 with θ = 1. label 1 label 2 label 3 label 4 label 5 [0, 1] [1, 2] [6, 7] [7, 8] [8, 9] m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 g1 × × × × × × × g2 × × × × × × × g3 × × × × × × × g4 × × × × × × × Table 5: Triadic scaled context using tolerance blocks over W and θ = 1. Definition 10 (Dyadic context associated with a block of tolerance). Consider a block of tolerance c ∈ C. The dyadic context associated with this block is given by (G, M, Z) where z ∈ Z denotes all (g, m) ∈ G × M such as m(g) ∈ c. Example 8. In Table 5, each such dyadic context is labelled by its corresponding block of tolerance. Now, remark that blocks of tolerance over W are totally ordered: let [v1 , v2 ] and [w1 , w2 ] be two blocks of tolerance, one has [v1 , v2 ] < [w1 , w2 ] iff v1 < w1 . Hence, associated dyadic contexts are also totally ordered and we use a corre- sponding indexing set to label them. In Table 5, contexts for blocks h[0, 1], [1, 2], [6, 7], [7, 8], [8, 9]i are respectively labelled h1, 2, 3, 4, 5i. We now present our second main results: The scaled triadic context supports the extraction of maximal biclusters of similar values for a given θ. In this case however, existing algorithms of TCA cannot be applied directly. For example, in Table 5, the triconcept ({g3 }, {m4 }, {3, 4}) corresponds to a bicluster of similar values which is not maximal. Hence we present hereafter a new TCA algorithm for this task, called TriMax. 
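As a small illustration of Definitions 7 and 8, the sketch below (ours, assuming W is a finite set of numbers) computes the tolerance blocks of W, returned as their convex hulls, for a given θ; it reproduces the block sets listed in Example 6.

```python
# Sketch (our own helper, not the paper's code): tolerance blocks over W
# for a given theta, returned as convex hulls [a_i, b_i] (Definitions 7-8).
def tolerance_blocks(W, theta):
    vals = sorted(W)
    blocks, prev_end = [], -1
    for i, a in enumerate(vals):
        j = i
        while j + 1 < len(vals) and vals[j + 1] - a <= theta:
            j += 1                       # extend the block as far as similarity allows
        if j > prev_end:                 # keep only maximal candidates
            blocks.append((a, vals[j]))
            prev_end = j
    return blocks

W = {0, 1, 2, 6, 7, 8, 9}
print(tolerance_blocks(W, 1))   # [(0, 1), (1, 2), (6, 7), (7, 8), (8, 9)]
print(tolerance_blocks(W, 2))   # [(0, 2), (6, 8), (7, 9)]

# Each block c then yields the dyadic context (G, M, Z_c) of Definition 10:
# (g, m) in Z_c iff m(g) lies in c.
```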
Mining bicluster of similar values with triadic concept analysis 183 The basic idea of TriMax relies on the following facts. Firstly, since each dyadic context corresponds to a block of tolerance, we do not need to compute intersections of contexts, such as classically done in TCA. Hence each dyadic context is processed separately. Secondly, a dyadic concept of a dyadic context necessarily represents a bicluster of similar values, but we cannot be sure it is maximal (see previous example). Hence, we need to check if a concept is still a concept in other dyadic contexts, corresponding to other classes of tolerance. This is made precise with the following proposition. Proposition 2. Let (A, B, U ) be a triadic concept from Trimax triadic scaled context (G, M, C, Y ), such that U is the outer closure of a singleton {c} ⊆ C. If |U | = 1, (A, B) is a maximal bicluster of similar values. Otherwise, (A, B) is a maximal bicluster of similar values iff @y ∈ [min(U ); max(U )], y < c s.t. 0 0 (A, B) 6= Ψy (Ψy ((A, B))), where Ψy (.) and Ψy (.) correspond to inner derivation operators associated with y th dyadic context. Proof. When |U | = 1, (A, B) is a dyadic concept only in one dyadic context corresponding to a block of tolerance. By properties of tolerance blocks, (A, B) is a maximal bicluster. If |U | 6= 1, (A, B) is a dyadic concept in |U | dyadic contexts. Since the tolerance block set is totally ordered, it directly implies that modus U is an interval [min(U ); max(U )]. Hence, if ∃y ∈ [min(U ); max(U )] s.t. 0 (A, B) = Ψy (Ψy ((A, B))) this means that (A, B) is not a maximal bicluster of similar values. Description of the TriMax algorithm. TriMax starts with scaling ini- tial numerical data into several dyadic contexts, each one standing for a block of tolerance over W with given θ. The set of all dyadic contexts forms accordingly a triadic context. Then, each dyadic context is mined with any FCA algorithm (or closed itemset mining algorithm), and all formal concepts are extracted. For 0 a given concept (A, B), we compute outer derivation Φ ((A, B)), i.e. to obtain the set of dyadic contexts labels in which the current dyadic concept holds. If it results in a singleton, this means that (A, B) is a concept for the current block of tolerance only, i.e. it is a maximal bicluster of similar values, and it has been, or will never be, generated twice. Otherwise, (A, B) is a concept in other con- texts, and can be generated accordingly several times (as much as the number of contexts in which it holds). Then, we only consider (A, B) if we are sure it is the last time it is computed. Finally, we need to check if current concept represents a maximal bicluster, i.e. there should not exist a context from the modus where (A, B) is not a dyadic concept. Proposition 3. TriMax outputs a (i) complete, (ii) correct and (iii) non re- dundant collection of all maximal biclusters of similar values for a given numer- ical dataset and similarity parameter θ. Proof. (i) and (ii) follow directly from Proposition 2. Statement (iii) is ensured by the second if condition of the algorithm: a dyadic concept (or equivalently bicluster) is considered iff it has been extracted in the last dyadic context in which it holds. 184 Mehdi Kaytoue et al. Algorithm 1: TriMax input : Numerical dataset (G, M, W, I), tolerance parameter θ output: Maximal biclusters of similar values Let C = {[ai , bi ]} be the totally ordered set of all blocks over W for given θ. Indices i form an indexing set. 
forall the [ai , bi ] ∈ C do Build context (G, M, Zi ) such that (g, m) ∈ Zi ⇔ m(g) ∈ [ai , bi ] forall the (G, M, Zi ) do Use any FCA algorithm to extract all its concepts (A, B) forall the dyadic concepts (A, B) in the current context (G, M, Zi ) do 0 if |Φ ((A, B))| = 1 then print (A, B) 0 else if max(Φ ((A, B)) = i then 0 x ← min(Φ ((A, B)) 0 if @y ∈ [x, i[ s.t. (A, B) 6= Ψy (Ψy ((A, B))) then print (A, B) 6 Computer experiments In this section, we experiment with the algorithm TriMax and highlight various aspects of its practical complexity. Data. We explore a gene expression dataset of the species Laccaria bicolor avail- able at NCBI5 . More details on this dataset can be found in [9]. This gene expres- sion dataset monitors the behaviour of 11, 930 genes in 12 biological situations, reflecting various stages of Laccaria bicolor biological cycle. Attribute values in W vary between 0 and 60, 000. TriMax implementation. TriMax is written in C++. It uses the boost library 1.42 for data structures and the implementation of InClose from its authors6 for dyadic concepts extraction. At each iteration of the main loop, i.e. each tolerance block, the current scaled dyadic context is produced: We do not generated the whole triadic context which cannot fit into memory for large databases. It turns out that the modus computation for a given dyadic concept requires to compute scaling “on the fly”, i.e. when computing the set of dyadic contexts in which a current concept holds. The experiments were carried out on an Intel CPU 2.54 Ghz machine with 8 GB RAM running under Ubuntu 11.04. Experiment settings. The goal of the present experiments is not to give a qualitative evaluation of the present approach (say biological interpretation), but rather a quantitative evaluation. Indeed, the present work aims at showing 5 http://www.ncbi.nlm.nih.gov/geo/ as series GSE9784 6 http://sourceforge.net/projects/inclose/ Mining bicluster of similar values with triadic concept analysis 185 (i) Numbers of patterns (Y-axis) (ii) Execution times in seconds (Y-axis) w.r.t. θ (X-axis) and |G| (Z-axis) w.r.t. θ (X-axis) and |G| (Z-axis) (iii) Numbers of blocks of tolerance (Y-axis) (iv) Density of triadic contexts (Y-axis) w.r.t. θ (X-axis) and |G| (Z-axis) w.r.t. θ (X-axis) and |G| (Z-axis) (v) Comparing the number of generated dyadic (vi) Repartition of execution time concepts w.r.t. the actual number of maximal w.r.t main steps of TriMax biclusters varying θ with |G| = 500 with θ = 33, 000 and |G| = 500 Fig. 1: Monitoring with different settings (i) the number of maximal biclusters, (ii) the execution times of TriMax, (iii) the number of tolerance blocks, (iv) the derived triadic context density, (v) the number of non-maximal biclusters generated as dyadic-concepts w.r.t. the number of maximal biclusters, and (vi) repartition of execution time in the TriMax algorithm. 186 Mehdi Kaytoue et al. how an existing type of biclusters can be mined with Triadic Concept Analysis. For a qualitative evaluation, the reader may refer for example to [1, 9]. Accordingly, we designed the following experiments to monitor various as- pects of the TriMax algorithm. For most of the experiments, the dataset used is composed of an increasing number of objects and all attributes. The objects are chosen randomly once and for all so that the different experiment results can be compared. We also vary the parameter θ in the same way across all experiments. Then, we monitor the following aspects, as presented in Figure 1: i. 
Number of maximal biclusters of similar values ii. Execution time (in seconds) iii. Number of tolerance blocks iv. Density of the triadic context, where density is defined as d(G, M, C, Y ) = |Y |/(|G| × |M | × |C|). This information is important, since contexts with high density are known to be hard to process with FCA algorithms [11], and we use the InClose algorithm for dyadic contexts processing. v. Comparison between the number of non-maximal biclusters produced by TriMax (i.e. dyadic concepts that do not corresponds to maximal biclus- ters) with the number of maximal biclusters. vi. Execution time profiling of the main procedures of TriMax. This is achieved with the tool GNU GProf and gives us what parts of the algorithm are the most time consuming. Experiment results. Figure 1 presents the results of our experiments with different settings. In these settings, we vary the number of objects |G| and the parameter θ. A first observation arises from graph (i): the number of biclusters is the highest when θ ' 30, 000. A first explanation is that 30, 000 is the half of the maximal value of W and almost all multiples of 100 in [0; 60, 000] belongs to W . In graph (ii), execution time has the same behaviour as graph (i). These results can be understood by paying attention to the next graphs (iii) and (iv). In (iii) is monitored the number of tolerance blocks. The maximal number is reached when θ = 0, i.e. |C| = |W |. When θ = max(W ), we have |C| = 1. Now we observe in (iv) that the density follows a reverse behaviour: When θ = 0, the density tends towards 0%; when θ = max(W ), then density exactly equal 1%. Combining both graph (iii) and (iv), the worst cases happen when both density and tolerance bloc count are high. Another observation, which explains also the execution times, arises from graph (v). Here are compared the number of maximal biclusters and the number of non-maximal biclusters generated as dyadic concepts. Here again, worst case is reached when θ ' 30, 000. Looking at graph (vi), we learn that this is however not the major problem. The mostly consuming procedure of TriMax is the computation of the modus of a dyadic concept. The explanation is that we compute modus with “on the fly scaling”. Therefore, the bottleneck of our algorithm reveals itself to be the modus computation. In practical applications however, the analyst is not interested in all biclusters of similar values. Some constraints are generally defined, such as a minimal (resp. maximal) number of objects (resp. attributes) in a bicluster Mining bicluster of similar values with triadic concept analysis 187 (A, B), or a minimal area |A| × |B|, etc. Interestingly, most of those constraints can be evaluated on a generated dyadic concept. Therefore, before computing the modus of such concept, we can check such properties and discard the concept if not respecting the constraints. Although not reflected in this paper, we tested how adding minimal (resp. maximal) size constraints on a bicluster affects both number of biclusters and execution times. The results are very interesting: for example with θ = 33, 000, |G| = 500, and minimal (resp. maximal) size for |A| set to 10 (resp. 40), TriMax produces only 5, 332 maximal biclusters in 2.1 seconds compared to 104, 226 maximal biclusters extracted in 16.130 seconds without any constraint. Finally, the most interesting aspect of TriMax is its direct distributed com- putation capacity. Indeed, each iteration, i.e. 
for each block of tolerance, can be achieved independently from the others. Furthermore, the core of TriMax consisting in extracting dyadic contexts can also be distributed, see e.g. [10]. A deeper investigation remains to be done in this case. Note that although the method description involves W as a set of natural numbers, TriMax can directly handle numerical data real numbers, and has been implemented as such. Comparison with existing methods. Two existing methods in the literature also consider the problem of extracting all maximal biclusters of similar values from a numerical dataset. The first method is called Numerical Biset Miner (NBS-Miner [1]). The second method is based on interval pattern structures (IPS [7, 8]). Limited by space, we do not detail these methods. Both NBS-Miner and IPS algorithms have been implemented in C++. First experiments show that NBS-Miner is not scalable compared to IPS and TriMax. On another hand, it seems that TriMax outperforms IPS, but a deeper investigation is required. The main problem in IPS is to find an efficient algorithm able to compute tolerance blocks over a set of intervals. 7 Conclusion We addressed the problem of biclustering numerical data with Formal Concept Analysis. So called (maximal) biclusters of similar values can be characterized and extracted with Triadic Concept Analysis, which turns out to be a novel mathematical framework for this task. We properly defined a scaling procedure turning original numerical data into triadic contexts from which biclusters can be extracted as triadic concepts with existing algorithms. This approach allows a correct, complete and non-redundant extraction of all maximal biclusters, for any similarity parameter θ and can be extended to n-ary numerical datasets while their computation can be directly distributed. The interpretation of triadic con- cepts is very rich: both extent and intent allow to characterize a bicluster (i.e. the rectangle), while the modus gives the range of values of the biclusters, and for which θ is the bicluster maximal. Moreover, the larger the modus, the more simi- lar the values within current bicluster. It follows a perspective of research, aiming at extracting the top-k frequent tri-concepts with Data-Peeler [2], which can help to handle the problem of top-k biclusters extraction. We also adapted the 188 Mehdi Kaytoue et al. TCA machinery with algorithm TriMax to extract maximal biclusters for a user-defined θ, which is classical in the existing literature. It appears that Tri- Max is a fully customizable algorithm: any concept extraction algorithm can be used inside its core (along with several constraints on produced dyadic concepts), while its distributed computation is direct. Among several other experiments, it remains now to determine which are the best core algorithms for a given θ parameter, the very last directly influencing derived contexts density. Acknowledgements. Authors would like to thank Dmitry Andreevich Morozov for implementing the algorithms NBS-Miner and IPS. The second author was supported by the project of the Russian Foundation for Basic Research, grant no. 08-07-92497-NTsNIL a. Juraj Macko acknowledges support by Grant No. 202/10/0262 of the Czech Science Foundation. A Proof of the Proposition 1. Before proving this proposition, we need to introduce the following. 
For sake of simplicity, we now consider W as the set of all natural numbers from a numerical dataset that are greater or equal than the minimal value and lower or equal than the maximal value, i.e. W = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} with the example of Table 1. Definition 11 (Scale value and scale relation). We call scale value s = q − r where r = min(W ) and q = max(W ). The scale relation is a binary relation J ⊆ W × T , where T = {t1 , . . . , t2s+1 } r ≤ w ≤ q and hw, ti i ∈ J iff i ∈ [w − r + 1, w − r + 1 + s]. Note that J is equivalent to interordinal scale of W previously given, but this notations are used for the proof. Definition 12 (Eθw - cluster base). We introduce Eθw ⊆ T defined as Eθw = [tw+θ−r+1 ; tw−r+1+s ] for given θ and w ∈ W . Example 9 (Eθw - cluster base). E12 = [t2+1−0+1 ; t2−0+1+9 ] = [t4 ; t12 ]. Proposition 4. (wb = m(g)) 'θ (n(h) = wc ) iff (hg, mi ∈ YE12θb and hh, ni ∈ YE12θb ). Proof. Let Eb , Ec ⊆ T and wc ≥ wb . According to the definition (g, m) ∈ YE12θb iff m, g, t are related by Y for all t ∈ Eθb . Using scaling and definition we have [twb −r+1 ; twb −r+1+s ] = Eb ⊇ Eθb = [twb +θ−r+1 ; twb −r+1+s ] which is straight- forward. We just need to show that (h, n) ∈ YE12θb holds as well. With scaling definition and previous definition we get [twc −r+1 ; twc −r+1+s ] = Ec ⊇ Eθb = [twb +θ−r+1 ; twb −r+1+s ] holding iff wc − wb ≤ θ, which is equal to the definition of 'θ . Moreover we can easily see as a corollary that wc −wb ≤ θ holds iff Eb ∩Ec ⊇ Eθb and wc − wb = θ holds iff Eb ∩ Ec = Eθb . Now we can prove the Proposition 1 from the main text. Mining bicluster of similar values with triadic concept analysis 189 Proposition 1. Tuple hA1 , A2 , U i, where A1 ⊆ G, A2 ⊆ M and U ⊆ T is triadic concept iff (A1 , A2 ) is a maximal bicluster of similar values for some θ ≥ 0. Furthermore the value of θ is defined as θ = s − |U | + 1. Proof. Let U = Eθb and consider dyadic context YU12 = YE12θb for some wb . Using 0 dyadic closure operator Ψ (Ψ ((A1 )) we get (A1 , A2 ). From definition of triconcept we know that A1 ⊆ B1 implies A1 = B1 (the same for A2 ). From definition of maximal bicluster of similar values we know that hA1 , A2 i is maximal when it does not exists hB1 , B2 i s.t. B1 ⊇ A1 (the same applies for A2 ). It is obvious that both sets are maximal from definition and when we have the same dyadic context YU12 = YE12θb . Now we need to look at dyadic context YU12 = YE12θb . In |U | = |Eθb | = |[twb +θ−r+1 ; twb −r+1+s ]| we can easily see that |U | = s − θ + 1, which gives θ = s − |U | + 1. Finally, U is maximal (as being modus of a triconcept) and Eθb is maximal as well because wc − wb ≤ θ holds iff Eb ∩ Ec ⊇ Eθb . All facts mentioned in this proof leads to equality of the triconcept and maximal bicluster of similar values. References 1. Besson, J., Robardet, C., Raedt, L.D., Boulicaut, J.F.: Mining bi-sets in numerical data. In: Dzeroski, S., Struyf, J. (eds.) KDID. Lecture Notes in Computer Science, vol. 4747, pp. 11–23. Springer (2007) 2. Cerf, L., Besson, J., Robardet, C., Boulicaut, J.F.: Closed patterns meet n-ary relations. TKDD 3(1) (2009) 3. Ganter, B., Wille, R.: Formal Concept Analysis. Springer (1999) 4. Jäschke, R., Hotho, A., Schmitz, C., Ganter, B., Stumme, G.: Trias - an algorithm for mining iceberg tri-lattices. In: ICDM. pp. 907–911 (2006) 5. Ji, L., Tan, K.L., Tung, A.K.H.: Mining frequent closed cubes in 3d datasets. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB). pp. 811–822. ACM (2006) 6. 
Kaytoue, M., Assaghir, Z., Napoli, A., Kuznetsov, S.O.: Embedding tolerance re- lations in formal concept analysis: an application in information fusion. In: CIKM. pp. 1689–1692. ACM (2010) 7. Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Biclustering numerical data in formal concept analysis. In: Valtchev, P., Jäschke, R. (eds.) ICFCA. LNCS, vol. 6628, pp. 135–150. Springer (2011) 8. Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Revisiting numerical pattern mining with formal concept analysis. In: Proceedings of the 22nd International Joint Con- ference on Artificial Intelligence (IJCAI). IJCAI/AAAI (2011) 9. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181(10), 1989– 2001 (2011) 10. Krajca, P., Vychodil, V.: Distributed algorithm for computing formal concepts using map-reduce framework. In: IDA. pp. 333–344. Springer (2009) 11. Kuznetsov, S.O., Obiedkov, S.A.: Comparing performance of algorithms for gen- erating concept lattices. J. Exp. Theor. Artif. Intell. 14(2-3), 189–216 (2002) 12. Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In: ICCS. LNCS, vol. 954, pp. 32–43. Springer (1995) 190 Mehdi Kaytoue et al. 13. Madeira, S., Oliveira, A.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1(1), 24–45 (2004) 14. Voutsadakis, G.: Polyadic concept analysis. Order 19(3), 295–304 (2002) Fast Mining of Iceberg Lattices: A Modular Approach Using Generators Laszlo Szathmary1 , Petko Valtchev1 , Amedeo Napoli2 , Robert Godin1 , Alix Boc1 , and Vladimir Makarenkov1 1 Dépt. d’Informatique UQAM, C.P. 8888, Succ. Centre-Ville, Montréal H3C 3P8, Canada Szathmary.L@gmail.com, {valtchev.petko, godin.robert}@uqam.ca, {makarenkov.vladimir, boc.alix}@uqam.ca 2 LORIA UMR 7503, B.P. 239, 54506 Vandœuvre-lès-Nancy Cedex, France napoli@loria.fr Abstract. Beside its central place in FCA, the task of constructing the concept lattice, i.e., concepts plus Hasse diagram, has attracted some interest within the data mining (DM) field, primarily to support the mining of association rule bases. Yet most FCA algorithms do not pass the scalability test fundamental in DM. We are interested in the ice- berg part of the lattice, alias the frequent closed itemsets (FCIs) plus precedence, augmented with the respective generators (FGs) as these provide the starting point for nearly all known bases. Here, we investi- gate a modular approach that follows a workflow of individual tasks that diverges from what is currently practiced. A straightforward instantia- tion thereof, Snow-Touch, is presented that combines past contributions of ours, Touch for FCIs/FGs and Snow for precedence. A performance comparison of Snow-Touch to its closest competitor, Charm-L, indicates that in the specific case of dense data, the modularity overhead is offset by the speed gain of the new task order. To demonstrate our method’s usefulness, we report first results of a genome data analysis application. 1 Introduction Association discovery [1] in data mining (DM) is aimed at pinpointing the most frequent patterns of items, or itemsets, and the strongest associations between items dug in a large transaction database. The main challenge here is the po- tentially huge size of the output. A typical way out is to focus on a basis, i.e. a reduced yet lossless representation of the target family (see a list in [2]). 
Many popular bases are either formulated in terms of FCA or involve structures that do. For instance, the minimal non-redundant association rules [3] require the computation of the frequent closed itemsets (FCI) and their respective frequent generators (FGs), while the informative basis involves the inclusion-induced precedence links between FCIs. We investigate the computation of iceberg lattices, i.e., FCIs plus prece- dence, together with the FGs. In the DM literature, several methods exist that c 2011 by the paper authors. CLA 2011, pp. 191–206. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 192 2 Laszlo Szathmary et al. Laszlo Szathmaryet al. target FCIs by first discovering the associated FGs (e.g. the levelwise FCI miners A-Close [4] and Titanic [5]). More recently, a number of extensions of the pop- ular FCI miner Charm [6] have been published that output two or all three of the above components. The basic one, Charm-L [7], produces FCIs with prece- dence links (and could be regarded as a lattice construction procedure). Further extensions to Charm-L produce the FGs as well (see [8,9]). In the FCA field, the frequency aspect of concepts has been mostly ignored whereas generators have rarely been explicitly targeted. Historically, the first method whose output combines closures, generators and precedence has been presented in [10] yet this fact is covered by a different terminology and a some- what incomplete result (see explanations below). The earliest method to explic- itly target all three components is to be found in [11] while an improvement was published in [12]. Yet all these FCA-centered methods have a common drawback: They scale poorly on large datasets due to repeated scans of the entire database (either for closure computations or as an incremental restructuring technique). In contrast, Charm-L exploits a vertical encoding of the database that helps mitigate the cost of the impact of the large object (a.k.a. transaction) set. Despite a diverging modus operandi, both FCA and data mining methods follow the same overall algorithmic schema: they first compute the set of con- cepts/FCIs and the precedence links between them and then use these as input in generator/FG calculation. However efficient Charm-L is, its design is far from optimal: For instance, FCI precedence is computed at FCI discovery, benefiting from no particular in- sight. Thus, many FCIs from distant parts of the search space are compared. We therefore felt that there is space for improvement, e.g., by bringing in tech- niques operating locally. An appealing track seemed to lay in the exploration of an important duality from hypergraph theory to inverse the computation depen- dencies between individual tasks (and thus define a new overall workflow). To clarify this point, we chose to investigate a less intertwined algorithmic schema, i.e. by a modular design so that each individual task could be targeted by the best among a pool of alternative methods. Here, we describe a first step in our study, Snow-Touch, which has been assembled from existing methods by wiring them w.r.t. our new schema. Indeed, our method relies on Charm for mining FCIs and on the vertical FG miner Talky-G, which are put together into a combined miner, Touch [13], by means of an FGs-to-FCIs matching mechanism. The Snow method [14] extracts the precedence links from FCIs and FGs. 
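The workflow just described can be summarised schematically as follows. The sketch below is our own illustration, not the Snow-Touch implementation: fcis and fgs stand for the outputs of Charm and Talky-G (treated here as black boxes returning (itemset, support) pairs), and the matching shown is one straightforward way to attach each FG to its closure; the actual Touch mechanism may proceed differently.

```python
# Schematic sketch of the modular workflow (our illustration, not the
# Snow-Touch code). `fcis` and `fgs` are lists of (itemset, support) pairs
# as they would be produced by Charm and Talky-G respectively.
def attach_generators(fcis, fgs):
    """Attach every FG to its closure, i.e. the (unique) FCI that contains
    it and has the same support -- one simple FGs-to-FCIs matching."""
    by_fci = {frozenset(c): [] for c, _ in fcis}
    for g, s_g in fgs:
        closure = next(frozenset(c) for c, s in fcis
                       if s == s_g and set(c) >= set(g))
        by_fci[closure].append(g)
    return by_fci

# Toy input: a fragment of dataset D's iceberg, using facts recalled in Sect. 2.
fcis = [("B", 3), ("ACDE", 2), ("ABCDE", 1)]
fgs = [("B", 3), ("C", 2), ("BC", 1), ("BD", 1), ("BE", 1)]
print(attach_generators(fcis, fgs))
# B -> [B], ACDE -> [C], ABCDE -> [BC, BD, BE]
```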
The pleasant surprise with Snow-Touch was that, when a Java implemen- tation thereof was experimentally compared to Charm-L (authors’ version in C++) on a wide range of data, our method prevailed on all dense datasets. This was not readily anticipated as the modular design brought a computational over- head, e.g. the extra matching step. Moreover, Snow-Touch proved to work well with real-world data, as the first steps of a large-scale analysis of genomic data indicate. Fast Mining of Iceberg Lattices:Mining A Modular Approach Iceberg LatticesUsing with Generators Generators 1933 In summary, we contribute here a novel computation schema for iceberg lattices with generators (hence a new lattice construction approach). More- over, we derive an efficient FCI/FG/precedence miner (especially on dense sets). We also demonstrate the practical usefulness of Snow-Touch as well as of the global approach for association mining based on generic rules. The remainder of the paper is as follows: Background on pattern mining, hypergraphs, and vertical pattern mining is provided in Section 2. In Section 3 we present the different modules of the Snow-Touch algorithm. Experimental evaluations are provided in Section 4 and conclusions are drawn in Section 5. 2 Background In the following, we summarize knowledge about relevant structures from fre- quent pattern mining and hypergraph theory (with parallels drawn with similar notions from FCA) as well as about efficient methods for mining them. 2.1 Basic facts from pattern mining and concept analysis In pattern mining, the input is a database (comparable to an FCA formal con- text). Records in the database are called transactions (alias objects), denoted here O = {o1 , o2 , . . . , om }. A transaction is basically subsets of a given total set of items (alias attributes), denoted here A = {a1 , a2 , . . . , an }. Except for its itemset, a transaction is explicitly identified through a unique identifier, a tid (a set of identifiers is thus called a tidset). Throughout this paper, we shall use the following database as a running example (the “dataset D”): D = {(1, ACDE), (2, ABCDE), (3, AB), (4, D), (5, B)}. The standard 0 derivation operators from FCA are denoted differently in this context. Thus, given an itemset X, the tidset of all transactions comprising X in their itemsets is the image of X, denoted t(X) (e.g. t(AB) = 23). We recall that an itemset of length k is called a k-itemset. Moreover, the (absolute) support of an itemset X, supp : ℘(A) → N, is supp(X) = |t(X)|. An itemset X is called fre- quent, if its support is not less than a user-provided minimum support (denoted by min supp). Recall as well that, in [X], the equivalence class of X induced by t(), the extremal elements w.r.t. set-theoretic inclusion are, respectively, the unique maximum X 00 (a.k.a. closed itemset or the concept intent), and its set of minimums, a.k.a. the generator itemsets. In data mining, an alternative defi- nition is traditionally used stating that an itemset X is closed (a generator ) if it has no proper superset (subset) with the same support. For instance, in our dataset D, B and C are generators, whose respective closures are B and ACDE. In [6], a subsumption relation is defined as well: X subsumes Z, iff X ⊃ Z and supp(X) = supp(Z). Obviously, if Z subsumes X, then Z cannot be a generator. In other words, if X is a generator, then all its subsets Y are generators as well3 . 
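The following sketch (our illustration) encodes dataset D together with the image t(X), the closure X'' and the generator test in a few lines of Python; it only reproduces facts stated above, e.g. t(AB) = 23 and that B and C are generators with respective closures B and ACDE.

```python
# Sketch (ours) of the basic notions on dataset D = {(1,ACDE), (2,ABCDE),
# (3,AB), (4,D), (5,B)}.
D = {1: set("ACDE"), 2: set("ABCDE"), 3: set("AB"), 4: set("D"), 5: set("B")}

def image(X):                       # t(X): tids of transactions containing X
    return {tid for tid, items in D.items() if set(X) <= items}

def closure(X):                     # X'' = intersection of the matching rows
    tids = image(X)                 # (full item set if X occurs nowhere)
    return set.intersection(*(D[t] for t in tids)) if tids else set("ABCDE")

def is_generator(X):                # no proper subset has the same support
    return all(len(image(set(X) - {x})) != len(image(X)) for x in X)

assert image("AB") == {2, 3}                       # supp(AB) = 2
assert closure("C") == set("ACDE")                 # C generates ACDE
assert closure("B") == set("B") and is_generator("B")
```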
Formally speaking, the generator family forms a downset within the Boolean lattice of all itemsets ⟨℘(A), ⊆⟩. 3 Please notice that the dual property holds for non-generators. Fig. 1. Concept lattices of dataset D. (a) The entire concept lattice. (b) An iceberg part of (a) with min supp = 3 (indicated by a dashed rectangle). (c) The concept lattice with generators drawn within their respective nodes. The FCI and FG families losslessly represent the family of all frequent itemsets (FIs) [15]. They jointly compose various non-redundant bases of valid association rules, e.g. the generic basis [2]. Further bases require the inclusion order ≤ between FCIs or its transitive reduction ≺, i.e. the precedence relation. In Fig. 1 (adapted from [14]), views (a) and (b) depict the concept lattice of dataset D and its iceberg part, respectively. Here we investigate the efficient computation of the three components of an association rule basis, or what could be called the generator-decorated iceberg (see Fig. 1 (c)). 2.2 Effective mining methods for FCIs, FGs, and precedence links Historically, the first algorithm computing all closures with their generators and precedence links can be found in [10] (although under a different name and in a somewhat incomplete manner). Yet the individual tasks have been addressed separately or in various combinations by a large variety of methods. First, the construction of all concepts is a classical FCA task and a large number of algorithms exist for it, using a wide range of computing strategies. Yet they scale poorly as FCI miners due to their reliance on object-wise computations (e.g. the incremental acquisition of objects as in [10]). These involve a large number of what are called data scans in data mining, which are known to seriously deteriorate performance. In fact, the overwhelming majority of FCA algorithms suffer from the same drawback, as they have been designed under the assumption that the number of objects and the number of attributes remain in the same order of magnitude. Yet in data mining, there is usually a much larger number of transactions than there are items. As to generators, they have attracted significantly less attention in FCA as a standalone structure. Precedence links, in turn, are sometimes computed by
In fact, to the best of our knowledge, the only mainstream FCI miner that also outputs the Hasse diagram of the iceberg lattice is Charm-L [8]. In order to avoid multiple data scans, Charm-L relies on a specific encoding of the transaction database, called vertical, which takes advantage of the aforementioned asymmetry between the number of transactions and the number of items. Moreover, two later extensions thereof [7,9] also cover the FGs for each FCI, making them the primary competitors of our own approach. Despite the clear discrepancies in their modus operandi, both FCA-centered algorithms and FCI/FG miners share the same overall algorithmic schema. Indeed, they first compute the set of concepts/FCIs and the precedence links between them and then use these as input for generator/FG calculation. The latter task can either be performed along a levelwise traversal of the equivalence class of a given closure, as in [8] and [10], or shaped as the computation of the minimal transversals of a dedicated hypergraph4, as in [11,12] and [9]. While such a schema may appear more intuitive from an FCA point of view (first comes the lattice, then the generators, which are seen as an "extra"), it is less natural and eventually less profitable for data mining. Indeed, while a good number of association rule bases require the precedence links in order to be constructed, FGs are used in a much larger set of such bases and may even constitute a target structure of their own (see above). Hence, a more versatile mining method would output (and compute) the precedence relation only as an option, which is not possible with the current design schema. More precisely, the less rigid order between the steps of the combined task would be: (1) FCIs, (2) FGs, and (3) precedence. This basically means that precedence needs to be computed at the end, independently of FG and FCI computations (but may rely on these structures as input). Moreover, the separation of the three steps ensures a higher degree of modularity in the design of the concrete methods following our schema: any combination of methods solving the individual tasks can be used, leaving the user with a vast choice. On the reverse side of the coin, total modularity comes with a price: if FGs and FCIs are computed separately, an extra step is necessary to match an FCI to its FGs. 4 Termed alternatively as (minimal) blockers or hitting sets. We describe hereafter a method fitting the above schema which relies exclusively on existing algorithmic techniques. These are combined into a single global procedure, called Snow-Touch, in the following manner: the FCI computation is delegated to the Charm algorithm, which is also the basis for Charm-L. FGs are extracted by our own vertical miner Talky-G. The two methods, together with an FG-to-FCI matching technique, form the Touch algorithm [13]. Finally, precedence is retrieved from FCIs with FGs by the Snow algorithm [14] using a ground duality result from hypergraph theory. In the remainder of this section we summarize the theoretical and the algorithmic background of the above methods, which are themselves briefly presented and illustrated in the next section. 2.3 Hypergraphs, transversals, and precedence in closure semi-lattices The generator computation in [11] exploits the tight interdependence between the intent of a concept, its generators and the intents of its immediate predecessor concepts.
Technically speaking, a generator is a minimal blocker for the family of faces associated to the concept intent and its predecessor intents5.

5 A face is the set-theoretic difference between the intents of two concepts bound by a precedence link.

Example. Consider the closed itemset (CI) lattice in Figure 1 (c). The CI ABCDE has two faces: F1 = ABCDE \ AB = CDE and F2 = ABCDE \ ACDE = B.

It turns out that blocker is a different term for the widely known hypergraph transversal notion. We recall that a hypergraph [18] is a generalization of a graph where edges can connect an arbitrary number of vertices. Formally, a hypergraph H is a pair (V, E) made of a basic vocabulary V = {v1, v2, ..., vn}, the vertices, and a family E of sets, the hyper-edges, all drawn from V. A set T ⊆ V is called a transversal of H if it has a non-empty intersection with all the edges of H. A special case is formed by the minimal transversals, which are exploited in [11].

Example. In the above example, the minimal transversals of {CDE, B} are {BC, BD, BE}, hence these are the generators of ABCDE (see Figure 1 (c)).

The family of all minimal transversals of H constitutes the transversal hypergraph of H, denoted Tr(H). A duality exists between a simple hypergraph and its transversal hypergraph [18]: for a simple hypergraph H, Tr(Tr(H)) = H. Thus, the faces of a concept intent are exactly the minimal transversals of the hypergraph composed of its generators.

Example. The bottom node in Figure 1 (c), labelled ABCDE, has three generators, BC, BD, and BE, while the minimal transversals of the corresponding hypergraph are {CDE, B}.

Fig. 2. Left: pre-order traversal with Eclat; Right: reverse pre-order traversal with Talky-G.

2.4 Vertical Itemset Mining

Miners from the literature, whether for plain FIs or FCIs, can be roughly split into breadth-first and depth-first ones. Breadth-first algorithms, more specifically the Apriori-like [1] ones, apply a levelwise traversal of the pattern space, exploiting the anti-monotonicity of the frequent status. Depth-first algorithms, e.g. Closet [19], in contrast, organize the search space into a prefix-tree (see Figure 2), thus factoring out the effort to process common prefixes of itemsets. Among them, the vertical miners use an encoding of the dataset as a set of pairs (item, tidset), i.e. {(i, t(i)) | i ∈ A}, which helps avoid costly database re-scans.

Eclat [20] is a plain FI miner relying on a vertical encoding and a depth-first traversal of a tree structure, called IT-tree, whose nodes are X × t(X) pairs. Eclat traverses the IT-tree in a pre-order way, from left to right [20] (see Figure 2). Charm adapts the computing schema of Eclat to the rapid construction of the FCIs [6]. It is known as one of the fastest FCI miners, hence its adoption as a component in Touch, as well as the idea to look for a similar technique for FGs. However, a vertical FG miner would be closer to Eclat than to Charm, as it requires no specific speed-up during the traversal (recall that FGs form a downset). In contrast, there is a necessary shift in the test focus w.r.t. Eclat: instead of supersets, subsets need to be examined to check candidate FGs. This, in turn, requires that all such subsets have already been tested at the moment an itemset is examined. In other terms, the IT-tree traversal order needs to be a linear extension of the ⊆ order between itemsets.
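To make the face/generator duality of Section 2.3 concrete, the following small Python sketch (our own illustration under the definitions above, not code from [11] or from Snow-Touch; the function name and the brute-force strategy are ours) computes minimal transversals of a tiny hypergraph and replays the running example.

from itertools import combinations

def minimal_transversals(edges):
    # Enumerate all minimal transversals (minimal hitting sets) of a hypergraph,
    # given as a collection of hyper-edges (sets of vertices). Brute force: test
    # candidate vertex sets by increasing size and keep the inclusion-minimal ones.
    edges = [frozenset(e) for e in edges]
    vertices = sorted(set().union(*edges))
    found = []
    for size in range(1, len(vertices) + 1):
        for cand in map(frozenset, combinations(vertices, size)):
            hits_all = all(cand & e for e in edges)
            is_minimal = not any(t <= cand for t in found)
            if hits_all and is_minimal:
                found.append(cand)
    return found

# Faces of the closed itemset ABCDE w.r.t. its predecessors AB and ACDE:
faces = [{'C', 'D', 'E'}, {'B'}]
generators = minimal_transversals(faces)
print(sorted(''.join(sorted(g)) for g in generators))                        # ['BC', 'BD', 'BE']
# Duality Tr(Tr(H)) = H: the transversals of the generators give back the faces.
print(sorted(''.join(sorted(f)) for f in minimal_transversals(generators)))  # ['B', 'CDE']

The brute-force enumeration is only practical for tiny examples like this one; an actual implementation would rely on a dedicated minimal-transversal algorithm.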
3 The Snow-Touch Algorithm

We sketch below the key components of Snow-Touch, i.e. Talky-G, Touch, and Snow.

3.1 Talky-G

Talky-G is a vertical FG miner that constructs an IT-tree in a depth-first right-to-left manner [13].

Traversal Of The Generator Search Space Traversing ℘(A) so that a given set X is processed after all its subsets induces a ⊆-complying traversal order, i.e. a linear extension of ⊆. In FCA, a similar technique is used by the Next-Closure algorithm [21]. The underlying lectic order is rooted in an implicit mapping of ℘(A) to [0 .. 2^|A| − 1], where the image of a set is the decimal value of its characteristic vector w.r.t. an arbitrary ordering rank: A ↔ [1..|A|]. The sets are then listed in increasing order of their mapping values, which represents a depth-first traversal of ℘(A). This encoding yields a depth-first right-to-left traversal (called reverse pre-order traversal in [22]) of the IT-tree representing ℘(A).

Example. See Figure 2 for a comparison between the traversal strategies in Eclat (left) and in Talky-G (right). Order-induced ranks of itemsets are drawn next to their IT-tree nodes.

The Algorithm The algorithm works the following way. The IT-tree is initialized by creating the root node and hanging a node for each frequent item below the root (with its respective tidset). Next, the nodes below the root are examined, starting from the right-most one. A 1-itemset p in such a node is an FG iff supp(p) < 100%, in which case it is saved to a dedicated list. A recursive exploration of the subtree below the current node then ensues. At the end, all FGs are contained in the IT-tree.

During the recursive exploration, all FGs from the subtree rooted in a node are mined. First, FGs are generated by "joining" the subtree's root to each of its sibling nodes lying to the right. A node is created for each of them and hung below the subtree's root. The resulting node's itemset is the union of its parents' itemsets, while its tidset is the intersection of the tidsets of its parents. Then, all the direct children of the subtree's root are processed recursively in a right-to-left order.

When two FGs are joined to form a candidate node, two cases can occur: either we obtain a new FG, or a valid FG cannot be generated from the two FGs. A candidate FG is the union of the input node itemsets, while its tidset is the intersection of the respective tidsets. It can fail the FG test either by insufficient support (it is not frequent) or by a strict FG-subset of the same support (which means that the candidate is a proper superset of an already found FG with the same support).

Fig. 3. Execution of Talky-G on dataset D with min supp = 1 (20%).

Example. Figure 3 illustrates Talky-G on dataset D with min supp = 1 (20%). The node ranks in the traversal-induced order are again indicated. The IT-tree construction starts with the root node and its children nodes: since no universal item exists in D, all items are FGs and get a node below the root. In the recursive extension step, the node E is examined first: having no right siblings, it is skipped. Node D is next: the candidate itemset DE fails the FG test since it has the same support as E. With C, both candidates CD and CE are discarded for the same reason. In a similar fashion, the only FGs in the subtree below the node of B are BC, BD, and BE. In the case of A, these are AB and AD; ABD fails the FG test because of BD.
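The candidate FG test just described (itemset union, tidset intersection, rejection by insufficient support or by an equally supported FG subset) can be sketched as follows; the toy tidsets and the function names are ours and do not reproduce the paper's dataset D.

def join_candidate(node1, node2):
    # Candidate built from two IT-tree nodes: union of itemsets, intersection of tidsets.
    return (node1[0] | node2[0], node1[1] & node2[1])

def passes_fg_test(itemset, tidset, min_supp, known_fgs):
    # A candidate fails if it is infrequent, or if an already found FG that is a
    # proper subset of it has the same support (then the candidate cannot be a
    # minimal member of its equivalence class).
    supp = len(tidset)
    if supp < min_supp:
        return False
    return not any(fg < itemset and len(fg_tids) == supp for fg, fg_tids in known_fgs)

# Toy vertical data over transactions {1, 2, 3, 4} (hypothetical, not the paper's dataset D),
# so no item is universal and every single item below is an FG.
vert = {'c': {1, 2}, 'd': {1, 2, 3}, 'e': {1, 2, 3}}
fgs = [(frozenset(i), t) for i, t in vert.items()]
cand_items, cand_tids = join_candidate((frozenset('d'), vert['d']), (frozenset('e'), vert['e']))
print(passes_fg_test(cand_items, cand_tids, 1, fgs))   # False: d (and e) is a subset of de with equal support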
Fast Subsumption Checking During the generation of candidate FGs, a subsumer itemset cannot be a generator. To speed up the subsumption computation, Talky-G adapts the hash structure of Charm for storing frequent generators together with their support values. Since tidsets are used for hashing the FGs, two equivalent itemsets get the same hash value. Hence, when looking for a potential subsumee of a candidate X, we check within the corresponding list of the hash table for FGs Y that (i) have the same support as X and, if so, (ii) are proper subsets of X (see details in [13]).

Example. The hash structure of the IT-tree in Figure 3 is drawn in Figure 4 (top right). The hash table has four entries that are lists of itemsets. The hash function over a tidset is the sum of all tids modulo 4. For instance, to check whether ABD subsumes a known FG, we take its hash key, 2 mod 4 = 2, and check the content of the list at index 2. In the list order, B is discarded for support mismatch, while BE fails the subset test. In contrast, BD succeeds in both the support and the inclusion tests, so it invalidates the candidate ABD.

3.2 The Touch Algorithm

The Touch algorithm has three main features, namely (1) extracting frequent closed itemsets, (2) extracting frequent generators, and (3) associating frequent generators to their closures, i.e. identifying frequent equivalence classes.

For the last step, our method matches FGs to their respective FCIs. To that end, it exploits the shared storage technique of Talky-G and Charm, i.e. the hashing on their images (see Figure 4 (top)). The calculation is immediate: as the hash value of an FG is the same as that of its FCI, one only needs to traverse the FG hash table and, for each itemset, look up the list of FCIs associated to its hash value. Moreover, setting both hash tables to the same size further simplifies the procedure, as both lists are then located at the same offset within their respective hash tables.

Example. Figure 4 (top) depicts the hash structures of Charm and Talky-G. Assume we want to determine the generators of ACDE, which is stored at position 3 in the hash structure of Charm. Its generators are also stored at position 3 in the hash structure of Talky-G. The list comprises three members that are subsets of ACDE with the same support: E, C, and AD. Hence, these are the generators of ACDE. The output of Touch is shown in Figure 4 (bottom).

Fig. 4. Top: hash tables for dataset D with min supp = 1. Top left: hash table of Charm containing all FCIs. Top right: hash table of Talky-G containing all FGs. Bottom: output of Touch on dataset D with min supp = 1:
FCI (supp)   FGs
AB (2)       AB
ABCDE (1)    BE; BD; BC
A (3)        A
B (3)        B
ACDE (2)     E; C; AD
D (3)        D
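The matching step can be sketched in Python as follows; this is our reading of the description above, not the authors' code. The hash on tidsets (sum of tids modulo the table size) follows the worked example, and the input data at the end are a hypothetical toy rather than the paper's dataset D.

def tid_hash(tidset, table_size):
    # Hash on the tidset: equivalent itemsets (same image) land in the same bucket.
    return sum(tidset) % table_size

def match_generators(fcis, fgs, table_size=4):
    # Store FCIs and FGs in equally sized hash tables keyed by their tidsets, then,
    # for each FCI, keep the FGs of the same bucket that are subsets with equal support.
    fci_buckets = [[] for _ in range(table_size)]
    fg_buckets = [[] for _ in range(table_size)]
    for itemset, tids in fcis:
        fci_buckets[tid_hash(tids, table_size)].append((frozenset(itemset), tids))
    for itemset, tids in fgs:
        fg_buckets[tid_hash(tids, table_size)].append((frozenset(itemset), tids))
    matching = {}
    for idx, bucket in enumerate(fci_buckets):
        for fci, fci_tids in bucket:
            matching[fci] = [fg for fg, fg_tids in fg_buckets[idx]
                             if fg <= fci and len(fg_tids) == len(fci_tids)]
    return matching

# Hypothetical toy input: (itemset, tidset) pairs for FCIs and FGs.
fcis = [('ab', {1, 4}), ('b', {1, 3, 4})]
fgs = [('a', {1, 4}), ('b', {1, 3, 4})]
print(match_generators(fcis, fgs))   # e.g. frozenset({'a', 'b'}) is mapped to [frozenset({'a'})]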
3.3 The Snow Algorithm

Snow computes precedence links on FCIs from their associated FGs [14]. Snow exploits the duality between the hypergraph made of the generators of an FCI and the hypergraph made of its faces to compute the latter as the transversals of the former. Thus, its input is made of FCIs and their associated FGs. Several algorithms can be used to produce this input, e.g. Titanic [5], A-Close [4], Zart [23], Touch, etc. Figure 4 (bottom) depicts a sample input of Snow.

On such data, Snow first computes the faces of a CI as the minimal transversals of its generator hypergraph. Next, each difference of the CI X with a face yields a predecessor of X in the closed itemset lattice.

Example. Consider again ABCDE with its generator family {BC, BD, BE}. First, we compute its transversal hypergraph: Tr({BC, BD, BE}) = {CDE, B}. The two faces F1 = CDE and F2 = B indicate that there are two predecessors of ABCDE, say Z1 and Z2, where Z1 = ABCDE \ CDE = AB and Z2 = ABCDE \ B = ACDE. Applying this procedure to all CIs yields the entire precedence relation of the CI lattice.
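A compact Python sketch of this procedure (our reading of the description above, not the authors' implementation; the brute-force transversal helper repeats the one from Section 2.3 so that the sketch stays self-contained):

from itertools import combinations

def minimal_transversals(edges):
    # Brute-force minimal hitting sets of a small hypergraph.
    edges = [frozenset(e) for e in edges]
    vertices = sorted(set().union(*edges))
    found = []
    for size in range(1, len(vertices) + 1):
        for cand in map(frozenset, combinations(vertices, size)):
            if all(cand & e for e in edges) and not any(t <= cand for t in found):
                found.append(cand)
    return found

def snow_precedence(cis_with_generators):
    # For each closed itemset, the faces are the minimal transversals of its generator
    # hypergraph, and removing a face from the CI yields one of its predecessors.
    precedence = {}
    for ci, generators in cis_with_generators.items():
        ci = frozenset(ci)
        faces = minimal_transversals(generators)
        precedence[ci] = [ci - face for face in faces]
    return precedence

# The worked example: ABCDE with generator family {BC, BD, BE}.
links = snow_precedence({'ABCDE': [{'B', 'C'}, {'B', 'D'}, {'B', 'E'}]})
print(sorted(''.join(sorted(p)) for p in links[frozenset('ABCDE')]))   # ['AB', 'ACDE']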
4 Experimental Evaluation

In this section we discuss practical aspects of our method. First, in order to demonstrate that our approach is computationally efficient, we compare its performance on a wide range of datasets to that of Charm-L. Then, we present an application of Snow-Touch to the analysis of genomic data, together with an excerpt of the most remarkable gene associations that our method helped to uncover.

4.1 Snow-Touch vs. Charm-L

We evaluated Snow-Touch against Charm-L [8,9]. The experiments were carried out on a bi-processor Intel Quad Core Xeon 2.33 GHz machine running Ubuntu GNU/Linux with 4 GB RAM. All times reported are real, wall clock times, as obtained from the Unix time command between input and output. Snow-Touch was implemented entirely in Java. For performance comparisons, the authors' original C++ source of Charm-L was used. Charm-L and Snow-Touch were executed with these options: ./charm-l -i input -s min_supp -x -L -o COUT -M 0 -n; ./leco.sh input min_supp -order -alg:dtouch -method:snow -nof2. In each case, the standard output was redirected to a file. The diffset optimization technique [24] was activated in both algorithms.6

6 Charm-L uses diffsets by default, thus no explicit parameter was required.

Benchmark datasets. For the experiments, we used several real and synthetic benchmark datasets (see Table 1). The synthetic dataset T25, produced with the IBM Almaden generator, is constructed according to the properties of market basket data. The Mushrooms database describes mushroom characteristics. The Chess and Connect datasets are derived from their respective game steps. The latter three datasets can be found in the UC Irvine Machine Learning Database Repository. Typically, real datasets are very dense, while synthetic data are usually sparse. Response times of the two algorithms on these datasets are presented in Figure 5.

Table 1. Database characteristics
database name   # records   # non-empty attributes   # attributes (in average)   largest attribute
T25I10D10K      10,000      929                      25                          1,000
Mushrooms       8,416       119                      23                          128
Chess           3,196       75                       37                          75
Connect         67,557      129                      43                          129

Charm-L. Charm-L represents a state-of-the-art algorithm for closed itemset lattice construction [8]. Charm-L extends Charm to directly compute the lattice while it generates the CIs. In the experiments, we executed Charm-L with a switch to compute (minimal) generators too, using the minhitset method. In [9], Zaki and Ramakrishnan present an efficient method for calculating the generators, which is actually the generator-computing method of Pfaltz and Taylor [25]. This way, the two algorithms (Snow-Touch and Charm-L) are comparable, since they produce exactly the same output.

Fig. 5. Response times of Snow-Touch and Charm-L (four panels: T25I10D10K, Mushrooms, Chess, Connect; minimum support (%) vs. total time (sec.), with curves for Snow-Touch and Charm-L[minhitset]).

Performance on sparse datasets. On T25, Charm-L performs better than Snow-Touch. We have to admit that sparse datasets are a bit problematic for our algorithm. The reason is that T25 produces long sparse bitvectors, which incurs some overhead for Snow-Touch (in our implementation, we use bitvectors to store tidsets). However, as can be seen in the next paragraph, our algorithm outperforms Charm-L on all the dense datasets used in our tests.

Performance on dense datasets. On Mushrooms, Chess and Connect, we can observe that Charm-L performs well only for high values of support. Below a certain threshold, Snow-Touch gives lower response times, and the gap widens as the support is lowered. When the minimum support is set low enough, Snow-Touch can be several times faster than Charm-L. Considering that Snow-Touch is implemented in Java, we believe that a good C++ implementation could be several orders of magnitude faster than Charm-L.

According to our experiments, Snow-Touch can construct the concept lattices faster than Charm-L in the case of dense datasets. From this, we draw the hypothesis that our direction towards the construction of FG-decorated concept lattices is more beneficial than the direction of Charm-L. That is, it is better to first extract the FCI/FG pairs and then determine the order relation between them, than to first extract the set of FCIs, construct the order between them, and then determine the corresponding FGs for each FCI.

4.2 Analysis of Antibiotic Resistant Genes

We looked at the practical performance of Snow-Touch on a real-world genomic dataset, where the goal was to discover meaningful associations between genes, with entire genomes and genes seen as transactions and items, respectively.

The genomic dataset was collected from the website of the National Center for Biotechnology Information (NCBI), with a focus on genes from microbial genomes. At the time of writing (June 2011), 1,518 complete microbial genomes were available on the NCBI website.7 For each genome, its list of genes was collected (for instance, the genome with ID CP002059.1 has two genes, rnpB and ssrA). Only 1,250 genomes out of the 1,518 proved non-empty; we put them in a binary matrix of 1,250 rows × 125,139 columns. With an average of 684 genes per genome, we got a density of 0.55% (i.e., a large yet sparse dataset with an imbalance between the numbers of rows and columns).

7 http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

The initial result of the mining task was the family of minimal non-redundant association rules (MNR), which are directly available from the output of Snow-Touch. We sorted them according to their confidence. Among all strong associations, the bioinformaticians involved in this study found most appealing the rules describing the behavior of antibiotic resistant genes, in particular the mecA gene. mecA is frequently found in bacterial cells.
It induces resistance to antibiotics such as methicillin, penicillin, erythromycin, etc. [26]. The most commonly known carrier of the gene mecA is the bacterium known as MRSA (methicillin-resistant Staphylococcus aureus). In a second step, we narrowed the focus to a group of three genes: mecA plus ampC and vanA [27]. ampC is a beta-lactam-resistance gene. AmpC beta-lactamases are typically encoded on the chromosome of many gram-negative bacteria; they may also occur in Escherichia coli. AmpC-type beta-lactamases may also be carried on plasmids [26]. Finally, the gene vanA is a vancomycin-resistance gene typically encoded on the chromosome of gram-positive bacteria such as Enterococcus. The idea was to relate the presence of these three genes to the presence or absence of any other gene or a combination thereof.

Table 2 shows an extract of the most interesting rules found by our algorithm. These rules were selected from a set of 18,786 rules.

Table 2. An extract of the generated minimal non-redundant association rules. After each rule, the following measures are indicated: support, confidence, support of the left-hand side (antecedent), and support of the right-hand side (consequent).
(1) {clpX, dnaA, dnaI, dnaK, gyrB, hrcA, pyrF} → {mecA} (supp=96 [7.68%]; conf=0.857 [85.71%]; suppL=112 [8.96%]; suppR=101 [8.08%])
(2) {clpX, dnaA, dnaI, dnaK, nusG} → {mecA} (supp=96 [7.68%]; conf=0.835 [83.48%]; suppL=115 [9.20%]; suppR=101 [8.08%])
(3) {clpX, dnaA, dnaI, dnaJ, dnaK} → {mecA} (supp=96 [7.68%]; conf=0.828 [82.76%]; suppL=116 [9.28%]; suppR=101 [8.08%])
(4) {clpX, dnaA, dnaI, dnaK, ftsZ} → {mecA} (supp=96 [7.68%]; conf=0.828 [82.76%]; suppL=116 [9.28%]; suppR=101 [8.08%])
(5) {clpX, dnaA, dnaI, dnaK} → {mecA} (supp=97 [7.76%]; conf=0.815 [81.51%]; suppL=119 [9.52%]; suppR=101 [8.08%])
(6) {greA, murC, pheS, rnhB, ruvA} → {ampC} (supp=99 [7.92%]; conf=0.227 [22.71%]; suppL=436 [34.88%]; suppR=105 [8.40%])
(7) {murC, pheS, pyrB, rnhB, ruvA} → {ampC} (supp=99 [7.92%]; conf=0.221 [22.15%]; suppL=447 [35.76%]; suppR=105 [8.40%])
(8) {dxs, hemA} → {vanA} (supp=29 [2.32%]; conf=0.081 [8.15%]; suppL=356 [28.48%]; suppR=30 [2.40%])
(9) {dxs} → {vanA} (supp=30 [2.40%]; conf=0.067 [6.73%]; suppL=446 [35.68%]; suppR=30 [2.40%])

For instance, rule (1) in Table 2 says that the gene mecA is present in 85.71% of the cases in which the set of genes {clpX, dnaA, dnaI, dnaK, gyrB, hrcA, pyrF} is present in a genome. The above rules have a direct practical use. In one such scenario, they could be used to suggest which antibiotic should be taken by a patient, depending on the presence or absence of certain genes in the infecting microbe.

5 Conclusion

We presented a new design schema for the task of mining the iceberg lattice and the corresponding generators out of a large context. The target structure is directly involved in the construction of a number of association rule bases and is hence of certain importance in the data mining field. While previously published algorithms follow the same schema, i.e., construction of the iceberg lattice (FCIs plus precedence links) followed by the extraction of the FGs, our approach consists in inferring the precedence links from the previously mined FCIs with their FGs.
We presented an initial and straightforward instantiation of the new algorithmic schema that reuses existing methods for the three steps: the popular Charm FCI miner, our own method for FG extraction, Talky-G (plus an FGs-to-FCIs matching procedure), and the Hasse diagram constructor Snow. The resulting iceberg-plus-FGs miner, Snow-Touch, is far from an optimal algorithm, in particular due to redundancies in the first two steps. Yet an implementation thereof within the Coron platform (in Java) has managed to outperform its natural competitor, Charm-L (in C++), on a wide range of datasets, especially on dense ones. To level the playing field, we are currently re-implementing Snow-Touch in C++ and expect the new version to be even more efficient.

In a different vein, we have tested the capacity of our approach to support practical mining tasks by applying it to the analysis of genomic data. While a large number of associations usually come out of such datasets, many of them redundant with respect to each other, by limiting the output to only the generic ones, our method helped focus the analysts' attention on a smaller number of significant rules.

As a next step, we are studying a more integrated approach for FCI/FG construction that requires no extra matching step. This should result in substantial efficiency gains. On the methodological side, our study underlines the duality between generators and order w.r.t. FCIs: either can be used in combination with the FCIs to yield the other one. It raises the natural question of whether FCIs alone, which are output by a range of frequent pattern miners, could be used to efficiently retrieve first the precedence, and then the FGs.

References

1. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules in Large Databases. In: Proc. of the 20th Intl. Conf. on Very Large Data Bases (VLDB '94), San Francisco, CA, Morgan Kaufmann (1994) 487–499
2. Kryszkiewicz, M.: Concise Representations of Association Rules. In: Proc. of the ESF Exploratory Workshop on Pattern Detection and Discovery (2002) 92–109
3. Bastide, Y., Taouil, R., Pasquier, N., Stumme, G., Lakhal, L.: Mining Minimal Non-Redundant Association Rules Using Frequent Closed Itemsets. In: Proc. of Computational Logic (CL '00). Volume 1861 of LNAI, Springer (2000) 972–986
4. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering Frequent Closed Itemsets for Association Rules. In: Proc. of the 7th Intl. Conf. on Database Theory (ICDT '99), Jerusalem, Israel (1999) 398–416
5. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing Iceberg Concept Lattices with Titanic. Data and Knowl. Eng. 42(2) (2002) 189–222
6. Zaki, M.J., Hsiao, C.J.: CHARM: An Efficient Algorithm for Closed Itemset Mining. In: SIAM Intl. Conf. on Data Mining (SDM '02) (Apr 2002) 33–43
7. Zaki, M.J.: Mining Non-Redundant Association Rules. Data Mining and Knowledge Discovery 9(3) (2004) 223–248
8. Zaki, M.J., Hsiao, C.J.: Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure. IEEE Trans. on Knowl. and Data Eng. 17(4) (2005) 462–478
9. Zaki, M.J., Ramakrishnan, N.: Reasoning about Sets using Redescription Mining. In: Proc. of the 11th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (KDD '05), Chicago, IL, USA (2005) 364–373
10. Godin, R., Missaoui, R.: An incremental concept formation approach for learning from databases.
Theoretical Computer Science Journal (133) (1994) 387–419
11. Pfaltz, J.L.: Incremental Transformation of Lattices: A Key to Effective Knowledge Discovery. In: Proc. of the First Intl. Conf. on Graph Transformation (ICGT '02), Barcelona, Spain (Oct 2002) 351–362
12. Le Floc'h, A., Fisette, C., Missaoui, R., Valtchev, P., Godin, R.: JEN: un algorithme efficace de construction de générateurs pour l'identification des règles d'association. Nouvelles Technologies de l'Information 1(1) (2003) 135–146
13. Szathmary, L., Valtchev, P., Napoli, A., Godin, R.: Efficient Vertical Mining of Frequent Closures and Generators. In: Proc. of the 8th Intl. Symposium on Intelligent Data Analysis (IDA '09). Volume 5772 of LNCS, Lyon, France, Springer (2009) 393–404
14. Szathmary, L., Valtchev, P., Napoli, A., Godin, R.: Constructing Iceberg Lattices from Frequent Closures Using Generators. In: Discovery Science. Volume 5255 of LNAI, Budapest, Hungary, Springer (2008) 136–147
15. Calders, T., Rigotti, C., Boulicaut, J.F.: A Survey on Condensed Representations for Frequent Sets. In Boulicaut, J.F., Raedt, L.D., Mannila, H., eds.: Constraint-Based Mining and Inductive Databases. Volume 3848 of Lecture Notes in Computer Science, Springer (2004) 64–80
16. Baixeries, J., Szathmary, L., Valtchev, P., Godin, R.: Yet a Faster Algorithm for Building the Hasse Diagram of a Galois Lattice. In: Proc. of the 7th Intl. Conf. on Formal Concept Analysis (ICFCA '09). Volume 5548 of LNAI, Darmstadt, Germany, Springer (May 2009) 162–177
17. Pasquier, N.: Mining association rules using formal concept analysis. In: Proc. of the 8th Intl. Conf. on Conceptual Structures (ICCS '00), Shaker-Verlag (Aug 2000) 259–264
18. Berge, C.: Hypergraphs: Combinatorics of Finite Sets. North Holland, Amsterdam (1989)
19. Pei, J., Han, J., Mao, R.: CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2000) 21–30
20. Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New Algorithms for Fast Discovery of Association Rules. In: Proc. of the 3rd Intl. Conf. on Knowledge Discovery in Databases (August 1997) 283–286
21. Ganter, B., Wille, R.: Formal concept analysis: mathematical foundations. Springer, Berlin/Heidelberg (1999)
22. Calders, T., Goethals, B.: Depth-first non-derivable itemset mining. In: Proc. of the SIAM Intl. Conf. on Data Mining (SDM '05), Newport Beach, USA (Apr 2005)
23. Szathmary, L., Napoli, A., Kuznetsov, S.O.: ZART: A Multifunctional Itemset Mining Algorithm. In: Proc. of the 5th Intl. Conf. on Concept Lattices and Their Applications (CLA '07), Montpellier, France (Oct 2007) 26–37
24. Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: Proc. of the 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (KDD '03), New York, NY, USA, ACM Press (2003) 326–335
25. Pfaltz, J.L., Taylor, C.M.: Scientific Knowledge Discovery through Iterative Transformation of Concept Lattices. In: Proc. of the SIAM Workshop on Data Mining and Discrete Mathematics, Arlington, VA, USA (2002) 65–74
26. Philippon, A., Arlet, G., Jacoby, G.A.: Plasmid-Determined AmpC-Type β-Lactamases. Antimicrobial Agents and Chemotherapy 46(1) (2002) 1–11
27. Schwartz, T., Kohnen, W., Jansen, B., Obst, U.: Detection of antibiotic-resistant bacteria and their resistance genes in wastewater, surface water, and drinking water biofilms.
Microbiology Ecology 43(3) (2003) 325–335

Boolean factors as a means of clustering of interestingness measures of association rules*

Radim Belohlavek1, Dhouha Grissa2,4,5, Sylvie Guillaume3,4, Engelbert Mephu Nguifo2,4, Jan Outrata1

1 Data Analysis and Modeling Lab, Department of Computer Science, Palacky University, Olomouc, 17. listopadu 12, CZ-77146 Olomouc, Czech Republic
radim.belohlavek@acm.org, jan.outrata@upol.cz
2 Clermont Université, Université Blaise Pascal, LIMOS, BP 10448, F-63000 Clermont-Ferrand, France
3 Clermont Université, Université d'Auvergne, LIMOS, BP 10448, F-63000 Clermont-Ferrand, France
4 CNRS, UMR 6158, LIMOS, F-63173 Aubiére, France
5 URPAH, Département d'Informatique, Faculté des Sciences de Tunis, Campus Universitaire, 1060 Tunis, Tunisie
dgrissa@isima.fr, guillaum@isima.fr, mephu@isima.fr

Abstract. Measures of interestingness play a crucial role in association rule mining. An important methodological problem is to provide a reasonable classification of the measures. Several papers have appeared on this topic. In this paper, we explore Boolean factor analysis, which uses formal concepts corresponding to classes of measures as factors, for the purpose of classification, and compare the results to the previous approaches.

* We acknowledge support by the ESF project No. CZ.1.07/2.3.00/20.0059 (the project is co-financed by the European Social Fund and the state budget of the Czech Republic; R. Belohlavek), by Grant No. 202/10/P360 of the Czech Science Foundation (J. Outrata), and by Grant No. 11G1417 of the French-Tunisian cooperation PHC Utique (D. Grissa).

© 2011 by the paper authors. CLA 2011, pp. 207–222. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France.

1 Introduction

An important problem in extracting association rules, well known since the early stage of association rule mining [32], is the possibly huge number of rules extracted from data. A general way of dealing with this problem is to define the concept of rule interestingness: only association rules that are considered interesting according to some measure are presented to the user. The most widely used measures of interestingness are based on the concepts of support and confidence. However, the suitability of these measures for extracting interesting rules was challenged by several studies, see e.g. [34]. Consequently, several other interestingness measures of association rules were proposed, see e.g. [35], [23], [12], [38]. With the many existing measures of interestingness arises the problem of selecting an appropriate one.

To understand better the behavior of various measures, several studies of the properties of measures of interestingness appeared, see e.g. [12], [27], [23], [16]. Those studies explore various properties of the measures that are considered important. For example, Vaillant et al. [37] evaluated twenty interestingness measures according to eight properties. To facilitate the choice of a user-adapted interestingness measure, the authors applied clustering methods to the decision matrix and obtained five clusters. Tan et al. [35] studied twenty-one interestingness measures through eight properties and showed that no measure is adapted to all cases. To select the best interestingness measure, they use both support-based pruning and standardization methods. By applying a new clustering approach, Huynh et al.
[21] classified thirty-four interestingness measures with a correlation analysis. Geng and Hamilton [12] made a survey of thirty-eight interestingness measures for rules and summaries with eleven properties and gave strategies to select the appropriate measures. D. R. Feno [10] evaluated fifteen interestingness measures with thirteen properties to describe their behaviour. Delgato et al. [9] provided a new study of the interestingness measures by means of the logical model. In addition, the authors proposed and justified the addition of two new principles to the three proposed by Piatetsky-Shapiro [32]. Finally, Heravi and Zaiane [22] studied fifty-three objective measures for associative classification rules according to sixteen properties and explained that no single measure can be introduced as an obvious winner.

The assessment of measures according to their properties results in a measure-property binary matrix. Two studies of this matrix were conducted. Namely, [17] describes how FCA can highlight interestingness measures with similar behavior in order to help the user during his choice. [16] and [14] attempted to find natural clusters of measures using widely used clustering methods, the agglomerative hierarchical method (AHC) and the K-means method. A common feature of these methods is that they only produce disjoint clusters of measures. On the other hand, one could naturally expect overlapping clusters.

The aim of this paper is to explore the possibility of obtaining overlapping clusters of measures using factor analysis of binary data and to compare the results with the results of other studies. In particular, we use the recently developed method from [3] and take the discovered factors for clusters. The method uses formal concepts as factors, which makes it possible to interpret the factors easily.

2 Preliminaries

2.1 Binary (Boolean) data

Let X be a set of objects (such as a set of customers, a set of functions, or the like) and Y be a set of attributes (such as a set of products that customers may buy, or a set of properties of functions). The information about which objects have which attributes may formally be represented by a binary relation I between X and Y, i.e. I ⊆ X × Y, and may be visualized by a table (matrix) that contains 1s and 0s, according to whether the object corresponding to a row has the attribute corresponding to a column (for this we suppose that some orders of objects and attributes are fixed). We denote the entries of such a matrix by I_xy. Data of this type are called binary data (or Boolean data). The triplet ⟨X, Y, I⟩ is called a formal context in FCA, but other terms are used in other areas. This type of data appears in two roles in our paper. First, association rules, whose interestingness measures we analyze, are certain dependencies over binary data. Second, the information we have about the interestingness measures of association rules is in the form of binary data: the objects are interestingness measures and the attributes are their properties.

2.2 Association rules

An association rule [36] over a set Y of attributes is a formula

A ⇒ B    (1)

where A and B are sets of attributes from Y, i.e. A, B ⊆ Y. Let ⟨X, Y, I⟩ be a formal context. A natural measure of interestingness of association rules is based on the notions of confidence and support.
The confidence and support of an association rule A ⇒ B in ⟨X, Y, I⟩ are defined by

conf(A ⇒ B) = |A↓ ∩ B↓| / |A↓|   and   supp(A ⇒ B) = |A↓ ∩ B↓| / |X|,

where C↓, for C ⊆ Y, is defined by C↓ = {x ∈ X | for each y ∈ C: ⟨x, y⟩ ∈ I}. An association rule is considered interesting if its confidence and support exceed some user-specified thresholds. However, the support-confidence approach reveals some weaknesses. Often, this approach, as well as the algorithms based on it, leads to the extraction of an exponential number of rules, which makes validation by an expert impossible. In addition, a disadvantage of the support is that many potentially interesting rules have a low support value and can therefore be eliminated by the pruning threshold minsupp. To address this problem, many other measures of interestingness have been proposed in the literature [13], mainly because they are effective for mining potentially interesting rules and capture some aspects of user interest. The most important of those measures are subject to our analysis and are surveyed in Section 3.1. Note that association rules are attributed to [1]. However, the concept of association rule itself, as well as various measures of interestingness, are particular cases of what is investigated in depth in [18], a book that develops logico-statistical foundations of the GUHA method [19].

2.3 Factor analysis of binary (Boolean) data

Let I be an n × m binary matrix. The aim in Boolean factor analysis is to find a decomposition

I = A ◦ B    (2)

of I into an n × k binary matrix A and a k × m binary matrix B, with ◦ denoting the Boolean product of matrices, i.e.

(A ◦ B)_ij = max_{l=1,...,k} min(A_il, B_lj).

The inner dimension, k, in the decomposition may be interpreted as the number of factors that may be used to describe the original data. Namely, A_il = 1 if and only if the lth factor applies to the ith object, and B_lj = 1 if and only if the jth attribute is one of the manifestations of the lth factor. The factor model behind (2) has therefore the following meaning: the object i has the attribute j if and only if there exists a factor l that applies to i and for which j is one of its particular manifestations. We refer to [3] for further information and references to papers that deal with the problem of factor analysis and decompositions of binary matrices.

In [3], the following method for finding decompositions (2) with the number k of factors as small as possible has been presented. The method utilizes formal concepts of the formal context ⟨X, Y, I⟩ as factors, where X = {1, ..., n}, Y = {1, ..., m} (objects and attributes correspond to the rows and columns of I). Let F = {⟨C1, D1⟩, ..., ⟨Ck, Dk⟩} be a set of formal concepts of ⟨X, Y, I⟩, i.e. the ⟨Cl, Dl⟩ are elements of the concept lattice B(X, Y, I) [11]. Consider the n × k binary matrix A_F and the k × m binary matrix B_F defined by

(A_F)_il = 1 iff i ∈ Cl   and   (B_F)_lj = 1 iff j ∈ Dl.    (3)

Denote by ρ(I) the smallest number k, the so-called Schein rank of I, such that a decomposition of I with k factors exists. The following theorem shows that using formal concepts as factors as in (3) enables us to reach the Schein rank, i.e. is optimal [3]:

Theorem 1. For every binary matrix I, there exists F ⊆ B(X, Y, I) such that I = A_F ◦ B_F and |F| = ρ(I).

As has been demonstrated in [3], a useful feature of using formal concepts as factors is the fact that formal concepts may easily be interpreted. Namely, every factor, i.e. a formal concept ⟨Cl, Dl⟩, consists of a set Cl of objects (objects are measures of interestingness in our case) and a set Dl of attributes (properties of measures in our case). Cl contains just the objects to which all the attributes from Dl apply, and Dl contains all attributes shared by all objects from Cl. From a clustering point of view, the factors ⟨Cl, Dl⟩ may thus be seen as clusters Cl together with their descriptions by attributes from Dl. The factors thus have a natural, easy to understand meaning. Since the problem of computing the smallest set of factors is NP-hard, a greedy approximation algorithm was proposed in [3, Algorithm 2]. This algorithm is utilized below in our paper.
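A minimal Python sketch of the Boolean product (2) and of a concept-based decomposition (3) follows; the 3 × 3 context and its two formal concepts are a made-up toy, not the measure-property matrix of this paper, and the sketch only verifies the equations rather than implementing Algorithm 2 of [3].

import numpy as np

def boolean_product(A, B):
    # (A o B)_ij = max_l min(A_il, B_lj) over l = 1, ..., k  (equation (2)).
    return (A[:, :, None] & B[None, :, :]).any(axis=1).astype(int)

# A tiny 3 x 3 context (toy data).
I = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])

# Two formal concepts of this context, as (extent, intent) with 0-based indices.
concepts = [({0, 1}, {0, 1}), ({1, 2}, {1, 2})]

# Build A_F (objects x factors) and B_F (factors x attributes) as in equation (3).
A_F = np.array([[int(i in extent) for extent, _ in concepts] for i in range(3)])
B_F = np.array([[int(j in intent) for j in range(3)] for _, intent in concepts])

print(np.array_equal(boolean_product(A_F, B_F), I))   # True: two factors suffice for this I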
3 Clustering interestingness measures using Boolean factors

3.1 Measures of interestingness

In the following, we present the interestingness measures reported in the literature and recall nineteen of their most important properties proposed in the literature.

To identify interesting association rules and to enable the user to focus on what is interesting for him, about sixty interestingness measures [20], [35], [10] were proposed in the literature. All of them are defined using the following parameters: p(XY), p(X̄Y), p(XȲ) and p(X̄Ȳ), where p(XY) = n_XY/n, n_XY being the number of objects satisfying XY (the intersection of X and Y) and n the total number of objects, and X̄ is the negation of X. The following are important examples of interestingness measures:

Lift [6]: Given a rule X → Y, lift is the ratio of the probability that X and Y occur together to the product of the two individual probabilities for X and Y, i.e.

Lift(X → Y) = p(XY) / (p(X) × p(Y)).

If this value is 1, then X and Y are independent. The higher this value, the more likely that the existence of X and Y together in a transaction is not just a random occurrence, but due to some relationship between them.

Correlation coefficient [31]: Correlation is a symmetric measure evaluating the strength of the itemsets' connection. It is defined by

Correlation = (p(XY) − p(X)p(Y)) / √(p(X)p(Y)p(X̄)p(Ȳ)).

A correlation around 0 indicates that X and Y are not correlated. The lower its value, the more negatively correlated X and Y are; the higher its value, the more positively correlated they are.

Conviction [6]: Conviction is one of the measures that favor counter-examples. It is defined by

Conviction = p(X)p(Ȳ) / p(XȲ).

Conviction, which is not a symmetric measure, is used to quantify the deviation from independence. If its value is 1, then X and Y are independent.

MGK [15]: MGK is an interesting measure, which allows the extraction of negative rules. It is defined by

MGK = (p(Y/X) − p(Y)) / (1 − p(Y)), if X favours Y,
MGK = (p(Y/X) − p(Y)) / p(Y), if X disfavours Y.

It takes into account several reference situations: in the case where the rule is situated in the attractive zone (i.e. p(Y/X) > p(Y)), this measure evaluates the distance between independence and logical implication. Thus, the closer the value of MGK is to 1, the closer the rule is to the logical implication, and the closer the value of MGK is to 0, the closer the rule is to independence. In the case where the rule is located in the repulsive zone (i.e. p(Y/X) < p(Y)), MGK evaluates a distance between independence and incompatibility. Thus, the closer the value of MGK is to −1, the more similar to incompatibility the rule is; and the closer the value of MGK is to 0, the closer to independence the rule is.
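For concreteness, here is a small Python sketch computing some of the above measures from the counts they are defined on; the toy counts at the end, the function name, and the guard for the zero-denominator conviction case are ours.

from math import sqrt

def rule_measures(n, n_x, n_y, n_xy):
    # Measures of a rule X -> Y from: n transactions, n_x containing X,
    # n_y containing Y, n_xy containing both.
    p_x, p_y, p_xy = n_x / n, n_y / n, n_xy / n
    p_y_given_x = p_xy / p_x
    lift = p_xy / (p_x * p_y)
    correlation = (p_xy - p_x * p_y) / sqrt(p_x * p_y * (1 - p_x) * (1 - p_y))
    p_x_not_y = p_x - p_xy                                   # p(X and not Y)
    conviction = float('inf') if p_x_not_y == 0 else p_x * (1 - p_y) / p_x_not_y
    if p_y_given_x >= p_y:                                   # X favours Y (attractive zone)
        mgk = (p_y_given_x - p_y) / (1 - p_y)
    else:                                                    # X disfavours Y (repulsive zone)
        mgk = (p_y_given_x - p_y) / p_y
    return {'lift': lift, 'correlation': correlation, 'conviction': conviction, 'MGK': mgk}

print(rule_measures(n=100, n_x=40, n_y=50, n_xy=30))
# lift = 1.5, correlation ~ 0.41, conviction = 2.0, MGK = 0.5 for these toy counts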
As was mentioned above, several studies [35], [23], [25], [13] were reported in the literature on the various properties of interestingness measures, in order to characterize and evaluate them. The main goal of researchers in the domain is then to provide user assistance in choosing the best interestingness measure for the user's needs. For that, formal properties have been developed [32], [24], [35], [12], [4] in order to evaluate the interestingness measures and to help users understand their behavior. In the following, we present nineteen properties reported in the literature.

3.2 Properties of the measures

Figure 1 lists 19 properties of interestingness measures. The properties are described in detail in [16]; we omit details due to lack of space.

No.  Property                                                                                              Ref.
P1   Intelligibility or comprehensibility of measure                                                       [25]
P2   Easiness to fix a threshold to the rule                                                               [23]
P3   Asymmetric measure                                                                                    [35], [23]
P4   Asymmetric measure in the sense of the conclusion negation                                            [23], [35]
P5   Measure assessing in the same way X → Y and Ȳ → X̄ in the logical implication case                     [23]
P6   Measure increasing function of the number of examples or decreasing function of the number of counter-examples  [32], [23]
P7   Measure increasing function of the data size                                                          [12], [35]
P8   Measure decreasing function of the consequent/antecedent size                                         [23], [32]
P9   Fixed value a in the independence case                                                                [23], [32]
P10  Fixed value b in the logical implication case                                                         [23]
P11  Fixed value c in the equilibrium case                                                                 [5]
P12  Identified values in the attraction case between X and Y                                              [32]
P13  Identified values in the repulsion case between X and Y                                               [32]
P14  Tolerance to the first counter-example                                                                [23], [38]
P15  Invariance in case of expansion of certain quantities                                                 [35]
P16  Desired relationship between X → Y and X̄ → Y rules                                                    [35]
P17  Desired relationship between X → Y and X → Ȳ antinomic rules                                          [35]
P18  Desired relationship between X → Y and X̄ → Ȳ rules                                                    [35]
P19  Antecedent size is fixed or random                                                                    [23]
P20  Descriptive or statistical measure                                                                    [23]
P21  Discriminant measure                                                                                  [23]
Fig. 1. Interestingness measures properties.

The authors in [14] proposed an evaluation of 61 interestingness measures according to the 19 properties (P3 to P21). Properties P1 and P2 were not taken into account in this study because of their subjective character. The measures and their properties result in a binary measure-property matrix that is used for clustering the measures according to their properties. The clustering performed in [14] using the agglomerative hierarchical method and the K-means method revealed 7 clusters of measures, which will be used in the next section in a comparison with the results obtained by Boolean factor analysis applied to the same measure-property matrix.

3.3 Clustering using Boolean factors

The measure-property matrix describing interestingness measures by their properties is depicted in Figure 2. It consists of 62 measures (61 measures from [14] plus one more that has been studied recently) described by 21 properties, because the three-valued property P14 is represented by three yes-no properties P14.1, P14.2, and P14.3.
We computed the decomposition of the matrix using Algorithm 2 from [3] and obtained 28 factors (as in the case below, several of them may be disregarded as not very important; we leave the details for a full version of this paper). In addition, we extended the original 62 × 21 binary matrix by adding, for every property, its negation, and obtained a 62 × 42 binary matrix. We added the negated properties because we aim to compare the results with the two clustering methods mentioned above, in which the properties and their negations play a particular role. From the 62 × 42 matrix, we obtained 38 factors, denoted F1, ..., F38. The factors are presented in Figures 3 and 4. Figure 3 depicts the object-factor matrix describing the interestingness measures by factors; Figure 4 depicts the factor-property matrix explaining the factors by properties of measures. Factors are sorted from the most important to the least important, where the importance is determined by the number of 1s in the input measure-property matrix covered by the factor [3]. The first factors cover a large part of the matrix, while the last ones cover only a small part and may thus be omitted [3]; see the graph of the cumulative cover of the matrix by the factors in Figure 5.

Fig. 2. Input binary matrix describing interestingness measures by their properties.

Fig. 3. Interestingness measures described by factors obtained by decomposition of the input matrix from Figure 2 extended by negated properties.

Fig. 4. Factors obtained by decomposition of the input matrix from Figure 2 extended by negated properties. The factors are described in terms of the original and negated properties.

Fig. 5. Cumulative cover of the input matrix from Figure 2 extended by negated properties by the factors obtained by decomposition of the matrix (number of factors vs. cumulative cover (%)).

Fig. 6. Venn diagram of the first five factors (plus the eighth and part of the sixth and tenth to cover the whole set of measures) obtained by decomposition of the input matrix from Figure 2 extended by negated properties.

4 Interpretation and comparison to other approaches

The aim of this section is to provide an interpretation of the results described in the previous section and compare them to the results already reported in the literature, focusing mainly on [14]. As was described in the previous section, 38 factors were obtained. The first 21 of them cover 94 % of the input measure-property matrix (1s in the matrix), the first nine cover 72 %, and the first five cover 52.4 %.
Another remark is that the first ten factors cover the whole set of measures. Note first that the Boolean factors represent overlapping clusters, contrary to the clustering using the agglomerative hierarchical method and the K-means method performed in [14]. Namely, the clusterings are depicted in Figure 6, describing the Venn diagram of the first five Boolean factors (plus the eighth and part of the sixth and tenth to cover the whole set of measures), and in Figure 7, which is borrowed from [14] and describes the consensus on the classification obtained by the hierarchical and K-means clusterings. This consensus recovers the classes C1 to C7 of the extracted measures, which are common to both techniques.

Fig. 7. Classes of measures obtained by the hierarchical and K-means clusterings.

Due to lack of space, we focus on the first four factors since they cover nearly half of the matrix (45.1 %), and also because most of the measures appear at least once in these four factors.

Factor 1. The first factor F1 applies to 20 measures, see Figure 3, namely: correlation, Cohen, Pavillon, conviction, Bayes factor, Loevinger, collective strength, information gain, Goodman, interest, Klosgen, Mgk, YuleQ, relative risk, one way support, two way support, YuleY, Zhang, novelty, and odds ratio. These measures share the following 9 properties: P4, P7, P9, not P11, P12, P13, not P19, not P20, P21, see Figure 4.

Interpretation. The factor applies to measures whose evolutionary curve increases w.r.t. the number of examples and which have a fixed point in the case of independence (this makes it possible to identify the attractive and repulsive areas of a rule). The factor also applies only to descriptive and discriminant measures that are not based on a probabilistic model.

Comparison. When looking at the classification results reported in [14], F1 covers two classes from [14]: C6 and C7, which together contain 15 measures. Those classes are closely related within the dendrogram obtained with the agglomerative hierarchical clustering method used in [14]. The 5 missing measures form a class obtained with the K-means method in [14] with Euclidean distance.

Factor 2. F2 applies to 18 measures, namely: confidence, causal confidence, Ganascia, causal confirmation, descriptive confirmation, cosine, causal dependency, Laplace, least contradiction, precision, recall, support, causal confirmed confidence, Czekanowski, negative reliability, Leverage, specificity, and causal support. These measures share the following 11 properties: P4, P6, not P9, not P12, not P13, P14.2, not P15, not P16, not P19, not P20, P21.

Interpretation. The factor applies to measures whose evolutionary curve increases w.r.t. the number of examples and has a variable point in the case of independence, which implies that the attractive and repulsive areas of a rule are not identifiable. The factor also applies only to measures that are not discriminant, are indifferent to the first counter-examples, and are not based on a probabilistic model.

Comparison. F2 corresponds to two classes, C4 and C5, reported in [14]. C4 ∪ C5 contains 22 measures. The missing measures are: Jaccard, Kulczynski, examples and counter-examples rate, and Sebag. Those measures are not covered by F2 since they are not indifferent to the first counter-examples.
Factor 3. F3 applies to 10 measures, namely: coverage, dependency, weighted dependency, implication index, Jmeasure, Pearl, prevalence, Gini, variation support, and mutual information. These measures share the following 10 properties: not P6, not P8, not P10, not P11, not P13, not P14.1, not P15, not P16, not P17, not P19.

Interpretation. The factor applies to measures whose evolutionary curve does not increase w.r.t. the number of examples.

Comparison. F3 corresponds to class C3 reported in [14], which contains 8 measures. The two missing measures, variation support and Pearl, belong to the same classes obtained by both K-means and the hierarchical method. Moreover, these two missing measures are similar to those from C3 obtained by the hierarchical method, since they merge with the measures in C3 at the next level of the generated dendrogram. Here, there is a strong correspondence between the results obtained using Boolean factors and the ones reported in [14].

Factor 4. F4 applies to 9 measures, namely: confidence, Ganascia, descriptive confirmation, IPEE, IP3E, Laplace, least contradiction, Sebag, and examples and counter-examples rate. These measures share the following 12 properties: P3, P4, P6, P11, not P7, not P8, not P9, not P12, not P13, not P15, not P16, not P18.

Interpretation. The factor applies to measures whose evolutionary curve increases w.r.t. the number of examples and has a fixed value in the equilibrium case. As there is no fixed value in the independence case, we cannot obtain an identifiable area in the case of attraction or repulsion.

Comparison. F4 mainly applies to measures of class C5 obtained in [14]. The two missing measures, IPEE and IP3E, belong to a different class.

5 Conclusions and further issues

We demonstrated that Boolean factors provide us with clearly interpretable, meaningful clusters of measures, among which the first ones are highly similar to other clusters of measures reported in the literature. Contrary to other clustering methods, Boolean factors represent overlapping clusters. We consider this an advantage because overlapping clusters are a natural phenomenon in human classification. We presented preliminary results on clustering the measures using Boolean factors. Due to limited scope, we presented only parts of the results obtained and leave other results for a full version of this paper.

An interesting feature of the presented method, to be explored in the future, is that the method need not start from scratch. Rather, one or more clusters that are considered important classes of measures may be supplied at the start, and the method may be asked to complete the clustering. Another issue left for future research is the benefit of the clustering of measures for a user who is interested in selecting a type of measure, rather than a particular measure of interestingness of association rules. In the intended scenario, a user may use various interestingness measures that belong to different classes of measures.

References

1. Agrawal R., Imielinski T., Swami A.: Mining association rules between sets of items in large databases. Proc. ACM SIGMOD 1993, 207–216.
2. Agrawal R., Srikant R.: Fast algorithms for mining association rules. Proc. VLDB Conf. 1994, 478–499.
3. Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. J. of Computer and System Sciences 76(1)(2010), 3–20.
4. Blanchard J., Guillet F., Briand H., Gras R.: Assessing rule with a probabilistic measure of deviation from equilibrium. In Proc. of 11th International Symposium on Applied Stochastic Models and Data Analysis ASMDA 2005, Brest, France, 191–200.
5. Blanchard J., Guillet F., Briand H., Gras R.: IPEE: Indice Probabiliste d'Écart à l'Équilibre pour l'évaluation de la qualité des règles. Dans l'Atelier Qualité des Données et des Connaissances 2005, 26–34.
6. Brin S., Motwani R., Silverstein C.: Beyond Market Baskets: Generalizing Association Rules to Correlations. In Proc. of the ACM SIGMOD Conference, Tucson, Arizona, 1997, 265–276.
7. Carpineto C., Romano G.: Concept Data Analysis. Theory and Applications. J. Wiley, 2004.
8. Davey B. A., Priestley H.: Introduction to Lattices and Order. Cambridge University Press, Oxford, 1990.
9. Delgado M., Ruiz D.-L., Sanchez D.: Studying Interest measures for association rules through a logical model. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 18(1)(2010), World Scientific, 87–106.
10. Feno D.R.: Mesures de qualité des règles d'association: normalisation et caractérisation des bases. PhD thesis, Université de La Réunion, 2007.
11. Ganter B., Wille R.: Formal Concept Analysis. Mathematical Foundations. Springer, Berlin, 1999.
12. Geng L., Hamilton H. J.: Choosing the Right Lens: Finding What is Interesting in Data Mining. Quality Measures in Data Mining 2007, ISBN 978-3-540-44911-9, 3–24.
13. Geng L., Hamilton H. J.: Interestingness measures for data mining: A Survey. ACM Comput. Surveys 38(3)(2006), 1–31.
14. Guillaume S., Grissa D., Mephu Nguifo E.: Catégorisation des mesures d'intérêt pour l'extraction des connaissances. Revue des Nouvelles Technologies de l'Information, 2011, to appear (previously available as Technical Report RR-10-14, LIMOS, ISIMA, 2010).
15. Guillaume S.: Traitement des données volumineuses. Mesures et algorithmes d'extraction des règles d'association et règles ordinales. PhD thesis, Université de Nantes, France, 2000.
16. Guillaume S., Grissa D., Mephu Nguifo E.: Propriétés des mesures d'intérêt pour l'extraction des règles. Dans l'Atelier Qualité des Données et des Connaissances, EGC'2010, 2010, Hammamet-Tunisie, http://qdc2010.lri.fr/fr/actes.php, 15–28.
17. Grissa D., Guillaume S., Mephu Nguifo E.: Combining Clustering techniques and Formal Concept Analysis to characterize Interestingness Measures. CoRR abs/1008.3629, 2010.
18. Hájek P., Havránek T.: Mechanizing Hypothesis Formation. Springer, 1978.
19. Hájek P., Holeňa M., Rauch J.: The GUHA method and its meaning for data mining. J. Computer and System Sciences 76(2010), 34–48.
20. Hilderman R. J., Hamilton H. J.: Knowledge Discovery and Measures of Interest, Volume 638 of The International Series in Engineering and Computer Science 81(2)(2001), Kluwer.
21. Huynh X.-H., Guillet F., Briand H.: Clustering Interestingness Measures with Positive Correlation. ICEIS (2) (2005), 248–253.
22. Heravi M. J., Zaïane O. R.: A study on interestingness measures for associative classifiers. SAC (2010), 1039–1046.
23. Lallich S., Teytaud O.: Évaluation et validation de mesures d'intérêt des règles d'association. RNTI-E-1, numéro spécial 2004, 193–217.
24. Lenca P., Meyer P., Picouet P., Vaillant B., Lallich S.: Critères d'évaluation des mesures de qualité en ecd. Revue des Nouvelles Technologies de l'Information (Entreposage et Fouille de données) (1)(2003), 123–134.
25. Lenca P., Meyer P., Vaillant B., Lallich S.: A multicriteria decision aid for interestingness measure selection. Technical Report LUSSI-TR-2004-01-EN, Dpt. LUSSI, ENST Bretagne, 2004 (chapter 1).
26. Liu J., Mi J.-S.: A novel approach to attribute reduction in formal concept lattices. RSKT 2006, Lecture Notes in Artificial Intelligence 4062 (2006), 522–529.
27. Maddouri M., Gammoudi J.: On Semantic Properties of Interestingness Measures for Extracting Rules from Data. Lecture Notes in Computer Science 4431 (2007), 148–158.
28. Maier D.: The Theory of Relational Databases. Computer Science Press, Rockville, 1983.
29. Pawlak Z.: Rough sets. Int. J. Information and Computer Sciences 11(5)(1982), 341–356.
30. Pawlak Z.: Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer, Dordrecht, 1991.
31. Pearson K.: Mathematical contributions to the theory of evolution, regression, heredity and panmixia. Philosophical Trans. of the Royal Society A (1896).
32. Piatetsky-Shapiro G.: Discovery, Analysis and Presentation of Strong Rules. In G. Piatetsky-Shapiro & W.J. Frawley, editors: Knowledge Discovery in Databases. AAAI Press, 1991, 229–248.
33. Polkowski L.: Rough Sets: Mathematical Foundations. Springer, 2002.
34. Sese J., Morishita S.: Answering the most correlated n association rules efficiently. In Proceedings of the 6th European Conf. on Principles of Data Mining and Knowledge Discovery 2002, Springer-Verlag, 410–422.
35. Tan P.-N., Kumar V., Srivastava J.: Selecting the right objective measure for association analysis. Information Systems 29(4)(2004), 293–313.
36. Tan P.-N., Steinbach M., Kumar V.: Introduction to Data Mining. Addison-Wesley, 2005.
37. Vaillant B., Lenca P., Lallich S.: A Clustering of Interestingness Measures. DS'04, the 7th International Conference on Discovery Science, LNAI 3245 (2004), 290–297.
38. Vaillant B.: Mesurer la qualité des règles d'association: études formelles et expérimentales. PhD thesis, ENST Bretagne, 2006.
39. Wang X., Ma J.: A novel approach to attribute reduction in concept lattices. RSKT 2006, Lecture Notes in Artificial Intelligence 4062 (2006), 522–529.
40. Wille R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival I.: Ordered Sets. Reidel, Dordrecht, Boston, 1982, 445–470.
41. Zhang W.-X., Wei L., Qi J.-J.: Attribute reduction in concept lattices based on discernibility matrix. RSFDGrC 2005, Lecture Notes in Artificial Intelligence 3642 (2005), 157–165.

Combining Formal Concept Analysis and Translation to Assign Frames and Thematic Role Sets to French Verbs

Ingrid Falk¹, Claire Gardent²
¹ INRIA/Nancy Universités, Nancy (France)
² CNRS/LORIA, Nancy (France)

Abstract. We present an application of Formal Concept Analysis in the domain of Natural Language Processing: We give a general overview of the framework, describe its goals, the data it is based on, the way it works, and we illustrate the kind of data we expect as a result. More specifically, we examine the ability of the stability, separation and probability indices to select the most relevant concepts with respect to our FCA application. We show that the sum of stability and separation gives results close to those obtained when using the entire lattice.

1 Introduction

Ideally, natural language processing (NLP) applications need to analyse texts to answer the question of "Who did What to Whom".
For computers to effectively extract this information from texts, it is essential that they be able to detect the events that are being described and the event participants. Because events are mostly lexicalised using verbs, one ingredient that is essential for such systems is detailed knowledge about the syntactic and semantic behaviour of verbs. It has been shown (Briscoe and Carroll (1993), Carroll and Fang (2004)) that detailed subcategorisation information (that is, information about the number and the syntactic type of verb complements) is crucial in enhancing their linguistic coverage and theoretical accuracy. However, this syntactic information is not sufficient to specify "Who did What to Whom" because it does not make it possible to identify the thematic roles participating in the event described by the verb. For example, in John threw a ball to Mary, the syntactic analysis of the sentence would not allow us to identify John, the syntactic subject of the sentence, as the Agent or Causer of the throwing event, Mary, syntactically the prepositional object, as the Destination, and ball (the object) as the item being thrown.

To help computer systems in this task of understanding and representing the full meaning of a text, verb classifications have been proposed which group together verbs with similar syntactic and semantic behaviour, i.e. which associate groups of verbs with subcategorisation frames showing the syntactic constructions the verbs may appear in and with sets of thematic roles which represent the participants in an event described by the verbs in the group.

For English, there exist several large scale resources providing verb classes (e.g. FrameNet (Baker et al. (1998)) and VerbNet (Schuler (2006)), the classification we use in our framework) in a format that is amenable for use by natural language processing systems. For example, for the verb throw the corresponding VerbNet class shows that the participants in a throwing event are an Agent, a Theme (the thing being thrown), a Source and a Destination. In addition, the VerbNet class provides the syntactic constructions the verb can occur in (e.g. Subject(John) V(throws) Object(a ball) PrepObject(to Mary)) and shows how the participant roles can be realised as syntactic arguments: in the example above the Agent (John) is realised syntactically as Subject, the Theme (the ball) as Object and the Destination (to Mary) as prepositional object (PrepObject).

For French however, existing verb classes are either too restricted in scope (Volem, Saint-Dizier (1999)) or not sufficiently structured (the LADL tables, Gross (1975)) to be directly useful for NLP. Even though other large coverage syntactic-semantic resources for French have recently been made available (Tolone (2011), as well as further processed versions of Dubois and Dubois-Charlier (1997), Hadouche and Lapalme (2010)), the terminology and linguistic formalisms they are based on are often still hardly compatible with the methods and tools currently used in the NLP community.
In this paper we present a method for providing a VerbNet style classification of French verbs which associates verbs with syntactic constructions on the one hand and with semantic role sets (the set of semantic roles participating in the event described by the verb) on the other. To obtain this classification, we build and combine two independent classifications. The first is semantic and is obtained from the English VerbNet (VN) by translation; the second is syntactic and is obtained by building an FCA (Formal Concept Analysis) lattice from three manually validated syntactic lexicons for French. The first associates groups of French verbs with the semantic roles of the English VN class. The second associates groups of French verbs (the concept extent) with syntactic constructions (the concept intent). We then merge both classifications by associating with each translated VN class the FCA concept whose verb set yields the best F-measure with respect to the verb sets contained in each translated VN class. We thus effectively associate the set of semantic roles of the VN class with the group of French verbs and the syntactic information given by the FCA concept.

In the past, several linguistic FCA applications have been presented, as Priss (2005) shows in her overview. For example, Sporleder (2002) describes an FCA based approach to build structured class hierarchies starting from unstructured lexicon entries, while the features used for building classes in the approach presented in (Cimiano et al., 2003) are collected from a corpus. Our approach (based on earlier work presented in Falk et al. (2010), Falk and Gardent (2010)) is concerned with building a lexical resource based on lexicons and is therefore related to the FCA approach in (Sporleder, 2002). However, the features we use are different. In addition we explore the use of concept selection indices to filter the concept lattices and finally relate the formal concepts we obtain to other classes obtained by a clustering approach based on different numeric features extracted from lexicons and English-French dictionaries.

In the following we first introduce the terminology and data used in our application domain. Next we describe how we associate groups of French verbs with syntactic information using Formal Concept Analysis (Section 3). As the resulting concept lattice has a very large number of concepts which are mostly not useful verb classes, we explore methods to select the concepts most relevant to our application (Section 4). We show in particular that selecting only ∼ 10% of the concepts of the lattice using the indices proposed in Klimushkin et al. (2010) gives results close to those obtained when using the entire lattice. We then show how we build the translated VerbNet classes and how they are mapped to the previously pre-selected FCA concepts (Section 5). Finally, in Section 6 we present the kind of associations we obtain with our method.

2 Linguistic Concepts and Resources

Our aim is to build a lexicon associating groups of French verbs with: 1) the syntactic constructions the verbs of this group may appear in, 2) the semantic roles participating in an event described by a verb of this group. Syntactic constructions a verb may occur in are described using subcategorisation frames (SCF) and are usually part of a lexical entry describing the verb.
A subcategorisation frame (SCF) characterises the number and the type of the syntactic arguments expected by a verb. Each frame describes a set of syntactic arguments, and each argument is characterised by a grammatical function (e.g. SUJ - subject, OBJ - direct object, etc.) and a syntactic category (NP indicates a noun phrase, PP a prepositional phrase, etc.). For example, John throws a ball to Mary. is a possible realisation of the subcategorisation frame SUJ:NP V OBJ:NP POBJ:PP.

The semantic (thematic) roles are the participants in an event described by a particular verb. To date there is no consensus about a set of semantic roles or a set of tests determining them. There may be general agreement on a set of Semantic Roles (e.g. Agent, Patient, Theme, Instrument, Location, etc.) but there is substantial disagreement on when and where they can be assigned (Palmer et al., 2010). Thus each of the well known resources providing semantic role information (FrameNet (Baker et al., 1998), PropBank (Palmer et al., 2005), VerbNet (Schuler, 2006), LVF (Dubois and Dubois-Charlier, 1997)) has its own semantic role inventory. In our work we chose the VerbNet semantic role inventory for several reasons:
1. VN semantic roles provide a compromise between generalisation and specificity in that they are common across all verbs (in contrast to FrameNet (Baker et al., 1998) and PropBank (Palmer et al., 2005) roles) but are still able to capture specificities of particular classes.
2. VN roles are among those generally agreed upon in the community.
3. None of the other resources provide the link between syntactic arguments and semantic roles across different verbs.
4. Semantic roles are expected to be valid across languages, and by using the same role inventory as for English we hope to leverage some of the substantial research done for English and link syntactic information for French with semantic information provided by the English classes.
Our method allows us to detect groups of French verbs with the same role set as some English VerbNet class and gives information about how these semantic roles are realised syntactically in French. Figure 1 shows an excerpt of the throw-17.1 VerbNet class, with its verbs, thematic roles and subcategorisation frames.

verbs (32): kick, launch, throw, tip, toss, ...
sem. roles: Agent, Theme, Source, Destination
frames (8):
  SCF                            sem. roles
  Subject V Object               Agent V Theme                (John throws a ball)
  Subject V Object PrepObject    Agent V Theme Destination    (John throws a ball to Mary)
  Subject V Object Object        Agent V Destination Theme    (John throws Mary a ball)
  etc.
Fig. 1: Simplified VerbNet class throw-17.1.

Thus, from this data an English NLP system analysing the sentence John threw a ball to Mary could infer the semantic roles involved in the event, namely those given by the VerbNet class. It could also detect the possible semantic roles realised by the syntactic arguments: it would know that the subject is a realisation of the Agent semantic role, the object of the Theme or Destination semantic roles, etc.

3 Associating French Verbs with Subcategorisation Frames

To associate French verbs with syntactic frames, we use the FCA classification approach where the objects are verbs and the attributes are the subcategorisation frames associated with these verbs by the subcategorisation lexicon to be described below.
3.1 Subcategorisation Lexicons

Subcategorisation information is retrieved from three existing lexicons for French: Dicovalence (van den Eynde and Mertens (2003)), the LADL tables (Gross (1975), Guillet and Leclère (1992)) and finally TreeLex (Kupść and Abeillé (2008)). Each of these was constructed manually or with an important manual validation by linguists. The combined lexicon covers 5918 verbs, 345 SCFs and has a total of 20443 ⟨verb, frame⟩ pairs. Table 1 shows sample entries in this lexicon for the verb expédier (send).

Verb: expédier
SCF                              Source info
SUJ:NP,DUMMY:REFL                DV:41640,41650
SUJ:NP,OBJ:NP                    DV:41640,41650;TL
SUJ:NP,OBJ:NP,AOBJ:PP            TL
SUJ:NP,OBJ:NP,POBJ:PP,POBJ:PP    LA:38L
Table 1: Sample entries in the subcategorisation lexicon for the verb expédier (send).

Using the Galicia Lattice Builder software (http://www.iro.umontreal.ca/~galicia/), we first build a concept lattice based on the formal context ⟨V, F, R⟩ such that:
– V is the set of verbs in our subcategorisation lexicon. We ignore verbs with only one SCF as they would result in classes associating verbs with a unique frame.
– F is the set of subcategorisation frames (SCFs) present in the subcategorisation lexicon,
– R is the mapping such that (v, f) ∈ R iff the subcategorisation lexicon associates the verb v with the SCF f.
The resulting formal context is made of 2091 objects (verbs) and 238 attributes (frames), giving rise to a lattice of 12802 concepts. Clearly, however, not all these concepts are interesting verb classes. Classes aim to factorise information and express generalisations about verbs. Hence, concepts with few (1 or 2) verbs can hardly be viewed as classes and, similarly, concepts with few frames are less interesting.

To select from this lattice those concepts which are most likely to provide the most relevant verb-frame associations, we explore the use of three indices for concept selection: concept stability, separation and probability, which have been proposed and analysed in (Klimushkin et al., 2010). In Section 4.2 we investigate which of these indices performs best in the context of our application. We then use the best performing concept filtering method to select the most relevant concepts with respect to our data. For each translated VN class we then identify among the selected FCA concepts the one(s) with the best f-measure between precision and recall. For a translated VN class C_VN (consisting of French verbs) and the extent (verb set) C_FCA of an FCA concept, precision, recall and f-measure are computed as follows:

    R = |C_VN ∩ C_FCA| / |C_VN|,   P = |C_VN ∩ C_FCA| / |C_FCA|,   F = 2RP / (R + P)

The translated VN class is then associated with the FCA concept(s) with the best F-measure. Thus the verbs in the FCA concept are effectively associated with the thematic roles of the translated class and at the same time with the syntactic subcategorisation frames in the intent (attribute set) of the FCA concept.
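To make the alignment step concrete, the following is a minimal sketch (not the authors' implementation), assuming each concept is given as a pair of Python sets (extent of verbs, intent of frames) and each translated VN class as a set of French verbs; all names are illustrative.

```python
def f_measure(class_verbs, concept_extent):
    """Precision/recall/F between a translated VN class and a concept extent."""
    inter = len(class_verbs & concept_extent)
    if inter == 0:
        return 0.0
    recall = inter / len(class_verbs)
    precision = inter / len(concept_extent)
    return 2 * recall * precision / (recall + precision)

def align(translated_classes, concepts):
    """For each translated class, keep the concept(s) with the best F-measure.

    translated_classes: dict mapping a class name to a set of French verbs.
    concepts: list of (extent, intent) pairs; extents are sets of verbs,
              intents are sets of subcategorisation frames.
    """
    alignment = {}
    for name, verbs in translated_classes.items():
        scored = [(f_measure(verbs, extent), extent, intent)
                  for extent, intent in concepts]
        best = max(score for score, _, _ in scored)
        alignment[name] = [(extent, intent)
                           for score, extent, intent in scored if score == best]
    return alignment
```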
4 Filtering Concept Lattices

The lattices we have to deal with are very large and many of the concepts do not represent valid verb classes. To select those concepts which are most relevant in the context of our application, the concept lattice needs to be filtered. Klimushkin et al. (2010) propose three indices for selecting relevant concepts in concept lattices built from noisy data: concept stability, separation and probability. In this section, we investigate which of these indices works best for our data.

Concept stability is a measure which helps discriminate potentially interesting patterns from irrelevant information in a concept lattice built from possibly noisy data. The stability of a concept C = (V, F) is the proportion of subsets of the extent V which have the same attribute set F as V:

    σ((V, F)) = |{A ⊆ V | A′ = F}| / 2^|V|    (1)

(Here and in the following, ′ denotes the derivation operator on the power set of objects, ′ : 2^O → 2^A with X′ = {a ∈ A | ∀o ∈ X. (o, a) ∈ R}, and dually on the power set of attributes.) Intuitively, a more stable concept is less dependent on any individual object in its extent and is therefore more resistant to outliers or other noisy data items.

Concept separation indicates how significantly the objects covered by a given concept differ from the other objects and, simultaneously, how its attributes differ from the other attributes:

    s((V, F)) = |V| · |F| / ( Σ_{v∈V} |{v}′| + Σ_{f∈F} |{f}′| − |V| · |F| )    (2)

Intuitively, we expect a concept with a high separation index to better sort out the verbs it covers from the other verbs and, simultaneously, the frames it covers from the other frames. Whereas concept stability is concerned with either objects or attributes, separation gives information about objects and attributes at the same time.

Concept probability. For an attribute a ∈ A (the attribute set), we denote by p_a the probability of an object having the attribute a. In practice it is the proportion of objects having a: p_a = |{a}′| / |O|, where O denotes the set of objects. For B ⊆ A, we define p_B as the probability of an arbitrary object having all attributes from B: p_B = Π_{a∈B} p_a. This formulation assumes the mutual independence of the attributes. Based on this, and denoting n = |O|, we obtain the following formula for the probability of B being closed:

    p(B = B″) = Σ_{k=0}^{n} p(|B′| = k, B = B″)    (3)
              = Σ_{k=0}^{n} [ C(n, k) · p_B^k · (1 − p_B)^{n−k} · Π_{a∉B} (1 − p_a^k) ]    (4)

(C(n, k) denotes the binomial coefficient.) A small p(B = B″) suggests a small probability that the attribute combination B is a concept intent by chance only (and p(B = B″) ≈ 1 that there is a high probability that the combination is a concept intent by chance). However, this reasoning is based on the independence of the attributes, which in our particular case cannot be warranted.

4.1 Computing the Stability, Separation and Probability Indices

Stability. Calculating stability is known to be NP-complete (Kuznetsov, 2007); however, Jay et al. (2008) show that when the concept lattice is known it can be computed efficiently by a bottom-up traversal algorithm introduced in (Roth et al., 2006). This is the algorithm we used to compute concept stability.

Separation can be computed in O(|O| + |A|) time, where O and A are the object and attribute sets respectively. Computing separation is the least prohibitive of the three indices.
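For illustration only (this is not the authors' implementation; the stability part enumerates all subsets of the extent and is therefore only usable for small concepts, whereas the paper relies on the bottom-up algorithm of Roth et al. (2006)), the two definitions above translate directly into code. The context is assumed to be a dict mapping each object to its set of attributes.

```python
from itertools import combinations

def attrs_of(objs, context):
    """A': the attributes shared by all objects in objs.
    For the empty set we return all attributes occurring in the context."""
    objs = list(objs)
    if not objs:
        return set().union(*context.values()) if context else set()
    common = set(context[objs[0]])
    for o in objs[1:]:
        common &= context[o]
    return common

def stability(extent, intent, context):
    """Equation (1): proportion of subsets A of the extent with A' = intent."""
    extent = list(extent)
    hits = sum(1 for r in range(len(extent) + 1)
               for subset in combinations(extent, r)
               if attrs_of(subset, context) == set(intent))
    return hits / 2 ** len(extent)

def separation(extent, intent, context):
    """Equation (2): |V||F| divided by the area covered by the concept's rows and columns."""
    area = len(extent) * len(intent)
    row_sizes = sum(len(context[o]) for o in extent)
    col_sizes = sum(sum(1 for o in context if a in context[o]) for a in intent)
    return area / (row_sizes + col_sizes - area)
```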
Probability. Klimushkin et al. (2010) show that computing the probability of even a single concept involves O(|O|² · |A|) multiplication operations, which is computationally very costly. With the computational means at our disposal it was not possible for us to compute the concept probabilities. We therefore computed approximations derived as follows. First, we consider Π_{a∉B} (1 − p_a^k) ≈ 1 for k > 40. In view of this, Equation (4) becomes:

    p(B = B″) = Σ_{k=0}^{40} [ C(n, k) · p_B^k · (1 − p_B)^{n−k} · Π_{a∉B} (1 − p_a^k) ]    (5)
              + Σ_{k=41}^{n} C(n, k) · p_B^k · (1 − p_B)^{n−k}    (6)

As Σ_{k=0}^{n} C(n, k) · p^k · (1 − p)^{n−k} = 1, Term (6) can be rewritten as:

    1 − Σ_{k=0}^{40} C(n, k) · p_B^k · (1 − p_B)^{n−k}    (7)
    = 1 − F(40; n, p_B)    (8)

where F(k; n, p) = Σ_{i=0}^{k} C(n, i) · p^i · (1 − p)^{n−i} is the cumulative distribution function of the binomial distribution (source: Wikipedia, http://en.wikipedia.org/wiki/Binomial_distribution) and can be computed using various statistical software packages. Term (5) can also be computed more easily considering that the C(n, k) · p_B^k · (1 − p_B)^{n−k} are binomial densities, the computation of which is also provided by statistics software (we used the R software environment for statistical computing, http://www.r-project.org/).
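A possible rendering of this approximation in Python (a sketch under the stated independence assumption, standing in for the R computation used by the authors; p_attr, a dict of the per-attribute probabilities p_a, and the attribute set B are assumed to be given):

```python
from math import prod
from scipy.stats import binom

def prob_closed_approx(B, p_attr, n, cutoff=40):
    """Approximate p(B = B'') following Equations (5)-(8).

    B: set of attributes, p_attr: dict attribute -> p_a, n: number of objects.
    """
    p_B = prod(p_attr[a] for a in B)
    outside = [p for a, p in p_attr.items() if a not in B]
    # Term (5): truncated sum, keeping the product over attributes outside B.
    head = sum(binom.pmf(k, n, p_B) * prod(1 - p ** k for p in outside)
               for k in range(cutoff + 1))
    # Terms (7)-(8): the remaining tail collapses to 1 - F(cutoff; n, p_B).
    tail = 1 - binom.cdf(cutoff, n, p_B)
    return head + tail
```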
4.2 Evaluating the Concept Selection Indices

In the following we measure the performance of the three concept selection indices with respect to our data. The experimental setting is as follows: We first select a number N (1500) of concepts with the best selection index. The selected concepts are aligned with the classes translated from VerbNet (see Section 5): For each translated class, we select the concept with the best precision/recall f-measure. Then we associate to the concept with the best f-measure the thematic roles of the translated VN class. Next we compare the obtained ⟨verb, thematic role set⟩ associations with those given by a reference. As for our task recall is more important than precision, we use the F2 measure, which gives more weight to recall, for the comparison. As reference we use the data used for training the classifier for learning the translated VN classes (see Section 5): we are checking which index selects the most relevant concepts, that is, those best matching the translated classes. The reference consists of the ⟨verb, semantic role set⟩ pairs marked as positive examples in the training set, i.e. those for which we considered that the French verbs could have the semantic roles given by the English VN class. Table 2 shows the F2 scores and coverage when using only one index at a time.

                cov.    prec.   rec.    F2
stab only       39.88   18.96   32.55   26.27
sep only        34.25   28.37   21.52   23.41
prob only       35.53   26.60   20.73   22.38
w/o filtering   100     12.30   60.96   26.30
Table 2: F2 scores and coverage for stability, separation and the 6th probability 10-quantile.

For stability and separation we applied the method above to the top ranking 1500 concepts. Regarding probability, at first sight we should consider best the concepts with the lowest probability, because the probability of their intents being closed by chance only is accordingly low. However, looking at the data we found that these concepts have very few verbs and large intent (frame) sets, which rather suggests improbable or rare verb groups. On the other hand, the interpretation of concept probability suggests that a concept with a probability close to 1 could occur by chance only. For these reasons, to assess probability separately we settled on the 6th 10-quantile.

The results confirm the observations of Klimushkin et al. (2010): stability alone gives F2 scores close to an upper bound, namely the results obtained without filtering, i.e. aligning the translated classes with all the concepts of the lattice. The results for separation and probability are several points lower. As we only select ∼ 10% of the total number of concepts, we also have to make sure that the selected concepts cover at least a reasonable amount of verbs. The cov column gives the percentage of verbs in the lattice covered by the selected concepts. It shows that using only one index at a time the pre-selected concepts would contain only 35%−40% of the verbs in the entire lattice, which is unsatisfactory.

Klimushkin et al. (2010) investigate the performance of the stability, separation and probability indices at finding the original concepts in lattices produced from contexts which were previously altered by introducing two types of noise: Type I noise is obtained by altering every cell in the context with some probability, Type II noise is obtained by adding a given number or proportion of random objects or attributes. According to this, our contexts are affected by Type I noise rather than Type II. Klimushkin et al. (2010) found that stability was most effective at sorting out Type II noise, but it also proved helpful in the case of Type I noise. In contrast, they suggest that separation and probability cannot be used on their own but should rather serve as a normalising measure for stability. The most promising combination seemed to be: stability + k_sep · separation − k_prob · probability.

In the following we start from the assumption that the most effective index for selecting relevant concepts is given by a linear combination of stability, separation and probability, k_stab · stability + k_sep · separation − k_prob · probability, and empirically determine the coefficients k_stab, k_sep and k_prob such that the selected concepts perform best with respect to our task. We proceed as follows: We choose k_stab, k_sep and k_prob. We then compute the corresponding linear combination for all concepts and select the 1500 concepts ranking highest. As in the previous experiments, we measure the relevance of the selected concepts by aligning them with the translated VN classes and by comparing the alignments with the same reference as before. We consider the "best" k_stab, k_sep, k_prob combination to be the one giving the highest F2 scores and good coverage.
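A compact sketch of the selection step just described (illustrative only; the index values are assumed to be precomputed and stored per concept):

```python
def select_top_concepts(concepts, k_stab, k_sep, k_prob, n=1500):
    """Rank concepts by k_stab*stability + k_sep*separation - k_prob*probability
    and keep the n highest-ranked ones.

    concepts: list of dicts with precomputed 'stability', 'separation' and
    'probability' values (illustrative representation).
    """
    def score(c):
        return (k_stab * c["stability"]
                + k_sep * c["separation"]
                - k_prob * c["probability"])
    return sorted(concepts, key=score, reverse=True)[:n]
```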
Table 3a shows the results for a first series of experiments where k_stab and k_sep were assigned the values 0.5 and 1 and k_prob the values 0.25 and 0.5 (the lines are sorted by decreasing F2 score). They suggest that the stability and separation coefficients had less impact on coverage and F2 score than the probability coefficient. Interestingly, the coverage is correlated with the F2 score. In the second series of experiments, shown in Table 3b, we kept the stability and separation coefficients fixed and varied only the probability coefficient. These results suggest that the probability coefficient may not help at selecting the most relevant concepts in our setting. This may be due first to the fact that our attributes are not independent (we assumed independence of attributes when setting up the formula for computing the probability index) and second to the fact that we had to approximate the probability index and this approximation may not be accurate enough.

In the next series of experiments we investigated the impact of the number of preselected concepts (500). The results showed that with this smaller number of concepts the selected concepts reached a slightly smaller F2 score but a substantially lower coverage. Also, in this configuration the probability index did seem to be helpful. Preselecting 1000 concepts confirmed the previously observed tendencies: the F2 score and coverage were only slightly lower than when preselecting 1500 concepts, and again the probability index seemed to have only a small impact on the overall results.

(a) F2 and coverage when k_stab, k_sep ∈ {0.5, 1} and k_prob ∈ {0.25, 0.5}.
k_stab  k_sep  k_prob  cov.   prec.  rec.   F2
1       1      0.25    98.04  11.87  55.19  24.89
1       0.5    0.25    98.04  11.87  55.19  24.89
1       0.5    0.5     57.69  17.08  30.18  24.04
1       1      0.5     56.15  17.45  29.13  23.82
0.5     0.5    0.25    56.15  17.45  29.13  23.82
0.5     1      0.25    53.81  18.03  27.82  23.36
0.5     0.5    0.5     49.72  18.55  26.25  23.06
0.5     1      0.5     49.90  18.61  25.98  22.95

(b) F2 and coverage when k_stab and k_sep are kept fixed and k_prob varies.
k_stab  k_sep  k_prob  cov.   prec.  rec.   F2
1       1      0       98.04  12.05  55.12  25.16
1       1      0.05    98.04  12.05  55.12  25.16
1       1      0.005   98.04  12.05  55.12  25.16
1       1      0.0005  98.04  12.05  55.12  25.16
1       1      0.1     98.00  11.91  55.38  25.00
1       1      0.2     98.08  11.88  55.12  24.91
1       1      0.25    98.04  11.87  55.12  24.89
1       1      0.3     98.00  11.79  55.38  24.80
1       1      0.4     59.95  16.27  31.23  23.91
1       1      0.5     56.16  17.45  29.13  23.82
w/o filtering          100    12.30  60.96  26.30

Table 3: F2 scores and coverage for various k_stab, k_sep, k_prob combinations.

From these experiments we conclude the following. First, they suggest that the best linear combination is the sum of the stability and separation indices, as the F2 measure and the coverage for this combination are similar to those of an upper bound, i.e. the alignment obtained without filtering. They show that selecting only ∼ 10% of the original lattice gives a verb, frame, semantic role set alignment which is close to the alignment obtained when using the entire lattice, and that the pre-selected concepts also have a similar coverage. Second, it does not seem evident that probability has a positive effect on the selected concepts. However, it does improve the f-measure when the number of selected concepts is lower (500 or 1000 vs. 1500 in our experiments). Hence, for our application we concluded that it is a better strategy to select a larger number of concepts (1500) and not take probability into account. This is even more so as the probability index in our case should be taken with caution, because first we had to use an approximation to compute it which may be too rough, and second the computation of probability is based on the independence of attributes, which is not warranted in our case.

5 Associating French Verbs with Thematic Role Sets

We associate French verbs with thematic role sets by translating the English VerbNet classes to French using 3 English-French dictionaries. In the following we first briefly describe the relevant resources, i.e. VerbNet and the dictionaries, before giving the translation methodology. As for this paper only the translated classes, and not the method used to produce them, are relevant, we only very briefly sketch the methodology. (Of course better translated classes will result in a better performance of our method, but it is not straightforward to evaluate the quality of the translated classes.)

VerbNet (Schuler (2006)) is the largest electronic verb classification for English. It was created manually and classifies 3626 verbs using 411 classes. Each VN class includes among other things a set of verbs, a set of subcategorisation frames and a set of thematic roles. Figure 2 shows an excerpt of the amuse-31.1 class, with its verbs, thematic roles and subcategorisation frames.
verbs (242): abash, affect, afflict, amuse, annoy, ...
thematic roles: Experiencer, Cause
frames (6):
  NP V NP            Experiencer V Cause
  NP V ADV-Middle    Experiencer V Adv
  NP V NP-PRO-ARB    Cause V
  ...
Fig. 2: Simplified VerbNet class amuse-31.1.

English-French dictionaries. We use the following resources to translate the verbs in the English VN classes to French: Sci-Fran-Euradic, a French-English bilingual dictionary built and improved by linguists, Google dictionary (http://www.google.com/dictionary, from which we obtained 13824 French-English verb pairs) and Dicovalence (van den Eynde and Mertens (2003), from which we obtained 11351 French-English verb pairs). The merged dictionary contains 51242 French-English verb pairs.

In the following we describe our method for translating the English VerbNet classes to French. The translation of VerbNet classes is bound to be very noisy because verbs are polysemous and the dictionaries typically give translations for several readings of a verb: thus the dictionary may give several translations v_fr which do not correspond to the meaning given by the ⟨v_en, class⟩ pair, or this meaning may even not be covered at all by the dictionary. To get more accurate translated VN classes we use a machine learning method, namely Support Vector Machines (SVM); we used libsvm, the software package and methodology presented at http://www.csie.ntu.edu.tw/~cjlin/libsvm/ (Chang and Lin (2011)). We follow a straightforward SVM application scenario: we build all the French verb, VN class pairs ⟨v_fr, C_VN⟩ where v_fr is a translation of an English verb in C_VN. The classifier has to give a probability estimate about whether this association is correct or not. For training the classifier we use the 160 verbs appearing in the gold standard proposed by Sun et al. (2010); in fact this is the only existing gold standard for French VerbNet-style classes, and we also use it for the overall evaluation of our system (not presented in this paper). We build the pairs ⟨v_fr, C_VN⟩ where v_fr is a verb in the gold standard which is a translation of a verb in C_VN. For each of these pairs we assessed whether or not there was a meaning of v_fr for which the semantic roles involved in the event described by the verb were those given by C_VN. The features associated to the ⟨verb, class⟩ pairs are numeric and are extracted from the dictionaries and VerbNet.

The trained classifier is then used to produce probability estimates for all verb, class instances. We select the 6000 pairs with the highest probability estimates (in VerbNet there are 5726 verb, class pairs) and finally obtain the translated classes by assigning each verb in a selected pair to the corresponding class. To give an idea of the quality of the obtained classes: the accuracy of the classifier on the held-out test set was 90%, compared to a maximum accuracy of 93.84% for five-fold cross-validation on the development set. The frequency distribution of the translated classes obtained this way is much closer to the distribution of verbs in VerbNet classes than when using an approach based only on translation frequencies, thus providing more accurate verb groups to guide the FCA concept - thematic roles associations.
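A rough outline of this filtering step (an illustrative scikit-learn sketch, not the authors' libsvm setup; the feature extraction from the dictionaries and VerbNet is left abstract as a callable, and the training labels are assumed to be 0/1 judgements):

```python
from sklearn.svm import SVC

def filter_translations(train_pairs, train_labels, candidate_pairs, features, keep=6000):
    """Score candidate <French verb, VN class> pairs and keep the most probable ones.

    train_pairs / candidate_pairs: lists of (french_verb, vn_class) tuples.
    train_labels: 0/1 judgements for the training pairs.
    features: callable mapping a pair to a numeric feature vector (in the paper
    these come from the dictionaries and VerbNet; left abstract here).
    """
    clf = SVC(probability=True)
    clf.fit([features(p) for p in train_pairs], train_labels)
    pos = list(clf.classes_).index(1)
    scores = clf.predict_proba([features(p) for p in candidate_pairs])[:, pos]
    ranked = sorted(zip(scores, candidate_pairs), key=lambda t: t[0], reverse=True)
    return [pair for _, pair in ranked[:keep]]
```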
6 The French Verb ↔ Thematic Role Sets ↔ Syntactic Frame Associations

As a detailed and thorough evaluation of the verb, thematic role set and syntactic frame associations would be out of the scope of this paper, we only give here an intuition of the type of information provided by our method. Following the preliminary investigations in the previous sections, we associated French verbs with subcategorisation frames and thematic role sets according to the scheme listed below:
– We group the VerbNet thematic roles and assign to one class all the VN verbs whose class has the same role set. We then translate the obtained classes using the methods described in Section 5.
– We use FCA to group French verbs and the syntactic frames associated to these verbs by the lexicons described in Section 3. The concept lattices we create are based on the formal contexts consisting of French verbs as objects and SCFs as attributes.
– We then select the 1500 concepts where the sum of the stability and separation indices is highest, because in Section 4 we found this combination of concept selection indices to work best for our application.
– For each translated VN class we identify among the 1500 filtered FCA concepts the one(s) with the best f-measure between precision and recall. The translated VerbNet class is then associated with this (these) FCA concept(s).
Thus the verbs in the FCA concept are effectively associated with the thematic role set of the translated class and at the same time with the syntactic frames in the intent (attribute set) of the FCA concept.

Figure 3 shows the associations between concepts, thematic role sets and frames generated by our method for some VN classes (these are the classes occurring in the gold standard proposed by Sun et al. (2010), mentioned in Section 5). The figure shows the concepts associated to these thematic role sets and, for each of these concepts, its attribute set (syntactic frames), the associated thematic role set(s), the number of verbs in the concept and the hierarchical relations between the concepts as given by the concept lattice.

Fig. 3: French verb ↔ synt. frames ↔ thematic role set associations. (Figure body: a diagram of concepts 1248, 5022, 32, 7191, 5312, 617, 4584, 18868, 7190 and 1227, each labelled with its syntactic frames, its thematic role set(s), e.g. AgExp-End-Theme, AgExp-Location-Theme, AgentSym-Theme, AgExp-Instrument-Patient, and the size of its verb set.)

Thus, for example, the following 11 verbs (occurring in the gold standard): bouger, déplacer, emporter, passer, promener, envoyer, expédier, jeter, porter, transmettre, transporter are in concept 5312 and thereby may be used in the construction SUJ:NP,OBJ:NP,POBJ:PP,POBJ:PP (a transitive construction with two additional prepositional objects), according to our lexical resources. When they occur in this construction they are associated with the thematic role set AgExp, End, Start, Theme, i.e. the semantic roles involved are an Agent or Experiencer, a Start point, an End point and a Theme. The listed verbs are all verbs of movement where an agent may move a theme from a start point to an end point, therefore in this case the associations with the syntactic frame and thematic role set seem to be correct.
An NLP system which encounters the verb déplacer, for example, used in the construction SUJ:NP,OBJ:NP,POBJ:PP,POBJ:PP could infer that the possible thematic roles involved in the described event are an Agent (or Experiencer), a Theme, an End point and a Start point. However, it still would not know which thematic role is realised by which syntactic argument. There are also some problems with these associations. As can be seen in Figure 3, there is one case where the classification maps the same concept to two distinct VerbNet classes (AgExp-End-Theme and AgExp-Instrument-Patient). In addition, verbs in sub-concepts inherit the class label of the super-concept. Although there are verbs which belong to several VN classes, in many cases this multiple mapping was not warranted. Improving the precision of these mappings requires further investigation.

7 Conclusion

We introduced a new approach to verb clustering which involves the combined use of the English VerbNet, a bilingual English-French lexicon and a merged subcategorisation lexicon for French. Using these resources, we built two classifications, one derived from the English VN by translation and the other from the subcategorisation lexicons via the construction of a formal concept lattice. We then use the translated VN to associate FCA concepts with VN classes and thereby associate verbs with both syntactic frames and a thematic role set. We explored the performance of the concept selection indices introduced by Klimushkin et al. (2010), namely stability, separation and probability, at selecting the most relevant concepts with respect to our data and found that the sum of stability and separation gave the best results in the setting of our application. These results were similar to those obtained without filtering, showing that this combination of the indices did indeed allow us to select the most relevant concepts with respect to our data. Finally we showed the French verb, syntactic construction and semantic role set associations we obtained and briefly illustrated their potential use. Thus Formal Concept Analysis, in combination with the concept selection indices, translation and set mapping methods, proved an adequate method in this knowledge acquisition process.

Bibliography

Baker, C. F., Fillmore, C. J., and Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of the 17th International Conference on Computational Linguistics, volume 1, pages 86–90, Montreal, Quebec, Canada. Association for Computational Linguistics.
Briscoe, T. and Carroll, J. (1993). Generalized probabilistic LR parsing of natural language (corpora) with unification-based grammars. Comput. Linguist., 19(1):25–59.
Carroll, J. and Fang, A. C. (2004). The automatic acquisition of verb subcategorisations and their impact on the performance of an HPSG parser. In IJCNLP, pages 646–654.
Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Cimiano, P., Staab, S., and Tane, J. (2003). Automatic Acquisition of Taxonomies from Text: FCA meets NLP. In Proceedings of the PKDD/ECML'03 International Workshop on Adaptive Text Extraction and Mining (ATEM), pages 10–17.
Dubois, J. and Dubois-Charlier, F. (1997).
Les verbes français. Larousse.
Falk, I. and Gardent, C. (2010). Bootstrapping a Classification of French Verbs Using Formal Concept Analysis. In Interdisciplinary Workshop on Verbs, page 6, Pisa, Italy.
Falk, I., Gardent, C., and Lorenzo, A. (2010). Using Formal Concept Analysis to Acquire Knowledge about Verbs. In Concept Lattices and their Applications, page 12, Sevilla, Spain.
Gross, M. (1975). Méthodes en syntaxe. Hermann, Paris.
Guillet, A. and Leclère, C. (1992). La structure des phrases simples en français. 2 : Constructions transitives locatives. Droz, Geneva.
Hadouche, F. and Lapalme, G. (2010). Une version électronique du LVF comparée avec d'autres ressources lexicales. Langages, pages 193–220. (Page layout differs from the version published in the journal.)
Jay, N., Kohler, F., and Napoli, A. (2008). Analysis of social communities with iceberg and stability-based concept lattices. In ICFCA'08: Proceedings of the 6th International Conference on Formal Concept Analysis, pages 258–272, Berlin, Heidelberg. Springer-Verlag.
Klimushkin, M., Obiedkov, S., and Roth, C. (2010). Approaches to the selection of relevant concepts in the case of noisy data. In Kwuida, L. and Sertkaya, B., editors, Formal Concept Analysis, volume 5986 of Lecture Notes in Computer Science, chapter 18, pages 255–266. Springer Berlin / Heidelberg, Berlin, Heidelberg.
Kupść, A. and Abeillé, A. (2008). Growing TreeLex. In Gelbukh, A., editor, Computational Linguistics and Intelligent Text Processing, volume 4919 of Lecture Notes in Computer Science, pages 28–39. Springer Berlin / Heidelberg.
Kuznetsov, S. O. (2007). On stability of a formal concept. Annals of Mathematics and Artificial Intelligence, 49(1-4):101–115.
Palmer, M., Gildea, D., and Xue, N. (2010). Semantic Role Labeling. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.
Palmer, M., Kingsbury, P., and Gildea, D. (2005). The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.
Priss, U. (2005). Linguistic Applications of Formal Concept Analysis. In Ganter, B., Stumme, G., and Wille, R., editors, Formal Concept Analysis, volume 3626 of Lecture Notes in Computer Science, pages 149–160. Springer Berlin / Heidelberg.
Roth, C., Obiedkov, S. A., and Kourie, D. G. (2006). Towards concise representation for taxonomies of epistemic communities. In CLA, pages 240–255.
Saint-Dizier, P. (1999). Alternation and verb semantic classes for French: Analysis and class formation. In Predicative Forms in Natural Language and in Lexical Knowledge Bases. Kluwer Academic Publishers.
Schuler, K. K. (2006). VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. PhD thesis, University of Pennsylvania.
Sporleder, C. (2002). A Galois Lattice based Approach to Lexical Inheritance Hierarchy Learning. In 15th European Conference on Artificial Intelligence (ECAI'02): Workshop on Machine Learning and Natural Language Processing for Ontology Engineering, Lyon, France.
Sun, L., Korhonen, A., Poibeau, T., and Messiant, C. (2010). Investigating the cross-linguistic potential of VerbNet-style classification. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING '10, pages 1056–1064, Stroudsburg, PA, USA. Association for Computational Linguistics.
Tolone, E. (2011). Analyse syntaxique à l'aide des tables du Lexique-Grammaire du français.
PhD thesis, LIGM, Université Paris-Est, France; Laboratoire d'Informatique Gaspard-Monge, Université Paris-Est Marne-la-Vallée, France. (326 pp.).
van den Eynde, K. and Mertens, P. (2003). La valence : l'approche pronominale et son application au lexique verbal. Journal of French Language Studies, 13:63–104.

Generation algorithm of a concept lattice with limited object access

Ch. Demko*, K. Bertet
L3I - Université de La Rochelle - av Michel Crépeau - 17042 La Rochelle
cdemko,kbertet@univ-lr.fr
* Joomla! Production Leadership Team, christophe.demko@joomla.org

Abstract. Classical algorithms for generating the concept lattice (C, ≤) of a binary table (O, I, R) have a complexity in O(|C| · |I|² · |O|). Although the number of concepts is exponential in the size of the table in the worst case, the generation of a concept is output polynomial. In practice, the number of concepts is often polynomial in the size of the table. However, the cost of generating a concept remains high when the table is composed of a large number of objects. We propose in this paper an algorithm for generating the lattice with limited object access, which can improve the computation time. Experiments were conducted with Joomla!, a content management system based on relational algebra and located on a MySQL database.

keywords: concept lattice; databases; algorithm

1 Introduction

Galois lattices (or concept lattices) were first introduced in a formal way in the theory of graphs and ordered structures [1–3]. Later, they were developed in the field of Formal Concept Analysis (FCA) [4] for data analysis and classification. The concept lattice structure, based on the notion of concept, enables data description while preserving its diversity. It is used to analyse data organised by a binary relation between objects and attributes.

A Galois lattice is a graph providing a representation of all the possible correspondences between a set of objects (or examples) O and a set of attributes (or features) I. The technological improvements of the last decades enable the use of these structures for data mining problems even though they are exponential in space/time in the worst case. It has to be noted that in practice, in most cases, the size of the lattice remains reasonable. In addition, some applications offer to generate only some concepts from the huge amount of available data. Bordat's algorithm [5] is the most appropriate for this since it generates the cover relation between concepts, and thus allows an on-demand generation of concepts. Moreover, huge amounts of data are often described by a huge number of objects. This is the case in databases, where sophisticated key-indexation techniques are used to improve object access.

In this paper, we propose the Limited Object Access algorithm (LOA algorithm), an extension of Bordat's immediate successors generation with a limited access to objects. This algorithm, compounded with an on-demand strategy and with sophisticated key-indexation techniques to improve object access, aims to improve the computation time for a large number of objects. However, the worst case theoretical complexity remains the same as for Bordat's algorithm.
Experiments were conducted with Joomla!, a content management system based on relational algebra, and located on a MySQL database. This paper is organized as follows. In section 2, we describe the Galois lattice structure and the Bordat’s generation algorithm. In section 3, we describe our limited object access algorithm, illustrated by an example and some experiments. 2 Description and generation of a concept lattice 2.1 Description of a concept lattice The concept lattice is a particular graph defined and generated from a relation R between objects O and attributes I. This graph is composed of a set of concepts ordered by a relation verifying the properties of a lattice, i.e. an order relation ≤ (transitive, reflexive and antisymmetric relation) such that, for each pair of concepts in the graph, there exists both a lower bound and an upper bound. Therefore, a lattice contains a minimum (resp. maximum) element according to the relation ≤ called the bottom (resp. top) of the lattice. The Hasse diagram of a graph [1] is the cover relation of ≤ denoted as ≺, i.e. the suppression on the graph of both transitivity and reflexivity edges. We associate to a set of objects A ⊆ O the set f (A) of attributes in relation R with the objects of A: f (A) = {y ∈ I | xRy ∀ x ∈ A} Dually, to a set of attributes B ⊆ I, we define the set g(B) of objects in relation with the attributes of B: g(B) = {x ∈ O | xRy ∀ y ∈ B} These two functions f and g defined between objects and attributes form a Galois correspondence. The relation between the set of objects and the set of attributes is described by a formal context. A formal context C is a triplet C = (O, I, R) (or C = (O, I, (f, g))) represented by a table. A formal concept represents maximal objects-attributes correspondences (fol- lowing relation R) by a pair (A, B) with A ⊆ O and B ⊆ I, which verifies f (A) = B and g(B) = A. The whole set of formal concepts thus corresponds to all the possible maximal correspondences between a set of objects O and a set of attributes I. Two formal concepts (A1 , B1 ) and (A2 , B2 ) are in relation in the lattice when they verify the following inclusion property: A2 ⊆ A1 (A1 , B1 ) ≤ (A2 , B2 ) ⇔ (equivalent to B1 ⊆ B2 ) Generation algorithm of a concept lattice with limited access to objects 241 The whole set of formal concepts fitted out by the order relation ≤ is called concept lattice or Galois lattice because it verifies the lattice properties: the relation ≤ is clearly an order relation, and for each pair of concepts (A1 , B1 ) and (A2 , B2 ), there exists the greatest lower bound (resp. the least upper bound) called meet (resp. join) denoted (A1 , B1 ) ∧ (A2 , B2 ) (resp. (A1 , B1 ) ∨ (A2 , B2 )) defined by: (A1 , B1 ) ∧ (A2 , B2 ) = (g(B1 ∩ B2 ), (B1 ∩ B2 )) (1) (A1 , B1 ) ∨ (A2 , B2 ) = ((A1 ∩ A2 ), f (A1 ∩ A2 )) (2) The concepts ⊥ = (O, f (O)) and > = (g(I), I) respectively correspond to the bottom and the top of the concept lattice. In formal concept analysis (FCA) concept lattices are used to analyse data when organised by a binary relation between objects and attributes. See the book of Ganter and Wille [4] for a more complete description of formal concept analysis. In the following, we abuse notation and use X + x (respectively, X \ x) for X ∪ {x} (respectively, X\{x}). 2.2 Generation algorithms of a concept lattice Numerous generation algorithms for concept lattices have been proposed in lit- erature [6,7,5,8]. Although all these algorithms generate the same lattice, they propose different strategies. 
Some of these algorithms are incremental [6,9]. Gan- ter’s NextClosure [7] is the reference algorithm that determines the concepts in lectical order (next, the concepts may be ordered by ≤ to form the concept lat- tice) while Bordat’s algorithm [5] is the first algorithm that computes directly the Hasse diagram of the lattice. Recent work [10] proposed a generic algorithm unifying the existing algorithms in a unique framework, which makes easier the comparison of these algorithms. A formal and experimental comparative study of the different algorithms has been published [11]. All of these proposed algorithms have a polynomial complexity with respect to the number of concepts (at best quadratic in [8]). The complexity is therefore determined by the size of the lattice, this size being bounded by 2|O+I| in the worst case and by |O + I| in the best case. Studies on average complexity are difficult to carry out because the size of the concept lattice depends both on the dimensionality of the data to classify and on their organization and diversity. However, in practice the size of the Galois lattice generally remains reasonable. Some applications offer to only generate some concepts from the huge amount of available data. Bordat’s algorithm [5] is the more appropriate since it generates the cover relation between concepts, and thus allows an on-demand generation of concepts usually used in concrete applications. Bordat’s algorithm is issued from a corollary of Bordat’s theorem: Theorem 1 (Bordat [5]). Let (A, B) and (A0 , B 0 ) be two concepts of a context (O, I, R). Then (A, B) ≺ (A0 , B 0 ) if and only if A0 is inclusion-maximal in the 242 Christophe Demko and Karell Bertet following set system FA defined on O1 : FA = {g(x + B) : x ∈ I\B} (3) Corollary 2 (Bordat [5]). Let (A, B) be a concept. There is a one-to-one map- ping between the immediate successors of (A, B) in the Hasse diagram of the lattice and the inclusion-maximal subsets of FA . Bordat’s algorithm recursively computes all the concepts of C by computing immediate successors for each concept (A, B), starting from the bottom concept ⊥ = (f (G), G), until all concepts are generated. Immediate successors are gen- erated using Corollary 2 in O(|I|2 ∗ |O|): the set system has first to be generated in a linear time ; then inclusion-maximal subsets of FB , can easily be computed in O(|I|2 ∗ |O|). 3 Limited Object Access Algorithm (LOA) 3.1 Description of the LOA Algorithm Large data are often described by a huge amount of objects, as in databases for example where the number of recordings (i.e. objects) can be huge, indexed using sophisticated key-indexation techniques. In this section, we describe our Limited Object Access algorithm (LOA algorithm), an extension of Bordat’s immediate successors generation with a limited object access. This algorithm, compounded with an on-demand strategy aims to improve time computation for large amount of objects. Our algorithm considers the restriction of a concept lattice to the attributes. A nice result establishes that any concept lattice (C, ≤C ) is isomorphic to the lattice (CI , ⊆) defined on the set I of attributes, with CI the restriction of C to the attributes in each concept. The lattice (CI , ⊆) is also known as the closed sets lattice on the attributes I of a context (O, I, R), where the set system CI is composed of all closed set - i.e. fixed points - for the closure operator ϕ = g ◦ f . See the survey of Caspard and Monjardet [12] for more details about closed set lattices. 
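To make these definitions easy to experiment with, the following Python sketch (ours, not part of the paper) implements the derivation operators f and g and the immediate-successor computation of Corollary 2; the dictionary-based encoding of the context and the toy data at the end are illustrative assumptions, not the authors' implementation.

# Illustrative sketch: derivation operators and Bordat-style immediate
# successors (Corollary 2) for a binary context (O, I, R).

def intent(context, attributes, A):
    """f(A): attributes in relation with every object of A."""
    if not A:
        return set(attributes)
    return set.intersection(*(context[o] for o in A))

def extent(context, B):
    """g(B): objects in relation with every attribute of B."""
    return {o for o, attrs in context.items() if set(B) <= attrs}

def immediate_successors(context, attributes, B):
    """Immediate successors of the concept (g(B), B): they correspond to the
    inclusion-maximal sets of F_A = {g(B + x) : x in I \\ B}."""
    family = {x: frozenset(extent(context, set(B) | {x}))
              for x in set(attributes) - set(B)}
    maximal = {e for e in family.values()
               if not any(e < other for other in family.values())}
    return [(set(e), intent(context, attributes, e)) for e in maximal]

# Tiny hypothetical context, only to show the calling convention.
ctx = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"a", "b", "c"}}
I = {"a", "b", "c"}
bottom = intent(ctx, I, set(ctx))             # intent of the bottom concept (O, f(O))
print(immediate_successors(ctx, I, bottom))   # cover of the bottom concept

Repeating this computation from the bottom concept upwards, as Bordat's algorithm does, yields the Hasse diagram of the lattice.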
Using the closed sets lattice (CI , ⊆) instead of the whole concept lattice (C, ≤C ) gives raise to a storage improvement, for example in case of large amount of objects. A closed sets lattice can be generated using an algorithm similar to Bordat’s algorithm, and therefore enabling an on-demand generation in order to reduce the whole amount of closed sets. This algorithm (see Alg. 1) recursively computes immediate successors (see Alg. 2) of a closed set B, starting from the bottom closed set ⊥ = ϕ(∅), until I is generated. The Immediates Successors LOA algorithm we propose (see Alg. 3) rein- forces the object access limitation by considering the cardinality of the subset g(X + B) instead of the subset itself to compute the inclusion-maximal subsets of FA using the following property: 1 In [5], the equivalent formulation g(x) ∩ A is used instead of g(x + B) Generation algorithm of a concept lattice with limited access to objects 243 Proposition 3. Consider a concept (A, B), and two subsets X and Y of at- tributes in B\I. Then g(X + B) ⊆ g(Y + B) ⇐⇒ |g(X + B)| = |g(X + Y + B)| (4) This proposition is a direct consequence of the two following remarks: 1. The equivalence between inclusion and intersection set operations (C ⊆ D ⇐⇒ C = C ∩ B) allows to deduce the equivalence between g(X + B) ⊆ g(Y + B) and g(X + B) = g(X + B) ∩ g(Y + B): 2. Then, by definition of g, we have g(X + B) ∩ g(Y + B) = g(X + Y + B). More precisely, the Immediates Successors LOA algorithm (see Alg. 3) first initialize the set Succ of immediate successors of a closed set B with the emp- tyset. The set Succ is then updated by considering each attribute x of I\B and another already inserted potential successor X ⊆ I\B by considering the fol- lowing four cases, where cB (Y ) denotes the cardinality of g(B + Y ) for a Y of attributes: Merge x with X: When g(x + B) = g(X + B), then x and X belongs to the same closed set, and thus have to be merged in a same potential successor of B. By Proposition 3, this case is tested by cB (X + x) = cB (X) and cB (X) = cB (x). Eliminate X: When g(X + B) ⊂ g(x + B), then the closed set containing X isn’t inclusion-maximal in FA , and thus hasn’t to be considered as a potential successor of B. By Proposition 3, this case is tested by cB (X + x) = cB (X) and cB (X) < cB (x). Eliminate x: When g(x + B) ⊂ g(X + B), then the closed set containing x isn’t inclusion-maximal if FA , and thus hasn’t to be considered as a potential successor of B. By Proposition 3, this case is tested by cB (X + x) = cB (X) and cB (x) < cB (X). Insert x: When x is neither eliminated or merged with X, then it is added as a potential successor of B ; another attribute is then considered. 3.2 Example To illustrate our algorithm, we use the following context where numbers from 1 to 9 are described by some properties: the number is a prime number, an odd or even number, a square, a composite number or a factorial number. 
(p)rime o(dd) (e)ven (s)quare (c)omposite (f)actorial nb 1 × × × nb 2 × × × nb 3 × × nb 4 × × × nb 5 × × nb 6 × × × nb 7 × × nb 8 × × nb 9 × × × 244 Christophe Demko and Karell Bertet Name: Closed Set Lattice Data: A context K = (O, I, R) Result: The Hasse diagram (CI , ≺) of the lattice (CI , ⊆) begin CI = {f (O)}; foreach B ∈ CI not marked do SuccB =Immediates successors (K, B); foreach X ∈ SuccB do B 0 = B + X; if B 0 6∈ CI then add B 0 to CI ; add a cover relation B ≺ B 0 end mark B end return (CI , ≺) end Algorithm 1: Generation of the Hasse diagram of the closed set lattice (CI , ⊆) Name: Immediates Successors Data: A context K ; A closed set B of the closed set lattice (CI , ⊆) of K Result: The immediate successors of B in the lattice begin initialize the set system FA with ∅; foreach x ∈ I\B do add g(x + B) to FA end Succ=maximal inclusion subsets of FA ; return Succ end Algorithm 2: Generation of the immediate successors of a closed set in the Hasse diagram of the lattice (CI , ⊆) Generation algorithm of a concept lattice with limited access to objects 245 Name: Immediates Successors LOA Data: A context K ; A closed set B of the closed set lattice (CI , ⊆) of K Result: The immediate successors of B in the lattice begin initialize the SuccB family to an empty set; foreach x ∈ I \ B do add = true; foreach X ∈ SuccB do \\ Merge x and X in the same potential successor if cB (x) = cB (X) then if cB (X + x) = cB (x) then replace X by X + x in SuccB ; add=false; break; end end \\ Eliminate x as potential successor if cB (x) < cB (X) then if cB (X + x) = cB (x) then add=false; break; end end \\ Eliminate X as potential successor if cB (x) > cB (X) then if cB (X + x) = cB (X) then delete X from SuccB end end end \\ Insert x as a new potential successor ; if add then add {x} to SuccB end return SuccB ; end Algorithm 3: Generation of the immediate successors of a closed set in the Hasse diagram of the lattice (CI , ⊆) 246 Christophe Demko and Karell Bertet Fig. 1. Concept lattice Figure 1 gives the concept lattice of this context. When the algorithm com- putes the successors of the closed sets e (resp. p), it proceeds as described in Table 1 (resp. Table 2). The different steps of these two examples show the different actions taken by the algorithm. SuccF x cB (x) X cB (X) cB (x + X) Case Action ∅ p 1 Insert [p] {[p]} o 0 [p] 1 0 cB (x + X) = cB (x) < cB (X) Eliminate [o] {[p]} s 1 [p] 1 0 cB (x + X) < cB (X) = cB (x) {[p]} c 3 [p] 1 0 cB (x+X) < cB (X) < cB (x) Insert [c] {[p], [c]} f 2 [p] 1 1 cB (x + X) = cB (X) < cB (x) Eliminate [p] {[c]} f 2 [c] 3 1 cB (x+X) < cB (x) < cB (X) Insert [f ] {[c], [f ]} Table 1. Immediate successors of [e] 3.3 Complexity The complexity of computing the immediate successors of a closed set B using the Immediates Successors LOA algorithm is: (|I| − |B|)(|I| − |B|) ∗ O(cB (X)) 2 Generation algorithm of a concept lattice with limited access to objects 247 SuccF x cB (x) X cB (X) cB (x + X) Case Action ∅ o 3 Insert [o] {[o]} e 1 [o] 3 0 cB (x+X) < cB (x) < cB (X) Insert [e] {[o], [e]} s 0 [o] 3 0 cB (x + X) = cB (x) < cB (X) Eliminate [s] {[o], [e]} c 0 [o] 3 0 cB (x + X) = cB (x) < cB (X) Eliminate [c] {[o], [e]} f 1 [o] 3 0 cB (x+X) < cB (x) < cB (X) {[o], [e]} f 1 [e] 1 1 cB (x + X) = cB (x) = cB (X) Merge [e], [f ] {[o], [ef ]} Table 2. Immediate successors of [p] which leads to O((|I| − |B|)2 ∗ O(cB (X))) using the big O notation. This has to be compared with O(|I|2 ∗ |O|) of the Immediates Successors algorithm. 
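To connect Algorithm 3 with Tables 1 and 2, here is a short Python prototype (ours, not the authors' PHP implementation); the counting function cB is kept abstract in a single place, precisely where an indexed SQL COUNT would be plugged in, and the hard-coded context encodes the 9-number example of Section 3.2 (inferred from the properties listed there and consistent with the cardinalities of Tables 1 and 2).

# Illustrative prototype of Immediates Successors LOA (Algorithm 3).
# The context encodes the example of Section 3.2: (p)rime, o(dd), (e)ven,
# (s)quare, (c)omposite, (f)actorial for the numbers 1..9.
ctx = {1: {"o", "s", "f"}, 2: {"p", "e", "f"}, 3: {"p", "o"},
       4: {"e", "s", "c"}, 5: {"p", "o"},      6: {"e", "c", "f"},
       7: {"p", "o"},      8: {"e", "c"},      9: {"o", "s", "c"}}
I = {"p", "o", "e", "s", "c", "f"}

def c(B):
    """cB(Y) of the paper: |g(B + Y)|.  This is the only place where objects
    are touched; in the database setting it becomes a single COUNT query."""
    return sum(1 for attrs in ctx.values() if B <= attrs)

def immediate_successors_loa(B):
    succ = []                                  # potential successors (attribute sets)
    for x in I - B:
        add = True
        for X in list(succ):
            cx, cX, cxX = c(B | {x}), c(B | X), c(B | X | {x})
            if cxX == cx == cX:                # merge x and X
                X.add(x); add = False; break
            if cxX == cx and cx < cX:          # eliminate x
                add = False; break
            if cxX == cX and cX < cx:          # eliminate X
                succ.remove(X)
        if add:
            succ.append({x})
    return succ

print(immediate_successors_loa({"e"}))   # the two successors {'c'} and {'f'}, as in Table 1
print(immediate_successors_loa({"p"}))   # {'o'} and {'e', 'f'}, as in Table 2

On real data, the function c would issue a single indexed count query against the database instead of scanning an in-memory context, as in the experiments of Section 3.4.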
In addition, the cost O(cB(x)) of computing the cardinality of objects satisfying the required properties can rely on multiple keys and on the robust algorithms used in databases, which do not need to load all the data to compute a cardinality. 3.4 Experiments In the experiments, we use a dataset composed of: 54 attributes: the 6 attributes of the example, plus attributes corresponding to the property of being a multiple of k, for k ∈ {3, . . . , 50}; 100000 objects: the integers between 1 and 100000. The dataset is stored in a MySQL 5.5.14 database. We have implemented our Immediates Successors LOA algorithm using PHP 5.3.6. The counting of objects satisfying a set of properties is realised by an SQL request comparing indexes with a constant:
select count(*) from numbers where att1=1 and att2=1
We compare the processing time of our Immediates Successors LOA algorithm in the two following cases: Indexed: Each attribute is defined to be an index. Objects are indexed by their attributes, and MySQL can quickly retrieve them in the dataset using a B-tree indexation with a logarithmic complexity [13]: O(cB(X)) = O(log |I|). Not indexed: Objects are not indexed and a scan of all the lines is necessary to retrieve objects. The complexity is then similar to that of the Immediates Successors algorithm: O(cB(X)) = O(|I| − |B|). We compare the processing time of computing the immediate successors of the bottom element ∅ in these two cases (indexed and not indexed). Fig. 2. Calculating the immediate successors of ∅: (a) the 100000 first integers as objects, with a number of attributes ranging from 6 to 54; (b) the 54 attributes, with integers between 1000 and 100000. In the results, the processing time is greatly improved with an indexed dataset, and seems to be close to linear in O(|I| + |O|). The immediate successors of ∅ for 100000 objects and 54 attributes are computed in 3 seconds with the indexed algorithm, and in 18 seconds with the not indexed one. Moreover, the EXPLAIN of the count operation shows that an index-merge operation is realized on the indexes, corresponding to an intersect computation:
mysql> explain select count(*) from numbers where p=1 and o=1;
+----+-------------+------+---------+------+-----------------------+
| id | select_type | key  | key_len | rows | Extra                 |
+----+-------------+------+---------+------+-----------------------+
|  1 | index_merge | p,o  | 1,1     |    2 | Using intersect(p,o); |
|    |             |      |         |      | Using where;          |
|    |             |      |         |      | Using index           |
+----+-------------+------+---------+------+-----------------------+
1 row in set (0.00 sec)
Therefore, optimizing the intersection operation, for example with an adapted sort on the lines, would be a possible optimization of our algorithm. 4 Conclusion In this paper, we described a new algorithm for computing the immediate successors of a concept using the counting of objects satisfying a set of properties. By separating the counting from the rest of the algorithm, new systems for exploring concept lattices can now rely on the optimization algorithms used in relational databases. If the tests we will run on PostgreSQL and MySQL databases are successful in terms of manipulating huge amounts of data, we plan to propose a library for extending content management systems such as Joomla!. References 1.
Birkhoff, G.: Lattice theory. 3d edn. American Mathematical Society (1967) 2. Barbut, M., Monjardet, B.: Ordres et classifications : Algèbre et combinatoire. Hachette, Paris (1970) 2 tomes. 3. Davey, B., Priestley, H.: Introduction to lattices and orders. 2nd edn. Cambridge University Press (1991) 4. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical foundations. Springer Verlag, Berlin (1999) 5. Bordat, J.: Calcul pratique du treillis de Galois d’une correspondance. Math. Sci. Hum. 96 (1986) 31–47 6. Norris, E.: An algorithm for computing the maximal rectangles in a binary relation. Revue Roumaine de Mathématiques Pures et Appliquées 23 (1978) 7. Ganter, B.: Two basic algorithms in concept analysis. Technische Hochschule Darmstadt (Preprint 831) (1984) 8. Nourine, L., Raynaud, O.: A fast algorithm for building lattices. Information Processing Letters 71 (1999) 199–204 9. Gödin, R., Missaoui, R., Alaoui, H.: Incremental concept formation algorithms based on Galois (concept) lattices. Computational Intelligence 11 (1995) 246–267 250 Christophe Demko and Karell Bertet 10. Gely, A.: A generic algorithm for generating closed sets of binary relation. Third International Conference on Formal Concept Analysis (ICFCA 2005) (2005) 223– 234 11. Kuznetsov, S., Obiedkov, S.: Comparing performance of algorithms for generating concept lattices. Journal of Experimental and Theorical Artificial Intelligence 14 (2002) 189–216 12. Caspard, N., Monjardet, B.: The lattice of closure systems, closure operators and implicational systems on a finite set: a survey. Discrete Applied Mathematics 127 (2003) 241–269 13. Bayer, R. et McCreight, E.M.: Organization and maintenance of large ordered indexes. Acta Informatica 1 (1972) 173–189 Homogeneity and Stability in Conceptual Analysis Paula Brito1 and Géraldine Polaillon2 1 Faculdade de Economia & LIAAD/INESC-Porto L.A., Universidade do Porto Rua Dr. Roberto Frias, 4200-464 Porto, Portugal mpbrito@fep.up.pt 2 SUPELEC Science des Systèmes (E3S) - Département Informatique Plateau de Moulon, 3 rue Joliot Curie, 91192 Gif-sur-Yvette cedex, France geraldine.polaillon@supelec.fr Abstract. This work comes within the field of data analysis using Galois lattices. We consider ordinal, numerical single or interval data as well as data that consist on frequency/probability distributions on a finite set of categories. Data are represented and dealt with on a common framework, by defining a generalization operator that determines intents by intervals. In the case of distribution data, the obtained concepts are more homogeneous and more easily interpretable than those obtained by using the maximum and minimum operators previously proposed. The number of obtained concepts being often rather large, and to limit the influence of atypical elements, we propose to identify stable concepts using interval distances in a cross validation-like approach. 1 Introduction This work concerns multivariate data analysis using Galois concept lattices. Let E = {ω1 , . . . , ωn } be the set of elements to be analyzed, described by p variables Y1 , . . . , Yp . In this paper we consider the specific case where the variables Yj are numerical (real or interval-valued), ordinal and modal. Modal variables allow associating with each element of E a probability/frequency distribution on an underlying finite set of categories (see [9]). 
The use of Galois lattices in Data Analysis was first introduced by Barbut and Monjardet, in the seventies of last century [2] and then further developed and largely spread out by the work of R. Wille and B. Ganter (see, e.g., [6]). Let (A, ≤1 ) and (B, ≤2 ) be two ordered sets. A Galois connection is a pair (f, g), where f is a mapping f : A → B, g is a mapping g : B → A, such that f and g are antitone, and both h = gof and h0 = f og are extensive; h and h0 are then closure operators. The mapping f defines the intent of a set S ⊆ E, and the mapping g that allows obtaining the extent in E associated with a set of attributes T ⊆ O, where O is the set of the considered (binary) attributes. The couple (f, g) then constitutes a Galois connection between (P (E), ⊆) and (P (O), ⊆). A concept is defined as a couple (S, T ) where S ⊆ E, T ⊆ O, S = g(T ) and T = f (S), i.e., we have h(S) = S; S is the extent of the concept and T its intent. This approach has been applied to non-binary variables, but in this case data are generally submitted to a previous “binarization”, by performing a binary coding of the c 2011 by the paper authors. CLA 2011, pp. 251–263. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 252 Paula Brito and Géraldine Polaillon data array; for numerical or ordinal variables Y , attributes of the form “Y ≤ x,” for each observed value x, are considered. In [3] this approach has been extended by defining directly the intent of a set of elements; which has allowed obtaining, for each variable type (classical or otherwise) appropriate couples of mappings (f, g) forming a Galois connection. This has the advantage of allowing analyzing the data directly as it is presented, without imposing any sort of binary pre-coding, which may, and generally does, drastically increase the size of the data array to be analyzed. Galois lattices where intents are obtained by union and by intersection are obtained. This approach has been further extended to modal variables (see [4]). The case of ordinal variables has been dealt with in [11], using an approach similar to that of [4] for modal variables. Ganter and Kuznetsov [5] proposed a general construction, called pattern structures, which allows for arbitrary descriptions with a semilattice operation on them; since union and intersection of intervals define semilattices, they make respective pattern structures. An application on gene expression data is pre- sented in [7]. Here, we consider a common framework for numerical (real or interval-valued), ordinal and modal variables, by defining a generalization operator that deter- mines the intent in the form of vectors of intervals. For ordinal and modal (i.e., distribution-valued) variables the obtained concepts are more homogeneous and therefore easier to interpret than those obtained by applying the minimum and maximum operators, as previously proposed. In the next sections, we detail how generalization of a set of elements is performed for each variable type. The number of obtained concepts being often rather large, we propose to identify stable concepts (see also [8] and [12]), using distances designed for inter- val data. 
The criteria is that the intent of a concept should not be too different from those obtained by sequentially removing one element of the extent at a time - which would reveal that this particular element is provoking a drastic change in the concepts’ intent. Should it occur, the concept would be considered to be non-stable. In the case of multi-valued data, other approaches of lattice reduction, di- rectly applied to the concept lattice, have been proposed in [1] and [10]. These two approaches rely on the same idea of merging together similar attribute values (in respect to a given threshold), and thereby reducing the number of concepts. The remainder of the paper is organized as follows. Section 2 describes the generalization procedure for real and interval-valued variables, which is extended in Section 3 to modal variables. In Section 4 a common generalization approach by vectors of intervals is presented. In Section 5 the problem of concept stability is considered, and a method using interval distances is proposed, which allows addressing the question of lattice reduction. Section 6 concludes the paper, open- ing paths for future research. Homogeneity and Stability in Conceptual Analysis 253 2 Real and interval-valued variables Let E = {ω1 , ..., ωn } be the set of n elements or objects to be analyzed, and Y1 , . . . , Yp real or interval-valued variables such that Yj (ωi ) = [lij , uij ]. We shall consider real-valued variables as a special case of interval-valued ones; it is there- fore equivalent to write Yj (ωi ) = x or Yj (ωi ) = [x, x]. Let A = {ω1 , . . . , ωh } ⊆ E. Generalization by union is defined (see [3]) by the mapping f : P (E) → I p where I is the set of intervals of IR endowed with the inclusion order, such that f (A) = (I1 , . . . , Ip ), with Ij = [M in {lij } , M ax {uij }], ωi ∈ A, j = 1, . . . , p, i.e., for each j = 1, . . . , p, Ij is the minimum interval (for the inclusion order) that covers all values taken by the elements of A for variable Yj . Let g : I p → P (E) be the mapping defined as g((I1 , . . . , Ip )) = = {ωi ∈ E : Yj (ωi ) ⊆ Ij , j = 1, . . . , p}, i.e., the set of elements of E taking values within Ij , for j = 1, . . . , p. The couple (f, g) is a Galois connection. Likewise, we may generalise by intersection defining f and g as follows: f ∗ : P (E) → I p , f (A) = (I1 , . . . , Ip ), with Ij = [M ax {lij } , M in {uij }] if M ax {lij } ≤ M in {uij } , ωi ∈ A, Ij = otherwise (i.e., the largest interval contained in all intervals taken by the elements of A for variable Yj , which may be empty), for j = 1, . . . , p, and g ∗ : I p → P (E) with g ∗ ((I1 , . . . , Ip )) = {ωi ∈ E : Yj (ωi ) ⊇ Ij , j = 1, . . . , p} (the set of elements of E taking interval- values that contain Ij ,) for j = 1, . . . , p. The couple (f ∗ , g ∗ ) forms also a Galois connection. Example 1: Consider three persons, Ann, Bob and Charles characterized by two variables, age and amount of time (in minutes) necessary to go to work (which varies from day to day, and is therefore represented by an interval-valued variable), as presented in Table 1. Age Time Ann 25 [15, 20] Bob 32 [25, 30] Charles 40 [10, 20] Table 1. Age and amount of time (in minutes) necessary to go to work for three persons. Let A = {Bob,Charles}. 
Generalization by the union leads to f (A) = ([32, 40], [10, 30]), describing people who are between 32 and 40 years old and take 10 to 30 minutes to go to work; in this dataset people meeting this description are given by g(f (A)) = g(([32, 40], [10, 30])), i.e., {Bob, Charles} = A. Here, ({Bob, Charles}, ([32, 40], [10, 30])) is a concept. 254 Paula Brito and Géraldine Polaillon 3 Modal variables Two Galois connections may also be defined for the case of modal variables (see [4]). Let Y1 , . . . , Yp be p modal variables, Oj = mj1 , . . . , mjkj the set of kj possible categories of variable Yj , Mj the set of distributions defined on Oj , for j = 1, . . . ,np, and M = M1 × . . . ×oMp . For variable Yj and element ωi ∈ E, Yj (ωi ) = mj1 (pω ωi ωi j1 ), . . . , mjkj (pjkj ) , where pjk` is the probability/frequency i associated with category mj` (` = 1, . . . , kj ) of variable Yj , and element ωi . Let A = {ω1 , . . . , ωh } ⊆ E. To generalise by the maximum we take, for each category mj` , the maximum of its probabilities/frequencies in A. Let f : P (E) → M , such that f (A) = (d1 , . . . , dp ), with dj = {mj1 (tj1 ), . . . , mjkj (tjkj )}, where tj` = M ax{pωj` , ωi ∈ i A}, ` = 1, . . . , kj . The intent of a set A ⊆ E is then to be interpreted as “objects with at most tj` cases presenting category mj` , ` = 1, . . . , kj , j = 1, . . . , p”. The couple (f, g) with n g : M → P (E) defined as, for dj = {mj1 (pj1 ),o. . . , mjkj (pjkj )}, g((d1 , . . . , dp )) = ωi ∈ E : pωj` ≤ pj` , ` = 1, . . . , kj , j = 1, . . . , p , forms a Galois i connection. Similarly, we may generalise by the minimum taking for each category the minimum of its probabilities/frequencies. Let f ∗ : P (E) → M , f ∗ (A) = (d1 , . . . , dp ), with dj = {mj1 (vj1 ), . . . , mjkj (vjkj )}, where vj` = M in{pω j` , ωi ∈ A}, ` = i 1, . . . , kj . The intent of a set A ⊆ E is now interpreted as “objects with at least vj` cases presenting category mj` , ` = 1, . . . , kj , j = 1, . . . , p”. The couple (f ∗ , g ∗ )nwith g ∗ : M → P (E) such that, for dj = {mj1o(pj1 ), . . . , mjkj (pjkj )}, g ∗ ((d1 , . . . , dp )) = ωi ∈ E : pω j` ≥ pj` , ` = 1, . . . , kj , j = 1, . . . , p forms likewise i a Galois connection. Example 2: Consider four groups of students for each of which a categorical mark is given, according to the following scale: a: mark < 10, b: mark between 10 and 15, c: mark > 15 as summarized in Table 2. Mark Group 1 < 10(0.2), [10 − 15] (0.6), > 15(0.2) Group 2 < 10(0.3), [10 − 15] (0.3), > 15(0.4) Group 3 < 10(0.1), [10 − 15] (0.6), > 15(0.3) Group 4 < 10(0.3), [10 − 15] (0.6), > 15(0.1) Group 5 < 10(0.5), [10 − 15] (0.3), > 15(0.2) Table 2. Frequency distributions of the students marks, in 3 categories, for 5 groups. The intent, obtained by the maximum operator, of the set formed by groups 1 and 2, is {a(0.3), b(0.6), c(0.4)} and is interpreted as “students’ groups with at Homogeneity and Stability in Conceptual Analysis 255 most 30% of marks a, at most 60% of marks b and at most 40% of marks c”. The corresponding extent comprehends groups 1, 2, 3 and 4. If, alternatively, we determine the intent of the same set by the minimum operator, we obtain {a(0.2), b(0.3), c(0.2)}, to be read as “students’ groups with at least 20% of marks a, at least 30% of marks b and at least 20% of marks c”, whose extent is formed by groups 1, 2 and 5. 
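As a concrete check of the two examples above, the following Python sketch (ours, not part of the paper) implements generalization by union for interval-valued variables and generalization by the maximum and the minimum for modal variables; the data structures are an illustrative choice.

# Illustrative sketch: generalization by union (interval-valued variables)
# and by maximum / minimum (modal variables), on Examples 1 and 2.

def union_intent(data, A):
    """f(A): per variable, the smallest interval covering all values of A."""
    return [(min(data[w][j][0] for w in A), max(data[w][j][1] for w in A))
            for j in range(len(next(iter(data.values()))))]

def union_extent(data, intervals):
    """g: elements whose interval value is included in every intent interval."""
    return {w for w, desc in data.items()
            if all(J[0] <= v[0] and v[1] <= J[1] for v, J in zip(desc, intervals))}

# Example 1 (real values coded as degenerate intervals [x, x]).
people = {"Ann": [(25, 25), (15, 20)],
          "Bob": [(32, 32), (25, 30)],
          "Charles": [(40, 40), (10, 20)]}
D = union_intent(people, {"Bob", "Charles"})      # [(32, 40), (10, 30)]
print(D, union_extent(people, D))                 # extent = {'Bob', 'Charles'}

def modal_intent(dists, A, op):
    """f(A) (op=max) or f*(A) (op=min): per category, the max/min frequency."""
    cats = next(iter(dists.values())).keys()
    return {m: op(dists[w][m] for w in A) for m in cats}

def modal_extent(dists, d, cmp):
    """g (cmp: <=) or g* (cmp: >=): elements whose frequencies all satisfy cmp."""
    return {w for w, dw in dists.items() if all(cmp(dw[m], d[m]) for m in d)}

# Example 2: mark distributions of the five students' groups.
groups = {1: {"a": .2, "b": .6, "c": .2}, 2: {"a": .3, "b": .3, "c": .4},
          3: {"a": .1, "b": .6, "c": .3}, 4: {"a": .3, "b": .6, "c": .1},
          5: {"a": .5, "b": .3, "c": .2}}
dmax = modal_intent(groups, {1, 2}, max)
print(modal_extent(groups, dmax, lambda p, q: p <= q))   # {1, 2, 3, 4}
dmin = modal_intent(groups, {1, 2}, min)
print(modal_extent(groups, dmin, lambda p, q: p >= q))   # {1, 2, 5}

The printed extents coincide with those discussed in Examples 1 and 2 above.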
4 A common approach: generalization by intervals We now present a unique framework allowing to perform generalization for nu- merical (real or interval-valued) variables, ordinal variables and modal variables, based on generalization by intervals. For numerical (real or interval-valued) data, we are in the above mentioned case of generalization by taking the union. For modal variables, it amounts to consider, for each category, an interval corresponding to the range of its probability/frequency. In fact, it has often been observed that generalization either by the maximum or by the minimum, as defined in Section 3, may quickly lead to over-generalization. As a consequence, f (A), A ⊆ E, is not very informative. Let MjI = {mj` (Ij` ), ` = 1, . . . , kj }, mj` ∈ Oj , Ij` ⊆ [0, 1] and M I = M1I × . . . × MpI . Generalization is now defined as f I : P (E) → M I I f (A) = (d1 , . . . , dp ) with dj = {mj1 (Ij1 ), . . . , mjkj (Ijkj )}, h i where Ij` = M in{pωi j` }, M ax{pωi j` } , ωi ∈ A, ` = 1, . . . , kj , j = 1, . . . , p and n gI : M I → E o g((d1 , . . . , dp )) = ωi ∈ E : pω j` ∈ Ij` , ` = 1, . . . , kj , j = 1, . . . , p i The so-defined couple of mappings (f I , g I ) forms a new Galois connection. On the data of Example 2, generalization by intervals of groups 1 and 2 provides the intent {a [0.2, 0.3] , b [0.3, 0.6] , c [0.2, 0.4]}, to be read as “students’ groups having between 20% and 30% cases of mark a, between 30% and 60% cases of mark b and between 20% and 40% cases of mark c” and whose extent now only contains groups 1 and 2. The case of ordinal variables has been addressed in [11], performing general- ization either using the maximum or the minimum. To allow for more flexibility, the author proposes to choose the operator individually for each variable. Nev- ertheless, one of these generalization operators must be chosen in each case, and over-generalization is not prevented. Our proposal for this type of variables, is to generalise a set A ⊆ E considering, no longer a minimum or a maximum, but rather an interval of ordinal values. 256 Paula Brito and Géraldine Polaillon Example 3: Consider the classifications given by four cinema critics while evaluating three movies, Movie 1, Movie 2 and Movie 3 as given in Table 3. Movie 1 Movie 2 Movie 3 Critic 1 5 5 4 Critic 2 5 4 4 Critic 3 1 2 2 Critic 4 2 1 1 Table 3. Classifications given by four critics to three movies. The intent obtained by using the maximum operator of the group formed by critics 1 and 2 is (5, 5, 4), to be interpreted as “critics giving at most mark 5 to Movie 1, at most mark 5 to Movie 2 and at most mark 4 to Movie 3” - which is obviously too general and would cover almost everyone; in this dataset the corresponding extent contains critics 1, 2, 3 and 4. Therefore, the class formed by critics 1 and 2, who present a similar behavior, does not correspond to a con- cept. The intent obtained by using the minimum operator of the group formed by critics 3 and 4 is (1, 1, 1), to be read “critics giving at least mark 1 to Movie 1, at least mark 1 to Movie 2 and at least mark 1 to Movie 3” - which would cover every critic; its extent in this dataset consists again of critics 1, 2, 3 and 4. Here again, the class formed by critics 3 and 4, who give quite similar marks, does not correspond to a concept. 
If we now perform generalization by interval-vectors of the group formed by critics 1 and 2, we obtain the intent ([5, 5] , [4, 5] , [4, 4]); likewise for the group formed by critics 3 and 4, we have ([1, 2] , [1, 2] , [1, 2]); in the first case we are clearly referring to critics giving high marks while in the second case we describe critics giving low marks to all movies. The corre- sponding extents no longer contain other critics, presenting a rather different profile from those considered each time. Furthermore, both ({Critic 1, Critic 2}, ([5, 5] , [4, 5] , [4, 4]) and ({Critic 3, Critic 4}, ([1, 2] , [1, 2] , [1, 2]) are concepts. When determining concepts, according to the minimum or the maximum oper- ators, e.g. in a clustering context, there is therefore a risk of forming heteroge- neous clusters, since over-generalization may lead to a too large extent. By taking interval-vectors of observed values, the over-generalization problem is avoided. To conclude this section, we now present a more general example, with variables of the different considered types. Example 4: Consider the data in Table 4, where 4 persons are described by their age, a real-valued variable, time (in minutes) they take to go to work, an interval- valued variable, the means of transportation used, a modal variable, and their classifications given to three newspapers, A, B and C (ordinal variable). Homogeneity and Stability in Conceptual Analysis 257 Age Time Transport A B C Albert 25 [15, 20] car (0.2) bus (0.8)) 4 2 5 Bellinda 40 [25, 30] car (0.7), bus (0.2), train (0.1)) 2 4 3 Christine 32 [10, 15] car (0.2), bus (0.7), train (0.1)) 5 1 4 David 58 [30, 45] car (0.9), bus (0.1)) 2 4 1 Table 4. Age, time taken to go to work (in minutes), means of transportation used and classifications given to newspapers A, B and C for four persons. The intent of A = {Albert, Christine} is V = ([25, 32] , [10, 20] , ([0.2, 0.2] , [0.7, 0.8] , [0.0, 0.1]) , [4, 5] , [1, 2] , [4, 5]) and (A, V ) is a concept. 5 Stability Concepts are theoretically very interesting, and do provide rich information on the values shared by subsets of elements of the set under study. However, the number of concepts of a data array is often rather large, even for relatively low cardinals of the sets of elements and variables. This fact makes the analysis and interpretation of results a bit delicate. It is often to be noticed that when analyzing the concepts generated by numerical or modal variables, groups of concepts appear which are quite similar. This may be due to noise or minor differences, generally not pertinent. The idea is therefore to extract only those concepts which are representative of these groups of similar concepts, so as to obtain a more concise representation with significantly homogeneous concepts. Several solutions may be pointed out for this objective. We will focus on the notion of stability, as introduced in [8] and [12], which evaluates the amount of information of the intent that depends on specific objects of the concept’s extent. Formally, the stability of a concept is defined as the probability of keeping its intent unchanged while deleting arbitrarily chosen objects of its extent. When analyzing data described by numerical (real or interval-valued), ordinal or modal variables, and generalizing using interval-vectors (as described in the previous sections), we shall apply a similar approach to each formed concept, but introducing a distance measure. 
The objective being to retain the homogeneous concepts, it is wished to avoid that a single element of the concepts’ extent produces an important increase in the intent’s intervals’ ranges. To identify the stable concepts, a threshold α depending on the maximum distance is defined (so as no to be dependent from the variables’ scales). A concept is said to be “stable” if the distance between the intent obtained by removing one element of the extent at a time, and its original intent, is not above the given threshold. This is in fact a cross-validation-like approach, in that one element of the extent is removed at a time, and the resulting intent is compared with the original one. 258 Paula Brito and Géraldine Polaillon When data have an interval form, interval distances should be used. Dif- ferent measures are available in the literature; we will focus on three interval distance measures: the Hausdorff distance, the interval Euclidean distance and the interval City-Block distance. Let Ii = [li , ui ] and Ih = [lh , uh ] be two intervals we wish to compare. The Hausdorff distance dH , the interval Euclidean distance d2 and the interval City- Block distance d1 between Ii and Ih are respectively dH (Ii , Ih ) = M ax {{|li − lh | , |ui − uh |} p d2 (Ii , Ih ) = (li − lh )2 + (ui − uh )2 d1 (Ii , Ih ) = |li − lh | + |ui − uh | . The Hausdorff distance between two sets is the maximum distance of a set to the nearest point in the other set, i.e., two sets are close in terms of the Hausdorff distance if every point of either set is close to some point of the other set. Interval Euclidean and City-Block distances are just the counterparts of the corresponding distances for real values; if we embed the interval set in IR2 , where one dimension is used for the lower and the other for the upper bound of the intervals, then these distances are just the Euclidean and City-Block distances between the corresponding points in the two-dimensional space. Let C = (A, D) be a concept, where A = {ω1 , . . . , ωh } ⊆ E is its extent and D = (I1 , . . . , Ip ) is its intent, D = f (A). The considered criterion is then the distance ∆ between D et D−i where D−i is the intent of A without ωi , D−i = f (A \ {ωi }), i = 1, . . . , h, defined by: ∆ = M ax{δ(D, D−i ), ωi ∈ A}, δ measuring the dissimilarity between interval-vectors. Let d be the distance (according to the chosen measure) between the intervals corresponding to variable Yj in a concept’s intent. Two options may then be foreseen, whether it is wished to consider the maximal or the average distance on the intervals defining the intents: 1. δM ax (D, D−i ) = M ax{d(Ij , Ij−i )}, j indexing the variable set Yj , j = 1, . . . , p in the case of numerical and ordinal variables, and the global category set O = O1 ∪ . . . ∪ Op in the case of p modal variables; 2. δM ean (D, D−i ) = M ean{d(Ij , Ij−i )}, j as in 1. A concept C = (A, D) is then considered to be stable if ∆ ≤ α. This ap- proach allows keeping only the stable, and therefore more representative, con- cepts, avoiding the effect of outlier observations. 6 Illustrative application Consider again classifications given by cinema critics evaluating three movies, Movie 1, Movie 2 and Movie 3 where Yj (Critici ) is the mark given by Critic i to Movie j, i = 1, . . . , 5; j = 1, 2, 3, as given in Table 5. Tables 6 and 7 list the concepts obtained when the Minimum and the Maxi- mum generalization operators are used, respectively. 
Homogeneity and Stability in Conceptual Analysis 259 Movie 1 Movie 2 Movie 3 Critic 1 3 2 3 Critic 2 1 1 2 Critic 3 5 5 1 Critic 4 4 3 2 Critic 5 2 4 5 Table 5. Classifications given by five critics to three movies. Intent Extent Movie 1 Movie 2 Movie 3 {1} ≥3 ≥2 ≥3 {3} ≥5 ≥5 ≥1 {4} ≥4 ≥3 ≥2 {5} ≥2 ≥4 ≥5 {1, 4} ≥3 ≥2 ≥2 {1, 5} ≥2 ≥2 ≥3 {3, 4} ≥4 ≥3 ≥1 {3, 5} ≥2 ≥4 ≥1 {1, 3, 4} ≥3 ≥2 ≥1 {1, 4, 5} ≥2 ≥2 ≥2 {3, 4, 5} ≥2 ≥3 ≥1 {1, 2, 4, 5} ≥1 ≥1 ≥2 {1, 3, 4, 5} ≥2 ≥2 ≥1 {1, 2, 3, 4, 5} ≥1 ≥1 ≥1 Table 6. Concepts of the Minimum lattice corresponding to the data in Table 5. Intent Extent Movie 1 Movie 2 Movie 3 {2} ≤1 ≤1 ≤2 {3} ≤5 ≤5 ≤1 {1, 2} ≤3 ≤2 ≤3 {2, 4} ≤4 ≤3 ≤2 {2, 5} ≤2 ≤4 ≤5 {1, 2, 4} ≤4 ≤3 ≤3 {1, 2, 5} ≤3 ≤4 ≤5 {2, 3, 4} ≤5 ≤5 ≤2 {1, 2, 3, 4} ≤5 ≤5 ≤3 {1, 2, 4, 5} ≤4 ≤4 ≤5 {1, 2, 3, 4, 5} ≤5 ≤5 ≤5 Table 7. Concepts of the Maximum lattice corresponding to the data in Table 5. 260 Paula Brito and Géraldine Polaillon The concepts (except for the empty extent one) obtained from this data table, using generalization by intervals, i.e., for A ⊆ E, f (A) = (I1 , I2 , I3 ), with Ij = [M in {Yj (Critici )} , M ax {Yj (Critici )}], Critici ∈ A, j = 1, 2, 3, are listed in Table 8. Intent Extent Movie 1 Movie 2 Movie 3 {1} [3, 3] [2, 2] [3, 3] {2} [1, 1] [1, 1] [2, 2] {3} [5, 5] [5, 5] [1, 1] {4} [4, 4] [3, 3] [2, 2] {5} [2, 2] [4, 4] [5, 5] {1, 2} [1, 3] [1, 2] [2, 3] {1, 4} [3, 4] [2, 3] [2, 3] {1, 5} [2, 3] [2, 4] [3, 5] {2, 4} [1, 4] [1, 3] [2, 2] {2, 5} [1, 2] [1, 4] [2, 5] {3, 4} [4, 5] [3, 5] [1, 2] {3, 5} [2, 5] [4, 5] [1, 5] {4, 5} [2, 4] [3, 4] [2, 5] {1, 2, 4} [1, 4] [1, 3] [2, 3] {1, 2, 5} [1, 3] [1, 3] [2, 5] {1, 3, 4} [3, 5] [2, 5] [1, 3] {1, 4, 5} [2, 4] [2, 4] [2, 5] {2, 3, 4} [1, 5] [1, 5] [1, 2] {3, 4, 5} [2, 5] [3, 5] [1, 5] {1, 2, 3, 4} [1, 5] [1, 5] [1, 3] {1, 2, 4, 5} [1, 4] [1, 4] [2, 5] {1, 3, 4, 5} [2, 5] [2, 5] [1, 5] {1, 2, 3, 4, 5} [1, 5] [1, 5] [1, 5] Table 8. Concepts of the interval lattice for the data in Table 5. We notice that all the concepts obtained using the Minimum or the Maximum operator are concepts for the interval generalization, although with a different meaning, given the different intent mapping. As discussed before, even in this small example it may be observed that concepts obtained using the Minimum or the Maximum operator often present a rather general intent, thus leading to over-generalization in the concept formation. Consider, for instance, the concept ({1} , (Movie 1 ≥ 3 , Movie 2 ≥ 2 , Movie 3 ≥ 3)) in Table 6, it indicates that Critic 1 gives high marks to each movie, which is not really the case, whereas the concept ({1} , (Movie 1 ∈ [3, 3] , Movie 2 ∈ [2, 2] , Movie 3 ∈ [3, 3])) in Table 8 gives a much more accurate description of the concepts’s extent. Also, concept ({3} , (Movie 1 ≤ 5 , Movie 2 ≤ 5 , Movie 3 ≤ 1)) in Table 7 describes Critic 3 Homogeneity and Stability in Conceptual Analysis 261 as giving any marks to Movies 1 and 2, and low marks to Movie 3; using interval generalization we learn that the marks given by Critic 3 to Movies 1 and 2 are the highest and non other. Consider now concept ({3, 4} , (Movie 1 ≥ 4 , Movie 2 ≥ 3 , Movie 3 ≥ 1)) in Table 6: the intent reports any mark for Movie 3 (in particular, high marks are possible); if we use interval generalization instead we obtain the concept ({3, 4} , (Movie 1 ∈ [4, 5] , Movie 2 ∈ [3, 5] , Movie 3 ∈ [1, 2] which more accurately describes the observed situation. 
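The stability test of Section 5 can be reproduced on this small example with a few lines of Python; the sketch below (ours, not the authors' implementation) uses the interval-vector intents of Table 8, the Hausdorff distance and the δMax criterion, with singleton extents kept as stable, which agrees with the stable-concept lists reported below.

# Illustrative sketch: stability test of Section 5 on the Table 5 data,
# with interval-vector intents, the Hausdorff distance and the Max criterion.
marks = {1: (3, 2, 3), 2: (1, 1, 2), 3: (5, 5, 1), 4: (4, 3, 2), 5: (2, 4, 5)}

def interval_intent(A):
    """f(A): per movie, the interval [min, max] of the marks given by A."""
    return [(min(marks[c][j] for c in A), max(marks[c][j] for c in A))
            for j in range(3)]

def hausdorff(I, J):
    return max(abs(I[0] - J[0]), abs(I[1] - J[1]))

def delta_max(D1, D2):
    return max(hausdorff(I, J) for I, J in zip(D1, D2))

def is_stable(A, alpha):
    """Concept (A, f(A)) is stable if removing any single element of the
    extent changes the intent by at most alpha (Delta <= alpha).
    Singleton extents are kept, since there is nothing left to compare."""
    if len(A) < 2:
        return True
    D = interval_intent(A)
    return max(delta_max(D, interval_intent(A - {c})) for c in A) <= alpha

# Two extents of concepts listed in Table 8:
print(is_stable({1, 4}, alpha=1))        # True: retained as stable with threshold 1
print(is_stable({2, 3, 4}, alpha=1))     # False: Delta = 3 exceeds the threshold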
We now compare the concepts retained as stable with each of the three distances, using both δM ax and δM ean , and a threshold value of 1 and 2. The identified stable concepts in each case, represented by the corresponding extent, are listed in Table 9. Distance Criterion Threshold Stable concepts (extent) dH Max 1 {1} , {2} , {3} , {4} , {5} , {1, 4} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {3, 4} , 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} Mean 1 {1} , {2} , {3} , {4} , {5} , {1, 4} , {1, 2, 4} , {1, 2, 5} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {2, 4} , {3, 4} , {4, 5} 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {2, 3, 4} , {3, 4, 5} {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} d2 Max 1 {1} , {2} , {3} , {4} , {5} , {1, 4} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {3, 4} , 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} Mean 1 {1} , {2} , {3} , {4} , {5} , {1, 4} , {1, 2, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {2, 4} , {3, 4} , {4, 5} 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {2, 3, 4} , {3, 4, 5} , {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} d1 Max 1 {1} , {2} , {3} , {4} , {5} , {1, 4} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {3, 4} , 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} Mean 1 {1} , {2} , {3} , {4} , {5} , {1, 4} , {1, 2, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {2, 4} , {3, 4} , {4, 5} , 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {2, 3, 4} , {3, 4, 5} , {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} Table 9. Stable concepts for different distances, criteria and threshold values. 262 Paula Brito and Géraldine Polaillon As it may be seen from Table 9, for all distances and both criteria, a demand- ing threshold identifies a small number of stable concepts, therefore leading to an important reduction in the number of retained concepts; if we use a more liberal threshold, a larger number of concepts are retained as stable, as was to be expected. The maximum criterion is naturally more strict than the mean, which retains more concepts as stable, for all distances and both threshold val- ues. Finally, in this example, no important difference appears between the results obtained for the different distance measures. 7 Conclusion A common generalization procedure, for numerical, ordinal and modal variables, which uses a representation based on interval-vectors is presented. This allows defining more homogeneous concepts, than generalization operators that use the maximum and/or the minimum. The proposed approach for ordinal variables allows addressing recommendation systems, analyzing preference data tables. It would also be interesting to explore how the proposed generalization operator behaves in a supervised learning context. The number of obtained concepts being often rather large, a method for identifying stable concepts is proposed, using a cross-validation-like approach. This allows avoiding the effect of atypical elements in the concepts’ formation. Naturally, the value of the used threshold has an important influence in the rate of concept reduction. 
The next step will be to explore this methodology for larger data tables, so as to have a more accurate evaluation of its efficiency in concept reduction. Another issue interesting to investigate is the comparaison of the list of concepts with those obtained with a subset of the given variables. This then leads to the problem of variable selection in the context of Galois lattices construction and analysis. As concerns applications, we are particularly interested in analyzing real preference data, for application in recommendation systems. References [1] Z. Assaghir, M. Kaytoue, N. Messai and A. Napoli (2009). On the mining of numer- ical data with Formal Concept Analysis and similarity. In Proc. Société Francophone de Classification, pp. 121-124. [2] Barbut, M. and B. Monjardet (1970). Ordre et Classification, Algèbre et Combina- toire, Tomes I et II. Paris: Hachette. [3] Brito, P. (1994). Order structure of symbolic assertion objects. IEEE Transactions on Knowledge and Data Engineering 6 (5), 830–835. [4] Brito, P. and G. Polaillon (2005). Structuring probabilistic data by Galois lattices. Math. & Sci. Hum. / Mathematics and Social Sciences 169 (1), 77–104. [5] Ganter, B. and S.O. Kuznetsov (2001). Pattern structures and their projections. In: G. Stumme and H. Delugach (Eds.), Proc. 9th Int. Conf. on Conceptual Structures, ICCS’01, Lecture Notes in Artificial Intelligence, vol. 2120, pp. 129-142. Homogeneity and Stability in Conceptual Analysis 263 [6] Ganter, B. and R. Wille (1999). Formal Concept Analysis, Mathematical Founda- tions. Berlin: Springer. [7] Kaytoue, M., S.O. Kuznetsov, A. Napoli and S. Duplessis (2011). Mining gene expression data with pattern structures in formal concept analysis. Information Sciences, Volume 181, Issue 10, 1989–2001. [8] Kuznetsov, S. (2007). On stability of a formal concept. Annals of Mathematics and Artificial Intelligence 49 (1-4), 101–115. [9] Noirhomme-Fraiture, M. and P. Brito (2011). Far beyond the classical data models: Symbolic Data Analysis. Statistical Analysis and Data Mining 4 (2), 157–170. [10] Pernelle, N., M.-C. Rousset, and V. Ventos (2001). Automatic construction and refinement of a class hierarchy over multi-valued data. In L. De Raedt and A. Siebes (Eds.), Principles of Data Mining and Knowledge Discovery, Lecture Notes in Com- puter Science, pp. 386–398. [11] Pfaltz, J. (2007). Representing numeric values in concept lattices. In J. Diatta, P. Eklund and M. Liquiere (Eds.), Proc. Fifth International Conference on Concept Lattices and Their Applications, pp. 260–269. [12] Roth, C., S. Obiedkov and D. Kourie (2008). On succint representation of knowl- edge community taxonomies with Formal Concept Analysis. International Journal of Foundations of Computer Science 19 (2), 383–404. A lattice-based query system for assessing the quality of hydro-ecosystems Agnès Braud1 , Cristina Nica2 , Corinne Grac3 , and Florence Le Ber?3,4 1 LSIIT, CNRS-UdS, Strasbourg, France 2 University Dunărea de Jos, Galati, Romania 3 › LHYGES, CNRS-ENGEES-UdS, Strasbourg, France 4 LORIA – INRIA NGE, Nancy, France Abstract. Concept lattices are useful tools for organising and querying data. In this paper we present an application of lattices for analysing and classifying stream sites described by physical, physico-chemical and biological parameters. Lattices are first used for building a hierarchy of site profiles which are annotated by hydro-ecologists. This hierarchy can then be queried to classify and assess new sites. 
The whole approach relies on an information system storing data about Alsatian stream sites and their parameters. A specific interface has been designed to manipu- late the lattices and an incremental algorithm has been implemented to perform the query operations. Keywords: incremental lattice, lattice-based query system, classifica- tion, information system, biological quality of water-bodies 1 Introduction Concept -or Galois- lattices are useful tools for organising, mining, and querying qualitative data in various application domains [14, 10, 24]. However when de- veloping a domain specific lattice-based tool -to be used by domain analysts, a main problem is to define the proper approach and tool that fit the requirements of the experts and other users involved in the project. This paper presents an application of Galois lattices to the hydro-ecological domain, focussing on how to assess and monitor the ecological state of streams or water areas. These questions are currently major problems in Europe, as underlined by the recent European Water Framework Directive (2000). Assessing the ecological quality of streams requires to take into account various data such as physico-chemical measures on sites, but also taxonomic statements or qualitative information on species. Fur- thermore tools are needed to summarise all these data and to provide a global and reliable information on the ecological state of streams and water areas. Fol- lowing this aim we have developed an information system to collect data on Alsatian streams (North-East of France) [17] and implemented a lattice-based query system to help hydro-ecologists to compare and assess the ecological state ? Corresponding author, florence.leber@engees.unistra.fr. c 2011 by the paper authors. CLA 2011, pp. 265–277. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 266 2 A. Braud, Agnés Braud, C. Nica, C. Cristina Grac, Nica, F. Le Ber Corinne Grac and Florence Le Ber of streams. Concepts lattices are used: (1) to organise data, i.e. stream or water area sites with similar parameters are clustered within concepts; (2) to embed expert knowledge, i.e. concepts are annotated with an expert qualification or comment; (3) to perform queries, i.e. the annotated concepts are used to help assessing new sites of streams or water areas. The paper is organised as follows. First (Section 2) we present the application domain. Section 3 is devoted to the principles of lattice-based querying. Sections 4 and 5 describe the principles and the implementation of our proposition. Sec- tion 6 compares our approach to other lattice-based tools and the last section is a conclusion. 2 Assessing the quality of hydro-ecosystems The European Water Framework Directive (2000) requires the development of new tools for monitoring and assessing the quality of water-bodies (i.e. rivers, lake, gravel pits,...). Such an assessment is built on various information: informa- tion about the species living in the streams and physical, chemical and biological data collected on the sites. From these information are built several numerical indices that are synthetic indicators for assessing the physico-chemical or bio- logical quality of an hydro-ecosystem. More precisely, in France, five biological indices have been normalised to assess the quality of running water. 
They are based on three faunistic groups: the invertebrate index [1], the oligochaete (small worms living in sediments) index [3], the fish index [5], and on two floristic groups: the diatom (microscopic algae) index [2], and the macrophyte (macroscopic plants living in water) index [4]. Illustrations of the taxa used for these indices are given in Figure 1. (a) Invertebrate(b) Oligochaete (c) Fish (d) Diatom (e) Macrophyte Fig. 1: Taxa examples for the five biological indices According to AFNOR (French organism of normalisation) [1, 3, 5, 2, 4] each of them gives a different estimation of the water ecosystem quality. The macrophyte index estimates the trophic level of water, the diatom index gives the global water quality, the oligochaete index gives an evaluation of the sediment quality, and the fish index allows to classify the chemical and physical water quality quite like the invertebrate index. Therefore, their answers on a same site, with a same undergone pressure, at the same time can be really different but the simultaneous A lattice-based query system forLattice-based assessing theassessment quality ofofhydro-ecosystems hydro-ecosystems 3 267 application of these five indices is not common and work comparing their answers are not frequent [20]. Furthermore, indices based on physical (e.g. width and slope of the stream bed) and physico-chemical (e.g. pH, temperature, nitrates, organic matters, pes- ticides) data give an other estimation of the ecosystem quality. Thus, it is necessary to combine the various indices to assess the quality of a whole water ecosystem. Such an approach, called the ecological ambiance system, has been proposed in [20, 21] based on the five French biological indices. Our objective is to develop this concept and to propose a concretely applicable tool. We therefore rely on a large database collecting data on Alsatian streams and water areas [18]. The database contains 38 tables and it suits the SANDRE1 French national format for aquatic data. It is implemented within the MySQL Database Management System. The data are either issued from samples, synthetic data or general informa- tion issued from the literature. They are qualitative and quantitative, and suit the current standards about protocol sampling and indices computation based on thresholds [1, 3, 5, 2, 4, 22, 23]. Data issued from samples correspond to raw data. Synthetic data are produced from these samples, in particular taxonomic lists are used to compute biological indices. Data issued from the literature are used for the analysis and synthesis of the preceding data (for example they provide the thresholds for the classification of physical, physico-chemical and biological results into classes ranging from 1 (very good quality) to 5 (very bad quality)). We have gathered information on 700 sites in the Alsace Plain, the oldest one being collected 20 years ago. Details on this database and how it is used are given in [17]. 3 Using lattices for querying databases Galois lattices are useful tools for organising data and building knowledge bases [7, 14, 24]. Furthermore, they are very interesting for information retrieval since they allow both direct retrieval and browsing [16]. Primarily, concept lattices have been used for information retrieval within texts [25, 11]. More recently lattice- based approaches have been used to build query or information retrieval systems on various data: e.g. information retrieval within photos or personal data [13], geographical data [8], or museum collections [26]. 
The underlying hypothesis is that a concept extent represents the result of a query defined by the conjunction of the attributes in its intent. The query can easily be refined or enlarged by following the edges starting from the concept in the lattice hierarchy. In practice, a query (a set A of attributes) is answered as follows: the lattice is searched for a matching concept, that is, a concept whose intent equals the set A, if it exists, or else the most general concept whose intent is larger than A. This concept can also be characterised as the infimum (greatest lower bound) of all the concepts containing at least one of the attributes of A. This can be done with various algorithms, and the queried lattice does not have to be modified. Furthermore, a local view can be displayed to the user.

However, when the query represents a new object that is to be incorporated within the lattice, an incremental algorithm has to be used [15, 10]. This is the case in our application, since the user has data about real stream sites that she/he wants to compare with the sites represented in the existing lattice. Furthermore, she/he can add the new sites to the lattice and thus modify its structure. We have therefore implemented two incremental algorithms proposed in [10], roughly described in Section 5.1. These algorithms have been chosen because they build the Hasse diagram of the lattice, contrary to most incremental algorithms (see [19] for a comparison of these algorithms). Furthermore, we did not aim at performance, since in this first step of our work only small data sets (40 sites) have been considered.

4 Using lattices for assessing hydro-ecosystems

Lattices have been used in two ways: firstly, to cluster stream sites into concepts that are used by hydro-ecologists to define profiles of these sites; secondly, the lattices are annotated with the profiles and used in a query system to help the assessment of new sites. The proposed tool includes the two stages (see Section 5.2).

4.1 A lattice-based clustering of Alsatian stream sites

Stream sites are described by different numerical attributes: biological indices on the one hand, physico-chemical data on the other hand. Those attributes are converted into ordinal scales leading to quality classes. The whole context contains about 40 stream sites, described with 5 biological indices, 10 physico-chemical indices and 5 physical indices. In the following, we focus on the biological indices. Table 1 gives the values of these five indices restricted to seven sites. Each site is denoted by a code: for example, the BW2 site (Brunnwasser downstream) has a good quality (class 2) for the IBGN (invertebrate), IBD (diatom) and IPR (fish) indices, a bad quality (class 4) for the IBMR (macrophyte) index and an average quality (class 3) for the IOBS (oligochaete) index. The multi-valued context represented in Table 1, denoted C7 in the following, can be converted into a binary one by using a linear scale [14].

The general idea is to gather similar sites and to allocate them a profile describing their ecological state, combining the quality estimations of all compartments, with respect to the different classes of indices. This work is based on the approach described in [20]. The process is as follows:

– Step 1: Lattice construction on the data.
To facilitate the expert analysis, the context size is reduced by focussing on a small number of indices or by identifying sub-lattices with respect to classes of indices. For example, Figure 2 presents the lattice obtained from the context C7 (the lattice was built with ConExp, http://conexp.sourceforge.net/).

Table 1: Quality classes of the five biological indices for 7 stream sites

Site code  IBGN  IBMR  IOBS  IBD  IPR
BW2          2     4     3    2    2
IL1          3     3     3    2    3
MO1          1     4     3    3    4
MS2          2     4     5    2    2
RT2          2     5     4    2    2
ST1          1     3     4    3    2
ZN4          1     4     4    3    2

– Step 2: Analysis by the experts of the lattice hierarchy and its implication rules in order to select relevant concepts (or site profiles). In this step, the experts may identify profiles which are not present in the lattice and create virtual sites to be represented in the lattice.
– Step 3: Qualification of the concepts by the experts. For example, the concept ({IBGN 2, IBD 2, IPR 2, IBMR 4, IOBS 3}, {BW2}) (at the bottom of the lattice, Figure 2) is interpreted as follows: Brunnwasser downstream: low sediment degradation, high eutrophication, good general potential of resilience and possible resilience for sediments, various habitats.

Once a suitable annotated lattice has been built following this process, it can be used to determine the profile of a new site based on its values for the corresponding indices. This is explained in the next section.

4.2 Assessing a stream site from the lattice

According to the ecological ambiance system described in [20], several lattices have been built for clustering sites with similar average values, or alteration degrees, on the five biological indices (the alteration degree is computed as the average value of the five biological indices, e.g. the alteration degree of BW2 equals 13/5; currently the physico-chemical parameters are not taken into account). The underlying hypothesis is that the global state of a hydro-ecosystem can be assessed on the basis of the five biological indices and synthesised by the alteration degree. Sites with similar alteration degrees can be compared even if they represent various profiles. The intervals of similarity have been defined by the hydro-ecologists [18]. For example, the lattice in Figure 2 was obtained from a set of sites with an alteration degree belonging to [2.5; 3] (see the C7 context in Table 1). The classes of indices in the lattice vary between 1 and 5. Each site is represented alone in an atom of the lattice, which is consistent with the choices made in the project, which aims at representing the whole variety of streams and water areas in the Alsace plain.

Fig. 2: The lattice based on the context of Table 1 (linear scale)

Let us now suppose that we have partial information on a new stream site, denoted Q, defined by the following values: IBGN 2, IBMR 4, IOBS 3, IPR 2 (IBD missing). Its alteration degree is 2.75 ∈ [2.5; 3], so Q can be compared to the stream sites represented in the C7 lattice. This is done by classifying Q within this lattice, as shown in Figure 3. Looking at the lattice in Figure 3, one can see that the Q site-query has four common values with only the BW2 site (Brunnwasser downstream). The expert qualification of BW2 (except for the IBD index) can thus be used to assess the Q site.
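To make this comparison concrete, here is a small Python sketch (our own illustration, not the authors' tool): it computes the alteration degree of the partial query Q and ranks the sites of Table 1 by the number of index classes they share with Q. The real system performs the comparison by inserting Q into the C7 lattice (Section 5.1); the function and variable names below are ours.

```python
# Sketch only: ranking the sites of Table 1 against a partial site-query.
C7 = {  # quality classes of the five biological indices (Table 1)
    "BW2": {"IBGN": 2, "IBMR": 4, "IOBS": 3, "IBD": 2, "IPR": 2},
    "IL1": {"IBGN": 3, "IBMR": 3, "IOBS": 3, "IBD": 2, "IPR": 3},
    "MO1": {"IBGN": 1, "IBMR": 4, "IOBS": 3, "IBD": 3, "IPR": 4},
    "MS2": {"IBGN": 2, "IBMR": 4, "IOBS": 5, "IBD": 2, "IPR": 2},
    "RT2": {"IBGN": 2, "IBMR": 5, "IOBS": 4, "IBD": 2, "IPR": 2},
    "ST1": {"IBGN": 1, "IBMR": 3, "IOBS": 4, "IBD": 3, "IPR": 2},
    "ZN4": {"IBGN": 1, "IBMR": 4, "IOBS": 4, "IBD": 3, "IPR": 2},
}

def alteration_degree(indices):
    """Average of the available biological index classes."""
    return sum(indices.values()) / len(indices)

def shared_values(query, site):
    """Number of (index, class) pairs common to the query and a site."""
    return sum(1 for k, v in query.items() if site.get(k) == v)

# Partial query Q of Section 4.2 (IBD missing).
Q = {"IBGN": 2, "IBMR": 4, "IOBS": 3, "IPR": 2}
print("alteration degree of Q:", alteration_degree(Q))   # 2.75, within [2.5; 3]

# Rank the sites of the C7 context by the number of values shared with Q.
for site in sorted(C7, key=lambda s: shared_values(Q, C7[s]), reverse=True):
    print(site, shared_values(Q, C7[site]))   # BW2 shares all 4 known values
```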
The Q site could thus be assessed as follows: the habitat quality and the physico-chemical quality of the water are good, except for nutrients (nitrate and mineral forms of phosphorus), whose quality is medium; the sediment quality is medium; the resilience potential of the general ecosystem is good, while the resilience potential of the sediments is deteriorated.

5 Implementation

5.1 Algorithms

As explained before, the built lattices have to be queried for assessing new sites. Furthermore, they may have to be updated, by adding a new site or by modifying an existing site. The new or updated object is described by attributes which may or may not exist in the context of the lattice. In this paper we only consider the case where the attributes already exist. Two algorithms described by Carpineto and Romano [10] have been implemented: the first one adds a new object to a lattice, while the second one deletes an object from a lattice.

Fig. 3: The C7 lattice with the Q site-query inserted

The first algorithm adds a new object into an existing Galois lattice, which can be interpreted as classifying a new object. It takes as input a Galois lattice and the new object with its attributes. The output is the updated Galois lattice of the new context. The mechanism of the algorithm is as follows. The set of concepts is divided into subsets according to their intent cardinality, and then analysed in ascending order. For each concept of a subset, if the intent is included in or equal to the set of attributes of the new object, then the current concept extent is augmented with the new object; otherwise a new concept is created, after verifying that such a concept is neither in the initial set of concepts nor among the newly added ones. The intent of this new concept is determined by the intersection of the current concept intent and the new object attributes; its extent is defined by the current concept extent augmented with the new object. After the addition of a new concept, a new link between this concept and the current concept is created. The links with neighbouring concepts are also updated.

The second algorithm deletes an object from a lattice. It takes as input a Galois lattice and the object to be removed. The output is the updated Galois lattice of the new context. The mechanism of the algorithm is as follows. For each concept, if the object to be deleted belongs to the current concept extent, then it is removed from this extent. If the modified concept then has the same extent as one of its children, it is deleted. When a concept is removed, the links among the concepts are updated.

The modification of an existing object in a Galois lattice is performed in two steps: (1) deleting this object using the second algorithm; (2) adding the updated object using the first algorithm. The whole process could be improved with a third algorithm for adding attributes into the lattice context, which would enrich the initial lattice with new information.
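As an illustration of what the insertion step produces (our own sketch, not the implementation of [10]), the following Python code recomputes the whole concept set of a toy context by brute force before and after adding a new site, and reports which concepts are kept and which are newly created; the incremental algorithms obtain the same concept set without this recomputation and also maintain the Hasse diagram. The toy context, the attribute names and the helper functions are ours.

```python
# Illustration only: the effect of inserting a new object, shown by brute force.
from itertools import combinations

def intent(objs, context, attributes):
    """Attributes shared by all objects in `objs` (all attributes if empty)."""
    result = set(attributes)
    for o in objs:
        result &= context[o]
    return frozenset(result)

def extent(attrs, context):
    """Objects possessing all attributes in `attrs`."""
    return frozenset(o for o, props in context.items() if attrs <= props)

def concepts(context):
    """All formal concepts of a (tiny) context, by enumerating object subsets."""
    attributes = set().union(*context.values())
    intents = {intent(sub, context, attributes)
               for r in range(len(context) + 1)
               for sub in combinations(context, r)}
    return {(extent(i, context), i) for i in intents}

# A toy context: sites described by (index, class) attributes.
ctx = {"BW2": {"IBGN 2", "IBMR 4", "IOBS 3"},
       "MS2": {"IBGN 2", "IBMR 4", "IOBS 5"}}
before = concepts(ctx)
ctx["Q"] = {"IBGN 2", "IOBS 3"}          # the new site to classify (existing attributes only)
after = concepts(ctx)

for ext, itn in sorted(after, key=lambda c: len(c[1])):
    status = "new" if all(itn != i for _, i in before) else "updated/kept"
    print(status, sorted(ext), sorted(itn))
```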
5.2 User interface and manipulation

The user interface can load a lattice either stored in the database or stored in an XML file with the structure used in the Galicia software (http://www.iro.umontreal.ca/~galicia/). Three main functional views are provided to the user. The first one is used to qualify concepts, i.e. to describe the profile of a set of sites. The second one is used to define a query, i.e. a new site to be assessed according to an existing lattice. The third view is used to explore the result of the query, i.e. to compare the characteristics of the new site with those of the already assessed sites. Currently, the texts appearing on the interface views are written in French, since the target users are French; other languages could be supported in the future.

The functional view for qualifying concepts is presented in Figure 4. Once a lattice is chosen, it is possible to select a given concept in a list and to see its description (intent, extent, and comment). The lists of the parents and children of that concept are also shown, and a click on one of them displays its related information. This information may help the experts in qualifying the concept. The comment is then stored in the database.

Fig. 4: Qualifying the concepts of the site lattice

The functionality for classifying a new site based on its values (for one or several indices) is presented in Figure 5. One has first to select a lattice and to give a name to the new site, and then to provide a description of this new site by choosing indices and their values. Once this is done, it is possible to classify the site, that is, to integrate it in the lattice, either temporarily or by saving it in the lattice. The button "Classer" triggers this classification. To interpret the result, the button "Visualiser le résultat" can be used to see the new lattice with the modifications shown in a specific colour. The button "Explorer le treillis" also helps in the interpretation by giving access to a third view (Figure 6) where it is possible to navigate within the concepts and to see the description of the parents and children of the current concept.

Fig. 5: Definition of the Q site-query

More precisely, the third view gives access only to the modified or new concepts of the lattice, i.e. the concepts where the site-query is represented. These concepts can be commented and the modified lattice can be stored in the database. Finally, the commented lattices can be exported in various formats to be further analysed.

6 Discussion

We decided to implement a specific tool for several reasons: 1. the tool has to be interconnected with a database and to offer a user-friendly interface for hydro-ecologists, allowing them to annotate the concepts; 2. the purpose of the tool is not to navigate through the whole database; 3. this is a two-stage tool: the first stage organises specific information within a lattice; the second stage allows the user to explore and possibly modify this lattice.

Fig. 6: Analysing the classification result of the Q query

Regarding the first point, lattice-builder tools like Galicia, ConExp, or the Toscana suite (http://toscanaj.sourceforge.net/) cannot be used as such, since they do not fit the requirements of hydro-ecologists. Actually, as said before, we have used Galicia to build the lattices, which are then recorded in the database to be annotated and explored by hydro-ecologists. Besides, the lattices built through our tool can be exported into a Galicia format.
Regarding the second point, our approach differs from those used in search or browsing tools like Camelis [13], Abilis [6], D-SIFT [12] or the Virtual Museum of the Pacific [26]. Indeed, we did not try to implement a lattice-based approach to explore the whole database, but only specific information from this database. This information was chosen by hydro-ecologists as a synthetic view of the database. Furthermore, the lattice is used as a basis to record expert knowledge (the annotations) that can be involved in further investigations.

Regarding the last point, our tool can be compared to Ulysses [9], a visual interface that gives access to a lattice structure organising information from a database. Ulysses allows the user to search the retrieval space both by browsing and by querying, whereas our tool only allows querying. Nevertheless, the originality of our tool is that the user can modify and annotate the lattice concepts.

Finally, the underlying aim of our approach is to build an ontology gathering the knowledge of various experts on hydro-ecosystems. Each expert indeed focuses on a specific compartment of the hydro-ecosystems (e.g. fishes, macrophytes, diatoms, ...) and a generic tool is needed to combine their expertise and produce a global assessment of the ecological state of a stream site.

7 Conclusion

This paper presents a lattice-based query system for helping the assessment of hydro-ecosystems. The approach relies on a database storing various information on stream sites of the Alsace plain. These data are summarised by qualitative indices: biological indices as well as physico-chemical and physical indices. Based on these indices and their own expertise, hydro-ecologists can perform a global evaluation of the functioning of a stream ecosystem. Furthermore, they want to define quality profiles of streams or water areas that could be used to assess new sites. Hence, a tool is needed to support the whole process.

Our work aims at building such a tool. Concept lattices appeared as a good approach since they make it possible to build a hierarchical clustering of the sites, to navigate through the clusters, and to perform queries that help the assessment of a new site. The clustering aspects have already proved to be interesting, and the user interface for commenting and querying the lattices is currently being tested by hydro-ecologists. In the future, several lattices will have to be built, including various sets of indices (physico-chemical and physical indices). Furthermore, the whole approach will be tested with stream or water area data from other regions in France. Regarding the implementation aspects, the system should be improved in two ways: allowing the integration of new attributes into an existing lattice, and allowing the navigation through bigger lattices. Finally, improvements can be made to generate comments on the site-queries automatically, based on the comments of the neighbouring concepts.

Acknowledgements

The Indice project (2006-11) was supported by the Agence de l'Eau Rhin-Meuse. We also acknowledge the scientific and technical help of the Cemagref Centre in Lyon, the Gabriel Lippmann Public Research Centre in Luxembourg and the regional delegation of ONEMA (Office National de l'Eau et des Milieux Aquatiques). Cristina Nica's stay in France was supported by the Erasmus European program.
We acknowledge the anonymous reviewers who helped us to improve our paper.

References

1. AFNOR: Qualité de l'eau : détermination de l'Indice Biologique Global Normalisé (IBGN). NF T90-350 (1992), révision 2004
2. AFNOR: Qualité de l'eau : détermination de l'Indice Biologique Diatomées (IBD). NF T90-354 (2000), révision 2007
3. AFNOR: Qualité de l'eau : détermination de l'Indice Oligochètes de Bioindication des Sédiments (IOBS). NF T90-390 (2002)
4. AFNOR: Qualité de l'eau : détermination de l'Indice Biologique Macrophytique en Rivière (IBMR). NF T90-395 (2003)
5. AFNOR: Qualité de l'eau : détermination de l'Indice poissons rivière (IPR). NF T90-344 (2004)
6. Allard, P., Ferré, S., Ridoux, O.: Discovering Functional Dependencies and Association Rules by Navigating in a Lattice of OLAP Views. In: Kryszkiewicz, M., Obiedkov, S. (eds.) Proceedings of CLA 2010, Sevilla, Spain, pp. 199–210 (2010)
7. Barbut, M., Monjardet, B.: Ordre et classification – Algèbre et combinatoire. Hachette (1970)
8. Bedel, O., Ferré, S., Ridoux, O., Quesseveur, E.: GEOLIS: a logical information system for geographical data. Revue Internationale de Géomatique 17, 371–390 (2007)
9. Carpineto, C., Romano, G.: ULYSSES: A Lattice-based Multiple Interaction Strategy Retrieval Interface. In: Blumenthal, B., Gornostaev, J., Unger, C. (eds.) Human-Computer Interaction, 5th International Conference, EWHCI'95, Moscow, Russia. LNCS, vol. 1015, pp. 91–104. Springer-Verlag (1995)
10. Carpineto, C., Romano, G.: Concept Data Analysis. Theory and Applications. John Wiley & Sons Ltd (2004), 201 pages
11. Carpineto, C., Romano, G.: Using concept lattices for text retrieval and mining. In: Ganter, B., Stumme, G., Wille, R. (eds.) Formal Concept Analysis, LNCS, vol. 3626, pp. 3–45. Springer Berlin / Heidelberg (2005)
12. Ducrou, J., Wormuth, B., Eklund, P.: D-SIFT: A Dynamic Simple Intuitive FCA Tool. In: Dau, F., Mugnier, M.L., Stumme, G. (eds.) Conceptual Structures: Common Semantics for Sharing Knowledge – Proceedings of ICCS 2005. LNAI, vol. 3596, pp. 295–306. Springer-Verlag (2005)
13. Ferré, S.: Camelis: a logical information system to organise and browse a collection of documents. International Journal of General Systems 38(4), 379–403 (2009)
14. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer Verlag (1999)
15. Godin, R., Missaoui, R., Alaoui, H.: Incremental concept formation algorithm based on Galois (concept) lattices. Computational Intelligence 11(2), 246–267 (1995)
16. Godin, R., Missaoui, R., April, A.: Experimental comparison of navigation in a Galois lattice with conventional information retrieval method. International Journal of Man-Machine Studies 38, 747–767 (1993)
17. Grac, C., Braud, A., Le Ber, F., Trémolières, M.: Un système d'information pour le suivi et l'évaluation de la qualité des cours d'eau – Application à l'hydro-région de la plaine d'Alsace. RSTI - Ingénierie des Systèmes d'Information 16, 9–30 (2011)
18. Grac, C., Le Ber, F., Braud, A., Trémolières, M., Bertaux, A., Herrmann, A., Manné, S., Lafont, M.: Programme de recherche-développement Indices – rapport scientifique final. Contrat pluriannuel 1463 de l'Agence de l'Eau Rhin-Meuse, LHYGES – LSIIT – ONEMA – CEMAGREF (2011)
19. Kuznetsov, S.O., Obiedkov, S.A.: Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intelligence 14(2-3), 189–216 (2002)
20.
Lafont, M.: A conceptual approach to the biomonitoring of freshwater: the ecolog- ical ambience system. Journal of Limnology 6, 17–24 (2001) 21. Lafont, M., Jézéquel, C., Vivier, A., Breil, P., Schmitt, L., Bernoud, S.: Refinement of biomonitoring of urban water courses by combining descriptive and ecohydro- logical approaches. Ecohydrol. Hydrobiol. 10, 3–11 (2010) A lattice-based query system forLattice-based assessing theassessment quality ofofhydro-ecosystems hydro-ecosystems 13 277 22. MEDD: Système d’évaluation de la qualité de l’eau des cours d’eau (SEQ-Eau), version 2. Ministère de l’Ecologie et du Développement Durable et Agences de l’Eau (2003), Étude inter-agences de l’eau, no 52 23. MEDD: Circulaire dce 2007/22 du 11 avril 2007 relative au protocole de prélèvement et de traitement des échantillons des invertébrés pour la mise en œu- vre du programme de surveillance sur cours d’eau. Ministère de l’Ecologie et du Développement Durable (2007) 24. Napoli, A.: A smooth introduction to symbolic methods in knowledge discovery. In: Cohen, H., Lefebvre, C. (eds.) Categorization in Cognitive Science. Elsevier (2006) 25. Priss, U.: Lattice-based information retrieval. Knowledge Organization 27(3), 132142 (2000) 26. Wray, T., Eklund, P.: Exploring the Information Space of Cultural Collections Using Formal Concept Analysis. In: Valtchev, P., Jäschke, R. (eds.) Proceedings of 9th International Conference on Formal Concept Analysis, ICFCA 2011, Nicosia, Cyprus. LNAI, vol. 6628, pp. 251–266. Springer-Verlag (2011) The Word Problem in Semiconcept Algebras Philippe Balbiani CNRS — Université de Toulouse Institut de recherche en informatique de Toulouse 118 ROUTE DE NARBONNE, 31062 TOULOUSE CEDEX 9, France Philippe.Balbiani@irit.fr Abstract. The aim of this article is to prove that the word problem in semiconcept algebras is PSPACE-complete. Keywords: Formal concept analysis, semiconcept algebras, word problem, de- cidability/complexity. 1 Introduction In formal concept analysis [2, 3], the properties of formal contexts are reflected by the properties of the concept lattices they give rise to [10, 12]. Extending concept lattices to protoconcept algebras and semiconcept algebras, Herrmann et al. [5] and Wille [11] introduced negations in conceptual structures based on formal contexts such as double Boolean algebras and pure double Boolean algebras. These algebras have attracted interest for their theoretical merits — basic representations have been obtained — and for their practical relevance — applications in the field of knowledge representation and reasoning have been developed [5–7, 9, 11]. The basic representations of protoconcept algebras and semiconcept algebras evoked above have been obtained by means of equational axioms. Hence, the problem naturally arises of whether there is an algorithm which given terms s, t, decides whether they represent the same element in all models of these equa- tional axioms. Such a problem is called the word problem (WP) in protoconcept algebras or in semiconcept algebras. In Mathematics and Computer Science, word problems are of the utmost importance. Within the context of protoconcept algebras, Vormbrock [8] demonstrates that given terms s, t, if s = t is not valid in all protoconcept algebras then there exists a finite protoconcept algebra in which s = t is not valid. Nevertheless, the upper bound on the size of the finite protoconcept algebra given in [8, Page 258] is not elementary. 
Therefore, it does not allow us to conclude — as wrongly stated in [8, Page 240] — that the WP in protoconcept algebras is NP-complete. Switching over to semiconcept algebras, the aim of this article is to prove that the WP in semiconcept algebras is PSPACE-complete. Sections 2 and 3 show some of the basic properties of formal contexts and semi- concept algebras that have been discussed in [5–7, 9, 11]. In Section 4, we present c 2011 by the paper authors. CLA 2011, pp. 279–294. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 280 Philippe Balbiani the WP in semiconcept algebras. Section 5 introduces a basic 2-sorted modal logic that will be used in Sections 6 and 7 to prove that the WP in semiconcept algebras is PSPACE-complete. The proofs of Lemmas 10, 11, 12 and 13 can be found in the annex. 2 From Formal Contexts to Semiconcept Algebras In formal concept analysis, the properties of semiconcepts are reflected by the properties of the algebras they give rise to. 2.1 Formal Contexts Formal contexts are structures of the form IK = (G, M, ∆) where G is a nonempty set (with typical member denoted g), M is a nonempty set (with typical member denoted m) and ∆ is a binary relation between G and M . The elements of G are called “objects”, the elements of M are called “attributes” and the intended meaning of g ∆ m is “object g possesses attribute m”. ∆ a1 a2 o1 × × o2 × Tab. 1. Example 1. In Tab. 1 is an example of a formal context IK2,2 with 2 objects — o1 and o2 — and 2 attributes — a1 and a2 . For all X ⊆ G and for all Y ⊆ M , let X . = {m ∈ M : for all g ∈ G, if g ∈ X then g ∆ m} Y / = {g ∈ G: for all m ∈ M , if m ∈ Y then g ∆ m} That is to say, X . is the set of all attributes possessed by all objects in X and Y / is the set of all objects possessing all attributes in Y . Example 2. In the formal context IK2,2 of Tab. 1, {o1 }. = {a1 , a2 } and {a2 }/ = {o1 }. To carry out our plan, we need to learn a little more about the pair (. ,/ ) of maps . : 2G 7→ 2M and / : 2M 7→ 2G . Obviously, for all X ⊆ G and for all Y ⊆ M, – X ⊆ Y / iff X . ⊇ Y . Hence, the pair (. ,/ ) of maps . : 2G 7→ 2M and / : 2M 7→ 2G is a Galois connection between (2G , ⊆) and (2M , ⊇). Thus, for all X, X1 , X2 ⊆ G and for all Y, Y1 , Y2 ⊆ M, – if X1 ⊆ X2 then X1. ⊇ X2. , – if Y1 ⊇ Y2 then Y1/ ⊆ Y2/ , – X ⊆ X ./ and X . = X ./. , – Y /. ⊇ Y and Y / = Y /./ . The word problem in semiconcept algebras 281 2.2 Semiconcept Algebras Let IK = (G, M, ∆) be a formal context. Given X ⊆ G, the pair (X, X . ) is called “left semiconcept of IK”. Remark that (∅, M ) is a left semiconcept of IK. Let Hl (IK) = (Hl (IK), ⊥l , >l , ¬l , ∨l , ∧l ) be the algebraic structure of type (0, 0, 1, 2, 2) where Hl (IK) is the set of all left semiconcepts of IK, ⊥l = (∅, M ), >l = (G, G. ), ¬l (X, X . ) = (G \ X, (G \ X). ), (X1 , X1. ) ∨l (X2 , X2. ) = (X1 ∪ X2 , (X1 ∪ X2 ). ) and (X1 , X1. ) ∧l (X2 , X2. ) = (X1 ∩ X2 , (X1 ∩X2 ). ). Remark that if G is finite then Hl (IK) is finite too and moreover, | Hl (IK) | = 2|G| . It is a simple exercise to check that the above operations ⊥l , >l , ¬l ·, · ∨l · and · ∧l · on Hl (IK) are isomorphic to the Boolean operations ∅, G, G \ ·, · ∪ · and · ∩ · on 2G . Hence, Hl (IK) satisfies the conditions of nondegenerate Boolean algebras. Given Y ⊆ M , the pair (Y / , Y ) is called “right semiconcept of IK”. Remark that (G, ∅) is a right semiconcept of IK. 
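These definitions are straightforward to experiment with. The following Python sketch (our own illustration; we write the derivation X ↦ X^. as up(X) and Y ↦ Y^/ as down(Y)) encodes the context IK2,2 of Tab. 1, reproduces Example 2, and checks the Galois-connection properties listed above on every subset.

```python
# Illustrative sketch (our notation): the derivation operators of IK_{2,2}
# (objects o1, o2; attributes a1, a2; o1 has a1 and a2, o2 has a1 only).
from itertools import combinations

G = {"o1", "o2"}
M = {"a1", "a2"}
Delta = {("o1", "a1"), ("o1", "a2"), ("o2", "a1")}

def up(X):     # X ↦ X^. : attributes possessed by every object of X
    return frozenset(m for m in M if all((g, m) in Delta for g in X))

def down(Y):   # Y ↦ Y^/ : objects possessing every attribute of Y
    return frozenset(g for g in G if all((g, m) in Delta for m in Y))

print(up({"o1"}))     # {a1, a2}, as in Example 2
print(down({"a2"}))   # {o1},     as in Example 2

def subsets(S):
    S = sorted(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# Galois-connection properties: X ⊆ X^{./} and X^. = X^{./.} (and dually for Y).
for X in subsets(G):
    assert X <= down(up(X)) and up(X) == up(down(up(X)))
for Y in subsets(M):
    assert Y <= up(down(Y)) and down(Y) == down(up(down(Y)))
```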
Let Hr (IK) = (Hr (IK), ⊥r , >r , ¬r , ∨r , ∧r ) be the algebraic structure of type (0, 0, 1, 2, 2) where Hr (IK) is the set of all right semiconcepts of IK, ⊥r = (M / , M ), >r = (G, ∅), ¬r (Y / , Y ) = ((M \Y )/ , M \Y ), (Y1/ , Y1 ) ∨r (Y2/ , Y2 ) = ((Y1 ∩ Y2 )/ , Y1 ∩ Y2 ) and (Y1/ , Y1 ) ∧r (Y2/ , Y2 ) = ((Y1 ∪ Y2 )/ , Y1 ∪ Y2 ). Remark that if M is finite then Hr (IK) is finite too and moreover, | Hr (IK) | = 2|M | . It is a simple exercise to check that the above operations ⊥r , >r , ¬r ·, · ∨r · and · ∧r · on Hr (IK) are anti-isomorphic to the Boolean operations ∅, M , M \·, ·∪· and ·∩· on 2M . Hence, Hr (IK) satisfies the conditions of nondegenerate Boolean algebras. Now, for the concept underlying most of our work in this article. Given X ⊆ G and Y ⊆ M , the pair (X, Y ) is called “semiconcept of IK” iff Y = X . or X = Y / . Remark that (∅, M ) and (G, ∅) are semiconcepts of IK. Example 3. In the formal context IK2,2 of Tab. 1, the semiconcepts are (∅, {a1 , a2 }), ({o1 }, {a1 , a2 }), ({o2 }, {a1 }), ({o1 }, {a2 }), ({o1 , o2 }, {a1 }) and ({o1 , o2 }, ∅). Let H(IK) = (H(IK), ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) be the algebraic structure of type (0, 0, 0, 0, 1, 1, 2, 2, 2, 2) where H(IK) is the set of all semiconcepts of IK, ⊥l = (∅, M ), ⊥r = (M / , M ), >l = (G, G. ), >r = (G, ∅), ¬l (X, Y ) = (G \ X, (G \ X). ), ¬r (X, Y ) = ((M \ Y )/ , M \ Y ), (X1 , Y1 ) ∨l (X2 , Y2 ) = (X1 ∪ X2 , (X1 ∪ X2 ). ), (X1 , Y1 ) ∨r (X2 , Y2 ) = ((Y1 ∩ Y2 )/ , Y1 ∩ Y2 ), (X1 , Y1 ) ∧l (X2 , Y2 ) = (X1 ∩ X2 , (X1 ∩ X2 ). ) and (X1 , Y1 ) ∧r (X2 , Y2 ) = ((Y1 ∪ Y2 )/ , Y1 ∪ Y2 ). Example 4. In the formal context IK2,2 of Tab. 1, ⊥l = (∅, {a1 , a2 }), >l = ({o1 , o2 }, {a1 }), ⊥r = ({o1 }, {a1 , a2 }) and >r = ({o1 , o2 }, ∅). Remark that if G, M are finite then H(IK) is finite too and moreover, | H(IK) | ≤ 2|G| + 2|M | . Obviously, the operations ⊥l , >l , ¬l ·, · ∨l · and · ∧l ·, when restricted to the set of all left semiconcepts of IK, are isomorphic to the Boolean operations 282 Philippe Balbiani ∅, G, G \ ·, · ∪ · and · ∩ · on 2G whereas the operations ⊥r , >r , ¬r ·, · ∨r · and · ∧r ·, when restricted to the set of all right semiconcepts of IK, are anti-isomorphic to the Boolean operations ∅, M , M \ ·, · ∪ · and · ∩ · on 2M . In other respects, it is a simple matter to check that H(IK) satisfies the following conditions for every x, y, z ∈ H(IK): – x ∧l (y ∧l z) = (x ∧l y) ∧l z and x ∨r (y ∨r z) = (x ∨r y) ∨r z, – x ∧l y = y ∧l x and x ∨r y = y ∨r x, – ¬l (x ∧l x) = ¬l x and ¬r (x ∨r x) = ¬r x, – x ∧l (y ∧l y) = x ∧l y and x ∨r (y ∨r y) = x ∨r y, – x ∧l (y ∨l z) = (x ∧l y) ∨l (x ∧l z) and x ∨r (y ∧r z) = (x ∨r y) ∧r (x ∨r z), – x ∧l (x ∨l y) = x ∧l x and x ∨r (x ∧r y) = x ∨r x, – x ∧l (x ∨r y) = x ∧l x and x ∨r (x ∧l y) = x ∨r x, – ¬l (¬l x ∧l ¬l y) = x ∨l y and ¬r (¬r x ∨r ¬r y) = x ∧r y, – ¬l ⊥l = >l and ¬r >r = ⊥r , – ¬l >r = ⊥l and ¬r ⊥l = >r , – >r ∧l >r = >l and ⊥l ∨r ⊥l = ⊥r , – x ∧l ¬l x = ⊥l and x ∨r ¬r x = >r , – ¬l ¬l (x ∧l y) = x ∧l y and ¬r ¬r (x ∨r y) = x ∨r y, – (x ∧l x) ∨r (x ∧l x) = (x ∨r x) ∧l (x ∨r x), – x ∧l x = x or x ∨r x = x. Let us remark that the first 13 above conditions come in pairs of mirror images obtained by interchanging ⊥l with >r , >l with ⊥r , ¬l with ¬r , ∨l with ∧r and ∧l with ∨r whereas the last 2 above conditions are equivalent to their own mirror images. 
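As a complement, the semiconcepts of IK2,2 and some of the operations of H(IK2,2) can be enumerated and checked mechanically. The sketch below is only our illustration (identifiers such as bot_l and top_r, and the helper functions, are ours); it verifies a few of the equations listed above on all six semiconcepts of the example context.

```python
# Our illustration: the operations of H(IK) on the semiconcepts of IK_{2,2},
# and a mechanical check of some of the equations listed above.
from itertools import chain, combinations

G = frozenset({"o1", "o2"}); M = frozenset({"a1", "a2"})
Delta = {("o1", "a1"), ("o1", "a2"), ("o2", "a1")}
up   = lambda X: frozenset(m for m in M if all((g, m) in Delta for g in X))
down = lambda Y: frozenset(g for g in G if all((g, m) in Delta for m in Y))
subsets = lambda S: chain.from_iterable(
    combinations(sorted(S), r) for r in range(len(S) + 1))

# Semiconcepts = left semiconcepts (X, X^.) ∪ right semiconcepts (Y^/, Y).
H = ({(frozenset(X), up(X)) for X in subsets(G)}
     | {(down(Y), frozenset(Y)) for Y in subsets(M)})

bot_l = (frozenset(), M);   top_l = (G, up(G))          # ⊥l, ⊤l
bot_r = (down(M), M);       top_r = (G, frozenset())    # ⊥r, ⊤r
neg_l  = lambda x: (G - x[0], up(G - x[0]))             # ¬l
neg_r  = lambda x: (down(M - x[1]), M - x[1])           # ¬r
meet_l = lambda x, y: (x[0] & y[0], up(x[0] & y[0]))    # ∧l
join_r = lambda x, y: (down(x[1] & y[1]), x[1] & y[1])  # ∨r

assert neg_l(bot_l) == top_l and neg_r(top_r) == bot_r  # ¬l ⊥l = ⊤l, ¬r ⊤r = ⊥r
for x in H:
    assert meet_l(x, neg_l(x)) == bot_l                 # x ∧l ¬l x = ⊥l
    assert join_r(x, neg_r(x)) == top_r                 # x ∨r ¬r x = ⊤r
    assert meet_l(x, x) == x or join_r(x, x) == x       # x ∧l x = x or x ∨r x = x
print(len(H), "semiconcepts; the checked equations hold on IK_{2,2}")
```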
This leads us to the principle of duality stating that from any condition provable from the 15 above conditions, another such condition results immediately by interchanging ⊥l with >r , >l with ⊥r , ¬l with ¬r , ∨l with ∧r and ∧l with ∨r . The set H(IK) can be ordered by the binary relation v defined by (X1 , Y1 ) v (X2 , Y2 ) iff X1 ⊆ X2 and Y1 ⊇ Y2 for every (X1 , Y1 ), (X2 , Y2 ) ∈ H(IK). Obviously, for all (X1 , Y1 ), (X2 , Y2 ) ∈ H(IK), – (X1 , Y1 ) v (X2 , Y2 ) iff (X1 , Y1 )∧l (X2 , Y2 ) = (X1 , Y1 )∧l (X1 , Y1 ) and (X1 , Y1 ) ∨r (X2 , Y2 ) = (X2 , Y2 ) ∨r (X2 , Y2 ), – if (X1 , Y1 ) ∈ Hl (IK) then (X1 , Y1 ) v (X2 , Y2 ) iff (X1 , Y1 ) ∧l (X2 , Y2 ) = (X1 , Y1 ), – if (X2 , Y2 ) ∈ Hr (IK) then (X1 , Y1 ) v (X2 , Y2 ) iff (X1 , Y1 ) ∨r (X2 , Y2 ) = (X2 , Y2 ). Moreover, the binary relation v is reflexive, antisymmetric and transitive on H(IK). In order to give an abstract characterization of the operations ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l and ∧r , we shall say that an algebraic structure D = (D, ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) of type (0, 0, 0, 0, 1, 1, 2, 2, 2, 2) is a pure double Boolean algebra iff the operations ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l and ∧r satisfy the 15 above conditions. The word problem in semiconcept algebras 283 3 From Semiconcept Algebras to Formal Contexts The aim of this section is to give an abstract characterization of the operations ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l and ∧r . 3.1 Filters and Ideals Let D = (D, ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) be a pure double Boolean alge- bra. We define Dl = {x ∧l x: x ∈ D} Dr = {x ∨r x: x ∈ D} Intuitively, elements of Dl can be considered as sets of objects and elements of Dr can be considered as sets of attributes. Example 5. In the semiconcept algebra associated to the formal context IK2,2 of Tab. 1, D2,2 = {(∅, {a1 , a2 }), ({o1 }, {a1 , a2 }), ({o2 }, {a1 }), ({o1 }, {a2 }), ({o1 , o2 }, {a1 }), ({o1 , o2 }, ∅)}, Dl2,2 = {(∅, {a1 , a2 }), ({o1 }, {a1 , a2 }), ({o2 }, {a1 }), ({o1 , o2 }, {a1 })} and Dr2,2 = {({o1 }, {a1 , a2 }), ({o1 , o2 }, {a1 }), ({o1 }, {a2 }), ({o1 , o2 }, ∅)}. Obviously, the operations ⊥l , >l , ¬l , ∨l and ∧l are stable on Dl and the opera- tions ⊥r , >r , ¬r , ∨r and ∧r are stable on Dr . Hence, the algebraic structures Dl = (Dl , ⊥l , >l , ¬l , ∨l , ∧l ) and Dr = (Dr , ⊥r , >r , ¬r , ∨r , ∧r ) are algebraic struc- tures of type (0, 0, 1, 2, 2). More precisely, they are Boolean algebras. Moreover, the set D can be ordered by the binary relation ≤ defined by x ≤ y iff x ∧l y = x ∧l x and x ∨r y = y ∨r y for every x, y ∈ D. Obviously, for all x, y ∈ D, – if x ∈ Dl then x ≤ y iff x ∧l y = x, – if y ∈ Dr then x ≤ y iff x ∨r y = y. Moreover, the binary relation ≤ is reflexive, antisymmetric and transitive on D. q ({o1 , o2 }, ∅) @ I @ q @q ({o1 }, {a2 })@ ({o1 , o2 }, {a1 }) I @ @ @ I@ @q @q @ ({o1 }, {a1 , a@ 2 }) ({o2 }, {a1 }) I @ @ @q @ (∅, {a1 , a2 }) Fig. 1. 284 Philippe Balbiani Example 6. In Fig. 1 is represented the binary relation ≤2,2 ordering the set D2,2 of the semiconcept algebra associated to the formal context IK2,2 of Tab. 1. A nonempty subset F of D is called a filter iff for all x, y ∈ D, – x, y ∈ F implies x ∧l y ∈ F , – x ∈ F and x ≤ y imply y ∈ F . A nonempty subset I of D is called an ideal iff for all x, y ∈ D, – x, y ∈ I implies x ∨r y ∈ I, – x ∈ I and y ≤ x imply y ∈ I. The following lemma explains how filters and ideals can be transformed into filters and ideals of the Boolean algebras Dl and Dr . Lemma 1. 
Let F, I be nonempty subsets of D. If F is a filter then F ∩ Dl is a filter of the Boolean algebra Dl and F ∩ Dr is a filter of the Boolean algebra Dr and if I is an ideal then I ∩ Dl is an ideal of the Boolean algebra Dl and I ∩ Dr is an ideal of the Boolean algebra Dr . Let F be a nonempty subset of Dl and I be a nonempty subset of Dr . We define [F ) = {x ∈ D: there exists y ∈ F such that y ≤ x} (I] = {x ∈ D: there exists y ∈ I such that x ≤ y} The following lemma explains how filters of the Boolean algebra Dl and ideals of the Boolean algebra Dr can be transformed into filters and ideals. Lemma 2. Let F be a nonempty subset of Dl , I be a nonempty subset of Dr . If F is a filter of the Boolean algebra Dl then [F ) is a filter and [F ) ∩ Dl = F and if I is an ideal of the Boolean algebra Dr then (I] is an ideal and (I] ∩ Dr = I. As a result, Lemma 3. There exists filters F such that F ∩Dl is a prime filter of the Boolean algebra Dl and there exists ideals I such that I∩Dr is a prime ideal of the Boolean algebra Dr . We shall say that D is concrete iff there exists a formal context IK and a function h assigning to each element of D an element of H(IK) such that h is injective and h is a homomorphism from D to H(IK). 3.2 Representation Now, the main question is to prove that every pure double Boolean algebra is concrete. Let D = (D, ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) be a pure double Boolean algebra and consider the formal context IK(D) = (Fp (D), Ip (D), ∆) The word problem in semiconcept algebras 285 where Fp (D) is the set of all filters F for which F ∩ Dl is a prime filter of the Boolean algebra Dl , Ip (D) is the set of all ideals I for which I ∩ Dr is a prime ideal of the Boolean algebra Dr and F ∆ I iff F ∩ I is nonempty. Let H(IK(D)) = (H(IK(D)), ⊥0l , ⊥0r , >0l , >0r , ¬0l , ¬0r , ∨0l , ∨0r , ∧0l , ∧0r ) For all elements x of D, let Fx = {F ∈ Fp (D): x ∈ F } Ix = {I ∈ Ip (D): x ∈ I} Here, the first results are Lemma 4. Let x ∈ D. Fx∧l x = Fx and Ix∨r x = Ix . Lemma 5. Let x ∈ D. If x ∈ Dl then Fx. = Ix and if x ∈ Dr then Ix/ = Fx . Lemma 6. Let x ∈ D. F¬.l ¬l x = I¬l ¬l x and I¬/ r ¬r x = F¬r ¬r x . The next lemmas point the way to the strategy followed in our approach to the proof that every pure double Boolean algebra is concrete. Lemma 7. Let x ∈ D. The pair (Fx , Ix ) is a semiconcept of IK(D). Lemma 8. Let x, y ∈ D. If x 6= y then (Fx , Ix ) 6= (Fy , Iy ). For all x ∈ D, let h(x) = (Fx , Ix ) The next lemma is central for proving that the function h is a homomorphism from D to H(IK). Lemma 9. Let x, y ∈ D. – F⊥l = ∅ and I⊥l = Ip (D), – F⊥r = Ip (D)/ and I⊥r = Ip (D), – F>l = Fp (D) and I>l = Fp (D). , – F>r = Fp (D) and I>r = ∅, – F¬l x = Fp (D) \ Fx and I¬l x = (Fp (D) \ Fx ). , – F¬r x = (Ip (D) \ Ix )/ and I¬r x = Ip (D) \ Ix , – Fx∨l y = Fx ∪ Fy and Ix∨l y = (Fx ∪ Fy ). , – Fx∨r y = (Ix ∩ Iy )/ and Ix∨r y = Ix ∩ Iy , – Fx∧l y = Fx ∩ Fy and Ix∧l y = (Fx ∩ Fy ). , – Fx∧r y = (Ix ∪ Iy )/ and Ix∧r y = Ix ∪ Iy . As a result, Theorem 1. The function h is a homomorphism from D to H(IK). In other words: every pure double Boolean algebra is concrete. 286 Philippe Balbiani 4 The Word Problem in Pure Double Boolean Algebras Let us introduce the word problem in pure double Boolean algebras. 4.1 Syntax Let V ar denote a countable set of individual variables (with typical instances denoted x, y, etc). 
The set t(V ar) of all terms (with typical instances denoted s, t, etc) is given by the rule s ::= x | 0l | 0r | 1l | 1r | −l s | −r s | (s tl t) | (s tr t) | (s ul t) | (s ur t) Let us adopt the standard rules for omission of the parentheses. Example 7. For instance, x ul (x tr y) is a term. 4.2 Semantics Let D = (D, ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) be a pure double Boolean alge- bra. A valuation based on D is a function m assigning to each individual variable x an element m(x) of D. Example 8. The function m2,2 defined below is a valuation based on the pure double Boolean algebra D2,2 defined in Example 5: m2,2 (x) = ({o2 }, {a1 }), m2,2 (y) = ({o1 }, {a2 }) and for all individual variables z, if z 6= x, y then m2,2 (z) = ({o1 , o2 }, {a1 }). m induces a function (·)m assigning to each term s an element (s)m of D such that (x)m = m(x), (0l )m = ⊥l , (0r )m = ⊥r , (1l )m = >l , (1r )m = >r , (−l s)m = ¬l (s)m , (−r s)m = ¬r (s)m , (s tl t)m = (s)m ∨l (t)m , (s tr t)m = (s)m ∨r (t)m , (s ul t)m = (s)m ∧l (t)m and (s ur t)m = (s)m ∧r (t)m . Example 9. Concerning the valuation m2,2 defined in Example 8, we have (x tr 2,2 2,2 y)m = ({o1 , o2 }, ∅) and (−l x)m = ({o1 }, {a1 , a2 }). 4.3 The Word Problem Now, for the WP in pure double Boolean algebras: input: terms s, t, output: determine whether there exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m . A general strategy for proving a decision problem to be PSPACE-complete is first, to reduce to it a decision problem easily proved to be PSPACE-hard and second, to reduce it to a decision problem easily proved to be in PSPACE. PSPACE is the key complexity class of the satisfiability problem of numerous modal logics [1, Chapter 6]. Therefore, we introduce in Section 5 a PSPACE- complete modal logic and we show in Sections 6 and 7 how to reduce one into the other its satisfiability problem and the WP in pure double Boolean algebras. The word problem in semiconcept algebras 287 5 A Basic 2-Sorted Modal Logic In Section 3, we gave the proof that every pure double Boolean algebra can be homomorphically embedded into the pure double Boolean algebra over some formal context. Formal contexts are 2-sorted structures. Hence, the modal logic that will be used in Sections 6 and 7 for proving the WP in pure double Boolean algebras to be PSPACE-complete is a 2-sorted one. 5.1 Syntax The language of K2 is based on a countable set OV ar of object variables (with typical instances denoted P , Q, etc) and a countable set AV ar of attribute variables (with typical instances denoted p, q, etc). Without loss of generality, let us assume that OV ar and AV ar are disjoint. The set of all object formulas (with typical instances denoted A, B, etc) and the set of all attribute formulas (with typical instances denoted a, b, etc) are given by the rules A ::= P | ⊥ | ¬A | (A ∨ B) | 2a a ::= p | ⊥ | ¬a | (a ∨ b) | 2A The other Boolean constructs are defined as usual. Let us adopt the standard rules for omission of the parentheses. A formula (with typical instances denoted α, β, etc) is either an object formula or an attribute formula. The notion of “being a subformula of” is standard, the expression α β denoting the fact that α is a subformula of β. A substitution is a pair (Θ, θ) where Θ is a function assigning to each object variable P an object formula Θ(P ) and θ is a function assigning to each attribute variable p an attribute formula θ(p). 
(Θ, θ) induces a homomorphism (·)(Θ,θ) assigning to each formula α a formula (α)(Θ,θ) such that (P )(Θ,θ) = Θ(P ) and (p)(Θ,θ) = θ(p). Remark that for all object formulas A and for all attribute formulas a, – (A)(Θ,θ) is an object formula, – (a)(Θ,θ) is an attribute formula. Let OV ar = P1 , P2 , . . . be an enumeration of OV ar and AV ar = p1 , p2 , . . . be an enumeration of AV ar. We shall say that a substitution (Θ, θ) is normal with respect to OV ar and AV ar iff for all positive integers i, – Θ(Pi ) = Pi and θ(pi ) = 2Pi or Θ(Pi ) = 2pi and θ(pi ) = pi . Given a formula α, V ar(α) will denote the set of all variables occurring in α. A formula α is said to be nice iff – V ar(α) ⊆ OV ar or V ar(α) ⊆ AV ar. 288 Philippe Balbiani 5.2 Semantics Let IK = (G, M, ∆) be a formal context. A IK-valuation is a pair (V, v) of func- tions where V assigns to each object variable P a subset V (P ) of G and v assigns to each attribute variable p a subset v(p) of M . (V, v) induces a function (·)(V,v) assigning to each formula α a subset (α)(V,v) of G ∪ M such that (P )(V,v) = V (P ), (⊥)(V,v) = ∅, (¬A)(V,v) = G \ (A)(V,v) , (A ∨ B)(V,v) = (A)(V,v) ∪ (B)(V,v) , (2a)(V,v) = {g ∈ G: for all m ∈ M , if m ∈ (a)(V,v) then g ∆ m}, (p)(V,v) = v(p), (⊥)(V,v) = ∅, (¬a)(V,v) = M \ (a)(V,v) , (a ∨ b)(V,v) = (a)(V,v) ∪ (b)(V,v) and (2A)(V,v) = {m ∈ M : for all g ∈ G, if g ∈ (A)(V,v) then g ∆ m}. Remark that for all object formulas A and for all attribute formulas a, . – (A)(V,v) is a subset of G such that (A)(V,v) = (2A)(V,v) , / – (a)(V,v) is a subset of M such that (a)(V,v) = (2a)(V,v) . A formula α is said to be satisfiable iff – there exists a formal context IK = (G, M, ∆) and a IK-valuation (V, v) such that (α)(V,v) is nonempty. 5.3 Decision Now, for the nice satisfiability problem for K2 : input: a nice formula α, output: determine whether α is satisfiable. The next lemmas are central for proving that the problem of deciding equations in pure double Boolean algebras is PSPACE-complete. Theorem 2. The nice satisfiability problem for K2 is PSPACE-hard. Proof. A reduction similar to the reduction from the QBF -validity problem to the satisfiability problem for K considered in [1, Theorem 6.50] can be easily obtained. Now, for the satisfiability problem for K2 : input: a formula α, output: determine whether α is satisfiable. Theorem 3. The satisfiability problem for K2 is in PSPACE. Proof. An algorithm similar to the W itness algorithm considered in [1, Theorem 6.47] can be easily obtained. From Theorems 2 and 3, it follows immediately that the nice satisfiability prob- lem for K2 and the satisfiability problem for K2 are both PSPACE-complete. The word problem in semiconcept algebras 289 6 From K2 to Pure Double Boolean Algebras First, we consider the lower bound of the complexity of the problem of deciding the WP in pure double Boolean algebras. Given a nice formula α, we wish to construct a pair (s1 (α), s2 (α)) of terms such that α is satisfiable iff there exists a pure double Boolean algebra D and a valuation m based on D such that (s1 (α))m 6= (s2 (α))m . Let OV ar = P1 , P2 , . . . be an enumeration of OV ar, AV ar = p1 , p2 , . . . be an enumeration of AV ar and V ar = x1 , y1 , x2 , y2 , . . . be an enumeration of V ar. 
The function T (·) assigning to each nice object formula A a term T (A) and the function t(·) assigning to each nice attribute formula a a term t(a) are such that T (Pi ) = xi , T (⊥) = 0l , T (¬A) = −l T (A), T (A ∨ B) = T (A) tl T (B), T (2a) = −l −l −r −r t(a), t(pi ) = yi , t(⊥) = 1r , t(¬a) = −r t(a), t(a ∨ b) = t(a) ur t(b) and t(2A) = −r −r −l −l T (A). Let (s1 (·), s2 (·)) be the function assigning to each nice formula α a pair (s1 (α), s2 (α)) of terms such that if α is a nice object formula then s1 (α) = T (α) and s2 (α) = 0l and if α is a nice attribute formula then s1 (α) = t(α) and s2 (α) = 1r . Obviously, (s1 (α), s2 (α)) can be computed in space log | α |. Moreover, Proposition 1. If α is nice then α is satisfiable iff there exists a pure double Boolean algebra D and a valuation m based on D such that (s1 (α))m 6= (s2 (α))m . Proof. Since α is nice, V ar(α) ⊆ OV ar or V ar(α) ⊆ AV ar. Without loss of generality, let us assume that V ar(α) ⊆ OV ar. Hence, there exists a positive integer n such that V ar(α) ⊆ {P1 , . . . , Pn }. (⇒) Suppose α is satisfiable, we demonstrate there exists a pure double Boolean algebra D and a valuation m based on D such that (s1 (α))m 6= (s2 (α))m . Since α is satisfiable, there exists a formal context IK = (G, M, ∆) and a valuation (V, v) based on IK such that (α)(V,v) is nonempty. Let H(IK) = (H(IK), ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) and m be a valuation based on H(IK) such that for all positive integers i, if i ≤ n then m(xi ) = (V (Pi ), V (Pi ). ). We show first that Lemma 10. Let A be a nice object formula and a be a nice attribute formula. . If A α then (T (A))m = ((A)(V,v) , (A)(V,v) ) and if a α then (t(a))m = / ((a)(V,v) , (a)(V,v) ). Continuing the proof of Proposition 1, since (α)(V,v) is nonempty, by Lemma 10, if α is a nice object formula then (T (α))m 6= (0l )m and if α is a nice attribute formula then (t(α))m 6= (1r )m . Hence, (s1 (α))m 6= (s2 (α))m . Thus, there exists a pure double Boolean algebra D and a valuation m based on D such that (s1 (α))m 6= (s2 (α))m . (⇐) Suppose there exists a pure double Boolean algebra D and a valuation m based on D such that (s1 (α))m 6= (s2 (α))m , we demonstrate α is satisfiable. Let IK(D) = (Fp (D), Ip (D), ∆) and (V, v) be a valuation based on IK(D) such that for all positive integers i, if i ≤ n then V (Pi ) = Fm(xi ) . Interestingly, Lemma 11. Let A be a nice object formula and a be a nice attribute formula. If A α then (A)(V,v) = F(T (A))m and if a α then (a)(V,v) = I(t(a))m . 290 Philippe Balbiani Continuing the proof of Proposition 1, since (s1 (α))m 6= (s2 (α))m , if α is a nice object formula then (T (α))m 6= (0l )m and if α is a nice attribute formula then (t(α))m 6= (1r )m . Hence, by Lemma 11, (α)(V,v) is nonempty. Thus, α is satisfiable. This ends the proof of Proposition 1. Hence, (s1 (·), s2 (·)) is a reduction from the nice satisfiability problem for K2 to the WP in pure double Boolean algebras. Thus, by Theorem 2, Corollary 1. The WP in pure double Boolean algebras is PSPACE-hard. 7 From Pure Double Boolean Algebras to K2 Second, we consider the upper bound of the complexity of the WP in pure double Boolean algebras. Given a pair (s, t) of terms, we wish to construct an object formula O(s, t) and an attribute formula A(s, t) such that there exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m iff some instance of O(s, t) is satisfiable or some instance of A(s, t) is satisfiable. Let V ar = x1 , x2 , . . . 
be an enumeration of V ar, OV ar = P1 , P2 , . . . be an enumeration of OV ar and AV ar = p1 , p2 , . . . be an enumeration of AV ar. The function F (·) assigning to each term s an object formula F (s) and the function f (·) assigning to each term s an attribute formula f (s) are such that F (xi ) = Pi , f (xi ) = pi , F (0l ) = ⊥, f (0l ) = 2⊥, F (0r ) = 2>, f (0r ) = >, F (1l ) = >, f (1l ) = 2>, F (1r ) = 2⊥, f (1r ) = ⊥, F (−l s) = ¬F (s), f (−l s) = 2¬F (s), F (−r s) = 2¬f (s), f (−r s) = ¬f (s), F (s tl t) = F (s) ∨ F (t), f (s tl t) = 2(F (s) ∨ F (t)), F (str t) = 2(f (s)∧f (t)), f (str t) = f (s)∧f (t), F (sul t) = F (s)∧F (t), f (sul t) = 2(F (s)∧F (t)), F (sur t) = 2(f (s)∨f (t)) and f (sur t) = f (s)∨f (t). Let O(·, ·) be the function assigning to each pair (s, t) of terms the object formula O(s, t) such that O(s, t) = ¬(F (s) ↔ F (t)). Let A(·, ·) be the function assigning to each pair (s, t) of terms the attribute formula A(s, t) such that A(s, t) = ¬(f (s) ↔ f (t)). Obviously, O(s, t) and A(s, t) can be computed in space log | (s, t) |. Moreover, Proposition 2. There exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m iff there exists a substitution (Θ, θ) such that (Θ, θ) is normal with respect to OV ar and AV ar and O(s, t)(Θ,θ) is satisfiable or A(s, t)(Θ,θ) is satisfiable. Proof. Let n be a positive integer such that V ar(s) ∪ V ar(t) ⊆ {x1 , . . . , xn }. (⇒) Suppose there exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m , we demonstrate there exists a substitu- tion (Θ, θ) such that (Θ, θ) is normal with respect to OV ar and AV ar and O(s, t)(Θ,θ) is satisfiable or A(s, t)(Θ,θ) is satisfiable. Let (Θ, θ) be a normal sub- stitution with respect to OV ar and AV ar such that for all positive integers i, if i ≤ n then if m(xi ) is in Dl then Θ(Pi ) = Pi and θ(pi ) = 2Pi and if m(xi ) is in Dr then Θ(Pi ) = 2pi and θ(pi ) = pi . Let IK(D) = (Fp (D), Ip (D), ∆) and (V, v) be a valuation based on IK(D) such that for all positive integers i, if i ≤ n then V (Pi ) = Fm(xi ) and v(pi ) = Im(xi ) . Remark that for all positive integers i, The word problem in semiconcept algebras 291 (V,v) if i ≤ n then if m(xi ) is in Dl then (Pi )(Θ,θ) = (Pi )(V,v) = V (Pi ) = Fm(xi ) (Θ,θ) (V,v) / and if m(xi ) is in Dr then (Pi ) = (2pi )(V,v) = (pi )(V,v) = v(pi )/ = / Im(xi ) = Fm(xi ) . Similarly, for all positive integers i, if i ≤ n then if m(xi ) is in (V,v) . Dl then (pi )(Θ,θ) = (2Pi )(V,v) = (Pi )(V,v) = V (Pi ). = Fm(x . i) = Im(xi ) and (V,v) if m(xi ) is in Dr then (pi )(Θ,θ) = (pi )(V,v) = v(pi ) = Im(xi ) We first observe (V,v) Lemma 12. Let u be a term. If u s or u t then (F (u))(Θ,θ) = F(u)m (Θ,θ) (V,v) and (f (u)) = I(u)m . Continuing the proof of Proposition 2, since (s)m 6= (t)m , F(s)m 6= F(t)m or I(s)m (V,v) (V,v) 6 I(t)m . Hence, by Lemma 12, O(s, t)(Θ,θ) = is nonempty or A(s, t)(Θ,θ) is nonempty. Thus, there exists a substitution (Θ, θ) such that (Θ, θ) is normal with respect to OV ar and AV ar and O(s, t)(Θ,θ) is satisfiable or A(s, t)(Θ,θ) is satisfiable. (⇐) Suppose there exists a substitution (Θ, θ) such that (Θ, θ) is normal with respect to OV ar and AV ar and O(s, t)(Θ,θ) is satisfiable or A(s, t)(Θ,θ) is sat- isfiable, we demonstrate there exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m . 
Since O(s, t)(Θ,θ) is satisfi- able or A(s, t)(Θ,θ) is satisfiable, there exists a formal context IK = (G, M, ∆) (V,v) and a valuation (V, v) based on IK such that O(s, t)(Θ,θ) is nonempty or (Θ,θ) (V,v) A(s, t) is nonempty. Let H(IK) = (H(IK), ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) and m be a valuation based on H(IK) such that for all positive integers i, if i ≤ n then m(xi ) = ((Θ(Pi ))(V,v) , (θ(pi ))(V,v) ). Interestingly, (V,v) Lemma 13. Let u be a term. If u s or u t then (u)m = ((F (u))(Θ,θ) , (V,v) (f (u))(Θ,θ) ). (V,v) Continuing the proof of Proposition 2, since O(s, t)(Θ,θ) is nonempty or (Θ,θ) (V,v) (Θ,θ) (V,v) (Θ,θ) (V,v) (V,v) A(s, t) is nonempty, F (s) 6= F (t) or f (s)(Θ,θ) 6= (V,v) f (t)(Θ,θ) . Hence, by lemma 13, (s)m 6= (t)m . Thus, there exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m . This ends the proof of Proposition 2. Hence, O(·, ·) and A(·, ·) are reductions from the WP in pure double Boolean algebras to the satisfiability problem for K2 . Thus, by Theorem 3, Corollary 2. The WP in pure double Boolean algebras is in PSPACE. 8 Conclusion Our results implicitly assume that the set V ar of all individual variables is infi- nite and the depth of nesting of the left operations with the right operations is not bounded. Following the line of reasoning suggested in [4], we may see what 292 Philippe Balbiani happens if we assume that the set V ar of all individual variables is finite and the depth of nesting of the left operations with the right operations is bounded. Do we get a linear time complexity in this case? The unification problem is quite different from the WP discussed here: given terms s, t, decide whether there exists terms which can be substituted for the variables in s, t so that the terms thus obtained are identically interpreted in all pure double Boolean algebras. In Mathematics and Computer Science, unifica- tion problems are of the utmost importance. At the time of writing, we know nothing about the decidability/complexity of the unification problem in pure double Boolean algebras. Acknowledgements Special acknowledgement is heartly granted to Christian Herrmann who made several helpful comments for improving the correctness and the readability of this article. References 1. Blackburn, P., de Rijke, M., Venema, Y.: Modal Logic. Cambridge University Press (2001). 2. Davey, B, Priestley, H.: Introduction to Lattices and Order. Cambridge University Press (2002). 3. Ganter, B, Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer (1999). 4. Halpern, J.: The effect of bounding the number of primitive propositions and the depth of nesting on the complexity of modal logic. Artificial Intelligence 75 (1995) 361–372. 5. Herrmann, C., Luksch, P., Skorsky, M., Wille, R.: Algebras of semiconcepts and double Boolean algebras. Technische Universität Darmstadt (2000). 6. Vormbrock, B.: A first step towards protoconcept exploration. In Eklund, P. (editor): Concept Lattices. Springer (2004) 208–221. 7. Vormbrock, B.: Complete subalgebras of semiconcept algebras and protoconcept alge- bras. In Ganter, B., Godin, R. (editors): Formal Concept Analysis. Springer (2005) 329–343. 8. Vormbrock, B.: A solution of the word problem for free double Boolean algebras. In Kuznetsov, S., Schmidt, S. (editors): Formal Concept Analysis. Springer (2007) 240–270. 9. Vormbrock, B., Wille, R.: Semiconcept and protoconcept algebras: the basic theorems. In Ganter, B., Stumme, G., Wille, R. 
(editors): Formal Concept Analysis. Springer (2005) 34–48. 10. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of con- cepts. In Rival, I. (editor): Ordered Sets. D. Reidel (1982) 314–339 11. Wille, R.: Boolean concept logic. In Ganter, B., Mineau, G. (editors): Conceptual Structures: Logical, Linguistic, and Computational Issues. Springer (2000) 317–331. 12. Wille, R.: Formal concept analysis as applied lattice theory. In Ben Yahia, S., Me- phu Nguifo, E., Belohlavek, R. (editors): Concept Lattices and their Applications. Springer (2008) 42–67 The word problem in semiconcept algebras 293 Annex Proof of Lemma 10. By induction on A and a. Basis. Remind that V ar(α) ⊆ {P1 , . . . , Pn }. In this respect, for all positive integers i, if i ≤ n then (T (Pi ))m = (xi )m = m(xi ) = (V (Pi ), V (Pi ). ) = . ((Pi )(V,v) , (Pi )(V,v) ). Hypothesis. Suppose A, B are nice object formulas such that A, B α, . . (T (A))m = ((A)(V,v) , (A)(V,v) ) and (T (B))m = ((B)(V,v) , (B)(V,v) ) and a, b m (V,v) / are nice attribute formulas such that a, b α, (t(a)) = ((a) , (a)(V,v) ) and m (V,v) / (V,v) (t(b)) = ((b) , (b) ). Step. We only consider the case of the nice object formula 2a, the other cases being treated similarly. We have: (T (2a))m = (−l −l −r −r t(a))m = / / ¬l ¬l ¬r ¬r (t(a))m = ¬l ¬l ¬r ¬r ((a)(V,v) , (a)(V,v) ) = ¬l ¬l (((a)(V,v) ) , (a)(V,v) ) = . (V,v) / (V,v) / (V,v) (V,v) . (((a) ) , ((a) ) ) = ((2a) , (2a) ). Proof of Lemma 11. By induction on A and a. Basis. Remind that V ar(α) ⊆ {P1 , . . . , Pn }. In this respect, for all positive in- tegers i, if i ≤ n then (Pi )(V,v) = V (Pi ) = Fm(xi ) = F(xi )m = F(T (Pi ))m . Hypothesis. Suppose A, B are nice object formulas such that A, B α, (A)(V,v) = F(T (A))m and (B)(V,v) = F(T (B))m and a, b are nice attribute formulas such that a, b α, (a)(V,v) = I(t(a))m and (b)(V,v) = I(t(b))m . Step. We only consider the case of the nice object formula 2a, the other cases being treated similarly. We have: (2a)(V,v) = {F ∈ Fp (D): for all I ∈ Ip (D), if I ∈ (a)(V,v) then F ∆ I} = {F ∈ Fp (D): for all I ∈ Ip (D), if I ∈ I(t(a))m then F ∆ I} = I(t(a))m / = F¬r ¬r (t(a))m = F¬l ¬l ¬r ¬r (t(a))m = F(−l −l −r −r t(a))m = F(T (2a))m . Proof of Lemma 12. By induction on u. Basis. Remind that V ar(s) ∪ V ar(t) ⊆ {x1 , . . . , xn }. In this respect, for all pos- (V,v) (V,v) itive integers i, if i ≤ n then (F (xi ))(Θ,θ) = (Pi )(Θ,θ) = Fm(xi ) = F(xi )m (V,v) (V,v) and (f (xi ))(Θ,θ) = (pi )(Θ,θ) = Im(xi ) = I(xi )m . Hypothesis. Suppose u, v are terms such that u s or u t, v s or v t, (V,v) (V,v) (V,v) (F (u))(Θ,θ) = F(u)m , (f (u))(Θ,θ) = I(u)m , (F (v))(Θ,θ) = F(v)m and (V,v) (f (v))(Θ,θ) = I(v)m . Step. We only consider the case of the term u ul v, the other cases being treated (V,v) (V,v) similarly. We have: (F (u ul v))(Θ,θ) = (F (u) ∧ F (v))(Θ,θ) = ((F (u))(Θ,θ) (V,v) (V,v) ∧(F (v))(Θ,θ) )(V,v) = (F (u))(Θ,θ) ∩ (F (v))(Θ,θ) = F(u)m ∩ F(v)m = (Θ,θ) (V,v) (V,v) F(u)m ∧l (v)m = F(uul v)m and (f (u ul v)) = (2(F (u) ∧ F (v)))(Θ,θ) = . (2((F (u))(Θ,θ) ∧ (F (v))(Θ,θ) ))(V,v) = ((F (u))(Θ,θ) ∧ (F (v))(Θ,θ) )(V,v) = (V,v) (V,v) . ((F (u))(Θ,θ) ∩(F (v))(Θ,θ) ) = (F(u)m ∩F(v)m ). = I(u)m ∧l (v)m = I(uul v)m . Proof of Lemma 13. By induction on u. Basis. Remind that V ar(s) ∪ V ar(t) ⊆ {x1 , . . . , xn }. In this respect, for all 294 Philippe Balbiani positive integers i, if i ≤ n then (xi )m = m(xi ) = ((Θ(Pi ))(V,v) , (θ(pi ))(V,v) ) = (V,v) (V,v) (V,v) (V,v) ((Pi )(Θ,θ) , (pi )(Θ,θ) ) = ((F (xi ))(Θ,θ) , (f (xi ))(Θ,θ) ). Hypothesis. 
Suppose u, v are terms such that u s or u t, v s or (V,v) (V,v) (V,v) v t, (u)m = ((F (u))(Θ,θ) , (f (u))(Θ,θ) ) and (v)m = ((F (v))(Θ,θ) , (V,v) (f (v))(Θ,θ) ). Step. We only consider the case of the term u ul v, the other cases being treated (V,v) (V,v) similarly. We have: (u ul v)m = (u)m ∧l (v)m = ((F (u))(Θ,θ) , (f (u))(Θ,θ) ) (V,v) (V,v) (V,v) (V,v) ∧l ((F (v))(Θ,θ) , (f (v))(Θ,θ) ) = ((F (u))(Θ,θ) ∩ (F (v))(Θ,θ) , (Θ,θ) (V,v) (Θ,θ) (V,v) . (Θ,θ) (Θ,θ) (V,v) ((F (u)) ∩ (F (v)) ) ) = (((F (u)) ∧ (F (v)) ) , . (V,v) ((F (u))(Θ,θ) ∧ (F (v))(Θ,θ) )(V,v) ) = ((F (u) ∧ (F (v))(Θ,θ) , (2((F (u))(Θ,θ) ∧ (V,v) (V,v) (F (v))(Θ,θ) ))(V,v) ) = ((F (u ul v))(Θ,θ) , (2(F (u) ∧ F (v)))(Θ,θ) ) = (V,v) (V,v) ((F (u ul v))(Θ,θ) , (f (u ul v))(Θ,θ) ). Looking for analogical proportions in a formal concept analysis setting Laurent Miclet1 , Henri Prade2 , and David Guennec1 1 IRISA-ENSSAT, Lannion, France, miclet@enssat.fr, david.guennec@gmail.com, 2 IRIT, Université Paul Sabatier, Toulouse, France, prade@irit.fr Abstract. Categorization and analogical reasoning are two important cognitive processes, for which there exist formal counterparts (at least they may be regarded as such): namely, formal concept analysis on the one hand, and analogical proportions (modeled in propositional logic) on the other hand. This is a first attempt aiming at relating these two settings. The paper presents an algorithm that takes advantage of the lattice structure of the set of formal concepts for searching for analogical proportions that may hold in a formal context. Moreover, properties linking analogical proportions and formal concepts are laid bare. 1 Introduction Categorization and analogical reasoning play important roles in cognitive pro- cesses. They both heavily rely on the ideas of similarity and dissimilarity. Items belonging to the same category should be similar, while they are dissimilar with respect to items belonging to other categories. Analogical proportions, which are statements of the form ‘a is to b as c is to d’, express the similarity of the relations linking a and b with the relations linking c and d (note that however a and b may be somewhat dissimilar (as well as c and d). In a Boolean setting, where items are described in terms of binary attributes, similarity amounts to the identity of properties, while dissimilarity refers to the presence of properties for an item which are absent in the other considered item. Among formal approaches aiming at categorizing items, Formal Concept Analysis (FCA) provides a way for characterizing concepts both extensionally in terms of the objects that they cover and intensionally in terms of the properties that these objects share. FCA is known as a lattice-theoretic framework devised for knowledge extraction from Boolean data tables called formal contexts that relate objects and properties. Introduced under this name by Wille [13], FCA has been developed by Ganter and Wille [7] and their followers for thirty years. Besides, there has been a renewal of interest for analogical proportions in the last decade, firstly in relation with computational linguistic concerns. Set-based, algebraic and logical models have been proposed [8, 12, 1, 9]. In the following, we more particularly use the Boolean view [9] of analogical proportions that is directly relevant for application to formal contexts. Then, it makes sense to look for analogical proportions in Boolean contexts, and to try to understand what formal concepts and analogical proportions may have in common. 
c 2011 by the paper authors. CLA 2011, pp. 295–307. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 296 Laurent Miclet, Henri Prade and David Guennec The paper is organized as follows. We first provide a short background on analogical proportions in Section 2. Then in Section 3, after a brief reminder of basic definitions in FCA, we present an efficient algorithm able to discover ana- logical proportions in a formal context by using the lattice of formal concepts. In Section 4, we further investigate the theoretical relations between FCA and ana- logical proportions, by showing how formal concepts are involved in analogical proportions, before indicating lines for further research and concluding. 2 Analogical proportions An analogical proportion ‘a is to b as c is to d’, usually denoted a : b :: c : d, expresses that the way a and b differ is the same as the way c and ddiffer [9]. This leads to the following definitions, here stated for three closely related kinds of items: subsets of a finite set, Boolean truth values, and objects defined by Boolean properties (also called binary attributes). Analogical proportion between sets First, let us consider four sets A, B, C and D, all subsets of some set X. The dissimilarity between A and B is evaluated by A ∩ B and by A ∩ B, where A denotes the complement of A in X, while the similarity corresponds to A ∩ B and A ∩ B. Viewing an analogical proportion as expressing that the differences between A and B and between C and D are the same, we get the following definition [9]: Definition 1 Four subsets A, B, C and D of a finite set X are in analogical proportion in this order when A ∩ B = C ∩ D and A ∩ B = C ∩ D. Analogical proportion between Boolean objects This expression has an immediate logical counterpart when a, b, c, and d now denote Boolean variables: ((a ∧ ¬b) ≡ (c ∧ ¬d)) ∧ ((¬a ∧ b) ≡ (¬c ∧ d)) This formula is true for the 6 truth value assignments of a, b, c, d appearing in Table 1, and is false for the 24 − 6 = 10 remaining possible assignments. It can be checked that the above definitions of an analogical proportion sat- isfies the following characteristic postulates [8]: – a : b :: a : b (identity) – a : b :: c : d =⇒ c : d :: a : b (global symmetry) – a : b :: c : d =⇒ a : c :: b : d (central permutation) – a : b :: c : d and ¬(b : a :: c : d) are consistent (local dissymmetry) Looking for analogical proportions in a formal concept analysis setting 297 a × × × b × × × c × ×× d × × × Table 1. The six Boolean 4-tuples that are in analogical proportion. The Boolean truth-values True and False are written as a cross and a blank. Objects defined by Boolean properties Let us suppose that the objects (or items) a, b, c, d are described by sets of binary properties belonging to a set P rop. Then, each item can be viewed as a subset of P rop, made of the attributes that hold true on this item3 . Then, we can apply Definition 1, namely a ∩ b = c ∩ d and a ∩ b = c ∩ d. Another way of seeing this analogical proportion is given by the equivalent definition: Definition 2 Four objects (a, b, c, d) defined by binary properties are in analogi- cal proportion iff the truth-values of each property on these objects make a 4-tuple of binary values corresponding to one of the six 4-tuples displayed in Table 1. 
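These equivalent formulations are easy to check mechanically. The following sketch (Python; the function names and the small example data are ours, not taken from the paper) tests the Boolean, set-based and attribute-based definitions against each other:

from itertools import product

def bool_proportion(a, b, c, d):
    # Logical definition: ((a and not b) == (c and not d)) and ((not a and b) == (not c and d)).
    a, b, c, d = bool(a), bool(b), bool(c), bool(d)
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))

# Exactly 6 of the 16 Boolean 4-tuples satisfy the definition (Table 1).
assert sum(bool_proportion(*t) for t in product((0, 1), repeat=4)) == 6

def sets_proportion(A, B, C, D):
    # Definition 1, using the identities A ∩ complement(B) = A \ B and complement(A) ∩ B = B \ A.
    return A - B == C - D and B - A == D - C

def objects_proportion(a, b, c, d, properties):
    # Definition 2: objects given as sets of properties; check each property's 4-tuple of truth values.
    return all(bool_proportion(p in a, p in b, p in c, p in d) for p in properties)

# Hypothetical illustration: a : b :: c : d holds because a and b differ exactly as c and d do.
props = {'w', 'x', 'y', 'z'}
a, b, c, d = {'w', 'x'}, {'w', 'y'}, {'x', 'z'}, {'y', 'z'}
assert sets_proportion(a, b, c, d) and objects_proportion(a, b, c, d, props)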
Analogical dissimilarity We now introduce the concept of analogical dissim- ilarity (AD) between four objects defined by binary attributes. It is simply the sum on all the attributes of the analogical dissimilarity per attribute. The latter is defined according to the following table: a ×××××××× b ×××× ×××× c ×× ×× ×× ×× d × × × × × × × × AD = 0 1 1 0 1 0 2 1 1 2 0 1 0 1 1 0 AD per attribute is merely the minimal number of bit(s) that has/have to be flipped in order to turn the four bits into an analogical proportion, according to Table 1. Notice that any 4-tuple with a zero AD is an analogical proportion, and vice-versa. For example, the four first objects of the formal context that we will later call BASE lm (see Figure 2) are such that AD(leech, bream, frog, dog) = 0 + 1 + 0 + 0 + 0 + 0 + 0 + 1 + 1 = 3. 3 Searching for analogical proportions in formal concepts This section is devoted to the following problem: given a formal context with n objects and d properties (or attributes), is it possible to discover 4-tuples of objects in analogical proportion, without running an O(n4 · d) algorithm? We give in this section an heuristic algorithm which uses the lattice of formal concepts, and has shown experimentally its efficiency for discovering analogical proportions. We start with a brief reminder on FCA. 3 For an object x, this subset is called R↑(x) in Formal Concept Analysis, see Sect. 3.1. 298 Laurent Miclet, Henri Prade and David Guennec 3.1 Formal concept analysis (FCA) FCA starts with a binary relation R, called formal context, defined between a set Obj of objects and a set P rop of Boolean properties. The notation (x, y) ∈ R means that object x has property y. R↑ (x) = {y ∈ P rop|(x, y) ∈ R} is the set of properties of object x. Similarly, R↓ (y) = {x ∈ Obj|(x, y) ∈ R} is the set of objects having property y. Given a set Y of properties, one can define the set of objects [7]: R↓ (Y ) = {x ∈ Obj|R↑ (x) ⊇ Y }. This is the set of objects sharing all properties in Y (and having maybe some others). Then a formal concept is defined as a pair made of its extension X and its intension Y such that R↓ (Y ) = X and R↑ (X) = Y, where (X, Y ) ⊆ Obj×P rop, and R↑ (X) is similarly defined as {y ∈ P rop|R↓ (y) ⊇ X}. It can be also shown that formal concepts are maximal pairs (X, Y ) (in the sense of inclusion) such that X × Y ⊆ R. Moreover, the set of all formal concepts is equipped with a partial order (de- noted ) defined as: (X1 , Y1 ) (X2 , Y2 ) iff X1 ⊆ X2 (or, equivalently, Y2 ⊆ Y1 ), and forms a complete lattice, called the concept lattice of R. Let us consider an example where R is a relation that defines links be- tween eight objects Obj = {1, 2, 3, 4, 5, 6, 7, 8} and nine properties P rop = {a, b, c, d, e, f, g, h, i}. There is a “×” in the cell corresponding to an object x and to a property y if the object x has property y, in other words the “×”s describe the relation R (or context). An empty cell corresponds to the fact that (x, y) 6∈ R, i.e., it is known that object x has not property y. The relation R in the example is given in Figure 1. There are 5 formal concepts. For instance, consider X = {a, b, c, d, e}. Then R↑∆ (X) = {7, 8} ; likewise if Y = {7, 8}. Then R↓∆ (Y ) = {a, b, c, d, e}. a b c d e f g h i 1 × 2 × 3 ××× 4 ××× 5 ×××× 6 ×××× 7××××× 8××××× Fig. 1. 
R2 : a relation with 5 formal concepts (and 2 sub-contexts) Looking for analogical proportions in a formal concept analysis setting 299 3.2 Organizing the search The basic idea of our algorithm is to start from some 4-tuple of objects, to observe the attributes that contribute to make AD non-zero for this 4-tuple, and to replace one of the four objects by another object. Then we iterate the process. Two important features have been added to avoid a random walk in the space of the 4-tuples. 1. The replacement of one object by another is done according to the obser- vation of the lattice of concepts. The idea is to try to decrease the value of AD. This point will be explained in the next section. 2. All 4-tuples of objects that are created are stored in a list, ordered by in- creasing value of AD. The next 4-tuple to be chosen is the first in the list. This algorithm can therefore be seen as an optimization procedure, more pre- cisely as a best-first version of the GRAPHSEARCH algorithm ([11]). The or- dered list of 4-tuples is an Open list in this interpretation. 3.3 Decreasing the analogical dissimilarity Let us come now to the heart of the algorithm, namely the replacement of one object by another. Can we find some information in the lattice that leads us to choose both an object in the 4-tuple and another object to replace it ? Remember that we are looking for a replacement that makes the AD decrease. Now, let us consider the following situation, taken from BASE lm (see Figure 2). Suppose that we are studying the 4-tuple of objects (3, 4, 9, 12), with AD = 1. Attribute c is the only one to contribute in the AD of this 4-tuple. Actually, c has the values (1, 1, 0, 1) on the 4-tuple (3, 4, 9, 12). We notice now that there are two interesting concepts in the lattice with respect to c, the first one being ({b, c, h, g, a} , {3}) and the second being ({b, h, g, a} , {2, 3}), which are directly connected. What can we deduce from this pair of connected concepts ? – Attribute c has value 1 on object 3. – Attribute c has value 0 on object 2. – Attribute c is the only attribute to switch from 1 to 0 to transform concept ({b, c, h, g, a} , {3}) into concept ({b, h, g, a} , {2, 3}). We can conclude from this evidence that replacing object 3 by object 2 will decrease by 1 the value of AD, since c will take values (0, 1, 0, 1) on the 4-tuple (2, 4, 9, 12), and therefore that (2, 4, 9, 12) is an analogical proportion. Unfortunately this argument does not lead to a greedy algorithm, since there no insurance, given a 4-tuple, that there exists a couple of concepts having the three above properties. Most of the time, actually, there is more than one attribute switching to 1 between two connected concepts, and only one insuring the decreasing of AD. Let us take another example from the same data base. The 4-tuple (5, 4, 9, 12) has a AD of 4, because of the attributes d, f , g and h. Two interesting connected 300 Laurent Miclet, Henri Prade and David Guennec concepts are ({c, f, d, e, i, a} , {12}) and ({c, d, e, a} , {7, 12}), since f switching from 1 to 0 will decrease the AD (see the table below). But replacing 12 by 7 in the 4-tuple (5, 4, 9, 12) will switch not only f but also the attribute i, and we don’t know what will happen when switching i: it may decrease as well as increase the AD. Actually, in this case, it increases the AD. Hence, the 4-tuple (5, 4, 9, 7) has the same AD of 4. 
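The AD computations used in these examples (the two 4-tuples just discussed are tabulated side by side right after this sketch) are easy to reproduce. Below is a small Python sketch, with names of our own choosing, that derives the per-attribute dissimilarity directly from the six 4-tuples of Table 1:

from itertools import product

# The six Boolean 4-tuples in analogical proportion (Table 1).
PROPORTIONS = [t for t in product((0, 1), repeat=4)
               if (bool(t[0] and not t[1]) == bool(t[2] and not t[3]))
               and (bool(not t[0] and t[1]) == bool(not t[2] and t[3]))]

def ad_attribute(a, b, c, d):
    # AD per attribute: minimal number of bits to flip so that (a, b, c, d)
    # becomes one of the six 4-tuples of Table 1.
    return min(sum(x != y for x, y in zip((a, b, c, d), p)) for p in PROPORTIONS)

def AD(x1, x2, x3, x4):
    # Analogical dissimilarity of four objects given as equal-length 0/1 vectors:
    # the sum of the per-attribute dissimilarities.
    return sum(ad_attribute(a, b, c, d) for a, b, c, d in zip(x1, x2, x3, x4))

# Per-attribute values from the AD table: (1, 1, 0, 1) needs one flip, (1, 0, 0, 1) needs two.
assert ad_attribute(1, 1, 0, 1) == 1
assert ad_attribute(1, 0, 0, 1) == 2
# A 4-tuple of objects is an analogical proportion exactly when its AD is zero.
assert AD((1, 0), (1, 1), (0, 0), (0, 1)) == 0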
a b c d e f g h i a b c d e f g h i 5×× × × 5×× × × 4× × ××× 4× × ××× 9×× ××× 9×× ××× 12 × × × × × × 7× ××× AD = 4 AD = 4 By generalizing these examples, we propose now an heuristic to try to de- crease the AD that we call h-Doap. Heuristic h-Doap. Let a couple of connected concepts be (A ∪ B, Z) and (B, Z ∪ Y ), A and B being subsets of attributes and Y and Z being subsets of objects such that A ∩ B = ∅ and Y ∩ Z = ∅. If there is a 4-tuple with one of its four objects in Z and if there is an attribute in B that decreases the AD of this 4-tuple when switching from 1 to 0, then create a new 4-tuple by replacing the object in Z by an object in Y . Next section shows how this heuristic can be used to discover 4-tuples of objects with null AD in a formal context, i.e. analogical proportions of objects. 3.4 Algorithm Discovering one analogical proportion. We explain in this section the al- gorithm used to discover one analogical proportion in a formal context. We call it ‘Discover One Analogical Proportion’, in short Doap. As already stated, it is a simple version of Graphsearch, where the nodes to be explored are 4-tuples of objects. We denote Start the 4-tuple of objects chosen to begin, Open the current set of 4-tuples to be processed and Closed the set of 4-tuples already processed. The choice of Start is either done randomly or by selecting objects which appear in small subsets of objects in the lattice. We also require that the explored 4-tuples are composed of four different objects, since we do not want to converge towards 4-tuples trivially in proportion, such as (1, 3, 1, 3) or even (2, 2, 2, 2). The algorithm stops either by discovering an analogical proportion, or in failure. One has to notice that its failure does not insure that there is no ana- logical proportion, since there is no guarantee given by the heuristic. We have never met this failure case, but our experiments are very limited, as explained in section 3.5. Looking for analogical proportions in a formal concept analysis setting 301 1: Algorithm Doap(Start) 2: begin 3: Closed ← ∅ 4: Open ← {Start} 5: while Open 6= ∅ do 6: x ← the 4-tuple in Open having the lowest AD value 7: if AD(x) = 0 then 8: return x 9: else 10: Open ← Open \ {x} ; Closed ← Closed ∪ {x} 11: decision ← 1 12: while decision = 1 do 13: Use heuristic h-Doap to construct a new 4-tuple y from x 14: if y is composed of four different objects and y 6∈ Closed and y 6∈ Open then 15: Open ← Open ∪ {y} ; decision ← 0 16: end if 17: end while 18: end if 19: end while 20: return failure 21: end Discovering several analogical proportions. To discover more analogical proportions, the simplest manner is to imbed algorithm Doap in a procedure that discards the first two objects of a discovered analogical 4-tuple from the formal context before re-running the algorithm. Since the transitivity holds for analogical proportions on objects (u : v :: w : x and w : x :: y : z implies u : v :: y : z), we are loosing no information on analogical 4-tuples. However, we are not insured to find all proportions in that manner, due to the fact that algorithm Doap may not find an existing proportion. 3.5 Experiments The size of Close when the algorithm Doap stops is a precise indication of its practical time complexity. Notice that a random algorithm, running on n objects (without any construction of a formal lattice), in which there are q 4-tuples in analogical proportion would in average try ((n4 )/8·q) 4-tuples before discovering a proportion. 
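Reading this estimate as n^4/(8·q), a quick order-of-magnitude check (the constant 8 is justified in the next paragraph; the values n = 12 and q = 3 are those of the augmented BASE_lm context used in these experiments):

def expected_random_trials(n, q):
    # Expected number of randomly drawn ordered 4-tuples before hitting one of the
    # q analogical proportions, each of which appears as 8 equivalent orderings.
    return n ** 4 / (8 * q)

# For 12 objects and 3 known proportions the estimate is 12**4 / 24 = 864 trials.
print(expected_random_trials(12, 3))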
In the previous formula, the number “8” comes from the fact that, when there is one analogical proportion in a formal context, then there are in fact exactly 8 through suitable permutations. This property stems directly from the postulates an analogical proportion (see section 2). We have used two different formal contexts to run the algorithm Doap. The first one is described in [2], except that we have added four objects 9, 10, 11 et 12 in order to have (at least) the analogical proportions (3, 4, 9, 10) and (1, 8, 11, 12). This leads to the formal context called BASE lm , Figure 2. 302 Laurent Miclet, Henri Prade and David Guennec a b c d e f g h i a b c d e f g h i leech 1 × × × bean 7 × × × × bream 2 × × ×× maize 8 × × × × frog 3 × × × ×× x 9×× ××× dog 4 × × ××× y 10 × ××× × spike-weed 5 × × × × z 11 × × × × × reed 6 × × × × × t 12 × × × × × × Fig. 2. BASE lm : A formal context from Bělohlávek [2] increased with four objects 9, 10, 11 et 12 in order to have (at least) the analogical proportions (3, 4, 9, 10) and (1, 8, 11, 12). The lattice constructed on this formal context (with the In-Close free soft- ware [14]) has 31 concepts. We have run Doap more than 600 times. It has always terminated by finding one of the three analogical proportions in the data4 . The average size of the Closed list is 63 and its median size is 28. Figure 3 gives the details. Fig. 3. Results of 622 runs of Doap on BASE lm . The size of the Close list for each run is on the Y axis. To appreciate these results, we have compared with a random search, replac- ing line 13 of the Doap algorithm by picking a random 4-tuple. The detailed results are given in Figure 4. The average size of the Closed list is 253 and its median size is 174. We also have tried to “symmetrize” the role of 0 and 1 in this formal context by adding the reverse attributes (indeed the Table 1 defining analogical propor- tions is left unchanged when exchanging 0 and 1). It leads to 12 objects and 18 4 We actually had a good surprise: Doap found a third proportion, namely (2, 4, 9, 12). Looking for analogical proportions in a formal concept analysis setting 303 Fig. 4. Results of 630 runs of random Doap on BASE lm . The Y axis is graduated from 0 up to 2000. attributes instead of 9. The size of the lattice of concepts is now 94. The algo- rithm Doap with the same parameters examines in average 93 4-tuples before finding an analogical proportion. The median value is 31. The symmetrization does not seem to be a good idea in this case. The random Doap algorithm has failed to give complete results on these data, due to overflows in the Close list. The second experiment has been run on the Lenses data base, from UCI ML Repository [6]. The nominal attributes have been transformed into binary ones by simply creating as many binary attributes as the number of modalities. The number of objects is 24 with 7 binary parameters. The size of the lattice is 43, the average number of examined 4-tuples is 77 and the median number is 20. When adding the reverse attributes, we have a lattice of size 227, an average number of 39 and a median number of 16. In that experiment, the symmetrization of the data seems clearly to have a positive effect. A first conclusion is that our heuristic algorithm seems to perform well. In the second context the basic search space has a size over 40.000 and we examine only 77 4-tuples in average. 
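To make the procedure of Section 3.4 concrete, here is a compact best-first sketch of Doap (our Python; the naive_successors generator is only a stand-in for the lattice-guided h-Doap moves, so it typically explores more 4-tuples than the real heuristic):

import heapq
from itertools import product

# The six Boolean 4-tuples in analogical proportion (Table 1).
PROPORTIONS = [t for t in product((0, 1), repeat=4)
               if (bool(t[0] and not t[1]) == bool(t[2] and not t[3]))
               and (bool(not t[0] and t[1]) == bool(not t[2] and t[3]))]

def ad(rows, quad):
    # Analogical dissimilarity of a 4-tuple of objects (rows of 0/1 attribute values):
    # per attribute, the minimal number of flips needed to reach a proportion.
    columns = zip(*(rows[i] for i in quad))
    return sum(min(sum(x != y for x, y in zip(col, p)) for p in PROPORTIONS)
               for col in columns)

def naive_successors(quad, n_objects):
    # Stand-in for h-Doap: replace one object of the 4-tuple by any other object.
    # The real heuristic restricts these moves using pairs of neighbouring concepts.
    for pos in range(4):
        for g in range(n_objects):
            if g not in quad:
                yield quad[:pos] + (g,) + quad[pos + 1:]

def doap(rows, start, max_iter=100000):
    # Best-first search over 4-tuples of distinct objects, ordered by AD (the Open list).
    open_heap, closed = [(ad(rows, start), start)], set()
    while open_heap and max_iter > 0:
        max_iter -= 1
        score, quad = heapq.heappop(open_heap)
        if quad in closed:
            continue
        if score == 0:
            return quad                      # analogical proportion found
        closed.add(quad)
        for nxt in naive_successors(quad, len(rows)):
            if len(set(nxt)) == 4 and nxt not in closed:
                heapq.heappush(open_heap, (ad(rows, nxt), nxt))
    return None                              # failure: the heuristic gives no completeness guarantee

# Usage on a small hypothetical 0/1 context (rows = objects, columns = attributes).
rows = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 1), (1, 0, 0)]
print(doap(rows, (0, 1, 2, 4)))              # -> (0, 1, 2, 3), a 4-tuple with AD = 0 here

The real algorithm differs only in how successors are generated: h-Doap proposes the replacement object by inspecting pairs of neighbouring concepts in the lattice, as described above.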
The construction of the lattice of concepts takes in practice much more time than the discovery of analogical proportions, which seems to suggest that it is a relevant space for looking for analogical proportions. 4 Analogical proportions between formal concepts We have seen that discovering analogical proportions in a formal context benefits from the knowledge of the associated lattice of formal concepts. Then it raises the question of understanding how formal concepts are involved in analogical proportions. Clearly, four objects in the same formal concept form an analogical proportion – in a trivial way – w.r.t. the subset of attributes involved in the formal concept. Partial answers to the question, when two formal concepts are involved in the proportion, are given in this section. 4.1 The smallest formal context in complete proportion We are interested in this section in examining the properties of the smallest context with an analogical proportion between objects. Obviously, this context will have exactly four objects. If we want to have, only one time, each of the possible analogical proportions between attributes, we need six of them (see table 1) and we obtain BASE 0 (see Figure 5). 304 Laurent Miclet, Henri Prade and David Guennec f a b c d e uv wx ∅ u ××× v × ×× w × × × wx vx uw uv a b c d x ×× × a b c d u v w x u ×× cd bd ac ab v × × w× × x×× ∅ abcd Fig. 5. BASE 0 , BASE 1 and the concepts lattice of BASE 1 . The lattice of BASE 0 is deduced from it by adding e to all subsets of attributes. We can construct now the concept lattice of BASE 0 , but it is interesting to get rid of attributes f (which will not be present in any context) and e (present in every context). We call BASE 1 the reduced context, shown at Figure 5. Its lattice is displayed in Figure 5. Note that there is a perfect symmetry between attributes and objects. The third line of the lattice expresses that u : v :: w : x, but also in subsets terms that {c, d} : {b, d} :: {a, c} : {a, b}. The second line expresses that a : b :: c : d and that {w, x} : {v, x} :: {u, w} : {u, v}. This is not surprising: as explained in section 2, we can see an object as the set of properties that hold true for it. 4.2 Some relations between analogical proportions and lattices of concepts Firstly, let us remark that the two following propositions are equivalent. This is immediate from section 2, in which these two equivalent definitions of analogical proportion have been presented. 1. x1 , x2 , x3 and x4 are four objects, in analogical proportion in this order. 2. R↑ (x1 ), R↑ (x2 ), R↑ (x3 ) and R↑ (x4 ) are four subsets of properties in analog- ical proportion in this order. Property 1 Let x1 , x2 , x3 and x4 be four objects in analogical proportion in this order. Let (X1 , Y1 ) be the5 concept with the smallest set X1 of objects in which x1 is present. Let us define (X2 , Y2 ), (X3 , Y3 ) and (X4 , Y4 ) in the same way. Then the four sets of attributes Y1 , Y2 , Y3 and Y4 are in analogical proportion, in this order. Proof. Since x1 ∈ X1 , all the attributes in Y1 take value 1 on x1 . Since X1 is the smallest set of objects including x1 , there is no attribute outside Y1 having 5 If they were two, x1 would be present in the intersection of the two. Looking for analogical proportions in a formal concept analysis setting 305 value 1 on x1 . Hence, Y1 is exactly R↑ (x1 ), the extension of x1 , i.e. the subset of attributes that take value 1 on x1 . This is also true for x2 , x3 and x4 . 
We have to prove now that x1 : x2 :: x3 : x4 implies R↑ (x1 ) : R↑ (x2 ) :: R↑ (x3 ) : R↑ (x4 ). It is immediate from the remark above. For example, in BASE 0 , we know that 1 : 8 :: 11 : 12. We have X1 = {1, 2, 3, 11}, Y1 = {a, b, g}, X2 = {6, 8, 12}, Y2 = {a, c, d, f }, X3 = {11}, Y3 = {a, b, e, g, i} and X4 = {12}, Y4 = {a, c, d, e, f, i}. The proportion Y1 :Y2 ::Y3 :Y4 holds, since: {a, b, g}:{a, c, d, f }::{a, b, e, g, i}:{a, c, d, e, f, i}. Property 2 Let (X1 , Y1 ), (X2 , Y2 ), (X3 , Y3 ) and (X4 , Y4 ) be four concepts of a lattice of concepts, such that the four sets of attributes Y1 , Y2 , Y3 and Y4 are in analogical proportion, in this order. Let X b1 be the subset of X1 composed of all objects that are in X1 but cannot be found in any subset of X1 belonging to a concept. We define in the same manner X b2 , X b3 and Xb4 . The following property b b b b holds true: ∀x1 ∈ X1 , x2 ∈ X2 , x3 ∈ X3 , x4 ∈ X4 : x1 x2 :: x3 : x4 . Proof. It is the reciprocal of Property 1: Y1 is the extension of all objects in b1 , and we take x1 in X X b1 . We derive the conclusion from the remark above. Property 3 Let x1 , x2 , x3 and x4 be four objects, in analogical proportion in this order. Let A1111 = {y|y ∈ R↑ (x1 ), y ∈ R↑ (x2 ), y ∈ R↑ (x3 ), y ∈ R↑ (x4 ))} Let A1100 = {y|y ∈ R↑ (x1 ), y ∈ R↑ (x2 ), y 6∈ R↑ (x3 ), y 6∈ R↑ (x4 ))} Let A0011 = {y|y 6∈ R↑ (x1 ), y 6∈ R↑ (x2 ), y ∈ R↑ (x3 ), y ∈ R↑ (x4 ))} Let A1010 = {y|y ∈ R↑ (x1 ), y 6∈ R↑ (x2 ), y ∈ R↑ (x3 ), y 6∈ R↑ (x4 ))} Let A0101 = {y|y 6∈ R↑ (x1 ), y ∈ R↑ (x2 ), y 6∈ R↑ (x3 ), y ∈ R↑ (x4 ))} Then ({x1 , x2 }, A1111 ∪ A1100 ) is included into a formal concept. ({x3 , x4 }, A1111 ∪ A0011 ) is included into a formal concept. ({x1 , x3 }, A1111 ∪ A1010 ) is included into a formal concept. ({x2 , x4 }, A1111 ∪ A0101 ) is included into a formal concept. The result follows from the definition of the subsets of attributes considered and their clear relation with the definition of analogical proportions. The fact that we only have an inclusion in the above property should not come as a surprise. Indeed, when describing objects, attributes that are nor not relevant w.r.t. the analogical proportion may be present. 5 Lines for further research and concluding remarks Beyond the already introduced set function, R↓ (Y ) = {x ∈ Obj|R↑ (x) ⊇ Y }, which is at the core of FCA,and which leads to the definition of formal concepts, it has been noticed [5], on the basis of a parallel with possibility theory that, given a set Y of properties, four remarkable sets of objects can be defined in this setting (here the overbar denotes set complementation): 306 Laurent Miclet, Henri Prade and David Guennec – R↓Π (Y ) = {x ∈ Obj|R↑ (x) ∩ Y 6= ∅} = ∪y∈Y R↓ (y). This is the set of objects having at least one property in Y . – R↓N (Y ) = {x ∈ Obj|R↑ (x) ⊆ Y } = ∩y6∈Y R↓ (y). This is the set of objects having no property outside Y . – R↓∆ (Y ) = R↓ (Y ) = ∩y∈Y R↓ (y). This is the set of objects sharing all prop- erties in Y . – R↓∇ (Y ) = {x ∈ Obj|R↑ (x) ∪ Y 6= Obj} = ∪y6∈Y R↓ (y). This is the set of objects that are missing at least one property outside Y . It has been recently pointed out [3] that pairs (X, Y ) such that R↓N (Y ) = X and R↑N (X) = Y are characterizing independent sub-contexts (X, Y ) such that ((X × Y ) + (X × Y ) ⊇ R, in the sense that they do not share any object or property. Thus, in Figure 1, ({a, b, c, d, e, f }, {5, 6, 7, 8}) and ({g, h, i}, {1, 2, 3, 4}) are two formal sub-contexts. 
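These four operators are immediate to compute from a binary relation. The sketch below (our Python, run on a small hypothetical context deliberately assembled from two independent sub-contexts, in the spirit of the remark above) makes the definitions concrete:

def up(R, x):
    # R↑(x): the set of properties of object x in the relation R ⊆ Obj × Prop.
    return {y for (g, y) in R if g == x}

def down_pi(R, objects, Y):
    # R↓Π(Y): objects having at least one property in Y.
    return {x for x in objects if up(R, x) & set(Y)}

def down_N(R, objects, Y):
    # R↓N(Y): objects having no property outside Y.
    return {x for x in objects if up(R, x) <= set(Y)}

def down_delta(R, objects, Y):
    # R↓Δ(Y) = R↓(Y): objects sharing all properties in Y.
    return {x for x in objects if set(Y) <= up(R, x)}

def down_nabla(R, objects, properties, Y):
    # R↓∇(Y): objects missing at least one property outside Y.
    return {x for x in objects if (set(properties) - set(Y)) - up(R, x)}

# Hypothetical context made of two independent sub-contexts, {1, 2} x {a, b} and {3} x {c}.
objects, properties = {1, 2, 3}, {'a', 'b', 'c'}
R = {(1, 'a'), (1, 'b'), (2, 'a'), (3, 'c')}
assert down_N(R, objects, {'a', 'b'}) == {1, 2}              # no property outside {a, b}
assert down_delta(R, objects, {'c'}) == {3}                  # all properties in {c}
assert down_pi(R, objects, {'b', 'c'}) == {1, 3}             # at least one property in {b, c}
assert down_nabla(R, objects, properties, {'c'}) == {2, 3}   # missing a property outside {c}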
When comparing the features underlying FCA and analogical proportions, one can notice that the same 4 “indicators” are involved from the beginning: a∩b, a ∩ b, a ∩ b, and a ∩ b. Indeed R↓Π (Y ) is based on the condition R↑ (x) ∩ Y 6= ∅, R↓N (Y ) on the condition R↑ (x)∩Y = ∅, R↓∆ (Y ) on the condition R↑ (x)∩Y = ∅, and R↓∇ (Y ) on the condition R↑ (x) ∩ Y 6= ∅. Moreover, with these 4 indicators, one can define other so-called logical proportions [4], including some that are closely related to analogical proportions such as ‘paralogy’ which reads “what a and b have in common, c and d have it also” and is defined by a ∧ b = c ∧ d and a ∧ b = c ∧ d [10]. This more generally raises the question of the rela- tions between FCA and these logical proportions. Finally, the experiments with Doap have obviously to be scaled on larger formal contexts, in order to estimate its practical complexity more accurately. Some more thought has also to be given about the choice of the Start 4-tuples, especially to take advantage of the addition of the reverse attributes. An inter- esting point would be to be able to choose the Start in order to insure that every analogical proportion can be discovered. We also believe that the speed of Doap can be increased, since there are still a lot of parameters to tune, for example breaking ties in the head of the Close list in a non random fashion. An interesting question is whether or not the construction of the lattice must precede the heuristic search. It would certainly be of great interest to construct only the parts that are required by the running od the Doap algorithm. This would lead to merge the two parts of the method, rather than computing the whole lattice (a very costly operation) before its exploration. More generally, it would be clearly of interest to have an algorithm also able to find out the analogical proportions that hold in some sub-context (since as already said, irrelevant attributes may hide interesting analogical proportions), rather than in the initial formal context. This will open a machine learning point of view [1]. Looking for analogical proportions in a formal concept analysis setting 307 6 Aknowledgements We would like to thank the anonymous reviewers for their careful reading of this article and their interesting suggestions. References 1. S. Bayoudh, L. Miclet, and A. Delhay. Learning by analogy: a classification rule for binary and nominal data. Proc. 20th Inter. Joint Conf. on Artificial Intelligence, (M. M. Veloso, ed.), Hyderabad, India, AAAI Press, 678–683, 2007. 2. R. Bělohlávek. Introduction to formal context analysis. Internal report. Dept of Computer science. Palacký University, Olomouk, Czech Republic. 2008. 3. Y. Djouadi, D. Dubois, H. Prade. Possibility theory and formal concept analysis: Context decomposition and uncertainty handling. Proc. 13th Inter. Conf. on Infor- mation Processing and Management of Uncertainty (IPMU’10), (E. Hüllermeier, R. Kruse and F. Hoffmann, eds.), Dortmund, Springer, LNCS 6178, 260–269, 2010. 4. H. Prade, G. Richard. Logical proportions - Typology and roadmap. Proc. Inter. Conf. on Information Processing and Management of Uncertainty in Knowledge- based Systems (IPMU 2010), Dortmund, (E. Hüllermeier, R. Kruse, F. Hoffmann, eds.), Springer, LNCS 6178, 757–767, 2010. 5. D. Dubois, F. Dupin de Saint-Cyr, H. Prade. A possibility-theoretic view of formal concept analysis. Fundamentae Informaticae, 75, 195–213, 2007. 6. A. Frank and A. Asuncion. (2010). 
UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of In- formation and Computer Science. 7. B. Ganter and R. Wille. Formal Concept Analysis. Mathematical Foundations. Springer Verlag, 1999. 8. Y. Lepage. De l’analogie rendant compte de la commutation en linguistique. http://www.slt.atr.jp/ lepage/pdf/dhdryl.pdf, Grenoble, 2001. HDR. 9. L. Miclet and H. Prade. Handling analogical proportions in classical logic and fuzzy logics settings. Proc. 10th Europ. Conf. on (ECSQARU’09), Verona, Springer, LNCS 5590, 2009, 638–650. 10. H. Prade and G. Richard. Analogy, paralogy and reverse analogy: Postulates and Inferences. Proc. Annual German Conf. on Artificial Intelligence (KI 2009), Pader- norn, Sept. 15-18, (B. Mertsching, M. Hund, Z. Aziz, eds.), Springer, LNAI 5803, 306–314, 2009. 11. N. Nilsson. Principles of Artificial Intelligence. Tioga, 1980. 12. N. Stroppa and F. Yvon. Analogical learning and formal proportions: Definitions and methodological issues. Technical Report ENST-2005-D004, http://www.tsi.enst.fr/publications/enst/techreport-2007-6830.pdf, June 2005. 13. R. Wille. Restructuring lattice theory: an approach based on hierarchies of con- cepts. In: Ordered Sets, (I. Rival, ed.), D. Reidel, Dordrecht, 445–470, 1982. 14. The Inclose software. http://inclose.sourceforge.net/. Downloaded on March 2011. Random extents and random closure systems Bernhard Ganter Institut für Algebra Technische Universität Dresden Abstract. We discuss how to randomly generate extents of a given formal context. Our basic method involves counting the generating sets of an extent, and we show how this can be done using the Möbius function. We then show how to generate closure systems on seven elements uniformly at random. 1 Introduction Let Random(0,1] denote an operator that generates a random number between 0 and 1 with equal probability. From such a (memoryless) random number gen- erator an operator Random_subset(S ) can be derived that produces, upon each invocation, a random subset of a given nite set S , such that all subsets are equally likely (see, e.g., [6]). Building on this we derive in this article an operator that randomly selects a 1 closed set from a given closure system on a nite set. Note that this is a trivial task for moderately sized systems of which you can label the closed sets by numbers 1, . . . , n. For such you could simply randomly pick a number between 1 and n and select the closed set labeled by this number. Since the size of a closure system is at most exponential in the size of its carrier, this trivial algorithm clearly requires polynomial time. However, a potentially exponential list of closed sets must be pre-computed and stored. For example we aim at generating closure systems at random2 . But there are many closure systems, even for small carrier sets. On seven elements the number was recently computed by Colomb, Irlande, and Raynaud [3] to be 14 087 648 235 707 352 472. Maintaining a list of this size is not an inviting idea, and thus the trivial approach is not very realistic. Our motivation comes from recent experimental computer investigations by D. Borchmann that yielded surprising results. Borchmann raised the question if these were artefacts caused by the non-uniform choice of the random input data. Have a look at Figure 1. 
It shows ve diagrams, each with 27 rows and 13 columns, corresponding to the possible number of meet reducible and irreducible closed sets in a closure system on a ve element set (the trivial system with zero irreducibles is omitted). A system with r reducibles and i irreducibles corresponds 1 That is, from an intersection-closed familiy of sets. Such families are also called Moore families. 2 The family of all closure systems on a xed set is itself a closure system. c 2011 by the paper authors. CLA 2011, pp. 309–318. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 310 Bernhard Ganter a b c d e Fig. 1. Closure systems on ve elements by their number of meet-irreducible (horizontal) and reducible closed sets (vertical). Tile a shows the possible values, tile b the true relative frequencies, tile c and d come from random contexts and tile e from picking systems uniformly at random. to the cell in the r -th row from the bottom and the i-th column from the left. The rst diagram depicts which combinations of r and i are possible, while the other four display relative frequencies (the darker, the higher). The second diagram shows the true frequencies, counted over all 1 385 552 closure systems on ve elements. The other three show frequencies of randomly chosen closure systems (1 000 000 samples each). For the diagram in the middle, the systems were made by putting random crosses in a 5×13context. The fourth diagram was obtained by putting random crosses with random density in a formal context with a random number of columns. The fth diagram shows the distribution of a sample picked with uniform distribution. We use the language of Formal Concept Analysis [4] and, in particular, that every closure system is the set Ext(G, M, I) of all extents of some formal context (G, M, I). We construct an operator Random_extent(G, M, I ) which selects, upon each invocation, randomly an extent of (G, M, I), with equal probability for all extents. The closure operator for the extents will be denoted by X 7→ X 00 . 00 If X = Y , then Y is called agenerating set of the extent X (not necessarily a minimal one). The number of generating sets of an extent E shall be denoted by egen(E). We extend this denition to arbitrary subsets so that egen(Y ) := |{X ⊆ G | X 00 = Y 00 }| gives the number of generating sets of the extent generated by Y . Of course then, egen(Y ) = egen(Y 00 ). Therefore if E is an extent then obviously X 1 = 1. egen(Y ) {Y |Y 00 =E} Random extents and random closure systems 311 Computing the function egen() is a nontrivial task. We shall discuss this below. Our method could theoretically be applied to many instances, such as generat- ing random partitions, random subgroups, etc. However, its runtime performance is very bad. For most such situations algorithms are known that are much more ecient than what we suggest. Indeed, we do not believe that our method will be very useful in practice. Our contribution is meant as a challenge to come up with a more ecient approach. We are grateful to the referees for several useful hints. We were unaware of the paper by Boley, Gärtner, and Grosskreutz [2], which addresses the same problem, but with a dierent and more general approach. It may well be that their algorithm yields better results even for generating random closure systems. 
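For a very small carrier set, egen() and the identity above can be checked by exhaustive enumeration. The following sketch (our Python, on a hypothetical three-object context) groups all subsets of G by the extent they generate and verifies the identity; this brute force is of course only feasible for tiny contexts:

from itertools import chain, combinations

def subsets(G):
    G = list(G)
    return (frozenset(c) for c in chain.from_iterable(combinations(G, r) for r in range(len(G) + 1)))

def closure(I, X, G, M):
    # X'' : the extent generated by X in the formal context (G, M, I).
    B = {m for m in M if all((g, m) in I for g in X)}
    return frozenset(g for g in G if all((g, m) in I for m in B))

def egen_table(I, G, M):
    # egen(E): the number of subsets of G whose closure is the extent E (brute force).
    table = {}
    for X in subsets(G):
        E = closure(I, X, G, M)
        table[E] = table.get(E, 0) + 1
    return table

# Hypothetical 3-object, 2-attribute context; it has 3 extents with egen values 2, 2 and 4.
G, M = {1, 2, 3}, {'a', 'b'}
I = {(1, 'a'), (2, 'a'), (2, 'b')}
table = egen_table(I, G, M)
assert sorted(table.values()) == [2, 2, 4] and sum(table.values()) == 2 ** len(G)

# The identity stated above: for every extent E, the sum over its generating sets Y of 1/egen(Y) is 1.
sums = {}
for Y in subsets(G):
    E = closure(I, Y, G, M)
    sums[E] = sums.get(E, 0.0) + 1.0 / table[E]
assert all(abs(s - 1.0) < 1e-9 for s in sums.values())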
We have also learnt that the problem of generating random extents is known to belong to a (dicult) complexity class: it is equivalent to the #RHΠ1 -hard problem of counting formal concepts (again, see [2] and the references given there). We already knew (because our colleagues of the stochastics group told us so and recommended the book by Asmussen and Glynn [1] as a standard reference) that our approach is an instance of the so-called acceptance-rejection method. 2 Random Extent Our innocent looking algorithm for generating a random extent of a given formal context (G, M, I) goes like this: Algorithm 1 Random_extent: Generating a random extent Input: A formal context (G, M, I). Output: A random extent of (G, M, I) repeat S := Random_subset(G) 1 until Random(0,1] ≤ egen (S) return S 00 . What the algorithm does essentially is to pick a random subset and output 3 its closure with probability one over the number of generating sets . It is quite elementary to prove that it does what it is supposed to do: Proposition 1 The algorithm Random_extent generates extents of (G, M, I) with equal probability. The proposition is an instance of the following lemma from elementary stochas- tics, for which we provide a proof. To obtain the proposition from the lemma, let 3 One of the referees pointed out that a much simpler algorithm with the same number of expected iterations is obtained by replacing the until statement by until S is closed. We see however no straightforward way to a recursive version of this algorithm. 312 Bernhard Ganter A be the set of all subsets of G, let B be the set of all extents, and let f be the map that associates a subset to the extent it generates. Lemma 1 Let f : A → B be a surjective (i.e., onto) map between nite sets A and B and let Random(A) be an operator that picks elements from A with equal probability. Then Algorithm 2 outputs elements of B with equal probability. Algorithm 2 Random image: Random image of a mapping Input: An onto map f : A → B and an operator Random(A) Output: A random element of B repeat a := Random(A) r := Random(0,1] b := f (a); until r ≤ |f −11(b)| return b. Proof It is obvious that the algorithm produces elements of B . In order that a given element b is produced in one iteration of the loop, the element a must belong to f −1 (b) and, independently, r ≤ |f −11(b)| . The probability that this happens is |f −1 (b)| 1 1 · −1 = , |A| |f (b)| |A| independently of b. The probability that some element is selected after one step |B| thus is . The probability that the element b is produced after k steps is |A| k−1 |B| 1 1− · . |A| |A| The probability that b is produced is X∞ k−1 |B| 1 |A| 1 1 1− · = · = , k=1 |A| |A| |B| |A| |B| as claimed. The expected number of iterations until success is |A| #subsets = . |B| #extents The algorithm may therefore need quite some time. For example, would this algo- rithm be applied to the standard context of closure systems to generate a random Random extents and random closure systems 313 closure system on a 6-element set, it requires, on average, 121 402 088 iterations 6 of the loop, since that context has 2 − 1 objects and 75 973 751 474 extents ([5]). For closure systems on a seven-element set the average number of loop iterations for obtaining a single random closure system would be 12 077 330 482 260 320 447. As already mentioned we shall develop a better method for this case below. Before we do so, we study the problem of computing the value of egen(A). 
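A direct transcription of Random_extent, again with egen computed by brute force (our Python; only sensible for very small contexts):

import random
from collections import Counter
from itertools import chain, combinations

def closure(I, X, G, M):
    # X'' via the two derivation operators of the context (G, M, I).
    B = {m for m in M if all((g, m) in I for g in X)}
    return frozenset(g for g in G if all((g, m) in I for m in B))

def egen(I, Y, G, M):
    # Number of subsets of G generating the same extent as Y (brute force, small G only).
    target = closure(I, Y, G, M)
    all_subsets = chain.from_iterable(combinations(list(G), r) for r in range(len(G) + 1))
    return sum(1 for X in all_subsets if closure(I, X, G, M) == target)

def random_extent(I, G, M):
    # Acceptance-rejection loop of Random_extent: draw a random subset S of G,
    # accept it with probability 1/egen(S), and return its closure S''.
    G = list(G)
    while True:
        S = {g for g in G if random.random() < 0.5}      # Random_subset(G)
        if random.random() <= 1.0 / egen(I, S, G, M):    # accept with probability 1/egen(S)
            return closure(I, S, G, M)

# Hypothetical small context: the three extents should come out with roughly equal frequency.
G, M = {1, 2, 3}, {'a', 'b'}
I = {(1, 'a'), (2, 'a'), (2, 'b')}
print(Counter(random_extent(I, G, M) for _ in range(3000)))

On this toy context each of the three extents should appear with frequency close to 1/3, which is exactly the content of Proposition 1; the expected number of loop iterations per sample is 2^|G| divided by the number of extents, as computed above.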
3 Counting generating sets and hitting sets The algorithm in the previous section uses the number egen(A) of a given extent A, and that by itself is not easy. Of course, each such generating set must be a subset of A. On the other hand, a subset S ⊆ A is a generating set of A i it is not contained in a lower neighbor of A. It is worthwhile to consider the formal context (A, N , ∈), where N is the family of lower neighbor extents of A. For this context, the elements of N are precisely the maximal extents below A, and thus the generating sets of A are the same as before. Counting generating sets thereby has been reduced to counting generating sets of the unit element in a co-atomistic lattice. Every subset of A is generating set of exactly one extent of (A, N , ∈). The |A| total number of generating sets thus is 2 . Indeed, for every extent B we obtain X egen(E) = 2|B| , E≤B where E runs over extents. By Möbius inversion we obtain X egen(A) = µ(E, A) · 2|E| , E≤A where µ is the Möbius function of the lattice B(A, N , ∈). The evaluation of this formula poses no algorithmic diculties. Using the standard Next_intent algorithm ([4]) to generate the extents in descending order, and using, for every constructed extent E , the same algorithm again for producing all extents F between E and A, suces to compute the Möbius function by the well known recursion X µ(E, A) = − µ(F, A). E 2n−1 ) do if F [i] = 2 then j := 2n−1 − 1 while success and (j > 0) do meet := i and j if (j 6= meet) and (F [meet] 6= 1) then success := (Random(0,1] < 0.5) F [meet] := 1 j := j − 1 i := i − 1 until success return F . 318 Bernhard Ganter We have implemented Algorithm 5 for n = 7 and present rst experimental results. Note that the number of random samples produced by this experiment is small compared to the number of closure systems: we have generated less than 0.000 000 000 000 4% of all closure systems on seven points. 85 85 80 80 75 75 70 70 65 65 60 60 55 55 50 50 45 45 40 40 35 35 30 30 25 25 20 20 15 15 10 10 5 5 5 10 15 20 25 30 35 5 10 15 20 25 30 35 Fig. 2. A random sample of 50 000 closure systems on a seven element set, plotted according to their number of irreducible closed sets (horizontal) and reducible closed sets (vertical). The left image shows which sizes occurred at least once. The right image expresses higher frequencies by darker shadings. The computation took one night on a 1.4 GHz PC. We did not even attempt to generate random closure systems on eight elements using Algorithm 5. We believe that a substantially better idea is needed for that case and beyond. References 1. S. Asmussen and P. W. Glynn. Stochastic Simulation. Springer-Verlag, New York, 2007. 2. Mario Boley, Henrik Grosskreutz, and Thomas Gärtner. Formal concept sampling for counting and thresholdfree local pattern mining. In Proc. of the SIAM Int. Conf. on Data Mining (SDM 2010). SIAM, 2010. 3. Pierre Colomb, Alexis Irlande, and Olivier Raynaud. Counting of Moore families for n = 7. In ICFCA'10, pages 7287, 2010. 4. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis - mathematical foundations. Springer Verlag, 1999. 5. Michel Habib and Lhouari Nourine. The number of Moore families on n = 6. Discrete Mathematics, pages 291296, 2005. 6. Albert Nijenhuis and Herbert S. Wilf. Combinatorial algorithms. Academic Press, 1975. 
Extracting Decision Trees from Interval Pattern Concept Lattices

Zainab Assaghir1, Mehdi Kaytoue2, Wagner Meira Jr.2 and Jean Villerd3
1 INRIA Nancy Grand Est / LORIA, Nancy, France
2 Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
3 Institut National de Recherche Agronomique / Ensaia, Nancy, France
Zainab.Assaghir@loria.fr, {kaytoue,meira}@dcc.ufmg.br, Jean.Villerd@nancy.inra.fr

Abstract. Formal Concept Analysis (FCA) and concept lattices have shown their effectiveness for binary clustering and concept learning. Moreover, several links between FCA and unsupervised data mining tasks such as itemset mining and association rule extraction have been emphasized. Several works also studied FCA in a supervised framework, showing that popular machine learning tools such as decision trees can be extracted from concept lattices. In this paper, we investigate the links between FCA and decision trees with numerical data. Recent works showed the efficiency of "pattern structures" to handle numerical data in FCA, compared to traditional discretization methods such as conceptual scaling.

1 Introduction

Decision trees (DT) are among the most popular classification tools, especially for their readability [1]. Connections between DT induction and FCA have been widely studied in the context of binary and nominal features [2], including structural links between decision trees and dichotomic lattices [8], and lattice-based learning [7]. However, the numerical case faces issues regarding FCA and numerical data. In this paper, we investigate the links between FCA and decision trees with numerical data and a binary target attribute. We use an extension of Formal Concept Analysis called interval pattern structures to extract sets of positive and negative hypotheses from numerical data. Then, we propose an algorithm that extracts decision trees from minimal positive and negative hypotheses.

The paper is organised as follows. Section 2 presents the basics of FCA and one of its extensions, called interval pattern structures, for numerical data. Section 3 recalls basic notions of decision trees. Then, in Section 4, we introduce some definitions showing the links between interval pattern structures and decision trees, and a first algorithm for building decision trees from minimal positive and negative hypotheses.

c 2011 by the paper authors. CLA 2011, pp. 319–332. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France.

2 Pattern structures in formal concept analysis

Formal contexts and concept lattices. We assume that the reader is familiar with FCA, and recall here the most important definitions from [3]. Basically, data are represented as a binary table called a formal context (G, M, I) that represents a relation I between a set of objects G and a set of attributes M. The statement (g, m) ∈ I is interpreted as "the object g has attribute m". The two operators (·)′ define a Galois connection between the powersets (2^G, ⊆) and (2^M, ⊆), with A ⊆ G and B ⊆ M:
A′ = {m ∈ M | ∀g ∈ A : gIm} for A ⊆ G,
B′ = {g ∈ G | ∀m ∈ B : gIm} for B ⊆ M.
For A ⊆ G, B ⊆ M, a pair (A, B) such that A′ = B and B′ = A is called a (formal) concept. In (A, B), the set A is called the extent and the set B the intent of the concept (A, B).
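As a minimal illustration of the two derivation operators and of the concept condition A′ = B and B′ = A, here is a sketch in Python on a toy context of our own (not the paper's running example):

def prime_objects(A, I, M):
    # A' : the attributes shared by all objects in A.
    return {m for m in M if all((g, m) in I for g in A)}

def prime_attributes(B, I, G):
    # B' : the objects having all attributes in B.
    return {g for g in G if all((g, m) in I for m in B)}

def is_formal_concept(A, B, I, G, M):
    # (A, B) is a formal concept iff A' = B and B' = A.
    return prime_objects(A, I, M) == set(B) and prime_attributes(B, I, G) == set(A)

# Hypothetical toy context: objects 1..3, attributes a and b.
G, M = {1, 2, 3}, {'a', 'b'}
I = {(1, 'a'), (2, 'a'), (2, 'b'), (3, 'b')}
A = prime_attributes({'a'}, I, G)          # {1, 2}
B = prime_objects(A, I, M)                 # {'a'}
assert is_formal_concept(A, B, I, G, M)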
The set of all concepts is partially ordered by (A1 , B1 ) ≤ (A2 , B2 ) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1 ) and forms a complete lattice called the concept lattice of the formal context (G, M, I). In many applications, data usually consist in complex data involving num- bers, intervals, graphs, etc. (e.g. Table 1) and require to be conceptually scaled into formal contexts. Instead of transforming data, leading to representation and computational difficulties, one may directly work on the original data. Indeed, to handle complex data in FCA, Ganter & Kuznetsov [4] defined pattern struc- tures: it consists of objects whose descriptions admit a similarity operator which induces a semi-lattice on data descriptions. Then, the basic theorem of FCA nat- urally holds. We recall here their basic definitions, and present interval pattern structures from [5] to handle numerical data. Patterns structures. Formally, let G be a set of objects, let (D, u) be a meet-semi-lattice of potential object descriptions and let δ : G −→ D be a mapping. Then (G, (D, u), δ) is called a pattern structure. Elements of D are called patterns and are ordered by a subsumption relation v such that given c, d ∈ D one has c v d ⇐⇒ c u d = c. A pattern structure (G, (D, u), δ) gives rise to the following derivation operators (·) , given A ⊆ G and an interval pattern d ∈ (D, u): A = l δ(g) g∈A d = {g ∈ G|d v δ(g)} These operators form a Galois connection between (2G , ⊆) and (D, v). (Pattern) concepts of (G, (D, u), δ) are pairs of the form (A, d), A ⊆ G, d ∈ (D, u), such that A = d and A = d . For a pattern concept (A, d), d is called a pattern intent and is the common description of all objects in A, called pattern extent. When partially ordered by (A1 , d1 ) ≤ (A2 , d2 ) ⇔ A1 ⊆ A2 (⇔ d2 v d1 ), the set of all concepts forms a complete lattice called a (pattern) concept lattice. Interval pattern structures. Pattern structures allow us to consider com- plex data in full compliance with FCA formalism. This requires to define a meet Extracting Decision Trees From Interval Pattern Concept Lattices 321 III operator on object descriptions, inducing their partial order. Concerning numer- ical data, an interesting possibility presented in [5] is to define a meet operator as an interval convexification. Indeed, one should realize that “similarity” or “in- tersection” between two real numbers (between two intervals) may be expressed in the fact that they lie within some (larger) interval, this interval being the smallest interval containing both two. Formally, given two intervals [a1 , b1 ] and [a2 , b2 ], with a1 , b1 , a2 , b2 ∈ R, one has: [a1 , b1 ] u [a2 , b2 ] = [min(a1 , a2 ), max(b1 , b2 )] [a1 , b1 ] v [a2 , b2 ] ⇔ [a1 , b1 ] ⊇ [a2 , b2 ] The definition of u implies that smaller intervals subsume larger intervals that contain them. This is counter intuitive referring to usual intuition, and is ex- plained by the fact that u behaves as an union (actually convex hull is the union of intervals, plus the holds between them). These definitions of u and v can be directly applied component wise on vectors of numbers or intervals, e.g. in Table 1 where objects are described by vectors of values, each dimension corresponding to an attribute. For example, h[5, 7.2], [1, 1.8]i v h[5, 7], [1, 1.4]i as [5, 7.2] v [5, 7] and [1, 1.8] v [1, 1.4]. Now that vectors of interval forms a u-semi-lattice, numerical data such as Table 1 give rise to a pattern structure and a pattern concept lattice. An example of application of concept forming operators (.) 
is given below. The corresponding pattern structure is (G, (D, u), δ) with G = {p1 , ..., p4 , n1 , ..., n3 } and d ∈ D is a vector with ith component corresponding to attribute mi . {p2 , p3 } = δ(p2 ) u δ(p3 ) = h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i {p2 , p3 } = {p2 , p3 , p4 } As detailed in [5], vectors of intervals can be seen as hyperrectangles in Eu- clidean space: first (.) operator gives the smallest rectangle containing some object descriptions while second (.) operator returns the set of objects whose descriptions are rectangles included in the rectangle in argument. Accordingly, ({p2 , p3 , p4 }, h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i) is a pattern concept. All pattern concepts of an interval pattern structure form a concept lattice. Intuitively, low- est concepts have few objects and “small” intervals while higher concepts have “larger” intervals. An example of such lattice is given later. 3 Decision trees Among all machine leaning tools, decision trees [6, 1] are one of the most widely used. They belong to the family of supervised learning techniques, where data consist in a set of explanatory attributes (binary, nominal or numerical) that describe each object, called example, and one target class attribute that affects each example to a nominal class. Many extensions have been proposed, e.g. to consider a numerical class attribute (regression trees) or other particular cases depending on the nature of attributes. In this paper we focus on data consisting 322 IV Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd m1 m2 m3 m4 p1 7 3.2 4.7 1.4 + p2 5 2 3.5 1 + p3 5.9 3.2 4.8 1.8 + p4 5.5 2.3 4 1.3 + n1 6.7 3.3 5.7 2.1 - n2 7.2 3.2 6 1.8 - n3 6.2 2.8 4.8 1.8 - Table 1: Example 1: numerical context with an external target attribute of numerical explanatory attributes and a binary class attribute. The aim of de- cision tree learning is to exhibit the relation between explanatory attributes and the class attribute through a set of decision paths. A decision path is a sequence of tests on the value of explanatory attributes that is a sufficient condition to assign a new example to one of the two classes. A decision tree gathers a set of decision paths through a tree structure where nodes contain tests on explana- tory attributes. Each node has two branches, the left (resp. right) corresponds to the next test if the new example passed (resp. failed) the current test. When there is no more test to perform, the branch points to a class label, that repre- sents a leaf of the tree. The links between FCA and decision tree learning have been investigated in the case where explanatory attributes are binary [7–10, 2]. However, to our knowledge, no research has been carried out until now in the case of numerical explanatory attributes. In the next section, we show how pat- tern structures can be used to extract decision trees from numerical data with positive and negative examples. 4 Learning in interval pattern structures In [7], S. Kuznetsov considers a machine learning model in term of formal concept analysis. He assumes that the cause of a target property resides in common at- tributes of objects sharing this property. In the following, we adapt this machine learning model to the case of numerical data. Let us consider an interval pattern structure (G, (D, u), δ) with an external target property . The set of objects G (the training set) is partitioned into two disjoints sets: positive G+ and negative G− . 
Then, we obtain two different pattern structures (G+ , (D, u), δ) and (G− , (D, u), δ). Definition 1 (Positive hypothesis). A positive hypothesis h is defined as an interval pattern of (G+ , (D, u), δ) that is not subsumed by any interval pattern of (G− , (D, u), δ), i.e. not subsumed by any negative example. Formally, h ∈ D is a positive hypothesis iff h ∩ G− = ∅ and ∃A ⊆ G+ such that A = h Definition 2 (Negative hypothesis). A negative hypothesis h is defined as an interval pattern of (G− , (D, u), δ) that is not subsumed by any interval pattern Extracting Decision Trees From Interval Pattern Concept Lattices 323 V of (G+ , (D, u), δ), i.e. not subsumed by any positive example. Formally, h ∈ D is a negative hypothesis iff h ∩ G+ = ∅ and ∃A ⊆ G− such that A = h Definition 3 (Minimal hypothesis). A positive (resp. negative) hypothesis h is minimal iff there is no positive (resp. negative) hypothesis e 6= h such that e v h. Going back to numerical data in Table 1, we now consider the ex- ternal binary target property and split accordingly the object set into G+ = {p1 , p2 , p3 , p4 } and G− = {n1 , n2 , n3 }. The pattern concept lat- tice of (G+ , (D, u), δ), where D is the semi-lattice of intervals and δ is a mapping associating for each object its pattern description is given in Fig- ure 1 where positive hypothesis are marked. Note that neither the interval pattern h[5.5, 7], [2.3, 3.2], [4, 4.8], [1.3, 1.8]i nor h[5, 7], [2, 3.2], [3.5, 4.8], [1, 1.8]i are positive hypothesis since they are both subsumed by the inter- val pattern δ(n3 ) = h[6.2, 6.2], [2.8, 2.8], [4.8, 4.8], [1.8, 1.8]i. Therefore, there are two minimal positive hypothesis: P1 = h[5, 7], [2, 3.2], [3.5, 4.7], [1, 1.4]i and P2 = h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i. From (G− , (D, u), δ) (not shown), we obtain the unique minimal negative hypothesis: N1 = h[6.2, 7.2], [2.8, 3.3], [4.8, 6], [1.8, 2.1]i. Now, we consider decision trees more formally. Let the training data be de- scribed by K+− = (G+ ∪ G− , (D, u), δ) with the derivation operator denoted by (.) . This operator is called subposition in term of FCA. Definition 4 (Decision path). A sequence h(m1 , d1 ), (m2 , d2 ), . . . , (mk , dk )i, for different attributes m1 , m2 , · · · , mk chosen one after another, is a called deci- sion path of length k if there is no mi such that (mi , di ), (mi , ei ) and di and ei are not comparable, and there exists g ∈ G+ ∪ G− such that hd1 , d2 , . . . , dk i v δ(g) (i.e. there is at least one example g such that di v δ(g) for each attribute mi ). For instance, h(m3 , [4.8, 6]), (m1 , [6.2, 7.2])i is a decision path for Example 1. If i ≤ k (respectively i < k), the sequence h(m1 , d1 ), (m2 , d2 ), . . . , (mi , di )i is called subpath (proper subpath) of a decision path h(m1 , d1 ), (m2 , d2 ), . . . , (mk , dk )i. Definition 5 (Full decision path). A sequence h(m1 , d1 ), (m2 , d2 ), . . . , (mk , dk )i, for different attributes m1 , m2 , . . . , mk chosen one after another, is called full de- cision path of length k if all object having (m1 , d1 ), (m2 , d2 ), . . . , (mk , dk ) (i.e. ∀g ∈ G, di v δ(g) for the attribute mi ) are either positive or negative examples (i.e. have either + or − value of the target attribute). We say that a full decision path is non-redundant if none of its subpaths is a full decision path. The set of all chosen attributes in a full decision path can be considered as a sufficient condition for an object to belong to a class ∈ {+, −}. 
A decision tree is then defined as the set of full decision paths. Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd P1 P2 minimal positive hypothesis minimal positive hypothesis positive hypothesis Fig. 1: Lattice of the pattern structure (G+ , (D, u), δ). 324 VI Extracting Decision Trees From Interval Pattern Concept Lattices 325 VII 4.1 A first algorithm for building decision trees from interval pattern structures In this section, we propose a first algorithm for extracting full decision paths from the sets of minimal positive hypothesis P and minimal negative hypoth- esis N . Intuitively, minimal positive (resp. negative) hypothesis describe the largest areas in the attribute space that gathers the maximum number of posi- tive (resp. negative) examples with no negative (resp. positive) example. Positive and negative areas may intersect on some dimensions. In Example 1 (see Table 1), P = {P1 , P2 } and N = {N1 } and we denote by Pi ∩ Nj the interval vector for which the k-the component is the intersection of the Pi and Nj intervals for the k-the component. Recall that P1 = h[5, 7], [2, 3.2], [3.5, 4.7], [1, 1.4]i, P2 = h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i and N1 = h[6.2, 7.2], [2.8, 3.3], [4.8, 6], [1.8, 2.1]i. Then we have: P1 ∩ N1 = h[6.2, 7], [2.8, 3.2], ∅, ∅i P2 ∩ N1 = h∅, [2.8, 3.2], [4.8], [1.8]i We note that P1 and N1 have no intersection for attributes m3 and m4 . This means that any example that has a value for m3 (resp. m4 ) that is contained in P1 ’s interval for m3 (resp. m4 ) can directly be classified as positive. Similarly, any example having a value for m3 (resp. m4 ) contained in N1 ’s interval for m3 (resp. m4 ) can directly be classified as negative. The same occurs for P2 and N1 for m1 . Therefore a full decision path for a minimal positive hypothesis P is defined as a sequence h(mi , mi (P ))ii∈{1...|N |} where mi is an attribute such that mi (P ∩ Ni ) = ∅4 . A full decision path for a minimal negative hypothesis N is defined as a sequence h(mj , mj (N ))ij∈{1...|P|} where mj is an attribute such that mj (N ∩ Pi ) = ∅. Here examples of such decision paths (built from P1 , P2 and N1 respectively) are: h(m3 , [3.5, 4.7])i(P1 ) h(m4 , [1, 1.4])i(P1 ) h(m1 , [5, 5.9])i(P2 ) h(m3 , [4.8, 6]), (m1 , [6.2, 7.2])i(N1 ) h(m4 , [1.8, 2.1]), (m1 , [6.2, 7.2])i(N1 ) Decision paths built from P1 and P2 are sequences that contain a single element since |N | = 1. Decision paths built from N1 are sequences that contain two elements since |P| = 2. Two distinct full decision paths can be built from P1 since there are two attributes for which P1 and N1 do not intersect. A positive (resp. negative) decision tree is therefore a set of full decision paths, one for each minimal positive (resp.negative) hypothesis. For instance: 4 For any interval pattern P , the notation mi (P ) denotes its interval value for the attribute mi . 326 VIII Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd ”if m3 ∈ [3.5, 4.7] then +, else if m1 ∈ [5, 5.9] then + else -” is an example of positive decision path. An example of negative decision path is ”if m1 ∈ [6.2, 7.2] and m3 ∈ [4.8, 6] then -, else +”. Algorithm 1 describes the computation of full decision paths for minimal pos- itive hypothesis. The dual algorithm for minimal negative hypothesis is obtained by interchanging P and N . 
1 Res ← empty array of size |P|; 2 foreach P ∈ P do 3 foreach N ∈ N such that ∃mi , mi (P ∩ N ) = ∅ do 4 if (mi , mi (P )) 6∈ Res[P ] then 5 Res[P ] ← Res[P ] ∪ (mi , mi (P )); 6 foreach N ∈ N such thatW6 ∃mi , mi (P ∩ N ) = ∅ do 7 Res[P ] ← Res[P ] ∪ { m∈M (m, m(P ) \ m(P ∩ N ))}; Algorithm 1: Modified algorithm for extracting full decision paths Res (in- cluding non-redundant) for minimal positive hypothesis The different steps of the algorithm are detailed below: line 1: Res will contain a full decision path for each minimal positive hypothesis. line 2: Process each minimal positive hypothesis P . line 3: For each minimal negative hypothesis N that has at least one attribute m such that m(P ∩ N ) = ∅, choose one of these attribute, called mi below. line 4: Ensure that mi has not already been selected for another N , this enables to produce non redundant full decision paths (see Example 2). line 5: Add the interval mi (P ) in the full decision path of P . The test mi ∈ mi (P ) will separate between positive examples covered by P and negative examples cov- ered by N . line 6: For each minimal negative hypothesis N that has no attribute m such that m(P ∩ N ) = ∅. line 7: Positive examples covered by P and negative examples covered by N can be separated by a disjunction of tests m ∈ m(P ) \ m(P ∩ N )) on each attribute m. Hence, there is at least one attribute for which a positive example from P belongs to m(P ) and not to m(N ). Otherwise, N would not be a negative hy- pothesis. Note that Example 1 is a particular case where all negative examples are gathered in a unique minimal negative hypothesis. A few values have been modified in Table 2 in order to produce two minimal negative hypothesis. Extracting Decision Trees From Interval Pattern Concept Lattices 327 IX m1 m2 m3 m4 p1 7 3.2 4.7 1.4 + p2 5 2 3.5 1 + p3 5.9 3.2 4.8 1.8 + p4 5.5 2.3 4 1.3 + n1 5.9 3.3 5.7 1.4 - n2 7.2 3.2 6 1.8 - n3 6.2 2.8 4.8 1.8 - Table 2: Example 2: training set Minimal positive hypothesis P1 and P2 remain unchanged while there are two minimal negative hypothesis: N1 = h[5.9, 7.2], [3.2, 3.3], [5.7, 6], [1.4, 1.8]i N2 = h[6.2, 7.2], [2.8, 3.2], [4.8, 6], [1.8, 1.8]i This leads to the following intersections: P1 ∩ N1 = h[5.9, 7], [3.2], ∅, [1.4]i P1 ∩ N2 = h[6.2, 7], [2.8, 3.2], ∅, ∅i P2 ∩ N1 = h[5.9], [3.2], ∅, [1.4, 1.8]i P2 ∩ N2 = h∅, [2.8, 3.2], [4.8, 4.8], [1.8]i Examples of full decision path computed by Algorithm 1 from P1 are h(m3 , [3.5, 4.7]), (m4 , [1, 1.4])i(1) h(m3 , [3.5, 4.7]), (m3 , [3.5, 4.7])i(2) Note that neither N1 nor N2 intersect P1 on m3 , therefore the full decision path (2) can be simplified as h(m3 , [3.5, 4.7])i. More generally, following pre- vious definitions, h(m3 , [3.5, 4.7])i is a non-redundant full decision path while h(m3 , [3.5, 4.7]), (m4 , [1, 1.4])i and h(m3 , [3.5, 4.7]), (m3 , [3.5, 4.7])i are not. A con- ditional test has been added in Algorithm 1 in order to also produce such non- redundant full decision paths. Finally a concrete positive decision tree is built from the set of full decision paths, each node corresponds to a minimal positive hypothesis Pi and contains a test that consists in the conjunction of the elements of a full decision path. The left child contains + and the right child is a node corresponding to another minimal positive hypothesis Pj or - if all minimal positive hypothesis have been processed. An example of decision tree for example 2 is: ”if m3 ∈ [3.5, 4.7] and m4 ∈ [1, 1.4] then +, else (if m3 ∈ [3.5, 4.8] and m1 ∈ [5, 5.9] then +, else -)”. 
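The pseudocode leaves some freedom: the choice of mi on line 3, and lines 6 and 7, whose non-existence and disjunction symbols were damaged in extraction, are read here as "for every N that intersects P on every attribute, add a disjunction of tests m ∈ m(P) \ m(P ∩ N)". Under that reading, a minimal self-contained Python sketch (not the authors' implementation) is given below; hypotheses are tuples of (lo, hi) intervals indexed by attribute, and all names are illustrative.

def inter(i, j):
    # Intersection of two closed intervals; None encodes the empty interval.
    lo, hi = max(i[0], j[0]), min(i[1], j[1])
    return (lo, hi) if lo <= hi else None

def full_decision_paths(positive_hyps, negative_hyps):
    res = {P: [] for P in positive_hyps}                       # line 1
    for P in res:                                              # line 2
        for N in negative_hyps:
            empty = [k for k in range(len(P)) if inter(P[k], N[k]) is None]
            if empty:                                          # line 3
                mk = empty[0]                                  # pick one such attribute
                if (mk, P[mk]) not in res[P]:                  # line 4
                    res[P].append((mk, P[mk]))                 # line 5
            else:                                              # line 6
                # line 7: record, per attribute, the interval of P and the part
                # shared with N, from which the tests m in m(P) \ m(P n N) follow.
                res[P].append(("or", tuple((k, P[k], inter(P[k], N[k]))
                                           for k in range(len(P)))))
    return res

# Example 1 (attribute index k stands for m_{k+1}):
P1 = ((5.0, 7.0), (2.0, 3.2), (3.5, 4.7), (1.0, 1.4))
P2 = ((5.0, 5.9), (2.0, 3.2), (3.5, 4.8), (1.0, 1.8))
N1 = ((6.2, 7.2), (2.8, 3.3), (4.8, 6.0), (1.8, 2.1))
paths = full_decision_paths([P1, P2], [N1])
print(paths[P1])   # [(2, (3.5, 4.7))]  i.e. "if m3 in [3.5, 4.7] then +"
print(paths[P2])   # [(0, (5.0, 5.9))]  i.e. "if m1 in [5, 5.9] then +"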
We detail below the complete process for examples 1 and 2. 328 X Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd 4.2 Example 1 P P1 = h[5, 7], [2, 3.2], [3.5, 4.7], [1, 1.4]i P2 = h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i N N1 = h[6.2, 7.2], [2.8, 3.3], [4.8, 6], [1.8, 2.1]i intersections P1 ∩ N1 = h[6.2, 7], [2.8, 3.2], ∅, ∅i P2 ∩ N1 = h∅, [2.8, 3.2], [4.8], [1.8]i full decisions paths from P h(m3 , [3.5, 4.7])i(P1 ) h(m4 , [1, 1.4])i(P1 ) h(m1 , [5, 5.9])i(P2 ) full decisions paths from N h(m3 , [4.8, 6]), (m1 , [6.2, 7.2])i(N1 ) h(m4 , [1.8, 2.1]), (m1 , [6.2, 7.2])i(N1 ) m3 ∈ [3.5, 4.7] m1 ∈ [5, 5.9] yes no yes no + m1 ∈ [5, 5.9] + m3 ∈ [3.5, 4.7] yes no yes no + − + − m4 ∈ [1, 1.4] m1 ∈ [5, 5.9] yes no yes no + m1 ∈ [5, 5.9] + m4 ∈ [1, 1.4] yes no yes no + − + − full paths from P1, then from P2 full paths from P2, then from P1 m1 ∈ [6.2, 7.2] ∧ m3 ∈ [4.8, 6] m1 ∈ [6.2, 7.2] ∧ m4 ∈ [1.8, 2.1] yes no yes no − + − + full paths from N1 Fig. 2: Decision trees built from Example 1 Extracting Decision Trees From Interval Pattern Concept Lattices 329 XI 4.3 Example 2 P P1 = h[5, 7], [2, 3.2], [3.5, 4.7], [1, 1.4]i P2 = h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i N N1 = h[5.9, 7.2], [3.2, 3.3], [5.7, 6], [1.4, 1.8]i N2 = h[6.2, 7.2], [2.8, 3.2], [4.8, 6], [1.8, 1.8]i intersections P1 ∩ N1 = h[5.9, 7], [3.2], ∅, [1.4]i P1 ∩ N2 = h[6.2, 7], [2.8, 3.2], ∅, ∅i P2 ∩ N1 = h[5.9], [3.2], ∅, [1.4, 1.8]i P2 ∩ N2 = h∅, [2.8, 3.2], [4.8, 4.8], [1.8]i full decisions paths from P h(m3 , [3.5, 4.7])i(P1 ) h(m3 , [3.5, 4.7]), (m4 , [1, 1.4])i(P1 ) (redundant) h(m3 , [3.5, 4.8]), (m1 , [5, 5.9])i(P2 ) full decisions paths from N h(m3 , [5.7, 6])i(N1 ) h(m3 , [4.8, 6]), (m1 , [6.2, 7.2])i(N2 ) h(m4 , [1.8]), (m1 , [6.2, 7.2])i(N2 ) 4.4 Comparison with traditional decision tree learning approaches Standard algorithms such as C4.5 produce decision trees in which nodes contain tests of the form a ≤ v, i.e. the value for attribute a is less or equal to v, while our nodes contain conjunctions of tests of the form a ∈ [a1 , a2 ] ∧ b ∈ [b1 , b2 ]. A solution consists in identifying minimal and maximal values for each attribute in the training set, and by replacing them by −∞ and +∞ respectively in the resulting trees (see Figure 4). Moreover, common decision tree induction tech- niques use Information Gain maximization (or equivalently conditional entropy minization) to choose the best split at each node. The conditional entropy of a split is null when each child node is pure (contains only positive or negative ex- amples). When this perfect split can not be expressed as an attribute-value test, it can be shown that the optimal split that minimize conditional entropy consists in maximizing the number of examples in one pure child node (proof is ommited due to space limitation). This optimal split exactly matches our notion of posi- tive (resp. negative) minimal hypothesis, which corresponds to descriptions that gathers the maximum number of only positive (resp. negative) examples. However we insist that our algorithm is only a first and naive attempt to produce decision trees from multi-valued contexts using pattern structures. Its aim is only to clarify the links between decision tree learning and pattern struc- tures. Therefore it obviously lacks of relevant data structures and optimization. However we plan to focus our efforts on algorithm optimization and then on rigorous experimentations on standard datasets. 
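To make the information-gain argument of Section 4.4 concrete, the following self-contained sketch (not part of the paper) computes the conditional entropy of a threshold split on Example 1; the split m4 ≤ 1.4 is the one appearing in the C4.5 tree of Figure 4, and the helper names are illustrative.

from math import log2

def entropy(labels):
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def conditional_entropy(values, labels, threshold):
    # Weighted entropy of the two children produced by the test "value <= threshold".
    left  = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

# Attribute m4 and the class column of Table 1 (Example 1):
m4     = [1.4, 1.0, 1.8, 1.3, 2.1, 1.8, 1.8]
labels = ["+", "+", "+", "+", "-", "-", "-"]
print(conditional_entropy(m4, labels, 1.4))
# ~0.46, not 0: the left child (m4 <= 1.4) is pure and gathers three of the four
# positive examples, while the right child still mixes p3 with n1, n2, n3.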
5 Concluding remarks In this paper, we studied the links between decision trees and FCA in the par- ticular context of numerical data. More precisely, we focused on an extension 330 XII Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd m3 ∈ [3.5, 4.7] m3 ∈ [3.5, 4.8] ∧ m1 ∈ [5, 5.9] yes no yes no + m3 ∈ [3.5, 4.8] ∧ m1 ∈ [5, 5.9] + m3 ∈ [3.5, 4.7] yes no yes no + − + − full paths from P1, then from P2 full paths from P2, then from P1 m3 ∈ [5.7, 6] m3 ∈ [4.8, 6] ∧ m1 ∈ [6.2, 7.2] yes no yes no − m3 ∈ [4.8, 6] ∧ m1 ∈ [6.2, 7.2] − m3 ∈ [5.7, 6] yes no yes no − + − + m3 ∈ [5.7, 6] m4 = 1.8 ∧ m1 ∈ [6.2, 7.2] yes no yes no − m4 = 1.8 ∧ m1 ∈ [6.2, 7.2] m3 ∈ [5.7, 6] − yes no yes no − + − + full paths from N1, then from N2 full paths from N2, then from N1 Fig. 3: Decision trees built from Example 2 Extracting Decision Trees From Interval Pattern Concept Lattices 331 XIII m4 ∈ (−∞, 1.4] m4 ≤ 1.4 yes no yes no + m1 ∈ (−∞, 5.9] + m1 ≤ 5.9 yes no yes no + − + − our approach Weka implementation of C4.5 Fig. 4: Comparison of decisions trees produced by our approach and by C4.5 for Ex- ample 1 of FCA for numerical data called interval pattern structures, that has recently gained popularity through its ability to handle numerical data without any dis- cretization step. We showed that interval pattern structures from positive and negative examples are able to reveal positive and negative hypothesis, from which decision paths and decision trees can be built. In future works, we will focus on a comprehensive and rigorous comparison of our approach with traditional decision tree learning techniques. Moreover, we will study how to introduce in our approach pruning techniques that avoid overfitting. We will also investigate solutions in order to handle nominal class attributes (i.e. more than two classes) and heterogeneous explanatory attributes (binary, nominal, ordinal, numerical). Finally, notice that interval patterns are closed since (.) is a closure operator. In a recent work [11], it has been shown that whereas a closed interval pattern represents the smallest hyper-rectangle in its equivalence class, interval pattern generators represent the largest hyper- rectangles. Accordingly, generators are favoured by minimum description length principle (MDL), since being less constrained. An interesting perspective is to test their effectiveness to describe minimal hypothesis in the present work. References 1. Quinlan, J.: Induction of decision trees. Machine learning 1(1) (1986) 81–106 2. Fu, H., Njiwoua, P., Nguifo, E.: A comparative study of fca-based supervised classification algorithms. Concept Lattices (2004) 219–220 3. Ganter, B., Wille, R.: Formal Concept Analysis. Springer-Verlag (1999) 4. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: ICCS ’01: Proceedings of the 9th International Conference on Conceptual Structures, Springer-Verlag (2001) 129–142 5. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181(10) (2011) 1989–2001 6. Breiman, L.: Classification and regression trees. Chapman & Hall/CRC (1984) 332 XIV Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd 7. Kuznetsov, S.O.: Machine learning and formal concept analysis. Int. Conf. on Formal Concept Analysis, LNCS 2961, (2004) 287–312 8. Guillas, S., Bertet, K., Ogier, J.: A generic description of the concept lattices classifier: Application to symbol recognition. Graphics Recognition. 
Ten Years Review and Future Perspectives (2006) 47–60 9. Nijssen, S., Fromont, E.: Mining optimal decision trees from itemset lattices. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, ACM (2007) 530–539 10. Nguifo, E., Njiwoua, P.: Iglue: A lattice-based constructive induction system. In- telligent data analysis 5(1) (2001) 73 11. Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Revisiting Numerical Pattern Mining with Formal Concept Analysis. In: International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Espagne (2011) A New Formal Context for Symmetric Dependencies Jaume Baixeries Departament de Llenguatges i Sistemes Informàtics. Universitat Politècnica de Catalunya. 08024 Barcelona. Catalonia. jbaixer@lsi.upc.edu Abstract. In this paper we present a new formal context for symmet- ric dependencies. We study its properties and compare it with previous approaches. We also discuss how this new context may open the door to solve some open problems for symmetric dependencies. 1 Introduction and Motivation In database theory there are different types of dependencies, yet, two of them appear to be the most popular: functional dependencies and multivalued depen- dencies. The reason is that both dependencies come handy in order to explain the normalization of a database scheme. But some of these dependencies are not only confined to the database domain. For instance, implications (the equiva- lent of functional dependencies for binary data) are present in datamining and learning ([4,19,20]). In general terms, a dependency states a relationship between sets of attributes in a table. Let us suppose that we have the following set of attributes: U = {name, income, age} in a table that contains the following records: id Name Income Age 1 Smith 30.000 26-10-1956 2 Hart 35.000 14-02-1966 3 Smith 30.000 02-01-1964 In such a case, we have that the relationship between age and the attributes income and name is functional, this is, that given a value of age, the value of income and name can be determined. We also have that the value of name can be determined by income and viceversa. In such a case, given these functional relationships between the attributes, we say that the functional dependencies age → {name, income}, name → income and income → name hold in that table. Functional dependencies and multivalued dependencies have their own set of axioms ([9,21]), which state what dependencies hold in the presence of other dependencies. For instance, an axiom for functional dependencies states that c 2011 by the paper authors. CLA 2011, pp. 333–348. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 334 2 Jaume Baixeries Baixeries J. transitivity holds, which means that, in the previous case, if we had that name → income and income → age hold in that table (which is not true in that table, but just as a supposition), it must follow necessarily that name → age holds. Given a set of dependencies Σ, we define as Σ + the set of all dependencies that hold according to those axioms. These axioms, in turn, are also shared by other dependencies: implications share the same axioms of functional dependencies ([4]), and degenerate multi- valued dependencies share the same axioms of multivalued dependencies ([5]). 
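As a side illustration of the introductory example (not part of the original paper), the following self-contained sketch tests whether a functional dependency X → Y holds in the small relation above; the representation of the relation and the function name are assumptions.

rows = [
    {"name": "Smith", "income": 30000, "age": "26-10-1956"},
    {"name": "Hart",  "income": 35000, "age": "14-02-1966"},
    {"name": "Smith", "income": 30000, "age": "02-01-1964"},
]

def fd_holds(rows, X, Y):
    # X -> Y holds iff any two tuples that agree on X also agree on Y.
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in X)
        val = tuple(r[a] for a in Y)
        if seen.setdefault(key, val) != val:
            return False
    return True

print(fd_holds(rows, ["age"], ["name", "income"]))   # True
print(fd_holds(rows, ["name"], ["income"]))          # True
print(fd_holds(rows, ["name"], ["age"]))             # False: two Smiths, different ages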
That is why we generically call Armstrong Dependencies (AD) those dependen- cies that share the same axioms of the former, and Symmetric Dependencies (SD) those that share the axioms of the latter. Since in this paper we are focusing on the syntactical properties of those de- pendencies, we will only talk of Armstrong and symmetric dependencies, rather than functional or multivalued dependencies. The lattice characterization of a set of Armstrong dependencies has been widely studied in [10,11,13,14,15], and their characterization with a formal con- text in [7,17]. However, the lattice characterization of symmetric dependencies has not been so widely studied. The main work is in [12], and the character- ization of symmetric dependencies with a formal contexts was studied in [3,5] (we talk indistinctly of a lattice characterization and a characterization with a formal context). In the case of AD’s, the formal context yields a powerset lattice ([17]), whereas in the case of symmetric dependencies, it yields a partition lattice ([3]). The fact that some problems related to AD’s have been solved using their lattice characterization, suggests that the same problems for SD’s could also be solved using their corresponding lattice characterization. We name three of those problems already solved for AD’s, not yet for SD’s: learning SD’s, the finding of a minimal basis for a set of dependencies for SD’s and the characterization of mixed sets of SD’s and AD’s. In general terms, query learning consists in deducing a function (a formula) via membership queries to an oracle. This method has been used to learn sets of Horn clauses, which can also be seen as implications ([8]), or, more generally, sets of Armstrong dependencies. Thus, the same general algorithm for learning Horn clauses ([1]) has been adapted to learn Armstrong dependencies ([2]). This adaptation was obviously easied by the fact that Horn clauses and Armstrong dependencies share the same set of axioms. Yet, no such algorithm for symmetric dependencies exists (to the best of the author’s knowledge). The minimal base (also: Duquenne-Guigues basis [16]) is the minimal set of Armstrong dependencies needed to compute Σ + . In [11], [16] and [17] it is characterized and computed in terms of the (powerset) lattice characterization of Σ + . We have been dealing with unmixed sets of AD’s and SD’s, but there exists an axiomatizations of mixed sets of AD’s and SD’s ([21]), but no lattice characterization of mixed sets. A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 3353 Although AD’s and SD’s are related, the lattice characterization yielded by the formal context in [3] is quite different in nature to that for AD’s. Potentially, it may pose different problems. The first is that the solutions that have been found for AD’s (based on their lattice characterization) may not be applied directly to the case of SD’s. We do not mean that having AD’s characterized with a powerset lattice and SD’s with a partition lattice makes it impossible to solve the same problems for SD’s. What we mean is that having a similar characterization for SD’s would make it easier to try and find an answer using existing solutions for AD’s. A second drawback is that the size of the formal context for SD’s is much larger, in comparison with that for AD’s. This may cause a problem in case the context is used in practical applications, but, more importantly, there are partitions that play no rôle in that characterization. 
A simple analysis of [3] yields that partitions that contain no singleton are completly useless, but a more detailed analysis (out of the scope of this paper) indicates that there are more redundant partitions. Finally, although partitions may be intuitive when dealing with SD’s, they do not reflect the B ⇔ ¬B symmetry of the definition of symmetric dependencies (as stated by Alan Day in [12]). It seems that the connection between AD’s and SD’s is stronger than what the partition lattice characterization suggests. As a step towards solving the learning problem and computation of a minimal basis for SD’s as well as the characterization of mixed sets of SD’s and AD’s, in this paper we present a new formal context for symmetric dependencies, following the work started in [3]. The results presented in this paper parallel those results, but from a different perspective that, we think, improve both the understanding and the possibilities to solve the open problems previously listed. This paper starts with the Notation section, followed by a Previous Work section that explains the departing point of this paper. In the Results section, we present a new formal context for SD’s. We also present an example in a separate section to illustrate the results. Finally, we discuss some aspects of this new formal context and present the conclusions and future work. 2 Notation We depart from a set of attributes U. We use non capital letters for single elements of that set, starting with a, b, c, . . . , and capital letters for subsets of U. The complement of a set X ⊆ U is X. We drop the union operator and use juxtaposition to indicate set union. For instance, instead of X ∪ Y we write XY . Generally, we also drop the set notation, and write abc instead of { a, b, c }. We define the powerset of a set U as ℘(U). The set of partitions that can be formed with U is Part(U). The notation for a partition is P = [P1 | P2 | · · · | Pn ], where Pi are the classes (subsets) of P . If needed, we indicate that the attributes in a set X are in fact a set of singletons with this notation: X. For instance, { a, b, c, d } = { { a }, { b }, { c }, { d } }. We overload P ≥ Q to indicate that a 336 4 Jaume Baixeries Baixeries J. partition P refines a partition Q and P ≤ Q to indicate that P is coarser than Q. More details of this (reversed) order can be found in [18]. As for Formal Concept Analysis, we use the usual notation ([17]), which includes the use of 0 as the (overloaded) function that relates the set of attributes and that of objects and viceversa. 2.1 Symmetric Dependencies A symmetric dependency is a relation between two sets of attributes, and it is stated as X ⇒ Y . Given a set of attributes U, we define SDU as the set of all symmetric dependencies that can be formed with U. Although they will only be mentioned in this paper, we say that X → Y is an Armstrong dependency. Given a set of SD’s Σ ⊆ SDU , we say that the closure of Σ is Σ + , and consists of Σ plus the set of all SD’s that can be derived from Σ applying recursively the following axioms: Definition 1 (Axioms for SD’s). 1. Reflexivity: If Y ⊆ X, then, X ⇒ Y holds. 2. Complementation: If X ⇒ Y holds, then, X ⇒ XY holds. 3. Augmentation: If X ⇒ Y holds and W 0 ⊆ W ⊆ U, then, XW ⇒ Y W 0 holds. 4. Transitivity: If X ⇒ Y and Y ⇒ Z hold, then, X ⇒ Z \ Y holds. Because of complementation, we give a symmetric dependency as X ⇒ Y | Z, where Z = XY . 
We always assume that the rightest set in the right-hand side of a symmetric dependency is the complementary of the union of the other two. However, sometimes we will state it explicitly, as in X ⇒ Y | XY and sometimes we will simply use X ⇒ Y | Z. In both cases, X is the left-hand side of the dependency, Y its first right-hand side, and Z its second right-hand side. The set SDU is the set of all non-trivial symmetric dependencies that can be formed using all the attributes in U. By non-trivial we mean those SD’s X ⇒ Y | Z such that: Definition 2. A symmetric dependency X ⇒ Y | Z is non-trivial if: 1. X ∪ Y ∪ Z = U. 2. X ∩ Y = X ∩ Z = Y ∩ Z = ∅. 3. X 6= ∅, Y 6= ∅, Z 6= ∅. As it can be seen, according to the axioms for symmetric dependencies, this limitation incurs in no loss of information, since the remaining symmetric de- pendencies can easily be derived from SDU ([21]). It is precisely the complementation rule that states the relation between Arm- strong dependencies and symmetric dependencies. Broadly speaking, we could say that a symmetric dependency X ⇒ Y | Z is equivalent to the fact that either the Armstrong dependencies X → Y or X → Z hold. This is a too general state- ment, but if, as an example, we take, functional dependencies and its symmetric A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 3375 counterpart, degenerate multivalued dependencies, we see that the definition of a functional dependency X → Y states that whenever two tuples agree on X they also agree on Y , whereas the definition of a degenerate multivalued dependency X ⇒ Y | Z states that whenever two tuples agree on X they also agree on Y or they agree in Z. In fact, there are also a set of two axioms in the case we are dealing with mixed sets of AD’s and SD’s. One of this axioms state that if X → Y holds, then, X ⇒ Y | XY holds as well. This example is just to indicate that the relationship between AD’s and SD’s is strong, and that SD’s can be as a generalization of AD’s. Given a set of symmetric dependencies Σ, we say that the dependency basis of a set of attributes X ⊆ U (that is: DBΣ (X)) is the coarsest partition of U such that all the dependencies X ⇒ Y | Z that hold in Σ + are those such that Y (symmetrically Z) is the union of one or more classes of DB(X). This partition always exists ([21]) and defines all the symmetric dependencies that hold in Σ + such that their left-hand side is X. We also have that, since reflexivity holds for SD’s, all the attributes of X ⊆ U are singletons in DBΣ (X). 2.2 Previous Work The origins of defining a formal context to characterize the closure of a set of Armstrong dependencies started in [17]. This formal context was defined as: KAD (U) = (ADU , ℘(U), I) where ADU is the set of Armstrong dependencies that can be formed with the set of attributes U, and I was a binary relation between an Armstrong dependency and a set of attributes. In [3], it was presented a formal context for symmetric dependencies with identical properties: KSD (U) = (SDU , Part(U), I 0 ) The relations I and I 0 are generically called ”respect” relations: a set of attributes (a partition) respects an Armstrong (symmetric) dependency. Both formal contexts, in spite of its obvious structural differences, charac- terized the closure of a set of dependencies of its kind. In fact, both contexts provided the following results for each respective kind of dependencies: 1. Σ + = Σ 00 . 2. Σ 0 is the lattice characterization of Σ + . 
When we say that Σ 0 was the lattice characterization of Σ + , it may seem redundant, since we already have that Σ + = Σ 00 . What we mean is that Σ 0 alone, without the application of the operator 0 , also characterized all the dependencies of Σ + . This was done with the definition of a closure operator on Σ 0 : 338 6 Jaume Baixeries Baixeries J. ^ ΓΣ 0 (X) = { Y ∈ Σ0 | Y ⊇ X } The fact that this function is total indicates that ∧ is always defined in Σ 0 . Depending on the formal context we were dealing with, we would have that X ∈ ℘(U) (AD’s) or that X ∈ Part(U) (SD’s). In the case of Armstrong depen- dencies, we would then have that X → Y ∈ Σ + if and only if: ΓΣ 0 (X) = ΓΣ 0 (XY ) In the case of a symmetric dependency, it is a little bit more elaborated from a syntactical point of view, yet, equivalent to the previous case: X ⇒ Y | Z ∈ Σ + if and only if: ΓΣ 0 ([X | Y Z]) = ΓΣ 0 ([X | Y | Z]) Clearly, Σ 0 alone gives us the information of which dependencies are in Σ + by querying the (closure) operator ΓΣ 0 . In both cases, and oversimplifying, we can say that a dependency holds in Σ + if and only if there is some kind of relationship between its left-hand side and its right-hand side, being this relationship defined by the formal context. 3 Results The results in this paper try to overcome the potential problems that may rep- resent the differnet nature of the current formal contexts for AD’s and SD’s (powerset versus partitions), as well as the larger size of a partition set, by pre- senting a characterization of symmetric dependencies based on a formal context whose set of attributes is the powerset of U instead of its partitions. This context will generalize that for AD’s in [17] as it will seen in Section 5. We define a formal context, and prove that it characterizes the set of sym- metric dependencies Σ + , in a way similar to that in [3]: (SDU , ℘(U), I ), where the relation I is defined as follows: Definition 3. A ⊆ U respects a symmetric dependency X ⇒ Y | Z (that is: X ⇒ Y | Z I A) if and only if: A + X or A ⊇ XY or A ⊇ XZ We have that Σ 0 ⊆ ℘(U). As a trivial consequence of Definition 3 we have the following proposition: Proposition 1. X ⇒ Y | Z ∈ Σ 00 if and only if @A ∈ Σ 0 : A ⊇ X and A + XY and A + XZ A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 3397 We now study the properties of this contexts and how they characterize Σ + . We first see that all the dependencies that are in Σ + are also present in Σ 00 . To prove this claim, we must prove axiom by axiom that the dependencies derived by those axioms are also present in Σ 00 , but since reflexivity and complementation are trivial, we only prove augmentation and transitivity. Proposition 2 (Augmentation). If X ⇒ Y | XY ∈ Σ, and W 0 ⊆ W then, XW ⇒ Y W 0 | XW Y ∈ Σ 00 . Proof. By the way of contradiction, we suppose that there is a set A ⊆ U, A ∈ Σ 0 such that (note that XW XY W = XW Y ) A ⊇ XW and A + XW Y and A + XW Y Since X ⇒ Y | XY ∈ Σ, we have that A + X or A ⊇ XY or A ⊇ XY (because XXY = XY ). We have that A ⊇ XW discards A + X. So, we only have two possible options: (i) A ⊇ XY , which in combination with A ⊇ XW yields A ⊇ XY W , which contradicts A + XW Y . (ii) A ⊇ XY , which in combination with A ⊇ XW yields A ⊇ XW Y , which contradicts A + XW Y . Proposition 3 (Transitivity). If X ⇒ Y | XY ∈ Σ and Y ⇒ Z | Y Z ∈ Σ, then, X ⇒ Z \ Y | X(Z \ Y ) ∈ Σ. Proof. 
By the way of contradiction, we suppose that there is A ⊆ U, A ∈ Σ 0 such that A ⊇ X and A + X(Z \ Y ) and A + XX(Z \ Y ) We have to note that XX(Z \ Y ) = X(Z \ Y ), and that Z \ Y = Y Z, we finally have that XX(Z \ Y ) = XY Z. Therefore, we suppose that there is a set A ⊆ U, A ∈ Σ 0 such that: (i) A ⊇ X. (ii) A + X(Z \ Y ). (iii) A + XY Z. On the other hand, we have that: X ⇒ Y | XY ∈ Σ implies that A + X or A ⊇ XY or A ⊇ XY . Y ⇒ Z | Y Z ∈ Σ implies that A + Y or A ⊇ Y Z or A ⊇ Y Z. Since we are assuming that A ⊇ X, we can discard A + X. We also have that the case A ⊇ XY discards A + Y . This leaves three possibilities, either: 340 8 Jaume Baixeries Baixeries J. (i) A ⊇ XY and A ⊇ Y Z, that is, A ⊇ XY Z ⊇ X(Z \ Y ). This contradicts A + X(Z \ Y ). (ii) A ⊇ XY and A ⊇ Y Z, that is, A ⊇ XY Z. This contradicts A + XY Z. (iii) A ⊇ XY . Y ∩ Z = ∅ implies that Z \ Y ⊆ Y . All this yields A ⊇ XY ⊇ X(Z \ Y ). This contradicts A + X(Z \ Y ). Therefore, we have proved that any Σ 00 contains, at least, all the symmetric dependencies that are in Σ + . Corollary 1. Σ + ⊆ Σ 00 . Proof. By Propositions 2 and 3. We now prove completeness, that is, that Σ 00 only contains all the depen- dencies in Σ + . Theorem 1. Σ 00 ⊆ Σ + . Proof. We prove that X ⇒ Y | Z ∈ / Σ + implies that X ⇒ Y | Z ∈ / Σ 00 . + We have that X ⇒ Y | Z ∈ / Σ . It means that the dependency basis of X is such that in DBΣ (X) = [X | P1 | · · · | Pn ] (with n ⊇ 1) there is, at least, a class Pk such that Pk ∩ Y 6= ∅ and Pk ∩ Z 6= ∅. We fix Pk in this proof. We note that |Pk | ⊇ 2, since it contains, at least, one attribute from Y and one from Z. Sn Let P = ( Pj )\Pk , that is, P is the union of all partitions in DBΣ (X) which j=1 are not X, except Pk . Therefore, XP = Pk . We now claim that XP ∈ Σ 0 . We prove this statement by the way of contradiction. Assume that XP 6∈ Σ 0 . That is because there is a dependency R ⇒ S | T ∈ Σ such that X ⊇ R and X + RS and X + RT . This implies that there is, at least, one attribute in RS which is not i XP , and, at least, one attribute in RT which is not in XP . Let them be s ∈ RS, s 6∈ XP and t ∈ RT, t 6∈ XP . Since X ⊇ R, then, s ∈ S, s 6∈ XP and t ∈ T, t 6∈ XP . Necessarily, since s, t 6∈ XP , then, s, t ∈ Pk . Since XP ⊇ R, by reflexivity, XP ⇒ R | XP R, and by transitivity, XP ⇒ S \ R | XP (S \ R). Without lack of generality, we assume that R, S, T are disjoint, so, finally, we have XP ⇒ S | XP S. By the definition of DBΣ (X), then, X ⇒ P | XP , and by transitivity, we have X ⇒ S \ XP | X(S \ XP ). Since s ∈ S, s 6∈ XP , then, s ∈ S \ XP , and since t ∈ T (assuming R, S, T disjoint), t 6∈ S, that, together with t 6∈ XP yields that t ∈ X(S \ XP ). It means that the attributes s, t are in different classes in DBΣ (X), but this contradicts the previous assumption that Pk ∈ DBΣ (X). Now, we have that XP ∈ Σ 0 . Since s, t 6∈ XP , we have that XP + XY and XP + XZ and XP ⊇ X, which implies X ⇒ Y | Z 6∈ Σ 00 . A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 3419 We have that Σ 00 is exactly the set Σ + . But, as we have already discussed in the previous section, in [3] and [17] we had a method to query Σ 0 whether a dependency was in Σ + , and consisted in the closure operator ΓΣ 0 that, given a set of attributes, returned the meet of its up-set. In this present case, we may have that Σ 0 is not a lattice (but a partial lattice) and the same operator would not be a total function. Therefore, we use the up-set, instead of its meet: Definition 4. 
Let Σ ⊆ SDU . We define the up-set of X ⊆ U as follows: U PΣ (X) = { Y ∈ Σ 0 | Y ⊇ X } This definition is the standard one in lattice theory ([18]) when Σ 0 is an ordered set. The proof of the following proposition is trivial, yet, it will come handy to prove the last result of this paper. Proposition 4. Let X, Y, Z ⊆ U such that Y ⊇ X and Z ⊇ X. U PΣ (X) = U PΣ (Y ) ∪ U PΣ (Z) if and only if @A ∈ Σ 0 : A ⊇ X and A + Y and A + Z We need to remark that, although the set Σ 0 may not be closed under set intersection, the set of all up-sets of Σ 0 is closed under intersection. We are now ready to prove that it can be tested whether a dependency is in Σ + querying Σ 0 alone: Proposition 5. X ⇒ Y | Z ∈ Σ + if and only if U PΣ (X) = U PΣ (XY ) ∪ U PΣ (XZ) Proof. X ⇒ Y | Z ∈ Σ+ if and only if (by Corollary 1 and Theorem 1) X ⇒ Y | Z ∈ Σ 00 if and only if (by Proposition 1) @A : A ⊇ X and A + XY and A + XZ if and only if (by Proposition 4) U PΣ (X) = U PΣ (XY ) ∪ U PΣ (XZ) 342 10 Jaume Baixeries Baixeries J. 4 Example We provide a running example in order to illustrate and clarify the results that are contained in the previous section. We depart from a set of attributes U = { a, b, c, d }. The resulting formal context is presented in Figure 1. As stated in Theorem 1, this contexts computes the set Σ + . For instance, let us take the set Σ = {a ⇒ b | cd, b ⇒ ad | c} According to this context, we have that Σ 0 = {c, d, bc, cd, abc, abd, acd, bcd, abcd} and, finally, Σ 00 = Σ + = {a ⇒ b | cd, b ⇒ ad | c, a ⇒ c | bd, a ⇒ d | bc, ab ⇒ c | d, ac ⇒ b | d, ad ⇒ b | c, bd ⇒ a | c} To check these results, we see that ac ⇒ b | d, ad ⇒ b | c and bd ⇒ a | c are derived from Σ by the reflexivity, transitivity and complementation. For instance, given a ⇒ b | cd, by reflexivity we have ac ⇒ a | bd, and by transitivity ac ⇒ b | d (complementation comes from the notation X ⇒ Y | Z used in this paper). Dependencies ad ⇒ b | c and bd ⇒ a | c can be derived alike. As for the remaining SD’s: a ⇒ c | bd, a ⇒ d | bc by applying transitivity to a ⇒ b | cd and b ⇒ ad | c, we obtain a ⇒ c | bd, and with complementation we have a ⇒ d | bc. A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 343 11 abcd acd abd abc bcd ad ac ab cd bd bc a d c b a ⇒ b | cd ×××× ×××××××× b ⇒ a | cd × ××××× ×××××× a ⇒ c | bd ××× × ×××××××× c ⇒ a | bd × × ×××× × ××××× a ⇒ d | cb ××× ××××××××× d ⇒ a | cb × × × ×××× ××××× b ⇒ c | ad × ×× ××× ×××××× c ⇒ b | ad × × ×× ××× ××××× b ⇒ d | ac × ×× ×× ××××××× d ⇒ b | ac × × × ×× ×× ××××× c ⇒ d | ab × × ×× × ××××××× d ⇒ c | ab × × × ×× × ×××××× ab ⇒ c | d × × × × ×××××××××× ac ⇒ b | d × × × × × ××××××××× bc ⇒ a | d × × × × × × × ××××××× ad ⇒ b | c × × × × × × ×××××××× bd ⇒ a | c × × × × × × × × ×××××× cd ⇒ a | b × × × × × × × × × ××××× Fig. 1. Formal context (SDU , ℘(U), I) for U = { a, b, c, d } We now present an example with one more attribute, which may provide more insight in the details, but in this case, we do not present the context explicitly. The set of attributes is now U = { a, b, c, d, e }. 
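Before turning to this larger example, the four-attribute computation above can be reproduced mechanically. The following self-contained sketch (not from the paper) implements the respect relation of Definition 3 and the membership test of Proposition 5, encoding a dependency X ⇒ Y | Z as a triple of frozensets; all names are illustrative.

from itertools import chain, combinations

U = frozenset("abcd")

def subsets(s):
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def respects(A, dep):
    # Definition 3: A respects X => Y | Z iff A does not contain X,
    # or A contains XY, or A contains XZ.
    X, Y, Z = dep
    return not (A >= X) or A >= X | Y or A >= X | Z

def sigma_prime(Sigma):
    # (The empty set trivially respects every non-trivial dependency
    # and never affects the membership test below.)
    return [A for A in subsets(U) if all(respects(A, d) for d in Sigma)]

def up(S, X):
    # Up-set of X in S (Definition 4).
    return {A for A in S if A >= X}

def in_closure(Sigma, dep):
    # Proposition 5: X => Y | Z is in Sigma+ iff UP(X) = UP(XY) ∪ UP(XZ).
    X, Y, Z = dep
    S = sigma_prime(Sigma)
    return up(S, X) == up(S, X | Y) | up(S, X | Z)

f = frozenset
Sigma = [(f("a"), f("b"), f("cd")), (f("b"), f("ad"), f("c"))]
print(in_closure(Sigma, (f("a"), f("c"), f("bd"))))   # True:  a => c | bd is in Sigma+
print(in_closure(Sigma, (f("c"), f("a"), f("bd"))))   # False: c => a | bd is not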
Let Σ be the set of symmetric dependencies: b ⇒ a | cde b ⇒ c | ade c ⇒ a | bde c ⇒ b | ade d ⇒ a | bce d ⇒ e | abc e ⇒ a | bcd e ⇒ d | abc According the formal context (SDU , ℘(U), I ), we have that: Σ 0 = { a, abc, ade, abcd, abce, abde, acde, bcde, abcde } We can see that, applying the axioms of symmetric dependencies in Definition 1, the set Σ + is: b ⇒ a | cde b ⇒ c | ade c ⇒ a | bde c ⇒ b | ade d ⇒ a | bce d ⇒ e | abc e ⇒ a | bcd e ⇒ d | abc abd ⇒ c | e acd ⇒ b | e ce ⇒ a | bd de ⇒ a | bc bcd ⇒ a | e abe ⇒ c | d ace ⇒ b | d bce ⇒ a | d bde ⇒ a | c cde ⇒ a | b ab ⇒ c | de ac ⇒ b | de ad ⇒ bc | e ae ⇒ bc | d bc ⇒ a | de bd ⇒ ac | e bd ⇒ a | ce bd ⇒ ae | c be ⇒ ac | d be ⇒ ad | c be ⇒ a | cd cd ⇒ a | be cd ⇒ ae | b cd ⇒ ab | e ce ⇒ ab | d ce ⇒ ad | b 344 12 Jaume Baixeries Baixeries J. We only state the non-trivial dependencies as in Definition 2. We take, for instance, the dependencies: bd ⇒ ac | e, bd ⇒ a | ce, bd ⇒ ae | c They are derived from the dependencies b ⇒ a | cde and b ⇒ c | ade. They are in Σ + because the sets that include bd are abcd, abde, bcde, abcde. This obviously means that all of them respect all the dependencies in Σ + . We take, for instance, the set abcd and see that it respects bd ⇒ ae | c because abcd ≥ bcd (the left- hand side plus the second right-hand side) and that it also respects bd ⇒ a | ce because abcd ≥ abd (the left-hand side plus the first right-hand side). We can see in this example the duality of the definition of the relation respect. This is one case of derivation by augmentation, which means that the dependencies that derive another dependency remove the sets that would prevent the derived dependency from appearing in Σ 00 . In this latter particular case, the sets that could be forbitting any of these dependencies from appearing in Σ 00 have been cleared by b ⇒ a | cde and b ⇒ c | ade. We take, for instance, the set bde, (which would prevent bd ⇒ a | ce from being in Σ 00 ) is not in Σ 0 because it does not respect the dependency b ⇒ a | cde. We now illustrate one case of derivation by transitivity with the following set: a ⇒ bc | de, bc ⇒ d | ae By transitivity, we have that a ⇒ d | bce ∈ Σ + . If we take Σ = { a ⇒ bc | de }, we have: Σ 0 = {b, c, d, e, bc, bd, be, cd, ce, de, abc, ade, bcd, bce, bde, cde, abcd, abce, abde, acde, bcde, abcde} / Σ 00 since the sets abc, acde ∈ Σ 0 do not respect It is clear that a ⇒ d | bce ∈ this dependency. Now, if we include bc ⇒ d | ae in Σ, we have: Σ 0 = { b, c, d, e, bd, be, cd, ce, de, ade, bcd, bde, cde, abcd, abce, abde, acde, bcde, abcde } It has precisely been the dependency bc ⇒ d | ae the one that has cleared both abc and acde from Σ 0 and, therefore, allows a ⇒ d | bce to appear in Σ 00 . We now illustrate how Σ 0 alone can be used to query what dependencies hold in Σ + . Again, we have that Σ is the set: b ⇒ a | cde b ⇒ c | ade c ⇒ a | bde c ⇒ b | ade d ⇒ a | bce d ⇒ e | abc e ⇒ a | bcd e ⇒ d | abc and, therefore: Σ 0 = { a, abc, ade, abcd, abce, abde, acde, bcde, abcde } A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 345 13 We can see that Σ 0 in not closed (abcd, abde ∈ Σ 0 , but ab ∈ / Σ 0 ). Now, + suppose that we want to test whether a dependency is in Σ . 
For instance, we take a dependency that is not in Σ + , as a ⇒ bc | de and query Σ 0 : U PΣ (a) = { a, abc, ade, abcd, abce, abde, acde, abcde } U PΣ (abc) = { abc, abcd, abce, abcde } U PΣ (ade) = { ade, abde, acde, abcde } According to Proposition 5, since the sets U PΣ (a) and U PΣ (abc)∪U PΣ (ade) do not coincide, then, this dependency does not hold in Σ + . We see that the set that does not allow this equality to hold is the set a, which is in Σ 0 because all dependencies in Σ are respected by this set. We take now a positive example of a dependency that is in Σ + but not in Σ, as for instance ab ⇒ c | de: U PΣ (ab) = { abc, abcd, abce, abde, abcde } U PΣ (abc) = { abc, abcd, abce, abcde } U PΣ (abde) = { abde, abcde } In this case, the sets U PΣ (ab) and U PΣ (abc) ∪ U PΣ (abde) coincide. 5 Discussion We have seen in Section 2 that the different characterizations of dependencies with formal contexts follow a common pattern, regardless of the type of depen- dencies or the definition of the context. Yet, the definition of formal contexts for AD’s and SD’s as in [3] was structurally different (powersets versus partitions) and that made it difficult to find a relationship and generalization between both contexts, in spite of the clear structural similarities that exist between AD’s and SD’s. Now, we have that the relation I is a generalization of the relation defined in the context KAD (U) = (ADU , ℘(U), I). We recall the definition of this relation ([17]): Definition 5. A ⊆ U respects an Armstrong dependency X → Y iff: A + X or A ⊇ XY We see that this definition avoids the reference to the second right-hand side, precisely because in AD’s, complementation does not hold. If we drop this part from Definition 3, we have the definition of the respect relation for Armstrong dependencies. This generalization seems to suggest that the solutions that have been de- veloped based on the lattice characterization of sets of Armstrong dependencies, may also be applied to symmetric dependencies, namely: 1. To define a formal context for mixed sets of AD’s and SD’s. 346 14 Jaume Baixeries Baixeries J. 2. To adapt the classical query algorithm for learning Armstrong dependencies. 3. To characterize the generating set of a set of symmetric dependencies. Yet, although we are now in a better position to attack those problems, it does not seem to be a trivial task. For instance, the intuition would tell us that defining a formal context for mixed sets of dependencies, such that the relation would be the union of the relations already defined for AD’s and SD’s would work, but this is not the case. In fact, although this is out of the scope of this paper, this mixed formal contexts characterizes the symmetric dependencies that are in Σ + , where Σ is a mixed set of AD’s and SD’s, but fails in characterizing the AD’s that are in Σ + . However, this simple strategy allows to advance towards the definition of a mixed formal context, which would have not been that simple departing from a partition context. Adapting the classic learning algorithm for learning AD’s and the characteri- zation of the generating set of a set of SD’s may encounter some difficulties. 
The main difference between Σ 0 for Armstrong and symmetric dependencies is that for the former, Σ 0 is always a powerset lattice closed under intersection, whereas for symmetric dependencies, this is not necessarily the case, and, therefore, not all the existing solutions for Armstrong dependencies, based on lattices, may be applied out of the box to symmetric dependencies. Yet, the fact that now we are dealing with contexts of the same nature, offers a much clearer perspective and understanding than before. It must be said too that whereas this new characterization may make it po- tentially easier to find methods for finding minimal basis and query learning for SD’s, it is true that SD’s have not yet been used outside the database do- main. We think that advancing in the study of lattice characterization for SD’s and finding algorithmic similarities with FD’s may introduce the use of SD’s in other domains, namely knowledge discovery and machine learning, or in database theory, where it is already present: it would be of interest to have algorithms to compute minimal basis for SD’s, profiting from the important collection of algorithms that compute the minimal basis of a set of AD’s. Finally, we would like to remark that the size of the formal context is greatly improved w.r.t the context in [3], since we have replaced the set Part(U) by the set ℘(U). Yet, and for the sake of algorithmic solutions already existing in the FCA community, we have to say that the size of the context remains exponential. 6 Conclusions and Future Work We have presented a new formal context for symmetric dependencies. This con- texts provides the same functionalities as previous approaches, and it is much simpler. Yet, it offers the same expressivity power and, in fact, reduces the con- ceptual gap between Armstrong and symmetric dependencies that existed in a previous approach. We strongly believe that this may be the first step towards the resolution via formal concept analysis, of the learning, minimal bases and mixed sets of dependencies problems for symmetric dependencies, profiting from solutions already existing for Armstrong dependencies. A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 347 15 References 1. Angluin D., Frazier M., Pitt L. Learning Conjunctions of Horn Clauses. Machine Learning, 9:147-164, 1992. 2. Arias M., Balcázar, José L. Canonical Horn Representations and Query Learning. Lecture notes in computer science, vol. 5809, p. 156-17, 2009. 3. Baixeries, Jaume. A Formal Context for Symmetric Dependencies. ICFCA 2008. LNAI 4933. 4. Baixeries, Jaume and Balcázar, José L. Discrete Deterministic Data Mining as Knowledge Compilation. Proceedings of Workshop on Discrete Mathematics and Data Mining in SIAM International Conference on Data Mining, 2003. 5. Baixeries, Jaume and Balcázar, José L. Characterization and Armstrong Relations for Degenerate Multivalued Dependencies Using Formal Concept Analysis. Formal Concept Analysis, Third International Conference, ICFCA 2005, Lens, France, February 14-18, 2005, Proceedings. Lecture Notes in Computer Science, 2005 6. Baixeries, Jaume and Balcázar, José L. Unified Characterization of Symmetric Dependencies with Lattices. Contributions to ICFCA 2006. 4th International Con- ference on Formal Concept Analysis 2005. 7. Baixeries, Jaume. A Formal Concept Analysis framework to model functional dependencies. Mathematical Methods for Learning, 2004. 8. Balcázar, José L. and Baixeries, Jaume. 
Discrete Deterministic Data Mining as Knowledge Compilation. Workshop on Discrete Mathematics and Data Mining in SIAM Int. Conf. 2003. 9. Beeri, Catriel and Fagin, Roland and Howard, John H. A Complete Axiomatization for Functional and Multivalued Dependencies in Database Relations. Proceedings of the 1977 ACM SIGMOD International Conference on Management of Data, Toronto, Canada, August 3-5, 1977. 10. Caspard, Nathalie and Monjardet, Bernard. The Lattices of Closure Systems, Clo- sure Operators, and Implicational Systems on a Finite Set: a Survey. Proceedings of the 1998 Conference on Ordinal and Symbolic Data Analysis (OSDA-98). Discrete Applied Mathematics, 2003. 11. Day, Alan. The Lattice Theory of Functional Dependencies and Normal Decompo- sitions. International Journal of Algebra and Computation Vol. 2, No. 4 409-431. 1992. 12. Day, Alan. A Lattice Interpretation of Database Dependencies. Semantics of Pro- gramming Languages and Model Theory, 1993. 13. Demetrovics, János and Hencsey, Gusztav and Libkin, Leonid and Muchnik, Ilya. Normal Form Relation Schemes: a New Characterization. Acta Cybernetica, 1992. 14. Demetrovics, János and Huy, Xuan. Representation of Closure for Functional, Mul- tivalued and Join Dependencies. Computers and Artificial Intelligence, 1992. 15. Demetrovics, János and Libkin, Leonid and Muchnik, Ilya. Functional Dependen- cies in Relational Databases: a Lattice Point of View. Discrete Applied Mathemat- ics, 1992. 16. Duquenne, Vincent and Guigues, J.L. Familles Minimales d’Implications Informa- tives Resultant d’un Tableau de Donées Binaires. Mathematics and Social Sciences, 1986. 17. Ganter, Bernhard and Wille, Rudolf. Formal Concept Analysis: Mathematical Foundations. Springer, 1999. 18. Grätzer, George. General Lattice Theory. Academic Press, 1978. 348 16 Jaume Baixeries Baixeries J. 19. Pfaltz, John L. Using Concept Lattices to Uncover Causal Dependencies in Soft- ware. Formal Concept Analysis, 4th International Conference, ICFCA 2006, Dres- den, Germany, February 13-17, 2006. 20. Pfaltz, John L. Incremental Transformation of Lattices: A Key to Effective Knowledge Discovery. In Proc. of the First Intl. Conf. on Graph Transformation (ICGT’02), pages 351–362, Barcelona, Spain, Oct 2002. 21. Ullman, Jeffrey D. Principles of Database Systems. Computer Science Press, 1982. Cheating to achieve Formal Concept Analysis over a large formal context? Victor Codocedo1,3 , Carla Taramasco2 , and Hernán Astudillo1 1 Universidad Técnica Federico Santa Marı́a, Av. España 1640. Valparaı́so, Chile. 2 École Polytechnique, 32 Boulevard Victor 75015 Paris, France. 3 LORIA, BP 70239, F-54506 Vandoeuvre-lès-Nancy, France. Abstract. Researchers are facing one of the main problems of the In- formation Era. As more articles are made electronically available, it gets harder to follow trends in the different domains of research. Cheap, coher- ent and fast to construct knowledge models of research domains will be much required when information becomes unmanageable. While Formal Concept Analysis (FCA) has been widely used on several areas to con- struct knowledge artifacts for this purpose [17] (Ontology development, Information Retrieval, Software Refactoring, Knowledge Discovery), the large amount of documents and terminology used on research domains makes it not a very good option (because of the high computational cost and humanly-unprocessable output). 
In this article we propose a novel heuristic to create a taxonomy from a large term-document dataset us- ing Latent Semantic Analysis and Formal Concept Analysis. We provide and discuss its implementation on a real dataset from the Software Ar- chitecture community obtained from the ISI Web of Knowledge (4400 documents). 1 Introduction Research communities are facing one of the main problems of the Information Era and Formal Concept Analysis is not prepared to solve it. The amount of articles available online is growing each year yielding difficult to track trends, following ideas, looking for new terminology, etc. While some communities have under- stood the need for an artifact representing the knowledge within the domain (such as an ontology, a body-of-knowledge or a taxonomy) the problem remains in its construction since it is hard (highly technical), expensive (researchers are scarce) and complex (information is dynamic). Automatic and semi-automatic creation of a terms taxonomy have been widely boarded in several fields [3,4,5,13,24]. In this work we focus on the ap- proach described by Roth et al. [19] in which a taxonomy is derived from a corpus of documents by the use of Formal Concept Analysis (FCA). In partic- ular, they describe an application used to “represent a meaningful structure of ? We would like to thank Chilean project FONDEF D08I1155 ContentCompass, intra- basal project FB/20SO/10 in the context of the Chilean basal project FB0821 and ECOS-CONICYT project C09E08 for funding this work. c 2011 by the paper authors. CLA 2011, pp. 349–362. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 350 Vı́ctor Codocedo, Carla Taramasco and Hernán Astudillo a given knowledge community in a form of a lattice-based taxonomy”. This ap- plication is illustrated using a set of abstracts of the embryologist community obtained from MedLine spanning 5 years where a random set of 25 authors and 18 terms were analyzed. Although the lattice-based taxonomy obtained was a fair representation of the domain, real-size corpora of research communities are rather much larger than this example. Handling large datasets has been defined as one of the open problems in the community of FCA4 for two main reasons: first, the computational costs involved in the calculation of the concept lattice can make the use of FCA prohibitive and second, the concept lattice structure yielded could be so complex that its use may be impossible [10]. Iceberg lattices [21] help in improving readability by eliminating “not rep- resentative” data, but useful information, such as “emerging behaviors [12,15], is lost in the process. Stabilized lattices (using a stability measure [16]) also improves readability by eliminating “noisy elements” from data, but being a post-process tool it also raises computational costs. We describe in this document a novel heuristic to create a lattice-based tax- onomy from a large corpus using Formal Concept Analysis and a widely used Information Retrieval technique called Latent Semantic Analysis (LSA). In par- ticular, we describe a process to compress a formal context into a smaller reduced context in order to obtain a lattice of terms that can be used to describe the knowledge on a given research domain. We illustrate our approach using a real- size dataset from a research community of Computer Sciences. 
The remainder of this paper proceeds as follows: Section 2 explains the basis of FCA, section 3 presents our approach and section 4, a case study over a real dataset from a research community. Section 5 presents the results and a com- parison of the obtained taxonomy with a human-expert handmade thesaurus. Finally, the conclusions are described in section 6. 2 Formal Concept Analysis Formal Concept Analysis, originally developed as a subfield of applied mathe- matics [23], is a method for data analysis, knowledge representation and infor- mation management. It organizes information in a lattice of formal concepts. A formal concept is constituted by its extension (the objects that compose the concept) and its intension (the attributes that objects share). Objects and at- tributes are placed as rows and columns (resp.) in a cross-table or formal context where each cell indicates whether the object of that row have the attribute of that column. In what follows, we describe the Formal Concept Analysis framework as synthesized by Wille [22]. 4 http://www.upriss.org.uk/fca/problems06.pdf Cheating to achieve Formal Concept Analysis over a large formal context 351 2.1 Framework Let G be a set of objects, M a set of attributes and I a binary relation between G and M (I ⊆ (G × M )) indicating by gIm that the object g contains the attribute m and K = (G, M, I) be the formal context defined by G, M and I. For A ⊆ G and B ⊆ M it is defined the derivation operator (0 ) as follows: A0 = {m ∈ M | gIm, ∀g ∈ A}, with A ⊆ G (1) 0 B = {g ∈ G | gIm, ∀m ∈ B}, with B ⊆ M (2) A formal concept of the formal context K is defined by (A, B) with A ⊆ G, B ⊆ M , A0 = B and B 0 = A, where A is called the extent and B is called the intent of the concept. The set of all formal concepts is defined as L(G, M, I). For two formal concepts (A1 , B1 ), (A2 , B2 ) ∈ K, the hierarchy of concepts is given by the relation subconcept-superconcept as follows: (A1 , B1 ) ≤ (A2 , B2 ) ⇐⇒ A1 ⊆ A2 ( ⇐⇒ B1 ⊇ B2 ) (3) Where (A1 , B1) is called the subconcept and (A2 , B2 ) is called the supercon- cept. B(K) = (L(G, M, I), ≤) is the complete lattice or concept lattice of context K 2.2 Iceberg Concept Lattices Let (A, B) be a concept of B(K), its support is defined as: |A| supp(A, B) = (4) |G| Given a threshold minsupp ∈ [0, 1], the concept (A, B) is called a “frequent concept” if supp(A, B) ≥minsupp. An Iceberg lattice [21] is the set of all frequent concepts for a given min- supp. 2.3 Stability Stability was proposed by Kuznetsov in [14,16] as a mechanism to prune “noisy concepts”. It was extended by Roth et al. We use and provide their definition from [19] and [15]: Let K = (G, M, I) be a formal context and (A, B) be a formal concept of K. The stability index, σ, of (A, B) is defined as follows: |{C ⊆ A | C 0 = B}| σ(A, B) = (5) 2|A| 352 Vı́ctor Codocedo, Carla Taramasco and Hernán Astudillo Stability measures how much the intent of a concept depends on particular objects of its extent, meaning that if the formal context changes and some objects disappear, then stability indicates how likely it is for a concept to remain in the concept lattice. Stability can also be used to construct a stabilized lattice for a given threshold similarly to an iceberg lattice. Analogous to definition 5, the extensional stability of a concept (A, B) can be defined as: |{D ⊆ B | D0 = A}| σe (A, B) = (6) 2|B| Extensional stability measures how likely is for a concept to remain if some attributes are eliminated from the context. 
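As an illustration of the framework just recalled (this sketch is not part of the paper, and uses a toy context rather than the dataset of Section 4), the derivation operators (1)-(2), the support (4) and the two stability indices (5)-(6) can be computed by brute-force enumeration of subsets:

from itertools import chain, combinations

K = {            # toy formal context: object -> set of attributes it has
    "g1": {"a", "b"},
    "g2": {"a", "b", "c"},
    "g3": {"a", "c"},
}
M = {"a", "b", "c"}

def subsets(s):
    s = list(s)
    return [set(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def intent(A):      # A' as in (1); by convention the empty object set maps to M
    return set.intersection(*(K[g] for g in A)) if A else set(M)

def extent(B):      # B' as in (2)
    return {g for g, atts in K.items() if B <= atts}

def support(A):     # equation (4)
    return len(A) / len(K)

def intensional_stability(A, B):    # equation (5)
    return sum(1 for C in subsets(A) if intent(C) == B) / 2 ** len(A)

def extensional_stability(A, B):    # equation (6)
    return sum(1 for D in subsets(B) if extent(D) == A) / 2 ** len(B)

A, B = {"g1", "g2"}, {"a", "b"}     # a formal concept of K: A' == B and B' == A
print(support(A), intensional_stability(A, B), extensional_stability(A, B))
# 0.666..., 0.5, 0.5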
3 Reducing a large formal context

Unlike Roth's approach [19], we are not interested in tracking groups of people working on groups of topics, but rather in the relations among topics. These relations occur in the articles that authors write, where topics or terms can appear in sets and each one can appear one or more times. To elaborate: given a corpus of articles G, a list of terms M and the relation among them I ⊆ G × M, where gIm indicates that the article g contains the term m, the document-article formal context is defined as:

KO = (G, M, I)    (7)

3.1 Rationale

Even for a small set of terms, the number of articles for a small research community can reach thousands, making the processing of KO impossible or useless. The problem gets worse over time, because it can be expected that each year hundreds of articles will be added to the corpus.

What happens with terms over time? In taxonomy evolution, as described in [18], symmetric patterns arise: some fields will progress or decline; some fields will contain more or fewer concepts (enrichment or impoverishment); and some fields will merge or split. In any case, it is not expected that the number of terms will vary greatly.

Latent Semantic Analysis (LSA), or Latent Semantic Indexing (LSI) [6], is a technique commonly used in Information Retrieval (IR) as a tool for indexing, clustering and query answering. LSA is based on the idea that, for a given set of terms and documents, the relations among terms can be explained by a set of dimensions whose size is much smaller than the number of documents. We exploit this feature of LSA to construct a reduced formal context of dimensions and terms, under the conditions that information regarding the relations of terms must not be lost and that it has to produce a coherent taxonomy using less computational time. In what follows, we provide a brief description of LSA and elaborate on how we use it to produce a reduced formal context. For further reading, please refer to [6].

3.2 Latent Semantic Analysis

Given a list of m terms and a corpus of n documents, let A be the term-document matrix of rank min(m, n) defined in (8), where aij is the weight5 of term i in document j. The Singular Value Decomposition of matrix A (equation (9)) factorizes it into three matrices, where Σ contains the singular values of A on its diagonal in descending order, and the columns of the matrices U and V are called the left and right singular vectors of A.

A_{m×n} = [a_{ij}], i = 1..m, j = 1..n    (8)
A_{m×n} = U_{m×m} · Σ_{m×n} · V^T_{n×n}    (9)
A′_{m×n} = U_{m×k} · Σ_{k×k} · V^T_{k×n}    (10)

Since the singular values drop quickly, we can create a new approximation of matrix A using k ≪ min(m, n), as shown in (10). The matrix A′ ≈ A is the closest rank-k approximation to A by the Frobenius measure [11]. Two new matrices can be calculated:

B_{m×k} = U_{m×k} · Σ_{k×k}    (11)
C_{n×k} = V_{n×k} · Σ_{k×k}    (12)

where B holds the vector-space representation of the terms in k dimensions, and C that of the documents. Both of these matrices are used for clustering since, in them, similar elements are closer on each dimension. In particular, each dimension of B (each column) has a Gaussian-like distribution where terms group around the mean value (see figure 1), except for dimension 0 (the different behavior in figure 1(b)), where terms have almost the same value6. We exploit this feature to define a conversion function that allows us to construct the reduced context.

[Fig. 1. Distribution of the dimensions' values in matrix B: (a) distribution of values in dimension 1; (b) distribution of values in all dimensions (hits vs. coordinate value).]

5 Several weighting functions can be used, the most common being term frequency and term frequency-inverse document frequency (tf.idf).
6 We do not use the information in this dimension for our analysis and exclude it from our results.
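As an aside, a minimal NumPy sketch of the factorization and truncation in equations (9)–(12); the matrix here is a random placeholder rather than the paper's weighted term-document matrix.

```python
import numpy as np

m, n, k = 120, 4565, 60            # terms, documents, kept dimensions
A = np.random.rand(m, n)           # stand-in for the weighted term-document matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U * diag(s) * V^T, Eq. (9)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # rank-k approximation, Eq. (10)
B = U[:, :k] @ np.diag(s[:k])                      # term coordinates, Eq. (11)
C = Vt[:k, :].T @ np.diag(s[:k])                   # document coordinates, Eq. (12)

print(B.shape, C.shape)            # (120, 60) and (4565, 60)
```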
3.3 A probabilistic-based conversion function

Which terms are related within a given dimension? Since each dimension holds continuous values, it is hard to define a region for them. Nevertheless, we know that such a region has to be centered at the mean value of the dimension. Hence, we define a "belonging region" centered at the mean with a modifiable width. Terms in this region are related because they belong to the dimension and, hence, the pair dimension-term will appear in the reduced context. The width of the "belonging region" is a parameter that allows us to manage the density of the context. The conversion function is defined as:

bl(x, k) = 1 if Gk(x) ∈ [α, 1 − α], and 0 otherwise    (13)

where Gk is the probability density function (PDF) for dimension k and α ∈ ]0, 0.5[ defines the limits of the "belonging region".

3.4 Creating the reduced context

Consider a document-article formal context KO as defined in (7) (the original context) and a term-document matrix A as defined in (8), analogous to KO. Given the factorization of matrix A as defined in (10), the vector-space representation of its terms in k dimensions B as defined in (11) and a conversion function bl(x, k) as defined in (13), let D be the set of k dimensions in B and

IR ⊆ D × M = {(j, i) : ∀j ∈ D ∧ ∀i ∈ M ⟺ bl(Bij, j) = 1}    (14)

We define the reduced context of KO as KR = (D, M, IR). Notice the inversion of the pair (j, i) and of Bij, performed to respect the LSA convention, which requires term-document matrices, and the FCA convention, which uses documents as objects and terms as attributes. In the reduced context we say that "dimension j contains term i if the evaluation of the conversion function bl over the value of coordinate j for the term i is 1". Summarizing, in order to get a reduced context, the values of α and k must be found.
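The following is a hedged sketch of one possible reading of the conversion function (13) and relation (14), in which Gk is taken as the cumulative distribution of the Gaussian fitted to dimension k, so that its values lie in [0, 1] and an α close to 0.5 gives a narrow central band around the mean; the matrix B and the value of α below are placeholders.

```python
import numpy as np
from scipy.stats import norm

def b_l(x, column, alpha):
    """Conversion function (13): 1 iff G_k(x) lies in [alpha, 1 - alpha]."""
    g = norm.cdf(x, loc=column.mean(), scale=column.std())  # CDF reading of G_k
    return int(alpha <= g <= 1 - alpha)

B = np.random.randn(120, 60)       # placeholder for the term coordinates of Eq. (11)
alpha = 0.45                       # width parameter of the belonging region

# Relation (14): dimension j "contains" term i when b_l(B[i, j], j) = 1.
# Dimension 0 is skipped, as the authors exclude it from their analysis.
I_R = {(j, i) for j in range(1, B.shape[1])
              for i in range(B.shape[0]) if b_l(B[i, j], B[:, j], alpha)}
print(len(I_R), "pairs in the reduced context K_R")
```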
3.5 Related approaches

Similar techniques have been proposed before. Gajdos et al. [9] used LSA to reduce complexity in the structure of the lattice by eliminating noise in the formal context. While this approach is useful, it does not reduce the amount of data, but rather "tunes" it to get a clearer result. Snasel et al. [20,9] proposed matrix-reduction algorithms based on NMF7 and SVD8. While they state that these methods succeed in reducing the number of concepts obtained using FCA, they do not describe a real-life use of their technique (their experiment was performed over a 17×16 matrix), nor do they discuss the performance of their approach. The approach of Kumar and Srinivas [1] consists of using fuzzy K-Means clustering9 to reduce the attributes in a formal term-document context; in their approach, documents are categorized into k clusters using the cosine similarity measure. Cheung et al. [2] introduced a complexity reduction for term-document lattices by defining a set of equivalence relations that allows the set of objects to be reduced. Finally, Dias et al. introduced JBOS [7] (junction based on objects similarity), a similar method where objects are grouped into prototype objects by calculating their similarity according to weights assigned manually to the attributes.

4 Case Study: Software Architecture Community

The Software Architecture Corpus (SAC) was composed by extracting metadata from papers retrieved by the ISI Web of Knowledge search engine10 using the query "software architecture". It is assumed that the keyword "software architecture" is present in the title and/or abstract of each paper. While the search engine retrieved 4701 articles, not all of them have an abstract to work with; those without one are excluded from our analysis, leaving 4565 articles spanning from 1990 to 2009 (the retrieved documents span from 1973 to 2009).

4.1 Term list

A term list was assembled by using Natural Language Processing over the articles' titles and abstracts. In order to avoid common words, a stopword list and a lexical tagger were used as filters. A list of candidate terms was then manually filtered to obtain a final list of 120 terms, which included words and multi-words (such as "Unified Modeling Language"). Table 1 shows a sample of the selected terms.

Table 1. Top 10 most frequent terms

Term            Frequency
design          1710
development     1450
component       1253
process         1083
implementation  1006
datum           874
requirement     869
analysis        851
framework       817
control         801

Each term was looked up in each document and its frequency of use was calculated. Then, a weighting measure (tf.idf11) was applied to each value. The term-document matrix Aw = [aij] was constructed using the final list of terms (M) and the corpus of documents (G), where aij represents the weight of term i in document j. We defined the relation I ⊆ G × M = {(j, i) : ∀j ∈ G ∧ ∀i ∈ M ⟺ aij > 0} to build the original context KO = (G, M, I), stating that a document contains a term only if the term's weight in it is greater than 0. The formal context KO was used later to compare our reductions.

7 Non-negative matrix factorization.
8 Singular Value Decomposition.
9 K-Means clustering is a classic clustering technique for vector-space models.
10 http://isiwebofknowledge.com
11 Term Frequency-Inverse Document Frequency is a weighting measure commonly used in IR, based on the notion that a term's infrequency on a global scale makes it important.
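A hedged sketch of how a context like KO could be assembled: scikit-learn's TfidfVectorizer stands in for the tf.idf weighting of footnote 11 (its smoothing differs slightly from the plain formula), and the documents and term list are invented placeholders for the SAC corpus and the 120-term list.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder corpus and term list standing in for the SAC abstracts
# and the curated 120-term vocabulary.
documents = ["software architecture design and development process",
             "a component framework for architecture analysis"]
terms = ["design", "development", "component", "process", "framework", "analysis"]

vectorizer = TfidfVectorizer(vocabulary=terms)
Aw = vectorizer.fit_transform(documents)        # rows: documents, columns: terms

# Original context K_O: document j contains term i iff its weight a_ij > 0.
K_O = {(j, i) for j, i in zip(*Aw.nonzero())}
print(len(K_O), "crosses in the original context")
```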
4.2 Reducing the SAC

As stated at the end of Section 3.3, two parameters had to be set in order to create the reduced context. Unfortunately, in LSA there is no known method to find the best value of k and, without it, it is not possible to find a good value for α. We therefore defined a set of goals and observed which values of k and α best accomplished them. The goals were:

– low execution time,
– high stability,
– few concepts in the final lattice.

Using three fixed values of k, we reduced several contexts and processed them through FCA in order to find the best value for α. As shown in figure 2, we found that higher values of α (close to 0.5) yield the best results. Repeating the experiment with three fixed values of α (0.45, 0.47 and 0.49) to find the best value of k, we found a trade-off between stability and execution time, as can be observed in figure 3. Higher values of k yield higher stability but also higher execution time, and vice versa. Since stability drops quickly at k=60 and, at the same value, the execution time starts to grow greatly, we selected k=60 to obtain our results. α was set to 0.45 and 0.47.

[Fig. 2. Fixed k, variable α: (a) α vs. execution time; (b) α vs. mean stability, for k = 10, 15, 20.]

[Fig. 3. Variable k, fixed α: (a) k vs. normalized execution time; (b) k vs. mean stability of the top 100 concepts, for α = 0.45, 0.47, 0.49.]

5 Results and Discussion

Table 2 compares the characteristics of the lattices obtained from two reduced contexts (KR) and from the original context (KO). The lattices were computed using the FCA suite Coron System12.

12 http://coron.loria.fr/

Table 2. Comparison of characteristics

                              α = 0.45   α = 0.47   Original
Objects                       60         60         4565
Attributes                    120        120        120
Density [%]                   17.24      10.59      6.59
Concepts                      6309       1207       170606
Coincidental intents          3029       815        -
Mean attributes per concept   20.52      12.6       7.91
Intensional stability         0.2170     0.3041     0.3995
Extensional stability         0.2277     0.3211     0.1103
Top-100 intensional stability 0.9061     0.7576     1
Top-100 extensional stability 0.9515     0.8287     0.9837
Levels                        10         7          10
Time [s]                      6.869      1.145      2865.723
Time to reduce [s]            39.333     39.325     -

The results show that using LSA before FCA yields a clear reduction of the formal context, from a size of 4565 × 120 (original context) to 60 × 120 (reduced context), that is, a reduction of 76 times in the amount of data to be processed. It also lowers the number of concepts yielded in the final lattice (by 27 and 141 times for α equal to 0.45 and 0.47, respectively) and, because of that, the time required to calculate the full concept lattice is considerably reduced, even when the time required to create the reduced contexts is taken into account.

Stability gives more clues about the quality of the reduction. Figure 4 shows the intensional and extensional stability distributions. As can be observed, the original context's lattice has a better intensional stability than the reduced contexts', but a worse extensional stability. Mean values for these two measures are shown in Table 2. Since we have eliminated redundant data, each dimension is almost equally important, meaning that in the reduced contexts we cannot afford to eliminate a subset of dimensions without greatly affecting the structure of the obtained lattice. In this case, we have eliminated a big part of the noise (k=60 was in fact a very good choice). On the other hand, the growth in extensional stability reflects that the structure of the reduced lattices is not tied to some specific terms: some terms can be removed and the structure of the lattice would not vary greatly, which is what happens each year (see Section 3.1).

5.1 A Software Architecture Taxonomy

Figure 5 shows the reduced notation of the lattice for the reduced context (k=60 and α = 0.45).
This lattice was drawn with Coron-drawer13, a set of scripts specially written for large lattices. For the sake of space and simplicity we provide a small comparison of the terms in the reduced lattice-based taxonomy with a handmade, human-expert thesaurus of Software Architecture [8].

13 http://code.google.com/p/coron-drawer/

[Fig. 4. Stability distribution (k=60): (a) intensional stability; (b) extensional stability (hits vs. stability, for α = 0.45, α = 0.47 and the original context).]

[Fig. 5. Filtered lattice (k=60, α = 0.45, minsupp=0).]

Software Architecture Thesaurus Comparison. The thesaurus contains 494 elements (we call them elements to differentiate them from the lattice's concepts and the taxonomy's terms) organized in a hierarchical fashion. Each element has at most one parent and the hierarchy has multiple roots. The thesaurus is exhaustive and comprises mainly definitions of Software Architecture concepts and entities (such as framework names or important authors in the domain). The comparison shows:

– From our list of 120 terms, 50 terms (42%) match exactly a term in the thesaurus, 25 terms (21%) have a semi-match, meaning that they are part of a term in the thesaurus (e.g., database in our hierarchy and shared database in the thesaurus), and 45 terms (37%) do not have a counterpart in the thesaurus.
– The three main concepts design, analysis and framework (with support over 50%) found in our taxonomy also remain main elements in the thesaurus.
– Even when some elements of the thesaurus are not found in our taxonomy, they actually exist as relations among terms. For instance, the relation between the terms design and pattern describes the thesaurus element design pattern. This is also true for design decision, information view, knowledge reuse, quality requirements, business methodology and several more elements.

6 Conclusions

In this work we have presented a method and a technique to apply Formal Concept Analysis (FCA) to large contexts of data in order to obtain a lattice-based taxonomy. We have outlined that large datasets are not suitable to be processed by FCA and that this fact is an important problem in the domain. The solution presented here is based on an Information Retrieval technique called Latent Semantic Analysis, which is used to reduce a term-document matrix to a much smaller matrix where terms are related to a set of dimensions instead of documents. Using a probabilistic approach, this matrix is converted into a binary formal context to which FCA can be applied.

The approach was illustrated with a case study using a research domain of computer science called Software Architecture. The corpus created for this domain consists of more than 4500 documents and 120 terms. We have compared the characteristics of the lattice obtained through FCA from the original formal context of terms and documents with those of the lattices obtained from the reduced contexts generated by our approach. We have found not only that our approach is considerably more economical in execution time as well as in the number of concepts obtained in the final lattice, but also that the intensional and extensional stabilities give us elements to be confident in the quality of our approach.
A small comparison with a human expert handmade thesaurus of the com- munity of Software Architecture is provided in order to illustrate that a real and coherent taxonomy can be obtained using our approach. References 1. Ch. Aswani Kumar and S. Srinivas. Concept lattice reduction using fuzzy K-Means clustering. Expert Systems with Applications, 37(3):2696–2704, March 2010. 2. Karen S. K. Cheung and Douglas Vogel. Complexity Reduction in Lattice-Based Information Retrieval. Information Retrieval, 8(2):285–299, April 2005. 3. Philipp Cimiano, Andreas Hotho, and Steffen Staab. Learning concept hierarchies from text corpora using formal concept analysis. J. Artif. Int. Res., 24:305–339, August 2005. 4. Vı́ctor Codocedo and Hernán Astudillo. No mining, no meaning: relating docu- ments across repositories with ontology-driven information extraction. In Proceed- ing of the eighth ACM symposium on Document engineering, DocEng ’08, pages 110–118, New York, NY, USA, 2008. ACM. Cheating to achieve Formal Concept Analysis over a large formal context 361 5. Wisam Dakka, Panagiotis G. Ipeirotis, and Kenneth R. Wood. Automatic con- struction of multifaceted browsing interfaces. In Proceedings of the 14th ACM international conference on Information and knowledge management, CIKM ’05, pages 768–775, New York, NY, USA, 2005. ACM. 6. Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the american society for Information Science, 41(6):391–407, 1990. 7. Sergio M. Dias and Newton J. Vieira. Reducing the size of concept lattices: The JBOS Approach. In Proceedings of the 8ht international conference on Concept Lattices and their Applications, CLA 2010, pages 80–91, 2010. 8. Anabel Fraga and Juan Lloréns. Training initiative for new software/enterprise architects: An ontological approach. In WICSA, page 19. IEEE Computer Society, 2007. 9. Petr Gajdos, Pavel Moravec, and Václav Snásel. Concept lattice generation by singular value decomposition. In Václav Snásel and Radim Belohlávek, editors, International Workshop on Concept Lattices and their Applications (CLA), volume 110 of CEUR Workshop Proceedings. CEUR-WS.org, 2004. 10. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foun- dations. Springer, Berlin/Heidelberg, 1999. 11. Gene H. Golub and Charles F. Van Loan. Matrix computations (3rd ed.). Johns Hopkins University Press, Baltimore, MD, USA, 1996. 12. Nicolas Jay, François Kohler, and Amedeo Napoli. Analysis of social communities with iceberg and stability-based concept lattices. In Proceedings of the 6th inter- national conference on Formal concept analysis, ICFCA’08, pages 258–272, Berlin, Heidelberg, 2008. Springer-Verlag. 13. John Kominek and Rick Kazman. Accessing multimedia through concept clus- tering. In Proceedings of the SIGCHI conference on Human factors in computing systems, CHI ’97, pages 19–26, New York, NY, USA, 1997. ACM. 14. Sergei Kuznetsov. Stability as an estimate of the degree of substantiation of hy- potheses derived on the basis of operational similarity. nauchn. tekh. inf., ser.2 (automat. document. math. linguist.). 12:21–29, 1990. 15. Sergei Kuznetsov, Sergei Obiedkov, and Camille Roth. Reducing the representa- tion complexity of lattice-based taxonomies. In Uta Priss, Simon Polovina, and Richard Hill, editors, Conceptual Structures: Knowledge Architectures for Smart Applications, volume 4604 of Lecture Notes in Computer Science, pages 241–254. 
Springer Berlin / Heidelberg, 2007. 16. Sergei O. Kuznetsov. On stability of a formal concept. Annals of Mathematics and Artificial Intelligence, 49:101–115, April 2007. 17. Uta Priss. Formal concept analysis in information science. Annual Review of Information Science and Technology, 40(1):521–543, September 2007. 18. Camille Roth and Paul Bourgine. Lattice-based dynamic and overlapping tax- onomies: The case of epistemic communities. SCIENTOMETRICS, 69(2):429–447, NOV 2006. 19. Camille Roth, Sergei Obiedkov, and Derrick Kourie. Towards concise represen- tation for taxonomies of epistemic communities. In Proceedings of the 4th in- ternational conference on Concept lattices and their applications, CLA’06, pages 240–255, Berlin, Heidelberg, 2008. Springer-Verlag. 20. Vaclav Snasel, Martin Polovincak, and Hussam M. Dahwa. Concept lattice Re- duction by Singular Value Decomposition. Proceedings of the Spring Young Re- searcher’s Colloquium on Database and Information Systems, 2007. 362 Vı́ctor Codocedo, Carla Taramasco and Hernán Astudillo 21. Gerd Stumme. Efficient data mining based on formal concept analysis. In Abdelka- der Hameurlain, Rosine Cicchetti, and Roland Traunmüller, editors, Database and Expert Systems Applications, volume 2453 of Lecture Notes in Computer Science, pages 3–22. Springer Berlin / Heidelberg, 2002. 22. Rudolf Wille. Restructuring lattice theory: an approach based on hierarchies of concepts. In Ivan Rival, editor, Ordered sets, pages 445–470, Dordrecht–Boston, 1982. Reidel. 23. Rudolf Wille. Formal concept analysis as mathematical theory of concepts and concept hierarchies. In Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors, Formal Concept Analysis, volume 3626 of Lecture Notes in Computer Science, pages 1–33. Springer Berlin / Heidelberg, 2005. 24. Jian-hua Yeh and Naomi Yang. Ontology construction based on latent topic ex- traction in a digital library. In George Buchanan, Masood Masoodian, and Sally Cunningham, editors, Digital Libraries: Universal and Ubiquitous Access to Infor- mation, volume 5362 of Lecture Notes in Computer Science, pages 93–103. Springer Berlin / Heidelberg, 2008. A FCA-based analysis of sequential care trajectories Elias EGHO, Nicolas Jay, Chedy Raissi and Amedeo Napoli Orpailleur Team, LORIA, Vandoeuvre-les-Nancy, France elias.egho,nicolas.jay,chedy.raissi,amedeo.napoli@loria.fr Abstract. This paper presents a research work in the domains of se- quential pattern mining and formal concept analysis. Using a combined method, we show how concept lattices and interestingness measures such as stability can improve the task of discovering knowledge in symbolic sequential data. We give example of a real medical application to illus- trate how this approach can be useful to discover patterns of trajectories of care in a french medico-economical database. Keywords: Data-Mining, Formal Concept Analysis, Sequential patterns, stability 1 Introduction Sequential pattern mining, introduced by Agrawal et al [2], is a popular ap- proach to discover patterns in ordered data. It can be seen as an extension of the well known association rule problem, applied to data that can be modelled as sequences of itemsets, indexed for example by dates. It helps to discover rules such as: customers frequently first buy DVDs of episodes I, II and III of Stars Wars, then buy within 6 months episodes IV, V, VI of the same famous epic space opera. 
Sequential pattern mining has been successfully used so far in various domains : DNA sequencing, customer behavior, web mining . . . [2]. Many scalable methods and algorithms have been published so far to effi- ciently mine sequential patterns. However few of them deal with the multidi- mensional aspect of databases. Multidimensionality conveys two notions: – items can be of different intrinsic nature. While the common approach con- siders objects of the same dimension, for example articles bought by cus- tomers, databases can hold much more information such as article price, gender of the customer, location of the store and so on. – a dimension can be considered at different levels of granularity. For example, apples in a basket market analysis can be either described as fruits, fresh food or food following a hierarchical taxonomy. Plantevit et al. [13] address this problem as mining multidimensional and multi- level sequential patterns and propose a method to achieve this task. They rely on the support measure to efficiently discover relevant sequential patterns. Support c 2011 by the paper authors. CLA 2011, pp. 363–376. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 364 Elias Egho, Nicolas Jay, Chedy Raissi and Amedeo Napoli indicates to what extent a pattern is frequent in a database. Many (sequential and non sequential) itemset mining methods use support as measure for find- ing interesting correlations in databases. However, the most relevant patterns may not be the most frequent ones. Moreover, discovering interesting patterns with low support leads generally to overwhelming results that need to be further processed in order to be analyzed by human experts. Formal Concept Analysis (FCA) is a theory of data analysis introduced in [17], that is tightly connected with data-mining and especially the search of frequent itemsets [16]. FCA organizes information into a concept lattice repre- senting inherent structures existing in data. Recently, some authors proposed new interest measures to reduce complex concept lattices and thus find inter- esting patterns. In [9], Kuznetsov introduces stability, successfully used in social network and social community analysis [7, 6]. To our knowledge, there are no similar approaches to find interesting sequen- tial patterns. In this paper, we present an original experiment based on both multilevel and multidimensional sequential patterns and lattice-based classifi- cation. This experiment may be regarded from two points of view: on the one hand, it is based on multilevel and multidimensional sequential patterns search, and on the other hand, visualization and classification of extracted sequences is based on Formal Concept Analysis (FCA) techniques, organizing them into a lattice for analysis and interpretation. It has been motivated by the problem of mining care trajectories in a regional healthcare system, using data from the PMSI, the so called French hospital information system. The remaining of the paper is organized as follows. In Section 2, we present the problem of mining care trajectories. Section 3 presents the methods proposed in domains of multi- level and multidimensional sequential patterns and Formal Concept Analysis. In Section 4, we present some of the results we achieved. 
2 Mining healthcare trajectories

The PMSI (Programme de Médicalisation des Systèmes d'Information) database is a national information system used in France to describe hospital activity from both an economical and a medical point of view. The PMSI is based on the systematic collection of administrative and medical data. In this system, every hospitalization leads to the collection of administrative, demographical and medical data. This information is mainly used for billing and planning purposes. Its structure can be described (and voluntarily simplified) as follows:

– Entities (attributes):
  • Patients (id, gender, . . . )
  • Stays (id, hospital, principal diagnosis, . . . )
  • Associated Diagnoses (id)
  • Procedures (id, date, . . . )
– Relationships:
  • a patient has 1 or more stays
  • a stay may have several procedures
  • a stay may have several associated diagnoses

The collection of data is done with a minimum record set using controlled vocabularies and classifications. For example, all diagnoses are coded with the International Classification of Diseases (ICD10)1. These classifications can be used as taxonomies to feed the process of multilevel sequential pattern mining, as shown in figure 1.

[Fig. 1. Examples of taxonomies used in multilevel sequential pattern mining: ICD 10 and the institutions taxonomy.]

Healthcare management and planning play a key role in improving the overall health level of the population. From a population point of view, even the best, state-of-the-art therapy is not effective if it cannot be delivered in the right conditions. Actually, many determinants affect the effective delivery of healthcare services: availability of trained personnel, availability of equipment, security constraints, costs, proximity, etc. All of these should meet the economic, demographic and epidemiological needs of a given area. This issue is especially acute in the field of cancer care, where many institutions and professionals must cooperate to deliver high-level, long-term and costly care. Therefore, it is crucial for healthcare managers and decision makers to be assisted by decision support systems that give strategic insights about the intrinsic behavior of the healthcare system.

On the one hand, healthcare systems can be considered rich in data, as they produce massive amounts of data such as electronic medical records, clinical trial data, hospital records, administrative data, and so on. On the other hand, they can be regarded as poor in knowledge, as these data are rarely embedded into a strategic decision-support resource [1]. We used the PMSI system as a source of data to study patient movements between several institutions. By organizing these movements into groups of sequences representing trajectories of care, we aim at discovering patterns describing the whole course of treatment of a given population. This global approach contrasts with the usual statistical exploitations of the PMSI data, which focus mainly on single hospitalizations. In this experiment, we have worked on four years (2006–2009) of the PMSI data of the Burgundy region related to patients suffering from lung cancer.

1 http://apps.who.int/classifications/apps/icd/icd10online/
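As a small aside, the simplified PMSI structure listed above could be modelled with data classes as in the following sketch; the field names are our own assumptions, not the actual PMSI record set.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Procedure:
    id: str
    date: str

@dataclass
class Stay:
    id: str
    hospital: str
    principal_diagnosis: str                              # ICD-10 code
    associated_diagnoses: List[str] = field(default_factory=list)
    procedures: List[Procedure] = field(default_factory=list)

@dataclass
class Patient:
    id: str
    gender: str
    stays: List[Stay] = field(default_factory=list)       # one or more stays

patient = Patient("p1", "M", [Stay("s1", "210780581", "C341")])
print(len(patient.stays), "stay(s) recorded for", patient.id)
```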
3 Related work

3.1 Sequential Pattern Mining

Let I be a finite set of items. A subset of I is called an itemset. A sequence s = ⟨s1 s2 . . . sk⟩ (si ⊆ I) is an ordered list of itemsets. A sequence s = ⟨s1 s2 . . . sn⟩ is a subsequence of a sequence s′ = ⟨s′1 s′2 . . . s′m⟩ if and only if there exist i1, i2, . . . , in such that i1 ≤ i2 ≤ . . . ≤ in and s1 ⊆ s′i1, s2 ⊆ s′i2, . . . , sn ⊆ s′in. We note s ⊆ s′ and also say that s′ contains s. Let D = {s1, s2, . . . , sn} be a database of sequences. The support of a sequence s in D is the proportion of sequences of D containing s. Given a minsup threshold, the problem of frequent sequential pattern mining consists in finding the set FS of sequences whose support is not less than minsup. Following the seminal work of Agrawal and Srikant [2] and the Apriori algorithm, many studies have contributed to the efficient mining of sequential patterns. The main approaches are PrefixSpan [11], SPADE [20], SPAM [3], PSP [10], DISC [4] and PAID [18].
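A minimal sketch of the subsequence relation and of support as defined above, matching itemsets at strictly increasing positions; the database is a toy example, not the care-trajectory data.

```python
def is_subsequence(s, t):
    """Greedy check that s = <s1...sn> is contained in t, matching each
    itemset of s to a later itemset of t (strictly increasing positions)."""
    i = 0
    for itemset in t:
        if i < len(s) and s[i] <= itemset:   # s_i included in this itemset of t
            i += 1
    return i == len(s)

def support(s, database):
    """Proportion of sequences of the database containing s."""
    return sum(is_subsequence(s, t) for t in database) / len(database)

db = [[{"a"}, {"b"}, {"c"}],
      [{"a"}, {"c"}, {"b"}],
      [{"d"}]]
print(support([{"a"}, {"b"}], db))           # 2/3: contained in the first two sequences
```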
Much work has been done in the area of single-dimensional sequential patterns, i.e., patterns in which all the items of a sequence have the same nature, like the sequence of products sold in a certain store. But in many cases the information in a sequence can be based on several dimensions. For example: a male patient had a surgical operation in Hospital A and then received chemotherapy in Hospital B. In this case, we have 3 dimensions: gender, type of treatment (chemotherapy, surgery) and location (Hospitals A and B). Pinto et al. [12] is the first work giving solutions for mining multidimensional sequential patterns. They propose to include some dimensions in the first or the last itemset of the sequence, but this works only for dimensions that remain constant over time, such as gender in our previous example. Among other proposals in this area, Yu et al. [19] consider multidimensional sequential pattern mining in the web domain; in their approach, the dimensions are pages, sessions and days, and they present two algorithms, AprioriMD and PrefixMDSpan, obtained by modifying the Apriori and PrefixSpan algorithms. Zhang et al. [21] propose the mining of multidimensional sequential patterns in distributed systems.

Moreover, each dimension can be represented at different levels of granularity, using a taxonomy which defines the hierarchical relations between items. Figure 2 shows an example of a disease taxonomy. Including the knowledge contained in the taxonomy leads to the problem of multilevel sequential pattern mining. Its interest resides in the capacity to extract more or less general/specific sequential patterns and to overcome problems of excessive granularity and low support. For example, using the disease taxonomy in Figure 2, sequences such as ⟨HeartDisease, BrainDisease⟩ could be extracted while ⟨Arryth., BrainDisease⟩ and ⟨Myoc.Inf., BrainDisease⟩ may have a too low support. Although Srikant and Agrawal [14] introduced hierarchy management in the extraction of association rules and sequential patterns early on, their approach was not scalable in a multidimensional context. Han et al. [5] proposed a method for mining multiple-level association rules in large databases, but their approach could not extract patterns containing items from different levels of the taxonomy. Plantevit et al. [13] proposed M3SP, a method taking both the multilevel and the multidimensional aspects into account; M3SP is able to find sequential patterns with the most appropriate level of granularity.

[Fig. 2. Disease taxonomy.]

The PMSI is a multidimensional database holding information coded with controlled vocabularies and taxonomies. Therefore, we relied on M3SP to extract multilevel and multidimensional sequential patterns. Nevertheless, the M3SP paradigm is still the search for frequent patterns. As our objective is to discover interesting patterns that may be infrequent, we ran M3SP iteratively down to very low support thresholds (see the appendix for more details about M3SP and how we used it). This produced massive amounts of patterns requiring further processing for a practical interpretation by a domain expert. This next phase was conducted with a lattice-based classification of sequential patterns, described in the following section.

3.2 Formal Concept Analysis

Introduced by Wille [17], Formal Concept Analysis is based on mathematical order theory. FCA has successfully been applied to many fields, such as medicine and psychology, musicology, linguistic databases, information science and software engineering. A strong feature of Formal Concept Analysis is its capability of producing graphical visualizations of the inherent structures among data. FCA starts with a formal context K = (G, M, I), where G is a set of objects, M is a set of attributes, and the binary relation I ⊆ G × M specifies which objects have which attributes. Two operators, both denoted by ′, connect the power sets of objects 2^G and attributes 2^M as follows:

′ : 2^G → 2^M, X′ = {m ∈ M | ∀g ∈ X, gIm}
′ : 2^M → 2^G, Y′ = {g ∈ G | ∀m ∈ Y, gIm}

The operator ′ is dually defined on attributes. The pair of ′ operators induces a Galois connection between 2^G and 2^M. The composed operators ′′ are closure operators: they are idempotent, extensive and monotonous. For any A ⊆ G and B ⊆ M, A′′ and B′′ are closed sets, and A and B are closed whenever A = A′′ and B = B′′. A formal concept of the context K = (G, M, I) is a pair (A, B) ⊆ G × M where A′ = B and B′ = A. A is called the extent and B is called the intent. A concept (A1, B1) is a subconcept of a concept (A2, B2) if A1 ⊆ A2 (which is equivalent to B2 ⊆ B1), and we write (A1, B1) ≤ (A2, B2). The set B of all concepts of a formal context K together with the partial order relation ≤ forms a lattice, called the concept lattice of K. This lattice can be represented as a Hasse diagram, providing a visual support for interpretation.

4 Classification and selection of interesting care trajectories

We use FCA to classify and filter the results of the sequential mining step. The formal context is built by taking patients as objects and sequential patterns as attributes. A patient p, considered as a sequence, is related to a sequential pattern s if p contains s. Table 1 shows a formal context KPS representing the binary relation between the patients and the sequences; a cross indicates that the patient's trajectory contains the complete sequence of health facilities. Thus, we achieve a classification of patients according to their trajectories of care.

[Table 1. Formal context KPS: a cross-table relating patients P1–P4 (rows) to sequential patterns Seq1–Seq4 (columns).]
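A hedged sketch of how such a patient-by-pattern context could be built: the trajectories and patterns are toy sequences of itemsets rather than PMSI data, and containment is checked with the same greedy subsequence test as in the earlier sketch.

```python
def is_subsequence(s, t):
    # Same greedy containment test as in the previous sketch.
    i = 0
    for itemset in t:
        if i < len(s) and s[i] <= itemset:
            i += 1
    return i == len(s)

# Toy trajectories and patterns (sequences of itemsets), not the PMSI data.
trajectories = {
    "P1": [{"a"}, {"b"}, {"c"}],
    "P2": [{"a"}, {"c"}, {"b"}],
    "P3": [{"d"}],
}
patterns = {
    "Seq1": [{"a"}, {"b"}],
    "Seq2": [{"a"}, {"c"}],
    "Seq3": [{"d"}],
}

# K_PS: a cross (p, seq) whenever patient p's trajectory contains the pattern.
K_PS = {(p, name) for p, traj in trajectories.items()
                  for name, patt in patterns.items() if is_subsequence(patt, traj)}
print(sorted(K_PS))
```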
In order to choose the most important concepts, we rely on stability, a measure of interest introduced in [8] and revisited in [9]. Let (A, B) be a formal concept of B. The stability of (A, B) is defined as:

γ(A, B) = |{C ⊆ A | C′ = A′ = B}| / 2^|A|

The stability index of a concept indicates how much the concept intent depends on particular objects of the extent. It indicates the probability of preserving the concept intent while removing some objects of its extent. A stable concept continues to be a concept even if a few members stop being members. This also means that a stable concept is resistant to noise and will not collapse when some members are removed from its extent. Stability offers an alternative point of view on concepts compared to the well-known metric of support, based on frequency, which is notably used to build iceberg lattices [15]. Actually, combining support and stability allows a more subtle interpretation, as shown in a previous work in the same application domain [6].

5 Results

5.1 Patient healthcare trajectories

The PMSI is a relational database holding information on any hospitalization in France. We reconstituted patient care trajectories from the PMSI data, considering each stay as an itemset. The sequence of stays of a same patient defines his or her care trajectory. In our experiment, itemsets could be made of various combinations of dimensions. Table 2 shows the trajectories of care obtained using two dimensions (principal diagnosis, hospital ID). For example, (C341, 210780581) represents one hospitalization of a patient in the University Hospital of Dijon (coded as 210780581) treated for a lung cancer (C341). Our dataset contained 486 patients suffering from lung cancer and living in the French region of Burgundy.

Table 2. Care trajectories of 4 patients showing principal diagnoses and hospital IDs

Patient  Sequence
p1       ⟨(C341, 750712184)(Z452, 580780138)(D122, 030785430) . . .⟩
p2       ⟨(C770, 100000017)(C770, 210780581)(Z080, 210780581) . . .⟩
p3       ⟨(H259, 210780110)(H259, 210780110)(K804, 210010070) . . .⟩
p4       ⟨(R91, 210780136)(C07, 210780136)(C341, 210780136) . . .⟩

Table 3 shows some of the patterns generated by M3SP from the data presented in Table 2, using the taxonomies of Figure 1. Pattern 3 can be interpreted as follows: 36% of the patients have a hospitalization in a private institution (CL), for any kind of principal diagnosis (ALL), followed by 3 hospitalizations with the same principal diagnosis (Z511, coding for chemotherapy). That kind of pattern demonstrates the interest of multilevel and multidimensional sequential pattern mining: though the principal diagnosis is the same in the three last stays, the hospitals can be different. Mining at the lowest level of granularity, without taxonomies, would generate many different patterns with lower support.

Table 3. Example of sequential patterns generated by M3SP

ID  Support  Pattern
1   100%     ⟨(All, All)⟩
2   65%      ⟨(Z511, All)(Z511, All)(Z511, All)⟩
3   36%      ⟨(All, CL)(Z511, All)(Z511, All)(Z511, All)⟩
4   21%      ⟨(Z511, CH)(Z511, CH)(Z511, CH)(Z511, CH)(Z511, CH)⟩

However, for low support thresholds, the number of extracted patterns dramatically grows with the size of the database, depending on the number of patients, the size of the taxonomies and the number of dimensions, as shown in Table 4.

Table 4. Number of patterns generated by M3SP (minsup=5%)

Dimensions used                     Number of patterns
Institutions                        1529
Principal Diagnosis, Institution    4051
All diagnoses                       50546
Institutions, Medical Procedures    293402

The next step consists in building a lattice with the resulting sequential patterns in order to facilitate the interpretation and selection of interesting care trajectories.
5.2 Lattice-based classification of sequential patterns

We illustrate this approach with patterns representing the sequences of institutions that are frequent in the patient set. We built a formal context relating 486 patients and 1529 sequential patterns. These sequences were generated in the first experiment by considering only one dimension (healthcare institutions), characterized by a taxonomy with two levels of granularity. We iteratively applied M3SP, decreasing the threshold by one patient at each step. The resulting lattice has 10145 concepts organized on 48 different levels. Figure 3 shows the upper part of the lattice. Concept intents are sets of one or more sequential patterns. From the lowest right concept, we can see that 37 patients support 3 sequential patterns:

– they had at least one hospitalization in the hospital 690781810,
– they were hospitalized at least once in a University Hospital (CHU/CHR),
– they had at least 2 hospitalizations (for simplicity, 2∗(ALL) is the contraction of (ALL)(ALL)).

The intent of the top concept is ⟨(ALL)⟩, because all patients have at least one hospitalization during their treatment. The intent of the co-atoms (i.e. the immediate descendants of the top) is always a sequence of length one, holding items of a high level of granularity.

[Fig. 3. Lattice of sequences of healthcare institutions (upper part); the top concept, with intent ⟨{(All)}⟩, covers all 486 patients.]

Filtering concepts can be achieved using both support and stability. In order to highlight the interesting properties of stability, we try to answer the question "is there a number of hospitalizations that characterizes care trajectories for lung cancer?". A basic scheme in lung cancer treatment generally consists of a sequence of 4 chemotherapy sessions, possibly following a surgical operation. Due to noise in the data or variability in practices, we may observe sequences of 4, 5, 6 or more stays in the PMSI database. Mining such data with an a priori fixed support threshold may not discover the most interesting patterns: if the threshold is too high, we simply miss the good pattern; if it is too low, similar patterns, differing only in length and with close values of support, can be extracted. Figure 4 shows the power of stability in discriminating such patterns. The concept with intent ⟨(CL)⟩⟨2∗(ALL)⟩ is the most frequent. It represents patients with at least one stay in a private organization and at least 2 stays in hospital. Similar concepts have a relatively close support and differ only in the total number of stays. The concept with 5 stays has the highest stability. This probably matches the basic treatment scheme of lung cancer. Our interpretation relies on the power of stability to point out noisy concepts. Actually, only a few patients in the concept ⟨(CL)⟩⟨2∗(ALL)⟩ have only 2 stays.

[Fig. 4. Discriminating power of stability: scatter plot of support and stability of concepts (represented by their intent).]

Another interesting feature of the lattice-based classification of sequential patterns lies in its ability to characterize objects by several patterns. Let us consider the minimal database of sequences D = {s1 = ⟨(a)(b)(c)⟩; s2 = ⟨(a)(c)(b)⟩; s3 = ⟨(d)⟩}.
With a 2/3 threshold, h(a)(b)i and h(a)(c)i are considered as frequent sequential patterns, but sequential pattern mining will give no information about the fact that all sequences containing the pattern h(a)(b)i contain also the pattern h(a)(c)i. However this infor- mation can be obtained by classifying sequential patterns with FCA. 372 Elias Egho, Nicolas Jay, Chedy Raissi and Amedeo Napoli 6 Conclusion In this paper we propose an original combination of sequential pattern mining and FCA to explore a database of multidimensional sequences. We show some interesting prop- erties of concept lattices and stability index to classify and select interesting sequential patterns. This work is in a early step. Further developments can be made in several axes. First, other measures of interest could be investigated to qualify sequential pat- terns. Furthermore, connexions between FCA and the sequential mining problem could be explored in a more integrative approach, especially by studying closure operators on sequences. 7 Acknowledgments The authors wish to thank the TRAJCAN project for its financial support and Mrs. Catherine QUANTIN, the responsible of TRAJCAN project at university hospital of Dijon. References 1. Abidi, S.S.: Knowledge management in healthcare: towards ’knowledge-driven’ decision-support services. Int J Med Inform 63(1-2), 5–18 (Sep 2001) 2. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Yu, P.S., Chen, A.S.P. (eds.) Eleventh International Conference on Data Engineering. pp. 3–14. IEEE Computer Society Press, Taipei, Taiwan (1995), cite- seer.ist.psu.edu/agrawal95mining.html 3. Ayres, J., Gehrke, J., Yiu, T., Flannick, J.: Sequential pattern mining using a bitmap representation. pp. 429–435. ACM Press (2002) 4. ying Chiu, D., hung Wu, Y., Chen, A.L.P.: An efficient algorithm for mining fre- quent sequences by a new strategy without support counting. In: In Proceedings of the 20th International Conference on Data Engineering (ICDE’04. pp. 375–386. IEEE Computer Society (2004) 5. Han, J., Fu, Y.: Mining multiple-level association rules in large databases. Knowl- edge and Data Engineering, IEEE Transactions on 11(5), 798 –805 (sep/oct 1999) 6. Jay, N., Kohler, F., Napoli, A.: Analysis of social communities with iceberg and stability-based concept lattices. In: Medina, R., Obiedkov, S.A. (eds.) International Conference on Formal Concept Analysis (ICFCA’08). LNAI, vol. 4923, pp. 258– 272. Springer (2008) 7. Kuznetsov, S., Obiedkov, S., Roth, C.: Reducing the representation complexity of lattice-based taxonomies. In: Priss, U., Polovina, S., Hill, R. (eds.) Proc. of ICCS 15th Intl Conf Conceptual Structures. LNCS/LNAI, vol. 4604, pp. 241–254. Springer (2007) 8. Kuznetsov, S.O.: Stability as an estimate of the degree of substantiation of hy- potheses derived on the basis of operational similarity. Nauchn. Tekh. Inf., Ser.2 (Automat. Document. Math. Linguist.) 12, 21–29 (1990) 9. Kuznetsov, S.O.: On stability of a formal concept. Annals of Mathematics and Artificial Intelligence 49, 101–115 (2007), http://www.springerlink.com/content/fk1414v361277475/ A FCA-based analysis of sequential care trajectories 373 10. Masseglia, F., Cathala, F., Poncelet, P.: The psp approach for mining sequential patterns. pp. 176–184 (1998) 11. Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: Prefixs- pan: Mining sequential pattern by prefix-projected growth. In: ICDE. pp. 215–224 (2001) 12. 
Pinto, H., Han, J., Pei, J., Wang, K., Chen, Q., Dayal, U.: Multi-dimensional sequential pattern mining. In: CIKM ’01: Proceedings of the tenth international conference on Information and knowledge management. pp. 81–88. ACM Press, New York, NY, USA (2001) 13. Plantevit, M., Laurent, A., Laurent, D., Teisseire, M., Choong, Y.W.: Mining mul- tidimensional and multilevel sequential patterns. ACM Trans. Knowl. Discov. Data 4(1), 1–37 (2010) 14. Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) Proc. 5th Int. Conf. Extending Database Tech- nology, EDBT. vol. 1057, pp. 3–17. Springer-Verlag (25–29 1996), http://citeseer.ist.psu.edu/article/srikant96mining.html 15. Stumme, G.: Efficient data mining based on formal concept analysis. In: Lecture Notes in Computer Science, vol. 2453, p. 534. Springer (Jan 2002) 16. Valtchev, P., Missaoui, R., Godin, R.: Formal concept analysis for knowledge dis- covery and data mining: The new challenges. In: Eklund, P.W. (ed.) ICFCA. Lec- ture Notes in Computer Science, vol. 2961, pp. 352–371. Springer (2004) 17. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of con- cepts. In: Rival, I. (ed.) Ordered Sets. Reidel (1982) 18. Yang, Z., Kitsuregawa, M., Wang, Y.: Paid: Mining sequential patterns by passed item deduction in large databases. In: IDEAS’06. pp. 113–120 (2006) 19. Yu, C.C., Chen, Y.L.: Mining sequential patterns from multidimensional sequence data. Knowledge and Data Engineering, IEEE Transactions on 17(1), 136 – 140 (jan 2005) 20. Zaki, M.J.: Spade: An efficient algorithm for mining frequent sequences. Machine Learning 42(1-2), 31–60 (January 2001), http://www.springerlink.com/link.asp?id=n3t642725v615427 21. Zhang, C., Hu, K., Chen, Z., Chen, L., Dong, Y.: Approxmgmsp: A scalable method of mining approximate multidimensional sequential patterns on distributed system. In: Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth Interna- tional Conference on. vol. 2, pp. 730 –734 (aug 2007) 374 Elias Egho, Nicolas Jay, Chedy Raissi and Amedeo Napoli Appendix The M3SP algorithm is able to extract sequential patterns characterized by several dimensions with different levels of granularity for each dimension [13]. Each dimension has a taxonomy which defines the hierarchical relations between items. M3SP runs in three steps: data pre-processing , MAF-item generation and sequence mining. In Figure 5, we present an example to illustrate the mechanism of M3SP. Table 5b shows a dataset of hospitalizations relating patients (P) with attributes from three dimensions – T, the date of stay, – H, the healthcare setting in which the hospitalization takes place, – D, the disease of the patient. For instance, the first tuple means that, at date 1, the patient 1 has been treated for the disease D11 in hospital H11 . Let us now assume that we want to extract all multidimensional sequences that deal with hospitals and diseases that are frequent in the patients set. Figure 5a displays a taxonomy for dimensions H and D. Pre-processing step M3SP considers three types of dimensions: a temporal dimension Dt , a set of analysis dimensions DA , and a set of reference dimensions DR . M3SP orders the dataset according to Dt . The tuples appearing in a sequence are defined over the dimensions of DA . The support of the sequences is computed according to dimensions of DR . 
M3SP splits the dataset into blocks according to distinct tuple values over reference dimensions. The support of a given multidimensional sequence is the ratio of the number of blocks supporting the sequence over the total number of blocks. In our example, H (hospitals) and D (diseases) are the analysis dimensions, T is the temporal dimension and P (patients) is the only reference dimension. We obtain two blocks defined by Patient1 and Patient2 . as shown in table 5c. MAF-item generation step In this step, M3SP generates all the Maximal Atomic Frequent items or MAF-items. In order to define MAF-items, we fisrt define the specificity relation between items. Specificity relation. Given two multidimensional items a = (d1 , ..., dm ) and 0 0 0 0 0 a = (d1 , ..., dm ), a is said to be more specific than a, denoted by a 4I a , if for every 0 i = 1, ..., m, di ∈ di ↓. Where di ↓ is the set of all direct specializations of di according to the dimension taxonomy of di . In our example, we have (H1 , D1 ) 4I (H1 , D11 ), because H1 ∈ H1 ↓ and D1 ∈ D11 ↓. MAF-item. An atomic item a is said to be a0 Maximal Atomic0 Frequent item, 0 or a MAF-item, if a is frequent and if for every a such that a 4I a , the item a is not frequent. In our example, if we consider minsup = 100%, b = (H1 , D1 ) is a MAF-item, because it is frequent and there is not another item as frequent and more specific than b. The computation of MAF-items is represented by a tree in which the nodes are of the form (d1 , d2 )s , meaning that (d1 , d2 )s is an atomic item with support s as we A FCA-based analysis of sequential care trajectories 375 show in Figure 5d. In this tree, MAF-items are displayed as boxed nodes. We note that all leaves are not necessarily MAF-items. For example, (H2 , D21 )100% is a leaf, but not a MAF-item. This is because (H2 , D21 )100% 4I (H21 , D21 )100% and (H21 , D21 ) has been identified as being an MAF-item. Sequence mining step Frequent sequences can be mined using any standard sequential pattern-mining algo- rithm (PrefixSpan in this work). Since in such algorithms, the dataset to be mined is a set-pairs of the form (id, seq), where id is a sequence identifier and seq is a sequence of itemsets, our example dataset is transformed as follows : – every MAF-item is associated with a unique identifier denoted by ID(a) (table 5e), playing the role of the items in standard algorithms. – every block b is assigned a patient identifier ID(p), playing the role of the sequence identifiers in standard algorithms, – every block b transformed into a pair (ID(b), ζ(b)), where ζ(b) is a sequence. (table 5f) PrefixSpan is run over table 5f. By considering a support threshold minsup =50%, table 5g displays all the frequent sequences in their transformed format as well in their multidimensional format in which identifiers are replaced with their actual values. The basic step in M3SP method is MAF-item generation, because it provides all multidimensional items that occur in sequences to be mined. If the set of MAF-items is changed, the sequence will be changed. M3SP always extracts the most specific multidimensional items. For example (H1 , D1 ) is frequent according to minsup=50%, but another item, (H11 , D11 ) is more specific and still frequent. As a result, (H1 , D1 ) is not a MAF-item and consen- quently not used to build squences. Finaly the frequent sequence h{(H1 , D1 ), (H21 , D21 )}i does not appear in the results of M3SP. 
However, Tables 5 and 6 show the MAF-item set and the frequent sequences extracted by M3SP at a 100% threshold. It can be noticed that (H1, D1) is a MAF-item and that the sequence ⟨{(H1, D1), (H21, D21)}⟩ is generated. Thus, given two minsup thresholds σ′ < σ, the set of frequent sequences obtained for σ′ may not always contain the set of sequences obtained for σ. Considering this a limitation of our approach, as we wanted to extract both general and specific sequences, we iteratively applied M3SP, decreasing the threshold by one patient at each step. This allowed us to extract more potentially interesting sequences than by using a single low minsup threshold.

Table 5. MAF-items, minsup = 100%

MAF-item
(H1, D1)
(H21, D21)

Table 6. Frequent sequences for minsup = 100%

Frequent Multidimensional Sequences
⟨{(H1, D1)}⟩
⟨{(H21, D21)}⟩
⟨{(H1, D1), (H21, D21)}⟩

[Fig. 5. Example of the M3SP method, minsup = 50%: (a) dimension taxonomies (healthcare institutions and diseases); (b) the dataset; (c) block partition of the dataset according to DR = {Patient}; (d) tree of frequent atomic items; (e) MAF-items and their identifiers; (f) the transformed database; (g) the frequent multidimensional sequences.]

Querying Relational Concept Lattices

Z. Azmeh1, M. Huchard1, A. Napoli2, M. Rouane-Hacene3, and P. Valtchev3
1 LIRMM, 161, rue Ada, F-34392 Montpellier Cedex 5
2 LORIA, B.P. 239, F-54506 Vandœuvre-lès-Nancy
3 Dépt. d'informatique, UQÀM, C.P. 8888, Succ. Centre-Ville Montréal, Canada

Abstract. Relational Concept Analysis (RCA) constructs conceptual abstractions from objects described by both their own properties and inter-object links, while dealing with several sorts of objects. RCA produces lattices for each category of objects, and those lattices are connected via relational attributes that are abstractions of the initial links. Navigating such an interrelated lattice family in order to find concepts of interest is not a trivial task, due to the potentially large size of the lattices and the need to move the expert's focus from one lattice to another. In this paper, we investigate the navigation of a concept lattice family based on a query expressed by an expert. The query is defined in the terms of RCA; thus it is either included in the contexts (modifying the lattices when feasible) or directly classified in the concept lattices. Then a navigation schema can be followed to discover solutions. Different navigation possibilities are discussed.

Keywords: Formal Concept Analysis, Relational Concept Analysis, Relational Queries.
1 Introduction Recently [1], we worked on the problem of selecting suitable Web services for instantiating an abstract calculation workflow. This workflow can be seen as a DAG whose nodes are abstract tasks (like book a hotel room) and directed edges are connections between the tasks, which often correspond to a data flow (like connecting reserve a train ticket and book a hotel room: train dates and time- table are transmitted from reserve a train ticket to book a hotel room). The selection is based on quality-of-service (QoS) properties like response time or availability and on the composability quality between services chosen for neigh- bor tasks in the workflow. Besides, we aim at identifying and storing a set of backup services adapted to each task. To be efficient in the replacement of a fail- ing Web service by another, we want to organize each set of backup Web services by a partial order that expresses the quality criteria and helps to choose a good trade-off for instantiating the abstract workflow. Analyzing such multi-relational data is a complex problem, which can be approached by various methods includ- ing querying, visualization, statistics, or rule extraction (data mining). c 2011 by the paper authors. CLA 2011, pp. 377–392. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 378 Zeina Azmeh et al. We proposed an approach based on Relational Concept Analysis (an itera- tive version of Formal Concept Analysis) to solve this problem, because of its multi-relational nature. Web services are filtered and grouped by tasks they may satisfy (e. g. the Web services for booking a hotel room). In formal contexts (one for each task), we associate the Web services and their QoS criteria. For exam- ple, the service HotelsService by lastminutetravel.com would be described by low response time, medium availability (classical scaling is applied to the QoS val- ues). In relational contexts we encode the composability levels in each directed edge of the workflow. Given an edge of the workflow, the composition quality depends on the way output data of the source task cover input data of the end- ing task, and the need for data adaptation. A relational context encodes for example the relation Adaptable-Fully-Composable between services for reserve a train ticket and services for book a hotel room. In this relation TravelService by puturist.com is connected to HotelsService by lastminutetravel.com if output data of TravelService can be used, with a slight adaptation, to fill input data of HotelsService. The concept lattice family we obtain (one Web service lattice for each task of the workflow) makes it possible: (1) to select a Web service for each task based on QoS and composability criteria, (2) to memorize classified alternatives for each task. Due to the nature of our problem, we are interested in classifying indepen- dently the Web services corresponding to the tasks and not classifying the solu- tions. By solution, we mean a set of Web services, each of which can instantiate a task of the workflow. If a particular service fails or is no more available, the goal is to constitute a new working combination out of the old one, with the smallest number of service replacements. To the best of our knowledge, this problem area has not been investigated in-depth prior to our study, especially in the context of Relational Context Analysis [7, 6]. 
Therefore, we believe that it would be use- ful to generalize and report what we learned in our experience. In more general terms, we have multi-relational data and a question which contains variables we want to instantiate, and we aim at: – Finding a specific set of objects that satisfy the query. An answer is composed of objects, each object instantiates one variable; – Classifying, for each variable, the objects depending on the way they satisfy (or not) the query, to find alternative answers. In this paper, we put the problem in a more general framework, which as- sumes an unrestricted relational context family and a query given by an expert. The query can be seen as a DAG, where some nodes are labelled by variables and some others are labelled by objects. The nodes roughly correspond to the formal (object-attribute) contexts and the edges correspond to the relational (object- object) contexts. A set of lattices is built using Relational Concept Analysis and the existential scaling operator. We assume that an expert gives a total ordering of the edges of the DAG. Then an algorithm navigates the lattices following this ordering. This navigation allows us to determine objects that answer the query. Querying Relational Concept Lattices 379 These objects with their position in the lattices are what the expert wants to explore, to extract a solution and store the alternatives. In the following, Section 2 reminds the main principles of Relational Concept Analysis (RCA). Section 3 defines the model of queries in the RCA framework that we consider in this paper. Section 4 presents and discusses an algorithm that navigates the concept lattice family using a query. Related work is presented in Section 5 and we conclude in Section 6. 2 Background on Relational Concept Analysis For FCA, we use the notations of [4]. In RCA [5], the objects are classified not only according to the attributes they share, but also according to the links be- tween them. Let us take the following case study. We consider a list of countries, a list of restaurants, a list of Mexican dishes, a list of ingredients, and finally a list of salsas. We impose some relations between these entities {Country, Restau- rant, MexicanDish, Ingredient, Salsa}, such that: a Country ”has” a Restaurant; a Restaurant ”serves” a MexicanDish; a MexicanDish ”contains” an Ingredient; an Ingredient is ”made-in” a Country; and finally a Salsa is ”suitable-with” a MexicanDish. We express these entities and their relations by the DAG in Fig. 1. We capture an instantiation of this entity-relationship diagram in a relational context family. Fig. 1. The entities of the Mexican food example (left). The query schema (right) Definition 1. A relational context family RCF is a pair (K, R) where K is a set of formal (object-attribute) contexts Ki = (Oi , Ai , Ii ) and R is a set of relational (object-object) contexts rij ⊆ Oi × Oj , where Oi (domain of rij ) and Oj (range of rij ) are the object sets of the contexts Ki and Kj , respectively. The RCF corresponding to our example contains five formal contexts and five relational contexts, illustrated in Table 1 (except the made-in relational context, which is not used in this paper for sake of simplicity). An RCF is used in an iterative process to generate at each step a set of concept lattices. First concept lattices are built using the formal contexts only. Then, in the following steps, a scaling mechanism translates the links between objects into 380 Zeina Azmeh et al. Table 1. 
Relational context family for the Mexican dishes example: five formal contexts, namely Country (objects Canada, England, France, Lebanon, Mexico, Spain and USA, described by the region attributes America, Europe, Asia and the country codes mx, en, us, ca, es, lb, fr), Restaurant (Chili's, Chipotle, El Sombrero, Hard Rock, Mi Casa, Taco Bell and Old el Paso, with attributes r1, ..., r7), MexicanDish (Burritos, Enchiladas, Fajitas, Nachos, Quesadillas and Tacos, with attributes d1, ..., d6), Ingredient (chicken, beef, pork, vegetables, beans, rice, cheese, guacamole, sour-cream, lettuce, corn-tortilla and flour-tortilla, with attributes i1, ..., i12) and Salsa (Fresh Tomato, Roasted Chili-Corn, Tomatillo-Green Chili and Tomatillo-Red Chili, described by mild, medium-hot and hot), together with the relational contexts has (Country × Restaurant), serves (Restaurant × MexicanDish), contains (MexicanDish × Ingredient) and suitable-with (Salsa × MexicanDish).
conventional FCA attributes and derives a collection of lattices whose concepts are linked by relations. For example, the existential scaled relation (that we will use in this paper) captures the following information: if an object os is linked to another object ot, then in the scaled relation this link is encoded in a relational attribute assigned to os. This relational attribute states that os is linked to a concept which clusters ot with other objects. This is used to form new groups, for example the group (see Concept 84) of restaurants which serve at least one dish containing sour cream (such dishes are grouped in Concept 75). The steps are repeated until the lattices are stable (when no more new concepts are generated). For the Mexican dishes, four lattices of the concept lattice family are represented in Figures 3 and 4. The ingredient lattice is presented in Fig. 2.
Definition 2. Let rij ⊆ Oi × Oj be a relational context. The exists scaled relation rij∃ is defined as rij∃ ⊆ Oi × B(Oj, A, I), such that for an object oi and a concept c: (oi, c) ∈ rij∃ ⇐⇒ ∃x, x ∈ o′i ∩ Extent(c). In this definition, A is any set of attributes, possibly including relational attributes, which are defined below.
Definition 3. A relational attribute (s r c) is composed of a scaling operator s (for example exists), a relation r ∈ R, and a concept c. It results from scaling a relation rij ∈ R where rij ⊆ Oi × Oj. It expresses a relation between the objects o ∈ Oi and the concepts of B(Oj, A, I). An existential relational attribute is denoted by ∃rij c where c ∈ B(Oj, A, I). For example, Concept 50 in the Country lattice owns the relational attribute ∃has Concept 60. This expresses that each country in Concept 50 (Canada and USA) has at least one restaurant in the extent of Concept 60 (El Sombrero or Mi Casa).
Fig. 2. The concept lattice for ingredients of the RCF in Table 1 (concept names are reduced to C n).
3 Introducing Relational Queries
In this section, we define the notion of query and answer to a query.
First (section 3.1) we recall simple queries that help navigating concept lattices [7]. Then (section 3.2), we generalize to relational queries that lead the navigation across a concept lattice family. 3.1 Simple queries Definition 4. A query (including its answer) on a context K = (O, A, I), de- noted by q|K (or q when it is not ambiguous), is a pair q = (oq , aq ), such that oq is the query object(s) i.e. the set of objects satisfying the query (or the an- swer set), and aq is the set of attributes defining the constraint of the query. By definition, we have: o0q ⊇ aq , where aq ⊆ A. 382 Zeina Azmeh et al. Fig. 3. Country and restaurant lattices for exists and the RCF in Table 1. For example q|Kcountry = ({England, F rance, Spain}, {Europe}) is a query on the country context (in Table 1), asking for countries in Europe. Another example q|KM exicanDish = ({}, {rice, corn-tortilla}) When aq is closed, solving the query consists in finding the concept C = (a0q , aq ). To ensure that such a concept exists, a virtual query object ovq that satisfies ovq0 = aq can be added to the context (as an additional line). Then, three types of answers can be interesting: the more precise answers are in a0q , less constrained (with less attributes) answers are in extents of super-concepts of C, more constrained (with more attributes) answers are in extents of sub-concepts of C. When aq is not closed, and we don’t use the virtual query object, searching for answers needs to find first the more general concept C whose intent contains aq . Now we will define more generally what we mean by relational queries. Querying Relational Concept Lattices 383 Fig. 4. Dishes and salsa lattices for exists and the RCF in Table 1. 3.2 Relational queries In this study, a relational query is composed of several simple queries, to which we add relational constraints. The relational constraints are expressed via virtual query objects (variables), one for each formal context, where we want to find an object. A virtual query object may have relations (according to the relational contexts) with objects of other contexts, as well as with other virtual query objects. Definition 5. A relational query Q on a relational context family (K, R) is a pair Q = (Aq , Ovq , Rq ), where: 1. Aq is a set of simple queries Aq = {q|Ki = (oq |Ki , aq |Ki ) | q|Ki is a query on Ki ∈ K} 2. There is a one-to-one mapping between Aq and Ovq , where Ovq is the set of virtual query objects. 3. Rq is a set of relational constraints Rq = {(ov q|Ki , rij , Oq )}, where ov q|Ki is the virtual object associated with q|Ki , Oq ⊆ Oj ∪ {ov q|Kj }, with ov q|Kj is the virtual object associated with Kj . 384 Zeina Azmeh et al. For example, let us consider the following query: I am searching for a country with the attribute ”fr”, a restaurant in this country serving Mexican dish contain- ing (chicken, cheese, and corn-tortilla), and a salsa which is ”hot” and suitable with the dish. This query can be translated into a relational query Qexample = (Aq , Ovq , Rq ) as follows: Aq = {qcountry , qrest. , qdish , qsalsa }, aqcountry = {f r}, aqrest. = aqdish = ∅, aqsalsa = {hot}. Ovq = {ov qdish , ov qcountry , ov qrest. , ov qsalsa } Rq = {(ov qdish , contains, {chicken, cheese, corn-tortilla}), (ov qcountry , has, {ov qrest. }), (ov qrest. , serves, {ov qdish }), (ov qsalsa , suitable-with, {ov qdish })}. By definition, a query corresponds to the data model, and must respect the schema of the RCF (see in Fig. 1). 
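To make these notions tangible, here is a small Python sketch, written by us and not taken from the paper: it answers a simple query on a context by set inclusion, and then writes the relational query Qexample as plain data. The small Country context is re-typed from the example and the country codes, and the "q_..." strings standing for virtual query objects are our naming convention.

# A sketch (illustrative only) of answering a simple query q = (o_q, a_q) on a context:
# the most precise answers are the objects o with o' ⊇ a_q.

def answer_simple_query(a_q, context):
    """Return the objects whose attribute sets contain all attributes in a_q."""
    return {o for o, attrs in context.items() if a_q <= attrs}

country = {
    "Canada":  {"America", "ca"},
    "England": {"Europe", "en"},
    "France":  {"Europe", "fr"},
    "Mexico":  {"America", "mx"},
    "Spain":   {"Europe", "es"},
}

print(answer_simple_query({"Europe"}, country))   # England, France, Spain
print(answer_simple_query({"fr"}, country))       # France

# The relational query Q_example of the text (Definition 5), written as plain data;
# the "q_..." strings stand for the virtual query objects.
Q_example = {
    "simple": {"country": {"fr"}, "restaurant": set(), "dish": set(), "salsa": {"hot"}},
    "relational": [
        ("dish", "contains", {"chicken", "cheese", "corn-tortilla"}),
        ("country", "has", {"q_restaurant"}),
        ("restaurant", "serves", {"q_dish"}),
        ("salsa", "suitable-with", {"q_dish"}),
    ],
}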
An answer to the relational query is included in the answers of the simple queries. For our example, the answers of the simple queries would be oqcountry = {F rance}, oqrest. contains all the restaurants, oqdish contains all the dishes, oqsalsa = {T omatillo-Red Chili}. If we consider these objects connected with the relations, this forms what we call the maximal answer graph. In this graph, we are interested in the subgraphs that cover the query (they have at least one object per element of Aq ). These subgraphs are included in the graph of Fig. 5. There are various interesting forms of answer: having exactly one object per element of Aq , or having several objects per element of Aq . Fig. 5. The subgraph containing all the answers with the relations between the objects corresponding to the relational query example. Definition 6. An answer to a relational query Q = (Aq , Ovq , Rq ) is a set of objects X having a unique object per each context that is involved in the query: X =< oi | oi ∈ Oi with 1 ≤ i ≤ n > These objects satisfy the query Q = (Aq , Ovq , Rq ), when they have the requested attributes: ∀ q|Ki ∈ Aq , ∃ oi ∈ X : o0i ⊇ aq|Ki Querying Relational Concept Lattices 385 and they are connected as expected: ∀ (ov q|Ki , r, Oq ) ∈ Rq with r ⊆ Oi × Oj , (and thus : Oq ⊆ Oj ∪ {ov q|Kj }) and ∀ o ∈ Oq , we have : 1. if o ∈ Oj , we have (oi , o) ∈ r 2. if o = ovq|K , we have (oi , oj ) ∈ r with oj ∈ X ∩ Oj j For our example, the set of answers to the relational query, is: {{F rance, El Sombrero, Enchiladas, T omatillo- Red Chili}, {F rance, El Sombrero, Quesadillas, T omatillo-Red Chili}, {F rance, El Sombrero, T acos, T omatillo-Red Chili}, {F rance, Old el P aso, T acos, T omatillo-Red Chili}}. Answers can be provided with an aggregated form which can be found in lattices, as we explain below. They allow us to discover sets of equivalent objects relatively to the answer. E.g. {Enchiladas, Quesadillas, T acos} are equivalent objects if we choose F rance and ElSombrero. Definition 7. An aggregated answer to a query Q = (Aq , Ovq , Rq ) is the set AR containing the sets Si , such that: – there is a one-to-one mapping between AR and Aq which maps each q|Ki to a set Si – ∀ q|Ki ∈ Aq , ∀ oi ∈ Si , o0i ⊇ q|Ki (objects of Si have the requested attributes) – when (ov q|Ki , r, Oq ) ∈ Rq - if ov q|Kj ∈ Oq , r ⊆ Oi × Oj , thus : ∀ oi ∈ Si , ∀ oj ∈ Sj , Sj ∈ AR, we have (oi , oj ) ∈ r (virtual objects are connected if requested) - f or each oj ∈ Oq ∩Oj we have : (oi , oj ) ∈ r (connections with particular objects are satisfied). For example, an aggregated answer for our query is {Scountry , Srest. , Sdish , Ssalsa } = {{F rance}, {ElSombrero}, {Enchiladas, Quesadillas, T acos}, {T omatillo- RedChili}} 4 Navigating a Concept Lattice Family w.r.t. a Query In this section, we explain how the navigation between the concept lattices can be guided by a relational query. Following relational attributes that lead us from one lattice to another, we navigate a graph whose nodes are the concept lattices. In a first subsection, we propose an algorithm which gives a general navigation schema that applies to concept lattices built with the existential scaling. Then we present several variations of this navigation algorithm. 386 Zeina Azmeh et al. 
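Before describing the navigation itself, a small Python sketch shows how Definition 6 can be checked mechanically on a candidate tuple of objects. This is our own illustration: the data layout (one dictionary of attributes per context, one set of pairs per relation) and the tiny excerpt of the Mexican food data are assumptions made for the example.

# A sketch (illustrative only) of Definition 6: a tuple with one object per context
# answers the query if each object has the requested attributes and the requested
# links hold between the chosen objects.

def is_answer(candidate, simple, relational, attributes, links):
    """candidate: context -> chosen object; simple: context -> required attributes;
    relational: (source context, relation, targets), where a target is either a plain
    object or a "q_<context>" placeholder; attributes: context -> object -> attributes;
    links: relation -> set of (source object, target object) pairs."""
    if any(not req <= attributes[k][candidate[k]] for k, req in simple.items()):
        return False
    for src, rel, targets in relational:
        for t in targets:
            tgt = candidate[t[2:]] if t.startswith("q_") else t   # resolve placeholders
            if (candidate[src], tgt) not in links[rel]:
                return False
    return True

# Tiny excerpt of the running example, enough to check one of the listed answers.
attributes = {"country": {"France": {"Europe", "fr"}}, "restaurant": {"El Sombrero": set()},
              "dish": {"Tacos": set()}, "salsa": {"Tomatillo-Red Chili": {"hot"}}}
links = {"has": {("France", "El Sombrero")}, "serves": {("El Sombrero", "Tacos")},
         "contains": {("Tacos", "chicken"), ("Tacos", "cheese"), ("Tacos", "corn-tortilla")},
         "suitable-with": {("Tomatillo-Red Chili", "Tacos")}}
candidate = {"country": "France", "restaurant": "El Sombrero", "dish": "Tacos",
             "salsa": "Tomatillo-Red Chili"}
simple = {"country": {"fr"}, "restaurant": set(), "dish": set(), "salsa": {"hot"}}
relational = [("dish", "contains", {"chicken", "cheese", "corn-tortilla"}),
              ("country", "has", {"q_restaurant"}), ("restaurant", "serves", {"q_dish"}),
              ("salsa", "suitable-with", {"q_dish"})]
print(is_answer(candidate, simple, relational, attributes, links))   # True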
4.1 A query-based navigation algorithm Our approach for navigating the concept lattices along the relational attributes is based on the observations made during an experimental use of RCA, for finding the appropriate Web services to implement an abstract calculation workflow [1]. We consider an RCF and a query that respects the RCF relations. From our experience, we observed that an expert often expresses his query by a phrase, where the chronology of the principal verbs (relations) gives a natural path for the query flow. This will be our hypothesis. Let us consider the query previously specified: I am searching for a country, with the basic attribute ”fr”, that has a restaurant which serves dishes containing chicken, cheese and corn-tortilla; I am searching for a hot salsa suitable with this dish. In order to simplify the notation, we use the same notation for queries q|ki and the virtual objects ov q|Ki . The query path is a total ordering of the arcs of the query (the query itself is a DAG in general). For our example, the path is the total ordering for Rq given by {(qcountry , has, {qrestaurant }), (qrestaurant , serves, {qdish }), (qdish , contains, {chicken, cheese, corn-tortilla}), (qsalsa , suitable-with, {qdish })}. Each arc cor- responds to a relation used in the query. All the relations involved inside a query are covered by this path. This translation of the expert query determines a composition on the relations. The query path does not always correspond to a directed chain in the object graph (e.g. dishes are the end of two of the considered relations (serves and suitable-with)). We propose the algorithms 1 to 3 (an additional procedure is needed which combines two others) for navigating through a concept lattice family using queries. During the exploration, we fill a set X by objects that will constitute an answer at the end (at most one object for each formal context). In this section, the algorithm is presented as an automatic procedure. Its use to guide an expert in its manual exploration of the data is discussed afterwards. Algorithm 1 identifies three main cases: – line 2, the arc connects two query objects, e.g. (qcountry , has, {qrestaurant }); – line 5, the arc connects a query object to original objects e.g. (qdish , contains, {chicken, cheese, corn-tortilla}); – line 8, the arc connects a query object to another query object and to original objects e.g. (qdish , contains, {qingredient , chicken, cheese, corn-tortilla}). Each of these cases considers, for a given arc a, whether the partial answer X already contains a source object or (inclusively) a target object. When the arc connects a query object to another query object a = (q|Ks , rst , q|Kt ), (Algorithm 2), four cases are possible. – X does not contain any object for Ks and any ot for Kt : we identify the highest concept that introduces the attributes of q|Ks and we select an object in its extent (lines 3-5). Then the algorithm continues on the next conditional statement (to find a target). Querying Relational Concept Lattices 387 – X contains an object os for Ks and an object ot for Kt selected in previous steps: we just check if os owns the relational attribute pointing at the object concept introducing ot , that is γot (line 8)1 . – X contains only an object os for Ks . We should find a target. We identify, under the meet of the concepts that introduce the attributes of q|Kt , one of the lowest concepts to which os points (lines 12-14). We select a target in its extent. 
– X contains only an object ot for Kt . We should find a source. We identify the meet of the concepts that introduce the attributes of q|Ks and the relational attribute that points to ot (lines 20-23). We select a source in its extent. When the arc connects a query object to original objects a = (q|Ks , rst , Oq ) (Algorithm 3): – Either X contains an object for Ks and we need to check if the relational attributes confirm that this object is connected to all the original objects in Oq ) (line 4); – Or we have to select an object for Ks , owning the attributes of the query q|Ks and owning the relational attributes ending in the concepts introducing the original objects (line 9-11). The algorithm for the last case is a combination of the algorithms for the two other cases. Note that whenever a condition is not verified, we have to backtrack, this is not specified in the algorithm for sake of simplicity. If the query path forms also a directed chain in the entity-relationship diagram, the main algorithm is a depth-first search. But in the general case, in some steps, when we consider an arc, we assigned to X an object for the end of the arc, and we need to find a source object. For example, we start with the arc (qcountry , has, {qrestaurant }) where the query path begins. We have to identify a source object os satisfying the query {f r} (Definition 4). For example, we choose the object France appearing the extent of Concept4 , whose intent contains fr. We extract the relational attributes of os = F rance, having the form ∃rst C). They are in practice in the lattices denoted by r : C. For example, we obtain has:Concept 19, has:Concept 15, has:Concept 60, has:Concept 16, etc. We keep the relational attributes with the concepts satisfying the target query in the corresponding lattice and discard the rest. In our example, the qrestaurant is empty. A relational attribute with the smallest concept (Ct ) is the one to consider that leads us to find a solution. We choose Concept 15 among the available smallest concepts. Let ∃ rst Ct be the selected relational attribute (if it exists). The object ot must be in the extent of Ct . In our example, we select El Sombrero. Then we consider the query-to-query arc (qrestaurant , serves, {qdish }). Given that an object is selected for Krestaurant , we look for a possible target object, led by the query qdish = ∅ and the relational attributes owned by the object 1 We remind that γo is the object concept introduced by o. 388 Zeina Azmeh et al. concept Concept 15 which introduces El Sombrero. Suppose we choose (line 13) a relational attribute that targets one of the minimum concepts, namely serves : Concept 23 (but serves : Concept 26 or serves : Concept 25 are also possible). This leads us to Concept 23, in the extent of which we select Enchiladas. Dealing with the next arc (qdish , contains, {chicken, cheese, corn-tortilla}) involves, since we have already selected a dish, to verify (Algorithm 3, line 4) that, the object concept γEnchiladas owns all the relational attributes that go to object concepts introducing chicken, cheese, and corn-tortilla. These are contains : γ chicken = Concept 29, contains : γ cheese = Concept 36 and contains : γ corn − tortilla = Concept 40 and they are indeed inherited by γEnchiladas = Concept 23. When the arc (qsalsa , suitable-with, {qdish }) is considered, the target (Enchi- ladas) is in X. Thus we identify a source in the extent of the Concept 47, which satisfies the target query {hot}. 
Its intent contains suitable-with : Concept 23, which is Enchiladas. A target object (Tomatillo-Red Chili) is selected in the extent of Concept 47. The answer is now complete.

Algorithm 1: Navigate(RCF, Q, PQ)
Data: (K, R): an RCF; Q = (Aq, Ovq, Rq): a query on (K, R); and a query path PQ = (ak) with ak = rij and rij ∈ Rq
Result: X: an object set (answer for Q) or fail
1  foreach arc a ∈ PQ do
2      if a = (q|Ks, rst, q|Kt) then
3          Case pure query
4      else
5          if a = (q|Ks, rst, Oq) with Oq ⊆ Ot then
6              Case pure objects
7          else
8              if a = (q|Ks, rst, Oq) with q|Kt ∈ Oq then
9                  Case query and objects

Algorithm 2: Case pure query
1   Let a = (q|Ks, rst, q|Kt)
2   if X ∩ Os = ∅ and X ∩ Ot = ∅ then
        // X does not contain a source and a target for the current arc a:
        // select a source in the extent of a concept that verifies the source query
3       Let Cs be the highest concept having Intent(Cs) ⊇ q|Ks
4       select os ∈ Extent(Cs)
5       X ← X ∪ {os}
6   if X ∩ Os = {os} and X ∩ Ot = {ot} then
7       // X contains a source and a target for the current arc a:
        // verify that the source is connected to the target
8       check ∃rst γot ∈ Intent(γos)
9   else
10      if X ∩ Os = {os} then
11          // X contains a source for the current arc a: select a target in the extent
            // of a concept that verifies the target query and is connected to the source
12          Let Ct be the highest concept having Intent(Ct) ⊇ q|Kt
13              and Ct ∈ min{ C | (∃rst C) ∈ Intent(γos) }
14          select ot ∈ Extent(Ct)
15          X ← X ∪ {ot}
16      else
17          // X contains a target for the current arc a:
18          // select a source in the extent of a concept that verifies the source query
19          // and is connected to the target
20          Let ot ∈ X ∩ Ot
21          Let Cs be the highest concept having Intent(Cs) ⊇ q|Ks
22              and ∃rst γot ∈ Intent(Cs)
23          select os ∈ Extent(Cs)
24          X ← X ∪ {os}

Algorithm 3: Case pure objects
1   Let a = (q|Ks, rst, Oq) with Oq ⊆ Ot
2   if X ∩ Os = {os} then
3       // X contains a source for the current arc a: verify that the source is connected to the objects in Oq
4       check ∀o ∈ Oq, ∃rst γo ∈ Intent(γos)
5   else
6       // X does not contain a possible source:
7       // select a source in the extent of a concept that verifies the source query
8       // and is connected to the target objects
9       Let Cs be the highest concept having Intent(Cs) ⊇ q|Ks
10          and ∀o ∈ Oq, ∃rst γo ∈ Intent(Cs)
11      select os ∈ Extent(Cs)
12      X ← X ∪ {os}

4.2 Variations about the algorithm
Integrating queries into the contexts. One approach that was investigated in the case of simple queries consists of integrating the virtual query object in the context and then building the concept lattice. This can also be done for relational queries. A relational query Q = (Aq, Ovq, Rq) can be integrated into an RCF by adding the virtual query objects ovq|Ki into the contexts Ki. Each virtual query object ovq|Ki owns the attributes aq|Ki of the query and, for each arc (ovq|Ki, rij, ovq|Kj), the relational context of rij is enriched by a line for ovq|Ki, a column for ovq|Kj and the relation (ovq|Ki, ovq|Kj)2. We generate the corresponding concept lattice family, considering the existential scaling3. Locating the highest concept that introduces all the attributes of each query, in each concerned context, is now much easier because that concept introduces the virtual query object. Then, we can navigate in a similar way as before.
Opportunities of browsing offered by the exploration.
As we explained before, the algorithm described in the previous section can be understood as an automatic procedure to determine a solution to a query. Nevertheless, it is more interesting to use it as a guiding method for the exploration of data by a human expert. Each object selection is a departure point for inspecting the objects of the selected concept, and, explore the neighborhood, going up by relaxing constraints or going down by adding constraints. A point in favor of the lattices is that they do not only give us a solution, but they also classify the objects of the solutions and provide a navigation structure. They also carry other information about the objects which can be useful for the expert: attributes that objects of the answer set have necessarily, attributes that appear simultaneously as attributes of the answer, etc. In our Web service application, we preferred the solution which integrates the query in RCF because it was easier to identify the answers. The lattices show how the existing objects match and differ from the query, thanks to the factorization of attributes between the query and the existing objects. Nevertheless, having several queries at the same time would not be efficient. Thus, the solution has been used only for specific problems. An incremental algorithm can be used to introduce the query, which enlightens the process of modifying the lattice and highlights the structure of the data. We can keep the original lattice (before query integration), and save the query objects together with the resulting concepts in an auxiliary structure. This way, we can always easily go back to the original lattices. 5 Related Work ER-compatible data, e.g., relational databases, and concept lattices have a long history of collaboration. First attempts to apply FCA to that sort of data go back to the introduction of concept graphs by R. Wille in the mid-90s [8]. The standard approach is rooted in the translation of an ER model into a power- context family (PCF) where basically everything is represented within a formal context [9]. Thus, inter-object links of various arities (i.e., tuples of different sizes) are reified and hence become formal objects of a dedicated context (one per arity). The overall reasoning is therefore uniformly based on the formal concepts. 2 See our example in Table http://www.lirmm.fr/~huchard/RCA_queries/ mexicoExistsWithQuery.rcft.html) 3 It is represented in Figure http://www.lirmm.fr/~huchard/RCA_queries/ mexicoExistsWithQuery.rcft.svg Querying Relational Concept Lattices 391 While this brings an undeniable mathematical strength in the formalization of the data processing and, in particular, querying, there are some issues with the expressiveness. Indeed, while formal concepts are typically based on a doubly universal quantification, the relational query languages mostly apply existential one. Alternatives to the PCF in the interpretation of concept graphs have been proposed that involve the notions of nested graphs and cuts [2]. It was shown that the resulting formalism, called Nested Query Graphs, have the same expressive power over relational data as first order predicate logic and hence can be used as a visual representation of most mainstream SQL queries. Existing approaches outside the concept graphs-based paradigm (see [3, 6]) follow a more conventional coding schema. 
Here inter-object links are modeled either through a particular sort of formal attributes or they reside in a differ- ent binary tables that match two sorts of individuals among them (instead of matching a set of individuals against a set of properties). Our own relational concept analysis framework is akin to this second category of approaches, hence our querying mechanisms are closer in spirit to those presented in the aforemen- tioned papers. For instance, in [3], the author proposes a language modeled w.r.t. SPARQL (the query language associated with the RDF language) to query relational data within the logical concept analysis (LCA) framework. The idea is to explore the relation structure of the data, starting from a single object and following its links to other objects. The language admits advanced constructs such as negation and disjunction and therefore qualifies as a fully-fledged relational query language. Recently, a less expressive language has been proposed in [6] for the brows- ing of a relational database content while taking advantage of the underlying conceptual structure. As the author himself admits, the underlying data for- mat used to ground the language semantics, the linked context family, is only slightly different from our own relational context family construct. The queries are limited here to conjunctions and existential quantifiers, yet variables are ad- mitted. Consequently, query topologies are akin to general graphs: In actuality, the browsing engine comprises a factorization mechanism enabling the discovery of identical extensions in the query graph which are subsequently merged. The downside of remaining free of the extensive commitments made by the concept graphs formalism both in terms of syntax and of semantics is the lack of unified methodological and mathematical framework beneath this second group of approaches. As a result, these diverge on a wide range of aspects which makes their in-depth comparison a hard task. First, there is an obvious query language expressiveness gap: On that axis, the two extremes are occupied by the LCA- and the RCA-based approaches, respec- tively, the former being the most expressive and the latter, the less expressive one. Then, the role played by the concept lattices vs. the query resolution is specific in each case. While in the LCA-based approach the concepts seem to be formed on the fly, in [6] the author seems to imply that they are constructed beforehand. Despite this distinction, in both cases the concept lattice is a sec- 392 Zeina Azmeh et al. ondary structure that supports query resolution. In our own approach however, lattices are not only constructed prior to querying, but they also incorporate relational information in the intents of their concepts. In this sense, they are the primary structures whereas the queries are intended as navigational support. 6 Conclusion In this paper, we have presented a query-based navigation approach that helps an expert to explore a concept lattice family. The approach was based on an ap- plication of Relational Concept Analysis to the selection of suitable Web services for instantiating an abstract service composition. There are many perspectives of this work. In our Web service experience, we tested other scaling operators (like the covers operator) that offers other results, and helps to find more easily the aggregate answers. The query language can be made more expressive (including quantifiers). 
For example, we can request dishes containing only {chicken, cheese, ...}, which means that the universal scaling operator shall be used in the RCA process for this particular relation. Besides, the query path can be calculated, rather than being defined by the expert, suggesting more efficient exploration paths. References 1. Azmeh, Z., Driss, M., Hamoui, F., Huchard, M., Moha, N., Tibermacine, C.: Selec- tion of composable web services driven by user requirements. In: ICWS. pp. 395–402. IEEE Computer Society (2011) 2. Dau, F., Correia, J.H.: Nested concept graphs: Applications for databases and math- ematical foundations. In: Contribution to ICCS 2003. Skaker Verlag (2003) 3. Ferré, S.: Conceptual navigation in RDF graphs with SPARQL-Like Queries. In: Kwuida, L., Sertkaya, B. (eds.) ICFCA. LNCS, vol. 5986, pp. 193–208. Springer (2010) 4. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical Foundations. Sprin- ger-Verlag (1999) 5. Huchard, M., Rouane-Hacene, M., Roume, C., Valtchev, P.: Relational concept dis- covery in structured datasets. Ann. Math. Artif. Intell. 49(1-4), 39–76 (2007) 6. Kötters, J.: Object configuration browsing in relational databases. In: Valtchev, P., Jäschke, R. (eds.) ICFCA. Lecture Notes in Computer Science, vol. 6628, pp. 151–166. Springer (2011) 7. Messai, N., Devignes, M.D., Napoli, A., Smaı̈l-Tabbone, M.: Querying a bioin- formatic data sources registry with concept lattices. In: Dau, F., Mugnier, M.L., Stumme, G. (eds.) ICCS. LNCS, vol. 3596, pp. 323–336. Springer (2005) 8. Wille, R.: Conceptual graphs and formal concept analysis. In: Lukose, D., Delugach, H.S., Keeler, M., Searle, L., Sowa, J.F. (eds.) ICCS. Lecture Notes in Computer Science, vol. 1257, pp. 290–303. Springer (1997) 9. Wille, R.: Formal concept analysis and contextual logic. In: Hitzler, P., Scharfe, H. (eds.) Conceptual Structures in Practice. pp. 137–173. Chapman and Hall/CRC (2009) Links between modular decomposition of concept lattice and bimodular decomposition of a context Alain Gély LITA, Ile du Saulcy, 57045 Metz Cedex 1 Université de Metz, France gely@univ-metz.fr Abstract. This paper is a preliminary attempt to study how modular and bimodular decomposition, used in graph theory, can be used on contexts and concept lattices in formal concept analysis (FCA). In a graph, a module is a set of vertices defined in term of behaviour with respect to the outside of the module: All vertices in the module act with no distinction and can be replaced by a unique vertex, which is a representation of the module. This definition may be applied to concepts of lattices, with slighty modifications (using order relation instead of adjacency). One can note that modular decomposition is not well suited for bipar- tite graphs. For example, every bipartite graph corresponding to a clar- ified context is trivially prime (not decomposable w.r.t modules). In [4], authors have introduced a decomposition dedicaced to bipartite graph, called the bimodular decomposition. In this paper, we show how modu- lar decomposition of lattices and bimodular decomposition of contexts interact. These results may be used to improve readability of a Hasse diagram. 1 Introduction Concept lattices are well suited to deal with knowledge representation and clas- sification, but when the number of concepts grows, it is not very convenient to visualize the Hasse diagram. 
To avoid this problem, some approaches keep only part of the concepts (Iceberg lattice [9] , usage of Galois sub-hierarchy [3], concepts with high stability score [7, 8] or any combinaison of these techniques); Others approaches try to obtain a more readable lattice by usage of a threeshold, as for α-galois lattices [10]. Another solution is to use decompositions to improve readability (see all chapter 4 of [5] and particularly nested diagrams). There is a lot of works in graph theory about decomposition of a graph. A classical and well studied decomposition is the modular decomposition (see for example [6]). This decomposition has great properties: possibility of replacing a set of vertices by a single representant, so that visualization of the graph is better understandable; recursive approach, so that one can go from generalities to finer c 2011 by the paper authors. CLA 2011, pp. 393–403. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 394 Alain Gély detail levels (useful for knowledge representation); nice theoretical properties, as the existence of a decomposition tree or closure properties for the family of modules. Modular decomposition of graphs may be adapted to lattices with only little changes: In a graph, this is the adjacency relation which is fundamental, but it is the order relation for lattices. Moreover, concept lattices are usually computed from a context, which can be considerated as a bipartite graph. So, there are two structures which can be decomposed: the concept lattice and the bipartite graph. Unfortunately, bipartite graphs are not good candidates for modular decom- position: except for twin vertices (vertices with the same neighbourghood) or con- nected components, there are no modules (except trivial one’s) in such graphs. To improve the decomposition of bipartite graph, the notion of bimodule is in- troduced in [4]. Goal of this paper is to study how bimodules of a bipartite graph interact with modules of the concept lattice of this context, and to see how it can be used to help the visualisation of information contained in lattices. The next section is dedicaced to definitions. Section 2.2 introduces modules of a graph and transposes the definition to lattice (modules of a lattice). Section 3 is about bimodules of a bipartite graph (context) and the links that exist with the corresponding concept lattice. After some discussion in section 4, we conclude the paper in section 5. 2 Preliminaries 2.1 Definitions In this paper, all discrete structures are finites and all graphs are simples (no loops neither multi-edges). Since this paper is about usages of graph theory results, a formal context will be considerated as a bipartite graph B = (O, A, I) with O (objects) and A (attributes) being two stable sets of vertices, and I (incidence relation between objects and attributes) the set of edges of B. For a vertex v, v ′ denotes the neighbourghood of v (vertices adjacents to v). For a subset V of vertices, V ′ denotes the common neighbourhood (vertices which are adjacent to every vertices of V ). With this notation, the classical definition of galois connections follows immediately. Definition 1 (Galois connections). For a set X ⊂ O, Y ⊆ A we define X ′ = {y ∈ O | xIy for all x ∈ X}, Y ′ = {x ∈ A | xIy for all y ∈ Y }. A clarified context is a context such that x′ = y ′ implie x = y for any vertices of O ∪ A. 
A clarified context is reduced if no vertex v is such that v ′ = V ′ with V ⊆ O ∪ A, v ̸∈ V . Links between modular decomp. of conc. lat. and bimodular decomp. of a 395 context A complete lattice L = (P, ≤, ∨, ∧) is a poset such that for all X ⊆ P , there exist a supremum and an infimum in P . j ∈ P is ∨-irreducible element if x∨y = j implies x = j or y = j. m ∈ P is a ∧-irreducible element if x ∧ y = m implies x = m or y = m. j covers a unique element j∗ (j∗ ≺ j), m is covered by a unique element m∗ (m ≺ m∗ ). We denote J the set of ∨-irreducible elements and M the set of ∧-irreducible elements. For a formal context C = (O, A, I) a formal concept is a pair (X, Y ), X ⊆ O, X ⊆ A and X ′ = Y and Y ′ = X. X is called the extent of the concept and Y is called the intent. The set of formal concepts ordered by inclusion on the intents is the concept lattice of C. For every finite lattice L = (P, ≤, ∨, ∧) there is, up to isomorphism, a unique reduced context C = (J, M, ≤). In the following of this paper, we will consider only reduced contexts, i.e. contexts such that O = J, the set of ∨-irreducible elements and A = M the set of ∧-irreducible elements of L. 2.2 Modules of graphs and lattices We denote a graph G with G = (V, E). V is the set of vertices and E a set of edges. Let X ⊂ V and s ∈ V \X. Then s distinguishes X if s′ ∩ X ̸= ∅ and s′ ∩ X ̸= X. That is, s is adjacent with some vertices of X and not adjacent with some others vertices of X. So, if no vertex distinguishes a set X, then for the outside of X and relation of adjacency, every vertex is similar and X can be viewed as a unique vertex. Definition 2 (Module, graph theory). A module in a graph is a subset of vertices that no vertex distinguishes. The graph which is obtained by the replacement of a module by a single vertex is called a quotient graph. It is a simplification of the original one (see Fig. 1). As no vertex distinguishes X (elements in dashed line), there exist only two possibilities for a vertex v not in X: either v is adjacent to every vertex of X (then there exists an edge between v and the representant of X) or v is adjacent with no vertex of X (then, there is no edge between v and the representant of X). For a graph G = (V, E), the set V and singletons x ∈ V are trivial modules. A graph without non trivial module is called a prime graph (for the modular decomposition). Two modules A and B overlap if no one is a subset of the other and A ∩ B ̸= ∅. A module which does not overlap another module is a strong module. Modules and strong modules are central in several decomposition processes and their properties have been well studied. In the first definitions, modules where defined with respect to the adjacency relation, but decompositions have been generalized (for example in [1]) for others properties of graphs. For a lattice, it is more natural to consider the order relation than an adja- cency relation, so a natural definition follows immediately: 396 Alain Gély (a) (b) Fig. 1. (a) A module in a graph and (b) the quotient graph Definition 3. For a lattice L = (P, ≤, ∨, ∧), a lattice module is a set of elements X ⊆ P such that, for every y ∈ P \X, one of the three following statements is true: – ∀x ∈ X, x < y; – or ∀x ∈ X, x > y; – or ∀x ∈ X, x||y. It is clear with this definition that a module in a lattice L is equivalent to a module (with respect to adjacency) in the graph obtained by transitive closure of the Hasse Diagram of L. ⊤ f g h M1 M2 a b c d e ⊥ (a) (b) Fig. 2. 
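Before moving from lattices to contexts, Definition 3 can be checked mechanically. The following Python sketch is ours, not the author's: the order relation is assumed to be given as a set of (x, y) pairs with x ≤ y, and the five-element lattice used as input is a hypothetical example.

# A sketch (illustrative only) of Definition 3: X is a lattice module if every element
# outside X is below all of X, above all of X, or incomparable to all of X.

def is_lattice_module(X, P, leq):
    """leq is the set of pairs (x, y) meaning x <= y (assumed reflexive and transitive)."""
    below = lambda a, b: (a, b) in leq
    for y in P - X:
        if all(below(x, y) for x in X):                              # y is above every element of X
            continue
        if all(below(y, x) for x in X):                              # y is below every element of X
            continue
        if all(not below(x, y) and not below(y, x) for x in X):      # y is incomparable to all of X
            continue
        return False                                                 # y distinguishes X
    return True

# Hypothetical lattice: bot < a, b1, b2 < top, with a, b1, b2 pairwise incomparable.
P = {"bot", "a", "b1", "b2", "top"}
leq = {(x, x) for x in P} | {("bot", x) for x in P} | {(x, "top") for x in P}
print(is_lattice_module({"b1", "b2"}, P, leq))    # True: no outside element distinguishes {b1, b2}
print(is_lattice_module({"bot", "b1"}, P, leq))   # False: a is above bot but incomparable to b1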
Two strong modules of lattice (a) and the quotient lattice (b). Since no vertex outside the module distinguishes vertices inside the module, it can be collapsed to a single vertex which is the representant of the module. Note that M2 can be recursively decomposed in two other modules {h} (trivial) and {d, e}. Let X ⊆ P be a subset of elements of a lattice L, with A = min(X) and B = max(X) the sets of minimal (resp. maximal) elements of X. X is a convex set iff for all y ∈ P such that a < y < b, a ∈ A, b ∈ B, then y ∈ X. If A and B are reduced to singletons, X is an interval. [A, B] denotes the convex set defined by the two sets A and B. Links between modular decomp. of conc. lat. and bimodular decomp. of a 397 context Lemma 1. Modules in lattices are convex sets. Proof. Suppose it is not, then there exists y ∈ P \X with a < y < b and so, y distinguishes a and b. It follows that X is not a module. From now, since lattices modules are convex sets, we will use the notation X = [A, B] to speak of a module X. Lemma 2. For a lattice module [A, B]: 1. if |A| > 1 then A ⊆ J, 2. if |B| > 1 then B ⊆ M . Proof. Suppose |A| > 1, and let A = a1 , a2 , . . . , an . Suppose ai ̸∈ J, then, since ai ||aj there exists at least one ∨-irreducible element j such that j < ai and j ̸< aj , which is a contradiction with the fact that [A, B] is a module. Dually proof applies for elements of B. Note that when |A| = 1 (dually for B) the maximal element of the module is not necessary an irreducible one (See Fig. 3). M1 M6 b b M2 c d M5 a a M3 M4 (a) (b) Fig. 3. (a) Module M2 is a convex set [A, B], with A = {a} ̸⊂ J and B = {b} ̸⊂ M . M1 , M2 and M3 overlap: there are not strong modules. (b) M4 , M5 and M6 are strong modules but are not intervals. M5 = [A, B], with A = {b, c} ⊂ J and B = {b, c} ⊂ M . Lemma 3. For a lattice module [A, B]: ∧ 1. if |A| > 2, ∨ A = ai ∧ aj for all ai , aj ∈ A. 2. if |B| > 2, B = bi ∨ bj for all bi , bj ∈ B. Proof. Clearly, suppose |A| > 2 and there exist ai , aj , ak ∈ A such that x1 = ai ∧ aj ̸= aj ∧ ak = x2 . w.l.o.g suppose x1 ̸< x2 . Then x1 < aj and x1 ̸< ak . It follows that x1 distinguishes [A, B]. 398 Alain Gély 3 Modules of lattices and bimodules of contexts As a preliminary remark, we recall that all considered contexts are reduced, and so, clarified. The clarification of a context is the fact to keep only one object o for all objects oi such that o′i = o′j (dually for attribute). It is clear that the set {o1 , . . . , on } is a module in the bipartite graph and this process is equivalent to replace twin vertices by a representant. Modules are not well situed for bipartite graphs. Twin vertices and connected components are the only modules for these graphs which are poorly decompos- able. In goal to improve the decomposition, Fouquet and all have introduced bimodule, an analog of module for bipartite graphs. Definition 4 (Bimodule). Let C = (O, A, I) be a bipartite graph, and (X, Y ) ⊂ (O, A), then (X, Y ) is a bimodule if no x ∈ O\X distinguishes A and no y ∈ A\Y distinguishes O. Example of bimodule is given in Fig. 4: b and c are not distinguished with respect to vertices 4 (none of them are adjacent) or 3 (each of them is adjacent). Similarly, 1 and 2 are not distinguished by a (each of them is adjacent) and d (none of them is adjacent). 1 2 3 4 a b c d Fig. 4. Example of bimodules The whole bipartite (O, A), all vertices and pairs (j, m), j ∈ J, m ∈ M are trivial modules. In the following, we consider only non trivial bimodules, i.e bimodules with at least 3 elements. 
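Definition 4 admits an equally direct test. The Python sketch below is our own illustration; the incidence relation of the small bipartite graph is re-typed from the description of Fig. 4 and from the concepts listed in Fig. 5, so it should be read as an assumption rather than as the paper's data file.

# A sketch (illustrative only) of Definition 4: (X, Y) is a bimodule of the bipartite
# graph (O, A, I) if no object outside X distinguishes Y and no attribute outside Y
# distinguishes X.

def distinguishes(v, S, I, v_is_object):
    """v distinguishes S if it is linked to some, but not all, elements of S."""
    linked = {s for s in S if ((v, s) in I if v_is_object else (s, v) in I)}
    return 0 < len(linked) < len(S)

def is_bimodule(X, Y, O, A, I):
    return (not any(distinguishes(o, Y, I, True) for o in O - X)
            and not any(distinguishes(a, X, I, False) for a in A - Y))

# Context of Fig. 4 (objects 1-4, attributes a-d), reconstructed from the text.
O = {1, 2, 3, 4}
A = {"a", "b", "c", "d"}
I = {(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "b"), (3, "c"), (3, "d"), (4, "d")}
print(is_bimodule({1, 2}, {"b", "c"}, O, A, I))   # True: the bimodule discussed above
print(is_bimodule({1, 3}, {"b"}, O, A, I))        # False: attribute a distinguishes {1, 3}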
Proposition 1. To any non trivial module X of lattice corresponds a bimodule of reduced context. Proof. By definition of a lattice module, no elements inside the modules are dis- tinguished by elements outside. It follows directly that no ∨-irreducible element outside the module distinguishes ∧-irreducible elements inside, and conversely. Now, we want to know how a bimodule on the context may be interpretated in the concept lattice. First, we define a set in the lattice L from a bimodule X. From a bimodule X = (J1 , M1 ) ⊆ (J, M ), we build a subset C of concepts in L such that: Links between modular decomp. of conc. lat. and bimodular decomp. of a 399 context – attributes concepts of X are in C, – objects concepts of X are in C, – C = [A, B] is a convex set, with A being maximal elements of C and B being minimal elements of C. As previously seen, a lattice module corresponds to a context bimodule but, with the previous construction, the converse may be false: There exist bimodules of a context such that [A, B] does not correspond to a module in the lattice. As an example, Fig. 5 shows the lattice for the bipartite graph in Fig. 4. The set [A, B] is bounded by a dashed line. Nevertheless, we can observe that, even if [A, B] is not a module, there exists a possibility of simplification, replacing a set of elements by two vertices and an edge. (abcd, ∅) (abcd, ∅) (ab, 1) (ac, 2) (bcd, 3) 12 (bcd, 3) (a, 12) (b, 13) (c, 23) (d, 34) (a, 12) bc (d, 34) (∅, 1234) (∅, 1234) Fig. 5. (a) lattice for bipartite graph in Fig. 4.a and (b) simplified lattice In the following, we show that when [A, B] is not a module, the shape of the set [A, B] is very constrained. Proposition 2. Let [A, B] be a convex set built from a non trivial bimodule X. If [A, B] is not a module, then 1. | |[A, B] ∩ M | − |[A, B] ∩ J| | ≤ 1 2. if |[A, B] ∩ M | = |[A, B] ∩ J|, then∨A ⊆ J and B ⊆ M 3. if |max([A, B] ∩ J)| > 1, a+ = ai , ai ∈ max([A, B] ∩ J) is such that ai ≺ a+ ∧ 4. if |min([A, B] ∩ M )| > 1, b− = bi , bi ∈ min([A, B] ∩ M ) is such that b− ≺ bi Proof. First, we show that | |[A, B] ∩ M | − |[A, B] ∩ J| | ≤ 1: Suppose that [A, B] is not a module of the concept lattice, then there exists an element x ̸∈ [A, B] which distinguishes [A, B]. Without lost of generality, we can consider that x ∈ J and there exist y ∈ [A, B] such that x < y. We denote P1 400 Alain Gély the set of elements of [A, B] which are greater than x and P2 = [A, B]\P1 . Since x is a ∨-irreducible element, it does not distinguish any ∧-irreducible elements in [A, B]. It follows that all ∧-irreducible elements in [A, B] are in P1 and none of them in P2 (or conversely). In a finite lattice, every element e is ∧-dense, i.e. equal to the infimum of ∧-irreducible elements greater than e. All ∨-irreducible elements in P2 cannot be distinguished by ∧-irreducible elements outside [A, ∧B]. One unique ∨-irreducible element jmax of P2 may be defined by jmax = mi , . . . , mj , with mi , . . . , mj ̸∈ P1 . All other ∨-irreducible elements in P2 are distinguished by ∧-irreducible elements in P1 (and only by these elements). So P2 ∩ J = X ∪ {jmax } (jmax may not exist). Suppose |X| < |[A, B] ∩ M |, then there exist m1 , m2 ∈ [A, B] ∩ M , j1 ∈ X such that j1 < m1 , j1 < m2 , m1 ||m2 . It follows that j1 < m1 ∧ m2 , which is impossible since j1 ∈ P2 and elements in P2 are not comparable to x. Similarly, suppose |X| > |[A, B] ∩ M |, At least one ∨-irreducible element j of X is smaller than two ∧-irreducible elements m1 and m2 of P1 , with m1 ||m2 . 
This is impossible, so |X| = |[A, B] ∩ M | and | |[A, B] ∩ M | − |[A, B] ∩ J| | ≤ 1. A ⊆ J and B ⊆ M follow directly of the fact that, by construction A and B contain irreducible elements and for each ∧-irreducible element m ∈ [A, B], there exists a ∨-irreducible element j ∈ [A, B] such that j < m. It remains ∨ to prove that, when max([A, B]∩J) contains at least two elements, a+ = ai , ai ∈ max([A, B]∩J) is such that ai ≺ a+ (and dually for b− ). Suppose it is not the case, then exist at least two elements x1 and x2 smaller than a+ and such that x1 and x2 distinguish elements in A. It follows that one can find a ∧-irreducible element which distinguishes ∨-irreducible elements in A and that is a contradiction. It follows from this proposition that even if a set [A, B] is not a module, it can be collapsed into two vertices j and m such that j < m (but maybe not j ≺ m). j is a representant for the set [A, B] ∩ J and m a representant for the set [A, B] ∩ M . Moreover, j ≺ a+ and b+ ≺ m. 4 Discussion 4.1 Algorithmic Aspects It is known that the family of modules of a graph (and so, of a lattice) and the family of bimodules of a bipartite graph are closed by intersection. Since the whole graph is a (trivial) module, it defines a lattice. So, for any set S of vertices, it is possible to use a closure operator to compute the smallest module which contains S. Algorithm 1 adds all vertices which distinguish respectively X and Y and the same process is repeated until no more vertex can be added. Usually, bimodules decomposition does not produce all possible modules, but an inclusion tree such that all possible bimodules can be deduced from this tree. The root represents the whole graph and the leaves are vertices (trivial Links between modular decomp. of conc. lat. and bimodular decomp. of a 401 context Input: (O, A, I) a bipartite graph, (X, Y ) ⊂ (O, A) Output: (Xc , Yc ), smallest bimodule containing (X, Y ) begin continue ← true; (Xc , Yc ) ← (X, Y ); while continue do continue ← f alse; forall the x ∈ J\Xc do if x distinguishes Yc then Xc ← Xc ∪ x; continue ← true; end end forall the y ∈ M \Yc do if y distinguishes Xc then Yc ← Yc ∪ y; continue ← true; end end end return (Xc , Yc ) end Algorithm 1: Computation of the smallest bimodule which contains (X, Y ) bimodules). It follows that the size of the tree is O(n), with n = |O| + |A|. In [1], authors propose a O(n3 ) algorithm to compute a such tree. 4.2 Decomposition and Real Data In Fig. 6, an example of bimodule is shown on the “Living Beings and Water” concept lattice [5]. g is the attribute for “can move around” and h is the one for “has limbs”. These two attributes are equivalent (cannot be distinguished) from the outside of the bimodule. So, on the lattice in Fig. 6.c these two attributes are collapsed, as well as objects 1 (Leech) and 2 (Bream). Further work must be done on real data to see what bimodules can enlight for practical cases. 5 Conclusion First, we have seen that modules defined on a lattice have natural links with bimodules of the bipartite graph (context) of this lattice. Modules of a lattice can be used the same way as modules of a graph are used: to produce a quo- tient lattice, which is a simplification of the original one. Recursive definition of modules allows to consider several details levels in the lattice. All results in modular decomposition may be transposed immediatly to con- cept lattice and associated context to improve the readibility of the lattice. 
402 Alain Gély b c d e f g h i 1 × × 2 × ×× 3 ×× ×× 4 × ××× 5 × × × 6 ××× × 7 ××× 8 ×× × (a) c b g h d f 1 2 5 8 4 i 3 e 7 6 (b) c b gh d f 12 5 8 4 i 3 e 7 6 (c) Fig. 6. (a) “Living Beings and Water ” Context [5], (b) Concept lattice for “living Beings and Water” and (c) the same concept lattice with a bimodule collapsed. Links between modular decomp. of conc. lat. and bimodular decomp. of a 403 context Second, investigation of bimodules properties shows that a bimodule may not correspond to a module of the lattice. Nevertheless, it remains possible to use it to produce a simplification of the original lattice. In such a case, the bimodule is collapsed in two elements a and b which represent ∨-irreducible elements and ∧-irreducible elements of the bimodule. This last case is a particular case of another decomposition proposed for inheritance hierarchies [2], called the block decomposition (with a different def- inition of block that the one in [5]): a block is an interval [a, b] such that only a and b can be distinguished of other vertices from the outside of the block. As a perspective behaviour of this decomposition for lattices and associated properties on the context can be investigated. References 1. m. Bui Xuan, B., Habib, M., Limouzy, V., Montgolfier, F.D.: Homogeneity vs. adja- cency: generalising some graph decomposition algorithms. In: In 32nd International Workshop on Graph-Theoretic Concepts in Computer Science (WG), volume 4271 of LNCS (2006) 2. Capelle, C.: Block decomposition of inheritance hierarchies. In: WG. pp. 118–131 (1997) 3. Encheva, S.: Galois sub-hierarchy and orderings. In: Proceedings of the 10th WSEAS international conference on Artificial intelligence, knowledge engineer- ing and data bases. pp. 168–171. AIKED’11, World Scientific and Engineer- ing Academy and Society (WSEAS), Stevens Point, Wisconsin, USA (2011), http://portal.acm.org/citation.cfm?id=1959485.1959517 4. Fouquet, J., Habib, M., de Montgolfier, F., Vanherpe, J.: Bimodular decomposi- tion of bipartite graphs. In: Graph-Theoretic Concepts in Computer Science 30th International Workshop, WG 2004, Bad Honnef, Germany, June 21-23 (2004) 5. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical Foundations. Springer-Verlag Berlin (1996) 6. Habib, M., Paul, C.: A survey of the algorithmic aspects of modular decomposition. Computer Science Review 4, 41–59 (2010) 7. Jay, N., Kohler, F., Napoli, A.: Analysis of social communities with iceberg and stability-based concept lattices. In: Proceedings of the 6th international confer- ence on Formal concept analysis. pp. 258–272. ICFCA’08, Springer-Verlag, Berlin, Heidelberg (2008), http://portal.acm.org/citation.cfm?id=1787746.1787765 8. Klimushkin, M., Obiedkov, S., Roth, C.: Approaches to the selection of relevant concepts in the case of noisy data. In: Kwuida, L., Sertkaya, B. (eds.) Proc. 8th Intl. Conf. Formal Concept Analysis. LNCS/LNAI, vol. 5986, pp. 255–266. Springer (2010) 9. Stumme, G., Taouil, R., Bastide, Y., Lakhal, L.: Conceptual clustering with iceberg concept lattices. In: In: Proc. of GI-Fachgruppentreffen Maschinelles Lernen’01, Universität Dortmund (2001) 10. Ventos, V., Soldano, H.: Alpha galois lattices: an overview. In: In: International Conference in Formal Concept Analysis (ICFCA05), LNCS. pp. 298–313. Springer (2005) Abduction in Description Logics using Formal Concept Analysis and Mathematical Morphology: Application to Image Interpretation Jamal Atif1 , Céline Hudelot2, and Isabelle Bloch3 1. 
1. Université Paris Sud, LRI - TAO, Orsay, France, jamal.atif@lri.fr
2. Ecole Centrale de Paris, France, celine.hudelot@ecp.fr
3. Telecom ParisTech - CNRS LTCI, Paris, France, isabelle.bloch@telecom-paristech.fr

Abstract. We propose an original way of enriching Description Logics with abduction reasoning services by computing the best explanations of an observation through mathematical morphology (using erosions) over the concept lattice of a background theory. The intended application is scene understanding and spatial reasoning.

Keywords: Abduction, Description Logics, FCA, Mathematical Morphology, Scene Understanding.

1 Introduction and notations

Scene interpretation can benefit from prior knowledge expressed as ontologies and from description logics (DL) endowed with spatial reasoning tools, as illustrated in our previous work [5, 6]. The challenge in this work was to derive reasoning tools that are able to handle in a unified way quantitative information supplied by the image domain and qualitative pieces of knowledge supplied by the ontology level. Object recognition and interpretation are seen as the satisfiability of a current situation (spatial configuration) encoded in the ABox of the DL and its TBox part. However, when the expert knowledge is not crisply consistent with the observations, which is common in image interpretation, this approach does not apply or leads to inconsistent results. Adapting DL reasoning tools to such situations can be performed using abduction. Our aim is thus to compute the "best explanation" of the observed phenomena in such situations. Formally, given a background theory K representing the expert knowledge and a formula C representing an observation on the problem domain, abductive reasoning searches for an explanation formula D such that D is satisfiable w.r.t. K and it holds that K |= D → C (K ∪ D |= C). We propose to add abductive reasoning tools to DL by associating ingredients from mathematical morphology, DL and Formal Concept Analysis (FCA), and by computing the best explanations of an observation through algebraic erosion over the concept lattice of a background theory, which is efficiently constructed using tools from FCA. We show that the defined operators satisfy important rationality postulates of abductive reasoning.

Based on the TBox T and the ABox A parts of a knowledge base K, we consider ABox abduction [3]: if for every a ∈ A it holds that K ⊭ ¬a, an ABox Abduction Problem, denoted ⟨K, A⟩, consists in finding a set of assertions γ such that K ∪ γ |= A. The set γ (consistent with K) is said to be an explanation of A. Explanatory reasoning is concerned with preferred explanations rather than just plain explanations. So, explaining an observation requires that some formulas be "selected" as preferred explanations. We also rely on classical notions of FCA, and denote a formal context by K = (G, M, I), where G is the set of objects, M the set of attributes and I ⊆ G × M a relation between the objects and attributes. For X ⊆ G and Y ⊆ M, the derivation operators are denoted by α and β, with α(X) = {m ∈ M | ∀g ∈ X, (g, m) ∈ I}, and β(Y) = {g ∈ G | ∀m ∈ Y, (g, m) ∈ I}.
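As a concrete reading of these derivation operators, here is a minimal Python sketch; the set-based representation of the context is an assumption made purely for illustration.

    # Derivation operators alpha and beta on a formal context (G, M, I),
    # with I given as a set of (object, attribute) pairs.

    def alpha(X, M, I):
        """Attributes shared by every object in X."""
        return {m for m in M if all((g, m) in I for g in X)}

    def beta(Y, G, I):
        """Objects having every attribute in Y."""
        return {g for g in G if all((g, m) in I for m in Y)}

    # (beta(alpha(X)), alpha(X)) is the formal concept generated by X.
    G = {1, 2, 3}
    M = {"a", "b"}
    I = {(1, "a"), (1, "b"), (2, "a")}
    X = {1, 2}
    print(alpha(X, M, I))              # {'a'}
    print(beta(alpha(X, M, I), G, I))  # {1, 2}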
The concept lattice is defined from the classical partial ordering (X1, Y1) ≤ (X2, Y2) ⇔ X1 ⊆ X2 (⇔ Y2 ⊆ Y1). Links between FCA and DL can be formalized via the notion of semantic context KT := (G, M, I) defined as [1]: G := {(I, d) | I is a model of T and d ∈ ∆I}, M := {m1, . . . , mn}, and I := {((I, d), m) | d ∈ mI}, where I = (∆I, .I) denotes an interpretation. The lattice can be constructed using the distributive concept exploration algorithm [9].

2 Abduction Operators from Mathematical Morphology on Complete Lattices

Let (L, ⪯) and (L′, ⪯′) be two complete lattices (which do not need to be equal). An operator δ : L → L′ is a dilation if it commutes with the supremum. An operator ε : L′ → L is an erosion if it commutes with the infimum. Classical properties of mathematical morphology operators on complete lattices can be found in [4, 8].

Here, with the aim of performing ABox abduction, we would like to reason on subsets of G in order to find their best explanations (in G). Hence we consider the complete lattice (P(G), ⊆) and operations from P(G) into P(G), where P(G) is the set of subsets of G. Since the ordering on G is equivalent to the one on M, reasoning on G will directly lead to results on M. In order to define explicit operations on P(G), we will make use of particular erosions and dilations, called morphological ones [8], which involve the notion of structuring element, i.e. a binary relation b between elements of G. For g ∈ G, we denote by b(g) the set of elements of G in relation with g. It can typically be derived from a distance d: b(g) = {g′ ∈ G | ∃X ∈ P(G), g′ ∈ X, d({g}, X) ≤ 1}. The morphological erosion of X is then expressed as εb(X) = {g ∈ G | b(g) ⊆ X}. Defining b from a distance is particularly interesting in the context of abduction, where the "most central" parts of models have to be defined. Erosion is then expressed as εn(X) = {g ∈ G | d(g, XC) > n}, where XC denotes the complement of X in G. Here G is a discrete finite space, and therefore only integer values of n are considered. All classical properties of mathematical morphology hold in this framework.

Last Non-empty Erosion. As shown in [2] in the framework of propositional logic, erosions can be used to find explanations. In this context, the idea was to find the most central part of a formula as the best explanation. This approach was shown to have good properties with respect to rationality postulates of abductive reasoning [7]. In this paper, we propose similar ideas, but adapted to the context of concept lattices, using erosions as defined above. For any X ⊆ G, we define its last erosion as εℓ(X) = εn(X) ⇔ εn(X) ≠ ∅ and ∀m > n, εm(X) = ∅. This last non-empty erosion defines the subset of models in G that are the furthest from the complement of X (according to the distance d), i.e. the most central in X.

Definition 1. Let A be a set of ABox assertions. A preferred explanation γ of A is defined from the last non-empty erosion as A ⊲ℓne γ ⇔def γI ⊆ εℓ(AI). In this equation, AI should be understood as the extent of the semantic concept associated with the DL concept A. When a constraint H (e.g. a set of hypotheses belonging to the background theory) has to be introduced, this definition is modified as A ⊲ℓne γ ⇔def γI ⊆ εℓ(HI ∩ AI), as illustrated by the sketch below.
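A minimal Python sketch of these erosions, assuming G is a finite set of models and dist an integer-valued distance between models; both the distance and the set representation are illustrative assumptions, not the authors' implementation.

    # Morphological erosion and the last non-empty erosion on (P(G), ⊆).

    def erosion(X, G, dist, n):
        """epsilon_n(X) = {g in X | d(g, G \\ X) > n} (all of X if G \\ X is empty).
        Restricting to g in X is equivalent to the definition, since d(g, X^C) = 0
        for g outside X."""
        complement = G - X
        if not complement:
            return set(X)
        return {g for g in X if all(dist(g, h) > n for h in complement)}

    def last_nonempty_erosion(X, G, dist):
        """epsilon_l(X): erode until the next erosion would be empty.
        With an integer-valued distance, epsilon_0(X) = X, so X is the start."""
        current = set(X)
        n = 0
        while True:
            nxt = erosion(X, G, dist, n + 1)
            if not nxt:
                return current          # the most central subset of X
            current, n = nxt, n + 1

    def is_preferred_explanation(gamma_ext, A_ext, G, dist):
        """Definition 1: gamma explains A iff its extent lies inside the
        last non-empty erosion of A's extent."""
        return gamma_ext <= last_nonempty_erosion(A_ext, G, dist)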
Starting from the subset to be explained, performing successive erosions amounts to "going down" in the lattice as much as possible, in order to find a non-empty set of interpretations.

Last Consistent Erosion. Another idea to introduce the constraint H is to erode it, as long as it remains consistent with A. This leads to a second explanatory relation.

Definition 2. A preferred explanation γ of A is defined from the last consistent erosion as A ⊲ℓc γ ⇔def γI ⊆ εℓc(HI, AI) ∩ AI, where AI corresponds to the extent of the semantic context and εℓc is the last consistent erosion defined as εℓc(HI, AI) = εn(HI) with n = max{k | εk(HI) ∩ AI ≠ ∅}. Here we consider erosion of H (i.e. HI) alone, which means that we are looking at the subsets (submodels) of the models of A while remaining as far inside the constraint as possible.

Properties and interpretations. A first important property is that reasoning on G actually amounts to reasoning on the whole formal context. Here, explanations were defined from ABox reasoning, leading to erosions of subsets of G (models). Let (X, Y) be a formal concept, with X ⊆ G and Y ⊆ M. From the definitions of explanations of X, we can directly derive the corresponding concepts for Y, using the derivation operator, i.e. α(γ) = {m ∈ M | ∀g ∈ γ, (g, m) ∈ I}. Note that eroding X amounts to dilating Y, which is in accordance with the correspondence between the Galois connection property of the derivation operators and the adjunction property of dilation and erosion.

Let us now consider the rationality postulates introduced in [7] for explanation relations. It has been proved that most of them hold for explanations derived from the last non-empty erosion and from the last consistent erosion [2]. These results extend to the DL context as follows:
- Both ⊲ℓne and ⊲ℓc are independent of the syntax (since they are computed on models).
- Definitions are consistent in the sense that K ⊭ ¬A iff ∃γ, A ⊲ γ.
- A reflexivity property holds for both definitions: if A ⊲ γ, then γ ⊲ γ.
- Disjunctions of explanations: if A ⊲ γ and A ⊲ δ, then A ⊲ (γ ⊔ δ), for both definitions. This means that if there are several possible explanations, their disjunction is an explanation as well, which is an expected result.
- Disjunction on the left: if C ⊲ℓc γ and D ⊲ℓc γ, then (C ⊔ D) ⊲ℓc γ (since the erosion is always performed on HI). However, this property does not hold for ⊲ℓne since erosion does not commute with the supremum.
- For the same reasons, we have the following property for ⊲ℓc: if C ⊲ℓc γ and D ⊲ℓc δ, then (C ⊔ D) ⊲ℓc γ or (C ⊔ D) ⊲ℓc δ; it does not hold for ⊲ℓne.
- For conjunctions, we have a monotony property for ⊲ℓc: if C ⊲ℓc γ and γI ⊆ DI (i.e. D |= γ), then (C ⊓ D) ⊲ℓc γ. For ⊲ℓne, only a weaker form holds: if C ⊲ℓne γ and D ⊲ℓne γ, then (C ⊓ D) ⊲ℓne γ. Note that this weaker form is also very natural and interesting.

Since both the ⊲ℓne and ⊲ℓc operators perform erosion in the interpretation set ∆I, any solution belongs to this set and K is a model of the obtained solution. Hence we have the following theorems:
- Soundness: if ∃γ | A ⊲ γ, then K |= γ.
- Completeness: K |= γ ⇒ ∃A | K |= A : A ⊲ γ.

3 Conclusion

With the aim of image interpretation, we have proposed abductive inference services in DL based on mathematical morphology over concept lattices, whose construction exploits the advances of using FCA in DL.
The properties and interpretations of the introduced explanatory operators were analyzed, and the rationality postulates of abductive reasoning were stated and extended to our context. Future work will concern the complexity analysis of these operators and associated algorithms, and a deeper investigation of their applications to image interpretation.

References

1. F. Baader. Computing a minimal representation of the subsumption lattice of all conjunctions of concepts defined in a terminology. In Knowledge Retrieval, Use and Storage for Efficiency: 1st International KRUSE Symposium, pages 168–178, 1995.
2. I. Bloch, R. Pino-Pérez, and C. Uzcátegui. Explanatory relations based on mathematical morphology. In ECSQARU 2001, pages 736–747, Toulouse, France, September 2001.
3. C. Elsenbroich, O. Kutz, and U. Sattler. A case for abductive reasoning over ontologies. In OWL: Experiences and Directions, Athens, Georgia, USA, 2006.
4. H. J. A. M. Heijmans and C. Ronse. The algebraic basis of mathematical morphology – Part I: Dilations and erosions. Computer Vision, Graphics and Image Processing, 50:245–295, 1990.
5. C. Hudelot, J. Atif, and I. Bloch. Fuzzy spatial relation ontology for image interpretation. Fuzzy Sets and Systems, 159(15):1929–1951, 2008.
6. C. Hudelot, J. Atif, and I. Bloch. Integrating bipolar fuzzy mathematical morphology in description logics for spatial reasoning. In European Conference on Artificial Intelligence ECAI 2010, pages 497–502, Lisbon, Portugal, August 2010.
7. R. Pino-Pérez and C. Uzcátegui. Jumping to explanations versus jumping to conclusions. Artificial Intelligence, 111:131–169, 1999.
8. J. Serra. Image Analysis and Mathematical Morphology. Academic Press, New York, 1982.
9. G. Stumme. Distributive concept exploration – a knowledge acquisition tool in formal concept analysis. In KI-98: Advances in Artificial Intelligence, pages 117–128. Springer, 1998.

A local discretization of continuous data for lattices: Technical aspects

Nathalie Girard, Karell Bertet and Muriel Visani
Laboratory L3i, University of La Rochelle, France
{ngirar02, kbertet, mvisani}@univ-lr.fr

Abstract. In recent years, Galois lattices (GLs) have been used in data mining, and defining a GL from complex (i.e. non-binary) data is a recent challenge [1,2]. Indeed, a GL is classically defined from a binary table (called a context), and therefore in the presence of continuous data a discretization step is generally needed to convert continuous data into discrete data. Discretization is classically performed before the GL construction, in a global way. However, local discretization is reported to give better classification rates than global discretization when used jointly with other symbolic classification methods such as decision trees (DTs). Using a result of lattice theory bringing together sets of objects and specific nodes of the lattice, we identify subsets of data on which to perform a local discretization for GLs. Experiments are performed to assess the efficiency and the effectiveness of the proposed algorithm compared to global discretization.

1 Discretization process

The discretization process consists in converting continuous attributes into discrete attributes [3]. This conversion can produce scaled attributes or disjoint intervals; we focus on the latter. Such a transformation is necessary for some classification models, such as symbolic models, which cannot handle continuous attributes [4]. Consider a continuous data set D = (O, F), where each object in O is described by p continuous attributes in F.
The discretization process is performed by iterating an attribute-splitting step, according to a splitting criterion (entropy [3], Gini [5], χ² [6], ...), until a stopping criterion S is satisfied (a maximal number of intervals to create, a purity measure, ...). More formally, for one discretization step, for selecting the best attribute to be cut, let (v_1, . . . , v_N) be the sorted values of a continuous attribute V ∈ F. Each v_i corresponds to a value taken by one object of the data set D. The set of possible cut-points is C_V = (c_V^1, . . . , c_V^{N−1}), where c_V^i = (v_i + v_{i+1}) / 2 for all i ≤ N − 1. The best cut-point, denoted c*_V, is defined by:

    c*_V = argmax_{c_V^i ∈ C_V} gain(V, c_V^i, D)    (1)

where gain(V, c, D) denotes, in a generic manner, the splitting criterion computed for the attribute V, the cut-point c ∈ C_V and the data set D. The best attribute, denoted V*, is the V ∈ F maximizing the splitting criterion computed for its best cut-point (i.e. c*_V):

    V*(D) = argmax_{V ∈ F} gain(V, c*_V, D)    (2)

Finally, for one discretization step, the attribute V* is divided into two intervals, [v_1, c*_{V*}] and ]c*_{V*}, v_N], and the process is repeated.

This process can be run using, at each step, all the objects in the training set. This is global discretization. It can also be run during model construction, considering, at each step, only a part of the training set. This is local discretization. In [7], Quinlan shows that local discretization improves supervised classification with decision trees (DTs) as compared with global discretization. In DT construction, the growing process is iterated until S is satisfied. Local discretization is performed on the subset of objects in the current node to select its best attribute (V*(node)), according to the splitting criterion. Given the structural links between DTs and Galois lattices (GLs) [8], we propose a local discretization algorithm for GLs and compare its performance with global discretization.

2 Local discretization for Galois lattices

A GL is generally defined from a binary relation R between objects O and binary attributes I (i.e. a binary data set, also called a formal context), denoted as a triplet T = (O, I, R). A GL is composed of a set of concepts (a concept (A, B) is a maximal objects-attributes subset in relation), ordered by a generalization/specialization relation. For more details on GL theory, notation and their use in classification tasks, please refer to [9,10]. To define a local discretization for GLs, we have to identify at each discretization step the subset of concepts to be processed. Given a subset of objects A ∈ P(O), there always exists a smallest concept M containing this subset, identified in lattice theory as a meet-irreducible concept of the GL [11]. Moreover, it is possible to compute the set of meet-irreducibles directly from the context, so the generation of the lattice is unnecessary [12]. Consequently, local discretization is performed on the set of meet-irreducible concepts MI which do not satisfy S. Attributes in MI are locally discretized: the best attribute V*(M) for each M ∈ MI is computed according to eq. (3); then the best one, V*(MI) (eqs. (4), (5)), for the whole set MI is split into two intervals as explained before (a minimal sketch of this gain-based selection is given below).
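For concreteness, here is a minimal Python sketch of the selection in eqs. (1)–(2), using entropy-based information gain as the splitting criterion (the paper's experiments use χ²; entropy is chosen here only to keep the sketch short). The data representation, a list of (value, class label) pairs per attribute, and the function names are illustrative assumptions, not the authors' implementation.

    # Best cut-point per attribute (eq. (1)) and best attribute overall (eq. (2)).
    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0

    def gain(pairs, cut):
        """Information gain of splitting (value, label) pairs at `cut`."""
        left = [lab for val, lab in pairs if val <= cut]
        right = [lab for val, lab in pairs if val > cut]
        labels = [lab for _, lab in pairs]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        return entropy(labels) - weighted

    def best_cut_point(pairs):                       # eq. (1)
        values = sorted({val for val, _ in pairs})
        cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
        return max(cuts, key=lambda c: gain(pairs, c)) if cuts else None

    def best_attribute(dataset):                     # eq. (2); dataset: {name: pairs}
        candidates = {V: best_cut_point(pairs) for V, pairs in dataset.items()}
        V_star = max((V for V, c in candidates.items() if c is not None),
                     key=lambda V: gain(dataset[V], candidates[V]))
        return V_star, candidates[V_star]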
The context T is then updated with these new intervals, and its set MI of meet-irreducible concepts is recomputed. The process is iterated until all M ∈ MI verify the stopping criterion S. The context T is initialized with, for each continuous attribute, one interval (i.e. one binary attribute) containing all continuous values observed in D; thus each object is in relation with every binary attribute of T. The GL of the initial context T contains only one concept, (O, I), which is a meet-irreducible concept and is used to initialize MI. See [13] for more details on the algorithm.

The main difference with DTs is that splitting an attribute in a GL impacts all the other concepts of the GL that contain this attribute and, due to the order relation ≤ between concepts, the structure of the GL is also modified. In contrast, when an attribute is split in a DT node, predecessors and other branches are not impacted. In order to select the best V*(MI) over all the concepts sharing this attribute, we introduce different ways of computing V*(MI).

Let MI = {Dq = (Aq, Bq); q ≤ Q} be the set of meet-irreducible concepts not satisfying S. The best attribute V*(Dq), associated with its best cut-point, is first computed for each concept Dq ∈ MI:

    V*(Dq) = argmax_{V ∈ Bq} gain(V, c*_V, Dq)    (3)

where c*_V is defined by (1) for Dq instead of D. Let us define I*_MI = {V*(D1), . . . , V*(DQ)} as the set of best attributes associated with each concept in MI. The best attribute V*(MI) among I*_MI can be defined in two different ways.

By local discretization: Local discretization selects the best attribute V ∈ I*_MI as the one having the best gain for MI:

    V*(MI) = argmax_{V*(Dq) ∈ I*_MI} gain(V*(Dq), c*_{V*(Dq)}, Dq)    (4)

By linear local discretization: Linear local discretization takes into account that the split of one attribute V ∈ I*_MI in a concept Dq can impact the other concepts. So we compute a linear combination of the criterion as the sum of the gains over the concepts Dq0 ∈ MI containing this attribute V, weighted by the relative extent sizes. The selected attribute is the one that gives the best linear combination:

    V*(MI) = argmax_{V ∈ I*_MI} Σ_{Dq0 ∈ MI, V ∈ Bq0} ( |Aq0| / Σ_{Dq ∈ MI} |Aq| ) gain(V, c*_V, Dq0)    (5)

3 Experimental comparison

The study is performed on three supervised databases from the UCI Machine Learning Repository¹: the Image Segmentation database (Image1), the Glass Identification database (GLASS) and the Breast Cancer database (BREAST Cancer). We also use one supervised data set stemming from the GREC 2003 database², described by the statistical Radon signature (GREC Radon). Table 1 presents the complexity of the lattice structure associated with each discretization algorithm and the classification performance obtained using each GL by navigation [14] and using CHAID as a DT classifier [6]. Discretization is performed in each case with χ² as a supervised splitting and stopping criterion.

4 Conclusion

The study [3] shows that for DTs, local discretization induces more complex structures compared to global discretization; Table 1 shows that for GLs, on the contrary, local discretization reduces the structures' complexity. In [7], Quinlan proves that local discretization improves the classification performance of DTs compared to global discretization; as for DTs, Table 1 shows that local discretization improves the classification performance of GLs.
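To make the two selection rules of Section 2 concrete, here is a minimal Python sketch of eq. (4) versus eq. (5). The concept representation (extent 'A', intent 'B', per-attribute (value, label) pairs 'data'), the helper names, and the use of per-concept cut-points are illustrative assumptions rather than the authors' implementation; gain_fn and cut_fn stand for the criterion of eq. (1).

    def best_attribute_of(D, gain_fn, cut_fn):
        """Eq. (3): best (gain, attribute, cut) of one concept D, or None."""
        best = None
        for V in D['B']:
            cut = cut_fn(D['data'][V])
            if cut is None:
                continue
            score = gain_fn(D['data'][V], cut)
            if best is None or score > best[0]:
                best = (score, V, cut)
        return best

    def local_choice(MI, gain_fn, cut_fn):
        """Eq. (4): among the per-concept winners, keep the single best gain."""
        winners = [best_attribute_of(D, gain_fn, cut_fn) for D in MI]
        return max((w for w in winners if w is not None), key=lambda w: w[0])

    def linear_local_choice(MI, gain_fn, cut_fn):
        """Eq. (5): weight each candidate's gain by |A_q| over the concepts containing it."""
        total_extent = sum(len(D['A']) for D in MI)
        candidates = {w[1] for w in (best_attribute_of(D, gain_fn, cut_fn) for D in MI) if w}
        def weighted_gain(V):
            score = 0.0
            for D in MI:
                if V in D['B']:
                    cut = cut_fn(D['data'][V])
                    if cut is not None:
                        score += (len(D['A']) / total_extent) * gain_fn(D['data'][V], cut)
            return score
        return max(candidates, key=weighted_gain)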
1 http://archive.ics.uci.edu/ml
2 www.cvc.uab.es/grec2003/symreccontest/index.htm

Table 1. Structure complexity and classification performance

                       Nb concepts                      Recognition rates
                   Local  Linear Local  Global     Local  Linear Local  Global  CHAID
    Image1           527           649   12172     90.33         91.57   82.23  90.95
    GLASS           1950          2128    2074     71.11         72.60   73.18  63.72
    BREAST Cancer   3608          2613    7784     91.66         91.23   90.05  93.47
    GREC Radon        69            92    2192     90.43         90.17   81.42  92.94

References

1. Ganter, B., Kuznetsov, S.: Pattern structures and their projections. In Delugach, H., Stumme, G., eds.: Conceptual Structures: Broadening the Base. Volume 2120 of LNCS. (2001) 129–142
2. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181 (2011) 1989–2001
3. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Machine Learning: Proc. of the Twelfth International Conference, Morgan Kaufmann (1995) 194–202
4. Muhlenbach, F., Rakotomalala, R.: Discretization of continuous attributes. In Reference, I.G., ed.: Encyclopedia of Data Warehousing and Mining. J. Wang (2005) 397–402
5. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth Inc., 358 pp (1984)
6. Kass, G.: An exploratory technique for investigating large quantities of categorical data. Applied Statistics 29(2) (1980) 119–127
7. Quinlan, J.: Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4 (1996) 77–90
8. Guillas, S., Bertet, K., Visani, M., Ogier, J.M., Girard, N.: Some links between decision tree and dichotomic lattice. In: Proc. of the Sixth International Conference on Concept Lattices and Their Applications, CLA 2008 (2008) 193–205
9. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical Foundations. Springer Verlag, Berlin, 284 pp (1999)
10. Fu, H., Fu, H., Njiwoua, P., Nguifo, E.M.: A comparative study of FCA-based supervised classification algorithms. In: Concept Lattices. Volume LNCS 2961. (2004) 219–220
11. Birkhoff, G.: Lattice Theory. Third edn. Volume 25. American Mathematical Society, 418 pp (1967)
12. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (ed.) Ordered Sets, Dordrecht-Boston, Reidel (1982) 445–470
13. Girard, N., Bertet, K., Visani, M.: Local discretization of numerical data for Galois lattices. In: Proceedings of the 23rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2011 (2011) to appear.
14. Visani, M., Bertet, K., Ogier, J.M.: Navigala: an original symbol classifier based on navigation through a Galois lattice. International Journal of Pattern Recognition and Artificial Intelligence, IJPRAI 25 (2011) 449–473

Formal Concept Analysis on Graphics Hardware

W. B. Langdon, Shin Yoo, and Mark Harman
CREST centre, Department of Computer Science, University College London
Gower Street, London WC1E 6BT, UK

Abstract. We document a parallel non-recursive beam search GPGPU FCA CbO-like algorithm written in nVidia CUDA C and test it on software module dependency graphs. Despite removing repeated calculations and optimising data structures and kernels, we do not yet see major speed ups. Instead GeForce 295 GTX and Tesla C2050 report 141 072 concepts (maximal rectangles, clusters) in about one second. Future improvements in graphics hardware may make GPU implementations of Galois lattices competitive.
Keywords: software module clustering, MDG, close-by-one, arithmetic intensity

1 Introduction

Formal Concept Analysis [7] is a well-known technique for grouping objects by the attributes they have in common. It can be thought of as discrete data clustering. In general the number of conceptual clusters grows exponentially. However, there are a few specialised algorithms which render FCA manageable, even on quite large problems, provided the object-attribute table is sparse [10]. Krajca, Outrata and Vychodil [10] report considerable improvement in FCA algorithms in the last two decades. All these successful algorithms use depth first tree search to find all the conceptual clusters in an object-attribute table.

Computer graphics gaming cards (GPUs) are relatively cheap and yet offer far more computing power than the computer's CPU alone. (E.g. a 295 GTX contains 480 fully functioning processors and yet costs only a few hundred pounds.) Also, microprocessor trends suggest faster computing will require parallel computing in future. There are already hundreds of millions of computers fitted with graphics hardware which might be used for general purpose computing [3].

Krajca et al. [10] report using a distributed computer to overcome the "major drawback [of FCA's] computational complexity". They report their parallel algorithm PCbO gives a near linear speed increase with the number of computing nodes in a network of up to 15 PCs. In other work [11] they conclude that there is no universal best FCA data structure. Instead they suggest that the optimum performance will depend upon the application. In earlier work, Huaiguo Fu had created a parallel implementation of NextClosure, but it was limited to 50 attributes [5]; this was subsequently greatly extended [6]. However, like Krajca et al. [10], both Fu's approaches [5, 6] use conventional distributed computers composed of a few CPUs rather than hundreds of GPU processing elements. Similarly, Djoufak Kengue et al.'s ParCIM implementation [4] used a conventional network of 8 computers connected in a star fashion with MPI. Ours is the first FCA implementation to run in parallel on computer graphics cards (GPUs).

2 CUDA FCA Implementation

Although In-close [1] claims to be faster, we easily obtained FCbO [9] from Source Forge. We initially implemented the Krajca sequential algorithm [9] in Python. This was followed by a version in CUDA C, where ComputeClosure is implemented in parallel on the GPU. (For details see our technical report [13].) Krajca's routines ComputeClosure and GenerateFrom essentially form a depth first search algorithm which builds and navigates a tree of formal concepts from a binary 0/1 matrix describing which object has which property. Since the search is recursive and operates on one point in the tree at a time, it is unsuitable for parallel operation on graphics cards.

Our graphics card parallel version retains the tree but uses beam search rather than depth first search. Instead of proceeding to the first leaf of the tree, recursively backing up and then going forward to the next leaf and so on, in beam search we also start from the top of the tree and then proceed along every branch to the next level.
This requires saving information on the beam for every node at that level. Beam search next expands the search again to cover everything at the next level, and so on until all the leaves of the tree have been reached. Notice that, instead of working on a single point in the tree, the beam covers many points which can be worked on in parallel. Indeed, within a couple of levels we can get a beam containing tens of thousands of individual search points which can be processed independently. This suits the GPU architecture, which needs literally thousands of independent processing threads for it to deliver its best performance [12].

You will have spotted that in an exponential problem, like FCA, beam search quickly runs out of memory. Even for quite modest tree depths the beam width is limited by the available space on the GPU card. (We have a configuration limit of 1.8 million simultaneous parallel operations.) When a beam search exceeds this limit, only the first 1.8 million searches are loaded onto the GPU and the rest of the beam is queued on the host PC. (Although we have not done this, in multi-GPU systems it would be possible to split the beam between the GPUs, allocating up to 1.8 million to each GPU.) The GPU only searches to the next level. It returns the concepts found by the searches and the newly discovered branches which remain to be searched. The concepts are printed by the host PC and the new branches are added to the end of the beam to await their turn. Effectively the beam becomes a queue of points in the tree waiting to be searched. The number of parallel searches is mostly limited by the need to have space on the GPU for all the potential new branches. This depends upon the tree's fan out, which is problem dependent. Nonetheless the GPU can manage modest real software engineering examples (e.g. dependence clustering of the Linux kernel). Notice the beam will contain a mixture of pending search points at different depths in the tree.

Table 1. Performance on FCA benchmarks, random module dependency graphs, and software engineering datasets [8]. Time given in seconds, except the longest Python run which is hours:mins:secs. (For the 295 GTX and Tesla C2050 the total time on the GPU is given.)

    Dataset          Size      Density  Concepts  FCbO  Python    295 GTX  C2050
    krajca           5×7       54%            16  0.00      0.11     0.01   0.01
    wiki             10×5      44%            14  0.00      0.03     0.00   0.00
    random           10×10     20%            16  0.00      0.04     0.00   0.00
    random           100×100    2%           137  0.00      0.40     0.02   0.01
    random           200×200    2%           420  0.00      4.33     0.00   0.01
    random           500×500    2%          2861  0.01    162.60     0.02   0.02
    bison            37×37     24%           692  0.00      0.32     0.00   0.01
    compiler         33×33      6%            24  0.00      0.05     0.00   0.00
    dot              42×42     28%          1302  0.00      0.71     0.00   0.01
    grappa           86×86      7%           850  0.00      2.54     0.01   0.01
    incl             172×172    2%           238  0.00      1.84     0.00   0.01
    ispell           24×24     34%           432  0.00      0.15     0.01   0.01
    linuxConverted   955×955    2%        141072  0.73  15:42:51     1.79   0.93
    mtunis           20×20     29%           110  0.00      0.05     0.00   0.01
    rcs              29×29     37%          1074  0.00      0.46     0.01   0.02
    swing            413×413    2%          3654  0.01    208.71     0.03   0.02

3 Results

FCbO (version 2010/10/05) was downloaded and compiled without changes on a 2.66 GHz PC with 3 Gigabytes of RAM running 64-bit CentOS 5.0. The performance of FCbO, our Python code and our CUDA code on two types of GPU are given in Table 1. They show performance on two benchmark problems, a selection of randomly generated symmetric object-attribute pairings, and software module dependency graphs of real world example programs.

4 Discussion

It is unclear why our code does not do better.
We would expect a linear speed advantage for FCbO from both using 64-bit operations and using compiled rather than interpreted code. However, on sizable examples the ratio between the speed of FCbO and that of our Python code is huge. This hints that FCbO has some algorithmic advantage.

GPUs are often limited by the time taken to move data rather than to perform calculations. "Arithmetic intensity" is the ratio of calculations per data item. Typically this is in the range 4–64 FLOP/TDE [2, p206]; we estimate the arithmetic intensity of Krajca et al.'s algorithm [9] is less than 1. Thus a potential problem might be that there simply is not enough computation required by FCA compared to the volume of data.

Newer versions of CUDA have made it easier to overlap GPU operations. However, our implementation does not do this. Since the work is spread across the multi-processors, we suspect that idle time is not a major problem.

5 Conclusions

There are many problems which are traditionally solved by depth first search. However, this may not suit low cost computer graphics GPU hardware. We have implemented a form of beam search and demonstrated it on several existing FCA benchmarks and ten software engineering dependence clustering problems [8]. GPU beam search may also be more widely applicable.

Acknowledgements

I am grateful for the assistance of Gernot Ziegler of nVidia, Steve Worley, Sarnath Kannan, Stephen Swift, Stan Seibert and Yuanyuan Zhang. Software engineering MDGs were supplied by Spiros Mancoridis. Tesla donated by nVidia. Funded by EPSRC grant EP/G060525/2.

References

1. S. J. Andrews. In-close, a fast algorithm for computing formal concepts. In Conceptual Structures Tools Interoperability Workshop at the 17th International Conference on Conceptual Structures, Moscow, 26-31 July 2009.
2. M. Christen, O. Schenk, and H. Burkhart. Automatic code generation and tuning for stencil kernels on modern shared memory architectures. CSRD, 26(3):205–210.
3. B. Del Rizzo. Dice puts faith in nvidia PhysX technology for Mirror's Edge. NVIDIA Corporation press release, Nov 19 2008.
4. J. Djoufak Kengue, P. Valtchev, and C. Tayou Djamegni. Parallel computation of closed itemsets and implication rule bases. In I. Stojmenovic, et al., eds., ISPA 2007, LNCS 4742, pp. 359–370. Springer.
5. Huaiguo Fu and E. Nguifo. A parallel algorithm to generate formal concepts for large data. In P. Eklund, ed., ICFCA, LNAI 2961, pp. 141–142. Springer, 2004.
6. Huaiguo Fu and M. O'Foghlu. A distributed algorithm of density-based subspace frequent closed itemset mining. In HPCC, pp. 750–755. IEEE, 2008.
7. B. Ganter and R. Wille. Formal Concept Analysis. Springer, 1999.
8. M. Harman, S. Swift, and K. Mahdavi. An empirical study of the robustness of two module clustering fitness functions. In H.-G. Beyer, et al., eds., GECCO 2005.
9. P. Krajca, J. Outrata, and V. Vychodil. Parallel recursive algorithm for FCA. In R. Belohlavek and S. O. Kuznetsov, eds., CLA 2008, Olomouc, Czech Republic.
10. P. Krajca, J. Outrata, and V. Vychodil. Parallel algorithm for computing fixpoints of Galois connections. Ann Math Artif Intel, 59:257–272, 2010.
11. P. Krajca and V. Vychodil. Comparison of data structures for computing formal concepts. In V. Torra, et al., eds., MDAI 2009, LNCS 5861, pp. 114–125. Springer.
12. W. B. Langdon. Graphics processing units and genetic programming: An overview. Soft Computing, 15:1657–1669, Aug. 2011.
13. W. B. Langdon, S. Yoo, and M. Harman.
Non-recursive beam search on GPU for formal concept analysis. RN/11/18, Computer Science, UCL, London, UK, 2011. Author Index Assaghir, Zainab, 319 Grissa, Dhouha, 207 Astudillo, Hernán, 349 Guennec, David, 295 Atif, Jamal, 405 Guillaume, Sylvie, 207 Azmeh, Zeina, 377 Hacéne-Rouane, Mohamed, 377 Baixeries, Jaume, 333 Harman, Mark, 413 Balbiani, Philippe, 279 Huchard, Marianne, 377 Bazhanov, Konstantin, 43 Hudelot, Céline, 405 Belohlavek, Radim, 207 Irlande, Alexis, 131 Berry, Anne, 15 Bertet, Karell, 239, 409 Jay, Nicolas, 363 Bloch, Isabelle, 1, 405 Boc, Alix, 191 Kılıçaslan, Yılmaz, 59 Borchmann, Daniel, 101 Kaytoue, Mehdi, 175, 319 Braud, Agnés, 265 Konecny, Jan, 115 Brito, Paula, 251 Krupka, Michal, 115 Kuznetsov, Sergei, 175 Carlos Dı́az, Juan, 75 Cellier, Peggy, 31 Langdon, W. B., 413 Codocedo, Vı́ctor, 349 Le Ber, Florence, 265 Colomb, Pierre, 131 Leclerc, Bruno, 9 Lieber, Jean , 87 Demko, Christophe, 239 Llansó, David, 143 Distel, Felix, 101 Ducasse, Mireille, 31 Macko, Juraj, 175 Makarenkov, Vladimir, 191 Egho, Elias, 363 Medina-Moreno, Jesús, 75 Emilion, Richard, 3 Meira, Wagner, 175, 319 Erné, Marcel, 5 Miclet, Laurent, 295 Monjardet, Bernard, 11 Falk, Ingrid, 223 Ferre, Sebastien, 31 Napoli, Amedeo, 175, 191, 363, 377 Nauer, Emmanuel, 87 Güner, Edip Serdar, 59 Nguifo, Engelbert Mephu, 207 Gómez-Martı́n, Marco Antonio, 143 Nica, Cristina, 265 Gély, Alain, 393 Obiedkov, Sergei, 43 Gaillard, Emmanuelle, 87 Outrata, Jan, 207 Ganter, Bernhard, 309 Gardent, Claire, 223 Pogorelcnik, Romain, 15 Gehrke, Mai, 7 Polaillon, Géraldine, 251 Girard, Nathalie, 409 Prade, Henri, 295 Glodeanu, Cynthia Vera, 159 Godin, Robert, 191 Raissi, Chedy, 363 Gomez-Martin, Pedro Pablo, 143 Raynaud, Olivier, 131 González-Calero, Pedro Antonio, 143 Renaud, Yoan, 131 Grac, Corinne, 265 Ryssel, Uwe, 101 Sigayret, Alain, 15 Valtchev, Petko, 191, 377 Simovici, Dan, 13 Villerd, Jean, 319 Szathmary, Laszlo, 191 Visani, Muriel, 409 Taramasco, Carla, 349 Yoo, Shin, 413 Editors: Amedeo Napoli, Vilem Vychodil Publisher & Print: INRIA Nancy – Grand Est and LORIA France Title: CLA 2011, Proceedings of the Eighth International Conference on Concept Lattices and Their Applications Place, year, edition: Nancy, 2011, 1st Page count: xii+419 Impression: 100 Archived at: http://cla.inf.upol.cz Not for sale ISBN 978–2–905267–78–8