CLA 2011 Proceedings of the Eighth International Conference on Concept Lattices and Their Applications CLA Conference Series http://cla.inf.upol.cz INRIA Nancy – Grand Est and LORIA, France The Eighth International Conference on Concept Lattices and Their Applications CLA 2011 Nancy, France October 17–20, 2011 Edited by Amedeo Napoli Vilem Vychodil CLA 2011, October 17–20, 2011, Nancy, France. Copyright c 2011 by paper authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. Technical Editors: Jan Outrata, jan.outrata@upol.cz Vilem Vychodil, vychodil@acm.org Page count: xii+419 Impression: 100 Edition: 1st First published: 2011 Printed version published by INRIA Nancy – Grand Est and LORIA, France ISBN 978–2–905267–78–8 Organization CLA 2011 was organized by the INRIA Nancy – Grand Est and LORIA Steering Committee Radim Belohlavek Palacky University, Olomouc, Czech Republic Sadok Ben Yahia Faculté des Sciences de Tunis, Tunisia Jean Diatta Université de la Réunion, France Peter Eklund University of Wollongong, Australia Sergei O. Kuznetsov State University HSE, Moscow, Russia Michel Liquière LIRMM, Montpellier, France Engelbert Mephu Nguifo LIMOS, Clermont-Ferrand, France Program Chairs Amedeo Napoli INRIA NGE/LORIA, Nancy, France Vilem Vychodil Palacky University, Olomouc, Czech Republic Program Committee Jaume Baixeries Polytechnical University of Catalonia Jose Balcazar University of Cantabria and UPC Barcelona, Spain Radim Belohlavek Palacky University, Olomouc, Czech Republic Karell Bertet University of La Rochelle, France François Brucker University of Marseille, France Claudio Carpineto Fondazione Ugo Bordoni, Roma, Italy Jean Diatta Université de la Réunion, France Felix Distel TU Dresden, Germany Florent Domenach University of Nicosia, Cyprus Mireille Ducassé IRISA Rennes, France Alain Gély University of Metz, France Cynthia Vera Glodeanu TU Dresden, Germany Marianne Huchard LIRMM, Montpellier, France Vassilis G. Kaburlasos TEI, Kavala, Greece Stanislav Krajci University of P.J. Safarik, Kosice, Slovakia Sergei O. 
Kuznetsov State University HSE, Moscow, Russia Léonard Kwuida Zurich University of Applied Sciences, Switzerland Mondher Maddouri URPAH, University of Gafsa, Tunisie Rokia Missaoui UQO, Gatineau, Canada Lhouari Nourine LIMOS, University of Clermont Ferrand, France Sergei Obiedkov State University HSE, Moscow, Russia Manuel Ojeda-Aciego University of Malaga, Spain Jan Outrata Palacky University, Olomouc, Czech Republic Pascal Poncelet LIRMM, Montpellier, France Uta Priss Napier University, Edinburgh, United Kingdom Olivier Raynaud LIMOS, University of Clermont Ferrand, France Camille Roth EHESS, Paris, France Stefan Schmidt TU Dresden, Germany Baris Sertkaya SAP Research Center, Dresden, Germany Henry Soldano Université of Paris 13, France Gerd Stumme University of Kassel, Germany Petko Valtchev Université du Québec à Montréal, Canada Additional Reviewers Mikhail Babin State University HSE, Moscow, Russia Daniel Borchmann TU Dresden, Germany Peggy Cellier IRISA Rennes, France Sebastien Ferre IRISA Rennes, France Nathalie Girard University of La Rochelle, France Alice Hermann IRISA Rennes, France Mehdi Kaytoue INRIA NGE/LORIA, Nancy, France Petr Krajca Palacky University, Olomouc, Czech Republic Christian Meschke TU Dresden, Germany Petr Osicka Palacky University, Olomouc, Czech Republic Violaine Prince LIRMM, Montpellier, France Chedy Raissy INRIA NGE/LORIA, Nancy, France Yoan Renaud LIRIS, Lyon, France Heiko Reppe TU Dresden, Germany Lucie Urbanova Palacky University, Olomouc, Czech Republic Jean Villerd ENSAIA, Nancy, France Organization Committee Mehdi Kaytoue (chair) INRIA NGE/LORIA, Nancy, France Elias Egho INRIA NGE/LORIA, Nancy, France Felipe Melo INRIA NGE/LORIA, Nancy, France Amedeo Napoli INRIA NGE/LORIA, Nancy, France Chedy Raı̈ssi INRIA NGE/LORIA, Nancy, France Jean Villerd ENSAIA, Nancy, France Table of Contents Preface Invited Contributions Mathematical Morphology, Lattices, and Formal Concept Analysis . . . . . . 1 Isabelle Bloch Random concept lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Richard Emilion Galois and his Connections—A retrospective on the 200th birthday of Evariste Galois . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Marcel Erné Canonical extensions, Duality theory, and Formal Concept Analysis . . . . . 7 Mai Gehrke Galois connections and residuation: origins and developments II . . . . . . . . . 9 Bruno Leclerc Galois connections and residuation: origins and developments I . . . . . . . . . 11 Bernard Monjardet Metrics, Betweeness Relations, and Entropies on Lattices and Applications 13 Dan Simovici Long Papers Vertical decomposition of a lattice using clique separators . . . . . . . . . . . . . . 15 Anne Berry, Romain Pogorelcnik and Alain Sigayret Building up Shared Knowledge with Logical Information Systems . . . . . . . 31 Mireille Ducasse, Sebastien Ferre and Peggy Cellier Comparing performance of algorithms for generating the Duquenne- Guigues basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Konstantin Bazhanov and Sergei Obiedkov Filtering Machine Translation Results with Automatically Constructed Concept Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Yılmaz Kılıçaslan and Edip Serdar Güner Concept lattices in fuzzy relation equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 
75 Juan Carlos Dı́az and Jesús Medina-Moreno Adaptation knowledge discovery for cooking using closed itemset extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer Fast Computation of Proper Premises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 Uwe Ryssel, Felix Distel and Daniel Borchmann Block relations in fuzzy setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Jan Konecny and Michal Krupka A closure algorithm using a recursive decomposition of the set of Moore co-families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Pierre Colomb, Alexis Irlande, Olivier Raynaud and Yoan Renaud Iterative Software Design of Computer Games through FCA . . . . . . . . . . . . 143 David Llansó, Marco Antonio Gómez-Martı́n, Pedro Pablo Gomez-Martin and Pedro Antonio González-Calero Fuzzy-valued Triadic Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Cynthia Vera Glodeanu Mining bicluster of similar values with triadic concept analysis . . . . . . . . . . 175 Mehdi Kaytoue, Sergei Kuznetsov, Juraj Macko, Wagner Meira and Amedeo Napoli Fast Mining of Iceberg Lattices: A Modular Approach Using Generators . 191 Laszlo Szathmary, Petko Valtchev, Amedeo Napoli, Robert Godin, Alix Boc and Vladimir Makarenkov Boolean factors as a means of clustering of interestingness measures of association rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 Radim Belohlavek, Dhouha Grissa, Sylvie Guillaume, Engelbert Mephu Nguifo and Jan Outrata Combining Formal Concept Analysis and Translation to Assign Frames and Thematic Grids to French Verbs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 Ingrid Falk and Claire Gardent Generation algorithm of a concept lattice with limited access to objects . . 239 Christophe Demko and Karell Bertet Homogeneity and Stability in Conceptual Analysis . . . . . . . . . . . . . . . . . . . . 251 Paula Brito and Géraldine Polaillon A lattice-based query system for assessing the quality of hydro-ecosystems 265 Agnés Braud, Cristina Nica, Corinne Grac and Florence Le Ber The word problem in semiconcept algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 Philippe Balbiani Looking for analogical proportions in a formal concept analysis setting . . . 295 Laurent Miclet, Henri Prade and David Guennec Random extents and random closure systems . . . . . . . . . . . . . . . . . . . . . . . . . 309 Bernhard Ganter Extracting Decision Trees From Interval Pattern Concept Lattices . . . . . . 319 Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd A New Formal Context for Symmetric Dependencies . . . . . . . . . . . . . . . . . . 333 Jaume Baixeries Cheating to achieve Formal Concept Analysis over a large formal context 349 Vı́ctor Codocedo, Carla Taramasco and Hernán Astudillo A FCA-based analysis of sequential care trajectories . . . . . . . . . . . . . . . . . . . 363 Elias Egho, Nicolas Jay, Chedy Raissi and Amedeo Napoli Querying Relational Concept Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 Zeina Azmeh, Mohamed Hacéne-Rouane, Marianne Huchard, Amedeo Napoli and Petko Valtchev Links between modular decomposition of concept lattice and bimodular decomposition of a context . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393 Alain Gély Short Papers Abduction in Description Logics using Formal Concept Analysis and Mathematical Morphology: application to image interpretation . . . . . . . . . 405 Jamal Atif, Céline Hudelot and Isabelle Bloch A local discretization of continuous data for lattices: Technical aspects . . . 409 Nathalie Girard, Karell Bertet and Muriel Visani Formal Concept Analysis on Graphics Hardware . . . . . . . . . . . . . . . . . . . . . . 413 W. B. Langdon, Shin Yoo, and Mark Harman Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417 Preface The Eighth International Conference “Concept Lattices and Applications (CLA 2011)” is held in Nancy, France, from October 17th until October 20th, 2011. CLA 2011 aims at providing everyone interested in Formal Concept Analysis and, more generally, in Concept Lattices or Galois Lattices (students, professors, researchers and engineers) with a global and advanced view of some of the latest research trends and applications in this field. As the diversity of the selected papers shows, there is a wide range of theoretical and practical research directions around data and knowledge processing, e.g. data mining, knowledge discovery, knowledge representation, reasoning, pattern recognition, together with logic, algebra and lattice theory. This volume includes the selected papers and the abstracts of the 7 invited talks. This year there were initially 47 submissions, from which 27 papers were accepted as full papers and 3 as posters. We would like to thank the authors for their work, often of very good quality, as well as the members of the program committee and the external reviewers, who did a great job, as can be seen in their reviews. This witnesses the growing quality and importance of CLA, highlighting its leading position in the field. Next, this year is a little special, as the bicentennial of the birth of Evariste Galois (1811–1832) is being celebrated, particularly in France. Evariste Galois has something to do with Concept Lattices, as they are based on a so-called “Galois connection”. Some of the invited speakers will discuss these fundamental aspects of Concept Lattices. Moreover, this is also the occasion to thank the seven invited speakers who, we hope, will meet the wishes of the attendees. We would first like to thank our sponsors, namely the CNRS GDR I3 and the Institut National Polytechnique de Lorraine (INPL). Then we would like to thank the steering committee of CLA for giving us the opportunity to lead this edition of CLA, the conference participants for their participation and support, and the people in charge of the organization, especially Anne-Lise Charbonnier, Nicolas Alcaraz and Mehdi Kaytoue, whose help was very precious on many occasions. Finally, we do not forget that paper submission, selection and reviewing were managed (quite easily) with the EasyChair system, and that Jan Outrata offered his files for preparing the proceedings. October 2011 Amedeo Napoli Vilem Vychodil Program Chairs of CLA 2011 Mathematical Morphology, Lattices, and Formal Concept Analysis Isabelle Bloch Telecom ParisTech, CNRS LTCI, Paris, France Abstract. Lattice theory has become a popular mathematical framework in different domains of information processing, and various communities employ its features and properties, e.g.
in knowledge representation, in logics, automated reasoning and decision making, in image processing, in information retrieval, in soft computing, in formal concept analysis. Mathematical morphology is based adjunctions, on the alge- braic framework of posets, and more specifically of complete lattices, which endows it with strong properties and allows for multiple applications and extensions. In this talk we will summarize the main concepts of mathematical morphology and show their instantiations in different settings, where a complete lattice can be built, such as sets, functions, partitions, fuzzy sets, bipolar fuzzy sets, formal logics . . . We will detail in particular the links between morphological operations and formal concept analysis, thus initiating links between two domains that were quite disconnected until now, which could therefore open new interesting perspectives. Random concept lattices Richard Emilion MAPMO, University of Orléans, France Abstract. After presenting an algorithm providing concepts and frequent concepts, we will study the random size of concept lattices in the case of a Bernoulli(p) context. Next, for random lines which are independent and identically distributed or more generally outcomes of a Markov chain, we will show the almost everywhere convergence of the random closed intents towards deterministic intents. Finally we will consider the problem of predicting the number of concepts before choosing any algorithm. Galois and his Connections—A retrospective on the 200th birthday of Evariste Galois Marcel Erné University of Hannover, Germany Abstract. A frequently used tool in mathematics is what Oystein Ore called “Galois connections” (also “Galois connexions”, “Galois correspondences” or “dual adjunc- tions”). These are pairs (ϕ, ψ) of maps between ordered sets in opposite direction so that x ≤ ψ(y) is equivalent to y ≤ ϕ(x). This concept looks rather simple but proves very effective. The primary gain of such “dually adjoint situations” is that the ranges of the involved maps are dually isomorphic: thus, Galois connections present two faces of the same medal. Many concrete instances are given by what Garrett Birkhoff termed “polarities”: these are nothing but Galois connections between power sets. In slightly different terminology, the fundamental observation of modern Formal Concept Analysis is that every “formal context”, that is, any triple (J, M, I) where I is a relation between (the elements of) J and M , gives rise to a Galois connection (assigning to each subset of one side its “polar”, “extent” or “intent” on the other side), such that the resulting two closure systems of polars are dually isomorphic; more surprising is the fact that, conversely, every dual isomorphism between two closure systems arises in a unique fashion from a relation between the underlying sets. In other words: the complete Boolean algebra of all relations between J and M is isomorphic to that of all Galois connections between ¶J and ¶M , and also to that of all dual isomorphisms between closure systems on J and M , respectively. The classical example is the Fundamental Theorem of Galois Theory, establishing a dual isomorphism between the complete lattice of all intermediate fields of a Galois ex- tension and that of the corresponding automorphism groups, due to Richard Dedekind and Emil Artin. 
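The polarity construction sketched above is easy to make concrete. The following Python fragment is only an illustration on an invented toy context (J, M, I), not an example taken from these proceedings: it implements the two polar maps of a formal context, checks the defining equivalence of a Galois connection, and counts the two dually isomorphic closure systems of polars.

```python
from itertools import chain, combinations

# Toy context (J, M, I): the objects, attributes and incidence relation here
# are invented purely for illustration.
J = {"g1", "g2", "g3"}
M = {"a", "b", "c"}
I = {("g1", "a"), ("g1", "b"), ("g2", "b"), ("g2", "c"), ("g3", "c")}

def polar_of_objects(X):
    """Polar of a set of objects: the attributes they all share (its intent)."""
    return {m for m in M if all((g, m) in I for g in X)}

def polar_of_attributes(Y):
    """Polar of a set of attributes: the objects having them all (its extent)."""
    return {g for g in J if all((g, m) in I for m in Y)}

def subsets(S):
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

# Defining property of the Galois connection: X <= polar(Y) iff Y <= polar(X).
assert all((set(X) <= polar_of_attributes(set(Y))) ==
           (set(Y) <= polar_of_objects(set(X)))
           for X in subsets(J) for Y in subsets(M))

# The two closure systems of polars (extents and intents) are dually
# isomorphic, so in particular they have the same number of elements.
extents = {frozenset(polar_of_attributes(set(Y))) for Y in subsets(M)}
intents = {frozenset(polar_of_objects(set(X))) for X in subsets(J)}
print(len(extents), len(intents))   # 6 6
```

The assertion enumerates all pairs of subsets, which is feasible only for such tiny contexts; it is meant as a check of the definition, not as an algorithm.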
In contrast to that correspondence, which does not occur explicitly in Galois’ succinct original articles, a few other closely related Galois connections may be discovered in his work (of course not under that name). Besides these historical forerunners, we discuss a few other highlights of mathematical theories where Galois connections enter in a convincing way through certain “orthogonality” relations, and show how the Galois approach considerably facilitates the proofs. For example, each of the following important structural isomorphisms arises from a rather simple relation on the respective ground sets: – the dual isomorphism between the subspace lattice of a finite-dimensional linear space and the left ideal lattice of its endomorphism ring – the duality between algebraic varieties and radical ideals – the categorical equivalence between ordered sets and Alexandroff spaces – the representation of complete Boolean algebras as systems of polars. Canonical extensions, Duality theory, and Formal Concept Analysis Mai Gehrke LIAFA CNRS – University of Paris 7, France Abstract. The theory of canonical extensions, developed by Jonsson and Tarski in the setting of Boolean algebras with operators, provides an algebraic approach to duality theory. Recent developments in this theory have revealed that in this algebraic guise duality theory is no more complicated outside than within the setting of Boolean algebras or distributive lattices. This has opened the possibility of exporting the highly developed machinery and knowledge available in the classical setting (e.g. in modal logic) to the completely general setting of partially ordered and non-distributive lattice ordered algebras. Duality theory in this setting is a special instance of the connection between formal contexts and concept lattices and thus allows methods of classical algebraic logic to be imported into FCA. This will be an introductory talk on the subject of canonical extensions with the purpose of outlining the relationship between the three topics of the title. Galois connections and residuation: origins and developments II Bruno Leclerc CAMS – École des Hautes Études en Sciences Sociales, Paris, France Abstract. From the seventies, the uses of Galois connections (and residuated/residual maps) multiplied in applied fields. Indeed Galois connections have been several times rediscovered for one or another purpose, for instance in fuzzy set theory or aggregation problems. In this talk, we illustrate the diversity of such applications. Of course, the many developments in Galois lattices and Formal Concept Analysis, with their rela- tion with Data Mining, will be only briefly evoked. Besides these developments, one finds, among other uses, alternative methods to study a correspondence (binary rela- tion) between two sets, models of classification and preferences, fitting and aggregation problems. Galois connections and residuation: origins and developments I Bernard Monjardet Centre d’Economie de la Sorbonne (University of Paris 1) and CAMS (Centre Analyse et Mathmatique Sociale), France Abstract. The equivalent notions of Galois connexions, and of residual and residuated maps occur in a great varieties of “pure” as well as “applied” mathematical theories. They explicitly appeared in the framework of lattice theory and the first of these talks is devoted to the history of their appearance and of the revealing of their links in this framework. 
So this talk covers more or less the period between 1940 (with the notion of polarity defined in the first edition of Birkoff’s book Lattice theory) and 1972 (with Blyth and Janowitz’s book Residuation theory), a period containing fundamental works like Öre’s 1944 paper Galois connexions or Croisot’s 1956 paper Applications résiduées. Metrics, Betweeness Relations, and Entropies on Lattices and Applications Dan Simovici Department of Computer Science, University of Massachusetts at Boston, USA Abstract. We discuss an algebraic axiomatization of the notion of entropy in the framework of lattices as well as characterizations of metric structures induced by such entropies. The proposed new framework takes advantage of the partial orders defined on lattices, in particular the semimodular lattice of partitions of a finite set to allow multiple applications in data mining: data discretization, recommendation systems, classification, and feature selection. Vertical decomposition of a lattice using clique separators Anne Berry, Romain Pogorelcnik, Alain Sigayret LIMOS UMR CNRS 6158?? Ensemble Scientifique des Cézeaux Université Blaise Pascal, F-63 173 Aubière, France. berry@isima.fr, romain.pogorelcnik@isima.fr, sigayret@isima.fr Abstract. A concept (or Galois) lattice is built on a binary relation; this relation can be represented by a bipartite graph. We explain how we can use the graph tool of clique minimal separator decomposition to decom- pose some bipartite graphs into subgraphs in linear time; each subgraph corresponds to a subrelation. We show that the lattices of these subrela- tions easily yield the lattice of the global relation. We also illustrate how this decomposition is a tool to help displaying the lattice. Keywords: lattice decomposition, clique separator decomposition, lat- tice drawing 1 Introduction In many algorithms dealing with hard problems, a divide-and-conquer approach is helpful in practical applications. Computing the set of concepts associated with a given context (or the set of maximal rectangles associated with a binary relation) is time-consuming, as there may be an exponential number of concepts. It would be interesting to decompose the lattice into smaller sublattices. What we propose here is to decompose the relation into smaller subrelations, compute the lattice of each subrelation, and then use these lattices to reconstruct the lattice of the global relation. For this, we use a graph decomposition, called ”clique separator decomposi- tion”, introduced by Tarjan [9], and refined afterwards (see [3] for an extensive introduction to this decomposition). The general principal is roughly the follow- ing: repeatedly find a set of vertices which are pairwise adjacent (called a clique) and whose removal disconnects the graph (called a separator), then copy this clique separator into the different connected components obtained. When the de- composition is completed, a set of subgraphs is obtained, inconveniently called ’atoms’: each subgraph is a maximal subgraph containing no clique separator. ?? Research partially supported by the French Agency for Research under the DEFIS program TODO, ANR-09-EMER-010. c 2011 by the paper authors. CLA 2011, pp. 15–29. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 
In a previous work [2], we used graph algorithms on the complement of the bipartite graph associated with the relation. In this paper, we will apply this decomposition directly to the bipartite graph itself. It turns out upon investigation that the subgraphs obtained divide not only the graph, but in a very similar fashion divide the matrix of the relation, the set of concepts and the lattice. When the relation has a clique separator of size two, the lattice, as we will explain further on, is divided along a vertical axis by an atom and a co-atom which correspond to the two vertices of the separator. Thus not only can the concepts be computed on the subrelations, but the Hasse diagram of the lattice can be drawn better, as no edge need cross this vertical line. Moreover, in a bipartite graph, this decomposition can be implemented with a better worst-case time complexity than in the general case, as the clique separators can be of size one (in this case they are called articulation points) or of size two. In both cases, the entire decomposition can be computed in linear time, i.e. in the size of the relation, thanks to the works of [6] and [3]. Although some graphs do not have a clique separator, when there is one, the decomposition is thus a useful and non-expensive pre-processing step. The paper is organized as follows: we will first give some more preliminaries in Section 2. In Section 3, we explain how a bipartite graph is decomposed. In Section 4, we show how to reconstruct the global lattice from the concepts obtained on the subrelations. In Section 5, we discuss using vertical decomposition as a tool for layout help. Finally, in Section 6, we conclude with some general remarks. 2 Preliminaries We will first recall essential definitions and properties. All the graphs will be undirected and finite. For a graph G = (V, E), V is the vertex set and E is the edge set. For xy ∈ E, x ≠ y, x and y are said to be adjacent; we say that x sees y. A graph is connected if, for every pair {x, y} of vertices, there is a path from x to y. When a graph is not connected, the maximal connected subgraphs are called the connected components. For C ⊂ V, G(C) denotes the subgraph induced by C. In a graph G = (V, E), the neighborhood of a vertex x ∈ V is the set NG(x) = {y ∈ V, y ≠ x | xy ∈ E}. NG(x) is denoted N(x) when there is no ambiguity. A clique is a set of vertices which induces a complete graph, i.e. with all possible edges. A bipartite graph G = (X + Y, E), where + stands for disjoint union, is built on two vertex sets, X and Y, with no edge between vertices of X and no edge between vertices of Y. A maximal biclique of a bipartite graph G = (X + Y, E) is a maximal subgraph G(X′ + Y′) with all possible edges between the vertices of X′ and the vertices of Y′. A relation R ⊆ O × A on a set O of objects and a set A of attributes is associated with a bipartite graph G = (O + A, E), which we will denote Bip(R); thus, for x ∈ O and y ∈ A, (x, y) is in R iff xy is an edge of G. The maximal rectangles of the relation correspond exactly to the maximal bicliques (maximal complete bipartite subgraphs) of Bip(R) and to the elements (the concepts) of the concept lattice L(R) associated with the context (O, A, R).
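To make the correspondence between the relation, Bip(R) and maximal bicliques concrete, here is a small Python sketch. The relation R of the running example is only given as a figure in the paper, so the adjacency below is inferred from the concept table of Section 4.1 and should be read as an approximation used for illustration only.

```python
# Adjacency inferred from the concept table of Section 4.1 (Figure 1 itself
# is an image and is not reproduced in the text).
R = {1: set("acde"), 2: set("bcfhi"), 3: set("abf"),
     4: set("de"), 5: set("gh"), 6: set("bg")}

O = set(R)                                   # objects
A = set().union(*R.values())                 # attributes

# Bip(R): vertices O + A, one edge per pair (x, y) in R.
edges = {(x, y) for x, row in R.items() for y in row}

def N(v):
    """Neighborhood in Bip(R): R(x) for an object x, R^{-1}(y) for an attribute y."""
    if v in O:
        return R[v]
    return {x for x in O if v in R[x]}

def is_maximal_biclique(X, Y):
    """X ⊆ O and Y ⊆ A induce a maximal biclique of Bip(R), i.e. a concept of L(R)."""
    complete = all((x, y) in edges for x in X for y in Y)
    extendable = (any(Y <= R[x] for x in O - X) or      # another object sees all of Y
                  any(X <= N(y) for y in A - Y))        # another attribute sees all of X
    return complete and not extendable

# The maximal rectangle {1,4} x {d,e} of R is a maximal biclique, hence a concept.
print(is_maximal_biclique({1, 4}, {"d", "e"}))   # True
print(N("b"))                                    # R^{-1}(b) = {2, 3, 6}
```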
If O1 × A1 and O2 × A2 are concepts of L(R) then O1 × A1  O2 × A2 iff O1 ⊆ O2 iff A1 ⊇ A2 ; the corresponding bicliques on vertex sets O1 + A1 and O2 + A2 of Bip(R) are comparable the same way. An atom (resp. co-atom) of L(R) is a concept covering the minimum element (resp. covered by the maximum element). In the bipartite graph Bip(R), the neighborhoods are defined as follows: for x ∈ O, N (x) = R(x) = {y ∈ A|(x, y) ∈ R}, and for x ∈ A, N (x) = R−1 (x) = {y ∈ O|(y, x) ∈ R}. A separator in a connected graph is a set of vertices, the removal of which disconnects the graph. A clique separator is a separator which is a clique. Clique separator decomposition [9] of a graph G = (V, E) is a process which repeat- edly finds a clique separator S and copies it into the connected components of G(V − S). When only minimal separators are used (see [3] for extensive general definitions), the decomposition is unique and the subgraphs obtained in the end are exactly the maximal subgraphs containing no clique separator [8], [3]. In a bipartite graph, the clique minimal separators are of size one or two. A separator of size one is called a articulation point. A clique separator S = {x, y} of size two is minimal if there are two components C1 and C2 of G(V − S) such that x and y both have at least one neighbor in C1 as well as in C2 . 3 Decomposing the bipartite graph and the relation In the rest of the paper, we will use the bipartite graph Bip(R) defined by a rela- tion R ⊆ O × A. Figure 1 shows an example of a relation with the corresponding bipartite graph. Fig. 1. A relation and the corresponding bipartite graph In this section, we will first discuss connectivity issues, then illustrate and give our process to decompose the bipartite graph. 18 Anne Berry, Romain Pogorelcnik and Alain Sigayret 3.1 Decomposing the bipartite graph into connected components When the bipartite graph Bip(R) is not connected, our process can be applied separately (or in parallel) to each connected component. The lattice obtained is characteristic: when the top and bottom elements are removed from the Hasse diagram, the resulting diagram is a set of disjoint lattices, with a one-to-one correspondence between the connected components of Bip(R) and the lattices obtained. Figure 2 shows such a disconnected bipartite graph, its relation, and the corresponding lattice. Note that trivially, if a connected component has only one vertex, this means that the corresponding row or column of the relation is empty: such a component corresponds to a lattice with only one element. In the rest of the paper, we will consider only relations whose bipartite graph is connected. Fig. 2. A disconnected bipartite graph, its relation, and the corresponding character- istic lattice. Vertical decomposition of a lattice using clique separators 19 3.2 Illustrating the two decomposition steps In order to make the process we use as clear as possible, we will first describe what happens when one decomposition step is applied for each of the two de- compositions involved (using clique separators of size one or of size two). It is important to understand, however, that to ensure a good (linear) time complexity, each of the two decompositions is computed globally in a single linear-time pass. Step with an articulation point The removal of an articulation point {x} in a connected bipartite graph G results in components C1 , ..., Ck , which correspond to a partition V = C1 + ... + Ck + {x} of Bip(R). 
After a decomposition step using {x}, x is preserved, with its local neighborhood, in each component, so that G is replaced by k subgraphs G(C1 ∪ {x}), ..., G(Ck ∪ {x}). Example 1. In the input graph of Figure 3, vertex 1 defines an articulation point that induces two connected components {2, 3, a, b, c} and {4, d, e}. The decomposition step results in subgraphs G({1, 2, 3, a, b, c}) and G({1, 4, d, e}). Fig. 3. Decomposition by articulation point {1}. Step with a separator of size two When the removal of a clique minimal separator {x, y} in a connected bipartite graph G results in components C1, ..., Ck, corresponding to a partition V = C1 + ... + Ck + {x, y}, the decomposition step replaces G with G(C1 ∪ {x, y}), ..., G(Ck ∪ {x, y}). Example 2. In the input graph of Figure 4, {2, b} is a clique minimal separator of size two that induces two connected components {1, 2, 3, a, b, c, f} and {2, 5, 6, b, g, h}. Fig. 4. Decomposition by clique separator {2, b} 3.3 Ordering the steps A clique minimal separator of size two may include an articulation point. Thus it is important to complete the decomposition by the articulation points first, and then go on to decompose the obtained subgraphs using their clique separators of size two. Example 3. In the input graph of Figure 5, {2} is an articulation point included in clique minimal separator {2, b}. The decomposition will begin with {2}, inducing components {2, i} and {1, 2, 3, 5, 6, a, b, c, f, g, h}. As there remains no articulation point in these resulting components, the second component will then be decomposed by {2, b} into {1, 2, 3, a, b, c, f} and {2, 5, 6, b, g, h}. Fig. 5. Articulation point {2} is processed before clique separator {2, b} After the bipartite graph decomposition is completed, we will obtain subgraphs with no remaining clique minimal separator, and the corresponding subrelations with their associated lattices. Example 4. Figure 6 shows that the input graph of Figure 1 is decomposable into four bipartite subgraphs: G1 = G({1, 2, i}), G2 = G({2, 5, 6, b, g, h}), G3 = G({1, 2, 3, a, b, c, f}) and G4 = G({1, 4, d, e}). Note that in the general case all subgraphs obtained have at least two vertices, since at least one vertex of a separator is copied into a component which has at least one vertex. Fig. 6. Complete decomposition of a bipartite graph 3.4 The global decomposition process To obtain the entire decomposition of a connected bipartite graph, we will thus first decompose the graph using all articulation points, and then decompose each of the subgraphs obtained using all its clique separators of size 2. The articulation points (clique minimal separators of size one) can be found by a simple depth-first search [6], as well as the corresponding decomposition of the graph (called decomposition into biconnected components). The search for clique separators of size two corresponds to a more complicated algorithm, described in [5]: all separators of size 2 are obtained, whether they are cliques or not. Once this list of separators is obtained, it is easy to check which are joined by an edge. The desired decomposition can then be obtained easily. In both cases, the set of clique separators is output. Both algorithms run in linear time, so the global complexity is in O(|R|) to obtain both the set of subgraphs and the set of clique separators of the original graph.
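The two-stage process can be prototyped directly on Bip(R). The sketch below is illustrative only: it uses networkx biconnected components for the articulation-point stage and a brute-force (quadratic) test of adjacent pairs for the size-two clique minimal separators, rather than the linear-time algorithms of [6] and [5]; the adjacency is again the one inferred from the concept table of Section 4.1, not the figure itself.

```python
import networkx as nx

# Bipartite graph of the running example (adjacency inferred, see above).
R = {1: "acde", 2: "bcfhi", 3: "abf", 4: "de", 5: "gh", 6: "bg"}
G = nx.Graph([(x, y) for x, row in R.items() for y in row])

def split_on_size_two(graph):
    """Recursively split a graph with no articulation point on its clique
    minimal separators of size two: adjacent pairs {x, y} whose removal
    disconnects it, with x and y both seeing at least two of the components."""
    for x, y in graph.edges():
        comps = list(nx.connected_components(graph.subgraph(set(graph) - {x, y})))
        sides = [c for c in comps
                 if any(graph.has_edge(x, v) for v in c)
                 and any(graph.has_edge(y, v) for v in c)]
        if len(sides) > 1:                      # {x, y} is a clique *minimal* separator
            return [h for c in comps
                    for h in split_on_size_two(graph.subgraph(c | {x, y}))]
    return [graph]

# Stage 1: articulation points, i.e. the decomposition into biconnected
# components (a simple DFS inside networkx); stage 2: size-two separators.
pieces = [h for comp in nx.biconnected_components(G)
          for h in split_on_size_two(G.subgraph(comp))]
for h in pieces:
    print(sorted(map(str, h.nodes())))
# Compare the printed vertex sets with the four subgraphs of Example 4.
```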
3.5 Sub-patterns defined in the matrix When the clique separators involved do not overlap and each defines exactly two connected components, this decomposition of the bipartite graph partitions the graph and the underlying relation. This results in a significant pattern of their binary matrix. As the components obtained are pairwise disconnected, the matrix can be reorganized in such a way that zeroes are gathered into blocks. Two components may appear as consecutive blocks, linked by a row corresponding to the articulation point that has been used to split them, or linked by one cell giving the edge between the two vertices of a size two clique minimal separator. In the general case, this pattern can occur in only some parts of the matrix, and different patterns can be defined according to the different separators which the user chooses to represent. Example 5. The first of the two matrices below corresponds to our running ex- ample from Figure 1 and has been reorganized, following the decomposition, which results in the second matrix. Notice how {1} is an articulation point, so 22 Anne Berry, Romain Pogorelcnik and Alain Sigayret row 1 is shared by blocs 231×bacf and 14×de; and how {2, b} is a clique separa- tor of size two, so cell [2, b] is the intersection of blocs 562 × ghb and 231 × bacf . [2, i] is not integrated into the pattern, because separator {2, b} of Bip(R) defines 3 connected components: {2, 5, 6, b, g, h}, {i} and {1, 3, 4, a, c, d, e, f }. We will now describe a process to organize the lines and columns of the ma- trix with such patterns. We will construct a meta-graph (introduced in [7] as the ’atom graph’), whose vertices represent the subgraphs obtained by our decompo- sition, and where there is an edge between two such vertices if the two subgraphs which are the endpoints have a non-empty intersection which is a clique minimal separator separating the corresponding two subgraphs in the original bipartite graph. In this meta-graph, choose a chordless path; the succession of subgraphs along this path will yield a succession of rectangles in the matrix which corre- spond to a pattern. Example 6. Figure 7 gives the meta-graph for our running example from Figure 1. Chordless path ({2, 5, 6, b, g, h}, {1, 2, 3, a, b, c, }, {1, 4, d, e}) was chosen for the patterning. Another possible chordless path would be ({2, i}, {1, 2, 3, a, b, c, }, {1, 4, d, e}). Finding a chordless path in a graph can be done in linear time; the meta-graph has less than min(|A|, |O|) elements, so finding such a path costs less than (min(|A|, |O|))2 . Fig. 7. Meta-graph for graph from Figure 1 Vertical decomposition of a lattice using clique separators 23 3.6 Decomposing the lattice We will now examine how the set of concepts is modified and partitioned into the subgraphs obtained. As clique minimal separators are copied in all the com- ponents induced, most of the concepts will be preserved by the decomposition. Furthermore, only initial concepts including a vertex of a clique minimal sepa- rator may be affected by the decomposition. Definition 1. We will say that a maximal biclique is a star maximal biclique if it contains either exactly one object or exactly one attribute. This single object or attribute will be called the center of the star. Lemma 1. A star maximal biclique {x} ∪ N (x) of Bip(R) is an atomic concept of L(R) (atom or co-atom), unless x is universal in Bip(R). More precisely, {x} × N (x) is a atom if x ∈ O and N (x) 6= A, or N (x) × {x} is a co-atom if x ∈ A and N (x) 6= O. Proof. 
Let {x} ∪ N (x) be a star maximal biclique of Bip(R). As a maximal biclique, it corresponds to a concept of L(R). Suppose the star has x ∈ O as center . As a star, it contains no other element of O; as a biclique, it includes all N (x) ⊆ A, and no other element of A by maximality. The corresponding concept is {x} × N (x) which is obviously the first concept from bottom to top including x. As the biclique is maximal, and as x is not universal, this concept cannot be the bottom of L(R) but only an atom. A similar proof holds for x ∈ A and co-atomicity. We will now give the property which describes how the maximal bicliques are dispatched or modified by the decomposition. In the next Section, we will give a general theorem and its proof, from which these properties can be deduced. Property 1. Let G = (X + Y, E) be a bipartite graph, let S be a clique minimal separator of G which decomposes G into subgraphs G1 , ..., Gk . Then: 1. ∀x ∈ S, {x} ∪ NG (x) is a star maximal biclique of G. 2. ∀x ∈ S, {x} ∪ NG (x) is not a maximal biclique of any Gi . 3. ∀x ∈ S, {x} ∪ NGi (x) is a biclique of Gi , but it is maximal in Gi iff it is not strictly contained in any other biclique of Gi . 4. All the maximal bicliques of G which are not star bicliques with any x ∈ S as a center are partitioned into the corresponding subgraphs. With the help of Lemma 1, this property may be translated in terms of lattices. Given a relation R, its associated graph G, its lattice L(R), and a decomposition step of G into some Gi s by articulation point {x}: If x ∈ O (resp. ∈ A) is an articulation point of G, {x}×NG (x) (resp. NG (x)× {x}) is a concept of L(R). After the decomposition step, in each subgraph Gi of G, either this concept becomes {x}×NGi (x), or this concept disappears from Gi ; this latter case occurs when there is in Gi some x0 ∈ O, the introducer of which appears after the introducer of x in L(R), from bottom to top (resp. from top 24 Anne Berry, Romain Pogorelcnik and Alain Sigayret to bottom if x, x0 ∈ A). Every other concept will appear unchanged in exactly one lattice associated with a subgraph Gi . The same holds for each vertex of a size two clique minimal separator. Example 7. Figure 8 illustrates a decomposition step with articulation point {1} using the graph from Figure 3. Concept {1, 4} × {d, e} disappears from the first component {1, 2, 3, a, b, c}, but remains in the second component {1, 4, d, e}. Fig. 8. Example of lattice decomposition using articulation point {1}. Example 8. Figure 9 illustrates a decomposition step with clique separator {2, b} using the graph from Figure 4. Concept {2}×N (2) is duplicated into components {2, 5, 6, b, g, h} and {1, 2, 3, a, b, c, f }; concept N (b) × {b} will appear as {2, 6} × {b} in the first component, but not in the second one, as {2, 3, b} is a biclique included in maximal biclique {2, 3, b, f } of G. Remark 1. The smaller lattices obtained can not be called sublattices of the initial lattice as some of their elements may not be the same: for example, in Figure 9, {2} × {b, c, f } is an element of the third smaller lattice L(G3 ) but is not an element of the initial lattice L(G). 4 Reconstructing the lattice We will now explain how, given the subgraphs obtained by clique decomposi- tion, as well as the corresponding subrelations and subsets of concepts, we can reconstruct the set of concepts of the global input bipartite graph. We will then go on to explain how to obtain the Hasse diagram of the reconstructed lattice. 
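Before turning to the reconstruction itself, Lemma 1 and Property 1 can be checked mechanically on the running example. The sketch below (adjacency inferred from the concept table of Section 4.1, so illustrative only) tests whether the star {x} ∪ N(x) of a vertex is a maximal biclique, both in the whole graph and in the subgraph G3.

```python
# A mechanical check of Lemma 1 and Property 1 on the running example
# (adjacency inferred from the concept table of Section 4.1).
R = {1: set("acde"), 2: set("bcfhi"), 3: set("abf"),
     4: set("de"), 5: set("gh"), 6: set("bg")}
O, A = set(R), set().union(*R.values())

def neighbours(v, vertices):
    """Neighborhood of v in the subgraph of Bip(R) induced by `vertices`."""
    return R[v] & vertices if v in O else {x for x in O & vertices if v in R[x]}

def star_is_maximal(center, vertices):
    """Is {center} ∪ N(center) a maximal biclique of the induced subgraph?
    It is, unless some vertex on the same side covers all of N(center)."""
    n = neighbours(center, vertices)
    same_side = (O if center in O else A) & vertices
    return bool(n) and not any(n <= neighbours(v, vertices)
                               for v in same_side - {center})

V = O | A
G3 = {1, 2, 3, "a", "b", "c", "f"}

# Lemma 1: the separator vertices 1, 2 and b give star maximal bicliques of G,
# hence atoms or co-atoms of L(R).
print([star_is_maximal(s, V) for s in (1, 2, "b")])    # [True, True, True]

# Property 1(3) / Example 8: in G3 the star of b, {2,3} x {b}, is contained in
# the larger biclique {2,3} x {b,f}, so it is not maximal in G3.
print(star_is_maximal("b", G3))                        # False
```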
Vertical decomposition of a lattice using clique separators 25 Fig. 9. Example of lattice decomposition using clique separator {2, b}. 4.1 Reconstructing the set of concepts We will use the following Theorem, which describes the concepts of the global lattice. Theorem 1. Let G = (X + Y, E) be a bipartite graph, let Σ = {s1 , ...sh } be the set of all the vertices which belong to a clique separator of G, let G1 , ...Gk be the set of subgraphs obtained by the complete corresponding clique separator decomposition. Then: 1. For every s ∈ Σ, {s} ∪ NG (s) is a star maximal biclique of G. 2. Any maximal biclique of a subgraph Gi which is not a star with a vertex of Σ as center is also a maximal biclique of G. 3. There are no other maximal bicliques in G: ∀s ∈ Σ, no other star maximal biclique of Gi with center s is a star maximal biclique of G, and these are the only maximal bicliques of some graph Gi which are not also maximal bicliques in G. Proof. 1. For every s ∈ Σ, {s} ∪ NG (s) is a star maximal biclique of G: Case 1: s is an articulation point, let Gi , Gj be two graphs which s belongs to; s must be adjacent to some vertex yi in Gi and to some vertex yj in Gj . Suppose {s} ∪ NG (s) is not a maximal biclique: there must be a vertex z in G which sees yi and yj , but then {s} cannot separate yi from yj , a contradiction. Case 2: s is not an articulation point, let s0 be a vertex of S such that 26 Anne Berry, Romain Pogorelcnik and Alain Sigayret {s, s0 } is a clique separator of G, separating Gi from Gj . s must as above see some vertex yi in Gi and some vertex yj in Gj . Suppose {s} ∪ NG (s) is not maximal: there must be some vertex w in G which sees all of NG (s), but w must see yi and yj , so {s, s0 } cannot separate Gi from Gj . 2. Let B be a non-star maximal biclique of Gi , containing o1 , o2 ∈ O and a1 , a2 ∈ A. Suppose B is not maximal in G: there must be a vertex y in G − B which augments B. Let y be in Gj , wlog y ∈ A: y must see o1 and o2 . Since Gi is a maximal subgraph with no clique separator, Gi + {y} must have a clique separator. Therefore N (y) must be a clique separator of this subgraph, but this is impossible, since y sees two non-adjacent vertices of Gi . 3. Any star maximal biclique B of Gi whose center is not in Σ is also a star maximal biclique of G: suppose we can augment B in G. Case 1: v sees an extra vertex w; Gi + {w} contains as above a clique sepa- rator, which is impossible since N (w) = v and v 6∈ S. Case 2: A vertex z of Gj is adjacent to all of N (v): again, G + {z} contains a clique separator, so N (z) is a clique separator, but that is impossible since N (z) contains at least two non-adjacent vertices. 4. For s ∈ Σ, no star maximal biclique of Gi is a star maximal biclique of G: let B be a star maximal biclique of Gi , with s ∈ Σ as center. s ∈ Σ, so s belongs to some clique separator which separates Gi from some graph Gj . s must see a vertex yj in Gj , so B + {yj } is a larger star including B: B cannot be maximal in G. Example 9. We illustrate Theorem 1 using graph G from Figure 6, whose decom- position yields subgraphs G1 , ..., G4 , with G1 = G({1, 2, i}), G2 = G({2, 5, 6, b, g, h}), G3 = G({1, 2, 3, a, b, c, f }) and G4 = G({1, 4, d, e}). Finally, Σ = {1, 2, b}. The corresponding lattices are shown in Figure 10, and their concepts are presented in the table below. 
In this table, braces have been omitted; symbol ⇒ represents a concept of the considered subgraph Gi which is identical to a concept of G (there can be only one ⇒ per row); the other concepts of the subgraphs will not be preserved in G while recomposing.

L(G)      | L(G1) | L(G2)  | L(G3)   | L(G4) | star max. biclique of G?
1 × acde  |       |        | 1 × ac  |       | yes
2 × bcfhi | 2 × i | 2 × bh | 2 × bcf |       | yes
3 × abf   |       |        | ⇒       |       |
14 × de   |       |        |         | ⇒     |
5 × gh    |       | ⇒      |         |       |
6 × bg    |       | ⇒      |         |       |
13 × a    |       |        | ⇒       |       |
236 × b   |       | 26 × b |         |       | yes
12 × c    |       |        | ⇒       |       |
23 × bf   |       |        | ⇒       |       |
56 × g    |       | ⇒      |         |       |
25 × h    |       | ⇒      |         |       |

Fig. 10. Reconstruction of a lattice

According to Theorem 1, the steps to reconstruct the maximal concepts of the global lattice from the concepts of the smaller lattices are:
1. Compute Σ, the set of attributes and objects involved in a clique minimal separator. (In our example, Σ = {1, 2, b}.)
2. Compute the maximal star bicliques for all the elements of Σ. (In our example, we will compute star maximal bicliques 1 × acde, 2 × bcfhi and 236 × b.)
3. For each smaller lattice, remove from the set of concepts the atoms or co-atoms corresponding to elements of Σ; maintain all the other concepts as concepts of the global lattice. (In our example, for L(G3), we will remove 1 × ac and 2 × bcf, and maintain 3 × abf, 13 × a, 12 × c and 23 × bf as concepts of L(G).)
Step 1 requires O(|R|) time. Step 2 can be done while computing the smaller lattices; Step 3 costs constant time per concept. Thus the overall complexity of the reconstruction is in O(|R|) time. 4.2 Reconstructing the edges of the Hasse diagram According to Theorem 1, the maximal bicliques which are not star maximal bicliques with a vertex of Σ as center are preserved; therefore, the corresponding edges between the elements of the lattice are also preserved. In the process described below, we will refer to labels in the lattice as being the 'reduced' labels, such as the ones used in our lattice figures throughout this paper. To define the edges of the Hasse diagram of lattice L(G), we will, for each smaller lattice L(Gi):
– find each atom (or co-atom) which corresponds to an element σ of Σ (such as 2 or b for L(G3) in our example).
– If σ shares its label with some non-elements of Σ, remove all elements of Σ from the label. (In our example for L(G3), bf becomes f.) If σ does not share its label with some non-elements of Σ, remove the atom or co-atom. (In our example for L(G3), remove element 2.)
– Maintain the remaining edges as edges of L(G).
– Compute the neighborhood in L(G) of each atom or co-atom which corresponds to an element of Σ.
All this can be done in polynomial time: there are at most |A| + |O| vertices in Σ, and the corresponding edges can be added in O((|A| + |O|)² |R|). 5 Vertical decomposition as a layout tool When there is a size two clique separator in the bipartite graph which divides the graph into two components, the concepts which are not involved in the separator can be displayed on the two sides of the separator, thus helping to minimize the number of line crossings in the Hasse diagram. Fig. 11. (a) Lattice constructed by Concept Explorer using the minimal intersection layout option (8 crossings). (b) Lattice re-drawn using the information on clique separators (5 crossings). To illustrate this, we have used our running example with 'Concept Explorer' [1], which is a very nice and user-friendly tool for handling lattices. Notice however how clique separator {1, d} is better displayed when put at the right extremity.
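Coming back to Section 4.1, the three reconstruction steps can be sketched in a few lines of Python. The sublattice concepts below are copied from the table above, Σ is the one given in Step 1, and the relation is the one inferred from that same table; this is an illustration of the procedure, not the authors' implementation.

```python
# Reconstruction of the concepts of L(G) from the sublattice concepts (Sec. 4.1).
R = {1: set("acde"), 2: set("bcfhi"), 3: set("abf"),
     4: set("de"), 5: set("gh"), 6: set("bg")}
O = set(R)
N = lambda y: {x for x in O if y in R[x]}           # R^{-1}(y) for an attribute y

sub_concepts = {                                    # (extent, intent) pairs from the table
    "L(G1)": [({2}, {"i"})],
    "L(G2)": [({2}, {"b", "h"}), ({5}, {"g", "h"}), ({6}, {"b", "g"}),
              ({2, 6}, {"b"}), ({5, 6}, {"g"}), ({2, 5}, {"h"})],
    "L(G3)": [({1}, {"a", "c"}), ({2}, {"b", "c", "f"}), ({3}, {"a", "b", "f"}),
              ({1, 3}, {"a"}), ({1, 2}, {"c"}), ({2, 3}, {"b", "f"})],
    "L(G4)": [({1, 4}, {"d", "e"})],
}
sigma = {1, 2, "b"}                                 # Step 1: vertices of clique separators

# Step 2: the star maximal bicliques of G for the elements of sigma.
stars = [({s}, R[s]) if s in O else (N(s), {s}) for s in sigma]

# Step 3: drop, in each smaller lattice, the atoms/co-atoms centered on sigma;
# keep every other concept unchanged.
def star_center(extent, intent):
    for side in (extent, intent):
        if len(side) == 1 and next(iter(side)) in sigma:
            return next(iter(side))
    return None

kept = [c for concepts in sub_concepts.values() for c in concepts
        if star_center(*c) is None]
concepts_of_G = stars + kept
print(len(concepts_of_G))                           # 12, matching the rows of the table
```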
Vertical decomposition of a lattice using clique separators 29 Figure 11 shows the lattice as proposed by Concept Explorer, and then re- drawn with insight on the clique separators of the bipartite graph. The same technique of course also applies when there is a succession of such clique separators. Let us add that if moreover both lattices are planar, as discussed in [4], merg- ing the two lattices obtained using the clique separator as central will preserve planarity. 6 Conclusion and perspectives We have used a graph method, clique minimal separator decomposition, to pro- vide simple tools which can help reduce the time spent computing the elements of a lattice, as well as improve the drawing of its Hasse diagram. When there is no clique separator in the bipartite graph, it could be inter- esting to investigate restricting the relation to a subgraph or partial subgraph which does have one. We leave open the question of characterizing, without computing the relation, the lattices whose underlying bipartite graph has a clique minimal separator. Acknowledgments The authors sincerely thank all the referees for their useful suggestions and questions. References 1. Concept Explorer. Downloadable at http://sourceforge.net/projects/conexp/, ver- sion 1.3 (Java), 20/12/2009. 2. Berry A., Sigayret A.: Representing a concept lattice by a graph. Discrete Applied Mathematics, 144(1-2):27–42, 2004. 3. Berry A., Pogorelcnik R., Simonet G.: An introduction to clique minimal separator decomposition. Algorithms, 3(2):197–215, 2010. 4. Eschen E.M., Pinet N., Sigayret A.: Consecutive-ones: handling lattice planarity efficiently. CLA’07, Montpellier (Fr), 2007. 5. Hopcroft J. E., Tarjan R. E.: Dividing a graph into triconnected components. SIAM J. Comput., 2(3):135–158, 1973. 6. Hopcroft J. E., Tarjan R. E.: Efficient algorithms for graph manipulation [H] (Algorithm 447). Commun. ACM, 16(6):372–378, 1973. 7. Kaba B., Pinet N., Lelandais G., Sigayret A., Berry A.: Clustering gene expression data using graph separators. In Silico Biology, 7(4-5):433–52, 2007. 8. Leimer H.-G.: Optimal decomposition by clique separators. Discrete Mathematics, 113(1-3):99–123, 1993. 9. Tarjan R. E.: Decomposition by clique separators. Discrete Mathematics, 55(2):221–232, 1985. Building up Shared Knowledge with Logical Information Systems Mireille Ducassé1 , Sébastien Ferré2 , and Peggy Cellier1 1 IRISA-INSA de Rennes, France, {ducasse, cellier}@irisa.fr 2 IRISA-University of Rennes 1, France ferre@irisa.fr Abstract. Logical Information Systems (LIS) are based on Logical Con- cept Analysis, an extension of Formal Concept Analysis. This paper de- scribes an application of LIS to support group decision. A case study gathered a research team. The objective was to decide on a set of po- tential conferences on which to send submissions. People individually used Abilis, a LIS web server, to preselect a set of conferences. Start- ing from 1041 call for papers, the individual participants preselected 63 conferences. They met and collectively used Abilis to select a shared set of 42 target conferences. The team could then sketch a publication planning. The case study provides evidence that LIS cover at least three of the collaboration patterns identified by Kolfschoten, de Vreede and Briggs. 
Abilis helped the team to build a more complete and relevant set of information (Generate/Gathering pattern); to build a shared un- derstanding of the relevant information (Clarify/Building Shared Un- derstanding); and to quickly reduce the number of target conferences (Reduce/Filtering pattern). 1 Introduction Group work represents a large amount of time in professional life while many people feel that much of that time is wasted. Lewis [13] argues that this amount of time is even going to increase because problems are becoming more complex and are meant to be solved in a distributed way. Each involved person has a local and partial view of the problem, no one embraces the whole required knowledge. Lewis also emphasizes that it is common that “groups fail to adequately define a problem before rushing to judgment”. Building up shared knowledge in order to gather relevant distributed knowledge of a problem is therefore a crucial issue. Logical Information Systems (LIS) are based on Logical Concept Analysis (LCA), an extension of Formal Concept Analysis (FCA). In a previous work [5], Camelis, a single-user logical information system, has been shown useful to sup- port serene and fair meetings. This paper shows how Abilis, a LIS web server that implements OnLine Analytical Processing (OLAP [3]) features, can be applied to help build shared knowledge among a group of skilled users. The presented case study gathered a research team to decide on a publication strategy. Starting from 1041 call for papers, each team member on his own c 2011 by the paper authors. CLA 2011, pp. 31–42. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 32 Mireille Ducasse, Sebastien Ferre and Peggy Cellier preselected a set of conferences matching his own focus of interest. The union of individual preselections still contained 63 conferences. Then, participants met for an hour and a half and collectively built a shared set of 42 target conferences. For each conference, the team shared a deep understanding of why it was relevant. The team could sketch a publication planning in a non-conflictual way. Kolfschoten, de Vreede and Briggs have classified collaboration tasks into 16 collaboration patterns [12]. The contribution of this paper is to give evidences that LIS can significantly support three of these patterns which are important as- pects of decision making, namely Generate/Gathering, Clarify/Building Shared Understanding and Reduce/Filtering. Firstly, the navigation and filtering capa- bilities of LIS were helpful to detect inconsistencies and missing knowledge. The updating capabilities of LIS enabled participants to add objects, features and links between them on the fly. As a result the group had a more complete and relevant set of information (Generate/Gathering pattern). Secondly, the com- pact views provided by LIS and the OLAP features helped participants embrace the whole required knowledge. The group could therefore build a shared under- standing of the relevant information which was previously distributed amongst the participants (Clarify/Building Shared Understanding pattern). Thirdly, the navigation and filtering capabilities of LIS were relevant to quickly converge on a reduced number of target conferences (Reduce/Filtering pattern). In the following, Section 2 briefly introduces logical information systems. Sec- tion 3 describes the case study. 
Section 4 gives detailed arguments to support the claim that logical information systems help build up shared knowledge. Section 5 discusses related work. 2 Logical Information Systems Logical Information Systems (LIS) [7] belong to a paradigm of information re- trieval that combines querying and navigation. They are formally based on a logical generalization of Formal Concept Analysis (FCA) [8], namely Logical Concept Analysis (LCA) [6]. In LCA, logical formulas are used instead of sets of attributes to describe objects. LCA and LIS are generic in that the logic is not fixed, but is a parameter of those formalisms. Logical formulas are also used to represent queries and navigation links. The concept lattice serves as the navigation structure: every query leads to a concept, and every navigation link leads from one concept to another. The query can be modified in three ways: by formula edition, by navigation (selecting features in the index in order to modify the query) or by examples. Annotations can be performed in the same interface. Camelis3 has been developed since 2002; a web interface, Abilis 4 , has recently been added. It incorporates display paradigms based on On-Line Analytical Pro- cessing (OLAP). Instead of being presented as a list of objects, an extent can be partitioned as an OLAP cube, namely a multi-dimensional array [1]. 3 see http://www.irisa.fr/LIS/ferre/camelis/ 4 http://ledenez.insa-rennes.fr/abilis/ Building up Shared Knowledge with Logical Information Systems 33 3 The Case Study The reported case study gathered 6 participants, including the 3 authors, 4 academics and 2 PhD students. All participants were familiar with LIS, 4 of them had not previously used a LIS tool as a decision support system. The objective was to identify the publishing strategy of the team: in which conferences to submit and why. This has not been a conflictual decision, the group admitted very early that the set of selected conferences could be rather large provided that there was a good reason to keep each of them. One person, the facilitator, spent an afternoon organizing the meeting and preparing the raw data as well as a logical context according to the objective. She collected data about conference call for papers of about a year, related to themes corresponding to the potential area of the team, from WikiCFP, a semantic wiki for Calls For Papers in science and technology fields 5 . There were 1041 events: conferences, symposiums, workshops but also special issues of journals. Then every participant, on its own, spent between half an hour to two hours to browse the context, update it if necessary and preselect a number of con- ferences (Section 3.1). The group met for one hour and a half. It collaborately explored the data and selected a restricted set of conferences (Section 3.2). After the meeting, every participant filled a questionnaire. The context used for the case study can be freely accessed 6 . 3.1 Distributed Individual Preselection and Update When the context was ready, every participant was asked to preselect a set of conferences that could be possible submission targets. The instruction was to be as liberal as wanted and in case of doubt to label the conference as a possible target. During this phase, each of the academics preselected 20 to 30 conferences and each of the PhD students preselected around 10 conferences. Each participant had his own “basket”. There were overlappings, altogether 63 conferences were preselected. 
Participants also introduced new conferences and new features, for example, the ranking of the Australian CORE association 7 (Ranking), and the person expected to be a possible first author for the target conference (Main author). Figure 1 shows a state of Abilis during the preselection phase. LIS user in- terfaces give a local view of the concept lattice, centered on a focus concept. The local view is made of three parts: (1) the query (top left), (2) the extent (bottom right), and (3) the index (bottom left). The query is a logical formula that typi- cally combines attributes (e.g., Name), patterns (e.g., contains "conference"), and Boolean connectors (and, or, not). The extent is the set of objects that are matched by the query, according to logical subsumption. The extent identifies 5 http://www.wikicfp.com/cfp/ 6 http://ledenez.insa-rennes.fr/abilis/, connect as guest, load Call for papers. 7 http://core.edu.au/index.php/categories/conference rankings 34 Mireille Ducasse, Sebastien Ferre and Peggy Cellier Fig. 1. Snapshot of Abilis during preselection: a powerful query Building up Shared Knowledge with Logical Information Systems 35 the focus concept. Finally, the index is a set of features, taken from a finite sub- set of the logic, and is restricted to features associated to at least one object in the extent. The index plays the role of a summary or inventory of the extent, showing which kinds of objects there are, and how many of each kind there are (e.g., in Figure 1, 8 objects in the extent have data mining as a theme). In the index, features are organized as a taxonomy according to logical subsumption. The query area (top left) shows the current selection criteria: (Name contains "conference" or Name contains "symposium") and not (Name contains "agent" or Name contains "challenge" or Name contains "workshop") and (Theme is "Knowledge Discovery" or Theme is "Knowledge Engineering" or Theme is "Knowledge Management"). Note that the query had been obtained solely by clicking on features of the index (bottom left). Let us describe how it had been produced. Initially there were 1041 objects. Firstly, opening the Name ? feature, the participant had noticed that names could con- tain “conference” or “symposium” but also other keywords such as “special issue”. He decided to concentrate on conferences and symposiums by clicking on the two features and then on the zoom button. The resulting query was (Name contains "conference" or Name contains "symposium") and there were 495 objects in the extent. However, the displayed features under Name ? showed that there were still objects whose name in addition to “conference” or “symposium” also contained “agent”, “challenge” or “workshop”. He decided to filter them out by clicking on the three features then on the Not button then on the zoom button. The resulting query was (Name contains "conference" or Name contains "symposium") and not (Name contains "agent" or Name contains "challenge" or Name contains "workshop") and there were 475 objects in the extent. He opened the Theme ? feature, clicked on the three sub- features containing “Knowledge”, then on the zoom button. The resulting query is the one displayed on Figure 1 and there are 48 objects in the displayed extent. In the extent area (bottom right), the 48 acronyms of the selected conferences are displayed. In the index area, one can see which of the features are filled for these objects. The displayed features have at least 1 object attached to them. 
The number of objects actually attached to them is shown in parentheses. For example, only 14 of the preselected conferences have an abstract deadline. All of them have an acronym, a date of beginning, a date of end, a date for the paper deadline, a name, some other (not very relevant) information, as well as at least a theme and a town. The features shared by all selected objects have that number in parentheses (48 in this case). For the readers who have a color printout, these features are in green. The other features are attached to only some of the objects. For example, only 16 objects have a ranking attached to them: 4 core A, 6 core B, 2 core C, 1 ‘too recent event’, 4 unknown (to the Core ranking). One way to pursue the navigation could be, for example, to click on Ranking ? to select the conferences for which the information is filled. Alternatively, one could concentrate on the ones for which the ranking is not filled, for example 36 Mireille Ducasse, Sebastien Ferre and Peggy Cellier to fill in this information on the fly for the conferences which are considered interesting. Another way to pursue the navigation could be, for example, to notice that under the Theme ? feature, there are more than the selected themes. One can see that among the selected conferences, one conference is also relevant to the Decision Support Systems theme. One could zoom into it, this would add and Theme is "Decision Support Systems" to the query ; the object area would then display the relevant conference (namely GDN2011). 3.2 Collaborative Data Exploration, Update and Selection The group eventually had a physical meeting where the current state of the context was constantly displayed on a screen. Using the navigation facilities of Abilis, the conferences were examined by decreasing ranking. Initially, the group put in the selection all the A and A+ preselected conferences. After some discussions, it had, however, been decided that Human Computer Interaction (HCI) was too far away from the core of the team’s research. Subsequently, the HCI conferences already selected were removed from the selection. For the conferences of rank B, the team decided that most of them were pretty good and deserved to be kept in the selection. For the others, the group investigated first the conferences without ranking and very recent, trying to identify the ones with high selection rate or other good reasons to put them in the selection. Some of the arguments have been added into the context. Some others were taken into account on the fly to select some conferences but they seemed so obvious at the time of the discussion that they were not added in the context. Figure 2 shows the selection made by the group at a given point. In the extent area, on the right hand side, the selected objects are partitioned according to the deadline month and the anticipated main author thanks to the OLAP like facilities of Abilis [1]. Instead of being presented as a list of objects, an extent can be partitioned as an OLAP cube, namely a multi-dimensional array. Assuming object features are valued attributes, each attribute can play the role of a dimension, whose values play the role of indices along this dimension. Users can freely interleave changes to the query and changes to the extent view. The query is SelectionLIS02Dec and scope international. Note that the partition display is consistent with the query. When the group added and scope international to the query, the national conferences disappeared from the array. 
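The partition of Figure 2 can likewise be imitated in a few lines. In the sketch below the acronyms echo some of those mentioned in the text, but the deadline months and anticipated main authors are invented; the point is only the mechanism: each valued attribute plays the role of a dimension, and each cell of the resulting two-dimensional array collects the objects sharing a pair of values.

# An OLAP-style partition of an extent: group the selected conferences by
# (deadline month, anticipated main author).  Attribute values are invented.
from collections import defaultdict

selection = [
    {"acronym": "IDA",  "deadline_month": "May", "main_author": "PC"},
    {"acronym": "KDD",  "deadline_month": "Feb", "main_author": "PC"},
    {"acronym": "ICCS", "deadline_month": "Feb", "main_author": "MD"},
    {"acronym": "GDN",  "deadline_month": "Oct", "main_author": "SF"},
]

def partition(objects, dim1, dim2):
    """Index the objects of an extent along two valued attributes."""
    cube = defaultdict(list)
    for obj in objects:
        cube[(obj[dim1], obj[dim2])].append(obj["acronym"])
    return cube

cube = partition(selection, "deadline_month", "main_author")
for (month, author), acronyms in sorted(cube.items()):
    print(month, author, acronyms)

Changing the query, for instance adding the criterion scope international, simply recomputes the extent before the partition is rebuilt, which is why the displayed array stays consistent with the query.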
Some conferences, absent from WikiCFP have been entered on the fly at that stage (for example, ICCS 2011 - Concept). Not all the features had been entered for all of them. In particular, one can see in the feature area that only 28 out of 29 had been preselected. Nevertheless, the group judged that the deadline month, the potential main author and the ranking were crucial for the decision process and added them systematically. It is easy to find which objects do not have a feature using Not and zoom, and then to attach features to them. Building up Shared Knowledge with Logical Information Systems 37 Fig. 2. Snapshot of Abilis during collaborative data exploration: a partition deadline month/mainAuthor 38 Mireille Ducasse, Sebastien Ferre and Peggy Cellier One can see that there are enough opportunities for each participant to pub- lish round the year. One can also see at a glance where compromises and decisions will have to be made. For example, PC will probably not be in a position to pub- lish at IDA, ISSTA, KDD and ICCS the same year. Thanks to this global view PC can discuss with potential co-authors what the best strategy could be. A follow up to the meeting was that participants made a personal publication planning, knowing that their target conferences were approved by the group. 4 Discussion In this section, we discuss how the reported case study provides evidences that LIS help keep the group focused (Section 4.1) and that LIS also help build up shared knowledge (Section 4.2). As already mentioned, participants filled up a questionnaire after the meeting. In the following, for each item, we introduce the arguments, we present a summary of relevant parts of participant feedbacks, followed by an analysis of the features of LIS that are crucial for the arguments. 4.1 Logical Information Systems Help Keep the Group Focused It is recognized that an expert facilitator can significantly increase the efficiency of a meeting (see for example [2]). A study made by den Hengst and Adkins [10] investigated which facilitation functions were found the most challenging by facilitators around the world. It provides evidences that facilitators find that “the most difficult facilitation function in meeting procedures is keeping the group outcome focused.” In our case study, all participants reported that they could very easily stay focused on the point currently discussed thanks to the query and the consistency between the three views. As the objective was to construct a selection explicitly identified in Abilis by a feature, the objective of the meeting was always present to everybody and straightforward to bring back in case of digression. Furthermore, even if the context contained over a thousand conferences, thanks to the navigation facilities of LIS, only relevant information was displayed at a given time. Therefore, there was no “noise” and no dispersion of attention, the displayed information was always closely connected to the focus of the discussion. 4.2 Logical Information Systems Help Build Up Shared Knowledge Kolfschoten, de Vreede and Briggs have identified 6 collaboration patterns: Gen- erate, Reduce, Clarify, Organize, Evaluate, and Consensus Building [12]. We discuss in the following three of their 16 sub-patterns for which all participants agreed that they are supported by Abilis in its current stage. For the other sub- patterns, the situation did not demand much with respect to them. 
For example, the decision to make was not conflictual, the set of selected conferences could be rather large, there was, therefore, not much to experiment about “consensus Building up Shared Knowledge with Logical Information Systems 39 building.” The descriptions of the patterns in italic are from Kolfschoten, de Vreede and Briggs. Generate/Gathering: move from having fewer to having more complete and rel- evant information shared by the group. Before and during the meeting, information has been added to the shared knowledge repository of the group, namely the logical context. A new theme, important for the team and missing from WikiCFP, has been added: Decision Support Systems. New conferences have been added into the context either by individual participants in the preselection phase or by the group during the selection phase. New features were added. For example, it soon appeared that some sort of conference rankings was necessary. The group added by hand, for the conferences that were selected, the ranking of the Australian Core association. Some conferences were added subsequently, sometimes the ranking was not added at once. All participants acknowledged that the tool helped the group to set up a set of features which was relevant and reflecting the group’s point of view. The crucial characteristics of LIS for this aspect are those which enable in- tegrated navigation and update. Firstly, the possibility to update the context while navigating in it enables participants to enhance it on the fly adding small pieces of relevant information at a time. Secondly, for each feature, Abilis displays the number of objects which have it. It is therefore immediate to detect when a feature is not systematically filled. The query Not selects the objects that do not have the feature. Users can then decide if they want to update them. Thanks to the query, as soon as an object is updated, it disappears from the extent. Users can immediately see what remains to be updated. Thirdly, updating the context does not divert from the initial objective. Indeed, the Back button allows users to go back to previous queries. Fourthly, the three views (query, features, objects) are always consistent and provide a “global” understanding of the relevant objects. Lastly, in the shared web server, participants can see what information the others had entered. Hence each participant can inspire the others. For the last aspect, the facilitator inputs were decisive. Participants reported that they did not invent much, they imitated and adapted from what the facili- tator had initiated. This is consistent with the literature on group decision and negotiation which emphasizes the key role of facilitators [2]. Clarify/Building Shared Understanding: Move from having less to more shared understanding of the concepts shared by the group and the words and phrases used to express them. Participants, even senior ones, discovered new conferences. Some were sur- prised by the ranking of conferences that they had previously overlooked. Par- ticipants had a much clearer idea of who was interested in what. All participants found that the tool helped them understand the points of view of the others. 40 Mireille Ducasse, Sebastien Ferre and Peggy Cellier The crucial characteristics of LIS for this aspect are those which enable to grasp a global understanding at a glance. Firstly, the query, as discussed earlier, helps keep the group focused. Secondly, the consistency between the 3 views helps participants to grasp the situation. 
Thirdly, irrelevant features are not in the index, the features in the index thus reflect the current state of the group de- cision. Fourthly, the partitions à la OLAP sort the information according to the criteria under investigation. Lastly, the shared web server enables participants to know before the meeting what the others have entered. Reduce/Filtering: move from having many concepts to fewer concepts that meet specific criteria according to the group members. Both at preselection time and during the meeting, participants could quickly strip down the set of conferences of interest according to the most relevant criteria. All participants said that the filtering criteria were relevant and reflecting the group’s point of view. They also all thought that the group was satisfied with the selected set of conferences. The crucial characteristics of LIS for this aspect are those of the navigation core of LIS. Firstly, the features of the index propose filtering criteria. They are dynamically computed and they are relevant for the current selection of ob- jects. Secondly, the query with its powerful logic capabilities enables participants to express sophisticated selections. Thirdly, the navigation facilities enable par- ticipants to build powerful queries, even without knowing anything about the syntax. Lastly, users do not have to worry about the consistency of the set of selected objects. The view consistency of Abilis guaranties that all conferences fulfilling the expressed query are indeed present. This aspect is especially important. As claimed by Davis et al. [4], conver- gence in meetings is a slow and painful process for groups. Vogel and Coombes [16] present an experiment that supports the hypothesis that groups selecting ideas from a multicriteria task formulation will converge better than groups working on a single criteria formulation, where convergence is defined as moving from many ideas to a focus on a few ideas that are worthy of further attention. Convergence is very close to the Reduce/Filtering collaboration pattern. They also underline that people try to minimize the effects of information overload by employing con- scious or even unconscious strategies of heuristics in order to reduce information load, where information overload is defined as having too many things to do at once. With their powerful navigation facilities, LIS enable to address a large num- ber of criteria and objects with a limited information overload. Indeed, one can concentrate on local aspects. The global consistency is maintained automatically by the concept lattice. 5 Related work Abilis in its current stage does not pretend to match up to operational group support systems (GSS) which have a much broader scope. LIS, however, could be Building up Shared Knowledge with Logical Information Systems 41 integrated in some of the modules of GSS. For example, M eetingworksT M [13], one of the most established GSS, is a modular toolkit that can be configured to support a wide variety of group tasks. Its “Organize” module proposes a tree structure to help analyze and sort ideas. That structure looks much like the index of LIS. It can be edited by hand and some limited selection is possible. The navigation capabilities of LIS based on the concept lattice are, however, more powerful. Concept analysis has been applied to numerous social contexts, such as so- cial networks [15], computer-mediated communication [9] and domestic violence detection [14]. 
Most of those applications are intended to be applied a posteri- ori, in order to get some understanding of the studied social phenomena. On the contrary, we propose to use Logical Concept Analysis in the course and as a support of the social phenomena itself. In our case, the purpose is to support a collaborative decision process. Our approach is to other social applications, what information retrieval is to data mining. Whereas data mining automatically com- putes a global and static view on a posteriori data, information retrieval (i.e. navigation in and update of the concept lattice) presents the user with a local and dynamic view on live data, and only guides users in their choice. A specificity of LIS is the use of logics. This has consequences both on the queries that can be expressed, and on the feature taxonomy. The use of logics al- lows to express inequalities on numerical attributes, disjunctions and negations in queries. In pure FCA, only conjunctions of Boolean attributes can be expressed. Previous sections have shown how disjunction and negation are important to express selection criteria. In the taxonomy, criteria are organized according to the logical subsumption relation between them in pure FCA, criteria would be presented as a long flat list. Logics help to make the taxonomy more concise and readable by grouping and hierarchizing together similar criteria. The taxonomy can be dynamically updated by end-users. 6 Conclusion In this paper we have shown that a Logical Information System web server could be used to support a group decision process consisting of 1) data preparation 2) distributed individual preselection and update and 3) collaborative data ex- ploration, update and selection. We have presented evidences that the navigation and filtering capabilities of LIS were relevant to quickly reduce the number of target conferences. Secondly, the same capabilities were also helpful to detect inconsistencies and missing knowledge. The updating capabilities of LIS enabled participants to add objects, features and links between them on the fly. As a result the group had a more complete and relevant set of information. Thirdly, the group had built a shared understanding of the relevant information. Acknowledgments The authors thank Pierre Allard and Benjamin Sigonneau for the development and maintenance of Abilis. They thank Pierre Allard, Annie Foret and Alice Hermann for attending the experiment and giving many insight- ful feedbacks. 42 Mireille Ducasse, Sebastien Ferre and Peggy Cellier References 1. Allard, P., Ferré, S., Ridoux, O.: Discovering functional dependencies and associa- tion rules by navigating in a lattice of OLAP views. In: Kryszkiewicz, M., Obiedkov, S. (eds.) Concept Lattices and Their Applications. pp. 199–210. CEUR-WS (2010) 2. Briggs, R.O., Kolfschoten, G.L., de Vreede, G.J., Albrecht, C.C., Lukosch, S.G.: Facilitator in a box: Computer assisted collaboration engineering and process sup- port systems for rapid development of collaborative applications for high-value tasks. In: HICSS. pp. 1–10. IEEE Computer Society (2010) 3. Codd, E., Codd, S., Salley, C.: Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate. Codd & Date, Inc, San Jose (1993) 4. Davis, A., de Vreede, G.J., Briggs, R.: Designing thinklets for convergence. In: AMCIS 2007 Proceedings (2007), http://aisel.aisnet.org/amcis2007/358 5. Ducassé, M., Ferré, S.: Fair(er) and (almost) serene committee meetings with logi- cal and formal concept analysis. 
In: Eklund, P., Haemmerlé, O. (eds.) Proceedings of the International Conference on Conceptual Structures. Springer-Verlag (July 2008), lecture Notes in Artificial Intelligence 5113 6. Ferré, S., Ridoux, O.: A logical generalization of formal concept analysis. In: Mineau, G., Ganter, B. (eds.) International Conference on Conceptual Structures. pp. 371–384. No. 1867 in Lecture Notes in Computer Science, Springer (Aug 2000) 7. Ferré, S., Ridoux, O.: An introduction to logical information systems. Information Processing & Management 40(3), 383–419 (2004) 8. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Heidelberg (1999) 9. Hara, N.: Analysis of computer-mediated communication: Using formal concept analysis as a visualizing methodology. Journal of Educational Computing Research 26(1), 25–49 (2002) 10. den Hengst, M., Adkins, M.: Which collaboration patterns are most challenging: A global survey of facilitators. In: HICSS. p. 17. IEEE Computer Society (2007) 11. Kilgour, D.M., Eden, C.: Handbook of Group Decision and Negotiation, Advances in Group Decision and Negotiation, vol. 4. Springer Netherlands (2010) 12. Kolfschoten, G.L., de Vreede, G.J., Briggs, R.O.: Collaboration engineering. In: Kilgour and Eden [11], chap. 20, pp. 339–357 13. Lewis, L.F.: Group support systems: Overview and guided tour. In: Kilgour and Eden [11], chap. 14, pp. 249–268 14. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G.: A case of using formal con- cept analysis in combination with emergent self organizing maps for detecting domestic violence. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects, Lecture Notes in Computer Science, vol. 5633, pp. 247–260. Springer Berlin / Heidelberg (2009), http://dx.doi.org/10.1007/978-3-642-03067- 3 20, 10.1007/978-3-642-03067-3 20 15. Roth, C., Bourgine, P.: Lattice-based dynamic and overlapping taxonomies: The case of epistemic communities. Scientometrics 69, 429–447 (2006), http://dx.doi.org/10.1007/s11192-006-0161-6, 10.1007/s11192-006-0161-6 16. Vogel, D., Coombes, J.: The effect of structure on convergence activities using group support systems. In: Kilgour and Eden [11], chap. 17, pp. 301–311 Comparing Performance of Algorithms for Generating the Duquenne–Guigues Basis Konstantin Bazhanov and Sergei Obiedkov Higher School of Economics, Moscow, Russia, kostyabazhanov@mail.ru, sergei.obj@gmail.com Abstract. In this paper, we take a look at algorithms involved in the computation of the Duquenne–Guigues basis of implications. The most widely used algorithm for constructing the basis is Ganter’s Next Clo- sure, designed for generating closed sets of an arbitrary closure system. We show that, for the purpose of generating the basis, the algorithm can be optimized. We compare the performance of the original algorithm and its optimized version in a series of experiments using artificially generated and real-life datasets. An important computationally expen- sive subroutine of the algorithm generates the closure of an attribute set with respect to a set of implications. We compare the performance of three algorithms for this task on their own, as well as in conjunction with each of the two versions of Next Closure. 1 Introduction Implications are among the most important tools of formal concept analysis (FCA) [9]. The set of all attribute implications valid in a formal context defines a closure operator mapping attribute sets to concept intents of the context (this mapping is surjective). 
The following two algorithmic problems arise with respect to implications: 1. Given a set L of implications and an attribute set A, compute the closure L(A). 2. Given a formal context K, compute a set of implications equivalent to the set of all implications valid in K, i.e., the cover of valid implications. The first of these problems has received considerable attention in the database literature in application to functional dependencies [14]. Although functional dependencies are interpreted differently than implications, the two are in many ways similar: in particular, they share the notion of semantic consequence and the syntactic inference mechanism (Armstrong rules [1]). A linear-time algo- rithm, LinClosure, has been proposed for computing the closure of a set with respect to a set of functional dependencies (or implications) [3], i.e., for solving the first of the two problems stated above. However, the asymptotic complexity estimates may not always be good indicators for relative performance of algo- rithms in practical situations. In Sect. 3, we compare LinClosure with two c 2011 by the paper authors. CLA 2011, pp. 43–57. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 44 Konstantin Bazhanov and Sergei Obiedkov other algorithms—a “naı̈ve” algorithm, Closure [14], and the algorithm pro- posed in [20]—both of which are non-linear. We analyze their performance in several particular cases and compare them experimentally on several datasets. For the second problem, an obvious choice of the cover is the Duquenne– Guigues, or canonical, basis of implications, which is the smallest set equivalent to the set of valid implications [11]. Unlike for the other frequently occurring FCA algorithmic task, the computation of all formal concepts of a formal con- text [12], only few algorithms have been proposed for the calculation of the canonical basis. The most widely-used algorithm was proposed by Ganter in [10]. Another, attribute-incremental, algorithm for the same problem was de- scribed in [17]. It is claimed to be much faster than Ganter’s algorithm for most practical situations. The Concept Explorer software system [21] uses this algo- rithm to generate the Duquenne–Guigues basis of a formal context. However, we do not discuss it here, for we choose to concentrate on the computation of implications in the lectic order (see Sect. 4). The lectic order is important in the interactive knowledge-acquisition procedure of attribute exploration [8], where implications are output one by one and the user is requested to confirm or reject (by providing a counterexample) each implication. Ganter’s algorithm repeatedly computes the closure of an attribute set with respect to a set of implications; therefore, it relies heavily on a subprocedure implementing a solution to the first problem. In Sect. 4, we describe possible optimizations of Ganter’s algorithm and experimentally compare the original and optimized versions in conjunction with each of the three algorithms for solving the first problem. A systematic comparison with the algorithm from [17] is left for further work. 2 The Duquenne–Guigues Basis of Implications Before proceeding, we quickly recall the definition of the Duquenne–Guigues basis and related notions. 
Given a (formal) context K = (G, M, I), where G is called a set of objects, M is called a set of attributes, and the binary relation I ⊆ G × M specifies which objects have which attributes, the derivation operators (·)I are defined for A ⊆ G and B ⊆ M as follows: A0 = {m ∈ M | ∀g ∈ A : gIm} B 0 = {g ∈ G | ∀m ∈ B : gIm} In words, A0 is the set of attributes common to all objects of A and B 0 is the set of objects sharing all attributes of B. The double application of (·)0 is a closure operator, i.e., (·)00 is extensive, idempotent, and monotonous. Therefore, sets A00 and B 00 are said to be closed. Closed object sets are called concept extents and closed attribute sets are called concept intents of the formal context K. In discussing the algorithms later in the paper, we assume that the sets G and M are finite. An implication over M is an expression A → B, where A, B ⊆ M are at- tribute subsets. It holds in the context if A0 ⊆ B 0 , i.e., every object of the context that has all attributes from A also has all attributes from B. Comparing performance of algorithms for generating the DG basis 45 An attribute subset X ⊆ M respects (or is a model of) an implication A → B if A 6⊆ X or B ⊆ X. Obviously, an implication holds in a context (G, M, I) if and only if {g}0 respects the implication for all g ∈ G. A set L of implications over M defines the closure operator X 7→ L(X) that maps X ⊆ M to the smallest set respecting all the implications in L: \ L(X) = {Y | X ⊆ Y ⊆ M, ∀(A → B) ∈ L : A 6⊆ Y or B ⊆ Y }. We discuss algorithms for computing L(X) in Sect. 3. Note that, if L is the set of all valid implications of a formal context, then L(X) = X 00 for all X ⊆ M . Two implication sets over M are equivalent if they are respected by exactly the same subsets of M . Equivalent implication sets define the same closure op- erator. A minimum cover of an implication set L is a set of minimal size among all implication sets equivalent to L. One particular minimum cover described in [11] is defined using the notion of a pseudo-closed set, which we introduce next. A set P ⊆ M is called pseudo-closed (with respect to a closure operator (·)00 ) if P 6= P 00 and Q00 ⊂ P for every pseudo-closed Q ⊂ P . In particular, all minimal non-closed sets are pseudo-closed. A pseudo-closed attribute set of a formal context is also called a pseudo-intent. The Duquenne–Guigues or canonical basis of implications (with respect to a closure operator (·)00 ) is the set of all implications of the form P → P 00 , where P is pseudo-closed. This set of implications is of minimal size among those defining the closure operator (·)00 . If (·)00 is the closure operator associated with a formal context, the Duquenne–Guigues basis is a minimum cover of valid implications of this context. The computation of the Duquenne–Guigues basis of a formal context is hard, since even recognizing pseudo-intents is a coNP- complete problem [2], see also [13, 7]. We discuss algorithms for computing the basis in Sect. 4. 3 Computing the Closure of an Attribute Set In this section, we compare the performance of algorithms computing the closure of an attribute set X with respect to a set L of implications. Algorithm 1 [14] checks every implication A → B ∈ L and enlarges X with attributes from B if A ⊆ X. The algorithm terminates when a fixed point is reached, that is, when the set X cannot be enlarged any further (which always happens at some moment, since both L and M are assumed finite). 
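As a reading aid, the fixed-point idea behind Algorithm 1 can be written down directly in Python. The sketch below is not the authors' implementation (theirs, described later, is in C++ over bit sets); representing implications as (premise, conclusion) pairs of frozensets is merely a convenient choice for illustration.

# A direct rendering of Algorithm 1 (Closure): scan the implications, fire
# every one whose premise is contained in X, discard fired implications,
# and repeat until a pass changes nothing.

def closure(X, implications):
    X = set(X)
    remaining = list(implications)
    stable = False
    while not stable:
        stable = True
        unused = []
        for premise, conclusion in remaining:
            if premise <= X:
                X |= conclusion
                stable = False        # a fired implication forces one more pass
            else:
                unused.append((premise, conclusion))
        remaining = unused            # fired implications are not examined again
    return X

# X = {1} with implications {i} -> {i+1} listed with descending premises,
# the worst-case input discussed next in the text (Example 1).
n = 5
L = [(frozenset({i}), frozenset({i + 1})) for i in range(n - 1, 0, -1)]
print(sorted(closure({1}, L)))        # [1, 2, 3, 4, 5]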
The algorithm is obviously quadratic in the number of implications in L in the worst case. The worst case happens when exactly one implication is applied at each iteration (but the last one) of the repeat loop, resulting in |L|(|L| + 1)/2 iterations of the for all loop, each requiring O(|M |) time. Example 1. A simple example is when X = {1} and the implications in L = {{i} → {i + 1} | i ∈ N, 0 < i < n} for some n are arranged in the descending order of their one-element premises. 46 Konstantin Bazhanov and Sergei Obiedkov Algorithm 1 Closure(X, L) Input: An attribute set X ⊆ M and a set L of implications over M . Output: The closure of X w.r.t. implications in L. repeat stable := true for all A → B ∈ L do if A ⊆ X then X := X ∪ B stable := false L := L \ {A → B} until stable return X In [3], a linear-time algorithm, LinClosure, is proposed for the same prob- lem. Algorithm 2 is identical to the version of LinClosure from [14] except for one modification designed to allow implications with empty premises in L. Lin- Closure associates a counter with each implication initializing it with the size of the implication premise. Also, each attribute is linked to a list of implications that have it in their premises. The algorithm then checks every attribute m of X (the set whose closure must be computed) and decrements the counters for all implications linked to m. If the counter of some implication A → B reaches zero, attributes from B are added to X. Afterwards, they are used to decrement counters along with the original attributes of X. When all attributes in X have been checked in this way, the algorithm stops with X containing the closure of the input attribute set. It can be shown that the algorithm is linear in the length of the input as- suming that each attribute in the premise or conclusion of any implication in L requires a constant amount of memory [14]. Example 2. The worst case for LinClosure occurs, for instance, when X ⊂ N, M = X ∪ {1, 2, . . . , n} for some n such that X ∩ {1, 2, . . . , n} = ∅ and L consists of implications of the form X ∪ {i | 0 < i < k} → {k} for all k such that 1 ≤ k ≤ n. During each of the first |X| iterations of the for all loop, the counters of all implications will have to be updated with only the last iteration adding one attribute to X using the implication X → {1}. At each of the subsequent n − 1 iterations, the counter for every so far “unused” implication will be updated and one attribute will be added to X. The next, (|X| + n)th, iteration will terminate the algorithm. Note that, if the implications in L are arranged in the superset-inclusion order of their premises, this example will present the worst case for Algorithm 1 requiring n iterations of the main loop. However, if the implications are arranged in the subset-inclusion order of their premises, one iteration will be sufficient. Inspired by the mechanism used in LinClosure to obtain linear asymp- totic complexity, but somewhat disappointed by the poor performance of the Comparing performance of algorithms for generating the DG basis 47 Algorithm 2 LinClosure(X, L) Input: An attribute set X ⊆ M and a set L of implications over M. Output: The closure of X w.r.t. implications in L. 
for all A → B ∈ L do count[A → B] := |A| if |A| = 0 then X := X ∪ B for all a ∈ A do add A → B to list[a] update := X while update 6= ∅ do choose m ∈ update update := update \ {m} for all A → B ∈ list[m] do count[A → B] = count[A → B] − 1 if count[A → B] = 0 then add := B \ X X := X ∪ add update := update ∪ add return X algorithm relative to Closure, which was revealed in his experiments, Wild proposed a new algorithm in [20]. We present this algorithm (in a slightly more compact form) as Algorithm 3. The idea is to maintain implication lists similar to those used in LinClosure, but get rid of the counters. Instead, at each step, the algorithm combines the implications in the lists associated with attributes not occurring in X and “fires” the remaining implications (i.e., uses them to enlarge X). When there is no implication to fire, the algorithm terminates with X containing the desired result. Wild claims that his algorithm is faster than both LinClosure and Clo- sure, even though it has the same asymptotic complexity as the latter. The worst case for Algorithm 3 is when L \ L1 contains exactly one implication A → B and B \ X contains exactly one attribute at each iteration of the repeat . . . until loop. Example 1 presents the worst case for 3, but, unlike for Closure, the order of implications in L is irrelevant. The worst case for LinClosure (see Example 2) is also the worst case for Algorithm 3, but it deals with it, perhaps, in a more efficient way using n iterations of the main loop compared to n + |X| iterations of the main loop in LinClosure. Experimental Comparison We implemented the algorithms in C++ using Microsoft Visual Studio 2010. For the implementation of attribute sets, as well as sets of implications in Algorithm 3, we used dynamic bit sets from the Boost library [6]. All the tests described in the following sections were carried out on an Intel Core i5 2.67 GHz computer with 4 Gb of memory running under Windows 7 Home Premium x64. 48 Konstantin Bazhanov and Sergei Obiedkov Algorithm 3 Wild’s Closure(X, L) Input: An attribute set X ⊆ M and a set L of implications over M. Output: The closure of X w.r.t. implications in L. for all m ∈ M do for all A → B ∈ L do if m ∈ A then add A → B to list[m] repeat stable S := true L1 := m∈M \X list[m] for all A → B ∈ L \ L1 do X := X ∪ B stable := false L := L1 until stable return X Figure 1 shows the performance of the three algorithms on Example 1. Algo- rithm 2 is the fastest algorithm in this case: for a given n, it needs n iterations of the outer loop—the same as the other two algorithms, but the inner loop of Algorithm 2 checks exactly one implication at each iteration, whereas the inner loop of Algorithm 1 checks n − i implications at the ith iteration. Although the inner loop of Algorithm 3 checks only one implication at the ith iteration, it has to compute the union of n − i lists in addition. 40 35 Time in sec for 1000 tests 30 25 20 Closure 15 LinClosure Wild's Closure 10 5 0 0 100 200 300 400 500 600 700 800 900 n Fig. 1. The performance of Algorithms 1–3 for Example 1. Comparing performance of algorithms for generating the DG basis 49 Figure 2 shows the performance of the algorithms on Example 2. Here, the behavior of Algorithm 2 is similar to that of Algorithm 1, but Algorithm 2 takes more time due to the complicated initialization step. 40 35 Time in sec for 1000 tests 30 25 20 Closure 15 LinClosure Wild's Closure 10 5 0 0 100 200 300 400 500 600 700 800 900 n Fig. 2. 
The performance of Algorithms 1–3 for Example 2 with implications in L arranged in the superset-inclusion order of their premises and |X| = 50. Interestingly, Algorithm 1 works amost twice as fast on Example 2 as it does on Example 1. This may seem surprising, since it is easy to see that the algorithm performs essentially the same computations in both cases, the difference being that the implications of Example 1 have single-element premises. However, this turns out to be a source of inefficiency: at each iteration of the main loop, all implications but the last fail to fire, but, for each of them, the algorithm checks if their premises are included in the set X. Generally, when A 6⊆ X, this can be established easier if A is large, for, in this case, A is likely to contain more elements outside X. This effect is reinforced by the implementation of sets as bit strings: roughly speaking, to verify that {i} 6⊆ {1}, it is necessary to check all bits up {i}, whereas {i | 0 < i < k} 6⊆ {k + 1} can be established by checking only one bit (assuming that bits are checked from left to right). Alternative data structures for set implementation might have less dramatic consequences for performance in this setting. On the other hand, the example shows that performance may be affected by issues not so obviously related to the structure of the algorithm, thus, suggesting additional paths to obtain an optimal behavior (e.g., by rearranging attributes or otherwise preprocessing the input data). We have experimented with computing closures using the Duquenne–Guigues bases of formal contexts as input implication sets. Table 1 shows the results for randomly generated contexts. The first two columns indicate the size of the at- tribute set and the number of implications, respectively. The remaining three columns record the time (in seconds) for computing the closures of 1000 ran- 50 Konstantin Bazhanov and Sergei Obiedkov domly generated subsets of M by each of the three algorithms. Table 3 presents similar results for datasets taken from the UCI repository [5] and, if necessary, transformed into formal contexts using FCA scaling [9].1 The contexts are de- scribed in Table 2, where the last four columns correspond to the number of objects, number of attributes, number of intents, and number of pseudo-intents (i.e., the size of the canonical basis) of the context named in the first column. Table 1. Performance on randomly generated tests (time in seconds per 1000 closures) Algorithm |M | |L| 1 2 3 30 557 0.0051 0.2593 0.0590 50 1115 0.0118 0.5926 0.1502 100 380 0.0055 0.2887 0.0900 100 546 0.0086 0.4229 0.1350 100 2269 0.0334 1.5742 0.5023 100 3893 0.0562 2.6186 0.8380 100 7994 0.1134 5.3768 1.7152 100 8136 0.1159 5.6611 1.8412 Table 2. Contexts obtained from UCI datasets Context |G| |M | # intents # pseudo-intents Zoo 101 28 379 141 Postoperative Patient 90 26 2378 619 Congressional Voting 435 18 10644 849 SPECT 267 23 21550 2169 Breast Cancer 286 43 9918 3354 Solar Flare 1389 49 28742 3382 Wisconsin Breast Cancer 699 91 9824 10666 In these experiments, Algorithm 1 was the fastest and Algorithm 2 was the slowest, even though it has the best asymptotic complexity. This can be partly explained by the large overhead of the initialization step (setting up counters and implication lists). Therefore, these results can be used as a reference only when the task is to compute one closure for a given set of implications. 
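To make the source of that overhead visible, the following sketch spells out the bookkeeping of Algorithm 2 (LinClosure) as described above: one counter per implication, initialized to the size of its premise, and, for every attribute, the list of implications whose premise mentions it. Setting up these structures is precisely the initialization cost that weighs most when only a single closure is needed; the data layout (Python dictionaries and lists) is our own choice, not the paper's C++ implementation.

# The bookkeeping of Algorithm 2 (LinClosure): counters and per-attribute
# implication lists; the initialization loop is the overhead discussed above.

def lin_closure(X, implications):
    X = set(X)
    count = {}
    occurs = {}                              # attribute -> implications whose premise contains it
    for idx, (premise, conclusion) in enumerate(implications):
        count[idx] = len(premise)
        if not premise:                      # implications with empty premises fire at once
            X |= conclusion
        for a in premise:
            occurs.setdefault(a, []).append(idx)
    update = set(X)
    while update:
        m = update.pop()
        for idx in occurs.get(m, ()):
            count[idx] -= 1
            if count[idx] == 0:              # the whole premise is now inside X
                new = implications[idx][1] - X
                X |= new
                update |= new
    return X

n = 5
L = [(frozenset({i}), frozenset({i + 1})) for i in range(n - 1, 0, -1)]
assert lin_closure({1}, L) == {1, 2, 3, 4, 5}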
When 1 The breast cancer domain was obtained from the University Medical Centre, Insti- tute of Oncology, Ljubljana, Yugoslavia (now, Slovenia). Thanks go to M. Zwitter and M. Soklic for providing the data. Comparing performance of algorithms for generating the DG basis 51 a large number of closures must be computed with respect to the same set of implications, Algorithms 2 and 3 may be more appropriate. Table 3. Performance on the canonical bases of contexts from Table 2 (time in seconds per 1000 closures) Algorithm Context 1 2 3 Zoo 0.0036 0.0905 0.0182 Postoperative Patient 0.0054 0.2980 0.0722 Congressional Voting 0.0075 0.1505 0.0883 SPECT 0.0251 0.9848 0.2570 Breast Cancer 0.0361 1.7912 0.5028 Solar Flare 0.0370 2.1165 0.6317 Wisconsin Breast Cancer 0.1368 8.4984 2.4730 4 Computing the Basis in the Lectic Order The best-known algorithm for computing the Duquenne–Guigues basis was de- veloped by Ganter in [10]. The algorithm is based on the fact that intents and pseudo-intents of a context taken together form a closure system. This makes it possible to iteratively generate all intents and pseudo-intents using Next Clo- sure (see Algorithm 4), a generic algorithm for enumerating closed sets of an arbitrary closure operator (also proposed in [10]). For every generated pseudo- intent P , an implication P → P 00 is added to the basis. The intents, which are also generated, are simply discarded. Algorithm 4 Next Closure(A, M , L) Input: A closure operator X 7→ L(X) on M and a subset A ⊆ M . Output: The lectically next closed set after A. for all m ∈ M in reverse order do if m ∈ A then A := A \ {m} else B := L(A ∪ {m}) if B \ A contains no element < m then return B return ⊥ Next Closure takes a closed set as input and outputs the next closed set according to a particular lectic order, which is a linear extension of the subset- 52 Konstantin Bazhanov and Sergei Obiedkov inclusion order. Assuming a linear order < on attributes in M , we say that a set A ⊆ M is lectically smaller than a set B ⊆ M if ∃b ∈ B \ A ∀a ∈ A(a < b ⇒ a ∈ B). In other words, the lectically largest among two sets is the one containing the smallest element in which they differ. Example 3. Let M = {a < b < c < d < e < f }, A = {a, c, e} and B = {a, b, f }. Then, A is lectically smaller than B, since the first attribute in which they differ, b, is in B. Note that if we represent sets by bit strings with smaller attributes corresponding to higher-order bits (in our example, A = 101010 and B = 110001), the lectic order will match the usual less-than order on binary numbers. To be able to use Next Closure for iterating over intents and pseudo- intents, we need access to the corresponding closure operator. This operator, which we denote by • , is defined via the Duquenne–Guigues basis L as follows.2 For a subset A ⊆ M , put [ A+ = A ∪ {P 00 | P → P 00 ∈ L, P ⊂ A}. Then, A• = A++···+ , where A•+ = A• ; i.e., • is the transitive closure of + . The problem is that L is not available when we start; in fact, this is precisely what we want to generate. Fortunately, for computing a pseudo-closed set A, it is sufficient to know only implications with premises that are proper subsets of A. Generating pseudo-closed sets in the lectic order, which is compatible with the subset-inclusion order, we ensure that, at each step, we have at hand the required part of the basis. Therefore, we can use any of the three algorithms from Sect. 3 to compute A• (provided that the implication A• → A00 has not been added to L yet). 
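The interplay just described, Next Closure driven by the closure operator of the implications found so far, with an implication P -> P'' recorded whenever the generated set P is not an intent, can be condensed into the following self-contained sketch. It anticipates Algorithm 5, presented next; the tiny context, the integer attribute order and the naive implication-closure helper are choices made for illustration, not the authors' code.

# Next Closure (Algorithm 4) over the operator L(.) defined by the
# implications found so far, together with the driver loop that
# Algorithm 5 below makes explicit.

def implication_closure(X, implications):
    X = set(X)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= X and not conclusion <= X:
                X |= conclusion
                changed = True
    return X

def double_prime(A, context, M):
    """A'' in a context given as {object: set of attributes}."""
    rows = [attrs for attrs in context.values() if A <= attrs]
    result = set(M)
    for attrs in rows:
        result &= attrs
    return result

def next_closure(A, M, close):
    """Lectically next closed set after A (smaller integers = smaller attributes)."""
    A = set(A)
    for m in sorted(M, reverse=True):
        if m in A:
            A.discard(m)
        else:
            B = close(A | {m})
            if not any(b < m for b in B - A):
                return B
    return None                              # no further closed set

def canonical_basis(context, M):
    basis = []                               # implications P -> P'' found so far
    A = set()                                # the lectically first set considered
    while A is not None and A != set(M):
        A2 = double_prime(A, context, M)
        if A != A2:                          # A is a pseudo-intent, not an intent
            basis.append((frozenset(A), frozenset(A2)))
        A = next_closure(A, M, lambda X: implication_closure(X, basis))
    return basis

# A three-attribute toy context in which every object has attribute 1.
M = {0, 1, 2}
context = {"g1": {0, 1}, "g2": {1, 2}, "g3": {1}}
for premise, conclusion in canonical_basis(context, M):
    print(sorted(premise), "->", sorted(conclusion))   # [] -> [1]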
Algorithm 5 uses Next Closure to generate the canonical basis. It passes Next Closure the part of the basis computed so far; Next Closure may call any of the Algorithms 1–3 to compute the closure, L(A ∪ {m}), with respect to this set of implications. After Next Closure computes A• , the implication A• → A00 may be added to the basis. Algorithm 5 will then pass A• as the input to Next Closure, but there is some room for optimizations here. Let i be the maximal element of A and j be the minimal element of A00 \ A. Consider the following two cases: j < i: As long as m > i, the set L(A• ∪{m}) will be rejected by Next Closure, since it will contain j. Hence, it makes sense to skip all m > i and continue as if A• had been rejected by Next Closure. This optimization has already been proposed in [17]. i < j: It can be shown that, in this case, the lectically next intent or pseudo- intent after A• is A00 . Hence, A00 could be used at the next step instead of A• . Algorithm 6 takes these considerations into account. 2 We deliberately use the same letter L for an implication set and the closure operator it defines. Comparing performance of algorithms for generating the DG basis 53 Algorithm 5 Canonical Basis(M , 00 ) Input: A closure operator X 7→ X 00 on M , e.g., given by a formal context (G, M, I). Output: The canonical basis for the closure operator. L := ∅ A := ∅ while A 6= M do if A 6= A00 then L := L ∪ {A → A00 } A := Next Closure(A, M, L) return L Algorithm 6 Canonical Basis(M , 00 ), an optimized version Input: A closure operator X 7→ X 00 on M , e.g., given by a formal context (G, M, I). Output: The canonical basis for the closure operator. L := ∅ A := ∅ i := the smallest element of M while A 6= M do if A 6= A00 then L := L ∪ {A → A00 } if A00 \ A contains no element < i then A := A00 i := the largest element of M else A := {m ∈ A | m ≤ i} for all j ≤ i ∈ M in reverse order do if j ∈ A then A := A \ {j} else B := L(A ∪ {j}) if B \ A contains no element < j then A := B i := j exit for return L 54 Konstantin Bazhanov and Sergei Obiedkov Experimental Comparison We used Algorithms 5 and 6 for constructing the canonical bases of the contexts involved in testing the performance of the algorithms from Sect. 3, as well as the context (M, M, 6=) with |M | = 18, which is special in that every subset of M is closed (and hence there are no valid implications). Both algorithms have been tested in conjunction with each of the three procedures for computing closures (Algorithm 1–3). The results are presented in Table 4 and Fig. 3. It can be seen that Algorithm 6 indeed improves on the performance of Algorithm 5. Among the three algorithms computing the closure, the simpler Algorithm 1 is generally more efficient, even though, in our implementation, we do not perform the initialization step of Algorithms 2 and 3 from scratch each time we need to compute a closure of a new set; instead, we reuse the previously constructed counters and implication lists and update them incrementally with the addition of each new implication. We prefer to treat these results as preliminary: it still remains to see whether the asymptotic behavior of LinClosure will give it an advantage over the other algorithms on larger contexts. Table 4. 
Time (in seconds) for building the canonical bases of artificial contexts Algorithm Context # intents # pseudo-intents 5+1 5+2 5+3 6+1 6+2 6+3 100 × 30, 4 307 557 0.0088 0.0145 0.0119 0.0044 0.0065 0.0059 10 × 100, 25 129 380 0.0330 0.0365 0.0431 0.0073 0.0150 0.0169 100 × 50, 4 251 1115 0.0442 0.0549 0.0617 0.0138 0.0152 0.0176 10 × 100, 50 559 546 0.0542 0.1312 0.1506 0.0382 0.0932 0.0954 20 × 100, 25 716 2269 0.3814 0.3920 0.7380 0.1219 0.1312 0.2504 50 × 100, 10 420 3893 1.1354 0.7291 1.6456 0.1640 0.1003 0.2299 900 × 100, 4 2472 7994 4.6313 2.7893 6.3140 1.5594 0.8980 2.0503 20 × 100, 50 12394 8136 7.3097 8.1432 14.955 5.1091 6.0182 10.867 (M, M, 6=) 262144 0 0.1578 0.3698 0.1936 0.1333 0.2717 0.1656 5 Conclusion In this paper, we compared the performance of several algorithms computing the closure of an attribute set with respect to a set of implications. Each of these algorithms can be used as a (frequently called) subroutine while computing the Duquenne–Guigues basis of a formal context. We tested them in conjunction with Ganter’s algorithm and its optimized version. In our future work, we plan to extend the comparison to algorithms generat- ing the Duquenne–Guigues basis in a different (non-lectic) order, in particular, to incremental [17] and divide-and-conquer [19] approaches, probably, in conjunc- tion with newer algorithms for computing the closure of a set [16]. In addition, Comparing performance of algorithms for generating the DG basis 55 Fig. 3. Time (in seconds) for building the canonical bases of contexts from Table 2 0,18 0,16 0,14 5+1 0,12 5+2 0,10 5+3 0,08 6+1 0,06 6+2 0,04 6+3 0,02 0,00 Zoo Postoperative Patient Congressional Voting 1,6 1,4 1,2 5+1 1,0 5+2 0,8 5+3 6+1 0,6 6+2 0,4 6+3 0,2 0,0 SPECT Breast Cancer 16 14 12 5+1 10 5+2 8 5+3 6+1 6 6+2 4 6+3 2 0 Solar Flare Wisconsin Breast Cancer 56 Konstantin Bazhanov and Sergei Obiedkov we are going to consider algorithms that generate other implication covers: for example, direct basis [15, 20, 4] or proper basis [18]. They can be used as an inter- mediate step in the computation of the Duquenne–Guigues basis. If the number of intents is much larger than the number of pseudo-intents, this two-step ap- proach may be more efficient than direct generation of the Duquenne–Guigues basis with Algorithms 5 or 6, which produce all intents as a side effect. Acknowledgements The second author was supported by the Academic Fund Program of the Higher School of Economics (project 10-04-0017) and the Russian Foundation for Basic Research (grant no. 08-07-92497-NTsNIL a). References 1. Armstrong, W.: Dependency structure of data base ralationship. Proc. IFIP Congress pp. 580–583 (1974) 2. Babin, M.A., Kuznetsov, S.O.: Recognizing pseudo-intents is coNP-complete. In: Kryszkiewicz, M., Obiedkov, S. (eds.) Proceedings of the 7th International Con- ference on Concept Lattices and Their Applications. pp. 294–301. University of Sevilla, Spain (2010) 3. Beeri, C., Bernstein, P.: Computational problems related to the design of normal form relational schemas. ACM TODS 4(1), 30–59 (March 1979) 4. Bertet, K., Monjardet, B.: The multiple facets of the canonical direct unit impli- cational basis. Theor. Comput. Sci. 411(22-24), 2155–2166 (2010) 5. Blake, C., Merz, C.: UCI repository of machine learning databases (1998), http://archive.ics.uci.edu/ml 6. Demming, R., Duffy, D.: Introduction to the Boost C++ Libraries. Datasim Edu- cation Bv (2010), see http://www.boost.org 7. 
Distel, F., Sertkaya, B.: On the complexity of enumerating pseudo-intents. Discrete Appl. Math. 159, 450–466 (March 2011) 8. Ganter, B.: Attribute exploration with background knowledge. Theor. Comput. Sci. pp. 215–233 (1999) 9. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Berlin (1999) 10. Ganter, B.: Two basic algorithms in concept analysis. Preprint 831, Technische Hochschule Darmstadt, Germany (1984) 11. Guigues, J.L., Duquenne, V.: Familles minimales d’implications informatives re- sultant d’un tableau de donnees binaires. Math. Sci. Hum. 95(1), 5–18 (1986) 12. Kuznetsov, S., Obiedkov, S.: Comparing performance of algorithms for generating concept lattices. Journal of Experimental and Theoretical Artificial Intelligence 14(2/3), 189–216 (2002) 13. Kuznetsov, S.O., Obiedkov, S.: Some decision and counting problems of the Duquenne–Guigues basis of implications. Discrete Appl. Math. 156(11), 1994–2003 (2008) 14. Maier, D.: The theory of relational databases. Computer software engineering se- ries, Computer Science Press (1983) Comparing performance of algorithms for generating the DG basis 57 15. Mannila, H., Räihä, K.J.: The design of relational databases. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (1992) 16. Mora, A., Aguilera, G., Enciso, M., Cordero, P., de Guzman, I.P.: A new closure algorithm based in logic: SLFD-Closure versus classical closures. Inteligencia Ar- tificial, Revista Iberoamericana de IA 10(31), 31–40 (2006) 17. Obiedkov, S., Duquenne, V.: Attribute-incremental construction of the canonical implication basis. Annals of Mathematics and Artificial Intelligence 49(1-4), 77–99 (April 2007) 18. Taouil, R., Bastide, Y.: Computing proper implications. In Proc. ICCS-2001 In- ternational Workshop on Concept Lattices-Based Theory, Methods and Tools for Knowledge Discovery in Databases pp. 290–303 (2001) 19. Valtchev, P., Duquenne, V.: On the merge of factor canonical bases. In: Medina, R., Obiedkov, S. (eds.) ICFCA. Lecture Notes in Computer Science, vol. 4933, pp. 182–198. Springer (2008) 20. Wild, M.: Computations with finite closure systems and implications. In: Comput- ing and Combinatorics. pp. 111–120 (1995) 21. Yevtushenko, S.A.: System of data analysis “Concept Explorer” (in Russian). In: Proceedings of the 7th national conference on Artificial Intelligence KII-2000. pp. 127–134. Russia (2000), http://conexp.sourceforge.net/ Filtering Machine Translation Results with Automatically Constructed Concept Lattices Yılmaz Kılıçaslan1 and Edip Serdar Güner1, 1 Trakya University, Department of Computer Engineering, 22100 Edirne, Turkey {yilmazk, eserdarguner}@trakya.edu.tr Abstract. Concept lattices can significantly improve machine translation systems when applied as filters to their results. We have developed a rule-based machine translator from Turkish to English in a unification-based programming paradigm and supplemented it with an automatically constructed concept lattice. The test results achieved by applying this translation system to a Turkish child story reveals that lattices used as filters to translation results have a promising potential to improve machine translation. We have compared our system with Google Translate on the data. The comparison suggests that a rule- based system can even compete with this statistical machine translation system that stands out with its wide range of users. Keywords: Concept Lattices, Rule-based Machine Translation, Evaluation of MT systems. 
1 Introduction Paradigms of Machine translation (MT) can be classified into two major categories depending on their focus: result-oriented paradigms and process-oriented ones. Statistical MT focuses on the result of the translation, not the translation process itself. In this paradigm, translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. Rule-based MT, a more classical paradigm, focuses on the selection of representations to be used and steps to be performed during the translation process. It is the rule-based paradigm that will be the concern of this paper. We argue for the viability of a rule-based translation model where a concept lattice functions as a filter for its results. In what follows, we first introduce the classical models for doing rule-based MT, illustrating particular problematic cases with translation pairs between Turkish and English (cf. Section 2). Then, we briefly introduce the basic notions of Formal Concept Analysis (FCA) and touch upon the question of how lattices built using FCA can serve as a bridge between two languages (cf. Section 3). This is followed by the presentation of our translation system (cf. Section 4). Subsequently, we report on and evaluate several experiments which we have performed by feeding our translation system with a Turkish child story text (cf. Section 5). The discussion ends with some remarks and with a summary of the paper (cf. Section 6). c 2011 by the paper authors. CLA 2011, pp. 59–73. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 60 Yılmaz Kılıçaslan and Edip Serdar Güner 2 Models for Rule-Based Translation 2.1 Direct Translation The most straightforward MT strategy is the so-called direct translation. Basically, the strategy is to translate each word into its target language counterpart while proceeding word-by-word through the source language text or speech. If the only difference between two languages were due to their lexical choices, this approach could be a very easy way of producing high quality translation. However, languages differ from each other not only lexically but also structurally. In fact, the direct translation strategy works very well only for very simple cases like the following: (1) Turkish: Direct Translation to English: Köpek-ler havlar-lar. Dogs bark. dog-pl bark-3pl In this example, the direct translation strategy provides us with a perfect translation of the Turkish sentence (interpreted as a kind-level statement about dogs). But, consider now the following example: (2) Turkish: Direct Translation to English: Supposing that the referent of the pronoun is a male person, the expected translation for the given Turkish sentence would be the following: (3) Correct Translation: The woman knows him. The direct translation approach fails in this example in the following respects: First, the translation results in a subject-object-verb (SOV) ordering, which does not comply with the canonical SVO ordering in English. SOV is the basic word order in Turkish. Second, the subject does not have the required definite article in the translation. The reason for this is another typological difference between the two languages: Turkish lacks a definite article. 
Third, the word-by-word translation leaves the English auxiliary verb ambiguous with respect to number, as the Turkish verb does not carry the number information. Fourth, the verb know is encoded in the progressive aspect in the translation, which is unacceptable as it denotes a mental state. This anomaly is the result of directly translating the Turkish continuous suffix –yor to the English suffix –ing. Fifth, the pronoun is left ambiguous with respect to gender in the translation, as Turkish pronouns do not bear this information. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 61 2.2 Transfer Approach 2.2.1 Syntactic Transfer As Jurafsky and Martin [6] point out, examples like those above suggest that the direct approach to MT is too focused on individual words and that we need to add phrasal and structural knowledge into our MT models to achieve better results. It is through the transfer approach that a rule-based strategy incorporates the structural knowledge into the MT model. In this approach, MT involves three phases: analysis, transfer, and generation. In the analysis phase, the source language text is parsed into a syntactic and/or semantic structure. In the transfer phase, the structure of the source language is transformed to a structure of the target language. The generation phase takes this latter structure as input and turns it to an actual text of the target language. Let us first see how the transfer technique can make use of syntactic knowledge to improve the translation result of the example discussed above. Assuming a simple syntactic paradigm, the input sentence can be parsed into the following structure: (4) Once the sentence has been parsed, the resulting tree will undergo a syntactic transfer operation to resemble the target parse tree and this will be followed by a lexical transfer operation to generate the target text: (5) The syntactic transfer exploits the following facts about English: a singular count noun must have a determiner and the subject agrees in number and person with the verb. Collecting the leaves of the target parse tree, we get the following output: (6) Translation via Syntactic Transfer: This output is free from the first three defects noted with the direct translation. However, the problem of encoding the mental state verb in progressive aspect and the 62 Yılmaz Kılıçaslan and Edip Serdar Güner gender ambiguity of the pronoun still await to be resolved. These require meaning- related knowledge to be incorporated into the MT model. 2.2.2 Semantic Transfer The context-independent aspect of meaning is called semantic meaning. A crucial component of the semantic meaning of a natural language sentence is its lexical aspect, which determines whether the situation that the sentence describes is a (punctual) event, a process or a state. This information is argued to be inherently encoded in the verb. Obviously, knowing is a mental state and, hence, cannot be realized in the progressive aspect. We can apply a shallow semantic analysis to our previously obtained syntactic structure, which will give us a tree structure enriched with aspectual information, and thereby achieve a more satisfactory transfer: (7) The resulting translation is the following: (8) Translation via Semantic Transfer: 2.3 Interlingua Approach There are two problems with the transfer model: it requires contrastive knowledge about languages and it requires such knowledge for every pair of languages. 
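To make concrete what this contrastive knowledge amounts to for the Turkish–English pair of Section 2.2, the sketch below reorders an SOV parse into SVO, adds the definite article required by a singular count-noun subject, enforces third-person-singular agreement, and performs lexical transfer. This is a minimal illustration under invented names, not the authors' implementation (their system, described in Section 4.3, is written in SWI-Prolog); the gender of the pronoun is deliberately left unresolved, as in translation (8).

```python
# Hypothetical sketch of one Turkish -> English transfer rule (illustration only).
LEXICON = {"kadın": "woman", "onu": "him/her/it", "tanımak": "know"}  # toy dictionary

def syntactic_transfer(sov_parse):
    """Reorder a Turkish SOV parse into the canonical English SVO order."""
    subject, obj, verb = sov_parse
    return subject, verb, obj

def generate(svo_parse):
    """Apply target-language constraints: a singular count-noun subject takes a
    determiner and the verb agrees with it in number and person (3sg here)."""
    subject, verb, obj = svo_parse
    words = ["the " + LEXICON[subject], LEXICON[verb] + "s", LEXICON[obj]]
    return " ".join(words).capitalize() + "."

# "Kadın onu tanıyor" -> "The woman knows him/her/it."
print(generate(syntactic_transfer(("kadın", "onu", "tanımak"))))
```

A rule set of this kind has to be written for every language pair, which is exactly the burden that motivates the interlingua approach discussed next.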
If the meaning of the input can be extracted and encoded in a language-independent form and the output can, in turn, be generated out of this form, there will be no need for any kind of contrastive knowledge. A language-independent meaning representation language to be used in such a scheme is usually referred to as an interlingua. A common way to visualize the three approaches to rule-based MT is with the Vauquois triangle shown below (adapted from [6]):

Fig. 1. The Vauquois triangle.

As Jurafsky and Martin point out: [t]he triangle shows the increasing depth of analysis required (on both the analysis and generation end) as we move from the direct approach through transfer approaches, to interlingual approaches. In addition, it shows the decreasing amount of transfer knowledge needed as we move up the triangle, from huge amounts of transfer at the direct level (almost all knowledge is transfer knowledge for each word) through transfer (transfer rules only for parse trees or thematic roles) through interlingua (no specific transfer knowledge). (p. 867)

3 Lattice-Based Interlingua Strategy

A question left open above is that of what kind of representation scheme can be used as an interlingua. There are many possible alternatives such as predicate calculus, Minimal Recursion Semantics or an event-based representation. Another interesting possibility is to use lattices built using Formal Concept Analysis (FCA) as meaning representations to this effect. FCA, developed by Ganter & Wille [5], assumes that data from an application are given by a formal context, a triple (G, M, I) consisting of two sets G and M and a so-called incidence relation I between these sets. The elements of G are called the objects and the elements of M are called the attributes. The relation I holds between g and m, i.e. (g, m) ∈ I, if and only if the object g has the attribute m. A formal context induces two operators, both of which are usually denoted by ʹ. One of these operators maps each set of objects A to the set of attributes Aʹ which these objects have in common. The other operator maps each set of attributes B to the set of objects Bʹ which satisfy all these attributes. FCA is in fact an attempt to give a formal definition of the notion of a ‘concept’. A formal concept of the context (G, M, I) is a pair (A, B), with A ⊆ G and B ⊆ M, such that Aʹ = B and Bʹ = A. A is called the extent and B the intent of the concept (A, B). The set of all concepts of the context (G, M, I) is denoted by C(G, M, I). This set is ordered by a subconcept–superconcept relation, which is a partial order relation denoted by ≤. If (A1, B1) and (A2, B2) are concepts in C(G, M, I), the former is said to be a subconcept of the latter (or, the latter a superconcept of the former), i.e., (A1, B1) ≤ (A2, B2), if and only if A1 ⊆ A2 (which is equivalent to B1 ⊇ B2). The ordered set C(G, M, I; ≤) is called the concept lattice (or Galois lattice) of the context (G, M, I). A concept lattice can be drawn as a (Hasse) diagram in which concepts are represented by nodes interconnected by lines going down from superconcept nodes to subconcept ones. Priss [15], rewording an idea first mentioned by Kipke & Wille [8], suggests that once linguistic databases are formalized as concept lattices, the lattices can serve as an interlingua. She explains how a concept lattice can serve as a bridge between two languages with the aid of the figure below (taken from [13]): Fig. 2.
– A concept lattice as an interlingua. [This figure] shows separate concept lattices for English and German words for “building”. The main difference between English and German is that in English “house” only applies to small residential buildings (denoted by letter “H”), whereas in German even small office buildings (denoted by letter “O”) and larger residential buildings can be called “Haus”. Only factories would not normally be called “Haus” in German. The lattice in the top of the figure constitutes an information channel in the sense of Barwise & Seligman [2] between the German and the English concept lattice. ([15] p. 158) We consider Priss’s approach a promising avenue for interlingua-based translation strategies. We suggest that this approach can work not only for isolated words but also even for text fragments. In what follows, we will sketch out a strategy with interlingual concept lattices serving as filters for refining translation results. The strategy proceeds as follows: 1) Compile a concept lattice from a data source like WordNet. 2) Link the nodes of the lattice to their possibly corresponding expressions in the source and target language. 3) Translate the input text into the target language with no consideration of the pragmatic aspects of its meaning. 4) Integrate the concepts derived from the input text into the concept lattice. The main motivation behind this strategy is to refine the translation results to a certain extent by means of pragmatic knowledge structured as formal contexts. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 65 4 A Translation System with Interlingual Concept Lattices 4.1 A Concept Lattice Generator Concept lattices to be used as machine translation filters should contain concept nodes associated with both functional and substantive words. All languages have a finite number of functional words. Therefore, a manual construction of the lattice fragments that would contain them would be reasonable. However, manually constructing a concept lattice for lexical words would have considerable drawbacks such as the following: • It is labor intensive. • It is prone to yielding errors which are difficult to detect automatically. • It generates incomplete lists that are costly to extend to cover missing information. • It is not easy to adapt to changes and domain-specific needs. Taking these potential problems into consideration, we have developed a tool for generating concept lattices for lexical words automatically. As this is an FCA application, it is crucial to decide on which formal context to use before delving its implementation details. Priss & Old [16] propose to construct concept neighborhoods in WordNet with a formal context where the formal objects are the words of the synsets belonging to all senses of a word, the formal attributes are the words of the hypernymic synsets and the incidence relation is the semantic relation between the synsets and their hypernymic synsets. The neighborhood lattice of a word in WordNet consists of all words that share some senses with that word.1 Below is the neighborhood lattice their method yields for the word volume: Fig. 3. – Priss and Old’s neighborhood lattice for the word volume. 1 As lattices often grow very rapidly to a size too large to be visualized, Wille [18] describes a method for constructing smaller, so-called “neighborhood” lattices. 66 Yılmaz Kılıçaslan and Edip Serdar Güner Consider the bottom node. The concept represented by this node is not a naturally occurring one. 
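For readers less familiar with FCA, the two derivation operators of Section 3 and the neighbourhood context just described can be illustrated with a few lines of code. The context below is an invented, drastically simplified stand-in for the WordNet data (objects are the words of the synsets of volume, attributes are words of hypernymic synsets); only the mechanics of the two prime operators matter here.

```python
# Toy neighbourhood context in the style of Priss & Old (invented data).
# object -> set of attributes (words of hypernymic synsets)
context = {
    "volume":   {"publication", "amount", "magnitude", "property"},
    "book":     {"publication"},
    "bulk":     {"magnitude", "property"},
    "loudness": {"property"},
}

def objects_prime(A):
    """A -> A': the attributes shared by every object in A."""
    if not A:
        return set().union(*context.values())
    return set.intersection(*(context[g] for g in A))

def attributes_prime(B):
    """B -> B': the objects that have every attribute in B."""
    return {g for g, attrs in context.items() if B <= attrs}

A = {"volume", "bulk"}
B = objects_prime(A)              # {'magnitude', 'property'}
print(B, attributes_prime(B))     # (A, B) is a formal concept: B' == A here
```

With these operators in mind, the defect of the bottom node of Fig. 3 can be stated precisely: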
Obviously, the adopted formal context causes two distinct natural concepts to collapse into one single formal concept here. The reason is simply that WordNet employs one single word, i.e., volume, for two distinct senses, i.e., publication and amount. This could leave a translation attempt with the task of disambiguating this word. In fact, WordNet marks each sense with a single so-called synset number. When constructing concept lattices in WordNet, we suggest two amendments to the formal context adopted by Priss and Old. First, the formal objects are to be the synset numbers. Second, the formal attributes are to include also some information compiled from the glosses of the words. The first change allows us to distinguish between the two senses of the word volume, as shown in Fig. 4a. But, we are still far from resolving all ambiguities concerning this word, as indicated by the presence of two objects in the leftmost node. The problem is that the hypernymic attributes are not sufficiently informative to differentiate the 3-D space sense of the word volume from its relative amount sense. This extra information resides in the glosses of the word and once encoded as attributes it evokes the required effect, as shown in Fig. 4b. Fig. 4a. – A neighborhood lattice with the Fig. 4b. – A more fine-grained neighborhood objects being synset numbers. lattice with the objects being synset numbers. Each gloss, which is most likely a noun phrase, is parsed by means of a shift-reduce parser to extract a set of attributes. Having collected the objects (i.e. the synset numbers) and the associated attributes, the FCA algorithm that comes with the FCALGS library [9] is used for deriving a lattice-based ontology from that collection. FCALGS employs a parallel and recursive algorithm. Apart from its being parallel, it is very similar to Kuznetsov’s [10] Close-by-One algorithm. However, even the lattice in Fig4.b is still defective in at least one respect. The names of the objects denoted are lost. To remedy this problem, we suggest to encode the objects as tuples of synset numbers and sets of names, as illustrated below. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 67 Fig. 5. – A neighborhood lattice including the names of the objects. Another point to note is that the name of a synset serves as the attribute of a subconcept. For example, ‘entity’ is the name of the topmost synset. But, as everything is an entity, any subconcept must treat it as an element of its set of attributes. 4.2 A Sense Translator Each WordNet node is associated with a set of synonymous English words, which is referred to as its synset. Each synset, in effect, denotes a sense in English. Thus, one task to accomplish is to translate synsets into Turkish to the furthest possible extent. We should, of course, keep in mind that some synsets (i.e. some senses encoded in English) may not have a counterpart in the target language. To find the Turkish translation of a particular synset, the Sense Translator first downloads a set of relevant articles via the links given in the disambiguation pages Wikipedia provides for the words in this set. It searches for the hypernyms of the synset in these articles. It assigns each article a score in accordance with the sum of the weighted points of the hypernyms found in this article. More specifically, if a synset has N hypernyms, the Kth hypernym starting from the top is assigned WeightK = K/N. 
Let FrequencyK be the number of occurrences of an item in a given article, then the score of the article is calculated as follows: Article Score = Weight1 * Frequency1 + ... + WeightN * FrequencyN. (1) If the article with the highest score has a link to a Turkish article, the title of the article will be the translation of the English word under examination. Otherwise, the word will be left unpaired with a Turkish counterpart. Figure 6 visualizes how the word cat in WordNet is translated into its Turkish counterpart, kedi, via Wikipedia. 68 Yılmaz Kılıçaslan and Edip Serdar Güner Fig. 6. - Translating the word cat into Turkish via Wikipedia. The Turkish counterparts will be added next to the English names, as shown below: Fig. 7. - A neighborhood lattice including the Turkish counterparts of the English names. 4.3 A Rule-Based Machine Translator We have designed a transfer-based architecture for Turkish-English translation and implemented the translator in SWI-Prolog which is an open-source implementation of the Prolog programming language. Below is a figure representing the main modules of the translator: Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 69 Fig. 8. - The main modules of the rule-based machine translator. The word list extracted by the Preprocessor is used as an input to the Analysis Module. We have devised a shift-reduce parser in the analysis phase for building up the grammatical structure of expressions. Briefly, a shift-reduce parser uses a bottom- up strategy with an ultimate goal of building trees rooted with a start symbol [1]. The Generation Module first rearranges the constituents using transformation rules. Afterwards, all the structures are lexically transferred into English using a bilingual dictionary. 4.4 Filtering Translation Results with the Concept Lattice Let us turn to our exemplary sentence introduced in (2) (i.e. Kadın onu tanıyor). Failing to take the context of the sentence into account, the rule-based translator generates the result in (8) (i.e. The woman knows him/her/it), where the pronoun is left ambiguous with respect to gender. Our claim is that we can resolve such ambiguities using FCA and thereby refine our translations. To this effect, we propose to generate transient formal concepts for noun phrases. We make the following assumptions. Basically, personal pronouns, determiners and proper names introduce formal objects whereas adjectives and nouns encode formal attributes. Suppose that our sentence is preceded by (the Turkish paraphrase of) a sentence like ‘A man has arrived’. The indefinite determiner evokes a new formal object, say obj1. As the source text is in Turkish, all attributes will be Turkish words. The Turkish counterpart of the word man is adam. Thus, the transient concept for the subject of this sentence will be ({obj1}, {adam}). The task is now to embed this transient concept into the big permanent concept lattice. To do this, a node where the Turkish counterpart of the synset name is ‘adam’ is searched for. Immediately below this node is placed a new node with its set of objects being {obj1} and with no additional attributes. As this is a physical object, the subconcept of this new node has to be the 70 Yılmaz Kılıçaslan and Edip Serdar Güner lowest one. As for the second sentence, the NP kadın (the woman) will be associated with the transient concept ({X},{kadın}) and the pronoun onu (him/her/it) with the transient concept ({Y},{entity}). 
X and Y are parameters to be anchored to particular formal objects. In other words, they are anaphoric. It seems plausible to assert that the attributes of an anaphoric object must constitute a (generally proper) subset or hypernym set of the attributes of the object serving as the antecedent. Assume that X is somehow anaphorically linked to an object obj2. Now, there are two candidate antecedents for Y. The woman, or the object obj2, is barred from being antecedent of the pronoun by a locality principle like one stated in Chomsky’s [3] Binding Theory: roughly stated, a pronoun and its antecedent cannot occur in the same clause. There remains one single candidate antecedent, obj1. As its attribute set is a hyponym set of {entity}, it can be selected as a legitimate antecedent. The concept node created for the man will also be the one denoted by the pronoun with Y being instantiated with obj1. In the concept lattice constructed in WordNet, the concept named as ‘man’ includes ‘male person’ in its set of attributes. Hence, the ambiguity is resolved and the pronoun translates into English as ‘him’. It is worth noting that in case there is more than one candidate antecedent, an anaphora resolution technique, especially a statistical one, can be employed to pick out the candidate most likely to be the antecedent. The interested reader is referred to Mitkov [12] for a survey of anaphora resolution approaches in general and to Kılıçaslan et al [7] for anaphora resolution in Turkish. The gender disambiguation process can also be carried out for common nouns. Consider the following fragment taken from a child story: (9) Turkish, leaves not only pronouns but also many other words ambiguous with respect to the gender feature. The word ‘kardeş’ in this example is ambiguous between the translations sister and brother. This ambiguity will be resolved in favor of the former interpretation in way similar to the disambiguation process sketched out for pronouns above. In fact, the problem of sense disambiguation is a kind of specification problem. Therefore, it cannot be confined to gender disambiguation. For example, given that we have somehow managed to compile the attributes listed in the column on the left- hand side, our FCA-based system generates the translations listed on the right-hand side: Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 71 zehirli, diş ‘poisonous, tooth’ fang zehirli, mantar ‘poisonous, mushroom’ toadstool sivri, diş ‘sharp, tooth’ fang arka, koltuk ‘rear, seat’ rumble acemi, asker ‘inexperienced, soldier’ recruit It will, of course, be interesting to try to solve other kinds of translation problems with FCA-based techniques. We leave this task to accomplish in the light of further research in the future. 5 Results and Evaluation In the early years of MT, the quality of an MT system was determined by human judgment. Though specially trained for the purpose, human judges are prone to suffer at least from subjectivity. Besides, this exercise is almost always more costly and time consuming. Some automated evaluation metrics have been developed in order to overcome such problems. Among these are BLEU, NIST, WER and PER. BLEU [14] and NIST [4] are rather specialized metrics. They are employed by considering the fraction of output n-grams that also appear in a set of human translations (n-gram precision). This allows the acknowledgment of a greater diversity of acceptable MT results. 
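As an aside, the n-gram precision underlying these metrics can be sketched in a few lines. This is a simplified illustration only; actual BLEU additionally combines clipped counts over several n-gram orders with a brevity penalty:

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, references, n=2):
    """Fraction of the candidate's n-grams found in at least one reference."""
    cand = ngrams(candidate.split(), n)
    ref = {g for r in references for g in ngrams(r.split(), n)}
    return sum(g in ref for g in cand) / len(cand) if cand else 0.0

print(ngram_precision("the woman knows him",
                      ["the woman knows him", "that woman knows him well"]))  # 1.0
```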
As for WER (Word Error Rate) and PER (Position-independent Word Error Rate), they are more general-purpose measures and they rely on a direct correspondence between the machine translation and a single human-produced reference. WER is based on the Levenshtein distance [11], which is the edit distance between a reference translation and its automatic translation, normalized by the length of the reference translation. This metric is formulated as:

WER = (S + D + I) / N (2)

where N is the total number of words in the reference translation, S is the number of substituted words in the automatic translation, D is the number of reference words deleted (missing) in the automatic translation and I is the number of words inserted in the automatic translation that do not appear in the reference. Although WER requires exactly the same order of the words in the automatic translation and the reference, PER neglects word order completely [17]. It measures the difference in the count of the words occurring in the automatic and reference translations. The resulting number is divided by the number of words in the reference. It is worth noting that PER is technically not a distance measure, as it uses a position-independent Levenshtein distance where the distance between a sentence and one of its permutations is always taken to be zero. We used WER to evaluate the performance of our MT system. This is probably the metric most commonly used for similar purposes. As we employed a single human-produced reference, this metric suits our evaluation setup well. We fed our system with a Turkish child story involving 91 sentences (970 words).2 We post-edited the resulting translation in order to generate a reference. When the necessary calculations were done in accordance with formula (2), the WER turned out to be 38%. The next step was to see the extent to which the performance of our MT system could be improved using concept lattices as filters for the raw results. To this effect, we devised several concept lattices like that in Figure 3 and filtered the lexical constituents of each automatic translation with them. A considerable reduction in error rate is observed in our system supplemented with concept lattices: the WER score is reduced to a value around 30%. One question that comes to mind at this point is whether the improvement achieved is statistically significant or not. To get an answer we had recourse to the Wilcoxon Signed-Rank test. This test is used to analyze matched-pair numeric data, looking at the difference between the two values in each matched pair. When applied to the WER scores of the non-filtered and filtered translation results, the test shows that the difference is statistically significant (p < 0.005). Another question is whether the results are practically satisfactory. To get some insight into this question, we should employ a baseline system for a comparison on usability. Google Translate, a statistical MT system that stands out with its wide range of users, can serve this purpose. The WER score obtained employing Google Translate on our data is 34%. Recalling that the WER score of our system supplemented with concept lattices is 30%, we seem to be entitled to argue for the viability of rule-based MT systems. Of course, we need to make this claim tentatively, since the size of the data on which the comparisons are made is relatively small. However, it should also be noted that we have employed a limited number of concept lattices of considerably small sizes.
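For reference, formula (2) can be computed directly on top of a word-level edit-distance calculation. The sketch below is a generic illustration of the metric, not the evaluation code used in the experiments:

```python
def wer(reference, hypothesis):
    """Word Error Rate: (S + D + I) / N, with N the reference length (formula (2))."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance by dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,           # deletion
                          d[i][j - 1] + 1,           # insertion
                          d[i - 1][j - 1] + cost)    # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One insertion ("is") and one substitution (knows -> knowing): WER = 2/4 = 0.5
print(wer("the woman knows him", "the woman is knowing him"))
```

Applied against the post-edited reference, this is the quantity reported above as 38% for the raw output and around 30% after lattice-based filtering.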
It is of no doubt that increasing the number and size of filtering lattices would improve the performance of our MT system. More importantly, we do not primarily have an NLP concern in this work. Rather, we would like the results to be evaluated from a computational linguistics perspective. Everything aside, the results show that even a toy lattice based ontology can yield statistically significant improvement for an MT system. 6 Conclusion In this paper, we have illustrated some translation problems caused by some typological divergences between Turkish and English using a particular example. We have gone through the direct translation, syntactic transfer and semantic transfer phases of the rule-based translation model to see what problem is dealt with in what phase. We have seen that a context-dependent pragmatic process is necessary to get to a satisfactory result. Concept lattices appear to be very efficient tools for accomplishing this pragmatic disambiguation task. Supplementing a rule-based MT system with concept lattices not only yields statistically significant improvement on the results of the system but also enables it to compete with a statistical MT system like Google Translate. 2 This is the story where the example in (9) comes from. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 73 References 1. Aho, A.V., Ullman, J.D.: The Theory of Parsing, Translation, and Compiling, Vol. 1., Prentice Hall (1972) 2. Barwise J., Seligman, J.: Information Flow. The Logic of Distributed Systems. Cambridge University Press (1997) 3. Chomsky, N.: Lectures on Government and Binding, Foris, Dordrecht (1981). 4. Doddington, G.: “Automatic Evaluation of Machine Translation Quality Using N-gram Co- occurrence Statistics”. In Proceedings of HLT 2002 (2nd Conference on Human Language Technology). San Diego, California, 128-132 (2002) 5. Ganter, B., Wille, R.: Formale Begriffsanalyse: Mathematische Grundlagen. Berlin: Springer (1996) 6. Jurafsky, D., Martin, J. H.: Speech and Language Processing, 2nd Edition, Prentice Hall (2009) 7. Kılıçaslan, Y., Güner, E. S., Yıldırım, S.: Learning-based pronoun resolution for Turkish with a comparative evaluation, Computer Speech & Language, Volume 23, Issue 3, p. 311-331 (2009) 8. Kipke, U., Wille, R.: Formale Begriffsanalyse erläutert an einem Wortfeld. LDV–Forum, 5 (1987) 9. Krajca, P., Outrata, J., Vychodil, V.: Parallel Recursive Algorithm for FCA. In: Belohlavek R., Kuznetsov S. O. (Eds.): Proc. CLA 2008, CEUR WS, 433, 71–82 (2008) 10. Kuznetsov, S.: Learning of Simple Conceptual Graphs from Positive and Negative Examples. PKDD 1999, pp. 384–391 (1999) 11. Levenshtein, V. I.: "Binary codes capable of correcting deletions, insertions, and reversals," Tech. Rep. 8. (1966) 12. Mitkov, R.: Anaphora Resolution: The State of the Art. Technical Report, University of Wolverhampton (1999) 13. Old, L. J., Priss, U.: Metaphor and Information Flow. In Proceedings of the 12th Midwest Artificial Intelligence and Cognitive Science Conference, pp. 99-104 (2001) 14. Papineni, K., Roukos, S., Ward, T., Zhu, W. J.: "BLEU: a method for automatic evaluation of machine translation" in ACL-2002: 40th Annual meeting of the Association for Computational Linguistics pp. 311–318 (2002) 15. Priss, U.: Linguistic Applications of Formal Concept Analysis, Ganter; Stumme; Wille (eds.), Formal Concept Analysis, Foundations and Applications, Springer Verlag, LNAI 3626, pp. 149-160 (2005) 16. Priss, U., Old, L. 
J.: "Concept Neighbourhoods in Lexical Databases.", In Proceedings of the 8th International Conference on Formal Concept Analysis, ICFCA'10, Springer Verlag, LNCS 5986, p. 283-295 (2010) 17. Tillmann C., Vogel, S., Ney, H., Zubiaga A., Sawaf, H.: Accelerated DP based search for statistical translation. In European Conf. on Speech Communication and Technology, pages 2667–2670, Rhodes, Greece, September (1997) 18. Wille, R.: The Formalization of Roget’s International Thesaurus. Unpublished manuscript (1993) Concept lattices in fuzzy relation equations? Juan Carlos Dı́az and Jesús Medina?? Department of Mathematics. University of Cádiz Email: {juancarlos.diaz,jesus.medina}@uca.es Abstract. Fuzzy relation equations are used to investigate theoretical and applicational aspects of fuzzy set theory, e.g., approximate reasoning, time series forecast, decision making and fuzzy control, etc.. This paper relates these equations to a particular kind of concept lattices. 1 Introduction Recently, multi-adjoint property-oriented concept lattices have been introduced in [16] as a generalization of property-oriented concept lattices [10,11] to a fuzzy environment. These concept lattices are a new point of view of rough set the- ory [23] that considers two different sets: the set of objects and the set of at- tributes. On the other hand, fuzzy relation equations, introduced by E. Sanchez [28], are associated to the composition of fuzzy relations and have been used to in- vestigate theoretical and applicational aspects of fuzzy set theory [22], e.g., ap- proximate reasoning, time series forecast, decision making, fuzzy control, as an appropriate tool for handling and modeling of nonprobabilistic form of uncer- tainty, etc. Many papers have investigated the capacity to solve (systems) of fuzzy relation equations, e.g., in [1, 8, 9, 25, 26]. In this paper, the multi-adjoint relation equations are presented as a general- ization of the fuzzy relation equations [24,28]. This general environment inherits the properties of the multi-adjoint philosophy, consequently, e.g., several con- junctors and residuated implications defined on general carriers as lattice struc- tures can be used, which provide more flexibility in order to relate the variables considered in the system. Moreover, multi-adjoint property-oriented concept lattices and systems of multi-adjoint relation equations have been related in order to obtain results that ensure the existence of solutions in these systems. These definitions and results are illustrated by a toy example to improve the readability and comprehension of the paper. Among all concept lattice frameworks, we have related the multi-adjoint property-oriented concept lattices to the systems of multi-adjoint relation equa- tions, e.g., the extension and intension operators of this concept lattice can be ? Partially supported by the Spanish Science Ministry TIN2009-14562-C05-03 and by Junta de Andalucı́a project P09-FQM-5233. ?? Corresponding author. c 2011 by the paper authors. CLA 2011, pp. 75–86. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 76 Juan Carlos Dı́az and Jesús Medina-Moreno used to represent multi-adjoint relation equations, and, as a result, the solu- tions of these systems of relation equations can be related to the concepts of the corresponding concept lattice. 
The more important consequence is that this relation provides that the prop- erties given, e.g., in [2–4,12,14,17,18,27] can be applied to obtain many properties of these systems. Indeed, it can be considered that the algorithms presented, e.g., in [5, 6, 15] obtain solutions for these systems. The plan of this paper is the following: in Section 2 we will recall the multi- adjoint property-oriented concept lattices as well as the basic operators used and some properties; later, in Section 3, an example will be introduced to motivate the multi-adjoint relation equations. Once these equations have been presented, in Section 4 the multi-adjoint property-oriented concept lattices and the systems of multi-adjoint relation equations will be related in order to obtain results which ensure the existence of solutions in these systems; the paper ends with some conclusions and prospects for future work. 2 Multi-adjoint property-oriented concept lattices The basic operators in this environment are the adjoint triples, which are formed by three mappings: a non-commutativity conjunctor and two residuated impli- cations [13], which satisfy the well-known adjoint property. Definition 1. Let (P1 , ≤1 ), (P2 , ≤2 ), (P3 , ≤3 ) be posets and & : P1 × P2 → P3 , . : P3 × P2 → P1 , - : P3 × P1 → P2 be mappings, then (&, ., -) is an adjoint triple with respect to P1 , P2 , P3 if: 1. & is order-preserving in both arguments. 2. . and - are order-preserving on the first argument1 and order-reversing on the second argument. 3. x ≤1 z . y iff x & y ≤3 z iff y ≤2 z - x, where x ∈ P1 , y ∈ P2 and z ∈ P3 . Example of adjoint triples are the Gödel, product and Lukasiewicz t-norms together with their residuated implications. Example 1. Since the Gödel, product and Lukasiewicz t-norms are commuta- tive, the residuated implications satisfy that .G =-G , .P =-P and .L =-L . Therefore, the Gödel, product and Lukasiewicz adjoint triples are defined on [0, 1] as: &P (x, y) = x · y z -P x = min(1, z/x) ( 1 if x ≤ z &G (x, y) = min(x, y) z -G x = z otherwise &L (x, y) = max(0, x + y − 1) z -G x = min(1, 1 − x + z) 1 Note that the antecedent will be evaluated on the right side, while the consequent will be evaluated on the left side, as in logic programming framework. 2 Concept lattices in fuzzy relation equations 77 In [19] more general examples of adjoint triples are given. The basic structure, which allows the existence of several adjoint triples for a given triplet of lattices, is the multi-adjoint property-oriented frame. Definition 2. Given two complete lattices (L1 , 1 ) and (L2 , 2 ), a poset (P, ≤) and adjoint triples with respect to P, L2 , L1 , (&i , .i , -i ), for all i = 1, . . . , l, a multi-adjoint property-oriented frame is the tuple (L1 , L2 , P, 1 , 2 , ≤, &1 , .1 , -1 , . . . , &l , .l , -l ) Multi-adjoint property-oriented frames are denoted as (L1 , L2 , P, &1 , . . . , &l ). Note that the notation is similar to a multi-adjoint frame [18], although the adjoint triples are defined on different carriers. The definition of context is analogous to the one given in [18]. Definition 3. Let (L1 , L2 , P, &1 , . . . , &l ) be a multi-adjoint property-oriented frame. A context is a tuple (A, B, R, σ) such that A and B are non-empty sets (usually interpreted as attributes and objects, respectively), R is a P -fuzzy rela- tion R : A × B → P and σ : B → {1, . . . 
, l} is a mapping which associates any element in B with some particular adjoint triple in the frame.2 From now on, we will fix a multi-adjoint property-oriented frame and context, (L1 , L2 , P, &1 , . . . , &l ), (A, B, R, σ). ↓N Now we define the following mappings ↑π : LB A 2 → L1 and : LA B 1 → L2 as g ↑π (a) = sup{R(a, b) &σ(b) g(b) | b ∈ B} (1) N f ↓ (b) = inf{f (a) -σ(b) R(a, b) | a ∈ A} (2) Clearly, these definitions3 generalize the classical possibility and necessity operators [11] and they form an isotone Galois connection [16]. There are two dual versions of the notion of Galois connetion. The most famous Galois connec- tion, where the maps are order-reversing, is properly called Galois connection, and the other in which the maps are order-preserving, will be called isotone Ga- lois connection. In order to make this contribution self-contained, we recall their formal definitions: Let (P1 , ≤1 ) and (P2 , ≤2 ) be posets, and ↓ : P1 → P2 , ↑ : P2 → P1 mappings, the pair (↑ , ↓ ) forms a Galois connection between P1 and P2 if and only if: ↑ and ↓ are order-reversing; x ≤1 x↓↑ , for all x ∈ P1 , and y ≤2 y ↑↓ , for all y ∈ P2 . The one we adopt here is the dual definition: Let (P1 , ≤1 ) and (P2 , ≤2 ) be posets, and ↓ : P1 → P2 , ↑ : P2 → P1 mappings, the pair (↑ , ↓ ) forms an isotone Galois connection between P1 and P2 if and only if: ↑ and ↓ are order-preserving; x ≤1 x↓↑ , for all x ∈ P1 , and y ↑↓ ≤2 y, for all y ∈ P2 . 2 A similar theory could be developed by considering a mapping τ : A → {1, . . . , l} which associates any element in A with some particular adjoint triple in the frame. 3 From now on, to improve readability, we will write &b , -b instead of &σ(b) , -σ(b) . 3 78 Juan Carlos Dı́az and Jesús Medina-Moreno A concept, in this environment, is a pair of mappings hg, f i, with g ∈ LB , f ∈ N L , such that g ↑π = f and f ↓ = g, which will be called multi-adjoint property- A oriented formal concept. In that case, g is called the extension and f , the inten- sion of the concept. The set of all these concepts will be denoted as MπN [16]. Definition 4. A multi-adjoint property-oriented concept lattice is the set N ↑π MπN = {hg, f i | g ∈ LB A 2 , f ∈ L1 and g = f, f ↓ = g} in which the ordering is defined by hg1 , f1 i  hg2 , f2 i iff g1 2 g2 (or equivalently f1 1 f2 ). The pair (MπN , ) is a complete lattice [16], which generalize the concept lattice introduced in [7] to a fuzzy environment. 3 Multi-adjoint relation equations This section begins with an example that motivates the definition of multi- adjoint relation equations, which will be introduced later. 3.1 Multi-adjoint logic programming A short summary of the main features of multi-adjoint languages will be pre- sented. The reader is referred to [20, 21] for a complete formulation. A language L contains propositional variables, constants, and a set of logical connectives. In this fuzzy setting, the usual connectives are adjoint triples and a number of aggregators. The language L is interpreted on a (biresiduated) multi-adjoint lattice,4 hL, , .1 , -1 , &1 , . . . , .n , -n , &n i, which is a complete lattice L equipped with a collection of adjoint triples h&i , .i , -i i, where each &i is a conjunctor in- tended to provide a modus ponens-rule with respect to .i and -i . A rule is a formula A .i B or A -i B, where A is a propositional symbol (usually called the head) and B (which is called the body) is a formula built from propositional symbols B1 , . . . 
, Bn (n ≥ 0), truth values of L and conjunctions, disjunctions and aggregations. Rules with an empty body are called facts. A multi-adjoint logic program is a set of pairs hR, αi, where R is a rule and α is a value of L, which may express the confidence which the user of the system has in the truth of the rule R. Note that the truth degrees in a given program are expected to be assigned by an expert. Example 2. Let us to consider a multi-adjoint lattice h[0, 1], ≤, ←G , &G , ←P , &P , ∧L i 4 Note that a multi-adjoint lattice is a particular case of a multi-adjoint property- oriented frame. 4 Concept lattices in fuzzy relation equations 79 where &G and &P are the Gödel and product conjunctors, respectively, and ←G , ←P their corresponding residuated implications. Moreover, the Lukasie- wicz conjunctor ∧L will be used in the program [13]. Given the set of variables (propositional symbols) Π = {low oil, low water, rich mixture, overheating, noisy behaviour, high fuel consumption} the following set of multi-adjoint rules form a multi-adjoint program, which may represent the behaviour of a motor. hhigh fuel consumption ←G rich mixture ∧L low oil, 0.8i hoverheating ←G low oil, 0.5i hnoisy behaviour ←P rich mixture, 0.8i hoverheating ←P low water, 0.9i hnoisy behaviour ←G low oil, 1i The usual procedural is to measure the levels of “oil”, “water” and “mix- ture” of a specific motor, after that the values for low oil, low water and rich mixture are obtained, which are represented in the program as facts, for instance, the next ones can be added to the program: hlow oil ←P >, 0.2i hlow water ←P >, 0.2i hrich mixture ←P >, 0.5i Finally, the values for the rest of variables (propositional symbols) are com- puted [20]. For instance, in order to attain the value for overheating(o, w), for a level of oil, o, and water, w, the rules hoverheating ←G low oil, ϑ1 i and hoverheating ←P low water, ϑ2 i are considered and its value is obtained as: overheating(o, w) = (low oil(o) &G ϑ1 ) ∨ (low water(w) &P ϑ2 ) (3) Now, the problem could be to recompute the weights of the rules from experimental instances of the variables, that is, the values of overheating, noisy behaviour and high fuel consumption are known for particular mea- sures of low oil, low water and rich mixture. Specifically, given the levels of oil, o1 , . . . , on , the levels of water, w1 , . . . , wn , and the measures of mixture, t1 , . . . , tn , we may experimentally know the values of the variables: noisy behaviour(ti , oi ), high fuel consumption(ti , oi ) and overheating(oi , wi ), for all i ∈ {1, . . . , n}. Considering Equation (3), the unknown elements could be ϑ1 and ϑ2 instead of overheating(o, w). Therefore, the problem now is to look for the values of ϑ1 and ϑ2 , which solve the following system obtained after assuming the exper- imental data for the propositional symbols, ov1 , o1 , w1 , . . . , ovn , on , wn . overheating(ov1 ) = (low oil(o1 ) &G ϑ1 ) ∨ (low water(w1 ) &P ϑ2 ) .. .. .. .. . . . . overheating(ovn ) = (low oil(on ) &G ϑ1 ) ∨ (low water(wn ) &P ϑ2 ) 5 80 Juan Carlos Dı́az and Jesús Medina-Moreno This system can be interpreted as a system of fuzzy relation equations in which several conjunctors, &G and &P , are assumed. Moreover, these conjunctors could be neither non-commutative nor associative and defined in general lattices, as permit the multi-adjoint framework. Next sections introduce when these systems have solutions and a novel method to obtain them using concept lattice theory. 
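Before these systems are formalised, the computation in Equation (3) and the role of the unknown weights can be made concrete with a small numerical sketch. The conjunctors are those of Example 1; the forward call uses the facts and rule weights of Example 2, while the observation triples anticipate the data used later in Example 4. This is an illustrative sketch, not part of the authors' formalism:

```python
# Goedel and product conjunctors on [0, 1] (cf. Example 1).
def and_G(x, y):
    return min(x, y)

def and_P(x, y):
    return x * y

def overheating(low_oil, low_water, theta_lo, theta_lw):
    """Equation (3): (low_oil &G theta_lo) joined by sup with (low_water &P theta_lw)."""
    return max(and_G(low_oil, theta_lo), and_P(low_water, theta_lw))

# Forward use: facts lo = lw = 0.2 and rule weights 0.5, 0.9 as in Example 2.
print(overheating(0.2, 0.2, 0.5, 0.9))          # 0.2 = max(min(0.2, 0.5), 0.2 * 0.9)

# Inverse use (the problem addressed below): each observed triple
# (oh(ov_i), lo(o_i), lw(w_i)) yields one equation constraining the
# unknown weights theta_lo and theta_lw.
observations = [(0.5, 0.3, 0.3), (0.7, 0.6, 0.8), (0.4, 0.5, 0.2)]
def residuals(theta_lo, theta_lw):
    return [ov - overheating(o, w, theta_lo, theta_lw) for ov, o, w in observations]
```

Finding weights for which all residuals vanish is exactly the system written above; Section 4 shows that, when a solution exists, the greatest one can be read directly off a multi-adjoint property-oriented concept lattice (Theorem 2).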
3.2 Systems of multi-adjoint relation equations The operators used in order to obtain the systems will be the generalization of the sup-∗-composition, introduced in [29], and inf-→-composition, introduced in [1]. From now on, a multi-adjoint property-oriented frame, (L1 , L2 , P, &1 , . . . , &l ) will be fixed. In the definition of a multi-adjoint relation equation an interesting mapping σ : U → {1, . . . , l} will be considered, which relates each element in U to an adjoint triple. This mapping will play a similar role as the one given in a multi- adjoint context, defined in the previous section, for instance, this map provides a partition of U in preference sets. A similar theory may be developed for V instead of U . Let U = {u1 , . . . , um } and V = {v1 , . . . , vn } be two universes, R ∈ L2 U ×V an unknown fuzzy relation, σ : U → {1, . . . , l} a map that relates each element in U to an adjoint triple, and K1 , . . . , Kn ∈ P U , D1 , . . . , Dn ∈ L1 V arbitrarily chosen fuzzy subsets of the respective universes. A system of multi-adjoint relation equations with sup-&-composition, is the following system of equations _ (Ki (u) &u R(u, v)) = Di (v), i ∈ {1, . . . , n} (4) u∈U where &u represents the adjoint conjunctor associated to u by σ, that is, if σ(u) = (&s , .s , -s ), for s ∈ {1, . . . , l}, then &u is exactly &s . If an element v of V is fixed and the elements Ki (uj ), R(uj , v) and Di (v) are written as kij , xj and di , respectively, for each i ∈ {1, . . . , n}, j ∈ {1, . . . , m}, then System (4) can particularly be written as k11 &u1 x1 ∨ · · · ∨ k1m &um xm = d1 .. .. .. .. (5) . . . . kn1 &u1 x1 ∨ · · · ∨ knm &um xm = dn where kij and di are known and xj must be obtained. Hence, for each v ∈ V , if we solve System (5), then we obtain a “column” of R (i.e. the elements R(uj , v), with j ∈ {1, . . . , m}), thus, solving n similar systems, one for each v ∈ V , the unknown relation R is obtained. Example 3. Assuming Example 2, in this case, we will try to solve the problem about to obtain the weights associated to the rules from particular observed data for the propositional symbols. 6 Concept lattices in fuzzy relation equations 81 The propositional symbols (variables) will be written in short as: hfc, nb, oh, rm, lo and lw, and the measures of particular cases of the behaviour of the motor will be: hi , ni , ovi , ri , oi , wi , for hfc, nb, oh, rm, lo and lw, respectively, in each case i, with i ∈ {1, 2, 3}. For instance, the next system associated to overheating is obtained from the computation provided in Example 2. oh(ov1 ) = (lo(o1 ) &G ϑoh oh lo ) ∨ (lw(w1 ) &P ϑlw ) oh(ov2 ) = (lo(o2 ) &G ϑoh oh lo ) ∨ (lw(w2 ) &P ϑlw ) oh(ov3 ) = (lo(o3 ) &G ϑoh oh lo ) ∨ (lw(w3 ) &P ϑlw ) where ϑoh oh lo and ϑlw are the weights associated to the rules with head oh. Similar systems can be obtained to high fuel consumption and noisy behaviour. Assuming the multi-adjoint frame with carrier L = [0, 1] and the Gödel and product triples, these systems are particular systems of multi-adjoint relational equations. The corresponding context is formed by the sets U = {rm, lo, lw, rm∧L lo}, V = {hfc, nb, oh}; the mapping σ that relates the elements lo, rm ∧L lo to the Gödel triple, and rm, lw to the product triple; the mappings K1 , . . . , Kn ∈ P U , defined as the values given by the propositional symbols in U on the ex- perimental data, for instance, if u = lo, then K1 (lo) = lo(o1 ), . . . , Kn (lo) = lo(on ); and the mappings D1 , . . . 
, Dn ∈ L1 V , defined analogously, for instance, if v = rm, then D1 (rm) = rm(r1 ), . . . , Dn (rm) = rm(rn ). Finally, the unknown fuzzy relation R ∈ L2 U ×V is formed by the weights of the rules in the program. In the system above, oh has been the element v ∈ V fixed. Moreover, as there do not exist rules with body rm and rm ∧L lo, that is, the weights for that hypothetical rules are 0, then the terms (rm(ri ) &G 0 = 0 and (rm(ri ) ∧L lo(oi ) &P 0 = 0 do not appear. Its counterpart is a system of multi-adjoint relation equations with inf--- composition, that is, ^ (R(u, v) -uj Kj∗ (v)) = Ej (u), j ∈ {1, . . . , m} (6) v∈V considered with respect to unknown fuzzy relation R ∈ L1 U ×V , and where K1∗ , . . . , Km ∗ ∈ P V and E1 , . . . , Em ∈ L2 U . Note that -uj represents the corre- sponding adjoint implication associated to uj by σ, that is, if σ(uj ) = (&s , .s , -s ), for s ∈ {1, . . . , l}, then -uj is exactly -s . Remark that in System 6, the implication -uj does not depend of the element u, but of j. Hence, the implications used in each equation of the system are the same. If an element u of U is fixed, fuzzy subsets K1∗ , . . . , Km∗ ∈ P V , E1 , . . . , Em ∈ U ∗ L2 are assumed, such that Kj (vi ) = kij , R(u, vi ) = yi and Ej (u) = ej , for each i ∈ {1, . . . , n}, j ∈ {1, . . . , m}, then System (6) can particularly be written as y1 -u1 k11 ∧ · · · ∧ yn -u1 kn1 = e1 .. .. .. .. (7) . . . . y1 -um k1m ∧ · · · ∧ yn -um knm = em 7 82 Juan Carlos Dı́az and Jesús Medina-Moreno Therefore, for each u ∈ U , we obtain a “row” of R (i.e. the elements R(u, vi ), with i ∈ {1, . . . , n}), consequently, solving m similar systems, the unknown relation R is obtained. Systems (5) and (7) have the same goal, searching for the unknown relation R although the mechanism is different. Analyzing these systems, we have that the left side of these systems can be represented by the mappings CK : Lm n n m 2 → L1 , IK ∗ : L1 → L2 , defined as: CK (x̄)i = ki1 &u1 x1 ∨ · · · ∨ kim &um xm , for all i ∈ {1, . . . , n} (8) IK ∗ (ȳ)j = y1 -uj k1j ∧ · · · ∧ yn -uj knj , for all j ∈ {1, . . . , m} (9) where x̄ = (x1 , . . . , xm ) ∈ Lm n 2 , ȳ = (y1 , . . . , yn ) ∈ L1 , and CK (x̄)i , IK ∗ (ȳ)j are the components of CK (x̄), IK ∗ (ȳ), respectively, for each i ∈ {1, . . . , n} and j ∈ {1, . . . , m}. Hence, Systems (5) and (7) can be written as: CK (x1 , . . . , xm ) = (d1 , . . . , dn ) (10) IK ∗ (y1 , . . . , yn ) = (e1 , . . . , em ) (11) respectively. 4 Relation between multi-adjoint property-oriented concept lattices and multi-adjoint relation equation This section shows that Systems (5) and (7) can be interpreted in a multi- adjoint property-oriented concept lattice. And so, the properties given to the N isotone Galois connection ↑π and ↓ , as well as to the complete lattice MπN can be used in the resolution of these systems. First of all, the environment must be fixed. Hence, a multi-adjoint context (A, B, S, σ) will be considered, such that A = V 0 , B = U , where V 0 has the same cardinality as V , σ will be the mapping given by the systems and S : A × B → P is defined as S(vi0 , uj ) = kij . Note that A = V 0 is related to the mappings Ki , since S(vi0 , uj ) = kij = Ki (uj ); Now, we will prove that the mappings defined at the end of the previous section are related to the isotone Galois connection. Given µ ∈ LB 2 , such that µ(uj ) = xj , for all j ∈ {1, . . . , m}, the following equalities are obtained, for each i ∈ {1, . . . 
, n}: CK (x̄)i = ki1 &u1 x1 ∨ · · · ∨ kim &um xm = S(vi0 , u1 ) &u1 µ(u1 ) ∨ · · · ∨ S(vi0 , um ) &um µ(um ) = sup{S(vi0 , uj ) &uj µ(uj ) | j ∈ {1, . . . , m}} = µ↑π (vi0 ) ↑π Therefore, the mapping CK : Lm n 2 → L1 is equivalent to the mapping : LB 2 → A m B L1 , where an element x̄ in L2 can be interpreted as a map µ in L2 , such that 8 Concept lattices in fuzzy relation equations 83 µ(uj ) = xj , for all j ∈ {1, . . . , m}, and the element CK (x̄) as the mapping µ↑π , such that µ↑π (vi0 ) = CK (x̄)i , for all i ∈ {1, . . . , n}. An analogy can be developed applying the above procedure to mappings IK ∗ N ↓N and ↓ , obtaining that the mappings IK ∗ : Ln1 → Lm 2 and : LA B 1 → L2 are equivalent. As a consequence, the following result holds: Theorem 1. The mappings CK : Lm n n m 2 → L1 , IK ∗ : L1 → L2 , establish an iso- m m tone Galois connection. Therefore, IK ∗ ◦ CK : L2 → L2 is a closure operator and CK ◦ IK ∗ : Ln1 → Ln1 is an interior operator. As (CA , IK ∗ ) is an isotone Galois connection, any result about the solvability of one system has its dual counterpart. The following result explains when these systems can be solved and how a solution can be obtained. N Theorem 2. System (5) can be solved if and only if hλ↓d¯ , λd¯i is a concept of MπN , where λd¯ : A = {v1 , . . . , vn } → L1 , defined as λd¯(vi ) = di , for all N i ∈ {1, . . . , n}. Moreover, if System (5) has a solution, then λ↓d¯ is the greatest solution of the system. Similarly, System (7) can be solved if and only if hµē , µē↑π i is a concept of MπN , where µē : B = {u1 , . . . , um } → L2 , defined as µē (uj ) = ej , for all j ∈ {1, . . . , m}. Furthermore, if System (7) has a solution, then µ↑ē π is the smallest solution of the system. The main contribution of the relation introduced in this paper is not only the above consequences, but a lot of other properties for Systems (5) and (7) that can be stabilized from the results proved, for example, in [2–4, 12, 14, 17, 18, 27]. Next example studies the system of multi-adjoint relation equations presented in Example 3. Example 4. The aim will be to solve a small system in order to improve the understanding of the method. In the environment of Example 3, the following system will be solved assuming the experimental data: oh(ov1 ) = 0.5, lo(o1 ) = 0.3, lw(w1 ) = 0.3, oh(ov2 ) = 0.7, lo(o2 ) = 0.6, lw(w2 ) = 0.8, oh(ov3 ) = 0.4, lo(o3 ) = 0.5, lw(w3 ) = 0.2. oh(ov1 ) = (lo(o1 ) &G ϑoh oh lo ) ∨ (lw(w1 ) &P ϑlw ) oh(ov2 ) = (lo(o2 ) &G ϑoh oh lo ) ∨ (lw(w2 ) &P ϑlw ) oh(ov3 ) = (lo(o3 ) &G ϑoh oh lo ) ∨ (lw(w3 ) &P ϑlw ) where ϑoh oh lo and ϑlw are the variables. The context is: A = V 0 = {1, 2, 3}, the set of observations, B = U = {lo, lw}, σ associates the propositional symbol lo to the Gödel triple and lw to the product triple. The relation S : A × B → [0, 1] is defined in Table 1. Therefore, considering the mapping λoh : A → [0, 1] associated to the values of overheating in each experimental case, that is λoh (1) = 0.5, λoh (2) = 0.7, 9 84 Juan Carlos Dı́az and Jesús Medina-Moreno Table 1. Relation S. 
low oil low water 1 0.3 0.3 2 0.6 0.8 3 0.5 0.2 and λoh (3) = 0.4; and the mapping CK : [0, 1]2 → [0, 1]3 , defined in Equation (8), the system above can be written as CK (ϑoh oh lo , ϑlw ) = λoh Since, by the comment above, there exists µ ∈ [0, 1]B , such that CK (ϑoh oh lo , ϑlw ) = ↑π B ↑π µ , the goal will be to attain the mapping µ ∈ [0, 1] , such that µ = λoh , N which can be found if and only if ((λoh )↓ , λoh ) is a multi-adjoint property- oriented concept in the considered context, by Theorem 2. N First of all, we compute (λoh )↓ . N (λoh )↓ (lo) = inf{λoh (1) -G S(1, lo), λoh (2) -G S(2, lo), λoh (3) -G S(3, lo)} = inf{0.5 -G 0.3, 0.7 -G 0.6, 0.4 -G 0.5} = inf{1, 1, 0.4} = 0.4 ↓N (λoh ) (lw) = inf{0.5 -P 0.3, 0.7 -P 0.8, 0.4 -P 0.2} = inf{1, 0.875, 1} = 0.875 N Now, the mapping (λoh )↓ ↑π is obtained. N N N (λoh )↓ ↑π (1) = sup{S(1, lo) &G (λoh )↓ (lo), S(1, lw) &P (λoh )↓ (lw)} = sup{0.3 &G 0.4, 0.3 &P 0.875} = sup{0.3, 0.2625} = 0.3 ↓N ↑π (λoh ) (2) = sup{0.6 &G 0.4, 0.8 &P 0.875} = 0.7 ↓N ↑π (λoh ) (3) = sup{0.5 &G 0.4, 0.2 &P 0.875} = 0.4 N Therefore, ((λoh )↓ , λoh ) is not a multi-adjoint property-oriented concept and thus, the considered system has no solution, although if the experimental value for oh had been 0.3 instead of 0.5, the system would have had a solution. These changes could be considered in several applications where noisy vari- ables exist and their values can be conveniently changed to obtain approximate solutions for the systems. Thus, if the experimental data for overheating are oh(ov1 ) = 0.3, oh(ov2 ) = 0.7 and oh(ov2 ) = 0.4, then the original system will have at least one solution and the values ϑoh oh lo , ϑlw will be 0.4, 0.875, respectively for a solution. Consequently, the truth for the first rule is lower than for the second or it might be thought that it is more determinant in obtaining higher 10 Concept lattices in fuzzy relation equations 85 values for lw than for lo. Another possibility is to consider that this conclusion about the certainty of the rules is not correct, in which case another adjoint triple might be associate to lo. As a result, the properties introduced in several fuzzy formal concept anal- ysis frameworks can be applied in order to obtain solutions of fuzzy relation equations, as well as in the multi-adjoint general framework. Furthermore, in order to obtain the solutions of Systems (5) and (7), the algorithms developed, e.g., in [5, 6, 15], can be used. 5 Conclusions and future work Multi-adjoint relation equations have been presented that generalize the existing definitions presented at this time. In this general environment, different conjunc- tors and residuated implications can be used, which provide more flexibility in order to relate the variables considered in the system. A toy example has been introduced in the paper in order to improve its readability and reduce the complexity of the definitions and results. As a consequence of the results presented in this paper, several of the prop- erties provided, e.g., in [2–4, 12, 14, 17, 18, 27], can be used to obtain additional characteristics of these systems. In the future, we will apply the results provided in the fuzzy formal con- cept analysis environments to the general systems of fuzzy relational equations presented here. References 1. W. Bandler and L. Kohout. Semantics of implication operators and fuzzy relational products. Int. J. Man-Machine Studies, 12:89–116, 1980. 2. E. Bartl, R. Bělohlávek, J. Konecny, and V. Vychodil. 
Isotone galois connections and concept lattices with hedges. In 4th International IEEE Conference “Intelli- gent Systems”, pages 15.24–15.28, 2008. 3. R. Bělohlávek. Lattices of fixed points of fuzzy Galois connections. Mathematical Logic Quartely, 47(1):111–116, 2001. 4. R. Bělohlávek. Concept lattices and order in fuzzy logic. Annals of Pure and Applied Logic, 128:277–298, 2004. 5. R. Bělohlávek, B. D. Baets, J. Outrata, and V. Vychodil. Lindig’s algorithm for concept lattices over graded attributes. Lecture Notes in Computer Science, 4617:156–167, 2007. 6. R. Bělohlávek, B. D. Baets, J. Outrata, and V. Vychodil. Computing the lattice of all fixpoints of a fuzzy closure operator. IEEE Transactions on Fuzzy Systems, 18(3):546–557, 2010. 7. Y. Chen and Y. Yao. A multiview approach for intelligent data analysis based on data operators. Information Sciences, 178(1):1–20, 2008. 8. B. De Baets. Analytical solution methods for fuzzy relation equations. In D. Dubois and H. Prade, editors, The Handbooks of Fuzzy Sets Series, volume 1, pages 291– 340. Kluwer, Dordrecht, 1999. 11 86 Juan Carlos Dı́az and Jesús Medina-Moreno 9. A. Di Nola, S. Sessa, W. Pedrycz, and E. Sanchez. Fuzzy Relation Equations and Their Applications to Knowledge Engineering. Kluwer, 1989. 10. I. Düntsch and G. Gediga. Approximation operators in qualitative data analysis. In Theory and Applications of Relational Structures as Knowledge Instruments, pages 214–230, 2003. 11. G. Gediga and I. Düntsch. Modal-style operators in qualitative data analysis. In Proc. IEEE Int. Conf. on Data Mining, pages 155–162, 2002. 12. G. Georgescu and A. Popescu. Non-dual fuzzy connections. Arch. Math. Log., 43(8):1009–1039, 2004. 13. P. Hájek. Metamathematics of Fuzzy Logic. Trends in Logic. Kluwer Academic, 1998. 14. H. Lai and D. Zhang. Concept lattices of fuzzy contexts: Formal concept analysis vs. rough set theory. International Journal of Approximate Reasoning, 50(5):695– 707, 2009. 15. C. Lindig. Fast concept analysis. In G. Stumme, editor, Working with Conceptual Structures-Contributions to ICCS 2000, pages 152–161, 2000. 16. J. Medina. Towards multi-adjoint property-oriented concept lattices. Lect. Notes in Artificial Intelligence, 6401:159–166, 2010. 17. J. Medina and M. Ojeda-Aciego. Multi-adjoint t-concept lattices. Information Sciences, 180(5):712–725, 2010. 18. J. Medina, M. Ojeda-Aciego, and J. Ruiz-Calviño. Formal concept analysis via multi-adjoint concept lattices. Fuzzy Sets and Systems, 160(2):130–144, 2009. 19. J. Medina, M. Ojeda-Aciego, A. Valverde, and P. Vojtáš. Towards biresiduated multi-adjoint logic programming. Lect. Notes in Artificial Intelligence, 3040:608– 617, 2004. 20. J. Medina, M. Ojeda-Aciego, and P. Vojtáš. Multi-adjoint logic programming with continuous semantics. In Logic Programming and Non-Monotonic Reasoning, LPNMR’01, pages 351–364. Lect. Notes in Artificial Intelligence 2173, 2001. 21. J. Medina, M. Ojeda-Aciego, and P. Vojtáš. Similarity-based unification: a multi- adjoint approach. Fuzzy Sets and Systems, 146:43–62, 2004. 22. A. D. Nola, E. Sanchez, W. Pedrycz, and S. Sessa. Fuzzy Relation Equations and Their Applications to Knowledge Engineering. Kluwer Academic Publishers, Norwell, MA, USA, 1989. 23. Z. Pawlak. Rough sets. International Journal of Computer and Information Sci- ence, 11:341–356, 1982. 24. W. Pedrycz. Fuzzy relational equations with generalized connectives and their applications. Fuzzy Sets and Systems, 10(1-3):185 – 201, 1983. 25. I. Perfilieva. 
Fuzzy function as an approximate solution to a system of fuzzy relation equations. Fuzzy Sets and Systems, 147(3):363–383, 2004. 26. I. Perfilieva and L. Nosková. System of fuzzy relation equations with inf-→ com- position: Complete set of solutions. Fuzzy Sets and Systems, 159(17):2256–2271, 2008. 27. A. M. Radzikowska and E. E. Kerre. A comparative study of fuzzy rough sets. Fuzzy Sets and Systems, 126(2):137–155, 2002. 28. E. Sanchez. Resolution of composite fuzzy relation equations. Information and Control, 30(1):38–48, 1976. 29. L. A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning I, II, III. Information Sciences, 8–9:199–257, 301–357, 43–80, 1975. 12 Adaptation knowledge discovery for cooking using closed itemset extraction Emmanuelle Gaillard, Jean Lieber, and Emmanuel Nauer LORIA (UMR 7503—CNRS, INRIA, Nancy University) BP 239, 54506 Vandœuvre-lès-Nancy, France, First-Name.Last-Name@loria.fr Abstract. This paper is about the adaptation knowledge (AK) discov- ery for the Taaable system, a case-based reasoning system that adapts cooking recipes to user constraints. The AK comes from the interpreta- tion of closed itemsets (CIs) whose items correspond to the ingredients that have to be removed, kept, or added. An original approach is pro- posed for building the context on which CI extraction is performed. This approach focuses on a restrictive selection of objects and on a specific ranking based on the form of the CIs. Several experimentations are pro- posed in order to improve the quality of the AK being extracted and to decrease the computation time. This chain of experiments can be seen as an iterative knowledge discovery process: the analysis following each experiment leads to a more sophisticated experiment until some concrete and useful results are obtained. Keywords: adaptation knowledge discovery, closed itemset, data preprocess- ing, case-based reasoning, cooking. 1 Introduction This paper addresses the adaptation challenge proposed by the Computer Cook- ing Contest (http://computercookingcontest.net/) which consists in adapt- ing a given cooking recipe to specific constraints. For example, the user wants to adapt a strawberry pie recipe, because she has no strawberry. The underlying question is: which ingredient(s) will the strawberries be replaced with? Adapting a recipe by substituting some ingredients by others requires cook- ing knowledge and adaptation knowledge in particular. Taaable, a case-based reasoning (CBR) system, addresses this problem using an ingredient ontology. This ontology is used for searching which is/are the closest ingredient(s) to the one that has to be replaced. In this approach the notion of “being close to” is given by the distance between ingredients in the ontology. In the previous example, Taaable proposes to replace the strawberries by other berries (e.g. raspberries, blueberries, etc.). However, this approach is limited because two in- gredients which are close in the ontology are not necessarily interchangeable and because introducing a new ingredient in a recipe may be incompatible with some other ingredient(s) of the recipe or may required to add other ingredients. c 2011 by the paper authors. CLA 2011, pp. 87–99. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 
88 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer This paper extends the approach proposed in [2] for extracting this kind of adaptation knowledge (AK). The approach is based on closed itemset (CI) extraction, in which items are the ingredients that have to be removed, kept, or added for adapting the recipe. This paper introduces two originalities. The first one concerns the way the binary context, on which the CI extraction is performed, is built, by focusing on a restrictive selection of objects according to the objectives of the knowledge discovery process. The second one concerns the way the CIs are filtered and ranked, according to their form. The paper is organised as follows: Section 2 specifies the problem in its whole context and introduces Taaable which will integrate the discovered AK in its reasoning process. Section 3 gives preliminaries for this work, introducing CI extraction, case-based reasoning, and related work. Section 4 explains our approach; several experiments and evaluations are described and discussed. 2 Context and motivations 2.1 Taaable The Computer Cooking Contest is an international contest that aims at compar- ing systems that make inferences about cooking. A candidate system has to use the recipe base given by the contest to propose a recipe matching the user query. This query is a set of constraints such as inclusion or rejection of ingredients, the type or the origin of the dish, and the compatibility with some diets (vegetarian, nut-free, etc.). Taaable [1] is a system that has been originally designed as a candidate of the Computer Cooking Contest. It is also used as a brain teaser for research in knowledge based systems, including knowledge discovery, ontology engineer- ing, and CBR. Like many CBR systems, Taaable uses an ontology to retrieve recipes that are the most similar to the query. Taaable retrieves and creates cooking recipes by adaptation. If there exist recipes exactly matching the query, they are returned to the user; otherwise the system is able to retrieve similar recipes (i.e. recipes that partially match the target query) and adapts these recipes, creating new ones. Searching similar recipes is guided by several ontolo- gies, i.e. hierarchies of classes (ingredient hierarchy, dish type hierarchy and dish origin hierarchy), in order to relax constraints by generalising the user query. The goal is to find the most specific generalisation of the query (the one with the minimal cost) for which recipes exist in the recipe base. Adaptation consists in substituting some ingredients of the retrieved recipes by the ones required by the query. Taaable retrieves recipes using query generalisation, then adapts them by substitution. This section gives a simplified description of the Taaable system. For more details about the Taaable inference engine, see e.g. [1]. For example, for adapting the “My Strawberry Pie” recipe to the no Strawberry constraint, the system first generalises Strawberry into Berry, then specialises Berry into, say, Raspberry. Adaptation knowledge discovery for cooking using closed itemset extraction 89 2.2 Domain ontology An ontology O defines the main classes and relations relevant to cooking. O is a set of atomic classes organised into several hierarchies (ingredient, dish type, dish origin, etc.). Given two classes B and A of this ontology, A is subsumed by B, denoted by B w A, if the set of instances of A is included in the set of instances of B. For instance, Berry w Blueberry and Berry w Raspberry. 
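The subsumption relation used here can be implemented as a simple walk up the class hierarchy. The following minimal sketch assumes a toy parent map; the class names and links are illustrative and do not reproduce the actual ontology O used by Taaable.

```python
# Illustrative ingredient hierarchy; the class names and parent links are
# assumptions for this sketch, not the actual Taaable ontology O.
PARENT = {
    "Strawberry": "Berry",
    "Raspberry": "Berry",
    "Blueberry": "Berry",
    "Berry": "Fruit",
    "Apple": "Fruit",
}

def subsumes(b, a):
    """Return True iff b subsumes a (b ⊒ a), i.e. b equals a or is an ancestor of a."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)  # climb one level up the hierarchy
    return False

assert subsumes("Berry", "Raspberry")     # Berry ⊒ Raspberry
assert not subsumes("Berry", "Apple")
```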
2.3 Taaable adaptation principle Let R be a recipe and Q be a query such that R does not exactly match Q (oth- erwise, no adaptation would be needed). For example, Q = no Strawberry and R = “My Strawberry Pie”.The basic ontology-driven adaptation in Taaable follows the generalisation/specialisation principle explained hereafter (in a sim- plified way). First, R is generalised (in a minimal way) into Γ (R) that matches Q. For example, Γ may be the substitution Strawberry Berry. Second, Γ (R) is specialised into Σ(Γ (R)) that still matches Q. For example, Σ is the substitu- tion Berry Raspberry (the class Berry is too abstract for a recipe and must be made precise). This adaptation approach has at least two limits. First, the choice of Σ is at random: there is no reason to choose raspberries instead of blue- berries, unless additional knowledge is given. Second, when such a substitution of ingredient is made, it may occur that some ingredients should be added or removed from R. These limits point out the usefulness of additional knowledge for adaptation. 3 Preliminaries 3.1 Itemset extraction Itemset extraction is a set of data-mining methods for extracting regularities into data, by aggregating object items appearing together. Like FCA [8], itemset extraction algorithms start from a formal context K, defined by K = (G, M, r), where G is a set of objects, M is a set of items, and r is the relation on G × M stating that an object is described by an item [8]. Table 1 shows an example of context, in which recipes are described by the ingredients they require: G is a set of 5 objects (recipes R, R1 , R2 , R3 , and R4 ), M is a set of 7 items (ingredients Sugar, Water, Strawberry, etc.). An itemset I is a set of items, and the support of I, support(I), is the number of objects of the formal context having every item of I. I is frequent, with respect to a threshold σ, whenever support(I) ≥ σ. I is closed if it has no proper superset J (I ( J) with the same support. For example, {Sugar, Raspberry} is an item- set and support({Sugar, Raspberry}) = 2 because 2 recipes require both Sugar and Raspberry. However, {Sugar, Raspberry} is not a closed itemset, because {Sugar, PieCrust, Raspberry} has the same support. Another, equivalent, defi- nition of closed itemsets can be given on the basis of a closure operator ·00 defined as follows. Let I be an itemset and I 0 be the set of objects that have all the items 90 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer y h e Pi err Co arc Ge rry Ci uic Co st p Pi on ll i Ap n wb ru st Wh be ti eJ am he r r e ga te ra eC rn ol sp la pl pl nn eS Su Wa St Ra Ap R × × × × × × R1 × × × × × R2 × × × × R3 × × × × × R4 × × × × × Table 1. Formal context representing ingredients used in recipes. of I: I 0 = {x ∈ G | ∀i ∈ I, x r i}. In a dual way, let X be a set of objects and X 0 be the set of properties shared by all objects of X: X 0 = {i ∈ M | ∀x ∈ X, x r i}. This defines two operators: ·0 : I ∈ 2M 7→ I 0 ∈ 2G and ·0 : X ∈ 2G 7→ X 0 ∈ 2M . These operators can be composed in an operator ·00 : I ∈ 2M 7→ I 00 ∈ 2M . An itemset I is said to be closed if it is a fixed point of ·00 , i.e., I 00 = I. In the following, “CIs” stands for closed itemsets, and “FCIs” stands for frequent CIs. For σ = 3, the FCIs of this context are {Sugar, PieCrust}, {Sugar, PieCrust, Cornstarch}, {Sugar, Water}, {Sugar}, {Water}, {PieCrust}, and {Cornstarch}. 
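The closure operator ·′′ and the support function translate directly into code. The sketch below brute-forces the closed itemsets of a small context given as a recipe → ingredients dictionary; the toy context is an assumption for illustration and is not the exact content of Table 1, and the exhaustive enumeration is only workable for tiny contexts (dedicated algorithms such as Charm, used below, avoid it).

```python
from itertools import combinations

# Toy context (recipe -> ingredients); an assumption for illustration,
# not the exact content of Table 1.
CONTEXT = {
    "R":  {"Sugar", "Water", "Strawberry", "PieCrust", "Cornstarch", "CoolWhip"},
    "R1": {"Sugar", "PieCrust", "Cornstarch", "Raspberry", "Gelatin"},
    "R2": {"Sugar", "Water", "PieCrust", "Apple"},
}
ITEMS = set().union(*CONTEXT.values())

def extent(itemset):
    """I' : objects of the context having every item of I."""
    return {g for g, ings in CONTEXT.items() if itemset <= ings}

def closure(itemset):
    """I'' : items shared by all objects of I' (all items if I' is empty)."""
    objs = extent(itemset)
    if not objs:
        return set(ITEMS)
    return set.intersection(*(CONTEXT[g] for g in objs))

def support(itemset):
    return len(extent(itemset))

def frequent_closed_itemsets(sigma=1):
    """Brute force: every closed itemset is the closure of some itemset."""
    fcis = set()
    for k in range(len(ITEMS) + 1):
        for combo in combinations(sorted(ITEMS), k):
            c = frozenset(closure(set(combo)))
            if support(c) >= sigma:
                fcis.add(c)
    return fcis

print(support({"Sugar", "PieCrust"}))   # 3 in this toy context
print(closure({"Raspberry"}))           # pulls in R1's other ingredients
```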
For the following experiments, the Charm algorithm [12] that efficiently computes the FCIs is used thanks to Coron a software platform implementing a rich set of algorithmic methods for symbolic data mining [11]. 3.2 Case-based reasoning Case-based reasoning (CBR [10]) consists in answering queries with the help of previous experience units called cases. In Taaable, a case is a recipe and a query represents user constraints. In many systems, including Taaable, CBR consists in the retrieval of a case from the case base and in the adaptation of the retrieved case in an adapted case that solves the query. Retrieval in Taaable is performed by minimal generalisation of the query (cf. section 2.3). Adaptation can be a simple substitution (e.g., substitute strawberry with any berry) but it can be improved thanks to the use of some domain specific AK. This motivates the research on AK acquisition. 3.3 Related work The AK may be acquired in various way. It may be collected from experts [6], it may be acquired using machine learning techniques [9], or be semi-automatic, using data-mining techniques and knowledge discovery principles [3,4]. This paper addresses automatic AK discovery. Previous works, such as the ones proposed by d’Aquin et al. with the Kasimir project in the medical do- Adaptation knowledge discovery for cooking using closed itemset extraction 91 main [5], and by Badra et al. in the context of a previous work on Taaable [2], are the foundations of our work. Kasimir is a CBR system applied to decision support for breast cancer treatment. In Kasimir, a case is a treatment used for a given patient. The patient is described by characteristics (age, tumour size and location, etc.) and the treatment consists in applying medical instructions. In order to discover AK, cases that are similar to the target case are first selected. Then, FCIs are computed on the variations between the target case and the similar cases. FCIs matching a specific form are interpreted for generating AK [5]. Badra et al. use this approach to make cooking adaptations in Taaable [2]. Their work aims at comparing pairs of recipes depending on the ingredients they contain. A recipe R is represented by the set of its ingredients: Ingredients(R). For example, the recipe “My Strawberry Pie” is represented by Ingredients(“My Strawberry Pie”) = {Sugar, Water, Strawberry, PieCrust, Cornstarch, CoolWhip} Let (R, R0 ) be a pair of recipes which is selected. According to [2], the represen- tation of a pair is denoted by ∆, where ∆ represents the variation of ingredients between R and R0 . Each ingredient ing is marked by −, =, or +: – ing − ∈ ∆ if ing ∈ Ingredients(R) and ing ∈ / Ingredients(R0 ), meaning that ing appears in R but not in R0 . – ing + ∈ ∆ if ing ∈ / Ingredients(R) and ing ∈ Ingredients(R0 ), meaning that ing appears in R0 but not in R. – ing = ∈ ∆ if ing ∈ Ingredients(R) and ing ∈ Ingredients(R0 ), meaning that ing appears both in R in R0 . Building a formal context about ingredient variations in cooking reci- pes. Suppose we want to compare the recipe R with the four recipes (R1 , R2 , R3 , R4 ) given in Table 1. V Variations between R = “My Strawberry Pie” and a recipe Ri have the form j ingi,jmark . For example: ∆R,R1 = Sugar= ∧ Water− ∧ Strawberry− ∧ PieCrust= ∧ Cornstarch= ∧ CoolWhip− ∧ Raspberry+ ∧ Gelatin+ (1) According to these variations, a formal context K = (G, M, I) can be built (cf. Table 2, for the running example): – G = {∆R,Ri }i – M is the set of ingredient variations: M = {ingi,j mark }i,j . 
In particular, M contains all the conjuncts of ∆R,R1 (Strawberry , etc., cf.(1)). − – (g, m) ∈ I, if g ∈ G, m ∈ M , and m is a conjunct of g, for example (∆R,R1 , Strawberry− ) ∈ I. 92 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer eC ry − ol ch − rn ch = nn ce + la y + Co ust − Ra hip − Pi ust = Pi mon + ll + Ap in + Pi ber Co tar Co tar Ge err Ci Jui r− r= r= e+ he w r r s s W b t e a ga te te ra eC rn sp pl pl eS Su Wa Wa St Ap ∆R,R1 × × × × × × × × ∆R,R2 × × × × × × × ∆R,R3 × × × × × × × × ∆R,R4 × × × × × × × × × Table 2. Formal context for ingredient variations in pairs of recipes (R, Rj ). Interpretation. In the formal context, an ingredient marked with + (resp. −) is an ingredient that has to be added (resp. removed). An ingredient marked with = is an ingredient common to R and Ri . 4 Adaptation Knowledge discovery AK discovery is based on the same scheme as knowledge discovery in databases (KDD [7]). The main steps of the KDD process are data preparation, data- mining, and interpretation of the extracted units of information. Data prepara- tion relies on formatting data for being used by data-mining tools and on filtering operations for focusing on special subsets of objects and/or items, according to the objectives of KDD. Data-mining tools are applied for extracting regularities into the data. These regularities have then to be interpreted; filtering operations may also be performed on this step because of the (often) huge size of the data- mining results or of the noise included in these results. All the steps are guided by an analyst. The objective of our work is the extraction of some AK useful for adapt- ing a given recipe to a query. The work presented in the following focuses on filtering operations, in order to extract from a formal context encoding ingredient variations between pairs of recipes, the cooking adaptations. The database used as entry point of the process is the Recipe Source database (http://www.recipesource.com/) which contains 73795 cooking recipes. For the sake of simplicity, we consider in the following, the problem of adapting R by substituting one or several ingredient(s) with one or several ingredient(s) (but the approach can be generalised for removing more ingredients, and also be used for adding ingredient(s) in a recipe). Three experiments are presented; they address the same adaptation problem: adapting the R = “My Strawberry Pie” recipe, with Ingredients(“My Strawberry Pie”) = {Sugar, Water, Strawberry, PieCrust, Cornstarch, CoolWhip}, to the query no Strawberry. In each ex- periment, a formal context about ingredient variations in recipes is built. Then, FCIs are extracted and filtered for proposing cooking adaptation. The two first Adaptation knowledge discovery for cooking using closed itemset extraction 93 experiments focus on object filtering, selecting recipes which are more and more similar to the “My Strawberry Pie” recipe: the first experiment uses recipe from the same type (i.e. pie dish) as “My Strawberry Pie” instead of choosing recipes of any type; the second experiment focuses on a more precise filtering based on similarity between the “My Strawberry Pie” recipe and recipes used for gener- ating the formal context on ingredient variations. 4.1 A first approach with closed itemsets As introduced in [2], a formal context is defined, where objects are ordered pairs of recipes (R, R0 ) and properties are ingredients marked with +, =, − for representing the ingredient variations from R to R0 . 
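The ∆ encoding can be computed directly from two ingredient sets, as in the minimal sketch below; Ingredients(R) is the one given above and the second set is the one implied by Equation (1), with the mark characters -, +, = appended to the ingredient names. A row of the variation context is then simply the ∆ of one retained pair.

```python
def delta(ingredients_r, ingredients_rp):
    """Marked-item description of the variation from R to R'."""
    removed = {ing + "-" for ing in ingredients_r - ingredients_rp}  # in R, not in R'
    added   = {ing + "+" for ing in ingredients_rp - ingredients_r}  # in R', not in R
    kept    = {ing + "=" for ing in ingredients_r & ingredients_rp}  # in both
    return removed | added | kept

R  = {"Sugar", "Water", "Strawberry", "PieCrust", "Cornstarch", "CoolWhip"}
R1 = {"Sugar", "PieCrust", "Cornstarch", "Raspberry", "Gelatin"}   # as implied by Eq. (1)

print(sorted(delta(R, R1)))
# ['CoolWhip-', 'Cornstarch=', 'Gelatin+', 'PieCrust=', 'Raspberry+',
#  'Strawberry-', 'Sugar=', 'Water-']
```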
The formal context which is build is similar to the example given in Table 2. In each pair of recipes, the first element is the recipe R =“My Strawberry Pie” that must be adapted; the second element is a recipe of the same dish type as R which, moreover, does not contain the ingredient which has to be removed. In our example, it corresponds to pie dish recipes which do not contain strawberry. This formal context allows to build CIs which have to be interpreted in order to acquire adaptation rules. Experiment. 3653 pie dish recipes that do not contain strawberry are found in the Recipe Source database. The formal context, with 3653 objects × 1355 items produces 107,837 CIs (no minimal support is used). Analysis. Some interesting CIs can be found. For example, {PieCrust− , Strawberry− , Cornstarch− , CoolWhip− , Water− , Sugar− } with support of 1657, contains all the ingredients of R with a − mark, meaning that there are 1657 recipes which have no common ingredients with the R recipe. In the same way, {PieCrust− , Strawberry− , Cornstarch− , CoolWhip− , Water− } with sup- port 2590, means that 2590 recipes share only the Sugar ingredient with R because the sugar is the sole ingredient of R which is not included in this CI. The same analysis can be done for {PieCrust− , Strawberry− , Cornstarch− , CoolWhip− , Sugar− } (support of 1900), for water, etc. Conclusion. The CIs are too numerous for being presented to the analyst. Only 1996 of the 3653 pie dish without strawberry recipes share at least one ingredient with R. There are too many recipes without anything in common. A first filter can be used to limit the size of the formal context in number of objects. 4.2 Filtering recipes with at least one common ingredient Experiment. The formal context, with 1996 objects × 813 items, produces 22,408 CIs (no minimal support is used), ranked by decreasing support. 94 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer Results. The top five FCIs are: – {Strawberry− } with support of 1996; – {Strawberry− , CoolWhip− } with support of 1916; – {Strawberry− , PieCrust− } with support of 1757; – {Strawberry− , PieCrust− , CoolWhip− } with support of 1679; – {Strawberry− , Cornstarch− } with support of 1631. Analysis. Several observations can be made. The first FCI containing an ingre- dient marked by + ({Strawberry− , Egg+ }, with support of 849) appears only at the 46th position. Moreover, there are 45 FCIs with one ingredient marked by + in the first 100 FCIs, and no FCI with more than one ingredient marked by +. A substituting ingredient ing can only be found in CIs containing ing + meaning that there exists a recipe containing ing, which is not in R. So, FCIs that do not contain the + mark cannot be used for finding a substitution proposition, and they are numerous in the first 100 ones, based on a support ranking (recall that it has been chosen not to consider adaptation by simply removing ingredient). In the first 100 FCIs, there is only 15 FCIs containing both an ingredient marked by + and an ingredient marked by =. In a FCI I, the = mark on a ingredient ing means that ing is common to R and to recipe(s) involved by the creation of I. So, an ingredient marked by = guarantees a certain similarity (based on ingredients that are used) between the recipes R and R0 compared by ∆R,R0 . 
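These observations suggest a purely syntactic filter applied before any ranking: keep only CIs that contain at least one item marked + and at least one item marked =. A minimal sketch follows; the list of CIs and their supports is illustrative rather than actual mining output.

```python
# Illustrative CIs with their supports; the values are assumptions for this
# sketch, not the actual mining output.
CIS = [
    (frozenset({"Strawberry-"}), 1996),
    (frozenset({"Strawberry-", "CoolWhip-"}), 1916),
    (frozenset({"Strawberry-", "Egg+"}), 849),
    (frozenset({"Strawberry-", "Sugar=", "PieCrust=", "LemonJuice+"}), 40),
]

def may_propose_substitution(ci):
    """Keep CIs with at least one added (+) item and one shared (=) item."""
    return any(i.endswith("+") for i in ci) and any(i.endswith("=") for i in ci)

kept = [(ci, s) for ci, s in CIS if may_propose_substitution(ci)]
print(kept)   # only the last CI passes this filter
```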
If a FCI I contains a potential substituting ingredient, marked by +, but does not contain any =, the risk for proposing a cooking adaptation from I is very high, because there is no common ingredient with R in the recipe the potential substituting ingredient comes from. In the first 100 recipes, the only potential substituting ingredients (so, the ingredients marked by +) are egg, salt, and butter, which are not satisfactory from a cooking viewpoint for substituting the strawberries. We have conducted similar experiments with other R and queries, and the same observations as above can be made. Conclusion. From these observations, it can be concluded that the sole rank- ing based on support is not efficient to find relevant cooking adaptation rules, because the most frequent CIs do no contain potential substituting ingredients and, moreover, have no common ingredient with R. 4.3 Filtering and ranking CIs according to their forms To extract realistic adaptation, CIs with a maximum of ingredients marked by = are searched. We consider that a substitution is acceptable, if 50% of ingredients of R are preserved and if the adaptation does not introduce too many ingredients; we also limit the number of ingredients introduced to 50% of the initial number of ingredients in R. For the experiment with the R = “My Strawberry Pie”, containing initially 6 ingredients, it means that at least 3 ingredients must be preserved and at most 3 ingredients can be added. In term of CIs, it corresponds to CIs containing at least 3 ingredients marked with = and at most 3 ingredients marked with +. Adaptation knowledge discovery for cooking using closed itemset extraction 95 Experiment. Using this filter on CIs produced by the previous experiment re- duces the number of CIs to 505. However, because some CIs are more relevant than others, they must be ranked according to several criteria. We use the fol- lowing rules, given by priority order: 1. A CI must have a + in order to find a potential substituting ingredient. 2. A CI which has more = than another one is more relevant. This criterion promotes the pairs which have a largest set of common ingredient. 3. A CI which has less − than another one is more relevant. This criterion promotes adaptations which remove less ingredients. 4. A CI which has less + than another one is more relevant. This criterion promotes adaptations which add less ingredients. 5. If two CIs cannot be ranked according to the 4 criteria above, the CI the more frequent is considered to be the more relevant. Results. The 5 first CIs ranked according to the previous criteria are: – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , Salt+ } with support of 5; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ } with support of 4; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ , CreamCheese+ } with support of 2; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ , WhippingCream+ } with support of 2; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ , LemonPeel+ } with support of 2. Analysis. One can observe that potential substituting ingredients take part of the first 5 CIs and each CIs preserve 4 (of 6) ingredients. The low supports of these CIs confirm that searching frequent CIs is not compatible with our need, which is to extract CIs with a specific form. Conclusion. 
Ranking the CIs according to our particular criteria is more efficient than using a support-based ranking. This kind of ranking can also be seen as a filter on CIs. However, this approach requires computing all CIs because the support of interesting CIs is low.

4.4 More restrictive formal context building according to the form of interesting CIs

The computation time can be improved by applying a more restrictive selection of recipe pairs at the formal context building step, drastically decreasing the size of the formal context. Indeed, as the expected form of CIs is known, recipe pairs that cannot produce CIs of the expected form can be removed. This can also be seen as a selection of recipes that are similar enough to R. R0 is considered similar enough to R if R0 has a minimal threshold σ= = 50% of ingredients in common with R (cf. (2)) and if R0 has a maximal threshold σ+ = 50% of ingredients that are not used in R (cf. (3)). These two conditions express for ∆R,R0 the same similarity conditions considered in Section 4.3 on CIs.

|Ingredients(R) ∩ Ingredients(R0)| / |Ingredients(R)| ≥ σ=   (2)

|Ingredients(R0) \ Ingredients(R)| / |Ingredients(R)| ≤ σ+   (3)

Experiment. Among the 1996 pie dish recipes not containing Strawberry, only 20 recipes satisfy the two conditions. The formal context, with 20 objects × 40 items, produces only 21 CIs (no minimal support is used).

Results. The first 5 CIs, satisfying the form introduced in the previous section and ranked by decreasing support, are:
– {Water=, Sugar=, Cornstarch=, PieCrust=, Strawberry−, CoolWhip−, RedFoodColoring+, Cherry+} with support of 1;
– {Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, PieShell+} with support of 6;
– {Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+} with support of 3;
– {Water=, Sugar−, Cornstarch=, PieCrust=, Strawberry−, CoolWhip−, Apple+, AppleJuice+} with support of 3;
– {Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Peach+, PieShell+} with support of 2.

Analysis. According to these CIs, the first potential substituting ingredients are: RedFoodColoring, Cherry, PieShell, Raspberry, Apple, and Peach. Each CI preserves 3 or 4 of the 6 ingredients of R, and two CIs add 2 ingredients.

Conclusion. This approach reduces the computation time without reducing the result quality. Moreover, it gives the best potential adaptations in the first CIs.

4.5 From CIs to adaptation rules

As Taaable must propose a recipe adaptation, CIs containing potentially substituting ingredients must be transformed. Indeed, a CI does not represent a direct cooking adaptation. For example, the third CI of the last experiment contains Raspberry+, simultaneously with CoolWhip− and PieCrust−. Removing the pie crust (i.e. PieCrust−) can look surprising for a pie dish, but one must keep in mind that a CI does not correspond to a real recipe, but to an abstraction of variations between R and a set of recipes. So, producing a complete adaptation requires getting back to the ∆R,Ri for having all the variations of ingredient that will take part in the adaptation. For example, for

∆R,R1: Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+, Gelatin+, GCPieCrust+
∆R,R2: Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+, FoodColor+, PieShell+
∆R,R3: Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+, PieShell+

Table 3.
Formal context for ingredient variations in pairs of recipes (R, Rj ). the CI {Water= , Sugar= , Cornstarch= , PieCrust− , Strawberry− , CoolWhip− , Raspberry+ }, the ∆R,Ri (with i ∈ [1; 3]) are the ones given by Table 3. The adaptation rules extracted from these 3 recipe variations are: – {CoolWhip, PieCrust, Strawberry} ; {Gelatin, GCPieCrust, Raspberry}; – {CoolWhip, PieCrust, Strawberry} ; {FoodColor, PieShell, Raspberry}; – {CoolWhip, PieCrust, Strawberry} ; {PieShell, Raspberry}. For R2 and R3 , PieShell is added in replacement of PieCrust; in R1 , GCPieCrust plays the role of PieCrust. These three recipe variations propose to replace Strawberry by Raspberry. For R1 (resp. R2 ), Gelatin (resp. FoodColor) is also added. Finally, the three recipe variations propose to remove the CoolWhip. Our approach guarantees the ingredient compatibility, with the assumption that the recipe base used for the adaptation rule extraction process contains only good recipes, i.e. recipes which do not contain ingredient incompatibility. Indeed, as adaptation rules are extracted from real recipes, the good combination of ingredients is preserved. So, when introducing a new ingredient ing1 (marked by ing1+ ), removing another ingredient ing2 (marked by ing2− ) could be required. The reason is that there is no recipe, entailed in the creation of the CI from which the adaptation rules are extracted, using both ing1 and ing2 . In the same way, adding a supplementary ingredient ing3 (marked by ing3+ ) in addition of ing1 , is obtained from recipes which use both ing1 and ing3 . Applying FCA on these ∆R,Ri produces the concept lattice presented in Fig. 1 in which the top node is the CI retained. This node can be seen as a generic cooking adaptation, and navigating into the lattice will conduct to more specific adaptation. The KDD loop is closed: after having (1) selected and formatting the data, (2) applying a data-mining CI extraction algorithm, and (3) interpreting the results, a new set of data is selected on which a data-mining –FCA– algorithm could then be applied. We have chosen to return the adaptation rules generated from the 5 first CIs to the user. So, the system proposes results where Strawberry could be replaced (in addition of some other ingredient adding or removing) by “RedFoodColoring and Cherry”, by Raspberry with optional Gelatin or FoodColor, by Peach 98 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer Fig. 1. The lattice computed on the formal context given in Table 3. with optional FoodColor or LemonJuice, by “HeavyCream and LemonRind”, or by “Apple and AppleJuice”. 5 Conclusion This paper shows how adaptation knowledge can be extracted efficiently for ad- dressing a cooking adaptation challenge. Our approach focuses on CIs with a particular form, because the support is not a good ranking measure for this problem. A ranking method based on 5 criteria explicitly specified for this adap- tation problem is proposed; the support is used in addition to distinguish CIs which satisfy in the same way the 5 criteria. Beyond the application domain, this study points out that KD is not only a data-mining issue: the preparation and interpretation steps are also important. Moreover, it highlights the iterative nature of KD: starting from a first experi- ment with few a priori about the form of the results which are too numerous to be interpreted, it arrives to an experiment with a precise aim that gives results that are easy to interpret as adaptation rules. 
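The selection conditions (2)–(3) and the final step from a ∆ row back to an adaptation rule, both described above, can be sketched operationally as follows; the ingredient sets are those of the running example, and treating condition (3) as an upper bound on added ingredients is an assumption based on the "maximal threshold" wording.

```python
def similar_enough(ing_r, ing_rp, sigma_eq=0.5, sigma_plus=0.5):
    """Conditions (2) and (3): enough shared ingredients, not too many new ones.
    Treating sigma_plus as an upper bound is an assumption."""
    shared = len(ing_r & ing_rp) / len(ing_r)
    extra = len(ing_rp - ing_r) / len(ing_r)
    return shared >= sigma_eq and extra <= sigma_plus

def adaptation_rule(delta_row):
    """Turn one Delta(R, Ri) row into a rule  remove ; add  (items marked - and +)."""
    remove = {i[:-1] for i in delta_row if i.endswith("-")}
    add    = {i[:-1] for i in delta_row if i.endswith("+")}
    return remove, add

R  = {"Sugar", "Water", "Strawberry", "PieCrust", "Cornstarch", "CoolWhip"}
R1 = {"Sugar", "PieCrust", "Cornstarch", "Raspberry", "Gelatin"}
print(similar_enough(R, R1))   # True: 3/6 ingredients shared, 2/6 added

delta_r_r1 = {"Water=", "Sugar=", "Cornstarch=", "PieCrust-", "Strawberry-",
              "CoolWhip-", "Raspberry+", "Gelatin+", "GCPieCrust+"}
print(adaptation_rule(delta_r_r1))
# ({'CoolWhip', 'PieCrust', 'Strawberry'}, {'Gelatin', 'GCPieCrust', 'Raspberry'})
```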
It has been argued in the paper that this approach is better than the basic adaptation approach (based on substituting an ingredient by another one, on the basis of the ontology), in that it avoids some ingredient incompatibilities and makes some specialisation choices. However, a careful study remains to be made in order to compare experimentally these approaches. A short-term future work is to integrate this AK discovery into the online system Taaable, following the principles of opportunistic KD [2]. A mid-term future work consists in using the ontology during the KD process. The idea is to add new items, deduced thanks to the ontology (e.g. the properties Cream− and Milk+ entail the variation Dairy= ). First experiments have already been conducted but they raise interpretation difficulties. Indeed, the extracted CIs contain abstract terms (such as Dairy= or Flavoring+ ) that are not easy to interpret. Adaptation knowledge discovery for cooking using closed itemset extraction 99 References 1. F. Badra, R. Bendaoud, R. Bentebitel, P.-A. Champin, J. Cojan, A. Cordier, S. De- sprés, S. Jean-Daubias, J. Lieber, T. Meilender, A. Mille, E. Nauer, A. Napoli, and Y. Toussaint. Taaable: Text Mining, Ontology Engineering, and Hierarchical Clas- sification for Textual Case-Based Cooking. In ECCBR Workshops, Workshop of the First Computer Cooking Contest, pages 219–228, 2008. 2. F. Badra, A. Cordier, and J. Lieber. Opportunistic Adaptation Knowledge Dis- covery. In Lorraine McGinty and David C. Wilson, editors, 8th International Con- ference on Case-Based Reasoning - ICCBR 2009, volume 5650 of Lecture Notes in Computer Science, pages 60–74, Seattle, États-Unis, July 2009. Springer. The original publication is available at www.springerlink.com. 3. S. Craw, N. Wiratunga, and R. C. Rowe. Learning adaptation knowledge to im- prove case-based reasoning. Artificial Intelligence, 170(16-17):1175–1192, 2006. 4. M. d’Aquin, F. Badra, S. Lafrogne, J. Lieber, A. Napoli, and L. Szathmary. Case base mining for adaptation knowledge acquisition. In International Joint Confer- ence on Artificial Intelligence, IJCAI’07, pages 750–756, 2007. 5. M. D’Aquin, S. Brachais, J. Lieber, and A. Napoli. Decision Support and Knowl- edge Management in Oncology using Hierarchical Classification. In Katherina Kaiser, Silvia Miksch, and Samson W. Tu, editors, Proceedings of the Symposium on Computerized Guidelines and Protocols - CGP-2004, volume 101 of Studies in Health Technology and Informatics, pages 16–30, Prague, Czech Republic, 2004. Silvia Miksch and Samson W. Tu, IOS Press. 6. M. d’Aquin, J. Lieber, and A. Napoli. Adaptation Knowledge Acquisition: a Case Study for Case-Based Decision Support in Oncology. Computational Intelligence (an International Journal), 22(3/4):161–176, 2006. 7. U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery in databases. AI Magazine, pages 37–54, 1996. 8. B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999. 9. K. Hanney and M. T. Keane. Learning Adaptation Rules From a Case-Base. In I. Smith and B. Faltings, editors, Advances in Case-Based Reasoning – Third European Workshop, EWCBR’96, LNAI 1168, pages 179–192. Springer, 1996. 10. C. K. Riesbeck and R. C. Schank. Inside Case-Based Reasoning. Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey, 1989. 11. L. Szathmary and A. Napoli. CORON: A Framework for Levelwise Itemset Min- ing Algorithms. Supplementary Proc. 
of The Third International Conference on Formal Concept Analysis (ICFCA ’05), Lens, France, pages 110–113, 2005. 12. M. J. Zaki and C.-J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In SIAM International Conference on Data Mining SDM’02, pages 33–43, 2002. Fast Computation of Proper Premises Uwe Ryssel1 , Felix Distel2 , and Daniel Borchmann3 1 Institute of Applied Computer Science, Technische Universität Dresden, Dresden, Germany, uwe.ryssel@tu-dresden.de 2 Institute of Theoretical Computer Science, Technische Universität Dresden, Dresden, Germany, felix@tcs.inf.tu-dresden.de 3 Institute of Algebra, Technische Universität Dresden, Dresden, Germany, borch@tcs.inf.tu-dresden.de Abstract. This work is motivated by an application related to refactor- ing of model variants. In this application an implicational base needs to be computed, and runtime is more crucial than minimal cardinality. Since the usual stem base algorithms have proven to be too costly in terms of runtime, we have developed a new algorithm for the fast computation of proper premises. It is based on a known link between proper premises and minimal hypergraph transversals. Two further improvements are made, which reduce the number of proper premises that are obtained multiple times and redundancies within the set of proper premises. We provide heuristic evidence that an approach based on proper premises will also be beneficial for other applications. 1 Introduction Today, graph-like structures are used in many model languages to specify al- gorithms or problems in a more readable way. Examples are data-flow-oriented simulation models, such as MATLAB/Simulink, state diagrams, and diagrams of electrical networks. Generally, such models consist of blocks or elements and connections among them. Using techniques described in Section 5.2, a formal context can be obtained from such models. By computing an implicational base of this context, dependencies among model artifacts can be uncovered. These can help to represent a large number of model variants in a structured way. For many years, computing the stem base has been the default method for extracting a small but complete set of implications from a formal context. There exist mainly two algorithms to achieve this [10,15], and both of them compute not only the implications from the stem base, but also concept intents. This is problematic as a context may have exponentially many concept intents. Recent theoretical results suggest that existing approaches for computing the stem base may not lead to algorithms with better worst-case complexity [6,1]. Bearing this in mind, we focus on proper premises. Just like pseudo-intents, that are used to obtain the stem base, proper premises yield a sound and com- plete set of implications. Because this set of implications does not have minimal cardinality, proper premises have been outside the focus of the FCA community c 2011 by the paper authors. CLA 2011, pp. 101–113. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 102 Uwe Ryssel, Felix Distel and Daniel Borchmann for many years. However, there are substantial arguments to reconsider using them. Existing methods for computing proper premises avoid computing con- cept intents. Thus, in contexts with many concept intents they may have a clear advantage in runtime over the stem base algorithms. 
This is particularly true for our application where the number of concept intents is often close to the theoretical maximum. Here, attributes often occur together with their negated counterparts, and the concept lattice can contain several millions of elements. In Section 5.1 we provide arguments that we can expect the number of con- cept intents to be larger than the number of proper premises in most contexts, assuming a uniform random distribution. Often, in applications, runtime is the limiting factor, not the size of the basis. But even where minimal cardinality is a requirement, computing proper premises is worth considering, since there are methods to transform a base into the stem base in polynomial time [16]. In this paper we present an algorithm for the fast computation of proper premises. It is based on three ideas. The first idea is to use a simple connection between proper premises and minimal hypergraph transversals. The problem of enumerating minimal hypergraph transversals is well-researched. Exploiting the link to proper premises allows us to use existing algorithms that are known to behave well in practice. A first, naïve algorithm iterates over all attributes and uses a black-box hypergraph algorithm to compute proper premises of each attribute. A drawback when iterating over all attributes is that the same proper premise may be computed several times for different attributes. So we introduce a can- didate filter in the second step: For each attribute m, the attribute set is filtered and proper premises are searched only among the candidate attributes. We show that this filtering method significantly reduces the number of multiple-computed proper premises while maintaining completeness. In a third step we exploit the fact that there are obvious redundancies within the proper premises. These can be removed by searching for proper premises only among the meet-irreducible attributes. We argue that our algorithms are trivial to parallelize, leading to further speedups. Due to their incremental nature, parallelized versions of the stem base algorithms are not known to date. We conclude by providing experimental re- sults. These show highly significant improvements for the contexts obtained from the model refactoring application. For a sample context, where Next-Closure re- quired several hours to compute the stem base, runtime has dropped to fractions of a second. For contexts from other applications the improvements are not as impressive but still large. 2 Preliminaries We provide a short summary of the most common definitions in formal concept analysis. A formal context is a triple K = (G, M, I) where G is a set of objects, M a set of attributes, and I ⊆ G × M is a relation that expresses whether an Fast Computation of Proper Premises 103 object g ∈ G has an attribute m ∈ M . If A ⊆ G is a set of objects then A0 denotes the set of all attributes that are shared among all objects in A, i.e., A0 = { m ∈ M | ∀g ∈ G : gIm }. Likewise, for some set B ⊆ M we define B 0 = { g ∈ G | ∀m ∈ B : gIm }. Pairs of the form (A, B) where A0 = B and B 0 = A are called formal concepts. Formal concepts of the form ({ m }0 , { m }00 ) for some attribute m ∈ M are called attribute concept and are denoted by µm. We define the partial order ≤ on the set of all formal concepts of a context to be the subset order on the first component. The first component of a formal concept is called the concept extent while the second component is called the concept intent. 
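These derivation operators and attribute concepts can be written down directly; the following minimal sketch uses an illustrative toy context given as an object → attributes dictionary.

```python
# Toy context as an object -> attributes dictionary; the data is illustrative.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "c"},
}
ATTRIBUTES = set().union(*CONTEXT.values())

def objects_prime(A):
    """A' : attributes shared by every object in A."""
    if not A:
        return set(ATTRIBUTES)
    return set.intersection(*(CONTEXT[g] for g in A))

def attributes_prime(B):
    """B' : objects having every attribute in B."""
    return {g for g, atts in CONTEXT.items() if B <= atts}

def attribute_concept(m):
    """µm = ({m}', {m}'')."""
    ext = attributes_prime({m})
    return ext, objects_prime(ext)

print(attribute_concept("a"))   # ({'g1', 'g3'}, {'a', 'b'})
```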
Formal concept analysis provides methods to mine implicational knowledge from formal contexts. An implication is a pair (B1 , B2 ) where B1 , B2 ⊆ M , usually denoted by B1 → B2 . We say that the implication B1 → B2 holds in a context K if B10 ⊆ B20 . An implication B1 → B2 follows from a set of implications L if for every context K in which all implications from L hold, B1 → B2 also holds. We say that L is sound for K if all implications from L hold in K, and we say that L is complete for K if all implications that hold in K follow from L. There exists a sound and complete set of implications for each context which has minimal cardinality [12]. This is called the stem base. The exact definition of the stem base is outside the scope of this work. A sound and complete set of implications can also be obtained using proper premises. For a given set of attributes B ⊆ M , define B • to be the set of those attributes in M \ B that follow from B but not from a strict subset of B, i.e.,  [  B • = B 00 \ B ∪ S 00 . S(B B is called a proper premise if B • is not empty. It is called a proper premise for m ∈ M if m ∈ B • . It can be shown that L = { B → B • | B proper premise } is sound and complete [11]. Several alternative ways to define this sound and complete set of implications can be found in [2]. We write g $ m if g 0 is maximal with respect to the subset order among all object intents which do not contain m. 3 Proper Premises as Minimal Hypergraph Transversals We present a connection between proper premises and minimal hypergraph transversals, which forms the foundation for our enumeration algorithms. It has been exploited before in database theory to the purpose of mining functional dependencies from a database relation [14]. Implicitly, it has also been known for a long time within the FCA community. However, the term hypergraph has not been used in this context (cf. Prop. 23 from [11]). Let V be a finite set of vertices. Then a hypergraph on V is simply a pair (V, H) where H is a subset of the power set 2V . Intuitively, each set E ∈ H represents an edge of the hypergraph, which, in contrast to classical graph theory, 104 Uwe Ryssel, Felix Distel and Daniel Borchmann may be incident to more or less than two vertices. A set S ⊆ V is called a hypergraph transversal of H if it intersects every edge E ∈ H, i.e., ∀E ∈ H : S ∩ E 6= ∅. S is called a minimal hypergraph transversal of H if it is minimal with respect to the subset order among all hypergraph transversals of H. The transversal hyper- graph of H is the set of all minimal hypergraph transversals of H. It is denoted by Tr (H). The problem of deciding for two hypergraphs G and H whether H is the transversal hypergraph of G is called TransHyp. The problem of enumerating all minimal hypergraph transversals of a hypergraph G is called TransEnum. Both problems are relevant to a large number of fields and therefore have been well-researched. TransHyp is known to be contained in coNP. Since it has been shown that TransHyp can be decided in quasipolynomial time [9], it is not believed to be coNP-complete. Furthermore, it has been shown that it can be decided using only limited non-determinism [8]. For the enumeration problem it is not known to date whether an output-polynomial algorithm exists. However, efficient algorithms have been developed for several classes of hypergraphs [8,4]. The following proposition can be found in [11] among others. Proposition 1. P ⊆ M is a premise of m ∈ M iff (M \ g 0 ) ∩ P 6= ∅ holds for all g ∈ G with g $ m. 
P is a proper premise for m iff P is minimal (with respect to ⊆) with this property. We immediately obtain the following corollary. Corollary 1. P is a premise of m iff P is a hypergraph transversal of (M, H) where H := {M \ g 0 | g ∈ G, g $ m}. The set of all proper premises of m is exactly the transversal hypergraph Tr ({M \ g 0 | g ∈ G, g $ m}). In particular this proves that enumerating the proper premises of a given attribute m is polynomially equivalent to TransEnum. This can be exploited in a naïve algorithm for computing all proper premises of a formal context (Al- gorithm 1). Being aware of the link to hypergraph transversals, we can benefit from existing efficient algorithms for TransEnum in order to enumerate proper premises similar to what has been proposed in [14]. Of course, it is also possible to use other enumeration problems to which TransEnum can be reduced. Ex- amples are the enumeration of prime implicants of Horn functions [2] and the enumeration of set covers. Fast Computation of Proper Premises 105 4 Improvements to the Algorithm 4.1 Avoiding Duplicates using Candidate Sets We can further optimize Algorithm 1 by reducing the search space. In the naïve algorithm proper premises are typically computed multiple times since they can be proper premises of more than one attribute. Our goal is to avoid this wherever possible. The first idea is shown in Algorithm 2. There we introduce a candidate set C of particular attributes, depending on the current attribute m. We claim now that we only have to search for minimal hypergraph transversals P of { M \ g 0 | g $ m } with P ⊆ C. We provide some intuition for this idea. Algorithm 1 Naïve Algorithm for Enumerating All Proper Premises Input: K = (G, M, I) P=∅ for all m ∈ M do P = P ∪ Tr ({M \ g 0 | g ∈ G, g $ m}) end for return P Algorithm 2 A Better Algorithm for Enumerating All Proper Premises Input: K = (G, M, I) P = { { m } | m ∈ M, { m } is a proper premise of K } for all m ∈ M do C = { u ∈ M \ { m } | 6 ∃v ∈ M : µu ∧ µm ≤ µv < µm } P = P ∪ { P ⊆ C | P minimal hypergraph transversal of { M \ g 0 | g $ m } } end for return P Let us fix a formal context K = (G, M, I), choose m ∈ M and let P ⊆ M be a proper premise for m. Then we know that m ∈ P 00 , which is equivalent to ^ µp ≤ µm. p∈P If we now find another attribute n ∈ M \ { m } with ^ µp ≤ µn < µm p∈P it suffices to find the set P as a proper premise for n, because from µn < µm we can already infer m ∈ P 00 . Conversely, if we search for all proper premises for m, 106 Uwe Ryssel, Felix Distel and Daniel Borchmann we only have to search for those who are not proper premises for attributes n with µn < µm. Now suppose that there exists an element u ∈ P and an attribute v ∈ M such that µm ∧ µu ≤ µv < µm. (1) Then we know ^ ^ ( µp) ∧ µm = µp ≤ µv < µm, p∈P p∈P i.e., P is already a proper premise for v. In this case, we do not have to search for P , since it will be found in another iteration. On the other hand, if P is a proper premise for m but not for any other attribute n ∈ M with µn < µm, the argument given above shows that an element u ∈ P and an attribute v ∈ M satisfying (1) cannot exist. Lemma 1. Algorithm 2 enumerates for a given formal context K = (G, M, I) all proper premises of K. Proof. Let P be a proper premise of K for the attribute m. P is a proper premise and therefore m ∈ P 00 holds, which is equivalent to µm ≥ (P 0 , P 00 ). Let c ∈ M be such that µm ≥ µc ≥ (P 0 , P 00 ) and µc is minimal with this property. 
We claim that either P = { c } or P is found in the iteration for c of Algorithm 2. Suppose c ∈ P . Then m ∈ { c }00 follows from µm ≥ µc. As a proper premise, P is minimal with the property m ∈ P 00 . It follows P = { c } and P is found by Algorithm 2 during the initialization. Now suppose c 6∈ P . Consider C := { u ∈ M \ { c } | 6 ∃v ∈ M : µu ∧ µc ≤ µv < µc }. We shall show P ⊆ C. To see this, consider some p ∈ P . Then p 6= c holds by assumption. Suppose that p 6∈ C, i.e., there is some v ∈ M such that µp ∧ µc ≤ µv < µc. Because of p ∈ P , µp ≥ (P 0 , P 00 ) and together with µc ≥ (P 0 , P 00 ) we have (P 0 , P 00 ) ≤ µp ∧ µc ≤ µv < µc in contradiction to the minimality of µc. This shows p ∈ C and all together P ⊆ C. To complete the proof it remains to show that P is a minimal hypergraph transversal of { M \ { g }0 | g $ c }, i.e., that P is also a proper premise for c, not only for m. Consider n ∈ P . Assume c ∈ (P \ { n })00 . Since {c} implies m, then P \ { n } would be a premise for m in contradiction to the minimality of P . Thus c 6∈ (P \ { n })00 holds for all n ∈ P and therefore P is a proper premise for c. 4.2 Irreducible Attributes We go one step further and also remove attributes m from our candidate set C whose attribute concept µm is the V meet of other attribute concepts µx1 , . . . , µxn , n where x1 , . . . , xn ∈ C, i.e., µm = i=1 µxi . This results in Algorithm 3 that no Fast Computation of Proper Premises 107 longer computes all proper premises, but a subset that still yields a complete implicational base. We show that we only have to search for proper premises P with P ⊆ N where N is the set of irreducible attributes of K. To ease the presentation, let us assume for the rest of this paper that the formal context K is attribute-clarified. Algorithm 3 Computing Enough Proper Premises Input: K = (G, M, I) P = { { m } | m ∈ M, { m } Vis a proper premise of K } N = M \ { x ∈ M | µx = n i=1 µxi for an n ∈ N and xi ∈ M for 1 ≤ i ≤ n } for all m ∈ M do C = { u ∈ N \ { m } | 6 ∃v ∈ M : µu ∧ µm ≤ µv < µm } P = P ∪ { P ⊆ C | P minimal hypergraph transversal of { M \ g 0 | g $ m } } end for return P Proposition 2. Let m be an attribute and let P be a proper premise for m. Let x ∈ P , n ∈ N, and for 1 ≤ i ≤ n let xi ∈ M be attributes satisfying – m∈ / {Vx1 , . . . , xn }, n – µx = i=1 µxi , – xi ∈ / ∅ for all 1 ≤ i ≤ n and 00 – µx < µxi for all 1 ≤ i ≤ n. Then { x } is a proper premise for all xi and there exists a nonempty set Y ⊆ { x1 , . . . , xn } such that (P \ { x }) ∪ Y is a proper premise for m. Proof. It is clear that { x } is a proper premise for all xi , since xi ∈ { x }00 and / ∅00 . Define xi ∈ QY := (P \ { x }) ∪ Y for Y ⊆ { x1 , . . . , xn }. We choose Y ⊆ { x1 , . . . , xn } such that Y is minimal with respect to m ∈ Q00Y . Such a set exists, since m ∈ ((P \ { x }) ∪ { x1 , . . . , xn })00 because of { x1 , . . . , xn } → { x }. Furthermore, Y 6= ∅, since m ∈ / (P \ { x })00 . We now claim that QY is a proper premise for m. Clearly m ∈ / QY , since m∈ / Y . For all y ∈ Y it holds that m ∈ / (QY \ { y })00 or otherwise minimality of Y would be violated. It therefore remains to show that m ∈ / (QY \ { y })00 for all y ∈ QY \ Y = P \ { x }. (QY \ { y })00 = ((P \ { x, y }) ∪ Y )00 ⊆ ((P \ { y }) ∪ Y )00 = (P \ { y })00 since { x } → Y and x ∈ P \{ y }. Since m ∈ / (P \{ y })00 , we get m ∈ / (QY \{ y })00 as required. In sum, QY is a proper premise for m. 108 Uwe Ryssel, Felix Distel and Daniel Borchmann Lemma 2. 
Let N be the set of all meet-irreducible attributes of a context K. Define P = { X ⊆ M | |X| ≤ 1, X proper premise } ∪ { X ⊆ N | X proper premise } Then the set L = { P → P • | P ∈ P } is sound and complete for K. Proof. Let m be an attribute and let P be a proper premise for m. If P ∈ / P then it follows that P 6⊆ N . Thus we can find y1 ∈ P \N and elements x1 , . . . , xn ∈ M with n ≥ 1 such that – m∈ / { xV1 , . . . , xn }, n – µy1 = i=1 µxi , – xi ∈ / ∅00 for all 1 ≤ i ≤ n and – µx < µxi for all 1 ≤ i ≤ n. By Proposition 2 we can find a proper premise P1 such that P → { m } fol- lows from { y1 } → { x1 , . . . , xn } and P1 → { m }. Clearly { y1 } ∈ P, since all singleton proper premises are contained in P. If P1 ∈ / P then we can apply Proposition 2 again and obtain a new proper premise P2 , etc. To see that this process terminates consider the strict partial order ≺ defined as P ≺ Q iff ∀q ∈ Q : ∃p ∈ P : µp < µq. It is easy to see that with each application of Proposition 2 we obtain a new proper premise that is strictly larger than the previous with respect to ≺. Hence, the process must terminate. This yields a set P 0 = { { y1 }, . . . , { yk }, Pk } ⊆ P such that P → { m } follows from { Q → Q• | Q ∈ P 0 }. Thus L is a sound and complete set of implications. Together with Lemma 1 this yields correctness of Algorithm 3. Corollary 2. The set of proper premises computed by Algorithm 3 yields a sound and complete set of implications for the given formal context. 5 Evaluation 5.1 Computing Proper Premises Instead of Intents In both the stem base algorithms and our algorithms, runtime can be exponential in the size of the input. In the classical case the reason is that the number of intents can be exponential in the size of the stem base [13]. In the case of our algorithms there are two reasons: the computation of proper premises is TransEnum-complete, and there can be exponentially many proper premises. The first issue is less relevant in practice because algorithms for TransEnum, while still exponential in the worst case, behave well for most instances. To see that there can be exponentially many proper premises in the size of the stem base, let us look at the context Kn from Table 1 for some n ≥ 2, consisting Fast Computation of Proper Premises 109 of two contranominal scales of dimension n × n and one attribute a with empty extent. It can be verified that the proper premises of the attribute a are exactly the sets of the form {mi | i ∈ I} ∪ {m0i | i ∈ / I} for some I ⊆ {1, . . . , n}, while the only pseudo-intents are the singleton sets and {m1 , . . . , mn , m01 , . . . , m0n }. Hence there are 2n proper premises for a, while there are only 2n + 2 pseudo-intents. Table 1. Context Kn with Exponentially Many Proper Premises m1 . . . mn m01 . . . m0n a g1 .. . I6= I6= gn Next-Closure behaves poorly on contexts with many intents while our algo- rithms behave poorly on contexts with many proper premises. In order to provide evidence that our algorithm should behave better in practice we use formulae for the expectation of the number of intents and proper premises in a formal context that is chosen uniformly at random among all n × m-contexts for fixed natural numbers n and m.4 Derivations of these formulae can be found in [7]. 
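The claim about Kn can be checked by brute force for small n, as in the sketch below; the encoding of Kn follows its definition above, the enumeration is exponential and intended only as a sanity check, and by Proposition 1 the minimal premises found are exactly the proper premises.

```python
from itertools import combinations

def build_kn(n):
    """K_n: two contranominal scales m_i / m'_i plus an attribute a with empty extent."""
    attrs = [f"m{i}" for i in range(1, n + 1)] + [f"m'{i}" for i in range(1, n + 1)] + ["a"]
    # g_i has every attribute except m_i, m'_i and a
    objects = {f"g{i}": set(attrs) - {f"m{i}", f"m'{i}", "a"} for i in range(1, n + 1)}
    return objects, attrs

def closure(B, objects, attrs):
    """B'' in the context; equals the full attribute set if B has empty extent."""
    ext = [g for g, intent in objects.items() if B <= intent]
    return set(attrs) if not ext else set.intersection(*(objects[g] for g in ext))

def proper_premises_of(m, objects, attrs):
    """Minimal premises of m, found by increasing size (Proposition 1)."""
    candidates = [x for x in attrs if x != m]
    minimal = []
    for k in range(len(candidates) + 1):
        for B in map(frozenset, combinations(candidates, k)):
            if m in closure(B, objects, attrs) and not any(p <= B for p in minimal):
                minimal.append(B)
    return minimal

objects, attrs = build_kn(3)
print(len(proper_premises_of("a", objects, attrs)))   # 8 == 2**3
```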
The expected value for the number of intents in an n × m-context is

E_{\mathrm{intent}} = \sum_{q=0}^{m} \sum_{r=0}^{n} \binom{m}{q} \binom{n}{r}\, 2^{-rq} \left(1 - 2^{-r}\right)^{m-q} \left(1 - 2^{-q}\right)^{n-r},

while the expected value for the number of proper premises for a fixed attribute a in an n × m-context is Xn m−1 q n X m 2 X Y pi+1 −pi −1 Epp = 2−n q! 2−q 1 − 2−q (1 + i) . r=0 r q=0 q q i=0 (p1 ,...,pq )∈N 1≤p1 <···