=Paper=
{{Paper
|id=None
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-959/proceedings-cla2011.pdf
|volume=Vol-959
}}
==None==
CLA 2011 Proceedings of the Eighth International Conference on Concept Lattices and Their Applications CLA Conference Series http://cla.inf.upol.cz INRIA Nancy – Grand Est and LORIA, France The Eighth International Conference on Concept Lattices and Their Applications CLA 2011 Nancy, France October 17–20, 2011 Edited by Amedeo Napoli Vilem Vychodil CLA 2011, October 17–20, 2011, Nancy, France. Copyright c 2011 by paper authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors. Technical Editors: Jan Outrata, jan.outrata@upol.cz Vilem Vychodil, vychodil@acm.org Page count: xii+419 Impression: 100 Edition: 1st First published: 2011 Printed version published by INRIA Nancy – Grand Est and LORIA, France ISBN 978–2–905267–78–8 Organization CLA 2011 was organized by the INRIA Nancy – Grand Est and LORIA Steering Committee Radim Belohlavek Palacky University, Olomouc, Czech Republic Sadok Ben Yahia Faculté des Sciences de Tunis, Tunisia Jean Diatta Université de la Réunion, France Peter Eklund University of Wollongong, Australia Sergei O. Kuznetsov State University HSE, Moscow, Russia Michel Liquière LIRMM, Montpellier, France Engelbert Mephu Nguifo LIMOS, Clermont-Ferrand, France Program Chairs Amedeo Napoli INRIA NGE/LORIA, Nancy, France Vilem Vychodil Palacky University, Olomouc, Czech Republic Program Committee Jaume Baixeries Polytechnical University of Catalonia Jose Balcazar University of Cantabria and UPC Barcelona, Spain Radim Belohlavek Palacky University, Olomouc, Czech Republic Karell Bertet University of La Rochelle, France François Brucker University of Marseille, France Claudio Carpineto Fondazione Ugo Bordoni, Roma, Italy Jean Diatta Université de la Réunion, France Felix Distel TU Dresden, Germany Florent Domenach University of Nicosia, Cyprus Mireille Ducassé IRISA Rennes, France Alain Gély University of Metz, France Cynthia Vera Glodeanu TU Dresden, Germany Marianne Huchard LIRMM, Montpellier, France Vassilis G. Kaburlasos TEI, Kavala, Greece Stanislav Krajci University of P.J. Safarik, Kosice, Slovakia Sergei O. 
Kuznetsov State University HSE, Moscow, Russia Léonard Kwuida Zurich University of Applied Sciences, Switzerland Mondher Maddouri URPAH, University of Gafsa, Tunisie Rokia Missaoui UQO, Gatineau, Canada Lhouari Nourine LIMOS, University of Clermont Ferrand, France Sergei Obiedkov State University HSE, Moscow, Russia Manuel Ojeda-Aciego University of Malaga, Spain Jan Outrata Palacky University, Olomouc, Czech Republic Pascal Poncelet LIRMM, Montpellier, France Uta Priss Napier University, Edinburgh, United Kingdom Olivier Raynaud LIMOS, University of Clermont Ferrand, France Camille Roth EHESS, Paris, France Stefan Schmidt TU Dresden, Germany Baris Sertkaya SAP Research Center, Dresden, Germany Henry Soldano Université of Paris 13, France Gerd Stumme University of Kassel, Germany Petko Valtchev Université du Québec à Montréal, Canada Additional Reviewers Mikhail Babin State University HSE, Moscow, Russia Daniel Borchmann TU Dresden, Germany Peggy Cellier IRISA Rennes, France Sebastien Ferre IRISA Rennes, France Nathalie Girard University of La Rochelle, France Alice Hermann IRISA Rennes, France Mehdi Kaytoue INRIA NGE/LORIA, Nancy, France Petr Krajca Palacky University, Olomouc, Czech Republic Christian Meschke TU Dresden, Germany Petr Osicka Palacky University, Olomouc, Czech Republic Violaine Prince LIRMM, Montpellier, France Chedy Raissy INRIA NGE/LORIA, Nancy, France Yoan Renaud LIRIS, Lyon, France Heiko Reppe TU Dresden, Germany Lucie Urbanova Palacky University, Olomouc, Czech Republic Jean Villerd ENSAIA, Nancy, France Organization Committee Mehdi Kaytoue (chair) INRIA NGE/LORIA, Nancy, France Elias Egho INRIA NGE/LORIA, Nancy, France Felipe Melo INRIA NGE/LORIA, Nancy, France Amedeo Napoli INRIA NGE/LORIA, Nancy, France Chedy Raı̈ssi INRIA NGE/LORIA, Nancy, France Jean Villerd ENSAIA, Nancy, France Table of Contents Preface Invited Contributions Mathematical Morphology, Lattices, and Formal Concept Analysis . . . . . . 1 Isabelle Bloch Random concept lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Richard Emilion Galois and his Connections—A retrospective on the 200th birthday of Evariste Galois . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Marcel Erné Canonical extensions, Duality theory, and Formal Concept Analysis . . . . . 7 Mai Gehrke Galois connections and residuation: origins and developments II . . . . . . . . . 9 Bruno Leclerc Galois connections and residuation: origins and developments I . . . . . . . . . 11 Bernard Monjardet Metrics, Betweeness Relations, and Entropies on Lattices and Applications 13 Dan Simovici Long Papers Vertical decomposition of a lattice using clique separators . . . . . . . . . . . . . . 15 Anne Berry, Romain Pogorelcnik and Alain Sigayret Building up Shared Knowledge with Logical Information Systems . . . . . . . 31 Mireille Ducasse, Sebastien Ferre and Peggy Cellier Comparing performance of algorithms for generating the Duquenne- Guigues basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Konstantin Bazhanov and Sergei Obiedkov Filtering Machine Translation Results with Automatically Constructed Concept Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Yılmaz Kılıçaslan and Edip Serdar Güner Concept lattices in fuzzy relation equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 
75 Juan Carlos Dı́az and Jesús Medina-Moreno Adaptation knowledge discovery for cooking using closed itemset extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer Fast Computation of Proper Premises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 Uwe Ryssel, Felix Distel and Daniel Borchmann Block relations in fuzzy setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Jan Konecny and Michal Krupka A closure algorithm using a recursive decomposition of the set of Moore co-families . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Pierre Colomb, Alexis Irlande, Olivier Raynaud and Yoan Renaud Iterative Software Design of Computer Games through FCA . . . . . . . . . . . . 143 David Llansó, Marco Antonio Gómez-Martı́n, Pedro Pablo Gomez-Martin and Pedro Antonio González-Calero Fuzzy-valued Triadic Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Cynthia Vera Glodeanu Mining bicluster of similar values with triadic concept analysis . . . . . . . . . . 175 Mehdi Kaytoue, Sergei Kuznetsov, Juraj Macko, Wagner Meira and Amedeo Napoli Fast Mining of Iceberg Lattices: A Modular Approach Using Generators . 191 Laszlo Szathmary, Petko Valtchev, Amedeo Napoli, Robert Godin, Alix Boc and Vladimir Makarenkov Boolean factors as a means of clustering of interestingness measures of association rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 Radim Belohlavek, Dhouha Grissa, Sylvie Guillaume, Engelbert Mephu Nguifo and Jan Outrata Combining Formal Concept Analysis and Translation to Assign Frames and Thematic Grids to French Verbs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 Ingrid Falk and Claire Gardent Generation algorithm of a concept lattice with limited access to objects . . 239 Christophe Demko and Karell Bertet Homogeneity and Stability in Conceptual Analysis . . . . . . . . . . . . . . . . . . . . 251 Paula Brito and Géraldine Polaillon A lattice-based query system for assessing the quality of hydro-ecosystems 265 Agnés Braud, Cristina Nica, Corinne Grac and Florence Le Ber The word problem in semiconcept algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 Philippe Balbiani Looking for analogical proportions in a formal concept analysis setting . . . 295 Laurent Miclet, Henri Prade and David Guennec Random extents and random closure systems . . . . . . . . . . . . . . . . . . . . . . . . . 309 Bernhard Ganter Extracting Decision Trees From Interval Pattern Concept Lattices . . . . . . 319 Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd A New Formal Context for Symmetric Dependencies . . . . . . . . . . . . . . . . . . 333 Jaume Baixeries Cheating to achieve Formal Concept Analysis over a large formal context 349 Vı́ctor Codocedo, Carla Taramasco and Hernán Astudillo A FCA-based analysis of sequential care trajectories . . . . . . . . . . . . . . . . . . . 363 Elias Egho, Nicolas Jay, Chedy Raissi and Amedeo Napoli Querying Relational Concept Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 Zeina Azmeh, Mohamed Hacéne-Rouane, Marianne Huchard, Amedeo Napoli and Petko Valtchev Links between modular decomposition of concept lattice and bimodular decomposition of a context . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393 Alain Gély Short Papers Abduction in Description Logics using Formal Concept Analysis and Mathematical Morphology: application to image interpretation . . . . . . . . . 405 Jamal Atif, Céline Hudelot and Isabelle Bloch A local discretization of continuous data for lattices: Technical aspects . . . 409 Nathalie Girard, Karell Bertet and Muriel Visani Formal Concept Analysis on Graphics Hardware . . . . . . . . . . . . . . . . . . . . . . 413 W. B. Langdon, Shin Yoo, and Mark Harman Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417 Preface The Eighth International Conference “Concept Lattices and Applications (CLA 2011)” is held in Nancy, France, from October 17th until October 20th, 2011. CLA 2011 is aimed at providing everyone interested in Formal Concept Analysis, and more generally in Concept Lattices or Galois Lattices (students, professors, researchers and engineers), with a global and advanced view of some of the latest research trends and applications in this field. As the diversity of the selected papers shows, there is a wide range of theoretical and practical research directions, around data and knowledge processing, e.g. data mining, knowledge discovery, knowledge representation, reasoning, pattern recognition, together with logic, algebra and lattice theory. This volume includes the selected papers and the abstracts of the 7 invited talks. This year there were initially 47 submissions, from which 27 papers were accepted as full papers and 3 papers as posters. We would like to thank here the authors for their work, often of very good quality, as well as the members of the program committee and the external reviewers, who did a great job, as can be seen in their reviews. This is one more witness of the growing quality and importance of CLA, highlighting its leading position in the field. Next, this year is a little bit special, as the bicentennial of the birth of Evariste Galois (1811–1832) is celebrated, particularly in France. Evariste Galois has something to do with Concept Lattices, as they are based on a so-called “Galois connection”, and some of the invited speakers will discuss these fundamental aspects of Concept Lattices. Moreover, this is also the occasion to thank the seven invited speakers who, we hope, will meet the wishes of the attendees. We would like to thank our sponsors, namely the CNRS GDR I3 and the Institut National Polytechnique de Lorraine (INPL). Then we would like to thank the steering committee of CLA for giving us the opportunity of leading this edition of CLA, the conference participants for their participation and support, and the people in charge of the organization, especially Anne-Lise Charbonnier, Nicolas Alcaraz and Mehdi Kaytoue, whose help was very precious on many occasions. Finally, we also do not forget that the conference was managed (quite easily) with the EasyChair system for paper submission, selection and reviewing, and that Jan Outrata has offered his files for preparing the proceedings. October 2011 Amedeo Napoli Vilem Vychodil Program Chairs of CLA 2011 Mathematical Morphology, Lattices, and Formal Concept Analysis Isabelle Bloch Telecom ParisTech, CNRS LTCI, Paris, France Abstract. Lattice theory has become a popular mathematical framework in different domains of information processing, and various communities employ its features and properties, e.g.
in knowledge representation, in logics, automated reasoning and decision making, in image processing, in information retrieval, in soft computing, in formal concept analysis. Mathematical morphology is based on adjunctions and on the algebraic framework of posets, and more specifically of complete lattices, which endows it with strong properties and allows for multiple applications and extensions. In this talk we will summarize the main concepts of mathematical morphology and show their instantiations in different settings, where a complete lattice can be built, such as sets, functions, partitions, fuzzy sets, bipolar fuzzy sets, formal logics . . . We will detail in particular the links between morphological operations and formal concept analysis, thus initiating links between two domains that were quite disconnected until now, which could therefore open new interesting perspectives. Random concept lattices Richard Emilion MAPMO, University of Orléans, France Abstract. After presenting an algorithm providing concepts and frequent concepts, we will study the random size of concept lattices in the case of a Bernoulli(p) context. Next, for random lines which are independent and identically distributed, or more generally outcomes of a Markov chain, we will show the almost everywhere convergence of the random closed intents towards deterministic intents. Finally we will consider the problem of predicting the number of concepts before choosing any algorithm. Galois and his Connections—A retrospective on the 200th birthday of Evariste Galois Marcel Erné University of Hannover, Germany Abstract. A frequently used tool in mathematics is what Oystein Ore called “Galois connections” (also “Galois connexions”, “Galois correspondences” or “dual adjunctions”). These are pairs (ϕ, ψ) of maps between ordered sets in opposite directions such that x ≤ ψ(y) is equivalent to y ≤ ϕ(x). This concept looks rather simple but proves very effective. The primary gain of such “dually adjoint situations” is that the ranges of the involved maps are dually isomorphic: thus, Galois connections present two sides of the same coin. Many concrete instances are given by what Garrett Birkhoff termed “polarities”: these are nothing but Galois connections between power sets. In slightly different terminology, the fundamental observation of modern Formal Concept Analysis is that every “formal context”, that is, any triple (J, M, I) where I is a relation between (the elements of) J and M, gives rise to a Galois connection (assigning to each subset of one side its “polar”, “extent” or “intent” on the other side), such that the resulting two closure systems of polars are dually isomorphic; more surprising is the fact that, conversely, every dual isomorphism between two closure systems arises in a unique fashion from a relation between the underlying sets. In other words: the complete Boolean algebra of all relations between J and M is isomorphic to that of all Galois connections between the power sets ℘(J) and ℘(M), and also to that of all dual isomorphisms between closure systems on J and M, respectively. The classical example is the Fundamental Theorem of Galois Theory, establishing a dual isomorphism between the complete lattice of all intermediate fields of a Galois extension and that of the corresponding automorphism groups, due to Richard Dedekind and Emil Artin.
In contrast to that correspondence, which does not occur explicitly in Galois’ succinct original articles, a few other closely related Galois connections may be discovered in his work (of course not under that name). Besides these historical forerunners, we discuss a few other highlights of mathematical theories where Galois connections enter in a convincing way through certain “orthogonality” relations, and show how the Galois approach considerably facilitates the proofs. For example, each of the following important structural isomorphisms arises from a rather simple relation on the respective ground sets: – the dual isomorphism between the subspace lattice of a finite-dimensional linear space and the left ideal lattice of its endomorphism ring – the duality between algebraic varieties and radical ideals – the categorical equivalence between ordered sets and Alexandroff spaces – the representation of complete Boolean algebras as systems of polars. Canonical extensions, Duality theory, and Formal Concept Analysis Mai Gehrke LIAFA CNRS – University of Paris 7, France Abstract. The theory of canonical extensions, developed by Jonsson and Tarski in the setting of Boolean algebras with operators, provides an algebraic approach to duality theory. Recent developments in this theory have revealed that in this algebraic guise duality theory is no more complicated outside than within the setting of Boolean algebras or distributive lattices. This has opened the possibility of exporting the highly developed machinery and knowledge available in the classical setting (e.g. in modal logic) to the completely general setting of partially ordered and non-distributive lattice ordered algebras. Duality theory in this setting is a special instance of the connection between formal contexts and concept lattices and thus allows methods of classical algebraic logic to be imported into FCA. This will be an introductory talk on the subject of canonical extensions with the purpose of outlining the relationship between the three topics of the title. Galois connections and residuation: origins and developments II Bruno Leclerc CAMS – École des Hautes Études en Sciences Sociales, Paris, France Abstract. From the seventies, the uses of Galois connections (and residuated/residual maps) multiplied in applied fields. Indeed Galois connections have been several times rediscovered for one or another purpose, for instance in fuzzy set theory or aggregation problems. In this talk, we illustrate the diversity of such applications. Of course, the many developments in Galois lattices and Formal Concept Analysis, with their rela- tion with Data Mining, will be only briefly evoked. Besides these developments, one finds, among other uses, alternative methods to study a correspondence (binary rela- tion) between two sets, models of classification and preferences, fitting and aggregation problems. Galois connections and residuation: origins and developments I Bernard Monjardet Centre d’Economie de la Sorbonne (University of Paris 1) and CAMS (Centre Analyse et Mathmatique Sociale), France Abstract. The equivalent notions of Galois connexions, and of residual and residuated maps occur in a great varieties of “pure” as well as “applied” mathematical theories. They explicitly appeared in the framework of lattice theory and the first of these talks is devoted to the history of their appearance and of the revealing of their links in this framework. 
So this talk covers more or less the period between 1940 (with the notion of polarity defined in the first edition of Birkhoff’s book Lattice theory) and 1972 (with Blyth and Janowitz’s book Residuation theory), a period containing fundamental works like Ore’s 1944 paper Galois connexions or Croisot’s 1956 paper Applications résiduées. Metrics, Betweenness Relations, and Entropies on Lattices and Applications Dan Simovici Department of Computer Science, University of Massachusetts at Boston, USA Abstract. We discuss an algebraic axiomatization of the notion of entropy in the framework of lattices, as well as characterizations of metric structures induced by such entropies. The proposed new framework takes advantage of the partial orders defined on lattices, in particular the semimodular lattice of partitions of a finite set, to allow multiple applications in data mining: data discretization, recommendation systems, classification, and feature selection. Vertical decomposition of a lattice using clique separators Anne Berry, Romain Pogorelcnik, Alain Sigayret LIMOS UMR CNRS 6158*, Ensemble Scientifique des Cézeaux, Université Blaise Pascal, F-63 173 Aubière, France. berry@isima.fr, romain.pogorelcnik@isima.fr, sigayret@isima.fr Abstract. A concept (or Galois) lattice is built on a binary relation; this relation can be represented by a bipartite graph. We explain how we can use the graph tool of clique minimal separator decomposition to decompose some bipartite graphs into subgraphs in linear time; each subgraph corresponds to a subrelation. We show that the lattices of these subrelations easily yield the lattice of the global relation. We also illustrate how this decomposition is a tool to help displaying the lattice. Keywords: lattice decomposition, clique separator decomposition, lattice drawing 1 Introduction In many algorithms dealing with hard problems, a divide-and-conquer approach is helpful in practical applications. Computing the set of concepts associated with a given context (or the set of maximal rectangles associated with a binary relation) is time-consuming, as there may be an exponential number of concepts. It would be interesting to decompose the lattice into smaller sublattices. What we propose here is to decompose the relation into smaller subrelations, compute the lattice of each subrelation, and then use these lattices to reconstruct the lattice of the global relation. For this, we use a graph decomposition, called “clique separator decomposition”, introduced by Tarjan [9] and refined afterwards (see [3] for an extensive introduction to this decomposition). The general principle is roughly the following: repeatedly find a set of vertices which are pairwise adjacent (called a clique) and whose removal disconnects the graph (called a separator), then copy this clique separator into the different connected components obtained. When the decomposition is completed, a set of subgraphs is obtained, inconveniently called ‘atoms’: each subgraph is a maximal subgraph containing no clique separator. * Research partially supported by the French Agency for Research under the DEFIS program TODO, ANR-09-EMER-010. © 2011 by the paper authors. CLA 2011, pp. 15–29. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France.
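To make the decomposition principle just described concrete, here is a minimal generic sketch in Python. It assumes the graph is given as plain adjacency sets and that a clique separator has already been found; it is not the authors' implementation and ignores the linear-time machinery discussed later in the paper.

```python
# One clique-separator decomposition step, as described above: remove the
# separator S, compute the connected components, and copy S back into each.
def components(adj, removed):
    """Connected components of the graph induced on the vertices outside `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(w for w in adj[v] if w not in seen)
        seen |= comp
        comps.append(comp)
    return comps

def decomposition_step(adj, separator):
    """Replace the graph by one vertex set per component, each keeping a copy of the separator."""
    return [comp | set(separator) for comp in components(adj, set(separator))]
```

On a graph such as the one of Figure 3 below, for instance, calling decomposition_step with separator {1} should return the two vertex sets of Example 1.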
In a previous work [2], we used graph algorithms on the complement of the bipartite graph associated with the relation. In this paper, we will apply this decomposition directly to the bipartite graph itself. It turns out upon investigation that the subgraphs obtained divide not only the graph, but in a very similar fashion divide the matrix of the relation, the set of concepts and the lattice. When the relation has a clique separator of size two, the lattice, as we will explain further on, is divided along a vertical axis by an atom and a co-atom which correspond to the two vertices of the separator. Thus not only can the concepts be computed on the subrelations, but the Hasse diagram of the lattice can be drawn better, as no edge need cross this vertical line. Moreover, in a bipartite graph, this decomposition can be implemented with a better worst-case time complexity than in the general case, as the clique separators can be of size one (in this case they are called articulation points) or of size two. In both cases, the entire decomposition can be computed in linear time, i.e. in the size of the relation, thanks to the works of [6] and [3]. Although some graphs do not have a clique separator, when there is one, the decomposition is thus a useful and non-expensive pre-processing step. The paper is organized as follows: we will first give some more preliminaries in Section 2. In Section 3, we explain how a bipartite graph is decomposed. In Section 4, we show how to reconstruct the global lattice from the concepts obtained on the subrelations. In Section 5, we discuss using vertical decomposition as a tool for layout help. Finally, in Section 6, we conclude with some general remarks. 2 Preliminaries We will first recall essential definitions and properties. All the graphs will be undirected and finite. For a graph G = (V, E), V is the vertex set and E is the edge set. For xy ∈ E, x ≠ y, x and y are said to be adjacent; we say that x sees y. A graph is connected if, for every pair {x, y} of vertices, there is a path from x to y. When a graph is not connected, the maximal connected subgraphs are called the connected components. For C ⊂ V, G(C) denotes the subgraph induced by C. In a graph G = (V, E), the neighborhood of a vertex x ∈ V is the set NG(x) = {y ∈ V, y ≠ x | xy ∈ E}. NG(x) is denoted N(x) when there is no ambiguity. A clique is a set of vertices which induces a complete graph, i.e. with all possible edges. A bipartite graph G = (X + Y, E), where + stands for disjoint union, is built on two vertex sets, X and Y, with no edge between vertices of X and no edge between vertices of Y. A maximal biclique of a bipartite graph G = (X + Y, E) is a subgraph G(X′ + Y′) with all possible edges between the vertices of X′ and the vertices of Y′. A relation R ⊆ O × A on a set O of objects and a set A of attributes is associated with a bipartite graph G = (O + A, E), which we will denote Bip(R); thus, for x ∈ O and y ∈ A, (x, y) is in R iff xy is an edge of G. The maximal rectangles of the relation correspond exactly to the maximal bicliques (maximal complete bipartite subgraphs) of Bip(R) and to the elements (the concepts) of the concept lattice L(R) associated with context (O, A, R). If O1 × A1 and O2 × A2 are concepts of L(R), then O1 × A1 ≤ O2 × A2 iff O1 ⊆ O2 iff A1 ⊇ A2; the corresponding bicliques on vertex sets O1 + A1 and O2 + A2 of Bip(R) are comparable the same way.
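The correspondence just recalled (concepts of L(R) = maximal rectangles of R = maximal bicliques of Bip(R)) can be illustrated with a brute-force Python sketch that closes every subset of objects with the usual derivation operators. The relation below is only an assumed stand-in for the paper's Figure 1, which is given graphically; it was reconstructed here from the concepts listed in Example 9, so treat it as illustrative.

```python
from itertools import combinations

# Assumed stand-in for the relation of Figure 1 (objects 1..6, attributes a..i).
R = {1: {"a", "c", "d", "e"}, 2: {"b", "c", "f", "h", "i"}, 3: {"a", "b", "f"},
     4: {"d", "e"}, 5: {"g", "h"}, 6: {"b", "g"}}
objects = set(R)
attributes = set.union(*R.values())

def derive_attrs(objs):      # O' -> A': attributes shared by all objects of objs
    return set.intersection(*(R[o] for o in objs)) if objs else set(attributes)

def derive_objs(attrs):      # A' -> O': objects having all attributes of attrs
    return {o for o in objects if attrs <= R[o]}

def concepts():
    """All concepts (extent, intent) of L(R), i.e. all maximal bicliques of Bip(R)."""
    found = set()
    for k in range(len(objects) + 1):
        for objs in combinations(sorted(objects), k):
            intent = derive_attrs(set(objs))
            extent = derive_objs(intent)           # closure: a maximal rectangle
            found.add((frozenset(extent), frozenset(intent)))
    return found

for extent, intent in sorted(concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), "x", sorted(intent))
```

With this stand-in, the printed pairs include 14 × de, 23 × bf and 236 × b, consistent with the concept table of Example 9.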
An atom (resp. co-atom) of L(R) is a concept covering the minimum element (resp. covered by the maximum element). In the bipartite graph Bip(R), the neighborhoods are defined as follows: for x ∈ O, N(x) = R(x) = {y ∈ A | (x, y) ∈ R}, and for x ∈ A, N(x) = R−1(x) = {y ∈ O | (y, x) ∈ R}. A separator in a connected graph is a set of vertices, the removal of which disconnects the graph. A clique separator is a separator which is a clique. Clique separator decomposition [9] of a graph G = (V, E) is a process which repeatedly finds a clique separator S and copies it into the connected components of G(V − S). When only minimal separators are used (see [3] for extensive general definitions), the decomposition is unique and the subgraphs obtained in the end are exactly the maximal subgraphs containing no clique separator [8], [3]. In a bipartite graph, the clique minimal separators are of size one or two. A separator of size one is called an articulation point. A clique separator S = {x, y} of size two is minimal if there are two components C1 and C2 of G(V − S) such that x and y both have at least one neighbor in C1 as well as in C2. 3 Decomposing the bipartite graph and the relation In the rest of the paper, we will use the bipartite graph Bip(R) defined by a relation R ⊆ O × A. Figure 1 shows an example of a relation with the corresponding bipartite graph. Fig. 1. A relation and the corresponding bipartite graph In this section, we will first discuss connectivity issues, then illustrate and give our process to decompose the bipartite graph. 3.1 Decomposing the bipartite graph into connected components When the bipartite graph Bip(R) is not connected, our process can be applied separately (or in parallel) to each connected component. The lattice obtained is characteristic: when the top and bottom elements are removed from the Hasse diagram, the resulting diagram is a set of disjoint lattices, with a one-to-one correspondence between the connected components of Bip(R) and the lattices obtained. Figure 2 shows such a disconnected bipartite graph, its relation, and the corresponding lattice. Note that trivially, if a connected component has only one vertex, this means that the corresponding row or column of the relation is empty: such a component corresponds to a lattice with only one element. In the rest of the paper, we will consider only relations whose bipartite graph is connected. Fig. 2. A disconnected bipartite graph, its relation, and the corresponding characteristic lattice. 3.2 Illustrating the two decomposition steps In order to make the process we use as clear as possible, we will first describe what happens when one decomposition step is applied for each of the two decompositions involved (using clique separators of size one or of size two). It is important to understand, however, that to ensure a good (linear) time complexity, each of the two decompositions is computed globally in a single linear-time pass. Step with an articulation point The removal of an articulation point {x} in a connected bipartite graph G results in components C1, ..., Ck, which correspond to a partition V = C1 + ... + Ck + {x} of Bip(R). After a decomposition step using {x}, x is preserved, with its local neighborhood, in each component, so that G is replaced by k subgraphs G(C1 ∪ {x}), ..., G(Ck ∪ {x}). Example 1.
In the input graph of Figure 3, vertex 1 defines an articulation point that induces two connected components {2, 3, a, b, c} and {4, d, e}. The decom- position step results into subgraphs G({1, 2, 3, a, b, c}) and G({1, 4, d, e}). Fig. 3. Decomposition by articulation point {1}. Step with a separator of size two When the removal of a clique minimal separator {x, y} in a connected bi- partite graph G results into components C1 ,..., Ck , corresponding to a partition V = C1 +...+Ck +{x, y}. The decomposition step replaces G with G(C1 ∪{x, y}), ..., G(Ck ∪ {x, y}). Example 2. In the input graph of Figure 4, {2, b} is a clique minimal separa- tor of size two that induces two connected components {1, 2, 3, a, b, c, f } and {2, 5, 6, b, g, h}. 20 Anne Berry, Romain Pogorelcnik and Alain Sigayret Fig. 4. Decomposition by clique separator {2,b} 3.3 Ordering the steps A clique minimal separator of size two may include an articulation point. Thus it is important to complete the decomposition by the articulation points first, and then go on to decompose the obtained subgraphs using their clique separators of size two. Example 3. In the input graph of Figure 5, {2} is an articulation point included in clique minimal separator {2, b}. The decomposition will begin with {2}, in- ducing components {2, i} and {1, 2, 3, 5, 6, a, b, c, f, g, h}. As there remains no articulation point in these resulting components, the second component will be then decomposed by {2, b} into {1, 2, 3, a, b, c, f } and {2, 5, 6, b, g, h}. Fig. 5. Articulation point {2} is processed before clique separator {2,b} After the bipartite graph decomposition is completed, we will obtain sub- graphs with no remaining clique minimal separator, and the corresponding sub- relations with their associated lattices. Example 4. Figure 6 shows that the input graph of Figure 1 is decomposable into four bipartite subgraphs: G1 = G({1, 2, i}), G2 = G({2, 5, 6, b, g, h}), G3 = G({1, 2, 3, a, b, c, f }) and G4 = G({1, 4, d, e}). Note that in the general case all subgraphs obtained have at least two vertices, since at least one vertex of a separator is copied into a component which has at least one vertex. Vertical decomposition of a lattice using clique separators 21 Fig. 6. Complete decomposition of a bipartite graph 3.4 The global decomposition process To obtain the entire decomposition of a connected bipartite graph, we will thus first decompose the graph using all articulation points, and then decompose each of the subgraphs obtained using all its clique separators of size 2. The articulation points (clique minimal separators of size one) can be found by a simple depth-first search [6], as well as the corresponding decomposition of the graph (called decomposition into biconnected components). The search for clique separators of size two corresponds to a more complicated algorithm, described in [5]: all separators of size 2 are obtained, whether they are cliques or not. Once this list of separators is obtained, it is easy to check which are joined by an edge. The desired decomposition can then be obtained easily. In both cases, the set of clique separators is output. Both algorithms run in linear time, so the global complexity is in O(|R|) to obtain both the set of subgraphs and the set of clique separators of the original graph. 
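The first, linear-time half of this global process (finding the articulation points and splitting the graph accordingly) can be sketched with a standard graph library; the snippet below is not the authors' code, it reuses the assumed stand-in relation for Figure 1 introduced earlier, and it does not handle the size-two clique separators, which require the triconnected-components algorithm of [5].

```python
import networkx as nx

# Assumed stand-in for Figure 1: objects 1..6 with their attribute strings.
R = {1: "acde", 2: "bcfhi", 3: "abf", 4: "de", 5: "gh", 6: "bg"}
G = nx.Graph([(o, a) for o, attrs in R.items() for a in attrs])   # Bip(R)

# Clique minimal separators of size one (articulation points), found by the
# depth-first-search approach that networkx implements.
print("articulation points:", sorted(nx.articulation_points(G), key=str))

# Decomposition into biconnected components: each component keeps a copy of the
# articulation points it touches, as in the steps of Section 3.2.
for comp in nx.biconnected_components(G):
    print("subgraph on", sorted(map(str, comp)))
```

With this stand-in, the articulation points come out as 1 and 2, in line with Examples 1 and 3.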
3.5 Sub-patterns defined in the matrix When the clique separators involved do not overlap and each defines exactly two connected components, this decomposition of the bipartite graph partitions the graph and the underlying relation. This results in a significant pattern of their binary matrix. As the components obtained are pairwise disconnected, the matrix can be reorganized in such a way that zeroes are gathered into blocks. Two components may appear as consecutive blocks, linked by a row corresponding to the articulation point that has been used to split them, or linked by one cell giving the edge between the two vertices of a size two clique minimal separator. In the general case, this pattern can occur in only some parts of the matrix, and different patterns can be defined according to the different separators which the user chooses to represent. Example 5. The first of the two matrices below corresponds to our running example from Figure 1 and has been reorganized, following the decomposition, which results in the second matrix. Notice how {1} is an articulation point, so row 1 is shared by blocks 231×bacf and 14×de; and how {2, b} is a clique separator of size two, so cell [2, b] is the intersection of blocks 562 × ghb and 231 × bacf. [2, i] is not integrated into the pattern, because separator {2, b} of Bip(R) defines 3 connected components: {2, 5, 6, b, g, h}, {i} and {1, 3, 4, a, c, d, e, f}. We will now describe a process to organize the rows and columns of the matrix with such patterns. We will construct a meta-graph (introduced in [7] as the ‘atom graph’), whose vertices represent the subgraphs obtained by our decomposition, and where there is an edge between two such vertices if the two subgraphs which are the endpoints have a non-empty intersection which is a clique minimal separator separating the corresponding two subgraphs in the original bipartite graph. In this meta-graph, choose a chordless path; the succession of subgraphs along this path will yield a succession of rectangles in the matrix which correspond to a pattern. Example 6. Figure 7 gives the meta-graph for our running example from Figure 1. Chordless path ({2, 5, 6, b, g, h}, {1, 2, 3, a, b, c, f}, {1, 4, d, e}) was chosen for the patterning. Another possible chordless path would be ({2, i}, {1, 2, 3, a, b, c, f}, {1, 4, d, e}). Finding a chordless path in a graph can be done in linear time; the meta-graph has less than min(|A|, |O|) elements, so finding such a path costs less than (min(|A|, |O|))². Fig. 7. Meta-graph for graph from Figure 1 3.6 Decomposing the lattice We will now examine how the set of concepts is modified and partitioned into the subgraphs obtained. As clique minimal separators are copied in all the components induced, most of the concepts will be preserved by the decomposition. Furthermore, only initial concepts including a vertex of a clique minimal separator may be affected by the decomposition. Definition 1. We will say that a maximal biclique is a star maximal biclique if it contains either exactly one object or exactly one attribute. This single object or attribute will be called the center of the star. Lemma 1. A star maximal biclique {x} ∪ N(x) of Bip(R) is an atomic concept of L(R) (atom or co-atom), unless x is universal in Bip(R). More precisely, {x} × N(x) is an atom if x ∈ O and N(x) ≠ A, or N(x) × {x} is a co-atom if x ∈ A and N(x) ≠ O. Proof.
Let {x} ∪ N(x) be a star maximal biclique of Bip(R). As a maximal biclique, it corresponds to a concept of L(R). Suppose the star has x ∈ O as center. As a star, it contains no other element of O; as a biclique, it includes all of N(x) ⊆ A, and no other element of A by maximality. The corresponding concept is {x} × N(x), which is obviously the first concept from bottom to top including x. As the biclique is maximal, and as x is not universal, this concept cannot be the bottom of L(R) but only an atom. A similar proof holds for x ∈ A and co-atomicity. We will now give the property which describes how the maximal bicliques are dispatched or modified by the decomposition. In the next section, we will give a general theorem and its proof, from which these properties can be deduced. Property 1. Let G = (X + Y, E) be a bipartite graph, let S be a clique minimal separator of G which decomposes G into subgraphs G1, ..., Gk. Then: 1. ∀x ∈ S, {x} ∪ NG(x) is a star maximal biclique of G. 2. ∀x ∈ S, {x} ∪ NG(x) is not a maximal biclique of any Gi. 3. ∀x ∈ S, {x} ∪ NGi(x) is a biclique of Gi, but it is maximal in Gi iff it is not strictly contained in any other biclique of Gi. 4. All the maximal bicliques of G which are not star bicliques with any x ∈ S as a center are partitioned into the corresponding subgraphs. With the help of Lemma 1, this property may be translated in terms of lattices. Given a relation R, its associated graph G, its lattice L(R), and a decomposition step of G into some subgraphs Gi by articulation point {x}: If x ∈ O (resp. x ∈ A) is an articulation point of G, {x} × NG(x) (resp. NG(x) × {x}) is a concept of L(R). After the decomposition step, in each subgraph Gi of G, either this concept becomes {x} × NGi(x), or this concept disappears from Gi; this latter case occurs when there is in Gi some x′ ∈ O, the introducer of which appears after the introducer of x in L(R), from bottom to top (resp. from top to bottom if x, x′ ∈ A). Every other concept will appear unchanged in exactly one lattice associated with a subgraph Gi. The same holds for each vertex of a size two clique minimal separator. Example 7. Figure 8 illustrates a decomposition step with articulation point {1} using the graph from Figure 3. Concept {1, 4} × {d, e} disappears from the first component {1, 2, 3, a, b, c}, but remains in the second component {1, 4, d, e}. Fig. 8. Example of lattice decomposition using articulation point {1}. Example 8. Figure 9 illustrates a decomposition step with clique separator {2, b} using the graph from Figure 4. Concept {2} × N(2) is duplicated into components {2, 5, 6, b, g, h} and {1, 2, 3, a, b, c, f}; concept N(b) × {b} will appear as {2, 6} × {b} in the first component, but not in the second one, as {2, 3, b} is a biclique included in maximal biclique {2, 3, b, f} of G. Remark 1. The smaller lattices obtained cannot be called sublattices of the initial lattice, as some of their elements may not be the same: for example, in Figure 9, {2} × {b, c, f} is an element of the third smaller lattice L(G3) but is not an element of the initial lattice L(G). 4 Reconstructing the lattice We will now explain how, given the subgraphs obtained by clique decomposition, as well as the corresponding subrelations and subsets of concepts, we can reconstruct the set of concepts of the global input bipartite graph. We will then go on to explain how to obtain the Hasse diagram of the reconstructed lattice.
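Before turning to the reconstruction in Section 4.1, here is a small sketch of the star maximal bicliques {x} ∪ N(x) on which Lemma 1 and the reconstruction rely: a vertex's star is maximal, hence an atom or co-atom, exactly when no other vertex on the same side dominates its neighborhood. The relation is again the assumed stand-in for Figure 1, not the authors' code.

```python
# Assumed stand-in for the relation of Figure 1.
R = {1: {"a", "c", "d", "e"}, 2: {"b", "c", "f", "h", "i"}, 3: {"a", "b", "f"},
     4: {"d", "e"}, 5: {"g", "h"}, 6: {"b", "g"}}
attrs_of = R
objs_of = {}                                   # inverse relation: N(a) for each attribute a
for o, attrs in R.items():
    for a in attrs:
        objs_of.setdefault(a, set()).add(o)

def object_star_is_maximal(o):
    # {o} ∪ N(o) is a maximal biclique iff no other object sees every attribute o sees
    return not any(attrs_of[o] <= attrs_of[p] for p in attrs_of if p != o)

def attribute_star_is_maximal(a):
    return not any(objs_of[a] <= objs_of[b] for b in objs_of if b != a)

atoms = [({o}, sorted(attrs_of[o])) for o in attrs_of if object_star_is_maximal(o)]
coatoms = [(sorted(objs_of[a]), {a}) for a in objs_of if attribute_star_is_maximal(a)]
print("atoms:", atoms)       # includes the star of object 1 (1 x acde), cf. Example 9
print("co-atoms:", coatoms)  # includes the star of attribute b (236 x b)
```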
Fig. 9. Example of lattice decomposition using clique separator {2, b}. 4.1 Reconstructing the set of concepts We will use the following theorem, which describes the concepts of the global lattice. Theorem 1. Let G = (X + Y, E) be a bipartite graph, let Σ = {s1, ..., sh} be the set of all the vertices which belong to a clique separator of G, let G1, ..., Gk be the set of subgraphs obtained by the complete corresponding clique separator decomposition. Then: 1. For every s ∈ Σ, {s} ∪ NG(s) is a star maximal biclique of G. 2. Any maximal biclique of a subgraph Gi which is not a star with a vertex of Σ as center is also a maximal biclique of G. 3. There are no other maximal bicliques in G: ∀s ∈ Σ, no other star maximal biclique of Gi with center s is a star maximal biclique of G, and these are the only maximal bicliques of some graph Gi which are not also maximal bicliques in G. Proof. 1. For every s ∈ Σ, {s} ∪ NG(s) is a star maximal biclique of G: Case 1: s is an articulation point; let Gi, Gj be two graphs which s belongs to; s must be adjacent to some vertex yi in Gi and to some vertex yj in Gj. Suppose {s} ∪ NG(s) is not a maximal biclique: there must be a vertex z in G which sees yi and yj, but then {s} cannot separate yi from yj, a contradiction. Case 2: s is not an articulation point; let s′ be a vertex of S such that {s, s′} is a clique separator of G, separating Gi from Gj. s must as above see some vertex yi in Gi and some vertex yj in Gj. Suppose {s} ∪ NG(s) is not maximal: there must be some vertex w in G which sees all of NG(s), but w must see yi and yj, so {s, s′} cannot separate Gi from Gj. 2. Let B be a non-star maximal biclique of Gi, containing o1, o2 ∈ O and a1, a2 ∈ A. Suppose B is not maximal in G: there must be a vertex y in G − B which augments B. Let y be in Gj, wlog y ∈ A: y must see o1 and o2. Since Gi is a maximal subgraph with no clique separator, Gi + {y} must have a clique separator. Therefore N(y) must be a clique separator of this subgraph, but this is impossible, since y sees two non-adjacent vertices of Gi. 3. Any star maximal biclique B of Gi whose center v is not in Σ is also a star maximal biclique of G: suppose we can augment B in G. Case 1: v sees an extra vertex w; Gi + {w} contains as above a clique separator, which is impossible since N(w) = {v} and v ∉ S. Case 2: A vertex z of Gj is adjacent to all of N(v): again, Gi + {z} contains a clique separator, so N(z) is a clique separator, but that is impossible since N(z) contains at least two non-adjacent vertices. 4. For s ∈ Σ, no star maximal biclique of Gi is a star maximal biclique of G: let B be a star maximal biclique of Gi, with s ∈ Σ as center. s ∈ Σ, so s belongs to some clique separator which separates Gi from some graph Gj. s must see a vertex yj in Gj, so B + {yj} is a larger star including B: B cannot be maximal in G. Example 9. We illustrate Theorem 1 using graph G from Figure 6, whose decomposition yields subgraphs G1, ..., G4, with G1 = G({1, 2, i}), G2 = G({2, 5, 6, b, g, h}), G3 = G({1, 2, 3, a, b, c, f}) and G4 = G({1, 4, d, e}). Finally, Σ = {1, 2, b}. The corresponding lattices are shown in Figure 10, and their concepts are presented in the table below.
In this table, braces have been omitted; the symbol ⇒ represents a concept of the considered subgraph Gi which is identical to a concept of G (there can be only one ⇒ per row); the other concepts of the subgraphs will not be preserved in G while recomposing.

L(G)      | L(G1) | L(G2)  | L(G3)   | L(G4) | star max. biclique of G?
1 × acde  |       |        | 1 × ac  |       | yes
2 × bcfhi | 2 × i | 2 × bh | 2 × bcf |       | yes
3 × abf   |       |        | ⇒       |       |
14 × de   |       |        |         | ⇒     |
5 × gh    |       | ⇒      |         |       |
6 × bg    |       | ⇒      |         |       |
13 × a    |       |        | ⇒       |       |
236 × b   |       | 26 × b |         |       | yes
12 × c    |       |        | ⇒       |       |
23 × bf   |       |        | ⇒       |       |
56 × g    |       | ⇒      |         |       |
25 × h    |       | ⇒      |         |       |

Fig. 10. Reconstruction of a lattice According to Theorem 1, the steps to reconstruct the maximal concepts of the global lattice from the concepts of the smaller lattices are: 1. Compute Σ, the set of attributes and objects involved in a clique minimal separator. (In our example, Σ = {1, 2, b}.) 2. Compute the maximal star bicliques for all the elements of Σ. (In our example, we will compute star maximal bicliques 1 × acde, 2 × bcfhi and 26 × b.) 3. For each smaller lattice, remove from the set of concepts the atoms or co-atoms corresponding to elements of Σ; maintain all the other concepts as concepts of the global lattice. (In our example, for L(G3), we will remove 1 × ac and 2 × bcf, and maintain 3 × abf, 13 × a, 12 × c and 23 × bf as concepts of L(G).) Step 1 requires O(|R|) time. Step 2 can be done while computing the smaller lattices; Step 3 costs constant time per concept. Thus the overall complexity of the reconstruction is in O(|R|) time. 4.2 Reconstructing the edges of the Hasse diagram According to Theorem 1, the maximal bicliques which are not star maximal bicliques with a vertex of Σ as center are preserved; therefore, the corresponding edges between the elements of the lattice are also preserved. In the process described below, we will refer to labels in the lattice as being the ‘reduced’ labels, such as the ones used in our lattice figures throughout this paper. To define the edges of the Hasse diagram of lattice L(G), we will, for each smaller lattice L(Gi): – find each atom (or co-atom) which corresponds to an element σ of Σ (such as 2 or b for L(G3) in our example). – If σ shares its label with some non-elements of Σ, remove all elements of Σ from the label. (In our example for L(G3), bf becomes f.) If σ does not share its label with some non-elements of Σ, remove the atom or co-atom. (In our example for L(G3), remove element 2.) – Maintain the remaining edges as edges of L(G). – Compute the neighborhood in L(G) of each atom or co-atom which corresponds to an element of Σ. All this can be done in polynomial time: there are at most |A| + |O| vertices in Σ, and the corresponding edges can be added in O((|A| + |O|)² |R|). 5 Vertical decomposition as a layout tool When there is a size two clique separator in the bipartite graph which divides the graph into two components, the concepts which are not involved in the separator can be displayed on the two sides of the separator, thus helping to minimize the number of line crossings in the Hasse diagram. Fig. 11. (a) Lattice constructed by Concept Explorer using the minimal intersection layout option (8 crossings). (b) Lattice re-drawn using the information on clique separators (5 crossings). To illustrate this, we have used our running example with ‘Concept Explorer’ [1], which is a very nice and user-friendly tool for handling lattices. Notice however how clique separator {1, d} is better displayed when put at the right extremity.
Figure 11 shows the lattice as proposed by Concept Explorer, and then re-drawn with insight on the clique separators of the bipartite graph. The same technique of course also applies when there is a succession of such clique separators. Let us add that if, moreover, both lattices are planar, as discussed in [4], merging the two lattices obtained, using the clique separator as central, will preserve planarity. 6 Conclusion and perspectives We have used a graph method, clique minimal separator decomposition, to provide simple tools which can help reduce the time spent computing the elements of a lattice, as well as improve the drawing of its Hasse diagram. When there is no clique separator in the bipartite graph, it could be interesting to investigate restricting the relation to a subgraph or partial subgraph which does have one. We leave open the question of characterizing, without computing the relation, the lattices whose underlying bipartite graph has a clique minimal separator. Acknowledgments The authors sincerely thank all the referees for their useful suggestions and questions. References 1. Concept Explorer. Downloadable at http://sourceforge.net/projects/conexp/, version 1.3 (Java), 20/12/2009. 2. Berry A., Sigayret A.: Representing a concept lattice by a graph. Discrete Applied Mathematics, 144(1-2):27–42, 2004. 3. Berry A., Pogorelcnik R., Simonet G.: An introduction to clique minimal separator decomposition. Algorithms, 3(2):197–215, 2010. 4. Eschen E.M., Pinet N., Sigayret A.: Consecutive-ones: handling lattice planarity efficiently. CLA’07, Montpellier (Fr), 2007. 5. Hopcroft J. E., Tarjan R. E.: Dividing a graph into triconnected components. SIAM J. Comput., 2(3):135–158, 1973. 6. Hopcroft J. E., Tarjan R. E.: Efficient algorithms for graph manipulation [H] (Algorithm 447). Commun. ACM, 16(6):372–378, 1973. 7. Kaba B., Pinet N., Lelandais G., Sigayret A., Berry A.: Clustering gene expression data using graph separators. In Silico Biology, 7(4-5):433–52, 2007. 8. Leimer H.-G.: Optimal decomposition by clique separators. Discrete Mathematics, 113(1-3):99–123, 1993. 9. Tarjan R. E.: Decomposition by clique separators. Discrete Mathematics, 55(2):221–232, 1985. Building up Shared Knowledge with Logical Information Systems Mireille Ducassé1, Sébastien Ferré2, and Peggy Cellier1 1 IRISA-INSA de Rennes, France, {ducasse, cellier}@irisa.fr 2 IRISA-University of Rennes 1, France, ferre@irisa.fr Abstract. Logical Information Systems (LIS) are based on Logical Concept Analysis, an extension of Formal Concept Analysis. This paper describes an application of LIS to support group decision. A case study gathered a research team. The objective was to decide on a set of potential conferences to which to send submissions. People individually used Abilis, a LIS web server, to preselect a set of conferences. Starting from 1041 calls for papers, the individual participants preselected 63 conferences. They met and collectively used Abilis to select a shared set of 42 target conferences. The team could then sketch a publication plan. The case study provides evidence that LIS cover at least three of the collaboration patterns identified by Kolfschoten, de Vreede and Briggs.
Abilis helped the team to build a more complete and relevant set of information (Generate/Gathering pattern); to build a shared understanding of the relevant information (Clarify/Building Shared Understanding); and to quickly reduce the number of target conferences (Reduce/Filtering pattern). 1 Introduction Group work represents a large amount of time in professional life, while many people feel that much of that time is wasted. Lewis [13] argues that this amount of time is even going to increase, because problems are becoming more complex and are meant to be solved in a distributed way. Each involved person has a local and partial view of the problem; no one embraces the whole required knowledge. Lewis also emphasizes that it is common that “groups fail to adequately define a problem before rushing to judgment”. Building up shared knowledge in order to gather relevant distributed knowledge of a problem is therefore a crucial issue. Logical Information Systems (LIS) are based on Logical Concept Analysis (LCA), an extension of Formal Concept Analysis (FCA). In a previous work [5], Camelis, a single-user logical information system, has been shown useful to support serene and fair meetings. This paper shows how Abilis, a LIS web server that implements OnLine Analytical Processing (OLAP [3]) features, can be applied to help build shared knowledge among a group of skilled users. The presented case study gathered a research team to decide on a publication strategy. Starting from 1041 calls for papers, each team member on his own preselected a set of conferences matching his own focus of interest. © 2011 by the paper authors. CLA 2011, pp. 31–42. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. The union of individual preselections still contained 63 conferences. Then, participants met for an hour and a half and collectively built a shared set of 42 target conferences. For each conference, the team shared a deep understanding of why it was relevant. The team could sketch a publication plan in a non-conflictual way. Kolfschoten, de Vreede and Briggs have classified collaboration tasks into 16 collaboration patterns [12]. The contribution of this paper is to give evidence that LIS can significantly support three of these patterns, which are important aspects of decision making, namely Generate/Gathering, Clarify/Building Shared Understanding and Reduce/Filtering. Firstly, the navigation and filtering capabilities of LIS were helpful to detect inconsistencies and missing knowledge. The updating capabilities of LIS enabled participants to add objects, features and links between them on the fly. As a result the group had a more complete and relevant set of information (Generate/Gathering pattern). Secondly, the compact views provided by LIS and the OLAP features helped participants embrace the whole required knowledge. The group could therefore build a shared understanding of the relevant information which was previously distributed amongst the participants (Clarify/Building Shared Understanding pattern). Thirdly, the navigation and filtering capabilities of LIS were relevant to quickly converge on a reduced number of target conferences (Reduce/Filtering pattern). In the following, Section 2 briefly introduces logical information systems. Section 3 describes the case study.
Section 4 gives detailed arguments to support the claim that logical information systems help build up shared knowledge. Section 5 discusses related work. 2 Logical Information Systems Logical Information Systems (LIS) [7] belong to a paradigm of information retrieval that combines querying and navigation. They are formally based on a logical generalization of Formal Concept Analysis (FCA) [8], namely Logical Concept Analysis (LCA) [6]. In LCA, logical formulas are used instead of sets of attributes to describe objects. LCA and LIS are generic in that the logic is not fixed, but is a parameter of those formalisms. Logical formulas are also used to represent queries and navigation links. The concept lattice serves as the navigation structure: every query leads to a concept, and every navigation link leads from one concept to another. The query can be modified in three ways: by formula edition, by navigation (selecting features in the index in order to modify the query) or by examples. Annotations can be performed in the same interface. Camelis3 has been developed since 2002; a web interface, Abilis4, has recently been added. It incorporates display paradigms based on On-Line Analytical Processing (OLAP). Instead of being presented as a list of objects, an extent can be partitioned as an OLAP cube, namely a multi-dimensional array [1]. 3 see http://www.irisa.fr/LIS/ferre/camelis/ 4 http://ledenez.insa-rennes.fr/abilis/ 3 The Case Study The reported case study gathered 6 participants, including the 3 authors, 4 academics and 2 PhD students. All participants were familiar with LIS; 4 of them had not previously used a LIS tool as a decision support system. The objective was to identify the publishing strategy of the team: in which conferences to submit and why. This was not a conflictual decision: the group admitted very early that the set of selected conferences could be rather large, provided that there was a good reason to keep each of them. One person, the facilitator, spent an afternoon organizing the meeting and preparing the raw data as well as a logical context according to the objective. She collected data about conference calls for papers over about a year, related to themes corresponding to the potential area of the team, from WikiCFP, a semantic wiki for Calls For Papers in science and technology fields5. There were 1041 events: conferences, symposiums, workshops, but also special issues of journals. Then every participant, on his own, spent between half an hour and two hours browsing the context, updating it if necessary and preselecting a number of conferences (Section 3.1). The group met for one hour and a half. It collaboratively explored the data and selected a restricted set of conferences (Section 3.2). After the meeting, every participant filled in a questionnaire. The context used for the case study can be freely accessed6. 3.1 Distributed Individual Preselection and Update When the context was ready, every participant was asked to preselect a set of conferences that could be possible submission targets. The instruction was to be as liberal as desired and, in case of doubt, to label the conference as a possible target. During this phase, each of the academics preselected 20 to 30 conferences and each of the PhD students preselected around 10 conferences. Each participant had his own “basket”. There were overlaps; altogether, 63 conferences were preselected.
Participants also introduced new conferences and new features, for example, the ranking of the Australian CORE association 7 (Ranking), and the person expected to be a possible first author for the target conference (Main author). Figure 1 shows a state of Abilis during the preselection phase. LIS user in- terfaces give a local view of the concept lattice, centered on a focus concept. The local view is made of three parts: (1) the query (top left), (2) the extent (bottom right), and (3) the index (bottom left). The query is a logical formula that typi- cally combines attributes (e.g., Name), patterns (e.g., contains "conference"), and Boolean connectors (and, or, not). The extent is the set of objects that are matched by the query, according to logical subsumption. The extent identifies 5 http://www.wikicfp.com/cfp/ 6 http://ledenez.insa-rennes.fr/abilis/, connect as guest, load Call for papers. 7 http://core.edu.au/index.php/categories/conference rankings 34 Mireille Ducasse, Sebastien Ferre and Peggy Cellier Fig. 1. Snapshot of Abilis during preselection: a powerful query Building up Shared Knowledge with Logical Information Systems 35 the focus concept. Finally, the index is a set of features, taken from a finite sub- set of the logic, and is restricted to features associated to at least one object in the extent. The index plays the role of a summary or inventory of the extent, showing which kinds of objects there are, and how many of each kind there are (e.g., in Figure 1, 8 objects in the extent have data mining as a theme). In the index, features are organized as a taxonomy according to logical subsumption. The query area (top left) shows the current selection criteria: (Name contains "conference" or Name contains "symposium") and not (Name contains "agent" or Name contains "challenge" or Name contains "workshop") and (Theme is "Knowledge Discovery" or Theme is "Knowledge Engineering" or Theme is "Knowledge Management"). Note that the query had been obtained solely by clicking on features of the index (bottom left). Let us describe how it had been produced. Initially there were 1041 objects. Firstly, opening the Name ? feature, the participant had noticed that names could con- tain “conference” or “symposium” but also other keywords such as “special issue”. He decided to concentrate on conferences and symposiums by clicking on the two features and then on the zoom button. The resulting query was (Name contains "conference" or Name contains "symposium") and there were 495 objects in the extent. However, the displayed features under Name ? showed that there were still objects whose name in addition to “conference” or “symposium” also contained “agent”, “challenge” or “workshop”. He decided to filter them out by clicking on the three features then on the Not button then on the zoom button. The resulting query was (Name contains "conference" or Name contains "symposium") and not (Name contains "agent" or Name contains "challenge" or Name contains "workshop") and there were 475 objects in the extent. He opened the Theme ? feature, clicked on the three sub- features containing “Knowledge”, then on the zoom button. The resulting query is the one displayed on Figure 1 and there are 48 objects in the displayed extent. In the extent area (bottom right), the 48 acronyms of the selected conferences are displayed. In the index area, one can see which of the features are filled for these objects. The displayed features have at least 1 object attached to them. 
The number of objects actually attached to them is shown in parentheses. For example, only 14 of the preselected conferences have an abstract deadline. All of them have an acronym, a date of beginning, a date of end, a date for the paper deadline, a name, some other (not very relevant) information, as well as at least a theme and a town. The features shared by all selected objects have that number in parentheses (48 in this case). For the readers who have a color printout, these features are in green. The other features are attached to only some of the objects. For example, only 16 objects have a ranking attached to them: 4 core A, 6 core B, 2 core C, 1 ‘too recent event’, 4 unknown (to the Core ranking). One way to pursue the navigation could be, for example, to click on Ranking ? to select the conferences for which the information is filled. Alternatively, one could concentrate on the ones for which the ranking is not filled, for example 36 Mireille Ducasse, Sebastien Ferre and Peggy Cellier to fill in this information on the fly for the conferences which are considered interesting. Another way to pursue the navigation could be, for example, to notice that under the Theme ? feature, there are more than the selected themes. One can see that among the selected conferences, one conference is also relevant to the Decision Support Systems theme. One could zoom into it, this would add and Theme is "Decision Support Systems" to the query ; the object area would then display the relevant conference (namely GDN2011). 3.2 Collaborative Data Exploration, Update and Selection The group eventually had a physical meeting where the current state of the context was constantly displayed on a screen. Using the navigation facilities of Abilis, the conferences were examined by decreasing ranking. Initially, the group put in the selection all the A and A+ preselected conferences. After some discussions, it had, however, been decided that Human Computer Interaction (HCI) was too far away from the core of the team’s research. Subsequently, the HCI conferences already selected were removed from the selection. For the conferences of rank B, the team decided that most of them were pretty good and deserved to be kept in the selection. For the others, the group investigated first the conferences without ranking and very recent, trying to identify the ones with high selection rate or other good reasons to put them in the selection. Some of the arguments have been added into the context. Some others were taken into account on the fly to select some conferences but they seemed so obvious at the time of the discussion that they were not added in the context. Figure 2 shows the selection made by the group at a given point. In the extent area, on the right hand side, the selected objects are partitioned according to the deadline month and the anticipated main author thanks to the OLAP like facilities of Abilis [1]. Instead of being presented as a list of objects, an extent can be partitioned as an OLAP cube, namely a multi-dimensional array. Assuming object features are valued attributes, each attribute can play the role of a dimension, whose values play the role of indices along this dimension. Users can freely interleave changes to the query and changes to the extent view. The query is SelectionLIS02Dec and scope international. Note that the partition display is consistent with the query. When the group added and scope international to the query, the national conferences disappeared from the array. 
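To make the partitioning idea concrete, here is a minimal Python sketch — not Abilis code; the conference acronyms, attribute names and query are invented for illustration — that filters a set of objects with a simple conjunctive query and then lays the resulting extent out as a two-dimensional array indexed by two valued attributes, here the deadline month and the anticipated main author.

    from collections import defaultdict

    # Toy data: each object (a conference) is a dictionary of valued attributes.
    # All names and values are invented for illustration.
    conferences = [
        {"acronym": "C1", "scope": "international", "deadline_month": "May",  "main_author": "PC"},
        {"acronym": "C2", "scope": "national",      "deadline_month": "May",  "main_author": "MD"},
        {"acronym": "C3", "scope": "international", "deadline_month": "June", "main_author": "PC"},
        {"acronym": "C4", "scope": "international", "deadline_month": "June", "main_author": "SF"},
    ]

    def extent(objects, query):
        """Return the objects satisfying a conjunctive query given as attribute=value pairs."""
        return [o for o in objects if all(o.get(a) == v for a, v in query.items())]

    def partition(objects, dim1, dim2):
        """Lay an extent out as a two-dimensional array (an OLAP-like cube with two dimensions)."""
        cube = defaultdict(list)
        for o in objects:
            cube[(o[dim1], o[dim2])].append(o["acronym"])
        return cube

    selection = extent(conferences, {"scope": "international"})
    for (month, author), acronyms in sorted(partition(selection, "deadline_month", "main_author").items()):
        print(f"{month:>5} / {author}: {', '.join(acronyms)}")

As in Abilis, changing the query (here the dictionary passed to extent) and changing the partition dimensions are independent operations, so the displayed array always stays consistent with the current selection.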
Some conferences, absent from WikiCFP have been entered on the fly at that stage (for example, ICCS 2011 - Concept). Not all the features had been entered for all of them. In particular, one can see in the feature area that only 28 out of 29 had been preselected. Nevertheless, the group judged that the deadline month, the potential main author and the ranking were crucial for the decision process and added them systematically. It is easy to find which objects do not have a feature using Not and zoom, and then to attach features to them. Building up Shared Knowledge with Logical Information Systems 37 Fig. 2. Snapshot of Abilis during collaborative data exploration: a partition deadline month/mainAuthor 38 Mireille Ducasse, Sebastien Ferre and Peggy Cellier One can see that there are enough opportunities for each participant to pub- lish round the year. One can also see at a glance where compromises and decisions will have to be made. For example, PC will probably not be in a position to pub- lish at IDA, ISSTA, KDD and ICCS the same year. Thanks to this global view PC can discuss with potential co-authors what the best strategy could be. A follow up to the meeting was that participants made a personal publication planning, knowing that their target conferences were approved by the group. 4 Discussion In this section, we discuss how the reported case study provides evidences that LIS help keep the group focused (Section 4.1) and that LIS also help build up shared knowledge (Section 4.2). As already mentioned, participants filled up a questionnaire after the meeting. In the following, for each item, we introduce the arguments, we present a summary of relevant parts of participant feedbacks, followed by an analysis of the features of LIS that are crucial for the arguments. 4.1 Logical Information Systems Help Keep the Group Focused It is recognized that an expert facilitator can significantly increase the efficiency of a meeting (see for example [2]). A study made by den Hengst and Adkins [10] investigated which facilitation functions were found the most challenging by facilitators around the world. It provides evidences that facilitators find that “the most difficult facilitation function in meeting procedures is keeping the group outcome focused.” In our case study, all participants reported that they could very easily stay focused on the point currently discussed thanks to the query and the consistency between the three views. As the objective was to construct a selection explicitly identified in Abilis by a feature, the objective of the meeting was always present to everybody and straightforward to bring back in case of digression. Furthermore, even if the context contained over a thousand conferences, thanks to the navigation facilities of LIS, only relevant information was displayed at a given time. Therefore, there was no “noise” and no dispersion of attention, the displayed information was always closely connected to the focus of the discussion. 4.2 Logical Information Systems Help Build Up Shared Knowledge Kolfschoten, de Vreede and Briggs have identified 6 collaboration patterns: Gen- erate, Reduce, Clarify, Organize, Evaluate, and Consensus Building [12]. We discuss in the following three of their 16 sub-patterns for which all participants agreed that they are supported by Abilis in its current stage. For the other sub- patterns, the situation did not demand much with respect to them. 
For example, the decision to make was not conflictual, and the set of selected conferences could be rather large; there was, therefore, not much to experiment about "consensus building." The descriptions of the patterns in italics are from Kolfschoten, de Vreede and Briggs.

Generate/Gathering: move from having fewer to having more complete and relevant information shared by the group.

Before and during the meeting, information was added to the shared knowledge repository of the group, namely the logical context. A new theme, important for the team and missing from WikiCFP, was added: Decision Support Systems. New conferences were added into the context, either by individual participants in the preselection phase or by the group during the selection phase. New features were added as well. For example, it soon appeared that some sort of conference ranking was necessary. The group added by hand, for the conferences that were selected, the ranking of the Australian CORE association. Some conferences were added subsequently, and sometimes the ranking was not added at once. All participants acknowledged that the tool helped the group set up a set of features that was relevant and reflected the group's point of view.

The crucial characteristics of LIS for this aspect are those which enable integrated navigation and update. Firstly, the possibility to update the context while navigating in it enables participants to enhance it on the fly, adding small pieces of relevant information at a time. Secondly, for each feature, Abilis displays the number of objects which have it. It is therefore immediate to detect when a feature is not systematically filled. A query using Not selects the objects that do not have the feature; users can then decide if they want to update them. Thanks to the query, as soon as an object is updated, it disappears from the extent, so users can immediately see what remains to be updated. Thirdly, updating the context does not divert from the initial objective: the Back button allows users to go back to previous queries. Fourthly, the three views (query, features, objects) are always consistent and provide a "global" understanding of the relevant objects. Lastly, in the shared web server, participants can see what information the others have entered, so each participant can inspire the others.

For the last aspect, the facilitator's inputs were decisive. Participants reported that they did not invent much; they imitated and adapted what the facilitator had initiated. This is consistent with the literature on group decision and negotiation, which emphasizes the key role of facilitators [2].

Clarify/Building Shared Understanding: move from having less to more shared understanding of the concepts shared by the group and the words and phrases used to express them.

Participants, even senior ones, discovered new conferences. Some were surprised by the ranking of conferences that they had previously overlooked. Participants had a much clearer idea of who was interested in what. All participants found that the tool helped them understand the points of view of the others. The crucial characteristics of LIS for this aspect are those which enable participants to grasp a global understanding at a glance. Firstly, the query, as discussed earlier, helps keep the group focused. Secondly, the consistency between the three views helps participants to grasp the situation.
Thirdly, irrelevant features are not in the index; the features in the index thus reflect the current state of the group decision. Fourthly, the partitions à la OLAP sort the information according to the criteria under investigation. Lastly, the shared web server enables participants to know before the meeting what the others have entered.

Reduce/Filtering: move from having many concepts to fewer concepts that meet specific criteria according to the group members.

Both at preselection time and during the meeting, participants could quickly strip down the set of conferences of interest according to the most relevant criteria. All participants said that the filtering criteria were relevant and reflected the group's point of view. They also all thought that the group was satisfied with the selected set of conferences.

The crucial characteristics of LIS for this aspect are those of the navigation core of LIS. Firstly, the features of the index propose filtering criteria; they are dynamically computed and relevant for the current selection of objects. Secondly, the query, with its powerful logic capabilities, enables participants to express sophisticated selections. Thirdly, the navigation facilities enable participants to build powerful queries, even without knowing anything about the syntax. Lastly, users do not have to worry about the consistency of the set of selected objects: the view consistency of Abilis guarantees that all conferences fulfilling the expressed query are indeed present.

This aspect is especially important. As claimed by Davis et al. [4], convergence in meetings is a slow and painful process for groups. Vogel and Coombes [16] present an experiment that supports the hypothesis that groups selecting ideas from a multicriteria task formulation will converge better than groups working on a single-criterion formulation, where convergence is defined as moving from many ideas to a focus on a few ideas that are worthy of further attention. Convergence is very close to the Reduce/Filtering collaboration pattern. They also underline that people try to minimize the effects of information overload by employing conscious or even unconscious heuristic strategies in order to reduce information load, where information overload is defined as having too many things to do at once. With their powerful navigation facilities, LIS enable users to address a large number of criteria and objects with limited information overload: one can concentrate on local aspects, while global consistency is maintained automatically by the concept lattice.

5 Related work

Abilis in its current stage does not pretend to match up to operational group support systems (GSS), which have a much broader scope. LIS, however, could be integrated in some of the modules of GSS. For example, Meetingworks™ [13], one of the most established GSS, is a modular toolkit that can be configured to support a wide variety of group tasks. Its "Organize" module proposes a tree structure to help analyze and sort ideas. That structure looks much like the index of LIS; it can be edited by hand, and some limited selection is possible. The navigation capabilities of LIS, based on the concept lattice, are, however, more powerful.

Concept analysis has been applied to numerous social contexts, such as social networks [15], computer-mediated communication [9] and domestic violence detection [14].
Most of those applications are intended to be applied a posteri- ori, in order to get some understanding of the studied social phenomena. On the contrary, we propose to use Logical Concept Analysis in the course and as a support of the social phenomena itself. In our case, the purpose is to support a collaborative decision process. Our approach is to other social applications, what information retrieval is to data mining. Whereas data mining automatically com- putes a global and static view on a posteriori data, information retrieval (i.e. navigation in and update of the concept lattice) presents the user with a local and dynamic view on live data, and only guides users in their choice. A specificity of LIS is the use of logics. This has consequences both on the queries that can be expressed, and on the feature taxonomy. The use of logics al- lows to express inequalities on numerical attributes, disjunctions and negations in queries. In pure FCA, only conjunctions of Boolean attributes can be expressed. Previous sections have shown how disjunction and negation are important to express selection criteria. In the taxonomy, criteria are organized according to the logical subsumption relation between them in pure FCA, criteria would be presented as a long flat list. Logics help to make the taxonomy more concise and readable by grouping and hierarchizing together similar criteria. The taxonomy can be dynamically updated by end-users. 6 Conclusion In this paper we have shown that a Logical Information System web server could be used to support a group decision process consisting of 1) data preparation 2) distributed individual preselection and update and 3) collaborative data ex- ploration, update and selection. We have presented evidences that the navigation and filtering capabilities of LIS were relevant to quickly reduce the number of target conferences. Secondly, the same capabilities were also helpful to detect inconsistencies and missing knowledge. The updating capabilities of LIS enabled participants to add objects, features and links between them on the fly. As a result the group had a more complete and relevant set of information. Thirdly, the group had built a shared understanding of the relevant information. Acknowledgments The authors thank Pierre Allard and Benjamin Sigonneau for the development and maintenance of Abilis. They thank Pierre Allard, Annie Foret and Alice Hermann for attending the experiment and giving many insight- ful feedbacks. 42 Mireille Ducasse, Sebastien Ferre and Peggy Cellier References 1. Allard, P., Ferré, S., Ridoux, O.: Discovering functional dependencies and associa- tion rules by navigating in a lattice of OLAP views. In: Kryszkiewicz, M., Obiedkov, S. (eds.) Concept Lattices and Their Applications. pp. 199–210. CEUR-WS (2010) 2. Briggs, R.O., Kolfschoten, G.L., de Vreede, G.J., Albrecht, C.C., Lukosch, S.G.: Facilitator in a box: Computer assisted collaboration engineering and process sup- port systems for rapid development of collaborative applications for high-value tasks. In: HICSS. pp. 1–10. IEEE Computer Society (2010) 3. Codd, E., Codd, S., Salley, C.: Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate. Codd & Date, Inc, San Jose (1993) 4. Davis, A., de Vreede, G.J., Briggs, R.: Designing thinklets for convergence. In: AMCIS 2007 Proceedings (2007), http://aisel.aisnet.org/amcis2007/358 5. Ducassé, M., Ferré, S.: Fair(er) and (almost) serene committee meetings with logi- cal and formal concept analysis. 
In: Eklund, P., Haemmerlé, O. (eds.) Proceedings of the International Conference on Conceptual Structures. Springer-Verlag (July 2008), lecture Notes in Artificial Intelligence 5113 6. Ferré, S., Ridoux, O.: A logical generalization of formal concept analysis. In: Mineau, G., Ganter, B. (eds.) International Conference on Conceptual Structures. pp. 371–384. No. 1867 in Lecture Notes in Computer Science, Springer (Aug 2000) 7. Ferré, S., Ridoux, O.: An introduction to logical information systems. Information Processing & Management 40(3), 383–419 (2004) 8. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Heidelberg (1999) 9. Hara, N.: Analysis of computer-mediated communication: Using formal concept analysis as a visualizing methodology. Journal of Educational Computing Research 26(1), 25–49 (2002) 10. den Hengst, M., Adkins, M.: Which collaboration patterns are most challenging: A global survey of facilitators. In: HICSS. p. 17. IEEE Computer Society (2007) 11. Kilgour, D.M., Eden, C.: Handbook of Group Decision and Negotiation, Advances in Group Decision and Negotiation, vol. 4. Springer Netherlands (2010) 12. Kolfschoten, G.L., de Vreede, G.J., Briggs, R.O.: Collaboration engineering. In: Kilgour and Eden [11], chap. 20, pp. 339–357 13. Lewis, L.F.: Group support systems: Overview and guided tour. In: Kilgour and Eden [11], chap. 14, pp. 249–268 14. Poelmans, J., Elzinga, P., Viaene, S., Dedene, G.: A case of using formal con- cept analysis in combination with emergent self organizing maps for detecting domestic violence. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects, Lecture Notes in Computer Science, vol. 5633, pp. 247–260. Springer Berlin / Heidelberg (2009), http://dx.doi.org/10.1007/978-3-642-03067- 3 20, 10.1007/978-3-642-03067-3 20 15. Roth, C., Bourgine, P.: Lattice-based dynamic and overlapping taxonomies: The case of epistemic communities. Scientometrics 69, 429–447 (2006), http://dx.doi.org/10.1007/s11192-006-0161-6, 10.1007/s11192-006-0161-6 16. Vogel, D., Coombes, J.: The effect of structure on convergence activities using group support systems. In: Kilgour and Eden [11], chap. 17, pp. 301–311 Comparing Performance of Algorithms for Generating the Duquenne–Guigues Basis Konstantin Bazhanov and Sergei Obiedkov Higher School of Economics, Moscow, Russia, kostyabazhanov@mail.ru, sergei.obj@gmail.com Abstract. In this paper, we take a look at algorithms involved in the computation of the Duquenne–Guigues basis of implications. The most widely used algorithm for constructing the basis is Ganter’s Next Clo- sure, designed for generating closed sets of an arbitrary closure system. We show that, for the purpose of generating the basis, the algorithm can be optimized. We compare the performance of the original algorithm and its optimized version in a series of experiments using artificially generated and real-life datasets. An important computationally expen- sive subroutine of the algorithm generates the closure of an attribute set with respect to a set of implications. We compare the performance of three algorithms for this task on their own, as well as in conjunction with each of the two versions of Next Closure. 1 Introduction Implications are among the most important tools of formal concept analysis (FCA) [9]. The set of all attribute implications valid in a formal context defines a closure operator mapping attribute sets to concept intents of the context (this mapping is surjective). 
The following two algorithmic problems arise with respect to implications: 1. Given a set L of implications and an attribute set A, compute the closure L(A). 2. Given a formal context K, compute a set of implications equivalent to the set of all implications valid in K, i.e., the cover of valid implications. The first of these problems has received considerable attention in the database literature in application to functional dependencies [14]. Although functional dependencies are interpreted differently than implications, the two are in many ways similar: in particular, they share the notion of semantic consequence and the syntactic inference mechanism (Armstrong rules [1]). A linear-time algo- rithm, LinClosure, has been proposed for computing the closure of a set with respect to a set of functional dependencies (or implications) [3], i.e., for solving the first of the two problems stated above. However, the asymptotic complexity estimates may not always be good indicators for relative performance of algo- rithms in practical situations. In Sect. 3, we compare LinClosure with two c 2011 by the paper authors. CLA 2011, pp. 43–57. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 44 Konstantin Bazhanov and Sergei Obiedkov other algorithms—a “naı̈ve” algorithm, Closure [14], and the algorithm pro- posed in [20]—both of which are non-linear. We analyze their performance in several particular cases and compare them experimentally on several datasets. For the second problem, an obvious choice of the cover is the Duquenne– Guigues, or canonical, basis of implications, which is the smallest set equivalent to the set of valid implications [11]. Unlike for the other frequently occurring FCA algorithmic task, the computation of all formal concepts of a formal con- text [12], only few algorithms have been proposed for the calculation of the canonical basis. The most widely-used algorithm was proposed by Ganter in [10]. Another, attribute-incremental, algorithm for the same problem was de- scribed in [17]. It is claimed to be much faster than Ganter’s algorithm for most practical situations. The Concept Explorer software system [21] uses this algo- rithm to generate the Duquenne–Guigues basis of a formal context. However, we do not discuss it here, for we choose to concentrate on the computation of implications in the lectic order (see Sect. 4). The lectic order is important in the interactive knowledge-acquisition procedure of attribute exploration [8], where implications are output one by one and the user is requested to confirm or reject (by providing a counterexample) each implication. Ganter’s algorithm repeatedly computes the closure of an attribute set with respect to a set of implications; therefore, it relies heavily on a subprocedure implementing a solution to the first problem. In Sect. 4, we describe possible optimizations of Ganter’s algorithm and experimentally compare the original and optimized versions in conjunction with each of the three algorithms for solving the first problem. A systematic comparison with the algorithm from [17] is left for further work. 2 The Duquenne–Guigues Basis of Implications Before proceeding, we quickly recall the definition of the Duquenne–Guigues basis and related notions. 
Given a (formal) context K = (G, M, I), where G is called a set of objects, M is called a set of attributes, and the binary relation I ⊆ G × M specifies which objects have which attributes, the derivation operators (·)′ are defined for A ⊆ G and B ⊆ M as follows:

  A′ = {m ∈ M | ∀g ∈ A : gIm}
  B′ = {g ∈ G | ∀m ∈ B : gIm}

In words, A′ is the set of attributes common to all objects of A and B′ is the set of objects sharing all attributes of B. The double application of (·)′ is a closure operator, i.e., (·)′′ is extensive, idempotent, and monotone. Therefore, sets A′′ and B′′ are said to be closed. Closed object sets are called concept extents and closed attribute sets are called concept intents of the formal context K. In discussing the algorithms later in the paper, we assume that the sets G and M are finite.

An implication over M is an expression A → B, where A, B ⊆ M are attribute subsets. It holds in the context if A′ ⊆ B′, i.e., every object of the context that has all attributes from A also has all attributes from B.

An attribute subset X ⊆ M respects (or is a model of) an implication A → B if A ⊈ X or B ⊆ X. Obviously, an implication holds in a context (G, M, I) if and only if {g}′ respects the implication for all g ∈ G. A set L of implications over M defines the closure operator X ↦ L(X) that maps X ⊆ M to the smallest set respecting all the implications in L:

  L(X) = ⋂ {Y | X ⊆ Y ⊆ M, ∀(A → B) ∈ L : A ⊈ Y or B ⊆ Y}.

We discuss algorithms for computing L(X) in Sect. 3. Note that, if L is the set of all valid implications of a formal context, then L(X) = X′′ for all X ⊆ M.

Two implication sets over M are equivalent if they are respected by exactly the same subsets of M. Equivalent implication sets define the same closure operator. A minimum cover of an implication set L is a set of minimal size among all implication sets equivalent to L. One particular minimum cover described in [11] is defined using the notion of a pseudo-closed set, which we introduce next.

A set P ⊆ M is called pseudo-closed (with respect to a closure operator (·)′′) if P ≠ P′′ and Q′′ ⊂ P for every pseudo-closed Q ⊂ P. In particular, all minimal non-closed sets are pseudo-closed. A pseudo-closed attribute set of a formal context is also called a pseudo-intent.

The Duquenne–Guigues or canonical basis of implications (with respect to a closure operator (·)′′) is the set of all implications of the form P → P′′, where P is pseudo-closed. This set of implications is of minimal size among those defining the closure operator (·)′′. If (·)′′ is the closure operator associated with a formal context, the Duquenne–Guigues basis is a minimum cover of valid implications of this context. The computation of the Duquenne–Guigues basis of a formal context is hard, since even recognizing pseudo-intents is a coNP-complete problem [2], see also [13, 7]. We discuss algorithms for computing the basis in Sect. 4.

3 Computing the Closure of an Attribute Set

In this section, we compare the performance of algorithms computing the closure of an attribute set X with respect to a set L of implications. Algorithm 1 [14] checks every implication A → B ∈ L and enlarges X with attributes from B if A ⊆ X. The algorithm terminates when a fixed point is reached, that is, when the set X cannot be enlarged any further (which always happens at some moment, since both L and M are assumed finite).
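Before turning to its complexity, here is a minimal executable sketch of this naive closure computation — our Python rendering, not the authors' C++ implementation — with implications given as (premise, conclusion) pairs of sets. It mirrors the repeat/for-all structure of Algorithm 1 shown below.

    def closure(X, implications):
        """Naive closure of attribute set X under a list of implications.

        Each implication is a pair (A, B) of sets, read as "A -> B".
        Keep firing implications whose premise is contained in X, discarding
        fired implications, until a pass over the list changes nothing.
        """
        X = set(X)
        remaining = list(implications)
        stable = False
        while not stable:
            stable = True
            still_unused = []
            for A, B in remaining:
                if A <= X:          # premise satisfied: fire the implication
                    X |= B
                    stable = False  # the implication is spent; X may have grown
                else:
                    still_unused.append((A, B))
            remaining = still_unused
        return X

    # Small test: the chain of implications {i} -> {i+1} discussed as Example 1 below, with n = 5.
    L = [({i}, {i + 1}) for i in range(4, 0, -1)]   # descending order of premises
    print(sorted(closure({1}, L)))                  # -> [1, 2, 3, 4, 5]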
The algorithm is obviously quadratic in the number of implications in L in the worst case. The worst case happens when exactly one implication is applied at each iteration (but the last one) of the repeat loop, resulting in |L|(|L| + 1)/2 iterations of the for all loop, each requiring O(|M|) time.

Example 1. A simple example is when X = {1} and the implications in L = {{i} → {i + 1} | i ∈ N, 0 < i < n} for some n are arranged in the descending order of their one-element premises.

Algorithm 1 Closure(X, L)
Input: An attribute set X ⊆ M and a set L of implications over M.
Output: The closure of X w.r.t. implications in L.
  repeat
    stable := true
    for all A → B ∈ L do
      if A ⊆ X then
        X := X ∪ B
        stable := false
        L := L \ {A → B}
  until stable
  return X

In [3], a linear-time algorithm, LinClosure, is proposed for the same problem. Algorithm 2 is identical to the version of LinClosure from [14] except for one modification designed to allow implications with empty premises in L. LinClosure associates a counter with each implication, initializing it with the size of the implication premise. Also, each attribute is linked to a list of implications that have it in their premises. The algorithm then checks every attribute m of X (the set whose closure must be computed) and decrements the counters for all implications linked to m. If the counter of some implication A → B reaches zero, attributes from B are added to X. Afterwards, they are used to decrement counters along with the original attributes of X. When all attributes in X have been checked in this way, the algorithm stops with X containing the closure of the input attribute set.

It can be shown that the algorithm is linear in the length of the input, assuming that each attribute in the premise or conclusion of any implication in L requires a constant amount of memory [14].

Example 2. The worst case for LinClosure occurs, for instance, when X ⊂ N, M = X ∪ {1, 2, . . . , n} for some n such that X ∩ {1, 2, . . . , n} = ∅ and L consists of implications of the form X ∪ {i | 0 < i < k} → {k} for all k such that 1 ≤ k ≤ n. During each of the first |X| iterations of the for all loop, the counters of all implications will have to be updated, with only the last iteration adding one attribute to X using the implication X → {1}. At each of the subsequent n − 1 iterations, the counter for every so far "unused" implication will be updated and one attribute will be added to X. The next, (|X| + n)th, iteration will terminate the algorithm. Note that, if the implications in L are arranged in the superset-inclusion order of their premises, this example will present the worst case for Algorithm 1, requiring n iterations of the main loop. However, if the implications are arranged in the subset-inclusion order of their premises, one iteration will be sufficient.

Algorithm 2 LinClosure(X, L)
Input: An attribute set X ⊆ M and a set L of implications over M.
Output: The closure of X w.r.t. implications in L.
  for all A → B ∈ L do
    count[A → B] := |A|
    if |A| = 0 then
      X := X ∪ B
    for all a ∈ A do
      add A → B to list[a]
  update := X
  while update ≠ ∅ do
    choose m ∈ update
    update := update \ {m}
    for all A → B ∈ list[m] do
      count[A → B] := count[A → B] − 1
      if count[A → B] = 0 then
        add := B \ X
        X := X ∪ add
        update := update ∪ add
  return X

Inspired by the mechanism used in LinClosure to obtain linear asymptotic complexity, but somewhat disappointed by the poor performance of the algorithm relative to Closure, which was revealed in his experiments, Wild proposed a new algorithm in [20]. We present this algorithm (in a slightly more compact form) as Algorithm 3. The idea is to maintain implication lists similar to those used in LinClosure, but get rid of the counters. Instead, at each step, the algorithm combines the implications in the lists associated with attributes not occurring in X and "fires" the remaining implications (i.e., uses them to enlarge X). When there is no implication to fire, the algorithm terminates with X containing the desired result.

Wild claims that his algorithm is faster than both LinClosure and Closure, even though it has the same asymptotic complexity as the latter. The worst case for Algorithm 3 is when L \ L1 contains exactly one implication A → B and B \ X contains exactly one attribute at each iteration of the repeat . . . until loop. Example 1 presents the worst case for Algorithm 3, but, unlike for Closure, the order of implications in L is irrelevant. The worst case for LinClosure (see Example 2) is also the worst case for Algorithm 3, but it deals with it, perhaps, in a more efficient way, using n iterations of the main loop compared to n + |X| iterations of the main loop in LinClosure.

Algorithm 3 Wild's Closure(X, L)
Input: An attribute set X ⊆ M and a set L of implications over M.
Output: The closure of X w.r.t. implications in L.
  for all m ∈ M do
    for all A → B ∈ L do
      if m ∈ A then
        add A → B to list[m]
  repeat
    stable := true
    L1 := ⋃ {list[m] | m ∈ M \ X}
    for all A → B ∈ L \ L1 do
      X := X ∪ B
      stable := false
    L := L1
  until stable
  return X

Experimental Comparison

We implemented the algorithms in C++ using Microsoft Visual Studio 2010. For the implementation of attribute sets, as well as sets of implications in Algorithm 3, we used dynamic bit sets from the Boost library [6]. All the tests described in the following sections were carried out on an Intel Core i5 2.67 GHz computer with 4 Gb of memory running under Windows 7 Home Premium x64.

Figure 1 shows the performance of the three algorithms on Example 1. Algorithm 2 is the fastest algorithm in this case: for a given n, it needs n iterations of the outer loop—the same as the other two algorithms, but the inner loop of Algorithm 2 checks exactly one implication at each iteration, whereas the inner loop of Algorithm 1 checks n − i implications at the ith iteration. Although the inner loop of Algorithm 3 checks only one implication at the ith iteration, it has to compute the union of n − i lists in addition.

[Line chart: time in seconds per 1000 tests against n for Closure, LinClosure, and Wild's Closure.]
Fig. 1. The performance of Algorithms 1–3 for Example 1.

Figure 2 shows the performance of the algorithms on Example 2. Here, the behavior of Algorithm 2 is similar to that of Algorithm 1, but Algorithm 2 takes more time due to the complicated initialization step.

[Line chart: time in seconds per 1000 tests against n for Closure, LinClosure, and Wild's Closure.]
Fig. 2.
The performance of Algorithms 1–3 for Example 2 with implications in L arranged in the superset-inclusion order of their premises and |X| = 50.

Interestingly, Algorithm 1 works almost twice as fast on Example 2 as it does on Example 1. This may seem surprising, since it is easy to see that the algorithm performs essentially the same computations in both cases, the difference being that the implications of Example 1 have single-element premises. However, this turns out to be a source of inefficiency: at each iteration of the main loop, all implications but the last fail to fire, but, for each of them, the algorithm checks if their premises are included in the set X. Generally, when A ⊈ X, this can be established more easily if A is large, for, in this case, A is likely to contain more elements outside X. This effect is reinforced by the implementation of sets as bit strings: roughly speaking, to verify that {i} ⊈ {1}, it is necessary to check all bits up to i, whereas {i | 0 < i < k} ⊈ {k + 1} can be established by checking only one bit (assuming that bits are checked from left to right). Alternative data structures for set implementation might have less dramatic consequences for performance in this setting. On the other hand, the example shows that performance may be affected by issues not so obviously related to the structure of the algorithm, thus suggesting additional paths to obtain an optimal behavior (e.g., by rearranging attributes or otherwise preprocessing the input data).

We have experimented with computing closures using the Duquenne–Guigues bases of formal contexts as input implication sets. Table 1 shows the results for randomly generated contexts. The first two columns indicate the size of the attribute set and the number of implications, respectively. The remaining three columns record the time (in seconds) for computing the closures of 1000 randomly generated subsets of M by each of the three algorithms. Table 3 presents similar results for datasets taken from the UCI repository [5] and, if necessary, transformed into formal contexts using FCA scaling [9].1 The contexts are described in Table 2, where the last four columns correspond to the number of objects, number of attributes, number of intents, and number of pseudo-intents (i.e., the size of the canonical basis) of the context named in the first column.

1 The breast cancer domain was obtained from the University Medical Centre, Institute of Oncology, Ljubljana, Yugoslavia (now Slovenia). Thanks go to M. Zwitter and M. Soklic for providing the data.

Table 1. Performance on randomly generated tests (time in seconds per 1000 closures)

  |M|   |L|    Algorithm 1   Algorithm 2   Algorithm 3
   30    557      0.0051        0.2593        0.0590
   50   1115      0.0118        0.5926        0.1502
  100    380      0.0055        0.2887        0.0900
  100    546      0.0086        0.4229        0.1350
  100   2269      0.0334        1.5742        0.5023
  100   3893      0.0562        2.6186        0.8380
  100   7994      0.1134        5.3768        1.7152
  100   8136      0.1159        5.6611        1.8412

Table 2. Contexts obtained from UCI datasets

  Context                    |G|   |M|   # intents   # pseudo-intents
  Zoo                        101    28        379            141
  Postoperative Patient       90    26       2378            619
  Congressional Voting       435    18      10644            849
  SPECT                      267    23      21550           2169
  Breast Cancer              286    43       9918           3354
  Solar Flare               1389    49      28742           3382
  Wisconsin Breast Cancer    699    91       9824          10666

In these experiments, Algorithm 1 was the fastest and Algorithm 2 was the slowest, even though it has the best asymptotic complexity. This can be partly explained by the large overhead of the initialization step (setting up counters and implication lists). Therefore, these results can be used as a reference only when the task is to compute one closure for a given set of implications. When a large number of closures must be computed with respect to the same set of implications, Algorithms 2 and 3 may be more appropriate.

Table 3. Performance on the canonical bases of contexts from Table 2 (time in seconds per 1000 closures)

  Context                    Algorithm 1   Algorithm 2   Algorithm 3
  Zoo                           0.0036        0.0905        0.0182
  Postoperative Patient         0.0054        0.2980        0.0722
  Congressional Voting          0.0075        0.1505        0.0883
  SPECT                         0.0251        0.9848        0.2570
  Breast Cancer                 0.0361        1.7912        0.5028
  Solar Flare                   0.0370        2.1165        0.6317
  Wisconsin Breast Cancer       0.1368        8.4984        2.4730

4 Computing the Basis in the Lectic Order

The best-known algorithm for computing the Duquenne–Guigues basis was developed by Ganter in [10]. The algorithm is based on the fact that the intents and pseudo-intents of a context taken together form a closure system. This makes it possible to iteratively generate all intents and pseudo-intents using Next Closure (see Algorithm 4), a generic algorithm for enumerating closed sets of an arbitrary closure operator (also proposed in [10]). For every generated pseudo-intent P, an implication P → P′′ is added to the basis. The intents, which are also generated, are simply discarded.

Algorithm 4 Next Closure(A, M, L)
Input: A closure operator X ↦ L(X) on M and a subset A ⊆ M.
Output: The lectically next closed set after A.
  for all m ∈ M in reverse order do
    if m ∈ A then
      A := A \ {m}
    else
      B := L(A ∪ {m})
      if B \ A contains no element < m then
        return B
  return ⊥

Next Closure takes a closed set as input and outputs the next closed set according to a particular lectic order, which is a linear extension of the subset-inclusion order. Assuming a linear order < on attributes in M, we say that a set A ⊆ M is lectically smaller than a set B ⊆ M if

  ∃b ∈ B \ A  ∀a ∈ A (a < b ⇒ a ∈ B).

In other words, the lectically larger of two sets is the one containing the smallest element in which they differ.

Example 3. Let M = {a < b < c < d < e < f}, A = {a, c, e} and B = {a, b, f}. Then, A is lectically smaller than B, since the first attribute in which they differ, b, is in B. Note that, if we represent sets by bit strings with smaller attributes corresponding to higher-order bits (in our example, A = 101010 and B = 110001), the lectic order will match the usual less-than order on binary numbers.

To be able to use Next Closure for iterating over intents and pseudo-intents, we need access to the corresponding closure operator. This operator, which we denote by •, is defined via the Duquenne–Guigues basis L as follows.2 For a subset A ⊆ M, put

  A+ = A ∪ ⋃ {P′′ | P → P′′ ∈ L, P ⊂ A}.

Then, A• = A++···+, where A•+ = A•; i.e., • is the transitive closure of +. The problem is that L is not available when we start; in fact, this is precisely what we want to generate. Fortunately, for computing a pseudo-closed set A, it is sufficient to know only implications with premises that are proper subsets of A. Generating pseudo-closed sets in the lectic order, which is compatible with the subset-inclusion order, we ensure that, at each step, we have at hand the required part of the basis. Therefore, we can use any of the three algorithms from Sect. 3 to compute A• (provided that the implication A• → A′′ has not been added to L yet).
Algorithm 5 uses Next Closure to generate the canonical basis. It passes Next Closure the part of the basis computed so far; Next Closure may call any of the Algorithms 1–3 to compute the closure, L(A ∪ {m}), with respect to this set of implications. After Next Closure computes A•, the implication A• → A′′ may be added to the basis. Algorithm 5 will then pass A• as the input to Next Closure, but there is some room for optimizations here. Let i be the maximal element of A and j be the minimal element of A′′ \ A. Consider the following two cases:

j < i: As long as m > i, the set L(A• ∪ {m}) will be rejected by Next Closure, since it will contain j. Hence, it makes sense to skip all m > i and continue as if A• had been rejected by Next Closure. This optimization has already been proposed in [17].

i < j: It can be shown that, in this case, the lectically next intent or pseudo-intent after A• is A′′. Hence, A′′ could be used at the next step instead of A•.

Algorithm 6 takes these considerations into account.

2 We deliberately use the same letter L for an implication set and the closure operator it defines.

Algorithm 5 Canonical Basis(M, ′′)
Input: A closure operator X ↦ X′′ on M, e.g., given by a formal context (G, M, I).
Output: The canonical basis for the closure operator.
  L := ∅
  A := ∅
  while A ≠ M do
    if A ≠ A′′ then
      L := L ∪ {A → A′′}
    A := Next Closure(A, M, L)
  return L

Algorithm 6 Canonical Basis(M, ′′), an optimized version
Input: A closure operator X ↦ X′′ on M, e.g., given by a formal context (G, M, I).
Output: The canonical basis for the closure operator.
  L := ∅
  A := ∅
  i := the smallest element of M
  while A ≠ M do
    if A ≠ A′′ then
      L := L ∪ {A → A′′}
    if A′′ \ A contains no element < i then
      A := A′′
      i := the largest element of M
    else
      A := {m ∈ A | m ≤ i}
    for all j ≤ i in M in reverse order do
      if j ∈ A then
        A := A \ {j}
      else
        B := L(A ∪ {j})
        if B \ A contains no element < j then
          A := B
          i := j
          exit for
  return L

Experimental Comparison

We used Algorithms 5 and 6 for constructing the canonical bases of the contexts involved in testing the performance of the algorithms from Sect. 3, as well as the context (M, M, ≠) with |M| = 18, which is special in that every subset of M is closed (and hence there are no valid implications). Both algorithms have been tested in conjunction with each of the three procedures for computing closures (Algorithms 1–3). The results are presented in Table 4 and Fig. 3. It can be seen that Algorithm 6 indeed improves on the performance of Algorithm 5. Among the three algorithms computing the closure, the simpler Algorithm 1 is generally more efficient, even though, in our implementation, we do not perform the initialization step of Algorithms 2 and 3 from scratch each time we need to compute a closure of a new set; instead, we reuse the previously constructed counters and implication lists and update them incrementally with the addition of each new implication. We prefer to treat these results as preliminary: it still remains to see whether the asymptotic behavior of LinClosure will give it an advantage over the other algorithms on larger contexts.

Table 4. Time (in seconds) for building the canonical bases of artificial contexts

  Context        # intents   # pseudo-intents     5+1     5+2     5+3     6+1     6+2     6+3
  100 × 30, 4         307          557          0.0088  0.0145  0.0119  0.0044  0.0065  0.0059
  10 × 100, 25        129          380          0.0330  0.0365  0.0431  0.0073  0.0150  0.0169
  100 × 50, 4         251         1115          0.0442  0.0549  0.0617  0.0138  0.0152  0.0176
  10 × 100, 50        559          546          0.0542  0.1312  0.1506  0.0382  0.0932  0.0954
  20 × 100, 25        716         2269          0.3814  0.3920  0.7380  0.1219  0.1312  0.2504
  50 × 100, 10        420         3893          1.1354  0.7291  1.6456  0.1640  0.1003  0.2299
  900 × 100, 4       2472         7994          4.6313  2.7893  6.3140  1.5594  0.8980  2.0503
  20 × 100, 50      12394         8136          7.3097  8.1432  14.955  5.1091  6.0182  10.867
  (M, M, ≠)        262144            0          0.1578  0.3698  0.1936  0.1333  0.2717  0.1656

Fig. 3. Time (in seconds) for building the canonical bases of contexts from Table 2.
[Three bar charts over the six combinations 5+1, 5+2, 5+3, 6+1, 6+2, 6+3: one for Zoo, Postoperative Patient and Congressional Voting; one for SPECT and Breast Cancer; one for Solar Flare and Wisconsin Breast Cancer.]

5 Conclusion

In this paper, we compared the performance of several algorithms computing the closure of an attribute set with respect to a set of implications. Each of these algorithms can be used as a (frequently called) subroutine while computing the Duquenne–Guigues basis of a formal context. We tested them in conjunction with Ganter's algorithm and its optimized version.

In our future work, we plan to extend the comparison to algorithms generating the Duquenne–Guigues basis in a different (non-lectic) order, in particular, to incremental [17] and divide-and-conquer [19] approaches, probably in conjunction with newer algorithms for computing the closure of a set [16]. In addition, we are going to consider algorithms that generate other implication covers: for example, the direct basis [15, 20, 4] or the proper basis [18]. They can be used as an intermediate step in the computation of the Duquenne–Guigues basis. If the number of intents is much larger than the number of pseudo-intents, this two-step approach may be more efficient than direct generation of the Duquenne–Guigues basis with Algorithms 5 or 6, which produce all intents as a side effect.

Acknowledgements The second author was supported by the Academic Fund Program of the Higher School of Economics (project 10-04-0017) and the Russian Foundation for Basic Research (grant no. 08-07-92497-NTsNIL a).

References

1. Armstrong, W.: Dependency structures of data base relationships. Proc. IFIP Congress, pp. 580–583 (1974)
2. Babin, M.A., Kuznetsov, S.O.: Recognizing pseudo-intents is coNP-complete. In: Kryszkiewicz, M., Obiedkov, S. (eds.) Proceedings of the 7th International Conference on Concept Lattices and Their Applications. pp. 294–301. University of Sevilla, Spain (2010)
3. Beeri, C., Bernstein, P.: Computational problems related to the design of normal form relational schemas. ACM TODS 4(1), 30–59 (March 1979)
4. Bertet, K., Monjardet, B.: The multiple facets of the canonical direct unit implicational basis. Theor. Comput. Sci. 411(22-24), 2155–2166 (2010)
5. Blake, C., Merz, C.: UCI repository of machine learning databases (1998), http://archive.ics.uci.edu/ml
6. Demming, R., Duffy, D.: Introduction to the Boost C++ Libraries. Datasim Education Bv (2010), see http://www.boost.org
7.
Distel, F., Sertkaya, B.: On the complexity of enumerating pseudo-intents. Discrete Appl. Math. 159, 450–466 (March 2011) 8. Ganter, B.: Attribute exploration with background knowledge. Theor. Comput. Sci. pp. 215–233 (1999) 9. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Berlin (1999) 10. Ganter, B.: Two basic algorithms in concept analysis. Preprint 831, Technische Hochschule Darmstadt, Germany (1984) 11. Guigues, J.L., Duquenne, V.: Familles minimales d’implications informatives re- sultant d’un tableau de donnees binaires. Math. Sci. Hum. 95(1), 5–18 (1986) 12. Kuznetsov, S., Obiedkov, S.: Comparing performance of algorithms for generating concept lattices. Journal of Experimental and Theoretical Artificial Intelligence 14(2/3), 189–216 (2002) 13. Kuznetsov, S.O., Obiedkov, S.: Some decision and counting problems of the Duquenne–Guigues basis of implications. Discrete Appl. Math. 156(11), 1994–2003 (2008) 14. Maier, D.: The theory of relational databases. Computer software engineering se- ries, Computer Science Press (1983) Comparing performance of algorithms for generating the DG basis 57 15. Mannila, H., Räihä, K.J.: The design of relational databases. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA (1992) 16. Mora, A., Aguilera, G., Enciso, M., Cordero, P., de Guzman, I.P.: A new closure algorithm based in logic: SLFD-Closure versus classical closures. Inteligencia Ar- tificial, Revista Iberoamericana de IA 10(31), 31–40 (2006) 17. Obiedkov, S., Duquenne, V.: Attribute-incremental construction of the canonical implication basis. Annals of Mathematics and Artificial Intelligence 49(1-4), 77–99 (April 2007) 18. Taouil, R., Bastide, Y.: Computing proper implications. In Proc. ICCS-2001 In- ternational Workshop on Concept Lattices-Based Theory, Methods and Tools for Knowledge Discovery in Databases pp. 290–303 (2001) 19. Valtchev, P., Duquenne, V.: On the merge of factor canonical bases. In: Medina, R., Obiedkov, S. (eds.) ICFCA. Lecture Notes in Computer Science, vol. 4933, pp. 182–198. Springer (2008) 20. Wild, M.: Computations with finite closure systems and implications. In: Comput- ing and Combinatorics. pp. 111–120 (1995) 21. Yevtushenko, S.A.: System of data analysis “Concept Explorer” (in Russian). In: Proceedings of the 7th national conference on Artificial Intelligence KII-2000. pp. 127–134. Russia (2000), http://conexp.sourceforge.net/ Filtering Machine Translation Results with Automatically Constructed Concept Lattices Yılmaz Kılıçaslan1 and Edip Serdar Güner1, 1 Trakya University, Department of Computer Engineering, 22100 Edirne, Turkey {yilmazk, eserdarguner}@trakya.edu.tr Abstract. Concept lattices can significantly improve machine translation systems when applied as filters to their results. We have developed a rule-based machine translator from Turkish to English in a unification-based programming paradigm and supplemented it with an automatically constructed concept lattice. The test results achieved by applying this translation system to a Turkish child story reveals that lattices used as filters to translation results have a promising potential to improve machine translation. We have compared our system with Google Translate on the data. The comparison suggests that a rule- based system can even compete with this statistical machine translation system that stands out with its wide range of users. Keywords: Concept Lattices, Rule-based Machine Translation, Evaluation of MT systems. 
1 Introduction Paradigms of Machine translation (MT) can be classified into two major categories depending on their focus: result-oriented paradigms and process-oriented ones. Statistical MT focuses on the result of the translation, not the translation process itself. In this paradigm, translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. Rule-based MT, a more classical paradigm, focuses on the selection of representations to be used and steps to be performed during the translation process. It is the rule-based paradigm that will be the concern of this paper. We argue for the viability of a rule-based translation model where a concept lattice functions as a filter for its results. In what follows, we first introduce the classical models for doing rule-based MT, illustrating particular problematic cases with translation pairs between Turkish and English (cf. Section 2). Then, we briefly introduce the basic notions of Formal Concept Analysis (FCA) and touch upon the question of how lattices built using FCA can serve as a bridge between two languages (cf. Section 3). This is followed by the presentation of our translation system (cf. Section 4). Subsequently, we report on and evaluate several experiments which we have performed by feeding our translation system with a Turkish child story text (cf. Section 5). The discussion ends with some remarks and with a summary of the paper (cf. Section 6). c 2011 by the paper authors. CLA 2011, pp. 59–73. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 60 Yılmaz Kılıçaslan and Edip Serdar Güner 2 Models for Rule-Based Translation 2.1 Direct Translation The most straightforward MT strategy is the so-called direct translation. Basically, the strategy is to translate each word into its target language counterpart while proceeding word-by-word through the source language text or speech. If the only difference between two languages were due to their lexical choices, this approach could be a very easy way of producing high quality translation. However, languages differ from each other not only lexically but also structurally. In fact, the direct translation strategy works very well only for very simple cases like the following: (1) Turkish: Direct Translation to English: Köpek-ler havlar-lar. Dogs bark. dog-pl bark-3pl In this example, the direct translation strategy provides us with a perfect translation of the Turkish sentence (interpreted as a kind-level statement about dogs). But, consider now the following example: (2) Turkish: Direct Translation to English: Supposing that the referent of the pronoun is a male person, the expected translation for the given Turkish sentence would be the following: (3) Correct Translation: The woman knows him. The direct translation approach fails in this example in the following respects: First, the translation results in a subject-object-verb (SOV) ordering, which does not comply with the canonical SVO ordering in English. SOV is the basic word order in Turkish. Second, the subject does not have the required definite article in the translation. The reason for this is another typological difference between the two languages: Turkish lacks a definite article. 
Third, the word-by-word translation leaves the English auxiliary verb ambiguous with respect to number, as the Turkish verb does not carry the number information. Fourth, the verb know is encoded in the progressive aspect in the translation, which is unacceptable as it denotes a mental state. This anomaly is the result of directly translating the Turkish continuous suffix –yor to the English suffix –ing. Fifth, the pronoun is left ambiguous with respect to gender in the translation, as Turkish pronouns do not bear this information. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 61 2.2 Transfer Approach 2.2.1 Syntactic Transfer As Jurafsky and Martin [6] point out, examples like those above suggest that the direct approach to MT is too focused on individual words and that we need to add phrasal and structural knowledge into our MT models to achieve better results. It is through the transfer approach that a rule-based strategy incorporates the structural knowledge into the MT model. In this approach, MT involves three phases: analysis, transfer, and generation. In the analysis phase, the source language text is parsed into a syntactic and/or semantic structure. In the transfer phase, the structure of the source language is transformed to a structure of the target language. The generation phase takes this latter structure as input and turns it to an actual text of the target language. Let us first see how the transfer technique can make use of syntactic knowledge to improve the translation result of the example discussed above. Assuming a simple syntactic paradigm, the input sentence can be parsed into the following structure: (4) Once the sentence has been parsed, the resulting tree will undergo a syntactic transfer operation to resemble the target parse tree and this will be followed by a lexical transfer operation to generate the target text: (5) The syntactic transfer exploits the following facts about English: a singular count noun must have a determiner and the subject agrees in number and person with the verb. Collecting the leaves of the target parse tree, we get the following output: (6) Translation via Syntactic Transfer: This output is free from the first three defects noted with the direct translation. However, the problem of encoding the mental state verb in progressive aspect and the 62 Yılmaz Kılıçaslan and Edip Serdar Güner gender ambiguity of the pronoun still await to be resolved. These require meaning- related knowledge to be incorporated into the MT model. 2.2.2 Semantic Transfer The context-independent aspect of meaning is called semantic meaning. A crucial component of the semantic meaning of a natural language sentence is its lexical aspect, which determines whether the situation that the sentence describes is a (punctual) event, a process or a state. This information is argued to be inherently encoded in the verb. Obviously, knowing is a mental state and, hence, cannot be realized in the progressive aspect. We can apply a shallow semantic analysis to our previously obtained syntactic structure, which will give us a tree structure enriched with aspectual information, and thereby achieve a more satisfactory transfer: (7) The resulting translation is the following: (8) Translation via Semantic Transfer: 2.3 Interlingua Approach There are two problems with the transfer model: it requires contrastive knowledge about languages and it requires such knowledge for every pair of languages. 
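To make the transfer machinery of Section 2.2 concrete, the following Python fragment is a minimal sketch, not the authors' unification-based Prolog implementation described later: it reorders the SOV parse of (2) (Kadın onu tanıyor) into SVO, performs lexical transfer, and applies the two English-specific repairs discussed above (determiner insertion and subject-verb agreement). Aspect and gender are deliberately ignored, so the output corresponds to (8) with the pronoun still ambiguous. The lexicon, the tuple representation of the parse and the function names are all hypothetical.

# A toy sketch of syntactic + lexical transfer for "Kadın onu tanıyor".
# Nothing here reflects the actual system of Section 4.3; the inflected
# Turkish forms are mapped directly to English stems, so the progressive
# aspect problem discussed above does not arise in this toy version.
LEXICON = {"kadın": "woman", "onu": "him/her/it", "tanıyor": "know"}

def transfer(sov_parse):
    subj, obj, verb = sov_parse              # Turkish canonical order: SOV
    svo = [subj, verb, obj]                  # structural transfer: SOV -> SVO
    words = [LEXICON[w] for w in svo]        # lexical transfer
    words[0] = "the " + words[0]             # singular count-noun subject needs a determiner
    words[1] = words[1] + "s"                # 3rd person singular agreement with the subject
    return (" ".join(words) + ".").capitalize()

print(transfer(("kadın", "onu", "tanıyor")))   # The woman knows him/her/it.

A semantic transfer step would additionally inspect the lexical aspect of the verb to rule out the progressive form, while the gender of the pronoun is only resolved by the lattice-based filtering of Section 4.4.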
If the meaning of the input can be extracted and encoded in a language-independent form and the output can, in turn, be generated out of this form, there will be no need for any kind of contrastive knowledge. A language-independent meaning representation language to be used in such a scheme is usually referred to as an interlingua. A common way to visualize the three approaches to rule-based MT is with Vauquois triangle shown below (adopted from [6]): Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 63 Fig. 1. The Vauquois triangle. As Jurafsky and Martin point out: [t]he triangle shows the increasing depth of analysis required (on both the analysis and generation end) as we move from the direct approach through transfer approaches, to interlingual approaches. In addition, it shows the decreasing amount of transfer knowledge needed as we move up the triangle, from huge amounts of transfer at the direct level (almost all knowledge is transfer knowledge for each word) through transfer (transfer rules only for parse trees or thematic roles) through interlingua (no specific transfer knowledge). (p. 867) 3 Lattice-Based Interlingua Strategy A question left open above is that of what kind of representation scheme can be used as an interlingua. There are many possible alternatives such as predicate calculus, Minimal Recursion Semantics or an event-based representation. Another interesting possibility is to use lattices built using Formal Concept Analysis (FCA) as meaning representations to this effect. FCA, developed by Ganter & Wille [5], assumes that data from an application are given by a formal context, a triple (G, M, I) consisting of two sets G and M and a so called incidence relation I between these sets. The elements of G are called the objects and the elements of M are called the attributes. The relation I holds between g and m, (g, m) ∈ I if and only if the object g has the attribute m. A formal context induces two operators, both of which usually denoted by ʹ. One of these operators maps each set of objects A to the set of attributes Aʹ which these objects have in common. The other operator maps each set of attributes B to the set of objects Bʹ which satisfies these attributes. FCA is in fact an attempt to give a formal definition of the notion of a ‘concept’. A formal concept of the context (G, M, I) is a pair (A, B) such that G ⊇ A = Aʹ and M ⊇ B Bʹ. A is called the extent and B the intent of the concept (A, B). The set of all concepts of the context (G, M, I) is denoted by C(G, M, I). This set is ordered by a subconcept – superconcept relation, which is a partial order relation denoted by ≤. If (A1, B1) and (A2, B2) are concepts in C(G, M, I), the former is said to 64 Yılmaz Kılıçaslan and Edip Serdar Güner be a subconcept of the latter (or, the latter a superconcept of the former), i.e., (A1, B1) ≤ (A2, B2), if and only if A1 ⊆ A2 (which is equivalent to B1 ⊇ B2). The ordered set C(G, M, I; ≤) is called the concept lattice or (Galois lattice) of the context (G, M, I). A concept lattice can be drawn as a (Hasse) diagram in which concepts are represented by nodes interconnected by lines going down from superconcept nodes to subconcept ones. Priss [15], rewording an idea first mentioned by Kipke & Wille [8], suggests that once linguistic databases are formalized as concept lattices, the lattices can serve as an interlingua. She explains how a concept lattice can serve as a bridge between two languages with the aid of the figure below (taken from [13]): Fig. 2. 
– A concept lattice as an interlingua. [This figure] shows separate concept lattices for English and German words for “building”. The main difference between English and German is that in English “house” only applies to small residential buildings (denoted by letter “H”), whereas in German even small office buildings (denoted by letter “O”) and larger residential buildings can be called “Haus”. Only factories would not normally be called “Haus” in German. The lattice in the top of the figure constitutes an information channel in the sense of Barwise & Seligman [2] between the German and the English concept lattice. ([15] p. 158) We consider Priss’s approach a promising avenue for interlingua-based translation strategies. We suggest that this approach can work not only for isolated words but also even for text fragments. In what follows, we will sketch out a strategy with interlingual concept lattices serving as filters for refining translation results. The strategy proceeds as follows: 1) Compile a concept lattice from a data source like WordNet. 2) Link the nodes of the lattice to their possibly corresponding expressions in the source and target language. 3) Translate the input text into the target language with no consideration of the pragmatic aspects of its meaning. 4) Integrate the concepts derived from the input text into the concept lattice. The main motivation behind this strategy is to refine the translation results to a certain extent by means of pragmatic knowledge structured as formal contexts. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 65 4 A Translation System with Interlingual Concept Lattices 4.1 A Concept Lattice Generator Concept lattices to be used as machine translation filters should contain concept nodes associated with both functional and substantive words. All languages have a finite number of functional words. Therefore, a manual construction of the lattice fragments that would contain them would be reasonable. However, manually constructing a concept lattice for lexical words would have considerable drawbacks such as the following: • It is labor intensive. • It is prone to yielding errors which are difficult to detect automatically. • It generates incomplete lists that are costly to extend to cover missing information. • It is not easy to adapt to changes and domain-specific needs. Taking these potential problems into consideration, we have developed a tool for generating concept lattices for lexical words automatically. As this is an FCA application, it is crucial to decide on which formal context to use before delving its implementation details. Priss & Old [16] propose to construct concept neighborhoods in WordNet with a formal context where the formal objects are the words of the synsets belonging to all senses of a word, the formal attributes are the words of the hypernymic synsets and the incidence relation is the semantic relation between the synsets and their hypernymic synsets. The neighborhood lattice of a word in WordNet consists of all words that share some senses with that word.1 Below is the neighborhood lattice their method yields for the word volume: Fig. 3. – Priss and Old’s neighborhood lattice for the word volume. 1 As lattices often grow very rapidly to a size too large to be visualized, Wille [18] describes a method for constructing smaller, so-called “neighborhood” lattices. 66 Yılmaz Kılıçaslan and Edip Serdar Güner Consider the bottom node. The concept represented by this node is not a naturally occurring one. 
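To keep the FCA notation of Section 3 concrete while discussing such neighborhood lattices, the following is a generic toy sketch (invented data, not the WordNet-based tool of this section) of the two derivation operators and of the test for a pair (A, B) being a formal concept.

# Toy formal context (G, M, I): objects are word senses, attributes their
# hypernyms; the incidence relation I is a set of (object, attribute) pairs.
I = {("volume_publication", "book"), ("volume_publication", "product"),
     ("volume_amount", "magnitude"), ("loudness", "magnitude"),
     ("loudness", "sound_property")}
G = {g for g, _ in I}                     # objects
M = {m for _, m in I}                     # attributes

def common_attributes(A):                 # A' : attributes shared by all objects of A
    return {m for m in M if all((g, m) in I for g in A)}

def common_objects(B):                    # B' : objects having all attributes of B
    return {g for g in G if all((g, m) in I for m in B)}

def is_concept(A, B):
    return common_attributes(A) == B and common_objects(B) == A

B = common_attributes({"volume_amount", "loudness"})
print(B, is_concept(common_objects(B), B))    # {'magnitude'} True

Enumerating all such pairs, for instance with the parallel Close-by-One-style algorithm of the FCALGS library used later in this section, yields the concept lattice drawn in the Hasse diagrams above.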
Obviously, the adopted formal context causes two distinct natural concepts to collapse into one single formal concept here. The reason is simply that WordNet employs one single word, i.e., volume, for two distinct senses, i.e., publication and amount. This could leave a translation attempt with the task of disambiguating this word. In fact, WordNet marks each sense with a single so-called synset number. When constructing concept lattices in WordNet, we suggest two amendments to the formal context adopted by Priss and Old. First, the formal objects are to be the synset numbers. Second, the formal attributes are to include also some information compiled from the glosses of the words. The first change allows us to distinguish between the two senses of the word volume, as shown in Fig. 4a. But, we are still far from resolving all ambiguities concerning this word, as indicated by the presence of two objects in the leftmost node. The problem is that the hypernymic attributes are not sufficiently informative to differentiate the 3-D space sense of the word volume from its relative amount sense. This extra information resides in the glosses of the word and once encoded as attributes it evokes the required effect, as shown in Fig. 4b. Fig. 4a. – A neighborhood lattice with the Fig. 4b. – A more fine-grained neighborhood objects being synset numbers. lattice with the objects being synset numbers. Each gloss, which is most likely a noun phrase, is parsed by means of a shift-reduce parser to extract a set of attributes. Having collected the objects (i.e. the synset numbers) and the associated attributes, the FCA algorithm that comes with the FCALGS library [9] is used for deriving a lattice-based ontology from that collection. FCALGS employs a parallel and recursive algorithm. Apart from its being parallel, it is very similar to Kuznetsov’s [10] Close-by-One algorithm. However, even the lattice in Fig4.b is still defective in at least one respect. The names of the objects denoted are lost. To remedy this problem, we suggest to encode the objects as tuples of synset numbers and sets of names, as illustrated below. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 67 Fig. 5. – A neighborhood lattice including the names of the objects. Another point to note is that the name of a synset serves as the attribute of a subconcept. For example, ‘entity’ is the name of the topmost synset. But, as everything is an entity, any subconcept must treat it as an element of its set of attributes. 4.2 A Sense Translator Each WordNet node is associated with a set of synonymous English words, which is referred to as its synset. Each synset, in effect, denotes a sense in English. Thus, one task to accomplish is to translate synsets into Turkish to the furthest possible extent. We should, of course, keep in mind that some synsets (i.e. some senses encoded in English) may not have a counterpart in the target language. To find the Turkish translation of a particular synset, the Sense Translator first downloads a set of relevant articles via the links given in the disambiguation pages Wikipedia provides for the words in this set. It searches for the hypernyms of the synset in these articles. It assigns each article a score in accordance with the sum of the weighted points of the hypernyms found in this article. More specifically, if a synset has N hypernyms, the Kth hypernym starting from the top is assigned WeightK = K/N. 
Let FrequencyK be the number of occurrences of an item in a given article, then the score of the article is calculated as follows: Article Score = Weight1 * Frequency1 + ... + WeightN * FrequencyN. (1) If the article with the highest score has a link to a Turkish article, the title of the article will be the translation of the English word under examination. Otherwise, the word will be left unpaired with a Turkish counterpart. Figure 6 visualizes how the word cat in WordNet is translated into its Turkish counterpart, kedi, via Wikipedia. 68 Yılmaz Kılıçaslan and Edip Serdar Güner Fig. 6. - Translating the word cat into Turkish via Wikipedia. The Turkish counterparts will be added next to the English names, as shown below: Fig. 7. - A neighborhood lattice including the Turkish counterparts of the English names. 4.3 A Rule-Based Machine Translator We have designed a transfer-based architecture for Turkish-English translation and implemented the translator in SWI-Prolog which is an open-source implementation of the Prolog programming language. Below is a figure representing the main modules of the translator: Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 69 Fig. 8. - The main modules of the rule-based machine translator. The word list extracted by the Preprocessor is used as an input to the Analysis Module. We have devised a shift-reduce parser in the analysis phase for building up the grammatical structure of expressions. Briefly, a shift-reduce parser uses a bottom- up strategy with an ultimate goal of building trees rooted with a start symbol [1]. The Generation Module first rearranges the constituents using transformation rules. Afterwards, all the structures are lexically transferred into English using a bilingual dictionary. 4.4 Filtering Translation Results with the Concept Lattice Let us turn to our exemplary sentence introduced in (2) (i.e. Kadın onu tanıyor). Failing to take the context of the sentence into account, the rule-based translator generates the result in (8) (i.e. The woman knows him/her/it), where the pronoun is left ambiguous with respect to gender. Our claim is that we can resolve such ambiguities using FCA and thereby refine our translations. To this effect, we propose to generate transient formal concepts for noun phrases. We make the following assumptions. Basically, personal pronouns, determiners and proper names introduce formal objects whereas adjectives and nouns encode formal attributes. Suppose that our sentence is preceded by (the Turkish paraphrase of) a sentence like ‘A man has arrived’. The indefinite determiner evokes a new formal object, say obj1. As the source text is in Turkish, all attributes will be Turkish words. The Turkish counterpart of the word man is adam. Thus, the transient concept for the subject of this sentence will be ({obj1}, {adam}). The task is now to embed this transient concept into the big permanent concept lattice. To do this, a node where the Turkish counterpart of the synset name is ‘adam’ is searched for. Immediately below this node is placed a new node with its set of objects being {obj1} and with no additional attributes. As this is a physical object, the subconcept of this new node has to be the 70 Yılmaz Kılıçaslan and Edip Serdar Güner lowest one. As for the second sentence, the NP kadın (the woman) will be associated with the transient concept ({X},{kadın}) and the pronoun onu (him/her/it) with the transient concept ({Y},{entity}). 
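As an aside on Section 4.2, the article-scoring step of the Sense Translator can be sketched as follows. This is an illustrative reconstruction under simplifying assumptions: the hypernym frequencies are supplied directly rather than counted in downloaded Wikipedia articles, the article names and numbers are invented, and only formula (1) with WeightK = K/N is reproduced.

# Hypothetical sketch of formula (1) used by the Sense Translator.
def article_score(hypernyms, frequencies):
    # hypernyms: chain from the top (most general) downwards;
    # the K-th hypernym gets Weight_K = K / N.
    n = len(hypernyms)
    return sum((k / n) * frequencies.get(h, 0)
               for k, h in enumerate(hypernyms, start=1))

hypernyms = ["entity", "animal", "carnivore", "feline"]       # invented chain for 'cat'
articles = {                                                   # invented occurrence counts
    "Cat":                  {"entity": 1, "animal": 12, "carnivore": 4, "feline": 9},
    "Cat (disambiguation)": {"entity": 2, "animal": 1},
}
best = max(articles, key=lambda a: article_score(hypernyms, articles[a]))
print(best)   # 'Cat'; its Turkish interlanguage link would yield the translation 'kedi'

If the highest-scoring article has no Turkish counterpart, the word is left unpaired, exactly as described above.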
X and Y are parameters to be anchored to particular formal objects. In other words, they are anaphoric. It seems plausible to assert that the attributes of an anaphoric object must constitute a (generally proper) subset or hypernym set of the attributes of the object serving as the antecedent. Assume that X is somehow anaphorically linked to an object obj2. Now, there are two candidate antecedents for Y. The woman, i.e. the object obj2, is barred from being the antecedent of the pronoun by a locality principle like the one stated in Chomsky's [3] Binding Theory: roughly stated, a pronoun and its antecedent cannot occur in the same clause. There remains one single candidate antecedent, obj1. As its attribute set is a hyponym set of {entity}, it can be selected as a legitimate antecedent. The concept node created for the man will also be the one denoted by the pronoun, with Y being instantiated with obj1. In the concept lattice constructed in WordNet, the concept named 'man' includes 'male person' in its set of attributes. Hence, the ambiguity is resolved and the pronoun translates into English as 'him'. It is worth noting that, in case there is more than one candidate antecedent, an anaphora resolution technique, especially a statistical one, can be employed to pick out the candidate most likely to be the antecedent. The interested reader is referred to Mitkov [12] for a survey of anaphora resolution approaches in general and to Kılıçaslan et al. [7] for anaphora resolution in Turkish.

The gender disambiguation process can also be carried out for common nouns. Consider the following fragment taken from a child story:

(9)

Turkish leaves not only pronouns but also many other words ambiguous with respect to the gender feature. The word 'kardeş' in this example is ambiguous between the translations sister and brother. This ambiguity will be resolved in favor of the former interpretation in a way similar to the disambiguation process sketched out for pronouns above.

In fact, the problem of sense disambiguation is a kind of specification problem. Therefore, it cannot be confined to gender disambiguation. For example, given that we have somehow managed to compile the attributes listed on the left-hand side below, our FCA-based system generates the translations listed on the right-hand side:

zehirli, diş ('poisonous, tooth'): fang
zehirli, mantar ('poisonous, mushroom'): toadstool
sivri, diş ('sharp, tooth'): fang
arka, koltuk ('rear, seat'): rumble
acemi, asker ('inexperienced, soldier'): recruit

It will, of course, be interesting to try to solve other kinds of translation problems with FCA-based techniques. We leave this task for future research.

5 Results and Evaluation

In the early years of MT, the quality of an MT system was determined by human judgment. Though specially trained for the purpose, human judges are prone, at the very least, to subjectivity. Besides, such an evaluation exercise is almost always more costly and time-consuming than an automated one. Some automated evaluation metrics have been developed in order to overcome these problems. Among these are BLEU, NIST, WER and PER. BLEU [14] and NIST [4] are rather specialized metrics. They are computed by considering the fraction of output n-grams that also appear in a set of human translations (n-gram precision). This allows the acknowledgment of a greater diversity of acceptable MT results.
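The n-gram precision underlying BLEU and NIST can be illustrated with the short sketch below; it computes plain n-gram precision against a set of references and omits the count clipping, length penalties and weighting schemes of the real metrics. The example strings are invented.

# Plain n-gram precision: the fraction of n-grams of the MT output that also
# occur in at least one human reference translation (BLEU/NIST refine this).
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, references, n=1):
    cand = ngrams(candidate.split(), n)
    ref = {g for r in references for g in ngrams(r.split(), n)}
    return sum(g in ref for g in cand) / len(cand) if cand else 0.0

refs = ["the woman knows him"]
print(ngram_precision("the woman knows him her it", refs, n=1))   # 4/6
print(ngram_precision("the woman knows him her it", refs, n=2))   # 3/5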
As for WER (Word Error Rate) and PER (Position-independent Word Error Rate), they are more general-purpose measures and they rely on a direct correspondence between the machine translation and a single human-produced reference. WER is based on the Levenshtein distance [11], i.e. the edit distance between a reference translation and its automatic translation, normalized by the length of the reference translation. This metric is formulated as:

WER = (S + D + I) / N    (2)

where N is the total number of words in the reference translation, S is the number of substituted words in the automatic translation, D is the number of reference words missing from (deleted in) the automatic translation, and I is the number of words inserted in the automatic translation that do not appear in the reference. Whereas WER requires exactly the same order of the words in the automatic translation and the reference, PER neglects word order completely [17]. It measures the difference in the counts of the words occurring in the automatic and reference translations, and the resulting number is divided by the number of words in the reference. It is worth noting that PER is technically not a distance measure, as it uses a position-independent Levenshtein distance where the distance between a sentence and one of its permutations is always taken to be zero.

We used WER to evaluate the performance of our MT system. This is probably the metric most commonly used for such purposes and, as we employed a single human-produced reference, it suits our evaluation setup well. We fed our system with a Turkish child story involving 91 sentences (970 words).2 We post-edited the resulting translation in order to generate a reference. When the necessary calculations were done in accordance with formula (2), the WER turned out to be 38%.

The next step was to see the extent to which the performance of our MT system could be improved using concept lattices as filters for the raw results. To this effect, we devised several concept lattices like the one in Fig. 3 and filtered the lexical constituents of each automatic translation with them. A considerable reduction in error rate is observed in our system supplemented with concept lattices: the WER score drops to around 30%.

One question that comes to mind at this point is whether the improvement achieved is statistically significant. To answer it, we had recourse to the Wilcoxon signed-rank test, which analyzes matched-pair numeric data by looking at the difference between the two values in each matched pair. When applied to the WER scores of the non-filtered and filtered translation results, the test shows that the difference is statistically significant (p < 0.005).

Another question is whether the results are practically satisfactory. To get some insight into this question, we need a baseline system for a comparison on usability. Google Translate, a statistical MT system that stands out with its wide range of users, can serve this purpose. The WER score obtained by employing Google Translate on our data is 34%. Recalling that the WER score of our system supplemented with concept lattices is 30%, we seem entitled to argue for the viability of rule-based MT systems. Of course, this claim must be made tentatively, since the size of the data on which the comparisons are made is relatively small. However, it should also be noted that we have employed a limited number of concept lattices of considerably small sizes.
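For completeness, the WER computation of formula (2) can be sketched as below: a standard word-level Levenshtein alignment between a single reference and a system output, normalized by the reference length. The example sentences are invented; the child-story data itself is not reproduced.

# Minimal sketch of WER (formula (2)) via word-level Levenshtein distance.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[-1][-1] / len(r)

print(wer("the woman knows him", "woman knows him her"))   # 2 edits / 4 words = 0.5

Applying this computation to the filtered and non-filtered outputs, sentence pair by sentence pair, yields the WER scores compared above.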
It is of no doubt that increasing the number and size of filtering lattices would improve the performance of our MT system. More importantly, we do not primarily have an NLP concern in this work. Rather, we would like the results to be evaluated from a computational linguistics perspective. Everything aside, the results show that even a toy lattice based ontology can yield statistically significant improvement for an MT system. 6 Conclusion In this paper, we have illustrated some translation problems caused by some typological divergences between Turkish and English using a particular example. We have gone through the direct translation, syntactic transfer and semantic transfer phases of the rule-based translation model to see what problem is dealt with in what phase. We have seen that a context-dependent pragmatic process is necessary to get to a satisfactory result. Concept lattices appear to be very efficient tools for accomplishing this pragmatic disambiguation task. Supplementing a rule-based MT system with concept lattices not only yields statistically significant improvement on the results of the system but also enables it to compete with a statistical MT system like Google Translate. 2 This is the story where the example in (9) comes from. Filtering Machine Transl. Results with Autom. Constructed Conc. Lattices 73 References 1. Aho, A.V., Ullman, J.D.: The Theory of Parsing, Translation, and Compiling, Vol. 1., Prentice Hall (1972) 2. Barwise J., Seligman, J.: Information Flow. The Logic of Distributed Systems. Cambridge University Press (1997) 3. Chomsky, N.: Lectures on Government and Binding, Foris, Dordrecht (1981). 4. Doddington, G.: “Automatic Evaluation of Machine Translation Quality Using N-gram Co- occurrence Statistics”. In Proceedings of HLT 2002 (2nd Conference on Human Language Technology). San Diego, California, 128-132 (2002) 5. Ganter, B., Wille, R.: Formale Begriffsanalyse: Mathematische Grundlagen. Berlin: Springer (1996) 6. Jurafsky, D., Martin, J. H.: Speech and Language Processing, 2nd Edition, Prentice Hall (2009) 7. Kılıçaslan, Y., Güner, E. S., Yıldırım, S.: Learning-based pronoun resolution for Turkish with a comparative evaluation, Computer Speech & Language, Volume 23, Issue 3, p. 311-331 (2009) 8. Kipke, U., Wille, R.: Formale Begriffsanalyse erläutert an einem Wortfeld. LDV–Forum, 5 (1987) 9. Krajca, P., Outrata, J., Vychodil, V.: Parallel Recursive Algorithm for FCA. In: Belohlavek R., Kuznetsov S. O. (Eds.): Proc. CLA 2008, CEUR WS, 433, 71–82 (2008) 10. Kuznetsov, S.: Learning of Simple Conceptual Graphs from Positive and Negative Examples. PKDD 1999, pp. 384–391 (1999) 11. Levenshtein, V. I.: "Binary codes capable of correcting deletions, insertions, and reversals," Tech. Rep. 8. (1966) 12. Mitkov, R.: Anaphora Resolution: The State of the Art. Technical Report, University of Wolverhampton (1999) 13. Old, L. J., Priss, U.: Metaphor and Information Flow. In Proceedings of the 12th Midwest Artificial Intelligence and Cognitive Science Conference, pp. 99-104 (2001) 14. Papineni, K., Roukos, S., Ward, T., Zhu, W. J.: "BLEU: a method for automatic evaluation of machine translation" in ACL-2002: 40th Annual meeting of the Association for Computational Linguistics pp. 311–318 (2002) 15. Priss, U.: Linguistic Applications of Formal Concept Analysis, Ganter; Stumme; Wille (eds.), Formal Concept Analysis, Foundations and Applications, Springer Verlag, LNAI 3626, pp. 149-160 (2005) 16. Priss, U., Old, L. 
J.: "Concept Neighbourhoods in Lexical Databases.", In Proceedings of the 8th International Conference on Formal Concept Analysis, ICFCA'10, Springer Verlag, LNCS 5986, p. 283-295 (2010) 17. Tillmann C., Vogel, S., Ney, H., Zubiaga A., Sawaf, H.: Accelerated DP based search for statistical translation. In European Conf. on Speech Communication and Technology, pages 2667–2670, Rhodes, Greece, September (1997) 18. Wille, R.: The Formalization of Roget’s International Thesaurus. Unpublished manuscript (1993) Concept lattices in fuzzy relation equations? Juan Carlos Dı́az and Jesús Medina?? Department of Mathematics. University of Cádiz Email: {juancarlos.diaz,jesus.medina}@uca.es Abstract. Fuzzy relation equations are used to investigate theoretical and applicational aspects of fuzzy set theory, e.g., approximate reasoning, time series forecast, decision making and fuzzy control, etc.. This paper relates these equations to a particular kind of concept lattices. 1 Introduction Recently, multi-adjoint property-oriented concept lattices have been introduced in [16] as a generalization of property-oriented concept lattices [10,11] to a fuzzy environment. These concept lattices are a new point of view of rough set the- ory [23] that considers two different sets: the set of objects and the set of at- tributes. On the other hand, fuzzy relation equations, introduced by E. Sanchez [28], are associated to the composition of fuzzy relations and have been used to in- vestigate theoretical and applicational aspects of fuzzy set theory [22], e.g., ap- proximate reasoning, time series forecast, decision making, fuzzy control, as an appropriate tool for handling and modeling of nonprobabilistic form of uncer- tainty, etc. Many papers have investigated the capacity to solve (systems) of fuzzy relation equations, e.g., in [1, 8, 9, 25, 26]. In this paper, the multi-adjoint relation equations are presented as a general- ization of the fuzzy relation equations [24,28]. This general environment inherits the properties of the multi-adjoint philosophy, consequently, e.g., several con- junctors and residuated implications defined on general carriers as lattice struc- tures can be used, which provide more flexibility in order to relate the variables considered in the system. Moreover, multi-adjoint property-oriented concept lattices and systems of multi-adjoint relation equations have been related in order to obtain results that ensure the existence of solutions in these systems. These definitions and results are illustrated by a toy example to improve the readability and comprehension of the paper. Among all concept lattice frameworks, we have related the multi-adjoint property-oriented concept lattices to the systems of multi-adjoint relation equa- tions, e.g., the extension and intension operators of this concept lattice can be ? Partially supported by the Spanish Science Ministry TIN2009-14562-C05-03 and by Junta de Andalucı́a project P09-FQM-5233. ?? Corresponding author. c 2011 by the paper authors. CLA 2011, pp. 75–86. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 76 Juan Carlos Dı́az and Jesús Medina-Moreno used to represent multi-adjoint relation equations, and, as a result, the solu- tions of these systems of relation equations can be related to the concepts of the corresponding concept lattice. 
The more important consequence is that this relation provides that the prop- erties given, e.g., in [2–4,12,14,17,18,27] can be applied to obtain many properties of these systems. Indeed, it can be considered that the algorithms presented, e.g., in [5, 6, 15] obtain solutions for these systems. The plan of this paper is the following: in Section 2 we will recall the multi- adjoint property-oriented concept lattices as well as the basic operators used and some properties; later, in Section 3, an example will be introduced to motivate the multi-adjoint relation equations. Once these equations have been presented, in Section 4 the multi-adjoint property-oriented concept lattices and the systems of multi-adjoint relation equations will be related in order to obtain results which ensure the existence of solutions in these systems; the paper ends with some conclusions and prospects for future work. 2 Multi-adjoint property-oriented concept lattices The basic operators in this environment are the adjoint triples, which are formed by three mappings: a non-commutativity conjunctor and two residuated impli- cations [13], which satisfy the well-known adjoint property. Definition 1. Let (P1 , ≤1 ), (P2 , ≤2 ), (P3 , ≤3 ) be posets and & : P1 × P2 → P3 , . : P3 × P2 → P1 , - : P3 × P1 → P2 be mappings, then (&, ., -) is an adjoint triple with respect to P1 , P2 , P3 if: 1. & is order-preserving in both arguments. 2. . and - are order-preserving on the first argument1 and order-reversing on the second argument. 3. x ≤1 z . y iff x & y ≤3 z iff y ≤2 z - x, where x ∈ P1 , y ∈ P2 and z ∈ P3 . Example of adjoint triples are the Gödel, product and Lukasiewicz t-norms together with their residuated implications. Example 1. Since the Gödel, product and Lukasiewicz t-norms are commuta- tive, the residuated implications satisfy that .G =-G , .P =-P and .L =-L . Therefore, the Gödel, product and Lukasiewicz adjoint triples are defined on [0, 1] as: &P (x, y) = x · y z -P x = min(1, z/x) ( 1 if x ≤ z &G (x, y) = min(x, y) z -G x = z otherwise &L (x, y) = max(0, x + y − 1) z -G x = min(1, 1 − x + z) 1 Note that the antecedent will be evaluated on the right side, while the consequent will be evaluated on the left side, as in logic programming framework. 2 Concept lattices in fuzzy relation equations 77 In [19] more general examples of adjoint triples are given. The basic structure, which allows the existence of several adjoint triples for a given triplet of lattices, is the multi-adjoint property-oriented frame. Definition 2. Given two complete lattices (L1 , 1 ) and (L2 , 2 ), a poset (P, ≤) and adjoint triples with respect to P, L2 , L1 , (&i , .i , -i ), for all i = 1, . . . , l, a multi-adjoint property-oriented frame is the tuple (L1 , L2 , P, 1 , 2 , ≤, &1 , .1 , -1 , . . . , &l , .l , -l ) Multi-adjoint property-oriented frames are denoted as (L1 , L2 , P, &1 , . . . , &l ). Note that the notation is similar to a multi-adjoint frame [18], although the adjoint triples are defined on different carriers. The definition of context is analogous to the one given in [18]. Definition 3. Let (L1 , L2 , P, &1 , . . . , &l ) be a multi-adjoint property-oriented frame. A context is a tuple (A, B, R, σ) such that A and B are non-empty sets (usually interpreted as attributes and objects, respectively), R is a P -fuzzy rela- tion R : A × B → P and σ : B → {1, . . . 
, l} is a mapping which associates any element in B with some particular adjoint triple in the frame.2 From now on, we will fix a multi-adjoint property-oriented frame and context, (L1 , L2 , P, &1 , . . . , &l ), (A, B, R, σ). ↓N Now we define the following mappings ↑π : LB A 2 → L1 and : LA B 1 → L2 as g ↑π (a) = sup{R(a, b) &σ(b) g(b) | b ∈ B} (1) N f ↓ (b) = inf{f (a) -σ(b) R(a, b) | a ∈ A} (2) Clearly, these definitions3 generalize the classical possibility and necessity operators [11] and they form an isotone Galois connection [16]. There are two dual versions of the notion of Galois connetion. The most famous Galois connec- tion, where the maps are order-reversing, is properly called Galois connection, and the other in which the maps are order-preserving, will be called isotone Ga- lois connection. In order to make this contribution self-contained, we recall their formal definitions: Let (P1 , ≤1 ) and (P2 , ≤2 ) be posets, and ↓ : P1 → P2 , ↑ : P2 → P1 mappings, the pair (↑ , ↓ ) forms a Galois connection between P1 and P2 if and only if: ↑ and ↓ are order-reversing; x ≤1 x↓↑ , for all x ∈ P1 , and y ≤2 y ↑↓ , for all y ∈ P2 . The one we adopt here is the dual definition: Let (P1 , ≤1 ) and (P2 , ≤2 ) be posets, and ↓ : P1 → P2 , ↑ : P2 → P1 mappings, the pair (↑ , ↓ ) forms an isotone Galois connection between P1 and P2 if and only if: ↑ and ↓ are order-preserving; x ≤1 x↓↑ , for all x ∈ P1 , and y ↑↓ ≤2 y, for all y ∈ P2 . 2 A similar theory could be developed by considering a mapping τ : A → {1, . . . , l} which associates any element in A with some particular adjoint triple in the frame. 3 From now on, to improve readability, we will write &b , -b instead of &σ(b) , -σ(b) . 3 78 Juan Carlos Dı́az and Jesús Medina-Moreno A concept, in this environment, is a pair of mappings hg, f i, with g ∈ LB , f ∈ N L , such that g ↑π = f and f ↓ = g, which will be called multi-adjoint property- A oriented formal concept. In that case, g is called the extension and f , the inten- sion of the concept. The set of all these concepts will be denoted as MπN [16]. Definition 4. A multi-adjoint property-oriented concept lattice is the set N ↑π MπN = {hg, f i | g ∈ LB A 2 , f ∈ L1 and g = f, f ↓ = g} in which the ordering is defined by hg1 , f1 i hg2 , f2 i iff g1 2 g2 (or equivalently f1 1 f2 ). The pair (MπN , ) is a complete lattice [16], which generalize the concept lattice introduced in [7] to a fuzzy environment. 3 Multi-adjoint relation equations This section begins with an example that motivates the definition of multi- adjoint relation equations, which will be introduced later. 3.1 Multi-adjoint logic programming A short summary of the main features of multi-adjoint languages will be pre- sented. The reader is referred to [20, 21] for a complete formulation. A language L contains propositional variables, constants, and a set of logical connectives. In this fuzzy setting, the usual connectives are adjoint triples and a number of aggregators. The language L is interpreted on a (biresiduated) multi-adjoint lattice,4 hL, , .1 , -1 , &1 , . . . , .n , -n , &n i, which is a complete lattice L equipped with a collection of adjoint triples h&i , .i , -i i, where each &i is a conjunctor in- tended to provide a modus ponens-rule with respect to .i and -i . A rule is a formula A .i B or A -i B, where A is a propositional symbol (usually called the head) and B (which is called the body) is a formula built from propositional symbols B1 , . . . 
, Bn (n ≥ 0), truth values of L and conjunctions, disjunctions and aggregations. Rules with an empty body are called facts. A multi-adjoint logic program is a set of pairs hR, αi, where R is a rule and α is a value of L, which may express the confidence which the user of the system has in the truth of the rule R. Note that the truth degrees in a given program are expected to be assigned by an expert. Example 2. Let us to consider a multi-adjoint lattice h[0, 1], ≤, ←G , &G , ←P , &P , ∧L i 4 Note that a multi-adjoint lattice is a particular case of a multi-adjoint property- oriented frame. 4 Concept lattices in fuzzy relation equations 79 where &G and &P are the Gödel and product conjunctors, respectively, and ←G , ←P their corresponding residuated implications. Moreover, the Lukasie- wicz conjunctor ∧L will be used in the program [13]. Given the set of variables (propositional symbols) Π = {low oil, low water, rich mixture, overheating, noisy behaviour, high fuel consumption} the following set of multi-adjoint rules form a multi-adjoint program, which may represent the behaviour of a motor. hhigh fuel consumption ←G rich mixture ∧L low oil, 0.8i hoverheating ←G low oil, 0.5i hnoisy behaviour ←P rich mixture, 0.8i hoverheating ←P low water, 0.9i hnoisy behaviour ←G low oil, 1i The usual procedural is to measure the levels of “oil”, “water” and “mix- ture” of a specific motor, after that the values for low oil, low water and rich mixture are obtained, which are represented in the program as facts, for instance, the next ones can be added to the program: hlow oil ←P >, 0.2i hlow water ←P >, 0.2i hrich mixture ←P >, 0.5i Finally, the values for the rest of variables (propositional symbols) are com- puted [20]. For instance, in order to attain the value for overheating(o, w), for a level of oil, o, and water, w, the rules hoverheating ←G low oil, ϑ1 i and hoverheating ←P low water, ϑ2 i are considered and its value is obtained as: overheating(o, w) = (low oil(o) &G ϑ1 ) ∨ (low water(w) &P ϑ2 ) (3) Now, the problem could be to recompute the weights of the rules from experimental instances of the variables, that is, the values of overheating, noisy behaviour and high fuel consumption are known for particular mea- sures of low oil, low water and rich mixture. Specifically, given the levels of oil, o1 , . . . , on , the levels of water, w1 , . . . , wn , and the measures of mixture, t1 , . . . , tn , we may experimentally know the values of the variables: noisy behaviour(ti , oi ), high fuel consumption(ti , oi ) and overheating(oi , wi ), for all i ∈ {1, . . . , n}. Considering Equation (3), the unknown elements could be ϑ1 and ϑ2 instead of overheating(o, w). Therefore, the problem now is to look for the values of ϑ1 and ϑ2 , which solve the following system obtained after assuming the exper- imental data for the propositional symbols, ov1 , o1 , w1 , . . . , ovn , on , wn . overheating(ov1 ) = (low oil(o1 ) &G ϑ1 ) ∨ (low water(w1 ) &P ϑ2 ) .. .. .. .. . . . . overheating(ovn ) = (low oil(on ) &G ϑ1 ) ∨ (low water(wn ) &P ϑ2 ) 5 80 Juan Carlos Dı́az and Jesús Medina-Moreno This system can be interpreted as a system of fuzzy relation equations in which several conjunctors, &G and &P , are assumed. Moreover, these conjunctors could be neither non-commutative nor associative and defined in general lattices, as permit the multi-adjoint framework. Next sections introduce when these systems have solutions and a novel method to obtain them using concept lattice theory. 
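Before these systems are formalised, the evaluation scheme of Equation (3) can be made concrete. The short sketch below is an illustration, not part of any actual multi-adjoint implementation: it codes the Gödel and product conjunctors with their residuated implications from Example 1 and evaluates overheating for the facts and rule weights of the program above.

# Gödel and product adjoint pairs on [0,1] (Example 1) and Equation (3):
# overheating(o, w) = (low_oil(o) &_G theta1) ∨ (low_water(w) &_P theta2).
def and_godel(x, y):            # &_G
    return min(x, y)

def impl_godel(z, x):           # residuated implication of &_G
    return 1.0 if x <= z else z

def and_product(x, y):          # &_P
    return x * y

def impl_product(z, x):         # residuated implication of &_P
    return 1.0 if x == 0 else min(1.0, z / x)

def overheating(low_oil, low_water, theta1, theta2):
    # the join ∨ on [0,1] is the maximum
    return max(and_godel(low_oil, theta1), and_product(low_water, theta2))

# Facts and overheating-rule weights taken from the program of Example 2:
print(overheating(low_oil=0.2, low_water=0.2, theta1=0.5, theta2=0.9))  # max(0.2, 0.18) = 0.2
# The adjoint property gives the greatest theta with x &_G theta <= z (illustrative values):
print(impl_godel(0.5, 0.3))                                             # 1.0

Solving a whole system then amounts to finding weights that reproduce every observed value simultaneously, which is what the following sections characterise in terms of concept lattices.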
3.2 Systems of multi-adjoint relation equations The operators used in order to obtain the systems will be the generalization of the sup-∗-composition, introduced in [29], and inf-→-composition, introduced in [1]. From now on, a multi-adjoint property-oriented frame, (L1 , L2 , P, &1 , . . . , &l ) will be fixed. In the definition of a multi-adjoint relation equation an interesting mapping σ : U → {1, . . . , l} will be considered, which relates each element in U to an adjoint triple. This mapping will play a similar role as the one given in a multi- adjoint context, defined in the previous section, for instance, this map provides a partition of U in preference sets. A similar theory may be developed for V instead of U . Let U = {u1 , . . . , um } and V = {v1 , . . . , vn } be two universes, R ∈ L2 U ×V an unknown fuzzy relation, σ : U → {1, . . . , l} a map that relates each element in U to an adjoint triple, and K1 , . . . , Kn ∈ P U , D1 , . . . , Dn ∈ L1 V arbitrarily chosen fuzzy subsets of the respective universes. A system of multi-adjoint relation equations with sup-&-composition, is the following system of equations _ (Ki (u) &u R(u, v)) = Di (v), i ∈ {1, . . . , n} (4) u∈U where &u represents the adjoint conjunctor associated to u by σ, that is, if σ(u) = (&s , .s , -s ), for s ∈ {1, . . . , l}, then &u is exactly &s . If an element v of V is fixed and the elements Ki (uj ), R(uj , v) and Di (v) are written as kij , xj and di , respectively, for each i ∈ {1, . . . , n}, j ∈ {1, . . . , m}, then System (4) can particularly be written as k11 &u1 x1 ∨ · · · ∨ k1m &um xm = d1 .. .. .. .. (5) . . . . kn1 &u1 x1 ∨ · · · ∨ knm &um xm = dn where kij and di are known and xj must be obtained. Hence, for each v ∈ V , if we solve System (5), then we obtain a “column” of R (i.e. the elements R(uj , v), with j ∈ {1, . . . , m}), thus, solving n similar systems, one for each v ∈ V , the unknown relation R is obtained. Example 3. Assuming Example 2, in this case, we will try to solve the problem about to obtain the weights associated to the rules from particular observed data for the propositional symbols. 6 Concept lattices in fuzzy relation equations 81 The propositional symbols (variables) will be written in short as: hfc, nb, oh, rm, lo and lw, and the measures of particular cases of the behaviour of the motor will be: hi , ni , ovi , ri , oi , wi , for hfc, nb, oh, rm, lo and lw, respectively, in each case i, with i ∈ {1, 2, 3}. For instance, the next system associated to overheating is obtained from the computation provided in Example 2. oh(ov1 ) = (lo(o1 ) &G ϑoh oh lo ) ∨ (lw(w1 ) &P ϑlw ) oh(ov2 ) = (lo(o2 ) &G ϑoh oh lo ) ∨ (lw(w2 ) &P ϑlw ) oh(ov3 ) = (lo(o3 ) &G ϑoh oh lo ) ∨ (lw(w3 ) &P ϑlw ) where ϑoh oh lo and ϑlw are the weights associated to the rules with head oh. Similar systems can be obtained to high fuel consumption and noisy behaviour. Assuming the multi-adjoint frame with carrier L = [0, 1] and the Gödel and product triples, these systems are particular systems of multi-adjoint relational equations. The corresponding context is formed by the sets U = {rm, lo, lw, rm∧L lo}, V = {hfc, nb, oh}; the mapping σ that relates the elements lo, rm ∧L lo to the Gödel triple, and rm, lw to the product triple; the mappings K1 , . . . , Kn ∈ P U , defined as the values given by the propositional symbols in U on the ex- perimental data, for instance, if u = lo, then K1 (lo) = lo(o1 ), . . . , Kn (lo) = lo(on ); and the mappings D1 , . . . 
, Dn ∈ L1 V , defined analogously, for instance, if v = rm, then D1 (rm) = rm(r1 ), . . . , Dn (rm) = rm(rn ). Finally, the unknown fuzzy relation R ∈ L2 U ×V is formed by the weights of the rules in the program. In the system above, oh has been the element v ∈ V fixed. Moreover, as there do not exist rules with body rm and rm ∧L lo, that is, the weights for that hypothetical rules are 0, then the terms (rm(ri ) &G 0 = 0 and (rm(ri ) ∧L lo(oi ) &P 0 = 0 do not appear. Its counterpart is a system of multi-adjoint relation equations with inf--- composition, that is, ^ (R(u, v) -uj Kj∗ (v)) = Ej (u), j ∈ {1, . . . , m} (6) v∈V considered with respect to unknown fuzzy relation R ∈ L1 U ×V , and where K1∗ , . . . , Km ∗ ∈ P V and E1 , . . . , Em ∈ L2 U . Note that -uj represents the corre- sponding adjoint implication associated to uj by σ, that is, if σ(uj ) = (&s , .s , -s ), for s ∈ {1, . . . , l}, then -uj is exactly -s . Remark that in System 6, the implication -uj does not depend of the element u, but of j. Hence, the implications used in each equation of the system are the same. If an element u of U is fixed, fuzzy subsets K1∗ , . . . , Km∗ ∈ P V , E1 , . . . , Em ∈ U ∗ L2 are assumed, such that Kj (vi ) = kij , R(u, vi ) = yi and Ej (u) = ej , for each i ∈ {1, . . . , n}, j ∈ {1, . . . , m}, then System (6) can particularly be written as y1 -u1 k11 ∧ · · · ∧ yn -u1 kn1 = e1 .. .. .. .. (7) . . . . y1 -um k1m ∧ · · · ∧ yn -um knm = em 7 82 Juan Carlos Dı́az and Jesús Medina-Moreno Therefore, for each u ∈ U , we obtain a “row” of R (i.e. the elements R(u, vi ), with i ∈ {1, . . . , n}), consequently, solving m similar systems, the unknown relation R is obtained. Systems (5) and (7) have the same goal, searching for the unknown relation R although the mechanism is different. Analyzing these systems, we have that the left side of these systems can be represented by the mappings CK : Lm n n m 2 → L1 , IK ∗ : L1 → L2 , defined as: CK (x̄)i = ki1 &u1 x1 ∨ · · · ∨ kim &um xm , for all i ∈ {1, . . . , n} (8) IK ∗ (ȳ)j = y1 -uj k1j ∧ · · · ∧ yn -uj knj , for all j ∈ {1, . . . , m} (9) where x̄ = (x1 , . . . , xm ) ∈ Lm n 2 , ȳ = (y1 , . . . , yn ) ∈ L1 , and CK (x̄)i , IK ∗ (ȳ)j are the components of CK (x̄), IK ∗ (ȳ), respectively, for each i ∈ {1, . . . , n} and j ∈ {1, . . . , m}. Hence, Systems (5) and (7) can be written as: CK (x1 , . . . , xm ) = (d1 , . . . , dn ) (10) IK ∗ (y1 , . . . , yn ) = (e1 , . . . , em ) (11) respectively. 4 Relation between multi-adjoint property-oriented concept lattices and multi-adjoint relation equation This section shows that Systems (5) and (7) can be interpreted in a multi- adjoint property-oriented concept lattice. And so, the properties given to the N isotone Galois connection ↑π and ↓ , as well as to the complete lattice MπN can be used in the resolution of these systems. First of all, the environment must be fixed. Hence, a multi-adjoint context (A, B, S, σ) will be considered, such that A = V 0 , B = U , where V 0 has the same cardinality as V , σ will be the mapping given by the systems and S : A × B → P is defined as S(vi0 , uj ) = kij . Note that A = V 0 is related to the mappings Ki , since S(vi0 , uj ) = kij = Ki (uj ); Now, we will prove that the mappings defined at the end of the previous section are related to the isotone Galois connection. Given µ ∈ LB 2 , such that µ(uj ) = xj , for all j ∈ {1, . . . , m}, the following equalities are obtained, for each i ∈ {1, . . . 
, n}: CK (x̄)i = ki1 &u1 x1 ∨ · · · ∨ kim &um xm = S(vi0 , u1 ) &u1 µ(u1 ) ∨ · · · ∨ S(vi0 , um ) &um µ(um ) = sup{S(vi0 , uj ) &uj µ(uj ) | j ∈ {1, . . . , m}} = µ↑π (vi0 ) ↑π Therefore, the mapping CK : Lm n 2 → L1 is equivalent to the mapping : LB 2 → A m B L1 , where an element x̄ in L2 can be interpreted as a map µ in L2 , such that 8 Concept lattices in fuzzy relation equations 83 µ(uj ) = xj , for all j ∈ {1, . . . , m}, and the element CK (x̄) as the mapping µ↑π , such that µ↑π (vi0 ) = CK (x̄)i , for all i ∈ {1, . . . , n}. An analogy can be developed applying the above procedure to mappings IK ∗ N ↓N and ↓ , obtaining that the mappings IK ∗ : Ln1 → Lm 2 and : LA B 1 → L2 are equivalent. As a consequence, the following result holds: Theorem 1. The mappings CK : Lm n n m 2 → L1 , IK ∗ : L1 → L2 , establish an iso- m m tone Galois connection. Therefore, IK ∗ ◦ CK : L2 → L2 is a closure operator and CK ◦ IK ∗ : Ln1 → Ln1 is an interior operator. As (CA , IK ∗ ) is an isotone Galois connection, any result about the solvability of one system has its dual counterpart. The following result explains when these systems can be solved and how a solution can be obtained. N Theorem 2. System (5) can be solved if and only if hλ↓d¯ , λd¯i is a concept of MπN , where λd¯ : A = {v1 , . . . , vn } → L1 , defined as λd¯(vi ) = di , for all N i ∈ {1, . . . , n}. Moreover, if System (5) has a solution, then λ↓d¯ is the greatest solution of the system. Similarly, System (7) can be solved if and only if hµē , µē↑π i is a concept of MπN , where µē : B = {u1 , . . . , um } → L2 , defined as µē (uj ) = ej , for all j ∈ {1, . . . , m}. Furthermore, if System (7) has a solution, then µ↑ē π is the smallest solution of the system. The main contribution of the relation introduced in this paper is not only the above consequences, but a lot of other properties for Systems (5) and (7) that can be stabilized from the results proved, for example, in [2–4, 12, 14, 17, 18, 27]. Next example studies the system of multi-adjoint relation equations presented in Example 3. Example 4. The aim will be to solve a small system in order to improve the understanding of the method. In the environment of Example 3, the following system will be solved assuming the experimental data: oh(ov1 ) = 0.5, lo(o1 ) = 0.3, lw(w1 ) = 0.3, oh(ov2 ) = 0.7, lo(o2 ) = 0.6, lw(w2 ) = 0.8, oh(ov3 ) = 0.4, lo(o3 ) = 0.5, lw(w3 ) = 0.2. oh(ov1 ) = (lo(o1 ) &G ϑoh oh lo ) ∨ (lw(w1 ) &P ϑlw ) oh(ov2 ) = (lo(o2 ) &G ϑoh oh lo ) ∨ (lw(w2 ) &P ϑlw ) oh(ov3 ) = (lo(o3 ) &G ϑoh oh lo ) ∨ (lw(w3 ) &P ϑlw ) where ϑoh oh lo and ϑlw are the variables. The context is: A = V 0 = {1, 2, 3}, the set of observations, B = U = {lo, lw}, σ associates the propositional symbol lo to the Gödel triple and lw to the product triple. The relation S : A × B → [0, 1] is defined in Table 1. Therefore, considering the mapping λoh : A → [0, 1] associated to the values of overheating in each experimental case, that is λoh (1) = 0.5, λoh (2) = 0.7, 9 84 Juan Carlos Dı́az and Jesús Medina-Moreno Table 1. Relation S. 
low oil low water 1 0.3 0.3 2 0.6 0.8 3 0.5 0.2 and λoh (3) = 0.4; and the mapping CK : [0, 1]2 → [0, 1]3 , defined in Equation (8), the system above can be written as CK (ϑoh oh lo , ϑlw ) = λoh Since, by the comment above, there exists µ ∈ [0, 1]B , such that CK (ϑoh oh lo , ϑlw ) = ↑π B ↑π µ , the goal will be to attain the mapping µ ∈ [0, 1] , such that µ = λoh , N which can be found if and only if ((λoh )↓ , λoh ) is a multi-adjoint property- oriented concept in the considered context, by Theorem 2. N First of all, we compute (λoh )↓ . N (λoh )↓ (lo) = inf{λoh (1) -G S(1, lo), λoh (2) -G S(2, lo), λoh (3) -G S(3, lo)} = inf{0.5 -G 0.3, 0.7 -G 0.6, 0.4 -G 0.5} = inf{1, 1, 0.4} = 0.4 ↓N (λoh ) (lw) = inf{0.5 -P 0.3, 0.7 -P 0.8, 0.4 -P 0.2} = inf{1, 0.875, 1} = 0.875 N Now, the mapping (λoh )↓ ↑π is obtained. N N N (λoh )↓ ↑π (1) = sup{S(1, lo) &G (λoh )↓ (lo), S(1, lw) &P (λoh )↓ (lw)} = sup{0.3 &G 0.4, 0.3 &P 0.875} = sup{0.3, 0.2625} = 0.3 ↓N ↑π (λoh ) (2) = sup{0.6 &G 0.4, 0.8 &P 0.875} = 0.7 ↓N ↑π (λoh ) (3) = sup{0.5 &G 0.4, 0.2 &P 0.875} = 0.4 N Therefore, ((λoh )↓ , λoh ) is not a multi-adjoint property-oriented concept and thus, the considered system has no solution, although if the experimental value for oh had been 0.3 instead of 0.5, the system would have had a solution. These changes could be considered in several applications where noisy vari- ables exist and their values can be conveniently changed to obtain approximate solutions for the systems. Thus, if the experimental data for overheating are oh(ov1 ) = 0.3, oh(ov2 ) = 0.7 and oh(ov2 ) = 0.4, then the original system will have at least one solution and the values ϑoh oh lo , ϑlw will be 0.4, 0.875, respectively for a solution. Consequently, the truth for the first rule is lower than for the second or it might be thought that it is more determinant in obtaining higher 10 Concept lattices in fuzzy relation equations 85 values for lw than for lo. Another possibility is to consider that this conclusion about the certainty of the rules is not correct, in which case another adjoint triple might be associate to lo. As a result, the properties introduced in several fuzzy formal concept anal- ysis frameworks can be applied in order to obtain solutions of fuzzy relation equations, as well as in the multi-adjoint general framework. Furthermore, in order to obtain the solutions of Systems (5) and (7), the algorithms developed, e.g., in [5, 6, 15], can be used. 5 Conclusions and future work Multi-adjoint relation equations have been presented that generalize the existing definitions presented at this time. In this general environment, different conjunc- tors and residuated implications can be used, which provide more flexibility in order to relate the variables considered in the system. A toy example has been introduced in the paper in order to improve its readability and reduce the complexity of the definitions and results. As a consequence of the results presented in this paper, several of the prop- erties provided, e.g., in [2–4, 12, 14, 17, 18, 27], can be used to obtain additional characteristics of these systems. In the future, we will apply the results provided in the fuzzy formal con- cept analysis environments to the general systems of fuzzy relational equations presented here. References 1. W. Bandler and L. Kohout. Semantics of implication operators and fuzzy relational products. Int. J. Man-Machine Studies, 12:89–116, 1980. 2. E. Bartl, R. Bělohlávek, J. Konecny, and V. Vychodil. 
Isotone galois connections and concept lattices with hedges. In 4th International IEEE Conference “Intelli- gent Systems”, pages 15.24–15.28, 2008. 3. R. Bělohlávek. Lattices of fixed points of fuzzy Galois connections. Mathematical Logic Quartely, 47(1):111–116, 2001. 4. R. Bělohlávek. Concept lattices and order in fuzzy logic. Annals of Pure and Applied Logic, 128:277–298, 2004. 5. R. Bělohlávek, B. D. Baets, J. Outrata, and V. Vychodil. Lindig’s algorithm for concept lattices over graded attributes. Lecture Notes in Computer Science, 4617:156–167, 2007. 6. R. Bělohlávek, B. D. Baets, J. Outrata, and V. Vychodil. Computing the lattice of all fixpoints of a fuzzy closure operator. IEEE Transactions on Fuzzy Systems, 18(3):546–557, 2010. 7. Y. Chen and Y. Yao. A multiview approach for intelligent data analysis based on data operators. Information Sciences, 178(1):1–20, 2008. 8. B. De Baets. Analytical solution methods for fuzzy relation equations. In D. Dubois and H. Prade, editors, The Handbooks of Fuzzy Sets Series, volume 1, pages 291– 340. Kluwer, Dordrecht, 1999. 11 86 Juan Carlos Dı́az and Jesús Medina-Moreno 9. A. Di Nola, S. Sessa, W. Pedrycz, and E. Sanchez. Fuzzy Relation Equations and Their Applications to Knowledge Engineering. Kluwer, 1989. 10. I. Düntsch and G. Gediga. Approximation operators in qualitative data analysis. In Theory and Applications of Relational Structures as Knowledge Instruments, pages 214–230, 2003. 11. G. Gediga and I. Düntsch. Modal-style operators in qualitative data analysis. In Proc. IEEE Int. Conf. on Data Mining, pages 155–162, 2002. 12. G. Georgescu and A. Popescu. Non-dual fuzzy connections. Arch. Math. Log., 43(8):1009–1039, 2004. 13. P. Hájek. Metamathematics of Fuzzy Logic. Trends in Logic. Kluwer Academic, 1998. 14. H. Lai and D. Zhang. Concept lattices of fuzzy contexts: Formal concept analysis vs. rough set theory. International Journal of Approximate Reasoning, 50(5):695– 707, 2009. 15. C. Lindig. Fast concept analysis. In G. Stumme, editor, Working with Conceptual Structures-Contributions to ICCS 2000, pages 152–161, 2000. 16. J. Medina. Towards multi-adjoint property-oriented concept lattices. Lect. Notes in Artificial Intelligence, 6401:159–166, 2010. 17. J. Medina and M. Ojeda-Aciego. Multi-adjoint t-concept lattices. Information Sciences, 180(5):712–725, 2010. 18. J. Medina, M. Ojeda-Aciego, and J. Ruiz-Calviño. Formal concept analysis via multi-adjoint concept lattices. Fuzzy Sets and Systems, 160(2):130–144, 2009. 19. J. Medina, M. Ojeda-Aciego, A. Valverde, and P. Vojtáš. Towards biresiduated multi-adjoint logic programming. Lect. Notes in Artificial Intelligence, 3040:608– 617, 2004. 20. J. Medina, M. Ojeda-Aciego, and P. Vojtáš. Multi-adjoint logic programming with continuous semantics. In Logic Programming and Non-Monotonic Reasoning, LPNMR’01, pages 351–364. Lect. Notes in Artificial Intelligence 2173, 2001. 21. J. Medina, M. Ojeda-Aciego, and P. Vojtáš. Similarity-based unification: a multi- adjoint approach. Fuzzy Sets and Systems, 146:43–62, 2004. 22. A. D. Nola, E. Sanchez, W. Pedrycz, and S. Sessa. Fuzzy Relation Equations and Their Applications to Knowledge Engineering. Kluwer Academic Publishers, Norwell, MA, USA, 1989. 23. Z. Pawlak. Rough sets. International Journal of Computer and Information Sci- ence, 11:341–356, 1982. 24. W. Pedrycz. Fuzzy relational equations with generalized connectives and their applications. Fuzzy Sets and Systems, 10(1-3):185 – 201, 1983. 25. I. Perfilieva. 
Fuzzy function as an approximate solution to a system of fuzzy relation equations. Fuzzy Sets and Systems, 147(3):363–383, 2004. 26. I. Perfilieva and L. Nosková. System of fuzzy relation equations with inf-→ com- position: Complete set of solutions. Fuzzy Sets and Systems, 159(17):2256–2271, 2008. 27. A. M. Radzikowska and E. E. Kerre. A comparative study of fuzzy rough sets. Fuzzy Sets and Systems, 126(2):137–155, 2002. 28. E. Sanchez. Resolution of composite fuzzy relation equations. Information and Control, 30(1):38–48, 1976. 29. L. A. Zadeh. The concept of a linguistic variable and its application to approximate reasoning I, II, III. Information Sciences, 8–9:199–257, 301–357, 43–80, 1975. 12 Adaptation knowledge discovery for cooking using closed itemset extraction Emmanuelle Gaillard, Jean Lieber, and Emmanuel Nauer LORIA (UMR 7503—CNRS, INRIA, Nancy University) BP 239, 54506 Vandœuvre-lès-Nancy, France, First-Name.Last-Name@loria.fr Abstract. This paper is about the adaptation knowledge (AK) discov- ery for the Taaable system, a case-based reasoning system that adapts cooking recipes to user constraints. The AK comes from the interpreta- tion of closed itemsets (CIs) whose items correspond to the ingredients that have to be removed, kept, or added. An original approach is pro- posed for building the context on which CI extraction is performed. This approach focuses on a restrictive selection of objects and on a specific ranking based on the form of the CIs. Several experimentations are pro- posed in order to improve the quality of the AK being extracted and to decrease the computation time. This chain of experiments can be seen as an iterative knowledge discovery process: the analysis following each experiment leads to a more sophisticated experiment until some concrete and useful results are obtained. Keywords: adaptation knowledge discovery, closed itemset, data preprocess- ing, case-based reasoning, cooking. 1 Introduction This paper addresses the adaptation challenge proposed by the Computer Cook- ing Contest (http://computercookingcontest.net/) which consists in adapt- ing a given cooking recipe to specific constraints. For example, the user wants to adapt a strawberry pie recipe, because she has no strawberry. The underlying question is: which ingredient(s) will the strawberries be replaced with? Adapting a recipe by substituting some ingredients by others requires cook- ing knowledge and adaptation knowledge in particular. Taaable, a case-based reasoning (CBR) system, addresses this problem using an ingredient ontology. This ontology is used for searching which is/are the closest ingredient(s) to the one that has to be replaced. In this approach the notion of “being close to” is given by the distance between ingredients in the ontology. In the previous example, Taaable proposes to replace the strawberries by other berries (e.g. raspberries, blueberries, etc.). However, this approach is limited because two in- gredients which are close in the ontology are not necessarily interchangeable and because introducing a new ingredient in a recipe may be incompatible with some other ingredient(s) of the recipe or may required to add other ingredients. c 2011 by the paper authors. CLA 2011, pp. 87–99. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 
88 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer This paper extends the approach proposed in [2] for extracting this kind of adaptation knowledge (AK). The approach is based on closed itemset (CI) extraction, in which items are the ingredients that have to be removed, kept, or added for adapting the recipe. This paper introduces two originalities. The first one concerns the way the binary context, on which the CI extraction is performed, is built, by focusing on a restrictive selection of objects according to the objectives of the knowledge discovery process. The second one concerns the way the CIs are filtered and ranked, according to their form. The paper is organised as follows: Section 2 specifies the problem in its whole context and introduces Taaable which will integrate the discovered AK in its reasoning process. Section 3 gives preliminaries for this work, introducing CI extraction, case-based reasoning, and related work. Section 4 explains our approach; several experiments and evaluations are described and discussed. 2 Context and motivations 2.1 Taaable The Computer Cooking Contest is an international contest that aims at compar- ing systems that make inferences about cooking. A candidate system has to use the recipe base given by the contest to propose a recipe matching the user query. This query is a set of constraints such as inclusion or rejection of ingredients, the type or the origin of the dish, and the compatibility with some diets (vegetarian, nut-free, etc.). Taaable [1] is a system that has been originally designed as a candidate of the Computer Cooking Contest. It is also used as a brain teaser for research in knowledge based systems, including knowledge discovery, ontology engineer- ing, and CBR. Like many CBR systems, Taaable uses an ontology to retrieve recipes that are the most similar to the query. Taaable retrieves and creates cooking recipes by adaptation. If there exist recipes exactly matching the query, they are returned to the user; otherwise the system is able to retrieve similar recipes (i.e. recipes that partially match the target query) and adapts these recipes, creating new ones. Searching similar recipes is guided by several ontolo- gies, i.e. hierarchies of classes (ingredient hierarchy, dish type hierarchy and dish origin hierarchy), in order to relax constraints by generalising the user query. The goal is to find the most specific generalisation of the query (the one with the minimal cost) for which recipes exist in the recipe base. Adaptation consists in substituting some ingredients of the retrieved recipes by the ones required by the query. Taaable retrieves recipes using query generalisation, then adapts them by substitution. This section gives a simplified description of the Taaable system. For more details about the Taaable inference engine, see e.g. [1]. For example, for adapting the “My Strawberry Pie” recipe to the no Strawberry constraint, the system first generalises Strawberry into Berry, then specialises Berry into, say, Raspberry. Adaptation knowledge discovery for cooking using closed itemset extraction 89 2.2 Domain ontology An ontology O defines the main classes and relations relevant to cooking. O is a set of atomic classes organised into several hierarchies (ingredient, dish type, dish origin, etc.). Given two classes B and A of this ontology, A is subsumed by B, denoted by B w A, if the set of instances of A is included in the set of instances of B. For instance, Berry w Blueberry and Berry w Raspberry. 
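This subsumption test can be pictured with a small sketch. The Python fragment below is purely illustrative and is not Taaable code: the child-to-parent table is an assumption (only Berry, Blueberry, Raspberry and Strawberry come from the example above; Fruit and Ingredient are hypothetical upper classes).

# Illustrative sketch: an ontology as a child -> parent map, with subsumption
# B ⊒ A checked by walking from A up to the root.
PARENT = {
    "Blueberry": "Berry",
    "Raspberry": "Berry",
    "Strawberry": "Berry",
    "Berry": "Fruit",          # hypothetical upper classes
    "Fruit": "Ingredient",
}

def subsumes(b, a):
    """Return True if b ⊒ a, i.e. every instance of a is an instance of b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False

assert subsumes("Berry", "Blueberry")
assert not subsumes("Blueberry", "Berry")

Walking up the parent links in this way also yields the generalisation path (e.g. Strawberry, Berry, Fruit) along which a query can be relaxed.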
2.3 Taaable adaptation principle Let R be a recipe and Q be a query such that R does not exactly match Q (oth- erwise, no adaptation would be needed). For example, Q = no Strawberry and R = “My Strawberry Pie”.The basic ontology-driven adaptation in Taaable follows the generalisation/specialisation principle explained hereafter (in a sim- plified way). First, R is generalised (in a minimal way) into Γ (R) that matches Q. For example, Γ may be the substitution Strawberry Berry. Second, Γ (R) is specialised into Σ(Γ (R)) that still matches Q. For example, Σ is the substitu- tion Berry Raspberry (the class Berry is too abstract for a recipe and must be made precise). This adaptation approach has at least two limits. First, the choice of Σ is at random: there is no reason to choose raspberries instead of blue- berries, unless additional knowledge is given. Second, when such a substitution of ingredient is made, it may occur that some ingredients should be added or removed from R. These limits point out the usefulness of additional knowledge for adaptation. 3 Preliminaries 3.1 Itemset extraction Itemset extraction is a set of data-mining methods for extracting regularities into data, by aggregating object items appearing together. Like FCA [8], itemset extraction algorithms start from a formal context K, defined by K = (G, M, r), where G is a set of objects, M is a set of items, and r is the relation on G × M stating that an object is described by an item [8]. Table 1 shows an example of context, in which recipes are described by the ingredients they require: G is a set of 5 objects (recipes R, R1 , R2 , R3 , and R4 ), M is a set of 7 items (ingredients Sugar, Water, Strawberry, etc.). An itemset I is a set of items, and the support of I, support(I), is the number of objects of the formal context having every item of I. I is frequent, with respect to a threshold σ, whenever support(I) ≥ σ. I is closed if it has no proper superset J (I ( J) with the same support. For example, {Sugar, Raspberry} is an item- set and support({Sugar, Raspberry}) = 2 because 2 recipes require both Sugar and Raspberry. However, {Sugar, Raspberry} is not a closed itemset, because {Sugar, PieCrust, Raspberry} has the same support. Another, equivalent, defi- nition of closed itemsets can be given on the basis of a closure operator ·00 defined as follows. Let I be an itemset and I 0 be the set of objects that have all the items 90 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer y h e Pi err Co arc Ge rry Ci uic Co st p Pi on ll i Ap n wb ru st Wh be ti eJ am he r r e ga te ra eC rn ol sp la pl pl nn eS Su Wa St Ra Ap R × × × × × × R1 × × × × × R2 × × × × R3 × × × × × R4 × × × × × Table 1. Formal context representing ingredients used in recipes. of I: I 0 = {x ∈ G | ∀i ∈ I, x r i}. In a dual way, let X be a set of objects and X 0 be the set of properties shared by all objects of X: X 0 = {i ∈ M | ∀x ∈ X, x r i}. This defines two operators: ·0 : I ∈ 2M 7→ I 0 ∈ 2G and ·0 : X ∈ 2G 7→ X 0 ∈ 2M . These operators can be composed in an operator ·00 : I ∈ 2M 7→ I 00 ∈ 2M . An itemset I is said to be closed if it is a fixed point of ·00 , i.e., I 00 = I. In the following, “CIs” stands for closed itemsets, and “FCIs” stands for frequent CIs. For σ = 3, the FCIs of this context are {Sugar, PieCrust}, {Sugar, PieCrust, Cornstarch}, {Sugar, Water}, {Sugar}, {Water}, {PieCrust}, and {Cornstarch}. 
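These operators are easy to reproduce programmatically. The sketch below is a brute-force illustration written for this section, not the tooling used in the paper; the ingredients of R come from the text, while the other recipes are only a stand-in for Table 1 (the exact rows of the table are not reproduced here).

from itertools import combinations

# Brute-force sketch of the derivation operators and of the closure I'' of Section 3.1.
# The context is a stand-in: R is taken from the text, R1-R4 are illustrative.
CONTEXT = {
    "R":  {"Sugar", "Water", "Strawberry", "PieCrust", "Cornstarch", "CoolWhip"},
    "R1": {"Sugar", "PieCrust", "Cornstarch", "Raspberry", "Gelatin"},
    "R2": {"Sugar", "Water", "PieCrust", "Raspberry"},
    "R3": {"Sugar", "Water", "PieCrust", "Cornstarch", "Apple"},
    "R4": {"Water", "Cornstarch", "PieShell", "Cinnamon", "Apple"},
}
ITEMS = set().union(*CONTEXT.values())

def extent(itemset):                 # I': objects having every item of I
    return {g for g, row in CONTEXT.items() if itemset <= row}

def intent(objects):                 # X': items shared by all objects of X
    rows = [CONTEXT[g] for g in objects]
    return set.intersection(*rows) if rows else set(ITEMS)

def closure(itemset):                # I'' = (I')'
    return intent(extent(itemset))

def frequent_closed_itemsets(sigma):
    """Enumerate every itemset once and keep the frequent closed ones (toy sizes only)."""
    fcis = []
    for k in range(len(ITEMS) + 1):
        for combo in combinations(sorted(ITEMS), k):
            i = set(combo)
            if len(extent(i)) >= sigma and closure(i) == i:
                fcis.append(i)
    return fcis

print(len(extent({"Sugar", "Raspberry"})))   # support 2 in this stand-in context
print(closure({"Sugar", "Raspberry"}))       # also contains PieCrust, so the set is not closed

On this stand-in context the itemset {Sugar, Raspberry} has support 2 and its closure also contains PieCrust, mirroring the example discussed above; frequent_closed_itemsets(3) plays the role of the FCI enumeration, here by exhaustive search.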
For the following experiments, the Charm algorithm [12] that efficiently computes the FCIs is used thanks to Coron a software platform implementing a rich set of algorithmic methods for symbolic data mining [11]. 3.2 Case-based reasoning Case-based reasoning (CBR [10]) consists in answering queries with the help of previous experience units called cases. In Taaable, a case is a recipe and a query represents user constraints. In many systems, including Taaable, CBR consists in the retrieval of a case from the case base and in the adaptation of the retrieved case in an adapted case that solves the query. Retrieval in Taaable is performed by minimal generalisation of the query (cf. section 2.3). Adaptation can be a simple substitution (e.g., substitute strawberry with any berry) but it can be improved thanks to the use of some domain specific AK. This motivates the research on AK acquisition. 3.3 Related work The AK may be acquired in various way. It may be collected from experts [6], it may be acquired using machine learning techniques [9], or be semi-automatic, using data-mining techniques and knowledge discovery principles [3,4]. This paper addresses automatic AK discovery. Previous works, such as the ones proposed by d’Aquin et al. with the Kasimir project in the medical do- Adaptation knowledge discovery for cooking using closed itemset extraction 91 main [5], and by Badra et al. in the context of a previous work on Taaable [2], are the foundations of our work. Kasimir is a CBR system applied to decision support for breast cancer treatment. In Kasimir, a case is a treatment used for a given patient. The patient is described by characteristics (age, tumour size and location, etc.) and the treatment consists in applying medical instructions. In order to discover AK, cases that are similar to the target case are first selected. Then, FCIs are computed on the variations between the target case and the similar cases. FCIs matching a specific form are interpreted for generating AK [5]. Badra et al. use this approach to make cooking adaptations in Taaable [2]. Their work aims at comparing pairs of recipes depending on the ingredients they contain. A recipe R is represented by the set of its ingredients: Ingredients(R). For example, the recipe “My Strawberry Pie” is represented by Ingredients(“My Strawberry Pie”) = {Sugar, Water, Strawberry, PieCrust, Cornstarch, CoolWhip} Let (R, R0 ) be a pair of recipes which is selected. According to [2], the represen- tation of a pair is denoted by ∆, where ∆ represents the variation of ingredients between R and R0 . Each ingredient ing is marked by −, =, or +: – ing − ∈ ∆ if ing ∈ Ingredients(R) and ing ∈ / Ingredients(R0 ), meaning that ing appears in R but not in R0 . – ing + ∈ ∆ if ing ∈ / Ingredients(R) and ing ∈ Ingredients(R0 ), meaning that ing appears in R0 but not in R. – ing = ∈ ∆ if ing ∈ Ingredients(R) and ing ∈ Ingredients(R0 ), meaning that ing appears both in R in R0 . Building a formal context about ingredient variations in cooking reci- pes. Suppose we want to compare the recipe R with the four recipes (R1 , R2 , R3 , R4 ) given in Table 1. V Variations between R = “My Strawberry Pie” and a recipe Ri have the form j ingi,jmark . For example: ∆R,R1 = Sugar= ∧ Water− ∧ Strawberry− ∧ PieCrust= ∧ Cornstarch= ∧ CoolWhip− ∧ Raspberry+ ∧ Gelatin+ (1) According to these variations, a formal context K = (G, M, I) can be built (cf. Table 2, for the running example): – G = {∆R,Ri }i – M is the set of ingredient variations: M = {ingi,j mark }i,j . 
In particular, M contains all the conjuncts of ∆R,R1 (Strawberry , etc., cf.(1)). − – (g, m) ∈ I, if g ∈ G, m ∈ M , and m is a conjunct of g, for example (∆R,R1 , Strawberry− ) ∈ I. 92 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer eC ry − ol ch − rn ch = nn ce + la y + Co ust − Ra hip − Pi ust = Pi mon + ll + Ap in + Pi ber Co tar Co tar Ge err Ci Jui r− r= r= e+ he w r r s s W b t e a ga te te ra eC rn sp pl pl eS Su Wa Wa St Ap ∆R,R1 × × × × × × × × ∆R,R2 × × × × × × × ∆R,R3 × × × × × × × × ∆R,R4 × × × × × × × × × Table 2. Formal context for ingredient variations in pairs of recipes (R, Rj ). Interpretation. In the formal context, an ingredient marked with + (resp. −) is an ingredient that has to be added (resp. removed). An ingredient marked with = is an ingredient common to R and Ri . 4 Adaptation Knowledge discovery AK discovery is based on the same scheme as knowledge discovery in databases (KDD [7]). The main steps of the KDD process are data preparation, data- mining, and interpretation of the extracted units of information. Data prepara- tion relies on formatting data for being used by data-mining tools and on filtering operations for focusing on special subsets of objects and/or items, according to the objectives of KDD. Data-mining tools are applied for extracting regularities into the data. These regularities have then to be interpreted; filtering operations may also be performed on this step because of the (often) huge size of the data- mining results or of the noise included in these results. All the steps are guided by an analyst. The objective of our work is the extraction of some AK useful for adapt- ing a given recipe to a query. The work presented in the following focuses on filtering operations, in order to extract from a formal context encoding ingredient variations between pairs of recipes, the cooking adaptations. The database used as entry point of the process is the Recipe Source database (http://www.recipesource.com/) which contains 73795 cooking recipes. For the sake of simplicity, we consider in the following, the problem of adapting R by substituting one or several ingredient(s) with one or several ingredient(s) (but the approach can be generalised for removing more ingredients, and also be used for adding ingredient(s) in a recipe). Three experiments are presented; they address the same adaptation problem: adapting the R = “My Strawberry Pie” recipe, with Ingredients(“My Strawberry Pie”) = {Sugar, Water, Strawberry, PieCrust, Cornstarch, CoolWhip}, to the query no Strawberry. In each ex- periment, a formal context about ingredient variations in recipes is built. Then, FCIs are extracted and filtered for proposing cooking adaptation. The two first Adaptation knowledge discovery for cooking using closed itemset extraction 93 experiments focus on object filtering, selecting recipes which are more and more similar to the “My Strawberry Pie” recipe: the first experiment uses recipe from the same type (i.e. pie dish) as “My Strawberry Pie” instead of choosing recipes of any type; the second experiment focuses on a more precise filtering based on similarity between the “My Strawberry Pie” recipe and recipes used for gener- ating the formal context on ingredient variations. 4.1 A first approach with closed itemsets As introduced in [2], a formal context is defined, where objects are ordered pairs of recipes (R, R0 ) and properties are ingredients marked with +, =, − for representing the ingredient variations from R to R0 . 
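A small sketch of this construction is given below (the helper names are ours; R is the recipe from the text and R1 is reconstructed from the variation ∆R,R1 shown in (1); ASCII marks '+', '=', '-' stand for the marks used above).

# Sketch of the Δ(R, R') construction of Section 3.3 and of the resulting variation context.
def delta(r_ingredients, other_ingredients):
    """Set of marked ingredients describing the variation from R to R'."""
    marks = set()
    for ing in r_ingredients | other_ingredients:
        if ing in r_ingredients and ing in other_ingredients:
            marks.add(ing + "=")       # kept in both recipes
        elif ing in r_ingredients:
            marks.add(ing + "-")       # appears in R only
        else:
            marks.add(ing + "+")       # appears in R' only
    return marks

def variation_context(r_ingredients, candidates):
    """Objects are the pairs (R, Ri); attributes are the marked ingredients."""
    return {name: delta(r_ingredients, ings) for name, ings in candidates.items()}

R  = {"Sugar", "Water", "Strawberry", "PieCrust", "Cornstarch", "CoolWhip"}
R1 = {"Sugar", "PieCrust", "Cornstarch", "Raspberry", "Gelatin"}
print(sorted(delta(R, R1)))
# ['CoolWhip-', 'Cornstarch=', 'Gelatin+', 'PieCrust=', 'Raspberry+',
#  'Strawberry-', 'Sugar=', 'Water-']

Each ∆R,Ri becomes one object of the resulting context, and its marked ingredients are the attributes it is incident to.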
The formal context which is build is similar to the example given in Table 2. In each pair of recipes, the first element is the recipe R =“My Strawberry Pie” that must be adapted; the second element is a recipe of the same dish type as R which, moreover, does not contain the ingredient which has to be removed. In our example, it corresponds to pie dish recipes which do not contain strawberry. This formal context allows to build CIs which have to be interpreted in order to acquire adaptation rules. Experiment. 3653 pie dish recipes that do not contain strawberry are found in the Recipe Source database. The formal context, with 3653 objects × 1355 items produces 107,837 CIs (no minimal support is used). Analysis. Some interesting CIs can be found. For example, {PieCrust− , Strawberry− , Cornstarch− , CoolWhip− , Water− , Sugar− } with support of 1657, contains all the ingredients of R with a − mark, meaning that there are 1657 recipes which have no common ingredients with the R recipe. In the same way, {PieCrust− , Strawberry− , Cornstarch− , CoolWhip− , Water− } with sup- port 2590, means that 2590 recipes share only the Sugar ingredient with R because the sugar is the sole ingredient of R which is not included in this CI. The same analysis can be done for {PieCrust− , Strawberry− , Cornstarch− , CoolWhip− , Sugar− } (support of 1900), for water, etc. Conclusion. The CIs are too numerous for being presented to the analyst. Only 1996 of the 3653 pie dish without strawberry recipes share at least one ingredient with R. There are too many recipes without anything in common. A first filter can be used to limit the size of the formal context in number of objects. 4.2 Filtering recipes with at least one common ingredient Experiment. The formal context, with 1996 objects × 813 items, produces 22,408 CIs (no minimal support is used), ranked by decreasing support. 94 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer Results. The top five FCIs are: – {Strawberry− } with support of 1996; – {Strawberry− , CoolWhip− } with support of 1916; – {Strawberry− , PieCrust− } with support of 1757; – {Strawberry− , PieCrust− , CoolWhip− } with support of 1679; – {Strawberry− , Cornstarch− } with support of 1631. Analysis. Several observations can be made. The first FCI containing an ingre- dient marked by + ({Strawberry− , Egg+ }, with support of 849) appears only at the 46th position. Moreover, there are 45 FCIs with one ingredient marked by + in the first 100 FCIs, and no FCI with more than one ingredient marked by +. A substituting ingredient ing can only be found in CIs containing ing + meaning that there exists a recipe containing ing, which is not in R. So, FCIs that do not contain the + mark cannot be used for finding a substitution proposition, and they are numerous in the first 100 ones, based on a support ranking (recall that it has been chosen not to consider adaptation by simply removing ingredient). In the first 100 FCIs, there is only 15 FCIs containing both an ingredient marked by + and an ingredient marked by =. In a FCI I, the = mark on a ingredient ing means that ing is common to R and to recipe(s) involved by the creation of I. So, an ingredient marked by = guarantees a certain similarity (based on ingredients that are used) between the recipes R and R0 compared by ∆R,R0 . 
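The observations above can be made mechanically by classifying the items of each FCI according to their mark; the helper below is a hypothetical sketch written for this discussion (the example FCI {Strawberry−, Egg+} is the 46th one mentioned above).

# Hypothetical sketch: read the candidate substitutions off a FCI by its marks.
def classify(fci):
    """Return (added, kept, removed) ingredient names of a FCI over marked items."""
    added   = {i[:-1] for i in fci if i.endswith("+")}
    kept    = {i[:-1] for i in fci if i.endswith("=")}
    removed = {i[:-1] for i in fci if i.endswith("-")}
    return added, kept, removed

added, kept, removed = classify({"Strawberry-", "Egg+"})
print(added, kept)   # {'Egg'} set(): a candidate substitute, but nothing shared with R

FCIs for which kept is empty are exactly the risky ones discussed next.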
If a FCI I contains a potential substituting ingredient, marked by +, but does not contain any =, the risk for proposing a cooking adaptation from I is very high, because there is no common ingredient with R in the recipe the potential substituting ingredient comes from. In the first 100 recipes, the only potential substituting ingredients (so, the ingredients marked by +) are egg, salt, and butter, which are not satisfactory from a cooking viewpoint for substituting the strawberries. We have conducted similar experiments with other R and queries, and the same observations as above can be made. Conclusion. From these observations, it can be concluded that the sole rank- ing based on support is not efficient to find relevant cooking adaptation rules, because the most frequent CIs do no contain potential substituting ingredients and, moreover, have no common ingredient with R. 4.3 Filtering and ranking CIs according to their forms To extract realistic adaptation, CIs with a maximum of ingredients marked by = are searched. We consider that a substitution is acceptable, if 50% of ingredients of R are preserved and if the adaptation does not introduce too many ingredients; we also limit the number of ingredients introduced to 50% of the initial number of ingredients in R. For the experiment with the R = “My Strawberry Pie”, containing initially 6 ingredients, it means that at least 3 ingredients must be preserved and at most 3 ingredients can be added. In term of CIs, it corresponds to CIs containing at least 3 ingredients marked with = and at most 3 ingredients marked with +. Adaptation knowledge discovery for cooking using closed itemset extraction 95 Experiment. Using this filter on CIs produced by the previous experiment re- duces the number of CIs to 505. However, because some CIs are more relevant than others, they must be ranked according to several criteria. We use the fol- lowing rules, given by priority order: 1. A CI must have a + in order to find a potential substituting ingredient. 2. A CI which has more = than another one is more relevant. This criterion promotes the pairs which have a largest set of common ingredient. 3. A CI which has less − than another one is more relevant. This criterion promotes adaptations which remove less ingredients. 4. A CI which has less + than another one is more relevant. This criterion promotes adaptations which add less ingredients. 5. If two CIs cannot be ranked according to the 4 criteria above, the CI the more frequent is considered to be the more relevant. Results. The 5 first CIs ranked according to the previous criteria are: – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , Salt+ } with support of 5; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ } with support of 4; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ , CreamCheese+ } with support of 2; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ , WhippingCream+ } with support of 2; – {Water= , Sugar= , Strawberry− , CoolWhip− , Cornstarch= , PieCrust= , LemonJuice+ , LemonPeel+ } with support of 2. Analysis. One can observe that potential substituting ingredients take part of the first 5 CIs and each CIs preserve 4 (of 6) ingredients. The low supports of these CIs confirm that searching frequent CIs is not compatible with our need, which is to extract CIs with a specific form. Conclusion. 
Ranking the CIs according to our particular criteria is more efficient than using a support-based ranking. This kind of ranking can also be seen as a filter on CIs. However, this approach requires computing all CIs, because the support of the interesting CIs is low.

4.4 More restrictive formal context building according to the form of interesting CIs

The computation time can be improved by applying a more restrictive selection of recipe pairs at the formal context building step, decreasing drastically the size of the formal context. Indeed, as the expected form of CIs is known, recipe pairs that cannot produce CIs of the expected form can be removed. This can also be seen as a selection of recipes that are similar enough to R. R′ is considered as similar enough to R if R′ has at least a minimal threshold σ= = 50% of ingredients in common with R (cf. (2)) and if R′ has at most a maximal threshold σ+ = 50% of ingredients that are not used in R (cf. (3)). These two conditions express for ∆R,R′ the same similarity conditions as those considered on CIs in Section 4.3.

  |Ingredients(R) ∩ Ingredients(R′)| / |Ingredients(R)| ≥ σ=    (2)

  |Ingredients(R′) \ Ingredients(R)| / |Ingredients(R)| ≤ σ+    (3)

Experiment. Among the 1996 pie dish recipes not containing Strawberry, only 20 recipes satisfy the two conditions. The formal context, with 20 objects × 40 items, produces only 21 CIs (no minimal support is used).

Results. The 5 first CIs, satisfying the form introduced in the previous section and ranked according to the previous criteria, are:

– {Water=, Sugar=, Cornstarch=, PieCrust=, Strawberry−, CoolWhip−, RedFoodColoring+, Cherry+} with support of 1;
– {Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, PieShell+} with support of 6;
– {Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+} with support of 3;
– {Water=, Sugar−, Cornstarch=, PieCrust=, Strawberry−, CoolWhip−, Apple+, AppleJuice+} with support of 3;
– {Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Peach+, PieShell+} with support of 2.

Analysis. According to these CIs, the first potential substituting ingredients are RedFoodColoring, Cherry, PieShell, Raspberry, Apple, and Peach. Each CI preserves 3 or 4 of the 6 ingredients of R, and two CIs add 2 ingredients.

Conclusion. This approach reduces the computation time without reducing the result quality. Moreover, it yields the best potential adaptations among the first CIs.

4.5 From CIs to adaptation rules

As Taaable must propose a recipe adaptation, the CIs containing potential substituting ingredients must be transformed: indeed, a CI does not represent a direct cooking adaptation. For example, the third CI of the last experiment contains Raspberry+ simultaneously with CoolWhip− and PieCrust−. Removing the pie crust (i.e. PieCrust−) can look surprising for a pie dish, but one must keep in mind that a CI does not correspond to a real recipe, but to an abstraction of the variations between R and a set of recipes. So, producing a complete adaptation requires going back to the ∆R,Ri in order to obtain all the ingredient variations that will take part in the adaptation. For example, for

  ∆R,R1 : Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+, Gelatin+, GCPieCrust+
  ∆R,R2 : Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+, FoodColor+, PieShell+
  ∆R,R3 : Water=, Sugar=, Cornstarch=, PieCrust−, Strawberry−, CoolWhip−, Raspberry+, PieShell+

Table 3.
Formal context for ingredient variations in pairs of recipes (R, Rj ). the CI {Water= , Sugar= , Cornstarch= , PieCrust− , Strawberry− , CoolWhip− , Raspberry+ }, the ∆R,Ri (with i ∈ [1; 3]) are the ones given by Table 3. The adaptation rules extracted from these 3 recipe variations are: – {CoolWhip, PieCrust, Strawberry} ; {Gelatin, GCPieCrust, Raspberry}; – {CoolWhip, PieCrust, Strawberry} ; {FoodColor, PieShell, Raspberry}; – {CoolWhip, PieCrust, Strawberry} ; {PieShell, Raspberry}. For R2 and R3 , PieShell is added in replacement of PieCrust; in R1 , GCPieCrust plays the role of PieCrust. These three recipe variations propose to replace Strawberry by Raspberry. For R1 (resp. R2 ), Gelatin (resp. FoodColor) is also added. Finally, the three recipe variations propose to remove the CoolWhip. Our approach guarantees the ingredient compatibility, with the assumption that the recipe base used for the adaptation rule extraction process contains only good recipes, i.e. recipes which do not contain ingredient incompatibility. Indeed, as adaptation rules are extracted from real recipes, the good combination of ingredients is preserved. So, when introducing a new ingredient ing1 (marked by ing1+ ), removing another ingredient ing2 (marked by ing2− ) could be required. The reason is that there is no recipe, entailed in the creation of the CI from which the adaptation rules are extracted, using both ing1 and ing2 . In the same way, adding a supplementary ingredient ing3 (marked by ing3+ ) in addition of ing1 , is obtained from recipes which use both ing1 and ing3 . Applying FCA on these ∆R,Ri produces the concept lattice presented in Fig. 1 in which the top node is the CI retained. This node can be seen as a generic cooking adaptation, and navigating into the lattice will conduct to more specific adaptation. The KDD loop is closed: after having (1) selected and formatting the data, (2) applying a data-mining CI extraction algorithm, and (3) interpreting the results, a new set of data is selected on which a data-mining –FCA– algorithm could then be applied. We have chosen to return the adaptation rules generated from the 5 first CIs to the user. So, the system proposes results where Strawberry could be replaced (in addition of some other ingredient adding or removing) by “RedFoodColoring and Cherry”, by Raspberry with optional Gelatin or FoodColor, by Peach 98 Emmanuelle Gaillard, Jean Lieber and Emmanuel Nauer Fig. 1. The lattice computed on the formal context given in Table 3. with optional FoodColor or LemonJuice, by “HeavyCream and LemonRind”, or by “Apple and AppleJuice”. 5 Conclusion This paper shows how adaptation knowledge can be extracted efficiently for ad- dressing a cooking adaptation challenge. Our approach focuses on CIs with a particular form, because the support is not a good ranking measure for this problem. A ranking method based on 5 criteria explicitly specified for this adap- tation problem is proposed; the support is used in addition to distinguish CIs which satisfy in the same way the 5 criteria. Beyond the application domain, this study points out that KD is not only a data-mining issue: the preparation and interpretation steps are also important. Moreover, it highlights the iterative nature of KD: starting from a first experi- ment with few a priori about the form of the results which are too numerous to be interpreted, it arrives to an experiment with a precise aim that gives results that are easy to interpret as adaptation rules. 
It has been argued in the paper that this approach is better than the basic adaptation approach (based on substituting an ingredient by another one, on the basis of the ontology), in that it avoids some ingredient incompatibilities and makes some specialisation choices. However, a careful study remains to be made in order to compare experimentally these approaches. A short-term future work is to integrate this AK discovery into the online system Taaable, following the principles of opportunistic KD [2]. A mid-term future work consists in using the ontology during the KD process. The idea is to add new items, deduced thanks to the ontology (e.g. the properties Cream− and Milk+ entail the variation Dairy= ). First experiments have already been conducted but they raise interpretation difficulties. Indeed, the extracted CIs contain abstract terms (such as Dairy= or Flavoring+ ) that are not easy to interpret. Adaptation knowledge discovery for cooking using closed itemset extraction 99 References 1. F. Badra, R. Bendaoud, R. Bentebitel, P.-A. Champin, J. Cojan, A. Cordier, S. De- sprés, S. Jean-Daubias, J. Lieber, T. Meilender, A. Mille, E. Nauer, A. Napoli, and Y. Toussaint. Taaable: Text Mining, Ontology Engineering, and Hierarchical Clas- sification for Textual Case-Based Cooking. In ECCBR Workshops, Workshop of the First Computer Cooking Contest, pages 219–228, 2008. 2. F. Badra, A. Cordier, and J. Lieber. Opportunistic Adaptation Knowledge Dis- covery. In Lorraine McGinty and David C. Wilson, editors, 8th International Con- ference on Case-Based Reasoning - ICCBR 2009, volume 5650 of Lecture Notes in Computer Science, pages 60–74, Seattle, États-Unis, July 2009. Springer. The original publication is available at www.springerlink.com. 3. S. Craw, N. Wiratunga, and R. C. Rowe. Learning adaptation knowledge to im- prove case-based reasoning. Artificial Intelligence, 170(16-17):1175–1192, 2006. 4. M. d’Aquin, F. Badra, S. Lafrogne, J. Lieber, A. Napoli, and L. Szathmary. Case base mining for adaptation knowledge acquisition. In International Joint Confer- ence on Artificial Intelligence, IJCAI’07, pages 750–756, 2007. 5. M. D’Aquin, S. Brachais, J. Lieber, and A. Napoli. Decision Support and Knowl- edge Management in Oncology using Hierarchical Classification. In Katherina Kaiser, Silvia Miksch, and Samson W. Tu, editors, Proceedings of the Symposium on Computerized Guidelines and Protocols - CGP-2004, volume 101 of Studies in Health Technology and Informatics, pages 16–30, Prague, Czech Republic, 2004. Silvia Miksch and Samson W. Tu, IOS Press. 6. M. d’Aquin, J. Lieber, and A. Napoli. Adaptation Knowledge Acquisition: a Case Study for Case-Based Decision Support in Oncology. Computational Intelligence (an International Journal), 22(3/4):161–176, 2006. 7. U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery in databases. AI Magazine, pages 37–54, 1996. 8. B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999. 9. K. Hanney and M. T. Keane. Learning Adaptation Rules From a Case-Base. In I. Smith and B. Faltings, editors, Advances in Case-Based Reasoning – Third European Workshop, EWCBR’96, LNAI 1168, pages 179–192. Springer, 1996. 10. C. K. Riesbeck and R. C. Schank. Inside Case-Based Reasoning. Lawrence Erlbaum Associates, Inc., Hillsdale, New Jersey, 1989. 11. L. Szathmary and A. Napoli. CORON: A Framework for Levelwise Itemset Min- ing Algorithms. Supplementary Proc. 
of The Third International Conference on Formal Concept Analysis (ICFCA ’05), Lens, France, pages 110–113, 2005. 12. M. J. Zaki and C.-J. Hsiao. CHARM: An efficient algorithm for closed itemset mining. In SIAM International Conference on Data Mining SDM’02, pages 33–43, 2002. Fast Computation of Proper Premises Uwe Ryssel1 , Felix Distel2 , and Daniel Borchmann3 1 Institute of Applied Computer Science, Technische Universität Dresden, Dresden, Germany, uwe.ryssel@tu-dresden.de 2 Institute of Theoretical Computer Science, Technische Universität Dresden, Dresden, Germany, felix@tcs.inf.tu-dresden.de 3 Institute of Algebra, Technische Universität Dresden, Dresden, Germany, borch@tcs.inf.tu-dresden.de Abstract. This work is motivated by an application related to refactor- ing of model variants. In this application an implicational base needs to be computed, and runtime is more crucial than minimal cardinality. Since the usual stem base algorithms have proven to be too costly in terms of runtime, we have developed a new algorithm for the fast computation of proper premises. It is based on a known link between proper premises and minimal hypergraph transversals. Two further improvements are made, which reduce the number of proper premises that are obtained multiple times and redundancies within the set of proper premises. We provide heuristic evidence that an approach based on proper premises will also be beneficial for other applications. 1 Introduction Today, graph-like structures are used in many model languages to specify al- gorithms or problems in a more readable way. Examples are data-flow-oriented simulation models, such as MATLAB/Simulink, state diagrams, and diagrams of electrical networks. Generally, such models consist of blocks or elements and connections among them. Using techniques described in Section 5.2, a formal context can be obtained from such models. By computing an implicational base of this context, dependencies among model artifacts can be uncovered. These can help to represent a large number of model variants in a structured way. For many years, computing the stem base has been the default method for extracting a small but complete set of implications from a formal context. There exist mainly two algorithms to achieve this [10,15], and both of them compute not only the implications from the stem base, but also concept intents. This is problematic as a context may have exponentially many concept intents. Recent theoretical results suggest that existing approaches for computing the stem base may not lead to algorithms with better worst-case complexity [6,1]. Bearing this in mind, we focus on proper premises. Just like pseudo-intents, that are used to obtain the stem base, proper premises yield a sound and com- plete set of implications. Because this set of implications does not have minimal cardinality, proper premises have been outside the focus of the FCA community c 2011 by the paper authors. CLA 2011, pp. 101–113. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 102 Uwe Ryssel, Felix Distel and Daniel Borchmann for many years. However, there are substantial arguments to reconsider using them. Existing methods for computing proper premises avoid computing con- cept intents. Thus, in contexts with many concept intents they may have a clear advantage in runtime over the stem base algorithms. 
This is particularly true for our application where the number of concept intents is often close to the theoretical maximum. Here, attributes often occur together with their negated counterparts, and the concept lattice can contain several millions of elements. In Section 5.1 we provide arguments that we can expect the number of con- cept intents to be larger than the number of proper premises in most contexts, assuming a uniform random distribution. Often, in applications, runtime is the limiting factor, not the size of the basis. But even where minimal cardinality is a requirement, computing proper premises is worth considering, since there are methods to transform a base into the stem base in polynomial time [16]. In this paper we present an algorithm for the fast computation of proper premises. It is based on three ideas. The first idea is to use a simple connection between proper premises and minimal hypergraph transversals. The problem of enumerating minimal hypergraph transversals is well-researched. Exploiting the link to proper premises allows us to use existing algorithms that are known to behave well in practice. A first, naïve algorithm iterates over all attributes and uses a black-box hypergraph algorithm to compute proper premises of each attribute. A drawback when iterating over all attributes is that the same proper premise may be computed several times for different attributes. So we introduce a can- didate filter in the second step: For each attribute m, the attribute set is filtered and proper premises are searched only among the candidate attributes. We show that this filtering method significantly reduces the number of multiple-computed proper premises while maintaining completeness. In a third step we exploit the fact that there are obvious redundancies within the proper premises. These can be removed by searching for proper premises only among the meet-irreducible attributes. We argue that our algorithms are trivial to parallelize, leading to further speedups. Due to their incremental nature, parallelized versions of the stem base algorithms are not known to date. We conclude by providing experimental re- sults. These show highly significant improvements for the contexts obtained from the model refactoring application. For a sample context, where Next-Closure re- quired several hours to compute the stem base, runtime has dropped to fractions of a second. For contexts from other applications the improvements are not as impressive but still large. 2 Preliminaries We provide a short summary of the most common definitions in formal concept analysis. A formal context is a triple K = (G, M, I) where G is a set of objects, M a set of attributes, and I ⊆ G × M is a relation that expresses whether an Fast Computation of Proper Premises 103 object g ∈ G has an attribute m ∈ M . If A ⊆ G is a set of objects then A0 denotes the set of all attributes that are shared among all objects in A, i.e., A0 = { m ∈ M | ∀g ∈ G : gIm }. Likewise, for some set B ⊆ M we define B 0 = { g ∈ G | ∀m ∈ B : gIm }. Pairs of the form (A, B) where A0 = B and B 0 = A are called formal concepts. Formal concepts of the form ({ m }0 , { m }00 ) for some attribute m ∈ M are called attribute concept and are denoted by µm. We define the partial order ≤ on the set of all formal concepts of a context to be the subset order on the first component. The first component of a formal concept is called the concept extent while the second component is called the concept intent. 
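These derivation operators translate directly into code. The following sketch assumes a context stored as an object-to-attribute-set dictionary, which is our own illustrative encoding and not the data structure used by the algorithms below.

# Sketch of the two derivation operators and of the attribute concept µm.
CONTEXT = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"a", "b", "c"},
}
ATTRIBUTES = set().union(*CONTEXT.values())

def common_attributes(objects):        # A' = { m | every g in A has m }
    rows = [CONTEXT[g] for g in objects]
    return set.intersection(*rows) if rows else set(ATTRIBUTES)

def common_objects(attributes):        # B' = { g | g has every m in B }
    return {g for g, row in CONTEXT.items() if attributes <= row}

def attribute_concept(m):              # µm = ({m}', {m}'')
    ext = common_objects({m})
    return ext, common_attributes(ext)

print(attribute_concept("a"))          # extent {'g1', 'g3'}, intent {'a', 'b'}

With this encoding, (A, B) is a formal concept precisely when common_attributes(A) == B and common_objects(B) == A.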
Formal concept analysis provides methods to mine implicational knowledge from formal contexts. An implication is a pair (B1 , B2 ) where B1 , B2 ⊆ M , usually denoted by B1 → B2 . We say that the implication B1 → B2 holds in a context K if B10 ⊆ B20 . An implication B1 → B2 follows from a set of implications L if for every context K in which all implications from L hold, B1 → B2 also holds. We say that L is sound for K if all implications from L hold in K, and we say that L is complete for K if all implications that hold in K follow from L. There exists a sound and complete set of implications for each context which has minimal cardinality [12]. This is called the stem base. The exact definition of the stem base is outside the scope of this work. A sound and complete set of implications can also be obtained using proper premises. For a given set of attributes B ⊆ M , define B • to be the set of those attributes in M \ B that follow from B but not from a strict subset of B, i.e., [ B • = B 00 \ B ∪ S 00 . S(B B is called a proper premise if B • is not empty. It is called a proper premise for m ∈ M if m ∈ B • . It can be shown that L = { B → B • | B proper premise } is sound and complete [11]. Several alternative ways to define this sound and complete set of implications can be found in [2]. We write g $ m if g 0 is maximal with respect to the subset order among all object intents which do not contain m. 3 Proper Premises as Minimal Hypergraph Transversals We present a connection between proper premises and minimal hypergraph transversals, which forms the foundation for our enumeration algorithms. It has been exploited before in database theory to the purpose of mining functional dependencies from a database relation [14]. Implicitly, it has also been known for a long time within the FCA community. However, the term hypergraph has not been used in this context (cf. Prop. 23 from [11]). Let V be a finite set of vertices. Then a hypergraph on V is simply a pair (V, H) where H is a subset of the power set 2V . Intuitively, each set E ∈ H represents an edge of the hypergraph, which, in contrast to classical graph theory, 104 Uwe Ryssel, Felix Distel and Daniel Borchmann may be incident to more or less than two vertices. A set S ⊆ V is called a hypergraph transversal of H if it intersects every edge E ∈ H, i.e., ∀E ∈ H : S ∩ E 6= ∅. S is called a minimal hypergraph transversal of H if it is minimal with respect to the subset order among all hypergraph transversals of H. The transversal hyper- graph of H is the set of all minimal hypergraph transversals of H. It is denoted by Tr (H). The problem of deciding for two hypergraphs G and H whether H is the transversal hypergraph of G is called TransHyp. The problem of enumerating all minimal hypergraph transversals of a hypergraph G is called TransEnum. Both problems are relevant to a large number of fields and therefore have been well-researched. TransHyp is known to be contained in coNP. Since it has been shown that TransHyp can be decided in quasipolynomial time [9], it is not believed to be coNP-complete. Furthermore, it has been shown that it can be decided using only limited non-determinism [8]. For the enumeration problem it is not known to date whether an output-polynomial algorithm exists. However, efficient algorithms have been developed for several classes of hypergraphs [8,4]. The following proposition can be found in [11] among others. Proposition 1. P ⊆ M is a premise of m ∈ M iff (M \ g 0 ) ∩ P 6= ∅ holds for all g ∈ G with g $ m. 
P is a proper premise for m iff P is minimal (with respect to ⊆) with this property. We immediately obtain the following corollary. Corollary 1. P is a premise of m iff P is a hypergraph transversal of (M, H) where H := {M \ g 0 | g ∈ G, g $ m}. The set of all proper premises of m is exactly the transversal hypergraph Tr ({M \ g 0 | g ∈ G, g $ m}). In particular this proves that enumerating the proper premises of a given attribute m is polynomially equivalent to TransEnum. This can be exploited in a naïve algorithm for computing all proper premises of a formal context (Al- gorithm 1). Being aware of the link to hypergraph transversals, we can benefit from existing efficient algorithms for TransEnum in order to enumerate proper premises similar to what has been proposed in [14]. Of course, it is also possible to use other enumeration problems to which TransEnum can be reduced. Ex- amples are the enumeration of prime implicants of Horn functions [2] and the enumeration of set covers. Fast Computation of Proper Premises 105 4 Improvements to the Algorithm 4.1 Avoiding Duplicates using Candidate Sets We can further optimize Algorithm 1 by reducing the search space. In the naïve algorithm proper premises are typically computed multiple times since they can be proper premises of more than one attribute. Our goal is to avoid this wherever possible. The first idea is shown in Algorithm 2. There we introduce a candidate set C of particular attributes, depending on the current attribute m. We claim now that we only have to search for minimal hypergraph transversals P of { M \ g 0 | g $ m } with P ⊆ C. We provide some intuition for this idea. Algorithm 1 Naïve Algorithm for Enumerating All Proper Premises Input: K = (G, M, I) P=∅ for all m ∈ M do P = P ∪ Tr ({M \ g 0 | g ∈ G, g $ m}) end for return P Algorithm 2 A Better Algorithm for Enumerating All Proper Premises Input: K = (G, M, I) P = { { m } | m ∈ M, { m } is a proper premise of K } for all m ∈ M do C = { u ∈ M \ { m } | 6 ∃v ∈ M : µu ∧ µm ≤ µv < µm } P = P ∪ { P ⊆ C | P minimal hypergraph transversal of { M \ g 0 | g $ m } } end for return P Let us fix a formal context K = (G, M, I), choose m ∈ M and let P ⊆ M be a proper premise for m. Then we know that m ∈ P 00 , which is equivalent to ^ µp ≤ µm. p∈P If we now find another attribute n ∈ M \ { m } with ^ µp ≤ µn < µm p∈P it suffices to find the set P as a proper premise for n, because from µn < µm we can already infer m ∈ P 00 . Conversely, if we search for all proper premises for m, 106 Uwe Ryssel, Felix Distel and Daniel Borchmann we only have to search for those who are not proper premises for attributes n with µn < µm. Now suppose that there exists an element u ∈ P and an attribute v ∈ M such that µm ∧ µu ≤ µv < µm. (1) Then we know ^ ^ ( µp) ∧ µm = µp ≤ µv < µm, p∈P p∈P i.e., P is already a proper premise for v. In this case, we do not have to search for P , since it will be found in another iteration. On the other hand, if P is a proper premise for m but not for any other attribute n ∈ M with µn < µm, the argument given above shows that an element u ∈ P and an attribute v ∈ M satisfying (1) cannot exist. Lemma 1. Algorithm 2 enumerates for a given formal context K = (G, M, I) all proper premises of K. Proof. Let P be a proper premise of K for the attribute m. P is a proper premise and therefore m ∈ P 00 holds, which is equivalent to µm ≥ (P 0 , P 00 ). Let c ∈ M be such that µm ≥ µc ≥ (P 0 , P 00 ) and µc is minimal with this property. 
We claim that either P = { c } or P is found in the iteration for c of Algorithm 2. Suppose c ∈ P . Then m ∈ { c }00 follows from µm ≥ µc. As a proper premise, P is minimal with the property m ∈ P 00 . It follows P = { c } and P is found by Algorithm 2 during the initialization. Now suppose c 6∈ P . Consider C := { u ∈ M \ { c } | 6 ∃v ∈ M : µu ∧ µc ≤ µv < µc }. We shall show P ⊆ C. To see this, consider some p ∈ P . Then p 6= c holds by assumption. Suppose that p 6∈ C, i.e., there is some v ∈ M such that µp ∧ µc ≤ µv < µc. Because of p ∈ P , µp ≥ (P 0 , P 00 ) and together with µc ≥ (P 0 , P 00 ) we have (P 0 , P 00 ) ≤ µp ∧ µc ≤ µv < µc in contradiction to the minimality of µc. This shows p ∈ C and all together P ⊆ C. To complete the proof it remains to show that P is a minimal hypergraph transversal of { M \ { g }0 | g $ c }, i.e., that P is also a proper premise for c, not only for m. Consider n ∈ P . Assume c ∈ (P \ { n })00 . Since {c} implies m, then P \ { n } would be a premise for m in contradiction to the minimality of P . Thus c 6∈ (P \ { n })00 holds for all n ∈ P and therefore P is a proper premise for c. 4.2 Irreducible Attributes We go one step further and also remove attributes m from our candidate set C whose attribute concept µm is the V meet of other attribute concepts µx1 , . . . , µxn , n where x1 , . . . , xn ∈ C, i.e., µm = i=1 µxi . This results in Algorithm 3 that no Fast Computation of Proper Premises 107 longer computes all proper premises, but a subset that still yields a complete implicational base. We show that we only have to search for proper premises P with P ⊆ N where N is the set of irreducible attributes of K. To ease the presentation, let us assume for the rest of this paper that the formal context K is attribute-clarified. Algorithm 3 Computing Enough Proper Premises Input: K = (G, M, I) P = { { m } | m ∈ M, { m } Vis a proper premise of K } N = M \ { x ∈ M | µx = n i=1 µxi for an n ∈ N and xi ∈ M for 1 ≤ i ≤ n } for all m ∈ M do C = { u ∈ N \ { m } | 6 ∃v ∈ M : µu ∧ µm ≤ µv < µm } P = P ∪ { P ⊆ C | P minimal hypergraph transversal of { M \ g 0 | g $ m } } end for return P Proposition 2. Let m be an attribute and let P be a proper premise for m. Let x ∈ P , n ∈ N, and for 1 ≤ i ≤ n let xi ∈ M be attributes satisfying – m∈ / {Vx1 , . . . , xn }, n – µx = i=1 µxi , – xi ∈ / ∅ for all 1 ≤ i ≤ n and 00 – µx < µxi for all 1 ≤ i ≤ n. Then { x } is a proper premise for all xi and there exists a nonempty set Y ⊆ { x1 , . . . , xn } such that (P \ { x }) ∪ Y is a proper premise for m. Proof. It is clear that { x } is a proper premise for all xi , since xi ∈ { x }00 and / ∅00 . Define xi ∈ QY := (P \ { x }) ∪ Y for Y ⊆ { x1 , . . . , xn }. We choose Y ⊆ { x1 , . . . , xn } such that Y is minimal with respect to m ∈ Q00Y . Such a set exists, since m ∈ ((P \ { x }) ∪ { x1 , . . . , xn })00 because of { x1 , . . . , xn } → { x }. Furthermore, Y 6= ∅, since m ∈ / (P \ { x })00 . We now claim that QY is a proper premise for m. Clearly m ∈ / QY , since m∈ / Y . For all y ∈ Y it holds that m ∈ / (QY \ { y })00 or otherwise minimality of Y would be violated. It therefore remains to show that m ∈ / (QY \ { y })00 for all y ∈ QY \ Y = P \ { x }. (QY \ { y })00 = ((P \ { x, y }) ∪ Y )00 ⊆ ((P \ { y }) ∪ Y )00 = (P \ { y })00 since { x } → Y and x ∈ P \{ y }. Since m ∈ / (P \{ y })00 , we get m ∈ / (QY \{ y })00 as required. In sum, QY is a proper premise for m. 108 Uwe Ryssel, Felix Distel and Daniel Borchmann Lemma 2. 
Let N be the set of all meet-irreducible attributes of a context K. Define P = { X ⊆ M | |X| ≤ 1, X proper premise } ∪ { X ⊆ N | X proper premise } Then the set L = { P → P • | P ∈ P } is sound and complete for K. Proof. Let m be an attribute and let P be a proper premise for m. If P ∈ / P then it follows that P 6⊆ N . Thus we can find y1 ∈ P \N and elements x1 , . . . , xn ∈ M with n ≥ 1 such that – m∈ / { xV1 , . . . , xn }, n – µy1 = i=1 µxi , – xi ∈ / ∅00 for all 1 ≤ i ≤ n and – µx < µxi for all 1 ≤ i ≤ n. By Proposition 2 we can find a proper premise P1 such that P → { m } fol- lows from { y1 } → { x1 , . . . , xn } and P1 → { m }. Clearly { y1 } ∈ P, since all singleton proper premises are contained in P. If P1 ∈ / P then we can apply Proposition 2 again and obtain a new proper premise P2 , etc. To see that this process terminates consider the strict partial order ≺ defined as P ≺ Q iff ∀q ∈ Q : ∃p ∈ P : µp < µq. It is easy to see that with each application of Proposition 2 we obtain a new proper premise that is strictly larger than the previous with respect to ≺. Hence, the process must terminate. This yields a set P 0 = { { y1 }, . . . , { yk }, Pk } ⊆ P such that P → { m } follows from { Q → Q• | Q ∈ P 0 }. Thus L is a sound and complete set of implications. Together with Lemma 1 this yields correctness of Algorithm 3. Corollary 2. The set of proper premises computed by Algorithm 3 yields a sound and complete set of implications for the given formal context. 5 Evaluation 5.1 Computing Proper Premises Instead of Intents In both the stem base algorithms and our algorithms, runtime can be exponential in the size of the input. In the classical case the reason is that the number of intents can be exponential in the size of the stem base [13]. In the case of our algorithms there are two reasons: the computation of proper premises is TransEnum-complete, and there can be exponentially many proper premises. The first issue is less relevant in practice because algorithms for TransEnum, while still exponential in the worst case, behave well for most instances. To see that there can be exponentially many proper premises in the size of the stem base, let us look at the context Kn from Table 1 for some n ≥ 2, consisting Fast Computation of Proper Premises 109 of two contranominal scales of dimension n × n and one attribute a with empty extent. It can be verified that the proper premises of the attribute a are exactly the sets of the form {mi | i ∈ I} ∪ {m0i | i ∈ / I} for some I ⊆ {1, . . . , n}, while the only pseudo-intents are the singleton sets and {m1 , . . . , mn , m01 , . . . , m0n }. Hence there are 2n proper premises for a, while there are only 2n + 2 pseudo-intents. Table 1. Context Kn with Exponentially Many Proper Premises m1 . . . mn m01 . . . m0n a g1 .. . I6= I6= gn Next-Closure behaves poorly on contexts with many intents while our algo- rithms behave poorly on contexts with many proper premises. In order to provide evidence that our algorithm should behave better in practice we use formulae for the expectation of the number of intents and proper premises in a formal context that is chosen uniformly at random among all n × m-contexts for fixed natural numbers n and m.4 Derivations of these formulae can be found in [7]. 
The expected value for the number of intents in an n × m-context is

  E_intent = \sum_{q=0}^{m} \sum_{r=0}^{n} \binom{m}{q} \binom{n}{r} 2^{-rq} (1 - 2^{-r})^{m-q} (1 - 2^{-q})^{n-r},

while the expected value for the number of proper premises for a fixed attribute a in an n × m-context is Xn m−1 q n X m 2 X Y pi+1 −pi −1 Epp = 2−n q! 2−q 1 − 2−q (1 + i) . r=0 r q=0 q q i=0 (p1 ,...,pq )∈N 1≤p1 <···

then continue
Br = reducedIntent(C)
if Br is empty then continue
add(P, component(Br))
end for

All the lines are self-explanatory except the one with the add call. The component function receives the reduced intent of the formal concept and builds the component representation that holds its attributes and functionalities. In some cases, the top concept (⊤) has a non-empty intent, so it would also generate a component with all its features (name, position and orientation in our example of Figure 4). That component would be added to all entities, so instead of keeping a pure component-based architecture with an empty generic Entity class, we can move all those top features into it. Figure 5 shows the components extracted by Rosette using the lattice from Figure 4. The components have been automatically named by concatenating the attribute names of the component or, when none is available, by concatenating the names of the messages the component is able to carry out. For example, let us say that the original name of the FightComp component was C health aim.

(Figure: UML-like diagram showing the generic Entity class with _name, _position and _orientation and the operations setPosition(), setOrientation(), update() and emmitMessage(); the IComponent interface; and the candidate components PlayerControllerComp, AIAndMovementComp, FightComp, PhysicsComp, TriggerComp, DoorComp, GraphicsComp, SpeakerComp and SpeedAttComp.)
Fig. 5. The candidate components proposed by Rosette

Summarizing the whole process: when analysing a concept lattice, every formal concept that provides a new feature (i.e. that has a non-empty reduced intent) does not represent a new entity type but a new component. The only exception is the formal concept at the top of the lattice, which represents the generic entity class and has data and functionality shared by all the entity types. Both the generic entity and every new component have the ability of carrying out the actions in the reduced intent of the formal concept, and they are populated with the corresponding attributes. This way, we have easily obtained the candidate generic entity class and components, but we still have to describe the entity types. Starting from every concept whose reduced extent contains an entity type, Rosette uses the superconcept relation and goes up until reaching the concept at the top of the lattice. For example, the Persona entity type (Figure 4) would have components represented by formal concepts number
Keep in mind that the final component distribution does not include information about what components are needed for each entity. This knowledge is not thrown away: Rosette stores all the information in the original lattice using OWL, which provides a knowledge-rich representation that will let it provide some extra functionalities de- scribed in the next sections. 5 Expert Tuning The automatic process detailed above ends up with a collection of proposed compo- nents with a generated name, and the Entity base class that may have some common functionality. This result is presented to developers, who will be able to modify it using their prior experience. Some of the changes will affect to the underlying formal lattice (that is never shown to the users) in such a way that the relationship between it and the initial formal context extracted from the class hierarchy will be broken. At this stage of the process this does not represent an issue, because we will not use FCA anymore over it. On the other hand, changes could be so dramatic that the lattice could even become an invalid one. Fortunately, Rosette uses OWL as the underlying representation, that can be used to represent richer structures than mere partially ordered sets. In any case, for simplicity, in the rest of the paper we will keep talking about lattices although internally our tool will not be using them directly. Users will be able to perform the next four operators over the proposed component distribution: 1. Rename: proposed components are automatically named according to their at- tribute names. The first operator users may perform is to rename them in order to clarify its purpose. 2. Split: in some cases, two functionalities not related to each other may end up in the same component due to the entity type definitions (FCA will group two func- tionalities when both of them appears together in every entity type created in the formal hierarchy). In that case, Rosette gives developers the chance of splitting them in two different components. The expert will then decide which features re- main in the original component and which ones are moved to the new one (which is manually named). Formally speaking, this operator would modify the underly- ing concept lattice creating two concepts (A1, B1) and (A2, B2) that will have the same subconcepts and superconcepts than the original formal concept (A, B) where A ≡ A1 ≡ A2 and B ≡ B1 ∪ B2. The original concept is removed. Al- though this is not correct mathematically speaking, since with this operation we do not have concepts anymore, we still use the term in this and in the other operators for simplicity. 3. Move features: this is the opposite operator. Sometimes some features lie in dif- ferent components but the expert considers that they must belong to the same com- ponent. In this context, features of one component (some elements of the reduced Iterative Software Design of Computer Games through FCA 153 intent) can be transferred to a different component. In the lattice, this means that some attributes are moved from a node to another one. When this movement goes up-down (for example from node 9 to node 10), Rosette will detect the possible in- consistency (entities extracted from node 11 would end with missed features) and warns the user to clone the feature also in the component generated from node 11. If the developer moves all the features of a component the result is an useless and empty component that is therefore removed from the system. 4. 
Add features: some times features must be copied from one component to an- other one when FCA detects relationships that will not be valid in the long run. In our example, the dependency between node 3 and 4 indicates that all entities with a graphic model (4, GraphicsComp) will have physics (3, PhysicsComp), some- thing valid in the initial hierarchy but that is likely to change afterwards. With the initial distribution, all graphical entities will have an scale thanks to the physic component, but experts could envision that this should be a native feature of the GraphicsComp too. This operator let them to add those “missing” features to any component to avoid dependencies with other ones. The expert interaction is totally necessary, first of all because she has to name the components but also because the system ignores some semantic knowledge and infor- mation based in the developer experience. However, the bigger the example is, with more entity types, the more alike is the proposed and the final set of components, just because the system has more knowledge to distribute responsibilities. While using operators, coherence is granted because of the knowledge-rich OWL representation that contains semantic information about entities, components, and fea- tures (attributes and actions). This knowledge is useful while users tune the component distribution, but also to check errors in the domain and in future steps of the game development (as creating AIs that reason over the domain). Once users validate the final distribution, Rosette generates a big amount of source code for all the components, that programmers will be fill up with the concrete be- haviours. 5.1 Example Figure 5 showed the resultant candidate of components proposed by Rosette for the hierarchy of Figure 1, that can now be manipulated by the expert to tune some aspects. The first performed changes are component rename (rename operator) that is, in fact, applied in the figure. A hand-made component distribution of the original hierarchy would have ended with that one shown in Figure 3, that is quite similar to the distribution provided by Rosette. When using a richer hierarchy, both distributions are even more similar. With the purpose of demonstrating how the expert would use the available opera- tors to transform the proposed set of components, we apply some modifications to the automatically proposed distribution in order to turn it into the other one. First of all, we can consider the SpeedAttComp that has the speed attribute but no functionalities. In designing terms this is acceptable, but rarely has sense from the im- plementation point of view. Speed is used separately by PlayerControllerComp and 154 David Llansó et al. AIAndMovementComp to adjust the movement, so we will apply the move features operator moving (and cloning) the speed feature to both components, and removing SpeedAttComp completely. This operator is coherent with the lattice (Figure 4): we are moving the intent of the node labelled 9 to both subconcepts (10 and 11). After that, another application of the move features operator results in the movement of the touched message interpretation from the TriggerComp to the PhysicsComp. This is done for technical reasons in order to maintain all physic information in the same component. Then, the split operator, which split components, is applied over the AIAndMove- mentComp component twice. 
Due to the lack of entity types in the example, some fea- tures resides in the same component though in the real implementation are divided. In the first application of the split operator, the goToEntity and the goToPosition message interpretations are moved to a new component, which is named GoToComp. The second application results in the new SteeringToComp component with the steeringTo message interpretation and the speed attribute. The original component is renamed as AIComp by the rename operator and keeps the aiscript attribute. Finally, although the Entity class has received some generic features (from the top concept, >), they are especially important in other components. Instead of just use those features from the entity, programmers would prefer to maintain them also in those other components. For this reason, we have to apply the add features operator over the GraphicsComp, PhysicsComp and SpeakerComp components in order to add the setPosition and the setOrientation functionalities to them. 6 Iterative Software Development with FCA In the previous section we have presented a semi-automatic technique for moving from class hierarchies to components. The target purpose is helping programmers facing up to this kind of distributed system, which is widely used in computer game develop- ments. Through the use of FCA, this technique splits entity behaviours in candidate components but also provides experts with mechanisms for modifying these component candidates. These mechanisms are the operators defined in Section 5, which execution in the domain alter somehow the underlying formal lattice generated during the FCA process. Attentive readers will have realized that the previous technique is valid for the first step of the development but not for further development steps. Due to computer game requirements change throughout the game development, the entity distribution is al- ways changing. When the experts face up to this situation, they may decide to change the entity hierarchy in order to use Rosette for generating a new set of components. The application of FCA results in a new lattice that probably does not change a lot from the previous one. However, the experts usually would have performed some modifications in the proposed component distribution using our operators. As the process is now re- peated, these changes would be lost every time the expert request a new candidate set of components. Our intention in this section is to extend the previous technique in order to allow an iterative software design. In this new approach, the modifications applied over one Iterative Software Design of Computer Games through FCA 155 lattice can be extrapolated to other lattices in future iterations. Keep in mind that the domain operators (Section 5) are applied over components that has been created from a formal concept. So, these operators could be applied on similar formal concepts, of another domain, in case that both domains share the part of the lattice affected by the operators. From a high-level point of view, in order to preserve changes applied over the pre- vious component suggestions, the system compares the new formal lattice, obtained through FCA, with the previous one. The methodology identifies the part of the lattice that does not significantly change between the two FCA applications. This way the tun- ing operators executed in concepts of this part of the lattice could be reapplied in the new lattice. 
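Seen from the implementation side, the four tuning operators of Section 5 are little more than edits of a component-to-features map, and reapplying them after a new FCA run amounts to replaying those edits. The sketch below is purely illustrative, with hypothetical helper names; Rosette's OWL representation is richer than a plain dictionary.

def rename(components, old_name, new_name):
    components[new_name] = components.pop(old_name)

def split(components, name, new_name, moved_features):
    components[new_name] = set(moved_features)
    components[name] -= moved_features

def move_features(components, source, target, features):
    components[target] |= features
    components[source] -= features
    if not components[source]:               # an emptied component is removed
        del components[source]

def add_features(components, target, features):
    components[target] |= features           # the source component keeps them

# Simplified instance of the example of Section 5.1 (the paper clones _speed
# into both movement-related components; here only one target is shown).
components = {"SpeedAttComp": {"_speed"},
              "PlayerControllerComp": {"walk", "stopWalk", "turn"}}
move_features(components, "SpeedAttComp", "PlayerControllerComp", {"_speed"})
print(components)   # SpeedAttComp disappears, _speed moves to the controller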
The identification of the target part of the lattice is a semi-automatic process, where formal concepts are related in pairs. Rosette automatically identifies the constant part of the lattice, which for our purpose is the set of pairs of formal concepts that have the same reduced intent. We do not care about the extent in our approach since the component suggestion lays its foundations in the reduced intent. The components extrated from the formal concepts that have not been matched up are presented to the expert. Then she can provide matches between old components and new ones to the considered constant part of the lattice. It is worth mentioning that some of the operators could not be executed in the new domains due to component distribution may vary a lot after various domain iterations but it is just because these operators become obsoleted. 6.1 Example In Section 5.1 FCA is applied to a hierarchy and the automatic part of the proposed methodology leads us to the set of components in Figure 5. The resultant domain was modified by the expert, by using the tuning operators, and the component-based system developed ends up with the components in Figure 3. Now, let us recover the example and suppose that the game design has new require- ments. The game designers propose the addition of two new entity types: the Break- ableDoor, which is a door that can be broken using weapons, and a Teleporter, which moves entities that enter in them to a far target place. Designers also require the modifi- cation of the ResourceBearer entity, which must have a currentEnemy attribute for the artificial intelligence. The Rosette expert captures these domain changes by modifying the entity hierarchy and uses the component suggestion module to distribute responsi- bilities. The application of FCA to the current domain results in the lattice in Figure 6, where formal concepts are tagged with letters from a to n. Comparing the new lattice with the lattice of the previous FCA application (Fig- ure 4), Rosette determines that the pairs of formal concepts <1,a>, <2,b>, <4,d>, <7,f>, <9,k> and <11,m> remain from the previous to the current iteration. When Rosette finishes this automatic match, the formal concepts that were not put into pairs and with no empty reduced intent are presented in the screen. In this moment, the expert put the formal concepts <3,c>, <5,e>, <8,j> and <10,l> into pairs, based on their experience and in the fact that these concepts are very similar (only some attributes 156 David Llansó et al. Fig. 6. New concept lattice changes). Just the g and h formal concepts have no pairs and will become new compo- nents. So, in these steps, the part of the lattice that does not significantly change has been identified and Rosette can extrapolate the modifications applied in the previous lattice to the new one. After applying the operators to the new domain, the new set of candidate components are finally given to the expert. Figure 7 shows these components, where we can compare the result with the components in Figure 5. The general structure is maintained but some actions and attributes has been moved between components. Fur- thermore two new components have arisen. The stressed features denote new elements (or moved ones) whilst the crossed out features mean that they do not belong to this component anymore (FightComp. At this point the expert could continue with the itera- tion by applying new operators to this set of components (i.e change the auto-generated names of the new components). 
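The automatic half of this pairing reduces to comparing reduced intents between the two lattices; whatever cannot be paired automatically is shown to the expert, and the recorded operators are then replayed on the pairs. Again a purely illustrative sketch with hypothetical structures: concepts as in the earlier sketch, and an operator log whose entries pair a ready-to-apply closure with the old concept it was applied to.

def match_concepts(old_concepts, new_concepts):
    # Pair concepts of the old and the new lattice that share the same
    # (non-empty) reduced intent; the rest is left to the expert.
    pairs, unmatched = {}, []
    by_intent = {frozenset(c.reduced_intent): c for c in old_concepts}
    for c in new_concepts:
        old = by_intent.get(frozenset(c.reduced_intent))
        if old is not None and c.reduced_intent:
            pairs[id(old)] = c
        else:
            unmatched.append(c)              # presented on screen for manual pairing
    return pairs, unmatched

def replay(operator_log, pairs):
    # Re-apply recorded tuning operators to the concepts that survived the iteration.
    for apply_op, old_concept in operator_log:
        new_concept = pairs.get(id(old_concept))
        if new_concept is not None:
            apply_op(new_concept)            # unmatched targets: the operator is obsolete

In the example of Figure 6 the automatic step would produce the pairs <1,a>, <2,b>, <4,d>, <7,f>, <9,k> and <11,m>, leaving <3,c>, <5,e>, <8,j> and <10,l> to the expert.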
7 Related Work and Conclusions Regarding related work, we can mention other applications of FCA to software engi- neering. The work described in [12] focuses on the use of FCA during the early phases of software development. They propose a method for finding or deriving class can- didates from a given use case description. Also closely related is the work described in [10], where they propose a general framework for applying FCA to obtain a class hierarchy in different points of the software life-cycle: design from scratch using a set Iterative Software Design of Computer Games through FCA 157 Entity - _name - _position - _orientation + setPosition() + setOrientation() + update() + emmitMessage() GoToComp 0...* - goToEntity() PlayerControllerComp IComponent - goToPosition() - _speed - _entity SteeringToComp - walk() - _messages - _speed - stopWalk() + update() - turn() + handleMessage() - steeringTo() C_destination PhysicsComp TriggerComp AIComp GraphicsComp C_health FightComp SpeakerComp - destination - _target - _graphicmodel - health - _physicmodel - _aiscript - _health - _scale - _soundfile - teleportTo() - _physicclass - trigger() - _currentEnemy - hurt() - _aim - setPosition() - setPosition() - _scale - hurt() - setOrientation() - setOrientation() - setPosition() - shootTo() - setAnimation() - playSound() - setOrientation() DoorComp - applyForce() - stopAnimation() - stopSound() - touched() - _isOpen - open() - close() Fig. 7. The new candidate components proposed by Rosette of class specifications, refactoring from the observation of the actual use of the classes in applications, and hierarchy evolution by incrementally adding new classes. The main difference with the approach presented here is that they try to build a class hierarchy while we intend to distribute functionality among sibling components, which solve the problem with multiple inheritance in FCA lattices. The process of identifying components with FCA is not very different of identifying traits [13] and aspects [17]. In [13] Lienhard et al. present a process that identifies traits from inheritance hierarchies that is bases in the same principles than our system but is not exactly the same due to components are more autonomous pieces of software than traits. Components save their own state whilst traits are just a set of methods. However, which makes the difference between both proposals is the iterability. A possible scenario for applying the techniques described in the paper is to re- engineer a game from class hierarchy to components. In the last years, we have been working on Javy 2 [11], a educational game that was initially developed using an en- tity hierarchy (a portion was shown in Figure 1), and afterwards manually converted to a component-based architecture (Figure 3). When Rosette was available, we tested it using the original Javy 2 hierarchy, and the initial component distribution was quite ac- ceptable when compared with the human-made one. We could have saved a significant amount of time if it had been available on time. In the long term, our goal is to support the up-front development of games with a component-based architecture where entities are connected to a logical hierarchical view. In this paper we have shown how we allow an iterative process when defining the class hierarchy, so operators applied to the early versions of the component distribution are automatically reapplied in the late ones. Nevertheless, more work must be done in the code generation phase to do it reversible. 
Changes in the autogenerated source code 158 David Llansó et al. are still, unfortunately, out of the scope of Rosette so they must be manually redone for each class hierarchy iteration. References 1. K. Beck. Embracing change with extreme programming. Computer, 32:70–77, October 1999. 2. K. Beck and C. Andres. Extreme Programming Explained: Embrace Change (2nd Edition). Addison-Wesley Professional, 2004. 3. G. Birkhoff. Lattice Theory, third editon. American Math. Society Coll. Publ. 25, Provi- dence, R.I, 1973. 4. W. Buchanan. Game Programming Gems 5, chapter A Generic Component Library. Charles River Media, 2005. 5. M. Chady. Theory and practice of game object component architecture. In Game Developers Conference, 2009. 6. M. Dao, M. Huchard, T. Libourel, A. Pons, and J. Villerd. Proposals for Multiple to Single Inheritance Transformation. In MASPEGHI’04: 3rd Workshop on Managing SPEcializa- tion/Generalization Hierarchies, pages 21–26, Oslo (Norway), 2004. 7. S. Ducasse, O. Nierstrasz, N. Schärli, R. Wuyts, and A. P. Black. Traits: A mechanism for fine-grained reuse. ACM Trans. Program. Lang. Syst., 28:331–388, March 2006. 8. B. Ganter and R. Wille. Formal concept analysis. Mathematical Foundations, 1997. 9. S. Garcés. AI Game Programming Wisdom III, chapter Flexible Object-Composition Archi- tecture. Charles River Media, 2006. 10. R. Godin and P. Valtchev. Formal Concept Analysis, chapter Formal Concept Analysis-Based Class Hierarchy Design in Object-Oriented Software Development, pages 304–323. Springer Berlin / Heidelberg, 2005. 11. P. P. Gómez-Martı́n, M. A. Gómez-Martı́n, P. A. González-Calero, and P. Palmier-Campos. Using metaphors in game-based education. In K. chuen Hui, Z. Pan, R. C. kit Chung, C. C. Wang, X. Jin, S. Göbel, and E. C.-L. Li, editors, Technologies for E-Learning and Digital En- tertainment. Second International Conference of E-Learning and Games (Edutainment’07), volume 4469 of Lecture Notes in Computer Science, pages 477–488. Springer Verlag, 2007. 12. W. Hesse and T. A. Tilley. Formal Concept Analysis used for Software Analysis and Mod- elling, volume 3626 of LNAI, pages 288–303. Springer, 2005. 13. A. Lienhard, S. Ducasse, and G. Arévalo. Identifying traits with formal concept analysis. In Proceedings of the 20th IEEE/ACM international Conference on Automated software engi- neering, ASE ’05, pages 66–75, New York, NY, USA, 2005. ACM. 14. D. Llansó, M. A. Gómez-Martı́n, P. P. Gómez-Martı́n, and P. A. González-Calero. Explicit domain modelling in video games. In International Conference on the Foundations of Digital Games (FDG), Bordeaux, France, June 2011. ACM. 15. B. Rene. Game Programming Gems 5, chapter Component Based Object Management. Charles River Media, 2005. 16. K. Schwaber and M. Beedle. Agile Software Development with Scrum. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2001. 17. T. Tourwe and K. Mens. Mining aspectual views using formal concept analysis. In Proceed- ings of the Source Code Analysis and Manipulation, Fourth IEEE International Workshop, pages 97–106, Washington, DC, USA, 2004. IEEE Computer Society. 18. P. Valtchev, D. Grosser, C. Roume, and M. R. Hacene. Galicia: An open platform for lattices. In In Using Conceptual Structures: Contributions to the 11th Intl. Conference on Conceptual Structures (ICCS’03, pages 241–254. Shaker Verlag, 2003. 19. M. West. Evolve your hiearchy. Game Developer, 13(3):51–54, Mar. 2006. 
Fuzzy-Valued Triadic Implications

Cynthia Vera Glodeanu

Technische Universität Dresden, 01062 Dresden, Germany
Cynthia_Vera.Glodeanu@mailbox.tu-dresden.de

Abstract. We present a new approach for handling fuzzy triadic data in the setting of Formal Concept Analysis. The starting point is a fuzzy-valued triadic context (K1, K2, K3, Y), where K1, K2 and K3 are sets and Y is a ternary fuzzy relation between these sets. First, we generalise the methods of Triadic Concept Analysis to our setting and show how they fit other approaches to Fuzzy Triadic Concept Analysis. Afterwards, we develop the fuzzy-valued triadic implications as counterparts of the various triadic implications studied in the literature. These are of major importance for the integrity of Fuzzy and Fuzzy-Valued Triadic Concept Analysis.

Keywords: Formal Concept Analysis, fuzzy data, three-way data

1 Introduction

So far, the fuzzy approaches to Triadic Concept Analysis have considered all three components of a triadic concept as fuzzy sets. In [1] the methods of Triadic Concept Analysis were generalised to the fuzzy setting. A more general approach was presented in [2], where different residuated lattices were considered for each fuzzy set. A somewhat different strategy was followed in [3] using alpha-cuts. Our approach differs from these in considering just two components of a triadic concept as fuzzy and one as crisp. This is motivated by the fact that in some situations it is not appropriate to regard all sets as fuzzy. For example, it is not natural to say that half of a person is old; however, we may say that a person is half old.

First, we translate the methods of Triadic Concept Analysis to our setting. Compared to other works, we generalise all triadic derivation operators and show how they change for the fuzzy approaches considered by other authors. Besides these results, the main achievement of this paper is the generalisation of the various triadic implications presented in [4]. Due to the large number of results in this paper, we concentrate on giving an intuition of the methods and omit proofs whenever they do not influence the understanding. The missing proofs, as well as further results concerning fuzzy-valued triadic concepts and trilattices, can be found in [5]. There, we also study the fuzzy-valued triadic approach to Factor Analysis.

The paper is structured as follows: In Section 2 we give brief introductions to Triadic and Formal Fuzzy Concept Analysis. In Section 3 we develop our fuzzy-valued setting, defining context, concept and derivation operators, and show how they correspond to other approaches to Fuzzy Triadic Concept Analysis. We also comment on the reasons why our setting is a proper generalisation. In Section 4 we present the fuzzy-valued triadic implications. The developed methods are accompanied by illustrative examples. The last section contains concluding remarks and further topics of research.

2 Preliminaries

We assume basic familiarity with Formal Concept Analysis and refer the reader to [6]. In the following we give brief introductions to Triadic Concept Analysis [7, 8] and Formal Fuzzy Concept Analysis [9, 10].
2.1 Triadic Concept Analysis As introduced in [7], the underlying structure of Triadic Concept Analysis is a triadic context defined as a quadruple (K1 , K2 , K3 , Y ) where K1 , K2 and K3 are sets and Y is a ternary relation, i.e., Y ⊆ K1 × K2 × K3 . The elements of K1 , K2 and K3 are called (formal) objects, attributes and conditions, respectively, and (g, m, b) ∈ Y is read: object g has attribute m under condition b. A triadic concept (shortly triconcept) of a triadic context (K1 , K2 , K3 , Y ) is defined as a triple (A1 , A2 , A3 ) with Ai ⊆ Ki , i ∈ {1, 2, 3} that is maximal with respect to component-wise set inclusion. For a triconcept (A1 , A2 , A3 ), the components A1 , A2 and A3 are called the extent, the intent, and the modus of (A1 , A2 , A3 ), respectively. Small triadic contexts can be represented through three-dimensional cross tables (see Example 1). Pictorially, a triconcept is a rectangular box full of crosses in the three-dimensional cross table representation of (K1 , K2 , K3 , Y ), where this “box” is maximal under proper permutation of rows, columns and layers of the cross table. For {i, j, k} = {1, 2, 3} with j < k and for X ⊆ Ki and Z ⊆ Kj × Kk , the (−)(i) -derivation operators are defined by X 7→ X (i) := {(kj , kk ) ∈ Kj × Kk | (ki , kj , kk ) ∈ Y for all ki ∈ X}, (1) (i) Z 7→ Z := {ki ∈ Ki | (ki , kj , kk ) ∈ Y for all (kj , kk ) ∈ Z}. (2) These derivation operators correspond to the derivation operators of the dyadic contexts defined by K(i) := (Ki , Kj × Kk , Y (i) ) for {i, j, k} = {1, 2, 3}, where k1 Y (1) (k2 , k3 ) :⇐⇒ k2 Y (2) (k1 , k3 ) :⇐⇒ k3 Y (3) (k1 , k2 ) :⇐⇒ (k1 , k2 , k3 ) ∈ Y . Due to the structure of triadic contexts further derivation operators can be defined. For {i, j, k} = {1, 2, 3} and for Xi ⊆ Ki , Xj ⊆ Kj and Xk ⊆ Kk the (−)Xk -derivation operators are defined by Xi 7→ XiXk := {kj ∈ Kj | (ki , kj , kk ) ∈ Y for all (ki , kk ) ∈ Xi × Xk }, (3) Xj 7→ XjXk := {ki ∈ Ki | (ki , kj , kk ) ∈ Y for all (kj , kk ) ∈ Xj × Xk }. (4) Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 3 161 These derivation operators correspond to the derivation operators of the dyadic contexts defined by Kij ij ij Xk := (Ki , Kj , YXk ) where (ki , kj ) ∈ YXk if and only if (ki , kj , kk ) ∈ Y for all kk ∈ Xk . The structure on the set of all triconcepts T(K) is the set inclusion in each component of the triconcept. For each i ∈ {1, 2, 3} there is a quasiorder .i and its corresponding equivalence relation ∼i defined by (A1 , A2 , A3 ) .i (B1 , B2 , B3 ) :⇐⇒ Ai ⊆ Bi and (A1 , A2 , A3 ) ∼i (B1 , B2 , B3 ) :⇐⇒ Ai = Bi (i = 1, 2, 3). The triconcepts ordered in this way form complete trilattices, the triadic coun- terparts of concept lattices, as proved in the Basic Theorem of Triadic Concept Analysis [8]. However, unlike the dyadic case, the extents, intents and modi, respectively, do not form a closure system in general. Example 1. The triadic context displayed below consists of the object set K1 = {1, 2, 3}, the attribute set K2 = {a, b, c} and the condition set K3 = {A, B}. The context has 12 triconcepts which are displayed in the same figure on the right. For example, the first concept means that object 1 has attributes a and b under No. Extent Intent Modus No. Extent Intent Modus A B 1 {1} {a, b} {K3 } 7 {3} {K2 } {B} a b c a b c 2 {K1 } {b} {A} 8 {K1 } {a} {B} 1 ×× ×× 3 {2, 3} {b, c} {A} 9 {2, 3} {c} {K3 } 2 ×× × × 4 {∅} {K2 } {K3 } 10 {3} {b, c} {K3 } 3 ×× ××× 5 {1, 3} {a, b} {B} 11 {K1 } {K2 } {∅} 6 {2, 3} {a, c} {B} 12 {K1 } {∅} {K3 } Fig. 1. 
Triadic context and the associated triconcepts all conditions from K3 . However, as two components of a triconcept are necessary to determine the third one, {a, b} is also an intent of another triconcept, namely of the fifth one. 2.2 Formal Fuzzy Concept Analysis A complete residuated lattice L := (L, ∧, ∨, ⊗, →, 0, 1) is an algebra such that: (1) (L, ∧, ∨, 0, 1) is a complete lattice, (2) (L, ⊗, 1) is a commutative monoid, (3) 0 is the least and 1 the greatest element, (4) the adjointness property holds for all a, b, c ∈ L, i.e., a ⊗ b ≤ c ⇔ a ≤ b → c. Then, ⊗ is called mul- tiplication, → residuum and (⊗, →) adjoint couple. Each of the following adjoint couples make L a complete residuated lattice: 162 4 Fuzzy-Valued Cynthia Triadic Implications Vera Glodeanu Lukasiewicz: a ⊗ b := max(0, a + b − 1) with a → b := min(1, 1 − a + b) 1, a ≤ b Gödel: a ⊗ b := min(a, b) with a → b := b, a b 1, a ≤ b Product: a ⊗ b := ab with a → b := b/a, a b The hedge operator is defined as a unary function ∗ : L → L which satisfies the following properties: (1) 1∗ = 1, (2) a∗ ≤ a, (3) (a → b)∗ ≤ a∗ → b∗ , and (4) a∗∗ = a∗ . Typical examples are the identity, i.e., for all a ∈ L it holds that a∗ = a, and the globalization, i.e., a∗ = 0 for all a ∈ L \ {1} and a∗ = 1 if and only if a = 1. A triple (G, M, I) is called a formal fuzzy context if I : G × M → L is a fuzzy relation between the sets G and M and L is the support set of some residuated lattice. Elements from G and M are called objects and attributes, respectively. The fuzzy relation I assigns to each g ∈ G and each m ∈ M a truth degree I(g, m) ∈ L to which the object g has the attribute m. For fuzzy sets A ∈ LG and B ∈ LM the derivation operators are defined by ^ ^ Ap (m) := (A(g)∗ → I(g, m)), B p (g) := (B(m) → I(g, m)), (5) g∈G m∈M for g ∈ G and m ∈ M . Then, Ap (m) is the truth degree of the statement “m is shared by all objects from A” and B p (g) is the truth degree of “g has all attributes from B”. For now, we take for ∗ the identity. It plays an important role in the computation of the stem base, as we will see later. A fuzzy concept is a tuple (A, B) ∈ LG × LM such that Ap = B and p B = A. Then, A is called the (fuzzy) extent and B the (fuzzy) intent of (A, B). Fuzzy concepts represent maximal rectangles with truth values different from zero in the fuzzy context. The fuzzy concepts ordered by the fuzzy set inclusion form fuzzy concept lattices [9, 10]. Taking in (5) for ∗ hedges different from the identity, we obtain the so-called fuzzy concept lattices with hedges [11]. Example 2. The fuzzy context displayed below has the object set G = {x, y, z}, the attribute set M = {a, b, c, d} and the set of truth values is the 3-element chain L = {0, 0.5, 1}. Using the Gödel logic and the derivation operators defined in Equation 5 with the hedge ∗ being the iden- a b c d tity we obtain 10 fuzzy concepts. For example x 1 1 0.5 0 ({1, 0.5, 0}, {1, 1, 0, 0}) is a fuzzy concept. The extent y 1 0.5 0 1 contains the truth values of each object belonging to z 1 0 0 0.5 the extent, i.e., in this case x belongs fully to the set, y belongs to it with a truth value 0.5 and z does not belong to the extent. Similar affirmations can be done for the intent. Using the Lukasiewicz logic in the same setting we obtain 13 fuzzy concepts. On this set of truth values the only possible hedge operators are the identity and globalization. 
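Before turning to the role of the hedges, the fuzzy concept ({1, 0.5, 0}, {1, 1, 0, 0}) claimed in Example 2 can be re-checked mechanically with the derivation operators (5), the Gödel residuum and the identity hedge. The code and its names are ours; only the data and the expected result come from the example.

# Verify that ({1, 0.5, 0}, {1, 1, 0, 0}) is a fuzzy concept of the context
# of Example 2 under Goedel logic with the identity hedge.  Illustrative sketch.
G, M = ["x", "y", "z"], ["a", "b", "c", "d"]
I = {("x", "a"): 1, ("x", "b"): 1,   ("x", "c"): 0.5, ("x", "d"): 0,
     ("y", "a"): 1, ("y", "b"): 0.5, ("y", "c"): 0,   ("y", "d"): 1,
     ("z", "a"): 1, ("z", "b"): 0,   ("z", "c"): 0,   ("z", "d"): 0.5}

def impl(a, b):                     # Goedel residuum: a -> b
    return 1 if a <= b else b

def up(A):                          # first operator of (5), identity hedge
    return {m: min(impl(A[g], I[(g, m)]) for g in G) for m in M}

def down(B):                        # second operator of (5)
    return {g: min(impl(B[m], I[(g, m)]) for m in M) for g in G}

A = {"x": 1, "y": 0.5, "z": 0}
B = {"a": 1, "b": 1, "c": 0, "d": 0}
print(up(A) == B, down(B) == A)     # expected: True True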
As one of the major roles of the hedge operators is to control the size of the fuzzy concept lattice, the Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 5 163 number of fuzzy concepts will be smaller, when using in (5) a hedge different from the identity. In our example, using the globalization operator as the hedge, we obtain 6 fuzzy concepts both with the Gödel and Lukasiewicz logic. As we will see immediately, the hedges play also an important role for the attribute implications, especially for the stem base. Fuzzy implications were studied in a series of papers by R. Belohlavek and V. Vychodil, as for example in [12, 13]. For fuzzy sets A, B ∈VLX the subsethood degree of A being a subset of B is given by tv(A ⊆ B) = x∈X (A(x) → B(x)). Let A and B be fuzzy attribute sets, then the truth value of the implication A → B is given by tv(A → B) := tv(∀g ∈ G((∀m ∈ A, (g, m) ∈ I) → (∀n ∈ B, (g, n) ∈ I))) ^ ^ ^ = ( (A(m) → I(g, m)) → (B(n) → I(g, n))) g∈G m∈M n∈M pp = tv(B ⊆ A ). Example 3. Let us go back to our fuzzy context from Example 2. Consider the Gödel logic and the derivation operators from (5) with ∗ being the identity. Then, b(1)pp = {1, 0.5, 0}p = {1, 1, 0, 0}. Now, tv(b(1) → a(1)) = tv({a(1)} ⊆ b(1)pp ) = 1 and tv(b(1) → {a(1), c(0.5)}) = 0 because c(0.5) ∈ / b(1)pp . On the other hand, considering in (5) the globalization as the hedge, we obtain b(1)pp = {1, 0.5, 0}p = {1, 1, 0.5, 0} and therefore tv(b(1) → {a(1), c(0.5)}) = 1. Yet another example is tv(b(0.5) → b(1)) = tv({b(1)} ⊆ b(0.5)pp ) = tv({b(1)} ⊆ {1, 0.5, 0, 0}) = 0.5. Due to the large number of implications in a fuzzy and even in a crisp formal context, one is intrested in the stem base of the implications. The stem base is a set of implications which is non-redundant and complete. The existence and construction of the stem base for the discrete case was studied in [14], see also [6]. The problem for the fuzzy case was studied in [13]. There, the authors showed that using in (5) the globalization, the stem base of a fuzzy context is uniquely determined. Using hedges different from the globalization, a fuzzy context may have more than one stem base. 3 Fuzzy-Valued Triadic Concept Analysis Now, we are ready to develop our fuzzy-valued triadicsetting. We will define fuzzy-valued triadic contexts, concepts and derivation operators. For a triadic context K = (K1 , K2 , K3 , Y ) a dyadic-cut (shortly d-cut) is defined as ciα := (Kj , Kk , Yαjk ), where {i, j, k} = {1, 2, 3} and α ∈ Ki . A d-cut is actually a special case of Kij ij Xk = (Ki , Kj , YXk ) for Xk ⊆ Kk and |Xk | = 1. Each d-cut is itself a dyadic context. Definition 1. A fuzzy-valued triadic context ( f-valued triadic context) is a quadruple K := (K1 , K2 , K3 , Y ), where Y is a ternary fuzzy relation between the sets Ki with i ∈ {1, 2, 3}, i.e., Y : K1 ×K2 ×K3 → L and L is the support set 164 6 Fuzzy-Valued Cynthia Triadic Implications Vera Glodeanu of some residuated lattice. The elements of K1 , K2 and K3 are called objects, attributes and conditions, respectively. To every triple (k1 , k2 , k3 ) ∈ K1 × K2 × K3 , Y assigns a truth value tvk3 (k1 , k2 ) to which object k1 has attribute k2 under condition k3 . The f-valued triadic context can be represented as a three-dimensional table, the entries of which are fuzzy values (see Example 4). 
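Concretely, an f-valued triadic context is just a finite map from object–attribute–condition triples to truth degrees, and a condition d-cut fixes one condition and returns a dyadic fuzzy context, to which the derivation operators of Section 2.2 then apply. A minimal sketch over made-up toy data (all names are ours):

# An f-valued triadic context Y : K1 x K2 x K3 -> L stored as a dictionary,
# and the d-cut obtained by fixing one condition.  Toy data, illustration only.
K1, K2, K3 = ["g1", "g2"], ["m1", "m2"], ["c1", "c2"]
Y = {("g1", "m1", "c1"): 1,   ("g1", "m2", "c1"): 0.5,
     ("g2", "m1", "c1"): 0.5, ("g2", "m2", "c1"): 0,
     ("g1", "m1", "c2"): 1,   ("g1", "m2", "c2"): 1,
     ("g2", "m1", "c2"): 0,   ("g2", "m2", "c2"): 0.5}

def d_cut(condition):
    # The dyadic fuzzy context (K1, K2, Y^12_condition); a fuzzy context as noted above.
    return {(g, m): Y[(g, m, condition)] for g in K1 for m in K2}

print(d_cut("c1"))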
In K one can interchange the roles played by the sets K1 , K2 and K3 requiring, for example, that Y assigns to every triple (k2 , k3 , k1 ) a truth value tvk1 (k2 , k3 ) to which attribute k2 exists under condition k3 having object k1 . Definition 2. A fuzzy-valued triadic concept (shortly f-valued tricon- cept) of an f-valued triadic context (K1 , K2 , K3 , Y ) is a triple (A1 , A2 , A3 ) with A1 ⊆ LK1 , A2 ⊆ LK2 and A3 ⊆ K3 that is maximal with respect to component- wise set inclusion. The components A1 , A2 and A3 are called (f-valued) extent, (f-valued) intent, and the modus of (A1 , A2 , A3 ), respectively. We denote by T(K) the set of all f-valued triconcepts. This definition immediately implies that the d-cut (K1 , K2 , Yk12 3 ) is a fuzzy context for every k3 ∈ K3 . Example 4. We consider an f-valued triadic context with values from the 3- element chain {0, 0.5, 1}. The object set K1 = {1, 2, 3, 4, 5} contains 5 groups of students, the attribute set K2 = {f, s, v} contains 3 feelings, namely, fevered (f), serious (s), vigilant (v) and the condition set K3 = {E, P, F } contains the events: Doing an exam (E), giving a presentation (P) and meeting friends (F). Using the Lukasiewicz logic, we obtain 30 f-valued triconcepts and with the Gödel E P F f s v f s v f s v 1 1 1 1 1 0.5 0.5 0 0.5 1 2 1 0.5 1 0.5 0 0 0 0 0.5 3 0.5 0.5 0.5 0.5 0.5 0 0 0 0.5 4 0.5 0 0.5 0.5 0.5 0.5 0 0.5 0.5 5 1 1 1 1 0.5 0.5 0 0.5 1 Fig. 2. F-valued triadic context logic 34. For example, ({1, 1, 0.5, 0, 0}, {1, 0.5, 1}, {E}) is an f-valued triconcept meaning that while doing an exam the first two student groups and half of the third one are fevered, vigilant and moderately serious. Another example is ({1, 1, 1, 1, 1}, {0.5, 0, 0}, {E, P }) meaning that all students are moderately fevered while giving a presentation. Yet another example is ({1, 0, 0, 0.5, 1}, {1, 0, 0.5}, {E, P }) signifying that the first, the last and half of the 4-th group of students are fevered and moderately vigilant while doing an exam and giving a presentation. Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 7 165 Lemma 1. Every f-valued triadic context is isomorphic to a triadic context. Proof. According to [9], every fuzzy context is isomorphic to a formal context, namely to its double-scaled context. Every condition d-cut is a fuzzy context. By double-scaling each condition d-cut we obtain the corresponding double-scaled e for an f-valued triadic context K. triadic context K Formally, suppose that K e := (K + , K + , K3 , Ye ) is the double-scaled triadic 1 2 context of K := (K1 , K2 , K3 , Y ), the construction of which is given below. We have to show that the considered isomorphism is given by e with ϕ(A1 , A2 , A3 ) := (A+ , A+ , A3 ) ϕ : T(K) → T(K) 1 2 and that the inverse map is given by e → T(K) with ψ(A1 , A2 , A3 ) := (A♦ , A♦ , A3 ). ψ : T(K) 1 2 Therefore, we have to prove the following statements: For all f-valued triconcepts e : (A1 , A2 , A3 ), (B1 , B2 , B3 ) ∈ T(K) and for all (X1 , X2 , X3 ) ∈ T(K) e ϕ(A1 , A2 , A3 ) ∈ T(K), ψ(X1 , X2 , X3 ) ∈ T(K), ψϕ(A1 , A2 , A3 ) = (A1 , A2 , A3 ), ϕψ(X1 , X2 , X3 ) = (X1 , X2 , X3 ), (A1 , A2 , A3 ) .i (B1 , B2 , B3 ) ⇔ ϕ(A1 , A2 , A3 ) .i ϕ(B1 , B2 , B3 ), for all i ∈ {1, 2, 3}. These statements can be proven by basic properties of fuzzy sets and triadic derivation operators. Due to limitation of space, we skip the proof. 
t u We present the construction of K e := (K + , K + , K3 , Ye ), the double-scaled 1 2 triadic context, for a given f-valued triadic context K = (K1 , K2 , K3 , Y ). Let Xi ⊆ LKi with i ∈ {1, 2} and let L be the support set of some residuated lattice. We define Xi+ := {(ki , µ) | ki ∈ Ki , µ ∈ L, µ ≤ Xi (ki )} ⊆ Ki∗ := Ki × L, _ Xi♦ := {µ | (ki , µ) ∈ Xi } ⊆ LKi . Then, Ye ⊆ K1+ × K2+ × K3 and ((k1 , µ), (k2 , λ), k3 ) ∈ Ye :⇐⇒ µ ⊗ λ ≤ tvk3 (k1 , k2 ) ⇐⇒ µ ⊗ λ ≤ Y (k1 , k2 , k3 ). According to the above lemma, the f-valued triadic contexts fulfill all the properties the triadic contexts have. The f-valued triconcepts ordered by the (fuzzy) set inclusion form a complete fuzzy trilattice. Due to limitation of space we omit the proofs. For our f-valued setting we want to obtain the corresponding (−)Ak and (−)(i) derivation operators. However, these can be defined in various ways. We 166 8 Fuzzy-Valued Cynthia Triadic Implications Vera Glodeanu distinguish between more cases for the (−)(i) -derivation operators. In case of the (−)(i) -derivation operators with Z = Xj × X3 ⊆ LKj × K3 and Xi ⊆ LKi for {i, j} = {1, 2} the situation is easy. They are defined as ^ Z 7→ Z (i) := {Xj (kj ) → Y (i) (ki , (kj , k3 )) | ∀k3 ∈ K3 }, kj ∈Kj (i) Xi 7→ Xi := (Tl3 , {k3 ∈ K3 | Tk3 ⊆ Tl3 }) for l3 ∈ K3 , V where Tl3 := ki ∈Ki (Xi (ki ) → Y (i) (ki , (kj , l3 )) with the derivation operators from the fuzzy dyadic context K(i) := (Ki , Kj ×K3 , Y (i) ) and Y (i) (ki , (kj , k3 )) := Y (ki , kj , k3 ). The (−)(3) -derivation operator for Z := X1 × X2 ⊆ LK1 × LK2 and X3 ⊆ K3 is defined by Z 7→ Z (3) : = {k3 ∈ K3 | k1 ⊗ k2 ≤ tvk3 (k1 , k2 ), ∀(k1 , k2 ) ∈ Z} (6) ^ = (Z(k1 , k2 ) → Y (3) ((k1 , k2 ), k3 ))∗ , (7) (k1 ,k2 )∈K1 ×K2 where Z(k1 , k2 ) := X1 (k1 ) ⊗ X2 (k2 ), ∗ is the globalization in order to assure that Z (3) is crisp and we have the dyadic fuzzy context K(3) := (K1 × K2 , K3 , Y (3) ) with Y (3) ((k1 , k2 ), k3 ) := Y (k1 , k2 , k3 ). We search for the conditions which con- tain the maximal rectangle generated by Z. (3) The situation for X3 is quite tricky. Applying the derivation operators in K for X3 , we get a truth value l ∈ L such that l = k1 ⊗ k2 instead of a tuple (3) (k1 , k2 ). To obtain such a tuple, we first have to compute the double-scaled con- text K.e Afterwards, we use the crisp (−)(3) -derivation operator in K e to find the components of the triconcept. Finally, we transform these into fuzzy sets as de- scribed in the construction of K. e This way, we obtain the tuples ((k1 , µ), (k2 , ν)) consisting of objects and attributes with their truth values instead of the truth value k1 ⊗ k2 . For other approaches of fuzzy triadic data the derivation operators given in (7) and the above construction suffice for any (−)(i) derivation operator. Proposition 1. The (−)(i) -derivation operators with i ∈ {1, 2, 3} yield f-valued triconcepts. (1) Proof. Suppose X1 ⊆ LK1 , X2 ⊆ LK2 and X3 ⊆ K3 . We have X1 = (Tl3 , {k3 | Tk3 ⊆ Tl3 }), where ^ Tl3 = (X1 (k1 ) → Y (1) (k1 , (k2 , l3 ))) k1 ∈K1 ^ = (X1 (k1 ) → Yl12 3 (k1 , k2 )). k1 ∈K1 Since K12l3 is a dyadic fuzzy context, (X1 , Tl3 ) =: (X1 , A2 ) is a fuzzy preconcept in Kl3 , i.e., X1p ⊆ A2 and Ap2 ⊆ X1 with the derivation operators of K12 12 l3 given Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 9 167 by Equation (5). In particular we have X1p ⊆ A2 . For any k3 ∈ K3 if Tk3 ⊆ Tl3 , then (X1 , A2 ) is a fuzzy preconcept also in K12 l3 ∪k3 . Proceeding alike, we obtain the largest set A3 ⊆ K3 containing l3 such that TA3 ⊆ Tl3 . 
Then, (X1 , A2 ) is a fuzzy preconcept in K12 A3 . So far, we obtained the last two components of the f-valued triconcept and apply on them the (−)(1) -derivation operator to obtain the first one. Now, we have ^ (A2 × A3 )(1) = {A2 (k2 ) → Y 1 (k1 , (k2 , k3 )) | ∀k3 ∈ A3 } k2 ∈K2 ^ = (A2 (k2 ) → YA123 (k1 , k2 )), k2 ∈K2 which is A2 derivated in K12 A3 , i.e., the fist component of the triconcept, namely A1 . Since (A1 , A2 ) is a fuzzy concept, it is a maximal rectangle and A3 is the largest set containing this maximal rectangle. We still have to check the other pair of derivation operators. Let X3 ⊆ K3 , (3) then the maximality of X3 = (A1 , A2 ) is automatically satisfied, as we obtain (3) X3 from the double scaled context. The maximality of (A1 × A2 )(3) follows analogously to the first case. tu As a direct consequence of this proposition, we have the following statement: Proposition 2. For an f-valued triconcept (A1 , A2 , A3 ) it holds that Ai = (Aj × Ak )(i) for {i, j, k} = {1, 2, 3} with j < k. t u For the (−)Ak -derivation operators we also distinguish between two cases, namely when Ak is a crisp set and when it is fuzzy. When Ak is crisp, i.e., Ak := A3 we proceed as follows: For Xi ⊆ LKi with i ∈ {1, 2} and A3 ⊆ K3 we define ^ X1 7→ X1A3 := (X1 (k1 )• → YA123 (k1 , k2 )), (8) k1 ∈K1 ^ X2 7→ X2A3 := (X2 (k2 ) → YA123 (k1 , k2 )) (9) k2 ∈K2 for the dyadic fuzzy context K12 12 A3 := (K1 , K2 , YA3 ). where YA123 : K1 × K2 × A3 → L, ^ YA123 (k1 , k2 ) := {tvk3 (k1 , k2 ) | ∀(k1 , k2 , k3 ) ∈ K1 × K2 × A3 }. These derivation operators are the fuzzy counterparts of the (−)Ak -derivation operators, because Ak is crisp. In the discrete case we have (ki , kj ) ∈ YAi,jk if and only if for all kk ∈ Ak it holds that (ki , kj , kk ) ∈ Y . Therefore, in the fuzzy setting for YAij3 (ki , kj ), we take the minimum of the values tvk3 (ki , kj ). Since K12 A3 is a fuzzy context, the (−)A3 -derivation operators form fuzzy Galois connections. 168 10 Cynthia Fuzzy-Valued Triadic Implications Vera Glodeanu In (8) we will need the hedge • for the computation of the unique stem base, however in general we take the identity for this hedge. For the (−)Aj -derivation operators with {i, j} = {1, 2} the situation is dif- ferent, because Aj is a fuzzy set. In the following we discuss more possibilities to obtain these derivation operators. In such cases we are interested in the relation between Ki and K3 for the values of Aj . This means that we are interested in just a part of the double-scaled context K,e namely in K e A := V + e j aj ∈Aj (Ki , K3 , aj , Y ). So, we could use discrete derivation operators to compute the concepts of K eA j and afterwards transform them into fuzzy concepts. However, this is a laborious task and was presented just for a better understanding of the problem. Another approach for the (−)Aj -derivation operators is the following: A Xi 7→ Xi j := {k3 ∈ K3 | ki ⊗ kj ≤ tvk3 (ki , kj ), ∀(ki , kj ) ∈ Xi × Aj }, A _ X3 7→ X3 j := {ki ∈ LKi | ki ⊗ kj ≤ tvk3 (ki , kj ), ∀(k3 , kj ) ∈ X3 × Aj }. In this case we do not need to double-scale the context. We compute the fuzzy concept induced by Xi and Aj and check under which conditions it exists. This A way we obtain Xi j , i.e., the third component of the f-valued triconcept that is A induced by Xi and Aj . To obtain X3 j we consider each ki ∈ LKi and check whether the maximal rectangle ki ⊗ Aj exists under the fixed conditions of X3 . Afterwards, we take the maximum of these ki ’s due to the maximality property of f-valued triconcepts. 
This approach is laborious, especially the computation A of X3 j due to the large number of ki ’s we have to check. We will consider a more straight-forward approach by computing the fuzzy context induced by Aj . A similar approach was presented in [1]. For Xi ∈ LKi , Aj ∈ LKj with {i, j} = {1, 2} and A3 ⊆ K3 we have A ^ Xi 7→ Xi j := (Xi (ki )• → YAi3j (ki , k3 ))∗ , (10) ki ∈Ki A ^ X3 7→ X3 j := (Xj (kj ) → YAi3j (ki , k3 )), (11) kj ∈Kj where • and ∗ are hedge operators. The • operator is optional, as it is needed just for the computation of the stem base. It is the identity if i = 1. The ∗ A hedge is always a compulsory globalization in order to assure that Xi j yields a crisp set. Then, (10) and (11) are the V derivation operators of the fuzzy context (Ki , K3 , YAi3j ) where YAi3j (ki , k3 ) := kj ∈Kj (Aj (kj ) → Y (ki , kj , kk )). Considering in (10) and (11) all values for the indices, i.e., instead of (−)Aj we take (−)Ak for {i, j, k} = {1, 2, 3}, and ignoring ∗ , these derivation operators suffice for other approaches to Fuzzy Triadic Concept Analysis. This happens due to the fact that such derivation operators yield triconcepts in which all three components are fuzzy sets. Proposition 3. For {i, j, k} = {1, 2, 3} there are (fuzzy) sets Xi ∈ LKi (Xi ∈ Ki , if i = 3) and Xk ∈ LKk (Xk ∈ Kk , if k = 3) such that Aj := XiXk , Ai := Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 11 169 AXj k and Ak := (Ai × Aj )(k) (if i < j) or Ak := (Aj × Ai )(k) (if i > j). Then, (A1 , A2 , A3 ) is an f-valued triconcept denoted by bik (Xi , Xk ) having the smallest k-th component under all f-valued triconcepts (B1 , B2 , B3 ) with the largest j- th component satisfying Xi ⊆ Bi and Xk ⊆ Bk . Particularly, bik (Ai , Ak ) = (A1 , A2 , A3 ) for each f-valued triconcept (A1 , A2 , A3 ) of K. Proof. Without loss of generality we can assume (i, j, k) = (1, 2, 3). Obviously, X1 ⊆ A1 and X3 ⊆ A3 . We start by proving that (A1 , A2 , A3 ) is indeed an f-valued triconcept. From Proposition 1 we have A3 = (A1 × A2 ). Then, (A ×A )(3) A2 ⊆ A1 1 2 = AA 1 3 ⊆ X1X3 = A2 . Hence, A2 = AA 1 3 = (A1 × A3 )(2) , (1) similarly A1 = (A2 × A3 ) and together with Proposition 2 they yield an f-valued triconcept. The rest of the proof is analogous to the crisp case. Let (B1 , B2 , B3 ) ∈ T(K) with X1 ⊆ B1 and X3 ⊆ B3 . Then, B2 ⊆ A2 , because B2 = (B1 ×B3 )(2) = B1B3 ⊆ X1X3 = A2 . If B2 = A2 , by similar consideration as before, we obtain B1 ⊆ A1 . Therefore, we have A3 = (A1 × A2 )(3) ⊆ (B1 × B2 )(3) = B3 , finishing the first part of the proof. Now, if (A1 , A2 , A3 ) is an f-valued tricon- cept, then AA 1 = (A1 × A3 ) 3 (2) = A2 and AA2 = (A2 × A3 ) 3 (1) = A1 . Therefore, bik (A1 , A3 ) = (A1 , A2 , A3 ) follows by the first part of the proposition. t u 4 F-valued Implications In this section we will study f-valued implications, as generalisations of those elaborated for the discrete case in [4]. There, the authors presented various triadic implications, which are stronger than the ones developed in [15]. For a given discrete triadic context K = (K1 , K2 , K3 , Y ) and for R, S ⊆ K2 and C C ⊆ K3 the expression R → S was called conditional attribute implication. For C R, S ⊆ K3 and C ⊆ K2 the expression R → S was called attributional condition implication. Implications of the form R → S with R, S ⊆ K2 × K3 were called attribute×condition implications. Our main aim in the upcoming subsections is to generalise such implications to our setting. 4.1 F-valued Conditional Attribute vs. 
Attributional Condition Implications In this subsection we study implications of the form: If we are moderately vigilant during an exam, then we are also fevered and If we are serious during an exam, then we feel the same during our presentation. Definition 3. For R, S ⊆ LK2 , C ⊆ K3 and globalization • we call the expres- C sion R → S f-valued conditional attribute implication and its truth value is given by C R → S := tv(∀g ∈ K1 ((∀m ∈ R, (g, m) × C ∈ Y )• → (∀n ∈ S, (g, n) × C ∈ Y ))) ^ ^ ^ = ( (R(m) → YC12 (g, m)) → (S(n) → YC12 (g, n))) g∈K1 m∈K2 n∈K2 CC = tv(S ⊆ R ). 170 12 Cynthia Fuzzy-Valued Triadic Implications Vera Glodeanu Note that these implications are ordinary fuzzy implications since we are working in the fuzzy context K12 C. Example 5. For the context given in Figure 2 we have, for example, the f-valued E P conditional attribute implication s(0.5) → f (1) = s(0.5) → f (1) = 0.5 and yet F another is s(0.5) → f (1) = 0. The first implication means that whenever the students are partially serious during an exam then they are also fevered. The same holds for this implication during a presentation given by the students. The implication does not hold when they are meeting their friends. In such situations the students can be serious but have a relaxed attitude. For an f-valued triadic context K we denote by Imp(K2 ) := {R → S | R, S ∈ LK2 } the set of all fuzzy implications on K2 . We construct the dyadic context Cimp (K) := (Imp(K2 ), K3 , I) c where Imp(K2 ) is a fuzzy set, K3 is a crisp set and I(R → S, c) := R → S. In order to keep the condition set crisp, we use in Cimp (K) a slightly different version of the dyadic fuzzy derivation operators defined in (5), namely ^ ^ Ap (m) := (A(g)∗ → I(g, m)), B p (g) := ( (B(m) → I(g, m)))• g∈Imp(K2 ) m∈K3 for A ∈ Imp(K2 ), B ∈ K3 and ∗ is the globalization. Then, (A, B) ∈ B(Cimp (K)) contains in its extent all the implications that hold under all conditions of B. As in the crisp case, each extent is an implicational theory and hence, every extent has a stem base. In the concept lattice of Cimp (K) the implicational theories are hierarchically ordered by the conditions under which they hold. The extent A is the set of all implications that hold in (K1 , K2 , Yc12 ) with c ∈ C. The number of fuzzy implications can be very large, since we have all impli- cations A → B with A, B ⊆ LK2 . In the crisp case an implication either holds or not, whereas in the fuzzy case an implication holds V with a given truth value, i.e., with tv(A → V B). We have tv(a → b, c) = {tv(a → b), tv(a → c)} and tv(a, b → c) = {tv(a → c), tv(b → c)} for all a, b, c ∈ LK2 . Hence, for the structure of Cimp (K) it is enough to compute implications of the form a → b and a(µ) → a(ν) for all a, b ∈ LK2 with b 6= a and µ, ν ∈ L with µ ν. As discussed before, the other implications are infimum reducible elements in the lattice. In accordance with the idea presented in [4] we label the concept lattice of Cimp (K) as follows: The attribute labelling is done in the usual way. For the object labelling the situation is more cumbersome. Each set of implications from Imp(K2 ) is an extent of Cimp (K) and an implicational theory, as discussed above. The object labels shall be distributed such that every extent is generated as an implicational theory by the labels attached to it and to its subconcepts. Therefore, the bottom element of the lattice will contain the stem base of all f-valued conditional attribute implications. 
Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 13 171 18 v(0.5) f (0.5) 2 17 s(1) 15 16 Friends Exam Presentation v(1) 13 f (1) 4 3 5 14 12 s(0.5) 6 7 11 10 v(1) → f (1) E, F → P 8 9 Fig. 3. Conditional attribute vs. attributional condition implications On the left part in Figure 3 the lattice of Cimp (K) is displayed. For better legibility we used just the attribute labels (the conditions) and one object label (conditional attribute implication). The implication v(1) → f (1) from the lattice means that whenever the students are vigilant in degree (truth value) 1 during an exam and presentation they are also fevered in degree 1 in these situations. C An implication C → D between the intents of Cimp (K) means that if R → S D holds, then R → S must hold as well. For our example the stem base of Cimp (K) is P, F → E. We could perform a condition attribute exploration as proposed in [4] for the discrete case, however this would go beyond the scope of this paper. In a triadic context we may arbitrarily interchange the roles of objects, at- tributes and conditions. Therefore, a triadic context has a sixfold symmetry. By interchanging attributes with conditions in Definition 3, we obtain the attribu- tional condition implications defined as follows: Definition 4. For R, S ⊆ K3 and M ⊆ LK2 the expression M R → S := tv(∀g ∈ K1 ((∀a ∈ R, g × M × a ∈ Y ∗ ) → (∀b ∈ S, g × M × b ∈ Y ∗ ))) ^ ^ ^ = ( (R(a) → YM 13 (g, a))∗ → (S(b) → YM13 (g, a))∗ ), g∈K1 a∈K3 b∈K3 is called f-valued attributional condition implication, where ∗ is the glob- alization. M We use the globalization hedge operator because this time R → S is a crisp implication. For example, for the f-valued triadic context from Table 2 we have v(1) the attributional condition implication P → E, F = 1, meaning that students who are vigilant during a presentation are also vigilant during an exam and while f (1) meeting friends. On the other hand, P → E, F = 0 means that a student being fevered during a presentation does not imply that he/she is fevered during an exam and while spending time with friends. 172 14 Cynthia Fuzzy-Valued Triadic Implications Vera Glodeanu In analogy to the conditional attribute implications, we can also build the e := (Imp(K3 ), K2 ×L, I) for the attributional condition implica- context Cimp (K) tions. This time we have Imp(K3 ) := {R → S | R, S ∈ K3 }, i.e., all implications m e consist of all impli- on K3 and I(R → S, m) := R → S. The extents of Cimp (K) 13 cations that hold in (K1 , K3 , Ym ) with m ∈ K2 . The concept lattice is displayed on the right in Figure 3. For example the implication E, F → P means that if the students during an exam and while meeting friends are (partially) fevered and (partially) serious, then they have the same feelings during their presentation. The connection between the two classes of implications is an open question even for the discrete case and it remains open for the f-valued triadic case as well. 4.2 F-valued Attribute×Condition Implications As presented for the discrete case, the two classes of implications studied so far are not powerful enough to express all possible kinds of implications in a triadic context. Therefore, we will generalise the so-called attribute×condition implications to our setting. These express implications of the form If we are serious during our presentation, then we are moderately fevered during the exam. Definition 5. 
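The two truth values claimed in Example 5 can be recomputed directly from Definition 3. The sketch below assumes the Gödel residuum together with the globalization hedge •; the code and its names are ours, while the data are those of Figure 2.

# Truth value of the f-valued conditional attribute implication (Definition 3)
# on the context of Figure 2, for the single conditions E and F.
K1 = [1, 2, 3, 4, 5]
K2 = ["f", "s", "v"]
Y12 = {                    # Y^12_C for C = {E} and C = {F}
  "E": {1: {"f": 1, "s": 1, "v": 1},     2: {"f": 1, "s": 0.5, "v": 1},
        3: {"f": 0.5, "s": 0.5, "v": 0.5}, 4: {"f": 0.5, "s": 0, "v": 0.5},
        5: {"f": 1, "s": 1, "v": 1}},
  "F": {1: {"f": 0, "s": 0.5, "v": 1},   2: {"f": 0, "s": 0, "v": 0.5},
        3: {"f": 0, "s": 0, "v": 0.5},   4: {"f": 0, "s": 0.5, "v": 0.5},
        5: {"f": 0, "s": 0.5, "v": 1}}}

def impl(a, b):                       # Goedel residuum
    return 1 if a <= b else b

def glob(a):                          # globalization hedge
    return 1 if a == 1 else 0

def tv(R, S, condition):              # truth value of R -condition-> S
    table = Y12[condition]
    values = []
    for g in K1:
        premise = min(impl(R.get(m, 0), table[g][m]) for m in K2)
        conclusion = min(impl(S.get(m, 0), table[g][m]) for m in K2)
        values.append(impl(glob(premise), conclusion))
    return min(values)

print(tv({"s": 0.5}, {"f": 1}, "E"))  # expected: 0.5, as in Example 5
print(tv({"s": 0.5}, {"f": 1}, "F"))  # expected: 0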
For R, S ⊆ LK2 × K3 the expression R → S is an f-valued attribute×condition implication and its truth value is given by ^ ^ ^ ( (R(m, b) → Y (g, m, b))• → (S(n, c) → Y (g, n, c))), g∈K1 (m,b)∈K2 ×K3 (n,c)∈K2 ×K3 where • is the globalization, if we want to compute the unique stem base, other- wise the identity. These are the attribute implications of the fuzzy context (K1 , K2 ×K3 , Y (1) ). Their stem base is given by the stem base of the attribute implications from (K1 , K2 × K3 , Y (1) ). Obviously, such implications can be easily obtained by the f-valued condi- C tional attribute and attributional condition implications, i.e., if we have R → K2 S for R, S ⊆ L , C ⊆ K3 , then we can compute R × {c} → S × {c} for all c ∈ C. Going the other way around, namely transforming the f-valued attribute×condition implications into f-valued conditional attribute and attri- butional condition implications, is of course also possible. One could also be interested in f-valued object×attribute or object×condi- tion implications. For our example this would mean If the first group of students is fevered, then the second one is serious. 5 Conclusion and Further Research First, we presented a new framework for treating triadic fuzzy data. For this setting we generalised the notions of the (−)Ak and (−)(i) derivation operators, Fuzzy-ValuedTriadic Fuzzy-valued TriadicImplications Implications 15 173 triconcepts and trilattices. We also showed how our notions can be translated into different approaches to Fuzzy Triadic Concept Analysis studied by other authors. One of our main results is the generalisation of the (−)(i) derivation operator for the f-valued triadic and fuzzy triadic setting, since it is absent in other works dealing with fuzzy triadic data. Second, we generalised triadic implications to our f-valued setting. These are of major importance for the development of Fuzzy and Fuzzy-Valued Triadic Concept Analysis. Future research will focus on the connection between the different classes of f-valued triadic implications. As mentioned at the beginning, [5] is an extended version of this paper including the factorization problem. In the future we want to apply the f-valued triadic factorization to real world data. References 1. Belohlávek, R., Osicka, P.: Triadic concept analysis of data with fuzzy attributes. In Hu, X., Lin, T.Y., Raghavan, V.V., Grzymala-Busse, J.W., Liu, Q., Broder, A.Z., eds.: GrC, IEEE Computer Society (2010) 661–665 2. Osicka, P., Konecny, J.: General approach to triadic concept analysis 116-126. In Kryszkiewicz, M., Obiedkov, S.A., eds.: Proc. CLA 2010, University of Sevilla (2010) 116–126 3. Clara, N.: Hierarchies generated for data represented by fuzzy ternary relations. In: Proceedings of the 13th WSEAS international conference on Systems, Stevens Point, Wisconsin, USA, World Scientific and Engineering Academy and Society (WSEAS) (2009) 121–126 4. Ganter, B., Obiedkov, S.A.: Implications in triadic formal contexts. In: ICCS. (2004) 186–195 5. Glodeanu, C.: Fuzzy-valued triadic concept analysis and its applications. Technical Report MATH-AL-07-2011, Technische Universitat Dresden (September 2011) 6. Ganter, B., Wille, R.: Formale Begriffsanalyse: Mathematische Grundlagen. (1996) 7. Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In Ellis, G., Levinson, R., Rich, W., Sowa, J.F., eds.: ICCS. Volume 954 of Lecture Notes in Computer Science., Springer (1995) 32–43 8. Wille, R.: The basic theorem of triadic concept analysis. Order 12 (1995) 149–158 9. 
Pollandt, S.: Fuzzy Begriffe. Springer Verlag, Berlin Heidelberg New York (1997) 10. Belohlávek, R.: Fuzzy Relational Systems: Foundations and Principles. Volume 20 of IFSR Int. Series on Systems Science and Engineering. Kluwer Academic/Plenum Press (2002) 11. Belohlávek, R., Vychodil, V.: Fuzzy concept lattices constrained by hedges. JACIII 11(6) (2007) 536–545 12. Belohlávek, R., Vychodil, V.: Attribute implications in a fuzzy setting. In: ICFCA. (2006) 45–60 13. Belohlávek, R., Vychodil, V., Chlupová, M.: Implications from data with fuzzy attributes. In: AISTA 2004 in Cooperation with the IEEE Computer Society Proceedings. (2004) 14. Guigues, J.L., Duquenne, V.: Familles minimales d’implications informatives re- sultant d’un tableau de donnes binaires. Math. Sci. Humaines (95) (1986) 15. Biedermann, K.: A Foundation of the Theory of Trilattices. Shaker Verlag, Aachen (1988) Mining Biclusters of Similar Values with Triadic Concept Analysis Mehdi Kaytoue1 , Sergei O. Kuznetsov2 , Juraj Macko3 , Wagner Meira Jr.1 and Amedeo Napoli4 1 Universidade Fereral de Minas Gerais – Belo Horizonte – Brazil 2 HSE – Pokrovskiy Bd. 11 – 109028 Moscow – Russia 3 Palacky University – 17. listopadu – 77146 Olomouc – Czech Republic 4 INRIA/LORIA – Campus Scientifique, B.P. 239 – Vandœuvre-lès-Nancy – France kaytoue@dcc.ufmg.br, kuznetsovs@yandex.ru, juraj.macko@upol.cz, meira@dcc.ufmg.br, napoli@loria.fr Abstract. Biclustering numerical data became a popular data-mining task in the beginning of 2000’s, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data-table. So called biclusters of similar values can be thought as maximal sub-tables with close values. Only few methods address a complete, correct and non redundant enumeration of such patterns, which is a well-known in- tractable problem, while no formal framework exists. In this paper, we introduce important links between biclustering and formal concept anal- ysis. More specifically, we originally show that Triadic Concept Analysis (TCA), provides a nice mathematical framework for biclustering. Inter- estingly, existing algorithms of TCA, that usually apply on binary data, can be used (directly or with slight modifications) after a preprocessing step for extracting maximal biclusters of similar values. Keywords: Triadic concept analysis, numerical biclustering, scaling 1 Introduction Numerical data biclustering mainly appeared in the beginning of 2000’s as a first answer to new challenges raised by biological data analysis, and especially gene expression data analysis [13]. Starting from an object/attribute numerical data-table (e.g. Table 1), the goal is to group together some objects with some attributes according to the values taken by these attributes for these objects [13]. Accordingly, a bicluster is formally defined as a pair composed of a set of ob- jects and a set of attributes. Such pair can be represented as a rectangle in the numerical table, modulo lines and columns permutations. Table 1 is a numerical dataset with objects in lines and attributes in columns, while each table entry corresponds to the value taken by the attribute in column for the object in line. Table 2 illustrates bicluster ({g1 , g2 , g3 }, {m1 , m2 , m3 }) as a grey rectangle. 
There are several types of biclusters in the literature (see [13] for a survey), depending on the relation between the values taken by their attributes for their c 2011 by the paper authors. CLA 2011, pp. 175–190. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 176 Mehdi Kaytoue et al. objects. The most simple case can be understood as rectangles of equal val- ues: a bicluster corresponds to a set of objects whose values taken by a same set of attributes are exactly the same, e.g. ({g1 , g2 , g3 }, {m5 }). Constant bi- clusters only appear in idyllic situations: generally numerical data are noisy. Accordingly, a straightforward generalization of such biclusters lies in so called biclusters of similar values: they are represented by rectangles with almost iden- tical, say similar, values [13, 1, 7]. Table 2 illustrates a bicluster of similar values ({g1 , g2 , g3 }, {m1 , m2 , m3 }) where two values are said to be similar if their dif- ference is no more than 1. Moreover, this bicluster is maximal: neither an object nor an attribute can be added without violating the similarity condition. Only few methods address a complete, correct and non redundant enumer- ation of such patterns [1, 7], which is a well-known intractable problem [13], while no formal framework exists. In this paper, we show that Formal Concept Analysis (FCA) [3], and especially Triadic Concept Analysis (TCA) [12] pro- vides a suitable and well defined framework for this task: Basically, an object has an attribute under a condition (a value). After a simple scaling procedure (turning original data into binary), a bicluster is represented as a triadic con- cept, composed of a set of objects, a set of attributes (both characterizing the corresponding “rectangle”) and a set of values. All sets are maximal thanks to existing concept forming derivation operators of TCA. This comes with several advantages: – Two values w1 , w2 of the original data are said to be similar iff their difference does not exceed a given parameter θ. In this case, we write w1 'θ w2 ⇐⇒ |w1 − w2 | ≤ θ. Otherwise, we write w1 6'θ w2 . The trilattice produced with TCA after scaling gives all maximal biclusters of similar values for any θ ordered w.r.t. similarity of their values. – The well known notion of frequency takes a semantics w.r.t. similarity of values. For example, let (A, B, C) be a triconcept, where A is a set of objects, B a set of attributes, and C a set of similar values. Assume (A, B) to be the corresponding bicluster. The higher |C|, the more similar are the values of the bicluster. If all |A|, |B|, and |C| are high we obtain a bicluster represented as a large rectangle of close values. – Existing algorithms from TCA [4] and n-ary closed set mining [2] can be used directly after scaling. We also provide a new algorithm to compute biclusters maximal only for a given θ (see algorithm TriMax later on). – Both scaling procedure and algorithm TriMax computations can be directly distributed to several computing cores. – The method can be adapted to n-ary numerical datasets. For example, with n = 3, a n-cluster would be a maximal 3D-box of similar values. It can be applied to 3D gene expression data, monitoring the behaviour of genes in different samples over time. It follows that mining n-dimensional clusters can be achieved with n + 1-adic concept analysis. The paper is organized as follows. 
Firstly, preliminaries regarding TCA are presented in Section 2. Then Section 3 formally states the problem. It is followed by the description of our two methods, respectively in Section 4 and 5. The Mining bicluster of similar values with triadic concept analysis 177 first shows how TCA can help characterizing all maximal biclusters for any θ, while the second restricts the problem to a user-given θ. This is followed by experiments on the proposed approaches. Finally, the paper ends with a discussion and perspectives of further research. Table 1: A numerical dataset Table 2: A bicluster of similar values m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 g1 1 2 2 1 6 g1 1 2 2 1 6 g2 2 1 1 0 6 g2 2 1 1 0 6 g3 2 2 1 7 6 g3 2 2 1 7 6 g4 8 9 2 6 7 g4 8 9 2 6 7 2 Triadic Concept Analysis We assume that the reader is familiar with basic notions of Formal Concept Anal- ysis [3]. Lehmann and Wille introduced Triadic Concept Analysis (TCA [12]). Data are represented by a triadic context, given by (G, M, B, Y ). G, M , and B are respectively called sets of objects, attributes and conditions, and Y ⊆ G × M × B. The fact (g, m, b) ∈ Y is interpreted as the statement “Object g has the attribute m under condition b”. A (triadic) concept of (G, M, B, Y ) is a triple (A1 , A2 , A3 ) with A1 ⊆ G, A2 ⊆ M and A3 ⊆ B satisfying the two following statements: (i) A1 × A2 × A3 ⊆ Y , X1 × X2 × X3 ⊆ Y and (ii) A1 ⊆ X1 , A2 ⊆ X2 and A3 ⊆ X3 implies A1 = X1 , A2 = X2 and A3 = X3 . If (G, M, B, Y ) is represented by a three dimensional table, (i) means that a concept stands for a 3-dimensional rectangle full of crosses while (ii) characterises component-wise maximality of concepts. For a triadic concept (A1 , A2 , A3 ), A1 is called the extent, A2 the intent and A3 the modus. To describe the derivation operators, it is convenient to alternatively repre- sent a triadic context as (K1 , K2 , K3 , Y ). Then, for {i, j, k} = {1, 2, 3}, j < k, X ⊆ Ki and Z ⊆ Kj × Kk , (i)-derivation operators are defined by: Φ : X → X (i) : {(aj , ak ) ∈ Kj × Kk | (ai , aj , ak ) ∈ Y for all ai ∈ X} 0 Φ : Z → Z (i) : {ai ∈ Ki | (ai , aj , ak ) ∈ Y for all (aj , ak ) ∈ Z} This definition leads to derivation operator K(3) and dyadic context K(3) = hK3 , K1 × K2 , Y (3) i. Further derivation operators are defined as follows: for {i, j, k} = {1, 2, 3}, Xi ⊆ Ki , Xj ⊆ Kj and Ak ⊆ Kk , the (i, j, Ak )-derivation operators are defined by: (i,j,Ak ) Ψ : Xi → Xi : {aj ∈ Kj | (ai , aj , ak ) ∈ Y for all (ai , ak ) ∈ Xi × Ak } 0 (i,j,Ak ) Ψ : Xj → Xj : {ai ∈ Ki | (ai , aj , ak ) ∈ Y for all (aj , ak ) ∈ Xj × Ak } 0 Operators Φ and Φ will be called outer operators, pair of both operators outer 0 closure and dyadic operators Ψ and Ψ inner operators or inner closure when pair of both is used. Derivation operators of dyadic context are defined by Kij Ak = hKi , Kj , YAijk i, where (ai , aj ) ∈ YAijk iff ai , aj , ak are related by Y for all ak ∈ Ak . From a computational point of view, [4] developed the algorithm Trias for extracting frequent triadic concepts, i.e. whose extent, intent and modus cardi- nalities are higher than user-defined thresholds (see also [5]). Cerf et al. presented 178 Mehdi Kaytoue et al. a more efficient algorithm called Data-peeler able to handle n-ary relations [2] while formal definitions lie in so called Polyadic Concept Analysis [14]. 3 Notations and problem settings A numerical dataset is realized by a many-valued context [3] and we define accordingly (maximal) biclusters of similar values. Definition 1 (Many-valued context). 
Let G be a set of objects, M be a set of attributes, W be the set of attribute values and I be a ternary relation defined on the Cartesian product G × M × W . The fact (g, m, w) ∈ I, also written m(g) = w, means that “Attribute m takes the value w for the object g”. The tuple (G, M, W, I) is called many-valued context, or simply numerical dataset in this paper. Example 1. Table 1 is a numerical dataset, or many-valued context, with objects G = {g1 , g2 , g3 , g4 }, attributes M = {m1 , m2 , m3 , m4 , m5 }, W = {0, 1, 2, 6, 7, 8, 9} and for example m5 (g2 ) = 6. Definition 2 (Bicluster). In a numerical dataset (G, M, W, I), a bicluster is a tuple (A, B) with A ⊆ G and B ⊆ M . Definition 3 (Similarity relation and bicluster of similar values). Let w1 , w2 ∈ W be two attribute values and θ ∈ N be a user-defined parameter, called similarity parameter. w1 and w2 are said to be similar iff |w1 − w2 | ≤ θ and we note w1 'θ w2 . (A, B) is bicluster of similar values if m(g) 'θ n(h) for all g, h ∈ A and for all m, n ∈ B. Definition 4 (Maximal bicluster of similar values). A bicluster of similar values (A, B) is maximal if adding either an object in A or an attribute in B does not result in a bicluster of similar values. Example 2 (From Table 1). ({g1 , g4 }, {m2 , m4 }) is a bicluster. ({g1 , g2 }, {m2 }) is a bicluster of similar values with θ ≥ 1. However, it is not maximal. With 1 ≤ θ < 5, ({g1 , g2 , g3 }, {m1 , m2 , m3 }) is maximal. Finally, with θ = 7 the biclus- ter ({g1 , g2 , g3 }, {m1 , m2 , m3 , m4 , m5 }) is maximal. Note that a constant (max- imal) bicluster is a (maximal) bicluster of similar values with θ = 0. Thus the problem that we address in this paper is the extraction of all max- imal biclusters of similar values from a numerical dataset. We desire the extrac- tion to be complete, correct and non-redundant compared to several existing methods of the literature based on heuristics [13]. For that matter, we pro- pose in the next section a first method aiming at extracting biclusters for any similarity parameter θ. This method establishes new links between biclustering and FCA in general, and TCA in particular. Then, the present methodology is adapted to characterize and extract biclusters that are maximal for a given θ only as usually done in the literature [1, 7, 13]. Mining bicluster of similar values with triadic concept analysis 179 4 Biclusters of similar values in Triadic Concept Analysis Firstly, we consider the problem of generating maximal biclusters for any θ. Starting from a numerical dataset (G, M, W, I), the basic idea lies in building a triadic context (G, M, T, Y ) where the two first dimensions remain formal objects and formal attributes, while W is scaled into a third dimension denoted by T . This new dimension T is called the scale dimension: intuitively, it gives different “spaces of values” that each object-attribute pair (g, m) ∈ G×M can take. Once the scale is given, a triadic context is derived from which triadic concepts are characterized. We use the interordinal scaling [3] to build the scale dimension. It allows to encode in 2T all possible intervals of values in W . This scale allows to derive a triadic context from which any bicluster of similar values can be characterized as a triadic concept. We made more precise these statements and illustrate the whole procedure with examples. Definition 5 (Interordinal Scaling). A scale is a binary relation J ⊆ W × T associating original elements from the set of values W to their derived ele- ments in T . 
In the case of interordinal scaling, T = {[min(W ), w], ∀w ∈ W } ∪ {[w, max(W )], ∀w ∈ W }. Then (w, t) ∈ J iff w ∈ t. Example 3. Table 3 gives the tabular representation of the interordinal scale for Table 1. Intuitively, each line describes a single value, while dyadic concepts represent all possible intervals over W . An example of dyadic concept in this table is given by ({6, 7, 8}, {t6 , t7 , t8 , t9 , t10 }), rewritten as ({6, 7, 8}, {[6, 8]}) since {t6 , t7 , t8 , t9 , t10 } represents the interval [0, 8] ∩ [0, 9] ∩ [1, 9] ∩ [2, 9] ∩ [6, 9] = [6, 8]. t10 = [6, 9] t11 = [7, 9] t12 = [8, 9] t13 = [9, 9] t1 = [0, 0] t2 = [0, 1] t3 = [0, 2] t4 = [0, 6] t5 = [0, 7] t6 = [0, 8] t7 = [0, 9] t8 = [1, 9] t9 = [2, 9] J 0 × × × × × × × 1 × × × × × × × 2 × × × × × × × 6 × × × × × × × 7 × × × × × × × 8 × × × × × × × 9 × × × × × × × Table 3: Interordinal scale of the set of attribute values W . Once the scale is defined, we can derive the triadic context w.r.t. this scale. Definition 6 (Triadic scaled context). Let Y be ternary relation Y ⊆ G × M ×T . Then (g, m, t) ∈ Y iff (m(g), t) ∈ J, or simply m(g) ∈ t. We call the tuple (G, M, T, Y ) the triadic scaled context of the numerical dataset (G, M, W, I). Example 4. The object-attribute pair (g1 , m1 ) taking value m1 (g1 ) = 1 is scaled into triples (g1 , m1 , t) ∈ Y where t takes any interval in {[0, 1], [0, 2], [0, 6], [0, 7], 180 Mehdi Kaytoue et al. [0, 8], [0, 9], [1, 9]}. The intersection of intervals in this set is the original value itself, i.e. m1 (g1 ) = 1, a basic property of interordinal scaling. As a result, Table 4 illustrates the whole scaled triadic context derived from the numerical dataset given in Table 1 using interordinal scale. The very first cross (×) in this table (upper left) represents the tuple (g2 , m4 , t1 ), meaning that m4 (g2 ) ∈ [0, 0]. We present now our first main result: there is a one-to-one correspondence between (i) the set of maximal biclusters of similar values in a given numerical dataset for any similarity parameter θ and (ii) the set of all triadic concepts in the triadic context derived with interordinal scaling. Proposition 1. Tuple hA, B, U i, where A ⊆ G, B ⊆ G and U ⊆ T is triadic concept iff (A, B) is a maximal bicluster of similar values for some θ ≥ 0. Proof. We leave the proof in the Appendix of the paper since we need to intro- duce notations and propositions not necessary in the rest of the paper. Example 5. For example, ({g1 , g2 , g3 }, {m1 , m2 , m3 }, {t3 , t4 , t5 , t6 , t7 , t8 }) is a tri- adic concept from the context depicted in Table 4. It corresponds to the maximal bicluster ({g1 , g2 , g3 }, {m1 , m2 , m3 }) with θ = 1. θ = 1 since {t3 , t4 , t5 , t6 , t7 , t8 } is maximal (it is a modus), it corresponds to interval [1, 2] and naturally 2−1 = 1 is the length of this interval. 
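To make the scaling concrete, the following Python sketch (our illustration, not code from the paper) builds the interordinal scale of Definition 5 for the value set W of Table 1 and derives the ternary relation Y of Definition 6; the helper name interordinal_scale and the dictionary encoding of Table 1 are our own.

```python
# Sketch (not the authors' code): interordinal scaling of W and derivation
# of the triadic scaled context Y from the numerical dataset of Table 1.
data = {                      # m(g) = value, as in Table 1
    "g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 1, "m5": 6},
    "g2": {"m1": 2, "m2": 1, "m3": 1, "m4": 0, "m5": 6},
    "g3": {"m1": 2, "m2": 2, "m3": 1, "m4": 7, "m5": 6},
    "g4": {"m1": 8, "m2": 9, "m3": 2, "m4": 6, "m5": 7},
}
W = sorted({v for row in data.values() for v in row.values()})

def interordinal_scale(W):
    """T = {[min(W), w] | w in W} union {[w, max(W)] | w in W} (Definition 5)."""
    lo, hi = min(W), max(W)
    return sorted({(lo, w) for w in W} | {(w, hi) for w in W})

T = interordinal_scale(W)     # the 13 intervals t1, ..., t13 for Table 1

# Triadic scaled context (Definition 6): (g, m, t) in Y iff m(g) lies in t.
Y = {(g, m, t) for g, row in data.items()
               for m, w in row.items()
               for t in T if t[0] <= w <= t[1]}

# Sanity check from Example 4: m1(g1) = 1 is scaled into 7 intervals, and
# their intersection recovers the original value 1.
ts = [t for (g, m, t) in Y if (g, m) == ("g1", "m1")]
assert len(ts) == 7
```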
t1 = [0, 0] t2 = [0, 1] t3 = [0, 2] t4 = [0, 6] t5 = [0, 7] m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 g1 × × × × × × × × × × × × × × × × g2 × × × × × × × × × × × × × × × × × × g3 × × × × × × × × × × × × × g4 × × × × × × t6 = [0, 8] t7 = [0, 9] t8 = [1, 9] t9 = [2, 9] t10 = [6, 9] m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 g1 × × × × × × × × × × × × × × × × × × × g2 × × × × × × × × × × × × × × × × × g3 × × × × × × × × × × × × × × × × × × × × × × g4 × × × × × × × × × × × × × × × × × × × × × × × t11 = [7, 9] t12 = [8, 9] t13 = [9, 9] m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 g1 g2 g3 × g4 × × × × × × Table 4: Triadic scaled context for Table 1 with interordinal scaling. Hence we showed that extracting biclusters of similar values for any θ in a numerical dataset can be achieved by (i) scaling the attribute value dimension and (ii) extracting the triadic concepts in the resulting derived triadic context. Interestingly, triadic concepts (A, B, U ) with the largest sets A, B or C rep- resent large biclusters of close values. Indeed, the larger |A| and |B| the larger the data covering of the corresponding bicluster. Furthermore, the larger |U |, the more similar values for bicluster (A, B). Indeed, by the properties of interordinal Mining bicluster of similar values with triadic concept analysis 181 scaling, the more intervals in U , the smaller their interval intersection. Mining so called top-k frequent triadic concepts can accordingly be achieved with the existing algorithm Data-Peeler [2]. On another hand, extracting maximal biclusters for all θ may be neither efficient nor effective with large numerical data: their number tends to be very large and all biclusters are not relevant for a given analysis. Furthermore, both size and density of contexts derived with interordinal scaling are known to be problematic w.r.t algorithmic scalability, see e.g. [9]. In existing methods of the literature, θ is set a priori. We show now how to handle this case with slight modifications, our second main result. 5 Extracting biclusters of similar values for a given θ In this section we consider the problem of extracting maximal biclusters of sim- ilar values in TCA for a given θ only. It comes with slight modifications of the methodology presented in last section. Intuitively, consider the previous scaling applied on a numerical dataset (G, M, W, I). It scales W into dimension T and subsets of T characterize all intervals of values over W . To get maximal biclusters for a given θ only, we should not consider all possible intervals in W , but rather all intervals (i) having a range size that is less or equal than θ to avoid biclusters with non similar values, and (ii) having a range size the closest as possible to θ to avoid non-maximal biclusters. For example, if we set θ = 2, it is probably not interesting to consider interval [0, 8] in the scale dimension since 8 − 0 > θ. Similarly, considering the interval [6, 6] may not be interesting as well, since a bicluster with all its values equal to 6 may not be maximal. As introduced in [6], those maximal intervals of similar values used for the scale are called blocks of tolerance over the set of numbers W with respect to the tolerance relation 'θ . Therefore we firstly recall basics on tolerance relations over a set of numbers. It allows us to define a simpler scaling procedure. 
The resulting triadic context is then mined with a new TCA algorithm called TriMax to extract maximal biclusters of similar values for a given θ. Blocks of tolerance over W are defined as maximal sets of pairwise similar values from W : Definition 7 (Tolerance blocks from a set of numbers). The similarity relation 'θ is called a tolerance relation, i.e. reflexive, symmetric but not tran- sitive. Given a set W of values, a subset V ⊆ W , and a tolerance relation 'θ over W , V is a block of tolerance if: (i) ∀w1 , w2 ∈ V, w1 'θ w2 (pairwise similarity) (ii) ∀w1 6∈ V, ∃w2 ∈ V, w1 6'θ w2 (maximality). From Table 1 we have W = {0, 1, 2, 6, 7, 8, 9}. With θ = 2, one has 0 '2 2 but 2 6'2 6. Accordingly, one obtains 3 blocks of tolerance, namely the sets {0, 1, 2}, {6, 7, 8} and {7, 8, 9}. These three sets can be renamed as the convex hull of their elements on N: respectively, [0, 2], [6, 8] and [7, 9]: any number lying between the 182 Mehdi Kaytoue et al. minimal and the maximal elements (w.r.t. natural number ordering) of a block of tolerance is naturally similar to any other element of the block. To derive a triadic context from a numerical dataset, we simply use tolerance blocks over W to define the scale dimension. Definition 8 (Trimax scale relation). The scale relation is a binary relation J ⊆ W × C, where C is the set of blocks of tolerance over W renamed as their convex hulls. Then, (w, c) ∈ J iff w ∈ c. Example 6. From Table 1 we have: C = {[0, 1], [1, 2], [6, 7], [7, 8], [8, 9]} with θ = 1, and C = {[0, 2], [6, 8], [7, 9]} with θ = 2. Then, we can apply the same context derivation as in previous section: scaling is still based on intervals, but this time it uses tolerance blocks. Definition 9 (TriMax triadic scaled context). Let Y ⊆ G × M × C be a ternary relation. Then (g, m, c) ∈ Y iff (m(g), c) ∈ J, or simply m(g) ∈ c, where J is the scale relation. (G, M, C, Y ) is called the TriMax triadic scaled context. Example 7. Table 5 is the Trimax triadic scaled concept derived from the nu- merical dataset lying in Table 1 with θ = 1. label 1 label 2 label 3 label 4 label 5 [0, 1] [1, 2] [6, 7] [7, 8] [8, 9] m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 m1 m2 m3 m4 m5 g1 × × × × × × × g2 × × × × × × × g3 × × × × × × × g4 × × × × × × × Table 5: Triadic scaled context using tolerance blocks over W and θ = 1. Definition 10 (Dyadic context associated with a block of tolerance). Consider a block of tolerance c ∈ C. The dyadic context associated with this block is given by (G, M, Z) where z ∈ Z denotes all (g, m) ∈ G × M such as m(g) ∈ c. Example 8. In Table 5, each such dyadic context is labelled by its corresponding block of tolerance. Now, remark that blocks of tolerance over W are totally ordered: let [v1 , v2 ] and [w1 , w2 ] be two blocks of tolerance, one has [v1 , v2 ] < [w1 , w2 ] iff v1 < w1 . Hence, associated dyadic contexts are also totally ordered and we use a corre- sponding indexing set to label them. In Table 5, contexts for blocks h[0, 1], [1, 2], [6, 7], [7, 8], [8, 9]i are respectively labelled h1, 2, 3, 4, 5i. We now present our second main results: The scaled triadic context supports the extraction of maximal biclusters of similar values for a given θ. In this case however, existing algorithms of TCA cannot be applied directly. For example, in Table 5, the triconcept ({g3 }, {m4 }, {3, 4}) corresponds to a bicluster of similar values which is not maximal. Hence we present hereafter a new TCA algorithm for this task, called TriMax. 
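As a small illustration of Definitions 7 and 8, the sketch below (ours, assuming W is a finite set of numbers) computes the tolerance blocks of W, returned as their convex hulls, for a given θ; it reproduces the block sets listed in Example 6.

```python
# Sketch (our own helper, not the paper's code): tolerance blocks over W
# for a given theta, returned as convex hulls [a_i, b_i] (Definitions 7-8).
def tolerance_blocks(W, theta):
    vals = sorted(W)
    blocks, prev_end = [], -1
    for i, a in enumerate(vals):
        j = i
        while j + 1 < len(vals) and vals[j + 1] - a <= theta:
            j += 1                       # extend the block as far as similarity allows
        if j > prev_end:                 # keep only maximal candidates
            blocks.append((a, vals[j]))
            prev_end = j
    return blocks

W = {0, 1, 2, 6, 7, 8, 9}
print(tolerance_blocks(W, 1))   # [(0, 1), (1, 2), (6, 7), (7, 8), (8, 9)]
print(tolerance_blocks(W, 2))   # [(0, 2), (6, 8), (7, 9)]

# Each block c then yields the dyadic context (G, M, Z_c) of Definition 10:
# (g, m) in Z_c iff m(g) lies in c.
```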
Mining bicluster of similar values with triadic concept analysis 183 The basic idea of TriMax relies on the following facts. Firstly, since each dyadic context corresponds to a block of tolerance, we do not need to compute intersections of contexts, such as classically done in TCA. Hence each dyadic context is processed separately. Secondly, a dyadic concept of a dyadic context necessarily represents a bicluster of similar values, but we cannot be sure it is maximal (see previous example). Hence, we need to check if a concept is still a concept in other dyadic contexts, corresponding to other classes of tolerance. This is made precise with the following proposition. Proposition 2. Let (A, B, U ) be a triadic concept from Trimax triadic scaled context (G, M, C, Y ), such that U is the outer closure of a singleton {c} ⊆ C. If |U | = 1, (A, B) is a maximal bicluster of similar values. Otherwise, (A, B) is a maximal bicluster of similar values iff @y ∈ [min(U ); max(U )], y < c s.t. 0 0 (A, B) 6= Ψy (Ψy ((A, B))), where Ψy (.) and Ψy (.) correspond to inner derivation operators associated with y th dyadic context. Proof. When |U | = 1, (A, B) is a dyadic concept only in one dyadic context corresponding to a block of tolerance. By properties of tolerance blocks, (A, B) is a maximal bicluster. If |U | 6= 1, (A, B) is a dyadic concept in |U | dyadic contexts. Since the tolerance block set is totally ordered, it directly implies that modus U is an interval [min(U ); max(U )]. Hence, if ∃y ∈ [min(U ); max(U )] s.t. 0 (A, B) = Ψy (Ψy ((A, B))) this means that (A, B) is not a maximal bicluster of similar values. Description of the TriMax algorithm. TriMax starts with scaling ini- tial numerical data into several dyadic contexts, each one standing for a block of tolerance over W with given θ. The set of all dyadic contexts forms accordingly a triadic context. Then, each dyadic context is mined with any FCA algorithm (or closed itemset mining algorithm), and all formal concepts are extracted. For 0 a given concept (A, B), we compute outer derivation Φ ((A, B)), i.e. to obtain the set of dyadic contexts labels in which the current dyadic concept holds. If it results in a singleton, this means that (A, B) is a concept for the current block of tolerance only, i.e. it is a maximal bicluster of similar values, and it has been, or will never be, generated twice. Otherwise, (A, B) is a concept in other con- texts, and can be generated accordingly several times (as much as the number of contexts in which it holds). Then, we only consider (A, B) if we are sure it is the last time it is computed. Finally, we need to check if current concept represents a maximal bicluster, i.e. there should not exist a context from the modus where (A, B) is not a dyadic concept. Proposition 3. TriMax outputs a (i) complete, (ii) correct and (iii) non re- dundant collection of all maximal biclusters of similar values for a given numer- ical dataset and similarity parameter θ. Proof. (i) and (ii) follow directly from Proposition 2. Statement (iii) is ensured by the second if condition of the algorithm: a dyadic concept (or equivalently bicluster) is considered iff it has been extracted in the last dyadic context in which it holds. 184 Mehdi Kaytoue et al. Algorithm 1: TriMax input : Numerical dataset (G, M, W, I), tolerance parameter θ output: Maximal biclusters of similar values Let C = {[ai , bi ]} be the totally ordered set of all blocks over W for given θ. Indices i form an indexing set. 
forall the [ai , bi ] ∈ C do Build context (G, M, Zi ) such that (g, m) ∈ Zi ⇔ m(g) ∈ [ai , bi ] forall the (G, M, Zi ) do Use any FCA algorithm to extract all its concepts (A, B) forall the dyadic concepts (A, B) in the current context (G, M, Zi ) do 0 if |Φ ((A, B))| = 1 then print (A, B) 0 else if max(Φ ((A, B)) = i then 0 x ← min(Φ ((A, B)) 0 if @y ∈ [x, i[ s.t. (A, B) 6= Ψy (Ψy ((A, B))) then print (A, B) 6 Computer experiments In this section, we experiment with the algorithm TriMax and highlight various aspects of its practical complexity. Data. We explore a gene expression dataset of the species Laccaria bicolor avail- able at NCBI5 . More details on this dataset can be found in [9]. This gene expres- sion dataset monitors the behaviour of 11, 930 genes in 12 biological situations, reflecting various stages of Laccaria bicolor biological cycle. Attribute values in W vary between 0 and 60, 000. TriMax implementation. TriMax is written in C++. It uses the boost library 1.42 for data structures and the implementation of InClose from its authors6 for dyadic concepts extraction. At each iteration of the main loop, i.e. each tolerance block, the current scaled dyadic context is produced: We do not generated the whole triadic context which cannot fit into memory for large databases. It turns out that the modus computation for a given dyadic concept requires to compute scaling “on the fly”, i.e. when computing the set of dyadic contexts in which a current concept holds. The experiments were carried out on an Intel CPU 2.54 Ghz machine with 8 GB RAM running under Ubuntu 11.04. Experiment settings. The goal of the present experiments is not to give a qualitative evaluation of the present approach (say biological interpretation), but rather a quantitative evaluation. Indeed, the present work aims at showing 5 http://www.ncbi.nlm.nih.gov/geo/ as series GSE9784 6 http://sourceforge.net/projects/inclose/ Mining bicluster of similar values with triadic concept analysis 185 (i) Numbers of patterns (Y-axis) (ii) Execution times in seconds (Y-axis) w.r.t. θ (X-axis) and |G| (Z-axis) w.r.t. θ (X-axis) and |G| (Z-axis) (iii) Numbers of blocks of tolerance (Y-axis) (iv) Density of triadic contexts (Y-axis) w.r.t. θ (X-axis) and |G| (Z-axis) w.r.t. θ (X-axis) and |G| (Z-axis) (v) Comparing the number of generated dyadic (vi) Repartition of execution time concepts w.r.t. the actual number of maximal w.r.t main steps of TriMax biclusters varying θ with |G| = 500 with θ = 33, 000 and |G| = 500 Fig. 1: Monitoring with different settings (i) the number of maximal biclusters, (ii) the execution times of TriMax, (iii) the number of tolerance blocks, (iv) the derived triadic context density, (v) the number of non-maximal biclusters generated as dyadic-concepts w.r.t. the number of maximal biclusters, and (vi) repartition of execution time in the TriMax algorithm. 186 Mehdi Kaytoue et al. how an existing type of biclusters can be mined with Triadic Concept Analysis. For a qualitative evaluation, the reader may refer for example to [1, 9]. Accordingly, we designed the following experiments to monitor various as- pects of the TriMax algorithm. For most of the experiments, the dataset used is composed of an increasing number of objects and all attributes. The objects are chosen randomly once and for all so that the different experiment results can be compared. We also vary the parameter θ in the same way across all experiments. Then, we monitor the following aspects, as presented in Figure 1: i. 
Number of maximal biclusters of similar values ii. Execution time (in seconds) iii. Number of tolerance blocks iv. Density of the triadic context, where density is defined as d(G, M, C, Y ) = |Y |/(|G| × |M | × |C|). This information is important, since contexts with high density are known to be hard to process with FCA algorithms [11], and we use the InClose algorithm for dyadic contexts processing. v. Comparison between the number of non-maximal biclusters produced by TriMax (i.e. dyadic concepts that do not corresponds to maximal biclus- ters) with the number of maximal biclusters. vi. Execution time profiling of the main procedures of TriMax. This is achieved with the tool GNU GProf and gives us what parts of the algorithm are the most time consuming. Experiment results. Figure 1 presents the results of our experiments with different settings. In these settings, we vary the number of objects |G| and the parameter θ. A first observation arises from graph (i): the number of biclusters is the highest when θ ' 30, 000. A first explanation is that 30, 000 is the half of the maximal value of W and almost all multiples of 100 in [0; 60, 000] belongs to W . In graph (ii), execution time has the same behaviour as graph (i). These results can be understood by paying attention to the next graphs (iii) and (iv). In (iii) is monitored the number of tolerance blocks. The maximal number is reached when θ = 0, i.e. |C| = |W |. When θ = max(W ), we have |C| = 1. Now we observe in (iv) that the density follows a reverse behaviour: When θ = 0, the density tends towards 0%; when θ = max(W ), then density exactly equal 1%. Combining both graph (iii) and (iv), the worst cases happen when both density and tolerance bloc count are high. Another observation, which explains also the execution times, arises from graph (v). Here are compared the number of maximal biclusters and the number of non-maximal biclusters generated as dyadic concepts. Here again, worst case is reached when θ ' 30, 000. Looking at graph (vi), we learn that this is however not the major problem. The mostly consuming procedure of TriMax is the computation of the modus of a dyadic concept. The explanation is that we compute modus with “on the fly scaling”. Therefore, the bottleneck of our algorithm reveals itself to be the modus computation. In practical applications however, the analyst is not interested in all biclusters of similar values. Some constraints are generally defined, such as a minimal (resp. maximal) number of objects (resp. attributes) in a bicluster Mining bicluster of similar values with triadic concept analysis 187 (A, B), or a minimal area |A| × |B|, etc. Interestingly, most of those constraints can be evaluated on a generated dyadic concept. Therefore, before computing the modus of such concept, we can check such properties and discard the concept if not respecting the constraints. Although not reflected in this paper, we tested how adding minimal (resp. maximal) size constraints on a bicluster affects both number of biclusters and execution times. The results are very interesting: for example with θ = 33, 000, |G| = 500, and minimal (resp. maximal) size for |A| set to 10 (resp. 40), TriMax produces only 5, 332 maximal biclusters in 2.1 seconds compared to 104, 226 maximal biclusters extracted in 16.130 seconds without any constraint. Finally, the most interesting aspect of TriMax is its direct distributed com- putation capacity. Indeed, each iteration, i.e. 
for each block of tolerance, can be achieved independently from the others. Furthermore, the core of TriMax consisting in extracting dyadic contexts can also be distributed, see e.g. [10]. A deeper investigation remains to be done in this case. Note that although the method description involves W as a set of natural numbers, TriMax can directly handle numerical data real numbers, and has been implemented as such. Comparison with existing methods. Two existing methods in the literature also consider the problem of extracting all maximal biclusters of similar values from a numerical dataset. The first method is called Numerical Biset Miner (NBS-Miner [1]). The second method is based on interval pattern structures (IPS [7, 8]). Limited by space, we do not detail these methods. Both NBS-Miner and IPS algorithms have been implemented in C++. First experiments show that NBS-Miner is not scalable compared to IPS and TriMax. On another hand, it seems that TriMax outperforms IPS, but a deeper investigation is required. The main problem in IPS is to find an efficient algorithm able to compute tolerance blocks over a set of intervals. 7 Conclusion We addressed the problem of biclustering numerical data with Formal Concept Analysis. So called (maximal) biclusters of similar values can be characterized and extracted with Triadic Concept Analysis, which turns out to be a novel mathematical framework for this task. We properly defined a scaling procedure turning original numerical data into triadic contexts from which biclusters can be extracted as triadic concepts with existing algorithms. This approach allows a correct, complete and non-redundant extraction of all maximal biclusters, for any similarity parameter θ and can be extended to n-ary numerical datasets while their computation can be directly distributed. The interpretation of triadic con- cepts is very rich: both extent and intent allow to characterize a bicluster (i.e. the rectangle), while the modus gives the range of values of the biclusters, and for which θ is the bicluster maximal. Moreover, the larger the modus, the more simi- lar the values within current bicluster. It follows a perspective of research, aiming at extracting the top-k frequent tri-concepts with Data-Peeler [2], which can help to handle the problem of top-k biclusters extraction. We also adapted the 188 Mehdi Kaytoue et al. TCA machinery with algorithm TriMax to extract maximal biclusters for a user-defined θ, which is classical in the existing literature. It appears that Tri- Max is a fully customizable algorithm: any concept extraction algorithm can be used inside its core (along with several constraints on produced dyadic concepts), while its distributed computation is direct. Among several other experiments, it remains now to determine which are the best core algorithms for a given θ parameter, the very last directly influencing derived contexts density. Acknowledgements. Authors would like to thank Dmitry Andreevich Morozov for implementing the algorithms NBS-Miner and IPS. The second author was supported by the project of the Russian Foundation for Basic Research, grant no. 08-07-92497-NTsNIL a. Juraj Macko acknowledges support by Grant No. 202/10/0262 of the Czech Science Foundation. A Proof of the Proposition 1. Before proving this proposition, we need to introduce the following. 
For sake of simplicity, we now consider W as the set of all natural numbers from a numerical dataset that are greater or equal than the minimal value and lower or equal than the maximal value, i.e. W = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} with the example of Table 1. Definition 11 (Scale value and scale relation). We call scale value s = q − r where r = min(W ) and q = max(W ). The scale relation is a binary relation J ⊆ W × T , where T = {t1 , . . . , t2s+1 } r ≤ w ≤ q and hw, ti i ∈ J iff i ∈ [w − r + 1, w − r + 1 + s]. Note that J is equivalent to interordinal scale of W previously given, but this notations are used for the proof. Definition 12 (Eθw - cluster base). We introduce Eθw ⊆ T defined as Eθw = [tw+θ−r+1 ; tw−r+1+s ] for given θ and w ∈ W . Example 9 (Eθw - cluster base). E12 = [t2+1−0+1 ; t2−0+1+9 ] = [t4 ; t12 ]. Proposition 4. (wb = m(g)) 'θ (n(h) = wc ) iff (hg, mi ∈ YE12θb and hh, ni ∈ YE12θb ). Proof. Let Eb , Ec ⊆ T and wc ≥ wb . According to the definition (g, m) ∈ YE12θb iff m, g, t are related by Y for all t ∈ Eθb . Using scaling and definition we have [twb −r+1 ; twb −r+1+s ] = Eb ⊇ Eθb = [twb +θ−r+1 ; twb −r+1+s ] which is straight- forward. We just need to show that (h, n) ∈ YE12θb holds as well. With scaling definition and previous definition we get [twc −r+1 ; twc −r+1+s ] = Ec ⊇ Eθb = [twb +θ−r+1 ; twb −r+1+s ] holding iff wc − wb ≤ θ, which is equal to the definition of 'θ . Moreover we can easily see as a corollary that wc −wb ≤ θ holds iff Eb ∩Ec ⊇ Eθb and wc − wb = θ holds iff Eb ∩ Ec = Eθb . Now we can prove the Proposition 1 from the main text. Mining bicluster of similar values with triadic concept analysis 189 Proposition 1. Tuple hA1 , A2 , U i, where A1 ⊆ G, A2 ⊆ M and U ⊆ T is triadic concept iff (A1 , A2 ) is a maximal bicluster of similar values for some θ ≥ 0. Furthermore the value of θ is defined as θ = s − |U | + 1. Proof. Let U = Eθb and consider dyadic context YU12 = YE12θb for some wb . Using 0 dyadic closure operator Ψ (Ψ ((A1 )) we get (A1 , A2 ). From definition of triconcept we know that A1 ⊆ B1 implies A1 = B1 (the same for A2 ). From definition of maximal bicluster of similar values we know that hA1 , A2 i is maximal when it does not exists hB1 , B2 i s.t. B1 ⊇ A1 (the same applies for A2 ). It is obvious that both sets are maximal from definition and when we have the same dyadic context YU12 = YE12θb . Now we need to look at dyadic context YU12 = YE12θb . In |U | = |Eθb | = |[twb +θ−r+1 ; twb −r+1+s ]| we can easily see that |U | = s − θ + 1, which gives θ = s − |U | + 1. Finally, U is maximal (as being modus of a triconcept) and Eθb is maximal as well because wc − wb ≤ θ holds iff Eb ∩ Ec ⊇ Eθb . All facts mentioned in this proof leads to equality of the triconcept and maximal bicluster of similar values. References 1. Besson, J., Robardet, C., Raedt, L.D., Boulicaut, J.F.: Mining bi-sets in numerical data. In: Dzeroski, S., Struyf, J. (eds.) KDID. Lecture Notes in Computer Science, vol. 4747, pp. 11–23. Springer (2007) 2. Cerf, L., Besson, J., Robardet, C., Boulicaut, J.F.: Closed patterns meet n-ary relations. TKDD 3(1) (2009) 3. Ganter, B., Wille, R.: Formal Concept Analysis. Springer (1999) 4. Jäschke, R., Hotho, A., Schmitz, C., Ganter, B., Stumme, G.: Trias - an algorithm for mining iceberg tri-lattices. In: ICDM. pp. 907–911 (2006) 5. Ji, L., Tan, K.L., Tung, A.K.H.: Mining frequent closed cubes in 3d datasets. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB). pp. 811–822. ACM (2006) 6. 
Kaytoue, M., Assaghir, Z., Napoli, A., Kuznetsov, S.O.: Embedding tolerance re- lations in formal concept analysis: an application in information fusion. In: CIKM. pp. 1689–1692. ACM (2010) 7. Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Biclustering numerical data in formal concept analysis. In: Valtchev, P., Jäschke, R. (eds.) ICFCA. LNCS, vol. 6628, pp. 135–150. Springer (2011) 8. Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Revisiting numerical pattern mining with formal concept analysis. In: Proceedings of the 22nd International Joint Con- ference on Artificial Intelligence (IJCAI). IJCAI/AAAI (2011) 9. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181(10), 1989– 2001 (2011) 10. Krajca, P., Vychodil, V.: Distributed algorithm for computing formal concepts using map-reduce framework. In: IDA. pp. 333–344. Springer (2009) 11. Kuznetsov, S.O., Obiedkov, S.A.: Comparing performance of algorithms for gen- erating concept lattices. J. Exp. Theor. Artif. Intell. 14(2-3), 189–216 (2002) 12. Lehmann, F., Wille, R.: A triadic approach to formal concept analysis. In: ICCS. LNCS, vol. 954, pp. 32–43. Springer (1995) 190 Mehdi Kaytoue et al. 13. Madeira, S., Oliveira, A.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1(1), 24–45 (2004) 14. Voutsadakis, G.: Polyadic concept analysis. Order 19(3), 295–304 (2002) Fast Mining of Iceberg Lattices: A Modular Approach Using Generators Laszlo Szathmary1 , Petko Valtchev1 , Amedeo Napoli2 , Robert Godin1 , Alix Boc1 , and Vladimir Makarenkov1 1 Dépt. d’Informatique UQAM, C.P. 8888, Succ. Centre-Ville, Montréal H3C 3P8, Canada Szathmary.L@gmail.com, {valtchev.petko, godin.robert}@uqam.ca, {makarenkov.vladimir, boc.alix}@uqam.ca 2 LORIA UMR 7503, B.P. 239, 54506 Vandœuvre-lès-Nancy Cedex, France napoli@loria.fr Abstract. Beside its central place in FCA, the task of constructing the concept lattice, i.e., concepts plus Hasse diagram, has attracted some interest within the data mining (DM) field, primarily to support the mining of association rule bases. Yet most FCA algorithms do not pass the scalability test fundamental in DM. We are interested in the ice- berg part of the lattice, alias the frequent closed itemsets (FCIs) plus precedence, augmented with the respective generators (FGs) as these provide the starting point for nearly all known bases. Here, we investi- gate a modular approach that follows a workflow of individual tasks that diverges from what is currently practiced. A straightforward instantia- tion thereof, Snow-Touch, is presented that combines past contributions of ours, Touch for FCIs/FGs and Snow for precedence. A performance comparison of Snow-Touch to its closest competitor, Charm-L, indicates that in the specific case of dense data, the modularity overhead is offset by the speed gain of the new task order. To demonstrate our method’s usefulness, we report first results of a genome data analysis application. 1 Introduction Association discovery [1] in data mining (DM) is aimed at pinpointing the most frequent patterns of items, or itemsets, and the strongest associations between items dug in a large transaction database. The main challenge here is the po- tentially huge size of the output. A typical way out is to focus on a basis, i.e. a reduced yet lossless representation of the target family (see a list in [2]). 
Many popular bases are either formulated in terms of FCA or involve structures that do. For instance, the minimal non-redundant association rules [3] require the computation of the frequent closed itemsets (FCI) and their respective frequent generators (FGs), while the informative basis involves the inclusion-induced precedence links between FCIs. We investigate the computation of iceberg lattices, i.e., FCIs plus prece- dence, together with the FGs. In the DM literature, several methods exist that c 2011 by the paper authors. CLA 2011, pp. 191–206. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 192 2 Laszlo Szathmary et al. Laszlo Szathmaryet al. target FCIs by first discovering the associated FGs (e.g. the levelwise FCI miners A-Close [4] and Titanic [5]). More recently, a number of extensions of the pop- ular FCI miner Charm [6] have been published that output two or all three of the above components. The basic one, Charm-L [7], produces FCIs with prece- dence links (and could be regarded as a lattice construction procedure). Further extensions to Charm-L produce the FGs as well (see [8,9]). In the FCA field, the frequency aspect of concepts has been mostly ignored whereas generators have rarely been explicitly targeted. Historically, the first method whose output combines closures, generators and precedence has been presented in [10] yet this fact is covered by a different terminology and a some- what incomplete result (see explanations below). The earliest method to explic- itly target all three components is to be found in [11] while an improvement was published in [12]. Yet all these FCA-centered methods have a common drawback: They scale poorly on large datasets due to repeated scans of the entire database (either for closure computations or as an incremental restructuring technique). In contrast, Charm-L exploits a vertical encoding of the database that helps mitigate the cost of the impact of the large object (a.k.a. transaction) set. Despite a diverging modus operandi, both FCA and data mining methods follow the same overall algorithmic schema: they first compute the set of con- cepts/FCIs and the precedence links between them and then use these as input in generator/FG calculation. However efficient Charm-L is, its design is far from optimal: For instance, FCI precedence is computed at FCI discovery, benefiting from no particular in- sight. Thus, many FCIs from distant parts of the search space are compared. We therefore felt that there is space for improvement, e.g., by bringing in tech- niques operating locally. An appealing track seemed to lay in the exploration of an important duality from hypergraph theory to inverse the computation depen- dencies between individual tasks (and thus define a new overall workflow). To clarify this point, we chose to investigate a less intertwined algorithmic schema, i.e. by a modular design so that each individual task could be targeted by the best among a pool of alternative methods. Here, we describe a first step in our study, Snow-Touch, which has been assembled from existing methods by wiring them w.r.t. our new schema. Indeed, our method relies on Charm for mining FCIs and on the vertical FG miner Talky-G, which are put together into a combined miner, Touch [13], by means of an FGs-to-FCIs matching mechanism. The Snow method [14] extracts the precedence links from FCIs and FGs. 
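The workflow just described can be summarised schematically as follows. The sketch below is our own illustration, not the Snow-Touch implementation: fcis and fgs stand for the outputs of Charm and Talky-G (treated here as black boxes returning (itemset, support) pairs), and the matching shown is one straightforward way to attach each FG to its closure; the actual Touch mechanism may proceed differently.

```python
# Schematic sketch of the modular workflow (our illustration, not the
# Snow-Touch code). `fcis` and `fgs` are lists of (itemset, support) pairs
# as they would be produced by Charm and Talky-G respectively.
def attach_generators(fcis, fgs):
    """Attach every FG to its closure, i.e. the (unique) FCI that contains
    it and has the same support -- one simple FGs-to-FCIs matching."""
    by_fci = {frozenset(c): [] for c, _ in fcis}
    for g, s_g in fgs:
        closure = next(frozenset(c) for c, s in fcis
                       if s == s_g and set(c) >= set(g))
        by_fci[closure].append(g)
    return by_fci

# Toy input: a fragment of dataset D's iceberg, using facts recalled in Sect. 2.
fcis = [("B", 3), ("ACDE", 2), ("ABCDE", 1)]
fgs = [("B", 3), ("C", 2), ("BC", 1), ("BD", 1), ("BE", 1)]
print(attach_generators(fcis, fgs))
# B -> [B], ACDE -> [C], ABCDE -> [BC, BD, BE]
```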
The pleasant surprise with Snow-Touch was that, when a Java implemen- tation thereof was experimentally compared to Charm-L (authors’ version in C++) on a wide range of data, our method prevailed on all dense datasets. This was not readily anticipated as the modular design brought a computational over- head, e.g. the extra matching step. Moreover, Snow-Touch proved to work well with real-world data, as the first steps of a large-scale analysis of genomic data indicate. Fast Mining of Iceberg Lattices:Mining A Modular Approach Iceberg LatticesUsing with Generators Generators 1933 In summary, we contribute here a novel computation schema for iceberg lattices with generators (hence a new lattice construction approach). More- over, we derive an efficient FCI/FG/precedence miner (especially on dense sets). We also demonstrate the practical usefulness of Snow-Touch as well as of the global approach for association mining based on generic rules. The remainder of the paper is as follows: Background on pattern mining, hypergraphs, and vertical pattern mining is provided in Section 2. In Section 3 we present the different modules of the Snow-Touch algorithm. Experimental evaluations are provided in Section 4 and conclusions are drawn in Section 5. 2 Background In the following, we summarize knowledge about relevant structures from fre- quent pattern mining and hypergraph theory (with parallels drawn with similar notions from FCA) as well as about efficient methods for mining them. 2.1 Basic facts from pattern mining and concept analysis In pattern mining, the input is a database (comparable to an FCA formal con- text). Records in the database are called transactions (alias objects), denoted here O = {o1 , o2 , . . . , om }. A transaction is basically subsets of a given total set of items (alias attributes), denoted here A = {a1 , a2 , . . . , an }. Except for its itemset, a transaction is explicitly identified through a unique identifier, a tid (a set of identifiers is thus called a tidset). Throughout this paper, we shall use the following database as a running example (the “dataset D”): D = {(1, ACDE), (2, ABCDE), (3, AB), (4, D), (5, B)}. The standard 0 derivation operators from FCA are denoted differently in this context. Thus, given an itemset X, the tidset of all transactions comprising X in their itemsets is the image of X, denoted t(X) (e.g. t(AB) = 23). We recall that an itemset of length k is called a k-itemset. Moreover, the (absolute) support of an itemset X, supp : ℘(A) → N, is supp(X) = |t(X)|. An itemset X is called fre- quent, if its support is not less than a user-provided minimum support (denoted by min supp). Recall as well that, in [X], the equivalence class of X induced by t(), the extremal elements w.r.t. set-theoretic inclusion are, respectively, the unique maximum X 00 (a.k.a. closed itemset or the concept intent), and its set of minimums, a.k.a. the generator itemsets. In data mining, an alternative defi- nition is traditionally used stating that an itemset X is closed (a generator ) if it has no proper superset (subset) with the same support. For instance, in our dataset D, B and C are generators, whose respective closures are B and ACDE. In [6], a subsumption relation is defined as well: X subsumes Z, iff X ⊃ Z and supp(X) = supp(Z). Obviously, if Z subsumes X, then Z cannot be a generator. In other words, if X is a generator, then all its subsets Y are generators as well3 . 
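The following sketch (our illustration) encodes dataset D together with the image t(X), the closure X'' and the generator test in a few lines of Python; it only reproduces facts stated above, e.g. t(AB) = 23 and that B and C are generators with respective closures B and ACDE.

```python
# Sketch (ours) of the basic notions on dataset D = {(1,ACDE), (2,ABCDE),
# (3,AB), (4,D), (5,B)}.
D = {1: set("ACDE"), 2: set("ABCDE"), 3: set("AB"), 4: set("D"), 5: set("B")}

def image(X):                       # t(X): tids of transactions containing X
    return {tid for tid, items in D.items() if set(X) <= items}

def closure(X):                     # X'' = intersection of the matching rows
    tids = image(X)                 # (full item set if X occurs nowhere)
    return set.intersection(*(D[t] for t in tids)) if tids else set("ABCDE")

def is_generator(X):                # no proper subset has the same support
    return all(len(image(set(X) - {x})) != len(image(X)) for x in X)

assert image("AB") == {2, 3}                       # supp(AB) = 2
assert closure("C") == set("ACDE")                 # C generates ACDE
assert closure("B") == set("B") and is_generator("B")
```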
Formally speaking, the generator family forms a downset within the Boolean lattice of all itemsets ⟨℘(A), ⊆⟩. 3 Please notice that the dual property holds for non-generators. Fig. 1. Concept lattices of dataset D. (a) The entire concept lattice. (b) An iceberg part of (a) with min supp = 3 (indicated by a dashed rectangle). (c) The concept lattice with generators drawn within their respective nodes. The FCI and FG families losslessly represent the family of all frequent itemsets (FIs) [15]. They jointly compose various non-redundant bases of valid association rules, e.g. the generic basis [2]. Further bases require the inclusion order ≤ between FCIs or its transitive reduction ≺, i.e. the precedence relation. In Fig. 1 (adapted from [14]), views (a) and (b) depict the concept lattice of dataset D and its iceberg part, respectively. Here we investigate the efficient computation of the three components of an association rule basis, or what could be called the generator-decorated iceberg (see Fig. 1 (c)). 2.2 Effective mining methods for FCIs, FGs, and precedence links Historically, the first algorithm computing all closures with their generators and precedence links can be found in [10] (although under a different name and in a somewhat incomplete manner). Yet the individual tasks have been addressed separately or in various combinations by a large variety of methods. First, the construction of all concepts is a classical FCA task and a large number of algorithms exist for it, using a wide range of computing strategies. Yet they scale poorly as FCI miners due to their reliance on object-wise computations (e.g. the incremental acquisition of objects as in [10]). These involve a large number of what are called data scans in data mining, which are known to seriously deteriorate performance. In fact, the overwhelming majority of FCA algorithms suffer from the same drawback, as they have been designed under the assumption that the number of objects and the number of attributes remain in the same order of magnitude. Yet in data mining, there is usually a much larger number of transactions than there are items. As to generators, they have attracted significantly less attention in FCA as a standalone structure. Precedence links, in turn, are sometimes computed by
In fact, to the best of our knowledge, the only mainstream FCI miner that also outputs the Hasse diagram of the iceberg lattice is Charm-L [8]. In order to avoid multiple data scans, Charm-L relies on a specific encoding of the transaction database, called vertical, which takes advantage of the aforementioned asymmetry between the number of transactions and the number of items. Moreover, two later extensions thereof [7,9] also cover the FGs for each FCI, making them the primary competitors of our own approach. Despite the clear discrepancies in their modus operandi, both FCA-centered algorithms and FCI/FG miners share the same overall algorithmic schema. Indeed, they first compute the set of concepts/FCIs and the precedence links between them and then use these as input for generator/FG calculation. The latter task can either be performed along a levelwise traversal of the equivalence class of a given closure, as in [8] and [10], or shaped as the computation of the minimal transversals of a dedicated hypergraph4, as in [11,12] and [9]. While such a schema may appear more intuitive from an FCA point of view (first comes the lattice, then the generators, which are seen as an "extra"), it is less natural and eventually less profitable for data mining. Indeed, while a good number of association rule bases require the precedence links in order to be constructed, FGs are used in a much larger set of such bases and may even constitute a target structure of their own (see above). Hence, a more versatile mining method would output (and compute) the precedence relation only as an option, which is not possible with the current design schema. More precisely, the less rigid order between the steps of the combined task would be: (1) FCIs, (2) FGs, and (3) precedence. This basically means that precedence needs to be computed at the end, independently of FG and FCI computations (but may rely on these structures as input). Moreover, the separation of the three steps ensures a higher degree of modularity in the design of the concrete methods following our schema: any combination of methods solving the individual tasks can be used, leaving the user with a vast choice. On the reverse side of the coin, total modularity comes with a price: if FGs and FCIs are computed separately, an extra step is necessary to match an FCI to its FGs. 4 Termed alternatively as (minimal) blockers or hitting sets. We describe hereafter a method fitting the above schema which relies exclusively on existing algorithmic techniques. These are combined into a single global procedure, called Snow-Touch, in the following manner: the FCI computation is delegated to the Charm algorithm, which is also the basis for Charm-L. FGs are extracted by our own vertical miner Talky-G. The two methods, together with an FG-to-FCI matching technique, form the Touch algorithm [13]. Finally, precedence is retrieved from FCIs with FGs by the Snow algorithm [14] using a ground duality result from hypergraph theory. In the remainder of this section we summarize the theoretical and the algorithmic background of the above methods, which are themselves briefly presented and illustrated in the next section. 2.3 Hypergraphs, transversals, and precedence in closure semi-lattices The generator computation in [11] exploits the tight interdependence between the intent of a concept, its generators and the intents of its immediate predecessor concepts.
Technically speaking, a generator is a minimal blocker for the family of faces associated to the concept intent and its predecessor intents5.

5 A face is the set-theoretic difference between the intents of two concepts bound by a precedence link.

Example. Consider the closed itemset (CI) lattice in Figure 1 (c). The CI ABCDE has two faces: F1 = ABCDE \ AB = CDE and F2 = ABCDE \ ACDE = B.

It turns out that blocker is a different term for the widely known hypergraph transversal notion. We recall that a hypergraph [18] is a generalization of a graph where edges can connect an arbitrary number of vertices. Formally, a hypergraph H is a pair (V, E) made of a basic vocabulary V = {v1, v2, ..., vn}, the vertices, and a family E of sets, the hyper-edges, all drawn from V. A set T ⊆ V is called a transversal of H if it has a non-empty intersection with all the edges of H. A special case is formed by the minimal transversals, which are exploited in [11].

Example. In the above example, the minimal transversals of {CDE, B} are {BC, BD, BE}, hence these are the generators of ABCDE (see Figure 1 (c)).

The family of all minimal transversals of H constitutes the transversal hypergraph of H, denoted Tr(H). A duality exists between a simple hypergraph and its transversal hypergraph [18]: for a simple hypergraph H, Tr(Tr(H)) = H. Thus, the faces of a concept intent are exactly the minimal transversals of the hypergraph composed of its generators.

Example. The bottom node in Figure 1 (c), labelled ABCDE, has three generators, BC, BD, and BE, while the minimal transversals of the corresponding hypergraph are {CDE, B}.

Fig. 2. Left: pre-order traversal with Eclat; Right: reverse pre-order traversal with Talky-G.

2.4 Vertical Itemset Mining

Miners from the literature, whether for plain FIs or FCIs, can be roughly split into breadth-first and depth-first ones. Breadth-first algorithms, more specifically the Apriori-like [1] ones, apply a levelwise traversal of the pattern space, exploiting the anti-monotonicity of the frequent status. Depth-first algorithms, e.g. Closet [19], in contrast, organize the search space into a prefix-tree (see Figure 2), thus factoring out the effort to process common prefixes of itemsets. Among them, the vertical miners use an encoding of the dataset as a set of pairs (item, tidset), i.e. {(i, t(i)) | i ∈ A}, which helps avoid costly database re-scans.

Eclat [20] is a plain FI miner relying on a vertical encoding and a depth-first traversal of a tree structure, called IT-tree, whose nodes are X × t(X) pairs. Eclat traverses the IT-tree in a pre-order way, from left to right [20] (see Figure 2). Charm adapts the computing schema of Eclat to the rapid construction of the FCIs [6]. It is known as one of the fastest FCI miners, hence its adoption as a component in Touch, as well as the idea to look for a similar technique for FGs. However, a vertical FG miner would be closer to Eclat than to Charm, as it requires no specific speed-up during the traversal (recall that FGs form a downset). In contrast, there is a necessary shift in the test focus w.r.t. Eclat: instead of supersets, subsets need to be examined to check candidate FGs. This, in turn, requires that all such subsets have already been tested at the moment an itemset is examined. In other terms, the IT-tree traversal order needs to be a linear extension of the ⊆ order between itemsets.
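To make the face/generator duality of Section 2.3 concrete, the following small Python sketch (our own illustration under the definitions above, not code from [11] or from Snow-Touch; the function name and the brute-force strategy are ours) computes minimal transversals of a tiny hypergraph and replays the running example.

from itertools import combinations

def minimal_transversals(edges):
    # Enumerate all minimal transversals (minimal hitting sets) of a hypergraph,
    # given as a collection of hyper-edges (sets of vertices). Brute force: test
    # candidate vertex sets by increasing size and keep the inclusion-minimal ones.
    edges = [frozenset(e) for e in edges]
    vertices = sorted(set().union(*edges))
    found = []
    for size in range(1, len(vertices) + 1):
        for cand in map(frozenset, combinations(vertices, size)):
            hits_all = all(cand & e for e in edges)
            is_minimal = not any(t <= cand for t in found)
            if hits_all and is_minimal:
                found.append(cand)
    return found

# Faces of the closed itemset ABCDE w.r.t. its predecessors AB and ACDE:
faces = [{'C', 'D', 'E'}, {'B'}]
generators = minimal_transversals(faces)
print(sorted(''.join(sorted(g)) for g in generators))                        # ['BC', 'BD', 'BE']
# Duality Tr(Tr(H)) = H: the transversals of the generators give back the faces.
print(sorted(''.join(sorted(f)) for f in minimal_transversals(generators)))  # ['B', 'CDE']

The brute-force enumeration is only practical for tiny examples like this one; an actual implementation would rely on a dedicated minimal-transversal algorithm.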
3 The Snow-Touch Algorithm

We sketch below the key components of Snow-Touch, i.e. Talky-G, Touch, and Snow.

3.1 Talky-G

Talky-G is a vertical FG miner that constructs an IT-tree in a depth-first right-to-left manner [13].

Traversal Of The Generator Search Space Traversing ℘(A) so that a given set X is processed after all its subsets induces a ⊆-complying traversal order, i.e. a linear extension of ⊆. In FCA, a similar technique is used by the Next-Closure algorithm [21]. The underlying lectic order is rooted in an implicit mapping of ℘(A) to [0 .. 2^|A| − 1], where the image of a set is the decimal value of its characteristic vector w.r.t. an arbitrary ordering rank: A ↔ [1..|A|]. The sets are then listed in increasing order of their mapping values, which represents a depth-first traversal of ℘(A). This encoding yields a depth-first right-to-left traversal (called reverse pre-order traversal in [22]) of the IT-tree representing ℘(A).

Example. See Figure 2 for a comparison between the traversal strategies in Eclat (left) and in Talky-G (right). Order-induced ranks of itemsets are drawn next to their IT-tree nodes.

The Algorithm The algorithm works the following way. The IT-tree is initialized by creating the root node and hanging a node for each frequent item below the root (with its respective tidset). Next, the nodes below the root are examined, starting from the right-most one. A 1-itemset p in such a node is an FG iff supp(p) < 100%, in which case it is saved to a dedicated list. A recursive exploration of the subtree below the current node then ensues. At the end, all FGs are contained in the IT-tree.

During the recursive exploration, all FGs from the subtree rooted in a node are mined. First, FGs are generated by "joining" the subtree's root to each of its sibling nodes lying to the right. A node is created for each of them and hung below the subtree's root. The resulting node's itemset is the union of its parents' itemsets, while its tidset is the intersection of the tidsets of its parents. Then, all the direct children of the subtree's root are processed recursively in a right-to-left order.

When two FGs are joined to form a candidate node, two cases can occur: either we obtain a new FG, or a valid FG cannot be generated from the two FGs. A candidate FG is the union of the input node itemsets, while its tidset is the intersection of the respective tidsets. It can fail the FG test either by insufficient support (it is not frequent) or by a strict FG-subset of the same support (which means that the candidate is a proper superset of an already found FG with the same support).

Fig. 3. Execution of Talky-G on dataset D with min supp = 1 (20%).

Example. Figure 3 illustrates Talky-G on dataset D with min supp = 1 (20%). The node ranks in the traversal-induced order are again indicated. The IT-tree construction starts with the root node and its children nodes: since no universal item exists in D, all items are FGs and get a node below the root. In the recursive extension step, the node E is examined first: having no right siblings, it is skipped. Node D is next: the candidate itemset DE fails the FG test since it has the same support as E. With C, both candidates CD and CE are discarded for the same reason. In a similar fashion, the only FGs in the subtree below the node of B are BC, BD, and BE. In the case of A, these are AB and AD; ABD fails the FG test because of BD.
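The candidate FG test just described (itemset union, tidset intersection, rejection by insufficient support or by an equally supported FG subset) can be sketched as follows; the toy tidsets and the function names are ours and do not reproduce the paper's dataset D.

def join_candidate(node1, node2):
    # Candidate built from two IT-tree nodes: union of itemsets, intersection of tidsets.
    return (node1[0] | node2[0], node1[1] & node2[1])

def passes_fg_test(itemset, tidset, min_supp, known_fgs):
    # A candidate fails if it is infrequent, or if an already found FG that is a
    # proper subset of it has the same support (then the candidate cannot be a
    # minimal member of its equivalence class).
    supp = len(tidset)
    if supp < min_supp:
        return False
    return not any(fg < itemset and len(fg_tids) == supp for fg, fg_tids in known_fgs)

# Toy vertical data over transactions {1, 2, 3, 4} (hypothetical, not the paper's dataset D),
# so no item is universal and every single item below is an FG.
vert = {'c': {1, 2}, 'd': {1, 2, 3}, 'e': {1, 2, 3}}
fgs = [(frozenset(i), t) for i, t in vert.items()]
cand_items, cand_tids = join_candidate((frozenset('d'), vert['d']), (frozenset('e'), vert['e']))
print(passes_fg_test(cand_items, cand_tids, 1, fgs))   # False: d (and e) is a subset of de with equal support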
Fast Subsumption Checking During the generation of candidate FGs, a subsumer itemset cannot be a generator. To speed up the subsumption computation, Talky-G adapts the hash structure of Charm for storing frequent generators together with their support values. Since tidsets are used for hashing the FGs, two equivalent itemsets get the same hash value. Hence, when looking for a potential subsumee of a candidate X, we check within the corresponding list of the hash table for FGs Y that (i) have the same support as X and, if so, (ii) are proper subsets of X (see details in [13]).

Example. The hash structure of the IT-tree in Figure 3 is drawn in Figure 4 (top right). The hash table has four entries that are lists of itemsets. The hash function over a tidset is the sum of all tids modulo 4. For instance, to check whether ABD subsumes a known FG, we take its hash key, 2 mod 4 = 2, and check the content of the list at index 2. In the list order, B is discarded for support mismatch, while BE fails the subset test. In contrast, BD succeeds in both the support and the inclusion tests, so it invalidates the candidate ABD.

3.2 The Touch Algorithm

The Touch algorithm has three main features, namely (1) extracting frequent closed itemsets, (2) extracting frequent generators, and (3) associating frequent generators to their closures, i.e. identifying frequent equivalence classes.

For the last step, our method matches FGs to their respective FCIs. To that end, it exploits the shared storage technique of Talky-G and Charm, i.e. the hashing on their images (see Figure 4 (top)). The calculation is immediate: as the hash value of an FG is the same as that of its FCI, one only needs to traverse the FG hash table and, for each itemset, look up the list of FCIs associated to its hash value. Moreover, setting both hash tables to the same size further simplifies the procedure, as both lists are then located at the same offset within their respective hash tables.

Example. Figure 4 (top) depicts the hash structures of Charm and Talky-G. Assume we want to determine the generators of ACDE, which is stored at position 3 in the hash structure of Charm. Its generators are also stored at position 3 in the hash structure of Talky-G. The list comprises three members that are subsets of ACDE with the same support: E, C, and AD. Hence, these are the generators of ACDE. The output of Touch is shown in Figure 4 (bottom).

Fig. 4. Top: hash tables for dataset D with min supp = 1. Top left: hash table of Charm containing all FCIs. Top right: hash table of Talky-G containing all FGs. Bottom: output of Touch on dataset D with min supp = 1:
FCI (supp)   FGs
AB (2)       AB
ABCDE (1)    BE; BD; BC
A (3)        A
B (3)        B
ACDE (2)     E; C; AD
D (3)        D
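The matching step can be sketched in Python as follows; this is our reading of the description above, not the authors' code. The hash on tidsets (sum of tids modulo the table size) follows the worked example, and the input data at the end are a hypothetical toy rather than the paper's dataset D.

def tid_hash(tidset, table_size):
    # Hash on the tidset: equivalent itemsets (same image) land in the same bucket.
    return sum(tidset) % table_size

def match_generators(fcis, fgs, table_size=4):
    # Store FCIs and FGs in equally sized hash tables keyed by their tidsets, then,
    # for each FCI, keep the FGs of the same bucket that are subsets with equal support.
    fci_buckets = [[] for _ in range(table_size)]
    fg_buckets = [[] for _ in range(table_size)]
    for itemset, tids in fcis:
        fci_buckets[tid_hash(tids, table_size)].append((frozenset(itemset), tids))
    for itemset, tids in fgs:
        fg_buckets[tid_hash(tids, table_size)].append((frozenset(itemset), tids))
    matching = {}
    for idx, bucket in enumerate(fci_buckets):
        for fci, fci_tids in bucket:
            matching[fci] = [fg for fg, fg_tids in fg_buckets[idx]
                             if fg <= fci and len(fg_tids) == len(fci_tids)]
    return matching

# Hypothetical toy input: (itemset, tidset) pairs for FCIs and FGs.
fcis = [('ab', {1, 4}), ('b', {1, 3, 4})]
fgs = [('a', {1, 4}), ('b', {1, 3, 4})]
print(match_generators(fcis, fgs))   # e.g. frozenset({'a', 'b'}) is mapped to [frozenset({'a'})]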
3.3 The Snow Algorithm

Snow computes precedence links on FCIs from their associated FGs [14]. Snow exploits the duality between the hypergraph made of the generators of an FCI and the hypergraph made of its faces to compute the latter as the transversals of the former. Thus, its input is made of FCIs and their associated FGs. Several algorithms can be used to produce this input, e.g. Titanic [5], A-Close [4], Zart [23], Touch, etc. Figure 4 (bottom) depicts a sample input of Snow.

On such data, Snow first computes the faces of a CI as the minimal transversals of its generator hypergraph. Next, each difference of the CI X with a face yields a predecessor of X in the closed itemset lattice.

Example. Consider again ABCDE with its generator family {BC, BD, BE}. First, we compute its transversal hypergraph: Tr({BC, BD, BE}) = {CDE, B}. The two faces F1 = CDE and F2 = B indicate that there are two predecessors of ABCDE, say Z1 and Z2, where Z1 = ABCDE \ CDE = AB and Z2 = ABCDE \ B = ACDE. Applying this procedure to all CIs yields the entire precedence relation of the CI lattice.
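A compact Python sketch of this procedure (our reading of the description above, not the authors' implementation; the brute-force transversal helper repeats the one from Section 2.3 so that the sketch stays self-contained):

from itertools import combinations

def minimal_transversals(edges):
    # Brute-force minimal hitting sets of a small hypergraph.
    edges = [frozenset(e) for e in edges]
    vertices = sorted(set().union(*edges))
    found = []
    for size in range(1, len(vertices) + 1):
        for cand in map(frozenset, combinations(vertices, size)):
            if all(cand & e for e in edges) and not any(t <= cand for t in found):
                found.append(cand)
    return found

def snow_precedence(cis_with_generators):
    # For each closed itemset, the faces are the minimal transversals of its generator
    # hypergraph, and removing a face from the CI yields one of its predecessors.
    precedence = {}
    for ci, generators in cis_with_generators.items():
        ci = frozenset(ci)
        faces = minimal_transversals(generators)
        precedence[ci] = [ci - face for face in faces]
    return precedence

# The worked example: ABCDE with generator family {BC, BD, BE}.
links = snow_precedence({'ABCDE': [{'B', 'C'}, {'B', 'D'}, {'B', 'E'}]})
print(sorted(''.join(sorted(p)) for p in links[frozenset('ABCDE')]))   # ['AB', 'ACDE']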
4 Experimental Evaluation

In this section we discuss practical aspects of our method. First, in order to demonstrate that our approach is computationally efficient, we compare its performance on a wide range of datasets to that of Charm-L. Then, we present an application of Snow-Touch to the analysis of genomic data, together with an excerpt of the most remarkable gene associations that our method helped to uncover.

4.1 Snow-Touch vs. Charm-L

We evaluated Snow-Touch against Charm-L [8,9]. The experiments were carried out on a bi-processor Intel Quad Core Xeon 2.33 GHz machine running Ubuntu GNU/Linux with 4 GB RAM. All times reported are real, wall clock times, as obtained from the Unix time command between input and output. Snow-Touch was implemented entirely in Java. For performance comparisons, the authors' original C++ source of Charm-L was used. Charm-L and Snow-Touch were executed with these options: ./charm-l -i input -s min_supp -x -L -o COUT -M 0 -n; ./leco.sh input min_supp -order -alg:dtouch -method:snow -nof2. In each case, the standard output was redirected to a file. The diffset optimization technique [24] was activated in both algorithms.6

6 Charm-L uses diffsets by default, thus no explicit parameter was required.

Benchmark datasets. For the experiments, we used several real and synthetic benchmark datasets (see Table 1). The synthetic dataset T25, produced with the IBM Almaden generator, is constructed according to the properties of market basket data. The Mushrooms database describes mushroom characteristics. The Chess and Connect datasets are derived from their respective game steps. The latter three datasets can be found in the UC Irvine Machine Learning Database Repository. Typically, real datasets are very dense, while synthetic data are usually sparse. Response times of the two algorithms on these datasets are presented in Figure 5.

Table 1. Database characteristics
database name   # records   # non-empty attributes   # attributes (in average)   largest attribute
T25I10D10K      10,000      929                      25                          1,000
Mushrooms       8,416       119                      23                          128
Chess           3,196       75                       37                          75
Connect         67,557      129                      43                          129

Charm-L. Charm-L represents a state-of-the-art algorithm for closed itemset lattice construction [8]. Charm-L extends Charm to directly compute the lattice while it generates the CIs. In the experiments, we executed Charm-L with a switch to compute (minimal) generators too, using the minhitset method. In [9], Zaki and Ramakrishnan present an efficient method for calculating the generators, which is actually the generator-computing method of Pfaltz and Taylor [25]. This way, the two algorithms (Snow-Touch and Charm-L) are comparable, since they produce exactly the same output.

Fig. 5. Response times of Snow-Touch and Charm-L (four panels: T25I10D10K, Mushrooms, Chess, Connect; minimum support (%) vs. total time (sec.), with curves for Snow-Touch and Charm-L[minhitset]).

Performance on sparse datasets. On T25, Charm-L performs better than Snow-Touch. We have to admit that sparse datasets are a bit problematic for our algorithm. The reason is that T25 produces long sparse bitvectors, which incurs some overhead for Snow-Touch (in our implementation, we use bitvectors to store tidsets). However, as can be seen in the next paragraph, our algorithm outperforms Charm-L on all the dense datasets used in our tests.

Performance on dense datasets. On Mushrooms, Chess and Connect, we can observe that Charm-L performs well only for high values of support. Below a certain threshold, Snow-Touch gives lower response times, and the gap widens as the support is lowered. When the minimum support is set low enough, Snow-Touch can be several times faster than Charm-L. Considering that Snow-Touch is implemented in Java, we believe that a good C++ implementation could be several orders of magnitude faster than Charm-L.

According to our experiments, Snow-Touch can construct the concept lattices faster than Charm-L in the case of dense datasets. From this, we draw the hypothesis that our direction towards the construction of FG-decorated concept lattices is more beneficial than the direction of Charm-L. That is, it is better to first extract the FCI/FG pairs and then determine the order relation between them, than to first extract the set of FCIs, construct the order between them, and then determine the corresponding FGs for each FCI.

4.2 Analysis of Antibiotic Resistant Genes

We looked at the practical performance of Snow-Touch on a real-world genomic dataset, where the goal was to discover meaningful associations between genes, with entire genomes and genes seen as transactions and items, respectively.

The genomic dataset was collected from the website of the National Center for Biotechnology Information (NCBI), with a focus on genes from microbial genomes. At the time of writing (June 2011), 1,518 complete microbial genomes were available on the NCBI website.7 For each genome, its list of genes was collected (for instance, the genome with ID CP002059.1 has two genes, rnpB and ssrA). Only 1,250 genomes out of the 1,518 proved non-empty; we put them in a binary matrix of 1,250 rows × 125,139 columns. With an average of 684 genes per genome, we got a density of 0.55% (i.e., a large yet sparse dataset with an imbalance between the numbers of rows and columns).

7 http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi

The initial result of the mining task was the family of minimal non-redundant association rules (MNR), which are directly available from the output of Snow-Touch. We sorted them according to their confidence. Among all strong associations, the bioinformaticians involved in this study found most appealing the rules describing the behavior of antibiotic resistant genes, in particular the mecA gene. mecA is frequently found in bacterial cells.
It induces resistance to antibiotics such as methicillin, penicillin, erythromycin, etc. [26]. The most commonly known carrier of the gene mecA is the bacterium known as MRSA (methicillin-resistant Staphylococcus aureus). In a second step, we narrowed the focus to a group of three genes: mecA plus ampC and vanA [27]. ampC is a beta-lactam-resistance gene. AmpC beta-lactamases are typically encoded on the chromosome of many gram-negative bacteria; they may also occur in Escherichia coli. AmpC-type beta-lactamases may also be carried on plasmids [26]. Finally, the gene vanA is a vancomycin-resistance gene typically encoded on the chromosome of gram-positive bacteria such as Enterococcus. The idea was to relate the presence of these three genes to the presence or absence of any other gene or a combination thereof.

Table 2 shows an extract of the most interesting rules found by our algorithm. These rules were selected from a set of 18,786 rules.

Table 2. An extract of the generated minimal non-redundant association rules. After each rule, the following measures are indicated: support, confidence, support of the left-hand side (antecedent), and support of the right-hand side (consequent).
(1) {clpX, dnaA, dnaI, dnaK, gyrB, hrcA, pyrF} → {mecA} (supp=96 [7.68%]; conf=0.857 [85.71%]; suppL=112 [8.96%]; suppR=101 [8.08%])
(2) {clpX, dnaA, dnaI, dnaK, nusG} → {mecA} (supp=96 [7.68%]; conf=0.835 [83.48%]; suppL=115 [9.20%]; suppR=101 [8.08%])
(3) {clpX, dnaA, dnaI, dnaJ, dnaK} → {mecA} (supp=96 [7.68%]; conf=0.828 [82.76%]; suppL=116 [9.28%]; suppR=101 [8.08%])
(4) {clpX, dnaA, dnaI, dnaK, ftsZ} → {mecA} (supp=96 [7.68%]; conf=0.828 [82.76%]; suppL=116 [9.28%]; suppR=101 [8.08%])
(5) {clpX, dnaA, dnaI, dnaK} → {mecA} (supp=97 [7.76%]; conf=0.815 [81.51%]; suppL=119 [9.52%]; suppR=101 [8.08%])
(6) {greA, murC, pheS, rnhB, ruvA} → {ampC} (supp=99 [7.92%]; conf=0.227 [22.71%]; suppL=436 [34.88%]; suppR=105 [8.40%])
(7) {murC, pheS, pyrB, rnhB, ruvA} → {ampC} (supp=99 [7.92%]; conf=0.221 [22.15%]; suppL=447 [35.76%]; suppR=105 [8.40%])
(8) {dxs, hemA} → {vanA} (supp=29 [2.32%]; conf=0.081 [8.15%]; suppL=356 [28.48%]; suppR=30 [2.40%])
(9) {dxs} → {vanA} (supp=30 [2.40%]; conf=0.067 [6.73%]; suppL=446 [35.68%]; suppR=30 [2.40%])

For instance, rule (1) in Table 2 says that the gene mecA is present in 85.71% of the cases in which the set of genes {clpX, dnaA, dnaI, dnaK, gyrB, hrcA, pyrF} is present in a genome. The above rules have a direct practical use. In one such scenario, they could be used to suggest which antibiotic should be taken by a patient, depending on the presence or absence of certain genes in the infecting microbe.

5 Conclusion

We presented a new design schema for the task of mining the iceberg lattice and the corresponding generators out of a large context. The target structure is directly involved in the construction of a number of association rule bases and is hence of certain importance in the data mining field. While previously published algorithms follow the same schema, i.e., construction of the iceberg lattice (FCIs plus precedence links) followed by the extraction of the FGs, our approach consists in inferring the precedence links from the previously mined FCIs with their FGs.
We presented an initial and straightforward instantiation of the new algorithmic schema that reuses existing methods for the three steps: the popular Charm FCI miner, our own method for FG extraction, Talky-G (plus an FGs-to-FCIs matching procedure), and the Hasse diagram constructor Snow. The resulting iceberg-plus-FGs miner, Snow-Touch, is far from an optimal algorithm, in particular due to redundancies in the first two steps. Yet an implementation thereof within the Coron platform (in Java) has managed to outperform its natural competitor, Charm-L (in C++), on a wide range of datasets, especially on dense ones. To level the playing field, we are currently re-implementing Snow-Touch in C++ and expect the new version to be even more efficient.

In a different vein, we have tested the capacity of our approach to support practical mining tasks by applying it to the analysis of genomic data. While a large number of associations usually come out of such datasets, many of them redundant with respect to each other, by limiting the output to only the generic ones, our method helped focus the analysts' attention on a smaller number of significant rules.

As a next step, we are studying a more integrated approach for FCI/FG construction that requires no extra matching step. This should result in substantial efficiency gains. On the methodological side, our study underlines the duality between generators and order w.r.t. FCIs: either can be used in combination with the FCIs to yield the other one. It raises the natural question of whether FCIs alone, which are output by a range of frequent pattern miners, could be used to efficiently retrieve first the precedence, and then the FGs.

References

1. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules in Large Databases. In: Proc. of the 20th Intl. Conf. on Very Large Data Bases (VLDB '94), San Francisco, CA, Morgan Kaufmann (1994) 487–499
2. Kryszkiewicz, M.: Concise Representations of Association Rules. In: Proc. of the ESF Exploratory Workshop on Pattern Detection and Discovery (2002) 92–109
3. Bastide, Y., Taouil, R., Pasquier, N., Stumme, G., Lakhal, L.: Mining Minimal Non-Redundant Association Rules Using Frequent Closed Itemsets. In: Proc. of Computational Logic (CL '00). Volume 1861 of LNAI, Springer (2000) 972–986
4. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering Frequent Closed Itemsets for Association Rules. In: Proc. of the 7th Intl. Conf. on Database Theory (ICDT '99), Jerusalem, Israel (1999) 398–416
5. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing Iceberg Concept Lattices with Titanic. Data and Knowl. Eng. 42(2) (2002) 189–222
6. Zaki, M.J., Hsiao, C.J.: CHARM: An Efficient Algorithm for Closed Itemset Mining. In: SIAM Intl. Conf. on Data Mining (SDM '02) (Apr 2002) 33–43
7. Zaki, M.J.: Mining Non-Redundant Association Rules. Data Mining and Knowledge Discovery 9(3) (2004) 223–248
8. Zaki, M.J., Hsiao, C.J.: Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure. IEEE Trans. on Knowl. and Data Eng. 17(4) (2005) 462–478
9. Zaki, M.J., Ramakrishnan, N.: Reasoning about Sets using Redescription Mining. In: Proc. of the 11th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (KDD '05), Chicago, IL, USA (2005) 364–373
10. Godin, R., Missaoui, R.: An incremental concept formation approach for learning from databases.
Theoretical Computer Science Journal (133) (1994) 387–419
11. Pfaltz, J.L.: Incremental Transformation of Lattices: A Key to Effective Knowledge Discovery. In: Proc. of the First Intl. Conf. on Graph Transformation (ICGT '02), Barcelona, Spain (Oct 2002) 351–362
12. Le Floc'h, A., Fisette, C., Missaoui, R., Valtchev, P., Godin, R.: JEN: un algorithme efficace de construction de générateurs pour l'identification des règles d'association. Nouvelles Technologies de l'Information 1(1) (2003) 135–146
13. Szathmary, L., Valtchev, P., Napoli, A., Godin, R.: Efficient Vertical Mining of Frequent Closures and Generators. In: Proc. of the 8th Intl. Symposium on Intelligent Data Analysis (IDA '09). Volume 5772 of LNCS, Lyon, France, Springer (2009) 393–404
14. Szathmary, L., Valtchev, P., Napoli, A., Godin, R.: Constructing Iceberg Lattices from Frequent Closures Using Generators. In: Discovery Science. Volume 5255 of LNAI, Budapest, Hungary, Springer (2008) 136–147
15. Calders, T., Rigotti, C., Boulicaut, J.F.: A Survey on Condensed Representations for Frequent Sets. In Boulicaut, J.F., Raedt, L.D., Mannila, H., eds.: Constraint-Based Mining and Inductive Databases. Volume 3848 of Lecture Notes in Computer Science, Springer (2004) 64–80
16. Baixeries, J., Szathmary, L., Valtchev, P., Godin, R.: Yet a Faster Algorithm for Building the Hasse Diagram of a Galois Lattice. In: Proc. of the 7th Intl. Conf. on Formal Concept Analysis (ICFCA '09). Volume 5548 of LNAI, Darmstadt, Germany, Springer (May 2009) 162–177
17. Pasquier, N.: Mining association rules using formal concept analysis. In: Proc. of the 8th Intl. Conf. on Conceptual Structures (ICCS '00), Shaker-Verlag (Aug 2000) 259–264
18. Berge, C.: Hypergraphs: Combinatorics of Finite Sets. North Holland, Amsterdam (1989)
19. Pei, J., Han, J., Mao, R.: CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2000) 21–30
20. Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New Algorithms for Fast Discovery of Association Rules. In: Proc. of the 3rd Intl. Conf. on Knowledge Discovery in Databases (August 1997) 283–286
21. Ganter, B., Wille, R.: Formal concept analysis: mathematical foundations. Springer, Berlin/Heidelberg (1999)
22. Calders, T., Goethals, B.: Depth-first non-derivable itemset mining. In: Proc. of the SIAM Intl. Conf. on Data Mining (SDM '05), Newport Beach, USA (Apr 2005)
23. Szathmary, L., Napoli, A., Kuznetsov, S.O.: ZART: A Multifunctional Itemset Mining Algorithm. In: Proc. of the 5th Intl. Conf. on Concept Lattices and Their Applications (CLA '07), Montpellier, France (Oct 2007) 26–37
24. Zaki, M.J., Gouda, K.: Fast vertical mining using diffsets. In: Proc. of the 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining (KDD '03), New York, NY, USA, ACM Press (2003) 326–335
25. Pfaltz, J.L., Taylor, C.M.: Scientific Knowledge Discovery through Iterative Transformation of Concept Lattices. In: Proc. of the SIAM Workshop on Data Mining and Discrete Mathematics, Arlington, VA, USA (2002) 65–74
26. Philippon, A., Arlet, G., Jacoby, G.A.: Plasmid-Determined AmpC-Type β-Lactamases. Antimicrobial Agents and Chemotherapy 46(1) (2002) 1–11
27. Schwartz, T., Kohnen, W., Jansen, B., Obst, U.: Detection of antibiotic-resistant bacteria and their resistance genes in wastewater, surface water, and drinking water biofilms.
Microbiology Ecology 43(3) (2003) 325–335

Boolean factors as a means of clustering of interestingness measures of association rules*

Radim Belohlavek1, Dhouha Grissa2,4,5, Sylvie Guillaume3,4, Engelbert Mephu Nguifo2,4, Jan Outrata1

1 Data Analysis and Modeling Lab, Department of Computer Science, Palacky University, Olomouc, 17. listopadu 12, CZ-77146 Olomouc, Czech Republic
radim.belohlavek@acm.org, jan.outrata@upol.cz
2 Clermont Université, Université Blaise Pascal, LIMOS, BP 10448, F-63000 Clermont-Ferrand, France
3 Clermont Université, Université d'Auvergne, LIMOS, BP 10448, F-63000 Clermont-Ferrand, France
4 CNRS, UMR 6158, LIMOS, F-63173 Aubiére, France
5 URPAH, Département d'Informatique, Faculté des Sciences de Tunis, Campus Universitaire, 1060 Tunis, Tunisie
dgrissa@isima.fr, guillaum@isima.fr, mephu@isima.fr

Abstract. Measures of interestingness play a crucial role in association rule mining. An important methodological problem is to provide a reasonable classification of the measures. Several papers have appeared on this topic. In this paper, we explore Boolean factor analysis, which uses formal concepts corresponding to classes of measures as factors, for the purpose of classification, and compare the results to the previous approaches.

* We acknowledge support by the ESF project No. CZ.1.07/2.3.00/20.0059 (the project is co-financed by the European Social Fund and the state budget of the Czech Republic; R. Belohlavek), by Grant No. 202/10/P360 of the Czech Science Foundation (J. Outrata), and by Grant No. 11G1417 of the French-Tunisian cooperation PHC Utique (D. Grissa).

© 2011 by the paper authors. CLA 2011, pp. 207–222. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France.

1 Introduction

An important problem in extracting association rules, well known since the early stage of association rule mining [32], is the possibly huge number of rules extracted from data. A general way of dealing with this problem is to define the concept of rule interestingness: only association rules that are considered interesting according to some measure are presented to the user. The most widely used measures of interestingness are based on the concepts of support and confidence. However, the suitability of these measures for extracting interesting rules was challenged by several studies, see e.g. [34]. Consequently, several other interestingness measures of association rules were proposed, see e.g. [35], [23], [12], [38]. With the many existing measures of interestingness arises the problem of selecting an appropriate one.

To understand better the behavior of various measures, several studies of the properties of measures of interestingness appeared, see e.g. [12], [27], [23], [16]. Those studies explore various properties of the measures that are considered important. For example, Vaillant et al. [37] evaluated twenty interestingness measures according to eight properties. To facilitate the choice of a user-adapted interestingness measure, the authors applied clustering methods to the decision matrix and obtained five clusters. Tan et al. [35] studied twenty-one interestingness measures through eight properties and showed that no measure is adapted to all cases. To select the best interestingness measure, they use both support-based pruning and standardization methods. By applying a new clustering approach, Huynh et al.
[21] classified thirty-four interestingness measures with a correlation analysis. Geng and Hamilton [12] made a survey of thirty-eight interestingness measures for rules and summaries with eleven properties and gave strategies to select the appropriate measures. D. R. Feno [10] evaluated fifteen interestingness measures with thirteen properties to describe their behaviour. Delgato et al. [9] provided a new study of the interestingness measures by means of the logical model. In addition, the authors proposed and justified the addition of two new principles to the three proposed by Piatetsky-Shapiro [32]. Finally, Heravi and Zaiane [22] studied fifty-three objective measures for associative classification rules according to sixteen properties and explained that no single measure can be introduced as an obvious winner.

The assessment of measures according to their properties results in a measure-property binary matrix. Two studies of this matrix were conducted. Namely, [17] describes how FCA can highlight interestingness measures with similar behavior in order to help the user during his choice. [16] and [14] attempted to find natural clusters of measures using widely used clustering methods, the agglomerative hierarchical method (AHC) and the K-means method. A common feature of these methods is that they only produce disjoint clusters of measures. On the other hand, one could naturally expect overlapping clusters.

The aim of this paper is to explore the possibility of obtaining overlapping clusters of measures using factor analysis of binary data and to compare the results with the results of other studies. In particular, we use the recently developed method from [3] and take the discovered factors for clusters. The method uses formal concepts as factors, which makes it possible to interpret the factors easily.

2 Preliminaries

2.1 Binary (Boolean) data

Let X be a set of objects (such as a set of customers, a set of functions, or the like) and Y be a set of attributes (such as a set of products that customers may buy, or a set of properties of functions). The information about which objects have which attributes may formally be represented by a binary relation I between X and Y, i.e. I ⊆ X × Y, and may be visualized by a table (matrix) that contains 1s and 0s, according to whether the object corresponding to a row has the attribute corresponding to a column (for this we suppose that some orders of objects and attributes are fixed). We denote the entries of such a matrix by I_xy. Data of this type are called binary data (or Boolean data). The triplet ⟨X, Y, I⟩ is called a formal context in FCA, but other terms are used in other areas. This type of data appears in two roles in our paper. First, association rules, whose interestingness measures we analyze, are certain dependencies over binary data. Second, the information we have about the interestingness measures of association rules is in the form of binary data: the objects are interestingness measures and the attributes are their properties.

2.2 Association rules

An association rule [36] over a set Y of attributes is a formula

A ⇒ B    (1)

where A and B are sets of attributes from Y, i.e. A, B ⊆ Y. Let ⟨X, Y, I⟩ be a formal context. A natural measure of interestingness of association rules is based on the notions of confidence and support.
The confidence and support of an association rule A ⇒ B in ⟨X, Y, I⟩ are defined by

conf(A ⇒ B) = |A↓ ∩ B↓| / |A↓|   and   supp(A ⇒ B) = |A↓ ∩ B↓| / |X|,

where C↓, for C ⊆ Y, is defined by C↓ = {x ∈ X | for each y ∈ C: ⟨x, y⟩ ∈ I}. An association rule is considered interesting if its confidence and support exceed some user-specified thresholds. However, the support-confidence approach reveals some weaknesses. Often, this approach, as well as the algorithms based on it, leads to the extraction of an exponential number of rules, which makes validation by an expert impossible. In addition, a disadvantage of the support is that many potentially interesting rules have a low support value and can therefore be eliminated by the pruning threshold minsupp. To address this problem, many other measures of interestingness have been proposed in the literature [13], mainly because they are effective for mining potentially interesting rules and capture some aspects of user interest. The most important of those measures are subject to our analysis and are surveyed in Section 3.1. Note that association rules are attributed to [1]. However, the concept of association rule itself, as well as various measures of interestingness, are particular cases of what is investigated in depth in [18], a book that develops logico-statistical foundations of the GUHA method [19].

2.3 Factor analysis of binary (Boolean) data

Let I be an n × m binary matrix. The aim in Boolean factor analysis is to find a decomposition

I = A ◦ B    (2)

of I into an n × k binary matrix A and a k × m binary matrix B, with ◦ denoting the Boolean product of matrices, i.e.

(A ◦ B)_ij = max_{l=1,...,k} min(A_il, B_lj).

The inner dimension, k, in the decomposition may be interpreted as the number of factors that may be used to describe the original data. Namely, A_il = 1 if and only if the lth factor applies to the ith object, and B_lj = 1 if and only if the jth attribute is one of the manifestations of the lth factor. The factor model behind (2) has therefore the following meaning: the object i has the attribute j if and only if there exists a factor l that applies to i and for which j is one of its particular manifestations. We refer to [3] for further information and references to papers that deal with the problem of factor analysis and decompositions of binary matrices.

In [3], the following method for finding decompositions (2) with the number k of factors as small as possible has been presented. The method utilizes formal concepts of the formal context ⟨X, Y, I⟩ as factors, where X = {1, ..., n}, Y = {1, ..., m} (objects and attributes correspond to the rows and columns of I). Let F = {⟨C1, D1⟩, ..., ⟨Ck, Dk⟩} be a set of formal concepts of ⟨X, Y, I⟩, i.e. the ⟨Cl, Dl⟩ are elements of the concept lattice B(X, Y, I) [11]. Consider the n × k binary matrix A_F and the k × m binary matrix B_F defined by

(A_F)_il = 1 iff i ∈ Cl   and   (B_F)_lj = 1 iff j ∈ Dl.    (3)

Denote by ρ(I) the smallest number k, the so-called Schein rank of I, such that a decomposition of I with k factors exists. The following theorem shows that using formal concepts as factors as in (3) enables us to reach the Schein rank, i.e. is optimal [3]:

Theorem 1. For every binary matrix I, there exists F ⊆ B(X, Y, I) such that I = A_F ◦ B_F and |F| = ρ(I).

As has been demonstrated in [3], a useful feature of using formal concepts as factors is the fact that formal concepts may easily be interpreted. Namely, every factor, i.e. a formal concept ⟨Cl, Dl⟩, consists of a set Cl of objects (objects are measures of interestingness in our case) and a set Dl of attributes (properties of measures in our case). Cl contains just the objects to which all the attributes from Dl apply, and Dl contains all attributes shared by all objects from Cl. From a clustering point of view, the factors ⟨Cl, Dl⟩ may thus be seen as clusters Cl together with their descriptions by attributes from Dl. The factors thus have a natural, easy to understand meaning. Since the problem of computing the smallest set of factors is NP-hard, a greedy approximation algorithm was proposed in [3, Algorithm 2]. This algorithm is utilized below in our paper.
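A minimal Python sketch of the Boolean product (2) and of a concept-based decomposition (3) follows; the 3 × 3 context and its two formal concepts are a made-up toy, not the measure-property matrix of this paper, and the sketch only verifies the equations rather than implementing Algorithm 2 of [3].

import numpy as np

def boolean_product(A, B):
    # (A o B)_ij = max_l min(A_il, B_lj) over l = 1, ..., k  (equation (2)).
    return (A[:, :, None] & B[None, :, :]).any(axis=1).astype(int)

# A tiny 3 x 3 context (toy data).
I = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])

# Two formal concepts of this context, as (extent, intent) with 0-based indices.
concepts = [({0, 1}, {0, 1}), ({1, 2}, {1, 2})]

# Build A_F (objects x factors) and B_F (factors x attributes) as in equation (3).
A_F = np.array([[int(i in extent) for extent, _ in concepts] for i in range(3)])
B_F = np.array([[int(j in intent) for j in range(3)] for _, intent in concepts])

print(np.array_equal(boolean_product(A_F, B_F), I))   # True: two factors suffice for this I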
3 Clustering interestingness measures using Boolean factors

3.1 Measures of interestingness

In the following, we present the interestingness measures reported in the literature and recall nineteen of their most important properties proposed in the literature.

To identify interesting association rules and to enable the user to focus on what is interesting for him, about sixty interestingness measures [20], [35], [10] were proposed in the literature. All of them are defined using the following parameters: p(XY), p(X̄Y), p(XȲ) and p(X̄Ȳ), where p(XY) = n_XY/n, n_XY being the number of objects satisfying XY (the intersection of X and Y) and n the total number of objects, and X̄ is the negation of X. The following are important examples of interestingness measures:

Lift [6]: Given a rule X → Y, lift is the ratio of the probability that X and Y occur together to the product of the two individual probabilities for X and Y, i.e.

Lift(X → Y) = p(XY) / (p(X) × p(Y)).

If this value is 1, then X and Y are independent. The higher this value, the more likely that the existence of X and Y together in a transaction is not just a random occurrence, but due to some relationship between them.

Correlation coefficient [31]: Correlation is a symmetric measure evaluating the strength of the itemsets' connection. It is defined by

Correlation = (p(XY) − p(X)p(Y)) / √(p(X)p(Y)p(X̄)p(Ȳ)).

A correlation around 0 indicates that X and Y are not correlated. The lower its value, the more negatively correlated X and Y are; the higher its value, the more positively correlated they are.

Conviction [6]: Conviction is one of the measures that favor counter-examples. It is defined by

Conviction = p(X)p(Ȳ) / p(XȲ).

Conviction, which is not a symmetric measure, is used to quantify the deviation from independence. If its value is 1, then X and Y are independent.

MGK [15]: MGK is an interesting measure, which allows the extraction of negative rules. It is defined by

MGK = (p(Y/X) − p(Y)) / (1 − p(Y)), if X favours Y,
MGK = (p(Y/X) − p(Y)) / p(Y), if X disfavours Y.

It takes into account several reference situations: in the case where the rule is situated in the attractive zone (i.e. p(Y/X) > p(Y)), this measure evaluates the distance between independence and logical implication. Thus, the closer the value of MGK is to 1, the closer the rule is to the logical implication, and the closer the value of MGK is to 0, the closer the rule is to independence. In the case where the rule is located in the repulsive zone (i.e. p(Y/X) < p(Y)), MGK evaluates a distance between independence and incompatibility. Thus, the closer the value of MGK is to −1, the more similar to incompatibility the rule is; and the closer the value of MGK is to 0, the closer to independence the rule is.
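For concreteness, here is a small Python sketch computing some of the above measures from the counts they are defined on; the toy counts at the end, the function name, and the guard for the zero-denominator conviction case are ours.

from math import sqrt

def rule_measures(n, n_x, n_y, n_xy):
    # Measures of a rule X -> Y from: n transactions, n_x containing X,
    # n_y containing Y, n_xy containing both.
    p_x, p_y, p_xy = n_x / n, n_y / n, n_xy / n
    p_y_given_x = p_xy / p_x
    lift = p_xy / (p_x * p_y)
    correlation = (p_xy - p_x * p_y) / sqrt(p_x * p_y * (1 - p_x) * (1 - p_y))
    p_x_not_y = p_x - p_xy                                   # p(X and not Y)
    conviction = float('inf') if p_x_not_y == 0 else p_x * (1 - p_y) / p_x_not_y
    if p_y_given_x >= p_y:                                   # X favours Y (attractive zone)
        mgk = (p_y_given_x - p_y) / (1 - p_y)
    else:                                                    # X disfavours Y (repulsive zone)
        mgk = (p_y_given_x - p_y) / p_y
    return {'lift': lift, 'correlation': correlation, 'conviction': conviction, 'MGK': mgk}

print(rule_measures(n=100, n_x=40, n_y=50, n_xy=30))
# lift = 1.5, correlation ~ 0.41, conviction = 2.0, MGK = 0.5 for these toy counts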
As was mentioned above, several studies [35], [23], [25], [13] were reported in the literature on the various properties of interestingness measures, in order to characterize and evaluate them. The main goal of researchers in the domain is then to provide user assistance in choosing the best interestingness measure for the user's needs. For that, formal properties have been developed [32], [24], [35], [12], [4] in order to evaluate the interestingness measures and to help users understand their behavior. In the following, we present nineteen properties reported in the literature.

3.2 Properties of the measures

Figure 1 lists 19 properties of interestingness measures. The properties are described in detail in [16]; we omit details due to lack of space.

No.  Property                                                                                              Ref.
P1   Intelligibility or comprehensibility of measure                                                       [25]
P2   Easiness to fix a threshold to the rule                                                               [23]
P3   Asymmetric measure                                                                                    [35], [23]
P4   Asymmetric measure in the sense of the conclusion negation                                            [23], [35]
P5   Measure assessing in the same way X → Y and Ȳ → X̄ in the logical implication case                     [23]
P6   Measure increasing function of the number of examples or decreasing function of the number of counter-examples  [32], [23]
P7   Measure increasing function of the data size                                                          [12], [35]
P8   Measure decreasing function of the consequent/antecedent size                                         [23], [32]
P9   Fixed value a in the independence case                                                                [23], [32]
P10  Fixed value b in the logical implication case                                                         [23]
P11  Fixed value c in the equilibrium case                                                                 [5]
P12  Identified values in the attraction case between X and Y                                              [32]
P13  Identified values in the repulsion case between X and Y                                               [32]
P14  Tolerance to the first counter-example                                                                [23], [38]
P15  Invariance in case of expansion of certain quantities                                                 [35]
P16  Desired relationship between X → Y and X̄ → Y rules                                                    [35]
P17  Desired relationship between X → Y and X → Ȳ antinomic rules                                          [35]
P18  Desired relationship between X → Y and X̄ → Ȳ rules                                                    [35]
P19  Antecedent size is fixed or random                                                                    [23]
P20  Descriptive or statistical measure                                                                    [23]
P21  Discriminant measure                                                                                  [23]
Fig. 1. Interestingness measures properties.

The authors in [14] proposed an evaluation of 61 interestingness measures according to the 19 properties (P3 to P21). Properties P1 and P2 were not taken into account in this study because of their subjective character. The measures and their properties result in a binary measure-property matrix that is used for clustering the measures according to their properties. The clustering performed in [14] using the agglomerative hierarchical method and the K-means method revealed 7 clusters of measures, which will be used in the next section in a comparison with the results obtained by Boolean factor analysis applied to the same measure-property matrix.

3.3 Clustering using Boolean factors

The measure-property matrix describing interestingness measures by their properties is depicted in Figure 2. It consists of 62 measures (61 measures from [14] plus one more that has been studied recently) described by 21 properties, because the three-valued property P14 is represented by three yes-no properties P14.1, P14.2, and P14.3.
We computed the decomposition of the matrix using Algorithm 2 from [3] and obtained 28 factors (as in the case below, several of them may be disregarded as not very important; we leave the details for a full version of this paper). In addition, we extended the original 62 × 21 binary matrix by adding, for every property, its negation, and obtained a 62 × 42 binary matrix. We added the negated properties because we aim to compare the results with the two clustering methods mentioned above, in which the properties and their negations play a particular role. From the 62 × 42 matrix, we obtained 38 factors, denoted F1, ..., F38. The factors are presented in Figures 3 and 4. Figure 3 depicts the object-factor matrix describing the interestingness measures by factors; Figure 4 depicts the factor-property matrix explaining the factors by properties of measures. Factors are sorted from the most important to the least important, where the importance is determined by the number of 1s in the input measure-property matrix covered by the factor [3]. The first factors cover a large part of the matrix, while the last ones cover only a small part and may thus be omitted [3]; see the graph of the cumulative cover of the matrix by the factors in Figure 5.

Fig. 2. Input binary matrix describing interestingness measures by their properties.

Fig. 3. Interestingness measures described by factors obtained by decomposition of the input matrix from Figure 2 extended by negated properties.

Fig. 4. Factors obtained by decomposition of the input matrix from Figure 2 extended by negated properties. The factors are described in terms of the original and negated properties.

Fig. 5. Cumulative cover of the input matrix from Figure 2 extended by negated properties by the factors obtained by decomposition of the matrix (number of factors vs. cumulative cover (%)).

Fig. 6. Venn diagram of the first five factors (plus the eighth and part of the sixth and tenth to cover the whole set of measures) obtained by decomposition of the input matrix from Figure 2 extended by negated properties.

4 Interpretation and comparison to other approaches

The aim of this section is to provide an interpretation of the results described in the previous section and compare them to the results already reported in the literature, focusing mainly on [14]. As was described in the previous section, 38 factors were obtained. The first 21 of them cover 94 % of the input measure-property matrix (1s in the matrix), the first nine cover 72 %, and the first five cover 52.4 %.
Another remark is that the first ten factors cover the whole set of measures. Note first that the Boolean factors represent overlapping clusters, contrary to the clustering using the agglomerative hierarchical method and the K-means method performed in [14]. Namely, the clusterings are depicted in Figure 6, describing the Venn diagram of the first five Boolean factors (plus the eighth and part of the sixth and tenth to cover the whole set of measures), and in Figure 7, which is borrowed from [14] and describes the consensus on the classification obtained by the hierarchical and K-means clusterings. This consensus recovers the classes C1 to C7 of the extracted measures, which are common to both techniques.

Fig. 7. Classes of measures obtained by the hierarchical and K-means clusterings.

Due to lack of space, we focus on the first four factors since they cover nearly half of the matrix (45.1 %), and also because most of the measures appear at least once in these four factors.

Factor 1. The first factor F1 applies to 20 measures, see Figure 3, namely: correlation, Cohen, Pavillon, conviction, Bayes factor, Loevinger, collective strength, information gain, Goodman, interest, Klosgen, Mgk, YuleQ, relative risk, one way support, two way support, YuleY, Zhang, novelty, and odds ratio. These measures share the following 9 properties: P4, P7, P9, not P11, P12, P13, not P19, not P20, P21, see Figure 4.

Interpretation. The factor applies to measures whose evolutionary curve increases w.r.t. the number of examples and which have a fixed point in the case of independence (this makes it possible to identify the attractive and repulsive areas of a rule). The factor also applies only to descriptive and discriminant measures that are not based on a probabilistic model.

Comparison. When looking at the classification results reported in [14], F1 covers two classes from [14]: C6 and C7, which together contain 15 measures. Those classes are closely related within the dendrogram obtained with the agglomerative hierarchical clustering method used in [14]. The 5 missing measures form a class obtained with the K-means method in [14] with Euclidean distance.

Factor 2. F2 applies to 18 measures, namely: confidence, causal confidence, Ganascia, causal confirmation, descriptive confirmation, cosine, causal dependency, Laplace, least contradiction, precision, recall, support, causal confirmed confidence, Czekanowski, negative reliability, Leverage, specificity, and causal support. These measures share the following 11 properties: P4, P6, not P9, not P12, not P13, P14.2, not P15, not P16, not P19, not P20, P21.

Interpretation. The factor applies to measures whose evolutionary curve increases w.r.t. the number of examples and has a variable point in the case of independence, which implies that the attractive and repulsive areas of a rule are not identifiable. The factor also applies only to measures that are not discriminant, are indifferent to the first counter-examples, and are not based on a probabilistic model.

Comparison. F2 corresponds to two classes, C4 and C5, reported in [14]. C4 ∪ C5 contains 22 measures. The missing measures are: Jaccard, Kulczynski, examples and counter-examples rate, and Sebag. Those measures are not covered by F2 since they are not indifferent to the first counter-examples.
Factor 3. F3 applies to 10 measures, namely: coverage, dependency, weighted dependency, implication index, Jmeasure, Pearl, prevalence, Gini, variation support, and mutual information. These measures share the following 10 properties: not P6, not P8, not P10, not P11, not P13, not P14.1, not P15, not P16, not P17, not P19.

Interpretation. The factor applies to measures whose evolutionary curve does not increase w.r.t. the number of examples.

Comparison. F3 corresponds to class C3 reported in [14], which contains 8 measures. The two missing measures, variation support and Pearl, belong to the same classes obtained by both K-means and the hierarchical method. Moreover, these two missing measures are similar to those from C3 obtained by the hierarchical method, since they merge with the measures in C3 at the next level of the generated dendrogram. Here, there is a strong correspondence between the results obtained using Boolean factors and the ones reported in [14].

Factor 4. F4 applies to 9 measures, namely: confidence, Ganascia, descriptive confirmation, IPEE, IP3E, Laplace, least contradiction, Sebag, and examples and counter-examples rate. These measures share the following 12 properties: P3, P4, P6, P11, not P7, not P8, not P9, not P12, not P13, not P15, not P16, not P18.

Interpretation. The factor applies to measures whose evolutionary curve increases w.r.t. the number of examples and has a fixed value in the equilibrium case. As there is no fixed value in the independence case, we cannot obtain an identifiable area in the case of attraction or repulsion.

Comparison. F4 mainly applies to measures of class C5 obtained in [14]. The two missing measures, IPEE and IP3E, belong to a different class.

5 Conclusions and further issues

We demonstrated that Boolean factors provide us with clearly interpretable, meaningful clusters of measures, among which the first ones are highly similar to other clusters of measures reported in the literature. Contrary to other clustering methods, Boolean factors represent overlapping clusters. We consider this an advantage because overlapping clusters are a natural phenomenon in human classification. We presented preliminary results on clustering the measures using Boolean factors. Due to limited scope, we presented only parts of the results obtained and leave other results for a full version of this paper.

An interesting feature of the presented method, to be explored in the future, is that the method need not start from scratch. Rather, one or more clusters that are considered important classes of measures may be supplied at the start, and the method may be asked to complete the clustering. Another issue left for future research is the benefit of the clustering of measures for a user who is interested in selecting a type of measure, rather than a particular measure of interestingness of association rules. In the intended scenario, a user may use various interestingness measures that belong to different classes of measures.

References

1. Agrawal R., Imielinski T., Swami A.: Mining association rules between sets of items in large databases. Proc. ACM SIGMOD 1993, 207–216.
2. Agrawal R., Srikant R.: Fast algorithms for mining association rules. Proc. VLDB Conf. 1994, 478–499.
3. Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. J. of Computer and System Sciences 76(1)(2010), 3–20.
4. Blanchard J., Guillet F., Briand H., Gras R.: Assessing rule with a probabilistic measure of deviation from equilibrium. In Proc. of 11th International Symposium on Applied Stochastic Models and Data Analysis ASMDA 2005, Brest, France, 191–200.
5. Blanchard J., Guillet F., Briand H., Gras R.: IPEE: Indice Probabiliste d'Écart à l'Équilibre pour l'évaluation de la qualité des règles. Dans l'Atelier Qualité des Données et des Connaissances 2005, 26–34.
6. Brin S., Motwani R., Silverstein C.: Beyond Market Baskets: Generalizing Association Rules to Correlations. In Proc. of the ACM SIGMOD Conference, Tucson, Arizona, 1997, 265–276.
7. Carpineto C., Romano G.: Concept Data Analysis. Theory and Applications. J. Wiley, 2004.
8. Davey B. A., Priestley H.: Introduction to Lattices and Order. Cambridge University Press, Oxford, 1990.
9. Delgado M., Ruiz D.-L., Sanchez D.: Studying Interest measures for association rules through a logical model. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 18(1)(2010), World Scientific, 87–106.
10. Feno D.R.: Mesures de qualité des règles d'association: normalisation et caractérisation des bases. PhD thesis, Université de La Réunion, 2007.
11. Ganter B., Wille R.: Formal Concept Analysis. Mathematical Foundations. Springer, Berlin, 1999.
12. Geng L., Hamilton H. J.: Choosing the Right Lens: Finding What is Interesting in Data Mining. Quality Measures in Data Mining 2007, ISBN 978-3-540-44911-9, 3–24.
13. Geng L., Hamilton H. J.: Interestingness measures for data mining: A Survey. ACM Comput. Surveys 38(3)(2006), 1–31.
14. Guillaume S., Grissa D., Mephu Nguifo E.: Catégorisation des mesures d'intérêt pour l'extraction des connaissances. Revue des Nouvelles Technologies de l'Information, 2011, to appear (previously available as Technical Report RR-10-14, LIMOS, ISIMA, 2010).
15. Guillaume S.: Traitement des données volumineuses. Mesures et algorithmes d'extraction des règles d'association et règles ordinales. PhD thesis, Université de Nantes, France, 2000.
16. Guillaume S., Grissa D., Mephu Nguifo E.: Propriétés des mesures d'intérêt pour l'extraction des règles. Dans l'Atelier Qualité des Données et des Connaissances, EGC'2010, 2010, Hammamet-Tunisie, http://qdc2010.lri.fr/fr/actes.php, 15–28.
17. Grissa D., Guillaume S., Mephu Nguifo E.: Combining Clustering techniques and Formal Concept Analysis to characterize Interestingness Measures. CoRR abs/1008.3629, 2010.
18. Hájek P., Havránek T.: Mechanizing Hypothesis Formation. Springer, 1978.
19. Hájek P., Holeňa M., Rauch J.: The GUHA method and its meaning for data mining. J. Computer and System Sciences 76(2010), 34–48.
20. Hilderman R. J., Hamilton H. J.: Knowledge Discovery and Measures of Interest, Volume 638 of The International Series in Engineering and Computer Science 81(2)(2001), Kluwer.
21. Huynh X.-H., Guillet F., Briand H.: Clustering Interestingness Measures with Positive Correlation. ICEIS (2) (2005), 248–253.
22. Heravi M. J., Zaïane O. R.: A study on interestingness measures for associative classifiers. SAC (2010), 1039–1046.
23. Lallich S., Teytaud O.: Évaluation et validation de mesures d'intérêt des règles d'association. RNTI-E-1, numéro spécial 2004, 193–217.
24. Lenca P., Meyer P., Picouet P., Vaillant B., Lallich S.: Critères d'évaluation des mesures de qualité en ecd. Revue des Nouvelles Technologies de l'Information (Entreposage et Fouille de données) (1)(2003), 123–134.
25. Lenca P., Meyer P., Vaillant B., Lallich S.: A multicriteria decision aid for interestingness measure selection. Technical Report LUSSI-TR-2004-01-EN, Dpt. LUSSI, ENST Bretagne, 2004 (chapter 1).
26. Liu J., Mi J.-S.: A novel approach to attribute reduction in formal concept lattices. RSKT 2006, Lecture Notes in Artificial Intelligence 4062 (2006), 522–529.
27. Maddouri M., Gammoudi J.: On Semantic Properties of Interestingness Measures for Extracting Rules from Data. Lecture Notes in Computer Science 4431 (2007), 148–158.
28. Maier D.: The Theory of Relational Databases. Computer Science Press, Rockville, 1983.
29. Pawlak Z.: Rough sets. Int. J. Information and Computer Sciences 11(5)(1982), 341–356.
30. Pawlak Z.: Rough Sets: Theoretical Aspects of Reasoning About Data. Kluwer, Dordrecht, 1991.
31. Pearson K.: Mathematical contributions to the theory of evolution, regression, heredity and panmixia. Philosophical Trans. of the Royal Society A (1896).
32. Piatetsky-Shapiro G.: Discovery, Analysis and Presentation of Strong Rules. In G. Piatetsky-Shapiro & W.J. Frawley, editors: Knowledge Discovery in Databases. AAAI Press, 1991, 229–248.
33. Polkowski L.: Rough Sets: Mathematical Foundations. Springer, 2002.
34. Sese J., Morishita S.: Answering the most correlated n association rules efficiently. In Proceedings of the 6th European Conf. on Principles of Data Mining and Knowledge Discovery 2002, Springer-Verlag, 410–422.
35. Tan P.-N., Kumar V., Srivastava J.: Selecting the right objective measure for association analysis. Information Systems 29(4)(2004), 293–313.
36. Tan P.-N., Steinbach M., Kumar V.: Introduction to Data Mining. Addison-Wesley, 2005.
37. Vaillant B., Lenca P., Lallich S.: A Clustering of Interestingness Measures. DS'04, the 7th International Conference on Discovery Science, LNAI 3245 (2004), 290–297.
38. Vaillant B.: Mesurer la qualité des règles d'association: études formelles et expérimentales. PhD thesis, ENST Bretagne, 2006.
39. Wang X., Ma J.: A novel approach to attribute reduction in concept lattices. RSKT 2006, Lecture Notes in Artificial Intelligence 4062 (2006), 522–529.
40. Wille R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival I.: Ordered Sets. Reidel, Dordrecht, Boston, 1982, 445–470.
41. Zhang W.-X., Wei L., Qi J.-J.: Attribute reduction in concept lattices based on discernibility matrix. RSFDGrC 2005, Lecture Notes in Artificial Intelligence 3642 (2005), 157–165.

Combining Formal Concept Analysis and Translation to Assign Frames and Thematic Role Sets to French Verbs

Ingrid Falk¹, Claire Gardent²
¹ INRIA/Nancy Universités, Nancy (France)
² CNRS/LORIA, Nancy (France)

Abstract. We present an application of Formal Concept Analysis in the domain of Natural Language Processing: We give a general overview of the framework, describe its goals, the data it is based on, the way it works, and we illustrate the kind of data we expect as a result. More specifically, we examine the ability of the stability, separation and probability indices to select the most relevant concepts with respect to our FCA application. We show that the sum of stability and separation gives results close to those obtained when using the entire lattice.

1 Introduction

Ideally, natural language processing (NLP) applications need to analyse texts to answer the question of "Who did What to Whom".
For computers to effectively extract this information from texts, it is essential that they be able to detect the events that are being described and the event participants. Because events are mostly lexicalised using verbs, one ingredient that is essential for such systems is detailed knowledge about the syntactic and semantic behaviour of verbs. It has been shown (Briscoe and Carroll (1993), Carroll and Fang (2004)) that detailed subcategorisation information (that is, information about the number and the syntactic type of verb complements) is crucial in enhancing their linguistic coverage and theoretical accuracy. However, this syntactic information is not sufficient to specify "Who did What to Whom" because it does not make it possible to identify the thematic roles participating in the event described by the verb. For example, in John threw a ball to Mary, the syntactic analysis of the sentence would not allow us to identify John, the syntactic subject of the sentence, as the Agent or Causer of the throwing event, Mary, syntactically the prepositional object, as the Destination, and ball (the object) as the item being thrown.

To help computer systems in this task of understanding and representing the full meaning of a text, verb classifications have been proposed which group together verbs with similar syntactic and semantic behaviour, i.e. which associate groups of verbs with subcategorisation frames showing the syntactic constructions the verbs may appear in and with sets of thematic roles which represent the participants in an event described by the verbs in the group.

For English, there exist several large scale resources providing verb classes (e.g. FrameNet (Baker et al. (1998)) and VerbNet (Schuler (2006)), the classification we use in our framework) in a format that is amenable for use by natural language processing systems. For example, for the verb throw the corresponding VerbNet class shows that the participants in a throwing event are an Agent, a Theme (the thing being thrown), a Source and a Destination. In addition, the VerbNet class provides the syntactic constructions the verb can occur in (e.g. Subject(John) V(throws) Object(a ball) PrepObject(to Mary)) and shows how the participant roles can be realised as syntactic arguments: in the example above the Agent (John) is realised syntactically as Subject, the Theme (the ball) as Object and the Destination (to Mary) as prepositional object (PrepObject).

For French however, existing verb classes are either too restricted in scope (Volem, Saint-Dizier (1999)) or not sufficiently structured (the LADL tables, Gross (1975)) to be directly useful for NLP. Even though other large coverage syntactic-semantic resources for French have recently been made available (Tolone (2011), as well as further processed versions of Dubois and Dubois-Charlier (1997), Hadouche and Lapalme (2010)), the terminology and linguistic formalisms they are based on are often still hardly compatible with the methods and tools currently used in the NLP community.
In this paper we present a method for providing a VerbNet style classification of French verbs which associates verbs with syntactic constructions on the one hand and with semantic role sets (the set of semantic roles participating in the event described by the verb) on the other. To obtain this classification, we build and combine two independent classifications. The first is semantic and is obtained from the English VerbNet (VN) by translation; the second is syntactic and is obtained by building an FCA (Formal Concept Analysis) lattice from three manually validated syntactic lexicons for French. The first associates groups of French verbs with the semantic roles of the English VN class. The second associates groups of French verbs (the concept extent) with syntactic constructions (the concept intent). We then merge both classifications by associating with each translated VN class the FCA concept whose verb set yields the best F-measure with respect to the verb sets contained in each translated VN class. We thus effectively associate the set of semantic roles of the VN class with the group of French verbs and the syntactic information given by the FCA concept.

In the past, several linguistic FCA applications have been presented, as Priss (2005) shows in her overview. For example, Sporleder (2002) describes an FCA based approach to build structured class hierarchies starting from unstructured lexicon entries, while the features used for building classes in the approach presented in (Cimiano et al., 2003) are collected from a corpus. Our approach (based on earlier work presented in Falk et al. (2010), Falk and Gardent (2010)) is concerned with building a lexical resource based on lexicons and is therefore related to the FCA approach in (Sporleder, 2002). However, the features we use are different. In addition we explore the use of concept selection indices to filter the concept lattices and finally relate the formal concepts we obtain to other classes obtained by a clustering approach based on different numeric features extracted from lexicons and English-French dictionaries.

In the following we first introduce the terminology and data used in our application domain. Next we describe how we associate groups of French verbs with syntactic information using Formal Concept Analysis (Section 3). As the resulting concept lattice has a very large number of concepts which are mostly not useful verb classes, we explore methods to select the concepts most relevant to our application (Section 4). We show in particular that selecting only ∼ 10% of the concepts of the lattice using the indices proposed in Klimushkin et al. (2010) gives results close to those obtained when using the entire lattice. We then show how we build the translated VerbNet classes and how they are mapped to the previously pre-selected FCA concepts (Section 5). Finally, in Section 6 we present the kind of associations we obtain with our method.

2 Linguistic Concepts and Resources

Our aim is to build a lexicon associating groups of French verbs with: 1) the syntactic constructions the verbs of this group may appear in, 2) the semantic roles participating in an event described by a verb of this group. Syntactic constructions a verb may occur in are described using subcategorisation frames (SCF) and are usually part of a lexical entry describing the verb.
A subcategorisation frame (SCF) characterises the number and the type of the syntactic arguments expected by a verb. Each frame describes a set of syntactic arguments, and each argument is characterised by a grammatical function (e.g. SUJ - subject, OBJ - direct object, etc.) and a syntactic category (NP indicates a noun phrase, PP a prepositional phrase, etc.). For example, John throws a ball to Mary. is a possible realisation of the subcategorisation frame SUJ:NP V OBJ:NP POBJ:PP.

The semantic (thematic) roles are the participants in an event described by a particular verb. To date there is no consensus about a set of semantic roles or a set of tests determining them. There may be general agreement on a set of Semantic Roles (e.g. Agent, Patient, Theme, Instrument, Location, etc.) but there is substantial disagreement on when and where they can be assigned (Palmer et al., 2010). Thus each of the well known resources providing semantic role information (FrameNet (Baker et al., 1998), PropBank (Palmer et al., 2005), VerbNet (Schuler, 2006), LVF (Dubois and Dubois-Charlier, 1997)) has its own semantic role inventory. In our work we chose the VerbNet semantic role inventory for several reasons:
1. VN semantic roles provide a compromise between generalisation and specificity in that they are common across all verbs (in contrast to FrameNet (Baker et al., 1998) and PropBank (Palmer et al., 2005) roles) but are still able to capture specificities of particular classes.
2. VN roles are among those generally agreed upon in the community.
3. None of the other resources provide the link between syntactic arguments and semantic roles across different verbs.
4. Semantic roles are expected to be valid across languages, and by using the same role inventory as for English we hope to leverage some of the substantial research done for English and link syntactic information for French with semantic information provided by the English classes.
Our method allows us to detect groups of French verbs with the same role set as some English VerbNet class and gives information about how these semantic roles are realised syntactically in French. Figure 1 shows an excerpt of the throw-17.1 VerbNet class, with its verbs, thematic roles and subcategorisation frames.

verbs (32): kick, launch, throw, tip, toss, ...
sem. roles: Agent, Theme, Source, Destination
frames (8):
  SCF                            sem. roles
  Subject V Object               Agent V Theme                (John throws a ball)
  Subject V Object PrepObject    Agent V Theme Destination    (John throws a ball to Mary)
  Subject V Object Object        Agent V Destination Theme    (John throws Mary a ball)
  etc.
Fig. 1: Simplified VerbNet class throw-17.1.

Thus, from this data an English NLP system analysing the sentence John threw a ball to Mary could infer the semantic roles involved in the event, namely those given by the VerbNet class. It could also detect the possible semantic roles realised by the syntactic arguments: it would know that the subject is a realisation of the Agent semantic role, the object of the Theme or Destination semantic roles, etc.

3 Associating French Verbs with Subcategorisation Frames

To associate French verbs with syntactic frames, we use the FCA classification approach where the objects are verbs and the attributes are the subcategorisation frames associated with these verbs by the subcategorisation lexicon to be described below.
3.1 Subcategorisation Lexicons

Subcategorisation information is retrieved from three existing lexicons for French: Dicovalence (van den Eynde and Mertens (2003)), the LADL tables (Gross (1975), Guillet and Leclère (1992)) and finally TreeLex (Kupść and Abeillé (2008)). Each of these was constructed manually or with an important manual validation by linguists. The combined lexicon covers 5918 verbs, 345 SCFs and has a total of 20443 ⟨verb, frame⟩ pairs. Table 1 shows sample entries in this lexicon for the verb expédier (send).

Verb: expédier
SCF                              Source info
SUJ:NP,DUMMY:REFL                DV:41640,41650
SUJ:NP,OBJ:NP                    DV:41640,41650;TL
SUJ:NP,OBJ:NP,AOBJ:PP            TL
SUJ:NP,OBJ:NP,POBJ:PP,POBJ:PP    LA:38L
Table 1: Sample entries in the subcategorisation lexicon for the verb expédier (send).

Using the Galicia Lattice Builder software (http://www.iro.umontreal.ca/~galicia/), we first build a concept lattice based on the formal context ⟨V, F, R⟩ such that:
– V is the set of verbs in our subcategorisation lexicon. We ignore verbs with only one SCF as they would result in classes associating verbs with a unique frame.
– F is the set of subcategorisation frames (SCFs) present in the subcategorisation lexicon,
– R is the mapping such that (v, f) ∈ R iff the subcategorisation lexicon associates the verb v with the SCF f.
The resulting formal context is made of 2091 objects (verbs) and 238 attributes (frames), giving rise to a lattice of 12802 concepts. Clearly, however, not all these concepts are interesting verb classes. Classes aim to factorise information and express generalisations about verbs. Hence, concepts with few (1 or 2) verbs can hardly be viewed as classes and, similarly, concepts with few frames are less interesting.

To select from this lattice those concepts which are most likely to provide the most relevant verb-frame associations, we explore the use of three indices for concept selection: concept stability, separation and probability, which have been proposed and analysed in (Klimushkin et al., 2010). In Section 4.2 we investigate which of these indices performs best in the context of our application. We then use the best performing concept filtering method to select the most relevant concepts with respect to our data. For each translated VN class we then identify among the selected FCA concepts the one(s) with the best f-measure between precision and recall. For a translated VN class C_VN (consisting of French verbs) and the extent (verb set) C_FCA of an FCA concept, precision, recall and f-measure are computed as follows:

    R = |C_VN ∩ C_FCA| / |C_VN|,   P = |C_VN ∩ C_FCA| / |C_FCA|,   F = 2RP / (R + P)

The translated VN class is then associated with the FCA concept(s) with the best F-measure. Thus the verbs in the FCA concept are effectively associated with the thematic roles of the translated class and at the same time with the syntactic subcategorisation frames in the intent (attribute set) of the FCA concept.
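To make the alignment step concrete, the following is a minimal sketch (not the authors' implementation), assuming each concept is given as a pair of Python sets (extent of verbs, intent of frames) and each translated VN class as a set of French verbs; all names are illustrative.

```python
def f_measure(class_verbs, concept_extent):
    """Precision/recall/F between a translated VN class and a concept extent."""
    inter = len(class_verbs & concept_extent)
    if inter == 0:
        return 0.0
    recall = inter / len(class_verbs)
    precision = inter / len(concept_extent)
    return 2 * recall * precision / (recall + precision)

def align(translated_classes, concepts):
    """For each translated class, keep the concept(s) with the best F-measure.

    translated_classes: dict mapping a class name to a set of French verbs.
    concepts: list of (extent, intent) pairs; extents are sets of verbs,
              intents are sets of subcategorisation frames.
    """
    alignment = {}
    for name, verbs in translated_classes.items():
        scored = [(f_measure(verbs, extent), extent, intent)
                  for extent, intent in concepts]
        best = max(score for score, _, _ in scored)
        alignment[name] = [(extent, intent)
                           for score, extent, intent in scored if score == best]
    return alignment
```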
4 Filtering Concept Lattices

The lattices we have to deal with are very large and many of the concepts do not represent valid verb classes. To select those concepts which are most relevant in the context of our application, the concept lattice needs to be filtered. Klimushkin et al. (2010) propose three indices for selecting relevant concepts in concept lattices built from noisy data: concept stability, separation and probability. In this section, we investigate which of these indices works best for our data.

Concept stability is a measure which helps discriminate potentially interesting patterns from irrelevant information in a concept lattice built from possibly noisy data. The stability of a concept C = (V, F) is the proportion of subsets of the extent V which have the same attribute set F as V:

    σ((V, F)) = |{A ⊆ V | A′ = F}| / 2^|V|    (1)

(Here and in the following, ′ denotes the derivation operator on the power set of objects, ′ : 2^O → 2^A with X′ = {a ∈ A | ∀o ∈ X. (o, a) ∈ R}, and dually on the power set of attributes.) Intuitively, a more stable concept is less dependent on any individual object in its extent and is therefore more resistant to outliers or other noisy data items.

Concept separation indicates how significantly the objects covered by a given concept differ from the other objects and, simultaneously, how its attributes differ from the other attributes:

    s((V, F)) = |V| · |F| / ( Σ_{v∈V} |{v}′| + Σ_{f∈F} |{f}′| − |V| · |F| )    (2)

Intuitively, we expect a concept with a high separation index to better sort out the verbs it covers from the other verbs and, simultaneously, the frames it covers from the other frames. Whereas concept stability is concerned with either objects or attributes, separation gives information about objects and attributes at the same time.

Concept probability. For an attribute a ∈ A (the attribute set), we denote by p_a the probability of an object having the attribute a. In practice it is the proportion of objects having a: p_a = |{a}′| / |O|, where O denotes the set of objects. For B ⊆ A, we define p_B as the probability of an arbitrary object having all attributes from B: p_B = Π_{a∈B} p_a. This formulation assumes the mutual independence of the attributes. Based on this, and denoting n = |O|, we obtain the following formula for the probability of B being closed:

    p(B = B″) = Σ_{k=0}^{n} p(|B′| = k, B = B″)    (3)
              = Σ_{k=0}^{n} [ C(n, k) · p_B^k · (1 − p_B)^{n−k} · Π_{a∉B} (1 − p_a^k) ]    (4)

(C(n, k) denotes the binomial coefficient.) A small p(B = B″) suggests a small probability that the attribute combination B is a concept intent by chance only (and p(B = B″) ≈ 1 that there is a high probability that the combination is a concept intent by chance). However, this reasoning is based on the independence of the attributes, which in our particular case cannot be warranted.

4.1 Computing the Stability, Separation and Probability Indices

Stability. Calculating stability is known to be NP-complete (Kuznetsov, 2007); however, Jay et al. (2008) show that when the concept lattice is known it can be computed efficiently by a bottom-up traversal algorithm introduced in (Roth et al., 2006). This is the algorithm we used to compute concept stability.

Separation can be computed in O(|O| + |A|) time, where O and A are the object and attribute sets respectively. Computing separation is the least prohibitive of the three indices.
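For illustration only (this is not the authors' implementation; the stability part enumerates all subsets of the extent and is therefore only usable for small concepts, whereas the paper relies on the bottom-up algorithm of Roth et al. (2006)), the two definitions above translate directly into code. The context is assumed to be a dict mapping each object to its set of attributes.

```python
from itertools import combinations

def attrs_of(objs, context):
    """A': the attributes shared by all objects in objs.
    For the empty set we return all attributes occurring in the context."""
    objs = list(objs)
    if not objs:
        return set().union(*context.values()) if context else set()
    common = set(context[objs[0]])
    for o in objs[1:]:
        common &= context[o]
    return common

def stability(extent, intent, context):
    """Equation (1): proportion of subsets A of the extent with A' = intent."""
    extent = list(extent)
    hits = sum(1 for r in range(len(extent) + 1)
               for subset in combinations(extent, r)
               if attrs_of(subset, context) == set(intent))
    return hits / 2 ** len(extent)

def separation(extent, intent, context):
    """Equation (2): |V||F| divided by the area covered by the concept's rows and columns."""
    area = len(extent) * len(intent)
    row_sizes = sum(len(context[o]) for o in extent)
    col_sizes = sum(sum(1 for o in context if a in context[o]) for a in intent)
    return area / (row_sizes + col_sizes - area)
```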
Probability. Klimushkin et al. (2010) show that computing the probability of even a single concept involves O(|O|² · |A|) multiplication operations, which is computationally very costly. With the computational means at our disposal it was not possible for us to compute the concept probabilities. We therefore computed approximations derived as follows. First, we consider Π_{a∉B} (1 − p_a^k) ≈ 1 for k > 40. In view of this, Equation (4) becomes:

    p(B = B″) = Σ_{k=0}^{40} [ C(n, k) · p_B^k · (1 − p_B)^{n−k} · Π_{a∉B} (1 − p_a^k) ]    (5)
              + Σ_{k=41}^{n} C(n, k) · p_B^k · (1 − p_B)^{n−k}    (6)

As Σ_{k=0}^{n} C(n, k) · p^k · (1 − p)^{n−k} = 1, Term (6) can be rewritten as:

    1 − Σ_{k=0}^{40} C(n, k) · p_B^k · (1 − p_B)^{n−k}    (7)
    = 1 − F(40; n, p_B)    (8)

where F(k; n, p) = Σ_{i=0}^{k} C(n, i) · p^i · (1 − p)^{n−i} is the cumulative distribution function of the binomial distribution (source: Wikipedia, http://en.wikipedia.org/wiki/Binomial_distribution) and can be computed using various statistical software packages. Term (5) can also be computed more easily considering that the C(n, k) · p_B^k · (1 − p_B)^{n−k} are binomial densities, the computation of which is also provided by statistics software (we used the R software environment for statistical computing, http://www.r-project.org/).
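A possible rendering of this approximation in Python (a sketch under the stated independence assumption, standing in for the R computation used by the authors; p_attr, a dict of the per-attribute probabilities p_a, and the attribute set B are assumed to be given):

```python
from math import prod
from scipy.stats import binom

def prob_closed_approx(B, p_attr, n, cutoff=40):
    """Approximate p(B = B'') following Equations (5)-(8).

    B: set of attributes, p_attr: dict attribute -> p_a, n: number of objects.
    """
    p_B = prod(p_attr[a] for a in B)
    outside = [p for a, p in p_attr.items() if a not in B]
    # Term (5): truncated sum, keeping the product over attributes outside B.
    head = sum(binom.pmf(k, n, p_B) * prod(1 - p ** k for p in outside)
               for k in range(cutoff + 1))
    # Terms (7)-(8): the remaining tail collapses to 1 - F(cutoff; n, p_B).
    tail = 1 - binom.cdf(cutoff, n, p_B)
    return head + tail
```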
4.2 Evaluating the Concept Selection Indices

In the following we measure the performance of the three concept selection indices with respect to our data. The experimental setting is as follows: We first select a number N (1500) of concepts with the best selection index. The selected concepts are aligned with the classes translated from VerbNet (see Section 5): For each translated class, we select the concept with the best precision/recall f-measure. Then we associate to the concept with the best f-measure the thematic roles of the translated VN class. Next we compare the obtained ⟨verb, thematic role set⟩ associations with those given by a reference. As for our task recall is more important than precision, we use the F2 measure, which gives more weight to recall, for the comparison. As reference we use the data used for training the classifier for learning the translated VN classes (see Section 5): we are checking which index selects the most relevant concepts, that is, those best matching the translated classes. The reference consists of the ⟨verb, semantic role set⟩ pairs marked as positive examples in the training set, i.e. those for which we considered that the French verbs could have the semantic roles given by the English VN class. Table 2 shows the F2 scores and coverage when using only one index at a time.

                cov.    prec.   rec.    F2
stab only       39.88   18.96   32.55   26.27
sep only        34.25   28.37   21.52   23.41
prob only       35.53   26.60   20.73   22.38
w/o filtering   100     12.30   60.96   26.30
Table 2: F2 scores and coverage for stability, separation and the 6th probability 10-quantile.

For stability and separation we applied the method above to the top ranking 1500 concepts. Regarding probability, at first sight we should consider best the concepts with the lowest probability, because the probability of their intents being closed by chance only is accordingly low. However, looking at the data we found that these concepts have very few verbs and large intent (frame) sets, which rather suggests improbable or rare verb groups. On the other hand, the interpretation of concept probability suggests that a concept with a probability close to 1 could occur by chance only. For these reasons, to assess probability separately we settled on the 6th 10-quantile.

The results confirm the observations of Klimushkin et al. (2010): stability alone gives F2 scores close to an upper bound, namely the results obtained without filtering, i.e. aligning the translated classes with all the concepts of the lattice. The results for separation and probability are several points lower. As we only select ∼ 10% of the total number of concepts, we also have to make sure that the selected concepts cover at least a reasonable amount of verbs. The cov column gives the percentage of verbs in the lattice covered by the selected concepts. It shows that using only one index at a time the pre-selected concepts would contain only 35%−40% of the verbs in the entire lattice, which is unsatisfactory.

Klimushkin et al. (2010) investigate the performance of the stability, separation and probability indices at finding the original concepts in lattices produced from contexts which were previously altered by introducing two types of noise: Type I noise is obtained by altering every cell in the context with some probability, Type II noise is obtained by adding a given number or proportion of random objects or attributes. According to this, our contexts are affected by Type I noise rather than Type II. Klimushkin et al. (2010) found that stability was most effective at sorting out Type II noise, but it also proved helpful in the case of Type I noise. In contrast, they suggest that separation and probability cannot be used on their own but should rather serve as a normalising measure for stability. The most promising combination seemed to be: stability + k_sep · separation − k_prob · probability.

In the following we start from the assumption that the most effective index for selecting relevant concepts is given by a linear combination of stability, separation and probability, k_stab · stability + k_sep · separation − k_prob · probability, and empirically determine the coefficients k_stab, k_sep and k_prob such that the selected concepts perform best with respect to our task. We proceed as follows: We choose k_stab, k_sep and k_prob. We then compute the corresponding linear combination for all concepts and select the 1500 concepts ranking highest. As in the previous experiments, we measure the relevance of the selected concepts by aligning them with the translated VN classes and by comparing the alignments with the same reference as before. We consider the "best" k_stab, k_sep, k_prob combination to be the one giving the highest F2 scores and good coverage.
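A compact sketch of the selection step just described (illustrative only; the index values are assumed to be precomputed and stored per concept):

```python
def select_top_concepts(concepts, k_stab, k_sep, k_prob, n=1500):
    """Rank concepts by k_stab*stability + k_sep*separation - k_prob*probability
    and keep the n highest-ranked ones.

    concepts: list of dicts with precomputed 'stability', 'separation' and
    'probability' values (illustrative representation).
    """
    def score(c):
        return (k_stab * c["stability"]
                + k_sep * c["separation"]
                - k_prob * c["probability"])
    return sorted(concepts, key=score, reverse=True)[:n]
```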
Table 3a shows the results for a first series of experiments where k_stab and k_sep were assigned the values 0.5 and 1 and k_prob the values 0.25 and 0.5 (the lines are sorted by decreasing F2 score). They suggest that the stability and separation coefficients had less impact on coverage and F2 score than the probability coefficient. Interestingly, the coverage is correlated with the F2 score. In the second series of experiments, shown in Table 3b, we kept the stability and separation coefficients fixed and varied only the probability coefficient. These results suggest that the probability coefficient may not help at selecting the most relevant concepts in our setting. This may be due first to the fact that our attributes are not independent (we assumed independence of attributes when setting up the formula for computing the probability index) and second to the fact that we had to approximate the probability index and this approximation may not be accurate enough.

In the next series of experiments we investigated the impact of the number of preselected concepts (500). The results showed that with this smaller number of concepts the selected concepts reached a slightly smaller F2 score but a substantially lower coverage. Also, in this configuration the probability index did seem to be helpful. Preselecting 1000 concepts confirmed the previously observed tendencies: the F2 score and coverage were only slightly lower than when preselecting 1500 concepts, and again the probability index seemed to have only a small impact on the overall results.

(a) F2 and coverage when k_stab, k_sep ∈ {0.5, 1} and k_prob ∈ {0.25, 0.5}.
k_stab  k_sep  k_prob  cov.   prec.  rec.   F2
1       1      0.25    98.04  11.87  55.19  24.89
1       0.5    0.25    98.04  11.87  55.19  24.89
1       0.5    0.5     57.69  17.08  30.18  24.04
1       1      0.5     56.15  17.45  29.13  23.82
0.5     0.5    0.25    56.15  17.45  29.13  23.82
0.5     1      0.25    53.81  18.03  27.82  23.36
0.5     0.5    0.5     49.72  18.55  26.25  23.06
0.5     1      0.5     49.90  18.61  25.98  22.95

(b) F2 and coverage when k_stab and k_sep are kept fixed and k_prob varies.
k_stab  k_sep  k_prob  cov.   prec.  rec.   F2
1       1      0       98.04  12.05  55.12  25.16
1       1      0.05    98.04  12.05  55.12  25.16
1       1      0.005   98.04  12.05  55.12  25.16
1       1      0.0005  98.04  12.05  55.12  25.16
1       1      0.1     98.00  11.91  55.38  25.00
1       1      0.2     98.08  11.88  55.12  24.91
1       1      0.25    98.04  11.87  55.12  24.89
1       1      0.3     98.00  11.79  55.38  24.80
1       1      0.4     59.95  16.27  31.23  23.91
1       1      0.5     56.16  17.45  29.13  23.82
w/o filtering          100    12.30  60.96  26.30

Table 3: F2 scores and coverage for various k_stab, k_sep, k_prob combinations.

From these experiments we conclude the following. First, they suggest that the best linear combination is the sum of the stability and separation indices, as the F2 measure and the coverage for this combination are similar to those of an upper bound, i.e. the alignment obtained without filtering. They show that selecting only ∼ 10% of the original lattice gives a verb, frame, semantic role set alignment which is close to the alignment obtained when using the entire lattice, and that the pre-selected concepts also have a similar coverage. Second, it does not seem evident that probability has a positive effect on the selected concepts. However, it does improve the f-measure when the number of selected concepts is lower (500 or 1000 vs. 1500 in our experiments). Hence, for our application we concluded that it is a better strategy to select a larger number of concepts (1500) and not take probability into account. This is even more so as the probability index in our case should be taken with caution, because first we had to use an approximation to compute it which may be too rough, and second the computation of probability is based on the independence of attributes, which is not warranted in our case.

5 Associating French Verbs with Thematic Role Sets

We associate French verbs with thematic role sets by translating the English VerbNet classes to French using 3 English-French dictionaries. In the following we first briefly describe the relevant resources, i.e. VerbNet and the dictionaries, before giving the translation methodology. As for this paper only the translated classes, and not the method used to produce them, are relevant, we only very briefly sketch the methodology. (Of course better translated classes will result in a better performance of our method, but it is not straightforward to evaluate the quality of the translated classes.)

VerbNet (Schuler (2006)) is the largest electronic verb classification for English. It was created manually and classifies 3626 verbs using 411 classes. Each VN class includes among other things a set of verbs, a set of subcategorisation frames and a set of thematic roles. Figure 2 shows an excerpt of the amuse-31.1 class, with its verbs, thematic roles and subcategorisation frames.
verbs (242): abash, affect, afflict, amuse, annoy, ...
thematic roles: Experiencer, Cause
frames (6):
  NP V NP            Experiencer V Cause
  NP V ADV-Middle    Experiencer V Adv
  NP V NP-PRO-ARB    Cause V
  ...
Fig. 2: Simplified VerbNet class amuse-31.1.

English-French dictionaries. We use the following resources to translate the verbs in the English VN classes to French: Sci-Fran-Euradic, a French-English bilingual dictionary built and improved by linguists, Google dictionary (http://www.google.com/dictionary, from which we obtained 13824 French-English verb pairs) and Dicovalence (van den Eynde and Mertens (2003), from which we obtained 11351 French-English verb pairs). The merged dictionary contains 51242 French-English verb pairs.

In the following we describe our method for translating the English VerbNet classes to French. The translation of VerbNet classes is bound to be very noisy because verbs are polysemous and the dictionaries typically give translations for several readings of a verb: thus the dictionary may give several translations v_fr which do not correspond to the meaning given by the ⟨v_en, class⟩ pair, or this meaning may even not be covered at all by the dictionary. To get more accurate translated VN classes we use a machine learning method, namely Support Vector Machines (SVM); we used libsvm, the software package and methodology presented at http://www.csie.ntu.edu.tw/~cjlin/libsvm/ (Chang and Lin (2011)). We follow a straightforward SVM application scenario: we build all the French verb, VN class pairs ⟨v_fr, C_VN⟩ where v_fr is a translation of an English verb in C_VN. The classifier has to give a probability estimate about whether this association is correct or not. For training the classifier we use the 160 verbs appearing in the gold standard proposed by Sun et al. (2010); in fact this is the only existing gold standard for French VerbNet-style classes, and we also use it for the overall evaluation of our system (not presented in this paper). We build the pairs ⟨v_fr, C_VN⟩ where v_fr is a verb in the gold standard which is a translation of a verb in C_VN. For each of these pairs we assessed whether or not there was a meaning of v_fr for which the semantic roles involved in the event described by the verb were those given by C_VN. The features associated to the ⟨verb, class⟩ pairs are numeric and are extracted from the dictionaries and VerbNet.

The trained classifier is then used to produce probability estimates for all verb, class instances. We select the 6000 pairs with the highest probability estimates (in VerbNet there are 5726 verb, class pairs) and finally obtain the translated classes by assigning each verb in a selected pair to the corresponding class. To give an idea of the quality of the obtained classes: the accuracy of the classifier on the held-out test set was 90%, compared to a maximum accuracy of 93.84% for five-fold cross-validation on the development set. The frequency distribution of the translated classes obtained this way is much closer to the distribution of verbs in VerbNet classes than when using an approach based only on translation frequencies, thus providing more accurate verb groups to guide the FCA concept - thematic roles associations.
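A rough outline of this filtering step (an illustrative scikit-learn sketch, not the authors' libsvm setup; the feature extraction from the dictionaries and VerbNet is left abstract as a callable, and the training labels are assumed to be 0/1 judgements):

```python
from sklearn.svm import SVC

def filter_translations(train_pairs, train_labels, candidate_pairs, features, keep=6000):
    """Score candidate <French verb, VN class> pairs and keep the most probable ones.

    train_pairs / candidate_pairs: lists of (french_verb, vn_class) tuples.
    train_labels: 0/1 judgements for the training pairs.
    features: callable mapping a pair to a numeric feature vector (in the paper
    these come from the dictionaries and VerbNet; left abstract here).
    """
    clf = SVC(probability=True)
    clf.fit([features(p) for p in train_pairs], train_labels)
    pos = list(clf.classes_).index(1)
    scores = clf.predict_proba([features(p) for p in candidate_pairs])[:, pos]
    ranked = sorted(zip(scores, candidate_pairs), key=lambda t: t[0], reverse=True)
    return [pair for _, pair in ranked[:keep]]
```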
6 The French Verb ↔ Thematic Role Sets ↔ Syntactic Frame Associations

As a detailed and thorough evaluation of the verb, thematic role set and syntactic frame associations would be out of the scope of this paper, we only give here an intuition of the type of information provided by our method. Following the preliminary investigations in the previous sections, we associated French verbs with subcategorisation frames and thematic role sets according to the scheme listed below:
– We group the VerbNet thematic roles and assign to one class all the VN verbs whose class has the same role set. We then translate the obtained classes using the methods described in Section 5.
– We use FCA to group French verbs and the syntactic frames associated to these verbs by the lexicons described in Section 3. The concept lattices we create are based on the formal contexts consisting of French verbs as objects and SCFs as attributes.
– We then select the 1500 concepts where the sum of the stability and separation indices is highest, because in Section 4 we found this combination of concept selection indices to work best for our application.
– For each translated VN class we identify among the 1500 filtered FCA concepts the one(s) with the best f-measure between precision and recall. The translated VerbNet class is then associated with this (these) FCA concept(s).
Thus the verbs in the FCA concept are effectively associated with the thematic role set of the translated class and at the same time with the syntactic frames in the intent (attribute set) of the FCA concept.

Figure 3 shows the associations between concepts, thematic role sets and frames generated by our method for some VN classes (these are the classes occurring in the gold standard proposed by Sun et al. (2010), mentioned in Section 5). The figure shows the concepts associated to these thematic role sets and, for each of these concepts, its attribute set (syntactic frames), the associated thematic role set(s), the number of verbs in the concept and the hierarchical relations between the concepts as given by the concept lattice.

Fig. 3: French verb ↔ synt. frames ↔ thematic role set associations. (Figure body: a diagram of concepts 1248, 5022, 32, 7191, 5312, 617, 4584, 18868, 7190 and 1227, each labelled with its syntactic frames, its thematic role set(s), e.g. AgExp-End-Theme, AgExp-Location-Theme, AgentSym-Theme, AgExp-Instrument-Patient, and the size of its verb set.)

Thus, for example, the following 11 verbs (occurring in the gold standard): bouger, déplacer, emporter, passer, promener, envoyer, expédier, jeter, porter, transmettre, transporter are in concept 5312 and thereby may be used in the construction SUJ:NP,OBJ:NP,POBJ:PP,POBJ:PP (a transitive construction with two additional prepositional objects), according to our lexical resources. When they occur in this construction they are associated with the thematic role set AgExp, End, Start, Theme, i.e. the semantic roles involved are an Agent or Experiencer, a Start point, an End point and a Theme. The listed verbs are all verbs of movement where an agent may move a theme from a start point to an end point, therefore in this case the associations with the syntactic frame and thematic role set seem to be correct.
An NLP system which encounters the verb déplacer, for example, used in the construction SUJ:NP,OBJ:NP,POBJ:PP,POBJ:PP could infer that the possible thematic roles involved in the described event are an Agent (or Experiencer), a Theme, an End point and a Start point. However, it still would not know which thematic role is realised by which syntactic argument. There are also some problems with these associations. As can be seen in Figure 3, there is one case where the classification maps the same concept to two distinct VerbNet classes (AgExp-End-Theme and AgExp-Instrument-Patient). In addition, verbs in sub-concepts inherit the class label of the super-concept. Although there are verbs which belong to several VN classes, in many cases this multiple mapping was not warranted. Improving the precision of these mappings requires further investigation.

7 Conclusion

We introduced a new approach to verb clustering which involves the combined use of the English VerbNet, a bilingual English-French lexicon and a merged subcategorisation lexicon for French. Using these resources, we built two classifications, one derived from the English VN by translation and the other from the subcategorisation lexicons via the construction of a formal concept lattice. We then use the translated VN to associate FCA concepts with VN classes and thereby associate verbs with both syntactic frames and a thematic role set. We explored the performance of the concept selection indices introduced by Klimushkin et al. (2010), namely stability, separation and probability, at selecting the most relevant concepts with respect to our data and found that the sum of stability and separation gave the best results in the setting of our application. These results were similar to those obtained without filtering, showing that this combination of the indices did indeed allow us to select the most relevant concepts with respect to our data. Finally we showed the French verb, syntactic construction and semantic role set associations we obtained and briefly illustrated their potential use. Thus Formal Concept Analysis, in combination with the concept selection indices, translation and set mapping methods, proved an adequate method in this knowledge acquisition process.

Bibliography

Baker, C. F., Fillmore, C. J., and Lowe, J. B. (1998). The Berkeley FrameNet project. In Proceedings of the 17th International Conference on Computational Linguistics, volume 1, pages 86–90, Montreal, Quebec, Canada. Association for Computational Linguistics.
Briscoe, T. and Carroll, J. (1993). Generalized probabilistic LR parsing of natural language (corpora) with unification-based grammars. Comput. Linguist., 19(1):25–59.
Carroll, J. and Fang, A. C. (2004). The automatic acquisition of verb subcategorisations and their impact on the performance of an HPSG parser. In IJCNLP, pages 646–654.
Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Cimiano, P., Staab, S., and Tane, J. (2003). Automatic Acquisition of Taxonomies from Text: FCA meets NLP. In Proceedings of the PKDD/ECML'03 International Workshop on Adaptive Text Extraction and Mining (ATEM), pages 10–17.
Dubois, J. and Dubois-Charlier, F. (1997).
Les verbes français. Larousse.
Falk, I. and Gardent, C. (2010). Bootstrapping a Classification of French Verbs Using Formal Concept Analysis. In Interdisciplinary Workshop on Verbs, page 6, Pisa, Italy.
Falk, I., Gardent, C., and Lorenzo, A. (2010). Using Formal Concept Analysis to Acquire Knowledge about Verbs. In Concept Lattices and their Applications, page 12, Sevilla, Spain.
Gross, M. (1975). Méthodes en syntaxe. Hermann, Paris.
Guillet, A. and Leclère, C. (1992). La structure des phrases simples en français. 2 : Constructions transitives locatives. Droz, Geneva.
Hadouche, F. and Lapalme, G. (2010). Une version électronique du LVF comparée avec d'autres ressources lexicales. Langages, pages 193–220. (Page layout differs from the version published in the journal.)
Jay, N., Kohler, F., and Napoli, A. (2008). Analysis of social communities with iceberg and stability-based concept lattices. In ICFCA'08: Proceedings of the 6th International Conference on Formal Concept Analysis, pages 258–272, Berlin, Heidelberg. Springer-Verlag.
Klimushkin, M., Obiedkov, S., and Roth, C. (2010). Approaches to the selection of relevant concepts in the case of noisy data. In Kwuida, L. and Sertkaya, B., editors, Formal Concept Analysis, volume 5986 of Lecture Notes in Computer Science, chapter 18, pages 255–266. Springer Berlin / Heidelberg, Berlin, Heidelberg.
Kupść, A. and Abeillé, A. (2008). Growing TreeLex. In Gelbukh, A., editor, Computational Linguistics and Intelligent Text Processing, volume 4919 of Lecture Notes in Computer Science, pages 28–39. Springer Berlin / Heidelberg.
Kuznetsov, S. O. (2007). On stability of a formal concept. Annals of Mathematics and Artificial Intelligence, 49(1-4):101–115.
Palmer, M., Gildea, D., and Xue, N. (2010). Semantic Role Labeling. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.
Palmer, M., Kingsbury, P., and Gildea, D. (2005). The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.
Priss, U. (2005). Linguistic Applications of Formal Concept Analysis. In Ganter, B., Stumme, G., and Wille, R., editors, Formal Concept Analysis, volume 3626 of Lecture Notes in Computer Science, pages 149–160. Springer Berlin / Heidelberg.
Roth, C., Obiedkov, S. A., and Kourie, D. G. (2006). Towards concise representation for taxonomies of epistemic communities. In CLA, pages 240–255.
Saint-Dizier, P. (1999). Alternation and verb semantic classes for French: Analysis and class formation. In Predicative Forms in Natural Language and in Lexical Knowledge Bases. Kluwer Academic Publishers.
Schuler, K. K. (2006). VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. PhD thesis, University of Pennsylvania.
Sporleder, C. (2002). A Galois Lattice based Approach to Lexical Inheritance Hierarchy Learning. In 15th European Conference on Artificial Intelligence (ECAI'02): Workshop on Machine Learning and Natural Language Processing for Ontology Engineering, Lyon, France.
Sun, L., Korhonen, A., Poibeau, T., and Messiant, C. (2010). Investigating the cross-linguistic potential of VerbNet-style classification. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING '10, pages 1056–1064, Stroudsburg, PA, USA. Association for Computational Linguistics.
Tolone, E. (2011). Analyse syntaxique à l'aide des tables du Lexique-Grammaire du français.
PhD thesis, LIGM, Université Paris-Est, France; Laboratoire d'Informatique Gaspard-Monge, Université Paris-Est Marne-la-Vallée, France. (326 pp.).
van den Eynde, K. and Mertens, P. (2003). La valence : l'approche pronominale et son application au lexique verbal. Journal of French Language Studies, 13:63–104.

Generation algorithm of a concept lattice with limited object access

Ch. Demko*, K. Bertet
L3I - Université de La Rochelle - av Michel Crépeau - 17042 La Rochelle
cdemko,kbertet@univ-lr.fr
* Joomla! Production Leadership Team, christophe.demko@joomla.org

Abstract. Classical algorithms for generating the concept lattice (C, ≤) of a binary table (O, I, R) have a complexity in O(|C| · |I|² · |O|). Although the number of concepts is exponential in the size of the table in the worst case, the generation of a concept is output polynomial. In practice, the number of concepts is often polynomial in the size of the table. However, the cost of generating a concept remains high when the table is composed of a large number of objects. We propose in this paper an algorithm for generating the lattice with limited object access, which can improve the computation time. Experiments were conducted with Joomla!, a content management system based on relational algebra and located on a MySQL database.

keywords: concept lattice; databases; algorithm

1 Introduction

Galois lattices (or concept lattices) were first introduced in a formal way in the theory of graphs and ordered structures [1–3]. Later, they were developed in the field of Formal Concept Analysis (FCA) [4] for data analysis and classification. The concept lattice structure, based on the notion of concept, enables data description while preserving its diversity. It is used to analyse data organised by a binary relation between objects and attributes.

A Galois lattice is a graph providing a representation of all the possible correspondences between a set of objects (or examples) O and a set of attributes (or features) I. The technological improvements of the last decades enable the use of these structures for data mining problems even though they are exponential in space/time in the worst case. It has to be noted that in practice, in most cases, the size of the lattice remains reasonable. In addition, some applications offer to generate only some concepts from the huge amount of available data. Bordat's algorithm [5] is the most appropriate for this since it generates the cover relation between concepts, and thus allows an on-demand generation of concepts. Moreover, huge amounts of data are often described by a huge number of objects. This is the case in databases, where sophisticated key-indexation techniques are used to improve object access.

In this paper, we propose the Limited Object Access algorithm (LOA algorithm), an extension of Bordat's immediate successors generation with a limited access to objects. This algorithm, compounded with an on-demand strategy and with sophisticated key-indexation techniques to improve object access, aims to improve the computation time for a large number of objects. However, the worst case theoretical complexity remains the same as for Bordat's algorithm.
Experiments were conducted with Joomla!, a content management system based on relational algebra, and located on a MySQL database. This paper is organized as follows. In section 2, we describe the Galois lattice structure and the Bordat’s generation algorithm. In section 3, we describe our limited object access algorithm, illustrated by an example and some experiments. 2 Description and generation of a concept lattice 2.1 Description of a concept lattice The concept lattice is a particular graph defined and generated from a relation R between objects O and attributes I. This graph is composed of a set of concepts ordered by a relation verifying the properties of a lattice, i.e. an order relation ≤ (transitive, reflexive and antisymmetric relation) such that, for each pair of concepts in the graph, there exists both a lower bound and an upper bound. Therefore, a lattice contains a minimum (resp. maximum) element according to the relation ≤ called the bottom (resp. top) of the lattice. The Hasse diagram of a graph [1] is the cover relation of ≤ denoted as ≺, i.e. the suppression on the graph of both transitivity and reflexivity edges. We associate to a set of objects A ⊆ O the set f (A) of attributes in relation R with the objects of A: f (A) = {y ∈ I | xRy ∀ x ∈ A} Dually, to a set of attributes B ⊆ I, we define the set g(B) of objects in relation with the attributes of B: g(B) = {x ∈ O | xRy ∀ y ∈ B} These two functions f and g defined between objects and attributes form a Galois correspondence. The relation between the set of objects and the set of attributes is described by a formal context. A formal context C is a triplet C = (O, I, R) (or C = (O, I, (f, g))) represented by a table. A formal concept represents maximal objects-attributes correspondences (fol- lowing relation R) by a pair (A, B) with A ⊆ O and B ⊆ I, which verifies f (A) = B and g(B) = A. The whole set of formal concepts thus corresponds to all the possible maximal correspondences between a set of objects O and a set of attributes I. Two formal concepts (A1 , B1 ) and (A2 , B2 ) are in relation in the lattice when they verify the following inclusion property: A2 ⊆ A1 (A1 , B1 ) ≤ (A2 , B2 ) ⇔ (equivalent to B1 ⊆ B2 ) Generation algorithm of a concept lattice with limited access to objects 241 The whole set of formal concepts fitted out by the order relation ≤ is called concept lattice or Galois lattice because it verifies the lattice properties: the relation ≤ is clearly an order relation, and for each pair of concepts (A1 , B1 ) and (A2 , B2 ), there exists the greatest lower bound (resp. the least upper bound) called meet (resp. join) denoted (A1 , B1 ) ∧ (A2 , B2 ) (resp. (A1 , B1 ) ∨ (A2 , B2 )) defined by: (A1 , B1 ) ∧ (A2 , B2 ) = (g(B1 ∩ B2 ), (B1 ∩ B2 )) (1) (A1 , B1 ) ∨ (A2 , B2 ) = ((A1 ∩ A2 ), f (A1 ∩ A2 )) (2) The concepts ⊥ = (O, f (O)) and > = (g(I), I) respectively correspond to the bottom and the top of the concept lattice. In formal concept analysis (FCA) concept lattices are used to analyse data when organised by a binary relation between objects and attributes. See the book of Ganter and Wille [4] for a more complete description of formal concept analysis. In the following, we abuse notation and use X + x (respectively, X \ x) for X ∪ {x} (respectively, X\{x}). 2.2 Generation algorithms of a concept lattice Numerous generation algorithms for concept lattices have been proposed in lit- erature [6,7,5,8]. Although all these algorithms generate the same lattice, they propose different strategies. 
Some of these algorithms are incremental [6,9]. Gan- ter’s NextClosure [7] is the reference algorithm that determines the concepts in lectical order (next, the concepts may be ordered by ≤ to form the concept lat- tice) while Bordat’s algorithm [5] is the first algorithm that computes directly the Hasse diagram of the lattice. Recent work [10] proposed a generic algorithm unifying the existing algorithms in a unique framework, which makes easier the comparison of these algorithms. A formal and experimental comparative study of the different algorithms has been published [11]. All of these proposed algorithms have a polynomial complexity with respect to the number of concepts (at best quadratic in [8]). The complexity is therefore determined by the size of the lattice, this size being bounded by 2|O+I| in the worst case and by |O + I| in the best case. Studies on average complexity are difficult to carry out because the size of the concept lattice depends both on the dimensionality of the data to classify and on their organization and diversity. However, in practice the size of the Galois lattice generally remains reasonable. Some applications offer to only generate some concepts from the huge amount of available data. Bordat’s algorithm [5] is the more appropriate since it generates the cover relation between concepts, and thus allows an on-demand generation of concepts usually used in concrete applications. Bordat’s algorithm is issued from a corollary of Bordat’s theorem: Theorem 1 (Bordat [5]). Let (A, B) and (A0 , B 0 ) be two concepts of a context (O, I, R). Then (A, B) ≺ (A0 , B 0 ) if and only if A0 is inclusion-maximal in the 242 Christophe Demko and Karell Bertet following set system FA defined on O1 : FA = {g(x + B) : x ∈ I\B} (3) Corollary 2 (Bordat [5]). Let (A, B) be a concept. There is a one-to-one map- ping between the immediate successors of (A, B) in the Hasse diagram of the lattice and the inclusion-maximal subsets of FA . Bordat’s algorithm recursively computes all the concepts of C by computing immediate successors for each concept (A, B), starting from the bottom concept ⊥ = (f (G), G), until all concepts are generated. Immediate successors are gen- erated using Corollary 2 in O(|I|2 ∗ |O|): the set system has first to be generated in a linear time ; then inclusion-maximal subsets of FB , can easily be computed in O(|I|2 ∗ |O|). 3 Limited Object Access Algorithm (LOA) 3.1 Description of the LOA Algorithm Large data are often described by a huge amount of objects, as in databases for example where the number of recordings (i.e. objects) can be huge, indexed using sophisticated key-indexation techniques. In this section, we describe our Limited Object Access algorithm (LOA algorithm), an extension of Bordat’s immediate successors generation with a limited object access. This algorithm, compounded with an on-demand strategy aims to improve time computation for large amount of objects. Our algorithm considers the restriction of a concept lattice to the attributes. A nice result establishes that any concept lattice (C, ≤C ) is isomorphic to the lattice (CI , ⊆) defined on the set I of attributes, with CI the restriction of C to the attributes in each concept. The lattice (CI , ⊆) is also known as the closed sets lattice on the attributes I of a context (O, I, R), where the set system CI is composed of all closed set - i.e. fixed points - for the closure operator ϕ = g ◦ f . See the survey of Caspard and Monjardet [12] for more details about closed set lattices. 
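To make these definitions easy to experiment with, the following Python sketch (ours, not part of the paper) implements the derivation operators f and g and the immediate-successor computation of Corollary 2; the dictionary-based encoding of the context and the toy data at the end are illustrative assumptions, not the authors' implementation.

# Illustrative sketch: derivation operators and Bordat-style immediate
# successors (Corollary 2) for a binary context (O, I, R).

def intent(context, attributes, A):
    """f(A): attributes in relation with every object of A."""
    if not A:
        return set(attributes)
    return set.intersection(*(context[o] for o in A))

def extent(context, B):
    """g(B): objects in relation with every attribute of B."""
    return {o for o, attrs in context.items() if set(B) <= attrs}

def immediate_successors(context, attributes, B):
    """Immediate successors of the concept (g(B), B): they correspond to the
    inclusion-maximal sets of F_A = {g(B + x) : x in I \\ B}."""
    family = {x: frozenset(extent(context, set(B) | {x}))
              for x in set(attributes) - set(B)}
    maximal = {e for e in family.values()
               if not any(e < other for other in family.values())}
    return [(set(e), intent(context, attributes, e)) for e in maximal]

# Tiny hypothetical context, only to show the calling convention.
ctx = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"a", "b", "c"}}
I = {"a", "b", "c"}
bottom = intent(ctx, I, set(ctx))             # intent of the bottom concept (O, f(O))
print(immediate_successors(ctx, I, bottom))   # cover of the bottom concept

Repeating this computation from the bottom concept upwards, as Bordat's algorithm does, yields the Hasse diagram of the lattice.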
Using the closed sets lattice (CI , ⊆) instead of the whole concept lattice (C, ≤C ) gives raise to a storage improvement, for example in case of large amount of objects. A closed sets lattice can be generated using an algorithm similar to Bordat’s algorithm, and therefore enabling an on-demand generation in order to reduce the whole amount of closed sets. This algorithm (see Alg. 1) recursively computes immediate successors (see Alg. 2) of a closed set B, starting from the bottom closed set ⊥ = ϕ(∅), until I is generated. The Immediates Successors LOA algorithm we propose (see Alg. 3) rein- forces the object access limitation by considering the cardinality of the subset g(X + B) instead of the subset itself to compute the inclusion-maximal subsets of FA using the following property: 1 In [5], the equivalent formulation g(x) ∩ A is used instead of g(x + B) Generation algorithm of a concept lattice with limited access to objects 243 Proposition 3. Consider a concept (A, B), and two subsets X and Y of at- tributes in B\I. Then g(X + B) ⊆ g(Y + B) ⇐⇒ |g(X + B)| = |g(X + Y + B)| (4) This proposition is a direct consequence of the two following remarks: 1. The equivalence between inclusion and intersection set operations (C ⊆ D ⇐⇒ C = C ∩ B) allows to deduce the equivalence between g(X + B) ⊆ g(Y + B) and g(X + B) = g(X + B) ∩ g(Y + B): 2. Then, by definition of g, we have g(X + B) ∩ g(Y + B) = g(X + Y + B). More precisely, the Immediates Successors LOA algorithm (see Alg. 3) first initialize the set Succ of immediate successors of a closed set B with the emp- tyset. The set Succ is then updated by considering each attribute x of I\B and another already inserted potential successor X ⊆ I\B by considering the fol- lowing four cases, where cB (Y ) denotes the cardinality of g(B + Y ) for a Y of attributes: Merge x with X: When g(x + B) = g(X + B), then x and X belongs to the same closed set, and thus have to be merged in a same potential successor of B. By Proposition 3, this case is tested by cB (X + x) = cB (X) and cB (X) = cB (x). Eliminate X: When g(X + B) ⊂ g(x + B), then the closed set containing X isn’t inclusion-maximal in FA , and thus hasn’t to be considered as a potential successor of B. By Proposition 3, this case is tested by cB (X + x) = cB (X) and cB (X) < cB (x). Eliminate x: When g(x + B) ⊂ g(X + B), then the closed set containing x isn’t inclusion-maximal if FA , and thus hasn’t to be considered as a potential successor of B. By Proposition 3, this case is tested by cB (X + x) = cB (X) and cB (x) < cB (X). Insert x: When x is neither eliminated or merged with X, then it is added as a potential successor of B ; another attribute is then considered. 3.2 Example To illustrate our algorithm, we use the following context where numbers from 1 to 9 are described by some properties: the number is a prime number, an odd or even number, a square, a composite number or a factorial number. 
(p)rime o(dd) (e)ven (s)quare (c)omposite (f)actorial nb 1 × × × nb 2 × × × nb 3 × × nb 4 × × × nb 5 × × nb 6 × × × nb 7 × × nb 8 × × nb 9 × × × 244 Christophe Demko and Karell Bertet Name: Closed Set Lattice Data: A context K = (O, I, R) Result: The Hasse diagram (CI , ≺) of the lattice (CI , ⊆) begin CI = {f (O)}; foreach B ∈ CI not marked do SuccB =Immediates successors (K, B); foreach X ∈ SuccB do B 0 = B + X; if B 0 6∈ CI then add B 0 to CI ; add a cover relation B ≺ B 0 end mark B end return (CI , ≺) end Algorithm 1: Generation of the Hasse diagram of the closed set lattice (CI , ⊆) Name: Immediates Successors Data: A context K ; A closed set B of the closed set lattice (CI , ⊆) of K Result: The immediate successors of B in the lattice begin initialize the set system FA with ∅; foreach x ∈ I\B do add g(x + B) to FA end Succ=maximal inclusion subsets of FA ; return Succ end Algorithm 2: Generation of the immediate successors of a closed set in the Hasse diagram of the lattice (CI , ⊆) Generation algorithm of a concept lattice with limited access to objects 245 Name: Immediates Successors LOA Data: A context K ; A closed set B of the closed set lattice (CI , ⊆) of K Result: The immediate successors of B in the lattice begin initialize the SuccB family to an empty set; foreach x ∈ I \ B do add = true; foreach X ∈ SuccB do \\ Merge x and X in the same potential successor if cB (x) = cB (X) then if cB (X + x) = cB (x) then replace X by X + x in SuccB ; add=false; break; end end \\ Eliminate x as potential successor if cB (x) < cB (X) then if cB (X + x) = cB (x) then add=false; break; end end \\ Eliminate X as potential successor if cB (x) > cB (X) then if cB (X + x) = cB (X) then delete X from SuccB end end end \\ Insert x as a new potential successor ; if add then add {x} to SuccB end return SuccB ; end Algorithm 3: Generation of the immediate successors of a closed set in the Hasse diagram of the lattice (CI , ⊆) 246 Christophe Demko and Karell Bertet Fig. 1. Concept lattice Figure 1 gives the concept lattice of this context. When the algorithm com- putes the successors of the closed sets e (resp. p), it proceeds as described in Table 1 (resp. Table 2). The different steps of these two examples show the different actions taken by the algorithm. SuccF x cB (x) X cB (X) cB (x + X) Case Action ∅ p 1 Insert [p] {[p]} o 0 [p] 1 0 cB (x + X) = cB (x) < cB (X) Eliminate [o] {[p]} s 1 [p] 1 0 cB (x + X) < cB (X) = cB (x) {[p]} c 3 [p] 1 0 cB (x+X) < cB (X) < cB (x) Insert [c] {[p], [c]} f 2 [p] 1 1 cB (x + X) = cB (X) < cB (x) Eliminate [p] {[c]} f 2 [c] 3 1 cB (x+X) < cB (x) < cB (X) Insert [f ] {[c], [f ]} Table 1. Immediate successors of [e] 3.3 Complexity The complexity of computing the immediate successors of a closed set B using the Immediates Successors LOA algorithm is: (|I| − |B|)(|I| − |B|) ∗ O(cB (X)) 2 Generation algorithm of a concept lattice with limited access to objects 247 SuccF x cB (x) X cB (X) cB (x + X) Case Action ∅ o 3 Insert [o] {[o]} e 1 [o] 3 0 cB (x+X) < cB (x) < cB (X) Insert [e] {[o], [e]} s 0 [o] 3 0 cB (x + X) = cB (x) < cB (X) Eliminate [s] {[o], [e]} c 0 [o] 3 0 cB (x + X) = cB (x) < cB (X) Eliminate [c] {[o], [e]} f 1 [o] 3 0 cB (x+X) < cB (x) < cB (X) {[o], [e]} f 1 [e] 1 1 cB (x + X) = cB (x) = cB (X) Merge [e], [f ] {[o], [ef ]} Table 2. Immediate successors of [p] which leads to O((|I| − |B|)2 ∗ O(cB (X))) using the big O notation. This has to be compared with O(|I|2 ∗ |O|) of the Immediates Successors algorithm. 
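To connect Algorithm 3 with Tables 1 and 2, here is a short Python prototype (ours, not the authors' PHP implementation); the counting function cB is kept abstract in a single place, precisely where an indexed SQL COUNT would be plugged in, and the hard-coded context encodes the 9-number example of Section 3.2 (inferred from the properties listed there and consistent with the cardinalities of Tables 1 and 2).

# Illustrative prototype of Immediates Successors LOA (Algorithm 3).
# The context encodes the example of Section 3.2: (p)rime, o(dd), (e)ven,
# (s)quare, (c)omposite, (f)actorial for the numbers 1..9.
ctx = {1: {"o", "s", "f"}, 2: {"p", "e", "f"}, 3: {"p", "o"},
       4: {"e", "s", "c"}, 5: {"p", "o"},      6: {"e", "c", "f"},
       7: {"p", "o"},      8: {"e", "c"},      9: {"o", "s", "c"}}
I = {"p", "o", "e", "s", "c", "f"}

def c(B):
    """cB(Y) of the paper: |g(B + Y)|.  This is the only place where objects
    are touched; in the database setting it becomes a single COUNT query."""
    return sum(1 for attrs in ctx.values() if B <= attrs)

def immediate_successors_loa(B):
    succ = []                                  # potential successors (attribute sets)
    for x in I - B:
        add = True
        for X in list(succ):
            cx, cX, cxX = c(B | {x}), c(B | X), c(B | X | {x})
            if cxX == cx == cX:                # merge x and X
                X.add(x); add = False; break
            if cxX == cx and cx < cX:          # eliminate x
                add = False; break
            if cxX == cX and cX < cx:          # eliminate X
                succ.remove(X)
        if add:
            succ.append({x})
    return succ

print(immediate_successors_loa({"e"}))   # the two successors {'c'} and {'f'}, as in Table 1
print(immediate_successors_loa({"p"}))   # {'o'} and {'e', 'f'}, as in Table 2

On real data, the function c would issue a single indexed count query against the database instead of scanning an in-memory context, as in the experiments of Section 3.4.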
In addition, the cost O(cB(x)) of computing the cardinality of objects satisfying the required properties can rely on multiple keys and on the robust algorithms used in databases, which do not need to load all the data to compute a cardinality. 3.4 Experiments In the experiments, we use a dataset composed of: 54 attributes: the 6 attributes of the example, plus attributes corresponding to the property of being a multiple of k, for k ∈ {3, . . . , 50}; 100000 objects: the integers between 1 and 100000. The dataset is stored in a MySQL 5.5.14 database. We have implemented our Immediates Successors LOA algorithm using PHP 5.3.6. The counting of objects satisfying a set of properties is realised by an SQL request comparing indexes with a constant:
select count(*) from numbers where att1=1 and att2=1
We compare the processing time of our Immediates Successors LOA algorithm in the two following cases: Indexed: Each attribute is defined to be an index. Objects are indexed by their attributes, and MySQL can quickly retrieve them in the dataset using a B-tree indexation with a logarithmic complexity [13]: O(cB(X)) = O(log |I|). Not indexed: Objects are not indexed and a scan of all the lines is necessary to retrieve objects. The complexity is then similar to that of the Immediates Successors algorithm: O(cB(X)) = O(|I| − |B|). We compare the processing time of computing the immediate successors of the bottom element ∅ in these two cases (indexed and not indexed). Fig. 2. Calculating the immediate successors of ∅: (a) the 100000 first integers as objects, with a number of attributes ranging from 6 to 54; (b) the 54 attributes, with integers between 1000 and 100000. In the results, the processing time is greatly improved with an indexed dataset, and seems to be close to linear in O(|I| + |O|). The immediate successors of ∅ for 100000 objects and 54 attributes are computed in 3 seconds with the indexed algorithm, and in 18 seconds with the not indexed one. Moreover, the EXPLAIN of the count operation shows that an index-merge operation is realized on the indexes, corresponding to an intersect computation:
mysql> explain select count(*) from numbers where p=1 and o=1;
+----+-------------+------+---------+------+-----------------------+
| id | select_type | key  | key_len | rows | Extra                 |
+----+-------------+------+---------+------+-----------------------+
|  1 | index_merge | p,o  | 1,1     |    2 | Using intersect(p,o); |
|    |             |      |         |      | Using where;          |
|    |             |      |         |      | Using index           |
+----+-------------+------+---------+------+-----------------------+
1 row in set (0.00 sec)
Therefore, optimizing the intersection operation, for example with an adapted sort on the lines, would be a possible optimization of our algorithm. 4 Conclusion In this paper, we described a new algorithm for computing the immediate successors of a concept using the counting of objects satisfying a set of properties. By separating the counting from the rest of the algorithm, new systems for exploring concept lattices can now rely on the optimization algorithms used in relational databases. If the tests we will run on PostgreSQL and MySQL databases are successful in terms of manipulating huge amounts of data, we plan to propose a library for extending content management systems such as Joomla!. References 1.
Birkhoff, G.: Lattice theory. 3d edn. American Mathematical Society (1967) 2. Barbut, M., Monjardet, B.: Ordres et classifications : Algèbre et combinatoire. Hachette, Paris (1970) 2 tomes. 3. Davey, B., Priestley, H.: Introduction to lattices and orders. 2nd edn. Cambridge University Press (1991) 4. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical foundations. Springer Verlag, Berlin (1999) 5. Bordat, J.: Calcul pratique du treillis de Galois d’une correspondance. Math. Sci. Hum. 96 (1986) 31–47 6. Norris, E.: An algorithm for computing the maximal rectangles in a binary relation. Revue Roumaine de Mathématiques Pures et Appliquées 23 (1978) 7. Ganter, B.: Two basic algorithms in concept analysis. Technische Hochschule Darmstadt (Preprint 831) (1984) 8. Nourine, L., Raynaud, O.: A fast algorithm for building lattices. Information Processing Letters 71 (1999) 199–204 9. Gödin, R., Missaoui, R., Alaoui, H.: Incremental concept formation algorithms based on Galois (concept) lattices. Computational Intelligence 11 (1995) 246–267 250 Christophe Demko and Karell Bertet 10. Gely, A.: A generic algorithm for generating closed sets of binary relation. Third International Conference on Formal Concept Analysis (ICFCA 2005) (2005) 223– 234 11. Kuznetsov, S., Obiedkov, S.: Comparing performance of algorithms for generating concept lattices. Journal of Experimental and Theorical Artificial Intelligence 14 (2002) 189–216 12. Caspard, N., Monjardet, B.: The lattice of closure systems, closure operators and implicational systems on a finite set: a survey. Discrete Applied Mathematics 127 (2003) 241–269 13. Bayer, R. et McCreight, E.M.: Organization and maintenance of large ordered indexes. Acta Informatica 1 (1972) 173–189 Homogeneity and Stability in Conceptual Analysis Paula Brito1 and Géraldine Polaillon2 1 Faculdade de Economia & LIAAD/INESC-Porto L.A., Universidade do Porto Rua Dr. Roberto Frias, 4200-464 Porto, Portugal mpbrito@fep.up.pt 2 SUPELEC Science des Systèmes (E3S) - Département Informatique Plateau de Moulon, 3 rue Joliot Curie, 91192 Gif-sur-Yvette cedex, France geraldine.polaillon@supelec.fr Abstract. This work comes within the field of data analysis using Galois lattices. We consider ordinal, numerical single or interval data as well as data that consist on frequency/probability distributions on a finite set of categories. Data are represented and dealt with on a common framework, by defining a generalization operator that determines intents by intervals. In the case of distribution data, the obtained concepts are more homogeneous and more easily interpretable than those obtained by using the maximum and minimum operators previously proposed. The number of obtained concepts being often rather large, and to limit the influence of atypical elements, we propose to identify stable concepts using interval distances in a cross validation-like approach. 1 Introduction This work concerns multivariate data analysis using Galois concept lattices. Let E = {ω1 , . . . , ωn } be the set of elements to be analyzed, described by p variables Y1 , . . . , Yp . In this paper we consider the specific case where the variables Yj are numerical (real or interval-valued), ordinal and modal. Modal variables allow associating with each element of E a probability/frequency distribution on an underlying finite set of categories (see [9]). 
The use of Galois lattices in Data Analysis was first introduced by Barbut and Monjardet, in the seventies of last century [2] and then further developed and largely spread out by the work of R. Wille and B. Ganter (see, e.g., [6]). Let (A, ≤1 ) and (B, ≤2 ) be two ordered sets. A Galois connection is a pair (f, g), where f is a mapping f : A → B, g is a mapping g : B → A, such that f and g are antitone, and both h = gof and h0 = f og are extensive; h and h0 are then closure operators. The mapping f defines the intent of a set S ⊆ E, and the mapping g that allows obtaining the extent in E associated with a set of attributes T ⊆ O, where O is the set of the considered (binary) attributes. The couple (f, g) then constitutes a Galois connection between (P (E), ⊆) and (P (O), ⊆). A concept is defined as a couple (S, T ) where S ⊆ E, T ⊆ O, S = g(T ) and T = f (S), i.e., we have h(S) = S; S is the extent of the concept and T its intent. This approach has been applied to non-binary variables, but in this case data are generally submitted to a previous “binarization”, by performing a binary coding of the c 2011 by the paper authors. CLA 2011, pp. 251–263. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 252 Paula Brito and Géraldine Polaillon data array; for numerical or ordinal variables Y , attributes of the form “Y ≤ x,” for each observed value x, are considered. In [3] this approach has been extended by defining directly the intent of a set of elements; which has allowed obtaining, for each variable type (classical or otherwise) appropriate couples of mappings (f, g) forming a Galois connection. This has the advantage of allowing analyzing the data directly as it is presented, without imposing any sort of binary pre-coding, which may, and generally does, drastically increase the size of the data array to be analyzed. Galois lattices where intents are obtained by union and by intersection are obtained. This approach has been further extended to modal variables (see [4]). The case of ordinal variables has been dealt with in [11], using an approach similar to that of [4] for modal variables. Ganter and Kuznetsov [5] proposed a general construction, called pattern structures, which allows for arbitrary descriptions with a semilattice operation on them; since union and intersection of intervals define semilattices, they make respective pattern structures. An application on gene expression data is pre- sented in [7]. Here, we consider a common framework for numerical (real or interval-valued), ordinal and modal variables, by defining a generalization operator that deter- mines the intent in the form of vectors of intervals. For ordinal and modal (i.e., distribution-valued) variables the obtained concepts are more homogeneous and therefore easier to interpret than those obtained by applying the minimum and maximum operators, as previously proposed. In the next sections, we detail how generalization of a set of elements is performed for each variable type. The number of obtained concepts being often rather large, we propose to identify stable concepts (see also [8] and [12]), using distances designed for inter- val data. 
The criteria is that the intent of a concept should not be too different from those obtained by sequentially removing one element of the extent at a time - which would reveal that this particular element is provoking a drastic change in the concepts’ intent. Should it occur, the concept would be considered to be non-stable. In the case of multi-valued data, other approaches of lattice reduction, di- rectly applied to the concept lattice, have been proposed in [1] and [10]. These two approaches rely on the same idea of merging together similar attribute values (in respect to a given threshold), and thereby reducing the number of concepts. The remainder of the paper is organized as follows. Section 2 describes the generalization procedure for real and interval-valued variables, which is extended in Section 3 to modal variables. In Section 4 a common generalization approach by vectors of intervals is presented. In Section 5 the problem of concept stability is considered, and a method using interval distances is proposed, which allows addressing the question of lattice reduction. Section 6 concludes the paper, open- ing paths for future research. Homogeneity and Stability in Conceptual Analysis 253 2 Real and interval-valued variables Let E = {ω1 , ..., ωn } be the set of n elements or objects to be analyzed, and Y1 , . . . , Yp real or interval-valued variables such that Yj (ωi ) = [lij , uij ]. We shall consider real-valued variables as a special case of interval-valued ones; it is there- fore equivalent to write Yj (ωi ) = x or Yj (ωi ) = [x, x]. Let A = {ω1 , . . . , ωh } ⊆ E. Generalization by union is defined (see [3]) by the mapping f : P (E) → I p where I is the set of intervals of IR endowed with the inclusion order, such that f (A) = (I1 , . . . , Ip ), with Ij = [M in {lij } , M ax {uij }], ωi ∈ A, j = 1, . . . , p, i.e., for each j = 1, . . . , p, Ij is the minimum interval (for the inclusion order) that covers all values taken by the elements of A for variable Yj . Let g : I p → P (E) be the mapping defined as g((I1 , . . . , Ip )) = = {ωi ∈ E : Yj (ωi ) ⊆ Ij , j = 1, . . . , p}, i.e., the set of elements of E taking values within Ij , for j = 1, . . . , p. The couple (f, g) is a Galois connection. Likewise, we may generalise by intersection defining f and g as follows: f ∗ : P (E) → I p , f (A) = (I1 , . . . , Ip ), with Ij = [M ax {lij } , M in {uij }] if M ax {lij } ≤ M in {uij } , ωi ∈ A, Ij = otherwise (i.e., the largest interval contained in all intervals taken by the elements of A for variable Yj , which may be empty), for j = 1, . . . , p, and g ∗ : I p → P (E) with g ∗ ((I1 , . . . , Ip )) = {ωi ∈ E : Yj (ωi ) ⊇ Ij , j = 1, . . . , p} (the set of elements of E taking interval- values that contain Ij ,) for j = 1, . . . , p. The couple (f ∗ , g ∗ ) forms also a Galois connection. Example 1: Consider three persons, Ann, Bob and Charles characterized by two variables, age and amount of time (in minutes) necessary to go to work (which varies from day to day, and is therefore represented by an interval-valued variable), as presented in Table 1. Age Time Ann 25 [15, 20] Bob 32 [25, 30] Charles 40 [10, 20] Table 1. Age and amount of time (in minutes) necessary to go to work for three persons. Let A = {Bob,Charles}. 
Generalization by the union leads to f (A) = ([32, 40], [10, 30]), describing people who are between 32 and 40 years old and take 10 to 30 minutes to go to work; in this dataset people meeting this description are given by g(f (A)) = g(([32, 40], [10, 30])), i.e., {Bob, Charles} = A. Here, ({Bob, Charles}, ([32, 40], [10, 30])) is a concept. 254 Paula Brito and Géraldine Polaillon 3 Modal variables Two Galois connections may also be defined for the case of modal variables (see [4]). Let Y1 , . . . , Yp be p modal variables, Oj = mj1 , . . . , mjkj the set of kj possible categories of variable Yj , Mj the set of distributions defined on Oj , for j = 1, . . . ,np, and M = M1 × . . . ×oMp . For variable Yj and element ωi ∈ E, Yj (ωi ) = mj1 (pω ωi ωi j1 ), . . . , mjkj (pjkj ) , where pjk` is the probability/frequency i associated with category mj` (` = 1, . . . , kj ) of variable Yj , and element ωi . Let A = {ω1 , . . . , ωh } ⊆ E. To generalise by the maximum we take, for each category mj` , the maximum of its probabilities/frequencies in A. Let f : P (E) → M , such that f (A) = (d1 , . . . , dp ), with dj = {mj1 (tj1 ), . . . , mjkj (tjkj )}, where tj` = M ax{pωj` , ωi ∈ i A}, ` = 1, . . . , kj . The intent of a set A ⊆ E is then to be interpreted as “objects with at most tj` cases presenting category mj` , ` = 1, . . . , kj , j = 1, . . . , p”. The couple (f, g) with n g : M → P (E) defined as, for dj = {mj1 (pj1 ),o. . . , mjkj (pjkj )}, g((d1 , . . . , dp )) = ωi ∈ E : pωj` ≤ pj` , ` = 1, . . . , kj , j = 1, . . . , p , forms a Galois i connection. Similarly, we may generalise by the minimum taking for each category the minimum of its probabilities/frequencies. Let f ∗ : P (E) → M , f ∗ (A) = (d1 , . . . , dp ), with dj = {mj1 (vj1 ), . . . , mjkj (vjkj )}, where vj` = M in{pω j` , ωi ∈ A}, ` = i 1, . . . , kj . The intent of a set A ⊆ E is now interpreted as “objects with at least vj` cases presenting category mj` , ` = 1, . . . , kj , j = 1, . . . , p”. The couple (f ∗ , g ∗ )nwith g ∗ : M → P (E) such that, for dj = {mj1o(pj1 ), . . . , mjkj (pjkj )}, g ∗ ((d1 , . . . , dp )) = ωi ∈ E : pω j` ≥ pj` , ` = 1, . . . , kj , j = 1, . . . , p forms likewise i a Galois connection. Example 2: Consider four groups of students for each of which a categorical mark is given, according to the following scale: a: mark < 10, b: mark between 10 and 15, c: mark > 15 as summarized in Table 2. Mark Group 1 < 10(0.2), [10 − 15] (0.6), > 15(0.2) Group 2 < 10(0.3), [10 − 15] (0.3), > 15(0.4) Group 3 < 10(0.1), [10 − 15] (0.6), > 15(0.3) Group 4 < 10(0.3), [10 − 15] (0.6), > 15(0.1) Group 5 < 10(0.5), [10 − 15] (0.3), > 15(0.2) Table 2. Frequency distributions of the students marks, in 3 categories, for 5 groups. The intent, obtained by the maximum operator, of the set formed by groups 1 and 2, is {a(0.3), b(0.6), c(0.4)} and is interpreted as “students’ groups with at Homogeneity and Stability in Conceptual Analysis 255 most 30% of marks a, at most 60% of marks b and at most 40% of marks c”. The corresponding extent comprehends groups 1, 2, 3 and 4. If, alternatively, we determine the intent of the same set by the minimum operator, we obtain {a(0.2), b(0.3), c(0.2)}, to be read as “students’ groups with at least 20% of marks a, at least 30% of marks b and at least 20% of marks c”, whose extent is formed by groups 1, 2 and 5. 
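As a concrete check of the two examples above, the following Python sketch (ours, not part of the paper) implements generalization by union for interval-valued variables and generalization by the maximum and the minimum for modal variables; the data structures are an illustrative choice.

# Illustrative sketch: generalization by union (interval-valued variables)
# and by maximum / minimum (modal variables), on Examples 1 and 2.

def union_intent(data, A):
    """f(A): per variable, the smallest interval covering all values of A."""
    return [(min(data[w][j][0] for w in A), max(data[w][j][1] for w in A))
            for j in range(len(next(iter(data.values()))))]

def union_extent(data, intervals):
    """g: elements whose interval value is included in every intent interval."""
    return {w for w, desc in data.items()
            if all(J[0] <= v[0] and v[1] <= J[1] for v, J in zip(desc, intervals))}

# Example 1 (real values coded as degenerate intervals [x, x]).
people = {"Ann": [(25, 25), (15, 20)],
          "Bob": [(32, 32), (25, 30)],
          "Charles": [(40, 40), (10, 20)]}
D = union_intent(people, {"Bob", "Charles"})      # [(32, 40), (10, 30)]
print(D, union_extent(people, D))                 # extent = {'Bob', 'Charles'}

def modal_intent(dists, A, op):
    """f(A) (op=max) or f*(A) (op=min): per category, the max/min frequency."""
    cats = next(iter(dists.values())).keys()
    return {m: op(dists[w][m] for w in A) for m in cats}

def modal_extent(dists, d, cmp):
    """g (cmp: <=) or g* (cmp: >=): elements whose frequencies all satisfy cmp."""
    return {w for w, dw in dists.items() if all(cmp(dw[m], d[m]) for m in d)}

# Example 2: mark distributions of the five students' groups.
groups = {1: {"a": .2, "b": .6, "c": .2}, 2: {"a": .3, "b": .3, "c": .4},
          3: {"a": .1, "b": .6, "c": .3}, 4: {"a": .3, "b": .6, "c": .1},
          5: {"a": .5, "b": .3, "c": .2}}
dmax = modal_intent(groups, {1, 2}, max)
print(modal_extent(groups, dmax, lambda p, q: p <= q))   # {1, 2, 3, 4}
dmin = modal_intent(groups, {1, 2}, min)
print(modal_extent(groups, dmin, lambda p, q: p >= q))   # {1, 2, 5}

The printed extents coincide with those discussed in Examples 1 and 2 above.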
4 A common approach: generalization by intervals We now present a unique framework allowing to perform generalization for nu- merical (real or interval-valued) variables, ordinal variables and modal variables, based on generalization by intervals. For numerical (real or interval-valued) data, we are in the above mentioned case of generalization by taking the union. For modal variables, it amounts to consider, for each category, an interval corresponding to the range of its probability/frequency. In fact, it has often been observed that generalization either by the maximum or by the minimum, as defined in Section 3, may quickly lead to over-generalization. As a consequence, f (A), A ⊆ E, is not very informative. Let MjI = {mj` (Ij` ), ` = 1, . . . , kj }, mj` ∈ Oj , Ij` ⊆ [0, 1] and M I = M1I × . . . × MpI . Generalization is now defined as f I : P (E) → M I I f (A) = (d1 , . . . , dp ) with dj = {mj1 (Ij1 ), . . . , mjkj (Ijkj )}, h i where Ij` = M in{pωi j` }, M ax{pωi j` } , ωi ∈ A, ` = 1, . . . , kj , j = 1, . . . , p and n gI : M I → E o g((d1 , . . . , dp )) = ωi ∈ E : pω j` ∈ Ij` , ` = 1, . . . , kj , j = 1, . . . , p i The so-defined couple of mappings (f I , g I ) forms a new Galois connection. On the data of Example 2, generalization by intervals of groups 1 and 2 provides the intent {a [0.2, 0.3] , b [0.3, 0.6] , c [0.2, 0.4]}, to be read as “students’ groups having between 20% and 30% cases of mark a, between 30% and 60% cases of mark b and between 20% and 40% cases of mark c” and whose extent now only contains groups 1 and 2. The case of ordinal variables has been addressed in [11], performing general- ization either using the maximum or the minimum. To allow for more flexibility, the author proposes to choose the operator individually for each variable. Nev- ertheless, one of these generalization operators must be chosen in each case, and over-generalization is not prevented. Our proposal for this type of variables, is to generalise a set A ⊆ E considering, no longer a minimum or a maximum, but rather an interval of ordinal values. 256 Paula Brito and Géraldine Polaillon Example 3: Consider the classifications given by four cinema critics while evaluating three movies, Movie 1, Movie 2 and Movie 3 as given in Table 3. Movie 1 Movie 2 Movie 3 Critic 1 5 5 4 Critic 2 5 4 4 Critic 3 1 2 2 Critic 4 2 1 1 Table 3. Classifications given by four critics to three movies. The intent obtained by using the maximum operator of the group formed by critics 1 and 2 is (5, 5, 4), to be interpreted as “critics giving at most mark 5 to Movie 1, at most mark 5 to Movie 2 and at most mark 4 to Movie 3” - which is obviously too general and would cover almost everyone; in this dataset the corresponding extent contains critics 1, 2, 3 and 4. Therefore, the class formed by critics 1 and 2, who present a similar behavior, does not correspond to a con- cept. The intent obtained by using the minimum operator of the group formed by critics 3 and 4 is (1, 1, 1), to be read “critics giving at least mark 1 to Movie 1, at least mark 1 to Movie 2 and at least mark 1 to Movie 3” - which would cover every critic; its extent in this dataset consists again of critics 1, 2, 3 and 4. Here again, the class formed by critics 3 and 4, who give quite similar marks, does not correspond to a concept. 
If we now perform generalization by interval-vectors of the group formed by critics 1 and 2, we obtain the intent ([5, 5] , [4, 5] , [4, 4]); likewise for the group formed by critics 3 and 4, we have ([1, 2] , [1, 2] , [1, 2]); in the first case we are clearly referring to critics giving high marks while in the second case we describe critics giving low marks to all movies. The corre- sponding extents no longer contain other critics, presenting a rather different profile from those considered each time. Furthermore, both ({Critic 1, Critic 2}, ([5, 5] , [4, 5] , [4, 4]) and ({Critic 3, Critic 4}, ([1, 2] , [1, 2] , [1, 2]) are concepts. When determining concepts, according to the minimum or the maximum oper- ators, e.g. in a clustering context, there is therefore a risk of forming heteroge- neous clusters, since over-generalization may lead to a too large extent. By taking interval-vectors of observed values, the over-generalization problem is avoided. To conclude this section, we now present a more general example, with variables of the different considered types. Example 4: Consider the data in Table 4, where 4 persons are described by their age, a real-valued variable, time (in minutes) they take to go to work, an interval- valued variable, the means of transportation used, a modal variable, and their classifications given to three newspapers, A, B and C (ordinal variable). Homogeneity and Stability in Conceptual Analysis 257 Age Time Transport A B C Albert 25 [15, 20] car (0.2) bus (0.8)) 4 2 5 Bellinda 40 [25, 30] car (0.7), bus (0.2), train (0.1)) 2 4 3 Christine 32 [10, 15] car (0.2), bus (0.7), train (0.1)) 5 1 4 David 58 [30, 45] car (0.9), bus (0.1)) 2 4 1 Table 4. Age, time taken to go to work (in minutes), means of transportation used and classifications given to newspapers A, B and C for four persons. The intent of A = {Albert, Christine} is V = ([25, 32] , [10, 20] , ([0.2, 0.2] , [0.7, 0.8] , [0.0, 0.1]) , [4, 5] , [1, 2] , [4, 5]) and (A, V ) is a concept. 5 Stability Concepts are theoretically very interesting, and do provide rich information on the values shared by subsets of elements of the set under study. However, the number of concepts of a data array is often rather large, even for relatively low cardinals of the sets of elements and variables. This fact makes the analysis and interpretation of results a bit delicate. It is often to be noticed that when analyzing the concepts generated by numerical or modal variables, groups of concepts appear which are quite similar. This may be due to noise or minor differences, generally not pertinent. The idea is therefore to extract only those concepts which are representative of these groups of similar concepts, so as to obtain a more concise representation with significantly homogeneous concepts. Several solutions may be pointed out for this objective. We will focus on the notion of stability, as introduced in [8] and [12], which evaluates the amount of information of the intent that depends on specific objects of the concept’s extent. Formally, the stability of a concept is defined as the probability of keeping its intent unchanged while deleting arbitrarily chosen objects of its extent. When analyzing data described by numerical (real or interval-valued), ordinal or modal variables, and generalizing using interval-vectors (as described in the previous sections), we shall apply a similar approach to each formed concept, but introducing a distance measure. 
The objective being to retain the homogeneous concepts, it is wished to avoid that a single element of the concepts’ extent produces an important increase in the intent’s intervals’ ranges. To identify the stable concepts, a threshold α depending on the maximum distance is defined (so as no to be dependent from the variables’ scales). A concept is said to be “stable” if the distance between the intent obtained by removing one element of the extent at a time, and its original intent, is not above the given threshold. This is in fact a cross-validation-like approach, in that one element of the extent is removed at a time, and the resulting intent is compared with the original one. 258 Paula Brito and Géraldine Polaillon When data have an interval form, interval distances should be used. Dif- ferent measures are available in the literature; we will focus on three interval distance measures: the Hausdorff distance, the interval Euclidean distance and the interval City-Block distance. Let Ii = [li , ui ] and Ih = [lh , uh ] be two intervals we wish to compare. The Hausdorff distance dH , the interval Euclidean distance d2 and the interval City- Block distance d1 between Ii and Ih are respectively dH (Ii , Ih ) = M ax {{|li − lh | , |ui − uh |} p d2 (Ii , Ih ) = (li − lh )2 + (ui − uh )2 d1 (Ii , Ih ) = |li − lh | + |ui − uh | . The Hausdorff distance between two sets is the maximum distance of a set to the nearest point in the other set, i.e., two sets are close in terms of the Hausdorff distance if every point of either set is close to some point of the other set. Interval Euclidean and City-Block distances are just the counterparts of the corresponding distances for real values; if we embed the interval set in IR2 , where one dimension is used for the lower and the other for the upper bound of the intervals, then these distances are just the Euclidean and City-Block distances between the corresponding points in the two-dimensional space. Let C = (A, D) be a concept, where A = {ω1 , . . . , ωh } ⊆ E is its extent and D = (I1 , . . . , Ip ) is its intent, D = f (A). The considered criterion is then the distance ∆ between D et D−i where D−i is the intent of A without ωi , D−i = f (A \ {ωi }), i = 1, . . . , h, defined by: ∆ = M ax{δ(D, D−i ), ωi ∈ A}, δ measuring the dissimilarity between interval-vectors. Let d be the distance (according to the chosen measure) between the intervals corresponding to variable Yj in a concept’s intent. Two options may then be foreseen, whether it is wished to consider the maximal or the average distance on the intervals defining the intents: 1. δM ax (D, D−i ) = M ax{d(Ij , Ij−i )}, j indexing the variable set Yj , j = 1, . . . , p in the case of numerical and ordinal variables, and the global category set O = O1 ∪ . . . ∪ Op in the case of p modal variables; 2. δM ean (D, D−i ) = M ean{d(Ij , Ij−i )}, j as in 1. A concept C = (A, D) is then considered to be stable if ∆ ≤ α. This ap- proach allows keeping only the stable, and therefore more representative, con- cepts, avoiding the effect of outlier observations. 6 Illustrative application Consider again classifications given by cinema critics evaluating three movies, Movie 1, Movie 2 and Movie 3 where Yj (Critici ) is the mark given by Critic i to Movie j, i = 1, . . . , 5; j = 1, 2, 3, as given in Table 5. Tables 6 and 7 list the concepts obtained when the Minimum and the Maxi- mum generalization operators are used, respectively. 
Homogeneity and Stability in Conceptual Analysis 259 Movie 1 Movie 2 Movie 3 Critic 1 3 2 3 Critic 2 1 1 2 Critic 3 5 5 1 Critic 4 4 3 2 Critic 5 2 4 5 Table 5. Classifications given by five critics to three movies. Intent Extent Movie 1 Movie 2 Movie 3 {1} ≥3 ≥2 ≥3 {3} ≥5 ≥5 ≥1 {4} ≥4 ≥3 ≥2 {5} ≥2 ≥4 ≥5 {1, 4} ≥3 ≥2 ≥2 {1, 5} ≥2 ≥2 ≥3 {3, 4} ≥4 ≥3 ≥1 {3, 5} ≥2 ≥4 ≥1 {1, 3, 4} ≥3 ≥2 ≥1 {1, 4, 5} ≥2 ≥2 ≥2 {3, 4, 5} ≥2 ≥3 ≥1 {1, 2, 4, 5} ≥1 ≥1 ≥2 {1, 3, 4, 5} ≥2 ≥2 ≥1 {1, 2, 3, 4, 5} ≥1 ≥1 ≥1 Table 6. Concepts of the Minimum lattice corresponding to the data in Table 5. Intent Extent Movie 1 Movie 2 Movie 3 {2} ≤1 ≤1 ≤2 {3} ≤5 ≤5 ≤1 {1, 2} ≤3 ≤2 ≤3 {2, 4} ≤4 ≤3 ≤2 {2, 5} ≤2 ≤4 ≤5 {1, 2, 4} ≤4 ≤3 ≤3 {1, 2, 5} ≤3 ≤4 ≤5 {2, 3, 4} ≤5 ≤5 ≤2 {1, 2, 3, 4} ≤5 ≤5 ≤3 {1, 2, 4, 5} ≤4 ≤4 ≤5 {1, 2, 3, 4, 5} ≤5 ≤5 ≤5 Table 7. Concepts of the Maximum lattice corresponding to the data in Table 5. 260 Paula Brito and Géraldine Polaillon The concepts (except for the empty extent one) obtained from this data table, using generalization by intervals, i.e., for A ⊆ E, f (A) = (I1 , I2 , I3 ), with Ij = [M in {Yj (Critici )} , M ax {Yj (Critici )}], Critici ∈ A, j = 1, 2, 3, are listed in Table 8. Intent Extent Movie 1 Movie 2 Movie 3 {1} [3, 3] [2, 2] [3, 3] {2} [1, 1] [1, 1] [2, 2] {3} [5, 5] [5, 5] [1, 1] {4} [4, 4] [3, 3] [2, 2] {5} [2, 2] [4, 4] [5, 5] {1, 2} [1, 3] [1, 2] [2, 3] {1, 4} [3, 4] [2, 3] [2, 3] {1, 5} [2, 3] [2, 4] [3, 5] {2, 4} [1, 4] [1, 3] [2, 2] {2, 5} [1, 2] [1, 4] [2, 5] {3, 4} [4, 5] [3, 5] [1, 2] {3, 5} [2, 5] [4, 5] [1, 5] {4, 5} [2, 4] [3, 4] [2, 5] {1, 2, 4} [1, 4] [1, 3] [2, 3] {1, 2, 5} [1, 3] [1, 3] [2, 5] {1, 3, 4} [3, 5] [2, 5] [1, 3] {1, 4, 5} [2, 4] [2, 4] [2, 5] {2, 3, 4} [1, 5] [1, 5] [1, 2] {3, 4, 5} [2, 5] [3, 5] [1, 5] {1, 2, 3, 4} [1, 5] [1, 5] [1, 3] {1, 2, 4, 5} [1, 4] [1, 4] [2, 5] {1, 3, 4, 5} [2, 5] [2, 5] [1, 5] {1, 2, 3, 4, 5} [1, 5] [1, 5] [1, 5] Table 8. Concepts of the interval lattice for the data in Table 5. We notice that all the concepts obtained using the Minimum or the Maximum operator are concepts for the interval generalization, although with a different meaning, given the different intent mapping. As discussed before, even in this small example it may be observed that concepts obtained using the Minimum or the Maximum operator often present a rather general intent, thus leading to over-generalization in the concept formation. Consider, for instance, the concept ({1} , (Movie 1 ≥ 3 , Movie 2 ≥ 2 , Movie 3 ≥ 3)) in Table 6, it indicates that Critic 1 gives high marks to each movie, which is not really the case, whereas the concept ({1} , (Movie 1 ∈ [3, 3] , Movie 2 ∈ [2, 2] , Movie 3 ∈ [3, 3])) in Table 8 gives a much more accurate description of the concepts’s extent. Also, concept ({3} , (Movie 1 ≤ 5 , Movie 2 ≤ 5 , Movie 3 ≤ 1)) in Table 7 describes Critic 3 Homogeneity and Stability in Conceptual Analysis 261 as giving any marks to Movies 1 and 2, and low marks to Movie 3; using interval generalization we learn that the marks given by Critic 3 to Movies 1 and 2 are the highest and non other. Consider now concept ({3, 4} , (Movie 1 ≥ 4 , Movie 2 ≥ 3 , Movie 3 ≥ 1)) in Table 6: the intent reports any mark for Movie 3 (in particular, high marks are possible); if we use interval generalization instead we obtain the concept ({3, 4} , (Movie 1 ∈ [4, 5] , Movie 2 ∈ [3, 5] , Movie 3 ∈ [1, 2] which more accurately describes the observed situation. 
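The stability test of Section 5 can be reproduced on this small example with a few lines of Python; the sketch below (ours, not the authors' implementation) uses the interval-vector intents of Table 8, the Hausdorff distance and the δMax criterion, with singleton extents kept as stable, which agrees with the stable-concept lists reported below.

# Illustrative sketch: stability test of Section 5 on the Table 5 data,
# with interval-vector intents, the Hausdorff distance and the Max criterion.
marks = {1: (3, 2, 3), 2: (1, 1, 2), 3: (5, 5, 1), 4: (4, 3, 2), 5: (2, 4, 5)}

def interval_intent(A):
    """f(A): per movie, the interval [min, max] of the marks given by A."""
    return [(min(marks[c][j] for c in A), max(marks[c][j] for c in A))
            for j in range(3)]

def hausdorff(I, J):
    return max(abs(I[0] - J[0]), abs(I[1] - J[1]))

def delta_max(D1, D2):
    return max(hausdorff(I, J) for I, J in zip(D1, D2))

def is_stable(A, alpha):
    """Concept (A, f(A)) is stable if removing any single element of the
    extent changes the intent by at most alpha (Delta <= alpha).
    Singleton extents are kept, since there is nothing left to compare."""
    if len(A) < 2:
        return True
    D = interval_intent(A)
    return max(delta_max(D, interval_intent(A - {c})) for c in A) <= alpha

# Two extents of concepts listed in Table 8:
print(is_stable({1, 4}, alpha=1))        # True: retained as stable with threshold 1
print(is_stable({2, 3, 4}, alpha=1))     # False: Delta = 3 exceeds the threshold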
We now compare the concepts retained as stable with each of the three distances, using both δM ax and δM ean , and a threshold value of 1 and 2. The identified stable concepts in each case, represented by the corresponding extent, are listed in Table 9. Distance Criterion Threshold Stable concepts (extent) dH Max 1 {1} , {2} , {3} , {4} , {5} , {1, 4} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {3, 4} , 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} Mean 1 {1} , {2} , {3} , {4} , {5} , {1, 4} , {1, 2, 4} , {1, 2, 5} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {2, 4} , {3, 4} , {4, 5} 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {2, 3, 4} , {3, 4, 5} {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} d2 Max 1 {1} , {2} , {3} , {4} , {5} , {1, 4} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {3, 4} , 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} Mean 1 {1} , {2} , {3} , {4} , {5} , {1, 4} , {1, 2, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {2, 4} , {3, 4} , {4, 5} 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {2, 3, 4} , {3, 4, 5} , {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} d1 Max 1 {1} , {2} , {3} , {4} , {5} , {1, 4} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {3, 4} , 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} Mean 1 {1} , {2} , {3} , {4} , {5} , {1, 4} , {1, 2, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} {1} , {2} , {3} , {4} , {5} , {1, 2} , {1, 4} , {1, 5} , {2, 4} , {3, 4} , {4, 5} , 2 {1, 2, 4} , {1, 2, 5} , {1, 3, 4} , {1, 4, 5} , {2, 3, 4} , {3, 4, 5} , {1, 2, 3, 4} , {1, 2, 4, 5} , {1, 3, 4, 5} , {1, 2, 3, 4, 5} Table 9. Stable concepts for different distances, criteria and threshold values. 262 Paula Brito and Géraldine Polaillon As it may be seen from Table 9, for all distances and both criteria, a demand- ing threshold identifies a small number of stable concepts, therefore leading to an important reduction in the number of retained concepts; if we use a more liberal threshold, a larger number of concepts are retained as stable, as was to be expected. The maximum criterion is naturally more strict than the mean, which retains more concepts as stable, for all distances and both threshold val- ues. Finally, in this example, no important difference appears between the results obtained for the different distance measures. 7 Conclusion A common generalization procedure, for numerical, ordinal and modal variables, which uses a representation based on interval-vectors is presented. This allows defining more homogeneous concepts, than generalization operators that use the maximum and/or the minimum. The proposed approach for ordinal variables allows addressing recommendation systems, analyzing preference data tables. It would also be interesting to explore how the proposed generalization operator behaves in a supervised learning context. The number of obtained concepts being often rather large, a method for identifying stable concepts is proposed, using a cross-validation-like approach. This allows avoiding the effect of atypical elements in the concepts’ formation. Naturally, the value of the used threshold has an important influence in the rate of concept reduction. 
The next step will be to explore this methodology for larger data tables, so as to have a more accurate evaluation of its efficiency in concept reduction. Another issue interesting to investigate is the comparaison of the list of concepts with those obtained with a subset of the given variables. This then leads to the problem of variable selection in the context of Galois lattices construction and analysis. As concerns applications, we are particularly interested in analyzing real preference data, for application in recommendation systems. References [1] Z. Assaghir, M. Kaytoue, N. Messai and A. Napoli (2009). On the mining of numer- ical data with Formal Concept Analysis and similarity. In Proc. Société Francophone de Classification, pp. 121-124. [2] Barbut, M. and B. Monjardet (1970). Ordre et Classification, Algèbre et Combina- toire, Tomes I et II. Paris: Hachette. [3] Brito, P. (1994). Order structure of symbolic assertion objects. IEEE Transactions on Knowledge and Data Engineering 6 (5), 830–835. [4] Brito, P. and G. Polaillon (2005). Structuring probabilistic data by Galois lattices. Math. & Sci. Hum. / Mathematics and Social Sciences 169 (1), 77–104. [5] Ganter, B. and S.O. Kuznetsov (2001). Pattern structures and their projections. In: G. Stumme and H. Delugach (Eds.), Proc. 9th Int. Conf. on Conceptual Structures, ICCS’01, Lecture Notes in Artificial Intelligence, vol. 2120, pp. 129-142. Homogeneity and Stability in Conceptual Analysis 263 [6] Ganter, B. and R. Wille (1999). Formal Concept Analysis, Mathematical Founda- tions. Berlin: Springer. [7] Kaytoue, M., S.O. Kuznetsov, A. Napoli and S. Duplessis (2011). Mining gene expression data with pattern structures in formal concept analysis. Information Sciences, Volume 181, Issue 10, 1989–2001. [8] Kuznetsov, S. (2007). On stability of a formal concept. Annals of Mathematics and Artificial Intelligence 49 (1-4), 101–115. [9] Noirhomme-Fraiture, M. and P. Brito (2011). Far beyond the classical data models: Symbolic Data Analysis. Statistical Analysis and Data Mining 4 (2), 157–170. [10] Pernelle, N., M.-C. Rousset, and V. Ventos (2001). Automatic construction and refinement of a class hierarchy over multi-valued data. In L. De Raedt and A. Siebes (Eds.), Principles of Data Mining and Knowledge Discovery, Lecture Notes in Com- puter Science, pp. 386–398. [11] Pfaltz, J. (2007). Representing numeric values in concept lattices. In J. Diatta, P. Eklund and M. Liquiere (Eds.), Proc. Fifth International Conference on Concept Lattices and Their Applications, pp. 260–269. [12] Roth, C., S. Obiedkov and D. Kourie (2008). On succint representation of knowl- edge community taxonomies with Formal Concept Analysis. International Journal of Foundations of Computer Science 19 (2), 383–404. A lattice-based query system for assessing the quality of hydro-ecosystems Agnès Braud1 , Cristina Nica2 , Corinne Grac3 , and Florence Le Ber?3,4 1 LSIIT, CNRS-UdS, Strasbourg, France 2 University Dunărea de Jos, Galati, Romania 3 › LHYGES, CNRS-ENGEES-UdS, Strasbourg, France 4 LORIA – INRIA NGE, Nancy, France Abstract. Concept lattices are useful tools for organising and querying data. In this paper we present an application of lattices for analysing and classifying stream sites described by physical, physico-chemical and biological parameters. Lattices are first used for building a hierarchy of site profiles which are annotated by hydro-ecologists. This hierarchy can then be queried to classify and assess new sites. 
The whole approach relies on an information system storing data about Alsatian stream sites and their parameters. A specific interface has been designed to manipu- late the lattices and an incremental algorithm has been implemented to perform the query operations. Keywords: incremental lattice, lattice-based query system, classifica- tion, information system, biological quality of water-bodies 1 Introduction Concept -or Galois- lattices are useful tools for organising, mining, and querying qualitative data in various application domains [14, 10, 24]. However when de- veloping a domain specific lattice-based tool -to be used by domain analysts, a main problem is to define the proper approach and tool that fit the requirements of the experts and other users involved in the project. This paper presents an application of Galois lattices to the hydro-ecological domain, focussing on how to assess and monitor the ecological state of streams or water areas. These questions are currently major problems in Europe, as underlined by the recent European Water Framework Directive (2000). Assessing the ecological quality of streams requires to take into account various data such as physico-chemical measures on sites, but also taxonomic statements or qualitative information on species. Fur- thermore tools are needed to summarise all these data and to provide a global and reliable information on the ecological state of streams and water areas. Fol- lowing this aim we have developed an information system to collect data on Alsatian streams (North-East of France) [17] and implemented a lattice-based query system to help hydro-ecologists to compare and assess the ecological state ? Corresponding author, florence.leber@engees.unistra.fr. c 2011 by the paper authors. CLA 2011, pp. 265–277. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 266 2 A. Braud, Agnés Braud, C. Nica, C. Cristina Grac, Nica, F. Le Ber Corinne Grac and Florence Le Ber of streams. Concepts lattices are used: (1) to organise data, i.e. stream or water area sites with similar parameters are clustered within concepts; (2) to embed expert knowledge, i.e. concepts are annotated with an expert qualification or comment; (3) to perform queries, i.e. the annotated concepts are used to help assessing new sites of streams or water areas. The paper is organised as follows. First (Section 2) we present the application domain. Section 3 is devoted to the principles of lattice-based querying. Sections 4 and 5 describe the principles and the implementation of our proposition. Sec- tion 6 compares our approach to other lattice-based tools and the last section is a conclusion. 2 Assessing the quality of hydro-ecosystems The European Water Framework Directive (2000) requires the development of new tools for monitoring and assessing the quality of water-bodies (i.e. rivers, lake, gravel pits,...). Such an assessment is built on various information: informa- tion about the species living in the streams and physical, chemical and biological data collected on the sites. From these information are built several numerical indices that are synthetic indicators for assessing the physico-chemical or bio- logical quality of an hydro-ecosystem. More precisely, in France, five biological indices have been normalised to assess the quality of running water. 
They are based on three faunistic groups: the invertebrate index [1], the oligochaete (small worms living in sediments) index [3], the fish index [5], and on two floristic groups: the diatom (microscopic algae) index [2], and the macrophyte (macroscopic plants living in water) index [4]. Illustrations of the taxa used for these indices are given in Figure 1. (a) Invertebrate(b) Oligochaete (c) Fish (d) Diatom (e) Macrophyte Fig. 1: Taxa examples for the five biological indices According to AFNOR (French organism of normalisation) [1, 3, 5, 2, 4] each of them gives a different estimation of the water ecosystem quality. The macrophyte index estimates the trophic level of water, the diatom index gives the global water quality, the oligochaete index gives an evaluation of the sediment quality, and the fish index allows to classify the chemical and physical water quality quite like the invertebrate index. Therefore, their answers on a same site, with a same undergone pressure, at the same time can be really different but the simultaneous A lattice-based query system forLattice-based assessing theassessment quality ofofhydro-ecosystems hydro-ecosystems 3 267 application of these five indices is not common and work comparing their answers are not frequent [20]. Furthermore, indices based on physical (e.g. width and slope of the stream bed) and physico-chemical (e.g. pH, temperature, nitrates, organic matters, pes- ticides) data give an other estimation of the ecosystem quality. Thus, it is necessary to combine the various indices to assess the quality of a whole water ecosystem. Such an approach, called the ecological ambiance system, has been proposed in [20, 21] based on the five French biological indices. Our objective is to develop this concept and to propose a concretely applicable tool. We therefore rely on a large database collecting data on Alsatian streams and water areas [18]. The database contains 38 tables and it suits the SANDRE1 French national format for aquatic data. It is implemented within the MySQL Database Management System. The data are either issued from samples, synthetic data or general informa- tion issued from the literature. They are qualitative and quantitative, and suit the current standards about protocol sampling and indices computation based on thresholds [1, 3, 5, 2, 4, 22, 23]. Data issued from samples correspond to raw data. Synthetic data are produced from these samples, in particular taxonomic lists are used to compute biological indices. Data issued from the literature are used for the analysis and synthesis of the preceding data (for example they provide the thresholds for the classification of physical, physico-chemical and biological results into classes ranging from 1 (very good quality) to 5 (very bad quality)). We have gathered information on 700 sites in the Alsace Plain, the oldest one being collected 20 years ago. Details on this database and how it is used are given in [17]. 3 Using lattices for querying databases Galois lattices are useful tools for organising data and building knowledge bases [7, 14, 24]. Furthermore, they are very interesting for information retrieval since they allow both direct retrieval and browsing [16]. Primarily, concept lattices have been used for information retrieval within texts [25, 11]. More recently lattice- based approaches have been used to build query or information retrieval systems on various data: e.g. information retrieval within photos or personal data [13], geographical data [8], or museum collections [26]. 
The underlying hypothesis is that a concept extent represents the result of a query defined by the conjunction of the attributes in its intent. The query can easily be refined or enlarged by following the edges starting from the concept in the lattice hierarchy. In practice, a query (a set A of attributes) is answered as follows: the lattice is searched for a matching concept, that is, a concept whose intent equals the set A, if it exists, or else the most general concept whose intent is larger than A. This concept can also be characterised as the infimum (greatest lower bound) of all the concepts containing at least one of the attributes of A. This can be done with various algorithms, and the queried lattice does not have to be modified. Furthermore, a local view can be displayed to the user.

However, when the query represents a new object that is to be incorporated within the lattice, an incremental algorithm has to be used [15, 10]. This is the case in our application, since the user has data about real stream sites that she/he wants to compare with the sites represented in the existing lattice. Furthermore, she/he can add the new sites to the lattice and thus modify its structure. We have therefore implemented two incremental algorithms proposed in [10], roughly described in Section 5.1. These algorithms have been chosen because they build the Hasse diagram of the lattice, contrary to most incremental algorithms (see [19] for a comparison of these algorithms). Furthermore, we did not aim at performance, since in this first step of our work only small data sets (40 sites) have been considered.

4 Using lattices for assessing hydro-ecosystems

Lattices have been used in two ways: firstly, to cluster stream sites into concepts that are used by hydro-ecologists to define profiles of these sites; secondly, the lattices are annotated with the profiles and used in a query system to help the assessment of new sites. The proposed tool includes the two stages (see Section 5.2).

4.1 A lattice-based clustering of Alsatian stream sites

Stream sites are described by different numerical attributes: biological indices on the one hand, physico-chemical data on the other hand. Those attributes are converted into ordinal scales leading to quality classes. The whole context contains about 40 stream sites, described with 5 biological indices, 10 physico-chemical indices and 5 physical indices. In the following, we focus on the biological indices. Table 1 gives the values of these five indices restricted to seven sites. Each site is denoted by a code: for example, the BW2 site (Brunnwasser downstream) has a good quality (class 2) for the IBGN (invertebrate), IBD (diatom) and IPR (fish) indices, a bad quality (class 4) for the IBMR (macrophyte) index and an average quality (class 3) for the IOBS (oligochaete) index. The multi-valued context represented in Table 1, denoted C7 in the following, can be converted into a binary one by using a linear scale [14].

The general idea is to gather similar sites and to allocate them a profile describing their ecological state, combining the quality estimations of all compartments, with respect to the different classes of indices. This work is based on the approach described in [20]. The process is as follows:

– Step 1: Lattice construction on the data.
To facilitate the expert analysis, the context size is reduced by focussing on a small number of indices or by identifying sub-lattices with respect to classes of indices. For example, Figure 2 presents the lattice obtained from the context C7 (the lattice was built with ConExp, http://conexp.sourceforge.net/).

Table 1: Quality classes of the five biological indices for 7 stream sites

Site code  IBGN  IBMR  IOBS  IBD  IPR
BW2          2     4     3    2    2
IL1          3     3     3    2    3
MO1          1     4     3    3    4
MS2          2     4     5    2    2
RT2          2     5     4    2    2
ST1          1     3     4    3    2
ZN4          1     4     4    3    2

– Step 2: Analysis by the experts of the lattice hierarchy and its implication rules in order to select relevant concepts (or site profiles). In this step, the experts may identify profiles which are not present in the lattice and create virtual sites to be represented in the lattice.
– Step 3: Qualification of the concepts by the experts. For example, the concept ({IBGN 2, IBD 2, IPR 2, IBMR 4, IOBS 3}, {BW2}) (at the bottom of the lattice, Figure 2) is interpreted as follows: Brunnwasser downstream: low sediment degradation, high eutrophication, good general potential of resilience and possible resilience for sediments, various habitats.

Once a suitable annotated lattice has been built following this process, it can be used to determine the profile of a new site based on its values for the corresponding indices. This is explained in the next section.

4.2 Assessing a stream site from the lattice

According to the ecological ambiance system described in [20], several lattices have been built for clustering sites with similar average values, or alteration degrees, on the five biological indices (the alteration degree is computed as the average value of the five biological indices, e.g. the alteration degree of BW2 equals 13/5; currently the physico-chemical parameters are not taken into account). The underlying hypothesis is that the global state of a hydro-ecosystem can be assessed on the basis of the five biological indices and synthesised by the alteration degree. Sites with similar alteration degrees can be compared even if they represent various profiles. The intervals of similarity have been defined by the hydro-ecologists [18]. For example, the lattice in Figure 2 was obtained from a set of sites with an alteration degree belonging to [2.5; 3] (see the C7 context in Table 1). The classes of indices in the lattice vary between 1 and 5. Each site is represented alone in an atom of the lattice, which is consistent with the choices made in the project, which aims at representing the whole variety of streams and water areas in the Alsace plain.

Fig. 2: The lattice based on the context of Table 1 (linear scale)

Let us now suppose that we have partial information on a new stream site, denoted Q, defined by the following values: IBGN 2, IBMR 4, IOBS 3, IPR 2 (IBD missing). Its alteration degree is 2.75 ∈ [2.5; 3], so Q can be compared to the stream sites represented in the C7 lattice. This is done by classifying Q within this lattice, as shown in Figure 3. Looking at the lattice in Figure 3, one can see that the Q site-query has four common values with only the BW2 site (Brunnwasser downstream). The expert qualification of BW2 (except for the IBD index) can thus be used to assess the Q site.
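To make this comparison concrete, here is a small Python sketch (our own illustration, not the authors' tool): it computes the alteration degree of the partial query Q and ranks the sites of Table 1 by the number of index classes they share with Q. The real system performs the comparison by inserting Q into the C7 lattice (Section 5.1); the function and variable names below are ours.

```python
# Sketch only: ranking the sites of Table 1 against a partial site-query.
C7 = {  # quality classes of the five biological indices (Table 1)
    "BW2": {"IBGN": 2, "IBMR": 4, "IOBS": 3, "IBD": 2, "IPR": 2},
    "IL1": {"IBGN": 3, "IBMR": 3, "IOBS": 3, "IBD": 2, "IPR": 3},
    "MO1": {"IBGN": 1, "IBMR": 4, "IOBS": 3, "IBD": 3, "IPR": 4},
    "MS2": {"IBGN": 2, "IBMR": 4, "IOBS": 5, "IBD": 2, "IPR": 2},
    "RT2": {"IBGN": 2, "IBMR": 5, "IOBS": 4, "IBD": 2, "IPR": 2},
    "ST1": {"IBGN": 1, "IBMR": 3, "IOBS": 4, "IBD": 3, "IPR": 2},
    "ZN4": {"IBGN": 1, "IBMR": 4, "IOBS": 4, "IBD": 3, "IPR": 2},
}

def alteration_degree(indices):
    """Average of the available biological index classes."""
    return sum(indices.values()) / len(indices)

def shared_values(query, site):
    """Number of (index, class) pairs common to the query and a site."""
    return sum(1 for k, v in query.items() if site.get(k) == v)

# Partial query Q of Section 4.2 (IBD missing).
Q = {"IBGN": 2, "IBMR": 4, "IOBS": 3, "IPR": 2}
print("alteration degree of Q:", alteration_degree(Q))   # 2.75, within [2.5; 3]

# Rank the sites of the C7 context by the number of values shared with Q.
for site in sorted(C7, key=lambda s: shared_values(Q, C7[s]), reverse=True):
    print(site, shared_values(Q, C7[site]))   # BW2 shares all 4 known values
```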
The Q site could thus be assessed as follows: the habitat quality and the physico-chemical quality of the water are good, except for nutrients (nitrate and mineral forms of phosphorus), whose quality is medium; the sediment quality is medium; the resilience potential of the general ecosystem is good, while the resilience potential of the sediments is deteriorated.

5 Implementation

5.1 Algorithms

As explained before, the built lattices have to be queried for assessing new sites. Furthermore, they may have to be updated, by adding a new site or by modifying an existing site. The new or updated object is described by attributes which may or may not exist in the context of the lattice. In this paper we only consider the case where the attributes already exist. Two algorithms described by Carpineto and Romano [10] have been implemented: the first one adds a new object to a lattice, while the second one deletes an object from a lattice.

Fig. 3: The C7 lattice with the Q site-query inserted

The first algorithm adds a new object into an existing Galois lattice, which can be interpreted as classifying a new object. It takes as input a Galois lattice and the new object with its attributes. The output is the updated Galois lattice of the new context. The mechanism of the algorithm is as follows. The set of concepts is divided into subsets according to their intent cardinality, and then analysed in ascending order. For each concept of a subset, if the intent is included in or equal to the set of attributes of the new object, then the current concept extent is augmented with the new object; otherwise a new concept is created, after verifying that such a concept is neither in the initial set of concepts nor among the newly added ones. The intent of this new concept is determined by the intersection of the current concept intent and the new object attributes; its extent is defined by the current concept extent augmented with the new object. After the addition of a new concept, a new link between this concept and the current concept is created. The links with neighbouring concepts are also updated.

The second algorithm deletes an object from a lattice. It takes as input a Galois lattice and the object to be removed. The output is the updated Galois lattice of the new context. The mechanism of the algorithm is as follows. For each concept, if the object to be deleted belongs to the current concept extent, then it is removed from this extent. If the modified concept then has the same extent as one of its children, it is deleted. When a concept is removed, the links among the concepts are updated.

The modification of an existing object in a Galois lattice is performed in two steps: (1) deleting this object using the second algorithm; (2) adding the updated object using the first algorithm. The whole process could be improved with a third algorithm for adding attributes into the lattice context, which would enrich the initial lattice with new information.
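As an illustration of what the insertion step produces (our own sketch, not the implementation of [10]), the following Python code recomputes the whole concept set of a toy context by brute force before and after adding a new site, and reports which concepts are kept and which are newly created; the incremental algorithms obtain the same concept set without this recomputation and also maintain the Hasse diagram. The toy context, the attribute names and the helper functions are ours.

```python
# Illustration only: the effect of inserting a new object, shown by brute force.
from itertools import combinations

def intent(objs, context, attributes):
    """Attributes shared by all objects in `objs` (all attributes if empty)."""
    result = set(attributes)
    for o in objs:
        result &= context[o]
    return frozenset(result)

def extent(attrs, context):
    """Objects possessing all attributes in `attrs`."""
    return frozenset(o for o, props in context.items() if attrs <= props)

def concepts(context):
    """All formal concepts of a (tiny) context, by enumerating object subsets."""
    attributes = set().union(*context.values())
    intents = {intent(sub, context, attributes)
               for r in range(len(context) + 1)
               for sub in combinations(context, r)}
    return {(extent(i, context), i) for i in intents}

# A toy context: sites described by (index, class) attributes.
ctx = {"BW2": {"IBGN 2", "IBMR 4", "IOBS 3"},
       "MS2": {"IBGN 2", "IBMR 4", "IOBS 5"}}
before = concepts(ctx)
ctx["Q"] = {"IBGN 2", "IOBS 3"}          # the new site to classify (existing attributes only)
after = concepts(ctx)

for ext, itn in sorted(after, key=lambda c: len(c[1])):
    status = "new" if all(itn != i for _, i in before) else "updated/kept"
    print(status, sorted(ext), sorted(itn))
```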
5.2 User interface and manipulation

The user interface can load a lattice either stored in the database or stored in an XML file with the structure used in the Galicia software (http://www.iro.umontreal.ca/~galicia/). Three main functional views are provided to the user. The first one is used to qualify concepts, i.e. to describe the profile of a set of sites. The second one is used to define a query, i.e. a new site to be assessed according to an existing lattice. The third view is used to explore the result of the query, i.e. to compare the characteristics of the new site with those of the already assessed sites. Currently, the texts appearing on the interface views are written in French, since the target users are French; other languages could be supported in the future.

The functional view for qualifying concepts is presented in Figure 4. Once a lattice is chosen, it is possible to select a given concept in a list and to see its description (intent, extent, and comment). The lists of the parents and children of that concept are also shown, and a click on one of them displays its related information. This information may help the experts in qualifying the concept. The comment is then stored in the database.

Fig. 4: Qualifying the concepts of the site lattice

The functionality for classifying a new site based on its values (for one or several indices) is presented in Figure 5. One has first to select a lattice and to give a name to the new site, and then to provide a description of this new site by choosing indices and their values. Once this is done, it is possible to classify the site, that is, to integrate it in the lattice, either temporarily or by saving it in the lattice. The button "Classer" triggers this classification. To interpret the result, the button "Visualiser le résultat" can be used to see the new lattice with the modifications shown in a specific colour. The button "Explorer le treillis" also helps in the interpretation by giving access to a third view (Figure 6) where it is possible to navigate within the concepts and to see the description of the parents and children of the current concept.

Fig. 5: Definition of the Q site-query

More precisely, the third view gives access only to the modified or new concepts of the lattice, i.e. the concepts where the site-query is represented. These concepts can be commented and the modified lattice can be stored in the database. Finally, the commented lattices can be exported in various formats to be further analysed.

6 Discussion

We decided to implement a specific tool for several reasons: 1. the tool has to be interconnected with a database and to offer a user-friendly interface for hydro-ecologists, allowing them to annotate the concepts; 2. the purpose of the tool is not to navigate through the whole database; 3. this is a two-stage tool: the first stage organises specific information within a lattice; the second stage allows the user to explore and possibly modify this lattice.

Fig. 6: Analysing the classification result of the Q query

Regarding the first point, lattice-builder tools like Galicia, ConExp, or the Toscana suite (http://toscanaj.sourceforge.net/) cannot be used as such, since they do not fit the requirements of hydro-ecologists. Actually, as said before, we have used Galicia to build the lattices, which are then recorded in the database to be annotated and explored by hydro-ecologists. Besides, the lattices built through our tool can be exported into a Galicia format.
Regarding the second point, our approach differs from those used in search or browsing tools like Camelis [13], Abilis [6], D-SIFT [12] or the Virtual Museum of the Pacific [26]. Indeed, we did not try to implement a lattice-based approach to explore the whole database, but only specific information from this database. This information was chosen by hydro-ecologists as a synthetic view of the database. Furthermore, the lattice is used as a basis to record expert knowledge (the annotations) that can be involved in further investigations.

Regarding the last point, our tool can be compared to Ulysses [9], a visual interface that gives access to a lattice structure organising information from a database. Ulysses allows the user to search the retrieval space both by browsing and by querying, whereas our tool only allows querying. Nevertheless, the originality of our tool is that the user can modify and annotate the lattice concepts.

Finally, the underlying aim of our approach is to build an ontology gathering the knowledge of various experts on hydro-ecosystems. Each expert indeed focuses on a specific compartment of the hydro-ecosystems (e.g. fishes, macrophytes, diatoms, ...) and a generic tool is needed to combine their expertise and produce a global assessment of the ecological state of a stream site.

7 Conclusion

This paper presents a lattice-based query system for helping the assessment of hydro-ecosystems. The approach relies on a database storing various information on stream sites of the Alsace plain. These data are summarised by qualitative indices: biological indices as well as physico-chemical and physical indices. Based on these indices and their own expertise, hydro-ecologists can perform a global evaluation of the functioning of a stream ecosystem. Furthermore, they want to define quality profiles of streams or water areas that could be used to assess new sites. Hence, a tool is needed to support the whole process.

Our work aims at building such a tool. Concept lattices appeared as a good approach since they make it possible to build a hierarchical clustering of the sites, to navigate through the clusters, and to perform queries that help the assessment of a new site. The clustering aspects have already proved to be interesting, and the user interface for commenting and querying the lattices is currently being tested by hydro-ecologists. In the future, several lattices will have to be built, including various sets of indices (physico-chemical and physical indices). Furthermore, the whole approach will be tested with stream or water area data from other regions in France. Regarding the implementation aspects, the system should be improved in two ways: allowing the integration of new attributes into an existing lattice, and allowing the navigation through bigger lattices. Finally, improvements can be made to generate comments on the site-queries automatically, based on the comments of the neighbouring concepts.

Acknowledgements

The Indice project (2006-11) was supported by the Agence de l'Eau Rhin-Meuse. We also acknowledge the scientific and technical help of the Cemagref Centre in Lyon, the Gabriel Lippmann Public Research Centre in Luxembourg and the regional delegation of ONEMA (Office National de l'Eau et des Milieux Aquatiques). Cristina Nica's stay in France was supported by the Erasmus European program.
We acknowledge the anonymous reviewers who helped us to improve our paper.

References

1. AFNOR: Qualité de l'eau : détermination de l'Indice Biologique Global Normalisé (IBGN). NF T90-350 (1992), révision 2004
2. AFNOR: Qualité de l'eau : détermination de l'Indice Biologique Diatomées (IBD). NF T90-354 (2000), révision 2007
3. AFNOR: Qualité de l'eau : détermination de l'Indice Oligochètes de Bioindication des Sédiments (IOBS). NF T90-390 (2002)
4. AFNOR: Qualité de l'eau : détermination de l'Indice Biologique Macrophytique en Rivière (IBMR). NF T90-395 (2003)
5. AFNOR: Qualité de l'eau : détermination de l'Indice poissons rivière (IPR). NF T90-344 (2004)
6. Allard, P., Ferré, S., Ridoux, O.: Discovering Functional Dependencies and Association Rules by Navigating in a Lattice of OLAP Views. In: Kryszkiewicz, M., Obiedkov, S. (eds.) Proceedings of CLA 2010, Sevilla, Spain, pp. 199–210 (2010)
7. Barbut, M., Monjardet, B.: Ordre et classification – Algèbre et combinatoire. Hachette (1970)
8. Bedel, O., Ferré, S., Ridoux, O., Quesseveur, E.: GEOLIS: a logical information system for geographical data. Revue Internationale de Géomatique 17, 371–390 (2007)
9. Carpineto, C., Romano, G.: ULYSSES: A Lattice-based Multiple Interaction Strategy Retrieval Interface. In: Blumenthal, B., Gornostaev, J., Unger, C. (eds.) Human-Computer Interaction, 5th International Conference, EWHCI'95, Moscow, Russia. LNCS, vol. 1015, pp. 91–104. Springer-Verlag (1995)
10. Carpineto, C., Romano, G.: Concept Data Analysis. Theory and Applications. John Wiley & Sons Ltd (2004), 201 pages
11. Carpineto, C., Romano, G.: Using concept lattices for text retrieval and mining. In: Ganter, B., Stumme, G., Wille, R. (eds.) Formal Concept Analysis, LNCS, vol. 3626, pp. 3–45. Springer Berlin / Heidelberg (2005)
12. Ducrou, J., Wormuth, B., Eklund, P.: D-SIFT: A Dynamic Simple Intuitive FCA Tool. In: Dau, F., Mugnier, M.L., Stumme, G. (eds.) Conceptual Structures: Common Semantics for Sharing Knowledge – Proceedings of ICCS 2005. LNAI, vol. 3596, pp. 295–306. Springer-Verlag (2005)
13. Ferré, S.: Camelis: a logical information system to organise and browse a collection of documents. International Journal of General Systems 38(4), 379–403 (2009)
14. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer Verlag (1999)
15. Godin, R., Missaoui, R., Alaoui, H.: Incremental concept formation algorithm based on Galois (concept) lattices. Computational Intelligence 11(2), 246–267 (1995)
16. Godin, R., Missaoui, R., April, A.: Experimental comparison of navigation in a Galois lattice with conventional information retrieval method. International Journal of Man-Machine Studies 38, 747–767 (1993)
17. Grac, C., Braud, A., Le Ber, F., Trémolières, M.: Un système d'information pour le suivi et l'évaluation de la qualité des cours d'eau – Application à l'hydro-région de la plaine d'Alsace. RSTI - Ingénierie des Systèmes d'Information 16, 9–30 (2011)
18. Grac, C., Le Ber, F., Braud, A., Trémolières, M., Bertaux, A., Herrmann, A., Manné, S., Lafont, M.: Programme de recherche-développement Indices – rapport scientifique final. Contrat pluriannuel 1463 de l'Agence de l'Eau Rhin-Meuse, LHYGES – LSIIT – ONEMA – CEMAGREF (2011)
19. Kuznetsov, S.O., Obiedkov, S.A.: Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intelligence 14(2-3), 189–216 (2002)
20.
Lafont, M.: A conceptual approach to the biomonitoring of freshwater: the ecolog- ical ambience system. Journal of Limnology 6, 17–24 (2001) 21. Lafont, M., Jézéquel, C., Vivier, A., Breil, P., Schmitt, L., Bernoud, S.: Refinement of biomonitoring of urban water courses by combining descriptive and ecohydro- logical approaches. Ecohydrol. Hydrobiol. 10, 3–11 (2010) A lattice-based query system forLattice-based assessing theassessment quality ofofhydro-ecosystems hydro-ecosystems 13 277 22. MEDD: Système d’évaluation de la qualité de l’eau des cours d’eau (SEQ-Eau), version 2. Ministère de l’Ecologie et du Développement Durable et Agences de l’Eau (2003), Étude inter-agences de l’eau, no 52 23. MEDD: Circulaire dce 2007/22 du 11 avril 2007 relative au protocole de prélèvement et de traitement des échantillons des invertébrés pour la mise en œu- vre du programme de surveillance sur cours d’eau. Ministère de l’Ecologie et du Développement Durable (2007) 24. Napoli, A.: A smooth introduction to symbolic methods in knowledge discovery. In: Cohen, H., Lefebvre, C. (eds.) Categorization in Cognitive Science. Elsevier (2006) 25. Priss, U.: Lattice-based information retrieval. Knowledge Organization 27(3), 132142 (2000) 26. Wray, T., Eklund, P.: Exploring the Information Space of Cultural Collections Using Formal Concept Analysis. In: Valtchev, P., Jäschke, R. (eds.) Proceedings of 9th International Conference on Formal Concept Analysis, ICFCA 2011, Nicosia, Cyprus. LNAI, vol. 6628, pp. 251–266. Springer-Verlag (2011) The Word Problem in Semiconcept Algebras Philippe Balbiani CNRS — Université de Toulouse Institut de recherche en informatique de Toulouse 118 ROUTE DE NARBONNE, 31062 TOULOUSE CEDEX 9, France Philippe.Balbiani@irit.fr Abstract. The aim of this article is to prove that the word problem in semiconcept algebras is PSPACE-complete. Keywords: Formal concept analysis, semiconcept algebras, word problem, de- cidability/complexity. 1 Introduction In formal concept analysis [2, 3], the properties of formal contexts are reflected by the properties of the concept lattices they give rise to [10, 12]. Extending concept lattices to protoconcept algebras and semiconcept algebras, Herrmann et al. [5] and Wille [11] introduced negations in conceptual structures based on formal contexts such as double Boolean algebras and pure double Boolean algebras. These algebras have attracted interest for their theoretical merits — basic representations have been obtained — and for their practical relevance — applications in the field of knowledge representation and reasoning have been developed [5–7, 9, 11]. The basic representations of protoconcept algebras and semiconcept algebras evoked above have been obtained by means of equational axioms. Hence, the problem naturally arises of whether there is an algorithm which given terms s, t, decides whether they represent the same element in all models of these equa- tional axioms. Such a problem is called the word problem (WP) in protoconcept algebras or in semiconcept algebras. In Mathematics and Computer Science, word problems are of the utmost importance. Within the context of protoconcept algebras, Vormbrock [8] demonstrates that given terms s, t, if s = t is not valid in all protoconcept algebras then there exists a finite protoconcept algebra in which s = t is not valid. Nevertheless, the upper bound on the size of the finite protoconcept algebra given in [8, Page 258] is not elementary. 
Therefore, it does not allow us to conclude — as wrongly stated in [8, Page 240] — that the WP in protoconcept algebras is NP-complete. Switching over to semiconcept algebras, the aim of this article is to prove that the WP in semiconcept algebras is PSPACE-complete. Sections 2 and 3 show some of the basic properties of formal contexts and semi- concept algebras that have been discussed in [5–7, 9, 11]. In Section 4, we present c 2011 by the paper authors. CLA 2011, pp. 279–294. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 280 Philippe Balbiani the WP in semiconcept algebras. Section 5 introduces a basic 2-sorted modal logic that will be used in Sections 6 and 7 to prove that the WP in semiconcept algebras is PSPACE-complete. The proofs of Lemmas 10, 11, 12 and 13 can be found in the annex. 2 From Formal Contexts to Semiconcept Algebras In formal concept analysis, the properties of semiconcepts are reflected by the properties of the algebras they give rise to. 2.1 Formal Contexts Formal contexts are structures of the form IK = (G, M, ∆) where G is a nonempty set (with typical member denoted g), M is a nonempty set (with typical member denoted m) and ∆ is a binary relation between G and M . The elements of G are called “objects”, the elements of M are called “attributes” and the intended meaning of g ∆ m is “object g possesses attribute m”. ∆ a1 a2 o1 × × o2 × Tab. 1. Example 1. In Tab. 1 is an example of a formal context IK2,2 with 2 objects — o1 and o2 — and 2 attributes — a1 and a2 . For all X ⊆ G and for all Y ⊆ M , let X . = {m ∈ M : for all g ∈ G, if g ∈ X then g ∆ m} Y / = {g ∈ G: for all m ∈ M , if m ∈ Y then g ∆ m} That is to say, X . is the set of all attributes possessed by all objects in X and Y / is the set of all objects possessing all attributes in Y . Example 2. In the formal context IK2,2 of Tab. 1, {o1 }. = {a1 , a2 } and {a2 }/ = {o1 }. To carry out our plan, we need to learn a little more about the pair (. ,/ ) of maps . : 2G 7→ 2M and / : 2M 7→ 2G . Obviously, for all X ⊆ G and for all Y ⊆ M, – X ⊆ Y / iff X . ⊇ Y . Hence, the pair (. ,/ ) of maps . : 2G 7→ 2M and / : 2M 7→ 2G is a Galois connection between (2G , ⊆) and (2M , ⊇). Thus, for all X, X1 , X2 ⊆ G and for all Y, Y1 , Y2 ⊆ M, – if X1 ⊆ X2 then X1. ⊇ X2. , – if Y1 ⊇ Y2 then Y1/ ⊆ Y2/ , – X ⊆ X ./ and X . = X ./. , – Y /. ⊇ Y and Y / = Y /./ . The word problem in semiconcept algebras 281 2.2 Semiconcept Algebras Let IK = (G, M, ∆) be a formal context. Given X ⊆ G, the pair (X, X . ) is called “left semiconcept of IK”. Remark that (∅, M ) is a left semiconcept of IK. Let Hl (IK) = (Hl (IK), ⊥l , >l , ¬l , ∨l , ∧l ) be the algebraic structure of type (0, 0, 1, 2, 2) where Hl (IK) is the set of all left semiconcepts of IK, ⊥l = (∅, M ), >l = (G, G. ), ¬l (X, X . ) = (G \ X, (G \ X). ), (X1 , X1. ) ∨l (X2 , X2. ) = (X1 ∪ X2 , (X1 ∪ X2 ). ) and (X1 , X1. ) ∧l (X2 , X2. ) = (X1 ∩ X2 , (X1 ∩X2 ). ). Remark that if G is finite then Hl (IK) is finite too and moreover, | Hl (IK) | = 2|G| . It is a simple exercise to check that the above operations ⊥l , >l , ¬l ·, · ∨l · and · ∧l · on Hl (IK) are isomorphic to the Boolean operations ∅, G, G \ ·, · ∪ · and · ∩ · on 2G . Hence, Hl (IK) satisfies the conditions of nondegenerate Boolean algebras. Given Y ⊆ M , the pair (Y / , Y ) is called “right semiconcept of IK”. Remark that (G, ∅) is a right semiconcept of IK. 
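These definitions are straightforward to experiment with. The following Python sketch (our own illustration; we write the derivation X ↦ X^. as up(X) and Y ↦ Y^/ as down(Y)) encodes the context IK2,2 of Tab. 1, reproduces Example 2, and checks the Galois-connection properties listed above on every subset.

```python
# Illustrative sketch (our notation): the derivation operators of IK_{2,2}
# (objects o1, o2; attributes a1, a2; o1 has a1 and a2, o2 has a1 only).
from itertools import combinations

G = {"o1", "o2"}
M = {"a1", "a2"}
Delta = {("o1", "a1"), ("o1", "a2"), ("o2", "a1")}

def up(X):     # X ↦ X^. : attributes possessed by every object of X
    return frozenset(m for m in M if all((g, m) in Delta for g in X))

def down(Y):   # Y ↦ Y^/ : objects possessing every attribute of Y
    return frozenset(g for g in G if all((g, m) in Delta for m in Y))

print(up({"o1"}))     # {a1, a2}, as in Example 2
print(down({"a2"}))   # {o1},     as in Example 2

def subsets(S):
    S = sorted(S)
    return [frozenset(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# Galois-connection properties: X ⊆ X^{./} and X^. = X^{./.} (and dually for Y).
for X in subsets(G):
    assert X <= down(up(X)) and up(X) == up(down(up(X)))
for Y in subsets(M):
    assert Y <= up(down(Y)) and down(Y) == down(up(down(Y)))
```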
Let Hr (IK) = (Hr (IK), ⊥r , >r , ¬r , ∨r , ∧r ) be the algebraic structure of type (0, 0, 1, 2, 2) where Hr (IK) is the set of all right semiconcepts of IK, ⊥r = (M / , M ), >r = (G, ∅), ¬r (Y / , Y ) = ((M \Y )/ , M \Y ), (Y1/ , Y1 ) ∨r (Y2/ , Y2 ) = ((Y1 ∩ Y2 )/ , Y1 ∩ Y2 ) and (Y1/ , Y1 ) ∧r (Y2/ , Y2 ) = ((Y1 ∪ Y2 )/ , Y1 ∪ Y2 ). Remark that if M is finite then Hr (IK) is finite too and moreover, | Hr (IK) | = 2|M | . It is a simple exercise to check that the above operations ⊥r , >r , ¬r ·, · ∨r · and · ∧r · on Hr (IK) are anti-isomorphic to the Boolean operations ∅, M , M \·, ·∪· and ·∩· on 2M . Hence, Hr (IK) satisfies the conditions of nondegenerate Boolean algebras. Now, for the concept underlying most of our work in this article. Given X ⊆ G and Y ⊆ M , the pair (X, Y ) is called “semiconcept of IK” iff Y = X . or X = Y / . Remark that (∅, M ) and (G, ∅) are semiconcepts of IK. Example 3. In the formal context IK2,2 of Tab. 1, the semiconcepts are (∅, {a1 , a2 }), ({o1 }, {a1 , a2 }), ({o2 }, {a1 }), ({o1 }, {a2 }), ({o1 , o2 }, {a1 }) and ({o1 , o2 }, ∅). Let H(IK) = (H(IK), ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) be the algebraic structure of type (0, 0, 0, 0, 1, 1, 2, 2, 2, 2) where H(IK) is the set of all semiconcepts of IK, ⊥l = (∅, M ), ⊥r = (M / , M ), >l = (G, G. ), >r = (G, ∅), ¬l (X, Y ) = (G \ X, (G \ X). ), ¬r (X, Y ) = ((M \ Y )/ , M \ Y ), (X1 , Y1 ) ∨l (X2 , Y2 ) = (X1 ∪ X2 , (X1 ∪ X2 ). ), (X1 , Y1 ) ∨r (X2 , Y2 ) = ((Y1 ∩ Y2 )/ , Y1 ∩ Y2 ), (X1 , Y1 ) ∧l (X2 , Y2 ) = (X1 ∩ X2 , (X1 ∩ X2 ). ) and (X1 , Y1 ) ∧r (X2 , Y2 ) = ((Y1 ∪ Y2 )/ , Y1 ∪ Y2 ). Example 4. In the formal context IK2,2 of Tab. 1, ⊥l = (∅, {a1 , a2 }), >l = ({o1 , o2 }, {a1 }), ⊥r = ({o1 }, {a1 , a2 }) and >r = ({o1 , o2 }, ∅). Remark that if G, M are finite then H(IK) is finite too and moreover, | H(IK) | ≤ 2|G| + 2|M | . Obviously, the operations ⊥l , >l , ¬l ·, · ∨l · and · ∧l ·, when restricted to the set of all left semiconcepts of IK, are isomorphic to the Boolean operations 282 Philippe Balbiani ∅, G, G \ ·, · ∪ · and · ∩ · on 2G whereas the operations ⊥r , >r , ¬r ·, · ∨r · and · ∧r ·, when restricted to the set of all right semiconcepts of IK, are anti-isomorphic to the Boolean operations ∅, M , M \ ·, · ∪ · and · ∩ · on 2M . In other respects, it is a simple matter to check that H(IK) satisfies the following conditions for every x, y, z ∈ H(IK): – x ∧l (y ∧l z) = (x ∧l y) ∧l z and x ∨r (y ∨r z) = (x ∨r y) ∨r z, – x ∧l y = y ∧l x and x ∨r y = y ∨r x, – ¬l (x ∧l x) = ¬l x and ¬r (x ∨r x) = ¬r x, – x ∧l (y ∧l y) = x ∧l y and x ∨r (y ∨r y) = x ∨r y, – x ∧l (y ∨l z) = (x ∧l y) ∨l (x ∧l z) and x ∨r (y ∧r z) = (x ∨r y) ∧r (x ∨r z), – x ∧l (x ∨l y) = x ∧l x and x ∨r (x ∧r y) = x ∨r x, – x ∧l (x ∨r y) = x ∧l x and x ∨r (x ∧l y) = x ∨r x, – ¬l (¬l x ∧l ¬l y) = x ∨l y and ¬r (¬r x ∨r ¬r y) = x ∧r y, – ¬l ⊥l = >l and ¬r >r = ⊥r , – ¬l >r = ⊥l and ¬r ⊥l = >r , – >r ∧l >r = >l and ⊥l ∨r ⊥l = ⊥r , – x ∧l ¬l x = ⊥l and x ∨r ¬r x = >r , – ¬l ¬l (x ∧l y) = x ∧l y and ¬r ¬r (x ∨r y) = x ∨r y, – (x ∧l x) ∨r (x ∧l x) = (x ∨r x) ∧l (x ∨r x), – x ∧l x = x or x ∨r x = x. Let us remark that the first 13 above conditions come in pairs of mirror images obtained by interchanging ⊥l with >r , >l with ⊥r , ¬l with ¬r , ∨l with ∧r and ∧l with ∨r whereas the last 2 above conditions are equivalent to their own mirror images. 
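As a complement, the semiconcepts of IK2,2 and some of the operations of H(IK2,2) can be enumerated and checked mechanically. The sketch below is only our illustration (identifiers such as bot_l and top_r, and the helper functions, are ours); it verifies a few of the equations listed above on all six semiconcepts of the example context.

```python
# Our illustration: the operations of H(IK) on the semiconcepts of IK_{2,2},
# and a mechanical check of some of the equations listed above.
from itertools import chain, combinations

G = frozenset({"o1", "o2"}); M = frozenset({"a1", "a2"})
Delta = {("o1", "a1"), ("o1", "a2"), ("o2", "a1")}
up   = lambda X: frozenset(m for m in M if all((g, m) in Delta for g in X))
down = lambda Y: frozenset(g for g in G if all((g, m) in Delta for m in Y))
subsets = lambda S: chain.from_iterable(
    combinations(sorted(S), r) for r in range(len(S) + 1))

# Semiconcepts = left semiconcepts (X, X^.) ∪ right semiconcepts (Y^/, Y).
H = ({(frozenset(X), up(X)) for X in subsets(G)}
     | {(down(Y), frozenset(Y)) for Y in subsets(M)})

bot_l = (frozenset(), M);   top_l = (G, up(G))          # ⊥l, ⊤l
bot_r = (down(M), M);       top_r = (G, frozenset())    # ⊥r, ⊤r
neg_l  = lambda x: (G - x[0], up(G - x[0]))             # ¬l
neg_r  = lambda x: (down(M - x[1]), M - x[1])           # ¬r
meet_l = lambda x, y: (x[0] & y[0], up(x[0] & y[0]))    # ∧l
join_r = lambda x, y: (down(x[1] & y[1]), x[1] & y[1])  # ∨r

assert neg_l(bot_l) == top_l and neg_r(top_r) == bot_r  # ¬l ⊥l = ⊤l, ¬r ⊤r = ⊥r
for x in H:
    assert meet_l(x, neg_l(x)) == bot_l                 # x ∧l ¬l x = ⊥l
    assert join_r(x, neg_r(x)) == top_r                 # x ∨r ¬r x = ⊤r
    assert meet_l(x, x) == x or join_r(x, x) == x       # x ∧l x = x or x ∨r x = x
print(len(H), "semiconcepts; the checked equations hold on IK_{2,2}")
```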
This leads us to the principle of duality stating that from any condition provable from the 15 above conditions, another such condition results immediately by interchanging ⊥l with >r , >l with ⊥r , ¬l with ¬r , ∨l with ∧r and ∧l with ∨r . The set H(IK) can be ordered by the binary relation v defined by (X1 , Y1 ) v (X2 , Y2 ) iff X1 ⊆ X2 and Y1 ⊇ Y2 for every (X1 , Y1 ), (X2 , Y2 ) ∈ H(IK). Obviously, for all (X1 , Y1 ), (X2 , Y2 ) ∈ H(IK), – (X1 , Y1 ) v (X2 , Y2 ) iff (X1 , Y1 )∧l (X2 , Y2 ) = (X1 , Y1 )∧l (X1 , Y1 ) and (X1 , Y1 ) ∨r (X2 , Y2 ) = (X2 , Y2 ) ∨r (X2 , Y2 ), – if (X1 , Y1 ) ∈ Hl (IK) then (X1 , Y1 ) v (X2 , Y2 ) iff (X1 , Y1 ) ∧l (X2 , Y2 ) = (X1 , Y1 ), – if (X2 , Y2 ) ∈ Hr (IK) then (X1 , Y1 ) v (X2 , Y2 ) iff (X1 , Y1 ) ∨r (X2 , Y2 ) = (X2 , Y2 ). Moreover, the binary relation v is reflexive, antisymmetric and transitive on H(IK). In order to give an abstract characterization of the operations ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l and ∧r , we shall say that an algebraic structure D = (D, ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) of type (0, 0, 0, 0, 1, 1, 2, 2, 2, 2) is a pure double Boolean algebra iff the operations ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l and ∧r satisfy the 15 above conditions. The word problem in semiconcept algebras 283 3 From Semiconcept Algebras to Formal Contexts The aim of this section is to give an abstract characterization of the operations ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l and ∧r . 3.1 Filters and Ideals Let D = (D, ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) be a pure double Boolean alge- bra. We define Dl = {x ∧l x: x ∈ D} Dr = {x ∨r x: x ∈ D} Intuitively, elements of Dl can be considered as sets of objects and elements of Dr can be considered as sets of attributes. Example 5. In the semiconcept algebra associated to the formal context IK2,2 of Tab. 1, D2,2 = {(∅, {a1 , a2 }), ({o1 }, {a1 , a2 }), ({o2 }, {a1 }), ({o1 }, {a2 }), ({o1 , o2 }, {a1 }), ({o1 , o2 }, ∅)}, Dl2,2 = {(∅, {a1 , a2 }), ({o1 }, {a1 , a2 }), ({o2 }, {a1 }), ({o1 , o2 }, {a1 })} and Dr2,2 = {({o1 }, {a1 , a2 }), ({o1 , o2 }, {a1 }), ({o1 }, {a2 }), ({o1 , o2 }, ∅)}. Obviously, the operations ⊥l , >l , ¬l , ∨l and ∧l are stable on Dl and the opera- tions ⊥r , >r , ¬r , ∨r and ∧r are stable on Dr . Hence, the algebraic structures Dl = (Dl , ⊥l , >l , ¬l , ∨l , ∧l ) and Dr = (Dr , ⊥r , >r , ¬r , ∨r , ∧r ) are algebraic struc- tures of type (0, 0, 1, 2, 2). More precisely, they are Boolean algebras. Moreover, the set D can be ordered by the binary relation ≤ defined by x ≤ y iff x ∧l y = x ∧l x and x ∨r y = y ∨r y for every x, y ∈ D. Obviously, for all x, y ∈ D, – if x ∈ Dl then x ≤ y iff x ∧l y = x, – if y ∈ Dr then x ≤ y iff x ∨r y = y. Moreover, the binary relation ≤ is reflexive, antisymmetric and transitive on D. q ({o1 , o2 }, ∅) @ I @ q @q ({o1 }, {a2 })@ ({o1 , o2 }, {a1 }) I @ @ @ I@ @q @q @ ({o1 }, {a1 , a@ 2 }) ({o2 }, {a1 }) I @ @ @q @ (∅, {a1 , a2 }) Fig. 1. 284 Philippe Balbiani Example 6. In Fig. 1 is represented the binary relation ≤2,2 ordering the set D2,2 of the semiconcept algebra associated to the formal context IK2,2 of Tab. 1. A nonempty subset F of D is called a filter iff for all x, y ∈ D, – x, y ∈ F implies x ∧l y ∈ F , – x ∈ F and x ≤ y imply y ∈ F . A nonempty subset I of D is called an ideal iff for all x, y ∈ D, – x, y ∈ I implies x ∨r y ∈ I, – x ∈ I and y ≤ x imply y ∈ I. The following lemma explains how filters and ideals can be transformed into filters and ideals of the Boolean algebras Dl and Dr . Lemma 1. 
Let F, I be nonempty subsets of D. If F is a filter then F ∩ Dl is a filter of the Boolean algebra Dl and F ∩ Dr is a filter of the Boolean algebra Dr and if I is an ideal then I ∩ Dl is an ideal of the Boolean algebra Dl and I ∩ Dr is an ideal of the Boolean algebra Dr . Let F be a nonempty subset of Dl and I be a nonempty subset of Dr . We define [F ) = {x ∈ D: there exists y ∈ F such that y ≤ x} (I] = {x ∈ D: there exists y ∈ I such that x ≤ y} The following lemma explains how filters of the Boolean algebra Dl and ideals of the Boolean algebra Dr can be transformed into filters and ideals. Lemma 2. Let F be a nonempty subset of Dl , I be a nonempty subset of Dr . If F is a filter of the Boolean algebra Dl then [F ) is a filter and [F ) ∩ Dl = F and if I is an ideal of the Boolean algebra Dr then (I] is an ideal and (I] ∩ Dr = I. As a result, Lemma 3. There exists filters F such that F ∩Dl is a prime filter of the Boolean algebra Dl and there exists ideals I such that I∩Dr is a prime ideal of the Boolean algebra Dr . We shall say that D is concrete iff there exists a formal context IK and a function h assigning to each element of D an element of H(IK) such that h is injective and h is a homomorphism from D to H(IK). 3.2 Representation Now, the main question is to prove that every pure double Boolean algebra is concrete. Let D = (D, ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) be a pure double Boolean algebra and consider the formal context IK(D) = (Fp (D), Ip (D), ∆) The word problem in semiconcept algebras 285 where Fp (D) is the set of all filters F for which F ∩ Dl is a prime filter of the Boolean algebra Dl , Ip (D) is the set of all ideals I for which I ∩ Dr is a prime ideal of the Boolean algebra Dr and F ∆ I iff F ∩ I is nonempty. Let H(IK(D)) = (H(IK(D)), ⊥0l , ⊥0r , >0l , >0r , ¬0l , ¬0r , ∨0l , ∨0r , ∧0l , ∧0r ) For all elements x of D, let Fx = {F ∈ Fp (D): x ∈ F } Ix = {I ∈ Ip (D): x ∈ I} Here, the first results are Lemma 4. Let x ∈ D. Fx∧l x = Fx and Ix∨r x = Ix . Lemma 5. Let x ∈ D. If x ∈ Dl then Fx. = Ix and if x ∈ Dr then Ix/ = Fx . Lemma 6. Let x ∈ D. F¬.l ¬l x = I¬l ¬l x and I¬/ r ¬r x = F¬r ¬r x . The next lemmas point the way to the strategy followed in our approach to the proof that every pure double Boolean algebra is concrete. Lemma 7. Let x ∈ D. The pair (Fx , Ix ) is a semiconcept of IK(D). Lemma 8. Let x, y ∈ D. If x 6= y then (Fx , Ix ) 6= (Fy , Iy ). For all x ∈ D, let h(x) = (Fx , Ix ) The next lemma is central for proving that the function h is a homomorphism from D to H(IK). Lemma 9. Let x, y ∈ D. – F⊥l = ∅ and I⊥l = Ip (D), – F⊥r = Ip (D)/ and I⊥r = Ip (D), – F>l = Fp (D) and I>l = Fp (D). , – F>r = Fp (D) and I>r = ∅, – F¬l x = Fp (D) \ Fx and I¬l x = (Fp (D) \ Fx ). , – F¬r x = (Ip (D) \ Ix )/ and I¬r x = Ip (D) \ Ix , – Fx∨l y = Fx ∪ Fy and Ix∨l y = (Fx ∪ Fy ). , – Fx∨r y = (Ix ∩ Iy )/ and Ix∨r y = Ix ∩ Iy , – Fx∧l y = Fx ∩ Fy and Ix∧l y = (Fx ∩ Fy ). , – Fx∧r y = (Ix ∪ Iy )/ and Ix∧r y = Ix ∪ Iy . As a result, Theorem 1. The function h is a homomorphism from D to H(IK). In other words: every pure double Boolean algebra is concrete. 286 Philippe Balbiani 4 The Word Problem in Pure Double Boolean Algebras Let us introduce the word problem in pure double Boolean algebras. 4.1 Syntax Let V ar denote a countable set of individual variables (with typical instances denoted x, y, etc). 
The set t(V ar) of all terms (with typical instances denoted s, t, etc) is given by the rule s ::= x | 0l | 0r | 1l | 1r | −l s | −r s | (s tl t) | (s tr t) | (s ul t) | (s ur t) Let us adopt the standard rules for omission of the parentheses. Example 7. For instance, x ul (x tr y) is a term. 4.2 Semantics Let D = (D, ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) be a pure double Boolean alge- bra. A valuation based on D is a function m assigning to each individual variable x an element m(x) of D. Example 8. The function m2,2 defined below is a valuation based on the pure double Boolean algebra D2,2 defined in Example 5: m2,2 (x) = ({o2 }, {a1 }), m2,2 (y) = ({o1 }, {a2 }) and for all individual variables z, if z 6= x, y then m2,2 (z) = ({o1 , o2 }, {a1 }). m induces a function (·)m assigning to each term s an element (s)m of D such that (x)m = m(x), (0l )m = ⊥l , (0r )m = ⊥r , (1l )m = >l , (1r )m = >r , (−l s)m = ¬l (s)m , (−r s)m = ¬r (s)m , (s tl t)m = (s)m ∨l (t)m , (s tr t)m = (s)m ∨r (t)m , (s ul t)m = (s)m ∧l (t)m and (s ur t)m = (s)m ∧r (t)m . Example 9. Concerning the valuation m2,2 defined in Example 8, we have (x tr 2,2 2,2 y)m = ({o1 , o2 }, ∅) and (−l x)m = ({o1 }, {a1 , a2 }). 4.3 The Word Problem Now, for the WP in pure double Boolean algebras: input: terms s, t, output: determine whether there exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m . A general strategy for proving a decision problem to be PSPACE-complete is first, to reduce to it a decision problem easily proved to be PSPACE-hard and second, to reduce it to a decision problem easily proved to be in PSPACE. PSPACE is the key complexity class of the satisfiability problem of numerous modal logics [1, Chapter 6]. Therefore, we introduce in Section 5 a PSPACE- complete modal logic and we show in Sections 6 and 7 how to reduce one into the other its satisfiability problem and the WP in pure double Boolean algebras. The word problem in semiconcept algebras 287 5 A Basic 2-Sorted Modal Logic In Section 3, we gave the proof that every pure double Boolean algebra can be homomorphically embedded into the pure double Boolean algebra over some formal context. Formal contexts are 2-sorted structures. Hence, the modal logic that will be used in Sections 6 and 7 for proving the WP in pure double Boolean algebras to be PSPACE-complete is a 2-sorted one. 5.1 Syntax The language of K2 is based on a countable set OV ar of object variables (with typical instances denoted P , Q, etc) and a countable set AV ar of attribute variables (with typical instances denoted p, q, etc). Without loss of generality, let us assume that OV ar and AV ar are disjoint. The set of all object formulas (with typical instances denoted A, B, etc) and the set of all attribute formulas (with typical instances denoted a, b, etc) are given by the rules A ::= P | ⊥ | ¬A | (A ∨ B) | 2a a ::= p | ⊥ | ¬a | (a ∨ b) | 2A The other Boolean constructs are defined as usual. Let us adopt the standard rules for omission of the parentheses. A formula (with typical instances denoted α, β, etc) is either an object formula or an attribute formula. The notion of “being a subformula of” is standard, the expression α β denoting the fact that α is a subformula of β. A substitution is a pair (Θ, θ) where Θ is a function assigning to each object variable P an object formula Θ(P ) and θ is a function assigning to each attribute variable p an attribute formula θ(p). 
(Θ, θ) induces a homomorphism (·)(Θ,θ) assigning to each formula α a formula (α)(Θ,θ) such that (P )(Θ,θ) = Θ(P ) and (p)(Θ,θ) = θ(p). Remark that for all object formulas A and for all attribute formulas a, – (A)(Θ,θ) is an object formula, – (a)(Θ,θ) is an attribute formula. Let OV ar = P1 , P2 , . . . be an enumeration of OV ar and AV ar = p1 , p2 , . . . be an enumeration of AV ar. We shall say that a substitution (Θ, θ) is normal with respect to OV ar and AV ar iff for all positive integers i, – Θ(Pi ) = Pi and θ(pi ) = 2Pi or Θ(Pi ) = 2pi and θ(pi ) = pi . Given a formula α, V ar(α) will denote the set of all variables occurring in α. A formula α is said to be nice iff – V ar(α) ⊆ OV ar or V ar(α) ⊆ AV ar. 288 Philippe Balbiani 5.2 Semantics Let IK = (G, M, ∆) be a formal context. A IK-valuation is a pair (V, v) of func- tions where V assigns to each object variable P a subset V (P ) of G and v assigns to each attribute variable p a subset v(p) of M . (V, v) induces a function (·)(V,v) assigning to each formula α a subset (α)(V,v) of G ∪ M such that (P )(V,v) = V (P ), (⊥)(V,v) = ∅, (¬A)(V,v) = G \ (A)(V,v) , (A ∨ B)(V,v) = (A)(V,v) ∪ (B)(V,v) , (2a)(V,v) = {g ∈ G: for all m ∈ M , if m ∈ (a)(V,v) then g ∆ m}, (p)(V,v) = v(p), (⊥)(V,v) = ∅, (¬a)(V,v) = M \ (a)(V,v) , (a ∨ b)(V,v) = (a)(V,v) ∪ (b)(V,v) and (2A)(V,v) = {m ∈ M : for all g ∈ G, if g ∈ (A)(V,v) then g ∆ m}. Remark that for all object formulas A and for all attribute formulas a, . – (A)(V,v) is a subset of G such that (A)(V,v) = (2A)(V,v) , / – (a)(V,v) is a subset of M such that (a)(V,v) = (2a)(V,v) . A formula α is said to be satisfiable iff – there exists a formal context IK = (G, M, ∆) and a IK-valuation (V, v) such that (α)(V,v) is nonempty. 5.3 Decision Now, for the nice satisfiability problem for K2 : input: a nice formula α, output: determine whether α is satisfiable. The next lemmas are central for proving that the problem of deciding equations in pure double Boolean algebras is PSPACE-complete. Theorem 2. The nice satisfiability problem for K2 is PSPACE-hard. Proof. A reduction similar to the reduction from the QBF -validity problem to the satisfiability problem for K considered in [1, Theorem 6.50] can be easily obtained. Now, for the satisfiability problem for K2 : input: a formula α, output: determine whether α is satisfiable. Theorem 3. The satisfiability problem for K2 is in PSPACE. Proof. An algorithm similar to the W itness algorithm considered in [1, Theorem 6.47] can be easily obtained. From Theorems 2 and 3, it follows immediately that the nice satisfiability prob- lem for K2 and the satisfiability problem for K2 are both PSPACE-complete. The word problem in semiconcept algebras 289 6 From K2 to Pure Double Boolean Algebras First, we consider the lower bound of the complexity of the problem of deciding the WP in pure double Boolean algebras. Given a nice formula α, we wish to construct a pair (s1 (α), s2 (α)) of terms such that α is satisfiable iff there exists a pure double Boolean algebra D and a valuation m based on D such that (s1 (α))m 6= (s2 (α))m . Let OV ar = P1 , P2 , . . . be an enumeration of OV ar, AV ar = p1 , p2 , . . . be an enumeration of AV ar and V ar = x1 , y1 , x2 , y2 , . . . be an enumeration of V ar. 
The function T (·) assigning to each nice object formula A a term T (A) and the function t(·) assigning to each nice attribute formula a a term t(a) are such that T (Pi ) = xi , T (⊥) = 0l , T (¬A) = −l T (A), T (A ∨ B) = T (A) tl T (B), T (2a) = −l −l −r −r t(a), t(pi ) = yi , t(⊥) = 1r , t(¬a) = −r t(a), t(a ∨ b) = t(a) ur t(b) and t(2A) = −r −r −l −l T (A). Let (s1 (·), s2 (·)) be the function assigning to each nice formula α a pair (s1 (α), s2 (α)) of terms such that if α is a nice object formula then s1 (α) = T (α) and s2 (α) = 0l and if α is a nice attribute formula then s1 (α) = t(α) and s2 (α) = 1r . Obviously, (s1 (α), s2 (α)) can be computed in space log | α |. Moreover, Proposition 1. If α is nice then α is satisfiable iff there exists a pure double Boolean algebra D and a valuation m based on D such that (s1 (α))m 6= (s2 (α))m . Proof. Since α is nice, V ar(α) ⊆ OV ar or V ar(α) ⊆ AV ar. Without loss of generality, let us assume that V ar(α) ⊆ OV ar. Hence, there exists a positive integer n such that V ar(α) ⊆ {P1 , . . . , Pn }. (⇒) Suppose α is satisfiable, we demonstrate there exists a pure double Boolean algebra D and a valuation m based on D such that (s1 (α))m 6= (s2 (α))m . Since α is satisfiable, there exists a formal context IK = (G, M, ∆) and a valuation (V, v) based on IK such that (α)(V,v) is nonempty. Let H(IK) = (H(IK), ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) and m be a valuation based on H(IK) such that for all positive integers i, if i ≤ n then m(xi ) = (V (Pi ), V (Pi ). ). We show first that Lemma 10. Let A be a nice object formula and a be a nice attribute formula. . If A α then (T (A))m = ((A)(V,v) , (A)(V,v) ) and if a α then (t(a))m = / ((a)(V,v) , (a)(V,v) ). Continuing the proof of Proposition 1, since (α)(V,v) is nonempty, by Lemma 10, if α is a nice object formula then (T (α))m 6= (0l )m and if α is a nice attribute formula then (t(α))m 6= (1r )m . Hence, (s1 (α))m 6= (s2 (α))m . Thus, there exists a pure double Boolean algebra D and a valuation m based on D such that (s1 (α))m 6= (s2 (α))m . (⇐) Suppose there exists a pure double Boolean algebra D and a valuation m based on D such that (s1 (α))m 6= (s2 (α))m , we demonstrate α is satisfiable. Let IK(D) = (Fp (D), Ip (D), ∆) and (V, v) be a valuation based on IK(D) such that for all positive integers i, if i ≤ n then V (Pi ) = Fm(xi ) . Interestingly, Lemma 11. Let A be a nice object formula and a be a nice attribute formula. If A α then (A)(V,v) = F(T (A))m and if a α then (a)(V,v) = I(t(a))m . 290 Philippe Balbiani Continuing the proof of Proposition 1, since (s1 (α))m 6= (s2 (α))m , if α is a nice object formula then (T (α))m 6= (0l )m and if α is a nice attribute formula then (t(α))m 6= (1r )m . Hence, by Lemma 11, (α)(V,v) is nonempty. Thus, α is satisfiable. This ends the proof of Proposition 1. Hence, (s1 (·), s2 (·)) is a reduction from the nice satisfiability problem for K2 to the WP in pure double Boolean algebras. Thus, by Theorem 2, Corollary 1. The WP in pure double Boolean algebras is PSPACE-hard. 7 From Pure Double Boolean Algebras to K2 Second, we consider the upper bound of the complexity of the WP in pure double Boolean algebras. Given a pair (s, t) of terms, we wish to construct an object formula O(s, t) and an attribute formula A(s, t) such that there exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m iff some instance of O(s, t) is satisfiable or some instance of A(s, t) is satisfiable. Let V ar = x1 , x2 , . . . 
be an enumeration of V ar, OV ar = P1 , P2 , . . . be an enumeration of OV ar and AV ar = p1 , p2 , . . . be an enumeration of AV ar. The function F (·) assigning to each term s an object formula F (s) and the function f (·) assigning to each term s an attribute formula f (s) are such that F (xi ) = Pi , f (xi ) = pi , F (0l ) = ⊥, f (0l ) = 2⊥, F (0r ) = 2>, f (0r ) = >, F (1l ) = >, f (1l ) = 2>, F (1r ) = 2⊥, f (1r ) = ⊥, F (−l s) = ¬F (s), f (−l s) = 2¬F (s), F (−r s) = 2¬f (s), f (−r s) = ¬f (s), F (s tl t) = F (s) ∨ F (t), f (s tl t) = 2(F (s) ∨ F (t)), F (str t) = 2(f (s)∧f (t)), f (str t) = f (s)∧f (t), F (sul t) = F (s)∧F (t), f (sul t) = 2(F (s)∧F (t)), F (sur t) = 2(f (s)∨f (t)) and f (sur t) = f (s)∨f (t). Let O(·, ·) be the function assigning to each pair (s, t) of terms the object formula O(s, t) such that O(s, t) = ¬(F (s) ↔ F (t)). Let A(·, ·) be the function assigning to each pair (s, t) of terms the attribute formula A(s, t) such that A(s, t) = ¬(f (s) ↔ f (t)). Obviously, O(s, t) and A(s, t) can be computed in space log | (s, t) |. Moreover, Proposition 2. There exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m iff there exists a substitution (Θ, θ) such that (Θ, θ) is normal with respect to OV ar and AV ar and O(s, t)(Θ,θ) is satisfiable or A(s, t)(Θ,θ) is satisfiable. Proof. Let n be a positive integer such that V ar(s) ∪ V ar(t) ⊆ {x1 , . . . , xn }. (⇒) Suppose there exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m , we demonstrate there exists a substitu- tion (Θ, θ) such that (Θ, θ) is normal with respect to OV ar and AV ar and O(s, t)(Θ,θ) is satisfiable or A(s, t)(Θ,θ) is satisfiable. Let (Θ, θ) be a normal sub- stitution with respect to OV ar and AV ar such that for all positive integers i, if i ≤ n then if m(xi ) is in Dl then Θ(Pi ) = Pi and θ(pi ) = 2Pi and if m(xi ) is in Dr then Θ(Pi ) = 2pi and θ(pi ) = pi . Let IK(D) = (Fp (D), Ip (D), ∆) and (V, v) be a valuation based on IK(D) such that for all positive integers i, if i ≤ n then V (Pi ) = Fm(xi ) and v(pi ) = Im(xi ) . Remark that for all positive integers i, The word problem in semiconcept algebras 291 (V,v) if i ≤ n then if m(xi ) is in Dl then (Pi )(Θ,θ) = (Pi )(V,v) = V (Pi ) = Fm(xi ) (Θ,θ) (V,v) / and if m(xi ) is in Dr then (Pi ) = (2pi )(V,v) = (pi )(V,v) = v(pi )/ = / Im(xi ) = Fm(xi ) . Similarly, for all positive integers i, if i ≤ n then if m(xi ) is in (V,v) . Dl then (pi )(Θ,θ) = (2Pi )(V,v) = (Pi )(V,v) = V (Pi ). = Fm(x . i) = Im(xi ) and (V,v) if m(xi ) is in Dr then (pi )(Θ,θ) = (pi )(V,v) = v(pi ) = Im(xi ) We first observe (V,v) Lemma 12. Let u be a term. If u s or u t then (F (u))(Θ,θ) = F(u)m (Θ,θ) (V,v) and (f (u)) = I(u)m . Continuing the proof of Proposition 2, since (s)m 6= (t)m , F(s)m 6= F(t)m or I(s)m (V,v) (V,v) 6 I(t)m . Hence, by Lemma 12, O(s, t)(Θ,θ) = is nonempty or A(s, t)(Θ,θ) is nonempty. Thus, there exists a substitution (Θ, θ) such that (Θ, θ) is normal with respect to OV ar and AV ar and O(s, t)(Θ,θ) is satisfiable or A(s, t)(Θ,θ) is satisfiable. (⇐) Suppose there exists a substitution (Θ, θ) such that (Θ, θ) is normal with respect to OV ar and AV ar and O(s, t)(Θ,θ) is satisfiable or A(s, t)(Θ,θ) is sat- isfiable, we demonstrate there exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m . 
Since O(s, t)(Θ,θ) is satisfi- able or A(s, t)(Θ,θ) is satisfiable, there exists a formal context IK = (G, M, ∆) (V,v) and a valuation (V, v) based on IK such that O(s, t)(Θ,θ) is nonempty or (Θ,θ) (V,v) A(s, t) is nonempty. Let H(IK) = (H(IK), ⊥l , ⊥r , >l , >r , ¬l , ¬r , ∨l , ∨r , ∧l , ∧r ) and m be a valuation based on H(IK) such that for all positive integers i, if i ≤ n then m(xi ) = ((Θ(Pi ))(V,v) , (θ(pi ))(V,v) ). Interestingly, (V,v) Lemma 13. Let u be a term. If u s or u t then (u)m = ((F (u))(Θ,θ) , (V,v) (f (u))(Θ,θ) ). (V,v) Continuing the proof of Proposition 2, since O(s, t)(Θ,θ) is nonempty or (Θ,θ) (V,v) (Θ,θ) (V,v) (Θ,θ) (V,v) (V,v) A(s, t) is nonempty, F (s) 6= F (t) or f (s)(Θ,θ) 6= (V,v) f (t)(Θ,θ) . Hence, by lemma 13, (s)m 6= (t)m . Thus, there exists a pure double Boolean algebra D and a valuation m based on D such that (s)m 6= (t)m . This ends the proof of Proposition 2. Hence, O(·, ·) and A(·, ·) are reductions from the WP in pure double Boolean algebras to the satisfiability problem for K2 . Thus, by Theorem 3, Corollary 2. The WP in pure double Boolean algebras is in PSPACE. 8 Conclusion Our results implicitly assume that the set V ar of all individual variables is infi- nite and the depth of nesting of the left operations with the right operations is not bounded. Following the line of reasoning suggested in [4], we may see what 292 Philippe Balbiani happens if we assume that the set V ar of all individual variables is finite and the depth of nesting of the left operations with the right operations is bounded. Do we get a linear time complexity in this case? The unification problem is quite different from the WP discussed here: given terms s, t, decide whether there exists terms which can be substituted for the variables in s, t so that the terms thus obtained are identically interpreted in all pure double Boolean algebras. In Mathematics and Computer Science, unifica- tion problems are of the utmost importance. At the time of writing, we know nothing about the decidability/complexity of the unification problem in pure double Boolean algebras. Acknowledgements Special acknowledgement is heartly granted to Christian Herrmann who made several helpful comments for improving the correctness and the readability of this article. References 1. Blackburn, P., de Rijke, M., Venema, Y.: Modal Logic. Cambridge University Press (2001). 2. Davey, B, Priestley, H.: Introduction to Lattices and Order. Cambridge University Press (2002). 3. Ganter, B, Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer (1999). 4. Halpern, J.: The effect of bounding the number of primitive propositions and the depth of nesting on the complexity of modal logic. Artificial Intelligence 75 (1995) 361–372. 5. Herrmann, C., Luksch, P., Skorsky, M., Wille, R.: Algebras of semiconcepts and double Boolean algebras. Technische Universität Darmstadt (2000). 6. Vormbrock, B.: A first step towards protoconcept exploration. In Eklund, P. (editor): Concept Lattices. Springer (2004) 208–221. 7. Vormbrock, B.: Complete subalgebras of semiconcept algebras and protoconcept alge- bras. In Ganter, B., Godin, R. (editors): Formal Concept Analysis. Springer (2005) 329–343. 8. Vormbrock, B.: A solution of the word problem for free double Boolean algebras. In Kuznetsov, S., Schmidt, S. (editors): Formal Concept Analysis. Springer (2007) 240–270. 9. Vormbrock, B., Wille, R.: Semiconcept and protoconcept algebras: the basic theorems. In Ganter, B., Stumme, G., Wille, R. 
(editors): Formal Concept Analysis. Springer (2005) 34–48. 10. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of con- cepts. In Rival, I. (editor): Ordered Sets. D. Reidel (1982) 314–339 11. Wille, R.: Boolean concept logic. In Ganter, B., Mineau, G. (editors): Conceptual Structures: Logical, Linguistic, and Computational Issues. Springer (2000) 317–331. 12. Wille, R.: Formal concept analysis as applied lattice theory. In Ben Yahia, S., Me- phu Nguifo, E., Belohlavek, R. (editors): Concept Lattices and their Applications. Springer (2008) 42–67 The word problem in semiconcept algebras 293 Annex Proof of Lemma 10. By induction on A and a. Basis. Remind that V ar(α) ⊆ {P1 , . . . , Pn }. In this respect, for all positive integers i, if i ≤ n then (T (Pi ))m = (xi )m = m(xi ) = (V (Pi ), V (Pi ). ) = . ((Pi )(V,v) , (Pi )(V,v) ). Hypothesis. Suppose A, B are nice object formulas such that A, B α, . . (T (A))m = ((A)(V,v) , (A)(V,v) ) and (T (B))m = ((B)(V,v) , (B)(V,v) ) and a, b m (V,v) / are nice attribute formulas such that a, b α, (t(a)) = ((a) , (a)(V,v) ) and m (V,v) / (V,v) (t(b)) = ((b) , (b) ). Step. We only consider the case of the nice object formula 2a, the other cases being treated similarly. We have: (T (2a))m = (−l −l −r −r t(a))m = / / ¬l ¬l ¬r ¬r (t(a))m = ¬l ¬l ¬r ¬r ((a)(V,v) , (a)(V,v) ) = ¬l ¬l (((a)(V,v) ) , (a)(V,v) ) = . (V,v) / (V,v) / (V,v) (V,v) . (((a) ) , ((a) ) ) = ((2a) , (2a) ). Proof of Lemma 11. By induction on A and a. Basis. Remind that V ar(α) ⊆ {P1 , . . . , Pn }. In this respect, for all positive in- tegers i, if i ≤ n then (Pi )(V,v) = V (Pi ) = Fm(xi ) = F(xi )m = F(T (Pi ))m . Hypothesis. Suppose A, B are nice object formulas such that A, B α, (A)(V,v) = F(T (A))m and (B)(V,v) = F(T (B))m and a, b are nice attribute formulas such that a, b α, (a)(V,v) = I(t(a))m and (b)(V,v) = I(t(b))m . Step. We only consider the case of the nice object formula 2a, the other cases being treated similarly. We have: (2a)(V,v) = {F ∈ Fp (D): for all I ∈ Ip (D), if I ∈ (a)(V,v) then F ∆ I} = {F ∈ Fp (D): for all I ∈ Ip (D), if I ∈ I(t(a))m then F ∆ I} = I(t(a))m / = F¬r ¬r (t(a))m = F¬l ¬l ¬r ¬r (t(a))m = F(−l −l −r −r t(a))m = F(T (2a))m . Proof of Lemma 12. By induction on u. Basis. Remind that V ar(s) ∪ V ar(t) ⊆ {x1 , . . . , xn }. In this respect, for all pos- (V,v) (V,v) itive integers i, if i ≤ n then (F (xi ))(Θ,θ) = (Pi )(Θ,θ) = Fm(xi ) = F(xi )m (V,v) (V,v) and (f (xi ))(Θ,θ) = (pi )(Θ,θ) = Im(xi ) = I(xi )m . Hypothesis. Suppose u, v are terms such that u s or u t, v s or v t, (V,v) (V,v) (V,v) (F (u))(Θ,θ) = F(u)m , (f (u))(Θ,θ) = I(u)m , (F (v))(Θ,θ) = F(v)m and (V,v) (f (v))(Θ,θ) = I(v)m . Step. We only consider the case of the term u ul v, the other cases being treated (V,v) (V,v) similarly. We have: (F (u ul v))(Θ,θ) = (F (u) ∧ F (v))(Θ,θ) = ((F (u))(Θ,θ) (V,v) (V,v) ∧(F (v))(Θ,θ) )(V,v) = (F (u))(Θ,θ) ∩ (F (v))(Θ,θ) = F(u)m ∩ F(v)m = (Θ,θ) (V,v) (V,v) F(u)m ∧l (v)m = F(uul v)m and (f (u ul v)) = (2(F (u) ∧ F (v)))(Θ,θ) = . (2((F (u))(Θ,θ) ∧ (F (v))(Θ,θ) ))(V,v) = ((F (u))(Θ,θ) ∧ (F (v))(Θ,θ) )(V,v) = (V,v) (V,v) . ((F (u))(Θ,θ) ∩(F (v))(Θ,θ) ) = (F(u)m ∩F(v)m ). = I(u)m ∧l (v)m = I(uul v)m . Proof of Lemma 13. By induction on u. Basis. Remind that V ar(s) ∪ V ar(t) ⊆ {x1 , . . . , xn }. In this respect, for all 294 Philippe Balbiani positive integers i, if i ≤ n then (xi )m = m(xi ) = ((Θ(Pi ))(V,v) , (θ(pi ))(V,v) ) = (V,v) (V,v) (V,v) (V,v) ((Pi )(Θ,θ) , (pi )(Θ,θ) ) = ((F (xi ))(Θ,θ) , (f (xi ))(Θ,θ) ). Hypothesis. 
Suppose u, v are terms such that u s or u t, v s or (V,v) (V,v) (V,v) v t, (u)m = ((F (u))(Θ,θ) , (f (u))(Θ,θ) ) and (v)m = ((F (v))(Θ,θ) , (V,v) (f (v))(Θ,θ) ). Step. We only consider the case of the term u ul v, the other cases being treated (V,v) (V,v) similarly. We have: (u ul v)m = (u)m ∧l (v)m = ((F (u))(Θ,θ) , (f (u))(Θ,θ) ) (V,v) (V,v) (V,v) (V,v) ∧l ((F (v))(Θ,θ) , (f (v))(Θ,θ) ) = ((F (u))(Θ,θ) ∩ (F (v))(Θ,θ) , (Θ,θ) (V,v) (Θ,θ) (V,v) . (Θ,θ) (Θ,θ) (V,v) ((F (u)) ∩ (F (v)) ) ) = (((F (u)) ∧ (F (v)) ) , . (V,v) ((F (u))(Θ,θ) ∧ (F (v))(Θ,θ) )(V,v) ) = ((F (u) ∧ (F (v))(Θ,θ) , (2((F (u))(Θ,θ) ∧ (V,v) (V,v) (F (v))(Θ,θ) ))(V,v) ) = ((F (u ul v))(Θ,θ) , (2(F (u) ∧ F (v)))(Θ,θ) ) = (V,v) (V,v) ((F (u ul v))(Θ,θ) , (f (u ul v))(Θ,θ) ). Looking for analogical proportions in a formal concept analysis setting Laurent Miclet1 , Henri Prade2 , and David Guennec1 1 IRISA-ENSSAT, Lannion, France, miclet@enssat.fr, david.guennec@gmail.com, 2 IRIT, Université Paul Sabatier, Toulouse, France, prade@irit.fr Abstract. Categorization and analogical reasoning are two important cognitive processes, for which there exist formal counterparts (at least they may be regarded as such): namely, formal concept analysis on the one hand, and analogical proportions (modeled in propositional logic) on the other hand. This is a first attempt aiming at relating these two settings. The paper presents an algorithm that takes advantage of the lattice structure of the set of formal concepts for searching for analogical proportions that may hold in a formal context. Moreover, properties linking analogical proportions and formal concepts are laid bare. 1 Introduction Categorization and analogical reasoning play important roles in cognitive pro- cesses. They both heavily rely on the ideas of similarity and dissimilarity. Items belonging to the same category should be similar, while they are dissimilar with respect to items belonging to other categories. Analogical proportions, which are statements of the form ‘a is to b as c is to d’, express the similarity of the relations linking a and b with the relations linking c and d (note that however a and b may be somewhat dissimilar (as well as c and d). In a Boolean setting, where items are described in terms of binary attributes, similarity amounts to the identity of properties, while dissimilarity refers to the presence of properties for an item which are absent in the other considered item. Among formal approaches aiming at categorizing items, Formal Concept Analysis (FCA) provides a way for characterizing concepts both extensionally in terms of the objects that they cover and intensionally in terms of the properties that these objects share. FCA is known as a lattice-theoretic framework devised for knowledge extraction from Boolean data tables called formal contexts that relate objects and properties. Introduced under this name by Wille [13], FCA has been developed by Ganter and Wille [7] and their followers for thirty years. Besides, there has been a renewal of interest for analogical proportions in the last decade, firstly in relation with computational linguistic concerns. Set-based, algebraic and logical models have been proposed [8, 12, 1, 9]. In the following, we more particularly use the Boolean view [9] of analogical proportions that is directly relevant for application to formal contexts. Then, it makes sense to look for analogical proportions in Boolean contexts, and to try to understand what formal concepts and analogical proportions may have in common. 
c 2011 by the paper authors. CLA 2011, pp. 295–307. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 296 Laurent Miclet, Henri Prade and David Guennec The paper is organized as follows. We first provide a short background on analogical proportions in Section 2. Then in Section 3, after a brief reminder of basic definitions in FCA, we present an efficient algorithm able to discover ana- logical proportions in a formal context by using the lattice of formal concepts. In Section 4, we further investigate the theoretical relations between FCA and ana- logical proportions, by showing how formal concepts are involved in analogical proportions, before indicating lines for further research and concluding. 2 Analogical proportions An analogical proportion ‘a is to b as c is to d’, usually denoted a : b :: c : d, expresses that the way a and b differ is the same as the way c and ddiffer [9]. This leads to the following definitions, here stated for three closely related kinds of items: subsets of a finite set, Boolean truth values, and objects defined by Boolean properties (also called binary attributes). Analogical proportion between sets First, let us consider four sets A, B, C and D, all subsets of some set X. The dissimilarity between A and B is evaluated by A ∩ B and by A ∩ B, where A denotes the complement of A in X, while the similarity corresponds to A ∩ B and A ∩ B. Viewing an analogical proportion as expressing that the differences between A and B and between C and D are the same, we get the following definition [9]: Definition 1 Four subsets A, B, C and D of a finite set X are in analogical proportion in this order when A ∩ B = C ∩ D and A ∩ B = C ∩ D. Analogical proportion between Boolean objects This expression has an immediate logical counterpart when a, b, c, and d now denote Boolean variables: ((a ∧ ¬b) ≡ (c ∧ ¬d)) ∧ ((¬a ∧ b) ≡ (¬c ∧ d)) This formula is true for the 6 truth value assignments of a, b, c, d appearing in Table 1, and is false for the 24 − 6 = 10 remaining possible assignments. It can be checked that the above definitions of an analogical proportion sat- isfies the following characteristic postulates [8]: – a : b :: a : b (identity) – a : b :: c : d =⇒ c : d :: a : b (global symmetry) – a : b :: c : d =⇒ a : c :: b : d (central permutation) – a : b :: c : d and ¬(b : a :: c : d) are consistent (local dissymmetry) Looking for analogical proportions in a formal concept analysis setting 297 a × × × b × × × c × ×× d × × × Table 1. The six Boolean 4-tuples that are in analogical proportion. The Boolean truth-values True and False are written as a cross and a blank. Objects defined by Boolean properties Let us suppose that the objects (or items) a, b, c, d are described by sets of binary properties belonging to a set P rop. Then, each item can be viewed as a subset of P rop, made of the attributes that hold true on this item3 . Then, we can apply Definition 1, namely a ∩ b = c ∩ d and a ∩ b = c ∩ d. Another way of seeing this analogical proportion is given by the equivalent definition: Definition 2 Four objects (a, b, c, d) defined by binary properties are in analogi- cal proportion iff the truth-values of each property on these objects make a 4-tuple of binary values corresponding to one of the six 4-tuples displayed in Table 1. 
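These equivalent formulations are easy to check mechanically. The following sketch (Python; the function names and the small example data are ours, not taken from the paper) tests the Boolean, set-based and attribute-based definitions against each other:

from itertools import product

def bool_proportion(a, b, c, d):
    # Logical definition: ((a and not b) == (c and not d)) and ((not a and b) == (not c and d)).
    a, b, c, d = bool(a), bool(b), bool(c), bool(d)
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))

# Exactly 6 of the 16 Boolean 4-tuples satisfy the definition (Table 1).
assert sum(bool_proportion(*t) for t in product((0, 1), repeat=4)) == 6

def sets_proportion(A, B, C, D):
    # Definition 1, using the identities A ∩ complement(B) = A \ B and complement(A) ∩ B = B \ A.
    return A - B == C - D and B - A == D - C

def objects_proportion(a, b, c, d, properties):
    # Definition 2: objects given as sets of properties; check each property's 4-tuple of truth values.
    return all(bool_proportion(p in a, p in b, p in c, p in d) for p in properties)

# Hypothetical illustration: a : b :: c : d holds because a and b differ exactly as c and d do.
props = {'w', 'x', 'y', 'z'}
a, b, c, d = {'w', 'x'}, {'w', 'y'}, {'x', 'z'}, {'y', 'z'}
assert sets_proportion(a, b, c, d) and objects_proportion(a, b, c, d, props)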
Analogical dissimilarity We now introduce the concept of analogical dissim- ilarity (AD) between four objects defined by binary attributes. It is simply the sum on all the attributes of the analogical dissimilarity per attribute. The latter is defined according to the following table: a ×××××××× b ×××× ×××× c ×× ×× ×× ×× d × × × × × × × × AD = 0 1 1 0 1 0 2 1 1 2 0 1 0 1 1 0 AD per attribute is merely the minimal number of bit(s) that has/have to be flipped in order to turn the four bits into an analogical proportion, according to Table 1. Notice that any 4-tuple with a zero AD is an analogical proportion, and vice-versa. For example, the four first objects of the formal context that we will later call BASE lm (see Figure 2) are such that AD(leech, bream, frog, dog) = 0 + 1 + 0 + 0 + 0 + 0 + 0 + 1 + 1 = 3. 3 Searching for analogical proportions in formal concepts This section is devoted to the following problem: given a formal context with n objects and d properties (or attributes), is it possible to discover 4-tuples of objects in analogical proportion, without running an O(n4 · d) algorithm? We give in this section an heuristic algorithm which uses the lattice of formal concepts, and has shown experimentally its efficiency for discovering analogical proportions. We start with a brief reminder on FCA. 3 For an object x, this subset is called R↑(x) in Formal Concept Analysis, see Sect. 3.1. 298 Laurent Miclet, Henri Prade and David Guennec 3.1 Formal concept analysis (FCA) FCA starts with a binary relation R, called formal context, defined between a set Obj of objects and a set P rop of Boolean properties. The notation (x, y) ∈ R means that object x has property y. R↑ (x) = {y ∈ P rop|(x, y) ∈ R} is the set of properties of object x. Similarly, R↓ (y) = {x ∈ Obj|(x, y) ∈ R} is the set of objects having property y. Given a set Y of properties, one can define the set of objects [7]: R↓ (Y ) = {x ∈ Obj|R↑ (x) ⊇ Y }. This is the set of objects sharing all properties in Y (and having maybe some others). Then a formal concept is defined as a pair made of its extension X and its intension Y such that R↓ (Y ) = X and R↑ (X) = Y, where (X, Y ) ⊆ Obj×P rop, and R↑ (X) is similarly defined as {y ∈ P rop|R↓ (y) ⊇ X}. It can be also shown that formal concepts are maximal pairs (X, Y ) (in the sense of inclusion) such that X × Y ⊆ R. Moreover, the set of all formal concepts is equipped with a partial order (de- noted ) defined as: (X1 , Y1 ) (X2 , Y2 ) iff X1 ⊆ X2 (or, equivalently, Y2 ⊆ Y1 ), and forms a complete lattice, called the concept lattice of R. Let us consider an example where R is a relation that defines links be- tween eight objects Obj = {1, 2, 3, 4, 5, 6, 7, 8} and nine properties P rop = {a, b, c, d, e, f, g, h, i}. There is a “×” in the cell corresponding to an object x and to a property y if the object x has property y, in other words the “×”s describe the relation R (or context). An empty cell corresponds to the fact that (x, y) 6∈ R, i.e., it is known that object x has not property y. The relation R in the example is given in Figure 1. There are 5 formal concepts. For instance, consider X = {a, b, c, d, e}. Then R↑∆ (X) = {7, 8} ; likewise if Y = {7, 8}. Then R↓∆ (Y ) = {a, b, c, d, e}. a b c d e f g h i 1 × 2 × 3 ××× 4 ××× 5 ×××× 6 ×××× 7××××× 8××××× Fig. 1. 
R2 : a relation with 5 formal concepts (and 2 sub-contexts) Looking for analogical proportions in a formal concept analysis setting 299 3.2 Organizing the search The basic idea of our algorithm is to start from some 4-tuple of objects, to observe the attributes that contribute to make AD non-zero for this 4-tuple, and to replace one of the four objects by another object. Then we iterate the process. Two important features have been added to avoid a random walk in the space of the 4-tuples. 1. The replacement of one object by another is done according to the obser- vation of the lattice of concepts. The idea is to try to decrease the value of AD. This point will be explained in the next section. 2. All 4-tuples of objects that are created are stored in a list, ordered by in- creasing value of AD. The next 4-tuple to be chosen is the first in the list. This algorithm can therefore be seen as an optimization procedure, more pre- cisely as a best-first version of the GRAPHSEARCH algorithm ([11]). The or- dered list of 4-tuples is an Open list in this interpretation. 3.3 Decreasing the analogical dissimilarity Let us come now to the heart of the algorithm, namely the replacement of one object by another. Can we find some information in the lattice that leads us to choose both an object in the 4-tuple and another object to replace it ? Remember that we are looking for a replacement that makes the AD decrease. Now, let us consider the following situation, taken from BASE lm (see Figure 2). Suppose that we are studying the 4-tuple of objects (3, 4, 9, 12), with AD = 1. Attribute c is the only one to contribute in the AD of this 4-tuple. Actually, c has the values (1, 1, 0, 1) on the 4-tuple (3, 4, 9, 12). We notice now that there are two interesting concepts in the lattice with respect to c, the first one being ({b, c, h, g, a} , {3}) and the second being ({b, h, g, a} , {2, 3}), which are directly connected. What can we deduce from this pair of connected concepts ? – Attribute c has value 1 on object 3. – Attribute c has value 0 on object 2. – Attribute c is the only attribute to switch from 1 to 0 to transform concept ({b, c, h, g, a} , {3}) into concept ({b, h, g, a} , {2, 3}). We can conclude from this evidence that replacing object 3 by object 2 will decrease by 1 the value of AD, since c will take values (0, 1, 0, 1) on the 4-tuple (2, 4, 9, 12), and therefore that (2, 4, 9, 12) is an analogical proportion. Unfortunately this argument does not lead to a greedy algorithm, since there no insurance, given a 4-tuple, that there exists a couple of concepts having the three above properties. Most of the time, actually, there is more than one attribute switching to 1 between two connected concepts, and only one insuring the decreasing of AD. Let us take another example from the same data base. The 4-tuple (5, 4, 9, 12) has a AD of 4, because of the attributes d, f , g and h. Two interesting connected 300 Laurent Miclet, Henri Prade and David Guennec concepts are ({c, f, d, e, i, a} , {12}) and ({c, d, e, a} , {7, 12}), since f switching from 1 to 0 will decrease the AD (see the table below). But replacing 12 by 7 in the 4-tuple (5, 4, 9, 12) will switch not only f but also the attribute i, and we don’t know what will happen when switching i: it may decrease as well as increase the AD. Actually, in this case, it increases the AD. Hence, the 4-tuple (5, 4, 9, 7) has the same AD of 4. 
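The AD computations used in these examples (the two 4-tuples just discussed are tabulated side by side right after this sketch) are easy to reproduce. Below is a small Python sketch, with names of our own choosing, that derives the per-attribute dissimilarity directly from the six 4-tuples of Table 1:

from itertools import product

# The six Boolean 4-tuples in analogical proportion (Table 1).
PROPORTIONS = [t for t in product((0, 1), repeat=4)
               if (bool(t[0] and not t[1]) == bool(t[2] and not t[3]))
               and (bool(not t[0] and t[1]) == bool(not t[2] and t[3]))]

def ad_attribute(a, b, c, d):
    # AD per attribute: minimal number of bits to flip so that (a, b, c, d)
    # becomes one of the six 4-tuples of Table 1.
    return min(sum(x != y for x, y in zip((a, b, c, d), p)) for p in PROPORTIONS)

def AD(x1, x2, x3, x4):
    # Analogical dissimilarity of four objects given as equal-length 0/1 vectors:
    # the sum of the per-attribute dissimilarities.
    return sum(ad_attribute(a, b, c, d) for a, b, c, d in zip(x1, x2, x3, x4))

# Per-attribute values from the AD table: (1, 1, 0, 1) needs one flip, (1, 0, 0, 1) needs two.
assert ad_attribute(1, 1, 0, 1) == 1
assert ad_attribute(1, 0, 0, 1) == 2
# A 4-tuple of objects is an analogical proportion exactly when its AD is zero.
assert AD((1, 0), (1, 1), (0, 0), (0, 1)) == 0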
a b c d e f g h i a b c d e f g h i 5×× × × 5×× × × 4× × ××× 4× × ××× 9×× ××× 9×× ××× 12 × × × × × × 7× ××× AD = 4 AD = 4 By generalizing these examples, we propose now an heuristic to try to de- crease the AD that we call h-Doap. Heuristic h-Doap. Let a couple of connected concepts be (A ∪ B, Z) and (B, Z ∪ Y ), A and B being subsets of attributes and Y and Z being subsets of objects such that A ∩ B = ∅ and Y ∩ Z = ∅. If there is a 4-tuple with one of its four objects in Z and if there is an attribute in B that decreases the AD of this 4-tuple when switching from 1 to 0, then create a new 4-tuple by replacing the object in Z by an object in Y . Next section shows how this heuristic can be used to discover 4-tuples of objects with null AD in a formal context, i.e. analogical proportions of objects. 3.4 Algorithm Discovering one analogical proportion. We explain in this section the al- gorithm used to discover one analogical proportion in a formal context. We call it ‘Discover One Analogical Proportion’, in short Doap. As already stated, it is a simple version of Graphsearch, where the nodes to be explored are 4-tuples of objects. We denote Start the 4-tuple of objects chosen to begin, Open the current set of 4-tuples to be processed and Closed the set of 4-tuples already processed. The choice of Start is either done randomly or by selecting objects which appear in small subsets of objects in the lattice. We also require that the explored 4-tuples are composed of four different objects, since we do not want to converge towards 4-tuples trivially in proportion, such as (1, 3, 1, 3) or even (2, 2, 2, 2). The algorithm stops either by discovering an analogical proportion, or in failure. One has to notice that its failure does not insure that there is no ana- logical proportion, since there is no guarantee given by the heuristic. We have never met this failure case, but our experiments are very limited, as explained in section 3.5. Looking for analogical proportions in a formal concept analysis setting 301 1: Algorithm Doap(Start) 2: begin 3: Closed ← ∅ 4: Open ← {Start} 5: while Open 6= ∅ do 6: x ← the 4-tuple in Open having the lowest AD value 7: if AD(x) = 0 then 8: return x 9: else 10: Open ← Open \ {x} ; Closed ← Closed ∪ {x} 11: decision ← 1 12: while decision = 1 do 13: Use heuristic h-Doap to construct a new 4-tuple y from x 14: if y is composed of four different objects and y 6∈ Closed and y 6∈ Open then 15: Open ← Open ∪ {y} ; decision ← 0 16: end if 17: end while 18: end if 19: end while 20: return failure 21: end Discovering several analogical proportions. To discover more analogical proportions, the simplest manner is to imbed algorithm Doap in a procedure that discards the first two objects of a discovered analogical 4-tuple from the formal context before re-running the algorithm. Since the transitivity holds for analogical proportions on objects (u : v :: w : x and w : x :: y : z implies u : v :: y : z), we are loosing no information on analogical 4-tuples. However, we are not insured to find all proportions in that manner, due to the fact that algorithm Doap may not find an existing proportion. 3.5 Experiments The size of Close when the algorithm Doap stops is a precise indication of its practical time complexity. Notice that a random algorithm, running on n objects (without any construction of a formal lattice), in which there are q 4-tuples in analogical proportion would in average try ((n4 )/8·q) 4-tuples before discovering a proportion. 
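Reading this estimate as n^4/(8·q), a quick order-of-magnitude check (the constant 8 is justified in the next paragraph; the values n = 12 and q = 3 are those of the augmented BASE_lm context used in these experiments):

def expected_random_trials(n, q):
    # Expected number of randomly drawn ordered 4-tuples before hitting one of the
    # q analogical proportions, each of which appears as 8 equivalent orderings.
    return n ** 4 / (8 * q)

# For 12 objects and 3 known proportions the estimate is 12**4 / 24 = 864 trials.
print(expected_random_trials(12, 3))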
In the previous formula, the number “8” comes from the fact that, when there is one analogical proportion in a formal context, then there are in fact exactly 8 through suitable permutations. This property stems directly from the postulates an analogical proportion (see section 2). We have used two different formal contexts to run the algorithm Doap. The first one is described in [2], except that we have added four objects 9, 10, 11 et 12 in order to have (at least) the analogical proportions (3, 4, 9, 10) and (1, 8, 11, 12). This leads to the formal context called BASE lm , Figure 2. 302 Laurent Miclet, Henri Prade and David Guennec a b c d e f g h i a b c d e f g h i leech 1 × × × bean 7 × × × × bream 2 × × ×× maize 8 × × × × frog 3 × × × ×× x 9×× ××× dog 4 × × ××× y 10 × ××× × spike-weed 5 × × × × z 11 × × × × × reed 6 × × × × × t 12 × × × × × × Fig. 2. BASE lm : A formal context from Bělohlávek [2] increased with four objects 9, 10, 11 et 12 in order to have (at least) the analogical proportions (3, 4, 9, 10) and (1, 8, 11, 12). The lattice constructed on this formal context (with the In-Close free soft- ware [14]) has 31 concepts. We have run Doap more than 600 times. It has always terminated by finding one of the three analogical proportions in the data4 . The average size of the Closed list is 63 and its median size is 28. Figure 3 gives the details. Fig. 3. Results of 622 runs of Doap on BASE lm . The size of the Close list for each run is on the Y axis. To appreciate these results, we have compared with a random search, replac- ing line 13 of the Doap algorithm by picking a random 4-tuple. The detailed results are given in Figure 4. The average size of the Closed list is 253 and its median size is 174. We also have tried to “symmetrize” the role of 0 and 1 in this formal context by adding the reverse attributes (indeed the Table 1 defining analogical propor- tions is left unchanged when exchanging 0 and 1). It leads to 12 objects and 18 4 We actually had a good surprise: Doap found a third proportion, namely (2, 4, 9, 12). Looking for analogical proportions in a formal concept analysis setting 303 Fig. 4. Results of 630 runs of random Doap on BASE lm . The Y axis is graduated from 0 up to 2000. attributes instead of 9. The size of the lattice of concepts is now 94. The algo- rithm Doap with the same parameters examines in average 93 4-tuples before finding an analogical proportion. The median value is 31. The symmetrization does not seem to be a good idea in this case. The random Doap algorithm has failed to give complete results on these data, due to overflows in the Close list. The second experiment has been run on the Lenses data base, from UCI ML Repository [6]. The nominal attributes have been transformed into binary ones by simply creating as many binary attributes as the number of modalities. The number of objects is 24 with 7 binary parameters. The size of the lattice is 43, the average number of examined 4-tuples is 77 and the median number is 20. When adding the reverse attributes, we have a lattice of size 227, an average number of 39 and a median number of 16. In that experiment, the symmetrization of the data seems clearly to have a positive effect. A first conclusion is that our heuristic algorithm seems to perform well. In the second context the basic search space has a size over 40.000 and we examine only 77 4-tuples in average. 
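To make the procedure of Section 3.4 concrete, here is a compact best-first sketch of Doap (our Python; the naive_successors generator is only a stand-in for the lattice-guided h-Doap moves, so it typically explores more 4-tuples than the real heuristic):

import heapq
from itertools import product

# The six Boolean 4-tuples in analogical proportion (Table 1).
PROPORTIONS = [t for t in product((0, 1), repeat=4)
               if (bool(t[0] and not t[1]) == bool(t[2] and not t[3]))
               and (bool(not t[0] and t[1]) == bool(not t[2] and t[3]))]

def ad(rows, quad):
    # Analogical dissimilarity of a 4-tuple of objects (rows of 0/1 attribute values):
    # per attribute, the minimal number of flips needed to reach a proportion.
    columns = zip(*(rows[i] for i in quad))
    return sum(min(sum(x != y for x, y in zip(col, p)) for p in PROPORTIONS)
               for col in columns)

def naive_successors(quad, n_objects):
    # Stand-in for h-Doap: replace one object of the 4-tuple by any other object.
    # The real heuristic restricts these moves using pairs of neighbouring concepts.
    for pos in range(4):
        for g in range(n_objects):
            if g not in quad:
                yield quad[:pos] + (g,) + quad[pos + 1:]

def doap(rows, start, max_iter=100000):
    # Best-first search over 4-tuples of distinct objects, ordered by AD (the Open list).
    open_heap, closed = [(ad(rows, start), start)], set()
    while open_heap and max_iter > 0:
        max_iter -= 1
        score, quad = heapq.heappop(open_heap)
        if quad in closed:
            continue
        if score == 0:
            return quad                      # analogical proportion found
        closed.add(quad)
        for nxt in naive_successors(quad, len(rows)):
            if len(set(nxt)) == 4 and nxt not in closed:
                heapq.heappush(open_heap, (ad(rows, nxt), nxt))
    return None                              # failure: the heuristic gives no completeness guarantee

# Usage on a small hypothetical 0/1 context (rows = objects, columns = attributes).
rows = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (0, 1, 1), (1, 0, 0)]
print(doap(rows, (0, 1, 2, 4)))              # -> (0, 1, 2, 3), a 4-tuple with AD = 0 here

The real algorithm differs only in how successors are generated: h-Doap proposes the replacement object by inspecting pairs of neighbouring concepts in the lattice, as described above.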
The construction of the lattice of concepts takes in practice much more time than the discovery of analogical proportions, which seems to suggest that it is a relevant space for looking for analogical proportions. 4 Analogical proportions between formal concepts We have seen that discovering analogical proportions in a formal context benefits from the knowledge of the associated lattice of formal concepts. Then it raises the question of understanding how formal concepts are involved in analogical proportions. Clearly, four objects in the same formal concept form an analogical proportion – in a trivial way – w.r.t. the subset of attributes involved in the formal concept. Partial answers to the question, when two formal concepts are involved in the proportion, are given in this section. 4.1 The smallest formal context in complete proportion We are interested in this section in examining the properties of the smallest context with an analogical proportion between objects. Obviously, this context will have exactly four objects. If we want to have, only one time, each of the possible analogical proportions between attributes, we need six of them (see table 1) and we obtain BASE 0 (see Figure 5). 304 Laurent Miclet, Henri Prade and David Guennec f a b c d e uv wx ∅ u ××× v × ×× w × × × wx vx uw uv a b c d x ×× × a b c d u v w x u ×× cd bd ac ab v × × w× × x×× ∅ abcd Fig. 5. BASE 0 , BASE 1 and the concepts lattice of BASE 1 . The lattice of BASE 0 is deduced from it by adding e to all subsets of attributes. We can construct now the concept lattice of BASE 0 , but it is interesting to get rid of attributes f (which will not be present in any context) and e (present in every context). We call BASE 1 the reduced context, shown at Figure 5. Its lattice is displayed in Figure 5. Note that there is a perfect symmetry between attributes and objects. The third line of the lattice expresses that u : v :: w : x, but also in subsets terms that {c, d} : {b, d} :: {a, c} : {a, b}. The second line expresses that a : b :: c : d and that {w, x} : {v, x} :: {u, w} : {u, v}. This is not surprising: as explained in section 2, we can see an object as the set of properties that hold true for it. 4.2 Some relations between analogical proportions and lattices of concepts Firstly, let us remark that the two following propositions are equivalent. This is immediate from section 2, in which these two equivalent definitions of analogical proportion have been presented. 1. x1 , x2 , x3 and x4 are four objects, in analogical proportion in this order. 2. R↑ (x1 ), R↑ (x2 ), R↑ (x3 ) and R↑ (x4 ) are four subsets of properties in analog- ical proportion in this order. Property 1 Let x1 , x2 , x3 and x4 be four objects in analogical proportion in this order. Let (X1 , Y1 ) be the5 concept with the smallest set X1 of objects in which x1 is present. Let us define (X2 , Y2 ), (X3 , Y3 ) and (X4 , Y4 ) in the same way. Then the four sets of attributes Y1 , Y2 , Y3 and Y4 are in analogical proportion, in this order. Proof. Since x1 ∈ X1 , all the attributes in Y1 take value 1 on x1 . Since X1 is the smallest set of objects including x1 , there is no attribute outside Y1 having 5 If they were two, x1 would be present in the intersection of the two. Looking for analogical proportions in a formal concept analysis setting 305 value 1 on x1 . Hence, Y1 is exactly R↑ (x1 ), the extension of x1 , i.e. the subset of attributes that take value 1 on x1 . This is also true for x2 , x3 and x4 . 
We have to prove now that x1 : x2 :: x3 : x4 implies R↑ (x1 ) : R↑ (x2 ) :: R↑ (x3 ) : R↑ (x4 ). It is immediate from the remark above. For example, in BASE 0 , we know that 1 : 8 :: 11 : 12. We have X1 = {1, 2, 3, 11}, Y1 = {a, b, g}, X2 = {6, 8, 12}, Y2 = {a, c, d, f }, X3 = {11}, Y3 = {a, b, e, g, i} and X4 = {12}, Y4 = {a, c, d, e, f, i}. The proportion Y1 :Y2 ::Y3 :Y4 holds, since: {a, b, g}:{a, c, d, f }::{a, b, e, g, i}:{a, c, d, e, f, i}. Property 2 Let (X1 , Y1 ), (X2 , Y2 ), (X3 , Y3 ) and (X4 , Y4 ) be four concepts of a lattice of concepts, such that the four sets of attributes Y1 , Y2 , Y3 and Y4 are in analogical proportion, in this order. Let X b1 be the subset of X1 composed of all objects that are in X1 but cannot be found in any subset of X1 belonging to a concept. We define in the same manner X b2 , X b3 and Xb4 . The following property b b b b holds true: ∀x1 ∈ X1 , x2 ∈ X2 , x3 ∈ X3 , x4 ∈ X4 : x1 x2 :: x3 : x4 . Proof. It is the reciprocal of Property 1: Y1 is the extension of all objects in b1 , and we take x1 in X X b1 . We derive the conclusion from the remark above. Property 3 Let x1 , x2 , x3 and x4 be four objects, in analogical proportion in this order. Let A1111 = {y|y ∈ R↑ (x1 ), y ∈ R↑ (x2 ), y ∈ R↑ (x3 ), y ∈ R↑ (x4 ))} Let A1100 = {y|y ∈ R↑ (x1 ), y ∈ R↑ (x2 ), y 6∈ R↑ (x3 ), y 6∈ R↑ (x4 ))} Let A0011 = {y|y 6∈ R↑ (x1 ), y 6∈ R↑ (x2 ), y ∈ R↑ (x3 ), y ∈ R↑ (x4 ))} Let A1010 = {y|y ∈ R↑ (x1 ), y 6∈ R↑ (x2 ), y ∈ R↑ (x3 ), y 6∈ R↑ (x4 ))} Let A0101 = {y|y 6∈ R↑ (x1 ), y ∈ R↑ (x2 ), y 6∈ R↑ (x3 ), y ∈ R↑ (x4 ))} Then ({x1 , x2 }, A1111 ∪ A1100 ) is included into a formal concept. ({x3 , x4 }, A1111 ∪ A0011 ) is included into a formal concept. ({x1 , x3 }, A1111 ∪ A1010 ) is included into a formal concept. ({x2 , x4 }, A1111 ∪ A0101 ) is included into a formal concept. The result follows from the definition of the subsets of attributes considered and their clear relation with the definition of analogical proportions. The fact that we only have an inclusion in the above property should not come as a surprise. Indeed, when describing objects, attributes that are nor not relevant w.r.t. the analogical proportion may be present. 5 Lines for further research and concluding remarks Beyond the already introduced set function, R↓ (Y ) = {x ∈ Obj|R↑ (x) ⊇ Y }, which is at the core of FCA,and which leads to the definition of formal concepts, it has been noticed [5], on the basis of a parallel with possibility theory that, given a set Y of properties, four remarkable sets of objects can be defined in this setting (here the overbar denotes set complementation): 306 Laurent Miclet, Henri Prade and David Guennec – R↓Π (Y ) = {x ∈ Obj|R↑ (x) ∩ Y 6= ∅} = ∪y∈Y R↓ (y). This is the set of objects having at least one property in Y . – R↓N (Y ) = {x ∈ Obj|R↑ (x) ⊆ Y } = ∩y6∈Y R↓ (y). This is the set of objects having no property outside Y . – R↓∆ (Y ) = R↓ (Y ) = ∩y∈Y R↓ (y). This is the set of objects sharing all prop- erties in Y . – R↓∇ (Y ) = {x ∈ Obj|R↑ (x) ∪ Y 6= Obj} = ∪y6∈Y R↓ (y). This is the set of objects that are missing at least one property outside Y . It has been recently pointed out [3] that pairs (X, Y ) such that R↓N (Y ) = X and R↑N (X) = Y are characterizing independent sub-contexts (X, Y ) such that ((X × Y ) + (X × Y ) ⊇ R, in the sense that they do not share any object or property. Thus, in Figure 1, ({a, b, c, d, e, f }, {5, 6, 7, 8}) and ({g, h, i}, {1, 2, 3, 4}) are two formal sub-contexts. 
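These four operators are immediate to compute from a binary relation. The sketch below (our Python, run on a small hypothetical context deliberately assembled from two independent sub-contexts, in the spirit of the remark above) makes the definitions concrete:

def up(R, x):
    # R↑(x): the set of properties of object x in the relation R ⊆ Obj × Prop.
    return {y for (g, y) in R if g == x}

def down_pi(R, objects, Y):
    # R↓Π(Y): objects having at least one property in Y.
    return {x for x in objects if up(R, x) & set(Y)}

def down_N(R, objects, Y):
    # R↓N(Y): objects having no property outside Y.
    return {x for x in objects if up(R, x) <= set(Y)}

def down_delta(R, objects, Y):
    # R↓Δ(Y) = R↓(Y): objects sharing all properties in Y.
    return {x for x in objects if set(Y) <= up(R, x)}

def down_nabla(R, objects, properties, Y):
    # R↓∇(Y): objects missing at least one property outside Y.
    return {x for x in objects if (set(properties) - set(Y)) - up(R, x)}

# Hypothetical context made of two independent sub-contexts, {1, 2} x {a, b} and {3} x {c}.
objects, properties = {1, 2, 3}, {'a', 'b', 'c'}
R = {(1, 'a'), (1, 'b'), (2, 'a'), (3, 'c')}
assert down_N(R, objects, {'a', 'b'}) == {1, 2}              # no property outside {a, b}
assert down_delta(R, objects, {'c'}) == {3}                  # all properties in {c}
assert down_pi(R, objects, {'b', 'c'}) == {1, 3}             # at least one property in {b, c}
assert down_nabla(R, objects, properties, {'c'}) == {2, 3}   # missing a property outside {c}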
When comparing the features underlying FCA and analogical proportions, one can notice that the same 4 “indicators” are involved from the beginning: a∩b, a ∩ b, a ∩ b, and a ∩ b. Indeed R↓Π (Y ) is based on the condition R↑ (x) ∩ Y 6= ∅, R↓N (Y ) on the condition R↑ (x)∩Y = ∅, R↓∆ (Y ) on the condition R↑ (x)∩Y = ∅, and R↓∇ (Y ) on the condition R↑ (x) ∩ Y 6= ∅. Moreover, with these 4 indicators, one can define other so-called logical proportions [4], including some that are closely related to analogical proportions such as ‘paralogy’ which reads “what a and b have in common, c and d have it also” and is defined by a ∧ b = c ∧ d and a ∧ b = c ∧ d [10]. This more generally raises the question of the rela- tions between FCA and these logical proportions. Finally, the experiments with Doap have obviously to be scaled on larger formal contexts, in order to estimate its practical complexity more accurately. Some more thought has also to be given about the choice of the Start 4-tuples, especially to take advantage of the addition of the reverse attributes. An inter- esting point would be to be able to choose the Start in order to insure that every analogical proportion can be discovered. We also believe that the speed of Doap can be increased, since there are still a lot of parameters to tune, for example breaking ties in the head of the Close list in a non random fashion. An interesting question is whether or not the construction of the lattice must precede the heuristic search. It would certainly be of great interest to construct only the parts that are required by the running od the Doap algorithm. This would lead to merge the two parts of the method, rather than computing the whole lattice (a very costly operation) before its exploration. More generally, it would be clearly of interest to have an algorithm also able to find out the analogical proportions that hold in some sub-context (since as already said, irrelevant attributes may hide interesting analogical proportions), rather than in the initial formal context. This will open a machine learning point of view [1]. Looking for analogical proportions in a formal concept analysis setting 307 6 Aknowledgements We would like to thank the anonymous reviewers for their careful reading of this article and their interesting suggestions. References 1. S. Bayoudh, L. Miclet, and A. Delhay. Learning by analogy: a classification rule for binary and nominal data. Proc. 20th Inter. Joint Conf. on Artificial Intelligence, (M. M. Veloso, ed.), Hyderabad, India, AAAI Press, 678–683, 2007. 2. R. Bělohlávek. Introduction to formal context analysis. Internal report. Dept of Computer science. Palacký University, Olomouk, Czech Republic. 2008. 3. Y. Djouadi, D. Dubois, H. Prade. Possibility theory and formal concept analysis: Context decomposition and uncertainty handling. Proc. 13th Inter. Conf. on Infor- mation Processing and Management of Uncertainty (IPMU’10), (E. Hüllermeier, R. Kruse and F. Hoffmann, eds.), Dortmund, Springer, LNCS 6178, 260–269, 2010. 4. H. Prade, G. Richard. Logical proportions - Typology and roadmap. Proc. Inter. Conf. on Information Processing and Management of Uncertainty in Knowledge- based Systems (IPMU 2010), Dortmund, (E. Hüllermeier, R. Kruse, F. Hoffmann, eds.), Springer, LNCS 6178, 757–767, 2010. 5. D. Dubois, F. Dupin de Saint-Cyr, H. Prade. A possibility-theoretic view of formal concept analysis. Fundamentae Informaticae, 75, 195–213, 2007. 6. A. Frank and A. Asuncion. (2010). 
UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of In- formation and Computer Science. 7. B. Ganter and R. Wille. Formal Concept Analysis. Mathematical Foundations. Springer Verlag, 1999. 8. Y. Lepage. De l’analogie rendant compte de la commutation en linguistique. http://www.slt.atr.jp/ lepage/pdf/dhdryl.pdf, Grenoble, 2001. HDR. 9. L. Miclet and H. Prade. Handling analogical proportions in classical logic and fuzzy logics settings. Proc. 10th Europ. Conf. on (ECSQARU’09), Verona, Springer, LNCS 5590, 2009, 638–650. 10. H. Prade and G. Richard. Analogy, paralogy and reverse analogy: Postulates and Inferences. Proc. Annual German Conf. on Artificial Intelligence (KI 2009), Pader- norn, Sept. 15-18, (B. Mertsching, M. Hund, Z. Aziz, eds.), Springer, LNAI 5803, 306–314, 2009. 11. N. Nilsson. Principles of Artificial Intelligence. Tioga, 1980. 12. N. Stroppa and F. Yvon. Analogical learning and formal proportions: Definitions and methodological issues. Technical Report ENST-2005-D004, http://www.tsi.enst.fr/publications/enst/techreport-2007-6830.pdf, June 2005. 13. R. Wille. Restructuring lattice theory: an approach based on hierarchies of con- cepts. In: Ordered Sets, (I. Rival, ed.), D. Reidel, Dordrecht, 445–470, 1982. 14. The Inclose software. http://inclose.sourceforge.net/. Downloaded on March 2011. Random extents and random closure systems Bernhard Ganter Institut für Algebra Technische Universität Dresden Abstract. We discuss how to randomly generate extents of a given formal context. Our basic method involves counting the generating sets of an extent, and we show how this can be done using the Möbius function. We then show how to generate closure systems on seven elements uniformly at random. 1 Introduction Let Random(0,1] denote an operator that generates a random number between 0 and 1 with equal probability. From such a (memoryless) random number gen- erator an operator Random_subset(S ) can be derived that produces, upon each invocation, a random subset of a given nite set S , such that all subsets are equally likely (see, e.g., [6]). Building on this we derive in this article an operator that randomly selects a 1 closed set from a given closure system on a nite set. Note that this is a trivial task for moderately sized systems of which you can label the closed sets by numbers 1, . . . , n. For such you could simply randomly pick a number between 1 and n and select the closed set labeled by this number. Since the size of a closure system is at most exponential in the size of its carrier, this trivial algorithm clearly requires polynomial time. However, a potentially exponential list of closed sets must be pre-computed and stored. For example we aim at generating closure systems at random2 . But there are many closure systems, even for small carrier sets. On seven elements the number was recently computed by Colomb, Irlande, and Raynaud [3] to be 14 087 648 235 707 352 472. Maintaining a list of this size is not an inviting idea, and thus the trivial approach is not very realistic. Our motivation comes from recent experimental computer investigations by D. Borchmann that yielded surprising results. Borchmann raised the question if these were artefacts caused by the non-uniform choice of the random input data. Have a look at Figure 1. 
It shows ve diagrams, each with 27 rows and 13 columns, corresponding to the possible number of meet reducible and irreducible closed sets in a closure system on a ve element set (the trivial system with zero irreducibles is omitted). A system with r reducibles and i irreducibles corresponds 1 That is, from an intersection-closed familiy of sets. Such families are also called Moore families. 2 The family of all closure systems on a xed set is itself a closure system. c 2011 by the paper authors. CLA 2011, pp. 309–318. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 310 Bernhard Ganter a b c d e Fig. 1. Closure systems on ve elements by their number of meet-irreducible (horizontal) and reducible closed sets (vertical). Tile a shows the possible values, tile b the true relative frequencies, tile c and d come from random contexts and tile e from picking systems uniformly at random. to the cell in the r -th row from the bottom and the i-th column from the left. The rst diagram depicts which combinations of r and i are possible, while the other four display relative frequencies (the darker, the higher). The second diagram shows the true frequencies, counted over all 1 385 552 closure systems on ve elements. The other three show frequencies of randomly chosen closure systems (1 000 000 samples each). For the diagram in the middle, the systems were made by putting random crosses in a 5×13context. The fourth diagram was obtained by putting random crosses with random density in a formal context with a random number of columns. The fth diagram shows the distribution of a sample picked with uniform distribution. We use the language of Formal Concept Analysis [4] and, in particular, that every closure system is the set Ext(G, M, I) of all extents of some formal context (G, M, I). We construct an operator Random_extent(G, M, I ) which selects, upon each invocation, randomly an extent of (G, M, I), with equal probability for all extents. The closure operator for the extents will be denoted by X 7→ X 00 . 00 If X = Y , then Y is called agenerating set of the extent X (not necessarily a minimal one). The number of generating sets of an extent E shall be denoted by egen(E). We extend this denition to arbitrary subsets so that egen(Y ) := |{X ⊆ G | X 00 = Y 00 }| gives the number of generating sets of the extent generated by Y . Of course then, egen(Y ) = egen(Y 00 ). Therefore if E is an extent then obviously X 1 = 1. egen(Y ) {Y |Y 00 =E} Random extents and random closure systems 311 Computing the function egen() is a nontrivial task. We shall discuss this below. Our method could theoretically be applied to many instances, such as generat- ing random partitions, random subgroups, etc. However, its runtime performance is very bad. For most such situations algorithms are known that are much more ecient than what we suggest. Indeed, we do not believe that our method will be very useful in practice. Our contribution is meant as a challenge to come up with a more ecient approach. We are grateful to the referees for several useful hints. We were unaware of the paper by Boley, Gärtner, and Grosskreutz [2], which addresses the same problem, but with a dierent and more general approach. It may well be that their algorithm yields better results even for generating random closure systems. 
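For a very small carrier set, egen() and the identity above can be checked by exhaustive enumeration. The following sketch (our Python, on a hypothetical three-object context) groups all subsets of G by the extent they generate and verifies the identity; this brute force is of course only feasible for tiny contexts:

from itertools import chain, combinations

def subsets(G):
    G = list(G)
    return (frozenset(c) for c in chain.from_iterable(combinations(G, r) for r in range(len(G) + 1)))

def closure(I, X, G, M):
    # X'' : the extent generated by X in the formal context (G, M, I).
    B = {m for m in M if all((g, m) in I for g in X)}
    return frozenset(g for g in G if all((g, m) in I for m in B))

def egen_table(I, G, M):
    # egen(E): the number of subsets of G whose closure is the extent E (brute force).
    table = {}
    for X in subsets(G):
        E = closure(I, X, G, M)
        table[E] = table.get(E, 0) + 1
    return table

# Hypothetical 3-object, 2-attribute context; it has 3 extents with egen values 2, 2 and 4.
G, M = {1, 2, 3}, {'a', 'b'}
I = {(1, 'a'), (2, 'a'), (2, 'b')}
table = egen_table(I, G, M)
assert sorted(table.values()) == [2, 2, 4] and sum(table.values()) == 2 ** len(G)

# The identity stated above: for every extent E, the sum over its generating sets Y of 1/egen(Y) is 1.
sums = {}
for Y in subsets(G):
    E = closure(I, Y, G, M)
    sums[E] = sums.get(E, 0.0) + 1.0 / table[E]
assert all(abs(s - 1.0) < 1e-9 for s in sums.values())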
We have also learnt that the problem of generating random extents is known to belong to a (dicult) complexity class: it is equivalent to the #RHΠ1 -hard problem of counting formal concepts (again, see [2] and the references given there). We already knew (because our colleagues of the stochastics group told us so and recommended the book by Asmussen and Glynn [1] as a standard reference) that our approach is an instance of the so-called acceptance-rejection method. 2 Random Extent Our innocent looking algorithm for generating a random extent of a given formal context (G, M, I) goes like this: Algorithm 1 Random_extent: Generating a random extent Input: A formal context (G, M, I). Output: A random extent of (G, M, I) repeat S := Random_subset(G) 1 until Random(0,1] ≤ egen (S) return S 00 . What the algorithm does essentially is to pick a random subset and output 3 its closure with probability one over the number of generating sets . It is quite elementary to prove that it does what it is supposed to do: Proposition 1 The algorithm Random_extent generates extents of (G, M, I) with equal probability. The proposition is an instance of the following lemma from elementary stochas- tics, for which we provide a proof. To obtain the proposition from the lemma, let 3 One of the referees pointed out that a much simpler algorithm with the same number of expected iterations is obtained by replacing the until statement by until S is closed. We see however no straightforward way to a recursive version of this algorithm. 312 Bernhard Ganter A be the set of all subsets of G, let B be the set of all extents, and let f be the map that associates a subset to the extent it generates. Lemma 1 Let f : A → B be a surjective (i.e., onto) map between nite sets A and B and let Random(A) be an operator that picks elements from A with equal probability. Then Algorithm 2 outputs elements of B with equal probability. Algorithm 2 Random image: Random image of a mapping Input: An onto map f : A → B and an operator Random(A) Output: A random element of B repeat a := Random(A) r := Random(0,1] b := f (a); until r ≤ |f −11(b)| return b. Proof It is obvious that the algorithm produces elements of B . In order that a given element b is produced in one iteration of the loop, the element a must belong to f −1 (b) and, independently, r ≤ |f −11(b)| . The probability that this happens is |f −1 (b)| 1 1 · −1 = , |A| |f (b)| |A| independently of b. The probability that some element is selected after one step |B| thus is . The probability that the element b is produced after k steps is |A| k−1 |B| 1 1− · . |A| |A| The probability that b is produced is X∞ k−1 |B| 1 |A| 1 1 1− · = · = , k=1 |A| |A| |B| |A| |B| as claimed. The expected number of iterations until success is |A| #subsets = . |B| #extents The algorithm may therefore need quite some time. For example, would this algo- rithm be applied to the standard context of closure systems to generate a random Random extents and random closure systems 313 closure system on a 6-element set, it requires, on average, 121 402 088 iterations 6 of the loop, since that context has 2 − 1 objects and 75 973 751 474 extents ([5]). For closure systems on a seven-element set the average number of loop iterations for obtaining a single random closure system would be 12 077 330 482 260 320 447. As already mentioned we shall develop a better method for this case below. Before we do so, we study the problem of computing the value of egen(A). 
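A direct transcription of Random_extent, again with egen computed by brute force (our Python; only sensible for very small contexts):

import random
from collections import Counter
from itertools import chain, combinations

def closure(I, X, G, M):
    # X'' via the two derivation operators of the context (G, M, I).
    B = {m for m in M if all((g, m) in I for g in X)}
    return frozenset(g for g in G if all((g, m) in I for m in B))

def egen(I, Y, G, M):
    # Number of subsets of G generating the same extent as Y (brute force, small G only).
    target = closure(I, Y, G, M)
    all_subsets = chain.from_iterable(combinations(list(G), r) for r in range(len(G) + 1))
    return sum(1 for X in all_subsets if closure(I, X, G, M) == target)

def random_extent(I, G, M):
    # Acceptance-rejection loop of Random_extent: draw a random subset S of G,
    # accept it with probability 1/egen(S), and return its closure S''.
    G = list(G)
    while True:
        S = {g for g in G if random.random() < 0.5}      # Random_subset(G)
        if random.random() <= 1.0 / egen(I, S, G, M):    # accept with probability 1/egen(S)
            return closure(I, S, G, M)

# Hypothetical small context: the three extents should come out with roughly equal frequency.
G, M = {1, 2, 3}, {'a', 'b'}
I = {(1, 'a'), (2, 'a'), (2, 'b')}
print(Counter(random_extent(I, G, M) for _ in range(3000)))

On this toy context each of the three extents should appear with frequency close to 1/3, which is exactly the content of Proposition 1; the expected number of loop iterations per sample is 2^|G| divided by the number of extents, as computed above.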
3 Counting generating sets and hitting sets The algorithm in the previous section uses the number egen(A) of a given extent A, and that by itself is not easy. Of course, each such generating set must be a subset of A. On the other hand, a subset S ⊆ A is a generating set of A i it is not contained in a lower neighbor of A. It is worthwhile to consider the formal context (A, N , ∈), where N is the family of lower neighbor extents of A. For this context, the elements of N are precisely the maximal extents below A, and thus the generating sets of A are the same as before. Counting generating sets thereby has been reduced to counting generating sets of the unit element in a co-atomistic lattice. Every subset of A is generating set of exactly one extent of (A, N , ∈). The |A| total number of generating sets thus is 2 . Indeed, for every extent B we obtain X egen(E) = 2|B| , E≤B where E runs over extents. By Möbius inversion we obtain X egen(A) = µ(E, A) · 2|E| , E≤A where µ is the Möbius function of the lattice B(A, N , ∈). The evaluation of this formula poses no algorithmic diculties. Using the standard Next_intent algorithm ([4]) to generate the extents in descending order, and using, for every constructed extent E , the same algorithm again for producing all extents F between E and A, suces to compute the Möbius function by the well known recursion X µ(E, A) = − µ(F, A). E 2n−1 ) do if F [i] = 2 then j := 2n−1 − 1 while success and (j > 0) do meet := i and j if (j 6= meet) and (F [meet] 6= 1) then success := (Random(0,1] < 0.5) F [meet] := 1 j := j − 1 i := i − 1 until success return F . 318 Bernhard Ganter We have implemented Algorithm 5 for n = 7 and present rst experimental results. Note that the number of random samples produced by this experiment is small compared to the number of closure systems: we have generated less than 0.000 000 000 000 4% of all closure systems on seven points. 85 85 80 80 75 75 70 70 65 65 60 60 55 55 50 50 45 45 40 40 35 35 30 30 25 25 20 20 15 15 10 10 5 5 5 10 15 20 25 30 35 5 10 15 20 25 30 35 Fig. 2. A random sample of 50 000 closure systems on a seven element set, plotted according to their number of irreducible closed sets (horizontal) and reducible closed sets (vertical). The left image shows which sizes occurred at least once. The right image expresses higher frequencies by darker shadings. The computation took one night on a 1.4 GHz PC. We did not even attempt to generate random closure systems on eight elements using Algorithm 5. We believe that a substantially better idea is needed for that case and beyond. References 1. S. Asmussen and P. W. Glynn. Stochastic Simulation. Springer-Verlag, New York, 2007. 2. Mario Boley, Henrik Grosskreutz, and Thomas Gärtner. Formal concept sampling for counting and thresholdfree local pattern mining. In Proc. of the SIAM Int. Conf. on Data Mining (SDM 2010). SIAM, 2010. 3. Pierre Colomb, Alexis Irlande, and Olivier Raynaud. Counting of Moore families for n = 7. In ICFCA'10, pages 7287, 2010. 4. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis - mathematical foundations. Springer Verlag, 1999. 5. Michel Habib and Lhouari Nourine. The number of Moore families on n = 6. Discrete Mathematics, pages 291296, 2005. 6. Albert Nijenhuis and Herbert S. Wilf. Combinatorial algorithms. Academic Press, 1975. 
Extracting Decision Trees from Interval Pattern Concept Lattices

Zainab Assaghir1, Mehdi Kaytoue2, Wagner Meira Jr.2 and Jean Villerd3
1 INRIA Nancy Grand Est / LORIA, Nancy, France
2 Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
3 Institut National de Recherche Agronomique / Ensaia, Nancy, France
Zainab.Assaghir@loria.fr, {kaytoue,meira}@dcc.ufmg.br, Jean.Villerd@nancy.inra.fr

Abstract. Formal Concept Analysis (FCA) and concept lattices have shown their effectiveness for binary clustering and concept learning. Moreover, several links between FCA and unsupervised data mining tasks such as itemset mining and association rule extraction have been emphasized. Several works also studied FCA in a supervised framework, showing that popular machine learning tools such as decision trees can be extracted from concept lattices. In this paper, we investigate the links between FCA and decision trees with numerical data. Recent works showed the efficiency of "pattern structures" to handle numerical data in FCA, compared to traditional discretization methods such as conceptual scaling.

1 Introduction

Decision trees (DT) are among the most popular classification tools, especially for their readability [1]. Connections between DT induction and FCA have been widely studied in the context of binary and nominal features [2], including structural links between decision trees and dichotomic lattices [8], and lattice-based learning [7]. However, the numerical case faces issues regarding FCA and numerical data. In this paper, we investigate the links between FCA and decision trees with numerical data and a binary target attribute. We use an extension of Formal Concept Analysis called interval pattern structures to extract sets of positive and negative hypotheses from numerical data. Then, we propose an algorithm that extracts decision trees from minimal positive and negative hypotheses.

The paper is organised as follows. Section 2 presents the basics of FCA and one of its extensions, called interval pattern structures, for numerical data. Section 3 recalls basic notions of decision trees. Then, in Section 4, we introduce some definitions showing the links between interval pattern structures and decision trees, and a first algorithm for building decision trees from minimal positive and negative hypotheses.

c 2011 by the paper authors. CLA 2011, pp. 319–332. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France.

2 Pattern structures in formal concept analysis

Formal contexts and concept lattices. We assume that the reader is familiar with FCA, and recall here the most important definitions from [3]. Basically, data are represented as a binary table called a formal context (G, M, I) that represents a relation I between a set of objects G and a set of attributes M. The statement (g, m) ∈ I is interpreted as "the object g has attribute m". The two operators (·)′ define a Galois connection between the powersets (2^G, ⊆) and (2^M, ⊆), with A ⊆ G and B ⊆ M:
A′ = {m ∈ M | ∀g ∈ A : gIm} for A ⊆ G,
B′ = {g ∈ G | ∀m ∈ B : gIm} for B ⊆ M.
For A ⊆ G, B ⊆ M, a pair (A, B) such that A′ = B and B′ = A is called a (formal) concept. In (A, B), the set A is called the extent and the set B the intent of the concept (A, B).
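As a minimal illustration of the two derivation operators and of the concept condition A′ = B and B′ = A, here is a sketch in Python on a toy context of our own (not the paper's running example):

def prime_objects(A, I, M):
    # A' : the attributes shared by all objects in A.
    return {m for m in M if all((g, m) in I for g in A)}

def prime_attributes(B, I, G):
    # B' : the objects having all attributes in B.
    return {g for g in G if all((g, m) in I for m in B)}

def is_formal_concept(A, B, I, G, M):
    # (A, B) is a formal concept iff A' = B and B' = A.
    return prime_objects(A, I, M) == set(B) and prime_attributes(B, I, G) == set(A)

# Hypothetical toy context: objects 1..3, attributes a and b.
G, M = {1, 2, 3}, {'a', 'b'}
I = {(1, 'a'), (2, 'a'), (2, 'b'), (3, 'b')}
A = prime_attributes({'a'}, I, G)          # {1, 2}
B = prime_objects(A, I, M)                 # {'a'}
assert is_formal_concept(A, B, I, G, M)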
The set of all concepts is partially ordered by (A1 , B1 ) ≤ (A2 , B2 ) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1 ) and forms a complete lattice called the concept lattice of the formal context (G, M, I). In many applications, data usually consist in complex data involving num- bers, intervals, graphs, etc. (e.g. Table 1) and require to be conceptually scaled into formal contexts. Instead of transforming data, leading to representation and computational difficulties, one may directly work on the original data. Indeed, to handle complex data in FCA, Ganter & Kuznetsov [4] defined pattern struc- tures: it consists of objects whose descriptions admit a similarity operator which induces a semi-lattice on data descriptions. Then, the basic theorem of FCA nat- urally holds. We recall here their basic definitions, and present interval pattern structures from [5] to handle numerical data. Patterns structures. Formally, let G be a set of objects, let (D, u) be a meet-semi-lattice of potential object descriptions and let δ : G −→ D be a mapping. Then (G, (D, u), δ) is called a pattern structure. Elements of D are called patterns and are ordered by a subsumption relation v such that given c, d ∈ D one has c v d ⇐⇒ c u d = c. A pattern structure (G, (D, u), δ) gives rise to the following derivation operators (·) , given A ⊆ G and an interval pattern d ∈ (D, u): A = l δ(g) g∈A d = {g ∈ G|d v δ(g)} These operators form a Galois connection between (2G , ⊆) and (D, v). (Pattern) concepts of (G, (D, u), δ) are pairs of the form (A, d), A ⊆ G, d ∈ (D, u), such that A = d and A = d . For a pattern concept (A, d), d is called a pattern intent and is the common description of all objects in A, called pattern extent. When partially ordered by (A1 , d1 ) ≤ (A2 , d2 ) ⇔ A1 ⊆ A2 (⇔ d2 v d1 ), the set of all concepts forms a complete lattice called a (pattern) concept lattice. Interval pattern structures. Pattern structures allow us to consider com- plex data in full compliance with FCA formalism. This requires to define a meet Extracting Decision Trees From Interval Pattern Concept Lattices 321 III operator on object descriptions, inducing their partial order. Concerning numer- ical data, an interesting possibility presented in [5] is to define a meet operator as an interval convexification. Indeed, one should realize that “similarity” or “in- tersection” between two real numbers (between two intervals) may be expressed in the fact that they lie within some (larger) interval, this interval being the smallest interval containing both two. Formally, given two intervals [a1 , b1 ] and [a2 , b2 ], with a1 , b1 , a2 , b2 ∈ R, one has: [a1 , b1 ] u [a2 , b2 ] = [min(a1 , a2 ), max(b1 , b2 )] [a1 , b1 ] v [a2 , b2 ] ⇔ [a1 , b1 ] ⊇ [a2 , b2 ] The definition of u implies that smaller intervals subsume larger intervals that contain them. This is counter intuitive referring to usual intuition, and is ex- plained by the fact that u behaves as an union (actually convex hull is the union of intervals, plus the holds between them). These definitions of u and v can be directly applied component wise on vectors of numbers or intervals, e.g. in Table 1 where objects are described by vectors of values, each dimension corresponding to an attribute. For example, h[5, 7.2], [1, 1.8]i v h[5, 7], [1, 1.4]i as [5, 7.2] v [5, 7] and [1, 1.8] v [1, 1.4]. Now that vectors of interval forms a u-semi-lattice, numerical data such as Table 1 give rise to a pattern structure and a pattern concept lattice. An example of application of concept forming operators (.) 
is given below. The corresponding pattern structure is (G, (D, u), δ) with G = {p1 , ..., p4 , n1 , ..., n3 } and d ∈ D is a vector with ith component corresponding to attribute mi . {p2 , p3 } = δ(p2 ) u δ(p3 ) = h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i {p2 , p3 } = {p2 , p3 , p4 } As detailed in [5], vectors of intervals can be seen as hyperrectangles in Eu- clidean space: first (.) operator gives the smallest rectangle containing some object descriptions while second (.) operator returns the set of objects whose descriptions are rectangles included in the rectangle in argument. Accordingly, ({p2 , p3 , p4 }, h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i) is a pattern concept. All pattern concepts of an interval pattern structure form a concept lattice. Intuitively, low- est concepts have few objects and “small” intervals while higher concepts have “larger” intervals. An example of such lattice is given later. 3 Decision trees Among all machine leaning tools, decision trees [6, 1] are one of the most widely used. They belong to the family of supervised learning techniques, where data consist in a set of explanatory attributes (binary, nominal or numerical) that describe each object, called example, and one target class attribute that affects each example to a nominal class. Many extensions have been proposed, e.g. to consider a numerical class attribute (regression trees) or other particular cases depending on the nature of attributes. In this paper we focus on data consisting 322 IV Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd m1 m2 m3 m4 p1 7 3.2 4.7 1.4 + p2 5 2 3.5 1 + p3 5.9 3.2 4.8 1.8 + p4 5.5 2.3 4 1.3 + n1 6.7 3.3 5.7 2.1 - n2 7.2 3.2 6 1.8 - n3 6.2 2.8 4.8 1.8 - Table 1: Example 1: numerical context with an external target attribute of numerical explanatory attributes and a binary class attribute. The aim of de- cision tree learning is to exhibit the relation between explanatory attributes and the class attribute through a set of decision paths. A decision path is a sequence of tests on the value of explanatory attributes that is a sufficient condition to assign a new example to one of the two classes. A decision tree gathers a set of decision paths through a tree structure where nodes contain tests on explana- tory attributes. Each node has two branches, the left (resp. right) corresponds to the next test if the new example passed (resp. failed) the current test. When there is no more test to perform, the branch points to a class label, that repre- sents a leaf of the tree. The links between FCA and decision tree learning have been investigated in the case where explanatory attributes are binary [7–10, 2]. However, to our knowledge, no research has been carried out until now in the case of numerical explanatory attributes. In the next section, we show how pat- tern structures can be used to extract decision trees from numerical data with positive and negative examples. 4 Learning in interval pattern structures In [7], S. Kuznetsov considers a machine learning model in term of formal concept analysis. He assumes that the cause of a target property resides in common at- tributes of objects sharing this property. In the following, we adapt this machine learning model to the case of numerical data. Let us consider an interval pattern structure (G, (D, u), δ) with an external target property . The set of objects G (the training set) is partitioned into two disjoints sets: positive G+ and negative G− . 
Then, we obtain two different pattern structures (G+ , (D, u), δ) and (G− , (D, u), δ). Definition 1 (Positive hypothesis). A positive hypothesis h is defined as an interval pattern of (G+ , (D, u), δ) that is not subsumed by any interval pattern of (G− , (D, u), δ), i.e. not subsumed by any negative example. Formally, h ∈ D is a positive hypothesis iff h ∩ G− = ∅ and ∃A ⊆ G+ such that A = h Definition 2 (Negative hypothesis). A negative hypothesis h is defined as an interval pattern of (G− , (D, u), δ) that is not subsumed by any interval pattern Extracting Decision Trees From Interval Pattern Concept Lattices 323 V of (G+ , (D, u), δ), i.e. not subsumed by any positive example. Formally, h ∈ D is a negative hypothesis iff h ∩ G+ = ∅ and ∃A ⊆ G− such that A = h Definition 3 (Minimal hypothesis). A positive (resp. negative) hypothesis h is minimal iff there is no positive (resp. negative) hypothesis e 6= h such that e v h. Going back to numerical data in Table 1, we now consider the ex- ternal binary target property and split accordingly the object set into G+ = {p1 , p2 , p3 , p4 } and G− = {n1 , n2 , n3 }. The pattern concept lat- tice of (G+ , (D, u), δ), where D is the semi-lattice of intervals and δ is a mapping associating for each object its pattern description is given in Fig- ure 1 where positive hypothesis are marked. Note that neither the interval pattern h[5.5, 7], [2.3, 3.2], [4, 4.8], [1.3, 1.8]i nor h[5, 7], [2, 3.2], [3.5, 4.8], [1, 1.8]i are positive hypothesis since they are both subsumed by the inter- val pattern δ(n3 ) = h[6.2, 6.2], [2.8, 2.8], [4.8, 4.8], [1.8, 1.8]i. Therefore, there are two minimal positive hypothesis: P1 = h[5, 7], [2, 3.2], [3.5, 4.7], [1, 1.4]i and P2 = h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i. From (G− , (D, u), δ) (not shown), we obtain the unique minimal negative hypothesis: N1 = h[6.2, 7.2], [2.8, 3.3], [4.8, 6], [1.8, 2.1]i. Now, we consider decision trees more formally. Let the training data be de- scribed by K+− = (G+ ∪ G− , (D, u), δ) with the derivation operator denoted by (.) . This operator is called subposition in term of FCA. Definition 4 (Decision path). A sequence h(m1 , d1 ), (m2 , d2 ), . . . , (mk , dk )i, for different attributes m1 , m2 , · · · , mk chosen one after another, is a called deci- sion path of length k if there is no mi such that (mi , di ), (mi , ei ) and di and ei are not comparable, and there exists g ∈ G+ ∪ G− such that hd1 , d2 , . . . , dk i v δ(g) (i.e. there is at least one example g such that di v δ(g) for each attribute mi ). For instance, h(m3 , [4.8, 6]), (m1 , [6.2, 7.2])i is a decision path for Example 1. If i ≤ k (respectively i < k), the sequence h(m1 , d1 ), (m2 , d2 ), . . . , (mi , di )i is called subpath (proper subpath) of a decision path h(m1 , d1 ), (m2 , d2 ), . . . , (mk , dk )i. Definition 5 (Full decision path). A sequence h(m1 , d1 ), (m2 , d2 ), . . . , (mk , dk )i, for different attributes m1 , m2 , . . . , mk chosen one after another, is called full de- cision path of length k if all object having (m1 , d1 ), (m2 , d2 ), . . . , (mk , dk ) (i.e. ∀g ∈ G, di v δ(g) for the attribute mi ) are either positive or negative examples (i.e. have either + or − value of the target attribute). We say that a full decision path is non-redundant if none of its subpaths is a full decision path. The set of all chosen attributes in a full decision path can be considered as a sufficient condition for an object to belong to a class ∈ {+, −}. 
A decision tree is then defined as the set of full decision paths. Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd P1 P2 minimal positive hypothesis minimal positive hypothesis positive hypothesis Fig. 1: Lattice of the pattern structure (G+ , (D, u), δ). 324 VI Extracting Decision Trees From Interval Pattern Concept Lattices 325 VII 4.1 A first algorithm for building decision trees from interval pattern structures In this section, we propose a first algorithm for extracting full decision paths from the sets of minimal positive hypothesis P and minimal negative hypoth- esis N . Intuitively, minimal positive (resp. negative) hypothesis describe the largest areas in the attribute space that gathers the maximum number of posi- tive (resp. negative) examples with no negative (resp. positive) example. Positive and negative areas may intersect on some dimensions. In Example 1 (see Table 1), P = {P1 , P2 } and N = {N1 } and we denote by Pi ∩ Nj the interval vector for which the k-the component is the intersection of the Pi and Nj intervals for the k-the component. Recall that P1 = h[5, 7], [2, 3.2], [3.5, 4.7], [1, 1.4]i, P2 = h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i and N1 = h[6.2, 7.2], [2.8, 3.3], [4.8, 6], [1.8, 2.1]i. Then we have: P1 ∩ N1 = h[6.2, 7], [2.8, 3.2], ∅, ∅i P2 ∩ N1 = h∅, [2.8, 3.2], [4.8], [1.8]i We note that P1 and N1 have no intersection for attributes m3 and m4 . This means that any example that has a value for m3 (resp. m4 ) that is contained in P1 ’s interval for m3 (resp. m4 ) can directly be classified as positive. Similarly, any example having a value for m3 (resp. m4 ) contained in N1 ’s interval for m3 (resp. m4 ) can directly be classified as negative. The same occurs for P2 and N1 for m1 . Therefore a full decision path for a minimal positive hypothesis P is defined as a sequence h(mi , mi (P ))ii∈{1...|N |} where mi is an attribute such that mi (P ∩ Ni ) = ∅4 . A full decision path for a minimal negative hypothesis N is defined as a sequence h(mj , mj (N ))ij∈{1...|P|} where mj is an attribute such that mj (N ∩ Pi ) = ∅. Here examples of such decision paths (built from P1 , P2 and N1 respectively) are: h(m3 , [3.5, 4.7])i(P1 ) h(m4 , [1, 1.4])i(P1 ) h(m1 , [5, 5.9])i(P2 ) h(m3 , [4.8, 6]), (m1 , [6.2, 7.2])i(N1 ) h(m4 , [1.8, 2.1]), (m1 , [6.2, 7.2])i(N1 ) Decision paths built from P1 and P2 are sequences that contain a single element since |N | = 1. Decision paths built from N1 are sequences that contain two elements since |P| = 2. Two distinct full decision paths can be built from P1 since there are two attributes for which P1 and N1 do not intersect. A positive (resp. negative) decision tree is therefore a set of full decision paths, one for each minimal positive (resp.negative) hypothesis. For instance: 4 For any interval pattern P , the notation mi (P ) denotes its interval value for the attribute mi . 326 VIII Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd ”if m3 ∈ [3.5, 4.7] then +, else if m1 ∈ [5, 5.9] then + else -” is an example of positive decision path. An example of negative decision path is ”if m1 ∈ [6.2, 7.2] and m3 ∈ [4.8, 6] then -, else +”. Algorithm 1 describes the computation of full decision paths for minimal pos- itive hypothesis. The dual algorithm for minimal negative hypothesis is obtained by interchanging P and N . 
1 Res ← empty array of size |P|; 2 foreach P ∈ P do 3 foreach N ∈ N such that ∃mi , mi (P ∩ N ) = ∅ do 4 if (mi , mi (P )) 6∈ Res[P ] then 5 Res[P ] ← Res[P ] ∪ (mi , mi (P )); 6 foreach N ∈ N such thatW6 ∃mi , mi (P ∩ N ) = ∅ do 7 Res[P ] ← Res[P ] ∪ { m∈M (m, m(P ) \ m(P ∩ N ))}; Algorithm 1: Modified algorithm for extracting full decision paths Res (in- cluding non-redundant) for minimal positive hypothesis The different steps of the algorithm are detailed below: line 1: Res will contain a full decision path for each minimal positive hypothesis. line 2: Process each minimal positive hypothesis P . line 3: For each minimal negative hypothesis N that has at least one attribute m such that m(P ∩ N ) = ∅, choose one of these attribute, called mi below. line 4: Ensure that mi has not already been selected for another N , this enables to produce non redundant full decision paths (see Example 2). line 5: Add the interval mi (P ) in the full decision path of P . The test mi ∈ mi (P ) will separate between positive examples covered by P and negative examples cov- ered by N . line 6: For each minimal negative hypothesis N that has no attribute m such that m(P ∩ N ) = ∅. line 7: Positive examples covered by P and negative examples covered by N can be separated by a disjunction of tests m ∈ m(P ) \ m(P ∩ N )) on each attribute m. Hence, there is at least one attribute for which a positive example from P belongs to m(P ) and not to m(N ). Otherwise, N would not be a negative hy- pothesis. Note that Example 1 is a particular case where all negative examples are gathered in a unique minimal negative hypothesis. A few values have been modified in Table 2 in order to produce two minimal negative hypothesis. Extracting Decision Trees From Interval Pattern Concept Lattices 327 IX m1 m2 m3 m4 p1 7 3.2 4.7 1.4 + p2 5 2 3.5 1 + p3 5.9 3.2 4.8 1.8 + p4 5.5 2.3 4 1.3 + n1 5.9 3.3 5.7 1.4 - n2 7.2 3.2 6 1.8 - n3 6.2 2.8 4.8 1.8 - Table 2: Example 2: training set Minimal positive hypothesis P1 and P2 remain unchanged while there are two minimal negative hypothesis: N1 = h[5.9, 7.2], [3.2, 3.3], [5.7, 6], [1.4, 1.8]i N2 = h[6.2, 7.2], [2.8, 3.2], [4.8, 6], [1.8, 1.8]i This leads to the following intersections: P1 ∩ N1 = h[5.9, 7], [3.2], ∅, [1.4]i P1 ∩ N2 = h[6.2, 7], [2.8, 3.2], ∅, ∅i P2 ∩ N1 = h[5.9], [3.2], ∅, [1.4, 1.8]i P2 ∩ N2 = h∅, [2.8, 3.2], [4.8, 4.8], [1.8]i Examples of full decision path computed by Algorithm 1 from P1 are h(m3 , [3.5, 4.7]), (m4 , [1, 1.4])i(1) h(m3 , [3.5, 4.7]), (m3 , [3.5, 4.7])i(2) Note that neither N1 nor N2 intersect P1 on m3 , therefore the full decision path (2) can be simplified as h(m3 , [3.5, 4.7])i. More generally, following pre- vious definitions, h(m3 , [3.5, 4.7])i is a non-redundant full decision path while h(m3 , [3.5, 4.7]), (m4 , [1, 1.4])i and h(m3 , [3.5, 4.7]), (m3 , [3.5, 4.7])i are not. A con- ditional test has been added in Algorithm 1 in order to also produce such non- redundant full decision paths. Finally a concrete positive decision tree is built from the set of full decision paths, each node corresponds to a minimal positive hypothesis Pi and contains a test that consists in the conjunction of the elements of a full decision path. The left child contains + and the right child is a node corresponding to another minimal positive hypothesis Pj or - if all minimal positive hypothesis have been processed. An example of decision tree for example 2 is: ”if m3 ∈ [3.5, 4.7] and m4 ∈ [1, 1.4] then +, else (if m3 ∈ [3.5, 4.8] and m1 ∈ [5, 5.9] then +, else -)”. 
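The pseudocode leaves some freedom: the choice of mi on line 3, and lines 6 and 7, whose non-existence and disjunction symbols were damaged in extraction, are read here as "for every N that intersects P on every attribute, add a disjunction of tests m ∈ m(P) \ m(P ∩ N)". Under that reading, a minimal self-contained Python sketch (not the authors' implementation) is given below; hypotheses are tuples of (lo, hi) intervals indexed by attribute, and all names are illustrative.

def inter(i, j):
    # Intersection of two closed intervals; None encodes the empty interval.
    lo, hi = max(i[0], j[0]), min(i[1], j[1])
    return (lo, hi) if lo <= hi else None

def full_decision_paths(positive_hyps, negative_hyps):
    res = {P: [] for P in positive_hyps}                       # line 1
    for P in res:                                              # line 2
        for N in negative_hyps:
            empty = [k for k in range(len(P)) if inter(P[k], N[k]) is None]
            if empty:                                          # line 3
                mk = empty[0]                                  # pick one such attribute
                if (mk, P[mk]) not in res[P]:                  # line 4
                    res[P].append((mk, P[mk]))                 # line 5
            else:                                              # line 6
                # line 7: record, per attribute, the interval of P and the part
                # shared with N, from which the tests m in m(P) \ m(P n N) follow.
                res[P].append(("or", tuple((k, P[k], inter(P[k], N[k]))
                                           for k in range(len(P)))))
    return res

# Example 1 (attribute index k stands for m_{k+1}):
P1 = ((5.0, 7.0), (2.0, 3.2), (3.5, 4.7), (1.0, 1.4))
P2 = ((5.0, 5.9), (2.0, 3.2), (3.5, 4.8), (1.0, 1.8))
N1 = ((6.2, 7.2), (2.8, 3.3), (4.8, 6.0), (1.8, 2.1))
paths = full_decision_paths([P1, P2], [N1])
print(paths[P1])   # [(2, (3.5, 4.7))]  i.e. "if m3 in [3.5, 4.7] then +"
print(paths[P2])   # [(0, (5.0, 5.9))]  i.e. "if m1 in [5, 5.9] then +"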
We detail below the complete process for examples 1 and 2. 328 X Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd 4.2 Example 1 P P1 = h[5, 7], [2, 3.2], [3.5, 4.7], [1, 1.4]i P2 = h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i N N1 = h[6.2, 7.2], [2.8, 3.3], [4.8, 6], [1.8, 2.1]i intersections P1 ∩ N1 = h[6.2, 7], [2.8, 3.2], ∅, ∅i P2 ∩ N1 = h∅, [2.8, 3.2], [4.8], [1.8]i full decisions paths from P h(m3 , [3.5, 4.7])i(P1 ) h(m4 , [1, 1.4])i(P1 ) h(m1 , [5, 5.9])i(P2 ) full decisions paths from N h(m3 , [4.8, 6]), (m1 , [6.2, 7.2])i(N1 ) h(m4 , [1.8, 2.1]), (m1 , [6.2, 7.2])i(N1 ) m3 ∈ [3.5, 4.7] m1 ∈ [5, 5.9] yes no yes no + m1 ∈ [5, 5.9] + m3 ∈ [3.5, 4.7] yes no yes no + − + − m4 ∈ [1, 1.4] m1 ∈ [5, 5.9] yes no yes no + m1 ∈ [5, 5.9] + m4 ∈ [1, 1.4] yes no yes no + − + − full paths from P1, then from P2 full paths from P2, then from P1 m1 ∈ [6.2, 7.2] ∧ m3 ∈ [4.8, 6] m1 ∈ [6.2, 7.2] ∧ m4 ∈ [1.8, 2.1] yes no yes no − + − + full paths from N1 Fig. 2: Decision trees built from Example 1 Extracting Decision Trees From Interval Pattern Concept Lattices 329 XI 4.3 Example 2 P P1 = h[5, 7], [2, 3.2], [3.5, 4.7], [1, 1.4]i P2 = h[5, 5.9], [2, 3.2], [3.5, 4.8], [1, 1.8]i N N1 = h[5.9, 7.2], [3.2, 3.3], [5.7, 6], [1.4, 1.8]i N2 = h[6.2, 7.2], [2.8, 3.2], [4.8, 6], [1.8, 1.8]i intersections P1 ∩ N1 = h[5.9, 7], [3.2], ∅, [1.4]i P1 ∩ N2 = h[6.2, 7], [2.8, 3.2], ∅, ∅i P2 ∩ N1 = h[5.9], [3.2], ∅, [1.4, 1.8]i P2 ∩ N2 = h∅, [2.8, 3.2], [4.8, 4.8], [1.8]i full decisions paths from P h(m3 , [3.5, 4.7])i(P1 ) h(m3 , [3.5, 4.7]), (m4 , [1, 1.4])i(P1 ) (redundant) h(m3 , [3.5, 4.8]), (m1 , [5, 5.9])i(P2 ) full decisions paths from N h(m3 , [5.7, 6])i(N1 ) h(m3 , [4.8, 6]), (m1 , [6.2, 7.2])i(N2 ) h(m4 , [1.8]), (m1 , [6.2, 7.2])i(N2 ) 4.4 Comparison with traditional decision tree learning approaches Standard algorithms such as C4.5 produce decision trees in which nodes contain tests of the form a ≤ v, i.e. the value for attribute a is less or equal to v, while our nodes contain conjunctions of tests of the form a ∈ [a1 , a2 ] ∧ b ∈ [b1 , b2 ]. A solution consists in identifying minimal and maximal values for each attribute in the training set, and by replacing them by −∞ and +∞ respectively in the resulting trees (see Figure 4). Moreover, common decision tree induction tech- niques use Information Gain maximization (or equivalently conditional entropy minization) to choose the best split at each node. The conditional entropy of a split is null when each child node is pure (contains only positive or negative ex- amples). When this perfect split can not be expressed as an attribute-value test, it can be shown that the optimal split that minimize conditional entropy consists in maximizing the number of examples in one pure child node (proof is ommited due to space limitation). This optimal split exactly matches our notion of posi- tive (resp. negative) minimal hypothesis, which corresponds to descriptions that gathers the maximum number of only positive (resp. negative) examples. However we insist that our algorithm is only a first and naive attempt to produce decision trees from multi-valued contexts using pattern structures. Its aim is only to clarify the links between decision tree learning and pattern struc- tures. Therefore it obviously lacks of relevant data structures and optimization. However we plan to focus our efforts on algorithm optimization and then on rigorous experimentations on standard datasets. 
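To make the information-gain argument of Section 4.4 concrete, the following self-contained sketch (not part of the paper) computes the conditional entropy of a threshold split on Example 1; the split m4 ≤ 1.4 is the one appearing in the C4.5 tree of Figure 4, and the helper names are illustrative.

from math import log2

def entropy(labels):
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def conditional_entropy(values, labels, threshold):
    # Weighted entropy of the two children produced by the test "value <= threshold".
    left  = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

# Attribute m4 and the class column of Table 1 (Example 1):
m4     = [1.4, 1.0, 1.8, 1.3, 2.1, 1.8, 1.8]
labels = ["+", "+", "+", "+", "-", "-", "-"]
print(conditional_entropy(m4, labels, 1.4))
# ~0.46, not 0: the left child (m4 <= 1.4) is pure and gathers three of the four
# positive examples, while the right child still mixes p3 with n1, n2, n3.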
5 Concluding remarks In this paper, we studied the links between decision trees and FCA in the par- ticular context of numerical data. More precisely, we focused on an extension 330 XII Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd m3 ∈ [3.5, 4.7] m3 ∈ [3.5, 4.8] ∧ m1 ∈ [5, 5.9] yes no yes no + m3 ∈ [3.5, 4.8] ∧ m1 ∈ [5, 5.9] + m3 ∈ [3.5, 4.7] yes no yes no + − + − full paths from P1, then from P2 full paths from P2, then from P1 m3 ∈ [5.7, 6] m3 ∈ [4.8, 6] ∧ m1 ∈ [6.2, 7.2] yes no yes no − m3 ∈ [4.8, 6] ∧ m1 ∈ [6.2, 7.2] − m3 ∈ [5.7, 6] yes no yes no − + − + m3 ∈ [5.7, 6] m4 = 1.8 ∧ m1 ∈ [6.2, 7.2] yes no yes no − m4 = 1.8 ∧ m1 ∈ [6.2, 7.2] m3 ∈ [5.7, 6] − yes no yes no − + − + full paths from N1, then from N2 full paths from N2, then from N1 Fig. 3: Decision trees built from Example 2 Extracting Decision Trees From Interval Pattern Concept Lattices 331 XIII m4 ∈ (−∞, 1.4] m4 ≤ 1.4 yes no yes no + m1 ∈ (−∞, 5.9] + m1 ≤ 5.9 yes no yes no + − + − our approach Weka implementation of C4.5 Fig. 4: Comparison of decisions trees produced by our approach and by C4.5 for Ex- ample 1 of FCA for numerical data called interval pattern structures, that has recently gained popularity through its ability to handle numerical data without any dis- cretization step. We showed that interval pattern structures from positive and negative examples are able to reveal positive and negative hypothesis, from which decision paths and decision trees can be built. In future works, we will focus on a comprehensive and rigorous comparison of our approach with traditional decision tree learning techniques. Moreover, we will study how to introduce in our approach pruning techniques that avoid overfitting. We will also investigate solutions in order to handle nominal class attributes (i.e. more than two classes) and heterogeneous explanatory attributes (binary, nominal, ordinal, numerical). Finally, notice that interval patterns are closed since (.) is a closure operator. In a recent work [11], it has been shown that whereas a closed interval pattern represents the smallest hyper-rectangle in its equivalence class, interval pattern generators represent the largest hyper- rectangles. Accordingly, generators are favoured by minimum description length principle (MDL), since being less constrained. An interesting perspective is to test their effectiveness to describe minimal hypothesis in the present work. References 1. Quinlan, J.: Induction of decision trees. Machine learning 1(1) (1986) 81–106 2. Fu, H., Njiwoua, P., Nguifo, E.: A comparative study of fca-based supervised classification algorithms. Concept Lattices (2004) 219–220 3. Ganter, B., Wille, R.: Formal Concept Analysis. Springer-Verlag (1999) 4. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: ICCS ’01: Proceedings of the 9th International Conference on Conceptual Structures, Springer-Verlag (2001) 129–142 5. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181(10) (2011) 1989–2001 6. Breiman, L.: Classification and regression trees. Chapman & Hall/CRC (1984) 332 XIV Zainab Assaghir, Mehdi Kaytoue, Wagner Meira and Jean Villerd 7. Kuznetsov, S.O.: Machine learning and formal concept analysis. Int. Conf. on Formal Concept Analysis, LNCS 2961, (2004) 287–312 8. Guillas, S., Bertet, K., Ogier, J.: A generic description of the concept lattices classifier: Application to symbol recognition. Graphics Recognition. 
Ten Years Review and Future Perspectives (2006) 47–60 9. Nijssen, S., Fromont, E.: Mining optimal decision trees from itemset lattices. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, ACM (2007) 530–539 10. Nguifo, E., Njiwoua, P.: Iglue: A lattice-based constructive induction system. In- telligent data analysis 5(1) (2001) 73 11. Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Revisiting Numerical Pattern Mining with Formal Concept Analysis. In: International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Espagne (2011) A New Formal Context for Symmetric Dependencies Jaume Baixeries Departament de Llenguatges i Sistemes Informàtics. Universitat Politècnica de Catalunya. 08024 Barcelona. Catalonia. jbaixer@lsi.upc.edu Abstract. In this paper we present a new formal context for symmet- ric dependencies. We study its properties and compare it with previous approaches. We also discuss how this new context may open the door to solve some open problems for symmetric dependencies. 1 Introduction and Motivation In database theory there are different types of dependencies, yet, two of them appear to be the most popular: functional dependencies and multivalued depen- dencies. The reason is that both dependencies come handy in order to explain the normalization of a database scheme. But some of these dependencies are not only confined to the database domain. For instance, implications (the equiva- lent of functional dependencies for binary data) are present in datamining and learning ([4,19,20]). In general terms, a dependency states a relationship between sets of attributes in a table. Let us suppose that we have the following set of attributes: U = {name, income, age} in a table that contains the following records: id Name Income Age 1 Smith 30.000 26-10-1956 2 Hart 35.000 14-02-1966 3 Smith 30.000 02-01-1964 In such a case, we have that the relationship between age and the attributes income and name is functional, this is, that given a value of age, the value of income and name can be determined. We also have that the value of name can be determined by income and viceversa. In such a case, given these functional relationships between the attributes, we say that the functional dependencies age → {name, income}, name → income and income → name hold in that table. Functional dependencies and multivalued dependencies have their own set of axioms ([9,21]), which state what dependencies hold in the presence of other dependencies. For instance, an axiom for functional dependencies states that c 2011 by the paper authors. CLA 2011, pp. 333–348. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 334 2 Jaume Baixeries Baixeries J. transitivity holds, which means that, in the previous case, if we had that name → income and income → age hold in that table (which is not true in that table, but just as a supposition), it must follow necessarily that name → age holds. Given a set of dependencies Σ, we define as Σ + the set of all dependencies that hold according to those axioms. These axioms, in turn, are also shared by other dependencies: implications share the same axioms of functional dependencies ([4]), and degenerate multi- valued dependencies share the same axioms of multivalued dependencies ([5]). 
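As a side illustration of the introductory example (not part of the original paper), the following self-contained sketch tests whether a functional dependency X → Y holds in the small relation above; the representation of the relation and the function name are assumptions.

rows = [
    {"name": "Smith", "income": 30000, "age": "26-10-1956"},
    {"name": "Hart",  "income": 35000, "age": "14-02-1966"},
    {"name": "Smith", "income": 30000, "age": "02-01-1964"},
]

def fd_holds(rows, X, Y):
    # X -> Y holds iff any two tuples that agree on X also agree on Y.
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in X)
        val = tuple(r[a] for a in Y)
        if seen.setdefault(key, val) != val:
            return False
    return True

print(fd_holds(rows, ["age"], ["name", "income"]))   # True
print(fd_holds(rows, ["name"], ["income"]))          # True
print(fd_holds(rows, ["name"], ["age"]))             # False: two Smiths, different ages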
That is why we generically call Armstrong Dependencies (AD) those dependen- cies that share the same axioms of the former, and Symmetric Dependencies (SD) those that share the axioms of the latter. Since in this paper we are focusing on the syntactical properties of those de- pendencies, we will only talk of Armstrong and symmetric dependencies, rather than functional or multivalued dependencies. The lattice characterization of a set of Armstrong dependencies has been widely studied in [10,11,13,14,15], and their characterization with a formal con- text in [7,17]. However, the lattice characterization of symmetric dependencies has not been so widely studied. The main work is in [12], and the character- ization of symmetric dependencies with a formal contexts was studied in [3,5] (we talk indistinctly of a lattice characterization and a characterization with a formal context). In the case of AD’s, the formal context yields a powerset lattice ([17]), whereas in the case of symmetric dependencies, it yields a partition lattice ([3]). The fact that some problems related to AD’s have been solved using their lattice characterization, suggests that the same problems for SD’s could also be solved using their corresponding lattice characterization. We name three of those problems already solved for AD’s, not yet for SD’s: learning SD’s, the finding of a minimal basis for a set of dependencies for SD’s and the characterization of mixed sets of SD’s and AD’s. In general terms, query learning consists in deducing a function (a formula) via membership queries to an oracle. This method has been used to learn sets of Horn clauses, which can also be seen as implications ([8]), or, more generally, sets of Armstrong dependencies. Thus, the same general algorithm for learning Horn clauses ([1]) has been adapted to learn Armstrong dependencies ([2]). This adaptation was obviously easied by the fact that Horn clauses and Armstrong dependencies share the same set of axioms. Yet, no such algorithm for symmetric dependencies exists (to the best of the author’s knowledge). The minimal base (also: Duquenne-Guigues basis [16]) is the minimal set of Armstrong dependencies needed to compute Σ + . In [11], [16] and [17] it is characterized and computed in terms of the (powerset) lattice characterization of Σ + . We have been dealing with unmixed sets of AD’s and SD’s, but there exists an axiomatizations of mixed sets of AD’s and SD’s ([21]), but no lattice characterization of mixed sets. A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 3353 Although AD’s and SD’s are related, the lattice characterization yielded by the formal context in [3] is quite different in nature to that for AD’s. Potentially, it may pose different problems. The first is that the solutions that have been found for AD’s (based on their lattice characterization) may not be applied directly to the case of SD’s. We do not mean that having AD’s characterized with a powerset lattice and SD’s with a partition lattice makes it impossible to solve the same problems for SD’s. What we mean is that having a similar characterization for SD’s would make it easier to try and find an answer using existing solutions for AD’s. A second drawback is that the size of the formal context for SD’s is much larger, in comparison with that for AD’s. This may cause a problem in case the context is used in practical applications, but, more importantly, there are partitions that play no rôle in that characterization. 
A simple analysis of [3] yields that partitions that contain no singleton are completly useless, but a more detailed analysis (out of the scope of this paper) indicates that there are more redundant partitions. Finally, although partitions may be intuitive when dealing with SD’s, they do not reflect the B ⇔ ¬B symmetry of the definition of symmetric dependencies (as stated by Alan Day in [12]). It seems that the connection between AD’s and SD’s is stronger than what the partition lattice characterization suggests. As a step towards solving the learning problem and computation of a minimal basis for SD’s as well as the characterization of mixed sets of SD’s and AD’s, in this paper we present a new formal context for symmetric dependencies, following the work started in [3]. The results presented in this paper parallel those results, but from a different perspective that, we think, improve both the understanding and the possibilities to solve the open problems previously listed. This paper starts with the Notation section, followed by a Previous Work section that explains the departing point of this paper. In the Results section, we present a new formal context for SD’s. We also present an example in a separate section to illustrate the results. Finally, we discuss some aspects of this new formal context and present the conclusions and future work. 2 Notation We depart from a set of attributes U. We use non capital letters for single elements of that set, starting with a, b, c, . . . , and capital letters for subsets of U. The complement of a set X ⊆ U is X. We drop the union operator and use juxtaposition to indicate set union. For instance, instead of X ∪ Y we write XY . Generally, we also drop the set notation, and write abc instead of { a, b, c }. We define the powerset of a set U as ℘(U). The set of partitions that can be formed with U is Part(U). The notation for a partition is P = [P1 | P2 | · · · | Pn ], where Pi are the classes (subsets) of P . If needed, we indicate that the attributes in a set X are in fact a set of singletons with this notation: X. For instance, { a, b, c, d } = { { a }, { b }, { c }, { d } }. We overload P ≥ Q to indicate that a 336 4 Jaume Baixeries Baixeries J. partition P refines a partition Q and P ≤ Q to indicate that P is coarser than Q. More details of this (reversed) order can be found in [18]. As for Formal Concept Analysis, we use the usual notation ([17]), which includes the use of 0 as the (overloaded) function that relates the set of attributes and that of objects and viceversa. 2.1 Symmetric Dependencies A symmetric dependency is a relation between two sets of attributes, and it is stated as X ⇒ Y . Given a set of attributes U, we define SDU as the set of all symmetric dependencies that can be formed with U. Although they will only be mentioned in this paper, we say that X → Y is an Armstrong dependency. Given a set of SD’s Σ ⊆ SDU , we say that the closure of Σ is Σ + , and consists of Σ plus the set of all SD’s that can be derived from Σ applying recursively the following axioms: Definition 1 (Axioms for SD’s). 1. Reflexivity: If Y ⊆ X, then, X ⇒ Y holds. 2. Complementation: If X ⇒ Y holds, then, X ⇒ XY holds. 3. Augmentation: If X ⇒ Y holds and W 0 ⊆ W ⊆ U, then, XW ⇒ Y W 0 holds. 4. Transitivity: If X ⇒ Y and Y ⇒ Z hold, then, X ⇒ Z \ Y holds. Because of complementation, we give a symmetric dependency as X ⇒ Y | Z, where Z = XY . 
We always assume that the rightest set in the right-hand side of a symmetric dependency is the complementary of the union of the other two. However, sometimes we will state it explicitly, as in X ⇒ Y | XY and sometimes we will simply use X ⇒ Y | Z. In both cases, X is the left-hand side of the dependency, Y its first right-hand side, and Z its second right-hand side. The set SDU is the set of all non-trivial symmetric dependencies that can be formed using all the attributes in U. By non-trivial we mean those SD’s X ⇒ Y | Z such that: Definition 2. A symmetric dependency X ⇒ Y | Z is non-trivial if: 1. X ∪ Y ∪ Z = U. 2. X ∩ Y = X ∩ Z = Y ∩ Z = ∅. 3. X 6= ∅, Y 6= ∅, Z 6= ∅. As it can be seen, according to the axioms for symmetric dependencies, this limitation incurs in no loss of information, since the remaining symmetric de- pendencies can easily be derived from SDU ([21]). It is precisely the complementation rule that states the relation between Arm- strong dependencies and symmetric dependencies. Broadly speaking, we could say that a symmetric dependency X ⇒ Y | Z is equivalent to the fact that either the Armstrong dependencies X → Y or X → Z hold. This is a too general state- ment, but if, as an example, we take, functional dependencies and its symmetric A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 3375 counterpart, degenerate multivalued dependencies, we see that the definition of a functional dependency X → Y states that whenever two tuples agree on X they also agree on Y , whereas the definition of a degenerate multivalued dependency X ⇒ Y | Z states that whenever two tuples agree on X they also agree on Y or they agree in Z. In fact, there are also a set of two axioms in the case we are dealing with mixed sets of AD’s and SD’s. One of this axioms state that if X → Y holds, then, X ⇒ Y | XY holds as well. This example is just to indicate that the relationship between AD’s and SD’s is strong, and that SD’s can be as a generalization of AD’s. Given a set of symmetric dependencies Σ, we say that the dependency basis of a set of attributes X ⊆ U (that is: DBΣ (X)) is the coarsest partition of U such that all the dependencies X ⇒ Y | Z that hold in Σ + are those such that Y (symmetrically Z) is the union of one or more classes of DB(X). This partition always exists ([21]) and defines all the symmetric dependencies that hold in Σ + such that their left-hand side is X. We also have that, since reflexivity holds for SD’s, all the attributes of X ⊆ U are singletons in DBΣ (X). 2.2 Previous Work The origins of defining a formal context to characterize the closure of a set of Armstrong dependencies started in [17]. This formal context was defined as: KAD (U) = (ADU , ℘(U), I) where ADU is the set of Armstrong dependencies that can be formed with the set of attributes U, and I was a binary relation between an Armstrong dependency and a set of attributes. In [3], it was presented a formal context for symmetric dependencies with identical properties: KSD (U) = (SDU , Part(U), I 0 ) The relations I and I 0 are generically called ”respect” relations: a set of attributes (a partition) respects an Armstrong (symmetric) dependency. Both formal contexts, in spite of its obvious structural differences, charac- terized the closure of a set of dependencies of its kind. In fact, both contexts provided the following results for each respective kind of dependencies: 1. Σ + = Σ 00 . 2. Σ 0 is the lattice characterization of Σ + . 
When we say that Σ 0 was the lattice characterization of Σ + , it may seem redundant, since we already have that Σ + = Σ 00 . What we mean is that Σ 0 alone, without the application of the operator 0 , also characterized all the dependencies of Σ + . This was done with the definition of a closure operator on Σ 0 : 338 6 Jaume Baixeries Baixeries J. ^ ΓΣ 0 (X) = { Y ∈ Σ0 | Y ⊇ X } The fact that this function is total indicates that ∧ is always defined in Σ 0 . Depending on the formal context we were dealing with, we would have that X ∈ ℘(U) (AD’s) or that X ∈ Part(U) (SD’s). In the case of Armstrong depen- dencies, we would then have that X → Y ∈ Σ + if and only if: ΓΣ 0 (X) = ΓΣ 0 (XY ) In the case of a symmetric dependency, it is a little bit more elaborated from a syntactical point of view, yet, equivalent to the previous case: X ⇒ Y | Z ∈ Σ + if and only if: ΓΣ 0 ([X | Y Z]) = ΓΣ 0 ([X | Y | Z]) Clearly, Σ 0 alone gives us the information of which dependencies are in Σ + by querying the (closure) operator ΓΣ 0 . In both cases, and oversimplifying, we can say that a dependency holds in Σ + if and only if there is some kind of relationship between its left-hand side and its right-hand side, being this relationship defined by the formal context. 3 Results The results in this paper try to overcome the potential problems that may rep- resent the differnet nature of the current formal contexts for AD’s and SD’s (powerset versus partitions), as well as the larger size of a partition set, by pre- senting a characterization of symmetric dependencies based on a formal context whose set of attributes is the powerset of U instead of its partitions. This context will generalize that for AD’s in [17] as it will seen in Section 5. We define a formal context, and prove that it characterizes the set of sym- metric dependencies Σ + , in a way similar to that in [3]: (SDU , ℘(U), I ), where the relation I is defined as follows: Definition 3. A ⊆ U respects a symmetric dependency X ⇒ Y | Z (that is: X ⇒ Y | Z I A) if and only if: A + X or A ⊇ XY or A ⊇ XZ We have that Σ 0 ⊆ ℘(U). As a trivial consequence of Definition 3 we have the following proposition: Proposition 1. X ⇒ Y | Z ∈ Σ 00 if and only if @A ∈ Σ 0 : A ⊇ X and A + XY and A + XZ A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 3397 We now study the properties of this contexts and how they characterize Σ + . We first see that all the dependencies that are in Σ + are also present in Σ 00 . To prove this claim, we must prove axiom by axiom that the dependencies derived by those axioms are also present in Σ 00 , but since reflexivity and complementation are trivial, we only prove augmentation and transitivity. Proposition 2 (Augmentation). If X ⇒ Y | XY ∈ Σ, and W 0 ⊆ W then, XW ⇒ Y W 0 | XW Y ∈ Σ 00 . Proof. By the way of contradiction, we suppose that there is a set A ⊆ U, A ∈ Σ 0 such that (note that XW XY W = XW Y ) A ⊇ XW and A + XW Y and A + XW Y Since X ⇒ Y | XY ∈ Σ, we have that A + X or A ⊇ XY or A ⊇ XY (because XXY = XY ). We have that A ⊇ XW discards A + X. So, we only have two possible options: (i) A ⊇ XY , which in combination with A ⊇ XW yields A ⊇ XY W , which contradicts A + XW Y . (ii) A ⊇ XY , which in combination with A ⊇ XW yields A ⊇ XW Y , which contradicts A + XW Y . Proposition 3 (Transitivity). If X ⇒ Y | XY ∈ Σ and Y ⇒ Z | Y Z ∈ Σ, then, X ⇒ Z \ Y | X(Z \ Y ) ∈ Σ. Proof. 
By the way of contradiction, we suppose that there is A ⊆ U, A ∈ Σ 0 such that A ⊇ X and A + X(Z \ Y ) and A + XX(Z \ Y ) We have to note that XX(Z \ Y ) = X(Z \ Y ), and that Z \ Y = Y Z, we finally have that XX(Z \ Y ) = XY Z. Therefore, we suppose that there is a set A ⊆ U, A ∈ Σ 0 such that: (i) A ⊇ X. (ii) A + X(Z \ Y ). (iii) A + XY Z. On the other hand, we have that: X ⇒ Y | XY ∈ Σ implies that A + X or A ⊇ XY or A ⊇ XY . Y ⇒ Z | Y Z ∈ Σ implies that A + Y or A ⊇ Y Z or A ⊇ Y Z. Since we are assuming that A ⊇ X, we can discard A + X. We also have that the case A ⊇ XY discards A + Y . This leaves three possibilities, either: 340 8 Jaume Baixeries Baixeries J. (i) A ⊇ XY and A ⊇ Y Z, that is, A ⊇ XY Z ⊇ X(Z \ Y ). This contradicts A + X(Z \ Y ). (ii) A ⊇ XY and A ⊇ Y Z, that is, A ⊇ XY Z. This contradicts A + XY Z. (iii) A ⊇ XY . Y ∩ Z = ∅ implies that Z \ Y ⊆ Y . All this yields A ⊇ XY ⊇ X(Z \ Y ). This contradicts A + X(Z \ Y ). Therefore, we have proved that any Σ 00 contains, at least, all the symmetric dependencies that are in Σ + . Corollary 1. Σ + ⊆ Σ 00 . Proof. By Propositions 2 and 3. We now prove completeness, that is, that Σ 00 only contains all the depen- dencies in Σ + . Theorem 1. Σ 00 ⊆ Σ + . Proof. We prove that X ⇒ Y | Z ∈ / Σ + implies that X ⇒ Y | Z ∈ / Σ 00 . + We have that X ⇒ Y | Z ∈ / Σ . It means that the dependency basis of X is such that in DBΣ (X) = [X | P1 | · · · | Pn ] (with n ⊇ 1) there is, at least, a class Pk such that Pk ∩ Y 6= ∅ and Pk ∩ Z 6= ∅. We fix Pk in this proof. We note that |Pk | ⊇ 2, since it contains, at least, one attribute from Y and one from Z. Sn Let P = ( Pj )\Pk , that is, P is the union of all partitions in DBΣ (X) which j=1 are not X, except Pk . Therefore, XP = Pk . We now claim that XP ∈ Σ 0 . We prove this statement by the way of contradiction. Assume that XP 6∈ Σ 0 . That is because there is a dependency R ⇒ S | T ∈ Σ such that X ⊇ R and X + RS and X + RT . This implies that there is, at least, one attribute in RS which is not i XP , and, at least, one attribute in RT which is not in XP . Let them be s ∈ RS, s 6∈ XP and t ∈ RT, t 6∈ XP . Since X ⊇ R, then, s ∈ S, s 6∈ XP and t ∈ T, t 6∈ XP . Necessarily, since s, t 6∈ XP , then, s, t ∈ Pk . Since XP ⊇ R, by reflexivity, XP ⇒ R | XP R, and by transitivity, XP ⇒ S \ R | XP (S \ R). Without lack of generality, we assume that R, S, T are disjoint, so, finally, we have XP ⇒ S | XP S. By the definition of DBΣ (X), then, X ⇒ P | XP , and by transitivity, we have X ⇒ S \ XP | X(S \ XP ). Since s ∈ S, s 6∈ XP , then, s ∈ S \ XP , and since t ∈ T (assuming R, S, T disjoint), t 6∈ S, that, together with t 6∈ XP yields that t ∈ X(S \ XP ). It means that the attributes s, t are in different classes in DBΣ (X), but this contradicts the previous assumption that Pk ∈ DBΣ (X). Now, we have that XP ∈ Σ 0 . Since s, t 6∈ XP , we have that XP + XY and XP + XZ and XP ⊇ X, which implies X ⇒ Y | Z 6∈ Σ 00 . A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 3419 We have that Σ 00 is exactly the set Σ + . But, as we have already discussed in the previous section, in [3] and [17] we had a method to query Σ 0 whether a dependency was in Σ + , and consisted in the closure operator ΓΣ 0 that, given a set of attributes, returned the meet of its up-set. In this present case, we may have that Σ 0 is not a lattice (but a partial lattice) and the same operator would not be a total function. Therefore, we use the up-set, instead of its meet: Definition 4. 
Let Σ ⊆ SDU . We define the up-set of X ⊆ U as follows: U PΣ (X) = { Y ∈ Σ 0 | Y ⊇ X } This definition is the standard one in lattice theory ([18]) when Σ 0 is an ordered set. The proof of the following proposition is trivial, yet, it will come handy to prove the last result of this paper. Proposition 4. Let X, Y, Z ⊆ U such that Y ⊇ X and Z ⊇ X. U PΣ (X) = U PΣ (Y ) ∪ U PΣ (Z) if and only if @A ∈ Σ 0 : A ⊇ X and A + Y and A + Z We need to remark that, although the set Σ 0 may not be closed under set intersection, the set of all up-sets of Σ 0 is closed under intersection. We are now ready to prove that it can be tested whether a dependency is in Σ + querying Σ 0 alone: Proposition 5. X ⇒ Y | Z ∈ Σ + if and only if U PΣ (X) = U PΣ (XY ) ∪ U PΣ (XZ) Proof. X ⇒ Y | Z ∈ Σ+ if and only if (by Corollary 1 and Theorem 1) X ⇒ Y | Z ∈ Σ 00 if and only if (by Proposition 1) @A : A ⊇ X and A + XY and A + XZ if and only if (by Proposition 4) U PΣ (X) = U PΣ (XY ) ∪ U PΣ (XZ) 342 10 Jaume Baixeries Baixeries J. 4 Example We provide a running example in order to illustrate and clarify the results that are contained in the previous section. We depart from a set of attributes U = { a, b, c, d }. The resulting formal context is presented in Figure 1. As stated in Theorem 1, this contexts computes the set Σ + . For instance, let us take the set Σ = {a ⇒ b | cd, b ⇒ ad | c} According to this context, we have that Σ 0 = {c, d, bc, cd, abc, abd, acd, bcd, abcd} and, finally, Σ 00 = Σ + = {a ⇒ b | cd, b ⇒ ad | c, a ⇒ c | bd, a ⇒ d | bc, ab ⇒ c | d, ac ⇒ b | d, ad ⇒ b | c, bd ⇒ a | c} To check these results, we see that ac ⇒ b | d, ad ⇒ b | c and bd ⇒ a | c are derived from Σ by the reflexivity, transitivity and complementation. For instance, given a ⇒ b | cd, by reflexivity we have ac ⇒ a | bd, and by transitivity ac ⇒ b | d (complementation comes from the notation X ⇒ Y | Z used in this paper). Dependencies ad ⇒ b | c and bd ⇒ a | c can be derived alike. As for the remaining SD’s: a ⇒ c | bd, a ⇒ d | bc by applying transitivity to a ⇒ b | cd and b ⇒ ad | c, we obtain a ⇒ c | bd, and with complementation we have a ⇒ d | bc. A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 343 11 abcd acd abd abc bcd ad ac ab cd bd bc a d c b a ⇒ b | cd ×××× ×××××××× b ⇒ a | cd × ××××× ×××××× a ⇒ c | bd ××× × ×××××××× c ⇒ a | bd × × ×××× × ××××× a ⇒ d | cb ××× ××××××××× d ⇒ a | cb × × × ×××× ××××× b ⇒ c | ad × ×× ××× ×××××× c ⇒ b | ad × × ×× ××× ××××× b ⇒ d | ac × ×× ×× ××××××× d ⇒ b | ac × × × ×× ×× ××××× c ⇒ d | ab × × ×× × ××××××× d ⇒ c | ab × × × ×× × ×××××× ab ⇒ c | d × × × × ×××××××××× ac ⇒ b | d × × × × × ××××××××× bc ⇒ a | d × × × × × × × ××××××× ad ⇒ b | c × × × × × × ×××××××× bd ⇒ a | c × × × × × × × × ×××××× cd ⇒ a | b × × × × × × × × × ××××× Fig. 1. Formal context (SDU , ℘(U), I) for U = { a, b, c, d } We now present an example with one more attribute, which may provide more insight in the details, but in this case, we do not present the context explicitly. The set of attributes is now U = { a, b, c, d, e }. 
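Before turning to this larger example, the four-attribute computation above can be reproduced mechanically. The following self-contained sketch (not from the paper) implements the respect relation of Definition 3 and the membership test of Proposition 5, encoding a dependency X ⇒ Y | Z as a triple of frozensets; all names are illustrative.

from itertools import chain, combinations

U = frozenset("abcd")

def subsets(s):
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def respects(A, dep):
    # Definition 3: A respects X => Y | Z iff A does not contain X,
    # or A contains XY, or A contains XZ.
    X, Y, Z = dep
    return not (A >= X) or A >= X | Y or A >= X | Z

def sigma_prime(Sigma):
    # (The empty set trivially respects every non-trivial dependency
    # and never affects the membership test below.)
    return [A for A in subsets(U) if all(respects(A, d) for d in Sigma)]

def up(S, X):
    # Up-set of X in S (Definition 4).
    return {A for A in S if A >= X}

def in_closure(Sigma, dep):
    # Proposition 5: X => Y | Z is in Sigma+ iff UP(X) = UP(XY) ∪ UP(XZ).
    X, Y, Z = dep
    S = sigma_prime(Sigma)
    return up(S, X) == up(S, X | Y) | up(S, X | Z)

f = frozenset
Sigma = [(f("a"), f("b"), f("cd")), (f("b"), f("ad"), f("c"))]
print(in_closure(Sigma, (f("a"), f("c"), f("bd"))))   # True:  a => c | bd is in Sigma+
print(in_closure(Sigma, (f("c"), f("a"), f("bd"))))   # False: c => a | bd is not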
Let Σ be the set of symmetric dependencies: b ⇒ a | cde b ⇒ c | ade c ⇒ a | bde c ⇒ b | ade d ⇒ a | bce d ⇒ e | abc e ⇒ a | bcd e ⇒ d | abc According the formal context (SDU , ℘(U), I ), we have that: Σ 0 = { a, abc, ade, abcd, abce, abde, acde, bcde, abcde } We can see that, applying the axioms of symmetric dependencies in Definition 1, the set Σ + is: b ⇒ a | cde b ⇒ c | ade c ⇒ a | bde c ⇒ b | ade d ⇒ a | bce d ⇒ e | abc e ⇒ a | bcd e ⇒ d | abc abd ⇒ c | e acd ⇒ b | e ce ⇒ a | bd de ⇒ a | bc bcd ⇒ a | e abe ⇒ c | d ace ⇒ b | d bce ⇒ a | d bde ⇒ a | c cde ⇒ a | b ab ⇒ c | de ac ⇒ b | de ad ⇒ bc | e ae ⇒ bc | d bc ⇒ a | de bd ⇒ ac | e bd ⇒ a | ce bd ⇒ ae | c be ⇒ ac | d be ⇒ ad | c be ⇒ a | cd cd ⇒ a | be cd ⇒ ae | b cd ⇒ ab | e ce ⇒ ab | d ce ⇒ ad | b 344 12 Jaume Baixeries Baixeries J. We only state the non-trivial dependencies as in Definition 2. We take, for instance, the dependencies: bd ⇒ ac | e, bd ⇒ a | ce, bd ⇒ ae | c They are derived from the dependencies b ⇒ a | cde and b ⇒ c | ade. They are in Σ + because the sets that include bd are abcd, abde, bcde, abcde. This obviously means that all of them respect all the dependencies in Σ + . We take, for instance, the set abcd and see that it respects bd ⇒ ae | c because abcd ≥ bcd (the left- hand side plus the second right-hand side) and that it also respects bd ⇒ a | ce because abcd ≥ abd (the left-hand side plus the first right-hand side). We can see in this example the duality of the definition of the relation respect. This is one case of derivation by augmentation, which means that the dependencies that derive another dependency remove the sets that would prevent the derived dependency from appearing in Σ 00 . In this latter particular case, the sets that could be forbitting any of these dependencies from appearing in Σ 00 have been cleared by b ⇒ a | cde and b ⇒ c | ade. We take, for instance, the set bde, (which would prevent bd ⇒ a | ce from being in Σ 00 ) is not in Σ 0 because it does not respect the dependency b ⇒ a | cde. We now illustrate one case of derivation by transitivity with the following set: a ⇒ bc | de, bc ⇒ d | ae By transitivity, we have that a ⇒ d | bce ∈ Σ + . If we take Σ = { a ⇒ bc | de }, we have: Σ 0 = {b, c, d, e, bc, bd, be, cd, ce, de, abc, ade, bcd, bce, bde, cde, abcd, abce, abde, acde, bcde, abcde} / Σ 00 since the sets abc, acde ∈ Σ 0 do not respect It is clear that a ⇒ d | bce ∈ this dependency. Now, if we include bc ⇒ d | ae in Σ, we have: Σ 0 = { b, c, d, e, bd, be, cd, ce, de, ade, bcd, bde, cde, abcd, abce, abde, acde, bcde, abcde } It has precisely been the dependency bc ⇒ d | ae the one that has cleared both abc and acde from Σ 0 and, therefore, allows a ⇒ d | bce to appear in Σ 00 . We now illustrate how Σ 0 alone can be used to query what dependencies hold in Σ + . Again, we have that Σ is the set: b ⇒ a | cde b ⇒ c | ade c ⇒ a | bde c ⇒ b | ade d ⇒ a | bce d ⇒ e | abc e ⇒ a | bcd e ⇒ d | abc and, therefore: Σ 0 = { a, abc, ade, abcd, abce, abde, acde, bcde, abcde } A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 345 13 We can see that Σ 0 in not closed (abcd, abde ∈ Σ 0 , but ab ∈ / Σ 0 ). Now, + suppose that we want to test whether a dependency is in Σ . 
For instance, we take a dependency that is not in Σ + , as a ⇒ bc | de and query Σ 0 : U PΣ (a) = { a, abc, ade, abcd, abce, abde, acde, abcde } U PΣ (abc) = { abc, abcd, abce, abcde } U PΣ (ade) = { ade, abde, acde, abcde } According to Proposition 5, since the sets U PΣ (a) and U PΣ (abc)∪U PΣ (ade) do not coincide, then, this dependency does not hold in Σ + . We see that the set that does not allow this equality to hold is the set a, which is in Σ 0 because all dependencies in Σ are respected by this set. We take now a positive example of a dependency that is in Σ + but not in Σ, as for instance ab ⇒ c | de: U PΣ (ab) = { abc, abcd, abce, abde, abcde } U PΣ (abc) = { abc, abcd, abce, abcde } U PΣ (abde) = { abde, abcde } In this case, the sets U PΣ (ab) and U PΣ (abc) ∪ U PΣ (abde) coincide. 5 Discussion We have seen in Section 2 that the different characterizations of dependencies with formal contexts follow a common pattern, regardless of the type of depen- dencies or the definition of the context. Yet, the definition of formal contexts for AD’s and SD’s as in [3] was structurally different (powersets versus partitions) and that made it difficult to find a relationship and generalization between both contexts, in spite of the clear structural similarities that exist between AD’s and SD’s. Now, we have that the relation I is a generalization of the relation defined in the context KAD (U) = (ADU , ℘(U), I). We recall the definition of this relation ([17]): Definition 5. A ⊆ U respects an Armstrong dependency X → Y iff: A + X or A ⊇ XY We see that this definition avoids the reference to the second right-hand side, precisely because in AD’s, complementation does not hold. If we drop this part from Definition 3, we have the definition of the respect relation for Armstrong dependencies. This generalization seems to suggest that the solutions that have been de- veloped based on the lattice characterization of sets of Armstrong dependencies, may also be applied to symmetric dependencies, namely: 1. To define a formal context for mixed sets of AD’s and SD’s. 346 14 Jaume Baixeries Baixeries J. 2. To adapt the classical query algorithm for learning Armstrong dependencies. 3. To characterize the generating set of a set of symmetric dependencies. Yet, although we are now in a better position to attack those problems, it does not seem to be a trivial task. For instance, the intuition would tell us that defining a formal context for mixed sets of dependencies, such that the relation would be the union of the relations already defined for AD’s and SD’s would work, but this is not the case. In fact, although this is out of the scope of this paper, this mixed formal contexts characterizes the symmetric dependencies that are in Σ + , where Σ is a mixed set of AD’s and SD’s, but fails in characterizing the AD’s that are in Σ + . However, this simple strategy allows to advance towards the definition of a mixed formal context, which would have not been that simple departing from a partition context. Adapting the classic learning algorithm for learning AD’s and the characteri- zation of the generating set of a set of SD’s may encounter some difficulties. 
The main difference between Σ 0 for Armstrong and symmetric dependencies is that for the former, Σ 0 is always a powerset lattice closed under intersection, whereas for symmetric dependencies, this is not necessarily the case, and, therefore, not all the existing solutions for Armstrong dependencies, based on lattices, may be applied out of the box to symmetric dependencies. Yet, the fact that now we are dealing with contexts of the same nature, offers a much clearer perspective and understanding than before. It must be said too that whereas this new characterization may make it po- tentially easier to find methods for finding minimal basis and query learning for SD’s, it is true that SD’s have not yet been used outside the database do- main. We think that advancing in the study of lattice characterization for SD’s and finding algorithmic similarities with FD’s may introduce the use of SD’s in other domains, namely knowledge discovery and machine learning, or in database theory, where it is already present: it would be of interest to have algorithms to compute minimal basis for SD’s, profiting from the important collection of algorithms that compute the minimal basis of a set of AD’s. Finally, we would like to remark that the size of the formal context is greatly improved w.r.t the context in [3], since we have replaced the set Part(U) by the set ℘(U). Yet, and for the sake of algorithmic solutions already existing in the FCA community, we have to say that the size of the context remains exponential. 6 Conclusions and Future Work We have presented a new formal context for symmetric dependencies. This con- texts provides the same functionalities as previous approaches, and it is much simpler. Yet, it offers the same expressivity power and, in fact, reduces the con- ceptual gap between Armstrong and symmetric dependencies that existed in a previous approach. We strongly believe that this may be the first step towards the resolution via formal concept analysis, of the learning, minimal bases and mixed sets of dependencies problems for symmetric dependencies, profiting from solutions already existing for Armstrong dependencies. A A New New Formal Formal Context Context for for Symmetric Symmetric Dependencies Dependencies 347 15 References 1. Angluin D., Frazier M., Pitt L. Learning Conjunctions of Horn Clauses. Machine Learning, 9:147-164, 1992. 2. Arias M., Balcázar, José L. Canonical Horn Representations and Query Learning. Lecture notes in computer science, vol. 5809, p. 156-17, 2009. 3. Baixeries, Jaume. A Formal Context for Symmetric Dependencies. ICFCA 2008. LNAI 4933. 4. Baixeries, Jaume and Balcázar, José L. Discrete Deterministic Data Mining as Knowledge Compilation. Proceedings of Workshop on Discrete Mathematics and Data Mining in SIAM International Conference on Data Mining, 2003. 5. Baixeries, Jaume and Balcázar, José L. Characterization and Armstrong Relations for Degenerate Multivalued Dependencies Using Formal Concept Analysis. Formal Concept Analysis, Third International Conference, ICFCA 2005, Lens, France, February 14-18, 2005, Proceedings. Lecture Notes in Computer Science, 2005 6. Baixeries, Jaume and Balcázar, José L. Unified Characterization of Symmetric Dependencies with Lattices. Contributions to ICFCA 2006. 4th International Con- ference on Formal Concept Analysis 2005. 7. Baixeries, Jaume. A Formal Concept Analysis framework to model functional dependencies. Mathematical Methods for Learning, 2004. 8. Balcázar, José L. and Baixeries, Jaume. 
Discrete Deterministic Data Mining as Knowledge Compilation. Workshop on Discrete Mathematics and Data Mining in SIAM Int. Conf. 2003. 9. Beeri, Catriel and Fagin, Roland and Howard, John H. A Complete Axiomatization for Functional and Multivalued Dependencies in Database Relations. Proceedings of the 1977 ACM SIGMOD International Conference on Management of Data, Toronto, Canada, August 3-5, 1977. 10. Caspard, Nathalie and Monjardet, Bernard. The Lattices of Closure Systems, Clo- sure Operators, and Implicational Systems on a Finite Set: a Survey. Proceedings of the 1998 Conference on Ordinal and Symbolic Data Analysis (OSDA-98). Discrete Applied Mathematics, 2003. 11. Day, Alan. The Lattice Theory of Functional Dependencies and Normal Decompo- sitions. International Journal of Algebra and Computation Vol. 2, No. 4 409-431. 1992. 12. Day, Alan. A Lattice Interpretation of Database Dependencies. Semantics of Pro- gramming Languages and Model Theory, 1993. 13. Demetrovics, János and Hencsey, Gusztav and Libkin, Leonid and Muchnik, Ilya. Normal Form Relation Schemes: a New Characterization. Acta Cybernetica, 1992. 14. Demetrovics, János and Huy, Xuan. Representation of Closure for Functional, Mul- tivalued and Join Dependencies. Computers and Artificial Intelligence, 1992. 15. Demetrovics, János and Libkin, Leonid and Muchnik, Ilya. Functional Dependen- cies in Relational Databases: a Lattice Point of View. Discrete Applied Mathemat- ics, 1992. 16. Duquenne, Vincent and Guigues, J.L. Familles Minimales d’Implications Informa- tives Resultant d’un Tableau de Donées Binaires. Mathematics and Social Sciences, 1986. 17. Ganter, Bernhard and Wille, Rudolf. Formal Concept Analysis: Mathematical Foundations. Springer, 1999. 18. Grätzer, George. General Lattice Theory. Academic Press, 1978. 348 16 Jaume Baixeries Baixeries J. 19. Pfaltz, John L. Using Concept Lattices to Uncover Causal Dependencies in Soft- ware. Formal Concept Analysis, 4th International Conference, ICFCA 2006, Dres- den, Germany, February 13-17, 2006. 20. Pfaltz, John L. Incremental Transformation of Lattices: A Key to Effective Knowledge Discovery. In Proc. of the First Intl. Conf. on Graph Transformation (ICGT’02), pages 351–362, Barcelona, Spain, Oct 2002. 21. Ullman, Jeffrey D. Principles of Database Systems. Computer Science Press, 1982. Cheating to achieve Formal Concept Analysis over a large formal context? Victor Codocedo1,3 , Carla Taramasco2 , and Hernán Astudillo1 1 Universidad Técnica Federico Santa Marı́a, Av. España 1640. Valparaı́so, Chile. 2 École Polytechnique, 32 Boulevard Victor 75015 Paris, France. 3 LORIA, BP 70239, F-54506 Vandoeuvre-lès-Nancy, France. Abstract. Researchers are facing one of the main problems of the In- formation Era. As more articles are made electronically available, it gets harder to follow trends in the different domains of research. Cheap, coher- ent and fast to construct knowledge models of research domains will be much required when information becomes unmanageable. While Formal Concept Analysis (FCA) has been widely used on several areas to con- struct knowledge artifacts for this purpose [17] (Ontology development, Information Retrieval, Software Refactoring, Knowledge Discovery), the large amount of documents and terminology used on research domains makes it not a very good option (because of the high computational cost and humanly-unprocessable output). 
In this article we propose a novel heuristic to create a taxonomy from a large term-document dataset us- ing Latent Semantic Analysis and Formal Concept Analysis. We provide and discuss its implementation on a real dataset from the Software Ar- chitecture community obtained from the ISI Web of Knowledge (4400 documents). 1 Introduction Research communities are facing one of the main problems of the Information Era and Formal Concept Analysis is not prepared to solve it. The amount of articles available online is growing each year yielding difficult to track trends, following ideas, looking for new terminology, etc. While some communities have under- stood the need for an artifact representing the knowledge within the domain (such as an ontology, a body-of-knowledge or a taxonomy) the problem remains in its construction since it is hard (highly technical), expensive (researchers are scarce) and complex (information is dynamic). Automatic and semi-automatic creation of a terms taxonomy have been widely boarded in several fields [3,4,5,13,24]. In this work we focus on the ap- proach described by Roth et al. [19] in which a taxonomy is derived from a corpus of documents by the use of Formal Concept Analysis (FCA). In partic- ular, they describe an application used to “represent a meaningful structure of ? We would like to thank Chilean project FONDEF D08I1155 ContentCompass, intra- basal project FB/20SO/10 in the context of the Chilean basal project FB0821 and ECOS-CONICYT project C09E08 for funding this work. c 2011 by the paper authors. CLA 2011, pp. 349–362. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 350 Vı́ctor Codocedo, Carla Taramasco and Hernán Astudillo a given knowledge community in a form of a lattice-based taxonomy”. This ap- plication is illustrated using a set of abstracts of the embryologist community obtained from MedLine spanning 5 years where a random set of 25 authors and 18 terms were analyzed. Although the lattice-based taxonomy obtained was a fair representation of the domain, real-size corpora of research communities are rather much larger than this example. Handling large datasets has been defined as one of the open problems in the community of FCA4 for two main reasons: first, the computational costs involved in the calculation of the concept lattice can make the use of FCA prohibitive and second, the concept lattice structure yielded could be so complex that its use may be impossible [10]. Iceberg lattices [21] help in improving readability by eliminating “not rep- resentative” data, but useful information, such as “emerging behaviors [12,15], is lost in the process. Stabilized lattices (using a stability measure [16]) also improves readability by eliminating “noisy elements” from data, but being a post-process tool it also raises computational costs. We describe in this document a novel heuristic to create a lattice-based tax- onomy from a large corpus using Formal Concept Analysis and a widely used Information Retrieval technique called Latent Semantic Analysis (LSA). In par- ticular, we describe a process to compress a formal context into a smaller reduced context in order to obtain a lattice of terms that can be used to describe the knowledge on a given research domain. We illustrate our approach using a real- size dataset from a research community of Computer Sciences. 
The remainder of this paper proceeds as follows: Section 2 explains the basis of FCA, section 3 presents our approach and section 4, a case study over a real dataset from a research community. Section 5 presents the results and a com- parison of the obtained taxonomy with a human-expert handmade thesaurus. Finally, the conclusions are described in section 6. 2 Formal Concept Analysis Formal Concept Analysis, originally developed as a subfield of applied mathe- matics [23], is a method for data analysis, knowledge representation and infor- mation management. It organizes information in a lattice of formal concepts. A formal concept is constituted by its extension (the objects that compose the concept) and its intension (the attributes that objects share). Objects and at- tributes are placed as rows and columns (resp.) in a cross-table or formal context where each cell indicates whether the object of that row have the attribute of that column. In what follows, we describe the Formal Concept Analysis framework as synthesized by Wille [22]. 4 http://www.upriss.org.uk/fca/problems06.pdf Cheating to achieve Formal Concept Analysis over a large formal context 351 2.1 Framework Let G be a set of objects, M a set of attributes and I a binary relation between G and M (I ⊆ (G × M )) indicating by gIm that the object g contains the attribute m and K = (G, M, I) be the formal context defined by G, M and I. For A ⊆ G and B ⊆ M it is defined the derivation operator (0 ) as follows: A0 = {m ∈ M | gIm, ∀g ∈ A}, with A ⊆ G (1) 0 B = {g ∈ G | gIm, ∀m ∈ B}, with B ⊆ M (2) A formal concept of the formal context K is defined by (A, B) with A ⊆ G, B ⊆ M , A0 = B and B 0 = A, where A is called the extent and B is called the intent of the concept. The set of all formal concepts is defined as L(G, M, I). For two formal concepts (A1 , B1 ), (A2 , B2 ) ∈ K, the hierarchy of concepts is given by the relation subconcept-superconcept as follows: (A1 , B1 ) ≤ (A2 , B2 ) ⇐⇒ A1 ⊆ A2 ( ⇐⇒ B1 ⊇ B2 ) (3) Where (A1 , B1) is called the subconcept and (A2 , B2 ) is called the supercon- cept. B(K) = (L(G, M, I), ≤) is the complete lattice or concept lattice of context K 2.2 Iceberg Concept Lattices Let (A, B) be a concept of B(K), its support is defined as: |A| supp(A, B) = (4) |G| Given a threshold minsupp ∈ [0, 1], the concept (A, B) is called a “frequent concept” if supp(A, B) ≥minsupp. An Iceberg lattice [21] is the set of all frequent concepts for a given min- supp. 2.3 Stability Stability was proposed by Kuznetsov in [14,16] as a mechanism to prune “noisy concepts”. It was extended by Roth et al. We use and provide their definition from [19] and [15]: Let K = (G, M, I) be a formal context and (A, B) be a formal concept of K. The stability index, σ, of (A, B) is defined as follows: |{C ⊆ A | C 0 = B}| σ(A, B) = (5) 2|A| 352 Vı́ctor Codocedo, Carla Taramasco and Hernán Astudillo Stability measures how much the intent of a concept depends on particular objects of its extent, meaning that if the formal context changes and some objects disappear, then stability indicates how likely it is for a concept to remain in the concept lattice. Stability can also be used to construct a stabilized lattice for a given threshold similarly to an iceberg lattice. Analogous to definition 5, the extensional stability of a concept (A, B) can be defined as: |{D ⊆ B | D0 = A}| σe (A, B) = (6) 2|B| Extensional stability measures how likely is for a concept to remain if some attributes are eliminated from the context. 
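As an illustration of the framework just recalled (this sketch is not part of the paper, and uses a toy context rather than the dataset of Section 4), the derivation operators (1)-(2), the support (4) and the two stability indices (5)-(6) can be computed by brute-force enumeration of subsets:

from itertools import chain, combinations

K = {            # toy formal context: object -> set of attributes it has
    "g1": {"a", "b"},
    "g2": {"a", "b", "c"},
    "g3": {"a", "c"},
}
M = {"a", "b", "c"}

def subsets(s):
    s = list(s)
    return [set(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def intent(A):      # A' as in (1); by convention the empty object set maps to M
    return set.intersection(*(K[g] for g in A)) if A else set(M)

def extent(B):      # B' as in (2)
    return {g for g, atts in K.items() if B <= atts}

def support(A):     # equation (4)
    return len(A) / len(K)

def intensional_stability(A, B):    # equation (5)
    return sum(1 for C in subsets(A) if intent(C) == B) / 2 ** len(A)

def extensional_stability(A, B):    # equation (6)
    return sum(1 for D in subsets(B) if extent(D) == A) / 2 ** len(B)

A, B = {"g1", "g2"}, {"a", "b"}     # a formal concept of K: A' == B and B' == A
print(support(A), intensional_stability(A, B), extensional_stability(A, B))
# 0.666..., 0.5, 0.5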
3 Reducing a large formal context

Unlike Roth's approach [19], we are not interested in tracking groups of people working on groups of topics, but rather in the relations among topics. These relations occur in the articles that authors write, where topics or terms can appear in sets and each one can appear one or more times. To elaborate: given a corpus of articles G, a list of terms M and the relation among them I ⊆ G × M, where gIm indicates that the article g contains the term m, the document-article formal context is defined as:

KO = (G, M, I)    (7)

3.1 Rationale

Even for a small set of terms, the number of articles for a small research community can reach thousands, making the processing of KO impossible or useless. The problem gets worse over time, because it can be expected that each year hundreds of articles will be added to the corpus.

What happens with terms over time? In taxonomy evolution, as described in [18], symmetric patterns arise: some fields will progress or decline; some fields will contain more or fewer concepts (enrichment or impoverishment); and some fields will merge or split. In any case, it is not expected that the number of terms will vary greatly.

Latent Semantic Analysis (LSA), or Latent Semantic Indexing (LSI) [6], is a technique commonly used in Information Retrieval (IR) as a tool for indexing, clustering and query answering. LSA is based on the idea that, for a given set of terms and documents, the relations among terms can be explained by a set of dimensions whose size is much smaller than the number of documents. We exploit this feature of LSA to construct a reduced formal context of dimensions and terms, under the conditions that information regarding the relations of terms must not be lost and that it has to produce a coherent taxonomy using less computational time. In what follows, we provide a brief description of LSA and elaborate on how we use it to produce a reduced formal context. For further reading, please refer to [6].

3.2 Latent Semantic Analysis

Given a list of m terms and a corpus of n documents, let A be the term-document matrix of rank min(m, n) defined in (8), where aij is the weight5 of term i in document j. The Singular Value Decomposition of matrix A (equation (9)) factorizes it into three matrices, where Σ contains the singular values of A on its diagonal in descending order, and the columns of the matrices U and V are called the left and right singular vectors of A.

A_{m×n} = [a_{ij}], i = 1..m, j = 1..n    (8)
A_{m×n} = U_{m×m} · Σ_{m×n} · V^T_{n×n}    (9)
A′_{m×n} = U_{m×k} · Σ_{k×k} · V^T_{k×n}    (10)

Since the singular values drop quickly, we can create a new approximation of matrix A using k ≪ min(m, n), as shown in (10). The matrix A′ ≈ A is the closest rank-k approximation to A by the Frobenius measure [11]. Two new matrices can be calculated:

B_{m×k} = U_{m×k} · Σ_{k×k}    (11)
C_{n×k} = V_{n×k} · Σ_{k×k}    (12)

where B holds the vector-space representation of the terms in k dimensions, and C that of the documents. Both of these matrices are used for clustering since, in them, similar elements are closer on each dimension. In particular, each dimension of B (each column) has a Gaussian-like distribution where terms group around the mean value (see figure 1), except for dimension 0 (the different behavior in figure 1(b)), where terms have almost the same value6. We exploit this feature to define a conversion function that allows us to construct the reduced context.

[Fig. 1. Distribution of the dimensions' values in matrix B: (a) distribution of values in dimension 1; (b) distribution of values in all dimensions (hits vs. coordinate value).]

5 Several weighting functions can be used, the most common being term frequency and term frequency-inverse document frequency (tf.idf).
6 We do not use the information in this dimension for our analysis and exclude it from our results.
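As an aside, a minimal NumPy sketch of the factorization and truncation in equations (9)–(12); the matrix here is a random placeholder rather than the paper's weighted term-document matrix.

```python
import numpy as np

m, n, k = 120, 4565, 60            # terms, documents, kept dimensions
A = np.random.rand(m, n)           # stand-in for the weighted term-document matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U * diag(s) * V^T, Eq. (9)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # rank-k approximation, Eq. (10)
B = U[:, :k] @ np.diag(s[:k])                      # term coordinates, Eq. (11)
C = Vt[:k, :].T @ np.diag(s[:k])                   # document coordinates, Eq. (12)

print(B.shape, C.shape)            # (120, 60) and (4565, 60)
```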
3.3 A probabilistic-based conversion function

Which terms are related within a given dimension? Since each dimension holds continuous values, it is hard to define a region for them. Nevertheless, we know that such a region has to be centered at the mean value of the dimension. Hence, we define a "belonging region" centered at the mean with a modifiable width. Terms in this region are related because they belong to the dimension and, hence, the pair dimension-term will appear in the reduced context. The width of the "belonging region" is a parameter that allows us to manage the density of the context. The conversion function is defined as:

bl(x, k) = 1 if Gk(x) ∈ [α, 1 − α], and 0 otherwise    (13)

where Gk is the probability density function (PDF) for dimension k and α ∈ ]0, 0.5[ defines the limits of the "belonging region".

3.4 Creating the reduced context

Consider a document-article formal context KO as defined in (7) (the original context) and a term-document matrix A as defined in (8), analogous to KO. Given the factorization of matrix A as defined in (10), the vector-space representation of its terms in k dimensions B as defined in (11) and a conversion function bl(x, k) as defined in (13), let D be the set of k dimensions in B and

IR ⊆ D × M = {(j, i) : ∀j ∈ D ∧ ∀i ∈ M ⟺ bl(Bij, j) = 1}    (14)

We define the reduced context of KO as KR = (D, M, IR). Notice the inversion of the pair (j, i) and of Bij, performed to respect the LSA convention, which requires term-document matrices, and the FCA convention, which uses documents as objects and terms as attributes. In the reduced context we say that "dimension j contains term i if the evaluation of the conversion function bl over the value of coordinate j for the term i is 1". Summarizing, in order to get a reduced context, the values of α and k must be found.
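The following is a hedged sketch of one possible reading of the conversion function (13) and relation (14), in which Gk is taken as the cumulative distribution of the Gaussian fitted to dimension k, so that its values lie in [0, 1] and an α close to 0.5 gives a narrow central band around the mean; the matrix B and the value of α below are placeholders.

```python
import numpy as np
from scipy.stats import norm

def b_l(x, column, alpha):
    """Conversion function (13): 1 iff G_k(x) lies in [alpha, 1 - alpha]."""
    g = norm.cdf(x, loc=column.mean(), scale=column.std())  # CDF reading of G_k
    return int(alpha <= g <= 1 - alpha)

B = np.random.randn(120, 60)       # placeholder for the term coordinates of Eq. (11)
alpha = 0.45                       # width parameter of the belonging region

# Relation (14): dimension j "contains" term i when b_l(B[i, j], j) = 1.
# Dimension 0 is skipped, as the authors exclude it from their analysis.
I_R = {(j, i) for j in range(1, B.shape[1])
              for i in range(B.shape[0]) if b_l(B[i, j], B[:, j], alpha)}
print(len(I_R), "pairs in the reduced context K_R")
```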
3.5 Related approaches

Similar techniques have been proposed before. Gajdos et al. [9] used LSA to reduce complexity in the structure of the lattice by eliminating noise in the formal context. While this approach is useful, it does not reduce the amount of data, but rather "tunes" it to get a clearer result. Snasel et al. [20,9] proposed matrix-reduction algorithms based on NMF7 and SVD8. While they state that these methods succeed in reducing the number of concepts obtained using FCA, they do not describe a real-life use of their technique (their experiment was performed over a 17×16 matrix), nor do they discuss the performance of their approach. The approach of Kumar and Srinivas [1] consists of using fuzzy K-Means clustering9 to reduce the attributes in a formal term-document context; in their approach, documents are categorized into k clusters using the cosine similarity measure. Cheung et al. [2] introduced a complexity reduction for term-document lattices by defining a set of equivalence relations that allows the set of objects to be reduced. Finally, Dias et al. introduced JBOS [7] (junction based on objects similarity), a similar method where objects are grouped into prototype objects by calculating their similarity according to weights assigned manually to the attributes.

4 Case Study: Software Architecture Community

The Software Architecture Corpus (SAC) was composed by extracting metadata from papers retrieved by the ISI Web of Knowledge search engine10 using the query "software architecture". It is assumed that the keyword "software architecture" is present in the title and/or abstract of each paper. While the search engine retrieved 4701 articles, not all of them have an abstract to work with; those without one are excluded from our analysis, leaving 4565 articles spanning from 1990 to 2009 (the retrieved documents span from 1973 to 2009).

4.1 Term list

A term list was assembled by using Natural Language Processing over the articles' titles and abstracts. In order to avoid common words, a stopword list and a lexical tagger were used as filters. A list of candidate terms was then manually filtered to obtain a final list of 120 terms, which included words and multi-words (such as "Unified Modeling Language"). Table 1 shows a sample of the selected terms.

Table 1. Top 10 most frequent terms

Term            Frequency
design          1710
development     1450
component       1253
process         1083
implementation  1006
datum           874
requirement     869
analysis        851
framework       817
control         801

Each term was looked up in each document and its frequency of use was calculated. Then, a weighting measure (tf.idf11) was applied to each value. The term-document matrix Aw = [aij] was constructed using the final list of terms (M) and the corpus of documents (G), where aij represents the weight of term i in document j. We defined the relation I ⊆ G × M = {(j, i) : ∀j ∈ G ∧ ∀i ∈ M ⟺ aij > 0} to build the original context KO = (G, M, I), stating that a document contains a term only if the term's weight in it is greater than 0. The formal context KO was used later to compare our reductions.

7 Non-negative matrix factorization.
8 Singular Value Decomposition.
9 K-Means clustering is a classic clustering technique for vector-space models.
10 http://isiwebofknowledge.com
11 Term Frequency-Inverse Document Frequency is a weighting measure commonly used in IR, based on the notion that a term's infrequency on a global scale makes it important.
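A hedged sketch of how a context like KO could be assembled: scikit-learn's TfidfVectorizer stands in for the tf.idf weighting of footnote 11 (its smoothing differs slightly from the plain formula), and the documents and term list are invented placeholders for the SAC corpus and the 120-term list.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Placeholder corpus and term list standing in for the SAC abstracts
# and the curated 120-term vocabulary.
documents = ["software architecture design and development process",
             "a component framework for architecture analysis"]
terms = ["design", "development", "component", "process", "framework", "analysis"]

vectorizer = TfidfVectorizer(vocabulary=terms)
Aw = vectorizer.fit_transform(documents)        # rows: documents, columns: terms

# Original context K_O: document j contains term i iff its weight a_ij > 0.
K_O = {(j, i) for j, i in zip(*Aw.nonzero())}
print(len(K_O), "crosses in the original context")
```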
4.2 Reducing the SAC

As stated at the end of Section 3.3, two parameters had to be set in order to create the reduced context. Unfortunately, in LSA there is no known method to find the best value of k and, without it, it is not possible to find a good value for α. We therefore defined a set of goals and observed which values of k and α best accomplished them. The goals were:

– low execution time,
– high stability,
– few concepts in the final lattice.

Using three fixed values of k, we reduced several contexts and processed them through FCA in order to find the best value for α. As shown in figure 2, we found that higher values of α (close to 0.5) yield the best results. Repeating the experiment with three fixed values of α (0.45, 0.47 and 0.49) to find the best value of k, we found a trade-off between stability and execution time, as can be observed in figure 3. Higher values of k yield higher stability but also higher execution time, and vice versa. Since stability drops quickly at k=60 and, at the same value, the execution time starts to grow greatly, we selected k=60 to obtain our results. α was set to 0.45 and 0.47.

[Fig. 2. Fixed k, variable α: (a) α vs. execution time; (b) α vs. mean stability, for k = 10, 15, 20.]

[Fig. 3. Variable k, fixed α: (a) k vs. normalized execution time; (b) k vs. mean stability of the top 100 concepts, for α = 0.45, 0.47, 0.49.]

5 Results and Discussion

Table 2 compares the characteristics of the lattices obtained from two reduced contexts (KR) and from the original context (KO). The lattices were computed using the FCA suite Coron System12.

12 http://coron.loria.fr/

Table 2. Comparison of characteristics

                              α = 0.45   α = 0.47   Original
Objects                       60         60         4565
Attributes                    120        120        120
Density [%]                   17.24      10.59      6.59
Concepts                      6309       1207       170606
Coincidental intents          3029       815        -
Mean attributes per concept   20.52      12.6       7.91
Intensional stability         0.2170     0.3041     0.3995
Extensional stability         0.2277     0.3211     0.1103
Top-100 intensional stability 0.9061     0.7576     1
Top-100 extensional stability 0.9515     0.8287     0.9837
Levels                        10         7          10
Time [s]                      6.869      1.145      2865.723
Time to reduce [s]            39.333     39.325     -

The results show that using LSA before FCA yields a clear reduction of the formal context, from a size of 4565 × 120 (original context) to 60 × 120 (reduced context), that is, a reduction of 76 times in the amount of data to be processed. It also lowers the number of concepts yielded in the final lattice (by 27 and 141 times for α equal to 0.45 and 0.47, respectively) and, because of that, the time required to calculate the full concept lattice is considerably reduced, even when the time required to create the reduced contexts is taken into account.

Stability gives more clues about the quality of the reduction. Figure 4 shows the intensional and extensional stability distributions. As can be observed, the original context's lattice has a better intensional stability than the reduced contexts', but a worse extensional stability. Mean values for these two measures are shown in Table 2. Since we have eliminated redundant data, each dimension is almost equally important, meaning that in the reduced contexts we cannot afford to eliminate a subset of dimensions without greatly affecting the structure of the obtained lattice. In this case, we have eliminated a big part of the noise (k=60 was in fact a very good choice). On the other hand, the growth in extensional stability reflects that the structure of the reduced lattices is not tied to some specific terms: some terms can be removed and the structure of the lattice would not vary greatly, which is what happens each year (see Section 3.1).

5.1 A Software Architecture Taxonomy

Figure 5 shows the reduced notation of the lattice for the reduced context (k=60 and α = 0.45).
This lattice was drawn with Coron-drawer13, a set of scripts specially written for large lattices. For the sake of space and simplicity we provide a small comparison of the terms in the reduced lattice-based taxonomy with a handmade, human-expert thesaurus of Software Architecture [8].

13 http://code.google.com/p/coron-drawer/

[Fig. 4. Stability distribution (k=60): (a) intensional stability; (b) extensional stability (hits vs. stability, for α = 0.45, α = 0.47 and the original context).]

[Fig. 5. Filtered lattice (k=60, α = 0.45, minsupp=0).]

Software Architecture Thesaurus Comparison. The thesaurus contains 494 elements (we call them elements to differentiate them from the lattice's concepts and the taxonomy's terms) organized in a hierarchical fashion. Each element has at most one parent and the hierarchy has multiple roots. The thesaurus is exhaustive and comprises mainly definitions of Software Architecture concepts and entities (such as framework names or important authors in the domain). The comparison shows:

– From our list of 120 terms, 50 terms (42%) match exactly a term in the thesaurus, 25 terms (21%) have a semi-match, meaning that they are part of a term in the thesaurus (e.g., database in our hierarchy and shared database in the thesaurus), and 45 terms (37%) do not have a counterpart in the thesaurus.
– The three main concepts design, analysis and framework (with support over 50%) found in our taxonomy also remain main elements in the thesaurus.
– Even when some elements of the thesaurus are not found in our taxonomy, they actually exist as relations among terms. For instance, the relation between the terms design and pattern describes the thesaurus element design pattern. This is also true for design decision, information view, knowledge reuse, quality requirements, business methodology and several more elements.

6 Conclusions

In this work we have presented a method and a technique to apply Formal Concept Analysis (FCA) to large contexts of data in order to obtain a lattice-based taxonomy. We have outlined that large datasets are not suitable to be processed by FCA and that this fact is an important problem in the domain. The solution presented here is based on an Information Retrieval technique called Latent Semantic Analysis, which is used to reduce a term-document matrix to a much smaller matrix where terms are related to a set of dimensions instead of documents. Using a probabilistic approach, this matrix is converted into a binary formal context to which FCA can be applied.

The approach was illustrated with a case study using a research domain of computer science called Software Architecture. The corpus created for this domain consists of more than 4500 documents and 120 terms. We have compared the characteristics of the lattice obtained through FCA from the original formal context of terms and documents with those of the lattices obtained from the reduced contexts generated by our approach. We have found not only that our approach is considerably more economical in execution time as well as in the number of concepts obtained in the final lattice, but also that the intensional and extensional stabilities give us elements to be confident in the quality of our approach.
A small comparison with a human expert handmade thesaurus of the com- munity of Software Architecture is provided in order to illustrate that a real and coherent taxonomy can be obtained using our approach. References 1. Ch. Aswani Kumar and S. Srinivas. Concept lattice reduction using fuzzy K-Means clustering. Expert Systems with Applications, 37(3):2696–2704, March 2010. 2. Karen S. K. Cheung and Douglas Vogel. Complexity Reduction in Lattice-Based Information Retrieval. Information Retrieval, 8(2):285–299, April 2005. 3. Philipp Cimiano, Andreas Hotho, and Steffen Staab. Learning concept hierarchies from text corpora using formal concept analysis. J. Artif. Int. Res., 24:305–339, August 2005. 4. Vı́ctor Codocedo and Hernán Astudillo. No mining, no meaning: relating docu- ments across repositories with ontology-driven information extraction. In Proceed- ing of the eighth ACM symposium on Document engineering, DocEng ’08, pages 110–118, New York, NY, USA, 2008. ACM. Cheating to achieve Formal Concept Analysis over a large formal context 361 5. Wisam Dakka, Panagiotis G. Ipeirotis, and Kenneth R. Wood. Automatic con- struction of multifaceted browsing interfaces. In Proceedings of the 14th ACM international conference on Information and knowledge management, CIKM ’05, pages 768–775, New York, NY, USA, 2005. ACM. 6. Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. Indexing by latent semantic analysis. Journal of the american society for Information Science, 41(6):391–407, 1990. 7. Sergio M. Dias and Newton J. Vieira. Reducing the size of concept lattices: The JBOS Approach. In Proceedings of the 8ht international conference on Concept Lattices and their Applications, CLA 2010, pages 80–91, 2010. 8. Anabel Fraga and Juan Lloréns. Training initiative for new software/enterprise architects: An ontological approach. In WICSA, page 19. IEEE Computer Society, 2007. 9. Petr Gajdos, Pavel Moravec, and Václav Snásel. Concept lattice generation by singular value decomposition. In Václav Snásel and Radim Belohlávek, editors, International Workshop on Concept Lattices and their Applications (CLA), volume 110 of CEUR Workshop Proceedings. CEUR-WS.org, 2004. 10. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foun- dations. Springer, Berlin/Heidelberg, 1999. 11. Gene H. Golub and Charles F. Van Loan. Matrix computations (3rd ed.). Johns Hopkins University Press, Baltimore, MD, USA, 1996. 12. Nicolas Jay, François Kohler, and Amedeo Napoli. Analysis of social communities with iceberg and stability-based concept lattices. In Proceedings of the 6th inter- national conference on Formal concept analysis, ICFCA’08, pages 258–272, Berlin, Heidelberg, 2008. Springer-Verlag. 13. John Kominek and Rick Kazman. Accessing multimedia through concept clus- tering. In Proceedings of the SIGCHI conference on Human factors in computing systems, CHI ’97, pages 19–26, New York, NY, USA, 1997. ACM. 14. Sergei Kuznetsov. Stability as an estimate of the degree of substantiation of hy- potheses derived on the basis of operational similarity. nauchn. tekh. inf., ser.2 (automat. document. math. linguist.). 12:21–29, 1990. 15. Sergei Kuznetsov, Sergei Obiedkov, and Camille Roth. Reducing the representa- tion complexity of lattice-based taxonomies. In Uta Priss, Simon Polovina, and Richard Hill, editors, Conceptual Structures: Knowledge Architectures for Smart Applications, volume 4604 of Lecture Notes in Computer Science, pages 241–254. 
Springer Berlin / Heidelberg, 2007. 16. Sergei O. Kuznetsov. On stability of a formal concept. Annals of Mathematics and Artificial Intelligence, 49:101–115, April 2007. 17. Uta Priss. Formal concept analysis in information science. Annual Review of Information Science and Technology, 40(1):521–543, September 2007. 18. Camille Roth and Paul Bourgine. Lattice-based dynamic and overlapping tax- onomies: The case of epistemic communities. SCIENTOMETRICS, 69(2):429–447, NOV 2006. 19. Camille Roth, Sergei Obiedkov, and Derrick Kourie. Towards concise represen- tation for taxonomies of epistemic communities. In Proceedings of the 4th in- ternational conference on Concept lattices and their applications, CLA’06, pages 240–255, Berlin, Heidelberg, 2008. Springer-Verlag. 20. Vaclav Snasel, Martin Polovincak, and Hussam M. Dahwa. Concept lattice Re- duction by Singular Value Decomposition. Proceedings of the Spring Young Re- searcher’s Colloquium on Database and Information Systems, 2007. 362 Vı́ctor Codocedo, Carla Taramasco and Hernán Astudillo 21. Gerd Stumme. Efficient data mining based on formal concept analysis. In Abdelka- der Hameurlain, Rosine Cicchetti, and Roland Traunmüller, editors, Database and Expert Systems Applications, volume 2453 of Lecture Notes in Computer Science, pages 3–22. Springer Berlin / Heidelberg, 2002. 22. Rudolf Wille. Restructuring lattice theory: an approach based on hierarchies of concepts. In Ivan Rival, editor, Ordered sets, pages 445–470, Dordrecht–Boston, 1982. Reidel. 23. Rudolf Wille. Formal concept analysis as mathematical theory of concepts and concept hierarchies. In Bernhard Ganter, Gerd Stumme, and Rudolf Wille, editors, Formal Concept Analysis, volume 3626 of Lecture Notes in Computer Science, pages 1–33. Springer Berlin / Heidelberg, 2005. 24. Jian-hua Yeh and Naomi Yang. Ontology construction based on latent topic ex- traction in a digital library. In George Buchanan, Masood Masoodian, and Sally Cunningham, editors, Digital Libraries: Universal and Ubiquitous Access to Infor- mation, volume 5362 of Lecture Notes in Computer Science, pages 93–103. Springer Berlin / Heidelberg, 2008. A FCA-based analysis of sequential care trajectories Elias EGHO, Nicolas Jay, Chedy Raissi and Amedeo Napoli Orpailleur Team, LORIA, Vandoeuvre-les-Nancy, France elias.egho,nicolas.jay,chedy.raissi,amedeo.napoli@loria.fr Abstract. This paper presents a research work in the domains of se- quential pattern mining and formal concept analysis. Using a combined method, we show how concept lattices and interestingness measures such as stability can improve the task of discovering knowledge in symbolic sequential data. We give example of a real medical application to illus- trate how this approach can be useful to discover patterns of trajectories of care in a french medico-economical database. Keywords: Data-Mining, Formal Concept Analysis, Sequential patterns, stability 1 Introduction Sequential pattern mining, introduced by Agrawal et al [2], is a popular ap- proach to discover patterns in ordered data. It can be seen as an extension of the well known association rule problem, applied to data that can be modelled as sequences of itemsets, indexed for example by dates. It helps to discover rules such as: customers frequently first buy DVDs of episodes I, II and III of Stars Wars, then buy within 6 months episodes IV, V, VI of the same famous epic space opera. 
Sequential pattern mining has been successfully used so far in various domains : DNA sequencing, customer behavior, web mining . . . [2]. Many scalable methods and algorithms have been published so far to effi- ciently mine sequential patterns. However few of them deal with the multidi- mensional aspect of databases. Multidimensionality conveys two notions: – items can be of different intrinsic nature. While the common approach con- siders objects of the same dimension, for example articles bought by cus- tomers, databases can hold much more information such as article price, gender of the customer, location of the store and so on. – a dimension can be considered at different levels of granularity. For example, apples in a basket market analysis can be either described as fruits, fresh food or food following a hierarchical taxonomy. Plantevit et al. [13] address this problem as mining multidimensional and multi- level sequential patterns and propose a method to achieve this task. They rely on the support measure to efficiently discover relevant sequential patterns. Support c 2011 by the paper authors. CLA 2011, pp. 363–376. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 364 Elias Egho, Nicolas Jay, Chedy Raissi and Amedeo Napoli indicates to what extent a pattern is frequent in a database. Many (sequential and non sequential) itemset mining methods use support as measure for find- ing interesting correlations in databases. However, the most relevant patterns may not be the most frequent ones. Moreover, discovering interesting patterns with low support leads generally to overwhelming results that need to be further processed in order to be analyzed by human experts. Formal Concept Analysis (FCA) is a theory of data analysis introduced in [17], that is tightly connected with data-mining and especially the search of frequent itemsets [16]. FCA organizes information into a concept lattice repre- senting inherent structures existing in data. Recently, some authors proposed new interest measures to reduce complex concept lattices and thus find inter- esting patterns. In [9], Kuznetsov introduces stability, successfully used in social network and social community analysis [7, 6]. To our knowledge, there are no similar approaches to find interesting sequen- tial patterns. In this paper, we present an original experiment based on both multilevel and multidimensional sequential patterns and lattice-based classifi- cation. This experiment may be regarded from two points of view: on the one hand, it is based on multilevel and multidimensional sequential patterns search, and on the other hand, visualization and classification of extracted sequences is based on Formal Concept Analysis (FCA) techniques, organizing them into a lattice for analysis and interpretation. It has been motivated by the problem of mining care trajectories in a regional healthcare system, using data from the PMSI, the so called French hospital information system. The remaining of the paper is organized as follows. In Section 2, we present the problem of mining care trajectories. Section 3 presents the methods proposed in domains of multi- level and multidimensional sequential patterns and Formal Concept Analysis. In Section 4, we present some of the results we achieved. 
2 Mining healthcare trajectories

The PMSI (Programme de Médicalisation des Systèmes d'Information) database is a national information system used in France to describe hospital activity from both an economical and a medical point of view. The PMSI is based on the systematic collection of administrative and medical data. In this system, every hospitalization leads to the collection of administrative, demographical and medical data. This information is mainly used for billing and planning purposes. Its structure can be described (and voluntarily simplified) as follows:

– Entities (attributes):
  • Patients (id, gender, . . . )
  • Stays (id, hospital, principal diagnosis, . . . )
  • Associated Diagnoses (id)
  • Procedures (id, date, . . . )
– Relationships:
  • a patient has 1 or more stays
  • a stay may have several procedures
  • a stay may have several associated diagnoses

The collection of data is done with a minimum record set using controlled vocabularies and classifications. For example, all diagnoses are coded with the International Classification of Diseases (ICD10)1. These classifications can be used as taxonomies to feed the process of multilevel sequential pattern mining, as shown in figure 1.

[Fig. 1. Examples of taxonomies used in multilevel sequential pattern mining: ICD 10 and the institutions taxonomy.]

Healthcare management and planning play a key role in improving the overall health level of the population. From a population point of view, even the best, state-of-the-art therapy is not effective if it cannot be delivered in the right conditions. Actually, many determinants affect the effective delivery of healthcare services: availability of trained personnel, availability of equipment, security constraints, costs, proximity, etc. All of these should meet the economic, demographic and epidemiological needs of a given area. This issue is especially acute in the field of cancer care, where many institutions and professionals must cooperate to deliver high-level, long-term and costly care. Therefore, it is crucial for healthcare managers and decision makers to be assisted by decision support systems that give strategic insights about the intrinsic behavior of the healthcare system.

On the one hand, healthcare systems can be considered rich in data, as they produce massive amounts of data such as electronic medical records, clinical trial data, hospital records, administrative data, and so on. On the other hand, they can be regarded as poor in knowledge, as these data are rarely embedded into a strategic decision-support resource [1]. We used the PMSI system as a source of data to study patient movements between several institutions. By organizing these movements into groups of sequences representing trajectories of care, we aim at discovering patterns describing the whole course of treatment of a given population. This global approach contrasts with the usual statistical exploitations of the PMSI data, which focus mainly on single hospitalizations. In this experiment, we have worked on four years (2006–2009) of the PMSI data of the Burgundy region related to patients suffering from lung cancer.

1 http://apps.who.int/classifications/apps/icd/icd10online/
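As a small aside, the simplified PMSI structure listed above could be modelled with data classes as in the following sketch; the field names are our own assumptions, not the actual PMSI record set.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Procedure:
    id: str
    date: str

@dataclass
class Stay:
    id: str
    hospital: str
    principal_diagnosis: str                              # ICD-10 code
    associated_diagnoses: List[str] = field(default_factory=list)
    procedures: List[Procedure] = field(default_factory=list)

@dataclass
class Patient:
    id: str
    gender: str
    stays: List[Stay] = field(default_factory=list)       # one or more stays

patient = Patient("p1", "M", [Stay("s1", "210780581", "C341")])
print(len(patient.stays), "stay(s) recorded for", patient.id)
```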
3 Related work

3.1 Sequential Pattern Mining

Let I be a finite set of items. A subset of I is called an itemset. A sequence s = ⟨s1 s2 . . . sk⟩ (si ⊆ I) is an ordered list of itemsets. A sequence s = ⟨s1 s2 . . . sn⟩ is a subsequence of a sequence s′ = ⟨s′1 s′2 . . . s′m⟩ if and only if there exist i1, i2, . . . , in such that i1 ≤ i2 ≤ . . . ≤ in and s1 ⊆ s′i1, s2 ⊆ s′i2, . . . , sn ⊆ s′in. We note s ⊆ s′ and also say that s′ contains s. Let D = {s1, s2, . . . , sn} be a database of sequences. The support of a sequence s in D is the proportion of sequences of D containing s. Given a minsup threshold, the problem of frequent sequential pattern mining consists in finding the set FS of sequences whose support is not less than minsup. Following the seminal work of Agrawal and Srikant [2] and the Apriori algorithm, many studies have contributed to the efficient mining of sequential patterns. The main approaches are PrefixSpan [11], SPADE [20], SPAM [3], PSP [10], DISC [4] and PAID [18].
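A minimal sketch of the subsequence relation and of support as defined above, matching itemsets at strictly increasing positions; the database is a toy example, not the care-trajectory data.

```python
def is_subsequence(s, t):
    """Greedy check that s = <s1...sn> is contained in t, matching each
    itemset of s to a later itemset of t (strictly increasing positions)."""
    i = 0
    for itemset in t:
        if i < len(s) and s[i] <= itemset:   # s_i included in this itemset of t
            i += 1
    return i == len(s)

def support(s, database):
    """Proportion of sequences of the database containing s."""
    return sum(is_subsequence(s, t) for t in database) / len(database)

db = [[{"a"}, {"b"}, {"c"}],
      [{"a"}, {"c"}, {"b"}],
      [{"d"}]]
print(support([{"a"}, {"b"}], db))           # 2/3: contained in the first two sequences
```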
Much work has been done in the area of single-dimensional sequential patterns, i.e., patterns in which all the items of a sequence have the same nature, like the sequence of products sold in a certain store. But in many cases the information in a sequence can be based on several dimensions. For example: a male patient had a surgical operation in Hospital A and then received chemotherapy in Hospital B. In this case, we have 3 dimensions: gender, type of treatment (chemotherapy, surgery) and location (Hospitals A and B). Pinto et al. [12] is the first work giving solutions for mining multidimensional sequential patterns. They propose to include some dimensions in the first or the last itemset of the sequence, but this works only for dimensions that remain constant over time, such as gender in our previous example. Among other proposals in this area, Yu et al. [19] consider multidimensional sequential pattern mining in the web domain; in their approach, the dimensions are pages, sessions and days, and they present two algorithms, AprioriMD and PrefixMDSpan, obtained by modifying the Apriori and PrefixSpan algorithms. Zhang et al. [21] propose the mining of multidimensional sequential patterns in distributed systems.

Moreover, each dimension can be represented at different levels of granularity, using a taxonomy which defines the hierarchical relations between items. Figure 2 shows an example of a disease taxonomy. Including the knowledge contained in the taxonomy leads to the problem of multilevel sequential pattern mining. Its interest resides in the capacity to extract more or less general/specific sequential patterns and to overcome problems of excessive granularity and low support. For example, using the disease taxonomy in Figure 2, sequences such as ⟨HeartDisease, BrainDisease⟩ could be extracted while ⟨Arryth., BrainDisease⟩ and ⟨Myoc.Inf., BrainDisease⟩ may have a too low support. Although Srikant and Agrawal [14] introduced hierarchy management in the extraction of association rules and sequential patterns early on, their approach was not scalable in a multidimensional context. Han et al. [5] proposed a method for mining multiple-level association rules in large databases, but their approach could not extract patterns containing items from different levels of the taxonomy. Plantevit et al. [13] proposed M3SP, a method taking both the multilevel and the multidimensional aspects into account; M3SP is able to find sequential patterns with the most appropriate level of granularity.

[Fig. 2. Disease taxonomy.]

The PMSI is a multidimensional database holding information coded with controlled vocabularies and taxonomies. Therefore, we relied on M3SP to extract multilevel and multidimensional sequential patterns. Nevertheless, the M3SP paradigm is still the search for frequent patterns. As our objective is to discover interesting patterns that may be infrequent, we ran M3SP iteratively down to very low support thresholds (see the appendix for more details about M3SP and how we used it). This produced massive amounts of patterns requiring further processing for a practical interpretation by a domain expert. This next phase was conducted with a lattice-based classification of sequential patterns, described in the following section.

3.2 Formal Concept Analysis

Introduced by Wille [17], Formal Concept Analysis is based on mathematical order theory. FCA has successfully been applied to many fields, such as medicine and psychology, musicology, linguistic databases, information science and software engineering. A strong feature of Formal Concept Analysis is its capability of producing graphical visualizations of the inherent structures among data. FCA starts with a formal context K = (G, M, I), where G is a set of objects, M is a set of attributes, and the binary relation I ⊆ G × M specifies which objects have which attributes. Two operators, both denoted by ′, connect the power sets of objects 2^G and attributes 2^M as follows:

′ : 2^G → 2^M, X′ = {m ∈ M | ∀g ∈ X, gIm}
′ : 2^M → 2^G, Y′ = {g ∈ G | ∀m ∈ Y, gIm}

The operator ′ is dually defined on attributes. The pair of ′ operators induces a Galois connection between 2^G and 2^M. The composed operators ′′ are closure operators: they are idempotent, extensive and monotonous. For any A ⊆ G and B ⊆ M, A′′ and B′′ are closed sets, and A and B are closed whenever A = A′′ and B = B′′. A formal concept of the context K = (G, M, I) is a pair (A, B) ⊆ G × M where A′ = B and B′ = A. A is called the extent and B is called the intent. A concept (A1, B1) is a subconcept of a concept (A2, B2) if A1 ⊆ A2 (which is equivalent to B2 ⊆ B1), and we write (A1, B1) ≤ (A2, B2). The set B of all concepts of a formal context K together with the partial order relation ≤ forms a lattice, called the concept lattice of K. This lattice can be represented as a Hasse diagram, providing a visual support for interpretation.

4 Classification and selection of interesting care trajectories

We use FCA to classify and filter the results of the sequential mining step. The formal context is built by taking patients as objects and sequential patterns as attributes. A patient p, considered as a sequence, is related to a sequential pattern s if p contains s. Table 1 shows a formal context KPS representing the binary relation between the patients and the sequences; a cross indicates that the patient's trajectory contains the complete sequence of health facilities. Thus, we achieve a classification of patients according to their trajectories of care.

[Table 1. Formal context KPS: a cross-table relating patients P1–P4 (rows) to sequential patterns Seq1–Seq4 (columns).]
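A hedged sketch of how such a patient-by-pattern context could be built: the trajectories and patterns are toy sequences of itemsets rather than PMSI data, and containment is checked with the same greedy subsequence test as in the earlier sketch.

```python
def is_subsequence(s, t):
    # Same greedy containment test as in the previous sketch.
    i = 0
    for itemset in t:
        if i < len(s) and s[i] <= itemset:
            i += 1
    return i == len(s)

# Toy trajectories and patterns (sequences of itemsets), not the PMSI data.
trajectories = {
    "P1": [{"a"}, {"b"}, {"c"}],
    "P2": [{"a"}, {"c"}, {"b"}],
    "P3": [{"d"}],
}
patterns = {
    "Seq1": [{"a"}, {"b"}],
    "Seq2": [{"a"}, {"c"}],
    "Seq3": [{"d"}],
}

# K_PS: a cross (p, seq) whenever patient p's trajectory contains the pattern.
K_PS = {(p, name) for p, traj in trajectories.items()
                  for name, patt in patterns.items() if is_subsequence(patt, traj)}
print(sorted(K_PS))
```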
In order to choose the most important concepts, we rely on stability, a measure of interest introduced in [8] and revisited in [9]. Let (A, B) be a formal concept of B. The stability of (A, B) is defined as:

γ(A, B) = |{C ⊆ A | C′ = A′ = B}| / 2^|A|

The stability index of a concept indicates how much the concept intent depends on particular objects of the extent. It indicates the probability of preserving the concept intent while removing some objects of its extent. A stable concept continues to be a concept even if a few members stop being members. This also means that a stable concept is resistant to noise and will not collapse when some members are removed from its extent. Stability offers an alternative point of view on concepts compared to the well-known metric of support, based on frequency, which is notably used to build iceberg lattices [15]. Actually, combining support and stability allows a more subtle interpretation, as shown in a previous work in the same application domain [6].

5 Results

5.1 Patient healthcare trajectories

The PMSI is a relational database holding information on any hospitalization in France. We reconstituted patient care trajectories from the PMSI data, considering each stay as an itemset. The sequence of stays of a same patient defines his or her care trajectory. In our experiment, itemsets could be made of various combinations of dimensions. Table 2 shows the trajectories of care obtained using two dimensions (principal diagnosis, hospital ID). For example, (C341, 210780581) represents one hospitalization of a patient in the University Hospital of Dijon (coded as 210780581) treated for a lung cancer (C341). Our dataset contained 486 patients suffering from lung cancer and living in the French region of Burgundy.

Table 2. Care trajectories of 4 patients showing principal diagnoses and hospital IDs

Patient  Sequence
p1       ⟨(C341, 750712184)(Z452, 580780138)(D122, 030785430) . . .⟩
p2       ⟨(C770, 100000017)(C770, 210780581)(Z080, 210780581) . . .⟩
p3       ⟨(H259, 210780110)(H259, 210780110)(K804, 210010070) . . .⟩
p4       ⟨(R91, 210780136)(C07, 210780136)(C341, 210780136) . . .⟩

Table 3 shows some of the patterns generated by M3SP from the data presented in Table 2, using the taxonomies of Figure 1. Pattern 3 can be interpreted as follows: 36% of the patients have a hospitalization in a private institution (CL), for any kind of principal diagnosis (ALL), followed by 3 hospitalizations with the same principal diagnosis (Z511, coding for chemotherapy). That kind of pattern demonstrates the interest of multilevel and multidimensional sequential pattern mining: though the principal diagnosis is the same in the three last stays, the hospitals can be different. Mining at the lowest level of granularity, without taxonomies, would generate many different patterns with lower support.

Table 3. Example of sequential patterns generated by M3SP

ID  Support  Pattern
1   100%     ⟨(All, All)⟩
2   65%      ⟨(Z511, All)(Z511, All)(Z511, All)⟩
3   36%      ⟨(All, CL)(Z511, All)(Z511, All)(Z511, All)⟩
4   21%      ⟨(Z511, CH)(Z511, CH)(Z511, CH)(Z511, CH)(Z511, CH)⟩

However, for low support thresholds, the number of extracted patterns dramatically grows with the size of the database, depending on the number of patients, the size of the taxonomies and the number of dimensions, as shown in Table 4.

Table 4. Number of patterns generated by M3SP (minsup=5%)

Dimensions used                     Number of patterns
Institutions                        1529
Principal Diagnosis, Institution    4051
All diagnoses                       50546
Institutions, Medical Procedures    293402

The next step consists in building a lattice with the resulting sequential patterns in order to facilitate the interpretation and selection of interesting care trajectories.
5.2 Lattice-based classification of sequential patterns

We illustrate this approach with patterns representing the sequences of institutions that are frequent in the patient set. We built a formal context relating 486 patients and 1529 sequential patterns. These sequences were generated in the first experiment by considering only one dimension (healthcare institutions), characterized by a taxonomy with two levels of granularity. We iteratively applied M3SP, decreasing the threshold by one patient at each step. The resulting lattice has 10145 concepts organized on 48 different levels. Figure 3 shows the upper part of the lattice. Concept intents are sets of one or more sequential patterns. From the lowest right concept, we can see that 37 patients support 3 sequential patterns:

– they had at least one hospitalization in the hospital 690781810,
– they were hospitalized at least once in a University Hospital (CHU/CHR),
– they had at least 2 hospitalizations (for simplicity, 2∗(ALL) is the contraction of (ALL)(ALL)).

The intent of the top concept is ⟨(ALL)⟩, because all patients have at least one hospitalization during their treatment. The intent of the co-atoms (i.e. the immediate descendants of the top) is always a sequence of length one, holding items of a high level of granularity.

[Fig. 3. Lattice of sequences of healthcare institutions (upper part); the top concept, with intent ⟨{(All)}⟩, covers all 486 patients.]

Filtering concepts can be achieved using both support and stability. In order to highlight the interesting properties of stability, we try to answer the question "is there a number of hospitalizations that characterizes care trajectories for lung cancer?". A basic scheme in lung cancer treatment generally consists of a sequence of 4 chemotherapy sessions, possibly following a surgical operation. Due to noise in the data or variability in practices, we may observe sequences of 4, 5, 6 or more stays in the PMSI database. Mining such data with an a priori fixed support threshold may not discover the most interesting patterns: if the threshold is too high, we simply miss the good pattern; if it is too low, similar patterns, differing only in length and with close values of support, can be extracted. Figure 4 shows the power of stability in discriminating such patterns. The concept with intent ⟨(CL)⟩⟨2∗(ALL)⟩ is the most frequent. It represents patients with at least one stay in a private organization and at least 2 stays in hospital. Similar concepts have a relatively close support and differ only in the total number of stays. The concept with 5 stays has the highest stability. This probably matches the basic treatment scheme of lung cancer. Our interpretation relies on the power of stability to point out noisy concepts. Actually, only a few patients in the concept ⟨(CL)⟩⟨2∗(ALL)⟩ have only 2 stays.

[Fig. 4. Discriminating power of stability: scatter plot of support and stability of concepts (represented by their intent).]

Another interesting feature of the lattice-based classification of sequential patterns lies in its ability to characterize objects by several patterns. Let us consider the minimal database of sequences D = {s1 = ⟨(a)(b)(c)⟩; s2 = ⟨(a)(c)(b)⟩; s3 = ⟨(d)⟩}.
With a 2/3 threshold, h(a)(b)i and h(a)(c)i are considered as frequent sequential patterns, but sequential pattern mining will give no information about the fact that all sequences containing the pattern h(a)(b)i contain also the pattern h(a)(c)i. However this infor- mation can be obtained by classifying sequential patterns with FCA. 372 Elias Egho, Nicolas Jay, Chedy Raissi and Amedeo Napoli 6 Conclusion In this paper we propose an original combination of sequential pattern mining and FCA to explore a database of multidimensional sequences. We show some interesting prop- erties of concept lattices and stability index to classify and select interesting sequential patterns. This work is in a early step. Further developments can be made in several axes. First, other measures of interest could be investigated to qualify sequential pat- terns. Furthermore, connexions between FCA and the sequential mining problem could be explored in a more integrative approach, especially by studying closure operators on sequences. 7 Acknowledgments The authors wish to thank the TRAJCAN project for its financial support and Mrs. Catherine QUANTIN, the responsible of TRAJCAN project at university hospital of Dijon. References 1. Abidi, S.S.: Knowledge management in healthcare: towards ’knowledge-driven’ decision-support services. Int J Med Inform 63(1-2), 5–18 (Sep 2001) 2. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Yu, P.S., Chen, A.S.P. (eds.) Eleventh International Conference on Data Engineering. pp. 3–14. IEEE Computer Society Press, Taipei, Taiwan (1995), cite- seer.ist.psu.edu/agrawal95mining.html 3. Ayres, J., Gehrke, J., Yiu, T., Flannick, J.: Sequential pattern mining using a bitmap representation. pp. 429–435. ACM Press (2002) 4. ying Chiu, D., hung Wu, Y., Chen, A.L.P.: An efficient algorithm for mining fre- quent sequences by a new strategy without support counting. In: In Proceedings of the 20th International Conference on Data Engineering (ICDE’04. pp. 375–386. IEEE Computer Society (2004) 5. Han, J., Fu, Y.: Mining multiple-level association rules in large databases. Knowl- edge and Data Engineering, IEEE Transactions on 11(5), 798 –805 (sep/oct 1999) 6. Jay, N., Kohler, F., Napoli, A.: Analysis of social communities with iceberg and stability-based concept lattices. In: Medina, R., Obiedkov, S.A. (eds.) International Conference on Formal Concept Analysis (ICFCA’08). LNAI, vol. 4923, pp. 258– 272. Springer (2008) 7. Kuznetsov, S., Obiedkov, S., Roth, C.: Reducing the representation complexity of lattice-based taxonomies. In: Priss, U., Polovina, S., Hill, R. (eds.) Proc. of ICCS 15th Intl Conf Conceptual Structures. LNCS/LNAI, vol. 4604, pp. 241–254. Springer (2007) 8. Kuznetsov, S.O.: Stability as an estimate of the degree of substantiation of hy- potheses derived on the basis of operational similarity. Nauchn. Tekh. Inf., Ser.2 (Automat. Document. Math. Linguist.) 12, 21–29 (1990) 9. Kuznetsov, S.O.: On stability of a formal concept. Annals of Mathematics and Artificial Intelligence 49, 101–115 (2007), http://www.springerlink.com/content/fk1414v361277475/ A FCA-based analysis of sequential care trajectories 373 10. Masseglia, F., Cathala, F., Poncelet, P.: The psp approach for mining sequential patterns. pp. 176–184 (1998) 11. Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.: Prefixs- pan: Mining sequential pattern by prefix-projected growth. In: ICDE. pp. 215–224 (2001) 12. 
Pinto, H., Han, J., Pei, J., Wang, K., Chen, Q., Dayal, U.: Multi-dimensional sequential pattern mining. In: CIKM ’01: Proceedings of the tenth international conference on Information and knowledge management. pp. 81–88. ACM Press, New York, NY, USA (2001) 13. Plantevit, M., Laurent, A., Laurent, D., Teisseire, M., Choong, Y.W.: Mining mul- tidimensional and multilevel sequential patterns. ACM Trans. Knowl. Discov. Data 4(1), 1–37 (2010) 14. Srikant, R., Agrawal, R.: Mining sequential patterns: Generalizations and performance improvements. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) Proc. 5th Int. Conf. Extending Database Tech- nology, EDBT. vol. 1057, pp. 3–17. Springer-Verlag (25–29 1996), http://citeseer.ist.psu.edu/article/srikant96mining.html 15. Stumme, G.: Efficient data mining based on formal concept analysis. In: Lecture Notes in Computer Science, vol. 2453, p. 534. Springer (Jan 2002) 16. Valtchev, P., Missaoui, R., Godin, R.: Formal concept analysis for knowledge dis- covery and data mining: The new challenges. In: Eklund, P.W. (ed.) ICFCA. Lec- ture Notes in Computer Science, vol. 2961, pp. 352–371. Springer (2004) 17. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of con- cepts. In: Rival, I. (ed.) Ordered Sets. Reidel (1982) 18. Yang, Z., Kitsuregawa, M., Wang, Y.: Paid: Mining sequential patterns by passed item deduction in large databases. In: IDEAS’06. pp. 113–120 (2006) 19. Yu, C.C., Chen, Y.L.: Mining sequential patterns from multidimensional sequence data. Knowledge and Data Engineering, IEEE Transactions on 17(1), 136 – 140 (jan 2005) 20. Zaki, M.J.: Spade: An efficient algorithm for mining frequent sequences. Machine Learning 42(1-2), 31–60 (January 2001), http://www.springerlink.com/link.asp?id=n3t642725v615427 21. Zhang, C., Hu, K., Chen, Z., Chen, L., Dong, Y.: Approxmgmsp: A scalable method of mining approximate multidimensional sequential patterns on distributed system. In: Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth Interna- tional Conference on. vol. 2, pp. 730 –734 (aug 2007) 374 Elias Egho, Nicolas Jay, Chedy Raissi and Amedeo Napoli Appendix The M3SP algorithm is able to extract sequential patterns characterized by several dimensions with different levels of granularity for each dimension [13]. Each dimension has a taxonomy which defines the hierarchical relations between items. M3SP runs in three steps: data pre-processing , MAF-item generation and sequence mining. In Figure 5, we present an example to illustrate the mechanism of M3SP. Table 5b shows a dataset of hospitalizations relating patients (P) with attributes from three dimensions – T, the date of stay, – H, the healthcare setting in which the hospitalization takes place, – D, the disease of the patient. For instance, the first tuple means that, at date 1, the patient 1 has been treated for the disease D11 in hospital H11 . Let us now assume that we want to extract all multidimensional sequences that deal with hospitals and diseases that are frequent in the patients set. Figure 5a displays a taxonomy for dimensions H and D. Pre-processing step M3SP considers three types of dimensions: a temporal dimension Dt , a set of analysis dimensions DA , and a set of reference dimensions DR . M3SP orders the dataset according to Dt . The tuples appearing in a sequence are defined over the dimensions of DA . The support of the sequences is computed according to dimensions of DR . 
M3SP splits the dataset into blocks according to distinct tuple values over reference dimensions. The support of a given multidimensional sequence is the ratio of the number of blocks supporting the sequence over the total number of blocks. In our example, H (hospitals) and D (diseases) are the analysis dimensions, T is the temporal dimension and P (patients) is the only reference dimension. We obtain two blocks defined by Patient1 and Patient2 . as shown in table 5c. MAF-item generation step In this step, M3SP generates all the Maximal Atomic Frequent items or MAF-items. In order to define MAF-items, we fisrt define the specificity relation between items. Specificity relation. Given two multidimensional items a = (d1 , ..., dm ) and 0 0 0 0 0 a = (d1 , ..., dm ), a is said to be more specific than a, denoted by a 4I a , if for every 0 i = 1, ..., m, di ∈ di ↓. Where di ↓ is the set of all direct specializations of di according to the dimension taxonomy of di . In our example, we have (H1 , D1 ) 4I (H1 , D11 ), because H1 ∈ H1 ↓ and D1 ∈ D11 ↓. MAF-item. An atomic item a is said to be a0 Maximal Atomic0 Frequent item, 0 or a MAF-item, if a is frequent and if for every a such that a 4I a , the item a is not frequent. In our example, if we consider minsup = 100%, b = (H1 , D1 ) is a MAF-item, because it is frequent and there is not another item as frequent and more specific than b. The computation of MAF-items is represented by a tree in which the nodes are of the form (d1 , d2 )s , meaning that (d1 , d2 )s is an atomic item with support s as we A FCA-based analysis of sequential care trajectories 375 show in Figure 5d. In this tree, MAF-items are displayed as boxed nodes. We note that all leaves are not necessarily MAF-items. For example, (H2 , D21 )100% is a leaf, but not a MAF-item. This is because (H2 , D21 )100% 4I (H21 , D21 )100% and (H21 , D21 ) has been identified as being an MAF-item. Sequence mining step Frequent sequences can be mined using any standard sequential pattern-mining algo- rithm (PrefixSpan in this work). Since in such algorithms, the dataset to be mined is a set-pairs of the form (id, seq), where id is a sequence identifier and seq is a sequence of itemsets, our example dataset is transformed as follows : – every MAF-item is associated with a unique identifier denoted by ID(a) (table 5e), playing the role of the items in standard algorithms. – every block b is assigned a patient identifier ID(p), playing the role of the sequence identifiers in standard algorithms, – every block b transformed into a pair (ID(b), ζ(b)), where ζ(b) is a sequence. (table 5f) PrefixSpan is run over table 5f. By considering a support threshold minsup =50%, table 5g displays all the frequent sequences in their transformed format as well in their multidimensional format in which identifiers are replaced with their actual values. The basic step in M3SP method is MAF-item generation, because it provides all multidimensional items that occur in sequences to be mined. If the set of MAF-items is changed, the sequence will be changed. M3SP always extracts the most specific multidimensional items. For example (H1 , D1 ) is frequent according to minsup=50%, but another item, (H11 , D11 ) is more specific and still frequent. As a result, (H1 , D1 ) is not a MAF-item and consen- quently not used to build squences. Finaly the frequent sequence h{(H1 , D1 ), (H21 , D21 )}i does not appear in the results of M3SP. 
However, Tables 5 and 6 show the MAF-item set and the frequent sequences extracted by M3SP at a 100% threshold. It can be noticed that (H1, D1) is a MAF-item and that the sequence ⟨{(H1, D1), (H21, D21)}⟩ is generated. Thus, given two minsup thresholds σ′ < σ, the set of frequent sequences obtained for σ′ may not always contain the set of sequences obtained for σ. Considering this a limitation of our approach, as we wanted to extract both general and specific sequences, we iteratively applied M3SP, decreasing the threshold by one patient at each step. This allowed us to extract more potentially interesting sequences than by using a single low minsup threshold.

Table 5. MAF-items, minsup = 100%

MAF-item
(H1, D1)
(H21, D21)

Table 6. Frequent sequences for minsup = 100%

Frequent Multidimensional Sequences
⟨{(H1, D1)}⟩
⟨{(H21, D21)}⟩
⟨{(H1, D1), (H21, D21)}⟩

[Fig. 5. Example of the M3SP method, minsup = 50%: (a) dimension taxonomies (healthcare institutions and diseases); (b) the dataset; (c) block partition of the dataset according to DR = {Patient}; (d) tree of frequent atomic items; (e) MAF-items and their identifiers; (f) the transformed database; (g) the frequent multidimensional sequences.]

Querying Relational Concept Lattices

Z. Azmeh1, M. Huchard1, A. Napoli2, M. Rouane-Hacene3, and P. Valtchev3
1 LIRMM, 161, rue Ada, F-34392 Montpellier Cedex 5
2 LORIA, B.P. 239, F-54506 Vandœuvre-lès-Nancy
3 Dépt. d'informatique, UQÀM, C.P. 8888, Succ. Centre-Ville Montréal, Canada

Abstract. Relational Concept Analysis (RCA) constructs conceptual abstractions from objects described by both their own properties and inter-object links, while dealing with several sorts of objects. RCA produces lattices for each category of objects, and those lattices are connected via relational attributes that are abstractions of the initial links. Navigating such an interrelated lattice family in order to find concepts of interest is not a trivial task, due to the potentially large size of the lattices and the need to move the expert's focus from one lattice to another. In this paper, we investigate the navigation of a concept lattice family based on a query expressed by an expert. The query is defined in the terms of RCA; thus it is either included in the contexts (modifying the lattices when feasible) or directly classified in the concept lattices. Then a navigation schema can be followed to discover solutions. Different navigation possibilities are discussed.

Keywords: Formal Concept Analysis, Relational Concept Analysis, Relational Queries.
1 Introduction Recently [1], we worked on the problem of selecting suitable Web services for instantiating an abstract calculation workflow. This workflow can be seen as a DAG whose nodes are abstract tasks (like book a hotel room) and directed edges are connections between the tasks, which often correspond to a data flow (like connecting reserve a train ticket and book a hotel room: train dates and time- table are transmitted from reserve a train ticket to book a hotel room). The selection is based on quality-of-service (QoS) properties like response time or availability and on the composability quality between services chosen for neigh- bor tasks in the workflow. Besides, we aim at identifying and storing a set of backup services adapted to each task. To be efficient in the replacement of a fail- ing Web service by another, we want to organize each set of backup Web services by a partial order that expresses the quality criteria and helps to choose a good trade-off for instantiating the abstract workflow. Analyzing such multi-relational data is a complex problem, which can be approached by various methods includ- ing querying, visualization, statistics, or rule extraction (data mining). c 2011 by the paper authors. CLA 2011, pp. 377–392. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 378 Zeina Azmeh et al. We proposed an approach based on Relational Concept Analysis (an itera- tive version of Formal Concept Analysis) to solve this problem, because of its multi-relational nature. Web services are filtered and grouped by tasks they may satisfy (e. g. the Web services for booking a hotel room). In formal contexts (one for each task), we associate the Web services and their QoS criteria. For exam- ple, the service HotelsService by lastminutetravel.com would be described by low response time, medium availability (classical scaling is applied to the QoS val- ues). In relational contexts we encode the composability levels in each directed edge of the workflow. Given an edge of the workflow, the composition quality depends on the way output data of the source task cover input data of the end- ing task, and the need for data adaptation. A relational context encodes for example the relation Adaptable-Fully-Composable between services for reserve a train ticket and services for book a hotel room. In this relation TravelService by puturist.com is connected to HotelsService by lastminutetravel.com if output data of TravelService can be used, with a slight adaptation, to fill input data of HotelsService. The concept lattice family we obtain (one Web service lattice for each task of the workflow) makes it possible: (1) to select a Web service for each task based on QoS and composability criteria, (2) to memorize classified alternatives for each task. Due to the nature of our problem, we are interested in classifying indepen- dently the Web services corresponding to the tasks and not classifying the solu- tions. By solution, we mean a set of Web services, each of which can instantiate a task of the workflow. If a particular service fails or is no more available, the goal is to constitute a new working combination out of the old one, with the smallest number of service replacements. To the best of our knowledge, this problem area has not been investigated in-depth prior to our study, especially in the context of Relational Context Analysis [7, 6]. 
Therefore, we believe that it would be use- ful to generalize and report what we learned in our experience. In more general terms, we have multi-relational data and a question which contains variables we want to instantiate, and we aim at: – Finding a specific set of objects that satisfy the query. An answer is composed of objects, each object instantiates one variable; – Classifying, for each variable, the objects depending on the way they satisfy (or not) the query, to find alternative answers. In this paper, we put the problem in a more general framework, which as- sumes an unrestricted relational context family and a query given by an expert. The query can be seen as a DAG, where some nodes are labelled by variables and some others are labelled by objects. The nodes roughly correspond to the formal (object-attribute) contexts and the edges correspond to the relational (object- object) contexts. A set of lattices is built using Relational Concept Analysis and the existential scaling operator. We assume that an expert gives a total ordering of the edges of the DAG. Then an algorithm navigates the lattices following this ordering. This navigation allows us to determine objects that answer the query. Querying Relational Concept Lattices 379 These objects with their position in the lattices are what the expert wants to explore, to extract a solution and store the alternatives. In the following, Section 2 reminds the main principles of Relational Concept Analysis (RCA). Section 3 defines the model of queries in the RCA framework that we consider in this paper. Section 4 presents and discusses an algorithm that navigates the concept lattice family using a query. Related work is presented in Section 5 and we conclude in Section 6. 2 Background on Relational Concept Analysis For FCA, we use the notations of [4]. In RCA [5], the objects are classified not only according to the attributes they share, but also according to the links be- tween them. Let us take the following case study. We consider a list of countries, a list of restaurants, a list of Mexican dishes, a list of ingredients, and finally a list of salsas. We impose some relations between these entities {Country, Restau- rant, MexicanDish, Ingredient, Salsa}, such that: a Country ”has” a Restaurant; a Restaurant ”serves” a MexicanDish; a MexicanDish ”contains” an Ingredient; an Ingredient is ”made-in” a Country; and finally a Salsa is ”suitable-with” a MexicanDish. We express these entities and their relations by the DAG in Fig. 1. We capture an instantiation of this entity-relationship diagram in a relational context family. Fig. 1. The entities of the Mexican food example (left). The query schema (right) Definition 1. A relational context family RCF is a pair (K, R) where K is a set of formal (object-attribute) contexts Ki = (Oi , Ai , Ii ) and R is a set of relational (object-object) contexts rij ⊆ Oi × Oj , where Oi (domain of rij ) and Oj (range of rij ) are the object sets of the contexts Ki and Kj , respectively. The RCF corresponding to our example contains five formal contexts and five relational contexts, illustrated in Table 1 (except the made-in relational context, which is not used in this paper for sake of simplicity). An RCF is used in an iterative process to generate at each step a set of concept lattices. First concept lattices are built using the formal contexts only. Then, in the following steps, a scaling mechanism translates the links between objects into 380 Zeina Azmeh et al. Table 1. 
Relational context family for the Mexican dishes example: five formal contexts, namely Country (objects Canada, England, France, Lebanon, Mexico, Spain and USA, described by the region attributes America, Europe, Asia and the country codes mx, en, us, ca, es, lb, fr), Restaurant (Chili's, Chipotle, El Sombrero, Hard Rock, Mi Casa, Taco Bell and Old el Paso, with attributes r1, ..., r7), MexicanDish (Burritos, Enchiladas, Fajitas, Nachos, Quesadillas and Tacos, with attributes d1, ..., d6), Ingredient (chicken, beef, pork, vegetables, beans, rice, cheese, guacamole, sour-cream, lettuce, corn-tortilla and flour-tortilla, with attributes i1, ..., i12) and Salsa (Fresh Tomato, Roasted Chili-Corn, Tomatillo-Green Chili and Tomatillo-Red Chili, described by mild, medium-hot and hot), together with the relational contexts has (Country × Restaurant), serves (Restaurant × MexicanDish), contains (MexicanDish × Ingredient) and suitable-with (Salsa × MexicanDish).
conventional FCA attributes and derives a collection of lattices whose concepts are linked by relations. For example, the existential scaled relation (that we will use in this paper) captures the following information: if an object os is linked to another object ot, then in the scaled relation this link is encoded in a relational attribute assigned to os. This relational attribute states that os is linked to a concept which clusters ot with other objects. This is used to form new groups, for example the group (see Concept 84) of restaurants which serve at least one dish containing sour cream (such dishes are grouped in Concept 75). The steps are repeated until the lattices are stable (when no more new concepts are generated). For the Mexican dishes, four lattices of the concept lattice family are represented in Figures 3 and 4. The ingredient lattice is presented in Fig. 2.
Definition 2. Let rij ⊆ Oi × Oj be a relational context. The exists scaled relation rij∃ is defined as rij∃ ⊆ Oi × B(Oj, A, I), such that for an object oi and a concept c: (oi, c) ∈ rij∃ ⇐⇒ ∃x, x ∈ o′i ∩ Extent(c). In this definition, A is any set of attributes, possibly including relational attributes, which are defined below.
Definition 3. A relational attribute (s r c) is composed of a scaling operator s (for example exists), a relation r ∈ R, and a concept c. It results from scaling a relation rij ∈ R where rij ⊆ Oi × Oj. It expresses a relation between the objects o ∈ Oi and the concepts of B(Oj, A, I). An existential relational attribute is denoted by ∃rij c where c ∈ B(Oj, A, I). For example, Concept 50 in the Country lattice owns the relational attribute ∃has Concept 60. This expresses that each country in Concept 50 (Canada and USA) has at least one restaurant in the extent of Concept 60 (El Sombrero or Mi Casa).
Fig. 2. The concept lattice for ingredients of the RCF in Table 1 (concept names are reduced to C n).
3 Introducing Relational Queries
In this section, we define the notion of query and answer to a query.
First (section 3.1) we recall simple queries that help navigating concept lattices [7]. Then (section 3.2), we generalize to relational queries that lead the navigation across a concept lattice family. 3.1 Simple queries Definition 4. A query (including its answer) on a context K = (O, A, I), de- noted by q|K (or q when it is not ambiguous), is a pair q = (oq , aq ), such that oq is the query object(s) i.e. the set of objects satisfying the query (or the an- swer set), and aq is the set of attributes defining the constraint of the query. By definition, we have: o0q ⊇ aq , where aq ⊆ A. 382 Zeina Azmeh et al. Fig. 3. Country and restaurant lattices for exists and the RCF in Table 1. For example q|Kcountry = ({England, F rance, Spain}, {Europe}) is a query on the country context (in Table 1), asking for countries in Europe. Another example q|KM exicanDish = ({}, {rice, corn-tortilla}) When aq is closed, solving the query consists in finding the concept C = (a0q , aq ). To ensure that such a concept exists, a virtual query object ovq that satisfies ovq0 = aq can be added to the context (as an additional line). Then, three types of answers can be interesting: the more precise answers are in a0q , less constrained (with less attributes) answers are in extents of super-concepts of C, more constrained (with more attributes) answers are in extents of sub-concepts of C. When aq is not closed, and we don’t use the virtual query object, searching for answers needs to find first the more general concept C whose intent contains aq . Now we will define more generally what we mean by relational queries. Querying Relational Concept Lattices 383 Fig. 4. Dishes and salsa lattices for exists and the RCF in Table 1. 3.2 Relational queries In this study, a relational query is composed of several simple queries, to which we add relational constraints. The relational constraints are expressed via virtual query objects (variables), one for each formal context, where we want to find an object. A virtual query object may have relations (according to the relational contexts) with objects of other contexts, as well as with other virtual query objects. Definition 5. A relational query Q on a relational context family (K, R) is a pair Q = (Aq , Ovq , Rq ), where: 1. Aq is a set of simple queries Aq = {q|Ki = (oq |Ki , aq |Ki ) | q|Ki is a query on Ki ∈ K} 2. There is a one-to-one mapping between Aq and Ovq , where Ovq is the set of virtual query objects. 3. Rq is a set of relational constraints Rq = {(ov q|Ki , rij , Oq )}, where ov q|Ki is the virtual object associated with q|Ki , Oq ⊆ Oj ∪ {ov q|Kj }, with ov q|Kj is the virtual object associated with Kj . 384 Zeina Azmeh et al. For example, let us consider the following query: I am searching for a country with the attribute ”fr”, a restaurant in this country serving Mexican dish contain- ing (chicken, cheese, and corn-tortilla), and a salsa which is ”hot” and suitable with the dish. This query can be translated into a relational query Qexample = (Aq , Ovq , Rq ) as follows: Aq = {qcountry , qrest. , qdish , qsalsa }, aqcountry = {f r}, aqrest. = aqdish = ∅, aqsalsa = {hot}. Ovq = {ov qdish , ov qcountry , ov qrest. , ov qsalsa } Rq = {(ov qdish , contains, {chicken, cheese, corn-tortilla}), (ov qcountry , has, {ov qrest. }), (ov qrest. , serves, {ov qdish }), (ov qsalsa , suitable-with, {ov qdish })}. By definition, a query corresponds to the data model, and must respect the schema of the RCF (see in Fig. 1). 
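To make these notions tangible, here is a small Python sketch, written by us and not taken from the paper: it answers a simple query on a context by set inclusion, and then writes the relational query Qexample as plain data. The small Country context is re-typed from the example and the country codes, and the "q_..." strings standing for virtual query objects are our naming convention.

# A sketch (illustrative only) of answering a simple query q = (o_q, a_q) on a context:
# the most precise answers are the objects o with o' ⊇ a_q.

def answer_simple_query(a_q, context):
    """Return the objects whose attribute sets contain all attributes in a_q."""
    return {o for o, attrs in context.items() if a_q <= attrs}

country = {
    "Canada":  {"America", "ca"},
    "England": {"Europe", "en"},
    "France":  {"Europe", "fr"},
    "Mexico":  {"America", "mx"},
    "Spain":   {"Europe", "es"},
}

print(answer_simple_query({"Europe"}, country))   # England, France, Spain
print(answer_simple_query({"fr"}, country))       # France

# The relational query Q_example of the text (Definition 5), written as plain data;
# the "q_..." strings stand for the virtual query objects.
Q_example = {
    "simple": {"country": {"fr"}, "restaurant": set(), "dish": set(), "salsa": {"hot"}},
    "relational": [
        ("dish", "contains", {"chicken", "cheese", "corn-tortilla"}),
        ("country", "has", {"q_restaurant"}),
        ("restaurant", "serves", {"q_dish"}),
        ("salsa", "suitable-with", {"q_dish"}),
    ],
}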
An answer to the relational query is included in the answers of the simple queries. For our example, the answers of the simple queries would be oqcountry = {F rance}, oqrest. contains all the restaurants, oqdish contains all the dishes, oqsalsa = {T omatillo-Red Chili}. If we consider these objects connected with the relations, this forms what we call the maximal answer graph. In this graph, we are interested in the subgraphs that cover the query (they have at least one object per element of Aq ). These subgraphs are included in the graph of Fig. 5. There are various interesting forms of answer: having exactly one object per element of Aq , or having several objects per element of Aq . Fig. 5. The subgraph containing all the answers with the relations between the objects corresponding to the relational query example. Definition 6. An answer to a relational query Q = (Aq , Ovq , Rq ) is a set of objects X having a unique object per each context that is involved in the query: X =< oi | oi ∈ Oi with 1 ≤ i ≤ n > These objects satisfy the query Q = (Aq , Ovq , Rq ), when they have the requested attributes: ∀ q|Ki ∈ Aq , ∃ oi ∈ X : o0i ⊇ aq|Ki Querying Relational Concept Lattices 385 and they are connected as expected: ∀ (ov q|Ki , r, Oq ) ∈ Rq with r ⊆ Oi × Oj , (and thus : Oq ⊆ Oj ∪ {ov q|Kj }) and ∀ o ∈ Oq , we have : 1. if o ∈ Oj , we have (oi , o) ∈ r 2. if o = ovq|K , we have (oi , oj ) ∈ r with oj ∈ X ∩ Oj j For our example, the set of answers to the relational query, is: {{F rance, El Sombrero, Enchiladas, T omatillo- Red Chili}, {F rance, El Sombrero, Quesadillas, T omatillo-Red Chili}, {F rance, El Sombrero, T acos, T omatillo-Red Chili}, {F rance, Old el P aso, T acos, T omatillo-Red Chili}}. Answers can be provided with an aggregated form which can be found in lattices, as we explain below. They allow us to discover sets of equivalent objects relatively to the answer. E.g. {Enchiladas, Quesadillas, T acos} are equivalent objects if we choose F rance and ElSombrero. Definition 7. An aggregated answer to a query Q = (Aq , Ovq , Rq ) is the set AR containing the sets Si , such that: – there is a one-to-one mapping between AR and Aq which maps each q|Ki to a set Si – ∀ q|Ki ∈ Aq , ∀ oi ∈ Si , o0i ⊇ q|Ki (objects of Si have the requested attributes) – when (ov q|Ki , r, Oq ) ∈ Rq - if ov q|Kj ∈ Oq , r ⊆ Oi × Oj , thus : ∀ oi ∈ Si , ∀ oj ∈ Sj , Sj ∈ AR, we have (oi , oj ) ∈ r (virtual objects are connected if requested) - f or each oj ∈ Oq ∩Oj we have : (oi , oj ) ∈ r (connections with particular objects are satisfied). For example, an aggregated answer for our query is {Scountry , Srest. , Sdish , Ssalsa } = {{F rance}, {ElSombrero}, {Enchiladas, Quesadillas, T acos}, {T omatillo- RedChili}} 4 Navigating a Concept Lattice Family w.r.t. a Query In this section, we explain how the navigation between the concept lattices can be guided by a relational query. Following relational attributes that lead us from one lattice to another, we navigate a graph whose nodes are the concept lattices. In a first subsection, we propose an algorithm which gives a general navigation schema that applies to concept lattices built with the existential scaling. Then we present several variations of this navigation algorithm. 386 Zeina Azmeh et al. 
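Before describing the navigation itself, a small Python sketch shows how Definition 6 can be checked mechanically on a candidate tuple of objects. This is our own illustration: the data layout (one dictionary of attributes per context, one set of pairs per relation) and the tiny excerpt of the Mexican food data are assumptions made for the example.

# A sketch (illustrative only) of Definition 6: a tuple with one object per context
# answers the query if each object has the requested attributes and the requested
# links hold between the chosen objects.

def is_answer(candidate, simple, relational, attributes, links):
    """candidate: context -> chosen object; simple: context -> required attributes;
    relational: (source context, relation, targets), where a target is either a plain
    object or a "q_<context>" placeholder; attributes: context -> object -> attributes;
    links: relation -> set of (source object, target object) pairs."""
    if any(not req <= attributes[k][candidate[k]] for k, req in simple.items()):
        return False
    for src, rel, targets in relational:
        for t in targets:
            tgt = candidate[t[2:]] if t.startswith("q_") else t   # resolve placeholders
            if (candidate[src], tgt) not in links[rel]:
                return False
    return True

# Tiny excerpt of the running example, enough to check one of the listed answers.
attributes = {"country": {"France": {"Europe", "fr"}}, "restaurant": {"El Sombrero": set()},
              "dish": {"Tacos": set()}, "salsa": {"Tomatillo-Red Chili": {"hot"}}}
links = {"has": {("France", "El Sombrero")}, "serves": {("El Sombrero", "Tacos")},
         "contains": {("Tacos", "chicken"), ("Tacos", "cheese"), ("Tacos", "corn-tortilla")},
         "suitable-with": {("Tomatillo-Red Chili", "Tacos")}}
candidate = {"country": "France", "restaurant": "El Sombrero", "dish": "Tacos",
             "salsa": "Tomatillo-Red Chili"}
simple = {"country": {"fr"}, "restaurant": set(), "dish": set(), "salsa": {"hot"}}
relational = [("dish", "contains", {"chicken", "cheese", "corn-tortilla"}),
              ("country", "has", {"q_restaurant"}), ("restaurant", "serves", {"q_dish"}),
              ("salsa", "suitable-with", {"q_dish"})]
print(is_answer(candidate, simple, relational, attributes, links))   # True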
4.1 A query-based navigation algorithm Our approach for navigating the concept lattices along the relational attributes is based on the observations made during an experimental use of RCA, for finding the appropriate Web services to implement an abstract calculation workflow [1]. We consider an RCF and a query that respects the RCF relations. From our experience, we observed that an expert often expresses his query by a phrase, where the chronology of the principal verbs (relations) gives a natural path for the query flow. This will be our hypothesis. Let us consider the query previously specified: I am searching for a country, with the basic attribute ”fr”, that has a restaurant which serves dishes containing chicken, cheese and corn-tortilla; I am searching for a hot salsa suitable with this dish. In order to simplify the notation, we use the same notation for queries q|ki and the virtual objects ov q|Ki . The query path is a total ordering of the arcs of the query (the query itself is a DAG in general). For our example, the path is the total ordering for Rq given by {(qcountry , has, {qrestaurant }), (qrestaurant , serves, {qdish }), (qdish , contains, {chicken, cheese, corn-tortilla}), (qsalsa , suitable-with, {qdish })}. Each arc cor- responds to a relation used in the query. All the relations involved inside a query are covered by this path. This translation of the expert query determines a composition on the relations. The query path does not always correspond to a directed chain in the object graph (e.g. dishes are the end of two of the considered relations (serves and suitable-with)). We propose the algorithms 1 to 3 (an additional procedure is needed which combines two others) for navigating through a concept lattice family using queries. During the exploration, we fill a set X by objects that will constitute an answer at the end (at most one object for each formal context). In this section, the algorithm is presented as an automatic procedure. Its use to guide an expert in its manual exploration of the data is discussed afterwards. Algorithm 1 identifies three main cases: – line 2, the arc connects two query objects, e.g. (qcountry , has, {qrestaurant }); – line 5, the arc connects a query object to original objects e.g. (qdish , contains, {chicken, cheese, corn-tortilla}); – line 8, the arc connects a query object to another query object and to original objects e.g. (qdish , contains, {qingredient , chicken, cheese, corn-tortilla}). Each of these cases considers, for a given arc a, whether the partial answer X already contains a source object or (inclusively) a target object. When the arc connects a query object to another query object a = (q|Ks , rst , q|Kt ), (Algorithm 2), four cases are possible. – X does not contain any object for Ks and any ot for Kt : we identify the highest concept that introduces the attributes of q|Ks and we select an object in its extent (lines 3-5). Then the algorithm continues on the next conditional statement (to find a target). Querying Relational Concept Lattices 387 – X contains an object os for Ks and an object ot for Kt selected in previous steps: we just check if os owns the relational attribute pointing at the object concept introducing ot , that is γot (line 8)1 . – X contains only an object os for Ks . We should find a target. We identify, under the meet of the concepts that introduce the attributes of q|Kt , one of the lowest concepts to which os points (lines 12-14). We select a target in its extent. 
– X contains only an object ot for Kt . We should find a source. We identify the meet of the concepts that introduce the attributes of q|Ks and the relational attribute that points to ot (lines 20-23). We select a source in its extent. When the arc connects a query object to original objects a = (q|Ks , rst , Oq ) (Algorithm 3): – Either X contains an object for Ks and we need to check if the relational attributes confirm that this object is connected to all the original objects in Oq ) (line 4); – Or we have to select an object for Ks , owning the attributes of the query q|Ks and owning the relational attributes ending in the concepts introducing the original objects (line 9-11). The algorithm for the last case is a combination of the algorithms for the two other cases. Note that whenever a condition is not verified, we have to backtrack, this is not specified in the algorithm for sake of simplicity. If the query path forms also a directed chain in the entity-relationship diagram, the main algorithm is a depth-first search. But in the general case, in some steps, when we consider an arc, we assigned to X an object for the end of the arc, and we need to find a source object. For example, we start with the arc (qcountry , has, {qrestaurant }) where the query path begins. We have to identify a source object os satisfying the query {f r} (Definition 4). For example, we choose the object France appearing the extent of Concept4 , whose intent contains fr. We extract the relational attributes of os = F rance, having the form ∃rst C). They are in practice in the lattices denoted by r : C. For example, we obtain has:Concept 19, has:Concept 15, has:Concept 60, has:Concept 16, etc. We keep the relational attributes with the concepts satisfying the target query in the corresponding lattice and discard the rest. In our example, the qrestaurant is empty. A relational attribute with the smallest concept (Ct ) is the one to consider that leads us to find a solution. We choose Concept 15 among the available smallest concepts. Let ∃ rst Ct be the selected relational attribute (if it exists). The object ot must be in the extent of Ct . In our example, we select El Sombrero. Then we consider the query-to-query arc (qrestaurant , serves, {qdish }). Given that an object is selected for Krestaurant , we look for a possible target object, led by the query qdish = ∅ and the relational attributes owned by the object 1 We remind that γo is the object concept introduced by o. 388 Zeina Azmeh et al. concept Concept 15 which introduces El Sombrero. Suppose we choose (line 13) a relational attribute that targets one of the minimum concepts, namely serves : Concept 23 (but serves : Concept 26 or serves : Concept 25 are also possible). This leads us to Concept 23, in the extent of which we select Enchiladas. Dealing with the next arc (qdish , contains, {chicken, cheese, corn-tortilla}) involves, since we have already selected a dish, to verify (Algorithm 3, line 4) that, the object concept γEnchiladas owns all the relational attributes that go to object concepts introducing chicken, cheese, and corn-tortilla. These are contains : γ chicken = Concept 29, contains : γ cheese = Concept 36 and contains : γ corn − tortilla = Concept 40 and they are indeed inherited by γEnchiladas = Concept 23. When the arc (qsalsa , suitable-with, {qdish }) is considered, the target (Enchi- ladas) is in X. Thus we identify a source in the extent of the Concept 47, which satisfies the target query {hot}. 
Its intent contains suitable-with : Concept 23, which is Enchiladas. A target object (Tomatillo-Red Chili) is selected in the extent of Concept 47. The answer is now complete.

Algorithm 1: Navigate(RCF, Q, PQ)
Data: (K, R): an RCF; Q = (Aq, Ovq, Rq): a query on (K, R); and a query path PQ = (ak) with ak = rij and rij ∈ Rq
Result: X: an object set (answer for Q) or fail
1  foreach arc a ∈ PQ do
2      if a = (q|Ks, rst, q|Kt) then
3          Case pure query
4      else
5          if a = (q|Ks, rst, Oq) with Oq ⊆ Ot then
6              Case pure objects
7          else
8              if a = (q|Ks, rst, Oq) with q|Kt ∈ Oq then
9                  Case query and objects

Algorithm 2: Case pure query
1   Let a = (q|Ks, rst, q|Kt)
2   if X ∩ Os = ∅ and X ∩ Ot = ∅ then
        // X does not contain a source and a target for the current arc a:
        // select a source in the extent of a concept that verifies the source query
3       Let Cs be the highest concept having Intent(Cs) ⊇ q|Ks
4       select os ∈ Extent(Cs)
5       X ← X ∪ {os}
6   if X ∩ Os = {os} and X ∩ Ot = {ot} then
7       // X contains a source and a target for the current arc a:
        // verify that the source is connected to the target
8       check ∃rst γot ∈ Intent(γos)
9   else
10      if X ∩ Os = {os} then
11          // X contains a source for the current arc a: select a target in the extent
            // of a concept that verifies the target query and is connected to the source
12          Let Ct be the highest concept having Intent(Ct) ⊇ q|Kt
13              and Ct ∈ min{ C | (∃rst C) ∈ Intent(γos) }
14          select ot ∈ Extent(Ct)
15          X ← X ∪ {ot}
16      else
17          // X contains a target for the current arc a:
18          // select a source in the extent of a concept that verifies the source query
19          // and is connected to the target
20          Let ot ∈ X ∩ Ot
21          Let Cs be the highest concept having Intent(Cs) ⊇ q|Ks
22              and ∃rst γot ∈ Intent(Cs)
23          select os ∈ Extent(Cs)
24          X ← X ∪ {os}

Algorithm 3: Case pure objects
1   Let a = (q|Ks, rst, Oq) with Oq ⊆ Ot
2   if X ∩ Os = {os} then
3       // X contains a source for the current arc a: verify that the source is connected to the objects in Oq
4       check ∀o ∈ Oq, ∃rst γo ∈ Intent(γos)
5   else
6       // X does not contain a possible source:
7       // select a source in the extent of a concept that verifies the source query
8       // and is connected to the target objects
9       Let Cs be the highest concept having Intent(Cs) ⊇ q|Ks
10          and ∀o ∈ Oq, ∃rst γo ∈ Intent(Cs)
11      select os ∈ Extent(Cs)
12      X ← X ∪ {os}

4.2 Variations about the algorithm
Integrating queries into the contexts. One approach that was investigated in the case of simple queries consists of integrating the virtual query object in the context and then building the concept lattice. This can also be done for relational queries. A relational query Q = (Aq, Ovq, Rq) can be integrated into an RCF by adding the virtual query objects ovq|Ki into the contexts Ki. Each virtual query object ovq|Ki owns the attributes aq|Ki of the query and, for each arc (ovq|Ki, rij, ovq|Kj), the relational context of rij is enriched by a line for ovq|Ki, a column for ovq|Kj and the relation (ovq|Ki, ovq|Kj)2. We generate the corresponding concept lattice family, considering the existential scaling3. Locating the highest concept that introduces all the attributes of each query, in each concerned context, is now much easier because that concept introduces the virtual query object. Then, we can navigate in a similar way as before.
Opportunities of browsing offered by the exploration.
As we explained before, the algorithm described in the previous section can be understood as an automatic procedure to determine a solution to a query. Nevertheless, it is more interesting to use it as a guiding method for the exploration of data by a human expert. Each object selection is a departure point for inspecting the objects of the selected concept, and, explore the neighborhood, going up by relaxing constraints or going down by adding constraints. A point in favor of the lattices is that they do not only give us a solution, but they also classify the objects of the solutions and provide a navigation structure. They also carry other information about the objects which can be useful for the expert: attributes that objects of the answer set have necessarily, attributes that appear simultaneously as attributes of the answer, etc. In our Web service application, we preferred the solution which integrates the query in RCF because it was easier to identify the answers. The lattices show how the existing objects match and differ from the query, thanks to the factorization of attributes between the query and the existing objects. Nevertheless, having several queries at the same time would not be efficient. Thus, the solution has been used only for specific problems. An incremental algorithm can be used to introduce the query, which enlightens the process of modifying the lattice and highlights the structure of the data. We can keep the original lattice (before query integration), and save the query objects together with the resulting concepts in an auxiliary structure. This way, we can always easily go back to the original lattices. 5 Related Work ER-compatible data, e.g., relational databases, and concept lattices have a long history of collaboration. First attempts to apply FCA to that sort of data go back to the introduction of concept graphs by R. Wille in the mid-90s [8]. The standard approach is rooted in the translation of an ER model into a power- context family (PCF) where basically everything is represented within a formal context [9]. Thus, inter-object links of various arities (i.e., tuples of different sizes) are reified and hence become formal objects of a dedicated context (one per arity). The overall reasoning is therefore uniformly based on the formal concepts. 2 See our example in Table http://www.lirmm.fr/~huchard/RCA_queries/ mexicoExistsWithQuery.rcft.html) 3 It is represented in Figure http://www.lirmm.fr/~huchard/RCA_queries/ mexicoExistsWithQuery.rcft.svg Querying Relational Concept Lattices 391 While this brings an undeniable mathematical strength in the formalization of the data processing and, in particular, querying, there are some issues with the expressiveness. Indeed, while formal concepts are typically based on a doubly universal quantification, the relational query languages mostly apply existential one. Alternatives to the PCF in the interpretation of concept graphs have been proposed that involve the notions of nested graphs and cuts [2]. It was shown that the resulting formalism, called Nested Query Graphs, have the same expressive power over relational data as first order predicate logic and hence can be used as a visual representation of most mainstream SQL queries. Existing approaches outside the concept graphs-based paradigm (see [3, 6]) follow a more conventional coding schema. 
Here inter-object links are modeled either through a particular sort of formal attributes or they reside in a differ- ent binary tables that match two sorts of individuals among them (instead of matching a set of individuals against a set of properties). Our own relational concept analysis framework is akin to this second category of approaches, hence our querying mechanisms are closer in spirit to those presented in the aforemen- tioned papers. For instance, in [3], the author proposes a language modeled w.r.t. SPARQL (the query language associated with the RDF language) to query relational data within the logical concept analysis (LCA) framework. The idea is to explore the relation structure of the data, starting from a single object and following its links to other objects. The language admits advanced constructs such as negation and disjunction and therefore qualifies as a fully-fledged relational query language. Recently, a less expressive language has been proposed in [6] for the brows- ing of a relational database content while taking advantage of the underlying conceptual structure. As the author himself admits, the underlying data for- mat used to ground the language semantics, the linked context family, is only slightly different from our own relational context family construct. The queries are limited here to conjunctions and existential quantifiers, yet variables are ad- mitted. Consequently, query topologies are akin to general graphs: In actuality, the browsing engine comprises a factorization mechanism enabling the discovery of identical extensions in the query graph which are subsequently merged. The downside of remaining free of the extensive commitments made by the concept graphs formalism both in terms of syntax and of semantics is the lack of unified methodological and mathematical framework beneath this second group of approaches. As a result, these diverge on a wide range of aspects which makes their in-depth comparison a hard task. First, there is an obvious query language expressiveness gap: On that axis, the two extremes are occupied by the LCA- and the RCA-based approaches, respec- tively, the former being the most expressive and the latter, the less expressive one. Then, the role played by the concept lattices vs. the query resolution is specific in each case. While in the LCA-based approach the concepts seem to be formed on the fly, in [6] the author seems to imply that they are constructed beforehand. Despite this distinction, in both cases the concept lattice is a sec- 392 Zeina Azmeh et al. ondary structure that supports query resolution. In our own approach however, lattices are not only constructed prior to querying, but they also incorporate relational information in the intents of their concepts. In this sense, they are the primary structures whereas the queries are intended as navigational support. 6 Conclusion In this paper, we have presented a query-based navigation approach that helps an expert to explore a concept lattice family. The approach was based on an ap- plication of Relational Concept Analysis to the selection of suitable Web services for instantiating an abstract service composition. There are many perspectives of this work. In our Web service experience, we tested other scaling operators (like the covers operator) that offers other results, and helps to find more easily the aggregate answers. The query language can be made more expressive (including quantifiers). 
For example, we can request dishes containing only {chicken, cheese, ...}, which means that the universal scaling operator shall be used in the RCA process for this particular relation. Besides, the query path can be calculated, rather than being defined by the expert, suggesting more efficient exploration paths. References 1. Azmeh, Z., Driss, M., Hamoui, F., Huchard, M., Moha, N., Tibermacine, C.: Selec- tion of composable web services driven by user requirements. In: ICWS. pp. 395–402. IEEE Computer Society (2011) 2. Dau, F., Correia, J.H.: Nested concept graphs: Applications for databases and math- ematical foundations. In: Contribution to ICCS 2003. Skaker Verlag (2003) 3. Ferré, S.: Conceptual navigation in RDF graphs with SPARQL-Like Queries. In: Kwuida, L., Sertkaya, B. (eds.) ICFCA. LNCS, vol. 5986, pp. 193–208. Springer (2010) 4. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical Foundations. Sprin- ger-Verlag (1999) 5. Huchard, M., Rouane-Hacene, M., Roume, C., Valtchev, P.: Relational concept dis- covery in structured datasets. Ann. Math. Artif. Intell. 49(1-4), 39–76 (2007) 6. Kötters, J.: Object configuration browsing in relational databases. In: Valtchev, P., Jäschke, R. (eds.) ICFCA. Lecture Notes in Computer Science, vol. 6628, pp. 151–166. Springer (2011) 7. Messai, N., Devignes, M.D., Napoli, A., Smaı̈l-Tabbone, M.: Querying a bioin- formatic data sources registry with concept lattices. In: Dau, F., Mugnier, M.L., Stumme, G. (eds.) ICCS. LNCS, vol. 3596, pp. 323–336. Springer (2005) 8. Wille, R.: Conceptual graphs and formal concept analysis. In: Lukose, D., Delugach, H.S., Keeler, M., Searle, L., Sowa, J.F. (eds.) ICCS. Lecture Notes in Computer Science, vol. 1257, pp. 290–303. Springer (1997) 9. Wille, R.: Formal concept analysis and contextual logic. In: Hitzler, P., Scharfe, H. (eds.) Conceptual Structures in Practice. pp. 137–173. Chapman and Hall/CRC (2009) Links between modular decomposition of concept lattice and bimodular decomposition of a context Alain Gély LITA, Ile du Saulcy, 57045 Metz Cedex 1 Université de Metz, France gely@univ-metz.fr Abstract. This paper is a preliminary attempt to study how modular and bimodular decomposition, used in graph theory, can be used on contexts and concept lattices in formal concept analysis (FCA). In a graph, a module is a set of vertices defined in term of behaviour with respect to the outside of the module: All vertices in the module act with no distinction and can be replaced by a unique vertex, which is a representation of the module. This definition may be applied to concepts of lattices, with slighty modifications (using order relation instead of adjacency). One can note that modular decomposition is not well suited for bipar- tite graphs. For example, every bipartite graph corresponding to a clar- ified context is trivially prime (not decomposable w.r.t modules). In [4], authors have introduced a decomposition dedicaced to bipartite graph, called the bimodular decomposition. In this paper, we show how modu- lar decomposition of lattices and bimodular decomposition of contexts interact. These results may be used to improve readability of a Hasse diagram. 1 Introduction Concept lattices are well suited to deal with knowledge representation and clas- sification, but when the number of concepts grows, it is not very convenient to visualize the Hasse diagram. 
To avoid this problem, some approaches keep only part of the concepts (Iceberg lattice [9] , usage of Galois sub-hierarchy [3], concepts with high stability score [7, 8] or any combinaison of these techniques); Others approaches try to obtain a more readable lattice by usage of a threeshold, as for α-galois lattices [10]. Another solution is to use decompositions to improve readability (see all chapter 4 of [5] and particularly nested diagrams). There is a lot of works in graph theory about decomposition of a graph. A classical and well studied decomposition is the modular decomposition (see for example [6]). This decomposition has great properties: possibility of replacing a set of vertices by a single representant, so that visualization of the graph is better understandable; recursive approach, so that one can go from generalities to finer c 2011 by the paper authors. CLA 2011, pp. 393–403. Copying permitted only for private and academic purposes. Volume published and copyrighted by its editors. Local Proceedings in ISBN 978–2–905267–78–8, Inria NGE/LORIA, Nancy, France. 394 Alain Gély detail levels (useful for knowledge representation); nice theoretical properties, as the existence of a decomposition tree or closure properties for the family of modules. Modular decomposition of graphs may be adapted to lattices with only little changes: In a graph, this is the adjacency relation which is fundamental, but it is the order relation for lattices. Moreover, concept lattices are usually computed from a context, which can be considerated as a bipartite graph. So, there are two structures which can be decomposed: the concept lattice and the bipartite graph. Unfortunately, bipartite graphs are not good candidates for modular decom- position: except for twin vertices (vertices with the same neighbourghood) or con- nected components, there are no modules (except trivial one’s) in such graphs. To improve the decomposition of bipartite graph, the notion of bimodule is in- troduced in [4]. Goal of this paper is to study how bimodules of a bipartite graph interact with modules of the concept lattice of this context, and to see how it can be used to help the visualisation of information contained in lattices. The next section is dedicaced to definitions. Section 2.2 introduces modules of a graph and transposes the definition to lattice (modules of a lattice). Section 3 is about bimodules of a bipartite graph (context) and the links that exist with the corresponding concept lattice. After some discussion in section 4, we conclude the paper in section 5. 2 Preliminaries 2.1 Definitions In this paper, all discrete structures are finites and all graphs are simples (no loops neither multi-edges). Since this paper is about usages of graph theory results, a formal context will be considerated as a bipartite graph B = (O, A, I) with O (objects) and A (attributes) being two stable sets of vertices, and I (incidence relation between objects and attributes) the set of edges of B. For a vertex v, v ′ denotes the neighbourghood of v (vertices adjacents to v). For a subset V of vertices, V ′ denotes the common neighbourhood (vertices which are adjacent to every vertices of V ). With this notation, the classical definition of galois connections follows immediately. Definition 1 (Galois connections). For a set X ⊂ O, Y ⊆ A we define X ′ = {y ∈ O | xIy for all x ∈ X}, Y ′ = {x ∈ A | xIy for all y ∈ Y }. A clarified context is a context such that x′ = y ′ implie x = y for any vertices of O ∪ A. 
A clarified context is reduced if no vertex v is such that v ′ = V ′ with V ⊆ O ∪ A, v ̸∈ V . Links between modular decomp. of conc. lat. and bimodular decomp. of a 395 context A complete lattice L = (P, ≤, ∨, ∧) is a poset such that for all X ⊆ P , there exist a supremum and an infimum in P . j ∈ P is ∨-irreducible element if x∨y = j implies x = j or y = j. m ∈ P is a ∧-irreducible element if x ∧ y = m implies x = m or y = m. j covers a unique element j∗ (j∗ ≺ j), m is covered by a unique element m∗ (m ≺ m∗ ). We denote J the set of ∨-irreducible elements and M the set of ∧-irreducible elements. For a formal context C = (O, A, I) a formal concept is a pair (X, Y ), X ⊆ O, X ⊆ A and X ′ = Y and Y ′ = X. X is called the extent of the concept and Y is called the intent. The set of formal concepts ordered by inclusion on the intents is the concept lattice of C. For every finite lattice L = (P, ≤, ∨, ∧) there is, up to isomorphism, a unique reduced context C = (J, M, ≤). In the following of this paper, we will consider only reduced contexts, i.e. contexts such that O = J, the set of ∨-irreducible elements and A = M the set of ∧-irreducible elements of L. 2.2 Modules of graphs and lattices We denote a graph G with G = (V, E). V is the set of vertices and E a set of edges. Let X ⊂ V and s ∈ V \X. Then s distinguishes X if s′ ∩ X ̸= ∅ and s′ ∩ X ̸= X. That is, s is adjacent with some vertices of X and not adjacent with some others vertices of X. So, if no vertex distinguishes a set X, then for the outside of X and relation of adjacency, every vertex is similar and X can be viewed as a unique vertex. Definition 2 (Module, graph theory). A module in a graph is a subset of vertices that no vertex distinguishes. The graph which is obtained by the replacement of a module by a single vertex is called a quotient graph. It is a simplification of the original one (see Fig. 1). As no vertex distinguishes X (elements in dashed line), there exist only two possibilities for a vertex v not in X: either v is adjacent to every vertex of X (then there exists an edge between v and the representant of X) or v is adjacent with no vertex of X (then, there is no edge between v and the representant of X). For a graph G = (V, E), the set V and singletons x ∈ V are trivial modules. A graph without non trivial module is called a prime graph (for the modular decomposition). Two modules A and B overlap if no one is a subset of the other and A ∩ B ̸= ∅. A module which does not overlap another module is a strong module. Modules and strong modules are central in several decomposition processes and their properties have been well studied. In the first definitions, modules where defined with respect to the adjacency relation, but decompositions have been generalized (for example in [1]) for others properties of graphs. For a lattice, it is more natural to consider the order relation than an adja- cency relation, so a natural definition follows immediately: 396 Alain Gély (a) (b) Fig. 1. (a) A module in a graph and (b) the quotient graph Definition 3. For a lattice L = (P, ≤, ∨, ∧), a lattice module is a set of elements X ⊆ P such that, for every y ∈ P \X, one of the three following statements is true: – ∀x ∈ X, x < y; – or ∀x ∈ X, x > y; – or ∀x ∈ X, x||y. It is clear with this definition that a module in a lattice L is equivalent to a module (with respect to adjacency) in the graph obtained by transitive closure of the Hasse Diagram of L. ⊤ f g h M1 M2 a b c d e ⊥ (a) (b) Fig. 2. 
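Before moving from lattices to contexts, Definition 3 can be checked mechanically. The following Python sketch is ours, not the author's: the order relation is assumed to be given as a set of (x, y) pairs with x ≤ y, and the five-element lattice used as input is a hypothetical example.

# A sketch (illustrative only) of Definition 3: X is a lattice module if every element
# outside X is below all of X, above all of X, or incomparable to all of X.

def is_lattice_module(X, P, leq):
    """leq is the set of pairs (x, y) meaning x <= y (assumed reflexive and transitive)."""
    below = lambda a, b: (a, b) in leq
    for y in P - X:
        if all(below(x, y) for x in X):                              # y is above every element of X
            continue
        if all(below(y, x) for x in X):                              # y is below every element of X
            continue
        if all(not below(x, y) and not below(y, x) for x in X):      # y is incomparable to all of X
            continue
        return False                                                 # y distinguishes X
    return True

# Hypothetical lattice: bot < a, b1, b2 < top, with a, b1, b2 pairwise incomparable.
P = {"bot", "a", "b1", "b2", "top"}
leq = {(x, x) for x in P} | {("bot", x) for x in P} | {(x, "top") for x in P}
print(is_lattice_module({"b1", "b2"}, P, leq))    # True: no outside element distinguishes {b1, b2}
print(is_lattice_module({"bot", "b1"}, P, leq))   # False: a is above bot but incomparable to b1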
Two strong modules of lattice (a) and the quotient lattice (b). Since no vertex outside the module distinguishes vertices inside the module, it can be collapsed to a single vertex which is the representant of the module. Note that M2 can be recursively decomposed in two other modules {h} (trivial) and {d, e}. Let X ⊆ P be a subset of elements of a lattice L, with A = min(X) and B = max(X) the sets of minimal (resp. maximal) elements of X. X is a convex set iff for all y ∈ P such that a < y < b, a ∈ A, b ∈ B, then y ∈ X. If A and B are reduced to singletons, X is an interval. [A, B] denotes the convex set defined by the two sets A and B. Links between modular decomp. of conc. lat. and bimodular decomp. of a 397 context Lemma 1. Modules in lattices are convex sets. Proof. Suppose it is not, then there exists y ∈ P \X with a < y < b and so, y distinguishes a and b. It follows that X is not a module. From now, since lattices modules are convex sets, we will use the notation X = [A, B] to speak of a module X. Lemma 2. For a lattice module [A, B]: 1. if |A| > 1 then A ⊆ J, 2. if |B| > 1 then B ⊆ M . Proof. Suppose |A| > 1, and let A = a1 , a2 , . . . , an . Suppose ai ̸∈ J, then, since ai ||aj there exists at least one ∨-irreducible element j such that j < ai and j ̸< aj , which is a contradiction with the fact that [A, B] is a module. Dually proof applies for elements of B. Note that when |A| = 1 (dually for B) the maximal element of the module is not necessary an irreducible one (See Fig. 3). M1 M6 b b M2 c d M5 a a M3 M4 (a) (b) Fig. 3. (a) Module M2 is a convex set [A, B], with A = {a} ̸⊂ J and B = {b} ̸⊂ M . M1 , M2 and M3 overlap: there are not strong modules. (b) M4 , M5 and M6 are strong modules but are not intervals. M5 = [A, B], with A = {b, c} ⊂ J and B = {b, c} ⊂ M . Lemma 3. For a lattice module [A, B]: ∧ 1. if |A| > 2, ∨ A = ai ∧ aj for all ai , aj ∈ A. 2. if |B| > 2, B = bi ∨ bj for all bi , bj ∈ B. Proof. Clearly, suppose |A| > 2 and there exist ai , aj , ak ∈ A such that x1 = ai ∧ aj ̸= aj ∧ ak = x2 . w.l.o.g suppose x1 ̸< x2 . Then x1 < aj and x1 ̸< ak . It follows that x1 distinguishes [A, B]. 398 Alain Gély 3 Modules of lattices and bimodules of contexts As a preliminary remark, we recall that all considered contexts are reduced, and so, clarified. The clarification of a context is the fact to keep only one object o for all objects oi such that o′i = o′j (dually for attribute). It is clear that the set {o1 , . . . , on } is a module in the bipartite graph and this process is equivalent to replace twin vertices by a representant. Modules are not well situed for bipartite graphs. Twin vertices and connected components are the only modules for these graphs which are poorly decompos- able. In goal to improve the decomposition, Fouquet and all have introduced bimodule, an analog of module for bipartite graphs. Definition 4 (Bimodule). Let C = (O, A, I) be a bipartite graph, and (X, Y ) ⊂ (O, A), then (X, Y ) is a bimodule if no x ∈ O\X distinguishes A and no y ∈ A\Y distinguishes O. Example of bimodule is given in Fig. 4: b and c are not distinguished with respect to vertices 4 (none of them are adjacent) or 3 (each of them is adjacent). Similarly, 1 and 2 are not distinguished by a (each of them is adjacent) and d (none of them is adjacent). 1 2 3 4 a b c d Fig. 4. Example of bimodules The whole bipartite (O, A), all vertices and pairs (j, m), j ∈ J, m ∈ M are trivial modules. In the following, we consider only non trivial bimodules, i.e bimodules with at least 3 elements. 
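Definition 4 admits an equally direct test. The Python sketch below is our own illustration; the incidence relation of the small bipartite graph is re-typed from the description of Fig. 4 and from the concepts listed in Fig. 5, so it should be read as an assumption rather than as the paper's data file.

# A sketch (illustrative only) of Definition 4: (X, Y) is a bimodule of the bipartite
# graph (O, A, I) if no object outside X distinguishes Y and no attribute outside Y
# distinguishes X.

def distinguishes(v, S, I, v_is_object):
    """v distinguishes S if it is linked to some, but not all, elements of S."""
    linked = {s for s in S if ((v, s) in I if v_is_object else (s, v) in I)}
    return 0 < len(linked) < len(S)

def is_bimodule(X, Y, O, A, I):
    return (not any(distinguishes(o, Y, I, True) for o in O - X)
            and not any(distinguishes(a, X, I, False) for a in A - Y))

# Context of Fig. 4 (objects 1-4, attributes a-d), reconstructed from the text.
O = {1, 2, 3, 4}
A = {"a", "b", "c", "d"}
I = {(1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "b"), (3, "c"), (3, "d"), (4, "d")}
print(is_bimodule({1, 2}, {"b", "c"}, O, A, I))   # True: the bimodule discussed above
print(is_bimodule({1, 3}, {"b"}, O, A, I))        # False: attribute a distinguishes {1, 3}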
Proposition 1. To any non trivial module X of lattice corresponds a bimodule of reduced context. Proof. By definition of a lattice module, no elements inside the modules are dis- tinguished by elements outside. It follows directly that no ∨-irreducible element outside the module distinguishes ∧-irreducible elements inside, and conversely. Now, we want to know how a bimodule on the context may be interpretated in the concept lattice. First, we define a set in the lattice L from a bimodule X. From a bimodule X = (J1 , M1 ) ⊆ (J, M ), we build a subset C of concepts in L such that: Links between modular decomp. of conc. lat. and bimodular decomp. of a 399 context – attributes concepts of X are in C, – objects concepts of X are in C, – C = [A, B] is a convex set, with A being maximal elements of C and B being minimal elements of C. As previously seen, a lattice module corresponds to a context bimodule but, with the previous construction, the converse may be false: There exist bimodules of a context such that [A, B] does not correspond to a module in the lattice. As an example, Fig. 5 shows the lattice for the bipartite graph in Fig. 4. The set [A, B] is bounded by a dashed line. Nevertheless, we can observe that, even if [A, B] is not a module, there exists a possibility of simplification, replacing a set of elements by two vertices and an edge. (abcd, ∅) (abcd, ∅) (ab, 1) (ac, 2) (bcd, 3) 12 (bcd, 3) (a, 12) (b, 13) (c, 23) (d, 34) (a, 12) bc (d, 34) (∅, 1234) (∅, 1234) Fig. 5. (a) lattice for bipartite graph in Fig. 4.a and (b) simplified lattice In the following, we show that when [A, B] is not a module, the shape of the set [A, B] is very constrained. Proposition 2. Let [A, B] be a convex set built from a non trivial bimodule X. If [A, B] is not a module, then 1. | |[A, B] ∩ M | − |[A, B] ∩ J| | ≤ 1 2. if |[A, B] ∩ M | = |[A, B] ∩ J|, then∨A ⊆ J and B ⊆ M 3. if |max([A, B] ∩ J)| > 1, a+ = ai , ai ∈ max([A, B] ∩ J) is such that ai ≺ a+ ∧ 4. if |min([A, B] ∩ M )| > 1, b− = bi , bi ∈ min([A, B] ∩ M ) is such that b− ≺ bi Proof. First, we show that | |[A, B] ∩ M | − |[A, B] ∩ J| | ≤ 1: Suppose that [A, B] is not a module of the concept lattice, then there exists an element x ̸∈ [A, B] which distinguishes [A, B]. Without lost of generality, we can consider that x ∈ J and there exist y ∈ [A, B] such that x < y. We denote P1 400 Alain Gély the set of elements of [A, B] which are greater than x and P2 = [A, B]\P1 . Since x is a ∨-irreducible element, it does not distinguish any ∧-irreducible elements in [A, B]. It follows that all ∧-irreducible elements in [A, B] are in P1 and none of them in P2 (or conversely). In a finite lattice, every element e is ∧-dense, i.e. equal to the infimum of ∧-irreducible elements greater than e. All ∨-irreducible elements in P2 cannot be distinguished by ∧-irreducible elements outside [A, ∧B]. One unique ∨-irreducible element jmax of P2 may be defined by jmax = mi , . . . , mj , with mi , . . . , mj ̸∈ P1 . All other ∨-irreducible elements in P2 are distinguished by ∧-irreducible elements in P1 (and only by these elements). So P2 ∩ J = X ∪ {jmax } (jmax may not exist). Suppose |X| < |[A, B] ∩ M |, then there exist m1 , m2 ∈ [A, B] ∩ M , j1 ∈ X such that j1 < m1 , j1 < m2 , m1 ||m2 . It follows that j1 < m1 ∧ m2 , which is impossible since j1 ∈ P2 and elements in P2 are not comparable to x. Similarly, suppose |X| > |[A, B] ∩ M |, At least one ∨-irreducible element j of X is smaller than two ∧-irreducible elements m1 and m2 of P1 , with m1 ||m2 . 
This is impossible, so |X| = |[A, B] ∩ M | and | |[A, B] ∩ M | − |[A, B] ∩ J| | ≤ 1. A ⊆ J and B ⊆ M follow directly of the fact that, by construction A and B contain irreducible elements and for each ∧-irreducible element m ∈ [A, B], there exists a ∨-irreducible element j ∈ [A, B] such that j < m. It remains ∨ to prove that, when max([A, B]∩J) contains at least two elements, a+ = ai , ai ∈ max([A, B]∩J) is such that ai ≺ a+ (and dually for b− ). Suppose it is not the case, then exist at least two elements x1 and x2 smaller than a+ and such that x1 and x2 distinguish elements in A. It follows that one can find a ∧-irreducible element which distinguishes ∨-irreducible elements in A and that is a contradiction. It follows from this proposition that even if a set [A, B] is not a module, it can be collapsed into two vertices j and m such that j < m (but maybe not j ≺ m). j is a representant for the set [A, B] ∩ J and m a representant for the set [A, B] ∩ M . Moreover, j ≺ a+ and b+ ≺ m. 4 Discussion 4.1 Algorithmic Aspects It is known that the family of modules of a graph (and so, of a lattice) and the family of bimodules of a bipartite graph are closed by intersection. Since the whole graph is a (trivial) module, it defines a lattice. So, for any set S of vertices, it is possible to use a closure operator to compute the smallest module which contains S. Algorithm 1 adds all vertices which distinguish respectively X and Y and the same process is repeated until no more vertex can be added. Usually, bimodules decomposition does not produce all possible modules, but an inclusion tree such that all possible bimodules can be deduced from this tree. The root represents the whole graph and the leaves are vertices (trivial Links between modular decomp. of conc. lat. and bimodular decomp. of a 401 context Input: (O, A, I) a bipartite graph, (X, Y ) ⊂ (O, A) Output: (Xc , Yc ), smallest bimodule containing (X, Y ) begin continue ← true; (Xc , Yc ) ← (X, Y ); while continue do continue ← f alse; forall the x ∈ J\Xc do if x distinguishes Yc then Xc ← Xc ∪ x; continue ← true; end end forall the y ∈ M \Yc do if y distinguishes Xc then Yc ← Yc ∪ y; continue ← true; end end end return (Xc , Yc ) end Algorithm 1: Computation of the smallest bimodule which contains (X, Y ) bimodules). It follows that the size of the tree is O(n), with n = |O| + |A|. In [1], authors propose a O(n3 ) algorithm to compute a such tree. 4.2 Decomposition and Real Data In Fig. 6, an example of bimodule is shown on the “Living Beings and Water” concept lattice [5]. g is the attribute for “can move around” and h is the one for “has limbs”. These two attributes are equivalent (cannot be distinguished) from the outside of the bimodule. So, on the lattice in Fig. 6.c these two attributes are collapsed, as well as objects 1 (Leech) and 2 (Bream). Further work must be done on real data to see what bimodules can enlight for practical cases. 5 Conclusion First, we have seen that modules defined on a lattice have natural links with bimodules of the bipartite graph (context) of this lattice. Modules of a lattice can be used the same way as modules of a graph are used: to produce a quo- tient lattice, which is a simplification of the original one. Recursive definition of modules allows to consider several details levels in the lattice. All results in modular decomposition may be transposed immediatly to con- cept lattice and associated context to improve the readibility of the lattice. 
402 Alain Gély b c d e f g h i 1 × × 2 × ×× 3 ×× ×× 4 × ××× 5 × × × 6 ××× × 7 ××× 8 ×× × (a) c b g h d f 1 2 5 8 4 i 3 e 7 6 (b) c b gh d f 12 5 8 4 i 3 e 7 6 (c) Fig. 6. (a) “Living Beings and Water ” Context [5], (b) Concept lattice for “living Beings and Water” and (c) the same concept lattice with a bimodule collapsed. Links between modular decomp. of conc. lat. and bimodular decomp. of a 403 context Second, investigation of bimodules properties shows that a bimodule may not correspond to a module of the lattice. Nevertheless, it remains possible to use it to produce a simplification of the original lattice. In such a case, the bimodule is collapsed in two elements a and b which represent ∨-irreducible elements and ∧-irreducible elements of the bimodule. This last case is a particular case of another decomposition proposed for inheritance hierarchies [2], called the block decomposition (with a different def- inition of block that the one in [5]): a block is an interval [a, b] such that only a and b can be distinguished of other vertices from the outside of the block. As a perspective behaviour of this decomposition for lattices and associated properties on the context can be investigated. References 1. m. Bui Xuan, B., Habib, M., Limouzy, V., Montgolfier, F.D.: Homogeneity vs. adja- cency: generalising some graph decomposition algorithms. In: In 32nd International Workshop on Graph-Theoretic Concepts in Computer Science (WG), volume 4271 of LNCS (2006) 2. Capelle, C.: Block decomposition of inheritance hierarchies. In: WG. pp. 118–131 (1997) 3. Encheva, S.: Galois sub-hierarchy and orderings. In: Proceedings of the 10th WSEAS international conference on Artificial intelligence, knowledge engineer- ing and data bases. pp. 168–171. AIKED’11, World Scientific and Engineer- ing Academy and Society (WSEAS), Stevens Point, Wisconsin, USA (2011), http://portal.acm.org/citation.cfm?id=1959485.1959517 4. Fouquet, J., Habib, M., de Montgolfier, F., Vanherpe, J.: Bimodular decomposi- tion of bipartite graphs. In: Graph-Theoretic Concepts in Computer Science 30th International Workshop, WG 2004, Bad Honnef, Germany, June 21-23 (2004) 5. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical Foundations. Springer-Verlag Berlin (1996) 6. Habib, M., Paul, C.: A survey of the algorithmic aspects of modular decomposition. Computer Science Review 4, 41–59 (2010) 7. Jay, N., Kohler, F., Napoli, A.: Analysis of social communities with iceberg and stability-based concept lattices. In: Proceedings of the 6th international confer- ence on Formal concept analysis. pp. 258–272. ICFCA’08, Springer-Verlag, Berlin, Heidelberg (2008), http://portal.acm.org/citation.cfm?id=1787746.1787765 8. Klimushkin, M., Obiedkov, S., Roth, C.: Approaches to the selection of relevant concepts in the case of noisy data. In: Kwuida, L., Sertkaya, B. (eds.) Proc. 8th Intl. Conf. Formal Concept Analysis. LNCS/LNAI, vol. 5986, pp. 255–266. Springer (2010) 9. Stumme, G., Taouil, R., Bastide, Y., Lakhal, L.: Conceptual clustering with iceberg concept lattices. In: In: Proc. of GI-Fachgruppentreffen Maschinelles Lernen’01, Universität Dortmund (2001) 10. Ventos, V., Soldano, H.: Alpha galois lattices: an overview. In: In: International Conference in Formal Concept Analysis (ICFCA05), LNCS. pp. 298–313. Springer (2005) Abduction in Description Logics using Formal Concept Analysis and Mathematical Morphology: Application to Image Interpretation Jamal Atif1 , Céline Hudelot2, and Isabelle Bloch3 1. 
1. Université Paris Sud, LRI - TAO, Orsay, France, jamal.atif@lri.fr
2. Ecole Centrale de Paris, France, celine.hudelot@ecp.fr
3. Telecom ParisTech - CNRS LTCI, Paris, France, isabelle.bloch@telecom-paristech.fr

Abstract. We propose an original way of enriching Description Logics with abduction reasoning services by computing the best explanations of an observation through mathematical morphology (using erosions) over the concept lattice of a background theory. The intended application is scene understanding and spatial reasoning.

Keywords: Abduction, Description Logics, FCA, Mathematical Morphology, Scene Understanding.

1 Introduction and notations

Scene interpretation can benefit from prior knowledge expressed as ontologies and from description logics (DL) endowed with spatial reasoning tools, as illustrated in our previous work [5, 6]. The challenge in this work was to derive reasoning tools that are able to handle in a unified way quantitative information supplied by the image domain and qualitative pieces of knowledge supplied by the ontology level. Object recognition and interpretation are seen as the satisfiability of a current situation (spatial configuration) encoded in the ABox of the DL and its TBox part. However, when the expert knowledge is not crisply consistent with the observations, which is common in image interpretation, this approach does not apply or leads to inconsistent results. Adapting DL reasoning tools to such situations can be performed using abduction. Our aim is thus to compute the "best explanation" of the observed phenomena in such situations. Formally, given a background theory K representing the expert knowledge and a formula C representing an observation on the problem domain, abductive reasoning searches for an explanation formula D such that D is satisfiable w.r.t. K and it holds that K |= D → C (K ∪ D |= C). We propose to add abductive reasoning tools to DL by associating ingredients from mathematical morphology, DL and Formal Concept Analysis (FCA), and by computing the best explanations of an observation through algebraic erosion over the concept lattice of a background theory, which is efficiently constructed using tools from FCA. We show that the defined operators satisfy important rationality postulates of abductive reasoning.

Based on the TBox T and the ABox A parts of a knowledge base K, we consider ABox abduction [3]: if for every a ∈ A it holds that K ⊭ ¬a, an ABox Abduction Problem, denoted ⟨K, A⟩, consists in finding a set of assertions γ such that K ∪ γ |= A. The set γ (consistent with K) is said to be an explanation of A. Explanatory reasoning is concerned with preferred explanations rather than just plain explanations. So, explaining an observation requires that some formulas be "selected" as preferred explanations. We also rely on classical notions of FCA, and denote a formal context by K = (G, M, I), where G is the set of objects, M the set of attributes and I ⊆ G × M a relation between the objects and attributes. For X ⊆ G and Y ⊆ M, the derivation operators are denoted by α and β, with α(X) = {m ∈ M | ∀g ∈ X, (g, m) ∈ I}, and β(Y) = {g ∈ G | ∀m ∈ Y, (g, m) ∈ I}.
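As a concrete reading of these derivation operators, here is a minimal Python sketch; the set-based representation of the context is an assumption made purely for illustration.

    # Derivation operators alpha and beta on a formal context (G, M, I),
    # with I given as a set of (object, attribute) pairs.

    def alpha(X, M, I):
        """Attributes shared by every object in X."""
        return {m for m in M if all((g, m) in I for g in X)}

    def beta(Y, G, I):
        """Objects having every attribute in Y."""
        return {g for g in G if all((g, m) in I for m in Y)}

    # (beta(alpha(X)), alpha(X)) is the formal concept generated by X.
    G = {1, 2, 3}
    M = {"a", "b"}
    I = {(1, "a"), (1, "b"), (2, "a")}
    X = {1, 2}
    print(alpha(X, M, I))              # {'a'}
    print(beta(alpha(X, M, I), G, I))  # {1, 2}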
The concept lattice is defined from the classical partial ordering (X1, Y1) ≤ (X2, Y2) ⇔ X1 ⊆ X2 (⇔ Y2 ⊆ Y1). Links between FCA and DL can be formalized via the notion of semantic context KT := (G, M, I) defined as [1]: G := {(I, d) | I is a model of T and d ∈ ∆I}, M := {m1, . . . , mn}, and I := {((I, d), m) | d ∈ mI}, where I = (∆I, .I) denotes an interpretation. The lattice can be constructed using the distributive concept exploration algorithm [9].

2 Abduction Operators from Mathematical Morphology on Complete Lattices

Let (L, ⪯) and (L′, ⪯′) be two complete lattices (which do not need to be equal). An operator δ : L → L′ is a dilation if it commutes with the supremum. An operator ε : L′ → L is an erosion if it commutes with the infimum. Classical properties of mathematical morphology operators on complete lattices can be found in [4, 8].

Here, with the aim of performing ABox abduction, we would like to reason on subsets of G in order to find their best explanations (in G). Hence we consider the complete lattice (P(G), ⊆) and operations from P(G) into P(G), where P(G) is the set of subsets of G. Since the ordering on G is equivalent to the one on M, reasoning on G will directly lead to results on M. In order to define explicit operations on P(G), we will make use of particular erosions and dilations, called morphological ones [8], which involve the notion of structuring element, i.e. a binary relation b between elements of G. For g ∈ G, we denote by b(g) the set of elements of G in relation with g. It can typically be derived from a distance d: b(g) = {g′ ∈ G | ∃X ∈ P(G), g′ ∈ X, d({g}, X) ≤ 1}. The morphological erosion of X is then expressed as εb(X) = {g ∈ G | b(g) ⊆ X}. Defining b from a distance is particularly interesting in the context of abduction, where the "most central" parts of models have to be defined. Erosion is then expressed as εn(X) = {g ∈ G | d(g, XC) > n}, where XC denotes the complement of X in G. Here G is a discrete finite space, and therefore only integer values of n are considered. All classical properties of mathematical morphology hold in this framework.

Last Non-empty Erosion. As shown in [2] in the framework of propositional logic, erosions can be used to find explanations. In this context, the idea was to find the most central part of a formula as the best explanation. This approach was shown to have good properties with respect to rationality postulates of abductive reasoning [7]. In this paper, we propose similar ideas, but adapted to the context of concept lattices, using erosions as defined above. For any X ⊆ G, we define its last erosion as εℓ(X) = εn(X) ⇔ εn(X) ≠ ∅ and ∀m > n, εm(X) = ∅. This last non-empty erosion defines the subset of models in G that are the furthest from the complement of X (according to the distance d), i.e. the most central in X.

Definition 1. Let A be a set of ABox assertions. A preferred explanation γ of A is defined from the last non-empty erosion as A ⊲ℓne γ ⇔def γI ⊆ εℓ(AI). In this equation, AI should be understood as the extent of the semantic concept associated with the DL concept A. When a constraint H (e.g. a set of hypotheses belonging to the background theory) has to be introduced, this definition is modified as A ⊲ℓne γ ⇔def γI ⊆ εℓ(HI ∩ AI), as illustrated by the sketch below.
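A minimal Python sketch of these erosions, assuming G is a finite set of models and dist an integer-valued distance between models; both the distance and the set representation are illustrative assumptions, not the authors' implementation.

    # Morphological erosion and the last non-empty erosion on (P(G), ⊆).

    def erosion(X, G, dist, n):
        """epsilon_n(X) = {g in X | d(g, G \\ X) > n} (all of X if G \\ X is empty).
        Restricting to g in X is equivalent to the definition, since d(g, X^C) = 0
        for g outside X."""
        complement = G - X
        if not complement:
            return set(X)
        return {g for g in X if all(dist(g, h) > n for h in complement)}

    def last_nonempty_erosion(X, G, dist):
        """epsilon_l(X): erode until the next erosion would be empty.
        With an integer-valued distance, epsilon_0(X) = X, so X is the start."""
        current = set(X)
        n = 0
        while True:
            nxt = erosion(X, G, dist, n + 1)
            if not nxt:
                return current          # the most central subset of X
            current, n = nxt, n + 1

    def is_preferred_explanation(gamma_ext, A_ext, G, dist):
        """Definition 1: gamma explains A iff its extent lies inside the
        last non-empty erosion of A's extent."""
        return gamma_ext <= last_nonempty_erosion(A_ext, G, dist)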
Starting from the subset to be explained, performing successive erosions amounts to "going down" in the lattice as much as possible, in order to find a non-empty set of interpretations.

Last Consistent Erosion. Another idea to introduce the constraint H is to erode it, as long as it remains consistent with A. This leads to a second explanatory relation.

Definition 2. A preferred explanation γ of A is defined from the last consistent erosion as A ⊲ℓc γ ⇔def γI ⊆ εℓc(HI, AI) ∩ AI, where AI corresponds to the extent of the semantic context and εℓc is the last consistent erosion defined as εℓc(HI, AI) = εn(HI) with n = max{k | εk(HI) ∩ AI ≠ ∅}. Here we consider erosion of H (i.e. HI) alone, which means that we are looking at the subsets (submodels) of the models of A while remaining as far inside the constraint as possible.

Properties and interpretations. A first important property is that reasoning on G actually amounts to reasoning on the whole formal context. Here, explanations were defined from ABox reasoning, leading to erosions of subsets of G (models). Let (X, Y) be a formal concept, with X ⊆ G and Y ⊆ M. From the definitions of explanations of X, we can directly derive the corresponding concepts for Y, using the derivation operator, i.e. α(γ) = {m ∈ M | ∀g ∈ γ, (g, m) ∈ I}. Note that eroding X amounts to dilating Y, which is in accordance with the correspondence between the Galois connection property of the derivation operators and the adjunction property of dilation and erosion.

Let us now consider the rationality postulates introduced in [7] for explanation relations. It has been proved that most of them hold for explanations derived from the last non-empty erosion and from the last consistent erosion [2]. These results extend to the DL context as follows:
- Both ⊲ℓne and ⊲ℓc are independent of the syntax (since they are computed on models).
- Definitions are consistent in the sense that K ⊭ ¬A iff ∃γ, A ⊲ γ.
- A reflexivity property holds for both definitions: if A ⊲ γ, then γ ⊲ γ.
- Disjunctions of explanations: if A ⊲ γ and A ⊲ δ, then A ⊲ (γ ⊔ δ), for both definitions. This means that if there are several possible explanations, their disjunction is an explanation as well, which is an expected result.
- Disjunction on the left: if C ⊲ℓc γ and D ⊲ℓc γ, then (C ⊔ D) ⊲ℓc γ (since the erosion is always performed on HI). However, this property does not hold for ⊲ℓne since erosion does not commute with the supremum.
- For the same reasons, we have the following property for ⊲ℓc: if C ⊲ℓc γ and D ⊲ℓc δ, then (C ⊔ D) ⊲ℓc γ or (C ⊔ D) ⊲ℓc δ; it does not hold for ⊲ℓne.
- For conjunctions, we have a monotony property for ⊲ℓc: if C ⊲ℓc γ and γI ⊆ DI (i.e. D |= γ), then (C ⊓ D) ⊲ℓc γ. For ⊲ℓne, only a weaker form holds: if C ⊲ℓne γ and D ⊲ℓne γ, then (C ⊓ D) ⊲ℓne γ. Note that this weaker form is also very natural and interesting.

Since both the ⊲ℓne and ⊲ℓc operators perform erosion in the interpretation set ∆I, any solution belongs to this set and K is a model of the obtained solution. Hence we have the following theorems:
- Soundness: if ∃γ | A ⊲ γ, then K |= γ.
- Completeness: K |= γ ⇒ ∃A | K |= A : A ⊲ γ.

3 Conclusion

With the aim of image interpretation, we have proposed abductive inference services in DL based on mathematical morphology over concept lattices, whose construction exploits the advances of using FCA in DL.
The properties and interpretations of the introduced explanatory operators were analyzed, and the rationality postulates of abductive reasoning were stated and extended to our context. Future work will concern the complexity analysis of these operators and associated algorithms, and a deeper investigation of their applications to image interpretation.

References

1. F. Baader. Computing a minimal representation of the subsumption lattice of all conjunctions of concepts defined in a terminology. In Knowledge Retrieval, Use and Storage for Efficiency: 1st International KRUSE Symposium, pages 168–178, 1995.
2. I. Bloch, R. Pino-Pérez, and C. Uzcátegui. Explanatory relations based on mathematical morphology. In ECSQARU 2001, pages 736–747, Toulouse, France, September 2001.
3. C. Elsenbroich, O. Kutz, and U. Sattler. A case for abductive reasoning over ontologies. In OWL: Experiences and Directions, Athens, Georgia, USA, 2006.
4. H. J. A. M. Heijmans and C. Ronse. The algebraic basis of mathematical morphology – Part I: Dilations and erosions. Computer Vision, Graphics and Image Processing, 50:245–295, 1990.
5. C. Hudelot, J. Atif, and I. Bloch. Fuzzy spatial relation ontology for image interpretation. Fuzzy Sets and Systems, 159(15):1929–1951, 2008.
6. C. Hudelot, J. Atif, and I. Bloch. Integrating bipolar fuzzy mathematical morphology in description logics for spatial reasoning. In European Conference on Artificial Intelligence ECAI 2010, pages 497–502, Lisbon, Portugal, August 2010.
7. R. Pino-Pérez and C. Uzcátegui. Jumping to explanations versus jumping to conclusions. Artificial Intelligence, 111:131–169, 1999.
8. J. Serra. Image Analysis and Mathematical Morphology. Academic Press, New York, 1982.
9. G. Stumme. Distributive concept exploration – a knowledge acquisition tool in formal concept analysis. In KI-98: Advances in Artificial Intelligence, pages 117–128. Springer, 1998.

A local discretization of continuous data for lattices: Technical aspects

Nathalie Girard, Karell Bertet and Muriel Visani
Laboratory L3i, University of La Rochelle, France
{ngirar02, kbertet, mvisani}@univ-lr.fr

Abstract. In recent years, Galois lattices (GLs) have been used in data mining, and defining a GL from complex (i.e. non-binary) data is a recent challenge [1,2]. Indeed, a GL is classically defined from a binary table (called a context), and therefore in the presence of continuous data a discretization step is generally needed to convert continuous data into discrete data. Discretization is classically performed before the GL construction, in a global way. However, local discretization is reported to give better classification rates than global discretization when used jointly with other symbolic classification methods such as decision trees (DTs). Using a result of lattice theory bringing together sets of objects and specific nodes of the lattice, we identify subsets of data on which to perform a local discretization for GLs. Experiments are performed to assess the efficiency and the effectiveness of the proposed algorithm compared to global discretization.

1 Discretization process

The discretization process consists in converting continuous attributes into discrete attributes [3]. This conversion can produce scaled attributes or disjoint intervals; we focus on the latter. Such a transformation is necessary for some classification models, such as symbolic models, which cannot handle continuous attributes [4]. Consider a continuous data set D = (O, F), where each object in O is described by p continuous attributes in F.
The discretization process is performed by iterating an attribute-splitting step, according to a splitting criterion (entropy [3], Gini [5], χ² [6], ...), until a stopping criterion S is satisfied (a maximal number of intervals to create, a purity measure, ...). More formally, for one discretization step, for selecting the best attribute to be cut, let (v_1, . . . , v_N) be the sorted values of a continuous attribute V ∈ F. Each v_i corresponds to a value taken by one object of the data set D. The set of possible cut-points is C_V = (c_V^1, . . . , c_V^{N−1}), where c_V^i = (v_i + v_{i+1}) / 2 for all i ≤ N − 1. The best cut-point, denoted c*_V, is defined by:

    c*_V = argmax_{c_V^i ∈ C_V} gain(V, c_V^i, D)    (1)

where gain(V, c, D) denotes, in a generic manner, the splitting criterion computed for the attribute V, the cut-point c ∈ C_V and the data set D. The best attribute, denoted V*, is the V ∈ F maximizing the splitting criterion computed for its best cut-point (i.e. c*_V):

    V*(D) = argmax_{V ∈ F} gain(V, c*_V, D)    (2)

Finally, for one discretization step, the attribute V* is divided into two intervals, [v_1, c*_{V*}] and ]c*_{V*}, v_N], and the process is repeated.

This process can be run using, at each step, all the objects in the training set. This is global discretization. It can also be run during model construction, considering, at each step, only a part of the training set. This is local discretization. In [7], Quinlan shows that local discretization improves supervised classification with decision trees (DTs) as compared with global discretization. In DT construction, the growing process is iterated until S is satisfied. Local discretization is performed on the subset of objects in the current node to select its best attribute (V*(node)), according to the splitting criterion. Given the structural links between DTs and Galois lattices (GLs) [8], we propose a local discretization algorithm for GLs and compare its performance with global discretization.

2 Local discretization for Galois lattices

A GL is generally defined from a binary relation R between objects O and binary attributes I (i.e. a binary data set, also called a formal context), denoted as a triplet T = (O, I, R). A GL is composed of a set of concepts (a concept (A, B) is a maximal objects-attributes subset in relation), ordered by a generalization/specialization relation. For more details on GL theory, notation and their use in classification tasks, please refer to [9,10]. To define a local discretization for GLs, we have to identify at each discretization step the subset of concepts to be processed. Given a subset of objects A ∈ P(O), there always exists a smallest concept M containing this subset, identified in lattice theory as a meet-irreducible concept of the GL [11]. Moreover, it is possible to compute the set of meet-irreducibles directly from the context, so the generation of the lattice is unnecessary [12]. Consequently, local discretization is performed on the set of meet-irreducible concepts MI which do not satisfy S. Attributes in MI are locally discretized: the best attribute V*(M) for each M ∈ MI is computed according to eq. (3); then the best one, V*(MI) (eqs. (4), (5)), for the whole set MI is split into two intervals as explained before (a minimal sketch of this gain-based selection is given below).
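For concreteness, here is a minimal Python sketch of the selection in eqs. (1)–(2), using entropy-based information gain as the splitting criterion (the paper's experiments use χ²; entropy is chosen here only to keep the sketch short). The data representation, a list of (value, class label) pairs per attribute, and the function names are illustrative assumptions, not the authors' implementation.

    # Best cut-point per attribute (eq. (1)) and best attribute overall (eq. (2)).
    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values()) if n else 0.0

    def gain(pairs, cut):
        """Information gain of splitting (value, label) pairs at `cut`."""
        left = [lab for val, lab in pairs if val <= cut]
        right = [lab for val, lab in pairs if val > cut]
        labels = [lab for _, lab in pairs]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        return entropy(labels) - weighted

    def best_cut_point(pairs):                       # eq. (1)
        values = sorted({val for val, _ in pairs})
        cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]
        return max(cuts, key=lambda c: gain(pairs, c)) if cuts else None

    def best_attribute(dataset):                     # eq. (2); dataset: {name: pairs}
        candidates = {V: best_cut_point(pairs) for V, pairs in dataset.items()}
        V_star = max((V for V, c in candidates.items() if c is not None),
                     key=lambda V: gain(dataset[V], candidates[V]))
        return V_star, candidates[V_star]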
The context T is then updated with these new intervals, and its set MI of meet-irreducible concepts is recomputed. The process is iterated until all M ∈ MI verify the stopping criterion S. The context T is initialized with, for each continuous attribute, one interval (i.e. one binary attribute) containing all continuous values observed in D; thus each object is in relation with every binary attribute of T. The GL of the initial context T contains only one concept, (O, I), which is a meet-irreducible concept and is used to initialize MI. See [13] for more details on the algorithm.

The main difference with DTs is that splitting an attribute in a GL impacts all the other concepts of the GL that contain this attribute and, due to the order relation ≤ between concepts, the structure of the GL is also modified. In contrast, when an attribute is split in a DT node, predecessors and other branches are not impacted. In order to select the best V*(MI) over all the concepts sharing this attribute, we introduce different ways of computing V*(MI).

Let MI = {Dq = (Aq, Bq); q ≤ Q} be the set of meet-irreducible concepts not satisfying S. The best attribute V*(Dq), associated with its best cut-point, is first computed for each concept Dq ∈ MI:

    V*(Dq) = argmax_{V ∈ Bq} gain(V, c*_V, Dq)    (3)

where c*_V is defined by (1) for Dq instead of D. Let us define I*_MI = {V*(D1), . . . , V*(DQ)} as the set of best attributes associated with each concept in MI. The best attribute V*(MI) among I*_MI can be defined in two different ways.

By local discretization: Local discretization selects the best attribute V ∈ I*_MI as the one having the best gain for MI:

    V*(MI) = argmax_{V*(Dq) ∈ I*_MI} gain(V*(Dq), c*_{V*(Dq)}, Dq)    (4)

By linear local discretization: Linear local discretization takes into account that the split of one attribute V ∈ I*_MI in a concept Dq can impact the other concepts. So we compute a linear combination of the criterion as the sum of the gains over the concepts Dq0 ∈ MI containing this attribute V, weighted by the relative extent sizes. The selected attribute is the one that gives the best linear combination:

    V*(MI) = argmax_{V ∈ I*_MI} Σ_{Dq0 ∈ MI, V ∈ Bq0} ( |Aq0| / Σ_{Dq ∈ MI} |Aq| ) gain(V, c*_V, Dq0)    (5)

3 Experimental comparison

The study is performed on three supervised databases from the UCI Machine Learning Repository¹: the Image Segmentation database (Image1), the Glass Identification database (GLASS) and the Breast Cancer database (BREAST Cancer). We also use one supervised data set stemming from the GREC 2003 database², described by the statistical Radon signature (GREC Radon). Table 1 presents the complexity of the lattice structure associated with each discretization algorithm and the classification performance obtained using each GL by navigation [14] and using CHAID as a DT classifier [6]. Discretization is performed in each case with χ² as a supervised splitting and stopping criterion.

4 Conclusion

The study [3] shows that for DTs, local discretization induces more complex structures compared to global discretization; Table 1 shows that for GLs, on the contrary, local discretization reduces the structures' complexity. In [7], Quinlan proves that local discretization improves the classification performance of DTs compared to global discretization; as for DTs, Table 1 shows that local discretization improves the classification performance of GLs.
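To make the two selection rules of Section 2 concrete, here is a minimal Python sketch of eq. (4) versus eq. (5). The concept representation (extent 'A', intent 'B', per-attribute (value, label) pairs 'data'), the helper names, and the use of per-concept cut-points are illustrative assumptions rather than the authors' implementation; gain_fn and cut_fn stand for the criterion of eq. (1).

    def best_attribute_of(D, gain_fn, cut_fn):
        """Eq. (3): best (gain, attribute, cut) of one concept D, or None."""
        best = None
        for V in D['B']:
            cut = cut_fn(D['data'][V])
            if cut is None:
                continue
            score = gain_fn(D['data'][V], cut)
            if best is None or score > best[0]:
                best = (score, V, cut)
        return best

    def local_choice(MI, gain_fn, cut_fn):
        """Eq. (4): among the per-concept winners, keep the single best gain."""
        winners = [best_attribute_of(D, gain_fn, cut_fn) for D in MI]
        return max((w for w in winners if w is not None), key=lambda w: w[0])

    def linear_local_choice(MI, gain_fn, cut_fn):
        """Eq. (5): weight each candidate's gain by |A_q| over the concepts containing it."""
        total_extent = sum(len(D['A']) for D in MI)
        candidates = {w[1] for w in (best_attribute_of(D, gain_fn, cut_fn) for D in MI) if w}
        def weighted_gain(V):
            score = 0.0
            for D in MI:
                if V in D['B']:
                    cut = cut_fn(D['data'][V])
                    if cut is not None:
                        score += (len(D['A']) / total_extent) * gain_fn(D['data'][V], cut)
            return score
        return max(candidates, key=weighted_gain)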
1 http://archive.ics.uci.edu/ml
2 www.cvc.uab.es/grec2003/symreccontest/index.htm

Table 1. Structure complexity and classification performance

                       Nb concepts                      Recognition rates
                   Local  Linear Local  Global     Local  Linear Local  Global  CHAID
    Image1           527           649   12172     90.33         91.57   82.23  90.95
    GLASS           1950          2128    2074     71.11         72.60   73.18  63.72
    BREAST Cancer   3608          2613    7784     91.66         91.23   90.05  93.47
    GREC Radon        69            92    2192     90.43         90.17   81.42  92.94

References

1. Ganter, B., Kuznetsov, S.: Pattern structures and their projections. In Delugach, H., Stumme, G., eds.: Conceptual Structures: Broadening the Base. Volume 2120 of LNCS. (2001) 129–142
2. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181 (2011) 1989–2001
3. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Machine Learning: Proc. of the Twelfth International Conference, Morgan Kaufmann (1995) 194–202
4. Muhlenbach, F., Rakotomalala, R.: Discretization of continuous attributes. In Reference, I.G., ed.: Encyclopedia of Data Warehousing and Mining. J. Wang (2005) 397–402
5. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth Inc., 358 pp (1984)
6. Kass, G.: An exploratory technique for investigating large quantities of categorical data. Applied Statistics 29(2) (1980) 119–127
7. Quinlan, J.: Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4 (1996) 77–90
8. Guillas, S., Bertet, K., Visani, M., Ogier, J.M., Girard, N.: Some links between decision tree and dichotomic lattice. In: Proc. of the Sixth International Conference on Concept Lattices and Their Applications, CLA 2008 (2008) 193–205
9. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical Foundations. Springer Verlag, Berlin, 284 pp (1999)
10. Fu, H., Fu, H., Njiwoua, P., Nguifo, E.M.: A comparative study of FCA-based supervised classification algorithms. In: Concept Lattices. Volume LNCS 2961. (2004) 219–220
11. Birkhoff, G.: Lattice Theory. Third edn. Volume 25. American Mathematical Society, 418 pp (1967)
12. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (ed.) Ordered Sets, Dordrecht-Boston, Reidel (1982) 445–470
13. Girard, N., Bertet, K., Visani, M.: Local discretization of numerical data for Galois lattices. In: Proceedings of the 23rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2011 (2011) to appear.
14. Visani, M., Bertet, K., Ogier, J.M.: Navigala: an original symbol classifier based on navigation through a Galois lattice. International Journal of Pattern Recognition and Artificial Intelligence, IJPRAI 25 (2011) 449–473

Formal Concept Analysis on Graphics Hardware

W. B. Langdon, Shin Yoo, and Mark Harman
CREST centre, Department of Computer Science, University College London
Gower Street, London WC1E 6BT, UK

Abstract. We document a parallel non-recursive beam search GPGPU FCA CbO-like algorithm written in nVidia CUDA C and test it on software module dependency graphs. Despite removing repeated calculations and optimising data structures and kernels, we do not yet see major speed ups. Instead GeForce 295 GTX and Tesla C2050 report 141 072 concepts (maximal rectangles, clusters) in about one second. Future improvements in graphics hardware may make GPU implementations of Galois lattices competitive.
Keywords: software module clustering, MDG, close-by-one, arithmetic intensity

1 Introduction

Formal Concept Analysis [7] is a well-known technique for grouping objects by the attributes they have in common. It can be thought of as discrete data clustering. In general the number of conceptual clusters grows exponentially. However, there are a few specialised algorithms which render FCA manageable, even on quite large problems, provided the object-attribute table is sparse [10]. Krajca, Outrata and Vychodil [10] report considerable improvement in FCA algorithms in the last two decades. All these successful algorithms use depth first tree search to find all the conceptual clusters in an object-attribute table.

Computer graphics gaming cards (GPUs) are relatively cheap and yet offer far more computing power than the computer's CPU alone. (E.g. a 295 GTX contains 480 fully functioning processors and yet costs only a few hundred pounds.) Also, microprocessor trends suggest faster computing will require parallel computing in future. There are already hundreds of millions of computers fitted with graphics hardware which might be used for general purpose computing [3].

Krajca et al. [10] report using a distributed computer to overcome the "major drawback [of FCA's] computational complexity". They report their parallel algorithm PCbO gives a near linear speed increase with the number of computing nodes in a network of up to 15 PCs. In other work [11] they conclude that there is no universal best FCA data structure. Instead they suggest that the optimum performance will depend upon the application. In earlier work, Huaiguo Fu had created a parallel implementation of NextClosure, but it was limited to 50 attributes [5]; this was subsequently greatly extended [6]. However, like Krajca et al. [10], both Fu's approaches [5, 6] use conventional distributed computers composed of a few CPUs rather than hundreds of GPU processing elements. Similarly, Djoufak Kengue et al.'s ParCIM implementation [4] used a conventional network of 8 computers connected in a star fashion with MPI. Ours is the first FCA implementation to run in parallel on computer graphics cards (GPUs).

2 CUDA FCA Implementation

Although In-close [1] claims to be faster, we easily obtained FCbO [9] from Source Forge. We initially implemented the Krajca sequential algorithm [9] in Python. This was followed by a version in CUDA C, where ComputeClosure is implemented in parallel on the GPU. (For details see our technical report [13].) Krajca's routines ComputeClosure and GenerateFrom essentially form a depth first search algorithm which builds and navigates a tree of formal concepts from a binary 0/1 matrix describing which object has which property. Since the search is recursive and operates on one point in the tree at a time, it is unsuitable for parallel operation on graphics cards.

Our graphics card parallel version retains the tree but uses beam search rather than depth first search. Instead of proceeding to the first leaf of the tree, recursively backing up and then going forward to the next leaf and so on, in beam search we also start from the top of the tree and then proceed along every branch to the next level.
This requires saving information on the beam for every node at that level. Beam search next expands the search again to cover everything at the next level, and so on until all the leaves of the tree have been reached. Notice that, instead of working on a single point in the tree, the beam covers many points which can be worked on in parallel. Indeed, within a couple of levels we can get a beam containing tens of thousands of individual search points which can be processed independently. This suits the GPU architecture, which needs literally thousands of independent processing threads for it to deliver its best performance [12].

You will have spotted that in an exponential problem, like FCA, beam search quickly runs out of memory. Even for quite modest tree depths the beam width is limited by the available space on the GPU card. (We have a configuration limit of 1.8 million simultaneous parallel operations.) When a beam search exceeds this limit, only the first 1.8 million searches are loaded onto the GPU and the rest of the beam is queued on the host PC. (Although we have not done this, in multi-GPU systems it would be possible to split the beam between the GPUs, allocating up to 1.8 million to each GPU.) The GPU only searches to the next level. It returns the concepts found by the searches and the newly discovered branches which remain to be searched. The concepts are printed by the host PC and the new branches are added to the end of the beam to await their turn. Effectively the beam becomes a queue of points in the tree waiting to be searched. The number of parallel searches is mostly limited by the need to have space on the GPU for all the potential new branches. This depends upon the tree's fan out, which is problem dependent. Nonetheless the GPU can manage modest real software engineering examples (e.g. dependence clustering of the Linux kernel). Notice the beam will contain a mixture of pending search points at different depths in the tree.

Table 1. Performance on FCA benchmarks, random module dependency graphs, and software engineering datasets [8]. Time given in seconds, except the longest Python run which is hours:mins:secs. (For the 295 GTX and Tesla C2050 the total time on the GPU is given.)

    Dataset          Size      Density  Concepts  FCbO  Python    295 GTX  C2050
    krajca           5×7       54%            16  0.00      0.11     0.01   0.01
    wiki             10×5      44%            14  0.00      0.03     0.00   0.00
    random           10×10     20%            16  0.00      0.04     0.00   0.00
    random           100×100    2%           137  0.00      0.40     0.02   0.01
    random           200×200    2%           420  0.00      4.33     0.00   0.01
    random           500×500    2%          2861  0.01    162.60     0.02   0.02
    bison            37×37     24%           692  0.00      0.32     0.00   0.01
    compiler         33×33      6%            24  0.00      0.05     0.00   0.00
    dot              42×42     28%          1302  0.00      0.71     0.00   0.01
    grappa           86×86      7%           850  0.00      2.54     0.01   0.01
    incl             172×172    2%           238  0.00      1.84     0.00   0.01
    ispell           24×24     34%           432  0.00      0.15     0.01   0.01
    linuxConverted   955×955    2%        141072  0.73  15:42:51     1.79   0.93
    mtunis           20×20     29%           110  0.00      0.05     0.00   0.01
    rcs              29×29     37%          1074  0.00      0.46     0.01   0.02
    swing            413×413    2%          3654  0.01    208.71     0.03   0.02

3 Results

FCbO (version 2010/10/05) was downloaded and compiled without changes on a 2.66 GHz PC with 3 Gigabytes of RAM running 64-bit CentOS 5.0. The performance of FCbO, our Python code and our CUDA code on two types of GPU are given in Table 1. They show performance on two benchmark problems, a selection of randomly generated symmetric object-attribute pairings, and software module dependency graphs of real world example programs.

4 Discussion

It is unclear why our code does not do better.
We would expect a linear speed advantage for FCbO from both using 64-bit operations and using compiled rather than interpreted code. However, on sizable examples the ratio between the speed of FCbO and that of our Python code is huge. This hints that FCbO has some algorithmic advantage.

GPUs are often limited by the time taken to move data rather than to perform calculations. "Arithmetic intensity" is the ratio of calculations per data item. Typically this is in the range 4–64 FLOP/TDE [2, p206]; we estimate the arithmetic intensity of Krajca et al.'s algorithm [9] is less than 1. Thus a potential problem might be that there simply is not enough computation required by FCA compared to the volume of data.

Newer versions of CUDA have made it easier to overlap GPU operations. However, our implementation does not do this. Since the work is spread across the multi-processors, we suspect that idle time is not a major problem.

5 Conclusions

There are many problems which are traditionally solved by depth first search. However, this may not suit low cost computer graphics GPU hardware. We have implemented a form of beam search and demonstrated it on several existing FCA benchmarks and ten software engineering dependence clustering problems [8]. GPU beam search may also be more widely applicable.

Acknowledgements

I am grateful for the assistance of Gernot Ziegler of nVidia, Steve Worley, Sarnath Kannan, Stephen Swift, Stan Seibert and Yuanyuan Zhang. Software engineering MDGs were supplied by Spiros Mancoridis. Tesla donated by nVidia. Funded by EPSRC grant EP/G060525/2.

References

1. S. J. Andrews. In-close, a fast algorithm for computing formal concepts. In Conceptual Structures Tools Interoperability Workshop at the 17th International Conference on Conceptual Structures, Moscow, 26-31 July 2009.
2. M. Christen, O. Schenk, and H. Burkhart. Automatic code generation and tuning for stencil kernels on modern shared memory architectures. CSRD, 26(3):205–210.
3. B. Del Rizzo. Dice puts faith in nvidia PhysX technology for Mirror's Edge. NVIDIA Corporation press release, Nov 19 2008.
4. J. Djoufak Kengue, P. Valtchev, and C. Tayou Djamegni. Parallel computation of closed itemsets and implication rule bases. In I. Stojmenovic, et al., eds., ISPA 2007, LNCS 4742, pp. 359–370. Springer.
5. Huaiguo Fu and E. Nguifo. A parallel algorithm to generate formal concepts for large data. In P. Eklund, ed., ICFCA, LNAI 2961, pp. 141–142. Springer, 2004.
6. Huaiguo Fu and M. O'Foghlu. A distributed algorithm of density-based subspace frequent closed itemset mining. In HPCC, pp. 750–755. IEEE, 2008.
7. B. Ganter and R. Wille. Formal Concept Analysis. Springer, 1999.
8. M. Harman, S. Swift, and K. Mahdavi. An empirical study of the robustness of two module clustering fitness functions. In H.-G. Beyer, et al., eds., GECCO 2005.
9. P. Krajca, J. Outrata, and V. Vychodil. Parallel recursive algorithm for FCA. In R. Belohlavek and S. O. Kuznetsov, eds., CLA 2008, Olomouc, Czech Republic.
10. P. Krajca, J. Outrata, and V. Vychodil. Parallel algorithm for computing fixpoints of Galois connections. Ann Math Artif Intel, 59:257–272, 2010.
11. P. Krajca and V. Vychodil. Comparison of data structures for computing formal concepts. In V. Torra, et al., eds., MDAI 2009, LNCS 5861, pp. 114–125. Springer.
12. W. B. Langdon. Graphics processing units and genetic programming: An overview. Soft Computing, 15:1657–1669, Aug. 2011.
13. W. B. Langdon, S. Yoo, and M. Harman.
Non-recursive beam search on GPU for formal concept analysis. RN/11/18, Computer Science, UCL, London, UK, 2011. Author Index Assaghir, Zainab, 319 Grissa, Dhouha, 207 Astudillo, Hernán, 349 Guennec, David, 295 Atif, Jamal, 405 Guillaume, Sylvie, 207 Azmeh, Zeina, 377 Hacéne-Rouane, Mohamed, 377 Baixeries, Jaume, 333 Harman, Mark, 413 Balbiani, Philippe, 279 Huchard, Marianne, 377 Bazhanov, Konstantin, 43 Hudelot, Céline, 405 Belohlavek, Radim, 207 Irlande, Alexis, 131 Berry, Anne, 15 Bertet, Karell, 239, 409 Jay, Nicolas, 363 Bloch, Isabelle, 1, 405 Boc, Alix, 191 Kılıçaslan, Yılmaz, 59 Borchmann, Daniel, 101 Kaytoue, Mehdi, 175, 319 Braud, Agnés, 265 Konecny, Jan, 115 Brito, Paula, 251 Krupka, Michal, 115 Kuznetsov, Sergei, 175 Carlos Dı́az, Juan, 75 Cellier, Peggy, 31 Langdon, W. B., 413 Codocedo, Vı́ctor, 349 Le Ber, Florence, 265 Colomb, Pierre, 131 Leclerc, Bruno, 9 Lieber, Jean , 87 Demko, Christophe, 239 Llansó, David, 143 Distel, Felix, 101 Ducasse, Mireille, 31 Macko, Juraj, 175 Makarenkov, Vladimir, 191 Egho, Elias, 363 Medina-Moreno, Jesús, 75 Emilion, Richard, 3 Meira, Wagner, 175, 319 Erné, Marcel, 5 Miclet, Laurent, 295 Monjardet, Bernard, 11 Falk, Ingrid, 223 Ferre, Sebastien, 31 Napoli, Amedeo, 175, 191, 363, 377 Nauer, Emmanuel, 87 Güner, Edip Serdar, 59 Nguifo, Engelbert Mephu, 207 Gómez-Martı́n, Marco Antonio, 143 Nica, Cristina, 265 Gély, Alain, 393 Obiedkov, Sergei, 43 Gaillard, Emmanuelle, 87 Outrata, Jan, 207 Ganter, Bernhard, 309 Gardent, Claire, 223 Pogorelcnik, Romain, 15 Gehrke, Mai, 7 Polaillon, Géraldine, 251 Girard, Nathalie, 409 Prade, Henri, 295 Glodeanu, Cynthia Vera, 159 Godin, Robert, 191 Raissi, Chedy, 363 Gomez-Martin, Pedro Pablo, 143 Raynaud, Olivier, 131 González-Calero, Pedro Antonio, 143 Renaud, Yoan, 131 Grac, Corinne, 265 Ryssel, Uwe, 101 Sigayret, Alain, 15 Valtchev, Petko, 191, 377 Simovici, Dan, 13 Villerd, Jean, 319 Szathmary, Laszlo, 191 Visani, Muriel, 409 Taramasco, Carla, 349 Yoo, Shin, 413 Editors: Amedeo Napoli, Vilem Vychodil Publisher & Print: INRIA Nancy – Grand Est and LORIA France Title: CLA 2011, Proceedings of the Eighth International Conference on Concept Lattices and Their Applications Place, year, edition: Nancy, 2011, 1st Page count: xii+419 Impression: 100 Archived at: http://cla.inf.upol.cz Not for sale ISBN 978–2–905267–78–8