=Paper=
{{Paper
|id=Vol-1466/proceedings-cla2015
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-1466/proceedings-cla2015.pdf
|volume=Vol-1466
}}
==CLA 2015: Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications==
CLA 2015 Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications CLA Conference Series cla.inf.upol.cz Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, France ISBN 978–2–9544948–0–7 Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, France The Twelfth International Conference on Concept Lattices and Their Applications C LA 2015 Clermont-Ferrand, France October 13–16, 2015 Edited by Sadok Ben Yahia Jan Konecny CLA 2015 c paper author(s), 2015, for the included papers c Sadok Ben Yahia, Jan Konecny, Editors, for the volume Copying permitted only for private and academic purposes. This work is subject to copyright. All rights reserved. Reproduction or publica- tion of this material, even partial, is allowed only with the editors’ permission. Technical Editor: Jan Konecny, jan.konecny@upol.cz Cover photo from blognature.fr Page count: xiii+254 Impression: 50 Edition: 1st First published: 2015 Published and printed by: Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, France Organization CLA 2015 was organized by the Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand. Steering Committee Radim Belohlavek Palacký University, Olomouc, Czech Republic Sadok Ben Yahia Faculté des Sciences de Tunis, Tunisia Jean Diatta Université de la Réunion, France Peter Eklund IT University of Copenhagen, Denmark Sergei O. Kuznetsov State University HSE, Moscow, Russia Engelbert Mephu Nguifo LIMOS, University Blaise Pascal, Clermont-Ferrand, France Amedeo Napoli LORIA, Nancy, France Manuel Ojeda-Aciego Universidad de Málaga, Spain Jan Outrata Palacký University, Olomouc, Czech Republic Program Chairs Sadok Ben Yahia Faculté des Sciences de Tunis, Tunis, Tunisia Jan Konecny Palacký University, Olomouc, Czech Republic Program Committee Simon Andrews Sheffield Hallam University, Sheffield, United King- dom Jaume Baixeries Universitat Politècnica de Catalunya, Barcelona, Catalonia Radim Belohlavek Palacký University, Olomouc, Czech Republic Karell Bertet L3i – Université de La Rochelle, La Rochelle, France François Brucker École Centrale, Marseille, France Ana Burusco Universidad Pública de Navarra, Pamplona, Spain Claudio Carpineto Fondazione Ugo Bordoni, Roma, Italy Pablo Cordero Universidad de Málaga, Málaga, Spain Jean Diatta Université de la Réunion, Saint-Denis, France Felix Distel Technische Universität Dresden, Dresden, Germany Florent Domenach University of Nicosia, Nicosia, Cyprus Vincent Duquenne Institut de Mathématiques de Jussieu, Paris, France Peter Eklund IT University of Copenhagen, Denmark Sébastien Ferré IRISA – Université de Rennes 1, Rennes, France Bernhard Ganter Technische Universität Dresden, Dresden, Germany Cynthia Vera Glodeanu Technische Universität Dresden, Dresden, Germany Alain Gély Université de Lorraine, Metz, France Tarek Hamrouni ISAMM, Manouba University, Tunisia Marianne Huchard LIRMM – Université Montpellier 2, Montpellier, France Dmitry Ignatov State University HSE, Moscow, Russia Mehdi Kaytoue Liris – Insa, Lyon, France Stanislav Krajči Univerzita Pavla Jozefa Šafárika v Košiciach, Košice, Slovakia Francesco Kriegel Technische Universität Dresden, Dresden, Germany Michal Krupka Palacký University, Olomouc, Czech Republic Marzena Kryszkiewicz Warsaw University of Technology, Warsaw, Poland Sergei O. 
Kuznetsov State University HSE, Moscow, Russia Léonard Kwuida Bern University of Applied Sciences, Bern, Switzer- land Jesús Medina Universidad de Cádiz, Cádiz, Spain Engelbert Mephu Nguifo LIMOS, Clermont-Ferrand, France Rokia Missaoui LARIM – Université du Québec en Outaouais, Gatineau, Canada Amedeo Napoli LORIA, Nancy, France Lhouari Nourine LIMOS, Clermont-Ferrand, France Sergei Obiedkov State University HSE, Moscow, Russia Manuel Ojeda-Aciego Universidad de Málaga, Málaga, Spain Petr Osička Palacký University, Olomouc, Czech Republic Jan Outrata Palacký University, Olomouc, Czech Republic Uta Priss Ostfalia University of Applied Sciences, Wolfenbüt- tel, Germany Francois Rioult GREYC – Université de Caen Basse-Normandie, Caen, France Sebastian Rudolph Technische Universität Dresden, Dresden, Germany Christian Sacarea Babes, -Bolyai University, Cluj-Napoca, Romania Barış Sertkaya SAP Research Center, Dresden, Germany László Szathmáry University of Debrecen, Debrecen, Hungary Petko Valtchev Université du Québec, Montréal, Canada Francisco Valverde Universidad Carlos III, Madrid, Spain Additional Reviewers Ľubomír Antoni Univerzita Pavla Jozefa Šafárika v Košiciach, Košice, Slovakia Slim Bouker LIMOS, Clermont-Ferrand, France Maria Eugenia Cornejo Piñero Universidad de Cádiz, Cádiz, Spain Philippe Fournier-Viger Université du Québec, Montréal, Canada Eloisa Ramírez Poussa Universidad de Cádiz, Cádiz, Spain Stefan E. Schmidt Technische Universität Dresden, Dresden, Germany Vilém Vychodil Palacký University, Olomouc, Czech Republic Organization Committee Olivier Raynaud (chair) LIMOS, Clermont-Ferrand, France Violaine Antoine LIMOS, Clermont-Ferrand, France Anne Berry LIMOS, Clermont-Ferrand, France Diyé Dia LIMOS, Clermont-Ferrand, France Kaoutar Ghazi LIMOS, Clermont-Ferrand, France Dhouha Grissa INRA Theix, Clermont-Ferrand, France Yannick Loiseau LIMOS, Clermont-Ferrand, France Engelbert Mephu Nguifo LIMOS, Clermont-Ferrand, France Séverine Miginiac LIMOS, Clermont-Ferrand, France Lhouari Nourine LIMOS, Clermont-Ferrand, France Table of Contents Preface Invited Contributions User Models as Personal Ontologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Gabriella Pasi Formal Concept Analysis from the Standpoint of Possibility Theory . . . . . 3 Didier Dubois Clarifying Lattice Structure by Data Polishing . . . . . . . . . . . . . . . . . . . . . . . . 5 Takeaki Uno Tractable Interesting Pattern Mining in Large Networks . . . . . . . . . . . . . . . 7 Jan Ramon Extended Dualization: Application to Maximal Pattern Mining . . . . . . . . . 9 Lhouari Nourine Full Papers Subset-generated complete sublattices as concept lattices . . . . . . . . . . . . . . . 11 Martin Kauer, Michal Krupka RV-Xplorer: A Way to Navigate Lattice-Based Views over RDF Graphs . . 23 Mehwish Alam, Amedeo Napoli, Matthieu Osmuk Finding p-indecomposable Functions: FCA Approach . . . . . . . . . . . . . . . . . . 35 Artem Revenko Putting OAC-triclustering on MapReduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Sergey Zudin, Dmitry V. Gnatyshak, Dmitry I. Ignatov Concept interestingness measures: a comparative study . . . . . . . . . . . . . . . . 59 Sergei O. Kuznetsov, Tatyana P. Makhalova Why concept lattices are large – Extremal theory for the number of minimal generators and formal concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
73 Alexandre Albano, Bogdan Chornomaz An Aho-Corasick Based Assessment of Algorithms Generating Failure Deterministic Finite Automata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 Madoda Nxumalo, Derrick G. Kourie, Loek Cleophas and Bruce W. Watson Context-Aware Recommender System Based on Boolean Matrix Factorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 Marat Akhmatnurov, Dmitry I. Ignatov Class Model Normalization – Outperforming Formal Concept Analysis approaches with AOC-posets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 André Miralles, Guilhem Molla, Marianne Huchard, Clémentine Nebut, Laurent Deruelle, Mustapha Derras Partial enumeration of minimal transversals of a hypergraph . . . . . . . . . . . 123 Lhouari Nourine, Alain Quilliot, Hélène Toussaint An Introduction to Semiotic-Conceptual Analysis with Formal Concept Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135 Uta Priss Using the Chu construction for generalizing formal concept analysis . . . . . 147 Ľubomír Antoni, Inmaculada P. Cabrera, Stanislav Krajči, Ondrej Krídlo, Manuel Ojeda-Aciego From formal concepts to analogical complexes . . . . . . . . . . . . . . . . . . . . . . . . 159 Laurent Miclet, Jacques Nicolas Pattern Structures and Their Morphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171 Lars Lumpe, Stefan E. Schmidt NextClosures: Parallel Computation of the Canonical Base . . . . . . . . . . . . . 181 Francesco Kriegel, Daniel Borchmann Probabilistic Implicational Bases in FCA and Probabilistic Bases of GCIs in EL⊥ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 Francesco Kriegel Category of isotone bonds between L-fuzzy contexts over different structures of truth degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 Jan Konecny, Ondrej Krídlo From an implicational system to its corresponding D-basis . . . . . . . . . . . . . 217 Estrella Rodríguez-Lorenzo, Kira Adaricheva, Pablo Cordero, Manuel Enciso, Angel Mora Using Linguistic Hedges in L-rough Concept Analysis . . . . . . . . . . . . . . . . . . 229 Eduard Bartl, Jan Konecny Revisiting Pattern Structures for Structured Attribute Sets . . . . . . . . . . . . . 241 Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, Alibek Sailanbayev Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 Preface Formal Concept Analysis is a method of analysis of logical data based on for- malization of conceptual knowledge by means of lattice theory. It has proved to be of interest to various applied fields such as data visualization, knowledge discovery and data mining, database theory, and many others. The International Conference “Concept Lattices and Their Applications (CLA)” is being organized since 2002 and its aim is to bring together researchers from various backgrounds to present and discuss their research related to FCA. The Twelfth edition of CLA was held in Clermont-Ferrand, France from October 13 to 16, 2015. The event was jointly organized by the LIMOS laboratory, CNRS, and Blaise Pascal university, France. This volume includes the selected papers and the abstracts of 5 invited talks. We would like to express our warmest thanks to the keynote speakers. 
This year, there were initially 39 submissions, from which the Program Com- mittee selected 20 papers which represents an acceptance rate of 51.2%. The program of the conference consisted of five keynote talks given by the follow- ing distinguished researchers: Didier Dubois, Lhouari Nourine, Gabriella Pasi, Jan Ramon, and Takeaki Uno, together with twenty communications authored by researchers from 11 countries, namely: Austria, Czech republic, France, Ger- many, Kazakhstan, Republic of South Africa, Russia, Slovakia, Spain, Sweden, and Ukraine. Each paper was reviewed by 3–4 members of the Program Committee and/or ad- ditional reviewers. We thank them all for their valuable assistance. It is planned that extended versions of the best papers will be published in a well-established journal, after another reviewing process. The success of such event is mainly due the hard work and dedication of many people and a collaboration of several institutions. We thank the contributing authors, who submitted high quality works, we thank to the CLA Steering Com- mittee, who gave us the opportunity of chairing this edition, and we thank the Program Committee, the additional reviewers, and the local Organization Com- mittee. We are also thankful to the following institutions, which have helped the organization of the twelfth edition of CLA: Coffreo, IPLeanware, Axège, and Oorace. We also thank the Easychair conference system as it made easier most of our administration tasks related to paper submission, selection, and reviewing. Last but not least we thank Jan Outrata, who offered his files for preparing the proceedings. October 2015 Sadok Ben Yahia Jan Konecny Program Chairs of CLA 2015 User Models as Personal Ontologies Gabriella Pasi Laboratorio di Information Retrieval – Università degli Studi di Milano Bicocca, Milano, Italia Abstract. The problem of defining user profiles has been a research issue since a long time; user profiles are employed in a variety of applications, including Information Fil- tering and Information Retrieval. In particular, considering the Information Retrieval task, user profiles are functional to the definition of approaches to Personalized search, which is aimed at tailoring the search outcome to users. In this context the quality of a user profile is clearly related to the effectiveness of the proposed personalized search solutions. A user profile represents the user interests and preferences; these can be captured either explicitly or implicitly. User profiles may be formally represented as bags of words, as vectors of words or concepts, or still as conceptual taxonomies. More recent approaches are aimed at formally representing user profiles as ontologies, thus allowing a richer, more structured and more expressive representation of the knowledge about the user. This talk will address the issue of the automatic definition of personal ontologies, i.e. user-related ontologies. In particular, a method that applies a knowledge extraction process from the general purpose ontology YAGO will be described. Such a process is activated by a set of texts (or just a set of words) representatives of the user interests, and it is aimed to define a structured and semantically coherent representation of the user topical preferences. The issue of the evaluation of the generated representation will be discussed too. Formal Concept Analysis from the Standpoint of Possibility Theory Didier Dubois IRIT – Université Paul Sabatier, Toulouse, France Abstract. 
Formal concept analysis (FCA) and possibility theory (PoTh) are two the- oretical frameworks that are addressing different concerns in the processing of infor- mation. Namely FCA builds concepts from a relation linking objects to the properties they satisfy, which has applications in data mining, clustering and related fields, while PoTh deals with the modeling of (graded) epistemic uncertainty. This difference of focus explains why the two settings have been developed completely independently for a very long time. However, it is possible to build a formal analogy between FCA and PoTh. Both theories heavily rely on the comparison of sets, in terms of containment or overlap. The four set-functions at work in PoTh actually determine all possible rel- ative positions of two sets. Then the FCA operator defining the set of objects sharing a set of properties, which is at the basis of the definition of formal concepts, appears to be the counterpart of the set function expressing strong (or guaranteed) possibility in PoTh. Then, it suggests that the three other set functions existing in PoTh should also make sense in FCA, which leads to consider their FCA counterparts and new fixed point equations in terms of the new operators. One of these pairs of equations, paral- leling the one defining formal concepts, define independent sub-contexts of objects and properties that have nothing in common. The parallel of FCA with PoTh can still be made more striking using a cube of op- position (a device extending the traditional square of opposition existing in logic, and exhibiting a structure at work in many theories aiming at representing some aspects of the handling of information). The parallel of FCA with PoTh extends to conceptual pattern structures, where objects, may, e.g., be described by possibilistic knowledge bases. In the talk we shall indicate various issues pertaining to FCA that could be worth studying in the future. For instance, the object-property links in formal contexts of FCA may be a matter of degree. These degrees may refer to very different notions, such as the degree of satisfaction of a gradual property, the degree of certainty that an object has, or not, a property, or still the typicality of an object with respect to a set of properties. These different intended semantics call for distinct manners of handling the degrees, as advocated in the presentation. Lastly, applications of FCA to the mining of association rules, to the fusion of conflicting pieces of information issued from multiple sources, to clustering of sets of objects on the basis of approximate concepts, or to the building of conceptual analogical proportions, will be discussed as other examples of lines of interest for further research. Clarifying Lattice Structure by Data Polishing Takeaki Uno Institute of Informatics (NII) of Japan, Tokyo, Japan Abstract. Concept lattice is made from many kinds of data. We want to use large scale data for the construction to capture wide and deep meanings but the result of the construction usually yields a quite huge lattice that is impossible to handle. Several techniques have been proposed to cope with this problem, but to best of our knowledge no algorithm attains good granularity, coverage, size distribution, and independence of concepts at the same time. We consider this difficulty comes from that the concepts are not clear in the data, so a good approach is to clarify the concepts in the data by some operations. 
In this direction, we propose “data polishing” that modify the data according to feasible hypothesis so that the concepts becomes clear. Tractable Interesting Pattern Mining in Large Networks Jan Ramon University of Leuven – INRIA, Leuven, Belgium Abstract. Pattern mining is an important data mining task. While the simplest set- ting, itemset mining, has been thoroughly studied real-world data is getting increas- ingly complex and network structured, e.g. in the context of social networks, economic networks, traffic networks, administrative networks, chemical interaction networks and biological regulatory networks. This presentation will first provide an overview of graph pattern mining work, and will then discuss two important questions. First, what is an interesting concept, and can we obtain suitable mathematical properties to order concepts in some way, obtaining a lattice or other exploitable structure? Second, how can we extract collections of interesting patterns from network-structured data in a computationally tractable way? In the case of graphs, having a lattice on the class of patterns turns out to be insufficient for computational tractability. We will discuss difficulties related to pattern matching and related to enumeration, and additional difficulties arising when considering condensed pattern mining variants. Extended Dualization: Application to Maximal Pattern Mining Lhouari Nourine Limos, Clermont-Ferrand, France Abstract. The hypergraph dualization is a crucial step in many applications in log- ics, databases, artficial intelligence and pattern mining, especially for hypergraphs or boolean lattices. The objective of this talk is to study polynomial reductions of the du- alization problem on arbitrary posets to the dualization problem on boolean lattices, for which output quasi-polynomial time algorithms exist. The main application domain concerns pattern mining problems, i.e. the identification of maximal interesting patterns in database by asking membership queries (predicate) to a database. Subset-generated complete sublattices as concept lattices? Martin Kauer and Michal Krupka Department of Computer Science Palacký University in Olomouc Czech Republic martin.kauer@upol.cz michal.krupka@upol.cz Abstract. We present a solution to the problem of finding the complete sublattice of a given concept lattice generated by given set of elements. We construct the closed subrelation of the incidence relation of the cor- responding formal context whose concept lattice is equal to the desired complete sublattice. The construction does not require the presence of the original concept lattice. We introduce an efficient algorithm for the construction and give an example and experiments. 1 Introduction and problem statement One of the basic theoretical results of Formal Concept Analysis (FCA) is the correspondence between closed subrelations of a formal context and complete sublattices of the corresponding concept lattice [2]. In this paper, we study a re- lated problem of constructing the closed subrelation for a complete sublattice generated by given set of elements. Let hX, Y, Ii be a formal context, B(X, Y, I) its concept lattice. Denote by V the complete sublattice of B(X, Y, I) generated by a set P ⊆ B(X, Y, I). As it is known [2], there exists a closed subrelation J ⊆ I with the concept lattice B(X, Y, J) equal to V . We show a method of constructing J without the need of constructing B(X, Y, I) first. 
We also provide an efficient algorithm (with polynomial time complexity), implementing the method. The paper also contains an illustrative example and results of experiments, performed on the Mushroom dataset from the UCI Machine Learning Repository. 2 Complete lattices and Formal Concept Analysis Recall that a partially ordered set U is called a complete lattice W if each V its subset P ⊆ U has a supremum andWinfimum. We denote these V by P and P , respec- tively. A subset V ⊆ U is a -subsemilattice (resp. W-subsemilattice, resp.V com- plete sublattice) W V of U , if for each P ⊆ V it holds P ∈ V (resp. P ∈ V, resp. { P, P } ⊆ V ). ? Supported by the IGA of Palacký University Olomouc, No. PrF 2015 023 c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 11–21, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 12 Martin Kauer and Michal Krupka W For a subset P ⊆ U we denote by CW P W the -subsemilattice of U generated by P , i.e. the smallest (w.r.t. set inclusion) -subsemilattice Wof U containing P . CW P always exists and V is equal to the intersection of all -subsemilattices of U containing P . The -subsemilattice of U generated by P and the complete sublattice of U generated by P are defined similarly and are denoted by CV P and CWV P , respectively. The operators CW , CV , and CWV are closure operators on the set U . Recall that a closure operator on a set X is a mapping C : 2X → 2X (where 2X is the set of all subsets of X) satisfying for all sets A, A1 , A2 ⊆ X 1. A ⊆ C A, 2. if A1 ⊆ A2 then C A1 ⊆ C A2 , 3. CC A = C A. Concept lattices have been introduced in [4], our basic reference is [2]. A (for- mal ) context is a triple hX, Y, Ii where X is a set of objects, Y a set of attributes and I ⊆ X × Y a binary relation between X and Y specifying for each object which attributes it has. For subsets A ⊆ X and B ⊆ Y we set A↑I = {y ∈ Y | for each x ∈ A it holds hx, yi ∈ I}, B ↓I = {x ∈ X | for each y ∈ B it holds hx, yi ∈ I}. The pair h↑I , ↓I i is a Galois connection between sets X and Y , i.e. it satisfies 1. If A1 ⊆ A2 then A↑2I ⊆ A↑1I , if B1 ⊆ B2 then B2↓I ⊆ B1↓I . 2. A ⊆ A↑I ↓I and B ⊆ B ↓I ↑I . The operator ↑I ↓I is a closure operator on X and the operator ↓I ↑I is a closure operator on Y . A pair hA, Bi satisfying A↑I = B and B ↓I = A is called a (formal ) concept of hX, Y, Ii. The set A is then called the extent of hA, Bi, the set B the intent of hA, Bi. When there is no danger of confusion, we can use the term “an extent of I” instead of “the extent of a concept of hX, Y, Ii”, and similarly for intents. A partial order ≤ on the set B(X, Y, I) of all formal concepts of hX, Y, Ii is defined by hA1 , B1 i ≤ hA2 , B2 i iff A1 ⊆ A2 (iff B2 ⊆ B1 ). B(X, Y, I) along with ≤ is a complete lattice and is called the concept lattice of hX, Y, Ii. Infima and suprema in B(X, Y, I) are given by * !↓I ↑I + ^ \ [ hAj , Bj i = Aj , Bj , (1) j∈J j∈J j∈J * !↑I ↓I + _ [ \ hAj , Bj i = Aj , Bj . (2) j∈J j∈J j∈J Subset-generated complete sublattices as concept lattices 13 One of immediate consequences of (1) and (2) is that the intersection of any system of extents, resp. intents, is again an extent, resp. intent, and that it can be expressed as follows: !↑I !↓I \ [ \ [ Bj = Aj , resp. Aj = Bj , j∈J j∈J j∈J j∈J for concepts hAj , Bj i ∈ B(X, Y, I), j ∈ J. Concepts h{y}↓I , {y}↓I ↑I i where y ∈ Y are attribute concepts. 
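To make the two derivation operators concrete, here is a minimal Python sketch (ours, not the authors'; the toy context and the helper names up/down are illustrative assumptions) of the operators ↑I and ↓I and of the test for a formal concept:

<pre>
# Toy illustration of the derivation operators of a context (X, Y, I);
# the incidence relation I is stored as a set of (object, attribute) pairs.

def up(A, I, Y):
    """A^up: attributes shared by all objects in A."""
    return {y for y in Y if all((x, y) in I for x in A)}

def down(B, I, X):
    """B^down: objects having all attributes in B."""
    return {x for x in X if all((x, y) in I for y in B)}

def is_concept(A, B, I, X, Y):
    """(A, B) is a formal concept iff A^up = B and B^down = A."""
    return up(A, I, Y) == B and down(B, I, X) == A

X = {'x1', 'x2'}
Y = {'y1', 'y2'}
I = {('x1', 'y1'), ('x1', 'y2'), ('x2', 'y1')}
A = down({'y1'}, I, X)                       # attribute extent {'x1', 'x2'}
print(is_concept(A, up(A, I, Y), I, X, Y))   # True: the attribute concept of y1
</pre>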
Each concept hA, Bi isVinfimum of some attribute concepts (we say the set of all attribute con- cepts is -dense in B(X, Y, I)). More specifically, T hA, Bi, is infimum of attribute concepts h{y}↓I , {y}↓I ↑I i for y ∈ B and A = y∈B {y}↓I . ↑I ↓I W Dually, concepts h{x} , {x}↑I i for x ∈ X are object T concepts, they are -dense in B(X, Y, I) and for each concept hA, Bi, B = x∈A {x}↑I . A subrelation J ⊆ I is called a closed subrelation of I if each concept of hX, Y, Ji is also a concept of hX, Y, Ii. There is a correspondence between closed subrelations of I and complete sublattices of B(X, Y, I) [2, Theorem 13]: For each closed subrelation J ⊆ I, B(X, Y, J) is a complete sublattice of B(X, Y, I), and to each complete sublattice V ⊆ B(X, Y, I) there exists a closed subrelation J ⊆ I such that V = B(X, Y, J). 3 Closed subrelations for generated sublattices Let us have a context hX, Y, Ii and a subset P of its concept lattice. Denote by V the complete sublattice of B(X, Y, I) generated by P (i.e. V = CWV P ). Our aim is to find, without computing the lattice B(X, Y, I), the closed subrelation J ⊆ I whose concept lattice B(X, Y, J) is equal to V . If B(X, Y, I) is finite, V can be obtained by alternating applications of the closure operators CW and CV to P : we set V1 = CW P , V2 = CV V1 , . . . , and, generally, VW W = CV Vi−1 for even i > 1. The i = C Vi−1 for odd i > 1 and Vi V sets Vi are -subsemilattices (for odd i) resp. -subsemilattices (for even i) of B(X, Y, I). Once Vi = Vi−1 , we have the complete sublattice V . Note that for infinite B(X, Y, I), V can be infinite even if P is finite. Indeed, denoting F L(P ) the free lattice generated by P [3] and setting X = Y = F L(P ), I = ≤ we have F L(P ) ⊆ V ⊆ B(X, Y, I). (B(X, Y, I) is the Dedekind-MacNeille completion of F L(P ) [2], and we identify P and F L(P ) with subsets of B(X, Y, I) as usual.) Now, if |P | > 2 then F L(P ) is infinite [3], and so is V . We always consider sets Vi together with the appropriate restriction of the ordering on B(X, Y, I). For each i > 0, Vi is a complete lattice (but not a complete sublattice of B(X, Y, I)). In what follows, we construct formal contexts with concept lattices isomor- phic to the complete lattices Vi , i > 0. First, we find a formal context for the complete lattice V1 . Let K1 ⊆ P × Y be given by hhA, Bi, yi ∈ K1 iff y ∈ B. (3) 14 Martin Kauer and Michal Krupka As we can see, rows in the context hP, Y, K1 i are exactly intents of concepts from P . Proposition 1. The concept lattice B(P, Y, K1 ) and the complete lattice V1 are isomorphic. The isomorphism assigns to each concept hB ↓K1 , Bi ∈ B(P, Y, K1 ) the concept hB ↓I , Bi ∈ B(X, Y, I). Proof. Concepts from V1 are exactly those with intents equal to intersections of intents of concepts from P . The same holds for concepts from B(P, Y, K1 ). Next we describe formal contexts for complete lattices Vi , i > 1. All of the contexts are of the form hX, Y, Ki i, i.e. they have the set X as the set of objects and the set Y as the set of attributes (the relation K1 is different in this regard). The relations Ki for i > 1 are defined in a recursive manner: x ∈ {y}↓Ki−1 ↑Ki−1 ↓I for even i, for i > 1, hx, yi ∈ Ki iff (4) y ∈ {x}↑Ki−1 ↓Ki−1 ↑I for odd i. Proposition 2. For each i > 1, 1. Ki ⊆ I, 2. Ki ⊆ Ki+1 . Proof. We will prove both parts for odd i; the assertions for even i are proved similarly. 1. Let hx, yi ∈ Ki . From {y} ⊆ {y}↓Ki−1 ↑Ki−1 we get {y}↓Ki−1 ↑Ki−1 ↓I ⊆ {y}↓I . 
Thus, x ∈ {y}↓Ki−1 ↑Ki−1 ↓I implies x ∈ {y}↓I , which is equivalent to hx, yi ∈ I. 2. As Ki ⊆ I, we have {y}↓Ki ↑Ki ↓I ⊇ {y}↓Ki ↑Ki ↓Ki = {y}↓Ki . Thus, x ∈ {y}↓Ki yields x ∈ {y}↓Ki ↑Ki ↓I . We can see that the definitions of Ki for even and odd i > 1 are dual. In what follows, we prove properties of Ki for even i and give the versions for odd i without proofs. First we give two basic properties of Ki that are equivalent to the defini- tion. The first one says that Ki can be constructed as a union of some specific rectangles, the second one will be used frequently in what follows. Proposition 3. Let i > 1. S 1. If i is even then Ki = y∈Y {y}↓Ki−1 ↑Ki−1 ↓I × {y}↓Ki−1 ↑Ki−1 . If i is odd then S Ki = x∈X {x}↑Ki−1 ↓Ki−1 ↑I × {x}↑Ki−1 ↓Ki−1 . 2. If i is even then for each y ∈ Y , {y}↓Ki = {y}↓Ki−1 ↑Ki−1 ↓I . If i is odd then for each x ∈ X, {x}↑Ki = {x}↑Ki−1 ↓Ki−1 ↑I . Proof. We will prove only the assertions for even i. 1. TheS “⊆” inclusion is evident. We will prove the converse inclusion. If hx, yi ∈ y0 ∈Y {y 0 }↓Ki−1 ↑Ki−1 ↓I × {y 0 }↓Ki−1 ↑Ki−1 then there is y 0 ∈ Y such that x ∈ {y 0 }↓Ki−1 ↑Ki−1 ↓I and y ∈ {y 0 }↓Ki−1 ↑Ki−1 . The latter implies {y}↓Ki−1 ↑Ki−1 ⊆ Subset-generated complete sublattices as concept lattices 15 {y 0 }↓Ki−1 ↑Ki−1 , whence {y 0 }↓Ki−1 ↑Ki−1 ↓I ⊆ {y}↓Ki−1 ↑Ki−1 ↓I . Thus, x belongs to {y}↓Ki−1 ↑Ki−1 ↓I and by definition, hx, yi ∈ Ki . 2. Follows directly from the obvious fact that x ∈ {y}↓Ki if and only if hx, yi ∈ Ki . A direct consequence of 2. of Prop. 3 is the following. Proposition 4. If i is even then each extent of Ki is also an extent of I. If i is odd then each intent of Ki is also an intent of I. Proof. Let i be even. 2. of Prop. 3 implies that each attribute extent of Ki is an extent of I. Thus, the proposition follows from the fact that each extent of Ki is an intersection of attribute extents of Ki . The statement for odd i is proved similarly except for i = 1 where it follows by definition. Proposition 5. Let i > 1. If i is even then for each y ∈ Y it holds {y}↓Ki−1 ↑Ki−1 = {y}↓Ki ↑Ki = {y}↓Ki ↑I . If i is odd then for each x ∈ X we have {x}↑Ki−1 ↓Ki−1 = {x}↑Ki ↓Ki = {x}↑Ki ↓I . Proof. We will prove the assertion for even i. By Prop. 4, {y}↓Ki is an extent of I. The corresponding intent is {y}↓Ki ↑I = {y}↓Ki−1 ↑Ki−1 ↓I ↑I = {y}↓Ki−1 ↑Ki−1 (5) (by Prop. 4, {y}↓Ki−1 ↑Ki−1 is an intent of I). Moreover, as Ki ⊆ I (Prop. 2), we have {y}↓Ki ↑Ki ⊆ {y}↓Ki ↑I . (6) We prove {y}↓Ki−1 ↑Ki−1 ⊆ {y}↓Ki ↑Ki . Let y 0 ∈ {y}↓Ki−1 ↑Ki−1 . It holds {y 0 }↓Ki−1 ↑Ki−1 ⊆ {y}↓Ki−1 ↑Ki−1 (↓Ki−1 ↑Ki−1 is a closure operator). Thus, {y}↓Ki−1 ↑Ki−1 ↓I ⊆ {y 0 }↓Ki−1 ↑Ki−1 ↓I and so by 2. of Prop. 3, {y}↓Ki ⊆ {y 0 }↓Ki . Applying ↑Ki to both sides we obtain {y 0 }↓Ki ↑Ki ⊆ {y}↓Ki ↑Ki proving y 0 ∈ {y}↓Ki ↑Ki . This, together with (5) and (6), proves the proposition. Proposition 6. Let i > 1 be even. Then for each intent B of Ki−1 it holds B ↓Ki = B ↓I . Moreover, if B is an attribute intent (i.e. there is y ∈ Y such that B = {y}↓Ki−1 ↑Ki−1 ) then hB ↓Ki , Bi is a concept of I. If i > 1 is odd then for each extent A of Ki−1 it holds A↑Ki = A↑I . If A is an object extent (i.e. there is x ∈ X such that A = {x}↑Ki−1 ↓Ki−1 ) then hA, A↑Ki i is a concept of I. 16 Martin Kauer and Michal Krupka Proof.S We will prove the assertion for even S i. Let B be an intent of Ki−1 . It holds B = y∈B {y} (obviously) and hence B = y∈B {y}↓Ki−1 ↑Ki−1 (since ↓Ki−1 ↑Ki−1 is a closure operator). Therefore (2. of Prop. 
3), !↓Ki [ \ \ B ↓Ki = {y} = {y}↓Ki = {y}↓Ki−1 ↑Ki−1 ↓I y∈B y∈B y∈B !↓I [ = {y}↓Ki−1 ↑Ki−1 = B ↓I , y∈B proving the first part. Now let B be an attribute intent of Ki−1 , B = {y}↓Ki−1 ↑Ki−1 . By 2. of Prop. 3 it holds B ↓I = {y}↓Ki . By Prop. 5, B ↓I ↑I = {y}↓Ki ↑I = {y}↓Ki−1 ↑Ki−1 = B. Now we turn to complete lattices Vi defined above. We have already shown in Prop. 1 that the complete lattice V1 and the concept lattice B(P, Y, K1 ) are isomorphic. Now we give a general result for i > 0. Proposition 7. For each i > 0, the concept lattice B(P, Y, Ki ) (for i = 1) resp. B(X, Y, Ki ) (for i > 1) and the complete lattice Vi are isomorphic. The isomorphism is given by hB ↓Ki , Bi 7→ hB ↓I , Bi if i is odd and by hA, A↑Ki i 7→ hA, A↑I i if i is even. Proof. We will proceed by induction on i. The base step i = 1 has been already proved in Prop. 1. We will do the induction step for even i, the other case is dual. As Vi = CV Vi−1 , we have to 1. show that the set W = {hA, A↑I i | A is an extent of Ki } is a subset of B(X, Y, I), containing Vi−1 and 2. find for each hA, A↑Ki i ∈ B(X, Y, Ki ) a set of concepts from Vi−1 whose infimum in B(X, Y, I) has extent equal to A. 1. By Prop. 4, each extent of Ki is also an extent of I. Thus, W ⊆ B(X, Y, I). If hA, Bi ∈ Vi−1 then by the induction hypothesis B is an intent of Ki−1 (i − 1 is odd). By Prop. 6, B ↓Ki = B ↓I = A is an extent of Ki and so hA, Bi ∈ W . 2. Denote B = A↑Ki . For each y ∈ Y , {y}↓Ki−1 ↑Ki−1 is an intent of Ki−1 . By Prop. 3 and the induction hypothesis, h{y}↓Ki , {y}↓Ki−1 ↑Ki−1 i = h{y}↓Ki−1 ↑Ki−1 ↓I , {y}↓Ki−1 ↑Ki−1 i ∈ Vi−1 . T of the infimum (taken in B(X, Y, I)) of these concepts for y ∈ B Now, the extent is equal to y∈B {y}↓Ki = B ↓Ki = A. If X and Y are finite then 2. of Prop. 2 implies there is a number n > 1 such that Kn+1 = Kn . Denote this relation by J. According to Prop. 7, there are two isomorphisms of the concept lattice B(X, Y, J) and Vn = Vn+1 = V . We will show that these two isomorphisms coincide and B(X, Y, J) is actually equal to V . This will also imply J is a closed subrelation of I. Subset-generated complete sublattices as concept lattices 17 Proposition 8. B(X, Y, J) = V . Proof. Let hA, Bi ∈ B(X, Y, J). It suffices to show that hA, Bi ∈ B(X, Y, I). As J = Kn+1 = Kn we have J = Ki for some even i and also J = Ki for some odd i. We can therefore apply both parts of Prop. 6 to J obtaining A = B ↓J = B ↓I and B = A↑J = A↑I . Algorithm 1 uses our results to compute the subrelation J for given hX, Y, Ii and P . Algorithm 1 Computing the closed subrelation J. Input: formal context hX, Y, Ii, subset P ⊆ B(X, Y, I) Output: the closed subrelation of J ⊆ I whose concept lattice is equal to CWV P J ← relation K1 (3) i←1 repeat L←J i←i+1 if i is even then J ← {hx, yi ∈ X × Y | x ∈ {y}↓L ↑L ↓I } else J ← {hx, yi ∈ X × Y | y ∈ {x}↑L ↓L ↑I } end if until i > 2 & J = L return J Proposition 9. Algorithm 1 is correct and terminates after at most max(|I| + 1, 2) iterations. Proof. Correctness follows from Prop. 8. The terminating condition ensures we compare J and L only when they are both subrelations of the context hX, Y, Ii (after the first iteration, L is a subrelation of hP, Y, K1 i and the comparison would not make sense). After each iteration, L holds the relation Ki−1 and J holds Ki (4). Thus, except for the first iteration, we have L ⊆ J before the algorithm enters the terminating condition (Prop. 2). As J is always a subset of I (Prop. 2), the number of iterations will not be greater than |I| + 1. 
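For readers who prefer code to pseudocode, the following Python sketch mirrors Algorithm 1; it is our own reading of the algorithm, not the authors' implementation, and it assumes relations are stored as sets of pairs and P as a list of (extent, intent) pairs:

<pre>
# Sketch of Algorithm 1 (our reading, not the authors' code): compute the
# closed subrelation J of I whose concept lattice is the complete sublattice
# generated by P.  Relations are sets of pairs; P is a list of (extent, intent).

def up(A, R, attrs):    # A^{up_R}: attributes common to all objects in A
    return {y for y in attrs if all((x, y) in R for x in A)}

def down(B, R, objs):   # B^{down_R}: objects having all attributes in B
    return {x for x in objs if all((x, y) in R for y in B)}

def closed_subrelation(X, Y, I, P):
    # K1 (eq. (3)): objects are the concepts of P, rows are their intents
    L_objs = list(range(len(P)))
    L = {(k, y) for k, (_, B) in enumerate(P) for y in B}
    i = 1
    while True:
        i += 1
        if i % 2 == 0:   # even step: J = {(x,y) | x in {y}^{dL uL dI}}
            J = {(x, y) for y in Y
                 for x in down(up(down({y}, L, L_objs), L, Y), I, X)}
        else:            # odd step:  J = {(x,y) | y in {x}^{uL dL uI}}
            J = {(x, y) for x in X
                 for y in up(down(up({x}, L, Y), L, X), I, Y)}
        if i > 2 and J == L:
            return J
        L, L_objs = J, X
</pre>

On the context of Fig. 1 with P = {c1, c2, c3}, this returns the relation K3 = K4 = J shown in Fig. 4.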
The only exception is I = ∅. In this case, the algorithm will terminate after 2 steps due to the first part of the terminating condition. 4 Examples and experiments Let hX, Y, Ii be the formal context from Fig. 1 (left). The associated con- cept lattice B(X, Y, I) is depicted in Fig. 1 (right). Let P = {c1 , c2 , c3 } where 18 Martin Kauer and Michal Krupka I y1 y2 y3 y4 y5 x1 × × y1 y2 y3 x2 × × × x5 x4 x3 × × x4 × y4 y5 x5 × x1 x2 x3 Fig. 1: Formal context hX, Y, Ii (left) and concept lattice B(X, Y, I), together with a subset P ⊆ B(X, Y, I), depicted by filled dots (right). c1 = h{x1 }, {y1 , y4 }i, c2 = h{x1 , x2 }, {y1 }i, c3 = h{x2 , x5 }, {y2 }i are concepts from B(X, Y, I). These concept are depicted in Fig. 1 by filled dots. First, we construct the context hP, Y, K1 i (3). Rows in this context are intents of concepts from P (see Fig.W2, left). The concept lattice B(P, Y, K1 ) (Fig. 2, center) is isomorphic to the -subsemilattice V1 = CW P ⊆ B(X, Y, I) (Fig. 2, right). It is easy to see that elements of B(P, Y, K1 ) and corresponding elements K1 y1 y2 y3 y4 y5 y1 y2 y1 y2 y3 c1 × × x5 x4 c2 c3 c2 × c3 × y4 y4 y5 c1 x1 x2 x3 y3 , y5 Fig. 2: Formal W context hP, Y, K1 i (left), the concept lattice B(P, Y, K1 ) (center) and the -subsemilattice CW P ⊆ B(X, Y, I), isomorphic to B(P, Y, K1 ), depicted by filled dots (right). of V1 have the same intents. Next step is to construct the subrelation K2 ⊆ I. By (4), K2 consists of ele- ments hx, yi ∈ X ×Y Vsatisfying x ∈ {y}↓K1 ↑K1 ↓I . The concept lattice B(X, Y, K2 ) is isomorphic to the -subsemilattice V2 = CV V1 ⊆ B(X, Y, I). K2 , B(X, Y, K2 ), and V2 are depicted in Fig. 3. The subrelation K3 ⊆ I is computed again by (4). K3 consists of elements hx, yi ∈ X × Y satisfying y ∈ {x}↑K2 ↓K2 ↑I . The result can be viewed in Fig. 4. Subset-generated complete sublattices as concept lattices 19 K2 y1 y2 y3 y4 y5 x 3 , x4 x1 × × y1 y2 y1 y2 y3 x2 × × · x5 x4 x5 x3 · · x4 · y4 y4 y5 x5 × x1 x2 x1 x2 x3 y3 , y5 Fig. 3: Formal V context hX, Y, K2 i (left), the concept lattice B(X, Y, K2 ) (center) and the -subsemilattice V2 = CV V1 ⊆ B(X, Y, I), isomorphic to B(X, Y, K2 ), depicted by filled dots (right). Elements of I \ K2 are depicted by dots in the table. K3 y1 y2 y3 y4 y5 x 3 , x4 x1 × × y1 y2 y1 y2 y3 x2 × × × x5 x4 x5 x3 · · x4 · y4 y3 y4 y5 x5 × x1 x2 x1 x2 x3 y5 Fig. 4: Formal W context hX, Y, K3 i (left), the concept lattice B(X, Y, K3 ) (center) and the -subsemilattice V3 = CW V2 ⊆ B(X, Y, I), isomorphic to B(X, Y, K3 ), depicted by filled dots (right). Elements of I \ K3 are depicted by dots in the table. As K3 = K4 = J, it is a closed subrelation of I and V4 = CV V3 = V3 is a complete sublattice of B(X, Y, I). Notice that already V3 = V2 but K3 6= K2 . We cannot stop and have to perform another step. After computing K4 we can easily check that K4 = K3 . We thus obtained the desired closed subrelation J ⊆ I and V4 = V3 is equal to the desired complete sublattice V ⊆ B(X, Y, I). In [1], the authors present an algorithm for computing a sublattice of a given lattice generated by a given set of elements. Originally, we planned to include a comparison between their approach and our Alg. 1. Unfortunately, the algo- rithm in [1] turned out to be incorrect. It is based on the false claim that (using our notation) the smallest element V of V , which is greater than or equal to an element v ∈ B(X, Y, I), is equal to {p ∈ P | p ≥ v}. The algorithm from [1] fails e.g. on the input depicted in Fig. 5. 20 Martin Kauer and Michal Krupka p2 p1 v p3 Fig. 
5: An example showing that the algorithm from [1] is incorrect. A complete lattice with a selected subset P = {p1 , p2 , p3 }. The least element of the sublattice V generated by P which is greater than or equal to v is p1 ∨ v. The algorithm incorrectly chooses p2 and “forgets” to add p1 ∨ v to the output. The time complexity of our algorithm is clearly polynomial w.r.t. |X| and |Y |. In Prop. 9 we proved that the number of iterations is O(|I|). Our experi- ments indicate that this number might be much smaller in the practice. We used the Mushroom dataset from the UC Irvine Machine Learning Repository, which contains 8124 objects, 119 attributes and 238710 concepts. For 39 different sizes of the set P , we selected randomly its elements, 1000 times for each of the sizes. For each P , we ran our algorithm and measured the number n of iterations, af- ter which the algorithm terminated. We can see in Tbl. 1 maximal and average values of n, separately for each size of P . From the results in Tbl. 1 we can see |P |(%) Max n Avg n |P |(%) Max n Avg n |P |(%) Max n Avg n 0.005 11 7 0.25 6 3 0.90 5 3 0.010 10 6 0.30 6 3 0.95 4 3 0.015 10 5 0.35 6 3 1 4 3 0.020 10 5 0.40 5 3 2 4 3 0.025 8 5 0.45 5 3 3 4 3 0.030 8 4 0.50 5 3 4 4 3 0.035 8 4 0.55 6 3 5 4 2 0.040 7 4 0.60 5 3 6 4 2 0.045 10 4 0.65 4 3 7 4 2 0.050 8 4 0.70 5 3 8 3 2 0.100 6 4 0.75 6 3 9 3 2 0.150 6 4 0.80 6 3 10 3 2 0.200 6 4 0.85 4 3 11 3 2 Table 1: Results of experiments on Mushrooms dataset. The size of P is given by the percentage of the size of the concept lattice. that the number of iterations (both maximal and average values) is very small compared to the number of objects and attributes. There is also an apparent decreasing trend of number of iterations for increasing size of P . Subset-generated complete sublattices as concept lattices 21 5 Conclusion and open problems An obvious advantage of our approach is that we avoid computing the whole con- cept lattice B(X, Y, I). This should lead to shorter computation time, especially if the generated sublattice V is substantially smaller than B(X, Y, I). The following is an interesting observation and an open problem. It is men- tioned in [2] that the system of all closed subrelations of I is not a closure system and, consequently, there does not exist a closure operator assigning to each subrelation of I a least greater (w.r.t. set inclusion) closed subrelation. This is indeed true as the intersection of closed subrelations need not be a closed subrelation. However, our method can be easily modified to compute for any subrelation K ⊆ I a closed subrelation J ⊇ K, which seems to be minimal in some sense. Indeed, we can set K1 = K and compute a relation J as described by Alg. 1, regardless of the fact that K does not satisfy our requirements (intents of K need not be intents of I). The relation J will be a closed subrelation of I and it will contain K as a subset. Also note that the dual construction leads to a different closed subrelation. Another open problem is whether it is possible to improve the estimation of the number of iterations of Alg. 1 from Prop. 9. In fact, we were not able to construct any example with the number of iterations greater than min(|X|, |Y |). References 1. Bertet, K., Morvan, M.: Computing the sublattice of a lattice generated by a set of elements. In: Proceedings of Third International Conference on Orders, Algorithms and Applications. Montpellier, France (1999) 2. Ganter, B., Wille, R.: Formal Concept Analysis – Mathematical Foundations. 
Springer (1999) 3. Whitman, P.M.: Free lattices II. Annals of Mathematics 43(1), pp. 104–115 (1942) 4. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (ed.) Ordered Sets, pp. 445–470. Boston (1982) RV-Xplorer: A Way to Navigate Lattice-Based Views over RDF Graphs Mehwish Alam, Amedeo Napoli, and Matthieu Osmuk LORIA (CNRS – Inria Nancy Grand Est – Université de Lorraine) BP 239, Vandoeuvre-lès-Nancy, F-54506, France {mehwish.alam,amedeo.napoli,matthieu.osmuk@loria.fr} Abstract. More and more data are being published in the form of ma- chine readable RDF graphs over Linked Open Data (LOD) Cloud acces- sible through SPARQL queries. This study provides interactive naviga- tion of RDF graphs obtained by SPARQL queries using Formal Concept Analysis. With the help of this View By clause a concept lattice is cre- ated as an answer to the SPARQL query which can then be visualized and navigated using RV-Xplorer (Rdf View eXplorer). Accordingly, this paper discusses the support provided to the expert for answering cer- tain questions through the navigation strategies provided by RV-Xplorer. Moreover, the paper also provides a comparison of existing state of the art approaches. Keywords: RV-Xplorer, Lattice Navigation, SPARQL Query Views, Formal Concept Analysis 1 Introduction Recently, Web Data is turning into “Web of Data” which contains the meta data about the web documents present in HTML and textual format. The goal behind this “Web of Data” is to make already existing data to be usable by not only human agents but also by machine agents. With the effort of Semantic Web community, an emerging source of meta data is published on-line called as Linked Open Data (LOD) in the form of RDF data graphs. There has been a huge explosion in LOD in recent past and is still growing. Up until 2014, LOD contains billions of triples. SPARQL1 is the standard query language for accessing RDF graphs. It integrates several resources to generate the required answers For instance, queries such as What are the movements of the artists displayed in Musee du Louvre? can not be answered by standard search engines. Nowadays, Google has introduced a way of answering questions directly such as currency conversion, calculator etc. but such queries are answered based on most frequent queries posed by the experts. When an expert poses a query to a search engine too many results are re- trieved for the expert to navigate through, which may be cumbersome when a 1 http://www.w3.org/TR/rdf-sparql-query/ c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 23–34, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 24 Mehwish Alam, Amedeo Napoli and Matthieu Osmuk expert has to go through a number of links to find the interesting ones, hence leading to the problem of information overload [3]. Same is the case with the an- swers obtained by SPARQL query with the SELECT [8]. Even if there are hundreds of answers, it becomes harder for the expert to find the interesting patterns. The current study is a continuation of Lattice-Based View Access (LBVA) [1] which provides a view over RDF graphs through SPARQL queries to give complete understanding of a part of RDF graph that expert wants to analyze with the help of Formal Concept Analysis. LBVA takes the SPARQL query and returns a concept lattice called as view instead of the results of the SPARQL query. 
These views created by LBVA are machine as well as human processable. Accordingly, RV-Xplorer (Rdf View eXplorer) exploits the powerful mathematical structure of these concept lattices thus making it interpretable by human. It also allows human agents to interact with the concept lattice and perform navigation. The expert can answer various questions while navigating the concept lattice. RV- Xplorer provides several ways to guide the expert during this navigation process. This paper is structured as follows: section 2 gives the motivating example, section 3 introduces the required background knowledge to understand the rest of the paper. Section 4 details the elements of Graphical User Interface while section 5 and section 6 details the navigation operations as well as other func- tionalities supported by RV-Xplorer. Section 7 briefly discusses the related work. Finally, section 8 concludes the paper and discusses the future work. 2 Motivating Example Consider a scenario where an expert wants to pose following questions based on articles published in conferences or journals from a team working on data mining. In the current study, we extract the papers published in “Orpailleur Team“ in LORIA, Nancy, France. Following are the questions in which an expert may be interested in: – What are the main research topics in the team and the key researchers w.r.t. these topics, for example, researchers involved in most of the papers in a prominent topic? – What is the major area of the research of the leader of the team and various key persons? – Can the diversity of the team leader and key persons be detected? – Given a paper is it possible to retrieve similar papers published in the team? – Who are the groups of persons working together? – What are the research tendencies and possibly the forthcoming and new research topics (for example, single and recent topics which are not in the continuation of the present topics)? Such kind of questions can not be answered by Google. In this paper we want to answer such kind of questions through lattice navigation supported by RV- Xplorer which is built from an initial query and then is explored by the expert according to her preferences. RV-Xplorer: A Way to Navigate Lattice-Based Views over RDF Graphs 25 3 Preliminaries Linked Open Data: Linked Open Data (LOD) [2] is the way of publishing structured data in the form of RDF graphs. Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as t = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where s is a subject, p is a predicate and o is an object. A finite set of RDF triples is called as RDF Graph G such that G = (V, E), where V is a set of vertices and E is a set of labeled edges. Each pair of vertices connected through a labeled edge keeps the information of a statement. Each statement is represented as hsubject, predicate, objecti referred to as an RDF Triple. V includes subject and object while E includes the predicate. SPARQL: A standard query language for RDF graphs is SPARQL2 which mainly focuses on graph matching. A SPARQL query is composed of two parts the head and the body. The body of the query contains the Basic Graph Patterns (present in the WHERE clause of the query). These graph patterns are matched against the RDF graph and the matched graph is retrieved and manipulated according to the conditions given in the query. The head of the query is an expression which indicates how the answers of the query should be constructed. 
Let us consider a query from the scenario in section 2, Q = Who is the team leader of the data mining team in loria. For answering such questions consider an RDF resource containing all the papers ever published in the data mining team. With the help of SPARQL query the papers published in the last 5 years in English language can be extracted. The SPARQL representation of the query Q is shown in listing 1.1. Lines 1, 2 keep the information about the prefixes used in the rest of the query. Line 5, 6 and 7 retrieve all the papers with their authors and keywords. Line 8 and 9 retrieve the publication year of the paper and filter according to the condition. 1 PREFIX rdfs : < http :// www . w3 . org /2000/01/ rdf - schema # > 2 PREFIX dc : < http :// purl . org / dc / terms / > 3 SELECT distinct ? title ? keywords ? author 4 where { 5 ? paper dc : creator ? author . 6 ? paper dc : subject ? keywords . 7 ? paper dc : title ? title . 8 ? paper dcterms : issued ? pub li ca ti o nY ea r 9 FILTER ( xsd : date (? publicati on Ye a r ) >= ’2011 -01 -01 ’^^ xsd : date ) } Listing 1.1: SPARQL for extracting triples. Lattice-Based View Access: Lattice-Based View Access [1], allows the clas- sification of SPARQL query results into a concept lattice, referred to as a view, for data analysis, navigation, knowledge discovery and information retrieval pur- poses. It introduces a new clause VIEW BY which enhances the functionality of already existing GROUP BY clause in SPARQL query by adding sophisticated classification and Knowledge Discovery aspects. 2 http://www.w3.org/TR/rdf-sparql-query/ 26 Mehwish Alam, Amedeo Napoli and Matthieu Osmuk The variable appearing in the VIEW BY clause of the SPARQL query is re- ferred to as object variable3 The rest of the variables are the attribute variables. Then the answer tuples obtained by the query are processed based on object and the attribute variables. The values obtained for the object variable are mapped to the objects in the formal context K = (G, M, I) and the answers obtained for attribute variables are mapped to the attributes in the context. Consider the query given in listing 1.1 with classification capabilities i.e., containing the clause VIEW BY ?title then the set of variables in the SELECT clause can be given as V = {?title, ?keyword, ?author}. The object variable will be ?title and attribute variable will be ?keyword and ?author. After applying LBVA, the ob- jects contain the titles of the paper and the attributes are the set of keywords and authors in the context. From this context, the concept lattice is built which is referred to as a Lattice-Based View. LBVA is oriented towards the classification of SPARQL queries, but we can interpret the present research activity at a more general level, the classification of LOD. Accordingly, what is proposed in the paper is a tool for navigating a classification of LOD. 4 The RV-Xplorer RV-Xplorer (Rdf View eXplorer) is a tool for navigating concept lattices gener- ated by the answers of SPARQL queries over part of RDF graphs using Lattice- Based View Access. Accordingly, this tool provides navigation to the expert over the classification of SPARQL query answers for analyzing the data, finding hidden regularities and answering several questions. On each navigation step it guides the expert in decision making and performing selection to avoid un- necessary selections. It also allows the user to change her point of view while navigating i.e., navigation by extent. 
Moreover, it also allows the expert to only focus on the specific and interesting part of the concept lattice by allowing her to hide the part of lattice which is not interesting for her. RV-Xplorer is a web-based tool for building concept lattices. On the client side it uses D3.js which stands for Data-Driven Documents and is based on Javascript for developing interactive data visualizations in modern web browsers. It also uses model-view-controller (MVC) which separates presentation, data and logical components. On the server side we use PHP and MySQL for computing and storing the data. Generally, data can be a graph or pattern generated by pattern mining algorithms etc. Currently, this tool is not publicly available. Figure 1 shows the overall interface of RV-Xplorer (Rdf View eXplorer) which consists of three parts: (1) the middle part is called local view which shows detailed description of the selected concept allowing interaction, navigation and level-wise navigation, (2) the left panel is referred to as Spy showing the global view of the concept lattice and (3) the lower left is the summarization index for guiding the expert in making decision about which node to choose in the next 3 The object here refers to the object in FCA. RV-Xplorer: A Way to Navigate Lattice-Based Views over RDF Graphs 27 level by showing the statistics of the next level. For the running scenario, the concept lattice is also available on-line4 . 4.1 Local View Each selected node in the concept lattice is shown in the middle part of the interface displaying complete information. Let c be the selected concept such that c ∈ C where C is the set of concepts in the complete lattice L = (C, ≤) then a local view shows the complete information about this concept i.e., the extent, intent and the links to the super-concept and the sub-concepts. The set of super and sub-concepts are linked to the selected node where each link represents the partially ordered relation ≤. By default, the top node is the selected node and is shown in local view. Figure 1 (below) shows the selected concept, the orange part defines the label of the selected node which is the entry point for the concept, the pink and yellow parts give the labels of the super-concepts and sub-concepts connected to the selected concept respectively. The green and blue part give the information about the intent and the extent respectively. 4.2 Spy A global view in left panel shows the map of the complete lattice L = (C, ≤) for a particular SPARQL query over an RDF Graph. It tracks the position of the expert in the concept lattice and the path followed by the expert to reach the current concept. It also helps in several navigation tasks such as direct navigation, changing navigation space and navigation between point-of-views. All of these navigation modes are discussed in section 5. 4.3 Statistics about the next level The statistics about the next level are computed with the help of a summariza- tion index which depicts the information about the distribution of the objects in the extent of the selected concept in the linked sub-concepts i.e., concepts in the next level of the concept lattice. Let ci be a concept in the next level where i ∈ {1, . . . , n} and n is the number of concepts in the next level. ext(ci ) is the extent of the concept then |ext(ci )| is the size of the extent. Finally, the statistics about the next level are computed with the help of summarization index. 
|ext(ci )| summarization index = P × 100 (1) j={1,...,n} |ext(cj )| P Here, j={1,...,n} |ext(cj )| is the sum of extent size of all the concepts in the next level. The sum of summarization index for all the sub-concept adds to 100%. In Figure 1, the percentages are represented in the form of a pie-chart 4 http://rv-xplorer.loria.fr/#/graph/orpailleur_paper/1/ 28 Mehwish Alam, Amedeo Napoli and Matthieu Osmuk Fig. 1: Figure above shows the basic interface of RV-Xplorer displaying the top concept. The Figure below shows the local view of K#52, the concept containing all the papers authored by Amedeo Napoli. RV-Xplorer: A Way to Navigate Lattice-Based Views over RDF Graphs 29 which shows the distribution. The sub-concept containing the most elements in the extent has the highest percentage and hence has the biggest part in the pie chart. 5 Navigation Operations In this section we detail some of the classical [4] as well as advanced navigation operations that are implemented in RV-Xplorer. Navigation can be done locally with a parallel operation which is shown globally through local and global views. Navigation operations allow the expert to locate particular pieces of information which helps in obtaining several answers of the expert questions as well as anal- ysis of the data at hand. Initially, the selected concept is the top concept which contains all the objects. 5.1 Guided Downward (Drill down)/ Upward Navigation (Roll-up): The local view provides expert with the drilling down operation which is achieved by selecting the sub-concepts given in yellow part of local view. RV-Xplorer guides the expert in drilling down the concept lattice by showing contents of the sub-concept to the expert before selecting the node on mouse over. Another added guidance provided to the expert is with the help of the summarization index which gives the statistics about the next level. This way the expert can avoid the attributes or the navigation path which may lead to uninteresting results. The local view also allows the expert to roll-up from the specific concept to the general concept. A super-concept can be selected following the link given in the view. Consider the running scenario discussed in section 2 where the expert wants to know who are researchers having main influences in the team? by analyzing the publications of this particular team. Initially, the selected concept in the local view is the top concept (see Figure 1 (above)). Now it can be seen from the summarization index that most of the papers are contained in K#52. On mouse over on K#52 it shows that this concept keeps all the papers published by Amedeo Napoli. From here it can be safely concluded that Amedeo Napoli is the leader of the team. Similarly, several key team members can be identified on the same level such as supervisors etc. If the expert wants to view the papers published by Amedeo Napoli, a downward navigation is performed by selecting concept K#52. With the help of the summarization index another question can be answered i.e., what are the main research topics of these researchers?. Again by consulting the index it can be seen that K#4 keeps the largest percentage of papers published by Amedeo Napoli (see Figure 1 (below)) and the keyword in this concept is Formal Concept Analysis meaning that the main area of research of Amedeo Napoli is Formal Concept Analysis. However, there are many other areas of research on which he has worked, which shows the diversity of authors based on the area of research he has published in. 
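As a side note, the summarization index of equation (1) admits a very small implementation; the data layout below (each lower neighbour given by its extent) is our assumption, not RV-Xplorer's internal representation:

<pre>
# Sketch of the summarization index of eq. (1): the share of objects that
# each lower neighbour of the selected concept carries, as a percentage.

def summarization_index(sub_extents):
    """sub_extents: list of extents (sets of objects) of the lower neighbours."""
    total = sum(len(e) for e in sub_extents)
    return [100.0 * len(e) / total for e in sub_extents] if total else []

# e.g. three lower neighbours with 6, 3 and 1 objects -> [60.0, 30.0, 10.0]
print(summarization_index([set(range(6)), set(range(3)), {0}]))
</pre>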
Moreover, the sub-lattice connected to the concept K#4 keeps information about the community of authors with whom he publishes the most, about which topics, and about which variants of Formal Concept Analysis. Now, if the expert wants to retrieve all the papers published by Amedeo Napoli, she can go back to K#52.

5.2 Direct Navigation

The Spy on the left part of RV-Xplorer (see Figure 1) allows the expert to perform direct navigation. If the expert has navigated too deep in the view while performing multiple drill-down operations, then the Spy, which keeps track of the current position of the expert, shows all the paths from the selected concept to the top concept and allows the expert to jump directly from one concept to another linked concept without performing level-wise navigation. Unlike drill-down and roll-up, direct navigation allows the expert to skip two or more hops and select a more general or more specific concept directly.

These three navigation modes are very common and appear in many navigational tools built for concept lattices, such as Camelis [11] and CREDO [5], which may or may not be designed for a specific purpose. The main difference between RV-Xplorer and these two approaches (and most other navigational tools) is that they use a folder-tree display, whereas we keep the original structure of the concept lattice. An added advantage of RV-Xplorer is that these navigation modes are guided at each step: the interface shows the expert what is contained in the next node as well as the statistics about the next level. This way the interface guides the expert in choosing the nodes that are interesting for her, reducing unnecessary navigation and backtracking to inspect details.

5.3 Navigating Across Points of View

The current interface allows the expert to toggle between points of view, i.e., at any point the expert can start exploring the lattice with respect to the objects (extents) in the concept lattice. Let c be the selected concept and suppose the expert is interested in an object g1 ∈ ext(c), where ext(c) is the extent of the selected concept. If the expert hovers her mouse over this object in the local view, the Spy highlights all the concepts in which this object is present, and the object concept of g1 is highlighted in red. For instance, suppose the selected concept contains the keyword data dependencies in its intent, the expert is interested in the paper Computing Similarity Dependencies with Pattern Structures, and she wants to retrieve all the related or similar papers; on mouse-hover, the interface highlights all the concepts containing this paper. She then selects the concept highlighted in red, i.e., the object concept of this paper. The right side of Figure 2 shows the highlighted object concept of Computing Similarity Dependencies with Pattern Structures in RV-Xplorer. After this concept is selected, the Spy highlights all the paths from this concept down to the bottom and up to the top, which is in fact the sub-lattice associated with this paper. All the objects contained in the extents of the concepts of this sub-lattice are similar to the paper at hand, i.e., they are papers sharing some properties with Computing Similarity Dependencies with Pattern Structures.
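The object concept that gets highlighted in red is simply the most specific concept whose extent contains the object of interest. A minimal sketch of this lookup (in Python, over a simplified in-memory list of concepts; the data and helper names are illustrative, not the actual D3.js/PHP implementation) could look as follows.

def object_concept(concepts, g):
    # Among all concepts (extent, intent) whose extent contains g,
    # return the one with the smallest extent, i.e. the object concept of g.
    candidates = [c for c in concepts if g in c[0]]
    return min(candidates, key=lambda c: len(c[0])) if candidates else None

concepts = [
    (frozenset({"p1", "p2", "p3"}), frozenset({"A. Napoli"})),
    (frozenset({"p1", "p2"}), frozenset({"A. Napoli", "FCA"})),
    (frozenset({"p1"}), frozenset({"A. Napoli", "FCA", "data dependencies"})),
]
print(object_concept(concepts, "p1"))  # the most specific concept containing p1

Highlighting then amounts to marking this concept together with every concept whose extent contains the object, which is what the Spy displays.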
If we consider the folder-tree display used in most navigational tools, such as Camelis [11], CREDO [5], and CEM [7], this kind of navigation is not possible, because such a display only allows navigation w.r.t. the intent, while the extent is considered as the answer of the navigation. In the case of RV-Xplorer, it is possible to obtain the sub-lattice related to a certain interesting object; this way, the whole sub-lattice connected to the object concept of the object of interest can be navigated to retrieve similar objects, i.e., objects sharing at least one attribute with the object of interest.

5.4 Altering the Navigation Space

The navigation space can be changed even when the selected concept is deep down in the concept lattice, without having to start the navigation all over again from the top concept. Let c be the selected concept with m1, m2 ∈ int(c) (int(c) is the intent of the selected concept), and suppose the expert has navigated downwards from the concept whose intent only contains m1. If the expert now wants to navigate the lattice w.r.t. m2, then on mouse-hover the interface highlights all the concepts in which the given attribute occurs and additionally highlights the attribute concept in red. The attribute concept of m2 can then be selected. In the running example, suppose the expert has navigated the lattice w.r.t. the author Amedeo Napoli and finds some papers on FCA authored by him. If she now wants to navigate the concept lattice w.r.t. the keyword FCA, she can easily locate the attribute concept of the keyword FCA and navigate from there to obtain more specific information. The left side of Figure 2 shows the highlighted attribute concept of FCA in RV-Xplorer.

In a folder-tree display, altering the navigation space w.r.t. the intent requires the expert to locate the attribute concept herself by manually checking each of the branches, because such a display represents the concept lattice as a tree. The problem with such a display is that it is not easy to alter the browsing space quickly or to change the navigation point of view. Moreover, the sub-lattice connected to a selected concept cannot be seen because of the restrictions imposed by the tree display.

5.5 Area Expansion

Area expansion allows the expert to select, at one time, several concepts scattered over the concept lattice and gives an overall view of what these concepts contain. These concepts are not necessarily part of the navigation path that the expert is following. It allows the expert to obtain an overall view of other concepts without starting the navigation process again.

This idea was first put forth in [14], where the expert is allowed to move from one concept lattice to another based on the granularity level w.r.t. a taxonomy and a similarity threshold. The concepts in the concept lattice built with a higher threshold contain more detailed information than those in the concept lattice built with a lower threshold. One drawback of this kind of zooming operation is that it requires the computation of several concept lattices. In the case of RV-Xplorer, we are dealing with a simple concept lattice instead of one created using hierarchies, meaning that all such information needs to be scaled to obtain a binary context. As we are dealing with concept lattices built from binary contexts, we adapt this functionality to our needs.

Fig. 2: The left figure shows the attribute concept of FCA and the right figure shows the object concept of Computing Similarity Dependencies with Pattern Structures.
It requires neither the computation of many concept lattices nor any re-computation.

6 Hiding Non-Interesting Parts of the View

One of the most interesting characteristics of RV-Xplorer is that it allows the expert to hide the non-interesting parts of the lattice. Consider that the expert selects a concept c which contains an attribute that is not interesting for her. She can at any point right-click on the concept and select hide sub-lattice. A fundamental property of a concept lattice is that if a concept contains some attribute in its intent, then all its sub-concepts inherit this attribute. Thus, if the expert considers one concept as uninteresting, the whole sub-lattice below it is considered uninteresting and is hidden from the expert during navigation. This functionality enables the expert to reduce her navigation space; in the end the concept lattice contains only those concepts which are interesting for the expert.

Similar functionality was first introduced in the CreChainDo system [15]. Like CREDO [5], CreChainDo allows the expert to pose a query against a standard search engine, which returns some results. These results are then organized in the form of a concept lattice and displayed to the expert as a folder-tree. An added advantage of CreChainDo over CREDO is that the former allows expert interaction, i.e., the expert can mark concepts as relevant or irrelevant according to her priorities. After the expert has marked a concept as irrelevant, the sub-lattice linked to that concept is deleted; that is, the context is reduced based on this feedback and the concept lattice is computed again from the reduced context. In the case of RV-Xplorer, the concept lattice is built on top of RDF graphs. Moreover, we neither recompute the lattice nor remove anything from it; we only hide the non-interesting part of the lattice to reduce the navigation space of the expert. This way the navigation space is reduced without re-computing the concept lattice.

7 Related Tools

There have already been many efforts to provide the expert with facilities to interact with concept lattices applied to different domains. In [13], the authors discuss a query-based faceted search for the Semantic Web; in contrast, we are mostly dealing with the navigational capabilities that can be provided by exploiting the powerful structure of the Hasse diagram. [10] proposes another interesting way of navigating the concept lattice, which allows a novice user to navigate through the lattice without having to know its structure. The same holds for SPARKLIS [12], where the user performs selections and the tool acts as a query builder. In contrast, RV-Xplorer provides exploration and navigation capabilities over SPARQL query answers with the help of a view, i.e., a concept lattice, for data analysis and information retrieval purposes. Conexp (http://conexp.sourceforge.net/) is another tool for visualizing small lattices; in contrast, RV-Xplorer allows area expansion and also provides guided navigation. [1] discusses that the generated views are easily navigable by machine as well as human agents. Machine agents may access the datasets for application development purposes through generic SPARQL queries, which generate a huge number of answers; consequently, a large number of concepts are provided by the View By clause.
However, when human agents want to access the information through SPARQL queries, they run specialized queries which do not generate a huge number of answers. In the current study we focus on a manageable number of answers to be visualized by human agents using our visualization software. An added advantage over these approaches is that RV-Xplorer provides guidance to the expert at each step for deciding which concept to select. This guidance is provided by showing the user, at each step, the contents of the intents of the next level, by showing the distribution of the extent with the help of the summarization index, and, finally, through the global view, which offers several further forms of guidance.

8 Discussion

In this study we introduced a new navigational tool for concept lattices, called RV-Xplorer, which provides exploration over SPARQL query answers. With the help of the guided navigation implemented in RV-Xplorer we were able to answer all the questions posed initially in the scenario. However, the tool is not designed for a single specific purpose: any kind of concept lattice can be visualized and data from any domain can be analyzed with it. The RV-Xplorer tool is still in development and other functionalities should be added, such as incremental visualization (w.r.t. a set of given objects and attributes), iceberg visualization (given a set of attributes and objects, and a frequency threshold), integration of quality measures, and visualization of implications and of the Duquenne-Guigues basis. We believe, as many other researchers do (see the tools discussed in [6]), that visualization tools are of major importance, not only for FCA but for data mining in general. Accordingly, a new generation of visualization tools should be studied and designed; RV-Xplorer is an example of these new tools and of what can be imagined for supporting the analyst in the mining activity. We also plan to perform a human evaluation of the tool as discussed in [10] and [13].

References

1. Mehwish Alam and Amedeo Napoli. Defining views with formal concept analysis for understanding SPARQL query results. In Proceedings of the Eleventh International Conference on Concept Lattices and Their Applications, 2014.
2. Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked data - the story so far. Int. J. Semantic Web Inf. Syst., 5(3):1–22, 2009.
3. Claudio Carpineto, Stanislaw Osiński, Giovanni Romano, and Dawid Weiss. A survey of web clustering engines. ACM Comput. Surv., 41(3):17:1–17:38, 2009.
4. Claudio Carpineto and Giovanni Romano. A lattice conceptual clustering system and its application to browsing retrieval. Machine Learning, 24(2):95–122, 1996.
5. Claudio Carpineto and Giovanni Romano. Exploiting the potential of concept lattices for information retrieval with CREDO. J. UCS, 10(8):985–1013, 2004.
6. Víctor Codocedo and Amedeo Napoli. Formal concept analysis and information retrieval - a survey. In Formal Concept Analysis - 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, 2015, Proceedings, pages 61–77, 2015.
7. Richard Cole and Gerd Stumme. CEM - a conceptual email manager. In 8th International Conference on Conceptual Structures, ICCS 2000, Darmstadt, Germany, August 14-18, 2000, Proceedings, pages 438–452, 2000.
8. Claudia d'Amato, Nicola Fanizzi, and Agnieszka Lawrynowicz. Categorize by: Deductive aggregation of semantic web query results. In ESWC (1), 2010.
9. Peter W. Eklund, editor.
Concept Lattices, Second International Conference on Formal Concept Analysis, ICFCA 2004, Sydney, Australia, February 23-26, 2004, Proceedings, Lecture Notes in Computer Science. Springer, 2004. 10. Peter W. Eklund, Jon Ducrou, and Peter Brawn. Concept lattices for information visualization: Can novices read line-diagrams? In Eklund [9], pages 57–73. 11. Sébastien Ferré. Camelis: a logical information system to organise and browse a collection of documents. Int. J. General Systems, 38(4):379–403, 2009. 12. Sébastien Ferré. Expressive and scalable query-based faceted search over SPARQL endpoints. In The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Riva del Garda, Italy, October 19-23, 2014. Proceedings, Part II, 2014. 13. Sébastien Ferré and Alice Hermann. Reconciling faceted search and query lan- guages for the semantic web. IJMSO, 7(1):37–54, 2012. 14. Nizar Messai, Marie-Dominique Devignes, Amedeo Napoli, and Malika Smaı̈l- Tabbone. Using domain knowledge to guide lattice-based complex data explo- ration. In ECAI 2010 - 19th European Conference on Artificial Intelligence, Lisbon, Portugal, August 16-20, 2010, Proceedings, pages 847–852, 2010. 15. Emmanuel Nauer and Yannick Toussaint. Dynamical modification of context for an iterative and interactive information retrieval process on the web. In Proceedings of the Fifth International Conference on Concept Lattices and Their Applications, CLA 2007, Montpellier, France, October 24-26, 2007, 2007. Finding p-indecomposable Functions: FCA Approach Artem Revenko12 1 TU Wien Karlsplatz 13, 1040 Vienna, Austria 2 TU Dresden Zellescher Weg 12-14, 01069 Dresden, Germany Abstract. The parametric expressibility of functions is a generalization of the expressibility via composition. All parametrically closed classes of functions (p-clones) form a lattice. For finite domains the lattice is shown to be finite, however straight-forward iteration over all functions is infeasible, and so far the p-indecomposable functions are only known for domains with two and three elements. In this work we show how p- indecomposable functions can be computed more efficiently by means of an extended version of attribute exploration (AE). Due to the growing number of attributes standard AE is not able to guarantee the discovery of all p-indecomposable functions. We introduce an extension of AE and investigate its properties. We investigate the conditions allowing us to guarantee the success of exploration. In experiments the lattice of p- clones on three-valued domain was reconstructed. Keywords: parametric expressibility, attribute exploration, p-indecomposable function 1 Introduction The expressibility of functions is a major topic in mathematics and has a long history of investigation. The interest is explainable: when one aims at investi- gating any kind of functional properties, which classes of functions should one consider? If a function f is expressible through a function h then it often means that f inherits properties of h and should not be treated separately. Moreover, if h in turn is expressible through f then both have similar or even the same properties. Therefore, partition with respect to expressibility is meaningful and can be the first step in the investigation of functions. With the development of electronics and logical circuits a new question arises: if one wants to be able to express all possible functions which minimal set of functions should one have at hands? 
One of the first investigations in this direc- tion was carried out in [Pos42]; in this work all the Boolean classes of functions closed under expressibility are found and described. Afterwards many important works were dedicated to related problems such as the investigation of the struc- ture of the lattice of functional classes, for example, [Yab60,Ros70]. However, it c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 35–46, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 36 Artem Revenko is known that the lattice of classes of functions closed under expressibility is in general uncountably infinite. In [Kuz79] a more general type of functional ex- pressibility was introduced – parametric expressibility. A significant advantage of this type of expressibility is that for any finite domain Ak , |A|= k the lattice of all classes closed under parametric expressibility classes of functions (p-clones) is finite [BW87]. However, finding this lattice is a complex task. For k = 3 in a thorough and tedious investigation [Dan77] it was proved that a system of 197 functions forms the lattice of all p-clones. The investigation was carried out without the use of computers. In this paper we introduce, develop, and investigate the methods and tools for automation of the exploration of the lattice of p-clones. Therefore, this paper “applied” to A3 can be seen as complementing the work [Dan77] where a proof of the correctness of the results obtained using the elaborated in this paper tools can be found. Namely, in this paper we answer the question how to find all the p-clones, whereas in [Dan77] it is proved that certain functions allow us to construct the desired lattice. The presented methods and tools are extensible to larger domains as well. Contributions – New original approach to exploring the lattice of p-clones introduced; – An extension of the standard exploration procedure is introduced and inves- tigated; – The whole procedure is implemented and executed; the obtained results con- firm with the previously known results; – It is proved that for certain starting conditions the desired lattice will nec- essarily be eventually discovered. 2 Formal Concept Analysis In what follows we keep to standard definitions of FCA [GW99]. Let G and M be sets and let I ⊆ G × M be a binary relation between G and M . The triple K := (G, M, I) is called a (formal) context. The set G is called the set of objects. The set M is called the set of attributes. A context (G∗ , M∗ , I∗ ) such that G∗ ⊆ G, M∗ ⊆ M , and I∗ = I ∩ G∗ × M∗ is called a subcontext of K. Consider mappings ϕ: 2G → 2M and ψ: 2M → 2G : ϕ(X) := {m ∈ M | gIm for all g ∈ X}, ψ(A) := {g ∈ G | gIm for all m ∈ A}. Mappings ϕ and ψ define a Galois connection between (2G , ⊆) and (2M , ⊆), i.e. ϕ(X) ⊆ A ⇔ ψ(A) ⊆ X. Usually, instead of ϕ and ψ a single notation (·)0 is used. Let X ⊆ G, A ⊆ M . A formal concept C of a formal context (G, M, I) is a pair (X, A) such that X 0 = A and A0 = X. The subset of objects X is called the Finding p-indecomposable Functions: FCA Approach 37 extent of C and is denoted by ext(C), and the subset of attributes A is called the intent of C and is denoted by int(C). For a context (G, M, I), a concept C1 = (X, A) is a subconcept of a concept C2 = (Y, B) (C1 ≤ C2 ) if X ⊆ Y or, equivalently, B ⊆ A. This defines a partial order on formal concepts. 
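The two derivation operators ϕ and ψ, and the concept condition X′ = A, A′ = X, admit a direct if naive implementation; a minimal sketch in Python, with a toy incidence relation that is purely illustrative, is given below.

def up(X, M, I):
    # ϕ(X): attributes shared by all objects of X
    return {m for m in M if all((g, m) in I for g in X)}

def down(A, G, I):
    # ψ(A): objects having all attributes of A
    return {g for g in G if all((g, m) in I for m in A)}

def is_concept(X, A, G, M, I):
    # (X, A) is a formal concept iff X' = A and A' = X
    return up(X, M, I) == set(A) and down(A, G, I) == set(X)

G, M = {1, 2, 3}, {"a", "b"}
I = {(1, "a"), (1, "b"), (2, "a"), (3, "b")}
print(up({1, 2}, M, I), down({"a"}, G, I), is_concept({1, 2}, {"a"}, G, M, I))
# {'a'} {1, 2} True

The (·)′ notation used throughout the paper corresponds to applying up to object sets and down to attribute sets.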
The set of all formal concepts of (G, M, I) is denoted by B(G, M, I). An implication of K = (G, M, I) is defined as a pair (A, B), where A, B ⊆ M , written A → B. A is called the premise, B is called the conclusion of the implication A → B. The implication A → B is respected by a set of attributes N if A * N or B ⊆ N . We say that the implication is respected by an object g if it is respected by the intent of g. If g does not respect an implication then g is called a counter-example. The implication A → B holds (is valid ) in K if it is respected by all g 0 , g ∈ G, i.e. every object, that has all the attributes from A, also has all the attributes from B (A0 ⊆ B 0 ). A unit implication is defined as an implication with only one attribute in its conclusion, i.e. A → b, where A ⊆ M, b ∈ M . Every implication A → B can be regarded as a set of unit implications {A → b | b ∈ B}. An implication basis of a context K is defined as a set LK of implications of K, from which any valid implication for K can be obtained as a consequence and none of the proper subsets of LK has this property. We call the set of all valid in K the implicative theory of K. A minimal in the number of implications basis was defined in [GD86] and is known as the canonical implication basis. An object g is called reducible in a context K := (G, M, I) iff ∃X ⊆ G \ g : g 0 = X 0 . Note that a new object is going to be reducible if in the context there already exists a formal concept with the same intent as the intent of the new object. Reducible objects neither contribute to any implication basis nor to the concept lattice [GW99], therefore, if one is only interested in the implicative theory or in the concept lattice of the context reducible objects can be eliminated. In what follows we introduce other types of reducibility, therefore, we refer to this type of reducibility as plain reducibility. In what follows the canonical implication basis is used, however, the investi- gation could be performed using another implication basis. Attribute Exploration (AE) consists in iterations of the following steps until stabilization: computing the implication basis of a context, finding counterexam- ples to implications, updating the context with counterexamples as new objects, recomputing the basis. AE has been successfully used for investigations in many mostly analytical areas of research. For example, in [KPR06] AE is used for studying Boolean algebras, in [Dau00] lattice properties are studied, in [Rev14] algebraic identities are studied. 3 Expressibility of Functions Consider a set Ak , |A|= k, k ∈ N. Consider a function f : Aar(f ) → A (ar(f ) denotes the arity of f ), the set of all possible functions over Ak of different arities is denoted by Uk . The particular functions pin (x1 , . . . xn ) = xi are called 38 Artem Revenko the projections. The set of all projections is denoted by P r. In what follows instead of writing (x1 , . . . xn ) we use a shorter notation (x). Let H ⊆ Uk . We say that f is compositionally expressible through H (denoted f ≤ H) if the following condition holds: f (x) ≡ h(j1 (x), . . . , jar(h) (x)), (1) for some h, j1 , . . . jm ∈ H ∪ P r. A functional clone is a set of functions containing all projections and closed under compositions. The set of all functional clones over a domain of size k = 2 forms a countably infinite lattice [Pos42]. However, if k > 2 then the set of all functional classes is uncountable [YM59]. Let H ⊆ Uk and for any i ∈ [1, m] : ti , si ∈ H ∪ P r. 
We say that f, f ∈ Uk is parametrically expressible through H (denoted f ≤p H) if the following condition holds: ^m f (x) = y ⇐⇒ ∃w ti (x, w, y) = si (x, w, y). (2) i=1 The notation J ≤p H means that every function from J is parametrically ex- pressible through H. A parametric clone (or p-clone) is a set of functions closed under parametric expressibility and containing all projections. We consider a spe- cial relation f • of arity ar(f )+1 on Ak called the graph of function f . f • consists of the tuples of the form (x, f (x)). If function h is compatible with f • , i.e. if for all valuations of variables xij in Ak holds the identity (ar(f ) = n, ar(h) = m) f (h(x11 , . . . , x1m ), . . . h(xn1 , . . . , xnm )) ≡ h(f (x11 , . . . , xn1 ), . . . f (x1m , . . . , xnm )), then we say that functions f and h commute (denoted f ⊥ h). For a set of functions H we write f ⊥ H to denote that for all h ∈ H : f ⊥ h. The commutation property is commutative, i.e. f ⊥ h iff h ⊥ f . The centralizer of H is defined by H ⊥ = {g ∈ Uk | g ⊥ H}. In [Kuz79] it is shown that if f ≤p H then f ⊥ H ⊥ . x1 x2 f (x1 , x2 ) h(x1 , x2 ) x1 x2 f f f 0 0 1 1 0 0 h 0 1 1 0 1 0 1 0 1 h 0 0 1 1 0 0 0 1 0 h 1 0 6 = 1 1 1 1 1 1 Fig. 1. Functions f and h do not commute A function f is called p-indecomposable if each system H parametrically equivalent to {f } (i.e. f ≤p H and H ≤p f ) contains a function parametrically equivalent to f . Hence, for each p-indecomposable function there exists a class of Finding p-indecomposable Functions: FCA Approach 39 p-indecomposable functions that are parametrically equivalent to it. From each such class we take only one representative (only one p-indecomposable function) and gather them in a set of p-indecomposable functions denoted by Fkp . A p- clone H cannot be represented as an intersection of p-clones strictly containing H if and only if there exists a p-indecomposable function f such that H = f ⊥⊥ . Hence, in order to construct the lattice of all p-clones it suffices to find all p- indecomposable functions. The lattice of all p-clones for any finite k is finite [BW87], hence, Fkp is finite. In [BW87] it is proved that it suffices to consider p-indecomposable functions of arity at most k k , however, the authors conjecture that the actual arity should be equal to k for k ≥ 3. The conjecture is still open. Nevertheless, thanks to results reported in [Dan77], we know that the conjecture holds for k = 3. 4 Exploration of P-clones The knowledge about the commutation properties of a finite set of functions F ⊆ Uk can be represented as a formal context KF = (F, F, ⊥F ), where ⊥F ⊆ F 2 , a pair (f1 , f2 ) ∈ F 2 belongs to the relation ⊥F iff f1 ⊥ f2 . Note that the relation ⊥F is symmetric, hence, the objects and the attributes of the context are the same functions. The goal of this paper is to develop methods for constructing the lattice of all p-clones on A3 . As already noted, for the purpose of constructing the lattice of p-clones it suffices to find all p-indecomposable functions Fkp . The set of supremum-irreducible elements of the lattice of p-clones is exactly the set {f ∗∗ | f ∈ Fkp }. k For any domain of size k there exist k k functions of arity k. Therefore, to compute the context of all commuting functions KUk one has to perform k k 2 O(k k ∗ k k ∗ k k ) operations (taking into consideration only functions of arity k and the cost of commutation check in the worst case). For k = 3 we count about 1030 operations. 
Therefore, already for k = 3 a brute-force solution is infeasible.3 We intend to apply AE to commuting functions. For this purpose we de- veloped and implemented methods for finding counter-examples to implications over functions from Uk [Rev15]. These methods are not presented in this paper for the sake of compactness. However, as the number of attributes is not fixed, the success of applying AE is not guaranteed, i.e. it is not guaranteed that the complete lattice of p-clones will eventually be discovered using AE. 4.1 Object-Attribute Exploration We now describe which commuting properties a new function g 6∈ F should possess in order to alter the concept lattice of the original context K = (F, F, ⊥) despite the fact that the intent of g is equal to an intent from B(F, F, ⊥F ). 3 Of course one can use dualities, but it does not give a feasible solution as well as there exist only k ∗ (k − 1) dualities. 40 Artem Revenko To distinguish between binary relations on different sets of functions we use subscripts. The commutation relation on F is denoted by ⊥F , i.e. ⊥F = {(h, j) ∈ F 2 | h ⊥ j}. The context with the new function (F ∪ g, F ∪ g, ⊥F ∪g ) is denoted by KF ∪g . The derivation operator for the context KF ∪g is denoted by (·)⊥F ∪g . Proposition 1. Let C ∈ B(F, F, ⊥) such that ext(C) * int(C). Let g ∈ Uk , g ∈ / F be a function such that g ⊥F ∪g ∩ F = int(C) (g is reducible in KF ). g is irreducible in KF ∪g ⇔ g ⊥ g. Proof. As ext(C) * int(C) and for all f ∈ F \ int(C) : g 6⊥ f it follows that g 6⊥ ext(C). We prove the contrapositive statement: g is reducible in KF ∪g ⇔ g 6⊥ g. ⇐ As g 6⊥ g we have g ⊥F ∪g = int(C) = ext(C)⊥F ∪g . Therefore, g is reducible. ⇒ As g is reducible we obtain g ⊥F ∪g = H ⊥F ∪g for some H ⊆ F . Fix this H. As H ⊥F ∪g = int(C) we have H ⊥F ∪g ⊥F ∪g = ext(C). Suppose H ⊆ int(C), then H ⊥F ∪g ⊥F ∪g ⊆ int(C)⊥F ∪g ⊥F ∪g = int(C). As H ⊥F ∪g ⊥F ∪g = ext(C) and ext(C) * int(C) we arrive at a contradiction. Therefore, H * int(C). Hence, g 6⊥ H, therefore, g 6∈ H ⊥F ∪g , hence, g 6∈ g ⊥F ∪g . Corollary 1. If g is reducible in KF , but irreducible in KF ∪g and g ⊥ g then ext(C) → g holds in KF ∪g . Proof. As g ⊥F ∪g = int(C)∪{g} and ext(C)⊥F ∪g = int(C) we have ext(C)⊥F ∪g ⊂ g ⊥F ∪g , therefore, ext(C) → g. The statement dual to Proposition 1 holds as well. Proposition 2. Let C ∈ B(F, F, ⊥F ) such that ext(C) ⊆ int(C). Let g ∈ / F be a function such that g ⊥F ∪g ∩ F = int(C) (g is reducible in KF ). Uk , g ∈ g is irreducible in KF ∪g ⇔ g 6⊥ g. Proof. As ext(C) ⊆ int(C) and g ⊥ int(C) then g ⊥ ext(C). We prove the contrapositive statement: g is reducible in KF ∪g ⇔ g ⊥ g. ⇐ As g ⊥ g and g ⊥ ext(C) we have ext(C)⊥F ∪g = int(C) ∪ {g} = g ⊥F ∪g . Hence, g is reducible. ⇒ As g is reducible we obtain g ⊥F ∪g = H ⊥F ∪g for some H ⊆ F . Fix this H. As g ⊥ int(C) we have H ⊥ int(C), hence, H ⊆ ext(C). As g ⊥ ext(C) we have g ⊥ H, hence, g ∈ H ⊥F ∪g , therefore, g ∈ g ⊥F ∪g and g ⊥ g. Corollary 2. If g is reducible in KF , but irreducible in KF ∪g and g 6⊥ g then g → ext(C) holds in KF ∪g . Proof. As g ⊥F ∪g = int(C) and ext(C)⊥F ∪g = int(C) ∪ {g} we have g ⊥F ∪g ⊂ ext(C)⊥F ∪g , therefore, g → ext(C). In order to distinguish reducibility in the old context KF and in the new context KF ∪g we introduce a new notation. Finding p-indecomposable Functions: FCA Approach 41 Definition 1. We call a function g that is reducible in KF , but irreducible in KF ∪g , first-order irreducible for KF . If g is reducible for KF and reducible in KF ∪g we call it first-order reducible for KF . 
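Propositions 1 and 2 together yield a cheap operational test for first-order irreducibility: for a candidate g that is plainly reducible (its intent int(C) already occurs as an intent of K_F), one only has to compare ext(C) with int(C) and check whether g commutes with itself. The following sketch (Python; the helper names and the representation of K_F by a dictionary commute[f] of commuting functions are ours, not taken from the paper's implementation) expresses this test.

def first_order_irreducible(F, commute, int_C, g_self_commutes):
    # ext(C) = int(C)' in K_F: functions of F commuting with every member of int(C)
    ext_C = {f for f in F if int_C <= commute[f]}
    contained = ext_C <= int_C
    # Prop. 2 (ext(C) contained in int(C)): irreducible iff g does NOT commute with itself.
    # Prop. 1 (ext(C) not contained):       irreducible iff g commutes with itself.
    return contained != g_self_commutes

On the data of Example 1 below this test returns True for f^b_3,756 (ext(C) is contained in int(C) and the function does not commute with itself), in agreement with the conclusion drawn there from Proposition 2.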
We remind that if g is irreducible in (F ∪g, F, ⊥F ∪{(g, f ) ∈ {g}×F | f ⊥ g}) we call it plainly irreducible. Hence, if function is first-order reducible for KF then it is also plainly reducible in KF . Note that g is plainly irreducible in KF iff g is a counter-example to some valid in KF implication. Next we present an example with functions from U3 , in order to explicitly show this we add 3 in the subscript of every function. The numbering of the functions is induced by the lexicographic ordering on the outputs of the func- tions [Rev15]. We use superscripts ·u for unary, ·b for binary, and ·t for ternary functions. (3) Example 1. The context under consideration K0 is presented in Figure 2. The (3) implication basis of K0 is empty, therefore, there exist no plainly irreducible b b functions. The function f3,756 has the following commuting properties: f3,756 ⊥ u b b u b b {f3,0 , f3,12015 } and f3,756 6⊥ f3,1 . Moreover, f3,756 6⊥ f3,756 and for the corre- u u b sponding concept C holds ext(C) = {f3,0 } ⊂ {f3,0 , f3,12015 } = int(C). As follows (3) b from Proposition 2, the function f3,756 is first-order irreducible for K0 . u u b f3,0 f3,1 f3,12015 u f3,0 × × u f3,1 × × b f3,12015 × × (3) u u b Fig. 2. Context K0 of functions on domain A3 containing f3,0 , f3,1 , f3,12015 Corollary 3. Let C ∈ B(F, F, ⊥F ), g ∈ Uk , g ∈ / F , and g be first-order reducible for KF . ext(C) ⊥ g ⇔ g ⊥ g. Proof. Follows from Propositions 1 and 2 and the fact that ext(C) ⊥ g ⇔ ext(C) ⊆ int(C). There remains a possibility that a union of sets of reducible functions is irreducible. We proceed with the simplest case when there are only two sets each containing a single first-order reducible function for the current context. We prove several propositions about such pairs of first-order reducible functions. The consequences of these propositions are deeper investigated in Section 4.2. We consider a context KF and new functions g1 , g2 ∈ Uk , g1 , g2 6∈ F . We denote {g1 , g2 } by G, ⊥F ∪G = {(h, j) ∈ (F ∪ G)2 | h ⊥ j}, the context (F ∪ 42 Artem Revenko G, F ∪ G, ⊥F ∪G ) is denoted by KF ∪G , the corresponding derivation operator is denoted by (·)⊥F ∪G . As in the case with one function, for i ∈ {1, 2} : gi is not a counter-examples to a valid implication iff gi⊥F ∪G ∩ F ∈ int(G, M, I). We denote the corresponding intents by int(C1 ) and int(C2 ), respectively. Proposition 3. Let C1 , C2 ∈ B(F, F, ⊥F ) and g1 , g2 ∈ / F be first-order re- ducible for KF . Suppose g1 ⊥ g2 . Both g1 , g2 are irreducible in KF ∪G ⇔ ext(C1 ) * int(C2 ). Proof. As g1 is irreducible it holds that g1⊥F ∪G 6= ext(C1 )⊥F ∪G . From Corollary 3 follows that g1 ∈ ext(C1 )⊥F ∪G iff g1 ∈ g1⊥F ∪G . Therefore, ext(C1 )⊥F ∪G = g1⊥F ∪G \ {g2 }. Hence, ext(C1 ) 6⊥ g2 , hence, ext(C1 ) * int(C2 ). Similarly for g2 , ext(C2 ) * int(C1 ). Proposition 4. Let C1 , C2 ∈ B(F, F, ⊥F ) and g1 , g2 ∈ / F be first-order re- ducible for KF . Suppose g1 6⊥ g2 . Both g1 , g2 are irreducible in KF ∪G ⇔ ext(C1 ) ⊆ int(C2 ). Proof. As g1 is irreducible it holds that g1⊥F ∪G 6= ext(C1 )⊥F ∪G . From Corollary 3 follows that g1 ∈ ext(C1 )⊥F ∪G iff g1 ∈ g1⊥F ∪G . Therefore, ext(C1 )⊥F ∪G = g1⊥F ∪G ∪ {g2 }. Hence, ext(C1 ) ⊥ g2 , hence, ext(C1 ) ⊆ int(C2 ). By the properties of derivation operators, ext(C2 ) ⊆ int(C1 ). The functions mentioned in Propositions 4 and 3 can be called second-order irreducible for KF . In the next proposition we show that it is not necessary to look for three functions at once in order to find all p-indecomposable functions. 
Therefore, we do not need to define third-order irreducibility. Here we use the notation: for I ⊆ {1, 2, 3} : LI = {gi | i ∈ I}. We omit the curly brackets in I, i.e. L{1,2} = L12 = {g1 , g2 }. Proposition 5. Let G = {g1 , g2 , g3 } be a set of functions such that G ∩ F = ∅ and for i ∈ {1, 2, 3} : gi⊥F ∪G ∩ F = int(Ci ). If not all functions from G are reducible in KF ∪G then there exists L ⊂ G such that not all functions from L are reducible in KF ∪L . Proof. Let g1 be reducible in KF ∪L12 and in KF ∪L13 . Then there exists H ⊆ ⊥ ⊥ F ∪ {g2 } : H ⊥F ∪L12 = g1 F ∪L12 and J ⊆ F ∪ {g3 } : J ⊥F ∪L13 = g1 F ∪L13 . Fix these H and J. If either g2 is irreducible in KF ∪L2 or g3 is irreducible in KF ∪L3 then the proposition is proved. Therefore, we can assume that they are reducible in corresponding context. Hence, without loss of generality, we can assume that H, J ⊆ F (i.e. H ∩ G = J ∩ G = ∅). Note that ⊥ ⊥ g1⊥F ∪G = g1 F ∪L13 ∪ g1 F ∪L12 = J ⊥F ∪L13 ∪ H ⊥F ∪L12 . (3) Let g3 ∈ H ⊥F ∪G . Then g3 ⊥ H. As g3⊥F ∪G ∩ F = int(C3 ) we obtain H ⊆ int(C3 ). Moreover, as int(C3 ) is an intent in KF we have H ⊥F ⊥F ⊆ int(C3 ). As g1⊥F ∪G ∩ F = H ⊥F = J ⊥F = int(C1 ) we have J ⊥F ⊥F ⊆ int(C3 ) and, by Finding p-indecomposable Functions: FCA Approach 43 properties of closure operators, J ⊆ int(C3 ). Therefore, g3 ⊥ J and g3 ∈ J ⊥F ∪G . Similarly, if g2 ∈ J ⊥F ∪G then g2 ∈ H ⊥F ∪G . Hence, H ⊥F ∪L12 ∪ J ⊥F ∪L13 = H ⊥F ∪G ∪ J ⊥F ∪G . (4) Combining (3) and (4) we obtain g1⊥F ∪G = H ⊥F ∪G ∪ J ⊥F ∪G . Therefore, g1⊥F ∪G = (H ∩ J)⊥F ∪G . Hence, g1 is reducible in KF ∪G and we arrive at a contradiction with initial assumption. Therefore, if g1 , g2 are in KF ∪L12 then at least g1 is irreducible in KF ∪L13 . If g3 is reducible in KF ∪L13 then g1 is reducible in KF ∪L1 . Otherwise, both g1 , g3 are irreducible in KF ∪L13 . Suppose that a context KF contains all p-indecomposable functions, how- ever, the task is to prove this fact, i.e. that no further p-indecomposable func- tions exist. Suppose it has been checked that no counter-examples exist and every single function g ∈ Uk is first-order reducible for KF . According to the above propositions it is necessary to look for exactly two functions at once in order to prove the desired statement. Therefore, in order to complete the proof for every C1 , C2 ∈ B(KF ) one has to find all the functions g1 , g2 such that ⊥ ⊥ g1 F ∪g1 ∩ F = int(C1 ) and g2 F ∪g2 ∩ F = int(C2 ) and then check if g1 commutes with g2 . Therefore, one has to check the commutation property between all func- tions (if the context indeed contains all p-indecomposable functions). As already discussed, this task is infeasible. This result is discouraging. However, having the knowledge about the final result in some cases we can guarantee that all p- indecomposable functions will be found even without looking for two functions at once. 4.2 Implicatively Closed Subcontexts During the exploration of p-clones one can discover such a subcontext of func- tions that no further function is a counter-example to existing implications. We shall say that such a subcontext is implicatively closed, meaning that all the valid in this subcontext implications are valid in the final context as well. Analysis of similar constructions can be found in [Gan07]. In order to guarantee the discovery of all p-indecomposable functions (suc- cess of exploration) it would suffice to find such a subcontext that it is neither implicatively closed nor contained in any other implicatively closed subcontext. 
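Operationally, a subcontext is implicatively closed when no remaining function violates an implication valid in it, and the check itself only uses the definition of "respects" recalled in Section 2. A small sketch (Python; representing each candidate function by its attribute row, i.e. the set of already discovered functions it commutes with, is an assumption of ours) is shown below.

def respects(row, premise, conclusion):
    # A set of attributes N respects A -> B iff A is not a subset of N or B is a subset of N.
    return not (premise <= row) or (conclusion <= row)

def is_counter_example(row, implications):
    return any(not respects(row, A, B) for (A, B) in implications)

def implicatively_closed(candidate_rows, implications):
    # True iff none of the remaining candidate functions violates a valid implication.
    return not any(is_counter_example(r, implications) for r in candidate_rows)

A candidate whose row violates some valid implication is a counter-example and must be added to the context during exploration.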
Suppose the context KF = (F, F, ⊥F ), F ⊆ Uk is discovered. As earlier, we de- note the context of all p-indecomposable functions on Uk by KFkp . Let S = Fkp \F . It would be desirable to be able to guarantee the discovery of functions S by con- sidering only the discovered part of relation ⊥F and the part ⊥F S (=⊥−1 SF ), see Figure 3. Unfortunately, as the next example shows, in general it is not possible. Example 2. Consider the context in Figure 4. The context contains all the p- indecomposable functions from U2 and three additional objects g1 , g2 , g3 . Func- tions with commutation properties as of g1 , g2 , g3 do not exist. However, if func- tions with commutation properties as of g1 , g2 , g3 existed then the functions g1 , g2 44 Artem Revenko F S F ⊥F ⊥F S S ⊥SF ⊥S Fig. 3. Partitioning of the context KF p of all p-indecomposable functions k would not be counter-examples to any valid in KF2p ∪g3 implication. Note that g3 is a counter-example to a valid in KF2p implication. Therefore, the subcontext containing functions F2p ∪ g3 would be implicatively closed. Moreover, it is even closed with respect to finding first-order irreducible functions as g1 is reducible in KF2p ∪{g1 ,g3 } and g2 is reducible in KF2p ∪{g2 ,g3 } . However, if instead of g3 we consider the function g4 , which differs from g3 only in that g4 commutes with both g1 and g2 , then the subcontext containing F2p ∪ g4 is neither implicatively closed nor contained in any implicatively closed subcontext of the context KF2p ∪{g1 ,g2 ,g4 } . The difference between g3 and g4 is contained in ⊥S in Figure 3. Therefore, in general it is not possible to guarantee the discovery of functions S without considering ⊥S . f0u f1u f14 b f8b f212 t t f150 f3u g3 g4 g1 g2 f0u × × × × × × × × f1u × × × b f14 × × × × × f8b × × × × t f212 × × × × × t f150 × × × × f3u × × × × × × × × × g3 × × × × g4 × × × × × × g1 × × × × g2 × × × × × Fig. 4. Context KF2p ∪{g1 ,g2 ,g3 } from Example 2 Definition 2. Let KH be a context, KF ⊆ KH , S = H \ F . An object s ∈ S is called an essential counter-example for KF if there exists a valid in KF implication Imp such that Finding p-indecomposable Functions: FCA Approach 45 1. s is a counter-example to Imp; 2. there does not exist an object p ∈ S \ {s} such that p is a counter-example to Imp. It is clear that all the essential counter-examples will necessarily be added to the context during the exploration. The next proposition suggests how one can check if a counter-example is essential or not. In the context KF3p there are several pairs of functions (f1 , f2 ) such that they commute with the same functions except for one commutes with itself and the other does not commute with itself. These functions cannot be essential counter- examples, because they are counter-examples to the same implications, if any. However, if they are the only counter-examples to some valid implication then these functions will eventually be discovered by object-attribute exploration. ⊥U ⊥U Proposition 6. Let s1 , s2 ∈ S such that s2 6⊥ s2 and s1 k = s2 k ∪ {s2 }. If there exists a valid in KF implication Imp such that the counter-examples are exactly s1 , s2 ∈ S then s1 is first-order irreducible for KF ∪s2 and s2 is first-order irreducible for KF ∪s1 . ⊥ Proof. s1 in KF ∪s2 . As Imp is valid in KF the set s2 F ∪s1 is closed in KF . There- fore, as follows from Proposition 1 for the object concept of s2 (ext(Cs2 ) * int(Cs2 )), the function s1 (s1 ⊥ s1 ) is first-order irreducible. ⊥ s2 in KF ∪s1 . 
As Imp is valid in KF the set s1 F ∪s2 is closed in KF . Therefore, as follows from Proposition 2 for the object concept of s1 (ext(Cs1 ) ⊆ int(Cs1 )), the function s2 (s2 6⊥ s2 ) is first-order irreducible. We have investigated different types of reducibilities, we have shown, that there do not exist third-order irreducible functions. However, the task of finding second-order irreducible functions is infeasible. Fortunately, it is possible to find not only zero-order irreducible functions, but also first-order irreducible func- tions. Moreover, if it would be possible to prove that the functions undiscovered at the moment are not second-order irreducible then we can guarantee that all the p-indecomposable functions will eventually be discovered. 5 Results We take all unary functions as the starting point. Thanks to earlier investigation in [Dan77] we know the final context. When we investigate all possible implica- tively closed partitions such that the implicatively closed subcontext contains all unary functions we find the following: – We start with 27 unary functions, 26 of them are p-indecomposable; – After adding all essential counter-examples we obtain 147 functions; – After using Proposition 6 we obtain 155 functions; – There remain 42 functions to be discovered. By direct check we find that there does not exist an implicatively closed subcontext containing 155 men- tioned above functions such that all the undiscovered functions are second- order irreducible. 46 Artem Revenko Hence, if we start from all unary functions on A3 all the functions F3p will eventually be discovered. The experiment was conducted three times starting from different initial contexts, all three times the exploration was successful. The exploration stating u from a single constant function f3,0 took 207 steps. References [BW87] S. Burris and R. Willard. Finitely many primitive positive clones. Proceedings of the American Mathematical Society, 101(3):427–430, 1987. [Dan77] A.F. Danil’chenko. Parametric expressibility of functions of three-valued logic. Algebra and Logic, 16(4):266–280, 1977. [Dau00] F. Dau. Implications of properties concerning complementation in finite lat- tices. In: Contributions to General Algebra 12 (D. Dorninger et al., eds.), Proceedings of the 58th workshop on general algebra “58. Arbeitstagung All- gemeine Algebra”, Vienna, Austria, June 3-6, 1999, Verlag Johannes Heyn, Klagenfurt, pages 145–154, 2000. [Gan07] B. Ganter. Relational galois connections. Formal Concept Analysis, pages 1–17, 2007. [GD86] J.-L. Guigues and V. Duquenne. Familles minimales d’implications informa- tives résultant d’un tableau de données binaires. Math. Sci. Hum, 24(95):5–18, 1986. [GW99] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999. [KPR06] L. Kwuida, C. Pech, and H. Reppe. Generalizations of boolean algebras. an attribute exploration. Mathematica Slovaca, 56(2):145–165, 2006. [Kuz79] A.V. Kuznetsov. Means for detection of nondeducibility and inexpressibility. Logical Inference, pages 5–33, 1979. [Pos42] E.L. Post. The two-valued iterative systems of mathematical logic. Princeton University Press, 1942. [Rev14] A. Revenko. Automatized construction of implicative theory of algebraic identities of size up to 5. In Cynthia Vera Glodeanu, Mehdi Kaytoue, and Christian Sacarea, editors, Formal Concept Analysis, volume 8478 of Lecture Notes in Computer Science, pages 188–202. Springer International Publishing, 2014. [Rev15] A. Revenko. 
Automatic Construction of Implicative Theories for Mathemati- cal Domains. PhD thesis, TU Dresden, 2015. [Ros70] I. Rosenberg. Über die funktionale Vollständigkeit in den mehrwertigen Logiken: Struktur der Funktionen von mehreren Veränderlichen auf endlichen Mengen. Academia, 1970. [Yab60] S.V. Yablonsky. Functional Constructions in K-valued Logic. U.S. Joint Pub- lications Research Service, 1960. [YM59] Yu.I. Yanov and A.A. Muchnik. On the existence of k-valued closed classes that have no bases. Doklady Akademii Nauk SSSR, 127:44–46, 1959. Putting OAC-triclustering on MapReduce Sergey Zudin, Dmitry V. Gnatyshak, and Dmitry I. Ignatov National Research University Higher School of Economics, Russian Federation dignatov@hse.ru http://www.hse.ru Abstract. In our previous work an efficient one-pass online algorithm for triclustering of binary data (triadic formal contexts) was proposed. This algorithm is a modified version of the basic algorithm for OAC- triclustering approach; it has linear time and memory complexities. In this paper we parallelise it via map-reduce framework in order to make it suitable for big datasets. The results of computer experiments show the efficiency of the proposed algorithm; for example, it outperforms the online counterpart on Bibsonomy dataset with ≈ 800, 000 triples. Keywords: Formal Concept Analysis, triclustering, triadic data, data mining, big data, MapReduce 1 Introduction Mining of multimodal patterns is one of the hot topics in Data Mining and Ma- chine Learning [1,2,3,4]. Thus, cluster analysis of multimodal data and specifi- cally of dyadic and triadic relations is a natural extension of the idea of original clustering. In dyadic case biclustering methods (the term bicluster was coined in [5]) are used to simultaneously find subsets of the sets of objects and at- tributes that form homogeneous patterns of the input object-attribute data. In fact, one of the most popular applications of biclustering is gene expression anal- ysis in Bionformatics [6,7]. Triclustering methods operate in triadic case where for each object-attribute pair one assigns a set of some conditions [8,9,10]. Both biclustering and triclustering algorithms are widely used in such areas as gene expression analysis [11,12,13], recommender systems [14,15,16], social networks analysis [17], etc. The processing of numeric multimodal data is also possible by modifications of existing approaches for mining dyadic binary relations [18]. Though there are methods that can enumerate all triclusters satisfying cer- tain constraints [2] (in most cases they ensure that triclusters are dense), their time complexity is rather high, as in the worst case the maximal number of triclusters usually is exponential (e.g. in case of formal triconcepts), showing that these methods are hardly scalable. To process big data algorithms need to have at most linear time complexity (e.g., O(|I|) in case of n-ary relation I) and be easily parallelisable. In addition, in most cases, it is necessary that such algorithms output the results in one pass. Earlier, in order to create an algorithm satisfying these requirements, we adapted a triclustering method based on prime operators (prime OAC-triclustering c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 47–58, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 48 Sergey Zudin, Dmitry V. Gnatyshak and Dmitry I. 
Ignatov method) [10] and proposed its online version, which is linear, one-pass and eas- ily parallelisable [19]. However, its parallelisation is possible in different ways. For example, one can use a popular framework for commodity hardware, Map- Reduce (M/R) [20]. By the way, there were several successful M/R implementa- tions in the FCA community and other lattice-oriented domains. Thus, in [21], the authors adapted Close-by-One algorithm to M/R framework and showed its efficiency. At the same year, in [22], an efficient M/R algorithm for compu- tation of closed cube lattices was proposed. The authors of [23] demonstrated that iterative algorithms like Ganter’s NextClosure can benefit from the usage of iterative M/R schemes. Note that experts aware potential users that M/R is like a big cannon that re- quires long preparations to shot but fires fast: “the entire distributed-file-system milieu makes sense only when files are very large and are rarely updated in place ” [20]. In this work, in contrast to our previous study, we assume that there is a large bulk of data to process that are not coming online. The rest of the paper is organized as follows: in Section 2, we recall the orig- inal method and the online version of the algorithm of prime OAC-triclustering. In Section 3, we describe the M/R setting of the problem and the corresponding M/R version of the original algorithm with important implementation aspects. Finally, in Section 4 we show the results of several experiments which demon- strate the efficiency of the M/R version of the algorithm. As an addendum, in the Appendix section, the reader may find our proposal for alternative models of M/R-based variants of prime OAC-triclustering. 2 Prime object-attribute-condition triclustering Prime object-attribute-condition triclustering method (OAC-prime) based on Formal Concept Analysis [24,25] is an extension for the triadic case of object- attribute biclustering method [26]. Triclusters generated by this method have similar structure as the corresponding biclusters, namely the cross-like structure of triples inside the input data cuboid (i.e. formal tricontext). Let K = (G, M, B, I) be a triadic context, where G, M , B are respectively the sets of objects, attributes, and conditions, and I ⊆ G × M × B is a tri- adic incidence relation. Each prime OAC-tricluster is generated by applying the following prime operators to each pair of components of some triple: (X, Y )0 = {b ∈ B | (g, m, b) ∈ I for all g ∈ X, m ∈ Y }, (X, Z)0 = {m ∈ M | (g, m, b) ∈ I for all g ∈ X, b ∈ Z}, (1) (Y, Z)0 = {g ∈ G | (g, m, b) ∈ I for all m ∈ Y, b ∈ Z}, where X ⊆ G, Y ⊆ M , and Z ⊆ B. Then the triple T = ((m, b)0 , (g, b)0 , (g, m)0 ) is called prime OAC-tricluster based on triple (g, m, b) ∈ I. The components of tricluster are called, respectively, tricluster extent, tricluster intent, and tricluster modus. The triple (g, m, b) is called a generating triple of the tricluster T . Figure 1 shows the structure of an OAC-tricluster (X, Y, Z) based on triple (e e eb), triples corresponding to the g , m, Putting OAC-triclustering on MapReduce 49 gray cells are contained in the context, other triples may be contained in the tricluster (cuboid) as well. Fig. 1. Structure of prime OAC-triclusters: the dense cross-like central layer containing g̃ (left) and the layer for an object g (right) in M × B dimensions. The basic algorithm for prime OAC-triclustering method is rather straight- forward (see [10]). 
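For concreteness, the prime operators (1) and the tricluster generated by a triple admit a direct set-based reading; the sketch below (Python, over an illustrative set of triples; this is not the authors' Java/Hadoop implementation) follows the definitions above.

def prime_cond(X, Y, I):
    # (X, Y)': conditions b such that (g, m, b) is in I for all g in X, m in Y
    return {b for (_, _, b) in I if all((g, m, b) in I for g in X for m in Y)}

def prime_attr(X, Z, I):
    # (X, Z)': attributes m such that (g, m, b) is in I for all g in X, b in Z
    return {m for (_, m, _) in I if all((g, m, b) in I for g in X for b in Z)}

def prime_obj(Y, Z, I):
    # (Y, Z)': objects g such that (g, m, b) is in I for all m in Y, b in Z
    return {g for (g, _, _) in I if all((g, m, b) in I for m in Y for b in Z)}

def oac_tricluster(g, m, b, I):
    # T = ((m, b)', (g, b)', (g, m)') -- extent, intent, modus
    return (prime_obj({m}, {b}, I), prime_attr({g}, {b}, I), prime_cond({g}, {m}, I))

I = {("u1", "t1", "r1"), ("u1", "t2", "r1"), ("u2", "t1", "r1")}  # toy triples
print(oac_tricluster("u1", "t1", "r1", I))
# extent {'u1', 'u2'}, intent {'t1', 't2'}, modus {'r1'} (set ordering may differ)

The basic algorithm then only iterates this construction over the triples of I, which is what the following description makes precise.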
First of all, for each combination of elements from each two sets of K we apply the corresponding prime operator (we call the resulting sets prime sets). After that we enumerate all triples from I and on each step we must generate a tricluster based on the corresponding triple, check whether this tricluster is already contained in the tricluster set (by using hashing) and also check extra conditions. The total time complexity of the algorithm depends on whether there is a non-zero minimal density threshold or not and on the complexity of the hashing algorithm used. In case we use some basic hashing algorithm processing the tricluster’s extent, intent and modus without a minimal density threshold, the total time complexity is O(|G||M ||B| + |I|(|G| + |M | + |B|)); in case of a non- zero minimal density threshold, it is O(|I||G||M ||B|). The memory complexity is O(|I|(|G| + |M | + |B|)), as we need to keep the dictionaries with the prime sets in memory. In online setting, for triples coming from triadic context K = (G, M, B, I), the user has no a priori knowledge of the elements and even cardinalities of G, M , B, and I. At each iteration we receive some set of triples from I: J ⊆ I. After that we must process J and get the current version of the set of all triclusters. It is important in this setting to consider every pair of triclusters as being different as they have different generating triples, even if their respective extents, intents, and modi are equal. Thus, any other triple can change only one of these two triclusters, making them different. To efficiently access prime sets for their processing, the dictionaries contain- ing the prime sets are implemented as hash-tables. The algorithm is straightforward as well (Alg. 1). It takes some set of triples (J), the current tricluster set (T ), and the dictionaries containing prime sets (P rimes) as input and outputs the modified versions of the tricluster set and 50 Sergey Zudin, Dmitry V. Gnatyshak and Dmitry I. Ignatov dictionaries. The algorithm processes each triple (g, m, b) of J sequentially (line 1). At each iteration the algorithm modifies the corresponding prime sets (lines 2-4). Finally, it adds a new tricluster to the tricluster set. Note that this tricluster contains pointers to the corresponding prime sets (in the corresponding dictio- naries) instead of the copies of the prime sets (line 5) which allows to lower the memory and access costs. Algorithm 1 Add function for the online algorithm for prime OAC-triclustering. Input: J is a set of triples; T = {T = (∗X, ∗Y, ∗Z)} is a current set of triclusters; P rimesOA, P rimesOC, P rimesAC. Output: T = {T = (∗X, ∗Y, ∗Z)}; P rimesOA, P rimesOC, P rimesAC. 1: for all (g, m, b) ∈ J do 2: P rimesOA[g, m] := P rimesOA[g, m] ∪ {b} 3: P rimesOC[g, b] := P rimesOC[g, b] ∪ {m} 4: P rimesAC[m, b] := P rimesAC[m, b] ∪ {g} 5: T := T ∪ {(&P rimesAC[m, b], &P rimesOC[g, b], &P rimesOA[g, m])} 6: end for The algorithm is one-pass and its time and memory complexities are O(|I|). Duplicate elimination and selection patterns by user-specific constraints are done as post-processing to avoid patterns’ loss. The time complexity of the basic post-processing is O(|I|) and it does not require any additional memory. Finally, it seems the algorithm can be easily parallelised by splitting the subset of triples J into several subsets, processing each of them independently, and merging the resulting sets afterward. 3 Map-reduce OAC-triclustering 3.1 Map-reduce decomposition We use a two-stage M/R approach. 
The first M/R stage efficiently calculates the primes of all existing pairs; the second one assembles the found primes into triclusters.

During the first map phase, each triple from the input context is indexed by a key computed by a hash function of one of the basic entities: the object, the attribute, or the condition (see Alg. 2). The number of map keys is equal to the number of reducers. Then each first-stage reducer receives the portion of data for a particular key (see Alg. 3). The internal reducer algorithm is almost a replication of Online OAC-prime; however, it does not assemble all found triclusters into a final collection. The reducer simply writes the current triclusters for its portion of data to a file or passes them to the second-stage mapper. Since in Hadoop MapReduce we have to work with text input files while our data are mainly in a tuple-based form, we use the encode/decode functions encode()/transform() to switch between the internal tuple representation and the text-based one.

Algorithm 2 Distributed OAC-triclustering: First Map
Input: S is a set of input triples as strings; r is the number of reducers; i is a grouping index (objects, attributes or conditions).
Output: J̃ is a list of ⟨key, triple⟩ pairs.
1: for all s ∈ S do
2:   t := transform(s)
3:   key := hash(t[i]) mod r
4:   J̃ := J̃ ∪ {⟨key, t⟩}
5: end for

Algorithm 3 Distributed OAC-triclustering: First Reduce
Input: J is a list of triples (for a certain key); T = {T = (X, Y, Z)} is the current set of triclusters; PrimesOA, PrimesOC, PrimesAC.
Output: file of strings, i.e. encoded ⟨triple, tricluster⟩ pairs.
1: Primes ← initialise a new multimap
2: for all (g, m, b) ∈ J do
3:   Primes[g, m] := Primes[g, m] ∪ {b}
4:   Primes[g, b] := Primes[g, b] ∪ {m}
5:   Primes[m, b] := Primes[m, b] ∪ {g}
6: end for
7: for all (g, m, b) ∈ J do
8:   T := (set(Primes[m, b]), set(Primes[g, b]), set(Primes[g, m]))
9:   s := {encode(⟨(g, m, b), T⟩)}
10:  store s
11: end for

The second mapper takes the found intermediate triclusters (with their keys) as strings from the files produced by the first-stage reducers (see Alg. 4). It fills the Primes multimap in one pass through all ⟨triple, tricluster⟩ pairs. In the next loop, for each key (g, m, b) the corresponding tricluster is formed and ⟨tricluster, tricluster⟩ pairs are passed to the second-stage reducer (using the tricluster as a key can be implemented efficiently with proper hashing). In its turn, the second-stage reducer eliminates duplicates and outputs the resulting file (Alg. 5). The set() function helps to avoid duplicates among the values of Primes[·,·], which is closer to our implementation; however, one can easily omit set() in line 8, provided that Primes is properly implemented.

Algorithm 4 Distributed OAC-triclustering: Second Map
Input: S is a list of strings.
Output: T̃ is a list of ⟨tricluster, tricluster⟩ pairs.
1: Primes ← initialise a new multimap
2: for all s ∈ S do
3:   ⟨(g, m, b), T⟩ := decode(s)
4:   update the Primes multimap appropriately
5:   I := I ∪ {(g, m, b)}
6: end for
7: for all (g, m, b) ∈ I do
8:   T := (set(Primes[m, b]), set(Primes[g, b]), set(Primes[g, m]))
9:   T̃ := T̃ ∪ {⟨T, T⟩}
10: end for

Algorithm 5 Distributed OAC-triclustering: Second Reduce
Input: T̂ is a list of ⟨tricluster, list of triclusters⟩ pairs.
Output: file with the final set of triclusters {T = (X, Y, Z)}.
1: for all ⟨T, [T, . . . , T]⟩ ∈ T̂ do
2:   store T
3: end for

The time complexity of the M/R solution is composed of two terms, one per stage: O(|I|/r) and O(|I|). However, there are communication costs that inevitably have to be paid and that can be theoretically estimated as follows [20]: the replication rate of the first M/R stage is r1 = 1 (each triple is passed as one key-value pair) and the reducer size is q1 = |I|/r; the replication rate of the second M/R stage is r2 = 1 (one key-value pair is assigned to each tricluster), but the reducer size varies between q2min = 1 (no duplicate triclusters) and q2max = |I| (one final tricluster, when all the initial triples belong to one absolutely dense cuboid).

3.2 Implementation aspects and used technologies

The application 1 has been implemented in Java within JRE 8, and Apache Hadoop 2 is used as the distributed computation framework. We have used several other technologies: Apache Maven (a framework for automatic project assembly), Apache Commons (extended Java collections), Google Guava (utilities and data structures), Jackson JSON (an open-source library for transforming the object-oriented representation of an object such as a tricluster to a string), TypeTools (for run-time type resolution of inbound and outbound key-value pairs), etc.

ChainingJob module. During the development we found that in Hadoop one MapReduce process can contain only one Mapper and one Reducer. Thus, in order to develop an application with three "map" phases and one "reduce", one needs to create three processes, and creating one process (even without various adjustments) takes 8-10 lines of code. After a vain search for an appropriate library, we developed the "chaining-job" module 3. Its main class contains the following fields: "jobs" (the list of all scheduled processes), "name" (a common name for all processes), and "tempDir" (the folder name for intermediate results). First, the algorithm sets the input path of the first chained process and the path to the result of the last job; the remaining jobs are connected by their input and output key-value pairs and by the directory for storing intermediate files. Then the algorithm runs the processes according to the schedule and waits for their completion. In other words, it connects the inputs and outputs of chained processes that run sequentially.

1 https://github.com/zydins/DistributedTriclustering
2 http://hadoop.apache.org/

Let us briefly describe the most important classes of our M/R implementation.

Entity. A basic class for the object-oriented representation of input strings; it maintains three entity types: EXTENT, INTENT, and MODUS. For example: {"Leon", EXTENT}.

Tuple. An object of this class stores references to objects of class Entity and represents the two basic entities: triple and tricluster. Mapper and Reducer classes operate on objects of this type.

FormalContext. This class is an object-oriented representation of the underlying binary relation; it keeps a reference to an object of the EntityStorage class (see below). It also contains the methods "add" (add a triple) and "getTriclusters" (get the output set of unique triclusters).

EntityStorage.
This class manages the work with extents, intents and modi of triclustes. It also contains three dictionaries with composite keys. For example, for (g1, m1, c1) object c1 will be added by key (g1, m1) to the first dictionary; analogously for keys (g1, c1) and (m1, c1). The process-like M/R classes are summarised below. TupleReadMapper. Its main goal is reading a triple from the input file and trans- form the triple to an object of class Tuple. TupleContextReducer. It receives input tuples and fills the underlying tricontext by them. It also sets the number of first reducers. This number depends on the available nodes in a distributed system and the structure of input data. The more unique entities are in triples, the more that value should be. PrepareMapper. The “map” method receives files from the previous stage. They contain intermediate triclusters from each object of class TupleContextReducer. It fills the dictionary with primes. Further, each tricluster triple is transformed to Tuple structure and is passed to the second reduce phase. CollectReduce. This class gathers all intermediate triclusters and obtains the final tricluster set. This process runs in several threads for speed up. The number of threads is a user-specified parameter. Executor. It is a starting class of the application, which receives the input pa- rameters, activates “chaining-job” utility for making a chain of jobs, and starts the execution. 3 https://github.com/zydins/chaining-job 54 Sergey Zudin, Dmitry V. Gnatyshak and Dmitry I. Ignatov 4 Experiments Two series of experiments have been conducted in order to test the application on the synthetic contexts and real world datasets with moderate and large num- ber of triples in each. In each experiment both versions of the OAC-triclustering algorithm have been used to extract triclusters from a given context. Only online and M/R versions of OAC-triclustering algorithm have managed to result pat- terns for large contexts since the computation time of the compared algorithms was too high (>3000 s). To evaluate the runtime more carefully, for each context the average result of 5 runs of the algorithms has been recorded. 4.1 Datasets Synthetic datasets. As it was mentioned, synthetic contexts were randomly gen- erated: 1) 20,000 triples (25 unique entities of each type); 2) 100,000 triples (50 unique entities of each type); 3) 1,000,000 triples (all possible combinations of 100 unique entities of each type). However, it is easy to see that some datasets are not correct formal contexts from algebraic viewpoint. Thus, the first dataset inevitably contains duplicates since 25 × 25 × 25 gives only 15,625 unique triples. The second one contains less triples than 503 = 125, 000, the number of all possi- ble combinations. The third one is just an absolutely dense cuboid 100×100×100 (it contains only one formal concept (OAC-tricluster), the whole context). These tests look more like crush test, but they have sense since in M/R setting the triples can be (partially) repeated, e.g., because of M/R task failures on some nodes (i.e. restarting processing of some key-value pairs). Even though the third dataset does not result in 3min(|G|,|M |,|B|) formal triconcepts, the worst case for formal triconcepts generation in terms of the number of patterns, this is an example of the worst case scenario for the second reducer since its size is maximal (q2max = |I|). By the way, our algorithm should correctly assemble the only one tricluster (G, M, B) and it actually does. IMDB. 
This dataset consists of the Top-250 list of the Internet Movie Database (the 250 best movies based on user reviews). The following triadic context is composed: the set of objects consists of movie names, the set of attributes of tags, and the set of conditions of genres; each triple of the ternary relation means that the given movie has the given genre and is assigned the given tag.

Bibsonomy. Finally, a sample of the data of bibsonomy.org from the ECML PKDD discovery challenge 2008 has been used. This website allows users to share bookmarks and lists of literature and to tag them. For the tests the following triadic context has been prepared: the set of objects consists of users, the set of attributes of tags, and the set of conditions of bookmarks; a triple of the ternary relation means that the given user has assigned the given tag to the given bookmark. Table 1 summarises the contexts.

Table 1. Contexts for the experiments

Context     |G|    |M|     |B|     # triples   Density
20k         25     25      25      20,000      1
100k        50     50      50      100,000     0.8
1m          100    100     100     1,000,000   1
IMDB        250    795     22      3,818       0.00087
BibSonomy   2,337  67,464  28,920  816,197     1.8 · 10^-7

4.2 Results

The experiments have been conducted on a computer running OS X 10 with a 1.8 GHz Intel Core i5, 4 GB of 1600 MHz DDR3 memory, and 8 GB of free hard drive space (typical commodity hardware). Two M/R modes have been tested: a sequential mode of task completion and an emulation of the distributed mode with 16 first-stage reducers and 32 threads for the second stage.

Table 2. Results of comparison (time is given in seconds)

Algorithm/Context      IMDB (≈3k triples)  20k triples  100k triples  1m triples  Bibsonomy (≈800k triples)
Tribox                 324                 800          1,265         >3,000      >3,000
TRIAS                  189                 362          862           >3,000      >3,000
OAC Box                374                 756          1,265         >3,000      >3,000
OAC Prime              7                   8            734           >3,000      >3,000
Online OAC prime       3                   3            3             5           >3,000
M/R OAC prime seq.     12                  30           81            166         1,534
M/R OAC prime distr.   1                   15           20            25          520

Table 2 summarises the results of the performed tests. On average, our application has lower execution times than its competitors, with the exception of the online version of OAC-triclustering. If we compare the implemented program with its original online version, the results are worse for datasets that are not that big but dense (closer to the worst-case scenario q2 = |I|). This is a consequence of the application architecture being aimed at processing large amounts of data; in particular, it is implemented in two stages with time-consuming communication. Launching and stopping Apache Hadoop and writing and passing data between the Map and Reduce steps of both stages require substantial time; that is why, for not-so-big datasets where the execution time is comparable with the time spent on infrastructure management, the time performance is not perfect. However, as the data size increases, the relative performance grows. Thus, the last test, on the BibSonomy data, has been passed successfully: the competitors were not able to finish it within 50 min, whereas our M/R program completed it within 25 min even in sequential mode.

5 Conclusion

In this paper we have presented a map-reduce version of the OAC-triclustering algorithm. We have shown that the algorithm is efficient from both theoretical and practical points of view. It retains linear time complexity and is performed in two stages (each stage being M/R distributed); this allows us to use it for big data problems.
However, we believe that it is possible to propose another variants of map-reduce based algorithm where the reducer exploits composite keys directly (see Appendix section). So, such algorithms and their comparison with the current M/R version on real and artificial data is still be in our plans. However, in despite the step towards Big Data technologies, a proper comparison of the proposed OAC triclustering and noise tolerant patterns in n-ary relations by DataPeeler and its descendants [2] is not yet conducted. Acknowledgments. The study was implemented in the framework of the Basic Research Program at the National Research University Higher School of Eco- nomics in 2014-2015, in the Laboratory of Intelligent Systems and Structural Analysis. The last two authors were partially supported by Russian Foundation for Basic Research, grant no. 13-07-00504. The authors would like to thank Yuri Kudriavtsev from PM-Square and Dominik Slezak from Infobright and Warsaw University for their encouragement given to our studies of M/R technologies. References 1. Georgii, E., Tsuda, K., Schölkopf, B.: Multi-way set enumeration in weight tensors. Machine Learning 82(2) (2011) 123–155 2. Cerf, L., Besson, J., Nguyen, K.N., Boulicaut, J.F.: Closed and noise-tolerant patterns in n-ary relations. Data Min. Knowl. Discov. 26(3) (2013) 574–619 3. Spyropoulou, E., De Bie, T., Boley, M.: Interesting pattern mining in multi- relational data. Data Mining and Knowledge Discovery 28(3) (2014) 808–849 4. Ignatov, D.I., Gnatyshak, D.V., Kuznetsov, S.O., Mirkin, B.: Triadic formal con- cept analysis and triclustering: searching for optimal patterns. Machine Learning (2015) 1–32 5. Mirkin, B.: Mathematical Classification and Clustering. Kluwer, Dordrecht (1996) 6. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Trans. Comput. Biology Bioinform. 1(1) (2004) 24–45 7. Eren, K., Deveci, M., Kucuktunc, O., Catalyurek, Umit V.: A comparative analysis of biclustering algorithms for gene expression data. Briefings in Bioinform. (2012) 8. Mirkin, B.G., Kramarenko, A.V.: Approximate bicluster and tricluster boxes in the analysis of binary data. In Kuznetsov, S.O., et al., eds.: RSFDGrC 2011. Volume 6743 of Lecture Notes in Computer Science., Springer (2011) 248–256 9. Ignatov, D.I., Kuznetsov, S.O., Poelmans, J., Zhukov, L.E.: Can triconcepts be- come triclusters? International Journal of General Systems 42(6) (2013) 572–593 10. Gnatyshak, D.V., Ignatov, D.I., Kuznetsov, S.O.: From triadic FCA to tricluster- ing: Experimental comparison of some triclustering algorithms. In: CLA. (2013) 249–260 Putting OAC-triclustering on MapReduce 57 11. Zhao, L., Zaki, M.J.: Tricluster: An effective algorithm for mining coherent clusters in 3d microarray data. In: SIGMOD 2005 Conference. (2005) 694–705 12. Li, A., Tuck, D.: An effective tri-clustering algorithm combining expression data with gene regulation information. Gene regul. and syst. biol. 3 (2009) 49–64 13. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181(10) (2011) 1989–2001 14. Nanopoulos, A., Rafailidis, D., Symeonidis, P., Manolopoulos, Y.: Musicbox: Per- sonalized music recommendation based on cubic analysis of social tags. IEEE Transactions on Audio, Speech & Language Processing 18(2) (2010) 407–412 15. Jelassi, M.N., Yahia, S.B., Nguifo, E.M.: A personalized recommender system based on users’ information in folksonomies. 
In Carr, L., et al., eds.: WWW (Companion Volume), ACM (2013) 1215–1224 16. Ignatov, D.I., Nenova, E., Konstantinova, N., Konstantinov, A.V.: Boolean Matrix Factorisation for Collaborative Filtering: An FCA-Based Approach. In: AIMSA 2014, Varna, Bulgaria, Proceedings. Volume LNCS 8722. (2014) 47–58 17. Gnatyshak, D.V., Ignatov, D.I., Semenov, A.V., Poelmans, J.: Gaining insight in social networks with biclustering and triclustering. In: BIR. Volume 128 of Lecture Notes in Business Information Processing., Springer (2012) 162–171 18. Kaytoue, M., Kuznetsov, S.O., Macko, J., Napoli, A.: Biclustering meets triadic concept analysis. Ann. Math. Artif. Intell. 70(1-2) (2014) 55–79 19. Gnatyshak, D.V., Ignatov, D.I., Kuznetsov, S.O., Nourine, L.: A one-pass triclus- tering approach: Is there any room for big data? In: CLA 2014. (2014) 20. Rajaraman, A., Leskovec, J., Ullman, J.D.: MapReduce and the New Software Stack. In: Mining of Massive Datasets. Cambridge University Press, England, Cambridge (2013) 19–70 21. Krajca, P., Vychodil, V.: Distributed algorithm for computing formal concepts using map-reduce framework. In: N. Adams et al. (Eds.): IDA 2009. Volume LNCS 5772. (2009) 333–344 22. Kuznecov, S., Kudryavcev, Y.: Applying map-reduce paradigm for parallel closed cube computation. In: 1st Int. Conf. on Advances in Databases, Knowledge, and Data Applications, DBKDS 2009. (2009) 62–67 23. Xu, B., de Frein, R., Robson, E., Foghlu, M.O.: Distributed formal concept analysis algorithms based on an iterative mapreduce framework. In Domenach, F., Ignatov, D., Poelmans, J., eds.: ICFCA 2012. Volume LNAI 7278. (2012) 292–308 24. Wille, R.: Restructuring lattice theory: An approach based on hierarchies of con- cepts. In Rival, I., ed.: Ordered Sets. Volume 83 of NATO Advanced Study Insti- tutes Series. Springer Netherlands (1982) 445–470 25. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. 1st edn. Springer-Verlag New York, Inc., Secaucus, NJ, USA (1999) 26. Ignatov, D.I., Kuznetsov, S.O., Poelmans, J.: Concept-based biclustering for inter- net advertisement. In: ICDM Workshops, IEEE Computer Society (2012) 123–130 Appendix. Alternative variants of two-stage MapReduce First Map: Finding primes. During this phase every input triple (g, m, b) is encoded by three key-value pairs h(g, m), bi, h(g, b), mi, and h(m, b), gi. These pairs are passed to the first reducer. The replication rate is r1 = 3. 58 Sergey Zudin, Dmitry V. Gnatyshak and Dmitry I. Ignatov First Reduce: Finding primes. This reducer fills three corresponding dic- tionaries for primes of keys. So, for example, the first dictionary, P rimeOA contains key-value pairs h(g, m), {b1 , b2 , . . . , bn }i. The reducer size is q1 = max(|G|, |M |, |B|) The process can be stopped after the first reduce phase and all the triclus- ters found as (P rime[g, m], P rime[g, b], P rime[m, b]) each by enumeration of (g, m, b) ∈ I. However, to do it faster and keep the result for further computa- tion, it is possible to use M/R as well. Second Map: Tricluster generation. The second map does tricluster combining job, i.e. for each triple (g, m, b) it composes the new key-value pair, h(g, m, b), ∅i. And for each pair of either type, h(g, m), P rime[g, m]i, h(g, b), P rime[g, b]i, and h(m, b), P rime[m, b]i it generates key-values pairs h(g, m, b̃), P rime[g, m]i, h(g, m̃, b), P rimeOC[g, b]i, and h(g̃, m, b), P rime[m, b]i, where g̃ ∈ G, m̃ ∈ M , and b̃ ∈ B. 
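To make the key structure of this alternative decomposition concrete, here is a rough sketch of its first stage in Python. Plain functions emulate the mapper and the reducer, the shuffle between them is simulated explicitly, and all names are made up for the illustration; the actual system would run on Hadoop.

from collections import defaultdict

def first_map(triples):
    """Each triple (g, m, b) is emitted under its three composite keys (r1 = 3)."""
    for g, m, b in triples:
        yield ("OA", (g, m)), b
        yield ("OC", (g, b)), m
        yield ("AC", (m, b)), g

def shuffle(pairs):
    """Group values by key, as Hadoop does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def first_reduce(key, values):
    """Collect the prime set of a composite key, e.g. PrimeOA[(g, m)] = {b1, ..., bn}."""
    return key, set(values)

triples = [("g1", "m1", "b1"), ("g1", "m1", "b2"), ("g2", "m1", "b1")]
primes = dict(first_reduce(k, v) for k, v in shuffle(first_map(triples)).items())

# The triclusters can now be read off, one per triple of the relation:
# (primes[("AC", (m, b))], primes[("OC", (g, b))], primes[("OA", (g, m))]),
# which is exactly what the second stage assembles and deduplicates.
print(primes[("OA", ("g1", "m1"))])   # {'b1', 'b2'}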
r2 = (|I|+3|G||M ||B|)/(|I|+|G||M |+|G||B|+ |M ||B|) ≤ (ρ + 3)/(ρ + 3/max(|G|, |M |, |B|)), where ρ is the input tricontext density. Second Reduce: Tricluster generation. The second reducer just assem- bles only one value for each key (g, m, b), the generating triple, its tricluster, (P rime[g, m], P rime[g, b], P rime[m, b]). If there is no key-value pair h(g, m, b), ∅i for a particular triple (g, m, b), it does not output any key-value pair for the key. The reducer size q2 is either 3 (no output) or 4 (tricluster assembled). Second Map: Tricluster generation with duplicate generating triples. Second map does tricluster combining job, i.e. for each triple (g, m, b) it composes a new key-value pair: h(P rime[g, m], P rime[g, b], P rime[m, b]), (g, m, b)i. Second Map: Tricluster generation with duplicate generating triples. The second reducer just groups values for each key: h(X, Y, Z), {(g1 , m1 , b1 ), . . . , (gn , mn , bn )}i. These two variations of the second stage have their merits: the first one is beneficial for further computations with a new portion of triples and the last one is more compact and informative. Of course, each variant of the second stage has its own runtime complexity which depends not only on the model representation but is also sensitive to datastructures implementation and M/R communication costs and settings. Concept interestingness measures: a comparative study Sergei O. Kuznetsov1 and Tatiana P. Makhalova1,2 1 National Research University Higher School of Economics, Kochnovsky pr. 3, Moscow 125319, Russia 2 ISIMA, Complexe scientifique des Cézeaux, 63177 Aubière Cedex, France skuznetsov@hse.ru,t.makhalova@gmail.com Abstract. Concept lattices arising from noisy or high dimensional data have huge amount of formal concepts, which complicates the analysis of concepts and dependencies in data. In this paper, we consider several methods for pruning concept lattices and discuss results of their com- parative study. 1 Introduction Formal Concept Analysis (FCA) underlies several methods for rule mining, clus- tering and building taxonomies. When constructing a taxonomy one often deals with high dimensional or/and noisy data which results in a huge amount of for- mal concepts and dependencies given by implications and association rules. To tackle this issue different approaches were proposed for selecting most important or interesting concepts. In this paper we consider existing approaches which fall into the following groups: pre-processing of a formal context, modification of the closure operator, and concept filtering based on interestingness indices (mea- sures). We mostly focus on comparison of interestingness measures and study their correlations. 2 FCA framework Here we briefly recall FCA terminology [20]. A formal context is a triple (G, M, I), where G is called a set objects, M is called a set attributes and I ⊆ G × M is a relation called incidence relation, i.e. (g, m) ∈ I if the object g has the attribute 0 m. The derivation operators (·) are defined for A ⊆ G and B ⊆ M as follows: A0 = {m ∈ M |∀g ∈ A : gIm} B 0 = {g ∈ G|∀m ∈ B : gIm} A0 is the set of attributes common to all objects of A and B 0 is the set of objects 0 sharing all attributes of B. The double application of (·) is a closure operator, c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 59–72, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 60 Sergei O. 
Kuznetsov and Tatyana P. Makhalova 00 i.e. (·) is extensive, idempotent and monotone. Sets A ⊆ G, B ⊆ M , such that A = A00 and B = B 00 are said to be closed. A (formal) concept is a pair (A, B), where A ⊆ G, B ⊆ M and A0 = B, 0 B = A. A is called the (formal) extent and B is called the (formal) intent of the concept (A, B). A partial order 6 is defined on the set of concepts as follows: (A, B) ≤ (C, D) iff A ⊆ C (D ⊆ B), a pair (A, B) is a subconcept of (C, D), while (C, D) is a superconcept of (A, B). 3 Methods for simplifying a lattice structure With the growth of the dimension of a context the size of a lattice can increase exponentially, it becomes almost impossible to deal with the huge amount of formal concepts. With this respect a wide variety of methods have been pro- posed. Classification of them was presented in [16]. Authors proposed to divide techniques for lattice pruning into three classes: redundant information removal, simplification, selection. In this paper, we consider also other classes of methods and their application to concept pruning. 3.1 Pre-processing Algorithms for concept lattice are time consuming. To decrease computation costs one can reduce the size of a formal context. Cheung and Vogel [13] applied Singular Value Decomposition (SVD) to obtain a low-rank approximation of Term-Document matrix and construct concept lattice using pruned concepts. Since this method is also computationally complex [25], alternative methods such as spherical k-Means [14] and fuzzy k-Means [17], Non-negative Matrix Decomposition [33] were proposed. Dimensionality reduction can dramatically decrease the computational load and simplify the lattice structure, but in most cases it is very difficult to interpret the obtained results. Another way to solve described problems without changing the dimension of the context was proposed in [18], where an algorithm that significantly improves the lattice structure by making small changes of context was presented. The central notion of the method is the concept incomparability w.r.t. ≤ relation. The goal of the proposed method is to diminish total incomparability of the concepts in the lattice. The authors note that the result is close to that of fuzzy k-Means, but the former is achieved with fewer context changes than required by the latter. How- ever, such transformations do not always lead to the decrease of a number of formal concepts, the transformations of a context are aimed at increasing the share of comparable concepts, thus this method does not ensure a significant simplification of the lattice structure. Context pruning by clustering objects was introduced in [15]. The similarity of objects is defined as the weighted sum of shared attributes. Thus, the original context is replaced by the reduced one. Firstly, we need to assign weights wm Concept interestingness measures: a comparative study 61 for each attribute m ∈ M . The similarity between objects is defined as weighted sum of shared attributes. Objects are considered similar if sim(g, h) ≥ ε, where ε is a predefined thresh- old. In order to avoid the generation of large clusters another threshold α was proposed. Thus, the algorithm is an agglomerative clustering procedure, such that at each step clusters are brought together if the similarity between them is less than ε and the volume of clusters is less than α|G| objects. 
3.2 Reduction based on a background knowledge or predefined constraints Another approach to tackle computation and representation issues is to de- termine constraints on the closure operator. It can be done using background knowledge of attributes. In [8] the extended closure operator was presented. It is based on the notion of AD-formulas (attribute-dependency formulas), which es- tablish dependence of attributes and their relative importance. Put differently, the occurrence of certain attributes implies that more important ones should also occur. Concepts which do not satisfy this condition are not included in the lattice. In [5] a numerical approach to defining attribute importance was proposed. The importance of a formal concept can be defined by various aggregation func- tions (average, minimum, maximum) and different intent subsets (generator, minimal generator or intent itself). It was shown [5] that there is a correspon- dence between this numerical approach and AD-formulas. Carpineto and Romano [12] considered document-term relation and proposed to use a thesaurus of terms to prune the lattice. Two different attributes are considered as same if there is a common ancestor in the hierarchy. To enrich the set of attributes they used a thesaurus, but in general, it may be quite difficult to establish such kind of relationship between arbitrary attributes. Computing concepts with extents exceeding a threshold was proposed in [26] and studied in relation to frequent itemset mining in [34]. The main drawback of this approach, called “iceberg lattice” mining, is missing rare and probably interesting concepts. Several polynomial-time algorithms for computing Galois sub-hierarchies were proposed, see [9, 3]. 3.3 Filtering concepts Selecting most interesting concepts by means of interestingness measures (in- dices) is the most widespread way of dealing with the huge number of concepts. The situation is aggravated by complexity of computing some indices. However, this approach may be fruitful, since it provides flexible tools for exploration of a derived taxonomy. In this section we consider different indices for filtering formal concepts. These indices can be divided into the following groups: mea- sures designed to assess closed itemsets (formal concepts), arbitrary itemsets and 62 Sergei O. Kuznetsov and Tatyana P. Makhalova measures for assessing the membership in a basic level (a psychology-motivated approach). Indices for formal concepts Stability Stability indices were introduced in [27, 28] and modified in [29]. One distinguishes intensional and extensional stability. The first one allows estimating the strength of dependence of an intent on each object of the respective extent. Extensional stability is defined dually. | {C ⊆ A|C 0 = B} | Stabi (A, B) = 2|A| The problem of computing stability is #P -complete [28] and hence it makes this measure impractical for large contexts. In [4] its Monte Carlo approximation was introduced, a combination of Monte Carlo and upper bound estimate was proposed in [10]. Since for large contexts the stability is close to 1 [21] the logarithmic scale of stability (inducing the same ranking as stability) [10] is often used: LStab (c) = −log2 (1 − Stab (c)) The bounds of stability are given by X ∆min (c) − log2 (|M |) ≤ −log2 2−∆(c,d) ≤ LStab (c) ≤ ∆min (c) , d∈DD(c) where ∆min (c) = mind∈DD(c) ∆ (c, d), DD (c) is a set of all direct descendants of c in the lattice and ∆ (c, d) is the size of the set-difference between extents of formal concepts c and d. 
In our experiments we used the bounds of logarithmic stability, because the combined method is still computationally demanding. Concept Probability Stability of a formal concept may be interpreted as proba- bility of retaining its intent after removing some objects from the extent, taking that all subsets of the extent have equal probability. In [24] it was noticed that some interesting concepts with small number of object usually have low stability value. To ensure selection of interesting infrequent closed patterns, the concept probability was introduced. It is equivalent to the probability of a concept in- troduced earlier by R. Emilion [19]. The probability that an arbitrary object has all attributes from the set B is defined as follows Y pB = pm m∈B Concept probability is defined as the probability of B being closed: n n " # X X n−k Y 00 0 00 k k p (B = B ) = p (|B | = k, B = B ) = pB (1 − pB ) 1 − pm k=0 k=0 m∈B / Concept interestingness measures: a comparative study 63 where n = |G|. The concept probability has the following probabilistic components: the oc- currence of each attribute from B in all k objects, the absence of at least one attribute from B in other objects and the absence of other attributes shared by all k objects. Robustness Another probabilistic approach to assessing a formal concept was proposed in [35]. Robustness is defined as the probability of a formal concept intent remaining closed while deleting objects, where every object of a formal context is retained with probability α. Then for a formal concept c = (A, B) the robustness is given as follows: X |B |−|Bc | |A |−|Ad | r (c, α) = (−1) d (1 − α) c dc Separation The separation index was considered in [24]. The main idea behind this measure is to describe the area covered by a formal concept among all nonzero elements in the corresponding rows and columns of the formal context. Thus, the value characterizes how specific is the relationship between objects and attributes of the concept with respect to the formal context. |A||B| s (A, B) = P 0 P 0 g∈A |g | + m∈B |m | − |A||B| Basic Level Metrics The group of so-called “basic level” measures was consid- ered by Belohlavek and Trnecka [6, 7]. These measures were proposed to formalize the existing psychological approach to defining basic level of a concept [31]. Similarity approach (S) A similarity approach to basic level was proposed in [32] and subsequently formalized and applied to FCA in [6]. The authors defined basic level as combination of three fuzzy functions that correspond to formalized properties outlined by Rosch: high cohesion of concepts, considerably greater cohesion with respect to upper neighbor and a slightly less cohesion with respect to lower neighbors. The membership degree of a basic level is defined as follows: BLS = coh∗∗ (A, B) ⊗ coh∗∗ ∗∗ un (A, B) ⊗ cohln (A, B) , where αi is a fuzzy function that corresponds to the conditions defined above, ⊗ is t-norm [23]. A cohesion of a formal concept is a measure of pairwise similarity of all object in the extent. Various similarity measures can be used for cohesion functions: |B1 ∩ B2 | + |M − (B1 ∪ B2 ) | simSM C (B1 , B2 ) = |M | |B1 ∩ B2 | simJ (B1 , B2 ) = |B1 ∪ B2 | 64 Sergei O. Kuznetsov and Tatyana P. Makhalova The first similarity index simSM C takes into account the number of com- mon attributes, while Jaccard similarity simJ takes exactly the proportion of attributes shared by two sets. 
There are two ways to compute cohesion of formal concepts: taking average or minimal similarity among sets of attributes of the concept extent, the formulas are represented below (for average and minimal similarity respectively). P 0 0 a x1 ,x2 ⊆A,x1 6=x2 sim... (x1 , x2 ) coh... (A, B) = |A| (|A| − 1) /2 0 0 cohm ... (A, B) = min sim... (x1 , x2 ) x1 ,x2 ∈A The Rosch’s properties for upper and lower neighbors take the following forms: P ∗ ∗ a∗ c∈U N (A,B) coh... (c) /coh... (A, B) coh...,un (A, B) = 1 − |U N (A, B) | P ∗ ∗ c∈LN (A,B) coh... (A, B) /coh... (c) coha∗ ...,ln (A, B) = |LN (A, B) | cohm∗ ...,un (A, B) = 1 − max coh∗... (c) /coh∗... (A, B) c∈U N (A,B) cohm∗ ...,ln (A, B) = min coh∗... (A, B) /coh∗... (c) c∈LN (A,B) where U N (A, B) and LN (A, B) are upper and lower neighbors of a formal concept (A, B) respectively. As the authors noted, experiments revealed that the type of cohesion function does not affect the result, while the choice of similarity measure can greatly affect the outcome. More than that, in some cases upper (lower) neighbors may have higher (lower) cohesion than the formal concept itself (for example, some boundary cases, when a neighbors’s extent (an intent) consists of identical rows (columns) of a formal context). To tackle this issue of non-monotonic neighbors w.r.t. similarity function authors proposed to take coh∗∗ ∗∗ ...,ln and coh...,un as 0, if the rate of non-monotonic neighbors is larger that a threshold. In our experiments we used the following notation: SMC∗∗ and J∗∗ , where the first star is replaced by a cohesion type, the second one is replaced by the type of a similarity function. Below, we consider another four metrics that were introduced in [7]. Predictability approach (P) Predictability of a formal concept is computed in a quite similar way to BLS . A cohesion function is replaced by a predictability function: P (A, B) = pred∗∗ (A, B) ⊗ pred∗∗ ∗∗ un (A, B) ⊗ predln (A, B) The main idea behind this approach is to assign high score to concept (A, B) with low conditional entropy of the presence of attributes not in B in intents of objects from A (i.e., requiring few attributes outside B in objects from A)[7]: Concept interestingness measures: a comparative study 65 |A ∩ y 0 | |A ∩ y 0 | E (I [hx, yi ∈ I] |I [x ∈ A]) = − log |A| |A| X E (I [hx, yi ∈ I] |I [x ∈ A]) pred (A, B) = 1 − . |M − B| y∈M −B Cue Validity (CV), Category Feature Collocation (CFC), Category Utility (CU) The following measures based on the conditional probability of object g ∈ A given that y ⊆ g 0 were introduced in [7]: X X |A| CV (A, B) = P (A|y 0 ) = |y 0 | y∈B y∈B X X |A ∩ y 0 | |A ∩ y 0 | CF C (A, B) = p (A|y 0 ) p (y 0 |A) = |y 0 | |A| y∈M y∈M " 2 0 2 # Xh 2 i |A| X |A ∩ y 0 | |y | 0 0 2 CU (A, B) = p (A) p (y |A) − p (y ) = − |G| |y 0 | |G| y∈M y∈M The main intuition behind CV is to express probability of extent given at- tributes from intent, CFC index takes into account the relationship between all attributes of the concept and intent of the formal concept, while CU evaluates how much an attribute in an intent is characteristic for a given concept rather than for the whole context [36]. Metrics for arbitrary itemsets Frequency(support) It is one of the most popular measures in the theory of pattern mining. According to this index the most “interesting” concepts are frequent ones (having high support). 
For an arbitrary formal concept the support is defined as follows |A| supp (A, B) = |G| The support provides efficient level-wise algorithms for constructing semilattices since it exhibits anti-monotonicity (a priori property [2, 30]): B1 ⊂ B2 → supp (B1 ) ≥ supp (B2 ) Lift In the previous section different methods with background knowledge were considered. Another way to add additional knowledge to data is proposed in [11]. Under assumption of attributes independence it is possible to compute individual frequencies of attributes and take their product as the expected frequency. The ratio of the observed frequency to its expectation is defined as lift. The lift of a formal concept (A, B) is defined as follows: P (A) |A|/|G| lif t (B) = Q 0) =Q 0 b∈B P (b b∈B |b |/|G| 66 Sergei O. Kuznetsov and Tatyana P. Makhalova Collective Strength The collective strength [1] combines ideas of comparing the observed data and expectation under the assumption of independence of at- tributes. To calculate this measure for a formal concept (A, B) one needs to define for B the set of objects VQ B that has at least one attribute in B, but not all of them at once. Denote q = b∈B supp (b0 ) and supp (VB ) = v, the collective strength of a formal concept has the following form: 1−v q cs (B) = v 1−q 4 Experiments In this section, we compare measures with respect to their ability to help selecting most interesting concepts and filtering concepts coming from noisy datasets. For both goals, one is interested in a ranking of concepts rather than in particular values of the measures. 4.1 Formal Concept Mining Usually concept lattices constructed from empirical data have huge amount of formal concepts, many of them being redundant, excessive and useless. In this connection the measures can be used to estimate how meaningful a concept is. Since the “interestingness” of a concept is a fairly subjective measure, the correct comparison of indices in terms of ability to select meaningful ones is impossible. With this respect we focus on similarity of indices described above. To identify how similar indices are, we use the Kendall tau correlation coefficient [22]. Put differently, we consider pairwise similarity of two lists of the same concepts that are ordered by values of the chosen indices. A set of strongly correlated measures can be replaced by one with the lowest computational complexity. We randomly generated 100 formal contexts of random sizes. The number of attributes was in range between 10 and 40, while the number of objects varied from 10 to 70. For generated contexts we calculated pairwise Kendall tau for all indices of each context.The averaged values of correlations coefficients are represented in Table 1. In [7] it was shown that the CU, CFC and CV are correlated, while S and P are not strongly correlated to other metrics. The results of our simulations allow us to conclude that CU, CFC and CV are also pairwise correlated to separation and support. Moreover, support is strongly correlated to separation and probability. Since the computational complexity of support is less than that of separation and probability, it is preferable to use support. It is worth noting that predictability (P) and robustness are not correlated to any other metrics and hence they can not be replaced by the metrics introduced so far. Thus, based on the correlation analysis, it is possible to reduce computation- ally complexity by choosing the most easily computable index within the class of correlated metrics. 
Concept interestingness measures: a comparative study 67 Table 1. Kendall tau correlation coefficient for indices Smm J Sma J Sam J Saa J Smm ma am aa SM C SSM C SSM C SSM C P CU CFC CV Rob0.8 Rob0.5 Rob0.3 Prob 0.18 0.15 0.14 0.14 0.04 0.03 0.00 -0.02 0.04 0.30 0.49 -0.01 -0.07 -0.11 -0.14 Sep 0.20 0.20 0.18 0.18 0.07 0.07 0.14 0.12 0.05 0.36 0.45 0.54 -0.11 -0.12 -0.13 CS -0.08 -0.05 -0.06 -0.05 -0.07 -0.07 0.02 0.04 -0.09 0.04 -0.12 0.29 0.00 0.02 0.04 Lift -0.16 -0.13 -0.08 -0.07 -0.09 -0.08 0.02 0.03 -0.15 -0.07 -0.25 0.25 0.07 0.10 0.11 Sup 0.17 0.17 0.21 0.21 -0.01 -0.02 0.03 0.00 -0.06 0.54 0.80 0.31 -0.10 -0.15 -0.18 Stab 0.08 0.08 0.11 0.11 0.01 0.01 -0.02 -0.02 -0.18 -0.05 0.08 0.12 0.23 0.14 0.06 Stabl 0.06 0.06 0.11 0.11 0.02 0.02 0.01 0.01 -0.17 -0.16 -0.05 0.07 0.24 0.21 0.14 Stabh 0.15 0.14 0.15 0.14 0.02 0.01 -0.04 -0.05 -0.11 0.24 0.45 0.23 0.13 0.00 -0.09 Rob0.1 -0.09 -0.09 -0.02 -0.02 0.00 0.00 -0.01 0.00 -0.02 -0.11 -0.16 -0.09 0.56 0.73 0.86 Rob0.3 -0.10 -0.10 -0.03 -0.02 0.00 0.00 -0.02 0.00 -0.03 -0.12 -0.18 -0.09 0.68 0.86 Rob0.5 -0.08 -0.08 -0.02 -0.02 0.02 0.02 -0.02 -0.01 -0.03 -0.12 -0.15 -0.07 0.82 Rob0.8 -0.06 -0.06 -0.03 -0.02 0.03 0.03 -0.03 -0.02 -0.03 -0.11 -0.12 -0.06 CV 0.08 0.09 0.15 0.15 -0.04 -0.04 0.05 0.05 -0.14 0.50 0.52 CFC 0.09 0.08 0.15 0.15 -0.13 -0.13 -0.05 -0.06 -0.18 0.72 CU 0.03 0.04 0.10 0.11 -0.13 -0.13 -0.06 -0.07 -0.17 -0.11 Stabh P 0.43 0.42 0.28 0.27 0.50 0.50 0.40 0.41 0.39 0.09 Stabl Saa SM C 0.39 0.39 0.56 0.56 0.49 0.50 0.92 0.86 0.59 0.03 Stab Sam SM C 0.39 0.38 0.58 0.57 0.48 0.49 0.18 0.02 0.58 -0.17 Sup Sma SM C 0.51 0.50 0.37 0.37 0.96 -0.47 -0.04 0.05 -0.29 0.10 Lift Smm SM C 0.51 0.48 0.36 0.36 0.64 -0.32 -0.09 -0.04 -0.25 0.03 CS Saa J 0.41 0.42 0.95 0.14 0.01 0.42 0.03 -0.02 0.20 -0.13 Sep Sam J 0.42 0.41 0.17 -0.53 -0.73 0.76 0.15 0.02 0.48 -0.14 Prob Sma J 0.90 Sep CS Lift Sup Stab Stabl Stabh Rob0.1 4.2 Noise Filtering In practice, we often have to deal with noisy data. In this case, the number of formal concepts can be very large and the lattice structure becomes too compli- cated [24]. To test the ability to filter out noise we took 5 lattices of different structure. Four of them are quite simple (Fig. 1) and the fifth one is the bina- rized fragment of the Mushroom data set 1 on 500 objects and 14 attributes, its concept lattice consists of 54 formal concepts. (a) (b) (c) (d) Fig. 1. Concept lattices for formal contexts with 300 objects and 6 attributes (a - c),with 400 objects and 4 attributes (d) 1 https://archive.ics.uci.edu/ml/datasets/Mushroom 68 Sergei O. Kuznetsov and Tatyana P. Makhalova For a generated 0-1 datatable we changed table elements (0 to 1 and 1 to 0) with a given probability. The rate of noise (the probability of replacement) varied in the range from 0.05 to 0.5. We test the ability of a measure to filter redundant concepts in terms of precision and recall. For top-n (w.r.t. a measure) formal concepts, the recall and precision are defined as follows: |original conceptstop−n | recalltop−n = |original concepts| |original conceptstop−n | precisiontop−n = |top − n concepts| Table 2. 
Precision of indices with recall = 0, 6 Noise Prob Sep Stabl Stabh CV CFC CU Freq Rob0.5 rate Antichain 0.1 0.03 1 1 1 1 0.15 0.25 0.13 0.05 0.3 0.03 1 1 1 1 0.09 0.20 0.10 0.02 0.5 0.02 0.20 0.12 0.13 0.29 0.07 0.10 0.06 0.02 Chain 0.1 0.80 0.44 1 1 0.67 0.27 0.13 0.27 0.80 0.3 0.21 0.18 0.67 1 0.22 0.18 0.17 0.19 1 0.5 0.29 0.13 0.25 0.57 0.21 0.14 0.16 0.14 0.57 Context 3 0.1 0.20 1 1 1 0.36 0.33 0.44 0.67 0.40 0.3 0.16 0.67 0.80 0.80 0.44 0.33 0.44 0.50 0.40 0.5 0.19 0.50 0.50 0.50 0.44 0.27 0.33 0.50 0.57 Context 4 0.1 0.44 1.00 1.00 1.00 1.00 0.80 0.57 0.80 0.50 0.3 0.22 1.00 1.00 1.00 1.00 0.80 0.57 0.80 0.57 0.5 0.14 0.67 1.00 1.00 0.44 0.80 0.57 0.80 0.67 Mushroom 0.1 0.28 0.29 0.84 0.84 0.32 0.28 0.32 0.31 0.30 0.3 0.16 0.16 0.36 0.39 0.25 0.18 0.20 0.22 0.09 0.5 0.08 0.10 0.17 0.17 0.14 0.11 0.16 0.11 0.06 Figures 2 show the ROC curve for the measures. The curves that are close to the left upper corner correspond to the most powerful measures. The best and most stable results correspond to the high estimate of stability (stabilityh ). The similar precision has the lower estimate of stability (Table 2), whereas precision of separation and probability depends on the proportion of noise and lattice structure as well. The measures of basic level that utilize sim- ilarity and predictability approaches become zero for some concepts. The rate of vanished concepts (including original ones) increases as the noise probability gets bigger. In our study we take such concepts as “false negative”, so in this case ROC curves do not pass through the point (1,1). More than that, recall and Concept interestingness measures: a comparative study 69 Fig. 2. Averaged ROC curves of indices among contexts 1 - 5 with different noise rate (0.1 - 0.5) precision are unstable with respect to the noise rate and lattice structure. This group of measures is inappropriate for noise filtering. The other basic level measures, such as CU, CFC and CV, demonstrate much better recall compared to previous ones. However, in general the precision of CU, CFC and CV is determined by lattice structure (Table 2). Frequency has the highest precision among the indices that are applicable for the assessment of arbitrary sets of attributes. Frequency is stable with respect to the noise rate, but can vary under different lattice structures. For the lift and the collective strength precision depends on the lattice structure, and the collective strength also has quite unstable recall. Precision of robustness depends on both lattice structure and value of α (Fig. 2). In our study we have got the highest precision for α close to 0.5. Thus, the most preferred metrics for noise filtering are stability estimates, CV, frequency and robustness (where α is greater than 0.4). In [24] it was noticed that the combination of the indices can improve the filtering power of indices. In this regard, we have studied top-n concepts selected by pairwise combination of measures. As it was shown by the experiments, the combination of measures may improve recall of the top-n set, while precision gets lower with respect to a more accurate measure. Figure 3 shows recall and precision of different combination of measures. In the best case it is possible to improve the recall, the precision on small sets of top-n concepts is lower than the precision of one measure by itself. 5 Conclusion In this paper we have considered various methods for selecting interesting con- cepts and noise reduction. 
We focused on the most promising and well inter- pretable approach based on interestingness measures of concepts. Since “inter- 70 Sergei O. Kuznetsov and Tatyana P. Makhalova Fig. 3. Recall and precision of metrics and their combination on a Mushroom dataset fragment with the noise probability 0.1 estingness” of a concept is a subjective measure, we have compared several mea- sures known in the literature and identified groups of most correlated ones. CU, CFC, CV, separation and frequency make up the first group. Frequency is cor- related to separation and probability. Another part of our experiments was focused on the noise filtering. We have found that the stability estimates work perfectly with data of various noise rate and different structure of the original lattice. Robustness and 3 of basic level met- rics (cue validity, category utility and category feature collocation approaches) could also be applied to noise reduction. The combination of measures can also improve the recall, but only in the case of high noise rate. Acknowledgments The authors were supported by the project “Mathematical Models, Algorithms, and Software Tools for Mining of Structural and Textual Data” supported by the Basic Research Program of the National Research University Higher School of Economics. References 1. Aggarwal, C.C., Yu, P.S.: A new framework for itemset generation. In: Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. pp. 18–24. ACM (1998) 2. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proc. 20th int. conf. very large data bases, VLDB. vol. 1215, pp. 487–499 (1994) 3. Arévalo, G., Berry, A., Huchard, M., Perrot, G., Sigayret, A.: Performances of galois sub-hierarchy-building algorithms. In: Formal Concept Analysis, pp. 166– 180. Springer (2007) 4. Babin, M.A., Kuznetsov, S.O.: Approximating concept stability. In: Domenach, F., Ignatov, D., Poelmans, J. (eds.) Formal Concept Analysis. Lecture Notes in Computer Science, vol. 7278, pp. 7–15. Springer Berlin Heidelberg (2012) 5. Belohlavek, R., Macko, J.: Selecting important concepts using weights. In: Valtchev, P., Jschke, R. (eds.) Formal Concept Analysis, Lecture Notes in Com- puter Science, vol. 6628, pp. 65–80. Springer Berlin Heidelberg (2011) 6. Belohlavek, R., Trnecka, M.: Basic level of concepts in formal concept analysis. In: Domenach, F., Ignatov, D., Poelmans, J. (eds.) Formal Concept Analysis, Lecture Notes in Computer Science, vol. 7278, pp. 28–44. Springer Berlin Heidelberg (2012) Concept interestingness measures: a comparative study 71 7. Belohlavek, R., Trnecka, M.: Basic level in formal concept analysis: Interesting concepts and psychological ramifications. In: Proceedings of the Twenty-Third In- ternational Joint Conference on Artificial Intelligence. pp. 1233–1239. IJCAI ’13, AAAI Press (2013) 8. Belohlavek, R., Vychodil, V.: Formal concept analysis with background knowledge: attribute priorities. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 39(4), 399–409 (2009) 9. Berry, A., Huchard, M., McConnell, R., Sigayret, A., Spinrad, J.: Efficiently com- puting a linear extension of the sub-hierarchy of a concept lattice. In: Ganter, B., Godin, R. (eds.) Formal Concept Analysis, Lecture Notes in Computer Science, vol. 3403, pp. 208–222. Springer Berlin Heidelberg (2005) 10. Buzmakov, A., Kuznetsov, S.O., Napoli, A.: Scalable estimates of concept stabil- ity. In: Glodeanu, C., Kaytoue, M., Sacarea, C. (eds.) 
Formal Concept Analysis, Lecture Notes in Computer Science, vol. 8478, pp. 157–172. Springer International Publishing (2014) 11. Cabena, P., Choi, H.H., Kim, I.S., Otsuka, S., Reinschmidt, J., Saarenvirta, G.: Intelligent miner for data applications guide. IBM RedBook SG24-5252-00 173 (1999) 12. Carpineto, C., Romano, G.: A lattice conceptual clustering system and its appli- cation to browsing retrieval. Machine Learning 24(2), 95–122 (1996) 13. Cheung, K., Vogel, D.: Complexity reduction in lattice-based information retrieval. Information Retrieval 8(2), 285–299 (2005) 14. Dhillon, I., Modha, D.: Concept decompositions for large sparse text data using clustering. Machine Learning 42(1-2), 143–175 (2001) 15. Dias, S.M., Vieira, N.: Reducing the size of concept lattices: The JBOS approach. In: Proceedings of the 7th International Conference on Concept Lattices and Their Applications, Sevilla, Spain, October 19-21, 2010. pp. 80–91 (2010) 16. Dias, S.M., Vieira, N.J.: Concept lattices reduction: Definition, analysis and clas- sification. Expert Systems with Applications 42(20), 7084 – 7097 (2015) 17. Dobša, J., Dalbelo-Bašić, B.: Comparison of information retrieval techniques: la- tent semantic indexing and concept indexing. Journal of Inf. and Organizational Sciences 28(1-2), 1–17 (2004) 18. Düntsch, I., Gediga, G.: Simplifying contextual structures. In: Kryszkiewicz, M., Bandyopadhyay, S., Rybinski, H., Pal, S.K. (eds.) Pattern Recognition and Ma- chine Intelligence, Lecture Notes in Computer Science, vol. 9124, pp. 23–32. Springer International Publishing (2015) 19. Emilion, R.: Concepts of a discrete random variable. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds.) Selected Contributions in Data Analysis and Classification, pp. 247–258. Studies in Classification, Data Analysis, and Knowl- edge Organization, Springer Berlin Heidelberg (2007) 20. Ganter, B., Wille, R.: Contextual attribute logic. In: Tepfenhart, W., Cyre, W. (eds.) Conceptual Structures: Standards and Practices, Lecture Notes in Computer Science, vol. 1640, pp. 377–388. Springer Berlin Heidelberg (1999) 21. Jay, N., Kohler, F., Napoli, A.: Analysis of social communities with iceberg and stability-based concept lattices. In: Medina, R., Obiedkov, S. (eds.) Formal Concept Analysis, Lecture Notes in Computer Science, vol. 4933, pp. 258–272. Springer Berlin Heidelberg (2008) 22. Kendall, M.G.: A new measure of rank correlation. Biometrika pp. 81–93 (1938) 23. Klement, E.P., Mesiar, R., Pap, E.: Triangular norms. Springer Netherlands (2000) 72 Sergei O. Kuznetsov and Tatyana P. Makhalova 24. Klimushkin, M., Obiedkov, S., Roth, C.: Approaches to the selection of relevant concepts in the case of noisy data. In: Kwuida, L., Sertkaya, B. (eds.) Formal Concept Analysis, Lecture Notes in Computer Science, vol. 5986, pp. 255–266. Springer Berlin Heidelberg (2010) 25. Kumar, C.A., Srinivas, S.: Latent semantic indexing using eigenvalue analysis for efficient information retrieval. Int. J. Appl. Math. Comput. Sci 16(4), 551–558 (2006) 26. Kuznetsov, S.O.: Interpretation on graphs and complexity characteristics of a search for specific patterns. Automatic Documentation and Mathematical Linguis- tics 24(1), 37–45 (1989) 27. Kuznetsov, S.O.: Stability as an estimate of degree of substantiation of hypotheses derived on the basis of operational similarity. Nauchn. Tekh. Inf., Ser. 2 (12), 21–29 (1990) 28. Kuznetsov, S.O.: On stability of a formal concept. 
Annals of Mathematics and Artificial Intelligence 49(1-4), 101–115 (2007) 29. Kuznetsov, S.O., Obiedkov, S., Roth, C.: Reducing the representation complexity of lattice-based taxonomies. In: Conceptual Structures: Knowledge Architectures for Smart Applications, pp. 241–254. Springer Berlin Heidelberg (2007) 30. Mannila, H., Toivonen, H., Verkamo, A.I.: Efficient algorithms for discovering asso- ciation rules. In: KDD-94: AAAI workshop on Knowledge Discovery in Databases. pp. 181–192 (1994) 31. Murphy, G.L.: The big book of concepts. MIT press (2002) 32. Rosch, E.: Principles of categorization pp. 27–48 (1978) 33. Snasel, V., Polovincak, M., Abdulla, H.M.D., Horak, Z.: On concept lattices and implication bases from reduced contexts. In: ICCS. pp. 83–90 (2008) 34. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg concept lattices with titanic. Data Knowl. Eng. 42(2), 189–222 (Aug 2002) 35. Tatti, N., Moerchen, F., Calders, T.: Finding robust itemsets under subsampling. ACM Transactions on Database Systems (TODS) 39(3), 20 (2014) 36. Zeigenfuse, M.D., Lee, M.D.: A comparison of three measures of the association between a feature and a concept. In: Proceedings of the 33rd Annual Conference of the Cognitive Science Society. pp. 243–248 (2011) Why concept lattices are large Extremal theory for the number of minimal generators and formal concepts Alexandre Albano1 and Bogdan Chornomaz2 1 Technische Universität Dresden 2 V.N. Karazin Kharkiv National University Abstract. A unique type of subcontexts is always present in formal contexts with many concepts: the contranominal scales. We make this precise by giving an upper bound for the number of minimal generators (and thereby for the number of concepts) of contexts without contranom- inal scales larger than a given size. Extremal contexts are constructed which meet this bound exactly. They are completely classified. 1 Introduction The primitive data model of Formal Concept Analysis is that of a formal context, which is unfolded into a concept lattice for further analysis. It is well known that concept lattices may be exponentially larger than the contexts which gave rise to them. An obvious example is the boolean lattice B(k), having 2k elements, the standard context of which is the k × k contranominal scale Nc (k). This is not the only example of contexts having large associated concept lattices: indeed, the lattice of any subcontext is embeddable in the lattice of the whole context [4], which means that contexts having large contranominal scales as subcontexts necessarily have large concept lattices as well. Those considerations induce one natural question, namely, whether there are other reasons for a concept lattice to be large. As it will be shown in this paper, the answer is no. The structure of the paper is as follows. Our starting point is a known up- per bound for the number of concepts, which we improve using the language of minimal generators. Then, we show that our result is the best possible by con- structing lattices which attain exactly the improved upper bound. These lattices, i.e., the extremal lattices, are characterized. 2 Fundamentals For a set S, a context of the form (S, S, 6=) will be called a contranominal scale. We will denote by Nc (k) the contranominal scale with k objects (and k at- tributes), that is, the context ([k], [k], 6=), where [k] := {1, 2, . . . , k}. The expres- sion K1 ≤ K denotes that K1 is a subcontext of K. 
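The claim from the introduction, that Nc(k) gives rise to the full boolean lattice B(k) with 2^k concepts, is easy to verify mechanically for small k. The following naive Python sketch (exponential, for illustration only; all names are ours) builds the contranominal scale and counts its extents by closing every subset of objects:

from itertools import chain, combinations

def contranominal(k):
    """The contranominal scale Nc(k) = ([k], [k], !=)."""
    objects = set(range(1, k + 1))
    incidence = {(g, m) for g in objects for m in objects if g != m}
    return objects, objects, incidence

def extents(context):
    """All extents, obtained by closing every subset of G (naive and exponential)."""
    G, M, I = context
    def up(A):     # A': attributes shared by all objects of A
        return {m for m in M if all((g, m) in I for g in A)}
    def down(B):   # B': objects having all attributes of B
        return {g for g in G if all((g, m) in I for m in B)}
    subsets = chain.from_iterable(combinations(G, r) for r in range(len(G) + 1))
    return {frozenset(down(up(set(A)))) for A in subsets}

print(len(extents(contranominal(4))))   # 16 = 2**4: every subset of [4] is an extent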
The symbol ∼ = expresses the existence of an order-isomorphism whenever two ordered sets are involved c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 73–86, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 74 Alexandre Albano and Bogdan Chornomaz or, alternatively, the existence of a context isomorphism in the case of formal contexts. For a context K to be Nc (k)-free means that there does not exist a subcontext K1 ≤ K with K1 ∼ = Nc (k). The boolean lattice with k atoms, that is, c B(N (k)), will be denoted by B(k). Similarly, we say that a lattice L is B(k)-free whenever B(k) does not (order-)embed into L. Using Proposition 32 from [4] one has that K is Nc (k)-free whenever B(K) is B(k)-free. The converse is also true and is, in fact, the content of our first proposition. An example of a context which has Nc (3) as a subcontext along with its concept lattice is depicted in Figure 1. One may observe that the context is Nc (4)-free, since its lattice has ten concepts (and would have at least sixteen otherwise). m n o p q g ×× h × ×× o n m i ×× × j × ×× k × g h i Fig. 1: A context K with Nc (3) ≤ K and its concept lattice. The object and attribute concepts belonging to the B(3) suborder are indicated on the diagram. We denote by J(L) and M (L), respectively, the set of completely join- irreducible and meet-irreducible elements of a lattice L. The least and greatest elements of a lattice L will be denoted, respectively, by 0L and 1L . The symbol ≺ will designate the covering relation between elements, that is, x ≺ y if x < y and for every z ∈ L, x < z ≤ y ⇒ z = y. The length of a finite lattice is the number of elements in a maximum chain minus one. An atom is an element covering 0L , while a coatom is an element covered by 1L . Whenever two elements x, y ∈ L are incomparable, we will write x||y. We denote by A(L) the set of atoms of a lattice L. For an element l ∈ L, we shall write ↓ l := {x ∈ L | x ≤ l} as well as ↑ l := {x ∈ L | x ≥ l}. Moreover, for l ∈ L we denote by Al the set A(L) ∩ ↓ l and,W similarly, by Jl the set J(L)∩ ↓ l. A complete lattice L is called atomistic if x = Ax holds for every x ∈ L. In this case, A(L) = J(L). Proposition 1. Let K be a context such that B(k) embeds into B(K). Then Nc (k) ≤ K. Proof. Let (A1 , B1 ), . . . , (Ak , Bk ) be the atoms of B(k) in B(K). Similarly, de- note its coatoms by (C1 , D1 ), . . . , (Ck , Dk ) in such a way that (Ai , Bi ) ≤ (Cj , Dj ) ⇔ i 6= j for each i, j. Note that the sets Ai , as well as the sets Di , are non-empty. Let i ∈ [k]. Since (Ai , Bi ) (Ci , Di ), we may take an object/attribute pair gi ∈ Ai , mi ∈ Di with gi I mi . For every chosen object gi ∈ Ai , one has that Why concept lattices are large 75 gi Imj for every j ∈ [k] with j 6= i, because of (Ai , Bi ) ≤ (Cj , Dj ), which implies Bi ⊇ Dj . Consequently, k distinct objects gi (as well as k distinct attributes mi ) were chosen. Combining both relations results in gi Imj ⇔ i 6= j for each i ∈ [k], that is, the objects and attributes gi , mi form a contranominal scale in K. Definition 1. Let (G, M, I) be a formal context. A set S ⊆ G is said to be a minimal generator (of the extent S 00 ) if T 00 6= S 00 for every proper subset T ( S. The set of all minimal generators of a context K will be denoted by MinGen(K). Observation: In contexts with finitely many objects, every extent has at least one minimal generator. 
Clearly, two different extents cannot share one same minimal generator. Thus, the upper bound |B(K)| ≤ |MinGen(K)| holds for contexts with finite object sets. The problem of computing exactly the number of concepts does not admit a polynomial-time algorithm, unless P=NP. This was shown by Kuznetsov [5]: more precisely, this counting problem is #P-complete. However, there are results which establish upper bounds for the number of concepts: see for example [1–3, 6, 7].

3 The upper bound

Our investigations were inspired by a result of Prisner, who gave the first upper bound regarding contranominal-scale free contexts. The original version is in graph-theoretic language. Reformulated, it reads as follows:

Theorem 1 (Prisner [6]). Let K = (G, M, I) be an Nc(k)-free context. Then, |B(K)| ≤ (|G||M|)^{k−1} + 1.

In this section we will show an improvement of Theorem 1. For that, we will relate minimal generators with contranominal scales. The first step towards this is the equivalence shown in Proposition 2. Note that, since derivation operators are antitone, the ≠ symbol may be substituted by ⊋.

Proposition 2. Let (G, M, I) be a formal context. A set S ⊆ G is a minimal generator if and only if for every g ∈ S, it holds that (S \ {g})′ ≠ S′.

Proof. We will show the two equivalent contrapositions. If (S \ {g})′ = S′, then, of course, (S \ {g})″ = S″, and S is not a minimal generator. For the converse, suppose that S is not a minimal generator, and take a proper subset T of S with T″ = S″. Note that T″ = S″ implies T′ = S′. Let g ∈ S \ T. On one hand, (S \ {g}) ⊆ S implies (S \ {g})′ ⊇ S′. On the other hand, (S \ {g}) ⊇ T implies (S \ {g})′ ⊆ T′ = S′. Combining both yields (S \ {g})′ = S′.

The next lemma relates minimal generators and contranominal scales.

Lemma 1. Let K = (G, M, I) be a context and A ⊆ G. There exists a contranominal scale K1 ≤ K having A as its object set if and only if A is a minimal generator. In particular, if G is finite:

max{|A| : A ⊆ G is a minimal generator} = max{k ∈ ℕ : Nc(k) ≤ K}.

Proof. Suppose that A is a minimal generator and let g ∈ A. By Proposition 2, one has that (A \ {g})′ ⊋ A′. Hence, there exists an attribute m with (g, m) ∉ I and (h, m) ∈ I for every h ∈ A \ {g}. Clearly, two different objects g1, g2 ∈ A cannot give rise to the same attribute m, since the two pairs of conditions (gi, m) ∉ I and (h, m) ∈ I for every h ∈ A \ {gi} cannot be satisfied simultaneously (i = 1, 2). Thus, there exists an injection ι : A → M with (g, ι(g)) ∉ I and (h, ι(g)) ∈ I for each g ∈ A and each h ∈ A \ {g}. By setting N = ι(A), one has that (A, N, I ∩ (A × N)) is a contranominal scale. For the converse, let K1 = (S, S, ≠) ≤ K be a contranominal scale and let g ∈ S. Clearly, g ∉ S′. Moreover, g ∈ (S \ {g})′. This amounts to (S \ {g})′ ⊋ S′ for each g ∈ S. By Proposition 2, the set S is a minimal generator.

A consequence of Lemma 1 is the following theorem, which improves Prisner's bound by a factor of the order of k! · |M|^k / k.

Theorem 2. Let K = (G, M, I) be an Nc(k)-free context with finite G. Then:

|B(K)| ≤ |MinGen(K)| ≤ \sum_{i=0}^{k-1} \binom{|G|}{i}.

In particular, if k ≤ |G|/2:

|B(K)| ≤ k · |G|^{k−1} / (k − 1)!.

Proof. Lemma 1 guarantees that K does not have any minimal generator of cardinality greater than or equal to k. The sum above is the number of subsets of G having cardinality at most k − 1.

Definition 2. We denote by f(n, k) the upper bound in Theorem 2:

f(n, k) := \sum_{i=0}^{k-1} \binom{n}{i}.
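A quick numerical comparison of the two bounds may also be helpful. The figures below are purely illustrative (a hypothetical Nc(4)-free context with 20 objects and 20 attributes) and are not taken from the paper.

```python
from math import comb

def prisner_bound(n_objects, n_attributes, k):
    """Theorem 1: |B(K)| <= (|G||M|)^(k-1) + 1 for an Nc(k)-free context."""
    return (n_objects * n_attributes) ** (k - 1) + 1

def f(n, k):
    """Theorem 2 / Definition 2: f(n, k) = sum_{i=0}^{k-1} C(n, i)."""
    return sum(comb(n, i) for i in range(k))

# Hypothetical example: 20 objects, 20 attributes, no Nc(4) subcontext.
n, m, k = 20, 20, 4
print(prisner_bound(n, m, k))  # 64_000_001
print(f(n, k))                 # 1 + 20 + 190 + 1140 = 1351
```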
The upper bound in Theorem 2 for f(n, k) gets worse as k gets close to |G|/2. Tighter upper bounds for the sum of binomial coefficients may be found in [9].

4 Sharpness: preparatory results

The following property of f(n, k) is needed for the next two sections.

Proposition 3. The function f(n, k) satisfies the following identity: f(n, k) = f(n − 1, k − 1) + f(n − 1, k).

Proof. This follows from a standard binomial identity:

f(n − 1, k) + f(n − 1, k − 1) = \sum_{i=0}^{k-1} \binom{n-1}{i} + \sum_{j=0}^{k-2} \binom{n-1}{j} = 1 + \sum_{i=1}^{k-1} \binom{n-1}{i-1} + \sum_{j=1}^{k-1} \binom{n-1}{j} = 1 + \sum_{i=1}^{k-1} \binom{n}{i} = f(n, k).

Consider a finite lattice L. It is well known that every element x ∈ L is the supremum of some subset of J(L): for example, x = ⋁Jx. We call such a subset a representation of x through join-irreducible elements (for brevity, we may say a representation through irreducibles of x or even only a representation of x). A representation S ⊆ J(L) of x is called irredundant if ⋁(S \ {y}) ≠ x for every y ∈ S. Of course, every x ∈ L has an irredundant representation, but it does not need to be unique. Note that irredundant representations are precisely minimal generators when one takes the standard context of L, (J(L), M(L), ≤). Indeed, in that formal context, the closure of object sets corresponds to the supremum of join-irreducible elements of L.

For an element x ∈ L, there may exist elements in Jx which belong to every representation of x: the so-called extremal points. An element z ∈ Jx is an extremal point of x if there exists a lower neighbor y of x such that Jy = Jx \ {z}. Every representation of x must contain every extremal point z of x since, in this case, the supremum ⋁(Jx \ {z}) is strictly smaller than x (and is actually covered by x).

In Section 5 we shall construct finite lattices for which every element has exactly one irredundant representation. It turns out that, in the finite case, these lattices are precisely the meet-distributive lattices. This is implied by Theorem 44 of [4], which actually gives information about the unique irredundant representation as well: a finite lattice L is meet-distributive if and only if for every x ∈ L the set Ex of all extremal points of x is a representation of x (and Ex is, therefore, the unique irredundant representation of x, since every representation of x must contain Ex). Proposition 4 provides a characteristic property for the finite case which will be used in our constructions.

Proposition 4. Let L be a finite lattice. The following assertions are equivalent:
i) L is meet-distributive.
ii) Every element x ∈ L is the supremum of its extremal points.
iii) For every x, y ∈ L with x ≺ y, it holds that |Jy \ Jx| = 1.

Proof. The equivalence between i) and ii) may be found in Theorem 44 of [4]. Let x ∈ L and define Ex = {z ∈ Jx | z is an extremal point of x}. We now show that ii) implies iii). Let y ∈ L with y < x. This implies Jy ⊊ Jx. The set Jy does not contain Ex, because this would force y ≥ x. Therefore, y = ⋁Jy is upper bounded by some element in the set U = {⋁(Jx \ {z}) | z ∈ Ex} (note that x ∉ U). Hence, every lower neighbor of x has a representation of the form ⋁(Jx \ {z}) with z ∈ Ex. Now we show that iii) implies ii). Define y = ⋁Ex and suppose by contradiction that y < x. Then, there exists an element z such that y ≤ z ≺ x and Jz ⊇ Ex. But then, z ≺ x implies Jx \ Jz = {w} for some w ∈ J(L), which means that w is an extremal point of x.
This contradicts the fact that Ex contains all extremal points of x. The next lemma will be useful in Section 5, when we shall change the per- spective from lattices to contexts. Lemma 2. Let L be a finite lattice. If L is B(k)-free, then every element has a representation through join-irreducibles of size at most k − 1. The converse holds if L is meet-distributive. Proof. Let K = (J(L), M (L), ≤) be the standard context of L. We identify the elements of L with the extents of K via x 7→ Jx . Suppose that L is B(k)-free. Then, K is Nc (k)-free. Let A be an arbitrary extent of K and S a minimal generator W of A. Then, by Lemma 1, it follows that |S| ≤ k − 1. Since A = S 00 = S, we have the desired representation. Now, suppose that |Jy \ Jx | = 1 holds for every x, y ∈ L with x ≺ y (cf. Proposition 4). To prove the converse, we suppose that B(k) embeds into L and our goal is to show that some x ∈ L does not have any representation with fewer than k elements of J(L). Now, since B(k) embeds into L, Proposition 1 implies that Nc (k) is a subcontext of K. Applying Lemma 1, we have that there exists a minimal generator S ⊆ J(L) with |S| = k. Equivalently, S is an irredundant representation of the element S 00 of L. By Proposition 4, S is the unique irredundant representation of S 00 . Therefore, S 00 cannot be expressed as the supremum of fewer than k join-irreducible elements. 5 Sharpness: construction of extremal lattices In this section, we will consider only finite lattices. Our objective is to construct lattices which prove that the bound in Theorem 2 is sharp. Definition 3. For positive integers n and k, we call a lattice (n,k)-extremal if it has at most n join-irreducible elements, is B(k)-free, and has exactly f (n, k) elements. It is clear that every (n, 1)-extremal lattice is trivial, i.e., the lattice with one element. To construct (n, k)-extremal lattices with larger k, we will use an operation which we call doubling. Definition 4. Let L be an ordered set and K ⊆ L. The doubling of K in L . . . is defined to be L[K] = L ∪ K, where K is a disjoint copy of K, i.e., K ∩ L = ∅. The order in (L[K], ≤0 ) is defined as follows: . . . . . . ≤0 = ≤ ∪ {(x, y) ∈ L × K | x ≤ y} ∪ {(x, y) ∈ K × K | x ≤ y}. Why concept lattices are large 79 . We will employ the notation x to denote the image under doubling of an . . element x ∈ K. Note that x ≺ x for every x ∈ K, and that x is the only upper . neighbor of x in K. When L is a set family C ⊆ P(G), then the diagram of L[K] can be easily depicted: the doubling C[D] (with D ⊆ C) corresponds to the set family C ∪ {D ∪ {g} | D ∈ D}, where g ∈ / G is a new element. Figure 2 illustrates three doubling operations. The first one is the doubling of the chain {∅, {2}, {1, 2}} inside the closure system C1 = P([2]), resulting in C2 . The (a fortiori ) closure systems C3 and C4 are obtained by doubling, respectively, the chains {∅, {3}, {2, 3}, {1, 2, 3}} and {∅, {2}, {2, 3}, {1, 2, 3}} inside C2 . 123 12 12 23 1 2 1 2 3 C1 C2 1234 1234 123 234 123 234 12 23 34 12 23 24 1 2 3 4 1 2 3 4 C3 C4 Fig. 2: Doubling chains inside closure systems Since we are interested in constructing lattices, it is vital to guarantee that the doubling operation produces a lattice. By a meet-subsemilattice of a lattice L is meant a subset K of L, endowed with the inherited order, such that x ∧ y ∈ K holds for every x, y ∈ K. It is called topped if 1L ∈ K. Proposition 5. If K is a topped meet-subsemilattice of a lattice L, then L[K] is a lattice. Proof. Let x, y ∈ L[K]. 
If both x and y belong to L, then clearly x ∧ y and x ∨ y belong to L ⊆ L[K]. Suppose that only one among x and y, say x, belongs to . L. Then y = z with z ∈ K. We have that x ∧ y = x ∧ z ∈ L ⊆ L[K] because of . . x 0K and V y = z ∨ 0K . For the supremum, set S = {w ∈ K | w ≥ x, w ≥ z} and u = S. Note that the fact that K is topped causes S 6= ∅. Since K is a meet-subsemilattice, we have that u ∈ K. It is clear that u is the least upper 80 Alexandre Albano and Bogdan Chornomaz . bound of x and z which belongs to K. Therefore, u is the least upper bound of . . . x and y, because of 0K u and y = z ∨ 0K . The remaining case is x, y ∈ K . . for which, clearly x ∧ y exists. Moreover, writing x = t, yV= z with t, z ∈ K and setting S = {w ∈ K | w ≥ t, w ≥ z} as well as u = S make clear that . u = x ∨ y. When extrinsically considered, topped meet-subsemilattices are lattices. This is compatible with the proof of Proposition 5, where the supremum and infimum . . . of two elements in K may be easily verified to belong to K: that is, K is actually a sublattice of L[K]. A suborder K of an ordered set L is called cover-preserving if x ≺K y implies x ≺L y for every x, y ∈ K. This property plays a key role by preserving meet- distributivity under a doubling operation: Proposition 6. Let L be a meet-distributive lattice and let K be a cover-preserving, topped meet-subsemilattice of L. Then, L[K] is a meet-distributive lattice. Proof. The fact that L[K] is a lattice comes from Proposition 5. Every element . . x ∈ K has one lower neighbor in K, namely, x. Thus, the total number of lower . neighbors of x is one only if x does not cover any element in K, that is, x = 0K . . Therefore, 0K is the only join-irreducible of L[K] which is not a join-irreducible 0 of L. Let x, y ∈ L[K] with x ≺L[K] y. We use J(·) to denote our J-notation in L[K] and J(·) in L. If x, y ∈ L, then clearly Jy = Jy and Jx0 = Jx , which 0 results in |Jy0 \ Jx0 | = |Jy \ Jx | = 1. If x, y ∈ . . / L, then x = z and y = w with z, w ∈ K and z ≺K w. From the fact that K is cover-preserving, we conclude that z ≺L w. Because L is meet-distributive, it follows that |Jw \Jz | = 1. Clearly . . one has Jx0 = Jz ∪ {0K } and Jy0 = Jw ∪ {0K }, which yields |Jy0 \ Jx0 | = 1. For the remaining case, one has necessarily x ∈ L and y ∈ . . / L. In these conditions, x ≺ y results in y = x and, therefore, Jy0 = Jx0 ∪ {0K }, implying |Jy0 \ Jx0 | = 1. Proposition 7 is the first assertion about extremal meet-subsemilattices. We note that the set of join-irreducible elements of a meet-subsemilattice K of a lattice L is not the same as the set J(L) ∩ K. Therefore, what is meant by an (n, k)-extremal meet-subsemilattice of L is precisely the following: a lattice K which is (n, k)-extremal and a meet-subsemilattice of L as well. Observe that chains with n + 1 elements are precisely the (n, 2)-extremal lattices. Proposition 7 illustrates, in particular, that an n + 1 element chain may be seen as the result of a doubling operation on an n element chain, provided that the doubling operation is performed with respect to the trivial topped meet- subsemilattice ↑ 1, which is (n, 1)-extremal. Proposition 7. Let L be an (n − 1, k)-extremal lattice with n, k ≥ 2. Suppose that K is a topped, (n − 1, k − 1)-extremal meet-subsemilattice. If L[K] is B(k)- free, then it is an (n, k)-extremal lattice. Why concept lattices are large 81 Proof. Proposition 5 guarantees that L[K] is indeed a lattice. As in the proof of . 
Proposition 6, we have that J(L[K]) = J(L) ∪ {0K } and in particular, L[K] has at most n join-irreducible elements. The claim that L[K] has f (n, k) elements follows from Proposition 3. It is clear now how (n, 2)-extremal lattices can be obtained by doubling a trivial meet-subsemilattice of an (n − 1, 2)-extremal lattice. The succeeding propositions and lemmas aim particularly towards a generalization of this opera- tion: the doubling of topped, (n − 1, k − 1)-extremal meet-subsemilattices inside (n − 1, k)-extremal lattices, yielding (n, k)-extremal lattices for k ≥ 3. Proposition 8. Suppose that L is an (n, k)-extremal lattice. Then, for every S, T ⊆ J(L) with |S|, |T | ≤ k − 1: _ _ S= T ⇒ S = T. Moreover, if k ≥ 2 then |J(L)| = n. Proof. We may suppose k ≥ 2 since the assertion holds trivially for k = 1. Lemma 2 guarantees that every W element x of L has a representation of size at most k − 1. Therefore L = { S | S ⊆ J(L), |S| ≤ k − 1}. Because of k ≥ 2 and the fact that |L| = f (n, k) is also the number of subsets of [n] having at most k − 1 elements, one has |J(L)| ≥ n. In fact equality must hold, because L has at most n join-irreducible elements. As a consequence of |J(L)| = n and |L| = f (n, k), we W W no two sets S, T ⊆ J(L) with S 6= T may lead to the have that same supremum S = T . Chains are the only extremal lattices which are not atomistic, as a conse- quence of the next lemma. Lemma 3. Suppose that L is an (n, k)-extremal lattice with k ≥ 3. Then L is atomistic and meet-distributive. In particular, the length of L equals the number of its atoms and there exists an atom which is an extremal point of 1L . Proof. If L were not atomistic, there would exist two comparable join-irreducible elements, say, x, y with x < y. But then x ∨ y = y, which contradicts Proposi- tion 8. Suppose that L is not meet-distributive and take x, y ∈ L with x ≺ y such that Ay \ Ax has at least two elements. Clearly x 6= 0L and, therefore, Ax 6= ∅. Let u, v ∈ Ay \ Ax be any two distinct elements. From u ∈ / Ax follows that x < x ∨ u ≤ y which, in turn, implies x ∨ u = y. Similarly, v ∈/ Ax implies x < x ∨ v ≤ y which, in turn, implies x ∨ v = y. Let a ∈ Ax . Now, a ≤ x and x||u imply a ∨ u = x ∨ u = y, as well as a ≤ x and x||v imply a ∨ v = x ∨ v = y. We obtain a ∨ u = a ∨ v, contradicting Proposition 8. Choosing a maximal chain x0 ≺ x1 ≺ . . . ≺ xl in L and noticing that the sizes of the sets Axi grow by exactly one element make the two remaining claims clear. Lemma 4 shows that non-trivial, extremal meet-subsemilattices are always cover-preserving and topped. These two properties will be useful to assure that a doubling L[K] is a meet-distributive lattice. 82 Alexandre Albano and Bogdan Chornomaz Lemma 4. Let L be an (n, k)-extremal lattice with k ≥ 3 and suppose that K is an (n, k − 1)-extremal subsemilattice of L. Then, K is cover-preserving and topped. If k ≥ 4, then K and L are atomistic with A(K) = A(L). Proof. In case that k = 3 then K is B(2)-free, that is, K is a chain. By Lemma 3, P1 K must be a maximal chain in order to have i=0 ni = n + 1 elements. Hence, 1K = 1L . The maximality of K guarantees that K is cover-preserving. Now, suppose that k ≥ 4. Again by Lemma 3, we have that both K and L are atom- istic. Since K has n atoms, 0K must be covered by n elements in L. But this is possible only if 0L = 0K , because L also has n atoms. This forces A(K) = A(L) as well as 1L ∈ K, because 1L is the only element that upper bounds each a ∈ A(K). 
To prove that K is cover-preserving, we apply Lemma 3 twice, ob- taining |Ay \ Ax | = 1 for every x, y ∈ K with x ≺K y as well as |Ay \ Ax | = 1 for every x, y ∈ L with x ≺L y. Both conditions hold simultaneously only if the implication x ≺K y ⇒ x ≺L y holds, i.e., if K is cover-preserving. A complete meet-embedding V is a meet-embedding which preserves arbitrary meets, including ∅. As a consequence, the greatest element of one lattice gets mapped to the greatest element of the other. Images of complete meet- embeddings are topped meet-subsemilattices. This notion is required for the following simple fact, which aids us in the construction of sequences of (n, k)- extremal lattices with fixed n and growing k. In Proposition 9, the symbol K[J] (for instance) means actually the doubling of the image of J under the corre- sponding embedding. Proposition 9. Suppose that J, K and L are lattices with complete meet-embeddings E1 : J → K and E2 : K → L. Then, there exists a complete meet-embedding from K[J] into L[K]. Proof. The fact that K[J] and L[K] are lattices comes from Proposition 5. Of . . course, there is an induced embedding from J into K, but for which we will use the same symbol E1 . The mapping E3 : K[J] → L[K] defined by E3 (x) = E1 (x) . for x ∈ J and E3 (x) = E2 (x) for x ∈ K may be checked as being a complete meet-embedding. As mentioned after Proposition 7, we will make use of an operation which doubles an extremal meet-subsemilattice of an extremal lattice. The next theo- rem shows that the lattice produced by this operation is indeed extremal. Theorem 3. Let L be an (n − 1, k)-extremal lattice with n ≥ 2 and k ≥ 3 and suppose that K is an (n − 1, k − 1)-extremal meet-subsemilattice of L. Then, L[K] is an (n, k)-extremal lattice. Proof. Lemma 3 guarantees that L is atomistic and meet-distributive. Moreover, Lemma 4 guarantees that K is cover-preserving and topped, so that, in partic- ular, L[K] is a meet-distributive lattice, as a consequence of Proposition 6. To prove that L[K] is (n, k)-extremal it is sufficient to show that L[K] is B(k)-free, Why concept lattices are large 83 because of Proposition 7. We will do so by proving that every element of L[K] has a representation through join-irreducibles of size at most k − 1. This indeed suffices because L[K] is meet-distributive, so that Lemma 2 may be applied. . Observe that J(L[K]) = A(L) ∪ 0K , since L is atomistic. Suppose that k = 3. In this case K is a chain, and a maximal one W because of Lemma 4. Let x ∈ L[K]. If x ∈ L, then Lemma 2 implies that x = S for some S ⊆ A(L), |S| ≤ 2 and the same representation may be used in L[K]. If x ∈ / L, . . then x = y = y ∨ 0K for some y ∈ K. If y ∈ A(L), we are done. Otherwise, take . z, w ∈ A(L) such that z ∨ w = y and thus x = z ∨ w ∨ 0K . Exactly one among z and w belong to K. Without loss of generality, let it be z. Then, it is clear . . . . that 0K ≺ z ∨ 0K and that z ∨ 0K < w ∨ 0K , since there exists only one element . . . covering 0K . Hence, x = z ∨ w ∨ 0K = w ∨ 0K . Suppose that k ≥ 4. As noted after Proposition 5 one has that K is, by itself, a lattice. Moreover, Lemma 4 guaranteesWthat K is atomistic with A(L) = A(K). Let x ∈ L[K]. If x ∈ L then, in L, x = S for some S ⊆ A(L) ⊆ J(L[K]) with |S| ≤ k − 1, because of Lemma 2 and the fact that L is B(k)-free. Of course, S is also a representation of x in L[K]. If x ∈ . . . / L,Wthen x = y for some y ∈ K. 
Since K is B(k − 1)-free, it follows that, in K, y = S for some S ⊆ A(K) ⊆ J(L[K]) with |S| ≤ k − 2, once again as a consequence of Lemma 2. Clearly, in L[K], . W. . W one has y = S = 0K ∨ S, where the last equality follows from the fact that . . . z = z ∨ 0K for every z ∈ S. Thus, we have a representation of y = x through no more than k − 1 join-irreducible elements of L[K]. Corollary 1 describes how (n, k)-extremal lattices can be non-deterministically constructed. In particular, the upper bound present in Theorem 2 is the best possible. Corollary 1. For every n and k, there exists at least one (n, k)-extremal lattice. Proof. Define a partial function Φ satisfying Φ : N∗ × N∗ → L ([n], ⊆), if k = 1. ({∅, {1}}, ⊆), if k ≥ 2, n = 1. , (n, k) 7→ Φ(n − 1, k)[E(Φ(n − 1, k − 1))], if n, k ≥ 2 and there exists a complete meet-embedding E : Φ(n − 1, k − 1) → Φ(n − 1, k). where L is the class of all lattices. We prove by induction on n that Φ(n, k) is a total function. The cases n = 1 and n = 2 are trivial. Let n ∈ N with n ≥ 3 and suppose that Φ(n − 1, k) is defined for every k ∈ N∗ . Let k ∈ N, k ≥ 2. By the induction hypothesis, the values Φ(n − 1, k) and Φ(n − 1, k − 1) are defined. If k = 2, then Φ(n − 1, k − 1) is a trivial lattice and the existence of a complete meet-embedding into Φ(n − 1, k) is clear and, thereby, Φ(n, k) is defined. We therefore assume k ≥ 3. By the definition of Φ, one has that Φ(n − 1, k) = Φ(n − 84 Alexandre Albano and Bogdan Chornomaz 2, k)[E(Φ(n−2, k−1))] and that Φ(n−1, k−1) = Φ(n−2, k−1)[F(Φ(n−2, k−2))] for some pair of complete meet-embeddings E and F. Applying Proposition 9 with Φ(n − 2, k − 2), Φ(n − 2, k − 1) and Φ(n − 2, k) results in the existence of a complete meet-embedding G : Φ(n − 1, k − 1) → Φ(n − 1, k), which yields that Φ(n, k) is defined. Since k is arbitrary, every Φ(n, k) is defined. The (n, k)- extremality of each lattice can be proved by induction on n as well and by invoking Theorem 3. Figure 3 depicts the diagrams of nine (n, k)-extremal lattices which are con- structible by Corollary 1. It is true that, in general, (n, k)-extremal lattices are not unique up to isomorphism: note that the (3, 3) and (4, 3)-extremal lattices in Figure 3 are also present in Figure 2 as the lattices C2 and C3 . The lattice C4 , depicted in that same figure, is a (4, 3)-extremal lattice which is not isomorphic to C3 . We shall, however, show in the next section that every extremal lattice arises from the construction described in Corollary 1. 6 Characterization of extremal lattices In the last section, we constructed lattices whose sizes are exactly the upper bound present in Theorem 2. In this section, we will show that every lattice meeting those requirements must be obtained from our construction. Lemma 5. Let L be an atomistic lattice, a an atom and c a coatom with Ac = E A(L) \ {a}. Then, the mapping x 7− → c ∧ x is a complete meet-embedding of ↑ a into ↓ c such that E(x) ≺ x for every x ∈ ↑ a. Proof. The fact that E preserves non-empty meets is clear, since c is a fixed element. Also, 1L is mapped to c = 1↓c , so that E preserves arbitrary meets. Note that AE(x) = Ac∧x = Ax ∩ Ac = Ax \ {a}. Hence, E(x) ≺ x as well as E(x) ∨ a = x. The latter implies injectivity. The next theorem shows that every extremal lattice is constructible by the process described in Corollary 1, and can be seen as a converse of that result. Theorem 4. Let L be an (n, k)-extremal lattice with k ≥ 3. 
Then, L = J ∪˙ K where J is an (n − 1, k)-extremal lattice and K is an (n − 1, k − 1)-extremal lattice. Moreover, there exists a complete meet-embedding E : K → J such that E(x) ≺ x for every x ∈ K. In particular, L ∼= J[E(K)]. Proof. From Lemma 3, one has that L is atomistic and we may take an atom a which is an extremal point of 1L , that is, A(L) \ {a} = Ac with c being a coatom of L. Consider the lattices J =↓ c and K =↑ a. Observe that L = J ∪˙ K and let E : K → J be a complete meet-embedding provided by Lemma 5. Clearly, J has n − 1 atoms and is B(k)-free, therefore, |J| ≤ f (n − 1, k). Moreover, K must be B(k − 1)-free: indeed, if there existed B ∼ = B(k − 1) inside K, then Why concept lattices are large 85 k=2 k=3 k=4 n=2 n=3 n=4 Fig. 3: Diagrams of (n, k)-extremal lattices with 2 ≤ n, k ≤ 4. Elements shaded in black represent the doubled (n − 1, k − 1)-extremal lattice. B ∪ E(B) would be a boolean lattice with k atoms inside J, which is impossible. The lattice K has at most n − 1 atoms, and consequently |K| ≤ f (n − 1, k − 1), Pk−1 since the function n 7→ i=0 ni is monotonic increasing. Now, we have that |J| + |K| = |L| = f (n, k) = f (n − 1, k) + f (n − 1, k − 1), where the last equality follows from Proposition 3. Since |J| ≤ f (n−1, k) and |K| ≤ f (n−1, k−1), those two inequalities must hold with equality. Therefore, J and K are, respectively, (n − 1, k) and (n − 1, k − 1)-extremal. 7 Conclusion and related work We showed an upper bound for the number of minimal generators of a context which is sharp. Extremal lattices were constructed and also characterized. The rôle played by contranominal scales in formal contexts may be seen as analogous 86 Alexandre Albano and Bogdan Chornomaz as that of cliques in simple graphs, when one considers extremality with respect to the number of edges. The Sauer-Shelah Lemma [8] provides an upper bound which is similar to that of Theorem 2. This is not a coincidence, because it can be shown, not without some effort, that the condition of a concept lattice being B(k)-free is equivalent to the family of its extents not shattering a set of size k. As for the sharpness of the bound (which we prove in Section 5), in our case it is non-trivial, whereas the sharpness for the result of Sauer and Shelah is immediate. 8 Acknowledgements We want to deeply thank Bernhard Ganter for the invaluable feedback and fruit- ful discussions. References 1. Alexandre Albano. Upper bound for the number of concepts of contranominal-scale free contexts. In Formal Concept Analysis - 12th International Conference, ICFCA 2014, Cluj-Napoca, Romania, June 10-13, 2014. Proceedings, pages 44–53, 2014. 2. Alexandre Albano and Alair Pereira do Lago. A convexity upper bound for the number of maximal bicliques of a bipartite graph. Discrete Applied Mathematics, 165(0):12 – 24, 2014. 10th Cologne/Twente Workshop on Graphs and Combinatorial Optimization (CTW 2011). 3. David Eppstein. Arboricity and bipartite subgraph listing algorithms. Information Processing Letters, 51(4):207–211, 1994. 4. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foun- dations. Springer, Berlin-Heidelberg, 1999. 5. Sergei O. Kuznetsov. On computing the size of a lattice and related decision prob- lems. Order, 18(4):313–321, 2001. 6. Erich Prisner. Bicliques in graphs I: bounds on their number. Combinatorica, 20(1):109–117, 2000. 7. Dieter Schütt. Abschätzungen für die Anzahl der Begriffe von Kontexten. Master’s thesis, TH Darmstadt, 1987. 8. Saharon Shelah. 
A combinatorial problem; stability and order for models and the- ories in infinitary languages. Pacific J. Math., 41(1):247–261, 1972. 9. Thomas Worsch. Lower and upper bounds for (sums of) binomial coefficients, 1994. An Aho-Corasick Based Assessment of Algorithms Generating Failure Deterministic Finite Automata Madoda Nxumalo1 , Derrick G. Kourie2,3 , Loek Cleophas2,4 , and Bruce W. Watson2,3 1 Computer Science, Pretoria University, South Africa 2 FASTAR Research, Information Science, Stellenbosch University, South Africa 3 Centre for Artificial Intelligence Research, CSIR Meraka Institute, South Africa 4 Foundations of Language Processing, Computer Science, Umeå University, Sweden {madoda,derrick,loek,bruce}@fastar.org — http://www.fastar.org Abstract. The Aho-Corasick algorithm derives a failure deterministic finite automaton for finding matches of a finite set of keywords in a text. It has the minimum number of transitions needed for this task. The DFA-Homomorphic Algorithm (DHA) algorithm is more general, deriving from an arbitrary complete deterministic finite automaton a language-equivalent failure deterministic finite automaton. DHA takes formal concepts of a lattice as input. This lattice is built from a state/out- transition formal context that is derived from the complete deterministic finite automaton. In this paper, three general variants of the abstract DHA are benchmarked against the specialised Aho-Corasick algorithm. It is shown that when heuristics for these variants are suitably chosen, the minimality attained by the Aho-Corasick algorithm can be closely approximated. A published non-lattice-based algorithm is also shown to perform well in experiments. Keywords: Failure deterministic finite automaton, Aho-Corasick algo- rithm 1 Introduction A deterministic finite automaton (DFA) defines a set of strings, called its lan- guage. It is represented as a graph with symbol-labelled transitions between states. There are efficient algorithms to determine whether an input string be- longs to the DFA’s language. An approach to reducing DFA memory require- ments is the use of so-called failure deterministic finite automata (FDFAs, also defined in Section 2). An FDFA can be used to define the same language as a DFA with a reduced number of transitions and hence, reduced space required to store transition information. Essentially, this is achieved by replacing certain DFA state transitions by so-called failure transitions. A small additional com- putational cost is incurred in recognising whether given strings are part of the c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 87–98, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 88 Madoda Nxumalo et al. language. By using a trie (the term used for a DFA graph that is a tree) and failure transitions, Aho and Corasick [1] generalised the so-called KMP algo- rithm [7] to multi-keyword pattern matching. There are two versions of their algorithm: the so-called optimal one, which we call aco, and a failure one, acf. aco builds a minimal DFA to find all matches of a given keyword set in a text. acf builds, in a first step, a trie using all prefixes of words from the set. Each state therefore represents a string which is the prefix of a keyword. Moreover, the string of a state is spelled out by transitions that connect the start state of the trie to that state. 
In a second step, aco then inserts a failure transition from each state of the trie to some other state. To briefly illustrate the nature failure transitions, suppose p is a state in the trie representing the string and keyword she and suppose q is another state representing the prefix string he of the keyword hers. Then a transition from p to q would indicate that he is the longest suffix of she that matches a prefix of some other keyword. With appro- priate further elaboration (details may be found in [1, 11]) the output of acf is an FDFA that is language equivalent to the aco one It can also be shown that acf is minimal in the following sense. No other FDFA that is language-equivalent to aco can have fewer transitions than acf An algorithm proposed in [8], called the DFA-Homomorphic Algorithm (DHA), constructs from any complete DFA a language-equivalent FDFA. As predicted by the theory [2], the resulting FDFA is not necessarily minimal. The abstract version of the algorithm described in [8] involves the construction of a concept lattice as explained in Section 2. The original version of the algorithm leaves a number of decisions as nondeterministic choices. It also strives for minimality by following a “greedy” heuristic in respect of information embedded in the lattice. However, concrete versions of DHA and the effect of this heuristic have not been tested. Here we propose various alternative concrete versions of the algorithm for different deterministic choices.The acf FDFAs provide a benchmark for assessing the performance of these concrete variants of DHA. An aco DFA is used as input to each of several variants of the DHA and the resulting DHA-FDFAs are compared against the acf version. An alternative approach to constructing FDFAs from an arbitrary DFA has been proposed by Kumar et al [10]5 . Their technique is based on finding the maxi- mal spanning tree [9] of a suitably weighted nondirected graph that reflects the structure of the underlying DFA. Two algorithms were proposed. One is based on a maximal spanning tree, and the other on a redefined maximal spanning tree. Further details about their algorithms may be found in their original pub- lication. The original maximal spanning tree based algorithm was included in our comparative study for its performance assessment using acf as the bench- mark. 5 Their research uses different terminology to that given above. They refer to an FDFA as a delayed-input DFA (abbreviated to D2 F A) and failure transitions are called default transitions. An Aho-Corasick Based Assessment of Algorithms Generating Failure DFAs 89 Various other FDFA related research has been conducted for certain limited contexts. See [5] for an overview. A recent example is discussed in [4], where ideas from [8] were used to modify construction of a so-called factor oracle automaton to use failure transitions, saving up to 9% of symbol transitions. Section 2 provides the formal preliminaries relevant to this research. In Sec- tion 3 we introduce the deterministic variants of the DHA that are subsequently benchmarked. Section 4 outlines the investigation’s experimental environment, the data generated, the methods of assessment and the results. Section 5 draws conclusions and points to further research work currently underway. 2 Preliminaries An alphabet is a set of symbols, Σ, of size |Σ|, and Σ ∗ denotes the set of all sequences over this alphabet, including the empty sequence, denoted by . A string (or word), s, is an element of Σ ∗ and its length is denoted |s|. 
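Before continuing with the preliminaries, the two-step acf construction described in the introduction (a trie over all keyword prefixes, followed by one failure transition per state) can be sketched in code. The sketch below is the standard textbook construction, with identifiers of our own choosing; it is not the SPARE-PARTS implementation used later in the paper.

```python
from collections import deque

def build_acf(keywords):
    """Trie plus failure function in the style of acf (a textbook sketch)."""
    goto = [{}]   # goto[s] maps a symbol to a child state of s
    fail = [0]    # fail[s] is the failure-transition target of s
    for word in keywords:
        s = 0
        for ch in word:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
    # Breadth-first pass: the failure target of the state spelling wa is the
    # state reached on a from the failure target of the state spelling w.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
    return goto, fail

goto, fail = build_acf(["she", "hers"])
# The state spelling "she" fails to the state spelling "he",
# exactly as in the she/hers illustration above.
```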
Note that || = 0. The concatenation of strings p and q is represented as pq. If s = pqw then q is a substring of s, p and pq are prefixes of s and q and qw are suffixes of s. Moreover, q is a proper substring iff ¬((p = ) ∨ (w = )). Similarly, pq is a proper prefix iff w 6= and qw is a proper suffix iff p 6= . A deterministic finite automata (DFA) is a quintuple, D = (Q, Σ, δ, F, qs ), where Q is a finite set of states; Σ is an alphabet; δ ∈ Q × Σ 9 Q is the (possibly partial) symbol transition function mapping symbol/state pairs to states; qs ∈ Q is the start state; and F ⊆ Q is a set of final states. If δ is a total function, then the DFA is called complete. In the case of a complete DFA, the extension of δ is defined as δ ∗ ∈ Q × Σ ∗ −→ Q where δ ∗ (p, ) = p and if δ(p, a) = q and w ∈ Σ ∗ , then δ ∗ (p, aw) = δ ∗ (q, w). A finite string, w, is said to be accepted by the DFA iff δ ∗ (qs , w) ∈ F . The language of a DFA is the set of accepted strings. A failure DFA (FDFA) is a six-tuple, (Q, Σ, δ, f, F, qs ), where D = (Q, Σ, δ, F, qs ) is a (not necessarily complete) DFA and f ∈ Q 9 Q is a (possibly partial) failure transition function. For all a ∈ Σ and p ∈ Q, the functions δ and f are related in the following way: f(p) = q for some q ∈ Q if δ(p, a) is not defined. The extension of δ in an FDFA context is similar to its DFA version in that δ ∗ ∈ Q × Σ ∗ −→ Q and δ ∗ (p, ) = p. However: ∗ δ (q, w) if δ(p, a) = q δ ∗ (p, aw) = δ ∗ (q, aw) if δ(p, a) is not defined and f(p) = q An FDFA is said to accept string w ∈ Σ ∗ iff δ ∗ (qs , w) ∈ F . An FDFA’s language is its set of accepted strings. It can be shown that every complete DFA has a language-equivalent FDFA and vice-versa. When constructing an FDFA from a DFA, care must be taken to avoid so-called divergent cycles of failure transitions because they lead to an infinite sequence of failure traversals in string processing algorithms. (Details are provided in [8].) 90 Madoda Nxumalo et al. Subfigure 1a depicts a complete DFA for which Q = {q1 , q2 , q3 } and Σ = {a, b, c}. Its start state is q1 and q3 is the only final state. Its symbol transitions are de- picted as solid arrows between states. Subfigure 1b shows a language-equivalent FDFA where dashed arrows indicate the failure transitions. Note, for example that ab is in the language of both automata. In the DFA case, δ ∗ (q1 , ab) = δ ∗ (q1 , b) = δ ∗ (q3 , ) In the FDFA case, δ ∗ (q1 , ab) = δ ∗ (q1 , b) = δ ∗ (q2 , b) = δ ∗ (q3 , b) = δ ∗ (q3 , ). a b a b b c b f f start q1 q2 q3 start q1 q2 q3 a c c a,c f (a) D = (Q, Σ, δ, q1 , {q3 }) (b) F = (Q, Σ, δ, f, q1 , {q3 }) ha, q1 i hb, q3 i hc, q2 i hc, q1 i q1 X X X q2 X X X q3 X X X (d) State/out-transition lattice (c) State/out-transition context Fig. 1: Example automata and state/out-transition lattice This text relies on standard formal concept analysis terminology and definitions. (See, for example [6]). A so-called state/out-transition concept lattice can be derived from any DFA. The objects of its formal context are DFA states, q ∈ Q. Each attribute is a pair of the form ha, pi ∈ Σ × Q. A state q is deemed to have this attribute if δ(q, a) = p, i.e. if q is the source of a transition on a to p. Subfigure 1c is the state/out-transition context for the DFA in Subfigure 1a and Subfigure 1d is the line diagram of the associated state/out-transition concept lattice. The latter subfigure shows two intermediate concepts, each larger than the bottom concept and smaller than the top, but not commensurate with one another. 
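As an aside, the state/out-transition context just defined is easy to compute directly from a transition table. In the sketch below (our own code, not the paper's), the table is an assumed reading of Subfigure 1c, with q1 as the start state.

```python
# State/out-transition context of a complete DFA: objects are states,
# attributes are pairs (symbol, target state), and a state q has the
# attribute (a, p) iff delta(q, a) = p.
delta = {
    ("q1", "a"): "q1", ("q1", "b"): "q3", ("q1", "c"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q3", ("q2", "c"): "q2",
    ("q3", "a"): "q1", ("q3", "b"): "q3", ("q3", "c"): "q1",
}

states = sorted({q for q, _ in delta})
context = {q: {(a, p) for (r, a), p in delta.items() if r == q} for q in states}

for q, attrs in context.items():
    print(q, sorted(attrs))
# q1 and q2 have identical attribute sets here, which is what the
# intermediate concept with extent {q1, q2} records.
```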
The right-hand side intermediate concept depicts the fact that states q1 and q2 (its extent) are similar in that each as a transition on symbol a to q1 , b to q3 and on c to q2 — i.e. the concept’s intent is {ha, q1 i, hb, q3 i, hc,q2 i} Each concept in a state/out-transition lattice can be characterised by a certain value, called its arc redundancy. For a concept c it is defined as ar(c) = (|int(c)|− 1)×(|ext(c)|−1), where ext(c) and int(c) denote the extent and intent of concept An Aho-Corasick Based Assessment of Algorithms Generating Failure DFAs 91 c respectively. The arc redundancy of a concept represents the number of arcs that may be saved by doing the following: 1. singling out one of the states in the concept’s extent; 2. at all the remaining states in the concept’s extent, removing all out-transitions mentioned in the concept’s intent; 3. inserting a failure arc from each of the states in step 2 to the singled out state in step 1. The expression, |ext(c)| − 1 represents the number of states in step 2 above. At each such state, |int(c)| symbol transitions are removed and a failure arc is inserted. Thus, |int(c)| − 1 is the total number of transitions saved at each of |ext(c)| − 1 states so that ar(c) is indeed the total number of arcs saved by the above transformation. The positive arc redundancy (PAR) set consists of all concepts whose arc redun- dancy is greater than zero. 3 The DHA Variants For the DFA-Homomorphic Algorithm (DHA) to convert a DFA into a language equivalent FDFA, a three stage transformation of the DFA is undertaken. Ini- tially, the DFA is represented as a state/out-transition context. From the derived concept lattices, the PAR set is extracted to serve as input for the DHA. The basic DHA proposed in [8] is outlined in Algorithm 1. The variable O is used to keep track of states that are not the source of any failure transitions. This is to ensure that a state is never the source of more than one failure transition. Initially all states qualify. A concept c is selected and removed from PAR set, so that c is no longer available in subsequent iterations. The initial version of DHA proposed specifically selecting a concept, c, with maximum arc redundancy. The specification given here leaves open how the choice will be made. From c’s extent, one of the states, t, is chosen to be a failure transition target state. DHA gives no specific criteria for which state in ext(c) to choose. The remaining set of states in ext(c) is denoted by ext0 (c). Then, for each state s in ext0 (c) that qualifies to be the source of a failure transition (i.e. that is also in O) all transitions in int(c) are removed from s and a failure transition is installed from s to t. Because state s has become a failure transition source state whose target state is t, it may no longer be the source of any other failure transition, and so is removed from O. These steps are repeated until it is no longer possible to install any more failure transitions. It should be noted that in this particular formulation of the abstract algorithm the PAR set is not recomputed to reflect changes in arc redundancy as the DFA is progressively transformed into an FDFA. This does not affect the correctness of the algorithm, but may affect its optimality. Investigating such effects is not within the scope of this study. The third and fifth lines of Algorithm 1, namely 92 Madoda Nxumalo et al. c := selectAConcept(PAR) and t := getAnyState(ext(c)) respectively. are non-specific in the original formulation of DHA. 
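The loop just described can be summarised in a short Python sketch. It is a simplification and not the authors' implementation: SelectConcept is fixed to the maximum-arc-redundancy choice, the target state t is simply the first state of the extent (standing in for a concrete choice such as the ClosestToRoot heuristic introduced below), and the divergent-cycle test of [8] is replaced by a plain acyclicity check on the failure function. All names are ours.

```python
from itertools import combinations

def dha_sketch(delta, states):
    """Greatly simplified DFA-Homomorphic Algorithm sketch (assumptions above)."""
    context = {q: frozenset((a, p) for (r, a), p in delta.items() if r == q)
               for q in states}
    # Enumerate concepts by closing every subset of states (toy inputs only),
    # keeping those with positive arc redundancy (the PAR set).
    par = []
    for r in range(1, len(states) + 1):
        for ext in combinations(sorted(states), r):
            intent = frozenset.intersection(*(context[q] for q in ext))
            extent = tuple(q for q in states if intent <= context[q])
            ar = (len(intent) - 1) * (len(extent) - 1)
            if ar > 0:
                par.append((ar, extent, intent))
    par = list(set(par))
    par.sort(key=lambda c: (-c[0], c[1]))      # maximum arc redundancy first
    # Note: as in the abstract algorithm, PAR is not recomputed while
    # transitions are being removed.
    delta = dict(delta)
    fail, free = {}, set(states)               # free plays the role of O
    for ar, extent, intent in par:
        t = extent[0]                          # stand-in for ClosestToRoot
        for s in extent[1:]:
            if s not in free:
                continue
            walk = t                           # reject if a failure cycle arises
            while walk in fail and walk != s:
                walk = fail[walk]
            if walk == s:
                continue
            for (a, _p) in intent:             # drop now-redundant transitions
                delta.pop((s, a), None)
            fail[s] = t
            free.discard(s)
    return delta, fail
```

Applied to the transition table of the previous sketch, and under these assumptions, it keeps the three out-transitions of q1 plus δ(q3, c) = q1 and installs the failure arcs f(q2) = q1 and f(q3) = q1, i.e. four symbol and two failure transitions in place of the original nine (this need not coincide with the FDFA of Subfigure 1b).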
Three variants of the algorithm are proposed with respect to the third line. For convenience we represent each variant by a conjunct on the right hand side of an assignment where c is the assignment target. This, of course, is a slight abuse of notation, since c is not the outcome of a logical operation, but the selection of a concept from the PAR set according to a criterion represented by h_mar(PAR), h_me(PAR) or h_mi(PAR). Each selection option represents a different greedy heuristic for choosing concept c from the PAR set. By greedy we mean that an element from the set is selected, based on some maximal or minimal feature, without regard to possible opportunities lost in the forthcoming iterations by making these selections. In addition to these heuristics, a single heuristic is proposed for the fifth line, relating to choosing the target state, t, for the failure transitions. These choices are illustrated as colour-coded assignment statements shown in the skeleton Algorithm 2 and are now briefly explained. The rationale for these heuristics will be discussed in a later section.

Algorithm 1
    O := Q; PAR := {c | ar(c) > 0};
    do ((O ≠ ∅) ∧ (PAR ≠ ∅)) →
        c := SelectConcept(PAR);
        PAR := PAR \ {c};
        t := getAnyState(ext(c));
        ext′(c) := ext(c) \ {t};
        for each (s ∈ ext′(c) ∩ O) →
            if a failure cycle is not created →
                for each ((a, r) ∈ int(c)) →
                    δ := δ \ {⟨s, a, r⟩}
                rof;
                f(s) := t;
                O := O \ {s}
            fi
        rof
    od

Algorithm 2
    O := Q; PAR := {c | ar(c) > 0};
    do ((O ≠ ∅) ∧ (PAR ≠ ∅)) →
        c := h_mar(PAR) ∨ h_me(PAR) ∨ h_mi(PAR);
        PAR := PAR \ {c};
        t := ClosestToRoot(c);
        ext′(c) := ext(c) \ {t};
        for each (s ∈ ext′(c) ∩ O) →
            if a failure cycle is not created →
                for each ((a, r) ∈ int(c)) →
                    δ := δ \ {⟨s, a, r⟩}
                rof;
                f(s) := t;
                O := O \ {s}
            fi
        rof
    od

The heuristics for choosing concept c from the PAR set in each iteration are as follows:
– The h_mar heuristic: c is a PAR concept with a maximum arc redundancy.
– The h_mi heuristic: c is a PAR concept with a maximum intent size.
– The h_me heuristic: c is a PAR concept with a minimum extent size.

Once one of these heuristics has been applied, the so-called ClosestToRoot heuristic is used to select a state t in ext(c) to become the target state of failure transitions from each of the remaining states in ext(c). The heuristic means that t is selected as the state in ext(c) that is closest to aco's start state. (Since a trie has no cycles, the notion of "closest" here simply means a state with the shortest path from the start state to that state.) Transition modifications are subsequently made on the FDFA produced to date, provided that a divergent failure cycle is not produced.

4 The Experiment

The experiments were conducted on an Intel i5 dual core CPU machine, running Linux Ubuntu 14.4. Code was written in C++ and compiled under the GCC version 4.8.2 compiler.

It can easily be demonstrated that if there are no overlaps between proper prefixes and proper suffixes of keywords in a keyword set, then the associated acf FDFA's failure transitions will all loop back to its start state, and our ClosestToRoot heuristic will behave similarly. To avoid keyword sets that lead to such trivial acf FDFAs, the following keyword set construction algorithm was devised. Keywords (also referred to as patterns) are from an alphabet set of size 10. Their lengths range from 5 to 60 characters. Keyword sets of sizes 5, 10, 15, . . . , 100 respectively are generated.
For each of these 20 different set sizes, twelve alternative keyword sets are generated. Thus in total 12 × 20 = 240 keyword sets are available. To construct a keyword set of size N, an initial N random strings are generated (all random selections mentioned use a pseudo-random number generator). Each such string has random content taken from the alphabet and a random length in the range 5 to 30. However, for reasons given below, only a limited number of these N strings, say M, are directly inserted into the keyword set. The set is then incrementally grown to the desired size, N, by repeating the following:
– select a prefix of random length, say p, from a randomly selected string in the current keyword set;
– remove a string, say w, from the set of strings not yet in the keyword set;
– insert either pw or wp into the keyword set.
Steps are taken to ensure that there is a reasonable representation of each of these three differently constructed kinds of keywords in a given keyword set.

These keyword sets served as input to the SPARE-PARTS toolkit [12] to create the associated acf FDFAs and the aco DFAs. A routine was written to extract state/out-transition contexts from the aco DFAs. These contexts were used by the lattice construction software package known as FCART (version 0.9) [3], supplied to us by National Research University Higher School of Economics (Moscow, Russia). The DHA variants under test used the resulting concept lattices to generate the FDFAs. As previously mentioned, a Kumar et al. [10] algorithm was also implemented to generate FDFAs from the DFAs. These will be referenced as kum FDFAs.

Figures 2 and 3 give several views of the extent to which the DHA-based and kum FDFAs correspond with acf FDFAs. The experimental data is available online at http://www.fastar.org/wiki/index.php?title=Conference_Papers#2015. For notational convenience, f^i_jk denotes the set of failure transitions of the FDFA corresponding to the k-th keyword set of size 5j that was generated by the algorithm variant i ∈ FA \ {aco}, where k ∈ [1, 12], j ∈ [1, 20] and FA = {acf, aco, mar, mi, me, kum}. Similarly, δ^i_jk refers to the symbol transition sets of the associated FDFAs and, in this case, also the aco DFAs if i = aco. A dot notation in the subscript is used for averages. Thus, for some i ∈ FA \ {aco} we use |f^i_j.| to denote the average number of failure transitions in the i-type FDFAs produced by the 12 keyword sets of size 5j, and similarly |δ^i_j.| represents the average number of symbol transitions.

[Fig. 2: Arcs savings (%) against pattern set size for the variants mar, mi, me, acf and kum; the plotted quantity is (|δ^aco_j.| − (|δ^i_j.| + |f^i_j.|)) / |δ^aco_j.| × 100.]

Figure 2 shows how many more transitions aco automata require (as a percentage of aco) compared to each of the FDFA variants. Note that data has been averaged over the 12 keyword set samples for each of the set sizes, and note that the FDFA transitions include both symbol and failure transitions. The minimal acf FDFAs attain an average savings of about 80% over all sample sizes and the mi, me and kum FDFAs track this performance almost identically. Although not clearly visible in the graph, the me heuristic shows a slight degradation for larger set sizes, while the kum FDFAs consistently perform about 1% to 2% worse.
By way of contrast, the mar heuristic barely achieves a 50% savings for small sample sizes, and drops below a 20% savings for a sample size of about 75, after which there is some evidence that it might improve slightly.

The fact that the percentage transition savings of the various FDFA variants closely correspond to that of acf does not mean that the positioning of the failure and symbol transitions should show a one-to-one matching. The extent to which the transitions precisely match one another is shown in Figure 3.

[Fig. 3: Transition Matches. Box-and-whisker plots against pattern set size (5 to 100) for mi, me, mar and kum: (a) the number of inequivalent symbol arcs, |δ^acf_jk| − |δ^acf_jk ∩ δ^i_jk|; (b) the percentage of matching failure arcs, |f^i_jk ∩ f^acf_jk| / |f^acf_jk| × 100.]

These box-and-whisker plots show explicitly the median, 25th and 75th percentiles as well as outliers of each of the 12 sample keyword sets of a given size. Subfigure 3a shows the number of symbol transitions in acf FDFAs that do not correspond with those in mi, me, mar and kum respectively. Subfigure 3b shows the percentage of acf failure transitions matching those of the FDFAs generated by mi, me, mar and kum respectively.

The symbol transitions for the mi heuristic are practically identical to those of acf, differing by at most two. Differences are not significantly related to sample size. Differences for me are somewhat larger, increasing slightly with larger sample size, though still relatively modest in relation to the overall number of FDFA transitions. (There are |Q| − 1 transitions in the underlying trie.) In the cases of mar and kum, the differences are approximately linearly dependent on the size of the keyword set, reaching over 9000 and 250 respectively for keyword sets of size 100.

The failure transition differences in regard to mi and me show a very similar pattern as keyword size increases. Only in isolated instances do they fully match those of acf, but the matching correspondence drops from a median of more than 95% in the case of the smallest keyword sets to a median of about 50% for the largest keyword sets. The median kum failure transition correspondence with acf is in a range of about 12–18% for all pattern set sizes. However, in the case of mar, the degree of correspondence is much worse: at best the median value is just over 60% for small keyword sets, dropping close to zero for medium range keyword set sizes, and then increasing slightly to about 10% for the largest keyword sets.
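For reference, the two quantities plotted in Figure 3 are straightforward to compute once the transition sets are available. The sketch below uses made-up toy sets, not the experimental data, and the function names are ours.

```python
def matching_failure_percentage(f_variant, f_acf):
    """Subfigure 3b quantity: |f_variant ∩ f_acf| / |f_acf| * 100."""
    return 100.0 * len(f_variant & f_acf) / len(f_acf)

def inequivalent_symbol_arcs(d_variant, d_acf):
    """Subfigure 3a quantity: |d_acf| - |d_acf ∩ d_variant|."""
    return len(d_acf) - len(d_acf & d_variant)

# Toy example: failure arcs as (source, target) pairs.
f_acf = {("q2", "q1"), ("q3", "q1"), ("q4", "q2")}
f_mi  = {("q2", "q1"), ("q3", "q1"), ("q4", "q3")}
print(matching_failure_percentage(f_mi, f_acf))  # roughly 66.7
```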
Overall, Figures 2 and 3 reveal that there is a variety of ways in which failure transitions may be positioned in an FDFA, and that lead to very good—in many cases even optimal—transition savings. It is interesting to note that even in the kum FDFAs, the total number of transition savings is very close to optimal, despite relatively large differences in the positioning of the transitions. However, the figures also show that this flexibility in positioning failure transitions to achieve good arc savings eventually breaks down, as in the case of the mar FDFAs. One of the reasons for differences between acf FDFAs and the others is that some implementations of the acf algorithm, including the SPARE-PARTS implemen- tation, inserts a failure arc at every state (except the start state), even if there is an out-transition on every alphabet symbol from a state. Such a failure arc is of course redundant. Inspection of the data showed that some of the randomly generated keyword sets lead to such “useless” failure transitions, but they are so rare that they do not materially affect the overall observations. The overall rankings of the output FDFAs of the various algorithms to acf could broadly be stated as mi > me > kum > mar. This ranking is with respect to closeness of transition placement to acf. Since the original focus of this study was to explore heuristics for the DHA algorithm, further comments about the kum algorithm are reserved for the next section. The rationale for the mar heuristic is clear: it will cause the maximum savings in transitions in a given iteration. It was in fact the initial criterion proposed in [8]. It is therefore somewhat surprising that it did not perform very well in comparison to other heuristics. It would seem that, in the present context, it is too greedy—i.e. by selecting a concept whose extent contains the set of states that can effect maximal savings such that in one iteration, it deleteriously elimi- nates from consideration concepts whose extent contains some of those states in subsequent iterations. Note that, being based on the maximum of the product of extent and intent sizes, it will tend to select concepts in the middle of the concept lattice. When early trials in our data showed up mar’s relatively poor performance, the mi and me heuristics were introduced to prioritise concepts in the top or bottom regions of the lattice. These latter two heuristics will maximize the number of symbol transitions to be removed per state when replacing them with failure transitions, in so far as concepts with large intents tend to have small extents and vice-versa. Although such a relationship is, of course, data-dependent, random data tends in that direction, as was confirmed by inspection of our data. These two heuristics appear to be rather successful at attaining acf-like FDFAs. However, the ClosestToRoot heuristic has also played a part in this success. Note that the acf failure transitions are designed to record that a suffix of a state’s An Aho-Corasick Based Assessment of Algorithms Generating Failure DFAs 97 string is also a prefix of some other state’s string. Thus, f (q) = p means that a suffix of state q’s string is also a prefix of state p’s string. However, since there may be several suffixes of q’s string and several states whose prefixes meet this criterion, the definition of f requires that the longest possible suffix of q’s string should be used. This ensures that there is only one possible state, p, in the trie whose prefix corresponds to that suffix. 
Thus, on the one hand, acf directs a failure transition “backwards” towards a state whose depth is less than that of the current state. On the other hand, acf selects a failure transition’s target state to be as far as possible from the start state, because the suffix (and therefore also the prefix) used must be maximal in length. The ClosestToRoot heuristic approximates the acf action in that it also directs failure transitions backwards towards the start state. However, by selecting a failure transition’s target state to be as close as possible from the start state, it seems to contradict acf actions. It is interesting to note in Subfigure 3b that both mi and me show a rapid and more or less linear decline in failure transition matchings with respect to acf when pattern set size reaches about 65. We conjec- ture that for smaller keyword sizes, theClosestToRoot heuristic does not conflict significantly with acf’s actions because there are few failure target states from which to choose. When keyword set sizes become greater, there is likely to be more failure target states from which to choose, and consequently less correspon- dence between the failure transitions chosen according to differing criteria.This is but one of several matters that has been left for further study. 5 Conclusions and Future Agenda Our ultimate purpose is to investigate heuristics for building FDFAs from gen- eralised complete DFAs—a domain where optimal behaviour is known a priori to be computationally hard. The comparison against acf FDFAs outlined above is a firm but limited starting point. The next step is to construct complete DFAs from randomly generated FDFAs and examine the extent to which the heuristics tested out in this study can reconstruct the latter from the former. Because gen- eralised DFAs can have cycles, the ClosestToRoot heuristic will be generalised by using Dijktra’s algorithm for calculating the shortest distance from the start state to each DFA state. It remains to be seen whether mar will perform any better in the generalised context. The relatively small alphabet size of 10 was dictated by unavoidable growth in the size of the associated concept lattices. Even though suitable strategies for trimming the lattice (for example by not generating concepts with arc redun- dancy less than 2) are being investigated, it is recognised that use of DHA will always be constrained by the potential for the associated lattice to grow exponen- tially. Nevertheless, from a theoretical perspective a lattice-based DHA approach to FDFA generation is attractive because it encapsulates the solution space in which a minimal FDFA might be found—i.e. each ordering of its concepts maps 98 Madoda Nxumalo et al. to a possible language-equivalent FDFA that can be derived from DFA and at least one such ordering will be a minimal FDFA. The kum FDFA generation approach is not as constrained by space limitations as the DHA approach and in the present experiments it has performed reasonably well. In the original publication, a somewhat more refined version is reported that attempts to avoid unnecessary chains of failure transitions. Future research should examine the minimising potential of this refined version using generalised DFAs as input and should explore more fully the relationship between these kum- based algorithms and the DHA algorithms. References 1. A. V. Aho and M. J. Corasick. Efficient string matching: An aid to bibliographic search. Commun. ACM, 18(6):333–340, 1975. 2. H. Björklund, J. Björklund, and N. 
Zechner. Compression of finite-state automata through failure transitions. Theor. Comput. Sci., 557:87–100, 2014. 3. A. Buzmakov and A. Neznanov. Practical computing with pattern structures in FCART environment. In Proceedings of the International Workshop ”What can FCA do for Artificial Intelligence?” (FCA4AI at IJCAI 2013), Beijing, China, August 5, 2013., pages 49–56, 2013. 4. L. Cleophas, D. G. Kourie, and B. W. Watson. Weak factor automata: Comparing (failure) oracles and storacles. In J. Holub and J. Žďárek, editors, Proceedings of the Prague Stringology Conference 2013, pages 176–190, Czech Technical University in Prague, Czech Republic, 2013. 5. M. Crochemore and C. Hancart. Automata for matching patterns. In S. A. Rozen- berg G., editor, Handbook of Formal Languages, volume 2, Linear Modeling: Back- ground and Application, pages 399–462. Springer-Verlag, 1997. incollection. 6. B. Ganter, G. Stumme, and R. Wille, editors. Formal Concept Analysis, Foun- dations and Applications, volume 3626 of Lecture Notes in Computer Science. Springer, 2005. 7. D. E. Knuth, J. H. M. Jr., and V. R. Pratt. Fast pattern matching in strings. SIAM J. Comput., 6(2):323–350, 1977. 8. D. G. Kourie, B. W. Watson, L. Cleophas, and F. Venter. Failure deterministic finite automata. In J. Holub and J. Žďárek, editors, Proceedings of the Prague Stringology Conference 2012, pages 28–41, Czech Technical University in Prague, Czech Republic, 2012. 9. J. B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7(1):48–50, 1956. 10. S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, and J. S. Turner. Algorithms to accelerate multiple regular expressions matching for deep packet inspection. In Proceedings of the ACM SIGCOMM 2006 Conference on Applications, Tech- nologies, Architectures, and Protocols for Computer Communications, Pisa, Italy, September 11-15, 2006, pages 339–350, 2006. 11. B. W. Watson. Taxonomies and Toolkits of Regular Language Algorithms. PhD thesis, Eindhoven University of Technology, Sept. 1995. 12. B. W. Watson and L. G. Cleophas. SPARE Parts: a C++ toolkit for string pattern recognition. Softw., Pract. Exper., 34(7):697–710, 2004. Context-Aware Recommender System Based on Boolean Matrix Factorisation Marat Akhmatnurov and Dmitry I. Ignatov National Research University Higher School of Economics, Moscow dignatov@hse.ru Abstract. In this work we propose and study an approach for collabora- tive filtering, which is based on Boolean matrix factorisation and exploits additional (context) information about users and items. To avoid simi- larity loss in case of Boolean representation we use an adjusted type of projection of a target user to the obtained factor space. We have com- pared the proposed method with SVD-based approach on the MovieLens dataset. The experiments demonstrate that the proposed method has better MAE and Precision and comparable Recall and F-measure. We also report an increase of quality in the context information presence. Keywords: Boolean Matrix Factorisation, Formal Concept Analysis, Recommender Algorithms, Context-Aware Recommendations 1 Introduction Recommender Systems have recently become one of the most popular applica- tions of Machine Learning and Data Mining. Their primary aim is to help users to find proper items like movies, books or goods within an underlying informa- tion system. 
Collaborative filtering recommender algorithms based on matrix factorisation (MF) techniques are now considered industry standard [1]. The main assumption here is that similar users prefer similar items and MF helps to find (latent) similarity in the reduced space efficiently. Among the most often used types of MF we should definitely mention Sin- gular Value Decomposition (SVD) [2] and its various modifications like Proba- bilistic Latent Semantic Analysis (PLSA) [3]. However, several existing factori- sation techniques, for example, non-negative matrix factorisation (NMF) [4] and Boolean matrix factorisation (BMF) [5], seem to be less studied in the context of Recommender Systems. Another approach similar to MF is biclustering, which has also been successfully applied in recommender system domain [6,7]. For ex- ample, Formal Concept Analysis (FCA) [8] can be also used as a biclustering technique and there are several examples of its applications in the recommender systems domain [9,10]. A parameter-free approach that exploits a neighbour- hood of the object concept for a particular user also proved its effectiveness [11]; it has a predecessor based on object-attribute biclusters [7] that also capture the neighbourhood of every user and item pair in an input formal context. Our c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 99–110, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 100 Marat Akhmatnurov and Dmitry I. Ignatov previous approach based on FCA exploits Boolean factorisation based on formal concepts and follows user-based k-nearest neighbours strategy [12]. The aim of this study is to continue comparing the recommendation qual- ity of several aforementioned techniques on the real dataset and investigation of methods’ interrelationship and applicability. In particular, in our previous study, it was especially interesting to conduct experiments and compare recommenda- tion quality in case of a numeric input matrix and its scaled Boolean counterpart in terms of Mean Absolute Error (MAE) as well as Precision and Recall. Our previous results showed that the BMF-based approach is of comparable quality with the SVD-based one [12]. Thus, one of the next steps is definitely usage of auxiliary information containing users’ and items’ features, i.e. so called context information (for BMF vs SVD see section 4). Another novelty of the paper is defined by the fact that we have adjusted the original Boolean projection of users to the factor space by support-based weights that results in a sufficient quality increase. We also investigate the approximate greedy algorithm proposed in [5] in the recommender setting, which tends to generate factors with large number of users, and more balanced (in terms of ratio between users’ and items’ number per factor) modification of the Close-by- One algorithm [13]. The practical significance of the paper is determined by the demand of rec- ommender systems’ industry, that is focused on gaining reliable quality in terms of average MAE. The rest of the paper consists of five sections. Section 2 is an introductory review of the existing MF-based approaches to collaborative filtering. In section 3 we describe our recommender algorithm which is based on Boolean matrix factorisation using closed sets of users and items (that is FCA). 
Section 4 contains the results of an experimental comparison of two MF-based recommender algorithms by means of cross-validation in terms of MAE, Precision, Recall and F-measure. The last section concludes the paper.

2 Introductory review

In this section we briefly describe two approaches to the decomposition of real-valued and Boolean matrices. In addition we provide the reader with the general scheme of user-based recommendation that relies on MF and a simple way of directly incorporating context information into MF-based algorithms.

2.1 Singular Value Decomposition

Singular Value Decomposition (SVD) is a decomposition of a rectangular matrix $A \in \mathbb{R}^{m\times n}$ ($m > n$) into a product of three matrices
$$A = U \begin{pmatrix} \Sigma \\ 0 \end{pmatrix} V^{T}, \qquad (1)$$
where $U \in \mathbb{R}^{m\times m}$ and $V \in \mathbb{R}^{n\times n}$ are orthogonal matrices, and $\Sigma \in \mathbb{R}^{n\times n}$ is a diagonal matrix such that $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_n)$ and $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_n \geq 0$. The columns of the matrices $U$ and $V$ are called singular vectors, and the numbers $\sigma_i$ are singular values.

In the context of recommender systems, the rows of $U$ and $V$ can be interpreted as vectors of a user's and an item's attitude to a certain topic (factor), and the corresponding singular values as the importance of the topic among the others. The main disadvantages are the dense output decomposition matrices and the negative factor values, which are difficult to interpret. The advantage of SVD for recommender systems is that this method allows one to obtain the vector of a new user's attitude to the topics without recomputing the SVD of the whole matrix. The computational complexity of SVD according to [2] is $O(mn^2)$ floating-point operations if $m \geq n$, or more precisely $2mn^2 + 2n^3$.

2.2 Boolean Matrix Factorisation based on FCA

Description of FCA-based BMF. Boolean matrix factorisation (BMF) is a decomposition of the original matrix $I \in \{0,1\}^{n\times m}$, where $I_{ij} \in \{0,1\}$, into a Boolean matrix product $P \circ Q$ of binary matrices $P \in \{0,1\}^{n\times k}$ and $Q \in \{0,1\}^{k\times m}$ for the smallest possible number $k$. We define the Boolean matrix product as follows:
$$(P \circ Q)_{ij} = \bigvee_{l=1}^{k} P_{il} \cdot Q_{lj}, \qquad (2)$$
where $\vee$ denotes disjunction and $\cdot$ conjunction.

The matrix $I$ can be considered the matrix of a binary relation between a set $X$ of objects (users) and a set $Y$ of attributes (items that users have evaluated). We assume that $xIy$ iff the user $x$ evaluated the item $y$. The triple $(X, Y, I)$ clearly forms a formal context1.

Consider a set $F \subseteq B(X, Y, I)$, a subset of all formal concepts of the context $(X, Y, I)$, and introduce the matrices $P_F$ and $Q_F$:
$$(P_F)_{il} = \begin{cases} 1, & i \in A_l,\\ 0, & i \notin A_l,\end{cases} \qquad (Q_F)_{lj} = \begin{cases} 1, & j \in B_l,\\ 0, & j \notin B_l,\end{cases}$$
where $(A_l, B_l)$ is a formal concept from $F$.

We can consider the decomposition of the matrix $I$ into the binary matrix product of $P_F$ and $Q_F$ as described above. The theorems on universality and optimality of formal concepts are proved in [5]. There are several algorithms for finding $P_F$ and $Q_F$ by calculating formal concepts based on these theorems [5]. The approximate algorithm we use for comparison (Algorithm 2 from [5]) avoids the computation of all possible formal concepts and therefore works much faster [5]. The worst-case time estimation of the calculations yields $O(k|G||M|^3)$, where $k$ is the number of found factors, $|G|$ is the number of objects and $|M|$ is the number of attributes.

1 We have to omit basic FCA definitions; for more details see [8].
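To make the construction of $P_F$ and $Q_F$ concrete, the following Python sketch (purely illustrative, not the implementation used in this paper, which is in C++; all names are chosen freely) builds the two factor matrices from a hand-picked set of formal concepts of a small binary matrix and checks that their Boolean product reproduces it. In practice the covering concepts are found by an algorithm such as Algorithm 2 of [5] or the modified Close-by-One described in Section 3.

import numpy as np

def factor_matrices(n_users, n_items, concepts):
    # Build P_F (users x factors) and Q_F (factors x items) from formal
    # concepts given as (extent, intent) pairs of index sets.
    k = len(concepts)
    P = np.zeros((n_users, k), dtype=int)
    Q = np.zeros((k, n_items), dtype=int)
    for l, (extent, intent) in enumerate(concepts):
        P[list(extent), l] = 1
        Q[l, list(intent)] = 1
    return P, Q

def boolean_product(P, Q):
    # (P o Q)_ij = OR over l of (P_il AND Q_lj)
    return (P @ Q > 0).astype(int)

# A toy 4 x 4 binary user-item matrix and two covering formal concepts.
I = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 1, 1]])
concepts = [({0, 1, 3}, {0, 1}),   # users 0, 1, 3 share items 0 and 1
            ({2, 3}, {2, 3})]      # users 2, 3 share items 2 and 3
P, Q = factor_matrices(4, 4, concepts)
assert np.array_equal(boolean_product(P, Q), I)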
2.3 Contextual information

Contextual information is a multi-faceted notion that is present in several disciplines. In the recommender systems domain, the context is any auxiliary information concerning users (like gender, age, occupation, living place) and/or items (like the genre of a movie, book or music), which shows not only a user's mark given to an item but explicitly or implicitly describes the circumstances of such an evaluation (e.g., including time and place) [15].

From the representational viewpoint, context2 can be described by a binary relation which shows that a user or an item possesses a certain attribute-value pair. In case the contextual information is described by finite-valued attributes, it can be represented by a finite number of binary relations; otherwise, when we have countable or continuous values, their domains can be split into (semi)intervals (cf. scaling in FCA). As a result one may obtain a block matrix
$$I = \begin{pmatrix} R & C_{user} \\ C_{item} & O \end{pmatrix},$$
where $R$ is the utility matrix of users' ratings of items, $C_{user}$ represents the context information of users, $C_{item}$ contains the context information of items and $O$ is a zero-filled matrix.

Table 1. Adding auxiliary (context) information: the utility matrix of seven users (Anna, Vladimir, Katja, Mikhail, Nikolay, Olga, Petr) rating six movies (Brave Heart, Terminator, Gladiator, Millionaire from Ghetto, Hot Snow, Godfather) on a five-star scale, bordered by user context (gender M/F and age group 0-20, 21-45, 46+) and by item context (the genres Drama, Action and Comedy).

In case of a more complex rating scale the ratings can be reduced to a binary scale (e.g., "like/dislike") by binary thresholding or by FCA-based scaling.

2 In order to avoid confusion, please note that a formal context is a different notion.

2.4 General scheme of user-based recommendations

Once a matrix of ratings is factorised we need to learn how to compute recommendations for users and to evaluate whether a particular method handles this task well.

For the factorised matrices the well-known algorithm based on the similarity of users can be applied, where for finding the k nearest neighbours we use not the original matrix of ratings $R \in \mathbb{R}^{m\times n}$ but the matrix $I \in \mathbb{R}^{m\times f}$, where $m$ is the number of users and $f$ is the number of factors. After the selection of the k users that are most similar to a given user, based on the factors that are peculiar to them, the prospective ratings for that user can be calculated with collaborative filtering formulas. After the generation of recommendations, the performance of the recommender system can be estimated by measures such as MAE, Precision and Recall.

Collaborative recommender systems try to predict the utility (in our case, ratings) of items for a particular user based on the items previously rated by other users. Memory-based algorithms make rating predictions based on the entire collection of items previously rated by the users. That is, the value of the unknown rating $r_{u,m}$ for a user $u$ and item $m$ is usually computed as an aggregate of the ratings of some other (usually, the k most similar) users for the same item $m$:
$$r_{u,m} = \operatorname{aggr}_{\tilde u \in \tilde U}\, r_{\tilde u, m},$$
where $\tilde U$ denotes the set of k users that are most similar to user $u$ and have rated item $m$. For example, the function aggr may be a weighted average of ratings [15]:
$$r_{u,m} = \sum_{\tilde u \in \tilde U} sim(\tilde u, u) \cdot r_{\tilde u, m} \Big/ \sum_{\tilde u \in \tilde U} sim(u, \tilde u). \qquad (3)$$
The similarity measure between users $u$ and $\tilde u$, $sim(\tilde u, u)$, is essentially an inverse distance measure and is used as a weight, i.e., the more similar the users $u$ and $\tilde u$ are, the more weight the rating $r_{\tilde u,m}$ will carry in the prediction of $r_{u,m}$.
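As a small illustration of eq. (3), and not the authors' implementation (which is in C++), the following Python sketch predicts a rating as the similarity-weighted average over the k nearest neighbours, computing the similarity on user profiles; for the BMF approach these profiles would be the rows of the projection into the factor space discussed in Section 3. All names and the default similarity are assumptions of this sketch.

import numpy as np

def cosine_sim(x, y):
    # one possible choice of sim(.,.); a Hamming-based variant could be used instead
    norm = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / norm) if norm else 0.0

def predict_rating(u, m, profiles, R, rated, sim=cosine_sim, k=20):
    # Eq. (3): similarity-weighted average of the ratings of the k users most
    # similar to u (similarity computed on the profile vectors) among those
    # users who have actually rated item m.
    candidates = [v for v in range(R.shape[0]) if v != u and rated[v, m]]
    neighbours = sorted(candidates, key=lambda v: sim(profiles[u], profiles[v]),
                        reverse=True)[:k]
    weights = np.array([sim(profiles[u], profiles[v]) for v in neighbours])
    if weights.sum() == 0:
        return 0.0
    ratings = np.array([R[v, m] for v in neighbours])
    return float(weights @ ratings / weights.sum())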
The similarity between two users is based on their ratings of the items that both users have rated. There are several popular approaches: Pearson correlation, cosine-based and Hamming-based similarities. We further compare the cosine-based and the normalised Hamming-based similarities:
$$sim_{cos}(u,v) = \sum_{m\in \tilde M} r_{um} \cdot r_{vm} \Big/ \Big(\sum_{m\in \tilde M} r_{um}^{2} \sum_{m\in \tilde M} r_{vm}^{2}\Big)^{1/2}, \qquad (4)$$
$$sim_{Ham}(u,v) = 1 - \sum_{m\in \tilde M} |r_{um} - r_{vm}| \, / \, |\tilde M|, \qquad (5)$$
where $\tilde M$ is either the set of co-rated items (movies) of users $u$ and $v$ or the whole set of items.

To apply this approach in the case of the FCA-based BMF recommender algorithm, we simply consider the user-factor matrices obtained after factorisation of the initial data as the input. For the input matrix in Table 1, the corresponding decomposition is the Boolean product of a binary user-factor matrix $P_F$ and a binary factor-item matrix $Q_F$.

3 Proposed Approach

In contrast to [5], in the recommender setting we are mostly interested in whether concepts with a more balanced extent and intent size may give us an advantage, and we use the following criterion to this end:
$$W(A, B) = \frac{2|A||B|}{|A|^{2} + |B|^{2}} \in [0; 1], \qquad (6)$$
where $(A, B)$ is a formal concept.

In subsection 2.2 we recalled that finding Boolean factors reduces to the task of finding covering formal concepts of the same input matrix. To this end we modified Close-by-One ([13]). This algorithm traverses the tree of the corresponding concept lattice in a depth-first manner and returns the set of all formal concepts, which is redundant for the Boolean decomposition task. The deeper the algorithm is in the tree, the larger the intents and the smaller the extents of the formal concepts. Thus, for every branch of the tree the proposed measure in eq. (6) grows up to some depth and then (in case the traversal continues) decreases. The proposed modifications are: 1) the traversal of a certain branch is carried out only while W grows together with the covered square (size of extent × size of intent); 2) at each iteration we do not accept concepts whose intents are contained in the union of the intents of previously generated concepts. In case the intent of a certain concept is covered by its children (fulfilling condition 1), this concept is not included into F.

For Close-by-One there is a linear order on G. Assume C ⊂ G is generated from A ⊂ G by the addition of g ∈ G (C = A ∪ {g}) such that g > max(A); then the set C″ is called canonically generated if min(C″ \ C) ≥ g.

Algorithm 1: Generation of balanced formal concepts
Data: a formal context (U, M, I)
Result: the set of balanced formal concepts F
foreach u ∈ U do
    A ← {u}; stack.push(A′); g ← u; g++;
    repeat
        if g ∉ U then
            if stack.Top ≠ ∅ then add (A″, A′) to F;
            stack.Top ← ∅;
            while stack.Top = ∅ do
                g ← max(A); A ← A \ {g}; stack.pop(); g++;
        else
            C ← A ∪ {g};
            if C″ is a canonical generation and W(C″, C′) ≥ W(A″, A′) and |C″ × C′| ≥ |A″ × A′| then
                stack.Top ← stack.Top \ C′; A ← C;
            g++;
    until A = ∅;
return F;

The obtained set F may still be redundant; that is why we further select factors with maximal coverage until we have covered the whole matrix or a required percentage of it.
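The following Python sketch shows eq. (6) and the final coverage-driven selection step just described: from a (possibly redundant) set of candidate concepts, factors are kept greedily by the number of still-uncovered ones of I they cover, until the requested coverage is reached. It is illustrative only; the function names and the way candidates are supplied are assumptions, and in the paper the candidates come from Algorithm 1 while the actual implementation is in C++.

import numpy as np

def balance(extent, intent):
    # Eq. (6): W(A, B) = 2|A||B| / (|A|^2 + |B|^2), a value in [0, 1]
    a, b = len(extent), len(intent)
    return 2 * a * b / (a * a + b * b) if (a or b) else 0.0

def select_factors(I, candidates, required_coverage=0.8):
    # Greedily keep the candidate concept covering the largest number of
    # not-yet-covered ones of I, until the requested fraction of ones is covered.
    I = I.astype(bool)
    ones = int(I.sum())
    covered = np.zeros_like(I, dtype=bool)
    chosen = []

    def gain(concept):
        extent, intent = concept
        cells = np.ix_(sorted(extent), sorted(intent))
        return int((I[cells] & ~covered[cells]).sum())

    while candidates and covered.sum() < required_coverage * ones:
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break
        chosen.append(best)
        extent, intent = best
        covered[np.ix_(sorted(extent), sorted(intent))] = True
        candidates = [c for c in candidates if c is not best]
    return chosen

Here balance would be applied while generating candidates (modification 1 above), whereas the selection itself is driven purely by coverage.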
The main aim of the factorisation is the reduction of computation steps and the revealing of latent similarity, since users' similarities are computed in a factor space. As a projection matrix of user profiles to the factor space one may use the "user-factor" matrix from the Boolean factorisation of the utility matrix (P in (2)). However, in this case most of the components of the obtained user profiles become zeros, and thus we lose similarity information. To smooth the loss effects we propose the following weighted projection:
$$\tilde P_{uf} = \frac{I_{u\cdot} \cdot Q_{f\cdot}}{\|Q_{f\cdot}\|_1} = \frac{\sum_{v\in V} I_{uv} \cdot Q_{fv}}{\sum_{v\in V} Q_{fv}},$$
where $\tilde P_{uf}$ indicates to which degree factor $f$ covers user $u$, $I_{u\cdot}$ is the binary vector describing the profile of user $u$, and $Q_{f\cdot}$ is the binary vector of items belonging to factor $f$ (the corresponding row of $Q$ in the decomposition of eq. (2)). The coordinates of the obtained projection vectors lie within [0; 1]. For Table 1 the corresponding weighted projection $\tilde P$ is a user-factor matrix with entries in [0; 1].

4 Experiments

The proposed approach and the compared ones have been implemented in C++ and evaluated on the MovieLens-100k data set. This data set features 100000 ratings on a five-star scale, 1682 movies, contextual information about the movies (19 genres), 943 users (each user has rated at least 20 movies), and demographic information for the users (gender, age, occupation, zip code (ignored)). The users have been divided into seven age groups: under 18, 18-25, 26-35, 36-45, 45-49, 50-55, 56+. Five-star ratings are converted to the binary scale by the following rule:
$$I_{ij} = \begin{cases} 1, & R_{ij} > 3,\\ 0, & \text{otherwise.}\end{cases}$$
The scaled dataset is split into two sets according to the bimodal cross-validation scheme [16]: a training set and a test set with a ratio of 80:20, and 20% of the ratings in the test set are hidden3.

Measure of user similarity. First of all, the influence of the similarity measure has been compared. As we can see in Fig. 1, the Hamming-distance-based similarity is significantly better in terms of MAE and Precision. However, it is worse in Recall and F-measure. Even so, given its superiority in terms of MAE (a measure widely adopted in the RS community), we decided to use the Hamming-distance-based similarity.

Projection into factor space. In this series of tests the influence of the projection method has been studied. The weighted projection keeps more information and as a result helps us to find similar users with higher accuracy. That is why this method has a significant primacy in terms of all investigated measures of quality.

3 This partition into test and training sets is done 5 times, resulting in 25 hidden submatrices, and differs from the one provided by the MovieLens group; hence the results might be different.
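A small sketch of the weighted projection compared in these tests: each coordinate of a user's projected profile is the fraction of a factor's items that the user has rated positively, which keeps graded information instead of the all-or-nothing Boolean projection. The code is illustrative and the names are assumptions; the experiments themselves were run with the C++ implementation.

import numpy as np

def weighted_projection(I, Q):
    # P~_{uf} = (I_{u.} . Q_{f.}) / ||Q_{f.}||_1 for every user u and factor f.
    # I is the binary user x item matrix, Q the binary factor x item matrix of
    # the Boolean factorisation; the result has entries in [0, 1].
    factor_sizes = Q.sum(axis=1).astype(float)
    factor_sizes[factor_sizes == 0] = 1.0        # guard against empty factors
    return (I @ Q.T) / factor_sizes

I = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 1, 1, 0]])
Q = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])
print(weighted_projection(I, Q))   # the third user is covered by the second factor to degree 0.5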
Fig. 1. Comparison of two similarity measures (BMF at 80% coverage): MAE, Precision, Recall and F-measure against the number of neighbours for the Hamming-based and the cosine-based similarity.

Fig. 2. Comparison of two types of projection into factor space: MAE, Precision, Recall and F-measure against the number of neighbours for the weighted and the Boolean projection.

FCA-based algorithm and number of factors. The main studied algorithm for finding Boolean factors as formal concepts is the modified Close-by-One algorithm. It was compared with the greedy algorithm from [5] in terms of the number of factors and the final RS quality measures.

Coverage                 50%   60%   70%   80%   90%
Modified Close-by-One    168   228   305   421   622
Greedy algorithm         222   297   397   533   737

CbO covers the input matrix with a smaller number of factors, but it requires more time (in our experiments, 180 times more on average with single-threaded calculations). At the same time we have to admit that there is no influence on RS quality: Recall, Precision and MAE mainly differ only in the third digit.

Incorporation of context information and comparison with SVD. For the SVD-based approach the additional (context) information has been attached in a similar way, but there we use the maximal rating (5 stars) in the attached columns and rows.

Coverage                          50%   60%   70%   80%   85%   90%
BMF                               168   228   305   421   508   622
BMF (no context information)      163   220   294   401   479   596
SVD                               162   218   287   373   430   496
SVD (no context information)      157   211   277   361   416   480

BMF and SVD give a similar number of factors, especially for small coverage; context information does not significantly change their number, but it gives an increase of precision (1-2% more accurate predictions in Table 2).

Table 2. Influence of contextual information (80% coverage)

Number of     Precision         Recall            F-measure         MAE
neighbours    clean    cntxt    clean    cntxt    clean    cntxt    clean    cntxt
1             0.3589   0.3609   0.2668   0.2647   0.3061   0.3054   0.2446   0.2434
5             0.6353   0.6442   0.1420   0.1412   0.2321   0.2317   0.2371   0.2359
10            0.6975   0.7045   0.1126   0.1114   0.1938   0.1924   0.2399   0.2388
15            0.7168   0.7258   0.0994   0.0979   0.1746   0.1726   0.2422   0.2411
20            0.7282   0.7373   0.0911   0.0903   0.1619   0.1610   0.2442   0.2429
25            0.7291   0.7427   0.0861   0.0853   0.1540   0.1531   0.2457   0.2445
30            0.7318   0.7426   0.0823   0.0818   0.1480   0.1474   0.2472   0.2459
40            0.7342   0.7508   0.0767   0.0759   0.1389   0.1379   0.2497   0.2484
50            0.7332   0.7487   0.0716   0.0712   0.1304   0.1301   0.2518   0.2504
60            0.7314   0.7478   0.0682   0.0678   0.1247   0.1243   0.2536   0.2522
70            0.7333   0.7477   0.0658   0.0654   0.1208   0.1202   0.2552   0.2538
80            0.7342   0.7449   0.0632   0.0624   0.1164   0.1151   0.2567   0.2553
100           0.7299   0.7461   0.0590   0.0583   0.1092   0.1081   0.2594   0.2580

With a similar number of factors (SVD at 85% coverage and BMF at 80%), Boolean factorisation results in a smaller MAE and a higher Precision when the number of neighbours is not high. This can be explained by the different nature of the factors in these factorisation models.

5 Conclusion

In this paper we considered two modifications of Boolean matrix factorisation which are suitable for Recommender Systems. They were compared on real datasets in the presence of auxiliary (context) information. We found out that the MAE of our BMF-based approach is considerably lower than the MAE of the SVD-based approach for almost the same number of factors at a fixed coverage level of BMF and SVD.
The Precision of the BMF-based approach is slightly lower when the number of neighbours is about a couple of dozen, and comparable over the remaining part of the observed range.

Fig. 3. Comparison of different matrix factorisation approaches: MAE, Precision, Recall and F-measure against the number of neighbours for BMF at 80% coverage with context, SVD at 85% coverage with context, and SVD at 85% coverage without context.

The Recall is lower, which results in a lower F-measure. The proposed weighted projection alleviates the information loss of the original Boolean projection, resulting in a substantial quality gain. We also revealed that the presence of contextual information results in a small quality increase (about 1-2%) in terms of MAE, Recall and Precision. We studied the influence of more balanced factors in terms of the ratio between the number of users and the number of items per factor. Finally, we should report that the greedy approximate algorithm [5], even though it results in more factors with a larger user component, is faster and demonstrates almost the same quality. So, its use is beneficial for recommender systems due to its polynomial-time computational complexity.

As a future research direction we would like to compare the proposed approach with the previously ([9,6,10,7]) and recently introduced FCA-based ones ([11,12,17]). As for Boolean matrix factorisation in the case of context-aware information, since the data can be naturally represented as multi-relational, we would like to continue our collaboration with the authors of the paper [18]. We definitely need to use user- and item-independent information like time and location, which can be considered purely contextual in nature and treated by n-ary methods [19].

Acknowledgments. We would like to thank Alexander Tuzhilin, Elena Nenova, Radim Belohlavek, Vilem Vychodil, Sergei Kuznetsov, Sergei Obiedkov, Vladimir Bobrikov, Mikhail Roizner, and the anonymous reviewers for their comments, remarks, and explicit and implicit help during the preparation of the paper. This work was supported by the Basic Research Program at the National Research University Higher School of Economics in 2014-2015 and performed in the Laboratory of Intelligent Systems and Structural Analysis. The first author was also supported by the Russian Foundation for Basic Research (grant #13-07-00504).

References
1. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8) (2009) 30–37
2. Trefethen, L.N., Bau, D.: Numerical Linear Algebra. 3rd edn. SIAM (1997)
3. Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Machine Learning 42(1-2) (2001) 177–196
4. Lin, C.J.: Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19(10) (October 2007) 2756–2779
5. Belohlavek, R., Vychodil, V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences 76(1) (2010) 3–20. Special Issue on Intelligent Data Analysis.
6. Symeonidis, P., Nanopoulos, A., Papadopoulos, A., Manolopoulos, Y.: Nearest-biclusters collaborative filtering based on constant and coherent values. Information Retrieval 11(1) (2008) 51–75
7. Ignatov, D.I., Kuznetsov, S.O., Poelmans, J.: Concept-Based Biclustering for Internet Advertisement.
In: Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on. (Dec 2012) 123–130 8. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg (1999) 9. du Boucher-Ryan, P., Bridge, D.: Collaborative recommending using formal con- cept analysis. Knowledge-Based Systems 19(5) (2006) 309 – 315 {AI} 2005 {SI}. 10. Ignatov, D.I., Kuznetsov, S.O.: Concept-based recommendations for internet ad- vertisement. In Belohlavek, R., Kuznetsov, S.O., eds.: Proc. of The Sixth Inter- national Conference Concept Lattices and Their Applications (CLA’08), Palacky University, Olomouc (2008) 157–166 11. Alqadah, F., Reddy, C., Hu, J., Alqadah, H.: Biclustering neighborhood-based collaborative filtering method for top-n recommender systems. Knowledge and Information Systems (2014) 1–17 12. Ignatov, D.I., Nenova, E., Konstantinova, N., Konstantinov, A.V.: Boolean Matrix Factorisation for Collaborative Filtering: An FCA-Based Approach. In: Artificial Intelligence: Methodology, Systems, and Applications - 16th Int. Conf., AIMSA 2014, Varna, Bulgaria, September 11-13, 2014. Proceedings. (2014) 47–58 13. Kuznetsov, S.O.: A fast algorithm for computing all intersections of objects in a finite semilattice. Automatic Documentation and Math. Ling. 27(5) (1993) 11–21 14. Birkhoff, G.: Lattice Theory. 11th edn. Harvard University, Cambridge, MA (2011) 15. Adomavicius, G., Tuzhilin, A.: Toward the next generation of recommender sys- tems: A survey of the state-of-the-art and possible extensions. IEEE Trans. on Knowl. and Data Eng. 17(6) (June 2005) 734–749 16. Ignatov, D.I., Poelmans, J., Dedene, G., Viaene, S.: A New Cross-Validation Tech- nique to Evaluate Quality of Recommender Systems. In Kundu, M., Mitra, S., Mazumdar, D., Pal, S., eds.: Perception and Machine Intelligence. Volume 7143 of LNCS. Springer (2012) 195–202 17. Ignatov, D.I., Kornilov, D.: Raps: A recommender algorithm based on pattern structures. In: Proceeding of FCA4AI 2015 workshop at IJCAI 2015. (2015) 18. Trnecka, M., Trneckova, M.: An algorithm for the multi-relational boolean fac- tor analysis based on essential elements. In: Proceedings of 11th International Conference on Concept Lattices and their Applications. (2014) 19. Ignatov, D., Gnatyshak, D., Kuznetsov, S., Mirkin, B.: Triadic formal concept analysis and triclustering: searching for optimal patterns. Machine Learning (2015) 1–32 Class Model Normalization Outperforming Formal Concept Analysis approaches with AOC-posets A. Miralles1,2 , G. Molla1 , M. Huchard2 , C. Nebut2 , L. Deruelle3 , and M. Derras3 (1) Tetis/IRSTEA, France andre.miralles@teledetection.fr, guilhem.molla@irstea.fr (2) LIRMM, CNRS & Université de Montpellier, France huchard,nebut@lirmm.fr (3) Berger Levrault, France laurent.deruelle@berger-levrault.com,mustapha.derras@berger-levrault.com Abstract. Designing or reengineering class models in the domain of programming or modeling involves capturing technical and domain con- cepts, finding the right abstractions and avoiding duplications. Making this last task in a systematic way corresponds to a kind of model nor- malization. Several approaches have been proposed, that all converge towards the use of Formal Concept Analysis (FCA). An extension of FCA to linked data, Relational Concept Analysis (RCA) helped to mine better reusable abstractions. But RCA relies on iteratively building con- cept lattices, which may cause a combinatorial explosion in the number of the built artifacts. 
In this paper, we investigate the use of an alterna- tive RCA process, relying on a specific sub-order of the concept lattice (AOC-poset) which preserves the most relevant part of the normal form. We measure, on case studies from Java models extracted from Java code and from UML models, the practical reduction that AOC-posets bring to the normal form of the class model. Keywords: Inheritance hierarchy, class model normalization, class model reengi- neering, Formal Concept Analysis, Relational Concept Analysis 1 Introduction In object-oriented software or information systems, the specialization-generaliza- tion hierarchy is a main dimension in class model organization, as the is-a rela- tion in the design of domain ontologies. Indeed, it captures a classification of the domain objects which is structuring for human comprehension and which makes the representation efficient. Designing or reengineering class models in the do- main of programming or in the domain of modeling still remains a tricky task. It includes the integration of technical and domain concepts sometimes with no clear semantics, and the definition of the adequate abstractions while avoid- ing the duplication of information. In many aspects, this task corresponds to a c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 111–122, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 112 André Miralles et al. kind of class model normalization, focusing on specialization and redundancy, by analogy to the database schema normalization. Normalization is important to assist forward engineering of a reliable and maintainable class model. It is also useful to address the erosion of the specialization-generalization hierarchy during software evolution. After the seminal paper of R. Godin and H. Mili at OOPSLA’93 [1], several approaches have been proposed to address this normalization, that all converged to the implicit or explicit use of Formal Concept Analysis (FCA [2]) techniques. In this context, FCA was used to mine descriptions that are common to groups of classes and to suggest re-factorizing and creating more reusable super-classes. Several approaches more specifically used a specific sub-order of the concept lattice which captures the most relevant new super-classes (the AOC-poset, for Attribute-Object Concept poset [3]). Then, Relational Concept Analysis (RCA [4]), an extension of FCA to linked data, was proposed to find deeper re- factorizations. However, RCA iterates on building concept lattices, that leads to a combinatorial explosion of the number of the built artifacts (including classes). In this paper, we investigate the use of an alternative version of RCA, rely- ing on AOC-posets. With AOC-posets, RCA might not converge, thus we have to carefully handle the iteration mechanism, but when it converges, we expect more efficiency. We measure, on case studies from UML models and from Java models rebuilt from industrial Java code, the reduction brought in practice by this approach to the normal form of the class model. We also show that, on realistic tuning, the reasonable number of the new artifacts allows the designer to analyze them and decide which of them should be kept in the model. In Section 2, the bases for FCA and RCA in the context of class models are outlined. Then we present current work in this domain, and the motivation for this study (Section 3). 
In Section 4, we give the setup of the experiment, as well as the results. We conclude in Section 5 with a few perspectives of this work. 2 Class Model Normalization FCA: A classical approach for applying FCA to class model normalization involves building a formal context K-class=(G, M, I), where the classes (in G) are associated with their characteristics (attributes, roles, operations, in M ) through I ⊆ G × M . There are many variations in this description of classes. For example, Fig. 1 (right-hand side) shows such a formal context for the class model presented in Fig. 2(a). A concept is a pair (Extent, Intent) where Extent = {g ∈ G|∀m ∈ Intent, (g, m) ∈ I} and Intent = {m ∈ M |∀g ∈ Extent, (g, m) ∈ I}. The concept extent represents the objects that own all the characteristics of the intent; the concept intent represents the characteristics shared by all objects of the extent. The specialization order between two formal concepts is given by: (Extent 1, Intent 1) < (Extent 2, Intent 2) ⇔ Extent 1 ⊂ Extent 2. It provides the concepts with a lattice structure. In the concept lattice, there is an ascending inheritance of objects and a de- scending inheritance of characteristics. The simplified intent of a formal concept Class Model Normalization 113 recommendedHeight windVaneDimension measureInterval measuringDate windDirection waterAmount windStrength plateDimension codeQuality tubeHeight cupNumber tubeLength accuracy vaneType rainfall wind K-class Device RainGauge × × CupAnemometer × × Anemometer × × × VaneAnemometer × × × Rainfall × × × PlateAnemometer × × × Wind × × × × PitotAnemometer × × Fig. 1. Anemometer Formal Context (left-hand side), Formal context K- class=(G, M, I) with G is the set of classes of Fig. 2(a) associated by I with the set M of their attribute and role names (right-hand side) rainfall wind RainGauge Rainfall Anemometer Wind * * tubeHeight measuringDate measuringInterval measuringDate codeQuality accuracy codeQuality waterAmount windStrength windDirection (a) Measure measuringDate codeQuality measure Device Measure rainfall * RainGauge Rainfall measuringDate * codeQuality tubeHeight waterAmount wind Anemometer Wind * RainGauge Anemometer Rainfall Wind measuringInterval windStrength tubeHeight measuringInterval waterAmount windStrength accuracy windDirection (b) (c) accuracy windDirection Fig. 2. Example of class model normalization with FCA and RCA [5]: (a) initial class model ; (b) (resp. (c)) class model refactored with FCA (resp. RCA). is its intent without the characteristics inherited from its super-concept intents. The simplified extent is defined in a similar way. In our example, among the for- mal concepts that can be extracted from K-class, Concept C = ({Rainfall, Wind}, {measuringDate, codeQuality}) highlights two classes that share the two attributes measuringDate and codeQuality. This concept C is interpreted as a new super-class of the classes of its extent, namely Rainfall and Wind. The new super-class, called here Measure, appears in Fig. 2(b). New super-classes are named by the class model designer. AOC-posets: In the framework of class model analysis with FCA, often AOC- posets, rather than concept lattices, are used. Formal context of Fig. 1 (left-hand side) is used to illustrate the difference between the concept lattice and the AOC- poset. The concept lattice like in Fig. 3(a) contains all the concepts from the formal context. 
Some concepts, like Concept Device 2, inherit all their charac- teristics from their super-concepts and their objects from their sub-concepts. In the particular case of object-oriented modeling, they would correspond to empty 114 André Miralles et al. description, with no correspondence with an initial class description and be rarely considered. In the AOC-poset like in Fig. 3(b), only concepts that introduce one characteristic or one object are kept, simplifying drastically the structure in case of large datasets. The number of concepts in the concept lattice can increase up to 2min(|G|,|M |) , while it is bounded by |G| + |M | in the AOC-poset. The Iceberg lattice, such as introduced in [6], is another well known sub-set of the concept lattice which is used in many applications. The iceberg lattice is induced by the sub-set of concepts which have an extent support greater than a given threshold. In our case, this would mean only keeping new super-classes that have a mini- mum number of sub-classes, which is not relevant in modeling and programming: a super-class may only have one sub-class. RCA: RCA helps to go further and get the class model of Fig. 2(c). In this ad- ditional normalization step, RainGauge and Anemometer have a new super-class which has been discovered because both have a role towards a sort of Measure (resp. Rainfall and Wind). To this end, the class model is encoded in a Rela- tional Context Family (RCF) as the one in Table 1, composed of several formal contexts that separately describe classes (K-class), attributes (K-attribute), and roles (K-role) and of several relations including relation between classes and attributes (r-hasAttribute), relation between classes and roles (r-hasRole), or relation between roles and their type r-hasTypeEnd. Here again, this encoding can vary and integrate other modeling artifacts, like operations or associations. Concept_Device_0 Concept_Device_1 Concept_Device_3 recommendedHeight windVaneDimension Concept_Device_5 Concept_Device_2 Concept_Device_8 cupNumber tubeLength CupAnemometer PitotAnemometer Concept_Device_6 Concept_Device_7 vaneType plateDimension VaneAnemometer PlateAnemometer Concept_Device_4 (a) Concept_Device_1 Concept_Device_3 recommendedHeight windVaneDimension Concept_Device_5 Concept_Device_6 Concept_Device_7 Concept_Device_8 cupNumber vaneType plateDimension tubeLength (b) CupAnemometer VaneAnemometer PlateAnemometer PitotAnemometer Fig. 3. Concept lattice (a) and AOC-poset (b) for anemometers (Fig. 1, left) [5] Class Model Normalization 115 RCA is an iterative process where concepts emerge at each step. Relations and concepts discovered at one step are integrated into contexts through re- lational attributes for computing concept lattices at the next step. At step 0, attributes with the same name (resp. the two attributes measuringDate or the two attributes codeQuality) are grouped. At step 1, classes that share attributes from an attribute group are grouped into a concept that produces a new super- class (e.g. Wind and Rainfall are grouped to produce the super-class Measure). At step 2, roles rainfall and wind share the fact that they end at a sub-class of Measure, thus they are grouped into new role shown under the name measure in Fig. 2(c). At step 3, the current context of classes (extended with relational at- tributes) shows that both classes RainGauge and Anemometer have a role ending to Measure. Then a new super-class, called Device by the designer, is extracted. Table 1. 
Context family for the set of classes of Fig. 2(a) measureInterval measuringDate windDirection waterAmount windStrength codeQuality tubeHeight accuracy rainfall Kclass wind Kattribute RainGauge RG::tubeHeight × Anemometer Krole A::measureInterval × Rainfall rainfall × A::accuracy × Wind wind × R::measuringDate × W::measuringDate × R::codeQuality × W::codeQuality × R::waterAmount × W::windStrength × W::windDirection × A::measureInterval W::measuringDate R::measuringDate W::windDirection W::windStrength R::waterAmount RG::tubeHeight W::codeQuality rhasAttribute R::codeQuality rhasRole Anemometer RainGauge A::accuracy rainfall wind Rainfall Wind RainGauge x rhasT ypeEnd Anemometer x RainGauge x Rainfall rainfall x Anemometer x x Wind wind x Rainfall x x x Wind x x x x 3 Previous work on RCA and Class Model Normalization RCA has been first assessed on small [7] or medium [8] class models, encoding technical information (multiplicity, visibility, being abstract, initial value) in the RCF, which was the source of many non-relevant concepts. In [9], the authors assessed RCA on Ecore models, Java programs and UML models. The encoding was similar to the one presented in Section 2 to illustrate RCA (classes, attributes, roles, described by their names and their relationships). 116 André Miralles et al. While for Java models, the number of discovered class concepts (about 13%) was very reasonable, for UML class models, the increase (about 600%) made the post-analysis impossible to achieve. Recently, we systematically studied various encodings of the design class model of an information system about Pesticides [10, 11]. We noticed a strong impact of association encoding, and that encoding only named and navigable ends of associations was feasible, while encoding all ends (including those without a name and those that are not navigable) led to an explosion of the number of concepts. Restricting to named ends and to navigable ends means that we give strong importance to the semantics used by designer, thus the lost concepts have a greater chance to be uninteresting. Guided by the intuition of the model designer, we recently proposed a con- trolled approach with progressive concept extraction [5]. In this approach, the designer chooses at each step of the RCA process the formal contexts and the relations he wants to explore. For example, he may begin with classes and at- tributes, then add roles, then associations, then remove all information about classes and consider only classes and operations, etc. Such a choice is memorized in a factorization path. In [5], we used AOC-posets, but we did not evaluate the difference between AOC-posets and concept lattices in the controlled process. The objective was to evaluate the number of discovered concepts at each step and to observe trends in their progress. It was worth noting that the curves of the same factorization path applied to different models had the same shape. In this paper, we will use the 15 versions of the analysis class model of the same information system on Pesticides, as well as a dataset coming from in- dustrial Java models. Contrarily to the experiments made by authors in [9–11], our objective is to evaluate the benefits of using the variant of RCA, which builds AOC-posets (rather than concept lattices) during the construction pro- cess. Studying the concepts discovered at each step, we can also evaluate what was called the automatic factorization path in the controlled approach of [5]. 
It is clear that AOC-posets are smaller than concept lattices, thus the results we ex- pect focus on assessing the amount of the reduction in the number of discovered concepts that will be brought to the designer for analysis. 4 Case study Experimental setup: Figure 4 presents the metamodel used in practice to define the RCF for our experiments. Object-Attribute contexts are associated to classes, attributes, operations, roles and associations. Attributes, operations, roles and associations are described by their names in UML and Java models. Classes are described by their names in UML models. Object-object contexts correspond to the meta-relations hasAttribute, has- Operation, hasRole (between Class and Role and between Association and Role), hasTypeEnd, and isRoleOf. This last meta-relation is not used in the Java model RCF, because from Java code we can only extract unidirectional associations (corresponding to Java attributes whose type is a class). Class Model Normalization 117 All the roles extracted from Java code have a name. In UML models, we only consider the named roles, and the navigable ends of associations to focus on the most meaningful elements. We do not consider multiplicities in role description, nor the initializer methods (constructors). Fig. 4. Metamodel used to define the RCF. Each of the 15 UML models corresponds to a version of an information system on Pesticides. These models, described in [10], were collected during the Pesticides information system development and then the evolution of the project throughout 6 years. The RCF for UML models contains all the UML model el- ements. We also used 15 Java models coming from Java programs developed by the company Berger-Levrault in the domain of human resource management. These 15 Java models come from 3 Java programs: Agents, Chapter and Ap- praisal Campaign. For each program, we determined a central meaningful class and navigated from this central class through association roles at several dis- tances: 1, 2, 4, 8 and 16. For the 5 Java models, we could not get results due to the size of the models. The Java programs are the Java counterpart of the database accesses, thus we focused on Java attributes, that are encoded as at- tributes when their type is primitive (integer, real, boolean, etc), and as roles of a unidirectional associations when their type is a class. The operations were access and update operations associated to these attributes, thus they do not bring any meaningful information. For the sake of space, we only present results on 4 representative Java models, from the Chapter program (distances 1, 2, 4, 8). Depending on the version, 254 to 552 model elements are involved in the Pesticides models and 34 to 171 of these model elements are classes. The Chap- ter models are composed of 204 to 3979 model elements of which 37 to 282 are classes. The experiments have been made with the help of UML profiles of Objecteer- ing1 to extract the RCF from the UML models and Java modules to extract the RCF from the Java models. RCAExplore2 is used to compute the lattices and 1 www.objecteering.com/ 2 dolques.free.fr/rcaexplore/ 118 André Miralles et al. the AOC-posets, and Talend3 to extract information from RCAExplore outputs, to integrate data, to build the curves and analyze the results. 
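Before turning to the results, the following brute-force Python sketch illustrates the structural difference being measured: it enumerates all formal concepts of a tiny class/characteristic context and keeps only those that introduce at least one object or one attribute, i.e. the attribute-concepts and object-concepts that form the AOC-poset. RCAExplore computes these structures with dedicated algorithms; this version, its names and its toy context are purely illustrative.

from itertools import combinations

def close(objs, context, attributes):
    # Galois closure: the attributes shared by objs, then all objects having all of them
    intent = set(attributes) if not objs else {a for a in attributes
                                               if all(a in context[o] for o in objs)}
    extent = frozenset(o for o in context if intent <= context[o])
    return extent, frozenset(intent)

def all_concepts(context, attributes):
    found = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            found.add(close(set(subset), context, attributes))
    return found

def aoc_poset(context, attributes):
    # keep only the object-concepts and the attribute-concepts
    keep = set()
    for o in context:
        keep.add(close({o}, context, attributes))
    for a in attributes:
        keep.add(close({o for o in context if a in context[o]}, context, attributes))
    return keep

ctx = {"RainGauge":  {"tubeHeight", "rainfall"},
       "Anemometer": {"accuracy", "wind"},
       "Rainfall":   {"measuringDate", "waterAmount"},
       "Wind":       {"measuringDate", "windStrength"}}
attrs = set().union(*ctx.values())
print(len(all_concepts(ctx, attrs)), "concepts,", len(aoc_poset(ctx, attrs)), "of them in the AOC-poset")

Even on this toy context the AOC-poset discards the concepts that introduce nothing new (such as the top and bottom of the lattice), which is the reduction quantified below on the Pesticides and Chapter models.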
Results: We computed various metrics on the built lattices and AOC-posets, such as the number of several categories of concepts: merged concepts (whose simplified extents have a cardinality strictly greater than 1), perennial concepts (whose simplified extents have a cardinality equal to 1) and new concepts (whose simplified extents are empty). A merged concept corresponds to a set of model elements that have the same description; for example, a merged attribute concept groups attributes with the same name. A perennial concept corresponds to a model element which has at least one characteristic that makes it distinct from the others. A new concept corresponds to a group of model elements that share part of their description while no model element has exactly this description (each always has some additional information). We focus in this paper on the classes and on the number of new class concepts, because they are the predominant elements that reveal potential new abstractions and a possible lack of factorization in the current model. To highlight the observed increase brought by the conceptual structures (lattice and AOC-poset), we present the ratio of the number of new class concepts to the number of initial classes in the model (Fig. 5, 6, 7, 8). To compare lattices and AOC-posets, we compute the ratio #New class concepts in lattice / #New class concepts in AOC-poset (Fig. 9, 10). For the Chapter models, lattices are computed for steps 0 to 6 and, for all other cases, lattices or AOC-posets are determined up to step 24.

Fig. 5. New class concept number in lattices for Pesticides models

In the lattices for the Pesticides models (Fig. 5), the process always stops between steps 6 and 14 depending on the model. At step 6, the ratio of new class concepts to the number of classes varies between 165% (V10) and 253% (V05). Results are hard to analyze for a human designer without any further filtering. In the lattices for the Chapter models (Fig. 7), we stopped the process at step 6, because we observed a high complexity: for example, for distance 8, at step 6, we get 43656 new class concepts. It is humanly impossible to analyze the obtained new class concepts. At distance 16, the Chapter model could not even be processed. For the model at distance 1 (resp. 2), at step 6, the ratio is 11% (resp. 62%), remaining reasonable (resp. beginning to be hard to analyze). We also observe stepping curves, which are explained by the metamodel: at step 1, new class concepts are introduced due to the attribute and role concept creation of step 0, then they remain stable until step 2, while role and association concepts increase, and so on.

3 www.talend.com/

Fig. 7. New class concept number in lattices for Chapter models

Fig. 6. New class concept number in AOC-posets for Pesticides models

Curves for the AOC-posets of the Pesticides models are shown in Fig. 6. The ordering of the Pesticides AOC-poset curves roughly corresponds to the need for factorization: the highest curves correspond to V00 and V01, where few inheritance relations (none in V00) have been used. This shows the factorization work done by the designer during model versioning. The ratio at step 6 varies between 56% (V10) and 132% (V00). In the Pesticides lattices we observed many curve crossings and curves with different shapes, while with the Pesticides AOC-posets the curves have a regular shape and globally decrease. For the Chapter models (Fig. 8), convergence is obtained for all AOC-posets (distances 1 to 8) and the process stops between steps 5 and 23.
The curves are also ordered and, as for lattices, the highest curve corresponds to the highest distance. The curves reveal many opportunities for factorization. The ratio at step 6 varies between 5% (distance 1) and 161% (distance 8). Fig. 8. New class concept number in AOC-posets for Chapter models Fig. 9 and 10 allow lattices and AOC-posets to be compared. We always have more concepts in lattices than in AOC-posets, which was expected. In the AOC- poset of the Chapter model at distance 1, there is only one new class concept, while in the lattice they are three (including the top and bottom concepts), explaining the beginning of the curve (Fig. 10). For version V00, the behavior of the lattice-based process and the AOC-based process are similar. Except for version V04 (at version V05 a duplication of a part of the model has been made for working purpose), the highest curves correspond to the last models, which have been better factorized, and we here notice the highest difference between the two processes. We may hypothesize that lattices may contain many uninteresting factorizations compared to AOC-posets in these cases. This experiment shows that, in practice, the AOC-posets generally produce results that can be analyzed, while lattices are often difficult to compute in some cases and are often too huge to be used for our purpose. Class Model Normalization 121 Fig. 9. Ratio #N#N ew class concepts in lattice ew class concepts in AOC−poset for Pesticides models 5 Conclusion For class model normalization, concept lattices and AOC-posets are two struc- tures giving two different normal forms. UML models that are rebuilt from these structures are interesting in both cases from a thematic point of view. Never- theless, lattices are often too huge, and AOC-posets offer a good technique to reduce the complexity. As future work, we plan to go more deeply into an exploratory approach, defining different factorization paths, with model rebuilding at each step with expert validation. This would allow the complexity to be controlled introduc- ing less new concepts at each step. We also plan to use domain ontologies to guide acceptance of a new formal concept, because it corresponds to a thematic concept. Acknowledgment This work has been supported by Berger Levrault. The authors also warmly thank X. Dolques for the RCAExplore tool which has been used for experiments. References 1. Godin, R., Mili, H.: Building and Maintaining Analysis-Level Class Hierarchies Using Galois Lattices. In: Proceedings of the Eight Annual Conference on Object- Oriented Programming Systems, Languages, and Applications (OOPSLA 93), ACM (1993) 394–410 2. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundation. Springer-Verlag Berlin (1999) 122 André Miralles et al. Fig. 10. Ratio #N#N ew class concepts in lattice ew class concepts in AOC−poset for Chapter models 3. Dolques, X., Le Ber, F., Huchard, M.: AOC-Posets: a Scalable Alternative to Concept Lattices for Relational Concept Analysis. In: Proceedings of the Tenth International Conference on Concept Lattices and Their Applications (CLA 2013). Volume 1062 of CEUR Workshop Proceedings., CEUR-WS.org (2013) 129–140 4. Rouane-Hacène, M., Huchard, M., Napoli, A., Valtchev, P.: Relational concept analysis: mining concept lattices from multi-relational data. Ann. Math. Artif. Intell. 67(1) (2013) 81–108 5. 
Miralles, A., Huchard, M., Dolques, X., Le Ber, F., Libourel, T., Nebut, C., Guédi, A.O.: Méthode de factorisation progressive pour accroı̂tre l’abstraction d’un modèle de classes. Ingénierie des Systèmes d’Information 20(2) (2015) 9–39 6. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing Iceberg Concept Lattices with TITANIC. Data Knowl. Eng. 42(2) (2002) 189–222 7. Rouane-Hacène, M.: Relational concept analysis, application to software re- engineering. Thèse de doctorat, Université du Québec à Montréal (2005) 8. Roume, C.: Analyse et restructuration de hiérarchies de classes. Thèse de doctorat, Université Montpellier 2 (2004) 9. Falleri, J.R., Huchard, M., Nebut, C.: A generic approach for class model nor- malization. In: Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE 2008). (2008) 431–434 10. Osman Guédi, A., Miralles, A., Huchard, M., Nebut, C.: A practical application of relational concept analysis to class model factorization: Lessons learned from a thematic information system. In: Proceedings of the Tenth International Con- ference on Concept Lattices and Their Applications (CLA 2013). Volume 1062 of CEUR Workshop Proceedings., CEUR-WS.org (2013) 9–20 11. Osman Guédi, A., Huchard, M., Miralles, A., Nebut, C.: Sizing the underlying factorization structure of a class model. In: Proceedings of the 17th IEEE In- ternational Enterprise Distributed Object Computing Conference, (EDOC 2013). (2013) 167–172 Partial enumeration of minimal transversals of a hypergraph Lhouari Nourine, Alain Quilliot and Hélène Toussaint Clermont-Université, Université Blaise Pascal, LIMOS, CNRS, France {nourine, quilliot, helene.toussaint}@isima.fr Abstract. In this paper, we propose the first approach to deal with enumeration problems with huge number of solutions, when interesting- ness measures are not known. The idea developed in the following is to partially enumerate the solutions, i.e. to enumerate only a representative sample of the set of all solutions. Clearly many works are done in data sampling, where a data set is given and the objective is to compute a representative sample. But, to our knowledge, we are the first to deal with sampling when data is given implicitly, i.e. data is obtained using an algorithm. The experiments show that the proposed approach gives good results according to several criteria (size, frequency, lexicographical order). 1 Introduction Most of problems in data mining ask for the enumeration of all solutions that satisfy some given property [1, 10]. This is a natural process in many applications, e.g. marked basket analysis [1] and biology [2] where experts have to choose between those solutions. An enumeration problem asks to design an output- polynomial algorithm for listing without duplications the set of all solutions. An output-polynomial algorithm is an algorithm whose running time is bounded by a polynomial depending on the sum of the sizes of the input and output. There are several approachs to enumerate all solutions to a given enumeration problem. Johnson et al. [13] have given a polynomial-time algorithm to enumer- ate all maximal cliques or stables of a given graph. Fredman and Khachiyan [7] have proposed a quasi-polynomial-time algorithm to enumerate all minimal transversal of an hypergraph. 
For enumeration problems the size of the output may be exponential in the size of the input, which in general differs from optimization problems, where the size of the output is polynomially related to the size of the input. The drawback of enumeration algorithms is that the number of solutions may be exponential in the size of the input, which is infeasible in practice. In data mining, some interestingness measures or constraints are used to bound the size of the output; e.g., these measures can be explicitly specified by the user [8]. In operations research, quality criteria are used in order to reach appropriate decisions [21].

In this paper, we deal with enumeration problems with a huge number of solutions, when interestingness measures are not known. This case happens when the expert has no idea about the data and the knowledge being looked for. The objective is to enumerate only a representative sample of the set of all solutions. Many works exist on data sampling, where a data set is given and the objective is to compute a representative sample. To our knowledge, this idea is new for sampling when data is given implicitly, i.e. data is obtained using an algorithm. One could use the naive approach which first enumerates all the solutions and then applies sampling methods, but this is not possible for a huge number of solutions. To evaluate our approach, we consider a challenging enumeration problem, which is related to mining maximal frequent item sets [1, 10], dualization of monotone Boolean functions [5] and other problems [10]. We applied our approach to several instances of transversal hypergraphs [17, 20] and obtained good results.

2 Related works

Golovach et al. [9] have proposed an algorithm to enumerate all minimal dominating sets of a graph. First they generate maximal independent sets and then apply a flipping operation to them to generate new minimal dominating sets, where the enumeration of maximal independent sets is polynomial. Clearly, a relaxation of the flipping operation leads to a partial enumeration, since the number of minimal dominating sets can be exponential in the number of maximal independent sets, e.g. for cobipartite graphs. Jelassi et al. [12] and Raynaud et al. [19] have considered some kinds of redundancy in hypergraphs, such as twin elements, to obtain a concise representation. Their ideas can avoid the enumeration of similar minimal transversals of a hypergraph.

3 Transversal hypergraph enumeration

A hypergraph H = (V, E) consists of a finite collection E of sets over a finite set V. The elements of E are called hyperedges, or simply edges. A hypergraph is said to be simple if for any two distinct E, E′ ∈ E we have E ⊈ E′. A transversal (or hitting set) of H is a set T ⊆ V that intersects every edge of E. A vertex x in a transversal T is said to be redundant if T \ {x} is still a transversal. A transversal is minimal if it does not contain any redundant vertex. The set T of all minimal transversals of H = (V, E) constitutes, together with V, a hypergraph Tr(H) = (V, T), which is called the transversal hypergraph of H. We denote by k = Σ_{E∈E} |E| the sum of the sizes of the hyperedges.

Example 1.
Consider the hypergraph H = (V, E): V = {1, 2, 3, 4, 5} and E = {E1 , E2 , E3 } with E1 = {1, 3, 4}, E2 = {1, 3, 5} and E3 = {1, 2}. The set of all minimal transversals is T = {{1}, {2, 3}, {2, 4, 5}} and k = 3 + 3 + 2 = 8 Given a simple hypergraph H = (V, E), the transversal hypergraph enumera- tion problem concerns the enumeration without repetitions of T r(H). This prob- lem has been intensively studied due to its link with several problems isuch as Partial enumeration of minimal transversals of a hypergraph 125 data mining and learning [3, 4, 11, 15, 18]. Recently, Kante et al.[14] have shown that the enumeration of all minimal transversals of an hypergraph is polynomi- ally equivalent to the enumeration of all minimal domination sets of a graph. It is known that the corresponding decision problem belongs to coNP, but still open whether there exists an output-polynomial-time algorithm. 4 Partial transversal hypergraph enumeration We introduce the partial (or incomplete) search algorithm for enumerating min- imal transversals of an hypergraph H. The search space is the set of all transver- sals which is very large. The strategy is divided into two steps: – The initialization procedure considers a transversal T of H and then ap- plies a reduction (at random) algorithm to T in order to obtain a minimal transversal Tm of H. This step is detailed in section 4.1. – The local search algorithm considers a minimal transversal Tm and then applies local changes to Tm in which we add and delete vertices according to some ordering of the vertices. This step is detailed in section 4.2 These steps are repeated for at most k transversals depending on the input hypergraph H. Figure 1 illustrates the proposed approach. Fig. 1. Approach to partial enumeration of minimal transversals 4.1 Initialization Let H(V, E) be the input hypergraph, E ∈ E and x ∈ E. The initialization step starts with the transversal (V \ E) ∪ {x} and then applies a reduction 126 Lhouari Nourine, Alain Quilliot and Hélène Toussaint algorithm to obtain a minimal transversal. The following property shows that the set (V \ E) ∪ {x} is a transversal and any minimal transversal contained in (V \ E) ∪ {x} contains the vertex x. Property 1. Let H = (V, E) be a simple hypergraph, E ∈ E and x ∈ E. Then (V \ E) ∪ {x} a transversal of H. Moreover, any minimal transversal T ⊆ (V \ E) ∪ {x} will contain x. Proof. Let H = (V, E) be a simple hypergraph and E 0 ∈ E with E 6= E 0 . Since H is simple then there exists at least one element y ∈ E 0 such that y 6∈ E. So y ∈ (V \ E) and thus E 0 ∩ (V \ E) 6= ∅. We conclude that (V \ E) ∪ {x} is a transversal since x ∈ E. Now let T ⊆ (V \E)∪{x} be a minimal transversal. Then E ∩(V \E)∪{x} = {x} and thus x must belong to T otherwise T does not intersect E. According to property 1, we can apply the initialization procedure to any pair (x, E) where E ∈ E and x ∈ E. In other words, the initialization is applied to at most k transversals of H as shown in Algorithm 1. Algorithm 1: Initialization Input : A hypergraph H(V, E) and σ an ordering of V Output: A sample of minimal transversals begin ST RAN S = ∅; for E ∈ E do for x ∈ E do T = (V \ E) ∪ {x};{Initial transversal} Tm = Reduce(T, σ); ST RAN S = ST RAN S ∪ {Tm }; return(ST RAN S); Now we describe the reduction process, which takes a transversal T and a random ordering σ of V and returns a minimal transversal Tm . Indeed, we delete vertices from T according to the ordering σ until we obtain a minimal transversal. 
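As a concrete illustration (a minimal Python sketch added here, not part of the original paper; the helper names are illustrative), the initialization and reduction steps just described can be written in a few lines. On the hypergraph of Example 1 with the ordering σ = (1, 2, 3, 4, 5) it returns exactly the three minimal transversals listed above, matching Algorithm 1 above and the Reduce procedure formalised as Algorithm 2 below.

```python
def is_transversal(T, edges):
    # T is a transversal iff it intersects every hyperedge
    return all(T & E for E in edges)

def reduce_transversal(T, order, edges):
    # Remove vertices in the given order while the set stays a transversal;
    # a single pass is enough to reach a minimal transversal.
    T = set(T)
    for v in order:
        if v in T and is_transversal(T - {v}, edges):
            T.remove(v)
    return T

def initialization(V, edges, order):
    # For every pair (E, x) with x in E, start from (V \ E) | {x}
    # (a transversal by Property 1) and reduce it.
    sample = set()
    for E in edges:
        for x in E:
            T0 = (V - E) | {x}
            sample.add(frozenset(reduce_transversal(T0, order, edges)))
    return sample

# Hypergraph of Example 1
V = {1, 2, 3, 4, 5}
edges = [{1, 3, 4}, {1, 3, 5}, {1, 2}]
print(initialization(V, edges, order=[1, 2, 3, 4, 5]))
# {frozenset({1}), frozenset({2, 3}), frozenset({2, 4, 5})}
```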
Algorithm 2: Reduce(T, σ) Input : A transversal T and an ordering σ = σ1 ...σ|V | of the vertices of H. Output: A minimal transversal for i = 1 to |V | do if T \ {σi } is a transversal then T ← T \ {σi } ; Return(T ); Partial enumeration of minimal transversals of a hypergraph 127 Example 2 (continued). Suppose we are given the hypergraph in example 1 and σ = (1, 2, 3, 4, 5) for the input to Algorithm 1. First, it takes the hyperedge E = {1, 3, 4} and for x = 1 we obtain the minimal transversal {1}, for x = 3 we obtain {2, 3} and for x = 4 we obtain {2, 4, 5}. Then the algorithm con- tinue with the hyper edges {1, 3, 5} and {1, 2}. Finally the algorithm returns ST RAN S = {{1}, {2, 3}, {2, 4, 5}}, i.e. the other iterations do not add new minimal transversals. Theorem 1. Algorithm 1 computes at most k minimal transversals of an input hypergraph H = (V, E). Proof. The initialization procedure considers at most k minimal transversals of H. Since a minimal transversal can be obtained several times, the result follows. The following proposition shows that any minimal transversal of the hyper- graph H = (V, E) can be obtained using the initialization procedure. Indeed, the choice of the ordering σ is important in the proposed strategy. Proposition 1. Let H = (V, E) be an hypergraph and T be a minimal transver- sal of H. Then, there exists a total order σ, E ∈ E and x ∈ E such that T = Reduce((V \ E) ∪ {x}, σ). Proof. Let T be a minimal transversal of H = (V, E). Then there exists at least one hyperedge E ∈ E such that T ∩ E = {x}, x ∈ V , otherwise T is not minimal. Thus T ⊆ (V \ E) ∪ {x}. Now, if we take the elements that are not in T before the elements in T in σ, the algorithm Reduce((V \ E) ∪ {x}, σ) returns T . The initialization procedure guaranties that for any vertex x ∈ V at least one minimal transversal containing x is listed. The experiments in section 5, shows the sample of minimal transversals obtained by the initialization procedure is a representative sample of the set of all minimal transversals. 4.2 Local search algorithms The local search algorithm takes each minimal transversal found in the initial- ization step and searches for new minimal transversals to improve the initial solution. The search of neighbors is based on vertices orderings. Let H = (V, E) be an hypergraph and x ∈ V . We define the frequency of x as the number of minimal transversals of H that contain x. The algorithm takes a minimal transversal T and a bound N max which bounds the number of iterations and the number of neighboors generated by T . Each iteration of the while loop, starts with a minimal transversal T and computes two orderings as follows: – σ c is an ordering according to the increasing order of frequency of vertices in V \ T in minimal transversals already obtained by the current call. This ordering has a better coverage of the solution set, i.e. by keeping the rarest vertices in the transversals. 128 Lhouari Nourine, Alain Quilliot and Hélène Toussaint – σ is a random ordering of the vertices in T . Algorithm 3: Neighboor(T, N max) Input : A minimal transversal T of H = (V, E) and an integer N max Output: A set of minimal transversals Q = T; i = 1; while i ≤ N max do σ c ← the set V \ T sorted in increasing order of frequency of vertices in minimal transversals in Q; σ ← sort T at random; Add elements the elements in σ c to T until a vertex x ∈ T \ σ c becomes redundant in T ; T =Reduce(T, σ); Q = Q ∪ {T }; i = i + 1; return(Q); Now we give the global procedure of the proposed approach. 
Algorithm 4: Global procedure for partial enumeration of minimal transversals Input : An hypergraph H = (V, E) and an integer N max Output: A sample of minimal transversals of H σ =choose a random ordering of V ; ST RAN = Q = Initialization(H, σ); while Q 6= ∅ do T = choose a minimal transversal T in Q; ST RAN S = ST RAN S ∪ N eighboor(T, N max); Return(ST RAN S); In the following, we describe experiments to evaluate the results that have been obtained. 5 Experimentation The purpose of the experiments is to see if the proposed approach allow us to generate a representative set of solutions. For this reason, we have conducted the experiments on two different classes of hypergraphs (see [20]) for which the number of minimal transversals is huge compared to the size of the input. We use Uno’s Algorithm SHD (Sparsity-based Hypergraph Dualization, ver. 3.1) [20], to enumerate all minimal transversals. The experiments are done using linux CentOS cpu Intel Xeon 3.6 GHz and C++ language. In the following, we denote Tpartial the set of minimal transversals generated by Algorithm 4, and Texact the set of all minimal transversals. First, we analyze Partial enumeration of minimal transversals of a hypergraph 129 T the percentage Tpartial exact and then we evaluate the representativeness of the sample Tpartial . 5.1 The size of Tpartial We will distinguish between minimal transversals that are obtained using Algo- rithm 1 (or the initialization procedure) and those that are generated using the local search. For these tests we set the maximal number of neighboors N max to 3. Tables 1 and 2 show the results for the two classes of hypergraph instances, namely ”lose” and random ”p8”. The first three columns have the following meaning: – instance: instance name. – instance size: the size of the instance (number of edges × number of vertices). – total # of transv.: the exact number of minimal transversals | Texact |. The second (resp. last) three columns give the results for the initialization procedure (resp. Global algorithm): – # transv. found : the number of minimal transversals found. – % transv. found : the percentage of minimal transversals found. – cpu (s): the run time in seconds Table 1. Results for all ”lose” instances According to these tests, we can see that the percentage of minimal transver- sals found using the initialization procedure is very low, but it decreases as far as the size of Texact increases. Clearly, this percentage is strongly related to the input. Indeed, the number k (entropy) of the hypergraph increases according to the size of the input hypergraph. We can also see that the local search increases significantly the number of solutions found by a factor 2 to 2.5 approximatively. 130 Lhouari Nourine, Alain Quilliot and Hélène Toussaint Table 2. Results for all ”p8 ” instances But it remains coherent with the chosen value N max = 3. It argues that the lo- cal search finds other transversals that are not found either by the initialization procedure nor previous local search. Notice that the parameter N max can be increased whenever the size of Tpartial is not sufficient. 5.2 The representativeness of Tpartial To evaluate the representativeness of Tpartial , we consider three criteria: – Size of the minimal transversals in Tpartial . – Frequency of vertices in Tpartial . – Lexicographical rank of the minimal transversals in Tpartial . Each criteria is illustrated using a bar graph for two instances from different classes. The bar graphs in figures 2 and 3 are surprising. 
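Before turning to the representativeness analysis, readers wishing to reproduce such counts may find a compact Python sketch of the local search and the global loop useful. It corresponds to one possible reading of Algorithms 3 and 4 above; the tie-breaking and reduction-order details are assumptions rather than the authors' implementation, and the small helpers from the earlier sketch are repeated so that the block runs on its own.

```python
import random
from collections import Counter

def is_transversal(T, edges):
    return all(T & E for E in edges)

def reduce_transversal(T, order, edges):
    T = set(T)
    for v in order:
        if v in T and is_transversal(T - {v}, edges):
            T.remove(v)
    return T

def initialization(V, edges, order):
    return {frozenset(reduce_transversal((V - E) | {x}, order, edges))
            for E in edges for x in E}

def neighbour(T, Nmax, V, edges):
    # Local search around a minimal transversal T (cf. Algorithm 3).
    Q = {frozenset(T)}
    current = set(T)
    for _ in range(Nmax):
        freq = Counter(v for t in Q for v in t)
        base = set(current)
        # sigma_c: outside vertices, rarest first (lowest frequency in Q)
        for v in sorted(V - base, key=lambda v: freq[v]):
            current.add(v)
            # stop once a vertex of the original transversal became redundant
            if any(is_transversal(current - {u}, edges) for u in base):
                break
        # reduce, trying to drop the original vertices first (random order)
        sigma = random.sample(sorted(base), len(base)) + sorted(current - base)
        current = reduce_transversal(current, sigma, edges)
        Q.add(frozenset(current))
    return Q

def partial_enumeration(V, edges, Nmax=3, seed=0):
    # Global procedure (cf. Algorithm 4), simplified to a single pass
    # over the transversals produced by the initialization.
    random.seed(seed)
    order = sorted(V)
    random.shuffle(order)
    sample = set(initialization(V, edges, order))
    for T in list(sample):
        sample |= neighbour(set(T), Nmax, V, edges)
    return sample

V = {1, 2, 3, 4, 5}
edges = [{1, 3, 4}, {1, 3, 5}, {1, 2}]
print(sorted(map(sorted, partial_enumeration(V, edges))))
```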
Indeed the bar graphs vary nearly in the same manner with respect to the initialization and the local search algorithm for all the considered criteria. Figures 2(a) 3(a) show that the percentage of minimal transversals of each size (found either by the initialization procedure and local search) fits the per- centage of all minimal transversals that are found. Figures 2(b) and 3(b) show that the same analysis holds when ordering min- imal transversals lexicographically (e.g. based on a total ordering of vertices). Clearly, the lexicographical rank of a minimal transversal belongs to the interval [1..2|V | ]. For visualization aspect, we divide this interval into |V | subintervals, where the subinterval i contains the number of minimal transversals with a rank |V | |V | r ∈ [ 2|V | i; 2|V | (i + 1)[, (i = 0, . . . , |V | − 1). Figures 2(c) and 3(c) confirm this behavior when considering frequency of vertices. Indeed, frequency of vertices in minimal transversals in Tpartial is the same when considering all minimal transversals, i.e.. the set Texact . Partial enumeration of minimal transversals of a hypergraph 131 Fig. 2. The bar graph for ”lose100”: a) number of minimal transversals per size b) number of minimal transversals according the lexicographical rank; c) Frequency of vertices in minimal transversals. Fig. 3. The bar graph for ”p8 200”; a) number of minimal transversals per size; b) number of minimal transversals according the lexicographical rank; c) Frequency of vertices in minimal transversals. 132 Lhouari Nourine, Alain Quilliot and Hélène Toussaint Fig. 4. Visualizing the solutions space of ”lose100”. The abscissa is given by the size of the transversal (transversals of the same size are spread out using a norm) and the ordinate corresponds to the frequency of its vertices. Fig. 5. Visualizing the solutions space of ”p8 200”. The abscissa is given by the size of the transversal (transversals of the same size are spread out using a norm) and the ordinate corresponds to the frequency of its vertices. Partial enumeration of minimal transversals of a hypergraph 133 Figures 4 and 5 show that the set Tpartial is also representative even when considering minimal transversals with the same size. Indeed, minimal transver- sals having the same size are spread out using a norm. We notice that the points corresponding to minimal transversals in Tpartial are scattered in the image. This experiment allows us to conclude that the sample Tpartial produced by Algorithm 4 is representative relatively to the criteria under consideration. Other results can be found in http://www2.isima.fr/˜toussain/ 6 Conclusion and discussions We are convinced that the initialization procedure is the most important in this approach. Indeed, the set of minimal transversals obtained using this procedure is a representative sample, since it garantee that for any vertex of the hypergraph there is at least one minimal transversal which contains it (see property 1). Moreover the local search procedure can be used to increase the number of solutions, and as we have seen in the experiments, it keeps the same properties as the initialization procedure. We hope that this approach improves enumeration in big data and will be of interests to the readers to investigate heuristics methods [6] for enumeration problems. This paper opens new challenges related to partial and approximate enu- meration problems. 
For example, given an hypergraph H = (V, E), is there an algorithm that for any given ε, it enumerates a set Tpartial ⊆ T r(H) such that (1 − ε)|T r(H)| ≤ |Tpartial | ≤ |T r(H)|? We also require that the algorithm is output-polynomial in the sizes of H, Tpartial and 1ε . To our knowledge, there is no work on approximate algorithms for enumeration problems, but results on approximate counting problems may be applied [16]. Acknowledgment: This work has been funded by the french national re- search agency (ANR DAG project, 2009-2013) and CNRS (Mastodons PETASKY project, 2012-2015). References 1. R. Agrawal, T. Imielinski, and A. Swami. Mining associations between sets of items in massive databases. In ACM SIGMOD 1993, Washington D.C., pages 207–216, 1993. 2. J. Y. Chen and S. Lonardi. Biological Data Mining. Chapman and Hall/CRC, 2009. 3. T. Eiter and G. Gottlob. Identifying the minimal transversals of a hypergraph and related problems. SIAM J. Comput., 24(6):1278–1304, 1995. 4. T. Eiter, G. Gottlob, and K. Makino. New results on monotone dualization and generating hypergraph transversals. SIAM J. Comput., 32(2):514–537, 2003. 5. T. Eiter, K. Makino, and G. Gottlob. Computational aspects of monotone dual- ization: A brief survey. Discrete Applied Mathematics, 156(11):2035–2049, 2008. 134 Lhouari Nourine, Alain Quilliot and Hélène Toussaint 6. T. Feo and M. Resende. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6(2):109–133, 1995. 7. M. Fredman and L. Khachiyan. On the complexity of dualization of monotone disjunctive normal forms. Journal of Algorithms, 21:618–628, 1996. 8. L. Geng and H. J. Hamilton. Interestingness measures for data mining: A survey. ACM Comput. Surv., 38(3), Sept. 2006. 9. P. Golovach, P. Heggernes, D. Kratsch, and Y. Villanger. An incremental polyno- mial time algorithm to enumerate all minimal edge dominating sets. Algorithmica, 72(3):836–859, 2015. 10. D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. S. Sharm. Discovering all most specific sentences. ACM Trans. Database Syst., 28(2):140–174, 2003. 11. D. Gunopulos, R. Khardon, H. Mannila, and H. Toivonen. Data mining, hyper- graph transversals, and machine learning. In PODS, pages 209–216, 1997. 12. M. Jelassi, C. Largeron, and S. Ben Yahia. Concise representation of hypergraph minimal transversals: Approach and application on the dependency inference prob- lem. In Research Challenges in Information Science (RCIS), 2015 IEEE 9th In- ternational Conference on, pages 434–444, May 2015. 13. D. S. Johnson, C. H. Papadimitriou, and M. Yannakakis. On generating all maxi- mal independent sets. Inf. Process. Lett., 27(3):119–123, 1988. 14. M. M. Kanté, V. Limouzy, A. Mary, and L. Nourine. On the enumeration of mini- mal dominating sets and related notions. SIAM Journal on Discrete Mathematics, 28(4):1916–1929, 2014. 15. L. Khachiyan, E. Boros, K. M. Elbassioni, and V. Gurvich. An efficient implemen- tation of a quasi-polynomial algorithm for generating hypergraph transversals and its application in joint generation. Discrete Applied Mathematics, 154(16):2350– 2372, 2006. 16. J. Liu and P. Lu. Fptas for counting monotone cnf. In Proceedings of the Twenty- Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’15, pages 1531–1548. SIAM, 2015. 17. K. Murakami and T. Uno. Efficient algorithms for dualizing large-scale hyper- graphs. Discrete Applied Mathematics, 170:83–94, 2014. 18. L. Nourine and J.-M. Petit. 
Extending set-based dualization: Application to pat- tern mining. In ECAI, pages IOS Press ed, Montpellier, France, 2012. 19. O. Raynaud, R. Medina, and C. Noyer. Twin vertices in hypergraphs. Electronic Notes in Discrete Mathematics, 27:87–89, 2006. 20. T. Uno. http://research.nii.ac.jp/ uno/dualization.html. 21. C. A. Weber, J. R. Current, and W. Benton. Vendor selection criteria and methods. European Journal of Operational Research, 50(1):2 – 18, 1991. An Introduction to Semiotic-Conceptual Analysis with Formal Concept Analysis Uta Priss Zentrum für erfolgreiches Lehren und Lernen Ostfalia University of Applied Sciences Wolfenbüttel, Germany www.upriss.org.uk Abstract. This paper presents a formalisation of Peirce’s notion of ‘sign’ using a triadic relation with a functional dependency. The three sign components are then modelled as concepts in lattices which are connected via a semiotic map- ping. We call the study of relationships relating to semiotic systems modelled in this manner a semiotic-conceptual analysis. It is argued that semiotic-conceptual analyses allow for a variety of applications and serve as a unifying framework for a number of previously presented applications of FCA. 1 Introduction The American philosopher C. S. Peirce was a major contributor to many fields with a particular interest in semiotics. The following quote shows one of his definitions for the relationships involved in using a sign: A REPRESENTAMEN is a subject of a triadic relation TO a second, called its OBJECT, FOR a third, called its INTERPRETANT, this triadic relation be- ing such that the REPRESENTAMEN determines its interpretant to stand in the same triadic relation to the same object for some interpretant. (Peirce, CP 1.541)1 According to Peirce a sign consists of a physical form (representamen) which could, for example, be written, spoken or represented by neurons firing in a brain, a meaning (object) and another sign (interpretant) which mirrors the original sign, for example, in the mind of a person producing or observing a sign. It should be pointed out that the use of the term ‘relation’ by Peirce is not necessarily the same as in modern mathematics which distinguishes more clearly between a ‘relation’ and its ‘instances’. Initially Peirce even referred to mathematical relations as ‘relatives’ (Maddux, 1991). We have previously presented an attempt at mathematically formalising Peirce’s definition (Priss, 2004). In our previous attempt we tried to presuppose as few assump- tions about semiotic relations as possible which led to a fairly open structural descrip- tion which, however, appeared to be limited with respect to usefulness in applications. 1 It is customary among Peirce scholars to cite Peirce in this manner using an abbreviation of the publication series, volume number and paragraph or page numbers. c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 135–146, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 136 Uta Priss Now we are presenting another formalisation which imposes a functional dependency on the triadic relation. The notions from Priss (2004) are translated into this new for- malism which is in many ways simpler, more clearly defined and appears to be more useful for applications. In order to avoid confusion with the notion of ‘object’ in FCA2 , we use the term ‘denotation’ instead of Peirce’s ‘object’ in the remainder of this paper. 
We translate the first half of Peirce’s definition into modern language as follows: “A representamen is a parameter of a function resulting in a second, called its denotation, where the third, the function instance, is called interpretant.” – or in other words as a function of type ‘third(first) = second’. In our modelling, a set of such functions together with their parameter/value pairs constitute a triadic semiotic relation. We use Peirce’s notion of ‘interpretant’ for the function instances whereas the functions themselves are called ‘interpretations’. A sign is then an instance of this triadic relation consisting of rep- resentamen, denotation and interpretation. This is more formally defined in the next section. The second half of Peirce’s definition refers to the mental image that a sign in- vokes in participants of communication acts. For Peirce, interpretants are mental images which can themselves be thought about and thus become representamens for other inter- pretants and so on. Because the focus of this paper is on formal, not natural languages, mental images are not important. We suggest that in formal languages, interpretants are not mental representations but instead other formal structures, for example, states in a computer program. The data structures used in formal languages (such as programming languages, XML or UML) can contain a significant amount of complexity. A semiotic-conceptual analysis as proposed in this paper allows to investigate the components of such struc- tures as signs with their representamens, denotations and interpretations and their rela- tionships to each other. As a means of structuring the semiotic components we use FCA concept lattices. It should be noted that there has recently been an increase of interest in triadic FCA (e.g., Gnatyshak et al. (2013), Belohlavek & Osicka (2012)) which could also be used to investigate triadic semiotic relations. But Peirce tends to see triadic relations as consisting of three components of increasing complexity: The First is that whose being is simply in itself, not referring to anything nor lying behind anything. The Second is that which is what it is by force of some- thing to which it is second. The Third is that which is what it is owing to things between which it mediates and which it brings into relation to each other. (Peirce, EP 1:248; CP 1.356) In our opinion this is better expressed by a function instance of type ‘third(first) = second’ than by an instance ‘(first, second, third)’ of a triadic relation. Other re- searchers have already suggested formalisations of Peirce’s philosophy. Interestingly, Marty (1992), Goguen (1999) and Zalamea (2010) all suggest using Category Theory 2 Because Formal Concept Analysis (FCA) is the main topic of this conference, this paper does not provide an introduction to FCA. Information about FCA can be found, for example, on-line (http://www.fcahome.org.uk) and in the main FCA textbook by Ganter & Wille (1999). Semiotic-Conceptual Analysis with Formal Concept Analysis 137 for modelling Peirce’s philosophy even though they appear to have worked indepen- dently of each other. Marty even connects Category Theory with FCA in his modelling. Goguen develops what he calls ‘algebraic semiotics’. Zalamea is more focussed on Ex- istential Graphs than semiotics (and unfortunately most of his papers are in Spanish). Nevertheless our formalisation is by far not as abstract as any of these existing formal- isations which are therefore not further discussed in this paper. 
This paper has five further sections. Section 2 presents the definitions of signs, semi- otic relations and NULL-elements. Section 3 continues with defining concept lattices for semiotic relations. Section 4 explains degrees of equality among signs. Section 5 discusses mappings among the lattices from Section 3 and presents further examples. The paper ends with a concluding section. 2 Core definitions of a semiotic-conceptual analysis The main purpose of this work is to extend Peirce’s sign notion to formal languages such as computer programming languages and formal representations. Figure 1 displays a simple Python program which is called ’Example 1’ in the remainder of this paper. The table underneath shows the values of the variables of Example 1 after an execution. The variables are representamens and their values are denotations. Because Peirce’s definition of signs seems to indicate that there is a separate interpretant for each sign, there are at least eight different interpretants in column 3 of the table. It seems more interesting, however, to group interpretants than to consider them individually. We call such groupings of interpretants interpretations. In natural language examples, one could group all the interpretants that belong to a sentence or paragraph. In programming lan- guages starting a loop or calling a subroutine might start a new interpretation. As a condition for interpretations we propose that each representamen must have a unique denotation in an interpretation, or in other words, interpretations are functions. There are many different possibilities for choosing sets of interpretations. Two possibilities, called IA and IB in the rest of the paper, are shown in the last two columns of the ta- ble. Each contains two elements which is in this case the minimum required number because some variables in Example 1 have two different values. In our modelling an interpretant corresponds to a pair of representamen and interpretation. For R and IA there are ten interpretants (and therefore ten signs) whereas there are eight for R and IB . The first three columns of the table can be automatically derived using a debugging tool. The interpretants are numbered in the sequence in which they are printed by the debugger. Definition 1: A semiotic relation S ⊆ I × R × D is a relation between three sets (a set R of representamens, a set D of denotations and a set I of interpretations) with the condition that any i ∈ I is a partial function i : R →7 D. A relation instance (i, r, d) with i(r) = d is called a sign. In addition to S, we define the semiotic (partial) mapping S : I × R →7 D with S(i, r) = d iff i(r) = d. The pairs (i, r) for which d exists are called interpretants. It follows that there are as many signs as there are interpretants. Example 1 shows two semiotic relations using either IA or IB for the interpretations. The interpretations firstLoop, secondLoop and firstValue are total functions. The interpretation second- 138 Uta Priss input_end = "no" while input_end != "yes": input1 = raw_input("Please type something: ") input2 = raw_input("Please type something else: ") if (input1 == input2): counter = 1 error = 1 print "The two inputs should be different!" else: counter = 2 input_end = raw_input("End this program? 
") representamens R denotations D interpretants interpretations IA interpretations IB (variables) (values) (∼10 interpretants) (∼ 8 interpretants) input1 ”Hello World” j1 firstLoop firstValue input2 ”Hello World” j2 firstLoop firstValue counter 1 j3 firstLoop firstValue input end no j4 firstLoop firstValue error 1 j5 firstLoop firstValue input1 ”Hello World” j6 (or j1) secondLoop firstValue input2 ”How are you” j7 secondLoop secondValue counter 2 j8 secondLoop secondValue input end yes j9 secondLoop secondValue error 1 j10 (or j5) secondLoop firstValue Fig. 1. A Python program (called ‘Example 1’ in this paper) Value is a partial function. Because for (i1 , r1 , d1 ) and (i2 , r2 , d2 ), i1 = i2 , r1 = r2 ⇒ d1 = d2 , it follows that all r ∈ R with r(i) = d are also partial functions r : I → 7 D. The reason for having a relation S and a mapping S is because Peirce defines a relation but in applications a mapping might be more usable. In this paper the sets R, D and I are meant to be finite and not in any sense universal but built for an application. The assignment operation (written ‘:=’ in mathematics or ‘=’ in programming languages) is an example of i(r) = d except that i is usually implied and not explicitly stated in that case. Using the terminology from database theory, we call a semiotic relation a triadic re- lation with functional dependency. This is because, on the one hand, Peirce calls it not a mapping but a ‘triadic relation’, on the other hand, without this functional dependency it would not be possible to determine the meaning of a sign given its representamen and an interpretation. Some philosophers might object to Definition 1 because of the func- tional dependency. We argue that the added functional dependency yields an interesting structure which can be explored as shown in this paper. The idea of using interpretations as a means of assigning meaning to symbols is already known from formal semantics and model theory. But this paper has a different focus by treating interpretations and representamens as dual structures. Furthermore in applications, S(i, r) might be implemented as an algorithmic procedure which deter- mines d for r based on information about i at runtime. A debugger as in Example 1 Semiotic-Conceptual Analysis with Formal Concept Analysis 139 is not part of the original code but at a meta-level. Since the original code might re- quest user input (as in Example 1), the relation instances (i, r, d) are only known while or after the code was executed. Thus the semiotic relation is dynamically generated in an application. This is in accordance with Peirce’s ideas about how it is important for semiotics to consider how a sign is actually used. The mathematical modelling (as in Definition 1) which is conducted after a computer program finished running, ignores this and simply considers the semiotic relation to be statically presented. Priss (2004) distinguishes between triadic signs and anonymous signs which are less complex. In the case of anonymous signs, the underlying semiotic relation can be reduced to a binary or unary relation because of additional constraints. Examples of anonymous signs are constants in programming languages and many variables used in mathematical expressions. For instance, the values of variables in the Pythagorean equation a2 + b2 = c2 are all the values of all possibly existing right-angled triangles. 
But, on the one hand, if a2 + b2 = c2 is used in a proof, it is fine to assume |I| = 1 because within the proof the variables do not change their values. On the other hand, if someone uses the formula for an existing triangle, one can assume S(i, r) = r be- cause in that case the denotations can be set equal to the representamens. Thus within a proof or within a mathematical calculation variables can be instances of binary or unary relations and thus anonymous signs. However, in the following Python program: a = input("First side: ") b = input("Second side: ") print a*a + b*b the values of the variables change depending on what is entered by a user. Here the signs a and b are triadic. A sign is usually represented by its representamen. In a semiotic analysis it may be important to distinguish between ‘sign’ and ‘representamen’. In natural language this is sometimes indicated by using quotes (e.g., the word ‘word’). In Example 1, the variable ‘input1’ is a representamen whereas the variable ‘input1’ with a value ‘Hello World’ in the context of firstLoop is a sign. It can happen that a representamen is taken out of its context of use and loses its connection to an interpretation and a denotation. For example, one can encounter an old file which can no longer be read by any current program. But a sign always has three components (i, r, d). Thus just looking at the source code of a file creates an interpretant in that person’s mind even though this new sign and the original sign may have nothing in common other than the representamen. Using the next definition, interpretations that are partial functions can be converted into total functions. Definition 2: For a semiotic relation, a NULL-element d⊥ is a special kind of deno- tation with the following conditions: (i) i(r) undefined in D ⇒ i(r) := d⊥ in D∪{d⊥ }. (ii) d⊥ ∈ D ⇒ all i are total functions. Thus by enlarging D with one more element, one can convert all i into total func- tions. If all i are already total functions, then d⊥ need not exist. The semiotic mapping S can be extended to a total function S : I × R → D ∪ {d⊥ }. There can be different reasons for NULL-elements: caused by the selection of interpretations or by the code itself. Variables are successively added to a program and thus undefined for any inter- pretation that occurs before a variable is first defined. In Example 1, secondValue is a partial function because secondValue(input1) = secondValue(error) = d⊥ . But IA shows 140 Uta Priss that all interpretations can be total functions. On the other hand, if a user always enters two different values, then the variable ‘error’ is undefined for all interpretations. This could be avoided by changing the code of Example 1. In more complex programs it may be more difficult to avoid d⊥ , for example if a call to an external routine returns an undefined value. Programming languages tend to allow operations with d⊥ , such as evaluating whether a variable is equal to d⊥ , in order to avoid run-time errors resulting from undefined values. Because the modelling in the next section would be more com- plex if conditions for d⊥ were added we decided to mostly ignore d⊥ in the remainder of this introductory paper. 3 Concept lattices of a semiotic relation In order to explore relationships among signs we are suggesting to model the compo- nents of signs as concept lattices. The interpretations which are (partial) functions from R to D then give rise to mappings between the lattice for R and the lattice for D. 
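As a small, self-contained illustration of Definitions 1 and 2 (a sketch added here, not code from the paper), the semiotic relation of Example 1 with the interpretations IB can be encoded as a Python dictionary of partial functions; the functional dependency of Definition 1 then holds by construction, and the NULL-element of Definition 2 appears when a partial interpretation is totalised.

```python
NULL = object()        # plays the role of the NULL-element of Definition 2

# The interpretations I_B of Example 1 as partial functions R -> D,
# encoded as dictionaries (missing keys mean "undefined").
semiotic_relation = {
    "firstValue":  {"input1": "Hello World", "input2": "Hello World",
                    "counter": 1, "input_end": "no", "error": 1},
    "secondValue": {"input2": "How are you", "counter": 2, "input_end": "yes"},
}
R = {"input1", "input2", "counter", "input_end", "error"}

def S(i, r):
    # The semiotic mapping S(i, r) = d, totalised with the NULL-element.
    return semiotic_relation[i].get(r, NULL)

# The signs are the triples (i, r, d) for which d is defined (the interpretants).
signs = [(i, r, d) for i, f in semiotic_relation.items() for r, d in f.items()]
print(len(signs))                          # 8, as for R and I_B in Fig. 1
print(S("secondValue", "error") is NULL)   # True: secondValue is a partial function
```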
Fig- ure 2 shows an example of concept lattices for the semiotic relation from Example 1. The objects are the representamens, denotations and interpretations of Example 1. The attributes are selected for characterising the sets and depend on the purpose of an appli- cation. If the denotations are values of a programming language, then data types are a fairly natural choice for the attributes of a denotation lattice. Attributes for representamens should focus on representational aspects. In Example 1, all input variables start with the letters ‘input’ because of a naming style used by the programmer of that program. In some languages certain variables start with upper or lowercase letters, use additional symbols (such as ‘@’ for arrays) or are complex structures (such as ’root.find(”file”).attrib[”size”]’) which can be analysed in a repre- sentamen lattice. In strongly-typed languages, data types could be attributes of repre- sentamens but in languages where variables can change their type, data types do not belong into a representamen lattice. Rules for representamens also determine what is to be ignored. For example white space is ignored in many locations of a computer pro- gram. The font of written signs is often ignored but mathematicians might use Latin, Greek and Fraktur fonts for representamens of different types of denotations. One way of deriving a lattice for interpretations is to create a partially ordered set using the ordering relation as to whether one interpretation precedes another one or whether they exist in parallel. A lattice is then generated using the Dedekind closure. In Figure 2 the attributes represent some scaling of the time points of the interpretations. Thus temporal sequences can be expressed but any other ordering can be used as well. Definition 3: For a set R of representamens, a concept lattice B(R, MR , JR ) is defined where MR is a set of attributes used to characterise representamens and JR is a binary relation JR ⊆ R × MR . B(R, MR , JR ) is called representamen lattice. B(R, MR , JR ) is complete for a set of interpretations3 if for all r ∈ R: ∀i∈I : γ(r1 ) = γ(r2 ) ⇒ i(r1 ) = i(r2 ) and γ(r1 ) 6= γ(r2 ) ⇒ ∃i∈I : i(r1 ) 6= i(r2 ). Definition 4: For a set I of interpretations, a concept lattice B(I, MI , JI ) is defined where MI is a set of attributes used to characterise the interpretations and JI is a binary 3 For an object o its object concept γ(o) is the smallest concept which has the object in its extension. Semiotic-Conceptual Analysis with Formal Concept Analysis 141 Representamen lattice Denotation lattice Interpretation lattice is defined string input other number >0 "Hello World" input1 counter "How are you" 1 firstLoop input2 error 2 input_end binary >1 positive negative secondLoop yes no . Fig. 2. Lattices for Example 1 relation JI ⊆ I × MI . B(I, MI , JI ) is called interpretation lattice. B(I, MI , JI ) is complete for a set of representamens if for all i ∈ I: ∀r∈R : γ(i1 ) = γ(i2 ) ⇒ i1 (r) = i2 (r) and γ(i1 ) 6= γ(i2 ) ⇒ ∃r∈R : i1 (r) 6= i2 (r). The representamen lattice in Figure 2 is not complete for IA because, for exam- ple, ‘input end’ and ’input1’ have different denotations. The interpretation lattice is complete for R because no objects have the same object concept and firstLoop and sec- ondLoop have different denotations, for example, for ‘counter’. 
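The completeness condition of Definition 3 can also be tested mechanically: in a formal context two objects have the same object concept exactly when they carry the same attributes, so no lattice has to be built. The sketch below is added for illustration, and the attribute assignment J_R is only a hypothetical reading of the representamen lattice that keeps just the 'input' naming convention mentioned above.

```python
# Interpretations I_A of Example 1 (both are total functions R -> D).
I_A = {
    "firstLoop":  {"input1": "Hello World", "input2": "Hello World",
                   "counter": 1, "input_end": "no", "error": 1},
    "secondLoop": {"input1": "Hello World", "input2": "How are you",
                   "counter": 2, "input_end": "yes", "error": 1},
}
R = set(I_A["firstLoop"])

# Hypothetical attribute sets for the representamen lattice: only the
# naming-convention attribute 'input' is recorded here.
J_R = {r: frozenset({"input"}) if r.startswith("input") else frozenset()
       for r in R}

def same_object_concept(r1, r2, J):
    # Objects share their object concept iff they have the same attributes.
    return J[r1] == J[r2]

def complete_for(R, J, interpretations):
    # Completeness of the representamen lattice in the sense of Definition 3.
    for r1 in R:
        for r2 in R:
            if same_object_concept(r1, r2, J):
                if any(i[r1] != i[r2] for i in interpretations.values()):
                    return False
            elif all(i[r1] == i[r2] for i in interpretations.values()):
                return False
    return True

print(complete_for(R, J_R, I_A))
# False: 'input_end' and 'input1' share their object concept but firstLoop
# gives them different denotations, as noted in the text.
```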
Completeness means that exactly those representamens or interpretations that share their object concepts can be used interchangeably without any impact on their relationship with the other sets. The dashed lines in Figure 2 are explained below in Section 5. Definition 5: For a set D \ {d⊥ } of denotations, a concept lattice B(D, MD , JD ) is defined where MD is a set of attributes used to characterise the denotations and JD is a binary relation JD ⊆ D × MD . B(D, MD , JD ) is called denotation lattice. 4 Equality and other sign properties Before continuing with the consequences of the definitions of the previous section, equality of signs should be discussed because there are different degrees of equality. Two signs, (i1 , r1 , d1 ) and (i2 , r2 , d2 ), are equal if all three components are equal. Be- cause of the functional dependency this means that two signs are equal if i1 = i2 and r1 = r2 . In normal mathematics the equal sign is used for denotational equality. For example, x = 5 means that x has the value of 5 although clearly the representamen x has nothing in common with the representamen 5. Since signs are usually represented by their representamens denotational equality needs to be distinguished from equal- ity between signs. Denotational equality is called ‘strong synonymy’ in the definition below. Even strong synonymy is sometimes too much. For example natural language synonyms (such as ‘car’ and ‘automobile’) tend to always still have subtle differences in meaning. In programming languages, if a counter variable increases its value by 1, it is still thought of as the same variable. But if such a variable changes from ‘3’ to ‘Hello World’ and then to ‘4’, depending on the circumstances, it might indicate an 142 Uta Priss error. Therefore we are defining a tolerance relation4 T ⊆ D × D to express that some denotations are close to each other in meaning. With respect to the denotation lattice, the relation T can be defined as the equivalence relation of having the same object con- cept or via a distance metric between concepts. The following definition is an adaptation from Priss (2004) that is adjusted to the formalisation in this paper. Definition 6: For a semiotic relation with tolerance relations TD ⊆ D × D and TI ⊆ I × I the following are defined: • i1 and i2 are compatible ⇔ ∀r∈R,i1 (r)6=d⊥ ,i2 (r)6=d⊥ : (i1 (r), i2 (r)) ∈ TD • i1 and i2 are mergeable ⇔ ∀r∈R,i1 (r)6=d⊥ ,i2 (r)6=d⊥ : i1 (r) = i2 (r) • i1 and i2 are TI -mergeable ⇔ (i1 , i2 ) ∈ TI and i1 and i2 are mergeable • (i1 , r1 , d1 ) and (i2 , r2 , d2 ) are strong synonyms ⇔ r1 6= r2 and d1 = d2 • (i1 , r1 , d1 ) and (i2 , r2 , d2 ) are synonyms ⇔ r1 6= r2 and (d1 , d2 ) ∈ TD • (i1 , r1 , d1 ) and (i2 , r2 , d2 ) are equinyms ⇔ r1 = r2 and d1 = d2 • (i1 , r1 , d1 ) and (i2 , r2 , d2 ) are polysemous ⇔ r1 = r2 and (d1 , d2 ) ∈ TD • (i1 , r1 , d1 ) and (i2 , r2 , d2 ) are homographs ⇔ r1 = r2 and (d1 , d2 ) 6∈ TD It follows that if a representamen lattice is complete for a set of interpretations, rep- resentamens that share their object concepts are strong synonyms for all interpretations. In Example 1, if TD corresponds to {Hello World, How are you}, {yes, no}, {1, 2} then firstLoop and secondLoop are compatible. Essentially this means that variables do not radically change their meaning between firstLoop and secondLoop. Mergeable interpre- tations have the same denotation for each representamen and could be merged into one interpretation. In Example 1 the interpretations in IA (or in IB ) are not mergeable. 
Us- ing TI -mergeability it can be ensured that only interpretations which have something in common (for example temporal adjacency) are merged. There are no examples of homographs in Example 1 but the following table shows some examples for the other notions of Definition 6. strong synonyms (firstLoop, input2, ”Hello World”) (secondLoop, input1, ”Hello World”) synonyms (firstLoop, input1, ”Hello World”) (secondLoop, input2, ”How are you”) equinyms (firstLoop, input1, ”Hello World”) (secondLoop, input1, ”Hello World”) polysemous (firstLoop, input2, ”Hello World”) (secondLoop, input2, ”How are you”) Some programming languages use further types of synonymy-like relations, for ex- ample, variables can have the same value and but not the same data type or the same value but not be referring to the same object. An example of homographs in natural languages is presented by the verb ‘lead’ and the metal ‘lead’. In programming lan- guages, homographs are variables which have the same name but are used for totally different purposes. If this happens in separate subroutines of a program, it does not pose a problem. But if it involves global variables it might indicate an error in the code. Thus algorithms for homograph detection can be useful for checking the consistency of programs. Compatible interpretations are free of homographs. Definition 7: A semiotic relation with concept lattices as defined in Definitions 3-5 is called a semiotic system. The study of semiotic systems is called a semiotic- conceptual analysis. 4 A tolerance relation is reflexive and symmetric. Semiotic-Conceptual Analysis with Formal Concept Analysis 143 5 Mappings between the concept lattices A next step is to investigate how (and whether) the interpretations as functions from R to D give rise to interesting mappings between the representamen and denotation lattice. For example, if the representamen lattice has an attribute ‘starts with uppercase letter’ and it is common practice in a programming language to use uppercase letters for names of classes and there is an attribute ‘classes’ in the denotation lattice, then one would want to investigate whether this information is preserved by the mapping amongst the lattices. The following definition describes a basic relationship: Definition 8: For a semiotic relation, the power set P(R), subsetsW I1W⊆ I and R1 ⊆ R we define: I1∨ : P (R)\{} → B(D, MD , JD ) with I1∨ (R1 ) := i∈I1 r∈R1 γ(i(r)). Because the join relation in a lattice is commutative and associative it does not matter W Wwhether oneW first iterates W through interpretations or through representamens (i.e., i∈I1 r∈R1 or r∈R1 i∈I1 ). An analogous function can be defined for infima. One can also consider the inverse (I1∨ )−1 . Definition 8 allows for different types of applications. One can look at the results for extensions (and thus concepts of the representamen lattice), one-element sets (cor- responding to individual elements in R) or elements of a tolerance relation. The same holds for the subsets of I. The question that arises in each application is whether the mapping I1∨ has some further properties, such as being order-preserving or whether it forms an ‘infomorphism’ in Barwise & Seligman’s (1997) terminology (together with an inverse mapping). It may be of interest to find the subsets of R for which (I1∨ )−1 I1∨ (R1 ) = R1 . In the case of Figure 2, ‘input end’ is mapped onto the concepts with attribute ‘pos- itive’, ‘negative’ or ‘binary’ depending on which set of interpretations is used. 
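Definition 6 lends itself to direct implementation. The following sketch is added for illustration (the tolerance relation is supplied as a plain set of unordered pairs, which is an encoding choice, not the paper's); with the TD given for Example 1 above it reproduces the synonym and polysemy examples of the table.

```python
def in_tolerance(d1, d2, T_D):
    # Tolerance relation: reflexive and symmetric, given as a set of frozensets.
    return d1 == d2 or frozenset({d1, d2}) in T_D

def strong_synonyms(s1, s2):
    (_, r1, d1), (_, r2, d2) = s1, s2
    return r1 != r2 and d1 == d2

def synonyms(s1, s2, T_D):
    (_, r1, d1), (_, r2, d2) = s1, s2
    return r1 != r2 and in_tolerance(d1, d2, T_D)

def equinyms(s1, s2):
    (_, r1, d1), (_, r2, d2) = s1, s2
    return r1 == r2 and d1 == d2

def polysemous(s1, s2, T_D):
    (_, r1, d1), (_, r2, d2) = s1, s2
    return r1 == r2 and in_tolerance(d1, d2, T_D)

def homographs(s1, s2, T_D):
    (_, r1, d1), (_, r2, d2) = s1, s2
    return r1 == r2 and not in_tolerance(d1, d2, T_D)

# T_D from the running example: {Hello World, How are you}, {yes, no}, {1, 2}
T_D = {frozenset({"Hello World", "How are you"}),
       frozenset({"yes", "no"}),
       frozenset({1, 2})}

a = ("firstLoop", "input2", "Hello World")
b = ("secondLoop", "input2", "How are you")
print(polysemous(a, b, T_D))                                   # True
print(strong_synonyms(("firstLoop", "input2", "Hello World"),
                      ("secondLoop", "input1", "Hello World")))  # True
```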
The other representamens are always mapped onto the same concepts no matter which set of interpretations is used. For the extensions of the representamen lattice this leads to an order-preserving mapping. Thus overall the structures of the representamen and de- notation lattice seem very compatible in this example. Other examples could produce mappings which change radically between different interpretations. In a worst case sce- nario, every representamen is mapped to the top concept of the denotation lattice as soon as more than one interpretation is involved. In Figure 2 the interpretation lattice is depicted without any connection to the other two lattices. Furthermore even though a construction of R1∨ in analogy to I1∨ would be possible it would not be interesting for most applications because most elements would be mapped to the top element of the denotation lattice. Thus different strategies are needed for the representamen and interpretation lattices. One possibility of connect- ing the three lattices is to use a ‘faceted display’ similar to Priss (2000). The idea for Figure 3 is to use two facets: the denotation lattice which also contains the mapped representamens and the interpretation lattice. If a user ‘clicks’ on the upper concept in the interpretation lattice, the lattice on the left-hand side of Figure 3 is displayed. If a user clicks on the lower interpretation, the lattice on the right-hand side is displayed. Switching between the two interpretations would show the movement of ‘input end’. This is also reminiscent of the work by Wolff (2004) who uses ‘animated’ concept lat- tices which show the movement of ‘complex objects’ (in contrast to formal objects) 144 Uta Priss across the nodes of a concept lattice. In our semiotic-conceptual analysis the interpreta- tions are not necessarily linearly-ordered (as Wolff’s time units) but ordered according to a concept lattice. is defined is defined string >0 string number number input2 input1 input1 firstLoop error counter error input2 binary >1 binary counter positive negative secondLoop positive negative input_end input_end Fig. 3. Switching between interpretations Instead of variables or strings, representamens can also be more complex structures, such as graphs, UML diagrams, Peirce’s existential graphs, relations or other complex mathematical structures which are then analysed using interpretations. Figure 4 shows an example from Priss (1998) which was originally presented in terms of what Priss called ‘relational concept analysis5 ’. The words in the figure are entries from the elec- tronic lexical database WordNet6 . The solid lines in Figure 4 are subconcept relation instances from a concept lattice although the lattice drawing is incomplete in the figure. The dashed lines are part-whole relation instances that are defined among the concepts. Using a semiotic-conceptual analysis, this figure can be generated by using represen- tamens which are instances of a part-whole relation. Two interpretations are involved: one maps the first component of each relation instance into the denotation lattice, the other one maps the second component. Each dashed line corresponds to the mapping of one representamen. For each representamen, I ∨ (r) is the whole and I ∧ (r) the part of the relation instance. Priss (1999) calculates bases for semantic relations which in this modelling as a semiotic-conceptual analysis correspond to searching for infima and suprema of such representamens as binary relations. 
Figure 4 shows an example of a data error. The supremum of ‘hand’ and ‘foot’ should be the concept which is a part of ‘limb’. There should be a part-whole relation from ‘digit’ to that ‘extremity’ concept. Although it would be possible to write an al- gorithm that checks for this error systematically, this is probably again an example of where a user can detect an error in the data more easily (because of the lack of sym- metry) if the data is graphically presented. We argue that there are so many different ways of how semiotic-conceptual analyses can be used that it is not feasible to write 5 These days the notion ‘relational concept analysis’ is used in a different meaning by other authors. 6 https://wordnet.princeton.edu/ Semiotic-Conceptual Analysis with Formal Concept Analysis 145 animal body part external human body part extremity, appendage limb arm leg extremity hand foot digit structure finger toe nail fingernail toenail Fig. 4. Representamens showing a part-whole relation algorithms for any possible situation. In many cases the data can be modelled for an application and then interactively investigated. In Figure 4, the representamens are instances of a binary relation or pairs of deno- tations. Thus there are no intrinsic differences between what is a representamen, deno- tation or interpretation. Denotations are often represented by strings and thus are signs themselves (with respect to another semiotic relation). A computer program as a whole can also be a representamen. Since that is then a single representamen, the relation between the program output (its denotations) and the succession of states (its interpre- tations) is then a binary relation. Priss (2004) shows an example of a concept lattice for such a relation. 6 Conclusion and outlook This paper presents a semiotic-conceptual analysis that models the three components of a Peircean semiotic relation as concept lattices which are connected via a semiotic mapping. The paper shows that the formalisation of such a semiotic-conceptual analysis provides a unified framework for a number of our previous FCA applications (Priss, 1998-2004). It also presents another view on Wolff’s (2004) animated concept lattices. But this paper only describes a starting point for this kind of modelling. Instead of considering one semiotic system with sets R, D, I, one could also consider several semiotic systems with sets R1 , D1 , I1 and so on as subsets of larger sets R, D, I. Then one could investigate what happens if, for example, the signs from one semiotic system become the representamens, interpretations or denotations of another semiotic system. 146 Uta Priss For example, in the second half of Peirce’s sign definition in Section 1 he suggests that for i1 (r) = d there should be an i2 with i2 (i1 ) = d. Furthermore one could consider a denotation lattice as a channel between different representamen lattices in the terminology of Barwise & Seligman’s (1997) information flow theory as briefly mentioned in Section 5 which also poses some other open questions. There are connections with existing formalisms (for example model-theoretic se- mantics) that need further exploration. In some sense a semiotic-conceptual analy- sis subsumes syntactic relationships (among representamens), semantic relationships (among denotations) and pragmatic relationships (among interpretations) in one for- malisation. 
Other forms of semiotic analyses which use the definitions from Section 2 and 4 but use other structures than concept lattices (as suggested in Section 3) are pos- sible as well. Hopefully future research will address such questions and continue this work. References 1. Barwise, Jon; Seligman, Jerry (1997). Information Flow. The Logic of Distributed Systems. Cambridge University Press. 2. Belohlavek, Radim; Osicka, Petr (2012). Triadic concept lattices of data with graded at- tributes. International Journal of General Systems 41.2, p. 93-108. 3. Ganter, Bernhard; Wille, Rudolf (1999). Formal Concept Analysis. Mathematical Founda- tions. Berlin-Heidelberg-New York: Springer. 4. Gnatyshak, Dmitry; Ignatov, Dmitry; Kuznetsov, Sergei O. (2013). From Triadic FCA to Tri- clustering: Experimental Comparison of Some Triclustering Algorithms. CLA. Vol. 1062. 5. Goguen, Joseph (1999). An introduction to algebraic semiotics, with application to user in- terface design. Computation for metaphors, analogy, and agents. Springer Berlin Heidelberg, p. 242-291. 6. Priss, Uta (1998). The Formalization of WordNet by Methods of Relational Concept Analysis. In: Fellbaum, Christiane (ed.), WordNet: An Electronic Lexical Database and Some of its Applications, MIT press, p. 179-196. 7. Priss, Uta (1999). Efficient Implementation of Semantic Relations in Lexical Databases. Com- putational Intelligence, Vol. 15, 1, p. 79-87. 8. Priss, Uta (2000). Lattice-based Information Retrieval. Knowledge Organization, Vol. 27, 3, p. 132-142. 9. Priss, Uta (2004). Signs and Formal Concepts. In: Eklund (ed.), Concept Lattices: Second International Conference on Formal Concept Analysis, Springer Verlag, LNCS 2961, 2004, p. 28-38. 10. Maddux, Roger D. (1991). The origin of relation algebras in the development and axiomati- zation of the calculus of relations. Studia Logica 50, 3-4, p. 421-455. 11. Marty, Robert (1992). Foliated semantic networks: concepts, facts, qualities. Computers & mathematics with applications 23.6, p. 679-696. 12. Wolff, Karl Erich (2004). Towards a conceptual theory of indistinguishable objects. Concept Lattices. Springer Berlin Heidelberg, p. 180-188. 13. Zalamea, Fernando (2010). Towards a Complex Variable Interpretation of Peirces Existential Graphs. In: Bergman, M., Paavola, S., Pietarinen, A.-V., & Rydenfelt, H. (Eds.). Ideas in Action: Proceedings of the Applying Peirce Conference, p. 277-287. Using the Chu construction for generalizing formal concept analysis L. Antoni1 , I.P. Cabrera2 , S. Krajči1 , O. Krı́dlo1 , M. Ojeda-Aciego2 1 University of Pavol Jozef Šafárik, Košice, Slovakia ? 2 Universidad de Málaga. Departamento Matemática Aplicada. Spain ?? Abstract. The goal of this paper is to show a connection between FCA generalisations and the Chu construction on the category ChuCors, the category of formal contexts and Chu correspondences. All needed cat- egorical properties like categorical product, tensor product and its bi- functor properties are presented and proved. Finally, the second order generalisation of FCA is represented by a category built up in terms of the Chu construction. Keywords: formal concept analysis, category theory, Chu construction 1 Introduction The importance of category theory as a foundational tool was discovered soon after its very introduction by Eilenberg and MacLane about seventy years ago. 
On the other hand, Formal Concept Analysis (FCA) has largely shown both its practical applications and its capability to be generalized to more abstract frameworks, and this is why it has become a very active research topic in the recent years; for instance, a framework for FCA has been recently introduced in [19] in which the sets of objects and attributes are no longer unstructured but have a hypergraph structure by means of certain ideas from mathematical morphology. On the other hand, for an application of the FCA formalism to other areas, in [11] the authors introduce a representation of algebraic domains in terms of FCA. The Chu construction [8] is a theoretical method that, from a symmetric monoidal closed (autonomous) category and a dualizing object, generates a *- autonomous category. This construction, or the closely related notion of Chu space, has been applied to represent quantum physical systems and their sym- metries [1, 2]. This paper continues with the study of the categorical foundations of formal concept analysis. Some authors have noticed the property of being a cartesian closed category of certain concept structures that can be approximated [10, 20]; ? Partially supported by the Scientific Grant Agency of the Ministry of Education of Slovak Republic under contract VEGA 1/0073/15. ?? Partially supported by the Spanish Science Ministry projects TIN12-39353-C04-01 and TIN11-28084. c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 147–158, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 148 Ľubomír Antoni et al. others have provided a categorical construction of certain extensions of FCA [12]; morphisms have received a categorical treatment in [17] as a means for the modelling of communication. There already exist some approaches [9] which consider the Chu construction in terms of FCA. In the current paper, we continue the previous study by the authors on the categorical foundation of FCA [13,15,16]. Specifically, the goal of this paper is to highlight the importance of the Chu construction in the research area of categorical description of the theory of FCA and its generalisations. The Chu construction plays here the role of some recipe for constructing a suitable category that covers the second order generalisation of FCA. The structure of this paper is the following: in Section 2 we recall the prelim- inary notions required both from category theory and formal concept analysis. Then, the various categorical properties of the input category which are required (like the existence of categorical and tensor product) are developed in detail in Sections 3 and 4. An application of the Chu construction is presented in Section 5 where it is also showed how to construct formal contexts of second order from the category of classical formal contexts and Chu correspondences (ChuCors). 2 Preliminaries In order to make the manuscript self-contained, the fundamental notions and its required properties are recalled in this section. Definition 1. A formal context is any triple C = hB, A, Ri where B and A are finite sets and R ⊆ B × A is a binary relation. It is customary to say that B is a set of objects, A is a set of attributes and R represents a relation between objects and attributes. 
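As an illustration of Definition 1, a formal context and the usual concept-forming operators (recalled in the next paragraph) can be encoded in a few lines of Python; the tiny context below is invented purely for illustration and is not taken from the paper.

# Minimal sketch: a formal context <B, A, R> with R a set of (object, attribute)
# pairs, together with the standard derivation (concept-forming) operators.
def up(X, A, R):
    """Attributes related to every object in X."""
    return {a for a in A if all((b, a) in R for b in X)}

def down(Y, B, R):
    """Objects related to every attribute in Y."""
    return {b for b in B if all((b, a) in R for a in Y)}

def is_concept(X, Y, B, A, R):
    """(X, Y) is a formal concept iff up(X) = Y and down(Y) = X."""
    return up(X, A, R) == set(Y) and down(Y, B, R) == set(X)

B = {"b1", "b2", "b3"}
A = {"a1", "a2"}
R = {("b1", "a1"), ("b1", "a2"), ("b2", "a1")}
X = down({"a1"}, B, R)                       # {'b1', 'b2'}
print(X, up(X, A, R), is_concept(X, up(X, A, R), B, A, R))   # ... True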
On a given formal context (B, A, R), the derivation (or concept-forming) operators are a pair of mappings ↑ : 2B → 2A and ↓ : 2A → 2B such that if X ⊆ B, then ↑X is the set of all attributes which are related to every object in X and, similarly, if Y ⊆ A, then ↓Y is the set of all objects which are related to every attribute in Y . In order to simplify the description of subsequent computations, it is conve- nient to describe the concept forming operators in terms of characteristic func- tions, namely, considering the subsets as functions on the set of Boolean values. Specifically, given X ⊆ B and Y ⊆ A, we can consider mappings ↑X : A → {0, 1} and ↓Y : B → {0, 1} ^ 1. ↑X(a) = (b ∈ X) ⇒ ((b, a) ∈ R) for any a ∈ A b∈B ^ 2. ↓Y (b) = (a ∈ Y ) ⇒ ((b, a) ∈ R) for any b ∈ B a∈A where the infimum is considered in the set of Boolean values and ⇒ is the truth- function of the implication of classical logic. Using the Chu construction for generalizing formal concept analysis 149 Definition 2. A formal concept is a pair of sets hX, Y i ∈ 2B × 2A which is a fixpoint of the pair of concept-forming operators, namely, ↑X = Y and ↓Y = X. The object part X is called the extent and the attribute part Y is called the intent. There are two main constructions relating two formal contexts: the bonds and the Chu correspondences. Their formal definitions are recalled below: Definition 3. Consider C1 = hB1 , A1 , R1 i and C2 = hB2 , A2 , R2 i two formal contexts. A bond between C1 and C2 is any relation β ∈ 2B1 ×A2 such that its columns are extents of C1 and its rows are intents of C2 . All bonds between such contexts will be denoted by Bonds(C1 , C2 ). The Chu correspondence between contexts can be seen as an alternative inter-contextual structure which, instead, links intents of C1 and extents of C2 . Namely, Definition 4. Consider C1 = hB1 , A1 , R1 i and C2 = hB2 , A2 , R2 i two formal contexts. A Chu correspondence between C1 and C2 is any pair of multimappings ϕ = hϕL , ϕR i such that – ϕL : B1 → Ext(C2 ) – ϕR : A2 → Int(C1 ) – ↑2(ϕL (b1 ))(a2 ) = ↓1(ϕR (a2 ))(b1 ) for any (b1 , a2 ) ∈ B1 × A2 All Chu correspondences between such contexts will be denoted by Chu(C1 , C2 ). The notions of bond and Chu correspondence are interchangeable; specifi- cally, we will use the bond βϕ associated to a Chu correspondence ϕ from C1 to C2 defined for b1 ∈ B1 , a2 ∈ A2 as follows: βϕ (b1 , a2 ) = ↑2 (ϕL (b1 ))(a2 ) = ↓1 (ϕR (a2 ))(b1 ) The set of all bonds (resp. Chu correspondences) between any two formal contexts endowed with set inclusion as ordering have a complete lattice structure. Moreover, both complete lattices are dually isomorphic. In order to formally define the composition of two Chu correspondences, we need to introduce the extension principle below: : X → 2Y we define its extended mapping Definition 5. Given a mapping ϕ S ϕ+ : 2 → 2 defined by ϕ+ (M ) = x∈M ϕ(x), for all M ∈ 2X . X Y The set of formal contexts together with Chu correspondences as morphisms forms a category denoted by ChuCors. Specifically: – objects formal contexts – arrows Chu correspondences – identity arrow ι : C → C of context C = hB, A, Ri • ιL (o) = ↓↑ ({b}), for all b ∈ B • ιR (a) = ↑↓ ({a}), for all a ∈ A 150 Ľubomír Antoni et al. 
– composition ϕ2 ◦ ϕ1 : C1 → C3 of arrows ϕ1 : C1 → C2 , ϕ2 : C2 → C3 (where Ci = hBi , Ai , Ri i, i ∈ {1, 2, 3}) • (ϕ2 ◦ ϕ1 )L : B1 → 2B3 and (ϕ2 ◦ ϕ1 )R : A3 → 2A1 • (ϕ2 ◦ ϕ1 )L (b1 ) = ↓3 ↑3 (ϕ2L+ (ϕ1L (b1 ))) • (ϕ2 ◦ ϕ1 )R (a3 ) = ↑1 ↓1 (ϕ1R+ (ϕ2R (a3 ))) The category ChuCors is *-autonomous and equivalent to the category of complete lattices and isotone Galois connection, more results on this category and its L-fuzzy extensions can be found in [13, 15, 16, 18]. 3 Categorical product on ChuCors In this section, the category ChuCors is proved to contain all finite categorical products, that is, it is a Cartesian category. To begin with, it is convenient to recall the notion of categorical product. Definition 6. Let C1 and C2 be two objects in a category. By a product of C1 and C2 we mean an object P with arrows πi : P → Ci for i ∈ {1, 2} satisfying the following condition: For any object D and arrows δi : D → Ci for i ∈ {1, 2}, there exists a unique arrow γ : D → P such that γ ◦ πi = δi for all i ∈ {1, 2}. The construction will use the notion of disjoint union of two sets S1 ] S2 which can be formally described as ({1} × S1 ) ∪ ({2} × S2 ) and, therefore, their elements will be denoted as ordered pairs (i, s) where i ∈ {1, 2} and s ∈ Si . Now, we can proceed with the construction: Definition 7. Consider C1 = hB1 , A1 , R1 i and C2 = hB2 , A2 , R2 i two formal contexts. The product of such contexts is a new formal context C1 × C2 = hB1 ] B2 , A1 ] A2 , R1×2 i where the relation R1×2 is given by ((i, b), (j, a)) ∈ R1×2 if and only if (i = j) ⇒ (b, a) ∈ Ri for any (b, a) ∈ Bi × Aj and (i, j) ∈ {1, 2} × {1, 2}. Lemma 1. The above defined contextual product fulfills the property of the cat- egorical product on the category ChuCors. Proof. We define the projection arrows hπiL , πiR i ∈ Chu(C1 ×C2 , Ci ) for i ∈ {1, 2} as follows – πiL : B1 ] B2 → Ext(Ci ) ⊆ 2Bi – πiR : Ai → Int(C1 × C2 ) ⊆ 2A1 ∪A2 – such that for any (k, x) ∈ B1 ] B2 and ai ∈ Ai the following equality holds ↑i (πiL (k, x))(ai ) = ↓1×2 (πiR (ai ))(k, x) Using the Chu construction for generalizing formal concept analysis 151 The definition of the projections is given below ( ↓i ↑i (χx )(bi ) for k = i πiL (k, x)(bi ) = for any (k, x) ∈ B1 ] B2 and bi ∈ Bi ↓ ↑ 0 (bi ) for k 6= i ( i i ↑i ↓i (χai )(y) for k = i πiR (ai )(k, y) = for any (k, y) ∈ A1 ]A2 and ai ∈ Ai . ↑k ↓k 0 (y) for k 6= i The proof that the definitions above actually provide a Chu correspondence is just a long, although straightforward, computation and it is omitted. Now, one has to show that to any formal context D = hE, F, Gi, where G ⊆ E × F and any pair of arrows (δ1 , δ2 ) with δi : D → Ci for all i ∈ {1, 2}, there exists a unique morphism γ : D → C1 × C2 such that the following diagram commutes: π1 π2 C1 < C1 × C2 > C2 < .∧. .. > .. γ δ1 . .. δ2 D We give just the definition of γ as a pair of mappings γL : E → 2B1 ]B2 and γR : A1 ] A2 → 2F – γL (e)(k, x) = δkL (e)(x) for any e ∈ E and (k, x) ∈ B1 ] B2 . – γR (k, y)(f ) = δkR (y)(f ) for any f ∈ F and (k, y) ∈ A1 ] A2 . Checking the condition of categorical product is again straightforward but long and tedious and, hence, it is omitted. t u We have just proved that binary products exist, but a cartesian category requires the existence of all finite products. 
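The binary product context of Definition 7 is easy to compute explicitly. The following Python sketch is only an illustration on two invented toy contexts: elements of the disjoint unions are tagged with 1 or 2, and a pair ((i, b), (j, a)) is incident exactly when i = j implies (b, a) ∈ R_i.

# Sketch of the contextual product of Definition 7 (illustrative toy contexts).
def product_context(C1, C2):
    """C1, C2 are triples (B, A, R); disjoint-union elements are pairs (i, x)."""
    (B1, A1, R1), (B2, A2, R2) = C1, C2
    B = {(1, b) for b in B1} | {(2, b) for b in B2}
    A = {(1, a) for a in A1} | {(2, a) for a in A2}
    rel = {1: R1, 2: R2}
    # ((i, b), (j, a)) is incident iff (i = j) implies (b, a) in R_i
    R = {((i, b), (j, a)) for (i, b) in B for (j, a) in A
         if i != j or (b, a) in rel[i]}
    return B, A, R

C1 = ({"x"}, {"p"}, {("x", "p")})
C2 = ({"y"}, {"q"}, set())
B, A, R = product_context(C1, C2)
print(((1, "x"), (2, "q")) in R,   # True: cross-type pairs are always incident
      ((2, "y"), (2, "q")) in R)   # False: (y, q) is not in R_2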
If we recall the well-known categorical theorem which states that if a category has a terminal object and binary product, then it has all finite products, we have just to prove the existence of a terminal object (namely, the nullary product) in order to prove ChuCors to be cartesian. Any formal context of the form hB, A, B × Ai where the incidence relation is the full cartesian product of the sets of objects and attributes is (isomorphic to) the terminal object of ChuCors. Such formal context has just one formal concept hB, Ai; hence, from any other formal context there is just one Chu correspondence to hB, A, B × Ai. 4 Tensor product and its bifunctor property Apart from the categorical product, another product-like construction can be given in the category ChuCors, for which the notion of transposed context C ∗ is needed. Given a formal context C = hB, A, Ri, its transposed context is C ∗ = hA, B, Rt i, where Rt (a, b) holds iff R(b, a) holds. Now, if ϕ ∈ Chu(C1 , C2 ), one can consider ϕ∗ ∈ Chu(C2∗ , C1∗ ) defined by ϕ∗L = ϕR and ϕ∗R = ϕL . 152 Ľubomír Antoni et al. Definition 8. The tensor product of formal contexts Ci = hBi , Ai , Ri i for i ∈ {1, 2} is defined as the formal context C1 C2 = hB1 ×B2 , Chu(C1 , C2∗ ), R i where R ((b1 , b2 ), ϕ) = ↓2 (ϕL (b1 ))(b2 ). Mori studied in [18] the properties of the tensor product above, and proved that ChuCors with is a symmetric and monoidal category. Those results were later extended to the L-fuzzy case in [13]. In both papers, the structure of the formal concepts of a product context was established as an ordered pair formed by a bond and a set of Chu correspondences. Lemma 2. Let Ci = hBi , Ai , Ri i for i ∈ {1, 2} be two formal contexts, and let Chu(C1 ,C2∗ ) hβ, Xi ∈ Bonds(C V ∗ 1 , C2 ) × 2 be an arbitrary formal concept of C1 C2 . Then β = ψ∈X βψ and X = {ψ ∈ Chu(C1 , C2∗ ) | β ≤ βψ }. Proof. Let X be an arbitrary subset of Chu(C1 , C2∗ ). Then, for all (b1 , b2 ) ∈ B1 × B2 , we have ^ ↓C1 C2 (X)(b1 , b2 ) = (ψ ∈ X) ⇒ ↓2 (ψL (b1 ))(b2 ) ψ∈Chu(C1 ,C2∗ ) ^ ^ = ↓2 (ψL (b1 ))(b2 ) = βψ (b1 , b2 ) ψ∈X ψ∈X Let β be an arbitrary subset of B1 × B2 . Then, for all ψ ∈ Chu(C1 , C2∗ ) ^ ↑C1 C2 (β)(ψ) = β(b1 , b2 ) ⇒ ↓2 (ψL (b1 ))(b2 ) (b1 ,b2 )∈B1 ×B2 ^ = β(b1 , b2 ) ⇒ βψ (b1 , b2 ) (b1 ,b2 )∈B1 ×B2 Hence ↑C1 C2 (β) = {ψ ∈ Chu(C1 , C2∗ ) | β ≤ βψ } t u We now introduce the notion of product of one context with a Chu corre- spondence. Definition 9. Let Ci = hBi , Ai , Ri i for i ∈ {0, 1, 2} be formal contexts, and consider ϕ ∈ Chu(C1 , C2 ). Then, the pair of mappings (C0 ϕ)L : B0 × B1 → 2B0 ×B2 (C0 ϕ)R : Chu(C0 , C2 ) → 2Chu(C0 ,C1 ) is defined as follows: – (C0 ϕ)L (b, b1 )(o, b2 ) = ↓C0 C2 ↑C0 C2 (γϕb,b1 )(o, b2 ) where γϕb,b1 (o, b2 ) = (b = o) ∧ ϕL (b1 )(b2 ) for any b, o ∈ B0 , bi ∈ Bi with i ∈ {1, 2} – (C0 ϕ)R (ψ2 )(ψ1 ) = ψ1 ≤ (ψ2 ◦ ϕ∗ ) for any ψi ∈ Chu(C0 , Ci ) As one could expect, the result is a Chu correspondence between the products of the contexts. Specifically: Using the Chu construction for generalizing formal concept analysis 153 Lemma 3. Let Ci = hBi , Ai , Ri i be formal contexts for i ∈ {0, 1, 2}, and con- sider ϕ ∈ Chu(C1 , C2 ). Then C0 ϕ ∈ Chu(C0 C1 , C0 C2 ). Proof. (C0 ϕ)L (b, b1 ) ∈ Ext(C0 C2 ) for any (b, b1 ) ∈ B0 × B1 follows directly from its definition. (C0 ϕ)R (ψ) ∈ Int(C0 C1 ) for any ψ ∈ Chu(C0 , C1 ) follows from Lemma 2. 
Consider an arbitrary b ∈ B0 , b1 ∈ B1 and ψ2 ∈ Chu(C0 , C2∗ ) ↑C0 C2 (C0 ϕ)L (b, b1 ) (ψ2 ) = ↑C0 C2 ↓C0 C2 ↑C0 C2 (γϕb,b1 )(ψ2 ) = ↑C0 C2 (γϕb,b1 )(ψ2 ) ^ = γϕb,b1 (o, b2 ) ⇒ ↓ (ψ2R (b2 ))(o) (o,b2 )∈B0 ×B2 ^ = (o = b) ∧ ϕL (b1 )(b2 ) ⇒ ↓ (ψ2R (b2 ))(o) (o,b2 )∈B0 ×B2 ^ ^ = (o = b) ⇒ ϕL (b1 )(b2 ) ⇒ ↓ (ψ2R (b2 ))(o) o∈B0 b2 ∈B2 ^ ^ = (o = b) ⇒ (ϕL (b1 )(b2 ) ⇒ ↓ (ψ2R (b2 ))(o)) o∈B0 b2 ∈B2 ^ = ϕL (b1 )(b2 ) ⇒ ↓ (ψ2R (b2 ))(b) b2 ∈B2 ^ ^ = ϕL (b1 )(b2 ) ⇒ (ψ2R (b2 )(a) ⇒ R(b, a)) b2 ∈B2 a∈A ^ _ = (ϕL (b1 )(b2 ) ∧ ψ2R (b2 )(a)) ⇒ R(b, a) a∈A b2 ∈B2 ^ = ψ2R+ (ϕL (b1 ))(a) ⇒ R(b, a) a∈A = ↓ (ψ2R+ (ϕL (b1 ))(b) = ↓↑↓ (ψ2R+ (ϕL (b1 ))(b) = ↓ ((ϕ ◦ ψ2 )R (b1 ))(b) Note the use above of the extended mapping as given in Definition 5 in relation to the composition of Chu correspondences. On the other hand, we have ↓C0 C1 ((C0 ϕ)R (ψ2 ))(b, b1 ) ^ = ((C0 ϕ)R (ψ2 )(ψ1 ) ⇒ ↓ (ψ1R (b1 ))(b)) ψ1 ∈Chu(C0 ,C1 ) ^ = ((ψ1 ≥ ϕ ◦ ψ2 ) ⇒ ↓ (ψ1R (b1 ))(b)) ψ1 ∈Chu(C0 ,C1 ) 154 Ľubomír Antoni et al. ^ = ↓ (ψ1R (b1 ))(b) = ↓ ((ϕ ◦ ψ2 )R (b1 ))(b) ψ1 ∈Chu(C0 ,C1 ) ψ1 ≥ϕ◦ψ2 Hence ↑C0 C2 ((C0 ϕ)L (b, b1 ))(ψ2 ) = ↓C0 C1 ((C0 ϕ)R (ψ2 ))(b, b1 ). So if ϕ ∈ Chu(C1 , C2 ) then C0 ϕ ∈ Chu(C0 C1 , C0 C2 ). t u Given a fixed formal context C, the tensor product C (−) forms a mapping between objects of ChuCors assigning to any formal context D the formal context CD. Moreover to any arrow ϕ ∈ Chu(C1 , C2 ) it assigns an arrow Cϕ ∈ Chu(C C1 , C C2 ). We will show that this mapping preserves the unit arrows and the composition of Chu correspondences. Hence the mapping forms an endofunctor on ChuCors, that is, a covariant functor from the category ChuCors to itself. To begin with, let us recall the definition of functor between two categories: Definition 10 (See [6]). A covariant functor F : C → D between categories C and D is a mapping of objects to objects and arrows to arrows, in such a way that: – For any morphism f : A → B, one has F (f ) : F (A) → F (B) – F (g ◦ f ) = F (g) ◦ F (f ) – F (1A ) = 1F (A) . Lemma 4. Let C = hB, A, Ri be a formal context. C (−) is an endofunctor on ChuCors. Proof. Consider the unit morphism ιC1 of a formal context C1 = hB1 , A1 , R1 i, and let us show that (C ιC1 ) = ιCC1 . In other words, C (−) respects unit arrows in ChuCors. ↑CC1 (C ιC1 )(b, b1 ) (ψ) ^ = (o = b) ∧ ιC1 L (b1 )(o1 ) ⇒ ↓1 (ψL (o))(o1 ) (o,o1 )∈B×B1 ^ = ↓1 ↑1 (χb1 )(o1 ) ⇒ ↓1 (ψL (b))(o1 ) o1 ∈B1 ^ ^ = ↓1 ↑1 (χb1 )(o1 ) ⇒ ψL (b)(a1 ) ⇒ R(o1 , a1 ) o1 ∈B1 a1 ∈A1 ^ ^ = ↓1 ↑1 (χb1 )(o1 ) ⇒ ψL (b)(a1 ) ⇒ R(o1 , a1 ) o1 ∈B1 a1 ∈A1 ^ ^ = ψL (b)(a1 ) ⇒ ↓1 ↑1 (χb1 )(o1 ) ⇒ R(o1 , a1 ) o1 ∈B1 a1 ∈A1 ^ ^ = ψL (b)(a1 ) ⇒ ↓1 ↑1 (χb1 )(o1 ) ⇒ R(o1 , a1 ) a1 ∈A1 o1 ∈B1 ^ = ψL (b)(a1 ) ⇒ ↑1 ↓1 ↑1 (χb1 )(a1 ) a1 ∈A1 Using the Chu construction for generalizing formal concept analysis 155 ^ = ψL (b)(a1 ) ⇒ R1 (b1 , a1 ) = ↓1 (ψL (b))(b1 ) a1 ∈A1 and, on the other hand, we have ↑CC1 (ιCC1 (b, b1 ))(ψ) = ↑CC1 (χ(b,b1 ) )(ψ) ^ = χ(b,b1 ) (o, o1 ) ⇒ ↓1 (ψL (o))(o1 ) (o,o1 )∈B×B1 = ↓1 (ψL (b))(b1 ) As a result, we have obtained ↑CC1 ((C ιC1 )(b, b1 ))(ψ) =↑CC1 (ιCC1 (b, b1 ))(ψ) for any (b, b1 ) ∈ B × B1 and any ψ ∈ Chu(C, C1 ); hence, ιCC1 = (C ιC1 ). We will show now that C (−) preserves the composition of arrows. Specif- ically, this means that for any two arrows ϕi ∈ Chu(Ci , Ci+1 ) for i ∈ {1, 2} it holds that C (ϕ1 ◦ ϕ2 ) = (C ϕ1 ) ◦ (C ϕ2 ). 
↑CC3 C (ϕ1 ◦ ϕ2 ) L (b, b1 ) (ψ3 ) ^ = (o = b) ∧ (ϕ1 ◦ ϕ2 )L (b1 )(b3 ) ⇒ ↓ (ψ3R (b3 ))(o) (o,b3 )∈B×B3 ^ = (ϕ1 ◦ ϕ2 )L (b1 )(b3 ) ⇒ ↓ (ψ3R (b3 ))(b) b3 ∈B3 (by similar operations to those in the first part of the proof) = ↓ (ϕ1 ◦ ϕ2 ◦ ψ3 )L (b1 ) (b) On the other hand, and writing F for C − in order to simplify the resulting expressions, we have ↑F C3 ((F ϕ1 ◦ F ϕ2 )L (b, b1 ))(ψ3 ) = ↑F C3 ↓F C3 ↑F C3 (F ϕ2 )L+ (F ϕ1 )L (b, b1 ) (ψ3 ) ^ = (o,b3 )∈B×B3 _ (F ϕ1 )L (b, b1 )(j, b2 ) ∧ (F ϕ2 )L (j, b2 )(o, b3 ) ⇒ ↓ (ψ3R (b3 ))(o) (j,b2 )∈B×B2 ^ ^ = ϕ1L (b1 )(b2 ) ∧ ϕ2L (b2 )(b3 ) ⇒ ↓ (ψ3R (b3 ))(b) b3 ∈B3 b2 ∈B2 ^ _ = ϕ1L (b1 )(b2 ) ∧ ϕ2L (b2 )(b3 ) ⇒ ↓ (ψ3R (b3 ))(b) b3 ∈B3 b2 ∈B2 ^ = ϕ2L+ (ϕ1L (b1 ))(b3 ) ⇒ ↓ (ψ3R (b3 ))(b) b3 ∈B3 156 Ľubomír Antoni et al. ^ = (ϕ1 ◦ ϕ2 )L (b1 )(b3 ) ⇒ ↓ (ψ3R (b3 ))(b) b3 ∈B3 From the previous equalities we see that C (ϕ1 ◦ ϕ2 ) = (C ϕ1 ) ◦ (C ϕ2 ). Hence, composition is preserved. As a result, the mapping C (−) forms a functor from ChuCors to itself. t u All the previous computations can be applied to the first argument without any problems, hence we can directly state the following proposition. Proposition 1. The tensor product forms a bifunctor − − from ChuCors × ChuCors to ChuCors. 5 The Chu construction on ChuCors and second order formal concept analysis A second order formal context [14] focuses on the external formal contexts and it serves a bridge between the L-fuzzy [3, 7] and heterogeneous [4] frameworks. Definition 11. SConsiderS two non-empty index sets I and J and an L-fuzzy formal context h i∈I Bi , j∈J Aj , ri, whereby – Bi1 ∩ Bi2 = ∅ for any i1 , i2 ∈ I, – Aj1S∩ Aj2 = ∅Sfor any j1 , j2 ∈ J, – r : i∈I Bi × j∈J Aj −→ L. Moreover, consider two non-empty sets of L-fuzzy formal contexts (external for- mal contexts) notated by – {hBi , Ti , pi i : i ∈ I}, whereby Ci = hBi , Ti , pi i, – {hOj , Aj , qj i : j ∈ J}, whereby Dj = hOj , Aj , qj i. A second order formal context is a tuple D[ [ [ E Bi , {Ci ; i ∈ I}, Aj , {Dj ; j ∈ J}, ri,j , i∈I j∈J (i,j)∈I×J whereby ri,j : Bi × Aj −→ L is defined as ri,j (o, a) = r(o, a) for any o ∈ Bi and a ∈ Aj . The Chu construction [8] is a theoretical process that, from a symmetric monoidal closed (autonomous) category and a dualizing object, generates a *- autonomous category. The basic theory of *-autonomous categories and their properties are given in [5, 6]. In the following, the construction will be applied on ChuCors and the dual- izing object ⊥ = h{}, {}, 6=i as inputs. In this section it is shown how second order FCA [14] is connected to the output of such construction. The category generated by the Chu construction and ChuCors and ⊥ will be denoted by CHU(ChuCors, ⊥): Using the Chu construction for generalizing formal concept analysis 157 – Its objects are triplets of the form hC, D, ρi where • C and D are objects of the input category ChuCors (i.e. formal contexts) • ρ is an arrow in Chu(C D, ⊥) – Its morphisms are pairs of the form hϕ, ψi : hC1 , C2 , ρ1 i → hD1 , D2 , ρ2 i where Ci and Di are formal contexts for i ∈ {1, 2} and • ϕ and ψ are elements from Chu(C1 , D1 ) and Chu(D2 , C2 ), respectively, such that the following diagram commutes C1 ψ C1 D2 > C1 C2 ϕ D2 ρ1 ∨ ∨ D1 D2 >⊥ ρ2 or, equivalently, the following equality holds (C1 ψ) ◦ ρ1 = (ϕ D2 ) ◦ ρ2 There are some interesting facts in the previous construction with respect to the second order FCA [14]: 1. 
To begin with, every object hC1 , C2 , ρi in CHU(ChuCorsL , ⊥), and recall that ρ ∈ Chu(C1 C2 , ⊥), can be represented as a second order formal context (from Definition 11). Simply take into account that, from basic properties of the tensor product, we can obtain Chu(C1 C2 , ⊥) ∼ = Chu(C1 , C2∗ ). Specifically, as ChuCors is a closed monoidal category, we have that for every three formal contexts C1 , C2 , C3 the following isomorphism holds ChuCors(C1 C2 , C3 ) ∼ = ChuCors(C1 , C2 ( C3 ), whereby C2 ( C3 denotes the value at C3 of the right adjoint and recall that C2 ( ⊥ ∼ = C2∗ because ChuCors is *-autonomous. The other necessary details about closed monoidal categories and the corresponding notations one can find in [6]. 2. Similarly, any second order formal context (from Definition 11) is repre- sentable by an object of CHU(ChuCorsL , ⊥). 6 Conclusions and future work After introducing the basic definitions needed from category theory and formal concept analysis, in this paper we have studied two different product construc- tions in the category ChuCors, namely the categorical product and the tensor product. The existence of products allows to represent tables and, hence, bi- nary relations; the tensor product is proved to fulfill the required properties of a bifunctor, which enables us to consider the Chu construction on the cat- egory ChuCors. As a first application, we have sketched the representation of 158 Ľubomír Antoni et al. second order formal concept analysis [14] in terms of the Chu construction on the category ChuCors. The use of different subcategories of ChuCors as input to the Chu construc- tion seems to be an interesting way of obtaining different existing generalizations of FCA. For future work, we are planning to provide representations based on the Chu construction for one-sided FCA, heterogeneous FCA, multi-adjoint FCA, etcetera. References 1. S. Abramsky. Coalgebras, Chu Spaces, and Representations of Physical Systems. Journal of Philosophical Logic, 42(3):551–574, 2013. 2. S. Abramsky. Big Toy Models: Representing Physical Systems As Chu Spaces. Synthese, 186(3):697–718, 2012. 3. C. Alcalde, A. Burusco, R. Fuentes-González, The use of two relations in L-fuzzy contexts. Information Sciences, 301:1–12, 2015. 4. L. Antoni, S. Krajči, O. Krı́dlo, B. Macek, L. Pisková, On heterogeneous formal contexts. Fuzzy Sets and Systems, 234:22–33, 2014. 5. M. Barr, *-Autonomous categories, vol. 752 of Lecture Notes in Mathematics. Springer-Verlag, 1979. 6. M. Barr, Ch. Wells, Category theory for computing science, 2nd ed., Prentice Hall International (UK) Ltd., 1995. 7. R. Bělohlávek. Concept lattices and order in fuzzy logic. Annals of Pure and Applied Logic, 128:277–298, 2004. 8. P.-H. Chu, Constructing *-autonomous categories. Appendix to [5], pages 103–107. 9. J. T. Denniston, A. Melton, and S. E. Rodabaugh. Formal concept analysis and lattice-valued Chu systems. Fuzzy Sets and Systems, 216:52–90, 2013. 10. P. Hitzler and G.-Q. Zhang. A cartesian closed category of approximable concept structures. Lecture Notes in Computer Science, 3127:170–185, 2004. 11. M. Huang, Q. Li, and L. Guo. Formal Contexts for Algebraic Domains. Electronic Notes in Theoretical Computer Science, 301:79–90, 2014. 12. S. Krajči. A categorical view at generalized concept lattices. Kybernetika, 43(2):255–264, 2007. 13. O. Krı́dlo, S. Krajči, and M. Ojeda-Aciego. The category of L-Chu correspondences and the structure of L-bonds. Fundamenta Informaticae, 115(4):297–325, 2012. 14. O. Krı́dlo, P. 
Mihalčin, S. Krajči, and L. Antoni. Formal concept analysis of higher order. Proceedings of Concept Lattices and their Applications (CLA), 117–128, 2013. 15. O. Krı́dlo and M. Ojeda-Aciego. On L-fuzzy Chu correspondences. Intl J of Computer Mathematics, 88(9):1808–1818, 2011. 16. O. Krı́dlo and M. Ojeda-Aciego. Revising the link between L-Chu Correspondences and Completely Lattice L-ordered Sets. Annals of Mathematics and Artificial Intelligence 72:91–113, 2014. 17. M. Krötzsch, P. Hitzler, and G.-Q. Zhang. Morphisms in context. Lecture Notes in Computer Science, 3596:223–237, 2005. 18. H. Mori. Chu correspondences. Hokkaido Mathematical Journal, 37:147–214, 2008. 19. J.G. Stell. Formal Concept Analysis over Graphs and Hypergraphs. Lecture Notes in Computer Science, 8323:165–179, 2014. 20. G.-Q. Zhang and G. Shen. Approximable concepts, Chu spaces, and information systems. Theory and Applications of Categories, 17(5):80–102, 2006. From formal concepts to analogical complexes Laurent Miclet1 and Jacques Nicolas2 1 Université de Rennes 1, UMR IRISA, Dyliss team, Rennes, France, miclet@univ-rennes1.fr 2 Inria Rennes, France, jacques.nicolas@inria.fr Abstract. Reasoning by analogy is an important component of common sense reasoning whose formalization has undergone recent improvements with the logical and algebraic study of the analogical proportion. The starting point of this study considers analogical proportions on a formal context. We introduce analogical complexes, a companion of formal con- cepts formed by using analogy between four subsets of objects in place of the initial binary relation. They represent subsets of objects and at- tributes that share a maximal analogical relation. We show that the set of all complexes can be structured in an analogical complex lattice and give explicit formulae for the computation of their infimum and supremum. Keywords: analogical reasoning, analogical proportion, formal concept, analogical complex, lattice of analogical complexes 1 Introduction Analogical reasoning [4] plays an important role in human reasoning. It en- ables us to draw plausible conclusions by exploiting parallels between situations, and as such has been studied in AI for a long time, e.g., [5, 9] under various approaches [3]. A key pattern which is associated with the idea of analogical reasoning is the notion of analogical proportion (AP), i. e. a statement between two pairs (A, B) and (C, D) of the form ‘A is to B as C is to D’ where all elements A, B, C, D are in a same category . However, it is only in the last decade that researchers working in computa- tional linguistics have started to study these proportions in a formal way [6, 17, 19]. More recently, analogical proportions have been shown as being of particu- lar interest for classification tasks [10] or for solving IQ tests [2]. Moreover, in the last five years, there has been a number of works, e.g., [11, 15] studying the propositional logic modeling of analogical proportions. In all previous cases, the ability to work on the set of all possible analogical proportions is required, either for checking missing objects or attributes or for making informed recommendations or more generally ensuring the completeness and efficiency of reasoning. In practice the analysis of objects composed of binary attributes, such as those studied by Formal Concept Analysis, is an important and easy context where AP are used. 
The question is whether it is possible to obtain a good representation of the space of all AP by applying the principles of c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 159–170, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 160 Laurent Miclet and Jacques Nicolas FCA. A heuristic algorithm to discover such proportions by inspecting a lattice of formal concepts has been proposed in [14]. Moreover, a definition of an analogical proportion between formal concepts has been given in [13], as a particular case of proportions between elements of a lattice, studied also in [18]. In this paper, we are interested in a slightly different task involving a more integrated view of concept categorization and analogy: looking for the structure of the space of all AP. Our goal is to build an extension of formal concepts con- sidering the presence of analogical proportions as the funding relation instead of the initial binary relation between objects and attributes. We call this ex- tension analogical complexes, which isolate subcontexts in formal contexts with a certain structure reflecting the existence of a maximal analogical proportion between subsets of objects and subsets of attributes. 2 Basics on Analogical Proportion Definition 1 (Analogical proportion [7, 12]). An analogical proportion (AP) on a set X is a quaternary relation on X, i.e. a subset of X 4 whose elements (x, y, z, t), written x : y :: z : t , which reads ’x is to y as z is to t’, must obey the following two axioms: 1. Symmetry of ’as’: x : y :: z : t ⇔ z : t :: x : y 2. Exchange of means: x : y :: z : t ⇔ x : z :: y : t In case of formal contexts, objects are described by boolean attributes. An AP (x, y, z, t) between four Boolean variables exists if the following formula is true: (x ∧ ¬y) ⇔ (z ∧ ¬t) and (y ∧ ¬x) ⇔ (t ∧ ¬z) Basically, the formula expresses that the dissimilarity observed between x and y is the same as the dissimilarity between z and t. An equivalent formula is x 6= y ⇔ (x = z ∧ y = t) and x = y ⇔ z = t It has 6 models of Boolean 4-tuples among the 16 possible ones. Note that this includes the trivial cases where x = y = z = t. Since we are only interested in this paper in non trivial analogical proportions, we further require that x 6= t and y 6= z. This reduces the number of possible Boolean 4-tuples in AP to four and it leads to the notion of analogical schema that we will use for the definition of analogical complexes. 0011 0 1 0 1 Definition 2 (Analogical schema). The binary matrix AS = 1 0 1 0 is 1100 called an analogical schema. We write AS(i, j) if the value at row i and column j of matrix AS is 1 (e.g. AS(1,3) and AS(1,4)) . From formal concepts to analogical complexes 161 The analogical schema may be seen as a formal context on four objects o1 , o2 , o3 , o4 that are in the non-trivial AP: o1 : o2 :: o3 : o4 . The figure 1 shows the associated concept lattice. In this lattice, A ∧ D = B ∧ C and A ∨ D = B ∨ C. The figure also give names for each column and row profiles that we call object and attribute types: for instance the first column as type 1 and the second row as type b. > AB AC BD CD 1 2 3 4 {4} {3} {2} {1} a ×× b × × c× × d×× {a} {b} {c} {d} A B C D ⊥ Fig. 1. Left:Concept lattice of an analogical schema (reduced labeling). Analogical schema with object and attribute types. We use in this paper the zoo dataset proposed by R. 
Forsyth [8] for illustra- tion purpose. We call smallzoo the formal context extracted from this database corresponding to attributes 2 to 9 and to the objects corresponding to the two largest classes 1 and 2. Moreover, this context has been clarified and we have chosen arbitrarily one object for each of the 10 different types of objects with different attribute profiles. The corresponding table is given below. smallzoo 2 3 4 5 6 7 8 9 18 hair feathers eggs milk airborne aquatic predator toothed type 1 aardvark 1 0 0 1 0 0 1 1 1 12 chicken 0 1 1 0 1 0 0 0 2 17 crow 0 1 1 0 1 0 1 0 2 20 dolphin 0 0 0 1 0 1 1 1 1 22 duck 0 1 1 0 1 1 0 0 2 28 fruitbat 1 0 0 1 1 0 0 1 1 42 kiwi 0 1 1 0 0 0 1 0 2 49 mink 1 0 0 1 0 1 1 1 1 59 penguin 0 1 1 0 0 1 1 0 2 64 platypus 1 0 1 1 0 1 1 0 1 162 Laurent Miclet and Jacques Nicolas The formal concept lattice is provided in figure 2, as computed by FCA Extension [16]. It contains 31 elements. The central elements (at least two objects and two attributes) are listed below: c(3) {20; 49; 59; 64}, {7; 8} c(6) {1; 20; 28; 49}, {5; 9} c(7) {1; 20; 49; 64}, {5; 8} c(8) {1; 20; 49}, {5; 8; 9} c(9) {20; 49; 64}, {5; 7; 8} c(10) {20; 49}, {5; 7; 8; 9} c(12) {17; 42; 59; 64}, {4; 8} c(13) {22; 59; 64}, {4; 7} c(14) {59; 64}, {4; 7; 8} c(15) {12; 17; 22; 42; 59}, {3; 4} c(16) {17; 42; 59}, {3; 4; 8} c(17) {22; 59}, {3; 4; 7} c(19) {12; 17; 22}, {3; 4; 6} c(22) {1; 28; 49; 64}, {2; 5} c(23) {1; 28; 49}, {2; 5; 9} c(24) {1; 49; 64}, {2; 5; 8} c(25) {1; 49}, {2; 5; 8; 9} c(26) {49; 64}, {2; 5; 7; 8} Example 1. If one extracts in smallzoo the subcontext crossing (12, 28, 59, 49) - that is, (chicken, fruitbat, penguin, mink)- and (7, 2, 3, 6) -(aquatic, hair, feathers, airborne)-, it is clearly an analogical schema. The 4-tuple (chicken : fruitbat :: penguin : mink) is an analogical proportion that finds a support using attributes (aquatic, hair, feathers, airborne). Each attribute reflects one of the four possible types of Boolean analogy. For instance, hair is false for chicken and penguin and true for fruitbat and mink whereas feathers is true for chicken and penguin and false for fruitbat and mink. The observed analogy can be explained thanks to this typology: the dissimilarity between chicken and fruitbat based on the opposition feather/hair is the same as the dissimilarity between penguin and mink and there are two other opposite attributes, airborne and aquatic, that explain the similarity within each ’is to’ relation. Note that the analogical schema if fully symmetric and thus one could also in principle write AP between attributes: hair:feathers::aquatic: airborne. 3 An analogical complex is to an analogical proportion as a concept is to a binary relation 3.1 Analogical complexes A formal concept on a context (X, Y, I) is a maximal subcontext for which relation I is valid. We define analogical complexes in the same way: they are maximal subcontexts for which the 4-tuples are in AP. This requires to split objects and attributes in four classes. Definition 3 (Analogical complex). Given a formal context (X, Y, I), a set of objects O ⊆ X, O = O1 ∪ O2 ∪ O3 ∪ O4 , a set of attributes A ⊆ Y , A = A1 ∪ A2 ∪ A3 ∪ A4 , and a binary relation I, the subcontext (O, A) forms an analogical complex (O1,4 , A1,4 ) iff From formal concepts to analogical complexes 163 Fig. 2. Formal concept lattice of formal context smallzoo. Drawing from Concept Ex- plorer [20]. 1. 
The binary relation is compatible with the analogical schema AS: ∀o ∈ Oi , i = 1..4, ∀a ∈ Aj , j = 1..4, I(o, a) ⇔ AS(i, j). 2. The context is maximal with respect to the first property (⊕ denotes the ex- clusive or and \ the set-theoretic difference): ∀o ∈ X\O, ∃j ∈ [1, 4], ∃a ∈ Aj , I(o, a) ⊕ AS(i, j). ∀a ∈ Y \A, ∃i ∈ [1, 4], ∃o ∈ Oi , I(o, a) ⊕ AS(i, j). The first property states that the value of an attribute for an object in a com- plex is a function of object type and attribute type (integer from 1 to 4) given by the analogical schema. The second property states that adding an object (resp. an attribute) to the complex would discard the first property for at least one attribute (resp. object) value. Note that the ways analogical schema or analogi- cal complex are defined are completely symmetric. Thus the role of objects and attributes may be interchanged in all properties on analogical complexes. 164 Laurent Miclet and Jacques Nicolas Example 2. We extract two subcontexts from smallzoo, highlighting analogical schemas by sorting rows and columns. A1 A2 A3 A4 a7 a6 O1 o12 0 1 o17 0 1 o28 0 1 O2 o12 0 1 A1 A2 A3 A4 o17 0 1 a7 a8 a2 a5 a9 a3 a4 a6 o28 0 1 O1 o12 (chicken) 0 0 0 0 0 1 1 1 O3 o20 1 0 O2 o28 (f ruitbat) 0 0 1 1 1 0 0 1 o49 1 0 O3 o59 (penguin) 1 1 0 0 0 1 1 0 o59 1 0 O4 o49 (mink) 1 1 1 1 1 0 0 0 o64 1 0 O4 o20 1 0 o49 1 0 o59 1 0 o64 1 0 These subcontexts are maximal in the sense that it is not possible to add an object or an attribute without breaking the analogical proportion. They are associated to the following analogical complexes: ({12}, {28}, {59}, {49}), ({7, 8}, {2, 5, 9}, {3, 4}, {6}) ({12, 17, 28}, {12, 17, 28}, {20, 49, 59, 64}, {20, 49, 59, 64}), ({7}, ∅, ∅, {6}) The first example provides a strong analogical relation between four animals in the context smallzoo since it uses all attributes and all the types of analogy. Attribute clusters correspond to aquatic predators, toothed animals with hair and milk, birds (feathers and eggs) and flying animals (airborne). The second example shows some of the sets in analogical complexes can be empty. In such a case some sets may be duplicated. Among all complexes, those that exhibit all types of analogy are particularly meaningful: we call them complete complexes. 3.2 Complete analogical complexes (CAC) Definition 4. A complex C = (O1,4 , A1,4 ) is complete if none of its eight sets are empty. construction, if CA = (O1,4 , A1,4 ) is a complete analogical complex and By S if A = i=1,4 Ai , the following formula holds: ∀(o1 , o2 , o3 , o4 ) ∈ O1,4 , ∀(a1 , a2 , a3 , a4 ) ∈ A1,4 (o↑1 ∩ o↑4 ) ∩ A = (o↑2 ∩ o↑3 ) ∩ A = ∅ and o↑1 ∪ o↑4 = o↑2 ∩ o↑3 = A From formal concepts to analogical complexes 165 The next proposition shows that CAC exhibits strong discrimination and sim- ilarity properties among pairs of objects and attributes. The similarity condition alone would lead to the concatenation of independent (non overlapping) formal concepts. The discrimination condition tempers this tendency by requiring the simultaneous presence of opposite pairs. Proposition 1. Let us define on a formal context F C = (X, Y, I) the relations: discrimination(oi , oj , ak , al ) = I(oi , ak ) ∧ I(oj , al ) ∧ ¬I(oi , al ) ∧ ¬I(oj , ak ). similarity(oi , oj , ak , al ) = I(oi , ak ) ∧ I(oj , ak ) ∧ I(oi , al ) ∧ I(oj , al ). A complete analogical complex (O1,4 , A1,4 ) in F C corresponds to a maximal subcontext such that: 1. object pair discrimination (resp. 
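The compatibility condition of Definition 3 can be checked mechanically. The Python sketch below is an illustration only: it uses rows transcribed from the smallzoo table (attributes 2–9) together with the schema matrix AS of Definition 2, tests property 1 of Definition 3 (maximality, property 2, is not tested here), and confirms the proportion of Example 1 as well as the first complex of Example 2.

# Sketch of property 1 of Definition 3 on data transcribed from smallzoo.
AS = ((0, 0, 1, 1),   # analogical schema of Definition 2; row i = object type i+1
      (0, 1, 0, 1),
      (1, 0, 1, 0),
      (1, 1, 0, 0))

ROWS = {  # object id -> values of attributes 2..9
          # (hair, feathers, eggs, milk, airborne, aquatic, predator, toothed)
    12: dict(zip(range(2, 10), (0, 1, 1, 0, 1, 0, 0, 0))),  # chicken
    28: dict(zip(range(2, 10), (1, 0, 0, 1, 1, 0, 0, 1))),  # fruitbat
    59: dict(zip(range(2, 10), (0, 1, 1, 0, 0, 1, 1, 0))),  # penguin
    49: dict(zip(range(2, 10), (1, 0, 0, 1, 0, 1, 1, 1))),  # mink
}

def compatible(O, A):
    """I(o, a) <=> AS(i, j) whenever o has object type i and a has attribute type j."""
    return all(bool(ROWS[o][a]) == bool(AS[i][j])
               for i in range(4) for o in O[i]
               for j in range(4) for a in A[j])

# Example 1: (chicken, fruitbat, penguin, mink) with (aquatic, hair, feathers, airborne)
print(compatible(({12}, {28}, {59}, {49}), ({7}, {2}, {3}, {6})))                # True
# First complex of Example 2: ({12},{28},{59},{49}), ({7,8},{2,5,9},{3,4},{6})
print(compatible(({12}, {28}, {59}, {49}), ({7, 8}, {2, 5, 9}, {3, 4}, {6})))    # True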
similarity): ∀(oi , oj ) ∈ Oi × Oj , i 6= j, ∃(ak , al ) ∈ Ak × Al such that discrimination(oi , oj , ak , al ) (resp. similarity(oi , oj , ak , al )); 2. attribute pair discrimination (res. similarity): ∀(ak , al ) ∈ Ak × Al , k 6= l, ∃(oi , oj ) ∈ Oi × Oj such that discrimination(oi , oj , ak , al ) (resp. similarity(oi , oj , ak , al )). Proof. Since objects and attribute have a completely symmetrical role, it is suffi- cient to prove the proposition for object pairs. It proceeds easily by enumerating the possible type pairs with different elements. If objects have type 1 and 2 or 3 and 4, attributes allowing object pair discrimination have type b and c and attributes allowing object pair similarity have type a and d. If objects have type 1 and 3 or 2 and 4, attributes allowing object pair discrimination have type a and d and attributes allowing object pair similarity have type b and c. If objects have type 1 and 4 and if t1 ∈ T1 = {a, b} and t2 ∈ T2 = {c, d}, attributes allow- ing object pair discrimination have type t1 and t2 and attributes allowing object pair similarity have different types both in T1 or both in T2 . If objects have type 2 and 3 and if t1 ∈ T1 = {a, c} and t2 ∈ T2 = {b, d}, attributes allowing object pair discrimination have type t1 and t2 and attributes allowing object pair sim- ilarity have different types both in T1 or both in T2 . t u In case of incomplete complexes, some of these properties are no more relevant and a degenerate behaviour may appear: some of the sets may be identical. This fact allows to establish a new proposition on complete complexes: Proposition 2. In a complete analogical complex, side-by-side intersections of sets are empty. Proof. This property holds since when the intersection of two object (resp. at- tribute) sets in an analogical complex AC is not empty, then AC contains at least two empty attribute (resp. object) sets. This fact is a consequence of prop- erty 1. Indeed, if an object belongs to two different types, their profiles must be the 166 Laurent Miclet and Jacques Nicolas same. The discrimination property ensures that the profile of two different ob- ject types differ by at least two different attribute with different types (e.g. if the object has type 1 and 3, attributes of type b and c should have different values). Thus it cannot exists attributes of the discriminant type (e.g. attributes of type b and c in the previous case) and the corresponding sets are empty. This completes the proof. The converse of the proposition is not true: if all side-by-side intersections of sets differ, the complex is not necessary complete. For instance, consider the following context: a1 a2 a3 a4 a5 a6 o1 0 0 0 1 1 1 o2 0 1 1 1 1 1 o3 1 0 0 0 0 1 o4 1 1 1 0 0 0 o5 1 0 1 1 0 0 It contains the following not complete complex: ({o1 }, {o2 }, {o3 }, {o4 }), ({a1 }, {a2 , a3 }, ∅, {a4 , a5 }) 4 The lattice of analogical complexes Definition 5 (Partial Order on analogical complexes). Given two ana- logical complexes C 1 = (O1,4 1 , A11,4 ) and C 2 = (O1,4 2 , A21,4 ), the partial order ≤ is defined by C 1 ≤ C 2 iff Oi1 ⊆ Oi2 for i = 1, 4 and A2i ⊆ A1i for i = 1, 4 . C 1 is called a sub-complex of C 2 and C 2 is called a super-complex of C 1 As for formal concepts, the set of all complexes has a lattice structure. Let us first define a derivation operator on analogical quadruplets: Definition 6 (Derivation on set quadruplets). Let O = O1 ∪ O2 ∪ O3 ∪ O4 be a set of objects partitioned in four subsets, and 0 A be a set of attributes. 
For all i and j ∈ [1, 4], one defines Oij = {a ∈ A | ∀o ∈ Oi I(o, a) ⇔ AS(i, j)} Let A = A1 ∪ A2 ∪ A3 ∪ A4 be a set of attributes partitioned in four subsets, 0 and O be a set of objects. For all i and j ∈ [1, 4], one defines Aij = {o ∈ O | ∀a ∈ Ai I(o, a) ⇔ AS(i, j)} Finally, we define the derivation on quadruplets as follows: 4 \ 4 \ 4 \ 4 \ 0 0 0 0 0 O1,4 =( Oj1 , Oj2 , Oj3 , Oj4 ) j=1 j=1 j=1 j=1 4 \ 4 \ 4 \ 4 \ 0 0 0 0 A01,4 = ( Aj1 , Aj2 , Aj3 , Aj4 ) j=1 j=1 j=1 j=1 From formal concepts to analogical complexes 167 0 Example 3. Consider O = ({12}, {28}, {59}, {49}). One has: O11 = {a ∈ A | ¬I(12, a)} = {2, 5, 7, 8, 9}; 0 O21 = {a ∈ A | ¬I(28, a)} = {3, 4, 7, 8}; 0 O31 = {a ∈ A | I(59, a)} = {3, 4, 7, 8}; 0 T4 0 O41 = {a ∈ A | I(49, a)} = {2, 4, 5, 7, 8} Then O10 = j=1 Oj1 = {7, 8} Finally, O0 = ({7, 8}, {2, 5, 9}, {3, 4}, {6}). We exhibit a basic theorem for these complexes that naturally extends the basic theorem on concepts: Proposition 3. Given two analogical complexes C 1 = (O1,4 1 , A11,4 ) and C 2 = 2 2 (O1,4 , A1,4 ), – The join of C 1 and C 2 is defined by C 1 ∧ C 2 = (O1,4 , A1,4 ) where ∀i ∈ [1, 4] Oi = Oi (C 1 ) ∩ Oi (C 2 ) 00 A1,4 = A1 (C 1 )∪A1 (C 2 ), A2 (C 1 )∪A2 (C 2 ), A3 (C 1 )∪A3 (C 2 ), A4 (C 1 )∪A4 (C 2 ) – The meet of C 1 and C 2 is defined by C 1 ∨ C 2 = (O1,4 , A1,4 ) where 00 O1,4 = O1 (C 1 )∪O1 (C 2 ), O2 (C 1 )∪O2 (C 2 ), O3 (C 1 )∪O3 (C 2 ), O4 (C 1 )∪O4 (C 2 ) Proof. The meet and the join are dual and one only needs to prove the proposi- tion for the join. The ordering by set inclusion requires the set of objects Oi of C 1 ∧ C 2 to be included in Oi (C 1 ) ∩ Oi (C 2 ) and its set of attributes Aj to be in- cluded in Aj (C 1 ) ∪ Aj C 2 ). Taking exactly the intersection of objects thus ensures the set of objects to be maximal. The corresponding maximal sets of attributes may be inferred using the derivation operator ’ we have just defined. Another way to generate these sets is to apply the derivation operator twice on the union of sets of attributes. Example 4. The complex lattice of smallzoo has 24 elements, including 18 com- plete complexes. It is sketched in figure 3. In this lattice, for example, the join of the analogical complex numbered 9 and 12, which are as follows 9 = ({12}, {28}, {59, 64}, {20, 49}), ({7}, {9}, {4}, {6}) 12 = ({12, 17}, {28}, {59}, {49, 64}), ({7}, {2, 5}, {3}, {6}) is number 15, namely: 15 = ({12}, {28}, {59}, {49}), ({7, 8}, {2, 5, 9}, {3, 4}, {6}) The resulting object sets are for each type the intersection of the two joined object sets. The resulting attribute sets contain for each type the union of the two joined attribute sets and may contain other elements with a correct profile on all objects. For instance, A1 (9 ∧ 12) = {7, 8} is made of the union of A1 (9) 168 Laurent Miclet and Jacques Nicolas and A1 (12) ({7}) plus attribute 8 since 8 has the right profile (0, 0, 1, 1) on O1,4 (that is, ¬I(12, 8), ¬I(28, 8), I(59, 8) and I(49, 8)). The meet of the analogical complexes numbered 9 and 12 is number 19, namely 19 = ({12, 17, 28}, {12, 17, 28}, {20, 49, 59, 64}, {20, 49, 59, 64}), ({7}, ∅, ∅, {6}) 5 Conclusion We have introduced a new conceptual object called analogical complex that uses a complex relation, analogical proportion, to compare objects with respect to their attribute values. 
Although this relation works on set quadruplets instead of simple sets like in formal concepts, we have shown that it is possible to keep the main properties of concepts, that is, maximality and comparison at the level of object or attribute pairs. The set of all complexes are structured within a lattice that contains two types of elements. The most meaningful ones only contain non empty sets and are a strong support for doing analogical inference. An interesting extension of this work would be to develop this inference process for analogical data mining in a way close to rule generation in FCA. The degenerate case where some of the sets are empty is more frequent than in FCA where their presence is limited to the top or bottom of the lattice. The presence of a single empty set may reflect the lack of some object or attribute and is thus a possible new research direction for completing a knowledge base or an ontology. Particularly, analogy in a Boolean framework introduces a form of negation through the search of dissimilarities (discrimination) between objects. We have written an implementation the search for complete analogical com- plexes, using the Answer Set Programming framework [1]. The properties of definition 3 are translated straightforwardly in logical constraints and the search of all complexes is achieved by an ASP solver looking for all solutions. The de- scription of the ASP program would be beyond the scope of this paper but it can be seen as a relatively simple exercise of extension of the search for formal concepts by adding a few logical constraints. It is likely that most of the existing tools of FCA could be adapted the same way for analogical complex analysis. This would allow to include both categorization and analogy within common data mining environments. References 1. Brewka, G., Eiter, T., Truszczyński, M.: Answer set program- ming at a glance. Commun. ACM 54(12), 92–103 (Dec 2011), http://doi.acm.org/10.1145/2043174.2043195 2. Correa, W., Prade, H., Richard, G.: When intelligence is just a matter of copying. In: et al., L.D.R. (ed.) Proc. 20th Europ. Conf. on Artificial Intelligence, Montpel- lier, Aug. 27-31. pp. 276–281. IOS Press (2012) From formal concepts to analogical complexes 169 3. French, R.M.: The computational modeling of analogy-making. Trends in Cognitive Sciences 6(5), 200 – 205 (2002) 4. Gentner, D., Holyoak, K.J., Kokinov, B.N.: The Analogical Mind: Perspectives from Cognitive Science. Cognitive Science, and Philosophy, MIT Press, Cambridge, MA (2001) 5. Hofstadter, D., Mitchell, M.: The Copycat project: A model of mental fluidity and analogy-making. In: Hofstadter, D., The Fluid Analogies Research Group (eds.) Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. pp. 205–267. Basic Books, Inc., New York, NY (1995) 6. Lepage, Y.: Analogy and formal languages. Electr. Notes Theor. Comput. Sci. 53 (2001) 7. Lepage, Y.: Analogy and formal languages. In: Proc. FG/MOL 2001. pp. 373–378 (2001), (see also http://www.slt.atr.co.jp/ lepage/pdf/dhdryl.pdf.gz) 8. Lichman, M.: UCI machine learning repository (2013), http://archive.ics.uci.edu/ml 9. Melis, E., Veloso, M.: Analogy in problem solving. In: Handbook of Practical Rea- soning: Computational and Theoretical Aspects. Oxford Univ. Press (1998) 10. Miclet, L., Bayoudh, S., Delhay, A.: Analogical dissimilarity: definition, algorithms and two experiments in machine learning. JAIR, 32 pp. 793–824 (2008) 11. 
Miclet, L., Prade, H.: Handling analogical proportions in classical logic and fuzzy logics settings. Proc. 10th Eur. Conf. on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU'09), Verona (2009) 12. Miclet, L., Prade, H.: Handling analogical proportions in classical logic and fuzzy logics settings. In: Proc. 10th Eur. Conf. on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (ECSQARU'09), Verona. pp. 638–650. Springer, LNCS 5590 (2009) 13. Miclet, L., Barbot, N., Prade, H.: From analogical proportions in lattices to proportional analogies in formal concepts. In: ECAI - 21st European Conference on Artificial Intelligence. Prague, Czech Republic (Aug 2014) 14. Miclet, L., Prade, H., Guennec, D.: Looking for Analogical Proportions in a Formal Concept Analysis Setting. In: Amedeo Napoli, V.V. (ed.) Conference on Concept Lattices and Their Applications. pp. 295–307. Nancy, France (Oct 2011) 15. Prade, H., Richard, G.: Homogeneous logical proportions: Their uniqueness and their role in similarity-based prediction. Proc. of the 13th International Conference on Principles of Knowledge Representation and Reasoning KR2012, pp. 402–412 (2012) 16. Radvansky, M., Sklenar, V.: FCA extension for MS Excel 2007, http://www.fca.radvansky.net (2010) 17. Stroppa, N., Yvon, F.: An analogical learner for morphological analysis. In: Online Proc. 9th Conf. Comput. Natural Language Learning (CoNLL-2005). pp. 120–127 (2005) 18. Stroppa, N., Yvon, F.: Analogical learning and formal proportions: Definitions and methodological issues. ENST Paris report (2005) 19. Stroppa, N., Yvon, F.: Du quatrième de proportion comme principe inductif : une proposition et son application à l'apprentissage de la morphologie. Traitement Automatique des Langues 47(2), 1–27 (2006) 20. Yevtushenko, S.: System of data analysis "Concept Explorer" (in Russian). In: Proc. of the 7th national conference on artificial intelligence (KII-2000), Russia. pp. 127–134 (2000)

Fig. 3. Hasse diagram of the analogical complex lattice for formal context smallzoo. For reasons of space some nodes are not explicitly given.

Pattern Structures and Their Morphisms

Lars Lumpe1 and Stefan E. Schmidt2

Institut für Algebra, Technische Universität Dresden larslumpe@gmail.com1 , midt1@msn.com2

Abstract. Projections of pattern structures do not always lead to pattern structures; residual projections and o-projections, however, do. As a unifying approach, we introduce the notion of pattern morphisms between pattern structures and provide a general sufficient condition for a homomorphic image of a pattern structure being again a pattern structure. In particular, we receive a better understanding of the theory of o-projections.

1 Introduction

Pattern structures within the framework of formal concept analysis have been introduced in [3]. Since then they have turned out to be a useful tool for analysing various real-world applications (cf. [3–7]).
In this work we want to point out that the theoretical foundations of pattern structures encourage still some fruitful discussions. In particular, the role projections play within pattern structures for information reduction still needs some further investigation. The goal of our paper is to establish an adequate concept of pattern morphism be- tween pattern structures, which also gives a better understanding of the concept of o- projections as recently introduced and investigated in [2]. In [8], we showed that pro- jections of pattern structures do not necessarily lead to pattern structures again, how- ever, residual projections do. It turns out that the concept of residual maps between the posets of patterns (w.r.t. two pattern structures) gives the key for a unifying view of o-projections and residual projections. We also derive that a pattern morphism from a pattern structure to a pattern setup (in- troduced in this paper), which is surjective on the sets of objects, yields again a pattern structure. Our main result states that a pattern morphism always induces an adjunction between the corresponding concept lattices. In case the underlying map between the sets of ob- jects is surjective, the induced residuated map between the concept lattices turns out to be surjective too. The fundamental order theoretic concepts of our paper are nicely presented in the book on Residuation Theory by T.S. Blythe and M.F. Janowitz (cf. [1]). c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 171–179, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 172 Lars Lumpe and Stefan E. Schmidt 2 Preliminaries Definition 1 (Adjunction). Let P pP, ¤q and L pL, ¤q be posets; furthermore let f : P Ñ L and g : L Ñ P be maps. (1) The pair p f , gq is an adjunction w.r.t. pP, Lq if f x ¤ y is equivalent to x ¤ gy for all x P P and y P L. In this case, we will refer to pP, L, f , gq as a poset adjunction. (2) f is residuated from P to L if the preimage of a principal ideal in L under f is always a principal ideal in P, that is, for every y P L there exists x P P s.t. f 1 tt P L | t ¤ yu ts P P | s ¤ xu. (3) g is residual from L to P if the preimage of a principal filter in P under g is always a principal filter in L, that is, for every x P P there exists y P L s.t. g1 ts P P | x ¤ su tt P L | y ¤ t u. (4) The dual of L is given by Lop pL, ¥q with ¥: tpx,t q P L L | t ¤ xu. The pair p f , gq is a Galois connection w.r.t. pP, Lq if p f , gq is an adjunction w.r.t. pP, Lop q. The following well-known facts are straightforward (cf. [1]). Proposition 1. Let P pP, ¤q and L pL, ¤q be posets. (1) A map f : P Ñ L is residuated from P to L iff there exists a map g : L Ñ P s.t. p f , gq is an adjunction w.r.t. pP, Lq. (2) A map g : L Ñ P is residual from L to P iff there exists a map f : P Ñ L s.t. p f , gq is an adjunction w.r.t. pP, Lq. (3) If p f , gq and ph, kq are adjunctions w.r.t. pP, Lq with f h or g k then f h and g k. (4) If f is a residuated map from P to L, then there exists a unique residual map f from L to P s.t. p f , f q is an adjunction w.r.t. pP, Lq. In this case, f is called the residual map of f . (5) If g is a residual map from L to P, then there exists a unique residuated map g from P to L s.t. pg , gq is an adjunction w.r.t. pP, Lq. In this case, g is called the residuated map of g. Definition 2. Let P pP, ¤q be a poset and T P. 
Then (1) The restriction of P onto T is given by P|T : pT, ¤ XpT T qq, which clearly is a poset too. (2) The canonical embedding of P|T into P is given by the map T Ñ P,t ÞÑ t. (3) T is a kernel system in P if the canonical embedding τ of P|T into P is residuated. In this case, the residual map ϕ of τ will also be called the residual map of T in P. The composition κ : τ ϕ is referred to as the kernel operator associated with T in P. (4) Dually, T is a closure system in P if the cannonical embedding τ of P|T into P is residual. In this case, the residuated map ψ of τ will also be called the residuated map of T in P. The composition γ : τ ψ is referred to as the closure operator associated with T in P. Pattern Structures and Their Morphisms 173 (5) A map κ : P Ñ P is a kernel operator on P if s ¤ x is equivalent to s ¤ κx for all s P κP and x P P. Remark: In this case, κP forms a kernel system in P, the kernel operator of which is κ. (6) Dually, a map γ : P Ñ P is a closure operator on P if x ¤ t is equivalent to γx ¤ t for all x P P and t P γP. Remark: In this case, ϕP forms a closure system in P, the closure operator of which is γ. The following known facts will be needed for the sequel (cf. [1]) . Proposition 2. Let P pP, ¤q and L pL, ¤q be posets. (1) If f is a residuated map from P to L then f preserves all existing suprema in P, that is, if s P P is the supremum (least upper bound) of X P in P then f s is the supremum of f X in L. In case P and L are complete lattices, the reverse holds too: If a map f from P to L preserves all suprema, that is, f psupP X q supL f X f or all X P, then f is residuated. (2) If g is a residual map from L to P, then g preserves all existing infima in L, that is, if t P L is the infimum (greatest lower bound) of Y L in L then gt is the infimum of gY in P. In case P and L are complete lattices, the reverse holds too: If a map g from L to P preserves all infima, that is, f pinfP Y q infL gY f or all Y L, then g is residual. (3) For an adjunction p f , gq w.r.t. pP, Lq the following hold: (a1) f is an isotone map from P to L. (a2) f g f f (a3) f P is a kernel system in L with f g as associated kernel operator on L. In particular, L Ñ P, y ÞÑ f gy is a residual map from L to L| f P. (b1) g is an isotone map from L to P. (b2) g f g g (b3) gL is a closure system in P with g f as associated closure operator on P. In particular, P Ñ gL, x ÞÑ g f x is a residuated map from P to P|gL. 3 Adjunctions and Their Concept Posets Definition 3. Let P : pP, S, σ , σ q and Q : pQ, T, τ, τ q be poset adjunctions. Then a pair pα, β q forms a morphism from P to Q if pP, Q, α, α q and pS, T, β , β q are poset adjunctions satisfying τ α β σ Remark: This implies α τ σ β , that is, the following diagrams are commu- tative: 174 Lars Lumpe and Stefan E. Schmidt α α P Q P Q σ τ σ τ S T S T β β Next we illustrate the involved poset adjunctions: α P Q α σ σ τ τ β S T β Definition 4 (Concept Poset). For a poset adjunction P pP, S, σ , σ q let BP : tp p, sq P P S | σ p s ^ σ s pu denote the set of pformalq concepts in P . Then the concept poset of P is given by BP : pP Sq | BP , that is, p p0 , s0 q ¤ p p1 , s1 q holds iff p0 ¤ p1 iff s0 ¤ s1 , for all p p0 , s0 q, p p1 , s1 q P BP . If p p, sq is a formal concept in P then p is referred to as extent in P and s as intent in P . Theorem 1. Let pα, β q be a morphism from a poset adjunction P pP, S, σ , σ q to a poset adjunction Q pQ, T, τ, τ q. 
Then pBP , BQ , Φ,Ψ q is a poset adjunction for Φ : BP Ñ BQ , p p, sq ÞÑ pτ β s, β sq and Ψ : BQ Ñ BP , pq,t q ÞÑ pα q, σ α qq. In addition, if α is surjective then so is Φ. Remark: In particular we want to point out that α q is an extent in P for every extent q in Q and similarly, β s is an intent in Q for every intent s in P . Pattern Structures and Their Morphisms 175 Proof. Let p p, sq P BP and pq,t q P BQ ; then σ p s and σ s p and τq t and τ t q. This implies β s β σ p τα p, thus Φ p p, sq pτ β s, β sq P BP (since ττ β s ττ τα p τα p β sq. Similarly, Ψ pq,t q P BQ . Assume now that Φ p p, sq ¤ pq,t q holds, which implies β s ¤ t. It follows that τα p β σ p β s ¤ t and hence p ¤ α τ t α q, that is, p p, sq ¤ Ψ pq,t q. Conversely, assume that p p, sq ¤ Ψ pq,t q holds, which implies p ¤ α q. It follows that p ¤ α q α τ t σ β t, and hence β s β σ p ¤ t, that is, Φ p p, sq ¤ pq,t q. Assume now that α is surjective; then α α idQ . Let pq,t q P BP , that is, τq t and τ t q. Then for p : α q and s : σ p we have p p, sq P BP since σ s σ σ α q σ σ α τ t σ σ σ β t σ β t α τ t α q p. Our claim is now that Φ p p, sq pq,t q holds, that is, β s t. The latter is true, since α p αα q q implies β s β σ p τα p τq t. Discussion for clarification: The question was raised whether, in the previous theorem, the residuated map Φ from BP to BQ allows some modification, since the map P S Ñ Q T, p p, sq ÞÑ pα p, β sq is obviously residuated from P S to Q T. However, in general the latter map does not restrict to a map from BP to BQ . Indeed, our construction of the map Φ is of the form p p, sq ÞÑ pα 1 p, β sq. As a warning, we want to point out that, in general, there is no residuated map from BP to BQ of the form p p, sq ÞÑ pα p, β 1 sq. The simple reason for this is that β s is an intent in Q for every intent s in P , while there may exist an extent p in P such that α p is not an extent in Q . 4 Morphisms between Pattern Structures Definition 5. A triple G pG, D, δ q is a pattern setup if G is a set, D pD, q is a poset, and δ : G Ñ D is a map. In case every subset of δ G : tδ g | g P Gu has an infimum in D, we will refer to G as pattern structure. Then the set CG : tinfD δ X | X Gu 176 Lars Lumpe and Stefan E. Schmidt forms a closure system in D. If G pG, D, δ q and H pH, E, ε q each is a pattern setup, then a pair p f , ϕ q forms a pattern morphism from G to H if f : G Ñ H is a map and ϕ is a residual map from D to E satisfying ϕ δ ε f , that is, the following diagram is commutative: f G H δ ε D E ϕ In the sequel we show how our previous considerations apply to pattern structures. Applications (1) Let G be a pattern structure and H be a pattern setup. If p f , ϕ q is a pattern morphism from G to H with f being surjective, then H is also a pattern structure. (2) Let G pG, D, δ q and H pH, E, ε q be pattern structures. Also let p f , ϕ q be a pattern morphism from G to H . To apply the previous theorem we give the following construction: f gives rise to an adjunction pα, α q between the power set lattices 2G : p2G , q and 2H : p2H , q via α : 2G Ñ 2H , X ÞÑ f X and α : 2H Ñ 2G ,Y ÞÑ f 1Y. Further let ϕ denote the residuated map of ϕ w.r.t. pE, Dq, that is, pE, D, ϕ , ϕ q is a poset adjunction. Then, obviously, pDop , Eop , ϕ, ϕ q is a poset adjunction too. 
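For illustration (a sketch added here, not part of the original paper), the power-set adjunction induced by a map f can be checked directly on small finite sets: the direct image fX is contained in Y exactly when X is contained in the preimage f^{-1}Y, which is the defining condition of Definition 1. The following Python fragment uses purely illustrative data.

from itertools import combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# illustrative map f : G -> H between two small sets of objects
G = {1, 2, 3}
H = {'a', 'b'}
f = {1: 'a', 2: 'a', 3: 'b'}

def alpha(X):        # direct image, X |-> fX
    return frozenset(f[g] for g in X)

def alpha_plus(Y):   # preimage, Y |-> f^{-1}Y
    return frozenset(g for g in G if f[g] in Y)

# adjunction condition: alpha(X) <= Y  iff  X <= alpha_plus(Y), for all X, Y
assert all((alpha(X) <= Y) == (X <= alpha_plus(Y))
           for X in subsets(G) for Y in subsets(H))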
For pattern structures the following operators are essential: : 2G Ñ D, X ÞÑ infD δ X : D Ñ 2G , d ÞÑ tg P G | d δ gu : 2H Ñ E, Z ÞÑ infE εZ : E Ñ 2H , e ÞÑ th P H | e εhu It now follows that pα, ϕ q forms a morphism from the poset adjunction P p2G , Dop , , q to the poset adjunction Q p2H , Eop , , q. Pattern Structures and Their Morphisms 177 In particular, p f X q ϕ pX q holds for all X G. Here we give an illustration of the constructed adjunctions: α 2G 2H α ϕ Dop Eop ϕ Replacing Dop by D and Eop by E we receive the following commutative diagrams: α α 2G 2H 2G 2H D E ϕ D E ϕ In combination we receive the following diagram of Galois connections and ad- junctions between them: α 2G 2H α l ϕ D E ϕ For the following we recollect that the concept lattice of G is given by BG : BP — similarly, BH : BQ . Now we are prepared to give an application of Theorem 1 to concept lattices of pattern structures: pBG, BH , Φ,Ψ q is an adjunction for Φ : BG Ñ BH , pX, d q ÞÑ ppϕd q , ϕd q 178 Lars Lumpe and Stefan E. Schmidt and Ψ : BH Ñ BG, pZ, eq ÞÑ p f 1 Z, p f 1 Z q q. In case f is surjective, Φ is surjective too. Remark: This application implies a generalization of Proposition 1 in [2], that is, if Z is an extent in H , then f 1 Z is an extent in G, and if d is an intent in G then ϕd is an intent in H . (3) Let G pG, D, δ q be a pattern structure and let κ be a kernel operator on D. Then ϕ : D Ñ κD, d ÞÑ κd forms a residual map from D to κD : D | κD, and pidG , ϕ q is a pattern morphism from G to H : pG, κD, ϕ δ q. Remark: In [2], ϕ is called an o-projection. The above clarifies the role of o- projections for pattern structures. (4) Let G pG, D, δ q be a pattern structure, and let κ be a residual kernel operator on D. Then pidG , κ q is a pattern morphism from G to H : pG, D, κ δ q. Remark: In [8], κ is also referred to as a residual projection. The above clarifies the role of residual projections for pattern structures. (5) Generalizing [2] and [8], we observe that if G pG, D, δ q is a pattern structure and ϕ is a residual map from D to E, then pidG , ϕ q is a pattern morphism from G to H pG, E, ϕ δ q satisfying that Φ : BG Ñ BH , pX, d q ÞÑ ppϕd q , ϕd q is a surjective residuated map from BG to BH . In particular, X ϕ pX ) holds for all X G. Remark: This application gives a better understanding to properly generalize the concept of projections as discussed in [3] and subsequently in [2, 4–8]. References 1. T.S. Blyth, M.F.Janowitz (1972), Residuation Theory, Pergamon Press, pp. 1-382. 2. A. Buzmakov, S. O. Kuznetsov, A. Napoli (2015) , Revisiting Pattern Structure Projections. Formal Concept Analysis. Lecture Notes in Artificial Intelligence (Springer), Vol. 9113, pp 200-215. 3. B. Ganter, S. O. Kuznetsov (2001), Pattern Structures and Their Projections. Proc. 9th Int. Conf. on Conceptual Structures, ICCS01, G. Stumme and H. Delugach (Eds.). Lecture Notes in Artificial Intelligence (Springer), Vol. 2120, pp. 129-142. 4. T. B. Kaiser, S. E. Schmidt (2011), Some remarks on the relation between annotated ordered sets and pattern structures. Pattern Recognition and Machine Intelligence. Lecture Notes in Computer Science (Springer), Vol. 6744, pp 43-48. 5. M. Kaytoue, S. O. Kuznetsov, A. Napoli, S. Duplessis (2011), Mining gene expression data with pattern structures in formal concept analysis. Information Sciences (Elsevier), Vol.181, pp. 1989-2001. 6. S. O. Kuznetsov (2009), Pattern structures for analyzing complex data. In H. Sakai et al. (Eds.). 
Proceedings of the 12th international conference on rough sets, fuzzy sets, data mining and granular computing (RSFDGrC09). Lecture Notes in Artificial Intelligence (Springer), Vol. 5908, pp. 33-44. Pattern Structures and Their Morphisms 179 7. S. O. Kuznetsov (2013), Scalable Knowledge Discovery in Complex Data with Pattern Structures. In: P. Maji, A. Ghosh, M.N. Murty, K. Ghosh, S.K. Pal, (Eds.). Proc. 5th Inter- national Conference Pattern Recognition and Machine Intelligence (PReMI2013). Lecture Notes in Computer Science (Springer), Vol. 8251, pp. 30-41. 8. L. Lumpe, S. E. Schmidt (2015), A Note on Pattern Structures and Their Projections. For- mal Concept Analysis. Lecture Notes in Artificial Intelligence (Springer), Vol. 9113, pp 145-150. NextClosures: Parallel Computation of the Canonical Base Francesco Kriegel and Daniel Borchmann Institute of Theoretical Computer Science, TU Dresden, Germany {francesco.kriegel,daniel.borchmann}@tu-dresden.de Abstract. The canonical base of a formal context plays a distinguished role in formal concept analysis. This is because it is the only minimal base so far that can be described explicitly. For the computation of this base several algorithms have been proposed. However, all those algorithms work sequentially, by computing only one pseudo-intent at a time – a fact which heavily impairs the practicability of using the canonical base in real-world applications. In this paper we shall introduce an approach that remedies this deficit by allowing the canonical base to be computed in a parallel manner. First experimental evaluations show that for sufficiently large data-sets the speedup is proportional to the number of available CPUs. Keywords: Formal Concept Analysis, Canonical Base, Parallel Algorithms 1 Introduction The implicational theory of a formal context is of interest in a large variety of appli- cations. In those cases, computing the canonical base of the given context is often desirable, as it has minimal cardinality among all possible bases. On the other hand, conducting this computation often imposes a major challenge, often endangering the practicability of the underlying approach. There are two known algorithms for computing the canonical base of a formal context [6, 12]. Both algorithms work sequentially, i.e. they compute one implication after the other. Moreover, both algorithms compute in addition to the implications of the canonical base all formal concepts of the given context. This is a disadvantage, as the number of formal concepts can be exponential in the size of the canonical base. On the other hand, the size of the canonical base can be exponential in the size of the underlying context [10]. Additionally, up to today it is not known whether the canonical base can be computed in output-polynomial time, and certain complexity results hint at a negative answer [3]. For the algorithm from [6], and indeed for any algorithm that computes the pseudo-intents in a lectic order, it has been shown that it cannot compute the canonical base with polynomial delay [2]. However, the impact of theoretical complexity results for practical application is often hard to access, and it is often worth investigating faster algorithm for theoretically in- tractable results. A popular approach is to explore the possibilities to parallelize known se- quential algorithms. This is also true for formal concept analysis, as can be seen in the de- velopment of parallel versions for computing the concept lattice of a formal context [5, 13]. 
In this work we want to investigate the development of a parallel algorithm for computing the canonical base of a formal context K. The underlying idea is actually c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 181–192, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 182 Francesco Kriegel and Daniel Borchmann quite simple, and has been used by Lindig [11] to (sequentially) compute the concept lattice of a formal context: to compute the canonical base, we compute the lattice of all intents and pseudo-intents of K. This lattice can be computed bottom up, in a level-wise order, and this computation can be done in parallel provided that the lattice has a certain “width” at a particular level. The crucial fact now is that the upper neighbours of an intent or pseudo-intent B can be easily computed by just iterating over all attributes m ∈/ B and computing the closure of B ∪ { m }. In the approach of Lindig mentioned above this closure is just the usual double-prime operation B → 7 BII of the underlying formal context K. In our approach it is the closure operator whose closures are exactly the intents and pseudo-intents of K. Surprisingly, despite the simpleness of our approach, we are not aware of any prior work on computing the canonical base of a formal context in a parallel manner. Furthermore, experimental results presented in this work indicate that for suitable large data-sets the computation of the canonical base can be speed up by a factor proportional to the number of available CPUs. The paper is structured as follows. After recalling all necessary notions of formal concept analysis in Section 2, we shall describe in Section 3 our approach of computing the canonical base in parallel. Benchmarks of this algorithm are presented in Section 4, and we shall close this work with some conclusions in Section 5. 2 Preliminaries This section gives a brief overview on the notions of formal concept analysis [7] that are used in this document. The basic structure is a formal context K = (G, M, I) consisting of a set G of objects, a set M of attributes, and an incidence relation I ⊆ G × M. For a pair (g, m) ∈ I we also use the infix notation g I m, and say that the object g has the attribute m. Each formal context K induces the derivation operators ·I : P(G) → P(M) and ·I : P(M) → P(G) that are defined as follows for object sets A ⊆ G and attribute sets B ⊆ M: AI := { m ∈ M | ∀g ∈ A: (g, m) ∈ I } and BI := { m ∈ M | ∀m ∈ B : (g, m) ∈ I } . In other words, AI is the set of all attributes that all objects from A have in common, and dually BI is the set of all objects which have all attributes from B. A formal concept of K is a pair (A, B) such that AI = B and B = AI , and the set of all formal concepts of K is denoted by B(K). An implication over the set M is an expression of the form X → Y where X, Y ⊆ M. An implication X → Y over M is valid in K if X I ⊆ Y I . A set L of implications over M is valid in K if each implication in L is valid in K. An implication X → Y follows from the set L if X → Y is valid in every context with attribute set M in which L is valid. Furthermore, a model of X → Y is a set T ⊆ M such that X ⊆ T implies Y ⊆ T . A model of L is a model of all implications in L, and X L is the smallest superset of X that is a model of L. The set X L can be computed as follows. 
[ [ X L := X Ln where X L1 := X ∪ { B | A → B ∈ L and A ⊆ X } n≥1 and X Ln+1 := (X Ln )L1 for all n ∈ N. The following lemma shows some well-known equivalent statements for entailment of implications from implication sets. We will not prove them here. NextClosures: Parallel Computation of the Canonical Base 183 Lemma 1. Let L ∪ { X → Y } be a set of implications over M. Then the following statements are equivalent: 1. X → Y follows from L. 2. If K is a formal context with attribute set M such that L is valid in K, then X → Y is also valid in K. 3. If T ⊆ M and T is a model of L, then T is a model of X → Y . 4. Y ⊆ X L. An attribute set B ⊆ M is called intent of K = (G, M, I) if B = BII . An at- tribute set P ⊆ M is called pseudo-intent of K if P = 6 P II , and furthermore for each pseudo-intent Q ( P the set inclusion QII ⊆ P is satisfied. We denote the set of all pseudo-intents of K by PsInt(K). Then the canonical implicational base of K is defined as the following implication set: { P → P II | P ∈ PsInt(K) }. The canonical base has the property that it is a minimal base of K, i.e. it is a base of K, meaning that it is a set of valid implications of K such that every valid implication of K is entailed by it. Furthermore, its cardinality is minimal among all bases of K. It is readily verified that a subset X ⊆ M is an intent or a pseudo-intent of K if and only if X is a closure of the closure operator K∗ that is defined as follows: [ [ X K := X Kn where X K1 := X ∪ { P II | P ∈ PsInt(K) and P ( X } ∗ ∗ ∗ n≥1 and X Kn+1 := (X Kn )K1 ∗ ∗ ∗ for all n ∈ N. Of course, if L is the canonical base of K as described above, then both closure operators K∗ and L∗ coincide, where L∗ is defined by the following equations: ∗ [ ∗ ∗ [ X L := X Ln where X L1 := X ∪ { B | A → B ∈ L and A ( X } n≥1 ∗ ∗ ∗ and X Ln+1 := (X Ln )L1 for all n ∈ N. 3 Parallel Computation of the Canonical Base The well-known NextClosure algorithm developed by Ganter [6] can be used to enumer- ate the implications of the canonical base. The mathematical idea behind this algorithm is to compute all intents and pseudo-intents of our formal context K in a certain linear order, namely the lectic order. As an advantage the next (pseudo-)intent is uniquely deter- mined, but we potentially have to do backtracking in order to find it. It can be seen quite easily that those sets form a complete lattice, and the NextClosure algorithm uses the closure operator K∗ of this lattice to enumerate the pseudo-intents of K in the lectic order. Furthermore, this algorithm is inherently sequential, i.e. it is not possible to parallelize it. In our approach we shall not make use of the lectic order. Indeed, our algorithm will enumerate all intents and pseudo-intents of K in the subset-order, with no further restrictions. As a benefit we get a very easy and obvious way to parallelize this enu- meration. Moreover, in multi-threaded implementations no communication between different threads is necessary. However, as it is the case with all other known algorithms 184 Francesco Kriegel and Daniel Borchmann for computing the canonical base, we also have to compute all intents in addition to all pseudo-intents of the given formal context. The main idea is very simple and works as follows. From the definition of pseudo- intents we see that in order to decide whether an attribute set P ⊆ M is a pseudo-intent we only need all pseudo-intents Q ( P , i.e. it suffices to know all pseudo-intents with a smaller cardinality than P . 
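To make this concrete, the following sketch (added for illustration, not part of the original paper) shows how a candidate set P can be classified once all implications A → A^{II} for pseudo-intents A of smaller cardinality are known. The helper double_prime, standing for the derivation B ↦ B^{II} of the context, and all other names are assumptions made for this sketch.

def lstar_closure(X, implications):
    # smallest superset of X containing B for every implication (A, B) whose premise A is a proper subset
    X = set(X)
    changed = True
    while changed:
        changed = False
        for A, B in implications:
            if A < X and not B <= X:
                X |= B
                changed = True
    return frozenset(X)

def classify(P, implications, double_prime):
    # assumes 'implications' lists A -> A'' for every pseudo-intent A with |A| < |P|
    P = frozenset(P)
    if lstar_closure(P, implications) != P:
        return "neither"   # not closed under L*, hence neither intent nor pseudo-intent
    return "intent" if double_prime(P) == P else "pseudo-intent"

This is exactly the test performed in lines 5 and 6 of Algorithm 1 below.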
This allows for the level-wise computation of the canon- ical base w.r.t. the subset inclusion order, i.e. we can enumerate the (pseudo-)intents w.r.t. increasing cardinality. An algorithm that implements this idea works as follows. First we start by considering the empty set, as it is the only set with cardinality 0. Of course, the empty set must either be an intent or a pseudo-intent, and the distinction can be made by checking whether ∅ = ∅II . Then assuming inductively that all pseudo-intents with cardinality < k have been determined, we can correctly decide whether a subset P ⊆ M with |P | = k is a pseudo-intent or not. To compute the lattice of intents and pseudo-intents of K the algorithm manages a set of candidates that contains the (pseudo-)intents on the current level. Then, whenever a pseudo-intent P has been found, the ⊆-next closure is uniquely determined by its closure P II in the context K. If an∗ intent B has been found, then the ⊆-next closures must be of the form (B ∪ { m })K , m ∈ / B. However, as we are not aware of the full implicational base of K yet, but only of an approximation L of it, the operators K∗ and L∗ do not coincide on all subsets of M. We will show that they yield the same closure for attribute sets B ⊆ M with a cardinality |B| ≤ k if L contains all implications P → P II where P is a pseudo-intent of K with cardinality |P | < k. Consequently, the L∗-closure of a set B ∪ { m } may not be an intent or pseudo-intent of K. Instead they are added to the candidate list, and are processed when all pseudo-intents with smaller cardinality have been determined. We will formally prove that this technique is correct. Furthermore, the computation of all pseudo-intents and intents of cardinality k can be done in parallel, since they are independent of each other. In summary, we shortly describe the inductive structure of the algorithm as follows: Let K be a finite formal context. We use four variables: k denotes the current cardinality of candidates, C is the set of candidates, B is a set of formal concepts, and L is an implication set. Then the algorithm works as follows. 1. Set k := 0, C := { ∅ }, B := ∅, and L := ∅. 2. In parallel: For each candidate set C ∈ C with cardinality |C| = k determine whether it is L∗-closed. If not, then add its L∗-closure to the candidate set C, and go to Step 5. 3. If C is an intent of K, then add the formal concept (C I , C) to B. Otherwise C must be a pseudo-intent, and thus we add the formal implication C → C II to the set L, and add the formal concept (C I , C II ) to the set B. 4. For each observed intent C II , add all its upper neighbours C II ∪ { m } where m∈ / C II to the candidate set C. 5. Wait until all candidates of cardinality k have been processed. If k < |M|, then in- crease the candidate cardinality k by 1, and go to Step 2. Otherwise return B and L. In order to approximate the operator L∗ we furthermore introduce the following notion: If L is a set of implications, then Lk denotes the subset of L that consists of all implications whose premises have a cardinality of at most k. NextClosures: Parallel Computation of the Canonical Base 185 Lemma 2. Let K = (G, M, I) be a formal context, L its canonical implicational base, and X ⊆ M an attribute set. Then the following statements are equivalent: 1. X is either an intent or a pseudo-intent of K. 2. X is K∗-closed. 3. X is L∗-closed. 4. X is (L|X|−1)∗-closed. 5. There is a k ≥ |X| − 1 such that X is (Lk )∗-closed. 6. For all k ≥ |X| − 1 it holds that X is (Lk )∗-closed. 
Proof. 1⇔2. If X is an intent or a pseudo-intent, then it is obviously K∗1 -closed, i.e. K∗-closed. Vice versa, if X is K∗-closed, but no intent, then X contains the closure P II of every pseudo-intent P ( X, and hence X must be a pseudo-intent. 2⇔3. is obvious. 3⇔4. follows directly from the fact that P ( X implies |P | < |X|. 4⇔5. The only-if-direction is trivial. Consider k ≥ |X| − 1 such that X is (Lk )∗-closed. Then X contains all conclusions B where A → B ∈ L is an implication with premise A ( X such that |A| ≤ k. Of course, A ( X implies |A| < |X|, and thus X is (L|X|−1)∗-closed as well. 4⇔6. The only-if-direction is trivial. Finally, assume that k ≥ |X| − 1 and X is (L|X|−1)∗-closed. Obviously, there are no subsets A ( X with |X| ≤ |A| ≤ k, and so X must be (Lk )∗-closed, too. t u As an immediate consequence of Lemma 2 we infer that in order to decide the K∗-closedness of an attribute set X it suffices to know all implications in the canonical base whose premise has a lower cardinality than X. Corollary 3. If L contains all implications P → P II where P is a pseudo-intent of K with |P | < k, and otherwise only implications with premise cardinality k, then for all attribute sets X ⊆ M with |X| ≤ k the following statements are equivalent: 1. X is an intent or a pseudo-intent of K. 2. X is L∗-closed. This corollary allows us in a certain sense to approximate the set of all K∗-closures w.r.t. increasing cardinality, and thus also permits the approximation of the closure operator L∗ where L is the canonical base of K. In the following Lemma 4 we will characterise the structure of the lattice of all K∗-closures, and also give a method to compute upper neighbours. It is true that between comparable pseudo-intents there must always be an intent. In particular, the unique upper K∗-closed neighbour of a pseudo-intent must be an intent. Lemma 4. Let K be a formal context. Then the following statements are true: 1. If P ⊆ M is a pseudo-intent, then there is no intent or pseudo-intent strictly between P and P II . 2. If B ⊆ M is∗ an intent, then the next intents or pseudo-intents are of the form (B ∪ { m })K for attributes m ∈ 6 B. 3. If X ( Y ⊆ M are neighbouring K∗-closures, then Y = (X ∪ { m })K for all ∗ attributes m ∈ Y \ X. 186 Francesco Kriegel and Daniel Borchmann Algorithm 1 NextClosures (K) 1 k := 0, C := { ∅ }, B := ∅, L := ∅ 2 while k ≤ |M| do 3 for all C ∈ C with |C| = k do in parallel 4 C := C \ { C } ∗ 5 if C = C L then 6 if C = 6 C II then 7 L := L ∪ { C → C II } 8 B := B ∪ { (C I , C II ) } 9 C := C ∪ { C II ∪ { m } | m ∈ 6 C II } 10 else ∗ 11 C := C ∪ { C L } 12 Wait for termination of all parallel processes. 13 k := k + 1 14 return (B, L) Proof. 1. Let P ⊆ M be a pseudo-intent of K. Then for every intent B between P and P II , i.e. P ⊆ B ⊆ P II , we have B = BII = P II . Thus, there cannot be an intent strictly between P and P II . Furthermore, if Q were a pseudo-intent such that P ( Q ⊆ P II , then P II ⊆ Q, and thus Q = P II , a contradiction. 2. Let B ⊆ M be an intent of K, and X ⊇ B an intent or pseudo-intent of K such that there is no other intent or pseudo-intent between them. Then B ⊆ B ∪ { m } ⊆ X for every m ∈ X \ B. Thus, B = BK ( (B ∪ { m })K ⊆ X K = X. Then ∗ ∗ ∗ (B ∪ { m })K is an intent or a pseudo-intent between B and X that strictly ∗ contains B, and hence X = (B ∪ { m })K . ∗ 3. Consider an attribute m ∈ Y \ X. Then X ∪ { m } ⊆ Y , and thus X ( (X ∪ { m })K ⊆ Y , as Y is already closed. Therefore, (X ∪ { m })K = Y . 
∗ ∗ t u We are now ready to formulate our algorithm NextClosures in pseudo-code, see Algorithm 1. In the remainder of this section we shall show that this algorithm always terminates for finite formal contexts K, and that it returns the canonical base as well as the set of all formal concepts of K. Beforehand, let us introduce the following notation: 1. NextClosures is in state k if it has processed all candidate sets with a cardinality ≤ k, but none of cardinality > k. 2. Ck denotes the set of candidates in state k. 3. Lk denotes the set of implications in state k. 4. Bk denotes the set of formal concepts in state k. Proposition 5. Let K be a formal context, and assume that NextClosures has been started on K and is in state k. Then the following statements are true: 1. Ck contains all pseudo-intents of K with cardinality k + 1, and all intents of K with cardinality k + 1 whose corresponding formal concept is not already in Bk . 2. Bk contains all formal concepts of K whose intent has cardinality ≤ k. 3. Lk contains all implications P → P II where the premise P is a pseudo-intent of K with cardinality ≤ k. 4. Between the states k and k + 1 an attribute set with cardinality k + 1 is an intent or pseudo-intent of K if and only if it is L∗-closed. NextClosures: Parallel Computation of the Canonical Base 187 Proof. We prove the statements by induction on k. The base case handles the initial state k = −1. Of course, ∅ is always an intent or a pseudo-intent of K. Furthermore, it is the only attribute set of cardinality 0 and contained in the candidate set C. As there are no sets with cardinality ≤ −1, B−1 and L−1 trivially satisfy Statements 2 and 3. Finally, we have that L−1 = ∅, and hence every attribute set is L∗−1-closed, in particular ∅. We now assume that the induction hypothesis is true for k. For every implication set L between states k and k + 1, i.e. Lk ⊆ L ⊆ Lk+1, the induction hypothesis yields that L contains all formal implications P → P II where P is a pseudo-intent of K with cardinality ≤ k, and furthermore only implications whose premises have cardinality k +1 (by definition of Algorithm 1). Additionally, we know that the candidate set C contains all pseudo-intents P of K where |P | = k +1, and all intents B of K such that |B| = k +1 and (BI , B) ∈ / B. Corollary 3 immediately yields the validity of Statements 2 and 3 for k + 1, as those K∗-closures are recognized correctly in line 5. Then Lk+1 contains all implications P → P II where P is a pseudo-intent of K with |P | ≤ k + 1, and hence each implication set L with Lk+1 ⊆ L ⊆ Lk+2 contains all those implications and furthermore only implications with a premise cardinality k + 2. By another application of Corollary 3 we conclude that also Statement 4 is satisfied for k + 1. Finally, we show Statement 1 for k + 1. Consider any K∗-closed set X where |X| = k + 2. Then Lemma 4 states that for all lower K∗-neighbours Y and all m ∈ X \ Y it is true that (Y ∪ { m })K = X. We proceed with a case distinction. ∗ If there is a lower K -neighbour Y which is a pseudo-intent, then Lemma 4 yields that ∗ the (unique) next K∗-neighbour is obtained as Y II , and the formal concept (Y I , Y II ) is added to the set B in line 8. Of course, it is true that X = Y II . Otherwise all lower K∗-neighbours Y are intents, and in particular this is the case for X being a pseudo-intent by Lemma 4. Then for all these Y we have (Y ∪ { m })K = X ∗ for all m ∈ X \ Y . Furthermore, all sets Z with Y ∪ { m } ( Z ( X are not K∗-closed. 
Since X \ Y is finite, the following sequence must also be finite: ∗ C0 := Y ∪ { m } and Ci+1 := CiL where L|Ci |−1 ⊆ L ⊆ L|Ci |. The sequence is well-defined, since implications from L|Ci | \L|Ci |−1 have no influence on the closure of Ci. Furthermore, the sequence obviously ends with the set X, and contains no further K∗-closed sets, and each of the sets C0, C1, . . . appears as a candidate during the run of the algorithm, cf. lines 9 and 11. t u From the previous result we can infer that in the last state |M| the set B contains all formal concepts of the input context K, and that L is the canonical base of K. Both sets are returned from Algorithm 1, and hence we can conclude that NextClosures is sound and complete. The following corollary summarises our results obtained so far, and also shows termination. Corollary 6. If the algorithm NextClosures is started on a finite formal context K as input, then it terminates, and returns both the set of all formal concepts and the canonical base of K as output. Proof. The second part of the statement is a direct consequence of Proposition 5. In the final state |M| the set L contains all formal implications P → P II where P is a pseudo-intent of K. In particular, L is the canonical implicational base. Furthermore, the set B contains all formal concepts of K. 188 Francesco Kriegel and Daniel Borchmann Finally, the computation time between states k and k + 1 is finite, because there are only finitely many candidates of cardinality k + 1, and the computation of closures w.r.t. the operators L∗ and ·II can be done in finite time. As there are exactly |M| states for a finite formal context, the algorithm must terminate. t u One could ask whether there are formal contexts that do not allow for a speedup in the enumeration of all intents and pseudo-intents on parallel execution. This would happen for formal contexts whose intents and pseudo-intents are linearly ordered. However, this is impossible. Lemma 7. Let K = (G, M, I) be a non-empty clarified formal context. Then the set of its intents and pseudo-intents is not linearly ordered w.r.t. subset inclusion ⊆. Proof. Assume that K := (G, M, I) with G := { g1, . . . , gn }, n > 0, were a clar- ified formal context with intents and pseudo-intents P1 ( P2 ( . . . ( P`. In particular, then also all object intents form a chain g1I ( g2I ( . . . ( gnI where n ≤ `. Since K is attribute-clarified, it follows gj+1 I \ gjI = 1 for all j, and hence w.l.o.g. M = { m1, . . . , mn }, and gi I mj iff i ≥ j. Eventually, K is isomorphic to the ordinal scale Kn := ({ 1, . . . , n } , { 1, . . . , n } , ≤). It is easily verified that the pseudo- intents of Kn are either ∅, or of the form { m, n } where m < n − 1, a contradiction. Consequently, there is no formal context with a linearly ordered set of intents and pseudo-intents. Hence, a parallel enumeration of the intents and pseudo-intents will always result in a speedup compared to a sequential enumeration. 4 Benchmarks The purpose of this section is to show that our parallel algorithm for computing the canonical base indeed yields a speedup, both qualitatively and quantitatively, compared to the classical algorithm based on NextClosure [6]. To this end, we shall present the running times of our algorithm when applied to selected data-sets and with a varying number of available CPUs. We shall see that, up to a certain limit, the running time of our algorithms decreases proportional to the number of available CPUs. 
Furthermore, we shall also show that this speedup is not only qualitative, but indeed yields a real speedup compared to the original sequential algorithm for computing the canonical base.
The presented algorithm NextClosures has been integrated into Concept Explorer FX [8]. The implementation is a straightforward adaptation of Algorithm 1 to the programming language Java 8, and heavily uses the new Stream API and thread-safe concurrent collection classes (like ConcurrentHashMap). As we have described before, the processing of all candidates on the current cardinality level can be done in parallel, i.e. for each of them a separate thread is started that executes the necessary operations for lines 4 to 11 in Algorithm 1. Furthermore, as the candidates on the same level cannot affect each other, no communication between the threads is needed. More specifically, we have seen that the decision whether a candidate is an intent or a pseudo-intent is independent of all other sets with the same or a higher cardinality.
The formal contexts used for the benchmarks¹ are listed in Figure 1, and are either obtained from the FCA Data Repository [4] (a to d, and f to p), randomly created (q to u), or created from experimental results (e). For each of them we executed the implementation at least three times, and recorded the average computation times.
¹ Readers who are interested in the test contexts should send a mail to one of the authors.

   Formal Context              Objects  Attributes  Density
a  car.cxt                        1728          25     28 %
b  mushroom.cxt                   8124         119     19 %
c  tic-tac-toe.cxt                 958          29     34 %
d  wine.cxt                        178          68     20 %
e  algorithms.cxt                 2688          54     22 %
f  o1000a10d10.cxt                1000          10     10 %
g  o1000a20d10.cxt                1000          20     10 %
h  o1000a36d17.cxt                1000          36     16 %
i  o1000a49d14.cxt                1000          49     14 %
j  o1000a50d10.cxt                1000          50     10 %
k  o1000a64d12.cxt                1000          64     12 %
l  o1000a81d11.cxt                1000          81     11 %
m  o1000a100d10-001.cxt           1000         100     11 %
n  o1000a100d10-002.cxt           1000         100     11 %
o  o1000a100d10.cxt               1000         100     11 %
p  o2000a81d11.cxt                2000          81     11 %
q  24.cxt                           17          26     51 %
r  35.cxt                           18          24     43 %
s  51.cxt                           26          17     76 %
t  54.cxt                           20          20     48 %
u  79.cxt                           25          26     68 %
Fig. 1. Formal Contexts in Benchmarks

The experiments were performed on the following two systems:
Taurus (1 Node of Bull HPC-Cluster, ZIH), CPUs: 2x Intel Xeon E5-2690 with eight cores @ 2.9 GHz, RAM: 32 GB
Atlas (1 Node of Megware PC-Farm, ZIH), CPUs: 4x AMD Opteron 6274 with sixteen cores @ 2.2 GHz, RAM: 64 GB
The benchmark results are displayed in Figure 2. The charts have both axes logarithmically scaled, to emphasise the correlation between the execution times and the number of available CPUs. We can see that the computation time is almost inversely proportional to the number of available CPUs, provided that the context is large enough. In this case there are enough candidates on each cardinality level for the computation to be done in parallel. However, we shall note that there are some cases where the computation times increase when utilising all available CPUs. We are currently not aware of an explanation for this exception – maybe it is due to some technical details of the platforms or the operating systems, e.g. some background tasks that are executed during the benchmark, or overhead caused by thread maintenance. Note that we did not have full system access during the experiments, but could only execute tasks by scheduling them in a batch system.
Additionally, for some of the test contexts only benchmarks for a large number of CPUs could be performed, due to the time limitations on the test systems. Furthermore, we have performed the same benchmark with small-sized contexts having at most 15 attributes. The computation times were far below one second. We have noticed that there is a certain number of available CPUs for which there is no further increase in speed of the algorithm. This happens when the number of candidates is smaller than the available CPUs.

[Fig. 2. Benchmark Results (left: Atlas, right: Taurus): computation time (1 s to 1 h, logarithmically scaled) against the number of CPUs (1 to 64 on Atlas, 1 to 16 on Taurus) for the test contexts a to u.]

Finally, we compared our two implementations of NextClosure and NextClosures when only one CPU is utilised. The comparison was performed on a notebook with an Intel Core i7-3720QM CPU with four cores @ 2.6 GHz and 8 GB RAM. The results are shown in Figure 3.

[Fig. 3. Performance Comparison: computation times of NextClosure (1 CPU) and NextClosures (1, 2, and 4 CPUs) on the test contexts.]

We conclude that our proposed algorithm is on average as fast as NextClosure on the test contexts. The computation time ratio is between 1/3 and 3, depending on the specific context. Low or no speedups are expected for formal contexts where NextClosure does not have to do backtracking, and hence can find the next intent or pseudo-intent immediately.

5 Conclusion

In this paper we have introduced the parallel algorithm NextClosures for the computation of the canonical base. It constructs the lattice of all intents and pseudo-intents of a given formal context from bottom to top in a level-wise order w.r.t. increasing cardinality. As the elements in a certain level of this lattice can be computed independently, they can also be enumerated in parallel, thus yielding a parallel algorithm for computing the canonical base. Indeed, first benchmarks show that NextClosures allows for a speedup that is proportional to the number of available CPUs, up to a certain natural limit. Furthermore, we have compared its performance to the well-known algorithm NextClosure when utilising only one CPU. It could be observed that on average our algorithm (on one CPU) has the same performance as NextClosure, at least for the test contexts. So far we have only introduced the core idea of the algorithm, but it should be clear that certain extensions are possible. For example, it is not hard to see that our parallel algorithm can be extended to also handle background knowledge given as a set of implications or as a constraint closure operator [1].
In order to yield attribute exploration, our algorithm can also be extended to include expert interaction for exploration of the canoni- cal base of partially known contexts, much in the same way as the classical algorithm. One benefit is the possibility to have several experts answering questions in parallel. Another advantage is the constant increase in the difficulty of the questions (i.e. premise cardi- 192 Francesco Kriegel and Daniel Borchmann nality), compared to the questions posed by default attribute exploration in lectic order. Those extensions have not been presented here due to a lack of space, but we shall present them in a future publication. Meanwhile, they can be found in a technical report [9]. Acknowledgements The authors thank Bernhard Ganter for helpful hints on optimal formal contexts for his NextClosure algorithm. Furthermore, the authors thank the anonymous reviewers for their constructive comments. The benchmarks were performed on servers at the Institute of Theoretical Computer Science, and the Centre for Information Services and High Performance Computing (ZIH) at TU Dresden. We thank them for their generous allocations of computer time. References [1] Radim Belohlávek and Vilém Vychodil. “Formal Concept Analysis with Constraints by Closure Operators”. In: Conceptual Structures: Inspiration and Application, 14th International Conference on Conceptual Structures, ICCS 2006, Aalborg, Denmark, July 16-21, 2006, Proceedings. Ed. by Henrik Schärfe, Pascal Hitzler, and Peter Øhrstrøm. Vol. 4068. Lecture Notes in Computer Science. Springer, 2006, pp. 131–143. [2] Felix Distel. “Hardness of Enumerating Pseudo-Intents in the Lectic Order”. In: Proceed- ings of the 8th Interational Conference of Formal Concept Analysis. (Agadir, Morocco). Ed. by Léonard Kwuida and Barış Sertkaya. Vol. 5986. Lecture Notes in Computer Science. Springer, 2010, pp. 124–137. [3] Felix Distel and Barış Sertkaya. “On the Complexity of Enumerating Pseudo-Intents”. In: Discrete Applied Mathematics 159.6 (2011), pp. 450–466. [4] FCA Data Repository. url: http://www.fcarepository.com. [5] Huaiguo Fu and Engelbert Mephu Nguifo. “A Parallel Algorithm to Generate Formal Concepts for Large Data”. In: Proceedings of the Second International Conference on Formal Concept Analysis. (Sydney, Australia). Ed. by Peter W. Eklund. Vol. 2961. Lecture Notes in Computer Science. Springer, 2004, pp. 394–401. [6] Bernhard Ganter. “Two Basic Algorithms in Concept Analysis”. In: Proceedings of the 8th Interational Conference of Formal Concept Analysis. (Agadir, Morocco). Ed. by Léonard Kwuida and Barış Sertkaya. Vol. 5986. Lecture Notes in Computer Science. Springer, 2010, pp. 312–340. [7] Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foundations. Springer, 1999. [8] Francesco Kriegel. Concept Explorer FX. Software for Formal Concept Analysis. 2010- 2015. url: https://github.com/francesco-kriegel/conexp-fx. [9] Francesco Kriegel. NextClosures – Parallel Exploration of Constrained Closure Operators. LTCS-Report 15-01. Chair for Automata Theory, TU Dresden, 2015. [10] Sergei O. Kuznetsov. “On the Intractability of Computing the Duquenne-Guigues Base”. In: Journal of Universal Computer Science 10.8 (2004), pp. 927–933. [11] Christian Lindig. “Fast Concept Analysis”. In: Working with Conceptual Structures – Contributions to ICCS 2000. (Aachen, Germany). Ed. by Gerhard Stumme. Shaker Verlag, 2000, pp. 152–161. [12] Sergei A. Obiedkov and Vincent Duquenne. 
“Attribute-Incremental Construction of the Canonical Implication Basis”. In: Annals of Mathematics and Artificial Intelligence 49.1-4 (2007), pp. 77–99. [13] Vilém Vychodil, Petr Krajča, and Jan Outrata. “Parallel Recursive Algorithm for FCA”. In: Proceedings of the 6th International Conference on Concept Lattices and Their Applications. Ed. by Radim Bělohlávek and Sergej O. Kuznetsov. Palacký University, Olomouc, 2008, pp. 71–82. Probabilistic Implicational Bases in FCA and Probabilistic Bases of GCIs in EL⊥ Francesco Kriegel Institute for Theoretical Computer Science, TU Dresden, Germany francesco.kriegel@tu-dresden.de http://lat.inf.tu-dresden.de/˜francesco Abstract. A probabilistic formal context is a triadic context whose third dimen- sion is a set of worlds equipped with a probability measure. After a formal definition of this notion, this document introduces probability of implications, and provides a construction for a base of implications whose probability satisfy a given lower threshold. A comparison between confidence and probability of implications is drawn, which yields the fact that both measures do not coin- cide, and cannot be compared. Furthermore, the results are extended towards the light-weight description logic EL⊥ with probabilistic interpretations, and a method for computing a base of general concept inclusions whose probability fulfill a certain lower bound is proposed. Keywords: Formal Concept Analysis, Description Logics, Probabilistic Formal Context, Probabilistic Interpretation, Implication, General Concept Inclusion 1 Introduction Most data-sets from real-world applications contain errors and noise. Hence, for mining them special techniques are necessary in order to circumvent the expression of the errors. This document focuses on rule mining, especially we attempt to extract rules that are approximately valid in data-sets, or families of data-sets, respectively. There are at least two measures for the approximate soundness of rules, namely confidence and probability. While confidence expresses the number of counterexamples in a single data-set, probability expresses somehow the number of data-sets in a data-set family that do not contain any counterexample. More specifically, we consider implications in the formal concept analysis setting [7], and general concept inclusions (GCIs) in the description logics setting [1] (in the light-weight description logic EL⊥ ). Firstly, for axiomatizing rules from formal contexts possibly containing wrong in- cidences or having missing incidences the notion of a partial implication (also called association rule) and confidence has been defined by Luxenburger in [12]. Further- more, Luxenburger introduced a method for the computation of a base of all partial implications holding in a formal context whose confidence is above a certain threshold. In [2] Borchmann has extended the results to the description logic EL⊥ by adjusting the notion of confidence to GCIs, and also gave a method for the construction of a base of confident GCIs for an interpretation. Secondly, another perspective is a family of data-sets representing different views of the same domain, e.g., knowledge of different persons, or observations of an exper- iment that has been repeated several times, since some effects could not be observed c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 193–204, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. 
Copying permitted only for private and academic purposes. 194 Francesco Kriegel in every case. In the field of formal concept analysis, Vityaev, Demin, and Ponomaryov, have introduced in probabilistic extensions of formal contexts and their formal concepts and implications, and furthermore gave some methods for their computation, cf. [4]. In [9] the author has shown some methods for the computation of a base of GCIs in probabilistic description logics where concept and role constructors are available to express probability directly in the concept descriptions. Here, we want to use another approach, and do not allow for probabilistic constructors, but define the notion of a probability of general concept inclusions in the light-weight description logic EL⊥ . Furthermore, we provide a method for the computation of a base of GCIs satisfying a certain lower threshold for the probability. More specifically, we use the description logic EL⊥ with probabilistic interpretations that have been introduced by Lutz and Schröder in [11]. Beforehand, we only consider conjunctions in the language of formal concept analysis, and define the notion of a probabilistic formal context in a more general form than in [4], and provide a technique for the computation of base of implications satisfying a given lower probability threshold. The document is structured as follows. In Section 2 some basic notions for probabilis- tic extensions of formal concept analysis are defined. Then, in Section 3 a method for the computation of a base for all implications satisfying a given lower probability threshold in a probabilistic formal context is developed, and its correctness is proven. The fol- lowing sections extend the results to the description logic EL⊥ . In particular, Section 4 introduces the basic notions for EL⊥ , and defines probabilistic interpretations. Section 5 shows a technique for the construction of a base of GCIs holding in a probabilistic interpretation and fulfilling a lower probability threshold. Furthermore, a comparison of the notions of confidence and probability is drawn at the end of Section 3. 2 Probabilistic Formal Concept Analysis A probability measure P on a countable set W is a mapping P: 2W → [0, 1] such that P(∅) = 0 and P(W ) = 1 hold, and P is σ-additive, i.e., S for all pairwise disjoint countable families (Un )n∈N with Un ⊆ W it holds that P( n∈N Un ) = ∑n∈N P(Un ). A world w ∈ W is possible if P{ w } > 0 holds, and impossible otherwise. The set of all possible worlds is denoted by Wε , and the set of all impossible worlds is denoted by W0. Obviously, Wε ] W0 is a partition of W. Definition 1 (Probabilistic Formal Context). A probabilistic formal context K is a tuple (G, M, W, I, P) that consists of a set G of objects, a set M of attributes, a countable set W of worlds, an incidence relation I ⊆ G × M × W, and a probability measure P on W. For a triple (g, m, w) ∈ I we say that object g has attribute m in world w. Furthermore, we define the derivations in world w as operators · Iw : 2G → 2M and · Iw : 2M → 2G where A Iw := { m ∈ M | ∀g ∈ A : (g, m, w) ∈ I } B Iw := { g ∈ G | ∀m ∈ B : (g, m, w) ∈ I } for object sets A ⊆ G and attribute sets B ⊆ M, i.e., A Iw is the set of all common attributes of all objects in A in the world w, and B Iw is the set of all objects that have all attributes in B in w. Probabilistic Implicational Bases and Probabilistic Bases of GCIs 195 Definition 2 (Implication, Probability). Let K = (G, M, W, I, P) be a probabilistic formal context. 
For attribute sets X, Y ⊆ M we call X → Y an implication over M, and its probability in K is defined as the measure of the set of worlds it holds in, i.e., P(X → Y) := P{ w ∈ W | X Iw ⊆ Y Iw }. Furthermore, we define the following properties for implications X → Y: 1. X → Y holds in world w of K if X Iw ⊆ Y Iw is satisfied. 2. X → Y certainly holds in K if it holds in all worlds of K. 3. X → Y almost certainly holds in K if it holds in all possible worlds of K. 4. X → Y possibly holds in K if it holds in a possible world of K. 5. X → Y is impossible in K if it does not hold in any possible world of K. 6. X → Y is refuted by K if does not hold in any world of K. It is readily verified that P(X → Y) = P{ w ∈ Wε | X Iw ⊆ Y Iw } = ∑{ P{ w } | w ∈ Wε and X Iw ⊆ Y Iw }. An implication X → Y almost certainly holds if P(X → Y) = 1, possibly holds if P(X → Y) > 0, and is impossible if P(X → Y) = 0. If X → Y certainly holds, then it is almost certain, and if X → Y is refuted, then it is impossible. 3 Probabilistic Implicational Bases At first we introduce the notion of a probabilistic implicational base. Then we will develop and prove a construction for such bases w.r.t. probabilistic formal contexts. If the underlying context is finite, then the base is computable. The reader should be aware of the standard notions of formal concept analysis in [7]. Recall that an implication follows from an implication set if, and only if, it can be syntactically deduced using the so-called Armstrong rules as follows: 1. From X ⊇ Y infer X → Y. 2. From X → Y and Y → Z infer X → Z. 3. From X1 → Y1 and X2 → Y2 infer X1 ∪ X2 → Y1 ∪ Y2. Definition 3 (Probabilistic Implicational Base). Let K = (G, M, W, I, P) be a proba- bilistic formal context, and p ∈ [0, 1] a threshold. A probabilistic implicational base for K and p is an implication set B over M that satisfies the following properties: 1. B is sound for K and p, i.e., P(X → Y) ≥ p holds for all implications X → Y ∈ B, and 2. B is complete for K and p, i.e., if P(X → Y) ≥ p, then X → Y follows from B. A probabilistic implicational base is irredundant if none of its implications follows from the others, and is minimal if it has minimal cardinality among all bases for K and p. It is readily verified that the above definition is a straight-forward generalization of implicational bases as defined in [7, Definition 37], in particular formal contexts coincide with probabilistic formal contexts having only one possible world, and implications holding in the formal context coincide with implications having probability 1. We now define a transformation from probabilistic formal contexts to formal contexts. It allows to decide whether an implication (almost) certainly holds, and furthermore it can be utilized to construct an implicational base for the (almost) certain implications. Definition 4 (Scaling). Let K be a probabilistic formal context. The certain scaling of K is the formal context K× := (G × W, M, I × ) where ((g, w), m) ∈ I × iff (g, m, w) ∈ I, and the almost certain scaling of K is the subcontext K× × ε := (G × Wε , M, Iε ) of K . × 196 Francesco Kriegel Lemma 5. Let K = (G, M, W, I, P) be a probabilistic formal context, and let X → Y be a formal implication. Then the following statements are satisfied: 1. X → Y certainly holds in K if, and only if, X → Y holds in K× . 2. X → Y almost certainly holds in K if, and only if, X → Y holds in K× ε . Proof. 
It is readily verified that the following equivalences hold: P(X → Y) = 1 ⇔ ∀w ∈ W : X Iw ⊆ Y Iw × ] ] × ⇔ XI = w∈W X Iw × { w } ⊆ w∈W Y Iw × { w } = Y I ⇔ K× |= X → Y. The second statement can be proven analogously. t u Recall the notion of a pseudo-intent [6–8]: An attribute set P ⊆ M of a formal context (G, M, I ) is a pseudo-intent if P 6= P II , and Q II ⊆ P holds for all pseudo-intents Q ( P. Furthermore, it is well-known that the canonical implicational base of a formal context (G, M, I ) consists of all implications P → P II where P is a pseudo-intent, cf. [6–8]. Consequently, the next corollary is an immediate consequence of Lemma 5. Corollary 6. Let K be a probabilistic formal context. Then the following statements hold: 1. An implicational base for K× is an implicational base for the certain implications of K, in particular this holds for the following implication set: × × BK := { P → P I I | P ∈ PsInt(K× ) }. 2. An implicational base for K× ε w.r.t. the background knowledge BK is an implicational base for the almost certain implications of K, in particular this holds for the following implication set: × × BK,1 := BK ∪ { P → P Iε Iε | P ∈ PsInt(K× ε , BK ) }. Lemma 7. Let K = (G, M, W, I, P) be a probabilistic formal context. Then the following statements are satisfied: 1. Y ⊆ X implies that X → Y certainly holds in K. 2. X1 ⊆ X2 and Y1 ⊇ Y2 imply P(X1 → Y1 ) ≤ PV (X2 → Y2 ). 3. X0 ⊆ X1 ⊆ . . . ⊆ Xn implies P(X0 → Xn ) ≤ in=1 P(Xi−1 → Xi ). Proof. 1. If Y ⊆ X, then X Iw ⊆ Y Iw follows for all worlds w ∈ W. 2. Assume X1 ⊆ X2 and Y2 ⊆ Y1. Then X1Iw ⊇ X2Iw and Y2Iw ⊇ Y1Iw follow for all worlds w ∈ W. Consider a world w ∈ W where X1Iw ⊆ Y1Iw . Of course, we may conclude that X2Iw ⊆ Y2Iw . As a consequence we get P(X1 → Y1 ) ≤ P(X2 → Y2 ). 3. We prove the third claim by induction on n. For n = 0 there is nothing to show, and the case n = 1 is trivial. Hence, consider n = 2 for the induction base, and let X0 ⊆ X1 ⊆ X2. Then we have that X0Iw ⊇ X1Iw ⊇ X2Iw is satisfied in all worlds w ∈ W. Now consider a world w ∈ W where X0Iw ⊆ X2Iw is true. Probabilistic Implicational Bases and Probabilistic Bases of GCIs 197 Of course, it then follows that X0Iw ⊆ X1Iw ⊆ X2Iw . Consequently, we conclude P(X0 → X2 ) ≤ P(X0 → X1 ) and P(X0 → X2 ) ≤ P(X1 → X2 ). For the induction step let n > 2. The induction hypothesis yields that ^n−1 P(X0 → Xn−1 ) ≤ P(Xi−1 → Xi ). i=1 Of course, it also holds that X0 ⊆ Xn−1 ⊆ Xn , and it follows by induction hypothesis and the previous inequality that ^n P(X0 → Xn ) ≤ P(X0 → Xn−1 ) ∧ P(Xn−1 → Xn ) ≤ P(Xi−1 → Xi ). t u i=1 Lemma 8. Let K = (G, M, W, I, P) be a probabilistic formal context. Then for all implica- tions X → Y the following equalities are valid: × × × × × × × × P(X → Y) = P(X I I → Y I I ) = P(X Iε Iε → Y Iε Iε ). Proof. Let X → Y be an implication. Then for all worlds w ∈ W it holds that × g ∈ X Iw ⇔ ∀m ∈ X : (g, m, w) ∈ I ⇔ ∀m ∈ X : ((g, w), m) ∈ I × ⇔ (g, w) ∈ X I , × and we conclude that X Iw = π1 (X I ∩ (G × { w })). Furthermore, we then infer × × X Iw = X I I Iw , and thus the following equations hold: P(X → Y) = P{ w ∈ W | X Iw ⊆ Y Iw } × × × × × × × × = P{ w ∈ W | X I I Iw ⊆ Y I I Iw } = P(X I I → Y I I ). × In particular, for all possible worlds w ∈ Wε it holds that g ∈ X Iw ⇔ (g, w) ∈ X Iε , and × × × thus X Iw = π1 (X Iε ∩ (G × { w })) and X Iw = X Iε Iε Iw are satisfied. Consequently, × × × × it may be concluded that P(X → Y) = P(X Iε Iε → Y Iε Iε ). t u Lemma 9. Let K be a probabilistic formal context. Then the following statements hold: 1. 
If B is an implicational base for the certain implications of K, then the implication X → Y × × × × follows from B ∪ { X I I → Y I I }. 2. If B is an implicational base for the almost certain implications of K, then the implication × × × × X → Y follows from B ∪ { X Iε Iε → Y Iε Iε }. × × Proof. Of course, the implication X → X I I holds in K× , i.e., certainly holds in K × × by Lemma 5, and hence follows from B. Thus, the implication X → Y I I is entailed × × × × × × by B ∪ { X I I → Y I I }, and because of Y ⊆ Y I I the claim follows. The second statement follows analogously. t u Lemma 10. Let K be a probabilistic formal context. Then the following statements hold: × × × × × × × × 1. P(X Iε Iε → Y Iε Iε ) = P(X Iε Iε → (X ∪ Y) Iε Iε ), × × × × 2. (X ∪ Y) Iε Iε → Y Iε Iε certainly holds in K, and 198 Francesco Kriegel × × × × × × × × × × × × 3. X Iε Iε → Y Iε Iε is entailed by { X Iε Iε → (X ∪ Y) Iε Iε , (X ∪ Y) Iε Iε → Y Iε Iε }. × × (X ∪ Y) Iε Iε p × × × × X Iε Iε p Y Iε Iε × × × × × × × × × × Proof. First note that (X Iε Iε ∪ Y Iε Iε ) Iε Iε = (X ∪ Y) Iε Iε . As Y Iε Iε is a subset of × × × × × × × × × × (X Iε Iε ∪ Y Iε Iε ) Iε Iε , the implication (X ∪ Y) Iε Iε → Y Iε Iε certainly holds in K, cf. Statement 1 in Lemma 7. Furthermore, we have that X Iw ⊆ Y Iw if, and only if, X Iw ⊆ X Iw ∩ Y Iw = (X ∪ Y) Iw . Hence, the implication X → Y has the same probability as X → X ∪ Y. Consequently, we may conclude by means of Lemma 7 that × × × × × × × × P(X Iε Iε → Y Iε Iε ) = P(X → Y) = P(X → X ∪ Y) = P(X Iε Iε → (X ∪ Y) Iε Iε ). × × × × × × × × × × × × Obviously, { X Iε Iε → (X ∪ Y) Iε Iε , (X ∪ Y) Iε Iε → Y Iε Iε } entails X Iε Iε → Y Iε Iε . t u Lemma 11. Let K be a probabilistic formal context, and X, Y be intents of K× ε such that X ⊆ Y and P(X → Y) ≥ p. Then the following statements are true: 1. There is a chain of neighboring intents X = X0 ≺ X1 ≺ X2 ≺ . . . ≺ Xn = Y in K× ε , 2. P(Xi−1 → Xi ) ≥ p for all i ∈ { 1, . . . , n }, and 3. X → Y is entailed by { Xi−1 → Xi | i ∈ { 1, . . . , n } }. Proof. The existence of a chain X = X0 ≺ X1 ≺ X2 ≺ . . . ≺ Xn−1 ≺ Xn = Y of neighboring intents between X and Y in K× ε follows from X ⊆ Y. From Statement 3 in Lemma 7 it follows that all implications Xi−1 → Xi have a probability of at least p in K. It is trivial that they entail X → Y. t u Theorem 12 (Probabilistic Implicational Base). Let K be a probabilistic formal context, and p ∈ [0, 1) a probability threshold. Then the following implication set is a probabilistic implicational base for K and p: BK,p := BK,1 ∪ { X → Y | X, Y ∈ Int(K× ε ) and X ≺ Y and P(X → Y) ≥ p }. Proof. All implications in BK,1 hold almost certainly in K, and thus have probability 1. By construction, all other implications X → Y in the second subset have a probability ≥ p. Hence, Statement 1 in Definition 3 is satisfied. Now consider an implication X → Y over M such that P(X → Y) ≥ p. We have to prove Statement 2 of Definition 3, i.e., that X → Y is entailed by BK,p . × × × × Lemma 8 yields that both implications X → Y and X Iε Iε → Y Iε Iε have the same × × × × probability. Lemma 9 states that X → Y follows from BK,1 ∪ { X Iε Iε → Y Iε Iε }. × × × × × × According to Lemma 10, the implication X Iε Iε → Y Iε Iε follows from { X Iε Iε → × × × × × × (X ∪ Y) Iε Iε , (X ∪ Y) Iε Iε → Y Iε Iε }. 
Furthermore, it holds that × × × × × × × × P(X Iε Iε → (X ∪ Y) Iε Iε ) = P(X Iε Iε → Y Iε Iε ) = P(X → Y) ≥ p, Probabilistic Implicational Bases and Probabilistic Bases of GCIs 199 × × × × and the second implication (X ∪ Y) Iε Iε → Y Iε Iε certainly holds, i.e., follows from BK,1. Finally, Lemma 11 states that there is a chain of neighboring intents of K× ε × × × × starting at X Iε Iε and ending at (X ∪ Y) Iε Iε , i.e., × × I× I× I× I× I× I× I× I× × × X Iε Iε = X0ε ε ≺ X1ε ε ≺ X2ε ε ≺ . . . ≺ Xnε ε = (X ∪ Y) Iε Iε , I× I× I× I× such that all implications Xi−1 → Xi ε ε ε ε have a probability ≥ p, and are thus con- tained in BK,p . Hence, BK,p entails the implication X → Y. t u Corollary 13. Let K be a probabilistic formal context. Then the following set is an implica- tional base for the possible implications of K: BK,ε := BK,1 ∪ { X → Y | X, Y ∈ Int(K× ε ) and X ≺ Y and P(X → Y) > 0 }. However, it is not possible to show irredundancy or minimality for the base of probabilistic implications given above in Theorem 12. Consider the probabilistic formal context K = ({ g1, g2 }, { m1, m2 }, { w1, w2 }, I, { { w1 } 7→ 12 , { w2 } 7→ 12 }) whose incidence relation I is defined as follows: w1 m1 m2 w2 m1 m2 (G × W, { m2 }) g1 × × g1 × × g2 × g2 × × ({ (g1, w1 ), (g1, w2 ), (g2, w2 ) }, { m1, m2 }) The only pseudo-intent of K× is ∅, and the concept lattice of K× is shown above. Hence, we have the following probabilistic implicational base for p = 12 : BK, 1 = { ∅ → { m2 }, { m2 } → { m1, m2 } }. 2 However, the set B := { ∅ → { m1, m2 } } is also a probabilistic implicational base for K and 12 with less elements. In order to compute a minimal base for the implications holding in a probabilistic formal context with a probability ≥ p, one can for example determine the above given probabilistic base, and minimize it by means of constructing the Duquenne-Guigues base of it. This either requires the transformation of the implication set into a formal con- text that has this implication set as an implicational base, or directly compute all pseudo- closures of the closure operator induced by the (probabilistic) implicational base. Recall that the confidence of an implication X → Y in a formal context (G, M, I ) is defined as conf (X → Y) := (X ∪ Y) I / X I , cf. [12]. In general, there is no corre- spondence between the probability of an implication in K and its confidence in K× or K× ε . To prove this we will provide two counterexamples. As first counterexample we consider the context K above. It is readily verified that P({ m2 } → { m1 }) = 21 and conf ({ m2 } → { m1 }) = 34 , i.e., the confidence is greater than the probability. Furthermore, consider the following modification of K as second counterexample: w1 m1 m2 w2 m1 m2 g1 × × g1 × g2 g2 × Then we have that P({ m2 } → { m1 }) = 12 and conf ({ m2 } → { m1 }) = 31 , i.e., the confidence is smaller than the probability. 200 Francesco Kriegel 4 The Description Logic EL⊥ and Probabilistic Interpretations This section gives a brief overview on the light-weight description logic EL⊥ [1]. First, assume that ( NC , NR ) is a signature, i.e., NC is a set of concept names, and NR is a set of role names, respectively. Then EL⊥ -concept descriptions C over ( NC , NR ) may be constructed according to the following inductive rule (where A ∈ NC and r ∈ NR ): C ::= ⊥ | > | A | C u C | ∃ r. C. We shall denote the set of all EL⊥ -concept descriptions over ( NC , NR ) by EL⊥ ( NC , NR ). 
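For illustration only (this sketch and its class, role, and concept names are not part of the original paper), the inductive syntax just given can be represented as a small algebraic data type:

```python
from dataclasses import dataclass

# Minimal representation of the EL-bottom syntax C ::= ⊥ | ⊤ | A | C ⊓ C | ∃r.C.
# Frozen dataclasses make descriptions immutable and hashable.

@dataclass(frozen=True)
class Bottom:            # ⊥
    pass

@dataclass(frozen=True)
class Top:               # ⊤
    pass

@dataclass(frozen=True)
class ConceptName:       # A ∈ N_C
    name: str

@dataclass(frozen=True)
class And:               # C ⊓ D
    left: object
    right: object

@dataclass(frozen=True)
class Exists:            # ∃r.C with r ∈ N_R
    role: str
    filler: object

# Example description (names invented for illustration): ∃hasChild.(Person ⊓ Happy)
example = Exists("hasChild", And(ConceptName("Person"), ConceptName("Happy")))
```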
Second, the semantics of EL⊥ -concept descriptions is defined by means of interpre- tations: An interpretation is a tuple I = (∆I , ·I ) that consists of a set ∆I , called domain, I I I and an extension function ·I : NC ∪ NR → 2∆ ∪ 2∆ ×∆ that maps concept names A ∈ NC to subsets AI ⊆ ∆I and role names r ∈ NR to binary relations rI ⊆ ∆I × ∆I . The extension function is extended to all EL⊥ -concept descriptions as follows: ⊥I := ∅, >I := ∆I , (C u D)I := CI ∩ DI , (∃ r. C)I := { d ∈ ∆I | ∃e ∈ ∆I : (d, e) ∈ rI and e ∈ CI }. A general concept inclusion (GCI) in EL⊥ is of the form C v D where C and D are EL⊥ - concept descriptions. It holds in an interpretation I if CI ⊆ DI is satisfied, and we then also write I |= C v D, and say that I is a model of C v D. Furthermore, C is subsumed by D if C v D holds in all interpretations, and we shall denote this by C v D, too. A TBox is a set of GCIs, and a model of a TBox is a model of all its GCIs. A TBox T entails a GCI C v D, denoted by T |= C v D, if every model of T is a model of C v D. To introduce probability into the description logic EL⊥ , we now present the notion of a probabilistic interpretation from [11]. It is simply a family of interpretations over the same domain and the same signature, indexed by a set of worlds that is equipped with a probability measure. Definition 14 (Probabilistic Interpretation, [11]). Let ( NC , NR ) be a signature. A prob- abilistic interpretation I is a tuple (∆I , (·Iw )w∈W , W, P) consisting of a set ∆I , called domain, a countable set W of worlds, a probability measure P on W, and an extension function ·Iw for each world w ∈ W, i.e., (∆I , ·Iw ) is an interpretation for each w ∈ W. For a general concept inclusion C v D its probability in I is defined as follows: P(C v D) := P{ w ∈ W | CIw ⊆ DIw }. Furthermore, for a GCI C v D we define the following properties (as for probabilistic formal contexts): 1. C v D holds in world w if CIw ⊆ DIw . 2. C v D certainly holds in I if it holds in all worlds. 3. C v D almost certainly holds in I if it holds in all possible worlds. 4. C v D possibly holds in I if it holds in a possible world. 5. C v D is impossible in I if it does not hold in any possible world. 6. C v D is refuted by I if it does not hold in any world. Probabilistic Implicational Bases and Probabilistic Bases of GCIs 201 It is readily verified that P(C v D) = P{ w ∈ Wε | CIw ⊆ DIw } = ∑{ P{ w } | w ∈ Wε and CIw ⊆ DIw } for all general concept inclusions C v D. 5 Probabilistic Bases of GCIs In the following we construct from a probabilistic interpretation I a base of GCIs that entails all GCIs with a probability greater than a given threshold p w.r.t. I . Definition 15 (Probabilistic Base). Let I be a probabilistic interpretation, and p ∈ [0, 1] a threshold. A probabilistic base of GCIs for I and p is a TBox B that satisfies the following conditions: 1. B is sound for I and p, i.e., P(C v D) ≥ p for all GCIs C v D ∈ B, and 2. B is complete for I and p, i.e., if P(C v D) ≥ p, then B |= C v D. A probabilistic base B is irredundant if none of its GCIs follows from the others, and is minimal if it has minimal cardinality among all probabilistic bases for I and p. For a probabilistic interpretation I we define its certain scaling as the disjoint union × of all interpretations Iw with w ∈ W, i.e., as the interpretation I × := (∆I × W, ·I ) whose extension mapping is given as follows: × AI := { (d, w) | d ∈ AIw } ( A ∈ NC ), I× : Iw r = { ((d, w), (e, w)) | (d, e) ∈ r } (r ∈ NR ). 
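The notions of a probabilistic interpretation, the probability of a GCI, and the certain scaling admit a direct computational reading. The following Python sketch is an illustration only, not code from the paper: it is restricted to concept and role names whose extensions are listed explicitly, and the worlds, probabilities, and element names are invented.

```python
# Each world w carries an interpretation given by explicit extensions.
# P(C ⊑ D) is the total probability of the worlds where C^Iw ⊆ D^Iw, and the
# certain scaling is the disjoint union of all world interpretations over ∆ × W.

domain = {"d1", "d2"}
prob = {"w1": 0.5, "w2": 0.5}                    # probability measure on worlds

# concept-name extensions per world: ext[w][A] ⊆ domain
ext = {
    "w1": {"A": {"d1"}, "B": {"d1", "d2"}},
    "w2": {"A": {"d1", "d2"}, "B": {"d1", "d2"}},
}
# role extensions per world: rel[w][r] ⊆ domain × domain
rel = {
    "w1": {"r": {("d1", "d2")}},
    "w2": {"r": set()},
}

def gci_probability(c: str, d: str) -> float:
    """P(C ⊑ D) = Σ { P(w) : C^Iw ⊆ D^Iw }, for concept names C and D."""
    return sum(p for w, p in prob.items() if ext[w][c] <= ext[w][d])

def certain_scaling():
    """Disjoint union of all world interpretations: A ↦ {(d,w) : d ∈ A^Iw}, etc."""
    names = {a for w in prob for a in ext[w]}
    roles = {r for w in prob for r in rel[w]}
    concept_ext = {a: {(d, w) for w in prob for d in ext[w].get(a, set())} for a in names}
    role_ext = {r: {((d, w), (e, w)) for w in prob for (d, e) in rel[w].get(r, set())}
                for r in roles}
    return concept_ext, role_ext

print(gci_probability("A", "B"))   # 1.0, since A ⊑ B holds in both worlds
```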
Furthermore, the almost certain scaling Iε× of I is the disjoint union of all interpretations Iw where w ∈ Wε is a possible world. Analogously to Lemma 5, a GCI C v D certainly holds in I iff it holds in I × , and almost certainly holds in I iff it holds in Iε× . In [5] the so-called model-based most-specific concept descriptions (mmscs) have been defined w.r.t. greatest fixpoint semantics as follows: Let J be an interpretation, and X ⊆ ∆J . Then a concept description C is a mmsc of X in J , if X ⊆ CJ is satisfied, and C v D for all concept descriptions D with X ⊆ DJ . It is easy to see that all mmscs of X are unique up to equivalence, and hence we denote the mmsc of X in J by XJ . Please note that there is also a role-depth bounded variant w.r.t. descriptive semantics given in [3]. Lemma 16. Let I be a probabilistic interpretation. Then the following statements hold: × 1. CIw × { w } = CI ∩ (∆I × { w }) for all concept descriptions C and worlds w ∈ W. × 2. CIw × { w } = CIε ∩ (∆I × { w }) for all concept descriptions C and possible worlds w ∈ Wε . × × × × × × × × 3. P(C v D) = P(CI I v DI I ) = P(CIε Iε v DIε Iε ) for all GCIs C v D. Proof. 1. We prove the claim by structural induction on C. By definition, the statement holds for ⊥, >, and all concept names A ∈ NC . Consider a conjunction C u D, then (C u D)Iw × { w } = (CIw ∩ DIw ) × { w } = CIw × { w } ∩ DIw × { w } I.H. I × × =C ∩ DI ∩ (∆I × { w }) × = (C u D)I ∩ (∆I × { w }). 202 Francesco Kriegel For an existential restriction ∃ r. C the following equalities hold: (∃ r. C)Iw × { w } = { d ∈ ∆I | ∃e ∈ ∆I : (d, e) ∈ rIw and e ∈ CIw } × { w } × = { (d, w) | ∃(e, w) : ((d, w), (e, w)) ∈ rI and (e, w) ∈ CIw × { w } } I.H. × × = { (d, w) | ∃(e, w) : ((d, w), (e, w)) ∈ rI and (e, w) ∈ CI } × = (∃ r. C)I ∩ (∆I × { w }). 2. analogously. 3. Using the first statement we may conclude that the following equalities hold: P(C v D) = P{ w ∈ W | CIw × { w } ⊆ DIw × { w } } × × = P{ w ∈ W | CI ∩ (∆I × { w }) ⊆ DI ∩ (∆I × { w }) } × × × × × × = P{ w ∈ W | CI I I ∩ (∆I × { w }) ⊆ DI I I ∩ (∆I × { w }) } × × × × = P{ w ∈ W | CI I Iw × { w } ⊆ DI I Iw × { w } } × × × × = P(CI I v DI I ). The second equality follows analogously. t u For a probabilistic interpretation I = (∆I , ·I , W, P) and a set M of EL⊥ -concept descriptions we define their induced context as the probabilistic formal context KI ,M := (∆I , M, W, I, P) where (d, C, w) ∈ I iff d ∈ CIw . Lemma 17. Let I be a probabilistic interpretation, M a set of EL⊥ -concept descriptions, and X, Y ⊆ M. Then the probability d ofdthe implication X → Y in the induced contextdKI ,M equals d the probability of the GCI X v Y in I , i.e., it holds that P(X → Y) = P( X v Y). Proof. The following equivalences are satisfied for all Z ⊆ M and worlds w ∈ W: l d ∈ Z Iw ⇔ ∀C ∈ Z : (d, C, w) ∈ I ⇔ ∀C ∈ Z : d ∈ CIw ⇔ d ∈ ( Z)Iw . Now consider two subsets X, Y ⊆ M, then it holds that P(X → Y) = P{ w ∈ W | X Iw ⊆ Y Iw } l l l l = P{ w ∈ W | ( X)Iw ⊆ ( Y)Iw } = P( X v Y). t u Analogously to [5], the context KI is defined as KI ,MI with the following attributes: × MI := { ⊥ } ∪ NC ∪ { ∃ r. XIε | ∅ 6= X ⊆ ∆I × Wε }. ⊥ For an implication d set Bdover adset M of EL -concept descriptions we define its induced TBox by B := { X v Y | X → Y ∈ B }. Probabilistic Implicational Bases and Probabilistic Bases of GCIs 203 d Corollary 18. If B contains an almost certain implicational base for KI , then B is complete for the almost certain GCIs of I . Proof. 
We know that a GCI almost certainly holds in I if, and only if, it holds in Iε× . Let B0 ⊆ B be an almost certain implicational base for KI , i.e., an implicational base for (KI )× = KIε× . Then according to Distel in [5, Theorem 5.12] it follows that d ε the TBox B 0 d is a base of GCIs for Iε× , i.e., a base for the almost certain GCIs of I . Consequently, B is complete for the almost certain GCIs of I . t u Theorem 19. Let I be a probabilistic interpretation, and p ∈ [0, 1] a threshold. If B is a probabilistic implicational d base for KI and p that contains an almost certain implicational base for KI , then B is a probabilistic base of GCIs for I and p. d d d Proof. Consider a GCId X vd Y ∈ B. Then Lemma 17 yields that the implication X → Y and the GCI X v Y have the same probability. d Since d B is a probabilistic implicational base for KI and p, we conclude that P( X v Y) ≥ p is satisfied. Assume d that C v D is an arbitrary GCI with probabilityd≥ p. We have to show that B entails C v D. LetdJ be an d arbitrary d model of B. Consider an impli- cation X → Y ∈ B, then X v Y ∈ B holds, and hence it follows that d J d J ( X) ⊆ ( Y) . Consequently, the implication X → Y holds in the induced con- text KJ ,MI . (We here mean the non-probabilistic formal context that is induced by a non-probabilistic interpretation, cf. [2, 3, 5].) Furthermore, since all model-based most-specific d concept descriptions of Iε× are expressible in terms of MI , we have that E ≡ π MI (E) holds for all mmscs E of Iε× , cf. [2, 3, 5]. Hence, we may conclude that × × × × P(C v D) = P(CIε Iε v DIε Iε ) l × × l × × = P( π MI (CIε Iε ) v π MI (DIε Iε )) × × × × = P(π MI (CIε Iε ) → π MI (DIε Iε )). × × × × Consequently, B entails the implication π MI (CIε Iε ) → π MI (DIε Iε ), hence it holds × × × × in KJ ,MI , and furthermore the GCI CIε Iε v DIε Iε holds in J . As J is an arbitrary d × × × × Iε Iε v DIε Iε . interpretation, B entails C d Corollary 18 yields that B is complete for the almost certain GCIs of I . In par- × × d ticular, the GCI C v CIε Iε almost certainly holds in I , and hence follows from B. d × × × × We conclude that B |= C v DIε Iε . Ofdcourse, the GCI DIε Iε v D holds in all interpretations. Finally, we conclude that B entails C v D. t u Corollary d 20. Let I be a probabilistic interpretation, and p ∈ [0, 1] a threshold. Then BKI ,p is a probabilistic base of GCIs for I and p where BKI ,p is defined as in Theorem 12. 6 Conclusion We have introduced the notion of a probabilistic formal context as a triadic context whose third dimension is a set of worlds equipped with a probability measure. Then 204 Francesco Kriegel the probability of implications in such probabilistic formal contexts was defined, and a construction of a base of implications whose probability exceeds a given threshold has been proposed, and its correctness has been verified. Furthermore, the results have been applied to the light-weight description logic EL⊥ with probabilistic interpretations, and so we formulated a method for the computation of a base of general concept inclusions whose probability satisfies a given lower threshold. For finite input data-sets all of the provided constructions are computable. In partic- ular, [3, 5] provide methods for the computation of model-based most-specific concept descriptions, and the algorithms in [6, 10] can be utilized to compute concept lattices and canonical implicational bases (or bases of GCIs, respectively). 
The author thanks Sebastian Rudolph for proof reading and a fruitful discussion, and the anonymous reviewers for their constructive comments. References [1] Franz Baader et al., eds. The Description Logic Handbook: Theory, Implementation, and Applications. New York, NY, USA: Cambridge University Press, 2003. [2] Daniel Borchmann. “Learning Terminological Knowledge with High Confidence from Erroneous Data”. PhD thesis. TU Dresden, Germany, 2014. [3] Daniel Borchmann, Felix Distel, and Francesco Kriegel. Axiomatization of General Concept Inclusions from Finite Interpretations. LTCS-Report 15-13. Chair for Automata Theory, Institute for Theoretical Computer Science, TU Dresden, Germany, 2015. [4] Alexander V. Demin, Denis K. Ponomaryov, and Evgenii Vityaev. “Probabilistic Concepts in Formal Contexts”. In: Perspectives of Systems Informatics - 8th International Andrei Ershov Memorial Conference, PSI 2011, Novosibirsk, Russia, June 27-July 1, 2011, Revised Selected Papers. Ed. by Edmund M. Clarke, Irina Virbitskaite, and Andrei Voronkov. Vol. 7162. Lecture Notes in Computer Science. Springer, 2011, pp. 394–410. [5] Felix Distel. “Learning Description Logic Knowledge Bases from Data using Methods from Formal Concept Analysis”. PhD thesis. TU Dresden, Germany, 2011. [6] Bernhard Ganter. “Two Basic Algorithms in Concept Analysis”. In: Formal Concept Analysis, 8th International Conference, ICFCA 2010, Agadir, Morocco, March 15-18, 2010. Proceedings. Ed. by Léonard Kwuida and Baris Sertkaya. Vol. 5986. Lecture Notes in Computer Science. Springer, 2010, pp. 312–340. [7] Bernhard Ganter and Rudolf Wille. Formal Concept Analysis - Mathematical Foundations. Springer, 1999. [8] Jean-Luc Guigues and Vincent Duquenne. “Famille minimale d’implications informatives résultant d’un tableau de données binaires”. In: Mathématiques et Sciences Humaines 95 (1986), pp. 5–18. [9] Francesco Kriegel. “Axiomatization of General Concept Inclusions in Probabilistic Descrip- tion Logics”. In: Proceedings of the 38th German Conference on Artificial Intelligence, KI 2015, Dresden, Germany, September 21-25, 2015. Vol. 9324. Lecture Notes in Artificial Intelligence. Springer Verlag, 2015. [10] Francesco Kriegel. NextClosures – Parallel Exploration of Constrained Closure Operators. LTCS- Report 15-01. Chair for Automata Theory, Institute for Theoretical Computer Science, TU Dresden, Germany, 2015. [11] Carsten Lutz and Lutz Schröder. “Probabilistic Description Logics for Subjective Uncer- tainty”. In: Principles of Knowledge Representation and Reasoning: Proceedings of the Twelfth International Conference, KR 2010, Toronto, Ontario, Canada, May 9-13, 2010. Ed. by Fangzhen Lin, Ulrike Sattler, and Miroslaw Truszczynski. AAAI Press, 2010. [12] Michael Luxenburger. “Implikationen, Abhängigkeiten und Galois Abbildungen”. PhD thesis. TH Darmstadt, Germany, 1993. Category of isotone bonds between L-fuzzy contexts over different structures of truth degrees Jan Konecny1 and Ondrej Krı́dlo2 1 Data Analysis and Modeling Lab Dept. Computer Science, Palacky University, Olomouc 17. listopadu 12, CZ-77146 Olomouc, Czech Republic 2 Institute of Computer Science, Faculty of Science Pavol Jozef Šafárik University in Košice Jesenná 5, 040 01 Košice, Slovakia. jan.konecny@upol.cz ondrej.kridlo@upjs.sk Abstract. We describe properties of compositions of isotone bonds be- tween L-fuzzy contexts over different complete residuated lattices and we show that L-fuzzy contexts as objects and isotone bonds as arrows form a category. 
1 Introduction In Formal Concept Analysis, bonds represent relationships between formal con- texts. One of the motivations for introducing this notion is to provide a tool for studying mappings between formal contexts, corresponding to the behavior of Galois connections between their corresponding concept lattices. The notions of bonds, scale measures and informorphisms were studied by [14] aiming at a thorough study of the theory of morphisms in FCA. In our previous works, we studied generalizations of bonds into an L-fuzzy setting in [12, 11]. In [13] we also provided a study of bonds between formal fuzzy contexts over different structures of truth degrees. The bonds were based on mappings between complete residuated lattices, called residuation-preserving Galois connections. These mappings were too strict and in [9] we proposed to re- place them by residuation-preserving pl, kq-connections or residuation-preserving dual pl, kq-connections between complete residuated lattices. In the present paper we continue our study [12] of properties of bonds be- tween formal contexts over different structures of truth degrees; this time we concern with bonds mimicking isotone Galois connections between concept lat- tices formed by isotone concept-forming operators. Particularly, we describe the category of formal fuzzy contexts and isotone bonds between them. The paper also extends [13, 9] as we consider a setting with fuzzy formal contexts over different complete residuated lattices. c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 205–216, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 206 Jan Konecny and Ondrej Krídlo The structure of the paper is as follows. First, in Section 2 we recall basic notions required in the rest of the paper. Section 3.1 considers weak homoge- neous L-bonds w.r.t. isotone concept-forming operators and their compositions. Section 3.2 then generalizes the results to the setting of formal fuzzy contexts over different structure of truth degrees. Finally, we summarize our results and outline our future research in this area in Section 4. 2 Preliminaries 2.1 Residuated lattices, fuzzy sets, and fuzzy relations We use complete residuated lattices as basic structures of truth degrees. A com- plete residuated lattice is a structure L “ xL, ^, _, b, Ñ, 0, 1y such that (i) xL, ^, _, 0, 1y is a complete lattice, i.e. a partially ordered set in which arbi- trary infima and suprema exist; (ii) xL, b, 1y is a commutative monoid, i.e. b is a binary operation which is commutative, associative, and a b 1 “ a for each a P L; (iii) b and Ñ satisfy adjointness, i.e. a b b ď c iff a ď b Ñ c. 0 and 1 denote the least and greatest elements. The partial order of L is denoted by ď. Throughout this work, L denotes an arbitrary complete residuated lattice. Elements a of L are called truth degrees. Operations b (multiplication) and Ñ (residuum) play the role of (truth functions of) “fuzzy conjunction” and “fuzzy implication”. An L-set (or L-fuzzy set) A in a universe set X is a mapping assigning to each x P X some truth degree Apxq P L. The set of all L-sets in a universe X is denoted LX . The operations with L-sets are defined componentwise. For instance, the intersection of L-sets A, B P LX is an L-set A X B in X such that pA X Bqpxq “ Apxq ^ Bpxq for each x P X, etc. An L-set A P LX is called crisp if Apxq P t0, 1u for each x P X. 
Crisp L- sets can be identified with ordinary sets. For a crisp A, we also write x P A for Apxq “ 1 and x R A for Apxq “ 0. An L-set A P LX is called empty (denoted by H) if Apxq “ 0 for each x P X. For a P L and A P LX , the a-multiplication a b A and a-shift a Ñ A are L-sets defined by pa b Aqpxq “ a b Apxq, pa Ñ Aqpxq “ a Ñ Apxq. Binary L-relations (binary L-fuzzy relations) between X and Y can be thought of as L-sets in the universe X ˆ Y . That is, a binary L-relation I P LXˆY be- tween a set X and a set Y is a mapping assigning to each x P X and each y P Y a truth degree Ipx, yq P L (a degree to which x and y are related by I). For an L-relation I P LXˆY we define its transpose as the L-relation I T P Y ˆX L given by I T py, xq “ Ipx, yq for each x P X, y P Y . Category of isotone bonds between L-fuzzy contexts 207 Various composition operators for binary L-relations were extensively studied by [6]; we will use the following composition operators, defined for relations A P LXˆF and B P LF ˆY : ł pA ˝ Bqpx, yq “ Apx, f q b Bpf, yq, (1) f PF ľ pA Ż Bqpx, yq “ Bpf, yq Ñ Apx, f q. (2) f PF Note also that for L “ t0, 1u, A˝B coincides with the well-known composition of binary relations. We will occasionally use some of the following properties concerning the associativity of several composition operators, see [2]. Theorem 1. The operator ˝ from above has the following properties concerning composition. – Associativity: R ˝ pS ˝ T q “ pR ˝ Sq ˝ T. (3) – Distributivity: ď ď ď ď p Ri q ˝ S “ pRi ˝ Sq, and R˝p Si q “ pR ˝ Si q. (4) i i i i 2.2 Formal fuzzy concept analysis An L-context is a triplet xX, Y, Iy where X and Y are (ordinary nonempty) sets and I P LXˆY is an L-relation between X and Y . Elements of X are called objects, elements of Y are called attributes, I is called an incidence relation. Ipx, yq “ a is read: “The object x has the attribute y to degree a.” Consider the following pair xX, Yy of operators X : LX Ñ LY and Y : LY Ñ LX induced by an L-context xX, Y, Iy: ł ľ AX pyq “ Apxq b Ipx, yq, B Y pxq “ Ipx, yq Ñ Bpyq. (5) xPX yPY for all A P LX and B P LY . When we consider concept-forming operators in- duced by multiple L-relations, we write the inducing L-relation as the subscript of the symbols of the operators. For example, the pair of concept-forming oper- ators induced by L-relation I are written as xXI , YI y. Remark 1. Notice that the pair of concept-forming operators can be interpreted as instances of the composition operators between relations. Applying the iso- morphisms L1ˆX – LX and LY ˆ1 – LY whenever necessary, one could write them, alternatively, as AX “ A ˝ I and B Y “ I Ž B p“ B Ż I T q. 208 Jan Konecny and Ondrej Krídlo Furthermore, denote the set of fixed points of xX , Y y by B XY pX, Y, Iq, i.e. B XY pX, Y, Iq “ txA, By P LX ˆ LY | AX “ B, B Y “ Au. (6) The set of fixed points endowed with ď, defined by xA1 , B1 y ď xA2 , B2 y if A1 Ď A2 (equivalently B2 Ď B1 ) is a complete lattice [5], called an attribute-oriented L-concept lattice associated with I, and its elements are called (attribute-oriented) formal L-concepts (or just L-concepts). For thorough studies of attribute-oriented concept lattices, see [5, 7, 15]. In a formal concept xA, By, the A is called an extent, and B is called an intent. The set of all extents and the set of all intents are denoted by ExtXY and IntXY , respectively. That is, ExtXY pX, Y, Iq “ tA P LX | xA, By P B XY pX, Y, Iq for some Bu, (7) IntXY pX, Y, Iq “ tB P LY | xA, By P B XY pX, Y, Iq for some Au. 
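The composition operators (1) and (2) recalled above are the ones used below to compose bonds. As an illustration (not the authors' code; the Łukasiewicz chain, the toy relations, and the function names are assumptions), the following Python sketch computes both compositions for finite L-relations stored as dictionaries:

```python
# (A ∘ B)(x,y) = sup_f A(x,f) ⊗ B(f,y)   and   (A ◁ B)(x,y) = inf_f B(f,y) → A(x,f),
# here over the Łukasiewicz structure on the chain {0, 0.5, 1}.

def tnorm(a, b):          # Łukasiewicz multiplication ⊗
    return max(0.0, a + b - 1.0)

def residuum(a, b):       # Łukasiewicz residuum →
    return min(1.0, 1.0 - a + b)

def circ(A, B, X, F, Y):
    """Sup-⊗ composition, formula (1)."""
    return {(x, y): max(tnorm(A[x, f], B[f, y]) for f in F) for x in X for y in Y}

def triangle(A, B, X, F, Y):
    """Inf-residuum composition, formula (2)."""
    return {(x, y): min(residuum(B[f, y], A[x, f]) for f in F) for x in X for y in Y}

X, F, Y = ["x1", "x2"], ["f1", "f2"], ["y1"]
A = {("x1", "f1"): 1.0, ("x1", "f2"): 0.5, ("x2", "f1"): 0.0, ("x2", "f2"): 1.0}
B = {("f1", "y1"): 0.5, ("f2", "y1"): 1.0}
print(circ(A, B, X, F, Y))      # {('x1','y1'): 0.5, ('x2','y1'): 1.0}
print(triangle(A, B, X, F, Y))  # {('x1','y1'): 0.5, ('x2','y1'): 0.5}
```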
Equivalently, we can characterize ExtXY pX, Y, Iq and IntXY pX, Y, Iq as follows ExtXY pX, Y, Iq “ tB Y | B P LY u, (8) IntXY pX, Y, Iq “ tAX | A P LX u. We will need the following lemma from [4]. Lemma 1. Consider L-contexts xX, Y, Iy, xX, F, Ay, and xF, Y, By. (a) IntXY pX, Y, Iq Ď IntXY pF, Y, Bq if and only if there exists A1 P LXˆF such that I “ A1 ˝ B, (b) ExtXY pX, Y, A ˝ Bq Ď ExtXY pX, F, Aq. Definition 1. An L-relation β P LX1 ˆY2 is called a homogeneous weak L-bond3 from L-context xX1 , Y1 , I1 y to L-context xX2 , Y2 , I2 y if ExtXY pX1 , Y2 , βq Ď ExtXY pX1 , Y1 , I1 q, (9) IntXY pX , Y , βq Ď IntXY pX , Y , I q. 1 2 2 2 2 In this paper we assume only weak homogeneous L-bonds w.r.t. xX, Yy. In what follows, we omit the words ‘weak homogeneous’ and the pair of concept- forming operators and call them just ‘L-bonds’. We will utilize the following characterization of L-bonds. Lemma 2 ([7]). An L-relation β P LX1 ˆY2 is an L-bond from xX1 , Y1 , I1 y to xX2 , Y2 , I2 y iff there is such L-relation Se that β “ Se ˝ I2 and YSe maps extents of B XY pX2 , Y2 , I2 q to extents of B XY pX2 , Y2 , I2 q. Remark 2. Note that due to results on fuzzy relational equations we have that the L-relation Se from Lemma 2 is equal to β Ż I2T (see [2]). 3 The notion of L-bond was introduced in [12]; however we adapt its definition the same way as in [8, 10] w.r.t. xX, Yy Category of isotone bonds between L-fuzzy contexts 209 3 Results Firstly, we describe compositions of L-bonds and show that they form a category. Later we generalize the results to setting of isotone bonds between fuzzy contexts over different complete residuated lattices. 3.1 Setting with uniform structures of truth degrees We start with the notion of composition of L-bonds. Definition 2. Let β1 be an L-bond from xX1 , Y1 , I1 y to xX2 , Y2 , I2 y and β2 be an L-bond from xX2 , Y2 , I2 y to xX3 , Y3 , I3 y. Define composition of β1 and β2 as the L-relation pβ1 Ż I2T q ˝ β2 P LX1 ˆY3 and denote it β1 ‚ β2 . Theorem 2. The composition of L-bonds is an L-bond. Proof. Let β1 be an L-bond from xX1 , Y1 , I1 y to xX2 , Y2 , I2 y and β2 be an L- bond from xX2 , Y2 , I2 y to xX3 , Y3 , I3 y. By Lemma 2 there are Se P LX1 ˆX2 , Se 1 P LX2 ˆX3 such that β1 “ Se ˝ I2 , β2 “ Se 1 ˝ I3 . By Definition 2 and Remark 2 we have β1 ‚ β2 “ pβ1 Ż I2T q ˝ β2 “ Se ˝ Se 1 ˝ I3 . Hence we have IntXY pX1 , Y3 , β1 ‚ β2 q Ď IntXY pX3 , Y3 , I3 q (10) by Lemma 1 (a). Note that the mapping YSe maps extents of I2 to extents of I1 by Lemma 2 and that B Yβ2 is extent of I2 for any B P IntXY pX3 , Y3 , I3 q by (8). Thus we have B Yβ1 ‚β2 “ B Yβ2 YSe P ExtXY pX1 , Y1 , I1 q, hence ExtXY pX1 , Y3 , β1 ‚ β2 q Ď ExtXY pX1 , Y1 , I1 q. (11) The equalities (10) and (11) imply that β1 ‚ β2 is an L-bond. \ [ Lemma 3. Let β be an L-bond from L-context xX1 , Y1 , I1 y to L-context xX2 , Y2 , I2 y. For any L-set A P LX1 we have that AXI1 YI1 Xβ “ AXβ . Proof. Let A be an arbitrary L-set from LX1 . Then AXI1 YI1 Xβ Ě AXβ since p´qXβ is isotone and AXI1 YI1 Ě A “ AX β Y β X β “ AXβ Yβ XI2 YI2 Xβ due to definition of L-bond Ě AXI1 YI1 Xβ since the mapping p´qXI1 YI1 Xβ is isotone Hence AXI1 YI1 Xβ “ AXβ . \ [ 210 Jan Konecny and Ondrej Krídlo The equality from Lemma 3 written in relational form is A˝β “ pA˝I1 qŻI1T q˝β; we use that to prove the following theorem. Theorem 3. Composition of L-bonds is associative. Proof. 
Let β1 be an L-bond from xX1 , Y1 , I1 y to xX2 , Y2 , I2 y, β2 be an L-bond from xX2 , Y2 , I2 y to xX3 , Y3 , I3 y, and β3 be an L-bond from xX3 , Y3 , I3 y to xX4 , Y4 , I4 y. We have pβ1 ‚ β2 q ‚ β3 “ pppβ1 Ż I2T q ˝ β2 q Ż I3T q ˝ β3 by Definition 2 “ ppSe ˝ β2 q Ż I3T q ˝ β3 by Remark 2 “ ppSe ˝ pSe1 ˝ I3 qq Ż I3T q ˝ β3 by Lemma 2 “ pppSe ˝ Se1 q ˝ I3 q Ż I3T q ˝ β3 by (3) “ pSe ˝ Se1 q ˝ β3 by Lemma 3 “ Se ˝ pSe1 ˝ β3 q by (3) “ Se ˝ pβ2 ‚ β3 q “ β1 ‚ pβ2 ‚ β3 q by Remark 2 and Definition 2. \ [ We obtain a category of L-contexts and L-bonds. Theorem 4. The structure of L-contexts and L-bonds forms a category: Objects are L-contexts, Arrows are L-bonds where identity arrow of any formal L-context xX, Y, Iy is its incidence relation I,4 composition of arrows β1 ‚ β2 is given by Definition 2. Remark 3. The category is equivalent to category of attribute-oriented concept lattices and isotone Galois connections. That is analogous to results in [12]. We will bring more about is in full version of the paper. 3.2 Setting with different structures of truth degrees In this section we generalize the previous results into a setting in which fuzzy con- texts are defined over different complete residuated lattices. To do that we need to explore compositions of underlying morphisms called residuation-preserving pl, kq-connections between complete residuated lattices. pl, kq-connections and their compositions Firstly, let us recall definition and basic properties of the pl, kq-connections in- troduced in [9]. Definition 3 ([9]). Let L1 , L2 be complete residuated lattices, let l P L1 , k P L2 and let λ : L1 Ñ L2 , κ : L2 Ñ L1 be mappings, such that 4 Clearly, I is an L-bond from xX, Y, Iy to xX, Y, Iy. Category of isotone bonds between L-fuzzy contexts 211 1 ‚ 1 ‚ d ‚ 0.75 ‚ ‚ c ‚ 0.5 b ‚ ‚ a ‚ 0.25 ‚ 0 ‚ 0 b 0 a b c d 1 Ñ 0 a b c d 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 a 0 0 0 a 0 a a d 1 d 1 1 1 b 0 0 b 0 b b b c c 1 c 1 1 c 0 a 0 c a c c b d b 1 d 1 d 0 0 b a b d d a c d c 1 1 1 0 a b c d 1 1 0 a b c d 1 Fig. 1. Six-element residuated lattice, with b and Ñ as showed in the bottom part (011010:00A0B0BCAB in [3]), (top left), five-element Lukasiewicz chain (111:000AB in [3]), (top right), and pc, 0.5q-connection between them. – xλ, κy is an isotone Galois connection between L1 and L2 , – κλpa1 q “ l Ñ1 pl b1 a1 q for each a1 P L1 , – λκpa2 q “ k b2 pk Ñ2 a2 q for each a2 P L2 . We call xλ, κy an pl, kq-connection from L1 to L2 . An pl, kq-connection from L1 to L2 is called residuation-preserving if κpk b2 pλpaq Ñ2 λpbqqq “ κλpaq Ñ1 κλpbq (12) holds true for any a, b P L2 . Theorem 5 ([9]). Let xλ, κy be a residuation-preserving pl, kq-connection from L1 to L2 . The algebra xfixpλ, κq, ^, _, b, Ñ, 0, 1y where ^ and _ are given by the order xa1 , a2 y ď xb1 , b2 y if a1 ď1 b1 , (13) (equivalently, if a2 ď2 b2 ) and the adjoint pair is given by xa1 , a2 y Ñ xb1 , b2 y “ xa1 Ñ1 b1 , k b2 pa2 Ñ2 b2 qy (14) “ xa1 Ñ1 b1 , k b2 ppk Ñ2 a2 q Ñ2 pk Ñ2 b2 qqy, (15) xa1 , a2 y b xb1 , b2 y “ xl Ñ1 pl b1 a1 b1 b1 q, a2 b2 pk Ñ2 b2 qy (16) “ xl Ñ1 pl b1 a1 b1 b1 q, pk Ñ2 a2 q b2 b2 y (17) is a complete residuated lattice. 212 Jan Konecny and Ondrej Krídlo Figure 1 shows an example of pl, kq-connection. We refer the reader to [9] for ideas behind pl, kq-connections, examples and further details. Now we define composition of pl, kq-connections and show that it is an pl, kq- connection as well. 
In addition, the composition preserves residuation-preservation, that means that composition of residuation-preserving pl, kq-connections is a residuation-preserving pl, kq-connection as well. Theorem 6. Let xλ1 , κ1 y be an pl1 , k2 q-connection from L1 to L2 and xλ2 , κ2 y be an pk2 , j3 q-connection from L2 to L3 . Then the pair of mappings λ : L1 Ñ L3 , κ : L3 Ñ L1 , defined by λpa1 q “ λ2 pk2 Ñ2 λ1 pa1 qq, (18) κpa3 q “ κ1 pk2 b2 κ2 pa3 qq for each a1 P L1 and a2 P L2 , is an pl1 , j3 q-connection from L1 to L3 . Proof. First, we prove that κλpa1 q “ l1 Ñ1 pl1 b1 a1 q for each a1 P L1 and λκpa2 q “ j3 b3 pj3 Ñ3 a3 q for each a3 P L3 . For each a1 P L1 , we have κλpa1 q “ κ1 pk2 b2 κ2 pλ2 pk2 Ñ2 λ1 pa1 qqqq “ κ1 pk2 b2 pk2 Ñ2 pk2 b pk2 Ñ2 λ1 pa1 qqqqq “ κ1 pk2 b2 pk2 Ñ2 λ1 pa1 qqq “ κ1 pλ1 pκ1 pλ1 pa1 qqqq “ κ1 pλ1 pa1 qq “ l1 Ñ1 pl1 b1 a1 q. Similarly, we have for each a3 P L3 λκpa3 q “ λ2 pk2 Ñ2 λ1 pκ1 pk2 b2 κ2 pa3 qqqq “ λ2 pk2 Ñ2 pk2 b2 pk2 Ñ2 pk2 b2 κ2 pa3 qqqqq “ λ2 pk2 Ñ2 pk2 b2 κ2 pa3 qqqqq “ λ2 pκ2 pλ2 pκ2 pa3 qqqq “ λ2 pκ2 pa3 qq “ j3 b3 pj3 Ñ3 a3 q. Since κλpa1 q “ l1 Ñ1 pl1 b1 a1 q ě1 a1 and λκpa3 q “ j3 b3 pj3 b3 a3 q ď3 a3 we only need to show monotony to prove that xλ, κy is an isotone Galois connection: For each a1 , b1 P L1 we have a1 ď1 b1 implies λ1 pa1 q ď2 λ1 pb1 q since λ1 is monotone, implies k2 Ñ2 λ1 pa1 q ď2 k2 Ñ2 λ1 pb1 q since Ñ2 is monotone in its second argument, implies λ2 pk2 Ñ2 λ1 pa1 qq ď3 λ2 pk2 Ñ2 λ1 pb1 qq since λ2 is monotone. Thus a1 ď1 b1 implies λpa1 q ď3 λpb1 q for each a1 , b1 P L1 . Similarly, one can show that a3 ď3 b3 implies κpa3 q ď1 κpb3 q. \ [ Category of isotone bonds between L-fuzzy contexts 213 Theorem 7. Let xλ1 , κ1 y be a residuation-preserving pl1 , k2 q-connection from L1 to L2 and xλ2 , κ2 y be a residuation-preserving pk2 , j3 q-connection from L2 to L3 . Then the pair of mappings λ : L1 Ñ L3 , κ : L3 Ñ L1 , defined by (18), is a residuation-preserving pl1 , j3 q-connection from L1 to L3 . Proof. For each a1 , b1 P L1 we have κλpa1 q Ñ1 κλpb1 q “ “ pl1 Ñ1 pl1 b1 a1 qq Ñ1 pl1 Ñ1 pl1 b1 b1 qq “ κ1 λ1 pa1 q Ñ1 κ1 λ1 pb1 q “ κ1 pk2 b2 pλ1 pa1 q Ñ2 λ1 pb1 qqq “ κ1 pk2 b2 pλ1 κ1 λ1 pa1 q Ñ2 λ1 κ1 λ1 pb1 qqq “ κ1 pk2 b2 ppk2 b2 pk2 Ñ2 λ1 pa1 qqq Ñ2 pk2 b2 pk2 Ñ2 λ1 pb1 qqqqq “ κ1 pk2 b2 ppk2 b2 pk2 Ñ2 pk2 b2 pk2 Ñ2 λ1 pa1 qqqqq Ñ2 pk2 b2 pk2 Ñ2 λ1 pb1 qqqqq “ κ1 pk2 b2 ppk2 Ñ2 pk2 b2 pk2 Ñ2 λ1 pa1 qqqq Ñ2 pk2 Ñ2 pk2 b2 pk2 Ñ2 λ1 pb1 qqqqq “ κ1 pk2 b2 pκ2 λ2 pk2 Ñ2 λ1 pa1 qq Ñ2 κ2 λ2 pk2 Ñ2 λ1 pb1 qqqq “ κ1 pk2 b2 κ2 pj3 b3 pλ2 pk2 Ñ2 λ1 pa1 qq Ñ3 λ2 pk2 Ñ2 λ1 pb1 qqqqq “ κ1 pk2 b2 κ2 pj3 b3 pλpa1 q Ñ3 λpb1 qqqq “ κpj3 b3 pλpa1 q Ñ3 λpb1 qqq. \ [ We call xλ, κy from (18) a composition of xλ1 , κ1 y and xλ2 , κ2 y and we denote it as xλ1 , κ1 y ‚ xλ2 , κ2 y “ xλ1 ‚ λ2 , κ1 ‚ κ2 y. Now we show, that the composition of pl, kq-connections is associative. Theorem 8. Let xλ1 , κ1 y be an pl1 , k2 q-connection from L1 to L2 , xλ2 , κ2 y be a pk2 , j3 q-connection from L2 to L3 , and xλ3 , κ3 y be a pj3 , m4 q-connection from L3 to L4 . Then xλ1 , κ1 y ‚ pxλ2 , κ2 y ‚ xλ3 , κ3 yq “ pxλ1 , κ1 y ‚ xλ2 , κ2 yq ‚ xλ3 , κ3 y. Proof. We have for each a P L1 pλ1 ‚ pλ2 ‚ λ3 qqpa1 q “ pλ2 ‚ λ3 qpk2 Ñ2 λ1 pa1 qq “ λ3 pj3 Ñ λ2 pk2 Ñ2 λ1 pa1 qqq “ λ3 pj3 Ñ pλ1 ‚ λ2 qpa1 qq “ ppλ1 ‚ λ2 q ‚ λ3 qpa1 q and similarly for the κ-part. \ [ Theorem 9. The following structure forms a category. Objects are pairs xL, ey, where L is a complete residuated lattices and e P L. 
Arrows from xL1 , ly to xL2 , ky are pl, kq-connections from L1 to L2 , where 214 Jan Konecny and Ondrej Krídlo identity arrow on any xL, ey is pe, eq-connection xλ, κy where λpaq “ e b a and κpaq “ e Ñ a for each a P L. composition of arrows is as defined in (18). If we use just residuation-preserving pl, kq-connections we obtain a sub-category. Now, we can explore bonds based on residuation-preserving pl, kq-connections. Definition 4. Let L1 , L2 be complete residuated lattices, xλ, κy be residuation- preserving pl, kq-connection from L1 to L2 , and let xX1 , Y1 , I1 y and xX2 , Y2 , I2 y X1 ˆY2 be L1 -context and L2 -context, respectively. We call β P Lxλ,κy a xλ, κy-bond from xX1 , Y1 , I1 y to xX2 , Y2 , I2 y if the following inclusions hold. ExtMO pX1 , Y2 , βq Ď ExtXY pX1 , Y1 , κλpI1 qq, (19) IntMO pX , Y , βq Ď IntXY pX , Y , λκpI qq. 1 2 2 2 2 (20) The concept-forming operators xM, Oy induced by xλ, κy-bond β from xX1 , Y1 , I1 y to xX2 , Y2 , I2 y are given by5 AMβ “ λpAqXproj2 pβq , (21) B Oβ “ κpBqYproj1 pβq . Theorem 10. Let xX1 , Y1 , I1 y be an L1 -context, xX2 , Y2 , I2 y be an L2 -context, and xλ, κy an pl, kq-connection from L1 to L2 . Then β P Lxλ,κy is a xλ, κy-bond from xX1 , Y1 , I1 y to xX2 , Y2 , I2 y if and only if it is a Lxλ,κy -bond w.r.t. xX, Yy from xX1 , Y1 , xκλpI1 q, λpI1 qyy to xX2 , Y2 , xκpI2 q, λκpI2 qyy. Proof. Directly from the definition and (21). \ [ For what follows we will need the following product of fuzzy relations. Let xλ1 , κ1 y be pl1 , k2 q-connection from L1 to L2 , xλ2 , κ2 y be pk2 , m3 q-connection Y ˆZ from L2 to L3 , and I P LXˆY XˆZ xλ1 ,κ1 y , J P Lxλ2 ,κ2 y . Then I b J P Lxλ1 ‚λ2 ,κ1 ‚κ2 y is defined as I b J “ xκ1 pKq, λ2 pk2 Ñ2 Kqy where K “ proj2 pIq ˝2 proj1 pJq (22) and ˝2 is composition of L2 -relations (1). Lemma 4. Let xX1 , Y1 , I1 y be an L1 -context, xX2 , Y2 , I2 y be an L2 -context, and xλ, κy an pl, kq-connection from L1 to L2 . 1 ˆX2 (a) An Lxλ,κy -relation β for which exist Lxλ,κy -relations Se P LX xλ,κy and Si P 1 ˆY2 LYxλ,κy such that β “ xκλpI1 q, λpI1 qy b Si “ Se b xκpI2 q, λκpI2 qy is a xλ, κy-bond from xX1 , Y1 , I1 y to xX2 , Y2 , I2 y. 5 proj1 , proj2 denote projection of first and second component of a pair, respectively. Category of isotone bonds between L-fuzzy contexts 215 (b) Each xλ, κy-bond β from xX1 , Y1 , I1 y to xX2 , Y2 , I2 y satisfies that there is 1 ˆX2 Se P LX xλ,κy such that β “ Se b xκpI2 q, λκpI2 qy. Proof. From Theorem 10 and Lemma 1. \ [ Theorem 11. Let xλ1 , κ1 y be an pl1 , k2 q-connection from L1 to L2 , xλ2 , κ2 y be an pk2 , j3 q-connection from L2 to L3 , β1 be xλ1 , κ1 y-bond from xX1 , Y1 , I1 y to xX2 , Y2 , I2 y, and β2 be xλ2 , κ2 y-bond from xX2 , Y2 , I2 y to xX3 , Y3 , I3 y. β “ Se b β2 , (23) where Se “ β1 Ż xκ1 pI2T q, λ1 κ1 pI2T qy, is a xλ1 ‚ λ2 , κ1 ‚ κ2 y-bond from xX1 , Y1 , I1 y to xX3 , Y3 , I3 y. Let us denote β from (23) as β “ β1 ‚ β2 and call it a composition of isotone xλ, κy-bonds. Now we show associativity of this composition. Theorem 12. Let xλ1 , κ1 y be an pl1 , k2 q-connection from L1 to L2 , xλ2 , κ2 y be an pk2 , j3 q-connection from L2 to L3 , xλ3 , κ3 y be an pj3 , m4 q-connection from L3 to L4 , and βi be xλi , κi y-bond from xXi , Yi , Ii y to xXi`1 , Yi`1 , Ii`1 y. Then β1 ‚ pβ2 ‚ β3 q “ pβ1 ‚ β2 q ‚ β3 . Proof. Follows from Theorem 3, Theorem 8, and Theorem 10. \ [ Finally, we can state that L-contexts over different structures of truth degrees and bonds between them form a category. Theorem 13. 
Objects are pairs xK, ey, where K is a L-context and e P L. Arrows between xK1 , ly and xK2 , ky, where K1 is an L1 -context, K2 is an L2 -context and l P L1 , k P L2 , are xλ, κy-bonds, where xλ, κy is an pl, kq- connection. identity arrow for a pair xK, ey of L-context xX, Y, Iy and e is xλ, κy- bond I with xλ, κy are pe, eq-connections xλ, κy where λpxq “ e Ñ a and κpxq “ e b a for each a P L. composition of arrows β1 ‚ β2 is given by (23). 4 Future Research Our future research in this area includes addressing the following issues: – Antitone bonds between fuzzy contexts over different complete residuated lattices were studied in [9]; basics of Isotone bonds are presented in this paper. We want to extend this study to heterogeneous bonds[11]. We will bring results on them and their compositions in the full version of this paper. – As block relations are a special case of bonds, they share many properties (see [11]). It can be fruitful to study the compositions described in this paper in context of block L-relations. In addition, the composition applied on block (crisp) relations correspond with multiplication used in calculus studied in [1]. This observation deserves deeper study; we believe that this can bring a new interesting insight to the calculus. 216 Jan Konecny and Ondrej Krídlo Acknowledgments Jan Konecny is supported by grant No. 15-17899S, “Decompositions of Matrices with Boolean and Ordinal Data: Theory and Algorithms”, of the Czech Science Foundation. Ondrej Krı́dlo is supported by grant VEGA 1/0073/15 by the Ministry of Ed- ucation, Science, Research and Sport of the Slovak republic and University Sci- ence Park TECHNICOM for Innovation Applications Supported by Knowledge Technology, ITMS: 26220220182, supported by the Research & Development Operational Programme funded by the ERDF. References 1. Eduard Bartl and Michal Krupka. Residuated lattices of block relations: Size reduction of concept lattices. to appear in IJGS, 2015. 2. Radim Belohlavek. Fuzzy Relational Systems: Foundations and Principles. Kluwer Academic Publishers, Norwell, USA, 2002. 3. Radim Belohlavek and Vilem Vychodil. Residuated lattices of size ď 12. Order, 27(2):147–161, 2010. 4. Radim Belohlavek and Jan Konecny. Row and Column Spaces of Matrices over Residuated Lattices. Fundam. Inform., 115(4):279–295, 2012. 5. George Georgescu and Andrei Popescu. Non-dual fuzzy connections. Arch. Math. Log., 43(8):1009–1039, 2004. 6. Ladislav J Kohout and Wyllis Bandler. Relational-product architectures for infor- mation processing. Information Sciences, 37(1-3):25–37, 1985. 7. Jan Konecny. Isotone fuzzy Galois connections with hedges. Information Sciences, 181(10):1804–1817, 2011. 8. Jan Konecny. Antitone L-bonds. In: Information Processing and Management of Uncertainty in Knowledge-Based Systems – 15th International Conference, IPMU 2014, pages 71–80, 2014. 9. Jan Konecny. Bonds between L-fuzzy contexts over different structures of truth- degrees, In: 13th International Conference, ICFCA 2015, Nerja, Spain, June 23-26, pages 81–96, 2015. 10. Jan Konecny and Manuel Ojeda-Aciego. Isotone L-bonds. In Manuel Ojeda-Aciego and Jan Outrata, editors, CLA, volume 1062 of CEUR Workshop Proceedings, pages 153–162. CEUR-WS.org, 2013. 11. Jan Konecny and Manuel Ojeda-Aciego. On homogeneous L-bonds and heteroge- neous L-bonds. to appear in IJGS, 2015. 12. Ondrej Krı́dlo, Stanislav Krajči, and Manuel Ojeda-Aciego. The Category of L-Chu Correspondences and the Structure of L-Bonds. Fundam. Inform., 115(4):297–325, 2012. 13. 
Ondrej Kridlo and Manuel Ojeda-Aciego. CRL-Chu Correspondences. In Manuel Ojeda-Aciego and Jan Outrata, editors, CLA, volume 1062 of CEUR Workshop Proceedings, pages 105–116. CEUR-WS.org, 2013. 14. Markus Krötzsch, Pascal Hitzler, and Guo-Qiang Zhang. Morphisms in context. In Conceptual Structures: Common Semantics for Sharing Knowledge, 13th Inter- national Conference on Conceptual Structures, ICCS 2005, Kassel, Germany, July 17-22, 2005, Proceedings, pages 223–237, 2005. 15. Jesús Medina. Multi-adjoint property-oriented and object-oriented concept lat- tices. Inf. Sci., 190:95–106, 2012. From an implicational system to its corresponding D-basis Estrella Rodrı́guez-Lorenzo1 , Kira Adaricheva2 , Pablo Cordero1 , Manuel Enciso1 , and Angel Mora1 1 University of Málaga, Andalucı́a Tech, Spain, e-mail: {estrellarodlor,amora}@ctima.uma.es, pcordero@uma.es, enciso@lcc.uma.es 2 Nazarbayev University, Kazakhstan e-mail: kira.adaricheva@nu.edu.kz Abstract. Closure system is a fundamental concept appearing in several areas such as databases, formal concept analysis, artificial intelligence, etc. It is well-known that there exists a connection between a closure operator on a set and the lattice of its closed sets. Furthermore, the closure system can be replaced by a set of implications but this set has usually a lot of redundancy inducing non desired properties. In the literature, there is a common interest in the search of the mini- mality of a set of implications because of the importance of bases. The well-known Duquenne-Guigues basis satisfies this minimality condition. However, several authors emphasize the relevance of the optimality in order to reduce the size of implications in the basis. In addition to this, some bases have been defined to improve the computation of closures relying on the directness property. The efficiency of computation with the direct basis is achieved due to the fact that the closure is computed in one traversal. In this work, we focus on the D-basis, which is ordered-direct. An open problem is to obtain it from an arbitrary implicational system, so it is our aim in this paper. We introduce a method to compute the D-basis by means of minimal generators calculated using the Simplification Logic for implications. 1 Introduction Discovering knowledge and information retrieval are currently active issues where Formal Concept Analysis (FCA) provides tools and methods for data analysis. The notions around the concept lattice may be considered as the main attractions in Formal Concept Analysis and they are strongly connected to the notion of closure. Closure system is a fundamental concept appearing in several areas such as database theory, formal concept analysis, artificial intelligence, etc. It is well- known that there exists a connection between a closure operator on a set and the lattice of its closed sets. Furthermore, the closure system can be presented, dually, as a set of attribute implications, namely an implicational system but this set has usually a lot of redundancy inducing non-desired properties. c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 217–228, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 218 Estrella Rodríguez-Lorenzo et al. We can not fail to mention the relevance of the role of the implication notion in different areas. 
It was the main actor of the normalization theory in database area; it has an outstanding character in Formal Concept Analysis and it was prominently used in Frequent Set Mining and Learning Spaces, see the survey of M. Wild [10]. The latter is devoted to mathematical theory of implications and the different faces of the concept of an implication. Implications linked data represented in several forms going from the relationship between itemsets in transactions (Frequent Set Mining) to the boolean functions (Horn Theory). Nonetheless, as V. Duquenne says in [6] “it is surprising if not hard to ac- knowledge that we did not learn much more on their intimacy in the meantime, despite many interesting papers using or revisiting them”. We believe there is a long way to go, and a deeper theory on properties of implications with automated and efficient methods to manipulate them can be developed. In this paper, we are focused in the Formal Concept Analysis area and the fundamental notions are assumed (see [7]). The task of information retrieval carried out by the tools in FCA conduits to infer concepts from the data set, i.e. to deduce (in an automated way) a set of objects that may be precisely characterized by a set of attributes. Such concepts inherit an order relation induced by attribute set inclusion, providing a lattice structure of the concept set. Here implications are retrieved from a binary table (formal context) representing the relationship between a set of objects and a set of attributes. Implications represent an alternative way for the underlying information contained in the formal context. Many applications must massively compute closures of sets of attributes and any improvement of execution time is relevant. In [9] the author establishes the necessity of obtaining succinct representation of closure operators to achieve an efficient computational usage. In this direction, properties associated to implica- tions are studied to render equivalent sets fulfilling desired properties, directness and optimality. An important matter in FCA is to transform implicational systems in canon- ical forms for special proposals in order to provide an efficient further man- agement. Hence, some alternative definitions have been established: Duquenne- Guigues basis, direct optimal basis, D-basis, etc. In this work we focus on the last one [1], because it combines, in a balanced way, a brief representation (it has a small number of elements) and a efficient computation of closures (it is computed in just one traversal). To this end, D-basis proposes an order in which implications will be attended. The major issue is that the execution of the D-basis in one iteration is more efficient that the execution of a shorter, but un-ordered one, for instance the canonical basis of Duquenne and Guigues. K. Adaricheva et.al prove in [1] that one can extract the D-basis from any direct unit basis Σ in time polynomial of size of Σ, and it takes only linear time of the number of implications of the D-basis to put it into a proper order. In [5] we have proposed a method to calculate all the minimal generators from a set of implications as a way to remove redundancy in the basis. The method From an implicational system to its corresponding D-basis 219 to compute all the minimal generators is based on the Simplification Logic for implications [8]. 
Using this logic we are able to remove redundancy in the impli- cations [4] and following the same style of application of the Simplification Rule to the set of implications we can obtain all the minimal generators. Currently the retrieval of the D-basis from an arbitrary implicational sys- tem is an open problem, so it becomes our aim in this paper. We introduce a method to compute the D-basis by means of minimal generators calculated us- ing the Simplification Logic for implications. The relationship among minimal generators, covers, minimal covers and D-basis is presented and an algorithm to calculate D-basis from an arbitrary set of implications is proposed. Section 2 presents the main notions necessary to the understanding of the new method: closure operators, the D-basis, Simplification Logic and the method to calculate minimal generators. In Section 3, the relationships between covers and generators are presented. In Section 4, the new method to obtain the D- basis from a set of implications is shown, and some conclusions and extensions are proposed in Section 5. 2 Background 2.1 Closure systems Given a non-empty set M and the set1 2M of all its subsets, a closure operator is a map φ : 2M → 2M that satisfies the following, for all X, Y ∈ 2M : (1) increasing: X ⊆ φ(X); (2) isotone: X ⊆ Y implies φ(X) ⊆ φ(Y ); (3) idempotent: φ(φ(X)) = φ(X). We will refer to the pair hM, φi of a set M and a closure operator on it as a closure system. In the next two subsections we will follow the introduction of the implica- tional system based on the minimal proper covers 2 given in [1], which was named there the D-basis. We will call closure system reduced, if φ({x}) = φ({y}) → x = y, for any x, y ∈ M 3 . If the closure system hM, φi is not reduced, one can modify it to produce an equivalent one that is reduced, see [1] for more details. We will now define a closure operator φ∗ , which is associated with a given operator φ. hM, φi be a closure system. Define φ∗ as a self-map on 2M Definition 1. Let S ∗ such that φ (X) = x∈X φ(x), X ⊆ 2M . It is straightforward to verify that 1 In the FCA framework, that set M can be thought a set of attributes of a context. 2 Although in [1] it was introduced as minimal cover, here we name it minimal proper cover because in this paper we generalize the notion of cover in Section 3. 3 To clarify the notation φ({x}) will be represented as φ(x) if no risk of confusion. 220 Estrella Rodríguez-Lorenzo et al. Lemma 1. φ∗ is a closure operator on M . Given a closure system hM, φi, we introduce several important concepts. Definition 2 ([1]). For x ∈ M we call a subset X ⊆ M a proper cover for x if x ∈ φ(X) \ φ∗ (X). If X is a proper cover for x, it will be denoted as x ∼p X. 2.2 The D-basis In this subsection, we briefly summarize the introduction of the D-basis in [1]. Its definition is strongly based on the notion of a minimal proper cover: Definition 3. A proper cover Y for x is called minimal, if, for any other proper cover Z for x, Z ⊆ φ∗ (Y ) implies Y ⊆ Z. The existence of several proper covers for the same element induces the need to introduce the notion of minimality. Lemma 2. If x ∼p X, then there exists Y such that x ∼p Y , Y ⊆ φ∗ (X) and Y is a minimal proper cover for x. In other words, every proper cover can be reduced to a minimal proper cover under the subset relation added with the φ∗ operator. These ideas bring to the following definition of the implicational system defin- ing the reduced closure system by means of the minimal proper covers. Definition 4. 
Given a reduced closure system hM, φi, define the D-basis ΣD as a union of two subsets of implications: 1. {y → x : x ∈ φ(y) \ y, y ∈ M } (such implications are called binary); 2. {X → x : X is a minimal proper cover for x}. Note that the D-basis belongs to the family of the unit bases, i.e. implica- tional sets where each implication A → b has a singleton b ∈ M as a consequent. Lemma 3. ΣD generates hM, φi. 2.3 Ordered direct set of implications Here we recall the notion of the ordered direct basis introduced in [1], which is designed for a quick computation of the closures based on some fixed order of implications. First we recall the definition of the ordered iteration of implications. Definition 5. Suppose the set of implications Σ is equipped with some linear order, or equivalently, the implications are indexed as Σ = {s1 , s2 , . . . , sn }. De- fine a mapping ρΣ : 2M → 2M associated with this ordering as follows. For any X ⊆ M , let X0 = X. If Xk is computed and implication sk+1 is A → B, then Xk ∪ B, if A ⊆ Xk , Xk+1 = Xk , otherwise. Finally, ρΣ (X) = Xn . We will call ρΣ an ordered iteration of Σ. From an implicational system to its corresponding D-basis 221 The concept of the ordered iteration is central for the definition of the ordered direct basis. For any given set of implications Σ on set M , by φΣ we understand the closure operator on M defined by Σ. Equivalently, the fixed points of φΣ are exactly subsets X ⊆ M which are stable for all implications A → B in Σ: if A ⊆ X, then B ⊆ X. Definition 6. The set of implications with some linear ordering on it, hΣ, a , ...Yk... a , ...Yk... .... .... .... .... Implicational System Set of (non trivial) Closed sets Set of Covers Set of Minimal Covers and Minimal Generators Fig. 1. Stages of D-basis algorithm Thus, let a be an attribute and mg be the set of minimal generators such that its closure contains a. We write this association as a pair ha, mgi. Let Φ be a set of such pairs of attributes with their generators. We define the Function Add which builds the set of covers produced in Stage 2 as follows: Add(ha, mgi, Φ) = {ha, {g ∈ mg|a 6∈ g} ∪ {mga }i : ha, mga i ∈ Φ} Then, in stage 3, the algorithm picks up the set of minimal covers from the set obtained in stage 2 using the Function MinimalCovers. The method ends with the Function OrderedComp which applies Composition Rule at the same time that it orders the implications in the following sense: the first implications in the D-basis are the binary ones (those with the left-hand side being a singleton). Algorithm 1: D-basis input : An implicational system Σ on M output: The D-basis ΣD on M begin MinGen:=MinGen0 (M , Σ) C:= ∅ foreach hC, mg(C)i ∈MinGen do foreach a ∈ C do C:=Add(ha, mg(C)i, C) ΣD := ∅ foreach ha, mga i ∈ C do mga :=MinimalCovers(mga ) foreach g ∈ mga do ΣD := ΣD ∪ {g → a} ; OrderedComp(ΣD ) return ΣD Example 5. Algorithm 1 returns the following D-basis from the input implica- tional system of Example 3: ΣD = {a → d, bce → ad, ab → ce, ae → bc, bde → ac, cd → abe} From an implicational system to its corresponding D-basis 227 We emphasize that although ac is a minimal generator, it is not a minimal cover, thus an implication with ac in the left-hand side is redundant (deduced from inference axioms) and hence should not appear in the D-basis. A detailed illustrative example In the conclusion of this section we show the execution of the method, in all its stages, on a set of implications from [3], which was used later to illustrate the D-basis definition in [1]. 
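Before turning to that example, note that Definition 5 reads directly as a one-pass procedure. The following Python sketch is an illustration only, not the authors' implementation of Algorithm 1; it performs the ordered iteration ρΣ on the D-basis ΣD obtained in Example 5, with the binary implication listed first.

```python
# Ordered iteration ρ_Σ from Definition 5: each implication is applied exactly once,
# in the fixed listed order.  Data is the Σ_D of Example 5; the function is generic.

def ordered_iteration(sigma, x):
    """One ordered pass: for each (A, B) in sigma, add B whenever A ⊆ current set."""
    closed = set(x)
    for premise, conclusion in sigma:
        if premise <= closed:
            closed |= conclusion
    return closed

sigma_d = [
    ({"a"}, {"d"}),                       # binary implication first
    ({"b", "c", "e"}, {"a", "d"}),
    ({"a", "b"}, {"c", "e"}),
    ({"a", "e"}, {"b", "c"}),
    ({"b", "d", "e"}, {"a", "c"}),
    ({"c", "d"}, {"a", "b", "e"}),
]

print(ordered_iteration(sigma_d, {"a", "b"}))   # {'a','b','c','d','e'} in a single pass
```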
Σ = {5 → 4, 23 → 4, 24 → 3, 34 → 2, 14 → 235, 25 → 134, 35 → 124, 15 → 24, 123 → 45} As a first step in the algorithm, MinGen0 renders the following set of pairs of closed sets and its non-trivial minimal generators, see Figure 2: {h12345, {123, 14, 15, 25, 35}i, h234, {23, 24, 34}i, h45, {5}i, h∅, ∅i} 5⟶4, 2 3⟶4, 2 4⟶3, 3 4⟶2, 1 4⟶2 3 5, 2 5⟶1 3 4, 3 5⟶1 2 4, 1 5⟶2 4, 1 2 3⟶4 5 ø⟶5 ø⟶2 4 ø⟶1 4 ø⟶3 5 ø⟶1 2 3 4 5 2 3 4 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1⟶2 3, 2⟶1 3, 3⟶1 2 1⟶5, 5⟶1 ø ø ø {1 4} {3 5} {1 2 3} {5} {2 4} ø⟶1 ø⟶5 ø⟶2 5 ø⟶1 5 1 5 1 5 1 2 3 4 5 1 2 3 4 5 ø⟶1 ø⟶2 ø⟶3 ø ø ø ø {1 2 4} {2 4 5} 1 2 3 1 2 3 {2 5} 1 2 3 {1 5} ø ø ø {1 5} {2 5} {3 5} ø⟶2 3 ø⟶3 4 2 3 4 2 3 4 1⟶5, 5⟶1 1⟶5, 5⟶1 {2 3} {3 4} ø⟶1 ø⟶5 ø⟶1 ø⟶5 1 5 1 5 1 5 1 5 ø ø ø ø {1 2 3} {2 3 5} {1 3 4} {3 4 5} Fig. 2. MinGen0 Execution Then, for each closed set and each of its elements, our algorithm renders the following set of pairs of elements and covers: {h1, {25, 35}i, h2, {14, 15, 35, 34}i, h3, {14, 15, 25, 24}i, h4, {123, 15, 25, 35, 5, 23}i, h5, {123, 14}i} For each element, the Function MinimalCovers picks up its minimal covers: {h1, {25, 35}i, h2, {14, 34}i, h3, {14, 24}i, h4, {5, 23}i, h5, {14, 123}i} Finally, at the last step, the algorithm turns these pairs into implications and applies ordered composition resulting in the D-basis. ΣD = {5 → 4, 23 → 4, 24 → 3, 34 → 2, 14 → 235, 25 → 1, 35 → 1, 123 → 5} 228 Estrella Rodríguez-Lorenzo et al. 5 Conclusion and future works In this work we have presented a way to obtain the D-basis from any implica- tional system. In [1] the algorithm was proposed to compute the D-basis from any direct basis, but the computation from any implicational system was left open. There exists also an efficient algorithm for the computation of the D-basis from the context using the method of finding the minimal transversals of the associated hypergraphs [2], but this assumes the different input for the closure system which is outside the scope of this paper. The Function MinimalCovers renders the D-basis within the framework of the closure systems without the need of any transformation. A key point of our work is the connection between covers and generators. Using minimal gener- ators, the D-basis is obtained by reducing the set of minimal generators and transforming it into a set of minimal covers. As future work, we propose to develop an algorithm which computes the D-basis with better integration of the minimal generator computation to render the minimal covers in a more direct way. In addition, we are planning to design an empirical study and to make a comparison between this algorithm and other techniques proposed in previous papers. Acknowledgment Supported by Grants TIN2011-28084 and TIN2014-59471-P of the Science and Innovation Ministry of Spain. References 1. K. Adaricheva and J. B. Nation and R. Rand, Ordered direct implicational basis of a finite closure system, Discrete Applied Mathematics, 161 (6): 707–723, 2013. 2. K. Adaricheva and J.B. Nation, Discovery of the D-basis in binary tables based on hypergraph dualization, http://arxiv.org/abs/1504.02875, 2015. 3. K. Bertet, B. Monjardet, The multiple facets of the canonical direct unit implica- tional basis, Theor. Comput. Sci., 411(22-24): 2155–2166, 2010. 4. P. Cordero, A Mora, M. Enciso, I.Pérez de Guzmán, SLFD Logic: Elimination of Data Redundancy in Knowledge Representation, LNCS, 2527: 141–150, 2002. 5. P. Cordero, M. Enciso, A Mora, M. 
Ojeda-Aciego, Computing Minimal Generators from Implications: a Logic-guided Approach, CLA 2012: 187–198, 2012. 6. V. Duquenne, Some variations on Alan Day’s Algorithm for Calculating Canonical Basis of Implications, CLA 2007: 192–207, 2007. 7. B. Ganter, Two basic algorithms in concept analysis, Technische Hochschule, Darmstadt, 1984. 8. A. Mora, M. Enciso, P. Cordero, I. Fortes, Closure via functional dependence sim- plification, International Journal of Computer Mathematics, 89(4): 510–526, 2012. 9. S. Rudolph, Some Notes on Managing Closure Operators, LNCS, 7278: 278–291, 2012. 10. M. Wild, The joy of implications, aka pure Horn functions: mainly a survey, http: //arxiv.org/abs/1411.6432, 2014. Using Linguistic Hedges in L-rough Concept Analysis Eduard Bartl and Jan Konecny Data Analysis and Modeling Lab Dept. Computer Science, Palacky University, Olomouc 17. listopadu 12, CZ-77146 Olomouc Czech Republic Abstract. We enrich concept-forming operators in L-rough Concept Anal- ysis with linguistic hedges which model semantics of logical connectives ‘very’ and ‘slightly’. Using hedges as parameters for the concept-forming operators we are allowed to modify our uncertainty when forming con- cepts. As a consequence, by selection of these hedges we can control the size of concept lattice. Keywords: Formal concept analysis; concept lattice; fuzzy set; linguistic hedge; rough set; uncertainty. 1 Introduction In [2] we presented a framework which allows us to work with positive and negative attributes in the fuzzy setting by applying two unipolar scales for intents – a positive one and a negative one. The positive scale is implicitly modeled by an antitone Galois connection while the negative scale is modeled by an isotone Galois connection. In this paper we extend this approach in two ways. First, we work with uncertain information. To do this we extend formal fuzzy contexts to contain two truth-degrees for each object-attribute pair. The two truth-degrees represent necessity and possibility of the fact that an object has an attribute. The interval between these degrees represents the uncertainty presented in a given data. Second, we parametrize the concept-forming operators used in the frame- work by unary operators called truth-stressing and truth-depressing linguistic hedges. Their intended use is to model semantics of statements ‘it is very sure that this attribute belongs to a fuzzy set (intent)’ and ‘it is slightly possible that an attribute belongs a fuzzy set (intent)’, respectively. In the paper, we demonstrate how the hedges influence the size of concept lattice. 2 Preliminaries In this section we summarize the basic notions used in the paper. c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 229–240, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 230 Eduard Bartl and Jan Konecny Residuated Lattices and Fuzzy Sets We use complete residuated lattices as basic structures of truth-degrees. A complete residuated lattice [4, 12, 17] is a structure L “ xL, ^, _, b, Ñ, 0, 1y such that xL, ^, _, 0, 1y is a complete lattice, i.e. a partially ordered set in which arbitrary infima and suprema exist; xL, b, 1y is a commutative monoid, i.e. b is a binary operation which is commutative, associative, and a b 1 “ a for each a P L; b and Ñ satisfy adjointness, i.e. a b b ď c iff a ď b Ñ c. 0 and 1 denote the least and greatest elements. 
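As a concrete aside (not part of the original text): the adjointness condition can be checked mechanically on a finite Łukasiewicz chain, the structure used later in Fig. 1 and Example 1. The following Python sketch is only an illustration; the chain and the operation names are assumptions.

```python
from itertools import product

# The 5-element Łukasiewicz chain L = {0, 0.25, 0.5, 0.75, 1}
L = [0.0, 0.25, 0.5, 0.75, 1.0]

def mult(a: float, b: float) -> float:      # Łukasiewicz "fuzzy conjunction" a ⊗ b
    return max(0.0, a + b - 1.0)

def residuum(a: float, b: float) -> float:  # Łukasiewicz "fuzzy implication" a → b
    return min(1.0, 1.0 - a + b)

# Adjointness: a ⊗ b <= c  iff  a <= b → c, for all a, b, c in L.
assert all((mult(a, b) <= c) == (a <= residuum(b, c)) for a, b, c in product(L, repeat=3))
```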
The partial order of L is denoted by ď. Throughout this work, L denotes an arbitrary complete residuated lattice. Elements of L are called truth degrees. Operations b (multiplication) and Ñ (residuum) play the role of (truth functions of) “fuzzy conjunction” and “fuzzy implication”. Furthermore, we define the complement of a P L as a “ a Ñ 0. An L-set (or fuzzy set) A in a universe set X is a mapping assigning to each x P X some truth degree Apxq P L. The set of all L-sets in a universe X is denoted LX . The operations with L-sets are defined componentwise. For instance, the intersection of L-sets A, B P LX is an L-set AXB in X such that pAXBqpxq “ Apxq^ Bpxq for each x P X. An L-set A P LX is also denoted tApxq{x | x P Xu. If for all y P X distinct from x1 , . . . , xn we have Apyq “ 0, we also write tApx1 q{x1 , . . . , Apxn q{xn u. An L-set A P LX is called normal if there is x P X such that Apxq “ 1. An L-set A P LX is called crisp if Apxq P t0, 1u for each x P X. Crisp L-sets can be identified with ordinary sets. For a crisp A, we also write x P A for Apxq “ 1 and x < A for Apxq “ 0. X Ź For A, B P L we define the degree of inclusion of A in B by SpA, Bq “ xPX Apxq Ñ Bpxq. Graded inclusion generalizes the classical inclusion relation. Described verbally, SpA, Bq represents a degree to which A is a subset of B. In particular, we write A Ď B iff SpA, Bq “ 1. As a consequence, we have A Ď B iff Apxq ď Bpxq for each x P X. By L´1 we denote L with dual lattice order. An L-rough set A in a universe X is a pair of L-sets A “ xA, Ay P pL ˆ L´1 qU . The A is called an lower approximation of A and the A is called a upper approximation of A.1 The operations with L-rough sets are again defined componentwise, i.e. č č č´1 č ď xA, Ay “ x A, Ay “ x A, Ay, iPI iPI iPI iPI iPI ď ď ď´1 ď č xA, Ay “ x A, Ay “ x A, Ay. iPI iPI iPI iPI iPI Similarly, the graded subsethood is then applied componentwise SpxA, Ay, xB, Byq “ SpA, Bq ^ S´1 pA, Bq “ SpA, Bq ^ SpB, Aq 1 In our setting we consider intents to be L-rough sets; the lower and upper approxima- tion are interpreted as necessary intent and possible intent, respectively. Using Linguistic Hedges in L-rough Concept Analysis 231 and the crisp subsethood is then defined using the graded subsethood: xA, Ay Ď xB, By iff SpxA, Ay, xB, Byq “ 1, iff A Ď B and B Ď A. An L-rough set xA, Ay is called natural if A Ď A. Binary L-relations (binary fuzzy relations) between X and Y can be thought of as L-sets in the universe X ˆ Y. That is, a binary L-relation I P LXˆY between a set X and a set Y is a mapping assigning to each x P X and each y P Y a truth degree Ipx, yq P L (a degree to which x and y are related by I). L-rough relations are then pL ˆ L´1 q-sets in X ˆ Y. For L-relation I P LXˆY we define its inverse I´1 P LYˆX as I´1 py, xq “ Ipx, yq for all x P X, y P Y. Formal Concept Analysis in the Fuzzy Setting An L-context is a triplet xX, Y, Iy where X and Y are (ordinary) sets and I P LXˆY is an L-relation between X and Y. Elements of X are called objects, elements of Y are called attributes, I is called an incidence relation. Ipx, yq “ a is read: “The object x has the attribute y to degree a.” Consider the following pairs of operators induced by an L-context xX, Y, Iy. First, the pair xÒ, Óy of operators Ò : LX Ñ LY and Ó : LY Ñ LX is defined by ľ ľ AÒ pyq “ Apxq Ñ Ipx, yq and BÓ pxq “ Bpyq Ñ Ipx, yq. xPX yPY Second, the pair xX, Yy of operators X : LX Ñ LY and Y : LY Ñ LX is defined by ł ľ AX pyq “ Apxq b Ipx, yq and BY pxq “ Ipx, yq Ñ Bpyq. 
xPX yPY To emphasize that the operators are induced by I, we also denote the opera- tors by xÒI , ÓI y and xXI , YI y. Fixpoints of these operators are called formal concepts. The set of all formal concepts (along with set inclusion) forms a complete lattice, called L-concept lattice. We denote the sets of all concepts (as well as the corresponding L-concept lattice) by BÒÓ pX, Y, Iq and BXY pX, Y, Iq, i.e. BÒÓ pX, Y, Iq “ txA, By P LX ˆ LY | AÒ “ B, BÓ “ Au, BXY pX, Y, Iq “ txA, By P LX ˆ LY | AX “ B, BY “ Au. For an L-concept lattice BpX, Y, Iq, where B is either BÒÓ or BXY , denote the corresponding sets of extents and intents by ExtpX, Y, Iq and IntpX, Y, Iq. That is, ExtpX, Y, Iq “ tA P LX | xA, By P BpX, Y, Iq for some Bu, IntpX, Y, Iq “ tB P LY | xA, By P BpX, Y, Iq for some Au. An pL1 , L2 q-Galois connection between the sets X and Y is a pair x f, gy of mappings f : LX 1 Ñ LY2 , g : LY2 Ñ LX 1 , satisfying SpA, gpBqq “ SpB, f pAqq 232 Eduard Bartl and Jan Konecny for every A P LX 1 , B P LY2 . One can easily observe that the couple xÒ, Óy forms an pL, Lq-Galois connec- tion between X and Y, while xX, Yy forms an pL, L´1 q-Galois connection between X and Y. L-rough Contexts and L-rough Concepts Lattices An L-rough context is a quadruple xX, Y, I, Iy, where X and Y are (crisp) sets of objects and attributes, respectively, and the xI, Iy is a L-rough relation. The meaning of xI, Iy is as follows: Ipx, yq (resp. Ipx, yq) is the truth degree to which the object x surely (resp. possibly) has the attribute y. The quadruple xX, Y, I, Iy is called a L-rough context. The L-rough context induces two operators defined as follows. Let xX, Y, I, Iy be an L-rough context. Define L-rough concept-forming operators as AM “ xAÒI , AXI y, Y (1) xB, ByO “ BÓI X B I for A P LX , B, B P LY . Fixed points of xM, Oy, i.e. tuples xA, xB, Byy P LX ˆpLˆL´1 qY such that AM “ xB, By and xB, ByO “ A, are called L-rough concepts. The B and B are called lower intent approximation and upper intent approximation, respectively. In [2] we showed that the pair of operators (1) is an pL, L ˆ L´1 q-Galois connection. Linguistic Hedges Truth-stressing hedges were studied from the point of fuzzy logic as logical connectives ‘very true’, see [13]. Our approach is close to that in [13]. A truth- stressing hedge is a mapping ˚ : L Ñ L satisfying 1˚ “ 1, a˚ ď a, a ď b implies a˚ ď b˚ , a˚˚ “ a˚ (2) for each a, b P L. Truth-stressing hedges were used to parametrize antitone L- Galois connections e.g. in [3, 5, 9], and also to parameterize isotone L-Galois connections in [1]. On every complete residuated lattice L, there are two important truth- stressing hedges: (i) identity, i.e. a˚ “ a pa P Lq; (ii) globalization, i.e. " 1, if a “ 1, a˚ “ 0, otherwise. A truth-depressing hedge is a mapping : L Ñ L such that following conditions are satisfied 0 “ 0, a ď a , a ď b implies a ď b , a “ a Using Linguistic Hedges in L-rough Concept Analysis 233 for each a, b P L. A truth-depressing hedge is a (truth function of) logical con- nective ‘slightly true’, see [16]. On every complete residuated lattice L, there are two important truth- depressing hedges: (i) identity, i.e. a “ a pa P Lq; (ii) antiglobalization, i.e. " 0, if a “ 0, a “ 1, otherwise . 1 0.75 0.5 0.25 0 ˚G ˚1 ˚2 ˚3 ˚4 ˚5 ˚6 id 1 0.75 0.5 0.25 0 G 1 2 3 4 5 6 id Fig. 1. Truth-stressing hedges (top) and truth-depressing hedges (bottom) on 5-element chain with Łukasiewicz operations L “ xt0, 0.25, 0.5, 0.75, 1u, min, max, b, Ñ, 0, 1y. 
The leftmost truth-stressing hedge ˚G is the globalization, leftmost truth-depressing hedge G is the antiglobalization. The rightmost hedges denoted by id are the identities. 234 Eduard Bartl and Jan Konecny For truth-stressing/truth-depressing hedge ˚ we denote by fixp˚q set of its idempotent elements in L; i.e. fixp˚q “ ta P L | a˚ “ au. Let ˚1 , ˚2 be truth-stressing hedges on L such that fixp˚1 q Ď fixp˚2 q; then for each a P A, a˚1 ˚2 “ a˚1 holds. The same holds true for ˚1 , ˚2 being truth- depressing hedges. We naturally extend application of truth-stressing/truth-depressing hedges to L-sets: A˚ pxq “ Apxq˚ for all x P U. 3 Results The L-rough concept-forming operator M gives for each L-set of objects two L-sets of attributes. The first one represents a necessity of having the attributes and second one a possibility of having the attributes. We add linguistic hedges to the concept-forming operators to control shape of the two L-sets. Since the L-rough concept-forming operators are defined via xÒ, Óy and xX, Yy, we first recall the parametrization of these operators as described in [8, 15]. 3.1 Linguistic Hedges in Formal Fuzzy Concept Analysis Let xX, Y, Iy be an L-context and let r, q be truth-stressing hedges on L. The antitone concept-forming operators parametrized by r and q induced by I are defined as ľ AÒr pyq “ Apxqr Ñ Ipx, yq, xPX ľ Óq B pxq “ Bpyqq Ñ Ipx, yq yPY for all A P LX , B P LY . Let r and ♠ be truth-stressing hedge and truth-depressing hedge on L, respectively. The isotone concept-forming operators parametrized by r and ♠ induced by I are defined as ł AXr pyq “ Apxqr b Ipx, yq, xPX ľ BY♠ pxq “ Ipx, yq Ñ Bpyq♠ yPY for all A P LX , B P LY . Properties of the hedges in the setting of multi-adjoint concept lattices with heterogeneous conjunctors were studied in [14]. 3.2 L-rough Concept-Forming Operators with Linguistic Hedges Let r, q be truth-stressing hedges on L and let ♠ be a truth-depressing hedge on L. We parametrize the L-rough concept-forming operators as Y♠ AN “ xAÒr , AXr y and xB, ByH “ BÓq X B (3) Using Linguistic Hedges in L-rough Concept Analysis 235 for A P LX , B, B P LY . Remark 1. When the all three hedges are identities the pair xN, Hy is equivalent to xM, Oy; so it is an pL, L ˆ L´1 q-Galois connection. For arbitrary hedges this does not hold. The following theorem describes properties of xN, Hy. Theorem 1. The pair xN, Hy of L-rough concept-forming operators parametrized by hedges has the following properties. ♠ ♠ (a) AN “ ArM “ ArN and xB, ByH “ xBq , B yO “ xBq , B yH (b) AM Ď AN and xB, ByO Ď xB, ByH (c) SpAr1 , Ar2 q ď SpAN2 , AN1 q and SpxB1 , B1 y, xB2 , B2 yq ď SpxB2 , B2 yH , xB1 , B1 yH q ♠ (d) Ar Ď ANH and xBq , B y Ď xB, ByHN ; (e) A1 Ď A2 implies AN2 Ď AN1 and xB1 , B1 y Ď xB2 , B2 y implies xB2 , B2 yH Ď xB1 , B1 yH ♠ (f) SpAr , xB, ByH q “ SpxBq , B y, AN q Ť Ş Ť Ş ♠ Ş (g) p iPI Ari qN “ iPI ANi and px iPI Bi q , iPI Bi yqH “ iPI xBi , Bi yH (h) ANH “ ANHNH and xB, ByHN “ xB, ByHNHN . Proof. (a) Follows immediately from definition of N and H and idempotency of hedges. (b) From (2) we have Ar Ď A; by properties of Galois connections the in- clusion implies AM Ď ArM , which is by (a) equivalent to AM Ď AN . Proof of the second statement in (b) is similar. (c) Follows from (a) and properties of Galois connections. (d) By [2, Corollary 1(a)] we have Ar Ď ArMO . Using (a) we get Ar Ď ANO and from (b) we have ANO Ď ANH , so Ar Ď ANH . Similarly for the second claim. 
(e) Follows directly from [2, Corollary 1(c)] and properties of Galois connec- tions. (f) Since xM, Oy forms pL, L ˆ L´1 q-Galois connection and using (a) we have ♠ ♠ ♠ SpAr , xB, ByH q “ SpAr , xBq , B yO q “ SpxBq , B y, ArM q “ SpxBq , B y, AN q. (g) We can easily get ď ď ď č Ò ď p Ari qN “ xp Ari qÒr , p Ari qXr y “ x Ai r , AXi r y iPI iPI iPI iPI iPI č č Òr X “ xAi , Ai y “ r ANi , iPI iPI and ď č ♠ ď č ♠ č č Y♠ px Bi q , Bi yqH “ p Bi q qÓq X p Bi qY♠ “ Bi Óq X Bi iPI iPI iPI iPI iPI iPI č Y♠ č Óq H “ pBi X Bi q “ xBi , Bi y . iPI iPI 236 Eduard Bartl and Jan Konecny (h) Using (a), (d) and (e) twice, we have ANH Ď ANHNH . Using (d) for xB, By “ A we have ANr Ď ANrHN “ ANHN . Then applying (e) we get ANHNH Ď ANH N proving the first claim. The second claim can be proved analogically. \ [ The set of fixed points of xN, Hy endowed with partial order ď given by xA1 , B1 , B1 y ď xA2 , B2 , B2 y iff A1 Ď A2 (4) iff xB1 , B1 y Ď xB2 , B2 y is denoted by BNH r,q,♠ pX, Y, I, Iq. Remark 2. Note that from (4) it is clear that if a concept has non-natural L-rough intent then all its subconcepts have non-natural intent. If such concepts are not desired, one can simply ignore them and work with the iceberg lattice of concepts with natural L-rough intents. The next theorem shows a crisp representation of BNH r,q,♠ pX, Y, I, Iq. ÒÓ Theorem 2. BNH r,q,♠ pX, Y, I, Iq is isomorphic to ordinary concept lattice B pXˆfixprq, Yˆ ˆ fixpqq ˆ fixp♠q, I q where xxx, ay, xy, b, byy P Iˆ iff a b b ď Ipx, yq and a Ñ b ě Ipx, yq. Proof. This proof can be done by following the same steps as in [8, 15]. \ [ The following theorem explains the structure of BNH r,q,♠ pX, Y, I, Iq. Theorem 3. BNH r,q,♠ pX, Y, I, Iq is a complete lattice with suprema and infima defined as ľ č ď č ♠ xAi , xBi , Bi yy “ xp Ai qNH , x Bi q , Bi yHN y, i i i ł ď č ď xAi , xBi , Bi yy “ xp Ari qNH , x Bi , Bi yHN y i i i i for all Ai P LX , Bi P LY , Bi P LY . Proof. Follows from Theorem 2. \ [ Remark 3. Note that if we alternatively define (3) as Y♠ AN “ xpAÒr qq , pAXr q♠ y and xB, ByH “ pBÓq X B qr (5) or Y AN “ xpAÒ qq , pAX q♠ y and xB, ByH “ pBÓ X B qr (6) or AN “ xpAÒr qq , pAXr q♠ y and xB, ByH “ xB, ByO or Y♠ AN “ AM and xB, ByH “ pBÓq X B qr we obtain an isomorphic concept lattice. In addition (5) and (6) produce the same concept lattice. Using Linguistic Hedges in L-rough Concept Analysis 237 3.3 Size Reduction of Fuzzy Rough Concept Lattices This part provides analogous results on reduction with truth-stressing and truth- depressing hedges as [10] for antitone fuzzy concept-forming operators and [15] for isotone fuzzy concept-forming operators. For the next theorem we need the following lemma. Lemma 1. Let r, ♥, q, ♦ be truth-stressing hedges on L such that fixprq Ď fixp♥q, fixpqq Ď fixp♦q; let ♠, s be truth-depressing hedges on L such that and fixp♠q Ď fixpsq. We have AN♥ Ď ANr and xB, ByH♦,s Ď xB, ByHq,♠ . Proof. We have Ar♥ Ď A♥ from (2). From the assumption fixprq Ď fixp♥q we get Ar♥ “ Ar ; whence we have Ar Ď A♥ . Theorem 1(e) implies A♥N Ď ArN which is by the claim (a) of this theorem equivalent to AN♥ Ď ANr . The second claim can be proved similarly. \ [ Theorem 4. Let r, ♥, q, ♦ be truth-stressing hedges on L such that fixprq Ď fixp♥q, fixpqq Ď fixp♦q; let ♠, s be truth-depressing hedges on L s.t. and fixp♠q Ď fixpsq, |BNH NH r,q,♠ pX, Y, I, Iq| ď |B♥,♦,s pX, Y, I, Iq| for all L-rough contexts xX, Y, I, Iy. In addition, if r “ ♥ “ id, we have ExtNH NH r,q,♠ pX, Y, I, Iq Ď Ext♥,♦,s pX, Y, I, Iq. 
Similarly, if q “ ♦ “ ♠ “ s “ id, we have IntNH NH r,q,♠ pX, Y, I, Iq Ď Int♥,♦,s pX, Y, I, Iq. Proof. (4) follows directly from Theorem 2 and results on subcontexts in [11]. Now, we show (4). Note that each A P ExtNHr,q,♠ pX, Y, I, Iq we have A “ ANr Hq,♠ “ AN♥ Hq,♠ Ě AN♥ H♦,s Ě A. Thus we have A P ExtNH ♥,♦,s pX, Y, I, Iq. The inclusion (4) can be proved similarly. \ [ Example 1. Consider the truth-stressing hedges ˚G , ˚1 , ˚2 , id and truth-depressing hedges G , 1 , 2 , id from Figure 1. One can easily observe that fixp˚G q Ď fixp˚1 q Ď fixp˚2 q Ď fixpidq fixpG q Ď fixp1 q Ď fixp2 q Ď fixpidq. Consider the L-context of books and their graded properties in Fig. 2 with L being 5-element Łukasiewicz chain. Using various combinations of the hedges we obtain a smooth transition in size of the associated fuzzy rough concept lattice going from 10 concepts up to 498 (see Tab. 1). When the 5-element Gödel chain is used instead, we again get a transition going from 10 concepts up to 298 (see Tab. 2). 238 Eduard Bartl and Jan Konecny High rating Large no. of pages Low price Top sales rank 1 0.75 0 1 0 2 0.5 1 0.25 0.5 3 1 1 0.25 0.5 4 0.75 0.5 0.25 1 5 0.75 0.25 0.75 0 6 1 0 0.75 0.25 Fig. 2. L-context of books and their graded properties; this L-context was used in [1, 15] to demonstrate reduction of L-concept lattices using hedges. ♠ “ G ˚G ˚1 ˚2 id ♠ “ 1 ˚G ˚1 ˚2 id ˚G 10 16 59 61 ˚G 15 28 71 110 ˚1 12 22 65 93 ˚1 15 28 71 170 ˚2 15 26 69 103 ˚2 22 28 79 195 id 19 41 97 152 id 28 28 110 264 ♠ “ 2 ˚G ˚1 ˚2 id ♠ “ id ˚G ˚1 ˚2 id ˚G 15 53 134 211 ˚G 27 75 160 297 ˚1 15 53 134 290 ˚1 27 75 160 372 ˚2 22 63 146 327 ˚2 32 80 165 396 id 28 80 181 415 id 40 99 202 498 Table 1. Numbers of concepts in L-context from Fig. 2 formed by xN, Hy parametrized by r, q, and ♠. A 5-element Łukasiewicz chain is used as the structure of truth degrees. The rows represent the hedge r and the columns represent the hedge q. ♠ “ G ˚G ˚1 ˚2 id ♠ “ G ˚G ˚1 ˚2 id ˚G 10 18 24 24 ˚G 15 29 36 45 ˚1 12 21 33 36 ˚1 15 32 49 63 ˚2 15 29 45 48 ˚2 22 57 78 106 id 19 33 51 54 id 28 66 89 117 ♠ “ G ˚G ˚1 ˚2 id ♠ “ G ˚G ˚1 ˚2 id ˚G 15 32 48 59 ˚G 27 50 66 125 ˚1 15 32 59 75 ˚1 27 50 80 167 ˚2 22 57 88 118 ˚2 32 79 113 257 id 28 66 100 130 id 40 90 127 298 Table 2. Numbers of concepts in L-context from Fig. 2 formed by xN, Hy parametrized by r, q, and ♠. A 5-element Gödel chain is used as the structure of truth degrees. The rows represent the hedge r and the columns represent the hedge q. Using Linguistic Hedges in L-rough Concept Analysis 239 4 Conclusion and further research We have shown that the L-rough concept-forming operators can be parameter- ized by truth-stressing and truth-depressing hedges similarly as the antitone and isotone fuzzy concept-forming operators. Our future research includes a study of attribute implications using whose semantics is related to the present setting. That will combine results on fuzzy attribute implications [7] and attribute containment formulas [6]. Acknowledgment Supported by grant No. 15-17899S, “Decompositions of Matrices with Boolean and Ordinal Data: Theory and Algorithms”, of the Czech Science Foundation. References 1. Eduard Bartl, Radim Belohlavek, Jan Konecny, and Vilem Vychodil. Isotone Galois connections and concept lattices with hedges. In IEEE IS 2008, Int. IEEE Conference on Intelligent Systems, pages 15–24–15–28, Varna, Bulgaria, 2008. 2. Eduard Bartl and Jan Konecny. Formal L-concepts with Rough Intents. 
In CLA 2014: Proceedings of the 11th International Conference on Concept Lattices and Their Applications, pages 207–218, 2014. 3. Radim Belohlavek. Reduction and simple proof of characterization of fuzzy concept lattices. Fundamenta Informaticae, 46(4):277–285, 2001. 4. Radim Belohlavek. Fuzzy Relational Systems: Foundations and Principles. Kluwer Academic Publishers, Norwell, USA, 2002. 5. Radim Belohlavek, Tatana Funioková, and Vilem Vychodil. Fuzzy closure operators with truth stressers. Logic Journal of the IGPL, 13(5):503–513, 2005. 6. Radim Belohlavek and Jan Konecny. A logic of attribute containment, 2008. 7. Radim Belohlavek and Vilem Vychodil. A logic of graded attributes. submitted to Artificial Intelligence. 8. Radim Belohlavek and Vilem Vychodil. Reducing the size of fuzzy concept lattices by hedges. In FUZZ-IEEE 2005, The IEEE International Conference on Fuzzy Systems, pages 663–668, Reno (Nevada, USA), 2005. 9. Radim Belohlavek and Vilem Vychodil. Fuzzy concept lattices constrained by hedges. JACIII, 11(6):536–545, 2007. 10. Radim Belohlavek and Vilem Vychodil. Formal concept analysis and linguistic hedges. Int. J. General Systems, 41(5):503–532, 2012. 11. Bernard Ganter and Rudolf Wille. Formal Concept Analysis – Mathematical Foundations. Springer, 1999. 12. Petr Hájek. Metamathematics of Fuzzy Logic (Trends in Logic). Springer, November 2001. 13. Petr Hájek. On very true. Fuzzy Sets and Systems, 124(3):329–333, 2001. 14. Jan Konecny, Jesús Medina and Manuel Ojeda-Aciego Multi-adjoint concept lattices with heterogeneous conjunctors and hedges. Annals of Mathematics and Artificial Intelligence, 72(1):73–89, 2011. 240 Eduard Bartl and Jan Konecny 15. Jan Konecny. Isotone fuzzy Galois connections with hedges. Information Sciences, 181(10):1804–1817, 2011. Special Issue on Information Engineering Applications Based on Lattices. 16. Vilem Vychodil. Truth-depressing hedges and BL-logic. Fuzzy Sets and Systems, 157(15):2074–2090, 2006. 17. Morgan Ward and R. P. Dilworth. Residuated lattices. Transactions of the American Mathematical Society, 45:335–354, 1939. Revisiting Pattern Structures for Structured Attribute Sets Mehwish Alam1 , Aleksey Buzmakov1 , Amedeo Napoli1 , and Alibek Sailanbayev2? 1 LORIA (CNRS – Inria NGE – U. de Lorraine), Vandœuvre-lès-Nancy, France 2 Nazarbayev University, Astana, Kazakhstan { mehwish.alam, aleksey.buzmakov, amedeo.napoli, } @loria.fr, alibek.sailanbayev@nu.edu.kz Abstract. In this paper, we revisit an original proposition on pattern structures for structured sets of attributes. There are several reasons for carrying out this kind of research work. The original proposition does not give many details on the whole framework, and especially on the possible ways of implementing the similarity operation. There exists an alternative definition without any reference to pattern structures, and we would like to make a parallel between two points of view. Moreover we discuss an efficient implementation of the intersection operation in the corresponding pattern structure. Finally, we discovered that pattern structures for structured attribute sets are very well adapted to the clas- sification and the analysis of RDF data. We terminate the paper by an experimental section where it is shown that the provided implementation of pattern structures for structured attribute sets is quite efficient. Keywords: Formal Concept Analysis, Pattern Structures, Structured Attribute Sets, Least Common Ancestor, Range Minimum Query. 
1 Introduction In this paper, we want to make precise and develop a section of [1] related to pattern structures and structured sets of attributes. There are several reasons for carrying out this kind of research work. Firstly, the the pattern structures, the similarity operator u and the associated subsumption operator v for struc- tured sets of attributes are based on antichains and rather briefly sketched in the original paper. Secondly, there is an alternative and a more “qualitative” point of view on the same subject in [2, 3] without any reference to pattern structures, and we would like to make a parallel between these two points of view. Finally, for classifying RDF triples in the analysis of the content of Linked Open Data (LOD), we discovered that actually pattern structures for structured sets of attributes are very well adapted to solve this problem [4]. Moreover, the ? This work was done during the stay of Alibek Sailanbayev at LORIA, France. c paper author(s), 2015. Published in Sadok Ben Yahia, Jan Konecny (Eds.): CLA 2015, pp. 241–252, ISBN 978–2–9544948–0–7, Blaise Pascal University, LIMOS laboratory, Clermont-Ferrand, 2015. Copying permitted only for private and academic purposes. 242 Mehwish Alam et al. classification of RDF triples provides a very good and practical example for illus- trating the use of such a pattern structure and helps to reconcile the two above points of view. Accordingly, in this paper, we will go back to the two original definitions and show how they are related. For completing the history, it is worth mentioning that antichains, whose intersection is the basis of the similarity operation in the pattern structure for structured attribute sets, our paper, are studied in the book [5]. Moreover, this book cites as an application of antichain intersection an older paper from 1994 [6], written in French, about the decomposition of total orderings and its application to knowledge discovery. Then, we proceed to present a way of efficiently working with antichains and intersection of antichains, which can be very useful, especially in case of large sets of data. The last section details a series of experiments where it is shown that pattern structures can be implemented with an efficient intersection operation and that they have a generally better behavior than scaled contexts. 2 Pattern Structures for Structured Attributes 2.1 Pattern Structures Formal Concept Analysis [7] can process only binary contexts. Pattern structures are an extension of FCA which allow a direct processing of such kind of data. The formalism of pattern structures was introduced in [1]. A pattern structure is a triple (G, (D, u), δ), where G is the set of objects, (D, u) is a meet-semilattice of descriptions, and δ : G → D maps an object to its description. In other words, a pattern structure composed of a set of objects, a set of descriptions equipped with a similarity operation denoted by u. This similarity operation is idempotent, commutative and associative. If (G, (D, u), δ) is a pattern structures then the derivation operators (Galois connection) are defined as: l A := δ(g) for A ⊆ G g∈A d := {g ∈ G|d v δ(g)} for d ∈ D Each element in D is referred to as a pattern. The natural order on (D, u), given by c v d ⇔ c u d = c is called the subsumption order. Now a pattern concept can be defined as follows: Definition 1 (Pattern Concept). 
A pattern concept of a pattern structure (G, (D, u), δ) is a pair (A, d) where A ⊆ G and d ∈ D such that A = d and A = d , where A is called the concept extent and d is called the concept intent. A pattern extent corresponds to the maximal set of objects A whose descrip- tions subsume the description d, where d is the maximal common description Revisiting Pattern Structures for Structured Attribute Sets 243 for objects in A. The set of all pattern concepts is partially ordered w.r.t. inclu- sion on extents, i.e., (A1 , d1 ) ≤ (A2 , d2 ) iff A1 ⊆ A2 (or, equivalently, d2 v d1 ), making a lattice, called pattern lattice. 2.2 Two original propositions on structured attribute sets We briefly recall two original propositions supporting the present study. The first work is firstly published by Carpineto & Romano in [2] and then developed in [3]. The second work is related to the definition of pattern structures by Ganter & Kuznetsov in [1]. In [2, 3], the authors consider a formal context (G, M, I) and an extended set of attributes M ∗ ⊃ M where attributes are organized within a subsumption hi- erarchy according to a partial ordering denoted by ≤M ∗ . The following condition should be satisfied: ∀g ∈ G, m1 ∈ M, m2 ∈ M ∗ : [(g, m1 ) ∈ I, m1 ≤M ∗ m2 ] =⇒ (g, m2 ) ∈ I The subsumption hierarchy can be either a tree or an acyclic graph with a unique maximal element, as this is the case of attributes lying in a thesaurus for example. Then the building of a concept lattice from such a context can be done in two main ways. A first is to use a scaling and to complete the description of an object with all attributes implied by the original attributes. We discuss this scaling operation in detail later. The problem would be the space necessary to store the scaled context, especially in case of big data. A second way is to use an “extended intersection operation” between sets of attributes which is defined as follows. The intersection of two sets of attributes Y1 and Y2 is obtained by finding for each pair (m1 , m2 ), m1 ∈ Y1 , m2 ∈ Y2 , the most specific attributes in M ∗ that are more general than m1 and m2 , and then retaining only the most specific elements of the set of attributes generated in this way. Then if (X1 , Y1 ) and (X2 , Y2 ) are two concepts, we have: (X1 , Y1 ) ≤ (X2 , Y2 ) ⇐⇒ ∀m2 ∈ Y2 , ∃m1 ∈ Y1 , m1 ≤M ∗ m2 In other words, this intersection operation corresponds to the intersection of two antichains as this is explained in [1], where the authors define the formalism of pattern structures and take as an instantiation structured attribute sets. More formally, it is assumed that the attribute set (M, ≤M ) is finite and partially ordered, and that all attribute combinations that can occur must be order ideals (downsets) of this order. Then, any order ideal O can be described by the set of its maximal elements; O = {x|∃y ∈ M, x ≤ y}. It should be noticed that the order considered on the attribute sets in [1] is reversed with respect to the order considered in [2, 3]. However, we keep the original definitions used in [1] in the present paragraph. These maximal elements form an antichain, and conversely, each antichain is the set of maximal elements of some order ideal. Thus, the semilattice (D, u) of patterns in the pattern structure consists of all antichains of the ordered attribute set. In addition, it is isomorphic to the lattice of all order ideals of the ordered set, and thus isomorphic to the concept lattice of the context (P, P, 6≥). 
For two antichains AC1 and AC2 , the infimum AC1 u AC2 consists of all maximal elements of the order ideal: {m ∈ P | ∃ac1 ∈ AC1 , ∃ac2 ∈ AC2 , m ≤ ac1 and m ≤ ac2 }. 244 Mehwish Alam et al. There is a “canonical representation context” (or an associated scaling oper- ator) for the pattern structure (G, (D, u), δ) related to structured attribute sets, which is defined by the set of “principal ideals ↓ p” as follows: (G, P, I) with (g, p) ∈ I ⇐⇒ p ≤ δ(g). In the next section, we make precise and discuss the pattern structure for structured attribute sets by taking the point of view of filters and not of ideals in agreement with the order from [2, 3], with the most general attributes above. 2.3 From Structured Attributes to Tree-shaped Attributes An important case of structured attributes is “tree-shaped attributes”, i.e., when the attributes are organized within a partial order corresponding to a rooted tree. If it is the case, then the root of the tree, denoted by >, can be matched against the description of any object, while the leaves of this tree are the most detailed descriptions. For example, the root can correspond to the attribute ‘Animal’ and a leaf can correspond to the attribute ‘Cat’; somewhere in between there could be attribute ‘Mammal’. An example of such kind of data naturally appears in the domain of semantic web data. For example, Figure 1 gives a small part of ACCS1 . This attribute tree will be used as a running example and should be read as follows. If an object belongs to class C1 (and probably to some other classes), then it necessarily belongs to classes C10 , C12 , and >, e.g., if an object is a cat, then it is a mammal and an animal. Accordingly, the description of an object can include several classes, e.g., classes C1 , C5 and C8 . Thus, some of the tree-shaped attributes can be omitted from the description of an object. However, they should be always taken into account when computing the intersection between descriptions. Thus, in order to avoid redundancy in the descriptions, we can allow only antichains of the tree as possible elements in the set D of descriptions, and then, accordingly compute the intersection of antichains. An efficient way of computing intersection of antichains is explained in the next section. Here it is important to notice that although it is a hard task to efficiently compute intersection of antichains in an arbitrary partial order of attributes, the intersection of antichains in a tree can help in computing this more general intersection. Indeed, in a partial order of attributes, we can add an artificial attribute > that can be matched against any description. Then, instead of considering an intersection of antichains in an arbitrary poset we can take a spanning tree of it with > taken as the root. Although we have lost some relations between attributes, and, thus, the size of the antichains is probably larger, we can apply the efficient intersection of antichains of tree discussed below. 2.4 On Computing Intersection of Antichains in a Tree In this subsection we show how to efficiently solve the problem of intersection of antichains in a tree. The problem is formalized as follows. A partial order is 1 https://www.acm.org/about/class/2012 Revisiting Pattern Structures for Structured Attribute Sets 245 > C12 C6 C10 C11 C13 C1 C2 C4 C5 C7 C8 C9 Fig. 1: A small part from ACM Computing Classification System (ACCS). described by the Hasse diagram corresponding to the tree. The root is denoted by > and it is larger w.r.t. 
the partial order than any other element in the tree. Given a rooted tree T and two antichains X and Y , we should find an antichain Z such that (1) for all x ∈ X ∪ Y there is z ∈ Z such that x ≤ z and (2) no z ∈ Z can be removed or changed to z̃ < z without violating requirement (1). If the cardinality of antichains X and Y is 1 then this task is reduced to the well-known problem of a Least Common Ancestor (LCA). In 1984 it was already shown that the LCA problem can be reduced to Range Minimum Query (RMQ) problem [8]. Later several simpler approaches were introduced for solving the LCA problem. Here we briefly introduce the reduction of LCA to RMQ in accordance with [9]. Reduction of LCA to RMQ. Given an array of numbers, the RMQ problem consists in efficient answering queries on the position of the minimal value in a given range (interval) of positions for this array. For example, given an array Array [ 2 1 0 3 2 ] Positions 1 2 3 4 5 where the first value is in position 1 and the last value is in position 5, the answer to the query on the position of the minimal number in the range 2–4, i.e., the corresponding part of array is [1;0;3], is 3 (the value of the 3rd element in the array is 0 and it is the minimal value in this range). Accordingly, the position of the minimal number in the range 1–2 (the part of the array is [2;1]) is 2. The good point about this problem is that it can be solved in O(n) preprocessing computational time and in O(1) computational time per one query [9], where n is the number of elements in the array. In order to introduce the reduction of LCA to RMQ we need to know what is the depth of a tree vertex. The depth of a vertex in a rooted tree is the number of edges in the shortest path from that vertex to the root of the tree. We create the array of depths of the vertices in the tree that is used as an input array for RMQ. We build this array in the following way. We traverse the tree in depth first order (see Figure 2). Every time the algorithm considers a 246 Mehwish Alam et al. 0 1 6 2 3 7 4 5 Depth array D [ 0 1 2 1 2 3 2 3 2 1 0 1 2 1 0 ] Corresponding vertex v0 v1 v2 v1 v3 v4 v3 v5 v3 v1 v0 v6 v7 v6 v0 Positions 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Fig. 2: Reducing RMQ task to LCA. Arrows show the depth first order traversal. The depth array D is accompanied by the corresponding vertices and positions. vertex, i.e., the first visit or a return to the vertex, we should put the depth of that vertex at the end of the depth array D. We also keep track of a vertex corresponding to each depth in D. The depth array D has 2|T | − 1 values, where |T | is the number of vertices in the tree. Now for any value in D we know the corresponding vertex of the tree and any vertex of the tree is associated with several positions in D. For example, in Figure 2 the value in the first position of D, i.e., D[1], is 0, corresponding to the root of the tree. If we take vertex 3, then the associated values of D are on positions 5, 7, and 9. Given two vertices A, B ∈ T , let a be one of the positions in D corresponding to vertex A, let b be one of the positions in D corresponding to B. Then it can be shown that the vertex corresponding to the minimal value in D in the range a–b is the least common ancestor of A and B. For example, to find LCA between vertices 3 and 6 in Figure 2, one should first take two positions in D corresponding to vertices 3 and 6. Positions 5,7, and 9 in array D correspond to vertex 3, positions 12 and 14 correspond to vertex 6. 
Thus, we can query RMQ for ranges 5–14, 7–14, 7–12, etc. The minimal value in D for all these ranges is 0 located at position 11 in D, i.e., RMQ(5, 14) = 11. Thus, the vertex corresponding to position 11, i.e., vertex 0, is the least common ancestor for vertices 3 and 6. Let us notice that if A ∈ T is an ancestor of B ∈ T and a and b are two positions corresponding to the vertices A and B, then the position RMQ(a, b) in D always corresponds to the vertex A, in most of the cases RMQ(a, b) = a. Thus we are also able to check if a vertex of T is an ancestor of another vertex of T . Now we know how to solve the LCA problem in O(|T |) preprocessing com- putational time and O(1) computational time per query. Let us return to the problem of intersecting antichains of a tree. Antichain intersection problem. Let us first discuss the naive approach to this problem. Given two antichains A, B ⊂ T , one can compute the set Revisiting Pattern Structures for Structured Attribute Sets 247 D[ 0 1 2 3 2 3 2 1 2 3 2 3 2 1 0 1 2 3 2 3 2 3 2 1 0] > C12 C10 C1 C10 C2 C10 C12 C11 C4 C11 C5 C11 C12 > C6 C13 C7 C13 C8 C13 C9 C13 C6 > 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Fig. 3: Depth array, the corresponding vertices, and indices for the tree in Fig- ure 1. {LCA(a, b) | ∀a ∈ A and ∀b ∈ B}. Then this set should be filtered for remov- ing the comparable elements in order to get an antichain. It is easy to see that the result is the intersection of A and B but it requires at least |A|·|B| operations. Let us reformulate this naive approach in terms of RMQ. Given a depth array D and two sets of indices A, B ⊆ N|D| forming an antichain, we should compute the set Z = {RMQ(a, b) | ∀a ∈ A and ∀b ∈ B} and then remove all elements z ∈ Z such that there is x ∈ Z \ {z} with the position RMQ(z, x) corresponding to the same vertex as z, i.e., elements z corresponding to an ancestor of another element from Z. Let us consider for example the tree T given in Figure 1. Figure 3 shows the depth array, the corresponding vertices, and indices of this array. Let us show how to compute the intersection of A = {C1 , C5 , C8 } and B = {C1 , C7 , C9 }. The expected result is {C1 , C13 }. First we translate the sets A and B to the indices in array D for RMQ, i.e., A = {4, 12, 20} and B = {4, 18, 22}. Then we compute RMQ for all pairs from A and B: {RMQ(4, 4) = 4, RMQ(4, 18) = 15, RMQ(4, 22) = 15, · · · , RMQ(20, 18) = 19, · · · }. Now we should remove positions corresponding to ancestors in the tree, e.g., RMQ(4, 15) = 15 and, hence, 15 should be removed. The result is {4, 13} repre- senting exactly {C1 , C13 }. Let us discuss two points that help us to reduce the complexity of the naive approach. Consider the positions i ≤ l ≤ m ≤ j and k = RMQ(i, j), n = RMQ(l, m). Then the depth in the position k is not larger than the depth in the position n, D[k] ≤ D[n]. Hence the position RMQ(k, n) corresponds to the same vertex as position k. For example, in Figure 3 RMQ(4, 6) = 5 and RMQ(2, 7) = 2. The value in position 5 in the array D is D[5] = 2. It is larger than the value in position 2, D[2] = 1. Thus, the value in position returned by RMQ for the larger range is smaller than the value in position returned by RMQ for the smaller range. Thus, given two sets of indices A, B ⊆ N|D| corresponding to antichains, we can modify the naive algorithm by ordering the set A ∪ B and computing RMQ only for consecutive elements from different sets, rather then for all pairs from different sets. 
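To make the preceding reduction concrete, the following Python sketch (an illustration only, not the FCAPS implementation) builds the depth array by a depth-first traversal, answers LCA queries through a naive range-minimum scan (a sparse table would provide the O(1) queries assumed in the complexity argument), and intersects two antichains via RMQs of consecutive positions; the final filtering of the candidate positions down to an antichain is the step discussed in the next paragraph.

```python
def euler_tour(children, root):
    """Depth-first traversal recording, at every visit and every return, the current
    vertex and its depth; 'first' maps each vertex to one of its positions."""
    tour, depth, first = [], [], {}
    def dfs(v, d):
        first.setdefault(v, len(tour))
        tour.append(v); depth.append(d)
        for c in children.get(v, []):
            dfs(c, d + 1)
            tour.append(v); depth.append(d)
    dfs(root, 0)
    return tour, depth, first

def rmq(depth, i, j):
    """Position of a minimal depth in the range i..j (naive linear scan here)."""
    lo, hi = min(i, j), max(i, j)
    return min(range(lo, hi + 1), key=depth.__getitem__)

def lca(tour, depth, first, u, v):
    return tour[rmq(depth, first[u], first[v])]

def intersect_antichains(tour, depth, first, A, B):
    """Infimum of two antichains: RMQs only for consecutive positions coming from
    different antichains, then removal of ancestors from the candidate set."""
    tagged = sorted([(first[a], 0) for a in A] + [(first[b], 1) for b in B])
    candidates = {rmq(depth, p, q)
                  for (p, s), (q, t) in zip(tagged, tagged[1:]) if s != t}
    kept = []
    for pos in sorted(candidates):
        while kept and tour[rmq(depth, kept[-1], pos)] == tour[kept[-1]]:
            kept.pop()                 # the last kept vertex is an ancestor of pos
        if not kept or tour[rmq(depth, kept[-1], pos)] != tour[pos]:
            kept.append(pos)           # pos is not an ancestor of the last kept vertex
    return [tour[p] for p in kept]

# The tree of Figure 1 and the intersection computed in the running example:
children = {">": ["C12", "C6"], "C12": ["C10", "C11"], "C10": ["C1", "C2"],
            "C11": ["C4", "C5"], "C6": ["C13"], "C13": ["C7", "C8", "C9"]}
tour, depth, first = euler_tour(children, ">")
print(intersect_antichains(tour, depth, first,
                           ["C1", "C5", "C8"], ["C1", "C7", "C9"]))  # ['C1', 'C13']
```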
For example, for intersecting A = {4, 12, 20} and B = {4, 18, 22}, we join them to the set Z = {4A , 4B , 12A , 18B , 20A , 22B }. Then, we compute RMQ only for consecutive elements from different sets, i.e., RMQ(4, 4) = 4, RMQ(4, 12) = 8, RMQ(12, 18) = 15, RMQ(18, 20) = 19, and RMQ(20, 22) = 21. The cardinality of A ∪ B is less then |A| + |B|, hence, the number of the consecutive elements is O(|A|+|B|), and, thus, the number of RMQs of consecutive elements is O(|A| + |B|). 248 Mehwish Alam et al. However, the set Z of RMQs of consecutive elements does not not necessarily correspond to an antichain in T . Thus we should filter this set, in order to remove all ancestors of another elements form Z. Accordingly, it is clear that to filter the set Z it is enough to check only consecutive elements of Z. For example, the intersection of A = {4, 12, 20} and B = {4, 18, 22} gives us the following set Z = {4, 8, 15, 19, 21}. Let us now check the RMQs of consecutive elements. RMQ(4, 8) = 8, thus, 8 is an ancestor of 4 and 8 can be removed. Since 8 is removed, we compare RMQ(4, 15) = 15, thus, 15 should be also removed. Then we compute RMQ(4, 19) = 15, i.e., the indices 4 and 19 are not ancestors and both are kept. Now we compute RMQ(19, 21) = 19 and, thus, 19 should be removed (actually positions 19 and 21 correspond to the same vertex C13 and one of them should be removed). Thus, the result of intersecting A and B is {4, 21} corresponding to the antichain {C1 , C13 }. Since the number of elements in the set Z is O(|A| + |B|), then overall com- plexity of computing intersection for two antichains A, B ⊂ T of a tree T is O(|A| + |B|) or, taking into account that the cardinality of an antichain in a tree is less then the number of leaves (vertices having no descendants) in this tree, the complexity of computing intersection of two antichains is O(|Leaves(T )|). Antichain intersection by scaling. An equivalent approach for computing intersection of antichains is to scale the antichains to the corresponding filters. A filter corresponding to an antichain in a poset is the set of all elements of the poset that are larger then at least one element from the antichain. For example, let us consider a tree-shaped poset in Figure 1. A filter corresponding to the antichain {C1 , C5 , C8 } is the set of all ancestors of all elements from the antichain, i.e., it is equal to {C1 , C10 , C12 , >, C5 , C11 , C8 , C13 , C6 }. The set-intersection of filters corresponding to the given antichains is a filter corresponding to the antichain resulting from intersection of the antichains. How- ever this approach has a higher complexity. Indeed, the size of a filter is O(|T |) and, thus, the computational complexity of intersecting two antichains by means of a scaling is O(|T |) which is harder then O(|Leaves(T )|) for intersecting an- tichains directly. Indeed, the number of leaves in a tree can be dramatically smaller than the number of vertices in this tree. For example, the number of vertices in Figure 1 is 13, while the number of leaves is only 7. Thus, the direct intersection of antichains is more efficient than the intersection by means of a scaling procedure. Relation to intersection of antichains in partially ordered sets of at- tributes. As it was mentioned in the previous section, the intersection of an- tichains in arbitrary posets can be reduced to the intersection of antichains in a tree. However, the size of the antichain representing a description of an object can increase. 
Indeed, since we have reduced a poset to a tree, some relations have been lost, and thus the attributes that are subsumed in the poset for a given antichain A are no more subsumed in the tree for A, and hence should be added to A. However, the reduction is still more computationally efficient than Revisiting Pattern Structures for Structured Attribute Sets 249 Table 1: Results of the experiments with different kind of data. #objects is the number of objects in the corresponding dataset. #attributes is the number of nu- merical attributes before scaling. |G| is the number of objects used for building the lattice. |T | is the size of the attribute tree and the number of attributes in the scaled context |M |. Leaves(T ) is the number of leaves in the attribute tree. |L| is the size of the concept lattice for the corresponding data. tT is the computational time for data represented as a set of antichains in the attribute tree. tK is the computational time represented by a scaled context, i.e., by a set of filters in the attribute tree; ‘*’ shows that the we are not able to build the whole lattice. tnum is the computational time for numerical data represented by an interval pattern structure. (a) Real data experiments. Dataset |G| |T | Leaves(T ) |L| tT tK DBLP 5293 33207 33198 10134 45 sec 21 sec Biomedical Data 63 1490 933 1725582 145 sec 162 sec (b) Numerical data experiments. #attributes #objects Dataset |G| |T | |Leaves(T )| |L| tT tK tnum BK 96 5 35 626 10 840897 37 sec 42 sec* 19 sec LO 16 7 16 224 26 1875 0.043 sec 0.088 sec 0.024 sec NT 131 3 131 140 6 128624 3.6 sec 6.8 sec 3.1 sec PO 60 16 22 1236 58 416837 49 sec 57 sec* 10.7 sec PT 5000 49 22 4084 60 452316 50 sec 38 sec* 15 sec PW 200 11 94 436 21 1148656 60 sec 49 sec* 48 sec PY 74 28 36 340 53 771569 46 sec 40 sec* 21 sec QU 2178 4 44 8212 8 783013 28 sec 30 sec* 15.4 sec TZ 186 61 31 626 88 650041 58 sec 43 sec* 22 sec VY 52 4 52 202 15 202666 5.9 sec 11.6 sec 3 sec computing the intersection of antichains in a poset by means of a scaling as it is discussed in the previous paragraph. However, for the reduction it could be interesting to find the spanning tree with the minimal number of leaves. Unfortu- nately, this is an NP-complete task and it thus cannot be applied for increasing the computational efficiency [10]. We should notice here that there is some work that solves the LCA problem for more general cases, e.g., lattices [11] or partially ordered sets [9]. However, it is an open question whether these works can help to efficiently compute intersection of antichains in the corresponding structures. 250 Mehwish Alam et al. 3 Experiments and Discussion Several experiments are conducted using publicly available data on a MacBook with a 1.3GHz Intel Core i5, 4GB of RAM running OS X Yosemite 10.3. We have used FCAPS2 software developed in C++ for dealing with different kinds of pattern structures. It can build a concept lattice starting from a standard formal context or from object descriptions given as antichains of a given tree. The last one is based on the similarity operation that is discussed above. We performed our experiments on two datasets from different domains namely DBLP and biomedical data. In these datasets, object descriptions are given as subsets of attributes. A taxonomy of the attributes is already known based on domain knowledge. We compute a concept lattice in two different ways. In the first one, we directly compute the concept lattice from the antichains in a taxon- omy. 
In the second one we scale every description to the corresponding filter of the taxonomy. After this we do not rely on the taxonomy and process the scaled context with standard FCA. The first data set is DBLP, from which we extracted a subset of papers with their keywords published in conferences in Machine Learning domain. The tax- onomy used for classifying such kind of triples is ACM Computing Classification System (ACCS)3 . The second data set belongs to the domain of life sciences. It contains in- formation about drugs, their side effects (SIDER4 ), and their categories (Drug- Bank5 ). The taxonomies related to this dataset are MedDRA 6 for side effects and MeSH7 for drug categories. The parameters of the datasets and the computational results are shown in Table 1a. It can be noticed that for DBLP the context consists of 5293 objects and 33207 attributes, in the taxonomy of the attributes we have 33198 leaves meaning that most of attributes are mutually incomparable. It took 45 seconds to produce a lattice having 10134 concepts directly from the descriptions given by antichains of the taxonomy. To produce the same lattice starting from a scaled context the program only takes 21 seconds. However, if we consider the biomedical data, the approach based on antichains is better. Indeed, it takes 145 seconds, while the computation starting from the scaled contexts takes 162 seconds. In this case, the dataset contains 1490 attributes with 933 leaves. Thus, the direct approach works faster if the number of leaves is significantly smaller than the number of vertices. It is worth noticing that the size of antichains is significantly smaller than the size of the filters, and thus our approach is more efficient. However, when the number of leaves is comparable to the number of vertices, our approach is slower. Although in this case our approach has the same 2 https://github.com/AlekseyBuzmakov/FCAPS 3 https://www.acm.org/about/class/2012 4 http://sideeffects.embl.de/ 5 http://www.drugbank.ca/ 6 http://meddra.org/ 7 http://www.ncbi.nlm.nih.gov/mesh/ Revisiting Pattern Structures for Structured Attribute Sets 251 computational complexity as the scaling approach, the antichain intersection problem requires more efforts than the set intersection. Since the efficiency of the antichain approach is high for the trees with a low number of leaves, we can use this method to increase efficiency of standard FCA for special kind of contexts. In a context (G, M, I) an attribute m1 can be considered as an ancestor of another attribute m2 if any object containing the attribute m2 also contains the attribute m1 . Accordingly we can construct an attribute tree T and rely on it for computing intersection operation. In this case the set of attributes M and the set of vertices of T are the same and |M | = |T |. The second part of the experiment was based on this observation. We used numerical data from Bilkent University in the second part of the experiments8 . It was converted to formal contexts by the standard interodinal scaling. The scaled attributes are closely connected, i.e., there are a lot of pairs of attributes (m1 , m2 ) such that the set of objects described by m1 is a subset of objects described by m2 , i.e., (m1 )0 ⊆ (m2 )0 . Thus, we can say that m1 ≤ m2 . Using this property we built attribute trees from the scaled contexts. These trees have many more vertices than leaves, thus, the approach introduced in this paper should be efficient. We compare our approach with the scaling approach. 
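As an aside, the attribute ordering just described (m1 is an ancestor of m2 whenever every object having m2 also has m1) can be derived directly from a scaled context. The following Python sketch is only illustrative; the extent layout, the strict-subset test, the parent-selection rule and the artificial root are our assumptions, not the exact construction used in the experiments.

```python
from typing import Dict, FrozenSet, List

def attribute_ancestors(extent: Dict[str, FrozenSet[str]]) -> Dict[str, List[str]]:
    """m1 is an ancestor of m2 iff (m2)' is a proper subset of (m1)'; attributes with
    identical extents would first have to be merged."""
    return {m2: [m1 for m1 in extent if extent[m2] < extent[m1]] for m2 in extent}

def attribute_tree(extent: Dict[str, FrozenSet[str]]) -> Dict[str, str]:
    """Choose for each attribute one ancestor of smallest extent as its parent;
    attributes without a proper ancestor hang below an artificial root '⊤'."""
    ancestors = attribute_ancestors(extent)
    return {m: (min(anc, key=lambda a: len(extent[a])) if anc else "⊤")
            for m, anc in ancestors.items()}

# A toy context with three nested attribute extents:
extent = {"m1": frozenset({"g1", "g2", "g3"}),
          "m2": frozenset({"g1", "g2"}),
          "m3": frozenset({"g1"})}
print(attribute_tree(extent))  # {'m1': '⊤', 'm2': 'm1', 'm3': 'm2'}
```

With the strict-subset test each attribute receives a parent with a strictly larger extent, so the parent map is acyclic and defines a rooted tree on the attributes.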
Moreover, recently, it was shown that interval pattern structures (IPS) can be efficiently used to process such kind of data [12]. Accordingly we also compared our approach with IPS. The results are shown in Table 1b. Compared to Table 1a it has several additional columns. First of all, since for numerical data we typically got large lattices, in most of the cases we considered only part of the objects. The actual number of used objects is given in the column |G|, while the total size of the dataset is given in the column ‘#objects’, e.g., BK dataset contained 96 objects, while we have used only 35. In addition for every dataset we also provide the number of the numerical attributes, e.g., BK has 5 numerical attributes. We should notice that when we built the lattice from some datasets by standard FCA, the lattice was so large that the memory was swapping and we stopped the computation. It was not the case for our approach since antichains requires less memory to store than the corresponding filters. The fact of swapping is shown by ‘*’ next to computational time in column tK . In addition we also show the time for IPS to process the same dataset. For example, the processing of BK dataset took 37 seconds by our approach, took more than 42 seconds by standard FCA and memory had started swapping, and took 19 seconds by IPS. This experiment shows that our approach takes not only less time to com- pute concept lattice, but also requires less memory, since there is no memory swapping. We can also see that the computation time for IPS is smaller than for our approach. However, IPS is only applicable for numerical data, while our approach can be applied for all cases when attributes of a context are structured. For example, we can deal with graph data scaled to the set of frequent subgraphs where many such attributes are subgraphs of other attributes. 8 http://funapp.cs.bilkent.edu.tr/DataSets/ 252 Mehwish Alam et al. 4 Conclusion In this paper we recalled two approaches for dealing with structured attributes and explained how we can compute intersection of antichains in tree-shaped posets of attributes, an essential operation for working with structured attributes. Our experiments showed the computational efficiency of the proposed approach. Accordingly, we are interested in applying our approach to other kinds of data such as graph data. Moreover, the generalization of our approach to other kinds of posets is also of high interest. References 1. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: ICCS. LNCS 2120, Springer (2001) 129–142 2. Carpineto, C., Romano, G.: A lattice conceptual clustering system and its appli- cation to browsing retrieval. Machine Learning 24(2) (1996) 95–122 3. Carpineto, C., Romano, G.: Concept Data Analysis: Theory and Applications. John Wiley & Sons, Chichester, UK (2004) 4. Alam, M., Napoli, A.: Interactive exploration over RDF data using Formal Concept Analysis. In: International Conference on Data Science and Advanced Analytics, DSAA 2015, Paris, France, October 19 - October 21, 2015, IEEE (2015) 5. Caspard, N., Leclerc, B., Monjardet, B.: Finite Ordered Sets. Cambridge University Press, Cambridge, UK (2012) First published in French as “Ensembles ordonnés finis : concepts, résultats et usages”, Springer 2009. 6. Pichon, E., Lenca, P., Guillet, F., Wang, J.W.: Un algorithme de partition d’un produit direct d’ordres totaux en un nombre fini de chaı̂nes. Mathématiques, Informatique et Sciences Humaines 125 (1994) 5–15 7. 
Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg (1999) 8. Gabow, H.N., Bentley, J.L., Tarjan, R.E.: Scaling and Related Techniques for Geometry Problems. In: Proc. Sixt. Annu. ACM Symp. Theory Comput. STOC ’84, New York, NY, USA, ACM (1984) 135–143 9. Bender, M.A., Farach-Colton, M., Pemmasani, G., Skiena, S., Sumazin, P.: Lowest common ancestors in trees and DAGs. J. Algorithms 57(2) (2005) 75–94 10. Salamon, G., Wiener, G.: On finding spanning trees with few leaves. Inf. Process. Lett. 105(5) (2008) 164–169 11. Aı̈t-Kaci, H., Boyer, R., Lincoln, P., Nasr, R.: Efficient Implementation of Lattice Operations. ACM Trans. Program. Lang. Syst. 11(1) (January 1989) 115–146 12. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. (Ny). 181(10) (2011) 1989 – 2001 Author Index Adaricheva, Kira, 217 Lumpe, Lars, 171 Akhmatnurov, Marat, 99 Alam, Mehwish, 23, 241 Makhalova, Tatyana P., 59 Albano, Alexandre, 73 Miclet, Laurent, 159 Antoni, Ľubomír, 147 Miralles, André, 111 Molla, Guilhem, 111 Bartl, Eduard, 229 Mora, Angel, 217 Borchmann, Daniel, 181 Buzmakov, Aleksey, 241 Napoli, Amedeo, 23, 241 Nebut, Clémentine, 111 Cabrera, Inmaculada P., 147 Nicolas, Jacques, 159 Chornomaz, Bogdan, 73 Nourine, Lhouari, 9, 123 Cleophas, Loek, 87 Nxumalo, Madoda, 87 Cordero, Pablo, 217 Ojeda-Aciego, Manuel, 147 Derras, Mustapha, 111 Osmuk, Matthieu, 23 Deruelle, Laurent, 111 Dubois, Didier, 3 Pasi, Gabriella, 1 Priss, Uta, 135 Enciso, Manuel, 217 Quilliot, Alain, 123 Gnatyshak, Dmitry V., 47 Ramon, Jan, 7 Revenko, Artem, 35 Huchard, Marianne, 111 Rodríguez-Lorenzo, Estrella, 217 Ignatov, Dmitry I., 47, 99 Sailanbayev, Alibek, 241 Schmidt, Stefan E., 171 Kauer, Martin, 11 Konecny, Jan, 205, 229 Toussaint, Hélène, 123 Kourie, Derrick G., 87 Krídlo, Ondrej, 147, 205 Uno, Takeaki, 5 Krajči, Stanislav, 147 Kriegel, Francesco, 181, 193 Watson, Bruce W., 87 Krupka, Michal, 11 Kuznetsov, Sergei O., 59 Zudin, Sergey, 47 Editors: Sadok Ben Yahia, Jan Konecny Title: CLA 2015, Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications Publisher & Print: UBP, Limos, Campus Universitaire des Cézeaux, 63178 AUBIERE CEDEX – FRANCE Place, year, edition: Clermont-Ferrand, France, 2015, 1st Page count: xiii+254 Impression: 50 Archived at: cla.inf.upol.cz Not for sale ISBN 978–2–9544948–0–7