=Paper=
{{Paper
|id=Vol-1252/proceedings-cla2014
|storemode=property
|title=Proceedings of the Eleventh International Conference on Concept Lattices and Their Applications (CLA 2014)
|pdfUrl=https://ceur-ws.org/Vol-1252/proceedings-cla2014.pdf
|volume=Vol-1252
}}
==Proceedings of the Eleventh International Conference on Concept Lattices and Their Applications (CLA 2014)==
CLA 2014 Proceedings of the Eleventh International Conference on Concept Lattices and Their Applications CLA Conference Series cla.inf.upol.cz Institute of Computer Science Pavol Jozef Šafárik University in Košice, Slovakia ISBN 978–80–8152–159–1 Karell Bertet, Sebastian Rudolph (Eds.) CLA 2014 Concept Lattices and Their Applications Volume I 11th International Conference on Concept Lattices and Their Applications Košice, Slovakia, October 07–10, 2014 Proceedings P. J. Šafárik University, Košice, Slovakia 2014 Volume editors Karell Bertet Université de La Rochelle La Rochelle, France E-mail: kbertet@univ-lr.fr Sebastian Rudolph Technische Universität Dresden Dresden, Germany E-mail: sebastian.rudolph@tu-dresden.de Technical editor Sebastian Rudolph, sebastian.rudolph@tu-dresden.de Cover design Róbert Novotný, robert.novotny@upjs.sk c P. J. Šafárik University, Košice, Slovakia 2014 This work is subject to copyright. All rights reserved. Reproduction or publica- tion of this material, even partial, is allowed only with the editors’ permission. ISBN 978–80–8152–159–1 Organization CLA 2014 was organized by the Institute of Computer Science, Pavol Jozef Šafárik University in Košice. Steering Committee Radim Bělohlávek Palacký University, Olomouc, Czech Republic Sadok Ben Yahia Faculté des Sciences de Tunis, Tunisia Jean Diatta Université de la Réunion, France Peter Eklund University of Wollongong, Australia Sergei O. Kuznetsov State University HSE, Moscow, Russia Engelbert Mephu Nguifo Université de Clermont Ferrand, France Amedeo Napoli LORIA, Nancy, France Manuel Ojeda-Aciego Universidad de Málaga, Spain Jan Outrata Palacký University, Olomouc, Czech Republic Program Chairs Karell Bertet Université de La Rochelle, France Sebastian Rudolph Technische Universität Dresden, Germany Program Committee Kira Adaricheva Nazarbayev University, Astana, Kazakhstan Cristina Alcalde Univ del Pais Vasco, San Sebastián, Spain Jamal Atif Université Paris Sud, France Jaume Baixeries Polytechnical University of Catalonia, Spain Radim Bělohlávek Palacký University, Olomouc, Czech Republic Sadok Ben Yahia Faculty of Sciences, Tunis, Tunisia François Brucker Ecole Centrale Marseille, France Ana Burusco Universidad de Navarra, Pamplona, Spain Claudio Carpineto Fondazione Ugo Bordoni, Roma, Italy Pablo Cordero Universidad de Málaga, Spain Mathieu D’Aquin The Open University, Milton Keynes, UK Christophe Demko Université de La Rochelle, France Jean Diatta Université de la Réunion, France Florent Domenach University of Nicosia, Cyprus Vincent Duquenne Université Pierre et Marie Curie, Paris, France Sebastien Ferre Université de Rennes 1, France Bernhard Ganter Technische Universität Dresden, Germany Alain Gély Université Paul Verlaine, Metz, France Cynthia Vera Glodeanu Technische Universität Dresden, Germany Robert Godin Université du Québec à Montréal, Canada Tarek Hamrouni Faculty of Sciences, Tunis, Tunisia Marianne Huchard LIRMM, Montpellier, France Céline Hudelot Ecole Centrale Paris, France Dmitry Ignatov State University HSE, Moscow, Russia Mehdi Kaytoue LIRIS - INSA de Lyon, France Jan Konecny Palacký University, Olomouc, Czech Republic Marzena Kryszkiewicz Warsaw University of Technology, Poland Sergei O. 
Kuznetsov State University HSE, Moscow, Russia Leonard Kwuida Bern University of Applied Sciences, Switzerland Florence Le Ber Strasbourg University, France Engelbert Mephu Nguifo Université de Clermont Ferrand, France Rokia Missaoui Université du Québec en Outaouais, Gatineau, Canada Amedeo Napoli LORIA, Nancy, France Lhouari Nourine Université de Clermont Ferrand, France Sergei Obiedkov State University HSE, Moscow, Russia Manuel Ojeda-Aciego Universidad de Málaga, Spain Jan Outrata Palacký University, Olomouc, Czech Republic Pascal Poncelet LIRMM, Montpellier, France Uta Priss Ostfalia University, Wolfenbüttel, Germany Olivier Raynaud LIMOS, Université de Clermont Ferrand, France Camille Roth Centre Marc Bloch, Berlin, Germany Barış Sertkaya SAP Research Center, Dresden, Germany Henry Soldano Laboratoire d’Informatique de Paris Nord, France Gerd Stumme University of Kassel, Germany Laszlo Szathmary University of Debrecen, Hungary Petko Valtchev Université du Québec à Montréal, Canada Francisco J. Valverde Albacete Universidad Nacional de Educación a Distancia, Spain Additional Reviewers Xavier Dolques Strasbourg University, France Philippe Fournier-Viger University of Moncton, Canada Michal Krupka Palacký University, Olomouc, Czech Republic Organization Committee Ondrej Krı́dlo Pavol Jozef Šafárik University, Košice, Slovakia Stanislav Krajči Pavol Jozef Šafárik University, Košice, Slovakia L’ubomı́r Antoni Pavol Jozef Šafárik University, Košice, Slovakia Lenka Pisková Pavol Jozef Šafárik University, Košice, Slovakia Róbert Novotný Pavol Jozef Šafárik University, Košice, Slovakia Table of Contents Preface Invited Contributions Relationship between the Relational Database Model and FCA . . . . . . . . . 1 Jaume Baixeries What Formalism for the Semantic Web? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Hassan Aı̈t-Kaci Linguistic Data Mining with FCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Uta Priss Shortest CNF Representations of Pure Horn Functions and their Connection to Implicational Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 Ondrej Cepek Full Papers Learning Model Transformation Patterns using Graph Generalization . . . . 11 Hajer Saada, Marianne Huchard, Michel Liquière and Clémentine Nebut Interaction Challenges for the Dynamic Construction of Partially- Ordered Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Tim Pattison and Aaron Ceglar The Educational Tasks and Objectives System within a Formal Context . 35 L’ubomı́r Antoni, Ján Guniš, Stanislav Krajči, Ondrej Krı́dlo and L’ubomı́r Šnajder Pattern Structures for Understanding Episode Patterns . . . . . . . . . . . . . . . . 47 Keisuke Otaki, Madori Ikeda and Akihiro Yamamoto Formal Concept Analysis for Process Enhancement Based on a Pair of Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Madori Ikeda, Keisuke Otaki and Akihiro Yamamoto Merging Closed Pattern Sets in Distributed Multi-Relational Data . . . . . . 71 Hirohisa Seki and Yohei Kamiya Looking for Bonds between Nonhomogeneous Formal Contexts . . . . . . . . . 83 Ondrej Krı́dlo, L’ubomı́r Antoni and Stanislav Krajči Reverse Engineering Feature Models from Software Configurations using Formal Concept Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
95 Ra’Fat Al-Msie’Deen, Marianne Huchard, Abdelhak Seriai, Christelle Urtado and Sylvain Vauttier An Algorithm for the Multi-Relational Boolean Factor Analysis based on Essential Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Martin Trnecka and Marketa Trneckova On Concept Lattices as Information Channels . . . . . . . . . . . . . . . . . . . . . . . . 119 Francisco J. Valverde Albacete, Carmen Peláez-Moreno and Anselmo Peñas Using Closed Itemsets for Implicit User Authentication in Web Browsing . 131 Olivier Coupelon, Diyé Dia, Fabien Labernia, Yannick Loiseau and Olivier Raynaud The Direct-optimal Basis via Reductions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 Estrella Rodrı́guez Lorenzo, Karell Bertet, Pablo Cordero, Manuel Enciso and Ángel Mora Ordering Objects via Attribute Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 Inma P. Cabrera, Manuel Ojeda-Aciego and Jozef Pócs DFSP: A New Algorithm for a Swift Computation of Formal Concept Set Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169 Ilyes Dimassi, Amira Mouakher and Sadok Ben Yahia Attributive and Object Subcontexts in Inferring Good Maximally Redundant Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 Xenia Naidenova and Vladimir Parkhomenko Removing an Incidence from a Formal Context . . . . . . . . . . . . . . . . . . . . . . . 195 Martin Kauer and Michal Krupka Formal L-concepts with Rough Intents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 Eduard Bartl and Jan Konecny Reduction Dimension of Bags of Visual Words with FCA . . . . . . . . . . . . . . 219 Ngoc Bich Dao, Karell Bertet and Arnaud Revel A One-pass Triclustering Approach: Is There any Room for Big Data? . . . 231 Dmitry V. Gnatyshak, Dmitry I. Ignatov, Sergei O. Kuznetsov and Lhouari Nourine Three Related FCA Methods for Mining Biclusters of Similar Values on Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243 Mehdi Kaytoue, Victor Codocedo, Jaume Baixeries and Amedeo Napoli Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255 Mehwish Alam and Amedeo Napoli A Generalized Framework to Consider Positive and Negative Attributes in Formal Concept Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267 José Manuel Rodrı́guez-Jiménez, Pablo Cordero, Manuel Enciso and Ángel Mora Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279 Preface Formal Concept Analysis is a mathematical theory formalizing aspects of hu- man conceptual thinking by means of lattice theory. As such, it constitutes a theoretically well-founded, practically proven, human-centered approach to data science and has been continuously contributing valuable insights, methodologies and algorithms to the scientific community. The International Conference “Concept Lattices and Their Applications (CLA)” is being organized since 2002 with the aim of providing a forum for researchers involved in all aspects of the study of FCA, from theory to implementations and practical applications. 
Previous years’ conferences took place in Horní Bečva, Ostrava, Olomouc (all Czech Republic), Hammamet (Tunisia), Montpellier (France), Olomouc (Czech Republic), Sevilla (Spain), Nancy (France), Fuengirola (Spain), and La Rochelle (France). The eleventh edition of CLA was held in Košice, Slovakia from October 7 to 10, 2014. The event was organized and hosted by the Institute of Computer Science at Pavol Jozef Šafárik University in Košice. This volume contains the selected papers as well as abstracts of the four invited talks. We received 28 submissions of which 22 were accepted for publication and presentation at the conference. We would like to thank the contributing authors, who submitted high quality works. In addition we were very happy to welcome four distinguished invited speakers: Jaume Baixeries, Hassan Aït-Kaci, Uta Priss, and Ondrej Cepek. All submitted papers underwent a thorough review by members of the Program Committee with the help of additional reviewers. We would like to thank all reviewers for their valuable assistance. A selection of extended versions of the best papers will be published in a renowned journal, pending another reviewing process. The success of such an event heavily relies on the hard work and dedication of many people. Next to the authors and reviewers, we would also like to acknowledge the help of the CLA Steering Committee, who gave us the opportunity of chairing this edition and provided advice and guidance in the process. Our greatest thanks go to the local Organization Committee from the Institute of Computer Science, Pavol Jozef Šafárik University in Košice, who put a lot of effort into the local arrangements and provided the pleasant atmosphere necessary to attain the goal of providing a balanced event with a high level of scientific exchange. Finally, it is worth noting that we benefited a lot from the EasyChair conference management system, which greatly helped us to cope with all the typical duties of the submission and reviewing process. October 2014 Karell Bertet Sebastian Rudolph Program Chairs of CLA 2014 Relationship between the Relational Database Model and FCA Jaume Baixeries Computer Science Department Universitat Politècnica de Catalunya Barcelona, Catalonia The Relational Database Model (RDBM) [3, 4] is one of the most relevant database models currently used to manage data. Although some alternative models are also being used and implemented (namely, object oriented databases and structured datatypes databases or NoSQL databases [1, 2]), the RDBM still maintains its popularity, as some rankings indicate 1. The RDBM can be formulated from a set-theoretical point of view, such that a tuple is a partial function, and other basic operations in this model such as projections, joins, selections, etc., can be seen as set operations. Another important feature of this model is the existence of constraints, which are first-order predicates that must hold in a relational database. These constraints mostly describe conditions that must hold in order to keep the consistency of the data in the database, but also help to describe some semantic aspects of the dataset. In this talk, we consider some aspects of the RDBM that have been characterized with FCA, focusing on different kinds of constraints that appear in the Relational Model. We review some results that formalize different kinds of constraints with FCA [5–8].
We also explain how some concepts of the RDBM such as key, closure, completion, cover can be easily be understood with FCA. References 1. Kai Orend. Analysis and Classification of NoSQL Databases and Evaluation of their Ability to Replace an Object-relational Persistence Layer. 2010. doi=10.1.1.184.483 2. A B M Moniruzzaman, Syed Akhter Hossain NoSQL Database: New Era of Databases for Big data Analytics. Classification, Characteristics and Comparison. arXiv:1307.0191 [cs.DB] 3. Codd, E. F. A Relational Model of Data for Large Shared Data Banks. Commun. ACM, 1970, volume 13, number 6. 4. Date, C. J. An Introduction to Database Systems (8 ed.). Pearson Education. ISBN 0-321-19784-4. 5. Baixeries, Jaume. A Formal Context for Symmetric Dependencies. ICFCA 2008. LNAI 4933. 6. Baixeries, Jaume and Balcázar, José L. Characterization and Armstrong Relations for Degenerate Multivalued Dependencies Using Formal Concept Analysis. For- mal Concept Analysis, Third International Conference, ICFCA 2005, Lens, France, February 14-18, 2005, Proceedings. Lecture Notes in Computer Science, 2005 1 http://db-engines.com/en/ranking 2 Jaume Baixeries 7. Baixeries, Jaume and Balcázar, José L. Unified Characterization of Symmetric De- pendencies with Lattices. Contributions to ICFCA 2006. 4th International Confer- ence on Formal Concept Analysis 2005. 8. Baixeries, Jaume. A Formal Concept Analysis framework to model functional de- pendencies. Mathematical Methods for Learning, 2004. What Formalism for the Semantic Web? Hassan Aı̈t-Kaci hassan.ait-kaci@univ-lyon1.fr ANR Chair of Excellence CEDAR Project LIRIS Université Claude Bernard Lyon 1 France The world is changing. The World Wide Web is changing. It started out as a set of purely notational conventions for interconnecting information over the Internet. The focus of information processing has now shifted from local disconnected disc-bound silos to Internet-wide interconnected clouds. The nature of information has also evolved. From raw uniform data, it has now taken the shape of semi-structured data and meaning- carrying so-called “Knowledge Bases.” While it was sufficient to process raw data with structure-aware querying, it has now become necessary to process knowledge with contents-aware reasoning. Computing must therefore adapt from dealing with mere ex- plicit data to inferring implicit knowledge. How to represent such knowledge and how inference therefrom can be made effective (whether reasoning or learning) is thus a central challenge among the many now facing the world wide web. So called “ontologies” are being specified and meant to encode formally encyclo- pedic as well as domain-specific knowledge. One early (still on-going) such effort has been the Cyc1 system. It is a knowledge-representation system (using LISP syntax) that makes use of a set of varied reasoning methods, altogether dubbed “commonsense.” A more recent formalism issued of Description Logic (DL)—viz. the Web Ontology Lan- guage (OWL2 )—has been adopted as a W3C recommendation. It encodes knowledge using a specific standardized (XML, RDF) syntax. Its constructs are given a model- theoretic semantics which is usually realized operationally using tableau3 -based rea- soning.4 The point is that OWL is clearly designed for a specific logic and reason- ing method. 
Saying that OWL is the most adequate interchange formalism for Knowl- edge Representation (KR) and automated reasoning (AR) is akin to saying that English is the best designed human language for facilitating information interchange among humans—notwithstanding the fact that it was simply imposed by the most recent per- vasive ruling power, just as Latin was Europe’s Lingua Franca for centuries. Thus, it is fair to ask one’s self a simple question: “Is there, indeed, a single most adequate knowledge representation and reasoning method that can be such a norm? ” 1 http://www.cyc.com/platform/opencyc 2 http://www.w3.org/TR/owl-features/ 3 http://en.wikipedia.org/wiki/Method_of_analytic_tableaux 4 Using of tableau methods is the case of the most prominent SW reasoner [6, 5, 7]. Systems using alternative reasoning methods must first translate the DL-based syntax of OWL into their own logic or RDF query processing. This may be costly [9] and/or incomplete [8]. 4 Hassan Aı̈t-Kaci I personally do not think so. In this regard, I share the general philosophy of Doug Lenat5 , Cyc’s designer—although not the haphazard approach he has chosen to follow.6 If one ponders what characterizes an ontology making up a knowledge base, some specific traits most commonly appear. For example, it is universally acknowledged that, rather than being a general set of arbitrary formal logical statements describing some generic properties of “the world,” a formal knowledge base is generally organized as a concept-oriented information structure. This is as important a change of perspective, just as object-oriented programming was with respect to traditional method-oriented programming. Thus, some notion of property “inheritance” among partially-ordered “concepts” (with an “is-a” relation) is a characteristic aspect of KR formalisms. In such a system, a concept has a straightforward semantics: its denotes of set of elements (its “instances”) and the “is-a” relation denotes set inclusion. Properties attached to a concept denote information pertaining to all instances of this concept. All properties verified by a concept are therefore inherited by all its subconcepts. Sharing this simple characteristic, formal KR formalisms have emerged from sym- bolic mathematics that offer means to reason with conceptual information, depending on mathematical apparatus formalizing inheritance and the nature of properties attached to concepts. In Description Logic7 , properties are called “roles” and denote binary re- lations among concepts. On the other hand, Formal Concept Analysis (FCA8 ) uses an algebraic approach whereby an “is-a” ordering is automatical derived from proposi- tional properties encoding the concepts that are attached to as bit vectors. A concept is associated an attribute with a boolean marker (1 or “true”) if it possesses it, and with a (0 or “false”) otherwise. The bit vectors are simply the rows of the “property ma- trix” relating concepts to their attributes. This simple and powerful method, originally proposed by Rudolf Wille, has a dual interpretation when matching attributes with con- cepts possessing them. Thus, dually, it views attributes also as partially ordered (as the columns of the binary matrix). An elegant Galois-connection ensues that enables sim- ple extraction of conceptual taxonomies (and their dual attribute-ordered taxonomies) from simple facts. 
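To make the Galois connection between concepts and their attributes concrete, here is a minimal Python sketch (not part of the original talk; the toy context, object names and function names are our own) that enumerates the formal concepts of a small boolean property matrix by closing object sets under the two derivation operators:

<pre>
from itertools import combinations

# Toy context: rows are objects, columns are boolean attributes (the "property matrix").
context = {
    "sparrow": {"flies", "has_feathers"},
    "penguin": {"has_feathers", "swims"},
    "trout":   {"swims"},
}
attributes = set().union(*context.values())

def common_attributes(objs):
    """Attributes shared by every object in objs (the ' operator on object sets)."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

def objects_having(attrs):
    """Objects possessing every attribute in attrs (the ' operator on attribute sets)."""
    return {o for o, a in context.items() if attrs <= a}

# A formal concept is a pair (extent, intent) with extent' = intent and intent' = extent.
# Naive enumeration over all object subsets: fine for a toy context, exponential in general.
concepts = set()
objs = list(context)
for r in range(len(objs) + 1):
    for combo in combinations(objs, r):
        extent = objects_having(common_attributes(set(combo)))  # closure of the object set
        intent = common_attributes(extent)
        concepts.add((frozenset(extent), frozenset(intent)))

for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), "<->", sorted(intent))
</pre>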
Variations such as Relational Concept Analysis (RCA9 ) offer more expressive, and thus more sophisticated, knowledge while preserving the essential alge- braic properties of FCA. It has also been shown how DL-based reasoning (e.g. OWL) can be enhanced with FCA.10 Yet another formalism for taxonomic attributed knowledge, which I will present in more detail in this presentation, is the Order-Sorted Feature (OSF) constraint for- malism. This approach proposes to see everything as an order-sorted labelled graph. 5 http://en.wikipedia.org/wiki/Douglas_Lenat 6 However, I may stand corrected in the future since knowledge is somehow fundamentally haphazard. My own view is that, even for dealing with a heterogenous world, I would rather favor mathematically formal representation and reasoning methods dealing with uncertainty and approximate reasoning, whether probabilistic, fuzzy, or dealing with inconsistency (e.g. rough sets, paraconsistency). 7 http://en.wikipedia.org/wiki/Description_logic 8 http://en.wikipedia.org/wiki/Formal_concept_analysis 9 http://www.hse.ru/data/2013/07/04/1286082694/ijcai_130803.pdf 10 http://ijcai-11.iiia.csic.es/files/proceedings/ T13-ijcai11Tutorial.pdf What Formalism for the Semantic Web? 5 Sorts are set-denoting and partially ordered with an inclusion-denoting “is-a” relation, and so form a conceptual taxonomy. Attributes, called “features,” are function-denoting symbols labelling directed edges between sort-labelled nodes. Such OSF graphs are a straightforward generalization of algebraic First-Order Terms (FOTs) as used in Logic Programming (LP) and Functional Programming (FP). Like FOTs, they form a lattice structure with OSF graph matching as the partial ordering, OSF graph unification as infimum (denoting set intersection), and OSF graph generalization as supremum.11 Both operations are very efficient. These lattice-theoretic properties are preserved when one endows a concept in a taxonomy with additional order-sorted relational and func- tional constraints (using logical conjunction for unification and disjunction for general- ization for the attached constraints). These constraints are inherited down the concep- tual taxonomy in such a way as to be incrementally enforceable as a concept becomes gradually refined. The OSF system has been the basis of Constraint Logic Programming for KR and ontological reasoning (viz. LIFE) [2, 1]. As importantly, OSF graph-constraint technology has been at work with great success in two essential areas of AI: NLP and Machine Learning: – it has been a major paradigm in the field of Natural Language Processing (NLP) for a long time; notably, in so-called “Head-driven Phrase Structure Grammar” (HPSG12 ) and Unification Grammar (UG13 ) technology [4]. This is indeed not sur- prising given the ease with which feature structure unification enables combining both syntactic and semantic information in a clean, declarative, and efficient way.14 – Similarly, while most of the attention in the OSF literature has been devoted to uni- fication, its dual—namely, generalization—is just as simple to use, and computes the most specific OSF term that subsumes two given terms [3]. This operation is central in Machine Learning and with it, OSF technology lends itself to be com- bined with popular Data Mining techniques such as Support Vector Machines using frequency or probabilistic information. 
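As a rough illustration of the unification operation described above, the following Python fragment (our own simplification, not Aït-Kaci's algorithm nor the LIFE implementation) unifies two sort-labelled feature terms: the sort of the result is the greatest lower bound in a small hand-written "is-a" hierarchy, and shared features are unified recursively; variables, coreference tags and cyclic feature graphs are deliberately ignored.

<pre>
# Minimal sketch of order-sorted feature (OSF) term unification (illustrative only).
# The sorts, features and hierarchy below are invented examples.

PARENTS = {"student": {"person"}, "employee": {"person"}, "person": {"top"},
           "cs_dept": {"dept"}, "dept": {"top"}, "top": set()}

def ancestors(sort):
    """Reflexive-transitive 'is-a' closure of a sort."""
    seen, todo = set(), [sort]
    while todo:
        s = todo.pop()
        if s not in seen:
            seen.add(s)
            todo.extend(PARENTS[s])
    return seen

def glb(s1, s2):
    """Greatest lower bound of two sorts, or None if they have no common subsort."""
    lower = [s for s in PARENTS if s1 in ancestors(s) and s2 in ancestors(s)]
    for cand in lower:                        # the glb is the lower bound above all others
        if all(cand in ancestors(o) for o in lower):
            return cand
    return None

def unify(t1, t2):
    """Unify two OSF terms of the form (sort, {feature: subterm}); None on failure."""
    sort = glb(t1[0], t2[0])
    if sort is None:
        return None                           # incompatible sorts: unification fails
    feats = {}
    for f in set(t1[1]) | set(t2[1]):
        if f in t1[1] and f in t2[1]:
            sub = unify(t1[1][f], t2[1][f])   # shared features are unified recursively
            if sub is None:
                return None
            feats[f] = sub
        else:
            feats[f] = t1[1].get(f) or t2[1].get(f)
    return (sort, feats)

x = ("person",  {"works_in": ("dept", {})})
y = ("student", {"works_in": ("cs_dept", {}), "age": ("top", {})})
print(unify(x, y))  # ('student', {'works_in': ('cs_dept', {}), 'age': ('top', {})})
</pre>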
In this presentation, I will give a rapid overview of the essential OSF formalism for knowledge representation along its reasoning method which is best formalized as order-sorted constraint-driven inference. I will also illustrate its operational efficiency and scalability in comparison with those of prominent DL-based reasoners used for the Semantic Web. The contribution of this talk to answering the question in its title is that the Semantic Web effort should not impose a priori putting all our eggs in one single (untested) basket. Rather, along with DL, other viable alternatives such as the FCA and OSF formalisms, and surely others, should be combined for realizing a truly semantic web. 11 This supremum operation, however, does not (always) denote set union—as for FOT subsump- tion, it is is not modular (and hence neither is it distributive). 12 http://en.wikipedia.org/wiki/Head-driven_phrase_structure_ grammar 13 http://www.cs.haifa.ac.il/˜shuly/malta-slides.pdf 14 http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.51.2021 6 Hassan Aı̈t-Kaci References 1. A ÏT-K ACI , H. Data models as constraint systems—a key to the Semantic Web. Con- straint Processing Letters 1 (November 2007), 33–88. online: http://cs.brown.edu/ people/pvh/CPL/Papers/v1/hak.pdf. 2. A ÏT-K ACI , H., AND P ODELSKI , A. Towards a meaning of LIFE. Journal of Logic Pro- gramming 16, 3-4 (1993), 195–234. online: http://hassan-ait-kaci.net/pdf/ meaningoflife.pdf. 3. A ÏT-K ACI , H., AND S ASAKI , Y. An axiomatic approach to feature term generalization. In Proceedings of European Cinference on Machine Learning (ECML 2001) (Freiburg, Ger- many, 2001), L. D. Raedt and P. Flach, Eds., LNAI 2167, Springer-Verlag, pp. 1–12. online: http://www.hassan-ait-kaci.net/pdf/ecml01.pdf. 4. C ARPENTER , B. Typed feature structures: A generalization of first-order terms. In Proceed- ings of the 1991 International Symposium on Logic Programming (Cambridge, MA, USA, 1991), V. Saraswat and K. Ueda, Eds., MIT Press, pp. 187–201. 5. M OTIK , B., S HEARER , R., AND H ORROCKS , I. Hypertableau reasoning for description logics. Journal of Artificial Intelligence Research 36, 1 (September 2009), 165–228. online: https://www.jair.org/media/2811/live-2811-4689-jair.pdf. 6. S HEARER , R., M OTIK , B., AND H ORROCKS , I. HermiT: A highly-efficient OWL rea- soner. In Proceedings of the 5th International Workshop on OWL Experiences and Direc- tions (Karlsruhe, Germany, October 2008), U. Sattler and C. Dolbear, Eds., OWLED’08, CEUR Workshop Proceedings. online: http://www.cs.ox.ac.uk/ian.horrocks/ Publications/download/2008/ShMH08b.pdf. 7. S IRIN , E., PARSIA , B., G RAU , B. C., K ALYANPUR , A., AND K ATZ , Y. Pellet: A practical OWL-DL reasoner. Journal of Web Semantics 5, 2 (June 2007), 51–53. This is a summary; full paper: online: http://pellet.owldl.com/papers/sirin05pellet.pdf. 8. S TOILOS , G., C UENCA G RAU , B., AND H ORROCKS , I. How incomplete is your seman- tic web reasoner? In Proceedings of the 24th National Conference on Artificial Intelli- gence (AAAI 10) (Atlanta, Georgia, USA, July 11–15, 2010), M. Fox and D. Poole, Eds., AAAI, AAAI Publications, pp. 1431–1436. online: http://www.cs.ox.ac.uk/ian. horrocks/Publications/download/2010/StCH10a.pdf. 9. T HOMAS , E., PAN , J. Z., AND R EN , Y. TrOWL: Tractable OWL 2 reasoning infrastruc- ture. In Proceedings of the 7th Extended Semantic Web Conference (Heraklion, Greece, May-June 2010), L. Aroyo, G. Antoniou, E. Hyvnen, A. ten Teije, H. Stuckenschmidt, L. Cabral, and T. 
Tudorache, Eds., ESWC’10, Springer-Verlag, pp. 431–435. online: http: //homepages.abdn.ac.uk/jeff.z.pan/pages/pub/TPR2010.pdf. Linguistic Data Mining with FCA Uta Priss ZeLL, Ostfalia University of Applied Sciences Wolfenbüttel, Germany www.upriss.org.uk The use of lattice theory for linguistic data mining applications in the widest sense has been independently suggested by different researchers. For example, Masterman (1956) suggests using a lattice-based thesaurus model for machine translation. Mooers (1958) describes a lattice-based information retrieval model which was included in the first edition of Salton’s (1968) influential textbook. Sladek (1975) models word fields with lattices. Dyvik (2004) generates lattices which represent mirrored semantic structures in a bilingual parallel corpus. These approaches were later translated into the language of Formal Concept Analysis (FCA) in order to provide a more unified framework and to generalise them for use with other applications (Priss (2005), Priss & Old (2005 and 2009)). Linguistic data mining can be subdivided into syntagmatic and paradigmatic approaches. Syntagmatic approaches exploit syntactic relationships. For exam- ple, Basili et al. (1997) describe how to learn semantic structures from the ex- ploration of syntactic verb-relationships using FCA. This was subsequently used in similar form by Cimiano (2003) for ontology construction, by Priss (2005) for semantic classification and by Stepanova (2009) for the acquisition of lexico- semantic knowledge from corpora. Paradigmatic relationships are semantic in nature and can, for example, be extracted from bilingual corpora, dictionaries and thesauri. FCA neighbourhood lattices are a suitable means of mining bilingual data sources (Priss & Old (2005 and 2007)) and monolingual data sources (Priss & Old (2004 and 2006)). Ex- perimental results for neighbourhood lattices have been computed for Roget’s Thesaurus, WordNet and Wikipedia data (Priss & Old 2006, 2010a and 2010b). Previous overviews of linguistic applications of FCA were presented by Priss (2005 and 2009). This presentation summarises previous results and provides an overview of more recent research developments in the area of linguistic data mining with FCA. References 1. Basili, R.; Pazienza, M.; Vindigni, M. (1997). Corpus-driven unsupervised learning of verb subcategorization frames. AI*IA-97. 2. Cimiano, P.; Staab, S.; Tane, J. (2003). Automatic Acquisition of Taxonomies from Text: FCA meets NLP. Proceedings of the ECML/PKDD Workshop on Adaptive Text Extraction and Mining, p. 10-17. 3. Dyvik, H. (2004). Translations as semantic mirrors: from parallel corpus to wordnet. Language and Computers, 49, 1, Rodopi, p. 311-326. 8 Uta Priss 4. Masterman, Margaret (1956). Potentialities of a Mechanical Thesaurus. MIT Con- ference on Mechanical Translation, CLRU Typescript. [Abstract]. In: Report on research: Cambridge Language Research Unit. Mechanical Translation 3, 2, p. 36. Full paper in: Masterman (2005). 5. Mooers, Calvin N. (1958). A mathematical theory of language symbols in retrieval. In: Proc. Int. Conf. Scientific Information, Washington D.C. 6. Priss, Uta; Old, L. John (2004). Modelling Lexical Databases with Formal Concept Analysis. Journal of Universal Computer Science, 10, 8, p. 967-984. 7. Priss, Uta (2005). Linguistic Applications of Formal Concept Analysis. In: Ganter; Stumme; Wille (eds.), Formal Concept Analysis, Foundations and Applications. Springer Verlag. LNAI 3626, p. 149-160. 8. Priss, Uta; Old, L. 
John (2005). Conceptual Exploration of Semantic Mirrors. In: Ganter; Godin (eds.), Formal Concept Analysis: Third International Conference, ICFCA 2005, Springer Verlag, LNCS 3403, p. 21-32. 9. Priss, Uta; Old, L. John (2006). An application of relation algebra to lexical databases. In: Schaerfe, Hitzler, Ohrstrom (eds.), Conceptual Structures: Inspiration and Application, Proceedings of the 14th International Conference on Conceptual Structures, ICCS’06, Springer Verlag, LNAI 4068, p. 388-400. 10. Priss, Uta; Old, L. John (2007). Bilingual Word Association Networks. In: Priss, Polovina, Hill (eds.), Proceedings of the 15th International Conference on Concep- tual Structures, ICCS’07, Springer Verlag, LNAI 4604, p. 310-320. 11. Priss, Uta (2009). Formal Concept Analysis as a Tool for Linguistic Data Explo- ration. In: Hitzler, Pascal; Scharfe, Henrik (eds.), Conceptual Structures in Practice, Chapman & Hall/CRC studies in informatics series, p. 177-198. 12. Priss, Uta; Old, L. John (2009). Revisiting the Potentialities of a Mechanical The- saurus. In: Ferre; Rudolph (eds.), Proceedings of the 7th International Conference on Formal Concept Analysis, ICFCA’09, Springer Verlag. 13. Priss, Uta; Old, L. John (2010a). Concept Neighbourhoods in Knowledge Organi- sation Systems. In: Gnoli; Mazzocchi (eds.), Paradigms and conceptual systems in knowledge organization. Proceedings of the 11th International ISKO Conference, p. 165-170. 14. Priss, Uta; Old, L. John (2010b). Concept Neighbourhoods in Lexical Databases. In: Kwuida; Sertkaya (eds.), Proceedings of the 8th International Conference on Formal Concept Analysis, ICFCA’10, Springer Verlag, LNCS 5986, p. 283-295. 15. Salton, Gerard (1968). Automatic Information Organization and Retrieval. McGraw-Hill, New York. 16. Stepanova, Nadezhda A. (2009). Automatic acquisition of lexico-semantic knowl- edge from corpora. SENSE’09 Workshop. Available at http://ceur-ws.org/Vol-476/. 17. Sladek, A. (1975). Wortfelder in Verbänden. Gunter Narr Verlag, Tübingen. Shortest CNF Representations of Pure Horn Functions and their Connection to Implicational Bases Ondrej Cepek Charles University, Prague Pure Horn CNFs, directed hypergraphs, and closure systems are objects stud- ied in different subareas of theoretical computer science. Nevertheless, these three objects are in some sense isomorphic. Thus also properties derived for one of these objects can be usually translated in some way for the other two. In this talk we will concentrate on the problem of finding a shortest CNF representation of a given pure Horn function. This is a problem with many practical applications in artificial intelligence (knowledge compression) and other areas of computer sci- ence (e.g. relational data bases). In this talk we survey complexity results known for this problem and then concentrate on the relationships between CNF rep- resentations of Horn functions and certain sets of implicates of these functions, called essential sets of implicates. The definition of essential sets is based on the properties of resolution. Essential sets can be shown to fulfill an interesting or- thogonality property: every CNF representation and every (nonempty) essential set must intersect. This property leads to non-trivial lower bounds on the CNF size, which are sometimes tight and sometimes have a gap. We will try to derive connections to the known properties of minimal implicational bases. The talk is based on joint research with Endre Boros, Alex Kogan, Petr Kucera, and Petr Savicky. 
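The correspondence between pure Horn CNFs and implicational bases that the talk builds on can be illustrated with a small forward-chaining closure; the clause set and variable names below are invented for the example, and the sketch deliberately ignores the representation-size and minimisation questions the talk addresses.

<pre>
# Each pure Horn clause (not a  or  not b  or  c) is read as the implication {a, b} -> c.
# The models of the CNF are exactly the sets closed under these implications,
# so computing the closure of a variable set is plain forward chaining.

implications = [({"a", "b"}, "c"),   # a and b -> c
                ({"c"}, "d"),        # c -> d
                ({"d", "e"}, "f")]   # d and e -> f

def closure(variables):
    """Smallest superset of `variables` satisfying every implication."""
    closed = set(variables)
    changed = True
    while changed:
        changed = False
        for body, head in implications:
            if body <= closed and head not in closed:
                closed.add(head)
                changed = True
    return closed

print(sorted(closure({"a", "b"})))   # ['a', 'b', 'c', 'd']
print(sorted(closure({"e"})))        # ['e']
</pre>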
Learning Model Transformation Patterns using Graph Generalization Hajer Saada, Marianne Huchard, Michel Liquière, and Clémentine Nebut LIRMM, Université de Montpellier 2 et CNRS, Montpellier, France, first.last@lirmm.fr Abstract. In Model Driven Engineering (MDE), a Model Transforma- tion is a specialized program, often composed of a set of rules to transform models. The Model Transformation By Example (MTBE) approach aims to assist the developer by learning model transformations from source and target model examples.In a previous work, we proposed an approach which takes as input a fragmented source model and a target model, and produces a set of fragment pairs that presents the many-to-many match- ing links between the two models. In this paper, we propose to mine model transformation patterns (that can be later transformed in trans- formation rules) from the obtained matching links. We encode our models into labeled graphs that are then classified using the GRAAL approach to get meaningful common subgraphs. New transformation patterns are then found from the classification of the matching links based on their graph ends. We evaluate the feasibility of our approach on two represen- tative small transformation examples. 1 Introduction MDE is a subfield of software engineering that relies on models as a central arti- fact for the software development cycle. Models can be manually or automatically manipulated using model transformations. A model transformation is a program, often composed of a set of transformation rules, that takes as input a model and produces as output another model. The models conform to meta-models, as programs conform to the programming language grammar. If we would like to transform any Java program into any C++ program, we would express this trans- formation at the level of their grammars. In MDE, model transformations are similarly expressed in terms of the meta-models. Designing a model transforma- tion is thus a delicate issue, because the developer has to master the specialized language in which the transformation is written, the meta-modeling activity, and the subtleties of the source and the target meta-models. In order to as- sist the developers, the MTBE approach follows the track of the "Programming By Example" approach [6] and proposes to use an initial set of transformation examples from which the model transformation is partly learnt. The first step of the MTBE approach consists in extracting matching links, from which the second step learns transformation rules. Several approaches [1,15,12] are pro- posed for the second step, but they derive element-to-element (one-to-one) rules c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 11–23, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 12 Hajer Saada, Marianne Huchard, Michel Liquière and Clémentine Nebut that mainly express how a source model element is transformed into a target model element. In this paper, we propose to learn transformation patterns of type fragment-to-fragment (many-to-many) using the output of a previous work [13] that consists in generating matching links between source and target model fragments. We encode our models and model fragments as labeled graphs. These graphs are classified through a lattice using a graph mining approach (GRAAL) to get meaningful common subgraphs. The matching links are then classified using Formal Concept Analysis, the lattice of source graphs and the lattice of target graphs. 
New transformation patterns are discovered from these classifica- tions that can help the designer of the model transformation. We evaluate the feasibility of our approach on two representative transformation examples. The next Section 2 gives an overview of our approach. Section 3 presents the transformation pattern mining approach and Section 4 evaluates its feasibility. Section 5 presents the related work, and we conclude in Section 6. 2 Approach Overview In Model-Driven Engineering, model transformation are programs that trans- form an input source model into an output target model. A classical model trans- formation (UML2REL) transforms a UML model into a RELational model. Such transformation programs are often written with declarative or semi-declarative languages and composed of a set of transformation rules, defined at the meta- model level. The meta-model defines the concepts and the relations that are used (and instantiated) to compose the models. For example, the UML meta-model Source GRAAL Source Source Model Graph Graphs Fragments Lattice Matching FCA Matching Link Transformation Target GRAAL Target Link Target Formal Patterns Model Graph Lattice Graphs Context Fragments Lattice Matching Links Fig. 1. Process overview contains the concept of Class which owns Attributes. This can used to derive a UML model composed of a class P erson owning the attribute N ame. In the UML2REL example, a very simple transformation pattern would be: a UML class owning an attribute is transformed into a relational table owning a column. In this paper, our objective is to learn such transformation patterns that express that a pattern associating entities of the source meta-model (e.g. a UML class owning an attribute) is transformed into a pattern associating entities of the target meta-model (e.g. a relational table owning a column). Fig. 1 provides an overview of our process. Let us consider that we want to learn rules for 2 Learning Model Transformation Patterns using Graph Generalization 13 transforming UML models to relational models. Our input data (see Fig. 2) are composed of: fragmented source models (a UML source fragment is given in Fig. 3(a)); fragmented target models (a relational target fragment is given in Fig. 3(b)); and matching links between fragments established by experts or by an automatic matching technique. For example a matching link (L1) is established in Fig. 2 between the UML source fragment of Fig. 3(a) and the relational target fragment of Fig. 3(b). SG0 L0 Person name TG2 TG0 Client 1 ReservationRequest Client clientNumber reservationNumber clientNumber name ∞ clientNumber date address address SG2 client 1..1 TG1 reservation L2 0..n ReservationRequest L1 SG1 Fig. 2. Three matching links between fragmented UML and relational models (this figure is more readable in a coloured version) Matching links established by experts or by automatic methods can be used to form a set of model transformation patterns. For example, the L2 matching link gives rise to a transformation pattern which indicates that a UML class (with an attribute) with its super-class (with an attribute) is transformed into a unique table with two columns, one being inherited. Nevertheless, matching links often correspond to patterns that combine several simpler transformations or are triggered from domain knowledge. Besides, they may contain minor errors (such as a few additional or missing elements, for example, column date of Table ReservationRequest has in fact no equivalent in Class ReservationRequest). 
Moreover, what interests us is beyond the model domain. We do not want to learn that Class Client is transformed into Table Client, but rather that a UML class is usually transformed into a table. Our output is composed of a set of model transformation patterns. Some can directly be inferred from initial matching links (as evoked previously), and some will be found thanks to graph generalization and matching link classification. From our simple example, we want to extract the model transformation pattern presented in Figure 4, whose premise and conclusion patterns do not appear as such in the initial set of matching links (,→ means "is transformed into"). 3 14 Hajer Saada, Marianne Huchard, Michel Liquière and Clémentine Nebut Person name Client 1 ReservationRequest clientNumber requestNumber reserve name ∞ date Client client reservation ReservationRequest clientNumber clientNumber 1,1 0,n (a) UML source fragment (b) Relational target fragment Fig. 3. An example of UML and relational models ,→ Fig. 4. Transformation pattern: a class specializing a class with an attribute (in UML model) is transformed into a table with an inherited column (in relational model). 3 Model Transformation Pattern Generation From model fragments to graphs For our example, the source meta-model is inspired by a tiny part of the UML metamodel (see Figure 5(a)), while the target meta-model has its roots in a simplified relational data-base meta-model (see Fig. 5(c)). The models often are represented in a visual syntax (as shown in Fig. 3(a) and Fig. 3(b)) for readability reasons. Here we use their representation as instance diagrams of their meta-model (using the UML instance diagram syntax). For example, the UML model of Fig. 3(a) is shown as an instantiation of its meta-model in Fig. 5(b), where each object (in a rectangular box) is described by its meta-class in the meta-model, valued attributes and roles conforming to the attributes and associations from the meta-model: e.g. person and client are explicit instances of Class; client:Class has a link towards person labeled by the role specializes; client:Property has the attribute lowerBound (1). To extract expressive transformation patterns, we transform our models using their instance diagram syntax, into simpler graphs which have labeled vertices. We limited ourselves to locally injective labelled graphs. A locally injective graph is a labeled graph such that all vertices in the neighbor of a given vertex have different labels. This is not so restrictive in our case, because the fragments identified by the experts rarely include similar neighborhood for an entity. Here are the rules that we use in the transformation from simplified UML instance diagrams to labeled graphs. We associate a labeled node to Objects, Roles, At- tributes, Attribute values. Instance diagram of Figure 5(b) and corresponding labeled graph from 6(a) are used to illustrate the transformation: person:Class object is transformed into node 1 labeled class_1 and one of the attribute value 1 is transformed into node 13 labeled one_13. Edges come from the following situations: an object has an attribute; an attribute has a value; an object has 4 Learning Model Transformation Patterns using Graph Generalization 15 a role (is source of a role); an object is the target of a role. For example, for the property which has an attribute lowerBound (equal to zero), there is a corresponding edge (property_17, lowerBound_18). 
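A schematic rendering of this encoding, under our own (hypothetical) data-structure choices, could look as follows; node labels carry a numeric suffix, as in class_1 or one_13 above.

<pre>
# Sketch of the model-to-graph encoding described above: every object, role,
# attribute and attribute value becomes a labelled node, and edges follow the
# four listed situations. The excerpt mimics "client:Class specializes person:Class"
# and a property whose lowerBound attribute has value 1.

counter = 0
nodes, edges = {}, []              # node id -> label, list of (source id, target id)

def add_node(label):
    global counter
    counter += 1
    nodes[counter] = f"{label}_{counter}"
    return counter

person = add_node("class")         # person : Class
client = add_node("class")         # client : Class
spec   = add_node("specializes")   # role node
prop   = add_node("property")      # client : Property
lower  = add_node("lowerBound")    # attribute node
one    = add_node("one")           # attribute value node (value 1)

edges += [(client, spec), (spec, person)]   # object is source / target of a role
edges += [(prop, lower), (lower, one)]      # object has attribute; attribute has value

for s, t in edges:
    print(nodes[s], "->", nodes[t])
</pre>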
person : Class name : Property ownsAttribute specializes Class Property reservationRequest ownsAttribute lowerBound : Class upperBound hasType client : Class hasType specializes hasType reserve: ownedEnd {Ordered} Association ownsAttributeIdentifier ownedEnd 2 ownsAttributeIdentifier 1 Association ownedEnd 1 reservation: PropertyIdentifier clientNumber : Property PropertyIdentifier client : Property lowerBound = 0 lowerBound = 1 upperBound = n upperBound = 1 (a) UML metamodel (b) UML model of Fig 3(a) in instance dia- gram syntax Table Property Column inheritedProperty reservationRequest : client : Table Table 1 Property inheritedProperty FKey PKey clientNumber: clientNumber : date : Column name : Column FKey PKey hasSameName hasSameName requestNumber : PKey (c) Relational metamodel (d) Relational model of Fig 3(b) in instance diagram syntax Fig. 5. Source/target metamodel and model, UML (upper par), relational (lower part) Classification of graphs (GRAAL approach) After the previous step, we obtain a set of source graphs, and a set of target graphs. We illustrate the re- mainder of this section by using the three source graphs of Fig. 6, the three target graphs of Fig. 7, and the matching links (Source graph i, Target graph i), for i ∈ {0, 1, 2}. To get meaningful common subgraphs (on which new transfor- mation patterns will be discovered), we use the graph mining approach proposed in [7] and its derived GRAAL tool. In this approach, examples are described by a description language L provided with two operations: an ≤ specialization op- eration and an ⊗ operation which builds the least general generalization of two descriptions. A generalization of the Norris algorithm [11] builds the Galois lat- tice. Several description languages are implemented in GRAAL, and especially a description based on locally injective graphs. ⊗ operation is the reduction of the tensor product of graphs, also called the Kronecker product [14]. We indepen- dently classify source graphs and target graphs. Classification of source graphs 5 16 Hajer Saada, Marianne Huchard, Michel Liquière and Clémentine Nebut produces the lattice of Fig. 8(a). For example, in this lattice, Concept sfc012 has for intent a subgraph of source graphs 0, 1 and 2 representing a class which specializes a class which owns an attribute. Classification of target graphs pro- duces the lattice of Fig. 8(b). In this lattice, Concept tfc012 has for intent a subgraph where a table has an inherited property. (a) Source graph 1 (b) Source graph 0 (c) Source graph 2 Fig. 6. Source graphs Classification of transformation links In the previous section, we have shown how Galois lattices can be computed on the labeled graphs that represent our model fragments. Now a matching link is described by a pair composed of a source fragment (whose corresponding graph is in the extent of some concepts in the source graph lattice) and a target fragment (whose corresponding graph is in the extent of some concepts in the source graph lattice). This is described in a formal context, where objects are the matching links and attributes are the concepts of the two lattices (source graph lattice and target graph lattice). In this formal context (presented in Table 11(a)), a matching link is associated with the concepts having respectively its source graph and its target graph in their extent. This means that the matching link is described by the graph of its source fragment and by the generalizations of this graph in the lattice. 
This is the same for the graphs of the target fragments. For example, matching link L0, connecting source fragment 0 to target fragment 0, is associated in the formal context to concepts sfc01, sfc012, tfc01, tfc012.
Fig. 7. Target graphs: (a) Target graph 1, (b) Target graph 0, (c) Target graph 2.
Fig. 8. Graph lattices: (a) Source graph lattice, (b) Target graph lattice. Only concept extents are represented in the figure; intents of concepts are shown in Fig. 9 and 10.
Fig. 9. Source graph lattice concepts: (a) Concept sfc2, (b) Concept sfc01, (c) Concept sfc012. Concept sfc1 (not represented) has Source Graph 1 of Fig. 6 as intent.
Fig. 10. Target graph lattice concepts: (a) Concept tfc012, (b) Concept tfc12, (c) Concept tfc01. Concepts tfc1 and tfc2 (not represented) have resp. Target Graph 1 and Target Graph 2 from Fig. 7 as their intents.
We denote by sfc_{x1...xn} (resp. tfc_{x1...xn}) the vertex [x1,...,xn] of the source (resp. target) graph lattice. The concept lattice associated with the matching link formal context of Fig. 11(a) is shown in Fig. 11(b). In this representation (obtained with RCAexplore, http://dolques.free.fr/rcaexplore.php) each box describes a concept: the first compartment informs about the name of the concept, the second shows the simplified intent (here concepts from the source fragment lattice and the target fragment lattice) and the third one shows the simplified extent (here matching links). Concept_MatchingLinksFca_4 extent is composed of the links L0 and L1, while the intent is composed of source graph concepts sfc01, sfc012 and target graph concepts tfc01, tfc012.
Fig. 11. Matching link formal context and corresponding concept lattice: (a) Matching Link Formal Context MLFC, (b) Matching Link Lattice. The formal context of Fig. 11(a) is:
       sfc1  sfc2  sfc01  sfc012  tfc1  tfc2  tfc01  tfc12  tfc012
  L0    .     .     x      x       .     .     x      .      x
  L1    x     .     x      x       x     .     x      x      x
  L2    .     x     .      x       .     x     .      x      x
Model transformation pattern mining. The last step of the process consists in extracting model transformation patterns from the matching link lattice. This has close connections to the problem of extracting implication rules in a concept lattice, but using only pairs of source and target graph concepts. The more reliable transformation patterns are given when using a source graph and a target graph in the same simplified intent of a concept, because this corresponds to the fact that the source graph is always present when the target graph is present too (and reversely). For example, from Concept_MatchingLinksFca_0, we obtain the following transformation pattern: graph of sfc012 intent ,→ graph of tfc012 intent. This pattern expresses a new transformation pattern (new in the sense that it does not directly come from a matching link): a UML model where a class Cd specializes another class Cm which owns an attribute a is transformed into a relational model where a table T owns an (inherited) column c. Due to the simplicity of our illustrative example, the other reliable patterns obtained from source and target graphs from the same simplified intent just correspond to matching links. Obtaining other, less reliable patterns relies on the fact that if a source graph and a target graph are not in the same simplified intent, but the concept Cs which introduces the source graph is below the concept Ct which introduces the target graph, then we infer the following transformation pattern: part of graph of Cs intent ,→ graph of Ct intent. For example, as sfc1 appears below tfc12, we can deduce that, when the input of the transformation contains the graph of intent of sfc1, the output contains the graph of intent of tfc12. These patterns are less reliable, because the source graph may contain many things that have nothing to do with the target graph (compare sfc1 and tfc12 to see this phenomenon). However, experts can have a look on these patterns to find several (concurrent) transformation patterns when several source model fragments are transformed into a same target model fragment. We have a symmetric situation when a source graph and a target graph are not in the same simplified intent, but the concept Ct which introduces the target graph is below the concept Cs which introduces the source graph.
4 Feasibility study
We evaluated the feasibility of the approach on two different realistic transformation examples: (1) UML class diagram to relational schema model that contains
(3) Symmetrically, the transformation patterns (T Pnpartt ) coming from the graphs GS and GT , such that GT is in simplified intent of a concept Ct which is a subconcept of the concept Cs which has GS in its simplified intent and all concepts greater than Ct and lower than Cs have an empty simplified intent. In addition, we consider only the case where simplified intent of Cs contains only source graphs or (inclusively) simplified intent of Ct contains only target graphs. Table 1. Results. Left-hand side: sets cardinals. Right-hand side: precision metrics #T Pl #T Pn #T Pnparts #T Pnpartt PT Pl PT Pn PT Pnparts PT Pnpart t Ill. ex. 2 2 2 1 Ill. ex. 1 1 0.72 0.72 U2S 1 5 3 0 U2S 1 0.75 0.78 - U2E 2 2 1 1 U2E 1 1 0.73 0.95 We also evaluate each extracted transformation pattern using a precision metric. Precision here is the number of elements in the source and target graphs that participate correctly to the transformation (according to a human expert) divided by the number of elements in the graphs. We then associate a precision measure to a set of transformation patterns, which is the average of the precisions of its elements (See right-hand side of Table 1). The results show that we learn transformation patterns that correspond to the initial mapping links. These patterns are relevant and efficient (precision = 1). 17 new transformation patterns are also learned from the three used examples. These patterns seems also relevant, with a precision average than 0.83. 10 Learning Model Transformation Patterns using Graph Generalization 21 5 Related Work Several approaches have been proposed to mine model transformation. The MTBE approach consists in learning model transformation from examples. An example is composed of a source model, the corresponding transformed model, and matching links between the two models. In [1,15], an alignment between source and target models is manually created to derive transformation rules. The approach of [5] consists in using the analogy to search for each source model its corresponding target model without generating rules. In a previous work [12], we use Relational Concept Analysis (RCA) to derive commonalities between the source and target meta-models, models and transformation links to learn executable transformation rules. The approach based on RCA builds trans- formation patterns that indicate how a model element, in a specific context, is transformed into a target element with its own specific context. This approach has many advantages for the case when the matching link type is one-to-one, but it is not able to capture the cases where a set of model elements is globally transformed into another set of model elements (matching link type is many-to- many). In this paper, we investigate graph mining approaches, to go beyond the limitations of our previous work. In the current context of MDE, transformation examples are not very large (they are manually designed), thus we do not expect scalability problems. Compared with a solution where we would build a lattice on graphs containing elements from both source and target models coming from matching links, the solution we choose separately classifies source graphs and target graphs. This is because source graphs and target graphs could come from the same meta-model (or from meta-models with common concepts) and it has no meaning in our context to generalize a source graph and a target graph to- gether. We also think that the result is more readable, even in the case of disjoint meta-models. 
Our problem has close connections with the pattern structure approach [4] when the pattern structure is given by sets of graphs that have labeled vertices. Graph mining approaches [2,10] aim at extracting repeated subgraphs in a set of graphs. They use a partial order on graphs which usually relies on morphism or on injective morphism, also known as subgraph isomorphism [9]. In the general case, these two morphism operations have an exponential complexity. In this paper, we rely on graph mining to classify independently the origins and the destinations of matching links and to infer from this, a classification of matching links, that is then used to extract transformation patterns. 6 Conclusion We have proposed an approach to assist a designer in her/his task of writing a declarative model transformation. The approach relies on model transformation examples composed of source and target model fragments and matching links. Models and their fragments are represented by graphs with labelled vertices that are classified. This classification is in turn, used for classifying the matching links. 11 22 Hajer Saada, Marianne Huchard, Michel Liquière and Clémentine Nebut Finally, the mined model transformation patterns express how a source model fragment is transformed into a target model fragment. Future directions of this work include extending the evaluation to other kinds of source and target meta- models, and define a notion of support for the patterns. We also would like to explore the different kinds of graph mining approaches, in particular to go beyond the limitation of using locally injective graphs. Finally, we plan to apply our approach [12] to transform the obtained patterns into operational rules. References 1. Balogh, Z., Varro, D.: Model Transformation by Example Using Inductive Logic Programming. Software and Systems Modeling 8(3), 347–364 (2009) 2. Cook, D.J., Holder, L.B.: Mining Graph Data. John Wiley & Sons (2006) 3. Fabro, M.D.D., Valduriez, P.: Towards the efficient development of model transfor- mations using model weaving and matching transformations. Software and System Modeling 8(3), 305–324 (2009) 4. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: Proc. of ICCS’01. pp. 129–142 (2001) 5. Kessentini, M., Sahraoui, H., Boukadoum, M.: Model transformation as an opti- mization problem. In: MODELS’08, LNCS 5301. pp. 159–173. Springer (2008) 6. Lieberman, H. (ed.): Your Wish Is My Command: Programming by Example. Morgan Kaufmann Publishers (2001) 7. Liquiere, M., Sallantin, J.: Structural machine learning with Galois lattice and Graphs. In: Proc. of ICML’98. pp. 305–313 (1998) 8. Lopes, D., Hammoudi, S., Abdelouahab, Z.: Schema matching in the context of model driven engineering: From theory to practice. In: Advances in Systems, Com- puting Sciences and Software Engineering. pp. 219–227. Springer (2006) 9. Mugnier, M.L.: On generalization/specialization for conceptual graphs. J. Exp. Theor. Artif. Intell. 7(3), 325–344 (1995) 10. Nijssen, S., Kok, J.N.: The Gaston Tool for Frequent Subgraph Mining. Electr. Notes Theor. Comput. Sci. 127(1), 77–87 (2005) 11. Norris, E.: An algorithm for computing the maximal rectangles in a binary relation. Revue Roumaine Math. Pures et Appl. XXIII(2), 243–250 (1978) 12. Saada, H., Dolques, X., Huchard, M., Nebut, C., Sahraoui, H.A.: Generation of op- erational transformation rules from examples of model transformations. In: MoD- ELS. pp. 546–561 (2012) 13. 
Saada, H., Huchard, M., Nebut, C., Sahraoui, H.A.: Model matching for model transformation - a meta-heuristic approach. In: Proc. of MODELSWARD. pp. 174–181 (2014) 14. Weichsel, P.M.: The Kronecker product of graphs. Proceedings of the American Mathematical Society 13(1), 47–52 (1962) 15. Wimmer, M., Strommer, M., Kargl, H., Kramler, G.: Towards model transforma- tion generation by-example. In: Proc. of HICSS ’07. p. 285b (2007) 12 Interaction Challenges for the Dynamic Construction of Partially-Ordered Sets Tim Pattison and Aaron Ceglar {tim.pattison,aaron.ceglar}@defence.gov.au Defence Science & Technology Organisation West Ave, Edinburgh South Australia 5111 Abstract. We describe a technique for user interaction with the in- terim results of Formal Concept Analysis which we hypothesise will ex- pedite user comprehension of the resultant concept lattice. Given any algorithm which enumerates the concepts of a formal context, this tech- nique incrementally updates the set of formal concepts generated so far, the transitive reduction of the ordering relation between them, and the corresponding labelled Hasse diagram. User interaction with this Hasse diagram should prioritise the generation of missing concepts relevant to the user’s selection. We briefly describe a prototype implementation of this technique, including the modification of a concept enumeration al- gorithm to respond to such prioritisation, and the incremental updating of both the transitive reduction and labelled Hasse diagram. 1 Introduction Formal Concept Analysis (FCA) takes as input a formal context consisting of a set of attributes, a set of objects, and a binary relation indicating which objects have which attributes. It produces a partially-ordered set, or poset, of formal concepts, the size of which is, in the worst case, exponential in the number of objects and attributes in the formal context [1]. The computational tasks of enu- merating the set of formal concepts, and of calculating the transitive reduction of the ordering relation amongst them, therefore scale poorly with the size of the formal context. These steps are required to determine the vertices and arcs of the directed acyclic graph whose drawing is known as the Hasse diagram of the partial order. The layout of this layered graph prior to its presentation to the user is also computationally intensive [2]. For contexts of even moderate size, there is therefore considerable delay between user initiation of the process of FCA and presentation of its results to the user. A number of algorithms exist which efficiently enumerate the formal concepts of a formal context [3–6]. In this paper, we describe an approach which incre- mentally updates and presents the partial order amongst the formal concepts generated so far. In particular, it: incrementally updates the transitive reduc- tion of the interim partial order as each new concept is generated; incrementally updates the layout of the Hasse diagram; and animates the resultant changes to c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 23–35, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 24 Tim Pattison and Aaron Ceglar the Hasse diagram to assist the user in maintaining their mental model. This approach enables user exploration and interrogation of the interim partial order in order to expedite their comprehension of the resultant complete lattice of concepts. 
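As a rough illustration of the incremental step described above (one new element arriving in the interim partial order, and the transitive reduction being repaired locally), here is a minimal Python sketch. It assumes an explicit order test leq and a covers dictionary holding the Hasse-diagram edges; it is a naive textbook insertion rather than the authors' algorithm, and it ignores layout and animation entirely.

    def add_element(elements, covers, leq, x):
        """Insert a new element x into a finite poset and repair the cover edges.

        elements : set of elements already present
        covers   : dict mapping each element to the set of its upper covers
        leq      : order test, leq(a, b) meaning a <= b
        """
        above = {y for y in elements if leq(x, y)}   # elements greater than or equal to x
        below = {y for y in elements if leq(y, x)}   # elements less than or equal to x
        # upper covers of x: minimal elements above x; lower covers: maximal elements below x
        upper = {y for y in above if not any(z != y and leq(z, y) for z in above)}
        lower = {y for y in below if not any(z != y and leq(y, z) for z in below)}

        elements.add(x)
        covers[x] = set(upper)
        for u in lower:
            covers[u] -= upper   # no longer cover edges: x now lies strictly between
            covers[u].add(x)
        return upper, lower

Each insertion needs at most a quadratic number of order tests in this naive form; the point is that such a local repair, unlike a full recomputation, can keep pace with a concept enumeration algorithm.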
The approach applies equally to any other partial order, the enumeration of whose elements is computationally intensive. We also describe how this interaction can prioritise the generation and display of those missing concepts which are most relevant to the user's current exploratory focus. By addressing the scalability challenge of visual analytics [7], this user guidance of computationally intensive FCA algorithms [8] facilitates the required "human-information discourse".

1.1 Previous work

Incremental algorithms exist for updating the set of formal concepts and the transitive reduction of the ordering relation following the addition of a new object to the formal context [9–11]. A new object can give rise to multiple additional concepts which must be inserted in the existing complete lattice to produce an updated lattice which is also complete. In contrast, the technique described in this paper involves the addition of a single element at a time to a partially ordered set which is not in general a complete lattice.

Ceglar and Pattison [8] have argued that user guidance of the FCA process could allow the satisfaction of the user's requirements with a smaller lattice, and consequently in less time, than standard FCA algorithms. They described a prototype tool which facilitates interactive user guidance and implements an efficient FCA algorithm which they have modified to respond to that user guidance. The user interaction challenges identified by that work are described and addressed in this paper.

2 Interacting with a Hasse diagram

2.1 The Hasse diagram

A finite poset ⟨P;

Episodes are labeled directed acyclic graphs (DAGs). An episode G is a triple (V, E, λ), where V is the set of vertices, E is the set of directed edges, and λ is the labeling function from V and E to the set of labels, that is, to E. Several classes of episodes have been studied since episode mining was first introduced by Mannila et al. [11]. We follow the subclasses of episodes studied by Katoh et al. [7]. An example of an episode is illustrated in Figure 1.

Fig. 1. An example of an episode studied in episode mining in [11] and [7].

In designing pattern mining algorithms, we need 1) a search space of patterns together with a partial order for enumerating patterns, and 2) an interestingness measure to evaluate them. For episode mining, we often adopt occurrences of episodes defined with windows.

Definition 1 (Windows). For a sequence S = ⟨S1, …, Sn⟩, a window W of S is a contiguous subsequence ⟨Si, …, S_{i+w−1}⟩ of length w, called the width, for some index i (−w + 1 ≤ i ≤ n) of S and a positive integer w.

Definition 2 (Embedding of Episodes). Let G = (V, E, λ) be an episode, and W = ⟨S1, …, Sw⟩ be a window of width w. We say that G occurs in W if there exists a mapping h : V → {1, …, w} satisfying 1) for all v ∈ V, λ(v) ∈ S_{h(v)}, and 2) for all (u, v) ∈ E with u ≠ v, it holds that h(u) < h(v). The map h is called an embedding of G into W, and we denote it by G ⊑ W. For an input event sequence S and an episode G, we say that G occurs at position i of S if G ⊑ Wi, where Wi = ⟨Si, …, S_{i+w−1}⟩ is the i-th window of width w in S. We then call the index i an occurrence of G in S. The domain of the occurrences is given by W_{S,w} = {i | −w + 1 ≤ i ≤ n}. In addition, W_{S,w}(G) is the occurrence window list of an episode G, defined by {−w + 1 ≤ i ≤ n | G ⊑ Wi}.
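As a concrete illustration of Definitions 1 and 2 (and of the frequency count defined next) for the simplest class, serial episodes, the following Python sketch counts the windows of width w that contain an episode. The encoding of S as a list of sets of event labels and of a serial episode as a list of labels is an assumption made here for illustration; the general DAG case would require a full embedding search.

    def occurs_serial(episode, window):
        """Check whether a serial episode a1 -> a2 -> ... -> am occurs in a window:
        its events must appear in this order, each in a strictly later event set."""
        pos = 0
        for event in episode:
            while pos < len(window) and event not in window[pos]:
                pos += 1
            if pos == len(window):
                return False
            pos += 1
        return True

    def frequency(episode, sequence, w):
        """Number of width-w windows of S containing the serial episode, with the
        window index i ranging over -w+1 .. n as in Definition 1."""
        n = len(sequence)
        count = 0
        for i in range(-w + 1, n + 1):
            window = [sequence[p - 1] for p in range(i, i + w) if 1 <= p <= n]
            if occurs_serial(episode, window):
                count += 1
        return count

    # frequency(["A", "B"], [{"A"}, {"B", "C"}, {"E"}], 2) == 1: only the window
    # covering positions 1 and 2 contains A followed by B.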
Then we can define an interestingness measure, the frequency of episodes.

Definition 3 (Frequency of Episodes). The frequency of an episode G in S and w, denoted by freq_{S,w}(G), is defined as the number of windows of width w containing G. That is, freq_{S,w}(G) = |W_{S,w}(G)|. For a threshold θ ≥ 1, a width w and an input event sequence S, if freq_{S,w}(G) ≥ θ, then G is called θ-frequent on S.

The frequent episode mining problem is defined as follows: Let P be a class of episodes. Given an input event sequence S, a width w ≥ 1, and a frequency threshold θ ≥ 1, the problem is to find all θ-frequent episodes G belonging to the class P. The simplest strategy for finding all θ-frequent episodes is traversing P by using the anti-monotonicity of the frequency count freq(·). For details, we refer to both [7] and [11].

For our examples of classes, we introduce m-serial episodes and diamond episodes. An m-serial episode over E is a sequence of events of the form a1 ↦ a2 ↦ ⋯ ↦ am. A diamond episode over E is either 1) a 1-serial episode e ∈ E or 2) a proper diamond episode represented by a triple Q = ⟨a, X, b⟩ ∈ E × 2^E × E, where a, b are events and X ⊆ E is an event set occurring after a and before b. For short, we write a diamond episode as a ↦ X ↦ b. On the one hand, the definition of episodes by graphs is very general; on the other hand, the classes of episode patterns that are mined are often restricted.

Example 3 (Episodes). In Figure 1, we show some serial episodes: A ↦ B ↦ E, A ↦ D ↦ E, B ↦ E, and C, on the set of events E = {A, B, C, D, E}. All of them are included in a diamond episode A ↦ {B, C, D} ↦ E.

We explain a merit of introducing pattern structures for the summarization of structured patterns. As we mentioned above, a common strategy adopted in pattern mining is traversing the space P in a breadth-first manner while checking some interestingness measure. When generating the next candidates of frequent patterns, algorithms always check a parent-child relation between two patterns. This order is essential for pattern mining, and we thus conjecture that this parent-child relation used in pattern mining can be naturally adopted in constructing a pattern structure for analyzing patterns, only by introducing a similarity operation ⊓. After constructing a lattice, it would be helpful for analyzing the set of all patterns, because the concepts represent all patterns compactly.

A crucial problem of pattern structures is the computational complexity concerning both ⊓ and ⊑. Our idea is to adopt trees of height 1 (also called stars in graph theory). That is, we here assume that trees are expressive enough to represent features of episodes. Our idea is similar to that used in designing graph kernels [14]¹ and is inspired by previous studies on pattern structures [2, 4].

3 Diamond Episode Pattern Structures

In the following, we focus on diamond episodes as our objects, and trees of height 1 as our descriptions. Diamond episodes have two special vertices, the source and the sink, which can be regarded as important features for representing event transitions. We generate rooted labeled trees from them by putting such a vertex at the root of a tree and regarding its neighbors as its children. Since the heights of all trees here are 1, we can represent them by tuples without using explicit graph notations.
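Anticipating the formal definitions of δ and of the tuple representation given below, the construction just described (take the source or sink vertex, make it the root, and collect its neighbors as children) can be sketched as follows. The (a, X, b) triple encoding of a proper diamond episode is an assumption made here for illustration only.

    def source_tree(diamond):
        """delta(G) = T_s for a proper diamond episode a -> X -> b, encoded here as
        a tuple (a, X, b): the root carries the source label a, and the children
        carry the labels of the source's neighbours, i.e. the events of X."""
        a, X, b = diamond
        return (a, frozenset(X))

    def sink_tree(diamond):
        """The symmetric height-1 tree T_t built from the sink vertex b."""
        a, X, b = diamond
        return (b, frozenset(X))

    # The diamond episode A -> {B, C, D} -> E of Example 3 yields the trees
    # ("A", {"B", "C", "D"}) and ("E", {"B", "C", "D"}).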
Definition 4 (Rooted Trees of Height 1). Let (E, ⊓E) be a meet semi-lattice of event labels. A rooted labeled tree of height 1 is represented by a tuple² (e, C) ∈ E × 2^E. We represent the set of all rooted labeled trees of height 1 by T.

Note that in (E, ⊓E) we assume that ⊓E compares labels based on our background knowledge. We need to take care that this meet semi-lattice (E, ⊓E) is independent of, and different from, the meet semi-lattice D of descriptions of a pattern structure P. The operation ⊓E is also adopted when defining an embedding of trees of height 1, that is, a partial order between trees defined as follows.

¹ It intuitively generates a sequence of graphs by relabeling all vertices of a graph. One focuses on the label of a vertex v ∈ V(G) and sees the labels L_{N_G(v)} of its neighbors N_G(v). From the tuples (l_v, L_{N_G(v)}) for all vertices v ∈ V(G), all labels are sorted lexicographically and a new label is assigned according to this representation. Details are given in [14].
² From the viewpoint of graphs, this tuple (e, C) represents a graph G = (V, E, λ) with V = {0, 1, …, |C|}, E = {(0, i) | 1 ≤ i ≤ |C|}, λ(0) = e, and {λ(i) | 1 ≤ i ≤ |C|} = C.

Fig. 2. An example of the computation ⊓ of two trees of height 1.

Definition 5 (Partial Order on Trees). A tree t1 = (e1, C1) is a generalized subtree of t2 = (e2, C2), denoted by t1 ⊑T t2, iff e1 ⊑E e2 and there exists an injective mapping φ : C1 → C2 satisfying, for all v ∈ C1, v ⊑E φ(v), where ⊑E is the partial order induced by ⊓E.

For defining a similarity operator ⊓T between trees, this partial order ⊑T is helpful because ⊓T is closely related to ⊑T in our scenario. Since all trees here are of height 1, the computation is easy to describe: for the labels of the root nodes, a similarity operator is immediately given by ⊓E; for their children, it is implemented using the idea of least general generalization (LGG) of two sets of labels, as used in Inductive Logic Programming [10]. A practical implementation of LGG depends on whether or not the sets are multisets, but it is computationally tractable. An example is shown in Figure 2.

We give formal definitions of δ and D. For a graph G = (V, E, λ), we denote the neighbors of v ∈ V by N_G(v). For a proper diamond episode pattern G with source vertex s ∈ V and sink vertex t ∈ V, the computed trees of height 1 corresponding to s and t are defined as Ts = ({s} ∪ N_G(s), {(s, u) | u ∈ N_G(s)}, λ) and Tt = ({t} ∪ N_G(t), {(u, t) | u ∈ N_G(t)}, λ), respectively. By using those trees, δ(·) can be defined according to the vertices s and t: if we use both Ts and Tt, then δ(G) = (Ts, Tt), ⊓T is applied element-wise, and D is defined by T × T. If we focus on either s or t, then δ(G) = Ts or Tt, and we can use ⊓T directly by assuming D = T.

Last, we briefly explain the relations between our pattern structure and previous studies. The partial order ⊑T is inspired by a generalized subgraph isomorphism [4] and by a pattern structure for analyzing sequences [2]. We here give another description of similarity operators based on the definitions used in [4, 9].
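The following Python sketch gives one crude realisation of such a similarity of height-1 trees: root labels are met with ⊓E, and the child sets are generalized by taking pairwise meets of children, a simplistic stand-in for the LGG mentioned above. The wildcard-only ⊓E is an assumption for illustration; the operator of the paper depends on the background taxonomy (E, ⊓E) and on whether the child sets are multisets.

    STAR = "*"   # stands in for the wildcard, the least element of E

    def meet_labels(a, b):
        """A toy meet on labels: identical labels meet to themselves, anything
        else to the wildcard. A real implementation would consult the taxonomy."""
        return a if a == b else STAR

    def meet_trees(t1, t2):
        """A simple similarity of two height-1 trees (e, C): meet the roots and
        generalize the child sets by pairwise meets (one naive LGG choice)."""
        (e1, C1), (e2, C2) = t1, t2
        root = meet_labels(e1, e2)
        children = frozenset(meet_labels(c1, c2) for c1 in C1 for c2 in C2)
        return (root, children)

    # meet_trees(("A", {"B", "C"}), ("A", {"B", "D"})) gives ("A", {"B", "*"}).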
Definition 6 (Similarity Operation ⊓ based on [9]). The similarity operation ⊓ is defined by the set of all maximal common subtrees with respect to the generalized subtree isomorphism ⊑T: for two trees s1 and s2 in T,

s1 ⊓ s2 ≡ { u | u ⊑T s1, s2, and for every u′ ⊑T s1, s2 with u′ ≠ u it holds that u ⋢T u′ }.

Note that our operator ⊓T can be regarded as a special case of the similarity operation ⊓ above. From the viewpoint of pattern structures, our trees of height 1 can be regarded as an example of projections from graphs into trees, as studied in [4, 9], such as k-chains (paths on graphs of length k) and k-cycles.

Fig. 3. An input S and two diamond episodes mined from S as examples.

Table 1. Numbers of proper diamond episodes and pattern concepts for w ∈ {3, 4, 5} and M ∈ {100, 200, 300, 400, 500, 600, 700}. In the table, DE and PDE mean diamond episodes and proper diamond episodes, respectively.

    Window width w   # of DE   # of PDE   # of pattern concepts for M = 100  200  300  400  500  600  700
    3                  729       569                                    87  137  178  204  247   -    -
    4                  927       767                                    74  136  179  225  281  316  336
    5                  935       775                                    71  137  187  272  290  313  342

4 Experiments and Discussion for Diamond Episodes

Data and Experiments. We gathered data from MLB baseball logs, where a system records all pitching and plays for all games in a season. We used the types of balls used in pitching, which can be represented by histograms per batter. For a randomly selected game, we generated an input event sequence for episode mining by transforming each histogram into the set of types of balls used³. In forming (E, ⊓E), we let E be the set of types of balls, and define ⊓E naturally (see the example in Fig. 2). For this S, we applied the diamond episode mining algorithm proposed by [7] and obtained a set of diamond episodes. The algorithm has two parameters: the window size w and the frequency threshold θ. We always set θ = 1 and varied w ∈ {3, 4, 5}. After generating a set G of frequent proper diamond episodes, we sampled M ∈ {100, 200, …, 700} episodes from G as a subset O of G (that is, satisfying |O| = M and O ⊆ G). We used O as the set of objects in our pattern structure P. From it we computed all pattern concepts P(P) based on our discussions in Section 3. In these experiments we set δ(G) = Ts for a proper diamond episode G and its source vertex s.

³ In baseball games, pitchers throw many kinds of balls such as fast balls, cut balls, curves, sinkers, etc. They are recorded together with their movements by MLB systems.

representations of itemsets, and they are closely related to the closure operator g ◦ f in FCA with (O, A, I), where O is the set of transaction identifiers and A is the set of all items. The difficulty of closed patterns for complex data is that there are no common definitions of closure operators; we usually use closedness with respect to the frequency. Here we assume that pattern concepts are helpful in the same way as in the correspondence between closed itemsets and concepts. To obtain some compact representations, we need to decide how to evaluate each pattern. The problem here is how to deal with the wildcard ⋆ in descriptions.
When we obtain a concept (X, Y) for X ⊆ O, Y ⊆ A, this concept (X, Y) corresponds to a rectangle on I, and there are no 0 entries in the sub-database I′ = {(x, y) ∈ I | x ∈ X, y ∈ Y} of I induced by (X, Y), by definition. If (X, Y) is not a concept, the rectangle r given by (X′, Y′) contains a few 0 entries. We denote the relative ratio of 1 entries in a rectangle r given by (X′, Y′) as

r1(X′, Y′, I) = 1 − |{(x, y) ∉ I | x ∈ X′, y ∈ Y′}| / (|X′||Y′|),

where 0 ≤ r1(X′, Y′, I) ≤ 1 and r1(X′, Y′, I) = 1 if (X′, Y′) is a concept. The values r1(X, Y, I), |X|, and |Y| are applicable for evaluating itemsets. If we only use the cardinality |A| of a set A of objects, this is equal to the support counts computed in Iceberg concept lattices [15]. For a concept (X, Y) of a context K = (O, A, I), we compute the support count supp(X, Y) = |g(Y)|/|O| and prune redundant concepts by using some threshold. For formalizing evaluations of patterns, such values are generalized by introducing a utility function u : P → R+. A typical and well-studied utility function is, of course, the frequency count, or the area function area(·), which evaluates the size of a rectangle (X, Y) [6].

Based on the discussion above, if we can define a utility function u(·) for evaluating pattern concepts, a similar treatment of pattern concepts is possible: choosing a small number of pattern concepts and constructing a summary of patterns with them. Of course, there is no simple way of giving such functions. We try to introduce a simple and straightforward utility function uP(·) for pattern concepts as a first step towards developing pattern summarization via pattern concept lattices. In this paper, we follow the idea used in tiling databases [6], where a key criterion is given by area(·). We consider how to compute a value which corresponds to the area in binary databases. To take into account the wildcard ⋆ used in descriptions, we define the following simple function. For d ∈ D, we let s(d) and n(d) be the numbers of non-wildcard vertices and of all vertices in a description d, respectively. Note that if s(d) = n(d), then d contains no wildcard labels. By using these functions, we compute utility values as follows:

uP(A, d) = |A| · log(1 + s(d)).

Table 2. Results of ranking pattern concepts from 750 episodes with w = 5.

    Utility   Top-5 mutually distinct descriptions of pattern concepts
    |A|       (⋆, {⋆}), (2, {⋆}), (0, {⋆}), (3, {⋆}), (1, {⋆})
    uP(·)     (⋆, {0, ⋆}), (⋆, {0, 2, 3}), (⋆, {0, 1, 2}), (⋆, {0, 1, 3}), (⋆, {1, 2, 3})

5.1 Experiments and Discussions

We compare the results of ranking pattern concepts by 1) using only |A| (similar to Iceberg concept lattices), and 2) using uP(·) as a utility function. From the list of pattern concepts generated in the experiments of Section 4, we rank all
On this viewpoint, considering two terms s(d) and n(d) for description d would be interesting and useful way to design utility functions for pattern concepts. We conclude that the Iceberg lattice based support counts are less effective if descriptions admit the wildcard ⋆ for pattern summarization problems. Not only the simple computation in uP (A, d) used above, also many alter- natives could be applicable for ranking. Some probabilistic methods such as the minimum description length (MDL), information-theoretic criteria would be also helpful to analyze our study more clearly. Since pattern structures have no ex- plicit representations of binary cross tables, the difficulty lies on how to deal with a meet semi-lattice (D, ⊓). For some pattern concept (A, d) and an object o ∈ O, we say that (A, d) subsumes o if and only if d ⊑ δ(o). This subsump- tion relation would be simple and helpful to evaluate concepts, but they does not adopt any complex information concerning hierarchy of events, or distances between two descriptions. In fact in the experiments, we always assume that all events except ⋆ have the same weight and ⋆ is the minimum of all events. They could be important to take into account similarity measures of events for more developments of ranking methods of pattern concepts. 5.2 Related Work There are several studies concerning our study. It is well-known that closed item- sets correspond to maximal bipartite cliques on bipartite graphs constructed from K = (O, A, I). Similarly, we sometimes deal with so called pseudo bipartite cliques [16], where it holds that r1 (X ′ , Y ′ , I) ≥ 1 − ε with a user-specified con- stance ε. Obviously, pseudo bipartite cliques correspond to rectangles containing a few 0. We can regard them as some summarization or approximation of closed itemsets or concepts. Intuitively, if we use some pseudo bipartite cliques as sum- marization, the value r1 (X, Y, I) can be considered in evaluating (X, Y ). Pseudo bipartite cliques can be regarded as noisy tiles, which is an extension of tiles [6]. Another typical approach for summarization is clustering patterns [18, 1]. A main problem there is how to interpret clusters or centroids, where we need to de- Pattern Structures for Understanding Episode Patterns 57 11 sign a similarity measure and a space in which we compute the similarity. On the viewpoint of probabilistic models, there is an analysis via the maximum entropy principle [3]. However they assume that entries in a database are independently sampled, and thus we cannot apply those techniques to our setting. 6 Toward Generalizations for Bipartite Episodes In this paper we assume that our descriptions by trees of height 1 are rich enough to apply many classes of episode patterns. We here show how to apply our pattern structure for other types of episodes, called bipartite episodes, as an example. An episode G = (V, E, λ) is a a partial bipartite episode if 1) V = V1 ∪V2 for mutually disjoint sets V1 and V2 , 2) for every directed edge (x, y) ∈ E, (x, y) ∈ V1 × V2 . If E = V1 ×V2 , an episode G is called a proper bipartite episode. Obviously, vertices in a bipartite episode G are separated into V1 and V2 , and we could regard them as generalizations of the source vertex and the sink vertex of diamond episodes. This indicates that the same way is applicable for bipartite episodes by defining ⊓ between sets of tress. Fortunately, [9] gives the definition ⊓ for sets of graphs. [ {t1 , . . . , tk } ⊓ {s1 , . . . 
, sm } ≡ MAX⊑T ({ti } ⊓ {sj }) , i,j where MAX⊑T (S) returns only maximal elements in S with respect to ⊑T . Since our generalized subtree isomorphism is basically a special case of that for graphs, we can also apply this meet operation. This example suggest that if we have some background knowledge concerning a partition of V , it can be taken into account for δ and (D, ⊓) in a similar manner of diamond and bipartite episodes. 7 Conclusions and Future Work In this paper we propose a pattern structure for diamond episodes based on an idea used in graph kernels and projections of pattern structures. Since we do not directly compute graph matching operations we conjecture that our computation could be efficient. With a slight modification of ⊓, our method is also applicable for many classes of episodes, not only for diamond patterns as we mentioned above. Based on our pattern structure, we discussed summarization by using mined pattern concepts and show small examples and experimental results. Since problems of this type are unsupervised and there is no common way of obtaining good results and of evaluating whether or not the results are good. It would be interesting to study more about this summarization problem based on concept lattices by taking into account theoretical backgrounds such as proba- bilistic distributions. In our future work, we try to analyze theoretical aspects on summarization via pattern structures including the wildcard ⋆ and its op- timization problem to obtain compact and interesting summarization of many patterns based on our important merit of a partial order ⊑ between descriptions. 58 12 Keisuke Otaki, Madori Ikeda and Akihiro Yamamoto Acknowledgments This work was supported by Grant-in-Aid for JSPS Fellows (26·4555) and JSPS KAKENHI Grant Number 26280085. References 1. Al Hasan, M., Chaoji, V., Salem, S., Besson, J., Zaki, M.: Origami: Mining rep- resentative orthogonal graph patterns. In: Proc. of the 7th ICDM. pp. 153–162 (2007) 2. Buzmakov, A., Egho, E., Jay, N., Kuznetsov, S.O., Napoli, A., Raı̈ssi, C.: The representation of sequential patterns and their projections within Formal Concept Analysis. In: Workshop Notes for LML (ECML/PKDD2013) (2013) 3. De Bie, T.: Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery 23(3), 407–446 (2011) 4. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: Proc. of the 9th ICCS. pp. 129–142 (2001) 5. Ganter, B., Wille, R.: Formal concept analysis - mathematical foundations. Springer (1999) 6. Geerts, F., Goethals, B., Mielik ainen, T.: Tiling databases. In: Proc. of the 7th DS. pp. 278–289 (2004) 7. Katoh, T., Arimura, H., Hirata, K.: A polynomial-delay polynomial-space algo- rithm for extracting frequent diamond episodes from event sequences. In: Proc. of the 13th PAKDD. pp. 172–183. Springer Berlin Heidelberg (2009) 8. Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Revisiting Numerical Pattern Mining with Formal Concept Analysis. In: Proc. of the 24th IJCAI (2011) 9. Kuznetsov, S.O., Samokhin, M.V.: Learning closed sets of labeled graphs for chem- ical applications. In: Proc. of the 15th ILP, pp. 190–208 (2005) 10. Lloyd, J.W.: Foundations of Logic Programming. Springer-Verlag New York, Inc. 11. Mannila, H., Toivonen, H., Inkeri Verkamo, A.: Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery 1(3), 259–289 (1997) 12. 
Merwe, D., Obiedkov, S., Kourie, D.: AddIntent: A New Incremental Algorithm for Constructing Concept Lattices. In: Proc. of the 2nd ICFCA. pp. 372–385 (2004) 13. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed item- sets for association rules. In: Prof. of the 7th ICDT. pp. 398–416 (1999) 14. Shervashidze, N., Schweitzer, P., van Leeuwen, E.J., Mehlhorn, K., Borgwardt, K.M.: Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12, 2539–2561 (2011) 15. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg concept lattices with titanic. Data & Knowledge Engineering 42(2), 189–222 (2002) 16. Uno, T.: An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica 56(1), 3–16 (Jan 2010) 17. Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Mining and Knowledge Discovery 23(1), 169–214 (2011) 18. Xin, D., Cheng, H., Yan, X., Han, J.: Extracting redundancy-aware top-k patterns. In: Proc. of the 12th KDD. pp. 444–453. ACM (2006) Formal Concept Analysis for Process Enhancement Based on a Pair of Perspectives Madori IKEDA, Keisuke OTAKI, and Akihiro YAMAMOTO Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan {m.ikeda, ootaki}@iip.ist.i.kyoto-u.ac.jp, akihiro@i.kyoto-u.ac.jp Abstract. In this paper, we propose to use formal concept analysis for process enhancement, which is applied to enterprise processes, e.g., operations for patients in a hospital, repair of imperfect products in a company. Process enhancement, which is one of main goals of process mining, is to analyze a process recorded in an event log, and to improve its efficiency based on the analysis. Data formats of the logs, which con- tain events observed from actual processes, depend on perspectives on the observation. For example, events in logs based on a so-called process perspective are represented by their types and time-stamps, and obser- vation based on a so-called organization perspective records events with organizations relating the occurrence of them. The logs recently became large and complex, and events are represented by many features. How- ever, previous techniques of process mining take a single perspective into account. For process enhancement, by formal concept analysis based on a pair of features from different perspectives, we define subsequences of events whose stops are fatal to execution of a process as weak points to be removed. In our method, the extent of every concept is a set of event types, and the intent is a set of resources for events in the extent, and then, for each extent, its weakness is calculated by taking into account event frequency. We also propose some basic ideas to remove the weakest points. Keywords: formal concept analysis, process mining, business process improvement, event log 1 Introduction In this paper, we show a new application of formal concept analysis, process enhancement (or business process improvement), which is one of main goals of process mining. We show that formal concepts are useful to discover weak points of processes, and that a formal concept lattice works as a good guide to remove the weak points in the process enhancement. Formal concept analysis (FCA for short) is a data analysis method which focuses on relationship between a set of objects and a set of attributes in data. A concept lattice, which is an important product of FCA, gives us valuable insights from a dual viewpoint based on the objects and the attributes. 
Moreover, because c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 59–71, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 60 2 Madori Madori IKEDA, Ikeda, Keisuke Keisuke OtakiOTAKI, and Akihiro and Akihiro YAMAMOTO Yamamoto of its simple and strong definition, various types of data can be translated for FCA, and so FCA attracts attention across various research domains. Process mining [9,13] is a relatively young research domain, and is researched for treating enterprise processes recorded in event logs, e.g., operations for pa- tients in a hospital, repair of imperfect products in a company. It provides a bridge between business process management (BPM for short) [12] and data mining. BPM has been investigated pragmatically, and data formats, softwares, and management systems are proposed for manipulating processes. Like recent data represented as “big data”, the event logs also became huge and complicated. Thus, BPM researchers need theoretically efficient approaches for handling such big data. This is also the recent trend of data mining. Though many results pro- duced in the last decade of process mining, there are still many challenges [11], and we work with FCA on two of them: “combining process mining with other types of analysis” and “dealing with complex event logs having diverse charac- teristics”. We treat business process improvement which is an essential goal of process mining as a application of FCA. In order to achieve it, so many matters should be considered. At first, we have to decide features of a process which are modified for improvement, and there are various types of features to represent the process. In order to categorize the features, six central perspectives have been proposed [4, 8]. For improvement in the target features, many modifications can be constructed. According to [8], there are 43 patterns of the modifications. We also have to evaluate the improvement, so an improvement measure is needed for the evaluation. Based on principal aspects of processes, time, quality, cost, and flexibility, four types of measures are considered [4, 8]. In this paper, for making a process robust and reliable, we focus on two of the perspectives to detecting weak points of the process which are subsequences of events. For the detection, our method calculated a weakness degree regarded as one of cost measures for each subsequence which is represented by the extent of a formal concept. This paper is organized as follows. In the next section, we introduce process mining and give a running example, and then, we show the problem tackled in this paper. In Section 3, we explain our process enhancement method. Conclu- sions are placed in Section 4. 2 Process Mining In this section, we outline process mining with an example and show the problem which we try to solve. 2.1 Event Logs Observed from Actual Processes Process mining has three types: process discovery, process conformance check- ing, and process model enhancement. Every type strongly focuses on and starts from facts observed from actual processes. It is the main difference from BPM (Business Process Management) [12] and also from WFM (Workflow Manage- ment) [6]. They are past fields of process mining and rely on prior knowledge. FCA FCA for for Process Process EnhancementBased Enhancement Basedon onaa Pair Pair of of Perspectives Perspectives 3 61 The observed facts are recorded in event logs, and so the logs are the most important materials in process mining. 
Actual event logs are usually represented in a semi-structured format like MXML [15] and XES [17]. Theoretically, every event log can be simply formal- ized as a pair (F, E) of a finite set F of features and a finite set E of events. Every feature f ∈ F is a function from E to its domain Df , and every event Q|F | e ∈ E is recorded in the form of (f1 (e), f2 (e), ..., f|F | (e)) in i=1 Dfi . Each event corresponds with an occurrence or a task which are found by observation of an actual process. The observation is performed based on perspectives, and the set of features is decided by depending on them. Mathematically, a set P of the perspectives satisfies that every perspective p ∈ P is a non-empty subset of F . Though six central perspectives which are called process, object, organiza- tion, informatics, IT application, or environment are proposed [4, 8], there are no standards for deciding P should be adopted in the observation. The set of perspectives P varies from an observation to another based on aims of process mining, kinds of processes executed by organizations, sensor systems installed to organizations, and many other factors. There are however some fundamen- tal perspectives which are currently adopted in construction of event logs. Our approach focuses on two of these. One of them is the process perspective (it is sometimes called a control-flow perspective), which is focusing on how process occurs. If a process is observed based on the perspective, the set of features in its event log must include an event type feature, a time stamp feature, and a case feature. The case feature makes clear which case each event occurs in (note that some researches regard the case feature as a feature based on another perspec- tive, a case perspective). Based on such a perspective, event logs clarify ordering of events for each case, and the set E of events can be treated as a partially ordered set (E, ≤), so we sometimes use E as the poset (E, ≤) in this paper. A sequence of events occurring in a case which are ordered based on time is called a trace. At the same time, the process can be observed based on the organization perspective, which is another fundamental perspective. The perspective focuses on where the occurrence happens or who performs the task, and event logs based on it must have a place feature, a resource feature, or an employee feature. In this paper, we assume that a given event log records statistically enough events. Example 1 As a running example, we show a process which is handling a re- quest for compensation within an airline. Customers may request the airline to compensate for various reasons, e.g., delay of flight or its cancelation. In such situations, the airline has to examine the validity of the request and needs to pay compensation if it is unquestionable. Table 1 shows an event log recording the compensation process which is partially quoted from [13]. In this example, an event means a task executed by an employee: the first event in the table shows that a task called “register request” is executed as the beginning of Case 1 by Pete at 11:02 on 30 Dec., 2010. In this log, the features Case ID, Event type, and Time are based on the process perspective. Resource feature is based on the organization perspective and represents human resources needed for each of the event. Cost feature comes from another perspective. The log also shows that 62 4 Madori Madori IKEDA, Ikeda, Keisuke Keisuke OtakiOTAKI, and Akihiro and Akihiro YAMAMOTO Yamamoto Table 1. 
An event log L = (F, E) recording a compensation process of an airline: each row shows an event which is represented by five features.

    Case ID  Event type           Resource  Cost  Time (dd-mm-yyyy.hh:mm)
    1        register request     Pete       50   30-12-2010.11:02
    1        examine thoroughly   Sue       400   31-12-2010.10:06
    1        check ticket         Mike      100   05-01-2011.15:12
    1        decide               Sara      200   06-01-2011.11:18
    1        reject request       Pete      200   07-01-2011.14:24
    2        register request     Mike       50   30-12-2010.11:32
    2        check ticket         Mike      100   30-12-2010.12:12
    2        examine casually     Sean      400   30-12-2010.14:16
    2        decide               Sara      200   05-01-2011.11:22
    2        pay compensation     Ellen     200   08-01-2011.12:05
    3        register request     Pete       50   30-12-2010.14:32
    3        examine casually     Mike      400   30-12-2010.15:06
    3        check ticket         Ellen     100   30-12-2010.16:34
    3        decide               Sara      200   06-01-2011.09:18
    3        reinitiate request   Sara      200   06-01-2011.12:18
    3        examine thoroughly   Sean      400   06-01-2011.13:06
    3        check ticket         Pete      100   08-01-2011.11:43
    3        decide               Sara      200   09-01-2011.09:55
    3        pay compensation     Ellen     200   15-01-2011.10:45

three cases are observed and recorded as three traces, and that their lengths are 5, 5, and 9, respectively.

2.2 Models of Processes

Models of processes are also important in process mining because they are deeply related to the three types of process mining: models are extracted from event logs by process discovery, and they are used together with event logs for process conformance checking and for process model enhancement. Note that different types of models can be considered, and they have been researched for various aims of mining. Some models have been proposed for extracting the procedure of processes, e.g., Petri nets [16], Business Process Modeling Notation (BPMN) [3], Event-driven Process Chains (EPC) [7], and UML activity diagrams [2]. These procedure models express the workflow of a process clearly as directed graphs. For another aim, expressing how resources are involved in a process or how resources are related to each other, social network models have been proposed [10, 14]. A working-together social network expresses relations among resources which are used in the same case. A similar-task social network ignores cases but focuses on relations among resources used together for the same event. A handover-of-work social network expresses handovers from resources to resources in cases.

All of these models are developed for expression, and do not provide any analytical function. In other words, they only push event logs into their format, and analysis is not their duty. However, for process enhancement, we need some analytical function for evaluating the enhancement. In addition, models focusing on one perspective are apt to neglect other perspectives. For example, the procedure models focusing on the process perspective do not contain information about resources, which are observed based on the organization perspective. On the contrary, the social networks focusing on the organization perspective make correlations among resources explicit but make the workflows, which are observed based on the process perspective, unclear.

Fig. 1. A Petri net of the compensation process: every square, called a transition, indicates an event, and every circle, called a place, represents a state of the process.
For our goal, detecting weak points of a process, we claim that its weakness should be measured based on at least two perspectives. This work thus relates to process model enhancement which is to extend a process model. Example 2 Figure 1 shows a procedure model which is expressed in terms of a Petri net [16] extracted from the event log shown in Table 1. This model explicitly expresses the workflow of the compensation process and makes it clear which event happens before/after another event. On the other hand, the model ignores other perspectives: information derived from Resource and Cost features are not expressed at all in the model. Figure 2 shows a similar-task social network [10, 14] generated from the same event log. This model clarifies relations among employees sharing the same tasks, but it does not care about the ordering of events. 2.3 Weak Points Detection for Process Enhancement Our final goal is process enhancement. For the goal, we propose to detect subse- quences of events from a given event log as weak points which should be removed. Actually, our method does not decide whether or not subsequences of events are weak points. Instead, the method estimates the weakness for each of some sub- sequences of events and expresses it in a number called a weakness degree. Then, some weaker subsequence of events should be removed for the enhancement. For the definition of the weakness degree, there are various candidates. If the process perspective is focused, sequences of events taking a lot of time in a process must be its weak points. Another type of weak points are looping sequences which 64 6 Madori Madori IKEDA, Ikeda, Keisuke Keisuke OtakiOTAKI, and Akihiro and Akihiro YAMAMOTO Yamamoto Sean Mike Ellen Sue Sara Pete Fig. 2. A similar-task social network of the compensation process: every circle indicates an employee, and an edge is drawn between employees if their tasks are statistically similar. many cases have to take. In the running example, it is reasonable to take costs of events into account for weakness. In this work, we focus on importance of a subsequence of events and loads of it. The importance is decided based on the process perspective and on the organization perspective. More precisely, a subsequence of events in an event log is considerable if the events are executed by a small number of resources in the log. Loads of the important sequence increase if the sequence appears many times in the log. In our method, important sequences of events having heavy loads are weak points of a process. Example 3 In the running example, the subsequence “decide” executed by Sara should be regarded as weaker than the others. Because the subsequence is impor- tant due to the fact that it can be executed only by Sara, and because the event, “decide” by Sara, is very frequent. Only from the Petri net shown in Figure 1, it can be induced that the event “decide” is important in the process. It is also induced only from the social network shown in Figure 2 that Sara takes some important role. However, these models do not show explicitly that “decide” by Sara is important and has an impact on the process. 3 Process Enhancement via FCA We adopt FCA for mining weak points of processes, so we firstly introduce the definitions of formal concepts and formal concept lattices with referring to [1,5]. Then, we explain our method. 3.1 From an Event Log to a Concept Lattice A formal context is a triplet K = (G, M, I) where G and M are mutually disjoint finite sets, and I ⊆ G × M . 
Each element of G is called an object, and each element of M is called an attribute. For a subset of objects A ⊆ G and a subset of attributes B ⊆ M of a formal context K, we define A^I = { m ∈ M | ∀g ∈ A. (g, m) ∈ I } and B^I = { g ∈ G | ∀m ∈ B. (g, m) ∈ I }, and a pair (A, B) is a formal concept if A^I = B and A = B^I. For a formal concept c = (A, B), A and B are called the extent and the intent, respectively, and we let Ex(c) = A and In(c) = B. For arbitrary formal concepts c and c′, we define an order c ≤ c′ iff Ex(c) ⊆ Ex(c′) (or equivalently In(c) ⊇ In(c′)). The set of all formal concepts of a context K = (G, M, I) with the order ≤ is denoted by B(G, M, I) (for short, B(K)) and is called the formal concept lattice (concept lattice for short) of K. For every object g ∈ G of (G, M, I), the formal concept ({ g }^II, { g }^I) is called the object concept and is denoted by γg. Similarly, for every attribute m ∈ M, the formal concept ({ m }^I, { m }^II) is called the attribute concept and is denoted by µm.

In our method, a formal context is obtained by translation from an event log, and then weak point mining is performed with a concept lattice constructed from the context. Suppose that the event log contains two types of features: one of them is based on the process perspective, and the other is based on the organization perspective. In this paper, the first one is called an event-type feature and is denoted by fe, and the second is called a resource feature and is denoted by fr. Note that the event-type feature represents types of events, not cases, and not time. This assumption is not strong because such features are very fundamental and are in fact adopted in XES [17]. From such an event log L = (F, E) with F ⊇ { fe, fr }, a formal context KL = (G, M, I) is translated, where G = Dfe, M = Dfr, and I = { (g, m) ∈ G × M | ∃e ∈ E. fe(e) = g ∧ fr(e) = m }. In the context KL = (G, M, I), (g, m) ∈ I means that events sorted into g need a resource m. For every element (g, m) ∈ I of the formal context KL, we additionally define freq((g, m)) = |{ e ∈ E | fe(e) = g ∧ fr(e) = m }|. This function outputs the frequency of events which are sorted into an event-type g and need a resource m in the event log L.

Example 4 In the running example, "Event type" corresponds to the event-type feature, and "Resource" corresponds to the resource feature. Therefore, a formal context KL = (G, M, I), shown in Table 2, is obtained from the event log shown in Table 1. For example, freq((register request, Pete)) = 2 shows that an event "register request" by Pete was observed twice in the construction of the event log in Table 1.

From a formal context KL translated from an event log L, a concept lattice B(KL) is constructed for process enhancement. Each formal concept c = (A, B) of the concept lattice B(KL) represents a pair of a set A of event-types and a set B of resources needed for the events in A. For every formal concept c ∈ B(KL), we define Exγ(c) = { g ∈ Ex(c) | γg = c } and Inµ(c) = { m ∈ In(c) | µm = c }. By extending freq from elements of I to concepts, we also define

freq(c) = Σ_{g∈Ex(c)} Σ_{m∈In(c)} freq((g, m)).

The value freq(c) is the sum of the frequencies of the events which are sorted into an event-type g ∈ Ex(c) and need a resource m ∈ In(c).
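The translation from an event log to the formal context KL and the freq table is direct. The following Python sketch shows one possible realisation; the representation of events as dictionaries with the keys event_type and resource is an assumption made here for illustration, not the paper's data format.

    from collections import Counter

    def context_from_log(events):
        """Build K_L = (G, M, I) and the freq table from an event log, following
        the translation above: objects are event types, attributes are resources,
        and (g, m) is in I whenever some event of type g was executed by m."""
        freq = Counter((e["event_type"], e["resource"]) for e in events)
        G = {g for g, _ in freq}          # event types (objects)
        M = {m for _, m in freq}          # resources (attributes)
        I = set(freq)                     # pairs occurring at least once
        return G, M, I, freq

    # With the log of Table 1, freq[("register request", "Pete")] == 2, as in Example 4.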
Table 2. A formal context KL = (G, M, I) constructed from the event log L of the compensation process: elements of G are listed in the leftmost column, elements of M are listed in the first row, and every cell indicates freq(i) for i ∈ I unless freq(i) = 0.

                        Pete  Sue  Mike  Sara  Sean  Ellen
    register request      2          1
    examine thoroughly          1                1
    check ticket          1          2                  1
    decide                                 4
    reject request        1
    examine casually                 1           1
    pay compensation                                    2
    reinitiate request                     1

Example 5 Figure 3 shows a concept lattice B(KL) of the context KL = (G, M, I) shown in Table 2. For example, the leftmost circle in the figure indicates a formal concept c2 = ({ check ticket, pay compensation }, { Ellen }). The sum of frequencies freq(c2) = 3 means that a task "check ticket" or "pay compensation" executed by Ellen appears three times in the event log L shown in Table 1.

3.2 Calculating Weakness Degrees

As we mentioned in Section 2.3, for every subsequence of events which is the extent of a formal concept, we define the weakness degree, and the weakness is estimated from its importance and its loads. The importance is estimated based on both the process perspective and the organization perspective. Every formal concept (A, B) ∈ B(KL) is based on both perspectives, because A is a set of event-types observed from the process perspective and B is a set of resources observed from the organization perspective. Such a formal concept is considered to represent that accomplishing all the events in A needs at least one of the resources in B and that every resource in B can execute all the events in A. From this consideration, we define the importance imp(c) of the subsequence Ex(c) of a formal concept c ∈ B(KL) as

imp(c) = (1 + |Exγ(c)|)/(1 + |In(c)|) × (1 + |Ex(c)|)/(1 + |Inµ(c)|).

We call this an importance factor. Roughly speaking, this factor becomes large when a small number of resources are needed for a large number of events. The first term is the ratio of the number of events to the number of resources which can accomplish the events. In other words, if some or many events rely on few resources, then the term is large. The second term is the ratio of the number of resources to the number of events which are executed by the resources. It becomes large if only a few resources are exhausted by many events.
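The importance factor can be computed directly from the context with the derivation operators. The Python sketch below is a naive illustration under the assumption that G, M and I are plain Python sets (of strings and of pairs); it is not the authors' implementation.

    def common_attributes(A, M, I):
        """A^I: the attributes (resources) shared by all objects (event types) in A."""
        return {m for m in M if all((g, m) in I for g in A)}

    def common_objects(B, G, I):
        """B^I: the objects (event types) that have all attributes (resources) in B."""
        return {g for g in G if all((g, m) in I for m in B)}

    def importance(concept, G, M, I):
        """imp(c) for a formal concept c = (A, B), following the importance factor
        above: Ex_gamma(c) collects the objects whose object concept is c, and
        In_mu(c) the attributes whose attribute concept is c."""
        A, B = concept
        ex_gamma = {g for g in A
                    if common_objects(common_attributes({g}, M, I), G, I) == A}
        in_mu = {m for m in B
                 if common_attributes(common_objects({m}, G, I), M, I) == B}
        return (1 + len(ex_gamma)) / (1 + len(B)) * (1 + len(A)) / (1 + len(in_mu))

    # On the context of Table 2, the concept ({"decide", "reinitiate request"}, {"Sara"})
    # gives (1+2)/(1+1) * (1+2)/(1+1) = 2.25, matching the value shown in Figure 3.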
Fig. 3. A formal concept lattice B(KL) constructed from the formal context KL: each circle represents a formal concept c ∈ B(KL). Each edge represents an order ≤ between two concepts; the greater concept is drawn above, and edges implied by transitivity are omitted. Every formal concept c is accompanied by Ex(c) and In(c) on its right side and by freq(c), imp(c), and weak(c) on its left side.

Also, we define load(c) of the subsequence Ex(c) as

load(c) = freq(c) / |E|

and call it a load factor. This is the ratio of the frequency of the events in the sequence Ex(c) to the frequency of the whole set of events E. Then, for the subsequence Ex(c), the weakness degree weak(c) is defined as

weak(c) = imp(c) × load(c).

When an important sequence Ex(c) takes a heavy load, weak(c) becomes large. In other words, the weakness degree numerically shows how liable trouble with Ex(c) is to bring the whole process down. By extending this definition, the weakness of the whole process can be expressed as Σ_{c∈B(KL)} weak(c).

Example 6 In Figure 3, the importance factors and weakness degrees of every subsequence of events Ex(c), c ∈ B(KL), are also drawn. The importance factors show that the sequence of tasks Ex(c5) = { decide, reinitiate request } executed by Sara is the most important. Indeed, there is no employee who can execute the tasks "decide" and "reinitiate request" but Sara. On the other hand, the weakness degrees show that the sequence Ex(c6) = { register request, check ticket } of tasks is the weakest, and that the most important sequence Ex(c5) is the second weakest. This reversal of roles is caused by their load factors. The total weakness of the whole process, Σ_{c∈B(KL)} weak(c), is around 2.59.

3.3 Removing Weak Points

A process recorded in an event log L can be enhanced by removing the weakest point or by reducing the total weakness Σ_{c∈B(KL)} weak(c). Though there are many ways of achieving the enhancement, in this paper we achieve it by operations on the original formal context KL = (G, M, I) which remove some weakest formal concepts from its concept lattice B(KL), or which reduce the total weakness Σ_{c∈B(KL)} weak(c). We here show some basic ideas for such operations.

Observing the definitions of the weakness shows that there are three plans for the reduction: reducing importance factors, reducing load factors, and decreasing the number of formal concepts. Though there are many operations achieving these plans, realizable operations are restricted by the consideration that we try to manage an actual enterprise process. Reduction of importance factors can be achieved by increasing the number of resources relative to the number of events requiring them. Also, reducing events can decrease importance factors, but we do not adopt this way because it carries the risk that the process never works. In other words, we try to enhance processes by investment in equipment, not by polishing processes. Besides, reducing load factors is not reasonable for our method, because we do not have control over the frequency of events. Thus, our enhancement operations are to increase resources for events requiring them or to decrease formal concepts.

For the enhancement of a process recorded in an event log L, we show two kinds of such operations. The first kind is adding some pair (g, m) ∉ I with g ∈ Ex(c) and m ∈ M to I in order to remove a formal concept c ∈ B(KL) from the lattice.
This means expanding the flexibility of resources, e.g., upgrading machines or expanding the applicability of materials through an innovation. We have to note that the total weakness is not always reduced in this case. The second kind is adding a new resource m ∉ M, together with pairs (g, m) ∉ I for g ∈ Ex(c), to M and I, respectively. This can reduce the total weakness Σc∈B(KL) weak(c). It means introducing new resources for the sequence of events Ex(c), for example purchasing the same machines as existing ones, or using a substitute to make up a shortage of materials. In order to decide properly which kind of operation should be executed, we need other factors, e.g., the execution time of the process, or the cost and ease of applying the operations.

Example 7 In the running example, there are several choices for removing the weakest sequence Ex(c6) = {register request, check ticket}. For example, the addition of (register request, Ellen) to I, which means that Ellen acquires the ability to "register request", can remove the weak point. It removes the concept c6 and changes c2 into ({register request, check ticket, pay compensation}, {Ellen}) and c8 into ({register request, check ticket}, {Pete, Mike, Ellen}), respectively. If we assume that "register request" is shared equally by Pete, Mike, and Ellen, the numbers change as follows: freq(c2) = 4, imp(c2) = 2, weak(c2) ≈ 0.42; freq(c3) = 3, imp(c3) = 2, weak(c3) ≈ 0.32; freq(c8) = 7, imp(c8) = 2.25, weak(c8) ≈ 0.83. In this case, the total weakness increases to around 2.66. Employing a new person, Bob, who has the ability to execute "register request" is an operation of the second kind. This is to add Bob ∉ M to M and to add (register request, Bob) ∉ I to I. In this case, a new concept c12 = ({register request}, {Bob}) is generated, and the total weakness decreases to around 2.17, assuming that "register request" is shared equally by Pete, Mike, and Bob: weak(c3) and weak(c6) decrease to around 0.32 and 0.26, respectively, and weak(c12) ≈ 0.05.
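As a quick check of the first operation in Example 7, the weak_degrees helper from the sketch in Section 3.2 can be reapplied to the modified frequency table; this short snippet assumes that helper and the original table freq are still in scope.

freq_after = dict(freq)
freq_after["register request"] = {"Pete": 1, "Mike": 1, "Ellen": 1}   # Ellen now registers too

report_after = weak_degrees(freq_after)
print(len(report_after))                                        # 10 concepts: c6 is gone
print(round(sum(w for _, _, w in report_after.values()), 2))    # about 2.66

The concept c6 disappears (ten concepts remain) and the total weakness rises to about 2.66, in line with the figures stated above.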
4 Conclusions

In this paper, we propose applying FCA (formal concept analysis) to process enhancement. FCA analyzes data from a dual viewpoint based on objects and attributes. Processes are recorded in event logs, which are constructed by observation based on some perspectives. We assign the pair of the process perspective and the organization perspective to the objects and the attributes of FCA in order to investigate the weak points of a process. The weakness of a sequence of events executed by resources is calculated from its importance and its load.

There are many problems still to be solved. Our weakness of a process is not yet derived from a sufficiently thorough analysis, because only two features from two perspectives are considered. To improve a process more efficiently, we need to take other features from other perspectives into account in weak point detection. For example, using a time-stamp feature enables us to detect bottlenecks of a process, and using a cost feature enables us to find costly sequences. This may be achieved by combining other process models with our concept lattice. We also have to refine the operations for removing weak points: in our method, the number of choices for enhancement sometimes becomes very large. One plan for this refinement is to estimate in advance the total weakness of the reinforced process for each of the choices. Combining other models is also useful. For example, combining procedure models with our method can suggest effective operations among the many choices, because such models sufficiently treat the order of events in traces, which is ignored by our lattice-based approach. On the other hand, there are many constraints on resources in practical processes, e.g., some materials can be substituted by a few other materials while others cannot, and employees are divided into groups within a company. In order to reduce the choices based on such constraints, social network models might be useful.

Acknowledgment

This work was supported by JSPS KAKENHI Grant Number 26280085.

References

1. B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 2002.
2. M. Dumas and A. ter Hofstede. UML Activity Diagrams as a Workflow Specification Language. In M. Gogolla and C. Kobryn, editors, UML 2001 The Unified Modeling Language. Modeling Languages, Concepts, and Tools, Lecture Notes in Computer Science, vol. 2185, pp. 76–90, 2001.
3. R. Flowers and C. Edeki. Business Process Modeling Notation. International Journal of Computer Science and Mobile Computing, vol. 2, issue 3, pp. 35–40, 2013.
4. F. Forster. The Idea behind Business Process Improvement: Toward a Business Process Improvement Pattern Framework. BP Trends, pp. 1–14, 2006.
5. B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag New York Inc., 1997.
6. D. Georgakopoulos, M. Hornick, and A. Sheth. An Overview of Workflow Management: From Process Modeling to Workflow Automation Infrastructure. Distributed and Parallel Databases, vol. 3, issue 2, pp. 119–153, 1995.
7. J. Mendling and M. Nüttgens. EPC markup language (EPML): an XML-based interchange format for event-driven process chains (EPC). Information Systems and e-Business Management, vol. 4, issue 3, pp. 245–263, 2006.
8. H. A. Reijers and S. L. Mansar. Best Practices in Business Process Redesign: An Overview and Qualitative Evaluation of Successful Redesign Heuristics. Omega, vol. 33, issue 4, pp. 283–306, 2005.
9. A. Shtub and R. Karni. ERP - The Dynamics of Supply Chain and Process Management. Springer, 2010.
10. M. Song and W. van der Aalst. Towards Comprehensive Support for Organizational Mining. Decis. Support Syst., vol. 46, issue 1, pp. 300–317, 2008.
11. W. van der Aalst et al. Process Mining Manifesto. In F. Daniel, K. Barkaoui and S. Dustdar, editors, Business Process Management Workshops, Lecture Notes in Business Information Processing, vol. 99, pp. 169–194, 2012.
12. W. van der Aalst, A. ter Hofstede, and M. Weske. Business Process Management: A Survey. In W. van der Aalst and M. Weske, editors, Business Process Management, Lecture Notes in Computer Science, vol. 2678, pp. 1–12, 2003.
13. W. van der Aalst. Process Mining - Discovery, Conformance and Enhancement of Business Processes. Springer, 2011.
14. W. van der Aalst, H. Reijers, and M. Song. Discovering Social Networks from Event Logs. Comput. Supported Coop. Work, vol. 14, issue 6, pp. 549–593, 2005.
15. W. van der Aalst, B. van Dongen, J. Herbst, L. Maruster, G. Schimm, and A. Weijters. Workflow Mining: A Survey of Issues and Approaches. Data Knowl. Eng., vol. 47, issue 2, pp. 237–267, 2003.
16. B. van Dongen, A. Alves de Medeiros, and L. Wen. Process Mining: Overview and Outlook of Petri Net Discovery Algorithms. In K. Jensen and W.
van der Aalst, editors, Transactions on Petri Nets and Other Models of Concurrency II, Lecture Notes in Computer Science, vol. 5460, pp. 225–242, 2009. 17. H. Verbeek, J. Buijs, B. van Dongen, and W. van der Aalst. XES, XESame, and ProM 6. In P. Soffer and E. Proper, editors, Information Systems Evolution, Lecture Notes in Business Information Processing, vol. 72, pp. 60–75, 2011. Merging Closed Pattern Sets in Distributed Multi-Relational Data Hirohisa Seki⋆ and Yohei Kamiya Dept. of Computer Science, Nagoya Inst. of Technology, Showa-ku, Nagoya 466-8555, Japan seki@nitech.ac.jp Abstract. We consider the problem of mining closed patterns from multi-relational databases in a distributed environment. Given two lo- cal databases (horizontal partitions) and their sets of closed patterns (concepts), we generate the set of closed patterns in the global database by utilizing the merge (or subposition) operator, studied in the field of Formal Concept Analysis. Since the execution times of the merge opera- tions increase with the increase in the number of local databases, we pro- pose some methods for improving the merge operations. We also present some experimental results using a distributed computation environment based on the MapReduce framework, which shows the effectiveness of the proposed methods. Key Words: multi-relational data mining, closed patterns, merge (sub- position) operator, FCA, distributed databases, MapReduce 1 Introduction Multi-relational data mining (MRDM) has been extensively studied for more than a decade (e.g., [7, 8] and references therein), and is still attracting increas- ing interest in the fields of data mining (e.g., [14, 29]) and inductive logic pro- gramming (ILP). In the framework of MRDM, data and patterns (or queries) are represented in the form of logical formulae such as datalog (a class of first order logic). This expressive formalism of MRDM allows us to use complex and structured data in a uniform way, including trees and graphs in particular, and multi-relational patterns in general. On the other hand, Formal Concept Analysis (FCA) has been developed as a field of applied mathematics based on a clear mathematization of the notions of concept and conceptual hierarchy [11]. While it has attracted much interest from various application areas including, among others, data mining, knowledge acquisition and software engineering (e.g., [12]), research on extending the capa- bilities of FCA for AI (Artificial Intelligence) has recently been attracted much attention [20]. ⋆ This work was partially supported by JSPS Grant-in-Aid for Scientific Research (C) 24500171. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 71–83, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 72 Hirohisa Seki and Yohei Kamiya The notion of iceberg query lattices, proposed by Stumme [30], combines the notions of MRDM and FCA; frequent datalog queries in MRDM correspond to iceberg concept lattices (or frequent closed itemsets) in FCA. Ganter and Kuznetsov [10] have extensively studied the framework of more expressive pat- tern structures. In MRDM, condensed representations such as closed patterns and free patterns have been also studied in c-armr by De Raedt and Ramon [6], and in RelLCM2 by Garriga et al. [13]. We consider in this paper the problem of mining closed patterns (or queries) in multi-relational data, particularly applying the notion of iceberg query lat- tices to a distributed mining setting. 
The assumption that a given dataset is distributed and stored at different sites is reasonable in situations where we cannot move the local datasets to a centralized site because of their size and/or privacy concerns. Given two local databases (horizontal partitions) and their sets of closed patterns (concepts), the set of closed patterns in the global database can be constructed by using the subposition operator [11, 33] or the merge operator [23]. From our preliminary experiments [28] using a distributed computation environment based on MapReduce [3], we have found that the execution times of the merge operations increase with the number of local databases. In this paper, we therefore propose some methods for computing the merge operations so that we can efficiently construct the set of global closed patterns from the sets of local closed patterns. Our methods are based on the properties of the merge operator.

The organization of the rest of this paper is as follows. After summarizing some basic notations and definitions of closed pattern mining in MRDM in Sect. 2, we consider distributed closed pattern mining in MRDB and the merge operator in Sect. 3. We then explain our approach to improving the merge operations in Sect. 4. In Section 5, we show the effectiveness of our methods by some experimental results. Finally, we give a summary of this work in Section 6.

2 Iceberg Query Lattices in Multi-Relational Data Mining

2.1 Multi-Relational Data Mining

In the task of frequent pattern mining in multi-relational databases, we assume that we have a given database r, a language of patterns, and a notion of frequency which measures how often a pattern occurs in the database. We use datalog, or Prolog without function symbols other than constants, to represent data and patterns. We assume some familiarity with the notions of logic programming (e.g., [22, 24]), although we introduce some notions and terminology in the following.

Example 1. Consider a multi-relational database r in Fig. 1 (above), which consists of five relations, Customer, Parent, Buys, Male and Female. For each relation, we introduce a corresponding predicate, i.e., customer, parent, buys, male and female, respectively. The relations are Customer = {allen, carol, diana, fred}, Parent = {(allen, bill), (allen, jim), (carol, bill), (diana, eve), (fred, eve), (fred, hera)}, Buys = {(allen, pizza), (carol, pizza), (diana, cake), (fred, cake)}, Male = {bill, jim, fred} and Female = {eve, hera}.

Fig. 1. An Example of Datalog Database r with customer relation as a key (above) and the Iceberg Query Lattice Associated to r (below), where a substitution θ = {X/t1, Y/t2} (resp., θ = {X/t1}) is simply denoted by (t1, t2) (resp., t1), and the name (e.g., allen) of each person in the tables is abbreviated to its first character (e.g., a). The lattice consists of the queries key(X); key(X), parent(X, Y); key(X), buys(X, pizza); key(X), buys(X, cake); key(X), buys(X, pizza), parent(X, Y), male(Y); and key(X), buys(X, cake), parent(X, Y), female(Y), whose answer sets are {a, c, d, f}, {(a, b), (a, j), (c, b), (d, e), (f, e), (f, h)}, {a, c}, {d, f}, {(a, b), (a, j), (c, b)} and {(d, e), (f, e), (f, h)}, respectively.

Consider the following pattern P = customer(X), parent(X, Y), buys(X, pizza). For a substitution θ, Pθ is logically entailed by r, denoted by r |= Pθ, if there exists a tuple (a1, a2) such that a1 ∈ Customer, (a1, a2) ∈ Parent, and the tuple (a1, pizza) ∈ Buys.
Then, answerset(P, r) = {{X/allen, Y /bill }, {X/allen, Y /jim}, {X/carol , Y /bill }}. 2 An atom (or literal ) is an expression of the form p(t1 , . . . .tn ), where p is a predicate (or relation) of arity n, denoted by p/n, and each ti is a term, i.e., a constant or a variable. A substitution θ = {X1 /t1 , . . . , Xn /tn } is an assignment of terms to variables. The result of applying a substitution θ to an expression E is the expression Eθ, where all occurrences of variables Vi have been simultaneously replaced by the corresponding terms ti in θ. The set of variables occurring in E is denoted by Var (E). A pattern is expressed as a conjunction of atoms (literals) l1 ∧· · ·∧ln , denoted simply by l1 , . . . , ln . A pattern is sometimes called a query. We will represent conjunctions in list notation, i.e., [l1 , . . . , ln ]. For a conjunction C and an atom p, we denote by [C, p] the conjunction that results from adding p after the last element of C. 74 Hirohisa Seki and Yohei Kamiya Let C be a pattern (i.e., a conjunction) and θ a substitution of Var (C). When Cθ is logically entailed by a database r, we write it by r |= Cθ. Let answerset(C, r) be the set of substitutions satisfying r |= Cθ. In multi-relational data mining, one of the predicates is often specified as a key (or target) (e.g., [4, 6]), which determines the entities of interest and what is to be counted. The key (target) is thus to be present in all patterns considered. In Example 1, the key is predicate customer . Let r be a database and Q be a query containing a key atom key(X). Then, the support (or frequency) of Q, denoted by supp(Q, r, key), is defined to be the number of different keys that answer Q (called the support count or abso- lute support), divided by the total number of keys. Q is said to be frequent, if supp(Q, r, key) is no less than some user defined threshold min sup. A pattern containing a key will not be always meaningful; for example, let C = [customer (X), parent(X, Y ), buys(Z, pizza)] be a conjunction in Example 1. Variable Z in C is not linked to variable X in key atom customer (X); an object represented by Z will have nothing to do with key object X. It will be inap- propriate to consider such a conjunction as an intended pattern to be mined. In ILP, the following notion of linked literals [16] is used to specify the so-called language bias. Definition 1 (Linked Literal). [16] Let key(X) be a key atom and l a literal. l is said to be linked to key(X), if either X ∈ Var (l) or there exists a literal l1 such that l is linked to key(X) and Var (l1 ) ∩ Var (l) ̸= ∅. 2 Given a database r and a key atom key(X), we assume that there are pre- defined finite sets of predicate (resp. variables; resp. constant symbols), and that, for each literal l in a conjunction C, it is constructed using the predefined sets. Moreover, each pattern C of conjunctions satisfies the following conditions: key(X) ∈ C and, for each l ∈ C, l is linked to key(X). In the following, we de- note by Q the set of queries (or patterns) satisfying the above bias condition. 2.2 Iceberg Query Lattices with Key We now consider the notion of a formal context in MRDM, following [30]. Definition 2. [30] Let r be a datalog database and Q a set of datalog queries. The formal context associated to r and Q is defined by Kr, Q = (Or, Q , Ar, Q , Ir, Q ), where Or, Q = {θ | θ is a grounding substitution for all Q ∈ Q}, and Ar, Q = Q, and (θ, Q) ∈ Ir, Q if and only if θ ∈ answerset(Q, r). 
2 From this formal context, we can define the concept lattice the same way as in [30]. We first introduce an equivalence relation ∼r on the set of queries: Two queries Q1 and Q2 are said to be equivalent with respect to database r if and only if answerset(Q1 , r) = answerset(Q2 , r). We note that Var (Q1 ) = Var (Q2 ) when Q1 ∼r Q2 . Merging Closed Pattern Sets in Distributed Multi-Relational Data 75 Definition 3 (Closed Query). Let r be a datalog database and ∼r the equiv- alence relation on a set of datalog queries Q. A query (or pattern) Q is said to be closed (w.r.t. r and Q), iff Q is the most specific query among the equivalence class to which it belongs: {Q1 ∈ Q | Q ∼r Q1 }. 2 For any query Q1 , its closure is a closed query Q such that Q is the most specific query among {Q ∈ Q | Q ∼r Q1 }. Since it uniquely exists, we denote it by Clo(Q1 ; r). We note again that Var (Q1 ) = Var (Clo(Q1 ; r)) by definition. We refer to this as the range-restricted condition here. Stumme [30] showed that the set of frequent closed queries forms a lattice, called an iceberg query lattice. In our framework, it is necessary to take our bias condition into consideration. To do that, we employ the well-known notion of the most specific generalization (or least generalization) [26, 24]. For queries Q1 and Q2 , we denote by lg(Q1 , Q2 ) the least generalization of Q1 and Q2 . Moreover, the join of Q1 and Q2 , denoted by Q1 ∨ Q2 , is defined as: Q1 ∨ Q2 = lg(Q1 , Q2 )|Q , where, for a query Q, Q|Q is the restriction of Q to Q, defined by a conjunction consisting of every literal l in Q which is linked to key(X), i.e., deleting every literal in Q not linked to key(X). Definition 4. [30] Let r be a datalog database and Q a set of datalog queries. The iceberg query lattice associated to r and Q for minsupp ∈ [0, 1] is defined as: Cr, Q = ({Q ∈ Q | Q is closed w.r.t. r and Q, and Q is frequent}, |=), where |= is the usual logical implication. 2 Example 2. Fig. 1 (below) shows the iceberg query lattice associated to r in Ex. 1 and Q with the support count 1, where each query Q ∈ Q has customer (X) as a key atom, denoted by key(X) for short, Q is supposed to contain at most two variables (i.e., X, Y ), and the 2nd argument of predicate buys is a constant. 2 Theorem 1. [28] Let r be a datalog database and Q a set of datalog queries where all queries contain an atom key and they are linked. Then, Cr, Q is a ∨-semi-lattice. 2 3 Distributed Closed Pattern Mining in MRDB Horizontal Decomposition of MRDB and Mining Local Concepts Our purpose in this work is to mine global concepts in a distributed setting, where a global database is supposed to be horizontally partitioned appropriately, and stored possibly in different sites. We first consider the notion of a horizontal decomposition of a multi-relational DB. Since a multi-relational DB consists of multiple relations, its horizontal decomposition is not immediately clear. Definition 5. Let r be a multi-relational datalog database with a key pred- icate key. We call a pair r1 , r2 a horizontal decomposition of r, if (i) keyr = keyr1 ∪· keyr2 , i.e., the key relation keyr in r is disjointly decomposed into keyr1 and keyr2 in r1 and r2 , respectively, and (ii) for any query Q, answerset(Q, r) = answerset(Q, r1 ) ∪ answerset(Q, r2 ). 
2 76 Hirohisa Seki and Yohei Kamiya The second condition in the above states that the relations other than the key relation in r are decomposed so that any answer substitution in answerset(Q, r) is computed either in partition r1 or r2 , thereby being preserved in this horizon- tal decomposition. An example of a horizontal decomposition of r is shown in Example 3 below. Given a horizontal decomposition of a multi-relational DB, we can utilize any preferable concept (or closed pattern) mining algorithm for computing local concepts on each partition, as long as the mining algorithm is applicable to MRDM and its resulting patterns satisfy our bias condition. We use here an algorithm called ffCLM [27], which is based on the notion of closure extension due to Pasquier et al. [25] and Uno et al. [32] in frequent itemset mining. Computing Global Closed Patterns by Merge Operator in MRDM To compute the set of global closed patterns from the sets of local closed patterns in MRDM, we need the following merge operator ⊕. For patterns C1 and C2 , we denote by C1 ∩ C2 a possibly empty conjunction of the form: l1 ∧ · · · ∧ lk (k ≥ 0) such that, for each li (i ≤ k), li ∈ C1 and li ∈ C2 . Theorem 2. [28] Let r be a datalog database, and r1 , r2 a horizontal decomposi- tion of r. Let C (Ci ) (i = 1, 2) be the set of closed patterns of r (ri ), respectively. Then, we have the following: C = C1 ⊕ C2 = (C1 ∪ C2 ) ∪ {C1 ∩ C2 | C1 ∈ C1 , C2 ∈ C2 , C1 ∩ C2 is linked with key.} (1) The set of global closed patterns C is obtained by the union of the local closed patterns C1 and C2 , and, in addition to that, by intersecting each pat- tern C1 ∈ C1 and C2 ∈ C2 . Furthermore, the pattern obtained by the in- tersection, C1 ∩ C2 , should satisfy the bias condition (Def. 1). We note that C1 ∩ C2 does not necessarily satisfy the linkedness condition; for example, sup- pose that C1 (C2 ) is a closed pattern of the form: C1 = key(X), p(X, Y ), m(Y ) (C2 = key(X), q(X, Y ), m(Y )), respectively. Then, C1 ∩ C2 = key(X), m(Y ), which is not linked to key(X), and thus does not satisfy the bias condition. We note that, in the case of transaction databases, the above theorem coin- cides with the one by Lucchese et al. [23]. Example 3. We consider a horizontal decomposition r1 , r2 of r in Example 1 such that the key relation keyr (i.e., Customer) in r is decomposed into keyr1 = {allen, carol} and keyr2 = {dian, fred}, and the other relations than Customer are decomposed so that they satisfy the second condition of Def. 5. Consider a globally closed pattern C = [key(X), parent(X, Y )] in Fig. 1. In r1 , there exists a closed pattern C1 of the form: [C, buys(X, pizza), male(Y )], while, in r2 , there exists a closed pattern C2 of the form: [C, buys(X, cake), female(Y )]. Then, we have that C coincides with C1 ∩ C2 . 2 Merging Closed Pattern Sets in Distributed Multi-Relational Data 77 We can now formulate our problem as follows: Mining Globally Closed Patterns from Local DBs: Input: A set of local databases {DB 1 , . . . , DB n } Output: the set of global closed patterns C1..n . In order to compute C1..n , our approach consists of two phases: we first com- pute each set Ci (i = 1, . . . , n) of local closed patterns from DB i , and then we compute C1..n by applying the merge operators. We call the first phase the mining phase, while we call the second phase the merge phase. 
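Before turning to efficiency, the merge operator of Theorem 2 can be sketched in a few lines of Python. The sketch below is only illustrative and not the authors' implementation: patterns are represented as plain sets of (predicate, arguments) atoms, the key predicate is assumed to be customer as in Example 1, and equivalence modulo variable renaming as well as closure with respect to the local databases are not handled.

def variables(atom):
    # convention of this sketch: variables are capitalised, constants are lower case
    return {t for t in atom[1] if t[:1].isupper()}

def linked_with_key(pattern, key_pred="customer"):
    # bias condition of Definition 1: every literal must be linked to key(X)
    key_atoms = [a for a in pattern if a[0] == key_pred]
    if not key_atoms:
        return False
    reached = set().union(*(variables(a) for a in key_atoms))
    changed = True
    while changed:                        # propagate linkedness over shared variables
        changed = False
        for a in pattern:
            if variables(a) & reached and not variables(a) <= reached:
                reached |= variables(a)
                changed = True
    return all(a[0] == key_pred or variables(a) & reached for a in pattern)

def merge(local1, local2):
    # C1 (+) C2 of Theorem 2; duplicates modulo variable renaming are not removed
    out = set(local1) | set(local2)
    for c1 in local1:
        for c2 in local2:
            inter = c1 & c2
            if inter and linked_with_key(inter):
                out.add(inter)
    return out

key = ("customer", ("X",))
c1 = frozenset({key, ("parent", ("X", "Y")), ("buys", ("X", "pizza")), ("male", ("Y",))})
c2 = frozenset({key, ("parent", ("X", "Y")), ("buys", ("X", "cake")), ("female", ("Y",))})

merged = merge({c1}, {c2})
print(frozenset({key, ("parent", ("X", "Y"))}) in merged)   # True, cf. Example 3
print(linked_with_key(frozenset({key, ("m", ("Y",))})))     # False: key(X), m(Y) is not linked

On the two local closed patterns of Example 3 the intersection key(X), parent(X, Y) is linked and therefore kept, whereas a pattern such as key(X), m(Y) fails the bias condition and is discarded, as discussed after Theorem 2.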
4 Making Merge Computations Efficient in MRDM In the merge operation in conventional data mining such as itemsets, comput- ing the intersection of two sets in the merge operation ⊕ is straightforward. In MRDM, on the other hand, the computation of ⊕ operator becomes somewhat involved due to handling variables occurring in patterns. Namely, two additional tests are required: checking the bias condition (linkedness), and checking equiv- alence modulo variable renaming for eliminating duplicate patterns. For closed patterns C1 and C2 , we must check whether the intersection C1 ∩C2 satisfies the linkedness condition. Moreover, we must check whether C1 ∩ C2 is equivalent (modulo variable renaming) to the other patterns obtained so far. For example, let C1 (C2 ) be a pattern of the form: C1 = key(X), p(X, Y ), m(Y ) (C2 = key(X), p(X, Z), m(Z)), respectively. Then, C1 is equivalent to C2 modulo variable renaming. When implementing a data mining system, such handling variables in pat- terns will necessarily require string manipulations, and such string operations would lead to undesirable overhead in actual implementation. In the following, we therefore propose two methods for reducing the computational costs in the merge operation. 4.1 Partitioning Pattern Sets When computing the merge operation, we can use the following property: Proposition 1. Let DB = DB1 ∪ DB2 , and C (Ci ) the set of closed patterns of DB (DBi ) (i = 1, 2), respectively. Then, C = C1 ⊕ C2 = (C1 ∪ C2 ) ∪ {C1 ∩ C2 | (C1 , C2 ) ∈ (C1 , C2 ) , C1 ∩ C2 : linked with key, Var (C1 ) = Var (C2 )} (2) Proof. Let C be a closed pattern in C such that C is linked with key. From Theorem 2, it suffices to show that there exist patterns Ci ∈ Ci (i = 1, 2) such that C = C1 ∩ C2 and Var (C1 ) = Var (C2 ). Let Ci = Clo(C; DB i ) (i = 1, 2). Then, we have from the definition of Clo(·; ·) that Var (C) = Var (C1 ) = Var (C2 ). Moreover, we can show that C = C1 ∩ C2 , which is to be proved. 2 78 Hirohisa Seki and Yohei Kamiya From the above proposition, when computing the intersection of each pair of patterns C1 ∈ C1 and C2 ∈ C2 in (1), we can perform the intersection of only those pairs (C1 , C2 ) containing the same set of variables, i.e., Var (C1 ) = Var (C2 ). When compared with the original definition of the merge operator ⊕ (Theorem 2), the above property will be utilized to reduce the cost of the merge operations. 4.2 Merging Diff-Sets Next, we consider another method for making the merge operation efficient, which is based on the following simple observation: Observation 1. Given sets of closed patterns C1 and C2 , let D1 = C1 \ C2 and D2 = C2 \C1 , namely, Di is a difference set (diff-set for short) (i = 1, 2). Suppose that C is a new (or generator [33]) pattern in C1 ⊕ C2 , meaning that C ∈ C1 ⊕ C2 , while C ̸∈ C1 ∪C2 . Then, C is obtained by intersection operation, i.e., C = C1 ∩C2 for some patterns C1 ∈ D1 and C2 ∈ D2 . That is, a new closed pattern C will be generated only when intersecting those patterns in the difference sets in D1 and D2 . This fact easily follows from the property that the set of closed patterns is a semi-lattice: suppose otherwise that C1 ∈ D1 , while C2 ̸∈ D2 . Then, C2 ∈ C1 . Since both C1 and C2 are in C1 , we have that C = C1 ∩ C2 is a closed pattern also in C1 , which implies that C is not a new pattern. Algorithm 1 shows the above-mentioned method based on the difference sets. In the algorithm, the computation of supports (or occurrences) is omitted, which is done similarly in [33]. 
Algorithm 1: Diff-Set Merge(C1 , C2 ) input : sets of closed patterns C1 , C2 output: C1..2 = C1 ⊕ C2 1 C = C1 ∩ C2 ; D1 = C1 \ C2 ; D2 = C2 \ C1 ; 2 foreach pair (C1 , C2 ) ∈ D1 × D2 do 3 C ← C1 ∩ C2 ; 4 if C satisfies the bias condition and C ̸∈ C then 5 C ← C ∪ {C}; 6 end 7 end 8 return C 5 Experimental Results Implementation and Test Data To see the effectiveness of our approach to distributed mining, we have made some experiments. As for the mining phase, we implemented our approach by Merging Closed Pattern Sets in Distributed Multi-Relational Data 79 using Java 1.6.0 22. Experiments of the phase were performed on 8 PCs with Intel Core i5 processors running at 2.8GHz, 8GB of main memory, and 8MB of L2 cache, working under Ubuntu 11.04. We used Hadoop 0.20.2 using 8 PCs, and 2 mappers working on each PC. On the other hand, experiments of the merging phase were performed on one of the PCs. We use two datasets, often used in the field of ILP; one is the mutagenesis dataset1 , and the other is an English corpus of the Penn Treebank Project2 . The mutagenesis dataset, for example, contains 30 chemical compounds. Each compound is represented by a set of facts using predicates such as atom, bond , for example. The size of the set of predicate symbols is 12. The size of key atom (active(X )) is 230, and minimum support min sup = 1/230. We assume that patterns contain at most 4 variables and they contain no constant symbols. The number of the closed patterns mined is 5, 784. Effect of Partitioning Pattern Sets Fig. 2 (left) summarizes the results of the execution times for a test data on the mutagenesis dataset. We can see from the figure that the execution times t1 of the mining phase are reduced almost linearly with the number of partitions. On the other hand, the execution times t2 of the merging phase for obtaining global closed patterns increase almost linearly with the number p of partitions from 1 (i.e., no partitioning) to 16. This is reasonable; the number of applying the merge operators is (p − 1) when we have p partitions. Note that the execution time for the merge phase in the case of a single partition means some start-up overheads such as opening/reading a file of the results of the mining phase, followed by preparing the inputs of the merge operation. In this particular example, the time spent in the merge phase is relatively small when compared with that for the mining phase. This is because the number of partitions and the number of local closed patterns are rather small. When the number of partitions of a global database becomes larger, however, the execution times for the merging phase will become inevitably larger. Considering efficient merge algorithms is thus an important issue for scalability in MRDM. To see the effect of using Proposition 1, Fig. 2 (right) shows the numbers of closed patterns in a merge computation C1 ⊕ C2 with input sets C1 , C2 of closed patterns for the mutagenesis dataset with 16 partitions. Each table shows the number of patterns in Ci (i = 1, 2) containing k variables for 1 ≤ k ≤ 4. The number of computing intersection operations based on Proposition 1 has been reduced to about 80% of that of the original computation. The execution times in Fig. 2 (left) are the results obtained by using this method. 1 http://www.cs.ox.ac.uk/activities/machlearn/mutagenesis.html 2 http://www.cis.upenn.edu/ treebank/ 80 Hirohisa Seki and Yohei Kamiya Fig. 2. Execution Times of the Mining Phase and the Merge Phase (left) and No. 
of Patterns in a Merge Computation (right): An Example in the Mutagenesis Dataset. Each number in a quadrangle is the size of a closed pattern set. D1 = C1 \ C2 and D2 = C2 \ C1 . Effect of Merging Diff-Sets Fig. 3 shows its performance results (the execution times), compared with the naive method, using the same datasets, the mutagenesis (left) and the English corpus (right). In both datasets, the execution times decrease as the number n of the local DBs increases; in particular, when n = 16 in the mutagenesis data set, the execution time is reduced to about 43% of that of the naive method. To see the reason of this results, Fig. 2 (right) shows the sizes of the difference sets D1 and D2 used in the merge computation C1 ⊕ C2 with input sets C1 , C2 of the closed patterns. Fig. 3. Results of the Diff-Sets Merge Method: The Mutagenesis Dataset (left) and The English Corpus (right) Merging Closed Pattern Sets in Distributed Multi-Relational Data 81 6 Concluding Remarks We have considered the problem of mining closed patterns from multi-relational databases in a distributed environment. For that purpose, we have proposed two methods for making the merge (or subposition) operations efficient, and we have then exemplified the effectiveness of our method by some preliminary experi- mental results using MapReduce/Hadoop distributed computation framework in the mining process. In MRDM, efficiency and scalability have been major concerns [2]. Krajca et al. [17, 18] have proposed algorithms to compute search trees for closed patterns simultaneously either in parallel or in a distributed manner. Their approaches are orthogonal to ours; it would be beneficial to employ their algorithms for computing local closed patterns in the mining phase in our framework. In this work, we have confined ourselves to horizontal partitions of a global MRDB. It will be interesting to study vertical partitioning and their mixture in MRDM, where the apposition operator studied by Valtchev et al. [34] will play an important role. As future work, our plan is to develop an efficient algorithm dealing with such a general case in MRDM. Acknowledgement The authors would like to thank anonymous reviewers for their useful comments on the previous version of the paper. References 1. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. in Proc. VLDB Conf., pp. 487–499, 1994. 2. Blockeel, H., Sebag, M.: Scalability and efficiency in multi-relational data mining. SIGKDD Explorations Newsletter 2003, Vol.4, Issue 2, pp.1-14, 2003. 3. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM, Vol. 51, No. 1, pp.107–113, 2008. 4. Dehaspe, L.: Frequent pattern discovery in first-order logic, PhD thesis, Dept. Com- puter Science, Katholieke Universiteit Leuven, 1998. 5. Dehaspe, L., Toivonen, H.: Discovery of Relational Association Rules. in S. Dzeroski and N Lavrac (eds.) Relational Data Mining, pp. 189–212, Springer, 2001. 6. De Raedt, L., Ramon, J.: Condensed representations for Inductive Logic Program- ming. in Proc. KR’04, pp. 438-446, 2004. 7. Dzeroski, S.: Multi-Relational Data Mining: An Introduction. SIGKDD Explo- rations Newsletter 2003, Vol.5, Issue 1, pp.1-16, 2003. 8. Dzeroski, S., Lavrač, N. (eds.): Relational Data Mining. Springer-Verlag, Inc. 2001. 9. Ganter, B.: Two Basic Algorithms in Concept Analysis, Technical Report FB4- Preprint No. 831, TH Darmstadt, 1984. also in Formal Concept Analysis, LNCS 5986, pp. 312-340, Springer, 2010. 10. 
Ganter, B., Kuznetsov, S.: Pattern structures and Their Projections, ICCS-01, LNCS, 2120, pp. 129-142, 2001. 11. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, 1999. 82 Hirohisa Seki and Yohei Kamiya 12. Ganter, B., Stumme, G., Wille, R.: Formal Concept Analysis, Foundations and Applications. LNCS 3626, Springer, 2005. 13. Garriga,G. C., Khardon, R., De Raedt, L.: On Mining Closed Sets in Multi- Relational Data. in Proc. IJCAI 2007, pp.804-809, 2007. 14. Goethals, B., Page, W. L., Mampaey, M.: Mining Interesting Sets and Rules in Relational Databases. in Proc. 2010 ACM Sympo. on Applied Computing (SAC ’10), pp. 997-1001, 2010. 15. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edition, Morgan Kaufmann Publishers Inc., 2005. 16. Helft, N.: Induction as nonmonotonic inference. in Proc. KR’89, pp. 149–156, 1989. 17. Krajca, P., Vychodil, V.: Distributed Algorithm for Computing Formal Concepts Using Map-Reduce Framework, in Proc. IDA ’09, Springer-Verlag, pp. 333–344, 2009. 18. Krajca, P., Outrata, J., Vychodil, V.: Parallel algorithm for computing fixpoints of Galois connections, Annals of Mathematics and Artificial Intelligence, Vol. 59, No. 2, pp. 257–272, Kluwer Academic Publishers, 2010. 19. Kuznetsov, S. O.: A Fast Algorithm for Computing All Intersections of Objects in a Finite Semi-lattice, Automatic Documentation and Mathematical Linguistics, Vol. 27, No. 5, pp. 11-21, 1993. 20. Kuznetsov, S. O., Napoli, A., Rudolph, S., eds: FCA4AI: “What can FCA do for Artificial Intelligence?” IJCAI 2013 Workshop, Beijing, China, 2013. 21. Kuznetsov, S. O., Obiedkov, S. A.: Comparing performance of algorithms for gen- erating concept lattices. J. Exp. Theor. Artif. Intell., 14(2-3):189-216, 2002. 22. Lloyd, J. W.: Foundations of Logic Programming, Springer, Second edition, 1987. 23. Lucchese, C., Orlando, S., Rergo, R.: Distributed Mining of Frequent Closed Item- sets: Some Preliminary Results. International Workshop on High Performance and Distributed Mining, 2005. 24. Nienhuys-Cheng, S-H., de Wolf, R.: Foundations of Inductive Logic Programming, LNAI 1228, Springer, 1997. 25. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering Frequent Closed Itemsets for Association Rules. in Proc. ICDT’99, LNAI 3245, pp. 398-416, 1999. 26. Plotkin, G.D.: A Note on Inductive Generalization. Machine Intelligence, Vol. 5, pp. 153-163, 1970. 27. Seki, H., Honda, Y., Nagano, S.: On Enumerating Frequent Closed Patterns with Key in Muti-relational Data. LNAI 6332, pp. 72-86, 2010. 28. Seki, H., Tanimoto, S.: Distributed Closed Pattern Mining in Multi-Relational Data based on Iceberg Query Lattices: Some Preliminary Results. in Proc. CLA’12, pp.115-126, 2012 29. Spyropoulou, E., De Bie. T., Boley, M.: Interesting Pattern Mining in Multi- Relational Data. Data Min. Knowl. Discov. 42(2), pp. 808-849, 2014. 30. Stumme, G.: Iceberg Query Lattices for Datalog. In Conceptual Structures at Work, LNCS 3127, Springer-Verlag, pp. 109-125, 2004. 31. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing Iceberg Concept Lattices with Titanic. J. on Knowledge and Data Engineering (KDE) 42(2), pp. 189-222, 2002. 32. Uno, T., Asai, T. Uchida, Y., Arimura, H.: An Efficient Algorithm for Enumerating Closed Patterns in Transaction Databases. DS’04, LNAI 3245, pp. 16-31, 2004. 33. Valtchev, P., Missaoui, R.: Building Concept (Galois) Lattices from Parts: Gener- alizing the Incremental Methods. In Proc. 9th Int’l. Conf. 
on Conceptual Structures: Broadening the Base (ICCS ’01), Springer-Verlag, London, UK, pp. 290-303, 2001. 34. Valtchev, P., Missaoui, R., Pierre Lebrun, P.: A Partition-based Approach towards Constructing Galois (Concept) Lattices. Discrete Mathematics 256(3): 801-829, 2002. Looking for bonds between nonhomogeneous formal contexts Ondrej Krı́dlo, Lubomir Antoni, Stanislav Krajči University of Pavol Jozef Šafárik, Košice, Slovakia? Abstract. Recently, the concept lattices working with the heteroge- neous structures have been fruitfully applied in a fuzzy formal concept analysis. We present a situation under nonhomogeneous formal contexts and explore the bonds in a such nonhomogeneous case. This issue requires to formulate the alternative definition of a bond and to investigate the relationships between bonds and the particular formal contexts. Keywords: bond, heterogeneous formal context, second order formal context 1 Introduction Formal concept analysis (FCA) [16] as an applied lattice theory allows us to explore the meaningful groupings of objects with respect to common attributes. In general, FCA is an interesting research area that provides theoretical foun- dations, fruitful methods, algorithms and underlying applications in many areas and has been investigated in relation to various disciplines and integrated ap- proaches [13,15]. The feasible attempts and generalizations are investigated, one can see dual multi-adjoint concept lattices working with adjoint triples [27–29], interval-valued L-fuzzy concept lattices [1], heterogeneous concept lattices [2, 3], connectional concept lattices [12, 32, 33]. Classical bonds and their generaliza- tions acting on residuated lattices were analyzed from a broader perspective in [17, 21, 24]. In this paper, we deal with an alternative notion of the bonds and with a problem of looking for bonds in a nonhomogeneous formal contexts. In particular, Section 2 recalls the basic notions of a concept lattice, notion of a bond, its equivalent definition and preliminaries of a second order formal context and a heterogeneous formal context. Section 3 describes the idea of a looking for bonds in a nonhomogeneous case. Sections 4 and 5 provide the solution of this issue in terms of a second order formal context and heterogeneous formal context. ? This work was partly supported by grant VEGA 1/0832/12 and by the Slovak Re- search and Development Agency under contract APVV-0035-10 “Algorithms, Au- tomata, and Discrete Data Structures”. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 83–95, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 84 Ondrej Krı́dlo, L’ubomı́r Antoni and Stanislav Krajči 2 Preliminaries Definition 1. Let B and A be the nonempty sets, R ⊆ B × A be an arbitrary binary relation. Triple hB, A, Ri is said to be a formal context with a set of objects B and a set of their attributes A. Relationships between objects and their attributes are saved in the relation R. Let us define a pair of derivation operators (↑, ↓) as the mappings between powersets of B and A such that – ↑: P(B) → P(A) and ↓: P(A) → P(B) where for any X ⊆ B and Y ⊆ A is – ↑ (X) = {a ∈ A|(∀b ∈ X)(b, a) ∈ R} – ↓ (Y ) = {b ∈ B|(∀a ∈ Y )(b, a) ∈ R}. 
Such derivation operators can be defined as the mappings between 2-sets (bor- rowed from fuzzy generalization of FCA that is sometimes easier to use) – ↑: 2B → 2A and ↓: 2A → 2B where for any X ∈ 2B and Y ∈ 2A V V – ↑ (X)(a) = b∈B ((b ∈ X) ⇒ ((b, a) ∈ R)) = b∈B (X(b) ⇒ R(b, a)) V V – ↓ (Y )(b) = a∈A ((a ∈ Y ) ⇒ ((b, a) ∈ R)) = a∈A (Y (a) ⇒ R(b, a)). Pair of such derivation operators forms an antitone Galois connection be- tween complete lattices of all subsets of B and A. Hence, the compositions of the mappings form closure operators on such complete lattices. Definition 2. Let C = hB, A, Ri be a formal context. Any pair of sets (X, Y ) ∈ 2B × 2A is said to be a formal concept iff X =↓ (Y ) and Y =↑ (X). Object part of any concept is called extent and attribute part is called intent. Set of all extents of formal context C will be denoted by Ext(C). The notation Int(C) stands for the set of all intents of C. All concepts ordered by set inclusion of extents (or equivalently by dual of intent inclusion) form a complete lattice structure. 2.1 Notion of bond and its equivalent definition Definition 3. Let Ci = hBi , Ai , Ri i for i ∈ {1, 2} be two formal contexts. Rela- tion β ⊆ B1 × A2 is said to be a bond iff any row of the table is an intent of C2 and any of its column is an extent of C1 . Set of all bonds between C1 and C2 will be denoted by 2-Bonds(C1 , C2 ). Lemma 1. Let Ci = hBi , Ai , Ri i for i ∈ {1, 2} be two formal contexts. Then β ⊆ B1 ×A2 is a bond between C1 and C2 if and only if Ext(hB1 , A2 , βi) ⊆ Ext(C1 ) and Int(hB1 , A2 , βi) ⊆ Int(C2 ). Proof. ⇒: Let X ∈ Ext(hB1 , A2 , βi) be an arbitrary extent of any bond between formal contexts C1 and C2 . Derivation operators of Ci will be denoted by (↑i , ↓i ) Looking for Bonds between Nonhomogeneous Formal Contexts 85 for i ∈ {1, 2}. Derivation operators of the bond will be denoted by (↑β , ↓β ). Then there exists a set of attributes Y ⊆ A2 such that ^ ↓β (Y )(b1 ) = (Y (a2 ) ⇒ β(b1 , a2 )) a2 ∈A2 β(−, a2 ) is an extent of Ext(C1 ) hence there exists Z ⊆ A1 ^ = (Y (a2 ) ⇒↓1 (Z)(b1 )) a2 ∈A2 ^ ^ = Y (a2 ) ⇒ Z(a1 ) ⇒ R1 (b1 , a1 ) a2 ∈A2 a1 ∈A1 ^ ^ = (Y (a2 ) ⇒ (Z(a1 ) ⇒ R1 (b1 , a1 ))) a2 ∈A2 a1 ∈A1 ^ ^ = ((Y (a2 ) ∧ Z(a1 )) ⇒ R1 (b1 , a1 )) a2 ∈A2 a1 ∈A1 ^ _ = Y (a2 ) ∧ Z(a1 ) ⇒ R1 (b1 , a1 ) a1 ∈A1 a2 ∈A2 ^ = (ZY (a1 ) ⇒ R1 (b1 , a1 )) a1 ∈A1 W =↓1 (ZY )(b1 ) where ZY (a1 ) = a2 ∈A2 (Y (a2 ) ∧ Z(a1 )) Hence, Ext(hB1 , A2 , βi) ⊆ Ext(C1 ). Similarly for intents. ⇐: Assume a formal context hB1 , A2 , βi such that it holds Ext(hB1 , A2 , βi) ⊆ Ext(C1 ) and Int(hB1 , A2 , βi) ⊆ Int(C2 ). From the simple fact that any row of any context is its intent and any column is its extent and from the previous inclusions, we obtain that β is a bond between C1 and C2 . t u Hence, the notion of bond can be defined equivalently as follows. Definition 4. Let Ci = hBi , Ai , Ri i for i ∈ {1, 2} be two formal contexts. Formal context B = hB1 , A2 , βi is said to be a bond between C1 and C2 if Ext(B) ⊆ Ext(C1 ) and Int(B) ⊆ Int(C2 ). More about the equivalent definition of bond could be found in [17–19]. 2.2 Direct product of two formal contexts and bonds Let us recall the definition and important property of direct product of two formal contexts. More details about such topic can be found in [21, 26]. Definition 5. Let Ci = hBi , Ai , Ri i be two formal contexts. 
Formal context C1 ∆C2 = hB1 × A2 , B2 × A1 , R1 ∆R2 i where (R1 ∆R2 )((b1 , a2 ), (b2 , a1 )) = R1 (b1 , a1 ) ∨ R2 (b2 , a2 ) = ¬R1 (b1 , a1 ) ⇒ R2 (b2 , a2 ) = ¬R2 (b2 , a2 ) ⇒ R1 (b1 , a1 ) 86 Ondrej Krı́dlo, L’ubomı́r Antoni and Stanislav Krajči for any (bi , ai ) ∈ Bi × Ai for all i ∈ {1, 2} is said to be a direct product of formal contexts C1 and C2 . Lemma 2. Let Ci = hBi , Ai , Ri i be two formal contexts. Every extent of C1 ∆C2 is a bond between C1 and C2 . 2.3 Second order formal contexts In this subsection, we remind a notion of a second order formal concept [24]. Definition S S 6. Consider two non-empty index sets I and J and a formal context h i∈I Bi , j∈J Aj , ri, whereby – Bi1 ∩ Bi2 = ∅ for any i1 , i2 ∈ I, i1 6= i2 , – Aj1S∩ Aj2 = ∅Sfor any j1 , j2 ∈ J, j1 6= j2 , – r : i∈I Bi × j∈J Aj → 2. Moreover, consider two non-empty sets of 2-contexts notated – {Ci = hBi , Ti , pi i : i ∈ I} – {Dj = hOj , Aj , qj i : j ∈ J}. Formal context of second order is a tuple D[ [ [ E Bi , {Ci ; i ∈ I}, Aj , {Dj ; j ∈ J}, ri,j , i∈I j∈J (i,j)∈I×J where ri,j : Bi × Aj → 2 defined as ri,j (b, a) = r(b, a) for any b ∈ Bi and a ∈ Aj . In what follows, consider the below describedSnotation. Let us have an L-set f : X → 2 for a non-empty universe set X = i∈I Xi , where Xi1 ∩ Xi2 = ∅ for any i1 , i2 ∈ I. Then f i : Xi → 2 is defined as f i (x) = f (x) for an arbitrary x ∈ Xi and i ∈ I. We define the mappings between direct products of two sets of concept lat- tices (that correspond to the two sets of 2-contexts given above) in the following form: Definition 7. Let us define the mappings h⇑, ⇓i as follows Y Y Y Y ⇑: Ext(Ci ) → Int(Dj ) and ⇓: Int(Dj ) → Ext(Ci ) i∈I j∈J j∈J i∈I ^ Y j i ⇑ (Φ) = ↑ij (Φ ), for any Φ ∈ Ext(Ci ) i∈I i∈I ^ Y ⇓ (Ψ )i = ↓ij (Ψ j ), for any Ψ ∈ Int(Dj ) j∈J j∈J such that (↑ij , ↓ij ) is a pair of derivation operators defined on hBi , Aj , ρij i where ^ ρij = {β ∈ 2-Bonds(Ci , Dj ) : (∀(bi , aj ) ∈ Bi × Aj )β(bi , aj ) ≥ rij (bi , aj )}. Looking for Bonds between Nonhomogeneous Formal Contexts 87 2.4 Heterogeneous formal contexts A heterogeneous extension in FCA based on the totally diversification of objects, attributes and table fields has been introduced in [3]. In the following, we remind the definition of a heterogeneous formal context and its derivation operators. Definition 8. Heterogeneous formal context is a tuple C = hB, A, P, R, U, V, i, where – B and A are non-empty sets, – P = {hPb,a , ≤Pb,a i : (b, a) ∈ B × A} is a system of posets, – R is a mapping from B × A such that R(b, a) ∈ Pb,a for any b ∈ B and a ∈ A, – U = {hUb , ≤Ub i : b ∈ B} and V = {hVa , ≤Va i : a ∈ A} are systems of complete latices, – = {◦b,a : (b, a) ∈ B × A} is a system of isotone and left-continuous mappings ◦b,a : Ub × Va −→ Pb,a . Let us define the derivation operators Q of a heterogeneous Q formal context Q as a pair Q of mappings (%, .), whereby %: b∈B Ub → a∈A Va and .: a∈A V a → b∈B Ub such that W Q – . (f )(a) = W {v ∈ Va |f (b) ◦b,a v ≤ R(b, a)} for any f ∈ Q b∈B Ub – % (g)(b) = {u ∈ Ub |u ◦b,a g(a) ≤ R(b, a)} for any g ∈ a∈A Va . 3 Problem description and sketch of solution In this section we discussed why we have proposed an equivalent definition of bond. First, consider the classical definition of bond. It is a binary relation (table) between objects and attributes from different contexts such that its rows are intents and columns are extents of different input contexts. The issue of looking for bonds in a classical or homogeneous fuzzy case can be solved successfully [17, 21]. 
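Before moving to the heterogeneous setting of Section 2.4, the crisp notions of Sections 2.1 and 2.2 can be checked computationally. The following brute-force Python sketch (illustrative only; the two toy contexts are arbitrary and not taken from the paper) builds the direct product of Definition 5 and verifies that each of its extents is a bond in the sense of Definition 3, i.e. that its rows are intents of C2 and its columns are extents of C1, as stated in Lemma 2.

from itertools import chain, combinations

def powerset(s):
    s = sorted(s)
    return chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))

def up(ctx, X):
    objects, attributes, relation = ctx
    return frozenset(a for a in attributes if all((b, a) in relation for b in X))

def down(ctx, Y):
    objects, attributes, relation = ctx
    return frozenset(b for b in objects if all((b, a) in relation for a in Y))

def extents(ctx):
    return {down(ctx, set(Y)) for Y in powerset(ctx[1])}

def intents(ctx):
    return {up(ctx, set(X)) for X in powerset(ctx[0])}

C1 = ({"b1", "b2"}, {"a1", "a2"}, {("b1", "a1"), ("b2", "a2")})
C2 = ({"c1", "c2"}, {"d1", "d2"}, {("c1", "d1"), ("c1", "d2")})

# direct product of Definition 5: objects B1 x A2, attributes B2 x A1,
# incidence R1(b1, a1) or R2(b2, a2)
objs = {(b1, d) for b1 in C1[0] for d in C2[1]}
atts = {(c, a1) for c in C2[0] for a1 in C1[1]}
rel = {((b1, d), (c, a1)) for (b1, d) in objs for (c, a1) in atts
       if (b1, a1) in C1[2] or (c, d) in C2[2]}
product = (objs, atts, rel)

ext1, int2 = extents(C1), intents(C2)
for beta in extents(product):            # beta is a relation between B1 and A2
    rows = [frozenset(d for (b, d) in beta if b == b1) for b1 in C1[0]]
    cols = [frozenset(b for (b, d) in beta if d == d2) for d2 in C2[1]]
    assert all(r in int2 for r in rows)  # every row is an intent of C2
    assert all(c in ext1 for c in cols)  # every column is an extent of C1
print("all extents of the direct product are bonds (Lemma 2)")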
The solution of this issue requires the alternative definition of a bond. Hence, new definition of a bond focuses not only on a relation with some special prop- erties, but also on a bond as a formal context, whereby its concept lattice is connected to concept lattices of input contexts in some sense. As a consequence, a generalization for heterogeneous bonds is possible. One can find the methods in effort to equivalently modify the input heterogeneous formal contexts and to extract bonds as the extents of a direct product. The proposed modification runs as follows. Each individual pair that includes a ”conjunction” ◦b,a and a value of the poset Pb,a is replaced by a bond from 2-Bonds(hUb , Ub , ≤i, hVa , Va , ≥i). This completely covers the Galois connection between the complete lattices of any object–attribute pair from B × A. At the beginning, we will show how this modification looks in terms of sec- ond order formal contexts. Then we define new modified heterogeneous formal context such that its concept lattice is identical to the original. 88 Ondrej Krı́dlo, L’ubomı́r Antoni and Stanislav Krajči 4 Second order form of scaled heterogeneous formal context In effort to formalize the second order form of scaled heterogeneous formal con- text and its derivation operators, the definition of the following mappings is required: L Definition 9. Let (L, ≤) be a complete lattice. Let us define mappings (−) and (−)L where L L – (−) : L → 2L such that k (m) W = (m ≤ k) for any k, m ∈ L – (−)L : 2L → L such that X L = X for any X ⊆ L. Q S Let us have an arbitrary f ∈ b∈B Ub . Let us denote f as a subset of b∈B Ub S Q defined as f = b∈B {u ∈ Ub |u ≤ f (b)}. Similarly for any g ∈ a∈A Va . More information about Cartesian representation of fuzzy sets could be found in [10]. Now, consider a heterogeneous formal context C = hB, A, P, R, U, V, i. A second order form of scaled heterogeneous formal context is defined as * + [ [ C= Ub , {hUb , Ub , ≤i|b ∈ B}, Va , {hVa , Va , ≥i|a ∈ A}, R , b∈B a∈A whereby all external contexts are classical crisp contexts and R is a classical crisp binary relation defined as R(u, v) = ((u ◦b,a v) ≤ R(b, a)) for any (u, v) ∈ Ub × Va and any (b, a) ∈ B × A. In the following, we define the derivation operators of such special second order formal context. First, we state some appropriate remarks and facts. Note that a relation R constrained to Ub × Va for any pair (b, a) ∈ B × A is monotone in both arguments due to its definition. Similarly, consider the fact that any extent of hUb , Ub , ≤i and any intent of hVa , Va , ≥i is a principal down-set of a corresponding complete lattice (i.e. there exists an element in this complete lattice such that all lower or equal elements are in the extent or in the intent). Hence, a relation R constrained to Ub × Va for some (b, a) ∈ B × A is a 2-bond Q hUb , Ub , ≤i and hVa , Va , ≥i which will be denoted Q between by ρb,a . Note that any Φ ∈ b∈B Ext(hU Q b , Ub , ≤i) has the Q form f for some f ∈ b∈B Ub . Consider an arbitrary f ∈ b∈B Ub and g ∈ a∈A Va . Hence, the derivation operators are defined as follows: V b – %(f )(v) = b∈B ↑b,a (f (b) )(v) for any v ∈ Va and a ∈ A V a – .(g)(u) = a∈A ↓b,a (g(a) )(u) for any u ∈ Ub and b ∈ B. In a previous definition, the pair of mappings (↑b,a , ↓b,a ) are derivation op- erators of a formal context hUb , Va , ρb,a i for any (b, a) ∈ B × A. For the sake of b Ub a brevity, we use the shortened notation (−) instead of (−) and similarly (−) Va instead of (−) . 
Looking for Bonds between Nonhomogeneous Formal Contexts 89 Lemma 3. The concept lattices of C and C are isomorphic. Q Proof. Consider an arbitrary f ∈ b∈B Ub . We will show that %(f ) = % (f ). Firstly consider the fact of left-continuity of both arguments of ◦b,a for any (b, a) ∈ B×A. Due to this property, one can define two residuums in the following way. Let (b, a) ∈ B × A be an arbitrary object-attribute pair and consider the arbitrary values u ∈ Ub , v ∈ Va and p ∈ Pb,a . Then define W – →b,a : Ub × Pb,a → Va , such that u →b,a p = W {v ∈ Va |u ◦b,a v ≤ p} – →a,b : Va × Pb,a → Ub , such that v →a,b p = {u ∈ Ub |u ◦b,a v ≤ p}. ^ b % f (v) = ↑b,a f (b) (v) b∈B ^ ^ b = f (b) (u) ⇒ ρb,a (u, v) b∈B u∈Ub ^ ^ = ((u ≤ f (b)) ⇒ (u ◦b,a v ≤ R(b, a))) b∈B u∈Ub ^ ^ ^ = 1∧ ((u ≤ f (b)) ⇒ (u ◦b,a v ≤ R(b, a))) b∈B u∈Ub ;u6≤f (b) u∈Ub ;u≤f (b) ^ ^ = (u ◦b,a v ≤ R(b, a)) b∈B u∈Ub ;u≤f (b) ^ = (f (b) ◦b,a v ≤ R(b, a)) b∈B ^ = (v ≤ f (b) →b,a R(b, a)) b∈B ! ^ = v≤ (f (b) →b,a R(b, a)) b∈B ! ^_ = v≤ {w ∈ Va |(f (b) ◦b,a w ≤ R(b, a))} b∈B _ = v ≤ {w ∈ Va |(∀b ∈ B)(f (b) ◦b,a w ≤ R(b, a))} a = (v ≤% (f )(a)) = % (f )(a) (v). b Analogously one can obtain . (g) (u) = . (g)(b) (u). t u 4.1 Back to heterogeneous formal contexts Now, we look at heterogeneous formal context introduced in Subsection 2.3. A second order formal context C can be seen as a special heterogeneous formal b whereby the family of posets {hPb,a , ≤i|(b, a) ∈ B × A} is replaced by context C, 90 Ondrej Krı́dlo, L’ubomı́r Antoni and Stanislav Krajči a set of 2-bonds {ρb,a ∈ 2-Bonds(hUb , Ub , ≤i, hVa , Va , ≤i)|(b, a) ∈ B × A}. Hence, the final form of such heterogeneous formal context is D E Cb = B, A, ρ, R, b U, V, {×b,a |(b, a) ∈ B × A} where – ρ = {ρb,a ∈ 2-Bonds(hUb , Ub , ≤i, hVa , Va , ≤i)|(b, a) ∈ B × A} – ρb,a (u, v) = (u ◦b,a v ≤ R(b, a)) b a) = ρb,a ∈ 2-Bonds(hUb , Ub , ≤i, hVa , Va , ≤i) for any (b, a) ∈ B × A – R(b, – ×b,a : Ub × Va → 2Ub ×Va defined as a Cartesian product u×v = u × v. The derivation operators of Cb are defined as W Q – ↑ (f )(a) = W {v ∈ Va |(∀b ∈ B)f (b)×b,a v ⊆ ρb,a } for any f ∈ Q b∈B Ub – ↓ (g)(b) = {u ∈ Ub |(∀a ∈ A)u×b,a g(a) ⊆ ρb,a } for any g ∈ a∈A Va . Lemma 4. The concept lattices of C and Cb are identical. Proof. Firstly consider that for any (u, v) ∈ Ub × Va for any (b, a) ∈ B × A the following holds: u×v ⊆ ρb,a = u × v ⊆ ρb,a = ρb,a (u, v) = (u ◦b,a v ≤ R(b, a)). Q Let f ∈ b∈B Ub be arbitrary. Then _ ↑ (f )(a) ={v ∈ Va |(∀b ∈ B)f (b)×b,a v ⊆ ρb,a } _ = {v ∈ Va |(∀b ∈ B)f (b) ◦b,a v ≤ R(b, a)} =% (f )(a). Q Analogously for ↓ (g)(b) =. (g)(b) for any g ∈ a∈A Va . t u 5 Bonds between heterogeneous formal contexts We present a definition of a bond between two heterogeneous formal contexts which can be formulated as follows. Definition 10. Let Ci = hBi , Ai , Pi , Ri , Ui , Vi , i i for i ∈ {1, 2} be two heteroge- neous formal contexts. The heterogeneous formal context B = hB1 , A2 , P, R, U1 , V2 , i such that Ext(B) ⊆ Ext(C1 ) and Int(B) ⊆ Int(C2 ) is said to be a bond between two heterogeneous formal contexts C1 and C2 . Looking for Bonds between Nonhomogeneous Formal Contexts 91 5.1 Direct product of two heterogeneous formal contexts In this subsection, we define a direct product of two heterogeneous formal con- texts. Further, we give an answer on how to find a bond between two heteroge- neous formal contexts. Definition 11. Let Ci = hBi , Ai , Pi , Ri , Ui , Vi , i i for i ∈ {1, 2} be two hetero- geneous formal contexts. 
The heterogeneous formal context C1 ∆C2 = hB1 × A2 , B2 × A1 , P∆ , R∆ , U∆ , V∆ , ×i such that – P∆ = {ρb1 ,a1 ∆ρb2 ,a2 |((b1 , a2 ), (b2 , a1 )) ∈ (B1 × A2 ) × (B2 × A1 )} – where ρbi ,ai (u, v) = (u ◦bi ,ai v ≤ Ri (bi , ai )) for any (u, v) ∈ Ubi × Vai for any (bi , ai ) ∈ Bi × Ai for any i ∈ {1, 2} – R∆ ((b1 , a2 ), (b2 , a1 )) = ρb1 ,a1 ∆ρb2 ,a2 for any bi ∈ Bi and ai ∈ Ai for all i ∈ {1, 2} – U∆ = {γ1,2 ∈ 2-Bonds(hUb1 , Ub1 , ≤i, hVa2 , Va2 , ≥i)|(b1 , a2 ) ∈ B1 × A2 } – V∆ = {γ2,1 ∈ 2-Bonds(hUb2 , Ub2 , ≤i, hVa1 , Va1 , ≥i)|(b2 , a1 ) ∈ B2 × A1 } is said to be a direct product of two heterogeneous formal contexts. Lemma 5. Let Ci = hBi , Ai , Pi , Ri , Ui , Vi , i i for i ∈ {1, 2} be two heteroge- neous formal contexts. Let Y R∈ 2-Bonds(hUb1 , Ub1 , ≤i, hVa2 , Va2 , ≥i) (b1 ,a2 )∈B1 ×A2 be an extent of the direct product C1 ∆C2 . Then a heterogeneous formal context B = hB1 , A2 , ρ, R, U1 , V2 , ×i where ρ = {2-Bonds(hUb1 , Ub1 , ≤i, hVa2 , Va2 , ≥i)|(b1 , a2 ) ∈ B1 × A2 } is a bond between C1 and C2 . Q Proof. Let us have any intent of B. Then there exists f ∈ b1 ∈B1 Ub1 such that a2 %B (f )(a2 ) (v2 ) = %B (f )(v2 ) ^ b1 = ↑R(b1 ,a2 ) (f (b1 ) )(v2 ) b1 ∈B1 ^ ^ b1 = (f (b1 ) (u1 ) ⇒ R(b1 , a2 )(u1 , v2 )) b1 ∈B1 u1 ∈Ub1 Q R =.∆ (Q) for some Q ∈ (b2 ,a1 )∈B2 ×A1 2-Bonds(hUb2 , Ub2 , ≤i, hVa1 , Va1 , ≥i) ^ ^ b1 = (f (b1 ) (u1 ) ⇒.∆ (Q)(b1 , a2 )(u1 , v2 )) b1 ∈B1 u1 ∈Ub1 ^ ^ b1 ^ = f (b1 ) (u1 ) ⇒ ↓ρb1 ,a1 ∆ρb2 ,a2 (Q(b2 , a1 ))(u1 , v2 ) b1 ∈B1 u1 ∈Ub1 (b2 ,a1 )∈B2 ×A1 92 Ondrej Krı́dlo, L’ubomı́r Antoni and Stanislav Krajči ^ ^ b1 = f (b1 ) (u1 ) ⇒ b1 ∈B1 u1 ∈Ub1 ^ ^ ^ Q(b2 , a1 )(u2 , v1 ) ⇒ (ρb1 ,a1 ∆ρb2 ,a2 )((u1 , v2 ), (u2 , v1 )) b2 ∈B2 a1 ∈A1 (u2 ,v1 )∈Ub2 ×Va1 ^ ^ b1 = f (b1 ) (u1 ) ⇒ b1 ∈B1 u1 ∈Ub1 ^ ^ ^ ^ Q(b2 , a1 )(u2 , v1 ) ⇒ (¬ρb1 ,a1 (u1 , v1 ) ⇒ ρb2 ,a2 (u2 , v2 )) b2 ∈B2 a1 ∈A1 u2 ∈Ub2 v1 ∈Va1 ^ ^ ^ ^ ^ ^ b1 = f (b1 ) (u1 ) ⇒ b1 ∈B1 b2 ∈B2 a1 ∈A1 u2 ∈Ub2 v1 ∈Va1 u1 ∈Ub1 Q(b2 , a1 )(u2 , v1 ) ⇒ (¬ρb1 ,a1 (u1 , v1 ) ⇒ ρb2 ,a2 (u2 , v2 )) ^ ^ = b2 ∈B2 u2 ∈Ub2 _ _ _ _ b1 f (b1 ) (u1 ) ∧ Q(b2 , a1 )(u2 , v1 ) ∧ ¬ρb1 ,a1 (u1 , v1 ) b1 ∈B1 u1 ∈Ub1 a1 ∈A1 v1 ∈Va1 ⇒ ρb2 ,a2 (u2 , v2 ) ^ ^ b2 = (q(b2 ) (u2 ) ⇒ ρb2 ,a2 (u2 , v2 )) b2 ∈B2 u2 ∈Ub2 = %C 2 (q)(v2 ) = %C2 (q)(a2 )(v2 ) where _ _ _ _ q(b2 )(u2 ) = f (b1 )(u1 )∧Q(b2 , a1 )(u2 , v1 )∧¬ρb1 ,a1 (u1 , v1 ) b1 ∈B1 u1 ∈Ub1 a1 ∈A1 v1 ∈Va1 Hence, %B (f ) =%C2 (q). So any intent of B is an intent of C2 . By using the following equality (¬ρb1 ,a1 (u1 , v1 ) ⇒ ρb2 ,a2 (u2 , v2 )) = (¬ρb2 ,a2 (u2 , v2 ) ⇒ ρb1 ,a1 (u1 , v1 )) analogously we obtain that any extent of B is an extent of C1 . Hence, B is a bond between C1 and C2 . t u 6 Conclusion Bonds and their L-fuzzy generalizations represent a feasible way to explore the relationships between formal contexts. In this paper we have investigated the notion of a bond with respect to the heterogeneous formal contexts. In conclu- sion, an alternative definition of a bond provides an efficient tool to work with Looking for Bonds between Nonhomogeneous Formal Contexts 93 the nonhomogeneous data and one can further explore this uncharted territory in formal concept analysis. Categorical properties of heterogeneous formal contexts and bonds as mor- phisms between such objects and categorical relationship to homogeneous FCA categorical description will be studied in the near future. References 1. C. Alcalde, A. Burusco, R. Fuentes-González and I. Zubia. The use of linguistic variables and fuzzy propositions in the L-fuzzy concept theory. 
Computers & Mathematics with Applications, 62 (8): 3111-3122, 2011. 2. L. Antoni, S. Krajči, O. Krı́dlo, B. Macek, and L. Pisková. Relationship between two FCA approaches on heterogeneous formal contexts. Proceedings of CLA, 93– 102, 2012. 3. L. Antoni, S. Krajči, O. Krı́dlo, B. Macek, and L. Pisková. On heterogeneous formal contexts. Fuzzy Sets and Systems, 234: 22–33, 2014. 4. R. Bělohlávek. Fuzzy concepts and conceptual structures: induced similarities. Proceedings of Joint Conference on Information Sciences, 179–182, 1998. 5. R. Bělohlávek. Lattices of fixed points of fuzzy Galois connections. Mathematical Logic Quartely, 47(1):111–116, 2001. 6. R. Bělohlávek. Concept lattices and order in fuzzy logic. Annals of Pure and Applied Logic, 128(1–3):277-298, 2004. 7. R. Bělohlávek. Lattice-type fuzzy order is uniquely given by its 1-cut: proof and consequences. Fuzzy Sets and Systems, 143:447–458, 2004. 8. R. Bělohlávek. Sup-t-norm and inf-residuum are one type of relational product: unifying framework and consequences. Fuzzy Sets and Systems, 197: 45–58, 2012. 9. R. Bělohlávek. Ordinally equivalent data: A measurement-theoretic look at for- mal concept analysis of fuzzy attributes. International Journal of Approximate Reasoning, 54 (9): 1496–1506, 2013. 10. Simple proof of basic theorem for general concept lattices by Cartesian repre- sentation MDAI 2012, Girona, Catalonia, Spain, November 21-23, 2012, LNCS 7647(2012), 294-305. 11. P. Butka, J. Pócs and J. Pócsová. Representation of fuzzy concept lattices in the framework of classical FCA. Journal of Applied Mathematics, Article ID 236725, 7 pages, 2013. 12. P. Butka and J. Pócs. Generalization of one-sided concept lattices. Computing and Informatics, 32(2): 355–370, 2013. 13. C. Carpineto, and G. Romano. Concept Data Analysis. Theory and Applications. J. Wiley, 2004. 14. J.C. Dı́az-Moreno, J. Medina, M. Ojeda-Aciego. On basic conditions to gener- ate multi-adjoint concept lattices via Galois connections. International Journal of General Systems, 43(2): 149–161, 2014. 15. D. Dubois and H. Prade. Possibility theory and formal concept analysis: Charac- terizing independent sub-contexts. Fuzzy sets and systems, 196, 4–16, 2012. 16. B. Ganter and R. Wille. Formal concept analysis. Springer–Verlag, 1999. 17. J. Konečný and M. Ojeda-Aciego. Isotone L-bonds. Proceedings of CLA, 153-162, 2013. 18. J. Konečný, M. Krupka. Block Relations in Fuzzy Setting Proceedings of CLA, 115-130, 2011. 94 Ondrej Krı́dlo, L’ubomı́r Antoni and Stanislav Krajči 19. J. Konečný. Information Processing and Management of Uncertainty in Knowledge-Based Systems Comunications Computer and Information Sciences, vol. 444, 2014, pp 71-80. 20. O. Krı́dlo and M. Ojeda-Aciego. On the L-fuzzy generalization of Chu correspon- dences. International Journal of Computer Mathematics, 88(9):1808-1818, 2011. 21. O. Krı́dlo, S. Krajči, and M. Ojeda-Aciego. The category of L-Chu correspondences and the structure of L-bonds. Fundamenta Informaticae, 115(4):297-325, 2012. 22. O. Krı́dlo and M. Ojeda-Aciego. Linking L-Chu correspondences and completely lattice L-ordered sets. Proceedings of CLA, 233–244, 2012. 23. O. Krı́dlo and M. Ojeda-Aciego. CRL-Chu correspondences. Proceedings of CLA, 105–116, 2013. 24. O. Krı́dlo, P. Mihalčin, S. Krajči and L. Antoni. Formal Concept analysis of higher order. Proceedings of CLA, 117-128, 2013. 25. O. Krı́dlo and M. Ojeda-Aciego. Revising the Link between L-Chu correspondences and completely lattice L-ordered sets. 
Annals of Mathematics and Artificial In- teligence, DOI 10.1007/s10472-014-9416-8, 2014. 26. M. Krötzsch, P. Hitzler, and G.-Q. Zhang. Morphisms in context. Lecture Notes in Computer Science, 3596:223–237, 2005. 27. J. Medina and M. Ojeda-Aciego. Multi-adjoint t-concept lattices. Information Sciences, 180:712–725, 2010. 28. J. Medina and M. Ojeda-Aciego. On multi-adjoint concept lattices based on het- erogeneous conjunctors. Fuzzy Sets and Systems, 208: 95–110, 2012. 29. J. Medina and M. Ojeda-Aciego. Dual multi-adjoint concept lattices. Information Sciences, 225, 47–54, 2013. 30. H. Mori. Chu Correspondences. Hokkaido Matematical Journal, 37:147–214, 2008. 31. H. Mori. Functorial properties of Formal Concept Analysis. Proc ICCS, Lecture Notes in Computer Science, 4604:505–508, 2007. 32. J. Pócs. Note on generating fuzzy concept lattices via Galois connections. Infor- mation Sciences 185 (1):128–136, 2012. 33. J. Pócs. On possible generalization of fuzzy concept lattices. Information Sciences, 210: 89–98, 2012. Reverse Engineering Feature Models from Software Configurations using Formal Concept Analysis R. AL-msie’deen1 , M. Huchard1 , A.-D. Seriai1 , C. Urtado2 and S. Vauttier2 1 LIRMM / CNRS & Montpellier 2 University, France Al-msiedee, huchard, Seriai@lirmm.fr 2 LGI2P / Ecole des Mines d’Alès, Nı̂mes, France Christelle.Urtado, Sylvain.Vauttier@mines-ales.fr Abstract. Companies often develop in a non-disciplined manner a set of software variants that share some features and differ in others to meet variant-specific requirements. To exploit existing software variants and manage them coherently as a software product line, a feature model must be built as a first step. To do so, it is necessary to extract mandatory and optional features from the code of the variants in addition to as- sociate each feature implementation with its name. In previous work, we automatically extracted a set of feature implementations as a set of source code elements of software variants and documented the mined feature implementations based on the use-case diagrams of these vari- ants. In this paper, we propose an automatic approach to organize the mined documented features into a feature model. The feature model is a tree which highlights mandatory features, optional features and feature groups (and, or, xor groups). The feature model is completed with re- quirement and mutual exclusion constraints. We rely on Formal Concept Analysis and software configurations to mine a unique and consistent fea- ture model. To validate our approach, we apply it on several case studies. The results of this evaluation validate the relevance and performance of our proposal as most of the features and their associated constraints are correctly identified. Keywords: Software Product Line, Feature Models, Software Product Variants, Formal Concept Analysis, Product-by-feature matrix. 1 Introduction To exploit existing software variants and build a software product line (SPL), a feature model (FM) must be built as a first step. To do so, it is necessary to extract mandatory and optional features in addition to associate each feature with its name. In our previous work [1,2], we have presented an approach called REVPLINE 1 to identify and document features from the object-oriented source code of a collection of software product variants. 1 REVPLINE stands for RE-engineering Software Variants into Software Product Line. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 
95–107, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 96 Ra’Fat Al-Msie’Deen et al. Dependencies between features need to be expressed via a FM which is a de facto standard formalism [3,4]. A FM is a tree-like hierarchy of features and constraints between them (cf. left side of Figure 1). FMs aim at describing the variability of a SPL in terms of features. A FM defines which feature combi- nations lead to valid products within the SPL (cf. right side of Figure 1). We illustrate our approach with the cell phone SPL FM and its 16 valid product configurations (cf. Figure 1) [5]. Artificial Opponent Single Player Multi Player Cell Phone Bluetooth Accu Cell Wireless Infrared Medium Display Games Strong Weak P-1 × × × × × × × × P-2 × × × × × × × × P-3 × × × × × × × × × P-4 × × × × × × × × P-5 × × × × × × × P-6 × × × × × × × P-7 × × × × × × × × × P-8 × × × × × × × × × P-9 × × × × × × × × × × P-10 × × × × × × × P-11 × × × × × × × × × P-12 × × × × × × × × × P-13 × × × × × × × × × × P-14 × × × × × × × × × × P-15 × × × × × × × × × × P-16 × × × × × × × × × × × Fig. 1. Valid product configurations of cell phone SPL feature model [5]. Figure 1 shows the FM of the cell phone SPL [5]. The Cell Phone feature is the root feature of this FM; hence it is selected in every program configuration. It has three mandatory child features (i.e., the Accu Cell, Display and Games features), which are also selected in every product configuration as their parent is always included. The children of the Accu Cell feature form an exclusive-or relation, meaning that the programs of this SPL include exactly one out of the three Strong, Medium or Weak features. The Multi Player and Single Player features constitute an inclusive-or, which necessitates that at least one of these two features is selected in any valid program configuration. Single Player has Artificial Opponent as a mandatory child feature. The Wireless feature is an optional child feature of root; hence it may or may not be selected. Its Infrared and Bluetooth child features form an inclusive-or relation, meaning that if a program includes the Wireless feature then at least one of its two child features has to be selected as well. The cell phone SPL also introduces three cross-tree constraints. While the Multi Player feature cannot be selected together with the Weak feature, it cannot be selected without the Wireless feature. Lastly, the Bluetooth feature requires the Strong feature. Galois lattices and concept lattices [6] are core structures of a data analy- sis framework (Formal Concept Analysis) for extracting an ordered set of con- Reverse Engineering Software Configuration Feature Models using FCA 97 cepts from a dataset, called a formal context, composed of objects described by attributes. In our approach, we consider the AOC-poset (for Attribute-Object- Concept poset) [7], which is the sub-order of the concept lattice restricted to attribute-concepts and object-concepts. Attribute-concepts (resp. object-con- cepts) are the highest (resp. lowest) concepts that introduce each attribute (resp. object). AOC-posets scale much better than lattices. For applying Formal Con- cept Analysis (FCA) we used the Eclipse eRCA platform2 . Manual construction of a FM is both time-consuming and error-prone [8], even for a small set of configurations [9]. The existing approaches to extract FM from product configurations [8,10] suffer from a lot of challenges. 
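Returning to the cell phone example, the validity criterion encoded by such a FM can be made concrete with a small executable check. The following Python sketch is our own illustration and not part of the authors' tool chain; the feature names follow the prose above, and the predicate simply tests a set of selected features against the mandatory, exclusive-or, inclusive-or and cross-tree constraints just described.

MANDATORY = {"Cell Phone", "Accu Cell", "Display", "Games"}
XOR_ACCU = {"Strong", "Medium", "Weak"}        # exactly one must be chosen
OR_GAMES = {"Single Player", "Multi Player"}   # at least one must be chosen
OR_WIRELESS = {"Infrared", "Bluetooth"}        # at least one, if Wireless is chosen

def is_valid(config):
    """Check a set of selected features against the cell phone FM described above."""
    if not MANDATORY <= config:
        return False
    if len(config & XOR_ACCU) != 1:                       # exclusive-or under Accu Cell
        return False
    if not config & OR_GAMES:                             # inclusive-or under Games
        return False
    if ("Single Player" in config) != ("Artificial Opponent" in config):
        return False                                      # mandatory child of Single Player
    if "Wireless" in config and not config & OR_WIRELESS:
        return False                                      # inclusive-or under Wireless
    if config & OR_WIRELESS and "Wireless" not in config:
        return False                                      # children require their parent
    if "Multi Player" in config and "Weak" in config:     # cross-tree: excludes
        return False
    if "Multi Player" in config and "Wireless" not in config:
        return False                                      # cross-tree: requires
    if "Bluetooth" in config and "Strong" not in config:
        return False                                      # cross-tree: requires
    return True

print(is_valid({"Cell Phone", "Accu Cell", "Display", "Games",
                "Strong", "Single Player", "Artificial Opponent"}))   # True
print(is_valid({"Cell Phone", "Accu Cell", "Display", "Games",
                "Weak", "Multi Player", "Single Player",
                "Artificial Opponent"}))                              # False (Weak with Multi Player)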
The main challenge is that numerous candidate FMs can be extracted from the same input product configurations, yet only a few of them are meaningful and correct, while in our work we synthesize an accurate and meaningful FM using FCA. Moreover the majority of these approaches extract a basic FM without constraints between its features [11] while, in our work, we extract all kinds of FM constraints. The remainder of this paper is structured as follows: Section 2 presents the reverse engineering FM process step-by-step. Next, Section 3 presents the way that we propose to evaluate the obtained FMs. Section ?? describes the ex- perimentation and threats to the validity. Section 4 discusses the related work. Finally, in Section 5, we conclude this paper. 2 Step-by-Step FM Reverse Engineering This section presents step-by-step the FM reverse engineering process. According to our approach, we identify the FM in seven steps as detailed in the following, using strong properties of FCA to group features among product configurations. The AOC-poset is built from a set of known products, and thus does not repre- sent all possible products. Thus, the FM structure has to be considered only as a candidate feature organization that can be proposed to an expert. The algorithm is designed such that all existing products (used for construction of candidate FM) are covered by the FM. Besides, it allows to define possible unused close variants. The first step of our FM extraction process is the identification of the AOC- poset. First, a formal context, where objects are software product variants and attributes are features (cf. Figure 1), is defined. The corresponding AOC-poset is then calculated. The intent of each concept represents features common to two or more products or unique to one product. As AOC-posets are ordered, the intent of the most general (i.e., top) concept gathers mandatory features that are common to all products. The intents of all the remaining concepts represent the optional features. The extent of each of these concepts is the set of products sharing these features (cf. Figure 2). In the following algorithms, for a Concept C, we call intent(C), extent(C), simplif ied intent(C), and simplif ied extent(C) its associated sets. Efficient algorithms can be found in [7]. The other steps are presented in the next sections. 2 The eRCA : http://code.google.com/p/erca/ 98 Ra’Fat Al-Msie’Deen et al. Fig. 2. The AOC-poset for the formal context of Figure 1. 2.1 Extracting root feature and mandatory features Algorithm 1 is a simple algorithm for building the Base node (cf. Figure 3). Features in the top concept of the AOC-poset (Concept 16) are used in every product configuration. The Cell Phone feature is the root feature of the cell phone FM (line 5). Then a mandatory Base node is created (lines 8,9). It is linked to nodes created to represent all the other features in the top concept, i.e., Accu Cell, Display and Games (lines 12-16). 2.2 Extracting atomic set of features (AND-group) Algorithm 2 is a simple algorithm for building AND-groups of features (exclud- ing all the mandatory features, line 3). An AND-group of features is created (line 8) to group optional features that appear in the same simplified intent (test line 6), meaning that these features are always used together in all the product con- figurations where they appear. Lines 12-16, nodes are created for every feature of the AND-group and they are attached to an And node. 
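The information extracted by Algorithms 1 and 2 can also be read directly off the product-by-feature matrix: the intent of the top concept is the set of features shared by all products, and an AND-group collects optional features introduced by the same concept, i.e. owned by exactly the same set of products. A minimal Python sketch on toy data of our own (not the cell phone context of Figure 1) follows.

from collections import defaultdict

products = {
    "P1": {"Base", "Logging", "Export", "PDF"},
    "P2": {"Base", "Logging"},
    "P3": {"Base", "Export", "PDF", "Stats"},
}

features = set().union(*products.values())
mandatory = set.intersection(*products.values())    # intent of the top concept

# group optional features by the exact set of products that contain them
owners = defaultdict(set)
for f in features - mandatory:
    key = frozenset(p for p, fs in products.items() if f in fs)
    owners[key].add(f)

and_groups = [fs for fs in owners.values() if len(fs) >= 2]
print("mandatory:", mandatory)      # {'Base'}
print("AND-groups:", and_groups)    # [{'Export', 'PDF'}]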
For instance, Con- cept 23 in Figure 2 has a simplified intent with two features, Single Player and Artificial Opponent, leading to the And node of Figure 3. 2.3 Extracting exclusive-or relation Features that form exclusive-or relation can be identified in the concept lattice using the meet (denoted by u) lattice operation [12], which amounts to compute Reverse Engineering Software Configuration Feature Models using FCA 99 Algorithm 1: ComputeRootAndMandatoryFeature 1 // Top concept > 2 ∃ F ∈ A, which represents the name of the soft. family with F in feature set of > Data: AOC K , ≤s : the AOC-poset associated with K Result: part of the FM containing root and mandatory features 3 // Compute the root Feature 4 CFS ← intent(>) 5 Create node root, label (root) ← F, type (root) ← abstract 0 6 CFS ← CFS \ {F} 0 7 if CFS 6= ∅ then 8 Create node base with label (base) ← ”Base” 9 type (base) ← abstract 10 Create edge e = (root, base) 11 type (e) ← mandatory 12 for each Fe in CFS0 do 13 Create node feature, with label (feature) ← Fe 14 type (feature) ← concrete 15 create edge e = (base, feature) 16 type (e) ← mandatory Algorithm 2: ComputeAtomicSetOfFeatures (and groups) Data: AOC K , ≤s : the AOC-poset associated with K Result: part of the FM with and groups of features 1 // Compute atomic set of features 2 // Feature List (FL) is the list of all features (FL = A in K=(O, A, R)). 0 3 FL ← FL \ CFS // FL \ intent(>) 4 AsF ← ∅ 5 int count ← 1 6 for each concept C 6= > such that | simplified intent(C) | ≥ 2 do 7 AsF ← AsF ∪ simplified intent(C) 8 Create node and with label (and ) ← ”AND”+ count 9 type (and ) ← abstract 10 create edge e = (root, and ) 11 type (e) ← optional 12 for each F in simplified intent(C) do 13 create node feature, with label (feature) ← F 14 type (feature) ← concrete 15 create edge e =(and, feature) 16 type (e) ← mandatory the greatest lower bounds in the AOC-poset. If a feature A is introduced in concept C1 , a feature B is introduced in concept C2 and C1 u C2 = ⊥ (and extent(⊥) = ∅), that is, if the bottom of the lattice is the greatest lower bound of C1 and C2 , the two features never occur together in a product. In our current 100 Ra’Fat Al-Msie’Deen et al. approach, we only build a single Xor group of features, when any group of mutually exclusive features exists. Computing exclude constraints (see Section 2.6) will deal with the many cases where several Xor group of features exist (a set of exclude constraints defining mutual exclusion is equivalent to a Xor group). Algorithm 3 is a simple algorithm for building the single Xor group of fea- tures. The principle is to traverse the set of super-concepts of each minimum elements of the AOC-poset and to keep the concepts that are the super-concepts of only one minimum concept. Only features that are not used in the previous steps are considered in FL” (line 2). Lines 6-10, in our example, we consider the three minimum concepts Concept 11, Concept 12 and Concept 15. The many SSC sets are the sets of super-concepts for Concept 11, Concept 12 and Con- cept 15. Cxor is the set of all concepts, except Concept 11, Concept 12 and Concept 15. Lines 11-15 only keep in Cxor concepts that do not appear in two SSC sets. Cxor contains concepts number 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14, 19, 20 and 21. Line 16 eliminates Concept 19 which is not a maximum. 
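In contextual terms, the meet condition used in this step states that two features are mutually exclusive exactly when no product configuration contains both of them, which can be tested directly on the product-by-feature data. A small Python check on toy data of our own:

def mutually_exclusive(f1, f2, products):
    """True iff no product configuration contains both features."""
    return all(not ({f1, f2} <= fs) for fs in products.values())

products = {
    "P1": {"Base", "Strong"},
    "P2": {"Base", "Medium"},
    "P3": {"Base", "Weak", "Wireless"},
}
print(mutually_exclusive("Strong", "Medium", products))   # True
print(mutually_exclusive("Weak", "Wireless", products))   # False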
As there are three features (Medium, Strong, Weak, from Concept 21, Concept 20, and Concept 2 respectively) that are in FL” and in the simplified intent of concepts of Cxor (line 18), an Xor node is created and linked to the root (lines 19-26). Then, on lines 27-33, nodes are created for the features and linked to the Xor node. Figure 3 shows this Xor node. 2.4 Extracting inclusive-or relation Optional features are features that are used in some (but not all) product con- figurations. There are many ways of finding and organizing them. Algorithm 4 is a simple algorithm for building the Or group of features. In our approach, we pruned the AOC-poset by removing the top concept, concepts that correspond to AND groups of features, and concepts that correspond to features that form an exclusive-or relation. The remaining concepts define features that are grouped (lines 8-12) into an Or node (created and linked to the root on lines 4-7). In the AOC-poset of Figure 2, the Wireless, Infrared, Bluetooth, and Multi Player features form an inclusive-or relation (cf. Figure 3). 2.5 Extracting require constraints Algorithm 5 is a simple algorithm for identifying require constraints. A require constraint, e.g., saying ”variable feature A always requires variable feature B”, can be extracted from the lattice via implications. We say that A implies B (written A → B). The require constraints can be identified in the AOC-poset: when a feature F1 is introduced in a subconcept of the concept that introduces another feature F2 , there is an implication F1 → F2 . We only consider the transitive reduction of the AOC-poset limited to Attribute-concepts (line 2) and features that are in simplified intents (line 3-4). In the AOC-poset of Figure 2, we find 6 require constraints from the transitive reduction of the AOC-poset to Reverse Engineering Software Configuration Feature Models using FCA 101 Algorithm 3: ComputeExclusive-or Relation (Xor) Data: AOC K , ≤s : the AOC-poset associated with K Result: part of the FM with XOR group of features 1 // Compute exclusive-or relation 00 0 2 FL ← FL \ AsFs 3 Cxor ← ∅ 4 SSCS ← ∅ // set of super-concept sets 5 Minimum-set ← ∅ 6 for each minimum of AOC K denoted by m do 7 Let SSC the set of super-concepts of m (except >) 8 SSCS ← SSCS ∪ {SSC} 9 Minimum-set ← Minimum-set ∪ {m} 10 Cxor ← Cxor ∪ SSC 11 while SSCS 6= ∅ do 12 SSC-1 ← any element in (SSCS) 13 SSCS ← SSCS \ SSC-1 14 for each SSC-2 in SSCS do 15 Cxor ← Cxor \ (SSC-1 ∩ SSC-2) 16 Cxor ← Max(Cxor) 17 XFS ← ∅ 00 18 if |Cxor| > 1 and |F L ∩ ∪C∈Cxor simplif ied intent(C)| > 1 then 19 Create node xor with label (xor ) ← ”XOR” 20 type (xor ) ← abstract 21 create edge e = (root, xor ) 22 // if all products are covered by Cxor 23 if ∪C∈Cxor extent(C) = O then 24 type (e) ← mandatory 25 else 26 type (e) ← optional 27 for each concept C ∈ Cxor do 28 for each F in simplified intent(C) ∩ F L00 do 29 create node feature, with label (feature) ← F 30 type (feature) ← concrete 31 create edge e = (xor, feature) 32 type (e) ← alternative 33 XFS ← XFS ∪ F attribute-concepts (cf. Figure 3). Remark that implications ending to mandatory features are useless because they are represented in the FM by the Base node. 2.6 Extracting exclude constraints In our current proposal, we compute binary exclude constraints ¬(A ∧ B) under the condition that A and B are not both linked to the Or group. To mine 102 Ra’Fat Al-Msie’Deen et al. 
Algorithm 4: ComputeInclusive-orRelation (Or) Data: AOC K , ≤s : the AOC-poset associated with K Result: part of the FM with OR group of features 1 // Compute inclusive-or relation 000 00 2 FL ← FL \ XFS 000 3 if FL 6= ∅ then 4 Create node or with label (or ) ← ”OR” 5 type (or ) ← abstract 6 create edge e = (root, or ) 7 type (e) ← optional 8 for each F in FL000 do 9 create node feature, with label (feature) ← F 10 type (feature) ← concrete 11 create edge e = (or, feature) 12 type (e) ← Or Algorithm 5: ComputeRequireConstraint (Requires) Data: AC K , ≤s : the AC-poset associated with K Result: Require - the set of require constraints 1 Require ← ∅ 2 for each edge (C1, C2) = e in transitive reduction of AC-poset do 3 for all f1, f2 with f1 ∈ simplified intent(C1) and f2 ∈ simplified intent(C2) do 4 Require ← Require ∪ {f1 −→ f2} exclude constraints from an AOC-poset, we use the meet3 of the introducers of the two involved features. For example, the meet of Concept 2 which introduces Weak and Concept 22 which introduces Multi Player is the bottom (in the whole lattice). In the AOC-poset they don’t have a common lower bound. We can thus deduce ¬(W eak ∧ M ulti P layer). In the AOC-poset of Figure 2, there are three exclude constraints (cf. Figure 3). Algorithm 6 is a simple algorithm for identifying exclude constraints. It compares features that are below the OR group with each set of features in the intent of a minimum (line 4), in order to determine which are incompatible: this is the case for a pair (f1, f2) where f1 is in the OR group and not in the minimum intent, and f2 is in the minimum intent but not in the OR group (lines 6-10). Figure 3 shows the resulting FM based on the product configurations of Figure 1. 3 in the lattice Reverse Engineering Software Configuration Feature Models using FCA 103 Algorithm 6: ComputeExcludeConstraint (Excludes) Data: AOC K , ≤s : the AOC-poset associated with K Result: Exclude - the set of exclude constraints. 1 // Minimum-set from Algorithm 3 000 2 // FL from Algorithm 4 3 Exclude ← ∅ 4 for each P ∈ Minimum-set do 5 P intent ← intent(P ) \ intent(>) 6 Opt-feat-set ← FL000 \ (FL000 ∩ P intent) 7 Super-feat-set ← P intent \ (FL000 ∩ P intent) 8 if Opt-feat-set 6= ∅ and Super-feat-set 6= ∅ then 9 for each f1 ∈ Opt-feat-set, f2 ∈ Super-feat-set do 10 Exclude ← Exclude ∪ {¬(f1 ∧ f2)} 3 Experimentation In order to evaluate the mined FM we rely on the SPLOT homepage4 and the FAMA Tool5 . Our implementation6 converts the FM that has been drawn us- ing SPLOT homepage into the format of FAMA. Then, we can easily generate a file containing all valid product configurations [13]. Figure 3 shows all valid product configurations for the mined FM by our approach (the first 16 product configurations are the same as in Figure 1). We compare the sets of configura- tions defined by the two FMs (i.e., the initial FM compared to the mined FM). The mined FM introduces 15 extra product configurations which correspond to feature selection constraints that have not been detected by our algorithm. Evaluation Metrics: In our work, we rely on precision, recall and F-measure metrics to evaluate the mined FM. All measures have values in [0, 1]. If re- call equals 1, all relevant product configurations are retrieved. However, some retrieved product configurations might not be relevant. If precision equals 1, all retrieved product configurations are relevant. Nevertheless, relevant product configurations might not be retrieved. 
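Once both feature models have been expanded into their sets of valid configurations (the paper uses SPLOT and FAMA for this purpose), the comparison reduces to elementary set arithmetic. A Python sketch with toy configuration sets of our own:

def evaluate(initial, mined):
    """Precision, recall and F-measure of the mined configurations w.r.t. the initial ones."""
    relevant_retrieved = initial & mined
    precision = len(relevant_retrieved) / len(mined)
    recall = len(relevant_retrieved) / len(initial)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

initial = {frozenset({"A"}), frozenset({"A", "B"})}
mined = {frozenset({"A"}), frozenset({"A", "B"}), frozenset({"A", "C"})}
print(evaluate(initial, mined))   # (0.666..., 1.0, 0.8)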
If F-Measure equals 1, all relevant prod- uct configurations are retrieved. However, some retrieved product configurations might not be relevant. F-Measure defines a trade-off between precision and re- call, so that it gives a high value only in cases where both recall and precision are high. The result of the product configurations that are identified by the mined cell phone FM is as follow: (precision: 0.51), (recall : 1.00) and (F-Measure: 0.68). The recall measure is 1 by construction, due to the fact that the algorithm was designed to cover existing products. 4 SPLOT homepage : http://gsd.uwaterloo.ca:8088/SPLOT/ 5 FAMA Tool Suite : http://www.isa.us.es/fama/ 6 Source Code : https://code.google.com/p/sxfmtofama/ 104 Ra’Fat Al-Msie’Deen et al. Artificial Opponent Single Player Multi Player Cell Phone Bluetooth Accu Cell Wireless Infrared Medium Display Games Strong Weak P-17 × × × × × P-18 × × × × × × P-19 × × × × × × × P-20 × × × × × × × P-21 × × × × × × × × P-22 × × × × × P-23 × × × × × × P-24 × × × × × × × P-25 × × × × × × × P-26 × × × × × × × P-27 × × × × × × × × P-28 × × × × × × × × P-29 × × × × × × × × P-30 × × × × × × × × × P-31 × × × × × × × × × Fig. 3. The mined FM and its extra product configurations. To validate our approach7 , we ran experiments on 7 case studies: ArgoUML- SPL [1], mobile media software variants [2], public health complaint-SPL8 , video on demand-SPL [8,3,14], wiki engines [10], DC motor [11] and cell phone-SPL [5]. Table 1 summarizes the obtained results. Results show that precision appears to be not very high for all case studies. This means that many of the identified product configurations of the mined FM are extra configurations (not in the initial set that is defined by the original FM). Considering the recall metric, its value is 1 for all case studies. This means that product configurations defined by the initial FM are included in the product configurations derived from the mined FM. Experiments show that if the gener- ated AOC-poset has only one bottom concept there is no exclusive-or relation or exclude constraints from the given product configurations. In our work, the mined FM defines more configurations than the initial FM. The reason behind this limitation is that some feature selection constraints are not detected. Nev- ertheless, the AOC-poset contains information for going beyond this limitation. We plan to enhance our algorithm to deal with that issue, at the price of an increase of complexity. 4 Related Work For the sake of brevity, we describe only the work that most closely relates to ours. The majority of existing approaches are designed to reverse engineer FM 7 Source code: https://code.google.com/p/refmfpc/ 8 http://www.ic.unicamp.br/~tizzei/phc/ Reverse Engineering Software Configuration Feature Models using FCA 105 Table 1. The results of configurations that are identified by the mined FMs. Group of Features CTCs Evaluation Metrics Execution times (in ms) Atomic Set of Features Number of Products Number of Features Exclusive-or Inclusive-or F-Measure Precision Excludes Requires Recall Base # case study 1 ArgoUML-SPL 20 11 × × × 509 0.60 1.00 0.75 2 Mobile media 8 18 × × × 441 0.68 1.00 0.80 3 Health complaint-SPL 10 16 × × × × 439 0.57 1.00 0.72 4 Video on demand 16 12 × × × × 572 0.66 1.00 0.80 5 Wiki engines 8 21 × × × × × × 555 0.54 1.00 0.70 6 DC motor 10 15 × × 444 0.83 1.00 0.90 7 Cell phone-SPL 16 13 × × × × × × 486 0.51 1.00 0.68 from high level models (e.g., product descriptions) [10,14]. 
Some approaches of- fer an acceptable solution but are not able to identify important parts of FM such as cross-tree constraints, and-group, or-group, xor-group [11]. The main challenge of works that reverse engineer FMs from product configurations ([8,3]) is that numerous candidate FMs can be extracted from the same input config- urations, yet only a few of them are meaningful and correct. The majority of existing approaches are designed to identify the dependencies between features regardless of FM hierarchy [8]. Work that relies on FCA to extract a FM does not fully exploit resulting lattices. In [11], authors rely on FCA to extract a ba- sic FM without cross-tree constraints, while in [12], authors use FCA as a tool to understand the variability of existing SPL based on product configurations. Their work does not produce FMs. In our work, we rely on FCA to extract FMs from the software configurations. The resulting FMs exactly describe the given product configuration set. The proposed approach is able to identify all parts of FMs. 5 Conclusion In this paper, we proposed an automatic approach to extract FMs from software variants configurations. We rely on FCA to extract FMs including configuration constraints. We have implemented our approach and evaluated its produced re- sults on several case studies. The results of this evaluation showed that the resulting FMs exactly describe the given product configuration set. The FMs are generated in very short time, because our FCA tool (based on traversals of the AOC-poset) scales significantly better than the standard FCA approaches to calculate and traverse the lattices. The current work extracts a FM with two levels of hierarchy. As a perspective of this work, we plan to enhance the ex- tracted FM by increasing the levels of hierarchy based on AOC-poset structure and to avoid allowing the FM to represent extra configurations. 106 Ra’Fat Al-Msie’Deen et al. Acknowledgment The authors would like to thank the reviewers for their valuable remarks that helped improve the paper. This work has been supported by the CUTTER ANR-10-BLAN-0219 project. References 1. Al-Msie’deen, R., Seriai, A., Huchard, M., Urtado, C., Vauttier, S., Salman, H.E.: Mining features from the object-oriented source code of a collection of software variants using formal concept analysis and latent semantic indexing. In: SEKE ’13. (2013) 244–249 2. Al-Msie’deen, R., Seriai, A., Huchard, M., Urtado, C., Vauttier, S.: Document- ing the mined feature implementations from the object-oriented source code of a collection of software product variants. In: SEKE ’14. (2014) 264–269 3. Acher, M., Baudry, B., Heymans, P., Cleve, A., Hainaut, J.L.: Support for reverse engineering and maintaining feature models. In: VaMoS ’13, New York, NY, USA, ACM (2013) 20:1–20:8 4. She, S., Lotufo, R., Berger, T., Wasowski, A., Czarnecki, K.: Reverse engineering feature models. In: ICSE ’11, New York, NY, USA, ACM (2011) 461–470 5. Haslinger, E.N.: Reverse engineering feature models from program configurations. Master’s thesis, Johannes Kepler University Linz, Linz, Austria (September 2012) 6. Ganter, B., Wille, R.: Formal concept analysis - mathematical foundations. Springer (1999) 7. Berry, A., Gutierrez, A., Huchard, M., Napoli, A., Sigayret, A.: Hermes: a simple and efficient algorithm for building the AOC-poset of a binary relation, Annals of Mathematics and Artificial Intelligence (may 2014) 8. 
Haslinger, E.N., Lopez-Herrejon, R.E., Egyed, A.: Reverse engineering feature models from programs’ feature sets. In: WCRE ’11, IEEE (2011) 308–312 9. Andersen, N., Czarnecki, K., She, S., Wasowski, A.: Efficient synthesis of feature models. In: SPLC (1), ACM (2012) 106–115 10. Acher, M., Cleve, A., Perrouin, G., Heymans, P., Vanbeneden, C., Collet, P., Lahire, P.: On extracting feature models from product descriptions. In: VaMoS ’12, New York, NY, USA, ACM (2012) 45–54 11. Ryssel, U., Ploennigs, J., Kabitzsch, K.: Extraction of feature models from formal contexts. In: SPLC ’11, New York, NY, USA, ACM (2011) 4:1–4:8 12. Loesch, F., Ploedereder, E.: Optimization of variability in software product lines. In: SPLC ’07, Washington, DC, USA, IEEE Computer Society (2007) 151–162 13. Benavides, D., Segura, S., Ruiz-Cortés, A.: Automated analysis of feature models 20 years later: A literature review. Inf. Syst. 35(6) (September 2010) 615–636 14. Lopez-Herrejon, R.E., Galindo, J.A., Benavides, D., Segura, S., Egyed, A.: Reverse engineering feature models with evolutionary algorithms: An exploratory study. In: SSBSE, Springer (2012) 168–182 An Algorithm for the Multi-Relational Boolean Factor Analysis based on Essential Elements Martin Trnecka, Marketa Trneckova Data Analysis and Modeling Lab (DAMOL) Department of Computer Science, Palacky University, Olomouc martin.trnecka@gmail.com, marketa.trneckova@gmail.com Abstract. The Multi-Relational Boolean factor analysis is a method from the family of matrix decomposition methods which enables us an- alyze binary multi-relational data, i.e. binary data which are composed from many binary data tables interconnected via relation. In this paper we present a new Boolean matrix factorization algorithm for this kind of data, which use the new knowledge from the theory of the Boolean factor analysis, so-called essential elements. We show on real dataset that uti- lizing essential elements in the algorithm leads to better results in terms of quality and the number of obtained multi-relational factors. 1 Introduction The Boolean matrix factorization (or decomposition), also known as the Boolean factor analysis, has gained interest in the data mining community. Methods for decomposition of multi-relational data, i.e. complex data composed from many data tables interconnected via relations between objects or attributes of this data tables, were intensively studied, especially in the past few years. Multi-relational data is a more truthful and therefore often also more powerful representation of reality. An example of this kind of data can be an arbitrary relational database. In this paper we are focused on the subset of multi-relational data, more pre- cisely on the multi-relational Boolean data. In this case data tables and relations between them contain only 0s and 1s. It is important to say that many real-word data sets are more complex than one simple data table. Relations between this tables are crucial, because they carry additional information about the relationship between data and this infor- mation is important for understanding data as a whole. For this reason methods which can analyze multi-relational data usually takes into account relations be- tween data tables unlike classical Boolean matrix factorization methods which can handle only one data table. The Multi-Relational Boolean matrix factorization (MBMF) is used for many data mining purposes. 
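Before describing the task, it helps to fix the data format in its simplest form: two object-attribute Boolean data tables and a Boolean relation connecting the objects of the first with the attributes of the second. A minimal representation is shown below; numpy is an assumption of this sketch, not a requirement of the method.

import numpy as np

C1 = np.array([[1, 0, 1],         # objects of the first data table x its attributes
               [0, 1, 1]], dtype=bool)
C2 = np.array([[1, 1, 0, 0],      # objects of the second data table x its attributes
               [0, 0, 1, 1],
               [1, 0, 1, 0]], dtype=bool)
R_C1C2 = np.array([[1, 0, 1, 0],  # relation: objects of C1 x attributes of C2
                   [0, 1, 0, 1]], dtype=bool)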
The basic task is to find new variables hidden in the data, called multi-relational factors, which explain or describe the original input data. There exist several ways to represent multi-relational factors.
© Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 107–119, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014.
In this work we adopt the setting of [7], where a multi-relational factor is represented as an ordered set of classic factors from the data tables, always one factor from each data table. Whether classic factors can be connected into a multi-relational factor is a matter of the semantics of the relation between the data tables. The main problem is how to connect classic factors into one multi-relational factor. The main aim of this work is to propose a new algorithm which utilizes so-called essential elements from the theory of Boolean matrices. Essential elements provide information about the factors which cover a particular part of the data tables. This information can be used for a better connection of classic factors into one multi-relational factor.

Another issue is the number of obtained factors. In the classical setting we want the number of obtained factors to be as small as possible. Two main views on this requirement can be found in the literature. In the first case we want to obtain a particular number of factors; in the second case we want to obtain factors that explain a prescribed portion of the data. In both cases we want to obtain the most important factors; for more details see [1]. We emphasize this fact and reflect it in the design of our algorithm. Both views can be transferred to the multi-relational case. The first one is straightforward; the second one is slightly problematic, because multi-relational factors may not be able to explain the whole data. This is correct, because multi-relational factors carry different information than classical factors. We discuss this issue later in the paper.

2 Preliminaries and basic notions

We assume familiarity with the basic notions of Formal Concept Analysis [4], which provides a basic framework for dealing with factors, and with Boolean matrix factorization (BMF) [2]. The main goal of classical BMF is to find a decomposition C = A ◦ B, where C is the input data table, A represents the object-factor data table (or matrix) and B represents the factor-attribute data table (or matrix). The product ◦ is the Boolean matrix product, defined by

(A ◦ B)_{ij} = ⋁_{l=1}^{k} A_{il} · B_{lj},    (1)

where ⋁ denotes maximum (the truth function of logical disjunction) and · is the usual product (the truth function of logical conjunction). Decomposing C into A ◦ B corresponds to discovering factors which explain the data. Factors in classical BMF can be seen as formal concepts [2], i.e. entities with an extent part and an intent part. This leads to a clear interpretation of factors. Another benefit of using FCA as the basic framework is that the matrices A and B can be constructed from a subset of all formal concepts. Let F = {⟨A_1, B_1⟩, . . . , ⟨A_k, B_k⟩} ⊆ B(X, Y, C), where B(X, Y, C) is the set of all formal concepts of the data table, which can be seen as a formal context ⟨X, Y, C⟩ with X a set of objects, Y a set of attributes and C a binary relation between X and Y. The matrices A and B are constructed in the following way:

(A)_{il} = 1 if i ∈ A_l and 0 if i ∉ A_l,    (B)_{lj} = 1 if j ∈ B_l and 0 if j ∉ B_l,

for l = 1, . . . , k.
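The product (1) and this construction of A and B are easy to state in code. The sketch below assumes numpy; the tiny matrix C and its two factor concepts are a toy example of our own, used only to verify that the construction yields an exact decomposition.

import numpy as np

def boolean_product(A, B):
    # (A o B)_{ij} = OR over l of (A_{il} AND B_{lj}), cf. formula (1)
    return (A.astype(int) @ B.astype(int)) > 0

C = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 1]], dtype=bool)

# two formal concepts of C, written as (extent, intent) pairs of index sets
factors = [({0, 1}, {0, 1}), ({1, 2}, {2})]

n, m, k = C.shape[0], C.shape[1], len(factors)
A = np.zeros((n, k), dtype=bool)
B = np.zeros((k, m), dtype=bool)
for l, (extent, intent) in enumerate(factors):
    A[sorted(extent), l] = True      # characteristic vector of the extent, column l
    B[l, sorted(intent)] = True      # characteristic vector of the intent, row l

assert (boolean_product(A, B) == C).all()   # exact decomposition C = A o B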
In other words, A is composed from characteristic vectors Al . Similarly for B. In a multi-relation environment we have a set of input data tables C1 , C2 , . . . Cn and a set of relations Rij , where i, j ∈ {1, . . . , n}, between Ci and Cj . The multi-relation factor on data tables C1 , C2 , . . . Cn is an ordered n-tuple i F1i1 , F2i2 , . . . Fnin , where Fj j ∈ Fj , j ∈ {1, . . . , n} (Fj denotes a set of clas- sic factors of data table Cj ) and satisfying relations RCl Cl+1 or RCl+1 Cl for l ∈ {1, . . . , n − 1}. Example 1. Let us have two data tables C1 (Table 1) and C2 (Table 2). Moreover, we consider relation RC1 C2 (Table 3) between objects of the first data table and attributes of the second one. Table 1: C1 Table 2: C2 Table 3: RC1 C2 a b c d e f g h e f g h 1 ××× 5× × 1 ×× 2× × 6 ×× 2× × 3 × × 7××× 3×× × 4×××× 8 ×× 4×××× Classic factors of data table C1 are for example: F1C1 = h{1, 4}, {b, c, d}i, F2C1 = h{2, 4}, {a, c}i, F3C1 = h{1, 3, 4}, {b, d}i and factors of the second ta- ble C2 are: F1C2 = h{6, 7}, {f, g}i, F2C2 = h{5}, {e, h}i, F3C2 = h{5, 7}, {e}i, F4C2 = h{8}, {g, h}i. These factors can be connected with using a relation RC1 C2 into multi-relational factors in several ways. In [7] were introduced three ap- proaches how to manage this connections. We use the narrow approach from [7], which seems to be the most natural, and we obtain two multi-relational factors hF1C1 , F1C2 i and hF3C1 , F1C2 i. The idea of the narrow approach is very simple. We connect two factors FiC1 and FjC2 if the non-empty set of attributes (if such exist), which are common (in the relation RC1 C2 ) to all objects from the first factor FiC1 , is the subset of attributes of the second factor FjC2 . The previous example also demonstrate the most problematic part of MBMF. Usually is problematic to connect all factors from each data table. The result of this is a small number of connections between them. This leads to problematic selection of quality multi-relational factors. The reason for a small number of connections between factors is that classic factors are selected without taking relation into account. Another very important notion for our work are so-called essential elements presented in [1]. Essential elements in the Boolean data table are entries in this data table which are sufficient for covering the whole data table by factors 110 Martin Trnecka and Marketa Trneckova (concepts), i.e. if we take factors which cover all these entries, we automatically cover all entries of the input data table. Formally, essential elements in the data table hX, Y, Ci are defined via minimal intervals in the concept lattice. The entry Cij is essential iff interval bounded by formal concepts hi↑↓ , i↑ i and hj ↓ , j ↓↑ i is non-empty and minimal w.r.t. ⊆ (if it is not contained in any other interval). We denote this interval by Iij . If the table entry Cij is essential, then interval Iij represents the set of all formal concepts (factors) which cover this entry. Very interesting property of essential elements, which is used in our algorithm, is that is sufficient take only one arbitrary concept from each interval to create exact Boolean decomposition of hX, Y, Ci. For more details about essential elements we refer to [1]. 3 Related work There are several papers about classical BMF [1, 2, 5, 8, 10, 12], but this methods can handle only one data table. 
In the literature, we can found a wide range of theoretical and application papers about the multi-relation data analysis (see overview [3]), but many times were shown that these approaches are suitable only for ordinal data. The multi-relational Boolean factor analysis is more specific. The most relevant paper for our work is [7], where was introduced the basic idea that multi-relational factors are composed from classical factors which are interconnected via relation between data tables. There were also introduced three approaches how to create multi-relational factors, but an effective algorithm is missing. The Boolean multi-relational patterns and its extraction are subject of a paper [11]. Differently from our approach data are represented via k-partite graphs. There are considered only relations between attributes and data tables contain only one single attribute. Patterns in [11] are different from our multi- relational factors (are represented as k-clique in data) and also carry different information. In [11] there is also considered other kind of measure of quality of obtained patterns which is based on entropy. Another relevant work is [6] where were introduced the Relational Formal Concept Analysis as a tool for analyzing multi-relational data. Unlike from [6] our approach extracts a different kind of patterns. For more details see [7]. MBMF is mentioned indirectly in a very specific and limited form in [9] as the Joint Subspace Matrix Factorization. Generally the idea of connection patterns from various data tables is not new. It can be found in the social network analysis or in the field of recommendation systems. The main advantage of our approach is that patterns are Boolean fac- tors that carry significant information and the second important advantage is that we deliver the most important factors (factors which describe the biggest portion of input data) before others, i.e. the first obtained factor is the most important. Multi-Relational Boolean Factor Analysis based on Essential Elements 111 4 Algorithm for MBMF Before we present the algorithm for the MBMF we show on a simple example basic ideas that are behind the algorithm. For this purpose we take the example from the previous part. As we mentioned above if we take tables C1 , C2 and relation RC1 C2 , we obtain with the narrow approach two connections between factors, i.e. two multi-relational factors. These factors explain only 60 percent of data. There usually exist more factorizations of Boolean data table. Factors in our example were obtained with using GreConD algorithm from [2]. GreConD algorithm select in each iteration a factor which covers the biggest part of still uncovered data. Now we are in the situation, where we want to obtain a different set of factors, with more connections between them. For this purpose we can use essential elements. Firstly we compute essential parts of C1 (denoted Ess(C1 )) and C2 (denoted Ess(C1 )). With the essential part of data table we mean all essential elements (tables 1 and 2). Table 4: Ess(C1 ) Table 5: Ess(C2 ) a b c d e f g h 1 × 5× × 2× 6 × 3 × × 7× 4 8 ×× Each essential element in Ess(C1 ) is defined via interval in concept lattice of C1 (Fig. 1a) and similarly for essential elements in Ess(C2 ) (Fig 1b). In Fig. 1a is highlighted interval I1c corresponding to essential element (C1 )1c . In Fig. 1b is highlighted interval corresponding to essential element (C2 )8g . Let us note that concept lattices here are only for illustration purpose. 
For computing Ess(C1 ) and Ess(C2 ) is not necessary to construct concept lattices at all. Now, if we use the fact that we can take an arbitrary concept (factor) from each interval to obtain a complete factorization of data table, we have several options which concepts can be connect into one. More precisely we can take two intervals and try to connect each concept from the first interval with concepts from the second one. Again, we obtain full factorization of input data tables, but now we can select factors with regard to a relation between them. For example, if we take highlighted intervals, we obtain possibly four con- nections. First highlighted interval contains two concepts c1 = h{1, 2, 4}, {c}i and c2 = h{1, 4}, {b, c, d}i. Second consist of concepts d1 = h{6, 7, 8}, {g}i and d2 = h{8}, {g, h}i. Only two connections (c1 with d1 and c1 with d2 ) satisfy relation RC1 C2 , i.e. can be connected. For two intervals it is not necessary to try all combination of factors. If we are not able to connect concept hA, Bi from the first interval with concept hC, Di from the second interval, we are not able connect hA, Bi with any concept hE, F i from the second interval, where hC, Di ⊆ hE, F i. Also if we are not 112 Martin Trnecka and Marketa Trneckova h e g c b, d 3 f 5 8 6 a 1 2 7 4 (a) (b) Fig. 1: Concept lattices of C1 (a) and C2 (b) able to connect concept hA, Bi from the first interval with concept hE, F i from the second interval, we are not able connect any concept hC, Di from the first interval, where hC, Di ⊆ hA, Bi, with concept hE, F i. Let us note that ⊆ is classical subconcept-superconcept ordering. Even if we take this search space reduction into account, search in this in- tervals is still time consuming. We propose an heuristic approach which takes attribute concepts in intervals of the second data table, i.e. the bottom elements in each interval. In intervals of the first data table we take greatest concepts which can be connected via relation, i.e. set of common attributes in relation is non-empty. The idea behind this heuristic is that a bigger set of objects pos- sibly have a smaller set of common attributes in a relation and this leads to bigger probability to connect this factor with some factor from the second data table, moreover, if we take factor which contains the biggest set of attributes in intervals of the second data table. Because we do not want to construct the whole concept lattice and search in it, we compute candidates for greatest element directly from relation RC1 C2 . We take all objects belonging to the top element of interval Iij from the first data table and compute how many of them belong to each attribute in the relation. We take into account only attributes belonging to object i. We take as candidate the greatest set of objects belonging to some attribute in a relation, which satisfies that if we compute a closure of this set in the first data table, resulting set of objects do not have empty set of common attributes in a relation. Applying this heuristic on data from the example, we obtain three factors in the first data table, F1C1 = h{2, 4}, {a, c}i, F2C1 = h{1, 3, 4}, {c, d}i, F3C1 = h{1, 2, 4}, {c}i and four factors F1C2 = h{5}, {e, h}i, F2C2 = h{6, 7}, {f, g}i, F3C2 = h{7}, {e, f, g}i, F4C2 = h{8}, {g, h}i from the second one. Between this factors, there are six connections satisfying the relation. These connections are shown in table 6. We form multi-relational factors in a greedy manner. 
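The narrow-approach test that yields such connection tables can be written compactly: a factor ⟨A, B⟩ of C1 may be connected to a factor ⟨C, D⟩ of C2 whenever the set of C2-attributes shared, in the relation, by all objects of A is non-empty and contained in D. A Python sketch with a toy relation of our own (numpy assumed):

import numpy as np

def connectable(extent1, intent2, R):
    """Narrow approach: the objects in extent1 (indices into C1) share a non-empty
    set of C2-attributes in R, and that set is contained in intent2 (indices into C2)."""
    common = set(np.flatnonzero(np.logical_and.reduce(R[sorted(extent1), :], axis=0)))
    return bool(common) and common <= set(intent2)

R = np.array([[1, 1, 0],     # toy relation: objects of C1 x attributes of C2
              [0, 1, 1]], dtype=bool)

print(connectable({0, 1}, {1, 2}, R))   # True: the shared attribute 1 lies in the intent
print(connectable({0, 1}, {0, 2}, R))   # False: attribute 1 is shared but not in the intent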
In each step we connect factors, which cover the biggest part of still uncovered part of data tables C1 and Multi-Relational Boolean Factor Analysis based on Essential Elements 113 Table 6: Connections between factors F1C2 F2C2 F3C2 F4C2 F1C1 × F2C1 × × F3C1 × × × C2 . Firstly, we obtain multi-relational factor hF2C1 , F2C2 i which covers 50 percent of the data. Then we obtain factor hF3C1 , F4C2 i which covers together with first factor 75 percent of the data and last we obtain factor hF1C1 , F3C2 i. All these factors cover 90 percent of the data. By adding other factors we do not obtain better coverage of input data. These three factors cover the same part of input data as six connections from table 6. Remark 1. As we mentioned above and what we can see in the example, multi- relational factors are not always able to explain the whole data. This is due to nature of data. Simply there is no information how to connect some classic factors, e.g. in the example no set of objects from C1 has in RC1 C2 a set of common attributes equal to {e, h} (or only {e} or only {h}). From this reason we are not able to connect any factor from C1 with factor F1C2 . Remark 2. In previous part we explain the idea of the algorithm on a object- attribute relation between data tables. It is also possible consider different kind of relation, e.g. object-object, attribute-object or attribute-attribute relation. Without loss of generality we present the algorithm only for the object-attribute relation. Modification to a different kind of relation is very simple. Now we are going to describe the pseudo-code (Algorithm 1) of our algorithm for MBMF. Input to this algorithm are two Boolean data tables C1 and C2 , binary relation RC1 C2 between them and a number p ∈ [0, 1] which represent how large part of C1 and C2 we want to cover by multi-relational factors, e.g. value 0.9 mean that we want to cover 90 percent of entries in input data tables. Output of this algorithm is a set M of multi-relational factors that covers the prescribed portion of input data (if it is possible to obtain prescribed coverage). The first computed factor covers the biggest part of data. First, in lines 1-2 we compute essential part of C1 and C2 . In lines 2-4 we initialize variables UC1 and UC2 . These variables are used for storing information about still uncovered part of input data. We repeat the main loop (lines 5-18) until we obtain a required coverage or until it is possible to add new multi- relational factors which cover still uncovered part (lines 12-14). In the main loop for each essential element we select the best candidate from interval Iij from the first data table in the greedy manner described in the algorithm idea, i.e. we take the greatest concept which can be connected via relation. Than we try to connect this candidate with factors from the second data table. We compute cover function and we add to M the multi-relational factor maximizing this coverage. 114 Martin Trnecka and Marketa Trneckova In lines 16-17 we remove from UC1 and UC2 entries which are covered by actually added multi-relational factor. 
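The coverage bookkeeping that drives this selection, including the update of lines 16-17, can be sketched as follows; the authors' complete pseudocode is given next as Algorithm 1, and the helper names and the numpy representation below are our own.

import numpy as np

def cover(factor, U1, U2):
    """Number of still-uncovered entries of C1 and C2 that the factor would cover."""
    (A, B), (C, D) = factor            # (extent, intent) of the C1-factor and the C2-factor
    return int(U1[np.ix_(sorted(A), sorted(B))].sum()
               + U2[np.ix_(sorted(C), sorted(D))].sum())

def mark_covered(factor, U1, U2):
    """Lines 16-17: clear the entries covered by the factor that was just added."""
    (A, B), (C, D) = factor
    U1[np.ix_(sorted(A), sorted(B))] = False
    U2[np.ix_(sorted(C), sorted(D))] = False

U1 = np.ones((2, 3), dtype=bool)       # initially everything is uncovered
U2 = np.ones((3, 4), dtype=bool)
factor = (({0}, {1, 2}), ({1, 2}, {0}))
print(cover(factor, U1, U2))           # 4
mark_covered(factor, U1, U2)
print(cover(factor, U1, U2))           # 0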
Algorithm 1: Algorithm for the multi-relational Boolean factors analysis Input: Boolean matrices C1 , C2 and relation RC1 C2 between them and p ∈ [0, 1] Output: set M of multi-relational factors 1 EC1 ← Ess(C1 ) 2 EC2 ← Ess(C2 ) 3 UC1 ← C1 4 UC2 ← C2 5 while (|UC1 | + |UC2 |)/(|C1 | + |C2 |) ≥ p do 6 foreach essential element (EC1 )ij do 7 compute the best candidate ha, bi from interval Iij 8 end 9 hA, Bi ← select one from set of candidates which maximize cover of C1 ↑ ↓↑C2 10 select non-empty row i in EC2 for which is A RC1 C2 ⊆ (C2 )i and which maximize cover of C1 and C2 ↑↓C2 11 hC, Di ← h(C2 )i , (C2 )↑C i 2 i 12 if value of cover function for C1 and C2 is equal to zero then 13 break 14 end 15 add hhA, Bi, hC, Dii to M 16 set (UC1 )ij = 0 where i ∈ A and j ∈ B 17 set (UC1 )ij = 0 where i ∈ C and j ∈ D 18 end 19 return F Our implementation of the algorithm follows the pseudo-code conceptually, but not in details. For example we speed up the algorithm by precomputing can- didates or instead computing candidates for each essential elements, we compute candidates for essential areas, i.e. essential elements which are covered by one formal concept. Remark 3. The input of our algorithm are two Boolean data tables and one relation between them. In general we can have more data tables and rela- tions. Generalization of our algorithm for such input is possible. Due to lack of space we mentioned only an idea of this generalization. For the input data tables C1 , C2 , . . . , Cn and relations RCi Ci+1 , i ∈ {1, 2, . . . , n − 1} we firstly com- pute multi-relational factors for Cn−1 and Cn . Then iteratively compute multi- relational factors for Cn−2 and Cn−1 . From this pairs we construct n-tuple multi- relational factor. Multi-Relational Boolean Factor Analysis based on Essential Elements 115 We do not make a detail analysis of the time complexity of the algorithm. Even our slow implementation in MATLAB is fast enough for factorization usu- ally large datasets in a few minutes. 5 Experimental evaluation For experimental evaluation of our algorithm we use in a data minig community well known real dataset MovieLens1 . This dataset is composed of two data tables that represent a set of users and their attributes, e.g. gender, age, sex, occupation and a set of movies again with their attributes, e.g. the year of production or genre. Last part of this dataset is a relation between this data sets. This relation contains 1000209 anonymous ratings of approximately 3900 movies (3952) made by 6040 MovieLens users who joined to MovieLens in 2000. Each user has at least 20 ratings. Ratings are made on a 5-star scale (values 1-5, 1 means, that user does not like a movie and 5 means that he likes a movie). Originally data tables Users and Movies are categorical. Age is grouped into 7 categories such as “Under 18”, “18-24”, “25-34”, “35-44”, “45-49”, “50-55” and “56+”. Sex is from set {Male, Female}. Occupation is chosen from the following choices: “other” or not specified, “academic/educator”, “artist”, “cler- ical/admin”, “college/grad student”, “customer service”, “doctor/health care”, “executive/managerial”, “farmer”, “homemaker”, “K-12 student”, “lawyer”, “pro- grammer”, “retired”, “sales/marketing”, “scientist”, “self-employed”, “techni- cian/engineer”, “tradesman/craftsman”, “unemployed” and “writer”. 
Film gen- res are following: “Action”, “Adventure”, “Animation”, “Children’s”, “Com- edy”, “Crime”, “Documentary”, “Drama”, “Fantasy”, “Film-Noir”, “Horror”, “Musical”, “Mystery”, “Romance”, “Sci-Fi”, “Thriller”, “War” and “Western”. Year of production is from 1919 to 2000. We grouped years into 8 categories “1919-1930”, “1931-1940”, “1941-1950”, “1951-1960”, “1961-1970”, “1971-1980”, “1981-1990” and “1991-2000”. We convert the ordinal relation in to binary one. We use three different scaling. The first is that user rates a movie. The second is that a user does not like a movie (he rates movie with 1-2 stars). The last one is that user likes a movie (rates 4-5). This does not mean, that users do like (respective do not like) some genre, it means, that movies from this genre are or are not worth to see. We took the middle size version of the MovieLens dataset and we made a restriction to 3000 users and movies that were rated by that users. We take users, who rate movies the most, and we obtain dimension of the first data table 3000×30 and dimension of the second data table is 3671×26. Let us just note that for obtaining object-attribute relation we need to transpose Movies data table. Relation “user rates a movie” make sense, because user rates a movie if he has seen it. We can understand this relation as user has seen movie. We get 29 multi-relational factors, that cover almost 100% of data (99.97%). Values of coverage, i.e. how large part of input data is covered can be seen in figure 2. 1 http://grouplens.org/datasets/movielens/ 116 Martin Trnecka and Marketa Trneckova Graphs in figure 3 show coverage of Users data table and Movies data table separately. We can also see that for explaining more than 90 percent of data are sufficient 17 factors. This is significant reduction of input data. 1 0.9 0.8 0.7 0.6 coverage 0.5 0.4 0.3 0.2 0.1 0 0 5 10 15 20 25 30 number of factors Fig. 2: Cumulative coverage of input data 1 1 0.9 0.9 0.8 0.8 0.7 0.7 0.6 0.6 coverage coverage 0.5 0.5 0.4 0.4 0.3 0.3 0.2 0.2 0.1 0.1 0 0 0 5 10 15 20 25 30 0 5 10 15 20 25 30 number of factors number of factors (a) Coverage of Users data table (b) Coverage of Movies data table Fig. 3: Coverage of input data tables The most important factors are: – Males rate new movies (movies from 1991 to 2000). – Young adult users (ages 25-34) rate drama movies. Multi-Relational Boolean Factor Analysis based on Essential Elements 117 – Females rate comedy movies. – Youth users (18-24) rate action movies. Another interesting factors are: – Old users (from category 56+) rate movies from their childhood (movies from 1941 to 1950). – Users in age range 50-55 rate children’s movies. Users in this age usually have grand children. – K-12 students rate animation movies. Due to lack of space, we skip details about factors in relation “user does not like a movie” and relation “user does like a movie”. In the first relation we get 30 factors, that covers 99.99% of data. In the second one, we get 29 factors, covering 99.96% of data. Compute all multi-relational factors on this datasets take approximately 5 minutes. Remark 4. In case of MovieLens we are able to reconstruct input data tables almost wholly for each three relations. Interesting question is what about rela- tion, i.e. can we reconstruct relation between data tables? Answer is yes, we can. Multi-relational factor carry also information about the relation between data tables. So we can reconstruct it, but with some error. This error is a result of choosing the narrow approach. 
Reconstruction error of relation is interesting information and can be mini- mize if we take this error into account in phase of computing coverage. In other words we want maximal coverage with minimal relation reconstruction error. This leads to more complicated algorithm because we need weights to compute a value of utility function. We implement also this variant of algorithm. Re- quirement of minimal reconstruction error and maximal coverage seems to be contradictory, but this claim need more detailed study. Also it is necessary to determine correct weight settings. We left this issue for the extended version of this paper. 6 Conclusion and Future Research In this paper, we present new algorithm for multi-relational Boolean matrix fac- torization, that uses essential elements from binary matrices for constructing better multi-relational factors, with regard to relations between each data ta- ble. We test the algorithm on, in data mining well known, dataset MovieLens. We obtain from these experiments interesting and easy interpretable results, moreover, the number of obtained multi-relational factors needed for explaining almost whole data is reasonable small. A future research shall include the following topics: generalization of the al- gorithm for MBMF for ordinal data, especially data over residuated lattices. Construction of algorithm which takes into account reconstruction error of the 118 Martin Trnecka and Marketa Trneckova relation between data tables. Test the potential of this method in recommen- dation systems. And last but not least create not crisp operator for connecting classic factors into multi-relational factors. Acknowledgment We acknowledge support by IGA of Palacky University, No. PrF 2014 034. References 1. Belohlavek R., Trnecka M.: From-Below Approximations in Boolean Matrix Fac- torization: Geometry and New Algorithm. http://arxiv.org/abs/1306.4905, 2013. 2. Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. J. Comput. Syst. Sci. 76(1), 3–20, 2010. 3. Džeroski S.: Multi-Relational Data Mining: An Introduction. ACM SIGKDD Ex- plorations Newsletter, 1(5), 1–16, 2003. 4. Ganter B., Wille R.: Formal Concept Analysis: Mathematical Foundations. Springer, Berlin, 1999. 5. Geerts F., Goethals B., Mielikäinen T.: Tiling databases, Proceedings of Discovery Science 2004, pp. 278–289, 2004. 6. Hacene M. R., Huchard M., Napoli A., Valtechev P.: Relational concept analysis: mining concept lattices from multi-relational data. Ann. Math. Artif. Intell. 67(1), 81–108, 2013. 7. Krmelova M., Trnecka M.: Boolean Factor Analysis of Multi-Relational Data. In: M. Ojeda-Aciego, J. Outrata (Eds.): CLA 2013: Proceedings of the 10th Interna- tional Conference on Concept Lattices and Their Applications, pp. 187–198, 2013. 8. Lucchese C., Orlando S., Perego R.: Mining top-K patterns from binary datasets in presence of noise. SIAM DM 2010, pp. 165–176, 2010. 9. Miettinen P.: On Finding Joint Subspace Boolean Matrix Factorizations. Proc. SIAM International Conference on Data Mining (SDM2012), pp. 954-965, 2012. 10. Miettinen P., Mielikäinen T., Gionis A., Das G., Mannila H.: The discrete basis problem, IEEE Trans. Knowledge and Data Eng. 20(10), 1348–1362, 2008. 11. Spyropoulou E., De Bie T.: Interesting Multi-relational Patterns. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining, ICDM ’11, pp. 675–684, 2011. 12. Xiang Y., Jin R., Fuhry D., Dragan F. 
F.: Summarizing transactional databases with overlapped hyperrectangles, Data Mining and Knowledge Discovery 23(2), 215–251, 2011. On Concept Lattices as Information Channels Francisco J. Valverde-Albacete1? , Carmen Peláez-Moreno2 , and Anselmo Peñas1 1 Departamento de Lenguajes y Sistemas Informáticos Universidad Nacional de Educación a Distancia, c/ Juan del Rosal, 16. 28040 Madrid, Spain {fva,anselmo}@lsi.uned.es 2 Departamento de Teorı́a de la Señal y de las Comunicaciones Universidad Carlos III de Madrid, 28911 Leganés, Spain carmen@tsc.uc3m.es Abstract. This paper explores the idea that a concept lattice is an in- formation channel between objects and attributes. For this purpose we study the behaviour of incidences in L-formal contexts where L is the range of an information-theoretic entropy function. Examples of such data abound in machine learning and data mining, e.g. confusion matri- ces of multi-class classifiers or document-term matrices. We use a well- motivated information-theoretic heuristic, the maximization of mutual information, that in our conclusions provides a flavour of feature selection providing and information-theory explanation of an established practice in Data Mining, Natural Language Processing and Information Retrieval applications, viz. stop-wording and frequency thresholding. We also in- troduce a post-clustering class identification in the presence of confusions and a flavour of term selection for a multi-label document classification task. 1 Introduction Information Theory (IT) was born as a theory to improve the efficiency of (man- made) communication channels [1, 2], but it soon found wider application [3]. This paper is about using the model of a communication channel in IT to explore the formal contexts and concept lattices of Formal Concept Analysis as realisa- tions of information channels between objects and attributes. Given the highly unspecified nature of both the latter abstractions such a model will bring new insights into a number of problems, but we are specifically aiming at machine learning and data mining applications [4, 5]. The metaphor of a concept lattice as a communication channel between ob- jects and attributes is already implicit in [6, 7]. In there, adjoint sublattices were already considered as subchannels in charge of transmitting individual acousti- cal features, and some efforts were done to model such features explicitly [7], ? FJVA and AP are supported by EU FP7 project LiMoSINe (contract 288024) for this work. CPM has been supported by the Spanish Government-Comisión Inter- ministerial de Ciencia y Tecnologı́a project TEC2011-26807. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 119–131, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 120 Francisco J. Valverde Albacete, Carmen Peláez-Moreno and Anselmo Peñas but no conclusive results were achieved. The difficulty rose from a thresholding parameter ϕ that controls the lattice-inducing technique and was originally fixed by interactive exploration, a procedure hard to relate to the optimization of a utility or cost function, as required in modern machine learning. In this paper we set this problem against the backdrop of direct mutual infor- mation maximization—using techniques and insights developed since [6, 7]—for matrices whose entries are frequency counts. 
These counts appear frequently in statistics, data mining and machine learning, for instance, in the form of document-term matrices in Information Retrieval [8], confusion matrices for clas- sifiers in perceptual studies, data mining and machine learning [9], or simply two-mode contingency tables with count entries. Such matrices are called aggre- gable in [4], in the sense that any group of rows or columns can be aggregated together to form another matrix whose frequencies are obtained from the data of the elements in the groups. We will use this feature to easily build count and probability distributions whose mutual information can be maximized, following the heuristic motivated above, to improve classification tasks. Note that max- imizing mutual information (over all possible joint distributions) is intimately related to the concept of channel capacity as defined by Shannon [2]. For this purpose, in Sec. 2 we cast the problem of analysing the transfer of information through the two modes of contingency tables as that of analysing a particular type of formal context. First we present in Sec. 2.1 the model of the task to be solved, then we present aggregable data, as usually found in machine learning applications in Sec. 2.2, and then introduce the entropic encoding to make it amenable to FCA. As an application, in Sec. 3.1 we explore the particular problem of supervised clustering as that of transferring the labels from a set of input patterns to the labels of the output classes. Specifically we address the problem of assigning labels to mixed clusters given the distribution of the input labels in them. We end with a discussion and a summary of contributions and conclusions. 2 Theory 2.1 Classification optimization by mutual information maximization Consider the following, standard supervised classification setting: we have two domains X and Y , m instances of i.i.d. samples S = {(xi , yi )}m i=1 ⊆ X × Y , and we want to learn a function h : X → Y , the hypothesis, with certain “good” qualities, to estimate the class Y from X , the measurements of Y , or features. A very productive model to solve this problem is to consider two probability spaces Y = hY, PY i and X = hX, PX i with Y ∼ PY and X ∼ PX , and suppose that there exists the product space hX × Y, PXY i wherefrom the i.i.d. samples of S have been obtained. So our problem is solved by estimating the random variable Ŷ = h(X), and a “good” estimation is that which obtains a low error probability on every possible pair P (Ŷ 6= Y ) → 0 . Since working with probabilities might be difficult, we might prefer to use a (surrogate) loss function that quantifies the cost of this difference L(ŷ = On Concept Lattices as Information Channels 121 h(x), y) and try to minimize the expectation of this loss, called the risk R(h) = E[L(h(x), y)] over a class of functions h ∈ H, h∗ = minh∈H R(h) . Consequently, this process is called empirical risk minimization. An alternate criterion is to maximize the mutual information between Y and Ŷ [10]. This is clearly seen from Fano’s inequality [11], serving as a lower bound, and the Hellman-Raviv upper bound [12], HPŶ − IPY Ŷ − 1 1 ≤ P (Ŷ 6= Y ) ≤ HPŶ |Y HUŶ 2 where UŶ is the uniform distribution on the support of Ŷ , HPXX denotes the different entropies involved and IPY Ŷ is the mutual information of the joint probability distribution. 
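The inequality above bounds the error probability by entropic quantities. The sketch below, which is ours and not part of the paper, simply makes the ingredients concrete: it computes the entropies, the mutual information and the two bounds (in bits) from a small joint distribution of Y and Ŷ. The function names and the toy table are assumptions, and the lower bound is our reading of the displayed inequality, (H_{PŶ} − I − 1)/H_{UŶ}; for easy problems it can be negative, i.e. vacuous.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# toy joint distribution of (Y, Yhat): rows index Y, columns index Yhat
P = np.array([[0.35, 0.05],
              [0.10, 0.50]])

H_Y      = entropy(P.sum(axis=1))        # H(Y)
H_Yhat   = entropy(P.sum(axis=0))        # H(Yhat)
H_joint  = entropy(P.ravel())            # H(Y, Yhat)
I        = H_Y + H_Yhat - H_joint        # mutual information I(Y; Yhat)
H_cond   = H_joint - H_Y                 # conditional entropy H(Yhat | Y)
H_U      = np.log2(P.shape[1])           # entropy of the uniform distribution on Yhat

fano_lower    = (H_Yhat - I - 1) / H_U   # Fano-style lower bound on P(Yhat != Y)
hellman_raviv = 0.5 * H_cond             # Hellman-Raviv style upper bound
print(I, fano_lower, hellman_raviv)      # a negative lower bound is trivially satisfied
```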
2.2 Processing aggregable data If the original rows and columns of contingency tables represent atomic events, their groupings represent complex events and this structure is compatible with the underlying sigma algebras that would transform the matrix into a joint distribution of probabilities, hence these data can be also interpreted as joint probabilities, when row- and column-normalized. When insufficient data is available for counting, the estimation of empirical probabilities from this kind of data is problematic, and complex probability estimation schemes have to be used. Even if data galore were available, we still have to deal with the problem of rarely seen events and their difficult probability estimation. However, probabilities are, perhaps, the best data that we can plug onto data mining or machine learning techniques, be they for supervised or unsupervised tasks. The weighted Pointwise Mutual Information. Recall the formula for the mutual information between two random variables IPXY = EPXY [IXY (x, y)] where IXY (x, y) = log PXPXY (x,y) (x)·PY (y) is the pointwise mutual information, (PMI). Remember that −∞ ≤ IXY (x, y) < ∞ with IXY (x, y) = 0 being the case where X and Y are independent. The negative values are caused by phenomena less represented in the joint data than in independent pairs as captured by the marginals. The extreme value IXY (x, y) = −∞ is generated when the joint probability is negative even if the marginals are not. These are instances that capture “negative” association whence to maximize the expectation we might consider disposing of them. On the other hand, on count data the PMI has an unexpected and unwanted effect: it is very high for hapax legomena phenomena that are encountered only once in a tallying, and in general it has a high value for phenomena with low counts of whose statistical behaviour we are less certain. 122 Francisco J. Valverde Albacete, Carmen Peláez-Moreno and Anselmo Peñas However, we know that X X PXY (x, y) IPXY = PXY (x, y) · IXY (x, y) = PXY (x, y) log x,y x,y PX (x) · PY (y) and this is always a positive quantity, regardless of the individual values of IXY (x, y). This suggests calling weighted pointwise mutual information, (wPMI) the quantity PXY (x, y) wPMI(x, y) = PXY (x, y) log (1) PX (x) · PY (y) and using it as the subject of optimization or exploration to do so. Note that pairs of phenomena whose joint probability are close to independent, as judged by the pointwise information, will be given a very low value in the wP M I , and that the deleterious character of hapaxes on IPXY is lessened by the influence of the joint probability. 2.3 Visualizing mutual information maximization For a joint distribution PY Ŷ (y, ŷ), [13] introduced a balance equation binding the mutual information between two variables IPY Ŷ , the sum of their conditional en- tropies V IPY Ŷ = HPY |Ŷ + HPŶ |Y and the sum of their entropic distance between their distributions and uniformity ∆HPY Ŷ = (HUY − HPY ) + (HUŶ − HPŶ ), log(HUY ) + log(HUŶ ) = ∆HPY Ŷ + 2 ∗ IPY Ŷ + V IPY Ŷ . By normalizing in the total entropy log(HUY ) + log(HUŶ ) we may obtain the equation of the 2-simplex that can be represented as a De Finetti diagram like that of Fig. 2.(a), as the point in the 2-simplex corresponding to coordinates F (PY Ŷ ) = [∆HP0 Y Ŷ , 2 ∗ IP0 Y Ŷ , V IP0 Y Ŷ ] where the primes represent the normalization described above. 
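To make Eq. (1) and the normalized coordinates F(P_{YŶ}) concrete, here is a small sketch (ours, not the authors' code) that computes the wPMI matrix and the entropy-triangle coordinates of a count matrix. It assumes base-2 logarithms and H_U = log2 k for the uniform-distribution entropies; the three coordinates sum to 1, i.e. they are a point of the 2-simplex, and the wPMI entries sum to the mutual information, as noted in the text.

```python
import numpy as np

def wpmi(N):
    """Weighted pointwise mutual information (Eq. 1) of a count matrix N."""
    P = N / N.sum()
    Px = P.sum(axis=1, keepdims=True)
    Py = P.sum(axis=0, keepdims=True)
    with np.errstate(divide='ignore', invalid='ignore'):
        W = P * np.log2(P / (Px * Py))
    return np.nan_to_num(W)              # cells with zero count contribute 0

def et_coordinates(N):
    """Normalized entropy-triangle coordinates [dH', 2I', VI'] of a count matrix."""
    P = N / N.sum()
    Px, Py = P.sum(axis=1), P.sum(axis=0)
    H = lambda p: -(p[p > 0] * np.log2(p[p > 0])).sum()
    Hx, Hy, Hxy = H(Px), H(Py), H(P.ravel())
    I = Hx + Hy - Hxy                    # mutual information
    VI = (Hxy - Hy) + (Hxy - Hx)         # H(X|Y) + H(Y|X)
    dH = (np.log2(len(Px)) - Hx) + (np.log2(len(Py)) - Hy)
    total = np.log2(len(Px)) + np.log2(len(Py))
    return np.array([dH, 2 * I, VI]) / total

N = np.array([[30, 2, 1],
              [3, 25, 4],
              [1, 5, 29]])
print(wpmi(N).sum())        # equals the mutual information of the count matrix
print(et_coordinates(N))    # non-negative coordinates summing to 1
```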
The axis of this representation were chosen so that the height of the 2- simples—an equilateral triangle—is proportional to the mutual information be- tween the variables so a maximization process is extremely easy to represent (as in Fig. 2): given a parameter ϕ whereby to maximize IPY Ŷ (as a variable), draw the trace of the evaluation of the coordinates in the ET of the distributions that it generates, and choose the ϕ∗ that produces the highest point in the triangle. This technique is used in Sec. 3.1, but other intuitions can be gained from this representation as described in [14]. 2.4 Exploring the space of joint distributions Since the space of count distributions is so vast, we need a technique to explore it in a principled way. For that purpose we use K-Formal Concept Analysis On Concept Lattices as Information Channels 123 (KFCA). This is a technique to explore L-valued contexts where L is a complete idempotent semifield using a free parameter called the threshold of existence [15, 13]. We proceed in a similar manner to Fuzzy FCA: For L-context hY, Ŷ, Ri, con- sider two spaces LY and LŶ , representing, respectively, L-valued sets of objects and attributes. Pairs of such sets of objects and attributes that fulfil certain po- lars equation have been proven to define dually-ordered lattices of closed L-sets in the manner of FCA 3 . Since the actual lattices of object sets and attributes are so vast, KFCA uses a simplified representation for them: for the singleton sets in each of the spaces δy , for y ∈ Y and δŷ , for ŷ ∈ Ŷ , we use the L-polars to generate their object- γYϕ (y) and attribute-concept µϕ Ŷ (ŷ), respectively, and obtain a structural ϕ-context Kϕ = hY, Ŷ, Rϕ i, where yRϕ ŷ ⇐⇒ γYϕ (y) ≤ µϕ Ŷ (ŷ) 4 . In this particular case we consider the min-plus idempotent semifield and the L-context hY, Ŷ, wP M Ii where wPMI is the weighted Pointwise Mutual In- formation relation between atomic events in the sigma lattices of Y and Ŷ of Sec. 2.2, whence the degree or threshold of existence is a certain amount of entropy required for concepts to surpass for them to be considered. The following step amounts to an entropy conformation of the joint distribu- tion, that is, a redistribution of the probability masses in the joint distribution to obtain certain entropic properties. Specifically, we use the (binary) ϕ-formal context to filter out certain counts in the contingency table to obtain a confor- mal contingency table NYϕŶ (y, ŷ) = NY Ŷ (y, ŷ) Kϕ , where represents here the Hadamard (pointwise) product. For each conformal NYϕŶ (y, ŷ) we will obtain a certain point F (ϕ) in the ET to be represented as described in Sec. 2.3. 3 Application We next present two envisaged applications of the technique of MI Maximization. 3.1 Cluster identification Confusion matrices are special contingency tables whose two modes refer to the same underlying set of labels[4]. We now put forward a procedure to maximize the information transmitted from a set of “ground truth” patterns acting as objects with respect to “perceived patterns” which act as attributes. As noted in the introduction, this is just one of the possible points of view about this problem. Consider the following scenario, there is a clustering task for which extrinsic evaluation is possible, that is, there is a gold standard partitioning of the in- put data. 
One way to evaluate the clustering solution is to obtain a confusion matrix out of this gold standard, in the following way: if the number of classes is known—a realistic assumption in the presence of a gold standard—then the MI optimization procedure can be used to obtain the assignments between the classes in the gold standard and the clusters of the procedure, resulting in cluster identification.

³ Refer to [13] for an in-depth discussion of the mathematics of idempotent semifields and the different kinds of Galois connections that they generate.
⁴ And a structural ϕ-lattice Bϕ(Kϕ) as its concept lattice, but this is not important in the present application.

For the purpose of testing the procedure, we used the segmented numeral data from [16]. This is a task of human visual confusions between numbers as displayed by seven-segment LED displays, as shown in Fig. 1.(a). The entry of the count matrix NCK(c, k) = nck counts the number of times that an instance of class c was confused with class k. Figure 1.(b) shows a heatmap presentation of the original confusion matrix and column-reshuffled variants. Note that the confusion matrix is diagonally dominant, that is, n_ii > Σ_{j, j≠i} n_ij, and likewise for column i.

Fig. 1: Segmented numeral display (a) from [16] and the column-reshuffled confusion matrix (b) of the human-perception experiment. Cluster identification is already evident in this human-visualization aid, but the method presented here is unsupervised.

To test the MI optimization procedure, we randomly permuted the confusion matrix columns: the objective was to recover the inverse of this random permutation from the MI optimization process, so that the original order could be restored. This amounts to an assignment between classes and induced clusters, and we claim that it can be done by means of the mutual information maximization procedure sketched above.

For that purpose, we estimated PCK(c, k) using the empirical estimate

P̂CK(c, k) ≈ NCK(c, k) / n

where n is the number of instances to be clustered, n = Σ_{c,k} NCK(c, k). We then obtained its empirical PMI, ÎCK(c, k) = log [ P̂CK(c, k) / (P̂C(c) · P̂K(k)) ], and its weighted PMI, wPMI_CK(c, k) = P̂CK(c, k) · ÎCK(c, k).

Next, we used the procedure of Sec. 2.4 to explore the empirical wPMI and select the threshold value which maximizes the MI. Figure 2.(a) shows the trajectory of the different conformed confusion matrices as ϕ ranges in [0, ∞) on the ET: we clearly see how, for this balanced task dataset, the exploration results in a monotonic increase in MI over the thresholding range up to a value that produces the maximum MI, at wPMI* = 0.1366. The discrete set of points stems from the limited range of counts in the data. We chose this value as the threshold and obtained the binary matrix which is the assignment from classes to clusters and vice versa, shown in Fig. 2.(b). Note that in this particular instance the ϕ*-concept lattice is just a diamond lattice, reflecting the perfect identification of classes and clusters. In general, with contingency tables where the modes have different cardinalities, this will not be the case.
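A minimal sketch of this cluster-identification procedure, under our own naming and on a toy confusion matrix: estimate the empirical joint distribution, compute the wPMI matrix, sweep the threshold ϕ, and keep the binary ϕ-context whose conformed count matrix has the largest mutual information. For a diagonally dominant matrix with shuffled columns, the winning context is the permutation matrix pairing classes with clusters. This mirrors the exploration of Sec. 2.4 only in spirit; the KFCA machinery itself is not reproduced.

```python
import numpy as np

def wpmi(N):
    # weighted pointwise mutual information of a count matrix (same helper as above)
    P = N / N.sum()
    Px = P.sum(axis=1, keepdims=True); Py = P.sum(axis=0, keepdims=True)
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.nan_to_num(P * np.log2(P / (Px * Py)))

def mutual_information(N):
    P = N / N.sum()
    H = lambda p: -(p[p > 0] * np.log2(p[p > 0])).sum()
    return H(P.sum(axis=1)) + H(P.sum(axis=0)) - H(P.ravel())

def identify_clusters(N, n_steps=50):
    """Sweep a wPMI threshold phi; at each phi keep only the cells whose wPMI
    reaches phi (a binary phi-context), conform the count matrix with it and
    score the conformed matrix by its mutual information; keep the best."""
    W = wpmi(N)
    best_mi, best_phi, best_K = -np.inf, None, None
    for phi in np.linspace(W.min(), W.max(), n_steps):
        K = (W >= phi).astype(int)
        conformed = N * K
        if conformed.sum() == 0:
            continue
        mi = mutual_information(conformed)
        if mi > best_mi:
            best_mi, best_phi, best_K = mi, phi, K
    return best_phi, best_K

# toy diagonally dominant confusion matrix with randomly shuffled columns
N = np.array([[50, 2, 3],
              [4, 60, 1],
              [2, 5, 40]])
perm = np.random.permutation(3)
phi_star, K_star = identify_clusters(N[:, perm])
print(K_star)   # a permutation matrix pairing each class (row) with its cluster (column)
```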
3.2 Entropy conformation for count matrices

The case where the contingency matrix is square and diagonally dominant, as in the previous example, is too specific: we need to show that for a generic, rectangular count contingency matrix, entropy maximization is feasible and meaningful. The first investigation should be on how to carry out the maximization process.

For that purpose, we use a modified version of the Reuters-21578 collection⁵ that has already been stop-listed and stemmed. This is a multi-label classification dataset [17] describing each document as a bag-of-terms together with some categorization labels, the latter unused in our present discussion.

We considered the document-term matrix for training, a count distribution with D = 7,770 documents and T = 5,180 terms. Its non-conformed entropy coordinates are F(NDT) = [0.1070, 0.3584, 0.5346], as shown by the deep blue circle to the left of Fig. 3. We carried out a joint mutual-information maximization process by exploring at the same time a max-plus threshold—the count has to be bigger than the threshold to be considered—and a min-plus threshold—the count has to be less than the threshold. The rationale for this is a well-tested hypothesis in the bag-of-terms model: very common terms (high frequency) do not select well for documents, while very scarce terms (low frequency) are too specific and biased to denote the general “aboutness” of a document. Both should be filtered out of the document-term matrix.

⁵ http://www.daviddlewis.com/resources/testcollections/reuters21578/readme.txt. Visited 24/06/2014.

Fig. 2: (a) Trajectory of the evolution of MI transmission for the segmented numeral data as the exploration threshold is raised in the wPMI matrix; (b) maximal MI cluster assignment matrix at wPMI = 1.366 bits for the column-shuffled Segmented Numerals:

        0 0 0 0 1 0 0 0 0 0
        1 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 1 0
        0 0 0 0 0 0 0 0 0 1
Kϕ* =   0 0 0 0 0 1 0 0 0 0
        0 0 0 0 0 0 0 1 0 0
        0 0 0 0 0 0 1 0 0 0
        0 0 1 0 0 0 0 0 0 0
        0 0 0 1 0 0 0 0 0 0
        0 1 0 0 0 0 0 0 0 0

The resulting concept lattice is just a diamond lattice identifying classes and clusters and is not shown.

Instead of count-based individual term filtering we carry out a joint term-document pair selection process: for a document-term matrix, we calculate its overall weighted PMI matrix, and only those pairs (d, t) whose wPMI lies between a lower threshold φ and an upper threshold ϕ are considered important for later processing. For each such pair, we create an indicator matrix I(d, t) that is 1 iff φ ≤ wPMI(d, t) ≤ ϕ, and we use the Hadamard (pointwise) multiplication to filter out non-conforming pairs from the final entropy calculation,

M̂I'_{P_DT} = Σ_{d,t} wPMI_DT(d, t) · I(d, t)

Figure 3 represents the trace of that process as we explore a grid of 10 × 10 different values of φ and ϕ (the same set of values for both). The grid was obtained by equal-width binning of the whole range of wPMI_DT(d, t) in the original wPMI matrix, as defined in [18].

Fig. 3: Trace of the entropy conformation process for a count matrix. The blue dot to the left is the original level of entropy. For a wide range of pairs (φ, ϕ) the entropy of the conformed count matrix is greater than the original one, and we can actually find a value where it is maximized.

We can see how M̂I'_{P_DT} reaches a maximum over two values and then decreases again, going even below the original mutual information value; a minimal sketch of this two-threshold grid search is given below.
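The following sketch of the two-threshold grid search uses our own code and names, not the authors': bin the wPMI range with equal width, scan pairs (φ, ϕ), keep the (d, t) pairs whose wPMI falls in the band, and score the conformed count matrix. Here the score is the mutual information of the re-normalised conformed matrix, while the text's M̂I' weights the original wPMI by the indicator; either score can be plugged into the same loop.

```python
import numpy as np

def wpmi(N):
    P = N / N.sum()
    Px = P.sum(axis=1, keepdims=True); Py = P.sum(axis=0, keepdims=True)
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.nan_to_num(P * np.log2(P / (Px * Py)))

def mutual_information(N):
    P = N / N.sum()
    H = lambda p: -(p[p > 0] * np.log2(p[p > 0])).sum()
    return H(P.sum(axis=1)) + H(P.sum(axis=0)) - H(P.ravel())

def conform(N, n_bins=10):
    """Grid search over a lower (phi_low) and an upper (phi_high) wPMI threshold:
    keep the (d, t) pairs whose wPMI lies in the band and score the conformed
    count matrix by its mutual information."""
    W = wpmi(N)
    grid = np.linspace(W.min(), W.max(), n_bins + 1)   # equal-width binning of the wPMI range
    best = (-np.inf, None, None)
    for lo in grid:
        for hi in grid:
            if hi <= lo:
                continue
            I = (W >= lo) & (W <= hi)                  # indicator of retained pairs
            conformed = N * I
            if conformed.sum() == 0:
                continue
            mi = mutual_information(conformed)
            if mi > best[0]:
                best = (mi, lo, hi)
    return best

# toy "document-term" count matrix: 20 documents x 30 terms with a few bursty terms
rng = np.random.default_rng(0)
N = rng.poisson(2.0, size=(20, 30)) + 10 * (rng.random((20, 30)) < 0.05)
print(conform(N))   # (retained MI, phi_low, phi_high)
```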
We read 128 Francisco J. Valverde Albacete, Carmen Peláez-Moreno and Anselmo Peñas two different facts in this illustration: that the grid used is effective in obtaining approximations to φ and ϕ for MI maximization, and that not every possible pair of values is a good solution for the process. All in all, this procedure shows that MI maximization is feasible by tracking its in the ET. We do not present any results in this paper as to the effectiveness of the process for further processing tasks, which should be evaluated on the extrinsic measures on the Reuters multi-labelling task. 4 Discussion We now discuss the applications selected in a wider context. Although less per- vasive than its unsupervised version, the basic task of supervised clustering has application, for instance, in tree-induction for supervised classification [5, 18] or unsupervised clustering evaluation using a gold-set [19]. Cluster identification in Sec. 3.1 is a sometimes-fussy sub-procedure in clustering which our proposal solves elegantly. The feasibility study on mutual information conformation of Sec. 3.2 is a necessary step for further processing—binary or multi-labelling classification— but as of this paper unevaluated. Further work should concentrate on leveraging the boost in mutual information to lower the classification error, as suggested in the theoretical sections. Besides, the use of two simultaneous, thresholds on different algebras makes it difficult to justify the procedure on FCA terms: this does not conform to the definition of any lattice-inducing polars that we know of, so this feature should be looked into critically. Despite this fact, the procedure of conformation “makes sense”, at least for this textual classification task. Note that the concept of “information channel” that we have developed in this paper is not what Communication Theory usually considers. In there, “input symbols” enter the channel and come out as “output symbols”, hence input has a sort of ontological primacy over output symbols in that the former cause the latter. If there is anything particular about FCA as an epistemological theory is that it does not prejudge the ontological primacy of objects over attributes or vice versa. Perhaps the better notion is that a formal concept is an information co- channel between objects and attributes, in the sense that the information “flows” both from objects to attributes and vice versa, as per the true symmetric nature of mutual information: receiving information about one of the modes decreases the uncertainty of the other. The previous paragraph notwithstanding, we will often find ourselves in ap- plication scenarios in which one of the modes will be primary with respect to the other, in which case the analogies with communication models will be more evident. This is one of the cases that we explore in this paper, and that first pointed at in [6, 7]. Contingency tables are an instance of aggregable data tables [4, §0.3.4]. It seems clear that not just counts, but any non-negative entry aggregable table can be treated with the tools here presented, e.g. concentrations of solutes. In that On Concept Lattices as Information Channels 129 case, the neat interpretation related to MI maximization will not be available, but analogue ones can be found. A tangential approach to the definition of entropies in (non-Boolean) lattices has been taken by [20, 21, 22, 23, 24]. 
These works approach the definition of measures, and in particular entropy measures, in general lattices instead of finite sigma algebras (that is, Boolean lattices). [22] and [24] specifically address the issue of defining them in concept lattices, but the rest provide other heuristic foundations for the definition of such measures which surely must do without some of the more familiar properties of the Shannon (probability-based) entropy. 5 Conclusions and further work We have presented an incipient model of L-formal contexts of aggregable data and their related concept lattices as information channels. Using KFCA as the exploration technique and the Entropy Triangle as the representation and vi- sualization technique we can follow the maximization procedure on confusion matrices in general, and in confusion matrices for cluster identification in par- ticular. We present both the basic theory and two proof-of-concept applications in this respect: a first one cluster identification, fully interpretable in the framework of concept lattices, and another, entropy conformation for rectangular matrices more difficultly embeddable in this framework. Future applications will extend the analysis of count contingency tables, like document-term matrices, where our entropy-conformation can be likened to fea- ture selection techniques. Bibliography [1] Shannon, C.E.: A mathematical theory of Communication. The Bell System Technical Journal XXVII (1948) 379–423 [2] Shannon, C., Weaver, W.: A mathematical model of communication. The University of Illinois Press (1949) [3] Brillouin, L.: Science and Information Theory. Second Edition. Courier Dover Publications (1962) [4] Mirkin, B.: Mathematical Classification and Clustering. Volume 11 of Non- convex Optimization and Its Applications. Kluwer Academic Publishers (1996) [5] Mirkin, B.: Core Concepts in Data Analysis: Summarization, Correlation and Visualization. Summarization, Correlation and Visualization. Springer, London (2011) [6] Peláez-Moreno, C., Garcı́a-Moral, A.I., Valverde-Albacete, F.J.: Analyzing phonetic confusions using Formal Concept Analysis. Journal of the Acous- tical Society of America 128 (2010) 1377–1390 130 Francisco J. Valverde Albacete, Carmen Peláez-Moreno and Anselmo Peñas [7] Peláez-Moreno, C., Valverde-Albacete, F.J.: Detecting features from con- fusion matrices using generalized formal concept analysis. In Corchado, E., Grana-Romay, M., Savio, A.M., eds.: Hybrid Artificial Intelligence Systems. 5th International Conference, HAIS 2010, San Sebastián, Spain, June 23-25, 2010. Proceedings, Part II. Volume 6077 of LNAI., Springer (2010) 375–382 [8] Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press (2008) [9] Japkowicz, N., Shah, M.: Evaluating Learning Algorithms: A Classification Perspective. Cambridge University Press (2011) [10] Frénay, B., Doquire, G., Verleysen, M.: Theoretical and empirical study on the potential inadequacy of mutual information for feature selection in classification. NEUROCOMPUTING 112 (2013) 64–78 [11] M Fano, R.: Transmission of Information: A Statistical Theory of Commu- nication. The MIT Press (1961) [12] Feder, M., Merhav, N.: Relations between entropy and error probability. IEEE Transactions on Information Theory 40 (1994) 259–266 [13] Valverde-Albacete, F.J., Peláez-Moreno, C.: Two information-theoretic tools to assess the performance of multi-class classifiers. 
Pattern Recog- nition Letters 31 (2010) 1665–1671 [14] Valverde-Albacete, F.J., Peláez-Moreno, C.: 100% classification accuracy considered harmful: the normalized information transfer factor explains the accuracy paradox. PLOS ONE (2014) [15] Valverde-Albacete, F.J., Peláez-Moreno, C.: Galois connections between semimodules and applications in data mining. In Kusnetzov, S., Schmidt, S., eds.: Formal Concept Analysis. Proceedings of the 5th International Conference on Formal Concept Analysis, ICFCA 2007, Clermont-Ferrand, France. Volume 4390 of LNAI., Springer (2007) 181–196 [16] Keren, G., Baggen, S.: Recognition models of alphanumeric characters. PERCEPT PSYCHOPHYS 29 (1981) 234–246 [17] Tsoumakas, G., Katakis, I.: Multi-label classification: An overview. Inter- national Journal of Data Warehousing and . . . (2007) [18] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explorations 11 (2009) [19] Meila, M.: Comparing clusterings—an information based distance. Journal of Multivariate Analysis 28 (2007) 875–893 [20] Knuth, K.: Valuations on Lattices and their Application to Information Theory. Fuzzy Systems, IEEE International Conference on (2006) 217–224 [21] Grabisch, M.: Belief functions on lattices. International Journal Of Intelli- gent Systems 24 (2009) 76–95 [22] Kwuida, L., Schmidt, S.E.: Valuations and closure operators on finite lat- tices. Discrete Applied Mathematics 159 (2011) 990–1001 [23] Simovici, D.: Entropies on Bounded Lattices. Multiple-Valued Logic (IS- MVL), 2011 41st IEEE International Symposium on (2011) 307–312 [24] Simovici, D.A., Fomenky, P., Kunz, W.: Polarities, axiallities and mar- ketability of items. In: Proceedings of Data Warehousing and Knowledge Discovery - DaWaK. Volume 7448 of LNCS. Springer (2012) 243–252 Using Closed Itemsets for Implicit User Authentication in Web Browsing 1 1,2 2 2 2 O. Coupelon , D. Dia , F. Labernia , Y. Loiseau , and O. Raynaud 1 Almerys, 46 rue du Ressort, 63967 Clermont-Ferrand {olivier.coupelon,diye.dia}@almerys.com 2 Blaise Pascal University, 24 Avenue des Landais, 63170 Aubière {loiseau,raynaud}@isima.fr, fabien.labernia@gmail.com Abstract. Faced with both identity theft and the theft of means of authentication, users of digital services are starting to look rather suspi- ciously at online systems. The behavior is made up of a series of observ- able actions of an Internet user and, taken as a whole, the most frequent of these actions amount to habit. Habit and reputation oer ways of recognizing the user. The introduction of an implicit means of authenti- cation based upon the user's behavior allows web sites and businesses to rationalize the risks they take when authorizing access to critical func- tionalities. In this paper, we propose a new model for implicit authen- tication of web users based on extraction of closed patterns. On a data set of web navigation connection logs of 3,000 users over a six-month period we follow the experimental protocol described in [1] to compute performance of our model. 1 Introduction In order to achieve productivity gains, companies are encouraging their cus- tomers to access their services via the Internet. It is accepted that on-line ser- vices are more immediate and more user-friendly than accessing these services via a brick and mortar agency, which involves going there and, more often than not, waiting around [2]. Nevertheless, access to these services does pose secu- rity problems. 
Certain services provide access to sensitive data such as banking data, for which it is absolutely essential to authenticate the users concerned. However identity thefts are becoming more and more numerous [3]. We can dis- tinguish two paradigms for increasing access security. The rst one consists of making access protocols stronger by relying, for example, on external devices for transmitting access codes that are supplementary to the login/password pair. Nevertheless, these processes are detrimental to the user-friendliness and usabil- ity of the services. The number of transactions abandoned before reaching the end of the process is increasing and exchange volumes are decreasing. The sec- ond paradigm consists to the contrary of simplifying the identication processes in order to increase the exchange volumes. By way of examples, we can mention single-click payment [2] [4] or using RFID chips for contactless payments. Where these two paradigms meet is where we nd implicit means of authentication. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 131–143, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 2132 Olivier O. Coupelon Coupelon, et al. D. Dia, F. Labernia, Y. Loiseau, and O. Raynaud A means of authentication is a process that makes it possible to ensure that the identity declared in the event of access is indeed the user's identity. Traditionally, a user authenticates himself or herself by providing proof of identity [5]. This process is called explicit authentication. In contrast, implicit authentication does not require anything from the user but instead studies his or her behavior, the trail left by the user's actions, and then either does or does not validate the declared identity. An implicit means of authentication cannot replace traditional means of authentication as it is necessary for the user to have access to his or her service so that the person's behavior may be studied and their identity can either be validated or rejected. To the contrary, if it is eective, it would enable stronger authentication modes to be avoided (such as chip cards and PIN numbers), which are detrimental to the usability of services. The challenge is to detect identity theft as quickly as possible and, to the contrary, to validate a legitimate identity for as long a time as possible. This contribution is organized as follows: in section 2 we shall oer a state-of- the-art about implicit authentication and user's prole in web browsing. Then we propose a learning model for implicit authentication of web users we are dealing with in section 3. In section 4, we compare several methods for building proles of each user. We faithfully reproduce the experimental study conducted in [1] and we analyze all of our results. Finally, in section 5, we shall resume our results and discuss our future work. 2 Related works In his survey of implicit authentication for mobile devices ([6]), the author says of an authentication system that it is implicit if the system does not make demands of the user (see Table 1). Implicit authentication systems were studied very quickly for mobile phones. In [7] and [8], the authors studied behaviour based on variables specic to smart- phones such as calls, SMS's, browsing between applications, location, and the time of day. Experiments were conducted based on the data for 50 users over a period of 12 days. The data were gathered using an application installed by users who were volunteers. 
The users' proles were built up from how frequently positive or negative events occurred and the location. Within this context, a positive event is an event consistent with the information gathered upstream. By way of an example, calling a number which is in the phone's directory is a positive event. The results of this study show that based on ten or so actions, you can detect fraudulent use of a smartphone with an accuracy of 95%. In a quite dierent context, the authors of [9] relied on a Bayesian classication in order to associate a behaviour class with each video streaming user. The data set is simulated and consists of 1,000 users over 100 days. The variables taken into account are the quality of the ow, the type of program, the duration of the session, the type of user, and the popularity of the video. The results are mixed, because the model proposed admits to an accuracy rate of 50%. Using Closed Using Closed Itemsets Itemsets for for Implicit Implicit User User Authentication Authentication in in Web Web Browsing Browsing 133 3 Feature Capturing Implicit/Explicit Spoong Threats Problems Method Passcode Keyboard input Explicit Keyloggers, Guessable pass- Shoulder Surng words Token Hardware device Mainly explicit, None Easily stolen or implicit possible lost Face & Iris Camera Both Picture of the le- Lighting situa- gitimate user tion and make-up Keystroke Keyboard Implicit, explicit Typing imitation Long training possible (dicult) phase, reliability Location GPS, infrastruc- Implicit Informed Traveling, preci- ture strangers sion Network Software protocol Implicit Informed Precision (e.g. WireShark) strangers Table 1. Comparison of dierent authentication methods The particular context of implicit authentication for web browsing was studied in [1], [10], [11] and [12]. In [1], the author adopted the domain name, the num- ber of pages viewed, the session start time, and its duration, as characteristic variables. The data set, which was gathered by a service provider, consisted of 300 rst connections by 2,798 users over a period of 12 months. The user proles consisted of patterns with a size of 1. The author compares several pattern selec- tion approaches like the support and the lift approaches. The study shows that for small, anonymous behavioural patterns (involving up to twenty or so sites visited), the most eective models are still traditional classication models like decision trees. On the other hand, whenever anonymous behaviour exceeds 70 or so sites, the support and lift-based classication models are more accurate. The study conducted in [12] states that the size of the data set remains a determining parameter. Their study, conducted on 10 users over a one-month period, did not enable them to build a signicant model for distinguishing users. The authors also concluded that no variable taken individually enables a user to be authen- ticated. Drawing inspiration from a study conducted in [1], the authors of [13] studied several techniques for spying on a user who holds a dynamic IP address, based on behavioural models. The methods compared are seeking motives, the nearest neighbours technique, and the multinomial Bayesian classier. The data set consisted of DNS requests from 3,600 users over a two-month period. In this study, only the most signicant variables and the most popular host names were considered. The accuracy rates for the models proposed were satisfactory. The study that we conduct in this paper also forms part of a continuation of the work by [1]. 
We faithfully reproduce his experimental protocol on our data and we compare performance of our classication algorithm to his specic models. 4134 Olivier O. Coupelon Coupelon, et al. D. Dia, F. Labernia, Y. Loiseau, and O. Raynaud 3 Models We propose an intuitive learning model architecture for user authentication. From a data set of web browsing logs we compute a set of own patterns for each user. A pattern is a set of frequently visited sites. The size of pattern may vary. Thanks to these proles we are able to provide an authentication for anonymous sessions. We then compute confusion matrices and we provide precisions of the models. In our present study, we compare performance of a naive Bayes classier to variations on k-nearest neighbors algorithms. More precisely, the studied parameters are selection process of user own patterns, computation process of user proles and distance functions computed for classication stage. Figure 1 outlines the framework of the machine learning process. Past Anonymous Behaviour Behaviour ? User ? Prole- User Learning Algorithms Score Computation - Authentication Fig. 1. Architecture 3.1 Formal framework We call a session a set of visited web sites at a specic time by a given user ui such as i ∈ [1,n] and n is the number of users. The size of a session is limited and equal to 10. The learning database of each user ui takes the form of a set of 3 S sessions denoted Sui and is built from log data . We call S = i Sui the whole set of sessions of the database. We call Wui the whole set of web sites visited at least once by user ui and we S call W = i Wui the whole set of visited sites. The order of visited web sites is not taken into account by this model. Denition 1 (k-pattern). Let W be a set of visited web sites and S be a set of sessions on W . A subset P of W is called a k − pattern where k is the size of P . A session S in S is said to contain a k − pattern P if P ⊆ S . Denition 2 (Support and relative support (lift)). We dene the support of a pattern P as the percentage of sessions in S containing P (by extension we give the support of a pattern in the set of sessions of a given user ui ): ||{S ∈ S | P ⊆ S}|| ||{S ∈ Sui | P ⊆ S}|| supportS (P ) = supportSui (P ) = ||S|| ||Sui || 3 Cf. section 4.1 Using Closed Using Closed Itemsets Itemsets for for Implicit Implicit User User Authentication Authentication in in Web Web Browsing Browsing 135 5 For a given user the relative strength of a pattern is equivalent to the lif t in a context of association rules (i.e. the support of the pattern within this user divided by the support of the pattern across all users). More formally: supportSui (P ) lif tSui S (P ) = supportS (P ) The support measures the strength of a pattern in behavioral description of a given user. The relative support mitigates support measure by considering the pattern's support on the whole sessions set. The stronger the global support of a pattern, the lesser characteristic of a specic user. The tf-idf is a numerical statistic that is intended to reect how relevant a word is to a document in a corpus. The tf-idf value increases proportionally to the number of times a word appears in the document, but is oset by the frequency of the word in the whole corpus ([14]). In our context, a word becomes a pattern, a document becomes a set of sessions Sui of a given user and the corpus becomes the whole set S of all sessions. Denition 3 (tf ×idf ). Let P be a pattern, let U be a set of users and Up ⊆ U such that ∀ui ∈ Up , supportSui (P ) 6= 0. 
Let Sui be a set of sessions of a given user ui and S the whole set of sessions. The normalized term frequency, denoted tf(P), is equal to supportSui(P), and the inverse document frequency, denoted idf(P), is equal to log(||U||/||UP||). We have:

tf × idf(P) = supportSui(P) × log(||U|| / ||UP||)

Definition 4 (Closure system). Let S be a collection of sessions on the set W of web sites. We denote by S^c the closure under intersection of S. By adding W to S^c, S^c becomes a closure system.

Definition 5 (Closure operator). Let W be a set. A map C: 2^W → 2^W is a closure operator on W if for all sets A and B in W we have: A ⊆ C(A); A ⊆ B =⇒ C(A) ⊆ C(B); and C(C(A)) = C(A).

Theorem 1. Let S^c be a closure system on W. Then the map CS^c defined on 2^W by ∀A ∈ 2^W, CS^c(A) = ⋂{S ∈ S^c | A ⊆ S} is a closure operator on W.⁴

Definition 6 (Closed pattern⁵). Let S^c be a closure system on W and CS^c its corresponding closure operator. Let P be a pattern (i.e., a set of visited sites); we say that P is a closed pattern if CS^c(P) = P.

⁴ Refer to the book of [15].
⁵ This definition is equivalent to a concept of the formal context K = (S, W, I), where S is a set of objects, W a set of attributes and I a binary relation between S and W [16].
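The definitions above translate almost directly into code. The following sketch is ours (the toy sessions and function names are illustrative, not from the paper): it computes the support, lift and tf × idf of a pattern, and the closure of a pattern as the intersection of the sessions containing it, so that a pattern P is closed exactly when closure(P) = P. The logarithm base is left as the natural log, since Definition 3 does not fix one.

```python
from math import log

def support(pattern, sessions):
    """Fraction of sessions containing the pattern (Definition 2)."""
    pattern = frozenset(pattern)
    return sum(pattern <= s for s in sessions) / len(sessions)

def lift(pattern, user_sessions, all_sessions):
    """Support within one user's sessions divided by the global support."""
    return support(pattern, user_sessions) / support(pattern, all_sessions)

def tf_idf(pattern, user_sessions, sessions_by_user):
    """tf x idf of a pattern for one user (Definition 3, Up assumed non-empty)."""
    users_with_p = sum(support(pattern, s) > 0 for s in sessions_by_user.values())
    return support(pattern, user_sessions) * log(len(sessions_by_user) / users_with_p)

def closure(pattern, sessions):
    """C(A) = intersection of the sessions containing A (Theorem 1);
    A is closed when closure(A) == A."""
    containing = [s for s in sessions if frozenset(pattern) <= s]
    if not containing:
        return None   # by convention the closure is then the whole attribute set W
    out = set(containing[0])
    for s in containing[1:]:
        out &= s
    return frozenset(out)

# toy sessions (each session = a set of visited domains) for two users
sessions_by_user = {
    "u1": [frozenset({"a", "b", "c"}), frozenset({"a", "b"}), frozenset({"a", "d"})],
    "u2": [frozenset({"c", "d"}), frozenset({"a", "c", "d"})],
}
all_sessions = [s for ss in sessions_by_user.values() for s in ss]
P = {"a", "b"}
print(support(P, sessions_by_user["u1"]), lift(P, sessions_by_user["u1"], all_sessions))
print(tf_idf(P, sessions_by_user["u1"], sessions_by_user))
print(closure(P, all_sessions))   # frozenset({'a', 'b'}) -> P is closed
```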
3.2 Own patterns selection

The first and most important step of our model, called own patterns selection, is to calculate the set of own patterns for each user ui. This set of patterns is denoted Pui = {Pi,1, Pi,2, ..., Pi,p}. In [1], the author states that p = 10 should be a reference value and that beyond this value model performances are stable. We shall follow that recommendation.

In [1], 10 frequent 1-patterns are selected for each user. The aim of our study is to show that it can be more efficient to select closed k-patterns. However, the number of closed patterns can be very large, so we compare three heuristics (H1, H2 and H3) to select the 10 closed patterns of each user. For each heuristic, the closed patterns are computed thanks to the Charm algorithm ([17]) provided on the Coron platform ([18]). Only closed patterns with a size lower than or equal to 7 are considered. These heuristics are the following:

1. 10 1-patterns with the largest support values (as in [1]).
2. H1: 40 closed k-patterns with the largest tf-idf values.
3. H2: 10 filtered closed k-patterns with the largest support and maximal values by the inclusion set operator.
4. H3: 10 filtered closed k-patterns with the largest tf-idf and minimal values by the inclusion set operator.

Algorithm 1 describes the process of H1 to select the 40 own patterns for a given user. With H1, the model performance improves when p increases up to 40, whereas p = 10 is the better choice for H2 and H3. The best results are obtained with H1.

Algorithm 1: H1: 40 closed k-patterns with the largest tf-idf values.
Data: Cui: the set of closed itemsets of user ui from Charm; p: the number of selected own patterns;
Result: Pui: the set of own patterns of user ui;
1 begin
2   Compute the tf × idf for each pattern from Charm;
3   Sort the list of patterns in descending order according to the tf × idf value;
4   Return the top p patterns;

3.3 User profiles computation

We define and denote by Pall = ∪i Pui the whole set of own patterns. The set Pall allows us to define a common space in which all users can be embedded. More formally, Pall defines a vector space V of size all = ||Pall||, where a given user ui is represented as a vector Vui = (mi,1, mi,2, ..., mi,all). The second step of our model, called user profile computation, is to compute, for each user ui, a numerical value for each component mi,j of the vector Vui, where i is the user id, j ∈ [1, all] is a pattern id and m stands for a given measure. In this paper, we compare the two measures proposed in [1], the support and the lift:

mi,j = supportSui(Pj)    and    mi,j = liftSuiS(Pj)

3.4 Authentication stage

In our model, the authentication step is based on identification. For that purpose, our model guesses the user corresponding to an anonymous set of sessions, then it checks whether the guessed identity corresponds to the real identity. From this set of sessions we have to build a test profile and to find the nearest user profile defined during the learning step.

Test sessions. The performance of our models is calculated on anonymous data sets of growing size: the more information available, the better the classification will be. The first data set consists of only one session, the second consists of 10 sessions, the third one consists of 20 sessions, and the last one consists of 30 sessions. For the test phase, all sessions have the same size of 10 sites.

Building the test profile. Let S be the whole set of sessions from the learning data set. Let Sut be an anonymous set of sessions and Vut = (mt,1, mt,2, ..., mt,all) its corresponding profile vector. We compare two approaches to build the anonymous test profile, the support and the lift:

∀i, mt,i = supportSut(Pi)    and    ∀i, mt,i = liftSutS(Pi) = supportSut(Pi) / supportS(Pi)

Distance functions. Let Vui = (mi,1, mi,2, ..., mi,all) and Vut = (mt,1, mt,2, ..., mt,all) be two profiles. We denote by DisEuclidean(Vui, Vut) the Euclidean distance and by SimCosine(Vui, Vut) the cosine similarity function:

DisEuclidean(Vui, Vut) = sqrt( Σj (mt,j − mi,j)² )

SimCosine(Vui, Vut) = Σj (mt,j × mi,j) / ( sqrt(Σj (mt,j)²) × sqrt(Σj (mi,j)²) )

4 Experimental results

4.1 Data set

Our data set is comprised of the web navigation connection logs of 3,000 users over a six-month period. We have at our disposal the visited domain names and each user ID. From the day and time of connection we have constructed connection sessions for each user. A session is therefore a set of web sites visited.
The rst loop sets the size of the set of users among which a group of anonymous sessions will be classied. The second one sets the size of this sessions group. Finally, the third loop sets the number of iterations used to compute the average accuracy rate. The loop on line 10 computes the specic patterns of each user and establishes the proles vector. The loop on line 13 computes the vector's components for each user. The nested loops on lines 16 and 18 classify test data and compute the accuracy rate. 4.3 Comparative performance of H1 , H2 and H3 From own patterns of each user we compute the set Pall as the whole set of own patterns which denes the prole vector of each user. We use the support of a pattern as numerical value for each components (cf. section 3.3). Following Table 3 provides the size of the prole vector and the distribution of own patterns according to size for each heuristic. With 30 users and 10 own patterns per user, the maximal size of the prole is 300. Number of own patterns |1| |2| |3| |4| |5| |6| |7| H1 199 18% 31% 26% 16% 7% 2% 0% H2 167 57% 29% 9% 3% 1% 1% 0% H3 199 24% 20% 18% 14% 10% 9% 5% Table 3. Prole vector size and the distribution of own patterns according to size. Using Closed Using Closed Itemsets Itemsets for for Implicit Implicit User User Authentication Authentication in in Web Web Browsing Browsing 139 9 80 Accuracy 60 40 20 Bayes Charm H3 Charm H2 Charm H1 0 5 10 15 20 25 30 35 Number of test sessions Fig. 2. Comparative performance of H1 , H2 and H3 . These observations are plotted on an X-Y graph with number of sessions of the anonymous set on the X-axis and accuracy rate on the Y-axis. Measured values are smoothed on 50 executions. Figure 2 shows that naive Bayes classier is the most eective if the group of test sessions is from 1 to 13 sessions (10 to 130 visited web sites). This result is in line with the study in [1]. Finally, this graph clearly shows that heuristic H1 certainly stands out from H2 and H3 . So, the best heuristic is to choose owns patterns amongst closed patterns with the largest tf × idf values. As a consequence, the majority of patterns are small-sized patterns (two or three sites) (cf. Table 3). But accuracy rates are much higher. 4.4 Comparative performance with [1] In [1], the author compares, in particular, two methods of prole vector calculus. In both cases, the own patterns are size 1 and are chosen amongst the most fre- quent. The rst method, named support-based proling, uses the corresponding support pattern as the numerical value for each component of the prole vector. The second method, called lift-based proling, uses the lift measure. In order to compare the performances of the H1 model with the two models support-based proling and lift-based proling, we have accurately replicated the experimental protocol described in [1] on our own data set. The results are given in Table 4. The data of Table 4 highlight that the H1 heuristic allows rates that are perceptibly better than those of the two models proposed in [1] in all possible scenarios. Nevertheless, the Bayes classier remains the most ecient when the session group is size 1 in compliance with [1]. Figure 3 allows a clearer under- standing of the moment the Bayes curve crosses the H1 heuristic curve. 6 http://adblock-listefr.com/ 140 10 Olivier O. Coupelon Coupelon, et al. D. Dia, F. Labernia, Y. Loiseau, and O. 
# of users   Model       1     10    20    30
2            Support     65    89    95    97
             Lift        67    90    97    98
             Charm H1    72    98    99    100
             Bayes       85    99    73    61
5            Support     40    74    83    88
             Lift        41    78    86    88
             Charm H1    49    90    95    98
             Bayes       67    96    56    34
10           Support     27    66    79    80
             Lift        29    64    77    80
             Charm H1    37    83    92    94
             Bayes       54    91    51    24
20           Support     19    55    68    75
             Lift        21    58    68    74
             Charm H1    30    76    86    90
             Bayes       43    87    48    19
30           Support     16    53    64    70
             Lift        17    54    64    69
             Charm H1    26    72    83    89
             Bayes       39    83    46    19

Table 4. On the left we find the number of users and the selected model. Each column is defined by the number of sessions of the anonymous data set. Sessions are of size 10. Measured accuracy rates are smoothed over 100 executions. In bold the best values are presented.

Fig. 3. Comparative performance of Bayes, support-based profiling, lift-based profiling and H1 (curves: Support, Lift, Bayes, Charm H1). These observations are plotted on an X-Y graph with the number of sessions of the anonymous set on the X-axis and the accuracy rate on the Y-axis. The number of users is equal to 30. Measured values are smoothed over 50 executions.

4.5 Comparative performance of distance functions

The last figure, Figure 4, shows the impact of the choice of distance function on the performance of the models.

Fig. 4. Comparative performance of H1 with both cosine similarity and Euclidean distance, Bayes, and lift-based profiling (curves: Bayes, Lift (cosine), Lift (euclidean), Charm H1 (cosine), Charm H1 (euclidean)). These observations are plotted on an X-Y graph with the number of sessions of the anonymous set on the X-axis and the accuracy rate on the Y-axis. The number of users is equal to 30. Measured values are smoothed over 100 executions.

Figure 4 illustrates the significance of the distance function with respect to performance. Indeed, when used with the Euclidean distance, the H1 method is a bit more precise than the lift one (by about 3%). However, performances are improved by using the cosine similarity, and their relative ranking is even reversed: the H1 method's performance is then better than lift by 10%.

5 Conclusions and future work

In this study, we proposed a learning model for implicit authentication of web users. We proposed a simple and original algorithm (cf. Algorithm 1) to obtain a set of own patterns that characterizes each web user. The selected patterns have different sizes and are closed patterns of the closure system generated by the set of sessions (cf. Table 3). By reproducing the experimental protocol described in [1], we showed that the performance of our model is significantly better than that of several models proposed in the literature (cf. Table 4). We also showed the key role of the distance function (cf. Figure 4).

This study should be extended in order to improve the obtained results. For a very small flow of sites, the results of the solution should be made better than the results of Bayes' method. Another way to improve the results will be to select other types of variables and to add them to our current dataset. The selection of data has an undeniable impact on the results.

References

1. Yang, Y.C.: Web user behavioral profiling for user identification. Decision Support Systems (49) (2010) pp. 261-271
2.
2. Guvence-Rodoper, C.I., Benbasat, I., Cenfetelli, R.T.: Adoption of B2B Exchanges: Effects of IT-Mediated Website Services, Website Functionality, Benefits, and Costs. ICIS 2008 Proceedings (2008)
3. Lagier, F.: Cybercriminalité : 120.000 victimes d'usurpation d'identité chaque année en France. Le Populaire du Centre (in French) (2013)
4. Filson, D.: The impact of e-commerce strategies on firm value: Lessons from Amazon.com and its early competitors. The Journal of Business 77(S2) (2004) pp. S135–S154
5. He, R., Yuan, M., Hu, J., Zhang, H., Kan, Z., Ma, J.: A novel service-oriented AAA architecture. 3 (2003) pp. 2833–2837
6. Stockinger, T.: Implicit authentication on mobile devices. The Media Informatics Advanced Seminar on Ubiquitous Computing (2011)
7. Shi, E., Niu, Y., Jakobsson, M., Chow, R.: Implicit authentication through learning user behavior. In: Burmester, M., et al. (eds.): ISC 2010, LNCS 6531 (2011) pp. 99–113
8. Jakobsson, M., Shi, E., Golle, P., Chow, R.: Implicit authentication for mobile devices. In: HotSec'09, Proceedings of the 4th USENIX Conference on Hot Topics in Security (2009) pp. 99
9. Ullah, I., Bonnet, G., Doyen, G., Gaïti, D.: Un classifieur du comportement des utilisateurs dans les applications pair-à-pair de streaming vidéo. CFIP 2011 - Colloque Francophone sur l'Ingénierie des Protocoles (in French) (2011)
10. Goel, S., Hofman, J.M., Sirer, M.I.: Who does what on the web: A large-scale study of browsing behavior. In: ICWSM (2012)
11. Kumar, R., Tomkins, A.: A characterization of online browsing behavior. In: Proceedings of the 19th International Conference on World Wide Web, ACM (2010) pp. 561–570
12. Abramson, M., Aha, D.W.: User authentication from web browsing behavior. In: Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference, pp. 268–273
13. Herrmann, D., Banse, C., Federrath, H.: Behavior-based tracking: Exploiting characteristic patterns in DNS traffic. Computers & Security (2013) pp. 117
14. Salton, G.: Automatic text processing: The transformation, analysis and retrieval of information by computer. Addison Wesley (1989)
15. Davey, B.A., Priestley, H.A.: Introduction to lattices and order. Cambridge University Press (1991)
16. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Berlin-Heidelberg-New York (1999)
17. Zaki, M.J., Hsiao, C.: Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering 17(4) (2002) pp. 462–478
18. Szathmary, L.: Symbolic Data Mining Methods with the Coron Platform. PhD Thesis in Computer Science, University Henri Poincaré Nancy 1, France (Nov 2006)

Appendix

Algorithm 2: Experiment procedure
Data: ∪i Sui : all the sessions of the n users; X: the number of successive executions
Result: the mean accuracy of the selected models
 1 begin
 2   for N ∈ {2, 5, 10, 20, 30} do
 3     for S ∈ {1, 10, 20, 30} do
 4       for z = 1, ..., X do
 5         Select N random users;
 6         For each user, select SN = min(|Sui|, i = 1, ..., N);
 7         Take 2/3 of the SN sessions of each user to form the training set;
 8         Take the rest of the SN sessions to form the validation set;
 9         Pall^k ← ∅ (the global profile vector for each model k);
10         for each ui, i = 1, ..., N do
11           Compute the own patterns Pui^k (1 ≤ |Pui^k| ≤ 10);
12           Pall^k ← Pall^k ∪ Pui^k;
13         for each ui, i = 1, ..., N do
14           Compute the vector Vui^k with support or lift;
15         Initialize to 0 the confusion matrix M^k of each method k;
16         for each ui, i = 1, ..., N do
17           Compute the test stream Tui (|T| is fixed, T ∈ Tui);
18           while Tui ≠ ∅ do
19             Take S sessions from Tui to compute VT^k;
20             ua ← max(simil(Vui^k, VT^k)) or min(dist(Vui^k, VT^k));
21             M^k[ui][ua] ← M^k[ui][ua] + 1;
22         Compute the mean accuracy of each model k from M^k;
The direct-optimal basis via reductions

Estrella Rodríguez-Lorenzo1, Karell Bertet2, Pablo Cordero1, Manuel Enciso1, and Angel Mora1

1 University of Málaga, Andalucía Tech, Spain, e-mail: {estrellarodlor,amora}@ctima.uma.es, {pcordero,enciso}@uma.es
2 Laboratoire 3I, Université de La Rochelle, e-mail: karell.bertet@univ-lr.fr

Abstract. Formal Concept Analysis has become a real approach in the Information-Knowledge-Wisdom trend. It revolves around mining a data set to build a concept lattice, which provides a strong structure of the knowledge. Implications play the role of an alternative specification of this concept lattice and may be managed by means of inference rules. This syntactic treatment is guided by several properties like directness, minimality, optimality, etc. In this work, we propose a method to calculate the direct-optimal basis equivalent to a given implicational system. Our method deals with unitary and non-unitary implications. Moreover, it shows a better performance than previous methods in the literature by means of Simplification Logic and the reduction paradigm, which keeps the implications narrow at every stage of the process. We have also developed an empirical study to compare our method with previous approaches in the literature.

1 Introduction

Formal Concept Analysis (FCA) is a trending upward area which establishes a proper and fine mixture of formalism, data analysis and knowledge discovery. It is able to analyze and extract information from a context K, rendering a concept lattice. Attribute implications [10] represent implicit knowledge between data and they can be deduced from the concept lattice or mined directly from the context. An attribute implication is an expression A → B where A and B are sets of attributes. A context satisfies A → B if every object that has all the attributes in A also has all the attributes in B. The study of sets of implications that satisfy some criteria is one of the relevant topics in FCA. An implicational system (IS) of K is defined as a set Σ of implications of K from which any valid implication for K can be deduced by means of a syntactic treatment of the implications. This symbolic manipulation introduces the notion of equivalent sets of implications and opens the door to the definition of several criteria to discriminate good sets of implications according to these criteria. Thus, the challenges are the definition of a specific notion of IS, named basis, fulfilling some criteria related to minimality, and the introduction of efficient methods to transform an arbitrary IS into a basis. For instance, if the criterion is to obtain an IS with minimum cardinality, we can build the so-called Duquenne-Guigues (or stem) basis [11]. Each application may induce a different criterion.
For instance, in [2, 3] some methods to calculate the direct-optimal basis are introduced, joining minimality and directness in the same notion of basis. In [8] a method to obtain a basis with minimal size in the left-hand size of the implications was proposed. In this paper, we introduce a method to compute the direct-optimal basis. This kind of basis was introduced in [2,3] and it has two interesting properties: it has the minimum number of attributes and it provides a framework to efficiently compute the closure of a set of attributes. The new method introduced in this paper is strongly based on SLFD (Simplification Logic) and they are more efficient than previous methods appeared in the literature. In the following, first we establish the background necessary for the under- standing of the paper (Section 2). In Section 3 SLFD is summarized and a motiva- tion of the simplification paradigm to remove redundant attributes is provided. Section 4 is focussed on the methods of Bertet et al. to get a direct-optimal basis. In Section 5 the new method is introduced and a comparison among all the methods is showed. Some conclusions are presented in Section 6. 2 Preliminaries We assume well-known the main concepts in FCA [10]. Only the concepts nec- essaries will be introduced. In Formal Concept Analysis (FCA) the relationship between a set of objects and a set of attributes are described using a formal context as follows: Definition 1. A formal context is a triple K = (G, M, I) where G is a finite set whose elements are named objects, M is a finite set whose elements are named attributes and I ⊆ G × M is a binary relation. Thus, (o, a) ∈ I means the object o has the attribute a. This paper focuses on the notion of implication, which can be introduced as follows: Definition 2. Let K = (G, M, I) be a formal context and A, B ∈ 2M . The implication A → B holds in K if every object o ∈ G satisfies the following: (o, a) ∈ I for all a ∈ A implies (o, b) ∈ I for all b ∈ B. An implication A → B is said to be unitary if the set B is a singleton. Implications may be syntactically managed by means of inference systems. The former axiomatic system was Armstrong’s Axioms [1]. They allows us to introduce the notion of derivation of an implication from an implicational system, the semantic entailment and the equivalence between two implicational systems in the usual way. The Direct-optimal Basis via Reductions 147 3 Simplification Logic In [6], Cordero et al. introduced the Simplification Logic, SLFD , that is, an equiv- alent logic to the Armstrong’s Axioms that avoids the use of transitivity and is guided by the idea of simplifying the set of implications by removing redundant attributes efficiently. This logic has proved to be useful for automated reasoning with implications [7, 8, 12, 13]. Definition 3 (Language). Given a non-empty finite alphabet S (whose ele- ments are named attributes and denoted by lowercase letters a, b, c, etc.), the language of SLFD is LS = {A → B | A, B ⊆ S}. Sets of formulas (implications) will be named implicational systems (IS). In order to distinguish between language and metalanguage, inside implications, AB means A ∪ B and A-B denotes the set difference A r B. Moreover, when no confusion arises, we omit the brackets, e.g. abc denotes the set {a, b, c}. Definition 4 (Semantics). Let K = (G, M, I) be a context and A → B ∈ LS . The context K is said to be a model for A → B, denoted K |= A → B, if A, B ⊆ M ⊆ S and A → B holds in K. 
For a context K and an IS Σ, K |= Σ means K |= A → B for all A → B ∈ Σ, and Σ |= A → B denotes that every model for Σ is also a model for A → B. If Σ1 and Σ2 are implicational systems, Σ1 ≡ Σ2 denotes that both IS are equivalent (i.e. K |= Σ1 iff K |= Σ2 for all contexts K).

Definition 5 (Syntactic derivations). SLFD considers the reflexivity axioms

[Ref] if B ⊆ A, then A → B is an axiom;

and the following inference rules, named fragmentation, composition and simplification respectively:

[Frag] from A → BC derive A → B;
[Comp] from A → B and C → D derive AC → BD;
[Simp] if A ⊆ C and A ∩ B = ∅, from A → B and C → D derive C-B → D-B.

Given an IS Σ and a formula A → B, Σ ⊢ A → B denotes that A → B can be derived from Σ by using the axiomatic system in the standard way. The above axiomatic system is sound and complete (i.e. Σ |= A → B iff Σ ⊢ A → B). The main advantage of SLFD is that its inference rules may be considered equivalence rules and they are enough to compute all the derivations (see [12] for further details and proofs).

Theorem 1 (Mora et al. [12]). In SLFD logic, the following equivalencies hold:
1. Fragmentation Equivalency [FrEq]: {A → B} ≡ {A → B-A}.
2. Composition Equivalency [CoEq]: {A → B, A → C} ≡ {A → BC}.
3. Simplification Equivalency [SiEq]: If A ∩ B = ∅ and A ⊆ C then {A → B, C → D} ≡ {A → B, C-B → D-B}.
4. Right Simplification Equivalency [rSiEq]: If A ∩ B = ∅ and A ⊆ C ∪ D then {A → B, C → D} ≡ {A → B, C → D-B}.

Note that these equivalencies (read from left to right) remove redundant information. SLFD was conceived as a simplification framework. To conclude this section, we introduce the outstanding notion of the closure of a set of attributes, which is strongly related to the syntactic treatment of implications.

Definition 6. Let Σ ⊆ LS be an IS and X ⊆ S. The closure of X wrt Σ is the largest subset of S, denoted XΣ+, such that Σ ⊢ X → XΣ+. We omit the subindex (i.e. we write X+) when no confusion arises.

Given a context K and an IS Σ satisfying K |= A → B iff Σ ⊢ A → B, it is well known that the closed sets of attributes wrt Σ are in bijection with the concepts of K. One of the main topics is the computation of the closure of a set of attributes, and for this reason it is necessary to have an efficient method to calculate closures. For this problem, we emphasize the works of Bertet et al. [2, 3] and Cordero et al. [12].

4 Direct-Optimal basis

The study of sets of implications that satisfy some criteria is one of the most important topics in FCA. In [3], Bertet and Monjardet present a survey about implicational systems and bases. They show the equality between five unit bases originating from different works (minimal functional dependencies in database theory, knowledge spaces, etc.) and satisfying various properties, including the directness, canonical and minimal properties, whence the name canonical direct basis given to this basis. The direct-optimal basis belongs to these five bases. In the following, we show only those concepts of the survey that are used in the rest of the paper.

Definition 7. An IS Σ is said to be:
– minimal if Σ \ {A → B} ≢ Σ for all A → B ∈ Σ.
– minimum if Σ′ ≡ Σ implies |Σ| ≤ |Σ′|, for all IS Σ′.
– optimal if Σ′ ≡ Σ implies ||Σ|| ≤ ||Σ′||, for all IS Σ′,
where |Σ| is the cardinality of Σ and ||Σ|| is its size, i.e. ||Σ|| = Σ_{A→B ∈ Σ} (|A| + |B|).

A minimal set of implications is named a basis, and a minimum basis is then a basis of least cardinality.
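Since Definition 6 underlies everything that follows, here is a minimal illustrative sketch (plain Python with an assumed data representation; it is not taken from the paper) of the usual fixpoint computation of X+ wrt Σ. The directness property introduced next guarantees that a single traversal of Σ is enough.

    # Naive closure of a set X wrt an implicational system sigma.
    # sigma is assumed to be a collection of (premise, conclusion) frozenset pairs.
    def closure(x, sigma):
        closed = set(x)
        changed = True
        while changed:              # for a direct IS, one traversal suffices
            changed = False
            for premise, conclusion in sigma:
                if premise <= closed and not conclusion <= closed:
                    closed |= conclusion
                    changed = True
        return closed

    # toy usage: Sigma = {a -> b, b -> c}, so {a}+ = {a, b, c}
    sigma = [(frozenset("a"), frozenset("b")), (frozenset("b"), frozenset("c"))]
    print(sorted(closure({"a"}, sigma)))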
Let us now introduce the main property used in this paper, namely the direct-optimal property.

Definition 8. An IS Σ is said to be direct if, for all X ⊆ S:

X+ = X ∪ ⋃ {B | A ⊆ X and A → B ∈ Σ}.

Moreover, Σ is said to be direct-optimal if it is direct and, for any direct IS Σ′, Σ′ ≡ Σ implies ||Σ|| ≤ ||Σ′||.

In words, Σ is direct if the computation of the closure of any attribute set wrt Σ requires only one iteration, that is, a unique traversal of the set of implications. Obviously, the direct-optimal property is the combination of the directness and optimality properties. In [2], Bertet and Nebut show that a direct-optimal IS is unique and can be obtained from any equivalent IS. We address this procedure in this paper. As we have said in the preliminaries, one of the most important problems is how to calculate quickly and easily the closure X+ of any set X, because a number of problems related to an IS Σ can be answered by computing closures. For this reason, Bertet et al. propose a type of basis called the direct-optimal basis [2, 3], so that one can compute closures of subsets in only one iteration. Section 4.1 presents the basis proposed by Bertet and Nebut in [2], where they work with non-unitary implicational systems (IS). Section 4.2 shows how to obtain a unit direct-optimal basis [3]. In both sections, we illustrate the algorithms needed to obtain a direct-optimal basis equivalent to any implicational system.

4.1 Computing the Direct-Optimal basis

In this section, the algorithm proposed by Bertet and Nebut in [2] is shown. The key of the method is the so-called "overlap axiom", which can be directly proved by using the axiomatic system from Definition 5:

[Overlap] for all A, B, C, D ⊆ S: if B ∩ C ≠ ∅, from A → B and C → D derive A(C-B) → D.

Then, the direct implicational system generated from an IS Σ is defined as the smallest IS that contains Σ and is closed for [Overlap].

Definition 9. The direct implicational system Σd generated from Σ is defined as the smallest IS such that:
1. Σ ⊆ Σd and
2. for all A, B, C, D ⊆ S, if A → B, C → D ∈ Σd and B ∩ C ≠ ∅ then A(C-B) → D ∈ Σd.

Function Bertet-Nebut-Direct(Σ)
  input : an implicational system Σ on S
  output: the direct IS Σd on S equivalent to Σ
  begin
    Σd := Σ
    foreach A → B ∈ Σd do
      foreach C → D ∈ Σd do
        if B ∩ C ≠ ∅ then add A(C-B) → D to Σd;
    return Σd

Theorem 2 (Bertet and Nebut [2]). Let Σ be an implicational system. Then Σd = Bertet-Nebut-Direct(Σ) is a direct basis.

Moreover, if an IS Σ is direct but not direct-optimal, then there exists an equivalent IS Σ′ of smaller size which is direct-optimal. The properties that it must hold are the following:

Theorem 3 (Bertet and Nebut [2]). A direct IS Σ is direct-optimal if and only if the following properties hold.
Extensiveness: for all A → B ∈ Σ, A ∩ B = ∅.
Isotony: for all A → B, C → D ∈ Σ, C ⊊ A implies B ∩ D = ∅.
Premise: if A → B, A → B′ ∈ Σ then B = B′.
Not empty conclusion: if A → B ∈ Σ then B ≠ ∅.

Function Bertet-Nebut-Minimize(Σ)
  input : an implicational system Σ on S
  output: a smaller IS Σm on S equivalent to Σ
  begin
    Σm := ∅
    foreach A → B ∈ Σ do
      B′ := B
      foreach C → D ∈ Σ do
        if C = A then B′ := B′ ∪ D;
        if C ⊊ A then B′ := B′ \ D;
      B′ := B′ \ A
      add A → B′ to Σm
    return Σm
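As a reading aid, the two functions above can be transcribed almost literally into Python. This is only an illustrative sketch (implications as pairs of frozensets, names chosen here), not the authors' Prolog implementation:

    # Illustrative transcription of Bertet-Nebut-Direct / Bertet-Nebut-Minimize.
    # An IS is a set of (premise, conclusion) pairs of frozensets.
    def bn_direct(sigma):
        direct = set(sigma)
        changed = True
        while changed:                      # saturate under the [Overlap] rule
            changed = False
            for (a, b) in list(direct):
                for (c, d) in list(direct):
                    if b & c:
                        new = (a | (c - b), d)
                        if new not in direct:
                            direct.add(new)
                            changed = True
        return direct

    def bn_minimize(sigma_d):
        minimized = set()
        for (a, b) in sigma_d:
            b2 = set(b)
            for (c, d) in sigma_d:
                if c == a:
                    b2 |= d
                elif c < a:                 # proper subset of the premise
                    b2 -= d
            b2 -= a
            if b2:
                minimized.add((a, frozenset(b2)))
        return minimized

    # toy usage: {a -> b, b -> c} yields the direct-optimal {a -> bc, b -> c}
    sigma = {(frozenset("a"), frozenset("b")), (frozenset("b"), frozenset("c"))}
    print(bn_minimize(bn_direct(sigma)))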
Function Bertet-Nebut-DO computes the direct-optimal basis Σdo generated from an IS Σ. It first computes Σd using Function Bertet-Nebut-Direct and then minimizes Σd using Function Bertet-Nebut-Minimize.

Function Bertet-Nebut-DO(Σ)
  input : an implicational system Σ on S
  output: the direct-optimal IS Σdo on S equivalent to Σ
  begin
    Σd := Bertet-Nebut-Direct(Σ)
    Σdo := Bertet-Nebut-Minimize(Σd)
    return Σdo

Theorem 4 (Bertet and Nebut [2]). Let Σ be an implicational system. Then Σdo = Bertet-Nebut-DO(Σ) is the unique direct-optimal implicational system equivalent to Σ.

4.2 Direct-Optimal basis by means of unit implicational systems

In some areas, the management of formulas is limited to unitary ones. Thus, the use of Horn clauses in Logic Programming is widely accepted. Such a language restriction allows an improvement in the performance of the methods, which become more direct and lighter. Nevertheless, the advantages provided by the limited languages have a counterpart: a significant growth of the input set. In this section we present new versions of the definitions and methods introduced above, restricted to Unit Implicational Systems (UIS), i.e. sets of implications with unitary right-hand sides. A UIS is named proper if it does not contain implications A → a such that a ∈ A. In this line, Bertet [4] provided versions of Functions Bertet-Nebut-Direct and Bertet-Nebut-Minimize for unit implicational systems.

Function Bertet-Unit-Direct(Σ)
  input : a proper UIS Σ on S
  output: the direct UIS Σd on S equivalent to Σ
  begin
    Σd := Σ
    foreach A → a ∈ Σd do
      foreach Ca → b ∈ Σd do
        if a ≠ b and b ∉ A then add AC → b to Σd;
    return Σd

Function Bertet-Unit-Minimize(Σ)
  input : a proper UIS Σ on S
  output: a smaller UIS Σm on S equivalent to Σ
  begin
    Σm := Σ
    foreach A → b ∈ Σm do
      foreach C → b ∈ Σm do
        if A ⊊ C then delete C → b from Σm;
    return Σm

The above functions were used in [4] to build a method which transforms an arbitrary UIS into a UIS with the same properties as the direct-optimal basis for general IS. Since any non-unit IS can be trivially turned into a UIS, we may encapsulate both functions to provide another method to get a direct-optimal basis from an arbitrary IS. Thus, the following function incorporates a first step to convert any IS into its equivalent UIS and concludes with the converse switch.

Function Bertet-Unit-DO(Σ)
  input : an implicational system Σ on S
  output: the direct-optimal IS Σdo on S equivalent to Σ
  begin
    Σu := {A → b | A → B ∈ Σ and b ∈ B \ A}
    Σud := Bertet-Unit-Direct(Σu)
    Σudo := Bertet-Unit-Minimize(Σud)
    Σdo := {A → B | B = {b | A → b ∈ Σudo} ≠ ∅}
    return Σdo

Theorem 5 (Bertet [4]). Let Σ be an IS. Then Σdo = Bertet-Unit-DO(Σ) is the unique direct-optimal implicational system equivalent to Σ.

As we have mentioned at the beginning of this subsection, some authors introduce unitary formulas as a way to provide simpler and more direct methods having a better performance. Thus, in this case, Bertet-Unit-DO is more efficient than Bertet-Nebut-DO, as we shall see at the end of the paper in Section 5.1.

5 Computing the direct-optimal basis by means of reductions

In this paper, our goal is the integration of the techniques proposed by Bertet et al. [2-4] and the Simplification Logic proposed by Cordero et al. [6], that is, the addition of reductions based on the simplification paradigm to build a direct-optimal basis. In the same way as Bertet-Unit-DO, we are going to develop a function to get a direct-optimal basis whose first step will be to narrow the implications.
However, the use of unit implications has some disadvantages that we are going to avoid by considering another kind of formulas. Thus, we are going to use reduced IS and introduce simplification rules which transform them while preserving reducedness. A signal that this is a good approach is the fact that, at the end of the process, the function renders the direct-optimal basis directly, avoiding the converse switch.

Definition 10. An IS Σ is reduced if A → B ∈ Σ implies B ≠ ∅ and A ∩ B = ∅, for all A, B ⊆ S.

Obviously, any IS Σ can be turned into a reduced equivalent one Σr as follows:

Σr := {A → B-A | A → B ∈ Σ, B ⊈ A}

The proposed method begins with this transformation and, once the IS is reduced, this property is preserved. For this reason, [Overlap] must be substituted. Thus, we introduce a new inference rule covering directness without losing reducedness and, at the same time, making progress on the minimization task following the simplification paradigm. The kernel of the new method is the following inference rule, named strong simplification:

[sSimp] if B ∩ C ≠ ∅ and D ⊈ A ∪ B, from A → B and C → D derive A(C-B) → D-(AB).

Regardless of the conditions, the inference rule always holds. Nevertheless, the conditions ensure a precise application of the rule in those cases where it is necessary.

Definition 11. Given a reduced IS Σ, the direct-reduced implicational system Σdr generated from Σ is defined as the smallest IS such that
1. Σ ⊆ Σdr and
2. for all A, B, C, D ⊆ S, if A → B, C → D ∈ Σdr, B ∩ C ≠ ∅ and D ⊈ A ∪ B then A(C-B) → D-(AB) ∈ Σdr.

Theorem 6. Given a reduced IS Σ, then Σdr = Direct-Reduced(Σ) is a direct and reduced IS.

Function Direct-Reduced(Σ)
  input : a reduced implicational system Σ on S
  output: the direct-reduced IS Σdr on S
  begin
    Σdr := Σ
    foreach A → B ∈ Σdr and C → D ∈ Σdr do
      if B ∩ C ≠ ∅ and ∅ ≠ D \ (A ∪ B) then add A(C-B) → D-(AB) to Σdr;
    return Σdr

Theorem 1 provides four equivalencies which allow us to remove redundant information when they are read from left to right. An implicational system in which these equivalences have been used to remove redundant information is going to be named a simplified implicational system.

Definition 12. A reduced IS Σ is simplified if the following conditions hold for all A, B, C, D ⊆ S:
1. A → B, A → C ∈ Σ implies B = C.
2. A → B, C → D ∈ Σ and A ⊊ C imply C ∩ B = ∅ = D ∩ B.

Then, Function RD-Simplify turns any direct-reduced IS into a direct-reduced-simplified equivalent one by systematically applying the equivalences provided in Theorem 1.

Function RD-Simplify(Σ)
  input : a direct-reduced implicational system Σ on S
  output: the direct-reduced-simplified IS Σdrs on S equivalent to Σ
  begin
    Σdrs := ∅
    foreach A → B ∈ Σ do
      foreach C → D ∈ Σ do
        if C = A then B := B ∪ D;
        if C ⊊ A then B := B \ D;
      if B ≠ ∅ then add A → B to Σdrs;
    return Σdrs

Function doSimp(Σ)
  input : an implicational system Σ on S
  output: the direct-optimal IS Σdo on S
  begin
    Σr := {A → B-A | A → B ∈ Σ, B ⊈ A}
    Σdr := Direct-Reduced(Σr)
    Σdo := RD-Simplify(Σdr)
    return Σdo

Theorem 7. Let Σ be an implicational system on S. Then Σdo = doSimp(Σ) is the direct-optimal basis equivalent to Σ.

Note that, unlike Bertet-Unit-DO, where a final step was needed to revert the effects of the first transformation, doSimp does not need to revert the first step.
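For intuition only, the doSimp pipeline can be mocked up in a few lines of Python (an illustrative sketch under the definitions above, with data structures assumed by us; the authors' actual implementation is the Prolog prototype discussed in Section 5.1):

    # Illustrative sketch of doSimp: reduce, saturate with [sSimp], then simplify.
    def reduce_is(sigma):
        return {(a, b - a) for (a, b) in sigma if not b <= a}

    def direct_reduced(sigma_r):
        dr = set(sigma_r)
        changed = True
        while changed:                      # closure under strong simplification
            changed = False
            for (a, b) in list(dr):
                for (c, d) in list(dr):
                    if b & c and not d <= (a | b):
                        new = (a | (c - b), d - (a | b))
                        if new not in dr:
                            dr.add(new)
                            changed = True
        return dr

    def rd_simplify(sigma_dr):
        out = set()
        for (a, b) in sigma_dr:
            b2 = set(b)
            for (c, d) in sigma_dr:
                if c == a:
                    b2 |= d
                elif c < a:
                    b2 -= d
            if b2:
                out.add((a, frozenset(b2)))
        return out

    def do_simp(sigma):
        return rd_simplify(direct_reduced(reduce_is(sigma)))

    sigma = {(frozenset("a"), frozenset("ab")), (frozenset("b"), frozenset("c"))}
    print(do_simp(sigma))                   # expected: {a -> bc, b -> c}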
We conclude this section with an experiment which illustrates the advantages of the new method.

5.1 Empirical results

Logic programming has been used as a natural framework in areas in which it is necessary to develop automatic deduction methods. Prolog prototypes provide a declarative and pedagogical point of departure and illustrate the behavior of new techniques in a very fast and easy way. Some authors have explored the use of Logic Programming in the framework of Formal Concept Analysis. Indeed, in [5] the authors consider the framework of FCA and its implementation in logic programming as a previous step to achieve a first-order-logic FCA theory. In Eden et al. [9], the authors present a Prolog-based prototype tool and show how the tool can utilize formulas to locate pattern instances. In a first step, the methods proposed in this paper have been developed in a Logic Programming language (Prolog), which is a well-known tool to develop fast prototypes. In our case, the implementation in Prolog is natural because the method proposed in this paper is based on logic.

The methods of Bertet et al. [2, 3] and our doSimp method have been implemented in SWI-Prolog.1 Since there does not exist a benchmark of implications for this experiment, we have collected some sets of implications from the literature, searching papers and books about algorithms for implications, functional dependencies and minimal keys. Now, we show the results of the execution of a first Prolog prototype of the Bertet et al. method for UIS [3], the Bertet et al. method for IS [2] and the new doSimp method (proposed in this paper). The following table and figures summarize the results obtained. We show in the columns the results reported by Prolog: Lips (logical inferences per second, used to describe the performance of a logical reasoning system), Time (execution time in seconds), and Comp (the number of couples of implications to which a rule is applied). The areas in Figure 2 show the percentages of each algorithm with respect to the number of comparisons.

1 Available at http://www.lcc.uma.es/~enciso/doSimp.zip

                    Bertet-Nebut-DO               Bertet-Unit-DO                Direct-Reduced
                    Lips        Time     Comp     Lips        Time      Comp    Lips      Time   Comp
 Ex.1               5297080     1.247    1978     116905      0.019     36      4281      0.001  12
 Ex.2               2395        0.003    23       923         0         3       606       0      2
 Ex.3               2183        0        15       1440        0         4       1122      0      4
 Ex.a               83403       0.019    297      44109       0.007     33      3048      0.001  4
 Ex.a3red           27613       0.005    100      16938       0.003     20      3698      0.001  15
 Ex.derivation5     10302       0.002    120      3522        0.001     8       1782      0.001  12
 Ex.Olomouc         15399581    4.528    4337     1526818     0.331     180     15568     0.003  72
 Ex.Ganter          116514      0.025    230      72153       0.16      36      3756      0.001  12
 Ex.CLA14           102971      0.022    204      7449        0.001     12      704       0      3
 Ex.Saedian1        18754       0.004    97       10349       0.002     14      4064      0.001  16
 Ex.Saedian2        19452       0.004    160      10549       0.002     13      2619      0.001  13
 Ex.Saedian3        5753962     1.262    1986     166566      0.028     67      24643     0.005  55
 Ex.Wastl10         1242        0        18       381         0         1       327       0      1
 Ex.Wastl13         10543       0.002    86       4674        0         10      1029      0      5
 Example1           5594556921  7008.890 134175   2662181973  1351.950  5389    1199498   0.197  1103

                                   IS Bertet-Nebut   UIS Bertet      doSimp
 Lips - logical inferences         374,760,194.4     177,610,983.3   84,449.7
 Time of execution (seconds)       467.7285          90.1300         0.014
 Number of comparisons             9588.4            388.4           88.6

Fig. 1. Summary of the experiment (average)

(Stacked percentage bars, one per example, showing the share of the number of comparisons performed by IS-BN, UIS-B and doSimp.)

Fig. 2.
Results: Comparisons 6 Conclusion In this work, we have presented another algorithm to calculate the direct-optimal basis in a further way, in the most of the cases, than the algorithms which exist in 156 Estrella Rodrı́guez Lorenzo et al. the literature. It is shown with a test that we have realised by running different examples with the methods of Bertet et al. for UIS [3], Bertet et al. for IS [2] and the new doSimp. Our aim is to reduce the cost of the algorithm by using the Simplification Logic as a useful tool to work with implications. By the time, we have improved the algorithms that existed but we are going to go on working in that way to try to cut down the cost of our method. The perspectives we have are improvements by pretreatments: reduction, canonical basis, etc in order to reach our main objective which would be to directly compute the direct-optimal basis without extra implication generation. Acknowledgment Supported by grant TIN11-28084 of the Science and Innovation Ministry of Spain. References 1. W W. Armstrong, Dependency structures of data base relationships, Proc. IFIP Congress. North Holland, Amsterdam: 580–583, 1974. 2. K. Bertet, M. Nebut, Efficient algorithms on the Moore family associated to an implicational system, DMTCS, 6(2): 315–338, 2004. 3. K. Bertet, B. Monjardet, The multiple facets of the canonical direct unit implica- tional basis, Theor. Comput. Sci., 411(22-24): 2155–2166, 2010. 4. K. Bertet, Some Algorithmical Aspects Using the Canonical Direct Implicationnal Basis, CLA:101–114, 2006. 5. L. Chaudron, N. Maille, 1st Order Logic Formal Concept Analysis: from logic pro- gramming to theory, Computer and Informations Science: (13:3),1998. 6. P Cordero, A. Mora, M. Enciso, I.Pérez de Guzmán, SLFD Logic: Elimination of Data Redundancy in Knowledge Representation, LNCS, 2527: 141–150, 2002. 7. P. Cordero, M. Enciso, A. Mora, M. Ojeda-Aciego, Computing Minimal Generators from Implications: a Logic-guided Approach, CLA: 187–198, 2012. 8. P. Cordero, M. Enciso, A. Mora, M. Ojeda-Aciego, Computing Left-Minimal Direct Basis of implications. CLA: 293–298, 2013. 9. A. Eden, Y. Hirshfeld, K. Lundqvist, EHL99 , LePUS Symbolic Logic Modeling of Object Oriented Architectures: A Case Study, In: Proc. Second Nordic Workshop on Software Architecture (NOSA’99), 1999. 10. B. Ganter, Two basic algorithms in concept analysis, Technische Hochschule, Darmstadt, 1984. 11. J.L. Guigues and V. Duquenne, Familles minimales d’implications informatives résultant d’un tableau de données binaires, Math. Sci. Humaines: 95, 5–18, 1986. 12. A. Mora, M. Enciso, P. Cordero, and I. Fortes, Closure via functional dependence simplification, I. J.of Computer Mathematics, 89(4): 510–526, 2012. 13. A. Mora, M. Enciso, P. Cordero, and I. Pérez de Guzmán, An Efficient Prepro- cessing Transformation for Functional Dependencies Sets Based on the Substitution Paradigm, LNCS, 3040: 136–146, 2004. Ordering objects via attribute preferences Inma P. Cabrera1 , Manuel Ojeda-Aciego1 , and Jozef Pócs2 1 Universidad de Málaga. Andalucı́a Tech. Spain? 2 Palacký University, Olomouc, Czech Republic, and Slovak Academy of Sciences, Košice, Slovakia?? Abstract. We apply recent results on the construction of suitable or- derings for the existence of right adjoint to the analysis of the following problem: given a preference ordering on the set of attributes of a given context, we seek an induced preference among the objects which is com- patible with the information provided by the context. 
1 Introduction The mathematical study of preferences started almost one century ago with the works of Frisch, who was the first to write down in 1926 a mathematical model about preference relations. On the other hand, the study of adjoints was initiated in the mid of past century, with works by Ore in 1944 (in the framework of lattices and Galois connections) and Kan in 1958 (in the framework of category theory and adjunctions). The most recent of the three theories considered in this work is that of Formal Concept Analysis (FCA), which was initiated in the early 1980s by Ganter and Wille, as a kind of applied lattice theory. Nowadays FCA has become an important research topic in which a, still growing, pure mathematical machinery has expanded to cover a big range of applications. A number of results are published yearly on very diverse topics such as data mining, semantic web, chemistry, biology or even linguistics. The first basic notion of FCA is that of a formal context, which can be seen as a triple consisting of an initial set of formal objects B, a set of formal attributes A, and an incidence relation I ⊆ B × A indicating which object has which attribute. Every context induces a lattice of formal concepts, which are pairs of subsets of objects and attributes, respectively called extent and intent, where the extent of a concept contains all the objects shared by the attributes from its intent and vice versa. Given a preference ordering among the attributes of a context, our contribu- tion in this work focuses on obtaining an induced ordering on the set of objects which, in some sense, is compatible with the context. After browsing the literature, we have found just a few papers dealing simul- taneously with FCA and preferences, but their focus and scope are substantially ? Partially supported by Spanish Ministry of Science and FEDER funds through projects TIN2011-28084 and TIN12-39353-C04-01. ?? Partially supported by ESF Fund CZ.1.07/2.3.00/30.0041. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 157–169, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 158 Inma P. Cabrera, Manuel Ojeda-Aciego and Jozef Pócs different to ours. For instance, Obiedkov [11] considered some types of preference grounded on preference logics, proposed their interpretation in terms of formal concept analysis, and provided inference systems for them, studying as well their relation to implications. Later, in [12], he presented a context-based semantics for parameterized ceteris paribus preferences over subsets of attributes (pref- erences which are only required to hold when the alternatives being compared agree on a specified subset of attributes). Other approaches to preference handling are related to the development of recommender systems. For instance, [8] proposes a novel recommendation model based on the synergistic use of knowledge from a repository which includes the users behavior and items properties. The candidate recommendation set is con- structed by using FCA and extended inference rules. Finally, another set of references deal with extensions of FCA, either to the fuzzy or multi-adjoint case, or to the rough case. For instance, in [2] an approach can be found in which, based on transaction cost analysis, the authors explore the customers’ loyalty to either the financial companies or the company financial agents with whom they have established relationship. 
In a pre-processing stage, factor analysis is used to choose variables, and rough set theory to construct the decision rules; FCA is applied in the post-processing stage from these suitable rules to explore the attribute relationship and the most important factors af- fecting the preference of customers for deciding whether to choose companies or agents. Glodeanu has recently proposed in [6] a new method for modelling users’ preferences on attributes that contain more than one trait. The modelling of preferences is done within the framework of Formal Fuzzy Concept Analysis, specifically using hedges to decrease the size of the resulting concept lattice as presented in [1]. An alternative generalization which, among other features, allows for specify- ing preferences in an easy way, is that of multi-adjoint FCA [9,10]. The main idea underlying this approach is to allow to use several adjoint pairs in the definition of the fuzzy concept-forming operators. Should one be interested in certain sub- set(s) of attributes (or objects), the only required setting is to declare a specific adjoint pair to be used in the computation with values within each subset of preferred items. The combination of the two last approaches, namely, fuzzy FCA with hedges and the multi-adjoint approach have been recently studied in [7], providing new means to decrease the size of the resulting concept lattices. This work can be seen as a position paper towards the combination of recent results on the existence of right adjoint for a mapping f : hX, ≤X i → Y from a partially ordered set X to an unstructured set Y , with Formal Concept Analysis, and with the generation of preference orderings. The structure of this work is the following: in Section 2, the preliminary results related to attribute preferences and the characterization of existence of right adjoint to a mapping from a poset to an unstructured codomain are pre- sented; then, in Section 3 the two approaches above are merged together in order Ordering Objects via Attribute Preferences 159 to produce a method to induce an ordering among the objects in terms of a given preference ordering on attributes and a formal context. 2 Preliminaries 2.1 Preference relations and lectic order on the powerset We recall the definition of a (total) preference ordering and describe an induced ordering on the corresponding powerset. In the general approach to preferences, a preference relation on a nonempty set A is said to be a binary relation ⊆ A×A which is reflexive (∀a ∈ A, a a) and total (∀a, b ∈ A, (a b) ∨ (b a)). In this paper, we will consider a simpler notion, in which a preference rela- tion is modeled by a total ordering. Formally, by a total preference relation we understand any total ordering of the set A, i.e., a binary relation ⊆ A × A such that is total, reflexive, antisymmetric (∀a, b ∈ A, a b and b a implies a = b), and transitive (∀a, b, c ∈ A, a b and b c implies a c). Any total preference relation on a set A induces a total ordering on the powerset 2A in a natural way. Definition 1. Let hA, i be a nonempty set with a total preference relation. A subset X is said to be lectically smaller than a subset Y , denoted X= 1 and k < i} the branches of all nodes that precedes ni in ss. To avoid locating these nodes and simplify calculations, let’s virtually move ni to the start of ss. All supersets of ni are now confined in ni branch and the updated count of ni supersets is 2(n−1) . Let nj be another generator in the same cluster. 
It is important to count all nj supersets while avoiding elements that are already counted as part of the ni branch. By virtually moving nj after ni in ss and counting all elements in the nj branch, it is possible to fulfil both conditions. The nj branch count is 2^(n−2), and the same process is applied to the remaining generators in the cluster. Doing so leads us to the generalized generator counting formula gc = Σ_{k=|ss|−|gs|}^{|ss|−1} 2^k, where |gs| is the generator count and |ss| is the cluster size (line 10 in DFSP and line 20 in EXPLORETIDSET).

Detecting non-generator monotony. The most significant mop-up mechanism in DFSP is the pruning of non-generators. In order to also eliminate non-generators, EXPLORETIDSET looks for nodes in ss such that, when combined together, the resulting clique superset is still a non-generator. Those nodes are said to form a non-generator monotonous clique. Suppose we are building the branch of a node from this clique. If we use exclusively nodes from the clique, all nodes in the branch are guaranteed to be subsets of the clique superset. Since a subset of a non-generator is also a non-generator, all branches in the clique will only contain non-generators. Nodes in the clique are pushed to the end of the ss set to ensure that the generation process will only use nodes from the clique. Nodes outside the clique are moved away to the beginning of ss. Nodes in the clique are not expanded, since no generator could be found in their branches, but they are still used to build branches outside the clique.

Algorithm 2: EXPLORETIDSET
Input: K = (O, I, R): a formal context; ss: a set of TSNode siblings; m: the size of the intent.
Result: gc: the generators count.
 1 Begin
 2   i := ssc := |ss|;
 3   ingpc := gc := 0;
 4   ngpi := I;
 5   While i ≥ 1 do
 6     If |ngpi ∩ ss[i].is| = m then
 7       MOVETOHEAD(i, ss);
 8       ingpc := ingpc + 1;
 9     Else
10       i := i − 1;
11       ngpi := ngpi ∩ ss[i].is;
12   For i = 1 ... ingpc do
13     nleft := ss[i];
14     For j = i + 1 ... ssc do
15       nright := ss[j];
16       nchild.s := nright.s;
17       nchild.is := nleft.is ∩ f(nchild.s);
18       If |nchild.is| ≠ m then
19         nleft.ss := nleft.ss ∪ {nchild};
20     gc := gc + Σ_{k=|nleft.ss|}^{ssc−i−1} 2^k;
21     If |nleft.ss| > 1 then
22       gc := gc + EXPLORETIDSET(nleft.ss, K, m);
23   Return gc;
24 End

3.2 Illustrative example

To illustrate our approach, let us consider the formal concept C1 = (A1, B1) from the formal context depicted by Table 1, such that A1 = {3, 4, 5, 6, 7, 9} and B1 = {f, g}. As shown in Figure 1, the DFSP algorithm operates as follows.

Fig. 1. Illustrative example

During the first step (1), the root node is created and initialized through the function BUILDTREEROOT (gc = 0). Initially, root.s = ∅ and the nodes n3, n4, n5, n6, n7 and n9 are created from the individual elements of {3, 4, 5, 6, 7, 9} (steps (2), (3), (4), (5), (6) and (7)). These nodes are prospective direct children of the root node. Given that all these nodes are non-generators, they become, in step (8), effective direct children of the root and are sorted decreasingly with respect to their support value. In step (9), non-generators forming a monotone clique are placed at the end of the list and marked by (*). However, instable generators are placed at the beginning of the list and marked by (+).
After that, in steps (10), (11), (12), (13) and (14), the prospective direct children of node n3 are created, which are, respectively, n36, n39, n34, n35 and n37. The count of generators is updated in step (15) (gc = 2^4 + 2^3 = 24). Only the nodes n34, n35 and n39 are left as effective direct children of n3; they are also sorted decreasingly. In step (16), all these effective direct children form a monotone clique and the exploration of this branch is stopped. After that, the nodes n69, n64, n65 and n67 are created, and the count of generators is updated in step (21) with the three generators n64, n65 and n67 (gc = gc + 2^3 + 2^2 + 2^1 = 24 + 14 = 38). Only the node n69 is kept in the list of effective direct children of n6. Indeed, the latter does not fulfil the condition of EXPLORETIDSET to be launched.

4 Experimental results

In this section, we focus on the evaluation of the DFSP algorithm by stressing two complementary aspects: (i) execution time; (ii) efficiency of search space pruning. Experiments were carried out on an Intel Xeon PC, CPU E5-2630 2.30 GHz, with 16 GB of RAM and a Linux system. For these experiments, we used benchmark datasets commonly and extensively used in data mining. The first three datasets are considered dense ones, i.e., yielding a high number of formal concepts even for a small number of objects and attributes, while the other ones are considered sparse. The characteristics of these datasets are summarized by Table 2. Thus, for each dataset we report its number of objects, its number of attributes, as well as the number of all formal concepts that may be drawn. In addition, we also report the respective sizes of the smallest and the largest formal concepts (in terms of extent size). For these considered concepts, we kept track of the number of actually explored nodes as well as the execution time (the column denoted |explor.|).

At a glance, the statistics show that the DFSP algorithm is able to process dozens of thousands of objects in a reasonable time. Indeed, the 15596 (resp. 16040) objects composing the extent of the largest formal concept extracted from the RETAIL (resp. T10I4D100K) dataset are handled in only 27.27 (resp. 68.85) seconds. Even though the respective cardinalities are close (15596 vs 16040 objects), the difference in execution time is not proportional to this low gap. A preliminary explanation could be the difference in density of both datasets (RETAIL is dense while T10I4D100K is a sparse one). An in-depth study of these performances in connection with the nature of the datasets is currently being carried out. The most striking fact is the low number of visited nodes in the associated search space. For example, for the MUSHROOM dataset, the DFSP algorithm actually handled only 83918 nodes out of the 2^1000 potential nodes of the search space, i.e., in numerical terms, it explores only an insignificant part, equal to 7.8 × 10^−297, of the search space. The case of the RETAIL and T10I4D100K datasets is also worth mentioning: for the respective smallest extracted concepts, the DFSP algorithm only explores 1.14 × 10^−45 and 1.5 × 10^−90 parts of the respective search spaces.
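As a quick sanity check of the quoted Mushroom fraction (an illustrative computation of ours, not part of the paper), the search space over an extent of size n contains 2^n tidsets, so the explored share can be reproduced with arbitrary-precision arithmetic:

    # Fraction of the search space actually explored (illustrative check).
    from decimal import Decimal

    def explored_fraction(explored_nodes, extent_size):
        # the search space over an extent of size n has 2**n candidate tidsets
        return Decimal(explored_nodes) / Decimal(2 ** extent_size)

    print(explored_fraction(83918, 1000))   # Mushroom, smallest concept: about 7.8E-297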
                                               smallest concept              largest concept
 Datasets      # Attr   # Obj  # concepts   |ext|  |explor.|  time (sec.)  |ext|  |explor.|  time (sec.)
 CHESS             75    3196        3316    2630    2362233         0.12   3195    5855899         0.64
 MUSHROOM         119    8124        3337    1000      83918         0.10   8124   76749955        11.32
 RETAIL         16470   88162        3493     150        164         0.10  15596   64847191        27.27
 T10I4D100K      1000  100000        4497     300        306         0.11   6810   19719991        12.77
 T40I10D100K     1000  100000        3102    1800    1495324         1.39  16040   92154598        68.85

Table 2. Characteristics of the considered benchmark datasets

These highlights are also confirmed by Figures 2-11. Indeed, Figures 2, 4, 6, 8 and 10 stress the variation of the execution time, while Figures 3, 5, 7, 9 and 11 assess what we call the workload, which means the efficiency of the search space exploration. At a glance, the execution time is in close connection with the reduction of the search space, i.e., the variation of the workload has the same tendency as the performance, since we consider the visited tidsets in the search space as the processing unit. It is worth mentioning that the performance is correlated to the extent's size rather than to the exponential nature of the search space.

Fig. 2. Mushroom scaleup        Fig. 3. Mushroom workload
Fig. 4. Chess scaleup           Fig. 5. Chess workload
Fig. 6. Retail scaleup          Fig. 7. Retail workload
Fig. 8. T10I4D100K scaleup      Fig. 9. T10I4D100K workload
Fig. 10. T40I10D100K scaleup    Fig. 11. T40I10D100K workload
(Each scaleup figure plots the execution time, and each workload figure the number of visited tidsets, against the extent size, together with trend lines.)

5 Conclusion and future work

Through the DFSP algorithm, we delved into the combinatorics of lattices by showing that most of this search space can be smartly explored thanks to the saturation of generators. The swift computation of stability encouraged us to integrate stability as an on-the-fly pruning strategy during the mining of closed itemsets. We are currently working on a new algorithm for the stability computation given the Galois lattice. The new algorithm relies only on the direct sub-concepts to compute the stability of a concept. Outside the FCA field, the strategy of DFSP would also benefit the very efficient extraction of a well-known combinatorial problem: minimal transversals.

References

1. Babin, M.A., Kuznetsov, S.O.: Approximating concept stability. In: Proceedings of the 11th International Conference on Formal Concept Analysis (ICFCA), Dresden, Germany. (2012) 7–15
2. Buzmakov, A., Kuznetsov, S.O., Napoli, A.: Scalable estimates of concept stability. In: Proceedings of the 12th International Conference on Formal Concept Analysis (ICFCA), Cluj-Napoca, Romania. (2014) 157–172
3.
Kuznetsov, S.O.: Stability as an estimate of the degree of substantiation of hypotheses de- rived on the basis of operational similarity. Automatic Documentation and Mathematical Linguistics 24 (1990) 62–75 4. Kuznetsov, S.O., Obiedkov, S.A., Roth, C.: Reducing the representation complexity of lattice-based taxonomies. In: Proceedings of the 15th International Conference on Con- ceptual Structures (ICCS), Sheffield, UK. (2007) 241–254 5. Kuznetsov, S.O.: On stability of a formal concept. Ann. Math. Artif. Intell. 49 (2007) 101– 115 6. Buzmakov, A., Kuznetsov, S.O., Napoli, A.: Is concept stability a measure for pattern selec- tion? Procedia Computer Science 31 (2014) 918 – 927 7. Klimushkin, M., Obiedkov, S.A., Roth, C.: Approaches to the selection of relevant concepts in the case of noisy data. In: Proceedings of the 8th International Conference(ICFCA) , Agadir, Morocco. (2010) 255–266 8. Roth, C., Obiedkov, S.A., Kourie, D.G.: Towards concise representation for taxonomies of epistemic communities. In: Proceedings of the 4th International Conference on Concept Lattices and Their Applications (CLA), Hammamet, Tunisia. (2006) 240–255 9. Jay, N., Kohler, F., Napoli, A.: Analysis of social communities with iceberg and stability- based concept lattices. In: Proceedings of the 6th International Conference(ICFCA), Mon- treal, Canada. (2008) 258–272 10. Roth, C., Obiedkov, S.A., Kourie, D.G.: On succinct representation of knowledge community taxonomies with formal concept analysis. Int. J. Found. Comput. Sci. 19 (2008) 383–404 11. Qiao, S.Y., Wen, S.P., Chen, C.Y., Li, Z.G.: A fast algorithm for building concept lattice. (2003) 12. Demko, C., Bertet, K.: Generation algorithm of a concept lattice with limited object access. In: Proceedings of the 8th International Conference Concept Lattices and Their Applications (CLA), Nancy, France. (2011) 239–250 Attributive and Object Subcontexts in Inferring Good Maximally Redundant Tests Xenia Naidenova1 and Vladimir Parkhomenko2 1 Military Medical Academy, Saint-Petersburg, Russia ksennaid@gmail.com 2 St. Petersburg State Polytechnical University, Saint-Petersburg, Russia parhomenko.v@gmail.com Abstract. Inferring Good Maximally Redundant Classification Tests (GMRTs) as Formal Concepts is considered. Two kinds of classification subcontexts are defined: attributive and object ones. The rules of forming and reducing subcontexts based on the notion of essential attributes and objects are given. They lead to the possibility of the inferring control. In particular, an improved Algorithm for Searching all GMRTs on the basis of attributive subtask is proposed. The hybrid attributive and object approaches are presented. Some computational aspects of algorithms are analyzed. Keywords: good classification test, Galois lattice, essential attributes and objects, implications, subcontexts 1 Introduction Good Test Analysis (GTA) deals with the formation of the best descriptions of a given object class (class of positive objects) against the objects which do not belong to this class (class of negative objects) on the basis of lattice theory. We assume that objects are described in terms of values of a given set U of attributes, see an example in Tab.1. The key notion of GTA is the notion of classification. To give a target classification of objects, we use an additional attribute KL ∈ / U. A target attribute partitions a given set of objects into disjoint classes the number of which is equal to the number of values of this attribute. 
In Tab.1, we have two classes: the objects in whose descriptions the target value k appears, and all the other objects. Denote by M the set of attribute values such that M = {∪ dom(attr), attr ∈ U}, where dom(attr) is the set of all values of attr, i.e. a plain scaling in terms of [3]. Let G = G+ ∪ G− be the set of objects, where G+ and G− are the sets of positive and negative objects, respectively. Let P(B), B ⊆ M, be the set of all the objects in whose descriptions B appears. P(B) is called the interpretation of B in the power set 2^G. If P(B) contains only G+ objects and the number of these objects is more than 2, then B is called a description of some positive objects or a diagnostic (classification) test for G+ [1]. The words diagnostic (classification) can be omitted in the paper.

Table 1. Motivating example of classification

 No  Height  Color of Hair  Color of Eyes  KL
  1  Low     Blond          Blue           k(+)
  2  Low     Brown          Blue           k(−)
  3  Tall    Brown          Hazel          k(−)
  4  Tall    Blond          Hazel          k(−)
  5  Tall    Brown          Blue           k(−)
  6  Low     Blond          Hazel          k(−)
  7  Tall    Red            Blue           k(+)
  8  Tall    Blond          Blue           k(+)

Let us recall the definition of a good test or good description for a subset of G+ (via partitions of objects). A subset B ⊆ M of attribute values is a good test for a subset of positive objects if it is a test and no subset C ⊆ M exists such that P(B) ⊂ P(C) ⊆ G+ [7].

Sec.2 is devoted to defining the concept of a good diagnostic (classification) test as a formal concept. Sec.3 gives the decomposition of good test inference based on two kinds of subcontexts of the initial classification context. Sec.4 is devoted to an analysis of algorithms based on using subcontexts, including the evaluation of the number of sub-problems to be solved, the depth of recursion, the structure of sub-problems and their ordering, and some others.

2 Good Maximally Redundant Tests as Formal Concepts

Assume that G = {1, . . . , N} is the set of object indices (objects, for short) and M = {m1, m2, . . . , mj, . . . , mm} is the set of attribute values (values, for short). Each object is described by a set of values from M. The object descriptions are represented by rows of a table whose columns are associated with the attributes taking their values in M. Let A ⊆ G, B ⊆ M. Denote by Bi, Bi ⊆ M, i ∈ {1, . . . , N}, the description of the object with index i. The Galois connection between the ordered sets (2^G, ⊆) and (2^M, ⊆) is defined by the following mappings, called derivation operators: for A ⊆ G and B ⊆ M, A′ = val(A) = {intersection of all Bi | Bi ⊆ M, i ∈ A} and B′ = obj(B) = {i | i ∈ G, B ⊆ Bi}. Of course, we have obj(B) = {intersection of all obj(m) | obj(m) ⊆ G, m ∈ B}. There are two closure operators [9]: generalization_of(B) = B″ = val(obj(B)) and generalization_of(A) = A″ = obj(val(A)). A set A is closed if A = obj(val(A)). A set B is closed if B = val(obj(B)). For g ∈ G and m ∈ M, {g}′ is denoted by g′ and called the object intent, and {m}′ is denoted by m′ and called the value extent.

Let us recall the main definitions of GTA [7]. A Diagnostic Test (DT) for the positive examples G+ is a pair (A, B) such that B ⊆ M, A = B′ ≠ ∅, A ⊆ G+, and B ⊈ g′ for all g ∈ G−. A diagnostic test (A, B) for G+ is maximally redundant if obj(B ∪ m) ⊂ A for all m ∉ B, m ∈ M.
A diagnostic test (A, B) for G+ is good if and only if any extension A∗ = A ∪ i, i∈/ A, i ∈ G+ implies that (A∗ , val(A∗ )) is not a test for G+ . In the paper, we deal with Good Maximally Redundant Tests (GMRTs). If a good test (A, B) for G+ is maximally redundant, then any extension B∗ = B ∪ m, m ∈ / B, m ∈ M implies that (obj(B∗ ), B∗ ) is not a good test for G+ . Any object description d of g ∈ G in a given classification context is a maximally redundant set of values because ∀m ∈ / d, m ∈ M, obj(d∪m) is equal to ∅. GMRT can be regarded as a special type of hypothesis [4] In Tab.1, ((1, 8), Blond Blue) is a GMRT for k(+), ((4, 6), Blond Hazel) is a DT for k(−) but not a good one, and ((3, 4, 6), Hazel) is a GMRT for k(−). 3 The Decomposition of Inferring GMRTs into Subtasks There are two possible kinds of subtasks of inferring GMRTs for a set G+ [8]: 1. given a set of values, where B ⊆ M, obj(B) 6= ∅, B is not included in any description of negative object, find all GMRTs (obj(B∗ ), B∗ ) such that B∗ ⊂ B; 2. given a non-empty set of values X ⊆ M such that (obj(X), X) is not a test for positive objects, find all GMRTs (obj(Y ), Y ) such that X ⊂ Y . For solving these subtasks we need only form subcontexts of a given classifi- cation context. The first subtask is useful to find all GMRTs whose intents are contained in the description d of an object g. This subtask is considered in [2] for fast incremental concept formation, where the definition of subcontexts is given. We introduce the projection of a positive object description d on the set D+ , i.e. descriptions of all positive objects. The proj(d) is Z = {z| z = d ∩ d∗ 6= ∅, d∗ ∈ D+ and (obj(z), z) is a test for G+ }. We also introduce a concept of value projection proj(m) of a given value m on a given set D+ . The value projection is proj(m) = {d| m appears in d, d ∈ D+ }. Algorithm Algorithm for Searching all GMRTs on the basis of attributive subtask (ASTRA), based on value projections, was advanced in [6]. Algoritm DIAGaRa, based on object projections, was proposed in [5]. In what follows, we are interested in using both kinds of subcontexts for inferring all GMRTs for a positive (or negative) class of objects. The following theorem gives the foundation of reducing subcontexts [6]. Theorem 1. Let X ⊆ M, (obj(X), X) be a maximally redundant test for pos- itive objects and obj(m) ⊆ obj(X), m ∈ M . Then m can not belong to any GMRT for positive objects different from (obj(X), X). Consider some example of reducing subcontext (see Tab.1). Let splus(m) be obj(m) ∩ G+ or obj(m) ∩ G− and SPLUS be {splus(m)| m ∈ M }. In Tab.1, we have SPLUS = obj(m) ∩ G− = {{3, 4, 6}, {2, 3, 5}, {3, 4, 5}, {2, 5}, {4, 6}, {2, 6}} for values “Hazel, Brown, Tall, Blue, Blond, and Low” respectively. 184 Xenia Naidenova and Vladimir Parkhomenko We have val(obj(Hazel)) = Hazel, hence ((3, 4, 6), Hazel) is a DT for G− . Then value “Blond” can be deleted from consideration, because splus(Blond) ⊂ splus(Hazel). Delete values Blond and Hazel from consideration. After that the description of object 4 is included in the description of object 8 of G+ and the description of object 6 is included in the description of object 1 of G+ . Delete objects 4 and 6. Then for values “Brown, Tall, Blue, and Low” respectively SPLUS = {{2, 3, 5}, {3, 5}, {2, 5}, {2}}. Now we have val(obj(Brown)) = Brown and ((2, 3, 5), Brown) is a test for G− . All values are deleted and all GMRTs for G− have been obtained. 
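To see the derivation operators at work, here is a small illustrative Python sketch over the toy context of Tab.1 (the names and the omission of the cardinality threshold are simplifications of ours, not the authors' code); it reproduces the facts above that ((1, 8), Blond Blue) is a test for k(+) and that Hazel has the extent {3, 4, 6} ⊆ G−:

    # Illustrative sketch of the operators val/obj and the test predicate
    # on the toy context of Tab.1.
    DESCRIPTIONS = {1: {"Low", "Blond", "Blue"},  2: {"Low", "Brown", "Blue"},
                    3: {"Tall", "Brown", "Hazel"}, 4: {"Tall", "Blond", "Hazel"},
                    5: {"Tall", "Brown", "Blue"},  6: {"Low", "Blond", "Hazel"},
                    7: {"Tall", "Red", "Blue"},    8: {"Tall", "Blond", "Blue"}}
    G_PLUS, G_MINUS = {1, 7, 8}, {2, 3, 4, 5, 6}

    def obj(b):                      # extent of a value set B
        return {i for i, d in DESCRIPTIONS.items() if b <= d}

    def val(a):                      # common description of an object set A
        descs = [DESCRIPTIONS[i] for i in a]
        return set.intersection(*descs) if descs else set()

    def is_test(b):                  # B is a test for G+ if its extent avoids G-
        extent = obj(b)
        return bool(extent) and extent <= G_PLUS

    print(obj({"Blond", "Blue"}), is_test({"Blond", "Blue"}))   # {1, 8} True
    print(obj({"Hazel"}) <= G_MINUS)                            # True: test for G-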
The initial information for finding all the GMRTs contained in a positive object description is its projection on the current set D+. It is essential that the projection is a subset of object descriptions defined on a certain restricted subset t* of values. Let s* be the subset of indices of the objects whose descriptions produce the projection. In the projection, splus(m) = obj(m) ∩ s*, m ∈ t*.

Let STGOOD be the partially ordered set of elements s satisfying the condition that (s, val(s)) is a good test for D+. The basic recursive procedure for solving any kind of subtask consists of the following steps:

1. Check whether (s*, val(s*)) is a test; if so, then s* is stored in STGOOD if s* corresponds to a good test at the current step, and the subtask is over. Otherwise go to the next step.
2. The value m can be deleted from the projection if splus(m) ⊆ s for some s ∈ STGOOD.
3. For each value m in the projection, check whether (splus(m), val(splus(m))) is a test; if so, then the value m is deleted from the projection, and splus(m) is stored in STGOOD if it corresponds to a good test at the current step.
4. If at least one value has been deleted from the projection, then a reduction of the projection is necessary. The reduction consists in checking, for each element t of the projection, whether (obj(t), t) is no longer a test (as a result of the previous elimination of values) and, if so, deleting this element from the projection. If, under reduction, at least one element has been deleted, then Steps 2, 3, and 4 are repeated.
5. Check whether the subtask is over. The subtask is over when either the projection is empty or the intersection of all elements of the projection corresponds to a test (see Step 1). If the subtask is not over, then an object (value) in this projection is selected and a new subtask is formed: the new subsets s* and t* are constructed, and the basic algorithm runs recursively.

The algorithm for forming STGOOD is based on topological sorting of partially ordered sets. The set TGOOD of all the GMRTs is obtained as follows: TGOOD = {tg | tg = (s, val(s)), s ∈ STGOOD}.

4 Selecting and Ordering Subcontexts and Inferring GMRTs

Algorithms for inferring GMRTs are constructed by the rules of selecting and ordering subcontexts of the main classification context. Before entering into the details, let us recall some extra definitions. Let t be a set of values such that (obj(t), t) is a test for G+. We say that the value m ∈ t is essential in t if (obj(t \ {m}), t \ {m}) is not a test for the given set of objects. Generally, we are interested in finding the maximal subset sbmax(t) ⊂ t such that (obj(t), t) is a test but (obj(sbmax(t)), sbmax(t)) is not a test for the given set of positive objects. Then sbmin(t) = t \ sbmax(t) is a minimal set of essential values in t.

Let s ⊆ G+, and assume that (s, val(s)) is not a test. The object tj, j ∈ s, is said to be essential in s if (s \ {j}, val(s \ {j})) proves to be a test for the given set of positive objects. Generally, we are also interested in finding the maximal subset sbmax(s) ⊂ s such that (s, val(s)) is not a test but (sbmax(s), val(sbmax(s))) is a test for the given set of positive objects. Then sbmin(s) = s \ sbmax(s) is a minimal set of essential objects in s.

An Approach for Searching for the Initial Content of STGOOD. At the beginning of inferring GMRTs, the set STGOOD is empty.
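Before describing that procedure, note that the essential-value and essential-object checks defined above translate directly into code. The sketch below is ours and reuses D, obj, val and is_test from the earlier sketches.

```python
def essential_values(t, positive):
    """Values m in t such that (obj(t \\ {m}), t \\ {m}) is no longer a test."""
    assert is_test(t, positive)
    return {m for m in t if not is_test(t - {m}, positive)}

def essential_objects(s, positive):
    """Objects j in s such that (s \\ {j}, val(s \\ {j})) becomes a test."""
    assert not is_test(val(s), positive)
    return {j for j in s if is_test(val(s - {j}), positive)}

# In Tab. 1, t = {"Blond", "Blue"} is a test for k(+) and both values are
# essential: e.g. dropping "Blond" leaves obj({"Blue"}) = {1, 2, 5, 7, 8} ⊄ G+.
print(essential_values({"Blond", "Blue"}, G_plus))   # {'Blond', 'Blue'}
```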
Next we describe the procedure to obtain an initial content of it. This procedure extracts a quasi- maximal subset s∗ ⊆ G+ which is the extent of a test for G+ (maybe not good). We begin with the first index i1 of s∗ , then we take the next index i2 of s∗ and evaluate the function to be test({i1 , i2 }, val({i1 , i2 })). If the value of the function is true, then we take the next index i3 of s∗ and evaluate the function to be test({i1 , i2 , i3 }, val({i1 , i2 , i3 })). If the value of the function is false, then the index i2 of s∗ is skipped and the function to be test({i1 , i3 }, val({i1 , i3 }))) is evaluated. We continue this process until we achieve the last index of s∗ . The complexity of this procedure is evaluated as the production of ||s∗ || by the complexity of the function to be test(). To obtain the initial content of STGOOD, we use the set SPLUS = {splus(m)|m ∈ M } and apply the procedure described above to each element of SPLUS. The idea of using subcontexts in inferring GMRTs, described in Sec.3, can be presented in a pseudo-code form, see Fig.1. It presents a modification of ASTRA. DIAGARA and a hybrid approach can be easily formalized by the same way. The example below describes two general hybrid methods. The initial part of GenAllGMRTs() is well discussed above. The abbreviation LEV stands for the List (set) of Essential Values. The function DelObj(M, G+ ) returns modified G and f lag. The variable f lag is necessary for switching at- tributive subtasks. The novelty of ASTRA-2 is mainly based on using LEV. There is the new function ChoiceOfSubtask(). It returns na := LEVj with the maximal 2splus(LEVj ) . MainContext, defined FormSubTask(na, M, G+ ), con- sists of object descriptions. There is the auxiliary function kt(m) = true if (m0 ∈ G− = f alse) and f alse otherwise. To illustrate this procedure, we use the sets D+ and D− represented in Tab.2 and 3 (our illustrative example). In these tables, M = {m1 , . . . , m26 }. The set SPLUS0 for positive class of examples is in Tab.4. The initial content of 186 Xenia Naidenova and Vladimir Parkhomenko 1.Algorithm GenAllGMRTs() 1.Algorithm DelVal() Input: G, M 2. i := 1; Output: STGOOD 3. f lag := 0; 2. begin 4. while i ≤ 2M do 3. Forming STGOOD ; 5. if Mi0 ⊆ G+ then 4. Forming and Ordering LEV ; 6. M := M \Mi ; 5. f lag:=1; 7. f lag := 1; 6. end 8. end 7. while true do 9. else if kt(Mi0 ∩ G+ ) then 8. while flag=1 do 10. j :=1 ; 9. M, f lag DelVal(M, G+ ); 11. while j ≤ 2STGOOD do 10. if flag=1 then 12. if STGOODj ⊆ 11. return; Mi0 ∩ G+ then 12. end 13. STGOOD := 13. G+ , f lag STGOOD\ DelObj(M, G+ ); STGOODj 14. end 14. end 15. if M 0 ⊆ G− or 15. end G+ ⊆ STGOOD then 16. STGOOD := 16. return STGOOD; STGOOD ∪ Mi0 ∩ G+ ; 17. end 17. M := M \Mi ; 18. MSUB :=∅; 18. f lag := 1; 19. GSUB :=∅; 19. return; 20. ChoiceOfSubtask; 20. end 21. MSUB , GSUB FormSubTask(na, M, G+ ); (b) DelVal 22. GenAllGMRTs(); 23. M :=M \Mna ; 24. G+ , f lag DelObj(M, G+ ); 25. end (a) GenAllGMRTs 1.Algorithm DelObj() 1.Algorithm FormSubTask() 2. i := 1; 2. i := 1; 0 3. f lag := 0; 3. GSUB := Mna ∩ G+ ; 4. while i ≤ 2G+ do 4. while i ≤ 2GSUB do 5. if G+ (i) ⊆ M \LEV then 5. MSUB := MSUB ∪ 6. G+ := G+ \G+ (i); (MainContext(GSUB (i)∩M )); 7. f lag := 1; 6. end 8. end 7. return; 9. end 10. return; (d) FormSubTask (c) DelObj Fig. 1. Algorithms of ASTRA-2 Subcontexts in Inferring Good Maximally Redundant Tests 187 STGOOD0 is {(2,10), (3, 10), (3, 8), (4, 12), (1, 4, 7), (1, 5,12), (2, 7, 8), (3, 7, 12), (1, 2, 12, 14), (2, 3, 4, 7), (4, 6, 8, 11)}. Table 2. 
The set D+ of positive object descriptions G D+ 1 m1 m2 m5 m6 m21 m23 m24 m26 2 m4 m7 m8 m9 m12 m14 m15 m22 m23 m24 m26 3 m3 m4 m7 m12 m13 m14 m15 m18 m19 m24 m26 4 m1 m4 m5 m6 m7 m12 m14 m15 m16 m20 m21 m24 m26 5 m2 m6 m23 m24 6 m7 m20 m21 m26 7 m3 m4 m5 m6 m12 m14 m15 m20 m22 m24 m26 8 m3 m6 m7 m8 m9 m13 m14 m15 m19 m20 m21 m22 9 m16 m18 m19 m20 m21 m22 m26 10 m2 m3 m4 m5 m6 m8 m9 m13 m18 m20 m21 m26 11 m1 m2 m3 m7 m19 m20 m21 m22 m26 12 m2 m3 m16 m20 m21 m23 m24 m26 13 m1 m4 m18 m19 m23 m26 14 m23 m24 m26 In these tables we denote subsets of values {m8 , m9 }, {m14 , m15 } by ma and mb , respectively. Applying operation generalization of(s) = s00 = obj(val(s)) to ∀s ∈ STGOOD, we obtain STGOOD1 = {(2,10), (3, 10), (3, 8), (4, 7, 12), (1, 4, 7), (1, 5,12), (2, 7, 8), (3, 7, 12), (1, 2, 12, 14), (2, 3, 4, 7), (4, 6, 8, 11)}. By Th.1, we can delete value m12 from consideration, see splus(m12 ) in Tab.4. The initial content of STGOOD allows to decrease the number of using the procedure to be test() and the number of putting extents of tests into STGOOD. The number of subtasks to be solved. This number is determined by the number of essential values in the set M . The quasi-minimal subset of essential values in M can be found by a procedure analogous to the proce- dure applicable to search for the initial content of STGOOD. We begin with the first value m1 of M , then we take the next value m2 of M and evalu- ate the function to be test(obj({m1 , m2 }), {m1 , m2 }). If the value of the func- tion is false, then we take the next value m3 of M and evaluate the function to be test(obj({m1 , m2 , m3 }), {m1 , m2 , m3 }). If the value of the function is true, then value m2 of M is skipped and the function to be test(obj({m1, m3}), {m1, m3}) is evaluated. We continue this process until we achieve the last value of M . The complexity of this procedure is evaluated as the production of ||M || by the complexity of the function to be test(). In Tab.2,3 we have the following LEV : {m16 , m18 , m19 , m20 , m21 , m22 , m23 , m24 , m26 }. 188 Xenia Naidenova and Vladimir Parkhomenko Table 3. The set D− of negative object descriptions G D− G D− 15 m3 m8 m16 m23 m24 32 m1 m2 m3 m7 m9 m13 m18 16 m7 m8 m9 m16 m18 33 m1 m5 m6 m8 m9 m19 m20 m22 17 m1 m21 m22 m24 m26 34 m2 m8 m9 m18 m20 m21 m22 m23 m26 18 m1 m7 m8 m9 m13 m16 35 m1 m2 m4 m5 m6 m7 m9 m13 m16 19 m2 m6 m7 m9 m21 m23 36 m1 m2 m6 m7 m8 m13 m16 m18 20 m19 m20 m21 m22 m24 37 m1 m2 m3 m4 m5 m6 m7 m12 m14 m15 m16 21 m1 m20 m21 m22 m23 m24 38 m1 m2 m3 m4 m5 m6 m9 m12 m13 m16 22 m1 m3 m6 m7 m9 m16 39 m1 m2 m3 m4 m5 m6 m14 m15 m19 m20 m23 m26 23 m2 m6 m8 m9 m14 m15 m16 40 m2 m3 m4 m5 m6 m7 m12 m13 m14 m15 m16 24 m1 m4 m5 m6 m7 m8 m16 41 m2 m3 m4 m5 m6 m7 m9 m12 m13 m14 m15 m19 25 m7 m13 m19 m20 m22 m26 42 m1 m2 m3 m4 m5 m6 m12 m16 m18 m19 m20 m21 m26 26 m1 m2 m3 m5 m6 m7 m16 43 m4 m5 m6 m7 m8 m9 m12 m13 m14 m15 m16 27 m1 m2 m3 m5 m6 m13 m18 44 m3 m4 m5 m6 m8 m9 m12 m13 m14 m15 m18 m19 28 m1 m3 m7 m13 m19 m21 45 m1 m2 m3 m4 m5 m6 m7 m8 m9 m12 m13 m14 m15 29 m1 m4 m5 m6 m7 m8 m13 m16 46 m1 m3 m4 m5 m6 m7 m12 m13 m14 m15 m16 m23 m24 30 m1 m2 m3 m6 m12 m14 m15 m16 47 m1 m2 m3 m4 m5 m6 m8 m9 m12 m14 m16 m18 m22 31 m1 m2 m5 m6 m14 m15 m16 m26 48 m2 m8 m9 m12 m14 m15 m16 Table 4. 
The set SPLUS0 splus(m), m ∈ M splus(m), m ∈ M splus(ma ) → {2, 8, 10} splus(m22 ) → {2, 7, 8, 9, 11} splus(m13 ) → {3, 8, 10} splus(m23 ) → {1, 2, 5, 12, 13, 14} splus(m16 ) → {4, 9, 12} splus(m3 ) → {3, 7, 8, 10, 11, 12} splus(m1 ) → {1, 4, 11, 13} splus(m4 ) → {2, 3, 4, 7, 10, 13} splus(m5 ) → {1, 4, 7, 10} splus(m6 ) → {1, 4, 5, 7, 8, 10} splus(m12 ) → {2, 3, 4, 7} splus(m7 ) → {2, 3, 4, 6, 8, 11} splus(m18 ) → {3, 9, 10, 13} splus(m24 ) → {1, 2, 3, 4, 5, 7, 12, 14} splus(m2 ) → {1, 5, 10, 11, 12} splus(m20 ) → {4, 6, 7, 8, 9, 10, 11, 12} splus(mb ) → {2, 3, 4, 7, 8} splus(m21 ) → {1, 4, 6, 8, 9, 10, 11, 12} splus(m19 ) → {3, 8, 9, 11, 13} splus(m26 ) → {1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13, 14} Subcontexts in Inferring Good Maximally Redundant Tests 189 Proposition 1. Each essential value is included at least in one positive object description. Proof. Assume that for an object description ti , i ∈ G+ , we have ti ∩ LEV = ∅. Then ti ⊆ M \LEV. But M \LEV is included at least in one of the negative object descriptions and, consequently, ti also possesses this property. But it contradicts to the fact that ti is a description of a positive object. t u Proposition 2. Assume that X ⊆ M . If X ∩ LEV = ∅, then to be test(X) = false. Proposition 2 is the consequence of Proposition 1. Note that the description of t14 = {m23 , m24 , m26 } is closed because of obj{m23 , m24 , m26 } = {1, 2, 12, 14} and val{1, 2, 12, 14} = {m23 , m24 , m26 }. We also know that s = {1, 2, 12, 14} is closed too (we obtained this result during generalization of elements of STGOOD. So (obj({m23 , m24 , m26 }), {m23 , m24 , m26 }) is a maximally redundant test for positive objects and we can, conse- quently, delete t14 from consideration. As a result of deleting m12 and t14 , we have the modified set SPLUS (Tab.5). Table 5. The set SPLUS1 splus(m), m ∈ M splus(m), m ∈ M splus(ma ) → {2, 8, 10} splus(m22 ) → {2, 7, 8, 9, 11} splus(m13 ) → {3, 8, 10} splus(m23 ) → {1, 2, 5, 12, 13} splus(m16 ) → {4, 9, 12} splus(m3 ) → {3, 7, 8, 10, 11, 12} splus(m1 ) → {1, 4, 11, 13} splus(m4 ) → {2, 3, 4, 7, 10, 13} splus(m5 ) → {1, 4, 7, 10} splus(m6 ) → {1, 4, 5, 7, 8, 10} splus(m7 ) → {2, 3, 4, 6, 8, 11} splus(m18 ) → {3, 9, 10, 13} splus(m24 ) → {1, 2, 3, 4, 5, 7, 12} splus(m2 ) → {1, 5, 10, 11, 12} splus(m20 ) → {4, 6, 7, 8, 9, 10, 11, 12} splus(mb ) → {2, 3, 4, 7, 8} splus(m21 ) → {1, 4, 6, 8, 9, 10, 11, 12} splus(m19 ) → {3, 8, 9, 11, 13} splus(m26 ) → {1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13} The main question is how we should approach the problem of selecting and ordering subtasks (subcontexts). Consider Tab.6 with auxiliary information. It is clear that if we shall have all the intents of GMRTs entering into descriptions of objects 1, 2, 3, 5, 7, 9, 10, 12, then the main task will be over because the remaining object descriptions (objects 4, 6, 8, 11) give, in their intersection, the intent of already an known test (see, please, the initial content of STGOOD). Thus we have to consider only the subcontexts of essential values associated with object descriptions 1, 2, 3, 5, 7, 9, 10, 12, 13. The number of such subcontexts is 39. But this estimation is not realistic. 190 Xenia Naidenova and Vladimir Parkhomenko Table 6. 
Auxiliary information P index of object m16 m18 m19 m20 m21 m22 m23 m24 m26 mij 1 × × × × 4 2 × × × × 4 3 × × × × 4 5 × × 2 7 × × × × 4 9 × × × × × × × 7 10 × × × × 4 12 × × × × × × 4 13 × × × × 4 4 × × × × × 6 × × × 8 × × × × × 11 × × × × × P di 2 4 3 4 4 3 5 6 8 39 We begin with ordering index of objects by the number of their entering in tests in STGOOD1 , see Tab.7. Table 7. Ordering index of objects in STGOOD1 Index of object 9 13 5 10 1 2 3 12 7 The number of entering in STGOOD1 0 0 1 2 3 4 4 4 5 Then we continue with object descriptions t9 and t13 . Now we should select the subcontexts (subtasks), based on proj(t × m), where t is object description containing the smallest number of essential values and m is an essential value in t, entering in the smallest number of object descriptions. After solving each sub- task, we have to correct the sets SPLUS, STGOOD, and auxiliary information. So, the first sub-task is t9 × m16 . Solving this sub-task, we have not any new test, but we can delete m16 from t9 and then we solve the sub-task t9 × m19 . As a result, we introduce s = {9, 11} in STGOOD and delete t9 from consideration because of m16 , m19 are the only essential values in this object description. In the example (method 1), we have the following subtasks (Tab. 8). Tab.10 shows the sets STGOOD and TGOOD. All subtasks did not require a recursion. A simpler method of ordering contexts is based on the basic recursive procedure for solving any kind of subtask described in the previous section. At Subcontexts in Inferring Good Maximally Redundant Tests 191 Table 8. The sequence of subtasks (method 1) N subcontext Extent of New Test Deleted values Deleted objects 1 t9 × m16 2 t9 × m19 (9, 11) t9 3 t13 × m18 4 t13 × m19 (13) m16 , m18 t13 5 t5 × m23 m23 6 t5 × m24 t5 7 t10 × m20 (8, 10) 8 t10 × m21 9 t10 × m26 ma , m13 , m4 , m5 t10 10 t1 × m21 11 t1 × m24 m1 , m2 t1 12 t2 × m22 (7, 8, 11) m22 13 t2 × m22 14 t2 × m24 t2 15 t3 × m19 (3, 11) m19 16 t3 × m24 m24 t12 , t7 17 t3 × m26 t3 each level of recursion, we can select the value entering into the greatest number of object descriptions; the object descriptions not containing this value generate the contexts to find GMRTs whose intents are included in them. For our example, value m26 does not cover two object descriptions: t5 and t8 . The initial context is associated with m26 . The sequence of subtasks in the basic recursive procedure is in Tab.9 (method 2). We assume, in this example, that the GMRT intent of which is equal to t14 has been already obtained. We consider only two possible ways of GMRTs construction based on de- composing the main classification context into subcontexts and ordering them by the use of essential values and objects. It is possible to use the two sets QT = {{i, j} ⊆ G+ | ({i, j}, val({i, j}) is a test for G+ } and QAT = {{i, j} ⊆ G+ |({i, j}, val({i, j}) is not a test for G+ } for forming subcontexts and their or- dering in the form of a tree structure. 5 Conclusion In this paper, the decomposition of inferring good classification tests into sub- tasks of the first and second kinds is presented. This decomposition allows, in principle, to transform the process of inferring good tests into a step by step reasoning process. The rules of forming and reducing subcontexts are given, in this paper. Vari- ous possibilities of constructing algorithms for inferring GMRTs with the use of both subcontexts are considered depending on the nature of GMRTs features. 192 Xenia Naidenova and Vladimir Parkhomenko Table 9. 
The sequence of subtasks (method 2) Object descriptions Context, Extents of tests Values deleted N deleted associated with obtained from context from context (2, 10), (3, 10),ma , m13 , mb , t10 1 m26 (2, 3, 4, 7), (1, 4, 7)m5 , m6 m3 , m20 , m23 , m1 , (3, 7, 12), 2 m26 , m24 m2 , m4 , m7 , m16 , (4, 7, 12) m18 , m19 , m22 Subtask is over; return to the previous context and delete m24 m3 , m7 , m16 , m18 , 3 m26 , not m24 , m23 (13) m19 , m20 , m22 Subtask is over; return to the previous context, delete m23 m2 , m3 , m4 , m16 , 4 m26 , not m24 , not m23 m18 m19 , m21 m26 , m22 , not m24 , 5 (9,11), (7,11) t2 , t7 not m23 Subtask is over; return to the previous context and delete m22 m26 , not m24 , m2 , m3 , m4 , m16 , t7 , t 9 , t 2 , t 3 6 (3,11), (4,6,11) not m23 , not m22 m18 , m19 Subtask is over; we have obtained all GMRTs whose intents contain m26 7 Context t5 (1,5,12) t5 Subtask is over; we have found all GMRTs whose intents are contained in t5 . m3 , m20 , mb , m6 , 8 Context t8 × m22 (7,8,11), (2,7,8) ma , m13 , m19 , m21 Subtask is over; return to the previous context and delete m22 Context t8 9 (8,10) ma t2 , t7 without m22 Context t8 × m21 10 (4,6,8,11) m7 , m13 , m19 t6 , t10 , t11 without m22 Subtask is over; return to the previous context and delete m21 , m20 Context t8 without 11 (3, 8) t4 , t6 , t10 , t11 m22 , m21 , m20 Subtask is over; we have found all GMRTs whose intents are contained in t8 . Subcontexts in Inferring Good Maximally Redundant Tests 193 Table 10. The sets STGOOD and TGOOD N STGOOD TGOOD N STGOOD TGOOD 1 13 m1 m4 m18 m19 m23 m26 9 2,7,8 mb m22 2 2,10 m4 ma m26 10 1,5,12 m2 m23 m24 3 3,10 m3 m4 m13 m18 m26 11 4,7,12 m20 m24 m26 4 8,10 m3 m6 ma m13 m20 m21 12 3,7,12 m3 m24 m26 5 9,11 m19 m20 m21 m22 m26 13 7,8,11 m3 m20 m22 6 3,11 m3 m7 m19 m26 14 2,3,4,7 m4 m12 mb m24 m26 7 3,8 m3 m7 m13 mb m19 15 4,6,8,11 m7 m20 m21 8 1,4,7 m5 m6 m24 m26 16 1,2,12,14 m23 m24 m26 References 1. Chegis, I., Yablonskii, S.: Logical methods of electric circuit control. Trudy Mian SSSR 51, 270–360 (1958), (in Russian) 2. Ferré, S., Ridoux, O.: The use of associative concepts in the incremental building of a logical context. In: Priss, U., Corbett, D., Angelova, G. (eds.) ICCS. Lecture Notes in Computer Science, vol. 2393, pp. 299–313. Springer (2002) 3. Ganter, B., Wille, R.: Formal concept analysis: mathematical foundations. Springer, Berlin (1999) 4. Ganter, B., Kuznetsov, S.O.: Formalizing hypotheses with concepts. In: Proceedings of the Linguistic on Conceptual Structures: Logical Linguistic, and Computational Issues. pp. 342–356. Springer-Verlag (2000) 5. Naidenova, X.A.: DIAGARA: An Incremental Algorithm for Inferring Implicative Rules from Examples. Inf. Theories and Application 12 - 2, 171 – 196 (2005) 6. Naidenova, X.A., Plaksin, M.V., Shagalov, V.L.: Inductive Inferring All Good Clas- sification Tests. In: Valkman, J. (ed.) ”Knowledge-Dialog-Solution”, Proceedings of International Conference. vol. 1, pp. 79 – 84. Kiev Institute of Applied Informatics, Kiev, Ukraine (1995) 7. Naidenova, X.A., Polegaeva, J.G.: An Algorithm of Finding the Best Diagnostic Tests. In: Mintz, G., Lorents, E. (eds.) The 4-th All Union Conference ”Application of Mathematical Logic Methods”. pp. 87 – 92 (1986), (in Russian) 8. Naidenova, X., Ermakov, A.: The decomposition of good diagnostic test inferring algorithms. In: Alty, J., Mikulich, L., Zakrevskij, A. (eds.) ”Computer-Aided Design of Discrete Devices” (CAD DD2001), Proceedings of the 4-th Inter. Conf., vol. 3, pp. 61 – 68. 
Minsk (2001) 9. Ore, O.: Galois connections. Trans. Amer. Math. Soc 55, 494–513 (1944) Removing an incidence from a formal context Martin Kauer? and Michal Krupka?? Department of Computer Science Palacký University in Olomouc 17. listopadu 12, CZ-77146 Olomouc Czech Republic martin.kauer@upol.cz michal.krupka@upol.cz Abstract. We analyze changes in the structure of a concept lattice cor- responding to a context resulting from a given context with a known concept lattice by removing exactly one incidence. We identify the set of concepts affected by the removal and show how they can be used for computing concepts in the new concept lattice. We present algorithms for incremental computation of the new concept lattice, with or without structural information. 1 Introduction When computing concept lattices of two very similar concepts (i.e., differing only in a small number of incidences), it doesn’t seem to be efficient to compute both concept lattices independently. Rather, an incremental method of computing one of the lattices using the other would be more desirable. Also, analyzing structural differences between concept lattices of two similar contexts would be interesting from the theoretical point of view. This paper presents first results in this direction. Namely, we consider two formal contexts differing in just one incidence and develop a method of comput- ing the concept lattice of the context without the incidence from the other one. In other words, we give a first answer to the question “What happens to the concept lattice, if we remove one cross from the context?”. Our results are the following. We consider contexts hX, Y, Ii and hX, Y, Ji such that J results from I by removing exactly one incidence. Further we consider the respective concept lattices B(I) and B(J). For these contexts and concept lattices we 1. identify concepts in B(I), affected by the removal (they form an interval in B(I)), ? The author acknowledges support by IGA of Palacky University, No. PrF 2014 034 ?? The author acknowledges support by the ESF project No. CZ.1.07/2.3.00/20.0059. The project is co-financed by the European Social Fund and the state budget of the Czech Republic. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 195–207, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 196 Martin Kauer and Michal Krupka 2. show how they transform to concepts in the new concept lattice (they will either vanish entirely, or transform to one or two concepts), 3. derive several further results on the correspondence between the two lattices, 4. propose two basic algorithms for transforming incrementally B(I) to B(J). Several algorithms for incremental computation of concept lattices have been developed in the past [1, 5, 8, 6, 7, 2] (see also [4] for a comparison of some of the algorithms). In general, the algorithms build a concept lattice incrementally by modifying formal contexts by adding or removing objects one by one. Our approach is different as we focus on removing just one incidence. 2 Formal concept analysis Formal Concept Analysis has been introduced in [9], our basic reference is [3]. A (formal) context is a triple C = hX, Y, Ii where X is a set of objects, Y a set of attributes and I ⊆ X × Y a binary relation between X and Y . For hx, yi ∈ I it is said “The object x has the attribute y”. For subsets A ⊆ X and B ⊆ Y we set A↑I = {y ∈ Y | for each x ∈ A it holds hx, yi ∈ I}, B ↓I = {x ∈ X | for each y ∈ B it holds hx, yi ∈ I}. 
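For illustration only (the paper itself works purely mathematically), the two derivation operators can be sketched for a crisp context represented as a set of object–attribute pairs; the names up, down and the toy context below are ours.

```python
def up(A, I, Y):
    """A^up: attributes y possessed by every object x in A."""
    return {y for y in Y if all((x, y) in I for x in A)}

def down(B, I, X):
    """B^down: objects x possessing every attribute y in B."""
    return {x for x in X if all((x, y) in I for y in B)}

# Formal concepts are exactly the pairs (A, B) with up(A) == B and down(B) == A.
X = {1, 2}
Y = {"a", "b"}
I = {(1, "a"), (1, "b"), (2, "a")}
A = down({"a"}, I, X)      # {1, 2}
print(up(A, I, Y))         # {'a'}
```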
The pair ⟨↑I, ↓I⟩ is a Galois connection between the sets X and Y, i.e., it satisfies, for each A, A1, A2 ⊆ X and B, B1, B2 ⊆ Y:

1. If A1 ⊆ A2, then A2↑I ⊆ A1↑I; if B1 ⊆ B2, then B2↓I ⊆ B1↓I.
2. A ⊆ A↑I↓I and B ⊆ B↓I↑I.

If A↑I = B and B↓I = A, then the pair ⟨A, B⟩ is called a formal concept of ⟨X, Y, I⟩. The set A is called the extent of ⟨A, B⟩, the set B the intent of ⟨A, B⟩. A partial order ≤ on the set B(X, Y, I) of all formal concepts of ⟨X, Y, I⟩ is defined by ⟨A1, B1⟩ ≤ ⟨A2, B2⟩ iff A1 ⊆ A2 (iff B2 ⊆ B1). B(X, Y, I) along with ≤ is a complete lattice and is called the concept lattice of ⟨X, Y, I⟩. Infima and suprema in B(X, Y, I) are given by

  ⋀_{j∈J} ⟨Aj, Bj⟩ = ⟨ ⋂_{j∈J} Aj, ( ⋃_{j∈J} Bj )↓I↑I ⟩,   (1)
  ⋁_{j∈J} ⟨Aj, Bj⟩ = ⟨ ( ⋃_{j∈J} Aj )↑I↓I, ⋂_{j∈J} Bj ⟩.   (2)

One of the immediate consequences of (1) and (2) is that the intersection of any system of extents (resp. intents) is again an extent (resp. intent). The mappings γI: x ↦ ⟨{x}↑I↓I, {x}↑I⟩ and µI: y ↦ ⟨{y}↓I, {y}↓I↑I⟩ assign to each object x its object concept and to each attribute y its attribute concept. We call a subset K ⊆ L, where L is a complete lattice, ⋁-dense (resp. ⋀-dense) if and only if any element of L can be expressed as a supremum (resp. infimum) of some elements from K. The set of all object concepts (resp. attribute concepts) is ⋁-dense (resp. ⋀-dense) in B(X, Y, I). This can easily be seen from (1) (resp. (2)). We will also need the notion of an interval in a lattice L. We call a subset K ⊆ L an interval if and only if there exist elements a, b ∈ L such that K = {k ∈ L | a ≤ k ≤ b}. We denote K by [a, b].

3 Problem statement and basic notions

Let ⟨X, Y, I⟩, ⟨X, Y, J⟩ be two contexts over the same sets of objects and attributes such that ⟨x0, y0⟩ ∉ J and I = J ∪ {⟨x0, y0⟩}. We usually denote concepts of ⟨X, Y, I⟩ by c, c1, ⟨A, B⟩, ⟨A1, B1⟩, etc., and concepts of ⟨X, Y, J⟩ by d, d1, ⟨C, D⟩, ⟨C1, D1⟩, etc. The respective concept lattices will be denoted B(I) and B(J).

Our goal is to find an efficient way to compute the concept lattice B(J) from B(I). We provide two solutions to this problem. The first solution computes just the elements of B(J); the second one also adds information on its structure. In this section we introduce some basic tools and prove simple preliminary results. The following proposition shows a correspondence between the derivation operators of the contexts ⟨X, Y, I⟩ and ⟨X, Y, J⟩.

Proposition 1. For each A ⊆ X and B ⊆ Y it holds

  A↑J = A↑I            if x0 ∉ A,        B↓J = B↓I            if y0 ∉ B,
  A↑J = A↑I \ {y0}     if x0 ∈ A,        B↓J = B↓I \ {x0}     if y0 ∈ B.

In particular, A↑J ⊆ A↑I and B↓J ⊆ B↓I.

Proof. Immediate.

Formal concepts from the intersection B(I) ∩ B(J) are called stable. These concepts are not influenced by removing the incidence ⟨x0, y0⟩ from I. When computing B(J) from B(I), stable concepts need not be recomputed.

Proposition 2. A concept c ∈ B(I) is not stable iff c ∈ [γI(x0), µI(y0)].

Proof. If c = ⟨A, B⟩ ∉ [γI(x0), µI(y0)], then either x0 ∉ A or y0 ∉ B. If, for instance, x0 ∉ A, then by Proposition 1, B = A↑I = A↑J, showing that B is the intent of some d ∈ B(J). Now, by Proposition 1,

  B↓J = B↓I = A                       if y0 ∉ B,
  B↓J = B↓I \ {x0} = A \ {x0} = A     if y0 ∈ B,

and so d = c. The case y0 ∉ B is dual. To prove the opposite direction it is sufficient to notice that c ∈ [γI(x0), µI(y0)] is equivalent to ⟨x0, y0⟩ ∈ A × B, excluding the case ⟨A, B⟩ ∈ B(J).
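A small sketch (ours, with hypothetical names) of Proposition 1 and of the stability test given by Proposition 2; it reuses up and the toy context from the previous sketch.

```python
def up_J(A, I, Y, x0, y0):
    """A^{up_J} obtained from A^{up_I} as in Proposition 1: drop y0 iff x0 is in A."""
    B = up(A, I, Y)
    return B if x0 not in A else B - {y0}

def is_stable(A, B, x0, y0):
    """By Proposition 2 (and its proof), (A, B) in B(I) is stable iff it does not
    lie in [gamma_I(x0), mu_I(y0)], i.e. iff x0 not in A or y0 not in B."""
    return not (x0 in A and y0 in B)

# With the context of the previous sketch and the removed incidence (x0, y0) = (1, "b"):
print(up_J({1}, I, Y, 1, "b"))            # {'a'}   (was {'a', 'b'} under I)
print(is_stable({1, 2}, {"a"}, 1, "b"))   # True    (y0 is not in the intent)
```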
198 Martin Kauer and Michal Krupka For concepts c = hA, Bi ∈ B(I), d = hC, Di ∈ B(J) we set c = hA , B i = hA↑J ↓J , A↑J i, c = hA , B i = hB ↓J , B ↓J ↑J i, d = hC , D i = hD↓I , D↓I ↑I i, d = hC , D i = hC ↑I ↓I , C ↑I i. Evidently, c , c ∈ B(J) and d , d ∈ B(I). c (resp. c ) is called the upper (resp. lower ) child of c. In our setting, d = d (it would not be the case if I \ J had more than one element). It is the (unique) concept from B(I), containing, as a rectangle, the rectangle represented by d. The following theorem shows basic properties of the pairs h , i and h , i. Proposition 3 (child operators). The mappings c 7→ c , c 7→ c , and d 7→ d are isotone and satisfy c ≤ c , d ≤ d , c = c , d = d , c ≥ c , d ≥ d , c = c , d = d . Proof. Isotony follows directly from definition. Let c = hA, Bi. From Proposition 1 we have A↑J ⊆ A↑I . Thus, A = A↑I ↓I ⊆ A ↑J ↓I , whence c ≤ c . Similarly, for d = hC, Di, D↓J ⊆ D↓I , whence D↓I ↑J ⊆ ↓J ↑J D = D. To prove c = c it suffices to show that for the extent A of c it holds A↑J ↓I ↑J = A↑J . By Proposition 1, we have two possibilities: either A↑J = A↑I , or A↑J = A↑I \ {y0 }. In the first case A↑J ↓I ↑J = A↑J holds trivially, in the second case A↑J ↓I = A↑J ↓J (by the same proposition, because y0 ∈ / A↑J ) and ↑J ↓I ↑J ↑J ↓J ↑J ↑J A =A = A . The equality d = d can be proved similarly. The assertions for lower children are dual. Corollary 1. The mappings c 7→ c and d 7→ d are closure operators and the mappings c 7→ c and d 7→ d are interior operators. Following two theorems utilize the operators , , , to give several equiv- alent characterizations of stable concepts. First we prove a proposition. Proposition 4. The following assertions are equivalent for any c = hA, Bi ∈ B(I). 1. c is stable, 2. A↑I = A↑J , 3. B ↓I = B ↓J . Proof. “2 ⇒ 3”: by Proposition 1, A ⊆ A↑J ↓J = B ↓J ⊆ B ↓I = A. “3 ⇒ 2”: dual. The other implications follow by definition, since c is stable iff both 2. and 3. are satisfied. Proposition 5 (stable concepts in B(I)). The following assertions are equiv- alent for a concept c ∈ B(I): Removing an Incidence from a Formal Context 199 1. c is stable, 2. c ∈ / [γI (x0 ), µI (y0 )], 3. c = c , 4. c = c , 5. c = c . Proof. Directly from Proposition 4. Proposition 6 (stable concepts in B(J)). The following assertions are equiv- alent for a concept d ∈ B(J): 1. d is stable, 2. d = d , 3. d is stable. Proof. Directly from Proposition 4. 4 Computing B(J ) without structural information Proposition 7. The following holds for c = hA, Bi ∈ B(I) and d = hC, Di ∈ B(J): If d = c , then B ∈ {D, D ∪ {y0 }} and if d = c , then A ∈ {C, C ∪ {x0 }}. Proof. By definition of , D = A↑J , which is by Proposition 1 either equal to B, or to B \ {y0 }. Similarly for . Proposition 8. A non-stable concept d ∈ B(J) is a (upper or lower) child of exactly one concept c ∈ B(I). This concept is non-stable and satisfies c = d = d . Proof. Let d = hC, Di. Since d is non-stable, then either C ↑I 6= C ↑J , or D↓I 6= D↓J . Suppose C ↑I 6= C ↑J and set A = C, B = C ↑I . By Proposition 1, x0 ∈ C, y0 ∈/ D and B = D ∪ {y0 }. By the same proposition, A = C = D↓J = D↓I , whence A is an extent of I. Thus, c = hA, Bi ∈ B(I) and it is non-stable because x0 ∈ A and y0 ∈ B (Proposition 2). Since D = C ↑J = A↑J , d = c . A = C yields c = d . We prove uniqueness of c. By Proposition 7, if for c0 = hA0 , B 0 i ∈ B(I) we have d = c0 , then either B 0 = D, or B 0 = D ∪ {y0 }. The first case is impossible, because it would make D an intent of I and, consequently, d a stable concept. 
The second case means c0 equals c above. There is a third case left: if d = c0 , then C = B 0↓J . Since x0 ∈ C, we have y0 ∈ / B 0 (Proposition 1). Thus, C = B 0↓I (Proposition 1 again). Consequently, C = B 0 and since y0 ∈ ↑I / B 0 , B 0 = C ↑J (Proposition 1 for the last time). Thus, d = c0 , which is a contradiction with non-stability of d. The case D↓I 6= D↓J is proved dually (in this case we obtain d = c ). The meaning of the previous theorem is that for each non-stable concept in B(J) there exists exactly one non-stable concept in B(I), such that these two are related via mappings , or , . The theorem leads the following simple way of constructing B(J) from B(I). For each c ∈ B(I) the following has to be done: 200 Martin Kauer and Michal Krupka 1. If c is stable, then it has to be added to B(J). 2. If c is not stable, then each its non-stable child (i.e., each non-stable element of {c , c }) has to be added to B(J). This method ensures that all proper elements will be added to B(J) (i.e., no element will be omitted) and each element will be added exactly once. Stable (resp. non-stable) concepts can be identified by means of Proposition 11. The following proposition shows a simple way of detecting whether a child of a non-stable concept from B(I) is stable. It also describes the role of fixpoints of operators and . Proposition 9. Let c ∈ B(I) be non-stable. Then – c is non-stable iff c is a fixpoint of , – c is non-stable iff c is a fixpoint of . Proof. If c is not stable, then c = (c ) by Theorem 8. On the other hand, if c is stable, then c = c by Theorem 6, which rules out c = c, because in that case c would be equal to c , which would make it stable by Theorem 5. The proof for c is dual. Example 1. In Fig. 1 we can see some examples of contexts with concepts of different types w.r.t. operators , . The method is utilized in Algorithm 1. Algorithm 1 Transforming B(I) into B(J) (without structural information). procedure TransformConcepts(B(I)) B(J) ← B(I); for all c = hA, Bi ∈ [γI (x0 ), µI (y0 )] do B(J) ← B(J) \ {c}; if c = c then B(J) ← B(J) ∪ {c }; end if if c = c then B(J) ← B(J) ∪ {c }; end if end for return B(J); end procedure Time complexity of Algorithm 1 is clearly O(|B(I)||X||Y |) in the worst case scenario. Indeed, the number of non-stable concepts is at most equal to |B(I)| and the computation of operators , can be done in O(|X| · |Y |) time. 5 Computing B(J ) with structural information To analyze changes in the structure of a concept lattice after removing an inci- dence, we need to investigate deeper properties of the closure operator and the interior operator and the sets of their fixpoints. Removing an Incidence from a Formal Context 201 y1 y2 y3 y0 y0 y1 y2 x0 × × × • x0 • × × x1 × × x1 x2 × × x2 x3 × × (a) The least concept is not sta- (b) Several non-trival non-stable ble and is a fixpoint of both op- concepts are fixpoints of both op- erators. erators. y1 y2 y0 y0 y1 y2 x0 × • x0 • × x1 × × x1 × × × x2 × x2 (c) Concept h{x0 , x1 }, {y0 , y2 }i (d) Concept h{x0 , x1 }, {y0 , y1 }i is a fixpoint of , but not . is a fixpoint of , but not . y1 y2 y3 y4 y0 y0 y1 y2 x0 × × • x0 • x1 × × × x1 × × x2 × x2 x3 × × × (e) Concept h{x0 , x1 }, {y0 }i is x4 × not a fixpoint of any operator. (f) Two concepts are not fix- points of any operator. Fig. 1: Examples of contexts with concepts of different types w.r.t. operators , . Proposition 10. Each stable concept is a fixpoint of both and . Proof. Follows directly from Theorem 5 and Theorem 6. 
Since is an interior operator and is a closure operator on B(I), we have for each c ∈ B(I), c ≤ c ≤ c . Thus, we can consider the interval [c , c ] ⊆ B(I). Proposition 11. For any c ∈ B(I), each concept from [c , c ]\{c} is stable. Proof. First we prove that either c equals c, or is its upper neighbor. Let c = hA, Bi. By definition, the intent of c is equal to A↑J ↓I ↑I . By Proposition 1, A↑J ∈ {B, B \ {y0 }}. Thus, A↑J ↓I ↑I ∈ {B, B \ {y0 }}. If it equals B, then c = c. Otherwise the intents of c and c differ in exactly one attribute, which makes c and c neighbors. Also notice that in this case c is stable because its intent does not contain y0 (Proposition 2). Now let c0 ≤ c be non-stable. If c = c , then c0 ≤ c. If c < c , then c is non-stable (Proposition 10) whereas c is stable. Non-stable concepts in B(I) 202 Martin Kauer and Michal Krupka form an interval (Theorem 5). Thus, c0 ∨ c is non-stable and should be less than c . Hence, c0 ∨ c = c (c is a lower neighbor of c ), concluding c0 ≤ c again. In a similar way we obtain the inequality c0 ≥ c for each non-stable c0 ≥ c . The following proposition shows an important property of the sets of fixpoints w.r.t. the ordering on B(I): The set of fixpoints of is a lower set whereas the set of fixpoints of is an upper set. Proposition 12. Let c ∈ B(I) be a non-stable concept. If c is a fixpoint of , then each c0 ≤ c is also a fixpoint of . If c is a fixpoint of , then each c0 ≥ c is also a fixpoint of . Proof. Let c = c and c0 ≤ c. If c0 is stable, then the assertion follows by Proposition 10. Suppose c0 is not stable. By extensivity and isotony of , c0 ≤ c0 ≤ c = c. Thus, c0 is not stable (Proposition 2) and c0 = c0 by Proposition 11. The case c = c is dual. The above results are used in Algorithm 2, which computes the lattice B(J) together with the information of its ordering. The algorithm is more complicated than the previous one. We provide a short description of the algorithm, together with some examples. Due to space limitations, we will not dwell into details. We will also leave out dual parts of similar cases. The algorithm processes all non-stable concepts of B(I) in a bottom-up di- rection, using an arbitrary linear ordering v such that if c1 ≤ c2 , then c1 v c2 . Each concept is either modified (by removing x0 from the extent or y0 from in- tent), or disposed of entirely. Sometimes, new concepts are created. All concepts also get updated their lists of upper and lower neighbors. Let c = hA, Bi be an arbitrary non-stable concept from B(I) (c ∈ [γI (x0 ), µI (y0 )]). – If c = c , c = c , then c will “split” into d1 ≤ d2 . - We set d1 = c and d2 = c . - The concept d1 will be a lower neighbor of d2 . - If for a lower neighbor cl of c it holds cl = cl , cl 6= cl , then it will be a lower neighbor of d2 . It is necessary to check whether d1 and cl will be neighbors. It certainly holds cl ≤ d1 , but there can be a concept k, such that cl ≤ k ≤ d1 . - Dually for upper neighbors. - If for a non-stable neighbor cn of c it holds cn = cn , cn = cn , i.e., the same conditions as for c (cn will split into dn1 , dn2 ), then d1 , dn1 and d2 , dn2 will be neighbors. - All other upper (resp. lower) neighbors will be neighbors of d2 (resp. d1 ). – If c = c and c 6= c , then c will lose y0 from its intent. - Denote the transformed c as d = hC, Di = c = hA, B \ {y0 }i. Removing an Incidence from a Formal Context 203 - If for an upper neighbor cu it holds cu = cu , cu 6= cu (cu will lose x0 from its extent), then cu and d will become incomparable. 
It is necessary to check whether c , cu and c, cu should be neighbors (again, there can be a concept between them). – If c 6= c and c = c , then c will lose x0 from its extent. - Denote transformed c as d = hC, Di = c = hA \ {x0 }, Bi. – If c 6= c and c 6= c , then c will vanish entirely. - It is necessary to check whether c and c should be neighbors (again, a concept can lie between them). - Denote by U the set of all upper neighbors of c, except for c . There is no fixed point of among the elements from U . - Denote by L the set of all lower neighbors of c, except for c . - Concepts from U and L will not be neighbors. Concepts will either become incomparable or one of them or both will vanish. There is also no need for additional checks regarding neighbor- hood relationship between concepts from U and c (resp. L and c ) or their neighbors. - It holds ∀cl ∈ L : cl ≤ c ≤ c , but it is necessary to check if there is a concept between them. - Similarly, it holds ∀cu ∈ U : c ≤ c ≤ cu , but again it is necessary to check if there is a concept between them. The number of iterations in TransformConceptLattice is at most |B(I)|, which occurs when each concept in B(I) is non-stable. In each of the iterations, tests c = c and c = c are performed and one of the procedures Split- Concept, RelinkReducedIntent, UnlinkVanishedConcept is called. It can be easily seen that the tests can be performed quite efficiently and do not add to the time complexity. The most time consuming among the above three procedures is SplitCon- cept. It iterates through all upper (which can be bounded by |X|) and lower (which can be bounded by |Y |) neighbors of the concept c. For each of the neighbors it might be necessary to check if the interval between the neighbor and certain other concept is empty (and we should make a new edge). This can be done by checking intents/extents of its neighbors. The above considerations lead to the result that time complexity of Algorithm 2 is in the worst case O(|B| · |X|2 · |Y |). Example 2. In Fig. 2, we can see some examples of transformations of non-stable concepts from B(I) into concepts of B(J). In Algorithm 2 we will assume that following functions are already defined: – U pperN eighbors(c) - returns upper neighbors of c; – LowerN eighbors(c) - returns lower neighbors of c; – Link(c1 , c2 ) - introduces neighborhood relationship between c1 and c2 ; – U nlink(c1 , c2 ) - cancels neighborhood relationship between c1 and c2 . 204 Martin Kauer and Michal Krupka Algorithm 2 Transforming B(I) with structural information into B(J). 
procedure LinkIfNeeded(c1 , c2 ) if @k ∈ B(I) : c1 < k < c2 then Link(c1 , c2 ); end if end procedure procedure SplitConcept(c ∈ [γI (x0 ), µI (y0 )]) d1 = c ; d2 = c ; Link(d1 , d2 ); for all u ∈ U pperN eighbors(c) do U nlink(c, u); Link(d2 , u); end for for all l ∈ LowerN eighbors(c) do U nlink(l, c); Link(l, d1 ); end for for all u ∈ U pperN eighbors(c) do if u 6= u then U nlink(d2 , u); Link(d1 , u); LinkIf N eeded(d2 , u ); end if end for for all l = hC, Di ∈ LowerN eighbors(c) do if y0 ∈/ D then U nlink(l, d1 ); Link(l, d2 ); LinkIf N eeded(l , d1 ); end if end for return d1 , d2 ; end procedure procedure RelinkReducedIntent(c ∈ [γI (x0 ), µI (y0 )]) for all u = hC, Di ∈ U pperN eighbors(c) do if u 6= u then U nlink(c, u); LinkIf N eeded(c , u); LinkIf N eeded(c, u ); end if end for end procedure procedure UnlinkVanishedConcept(c ∈ [γI (x0 ), µI (y0 )]) for all u ∈ U pperN eighbors(c) do U nlink(c, u); LinkIf N eeded(c , u); end for for all l ∈ LowerN eighbors(c) do U nlink(l, c); end for end procedure procedure TransformConceptLattice(B(I)) for all c = hA, Bi ∈ [γI (x0 ), µI (y0 )] from least to largest w.r.t. v do if c = c and c = c then . Concept will split. B(I) ← B(I) \ {c}; B(I) ← B(I) ∪ SplitConcept(c); else if c 6= c and c = c then . Extent will be smaller. A ← A \ {x0 }; else if c = c and c 6= c then . Intent will be smaller. RelinkReducedIntent(c); B ← B \ {y0 }; else if c 6= c and c 6= c then . Concept will vanish. B(I) ← B(I) \ {c}; U nlinkV anishedConcept(c); end if end for end procedure Removing an Incidence from a Formal Context 205 cu cu cu cu = cu cu cu = cu cu c c = c = c cu cl c cl cl = cl cl = cl cl cl cl cl (a) Concepts become incomparable. (b) Concept in middle “splits into two”. c cu = cu c cu c cu = cu c cu c c c cl = cl c cl c cl = cl c cl (c) Concept in the middle vanishes. (d) Concept in the middle vanishes. There is already another concept be- tween its children. Fig. 2: Examples of transformations of non-stable concepts from B(I) into con- cepts of B(J). 6 Conclusion We analyzed changes of the structure of a concept lattice, caused by removal of exactly one incidence from the associated formal context. We proved some theoretical results and presented two algorithms with time complexities O(|B| · |X| · |Y |) (Algorithm 1; without structure information) and O(|B| · |X|2 · |Y |) (Algorithm 2; with structure information). There exist several algorithms for incremental computation of concept lattice [1, 5, 8, 6, 7, 2], based on addition and/or removal of objects. Our approach is new in that we recompute a concept lattice based on a removal of just one incidence. Note that the algorithm proposed by Nourine and Raynaud in [7] has time complexity O((|Y | + |X|) · |X| · |B|), which is better than complexity of our Algorithm 2. However, experiments presented in [5] indicate that this algorithm sometimes performs slower than some algorithms with time complexity O(|B| · |X|2 ·|Y |). In the case of our Algorithm 2, some preliminary experiments indicate that the size of the interval of non-stable concepts is usually relatively small, which substantially reduces the overall processing time of the algorithm. A natural next step would be investigate adding incidences to a formal con- text, instead of removing. 
This problem, however, seems to be more difficult than the first one, namely because the set of non-stable concepts in the lattice B(J) has more complicated structure (it is not an interval) and also because not 206 Martin Kauer and Michal Krupka all non-stable concepts in B(I) can be computed via the operator . We will try to address this issues in the future. We will also focus on the following: – experimenting with proposed algorithms on various datasets and comparing them with other known algorithms, – generalizing the results to allow removing and adding more incidences at the same time. References 1. Carpineto, C., Romano, G.: Concept Data Analysis: Theory and Applications. John Wiley & Sons (2004) 2. Dowling, C.E.: On the irredundant generation of knowledge spaces. J. Math. Psy- chol. 37(1), 49–62 (1993) 3. Ganter, B., Wille, R.: Formal Concept Analysis – Mathematical Foundations. Springer (1999) 4. Kuznetsov, S.O., Obiedkov, S.: Comparing performance of algorithms for generating concept lattices. Journal of Experimental and Theoretical Artificial Intelligence 14, 189–216 (2002) 5. Merwe, D., Obiedkov, S., Kourie, D.: Addintent: A new incremental algorithm for constructing concept lattices. In: Eklund, P. (ed.) Concept Lattices, Lecture Notes in Computer Science, vol. 2961, pp. 372–385. Springer Berlin Heidelberg (2004) 6. Norris, E.M.: An algorithm for computing the maximal rectangles in a binary rela- tion. Revue Roumaine de Mathématiques Pures et Appliquées 23(2), 243–250 (1978) 7. Nourine, L., Raynaud, O.: A fast algorithm for building lattices. Inf. Process. Lett. 71(5-6), 199–204 (1999) 8. Outrata, J.: A lattice-free concept lattice update algorithm based on *CbO. In: Ojeda-Aciego, M., Outrata, J. (eds.) CLA. CEUR Workshop Proceedings, vol. 1062, pp. 261–274. CEUR-WS.org (2013) 9. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (ed.) Ordered Sets, pp. 445–470. Boston (1982) Formal L-concepts with Rough Intents‹ Eduard Bartl and Jan Konecny Data Analysis and Modeling Lab Dept. Computer Science, Palacky University, Olomouc 17. listopadu 12, CZ-77146 Olomouc Czech Republic Abstract. We provide a new approach to synthesis of Formal Concept Analysis and Rough Set Theory. In this approach, the formal concept is considered to be a collection of objects accompanied with two collections of attributes—those which are shared by all the objects and those which are possessed by at least one of the objects. We define concept-forming operators for these concepts and describe their properties. Furthermore, we deal with reduction of the data by rough approximation by given equivalence. The results are elaborated in a fuzzy setting. 1 Introduction Formal concept analysis (FCA) [12] is a method of relational data analysis iden- tifying interesting clusters (formal concepts) in a collection of objects and their attributes (formal context), and organizing them into a structure called concept lattice. Numerous generalizations of FCA, which allow to work with graded data, were provided; see [19] and references therein. In a graded (fuzzy) setting, two main kinds of concept forming-operators— antitone and isotone one—were studied [2, 13, 20, 21], compared [7, 8] and even covered under a unifying framework [4, 18]. We describe concept-forming oper- ators combining both isotone and antitone operators in such a way that each formal (fuzzy) concept is given by two sets of attributes. 
The first one is a lower intent approximation, containing attributes shared by all objects of the concept; the second one is an upper intent approximation, containing those at- tributes which are possessed by at least one object of the concept. Thus, one can consider the two intents to be a lower and upper approximation of attributes possessed by an object. Several authors dealing with synthesis of FCA and Rough Set Theory have noticed that intents formed by isotone and antitone operators (in both, crisp and fuzzy setting) correspond to upper and lower approximations, respectively (see e.g. [15, 16, 24]). To the best of our knowledge, no one has studied concept- forming operators which would provide both approximations being present in one concept lattice. In this papers we present such concept-forming operators, structure of their concepts, and reduction of the data by means of rough approximations by equiv- alences. Due to page limitation we omit proofs of some theorems. ‹ Supported by grant no. P202/14-11585S of the Czech Science Foundation. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 207–219, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 208 Eduard Bartl and Jan Konecny 2 Preliminaries In this section we summarize the basic notions used in the paper. Residuated Lattices and Fuzzy Sets We use complete residuated lattices as basic structures of truth-degrees. A complete residuated lattice [1, 14, 23] is a struc- ture L “ xL, ^, _, b, Ñ, 0, 1y such that xL, ^, _, 0, 1y is a complete lattice, i.e. a partially ordered set in which arbitrary infima and suprema exist; xL, b, 1y is a commutative monoid, i.e. b is a binary operation which is commutative, asso- ciative, and a b 1 “ a for each a P L; b and Ñ satisfy adjointness, i.e. a b b ď c iff a ď b Ñ c. 0 and 1 denote the least and greatest elements. The partial order of L is denoted by ď. Throughout this work, L denotes an arbitrary complete residuated lattice. Elements a of L are called truth degrees. Operations b (multiplication) and Ñ (residuum) play the role of (truth functions of) “fuzzy conjunction” and “fuzzy implication”. Furthermore, we define the complement of a P L as a “ a Ñ 0. An L-set (or fuzzy set) A in a universe set X is a mapping assigning to each x P X some truth degree Apxq P L. The set of all L-sets in a universe X is denoted LX , or LX if the structure of L is to be emphasized. The operations with L-sets are defined componentwise. For instance, the intersection of L-sets A, B P LX is an L-set A X B in X such that pA X Bqpxq “ Apxq ^ Bpxq for each x P X. An L-set A P LX is also denoted tApxq{x | x P Xu. If for all y P X distinct from x1 , . . . , xn we have Apyq “ 0, we also write tApx1 q{x1 , . . . , Apxn q{xn u. An L-set A P LX is called normal if there is x P X such that Apxq “ 1, and it is called crisp if Apxq P t0, 1u for each x P X. Crisp L-sets can be identified with ordinary sets. For a crisp A, we also write x P A for Apxq “ 1 and x R A for Apxq “ 0. Binary L-relations (binary fuzzy relations) between X and Y can be thought of as L-sets in the universe X ˆ Y . That is, a binary L-relation I P LXˆY between a set X and a set Y is a mapping assigning to each x P X and each y P Y a truth degree Ipx, yq P L (a degree to which x and y are related by I). For L-relation I P LXˆY we define its transpose I T P LY ˆX as I T py, xq “ Ipx, yq for all x P X, y P Y . 
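For concreteness, the Łukasiewicz operations underlying the three-element chain L = {0, 0.5, 1} used in the paper's running examples can be sketched as follows; this is our illustration, and the function names are not from the paper.

```python
def tnorm(a: float, b: float) -> float:
    """Lukasiewicz multiplication: a (x) b = max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def residuum(a: float, b: float) -> float:
    """Lukasiewicz residuum: a -> b = min(1, 1 - a + b), adjoint to tnorm."""
    return min(1.0, 1.0 - a + b)

# Adjointness a (x) b <= c  iff  a <= b -> c, checked on the three-element chain:
chain = [0.0, 0.5, 1.0]
assert all((tnorm(a, b) <= c) == (a <= residuum(b, c))
           for a in chain for b in chain for c in chain)
```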
The composition operators are defined by ł pI ˝ Jqpx, zq “ Ipx, yq b Jpy, zq, yPY ľ pI Ž Jqpx, zq “ Ipx, yq Ñ Jpy, zq, yPY ľ pI Ż Jqpx, zq “ Jpy, zq Ñ Ipx, yq yPY for every I P LXˆY and J P LY ˆZ . Formal L-concepts with Rough Intents 209 A binary L-relation E is called an L-equivalence if it satisfies IdX Ď E (reflexivity), E “ E T (symmetry), E ˝ E Ď E (transitivity). An L-set B P LY is compatible w.r.t. L-equivalence E P LY ˆY if Bpy1 q b Epy1 , y2 q ď Bpy2 q. for any y1 , y2 P Y . Formal Concept Analysis in the Fuzzy Setting An L-context is a triplet xX, Y, Iy where X and Y are (ordinary) sets and I P LXˆY is an L-relation between X and Y . Elements of X are called objects, elements of Y are called attributes, I is called an incidence relation. Ipx, yq “ a is read: “The object x has the attribute y to degree a.” An L-context may be described as a table with the objects corresponding to the rows of the table, the attributes corresponding to the columns of the table and Ipx, yq written in cells of the table (for an example see Fig. 1). α β γ δ A 0.5 0 1 0 B 1 0.5 1 0.5 C 0 0.5 0.5 0.5 D 0.5 0.5 1 0.5 Fig. 1. Example of L-context with objects A,B,C,D and attributes α, β, γ, δ. Consider the following pairs of operators induced by an L-context xX, Y, Iy. First, the pair xÒ, Óy of operators Ò : LX Ñ LY and Ó : LY Ñ LX is defined by ľ ľ AÒ pyq “ Apxq Ñ Ipx, yq, B Ó pxq “ Bpyq Ñ Ipx, yq. (1) xPX yPY Second, the pair xX, Yy of operators X : LX Ñ LY and Y : LY Ñ LX is defined by ł ľ AX pyq “ Apxq b Ipx, yq, B Y pxq “ Ipx, yq Ñ Bpyq. (2) xPX yPY To emphasize that the operators are induced by I, we also denote the op- erators by xÒI , ÓI y and xXI , YI y. Fixpoints of these operators are called formal concepts. The set of all formal concepts (along with set inclusion) forms a com- plete lattice, called L-concept lattice. We denote the sets of all concepts (as well as the corresponding L-concept lattice) by B ÒÓ pX, Y, Iq and B XY pX, Y, Iq, i.e. B ÒÓ pX, Y, Iq “ txA, By P LX ˆ LY | AÒ “ B, B Ó “ Au, (3) B XY pX, Y, Iq “ txA, By P LX ˆ LY | AX “ B, B Y “ Au. 210 Eduard Bartl and Jan Konecny For an L-concept lattice BpX, Y, Iq, where B is either B ÒÓ or B XY , denote the corresponding sets of extents and intents by ExtpX, Y, Iq and IntpX, Y, Iq. That is, ExtpX, Y, Iq “ tA P LX | xA, By P BpX, Y, Iq for some Bu, (4) IntpX, Y, Iq “ tB P LY | xA, By P BpX, Y, Iq for some Au. When displaying L-concept lattices, we use labeled Hasse diagrams to include all the information on extents and intents. In B ÒÓ pX, Y, Iq, for any x P X, y P Y and formal L-concept xA, By we have Apxq ě a and Bpyq ě b if and only if there is a formal concept xA1 , B1 y ď xA, By, labeled by a{x and a formal concept xA2 , B2 y ě xA, By, labeled by b{y. We use labels x resp. y instead of 1{x resp. 1 {y and omit redundant labels (i.e., if a concept has both the labels a{x and b{x then we keep only that with the greater degree; dually for attributes). The whole structure of B ÒÓ pX, Y, Iq can be determined from the labeled diagram using the results from [2] (see also [1]). In B XY pX, Y, Iq, for any x P X, y P Y and formal L-concept xA, By we have Apxq ě a and Bpyq ď b if and only if there is a formal concept xA1 , B1 y ď xA, By, labeled by a{x and a formal concept xA2 , B2 y ě xA, By, labeled by b{y (see examples depicted in Fig. 2). B, 0.5{β, 0.5{δ 0.5 ‚ {γ ‚ D, 0.5{α ‚ A, 0.5{α, γ ‚ ‚ C, 0.5{β, 0.5{δ A, 0 {β, 0.5{δ ‚ ‚ C, 0.5{β, 0.5{γ D ‚ 0.5 {B ‚ ‚ C, 0 {α 0.5 {C, β, δ ‚ ‚ 0.5{A, B, α 0.5 {A, 0.5{D ‚ ‚ 0.5 {B, 0.5{D ‚ 0 {γ Fig. 2. 
Concept lattice BÒÓ pX, Y, Iq (left) and BXY pX, Y, Iq (right) of the L-context in Fig. 1. Formal L-concepts with Rough Intents 211 3 L-rough concepts We consider concept-forming operators induced by L-context xX, Y, Iy defined as follows: Definition 1. Let xX, Y, Iy be an L-context. Define L-rough concept-forming operators as O Y AM “ xAÒ , AX y and xB, By “ B Ó X B for A P LX , B, B P LY . L-rough concept is then a fixed point of xM, Oy, i.e. a O pair xA, xB, Byy P LX ˆ pL ˆ LqY such that AM “ xB, By and xB, By “ A.1 AÒ and AX are called lower intent approximation and upper intent approximation, respectively. That means, M gives intents w.r.t. both xÒ, Óy and xX, Yy; O then gives inter- section of extents related to the corresponding intents. We denote the set of all fixed-points of xM, Oy, in correspondence with (3), as B MO pX, Y, Iq and call it L-rough concept lattice. Below, we present an analogy of the Main theorem on concept lattices for L-rough setting. Theorem 1 (Main theorem on L-rough concept lattices). (a) L-rough concept lattice B MO pX, Y, Iq is a complete lattice with suprema and infima defined as follows ľ č ď č OM xAi , B i , B i y “ x Ai , x B i , B i y y, i i i i MO ł ď č ď xAi , B i , B i y “ xp Ai q , Bi, B i y. i i i i (b) Moreover, a complete lattice V “ xV, ďy is isomorphic to B MO pX, Y, Iq iff there are mappings γ :X ˆLÑV and µ:Y ˆLˆLÑV such that γpX ˆLq is supremally dense in V, µpY ˆLˆLq is infimally dense in V, and a b b ď Ipx, yq and Ipx, yq ď a Ñ b is equivalent to γpx, aq ď µpy, b, bq for all x P X, y P Y, a, b, b P L. When drawing a concept lattice we label nodes as in B ÒÓ for lower intent approximations and B XY for upper intent approximations. We write a{y or a{y instead of just a{y to distinguish them. Fig. 3 (middle) shows an L-rough concept lattice for the L-context from Fig. 1. The following theorem explains that normal extents have natural intent ap- proximations; that is B Ď B. 1 In what follows, we naturally identify xA, xB, Byy with xA, B, By. 212 Eduard Bartl and Jan Konecny Theorem 2. For normal A P LX , we have AÒ Ď AX , for crisp singleton A P LX , we have AÒ “ AX . Proof. Since A is normal, there is x1 P X such that Apx1 q “ 1. Then we have ľ AÒ pyq “ Apxq Ñ Ipx, yq ď Apx1 q Ñ Ipx1 , yq “ Ipx1 , yq “ xPX ł (5) “ Apx1 q b Ipx1 , yq ď Apxq b Ipx, yq “ AX pyq xPX for each y P Y . For A being a crisp singleton, one can show AÒ “ AX by changing all inequal- ities in (5) to equalities. \ [ Since xM, Oy is defined via xÒ, Óy and xX, Yy, one can expect that there is a strong relationship between the associated concept lattices. In the rest of this section, we summarize them. Theorem 3. For S Ď LX , let rSs denote an L-closure span of S, i.e. the small- est L-closure system containing S. We have rExtÒÓ pX, Y, Iq Y ExtXY pX, Y, Iqs “ ExtMO pX, Y, Iq. O Proof. “Ď”: Let A P ExtÒÓ pX, Y, Iq. Then A “ AXX “ xAÒ , Y y P ExtMO pX, Y, Iq. Similarly for A P ExtXY pX, Y, Iq. “Ě”: Let A P ExtMO pX, Y, Iq and let xB1 , B2 y “ AM . Then we have A “ B Ó X B Y P rExtÒÓ pX, Y, Iq Y ExtXY pX, Y, Iqs since B Ó P ExtÒÓ pX, Y, Iq and B Y P ExtXY pX, Y, Iq. From Theorem 3 one can observe that no extent from ExtÒÓ pX, Y, Iq and ExtXY pX, Y, Iq is lost. Corollary 1. ExtÒÓ pX, Y, Iq Ď ExtMO pX, Y, Iq and ExtXY pX, Y, Iq Ď ExtMO pX, Y, Iq. In addition, no concept is lost. Corollary 2. For each xA, By P B ÒÓ pX, Y, Iq there is xA, B, AX y P B MO pX, Y, Iq. For each xA, By P B XY pX, Y, Iq there is xA, AÒ , By P B MO pX, Y, Iq. Remark 1. One can observe from Fig. 
3 that in ExtMO pX, Y, Iq there exist ex- tents which are present neither in ExtÒÓ pX, Y, Iq nor in ExtXY pX, Y, Iq. On the other hand, lower intent approximations are exactly those from IntÒÓ pX, Y, Iq and upper intent approximations are exactly those from IntXY pX, Y, Iq. With results on mutual reducibility from [8] we can state the following the- orem on representation of B MO by B ÒÓ . 0.5 {γ, 0.5{β, 0.5{δ ‚ 0.5 {α, 1{γ 0.5 0.5 0.5 {β, {δ ‚ ‚ ‚ 0.5{α {γ ‚ B, 0.5{β, 0.5{δ ‚ ‚ ‚ ‚ A, 0.5{α, γ ‚ ‚ C, 0.5{β, 0.5{δ 0.5 {γ D, 0.5{α ‚ ‚ ‚ ‚ A, 0{β, 0.5{δ D D ‚ A, 0{β, 0.5{δ ‚ ‚ 0.5{γ 1 B, {α ‚ ‚ ‚ C, 0{α 0.5 {C, β, δ ‚ ‚ 0.5{A, B, α 1 0.5 {β, 1{δ {B ‚ ‚ C, 0{α ‚ ‚ ‚ ‚ 0.5 0.5 0.5 {A {A, 0.5{D ‚ {B, 0.5{D 0.5 {B ‚ ‚ ‚ ‚ 0.5 0.5 {D ‚ ‚ 0.5{C, 0{γ {C, 0{γ ‚ Formal L-concepts with Rough Intents Fig. 3. BMO pX, Y, Iq (middle) and positions of original concepts in BÒÓ pX, Y, Iq (left) and BXY pX, Y, Iq (right) with L being a three-element Lukasiewicz chain 213 214 Eduard Bartl and Jan Konecny Theorem 4. For a L-context xX, Y, Iy, consider the L-context xX, Y ˆ L, Jy where J is defined by # Ipx, yq if a “ 1, Jpx, xy, ayq “ Ipx, yq Ñ a otherwise. Then we have that B ÒÓ pX, Y ˆ L, Jq is isomorphic to B MO pX, Y, Iq as a lattice. In addition, ExtÒÓ pX, Y ˆ L, Jq “ ExtMO pX, Y, Iq. Proof (sketch). In [8] we show that for L-contexts xX, Y, Iy and xX, Y ˆ Lzt1u, Jy such that Jpx, xy, ayq “ Ipx, yq Ñ a it holds that ExtXY pX, Y, Iq “ ExtÒÓ pX, Y ˆ Lzt1u, Jq. Using this fact, one can check that mapping i defined as 1 ipxA, B, Byq ÞÑ xA, B 1 Y B y, 1 where B 1 P LY ˆt1u , B P LY ˆLzt1u with B 1 pxy, 1yq “ Bpyq, 1 B pxy, ayq “ Bpyq Ñ a, is the desired isomorphism from B MO pX, Y, Iq to B ÒÓ pX, Y ˆ L, Jq. Theorem 4 shows how we can obtain a concept lattice formed by xÒ, Óy which is isomorphic to L-rough concept lattice of given L-context. 4 Rough approximation of an L-context and L-concept lattice In [17] Pawlak introduced Rough Set Theory where uncertain elements are ap- proximated with respect to an equivalence relation representing indiscernibility. Formally, given Pawlak approximation space xU, Ey, where U is a non-empty set of objects (universe) and E is an equivalence relation on U , the rough approx- imation of a crisp set A Ď U by E is the pair xAóE , AòE y of sets in U defined by x P A óE iff for all y P U, xx, yy P E implies y P A, òE xPA iff there exists y P U such that xx, yy P E and y P A. AóE and AòE are called lower and upper approximation of the set A by E, respectively. Formal L-concepts with Rough Intents 215 In the fuzzy setting, one can generalize xAóE , AòE y as in [10, 11, 22], ľ AóE pxq “ pEpx, yq Ñ Apyqq, yPU ł òE A pxq “ pApyq b Epx, yqq yPU for L-equivalence E P LU ˆU and L-set A P LU . Considering L-context xU, U, Ey, we can easily see that óE is equivalent to YE ; and òE is equivalent to XET . Since E is symmetric, we can also write xóE , òE y “ xYE , XE y. (6) Note that for L-set A, AóE is its largest subset compatible with E and AòE is its smallest superset compatible with E. Below, we deal with situation where lower and upper intent approximations are further approximated using Pawlak’s approach. In other words, instead of lower intent approximation AÒ we consider the largest subset of AÒ compatible with a given indiscernibility relation E, and similarly, instead of upper intent approximation AX we consider its smallest superset compatible with E. In The- orem 5 we show how to express this setting using L-rough concept forming operators. Definition 2. 
Let xX, Y, Iy be an L-context, E be an L-equivalence on Y . Define L-rough concept-forming operators as follows: AME “ xAÒóE , AXòE y, OE óE Y xB, By “ B òE Ó X B . Directly from (6) and results in [5] we have: Theorem 5. Let xX, Y, Iy be an L-context, E be an L-equivalence on Y . We have OE YI˝E AME “ xAÒIŻE , AXI˝E y and xB, By “ B ÓIŻE X B . Again, for normal extents we obtain natural upper and lower intent approx- imations. Theorem 6. For normal A P LX we have AÒIŻE Ď AXI˝E . In correspondence with (3) and (4), we denote set of the set of fixpoints of xME , OE y in L-context xX, Y, Iy by B MEOE pX, Y, Iq and set of its extents and intents by ExtMEOE pX, Y, Iq and IntMEOE pX, Y, Iq, respectively. The following theorem shows that a use of a rougher L-equivalence relation leads to a reduction of size of the L-rough concept lattices. Furthermore, this reduction is natural, i.e. it preserves extents. 216 Eduard Bartl and Jan Konecny Theorem 7. Let xX, Y, Iy be an L-context, and E1 , E2 be L-equivalences on Y , such that E1 Ď E2 . Then ExtME2OE2 pX, Y, Iq Ď ExtME1OE1 pX, Y, Iq. Example 1. Fig. 4 shows L-rough concept lattice of the L-context in Fig. 1 and rough L-concept lattice approximated using the following L-equivalence relation on Y . α β γ δ α 1 0.5 0 0 β 0.5 1 0 0 γ 0 0 1 0.5 δ 0 0 0.5 1 To demonstrate Theorem 7, the concepts with the same extents in the two lattices are connected. 5 Conclusions and further research We proposed a novel approach to synthesis of RTS na FCA. It provides a lot of directions to be further explored. Our future research includes: Study of attribute implications using whose semantics is related to the present setting. That will combine results on fuzzy attribute implications [9] and at- tribute containment formulas [6]. Generalization of the current setting. Note that the operators Ò and X which compute the universal and the existential intent, need not be induced by the same relation to keep most of the described properties. Actually, this feature is used in Section 4. In our future research, we want to elaborate more on this. For instance, it can provide interesting solution of problem of missing values in a formal fuzzy context—the idea is to use Ò induced by the context with missing values substituted by 0, and X induced by the context with missing values substituted by 1. Reduction of L-rough concept lattice via linguistic hedges As two intents are considered in each L-rough concept, the size of concept lattice can grow very large. The RST approach to reduction of data, i.e. use of rougher L-relation, directly leads to reduction of L-rough concept lattice as we showed in Theorem 7. FFCA provides other ways to reduce the size, one of them is parametrization of concept-forming operators using hedges. Formal L-concepts with Rough Intents 217 0.5 {γ, 0.5{β, 0.5{δ 0.5 {γ, 0.5{, 0.5{δ ‚ ‚ 0.5 {α, 1{γ A, 0.5{α, 0.5{β 0.5 0.5 {β, {δ ‚ ‚ ‚ 0.5 {α ‚ ‚ 0.5{δ B, 0.5{α, 0.5{β, γ ‚ ‚ ‚ ‚ ‚ 0.5 {γ C, 0.5{γ ‚ ‚ ‚ A, {β, 0 0.5 ‚ ‚ D {δ D 0 {δ 1 0 B, {α ‚ ‚ ‚ C, {α ‚ 1 {β, 1{δ 0.5 {A, 0{α, 0{β ‚ ‚ ‚ ‚ ‚ 1{δ 0.5 0.5 {A {B, 1{α, 1{β 0.5 {B ‚ ‚ ‚ ‚ ‚ 0.5 {C, 0{γ 0.5 {D ‚ ‚ 0.5 {C, {γ 0 ‚ ‚ 0.5{D ‚ ‚ Fig. 4. Rough L-concept lattices BMO pX, Y, Iq (left) and BMEOE pX, Y, Iq (right) with L being three-element Lukasiewicz chain. The corresponding extents are connected. References 1. Radim Belohlavek. Fuzzy Relational Systems: Foundations and Principles. Kluwer Academic Publishers, Norwell, USA, 2002. 2. Radim Belohlavek. Concept lattices and order in fuzzy logic. Ann. 
Pure Appl. Log., 128(1-3):277–298, 2004. 3. Radim Belohlavek. Optimal decompositions of matrices with entries from residu- ated lattices. Submitted to J. Logic and Computation, 2009. 4. Radim Belohlavek. Sup-t-norm and inf-residuum are one type of relational product: Unifying framework and consequences. Fuzzy Sets Syst., 197:45–58, June 2012. 5. Radim Belohlavek and Jan Konecny. Operators and spaces associated to matrices with grades and their decompositions. In NAFIPS 2008, pages 288–293. 6. Radim Belohlavek and Jan Konecny. A logic of attribute containment, KAM ’08 Proceedings of the 2008 International Symposium on Knowledge Acquisition and Modeling, pages 246–251. 218 Eduard Bartl and Jan Konecny 7. Radim Belohlavek and Jan Konecny. Closure spaces of isotone galois connec- tions and their morphisms. In Proceedings of the 24th international conference on Advances in Artificial Intelligence, AI’11, pages 182–191, Springer-Verlag, Berlin, Heidelberg, 2011. 8. Radim Belohlavek and Jan Konecny. Concept lattices of isotone vs. antitone Galois connections in graded setting: Mutual reducibility revisited. Information Sciences, 199: 133 – 137, 2012. 9. Radim Belohlavek and Vilem Vychodil. A logic of graded attributes. submitted to Artificial Intelligence. 10. Didier Dubois and Henri Prade. Rough fuzzy sets and fuzzy rough sets. Interna- tional Journal of General Systems, 17(2–3):191–209, 1990. 11. Didier Dubois and Henri Prade. Putting rough sets and fuzzy sets together. In Roman Slowiński, editor, Intelligent Decision Support, volume 11 of Theory and Decision Library, pages 203–232. Springer Netherlands, 1992. 12. Bernard Ganter and Rudolf Wille. Formal Concept Analysis – Mathematical Foun- dations. Springer, 1999. 13. George Georgescu and Andrei Popescu. Non-dual fuzzy connections. Arch. Math. Log., 43(8):1009–1039, 2004. 14. Petr Hájek. Metamathematics of Fuzzy Logic (Trends in Logic). Springer, Novem- ber 2001. 15. Robert E. Kent. Rough concept analysis: A synthesis of rough sets and formal concept analysis. Fundam. Inf., 27(2,3):169–181, August 1996. 16. Hongliang Lai and Dexue Zhang. Concept lattices of fuzzy contexts: Formal con- cept analysis vs. rough set theory. International Journal of Approximate Reasoning, 50(5):695 – 707, 2009. 17. Zdzislaw Pawlak. Rough sets. International Journal of Computer & Information Sciences, 11(5):341–356, 1982. 18. Jesus Medina, Manuel Ojeda-Aciego, and Jorge Ruiz-Calvino. Formal concept analysis via multi-adjoint concept lattices. Fuzzy Sets and Systems, 160(2):130– 144, January 2009. 19. Jonas Poelmans, Dmitry I. Ignatov, Sergei O. Kuznetsov, and Guido Dedene. Fuzzy and rough formal concept analysis: a survey. International Journal of General Systems, 43(2):105–134, 2014. 20. Silke Pollandt. Fuzzy Begriffe: Formale Begriffsanalyse von unscharfen Daten. Springer–Verlag, Berlin–Heidelberg, 1997. 21. Andrei Popescu. A general approach to fuzzy concepts. Mathematical Logic Quar- terly, 50(3):265–280, 2004. 22. Anna M. Radzikowska and Etienne E. Kerre. Fuzzy rough sets based on residuated lattices. In James F. Peters, Andrzej Skowron, Didier Dubois, Jerzy W. Grzymala- Busse, Masahiro Inuiguchi, and Lech Polkowski, editors, Transactions on Rough Sets II, volume 3135 of Lecture Notes in Computer Science, pages 278–296. Springer Berlin Heidelberg, 2005. 23. Morgan Ward and Robert P. Dilworth. Residuated lattices. Transactions of the American Mathematical Society, 45:335–354, 1939. 24. Yiyu Yao. On unifying formal concept analysis and rough set analysis. 
Xi’an Jiaotong University Press, 2006. Reduction dimension of bags of visual words with FCA Ngoc Bich Dao, Karell Bertet, Arnaud Revel Laboratoire L3i, University of La Rochelle, France Abstract. In image retrieval involving bag of visual words, reduction dimension is a fundamental task of data preprocessing. In recent years, several methods have been proposed for supervised and unsupervised cases. In the supervised case, the problem has been addressed with en- couraging results. However, in the unsupervised case, reduction dimen- sion is still an unavoidable challenge. In this article, we propose an appli- cation of a logic reduction dimension method which is based on Formal Concept Analysis for image retrieval. This method is the reduction of a closure system without, theoretically, loss of information. In our context, combining our proposed method with bag of visual words is original. Experimental results on five data sets such as COREL, CALTECH256, VOC2005, VOC2012 and MIR flickr are analyzed to show the influence of the data structures and the parameters on the reduction factor. 1 Introduction Thanks to the generalization of multimedia devices, huge collections of digital images are available today. As far as mining in multimedia documents is con- cerned, web search engines usually give poor results. Hence, such results are far from expected regarding the semantics of the documents. Content Based Image Retrieval (CBIR)[1] has been investigated in order to give an answer to this problem for decades. The main idea is to build a description based on the image content, and to find similarities between descriptions. Classically, visual features are extracted from images and then compiled into an index or signature to give a dense description of images. To perform the retrieval, a similarity function is computed to compare the index of the query with those of collection. A ranking of the results according to the calculated similarity is proposed to the users. The detection of visual features can be performed by a SIFT detector[2] or a dense grid which both select an important number of interest points (up to several thousands) from the images. Each of these points is then described thanks to a SIFT-like descriptor. However, to limit the dimension of the description space, a vector quantization (usually k-means) is performed in order to cluster similar in- terest points into ”visual words”, and to generate a dictionary of ”visual words” (usually up to 1000 words). Then, the signature of the image is composed of the set of all the visual words corresponding to each feature point detected into the image (what formed a ”bag of visual words”[3]). The comparison between the images then consists in comparing the bags of visual words of each image c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 219–231, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 220 Ngoc Bich Dao, Karell Bertet and Arnaud Revel in a dataset. The processing cost introduced by these techniques makes them difficult to use with large amounts of images such as a query on the Internet. On the other hand, supervised data is labeled (the data has ground truth) and classification methods are required to deal with the categorization problem. Data in the case unsupervised is unlabeled, hence clustering methods are used to gather the similar observations in the same cluster. 
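To make the bag-of-visual-words construction described above concrete, the following minimal sketch builds a visual dictionary by k-means and turns the local descriptors of one image into a frequency signature. It is only an illustration: the 128-dimensional random arrays stand in for real SIFT-like descriptors, and the dictionary size is a placeholder rather than the settings used for the datasets studied later in this paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(all_descriptors, n_words=1000, seed=0):
    # Cluster the local descriptors of the whole collection into
    # n_words "visual words" (the cluster centroids).
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(all_descriptors)

def bag_of_words(image_descriptors, dictionary):
    # Signature of one image: occurrence frequency of each visual word
    # among the interest points detected in that image.
    words = dictionary.predict(image_descriptors)
    return np.bincount(words, minlength=dictionary.n_clusters).astype(float)

# Toy usage: random vectors standing in for SIFT-like descriptors.
rng = np.random.default_rng(0)
dictionary = build_dictionary(rng.random((5000, 128)), n_words=50)
signature = bag_of_words(rng.random((300, 128)), dictionary)
print(signature.shape)  # (50,) -- one frequency per visual word
```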
There are many applications of classification and clustering in domains of computer science such as bioinformatics, numerical analysis, machine learning, data mining and pattern recognition, where the data may contain a very large set of features; that is, the description of the data is high-dimensional and therefore needs to be reduced. However, reducing the features while preserving the quality of the data is still challenging.
To be able to manage high-dimensional description spaces, reduction techniques have been proposed. These techniques are widely used as a data preprocessing step in machine learning and pattern recognition. This step can usually increase the accuracy of the results of subsequent steps such as classification or clustering, while the computational and time costs of those steps may be significantly decreased. Following the statistics and machine learning literature, we distinguish two main strategies: feature extraction and feature selection. These methods can be used in the supervised case or in the unsupervised case. The main idea of feature transformation consists in transforming the given set of features into a new one. When the size of the new feature set is greater than that of the original feature set, we speak of feature generation; when the new feature set is smaller than the original one, we speak of feature extraction. Feature selection methods manipulate the data to select features from the original set. This approach is interesting in domains where the existing features are preferred in order to maintain their physical properties.
In this article, we propose a logical and unsupervised feature reduction method issued from FCA to address the visual word reduction problem in a CBIR system. In FCA, data are organised into a "context" by a set of observations (called "objects", "samples" or "experimental units" in other fields) and a set of features (also known as "attributes", "parameters" or "variables" in the computer science, machine learning and statistics communities) that are associated with each observation. Context reduction is a simple and polynomial treatment in FCA, classically applied to the whole context, thus reducing both observations and features. This treatment is based on a nice result establishing that the concept lattice of the context can be reduced to a minimal one, while preserving its graph structure, by deleting some redundant observations and features. For example, when two attributes are shared by exactly the same objects, they belong to the same concepts of the concept lattice; they are therefore redundant, and one of these two attributes can be deleted while preserving the concept lattice structure. In our case, we focus on feature reduction of a context. Our algorithm accepts as input the closure operator of the context on the attribute set and returns the redundant attributes. Thus, this algorithm extends the classical attribute reduction of a context to the more general case of data described by a closure operator. Moreover, we propose a new application in image analysis for the reduction of visual word features.
This paper is organized as follows. In order to introduce our approach, we recall some definitions of formal concepts in Section 2.1. Section 2.2 details our proposed method. Section 3 presents experimental results on real data. Finally, Section 4 ends this paper with a conclusion and perspectives.
2 The proposed features selection method The feature reduction algorithm we propose is a logic and unsupervised method stemming from FCA where a concept lattice, defining from a binary table, rep- resents the description of all object-attribute combinations. When the concept lattice structure is preserved after the deletion of some attributes and objects, then these attributes are ”redundant” for the lattice structure and can be deleted from the initial data without affecting the structure of object-attributes combi- nations. Therefore, from a theoretical point of view, the description of data is equivalently represented by a concept lattice where ”redundant” attributes and objects are deleted. The reduction is a simple and polynomial treatment in FCA, classically de- composed into two steps: attribute and object reduction. In this article, we focus on attributes/features reduction, thus on the detection of redundant attributes for the concept lattice structure reduced to attributes. A nice result establishes that each subset of a concept (A,B) is a closure defined on the objects and at- tributes set, and the concept lattice reduced to the attributes/objects is denoted a closure lattice. In the first subsection, we introduce the notions of closure lattice according to a closure operator, reduced closure lattice and redundant attributes. In the second section, we presents the reduction algorithm aiming at removing redun- dant attributes, with a closure operator as input. This algorithm is thus a generic algorithm that can be applied either on attributes or on objects of a binary table, but also on any closure system. 2.1 Reduced lattice In FCA, the relationship between a set of attributes I and a set of objects O are described by a formal context (O, I, (α, β)) where α(A) the set of attributes sharing by a subset A of objects, and β(B) the set of objects sharing a subset B of attributes. One can derive two closure systems from a context. The first one is defined on the set of attributes I, with β ◦ α as closure operator. The second one is defined on the set of objects O with α ◦ β as closure operator[18]. A closure system (ϕ, S) is defined by a closure operator ϕ on a set S, i.e. a map on P(S) satisfying the three following properties: ϕ is isotone, extensive and idempotent. A subset X ⊆ S is called closed if ϕ(X) = X (see Table 2). The set system F of all closed subsets, fitted out with the inclusion relation ⊆, forms a lattice usually called the closure lattice (see Fig. 1a). See the survey of Caspard 222 Ngoc Bich Dao, Karell Bertet and Arnaud Revel and Monjardet[19] for more details about closure systems. There are infinitely set systems whose closure lattice are isomorphic. A reduced closure lattice is a closure lattice defined on a set S of the smallest size among all isomorphic closure lattices. A nice result[20,18] establishes that a closure system is reduced when, for each x ∈ S, the closure ϕ(x) is a join irreducible (Equation 1). ∀x ∈ S, ∀Y ⊆ S so that x 6∈ Y, then ϕ(x) 6= ϕ(Y ) (1) Therefore, a non-reduced closure system contains reducible elements - ele- ments which do not satisfy Equation 1 - each reducible element x ∈ S is then equivalent to a set Ex ⊆ S of equivalent elements with x 6∈ Ex and ϕ(x) = ϕ(Ex ). Reducible elements can be removed without affecting the structure of the closure lattice. The reduction of a closure system consists then in removing or replacing each reducible element x ∈ S by its equivalent set Ex . 
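To fix ideas, here is a small illustration of the closure operator β ◦ α on attributes and of the reducibility condition of Equation 1, on an invented toy context. The subset enumeration is the naive, exponential test that the algorithm of the next subsection avoids by means of the precedence graph; it is a sketch under these assumptions, not the method's actual implementation.

```python
from itertools import combinations

# Toy formal context: each object is described by the attributes it has.
context = {1: {"a", "c"}, 2: {"b", "c"}, 3: {"a", "b", "c"}}
attributes = {"a", "b", "c"}

def closure(B):
    # phi(B) = beta(alpha(B)): attributes shared by every object
    # possessing all attributes of B.
    objs = [o for o, attrs in context.items() if set(B) <= attrs]
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

def equivalent_set(x):
    # Naive check of Equation 1: x is reducible if some subset Y not
    # containing x has the same closure; Y = {} is the clarification case.
    others = sorted(attributes - {x})
    for r in range(len(others) + 1):
        for Y in combinations(others, r):
            if closure({x}) == closure(Y):
                return set(Y)
    return None  # x is irreducible: phi(x) is join-irreducible

print({x: equivalent_set(x) for x in sorted(attributes)})
# {'a': None, 'b': None, 'c': set()} -- only 'c' is reducible, phi(c) = phi({})
```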
2.2 Proposed reduction algorithm The algorithm we propose is a generic reduction algorithm since it only needs a closure operator as input. Thus it can be applied with the same complexity on any closure system, and in particular on a context by considering the attributes - using β ◦ α as closure operator. a b c d e f g h a b c d e f 1 × × 1 × 2× × × × 2× × × 3 ××××× 3 ××× 4× ××××× 4× ××× 5× ×××× 5× ×× 6 × ×× 6 × × 7× ×× 7× × 8× × × 8× × 9×××××××× 9×××××× (a) The context (b) The attribute- reduced context Table 1: The example of context x a b c d e f g h ϕ(x) a,g b,g a,c,g d,e,f,g e,g f,g g e,f,g,h Table 2: Attributes x ∈ S and their closure ϕ(x) for the context in Table 1a A direct application of the definition (see Eq. 1) would imply an exponential cost by checking if any subset Y ⊂ S is equivalent to each x ∈ S. We use the precedence relation (precedence graph) for a polynomial reduction. The prece- dence graph is defined on the set S, with an edge between two elements x, y ∈ S Reduction Dimension of Bags of Visual Words with FCA 223 [a, b, c, d, e, f, g, h] [a, b, c, d, e, f] [a, d, e, f, g, h] [a, c, g] [a, d, e, f] [a, c] [d, e, f, g, h] [a, e, f, g, h] [d, e, f] [a, e, f] [b, f, g] [e, f, g, h] [a, f, g] [a, e, g] [b, f] [e, f] [a, f] [a, e] [b, g] [f, g] [e, g] [a, g] [b] [f] [e] [a] [g] [] (a) The closure lattice of context in (b) The reduced closure lattice Table 1a of context in Table 1b Fig. 1: The example of closure lattices when ϕ(x) ⊆ ϕ(y). This graph is clearly acyclic for a reduced closure system. We propose a generic algorithm in 3 steps: Step 1: Standardization. Check if there exists x, y ∈ S such that ϕ(x) = ϕ(y). When ϕ(x) = ϕ(y), then x and y belong to the same strongly connected components of the graph. Each strongly connected components X ⊆ S in- clude the elements xi , xj so that ϕ(xi ) = ϕ(xj ), ∀xi 6= xj ∈ X. Thus, we can delete all elements except one representative element x ∈ X of the com- ponent. The obtained precedence graph is then an acyclic graph. Step 2: Clarification. Check if there exists x ∈ S such that ϕ(x) = ϕ(∅). When such an x exists, then ϕ(x) is included into ϕ(y) for any y ∈ S, thus x is the only source of the precedence graph. The clarification test has only to be performed for graphs with one source. Step 3: Reduction. Check, for any x ∈ S, if there exists a set Ex ⊂ S such that x ∈ / Ex and ϕ(x) = ϕ(Ex ). One can observe that an attribute x with only one immediate predecessor y is not reducible, because it would be equivalent to y, and thus belong to the same strongly connected com- ponent already removed in the previous step. If there exists Ex ⊂ S such that ϕ(x) = ϕ(Ex ), then elements of Ex are clearly predecessors of x in the precedence graph since, for ∀y ∈ Ex , ϕ(x) = ∩ϕ(y). Moreover, this test can be reduced to maximal predecessors of x. Therefore, this treatment has only to be performed for elements with more than one immediate predecessors, and the equality has to be checked with the set of immediate predecessors of x. This algorithm takes into account a closure operator ϕ on a set S as input. The output of the alforithm is the reducible element set X ⊂ S and the equivalent elements set Ex for each x ∈ X. 224 Ngoc Bich Dao, Karell Bertet and Arnaud Revel Alg. 1 reduces a closure system in O(|S|.cϕ + |S|2 log |S|) where cϕ is the cost of a closure generation and —S— is the number of nodes. 
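The three steps can be sketched in Python as follows. The sketch exploits the fact that two elements lie in the same strongly connected component of the precedence graph exactly when their closures coincide, and keeps the precedence relation implicit through closure inclusion; the toy context at the end is invented, and the code illustrates the procedure described above rather than reproducing Alg. 1 verbatim.

```python
def reduce_closure_system(S, phi):
    # Returns {reducible element x: its equivalent set E_x}, following the
    # standardization / clarification / reduction steps described above.
    removed = {}
    cl = {x: frozenset(phi({x})) for x in S}

    # Step 1 -- standardization: keep one representative per group of
    # elements with identical closures (the SCCs of the precedence graph).
    groups = {}
    for x in sorted(S):
        groups.setdefault(cl[x], []).append(x)
    kept = []
    for members in groups.values():
        rep, *duplicates = members
        kept.append(rep)
        for d in duplicates:
            removed[d] = {rep}

    # Step 2 -- clarification: a unique source whose closure is phi(empty set).
    sources = [x for x in kept if not any(cl[y] < cl[x] for y in kept if y != x)]
    if len(sources) == 1 and cl[sources[0]] == frozenset(phi(set())):
        removed[sources[0]] = set()
        kept.remove(sources[0])

    # Step 3 -- reduction: x with several immediate predecessors P such that
    # phi(x) = phi(P) is replaced by P.
    for x in list(kept):
        preds = [y for y in kept if y != x and cl[y] < cl[x]]
        imm = [y for y in preds
               if not any(cl[y] < cl[z] < cl[x] for z in preds if z != y)]
        if len(imm) > 1 and frozenset(phi(set(imm))) == cl[x]:
            removed[x] = set(imm)
            kept.remove(x)
    return removed

# Toy usage with the attribute closure beta o alpha of a small context.
ctx = {1: {"a"}, 2: {"b"}, 3: {"a", "b", "d"}}
def phi(B):
    objs = [o for o, A in ctx.items() if set(B) <= A]
    return set.intersection(*(ctx[o] for o in objs)) if objs else {"a", "b", "d"}
print(reduce_closure_system({"a", "b", "d"}, phi))
# {'d': {'a', 'b'}} -- d is redundant since phi(d) = phi({a, b})
```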
Indeed, the precedence graph can be initialized in O(|S|cϕ + |S|2 log|S|) by computing the closures in O(|S|cϕ ), and then comparing two closures in O(|S|2 log|S|). Then, the SCCs can be computed using Kosaraju’s algorithm by two passes of depth first search, thus a complexity in O(|S| + |A|) ≤ O(|S|2 ), with |A| nb of edges in the graph. Stan- dardization and clarification are clearly in O(|S|) by a simple pass into the graph. Finaly, reduction considers the immediate predecessors of each x ∈ S in O(|S|2 ), and then computes and compare two closures in O(|S|cϕ +|S|2 log|S|). Therefore, Alg. 1 computes the attribute reduced context in O(|I|2 |O| + |I|2 log|I|). since a closure can be obtained in O(|I|.|O|). Input: a closure operator ϕ on a set S Output: the reducible elements set X ⊂ S, and the equivalent elements set Ex for each x ∈ X init a set Res with ∅; init a graph G with S as set of node; \\ Precedence graph; foreach (x, y) ∈ S × S do if ϕ(x) ⊆ ϕ(y) then add the edge (x, y) in G; end end compute the set CF C of the strongly connected components of G; let source be the sources of the graph G; \\ Step (1): Standardization; foreach C ∈ CF C do choose y ∈ C; foreach x ∈ C such that x 6= y do add x in Res with Ex = {y}; delete x from the graph G; end end \\ Step (2): Clarification; if |source| = 1 and ϕ(source) = ϕ(∅) then add source in Res with Esource = ∅; delete source from G; end \\ Step (3): Reduction; foreach x ∈ G do let P the set of immediate predecessors x in the graph G; if |P | 6= 1 and ϕ(x) = ϕ(P ) then add x in Res with Ex = P ; delete x from the graph G; end end return Res, (Ex )x∈Res ; Algorithm 1: Reduction of a closure system Reduction Dimension of Bags of Visual Words with FCA 225 3 Experimentation 3.1 Datasets In our experiments, we compare the performance of the method we propose on different image data sets. Each image in a data set is described by a vector composed of the occurrence frequencies of its visual words, where a set of visual words is defined for each data set. Table 3 describes the different data sets we used in our experiments, and the methods applied to generate the whole bag of visual words. Database Images nb Features Detector Descriptor Dictionary of nb visual words VOC2012[21] 17124 4096 Harris- CMI (Colour Random Laplace Moment selection of Invariants)[22] all key points MIR flickr[23] 24991 4096 Harris- CMI1 Random Laplace selection of all key points COREL[24] 4998 500 SIFT SIFT[2] K-means[25] (OpenCV) CALTECH 30607 500 SIFT SIFT2 K-means 256[26] (OpenCV) Dataset 1 1354 262 Harris- SIFT K-means (VOC2005)[27] Laplace and (OpenCV) Laplacian3 Table 3: Description of used datasets 3.2 Experimental protocol As mentioned earlier, the algorithm we propose requires binary values indicating for each object whether it possesses a given attribute or not. Since each image is described by a visual word occurence frequency vector, its values can vary from 0 to a max value depending on the image size and the quantity of visual words in the image. For instance, if an image is black painted, there is only one visual word ”black” for the whole image with a big frequency, and the vector 1 http://koen.me/research/colordescriptors/ 2 http://www.robots.ox.ac.uk/ vgg/research/affine/#software 3 http://lear.inrialpes.fr/people/dorko/downloads.html 226 Ngoc Bich Dao, Karell Bertet and Arnaud Revel will be sparse. Conversely, an image with a patchwork of colors is described by a frequency vector mainly composed of low but not zero values. 
To be able to compare several images, it is thus necessary to normalize their frequency vector before binarization. Normalization As mentioned before, the visual word occurrence frequency can be very important in some images, and insignificant in others. In order to compare the visual words, several strategies can be adopted. First of all, it is necessary to find out a ”max” value in the data set and then divide the visual word frequency by this max value to transform the values in a range 0 to 1. Two manners to define the max value have been considered into this article. Normalization by line (image) With this type of normalization, a max value is computed for each image as being the maximum frequency value of the corre- sponding image. The interpretation of this normalization is that we consider as significant the ratio between the different attributes of a given image. This kind of normalization does not depend on the database size and on the image size. However, the normalized values do not account for the ratio measurement of the same attribute between the images in the database. Normalization by column (feature) Normalization by column finds out the max- imum values of the frequency for each attribute in the database. With this ap- proach, the correspondence between the images in the database is taken into ac- count. The drawback is that each time a new image is inserted into the database, the normalized values must be recomputed. Besides, the image size must also be taken into account. Table 4 gives an illustrated example. f1 f2 f3 f4 f1 f2 f3 f4 f1 f2 f3 f4 img1 1 0 50 5 img1 0.02 0 1 0.1 img1 0.1 0 1 0.05 img2 10 9 1 8 img2 1 0.9 0.1 0.8 img2 1 1 0.02 0.08 img3 0 0 0 99 img3 0 0 0 1 img3 0 0 0 1 (a) Initial data (b) After normalization (c) After normalization by line by column Table 4: Illustration for normalization types Binarization After the normalization, we simply binarize the normalized values by comparing these values with a threshold varying from 0 to 0.9. At the highest threshold one, in the normalization by line case, it is possible that most of the attributes in an image should be below the threshold. To avoid removing all the visual words from an image, the highest threshold has been assigned to 0.9. Reduction Dimension of Bags of Visual Words with FCA 227 Reduction The next phase in the algorithm is to apply our reduction method which is itself composed of three steps (clarification, standardisation, reduction). Indeed, before applying the proposed method to bag of visual words, we must remove all the visual words that appear (resp. do not appear) in each (resp. any) image. This step corresponds to the clarification. The standardization step reduces the feature that the vector of images of a given feature equivalent to the vector of images of another feature. At last, in the reduction step, all the features which are the combination of other features are removed. 3.3 Results In this section, we detail the results obtained with our reduction method for 5 data sets, described in section 2.2. To analyze the behavior of our method, and the contribution of each step of the algorithm, we introduce the ratio of removed features for each step of the reduction algorithm as follows: ∆1 = Naatt , ∆2 = Nattb −a , ∆3 = Natt −a−b c Where a (resp. b and c) is the number of removed attributes in the standard- ization (resp. clarification and reduction) step; Natt is the attribute number in total. 
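The two normalization schemes and the binarization step translate directly into code. The snippet below reproduces the values of Table 4; whether the threshold comparison is strict is not stated above, so the strict inequality used here is an assumption, and the data are the toy frequencies of Table 4a rather than a real image collection.

```python
import numpy as np

# Frequency vectors of Table 4a (rows = images, columns = visual words).
F = np.array([[ 1, 0, 50,  5],
              [10, 9,  1,  8],
              [ 0, 0,  0, 99]], dtype=float)

def normalize_by_line(F):
    # Each image is divided by its own maximum frequency
    # (assumes every image contains at least one visual word).
    return F / F.max(axis=1, keepdims=True)

def normalize_by_column(F):
    # Each visual word is divided by its maximum frequency in the database
    # (assumes every visual word occurs in at least one image).
    return F / F.max(axis=0, keepdims=True)

def binarize(N, threshold):
    # Keep a visual word for an image when its normalized frequency
    # exceeds the threshold (thresholds vary from 0 to 0.9 below).
    return (N > threshold).astype(int)

print(normalize_by_line(F))    # reproduces Table 4b
print(normalize_by_column(F))  # reproduces Table 4c, up to rounding
print(binarize(normalize_by_line(F), 0.5))
```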
Figure 2 shows the evolution of ∆1, ∆2 and ∆3 with respect to the threshold level, for both normalization types, by line and by column.
The maximum ratios of removed attributes for the data sets CALTECH, COREL, VOC2005, MIR flickr and VOC2012 are approximately equal to 0.67%, 2.6%, 22.5%, 95% and 96%, respectively. The impact of the reduction is more interesting on the last three datasets. This phenomenon can be explained by the way the bags of visual words were generated: the two data sets MIR flickr and VOC2012 are composed of visual words randomly selected from the keypoint set, whereas the data sets CALTECH, COREL and VOC2005 are composed of bags of visual words defined by the SIFT detector and descriptor and by a K-means clustering. Thus, the randomly selected visual words are less consistent.
We can also observe that the percentage of removed attributes increases as the binarization threshold increases. With an increasing threshold, only the most frequent words are kept, thus more attributes are potentially equivalent and removed.
At last, there is no attribute reduction in step 1 (the ∆1 value) with a normalization by column, because this kind of normalization cannot generate empty columns. Moreover, a normalization by line keeps the most frequent attributes in each image, whereas a normalization by column keeps the most frequent images for each attribute. To summarize, the number of removed attributes depends on the visual word generation, on the chosen binarization threshold and on the normalization process (by line or by column). However, care should be taken that the greater the binarization threshold is, the smaller the number of remaining images, except in the case of normalization by line.
Fig. 2: The ratio of removed attributes according to the initial attributes, corresponding to the three cases of the proposed method, where the red line is ∆1, the blue dash is ∆2 and the green dash-dot-dot is ∆3. Panels: CALTECH, COREL, VOC2005, MIR flickr, VOC2012; (a) normalization by line, (b) normalization by column.
4 Conclusion and perspective
In this article, we present a logical feature selection method for bags of visual words. This method, stemming from Formal Concept Analysis, is a closure system reduction without, theoretically, any loss of information. That means that the data description lattice is preserved by the reduction treatment. In our context, combining our proposed method with bags of visual words is original. The experiments show that the number of deleted features can be interesting, depending on the data set and on the binarization treatment. Moreover, it is possible to perform both an object and an attribute reduction.
A finer analysis should be obtained in the supervised case, by comparing classification performance before and after reduction. Moreover, the number of potentially deleted objects could also be useful to automatically define a good binarization threshold in the supervised case: while the suppression of objects belonging to the same class is to be promoted, we must avoid removing objects of different classes. Object reduction can easily be performed by applying our reduction algorithm on the object set. At last, we plan to study the numbers of deleted attributes and deleted objects (of the same class / of different classes) to evaluate the complexity of a data set and the quality of its visual words.
Acknowledgment: We would like to thank Thierry URRUTY, Nhu Van NGUYEN and Dounia AWAD who extracted the bag of visual words we used in this paper. References 1. Smeulders, A., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 1349–1380 2. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings of the International Conference on Computer Vision, Kerkyra (1999) 1150–1157 3. Bosch, A., Zisserman, A., Munoz, X.: Scene Classification Via pLSA. In Leonardis, A., Bischof, H., Pinz, A., eds.: 9th European Conference on Computer Vision. Volume 3954 of Lecture Notes in Computer Science., Graz, Austria, Springer Berlin Heidelberg (2006) 517–530 4. Tufféry, S.: Data mining et statistique décisionnelle: L’intelligence des données. Technip edn. Volume 2010. (2010) 5. Belohlavek, R., Kruse, R., Vychodil, V.: Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences 76 (2010) 3–20 6. Fisher, R.A.: The use of multiple measurements in taxonomic problems. The Annals of Eugenics 7 (1936) 179–188 7. Hotelling, H.: Analysis of a complex of statistical variables into principal compo- nents. Journal of Educational Psychology 24 (1933) 417–441 8. Yu, L., Liu, H.: Feature selection for high-dimensional data: A fast correlation- based filter solution. In: Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003), Washington DC (2003) 856–863 230 Ngoc Bich Dao, Karell Bertet and Arnaud Revel 9. Hall, M.A.: Correlation-based feature subset selection for machine learning. Doctor of philosophy, University of Waikato, Hamilton, NewZealand (1999) 10. Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE transactions on neural networks / a publication of the IEEE Neural Networks Council 5 (1994) 537–550 11. Rakotomalala, R., Lallich, S.: Construction d’arbres de decision par optimisation. Revue Extraction des Connaissances et Apprentissage 16 (2002) 685–703 12. Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In Bergadano, F., Raedt, L., eds.: Machine Learning: ECML-94. Volume 784 of Lec- ture Notes in Computer Science. Springer Berlin Heidelberg, Berlin, Heidelberg (1994) 171–182 13. He, X., Cai, D., Niyogi, P.: Laplacian score for feature selection. In: Neural Information Processing Systems Foundation, MIT Press (2005) 14. Devaney, M., Ram, A.: Efcient Feature Selection in Conceptual Clustering. In: Ma- chine Learning: Proceedings of the Fourteenth International Conference, Nashville, TN (1997) 15. Dy, J.G., Brodley, C.E.: Feature Selection for Unsupervised Learning. Journal of Machine Learning Research 5 (2004) 845–889 16. Wolf, L., Shashua, A.: Feature Selection for Unsupervised and Supervised Infer- ence: The Emergence of Sparsity in a Weight-Based Approach. The Journal of Machine Learning Research 6 (2005) 1855–1887 17. Elghazel, H., Aussem, A.: Unsupervised feature selection with ensemble learning. Machine Learning (2013) 18. Barbut, M., Monjardet, B.: Ordre et classification: algèbre et combinatoire. Ha- chette, Paris (1970) 19. Caspard, N., Monjardet, B.: The lattices of closure systems, closure operators, and implicational systems on a finite set: a survey. Discrete Applied Mathematics 127 (2003) 241–269 20. Birkhoff, G.: Lattice Theory. 1st edn. American Mathematical Society (1940) 21. 
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes (VOC) Challenge (2012) 22. Mindru, F., Tuytelaars, T., Gool, L.V., Moons, T.: Moment invariants for recog- nition under changing viewpoint and illumination. Computer Vision and Image Understanding 94 (2004) 3–27 23. Huiskes, M.J., Lew, M.S.: The MIR flickr retrieval evaluation. In: Proceeding of the 1st ACM international conference on Multimedia information retrieval - MIR ’08, New York, USA, ACM Press (2008) 39–43 24. Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised Learning of Semantic Classes for Image Annotation and Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007) 394–410 25. Macqueen, J.B.: Some Methods for classification and Analysis of Multivariate Ob- servations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. (1967) 281–297 26. Griffin, G., Holub, A.D., Perona, P.: Caltech-256 Object Category Dataset. Tech- nical report (2007) 27. Everingham, M., Zisserman, A., Williams, C.K.I., Van Gool, L., Al., A.: The 2005 PASCAL Visual Object Classes Challenge. In: First PASCAL Machine Learning Challenges Workshop, MLCW 2005. Volume 3944 of Lecture Notes in Computer Science., Berlin, Heidelberg, Springer Berlin Heidelberg (2005) 117–176 A One-Pass Triclustering Approach: Is There any Room for Big Data? Dmitry V. Gnatyshak1 , Dmitry I. Ignatov1 , Sergei O. Kuznetsov1 , and Lhouari Nourine2 1 National Research University Higher School of Economics, Russian Federation dmitry.gnatyshak@gmail.com http://www.hse.ru 2 Blaise Pascal University, LIMOS, CNRS, France http://www.univ-bpclermont.fr/ Abstract. An efficient one-pass online algorithm for triclustering of bi- nary data (triadic formal contexts) is proposed. This algorithm is a modified version of the basic algorithm for OAC-triclustering approach, but it has linear time and memory complexities with respect to the car- dinality of the underlying ternary relation and can be easily parallelized in order to be applied for the analysis of big datasets. The results of computer experiments show the efficiency of the proposed algorithm. Keywords: Formal Concept Analysis, triclustering, triadic data, data mining, big data 1 Introduction Cluster analysis of multimodal data and specifically of dyadic and triadic re- lations is a natural extension of the idea of normal clustering. In dyadic case biclustering methods (the term bicluster was coined by B. Mirkin [17]) are used to simultaneously find subsets of the sets of objects and attributes that form ho- mogeneous patterns of the input object-attribute data. One of the most popular applications of biclustering is gene expression analysis in Bionformatics [16,3]. Triclustering methods operate in triadic case in which for each object-attribute pair one assigns a set of some conditions [18,8,5]. Both biclustering and triclus- tering algorithms are widely used in such areas as the analysis of gene expression [21,15,13], recommender systems [19,10,9], social networks analysis [6], etc. The processing of numeric multimodal data is also possible by modifications of ex- isting approaches for mining binary relations [12]. Though there are methods that can enumerate all triclusters satisfying cer- tain constraints [1] (in most cases they ensure that triclusters are dense), their time complexity is rather high, as in the worst case the maximal number of tri- clusters usually is exponential (e.g. 
in case of formal triconcepts), showing that these methods are hardly scalable. To process big data algorithms need to have at most linear time complexity and be easily parallelizable. Also, in most cases, it is necessary that such algorithms output the results in one pass. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 231–243, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 232 2 Dmitry V. Gnatyshak et al.Dmitry V. Gnatyshak et al. In order to create an algorithm satisfying these requirements we adapted a tri- clustering method based on prime operators (prime OAC-triclustering method) [5]. As the result we developed an online version of prime OAC-triclustering method, which is linear, one-pass and easily parallelizable. The rest of the paper is organized as follows: in Section 2 we recall the method and the basic version of the algorithm of prime OAC-triclustering. In Section 3 we describe the online setting for the problem and the corresponding online version of the basic algorithm with some optimizations. Finally, in Section 4 we show the results of some experiments which demonstrate the efficiency of the online version of the algorithm. 2 Prime object-attribute-condition triclustering method Prime object-attribute-condition triclustering method based on the framework of Formal Concept Analysis [20,4,2] is an extension for the triadic case of object- attribute biclustering method [7]. Triclusters generated by this method have the same structure as the corresponding biclusters, namely the cross-like structure of triples inside the iput data cuboid (i.e. formal tricontext). Let K = (G, M, B, I) be a triadic context, where G, M , B are respectively the sets of objects, attributes, and conditions, and I ⊆ G × M × B is a triadic incidence relation. Each prime OAC-tricluster is generated by applying the following prime operators to each pair of components of some triple: (X, Y )0 = {b ∈ B | (g, m, b) ∈ I for all g ∈ X, m ∈ Y }, (X, Z)0 = {m ∈ M | (g, m, b) ∈ I for all g ∈ X, b ∈ Z}, (1) (Y, Z)0 = {g ∈ G | (g, m, b) ∈ I for all m ∈ Y, b ∈ Z} Then the triple T = ((m, b)0 , (g, b)0 , (g, m)0 ) is called prime OAC-tricluster based on triple (g, m, b) ∈ I. The components of tricluster are called, respec- tively, extent, intent, and modus. The triple (g, m, b) is called a generating triple of the tricluster T . Figure 2 shows the structure of an OAC-tricluster (X, Y, Z) based on triple (e e eb), triples corresponding to the gray cells are contained in g , m, the context, other triples may be contained in the tricluster (cuboid) as well. The basic algorithm for prime OAC-triclustering method is rather simple (Alg. 1). First of all, for each combination of elements from each two sets of K we compute the results of applying the corresponding prime operator (we will call the resulting sets prime sets). After that we enumerate all triples from I and on each step we must generate a tricluster based on the corresponding triple, check whether this tricluster is already presented in the tricluster set (by using hashing) and also check conditions. The total time complexity of the algorithm depends on whether there is a non-zero minimal density threshold or not and on the complexity of the hashing algorithm used. 
In case we use some basic hashing algorithm processing the tricluster’s extent, intent and modus and have a minimal density threshold equal to 0, the total time complexity of the main loop is O(|I|(|G| + |M | + |B|)), and of A One-Pass Triclustering A One-pass Triclustering Approach: IsApproach: There anyIs Room There any Room for Big for Big Data? Data? 233 3 Fig. 1. Structure of prime OAC-triclusters Algorithm 1 Algorithm for prime OAC-triclustering. Input: K = (G, M, B, I) — tricontext; ρmin — minimal density threshold Output: T = {T = (X, Y, Z)} 1: T := ∅ 2: for all (g, m) : g ∈ G,m ∈ M do 3: P rimesOA[g, m] = (g, m)0 4: end for 5: for all (g, b) : g ∈ G,b ∈ B do 6: P rimesOC[g, b] = (g, b)0 7: end for 8: for all (m, b) : m ∈ M ,b ∈ B do 9: P rimesAC[m, b] = (m, b)0 10: end for 11: for all (g, m, b) ∈ I do 12: T = (P rimesAC[m, b], P rimesOC[g, b], P rimesOA[g, m]) 13: T key = hash(T ) 14: if T key 6∈ T .keys ∧ ρ(T ) ≥ ρmin then 15: T [T key] := T 16: end if 17: end for 234 4 Dmitry V. Gnatyshak et al.Dmitry V. Gnatyshak et al. the whole algorithm is O(|G||M ||B| + |I|(|G| + |M | + |B|)). If we have a non-zero minimal density threshold, the time complexity of the main loop, as well as the time complexity of the algorithm, is O(|I||G||M ||B|). The memory complexity is O(|I|(|G| + |M | + |B|)), as we need to keep the dictionaries with the prime sets in memory. 3 Online version of the OAC-triclustering algorithm At first, let us describe the online problem of finding the set of prime OAC- triclusters. Let K = (G, M, B, I) be a triadic context. The user has no a priori knowledge of the elements and even cardinalities of G, M , B, and I. At each iteration we receive some set of triples from I: J ⊆ I. After that we must process J and get the current version of the set of all triclusters. It is important in this setting to consider every pair of triclusters different if they have different generating triples, event if their extents, intents, and modi are equal, because any other triple can change only one of them, thus making them different. The picture 2 shows the example of such situation (dark gray cells are the generating triples, light gray — prime sets). Fig. 2. Example of modification of triclusters by adding a triple Also the algorithm requires that the dictionaries containing the prime sets are implemented as hash-tables. Because of this data structure the algorithm can efficiently access prime sets for their processing. The algorithm itself is also quite simple (Alg. 2). It takes some set of triples (J) and current versions of the tricluster set (T ) and the dictionaries contain- ing prime sets (P rimesOA, P rimesOC, P rimesAC) as input and outputs the modified versions of the tricluster set and dictionaries. The algorithm processes each triple (g, m, b) of J sequentially (line 1). On each iteration the algorithm modifies the corresponding prime sets: A One-Pass Triclustering A One-pass Triclustering Approach: IsApproach: There anyIs Room There any Room for Big for Big Data? Data? 235 5 – adds b to (g, m)0 (line 2) – adds m to (g, b)0 (line 3) – adds g to (m, b)0 (line 4) Finally, it adds a new tricluster to the tricluster set. It is important to note that this tricluster contains pointers to the corresponding prime sets (in the corresponding dictionaries) instead of the copies of the prime sets (line 5). In effect this algorithm is the same as the basic one but with some optimiza- tions. 
First of all, instead of computing prime sets at the beginning, we modify them on spot, as adding an additional triple to the relation modifies only three prime sets by one element. Secondly, we remove the main loop by using pointers for the triclusters’ extents, intents, and modi, as we can generate triclusters at the same step as we modify the prime sets. And the third important optimiza- tion is the use of only one pass through the triples of the ternary relation I, instead of enumeration of different pairwise combinations of objects, attributes, and conditions. Algorithm 2 Add function for the online algorithm for prime OAC-triclustering. Input: J — set of triples; T = {T = (∗X, ∗Y, ∗Z)} — current set of triclusters; P rimesOA, P rimesOC, P rimesAC; Output: T = {T = (∗X, ∗Y, ∗Z)}; P rimesOA, P rimesOC, P rimesAC; 1: for all (g, m, b) ∈ J do 2: P rimesOA[g, m] := P rimesOA[g, m] ∪ b 3: P rimesOC[g, b] := P rimesOC[g, b] ∪ m 4: P rimesAC[m, b] := P rimesAC[m, b] ∪ g 5: T := T ∪ (&P rimesAC[m, b], &P rimesOC[g, b], &P rimesOA[g, m]) 6: end for Let us estimate the complexities of this algorithm. Each step requires the constant time: we need to modify three sets and add one tricluster to the set of triclusters. The total number of steps is equal to |I|. Thus the time complexity is linear O(|I|). Beside that the algorithms is one-pass. The memory complexity is the same: for each of |I| steps the size of each dic- tionary containing prime sets is increased either by one element (if the required prime set is already present), or by one key-value pair (if not). Still, each of these dictionary requires O(|I|) memory. Thus, the memory complexity is also linear O(|I|). Another important step used as an addition to this algorithm is post-processing. In addition to the user-specific post-processing there are some common useful steps. First of all, in the fixed moment of time we may want to remove addi- tional triclusters with the same extent, intent, and modus from the output. Also some simple conditions like minimal support condition can be processed during 236 6 Dmitry V. Gnatyshak et al.Dmitry V. Gnatyshak et al. this step without increasing the original complexity. It should be done only dur- ing the post-processing step, as the addition of a triple in the main algorithm can drastically change the set of triclusters, and, respectively, the values used to check the conditions. Finally, if we need to check more difficult conditions like minimal density condition the time complexity of the post-processing will be higher than the time complexity of the original algorithm, but it can be also efficiently implemented. To remove the same triclusters we need to use an efficient hashing procedure that can be improved by implementing it in the main algorithm. For this for all prime sets we need to keep their hash-values with them in the memory. And finally, when using hash-functions other than LSH function (Locality-Sensitive Hashing) [14] we can calculate hash-values of prime sets as some function of their elements (for example, exclusive disjunction or sum). Then when we modify prime sets we just need to get the result of this function and the new element. In this case, the hash-value of the tricluster can be calculated as the same function of the hash-values of its extent, intent, and modus. Then it would be enough to implement the tricluster set as a hash-set in order to efficiently remove the additional entries of the same tricluster. Pseudo-code for the basic post-processing (Alg. 3). 
Algorithm 3 Post-processing for the online algorithm for prime OAC- triclustering. Input: T = {T = (∗X, ∗Y, ∗Z)} — full set of triclusters; Output: T = {T = (∗X, ∗Y, ∗Z)} — processed hash-set of triclusters; 1: for all T ∈ T do 2: Calculate hash(T ) 3: if hash(T ) 6∈ T then 4: T := T ∪ T 5: end if 6: end for If the names of the objects, attributes, and conditions are small enough (so that we can consider the time complexity of computing their hash values as O(1)), the time complexity of the post-processing is O(|I|) if we do not need to calculate densities, and O(|I||G||M ||B|) otherwise. Also, the basic version of the post-processing does not require any additional memory, so its memory complexity is O(1). Finally, the algorithm can be easily paralleled by splitting the subset of triples J into several subsets, processing each of them independently, and merging the resulting sets afterwards. A One-Pass Triclustering A One-pass Triclustering Approach: IsApproach: There anyIs Room There any Room for Big for Big Data? Data? 237 7 4 Experiments Two series of experiments were conducted in order to verify the time complexities and efficiency of the online algorithm: first one was conducted on the first set of synthetic contexts and on real world datasets, the second one — on the second set of synthetic contexts with large number of triples in each. In each experiment for the first set both versions of the OAC-triclustering algorithm were used to extract triclusters from a given context. Only the online version of the algorithm was applied to the second set of contexts as the computation time of the basic version of the algorithm was too high. To evaluate the time more precisely, for each context there were 5 runs of the algorithms with the average result recorded. 4.1 Datasets Synthetic datasets. As it was mentioned, two sets of synthetic contexts were generated. First five contexts have the same size, but different average densities. The sets of objects, attributes, and conditions of these contexts consist of 50 elements each (thus, the maximal number of triples for them is equal to 125,000). To form the relation I a pseudo-random number generator was used. It added each triple to the context with the given probability that was different for each context. These probabilities were: 0.02, 0.04, 0.06, 0.08, and 0.1. The second set of uniform synthetic contexts consists of 10 contexts with the same probability for each triple to be included (0.001), but with different sizes of the sets of objects, attributes, and conditions. These sizes were 100, 200, 300, . . . , 1000. IMDB. This dataset consists of Top-250 list of the Internet Movie Database (250 best movies based on user reviews). For the analysis the following triadic context was extracted: the set of objects consists of movie names, the set of attributes — of tags, the set of conditions — of genres, and a triple of the ternary relation means that the given movie has the given genre and is assigned the given tag. Bibsonomy. Finally, a sample of the data of bibsonomy.org was used. This website allows users to share bookmarks and lists of literature and tag them. For the research the following triadic context was extracted: the set of objects consists of users, the set of attributes (tags), the set of conditions (bookmarks), and a triple of the ternary relation means that the given user has assigned the given tag to the given bookmark. The table 1 contains the summary of the contexts. 
4.2 Results The experiments were conducted on the computer running under Windows 8, us- ing Intel Core i7-3517U 2.40 GHz processor, having 8 GB RAM. The algorithms 238 8 Dmitry V. Gnatyshak et al.Dmitry V. Gnatyshak et al. Table 1. Contexts for the experiments Context |G| |M | |B| # triples Density Synthetic1 , 0.02 50 50 50 2530 0.02024 Synthetic1 , 0.04 50 50 50 5001 0.04001 Synthetic1 , 0.06 50 50 50 7454 0.05963 Synthetic1 , 0.08 50 50 50 10046 0.08037 Synthetic1 , 0.1 50 50 50 12462 0.09970 Synthetic2 , 100 100 100 100 996 0.001 Synthetic2 , 200 200 200 200 7995 0.001 Synthetic2 , 300 300 300 300 27161 0.001 Synthetic2 , 400 400 400 400 63921 0.001 Synthetic2 , 500 500 500 500 125104 0.001 Synthetic2 , 600 600 600 600 216021 0.001 Synthetic2 , 700 700 700 700 343157 0.001 Synthetic2 , 800 800 800 800 512097 0.001 Synthetic2 , 900 900 900 900 729395 0.001 Synthetic2 , 1000 1000 1000 1000 1000589 0.001 IMDB 250 795 22 3818 0.00087 BibSonomy 51 924 2844 3000 0.000022 were implemented in C# under .NET Framework 4.5. Jenkins’ hash-function [11] was used to generate hash-values. Figure 3 shows the time performance of both versions of the algorithms for different values of minimal density threshold. Figure 4 shows the computation time for the online version of the algorithm on the second set of synthetic con- texts. “Basic” graph refers to the average time required by the basic algorithm, “Online, algorithm” — to average time required by the main algorithm part of the online algorithm (addition of new triples), “Online, total” — to the aver- age time required by both the main algorithm and post-processing. Table 4.2 contains summary of the results for the case of zero minimal threshold. As it can be clearly seen from all the graphs, online version of the algorithm significantly outperforms the basic version. However, post-processing in case of non-zero minimal density threshold can minimize the difference, especially in cases with small sets of objects, attributes, and conditions and large ternary relation. In the case of several contexts of the fixed size, but increasing density, total computation time converges to the same value for the both algorithms, with the time for the online one being slightly smaller. For the non-zero minimal density threshold this convergence takes place for almost any average density value. In this case there is a rather large number of triclusters of big size, with many intersections, thus it takes much time to calculate all the triclusters’ densities. This situation is close to the worst case, where time complexity is O(|G||M ||B|) for the main algorithm (because |I| converges to |G||M ||B|) and O(|I||G||M ||B|) for the post-processing. Also, in the case where the context’s A One-Pass Triclustering A One-pass Triclustering Approach: IsApproach: There anyIs Room There any Room for Big for Big Data? Data? 239 9 Fig. 3. Results of the experiments for both versions of OAC-triclustering algorithm 240 10 Dmitry V. Gnatyshak et al.Dmitry V. Gnatyshak et al. Fig. 4. Computation time for the online algorithm for various numbers of triples density getting closer to 1, total time for both algorithms should be almost the same even in the case of zero minimal density threshold, as in the worst case for dense contexts |I| is equal to |G||M ||B| (though it is an extremely rare case for real datasets). The results for the second set of synthetic contexts confirm that the algorithm is indeed linear with respect to the number of triples. 
The results for the second set also show that a large number of triples does not affect the performance as long as the context fits in memory.

As for the other datasets, with large sets of objects, attributes, and conditions and a small ternary relation, the online algorithm significantly outperforms the basic one. The basic version spends much time enumerating the large number of combinations of elements of the different sets of the context, while the online one simply passes through the existing triples. The time needed to compute densities is quite small for these datasets since, due to their sparseness, they contain a small number of rather small triclusters.

Finally, as can be seen, for non-dense contexts the average density of triclusters is rather high even in the case of a zero minimal density threshold. Because of that, in most cases it is advisable to use the online version of the algorithm without any hard constraints, such as a minimal density condition: the results will still be good, while the performance will be significantly improved.

Table 2. Tricluster sets summary

Context            Number of triclusters   Average density
Synthetic1, 0.02                    2456             0.700
Synthetic1, 0.04                    4999             0.426
Synthetic1, 0.06                    7453             0.286
Synthetic1, 0.08                   10046             0.218
Synthetic1, 0.1                    12462             0.193
Synthetic2, 100                      897             0.993
Synthetic2, 200                     6972             0.972
Synthetic2, 300                    23645             0.941
Synthetic2, 400                    56584             0.909
Synthetic2, 500                   113041             0.871
Synthetic2, 600                   199210             0.834
Synthetic2, 700                   322447             0.796
Synthetic2, 800                   487982             0.759
Synthetic2, 900                   703374             0.722
Synthetic2, 1000                  973797             0.686
IMDB                                1276             0.539
BibSonomy                           1290             0.946

5 Conclusion

In this paper we have presented an online version of the OAC-triclustering algorithm. We have shown that the algorithm is efficient from both theoretical and practical points of view. Its linear time complexity and its one-pass operation (with an additional pass for the required post-processing) allow it to be used for big data problems. Moreover, the online algorithm, like the basic one, can easily be parallelized to attain even greater efficiency.

Acknowledgements. The study was implemented in the framework of the Basic Research Program at the National Research University Higher School of Economics in 2013–2014, in the Laboratory of Intelligent Systems and Structural Analysis (Russian Federation), and in LIMOS (Laboratoire d'Informatique, de Modélisation et d'Optimisation des Systèmes) (France). The first three authors were partially supported by the Russian Foundation for Basic Research, grant no. 13-07-00504.

References

1. Cerf, L., Besson, J., Nguyen, K.N., Boulicaut, J.F.: Closed and noise-tolerant patterns in n-ary relations. Data Min. Knowl. Discov. 26(3), 574–619 (2013)
2. Davey, B.A., Priestley, H.A.: Introduction to Lattices and Order. Cambridge University Press, 2 edn. (2002)
3. Eren, K., Deveci, M., Kucuktunc, O., Catalyurek, Umit V.: A comparative analysis of biclustering algorithms for gene expression data. Briefings in Bioinform. (2012)
4. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1st edn. (1999)
5. Gnatyshak, D.V., Ignatov, D.I., Kuznetsov, S.O.: From triadic FCA to triclustering: Experimental comparison of some triclustering algorithms. In: Ojeda-Aciego, M., Outrata, J. (eds.) CLA.
CEUR Workshop Proceedings, vol. 1062, pp. 249–260. CEUR-WS.org (2013) 6. Gnatyshak, D.V., Ignatov, D.I., Semenov, A.V., Poelmans, J.: Gaining insight in social networks with biclustering and triclustering. In: BIR. Lecture Notes in Business Information Processing, vol. 128, pp. 162–171. Springer (2012) 7. Ignatov, D.I., Kuznetsov, S.O., Poelmans, J.: Concept-based biclustering for in- ternet advertisement. In: ICDM Workshops. pp. 123–130. IEEE Computer Society (2012) 8. Ignatov, D.I., Kuznetsov, S.O., Poelmans, J., Zhukov, L.E.: Can triconcepts be- come triclusters? International Journal of General Systems 42(6), 572–593 (2013) 9. Ignatov, D.I., Nenova, E., Konstantinova, N., Konstantinov, A.V.: Boolean Matrix Factorisation for Collaborative Filtering: An FCA-Based Approach. In: Agre, G., et al. (eds.) AIMSA 2014, Varna, Bulgaria, September 11-13, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8722, pp. 47–58. Springer (2014) 10. Jelassi, M.N., Yahia, S.B., Nguifo, E.M.: A personalized recommender system based on users’ information in folksonomies. In: Carr, L., et al. (eds.) WWW (Companion Volume). pp. 1215–1224. ACM (2013) 11. Jenkins, B.: A hash function for hash table lookup (2006), http://www. burtleburtle.net/bob/hash/doobs.html 12. Kaytoue, M., Kuznetsov, S.O., Macko, J., Napoli, A.: Biclustering meets triadic concept analysis. Ann. Math. Artif. Intell. 70(1-2), 55–79 (2014) 13. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Inf. Sci. 181(10), 1989– 2001 (2011), http://dx.doi.org/10.1016/j.ins.2010.07.007 14. Leskovec, J., Rajaraman, A., Ullman, J.: Mining of Massive Datasets, chap. Find- ing Similar Items, pp. 71–128. Cambridge University Press, England, Cambridge (2010) 15. Li, A., Tuck, D.: An effective tri-clustering algorithm combining expression data with gene regulation information. Gene regulation and systems biology 3, 49–64 (2009), http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2758278/ 16. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: A survey. IEEE/ACM Trans. Comput. Biology Bioinform. 1(1), 24–45 (2004) 17. Mirkin, B.: Mathematical Classification and Clustering. Kluwer, Dordrecht (1996) 18. Mirkin, B.G., Kramarenko, A.V.: Approximate bicluster and tricluster boxes in the analysis of binary data. In: Kuznetsov, S.O., et al. (eds.) RSFDGrC 2011. Lecture Notes in Computer Science, vol. 6743, pp. 248–256. Springer (2011) 19. Nanopoulos, A., Rafailidis, D., Symeonidis, P., Manolopoulos, Y.: Musicbox: Per- sonalized music recommendation based on cubic analysis of social tags. IEEE Transactions on Audio, Speech & Language Processing 18(2), 407–412 (2010) 20. Wille, R.: Restructuring lattice theory: An approach based on hierarchies of con- cepts. In: Rival, I. (ed.) Ordered Sets, NATO Advanced Study Institutes Series, vol. 83, pp. 445–470. Springer Netherlands (1982) 21. Zhao, L., Zaki, M.J.: Tricluster: An effective algorithm for mining coherent clusters in 3d microarray data. In: Özcan, F. (ed.) SIGMOD Conference. pp. 694–705. ACM (2005) Three Related FCA Methods for Mining Biclusters of Similar Values on Columns Mehdi Kaytoue1 , Victor Codocedo2 , Jaume Baixieres3 , and Amedeo Napoli2 1 Université de Lyon. CNRS, INSA-Lyon, LIRIS. UMR5205, F-69621, France. 2 LORIA (CNRS - Inria Nancy Grand Est - Université de Lorraine), B.P. 239, F-54506, Vandœuvre-lès-Nancy. 3 Universitat Politècnica de Catalunya. 08032, Barcelona. Catalonia. 
Corresponding author : mehdi.kaytoue@insa-lyon.fr Abstract. Biclustering numerical data tables consists in detecting par- ticular and strong associations between both subsets of objects and at- tributes. Such biclusters are interesting since they model the data as local patterns. Whereas there exists several definitions of biclusters, de- pending on the constraints they should respect, we focus in this paper on biclusters of similar values on columns. There are several ad hoc methods for mining such biclusters in the literature. We focus here on two aspects: genericity and efficiency. We show that Formal Concept Analysis pro- vides a mathematical framework to characterize them in several ways, but also to compute them with existing and efficient algorithms. The proposed methods, which rely on pattern structures and triadic concept analysis, are experimented and compared on two different datasets. Keywords: biclustering, triadic concept analysis, pattern structure 1 Introduction Biclustering has attracted a lot of attention for many years now, as it was used in an extensive way for mining biological data [7]. Given a data-table with objects as rows and attributes as columns, the goal is to find “sub-tables”, or pairs of both subsets of objects and attributes, such that the values in the subtables respect well-defined constraints or maximize a given measure [17]. There exist several types of biclusters depending on the relation the values should respect. For example, constant biclusters are subtables with equal val- ues [12, 6, 17]. Biclusters with similar values on columns (BSVC) are subtables where all values are pairwise similar for each column [4, 17]. The latter can also be generalized to biclusters of similar values (BSV): any two values in the sub- table are similar [2, 3, 12, 21]. Dozens of algorithms, mostly ad hoc, have been proposed for computing the different types of biclusters. In this paper, we are interested in possible extensions of the Formal Concept Analysis (FCA) for- malism for achieving the problem of biclustering. This comes with two goals: (i) formalizing and understanding biclusters formation and structure, and (ii) reusing existing algorithms for genericity purposes. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 243–255, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 244 Mehdi Kaytoue, Victor Codocedo, Jaume Baixeries and Amedeo Napoli Actually, the present paper is in continuation with the work of the authors on the use of pattern structures –an extension of FCA for mining complex data [8, 12]– for discovering functional dependencies in a crisp and a fuzzy settings [1], and as well on the adaptation of pattern structures to a specific biclustering task: the discovery of biclusters of type BSV [6, 11]. Moreover, the biclustering task is usually considered as a “‘two-dimensional” (2D) process where biclusters are rectangles in a table verifying some prior constraints. It was one main idea of [11] to transpose the problem in a “three-dimensional” setting by using and adapting triadic concept analysis [16] to the biclustering task. Here we follow the same line and we propose a new approach for discovering biclusters in a numerical dataset where biclusters have “similar values” w.r.t. their columns (type BSVC). This works is a new attempt to extend the capabil- ities of FCA and of pattern structures, in dealing with the important problem of biclustering. 
Actually, biclustering can be also considered in a (pure) numerical setting, where it is sometimes called coclustering [18] and where kernel or spec- tral methods are often used for achieving the task. Here we keep the discrete setting and more precisely an FCA-based setting. The rest of this paper is organized as follows. In Section 2 we formally in- troduce the biclustering problem. Then, we recall in Section 3 the FCA basics that are necessary for developing our three methods in Section 4. We experiment with these methods and compare them by processing two real-world datasets in Section 5 before concluding. 2 Problem Definition We introduce the problem of mining biclusters of similar values on columns, or simply biclusters when no confusion can be made. A numerical dataset is defined as a many-valued context in which biclusters are denoted as pairs of object and attribute subsets for which a particular similarity constraint holds. Definition 1 (Many-valued context and numerical dataset). A many- valued context consists in a quadruple (G, M, W, I) where G is a set of objects, M a set of attributes, W a set of attribute values, and I ⊆ G × M × W a ternary relation. An element (g, m, w) ∈ I, also written m(g) = w or g(m) = w, can be interpreted as: w is the value taken by the attribute m for the object g. The relation I is such that g(m) = w and g(m) = v implies w = v. In the present work, W is a set of numbers and Knum = (G, M, W, I) denotes a numerical dataset, i.e. a many-valued context where W is a set of numbers. m1 m2 m3 m4 Example. A tabular representation of a numer- g1 1 2 2 8 ical dataset is given in Table 1: objects G = g2 2 1 2 9 g3 2 1 1 2 {g1 , g2 , g3 , g4 , g5 } are represented by rows while at- g4 1 0 7 6 tributes M = {m1 , m2 , m3 , m4 } are represented by g5 6 6 6 7 columns. W = {0, 1, 2, 6, 7, 8, 9} and we have for ex- ample g2 (m4 ) = 9. Fig. 1. A numerical dataset Three FCA Methods for Mining Biclusters of Similar Values on Columns 245 Definition 2 (Biclusters with similar values on columns). Given a nu- merical dataset (G, M, W, I), a pair (A, B) (where A ⊆ G, B ⊆ M ) is called a bicluster of similar values on columns when the following statement holds: ∀g, h ∈ A, ∀m ∈ B, m(g) 'θ m(h) where 'θ is a similarity relation: ∀w1 , w2 ∈ W, θ ∈ [0, max(W ) − min(W )], w1 'θ w2 ⇐⇒ |w1 − w2 | ≤ θ. A bicluster (A, B) is maximal if @g ∈ G\A such that (A ∪ {g}, B) is a bicluster, and @m ∈ M \B such that (A, B ∪ {m}) is a bicluster. Example. In Table 1, with θ = 1, we have that (A, B) = ({g1 , g2 }, {m1 , m2 , m3 }) is a bicluster. Indeed, consider each attribute of B separately: the values taken by the objects A are pairwise similar. However, (A, B) is not maximal, since we have that both (A ∪ {g3 }, B) and (A, B ∪ {m4 }) are also biclusters. Then, ({g1 , g2 , g3 }, {m1 , m2 , m3 }) and ({g1 , g2 }, {m1 , m2 , m3 , m4 }) are both maximal. Problem (Biclustering). Given a numerical dataset (G, M, W, I) and a simi- larity parameter θ, the goal of biclustering is to extract the set of all maximal biclusters (A, B) respecting the similarity constraint. Remark. It should be noticed that in the formal definition, the similarity pa- rameter is the same for all attributes. It is possible however to use a different parameter for each attribute without changing neither the problem definition or its resolution. For real-world datasets, one can choose different similarity param- eters θm (∀m ∈ M ), but also can normalize/scale the attribute domains and use a single similarity parameter θ. 
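To make Definitions 1 and 2 concrete, the short sketch below (ours, not part of the paper) encodes the dataset of Table 1 and checks the similarity and maximality conditions for θ = 1; it reproduces the example above.

```python
data = {  # numerical dataset of Table 1: data[g][m] = m(g)
    'g1': {'m1': 1, 'm2': 2, 'm3': 2, 'm4': 8},
    'g2': {'m1': 2, 'm2': 1, 'm3': 2, 'm4': 9},
    'g3': {'m1': 2, 'm2': 1, 'm3': 1, 'm4': 2},
    'g4': {'m1': 1, 'm2': 0, 'm3': 7, 'm4': 6},
    'g5': {'m1': 6, 'm2': 6, 'm3': 6, 'm4': 7},
}

def is_bicluster(A, B, theta=1):
    """(A, B) is a bicluster of similar values on columns iff, for every
    attribute of B, the values taken by the objects of A are pairwise similar."""
    return all(abs(data[g][m] - data[h][m]) <= theta
               for m in B for g in A for h in A)

def is_maximal(A, B, theta=1):
    """No object or attribute can be added without breaking the constraint."""
    return (is_bicluster(A, B, theta)
            and not any(is_bicluster(A | {g}, B, theta) for g in data.keys() - A)
            and not any(is_bicluster(A, B | {m}, theta) for m in data['g1'].keys() - B))

print(is_bicluster({'g1', 'g2'}, {'m1', 'm2', 'm3'}))       # True, but
print(is_maximal({'g1', 'g2'}, {'m1', 'm2', 'm3'}))         # False: g3 or m4 can be added
print(is_maximal({'g1', 'g2', 'g3'}, {'m1', 'm2', 'm3'}))   # True
print(is_maximal({'g1', 'g2'}, {'m1', 'm2', 'm3', 'm4'}))   # True
```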
3 Basics on Formal Concept Analysis In this paper, we show how our biclustering problem can be formalized and answered in FCA in different ways: (i) using standard FCA [9], (ii) using pattern structures [8], and (iii) using triadic concept analysis [16]. We recall below the basics of each approach. Dyadic Concept Analysis. Let G be a set of objects, M a set of attributes and I ⊆ G × M be a binary relation. The fact (g, m) ∈ I is interpreted as “g has attribute m”. The two following derivation operators (·)0 are defined: A0 = {m ∈ M | ∀g ∈ A : gIm} f or A ⊆ G, 0 B = {g ∈ G | ∀m ∈ B : gIm} f or B ⊆ M which define a Galois connection between the powersets of G and M . For A ⊆ G, B ⊆ M , a pair (A, B) such that A0 = B and B 0 = A, is called a (formal) concept. Concepts are partially ordered by (A1 , B1 ) ≤ (A2 , B2 ) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1 ). With respect to this partial order, the set of all formal concepts forms a complete lattice called the concept lattice of the formal context (G, M, I). For a concept (A, B) the set A is called the extent and the set B the intent of the concept. 246 Mehdi Kaytoue, Victor Codocedo, Jaume Baixeries and Amedeo Napoli Triadic Concept Analysis. A triadic context is given by (G, M, B, Y ) where G, M , and B are respectively called sets of objects, attributes and conditions, and Y ⊆ G × M × B. The fact (g, m, b) ∈ Y is interpreted as the statement “Ob- ject g has the attribute m under condition b”. A (triadic) concept of (G, M, B, Y ) is a triple (A1 , A2 , A3 ) with A1 ⊆ G, A2 ⊆ M and A3 ⊆ B satisfying the two following statements: (i) A1 ×A2 ×A3 ⊆ Y , X1 ×X2 ×X3 ⊆ Y and (ii) A1 ⊆ X1 , A2 ⊆ X2 and A3 ⊆ X3 implies A1 = X1 , A2 = X2 and A3 = X3 . If (G, M, B, Y ) is represented by a three dimensional table, (i) means that a concept stands for a 3-dimensional rectangle full of crosses while (ii) characterizes component-wise maximality of concepts. For a triadic concept (A1 , A2 , A3 ), A1 is called the ex- tent, A2 the intent and A3 the modus. To derive triadic concepts, two pairs of derivation operators are defined. The reader can refer to [16] for their definitions which are not necessary for the understanding of the present work. Pattern Structures. Let G be a set of objects, let (D, u) be a meet-semi- lattice of potential object descriptions and let δ : G −→ D be a mapping. Then (G, (D, u), δ) is called a pattern structure. Elements of D are called patterns and are ordered by a subsumption relation v such that given c, d ∈ D one has c v d ⇐⇒ cud = c. Within the pattern structure (G, (D, u), δ) we can define the following derivation operators (·) , given A ⊆ G and a description d ∈ (D, u): l A = δ(g) d = {g ∈ G|d v δ(g)} g∈A These operators form a Galois connection between (℘(G), ⊆) and (D, v). (Pat- tern) concepts of (G, (D, u), δ) are pairs of the form (A, d), A ⊆ G, d ∈ (D, u), such that A = d and A = d . For a pattern concept (A, d), d is called a pattern intent and is the common description of all objects in A, called pattern extent. When partially ordered by (A1 , d1 ) ≤ (A2 , d2 ) ⇔ A1 ⊆ A2 (⇔ d2 v d1 ), the set of all concepts forms a complete lattice called a (pattern) concept lattice. Computing Concepts and Concept Lattices. Processing a formal context in order to generate its set of concepts can be achieved by various algorithms (see [15] for a survey and a comparison, see also itemset mining [19]). For pro- cessing pattern structures, such algorithms generally need minor adaptations. 
Basically, one needs to override the code for (i) computing the intersection of any two arbitrary descriptions, and (ii) test the ordering between two descrip- tions. Processing a triadic context is however not so direct and can be done with nested FCA algorithms [10] or dedicated data-mining algorithm [5]. Similarity relations in FCA. The notion of similarity can be formalized by a tolerance relation: a symmetric, reflexive but not necessarily transitive relation. The similarity relation 'θ used for defining biclusters of similar values is a toler- ance. Given W a set of numbers, any maximal subset of pairwise similar values is called a block of tolerance. Definition 3. A binary relation T ⊆ W × W is called a tolerance relation if: (i) ∀x ∈ W xT x (reflexivity) (ii) ∀x, y ∈ W xT y → yT x (symmetry) Three FCA Methods for Mining Biclusters of Similar Values on Columns 247 Definition 4. Given a set W , a subset K ⊆ W , and a tolerance relation T on W , K is a block of tolerance if: (i) ∀x, y ∈ K xT y (pairwise similarity) (ii) ∀z 6∈ K, ∃u ∈ K ¬(zT u) (maximality) It is shown that tolerance blocks can be obtained from the formal context of a tolerance relation [14]. In the context (W, W, 'θ ), one can characterize all blocks of tolerance K (and only them) as formal concepts (K, K). 4 Mining biclusters of similar values on columns in FCA The basic notions of FCA of the previous section allow us now to answer our biclustering problem in various ways with: (i) an original method using inter- val pattern structure, (ii) a recently introduced method using partition pattern structures [6], and (iii) an original method relying on triadic concept analysis. We emphasize the genericity of FCA to answer a data mining problem. 4.1 Interval Pattern Structure Approach For a dataset Knum = (G, M, W, I), an interval pattern structure (G, (D, u), δ) is defined as follows [13]: the objects from G are described by vectors of intervals, where each dimension gives a range of values for an attribute m ∈ M (following a canonical ordering of the dimensions, i.e. dimension i corresponds to attribute mi ∈ M ). Then, for m ∈ M , the semi-lattice of intervals (Dm , um ) is given by: Dm = {[w1 , w2 ] | ∃g, h ∈ G s.t. m(g) = w1 and m(h) = w2 } [a, b] um [c, d] = [min(a, c), max(b, d)] c um d = c ⇐⇒ c vm d [a, b] vm [c, d] ⇐⇒ [c, d] ⊇ [a, b] The description space (D, u) of the interval pattern structure is a product of meet-semi-lattices (D, u) = ×m∈M (Dm , um ) which is a semi-lattice. Examples. In Table 1, ({g1 , g2 , g3 }, h[1, 2], [1, 2], [1, 2], [2, 9]i) is a pattern concept: δ(g1 ) = h[1, 1], [2, 2], [2, 2], [8, 8]i {g1 , g2 , g3 } = δ(g1 ) u δ(g2 ) u δ(g3 ) = h[1, 2], [1, 2], [1, 2], [2, 9]i h[1, 2], [1, 2], [1, 2], [8, 9]i v h[1, 2], [1, 2], [1, 2], [2, 9]i {g1 , g2 , g3 } = {g1 , g2 , g3 } We now give the intuitive idea on how the interval pattern concept lattice can be used to characterize the biclusters. Consider first the concept (A1 , d1 ) = ({g1 , g2 }, h[1, 2], [1, 2], [1, 2], [8, 9]i). Consider also a function attr : D → M which returns for an interval pattern the set of attributes whose interval is not larger than the θ parameter, for d = h[ai , bi ]i, i ∈ [1, |M |]: attr(d) = {mi ∈ M |ai 'θ bi }. (A1 , attr(d1 )) = ({g1 , g2 }, {m1 , m2 , m3 , m4 }) is a maximal bicluster. Con- sider the interval pattern concept (A2 , d2 ) = ({g1 , g2 , g3 }, h[1, 2], [1, 2], [1, 2], [2, 9]i): (A2 , attr(d2 )) = ({g1 , g2 , g3 }, {m1 , m2 , m3 }) is a maximal bicluster (with θ = 1). 
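A compact sketch (ours) of these interval-pattern operations on the Table 1 dataset follows; the helper names delta, meet, common_description, and attr are ours, with common_description playing the role of the pattern-structure derivation operator applied to a set of objects and attr as defined above.

```python
from functools import reduce

data = {'g1': dict(m1=1, m2=2, m3=2, m4=8), 'g2': dict(m1=2, m2=1, m3=2, m4=9),
        'g3': dict(m1=2, m2=1, m3=1, m4=2), 'g4': dict(m1=1, m2=0, m3=7, m4=6),
        'g5': dict(m1=6, m2=6, m3=6, m4=7)}          # dataset of Table 1
ATTRS = ['m1', 'm2', 'm3', 'm4']

def delta(g):
    """Interval description of an object: one degenerate interval per attribute."""
    return [(data[g][m], data[g][m]) for m in ATTRS]

def meet(d1, d2):
    """Component-wise interval meet: [a,b] u [c,d] = [min(a,c), max(b,d)]."""
    return [(min(a, c), max(b, d)) for (a, b), (c, d) in zip(d1, d2)]

def common_description(A):
    """Meet of the descriptions of all objects in A (their common pattern)."""
    return reduce(meet, (delta(g) for g in A))

def attr(d, theta=1):
    """Attributes whose interval in d is not wider than theta."""
    return {m for m, (a, b) in zip(ATTRS, d) if b - a <= theta}

d = common_description(['g1', 'g2', 'g3'])
print(d)        # [(1, 2), (1, 2), (1, 2), (2, 9)]
print(attr(d))  # m1, m2, m3: the maximal bicluster ({g1,g2,g3}, {m1,m2,m3})
```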
This means that biclusters can be characterized thanks to pattern concepts. 248 Mehdi Kaytoue, Victor Codocedo, Jaume Baixeries and Amedeo Napoli Proposition 1. Consider a numerical dataset (G, M, W, I) as an interval pat- tern structure (G, (D, u), δ). For any maximal bicluster (A, B), there exists a pattern concept (A, d) such that (A, B) = (A, attr(d)). Proof. To ease reading, the proof is given in an appendix. t u 4.2 Partition pattern structure approach A partition pattern structure is a pattern structure instance where the de- scription space is given by a semi-lattice of partitions over a set S X [2]. Formally, we have (G, (D, u), δ) where: D = P art(X) and d1 u d2 = pi ∩ pj where pi , pj ⊆ X, pi ∈ d1 , pj ∈ d2 . The semi-lattice is actually a complete lattice of set partitions in which the bottom element is not considered. In [1], we showed that the definition of u, and equivalently v, needs a slight modification when K D = 22 , i.e. a description d ∈ D is a set of subsets of X, and they doScover X (possibly with overlapping). In that case, we have that d1 u d2 = max( pi ∩ pj ) where pi , pj ⊆ X, pi ∈ d1 , pj ∈ d2 and max(.) returns the maximal sets w.r.t. inclusion. Now we show that such a pattern structure can be constructed from a nu- merical dataset, and that the corresponding concepts allow to generate all max- imal biclusters. From a numerical dataset (G, M, W, I), we build the structure G (M, (D, u), δ) where D = 22 . The description of an object4 m ∈ M is given by: δ(m) = {p1 , p2 , ...} where p1 , p2 , .. ⊆ G and: m(g1 ) 'θ m(g2 ), ∀g1 , g2 ∈ pi (similarity) @g3 ∈ G\pi with m(g3 ) 'θ m(gk ), ∀gk ∈ pi (maximality) [ pi = G (covering) i In other words, each original attribute m ∈ M is described by a family of subsets of G, where each one corresponds to a block of tolerance w.r.t. the values of attribute m. Let (A, d = {pi }) be a partition pattern concept, it is easy to see how the pairs bici = (pi , A) are biclusters with rows g ∈ pi and columns m ∈ A5 . While any bici = (pi , A) is a bicluster, it is not necessarily a maximal bicluster. Nevertheless, maximal biclusters can be identified using the concept lattice. Proposition 2. Consider a pattern concept (A, d = {pi }). The bicluster bici = (pi , A) is maximal if there is no pattern concept (C, {pi , ...}) with A ⊆ C. Proof. The proof to this proposition is very intuitive. Recall from Section 2 that the bicluster (pi , A) is maximal if two conditions are met, namely @g ∈ G\pi such that (pi ∪ {g}, A) is a bicluster and @m ∈ M \A such that (pi , A ∪ {m}) is 4 Object in the pattern structure; attribute in the numerical dataset. 5 In order to keep consistency with the previous notation, biclusters are written in- versely as partition pattern concepts. Three FCA Methods for Mining Biclusters of Similar Values on Columns 249 a bicluster, The first condition holds for bici given the maximality condition of the tolerance block pi ; The second follows from the proposition declaration. tu Example. The numerical dataset (G, M, W, I) given in Table 1 can be turned into a pattern structure as follows with θ = 1: δ(m1 ) = {{g1 , g2 , g3 , g4 }{g5 }} δ(m2 ) = {{g2 , g3 , g4 }{g1 , g2 , g3 }{g5 }} δ(m3 ) = {{g1 , g2 , g3 }{g4 , g5 }} δ(m4 ) = {{g4 , g5 }{g1 , g5 }{g1 , g2 }{g3 }} Indeed, each component of a description is a maximal set of objects hav- ing pairwise similar values for a given attribute. The pattern concept lattice is given in Figure 2. 
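The descriptions δ(m) used here are exactly the blocks of tolerance of each attribute column. Because the similarity relation is a numeric threshold, the blocks can be computed by a sort-and-sweep instead of general maximal-clique enumeration. The sketch below (ours) recovers δ(m4) for the running example with θ = 1.

```python
def tolerance_blocks(column, theta=1):
    """Maximal sets of objects whose values in one column are pairwise similar
    (|w1 - w2| <= theta), i.e. the blocks of tolerance of that column."""
    objs = sorted(column, key=column.get)        # objects ordered by value
    candidates = [frozenset(h for h in objs[i:] if column[h] - column[g] <= theta)
                  for i, g in enumerate(objs)]
    # keep only the maximal candidates (those not strictly contained in another)
    return [b for b in candidates if not any(b < c for c in candidates)]

m4 = {'g1': 8, 'g2': 9, 'g3': 2, 'g4': 6, 'g5': 7}   # column m4 of Table 1
for block in tolerance_blocks(m4):
    print(sorted(block))
# {g3}, {g4,g5}, {g1,g5}, {g1,g2}: delta(m4) in the example above
```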
We remark that (i) any concept corresponds to a biclus- ter, (ii) some of them correspond to a maximal bicluster, and most impor- tantly, (iii) any maximal bicluster can be found as a concept. For example, from the concept (A1 , d1 ) = ({m3 , m4 }, {{g1 , g2 }, {g4 , g5 }, {g3 }}) we obtain the following biclusters: bic1 = ({g1 , g2 }, {m3 , m4 }) and bic2 = ({g4 , g5 }, {m3 , m4 }). Whereas bic2 is a maximal bicluster bic1 is not since we have that (A2 , d2 ) = ({m1 , m2 , m3 , m4 }, {{g1 , g2 }, {g3 }, {g4 }, {g5 }}) with (A2 , d2 ) ≤ (A1 , d1 ). In turn, bic3 = ({g1 , g2 }, {m1 , m2 , m3 , m4 }) is a maximal bicluster. Remark. It is noticeable that an equivalent formal context can be built. By equivalent, we mean that the concept lattices produced by both structures are isomorphic. To obtain this formal context, we use a slight modification of the data transformation of [9] (pp. 92): (M, B2 (G), I) st. (m, (g, h)) ∈ I ⇐⇒ m(g) 'θ m(h). The concept lattice is equivalent to the pattern concept lattice [2], and thus it can be used in the same way to get maximal biclusters. In our running example, such context is given in Table 1, and its associated concept lattice is given in Figure 2 (right), a lattice isomorphic to the one raised from the pattern structure (left). The proof can be done in a similar manner as it is done in [2]. (g1 , g2 ) (g1 , g3 ) (g1 , g4 ) (g1 , g5 ) (g2 , g3 ) (g2 , g4 ) (g2 , g5 ) (g3 , g4 ) (g3 , g5 ) (g4 , g5 ) m1 × × × × × × m2 × × × × × m3 × × × × m4 × × Table 1. Formal context 4.3 Triadic Concept Analysis Approach We present another original result: any maximal bicluster of similar values is characterized as a triadic concept. The triadic context is derived from the nu- merical dataset by encoding the tolerance relation between the values. Proposition 3. Given a numerical dataset (G, M, W, I), consider the derived triadic context given by (M, G, G, Y ) s.t. (m, g1 , g2 ) ∈ Y ⇐⇒ m(g1 ) 'θ m(g2 ). 250 Mehdi Kaytoue, Victor Codocedo, Jaume Baixeries and Amedeo Napoli Fig. 2. Pattern concept lattice on the left side, concept lattice of the right side. There is a one-to-one correspondence between the set of all maximal biclusters (A, B), the set of all triadic concepts (B, A, A) of the derived context. Proof. Consider a maximal bicluster (A, B). We have that ∀g, h ∈ A : m(g) 'θ m(h) ⇐⇒ m ∈ B, if and only if (by the definition of Y ) (B, A, A) ⊆ Y . We now take (B 0 , A0 , A0 ) ⊆ Y such that B ⊆ B 0 and A ⊆ A0 . Since (A, B) is a maximal bicluster, we have that for any pair of objects g, h ∈ A0 and m ∈ B 0 such that g(m) 'θ h(m), implies that g, h ∈ A and m ∈ B. Let (B, A, A) be a triadic concept. We have that for any pair of objects g, h ∈ A and m ∈ B we have that g(m) 'θ h(m), this is, that ∀g, h ∈ A : g(m) 'θ h(m) ⇐⇒ m ∈ B, which is the alternative definition of maximal bicluster. t u Example. Taking again θ = 1, the triadic context derived from the numerical dataset from Table 1 is given in Table 2. An example of triadic concept is: ({m3 , m2 , m1 }, {g1 , g3 , g2 }, {g1 , g2 , g3 }) which is in turn the maximal bicluster ({g1 , g3 , g2 }, {m3 , m2 , m1 }). 5 Experiments We experiment with the different FCA methods introduced in the previous sec- tion. We report preliminary results in two aspects: efficiency (running time) and compactness (number of concepts) to discuss the strengths and weaknesses of the different methods. 
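The scaling of Proposition 3 is immediate to implement: for every attribute one records which pairs of objects take similar values. The sketch below (ours) derives the triadic context of the running example and checks the triadic concept mentioned above; it should agree with the triadic context shown in Table 2 below.

```python
from itertools import product

data = {'g1': dict(m1=1, m2=2, m3=2, m4=8), 'g2': dict(m1=2, m2=1, m3=2, m4=9),
        'g3': dict(m1=2, m2=1, m3=1, m4=2), 'g4': dict(m1=1, m2=0, m3=7, m4=6),
        'g5': dict(m1=6, m2=6, m3=6, m4=7)}          # numerical dataset of Table 1

def derived_triadic_context(data, theta=1):
    """Y = {(m, g, h) | the values m(g) and m(h) differ by at most theta}."""
    objects = list(data)
    attributes = list(data['g1'])
    return {(m, g, h)
            for m in attributes
            for g, h in product(objects, objects)
            if abs(data[g][m] - data[h][m]) <= theta}

Y = derived_triadic_context(data)
# The triadic concept ({m1,m2,m3}, {g1,g2,g3}, {g1,g2,g3}) is a "box" full of crosses:
box = set(product(['m1', 'm2', 'm3'], ['g1', 'g2', 'g3'], ['g1', 'g2', 'g3']))
print(box <= Y)   # True: it corresponds to the maximal bicluster ({g1,g2,g3}, {m1,m2,m3})
```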
m1 g1 g2 g3 g4 g5 m2 g1 g2 g3 g4 g5 m3 g1 g2 g3 g4 g5 m4 g1 g2 g3 g4 g5 g1 × × × × g1 × × × g1 × × × g1 × × × g2 × × × × g2 × × × × g2 × × × g2 × × g3 × × × × g3 × × × × g3 × × × g3 × g4 × × × × g4 × × × g4 × × g4 × × g5 × g5 × g5 × × g5 × × × Table 2. Triadic context derived from Table 1 thanks to '1 . Three FCA Methods for Mining Biclusters of Similar Values on Columns 251 Data and experimental settings. The first dataset, “Diagnosis”6 , contains 120 objects with 8 attributes. The first attribute provides temperature informa- tion of a given patient with a range [35.5, 41.5] (numerical). For this attribute we used θ = 0.1 and then θ = 0.3. The other 7 attributes are binary (θ = 0). The second dataset, “dataSample 1.txt”, is provided with the BiCat software7 . It contains 420 objects and 70 numerical attributes with range [−5.9, 6.7]. We used θ = 0.05 for all attributes. We provide results in Table 3 for the three dif- ferent FCA methods discussed in this article, namely interval pattern structure (IPS), tolerance blocks/partition pattern structures (TBPS) and triadic concept analysis (TCA). We also report on the use of standard FCA using the discretiza- tion technique discussed at the end of Section 4.2 (FCA). We also discuss the computing of clarified contexts, given that it can dramatically reduce the size of the context while keeping the same concept lattice (FCA-CL). A context is clarified when there exists neither two objects with the same description, or two attributes shared by the same set of objects. For the methods based on FCA and pattern structures (IPS, TBPS), we used a C++ version of the AddIntent algorithm [20]8 . No restrictions were imposed over the size of the biclusters. The TCA method was implemented using Data- Peeler [5]. All the experiments were performed using a Linux machine with Intel Xeon E7 running at 2.67GHz with 1TB of RAM. Discussion. Results in Table 3 show that for the Diagnosis dataset, the clar- ified context using standard FCA (FCA-CL) is the best of the five methods w.r.t. execution time while for the BicAt sample 1, the best is TCA. Times are expressed as the sum of the time required to create the input representation of the dataset for the corresponding technique and its execution. In the case of FCA and FCA-CL, the pre-processing can be as high as the time required for applying the AddIntent algorithm. However, for large datasets such as the BicAt example, this times can be ignored. It is also worth noticing that the pre-processing depends on the chosen θ value, hence for each different θ config- uration, a new pre-processing task has to be executed. This is not the case for interval and partition pattern structures the pre-processing of which is linear 6 http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice 7 http://www.tik.ee.ethz.ch/sop/bicat/ 8 https://code.google.com/p/sephirot/ Diagnosis BicAt sample 1 θ = 0.3 θ = 0.1 θ = 0.05 Technique Time [s] #Concepts Exec. Time [s] #Concepts Exec. Time [s] #Concepts Preproc + Exec. Preproc + Exec. Preproc + Exec. FCA 0.11 + 0.335 98 0.11 + 0.291 88 2.3 + 2,220 476,950 FCA-CL 0.11 + 0.02 98 0.11 + 0.011 88 2.3 + 2,220 476,950 TCA 0.04 + 33.3 3,322 0.04 + 31.34 2,127 3.17 + 360 741,421 IPS 0.011 + 0.303 928 0.001 + 0.178 301 0.02 + 2,340 722,442 TBPS 0.011 + 1.76 98 0.001 + 0.411 88 0.02 + 5,340 476,950 Table 3. Number of concepts and execution times (pre-processing + addIntent run) 252 Mehdi Kaytoue, Victor Codocedo, Jaume Baixeries and Amedeo Napoli w.r.t. 
the number of objects (it is actually, just a change of format). We can also appreciate a more compact representation of the biclusters by the use of partition pattern structures (TBPS) and its formal context versions (FCA and FCA-CL). While TBPS is the slowest of the five methods, it is also the cheapest one in terms of the use of machine resources, more specifically RAM. TCA is the more expensive method in terms of machine resources and data representation, however this yields results faster. Interval pattern structures are in the middle as a good trade-off of compactness and execution time. For this initial experimentation we have not reported the number of maximal biclusters nor the bicluster extraction algorithms that can be implemented for each different technique, but only in the FCA techniques themselves. Regarding the number of maximal biclusters, this is the same for each technique since all of them are bicluster enumeration techniques, i.e. all possible biclusters are extracted. Hence, the difference among techniques is not given by the number of maximal biclusters extracted, but by the number of formal concepts found and their post-processing complexity to extract the maximal biclusters from them. In general, it is easy to observe from Propositions 1, 2 and 3 that the post-processing of TCA is linear w.r.t. the number of triadic concepts found, while for TPS is linear w.r.t. the number of interval pattern concepts times the number of columns of the numerical dataset squared and for TBPS is linear w.r.t. the number of super-sub concept relations in the tolerance block pattern concept lattice. Nevertheless, different strategies for bicluster extraction can be implemented for each technique rendering the comparison unfair. For example, in [6] an optimization is proposed regarding biclustering using partition pattern structures (which can be easily adapted to TBPS) which cuts in half its execution time by breaking the structure of the lattice. Similar strategies for IPS and TCA could also be implemented but are still a matter of research. 6 Conclusion Biclustering is an important data analysis task that is used in several appli- cations such as transcriptome analysis in biology and for the design of recom- mender systems. Biclustering methods produce a collection of local patterns that are easier to interpret than a global model. There are several types of biclus- ters and corresponding algorithms, ad hoc most of the time. In this paper, our main contribution shows how the biclusters of similar values on columns can be characterized or generated from formal concepts, pattern concepts and triadic concepts. Bringing back this problem of biclustering into formal concept anal- ysis settings allows the usage of existing and efficient algorithms without any modifications. However, and this is among the perspectives of research, several optimizations can be made. For example, with the triadic method, one should not generate both concepts (A, B, C) and (A, C, B): they are redundant since only concepts with B = C correspond to maximal biclusters. Three FCA Methods for Mining Biclusters of Similar Values on Columns 253 References 1. J. Baixeries, M. Kaytoue, and A. Napoli. Computing similarity dependencies with pattern structures. In M. Ojeda-Aciego and J. Outrata, editors, CLA, volume 1062 of CEUR Workshop Proceedings, pages 33–44. CEUR-WS.org, 2013. 2. J. Baixeries, M. Kaytoue, and A. Napoli. Characterizing Functional Dependencies in Formal Concept Analysis with Pattern Structures. 
Annals of Mathematics and Artificial Intelligence, pages 1–21, Jan. 2014. 3. J. Besson, C. Robardet, L. D. Raedt, and J.-F. Boulicaut. Mining bi-sets in nu- merical data. In S. Dzeroski and J. Struyf, editors, KDID, volume 4747 of Lecture Notes in Computer Science, pages 11–23. Springer, 2007. 4. A. Califano, G. Stolovitzky, and Y. Tu. Analysis of gene expression microarrays for phenotype classification. In P. E. Bourne, M. Gribskov, R. B. Altman, N. Jensen, D. A. Hope, T. Lengauer, J. C. Mitchell, E. D. Scheeff, C. Smith, S. Strande, and H. Weissig, editors, Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, August 19-23, 2000, La Jolla / San Diego, CA, USA, pages 75–85. AAAI, 2000. 5. L. Cerf, J. Besson, C. Robardet, and J.-F. Boulicaut. Closed patterns meet n-ary relations. TKDD, 3(1), 2009. 6. V. Codocedo and A. Napoli. Lattice-based biclustering using Partition Pattern Structures. In 21st European Conference on Artificial Intelligence (ECAI), 2014. 7. A. V. Freitas, W. Ayadi, M. Elloumi, J. Oliveira, J. Oliveira, and J.-K. Hao. Bio- logical Knowledge Discovery Handbook: Preprocessing, Mining, and Postprocessing of Biological Data, chapter Survey on Biclustering of Gene Expression Data. John Wiley & Sons, Inc., 2013. 8. B. Ganter and S. O. Kuznetsov. Pattern structures and their projections. In ICCS ’01: Proceedings of the 9th International Conference on Conceptual Structures, pages 129–142. Vol. 2120, Springer-Verlag, 2001. 9. B. Ganter and R. Wille. Formal Concept Analysis. Springer, 1999. 10. R. Jäschke, A. Hotho, C. Schmitz, B. Ganter, and G. Stumme. Trias - an algorithm for mining iceberg tri-lattices. In ICDM, pages 907–911, 2006. 11. M. Kaytoue, S. O. Kuznetsov, J. Macko, and A. Napoli. Biclustering meets triadic concept analysis. Annals of Mathematics and Artificial Intelligence, 70(1-2), 2014. 12. M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Biclustering numerical data in formal concept analysis. In P. Valtchev and R. Jäschke, editors, ICFCA, volume 6628 of LNCS, pages 135–150. Springer, 2011. 13. M. Kaytoue, S. O. Kuznetsov, A. Napoli, and S. Duplessis. Mining gene expression data with pattern structures in formal concept analysis. Information Science, 181(10):1989–2001, 2011. 14. S. O. Kuznetsov. Galois connections in data analysis: Contributions from the soviet era and modern russian research. In B. Ganter, G. Stumme, and R. Wille, editors, Formal Concept Analysis, volume 3626 of Lecture Notes in Computer Science, pages 196–225. Springer, 2005. 15. S. O. Kuznetsov and S. A. Obiedkov. Comparing performance of algorithms for generating concept lattices. J. Exp. Theor. Artif. Intell., 14(2-3):189–216, 2002. 16. F. Lehmann and R. Wille. A triadic approach to formal concept analysis. In ICCS, volume 954 of LNCS, pages 32–43. Springer, 1995. 17. S. Madeira and A. Oliveira. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 1(1):24–45, 2004. 254 Mehdi Kaytoue, Victor Codocedo, Jaume Baixeries and Amedeo Napoli 18. N. Rogovschi, L. Labiod, and M. Nadif. A spectral algorithm for topographical co-clustering. In IJCNN, pages 1–6. IEEE, 2012. 19. T. Uno, M. Kiyomi, and H. Arimura. Lcm ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets. In R. J. B. Jr., B. Goethals, and M. J. Zaki, editors, FIMI, volume 126 of CEUR Workshop Proceedings. CEUR-WS.org, 2004. 20. D. van der Merwe, S. Obiedkov, and D. Kourie. 
AddIntent: A New Incremental Al- gorithm for Constructing Concept Lattices. In P. Eklund, editor, Concept Lattices, volume 2961 of LNCS, pages 205–206. Springer, Berlin/Heidelberg, 2004. 21. R. Veroneze, A. Banerjee, and F. J. V. Zuben. Enumerating all maximal biclusters in real-valued datasets. CoRR, abs/1403.3562, 2014. 7 Appendix: Proof of proposition 1 We introduce notations, before to recall and prove Proposition 1 that relates maximal biclusters to interval pattern concepts of a pattern structure. The in- tuition lies in the relation between the set of attributes M of (G, M, W, I)) in an interval pattern structure (G, (D, u), δ). Let d = h[a1 , b1 ], [a2 , b2 ], . . . , [an , bn ]i ∈ D be a pattern interval in an interval pattern structure (G, (D, u), δ), where |M | = n. For any mi ∈ M , we define: d(mi ) = [ai , bi ]. and |d(mi )| = |ai − bi |. Definition 5. Let d be a pattern in an interval pattern structure (G, (D, u), δ). The function attr : D 7→ M is defined as: attr(d) = {m ∈ M | |d(m)| ≤ θ}. Definition 6. Let A ⊆ G be a set of objects and m ∈ M an attribute. We define: A(m) = {g(m) | g ∈ B}. For instance, in Table 1, if A = {g1 , g2 , g3 }, then, A(m4 ) = {2, 8, 9}. Proposition 4. For A ⊆ G, we have that, for all mi ∈ M : A = h[min(A(m1 )), max(A(m1 ))], . . . , [min(A(mn )), max(A(mn ))]i Proof. Since the operation u is associative and commutative, we have that l A = gi = h[min(A(m1 )), max(A(m1 ))], . . . , [min(A(mn )), max(A(mn ))]i gi ∈A t u Now we reformulate and prove the Proposition 1. Proposition 5. Consider a numerical dataset (G, M, W, I) as an interval pat- tern structure (G, (D, u), δ). For any maximal bicluster (A, B), we define: d = A . Then: 1. B = attr(d) and 2. (A, D) is a pattern concept in (G, (D, u), δ). Proof. 1. B = attr(d). We prove that m ∈ attr(b) ↔ m ∈ B. Since B = A , then, by the definition of maximal bicluster we have that ∀m ∈ M : m ∈ B ↔ |A(m)| ≤ θ, if and only if |min(A(m)) − max(A(m))| ≤ θ if and only if (by the definition of d) m ∈ attr(d). t u 2. We need to prove that A = d and that A = d. A = d holds by the definition of d. As for A = d , we take g ∈ d , which means that ∀m ∈ M : g(m) ∈ d(m), also if m ∈ B, which implies that g ∈ A by definition of maximal bicluster. Defining Views with Formal Concept Analysis for Understanding SPARQL Query Results Mehwish Alam2,3 and Amedeo Napoli1,2 1 CNRS, LORIA, UMR 7503, Vandoeuvre-lès-Nancy, F-54506, France 2 Inria, Villers-lès-Nancy, F-54600, France 3 Université de Lorraine, LORIA, UMR 7503, Vandoeuvre-lès-Nancy, F-54506, France {mehwish.alam,amedeo.napoli@loria.fr} Abstract. SPARQL queries over semantic web data usually produce list of tuples as answers that may be hard to understand and interpret. Accordingly, this paper focuses on Lattice-Based View Access (LBVA), a framework based on FCA. This framework provides a classification of the answers of SPARQL queries based on a concept lattice, that can be navigated for retrieving or mining specific patterns in query results. In this way, the concept lattice can be considered as a materialized view of the data resulting from a SPARQL query. Keywords: Formal Concept Analysis, SPARQL Query Views, Lattice-Based Views, SPARQL, Classification. 1 Introduction At present, Web has become a potentially large repository of knowledge, which is becoming main stream for querying and extracting useful information. In partic- ular, Linked Open Data (LOD) [2] provides a method for publishing structured data in the form of RDF resources. 
These RDF resources are interlinked with each other to form a cloud. SPARQL queries are used in order to make these resources usable, i.e., queried. In some cases, queries in natural language against standard search engines can be simple to use but sometimes they are complex and may require integration of data sources. Then the standard search engines will not be able to easily answer these queries, e.g., Currencies of all G8 coun- tries. Such a complex query can be formalized as a SPARQL query over data sources present in LOD cloud through SPARQL endpoints for retrieving answers. Moreover, users may sometimes execute queries which generate huge amount of results giving rise to the problem of information overload [5]. A typical example is given by the answers retrieved by search engines, which mix between several meanings of one keyword. In case of huge results, user will have to go through a lot of results to find the interesting ones, which can be overwhelming with- out any specific navigation tool. Same is the case with the answers obtained by SPARQL queries, which are huge in number and it may be harder to extract the most interesting patterns. This problem of information overload raises new c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 255–267, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 256 Mehwish Alam and Amedeo Napoli challenges for data access, information retrieval and knowledge discovery w.r.t web querying. Accordingly, this paper proposes a new approach based on Formal Concept Analysis (FCA [7])s. It describes a lattice-based classification of the results ob- tained by SPARQL queries by introducing a new clause VIEW BY in SPARQL query. This framework, called Lattice-Based View Access (LBVA), allows the classification of SPARQL query results into a concept lattice, referred to as a view, for data analysis, navigation, knowledge discovery and information retrieval purposes. This new clause VIEW BY which enhances the functionality of already existing GROUP BY clause in SPARQL query by adding sophisticated classification and Knowledge Discovery aspects. Here after, we describe how a lattice-based view can be designed from a SPARQL query. Afterwards, a view is accessed for analysis and interpretation purposes which are totally supported by the concept lattice. In case of large data only a part of the lattice [10] can be considered for the analysis. In this way, this paper investigates also the capabilities of FCA to deal with semantic web data. The intuition of classifying results obtained by SPARQL queries is inspired by web clustering engines [3] such as Carrot24 . The general idea behind web clustering engines is to group the results obtained by query posed by the user based on the different meanings of the terms related to a query. Such systems deal with unstructured textual data on web. By contrast, there are some stud- ies conducted to deal with structured RDF data. In [5], the authors introduce a clause Categorize By to target the problem of managing large amounts of results obtained by conjunctive queries with the help of subsumption hierarchy present in the knowledge base. By contrast, the VIEW BY clause generates lattice- based views which provide a mathematically well-founded classification based on formal concepts and an associated concept lattice. 
Moreover, it also paves way for navigation or information retrieval by traversing the concept lattice and for data analysis by allowing the extraction of association rules from the lattice. Such data analysis operations allow discovery of new knowledge. Additionally, unlike Categorize By, VIEW BY can deal with data that has no schema (which is often the case with linked data). Moreover, VIEW BY has been evaluated over very large set of answers (roughly 100,000 results) obtained over real datasets. In case of larger number of answers, Categorize By does not provide any pruning mechanism while this paper describes how the views can be pruned using iceberg lattices. The paper is structured as follows: Section 2 introduces a motivating exam- ple. Section 3 gives a brief introduction of the state of the art while Section 4 defines LBVA and gives the overall architecture of the framework. Section 5 dis- cusses some experiments conducted using LBVA. Finally, Section 6 concludes the paper. 4 http://project.carrot2.org/index.html Defining Views with FCA for Understanding SPARQL Query Results 257 2 Motivation In this section we introduce a motivating example focusing on why LOD should be queried and why the SPARQL query results need classification. This scenario will continue in the rest of the paper. Let us consider that a query Q searching for museums where the exhibition of some famous artists is taking place along with the location of the museum. Here, we do not discuss the interface aspects and we will assume that SPARQL queries are provided. A standard query engine is not adequate for answering such kind of questions and a direct query over LOD will give better results. One of the ways to obtain such an information is to query LOD through its SPARQL endpoint. This query will generate a huge amount of results, which will need further manual work to group the interesting links. 3 Background 3.1 Linked Open Data Linked Open Data (LOD) [2] is the way of publishing structured data in the form of RDF graphs. Given a set of URIs U, blank nodes B and literals L, an RDF triple is represented as t = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where s is a subject, p is a predicate and o is an object. A finite set of RDF triples is called as RDF Graph G such that G = (V, E), where V is a set of vertices and E is a set of labeled edges and G ∈ G, such that G = (U ∪ B) × U × (U ∪ B ∪ L). Each pair of vertices connected through a labeled edge keeps the information of a statement. Each statement is represented as hsubject, predicate, objecti referred to as an RDF Triple. V includes subject and object while E includes the predicate. SPARQL5 is the standard query language for RDF. In the current work we will focus on the queries containing SELECT clause. Let us assume that there exists a set of variables V disjoint from U in the above definition of RDF, then (U ∪ V) × (U ∪ V) × (U ∪ V) is a graph pattern called a triple pattern. If a variable ?X ∈ V and ?X = c then c ∈ U . Given U , V and a triple pattern t a mapping µ(t) would be the triple obtained by replacing variables in t with U . [[.]]G takes an expression of patterns and returns a set of mappings. Given a mapping µ : V → U and a set of variables W ⊆ V , µ is represented as µ|W , which is described as a mapping such that dom(µ|W ) = dom(µ) ∩ W and µ|W (?X) = µ(?X) for every ?X ∈ dom(µ) ∩ W . Finally, the SPARQL SELECT query is defined as follows: Definition 1. 
A SPARQL SELECT query is a tuple (W, P ), where P is a graph pattern and W is a set of variables such that W ⊆ var(P ). The answer of (W, P ) over an RDF graph G, denoted by [[(W, P )]]G , is the set of mappings: [[(W, P )]]G = {µ|W |µ ∈ [[P ]]G } 5 http://www.w3.org/TR/rdf-sparql-query/ 258 Mehwish Alam and Amedeo Napoli In Definition 1, var(P ) is the set of variables in pattern P and W is the set of variables in SELECT clause. Here, P includes the triple patterns containing variables. This triple pattern is then evaluated against the RDF Graph G given as [[P ]]G . It returns a set of mappings with respect to the variables in var(P ). Finally a projection over µ is done w.r.t. the variables in W . The projected set of mappings obtained as represented as µ|W . Further details on the formalization and foundations of RDF databases are discussed in [1]. Example 1. Continuing the scenario in section 2, following is the SPARQL query: 1 SELECT ?museum ?country ?artist WHERE { 2 ?museum rdf:type dbpedia-owl:Museum . 3 ?museum dbpedia-owl:location ?city . 4 ?city dbpedia-owl:country ?country . 5 ?painting dbpedia-owl:museum ?museum . 6 ?painting dbpprop:artist ?artist} 7 GROUP BY ?country ?artist This query retrieves the list of museums along with the artists whose work is exhibited in a museum along with the location of a museum. Lines 5 and 6 retrieve information about the artists whose work is displayed in some museum. More precisely, the page containing the information on a museum (?museum) is connected to the page of the artists (?artist) through a page on the work of artist (?painting) displayed in the museum. In order to integrate these three re- sources, two predicates were used dbpedia-owl:museum and dbpprop:artist. An excerpt of the answers obtained by Group by clause is shown below: Pablo Picasso Musee d’Art Moderne France Leonardo Da Vinci Musee du Louvre France Raphael Museo del Prado Spain The problem encountered while browsing such an answer is that there are too many statements to navigate through. Even after using the GROUP BY clause the answers are not organized in any ordered structure. By contrast, the clause VIEW BY activates the LBVA framework, where the user will obtain a classification of the statements as a concept lattice where statements are partially ordered (see Figure 1a). To obtain the museums in UK displaying the work of Goya, all the museums displaying the work of Goya can be retrieved and then the specific concept containing Goya and UK is obtained by navigation. The answer obtained is National Gallery in the example. 3.2 Formal Concept Analysis (FCA) As the basics of Formal Concept Analysis (FCA) [7] are well known, we only introduce some of the concepts which are necessary to understand this paper. FCA is a mathematical framework used for a number of purposes, among which classification and data analysis, information retrieval and knowledge discovery [4]. In some cases we obtain a huge number of concepts. In order to restrict the Defining Views with FCA for Understanding SPARQL Query Results 259 (a) Classes of Museums w.r.t Artists and Countries, e.g., the concept on the top left corner with the attribute France contains (b) Classes of Artists w.r.t Museums and all the French Museums, i.e., Musee du Louvre (Louvre) and Countries. (VIEW BY ?artist) Musee d’Art Moderne (MAM). (VIEW BY ?museum) Fig. 1: Lattice-Based Views w.r.t Museum’s and Artist’s Perspective . number of concepts, iceberg concept lattices can be used [10]. 
Iceberg concept lattices contain only the top most part of the lattice. Along with iceberg lattices a stability index [9] is also used for filtering the concepts. The stability index shows how much the concept intent depends on particular objects of the extent. FCA also allows knowledge discovery using association rules. An implication over the attribute set M in a formal context is of the form B1 → B2 , where B1 , B2 ⊆ M . The implication holds iff every object in the context with an attribute in B1 also has all the attributes in B2 . For example, when (A1 , B1 ) ≤ (A2 , B2 ) in the lattice, we have that B1 → B2 . Duquenne-Guigues (DG) basis for implications [8] is the minimal set of implications equivalent to the set of all valid implications for a formal context K = (G, M, I). Actually, the DG-basis contains all information lying in the concept lattice. 4 Lattice-Based View Access 4.1 SPARQL Queries with Classification Capabilities The idea of introducing a VIEW BY clause is to provide classification of the results and add a knowledge discovery aspect to the results w.r.t the vari- ables appearing in VIEW BY clause. Let Q be a SPARQL query of the form Q = SELECT ?X ?Y ?Z WHERE {pattern P} VIEW BY ?X then the set of variables V = {?X, ?Y, ?Z} 6 . According to the definition 1 the answer of the tuple (V, P ) is represented as [[({?X, ?Y, ?Z}, P )]] = µi where i ∈ {1, . . . , k} and k is the number of mappings obtained for the query Q. For the sake of simplicity, µ|W is given as µ. Here, dom(µi ) = {?X, ?Y, ?Z} which means that µ(?X) = Xi , 6 As W represents set of attribute values in the definition of a many-valued formal context, we represent the variables in select clause as V to avoid confusion. 260 Mehwish Alam and Amedeo Napoli µ(?Y ) = Yi and µ(?Z) = Zi . Finally, a complete set of mappings can be given as {{?X → Xi , ?Y → Yi , ?Z → Zi }}. The variable appearing in the VIEW BY clause is referred to as object variable7 and is denoted as Ov such that Ov ∈ V . In the current scenario Ov = {?X}. The remaining variables are referred to as attribute variables and are denoted as Av where Av ∈ V such that Ov ∪ Av = V and Ov ∩ Av = ∅, so, Av = {?Y, ?Z}. Example 2. Following the example in section 2, an alternate query with the VIEW BY clause can be given as: SELECT ?museum ?artist ?country WHERE { ?museum rdf:type dbpedia-owl:Museum . ?museum dbpedia-owl:location ?city . ?city dbpedia-owl:country ?country . ?painting dbpedia-owl:museum ?museum . ?painting dbpprop:artist ?artist} VIEW BY ?museum ?museum ?artist ?country µ1 Musee d’Art Moderne Pablo Picasso France µ2 Museo del Prado Raphael Spain .. .. .. .. . . . . Table 1: Generated Mappings for SPARQL Query Q Here, V ={?museum, ?artist, ?country} and P is the conjunction of pat- terns in the WHERE clause then the evaluation of [[({?museum, ?artist, ?country} , P )]] will generate the mappings shown in Table 1. Accordingly, dom(µi ) = {?museum, ?artist, ?country}. Here, µ1 (?museum) = M usee d0 Art M oderne, µ1 (?artist) = P ablo P icasso and µ1 (?country) = F rance. We have Ov = {?museum} because it appears in the VIEW BY clause and Av = {?artist, ?country}. Figure 1a shows the generated view when Ov = {?museum} and in Figure 1b, we have; Ov = {?artist} and Av = {?museum, ?country}. 4.2 Designing a Formal Context of Answer Tuples The results obtained by the query are in the form of set of tuples, which are then organized as a many-valued context. 
Obtaining a Many-Valued Context (G, M, W, I): As described previously, we have Ov = {?X} then µ(?X) = {Xi }i∈{1,...,k} , where Xi denote the values obtained for the object variable and the corresponding mapping is given as {{?X → Xi }}. Finally, G = µ(?X) = {Xi }i∈{1,...,k} . Let Av = {?Y, ?Z} then M = Av and the attribute values W = {µ(?Y ), µ(?Z)} = {{Yi }, {Zi }}i∈{1,...,k} . The corresponding mapping for attribute variables are {{?Y → Yi , ?Z → Zi }}. 7 The object here refers to the object in FCA. Defining Views with FCA for Understanding SPARQL Query Results 261 In order to obtain a ternary relation, let us consider an object value gi ∈ G and an attribute value wi ∈ W then we have (gi , “?Y 00 , wi ) ∈ I iff ?Y (gi ) = wi , i.e., the value of gi for attribute ?Y is wi , i ∈ {1, . . . , k} as we have k values for ?Y . Obtaining Binary Context (G, M, I): Afterwards, a conceptual scaling used for binarizing the many-valued context, in the form of (G, M, I). Finally, we have G = {Xi }i∈{1,...,k} , M = {Yi } ∪ {Zi } where i ∈ {1, . . . , k} for object variable Ov = {?X}. The binary context obtained after applying the above transforma- tions to the SPARQL query answers w.r.t to object variable is called the formal context of answer tuples and is denoted by Ktuple . Example 3. In the example Ov = {?museum}, Av = {?artist, ?country}. The answers obtained by this query are organized into a many-valued context as follows: the distinct values of the object variable ?museum are kept as a set of objects, so G = {M useeduLouvre, M useodelP rado, . . . }, attribute variables provide M = {artist, country}, W1 = {Raphael, LeonardoDaV inci, . . . } and W2 = {F rance, Spain, U K, . . . } in a many-valued context. The obtained many- valued context is shown in Table 2. Finally, the obtained many-valued context is conceptually scaled to obtain a binary context shown in Table 3. Museum Artist Country Musee du Louvre {Raphael, Leonardo Da Vinci, Caravaggio} {France} Musee d’Art Moderne {Pablo Picasso} {France} Museo del Prado {Raphael, Caravaggio, Francisco Goya} {Spain} National Gallery {Leonardo Da Vinci, Caravaggio, Francisco Goya} {UK} Table 2: Many-Valued Context (Museum). Artist Country Museum Raphael Da Vinci Picasso Caravaggio Goya France Spain UK Musee du Louvre × × × × Musee d’Art Moderne × × Museo del Prado × × × × National Gallery × × × × Table 3: Formal Context Ktuple w.r.t ?museum. The organization of the concept lattice is depending on the choice of ob- ject variable and the attribute variables. Then, to group the artists w.r.t the museums where their work is displayed and the location of the museums, the object variable would be ?artist and the attribute variables will be ?museum and ?country. Then, the scaling can be performed for obtaining a formal con- text. In order to complete the set of attribute, domain knowledge can also be taken into account, such as the the ontology related to the type of artists or mu- seums. This domain knowledge can be added with the help of pattern structures, an approach linked to FCA, on top of many-valued context without having to perform scaling. For the sake of simplicity, we do not discuss it in this paper. 4.3 Building a Concept Lattice Once the context is designed, the concept lattice can be built using an FCA algo- rithm.There are some very efficient algorithms that can be used [7, 11]. However, 262 Mehwish Alam and Amedeo Napoli in the current implementation we use AddIntent [11] which is an incremental concept lattice construction algorithm. 
In case of large data iceberg lattices can be considered [10]. The use of VIEW BY clause activates the process of LBVA, which transforms the SPARQL query answers (tuples) to a formal context Ktuples through which a concept lattice is obtained which is referred to as a Lattice-Based View. A view on SPARQL query in section 2, i.e, a concept lattice corresponding to Table 3 is shown in Figure 1a. 4.4 Interpretation Operations over Lattice-Based Views A formal context effectively takes into account the relations by keeping the inherent structure of the relationships present in LOD as object-attribute re- lation. When we build a concept lattice, each concept keeps a group of terms sharing some attribute (i.e., the relationship with other terms). This concept lattice can be navigated for searching and accessing particular LOD elements through the corresponding concepts within the lattice. It can be drilled down from general to specific concepts or rolled up to obtain the general ones which can be further interpreted by the domain experts. For example, in order to search for the museums where there is an exhibition of the paintings of Caravaggio, the concept lattice in Figure 1(a) is explored levelwise. It can be seen that the paintings of Caravaggio are displayed in Musee du Louvre, Museo del Prado and National Gallery. Now it can be further filtered by country, i.e., look for French museums displaying Caravaggio. The same lattice can be drilled down and Musee du Louvre as an answer can be retrieved. Next, to check the museums located in France and Spain, the roll up operation from the French Museums to the general concept containing all the museums with Caravaggio’s painting can be applied and then the drill down operation to Museums in France or Spain displaying Caravaggio can be performed. The answer obtained will be Musee du Louvre and Museo del Prado. A different perspective on the same set of answers can also be retrieved, meaning that the group of artists w.r.t museums and country. For selecting French museums according to the artists they display, the object variable will be Ov = {?artist} and attribute variables will be Av = {?museum, ?country}. The lattice obtained in this case will be from Artist’s perspective (see Figure 1b). Now, it is possible to retrieve Musee du Louvre and Musee d’Art Moderne, which are the French museums and to obtain a specific French museum displaying the work of Leonardo Da Vinci a specific concept can be selected which gives the answer Musee du Louvre. FCA provides a powerful means for data analysis and knowledge discovery. VIEW BY can be seen as a clause that engulfs the original SPARQL query and enhances it’s capabilities by providing views which can be reduced using ice- berg concept lattices. Iceberg lattices provide the top most part of the lattice filtering out only general concepts. The concept lattice is still explored levelwise depending on a given threshold. Then, only concepts whose extent is sufficiently large are explored, i.e., the support of a concept corresponds to the cardinal of the extent. If further specific concepts are required the support threshold of the Defining Views with FCA for Understanding SPARQL Query Results 263 iceberg lattices can be lowered and the resulting concept lattice can be explored levelwise. 
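The support-based filtering behind iceberg views can be sketched as follows (an illustrative snippet; the (extent, intent) pairs are a hand-picked subset of the concepts of the lattice of Table 3, with museum names abbreviated):

concepts = [
    ({"Louvre", "ArtMod", "Prado", "NatGal"}, set()),
    ({"Louvre", "Prado", "NatGal"}, {"Caravaggio"}),
    ({"Louvre", "ArtMod"}, {"France"}),
    ({"Louvre", "Prado"}, {"Raphael", "Caravaggio"}),
    ({"Louvre"}, {"Raphael", "Da Vinci", "Caravaggio", "France"}),
]
num_objects = 4                              # |G| for the context of Table 3

def iceberg(concepts, min_support):
    # Keep only the concepts whose relative extent size reaches the threshold.
    return [(ext, itt) for ext, itt in concepts
            if len(ext) / num_objects >= min_support]

for ext, itt in iceberg(concepts, 0.5):      # keeps the four most general concepts
    print(sorted(ext), "|", sorted(itt))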
Knowledge Discovery: Among the means provided by FCA for knowledge discovery, the Duquenne-Guigues basis of implications is a minimal set of implications which represents all the implications (i.e., association rules with confidence 1) that can be obtained by accessing the view, i.e., the concept lattice. For example, implications according to Figure 1(a) state that all the museums in the current context which display Leonardo Da Vinci also display Caravaggio (rule: Leonardo Da Vinci → Caravaggio). It also says that only the museums which display the work of Caravaggio display the work of Leonardo Da Vinci. Such a rule can be interesting if the museums which display the work of both Leonardo Da Vinci and Caravaggio are to be retrieved. The rule Goya, Raphael, Caravaggio → Spain states that any museum which holds works of Goya, Raphael and Caravaggio is located in Spain; in the current context this museum is precisely Museo del Prado. (These rules are generated from only the part of the SPARQL query answers shown as a context in Table 3.)

5 Experimentation

The experiments were conducted on a real dataset. Our algorithm is implemented in Java using the Jena platform (https://jena.apache.org/), and the experiments were conducted on a laptop with a 2.60 GHz Intel Core i5 processor and 3.7 GB RAM running Ubuntu 12.04. We extracted information about movies together with their genre and location using a SPARQL query enhanced with the VIEW BY clause. The experiment shows that even though the background knowledge (ontological information) was not extracted, the views reveal the hidden hierarchical information contained in the SPARQL query answers and can be navigated accordingly. Moreover, it also shows that useful knowledge is extracted from the answers through the views using the DG-basis of implications. We also performed a quantitative analysis in which we discuss the sparsity of semantic web data, and we tested how our method scales with a growing number of results. The number of answers obtained from YAGO was 100,000. The resulting view kept the classes of movies with respect to genre and location.

5.1 YAGO

The construction of the YAGO ontology is based on the extraction of instances and hierarchical information from Wikipedia and WordNet. In the current experiment, we sent a query to YAGO with the VIEW BY clause.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://yago-knowledge.org/resource/>
SELECT ?movie ?genre ?location WHERE {
  ?movie rdf:type yago:wordnet_movie_106613686 .
  ?movie yago:isLocatedIn ?location .
  ?movie rdf:type ?genre . }
VIEW BY ?movie

While querying YAGO it was observed that the genre and location information was also given in the ontology. The first level of the view obtained over the SPARQL query results from YAGO kept the groups of movies with respect to their languages, e.g., the movies with genre Spanish Language Films. However, as we drill further down in the concept lattice we get more specific categories which include the values from the location variable, such as Spain, Argentina and Mexico. Separate classes were obtained for movies based on novels, which were then further specialized by the introduction of the country attribute as we drill down the concept lattice. Finally, with the help of lattice-based views, it can be concluded that the answers obtained by querying YAGO provide a clean categorization of movies by making use of the partially ordered relation between the concepts present in the concept lattice.
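Since VIEW BY is not part of standard SPARQL, a query such as the one above has to be pre-processed before being sent to the endpoint. The following hypothetical sketch (the actual system is written in Java on top of Jena; the function name and the splitting strategy are assumptions for illustration) strips the VIEW BY clause and remembers the object variable for the later construction of Ktuple:

import re

def split_view_by(extended_query):
    # Separate "... VIEW BY ?var" into (standard SPARQL part, object variable).
    match = re.search(r"VIEW\s+BY\s+(\?\w+)\s*$", extended_query, re.IGNORECASE)
    if match is None:
        return extended_query, None          # plain SPARQL, no view requested
    return extended_query[:match.start()].rstrip(), match.group(1)

extended = """SELECT ?movie ?genre ?location WHERE {
  ?movie yago:isLocatedIn ?location .
  ?movie rdf:type ?genre . }
VIEW BY ?movie"""

plain_sparql, object_variable = split_view_by(extended)
print(object_variable)    # ?movie -> O_v; the other SELECT variables form A_v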
DG-Basis of Implications: The DG-basis of implications for YAGO was calculated. The implications were filtered in three ways. Firstly, pruning was performed naively with respect to a support threshold; around 200 rules were extracted at a support threshold of 0.2%. Secondly, in order to make the rules easier to inspect, a filtering based on the number of elements in the body of the rules was applied: all the implications which contained a single item in the body were selected. However, if there is still a large number of implications to be observed, a third type of pruning can be applied which involves the selection of implications with different attribute types in head and body, e.g., in rule #1 the head contains United States, which is of type country, while the body contains a wikicategory. This kind of pruning helps in finding attribute–attribute relations. Table 4 contains some of the implications. Calculating the DG-basis of implications is actually useful for finding regularities in the SPARQL query answers which cannot be discovered from the raw tuples obtained. For example, rule #1 states that RKO Pictures is an American film production and distribution company, as all the movies produced and distributed by it are from the United States. Moreover, rule #2 says that all the movies in the Oriya language are from India. This points to the fact that Oriya is one of the many languages spoken in India; within the extracted data, Oriya-language films are located only in India. Rule #3 shows a link between a category from Wikipedia and WordNet, which clearly says that the wikicategory is more specific than the WordNet category, as remake is more general than Film remakes.

Impl. ID  Supp.  Implication
1.        96     wikicategory RKO Pictures films → United States
2.        46     wikicategory Oriya language films → India
3.        64     wikicategory Film remakes → wordnet remake
Table 4: Some implications from the DG-Basis of Implications (YAGO)

[Fig. 2: Experimental Results. (a) Density of K_YAGO (in %) w.r.t. the number of tuples (in %); (b) Runtime (in seconds) for building L_YAGO w.r.t. the number of tuples (in %).]

5.2 Evaluation

Besides the qualitative evaluation of LBVA, we performed an empirical evaluation. The characteristics of the dataset are shown in Table 5. The concepts were pruned with the help of iceberg lattices and stability for the qualitative analysis. The plots for the experimentation are shown in Figure 2. Figure 2(a) shows a comparison between the number of tuples obtained and the density of the formal context. The density of the formal context is the proportion of pairs in I w.r.t. the size of G × M. It has a very low range in our experiments, i.e., from 0.14% to 0.28%. This means in particular that semantic web data is very sparse when considered as a formal context and deviates from the datasets usually considered for FCA (which are dense). We can see that as the number of tuples increases, the density of the formal context decreases, which means that the sparsity of the data increases. Figure 2(b) illustrates the execution time for building the concept lattice w.r.t. the number of tuples obtained.
The execution time ranges from 20 to 100 seconds, it means that the the concept lattices were built in an efficient way and large data can be consid- ered for these kinds of experiments. Usually the computation time for building concept lattices depends on the density of the formal context but in the case of semantic web data, as the density is not more than 1%, the computation completely depends on the number of objects obtained which definitely increase with the increase in the number of tuples (see Table 5). No. of Tuples |G| |M | No. of Concepts 20% 3657 2198 7885 40% 6783 3328 19019 60% 9830 4012 31264 80% 12960 4533 43510 100% 15272 4895 55357 Table 5: Characteristics of Datasets (YAGO) 6 Conclusion and Discussion In LBVA, we introduce a classification framework based on FCA for the set of tuples obtained as a result of SPARQL queries over LOD. In this way, a view 266 Mehwish Alam and Amedeo Napoli is organized as a concept lattice built through the use of VIEW BY clause that can be navigated where information retrieval and knowledge discovery can be performed. Several experiments show that LBVA is rather tractable and can be applied to large data. For future work, we are interested in extending the VIEW BY clause by in- cluding the available background knowledge of the resources using the formalism of pattern structures [6]. Moreover, we intend to use implications for complet- ing the background knowledge. We also intend to use pattern structures with a graph description for each considered object, where the graph is the set of all triples accessible w.r.t reference object. References 1. Marcelo Arenas, Claudio Gutierrez, and Jorge Pérez. Foundations of rdf databases. In Sergio Tessaris, Enrico Franconi, Thomas Eiter, Claudio Gutierrez, Siegfried Handschuh, Marie-Christine Rousset, and Renate A. Schmidt, editors, Reasoning Web, volume 5689 of Lecture Notes in Computer Science, pages 158–204. Springer, 2009. 2. Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked data - the story so far. Int. J. Semantic Web Inf. Syst., 5(3):1–22, 2009. 3. Claudio Carpineto, Stanislaw Osiński, Giovanni Romano, and Dawid Weiss. A survey of web clustering engines. ACM Comput. Surv., 41(3):17:1–17:38, 2009. 4. Claudio Carpineto and Giovanni Romano. Concept data analysis - theory and applications. Wiley, 2005. 5. Claudia d’Amato, Nicola Fanizzi, and Agnieszka Lawrynowicz. Categorize by: Deductive aggregation of semantic web query results. In Lora Aroyo, Grigoris An- toniou, Eero Hyvönen, Annette ten Teije, Heiner Stuckenschmidt, Liliana Cabral, and Tania Tudorache, editors, ESWC (1), volume 6088 of Lecture Notes in Com- puter Science, pages 91–105. Springer, 2010. 6. Bernhard Ganter and Sergei O. Kuznetsov. Pattern structures and their projec- tions. In Harry S. Delugach and Gerd Stumme, editors, ICCS, volume 2120 of Lecture Notes in Computer Science, pages 129–142. Springer, 2001. 7. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foun- dations. Springer, Berlin/Heidelberg, 1999. 8. J.-L. Guigues and V. Duquenne. Familles minimales d’implications informatives résultant d’un tableau de données binaires. Mathématiques et Sciences Humaines, 95:5–18, 1986. 9. Sergei O. Kuznetsov. On stability of a Formal Concept. Ann. Math. Artif. Intell., 49(1-4):101–115, 2007. 10. Gerd Stumme, Rafik Taouil, Yves Bastide, and Lotfi Lakhal. Conceptual cluster- ing with iceberg concept lattices. In R. Klinkenberg, S. Rüping, A. Fick, N. Henze, C. Herzog, R. Molitor, and O. Schröder, editors, Proc. 
GI-Fachgruppentreffen Maschinelles Lernen (FGML’01), Universität Dortmund 763, October 2001. 11. Dean van der Merwe, Sergei A. Obiedkov, and Derrick G. Kourie. Addintent: A new incremental algorithm for constructing concept lattices. In Peter W. Eklund, editor, ICFCA, Lecture Notes in Computer Science, pages 372–385. Springer, 2004. A generalized framework to consider positive and negative attributes in formal concept analysis. J. M. Rodriguez-Jimenez, P. Cordero, M. Enciso and A. Mora Universidad de Málaga, Andalucı́a Tech, Spain. {pcordero,enciso}@uma.es {amora,jmrodriguez}@ctima.uma.es Abstract. In Formal Concept Analysis the classical formal context is analized taking into account only the positive information, i.e. the pres- ence of a property in an object. Nevertheless, the non presence of a prop- erty in an object also provides a significant knowledge which can only be partially considered with the classical approach. In this work we have modified the derivation operators to allow the treatment of both, positive and negative attributes which come from respectively, the presence and absence of the properties. In this work we define the new operators and we prove that they are a Galois connection. Finally, we have also studied the correspondence between the formal context in the new framework and the extended concept lattice, providing new interesting properties. 1 Introduction Data analysis of information is a well established discipline with tools and tech- niques well developed to challenge the identification of hide patterns in the data. Data mining, and general Knowledge Discovering, helps in the decision mak- ing process using pattern recognition, clustering, association and classification methods. One of the popular approaches used to extract knowledge is mining the patterns of the data expressed as implications (functional dependencies in database community) or association rules. Traditionally, implications and similar notions have been built using the posi- tive information, i.e. information induced by the presence of attributes in objects. In Manilla et al. [6] an extended framework for enriched rules was introduced, considering negation, conjunction and disjunction. Rules with negated attributes were also considered in [1]: “if we buy caviar, then we do not buy canned tuna”. In the framework of formal concept analysis, some authors have proposed the mining of implications with positive and negative attributes from the apposition of the context and its negation (K|K) [2, 4]. Working with (K|K) conduits to a huge exponential problem and also as R. Missaoui et.al. shown in [9] real applications use to have sparse data in the context K whereas dense data in K (or viceversa), and therefore “generate a huge set of candidate itemsets and a tremendous set of uninteresting rules”. c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 267–279, ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik University in Košice, 2014. 268 2 Rodriguez-Jimenez José et al. Manuel Rodrı́guez-Jiménez et al. R. Missaoui et al. [7, 8] propose the mining from a formal context K of a subset of all mixed implications, i.e. implication with positive and negative attributes, representing the presence and absence of properties. As far as we know, the approach of these authors uses, for first time in this problem, a set of inference rules to manage negative attributes. 
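The apposition construction can be sketched directly from its description: for every attribute m, a negated column is added and marked exactly for the objects that do not have m. The snippet below is only an illustration on a small made-up context, not on any table of this paper:

def apposition(objects, attributes, incidence):
    # Build (K | K~): one extra "negated" column per attribute, marked for the
    # objects that lack the corresponding positive attribute.
    neg_attributes = [m + "_bar" for m in attributes]
    neg_incidence = {(g, m + "_bar") for g in objects for m in attributes
                     if (g, m) not in incidence}
    return objects, attributes + neg_attributes, incidence | neg_incidence

G = ["o1", "o2"]
M = ["a", "b"]
I = {("o1", "a"), ("o2", "a"), ("o2", "b")}
G2, M2, I2 = apposition(G, M, I)
print(sorted(I2))   # o1 gets b_bar because it lacks b; no object gets a_bar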
In [11] we followed the line proposed by Missaoui and presented an algo- rithm, based on the NextClosure algorithm, that allows to obtain mixed impli- cations. The proposed algorithm returns a feasible and complete basis of mixed implications by performing a reduced number of requests to the formal context. Beyond the benefits provided by the inclusion of negative attributes in terms of expressiveness, Revenko and Kuznetsov [10] use negative attributes to tackle the problem of finding some types of errors in new object intents is introduced. Their approach is based on finding implications from an implication basis of the context that are not respected by a new object. Their work illustrates the great benefit that a general framework for negative and positive attributes would provide. In this work we propose a deeper study of the algebraic framework for Formal Concept Analysis taking into account positive and negative information. The first step is to consider an extension of the classical derivation operators, proving to be Galois connection. As in the classical framework, this fact will allows to built the two usual dual concept lattices, but in this case, as we shall see, the correspondence among concept lattices and formal contexts reveal several characteristics which induce interesting properties. The main aim of this work is to establish a formal full framework which allows to develop in the future new methods and techniques dealing with positive and negative information. In Section 2 we present the background of this work: the notions related with formal concept analysis and negative attributes. Section 3 introduces the main results which constitute the contribution of this paper. 2 Preliminaries 2.1 Formal Concept Analysis In this section, the basic notions related with Formal Concept Analysis (FCA) [12] and attribute implications are briefly presented. See [3] for a more detailed explanation. A formal context is a triple K = hG, M, Ii where G and M are finite non-empty sets and I ⊆ G × M is a binary relation. The elements in G are named objects, the elements in M attributes and hg, mi ∈ I means that the object g has the attribute m. From this triple, two mappings ↑: 2G → 2M and ↓: 2M → 2G , named derivation operators, are defined as follows: for any X ⊆ G and Y ⊆ M , X ↑ = {m ∈ M | hg, mi ∈ I for all g ∈ X} (1) Y ↓ = {g ∈ G | hg, mi ∈ I for all m ∈ Y } (2) ↑ ↓ X is the subset of all attributes shared by all the objects in X and Y is the subset of all objects that have the attributes in Y . The pair (↑, ↓) constitutes A generalized A Generalized Framework framework for Positive andfor negativeAttributes Negative attributes in in FCA FCA 3 269 a Galois connection between 2G and 2M and, therefore, both compositions are closure operators. A pair of subsets hX, Y i with X ⊆ G and Y ⊆ M such X ↑ = Y and ↓ Y = X is named a formal concept. X is named the extent and Y the intent of the concept. These extents and intents coincide with closed sets wrt the closure operators because X ↑↓ = X and Y ↓↑ = Y . Thus, the set of all formal concepts is a lattice, named concept lattice, with the relation hX1 , Y1 i ≤ hX2 , Y2 i if and only if X1 ⊆ X2 (or equivalently, Y2 ⊆ Y1 ) (3) This concept lattice will be denoted by B(G, M, I). The concept lattice can be characterized in terms of attribute implications being expressions A → B where A, B ⊆ M . An implication A → B holds in a context K if A↓ ⊆ B ↓ . That is, any object that has all the attributes in A has also all the attributes in B. 
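As an illustration of the two derivation operators of Equations (1) and (2), the following sketch (using a toy context chosen for the example, not one of the paper's tables) computes X↑ and Y↓ and exhibits a formal concept:

def derivation_up(X, I, M):
    # X↑ (eq. 1): attributes shared by all objects in X
    return {m for m in M if all((g, m) in I for g in X)}

def derivation_down(Y, I, G):
    # Y↓ (eq. 2): objects that have every attribute in Y
    return {g for g in G if all((g, m) in I for m in Y)}

G = {"o1", "o2", "o3"}
M = {"a", "b", "c"}
I = {("o1", "a"), ("o1", "b"), ("o2", "b"), ("o3", "b"), ("o3", "c")}

Y = derivation_up({"o1"}, I, M)            # {'a', 'b'}
X = derivation_down(Y, I, G)               # {'o1'}
print(X, Y)                                # <X, Y> is a formal concept: X↑ = Y and Y↓ = X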
It is well known that the sets of attribute implications that are valid in a context satisfies the Armstrong’s Axioms: [Ref] Reflexivity: If B ⊆ A then ` A → B. [Augm] Augmentation: A → B ` A ∪ C → B ∪ C. [Trans] Transitivity: A → B, B → C ` A → C. A set of implications Σ is considered an implicational system for K if: an implication holds in K if and only if it can be inferred, by using Armstrong’s Axioms, from Σ. Armstrong’s axioms allow us to define the closure of attribute sets wrt an implicational system (the closure of a set A is usually denoted as A+ ) and it is well-known that closed sets coincide with intents. On the other hand, several kind of implicational systems has been defined in the literature being the most used the so-called Duquenne-Guigues (or stem) basis [5]. This basis satisfies that its cardinality is minimum among all the implicational systems and can be obtained from a context by using the renowned NextClosure Algorithm [3]. 2.2 Negatives attributes As we have mentioned in the introduction, classical FCA only discover knowledge limited to positive attributes in the context, but it does not consider information relative to the absence of properties (attributes). Thus, the Duquenne-Guigues basis obtained from Table 1 is {e → bc, d → c, bc → e, a → b}. Moreover, the implications b → c and b → d do not hold in Table 1 and therefore they can not be derived from the basis by using the inference system. Nevertheless, both implications correspond with different situations. In the first case, some objects have attributes b and c (e.g. objects o1 and o3 ) whereas another objects (e.g. o2 ) have the attribute b and do not have c. On the other side, in the second case, any object that has the attribute b does not have the attribute d. A more general framework is necessary to deal with this kind of information. In [11], we have tackled this issue focusing on the problem of mining implication with positive and negative attributes from formal contexts. As a conclusion of 270 4 Rodriguez-Jimenez José et al. Manuel Rodrı́guez-Jiménez et al. I a b c d e o1 × × × o2 × × o3 × × × o4 × × Table 1. A formal context that work we emphasized the necessity of a full development of an algebraic framework. First, we begin with the introduction of an extended notation that allows us to consider the negation of attributes. From now on, the set of attributes is denoted by M , and its elements by the letter m, possibly with subindexes. That is, the lowercase character m is reserved for positive attributes. We use m to denote the negation of the attribute m and M to denote the set {m | m ∈ M } whose elements will be named negative attributes. Arbitrary elements in M ∪ M are going to be denoted by the first letters in the alphabet: a, b, c, etc. and a denotes the opposite of a. That is, the symbol a could represent a positive or a negative attribute and, if a = m ∈ M then a = m and if a = m ∈ M then a = m. Capital letters A, B, C,. . . denote subsets of M ∪ M . If A ⊆ M ∪ M , then A denotes the set of the opposite of attributes {a | a ∈ A} and the following sets are defined: – Pos(A) = {m ∈ M | m ∈ A} – Neg(A) = {m ∈ M | m ∈ A} – Tot(A) = Pos(A) ∪ Neg(A) Note that Pos(A), Neg(A), Tot(A) ⊆ M . Once we have introduced the notation, we are going to summarize some results concerning the mining of knowledge from contexts in terms of implications with negative and positive attributes [11]. A trivial approach could be obtained by adding new columns to the context with the opposite of the attributes [4]. 
That is, given a context K = hG, M, Ii, a new context (K|K) = hG, M ∪M , I ∪Ii is considered, where I = {hg, mi | g ∈ G, m ∈ M, hg, mi 6∈ I}. For example, if K is the context depicted in Table 1, the context (K|K) is those presented in Table 2. Obviously, the classical framework and its corresponding machinery can be used to manage the new context and, in this (direct) way, negative attributes are considered. However, this rough approach induces a non trivial growth of the formal context and, consequently, algorithms have a worse performance. In our opinion, a deeper study was done by R. Missaoui et al. in [7] where an evolved approach has been provided. For first time –as far as we know– inference rules for the management of positive and negative attributes are introduced [8]. The authors also developed new methods to mine mixed attribute implications by means of the key notion [9]. A generalized A Generalized Framework framework for Positive andfor negativeAttributes Negative attributes in in FCA FCA 5 271 I ∪I a b c d e a b c d e o1 × × × × × o2 × × × × × o3 × × × × × o4 × × × × × Table 2. The formal context (K|K) In [11], we have developed a method to mine mixed implications whose main goal has been to avoid the management of the large (K|K) contexts, so that the performance of the corresponding method has a controlled cost. First, we extend the definitions of derivation operators, formal concept and attribute implication. Definition 1. Let K = hG, M, Ii be a formal context. We define the operators ⇑: 2G → 2M ∪M and ⇓: 2M ∪M → 2G as follows: for X ⊆ G and Y ⊆ M ∪ M , X ⇑ = {m ∈ M | hg, mi ∈ I for all g ∈ X} ∪ {m ∈ M | hg, mi 6∈ I for all g ∈ X} (4) Y ⇓ = {g ∈ G | hg, mi ∈ I for all m ∈ Y } ∩ {g ∈ G | hg, mi 6∈ I for all m ∈ Y } (5) Definition 2. Let K = hG, M, Ii be a formal context. A mixed formal concept in K is a pair of subsets hX, Y i with X ⊆ G and Y ⊆ M ∪ M such X ⇑ = Y and Y ⇓ = X. Definition 3. Let K = hG, M, Ii be a formal context and let A, B ⊆ M ∪ M , the context K satisfies a mixed attribute implication A → B, denoted by K |= A → B, if A⇓ ⊆ B ⇓ . For example, in Table 1, as we previously mentioned, two different situations were presented. Thus, in this new framework we have that K 6|= b → d and K |= b → d whereas K 6|= b → c either K 6|= b → c. Now, we are going to introduce the mining method for mixed attribute im- plications. The method is strongly based on the set of inference rules built by supplementing Armstrong’s axioms with the following ones, introduced in [8]: let a, b ∈ M ∪ M and A ⊆ M ∪ M , [Cont] Contradiction: ` aa → M M . [Rft] Reflection: Aa → b ` Ab → a. The closure of an attribute set A wrt a set of mixed attribute implications Σ, denoted as A++ , is defined as the biggest set such that A → A++ can be inferred from Σ by using Armstrong’s Axioms plus [Cont] and [Rft]. Therefore, a mixed implication A → B can be inferred from Σ if and only if B is a subset of the closure of A, i.e. B ⊆ A++ . 272 6 Rodriguez-Jimenez José et al. Manuel Rodrı́guez-Jiménez et al. The proposed mining method, depicted in Algorithm 1, uses the inference rules in such a way that it is not centered around the notion of key, but it extends, in a proper manner, the classical NextClosure algorithm [3]. Algorithm 1: Mixed Implications Mining Data: K = hG, M, Ii Result: Σ set of implications 1 begin 2 Σ := ∅; 3 Y := ∅; 4 while Y < M do 5 foreach X ⊆ Y do 6 A := (Y r X) ∪ X; 7 if Closed(A, Σ) then 8 C := A⇓⇑ ; 9 if A 6= C then Σ := Σ ∪ {A → C r A} 10 Y := Next(Y ) // i.e. 
successor of Y in the lectic order 11 return Σ 12 end The algorithm to calculate the mixed implicational system doesn’t need to exhaustive traverse all the subsets of mixed attributes, but only those ones that are closed w.r.t. the set of implications previously computed. The Closed func- tion is defined having linear cost and is used to discern when a set of attributes is not closed and thus, the context is not visited in this case. Function Closed(A,Σ): boolean Data: A ⊆ M ∪ M with Pos(A)∩Neg(A) = ∅ and Σ being a set of mixed implications. Result: ‘true’ if A is closed wrt Σ or ‘false’ otherwise. 1 begin 2 foreach B → C ∈ Σ do 3 if B ⊆ A and C * A then exit and return false if B r A = {a}, A ∩ C 6= ∅, and a 6∈ A then exit and return false 4 return true 5 end 3 Mixed concept lattices As we have mentioned, the goal of this paper is to develop a deep study of the generalized algebraic framework. In this section we are going to introduce the main results of this paper providing the properties of the generalized concept lattice. The main pillar of our new framework are the two derivation operators introduced in Equations 4 and 5. The following theorem ensures that the pair of these operators is a Galois connection: A generalized A Generalized Framework framework for Positive andfor negativeAttributes Negative attributes in in FCA FCA 7 273 Theorem 1. Let K = hG, M, Ii be a formal context. The pair of derivation operators (⇑, ⇓) introduced in Definition 1 is a Galois Connection. Proof. We need to prove that, for all subsets X ⊆ G and Y ⊆ M ∪ M , X ⊆ Y ⇓ if and only if Y ⊆ X ⇑ First, assume X ⊆ Y ⇓ . For all a ∈ Y , we distinguish two cases: 1. If a ∈ Pos(Y ), exists m ∈ M with a = m and, for all g ∈ X, since X ⊆ Y ⇓ , hg, mi ∈ I and therefore a = m ∈ X ⇑ . 2. If a ∈ Neg(Y ), exits m ∈ M with a = m and, for all g ∈ X, since X ⊆ Y ⇓ , hg, mi 6∈ I and therefore a = m ∈ X ⇑ . Conversely, assume Y ⊆ X ⇑ and g ∈ X. To ensure that g ∈ Y ⇓ , we need to prove that hg, ai ∈ I for all a ∈ Pos(Y ) and hg, ai ∈ / I for all a ∈ Neg(Y ), which is straightforward from Y ⊆ X ⇑ . t u Therefore, above theorem ensures that ⇑◦⇓ and ⇓◦⇑ are closure operators. Furthermore, as in the classical case, both closure operators provide two dually isomorphic lattices. We denote by B] (G, M, I) to the lattice of mixed concepts with the relation hX1 , Y1 i ≤ hX2 , Y2 i iff X1 ⊆ X2 (or equivalently, iff Y1 ⊇ Y2 ) Moreover, as in the classical FCA, mixed implications and mixed concept lattice make up the two sides of the same coin, i.e. the information mined from the mixed formal context may be dually represented by means of a set of mixed attribute implications or a mixed concept lattice. As we shall see later in this section, unlike the classical FCA, mixed concept lattices are restricted to an specific lattice subclass. There exist specific prop- erties that lattices may observe to be considered a valid lattice structure which corresponds to a mixed formal context. In fact, this is one of the main goal of this paper, the characterization of the lattices in the mixed formal concept analysis. In Table 3 six different lattices are depicted. In the classical framework, all of them may be associated with formal contexts, i.e. in the classical framework any lattice corresponds with a collection of formal context. Nevertheless, in the mixed attribute framework this property does not hold anymore. Thus, in Table 3, as we shall prove later in this paper, lattices 3 and 5 cannot be associated with a mixed formal context. 
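Definition 1 can be made concrete with a small sketch (a toy context chosen for illustration; mixed attributes are encoded as pairs (m, True) for m and (m, False) for the negated attribute): the operators below compute X⇑ and Y⇓, and the pair they produce is a mixed formal concept, in line with Theorem 1.

def up_mixed(X, incidence, attributes):
    # X⇑: positive attributes shared by all of X plus negated attributes for
    # those that no object of X has; up_mixed(set()) yields M together with
    # all negated attributes, as stated in Lemma 1.
    pos = {(m, True) for m in attributes if all((g, m) in incidence for g in X)}
    neg = {(m, False) for m in attributes if all((g, m) not in incidence for g in X)}
    return pos | neg

def down_mixed(Y, incidence, objects):
    # Y⇓: objects having every positive attribute of Y and none of its negated ones.
    return {g for g in objects
            if all((g, m) in incidence for (m, sign) in Y if sign)
            and all((g, m) not in incidence for (m, sign) in Y if not sign)}

G = {"g1", "g2", "g3"}
M = {"a", "b", "c"}
I = {("g1", "a"), ("g1", "b"), ("g2", "b"), ("g2", "c"), ("g3", "a")}

Y = up_mixed({"g1"}, I, M)       # {('a', True), ('b', True), ('c', False)} -- full consistent
X = down_mixed(Y, I, G)          # {'g1'}; <X, Y> is a mixed formal concept
print(X, sorted(Y))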
The following two definitions characterizes two kind of significant sets of attributes that will be used later: Definition 4. Let K = hG, M, Ii be a formal context. A set A ⊆ M ∪ M is named consistent set if Pos(A) ∩ Neg(A) = ∅. The set of consistent sets are going to be denoted by Ctts, i.e. Ctts = {A ⊆ M ∪ M | Pos(A) ∩ Neg(A) = ∅} If A ∈ Ctts then |A| ≤ |M | and, in the particular case where |A| = |M |, we have Tot(A) = M . This situation induces the notion of full set: 274 8 Rodriguez-Jimenez José et al. Manuel Rodrı́guez-Jiménez et al. ◉ ◉ ◉ ◉ ◉ ◉ Lattice 1 Lattice 2 Lattice 3 ◉ ◉ ◉ ◉ ◉ ◉ ◉ ◉ ◉ ◉ ◉ ◉ ◉ ◉ ◉ ◉ Lattice 4 Lattice 5 Lattice 6 Table 3. Scheletons of some lattices Definition 5. Let K = hG, M, Ii be a formal context. A set A ⊆ M ∪ M is said to be full consistent set if A ∈ Ctts and Tot(A) = M . The following lemma, which characterize the boundary cases, is straightforward from Definition 1. Lemma 1. Let K = hG, M, Ii be a formal context. Then ∅⇑ = M ∪ M , ∅⇓ = G and (M ∪ M )⇓ = ∅. In the classical framework, the concept lattice B(G, M, I) is bounded by hM ↓ , M i and hG, G↑ i. However, in this generalized framework, as a direct consequence from above lemma, the lower and upper bounds of B ] (G, M, I) are h∅, M ∪ M i and hG, G⇑ i respectively. Lemma 2. Let K = hG, M, Ii be a formal context. The following properties hold: 1. For all g ∈ G, {g}⇑ is a full consistent set. 2. For all g1 , g2 ∈ G, if g1T∈ {g2 }⇑⇓ then {g1 }⇑ = {g2 }⇑ . 1 3. For all X ⊆ G, X ⇑ = g∈X {g}⇑ . Proof. 1. It is obvious because, for all m ∈ M , hg, mi ∈ I or hg, mi ∈/ I and {g}⇑ = {m ∈ M | hg, mi ∈ I} ∪ {m ∈ M | hg, mi ∈ / I} being a disjoint union. Thus, Tot({g}⇑ ) = M and Pos({g}⇑ ) ∩ Neg({g}⇑ ) = ∅. 1 That is, g1 and g2 have exactly the same attributes. A generalized A Generalized Framework framework for Positive andfor negativeAttributes Negative attributes in in FCA FCA 9 275 2. Since (⇑, ⇓) is a Galois connection, g1 ∈ {g2 }⇑⇓ (i.e. {g1 } ⊆ {g2 }⇑⇓ ) implies {g2 }⇑ ⊆ {g1 }⇑ . Moreover, by item 1, both {g1 }⇑ and {g2 }⇑ are full consistent and, therefore, {g1 }⇑ = {g2 }⇑ . 3. In the same way that occurs in the classical framework, since (⇑, ⇓) is a Galois connection between (2G , ⊆) and (2M ∪M , ⊆), for any X ⊆ G, we have S ⇑ T that X ⇑ = g∈X {g} = g∈X {g}⇑ . t u The above elementary lemmas lead to the following theorem emphasizing a sig- nificant difference with respect to the classical construction and it focuses on how the inclusion of new objects influences the structure of mixed concept lattice. Theorem 2. Let K = hG, M, Ii be a formal context, g0 be a new object, i.e. g0 ∈ / G, and Y ⊆ M be the set of attributes that g0 satisfies. Then, there exists g ∈ G such that {g}⇑ = {g0 }⇑ if and only if there exists an isomorphism between B ] (G, M, I) and B ] (G ∪ {g0 }, M, I ∪ {hg0 , mi | m ∈ Y }). That is, if a new different object (an object that differs at least in one attribute from each object in the context) is added to the formal context then the mixed concept lattice changes. Proof. Obviously, if there exists g ∈ G such that {g}⇑ = {g0 }⇑ , from Lemma 2 g and g0 have exactly the same attributes, and moreover the lattices B ] (G, M, I) and B ] (G ∪ {g0 }, M, I ∪ {hg0 , mi | m ∈ Y }) are isomorphic. Conversely, if the mixed concept lattices are isomorphic, there exists X ⊆ G such that the closed set X ⇑ in B ] (G, M, I) coincides with {g0 }⇑ . Thus, in the mixed concept lattice B ] (G ∪ {g0 }, M, I ∪ {hg0 , mi | m ∈ X}), by Lemma 2, we have that {g0 }⇑ = X ⇑ = ∩g∈X {g}⇑ . 
Moreover, since {g0 }⇑ is a full consistent set, X 6= ∅ because of, by Lemma 1, ∅⇑ = M ∪ M . Therefore, for all g ∈ X (there exists at least one g ∈ X), g0 ∈ {g}⇑ and, by Lemma 2, {g}⇑ = {g0 }⇑ . t u Example 1. Let K1 = ({g1, g2}, {a, b, c}, I1 ) and K2 = ({g1, g2, g3}, {a, b, c}, I2 ) be formal contexts where I1 and I2 are the binary relations depicted in Table 4. Note that K2 is built from K1 by adding the new object g3. In the classical frame- I2 a b c I1 a b c g1 × × g1 × × g2 × × g2 × × g3 × Table 4. The formal contexts K1 and K2 work, the concept lattices B({g1, g2}, {a, b, c}, I1 ) and B({g1, g2, g3}, {a, b, c}, I2 ) are isomorphic. See Figure 1. However, the lattices of mixed concepts cannot be isomorphic because the new object g3 is not a repetition of one existing object. See Figure 2. The following theorem characterizes the atoms of the new concept lattice B ] . 276 10 Rodriguez-Jimenez José et al. Manuel Rodrı́guez-Jiménez et al. <⦰,abc> <⦰,abc> B({g1, g2}, {a, b, c}, I1 ) B({g1, g2, g3}, {a, b, c}, I2 ) Fig. 1. Lattices obtained in the classical framework <⦰,abcabc> <⦰,abcabc> B] ({g1, g2}, {a, b, c}, I1 ) B] ({g1, g2, g3}, {a, b, c}, I2 ) Fig. 2. Lattices obtained in the extended framework A generalized A Generalized Framework framework for Positive andfor negativeAttributes Negative attributes in in FCA FCA 11 277 Theorem 3. Let K = hG, M, Ii be a formal context. The set of atoms in the lattice B ] (G, M, I) is {h{g}⇑⇓ , {g}⇑ i | g ∈ G}. Proof. First, fixed g0 ∈ G, we are going to prove that the mixed concept h{g0 }⇑⇓ , {g0 }⇑ i is an atom in B ] (G, M, I). If hX, Y i is a mixed concept such that ⇑⇓ ⇑ ⇑ ⇑ h∅, M ∪ M i < hX, Y i ≤ h{g T 0 } , {g0 } i, then {g0 } ⊆ Y = X M ∪ M . By Lemma 2, {g0 }⇑ ⊆ X ⇑ = g∈X {g}⇑ . Moreover, for all g ∈ X 6= ∅, by Lemma 2, both {g0 }⇑ and {g}⇑ are full consistent sets and, since {g0 }⇑ ⊆ {g}⇑ , we have {g0 }⇑ = {g}⇑ . Therefore, {g0 }⇑ = X ⇑ = Y and hX, Y i = h{g0 }⇑⇓ , {g0 }⇑ i. Conversely, if hX, Y i is an atom in B ] (G, M, I), then X 6= ∅ and there exists g0 ∈ X. Since (⇑, ⇓) is a Galois connection, {g0 }⇑ ⊇ X ⇑ = Y and, therefore, h{g0 }⇑⇓ , {g0 }⇑ i ≤ hX, Y i. Finally, since hX, Y i is an atom, we have that hX, Y i = h{g0 }⇑⇓ , {g0 }⇑ i. t u The following theorem establishes the characterization of the mixed concept lattice, proving that atoms and join irreducible elements are the same notions. Theorem 4. Let K = hG, M, Ii be a formal context. Any element in B ] (G, M, I) is ∨-irreducible if and only if it is an atom. Proof. Obviously, any atom is ∨-irreducible. We are going to prove that any ∨-irreducible element belongs to {h{g}⇑⇓ , {g}⇑ i | g ∈ T G}. Let hX, Y i be a ∨- irreducible element. Then, by Lemma 2, Y = X ⇑ = g∈X {g}⇑ . Let X 0 be the T smaller set such that X 0 ⊆ X and Y = g∈X 0 {g}⇑ . If X 0 is a singleton, then hX, Y i ∈ {h{g}⇑⇓ , {g}⇑ i | g ∈ G}. Finally, we prove that X 0 is necessarily a singleton. In other case, a bipartition of X 0 in two disjoint sets Z1 and Z2 can T be made satisfying T Z1 ∪Z2 = X 0 , Z1 6= ∅, Z2 6= ∅ and Z1 ∩ Z2 6= ∅. Then, Y = g∈Z1 {g}⇑ ∩ g∈Z2 {g}⇑ = Z1⇑ ∩ Z2⇑ and so hX, Y i = hZ1⇑⇓ , Z1⇑ i ∨ hZ2⇑⇓ , Z2⇑ i and Z1⇑ 6= Y 6= Z2⇑ . However, it is not posible because hX, Y i is ∨-irreducible. t u As a final end point of this study, we may conclude that unlike in the classical framework, not every concept lattice may be linked with a formal context. Thus, lattices number 3 and 5 from Table 3 cannot be associated with a mixed formal context. 
Both of them have one element which is not an atom but, at the same time, it is a join irreducible element in the lattice. More specifically, there does not exists a mixed concept lattice with three elements. 4 Conclusions In this work we have presented an algebraic study of a general framework to deal with negative and positive information. After considering new derivation operators we prove that they constitutes a Galois connection. The main results of the work are devoted to establish the new relation among mixed concept lattices and mixed formal concepts. Thus, the most outstanding conclusions are that: 278 12 Rodriguez-Jimenez José et al. Manuel Rodrı́guez-Jiménez et al. – the inclusion of a new (and different) object in a formal concept has a direct effect in the structure of the lattice, producing a different lattice. – no any kind of lattice may be associated with a mixed formal context, which induces a restriction in the structure that mixed concept lattice may have. Acknowledgements Supported by grant TIN2011-28084 of the Science and Innovation Ministry of Spain, co-funded by the European Regional Development Fund (ERDF). References 1. R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large Databases. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pages 487–499, Santiago de Chile, Chile, 1994. Morgan Kaufmann Publishers Inc. 2. J.F. Boulicaut, A. Bykowski, and B. Jeudy. Towards the tractable discovery of association rules with negations. In FQAS, pages 425–434, 2000. 3. B. Ganter. Two basic algorithms in concept analysis. Technische Hochschule, Darmstadt, 1984. 4. G. Gasmi, S. Ben Yahia, E. Mephu Nguifo, and S. Bouker. Extraction of association rules based on literalsets. In DaWaK, pages 293–302, 2007. 5. J.L. Guigues and V. Duquenne. Familles minimales d implications informatives resultant d un tableau de donnees binaires. Mathematiques et Sciences Sociales, 95:5–18, 1986. 6. H. Mannila, H. Toivonen, and A. Inkeri Verkamo. Efficient algorithms for discov- ering association rules. In KDD Workshop, pages 181–192, 1994. 7. R. Missaoui, L. Nourine, and Y. Renaud. Generating positive and negative exact rules using formal concept analysis: Problems and solutions. In ICFCA, pages 169–181, 2008. 8. R. Missaoui, L. Nourine, and Y. Renaud. An inference system for exhaustive generation of mixed and purely negative implications from purely positive ones. In CLA, pages 271–282, 2010. 9. R. Missaoui, L. Nourine, and Y. Renaud. Computing implications with negation from a formal context. Fundam. Inform., 115(4):357–375, 2012. 10. A. Revenko and S. Kuznetzov. Finding errors in new object intents. In CLA, pages 151–162, 2012. 11. J.M. Rodriguez-Jimenez, P. Cordero, M. Enciso, and A. Mora. Negative attributes and implications in formal concept analysis. Procedia Computer Science, 31(0):758 – 765, 2014. 2nd International Conference on Information Technology and Quan- titative Management, ITQM 2014. 12. R. Wille. Restructuring lattice theory: an approach based on hierarchies of con- cepts. In Rival, I. (ed.): Ordered Sets, pages 445–470. Boston, 1982. 
Author Index Aı̈t-Kaci, Hassan, 3 Liquière, Michel, 11 Al-Msie’Deen, Ra’Fat, 95 Loiseau, Yannick, 131 Alam, Mehwish, 255 Antoni, L’ubomı́r, 35, 83 Mora, Ángel, 145, 267 Mouakher, Amira, 169 Baixeries, Jaume, 1, 243 Bartl, Eduard, 207 Naidenova, Xenia, 181 Ben Yahia, Sadok, 169 Napoli, Amedeo, 243, 255 Bertet, Karell, 145, 219 Nebut, Clémentine, 11 Bich Dao, Ngoc, 219 Nourine, Lhouari, 231 Cabrera, Inma P., 157 Ojeda-Aciego, Manuel, 157 Ceglar, Aaron, 23 Otaki, Keisuke, 47, 59 Cepek, Ondrej, 9 Codocedo, Victor, 243 Parkhomenko, Vladimir, 181 Cordero, Pablo, 145, 267 Pattison, Tim, 23 Coupelon, Olivier, 131 Peláez-Moreno, Carmen, 119 Peñas, Anselmo, 119 Dia, Diyé, 131 Pócs, Jozef, 157 Dimassi, Ilyes, 169 Priss, Uta, 7 Enciso, Manuel, 145, 267 Raynaud, Olivier, 131 Revel, Arnaud, 219 Gnatyshak, Dmitry V., 231 Rodrı́guez Lorenzo, Estrella, 145 Guniš, Ján, 35 Rodrı́guez-Jiménez, José Manuel, 267 Huchard, Marianne, 11, 95 Saada, Hajer, 11 Ignatov, Dmitry I., 231 Seki, Hirohisa, 71 Ikeda, Madori, 47, 59 Seriai, Abdelhak, 95 Šnajder L’ubomı́r, 35 Kamiya, Yohei, 71 Kauer, Martin, 195 Trnecka, Martin, 107 Kaytoue, Mehdi, 243 Trneckova, Marketa, 107 Konecny, Jan, 207 Krajči, Stanislav, 35, 83 Urtado, Christelle, 95 Krı́dlo, Ondrej, 35, 83 Krupka, Michal, 195 Valverde Albacete, Francisco J., 119 Kuznetsov, Sergei O., 231 Vauttier, Sylvain, 95 Labernia, Fabien, 131 Yamamoto, Akihiro, 47, 59 Title: CLA 2014, Proceedings of the Eleventh International Conference on Concept Lattices and Their Applications Publisher: Pavol Jozef Šafárik University in Košice Expert advice: Library of Pavol Jozef Šafárik University in Košice (http://www.upjs.sk/pracoviska/univerzitna-kniznica) Year of publication: 2014 Number of copies: 70 Page count: XII + 280 Authors sheets count: 15 Publication: First edition Print: Equilibria, s.r.o. ISBN 978–80–8152–159–1