CLA 2014
Proceedings of the Eleventh International Conference on
Concept Lattices and Their Applications



CLA Conference Series
cla.inf.upol.cz
         Institute of Computer Science
Pavol Jozef Šafárik University in Košice, Slovakia

            ISBN 978–80–8152–159–1
Karell Bertet, Sebastian Rudolph (Eds.)




                CLA 2014

Concept Lattices
and Their Applications

Volume I

11th International Conference on Concept Lattices
and Their Applications
Košice, Slovakia, October 07–10, 2014
Proceedings




P. J. Šafárik University, Košice, Slovakia
2014
Volume editors

Karell Bertet
Université de La Rochelle
La Rochelle, France
E-mail: kbertet@univ-lr.fr

Sebastian Rudolph
Technische Universität Dresden
Dresden, Germany
E-mail: sebastian.rudolph@tu-dresden.de




Technical editor
Sebastian Rudolph, sebastian.rudolph@tu-dresden.de

Cover design
Róbert Novotný, robert.novotny@upjs.sk




© P. J. Šafárik University, Košice, Slovakia 2014


This work is subject to copyright. All rights reserved. Reproduction or publica-
tion of this material, even partial, is allowed only with the editors’ permission.




                           ISBN 978–80–8152–159–1
                          Organization


CLA 2014 was organized by the Institute of Computer Science, Pavol Jozef
Šafárik University in Košice.


Steering Committee

Radim Bělohlávek           Palacký University, Olomouc, Czech Republic
Sadok Ben Yahia              Faculté des Sciences de Tunis, Tunisia
Jean Diatta                  Université de la Réunion, France
Peter Eklund                 University of Wollongong, Australia
Sergei O. Kuznetsov          State University HSE, Moscow, Russia
Engelbert Mephu Nguifo       Université de Clermont Ferrand, France
Amedeo Napoli                LORIA, Nancy, France
Manuel Ojeda-Aciego          Universidad de Málaga, Spain
Jan Outrata                  Palacký University, Olomouc, Czech Republic


Program Chairs

Karell Bertet                Université de La Rochelle, France
Sebastian Rudolph            Technische Universität Dresden, Germany


Program Committee

Kira Adaricheva              Nazarbayev University, Astana, Kazakhstan
Cristina Alcalde             Univ del Pais Vasco, San Sebastián, Spain
Jamal Atif                   Université Paris Sud, France
Jaume Baixeries              Polytechnical University of Catalonia, Spain
Radim Bělohlávek           Palacký University, Olomouc, Czech Republic
Sadok Ben Yahia              Faculty of Sciences, Tunis, Tunisia
François Brucker            Ecole Centrale Marseille, France
Ana Burusco                  Universidad de Navarra, Pamplona, Spain
Claudio Carpineto            Fondazione Ugo Bordoni, Roma, Italy
Pablo Cordero                Universidad de Málaga, Spain
Mathieu D’Aquin              The Open University, Milton Keynes, UK
Christophe Demko             Université de La Rochelle, France
Jean Diatta                  Université de la Réunion, France
Florent Domenach             University of Nicosia, Cyprus
Vincent Duquenne             Université Pierre et Marie Curie, Paris, France
Sebastien Ferre                Université de Rennes 1, France
Bernhard Ganter                Technische Universität Dresden, Germany
Alain Gély                    Université Paul Verlaine, Metz, France
Cynthia Vera Glodeanu          Technische Universität Dresden, Germany
Robert Godin                   Université du Québec à Montréal, Canada
Tarek Hamrouni                 Faculty of Sciences, Tunis, Tunisia
Marianne Huchard               LIRMM, Montpellier, France
Céline Hudelot                Ecole Centrale Paris, France
Dmitry Ignatov                 State University HSE, Moscow, Russia
Mehdi Kaytoue                  LIRIS - INSA de Lyon, France
Jan Konecny                    Palacký University, Olomouc, Czech Republic
Marzena Kryszkiewicz           Warsaw University of Technology, Poland
Sergei O. Kuznetsov            State University HSE, Moscow, Russia
Leonard Kwuida                 Bern University of Applied Sciences, Switzerland
Florence Le Ber                Strasbourg University, France
Engelbert Mephu Nguifo         Université de Clermont Ferrand, France
Rokia Missaoui                 Université du Québec en Outaouais, Gatineau,
                               Canada
Amedeo Napoli                  LORIA, Nancy, France
Lhouari Nourine                Université de Clermont Ferrand, France
Sergei Obiedkov                State University HSE, Moscow, Russia
Manuel Ojeda-Aciego            Universidad de Málaga, Spain
Jan Outrata                    Palacký University, Olomouc, Czech Republic
Pascal Poncelet                LIRMM, Montpellier, France
Uta Priss                      Ostfalia University, Wolfenbüttel, Germany
Olivier Raynaud                LIMOS, Université de Clermont Ferrand, France
Camille Roth                   Centre Marc Bloch, Berlin, Germany
Barış Sertkaya                SAP Research Center, Dresden, Germany
Henry Soldano                  Laboratoire d’Informatique de Paris Nord, France
Gerd Stumme                    University of Kassel, Germany
Laszlo Szathmary               University of Debrecen, Hungary
Petko Valtchev                 Université du Québec à Montréal, Canada
Francisco J. Valverde Albacete Universidad Nacional de Educación a Distancia,
                               Spain




Additional Reviewers



Xavier Dolques                Strasbourg University, France
Philippe Fournier-Viger       University of Moncton, Canada
Michal Krupka                 Palacký University, Olomouc, Czech Republic
Organization Committee

Ondrej Krı́dlo       Pavol Jozef Šafárik University, Košice, Slovakia

Stanislav Krajči    Pavol Jozef Šafárik University, Košice, Slovakia
L’ubomı́r Antoni     Pavol Jozef Šafárik University, Košice, Slovakia
Lenka Pisková       Pavol Jozef Šafárik University, Košice, Slovakia
Róbert Novotný     Pavol Jozef Šafárik University, Košice, Slovakia
                                        Table of Contents



Preface
Invited Contributions
Relationship between the Relational Database Model and FCA . . . . . . . . .                                               1
   Jaume Baixeries

What Formalism for the Semantic Web? . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                             3
  Hassan Aı̈t-Kaci

Linguistic Data Mining with FCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                      7
   Uta Priss

Shortest CNF Representations of Pure Horn Functions and their
Connection to Implicational Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                      9
   Ondrej Cepek

Full Papers
Learning Model Transformation Patterns using Graph Generalization . . . .                                                  11
   Hajer Saada, Marianne Huchard, Michel Liquière and Clémentine
   Nebut

Interaction Challenges for the Dynamic Construction of Partially-
Ordered Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .     23
    Tim Pattison and Aaron Ceglar

The Educational Tasks and Objectives System within a Formal Context .                                                      35
   L’ubomı́r Antoni, Ján Guniš, Stanislav Krajči, Ondrej Krı́dlo and
   L’ubomı́r Šnajder

Pattern Structures for Understanding Episode Patterns . . . . . . . . . . . . . . . .                                      47
   Keisuke Otaki, Madori Ikeda and Akihiro Yamamoto

Formal Concept Analysis for Process Enhancement Based on a Pair of
Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   59
   Madori Ikeda, Keisuke Otaki and Akihiro Yamamoto

Merging Closed Pattern Sets in Distributed Multi-Relational Data . . . . . .                                               71
  Hirohisa Seki and Yohei Kamiya

Looking for Bonds between Nonhomogeneous Formal Contexts . . . . . . . . .                                                 83
   Ondrej Krı́dlo, L’ubomı́r Antoni and Stanislav Krajči
Reverse Engineering Feature Models from Software Configurations
using Formal Concept Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .                  95
   Ra’Fat Al-Msie’Deen, Marianne Huchard, Abdelhak Seriai,
   Christelle Urtado and Sylvain Vauttier

An Algorithm for the Multi-Relational Boolean Factor Analysis based
on Essential Elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
   Martin Trnecka and Marketa Trneckova

On Concept Lattices as Information Channels . . . . . . . . . . . . . . . . . . . . . . . . 119
   Francisco J. Valverde Albacete, Carmen Peláez-Moreno and
   Anselmo Peñas

Using Closed Itemsets for Implicit User Authentication in Web Browsing . 131
   Olivier Coupelon, Diyé Dia, Fabien Labernia, Yannick Loiseau and
   Olivier Raynaud

The Direct-optimal Basis via Reductions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
   Estrella Rodrı́guez Lorenzo, Karell Bertet, Pablo Cordero, Manuel
   Enciso and Ángel Mora

Ordering Objects via Attribute Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
   Inma P. Cabrera, Manuel Ojeda-Aciego and Jozef Pócs

DFSP: A New Algorithm for a Swift Computation of Formal Concept
Set Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
   Ilyes Dimassi, Amira Mouakher and Sadok Ben Yahia

Attributive and Object Subcontexts in Inferring Good Maximally
Redundant Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
   Xenia Naidenova and Vladimir Parkhomenko

Removing an Incidence from a Formal Context . . . . . . . . . . . . . . . . . . . . . . . 195
  Martin Kauer and Michal Krupka

Formal L-concepts with Rough Intents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
   Eduard Bartl and Jan Konecny

Reduction Dimension of Bags of Visual Words with FCA . . . . . . . . . . . . . . 219
   Ngoc Bich Dao, Karell Bertet and Arnaud Revel

A One-pass Triclustering Approach: Is There any Room for Big Data? . . . 231
   Dmitry V. Gnatyshak, Dmitry I. Ignatov, Sergei O. Kuznetsov and
   Lhouari Nourine

Three Related FCA Methods for Mining Biclusters of Similar Values
on Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
   Mehdi Kaytoue, Victor Codocedo, Jaume Baixeries and Amedeo
   Napoli
Defining Views with Formal Concept Analysis for Understanding
SPARQL Query Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
   Mehwish Alam and Amedeo Napoli
A Generalized Framework to Consider Positive and Negative Attributes
in Formal Concept Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
    José Manuel Rodrı́guez-Jiménez, Pablo Cordero, Manuel Enciso
    and Ángel Mora

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
                                   Preface


Formal Concept Analysis is a mathematical theory formalizing aspects of hu-
man conceptual thinking by means of lattice theory. As such, it constitutes a
theoretically well-founded, practically proven, human-centered approach to data
science and has been continuously contributing valuable insights, methodologies
and algorithms to the scientific community.
The International Conference “Concept Lattices and Their Applications (CLA)”
has been organized since 2002 with the aim of providing a forum for researchers
involved in all aspects of the study of FCA, from theory to implementations and
practical applications. Previous years’ conferences took place in Hornı́ Bečva, Os-
trava, Olomouc (all Czech Republic), Hammamet (Tunisia), Montpellier (France),
Olomouc (Czech Republic), Sevilla (Spain), Nancy (France), Fuengirola (Spain),
and La Rochelle (France). The eleventh edition of CLA was held in Košice, Slo-
vakia from October 7 to 10, 2014. The event was organized and hosted by the
Institute of Computer Science at Pavol Jozef Šafárik University in Košice.
This volume contains the selected papers as well as abstracts of the four invited
talks. We received 28 submissions of which 22 were accepted for publication
and presentation at the conference. We would like to thank the contributing
authors, who submitted high-quality work. In addition, we were very happy to
welcome four distinguished invited speakers: Jaume Baixeries, Hassan Aït-Kaci,
Uta Priss, and Ondrej Cepek. All submitted papers underwent a thorough review
by members of the Program Committee with the help of additional reviewers.
We would like to thank all reviewers for their valuable assistance. A selection of
extended versions of the best papers will be published in a renowned journal,
pending another reviewing process.
The success of such an event heavily relies on the hard work and dedication of
many people. Next to the authors and reviewers, we would also like to acknowl-
edge the help of the CLA Steering Committee, who gave us the opportunity
of chairing this edition and provided advice and guidance in the process. Our
greatest thanks go to the local Organization Committee from the Institute of
Computer Science, Pavol Jozef Šafárik University in Košice, who put a lot of ef-
fort into the local arrangements and provided the pleasant atmosphere necessary
to attain the goal of providing a balanced event with a high level of scientific
exchange. Finally, it is worth noting that we benefited a lot from the EasyChair
conference management system, which greatly helped us to cope with all the
typical duties of the submission and reviewing process.



October 2014                                                      Karell Bertet
                                                             Sebastian Rudolph
                                                    Program Chairs of CLA 2014
    Relationship between the Relational Database
                  Model and FCA

                                  Jaume Baixeries

                           Computer Science Department
                         Universitat Politècnica de Catalunya
                                Barcelona, Catalonia

    The Relational Database Model (RDBM) [3, 4] is one of the most widely used database models for managing data. Although some alternative models are also being used and implemented (namely, object-oriented databases and structured-datatype or NoSQL databases [1, 2]), the RDBM still maintains its popularity, as some rankings indicate.¹
    The RDBM can be formulated from a set-theoretical point of view, such that a tuple is a partial function, and the other basic operations of the model, such as projections, joins and selections, can be seen as set operations.
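    To make this view concrete, here is a minimal Python sketch (an illustration only, with invented data, not part of the talk): tuples are modelled as partial functions from attribute names to values, relations as sets of such tuples, and projection, selection and natural join as plain set operations.

        # tuples as partial functions (dicts attr -> value), relations as sets
        def rel(*tuples):
            return frozenset(frozenset(t.items()) for t in tuples)

        def project(r, attrs):
            # restrict each partial function to a subset of its domain
            return frozenset(frozenset((a, v) for (a, v) in t if a in attrs) for t in r)

        def select(r, pred):
            return frozenset(t for t in r if pred(dict(t)))

        def join(r, s):
            # natural join = union of pairwise compatible partial functions
            return frozenset(t | u for t in r for u in s
                             if all(dict(u).get(a, v) == v for (a, v) in t))

        persons = rel({"name": "Ann", "city": "Kosice"}, {"name": "Bob", "city": "Lyon"})
        cities  = rel({"city": "Kosice", "country": "SK"}, {"city": "Lyon", "country": "FR"})
        print(project(join(persons, cities), {"name", "country"}))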
    Another important feature of this model is the existence of constraints, which
are first-order predicates that must hold in a relational database. These con-
straints mostly describe conditions that must hold in order to keep the consis-
tency of the data in the database, but also help to describe some semantical
aspects of the dataset.
    In this talk, we consider some aspects of the RDBM that have been characterized with FCA, focusing on different kinds of constraints that appear in the Relational Model. We review some results that formalize different kinds of constraints with FCA [5–8]. We also explain how some concepts of the RDBM such as key, closure, completion and cover can easily be understood with FCA.


References
1. Kai Orend. Analysis and Classification of NoSQL Databases and Evaluation of their
   Ability to Replace an Object-relational Persistence Layer. 2010. doi=10.1.1.184.483
2. A B M Moniruzzaman, Syed Akhter Hossain NoSQL Database: New Era of
   Databases for Big data Analytics. Classification, Characteristics and Comparison.
   arXiv:1307.0191 [cs.DB]
3. Codd, E. F. A Relational Model of Data for Large Shared Data Banks. Commun.
   ACM, 1970, volume 13, number 6.
4. Date, C. J. An Introduction to Database Systems (8 ed.). Pearson Education. ISBN
   0-321-19784-4.
5. Baixeries, Jaume. A Formal Context for Symmetric Dependencies. ICFCA 2008.
   LNAI 4933.
6. Baixeries, Jaume and Balcázar, José L. Characterization and Armstrong Relations
   for Degenerate Multivalued Dependencies Using Formal Concept Analysis. For-
   mal Concept Analysis, Third International Conference, ICFCA 2005, Lens, France,
   February 14-18, 2005, Proceedings. Lecture Notes in Computer Science, 2005
¹ http://db-engines.com/en/ranking

    7. Baixeries, Jaume and Balcázar, José L. Unified Characterization of Symmetric De-
       pendencies with Lattices. Contributions to ICFCA 2006. 4th International Confer-
       ence on Formal Concept Analysis 2005.
    8. Baixeries, Jaume. A Formal Concept Analysis framework to model functional de-
       pendencies. Mathematical Methods for Learning, 2004.
            What Formalism for the Semantic Web?

                                     Hassan Aı̈t-Kaci
                         hassan.ait-kaci@univ-lyon1.fr
                              ANR Chair of Excellence
                                    CEDAR Project
                                          LIRIS
                           Université Claude Bernard Lyon 1
                                         France

The world is changing. The World Wide Web is changing. It started out as a set of purely
notational conventions for interconnecting information over the Internet. The focus of
information processing has now shifted from local disconnected disc-bound silos to
Internet-wide interconnected clouds. The nature of information has also evolved. From
raw uniform data, it has now taken the shape of semi-structured data and meaning-
carrying so-called “Knowledge Bases.” While it was sufficient to process raw data
with structure-aware querying, it has now become necessary to process knowledge with
contents-aware reasoning. Computing must therefore adapt from dealing with mere ex-
plicit data to inferring implicit knowledge. How to represent such knowledge and how
inference therefrom can be made effective (whether reasoning or learning) is thus a
central challenge among the many now facing the world wide web.
So-called “ontologies” are being specified and are meant to formally encode encyclopedic as well as domain-specific knowledge. One early (and still ongoing) such effort has been the Cyc¹ system. It is a knowledge-representation system (using LISP syntax) that makes use of a set of varied reasoning methods, altogether dubbed “commonsense.” A more recent formalism derived from Description Logic (DL)—viz. the Web Ontology Language (OWL²)—has been adopted as a W3C recommendation. It encodes knowledge using a specific standardized (XML, RDF) syntax. Its constructs are given a model-theoretic semantics which is usually realized operationally using tableau³-based reasoning.⁴ The point is that OWL is clearly designed for a specific logic and reasoning method. Saying that OWL is the most adequate interchange formalism for Knowledge Representation (KR) and automated reasoning (AR) is akin to saying that English is the best designed human language for facilitating information interchange among humans—notwithstanding the fact that it was simply imposed by the most recent pervasive ruling power, just as Latin was Europe’s lingua franca for centuries.
    Thus, it is fair to ask oneself a simple question: “Is there, indeed, a single most adequate knowledge representation and reasoning method that can be such a norm?”

 ¹ http://www.cyc.com/platform/opencyc
 ² http://www.w3.org/TR/owl-features/
 ³ http://en.wikipedia.org/wiki/Method_of_analytic_tableaux
 ⁴ The use of tableau methods is the case for the most prominent SW reasoners [6, 5, 7]. Systems using alternative reasoning methods must first translate the DL-based syntax of OWL into their own logic or RDF query processing. This may be costly [9] and/or incomplete [8].


     I personally do not think so. In this regard, I share the general philosophy of Doug Lenat⁵, Cyc’s designer—although not the haphazard approach he has chosen to follow.⁶
     If one ponders what characterizes an ontology making up a knowledge base, some specific traits most commonly appear. For example, it is universally acknowledged that, rather than being a general set of arbitrary formal logical statements describing some generic properties of “the world,” a formal knowledge base is generally organized as a concept-oriented information structure. This is as important a change of perspective as object-oriented programming was with respect to traditional method-oriented programming. Thus, some notion of property “inheritance” among partially-ordered “concepts” (with an “is-a” relation) is a characteristic aspect of KR formalisms. In such a system, a concept has a straightforward semantics: it denotes a set of elements (its “instances”) and the “is-a” relation denotes set inclusion. Properties attached to a concept denote information pertaining to all instances of this concept. All properties verified by a concept are therefore inherited by all its subconcepts.
     Sharing this simple characteristic, formal KR formalisms have emerged from symbolic mathematics that offer means to reason with conceptual information, depending on the mathematical apparatus formalizing inheritance and the nature of the properties attached to concepts. In Description Logic⁷, properties are called “roles” and denote binary relations among concepts. On the other hand, Formal Concept Analysis (FCA⁸) uses an algebraic approach whereby an “is-a” ordering is automatically derived from the propositional properties attached to the concepts, encoded as bit vectors. A concept is associated with a Boolean marker for each attribute: 1 (“true”) if it possesses the attribute, and 0 (“false”) otherwise. The bit vectors are simply the rows of the “property matrix” relating concepts to their attributes. This simple and powerful method, originally proposed by Rudolf Wille, has a dual interpretation when matching attributes with the concepts possessing them. Thus, dually, it views attributes also as partially ordered (as the columns of the binary matrix). An elegant Galois connection ensues that enables simple extraction of conceptual taxonomies (and their dual attribute-ordered taxonomies) from simple facts. Variations such as Relational Concept Analysis (RCA⁹) offer more expressive, and thus more sophisticated, knowledge while preserving the essential algebraic properties of FCA. It has also been shown how DL-based reasoning (e.g. OWL) can be enhanced with FCA.¹⁰
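     As a toy illustration of this property-matrix view (the context below is invented, not part of the talk), the two derivation operators that give rise to the Galois connection can be written in a few lines of Python; a formal concept is a pair (extent, intent) fixed under their composition.

        context = {                      # rows: objects, columns: attributes
            "sparrow": {"flies", "has_feathers"},
            "penguin": {"has_feathers", "swims"},
            "trout":   {"swims"},
        }

        def common_attributes(objects):
            """Attributes shared by all given objects."""
            attrs = set.union(*context.values())
            for o in objects:
                attrs &= context[o]
            return attrs

        def common_objects(attributes):
            """Objects possessing all given attributes."""
            return {o for o, a in context.items() if attributes <= a}

        # a formal concept: extent and intent determine each other
        extent = common_objects({"has_feathers"})
        intent = common_attributes(extent)
        print(extent, intent)   # {'sparrow', 'penguin'} {'has_feathers'}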
     Yet another formalism for taxonomic attributed knowledge, which I will present
in more detail in this presentation, is the Order-Sorted Feature (OSF) constraint for-
malism. This approach proposes to see everything as an order-sorted labelled graph.
 ⁵ http://en.wikipedia.org/wiki/Douglas_Lenat
 ⁶ However, I may stand corrected in the future since knowledge is somehow fundamentally haphazard. My own view is that, even for dealing with a heterogeneous world, I would rather favor mathematically formal representation and reasoning methods dealing with uncertainty and approximate reasoning, whether probabilistic, fuzzy, or dealing with inconsistency (e.g. rough sets, paraconsistency).
 ⁷ http://en.wikipedia.org/wiki/Description_logic
 ⁸ http://en.wikipedia.org/wiki/Formal_concept_analysis
 ⁹ http://www.hse.ru/data/2013/07/04/1286082694/ijcai_130803.pdf
 ¹⁰ http://ijcai-11.iiia.csic.es/files/proceedings/T13-ijcai11Tutorial.pdf


Sorts are set-denoting and partially ordered with an inclusion-denoting “is-a” relation,
and so form a conceptual taxonomy. Attributes, called “features,” are function-denoting
symbols labelling directed edges between sort-labelled nodes. Such OSF graphs are a
straightforward generalization of algebraic First-Order Terms (FOTs) as used in Logic
Programming (LP) and Functional Programming (FP). Like FOTs, they form a lattice
structure with OSF graph matching as the partial ordering, OSF graph unification
as infimum (denoting set intersection), and OSF graph generalization as supremum.¹¹
Both operations are very efficient. These lattice-theoretic properties are preserved when
one endows a concept in a taxonomy with additional order-sorted relational and func-
tional constraints (using logical conjunction for unification and disjunction for general-
ization for the attached constraints). These constraints are inherited down the concep-
tual taxonomy in such a way as to be incrementally enforceable as a concept becomes
gradually refined.
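    The following Python fragment is a deliberately simplified sketch of this idea (tree-shaped terms only, no coreference and no attached constraints; the sort hierarchy and feature names are invented, not taken from the talk): sorts are partially ordered by is-a, and unifying two feature terms takes a common subsort and recursively unifies the terms reachable through shared features.

        IS_A = {"workstudy": {"student", "employee"},
                "student": {"person"}, "employee": {"person"},
                "person": {"top"}, "top": set()}

        def ancestors(s):
            out = {s}
            for p in IS_A[s]:
                out |= ancestors(p)
            return out

        def glb(s1, s2):
            # a most general common subsort (the glb when the ordering is a lattice)
            cands = [s for s in IS_A if {s1, s2} <= ancestors(s)]
            return min(cands, key=lambda s: len(ancestors(s))) if cands else None

        def unify(t1, t2):
            # a feature term is a pair (sort, {feature: subterm})
            s = glb(t1[0], t2[0])
            if s is None:
                return None                       # sort clash: unification fails
            feats = {}
            for f in set(t1[1]) | set(t2[1]):
                if f in t1[1] and f in t2[1]:
                    sub = unify(t1[1][f], t2[1][f])
                    if sub is None:
                        return None
                    feats[f] = sub
                else:
                    feats[f] = t1[1].get(f, t2[1].get(f))
            return (s, feats)

        a = ("student",  {"advisor": ("person", {})})
        b = ("employee", {"advisor": ("employee", {}), "dept": ("top", {})})
        print(unify(a, b))   # ('workstudy', {'advisor': ('employee', {}), 'dept': ('top', {})})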
    The OSF system has been the basis of Constraint Logic Programming for KR
and ontological reasoning (viz. LIFE) [2, 1]. As importantly, OSF graph-constraint
technology has been at work with great success in two essential areas of AI: NLP and
Machine Learning:

 – it has been a major paradigm in the field of Natural Language Processing (NLP) for a long time; notably, in so-called “Head-driven Phrase Structure Grammar” (HPSG¹²) and Unification Grammar (UG¹³) technology [4]. This is indeed not surprising given the ease with which feature structure unification enables combining both syntactic and semantic information in a clean, declarative, and efficient way.¹⁴
 – Similarly, while most of the attention in the OSF literature has been devoted to uni-
   fication, its dual—namely, generalization—is just as simple to use, and computes
   the most specific OSF term that subsumes two given terms [3]. This operation is
   central in Machine Learning and with it, OSF technology lends itself to be com-
   bined with popular Data Mining techniques such as Support Vector Machines using
   frequency or probabilistic information.

    In this presentation, I will give a rapid overview of the essential OSF formalism for knowledge representation along with its reasoning method, which is best formalized as order-sorted constraint-driven inference. I will also illustrate its operational efficiency and scalability in comparison with those of prominent DL-based reasoners used for the Semantic Web.
    The contribution of this talk to answering the question in its title is that the Semantic
Web effort should not impose a priori putting all our eggs in one single (untested)
basket. Rather, along with DL, other viable alternatives such as the FCA and OSF
formalisms, and surely others, should be combined for realizing a truly semantic web.
¹¹ This supremum operation, however, does not (always) denote set union—as for FOT subsumption, it is not modular (and hence neither is it distributive).
¹² http://en.wikipedia.org/wiki/Head-driven_phrase_structure_grammar
¹³ http://www.cs.haifa.ac.il/~shuly/malta-slides.pdf
¹⁴ http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.51.2021


References
1. Aït-Kaci, H. Data models as constraint systems—a key to the Semantic Web. Constraint Processing Letters 1 (November 2007), 33–88. online: http://cs.brown.edu/people/pvh/CPL/Papers/v1/hak.pdf.
2. Aït-Kaci, H., and Podelski, A. Towards a meaning of LIFE. Journal of Logic Programming 16, 3-4 (1993), 195–234. online: http://hassan-ait-kaci.net/pdf/meaningoflife.pdf.
3. Aït-Kaci, H., and Sasaki, Y. An axiomatic approach to feature term generalization. In Proceedings of the European Conference on Machine Learning (ECML 2001) (Freiburg, Germany, 2001), L. D. Raedt and P. Flach, Eds., LNAI 2167, Springer-Verlag, pp. 1–12. online: http://www.hassan-ait-kaci.net/pdf/ecml01.pdf.
4. Carpenter, B. Typed feature structures: A generalization of first-order terms. In Proceedings of the 1991 International Symposium on Logic Programming (Cambridge, MA, USA, 1991), V. Saraswat and K. Ueda, Eds., MIT Press, pp. 187–201.
5. Motik, B., Shearer, R., and Horrocks, I. Hypertableau reasoning for description logics. Journal of Artificial Intelligence Research 36, 1 (September 2009), 165–228. online: https://www.jair.org/media/2811/live-2811-4689-jair.pdf.
6. Shearer, R., Motik, B., and Horrocks, I. HermiT: A highly-efficient OWL reasoner. In Proceedings of the 5th International Workshop on OWL Experiences and Directions (Karlsruhe, Germany, October 2008), U. Sattler and C. Dolbear, Eds., OWLED’08, CEUR Workshop Proceedings. online: http://www.cs.ox.ac.uk/ian.horrocks/Publications/download/2008/ShMH08b.pdf.
7. Sirin, E., Parsia, B., Grau, B. C., Kalyanpur, A., and Katz, Y. Pellet: A practical OWL-DL reasoner. Journal of Web Semantics 5, 2 (June 2007), 51–53. This is a summary; full paper online: http://pellet.owldl.com/papers/sirin05pellet.pdf.
8. Stoilos, G., Cuenca Grau, B., and Horrocks, I. How incomplete is your semantic web reasoner? In Proceedings of the 24th National Conference on Artificial Intelligence (AAAI 10) (Atlanta, Georgia, USA, July 11–15, 2010), M. Fox and D. Poole, Eds., AAAI, AAAI Publications, pp. 1431–1436. online: http://www.cs.ox.ac.uk/ian.horrocks/Publications/download/2010/StCH10a.pdf.
9. Thomas, E., Pan, J. Z., and Ren, Y. TrOWL: Tractable OWL 2 reasoning infrastructure. In Proceedings of the 7th Extended Semantic Web Conference (Heraklion, Greece, May-June 2010), L. Aroyo, G. Antoniou, E. Hyvönen, A. ten Teije, H. Stuckenschmidt, L. Cabral, and T. Tudorache, Eds., ESWC’10, Springer-Verlag, pp. 431–435. online: http://homepages.abdn.ac.uk/jeff.z.pan/pages/pub/TPR2010.pdf.
             Linguistic Data Mining with FCA

                                      Uta Priss

                    ZeLL, Ostfalia University of Applied Sciences
                              Wolfenbüttel, Germany
                                www.upriss.org.uk

    The use of lattice theory for linguistic data mining applications in the widest
sense has been independently suggested by different researchers. For example,
Masterman (1956) suggests using a lattice-based thesaurus model for machine
translation. Mooers (1958) describes a lattice-based information retrieval model
which was included in the first edition of Salton’s (1968) influential textbook.
Sladek (1975) models word fields with lattices. Dyvik (2004) generates lattices
which represent mirrored semantic structures in a bilingual parallel corpus. These
approaches were later translated into the language of Formal Concept Analysis
(FCA) in order to provide a more unified framework and to generalise them for
use with other applications (Priss (2005), Priss & Old (2005 and 2009)).
    Linguistic data mining can be subdivided into syntagmatic and paradigmatic
approaches. Syntagmatic approaches exploit syntactic relationships. For exam-
ple, Basili et al. (1997) describe how to learn semantic structures from the ex-
ploration of syntactic verb-relationships using FCA. This was subsequently used
in similar form by Cimiano (2003) for ontology construction, by Priss (2005)
for semantic classification and by Stepanova (2009) for the acquisition of lexico-
semantic knowledge from corpora.
    Paradigmatic relationships are semantic in nature and can, for example, be
extracted from bilingual corpora, dictionaries and thesauri. FCA neighbourhood
lattices are a suitable means of mining bilingual data sources (Priss & Old (2005
and 2007)) and monolingual data sources (Priss & Old (2004 and 2006)). Ex-
perimental results for neighbourhood lattices have been computed for Roget’s
Thesaurus, WordNet and Wikipedia data (Priss & Old 2006, 2010a and 2010b).
    Previous overviews of linguistic applications of FCA were presented by Priss
(2005 and 2009). This presentation summarises previous results and provides
an overview of more recent research developments in the area of linguistic data
mining with FCA.


References

1. Basili, R.; Pazienza, M.; Vindigni, M. (1997). Corpus-driven unsupervised learning
   of verb subcategorization frames. AI*IA-97.
2. Cimiano, P.; Staab, S.; Tane, J. (2003). Automatic Acquisition of Taxonomies from
   Text: FCA meets NLP. Proceedings of the ECML/PKDD Workshop on Adaptive
   Text Extraction and Mining, p. 10-17.
3. Dyvik, H. (2004). Translations as semantic mirrors: from parallel corpus to wordnet.
   Language and Computers, 49, 1, Rodopi, p. 311-326.

    4. Masterman, Margaret (1956). Potentialities of a Mechanical Thesaurus. MIT Con-
       ference on Mechanical Translation, CLRU Typescript. [Abstract]. In: Report on
       research: Cambridge Language Research Unit. Mechanical Translation 3, 2, p. 36.
       Full paper in: Masterman (2005).
    5. Mooers, Calvin N. (1958). A mathematical theory of language symbols in retrieval.
       In: Proc. Int. Conf. Scientific Information, Washington D.C.
    6. Priss, Uta; Old, L. John (2004). Modelling Lexical Databases with Formal Concept
       Analysis. Journal of Universal Computer Science, 10, 8, p. 967-984.
    7. Priss, Uta (2005). Linguistic Applications of Formal Concept Analysis. In: Ganter;
       Stumme; Wille (eds.), Formal Concept Analysis, Foundations and Applications.
       Springer Verlag. LNAI 3626, p. 149-160.
    8. Priss, Uta; Old, L. John (2005). Conceptual Exploration of Semantic Mirrors. In:
       Ganter; Godin (eds.), Formal Concept Analysis: Third International Conference,
       ICFCA 2005, Springer Verlag, LNCS 3403, p. 21-32.
    9. Priss, Uta; Old, L. John (2006). An application of relation algebra to lexical
       databases. In: Schaerfe, Hitzler, Ohrstrom (eds.), Conceptual Structures: Inspiration
       and Application, Proceedings of the 14th International Conference on Conceptual
       Structures, ICCS’06, Springer Verlag, LNAI 4068, p. 388-400.
    10. Priss, Uta; Old, L. John (2007). Bilingual Word Association Networks. In: Priss,
       Polovina, Hill (eds.), Proceedings of the 15th International Conference on Concep-
       tual Structures, ICCS’07, Springer Verlag, LNAI 4604, p. 310-320.
    11. Priss, Uta (2009). Formal Concept Analysis as a Tool for Linguistic Data Explo-
       ration. In: Hitzler, Pascal; Scharfe, Henrik (eds.), Conceptual Structures in Practice,
       Chapman & Hall/CRC studies in informatics series, p. 177-198.
    12. Priss, Uta; Old, L. John (2009). Revisiting the Potentialities of a Mechanical The-
       saurus. In: Ferre; Rudolph (eds.), Proceedings of the 7th International Conference
       on Formal Concept Analysis, ICFCA’09, Springer Verlag.
    13. Priss, Uta; Old, L. John (2010a). Concept Neighbourhoods in Knowledge Organi-
       sation Systems. In: Gnoli; Mazzocchi (eds.), Paradigms and conceptual systems in
       knowledge organization. Proceedings of the 11th International ISKO Conference, p.
       165-170.
    14. Priss, Uta; Old, L. John (2010b). Concept Neighbourhoods in Lexical Databases.
       In: Kwuida; Sertkaya (eds.), Proceedings of the 8th International Conference on
       Formal Concept Analysis, ICFCA’10, Springer Verlag, LNCS 5986, p. 283-295.
    15. Salton, Gerard (1968). Automatic Information Organization and Retrieval.
       McGraw-Hill, New York.
    16. Stepanova, Nadezhda A. (2009). Automatic acquisition of lexico-semantic knowl-
       edge from corpora. SENSE’09 Workshop. Available at http://ceur-ws.org/Vol-476/.
    17. Sladek, A. (1975). Wortfelder in Verbänden. Gunter Narr Verlag, Tübingen.
 Shortest CNF Representations of Pure Horn
Functions and their Connection to Implicational
                     Bases

                                   Ondrej Cepek

                             Charles University, Prague

    Pure Horn CNFs, directed hypergraphs, and closure systems are objects stud-
ied in different subareas of theoretical computer science. Nevertheless, these three
objects are in some sense isomorphic, so properties derived for one of them can usually be translated in some way to the other two. In this talk we will concentrate on the problem of finding a shortest CNF representation of a given pure Horn function. This is a problem with many practical applications in artificial intelligence (knowledge compression) and other areas of computer science (e.g. relational databases). We survey the complexity results known
for this problem and then concentrate on the relationships between CNF rep-
resentations of Horn functions and certain sets of implicates of these functions,
called essential sets of implicates. The definition of essential sets is based on the
properties of resolution. Essential sets can be shown to fulfill an interesting or-
thogonality property: every CNF representation and every (nonempty) essential
set must intersect. This property leads to non-trivial lower bounds on the CNF
size, which are sometimes tight and sometimes have a gap. We will try to derive
connections to the known properties of minimal implicational bases.
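    As a small, self-contained illustration of this correspondence (the clauses below are invented, not taken from the talk), a pure Horn CNF can be read as a set of implications body → head, and the function it represents is captured by the closure operator obtained by forward chaining; two CNFs represent the same pure Horn function exactly when they induce the same closures.

        from itertools import chain, combinations

        def closure(atoms, implications):
            """Forward chaining: smallest superset of `atoms` closed under the rules."""
            closed, changed = set(atoms), True
            while changed:
                changed = False
                for body, head in implications:
                    if body <= closed and head not in closed:
                        closed.add(head)
                        changed = True
            return frozenset(closed)

        def equivalent(cnf1, cnf2, variables):
            """Same closure on every subset of variables (feasible for small examples)."""
            subsets = chain.from_iterable(combinations(variables, k)
                                          for k in range(len(variables) + 1))
            return all(closure(s, cnf1) == closure(s, cnf2) for s in map(set, subsets))

        # (a -> b), (b -> c), (a -> c): the last clause is redundant, so dropping
        # it yields a shorter CNF of the same pure Horn function.
        long_cnf  = [({"a"}, "b"), ({"b"}, "c"), ({"a"}, "c")]
        short_cnf = [({"a"}, "b"), ({"b"}, "c")]
        print(equivalent(long_cnf, short_cnf, {"a", "b", "c"}))   # True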
    The talk is based on joint research with Endre Boros, Alex Kogan, Petr
Kucera, and Petr Savicky.
          Learning Model Transformation Patterns using
                     Graph Generalization

          Hajer Saada, Marianne Huchard, Michel Liquière, and Clémentine Nebut

              LIRMM, Université de Montpellier 2 et CNRS, Montpellier, France,
                                  first.last@lirmm.fr



             Abstract. In Model Driven Engineering (MDE), a Model Transforma-
             tion is a specialized program, often composed of a set of rules to transform
             models. The Model Transformation By Example (MTBE) approach aims
             to assist the developer by learning model transformations from source
             and target model examples. In a previous work, we proposed an approach
             which takes as input a fragmented source model and a target model, and
             produces a set of fragment pairs that presents the many-to-many match-
             ing links between the two models. In this paper, we propose to mine
             model transformation patterns (that can be later transformed in trans-
             formation rules) from the obtained matching links. We encode our models
             into labeled graphs that are then classified using the GRAAL approach
             to get meaningful common subgraphs. New transformation patterns are
             then found from the classification of the matching links based on their
             graph ends. We evaluate the feasibility of our approach on two represen-
             tative small transformation examples.


      1    Introduction
      MDE is a subfield of software engineering that relies on models as a central arti-
      fact for the software development cycle. Models can be manually or automatically
      manipulated using model transformations. A model transformation is a program,
      often composed of a set of transformation rules, that takes as input a model and
      produces as output another model. The models conform to meta-models, as
      programs conform to the programming language grammar. If we would like to
      transform any Java program into any C++ program, we would express this trans-
      formation at the level of their grammars. In MDE, model transformations are
      similarly expressed in terms of the meta-models. Designing a model transforma-
      tion is thus a delicate issue, because the developer has to master the specialized
      language in which the transformation is written, the meta-modeling activity,
      and the subtleties of the source and the target meta-models. In order to as-
      sist the developers, the MTBE approach follows the track of the "Programming
      By Example" approach [6] and proposes to use an initial set of transformation
      examples from which the model transformation is partly learnt. The first step
      of the MTBE approach consists in extracting matching links, from which the
      second step learns transformation rules. Several approaches [1,15,12] are pro-
      posed for the second step, but they derive element-to-element (one-to-one) rules





     that mainly express how a source model element is transformed into a target
     model element. In this paper, we propose to learn transformation patterns of
     type fragment-to-fragment (many-to-many) using the output of a previous work
     [13] that consists in generating matching links between source and target model
     fragments. We encode our models and model fragments as labeled graphs. These
     graphs are classified through a lattice using a graph mining approach (GRAAL)
     to get meaningful common subgraphs. The matching links are then classified
     using Formal Concept Analysis, the lattice of source graphs and the lattice of
     target graphs. New transformation patterns are discovered from these classifica-
     tions that can help the designer of the model transformation. We evaluate the
     feasibility of our approach on two representative transformation examples.
          Section 2 gives an overview of our approach. Section 3 presents the
     transformation pattern mining approach and Section 4 evaluates its feasibility.
     Section 5 presents the related work, and we conclude in Section 6.


     2    Approach Overview

     In Model-Driven Engineering, model transformations are programs that transform an input source model into an output target model. A classical model trans-
     formation (UML2REL) transforms a UML model into a RELational model. Such
     transformation programs are often written with declarative or semi-declarative
     languages and composed of a set of transformation rules, defined at the meta-
     model level. The meta-model defines the concepts and the relations that are used
     (and instantiated) to compose the models. For example, the UML meta-model


                Source             GRAAL   Source
                          Source
                Model                      Graph
                          Graphs
              Fragments                    Lattice
                                                         Matching
                                                                    FCA   Matching
                                                           Link                      Transformation
                Target             GRAAL   Target                           Link
                          Target                          Formal                        Patterns
                Model                      Graph                           Lattice
                          Graphs                         Context
              Fragments                    Lattice


               Matching
                Links




                                    Fig. 1. Process overview


     contains the concept of Class which owns Attributes. This can be used to derive
     a UML model composed of a class P erson owning the attribute N ame. In the
     UML2REL example, a very simple transformation pattern would be: a UML class
     owning an attribute is transformed into a relational table owning a column. In
     this paper, our objective is to learn such transformation patterns that express
     that a pattern associating entities of the source meta-model (e.g. a UML class
     owning an attribute) is transformed into a pattern associating entities of the
     target meta-model (e.g. a relational table owning a column). Fig. 1 provides
     an overview of our process. Let us consider that we want to learn rules for



transforming UML models to relational models. Our input data (see Fig. 2) are
composed of: fragmented source models (a UML source fragment is given in Fig.
3(a)); fragmented target models (a relational target fragment is given in Fig.
3(b)); and matching links between fragments established by experts or by an
automatic matching technique. For example a matching link (L1) is established
in Fig. 2 between the UML source fragment of Fig. 3(a) and the relational target
fragment of Fig. 3(b).



                     [Figure: matching links L0, L1, L2 between UML source fragments SG0–SG2 (Person, Client, ReservationRequest) and relational target fragments TG0–TG2 (tables Client and ReservationRequest)]




Fig. 2. Three matching links between fragmented UML and relational models (this
figure is more readable in a coloured version)




    Matching links established by experts or by automatic methods can be used
to form a set of model transformation patterns. For example, the L2 matching
link gives rise to a transformation pattern which indicates that a UML class
(with an attribute) with its super-class (with an attribute) is transformed into a
unique table with two columns, one being inherited. Nevertheless, matching links
often correspond to patterns that combine several simpler transformations or
are triggered from domain knowledge. Besides, they may contain minor errors
(such as a few additional or missing elements, for example, column date of Table
ReservationRequest has in fact no equivalent in Class ReservationRequest).
Moreover, what interests us is beyond the model domain. We do not want to
learn that Class Client is transformed into Table Client, but rather that a
UML class is usually transformed into a table.
    Our output is composed of a set of model transformation patterns. Some can
directly be inferred from initial matching links (as evoked previously), and some
will be found thanks to graph generalization and matching link classification.
From our simple example, we want to extract the model transformation pattern
presented in Figure 4, whose premise and conclusion patterns do not appear as
such in the initial set of matching links (↪ means "is transformed into").



          [Figure: UML fragment with classes Person, Client, ReservationRequest and a reserve association (left); relational fragment with tables Client and ReservationRequest (right)]



                  (a) UML source fragment                                               (b) Relational target fragment

                                     Fig. 3. An example of UML and relational models




                                                                                   ↪

     Fig. 4. Transformation pattern: a class specializing a class with an attribute (in UML
     model) is transformed into a table with an inherited column (in relational model).


     3       Model Transformation Pattern Generation

     From model fragments to graphs For our example, the source meta-model
     is inspired by a tiny part of the UML metamodel (see Figure 5(a)), while the
     target meta-model has its roots in a simplified relational data-base meta-model
     (see Fig. 5(c)). The models often are represented in a visual syntax (as shown in
     Fig. 3(a) and Fig. 3(b)) for readability reasons. Here we use their representation
     as instance diagrams of their meta-model (using the UML instance diagram
     syntax). For example, the UML model of Fig. 3(a) is shown as an instantiation of
     its meta-model in Fig. 5(b), where each object (in a rectangular box) is described
     by its meta-class in the meta-model, valued attributes and roles conforming to
     the attributes and associations from the meta-model: e.g. person and client
     are explicit instances of Class; client:Class has a link towards person labeled
     by the role specializes; client:Property has the attribute lowerBound (1).
         To extract expressive transformation patterns, we transform our models, given in their instance diagram syntax, into simpler graphs with labeled vertices. We limit ourselves to locally injective labeled graphs. A locally injective graph is a labeled graph such that all vertices in the neighborhood of a given vertex have different labels. This is not so restrictive in our case, because the fragments identified by the experts rarely include similar neighborhoods for an entity. Here are the rules that we use in the transformation from simplified UML instance diagrams to labeled graphs. We associate a labeled node with Objects, Roles, Attributes and Attribute values. The instance diagram of Figure 5(b) and the corresponding labeled graph of Fig. 6(a) are used to illustrate the transformation: the person:Class object is transformed into node 1 labeled class_1, and one of the attribute values, 1, is transformed into node 13 labeled one_13. Edges come from the following
     situations: an object has an attribute; an attribute has a value; an object has



a role (is source of a role); an object is the target of a role. For example, for
the property which has an attribute lowerBound (equal to zero), there is a
corresponding edge (property_17, lowerBound_18).
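A rough sketch of this encoding and of the local-injectivity condition is given below (the node identifiers and edges are illustrative, not the exact graph of Fig. 6(a)): each node carries a label, and the graph is locally injective when the neighbours of any vertex all carry pairwise distinct labels.

        from collections import defaultdict

        # objects, attributes, attribute values and roles become labelled nodes;
        # edges link an object to its attributes/roles and an attribute to its value
        nodes = {1: "class", 2: "ownsAttribute", 3: "property",
                 4: "lowerBound", 5: "zero"}
        edges = {(1, 2), (2, 3), (3, 4), (4, 5)}

        def locally_injective(nodes, edges):
            """All neighbours of any vertex carry pairwise distinct labels."""
            neigh = defaultdict(set)
            for u, v in edges:
                neigh[u].add(v)
                neigh[v].add(u)
            return all(len({nodes[n] for n in ns}) == len(ns)
                       for ns in neigh.values())

        print(locally_injective(nodes, edges))   # True for this fragment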


   [Figure 5 panels, diagrams omitted:]
   (a) UML metamodel
   (b) UML model of Fig. 3(a) in instance diagram syntax
   (c) Relational metamodel
   (d) Relational model of Fig. 3(b) in instance diagram syntax

Fig. 5. Source/target metamodel and model, UML (upper part), relational (lower part)



Classification of graphs (GRAAL approach) After the previous step, we
obtain a set of source graphs and a set of target graphs. We illustrate the re-
mainder of this section by using the three source graphs of Fig. 6, the three
target graphs of Fig. 7, and the matching links (Source graph i, Target graph i),
for i ∈ {0, 1, 2}. To get meaningful common subgraphs (on which new transfor-
mation patterns will be discovered), we use the graph mining approach proposed
in [7] and its derived GRAAL tool. In this approach, examples are described by
a description language L provided with two operations: a specialization opera-
tion ≤ and an operation ⊗ which builds the least general generalization of two
descriptions. A generalization of the Norris algorithm [11] builds the Galois lat-
tice. Several description languages are implemented in GRAAL, especially a
description based on locally injective graphs. The ⊗ operation is the reduction
of the tensor product of graphs, also called the Kronecker product [14]. We inde-
pendently classify source graphs and target graphs. Classification of source graphs
produces the lattice of Fig. 8(a). For example, in this lattice, Concept sfc012
has for intent a subgraph of source graphs 0, 1 and 2 representing a class which
specializes a class which owns an attribute. Classification of target graphs pro-
duces the lattice of Fig. 8(b). In this lattice, Concept tfc012 has for intent a
subgraph where a table has an inherited property.
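To make the ⊗ operation more tangible, the following is a minimal sketch, not the GRAAL implementation, of the tensor (Kronecker) product of two vertex-labelled graphs, here restricted to pairs of vertices carrying the same label (a simplification of ours); the dictionary-based graph representation and the function name are our own.

```python
# Minimal sketch (ours) of the label-preserving tensor (Kronecker) product of
# two vertex-labelled graphs.  GRAAL additionally reduces the result, which is
# omitted here; the (labels, edges) representation is our own choice.

def tensor_product(g1, g2):
    """g1, g2: (labels, edges), labels: dict vertex -> label,
    edges: set of directed pairs (u, v)."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    # Product vertices: pairs of vertices carrying the same label.
    verts = {(u, x) for u in labels1 for x in labels2
             if labels1[u] == labels2[x]}
    labels = {(u, x): labels1[u] for (u, x) in verts}
    # Product edges: both projections must be edges of the factor graphs.
    edges = {((u, x), (v, y)) for (u, x) in verts for (v, y) in verts
             if (u, v) in edges1 and (x, y) in edges2}
    return labels, edges

# Toy example: two small "class owns attribute" fragments.
gA = ({1: "Class", 2: "Property"}, {(1, 2)})
gB = ({"a": "Class", "b": "Property", "c": "Class"}, {("a", "b"), ("c", "b")})
print(tensor_product(gA, gB))
```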




                                    (a) Source graph 1




                       (b) Source graph 0         (c) Source graph 2

                                   Fig. 6. Source graphs



     Classification of transformation links In the previous section, we have
     shown how Galois lattices can be computed on the labeled graphs that represent
     our model fragments. Now a matching link is described by a pair composed of
     a source fragment (whose corresponding graph is in the extent of some concepts
     in the source graph lattice) and a target fragment (whose corresponding graph
is in the extent of some concepts in the target graph lattice). This is described
     in a formal context, where objects are the matching links and attributes are
     the concepts of the two lattices (source graph lattice and target graph lattice).
     In this formal context (presented in Table 11(a)), a matching link is associated
     with the concepts having respectively its source graph and its target graph in
     their extent. This means that the matching link is described by the graph of its
     source fragment and by the generalizations of this graph in the lattice. This is
     the same for the graphs of the target fragments. For example, matching link L0,
     connecting source fragment 0 to target fragment 0, is associated in the formal
     context to concepts sfc01, sfc012, tfc01, tfc012.
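A minimal sketch of how such a matching-link context could be assembled, assuming each graph lattice is available as a list of (concept name, extent) pairs; the function and variable names below are ours, not those of the authors' tool chain.

```python
# Sketch (ours): build the matching-link formal context.  Objects are the
# matching links; attributes are the concepts of the source and target graph
# lattices whose extents contain the link's source or target graph.

def matching_link_context(links, source_lattice, target_lattice):
    """links: dict link_name -> (source_graph_id, target_graph_id);
    each lattice: list of (concept_name, extent) pairs, extent a set of ids."""
    context = {}
    for link, (src, tgt) in links.items():
        attrs = {name for name, extent in source_lattice if src in extent}
        attrs |= {name for name, extent in target_lattice if tgt in extent}
        context[link] = attrs
    return context

source_lattice = [("sfc1", {1}), ("sfc2", {2}), ("sfc01", {0, 1}),
                  ("sfc012", {0, 1, 2})]
target_lattice = [("tfc1", {1}), ("tfc2", {2}), ("tfc01", {0, 1}),
                  ("tfc12", {1, 2}), ("tfc012", {0, 1, 2})]
links = {"L0": (0, 0), "L1": (1, 1), "L2": (2, 2)}
print(matching_link_context(links, source_lattice, target_lattice))
# L0 is associated with sfc01, sfc012, tfc01 and tfc012, as in the text.
```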






                                   (a) Target graph 1




                     (b) Target graph 0          (c) Target graph 2

                                 Fig. 7. Target graphs




                    (a) Source graph lattice (b) Target graph lattice

Fig. 8. Graph lattices. Only concept extents are represented in the figure. Intents of
concepts are shown in Fig. 9 and 10. We denote by sfcx1...xn (resp. tfcx1...xn)
the vertex [x1, ..., xn] of the source (resp. target) graph lattice.


    The concept lattice associated with the matching link formal context of Fig.
11(a) is shown in Fig. 11(b). In this representation (obtained with RCAexplore1 )
each box describes a concept: the first compartment gives the name of the
concept, the second shows the simplified intent (here concepts from the source
fragment lattice and the target fragment lattice), and the third shows the sim-
plified extent (here matching links). The extent of Concept_MatchingLinksFca_4
is composed of the links L0 and L1, while its intent is composed of the source
graph concepts sfc01, sfc012 and the target graph concepts tfc01, tfc012.

Model transformation pattern mining The last step of the process consists
in extracting model transformation patterns from the matching link lattice. This
has close connections to the problem of extracting implication rules in a concept
lattice, but using only pairs of source and target graph concepts. The most
1
    http://dolques.free.fr/rcaexplore.php






               (a) Concept sfc2          (b) Concept sfc01          (c) Concept sfc012

     Fig. 9. Source graph lattice concepts. Concept sfc1 (not represented) has Source
     Graph 1 of Fig. 6 as intent.




          (a) Concept tfc012          (b) Concept tfc12          (c) Concept tfc01

     Fig. 10. Target graph lattice concepts. Concepts tfc1 and tfc2 (not represented) have
     resp. Target Graph 1 and Target Graph 2 from Fig. 7 as their intents.


     reliable transformation patterns are given when using a source graph and a
     target graph in the same simplified intent of a concept, because this corresponds
     to the fact that the source graph is always present when the target graph is
present too (and conversely). For example, from Concept_MatchingLinksFca_0,
     we obtain the following transformation pattern:

                    graph of sfc012 intent ↪ graph of tfc012 intent

     This pattern expresses a new transformation pattern (new in the sense that it
     does not directly come from a matching link):

         A UML model where a class Cd specializes another class Cm which owns
         an attribute a is transformed into a relational model where a table T
    owns an (inherited) column c.

        Due to the simplicity of our illustrative example, the other reliable patterns
     obtained from source and target graphs from the same simplified intent just
     correspond to matching links.
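As an illustration of how the most reliable patterns could be read off the matching-link lattice, here is a small sketch under the assumption that each concept's simplified intent is available as a set of concept names, with the sfc/tfc prefixes distinguishing source and target graph concepts; the data layout is our own.

```python
# Sketch (ours): read off the "most reliable" transformation patterns, i.e.
# pairs of a source graph concept and a target graph concept sharing the same
# simplified intent in the matching-link lattice.

def reliable_patterns(simplified_intents):
    patterns = []
    for intent in simplified_intents:
        sources = sorted(c for c in intent if c.startswith("sfc"))
        targets = sorted(c for c in intent if c.startswith("tfc"))
        patterns += [(s, t) for s in sources for t in targets]
    return patterns

# Simplified intents of the concepts of the running example.
intents = [{"sfc012", "tfc012"}, {"sfc01", "tfc01"}, {"tfc12"},
           {"sfc1", "tfc1"}, {"sfc2", "tfc2"}, set()]
for s, t in reliable_patterns(intents):
    print(f"graph of {s} intent  ->  graph of {t} intent")
```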
    Obtaining other, less reliable, patterns relies on the fact that if a source
     graph and a target graph are not in the same simplified intent, but the concept


             ML      sfc012 tfc012 sfc01  tfc01  tfc12  sfc1   sfc2   tfc1   tfc2
             L0        ×      ×      ×      ×
             L1        ×      ×      ×      ×      ×      ×             ×
             L2        ×      ×                    ×             ×             ×

          (a) Matching Link Formal Context MLFC

          (b) Matching Link Lattice, with concepts (simplified intent / simplified extent):
              Concept_MatchingLinksFca_0 (sfc012, tfc012 / –), Concept_MatchingLinksFca_4
              (sfc01, tfc01 / L0), Concept_MatchingLinksFca_5 (tfc12 / –),
              Concept_MatchingLinksFca_1 (sfc1, tfc1 / L1), Concept_MatchingLinksFca_3
              (sfc2, tfc2 / L2), Concept_MatchingLinksFca_2 (– / –)

Fig. 11. Matching link formal context and corresponding concept lattice.

Cs which introduces the source graph is below the concept Ct which introduces
the target graph, then we infer the following transformation pattern:

                 part of graph of Cs intent ↪ graph of Ct intent

For example, as sfc1 appears below tfc12, we can deduce that, when the input
of the transformation contains the graph of the intent of sfc1, then the output
contains the graph of the intent of tfc12. These patterns are less reliable, because
the source graph may contain many things that have nothing to do with the
target graph (compare sfc1 and tfc12 to see this phenomenon). However, experts
can have a look at these patterns to find several (concurrent) transformation
patterns when several source model fragments are transformed into the same
target model fragment. We have a symmetric situation when a source graph and
a target graph are not in the same simplified intent, but the concept Ct which
introduces the target graph is below the concept Cs which introduces the source
graph.


4     Feasibility study

We evaluated the feasibility of the approach on two different realistic transforma-
tion examples: (1) UML class diagram to relational schema model that contains

     108 model elements, 10 fragments (5 sources, 5 targets) and 5 matching links
     (U2S) and (2) UML class diagram to entity relationship model that contains 66
     model elements, 6 fragments (3 sources, 3 targets) and 3 matching links (U2E).
     We compute from the obtained graphs for each transformation example several
     pattern categories (see left-hand side of Table 1).
         (1) The transformation patterns coming from simplified intents (which we
     think are the most relevant patterns): they correspond to graphs pairs (GS , GT )
     such that GS and GT are in the simplified intent of a same concept. They can
     be divided into two sets. The set T Pl groups the patterns that are inferred from
     the initial matching links (GS , GT are the ends of a matching link). The set T Pn
     contains the patterns that are learned from graph generalization and matching
     link classification.
         (2) The transformation patterns (T Pnparts ) coming from the graphs GS and
     GT , such that GS is in simplified intent of a concept Cs which is a subconcept
     of the concept Ct which has GT in its simplified intent and all concepts greater
     than Cs and lower than Ct have an empty simplified intent. In addition, we
     consider only the case where simplified intent of Cs contains only source graphs
     or (inclusively) simplified intent of Ct contains only target graphs.
         (3) Symmetrically, the transformation patterns (T Pnpartt ) coming from the
     graphs GS and GT , such that GT is in simplified intent of a concept Ct which
     is a subconcept of the concept Cs which has GS in its simplified intent and all
     concepts greater than Ct and lower than Cs have an empty simplified intent. In
     addition, we consider only the case where simplified intent of Cs contains only
     source graphs or (inclusively) simplified intent of Ct contains only target graphs.


      Table 1. Results. Left-hand side: set cardinalities. Right-hand side: precision metrics

                 #TPl  #TPn  #TPnparts  #TPnpartt            PTPl  PTPn  PTPnparts  PTPnpartt
       Ill. ex.    2     2       2          1      Ill. ex.    1     1      0.72       0.72
        U2S        1     5       3          0       U2S        1    0.75    0.78        -
        U2E        2     2       1          1       U2E        1     1      0.73       0.95



         We also evaluate each extracted transformation pattern using a precision
     metric. Precision here is the number of elements in the source and target graphs
     that correctly participate in the transformation (according to a human expert),
     divided by the total number of elements in the graphs. We then associate a
     precision measure with a set of transformation patterns, which is the average of
     the precisions of its elements (see right-hand side of Table 1).
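Written as formulas (our reformulation of the description above, not the authors' notation), the precision of a single pattern and of a set of patterns P would read:

```latex
% Our reformulation of the precision metric described above.
\[
  \mathit{prec}(G_S, G_T) \;=\;
    \frac{\lvert \{\, e \in G_S \cup G_T \mid e \text{ participates correctly}\,\} \rvert}
         {\lvert G_S \cup G_T \rvert},
  \qquad
  \mathit{prec}(\mathcal{P}) \;=\;
    \frac{1}{\lvert \mathcal{P} \rvert}
    \sum_{(G_S, G_T) \in \mathcal{P}} \mathit{prec}(G_S, G_T).
\]
```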
         The results show that we learn transformation patterns that correspond to
     the initial matching links. These patterns are relevant and efficient (precision =
     1). 17 new transformation patterns are also learned from the three examples used.
     These patterns also seem relevant, with an average precision of 0.83.



5    Related Work

Several approaches have been proposed to mine model transformations. The
MTBE approach consists in learning model transformation from examples. An
example is composed of a source model, the corresponding transformed model,
and matching links between the two models. In [1,15], an alignment between
source and target models is manually created to derive transformation rules.
The approach of [5] consists in using analogy to search, for each source model,
for its corresponding target model, without generating rules. In a previous
work [12], we use Relational Concept Analysis (RCA) to derive commonalities
between the source and target meta-models, models and transformation links to
learn executable transformation rules. The approach based on RCA builds trans-
formation patterns that indicate how a model element, in a specific context, is
transformed into a target element with its own specific context. This approach
has many advantages for the case when the matching link type is one-to-one,
but it is not able to capture the cases where a set of model elements is globally
transformed into another set of model elements (matching link type is many-to-
many). In this paper, we investigate graph mining approaches, to go beyond the
limitations of our previous work. In the current context of MDE, transformation
examples are not very large (they are manually designed), thus we do not expect
scalability problems. Compared with a solution where we would build a lattice
on graphs containing elements from both source and target models coming from
matching links, the solution we choose separately classifies source graphs and
target graphs. This is because source graphs and target graphs could come from
the same meta-model (or from meta-models with common concepts) and it has
no meaning in our context to generalize a source graph and a target graph to-
gether. We also think that the result is more readable, even in the case of disjoint
meta-models.
    Our problem has close connections with the pattern structure approach [4]
when the pattern structure is given by sets of graphs that have labeled vertices.
Graph mining approaches [2,10] aim at extracting repeated subgraphs in a set of
graphs. They use a partial order on graphs which usually relies on morphism or
on injective morphism, also known as subgraph isomorphism [9]. In the general
case, these two morphism operations have an exponential complexity. In this
paper, we rely on graph mining to classify independently the origins and the
destinations of matching links and to infer from this a classification of matching
links, which is then used to extract transformation patterns.


6    Conclusion

We have proposed an approach to assist a designer in her/his task of writing a
declarative model transformation. The approach relies on model transformation
examples composed of source and target model fragments and matching links.
Models and their fragments are represented by graphs with labelled vertices that
are classified. This classification is, in turn, used for classifying the matching links.



     Finally, the mined model transformation patterns express how a source model
     fragment is transformed into a target model fragment. Future directions of this
     work include extending the evaluation to other kinds of source and target meta-
     models, and defining a notion of support for the patterns. We would also like
     to explore different kinds of graph mining approaches, in particular to go
     beyond the limitation of using locally injective graphs. Finally, we plan to apply
     our approach [12] to transform the obtained patterns into operational rules.


     References
      1. Balogh, Z., Varro, D.: Model Transformation by Example Using Inductive Logic
         Programming. Software and Systems Modeling 8(3), 347–364 (2009)
      2. Cook, D.J., Holder, L.B.: Mining Graph Data. John Wiley & Sons (2006)
      3. Fabro, M.D.D., Valduriez, P.: Towards the efficient development of model transfor-
         mations using model weaving and matching transformations. Software and System
         Modeling 8(3), 305–324 (2009)
      4. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: Proc. of
         ICCS’01. pp. 129–142 (2001)
      5. Kessentini, M., Sahraoui, H., Boukadoum, M.: Model transformation as an opti-
         mization problem. In: MODELS’08, LNCS 5301. pp. 159–173. Springer (2008)
      6. Lieberman, H. (ed.): Your Wish Is My Command: Programming by Example.
         Morgan Kaufmann Publishers (2001)
      7. Liquiere, M., Sallantin, J.: Structural machine learning with Galois lattice and
         Graphs. In: Proc. of ICML’98. pp. 305–313 (1998)
      8. Lopes, D., Hammoudi, S., Abdelouahab, Z.: Schema matching in the context of
         model driven engineering: From theory to practice. In: Advances in Systems, Com-
         puting Sciences and Software Engineering. pp. 219–227. Springer (2006)
      9. Mugnier, M.L.: On generalization/specialization for conceptual graphs. J. Exp.
         Theor. Artif. Intell. 7(3), 325–344 (1995)
     10. Nijssen, S., Kok, J.N.: The Gaston Tool for Frequent Subgraph Mining. Electr.
         Notes Theor. Comput. Sci. 127(1), 77–87 (2005)
     11. Norris, E.: An algorithm for computing the maximal rectangles in a binary relation.
         Revue Roumaine Math. Pures et Appl. XXIII(2), 243–250 (1978)
     12. Saada, H., Dolques, X., Huchard, M., Nebut, C., Sahraoui, H.A.: Generation of op-
         erational transformation rules from examples of model transformations. In: MoD-
         ELS. pp. 546–561 (2012)
     13. Saada, H., Huchard, M., Nebut, C., Sahraoui, H.A.: Model matching for model
         transformation - a meta-heuristic approach. In: Proc. of MODELSWARD. pp.
         174–181 (2014)
     14. Weichsel, P.M.: The Kronecker product of graphs. Proceedings of the American
         Mathematical Society 13(1), 47–52 (1962)
     15. Wimmer, M., Strommer, M., Kargl, H., Kramler, G.: Towards model transforma-
         tion generation by-example. In: Proc. of HICSS ’07. p. 285b (2007)




          Interaction Challenges for the Dynamic
          Construction of Partially-Ordered Sets

                           Tim Pattison and Aaron Ceglar

                    {tim.pattison,aaron.ceglar}@defence.gov.au
                      Defence Science & Technology Organisation
                                 West Ave, Edinburgh
                                 South Australia 5111



         Abstract. We describe a technique for user interaction with the in-
         terim results of Formal Concept Analysis which we hypothesise will ex-
         pedite user comprehension of the resultant concept lattice. Given any
         algorithm which enumerates the concepts of a formal context, this tech-
         nique incrementally updates the set of formal concepts generated so far,
         the transitive reduction of the ordering relation between them, and the
         corresponding labelled Hasse diagram. User interaction with this Hasse
         diagram should prioritise the generation of missing concepts relevant to
         the user’s selection. We briefly describe a prototype implementation of
         this technique, including the modification of a concept enumeration al-
         gorithm to respond to such prioritisation, and the incremental updating
         of both the transitive reduction and labelled Hasse diagram.


  1    Introduction
  Formal Concept Analysis (FCA) takes as input a formal context consisting of a
  set of attributes, a set of objects, and a binary relation indicating which objects
  have which attributes. It produces a partially-ordered set, or poset, of formal
  concepts, the size of which is, in the worst case, exponential in the number of
  objects and attributes in the formal context [1]. The computational tasks of enu-
  merating the set of formal concepts, and of calculating the transitive reduction
  of the ordering relation amongst them, therefore scale poorly with the size of the
  formal context. These steps are required to determine the vertices and arcs of
  the directed acyclic graph whose drawing is known as the Hasse diagram of the
  partial order. The layout of this layered graph prior to its presentation to the
  user is also computationally intensive [2]. For contexts of even moderate size,
  there is therefore considerable delay between user initiation of the process of
  FCA and presentation of its results to the user.
      A number of algorithms exist which efficiently enumerate the formal concepts
  of a formal context [3–6]. In this paper, we describe an approach which incre-
  mentally updates and presents the partial order amongst the formal concepts
  generated so far. In particular, it: incrementally updates the transitive reduc-
  tion of the interim partial order as each new concept is generated; incrementally
  updates the layout of the Hasse diagram; and animates the resultant changes to

c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 23–35,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.


the Hasse diagram to assist the user in maintaining their mental model. This
approach enables user exploration and interrogation of the interim partial order
in order to expedite their comprehension of the resultant complete lattice of
concepts. It applies equally to any other partial order, the enumeration of whose
elements is computationally intensive.
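To make the incremental update concrete, the following brute-force sketch, which is our own illustration rather than the authors' algorithm, computes the covering edges of a newly generated element against the interim poset, given a comparison predicate leq.

```python
# Brute-force sketch (ours, not the authors' algorithm) of one incremental
# step: given the elements generated so far and an order predicate leq(a, b),
# compute the upper and lower covers of a newly generated element x.  The
# caller must also delete Hasse edges between a lower and an upper cover,
# which become transitive once x is inserted.

def covers_of_new_element(elements, x, leq):
    above = [y for y in elements if leq(x, y) and y != x]
    below = [y for y in elements if leq(y, x) and y != x]
    # Upper covers: minimal elements among those strictly above x.
    upper = [y for y in above if not any(leq(z, y) and z != y for z in above)]
    # Lower covers: maximal elements among those strictly below x.
    lower = [y for y in below if not any(leq(y, z) and z != y for z in below)]
    return lower, upper

# Toy poset: divisibility on a few integers, inserting the element 6.
print(covers_of_new_element([1, 2, 3, 12], 6, lambda a, b: b % a == 0))
# -> ([2, 3], [12])
```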
    We also describe how this interaction can prioritise the generation and dis-
play of those missing concepts which are most relevant to the user’s current
exploratory focus. By addressing the scalability challenge of visual analytics [7],
this user guidance of computationally intensive FCA algorithms [8] facilitates
the required “human-information discourse”.


1.1   Previous work

Incremental algorithms exist for updating the set of formal concepts and the
transitive reduction of the ordering relation following the addition of a new object
to the formal context [9–11]. A new object can give rise to multiple additional
concepts which must be inserted in the existing complete lattice to produce an
updated lattice which is also complete. In contrast, the technique described in
this paper involves the addition of a single element at a time to a partially
ordered set which is not in general a complete lattice.
     Ceglar and Pattison [8] have argued that user guidance of the FCA process
could allow the satisfaction of the user’s requirements with a smaller lattice,
and consequently in less time, than standard FCA algorithms. They described
a prototype tool which facilitates interactive user guidance and implements an
efficient FCA algorithm which they have modified to respond to that user guid-
ance. The user interaction challenges identified by that work are described and
addressed in this paper.


2     Interacting with a Hasse diagram

2.1   The Hasse diagram

A finite poset ⟨P; ≤⟩
    Episodes are labeled directed acyclic graphs (DAGs). An episode G is a triple (V, E, λ),
where V is the set of vertices, E is the set of directed edges, and λ is the la-
beling function mapping vertices to the set of events E. Several classes of
episodes have been studied since episode mining was first introduced by Man-
nila et al. [11]. We follow the subclasses of episodes studied by Katoh et al. [7]. An


       Fig. 1. An example of an episode studied in episode mining in [11] and [7].


example of episodes is illustrated in Figure 1. In designing pattern mining algo-
rithms, we need 1) a search space of patterns and a partial order for enumerating
patterns, and 2) an interestingness measure to evaluate them. For episode mining,
we often adopt occurrences of episodes defined with windows.

Definition 1 (Windows). For a sequence S = ⟨S1 , . . . , Sn ⟩, a window W of
S is a contiguous subsequence ⟨Si , · · · , Si+w−1 ⟩ of length w, called the width, for
some index i (−w + 1 ≤ i ≤ n) of S and a positive integer w.

Definition 2 (Embedding of Episodes). Let G = (V, E, λ) be an episode,
and W = ⟨S1 , . . . , Sw ⟩ be a window of width w. We say that G occurs in W if
there exists a mapping h : V → {1, . . . , w} satisfying 1) for all v ∈ V , λ(v) ∈
Sh(v) , and 2) for all (u, v) ∈ E with u ≠ v, it holds that h(u) < h(v). The map
h is called an embedding of G into W , and it is denoted by G ⊑ W .

    For an input event sequence S and an episode G, we say that G occurs at
position i of S if G ⊑ Wi , where Wi = ⟨Si , . . . , Si+w−1 ⟩ is the i-th window of
width w in S. We then call the index i an occurrence of G in S. The domain of
the occurrences is given by WS,w = {i | −w + 1 ≤ i ≤ n}. In addition, WS,w (G)
is the occurrence window list of an episode G, defined by {i | −w + 1 ≤ i ≤ n, G ⊑
Wi }. Then we can define an interestingness measure, the frequency of episodes.

Definition 3 (Frequency of Episodes). The frequency of an episode G in S
and w, denoted by freq S,w (G), is defined by the number of windows of width w
containing G. That is, freq S,w (G) = |WS,w (G)|. For a threshold θ ≥ 1, a width
w and an input event sequence S, if freq S,w (G) ≥ θ, G is called θ-frequent on S.

The frequent episode mining problem is defined as follows: Let P be a class of
episodes. Given an input event sequence S, a width w ≥ 1, and a frequency
threshold θ ≥ 1, the problem is to find all θ-frequent episodes G belonging to
the class P. The simplest strategy for finding all θ-frequent episodes is traversing
P by using the anti-monotonicity of the frequency count freq(·). For details, we
refer the reader to [7] and [11].
    For our examples of classes, we introduce m-serial episodes and diamond
episodes. An m-serial episode over E is a sequence of events in the form of a1 ↦
a2 ↦ · · · ↦ am . A diamond episode over E is either 1) a 1-serial episode e ∈ E or
2) a proper diamond episode represented by a triple Q = ⟨a, X, b⟩ ∈ E × 2^E × E,
where a, b are events and X ⊆ E is an event set occurring after a and before


b. For short, we write a diamond episode as a ↦ X ↦ b. On the one hand,
definitions of episodes by graphs are quite general; on the other hand, the classes
of episode patterns studied are often restricted.
Example 3 (Episodes). In Figure 1, we show some serial episodes: A ↦ B ↦ E,
A ↦ D ↦ E, B ↦ E, and C on the set of events E = {A, B, C, D, E}. All of
them are included in a diamond episode A ↦ {B, C, D} ↦ E.
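As a small illustration of Definitions 1–3, here is a sketch of the window-based frequency count, restricted to serial episodes such as those of Example 3 rather than general DAGs; the 0-based indexing and the padding of overhanging windows with empty event sets are our own simplifications.

```python
# Sketch (ours) of the window-based frequency of a serial episode
# a1 -> a2 -> ... -> am in an event sequence S (a list of event sets),
# following Definitions 1-3.  Indexing is 0-based, and windows overhanging
# the sequence are padded with empty event sets.

def freq_serial(S, episode, w):
    n = len(S)
    count = 0
    for i in range(-w + 1, n + 1):                     # window start indices
        window = [S[j] if 0 <= j < n else set()
                  for j in range(i, i + w)]
        # Greedy left-to-right embedding: each event must occur at a
        # strictly later position than the previous one.
        pos = 0
        for event in episode:
            while pos < w and event not in window[pos]:
                pos += 1
            if pos == w:
                break
            pos += 1
        else:
            count += 1                                 # episode embedded
    return count

S = [{"A"}, {"B", "C"}, {"E"}, {"A"}, {"D"}, {"E"}]
print(freq_serial(S, ["A", "B", "E"], 3))              # windows containing A->B->E
```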
    We explain a merit of introducing pattern structures for the summarization of
structured patterns. As we mentioned above, a common strategy adopted in pat-
tern mining is traversing the space P in a breadth-first manner while checking
some interestingness measure. When generating the next candidates of frequent pat-
terns, algorithms always check a parent-child relation between two patterns. This
order is essential for pattern mining, and we thus conjecture that this parent-
child relation used in pattern mining can be naturally adopted in constructing
a pattern structure for analyzing patterns, only by introducing a similarity opera-
tion ⊓. After constructing a lattice, it would be helpful for analyzing the set of all
patterns because its concepts represent all patterns compactly.
    A crucial problem of pattern structures is the computational complexity con-
cerning both ⊓ and ⊑. Our idea is to adopt trees of height 1 (also called stars
in graph theory). That is, we here assume that trees are expressive enough to
represent features of episodes. Our idea is similar to that used in designing graph
kernels [14]1 and is inspired by previous studies on pattern structures [2, 4].

3   Diamond Episode Pattern Structures
In the following, we focus on diamond episodes as our objects, and trees of height
1 as our descriptions. Diamond episodes have two special vertices: the source and
the sink. These can be regarded as important features for representing event
transitions. We generate rooted labeled trees from them by putting such a vertex
at the root of a tree and regarding its neighbors as its children. Since all trees here
have height 1, we can represent them by tuples without using explicit graph notation.

Definition 4 (Rooted Trees of Height 1). Let (E, ⊓E ) be a meet semi-lattice
of event labels. A rooted labeled tree of height 1 is represented by a tuple²
(e, C) ∈ E × 2^E . We represent the set of all rooted labeled trees of height 1 by T.
Note that in (E, ⊓E ), we assume that ⊓E compares labels based on our back-
ground knowledge. We need to take care that this meet semi-lattice (E, ⊓E ) is
independent and different from a meet semi-lattice D of descriptions of a pattern
structure P. This operation ⊓E is also adopted when defining an embedding of
trees of height 1, that is, a partial order between trees defined as follows.
1
  It intuitively generates a sequence of graphs by relabeling all vertices of a graph. One
  focuses on the label of a vertex v ∈ V (G) and sees the labels LN G (v) of its neighbors
  NG (v). From the tuples (lv , LN G (v)) for all vertices v ∈ V (G), all labels are sorted
  lexicographically, and a new label is assigned according to this representation. Details
  can be found in [14].
2
  On the viewpoint of graphs, this tuple (e, C) should represent a graph G = (V, E, λ)
  of V = {0, 1, . . . , |C|}, E = {(0, i) | 1 ≤ i ≤ |C|}, λ(0) = e, {λ(i) | 1 ≤ i ≤ |C|} = C.


",-+./&'0+ G0                                         453+6-*'7.5)/,1-58-)''8
                                  δ(G0 )
&'()*+   "
                     !                !                 !   ⊓E   $       =       9              δ(G0 ) ⊓t δ(G1 )
  !      #      $
                              $       #       "                                                         9
         $ ,+/123')&
                                                                                                    $   #   "
                                                      :+,+)56/;58/',-'<-*2/60)+,
",-+./&'0+ G1
                                                        $    #       "
         "                        δ(G1 )                                                 =+7/>6588/*+-<')-+?+,8&
&'()*+
         #           !                $                 "    #       %       $              "   #   %   $   !
  $             #
         %
                          "       #       %       $         =    $           #       "              9
         $    ,+/123')&


                Fig. 2. An example of computations ⊓ of two trees of height 1.


Definition 5 (Partial Order on Trees). A tree t1 = (e1 , C1 ) is a generalized
subtree of t2 = (e2 , C2 ), denoted by t1 ⊑T t2 , iff e1 ⊑E e2 and there exists an
injective mapping φ : C1 → C2 such that for all v ∈ C1 , v ⊑E φ(v), where ⊑E
is the partial order induced by ⊓E .

     For defining a similarity operator ⊓T between trees, this partial order ⊑T
is helpful because ⊓T is closely related to ⊑T in our scenario. Since all trees
here have height 1, this computation is easy to describe: for the labels of the root
nodes, the similarity operator is immediately given by ⊓E ; for their children, it
is implemented by using the idea of least general generalization (LGG) of two
sets of labels, as used in Inductive Logic Programming [10]. A practical
implementation of LGG depends on whether or not the sets are multisets, but it is
computationally tractable. An example is shown in Figure 2.
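The sketch below spells this computation out for the simplest case, where ⊓E is given by a small toy label hierarchy with the wildcard ⋆ (written * in the code) as the most general label, and where the children of the two roots are generalized by taking all pairwise label meets; this last choice is one simple LGG-like option of ours and not necessarily the exact operation used in the paper.

```python
# Sketch (ours) of the meet of two height-1 trees t = (root_label, children).
# Labels are compared via a toy hierarchy with the wildcard '*' as the most
# general label; children are generalized by taking all pairwise label meets
# (a simple LGG-like choice, not necessarily the paper's exact operation).

PARENT = {"fastball": "pitch", "curve": "pitch", "pitch": "*"}  # toy semi-lattice

def ancestors(label):
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    if chain[-1] != "*":
        chain.append("*")
    return chain

def label_meet(a, b):
    # Most specific common generalization of two labels.
    anc_b = ancestors(b)
    return next(x for x in ancestors(a) if x in anc_b)

def tree_meet(t1, t2):
    (e1, c1), (e2, c2) = t1, t2
    children = {label_meet(x, y) for x in c1 for y in c2}
    return label_meet(e1, e2), children

print(tree_meet(("fastball", {"curve", "fastball"}), ("curve", {"fastball"})))
```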
     We give formal definitions of δ and D. For a graph G = (V, E, λ), we denote
the neighbors of v ∈ V by NG (v). For a proper diamond episode pattern G with
source vertex s ∈ V and sink vertex t ∈ V , the trees of height 1 corresponding
to s and t are defined as Ts = ({s} ∪ NG (s), {(s, u) | u ∈ NG (s)}, λ) and
Tt = ({t} ∪ NG (t), {(u, t) | u ∈ NG (t)}, λ), respectively. By using those
trees, δ(·) can be defined according to the vertices s and t: if we use both Ts and
Tt , then δ(G) = (Ts , Tt ), ⊓T is applied element-wise, and D is defined by
T × T. If we focus on either s or t, then δ(G) = Ts or Tt , and we can use ⊓T
directly by taking D = T.
     Last, we explain relations between our pattern structures and previous stud-
ies. This partial order ⊑T is inspired by a generalized subgraph isomorphism [4]
and by a pattern structure for analyzing sequences [2]. We here give another
description of similarity operators based on the definitions used in [4, 9].

Definition 6 (Similarity Operation ⊓ based on [9]). The similarity op-
eration ⊓ is defined by the set of all maximal common subtrees based on the
generalized subtree isomorphism ⊑T ; for two trees s1 and s2 in T,

             s1 ⊓ s2 ≡ {u | u ⊑T s1 , s2 , and ∀u′ ⊑T s1 , s2 satisfying u ⋢T u′ }.


         Fig. 3. An input S and two diamond episodes mined from S as examples.

Table 1. Numbers of proper diamond episodes and pattern concepts for w ∈ {3, 4, 5}
and M ∈ {100, 200, 300, 400, 500, 600, 700}. In the table below, DE and PDE mean
Diamond Episodes and Proper Diamond Episodes, respectively.

                                                         M and # of pattern concepts
                       Window width w # of DE # of PDE 100 200 300 400 500 600 700
                                     3     729       569 87 137 178 204 247 –    –
                                     4     927       767 74 136 179 225 281 316 336
                                     5     935       775 71 137 187 272 290 313 342



Note that our operator ⊓T can be regarded as a special case of the similarity
operation ⊓ above. From the viewpoint of pattern structures, our trees of height
1 can be regarded as an example of projections from graphs to trees, as studied
in [4, 9], like k-chains (paths on graphs of length k) and k-cycles.


4     Experiments and Discussion for Diamond Episodes

Data and Experiments We gathered data from MLB baseball logs, where a
system records all pitches and plays for all games in a season. We used the
types of balls thrown in pitching, which can be represented by histograms per
batter. For a randomly selected game, we generated an input event sequence for
episode mining by transforming each histogram into the set of types of balls
used³. In forming (E, ⊓E ), we let E be the set of types of balls, and define
⊓E naturally (see the example in Fig. 2). For this S, we applied the diamond episode
mining algorithm proposed in [7] and obtained a set of diamond episodes. The
algorithm has two parameters: the window size w and the frequency threshold
θ. We always set θ = 1 and varied w ∈ {3, 4, 5}. After generating the set G of
frequent proper diamond episodes, we sampled M ∈ {100, 200, . . . , 700} episodes
from G as a subset O of G (that is, satisfying |O| = M and O ⊆ G). We used O
as the set of objects in our pattern structure P. From it we computed all pattern
concepts P(P) based on our discussions in Section 3. In these experiments we set
δ(G) = Ts for a proper diamond episode G and its source vertex s.
3
    In baseball games, pitchers throw many kinds of balls such as fast balls, cut balls,
    curves, sinkers, etc. They are recorded together with their movements by MLB systems.


representations of itemsets, and they are closely related to the closure operator
g ◦ f in FCA with (O, A, I), where O is the set of transaction identifiers and
A is the set of all items. The difficulty with closed patterns for complex data is
that there are no common definitions of closure operators; one usually uses
closedness with respect to the frequency. Here we assume that pattern concepts
are helpful in the same way, following the correspondence between closed itemsets
and concepts.
     To obtain some compact representations, we need to decide how to evaluate
each pattern. The problem here is how to deal with the wildcard ⋆ in descriptions.
When we obtain a concept (X, Y ) with X ⊆ O, Y ⊆ A, this concept (X, Y )
corresponds to a rectangle on I, and by definition there are no 0 entries in the
sub-database I ′ = {(x, y) ∈ I | x ∈ X, y ∈ Y } of I induced by (X, Y ).
If a pair (X ′ , Y ′ ) is not a concept, the corresponding rectangle may contain some
0 entries. We denote the relative ratio of 1 entries in the rectangle given by (X ′ , Y ′ ) as
                                                                                          −1
         r1 (X ′ , Y ′ , I) = (1 − |{(x, y) 6∈ I | x ∈ X ′ , y ∈ Y ′ }|) (|X ′ ||Y ′ |)        ,

where 0 ≤ r1 (X ′ , Y ′ , I) ≤ 1 and r1 (X ′ , Y ′ , I) = 1 if (X ′ , Y ′ ) is a concept. The
quantities r1 (X, Y, I), |X|, and |Y | are applicable for evaluating itemsets. If we only
use the cardinality |A| of the set A of objects, this equals the support count computed
in Iceberg concept lattices [15]. For a concept (X, Y ) of a context K = (O, A, I),
we compute the support count supp(X, Y ) = |g(Y )|/|O| and prune redundant
concepts by using some threshold. For formalizing evaluations of patterns, such
values are generalized by introducing a utility function u : P → R+ . A typical
and well-studied utility function is, of course, the frequency count, or the area
function area(·) which evaluates the size of a rectangle (X, Y ) [6].
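A compact sketch, with representation choices of our own, of the three quantities discussed above (density r1, area, and support) on a binary context given as a set of (object, attribute) pairs:

```python
# Sketch (ours): density r1, area, and support of a candidate rectangle (X, Y)
# over a binary context I given as a set of (object, attribute) pairs.

def r1(X, Y, I):
    zeros = sum(1 for x in X for y in Y if (x, y) not in I)
    return 1 - zeros / (len(X) * len(Y))

def area(X, Y):
    return len(X) * len(Y)

def support(X, objects):
    return len(X) / len(objects)

objects = {"o1", "o2", "o3"}
I = {("o1", "a"), ("o1", "b"), ("o2", "a"), ("o3", "b")}
print(r1({"o1", "o2"}, {"a"}, I),      # 1.0: ({o1, o2}, {a}) is a rectangle of 1s
      area({"o1", "o2"}, {"a"}),       # 2
      support({"o1", "o2"}, objects))  # 0.666...
```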
    Based on the discussion above, if we can define a utility function u(·) for eval-
uating pattern concepts, a similar approach becomes possible for pattern concepts:
choosing a small number of pattern concepts and constructing a summary of pat-
terns with them. Of course, there is no simple way of giving such functions. We
introduce a simple and straightforward utility function uP (·) for pattern
concepts as a first step towards developing pattern summarization via pattern concept
lattices. In this paper, we follow the idea used in tiling databases [6], where a
key criterion is given by area(·). We consider how to compute a value which
corresponds to the area in binary databases. To take into account the wildcard
⋆ used in descriptions, we define the following simple function. For d ∈ D, we let
s(d) and n(d) be the numbers of non-wildcard vertices and of all vertices in a
description d, respectively. Note that if s(d) = n(d), d contains no wildcard labels.
By using these functions, we compute utility values as follows:
these functions, we compute utility values as follows:

                              uP (A, d) = |A| · log (1 + s(d)) .
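A minimal sketch, with our own data layout, of ranking pattern concepts by this utility: a description d is taken to be a (root, children) tree of height 1 as in Section 3, and s(d) simply counts its non-wildcard vertices.

```python
import math

# Sketch (ours): rank pattern concepts (A, d) by u_P(A, d) = |A| * log(1 + s(d)),
# where a description d is a (root_label, children_labels) tree of height 1 and
# s(d) counts its non-wildcard vertices.

def s(d):
    root, children = d
    return (root != "*") + sum(c != "*" for c in children)

def u_p(extent, d):
    return len(extent) * math.log(1 + s(d))

concepts = [({1, 2, 3}, ("*", {"*"})),
            ({1, 2},    ("*", {"0", "2", "3"})),
            ({1},       ("0", {"1", "2", "3"}))]
for extent, d in sorted(concepts, key=lambda c: u_p(*c), reverse=True):
    print(round(u_p(extent, d), 2), d)
```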


5.1    Experiments and Discussions

We compare results of ranking pattern concepts by 1) using only |A| (similar
to the Iceberg concept lattices), and 2) using uP (·) as a utility function. From
the list of pattern concepts generated in experiments of Section 4, we rank all

      Table 2. Results of ranking pattern concepts from 750 episodes in w = 5.

        Utility Top-5 mutually distinct descriptions of pattern concepts
        |A|    (⋆, {⋆}), (2, {⋆}), (0, {⋆}), (3, {⋆}), (1, {⋆})
        uP (·) (⋆, {0, ⋆}), (⋆, {0, 2, 3}), (⋆, {0, 1, 2}), (⋆, {0, 1, 3}), (⋆, {1, 2, 3})



pattern concepts by using each utility function, sort the list in an ascending
order, and compare the two lists. We remove patterns appearing in both
lists to highlight the differences. We give our results in Table 2.
    In the result with uP (·), larger descriptions appear with higher utility values
compared with those obtained by |A|. We can see that, by modifying the terms
concerning ⋆, the results contain more informative nodes, which are labeled by
non-wildcard labels. Here we implicitly assume that descriptions containing fewer
⋆ labels are more useful for understanding the data themselves. From this viewpoint,
considering the two terms s(d) and n(d) of a description d would be an interesting
and useful way to design utility functions for pattern concepts. We conclude that
Iceberg-lattice-based support counts are less effective for pattern summarization
problems if descriptions admit the wildcard ⋆.
    Not only the simple computation of uP (A, d) used above, but also many alter-
natives could be applicable for ranking. Probabilistic methods such as the
minimum description length (MDL) and other information-theoretic criteria would
also be helpful to analyze our study more clearly. Since pattern structures have no
explicit representations as binary cross tables, the difficulty lies in how to deal
with a meet semi-lattice (D, ⊓). For a pattern concept (A, d) and an object
o ∈ O, we say that (A, d) subsumes o if and only if d ⊑ δ(o). This subsump-
tion relation is simple and helpful for evaluating concepts, but it does
not exploit any complex information concerning the hierarchy of events or distances
between two descriptions. In fact, in the experiments we always assume that all
events except ⋆ have the same weight and that ⋆ is the minimum of all events. It
could be important to take similarity measures between events into account for
further development of ranking methods for pattern concepts.

5.2   Related Work
There are several studies related to ours. It is well-known that closed item-
sets correspond to maximal bipartite cliques on bipartite graphs constructed
from K = (O, A, I). Similarly, we sometimes deal with so called pseudo bipartite
cliques [16], where it holds that r1 (X ′ , Y ′ , I) ≥ 1 − ε for a user-specified con-
stant ε. Obviously, pseudo bipartite cliques correspond to rectangles containing
a few 0. We can regard them as some summarization or approximation of closed
itemsets or concepts. Intuitively, if we use some pseudo bipartite cliques as sum-
marization, the value r1 (X, Y, I) can be considered in evaluating (X, Y ). Pseudo
bipartite cliques can be regarded as noisy tiles, which is an extension of tiles [6].
    Another typical approach for summarization is clustering patterns [18, 1]. A
main problem there is how to interpret clusters or centroids, where we need to de-
                      Pattern Structures for Understanding Episode Patterns        57
                                                                                   11

sign a similarity measure and a space in which we compute the similarity. From the
viewpoint of probabilistic models, there is an analysis via the maximum entropy
principle [3]. However, they assume that entries in a database are independently
sampled, and thus we cannot apply those techniques to our setting.


6    Toward Generalizations for Bipartite Episodes
In this paper we assume that our descriptions by trees of height 1 are rich enough
to apply to many classes of episode patterns. We here show how to apply our pattern
structure to other types of episodes, called bipartite episodes, as an example. An
episode G = (V, E, λ) is a partial bipartite episode if 1) V = V1 ∪ V2 for mutually
disjoint sets V1 and V2 , and 2) for every directed edge (x, y) ∈ E, (x, y) ∈ V1 × V2 . If
E = V1 × V2 , the episode G is called a proper bipartite episode. Obviously, vertices
in a bipartite episode G are separated into V1 and V2 , and we can regard them
as generalizations of the source vertex and the sink vertex of diamond episodes.
This indicates that the same approach is applicable to bipartite episodes by defining
⊓ between sets of trees. Fortunately, [9] gives a definition of ⊓ for sets of graphs:
                                                                            
             {t1 , . . . , tk } ⊓ {s1 , . . . , sm } ≡ MAX⊑T ( ⋃i,j ({ti } ⊓ {sj }) ) ,


where MAX⊑T (S) returns only the maximal elements of S with respect to ⊑T . Since
our generalized subtree isomorphism is basically a special case of that for graphs,
we can also apply this meet operation. This example suggests that, if we have some
background knowledge concerning a partition of V , it can be taken into account
for δ and (D, ⊓) in a manner similar to diamond and bipartite episodes.
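The set-level meet can be sketched generically, given any tree order leq_t and pairwise tree meet meet_t, by collecting all pairwise meets and keeping only the maximal ones; the helper names and the toy integer stand-ins below are ours.

```python
# Sketch (ours): meet of two sets of trees as the maximal elements, with
# respect to a tree order leq_t, of all pairwise meets meet_t(ti, sj).  Both
# operations are passed in; trivial integer stand-ins are used below.

def set_meet(ts, ss, meet_t, leq_t):
    candidates = [meet_t(t, s) for t in ts for s in ss]
    return [u for u in candidates
            if not any(leq_t(u, v) and u != v for v in candidates)]

# Toy stand-ins: "trees" are integers, the meet is min, the order is <=.
print(set_meet({3, 5}, {4, 6}, min, lambda a, b: a <= b))   # -> [5]
```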


7    Conclusions and Future Work
In this paper we propose a pattern structure for diamond episodes based on an
idea used in graph kernels and projections of pattern structures. Since we do not
directly compute graph matching operations, we conjecture that our computation
could be efficient. With a slight modification of ⊓, our method is also applicable
to many classes of episodes, not only to diamond patterns, as we mentioned
above. Based on our pattern structure, we discussed summarization by using
mined pattern concepts and showed small examples and experimental results.
    Problems of this type are unsupervised, and there is no common way of
obtaining good results or of evaluating whether or not the results are good. It
would be interesting to study this summarization problem further, based on
concept lattices, by taking into account theoretical backgrounds such as proba-
bilistic distributions. In future work, we will try to analyze theoretical aspects
of summarization via pattern structures including the wildcard ⋆, and its op-
timization problem, in order to obtain compact and interesting summarizations of
many patterns, building on the important merit of a partial order ⊑ between descriptions.


Acknowledgments

This work was supported by Grant-in-Aid for JSPS Fellows (26·4555) and JSPS
KAKENHI Grant Number 26280085.


References
 1. Al Hasan, M., Chaoji, V., Salem, S., Besson, J., Zaki, M.: Origami: Mining rep-
    resentative orthogonal graph patterns. In: Proc. of the 7th ICDM. pp. 153–162
    (2007)
 2. Buzmakov, A., Egho, E., Jay, N., Kuznetsov, S.O., Napoli, A., Raïssi, C.: The
    representation of sequential patterns and their projections within Formal Concept
    Analysis. In: Workshop Notes for LML (ECML/PKDD2013) (2013)
 3. De Bie, T.: Maximum entropy models and subjective interestingness: an application
    to tiles in binary databases. Data Mining and Knowledge Discovery 23(3), 407–446
    (2011)
 4. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: Proc. of
    the 9th ICCS. pp. 129–142 (2001)
 5. Ganter, B., Wille, R.: Formal concept analysis - mathematical foundations.
    Springer (1999)
 6. Geerts, F., Goethals, B., Mielikäinen, T.: Tiling databases. In: Proc. of the 7th
    DS. pp. 278–289 (2004)
 7. Katoh, T., Arimura, H., Hirata, K.: A polynomial-delay polynomial-space algo-
    rithm for extracting frequent diamond episodes from event sequences. In: Proc. of
    the 13th PAKDD. pp. 172–183. Springer Berlin Heidelberg (2009)
 8. Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Revisiting Numerical Pattern Mining
    with Formal Concept Analysis. In: Proc. of the 24th IJCAI (2011)
 9. Kuznetsov, S.O., Samokhin, M.V.: Learning closed sets of labeled graphs for chem-
    ical applications. In: Proc. of the 15th ILP, pp. 190–208 (2005)
10. Lloyd, J.W.: Foundations of Logic Programming. Springer-Verlag New York, Inc.
11. Mannila, H., Toivonen, H., Inkeri Verkamo, A.: Discovery of frequent episodes in
    event sequences. Data Mining and Knowledge Discovery 1(3), 259–289 (1997)
12. Merwe, D., Obiedkov, S., Kourie, D.: AddIntent: A New Incremental Algorithm for
    Constructing Concept Lattices. In: Proc. of the 2nd ICFCA. pp. 372–385 (2004)
13. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed item-
    sets for association rules. In: Prof. of the 7th ICDT. pp. 398–416 (1999)
14. Shervashidze, N., Schweitzer, P., van Leeuwen, E.J., Mehlhorn, K., Borgwardt,
    K.M.: Weisfeiler-Lehman graph kernels. Journal of Machine Learning Research 12,
    2539–2561 (2011)
15. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg
    concept lattices with titanic. Data & Knowledge Engineering 42(2), 189–222 (2002)
16. Uno, T.: An efficient algorithm for solving pseudo clique enumeration problem.
    Algorithmica 56(1), 3–16 (Jan 2010)
17. Vreeken, J., van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress.
    Data Mining and Knowledge Discovery 23(1), 169–214 (2011)
18. Xin, D., Cheng, H., Yan, X., Han, J.: Extracting redundancy-aware top-k patterns.
    In: Proc. of the 12th KDD. pp. 444–453. ACM (2006)
             Formal Concept Analysis for Process
          Enhancement Based on a Pair of Perspectives

               Madori IKEDA, Keisuke OTAKI, and Akihiro YAMAMOTO

           Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
          {m.ikeda, ootaki}@iip.ist.i.kyoto-u.ac.jp, akihiro@i.kyoto-u.ac.jp



             Abstract. In this paper, we propose to use formal concept analysis
             for process enhancement, which is applied to enterprise processes, e.g.,
             operations for patients in a hospital, repair of imperfect products in a
             company. Process enhancement, which is one of the main goals of process
             mining, is to analyze a process recorded in an event log and to improve
             its efficiency based on the analysis. The data formats of the logs, which
             contain events observed from actual processes, depend on the perspectives
             taken during observation. For example, events in logs based on a so-called
             process perspective are represented by their types and time-stamps, while
             observation based on a so-called organization perspective records events
             together with the organizations related to their occurrence. The logs have
             recently become large and complex, and events are represented by many
             features. However, previous process mining techniques take only a single
             perspective into account. For process enhancement, by formal concept
             analysis based on a pair of features from different perspectives, we define
             subsequences of events whose stoppage is fatal to the execution of a process
             as weak points to be removed. In our method, the extent of every concept
             is a set of event types and the intent is a set of resources for the events in
             the extent; then, for each extent, its weakness is calculated by taking event
             frequency into account. We also propose some basic ideas to remove the
             weakest points.

             Keywords: formal concept analysis, process mining, business process
             improvement, event log


      1    Introduction

      In this paper, we show a new application of formal concept analysis: process
      enhancement (or business process improvement), which is one of the main goals of
      process mining. We show that formal concepts are useful for discovering weak points
      of processes, and that a formal concept lattice works as a good guide for removing
      the weak points during process enhancement.
          Formal concept analysis (FCA for short) is a data analysis method which
      focuses on the relationship between a set of objects and a set of attributes in data. A
      concept lattice, which is an important product of FCA, gives us valuable insights
      from a dual viewpoint based on the objects and the attributes. Moreover, because




c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 59–71,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.

   of its simple and strong definition, various types of data can be translated for
   FCA, and so FCA attracts attention across various research domains.
        Process mining [9,13] is a relatively young research domain concerned with
    enterprise processes recorded in event logs, e.g., operations for patients in a
    hospital or repair of imperfect products in a company. It provides a bridge
    between business process management (BPM for short) [12] and data mining.
    BPM has been investigated pragmatically, and data formats, software, and
    management systems have been proposed for manipulating processes. Like recent
    data referred to as "big data", event logs have also become huge and complicated.
    Thus, BPM researchers need theoretically efficient approaches for handling such
    big data. This is also the recent trend of data mining. Though many results have
    been produced in the last decade of process mining, there are still many challenges [11],
    and we work with FCA on two of them: "combining process mining with other
    types of analysis" and "dealing with complex event logs having diverse charac-
    teristics". We treat business process improvement, which is an essential goal of
    process mining, as an application of FCA. In order to achieve it, many matters
    should be considered. First, we have to decide which features of a process are to be
    modified for improvement, and there are various types of features to represent
    the process. In order to categorize the features, six central perspectives have been
    proposed [4, 8]. For improvement in the target features, many modifications can
    be constructed. According to [8], there are 43 patterns of modifications. We
    also have to evaluate the improvement, so an improvement measure is needed for
    the evaluation. Based on the principal aspects of processes, namely time, quality,
    cost, and flexibility, four types of measures are considered [4, 8]. In this paper, in
    order to make a process robust and reliable, we focus on two of the perspectives to
    detect weak points of the process, which are subsequences of events. For the detection,
    our method calculates a weakness degree, regarded as one of the cost measures, for
    each subsequence, which is represented by the extent of a formal concept.
       This paper is organized as follows. In the next section, we introduce process
   mining and give a running example, and then, we show the problem tackled in
   this paper. In Section 3, we explain our process enhancement method. Conclu-
   sions are placed in Section 4.


   2     Process Mining

   In this section, we outline process mining with an example and show the problem
   which we try to solve.


   2.1     Event Logs Observed from Actual Processes

   Process mining has three types: process discovery, process conformance check-
   ing, and process model enhancement. Every type strongly focuses on and starts
   from facts observed from actual processes. This is the main difference from BPM
   (Business Process Management) [12] and from WFM (Workflow Management) [6],
   which are predecessor fields of process mining and rely on prior knowledge.

The observed facts are recorded in event logs, and so the logs are the most
important materials in process mining.
    Actual event logs are usually represented in a semi-structured format like
MXML [15] and XES [17]. Theoretically, every event log can be simply formal-
ized as a pair (F, E) of a finite set F of features and a finite set E of events.
Every feature f ∈ F is a function from E to its domain Df , and every event
e ∈ E is recorded in the form (f1 (e), f2 (e), . . . , f|F | (e)) ∈ Df1 × Df2 × · · · × Df|F | .
Each event corresponds to an occurrence or a task found by observing an actual
process. The observation is performed based on perspectives, and the set of
features is decided depending on them. Mathematically, a set P
of perspectives is such that every perspective p ∈ P is a non-empty subset
of F . Though six central perspectives, called process, object, organization,
informatics, IT application, and environment, have been proposed [4, 8], there is
no standard for deciding which P should be adopted in the observation. The set of
perspectives P varies from one observation to another depending on the aims of
process mining, the kinds of processes executed by organizations, the sensor systems
installed in organizations, and many other factors. There are, however, some funda-
mental perspectives which are currently adopted in the construction of event logs. Our
approach focuses on two of them. The first is the process perspective (sometimes
called a control-flow perspective), which focuses on how a process occurs. If a
process is observed based on this perspective, the set of features in its event log
must include an event type feature, a time stamp feature, and a case feature. The
case feature makes clear in which case each event occurs (note that some researchers
regard the case feature as belonging to another perspective, the case perspective).
Based on such a perspective, event logs clarify the ordering of events for each case,
and the set E of events can be treated as a partially ordered set (E, ≤), so we
sometimes use E for the poset (E, ≤) in this paper. A sequence of events occurring
in a case, ordered by time, is called a trace. At the same time, a process can be
observed based on the organization perspective, which is another fundamental
perspective. This perspective focuses on where an occurrence happens or who
performs a task, and event logs based on it must have a place feature, a resource
feature, or an employee feature. In this paper, we assume that a given event log
records a statistically sufficient number of events.
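
To make the formalization above concrete, the following Python sketch (the representation
and names are ours, not part of any process mining library) stores an event log as a list
of events and treats each feature f ∈ F as a function from events to its domain Df ; the
two sample events are taken from Table 1 below.

    from typing import Any, Callable, Dict, List

    # A sketch of an event log L = (F, E): events are dictionaries, and each
    # feature f in F is modelled as a function from events to its domain D_f.
    E: List[Dict[str, Any]] = [
        {"case": 1, "type": "register request", "resource": "Pete",
         "cost": 50, "time": "30-12-2010.11:02"},
        {"case": 1, "type": "examine thoroughly", "resource": "Sue",
         "cost": 400, "time": "31-12-2010.10:06"},
        # ... the remaining events of Table 1
    ]

    F: Dict[str, Callable[[Dict[str, Any]], Any]] = {
        name: (lambda e, n=name: e[n])
        for name in ("case", "type", "resource", "cost", "time")
    }

    # An event e is recorded as the tuple (f1(e), ..., f|F|(e)) in Df1 x ... x Df|F|.
    def record(e: Dict[str, Any]) -> tuple:
        return tuple(f(e) for f in F.values())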

Example 1 As a running example, we show a process handling requests for com-
pensation within an airline. Customers may request the airline to compensate for
various reasons, e.g., a flight delay or cancellation. In such situations, the airline
has to examine the validity of the request and needs to pay compensation if the
request is justified. Table 1 shows an event log recording the compensation process,
partially quoted from [13]. In this example, an event means a task executed by an
employee: the first event in the table shows that a task called "register request" is
executed as the beginning of Case 1 by Pete at 11:02 on 30 Dec., 2010. In this log,
the features Case ID, Event type, and Time are based on the process perspective.
The Resource feature is based on the organization perspective and represents the
human resource needed for each event. The Cost feature comes from another
perspective. The log also shows that

   Table 1. An event log L = (F, E) recording a compensation process of an airline: each
   row shows an event which is represented by five features.

             Case ID     Event type      Resource Cost Time (dd-mm-yyyy.hh:mm)
               1      register request    Pete     50    30-12-2010.11:02
               1    examine thoroughly    Sue     400    31-12-2010.10:06
               1        check ticket     Mike     100    05-01-2011.15:12
               1           decide         Sara    200    06-01-2011.11:18
               1       reject request     Pete    200    07-01-2011.14:24
               2      register request   Mike      50    30-12-2010.11:32
               2        check ticket     Mike     100    30-12-2010.12:12
               2     examine casually    Sean     400    30-12-2010.14:16
               2           decide         Sara    200    05-01-2011.11:22
               2     pay compensation    Ellen    200    08-01-2011.12:05
               3      register request    Pete     50    30-12-2010.14:32
               3     examine casually    Mike     400    30-12-2010.15:06
               3        check ticket     Ellen    100    30-12-2010.16:34
               3           decide         Sara    200    06-01-2011.09:18
               3     reinitiate request   Sara    200    06-01-2011.12:18
               3    examine thoroughly Sean       400    06-01-2011.13:06
               3        check ticket      Pete    100    08-01-2011.11:43
               3           decide         Sara    200    09-01-2011.09:55
               3     pay compensation    Ellen    200    15-01-2011.10:45



   three cases are observed and recorded as three traces, and that their lengths are
   5, 5, and 9, respectively.


   2.2     Models of Processes
   Models of processes are also important in process mining because they are deeply
   related to the three types of process mining: models are extracted from event
   logs by process discovery, and they are used together with event logs for process
   conformance checking and for process model enhancement. Note that different
   types of models can be considered, and they have been researched for various
   aims of mining. Some models have been proposed for extracting the procedure of
   processes, e.g., Petri nets [16], Business process modeling notation (BPMN) [3],
   Event-driven process chains (EPC) [7], and UML activity diagrams [2]. These pro-
   cedure models express the workflow of a process clearly as directed graphs. For an-
   other aim, expressing how resources are involved in a process or how resources are
   related to each other, social network models have been proposed [10, 14]. A working-
   together social network expresses relations among resources which are used in
   the same case. A similar-task social network ignores cases but focuses on re-
   lations among resources used together for the same event. A handover-of-work
   social network expresses handovers from resource to resource within cases.
       All of these models are developed for representation and do not provide any
   analytical function. In other words, they only push event logs into their format,


(The net connects a start place and an end place through the transitions register request,
examine casually, examine thoroughly, check ticket, decide, reinitiate request, reject
request, and pay compensation.)

Fig. 1. A Petri net of the compensation process: every square called a transition indi-
cates an event, and every circle called a place represents a state of the process.



and analysis is not their duty. However, for process enhancement, we need some
analytical function for evaluating the enhancement. In addition, models focusing
on one perspective are apt to neglect other perspectives. For example, the pro-
cedure models focusing on the process perspective do not contain information
about resources, which are observed based on the organization perspective. Con-
versely, the social networks focusing on the organization perspective make
correlations among resources explicit but leave the workflows observed based on
the process perspective unclear. For our goal, detecting weak points of a process,
we claim that its weakness should be measured based on at least two perspectives.
This work thus relates to process model enhancement, which is to extend a
process model.
Example 2 Figure 1 shows a procedure model which is expressed in terms of
a Petri net [16] extracted from the event log shown in Table 1. This model
explicitly expresses the workflow of the compensation process and makes it clear
which event happens before/after another event. On the other hand, the model
ignores other perspectives: information derived from Resource and Cost features
are not expressed at all in the model. Figure 2 shows a similar-task social network
[10, 14] generated from the same event log. This model clarifies relations among
employees sharing the same tasks, but it does not care about the ordering of
events.


2.3      Weak Points Detection for Process Enhancement
   Our final goal is process enhancement. For this goal, we propose to detect subse-
   quences of events in a given event log as weak points which should be removed.
   Strictly speaking, our method does not decide whether or not subsequences of events
   are weak points. Instead, the method estimates the weakness of each of certain sub-
   sequences of events and expresses it as a number called a weakness degree. Then,
   the weaker subsequences of events should be removed for the enhancement.
      For the definition of the weakness degree, there are various candidates. If we
   focus on the process perspective, sequences of events taking a lot of time in a process
   must be its weak points. Another type of weak point is a looping sequence which

(Nodes: Pete, Sue, Mike, Sara, Sean, and Ellen.)

   Fig. 2. A similar-task social network of the compensation process: every circle indicates
   an employee, and an edge is drawn between employees if their tasks are statistically
   similar.



   many cases have to take. In the running example, it is reasonable to take the costs
   of events into account for weakness. In this work, we focus on the importance of
   a subsequence of events and on its load. The importance is decided based on
   the process perspective and on the organization perspective. More precisely, a
   subsequence of events in an event log is considered important if the events are
   executed by a small number of resources in the log. The load of an important
   sequence increases if the sequence appears many times in the log. In our method,
   important sequences of events having heavy loads are the weak points of a process.
   Example 3 In the running example, the subsequence "decide" executed by Sara
   should be regarded as weaker than the others, because the subsequence is impor-
   tant due to the fact that it can be executed only by Sara, and because the event
   "decide" by Sara is very frequent. From the Petri net shown in Figure 1 alone,
   it can be inferred that the event "decide" is important in the process. It can also
   be inferred from the social network shown in Figure 2 alone that Sara plays some
   important role. However, these models do not show explicitly that "decide" by
   Sara is important and has an impact on the process.


   3     Process Enhancement via FCA
   We adopt FCA for mining weak points of processes, so we first introduce the
   definitions of formal concepts and formal concept lattices, referring to [1, 5].
   Then, we explain our method.

   3.1     From an Event Log to a Concept Lattice
   A formal context is a triplet K = (G, M, I) where G and M are mutually
   disjoint finite sets, and I ⊆ G × M . Each element of G is called an object,
   and each element of M is called an attribute. For a subset of objects A ⊆ G
   and a subset of attributes B ⊆ M of a formal context K, we define A^I =
   { m ∈ M | ∀g ∈ A. (g, m) ∈ I } and B^I = { g ∈ G | ∀m ∈ B. (g, m) ∈ I }, and a pair
   (A, B) is a formal concept if A^I = B and A = B^I . For a formal concept
c = (A, B), A and B are called the extent and the intent, respectively, and we
let Ex(c) = A and In(c) = B. For arbitrary formal concepts c and c′, we define
an order c ≤ c′ iff Ex(c) ⊆ Ex(c′) (or equivalently In(c) ⊇ In(c′)). The set of all
formal concepts of a context K = (G, M, I) together with the order ≤ is denoted
by B(G, M, I) (B(K) for short) and is called the formal concept lattice (concept
lattice for short) of K. For every object g ∈ G of (G, M, I), the formal concept
({ g }^II , { g }^I ) is called the object concept and denoted by γg. Similarly, for ev-
ery attribute m ∈ M , the formal concept ({ m }^I , { m }^II ) is called the attribute
concept and denoted by µm.
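
To illustrate these definitions, the following Python sketch enumerates all formal concepts
of a small context by brute force; it is only an illustration (an efficient algorithm such as
NextClosure would be used in practice), and the function names are ours.

    from itertools import combinations

    def up(A, M, I):
        # A^I: the attributes shared by all objects in A.
        return {m for m in M if all((g, m) in I for g in A)}

    def down(B, G, I):
        # B^I: the objects having all attributes in B.
        return {g for g in G if all((g, m) in I for m in B)}

    def concepts(G, M, I):
        # Naively enumerate all formal concepts (A, B) with A^I = B and B^I = A:
        # for every subset B of M, the pair (B^I, B^II) is a formal concept.
        found = {}
        for r in range(len(M) + 1):
            for B in combinations(sorted(M), r):
                A = down(set(B), G, I)
                found[frozenset(A)] = (A, up(A, M, I))
        return list(found.values())

    def leq(c1, c2):
        # The lattice order c <= c' is inclusion of extents.
        return c1[0] <= c2[0]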
    In our method, a formal context is obtained by translation from an event
log, and then weak point mining is performed with a concept lattice constructed
from the context. Suppose that the event log contains two features such that
one of them is based on the process perspective and the other is based on
the organization perspective. In this paper, the first one is called an event-type
feature and is denoted by fe , and the second is called a resource feature and is
denoted by fr . Note that the event-type feature represents the types of events, not
cases and not time. This assumption is not strong because such features are very
fundamental and are in fact adopted in XES [17]. From such an event log L =
(F, E) with F ⊇ { fe , fr }, a formal context KL = (G, M, I) is obtained, where
G = Dfe , M = Dfr , and I = { (g, m) ∈ G × M | ∃e ∈ E. fe (e) = g ∧ fr (e) = m }.
In the context KL = (G, M, I), (g, m) ∈ I means that events of event-type g
need a resource m. For every element (g, m) ∈ I of the formal context KL , we
additionally define

              freq((g, m)) = | { e ∈ E | fe (e) = g ∧ fr (e) = m } |.

This function outputs the frequency of events that are of event-type g and need
resource m in the event log L.
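
Under the same illustrative event representation as before (events as dictionaries whose
"type" and "resource" entries stand for fe and fr ), the translation into KL and the
function freq can be sketched as follows.

    from collections import Counter

    def log_to_context(events, fe="type", fr="resource"):
        # Translate an event log into the formal context K_L = (G, M, I) together
        # with the frequency function freq on I (a sketch; feature names are ours).
        G = {e[fe] for e in events}                      # event types (objects)
        M = {e[fr] for e in events}                      # resources (attributes)
        freq = Counter((e[fe], e[fr]) for e in events)   # freq((g, m))
        I = set(freq)                                    # (g, m) in I iff freq((g, m)) > 0
        return G, M, I, freq

Applied to the nineteen events of Table 1, this yields the context of Table 2; for instance,
freq[("register request", "Pete")] is 2, as stated in Example 4.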
Example 4 In the running example, “Event type” corresponds to the event-
type feature, and "Resource" corresponds to the resource feature. Therefore, the
formal context KL = (G, M, I) shown in Table 2 is obtained from the event log
shown in Table 1. For example, freq((register request, Pete)) = 2 shows that an
event "register request" by Pete is observed twice in the construction of the event
log in Table 1.
    From a formal context KL translated from an event log L, a concept lattice
B(KL ) is constructed for process enhancement. Each formal concept c = (A, B)
of the concept lattice B(KL ) represents a pair of a set A of event-types and a
set B of resources needed for events in A. For every formal concept c ∈ B(KL ),
we define
                      Exγ (c) = { g ∈ Ex(c) | γg = c } , and
                       Inµ (c) = { m ∈ In(c) | µm = c } .
By extending freq on I to formal concepts, we also define

                     freq(c) = Σ_{g ∈ Ex(c)} Σ_{m ∈ In(c)} freq((g, m)).

   Table 2. A formal context KL = (G, M, I) constructed from the event log L of the
   compensation process: elements of G are listed in the leftmost column, elements of M
   are listed in the first row, and every non-empty cell shows freq(i) for the pair i ∈ I
   (empty cells correspond to pairs not in I).

                                           Pete Sue Mike Sara Sean Ellen
                       register request     2        1
                      examine thoroughly         1             1
                         check ticket       1        2               1
                            decide                        4
                        reject request      1
                      examine casually               1         1
                     pay compensation                                2
                      reinitiate request                  1




   The value freq(c) is the sum of frequencies of events which are sorted into an
   event-type g ∈ Ex(c) and need a resource m ∈ In(c).
   Example 5 Figure 3 shows the concept lattice B(KL ) of the context KL =
   (G, M, I) shown in Table 2. For example, consider the formal concept
   c2 = ({ check ticket, pay compensation } , { Ellen }). The
   sum of frequencies freq(c2 ) = 3 means that a task “check ticket” or “pay com-
   pensation” executed by Ellen appears three times in the event log L shown in
   Table 1.


   3.2     Calculating Weakness Degrees
   As we mentioned in Section 2.3, for every subsequence of events which is the
   extent of a formal concept, we define a weakness degree, and the weakness is
   estimated from its importance and its load.
       The importance is estimated based on both the process perspective and
   the organization perspective. Every formal concept (A, B) ∈ B(KL ) is based
   on both perspectives because A is a set of event-types observed from the
   process perspective and B is a set of resources observed from the organization
   perspective. Such a formal concept is considered to represent that accomplishing
   all the events in A needs at least one of the resources in B and that every
   resource in B can execute all the events in A. From this consideration, we define
   the importance imp(c) of the subsequence Ex(c) of a formal concept c ∈ B(KL )
   as
                 imp(c) = ((1 + |Exγ (c)|) / (1 + |In(c)|)) × ((1 + |Ex(c)|) / (1 + |Inµ (c)|)).
   We call this an importance factor. Roughly speaking, this factor becomes large
   when a small number of resources are needed for a large number of events. The
   first ratio compares the number of events with the number of resources which
   can accomplish those events. In other words, if many events rely on

      c1 : Ex = {register request, examine thoroughly, check ticket, decide, reject request,
                 examine casually, pay compensation, reinitiate request}, In = ∅,
                 freq = 0, imp = 9, weak = 0
      c2 : Ex = {check ticket, pay compensation}, In = {Ellen},
                 freq = 3, imp = 1.5, weak ≈ 0.24
      c3 : Ex = {register request, check ticket, reject request}, In = {Pete},
                 freq = 4, imp = 2, weak ≈ 0.42
      c4 : Ex = {register request, check ticket, examine casually}, In = {Mike},
                 freq = 4, imp = 1, weak ≈ 0.21
      c5 : Ex = {decide, reinitiate request}, In = {Sara},
                 freq = 5, imp = 2.25, weak ≈ 0.59
      c6 : Ex = {register request, check ticket}, In = {Pete, Mike},
                 freq = 6, imp = 2, weak ≈ 0.63
      c7 : Ex = {examine thoroughly, examine casually}, In = {Sean},
                 freq = 2, imp = 0.75, weak ≈ 0.08
      c8 : Ex = {check ticket}, In = {Pete, Mike, Ellen},
                 freq = 4, imp = 1, weak ≈ 0.21
      c9 : Ex = {examine casually}, In = {Mike, Sean},
                 freq = 2, imp ≈ 1.33, weak ≈ 0.14
      c10: Ex = {examine thoroughly}, In = {Sue, Sean},
                 freq = 2, imp ≈ 0.67, weak ≈ 0.07
      c11: Ex = ∅, In = {Pete, Sue, Mike, Sara, Sean, Ellen},
                 freq = 0, imp ≈ 0.14, weak = 0

Fig. 3. The formal concept lattice B(KL ) constructed from the formal context KL : the
eleven formal concepts c ∈ B(KL ) are listed above with Ex(c), In(c), freq(c), imp(c),
and weak(c). In the lattice order ≤, greater concepts lie above smaller ones (transitive
orders omitted in the original drawing), with c1 as the greatest and c11 as the least
concept.



few resources, then the ratio is large. The second ratio compares the number of
events executed by the resources with the number of those resources; it becomes
large if a few resources are exhausted by many events. Also,
we define load(c) of the subsequence Ex(c) as
                                          load(c) = freq(c) / |E|
and call it a load factor. This is the ratio of the frequency of events in the sequence
Ex(c) to the total number of events in E. Then, for the subsequence Ex(c), the
weakness degree weak(c) is defined as

                                   weak(c) = imp(c) × load(c).

When an important sequence Ex(c) takes a heavy load, weak(c) becomes large.
In other words, the weakness degree numerically shows how likely trouble
   with Ex(c) is to bring the whole process down. By extending this definition, the
   weakness of the whole process can be expressed as Σ_{c ∈ B(KL )} weak(c).
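
Putting the definitions of this subsection together, the computation of freq(c), imp(c),
load(c), and weak(c) for a single concept can be sketched as follows (a rough illustration;
the helper names are ours).

    def weakness_measures(concept, G, M, I, freq, num_events):
        # Compute freq(c), imp(c), load(c), and weak(c) for a formal concept
        # c = (A, B) of B(K_L), following the definitions above.
        A, B = concept
        up = lambda S: {m for m in M if all((g, m) in I for g in S)}    # S^I for objects S
        down = lambda S: {g for g in G if all((g, m) in I for m in S)}  # S^I for attributes S
        ex_gamma = {g for g in A if up({g}) == set(B)}   # Exγ(c): objects g with γg = c
        in_mu = {m for m in B if down({m}) == set(A)}    # Inµ(c): attributes m with µm = c
        f = sum(freq.get((g, m), 0) for g in A for m in B)
        imp = (1 + len(ex_gamma)) / (1 + len(B)) * (1 + len(A)) / (1 + len(in_mu))
        load = f / num_events
        return f, imp, load, imp * load

For the concept c5 = ({ decide, reinitiate request } , { Sara }) of Figure 3 and the nineteen
events of Table 1, this yields imp(c5 ) = 2.25 and weak(c5 ) ≈ 0.59, matching the values in
the figure, and summing weak(c) over all concepts gives the total weakness of about 2.59
mentioned in Example 6.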
   Example 6 In Figure 3, the importance factors and weakness degrees of every sub-
   sequence of events Ex(c), c ∈ B(KL ), are also shown. The importance factors
   show that the sequence of tasks Ex(c5 ) = { decide, reinitiate request } executed
   by Sara is the most important. Indeed, no employee but Sara can execute the
   tasks "decide" and "reinitiate request". On the other hand, the weakness degrees
   show that the sequence Ex(c6 ) = { register request, check ticket } of tasks is the
   weakest, and that the most important sequence Ex(c5 ) is the second weakest.
   This reversal of roles is caused by their load factors. The total weakness of the
   whole process, Σ_{c ∈ B(KL )} weak(c), is around 2.59.

   3.3   Removing Weak Points
   A process recorded in an event log L can be enhanced by removing the weak-
   est point or by reducing the total weakness Σ_{c ∈ B(KL )} weak(c). Though there
   are many ways of achieving the enhancement, in this paper we achieve it by
   operations on the original formal context KL = (G, M, I) which remove some of
   the weakest formal concepts from its concept lattice B(KL ), or which reduce the
   total Σ_{c ∈ B(KL )} weak(c). We here show some basic ideas for such operations.
       Observing the definitions of the weakness shows that there are three
   plans for the reduction: reducing importance factors, reducing load factors, and
   decreasing the number of formal concepts. Though there are many operations
   achieving these plans, the realizable operations are restricted by the fact that we
   try to manage an actual enterprise process. A reduction of importance factors
   can be achieved by increasing the number of resources relative to the number of
   events requiring them. Reducing the number of events can also decrease importance
   factors, but we do not adopt this way because it carries the risk that the process
   no longer works. In other words, we try to enhance processes by investment in
   equipment, not by trimming processes. Besides, reducing load factors is not
   reasonable for our method, because we do not have control over the frequency of
   events. Thus, our enhancement operations are to increase the resources for events
   requiring them or to decrease the number of formal concepts.
       For the enhancement of a process recorded in an event log L, we show two kinds
   of such operations. The first kind is adding a pair (g, m) ∉ I, where g ∈ Ex(c) and
   m ∈ M , to I in order to remove a formal concept c from B(KL ). This means
   expanding the flexibility of resources, e.g., updating machines, or expanding the
   applicability of materials by an innovation. We have to note that the total weakness
   is not always reduced in this case. The second kind is adding a new resource
   m ∉ M to M and pairs (g, m) ∉ I with g ∈ Ex(c) to I. This can reduce the total
   weakness Σ_{c ∈ B(KL )} weak(c). It means introducing new resources for the
   sequence of events Ex(c), for example, purchasing the same machines as existing
   ones, or using a substitute to make up a shortage of materials. In order to decide
   properly which kind of operation should be executed, we need other factors, e.g.,
   the execution time of the process, or the costs and ease of applying the operations.
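
As a rough sketch of these two kinds of operations (ours, and deliberately simplified:
how the event frequencies change after an operation depends on the actual process,
cf. the equal-sharing assumption in Example 7 below), they amount to editing the
formal context and then re-evaluating the total weakness.

    def add_capability(M, I, freq, g, m):
        # First kind: add a missing pair (g, m) to I, i.e., the existing resource m
        # becomes able to execute event type g; frequencies must be re-estimated.
        return set(M), set(I) | {(g, m)}, {**freq, (g, m): freq.get((g, m), 0)}

    def add_new_resource(M, I, freq, g, new_m):
        # Second kind: introduce a brand-new resource new_m for event type g.
        return set(M) | {new_m}, set(I) | {(g, new_m)}, {**freq, (g, new_m): 0}

After either operation, one would rebuild B(KL ) and recompute the total weakness to
check whether it actually decreases, since, as noted above, the first kind of operation
does not always reduce it.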

Example 7 In the running example, there are some choices for removing the
weakest sequence Ex(c6 ) = { register request, check ticket }. For example, the addi-
tion of (register request, Ellen) to I, which means that Ellen acquires the ability to
"register request", can remove the weak point. It removes the concept c6 , changes
c2 into ({ register request, check ticket, pay compensation } , { Ellen }), and c8 into
({ register request, check ticket } , { Pete, Mike, Ellen }), respectively. If we assume
that "register request" is shared equally by Pete, Mike, and Ellen, the num-
bers change: freq(c2 ) = 4, imp(c2 ) = 2, weak(c2 ) ≈ 0.42, freq(c3 ) = 3,
imp(c3 ) = 2, weak(c3 ) ≈ 0.32, freq(c8 ) = 7, imp(c8 ) = 2.25, weak(c8 ) ≈ 0.83. In
this case, the total weakness increases to around 2.66. Employing a new person,
Bob, who is able to execute "register request", is an operation of the second
kind. This adds Bob ∉ M to M and (register request, Bob) ∉ I to I.
In this case, a new concept c12 = ({ register request } , { Bob }) is generated, and
then the total weakness decreases to 2.17, assuming that "register request" is
shared equally by Pete, Mike, and Bob; weak(c3 ) and weak(c6 ) decrease
to around 0.32 and 0.26, respectively, and weak(c12 ) ≈ 0.05.


4    Conclusions

In this paper, we propose to apply FCA (formal concept analysis) to process
enhancement. FCA analyzes data from a dual viewpoint based on objects and
attributes. Processes are recorded in event logs which are constructed by observation
based on some perspectives. We assign a pair of the process perspective and the
organization perspective to the objects and the attributes of FCA in order to
investigate weak points of a process. The weakness of a sequence of events executed
by resources is calculated from its importance and its load.
    There are many problems to be solved. Our notion of process weakness is not
defined from sufficient analysis because only two features from two perspectives are
considered. For improving a process more efficiently, we need to take other features
across other perspectives into account in weak point detection. For example, using
a time-stamp feature enables us to detect bottlenecks of a process, and using a cost
feature enables us to find costly sequences. This may be achieved by combining
other process models with our concept lattice. We also have to refine the operations
for removing weak points. In our method, the number of choices for enhancement
sometimes becomes very large. One plan for the refinement is to estimate in advance
the total weakness of the reinforced process for each of the choices. Combining
other models is also useful. For example, combining procedure models with our
method can suggest some effective operations among the many choices, because
such models adequately treat the order of events in traces, which is ignored by
our lattice-based approach. On the other hand, there are many constraints on
resources in practical processes, e.g., some materials can be substituted by only a
few others while the rest cannot, and employees are divided into groups in a
company. In order to reduce the choices based on such constraints, social network
models might be useful.

   Acknowledgment
   This work was supported by JSPS KAKENHI Grant Number 26280085.

   References
   1. B. A. Davey, and H. A. Priestley. Introduction to Lattices and Order. Cambridge
      University Press, 2002.
   2. M. Dumas, and A. ter Hofstede. UML Activity Diagrams as a Workflow Specification
      Language. In M. Gogolla and C. Kobryn, editors, UML 2001 The Unified Modeling
      Language. Modeling Languages, Concepts, and Tools, Lecture Notes in Computer
      Science, vol. 2185, pp. 76–90, 2001.
   3. R. Flowers, and C. Edeki. Business Process Modeling Notation. International Jour-
      nal of Computer Science and Mobile Computing, vol. 2, issue 3, pp. 35–40, 2013.
   4. F. Forster. The Idea behind Business Process Improvement : Toward a Business
      Process Improvement Pattern Framework. BP Trends, pp. 1–14, 2006.
   5. B. Ganter, and R. Wille. Formal Concept Analysis: Mathematical Foundations.
      Springer-Verlag New York Inc., 1997.
   6. D. Georgakopoulos, M. Hornick, and A. Sheth. An Overview of Workflow Manage-
      ment: From Process Modeling to Workflow Automation Infrastructure. Distributed
      and Parallel Databases, vol. 3, issue 2, pp. 119–153, 1995.
   7. J. Mendling, and M. Nüttgens. EPC markup language (EPML): an XML-based
      interchange format for event-driven process chains (EPC). Information Systems and
      e-Business Management, vol. 4, issue 3, pp. 245–263, 2006.
   8. H. A. Reijers, and S.L. Mansar. Best Practices in Business Process Redesign: An
      Overview and Qualitative Evaluation of Successful Redesign Heuristics. Omega, vol.
      33, issue 4, pp. 283–306, 2005.
   9. A. Shtub, and R. Karni. ERP - The Dynamics of Supply Chain and Process Man-
      agement. Springer, 2010.
   10. M. Song, and W. van der Aalst. Towards Comprehensive Support for Organiza-
      tional Mining. Decis. Support Syst., vol. 46, issue 1, pp. 300–317, 2008.
   11. W. van der Aalst, et al. Process Mining Manifesto. In F. Daniel, K. Barkaoui
      and S. Dustdar, editors, Business Process Management Workshops, Lecture Notes
      in Business Information Processing, vol. 99, pp. 169–194, 2012.
   12. W. van der Aalst, A. ter Hofstede, and M. Weske. Business Process Management: A
      Survey. In W. van der Aalst and M. Weske, editors, Business Process Management,
      Lecture Notes in Computer Science, vol. 2678, pp. 1–12, 2003.
   13. W. van der Aalst. Process Mining - Discovery, Conformance and Enhancement of
      Business Processes. Springer, 2011.
   14. W. van der Aalst, H. Reijers, and M. Song. Discovering Social Networks from Event
      Logs. Comput. Supported Coop. Work, vol. 14, issue 6, pp. 549–593, 2005.
   15. W. van der Aalst, B. van Dongen, J. Herbst, L. Maruster, G. Schimm, and A.
      Weijters. Workflow Mining: A Survey of Issues and Approaches. Data Knowl. Eng.,
      vol. 47, issue 2, pp. 237–267, 2003.
   16. B. van Dongen, A. Alves de Medeiros, and L. Wen. Process Mining: Overview and
      Outlook of Petri Net Discovery Algorithms. In K. Jensen and W. van der Aalst,
      editors, Transactions on Petri Nets and Other Models of Concurrency II, Lecture
      Notes in Computer Science, vol. 5460, pp. 225–242, 2009.
   17. H. Verbeek, J. Buijs, B. van Dongen, and W. van der Aalst. XES, XESame, and
      ProM 6. In P. Soffer and E. Proper, editors, Information Systems Evolution, Lecture
      Notes in Business Information Processing, vol. 72, pp. 60–75, 2011.
            Merging Closed Pattern Sets in Distributed
                     Multi-Relational Data

                                 Hirohisa Seki⋆ and Yohei Kamiya

                       Dept. of Computer Science, Nagoya Inst. of Technology,
                                 Showa-ku, Nagoya 466-8555, Japan
                                       seki@nitech.ac.jp



               Abstract. We consider the problem of mining closed patterns from
               multi-relational databases in a distributed environment. Given two lo-
               cal databases (horizontal partitions) and their sets of closed patterns
               (concepts), we generate the set of closed patterns in the global database
               by utilizing the merge (or subposition) operator, studied in the field of
               Formal Concept Analysis. Since the execution times of the merge opera-
               tions increase with the increase in the number of local databases, we pro-
               pose some methods for improving the merge operations. We also present
               some experimental results using a distributed computation environment
               based on the MapReduce framework, which show the effectiveness of
               the proposed methods.

               Key Words: multi-relational data mining, closed patterns, merge (sub-
               position) operator, FCA, distributed databases, MapReduce


      1      Introduction

      Multi-relational data mining (MRDM) has been extensively studied for more
      than a decade (e.g., [7, 8] and references therein), and is still attracting increas-
      ing interest in the fields of data mining (e.g., [14, 29]) and inductive logic pro-
      gramming (ILP). In the framework of MRDM, data and patterns (or queries)
      are represented in the form of logical formulae such as datalog (a class of first
      order logic). This expressive formalism of MRDM allows us to use complex and
      structured data in a uniform way, including trees and graphs in particular, and
      multi-relational patterns in general.
          On the other hand, Formal Concept Analysis (FCA) has been developed as
      a field of applied mathematics based on a clear mathematization of the notions
      of concept and conceptual hierarchy [11]. While it has attracted much interest
      from various application areas including, among others, data mining, knowledge
      acquisition and software engineering (e.g., [12]), research on extending the capa-
      bilities of FCA for AI (Artificial Intelligence) has recently attracted much
      attention [20].
       ⋆
           This work was partially supported by JSPS Grant-in-Aid for Scientific Research (C)
           24500171.





         The notion of iceberg query lattices, proposed by Stumme [30], combines the
     notions of MRDM and FCA; frequent datalog queries in MRDM correspond
     to iceberg concept lattices (or frequent closed itemsets) in FCA. Ganter and
     Kuznetsov [10] have extensively studied the framework of more expressive pat-
     tern structures. In MRDM, condensed representations such as closed patterns
      and free patterns have also been studied in c-armr by De Raedt and Ramon [6],
     and in RelLCM2 by Garriga et al. [13].
         We consider in this paper the problem of mining closed patterns (or queries)
     in multi-relational data, particularly applying the notion of iceberg query lat-
      tices to a distributed mining setting. The assumption that a given dataset is
      distributed and stored in different sites is reasonable for situations where we
      might not be able to move local datasets into a centralized site because of their
      size and/or because of privacy concerns.
          Given two local databases (horizontal partitions) and their sets of closed
      patterns (concepts), the set of closed patterns in the global database can be con-
      structed by using the subposition operator [11, 33] or the merge operator [23]. From
      our preliminary experiments [28] using a distributed computation environment
      based on MapReduce [3], we have found that the execution times of the merge
      operations increase with the number of local databases. In
     this paper, we therefore propose some methods for computing the merge opera-
     tions so that we can efficiently construct the set of global closed patterns from
     the sets of local closed patterns. Our methods are based on the properties of the
     merge operator.
         The organization of the rest of this paper is as follows. After summarizing
      some basic notations and definitions of closed pattern mining in MRDM in
     Sect. 2, we consider distributed closed pattern mining in MRDB and the merge
     operator in Sect. 3. We then explain our approach to improving the merge oper-
     ations in Sect. 4. In Section 5, we show the effectiveness of our methods by some
     experimental results. Finally, we give a summary of this work in Section 6.

     2      Iceberg Query Lattices in Multi-Relational Data
            Mining
     2.1     Multi-Relational Data Mining
     In the task of frequent pattern mining in multi-relational databases, we assume
     that we have a given database r, a language of patterns, and a notion of fre-
     quency which measures how often a pattern occurs in the database. We use
     datalog, or Prolog without function symbols other than constants, to represent
     data and patterns. We assume some familiarity with the notions of logic pro-
     gramming (e.g., [22, 24]), although we introduce some notions and terminology
     in the following.
     Example 1. Consider a multi-relational database r in Fig. 1 (above), which con-
     sists of five relations, Customer, Parent, Buys, Male and Female. For each rela-
     tion, we introduce a corresponding predicate, i.e., customer , parent, buys, male
     and female, respectively.

      Customer: key ∈ {allen, carol, diana, fred}
      Parent (SR., JR.): (allen, bill), (allen, jim), (carol, bill), (diana, eve), (fred, eve), (fred, hera)
      Buys (key, item): (allen, pizza), (carol, pizza), (diana, cake), (fred, cake)
      Male: bill, jim        Female: eve, hera

      The iceberg query lattice contains the following queries with their answer sets:
        key(X)                                              {a, c, d, f }
        key(X), buys(X, pizza)                              {a, c}
        key(X), parent(X, Y )                               {(a, b), (a, j), (c, b), (d, e), (f, e), (f, h)}
        key(X), buys(X, cake)                               {d, f }
        key(X), buys(X, pizza), parent(X, Y ), male(Y )     {(a, b), (a, j), (c, b)}
        key(X), buys(X, cake), parent(X, Y ), female(Y )    {(d, e), (f, e), (f, h)}

Fig. 1. An Example of Datalog Database r with customer relation as a key (above)
and the Iceberg Query Lattice Associated to r (below), where a substitution θ =
{X/t1 , Y /t2 } (resp., θ = {X/t1 }) is simply denoted by (t1 , t2 ) (resp., t1 ), and the name
(e.g., allen) of each person in the tables is abbreviated to its first character (e.g., a).



    Consider the following pattern P = customer (X), parent(X, Y ), buys(X, pizza).
For a substitution θ = {X/a1 , Y /a2 }, P θ is logically entailed by r, denoted by
r |= P θ, if a1 ∈ Customer, (a1 , a2 ) ∈ Parent, and (a1 , pizza) ∈ Buys. Then,
answerset(P, r) = {{X/allen, Y /bill }, {X/allen, Y /jim}, {X/carol , Y /bill }}.   □
    An atom (or literal) is an expression of the form p(t1 , . . . , tn ), where p is a
predicate (or relation) of arity n, denoted by p/n, and each ti is a term, i.e., a
constant or a variable.
    A substitution θ = {X1 /t1 , . . . , Xn /tn } is an assignment of terms to variables.
The result of applying a substitution θ to an expression E is the expression Eθ,
where all occurrences of the variables Xi have been simultaneously replaced by the
corresponding terms ti of θ. The set of variables occurring in E is denoted by
Var (E).
    A pattern is expressed as a conjunction of atoms (literals) l1 ∧· · ·∧ln , denoted
simply by l1 , . . . , ln . A pattern is sometimes called a query. We will represent
conjunctions in list notation, i.e., [l1 , . . . , ln ]. For a conjunction C and an atom
p, we denote by [C, p] the conjunction that results from adding p after the last
element of C.

          Let C be a pattern (i.e., a conjunction) and θ a substitution on Var (C).
      When Cθ is logically entailed by a database r, we write r |= Cθ. Let
     answerset(C, r) be the set of substitutions satisfying r |= Cθ.
         In multi-relational data mining, one of the predicates is often specified as a
     key (or target) (e.g., [4, 6]), which determines the entities of interest and what is
     to be counted. The key (target) is thus to be present in all patterns considered.
     In Example 1, the key is predicate customer .
         Let r be a database and Q be a query containing a key atom key(X). Then,
     the support (or frequency) of Q, denoted by supp(Q, r, key), is defined to be
     the number of different keys that answer Q (called the support count or abso-
      lute support), divided by the total number of keys. Q is said to be frequent if
      supp(Q, r, key) is no less than some user-defined threshold min_sup.
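
      As an illustration (not the authors' implementation), given answerset(Q, r) as a list
      of substitutions, the support can be computed as follows; obtaining the answer set
      itself requires a datalog engine, which this sketch simply assumes to be available.

          def support(answers, all_keys, key_var="X"):
              # supp(Q, r, key): the number of distinct keys answering Q divided by
              # the total number of keys.  `answers` stands for answerset(Q, r),
              # represented here as dictionaries from variable names to constants.
              answering_keys = {theta[key_var] for theta in answers}
              return len(answering_keys) / len(all_keys)

          # For the pattern P of Example 1, answerset(P, r) contains three substitutions
          # but only two distinct keys (allen and carol), so supp(P, r, key) = 2/4 = 0.5.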
          A pattern containing a key will not always be meaningful; for example, let
     C = [customer (X), parent(X, Y ), buys(Z, pizza)] be a conjunction in Example 1.
     Variable Z in C is not linked to variable X in key atom customer (X); an object
     represented by Z will have nothing to do with key object X. It will be inap-
     propriate to consider such a conjunction as an intended pattern to be mined. In
     ILP, the following notion of linked literals [16] is used to specify the so-called
     language bias.

      Definition 1 (Linked Literal). [16] Let key(X) be a key atom and l a literal.
      l is said to be linked to key(X) if either X ∈ Var (l) or there exists a literal l1
      such that l1 is linked to key(X) and Var (l1 ) ∩ Var (l) ≠ ∅.                     □

          Given a database r and a key atom key(X), we assume that there are pre-
      defined finite sets of predicate, variable, and constant symbols, and that each
      literal l in a conjunction C is constructed using these predefined sets. Moreover,
      each pattern C satisfies the following conditions:
     key(X) ∈ C and, for each l ∈ C, l is linked to key(X). In the following, we de-
     note by Q the set of queries (or patterns) satisfying the above bias condition.
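
      This bias condition can be checked mechanically. The following Python sketch (the
      representation and names are ours) treats a query as a list of literals, with the
      convention that argument strings starting with an upper-case letter are variables,
      and computes linkedness as a fixpoint.

          from typing import List, NamedTuple, Tuple

          class Literal(NamedTuple):
              pred: str
              args: Tuple[str, ...]   # upper-case initial = variable (a convention of this sketch)

          def variables(lit: Literal) -> set:
              return {a for a in lit.args if a[:1].isupper()}

          def satisfies_bias(query: List[Literal], key_pred="key", key_var="X") -> bool:
              # Check that key(X) occurs in the query and that every literal is linked
              # to it, directly or through a chain of shared variables (Definition 1).
              if Literal(key_pred, (key_var,)) not in query:
                  return False
              linked_vars = {key_var}
              pending = list(query)
              progress = True
              while progress and pending:
                  progress = False
                  for lit in list(pending):
                      if variables(lit) & linked_vars:
                          linked_vars |= variables(lit)
                          pending.remove(lit)
                          progress = True
              return not pending

          # Example: a query containing key(X), parent(X, Y) and buys(Z, pizza) violates
          # the condition, since buys(Z, pizza) shares no variable with a linked literal.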


     2.2     Iceberg Query Lattices with Key

     We now consider the notion of a formal context in MRDM, following [30].

     Definition 2. [30] Let r be a datalog database and Q a set of datalog queries.
     The formal context associated to r and Q is defined by Kr, Q = (Or, Q , Ar, Q , Ir, Q ),
     where Or, Q = {θ | θ is a grounding substitution for all Q ∈ Q}, and Ar, Q = Q,
     and (θ, Q) ∈ Ir, Q if and only if θ ∈ answerset(Q, r).                            2

          From this formal context, we can define the concept lattice in the same way as
     in [30]. We first introduce an equivalence relation ∼r on the set of queries: Two
     queries Q1 and Q2 are said to be equivalent with respect to database r if and
     only if answerset(Q1 , r) = answerset(Q2 , r). We note that Var (Q1 ) = Var (Q2 )
     when Q1 ∼r Q2 .

Definition 3 (Closed Query). Let r be a datalog database and ∼r the equiv-
alence relation on a set of datalog queries Q. A query (or pattern) Q is said to be
closed (w.r.t. r and Q) iff Q is the most specific query in the equivalence
class to which it belongs: {Q1 ∈ Q | Q ∼r Q1 }.                                  □
    For any query Q1 , its closure is a closed query Q such that Q is the most
specific query among the queries in Q that are equivalent to Q1 . Since it exists uniquely, we denote
it by Clo(Q1 ; r). We note again that Var (Q1 ) = Var (Clo(Q1 ; r)) by definition.
We refer to this as the range-restricted condition here.
    Stumme [30] showed that the set of frequent closed queries forms a lattice,
called an iceberg query lattice. In our framework, it is necessary to take our bias
condition into consideration. To do that, we employ the well-known notion of
the most specific generalization (or least generalization) [26, 24].
    For queries Q1 and Q2 , we denote by lg(Q1 , Q2 ) the least generalization of
Q1 and Q2 . Moreover, the join of Q1 and Q2 , denoted by Q1 ∨ Q2 , is defined
as: Q1 ∨ Q2 = lg(Q1 , Q2 )|Q , where, for a query Q, Q|Q is the restriction of Q to
Q, defined by a conjunction consisting of every literal l in Q which is linked to
key(X), i.e., deleting every literal in Q not linked to key(X).
Definition 4. [30] Let r be a datalog database and Q a set of datalog queries.
The iceberg query lattice associated to r and Q for minsupp ∈ [0, 1] is defined as:
Cr, Q = ({Q ∈ Q | Q is closed w.r.t. r and Q, and Q is frequent}, |=), where |=
is the usual logical implication.                                               2
Example 2. Fig. 1 (below) shows the iceberg query lattice associated to r in Ex. 1
and Q with the support count 1, where each query Q ∈ Q has customer (X) as
a key atom, denoted by key(X) for short, Q is supposed to contain at most two
variables (i.e., X, Y ), and the 2nd argument of predicate buys is a constant. 2
Theorem 1. [28] Let r be a datalog database and Q a set of datalog queries
where all queries contain an atom key and they are linked. Then, Cr, Q is a
∨-semi-lattice.                                                          2

3   Distributed Closed Pattern Mining in MRDB
Horizontal Decomposition of MRDB and Mining Local Concepts
Our purpose in this work is to mine global concepts in a distributed setting,
where a global database is supposed to be horizontally partitioned appropriately,
and stored possibly in different sites. We first consider the notion of a horizontal
decomposition of a multi-relational DB. Since a multi-relational DB consists of
multiple relations, its horizontal decomposition is not immediately clear.
Definition 5. Let r be a multi-relational datalog database with a key pred-
icate key. We call a pair r1 , r2 a horizontal decomposition of r, if (i) keyr =
keyr1 ∪· keyr2 , i.e., the key relation keyr in r is disjointly decomposed into keyr1
and keyr2 in r1 and r2 , respectively, and (ii) for any query Q, answerset(Q, r) =
answerset(Q, r1 ) ∪ answerset(Q, r2 ).                                             2

     The second condition in the above states that the relations other than the key
     relation in r are decomposed so that any answer substitution in answerset(Q, r)
     is computed either in partition r1 or r2 , thereby being preserved in this horizon-
     tal decomposition. An example of a horizontal decomposition of r is shown in
     Example 3 below.
         Given a horizontal decomposition of a multi-relational DB, we can utilize
     any preferable concept (or closed pattern) mining algorithm for computing local
     concepts on each partition, as long as the mining algorithm is applicable to
     MRDM and its resulting patterns satisfy our bias condition. We use here an
     algorithm called ffCLM [27], which is based on the notion of closure extension
     due to Pasquier et al. [25] and Uno et al. [32] in frequent itemset mining.


     Computing Global Closed Patterns by Merge Operator in MRDM

     To compute the set of global closed patterns from the sets of local closed patterns
     in MRDM, we need the following merge operator ⊕. For patterns C1 and C2 , we
     denote by C1 ∩ C2 a possibly empty conjunction of the form: l1 ∧ · · · ∧ lk (k ≥ 0)
     such that, for each li (i ≤ k), li ∈ C1 and li ∈ C2 .

Theorem 2. [28] Let r be a datalog database, and r1, r2 a horizontal decomposition
of r. Let 𝒞 (𝒞i) (i = 1, 2) be the set of closed patterns of r (ri), respectively.
Then, we have the following:

    𝒞 = 𝒞1 ⊕ 𝒞2 = (𝒞1 ∪ 𝒞2) ∪ {C1 ∩ C2 | C1 ∈ 𝒞1, C2 ∈ 𝒞2, C1 ∩ C2 is linked with key}        (1)

         The set of global closed patterns C is obtained by the union of the local
     closed patterns C1 and C2 , and, in addition to that, by intersecting each pat-
     tern C1 ∈ C1 and C2 ∈ C2 . Furthermore, the pattern obtained by the in-
     tersection, C1 ∩ C2 , should satisfy the bias condition (Def. 1). We note that
     C1 ∩ C2 does not necessarily satisfy the linkedness condition; for example, sup-
     pose that C1 (C2 ) is a closed pattern of the form: C1 = key(X), p(X, Y ), m(Y )
     (C2 = key(X), q(X, Y ), m(Y )), respectively. Then, C1 ∩ C2 = key(X), m(Y ),
     which is not linked to key(X), and thus does not satisfy the bias condition.
         We note that, in the case of transaction databases, the above theorem coin-
     cides with the one by Lucchese et al. [23].
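
To make the merge operator of Eq. (1) concrete, the following is a minimal Python sketch (an illustration only, not the authors' implementation): patterns are represented as frozensets of literal strings such as "key(X)" or "buys(X,pizza)", and linkedness is checked as reachability of every literal from key(X) through shared variables. The helper names variables, is_linked and merge are assumptions made for this example.

    import re
    from itertools import product

    def variables(literal):
        # Upper-case terms of a literal such as 'buys(X,pizza)' are its variables.
        return {t for t in re.findall(r'[A-Za-z_]\w*', literal)[1:] if t[0].isupper()}

    def is_linked(pattern):
        # A pattern is linked if every literal is connected to key(X) via shared variables.
        key = [lit for lit in pattern if lit.startswith('key(')]
        if not key:
            return False
        reached, frontier = set(key), variables(key[0])
        changed = True
        while changed:
            changed = False
            for lit in set(pattern) - reached:
                if variables(lit) & frontier:
                    reached.add(lit)
                    frontier |= variables(lit)
                    changed = True
        return reached == set(pattern)

    def merge(C1, C2):
        # Merge operator of Eq. (1): keep all local closed patterns and add every
        # pairwise intersection that still satisfies the bias (linkedness) condition.
        result = set(C1) | set(C2)
        for p1, p2 in product(C1, C2):
            inter = frozenset(p1 & p2)
            if inter and is_linked(inter):
                result.add(inter)
        return result

    # In the spirit of Example 3 below: the intersection {key(X), parent(X,Y)} is
    # linked, so it becomes a new global closed pattern.
    C1 = {frozenset({'key(X)', 'parent(X,Y)', 'buys(X,pizza)', 'male(Y)'})}
    C2 = {frozenset({'key(X)', 'parent(X,Y)', 'buys(X,cake)', 'female(Y)'})}
    print(merge(C1, C2))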

     Example 3. We consider a horizontal decomposition r1 , r2 of r in Example 1
     such that the key relation keyr (i.e., Customer) in r is decomposed into keyr1 =
     {allen, carol} and keyr2 = {dian, fred}, and the other relations than Customer
     are decomposed so that they satisfy the second condition of Def. 5.
         Consider a globally closed pattern C = [key(X), parent(X, Y )] in Fig. 1.
     In r1 , there exists a closed pattern C1 of the form: [C, buys(X, pizza), male(Y )],
     while, in r2 , there exists a closed pattern C2 of the form: [C, buys(X, cake), female(Y )].
     Then, we have that C coincides with C1 ∩ C2. □

   We can now formulate our problem as follows:
Mining Globally Closed Patterns from Local DBs:
Input: A set of local databases {DB 1 , . . . , DB n }
Output: the set of global closed patterns C1..n .
   In order to compute C1..n , our approach consists of two phases: we first com-
pute each set Ci (i = 1, . . . , n) of local closed patterns from DB i , and then
we compute C1..n by applying the merge operators. We call the first phase the
mining phase, while we call the second phase the merge phase.

4     Making Merge Computations Efficient in MRDM
In conventional data mining, e.g., on itemsets, computing the intersection of two
sets in the merge operation ⊕ is straightforward. In MRDM, on the other hand,
the computation of the ⊕ operator becomes somewhat involved due to the handling
of variables occurring in patterns. Namely, two additional tests are required:
checking the bias condition (linkedness), and checking equivalence modulo variable
renaming in order to eliminate duplicate patterns.
    For closed patterns C1 and C2 , we must check whether the intersection C1 ∩C2
satisfies the linkedness condition. Moreover, we must check whether C1 ∩ C2 is
equivalent (modulo variable renaming) to the other patterns obtained so far.
For example, let C1 (C2 ) be a pattern of the form: C1 = key(X), p(X, Y ), m(Y )
(C2 = key(X), p(X, Z), m(Z)), respectively. Then, C1 is equivalent to C2 modulo
variable renaming.
    When implementing a data mining system, such handling of variables in patterns
necessarily requires string manipulation, and these string operations lead to
undesirable overhead in an actual implementation. In the following, we therefore
propose two methods for reducing the computational costs of the merge operation.

4.1   Partitioning Pattern Sets
When computing the merge operation, we can use the following property:
Proposition 1. Let DB = DB1 ∪ DB2, and 𝒞 (𝒞i) the set of closed patterns of
DB (DBi) (i = 1, 2), respectively. Then,

    𝒞 = 𝒞1 ⊕ 𝒞2 = (𝒞1 ∪ 𝒞2) ∪ {C1 ∩ C2 | (C1, C2) ∈ 𝒞1 × 𝒞2, C1 ∩ C2 linked with key, Var(C1) = Var(C2)}        (2)
Proof. Let C be a closed pattern in C such that C is linked with key. From
Theorem 2, it suffices to show that there exist patterns Ci ∈ Ci (i = 1, 2) such
that C = C1 ∩ C2 and Var (C1 ) = Var (C2 ).
   Let Ci = Clo(C; DB i ) (i = 1, 2). Then, we have from the definition of Clo(·; ·)
that Var (C) = Var (C1 ) = Var (C2 ). Moreover, we can show that C = C1 ∩ C2 ,
which is to be proved. □

     From the above proposition, when computing the intersection of each pair of
patterns C1 ∈ 𝒞1 and C2 ∈ 𝒞2 in (1), we need only intersect those pairs (C1, C2)
containing the same set of variables, i.e., Var(C1) = Var(C2). Compared with the
original definition of the merge operator ⊕ (Theorem 2), this property can thus
be utilized to reduce the cost of the merge operations.

     4.2     Merging Diff-Sets
     Next, we consider another method for making the merge operation efficient,
     which is based on the following simple observation:
     Observation 1. Given sets of closed patterns C1 and C2 , let D1 = C1 \ C2 and
     D2 = C2 \C1 , namely, Di is a difference set (diff-set for short) (i = 1, 2). Suppose
     that C is a new (or generator [33]) pattern in C1 ⊕ C2 , meaning that C ∈ C1 ⊕ C2 ,
     while C ̸∈ C1 ∪C2 . Then, C is obtained by intersection operation, i.e., C = C1 ∩C2
     for some patterns C1 ∈ D1 and C2 ∈ D2 .
     That is, a new closed pattern C will be generated only by intersecting patterns
from the difference sets D1 and D2. This fact easily follows from the property
that the set of closed patterns is a semi-lattice: suppose otherwise that C1 ∈ D1
while C2 ∉ D2. Then C2 is contained in 𝒞1. Since both C1 and C2 are then in 𝒞1,
we have that C = C1 ∩ C2 is a closed pattern also in 𝒞1, which implies that C is
not a new pattern. Algorithm 1 shows the above-mentioned method based on the
difference sets. In the algorithm, the computation of supports (or occurrences)
is omitted; it can be done similarly to [33].


         Algorithm 1: Diff-Set Merge(C1 , C2 )
           input : sets of closed patterns C1 , C2
           output: C1..2 = C1 ⊕ C2
  1 C = C1 ∪ C2 ; D1 = C1 \ C2 ; D2 = C2 \ C1 ;
      2 foreach pair (C1 , C2 ) ∈ D1 × D2 do
      3    C ← C1 ∩ C2 ;
      4    if C satisfies the bias condition and C ̸∈ C then
      5        C ← C ∪ {C};
      6    end
      7 end
      8 return C
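
The following is a minimal executable sketch of Algorithm 1 (reusing the pattern representation and the helpers variables and is_linked from the earlier sketch, plus a hypothetical helper pattern_vars); it also applies the Proposition 1 filter by intersecting only pairs of patterns over the same variable set. It is an illustration under these assumptions, not the authors' code; support counting is omitted as in Algorithm 1.

    def pattern_vars(pattern):
        # All variables occurring in a pattern (relies on variables() from the earlier sketch).
        return set().union(*(variables(lit) for lit in pattern)) if pattern else set()

    def diff_set_merge(C1, C2):
        # Diff-set merge: only patterns from the difference sets can generate new closed patterns.
        D1, D2 = set(C1) - set(C2), set(C2) - set(C1)
        result = set(C1) | set(C2)
        for p1 in D1:
            for p2 in D2:
                if pattern_vars(p1) != pattern_vars(p2):   # Proposition 1: skip mismatched variable sets
                    continue
                inter = frozenset(p1 & p2)
                if inter and is_linked(inter) and inter not in result:
                    result.add(inter)
        return result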




     5      Experimental Results
     Implementation and Test Data
To see the effectiveness of our approach to distributed mining, we have conducted
some experiments. As for the mining phase, we implemented our approach by

using Java 1.6.0_22. Experiments for this phase were performed on 8 PCs with
Intel Core i5 processors running at 2.8 GHz, 8 GB of main memory, and 8 MB of
L2 cache, running under Ubuntu 11.04. We used Hadoop 0.20.2 on these 8 PCs, with
2 mappers working on each PC. Experiments for the merging phase, on the other
hand, were performed on one of the PCs.
    We use two datasets, often used in the field of ILP; one is the mutagenesis
dataset1 , and the other is an English corpus of the Penn Treebank Project2 .
    The mutagenesis dataset, for example, contains 230 chemical compounds. Each
compound is represented by a set of facts using predicates such as atom and bond.
The size of the set of predicate symbols is 12. The size of the key relation
(active(X)) is 230, and the minimum support is min_sup = 1/230. We assume that
patterns contain at most 4 variables and contain no constant symbols. The
number of closed patterns mined is 5,784.



Effect of Partitioning Pattern Sets


Fig. 2 (left) summarizes the results of the execution times for a test data on the
mutagenesis dataset. We can see from the figure that the execution times t1 of
the mining phase are reduced almost linearly with the number of partitions. On
the other hand, the execution times t2 of the merging phase for obtaining global
closed patterns increase almost linearly with the number p of partitions from 1
(i.e., no partitioning) to 16. This is reasonable; the number of merge-operator
applications is (p − 1) when we have p partitions. Note that the execution time of
the merge phase in the case of a single partition reflects start-up overheads
such as opening and reading a file of the results of the mining phase, followed by
preparing the inputs of the merge operation.
    In this particular example, the time spent in the merge phase is relatively
small when compared with that for the mining phase. This is because the number
of partitions and the number of local closed patterns are rather small. When the
number of partitions of a global database becomes larger, however, the execution
times for the merging phase will become inevitably larger. Considering efficient
merge algorithms is thus an important issue for scalability in MRDM.

    To see the effect of using Proposition 1, Fig. 2 (right) shows the numbers of
closed patterns in a merge computation C1 ⊕ C2 with input sets C1 , C2 of closed
patterns for the mutagenesis dataset with 16 partitions. Each table shows the
number of patterns in Ci (i = 1, 2) containing k variables for 1 ≤ k ≤ 4. The
number of intersection operations performed when using Proposition 1 is reduced
to about 80% of that of the original computation. The execution times
in Fig. 2 (left) are the results obtained by using this method.

1
    http://www.cs.ox.ac.uk/activities/machlearn/mutagenesis.html
2
    http://www.cis.upenn.edu/~treebank/




     Fig. 2. Execution Times of the Mining Phase and the Merge Phase (left) and No. of
     Patterns in a Merge Computation (right): An Example in the Mutagenesis Dataset.
     Each number in a quadrangle is the size of a closed pattern set. D1 = C1 \ C2 and
     D2 = C2 \ C1 .




     Effect of Merging Diff-Sets

Fig. 3 shows the performance results (execution times) of the diff-set merge
method, compared with the naive method, on the same datasets: the mutagenesis
dataset (left) and the English corpus (right).
    For both datasets, the execution times decrease as the number n of local
DBs increases; in particular, when n = 16 on the mutagenesis dataset, the
execution time is reduced to about 43% of that of the naive method. To see the
reason for these results, Fig. 2 (right) also shows the sizes of the difference
sets D1 and D2 used in the merge computation 𝒞1 ⊕ 𝒞2 with input sets 𝒞1, 𝒞2 of
closed patterns.




     Fig. 3. Results of the Diff-Sets Merge Method: The Mutagenesis Dataset (left) and
     The English Corpus (right)

6    Concluding Remarks

We have considered the problem of mining closed patterns from multi-relational
databases in a distributed environment. For that purpose, we have proposed two
methods for making the merge (or subposition) operations efficient, and we have
then exemplified the effectiveness of our method by some preliminary experi-
mental results using the MapReduce/Hadoop distributed computation framework
in the mining process.
     In MRDM, efficiency and scalability have been major concerns [2]. Krajca et
al. [17, 18] have proposed algorithms to compute search trees for closed patterns
simultaneously either in parallel or in a distributed manner. Their approaches
are orthogonal to ours; it would be beneficial to employ their algorithms for
computing local closed patterns in the mining phase in our framework.
     In this work, we have confined ourselves to horizontal partitions of a global
MRDB. It will be interesting to study vertical partitioning and its mixture with horizontal partitioning in
MRDM, where the apposition operator studied by Valtchev et al. [34] will play
an important role. As future work, our plan is to develop an efficient algorithm
dealing with such a general case in MRDM.


Acknowledgement The authors would like to thank anonymous reviewers for
their useful comments on the previous version of the paper.


References

1. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. in Proc.
  VLDB Conf., pp. 487–499, 1994.
2. Blockeel, H., Sebag, M.: Scalability and efficiency in multi-relational data mining.
  SIGKDD Explorations Newsletter 2003, Vol.4, Issue 2, pp.1-14, 2003.
3. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters.
  Commun. ACM, Vol. 51, No. 1, pp.107–113, 2008.
4. Dehaspe, L.: Frequent pattern discovery in first-order logic, PhD thesis, Dept. Com-
  puter Science, Katholieke Universiteit Leuven, 1998.
5. Dehaspe, L., Toivonen, H.: Discovery of Relational Association Rules. in S. Dzeroski
  and N Lavrac (eds.) Relational Data Mining, pp. 189–212, Springer, 2001.
6. De Raedt, L., Ramon, J.: Condensed representations for Inductive Logic Program-
  ming. in Proc. KR’04, pp. 438-446, 2004.
7. Dzeroski, S.: Multi-Relational Data Mining: An Introduction. SIGKDD Explo-
  rations Newsletter 2003, Vol.5, Issue 1, pp.1-16, 2003.
8. Dzeroski, S., Lavrač, N. (eds.): Relational Data Mining. Springer-Verlag, Inc. 2001.
9. Ganter, B.: Two Basic Algorithms in Concept Analysis, Technical Report FB4-
  Preprint No. 831, TH Darmstadt, 1984. also in Formal Concept Analysis, LNCS
  5986, pp. 312-340, Springer, 2010.
10. Ganter, B., Kuznetsov, S.: Pattern structures and Their Projections, ICCS-01,
  LNCS, 2120, pp. 129-142, 2001.
11. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations.
  Springer, 1999.

     12. Ganter, B., Stumme, G., Wille, R.: Formal Concept Analysis, Foundations and
       Applications. LNCS 3626, Springer, 2005.
     13. Garriga,G. C., Khardon, R., De Raedt, L.: On Mining Closed Sets in Multi-
       Relational Data. in Proc. IJCAI 2007, pp.804-809, 2007.
     14. Goethals, B., Page, W. L., Mampaey, M.: Mining Interesting Sets and Rules in
       Relational Databases. in Proc. 2010 ACM Sympo. on Applied Computing (SAC ’10),
       pp. 997-1001, 2010.
     15. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edition, Morgan
       Kaufmann Publishers Inc., 2005.
     16. Helft, N.: Induction as nonmonotonic inference. in Proc. KR’89, pp. 149–156, 1989.
     17. Krajca, P., Vychodil, V.: Distributed Algorithm for Computing Formal Concepts
       Using Map-Reduce Framework, in Proc. IDA ’09, Springer-Verlag, pp. 333–344, 2009.
     18. Krajca, P., Outrata, J., Vychodil, V.: Parallel algorithm for computing fixpoints
       of Galois connections, Annals of Mathematics and Artificial Intelligence, Vol. 59, No.
       2, pp. 257–272, Kluwer Academic Publishers, 2010.
     19. Kuznetsov, S. O.: A Fast Algorithm for Computing All Intersections of Objects in
       a Finite Semi-lattice, Automatic Documentation and Mathematical Linguistics, Vol.
       27, No. 5, pp. 11-21, 1993.
     20. Kuznetsov, S. O., Napoli, A., Rudolph, S., eds: FCA4AI: “What can FCA do for
       Artificial Intelligence?” IJCAI 2013 Workshop, Beijing, China, 2013.
     21. Kuznetsov, S. O., Obiedkov, S. A.: Comparing performance of algorithms for gen-
       erating concept lattices. J. Exp. Theor. Artif. Intell., 14(2-3):189-216, 2002.
     22. Lloyd, J. W.: Foundations of Logic Programming, Springer, Second edition, 1987.
23. Lucchese, C., Orlando, S., Perego, R.: Distributed Mining of Frequent Closed Item-
       sets: Some Preliminary Results. International Workshop on High Performance and
       Distributed Mining, 2005.
     24. Nienhuys-Cheng, S-H., de Wolf, R.: Foundations of Inductive Logic Programming,
       LNAI 1228, Springer, 1997.
     25. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering Frequent Closed
       Itemsets for Association Rules. in Proc. ICDT’99, LNAI 3245, pp. 398-416, 1999.
     26. Plotkin, G.D.: A Note on Inductive Generalization. Machine Intelligence, Vol. 5,
       pp. 153-163, 1970.
     27. Seki, H., Honda, Y., Nagano, S.: On Enumerating Frequent Closed Patterns with
       Key in Muti-relational Data. LNAI 6332, pp. 72-86, 2010.
     28. Seki, H., Tanimoto, S.: Distributed Closed Pattern Mining in Multi-Relational
       Data based on Iceberg Query Lattices: Some Preliminary Results. in Proc. CLA’12,
       pp.115-126, 2012
     29. Spyropoulou, E., De Bie. T., Boley, M.: Interesting Pattern Mining in Multi-
       Relational Data. Data Min. Knowl. Discov. 42(2), pp. 808-849, 2014.
     30. Stumme, G.: Iceberg Query Lattices for Datalog. In Conceptual Structures at
       Work, LNCS 3127, Springer-Verlag, pp. 109-125, 2004.
     31. Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing Iceberg
       Concept Lattices with Titanic. J. on Knowledge and Data Engineering (KDE) 42(2),
       pp. 189-222, 2002.
     32. Uno, T., Asai, T. Uchida, Y., Arimura, H.: An Efficient Algorithm for Enumerating
       Closed Patterns in Transaction Databases. DS’04, LNAI 3245, pp. 16-31, 2004.
     33. Valtchev, P., Missaoui, R.: Building Concept (Galois) Lattices from Parts: Gener-
       alizing the Incremental Methods. In Proc. 9th Int’l. Conf. on Conceptual Structures:
       Broadening the Base (ICCS ’01), Springer-Verlag, London, UK, pp. 290-303, 2001.
34. Valtchev, P., Missaoui, R., Lebrun, P.: A Partition-based Approach towards
       Constructing Galois (Concept) Lattices. Discrete Mathematics 256(3): 801-829, 2002.
       Looking for bonds between nonhomogeneous
                     formal contexts

                   Ondrej Krı́dlo, Lubomir Antoni, Stanislav Krajči

                    University of Pavol Jozef Šafárik, Košice, Slovakia?




          Abstract. Recently, concept lattices working with heterogeneous struc-
          tures have been fruitfully applied in fuzzy formal concept analysis. We
          consider the situation of nonhomogeneous formal contexts and explore
          bonds in such a nonhomogeneous case. This issue requires formulating
          an alternative definition of a bond and investigating the relationships
          between bonds and the particular formal contexts.

          Keywords: bond, heterogeneous formal context, second order formal
          context



  1     Introduction

  Formal concept analysis (FCA) [16] as an applied lattice theory allows us to
  explore the meaningful groupings of objects with respect to common attributes.
  In general, FCA is an interesting research area that provides theoretical foun-
  dations, fruitful methods, algorithms and underlying applications in many areas
  and has been investigated in relation to various disciplines and integrated ap-
  proaches [13,15]. The feasible attempts and generalizations are investigated, one
  can see dual multi-adjoint concept lattices working with adjoint triples [27–29],
  interval-valued L-fuzzy concept lattices [1], heterogeneous concept lattices [2, 3],
  connectional concept lattices [12, 32, 33]. Classical bonds and their generaliza-
  tions acting on residuated lattices were analyzed from a broader perspective
  in [17, 21, 24].
      In this paper, we deal with an alternative notion of bonds and with the
  problem of looking for bonds between nonhomogeneous formal contexts. In particular,
  Section 2 recalls the basic notions of a concept lattice, the notion of a bond and
  its equivalent definition, and preliminaries of second order formal contexts and
  heterogeneous formal contexts. Section 3 describes the idea of looking for bonds
  in the nonhomogeneous case. Sections 4 and 5 provide the solution of this issue
  in terms of second order formal contexts and heterogeneous formal contexts.

  ?
      This work was partly supported by grant VEGA 1/0832/12 and by the Slovak Re-
      search and Development Agency under contract APVV-0035-10 “Algorithms, Au-
      tomata, and Discrete Data Structures”.




2     Preliminaries

Definition 1. Let B and A be nonempty sets and R ⊆ B × A an arbitrary
binary relation. The triple ⟨B, A, R⟩ is said to be a formal context with a set of
objects B and a set of their attributes A; relationships between objects and their
attributes are recorded in the relation R. Let us define a pair of derivation operators
(↑, ↓) as mappings between the powersets of B and A such that

 – ↑: P(B) → P(A) and ↓: P(A) → P(B), where for any X ⊆ B and Y ⊆ A:
 – ↑(X) = {a ∈ A | (∀b ∈ X)(b, a) ∈ R}
 – ↓(Y) = {b ∈ B | (∀a ∈ Y)(b, a) ∈ R}.

Such derivation operators can equivalently be defined as mappings between 2-sets
(a notation borrowed from the fuzzy generalization of FCA that is sometimes easier
to use):

 – ↑: 2^B → 2^A and ↓: 2^A → 2^B, where for any X ∈ 2^B and Y ∈ 2^A
 – ↑(X)(a) = ⋀_{b∈B} ((b ∈ X) ⇒ ((b, a) ∈ R)) = ⋀_{b∈B} (X(b) ⇒ R(b, a))
 – ↓(Y)(b) = ⋀_{a∈A} ((a ∈ Y) ⇒ ((b, a) ∈ R)) = ⋀_{a∈A} (Y(a) ⇒ R(b, a)).

   Such a pair of derivation operators forms an antitone Galois connection be-
tween the complete lattices of all subsets of B and A. Hence, the compositions of
these mappings form closure operators on those complete lattices.
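
As a small illustration of these operators and of the closure they induce, here is a crisp-context sketch in Python (the context data and the helper names up and down are made up for this example):

    def up(context, X):
        # ↑(X): attributes shared by every object in X.
        B, A, R = context
        return {a for a in A if all((b, a) in R for b in X)}

    def down(context, Y):
        # ↓(Y): objects having every attribute in Y.
        B, A, R = context
        return {b for b in B if all((b, a) in R for a in Y)}

    # Tiny example: two objects, two attributes, incidence relation R.
    ctx = ({'o1', 'o2'}, {'a1', 'a2'}, {('o1', 'a1'), ('o1', 'a2'), ('o2', 'a1')})
    X = {'o1', 'o2'}
    print(up(ctx, X))              # {'a1'}
    print(down(ctx, up(ctx, X)))   # closure of X: {'o1', 'o2'}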

Definition 2. Let C = hB, A, Ri be a formal context. Any pair of sets (X, Y ) ∈
2B × 2A is said to be a formal concept iff X =↓ (Y ) and Y =↑ (X). Object
part of any concept is called extent and attribute part is called intent. Set of
all extents of formal context C will be denoted by Ext(C). The notation Int(C)
stands for the set of all intents of C.

    All concepts ordered by set inclusion of extents (or equivalently by dual of
intent inclusion) form a complete lattice structure.


2.1   Notion of bond and its equivalent definition

Definition 3. Let Ci = ⟨Bi, Ai, Ri⟩ for i ∈ {1, 2} be two formal contexts. A
relation β ⊆ B1 × A2 is said to be a bond iff every row of its table is an intent of C2
and every column is an extent of C1. The set of all bonds between C1 and C2 will
be denoted by 2-Bonds(C1, C2).
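
Using the crisp operators sketched above, the bond condition of Definition 3 can be checked directly; the helpers below (is_closed_extent, is_closed_intent, is_bond) are illustrative names, not notions from the paper:

    def is_closed_extent(context, X):
        # X is an extent iff it is closed: ↓(↑(X)) = X.
        return down(context, up(context, X)) == set(X)

    def is_closed_intent(context, Y):
        # Y is an intent iff ↑(↓(Y)) = Y.
        return up(context, down(context, Y)) == set(Y)

    def is_bond(C1, C2, beta):
        # beta ⊆ B1 × A2 is a bond iff its rows are intents of C2 and its columns are extents of C1.
        B1, _, _ = C1
        _, A2, _ = C2
        rows_ok = all(is_closed_intent(C2, {a for (b, a) in beta if b == b1}) for b1 in B1)
        cols_ok = all(is_closed_extent(C1, {b for (b, a) in beta if a == a2}) for a2 in A2)
        return rows_ok and cols_ok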

Lemma 1. Let Ci = hBi , Ai , Ri i for i ∈ {1, 2} be two formal contexts. Then
β ⊆ B1 ×A2 is a bond between C1 and C2 if and only if Ext(hB1 , A2 , βi) ⊆ Ext(C1 )
and Int(hB1 , A2 , βi) ⊆ Int(C2 ).

Proof. ⇒: Let X ∈ Ext(hB1 , A2 , βi) be an arbitrary extent of any bond between
formal contexts C1 and C2 . Derivation operators of Ci will be denoted by (↑i , ↓i )


for i ∈ {1, 2}. Derivation operators of the bond will be denoted by (↑β , ↓β ). Then
there exists a set of attributes Y ⊆ A2 such that
    ↓_β(Y)(b1) = ⋀_{a2∈A2} (Y(a2) ⇒ β(b1, a2))

                 [β(−, a2) is an extent of C1, hence there exists Z ⊆ A1 such that β(−, a2) = ↓1(Z)]

               = ⋀_{a2∈A2} (Y(a2) ⇒ ↓1(Z)(b1))
               = ⋀_{a2∈A2} (Y(a2) ⇒ ⋀_{a1∈A1} (Z(a1) ⇒ R1(b1, a1)))
               = ⋀_{a2∈A2} ⋀_{a1∈A1} (Y(a2) ⇒ (Z(a1) ⇒ R1(b1, a1)))
               = ⋀_{a2∈A2} ⋀_{a1∈A1} ((Y(a2) ∧ Z(a1)) ⇒ R1(b1, a1))
               = ⋀_{a1∈A1} ((⋁_{a2∈A2} (Y(a2) ∧ Z(a1))) ⇒ R1(b1, a1))
               = ⋀_{a1∈A1} (Z_Y(a1) ⇒ R1(b1, a1))
               = ↓1(Z_Y)(b1),   where Z_Y(a1) = ⋁_{a2∈A2} (Y(a2) ∧ Z(a1)).

Hence, Ext(hB1 , A2 , βi) ⊆ Ext(C1 ). Similarly for intents.
   ⇐: Assume a formal context hB1 , A2 , βi such that it holds Ext(hB1 , A2 , βi) ⊆
Ext(C1 ) and Int(hB1 , A2 , βi) ⊆ Int(C2 ). From the simple fact that any row of any
context is its intent and any column is its extent and from the previous inclusions,
we obtain that β is a bond between C1 and C2. □
      Hence, the notion of bond can be defined equivalently as follows.
Definition 4. Let Ci = hBi , Ai , Ri i for i ∈ {1, 2} be two formal contexts. Formal
context B = hB1 , A2 , βi is said to be a bond between C1 and C2 if Ext(B) ⊆
Ext(C1 ) and Int(B) ⊆ Int(C2 ).
      More about the equivalent definition of bond could be found in [17–19].

2.2     Direct product of two formal contexts and bonds
Let us recall the definition and important property of direct product of two
formal contexts. More details about such topic can be found in [21, 26].
Definition 5. Let Ci = hBi , Ai , Ri i be two formal contexts. Formal context
C1 ∆C2 = hB1 × A2 , B2 × A1 , R1 ∆R2 i where
               (R1 ∆R2 )((b1 , a2 ), (b2 , a1 )) = R1 (b1 , a1 ) ∨ R2 (b2 , a2 )
                                                 = ¬R1 (b1 , a1 ) ⇒ R2 (b2 , a2 )
                                                 = ¬R2 (b2 , a2 ) ⇒ R1 (b1 , a1 )


for any (bi , ai ) ∈ Bi × Ai for all i ∈ {1, 2} is said to be a direct product of
formal contexts C1 and C2 .
Lemma 2. Let Ci = hBi , Ai , Ri i be two formal contexts. Every extent of C1 ∆C2
is a bond between C1 and C2 .
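
A sketch of the direct product construction of Definition 5 for crisp contexts follows (the function name direct_product is an illustrative choice; the incidence follows (R1∆R2)((b1,a2),(b2,a1)) = R1(b1,a1) ∨ R2(b2,a2)). By Lemma 2, every extent of the resulting context, computed e.g. with down(up(·)) from the earlier sketch, is a bond between C1 and C2.

    from itertools import product

    def direct_product(C1, C2):
        # Build the direct product context C1∆C2 of two crisp contexts (B, A, R).
        (B1, A1, R1), (B2, A2, R2) = C1, C2
        objects = set(product(B1, A2))        # object set: pairs (b1, a2)
        attributes = set(product(B2, A1))     # attribute set: pairs (b2, a1)
        relation = {((b1, a2), (b2, a1))
                    for (b1, a2) in objects
                    for (b2, a1) in attributes
                    if (b1, a1) in R1 or (b2, a2) in R2}   # R1∆R2 = R1 ∨ R2
        return objects, attributes, relation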

2.3   Second order formal contexts
In this subsection, we remind a notion of a second order formal concept [24].
Definition 6. Consider two non-empty index sets I and J and a formal context
⟨⋃_{i∈I} Bi, ⋃_{j∈J} Aj, r⟩, whereby
 – B_{i1} ∩ B_{i2} = ∅ for any i1, i2 ∈ I, i1 ≠ i2,
 – A_{j1} ∩ A_{j2} = ∅ for any j1, j2 ∈ J, j1 ≠ j2,
 – r : ⋃_{i∈I} Bi × ⋃_{j∈J} Aj → 2.
Moreover, consider two non-empty sets of 2-contexts notated
 – {Ci = hBi , Ti , pi i : i ∈ I}
 – {Dj = hOj , Aj , qj i : j ∈ J}.
A formal context of second order is a tuple

    ⟨⋃_{i∈I} Bi, {Ci ; i ∈ I}, ⋃_{j∈J} Aj, {Dj ; j ∈ J}, ⋃_{(i,j)∈I×J} r_{i,j}⟩,

where r_{i,j} : Bi × Aj → 2 is defined as r_{i,j}(b, a) = r(b, a) for any b ∈ Bi and a ∈ Aj.
    In what follows, consider the notation described below. Let us have an L-set
f : X → 2 for a non-empty universe set X = ⋃_{i∈I} Xi, where X_{i1} ∩ X_{i2} = ∅
for any i1, i2 ∈ I. Then f^i : Xi → 2 is defined as f^i(x) = f(x) for an arbitrary
x ∈ Xi and i ∈ I.
    We define the mappings between direct products of two sets of concept lat-
tices (that correspond to the two sets of 2-contexts given above) in the following
form:
Definition 7. Let us define the mappings ⟨⇑, ⇓⟩ as follows:

    ⇑: ∏_{i∈I} Ext(Ci) → ∏_{j∈J} Int(Dj)   and   ⇓: ∏_{j∈J} Int(Dj) → ∏_{i∈I} Ext(Ci)

    ⇑(Φ)^j = ⋀_{i∈I} ↑_{ij}(Φ^i),  for any Φ ∈ ∏_{i∈I} Ext(Ci)

    ⇓(Ψ)^i = ⋀_{j∈J} ↓_{ij}(Ψ^j),  for any Ψ ∈ ∏_{j∈J} Int(Dj)

such that (↑_{ij}, ↓_{ij}) is the pair of derivation operators defined on ⟨Bi, Aj, ρ_{ij}⟩ where

    ρ_{ij} = ⋀{β ∈ 2-Bonds(Ci, Dj) : (∀(bi, aj) ∈ Bi × Aj) β(bi, aj) ≥ r_{ij}(bi, aj)}.


2.4   Heterogeneous formal contexts
A heterogeneous extension in FCA based on the totally diversification of objects,
attributes and table fields has been introduced in [3]. In the following, we remind
the definition of a heterogeneous formal context and its derivation operators.

Definition 8. Heterogeneous formal context is a tuple C = hB, A, P, R, U, V, i,
where
 – B and A are non-empty sets,
 – P = {hPb,a , ≤Pb,a i : (b, a) ∈ B × A} is a system of posets,
 – R is a mapping from B × A such that R(b, a) ∈ Pb,a for any b ∈ B and
   a ∈ A,
 – U = {hUb , ≤Ub i : b ∈ B} and V = {hVa , ≤Va i : a ∈ A} are systems of
   complete lattices,
 –    = {◦b,a : (b, a) ∈ B × A} is a system of isotone and left-continuous
   mappings ◦b,a : Ub × Va −→ Pb,a .
    Let us define the derivation operators of a heterogeneous formal context as a
pair of mappings (%, .), whereby %: ∏_{b∈B} Ub → ∏_{a∈A} Va and .: ∏_{a∈A} Va → ∏_{b∈B} Ub
such that
 – %(f)(a) = ⋁{v ∈ Va | (∀b ∈ B) f(b) ◦_{b,a} v ≤ R(b, a)} for any f ∈ ∏_{b∈B} Ub
 – .(g)(b) = ⋁{u ∈ Ub | (∀a ∈ A) u ◦_{b,a} g(a) ≤ R(b, a)} for any g ∈ ∏_{a∈A} Va.


3     Problem description and sketch of solution
In this section we explain why we have proposed an equivalent definition of a bond.
First, consider the classical definition of a bond. It is a binary relation (table)
between objects and attributes from different contexts such that its rows are
intents and its columns are extents of the respective input contexts. The problem
of looking for bonds in the classical or homogeneous fuzzy case can be solved
successfully [17, 21].
    In the nonhomogeneous case, the solution of this issue requires the alternative
definition of a bond. Hence, the new definition of a bond focuses not only on a
relation with some special properties, but also on a bond as a formal context whose
concept lattice is connected to the concept lattices of the input contexts in some
sense. As a consequence, a generalization to heterogeneous bonds is possible. The
method consists in equivalently modifying the input heterogeneous formal contexts
and extracting bonds as the extents of a direct product.
    The proposed modification runs as follows. Each individual pair that includes
a "conjunction" ◦_{b,a} and a value of the poset P_{b,a} is replaced by a bond from
2-Bonds(⟨Ub, Ub, ≤⟩, ⟨Va, Va, ≥⟩). This completely covers the Galois connection
between the complete lattices of any object–attribute pair from B × A.
    At the beginning, we will show how this modification looks in terms of second
order formal contexts. Then we define a new modified heterogeneous formal context
such that its concept lattice is identical to the original one.


4    Second order form of scaled heterogeneous formal
     context
To formalize the second order form of a scaled heterogeneous formal context and
its derivation operators, the following mappings are required:
Definition 9. Let (L, ≤) be a complete lattice. Let us define mappings (−)^L and
(−)_L where
 – (−)^L : L → 2^L such that k^L(m) = (m ≤ k) for any k, m ∈ L
 – (−)_L : 2^L → L such that X_L = ⋁X for any X ⊆ L.
Let us have an arbitrary f ∈ ∏_{b∈B} Ub. Let us denote by f̄ the subset of ⋃_{b∈B} Ub
defined as f̄ = ⋃_{b∈B} {u ∈ Ub | u ≤ f(b)}. Similarly for any g ∈ ∏_{a∈A} Va.
    More information about Cartesian representation of fuzzy sets could be found
in [10].
    Now, consider a heterogeneous formal context C as in Definition 8. A second
order form of the scaled heterogeneous formal context is defined as

    C̄ = ⟨⋃_{b∈B} Ub, {⟨Ub, Ub, ≤⟩ | b ∈ B}, ⋃_{a∈A} Va, {⟨Va, Va, ≥⟩ | a ∈ A}, R̄⟩,

whereby all external contexts are classical crisp contexts and R̄ is a classical crisp
binary relation defined as R̄(u, v) = ((u ◦_{b,a} v) ≤ R(b, a)) for any (u, v) ∈ Ub × Va
and any (b, a) ∈ B × A.
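
A small numeric sketch of this scaling step for a single object–attribute pair (b, a): finite chains stand in for the complete lattices Ub and Va, and the names U_b, V_a, conj and threshold are assumptions made for this illustration.

    # Scale one heterogeneous entry into a crisp relation R_bar between Ub and Va:
    # R_bar(u, v) holds iff u ◦_{b,a} v ≤ R(b, a).
    U_b = [0, 1, 2, 3]           # finite chain standing in for the complete lattice Ub
    V_a = [0, 1, 2]              # finite chain standing in for Va
    conj = lambda u, v: u * v    # an isotone "conjunction" into the poset P_{b,a} (here: integers)
    threshold = 3                # the table value R(b, a)

    R_bar = {(u, v) for u in U_b for v in V_a if conj(u, v) <= threshold}
    # Every row of R_bar is a down-set of Va and every column is a down-set of Ub,
    # so R_bar is a 2-bond between <Ub, Ub, ≤> and <Va, Va, ≥>, as stated above.
    print(sorted(R_bar))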
    In the following, we define the derivation operators of such a special second
order formal context. First, we state some appropriate remarks and facts. Note
that the relation R̄ constrained to Ub × Va for any pair (b, a) ∈ B × A is monotone
in both arguments due to its definition. Similarly, consider the fact that any
extent of ⟨Ub, Ub, ≤⟩ and any intent of ⟨Va, Va, ≥⟩ is a principal down-set of
the corresponding complete lattice (i.e., there exists an element of this complete
lattice such that all lower or equal elements are in the extent or in the intent).
Hence, the relation R̄ constrained to Ub × Va for some (b, a) ∈ B × A is a 2-bond
between ⟨Ub, Ub, ≤⟩ and ⟨Va, Va, ≥⟩, which will be denoted by ρ_{b,a}. Note that any
Φ ∈ ∏_{b∈B} Ext(⟨Ub, Ub, ≤⟩) has the form f̄ for some f ∈ ∏_{b∈B} Ub. Consider an
arbitrary f ∈ ∏_{b∈B} Ub and g ∈ ∏_{a∈A} Va. Hence, the derivation operators are
defined as follows:
 – %(f̄)(v) = ⋀_{b∈B} ↑_{b,a}(f(b)^b)(v) for any v ∈ Va and a ∈ A
 – .(ḡ)(u) = ⋀_{a∈A} ↓_{b,a}(g(a)^a)(u) for any u ∈ Ub and b ∈ B.
    In the previous definition, the pair of mappings (↑_{b,a}, ↓_{b,a}) are the derivation
operators of the formal context ⟨Ub, Va, ρ_{b,a}⟩ for any (b, a) ∈ B × A. For the sake
of brevity, we use the shortened notation (−)^b instead of (−)^{Ub} and similarly
(−)^a instead of (−)^{Va}.


Lemma 3. The concept lattices of C and C̄ are isomorphic.

Proof. Consider an arbitrary f ∈ ∏_{b∈B} Ub. We will show that %(f̄)(v) = (%(f)(a))^a(v)
for any a ∈ A and v ∈ Va.
    Firstly consider the fact of left-continuity of both arguments of ◦b,a for any
(b, a) ∈ B×A. Due to this property, one can define two residuums in the following
way. Let (b, a) ∈ B × A be an arbitrary object-attribute pair and consider the
arbitrary values u ∈ Ub , v ∈ Va and p ∈ Pb,a . Then define
  – →_{b,a} : Ub × P_{b,a} → Va, such that u →_{b,a} p = ⋁{v ∈ Va | u ◦_{b,a} v ≤ p}
  – →_{a,b} : Va × P_{b,a} → Ub, such that v →_{a,b} p = ⋁{u ∈ Ub | u ◦_{b,a} v ≤ p}.

%(f̄)(v) = ⋀_{b∈B} ↑_{b,a}(f(b)^b)(v)
        = ⋀_{b∈B} ⋀_{u∈Ub} (f(b)^b(u) ⇒ ρ_{b,a}(u, v))
        = ⋀_{b∈B} ⋀_{u∈Ub} ((u ≤ f(b)) ⇒ (u ◦_{b,a} v ≤ R(b, a)))
        = ⋀_{b∈B} ( ⋀_{u∈Ub; u≰f(b)} 1  ∧  ⋀_{u∈Ub; u≤f(b)} ((u ≤ f(b)) ⇒ (u ◦_{b,a} v ≤ R(b, a))) )
        = ⋀_{b∈B} ⋀_{u∈Ub; u≤f(b)} (u ◦_{b,a} v ≤ R(b, a))
        = ⋀_{b∈B} (f(b) ◦_{b,a} v ≤ R(b, a))
        = ⋀_{b∈B} (v ≤ f(b) →_{b,a} R(b, a))
        = (v ≤ ⋀_{b∈B} (f(b) →_{b,a} R(b, a)))
        = (v ≤ ⋀_{b∈B} ⋁{w ∈ Va | f(b) ◦_{b,a} w ≤ R(b, a)})
        = (v ≤ ⋁{w ∈ Va | (∀b ∈ B)(f(b) ◦_{b,a} w ≤ R(b, a))})
        = (v ≤ %(f)(a)) = (%(f)(a))^a(v).

Analogously one can obtain .(ḡ)(u) = (.(g)(b))^b(u). □

4.1   Back to heterogeneous formal contexts
Now, we return to the heterogeneous formal contexts introduced in Subsection 2.4.
The second order formal context C̄ can be seen as a special heterogeneous formal
context Ĉ, whereby the family of posets {⟨P_{b,a}, ≤⟩ | (b, a) ∈ B × A} is replaced by


a set of 2-bonds {ρ_{b,a} ∈ 2-Bonds(⟨Ub, Ub, ≤⟩, ⟨Va, Va, ≥⟩) | (b, a) ∈ B × A}. Hence,
the final form of such a heterogeneous formal context is

    Ĉ = ⟨B, A, ρ, R̂, U, V, {×_{b,a} | (b, a) ∈ B × A}⟩

where

 – ρ = {ρ_{b,a} ∈ 2-Bonds(⟨Ub, Ub, ≤⟩, ⟨Va, Va, ≥⟩) | (b, a) ∈ B × A}
 – ρ_{b,a}(u, v) = (u ◦_{b,a} v ≤ R(b, a))
 – R̂(b, a) = ρ_{b,a} ∈ 2-Bonds(⟨Ub, Ub, ≤⟩, ⟨Va, Va, ≥⟩) for any (b, a) ∈ B × A
 – ×_{b,a} : Ub × Va → 2^{Ub×Va} defined as the Cartesian product u ×_{b,a} v = ū × v̄.

The derivation operators of Ĉ are defined as

 – ↑(f)(a) = ⋁{v ∈ Va | (∀b ∈ B) f(b) ×_{b,a} v ⊆ ρ_{b,a}} for any f ∈ ∏_{b∈B} Ub
 – ↓(g)(b) = ⋁{u ∈ Ub | (∀a ∈ A) u ×_{b,a} g(a) ⊆ ρ_{b,a}} for any g ∈ ∏_{a∈A} Va.

Lemma 4. The concept lattices of C and Ĉ are identical.

Proof. First consider that for any (u, v) ∈ Ub × Va and any (b, a) ∈ B × A the
following holds:

    (u ×_{b,a} v ⊆ ρ_{b,a}) = (ū × v̄ ⊆ ρ_{b,a}) = ρ_{b,a}(u, v) = (u ◦_{b,a} v ≤ R(b, a)).

Let f ∈ ∏_{b∈B} Ub be arbitrary. Then

    ↑(f)(a) = ⋁{v ∈ Va | (∀b ∈ B) f(b) ×_{b,a} v ⊆ ρ_{b,a}}
            = ⋁{v ∈ Va | (∀b ∈ B) f(b) ◦_{b,a} v ≤ R(b, a)}
            = %(f)(a).

Analogously, ↓(g)(b) = .(g)(b) for any g ∈ ∏_{a∈A} Va. □


5    Bonds between heterogeneous formal contexts

We present a definition of a bond between two heterogeneous formal contexts
which can be formulated as follows.

Definition 10. Let Ci = hBi , Ai , Pi , Ri , Ui , Vi , i i for i ∈ {1, 2} be two heteroge-
neous formal contexts. The heterogeneous formal context B = hB1 , A2 , P, R, U1 , V2 , i
such that Ext(B) ⊆ Ext(C1 ) and Int(B) ⊆ Int(C2 ) is said to be a bond between
two heterogeneous formal contexts C1 and C2 .


5.1      Direct product of two heterogeneous formal contexts
In this subsection, we define the direct product of two heterogeneous formal con-
texts. Further, we show how to find bonds between two heterogeneous formal
contexts.
Definition 11. Let Ci = hBi , Ai , Pi , Ri , Ui , Vi , i i for i ∈ {1, 2} be two hetero-
geneous formal contexts. The heterogeneous formal context
                       C1 ∆C2 = hB1 × A2 , B2 × A1 , P∆ , R∆ , U∆ , V∆ , ×i
such that
 – P∆ = {ρb1 ,a1 ∆ρb2 ,a2 |((b1 , a2 ), (b2 , a1 )) ∈ (B1 × A2 ) × (B2 × A1 )}
 – where ρbi ,ai (u, v) = (u ◦bi ,ai v ≤ Ri (bi , ai )) for any (u, v) ∈ Ubi × Vai for any
   (bi , ai ) ∈ Bi × Ai for any i ∈ {1, 2}
 – R∆ ((b1 , a2 ), (b2 , a1 )) = ρb1 ,a1 ∆ρb2 ,a2 for any bi ∈ Bi and ai ∈ Ai for all
   i ∈ {1, 2}
 – U∆ = {γ1,2 ∈ 2-Bonds(hUb1 , Ub1 , ≤i, hVa2 , Va2 , ≥i)|(b1 , a2 ) ∈ B1 × A2 }
 – V∆ = {γ2,1 ∈ 2-Bonds(hUb2 , Ub2 , ≤i, hVa1 , Va1 , ≥i)|(b2 , a1 ) ∈ B2 × A1 }
is said to be a direct product of two heterogeneous formal contexts.
Lemma 5. Let Ci = hBi , Ai , Pi , Ri , Ui , Vi , i i for i ∈ {1, 2} be two heteroge-
neous formal contexts. Let

    R ∈ ∏_{(b1,a2)∈B1×A2} 2-Bonds(⟨U_{b1}, U_{b1}, ≤⟩, ⟨V_{a2}, V_{a2}, ≥⟩)

be an extent of the direct product C1∆C2. Then the heterogeneous formal context
B = ⟨B1, A2, ρ, R, U1, V2, ×⟩ where

    ρ = {2-Bonds(⟨U_{b1}, U_{b1}, ≤⟩, ⟨V_{a2}, V_{a2}, ≥⟩) | (b1, a2) ∈ B1 × A2}

is a bond between C1 and C2.
Proof. Let us have any intent of B. Then there exists f ∈ ∏_{b1∈B1} U_{b1} such that

(%_B(f)(a2))^{a2}(v2) = %_B(f̄)(v2)
  = ⋀_{b1∈B1} ↑_{R(b1,a2)}(f(b1)^{b1})(v2)
  = ⋀_{b1∈B1} ⋀_{u1∈U_{b1}} (f(b1)^{b1}(u1) ⇒ R(b1, a2)(u1, v2))

    [R is an extent of C1∆C2, hence R = ._∆(Q) for some Q ∈ ∏_{(b2,a1)∈B2×A1} 2-Bonds(⟨U_{b2}, U_{b2}, ≤⟩, ⟨V_{a1}, V_{a1}, ≥⟩)]

  = ⋀_{b1∈B1} ⋀_{u1∈U_{b1}} (f(b1)^{b1}(u1) ⇒ ._∆(Q)(b1, a2)(u1, v2))
  = ⋀_{b1∈B1} ⋀_{u1∈U_{b1}} (f(b1)^{b1}(u1) ⇒ ⋀_{(b2,a1)∈B2×A1} ↓_{ρ_{b1,a1}∆ρ_{b2,a2}}(Q(b2, a1))(u1, v2))
  = ⋀_{b1∈B1} ⋀_{u1∈U_{b1}} (f(b1)^{b1}(u1) ⇒ ⋀_{b2∈B2} ⋀_{a1∈A1} ⋀_{(u2,v1)∈U_{b2}×V_{a1}} (Q(b2, a1)(u2, v1) ⇒ (ρ_{b1,a1}∆ρ_{b2,a2})((u1, v2), (u2, v1))))
  = ⋀_{b1∈B1} ⋀_{u1∈U_{b1}} (f(b1)^{b1}(u1) ⇒ ⋀_{b2∈B2} ⋀_{a1∈A1} ⋀_{u2∈U_{b2}} ⋀_{v1∈V_{a1}} (Q(b2, a1)(u2, v1) ⇒ (¬ρ_{b1,a1}(u1, v1) ⇒ ρ_{b2,a2}(u2, v2))))
  = ⋀_{b1∈B1} ⋀_{b2∈B2} ⋀_{a1∈A1} ⋀_{u2∈U_{b2}} ⋀_{v1∈V_{a1}} ⋀_{u1∈U_{b1}} (f(b1)^{b1}(u1) ⇒ (Q(b2, a1)(u2, v1) ⇒ (¬ρ_{b1,a1}(u1, v1) ⇒ ρ_{b2,a2}(u2, v2))))
  = ⋀_{b2∈B2} ⋀_{u2∈U_{b2}} ((⋁_{b1∈B1} ⋁_{u1∈U_{b1}} ⋁_{a1∈A1} ⋁_{v1∈V_{a1}} (f(b1)^{b1}(u1) ∧ Q(b2, a1)(u2, v1) ∧ ¬ρ_{b1,a1}(u1, v1))) ⇒ ρ_{b2,a2}(u2, v2))
  = ⋀_{b2∈B2} ⋀_{u2∈U_{b2}} (q(b2)^{b2}(u2) ⇒ ρ_{b2,a2}(u2, v2))
  = %_{C2}(q̄)(v2) = (%_{C2}(q)(a2))^{a2}(v2),

where

    q(b2)(u2) = ⋁_{b1∈B1} ⋁_{u1∈U_{b1}} ⋁_{a1∈A1} ⋁_{v1∈V_{a1}} (f(b1)(u1) ∧ Q(b2, a1)(u2, v1) ∧ ¬ρ_{b1,a1}(u1, v1)).

Hence, %_B(f) = %_{C2}(q). So any intent of B is an intent of C2.
   By using the following equality

        (¬ρb1 ,a1 (u1 , v1 ) ⇒ ρb2 ,a2 (u2 , v2 )) = (¬ρb2 ,a2 (u2 , v2 ) ⇒ ρb1 ,a1 (u1 , v1 ))

analogously we obtain that any extent of B is an extent of C1 . Hence, B is a bond
between C1 and C2. □


6       Conclusion
Bonds and their L-fuzzy generalizations represent a feasible way to explore the
relationships between formal contexts. In this paper we have investigated the
notion of a bond with respect to heterogeneous formal contexts. In conclusion,
the alternative definition of a bond provides an efficient tool for working with
nonhomogeneous data, and one can further explore this uncharted territory in
formal concept analysis.
    Categorical properties of heterogeneous formal contexts with bonds as mor-
phisms between them, and their categorical relationship to the homogeneous FCA
setting, will be studied in the near future.


References
 1. C. Alcalde, A. Burusco, R. Fuentes-González and I. Zubia. The use of linguistic
    variables and fuzzy propositions in the L-fuzzy concept theory. Computers &
    Mathematics with Applications, 62 (8): 3111-3122, 2011.
 2. L. Antoni, S. Krajči, O. Krı́dlo, B. Macek, and L. Pisková. Relationship between
    two FCA approaches on heterogeneous formal contexts. Proceedings of CLA, 93–
    102, 2012.
 3. L. Antoni, S. Krajči, O. Krı́dlo, B. Macek, and L. Pisková. On heterogeneous
    formal contexts. Fuzzy Sets and Systems, 234: 22–33, 2014.
 4. R. Bělohlávek. Fuzzy concepts and conceptual structures: induced similarities.
    Proceedings of Joint Conference on Information Sciences, 179–182, 1998.
 5. R. Bělohlávek. Lattices of fixed points of fuzzy Galois connections. Mathematical
    Logic Quartely, 47(1):111–116, 2001.
 6. R. Bělohlávek. Concept lattices and order in fuzzy logic. Annals of Pure and
    Applied Logic, 128(1–3):277-298, 2004.
 7. R. Bělohlávek. Lattice-type fuzzy order is uniquely given by its 1-cut: proof and
    consequences. Fuzzy Sets and Systems, 143:447–458, 2004.
 8. R. Bělohlávek. Sup-t-norm and inf-residuum are one type of relational product:
    unifying framework and consequences. Fuzzy Sets and Systems, 197: 45–58, 2012.
 9. R. Bělohlávek. Ordinally equivalent data: A measurement-theoretic look at for-
    mal concept analysis of fuzzy attributes. International Journal of Approximate
    Reasoning, 54 (9): 1496–1506, 2013.
10. Simple proof of basic theorem for general concept lattices by Cartesian repre-
    sentation. Proceedings of MDAI 2012, Girona, Catalonia, Spain, November 21-23,
    2012, LNCS 7647, pp. 294-305, 2012.
11. P. Butka, J. Pócs and J. Pócsová. Representation of fuzzy concept lattices in the
    framework of classical FCA. Journal of Applied Mathematics, Article ID 236725,
    7 pages, 2013.
12. P. Butka and J. Pócs. Generalization of one-sided concept lattices. Computing and
    Informatics, 32(2): 355–370, 2013.
13. C. Carpineto, and G. Romano. Concept Data Analysis. Theory and Applications.
    J. Wiley, 2004.
14. J.C. Dı́az-Moreno, J. Medina, M. Ojeda-Aciego. On basic conditions to gener-
    ate multi-adjoint concept lattices via Galois connections. International Journal of
    General Systems, 43(2): 149–161, 2014.
15. D. Dubois and H. Prade. Possibility theory and formal concept analysis: Charac-
    terizing independent sub-contexts. Fuzzy sets and systems, 196, 4–16, 2012.
16. B. Ganter and R. Wille. Formal concept analysis. Springer–Verlag, 1999.
17. J. Konečný and M. Ojeda-Aciego. Isotone L-bonds. Proceedings of CLA, 153-162,
    2013.
18. J. Konečný, M. Krupka. Block Relations in Fuzzy Setting Proceedings of CLA,
    115-130, 2011.


19. J. Konečný. Information Processing and Management of Uncertainty in
    Knowledge-Based Systems. Communications in Computer and Information Science,
    vol. 444, pp. 71-80, 2014.
20. O. Krı́dlo and M. Ojeda-Aciego. On the L-fuzzy generalization of Chu correspon-
    dences. International Journal of Computer Mathematics, 88(9):1808-1818, 2011.
21. O. Krı́dlo, S. Krajči, and M. Ojeda-Aciego. The category of L-Chu correspondences
    and the structure of L-bonds. Fundamenta Informaticae, 115(4):297-325, 2012.
22. O. Krı́dlo and M. Ojeda-Aciego. Linking L-Chu correspondences and completely
    lattice L-ordered sets. Proceedings of CLA, 233–244, 2012.
23. O. Krı́dlo and M. Ojeda-Aciego. CRL-Chu correspondences. Proceedings of CLA,
    105–116, 2013.
24. O. Krı́dlo, P. Mihalčin, S. Krajči and L. Antoni. Formal Concept analysis of higher
    order. Proceedings of CLA, 117-128, 2013.
25. O. Krı́dlo and M. Ojeda-Aciego. Revising the Link between L-Chu correspondences
    and completely lattice L-ordered sets. Annals of Mathematics and Artificial In-
    teligence, DOI 10.1007/s10472-014-9416-8, 2014.
26. M. Krötzsch, P. Hitzler, and G.-Q. Zhang. Morphisms in context. Lecture Notes
    in Computer Science, 3596:223–237, 2005.
27. J. Medina and M. Ojeda-Aciego. Multi-adjoint t-concept lattices. Information
    Sciences, 180:712–725, 2010.
28. J. Medina and M. Ojeda-Aciego. On multi-adjoint concept lattices based on het-
    erogeneous conjunctors. Fuzzy Sets and Systems, 208: 95–110, 2012.
29. J. Medina and M. Ojeda-Aciego. Dual multi-adjoint concept lattices. Information
    Sciences, 225, 47–54, 2013.
30. H. Mori. Chu Correspondences. Hokkaido Mathematical Journal, 37:147–214, 2008.
31. H. Mori. Functorial properties of Formal Concept Analysis. Proc ICCS, Lecture
    Notes in Computer Science, 4604:505–508, 2007.
32. J. Pócs. Note on generating fuzzy concept lattices via Galois connections. Infor-
    mation Sciences 185 (1):128–136, 2012.
33. J. Pócs. On possible generalization of fuzzy concept lattices. Information Sciences,
    210: 89–98, 2012.
             Reverse Engineering Feature Models from
           Software Configurations using Formal Concept
                             Analysis

           R. AL-msie’deen1 , M. Huchard1 , A.-D. Seriai1 , C. Urtado2 and S. Vauttier2
                        1
                            LIRMM / CNRS & Montpellier 2 University, France
                                   Al-msiedee, huchard, Seriai@lirmm.fr
                             2
                               LGI2P / Ecole des Mines d’Alès, Nı̂mes, France
                             Christelle.Urtado, Sylvain.Vauttier@mines-ales.fr


               Abstract. Companies often develop in a non-disciplined manner a set
               of software variants that share some features and differ in others to meet
               variant-specific requirements. To exploit existing software variants and
               manage them coherently as a software product line, a feature model must
               be built as a first step. To do so, it is necessary to extract mandatory
               and optional features from the code of the variants in addition to associating
               each feature implementation with its name. In previous work,
               we automatically extracted a set of feature implementations as a set of
               source code elements of software variants and documented the mined
               feature implementations based on the use-case diagrams of these vari-
               ants. In this paper, we propose an automatic approach to organize the
               mined documented features into a feature model. The feature model is a
               tree which highlights mandatory features, optional features and feature
               groups (and, or, xor groups). The feature model is completed with re-
               quirement and mutual exclusion constraints. We rely on Formal Concept
               Analysis and software configurations to mine a unique and consistent fea-
               ture model. To validate our approach, we apply it on several case studies.
               The results of this evaluation validate the relevance and performance of
               our proposal as most of the features and their associated constraints are
               correctly identified.

               Keywords: Software Product Line, Feature Models, Software Product
               Variants, Formal Concept Analysis, Product-by-feature matrix.


      1      Introduction
      To exploit existing software variants and build a software product line (SPL),
      a feature model (FM) must be built as a first step. To do so, it is necessary to
      extract mandatory and optional features, in addition to associating each feature
      with its name. In our previous work [1,2], we have presented an approach called
      REVPLINE 1 to identify and document features from the object-oriented source
      code of a collection of software product variants.
       1
           REVPLINE stands for RE-engineering Software Variants into Software Product
           Line.




c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 95–107,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.

         Dependencies between features need to be expressed via a FM which is a
     de facto standard formalism [3,4]. A FM is a tree-like hierarchy of features and
     constraints between them (cf. left side of Figure 1). FMs aim at describing the
     variability of a SPL in terms of features. A FM defines which feature combi-
     nations lead to valid products within the SPL (cf. right side of Figure 1). We
     illustrate our approach with the cell phone SPL FM and its 16 valid product
     configurations (cf. Figure 1) [5].




      [Figure 1 shows the cell phone SPL feature model together with a product-by-feature
      matrix of its 16 valid product configurations P-1 to P-16 over the features Cell Phone,
      Wireless, Infrared, Bluetooth, Accu Cell, Strong, Medium, Weak, Display, Games,
      Multi Player, Single Player and Artificial Opponent.]

            Fig. 1. Valid product configurations of cell phone SPL feature model [5].



         Figure 1 shows the FM of the cell phone SPL [5]. The Cell Phone feature is
     the root feature of this FM; hence it is selected in every program configuration.
     It has three mandatory child features (i.e., the Accu Cell, Display and Games
     features), which are also selected in every product configuration as their parent
     is always included. The children of the Accu Cell feature form an exclusive-or
     relation, meaning that the programs of this SPL include exactly one out of the
     three Strong, Medium or Weak features. The Multi Player and Single Player
     features constitute an inclusive-or, which necessitates that at least one of these
     two features is selected in any valid program configuration. Single Player has
     Artificial Opponent as a mandatory child feature. The Wireless feature is an
     optional child feature of root; hence it may or may not be selected. Its Infrared
     and Bluetooth child features form an inclusive-or relation, meaning that if a
     program includes the Wireless feature then at least one of its two child features
     has to be selected as well. The cell phone SPL also introduces three cross-tree
     constraints: the Multi Player feature cannot be selected together with the Weak
     feature, the Multi Player feature cannot be selected without the Wireless feature,
     and the Bluetooth feature requires the Strong feature.
         Galois lattices and concept lattices [6] are core structures of a data analy-
     sis framework (Formal Concept Analysis) for extracting an ordered set of con-

cepts from a dataset, called a formal context, composed of objects described by
attributes. In our approach, we consider the AOC-poset (for Attribute-Object-
Concept poset) [7], which is the sub-order of the concept lattice restricted to
attribute-concepts and object-concepts. Attribute-concepts (resp. object-con-
cepts) are the highest (resp. lowest) concepts that introduce each attribute (resp.
object). AOC-posets scale much better than lattices. For applying Formal Con-
cept Analysis (FCA) we used the Eclipse eRCA platform2 .
    Manual construction of a FM is both time-consuming and error-prone [8],
even for a small set of configurations [9]. The existing approaches to extract
FM from product configurations [8,10] suffer from several challenges. The main
challenge is that numerous candidate FMs can be extracted from the same input
product configurations, yet only a few of them are meaningful and correct; in
our work, we synthesize an accurate and meaningful FM using FCA. Moreover,
the majority of these approaches extract a basic FM without constraints between
its features [11] while, in our work, we extract all kinds of FM constraints.
    The remainder of this paper is structured as follows: Section 2 presents the
reverse engineering FM process step-by-step. Next, Section 3 presents the way
that we propose to evaluate the obtained FMs and describes the experimentation
and threats to validity. Section 4 discusses the related work.
Finally, in Section 5, we conclude this paper.


2     Step-by-Step FM Reverse Engineering
This section presents step-by-step the FM reverse engineering process. According
to our approach, we identify the FM in seven steps as detailed in the following,
using strong properties of FCA to group features among product configurations.
The AOC-poset is built from a set of known products, and thus does not repre-
sent all possible products; hence the FM structure has to be considered only as a
candidate feature organization that can be proposed to an expert. The algorithm
is designed such that all existing products (used for the construction of the
candidate FM) are covered by the FM. Besides, it also identifies possible unused
close variants.
    The first step of our FM extraction process is the identification of the AOC-
poset. First, a formal context, where objects are software product variants and
attributes are features (cf. Figure 1), is defined. The corresponding AOC-poset
is then calculated. The intent of each concept represents features common to
two or more products or unique to one product. As AOC-posets are ordered, the
intent of the most general (i.e., top) concept gathers mandatory features that
are common to all products. The intents of all the remaining concepts represent
the optional features. The extent of each of these concepts is the set of products
sharing these features (cf. Figure 2). In the following algorithms, for a concept C,
we call intent(C), extent(C), simplified_intent(C), and simplified_extent(C)
its associated sets. Efficient algorithms can be found in [7].
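    The following small Python sketch (illustrative only, not the eRCA-based implementation
used by the authors) shows how such a formal context can be represented and how the intent
of the top concept and the extent of a feature can be read off it. The two configurations and
their labels P-a and P-b are hypothetical examples built from the FM description above; the
full context would contain all sixteen rows of Figure 1.

    configurations = {
        # two valid configurations of the cell phone FM (illustrative labels)
        "P-a": {"Cell Phone", "Accu Cell", "Display", "Games",
                "Medium", "Single Player", "Artificial Opponent"},
        "P-b": {"Cell Phone", "Accu Cell", "Display", "Games", "Strong",
                "Wireless", "Bluetooth", "Multi Player",
                "Single Player", "Artificial Opponent"},
    }

    # intent of the top concept: features shared by every product (the mandatory features)
    top_intent = set.intersection(*configurations.values())

    # extent of a feature: the products that select it (extent of its attribute-concept)
    def extent(feature):
        return {p for p, feats in configurations.items() if feature in feats}

    # with all sixteen rows of Figure 1, top_intent would be
    # {"Cell Phone", "Accu Cell", "Display", "Games"}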
    The other steps are presented in the next sections.
2
    The eRCA : http://code.google.com/p/erca/




                   Fig. 2. The AOC-poset for the formal context of Figure 1.


     2.1     Extracting root feature and mandatory features

     Algorithm 1 is a simple algorithm for building the Base node (cf. Figure 3).
     Features in the top concept of the AOC-poset (Concept 16) are used in every
     product configuration. The Cell Phone feature is the root feature of the cell
     phone FM (line 5). Then a mandatory Base node is created (lines 8,9). It is
     linked to nodes created to represent all the other features in the top concept,
     i.e., Accu Cell, Display and Games (lines 12-16).


     2.2     Extracting atomic set of features (AND-group)

     Algorithm 2 is a simple algorithm for building AND-groups of features (exclud-
     ing all the mandatory features, line 3). An AND-group of features is created (line
     8) to group optional features that appear in the same simplified intent (test line
     6), meaning that these features are always used together in all the product con-
     figurations where they appear. In lines 12-16, nodes are created for every feature
     of the AND-group and attached to an And node. For instance, Con-
     cept 23 in Figure 2 has a simplified intent with two features, Single Player and
     Artificial Opponent, leading to the And node of Figure 3.
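         A compact way to obtain these AND-groups without materializing the AOC-poset is
     to observe that two optional features are introduced by the same concept exactly when
     they occur in the same set of products. The following Python sketch (ours, not the
     authors' implementation) groups optional features by their extents:

         from collections import defaultdict

         def and_groups(configurations, mandatory):
             # group optional features that occur in exactly the same products;
             # such features share one simplified intent and form an AND-group
             features = set().union(*configurations.values()) - mandatory
             by_extent = defaultdict(set)
             for f in features:
                 ext = frozenset(p for p, feats in configurations.items() if f in feats)
                 by_extent[ext].add(f)
             # only groups of two or more features are turned into an And node
             return [group for group in by_extent.values() if len(group) >= 2]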


     2.3     Extracting exclusive-or relation

     Features that form an exclusive-or relation can be identified in the concept lattice
     using the meet lattice operation (denoted by ⊓) [12], which amounts to computing

 Algorithm 1: ComputeRootAndMandatoryFeature
 1 // Top concept ⊤
 2 ∃ F ∈ A, which represents the name of the soft. family with F in feature set of ⊤
   Data: AOC K , ≤s : the AOC-poset associated with K
   Result: part of the FM containing root and mandatory features
 3 // Compute the root Feature
 4 CFS ← intent(⊤)
 5 Create node root, label (root) ← F, type (root) ← abstract
 6 CFS′ ← CFS \ {F}
 7 if CFS′ ≠ ∅ then
 8     Create node base with label (base) ← ”Base”
 9     type (base) ← abstract
10     Create edge e = (root, base)
11     type (e) ← mandatory
12     for each Fe in CFS′ do
13          Create node feature, with label (feature) ← Fe
14          type (feature) ← concrete
15          create edge e = (base, feature)
16          type (e) ← mandatory




  Algorithm 2: ComputeAtomicSetOfFeatures (and groups)
    Data: AOC K , ≤s : the AOC-poset associated with K
    Result: part of the FM with and groups of features
  1 // Compute atomic set of features
  2 // Feature List (FL) is the list of all features (FL = A in K=(O, A, R)).
  3 FL′ ← FL \ CFS // FL \ intent(⊤)
  4 AsF ← ∅
  5 int count ← 1
  6 for each concept C ≠ ⊤ such that | simplified_intent(C) | ≥ 2 do
  7      AsF ← AsF ∪ simplified_intent(C)
  8      Create node and with label (and ) ← ”AND”+ count
  9      type (and ) ← abstract
 10      create edge e = (root, and )
 11      type (e) ← optional
 12      for each F in simplified_intent(C) do
 13          create node feature, with label (feature) ← F
 14          type (feature) ← concrete
 15          create edge e = (and, feature)
 16          type (e) ← mandatory




the greatest lower bounds in the AOC-poset. If a feature A is introduced in
concept C1 , a feature B is introduced in concept C2 and C1 ⊓ C2 = ⊥ (and
extent(⊥) = ∅), that is, if the bottom of the lattice is the greatest lower bound
of C1 and C2 , the two features never occur together in a product. In our current

      approach, we only build a single Xor group of features, when a group of
      mutually exclusive features exists. Computing exclude constraints (see Section
      2.6) will deal with the cases where several Xor groups of features exist
      (a set of exclude constraints defining mutual exclusion is equivalent to a Xor
      group).
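          On the formal context itself, the meet-is-bottom criterion simply says that the two
      features never co-occur in any product. A small Python check (illustrative, not the
      authors' code):

          def mutually_exclusive(configurations, f1, f2):
              # the introducer concepts of f1 and f2 meet at the bottom of the lattice
              # exactly when no product selects both features
              ext1 = {p for p, feats in configurations.items() if f1 in feats}
              ext2 = {p for p, feats in configurations.items() if f2 in feats}
              return not (ext1 & ext2)

          # e.g. Strong, Medium and Weak are pairwise mutually exclusive in Figure 1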
          Algorithm 3 is a simple algorithm for building the single Xor group of fea-
      tures. The principle is to traverse the set of super-concepts of each minimum
      element of the AOC-poset and to keep the concepts that are super-concepts
      of only one minimum concept. Only features that are not used in the previous
      steps are considered in FL″ (line 2). In lines 6-10, in our example, we consider the
      three minimum concepts Concept 11, Concept 12 and Concept 15. The SSC
      sets are the sets of super-concepts of Concept 11, Concept 12 and Con-
      cept 15. Cxor is the set of all concepts, except Concept 11, Concept 12 and
      Concept 15. Lines 11-15 keep in Cxor only the concepts that do not appear in two
      SSC sets. Cxor contains concepts number 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14,
      19, 20 and 21. Line 16 eliminates Concept 19 which is not a maximum. As there
      are three features (Medium, Strong, Weak, from Concept 21, Concept 20, and
      Concept 2 respectively) that are in FL” and in the simplified intent of concepts
      of Cxor (line 18), an Xor node is created and linked to the root (lines 19-26).
      Then, on lines 27-33, nodes are created for the features and linked to the Xor
      node. Figure 3 shows this Xor node.


      2.4    Extracting inclusive-or relation

      Optional features are features that are used in some (but not all) product con-
      figurations. There are many ways of finding and organizing them. Algorithm 4
      is a simple algorithm for building the Or group of features. In our approach, we
      pruned the AOC-poset by removing the top concept, concepts that correspond to
      AND groups of features, and concepts that correspond to features that form an
      exclusive-or relation. The remaining concepts define features that are grouped
      (lines 8-12) into an Or node (created and linked to the root on lines 4-7). In
      the AOC-poset of Figure 2, the Wireless, Infrared, Bluetooth, and Multi Player
      features form an inclusive-or relation (cf. Figure 3).


      2.5    Extracting require constraints

      Algorithm 5 is a simple algorithm for identifying require constraints. A require
      constraint, e.g., saying ”variable feature A always requires variable feature B”,
      can be extracted from the lattice via implications. We say that A implies B
      (written A → B). The require constraints can be identified in the AOC-poset:
      when a feature F1 is introduced in a subconcept of the concept that introduces
      another feature F2 , there is an implication F1 → F2 . We only consider the
      transitive reduction of the AOC-poset limited to Attribute-concepts (line 2) and
      features that are in simplified intents (lines 3-4). In the AOC-poset of Figure 2,
      we find 6 require constraints from the transitive reduction of the AOC-poset to

 Algorithm 3: ComputeExclusive-or Relation (Xor)
   Data: AOC K , ≤s : the AOC-poset associated with K
   Result: part of the FM with XOR group of features
 1 // Compute exclusive-or relation
  2 FL″ ← FL′ \ AsF
 3 Cxor ← ∅
 4 SSCS ← ∅ // set of super-concept sets
 5 Minimum-set ← ∅
 6 for each minimum of AOC K denoted by m do
  7      Let SSC be the set of super-concepts of m (except ⊤)
 8      SSCS ← SSCS ∪ {SSC}
 9      Minimum-set ← Minimum-set ∪ {m}
10      Cxor ← Cxor ∪ SSC
 11 while SSCS ≠ ∅ do
12    SSC-1 ← any element in (SSCS)
13    SSCS ← SSCS \ SSC-1
14    for each SSC-2 in SSCS do
15        Cxor ← Cxor \ (SSC-1 ∩ SSC-2)

16 Cxor ← Max(Cxor)
17 XFS ← ∅
 18 if |Cxor| > 1 and |FL″ ∩ ∪C∈Cxor simplified_intent(C)| > 1 then
19     Create node xor with label (xor ) ← ”XOR”
20     type (xor ) ← abstract
21     create edge e = (root, xor )
22     // if all products are covered by Cxor
23     if ∪C∈Cxor extent(C) = O then
24         type (e) ← mandatory
25     else
26         type (e) ← optional
27     for each concept C ∈ Cxor do
 28         for each F in simplified_intent(C) ∩ FL″ do
29             create node feature, with label (feature) ← F
30             type (feature) ← concrete
31             create edge e = (xor, feature)
32             type (e) ← alternative
33             XFS ← XFS ∪ F




attribute-concepts (cf. Figure 3). Note that implications whose right-hand side is a
mandatory feature are useless, because mandatory features are already represented
in the FM by the Base node.
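    Equivalently, on the level of the formal context, F1 → F2 holds exactly when every
product that selects F1 also selects F2, i.e. extent(F1) ⊆ extent(F2). The sketch below
(illustrative Python, computing all implications rather than only the transitive reduction
used by Algorithm 5) makes this explicit:

    def require_constraints(configurations, mandatory):
        # f1 -> f2 whenever every product containing f1 also contains f2;
        # implications towards mandatory features are omitted, as in the text
        features = set().union(*configurations.values()) - mandatory
        ext = {f: {p for p, feats in configurations.items() if f in feats}
               for f in features}
        return {(f1, f2) for f1 in features for f2 in features
                if f1 != f2 and ext[f1] and ext[f1] <= ext[f2]}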


2.6   Extracting exclude constraints

In our current proposal, we compute binary exclude constraints ¬(A ∧ B) under
the condition that A and B are not both linked to the Or group. To mine

          Algorithm 4: ComputeInclusive-orRelation (Or)
         Data: AOC K , ≤s : the AOC-poset associated with K
         Result: part of the FM with OR group of features
       1 // Compute inclusive-or relation
       2 FL‴ ← FL″ \ XFS
       3 if FL‴ ≠ ∅ then
       4      Create node or with label (or ) ← ”OR”
       5      type (or ) ← abstract
       6      create edge e = (root, or )
       7      type (e) ← optional
       8      for each F in FL‴ do
       9            create node feature, with label (feature) ← F
      10            type (feature) ← concrete
      11            create edge e = (or, feature)
      12            type (e) ← Or




          Algorithm 5: ComputeRequireConstraint (Requires)
         Data: AC K , ≤s : the AC-poset associated with K
         Result: Require - the set of require constraints
       1 Require ← ∅
       2 for each edge (C1, C2) = e in transitive reduction of AC-poset do

       3   for all f1, f2 with f1 ∈ simplified intent(C1) and f2 ∈ simplified intent(C2) do
       4       Require ← Require ∪ {f1 −→ f2}




      exclude constraints from an AOC-poset, we use the meet3 of the introducers of
      the two involved features. For example, the meet of Concept 2 which introduces
      Weak and Concept 22 which introduces Multi Player is the bottom (in the whole
      lattice). In the AOC-poset they don’t have a common lower bound. We can
      thus deduce ¬(Weak ∧ Multi Player). In the AOC-poset of Figure 2, there
      are three exclude constraints (cf. Figure 3). Algorithm 6 is a simple algorithm
      for identifying exclude constraints. It compares features that are below the OR
      group with each set of features in the intent of a minimum (line 4), in order to
      determine which are incompatible: this is the case for a pair (f1, f2) where f1
      is in the OR group and not in the minimum intent, and f2 is in the minimum
      intent but not in the OR group (lines 6-10). Figure 3 shows the resulting FM
      based on the product configurations of Figure 1.
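          A direct (and slightly coarser) way to obtain such binary excludes from the
      configurations, without walking the AOC-poset as Algorithm 6 does, is to pair each
      Or-group feature with every feature outside the Or group that never co-occurs with
      it. A Python sketch of this simplified check (ours, not the paper's algorithm):

          def exclude_constraints(configurations, or_features):
              # ¬(f1 ∧ f2): f1 is an Or-group feature, f2 is not, and no product
              # of the input set selects both features
              all_features = set().union(*configurations.values())
              excludes = set()
              for f1 in or_features:
                  for f2 in all_features - or_features:
                      if not any(f1 in feats and f2 in feats
                                 for feats in configurations.values()):
                          excludes.add((f1, f2))
              return excludes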



      3
          in the lattice

    Algorithm 6: ComputeExcludeConstraint (Excludes)
   Data: AOC K , ≤s : the AOC-poset associated with K
   Result: Exclude - the set of exclude constraints.
 1 // Minimum-set from Algorithm 3
 2 // FL‴ from Algorithm 4
 3 Exclude ← ∅
 4 for each P ∈ Minimum-set do
 5     P_intent ← intent(P) \ intent(⊤)
 6     Opt-feat-set ← FL‴ \ (FL‴ ∩ P_intent)
 7     Super-feat-set ← P_intent \ (FL‴ ∩ P_intent)
 8     if Opt-feat-set ≠ ∅ and Super-feat-set ≠ ∅ then
 9            for each f1 ∈ Opt-feat-set, f2 ∈ Super-feat-set do
10                Exclude ← Exclude ∪ {¬(f1 ∧ f2)}




3     Experimentation

In order to evaluate the mined FM we rely on the SPLOT homepage4 and the
FAMA Tool5 . Our implementation6 converts the FM that has been drawn using
the SPLOT homepage into the format of FAMA. Then, we can easily generate
a file containing all valid product configurations [13]. Figure 3 shows all valid
product configurations of the FM mined by our approach (the first 16 product
configurations are the same as in Figure 1). We compare the sets of configura-
tions defined by the two FMs (i.e., the initial FM compared to the mined FM).
The mined FM introduces 15 extra product configurations which correspond to
feature selection constraints that have not been detected by our algorithm.

Evaluation Metrics: In our work, we rely on precision, recall and F-measure
metrics to evaluate the mined FM. All measures have values in [0, 1]. If re-
call equals 1, all relevant product configurations are retrieved. However, some
retrieved product configurations might not be relevant. If precision equals 1,
all retrieved product configurations are relevant. Nevertheless, relevant product
configurations might not be retrieved. F-Measure defines a trade-off between
precision and recall: it gives a high value only when both recall and precision are
high, and it equals 1 only when both equal 1. For the product configurations
identified by the mined cell phone FM, the results are as follows: precision: 0.51,
recall: 1.00 and F-Measure: 0.68.
The recall measure is 1 by construction, due to the fact that the algorithm was
designed to cover existing products.
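    With the two sets of configurations at hand, the three metrics reduce to simple set
arithmetic; the following Python sketch (illustrative, not the evaluation scripts used here)
reproduces the cell phone numbers:

    def evaluate(original_configs, mined_configs):
        # each configuration is a frozenset of feature names
        relevant = original_configs & mined_configs
        precision = len(relevant) / len(mined_configs)
        recall = len(relevant) / len(original_configs)
        f_measure = 2 * precision * recall / (precision + recall)
        return precision, recall, f_measure

    # cell phone SPL: the 16 original configurations are all among the 31 mined ones,
    # giving precision 16/31 ≈ 0.51, recall 1.00 and F-Measure ≈ 0.68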

4
  SPLOT homepage : http://gsd.uwaterloo.ca:8088/SPLOT/
5
  FAMA Tool Suite : http://www.isa.us.es/fama/
6
  Source Code : https://code.google.com/p/sxfmtofama/




                 [Table: the extra configurations P-17 to P-31 over the same thirteen features as in
                 Figure 1; a cross marks each feature selected in a configuration.]



                   Fig. 3. The mined FM and its extra product configurations.


           To validate our approach7 , we ran experiments on 7 case studies: ArgoUML-
      SPL [1], mobile media software variants [2], public health complaint-SPL8 , video
      on demand-SPL [8,3,14], wiki engines [10], DC motor [11] and cell phone-SPL
      [5]. Table 1 summarizes the obtained results.
            Results show that precision is not very high for any of the case studies.
       This means that many of the product configurations identified by the mined FM
       are extra configurations (not in the initial set defined by the original FM).
       Considering the recall metric, its value is 1 for all case studies. This means that
       the product configurations defined by the initial FM are included in the product
       configurations derived from the mined FM. Experiments show that if the gener-
       ated AOC-poset has only one bottom concept, there are no exclusive-or relations
       or exclude constraints in the given product configurations. In our work, the
       mined FM defines more configurations than the initial FM. The reason behind
       this limitation is that some feature selection constraints are not detected. Nev-
       ertheless, the AOC-poset contains information for going beyond this limitation.
       We plan to enhance our algorithm to deal with that issue, at the price of an
       increase in complexity.


      4     Related Work

      For the sake of brevity, we describe only the work that most closely relates to
      ours. The majority of existing approaches are designed to reverse engineer FM
      7
          Source code: https://code.google.com/p/refmfpc/
      8
          http://www.ic.unicamp.br/~tizzei/phc/

      Table 1. The results of configurations that are identified by the mined FMs.

       (Prod. = number of products, Feat. = number of features; group of features:
       Base, AsF = atomic set of features, Or = inclusive-or, Xor = exclusive-or;
       CTCs: Req = requires, Excl = excludes; ms = execution time in milliseconds.)

       # Case study            Prod. Feat. Base AsF  Or  Xor  Req Excl   ms  Precision Recall F-Measure
       1 ArgoUML-SPL             20    11    ×   ×                 ×          509    0.60    1.00    0.75
       2 Mobile media             8    18    ×   ×    ×                       441    0.68    1.00    0.80
       3 Health complaint-SPL    10    16    ×   ×    ×            ×          439    0.57    1.00    0.72
       4 Video on demand         16    12    ×   ×    ×            ×          572    0.66    1.00    0.80
       5 Wiki engines             8    21    ×   ×    ×    ×       ×    ×     555    0.54    1.00    0.70
       6 DC motor                10    15              ×           ×          444    0.83    1.00    0.90
       7 Cell phone-SPL          16    13    ×   ×    ×    ×       ×    ×     486    0.51    1.00    0.68




from high level models (e.g., product descriptions) [10,14]. Some approaches of-
fer an acceptable solution but are not able to identify important parts of the FM
such as cross-tree constraints, and-groups, or-groups, or xor-groups [11]. The main
challenge of works that reverse engineer FMs from product configurations ([8,3])
is that numerous candidate FMs can be extracted from the same input config-
urations, yet only a few of them are meaningful and correct. The majority of
existing approaches are designed to identify the dependencies between features
regardless of the FM hierarchy [8]. Work that relies on FCA to extract a FM does
not fully exploit the resulting lattices. In [11], the authors rely on FCA to extract a ba-
sic FM without cross-tree constraints, while in [12], the authors use FCA as a tool
to understand the variability of an existing SPL based on product configurations.
Their work does not produce FMs. In our work, we rely on FCA to extract FMs
from the software configurations. The resulting FMs exactly describe the given
product configuration set. The proposed approach is able to identify all parts of
FMs.


5     Conclusion

In this paper, we proposed an automatic approach to extract FMs from software
variants configurations. We rely on FCA to extract FMs including configuration
constraints. We have implemented our approach and evaluated its produced re-
sults on several case studies. The results of this evaluation showed that the
resulting FMs exactly describe the given product configuration set. The FMs
are generated in a very short time, because our FCA tool (based on traversals of
the AOC-poset) scales significantly better than the standard FCA approaches
that calculate and traverse the lattices. The current work extracts a FM with two
levels of hierarchy. As a perspective of this work, we plan to enhance the ex-
tracted FM by increasing the levels of hierarchy based on the AOC-poset structure
and to prevent the FM from representing extra configurations.

         Acknowledgment The authors would like to thank the reviewers for their
      valuable remarks that helped improve the paper. This work has been supported
      by the CUTTER ANR-10-BLAN-0219 project.


      References
       1. Al-Msie’deen, R., Seriai, A., Huchard, M., Urtado, C., Vauttier, S., Salman, H.E.:
          Mining features from the object-oriented source code of a collection of software
          variants using formal concept analysis and latent semantic indexing. In: SEKE
          ’13. (2013) 244–249
       2. Al-Msie’deen, R., Seriai, A., Huchard, M., Urtado, C., Vauttier, S.: Document-
          ing the mined feature implementations from the object-oriented source code of a
          collection of software product variants. In: SEKE ’14. (2014) 264–269
       3. Acher, M., Baudry, B., Heymans, P., Cleve, A., Hainaut, J.L.: Support for reverse
          engineering and maintaining feature models. In: VaMoS ’13, New York, NY, USA,
          ACM (2013) 20:1–20:8
       4. She, S., Lotufo, R., Berger, T., Wasowski, A., Czarnecki, K.: Reverse engineering
          feature models. In: ICSE ’11, New York, NY, USA, ACM (2011) 461–470
       5. Haslinger, E.N.: Reverse engineering feature models from program configurations.
          Master’s thesis, Johannes Kepler University Linz, Linz, Austria (September 2012)
       6. Ganter, B., Wille, R.: Formal concept analysis - mathematical foundations.
          Springer (1999)
        7. Berry, A., Gutierrez, A., Huchard, M., Napoli, A., Sigayret, A.: Hermes: a simple
           and efficient algorithm for building the AOC-poset of a binary relation. Annals of
           Mathematics and Artificial Intelligence (May 2014)
       8. Haslinger, E.N., Lopez-Herrejon, R.E., Egyed, A.: Reverse engineering feature
          models from programs’ feature sets. In: WCRE ’11, IEEE (2011) 308–312
       9. Andersen, N., Czarnecki, K., She, S., Wasowski, A.: Efficient synthesis of feature
          models. In: SPLC (1), ACM (2012) 106–115
      10. Acher, M., Cleve, A., Perrouin, G., Heymans, P., Vanbeneden, C., Collet, P.,
          Lahire, P.: On extracting feature models from product descriptions. In: VaMoS
          ’12, New York, NY, USA, ACM (2012) 45–54
      11. Ryssel, U., Ploennigs, J., Kabitzsch, K.: Extraction of feature models from formal
          contexts. In: SPLC ’11, New York, NY, USA, ACM (2011) 4:1–4:8
      12. Loesch, F., Ploedereder, E.: Optimization of variability in software product lines.
          In: SPLC ’07, Washington, DC, USA, IEEE Computer Society (2007) 151–162
      13. Benavides, D., Segura, S., Ruiz-Cortés, A.: Automated analysis of feature models
          20 years later: A literature review. Inf. Syst. 35(6) (September 2010) 615–636
      14. Lopez-Herrejon, R.E., Galindo, J.A., Benavides, D., Segura, S., Egyed, A.: Reverse
          engineering feature models with evolutionary algorithms: An exploratory study. In:
          SSBSE, Springer (2012) 168–182
           An Algorithm for the Multi-Relational
             Boolean Factor Analysis based on
                    Essential Elements

                          Martin Trnecka, Marketa Trneckova

                     Data Analysis and Modeling Lab (DAMOL)
             Department of Computer Science, Palacky University, Olomouc
              martin.trnecka@gmail.com, marketa.trneckova@gmail.com



          Abstract. The Multi-Relational Boolean factor analysis is a method
          from the family of matrix decomposition methods which enables us to
          analyze binary multi-relational data, i.e. binary data which are composed
          of many binary data tables interconnected via relations. In this paper
          we present a new Boolean matrix factorization algorithm for this kind of
          data, which uses new knowledge from the theory of Boolean factor
          analysis, the so-called essential elements. We show on a real dataset that uti-
          lizing essential elements in the algorithm leads to better results in terms
          of quality and the number of obtained multi-relational factors.


  1    Introduction
  The Boolean matrix factorization (or decomposition), also known as the Boolean
  factor analysis, has gained interest in the data mining community. Methods for
   decomposition of multi-relational data, i.e. complex data composed of many
   data tables interconnected via relations between objects or attributes of these data
   tables, have been intensively studied, especially in the past few years. Multi-relational
   data is a more truthful and therefore often also more powerful representation of
   reality. An example of this kind of data is an arbitrary relational database.
   In this paper we focus on a subset of multi-relational data, more precisely on
   multi-relational Boolean data, where the data tables and relations between them
   contain only 0s and 1s.
       It is important to say that many real-world data sets are more complex than
   one simple data table. Relations between these tables are crucial, because they
   carry additional information about the relationships in the data, and this infor-
   mation is important for understanding the data as a whole. For this reason, methods
   which can analyze multi-relational data usually take into account relations be-
   tween data tables, unlike classical Boolean matrix factorization methods which
   can handle only one data table.
      The Multi-Relational Boolean matrix factorization (MBMF) is used for many
  data mining purposes. The basic task is to find new variables hidden in data,
  called multi-relational factors, which explain or describe the original input data.
  There exist several ways how to represent multi-relational factors. In this work

c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 107–119,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.


we adopt the setting from [7], where a multi-relational factor is represented as an
ordered set of classic factors from the data tables, always one factor from each data
table. Whether classic factors are connected into a multi-relational factor is a
matter of the semantics of the relation between the data tables.
    The main problem is how to connect classic factors into one multi-relational
factor. The main aim of this work is to propose a new algorithm which utilizes
so-called essential elements from the theory of Boolean matrices. The essential
elements provide information about the factors which cover a particular part of
the data tables. This information can be used for a better connection of classic
factors into one multi-relational factor.
    Another issue is the number of obtained factors. In classical settings we want
the number of obtained factors to be as small as possible. Two main views on
this requirement can be found in the literature. In the first case we want to obtain
a particular number of factors. In the second case we want to obtain factors
that explain a prescribed portion of the data. In both cases we want to obtain the
most important factors. For more details see [1]. We emphasize this fact and we
reflect it in the design of our algorithm. Both views can be transferred to the multi-
relational case. The first one is straightforward; the second one is a little bit
problematic because multi-relational factors may not be able to explain the whole
data. This is correct, because multi-relational factors carry different information
than classical factors. We discuss this issue later in the paper.


2     Preliminaries and basic notions
We assume familiarity with the basic notions of Formal Concept Analysis [4],
which provides a basic framework for dealing with factors, and of the Boolean matrix
factorization (BMF) [2]. The main goal of classical BMF is to find a decompo-
sition C = A ◦ B, where C is the input data table, A represents the object-factor data
table (or matrix) and B represents the factor-attribute data table (or matrix). The
product ◦ is the Boolean matrix product, defined by

                            (A ◦ B)ij = ⋁ₗ₌₁ᵏ Ail · Blj ,                         (1)

where ⋁ denotes maximum (truth function of logical disjunction) and · is the
usual product (truth function of logical conjunction). Decomposing C into A ◦ B
corresponds to discovering factors which explain the data. Factors in classical BMF
can be seen as formal concepts [2], i.e. entities with an extent part and an intent
part. This leads to a clear interpretation of factors. Another benefit of using FCA
as a basic framework is that the matrices A and B can be constructed from a
subset of all formal concepts. Let
                  F = {⟨A1 , B1 ⟩ , . . . , ⟨Ak , Bk ⟩} ⊆ B(X, Y, C),
where B(X, Y, C) denotes the set of all formal concepts of the data table, which can
be seen as a formal context ⟨X, Y, C⟩, where X is a set of objects, Y is a set of
attributes and C is a binary relation between X and Y . The matrices A and B are
constructed in the following way:


                                                        
                      (A)il = 1 if i ∈ Al and 0 otherwise,        (B)lj = 1 if j ∈ Bl and 0 otherwise,

for l = 1, . . . , k. In other words, A is composed of the characteristic vectors of the
sets Al . Similarly for B.
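    As an illustration, the Boolean matrix product can be written in a few lines of Python
(a sketch over 0/1 matrices given as lists of lists, not part of the paper):

    def boolean_product(A, B):
        # (A ∘ B)_ij = max over l of A_il · B_lj for Boolean matrices
        k = len(B)
        return [[max(A[i][l] * B[l][j] for l in range(k))
                 for j in range(len(B[0]))]
                for i in range(len(A))]

    # with A and B built from the chosen formal concepts as above,
    # boolean_product(A, B) reconstructs the input data table C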
     In a multi-relational environment we have a set of input data tables C1 ,
C2 , . . . , Cn and a set of relations Rij , where i, j ∈ {1, . . . , n}, between Ci and
Cj . A multi-relational factor on the data tables C1 , C2 , . . . , Cn is an ordered n-tuple
⟨F1^{i1}, F2^{i2}, . . . , Fn^{in}⟩, where Fj^{ij} ∈ Fj for j ∈ {1, . . . , n} (Fj denotes the set of clas-
sic factors of data table Cj ), satisfying the relations RCl Cl+1 or RCl+1 Cl for
l ∈ {1, . . . , n − 1}.
Example 1. Let us have two data tables C1 (Table 1) and C2 (Table 2). Moreover,
we consider relation RC1 C2 (Table 3) between objects of the first data table and
attributes of the second one.



         Table 1: C1            Table 2: C2            Table 3: RC1 C2
            a  b  c  d             e  f  g  h             e  f  g  h
         1     ×  ×  ×          5  ×        ×          1     ×  ×
         2  ×     ×             6     ×  ×             2  ×     ×
         3     ×     ×          7  ×  ×  ×             3  ×  ×     ×
         4  ×  ×  ×  ×          8        ×  ×          4  ×  ×  ×  ×



    Classic factors of data table C1 are, for example, F1C1 = ⟨{1, 4}, {b, c, d}⟩,
F2C1 = ⟨{2, 4}, {a, c}⟩, F3C1 = ⟨{1, 3, 4}, {b, d}⟩, and factors of the second ta-
ble C2 are F1C2 = ⟨{6, 7}, {f, g}⟩, F2C2 = ⟨{5}, {e, h}⟩, F3C2 = ⟨{5, 7}, {e}⟩,
F4C2 = ⟨{8}, {g, h}⟩. These factors can be connected, using the relation RC1 C2 ,
into multi-relational factors in several ways. In [7], three approaches to manage
these connections were introduced. We use the narrow approach from [7],
which seems to be the most natural, and we obtain two multi-relational factors
⟨F1C1 , F1C2 ⟩ and ⟨F3C1 , F1C2 ⟩. The idea of the narrow approach is very simple:
we connect two factors FiC1 and FjC2 if the set of attributes which are common
(in the relation RC1 C2 ) to all objects of the first factor FiC1 is non-empty and is
a subset of the attributes of the second factor FjC2 .
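    The narrow-approach test can be stated very compactly; the sketch below (illustrative
Python, with the relation given as a map from objects of C1 to their attribute sets in C2)
checks whether a factor of C1 can be connected to a factor of C2:

    def narrow_connect(extent1, intent2, relation):
        # attributes shared, via R_C1C2, by all objects of the first factor
        common = set.intersection(*(relation[o] for o in extent1))
        # non-empty and contained in the intent of the second factor
        return bool(common) and common <= intent2

    # Example 1: objects {1, 4} share {f, g} in the relation, and {f, g} is
    # contained in the intent of F1C2, so ⟨F1C1, F1C2⟩ can be formed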
    The previous example also demonstrates the most problematic part of MBMF.
It is usually difficult to connect all factors from each data table. The result is a
small number of connections between them, which makes the selection of quality
multi-relational factors problematic. The reason for the small number of
connections between factors is that classic factors are selected without taking
the relation into account.
    Another notion very important for our work is that of so-called essential elements,
presented in [1]. Essential elements of a Boolean data table are entries of
this data table which are sufficient for covering the whole data table by factors


(concepts), i.e. if we take factors which cover all these entries, we automatically
cover all entries of the input data table. Formally, essential elements of the data
table ⟨X, Y, C⟩ are defined via minimal intervals in the concept lattice. The entry
Cij is essential iff the interval bounded by the formal concepts ⟨i↑↓ , i↑ ⟩ and ⟨j ↓ , j ↓↑ ⟩ is
non-empty and minimal w.r.t. ⊆ (i.e. it does not properly contain any other such
interval). We denote this interval by Iij . If the table entry Cij is essential, then the
interval Iij represents the set of all formal concepts (factors) which cover this entry.
A very interesting property of essential elements, which is used in our algorithm, is that
it is sufficient to take only one arbitrary concept from each interval to create an exact
Boolean decomposition of ⟨X, Y, C⟩. For more details about essential elements
we refer to [1].
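    For small data tables, the essential elements can be computed directly from this
definition; the following Python sketch (ours, not the implementation of [1]) represents a
data table as a map from objects to their attribute sets and tests, for every entry, whether
its interval properly contains the interval of another entry:

    def essential_elements(C):
        # C maps each object to its set of attributes (a cross-table such as Table 1)
        objects = list(C)
        attributes = set().union(*C.values())
        row = {i: set(C[i]) for i in objects}                              # i↑
        col = {j: {i for i in objects if j in C[i]} for j in attributes}   # j↓

        def contains(i, j, k, l):
            # interval I_kl lies inside I_ij iff k↑ ⊆ i↑ and l↓ ⊆ j↓
            return row[k] <= row[i] and col[l] <= col[j]

        essential = set()
        for i in objects:
            for j in row[i]:                      # only entries with C_ij = 1
                proper = any(contains(i, j, k, l) and not contains(k, l, i, j)
                             for k in objects for l in row[k])
                if not proper:
                    essential.add((i, j))
        return essential

    # for C1 of Table 1 this returns {(1, 'c'), (2, 'a'), (3, 'b'), (3, 'd')},
    # i.e. exactly the entries of Ess(C1) in Table 4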



3     Related work

There are several papers about classical BMF [1, 2, 5, 8, 10, 12], but these methods
can handle only one data table. In the literature we can find a wide range
of theoretical and application papers about multi-relational data analysis (see the
overview [3]), but it has been shown many times that these approaches are suitable
only for ordinal data. The multi-relational Boolean factor analysis is more specific.
The most relevant paper for our work is [7], which introduced the basic
idea that multi-relational factors are composed of classical factors which are
interconnected via the relations between the data tables. Three approaches to
create multi-relational factors were also introduced there, but an effective algorithm
is missing.
    Boolean multi-relational patterns and their extraction are the subject of the
paper [11]. Differently from our approach, the data are represented via k-partite
graphs; only relations between attributes are considered, and the data tables
contain only one single attribute. The patterns in [11] are different from our multi-
relational factors (they are represented as k-cliques in the data) and also carry
different information. In [11] another kind of measure of the quality of the
obtained patterns, based on entropy, is also considered.
    Another relevant work is [6], which introduced Relational Formal
Concept Analysis as a tool for analyzing multi-relational data. Unlike [6],
our approach extracts a different kind of patterns. For more details see [7].
MBMF is mentioned indirectly, in a very specific and limited form, in [9] as the
Joint Subspace Matrix Factorization.
    Generally, the idea of connecting patterns from various data tables is not new.
It can be found in social network analysis or in the field of recommender
systems. The main advantage of our approach is that the patterns are Boolean fac-
tors that carry significant information; the second important advantage is
that we deliver the most important factors (factors which describe the biggest
portion of the input data) before the others, i.e. the first obtained factor is the most
important.


4    Algorithm for MBMF
Before we present the algorithm for MBMF, we illustrate on a simple example
the basic ideas behind the algorithm. For this purpose we take the example
from the previous part. As we mentioned above, if we take the tables C1 , C2 and
the relation RC1 C2 , the narrow approach gives two connections between
factors, i.e. two multi-relational factors. These factors explain only 60 percent of
the data. There usually exist several factorizations of a Boolean data table. The
factors in our example were obtained using the GreConD algorithm from [2],
which selects in each iteration a factor that covers the biggest part of the still
uncovered data. Now we are in a situation where we want to obtain a different
set of factors, with more connections between them. For this purpose we can use
essential elements. Firstly we compute the essential parts of C1 (denoted Ess(C1 ))
and C2 (denoted Ess(C2 )). By the essential part of a data table we mean all its
essential elements (Tables 4 and 5).



              Table 4: Ess(C1 )            Table 5: Ess(C2 )
                 a  b  c  d                   e  f  g  h
              1        ×                   5  ×        ×
              2  ×                         6     ×
              3     ×     ×                7  ×
              4                            8        ×  ×


    Each essential element in Ess(C1 ) is defined via an interval in the concept lattice of
C1 (Fig. 1a), and similarly for essential elements in Ess(C2 ) (Fig. 1b). In Fig. 1a
the interval I1c corresponding to the essential element (C1 )1c is highlighted. In Fig. 1b
the interval corresponding to the essential element (C2 )8g is highlighted. Let us note that
the concept lattices here are only for illustration purposes; for computing Ess(C1 )
and Ess(C2 ) it is not necessary to construct the concept lattices at all. Now, if we
use the fact that we can take an arbitrary concept (factor) from each interval
to obtain a complete factorization of a data table, we have several options which
concepts can be connected into one multi-relational factor. More precisely, we can take
two intervals and try to connect each concept from the first interval with concepts from
the second one. Again, we obtain a full factorization of the input data tables, but now we
can select factors with regard to the relation between them.
    For example, if we take the highlighted intervals, we obtain four possible con-
nections. The first highlighted interval contains two concepts c1 = ⟨{1, 2, 4}, {c}⟩
and c2 = ⟨{1, 4}, {b, c, d}⟩. The second consists of the concepts d1 = ⟨{6, 7, 8}, {g}⟩ and
d2 = ⟨{8}, {g, h}⟩. Only two connections (c1 with d1 and c1 with d2 ) satisfy the
relation RC1 C2 , i.e. can be connected.
    For two intervals it is not necessary to try all combinations of factors. If
we are not able to connect a concept ⟨A, B⟩ from the first interval with a concept
⟨C, D⟩ from the second interval, we are not able to connect ⟨A, B⟩ with any concept
⟨E, F⟩ from the second interval, where ⟨C, D⟩ ⊆ ⟨E, F⟩. Also if we are not





                       Fig. 1: Concept lattices of C1 (a) and C2 (b)


able to connect a concept ⟨A, B⟩ from the first interval with a concept ⟨E, F⟩ from
the second interval, we are not able to connect any concept ⟨C, D⟩ from the first
interval, where ⟨C, D⟩ ⊆ ⟨A, B⟩, with the concept ⟨E, F⟩. Let us note that ⊆ is
the classical subconcept-superconcept ordering.
    Even if we take this search space reduction into account, searching these in-
tervals is still time consuming. We propose a heuristic approach which takes
the attribute concepts in the intervals of the second data table, i.e. the bottom element
of each interval. In the intervals of the first data table we take the greatest concepts
which can be connected via the relation, i.e. whose set of common attributes in the
relation is non-empty. The idea behind this heuristic is that a bigger set of objects is
likely to have a smaller set of common attributes in the relation, which increases the
probability of connecting this factor with some factor from the second data
table, especially if we take the factors with the biggest sets of attributes in the
intervals of the second data table.
    Because we do not want to construct the whole concept lattice and search in
it, we compute candidates for the greatest element directly from the relation RC1 C2 . We
take all objects belonging to the top element of an interval Iij of the first data
table and compute how many of them belong to each attribute in the relation. We
take into account only attributes belonging to object i. As a candidate we take the
greatest set of objects belonging to some attribute in the relation which satisfies
the following: if we compute the closure of this set in the first data table, the resulting
set of objects does not have an empty set of common attributes in the relation.
    Applying this heuristic to the data from the example, we obtain three factors
in the first data table, F1C1 = ⟨{2, 4}, {a, c}⟩, F2C1 = ⟨{1, 3, 4}, {b, d}⟩, F3C1 =
⟨{1, 2, 4}, {c}⟩, and four factors F1C2 = ⟨{5}, {e, h}⟩, F2C2 = ⟨{6, 7}, {f, g}⟩, F3C2 =
⟨{7}, {e, f, g}⟩, F4C2 = ⟨{8}, {g, h}⟩ from the second one. Between these factors,
there are six connections satisfying the relation. These connections are shown in
Table 6.
    We form multi-relational factors in a greedy manner. In each step we connect
the factors which cover the biggest part of the still uncovered parts of the data tables C1 and



                       Table 6: Connections between factors
                               F1C2  F2C2  F3C2  F4C2
                       F1C1                  ×
                       F2C1            ×     ×
                       F3C1            ×     ×     ×



C2 . Firstly, we obtain the multi-relational factor ⟨F2C1 , F2C2 ⟩ which covers 50 percent
of the data. Then we obtain the factor ⟨F3C1 , F4C2 ⟩ which, together with the first
factor, covers 75 percent of the data, and last we obtain the factor ⟨F1C1 , F3C2 ⟩. All these
factors together cover 90 percent of the data. By adding other factors we do not obtain
better coverage of the input data. These three factors cover the same part of the input
data as the six connections from Table 6.

Remark 1. As we mentioned above and as we can see in the example, multi-
relational factors are not always able to explain the whole data. This is due
to the nature of the data: there is simply no information on how to connect some classic
factors, e.g. in the example no set of objects from C1 has in RC1 C2 a set of
common attributes equal to {e, h} (or only {e} or only {h}). For this reason
we are not able to connect any factor from C1 with the factor F1C2 .

Remark 2. In the previous part we explained the idea of the algorithm on an object-
attribute relation between data tables. It is also possible to consider different kinds
of relations, e.g. object-object, attribute-object or attribute-attribute relations.
Without loss of generality we present the algorithm only for the object-attribute
relation; the modification to a different kind of relation is very simple.

    Now we are going to describe the pseudo-code (Algorithm 1) of our algorithm
for MBMF. The inputs to this algorithm are two Boolean data tables C1 and C2 ,
a binary relation RC1 C2 between them and a number p ∈ [0, 1] which represents
how large a part of C1 and C2 we want to cover by multi-relational factors, e.g.
the value 0.9 means that we want to cover 90 percent of the entries of the input data
tables. The output of this algorithm is a set M of multi-relational factors that covers
the prescribed portion of the input data (if it is possible to obtain the prescribed
coverage). The first computed factor covers the biggest part of the data.
    First, in lines 1-2 we compute the essential parts of C1 and C2 . In lines 3-4 we
initialize the variables UC1 and UC2 , which store information about the still
uncovered parts of the input data. We repeat the main loop (lines 5-18)
until we obtain the required coverage or until it is no longer possible to add new multi-
relational factors covering the still uncovered part (lines 12-14).
    In the main loop, for each essential element we select the best candidate from
its interval Iij in the first data table in the greedy manner described above,
i.e. we take the greatest concept which can be connected via the relation.
Then we try to connect this candidate with factors from the second
data table. We compute the cover function and add to M the multi-relational
factor maximizing this coverage.


   In lines 16-17 we remove from UC1 and UC2 the entries which are covered by
the newly added multi-relational factor.



  Algorithm 1: Algorithm for the multi-relational Boolean factor analysis
     Input: Boolean matrices C1 , C2 , the relation RC1 C2 between them and p ∈ [0, 1]
     Output: set M of multi-relational factors
 1 EC1 ← Ess(C1 )
 2 EC2 ← Ess(C2 )
 3 UC1 ← C1
 4 UC2 ← C2

 5 while (|UC1 | + |UC2 |)/(|C1 | + |C2 |) > 1 − p do
 6    foreach essential element (EC1 )ij do
 7        compute the best candidate ⟨a, b⟩ from the interval Iij
 8    end
 9    ⟨A, B⟩ ← select one from the set of candidates which maximizes the cover of C1
10    select a non-empty row i in EC2 for which A^{↑RC1C2} ⊆ (C2 )i^{↓↑C2} and which
      maximizes the cover of C1 and C2
11    ⟨C, D⟩ ← ⟨(C2 )i^{↑↓C2} , (C2 )i^{↑C2} ⟩
12    if the value of the cover function for C1 and C2 is equal to zero then
13        break
14    end
15    add ⟨⟨A, B⟩, ⟨C, D⟩⟩ to M
16    set (UC1 )ij = 0 where i ∈ A and j ∈ B
17    set (UC2 )ij = 0 where i ∈ C and j ∈ D
18 end
19 return M




   Our implementation of the algorithm follows the pseudo-code conceptually,
but not in every detail. For example, we speed up the algorithm by precomputing
candidates, or, instead of computing candidates for each essential element, we compute
candidates for essential areas, i.e. sets of essential elements which are covered by one
formal concept.


Remark 3. The input of our algorithm consists of two Boolean data tables and one
relation between them. In general we can have more data tables and rela-
tions. A generalization of our algorithm to such input is possible; due to lack
of space we mention only the idea of this generalization. For input data
tables C1 , C2 , . . . , Cn and relations RCi Ci+1 , i ∈ {1, 2, . . . , n − 1}, we first com-
pute multi-relational factors for Cn−1 and Cn . Then we iteratively compute multi-
relational factors for Cn−2 and Cn−1 , and so on. From these pairs we construct
n-tuple multi-relational factors.


    We do not provide a detailed analysis of the time complexity of the algorithm.
Even our slow implementation in MATLAB is fast enough to factorize reasonably
large datasets in a few minutes.


5     Experimental evaluation

For the experimental evaluation of our algorithm we use the MovieLens1 dataset,
a real dataset well known in the data mining community. This dataset is composed
of two data tables, one representing a set of users and their attributes, e.g. gender,
age, occupation, and one representing a set of movies, again with their attributes,
e.g. the year of production or the genre. The last part of this dataset is a relation
between these data tables. This relation contains 1000209 anonymous ratings of
approximately 3900 movies (3952) made by 6040 MovieLens users who joined
MovieLens in 2000. Each user has at least 20 ratings. Ratings are made on a 5-star
scale (values 1-5, where 1 means that the user does not like a movie and 5 means
that he likes it).
    Originally data tables Users and Movies are categorical. Age is grouped into
7 categories such as “Under 18”, “18-24”, “25-34”, “35-44”, “45-49”, “50-55”
and “56+”. Sex is from set {Male, Female}. Occupation is chosen from the
following choices: “other” or not specified, “academic/educator”, “artist”, “cler-
ical/admin”, “college/grad student”, “customer service”, “doctor/health care”,
“executive/managerial”, “farmer”, “homemaker”, “K-12 student”, “lawyer”, “pro-
grammer”, “retired”, “sales/marketing”, “scientist”, “self-employed”, “techni-
cian/engineer”, “tradesman/craftsman”, “unemployed” and “writer”. Film gen-
res are following: “Action”, “Adventure”, “Animation”, “Children’s”, “Com-
edy”, “Crime”, “Documentary”, “Drama”, “Fantasy”, “Film-Noir”, “Horror”,
“Musical”, “Mystery”, “Romance”, “Sci-Fi”, “Thriller”, “War” and “Western”.
Year of production is from 1919 to 2000. We grouped years into 8 categories
“1919-1930”, “1931-1940”, “1941-1950”, “1951-1960”, “1961-1970”, “1971-1980”,
“1981-1990” and “1991-2000”.
    We convert the ordinal relation into a binary one. We use three different
scalings. The first is that a user rates a movie. The second is that a user does not
like a movie (he rates the movie with 1–2 stars). The last one is that a user likes a
movie (he rates it with 4–5 stars). This does not mean that users like (respectively do not
like) some genre; it means that movies from this genre are or are not worth seeing. We
took the middle-size version of the MovieLens dataset and restricted it
to 3,000 users and the movies that were rated by these users. We take the users who
rate movies the most, obtaining a first data table of dimension 3000×30
and a second data table of dimension 3671×26. Let us note that for
obtaining the object-attribute relation we need to transpose the Movies data table.
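
    The three scalings can be illustrated by the following Python sketch (ours, not the authors' code); it assumes the ratings are available as (user, movie, stars) triples.

import numpy as np

# Sketch of the three scalings described above (our illustration).
# `ratings` is assumed to be a list of (user_index, movie_index, stars)
# triples with stars in 1..5.

def scale(ratings, n_users, n_movies):
    rates = np.zeros((n_users, n_movies), dtype=bool)      # "user rates a movie"
    dislikes = np.zeros((n_users, n_movies), dtype=bool)   # rated with 1-2 stars
    likes = np.zeros((n_users, n_movies), dtype=bool)      # rated with 4-5 stars
    for u, m, s in ratings:
        rates[u, m] = True
        if s <= 2:
            dislikes[u, m] = True
        elif s >= 4:
            likes[u, m] = True
    return rates, dislikes, likes
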
    The relation “user rates a movie” makes sense, because a user rates a movie only if he
has seen it; we can thus read this relation as “user has seen a movie”. We get
29 multi-relational factors that cover almost 100% of the data (99.97%). The coverage
values, i.e. how large a part of the input data is covered, can be seen in figure 2.
1
    http://grouplens.org/datasets/movielens/


The graphs in figure 3 show the coverage of the Users data table and the Movies data table
separately.
    We can also see that 17 factors are sufficient to explain more than 90 percent
of the data. This is a significant reduction of the input data.



                  [Plot omitted: cumulative coverage (0–1) as a function of the number of factors (0–30).]

                                     Fig. 2: Cumulative coverage of input data




        [Plots omitted: coverage (0–1) as a function of the number of factors (0–30) for each data table.]

                (a) Coverage of Users data table        (b) Coverage of Movies data table

                                           Fig. 3: Coverage of input data tables




The most important factors are:
 – Males rate new movies (movies from 1991 to 2000).
 – Young adult users (ages 25-34) rate drama movies.


 – Females rate comedy movies.
 – Youth users (18-24) rate action movies.

Other interesting factors are:

 – Old users (from category 56+) rate movies from their childhood (movies
   from 1941 to 1950).
 – Users in the age range 50-55 rate children’s movies. Users of this age usually
   have grandchildren.
 – K-12 students rate animation movies.

    Due to lack of space, we skip details about the factors for the relations “user does not
like a movie” and “user does like a movie”. For the first relation we get
30 factors that cover 99.99% of the data. For the second one, we get 29 factors
covering 99.96% of the data. Computing all multi-relational factors on these datasets
takes approximately 5 minutes.

Remark 4. In the case of MovieLens we are able to reconstruct the input data tables
almost completely for each of the three relations. An interesting question is what about the rela-
tion, i.e. can we reconstruct the relation between the data tables? The answer is yes.
A multi-relational factor also carries information about the relation between the data
tables, so we can reconstruct it, but with some error. This error is a result of
choosing the narrow approach.
    The reconstruction error of the relation is interesting information and can be mini-
mized if we take this error into account in the phase of computing coverage. In other
words, we want maximal coverage with a minimal relation reconstruction error.
This leads to a more complicated algorithm, because we need weights to compute
the value of a utility function. We also implemented this variant of the algorithm. The re-
quirements of minimal reconstruction error and maximal coverage seem to be
contradictory, but this claim needs a more detailed study. It is also necessary to
determine correct weight settings. We leave this issue for the extended version of
this paper.


6   Conclusion and Future Research

In this paper, we present a new algorithm for multi-relational Boolean matrix fac-
torization that uses essential elements of binary matrices to construct
better multi-relational factors with regard to the relations between the data ta-
bles. We test the algorithm on the MovieLens dataset, well known in data mining.
From these experiments we obtain interesting and easily interpretable results;
moreover, the number of multi-relational factors needed to explain
almost the whole data is reasonably small.
    Future research shall include the following topics: generalization of the al-
gorithm for MBMF to ordinal data, especially data over residuated lattices;
construction of an algorithm which takes into account the reconstruction error of the
relation between the data tables; testing the potential of this method in recommen-
dation systems; and, last but not least, creating a non-crisp operator for connecting
classic factors into multi-relational factors.

Acknowledgment
We acknowledge support by IGA of Palacky University, No. PrF 2014 034.


References
 1. Belohlavek R., Trnecka M.: From-Below Approximations in Boolean Matrix Fac-
    torization: Geometry and New Algorithm. http://arxiv.org/abs/1306.4905, 2013.
 2. Belohlavek R., Vychodil V.: Discovery of optimal factors in binary data via a novel
    method of matrix decomposition. J. Comput. Syst. Sci. 76(1), 3–20, 2010.
 3. Džeroski S.: Multi-Relational Data Mining: An Introduction. ACM SIGKDD Ex-
    plorations Newsletter, 1(5), 1–16, 2003.
 4. Ganter B., Wille R.: Formal Concept Analysis: Mathematical Foundations.
    Springer, Berlin, 1999.
 5. Geerts F., Goethals B., Mielikäinen T.: Tiling databases, Proceedings of Discovery
    Science 2004, pp. 278–289, 2004.
 6. Hacene M. R., Huchard M., Napoli A., Valtechev P.: Relational concept analysis:
    mining concept lattices from multi-relational data. Ann. Math. Artif. Intell. 67(1),
    81–108, 2013.
 7. Krmelova M., Trnecka M.: Boolean Factor Analysis of Multi-Relational Data. In:
    M. Ojeda-Aciego, J. Outrata (Eds.): CLA 2013: Proceedings of the 10th Interna-
    tional Conference on Concept Lattices and Their Applications, pp. 187–198, 2013.
 8. Lucchese C., Orlando S., Perego R.: Mining top-K patterns from binary datasets
    in presence of noise. SIAM DM 2010, pp. 165–176, 2010.
 9. Miettinen P.: On Finding Joint Subspace Boolean Matrix Factorizations. Proc.
    SIAM International Conference on Data Mining (SDM2012), pp. 954-965, 2012.
10. Miettinen P., Mielikäinen T., Gionis A., Das G., Mannila H.: The discrete basis
    problem, IEEE Trans. Knowledge and Data Eng. 20(10), 1348–1362, 2008.
11. Spyropoulou E., De Bie T.: Interesting Multi-relational Patterns. In Proceedings
    of the 2011 IEEE 11th International Conference on Data Mining, ICDM ’11, pp.
    675–684, 2011.
12. Xiang Y., Jin R., Fuhry D., Dragan F. F.: Summarizing transactional databases
    with overlapped hyperrectangles, Data Mining and Knowledge Discovery 23(2),
    215–251, 2011.
      On Concept Lattices as Information Channels

      Francisco J. Valverde-Albacete1? , Carmen Peláez-Moreno2 , and Anselmo
                                       Peñas1
                  1
                  Departamento de Lenguajes y Sistemas Informáticos
  Universidad Nacional de Educación a Distancia, c/ Juan del Rosal, 16. 28040 Madrid,
                          Spain {fva,anselmo}@lsi.uned.es
            2
              Departamento de Teorı́a de la Señal y de las Comunicaciones
               Universidad Carlos III de Madrid, 28911 Leganés, Spain
                                 carmen@tsc.uc3m.es



          Abstract. This paper explores the idea that a concept lattice is an in-
          formation channel between objects and attributes. For this purpose we
          study the behaviour of incidences in L-formal contexts where L is the
          range of an information-theoretic entropy function. Examples of such
          data abound in machine learning and data mining, e.g. confusion matri-
          ces of multi-class classifiers or document-term matrices. We use a well-
          motivated information-theoretic heuristic, the maximization of mutual
          information, that in our conclusions provides a flavour of feature selection,
          giving an information-theoretic explanation of an established practice
          in Data Mining, Natural Language Processing and Information Retrieval
          applications, viz. stop-wording and frequency thresholding. We also in-
          troduce a post-clustering class identification in the presence of confusions
          and a flavour of term selection for a multi-label document classification
          task.


  1     Introduction

  Information Theory (IT) was born as a theory to improve the efficiency of (man-
  made) communication channels [1, 2], but it soon found wider application [3].
  This paper is about using the model of a communication channel in IT to explore
  the formal contexts and concept lattices of Formal Concept Analysis as realisa-
  tions of information channels between objects and attributes. Given the highly
  unspecified nature of both of the latter abstractions, such a model will bring new
  insights into a number of problems, but we are specifically aiming at machine
  learning and data mining applications [4, 5].
      The metaphor of a concept lattice as a communication channel between ob-
  jects and attributes is already implicit in [6, 7]. In there, adjoint sublattices were
  already considered as subchannels in charge of transmitting individual acousti-
  cal features, and some efforts were made to model such features explicitly [7],
  ?
      FJVA and AP are supported by EU FP7 project LiMoSINe (contract 288024) for
      this work. CPM has been supported by the Spanish Government-Comisión Inter-
      ministerial de Ciencia y Tecnologı́a project TEC2011-26807.


c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 119–131,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.


but no conclusive results were achieved. The difficulty arose from a thresholding
parameter ϕ that controls the lattice-inducing technique and was originally fixed
by interactive exploration, a procedure hard to relate to the optimization of a
utility or cost function, as required in modern machine learning.
     In this paper we set this problem against the backdrop of direct mutual infor-
mation maximization—using techniques and insights developed since [6, 7]—for
matrices whose entries are frequency counts. These counts appear frequently
in statistics, data mining and machine learning, for instance, in the form of
document-term matrices in Information Retrieval [8], confusion matrices for clas-
sifiers in perceptual studies, data mining and machine learning [9], or simply
two-mode contingency tables with count entries. Such matrices are called aggre-
gable in [4], in the sense that any group of rows or columns can be aggregated
together to form another matrix whose frequencies are obtained from the data
of the elements in the groups. We will use this feature to easily build count and
probability distributions whose mutual information can be maximized, following
the heuristic motivated above, to improve classification tasks. Note that max-
imizing mutual information (over all possible joint distributions) is intimately
related to the concept of channel capacity as defined by Shannon [2].
     For this purpose, in Sec. 2 we cast the problem of analysing the transfer of
information through the two modes of contingency tables as that of analysing a
particular type of formal context. First we present in Sec. 2.1 the model of the
task to be solved, then we present aggregable data, as usually found in machine
learning applications in Sec. 2.2, and then introduce the entropic encoding to
make it amenable to FCA. As an application, in Sec. 3.1 we explore the particular
problem of supervised clustering as that of transferring the labels from a set of
input patterns to the labels of the output classes. Specifically we address the
problem of assigning labels to mixed clusters given the distribution of the input
labels in them. We end with a discussion and a summary of contributions and
conclusions.


2     Theory
2.1   Classification optimization by mutual information maximization
Consider the following, standard supervised classification setting: we have two
domains X and Y, m instances of i.i.d. samples S = {(x_i, y_i)}_{i=1}^m ⊆ X × Y, and
we want to learn a function h : X → Y, the hypothesis, with certain “good”
qualities, to estimate the class Y from X, the measurements of Y, or features.
   A very productive model to solve this problem is to consider two probability
spaces Y = ⟨Y, P_Y⟩ and X = ⟨X, P_X⟩ with Y ∼ P_Y and X ∼ P_X, and suppose
that there exists the product space ⟨X × Y, P_XY⟩ wherefrom the i.i.d. samples
of S have been obtained. So our problem is solved by estimating the random
variable Ŷ = h(X), and a “good” estimation is one which obtains a low error
probability on every possible pair, P(Ŷ ≠ Y) → 0.
   Since working with probabilities might be difficult, we might prefer to use
a (surrogate) loss function L(ŷ = h(x), y) that quantifies the cost of this difference
and try to minimize the expectation of this loss, called the risk R(h) = E[L(h(x), y)],
over a class of functions h ∈ H, h* = arg min_{h∈H} R(h). This process is called
empirical risk minimization.
    An alternate criterion is to maximize the mutual information between Y and
Ŷ [10]. This is clearly seen from Fano’s inequality [11], serving as a lower bound,
and the Hellman-Raviv upper bound [12],

                (H_PŶ − I_PYŶ − 1) / H_UŶ   ≤   P(Ŷ ≠ Y)   ≤   ½ H_PŶ|Y

where U_Ŷ is the uniform distribution on the support of Ŷ, the H_P terms denote the
different entropies involved and I_PYŶ is the mutual information of the joint
probability distribution.


2.2   Processing aggregable data

If the original rows and columns of contingency tables represent atomic events,
their groupings represent complex events and this structure is compatible with
the underlying sigma algebras that would transform the matrix into a joint
distribution of probabilities, hence these data can be also interpreted as joint
probabilities, when row- and column-normalized.
    When insufficient data is available for counting, the estimation of empirical
probabilities from this kind of data is problematic, and complex probability
estimation schemes have to be used. Even if data galore were available, we still
have to deal with the problem of rarely seen events and their difficult probability
estimation. However, probabilities are, perhaps, the best data that we can plug
into data mining or machine learning techniques, be they for supervised or
unsupervised tasks.


The weighted Pointwise Mutual Information. Recall the formula for
the mutual information between two random variables, I_PXY = E_PXY[I_XY(x, y)],
where I_XY(x, y) = log( P_XY(x, y) / (P_X(x) · P_Y(y)) ) is the pointwise mutual
information (PMI).
    Remember that −∞ ≤ I_XY(x, y) < ∞, with I_XY(x, y) = 0 being the case
where X and Y are independent. Negative values are caused by phenomena
less represented in the joint data than in independent pairs as captured by the
marginals. The extreme value I_XY(x, y) = −∞ arises when the joint
probability is zero even though the marginals are not. These are instances that
capture “negative” association, whence, to maximize the expectation, we might
consider disposing of them.
    On the other hand, on count data the PMI has an unexpected and unwanted
effect: it is very high for hapax legomena, phenomena that are encountered only
once in a tallying, and in general it has a high value for phenomena with low
counts, of whose statistical behaviour we are less certain.


      However, we know that

        I_PXY = Σ_{x,y} P_XY(x, y) · I_XY(x, y) = Σ_{x,y} P_XY(x, y) · log( P_XY(x, y) / (P_X(x) · P_Y(y)) )

and this is always a non-negative quantity, regardless of the individual values of
I_XY(x, y). This suggests calling weighted pointwise mutual information (wPMI)
the quantity

        wPMI(x, y) = P_XY(x, y) · log( P_XY(x, y) / (P_X(x) · P_Y(y)) )                  (1)

and using it as the subject of optimization or exploration. Note that
pairs of phenomena whose joint probability is close to independence, as judged
by the pointwise information, will be given a very low value in the wPMI, and
that the deleterious effect of hapaxes on I_PXY is lessened by the influence of
the joint probability.
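
    As a concrete illustration (our own sketch, not part of the original method), the PMI and the wPMI of Eq. (1) can be computed from a count contingency matrix as follows; the function and variable names are ours.

import numpy as np

# Sketch: PMI and wPMI (Eq. 1) from a count contingency matrix N (our code).
def wpmi(N):
    P = N / N.sum()                        # joint probability estimate
    px = P.sum(axis=1, keepdims=True)      # row marginal P_X(x)
    py = P.sum(axis=0, keepdims=True)      # column marginal P_Y(y)
    with np.errstate(divide='ignore', invalid='ignore'):
        pmi = np.log2(P / (px * py))       # pointwise mutual information
        w = np.where(P > 0, P * pmi, 0.0)  # wPMI; zero-count cells contribute 0
    return pmi, w

# The mutual information is then the sum of the wPMI entries:
# I = wpmi(N)[1].sum()
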

2.3     Visualizing mutual information maximization
For a joint distribution P_YŶ(y, ŷ), [13] introduced a balance equation binding the
mutual information between the two variables, I_PYŶ, the sum of their conditional
entropies, VI_PYŶ = H_PY|Ŷ + H_PŶ|Y, and the sum of the entropic distances between
their distributions and uniformity, ΔH_PYŶ = (H_UY − H_PY) + (H_UŶ − H_PŶ):

              H_UY + H_UŶ = ΔH_PYŶ + 2 · I_PYŶ + VI_PYŶ .

By normalizing by the total entropy H_UY + H_UŶ we obtain the equation of the
2-simplex, which can be represented in a De Finetti diagram like that of Fig. 2.(a),
as the point of the 2-simplex with coordinates

              F(P_YŶ) = [ΔH′_PYŶ, 2 · I′_PYŶ, VI′_PYŶ]

where the primes denote the normalization described above.
    The axes of this representation were chosen so that the height of the 2-simplex
(an equilateral triangle) is proportional to the mutual information between the
variables, so a maximization process is extremely easy to represent (as in Fig. 2):
given a parameter ϕ whereby to maximize I_PYŶ (as a variable), draw the trace of
the ET coordinates of the distributions it generates, and choose the ϕ* that
produces the highest point in the triangle. This technique is used in Sec. 3.1, but
other intuitions can be gained from this representation, as described in [14].
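
    The normalized coordinates F(P) can be computed directly from a joint count matrix; the following Python sketch (ours) uses base-2 logarithms and assumes nothing beyond the definitions above.

import numpy as np

# Sketch (ours): entropy-triangle coordinates F(P) = [ΔH', 2I', VI'] from a
# joint count matrix N over Y x Ŷ, with entropies in bits.
def entropy_triangle(N):
    P = N / N.sum()
    py, pyh = P.sum(axis=1), P.sum(axis=0)
    H = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
    HY, HYh, HYYh = H(py), H(pyh), H(P.ravel())
    I = HY + HYh - HYYh                                        # mutual information
    VI = HYYh - I                                              # sum of conditional entropies
    dH = (np.log2(len(py)) - HY) + (np.log2(len(pyh)) - HYh)   # divergence from uniformity
    total = np.log2(len(py)) + np.log2(len(pyh))               # H_UY + H_UŶ
    return np.array([dH, 2 * I, VI]) / total
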

2.4     Exploring the space of joint distributions
Since the space of count distributions is so vast, we need a technique to explore
it in a principled way. For that purpose we use K-Formal Concept Analysis
(KFCA). This is a technique to explore L-valued contexts, where L is a complete
idempotent semifield, using a free parameter called the threshold of existence [15,
13].
    We proceed in a manner similar to Fuzzy FCA: for an L-context ⟨Y, Ŷ, R⟩, con-
sider two spaces L^Y and L^Ŷ, representing, respectively, L-valued sets of objects
and attributes. Pairs of such sets of objects and attributes that fulfil certain polar
equations have been proven to define dually-ordered lattices of closed L-sets
in the manner of FCA 3.
    Since the actual lattices of object sets and attribute sets are so vast, KFCA
uses a simplified representation for them: for the singleton sets in each of the
spaces, δ_y for y ∈ Y and δ_ŷ for ŷ ∈ Ŷ, we use the L-polars to generate their
object-concepts γ^ϕ_Y(y) and attribute-concepts µ^ϕ_Ŷ(ŷ), respectively, and obtain
a structural ϕ-context K_ϕ = ⟨Y, Ŷ, R_ϕ⟩, where y R_ϕ ŷ ⇐⇒ γ^ϕ_Y(y) ≤ µ^ϕ_Ŷ(ŷ) 4.
    In this particular case we consider the min-plus idempotent semifield and
the L-context ⟨Y, Ŷ, wPMI⟩, where wPMI is the weighted Pointwise Mutual In-
formation relation between atomic events in the sigma lattices of Y and Ŷ of
Sec. 2.2, whence the degree or threshold of existence is a certain amount of
entropy that concepts must surpass in order to be considered.
    The following step amounts to an entropy conformation of the joint distribu-
tion, that is, a redistribution of the probability masses in the joint distribution
to obtain certain entropic properties. Specifically, we use the (binary) ϕ-formal
context to filter out certain counts in the contingency table to obtain a confor-
mal contingency table N^ϕ_YŶ(y, ŷ) = N_YŶ(y, ŷ) ⊙ K_ϕ, where ⊙ represents here the
Hadamard (pointwise) product. For each conformal N^ϕ_YŶ(y, ŷ) we will obtain a
certain point F(ϕ) in the ET to be represented as described in Sec. 2.3.
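
    The exploration loop can be sketched as follows (our illustration; the direction of the threshold comparison is an assumption standing in for the min-plus semifield machinery of KFCA, and the helper functions are the sketches given earlier).

import numpy as np

# Sketch (ours) of the ϕ-exploration: binarize the wPMI matrix at threshold ϕ,
# Hadamard-filter the counts, and score each conformed table with the
# entropy_triangle() sketch above.
def conform_and_score(N, phis):
    _, w = wpmi(N)                       # wPMI matrix of the counts
    best_phi, best_mi, trace = None, -np.inf, []
    for phi in phis:
        K = (w >= phi)                   # binary ϕ-context (threshold of existence)
        N_phi = N * K                    # conformal contingency table (Hadamard product)
        if N_phi.sum() == 0:
            continue
        coords = entropy_triangle(N_phi)
        trace.append((phi, coords))
        if coords[1] > best_mi:          # coords[1] = 2·I', the normalized MI term
            best_mi, best_phi = coords[1], phi
    return best_phi, trace
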

3     Application
We next present two envisaged applications of the technique of MI Maximization.

3.1   Cluster identification
Confusion matrices are special contingency tables whose two modes refer to the
same underlying set of labels[4]. We now put forward a procedure to maximize
the information transmitted from a set of “ground truth” patterns acting as
objects with respect to “perceived patterns” which act as attributes. As noted
in the introduction, this is just one of the possible points of view about this
problem.
    Consider the following scenario: there is a clustering task for which extrinsic
evaluation is possible, that is, there is a gold standard partitioning of the in-
put data. One way to evaluate the clustering solution is to obtain a confusion
3
  Refer to [13] for an in-depth discussion of the mathematics of idempotent semifields
  and the different kinds of Galois connections that they generate.
4
  And a structural ϕ-lattice Bϕ (Kϕ ) as its concept lattice, but this is not important
  in the present application


matrix out of this gold standard, in the following way: If the number of classes
is known—a realistic assumption in the presence of a gold standard—then the
MI optimization procedure can be used to obtain the assignments between the
classes in the gold standard and the clusters of the procedure, resulting in cluster
identification.
    For the purpose of testing the procedure, we used the segmented numeral
data from [16]. This is a task of human visual confusions between numbers as
displayed by seven-segment LED displays, as shown in Fig. 1.(a). The entry in
the count matrix N_CK(c, k) = n_ck counts the number of times that an instance of
class c was confused with class k. Figure 1.(b) shows a heatmap presentation
of the original confusion matrix and column-reshuffled variants. Note that the
confusion matrix is diagonally dominant, that is, n_ii > Σ_{j≠i} n_ij , and likewise
for column i.





Fig. 1: Segmented numeral display (a) from [16] and the column-reshuffled con-
fusion matrix (b) of the human-perception experiment. Cluster identification is
already evident in this human-visualization aid, but the method here presented
is unsupervised.


    To test the MI optimization procedure, we randomly permuted the confu-
sion matrix columns: the objective was to recover the inverse of this random
permutation from the MI optimization process so that the original order could
be restored. This amounts to an assignment between classes and induced clus-
ters, and we claim that it can be done by means of the mutual information
maximization procedure sketched above.
    For that purpose, we estimated P_CK(c, k) using the empirical estimate

                              P̂_CK(c, k) ≈ N_CK(c, k) / n

where n = Σ_{c,k} N_CK(c, k) is the number of instances to be clustered, and
then we obtained its empirical PMI

                    Î_CK(c, k) = log( P̂_CK(c, k) / (P̂_C(c) · P̂_K(k)) )

and its weighted PMI

                    wPMI_CK(c, k) = P̂_CK(c, k) · Î_CK(c, k) .

    Next, we used the procedure of Sec. 2.4 to explore the empirical wPMI and
select the threshold value which maximizes the MI. Figure 2.(a) shows the tra-
jectory of the different conformed confusion matrices on the ET as ϕ ranges in [0, ∞):
we clearly see how, for this balanced task dataset, the exploration results
in a monotonic increase in MI over the thresholding range until a value that pro-
duces the maximum MI, at wPMI* = 0.1366. The discrete set of points stems
from the limited range of counts in the data.
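
    A compact sketch of the whole identification step (ours; the function names reuse the earlier sketches and are not the authors' code) would be:

import numpy as np

# Sketch (ours): recover the class-to-cluster assignment from the confusion
# counts N, reusing the wpmi() and conform_and_score() sketches above.
def identify_clusters(N, n_steps=50):
    _, w = wpmi(N)
    phis = np.linspace(w.min(), w.max(), n_steps)
    phi_star, _ = conform_and_score(N, phis)     # ϕ* maximizing the conformed MI
    K = (w >= phi_star).astype(int)              # binary assignment matrix (Fig. 2.(b))
    # if K is a permutation matrix, the column argmax gives the cluster-to-class map
    return K, K.argmax(axis=0)
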
    We chose this value as threshold and obtained the binary matrix which is
the assignment from classes to clusters and vice-versa shown in Fig. 2.(b). Note
that in this particular instance, the ϕ∗ -concept lattice is just a diamond lat-
tice reflecting the perfect identification of classes and clusters. In general, with
contingency tables where modes have different cardinalities, this will not be the
case.

3.2    Entropy conformation for count matrices
The case where the contingency matrix is square and diagonally dominant, as
in the previous example, is too specific: we need to show that for a generic, rect-
angular count contingency matrix, entropy maximization is feasible and mean-
ingful.
     The first investigation should be on how to carry out the maximization pro-
cess. For that purpose, we use a modified version of the Reuters-21578 collection 5 that
has already been stop-listed and stemmed. This is a multi-label classification
dataset [17] describing each document as a bag-of-terms and some categoriza-
tion labels, the latter unused in our present discussion.
     We considered the document-term matrix for training, a count distribution
with D = 7 770 documents and T = 5 180 terms. Its non-conformed entropy co-
ordinates are F(N_DT) = [0.1070, 0.3584, 0.5346], as shown by the deep blue circle
to the left of Fig. 3. We carried out a joint mutual information maximization
process by exploring at the same time a max-plus threshold (the count has to
be bigger than the threshold to be considered) and a min-plus threshold (the
count has to be less than the threshold). The rationale for this is a well-tested hy-
pothesis in the bag-of-terms model: very common terms (high frequency) do not
select well for documents, while very scarce terms (low frequency) are too spe-
cific and biased to denote the general “aboutness” of a document. Both should
be filtered out of the document-term matrix.
5
    http://www.daviddlewis.com/resources/testcollections/reuters21578/
    readme.txt. Visited 24/06/2014.




                 (a) [plot omitted: trajectory of the conformed matrices on the entropy triangle]

                              | 0 0 0 0 1 0 0 0 0 0 |
                              | 1 0 0 0 0 0 0 0 0 0 |
                              | 0 0 0 0 0 0 0 0 1 0 |
                              | 0 0 0 0 0 0 0 0 0 1 |
                              | 0 0 0 0 0 1 0 0 0 0 |
                       Kϕ∗ =  | 0 0 0 0 0 0 0 1 0 0 |
                              | 0 0 0 0 0 0 1 0 0 0 |
                              | 0 0 1 0 0 0 0 0 0 0 |
                              | 0 0 0 1 0 0 0 0 0 0 |
                              | 0 1 0 0 0 0 0 0 0 0 |
                                       (b)

Fig. 2: Trajectory of the evolution of MI transmission for the segmented numeral
data as the exploration threshold is raised in the wPMI matrix (a), and maximal
MI cluster assignment matrix at wPMI = 1.366 bits (b) for the column-shuffled
Segmented Numerals. The resulting concept lattice is just a diamond lattice
identifying classes and clusters and is not shown.


    Instead of count-based individual term filtering, we carry out a joint term-document
pair selection process: for a document-term matrix, we calculate its overall weighted
PMI matrix, and only those pairs (d, t) whose wPMI lies between a lower threshold φ and
an upper threshold ϕ are considered important for later processing. For each
such pair, we created an indicator matrix I(d, t) that is 1 iff φ ≤ wPMI(d, t) ≤ ϕ,
and we used the elementwise (Hadamard) multiplication to filter out non-conforming
pairs from the final entropy calculation,

                       M̂I′_PDT = Σ_{d,t} wPMI_DT(d, t) · I(d, t)

    Figure 3 represents the trace of that process as we explore a grid of 10 × 10
different values of φ and ϕ (the same set of values for both). The grid was
obtained by equal-width binning of the whole range of wPMI_DT(d, t) in the
original wPMI matrix, as defined in [18].
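
    The grid exploration itself can be sketched as follows (our illustration; the equal-width binning and the wpmi helper from the earlier sketch are assumptions of presentation, not the authors' code).

import numpy as np

# Sketch (ours): grid search over a lower/upper wPMI threshold pair (φ, ϕ)
# for a document-term count matrix N, keeping the filtered MI of each pair.
def grid_conformation(N, n_bins=10):
    _, w = wpmi(N)
    edges = np.linspace(w.min(), w.max(), n_bins)   # equal-width binning of wPMI
    results = []
    for lo in edges:
        for hi in edges:
            if lo >= hi:
                continue
            I = (w >= lo) & (w <= hi)               # indicator of retained (d, t) pairs
            mi = (w * I).sum()                      # filtered mutual information
            results.append((lo, hi, mi))
    return max(results, key=lambda r: r[2]), results
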




Fig. 3: Trace of the entropy conformation process for a count matrix. The blue
dot to the left is the original level of entropy. For a wide range of pairs (φ, ϕ)
the entropy of the conformed count matrix is greater than the original one, and
we can actually find a value where it is maximized.


    We can see how M̂I′_PDT reaches a maximum over the two values and then de-
creases again, going even below the original mutual information value. We read
two different facts in this illustration: that the grid used is effective in obtaining
approximations to φ and ϕ for MI maximization, and that not every possible
pair of values is a good solution for the process.
    All in all, this procedure shows that MI maximization is feasible by tracking
it in the ET. We do not present any results in this paper as to the effectiveness
of the process for further processing tasks, which should be evaluated with the
extrinsic measures of the Reuters multi-labelling task.


4     Discussion

We now discuss the applications selected in a wider context. Although less per-
vasive than its unsupervised version, the basic task of supervised clustering has
application, for instance, in tree-induction for supervised classification [5, 18] or
unsupervised clustering evaluation using a gold-set [19]. Cluster identification
in Sec. 3.1 is a sometimes-fussy sub-procedure in clustering which our proposal
solves elegantly.
    The feasibility study on mutual information conformation of Sec. 3.2 is a
necessary step for further processing (binary or multi-label classification), but
it remains unevaluated as of this paper. Further work should concentrate on leveraging
the boost in mutual information to lower the classification error, as suggested in
the theoretical sections.
    Besides, the use of two simultaneous thresholds on different algebras makes
it difficult to justify the procedure in FCA terms: this does not conform to the
definition of any lattice-inducing polars that we know of, so this feature should
be looked into critically. Despite this fact, the procedure of conformation “makes
sense”, at least for this textual classification task.
    Note that the concept of “information channel” that we have developed in
this paper is not what Communication Theory usually considers. In there, “input
symbols” enter the channel and come out as “output symbols”, hence input has
a sort of ontological primacy over output symbols in that the former cause the
latter. If there is anything particular about FCA as an epistemological theory is
that it does not prejudge the ontological primacy of objects over attributes or vice
versa. Perhaps the better notion is that a formal concept is an information co-
channel between objects and attributes, in the sense that the information “flows”
both from objects to attributes and vice versa, as per the true symmetric nature
of mutual information: receiving information about one of the modes decreases
the uncertainty of the other.
    The previous paragraph notwithstanding, we will often find ourselves in ap-
plication scenarios in which one of the modes will be primary with respect to
the other, in which case the analogies with communication models will be more
evident. This is one of the cases that we explore in this paper, and the one first
pointed at in [6, 7].
    Contingency tables are an instance of aggregable data tables [4, §0.3.4]. It
seems clear that not just counts, but any non-negative entry aggregable table can
be treated with the tools here presented, e.g. concentrations of solutes. In that
case, the neat interpretation related to MI maximization will not be available,
but analogous ones can be found.
    A tangential approach to the definition of entropies in (non-Boolean) lattices
has been taken by [20, 21, 22, 23, 24]. These works approach the definition of
measures, and in particular entropy measures, in general lattices instead of finite
sigma algebras (that is, Boolean lattices). [22] and [24] specifically address the
issue of defining them in concept lattices, but the rest provide other heuristic
foundations for the definition of such measures which surely must do without
some of the more familiar properties of the Shannon (probability-based) entropy.


5    Conclusions and further work

We have presented an incipient model of L-formal contexts of aggregable data
and their related concept lattices as information channels. Using KFCA as the
exploration technique and the Entropy Triangle as the representation and vi-
sualization technique we can follow the maximization procedure on confusion
matrices in general, and in confusion matrices for cluster identification in par-
ticular.
    We present both the basic theory and two proof-of-concept applications in
this respect: a first one, cluster identification, fully interpretable in the framework
of concept lattices, and another, entropy conformation for rectangular matrices,
less readily embeddable in this framework.
    Future applications will extend the analysis of count contingency tables, like
document-term matrices, where our entropy-conformation can be likened to fea-
ture selection techniques.
                               Bibliography


 [1] Shannon, C.E.: A mathematical theory of Communication. The Bell System
     Technical Journal XXVII (1948) 379–423
 [2] Shannon, C., Weaver, W.: A mathematical model of communication. The
     University of Illinois Press (1949)
 [3] Brillouin, L.: Science and Information Theory. Second Edition. Courier
     Dover Publications (1962)
 [4] Mirkin, B.: Mathematical Classification and Clustering. Volume 11 of Non-
     convex Optimization and Its Applications. Kluwer Academic Publishers
     (1996)
 [5] Mirkin, B.: Core Concepts in Data Analysis: Summarization, Correlation
     and Visualization. Summarization, Correlation and Visualization. Springer,
     London (2011)
 [6] Peláez-Moreno, C., Garcı́a-Moral, A.I., Valverde-Albacete, F.J.: Analyzing
     phonetic confusions using Formal Concept Analysis. Journal of the Acous-
     tical Society of America 128 (2010) 1377–1390


 [7] Peláez-Moreno, C., Valverde-Albacete, F.J.: Detecting features from con-
     fusion matrices using generalized formal concept analysis. In Corchado, E.,
     Grana-Romay, M., Savio, A.M., eds.: Hybrid Artificial Intelligence Systems.
     5th International Conference, HAIS 2010, San Sebastián, Spain, June 23-25,
     2010. Proceedings, Part II. Volume 6077 of LNAI., Springer (2010) 375–382
 [8] Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information
     Retrieval. Cambridge University Press (2008)
 [9] Japkowicz, N., Shah, M.: Evaluating Learning Algorithms: A Classification
     Perspective. Cambridge University Press (2011)
[10] Frénay, B., Doquire, G., Verleysen, M.: Theoretical and empirical study
     on the potential inadequacy of mutual information for feature selection in
     classification. NEUROCOMPUTING 112 (2013) 64–78
[11] Fano, R.M.: Transmission of Information: A Statistical Theory of Commu-
     nication. The MIT Press (1961)
[12] Feder, M., Merhav, N.: Relations between entropy and error probability.
     IEEE Transactions on Information Theory 40 (1994) 259–266
[13] Valverde-Albacete, F.J., Peláez-Moreno, C.: Two information-theoretic
     tools to assess the performance of multi-class classifiers. Pattern Recog-
     nition Letters 31 (2010) 1665–1671
[14] Valverde-Albacete, F.J., Peláez-Moreno, C.: 100% classification accuracy
     considered harmful: the normalized information transfer factor explains the
     accuracy paradox. PLOS ONE (2014)
[15] Valverde-Albacete, F.J., Peláez-Moreno, C.: Galois connections between
     semimodules and applications in data mining. In Kusnetzov, S., Schmidt,
     S., eds.: Formal Concept Analysis. Proceedings of the 5th International
     Conference on Formal Concept Analysis, ICFCA 2007, Clermont-Ferrand,
     France. Volume 4390 of LNAI., Springer (2007) 181–196
[16] Keren, G., Baggen, S.: Recognition models of alphanumeric characters.
     PERCEPT PSYCHOPHYS 29 (1981) 234–246
[17] Tsoumakas, G., Katakis, I.: Multi-label classification: An overview. Inter-
     national Journal of Data Warehousing and . . . (2007)
[18] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten,
     I.H.: The WEKA data mining software: An update. SIGKDD Explorations
     11 (2009)
[19] Meila, M.: Comparing clusterings—an information based distance. Journal
     of Multivariate Analysis 28 (2007) 875–893
[20] Knuth, K.: Valuations on Lattices and their Application to Information
     Theory. Fuzzy Systems, IEEE International Conference on (2006) 217–224
[21] Grabisch, M.: Belief functions on lattices. International Journal Of Intelli-
     gent Systems 24 (2009) 76–95
[22] Kwuida, L., Schmidt, S.E.: Valuations and closure operators on finite lat-
     tices. Discrete Applied Mathematics 159 (2011) 990–1001
[23] Simovici, D.: Entropies on Bounded Lattices. Multiple-Valued Logic (IS-
     MVL), 2011 41st IEEE International Symposium on (2011) 307–312
[24] Simovici, D.A., Fomenky, P., Kunz, W.: Polarities, axiallities and mar-
     ketability of items. In: Proceedings of Data Warehousing and Knowledge
     Discovery - DaWaK. Volume 7448 of LNCS. Springer (2012) 243–252
          Using Closed Itemsets for Implicit User
             Authentication in Web Browsing
                     1         1,2               2              2                       2
       O. Coupelon , D. Dia          , F. Labernia , Y. Loiseau , and O. Raynaud

                 1
                   Almerys, 46 rue du Ressort, 63967 Clermont-Ferrand
                      {olivier.coupelon,diye.dia}@almerys.com
            2
              Blaise Pascal University, 24 Avenue des Landais, 63170 Aubière
              {loiseau,raynaud}@isima.fr, fabien.labernia@gmail.com



         Abstract.     Faced with both identity theft and the theft of means of
         authentication, users of digital services are starting to look rather suspi-
         ciously at online systems. The behavior is made up of a series of observ-
         able actions of an Internet user and, taken as a whole, the most frequent
          of these actions amount to habit. Habit and reputation offer ways of
         recognizing the user. The introduction of an implicit means of authenti-
         cation based upon the user's behavior allows web sites and businesses to
         rationalize the risks they take when authorizing access to critical func-
         tionalities. In this paper, we propose a new model for implicit authen-
         tication of web users based on extraction of closed patterns. On a data
         set of web navigation connection logs of 3,000 users over a six-month
          period, we follow the experimental protocol described in [1] to compute the
          performance of our model.


  1    Introduction
  In order to achieve productivity gains, companies are encouraging their cus-
  tomers to access their services via the Internet. It is accepted that on-line ser-
  vices are more immediate and more user-friendly than accessing these services
  via a brick and mortar agency, which involves going there and, more often than
  not, waiting around [2]. Nevertheless, access to these services does pose secu-
  rity problems. Certain services provide access to sensitive data such as banking
  data, for which it is absolutely essential to authenticate the users concerned.
  However identity thefts are becoming more and more numerous [3]. We can dis-
  tinguish two paradigms for increasing access security. The first one consists of
  making access protocols stronger by relying, for example, on external devices for
  transmitting access codes that are supplementary to the login/password pair.
  Nevertheless, these processes are detrimental to the user-friendliness and usabil-
  ity of the services. The number of transactions abandoned before reaching the
  end of the process is increasing and exchange volumes are decreasing. The sec-
  ond paradigm consists, to the contrary, of simplifying the identification processes
  in order to increase the exchange volumes. By way of examples, we can mention
  single-click payment [2] [4] or using RFID chips for contactless payments. Where
  these two paradigms meet is where we find implicit means of authentication.



c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 131–143,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.

A means of authentication is a process that makes it possible to ensure that the
identity declared in the event of access is indeed the user's identity. Traditionally,
a user authenticates himself or herself by providing proof of identity [5]. This
process is called explicit authentication. In contrast, implicit authentication
does not require anything from the user but instead studies his or her behavior,
the trail left by the user's actions, and then either does or does not validate the
declared identity. An implicit means of authentication cannot replace traditional
means of authentication as it is necessary for the user to have access to his or
her service so that the person's behavior may be studied and their identity
can either be validated or rejected. To the contrary, if it is effective, it would
enable stronger authentication modes to be avoided (such as chip cards and PIN
numbers), which are detrimental to the usability of services. The challenge is to
detect identity theft as quickly as possible and, to the contrary, to validate a
legitimate identity for as long a time as possible.
    This contribution is organized as follows: in section 2 we shall offer a state-of-
the-art about implicit authentication and user profiles in web browsing. Then
we propose the learning model for implicit authentication of web users that we are
dealing with in section 3. In section 4, we compare several methods for building the
profiles of each user. We faithfully reproduce the experimental study conducted
in [1] and we analyze all of our results. Finally, in section 5, we shall summarize our
results and discuss our future work.



2      Related works
In his survey of implicit authentication for mobile devices ([6]), the author says of
an authentication system that it is   implicit if the system does not make demands
of the user (see Table 1).
    Implicit authentication systems were studied early on for mobile phones.
In [7] and [8], the authors studied behaviour based on variables specific to smart-
phones such as calls, SMS's, browsing between applications, location, and the
time of day. Experiments were conducted based on the data for 50 users over
a period of 12 days. The data were gathered using an application installed by
users who were volunteers. The users' profiles were built up from how frequently
positive or negative events occurred and the location. Within this context, a
positive event is an event consistent with the information gathered upstream.
By way of an example, calling a number which is in the phone's directory is a
positive event. The results of this study show that based on ten or so actions,
you can detect fraudulent use of a smartphone with an accuracy of 95%. In a
quite different context, the authors of [9] relied on a Bayesian classification in
order to associate a behaviour class with each video streaming user. The data
set is simulated and consists of 1,000 users over 100 days. The variables taken
into account are the quality of the flow, the type of program, the duration of the
session, the type of user, and the popularity of the video. The results are mixed,
because the proposed model only achieves an accuracy rate of 50%.

Feature      Capturing Method        Implicit/Explicit     Spoofing Threats       Problems
Passcode     Keyboard input          Explicit              Keyloggers,            Guessable
                                                           shoulder surfing       passwords
Token        Hardware device         Mainly explicit,      None                   Easily stolen
                                     implicit possible                            or lost
Face & Iris  Camera                  Both                  Picture of the         Lighting situation
                                                           legitimate user        and make-up
Keystroke    Keyboard                Implicit, explicit    Typing imitation       Long training
                                     possible              (difficult)            phase, reliability
Location     GPS, infrastructure     Implicit              Informed strangers     Traveling,
                                                                                  precision
Network      Software protocol       Implicit              Informed strangers     Precision
             (e.g. WireShark)

             Table 1. Comparison of different authentication methods




The particular context of implicit authentication for web browsing was studied
in [1], [10], [11] and [12]. In [1], the author adopted the domain name, the num-
ber of pages viewed, the session start time, and its duration, as characteristic
variables. The data set, which was gathered by a service provider, consisted of
300 first connections by 2,798 users over a period of 12 months. The user profiles
consisted of patterns with a size of 1. The author compares several pattern selec-
tion approaches like the support and the lift approaches. The study shows that
for small, anonymous behavioural patterns (involving up to twenty or so sites
visited), the most effective models are still traditional classification models like
decision trees. On the other hand, whenever anonymous behaviour exceeds 70 or
so sites, the support and lift-based classification models are more accurate. The
study conducted in [12] states that the size of the data set remains a determining
parameter. Their study, conducted on 10 users over a one-month period, did not
enable them to build a significant model for distinguishing users. The authors
also concluded that no variable taken individually enables a user to be authen-
ticated. Drawing inspiration from a study conducted in [1], the authors of [13]
studied several techniques for spying on a user who holds a dynamic IP address,
based on behavioural models. The methods compared are seeking motives, the
nearest neighbours technique, and the multinomial Bayesian classifier. The data
set consisted of DNS requests from 3,600 users over a two-month period. In this
study, only the most significant variables and the most popular host names were
considered. The accuracy rates for the models proposed were satisfactory.



The study that we conduct in this paper also forms part of a continuation of the
work by [1]. We faithfully reproduce his experimental protocol on our data and
we compare the performance of our classification algorithm to his specific models.

3      Models
We propose an intuitive learning model architecture for user authentication.
From a data set of web browsing logs we compute a set of own patterns for
each user. A pattern is a set of frequently visited sites; the size of a pattern
may vary. Thanks to these profiles we are able to provide an authentication
for anonymous sessions. We then compute confusion matrices and we report
the precisions of the models. In our present study, we compare the performance of a naive
Bayes classifier to variations on k-nearest neighbors algorithms. More precisely,
the studied parameters are the selection process of user own patterns, the computation
process of user profiles and the distance functions computed in the classification stage.
Figure 1 outlines the framework of the machine learning process.




    [Architecture diagram: Past Behaviour feeds the Learning Algorithms, which build the
    User Profile; the User Profile and the Anonymous Behaviour feed the Score Computation,
    which outputs the User Authentication.]

                                    Fig. 1. Architecture




3.1 Formal framework
We call a session a set of web sites visited at a specific time by a given user u_i,
such that i ∈ [1, n], where n is the number of users. The size of a session is limited
and equal to 10. The learning database of each user u_i takes the form of a set of
sessions denoted S_ui and is built from log data 3. We call S = ∪_i S_ui the whole
set of sessions of the database.
We call W_ui the whole set of web sites visited at least once by user u_i and we
call W = ∪_i W_ui the whole set of visited sites. The order of the visited web sites is
not taken into account by this model.

Definition 1 (k-pattern). Let W be a set of visited web sites and S be a set
of sessions on W. A subset P of W is called a k-pattern, where k is the size
of P. A session S in S is said to contain a k-pattern P if P ⊆ S.

Definition 2 (Support and relative support (lift)). We define the support
of a pattern P as the percentage of sessions in S containing P (by extension we
give the support of a pattern in the set of sessions of a given user u_i):

    support_S(P) = ||{S ∈ S | P ⊆ S}|| / ||S||        support_Sui(P) = ||{S ∈ S_ui | P ⊆ S}|| / ||S_ui||
3
    Cf. section 4.1



For a given user, the relative strength of a pattern is equivalent to the lift in
a context of association rules (i.e. the support of the pattern within this user
divided by the support of the pattern across all users). More formally:

                    lift_Sui,S(P) = support_Sui(P) / support_S(P)

The support measures the strength of a pattern in the behavioural description of a
given user. The relative support mitigates the support measure by considering the
pattern's support on the whole set of sessions: the stronger the global support of
a pattern, the less characteristic it is of a specific user.
The tf-idf is a numerical statistic that is intended to reflect how relevant a word
is to a document in a corpus. The tf-idf value increases proportionally to the
number of times a word appears in the document, but is offset by the frequency
of the word in the whole corpus ([14]). In our context, a word becomes a pattern,
a document becomes the set of sessions S_ui of a given user and the corpus becomes
the whole set S of all sessions.


Definition 3 (tf × idf). Let P be a pattern, let U be a set of users and U_P ⊆ U
such that ∀u_i ∈ U_P, support_Sui(P) ≠ 0. Let S_ui be the set of sessions of a given
user u_i and S the whole set of sessions. The normalized term frequency denoted
tf(P) is equal to support_Sui(P) and the inverse document frequency denoted
idf(P) is equal to log(||U||/||U_P||). We have:

                    tf × idf(P) = support_Sui(P) × log(||U||/||U_P||)
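
    For illustration only (our sketch, not the authors' implementation), Definitions 2 and 3 translate directly into code when sessions are stored as sets of visited sites:

import math

# Sketch (ours) of Definitions 2 and 3: support, lift and tf-idf of a pattern,
# with sessions stored as {user_id: [set_of_sites, ...]}.
def support(pattern, sessions):
    return sum(pattern <= s for s in sessions) / len(sessions)

def lift(pattern, user_sessions, all_sessions):
    return support(pattern, user_sessions) / support(pattern, all_sessions)

def tf_idf(pattern, user_id, sessions_by_user):
    tf = support(pattern, sessions_by_user[user_id])
    containing_users = [u for u, ss in sessions_by_user.items()
                        if support(pattern, ss) > 0]
    idf = math.log(len(sessions_by_user) / len(containing_users))
    return tf * idf
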

Definition 4 (Closure system). Let S be a collection of sessions on the set
W of web sites. We denote by S^c the closure under intersection of S. By adding W
to S^c, S^c becomes a closure system.

Definition 5 (Closure operator). Let W be a set; a map C : 2^W → 2^W is
a closure operator on W if for all sets A and B in W we have: A ⊆ C(A),
A ⊆ B ⟹ C(A) ⊆ C(B) and C(C(A)) = C(A).

Theorem 1. Let S^c be a closure system on W. Then the map CS^c defined on
2^W by ∀A ∈ 2^W, CS^c(A) = ⋂{S ∈ S^c | A ⊆ S} is a closure operator on W.^4


Definition 6 (Closed pattern^5). Let S^c be a closure system on W and CS^c
its corresponding closure operator. Let P be a pattern (i.e. a set of visited sites);
we say that P is a closed pattern if CS^c(P) = P.
4 Refer to the book of [15].
5 This definition is equivalent to a concept of the formal context K = (S, W, I) where
  S is a set of objects, W a set of attributes and I a binary relation between S and
  W [16].
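
Under the same set-based representation, the closure operator of Theorem 1 and the closed-pattern test of Definition 6 can be sketched as follows (the intersection of all sessions containing P; by convention the whole site set W is returned when no session contains P).

def closure(pattern, sessions, all_sites):
    # C(P): intersection of all sessions that contain the pattern
    containing = [s for s in sessions if pattern <= s]
    if not containing:
        return set(all_sites)      # empty intersection defaults to W
    result = set(all_sites)
    for s in containing:
        result &= s
    return result

def is_closed(pattern, sessions, all_sites):
    return closure(pattern, sessions, all_sites) == set(pattern)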

3.2 Own patterns selection
The first and most important step of our model, called own patterns selection,
is to compute the set of own patterns for each user ui. This set of patterns is
denoted Pui = {Pi,1, Pi,2, ..., Pi,p}. In [1], the author states that p = 10 should be
a reference value and that beyond this value the model performance is stable. We
shall follow that recommendation. In [1], 10 frequent 1-patterns are selected
for each user. The aim of our study is to show that it can be more efficient
to select closed k-patterns. However, the number of closed patterns can be
large, so we compare three heuristics (H1, H2 and H3) to select the 10 closed
patterns of each user. For each heuristic, closed patterns are computed with
the Charm algorithm ([17]) provided by the Coron platform ([18]). Only closed
patterns with a size lower than or equal to 7 are considered. These heuristics are
presented here:


 1. 10 1 − patterns with the largest      support values (as in [1])
 2. H1 : 40 closed k − patterns with the largest tf-idf values.
 3. H2 : 10 ltered closed k − patterns with the largest support and maximal
      values by inclusion set operator.
 4.   H3 : 10 ltered closed k − patterns with the largest tf-idf and minimal values
      by inclusion set operator.


Algorithm 1 describes the process of H1 to select the 40 own patterns for a given
user. With H1 , the model performance is improved when p increases up to 40.
p = 10 is the better choice for H2 and H3 . The best results are from H1 .

 Algorithm 1: H1: 40 closed k-patterns with the largest tf-idf values.
  Data: Cui: the set of closed itemsets of user ui from Charm;
        p: the number of selected own patterns;
  Result: Pui: the set of own patterns of user ui;
 1 begin
 2    Compute the tf × idf for each pattern from Charm;
 3    Sort the list of patterns in descending order according to the tf × idf value;
 4    Return the top p patterns;
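
A possible sketch of this selection step, assuming the closed patterns of a user have already been mined (e.g. with Charm) and that score is a scoring function such as the tf × idf sketched above:

def select_own_patterns(closed_patterns, score, p=40):
    # H1: keep the p closed patterns with the largest scores (here: tf-idf)
    ranked = sorted(closed_patterns, key=score, reverse=True)
    return ranked[:p]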




3.3 User profiles computation
We define and denote Pall = ⋃i Pui the whole set of own patterns. The set
Pall allows us to define a common space in which all users can be embedded.
More formally, Pall defines a vector space V of size all = ||Pall|| where a given
user ui is represented as a vector Vui = (mi,1, mi,2, ..., mi,all).
The second step of our model, called user profile computation, is to compute, for
each user ui, a numerical value for each component mi,j of the vector Vui. Here i is
the user id, j ∈ [1, all] is a pattern id and m stands for a given measure. In this
paper, we compare two measures proposed in [1]: the support and the lift.



            mi,j = supportSui(Pj)       and       mi,j = liftSui,S(Pj)
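
A sketch of this step, with measure standing for either the support or the lift computation applied to a pattern and a session set (names are illustrative):

def profile_vector(user_sessions, all_patterns, measure):
    # One component per own pattern of P_all, in a fixed order
    return [measure(p, user_sessions) for p in all_patterns]

def all_profiles(users, all_patterns, measure):
    # users: dict mapping user id -> list of sessions (sets of sites)
    return {uid: profile_vector(sessions, all_patterns, measure)
            for uid, sessions in users.items()}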


3.4 Authentication stage
In our model, the authentication step is based on identification. For that
purpose, our model guesses the user corresponding to an anonymous set of sessions,
then it checks whether the guessed identity corresponds to the real identity. From
this set of sessions we have to build a test profile and to find the nearest user
profile defined during the learning step.



Test sessions  The performance of our models is evaluated on anonymous data
sets of growing size: the more information available, the better the classification
will be. The first data set consists of only one session, the second consists of 10
sessions, the third one consists of 20 sessions, and the last one consists of 30
sessions. For the test phase, all sessions have the same size of 10 sites.



Building test profile  Let S be the whole set of sessions from the learning
data set. Let Sut be an anonymous set of sessions and Vut = (mt,1, mt,2, ..., mt,all)
its corresponding profile vector. We compare two approaches to build the
anonymous test profile, the support and the lift:

    ∀i, mt,i = supportSut(Pi)       and       ∀i, mt,i = liftSut,S(Pi) = supportSut(Pi) / supportS(Pi)

Distance functions  Let Vui = (mi,1, mi,2, ..., mi,all) and Vut = (mt,1, mt,2, ..., mt,all)
be two profiles. We denote by DisEuclidean(Vui, Vut) the Euclidean distance and
by SimCosine(Vui, Vut) the cosine similarity function. We have:

    DisEuclidean(Vui, Vut) = sqrt( Σj (mt,j − mi,j)² )

    SimCosine(Vui, Vut) = Σj (mt,j × mi,j) / sqrt( Σj (mt,j)² × Σj (mi,j)² )
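
Both functions can be written directly from their definitions; a plain Python sketch (no external libraries assumed) is:

import math

def euclidean_distance(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def cosine_similarity(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0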



4   Experimental results
4.1 Data set
Our data set is comprised of the web navigation connection logs of 3,000 users
over a six-month period. We have at our disposal the domain names visited and
each user ID. From the day and time of connection we have constructed
connection sessions for each user. A session is therefore a set of web

sites visited. The number of visited web sites per session is limited to 10. For the
relevance of our study we used Adblock^6 filters to remove all domains regarded as
advertising. The majority of users in this data set are not sufficiently active to be
relevant. Therefore, as in [1], we have limited our study to the 2% most active users
and obtained significant session sets for 52 users. The 30 most active users (those
with the largest number of sessions) among these 52 users are used in this paper.
Table 2 gives the detailed statistics for this data set.



        7698 sessions Minimum Maximum Mean Standard deviation
            Size         10      10     10         0
       #sessions/users  101     733    257        289

Table 2. Descriptive statistics of the used data set: size of sessions (number of visited
web sites) and number of sessions per user, for 30 users.




4.2 Experimental protocol: a description
Algorithm 2 (see appendix) describes our experimental protocol. The first loop
sets the size of the set of users among which a group of anonymous sessions will
be classified. The second one sets the size of this session group. Finally, the third
loop sets the number of iterations used to compute the average accuracy rate.
The loop on line 10 computes the specific patterns of each user and establishes
the profile vector. The loop on line 13 computes the vector's components for
each user. The nested loops on lines 16 and 18 classify the test data and compute
the accuracy rate.



4.3 Comparative performance of H1, H2 and H3
From the own patterns of each user we compute the set Pall as the whole set of
own patterns, which defines the profile vector of each user. We use the support
of a pattern as the numerical value of each component (cf. Section 3.3). Table 3
provides the size of the profile vector and the distribution of own patterns
according to their size, for each heuristic. With 30 users and 10 own patterns per user,
the maximal size of the profile is 300.




       Number of own patterns  |1|  |2|  |3|  |4|  |5|  |6|  |7|
  H1            199            18%  31%  26%  16%   7%   2%   0%
  H2            167            57%  29%   9%   3%   1%   1%   0%
  H3            199            24%  20%  18%  14%  10%   9%   5%

 Table 3. Profile vector size and the distribution of own patterns according to size.




[Figure 2: line chart of accuracy (%) versus number of test sessions for Bayes, Charm H1, Charm H2 and Charm H3.]

Fig. 2. Comparative performance of H1, H2 and H3. These observations are plotted
on an X-Y graph with the number of sessions of the anonymous set on the X-axis and
the accuracy rate on the Y-axis. Measured values are smoothed over 50 executions.


     Figure 2 shows that the naive Bayes classifier is the most effective when the group
of test sessions contains from 1 to 13 sessions (10 to 130 visited web sites). This result
is in line with the study in [1]. Finally, this graph clearly shows that heuristic
H1 stands out from H2 and H3. So, the best heuristic is to choose own patterns
amongst closed patterns with the largest tf × idf values. As a consequence, the
majority of patterns are small-sized patterns (two or three sites) (cf. Table 3),
but the accuracy rates are much higher.



4.4 Comparative performance with [1]
In [1], the author compares, in particular, two methods of profile vector calculus.
In both cases, the own patterns are of size 1 and are chosen amongst the most
frequent. The first method, named support-based profiling, uses the corresponding
pattern support as the numerical value of each component of the profile vector.
The second method, called lift-based profiling, uses the lift measure. In order to
compare the performance of the H1 model with the two models support-based
profiling and lift-based profiling, we have accurately replicated the experimental
protocol described in [1] on our own data set. The results are given in Table 4.
     The data of Table 4 highlight that the H1 heuristic achieves rates that are
perceptibly better than those of the two models proposed in [1] in all possible
scenarios. Nevertheless, the Bayes classifier remains the most efficient when the
session group is of size 1, in compliance with [1]. Figure 3 allows a clearer
understanding of the moment when the Bayes curve crosses the H1 heuristic curve.

6 http://adblock-listefr.com/

 # of users   Model        1    10    20    30
  2           Support     65    89    95    97
              Lift        67    90    97    98
              Charm H1    72    98    99   100
              Bayes       85    99    73    61
  5           Support     40    74    83    88
              Lift        41    78    86    88
              Charm H1    49    90    95    98
              Bayes       67    96    56    34
 10           Support     27    66    79    80
              Lift        29    64    77    80
              Charm H1    37    83    92    94
              Bayes       54    91    51    24
 20           Support     19    55    68    75
              Lift        21    58    68    74
              Charm H1    30    76    86    90
              Bayes       43    87    48    19
 30           Support     16    53    64    70
              Lift        17    54    64    69
              Charm H1    26    72    83    89
              Bayes       39    83    46    19

Table 4. On the left, the number of users and the selected model. Each column is
defined by the number of sessions of the anonymous data set. Sessions are of size 10.
Measured accuracy rates are smoothed over 100 executions. The best values are
presented in bold.




[Figure 3: line chart of accuracy (%) versus number of test sessions for Support, Lift, Bayes and Charm H1.]

Fig. 3. Comparative performance of Bayes, support-based profiling, lift-based profiling
and H1. These observations are plotted on an X-Y graph with the number of sessions of
the anonymous set on the X-axis and the accuracy rate on the Y-axis. The number of
users is equal to 30. Measured values are smoothed over 50 executions.

4.5 Comparative performance of distance functions
Figure 4 shows the impact of the choice of distance function on the performance
of the models.




[Figure 4: line chart of accuracy (%) versus number of test sessions for Bayes, Lift (Euclidean), Lift (cosine), Charm H1 (Euclidean) and Charm H1 (cosine).]

Fig. 4. Comparative performance of H1 with both cosine similarity and Euclidean
distance, Bayes and lift-based profiling. These observations are plotted on an X-Y graph
with the number of sessions of the anonymous set on the X-axis and the accuracy rate on
the Y-axis. The number of users is equal to 30. Measured values are smoothed over 100
executions.


    Figure 4 illustrates the significance of the distance function for the
performance. Indeed, when used with the Euclidean distance, the H1 method is slightly
more precise than the lift one (by about 3%). However, performance is improved
by using the cosine similarity and the relative ranking is even reversed: the H1
method then outperforms lift by 10%.



5    Conclusions and future work
In this study, we proposed a learning model for implicit authentication of web
users. We proposed a simple and original algorithm (cf. Algorithm 1) to get a set
of own patterns that characterizes each web user. The selected patterns have
different sizes and are closed patterns of the closure system generated by
the set of sessions (cf. Table 3). By reproducing the experimental protocol described
in [1], we showed that the performance of our model is significantly better
than that of models proposed in the literature (cf. Table 4). We also showed the
key role of the distance function (cf. Figure 4).
    This study should be extended in order to improve the obtained results. For
a very small flow of sites, the results of the solution should become better than the results

from Bayes' method. Another way to improve the results would be to select other types
of variables and to add them to our current dataset. The selection of data has an
undeniable impact on the results.



References
 1. Yang, Y.C.: Web user behavioral profiling for user identification. Decision Support
    Systems 49 (2010) pp. 261–271
 2. Guvence-Rodoper, C.I., Benbasat, I., Cenfetelli, R.T.: Adoption of B2B Ex-
    changes: Effects of IT-Mediated Website Services, Website Functionality, Benefits,
    and Costs. ICIS 2008 Proceedings (2008)
 3. Lagier, F.: Cybercriminalité : 120.000 victimes d'usurpation d'identité chaque
    année en France. Le Populaire du Centre (in French) (2013)
 4. Filson, D.: The impact of e-commerce strategies on firm value: Lessons from Ama-
    zon.com and its early competitors. The Journal of Business 77(S2) (2004) pp.
    S135–S154
 5. He, R., Yuan, M., Hu, J., Zhang, H., Kan, Z., Ma, J.: A novel service-oriented
    AAA architecture. 3 (2003) pp. 2833–2837
 6. Stockinger, T.: Implicit authentication on mobile devices. The Media Informatics
    Advanced Seminar on Ubiquitous Computing (2011)
 7. Shi, E., Niu, Y., Jakobsson, M., Chow, R.: Implicit authentication through learning
    user behavior. M. Burmester et al. (Eds.): ISC 2010, LNCS 6531 (2011) pp. 99–113
 8. Jakobsson, M., Shi, E., Golle, P., Chow, R.: Implicit authentication for mobile
    devices. Proceedings of the 4th USENIX Conference on Hot Topics in Security
    (HotSec'09) (2009) pp. 9–9
 9. Ullah, I., Bonnet, G., Doyen, G., Gaïti, D.: Un classifieur du comportement des
    utilisateurs dans les applications pair-à-pair de streaming vidéo. CFIP 2011 -
    Colloque Francophone sur l'Ingénierie des Protocoles (in French) (2011)
10. Goel, S., Hofman, J.M., Sirer, M.I.: Who does what on the web: A large-scale
    study of browsing behavior. In: ICWSM (2012)
11. Kumar, R., Tomkins, A.: A characterization of online browsing behavior. In:
    Proceedings of the 19th International Conference on World Wide Web, ACM (2010)
    pp. 561–570
12. Abramson, M., Aha, D.W.: User authentication from web browsing behavior. Pro-
    ceedings of the Twenty-Sixth International Florida Artificial Intelligence Research
    Society Conference pp. 268–273
13. Herrmann, D., Banse, C., Federrath, H.: Behavior-based tracking: Exploiting char-
    acteristic patterns in DNS traffic. Computers & Security (2013) pp. 1–17
14. Salton, G.: Automatic text processing: The transformation, analysis and retrieval
    of information by computer. Addison Wesley (1989)
15. Davey, B.A., Priestley, H.A.: Introduction to Lattices and Order. Cambridge
    University Press (1991)
16. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations.
    Springer, Berlin-Heidelberg-New York (1999)
17. Zaki, M.J., Hsiao, C.: Efficient algorithms for mining closed itemsets and their
    lattice structure. IEEE Transactions on Knowledge and Data Engineering 17(4)
    (2002) pp. 462–478
18. Szathmary, L.: Symbolic Data Mining Methods with the Coron Platform. PhD
    Thesis in Computer Science, University Henri Poincaré – Nancy 1, France (Nov
    2006)

Appendix

 Algorithm 2: Experiment procedure
  Data: ⋃i Sui: all sessions from n users;
        X: number of successive executions;
  Result: The mean accuracy of the selected models;
 1 begin
 2    for (N = {2, 5, 10, 20, 30}) do
 3       for (S = {1, 10, 20, 30}) do
 4          for (z = 1, . . . , X) do
 5             Select N random users;
 6             For each user, select SN = min(|Sui|, i = 1, . . . , N);
 7             Take 2/3 of the SN sessions from each user to form the training set;
 8             Take the rest of the SN sessions to form the validation set;
 9             Pall^k ← ∅ (the global profile vector for each model k);
10             for each (ui, i = 1, . . . , N) do
11                Compute the own patterns Pui^k (1 ≤ |Pui^k| ≤ 10);
12                Pall^k ← Pall^k ∪ Pui^k;
13             for each (ui, i = 1, . . . , N) do
14                Compute the vector Vui^k with support or lift;
15             Initialize to 0 the confusion matrix M^k of the method k;
16             for each (ui, i = 1, . . . , N) do
17                Compute the test stream Tui (|T| is fixed, T ∈ Tui);
18                while (Tui ≠ ∅) do
19                   Take SW sessions from Tui to compute VT^k;
20                   ua ← max(simil(Vui^k, VT^k)) or min(dist(Vui^k, VT^k));
21                   M^k[ui][ua] ← M^k[ui][ua] + 1;
22             Compute the mean accuracy of k from M^k;
          The direct-optimal basis via reductions

  Estrella Rodrı́guez-Lorenzo1 , Karell Bertet2 , Pablo Cordero1 , Manuel Enciso1 ,
                                  and Angel Mora1
                     1
                        University of Málaga, Andalucı́a Tech, Spain,
        e-mail: {estrellarodlor,amora}@ctima.uma.es, {pcordero,enciso}@uma.es
                        2
                          Laboratoire 3I, Université de La Rochelle
                              e-mail: karell.bertet@univ-lr.fr



         Abstract. Formal Concept Analysis has become an established approach
         in the Information-Knowledge-Wisdom trend. It revolves around mining a
         data set to build a concept lattice which provides a strong structure for
         the knowledge. Implications play the role of an alternative specification
         of this concept lattice and may be managed by means of inference rules.
         This syntactic treatment is guided by several properties like directness,
         minimality, optimality, etc. In this work, we propose a method to calcu-
         late the direct-optimal basis equivalent to a given implicational system.
         Our method deals with unitary and non-unitary implications. Moreover,
         it shows a better performance than previous methods in the literature
         by means of the use of Simplification Logic and the reduction paradigm,
         which keeps implications narrow at every stage of the process. We have
         also developed an empirical study to compare our method with previous
         approaches in the literature.


  1    Introduction

  Formal Concept Analysis (FCA) is a growing area which establishes a
  proper and fine mixture of formalism, data analysis and knowledge discovery. It
  is able to analyze and extract information from a context K, rendering a concept
  lattice. Attribute implications [10] represent implicit knowledge between data
  and they can be deduced from the concept lattice or mined directly from the
  context. An attribute implication is an expression A → B
  where A and B are sets of attributes. A context satisfies A → B if every object
  that has all the attributes in A also has all the attributes in B.
      The study of sets of implications that satisfy some criteria is one of the
  relevant topics in FCA. An implicational system (IS) of K is defined as a set
  Σ of implications of K from which any valid implication for K can be deduced
  by means of a syntactic treatment of the implications. This symbolic manipulation
  introduces the notion of equivalent sets of implications and opens the door to the
  definition of several criteria to discriminate good sets of implications.
  Thus, the challenges are the definition of a specific notion of IS, named basis,
  fulfilling some criteria related to minimality, and the introduction
  of efficient methods to transform an arbitrary IS into a basis.



    For instance, if the criterion is to obtain an IS of minimum cardinality, we can
build the so-called Duquenne-Guigues (or stem) basis [11]. Each application may
induce a different criterion. For instance, in [2, 3] some methods to calculate
the direct-optimal basis are introduced, joining minimality and directness in the
same notion of basis. In [8] a method to obtain a basis with minimal size of the
left-hand sides of the implications was proposed.
    In this paper, we introduce a method to compute the direct-optimal basis.
This kind of basis was introduced in [2, 3] and it has two interesting properties: it
has the minimum number of attributes and it provides a framework to efficiently
compute the closure of a set of attributes. The new method introduced in this
paper is strongly based on SLFD (Simplification Logic) and is more efficient
than previous methods that appeared in the literature.
    In the following, we first establish the background necessary for the under-
standing of the paper (Section 2). In Section 3, SLFD is summarized and a motiva-
tion of the simplification paradigm to remove redundant attributes is provided.
Section 4 is focused on the methods of Bertet et al. to get a direct-optimal
basis. In Section 5 the new method is introduced and a comparison among all
the methods is shown. Some conclusions are presented in Section 6.


2     Preliminaries

We assume the main concepts in FCA [10] to be well known. Only the necessary
concepts will be introduced. In Formal Concept Analysis (FCA) the relationship
between a set of objects and a set of attributes is described using a formal
context as follows:

Definition 1. A formal context is a triple K = (G, M, I) where G is a finite set
whose elements are named objects, M is a finite set whose elements are named
attributes and I ⊆ G × M is a binary relation. Thus, (o, a) ∈ I means that the object
o has the attribute a.

This paper focuses on the notion of implication, which can be introduced as
follows:

Definition 2. Let K = (G, M, I) be a formal context and A, B ∈ 2M . The
implication A → B holds in K if every object o ∈ G satisfies the following:
(o, a) ∈ I for all a ∈ A implies (o, b) ∈ I for all b ∈ B.
    An implication A → B is said to be unitary if the set B is a singleton.

    Implications may be syntactically managed by means of inference systems.
The first axiomatic system was Armstrong's Axioms [1]. They allow us to
introduce the notion of derivation of an implication from an implicational system,
the semantic entailment and the equivalence between two implicational systems
in the usual way.


3     Simplification Logic
In [6], Cordero et al. introduced the Simplification Logic SLFD, a logic equiv-
alent to Armstrong's Axioms that avoids the use of transitivity and is
guided by the idea of simplifying the set of implications by efficiently removing
redundant attributes. This logic has proved to be useful for automated reasoning
with implications [7, 8, 12, 13].
Definition 3 (Language). Given a non-empty finite alphabet S (whose ele-
ments are named attributes and denoted by lowercase letters a, b, c, etc.), the
language of SLFD is LS = {A → B | A, B ⊆ S}.
Sets of formulas (implications) will be named implicational systems (IS). In
order to distinguish between language and metalanguage, inside implications,
AB means A ∪ B and A-B denotes the set difference A \ B. Moreover, when no
confusion arises, we omit the brackets, e.g. abc denotes the set {a, b, c}.
Definition 4 (Semantics). Let K = (G, M, I) be a context and A → B ∈ LS .
The context K is said to be a model for A → B, denoted K |= A → B, if
A, B ⊆ M ⊆ S and A → B holds in K.
For a context K and an IS Σ, then K |= Σ means K |= A → B for all A → B ∈ Σ
and Σ |= A → B denotes that every model for Σ is also a model for A → B. If
Σ1 and Σ2 are implicational systems, Σ1 ≡ Σ2 denotes both IS are equivalent
(i.e. K |= Σ1 iff K |= Σ2 for all context K).
Definition 5 (Syntactic derivations). SLFD considers the reflexivity axioms

    [Ref]   if B ⊆ A, then infer A → B;

and the following inference rules, named fragmentation, composition and simpli-
fication respectively:

    [Frag]  from A → BC, infer A → B;
    [Comp]  from A → B and C → D, infer AC → BD;
    [Simp]  if A ⊆ C and A ∩ B = ∅, from A → B and C → D infer C-B → D-B.

Given an IS Σ and a formula A → B, Σ ⊢ A → B denotes that A → B can
be derived from Σ by using the axiomatic system in a standard way. The above
axiomatic system is sound and complete (i.e. Σ |= A → B iff Σ ⊢ A → B). The
main advantage of SLFD is that the inference rules may be considered as equivalence
rules and they are enough to compute all the derivations (see [12] for further
details and proofs).
Theorem 1 (Mora et al. [12]). In SLFD logic, the following equivalencies hold:
1. Fragmentation Equivalency [FrEq]: {A → B} ≡ {A → B-A}.
2. Composition Equivalency [CoEq]: {A → B, A → C} ≡ {A → BC}.
3. Simplification Equivalency [SiEq]: If A ∩ B = ∅ and A ⊆ C then
                         {A → B, C → D} ≡ {A → B, C-B → D-B}
4. Right Simplification Equivalency [rSiEq]: If A ∩ B = ∅ and A ⊆ C ∪ D then
                          {A → B, C → D} ≡ {A → B, C → D-B}
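
As an illustration, a left-to-right application of [SiEq] and [rSiEq] to a pair of implications can be sketched as follows, with implications represented as pairs of frozensets (an illustrative helper, not the authors' implementation).

def simplify_pair(ab, cd):
    # ab, cd: implications (A, B) and (C, D) as pairs of frozensets
    a, b = ab
    c, d = cd
    if a & b:
        return ab, cd              # the equivalencies require A and B disjoint
    if a <= c:                     # [SiEq]: C -> D becomes C-B -> D-B
        return ab, (c - b, d - b)
    if a <= c | d:                 # [rSiEq]: C -> D becomes C -> D-B
        return ab, (c, d - b)
    return ab, cd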


Note that these equivalencies (reading from left to right) remove redundant
information. SLFD was conceived as a simplification framework.
    To conclude this section, we introduce the outstanding notion of closure of
a set of attributes, which is strongly related with the syntactic treatment of
implications.

Definition 6. Let Σ ⊆ LS be an IS and X ⊆ S. The closure of X wrt Σ is the
largest subset of S, denoted XΣ+, such that Σ ⊢ X → XΣ+.

We omit the subindex (i.e. we write X+) when no confusion arises. Given a
context K and an IS Σ satisfying K |= A → B iff Σ ⊢ A → B, it is well-known
that the closed sets of attributes wrt Σ are in bijection with the concepts of K.
    One of the main topics is the computation of the closure of a set of attributes,
and for this reason it is necessary to have an efficient method to calculate
closures. For this problem, we emphasize the works of Bertet et al. in [2, 3] and
Cordero et al. in [12].
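
For reference, the standard iterative closure computation wrt an arbitrary IS can be sketched as below (implications as pairs of attribute frozensets); when the IS is direct, in the sense of Section 4, a single pass over the implications already suffices.

def closure_wrt(x, sigma):
    # sigma: iterable of implications (A, B); returns the closure of X wrt sigma
    closed = set(x)
    changed = True
    while changed:
        changed = False
        for a, b in sigma:
            if a <= closed and not b <= closed:
                closed |= b
                changed = True
    return closed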


4     Direct-Optimal basis

The study of sets of implications that satisfy some criteria is one of the most
important topics in FCA. In [3], Bertet and Monjardet present a survey about
implicational systems and bases. They show the equality of five unit bases
originating from different works (minimal functional dependencies in database
theory, knowledge spaces, etc.) and satisfying various properties, including the
directness, canonical and minimal properties, whence the name canonical direct
basis given to this basis. The direct-optimal basis belongs to these five bases.
In the following, we show only the concepts of this survey used in the rest of
the paper.

Definition 7. An IS Σ is said to be:
 – minimal if Σ \ {A → B} ≢ Σ for all A → B ∈ Σ.
 – minimum if Σ′ ≡ Σ implies |Σ| ≤ |Σ′|, for all IS Σ′.
 – optimal if Σ′ ≡ Σ implies ||Σ|| ≤ ||Σ′||, for all IS Σ′.
where |Σ| is the cardinality of Σ and ||Σ|| is its size, i.e. ||Σ|| = Σ_{A→B∈Σ} (|A| + |B|).

A minimal set of implications is named a basis, and a minimum basis is then a
basis of least cardinality. Let us now introduce the main property used in this
paper, namely the direct-optimal property.

Definition 8. An IS Σ is said to be direct if, for all X ⊆ S:
               X+ = X ∪ ⋃ {B | A ⊆ X and A → B ∈ Σ}
Moreover, Σ is said to be direct-optimal if it is direct and, for any direct IS Σ′,
Σ′ ≡ Σ implies ||Σ|| ≤ ||Σ′||.


In words, Σ is direct if the computation of the closure of any attribute set wrt Σ
requires only one iteration, that is, a unique traversal of the set of implications.
Obviously, the direct-optimal property is the combination of the directness and
optimality properties. In [2], Bertet and Nebut show that the direct-optimal IS is
unique and can be obtained from any equivalent IS. We address this procedure
in this paper.
    As we have said in the preliminaries, one of the most important problems is
how to calculate quickly and easily the closure X+ of any set X, because a number
of problems related to an IS Σ can be answered by computing closures. For this
reason, Bertet et al. propose a type of basis called the direct-optimal basis [2, 3], so
one can compute closures of subsets in only one iteration. Section 4.1 presents
the basis proposed in [2] by Bertet and Nebut, where they work with non-unitary
implicational systems (IS). Section 4.2 shows how to obtain a unit direct-optimal
basis [3]. In both sections, we illustrate the algorithms needed to obtain a direct-
optimal basis equivalent to any implicational system.


4.1     Computing the Direct-Optimal basis

In this section, the algorithm proposed by Bertet and Nebut in [2] is presented.
The key of the method is the so-called "overlap axiom", which can be directly
proved by using the axiomatic system from Definition 5:

    [Overlap]  for all A, B, C, D ⊆ S: if B ∩ C ≠ ∅, from A → B and C → D infer A(C-B) → D.

Then, the direct implicational system generated from an IS Σ is defined as the
smallest IS that contains Σ and is closed for [Overlap].

Definition 9. The direct implicational system Σd generated from Σ is defined
as the smallest IS such that:
1. Σ ⊆ Σd and
2. For all A, B, C, D ⊆ S, if A → B, C → D ∈ Σd and B ∩ C ≠ ∅ then
   A(C-B) → D ∈ Σd.

 Function Bertet-Nebut-Direct(Σ)
      input : An implicational system Σ on S
      output: The direct IS Σd on S equivalent to Σ
      begin
         Σd := Σ
         foreach A → B ∈ Σd do
            foreach C → D ∈ Σd do
                 if B ∩ C ≠ ∅ then add A(C-B) → D to Σd ;
         return Σd


Theorem 2 (Bertet and Nebut [2]). Let Σ be an implicational system. Then
Σd = Bertet-Nebut-Direct(Σ) is a direct basis.
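
A possible transcription of Function Bertet-Nebut-Direct, closing Σ under [Overlap] by iterating until no new implication appears (implications as pairs of frozensets; this sketch is ours, not the original implementation):

def bertet_nebut_direct(sigma):
    sigma_d = set(sigma)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(sigma_d):
            for (c, d) in list(sigma_d):
                if b & c:                              # overlap: B ∩ C ≠ ∅
                    new = (frozenset(a | (c - b)), d)  # A(C-B) -> D
                    if new not in sigma_d:
                        sigma_d.add(new)
                        changed = True
    return sigma_d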


Moreover, if an IS Σ is direct but not direct-optimal, then there exists an equiv-
alent IS Σ′ of smaller size which is direct-optimal. The properties that it must
hold are the following:
Theorem 3 (Bertet and Nebut [2]). A direct IS Σ is direct-optimal if and
only if the following properties hold.
Extensiveness: for all A → B ∈ Σ, A ∩ B = ∅.
Isotony: for all A → B, C → D ∈ Σ, C ⊊ A implies B ∩ D = ∅.
Premise: if A → B, A → B′ ∈ Σ then B = B′.
Not empty conclusion: if A → B ∈ Σ then B ≠ ∅.

 Function Bertet-Nebut-Minimize(Σ)
  input : An implicational system Σ on S
  output: A smaller IS Σm on S equivalent to Σ
  begin
     Σm := ∅
     foreach A → B ∈ Σ do
        B′ := B
        foreach C → D ∈ Σ do
           if C = A then B′ := B′ ∪ D;
           if C ⊊ A then B′ := B′ \ D;
        B′ := B′ \ A
        add A → B′ to Σm
     return Σm

Function Bertet-Nebut-DO computes the direct-optimal basis Σdo generated from
an IS Σ. It first computes Σd using Function Bertet-Nebut-Direct and then
minimizes Σd using Function Bertet-Nebut-Minimize.
 Function Bertet-Nebut-DO(Σ)
  input : An implicational system Σ on S
  output: The direct-optimal IS Σdo on S equivalent to Σ
  begin
     Σd := Bertet-Nebut-Direct(Σ)
     Σdo := Bertet-Nebut-Minimize(Σd )
     return Σdo


Theorem 4 (Bertet and Nebut [2]). Let Σ be an implicational system. Then
Σdo = Bertet-Nebut-DO(Σ) is the unique direct-optimal implicational system
equivalent to Σ.


4.2     Direct-Optimal basis by means of unit implicational systems

In some areas, the management of formulas is limited to unitary ones. Thus,
the use of Horn Clauses in Logic Programming is widely accepted. Such a lan-
guage restriction allows an improvement in the performance of the methods,
which are more direct and lighter. Nevertheless, the advantages provided by the
                                   The Direct-optimal Basis via Reductions    151


limited languages have a counterpart: a significant growth of the input set. In
this section we are going to present new versions of the definitions and methods
introduced above restricted to Unit Implicational System (UIS), i.e. set of im-
plications with unitary right-hand sides. An UIS is named proper if it does not
contain implications A → a such that a ∈ A.
    In this line, Bertet [4] provided versions for unit implicational systems of
Functions Bertet-Nebut-Direct and Bertet-Nebut-Minimize.
 Function Bertet-Unit-Direct(Σ)
  input : A proper UIS Σ on S
  output: The direct UIS Σd on S equivalent to Σ
  begin
     Σd := Σ
     foreach A → a ∈ Σd do
        foreach Ca → b ∈ Σd do
            if a ≠ b and b ∉ A then add AC → b to Σd ;
       return Σd

 Function Bertet-Unit-Minimize(Σ)
  input : A proper UIS Σ on S
  output: A smaller UIS Σm on S equivalent to Σ
  begin
     Σm := Σ
     foreach A → b ∈ Σm do
        foreach C → b ∈ Σm do
           if A ⊊ C then delete C → b from Σm ;
     return Σm

    The above functions were used in [4] to build a method which transforms an
arbitrary UIS into a UIS with the same properties as the direct-optimal basis
for general IS. Since any non-unit IS can be trivially turned into a UIS, we may
encapsulate both functions to provide another method to get a direct-optimal
basis from an arbitrary IS. Thus, the following function incorporates a first
step to convert any IS into its equivalent UIS and concludes with the converse
switch.
 Function Bertet-Unit-DO(Σ)
  input : An implicational system Σ on S
  output: The direct-optimal IS Σdo on S equivalent to Σ
  begin
     Σu := {A → b | A → B ∈ Σ and b ∈ B \ A}
     Σud := Bertet-Unit-Direct(Σu )
     Σudo := Bertet-Unit-Minimize(Σud )
     Σdo := {A → B | B = {b | A → b ∈ Σudo } ≠ ∅}
     return Σdo


Theorem 5 (Bertet [4]). Let Σ be an IS. Then Σdo = Bertet-Unit-DO(Σ) is
the unique direct-optimal implicational system equivalent to Σ.

    As we have mentioned at the beginning of this subsection, some authors
introduce unitary formulas as a way to provide simpler and more direct methods
having a better performance. Thus, in this case, Bertet-Unit-DO is more efficient
than Bertet-Nebut-DO, as we shall see at the end of the paper in Section 5.1.


5     Computing the direct-optimal basis by means of reductions

In this paper, our goal is the integration of the techniques proposed by Bertet
et al. [2–4] and the Simplification Logic proposed by Cordero et al. [6], that
is, adding reductions based on the simplification paradigm to build a
direct-optimal basis.
    In the same way as Bertet-Unit-DO, we are going to develop a function
to get the direct-optimal basis whose first step will be to narrow the implications.
However, the use of unit implications has some disadvantages, which we are going to
avoid by considering another kind of formulas. Thus, we are going to use reduced
IS and introduce simplification rules which transform them preserving reducedness.
A sign that this is a good approach is the fact that, at the end of
the process, the function renders the direct-optimal basis directly, avoiding the
converse switch.

Definition 10. An IS Σ is reduced if A → B ∈ Σ implies B 6= ∅ and A∩B = ∅
for all A, B ⊆ S.

Obviously, any IS Σ can be turned into a reduced equivalent one Σr as follows:
                      Σr := {A → B-A | A → B ∈ Σ, B ⊈ A}
The proposed method begins with this transformation and, once the IS is re-
duced, this property is preserved. For this reason, [Overlap] must be substituted.
Thus, we introduce a new inference rule which covers directness without losing re-
ducedness and, at the same time, makes progress on the minimization task
following the simplification paradigm. The kernel of the new method is the fol-
lowing inference rule, named strong simplification:

    [sSimp]  if B ∩ C ≠ ∅ and D ⊈ A ∪ B, from A → B and C → D infer A(C-B) → D-(AB).

Regardless of the conditions, the inference rule always holds. Nevertheless, the con-
ditions ensure a precise application of the rule in those cases where it is necessary.
Definition 11. Given a reduced IS Σ, the direct-reduced implicational system
Σdr generated from Σ is defined as the smallest IS such that
1. Σ ⊆ Σdr and
2. For all A, B, C, D ⊆ S, if A → B, C → D ∈ Σdr , B ∩ C ≠ ∅ and D ⊈ A ∪ B
   then AC-B → D-(AB) ∈ Σdr


Theorem 6. Given a reduced IS Σ, Σdr = Direct-Reduced(Σ) is a direct
and reduced IS.

 Function Direct-Reduced(Σ)
  input : A reduced implicational system Σ on S
  output: The direct-reduced IS Σdr on S
  begin
     Σdr := Σ
     foreach A → B ∈ Σdr and C → D ∈ Σdr do
        if B ∩ C ≠ ∅ and D \ (A ∪ B) ≠ ∅ then add AC-B → D-(AB) to Σdr ;
     return Σdr
Theorem 1 provides four equivalencies which allow us to remove redundant infor-
mation when they are read from left to right. An implicational system in which
these equivalencies have been used to remove redundant information is going to be
named a simplified implicational system.

Definition 12. A reduced IS Σ is simplified if the following conditions hold:
for all A, B, C, D ⊆ S,
1. A → B, A → C ∈ Σ implies B = C.
2. A → B, C → D ∈ Σ and A ⊊ C imply C ∩ B = ∅ = D ∩ B.

   Then, Function RD-Simplify turns any direct-reduced IS into a direct-reduced-
simplified equivalent one by systematically applying the equivalencies provided
in Theorem 1.
 Function RD-Simplify(Σ)
  input : A direct-reduced implicational system Σ on S
  output: The direct-reduced-simplified IS Σdrs on S equivalent to Σ
  begin
     Σdrs := ∅
     foreach A → B ∈ Σ do
        foreach C → D ∈ Σ do
            if C = A then B := B ∪ D;
            if C ⊊ A then B := B \ D;
        if B ≠ ∅ then add A → B to Σdrs ;
     return Σdrs

 Function doSimp(Σ)
  input : An implicational system Σ on S
  output: The direct-optimal IS Σdo on S
  begin
     Σr := {A → B-A | A → B ∈ Σ, B ⊈ A}
     Σdr := Direct-Reduced(Σr )
     Σdo := RD-Simplify(Σdr )
     return Σdo


Theorem 7. Let Σ be an implicational system on S. Then, Σdo = doSimp(Σ)
is the direct-optimal basis equivalent to Σ.


Note that, unlike Bertet-Unit-DO, where a final step was needed to revert the
effects of the first transformation, doSimp does not need to revert its first step. We
conclude this section with an experiment which illustrates the advantages of the
new method.
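
For illustration, the whole doSimp pipeline can be sketched as follows under the same pair-of-frozensets representation; this is only an approximation of the Prolog implementation used in Section 5.1, and it assumes the proper-subset reading of the simplification condition.

def do_simp(sigma):
    # 1. Reduce: drop premise attributes from the conclusion (Definition 10)
    sigma_dr = {(a, frozenset(b - a)) for (a, b) in sigma if not b <= a}
    # 2. Close under strong simplification [sSimp] (direct-reduced IS, Definition 11)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(sigma_dr):
            for (c, d) in list(sigma_dr):
                if (b & c) and not d <= (a | b):
                    new = (frozenset((a | c) - b), frozenset(d - (a | b)))
                    if new not in sigma_dr:
                        sigma_dr.add(new)
                        changed = True
    # 3. Simplify: merge equal premises, remove redundancy (cf. RD-Simplify)
    sigma_do = set()
    for (a, b) in sigma_dr:
        b2 = set(b)
        for (c, d) in sigma_dr:
            if c == a:
                b2 |= d
            elif c < a:              # proper subset of the premise
                b2 -= d
        if b2:
            sigma_do.add((a, frozenset(b2)))
    return sigma_do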




5.1    Empirical results


Logic programming has been used as a natural framework in areas in which
it is necessary to develop automatic deduction methods. Prolog prototypes
provide a declarative and pedagogical point of departure and illustrate the
behavior of new techniques in a very fast and easy way.
   Some authors have explored the use of Logic Programming in the framework
of Formal Concept Analysis. In [5], the authors even consider the framework
of FCA and its implementation in logic programming as a previous step to
achieve a first-order-logic FCA theory. In Eden et al. [9], the authors present
a Prolog-based prototype tool and show how the tool can utilize formulas to
locate pattern instances.
    In a first step, the methods proposed in this paper have been developed in
a Logic Programming language (Prolog), which is a well-known tool to develop
fast prototypes. In our case, the implementation in Prolog is natural because the
method proposed in this paper is based on logic.
    The methods of Bertet et al. [2, 3] and our doSimp method have been imple-
mented in Swi-Prolog.1 Since there does not exist a benchmark for implications,
in this experiment we have collected some sets of implications from the litera-
ture, searching papers and books with works about algorithms for implications,
functional dependencies and minimal keys. We now show the results
of the execution of a first Prolog prototype of the Bertet et al. method for UIS [3],
the Bertet et al. method for IS [2] and the new doSimp method (proposed in this paper).
    The following table and figures summarize the results obtained. We show in
the columns the results of Prolog: Lips (logical inferences per second, used
to describe the performance of a logical reasoning system), Time (execution time
in seconds), and Comp (the number of couples of implications to which a rule is
applied). The areas in Figure 2 show the percentages of each algorithm with respect
to the number of comparisons.


1 Available at http://www.lcc.uma.es/~enciso/doSimp.zip



Lips/Time/Comp.    Bertet-Nebut-DO                Bertet-Unit-DO                 Direct-Reduced (doSimp)
Ex.1               5297080 / 1247 / 1978          116905 / 0.019 / 36            4281 / 0.001 / 12
Ex.2               2395 / 0.003 / 23              923 / 0 / 3                    606 / 0 / 2
Ex.3               2183 / 0 / 15                  1440 / 0 / 4                   1122 / 0 / 4
Ex.a               83403 / 0.019 / 297            44109 / 0.007 / 33             3048 / 0.001 / 4
Ex.a3red           27613 / 0.005 / 100            16938 / 0.003 / 20             3698 / 0.001 / 15
Ex.derivation5     10302 / 0.002 / 120            3522 / 0.001 / 8               1782 / 0.001 / 12
Ex.Olomouc         15399581 / 4528 / 4337         1526818 / 0.331 / 180          15568 / 0.003 / 72
Ex.Ganter          116514 / 0.025 / 230           72153 / 0.16 / 36              3756 / 0.001 / 12
Ex.CLA14           102971 / 0.022 / 204           7449 / 0.001 / 12              704 / 0 / 3
Ex.Saedian1        18754 / 0.004 / 97             10349 / 0.002 / 14             4064 / 0.001 / 16
Ex.Saedian2        19452 / 0.004 / 160            10549 / 0.002 / 13             2619 / 0.001 / 13
Ex.Saedian3        5753962 / 1262 / 1986          166566 / 0.028 / 67            24643 / 0.005 / 55
Ex.Wastl10         1242 / 0 / 18                  381 / 0 / 1                    327 / 0 / 1
Ex.Wastl13         10543 / 0.002 / 86             4674 / 0 / 10                  1029 / 0 / 5
Example1           5594556921 / 7008.890 / 134175 2662181973 / 1351.950 / 5389   1199498 / 0.197 / 1103


                                  IS Bertet-Nebut   UIS Bertet      doSimp
Lips (logical inferences)         374,760,194.4     177,610,983.3   84,449.66667
Time of execution (seconds)       467,728.5         90,130.03693    0.014
Number of comparisons             9588.4            388.4           88.6


                Fig. 1. Summary of the experiment (average)




[Figure 2: stacked-percentage chart of the number of comparisons per example for doSimp, Bertet-Unit-DO (UIS-B) and Bertet-Nebut-DO (IS-BN).]

                           Fig. 2. Results: Comparisons




6    Conclusion
In this work, we have presented another algorithm to calculate the direct-optimal
basis which improves, in most cases, on the algorithms existing in the literature.
This is shown by an experimental comparison obtained by running different
examples with the methods of Bertet et al. for UIS [3], Bertet et al. for IS [2]
and the new doSimp.
    Our aim is to reduce the cost of the algorithm by using the Simplification
Logic as a tool for working with implications. So far, we have improved on the
existing algorithms, and we will continue working in this direction to further
cut down the cost of our method.
    Our perspectives concern improvements by pre-treatments (reduction,
canonical basis, etc.) in order to reach our main objective, which is to compute
the direct-optimal basis directly, without extra implication generation.


Acknowledgment
Supported by grant TIN11-28084 of the Science and Innovation Ministry of Spain.

References
1. W. W. Armstrong, Dependency structures of data base relationships, Proc. IFIP
   Congress. North Holland, Amsterdam: 580–583, 1974.
2. K. Bertet, M. Nebut, Efficient algorithms on the Moore family associated to an
   implicational system, DMTCS, 6(2): 315–338, 2004.
3. K. Bertet, B. Monjardet, The multiple facets of the canonical direct unit implica-
   tional basis, Theor. Comput. Sci., 411(22-24): 2155–2166, 2010.
4. K. Bertet, Some Algorithmical Aspects Using the Canonical Direct Implicationnal
   Basis, CLA: 101–114, 2006.
5. L. Chaudron, N. Maille, 1st Order Logic Formal Concept Analysis: from logic pro-
   gramming to theory, Computer and Information Science, 13(3), 1998.
6. P. Cordero, A. Mora, M. Enciso, I. Pérez de Guzmán, SLFD Logic: Elimination of
   Data Redundancy in Knowledge Representation, LNCS, 2527: 141–150, 2002.
7. P. Cordero, M. Enciso, A. Mora, M. Ojeda-Aciego, Computing Minimal Generators
   from Implications: a Logic-guided Approach, CLA: 187–198, 2012.
8. P. Cordero, M. Enciso, A. Mora, M. Ojeda-Aciego, Computing Left-Minimal Direct
   Basis of implications, CLA: 293–298, 2013.
9. A. Eden, Y. Hirshfeld, K. Lundqvist, LePUS Symbolic Logic Modeling of
   Object Oriented Architectures: A Case Study, In: Proc. Second Nordic Workshop
   on Software Architecture (NOSA'99), 1999.
10. B. Ganter, Two basic algorithms in concept analysis, Technische Hochschule,
   Darmstadt, 1984.
11. J.L. Guigues and V. Duquenne, Familles minimales d'implications informatives
   résultant d'un tableau de données binaires, Math. Sci. Humaines, 95: 5–18, 1986.
12. A. Mora, M. Enciso, P. Cordero, and I. Fortes, Closure via functional dependence
   simplification, Int. J. of Computer Mathematics, 89(4): 510–526, 2012.
13. A. Mora, M. Enciso, P. Cordero, and I. Pérez de Guzmán, An Efficient Prepro-
   cessing Transformation for Functional Dependencies Sets Based on the Substitution
   Paradigm, LNCS, 3040: 136–146, 2004.
        Ordering objects via attribute preferences

             Inma P. Cabrera1 , Manuel Ojeda-Aciego1 , and Jozef Pócs2
                      1
                       Universidad de Málaga, Andalucía Tech, Spain*
                  2
                      Palacký University, Olomouc, Czech Republic, and
                      Slovak Academy of Sciences, Košice, Slovakia**



         Abstract. We apply recent results on the construction of suitable or-
         derings for the existence of a right adjoint to the analysis of the following
         problem: given a preference ordering on the set of attributes of a given
         context, we seek an induced preference among the objects which is com-
         patible with the information provided by the context.


  1    Introduction

  The mathematical study of preferences started almost one century ago with the
  works of Frisch, who in 1926 was the first to write down a mathematical model
  of preference relations. On the other hand, the study of adjoints was initiated
  in the middle of the past century, with works by Ore in 1944 (in the framework of
  lattices and Galois connections) and Kan in 1958 (in the framework of category
  theory and adjunctions). The most recent of the three theories considered in this
  work is Formal Concept Analysis (FCA), which was initiated in the early 1980s
  by Ganter and Wille as a kind of applied lattice theory.
      Nowadays FCA has become an important research topic in which a still-
  growing body of pure mathematical machinery has expanded to cover a wide
  range of applications. A number of results are published yearly on very diverse
  topics such as data mining, the semantic web, chemistry, biology, and even linguistics.
      The first basic notion of FCA is that of a formal context, which can be
  seen as a triple consisting of an initial set of formal objects B, a set of formal
  attributes A, and an incidence relation I ⊆ B × A indicating which object has
  which attribute. Every context induces a lattice of formal concepts, which are
  pairs of subsets of objects and attributes, respectively called extent and intent,
  where the extent of a concept contains all the objects having every attribute of
  its intent, and the intent contains all the attributes shared by the objects of its extent.
      Given a preference ordering among the attributes of a context, our contribu-
  tion in this work focuses on obtaining an induced ordering on the set of objects
  which, in some sense, is compatible with the context.
      After browsing the literature, we have found just a few papers dealing simul-
  taneously with FCA and preferences, but their focus and scope are substantially
  *  Partially supported by Spanish Ministry of Science and FEDER funds through
     projects TIN2011-28084 and TIN12-39353-C04-01.
  ** Partially supported by ESF Fund CZ.1.07/2.3.00/30.0041.


© Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 157–169,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.


different to ours. For instance, Obiedkov [11] considered some types of preference
grounded on preference logics, proposed their interpretation in terms of formal
concept analysis, and provided inference systems for them, studying as well their
relation to implications. Later, in [12], he presented a context-based semantics
for parameterized ceteris paribus preferences over subsets of attributes (pref-
erences which are only required to hold when the alternatives being compared
agree on a specified subset of attributes).
    Other approaches to preference handling are related to the development of
recommender systems. For instance, [8] proposes a novel recommendation model
based on the synergistic use of knowledge from a repository which includes the
users' behavior and the items' properties. The candidate recommendation set is
constructed by using FCA and extended inference rules.
    Finally, another set of references deals with extensions of FCA, either to the
fuzzy or multi-adjoint case, or to the rough case. For instance, in [2] an approach
can be found in which, based on transaction cost analysis, the authors explore
the customers' loyalty to either the financial companies or the companies' financial
agents with whom they have established a relationship. In a pre-processing stage,
factor analysis is used to choose variables, and rough set theory to construct the
decision rules; FCA is applied in the post-processing stage to these suitable
rules in order to explore the attribute relationships and the most important factors
affecting the customers' preference for choosing either companies or agents.
    Glodeanu has recently proposed in [6] a new method for modelling users’
preferences on attributes that contain more than one trait. The modelling of
preferences is done within the framework of Formal Fuzzy Concept Analysis,
specifically using hedges to decrease the size of the resulting concept lattice as
presented in [1].
    An alternative generalization which, among other features, allows for specify-
ing preferences in an easy way is that of multi-adjoint FCA [9,10]. The main idea
underlying this approach is to allow the use of several adjoint pairs in the definition
of the fuzzy concept-forming operators. Should one be interested in certain sub-
set(s) of attributes (or objects), the only required setting is to declare a specific
adjoint pair to be used in the computations with values within each subset of
preferred items.
    The combination of the two last approaches, namely fuzzy FCA with hedges
and the multi-adjoint approach, has recently been studied in [7], providing new
means to decrease the size of the resulting concept lattices.
    This work can be seen as a position paper towards the combination of recent
results on the existence of a right adjoint for a mapping f : ⟨X, ≤X⟩ → Y from a
partially ordered set X to an unstructured set Y , with Formal Concept Analysis,
and with the generation of preference orderings.
    The structure of this work is the following: in Section 2, the preliminary
results related to attribute preferences and the characterization of the existence of
a right adjoint to a mapping from a poset to an unstructured codomain are pre-
sented; then, in Section 3, the two approaches above are merged together in order


to produce a method to induce an ordering among the objects in terms of a given
preference ordering on attributes and a formal context.


2     Preliminaries

2.1   Preference relations and lectic order on the powerset

We recall the definition of a (total) preference ordering and describe an induced
ordering on the corresponding powerset.
    In the general approach to preferences, a preference relation on a nonempty
set A is a binary relation ⪯ ⊆ A × A which is reflexive (∀a ∈ A, a ⪯ a)
and total (∀a, b ∈ A, (a ⪯ b) ∨ (b ⪯ a)).
    In this paper, we will consider a simpler notion, in which a preference rela-
tion is modeled by a total ordering. Formally, by a total preference relation we
understand any total ordering of the set A, i.e., a binary relation ⪯ ⊆ A × A
such that ⪯ is total, reflexive, antisymmetric (∀a, b ∈ A, a ⪯ b and b ⪯ a implies
a = b), and transitive (∀a, b, c ∈ A, a ⪯ b and b ⪯ c implies a ⪯ c).
    Any total preference relation on a set A induces a total ordering on the
powerset 2^A in a natural way.
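    As a side illustration (ours, not part of the paper; the relation is encoded as a set
of pairs), the following Python sketch checks the four conditions above for a candidate
total preference relation:

    from itertools import product

    def is_total_preference(A, rel):
        # rel is a set of pairs (a, b) read as "a is at most as preferred as b"
        total = all((a, b) in rel or (b, a) in rel for a, b in product(A, A))
        reflexive = all((a, a) in rel for a in A)
        antisymmetric = all(a == b for a, b in product(A, A)
                            if (a, b) in rel and (b, a) in rel)
        transitive = all((a, c) in rel
                         for a, b in product(A, A) if (a, b) in rel
                         for c in A if (b, c) in rel)
        return total and reflexive and antisymmetric and transitive

    # The usual order on {1, 2, 3} is a total preference relation.
    A = {1, 2, 3}
    rel = {(a, b) for a in A for b in A if a <= b}
    print(is_total_preference(A, rel))  # True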

Definition 1. Let ⟨A, ⪯⟩ be a nonempty set with a total preference relation. A
subset X is said to be lectically smaller than a subset Y , denoted X = 1 and k < i} the branches of all nodes that precede ni in
ss. To avoid locating these nodes and to simplify the calculation, ni is virtually moved
to the start of ss. All supersets of ni are now confined to the ni branch, and the updated
count of ni supersets is 2^(n−1). Let nj be another generator in the same cluster. All
supersets of nj must be counted while excluding the elements already counted as part
of the ni branch. By virtually moving nj after ni in ss and counting all elements of the
nj branch, both conditions are fulfilled. The nj branch count is 2^(n−2), and the same
process is applied to the remaining generators of the cluster. Doing so leads to the
generalized generator-counting formula gc = Σ_{k=|ss|−|gs|}^{|ss|−1} 2^k, where |gs| is
the generators count and |ss| is the cluster size (line 10 in DFSP and line 20
in EXPLORETIDSET).
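    To make the counting concrete, here is a tiny Python sketch (ours, with hypothetical
names) evaluating this formula; with |ss| = 5 and |gs| = 2 it gives 24, and with |ss| = 4
and |gs| = 3 it gives 14, matching the gc updates in the illustrative example of Section 3.2.

    def generator_count(ss_size, gs_size):
        # the first generator contributes 2^(|ss|-1), the next 2^(|ss|-2), ...,
        # the |gs|-th contributes 2^(|ss|-|gs|)
        return sum(2 ** k for k in range(ss_size - gs_size, ss_size))

    print(generator_count(5, 2))  # 24
    print(generator_count(4, 3))  # 14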



Detecting non-generator monotony The most significant clean-up mechanism in
DFSP is the pruning of non-generators. In order to eliminate non-generators as well,
EXPLORETIDSET looks for nodes in ss such that, when combined together, the resulting
clique superset is still a non-generator. These nodes are said to form a non-generator
monotone clique. Suppose we are building the branch of a node from this clique. If we
use exclusively nodes from the clique, all nodes in the branch are guaranteed to be
subsets of the clique superset. Since a subset of a non-generator is also a non-generator,
all branches within the clique contain only non-generators. Nodes in the clique are
pushed to the end of the ss set to ensure that the generation process only uses nodes
from the clique. Nodes outside the clique are moved to the beginning of ss. Nodes in
the clique are not expanded, since no generator can be found in their branches, but they
are still used to build branches outside the clique.


 Algorithm 2: EXPLORETIDSET
  Input:
  - K = (O, I, R): a formal context.
  - ss: a set of TSNode siblings.
  - m: the size of the intent.
  Result:
  - gc: the generators count.
   1 Begin
   2    i := ssc := |ss|;
   3    ingpc := gc := 0;
   4    ngpi := I;
   5    While i ≥ 1 do
   6        If |ngpi ∩ ss[i].is| = m then
   7            MOVETOHEAD(i, ss);
   8            ingpc := ingpc + 1;
   9        Else
  10            i := i − 1;
  11            ngpi := ngpi ∩ ss[i].is;

  12    For (i = 1 . . . ingpc) do
  13        nleft := ss[i];
  14        For (j = i + 1 . . . ssc) do
  15            nright := ss[j];
  16            nchild.s := nright.s;
  17            nchild.is := nleft.is ∩ f(nchild.s);
  18            If (|nchild.is| ≠ m) then
  19                nleft.ss := nleft.ss ∪ {nchild};
  20        gc := gc + Σ_{k=|nleft.ss|}^{ssc−i−1} 2^k;
  21        If (|nleft.ss| > 1) then
  22            gc := gc + EXPLORETIDSET(nleft.ss, K, m);

  23    Return gc;
  24 End



3.2 Illustrative example

To illustrate our approach, let us consider the formal concept C1 = (A1 , B1 ) from the
formal context depicted by Table 1, such that A1 = {3, 4, 5, 6, 7, 9} and B1 = {f, g}.
As shown in Fig. 1, the DFSP algorithm operates as follows:
    During the first step (1), the root node is created and initialized through the function
BUILDTREEROOT (gc = 0). Initially, root.s = ∅, and the nodes n3 , n4 , n5 , n6 , n7 and
n9 are created from the individual elements of {3, 4, 5, 6, 7, 9} (steps (2), (3), (4), (5),
(6) and (7)). These nodes are prospective direct children of the root node. Since all these
nodes are non-generators, they become, in step (8), effective direct children of the root
and are sorted in decreasing order of their support value. In step (9), the non-generators
forming a monotone clique are placed at the end of the list and marked by (*). However,




                                Fig. 1. Illustrative example



unstable generators are placed at the beginning of the list and marked by (+). After that,
in steps (10), (11), (12), (13) and (14), the prospective direct children of node n3 are
created, namely n36 , n39 , n34 , n35 and n37 . The count of generators is updated in
step (15) (gc = 2^4 + 2^3 = 24). Only the nodes n34 , n35 and n39 are left as effective
direct children of n3 ; they are also sorted in decreasing order. In step (16), all these
effective direct children form a monotone clique and the exploration of this branch is
stopped. After that, the nodes n69 , n64 , n65 and n67 are created, and the count of
generators is updated in step (21) with the three generators n64 , n65 and n67
(gc = gc + 2^3 + 2^2 + 2^1 = 24 + 14 = 38). Only the node n69 is kept in the list of
effective direct children of n6 . Indeed, the latter does not fulfil the condition for
EXPLORETIDSET to be launched.

4 Experimental results
In this section, we focus on the evaluation of the DFSP algorithm, stressing two
complementary aspects: (i) execution time; (ii) efficiency of the search space pruning.
Experiments were carried out on an Intel Xeon PC, CPU E5-2630 at 2.30 GHz with 16
GB of RAM, running Linux. For these experiments, we used benchmark datasets in
common use within the data mining community. The first three datasets are considered
dense ones, i.e., yielding a high number of formal concepts even for a small number of
objects and attributes, while the other ones are considered sparse. The characteristics
of these datasets are summarized in Table 2. For each dataset we report its number of
objects, its number of attributes, as well as the number of all formal concepts that may
be drawn. In addition, we also report the respective sizes of the smallest and the largest
formal concepts (in terms of extent size). For these concepts, we kept track of the
number of actually explored nodes (the column denoted |explor.|) as well as the
execution time.
     At a glance, the statistics show that the DFSP algorithm is able to process tens of
thousands of objects in a reasonable time. Indeed, the 15596 (respectively 16040) objects

composing the extent of the largest formal concept extracted from the RETAIL (re-
spectively T40I10D100K) dataset are handled in only 27.27 (respectively 68.85) seconds.
Even though the respective cardinalities are close (15596 vs 16040 objects), the difference
in execution time is not proportional to this small gap. A preliminary explanation could
be the difference in density of the two datasets (RETAIL is dense while T40I10D100K is
a sparse one). An in-depth study of these performances in connection with the nature of
the datasets is currently being carried out. The most striking fact is the low number of
visited nodes in the associated search space. For example, for the MUSHROOM dataset,
the DFSP algorithm actually handled only 83918 nodes out of the 2^1000 potential nodes
of the search space, i.e., in numerical terms, it explores only an infinitesimally small part,
equal to 7.8 × 10^−297, of the search space. The case of the RETAIL and T10I4D100K
datasets is also worth mentioning. For the respective smallest extracted concepts, the DFSP
algorithm explores only 1.14 × 10^−45 and 1.5 × 10^−90 of the respective search spaces.
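    Assuming the explored fraction is obtained as the number of visited nodes divided
by the 2^|ext| candidate tidsets of the extent (our reading of the figures above, sketched
in Python below with names of our own), the MUSHROOM value can be reproduced directly:

    def explored_fraction(explored_nodes, extent_size):
        # fraction of the 2^|ext| candidate tidsets that were actually visited
        return explored_nodes / 2 ** extent_size

    print(explored_fraction(83918, 1000))  # ≈ 7.8e-297, the MUSHROOM figure quoted above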


                                               smallest concept            largest concept
Datasets      # Attr   # Obj  # concepts  |ext| |explor.| time (sec.)  |ext|  |explor.| time (sec.)
CHESS             75    3196        3316   2630   2362233        0.12   3195    5855899        0.64
MUSHROOM         119    8124        3337   1000     83918        0.10   8124   76749955       11.32
RETAIL         16470   88162        3493    150       164        0.10  15596   64847191       27.27
T10I4D100K      1000  100000        4497    300       306        0.11   6810   19719991       12.77
T40I10D100K     1000  100000        3102   1800   1495324        1.39  16040   92154598       68.85

                Table 2. Characteristics of the considered benchmark datasets



    These highlights are also confirmed by Figures 2-11. Indeed, Figures 2, 4, 6, 8 and
10 show the variation of the execution time, while Figures 3, 5, 7, 9 and 11 assess
what we call the workload, i.e., the efficiency of the search space exploration. At
a glance, the execution time is closely connected with the reduction of the search
space, i.e., the variation of the workload has the same tendency as the performance,
since we consider the visited tidsets of the search space as the processing unit. Worth
mentioning, the performance is correlated with the extent's size rather than with the
exponential nature of the search space.


                  Fig. 2. Mushroom scaleup (time in sec. vs. extent size ×10^3)
                  Fig. 3. Mushroom workload (tidsets ×10^6 vs. extent size ×10^3)

                  Fig. 4. Chess scaleup (time in sec. vs. extent size ×10^3)
                  Fig. 5. Chess workload (tidsets ×10^6 vs. extent size ×10^3)




                  Fig. 6. Retail scaleup (time in sec. vs. extent size ×10^2)
                  Fig. 7. Retail workload (tidsets ×10^5 vs. extent size ×10^2)




5 Conclusion and future work

Through the DFSP algorithm, we delved into the combinatorics of concept lattices by
showing that most of this search space can be smartly explored thanks to the saturation
of generators. The swift computation of stability encourages us to integrate stability
as an on-the-fly pruning strategy during the mining of closed itemsets. We are currently
working on a new algorithm for the stability computation given the Galois lattice; the
new algorithm relies only on the direct sub-concepts to compute the stability of a concept.
Outside the FCA field, the strategy of DFSP would also be of benefit for the very efficient
extraction of a well-known combinatorial structure: minimal transversals.



                  Fig. 8. T10I4D100K scaleup (time in sec. vs. extent size ×10^2)
                  Fig. 9. T10I4D100K workload (tidsets ×10^6 vs. extent size ×10^2)
                  Fig. 10. T40I10D100K scaleup (time in sec. vs. extent size ×10^3)
                  Fig. 11. T40I10D100K workload (tidsets ×10^6 vs. extent size ×10^3)


  Attributive and Object Subcontexts in Inferring
        Good Maximally Redundant Tests

                      Xenia Naidenova1 and Vladimir Parkhomenko2
                  1
                    Military Medical Academy, Saint-Petersburg, Russia
                                  ksennaid@gmail.com
        2
          St. Petersburg State Polytechnical University, Saint-Petersburg, Russia
                               parhomenko.v@gmail.com



         Abstract. Inferring Good Maximally Redundant Classification Tests
         (GMRTs) as formal concepts is considered. Two kinds of classification
         subcontexts are defined: attributive and object ones. The rules for forming
         and reducing subcontexts, based on the notion of essential attributes and
         objects, are given. They make it possible to control the inference process. In
         particular, an improved algorithm for searching for all GMRTs on the basis
         of the attributive subtask is proposed. Hybrid attributive and object
         approaches are presented. Some computational aspects of the algorithms are
         analyzed.

         Keywords: good classification test, Galois lattice, essential attributes
         and objects, implications, subcontexts

  1    Introduction
  Good Test Analysis (GTA) deals with the formation of the best descriptions of
  a given object class (the class of positive objects) against the objects which do not
  belong to this class (the class of negative objects), on the basis of lattice theory. We
  assume that objects are described in terms of values of a given set U of attributes;
  see an example in Tab. 1. The key notion of GTA is the notion of classification. To
  give a target classification of objects, we use an additional attribute KL ∉ U. A
  target attribute partitions a given set of objects into disjoint classes, whose number
  is equal to the number of values of this attribute. In Tab. 1, we have
  two classes: the objects in whose descriptions the target value k appears, and all
  the other objects.
      Denote by M the set of attribute values, M = ∪{dom(attr) | attr ∈ U},
  where dom(attr) is the set of all values of attr; this corresponds to plain scaling in
  terms of [3]. Let G = G+ ∪ G− be the set of objects, where G+ and G− are the sets of
  positive and negative objects, respectively. Let P (B), B ⊆ M, be the set of all the
  objects in whose descriptions B appears. P (B) is called the interpretation of B
  in the power set 2^G. If P (B) contains only G+ objects and the number of these
  objects is more than 2, then B is called a description of some positive objects or
  a diagnostic (classification) test for G+ [1]. The words diagnostic (classification)
  may be omitted in what follows.

© Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 181–193,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.



                    Table 1. Motivating Example of classification

                     No Height Color of Hair Color of Eyes KL
                     1   Low     Blond          Blue          k(+)
                     2   Low     Brown          Blue          k(−)
                     3   Tall    Brown          Hazel         k(−)
                     4   Tall    Blond          Hazel         k(−)
                     5   Tall    Brown          Blue          k(−)
                     6   Low     Blond          Hazel         k(−)
                     7   Tall    Red            Blue          k(+)
                     8   Tall    Blond          Blue          k(+)




    Let us recall the definition of a good test, or good description, for a subset of
G+ (via partitions of objects). A subset B ⊆ M of attribute values is a good
test for a subset of positive objects if it is a test and there is no subset C ⊆ M
such that P (B) ⊂ P (C) ⊆ G+ [7].
    Sec. 2 is devoted to defining the concept of a good diagnostic (classification) test
as a formal concept. Sec. 3 gives the decomposition of inferring good tests based
on two kinds of subcontexts of the initial classification context. Sec. 4 is devoted
to an analysis of algorithms based on using subcontexts, including the evaluation
of the number of subproblems to be solved, the depth of recursion, the structure
of subproblems and their ordering, among others.


2     Good Maximally Redundant Tests as Formal Concepts
Assume that G = {1, . . . , N} is the set of object indices (objects, for short) and
M = {m1 , m2 , . . . , mj , . . . , mm } is the set of attribute values (values, for short).
Each object is described by a set of values from M . The object descriptions are
represented by rows of a table whose columns are associated with the attributes
taking their values in M .
     Let A ⊆ G, B ⊆ M . Denote by Bi , Bi ⊆ M , i = 1, . . . , N , the description of
the object with index i. The Galois connection between the ordered sets (2^G , ⊆) and
(2^M , ⊆) is defined by the following mappings, called derivation operators: for
A ⊆ G and B ⊆ M , A′ = val(A) = ∩{Bi | i ∈ A} and
B′ = obj(B) = {i ∈ G | B ⊆ Bi }. Of course, we have obj(B) = ∩{obj(m) | m ∈ B}.
     There are two closure operators [9]: generalization_of(B) = B″ = val(obj(B))
and generalization_of(A) = A″ = obj(val(A)). A set A is closed if A = obj(val(A)).
A set B is closed if B = val(obj(B)). For g ∈ G and m ∈ M , {g}′ is denoted by
g′ and called the object intent, and {m}′ is denoted by m′ and called the value extent.
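     As a quick illustration (a sketch of ours, not from the paper; object descriptions are
plain Python sets), the derivation operators can be written as:

    def val(A, descr):
        # val(A): intersection of the descriptions B_i of all objects i in A
        B = None
        for i in A:
            B = set(descr[i]) if B is None else B & descr[i]
        return B if B is not None else set()

    def obj(B, descr):
        # obj(B): all objects whose description contains B
        return {i for i, d in descr.items() if set(B) <= d}

    # Toy usage on three rows of Tab. 1:
    descr = {1: {"Low", "Blond", "Blue"}, 2: {"Low", "Brown", "Blue"},
             8: {"Tall", "Blond", "Blue"}}
    print(val({1, 8}, descr))              # {'Blond', 'Blue'}
    print(obj({"Blond", "Blue"}, descr))   # {1, 8}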
     Let us recall the main definitions of GTA [7].
     A Diagnostic Test (DT) for the positive examples G+ is a pair (A, B) such
that B ⊆ M , A = B′ ≠ ∅, A ⊆ G+ , and B ⊈ g′ ∀g ∈ G− . A diagnostic test (A, B)
for G+ is maximally redundant if obj(B ∪ m) ⊂ A for all m ∉ B, m ∈ M .
A diagnostic test (A, B) for G+ is good if and only if any extension A∗ = A ∪ i,
i ∉ A, i ∈ G+ , implies that (A∗ , val(A∗ )) is not a test for G+ .
    In this paper, we deal with Good Maximally Redundant Tests (GMRTs). If
a good test (A, B) for G+ is maximally redundant, then any extension B∗ =
B ∪ m, m ∉ B, m ∈ M , implies that (obj(B∗ ), B∗ ) is not a good test for G+ .
Any object description d of g ∈ G in a given classification context is a maximally
redundant set of values, because ∀m ∉ d, m ∈ M , obj(d ∪ m) is equal to ∅. A GMRT
can be regarded as a special type of hypothesis [4].
    In Tab. 1, ((1, 8), Blond Blue) is a GMRT for k(+), ((4, 6), Blond Hazel) is a
DT for k(−) but not a good one, and ((3, 4, 6), Hazel) is a GMRT for k(−).
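    These checks are easy to replay mechanically. The following self-contained Python
sketch (ours; it simply re-encodes Tab. 1 and follows the DT definition of this section)
verifies the first example:

    # Tab. 1 as value sets; positive objects are {1, 7, 8}.
    D = {1: {"Low", "Blond", "Blue"},   2: {"Low", "Brown", "Blue"},
         3: {"Tall", "Brown", "Hazel"}, 4: {"Tall", "Blond", "Hazel"},
         5: {"Tall", "Brown", "Blue"},  6: {"Low", "Blond", "Hazel"},
         7: {"Tall", "Red", "Blue"},    8: {"Tall", "Blond", "Blue"}}
    G_plus = {1, 7, 8}

    def obj(B):
        return {i for i, d in D.items() if B <= d}

    def is_test(B):
        # (obj(B), B) is a diagnostic test for G+ if obj(B) is nonempty
        # and contains positive objects only
        A = obj(B)
        return bool(A) and A <= G_plus

    print(obj({"Blond", "Blue"}), is_test({"Blond", "Blue"}))  # {1, 8} True
    print(obj({"Blue"}), is_test({"Blue"}))                    # {1, 2, 5, 7, 8} False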


3   The Decomposition of Inferring GMRTs into Subtasks
There are two possible kinds of subtasks of inferring GMRTs for a set G+ [8]:
 1. given a set of values B ⊆ M with obj(B) ≠ ∅, such that B is not included in
    any description of a negative object, find all GMRTs (obj(B∗ ), B∗ ) such that
    B∗ ⊂ B;
 2. given a non-empty set of values X ⊆ M such that (obj(X), X) is not a test
    for positive objects, find all GMRTs (obj(Y ), Y ) such that X ⊂ Y .
    For solving these subtasks we need only form subcontexts of a given classifi-
cation context. The first subtask is useful to find all GMRTs whose intents are
contained in the description d of an object g. This subtask is considered in [2] for
fast incremental concept formation, where the definition of subcontexts is given.
    We introduce the projection of a positive object description d on the
set D+ , i.e., the set of descriptions of all positive objects: proj(d) = Z = {z | z =
d ∩ d∗ ≠ ∅, d∗ ∈ D+ , and (obj(z), z) is a test for G+ }.
    We also introduce a concept of value projection proj(m) of a given value
m on a given set D+ . The value projection is proj(m) = {d| m appears in d, d ∈
D+ }.
    The Algorithm for Searching all GMRTs on the basis of the attributive
subtask (ASTRA), based on value projections, was advanced in [6]. The algorithm
DIAGaRa, based on object projections, was proposed in [5]. In what follows,
we are interested in using both kinds of subcontexts for inferring all GMRTs
for a positive (or negative) class of objects. The following theorem gives the
foundation of reducing subcontexts [6].
Theorem 1. Let X ⊆ M, (obj(X), X) be a maximally redundant test for pos-
itive objects and obj(m) ⊆ obj(X), m ∈ M . Then m cannot belong to any
GMRT for positive objects different from (obj(X), X).
    Consider an example of reducing a subcontext (see Tab. 1). Let splus(m) be
obj(m) ∩ G+ or obj(m) ∩ G− , and let SPLUS be {splus(m) | m ∈ M }. In Tab. 1,
taking splus(m) = obj(m) ∩ G− , we have SPLUS = {{3, 4, 6}, {2, 3, 5}, {3, 4, 5},
{2, 5}, {4, 6}, {2, 6}} for the values Hazel, Brown, Tall, Blue, Blond, and Low, respectively.


   We have val(obj(Hazel)) = Hazel, hence ((3, 4, 6), Hazel) is a DT for G− .
Then the value Blond can be deleted from consideration, because splus(Blond) ⊂
splus(Hazel). Delete the values Blond and Hazel from consideration. After that, the
description of object 4 is included in the description of object 8 of G+ , and the
description of object 6 is included in the description of object 1 of G+ . Delete
objects 4 and 6. Then, for the values Brown, Tall, Blue, and Low respectively,
SPLUS = {{2, 3, 5}, {3, 5}, {2, 5}, {2}}. Now we have val(obj(Brown)) = Brown,
and ((2, 3, 5), Brown) is a test for G− . After that, all remaining values can be deleted
and all GMRTs for G− have been obtained.
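    For completeness, the following small sketch (ours; Tab. 1 is re-encoded so the block
stands alone) computes these splus sets for G− and replays the Theorem 1 deletion of Blond:

    D = {1: {"Low", "Blond", "Blue"},   2: {"Low", "Brown", "Blue"},
         3: {"Tall", "Brown", "Hazel"}, 4: {"Tall", "Blond", "Hazel"},
         5: {"Tall", "Brown", "Blue"},  6: {"Low", "Blond", "Hazel"},
         7: {"Tall", "Red", "Blue"},    8: {"Tall", "Blond", "Blue"}}
    G_minus = {2, 3, 4, 5, 6}

    def obj(B):
        return {i for i, d in D.items() if B <= d}

    splus = {m: obj({m}) & G_minus
             for m in ("Hazel", "Brown", "Tall", "Blue", "Blond", "Low")}
    print(splus)
    # {'Hazel': {3, 4, 6}, 'Brown': {2, 3, 5}, 'Tall': {3, 4, 5},
    #  'Blue': {2, 5}, 'Blond': {4, 6}, 'Low': {2, 6}}

    # By Theorem 1, since ((3, 4, 6), Hazel) is a maximally redundant test for G-
    # and splus(Blond) is included in splus(Hazel), Blond can be removed.
    print(splus["Blond"] <= splus["Hazel"])  # True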
   The initial information for finding all the GMRTs contained in a positive
object description is the projection of it on the current set D+ . It is essential that
the projection is a subset of object descriptions defined on a certain restricted
subset t∗ of values. Let s∗ be the subset of indices of objects whose descriptions
produce the projection. In the projection, splus(m) = obj(m) ∩ s∗ , m ∈ t∗ .
    Let STGOOD be the partially ordered set of elements s satisfying the con-
dition that (s, val(s)) is a good test for D+ . The basic recursive procedure for
solving any kind of subtask consists of the following steps:


 1. Check whether (s∗ , val(s∗ )) is a test and, if so, store s∗ in STGOOD
    if s∗ corresponds to a good test at the current step; in this case, the subtask
    is over. Otherwise go to the next step.
 2. The value m can be deleted from the projection if splus(m) ⊆ s for some
    s ∈ STGOOD.
 3. For each value m in the projection, check whether (splus(m), val(splus(m)))
    is a test and, if so, delete the value m from the projection and store splus(m)
    in STGOOD if it corresponds to a good test at the current step.
 4. If at least one value has been deleted from the projection, then the reduction
    of the projection is necessary. The reduction consists in checking, for each
    element t of the projection, whether (obj(t), t) is no longer a test (as a result
    of the previously eliminated values) and, if so, deleting this element from the
    projection. If, under reduction, at least one element has been deleted, then
    Steps 2, 3, and 4 are repeated.
 5. Check whether the subtask is over or not. The subtask is over when either
    the projection is empty or the intersection of all elements of the projection
    corresponds to a test (see Step 1). If the subtask is not over, then an
    object (value) in this projection is selected and the new subtask is formed.
    The new subsets s∗ and t∗ are constructed and the basic algorithm runs
    recursively.


    The algorithm of forming STGOOD is based on topological sorting of par-
tially ordered sets. The set TGOOD of all the GMRTs is obtained as follows:
TGOOD = {tg| tg = (s, val(s)), s ∈ STGOOD}.


4    Selecting and Ordering Subcontexts and Inferring
     GMRTs
Algorithms for inferring GMRTs are constructed by the rules of selecting and
ordering subcontexts of the main classification context. Before entering into the
details, let us recall some extra definitions. Let t be a set of values such that
(obj(t), t) is a test for G+ . We say that the value m ∈ M, m ∈ t is essential
in t if (obj(t \ m), (t \ m)) is not a test for a given set of objects. Generally, we
are interested in finding the maximal subset sbmax(t) ⊂ t such that (obj(t), t)
is a test but (obj(sbmax(t)), sbmax(t)) is not a test for a given set of positive
objects. Then sbmin(t) = t \ sbmax(t) is a minimal set of essential values in t.
Let s ⊆ G+ , assume also that (s, val(s)) is not a test.
    The object tj , j ∈ s, is said to be essential in s if (s\j, val(s\j)) proves
to be a test for a given set of positive objects. Generally, we are also interested
in finding the maximal subset sbmax(s) ⊂ s such that (s, val(s)) is not a test
but (sbmax(s), val(sbmax(s))) is a test for a given set of positive objects. Then
sbmin(s) = s \ sbmax(s) is a minimal set of essential objects in s.
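    A direct check of the definition of an essential value, again on Tab. 1, can be sketched
as follows (our code; to_be_test follows the DT definition of Sec. 2 and ignores its second
argument, which is kept only to mirror the paper's notation):

    D = {1: {"Low", "Blond", "Blue"},   2: {"Low", "Brown", "Blue"},
         3: {"Tall", "Brown", "Hazel"}, 4: {"Tall", "Blond", "Hazel"},
         5: {"Tall", "Brown", "Blue"},  6: {"Low", "Blond", "Hazel"},
         7: {"Tall", "Red", "Blue"},    8: {"Tall", "Blond", "Blue"}}
    G_plus = {1, 7, 8}

    def obj(B):
        return {i for i, d in D.items() if B <= d}

    def to_be_test(A, B):
        return bool(A) and A <= G_plus

    def essential_values(t):
        # m is essential in t if (obj(t \ {m}), t \ {m}) is not a test
        return {m for m in t if not to_be_test(obj(t - {m}), t - {m})}

    t = {"Blond", "Blue"}        # (obj(t), t) = ({1, 8}, {Blond, Blue}) is a test
    print(essential_values(t))   # both values are essential: removing either destroys the test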
    An Approach for Searching for the Initial Content of STGOOD. At the
beginning of inferring GMRTs, the set STGOOD is empty. Next we describe
the procedure used to obtain an initial content of it. This procedure extracts a quasi-
maximal subset s∗ ⊆ G+ which is the extent of a test for G+ (maybe not a good one).
    We begin with the first index i1 of s∗ , then we take the next index i2 of
s∗ and evaluate the function to_be_test({i1 , i2 }, val({i1 , i2 })). If the value of the
function is true, then we take the next index i3 of s∗ and evaluate the function
to_be_test({i1 , i2 , i3 }, val({i1 , i2 , i3 })). If the value of the function is false, then
the index i2 of s∗ is skipped and the function to_be_test({i1 , i3 }, val({i1 , i3 }))
is evaluated. We continue this process until we reach the last index of s∗ .
    The complexity of this procedure is evaluated as the product of ||s∗ ||
and the complexity of the function to_be_test(). To obtain the initial content of
STGOOD, we use the set SPLUS = {splus(m) | m ∈ M } and apply the procedure
described above to each element of SPLUS.
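    This greedy pass is easy to sketch (our code; it assumes to_be_test and val helpers
such as those sketched for Tab. 1 above, and it only illustrates the scanning order
described in the text):

    def quasi_maximal_extent(s_star, val, to_be_test):
        # Keep the first index unconditionally; keep each further index only if
        # the accumulated subset still yields a test.
        kept = []
        for i in s_star:
            candidate = set(kept) | {i}
            if not kept or to_be_test(candidate, val(candidate)):
                kept.append(i)
        return set(kept)

    # On Tab. 1 with s* = [1, 7, 8] (= obj(Blue) ∩ G+), the pass keeps 1, skips 7
    # (val({1, 7}) = {Blue} is not a test) and keeps 8, giving the extent {1, 8}
    # of the GMRT ((1, 8), Blond Blue).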
    The idea of using subcontexts in inferring GMRTs, described in Sec. 3, can be
presented in pseudo-code form, see Fig. 1. It presents a modification of ASTRA;
DIAGaRa and a hybrid approach can easily be formalized in the same way.
The example below describes two general hybrid methods.
    The initial part of GenAllGMRTs() has been discussed above. The abbreviation
LEV stands for the List (set) of Essential Values. The function DelObj(M, G+ )
returns the modified G+ and flag. The variable flag is necessary for switching at-
tributive subtasks. The novelty of ASTRA-2 is mainly based on using LEV.
There is the new function ChoiceOfSubtask(); it returns na := LEVj with
the maximal |splus(LEVj )|. MainContext, used in FormSubTask(na, M, G+ ), con-
sists of the object descriptions. There is the auxiliary function kt(m), which is true if
(m′ ⊆ G− ) = false and false otherwise.
    To illustrate this procedure, we use the sets D+ and D− represented in
Tab. 2 and Tab. 3 (our illustrative example). In these tables, M = {m1 , . . . , m26 }.
The set SPLUS0 for the positive class of examples is given in Tab. 4. The initial content of


 (a) GenAllGMRTs
  1. Algorithm GenAllGMRTs()
     Input: G, M
     Output: STGOOD
  2. begin
  3.    Forming STGOOD;
  4.    Forming and Ordering LEV;
  5.    flag := 1;
  6. end
  7. while true do
  8.    while flag = 1 do
  9.        M, flag := DelVal(M, G+);
 10.        if flag = 1 then
 11.            return;
 12.        end
 13.        G+, flag := DelObj(M, G+);
 14.    end
 15.    if M′ ⊆ G− or G+ ⊆ STGOOD then
 16.        return STGOOD;
 17.    end
 18.    MSUB := ∅;
 19.    GSUB := ∅;
 20.    ChoiceOfSubtask();
 21.    MSUB, GSUB := FormSubTask(na, M, G+);
 22.    GenAllGMRTs();
 23.    M := M \ Mna;
 24.    G+, flag := DelObj(M, G+);
 25. end

 (b) DelVal
  1. Algorithm DelVal()
  2.    i := 1;
  3.    flag := 0;
  4.    while i ≤ |M| do
  5.        if Mi′ ⊆ G+ then
  6.            M := M \ Mi;
  7.            flag := 1;
  8.        end
  9.        else if kt(Mi′ ∩ G+) then
 10.            j := 1;
 11.            while j ≤ |STGOOD| do
 12.                if STGOODj ⊆ Mi′ ∩ G+ then
 13.                    STGOOD := STGOOD \ STGOODj;
 14.                end
 15.            end
 16.            STGOOD := STGOOD ∪ (Mi′ ∩ G+);
 17.            M := M \ Mi;
 18.            flag := 1;
 19.    return;
 20. end

 (c) DelObj
  1. Algorithm DelObj()
  2.    i := 1;
  3.    flag := 0;
  4.    while i ≤ |G+| do
  5.        if G+(i) ⊆ M \ LEV then
  6.            G+ := G+ \ G+(i);
  7.            flag := 1;
  8.        end
  9.    end
 10.    return;

 (d) FormSubTask
  1. Algorithm FormSubTask()
  2.    i := 1;
  3.    GSUB := M′na ∩ G+;
  4.    while i ≤ |GSUB| do
  5.        MSUB := MSUB ∪ (MainContext(GSUB(i)) ∩ M);
  6.    end
  7.    return;

                        Fig. 1. Algorithms of ASTRA-2


STGOOD0 is {(2, 10), (3, 10), (3, 8), (4, 12), (1, 4, 7), (1, 5, 12), (2, 7, 8), (3, 7,
12), (1, 2, 12, 14), (2, 3, 4, 7), (4, 6, 8, 11)}.



                 Table 2. The set D+ of positive object descriptions

             G D+
             1 m1 m2 m5 m6 m21 m23 m24 m26
             2 m4 m7 m8 m9 m12 m14 m15 m22 m23 m24 m26
             3 m3 m4 m7 m12 m13 m14 m15 m18 m19 m24 m26
             4 m1 m4 m5 m6 m7 m12 m14 m15 m16 m20 m21 m24 m26
             5 m2 m6 m23 m24
             6 m7 m20 m21 m26
             7 m3 m4 m5 m6 m12 m14 m15 m20 m22 m24 m26
             8 m3 m6 m7 m8 m9 m13 m14 m15 m19 m20 m21 m22
             9 m16 m18 m19 m20 m21 m22 m26
             10 m2 m3 m4 m5 m6 m8 m9 m13 m18 m20 m21 m26
             11 m1 m2 m3 m7 m19 m20 m21 m22 m26
             12 m2 m3 m16 m20 m21 m23 m24 m26
             13 m1 m4 m18 m19 m23 m26
             14 m23 m24 m26




    In these tables we denote the subsets of values {m8 , m9 } and {m14 , m15 } by ma and
mb , respectively. Applying the operation generalization_of(s) = s″ = obj(val(s)) to
every s ∈ STGOOD, we obtain STGOOD1 = {(2, 10), (3, 10), (3, 8), (4, 7, 12), (1, 4,
7), (1, 5, 12), (2, 7, 8), (3, 7, 12), (1, 2, 12, 14), (2, 3, 4, 7), (4, 6, 8, 11)}.
    By Th. 1, we can delete the value m12 from consideration, see splus(m12 ) in Tab. 4.
The initial content of STGOOD allows us to decrease the number of calls to the
procedure to_be_test() and the number of insertions of test extents into STGOOD.
    The number of subtasks to be solved. This number is determined
by the number of essential values in the set M . A quasi-minimal subset of
essential values in M can be found by a procedure analogous to the one used to
search for the initial content of STGOOD. We begin with
the first value m1 of M , then we take the next value m2 of M and evalu-
ate the function to_be_test(obj({m1 , m2 }), {m1 , m2 }). If the value of the func-
tion is false, then we take the next value m3 of M and evaluate the function
to_be_test(obj({m1 , m2 , m3 }), {m1 , m2 , m3 }). If the value of the function is true,
then the value m2 of M is skipped and the function to_be_test(obj({m1 , m3 }), {m1 ,
m3 }) is evaluated. We continue this process until we reach the last value of M .
The complexity of this procedure is evaluated as the product of ||M || and the
complexity of the function to_be_test(). In Tab. 2 and Tab. 3 we have the following LEV:
{m16 , m18 , m19 , m20 , m21 , m22 , m23 , m24 , m26 }.




                   Table 3. The set D− of negative object descriptions

 G D−                                   G D−
 15 m3 m8 m16 m23 m24           32 m1 m2 m3 m7 m9 m13 m18
 16 m7 m8 m9 m16 m18            33 m1 m5 m6 m8 m9 m19 m20 m22
 17 m1 m21 m22 m24 m26          34 m2 m8 m9 m18 m20 m21 m22 m23 m26
 18 m1 m7 m8 m9 m13 m16         35 m1 m2 m4 m5 m6 m7 m9 m13 m16
 19 m2 m6 m7 m9 m21 m23         36 m1 m2 m6 m7 m8 m13 m16 m18
 20 m19 m20 m21 m22 m24         37 m1 m2 m3 m4 m5 m6 m7 m12 m14 m15 m16
 21 m1 m20 m21 m22 m23 m24      38 m1 m2 m3 m4 m5 m6 m9 m12 m13 m16
 22 m1 m3 m6 m7 m9 m16          39 m1 m2 m3 m4 m5 m6 m14 m15 m19 m20 m23 m26
 23 m2 m6 m8 m9 m14 m15 m16     40 m2 m3 m4 m5 m6 m7 m12 m13 m14 m15 m16
 24 m1 m4 m5 m6 m7 m8 m16       41 m2 m3 m4 m5 m6 m7 m9 m12 m13 m14 m15 m19
 25 m7 m13 m19 m20 m22 m26      42 m1 m2 m3 m4 m5 m6 m12 m16 m18 m19 m20 m21 m26
 26 m1 m2 m3 m5 m6 m7 m16       43 m4 m5 m6 m7 m8 m9 m12 m13 m14 m15 m16
 27 m1 m2 m3 m5 m6 m13 m18      44 m3 m4 m5 m6 m8 m9 m12 m13 m14 m15 m18 m19
 28 m1 m3 m7 m13 m19 m21        45 m1 m2 m3 m4 m5 m6 m7 m8 m9 m12 m13 m14 m15
 29 m1 m4 m5 m6 m7 m8 m13 m16 46 m1 m3 m4 m5 m6 m7 m12 m13 m14 m15 m16 m23 m24
 30 m1 m2 m3 m6 m12 m14 m15 m16 47 m1 m2 m3 m4 m5 m6 m8 m9 m12 m14 m16 m18 m22
 31 m1 m2 m5 m6 m14 m15 m16 m26 48 m2 m8 m9 m12 m14 m15 m16




                                  Table 4. The set SPLUS0

      splus(m), m ∈ M                  splus(m), m ∈ M
      splus(ma ) → {2, 8, 10}         splus(m22 ) → {2, 7, 8, 9, 11}
      splus(m13 ) → {3, 8, 10}        splus(m23 ) → {1, 2, 5, 12, 13, 14}
      splus(m16 ) → {4, 9, 12}        splus(m3 ) → {3, 7, 8, 10, 11, 12}
      splus(m1 ) → {1, 4, 11, 13}     splus(m4 ) → {2, 3, 4, 7, 10, 13}
      splus(m5 ) → {1, 4, 7, 10}      splus(m6 ) → {1, 4, 5, 7, 8, 10}
      splus(m12 ) → {2, 3, 4, 7}      splus(m7 ) → {2, 3, 4, 6, 8, 11}
      splus(m18 ) → {3, 9, 10, 13}    splus(m24 ) → {1, 2, 3, 4, 5, 7, 12, 14}
      splus(m2 ) → {1, 5, 10, 11, 12} splus(m20 ) → {4, 6, 7, 8, 9, 10, 11, 12}
      splus(mb ) → {2, 3, 4, 7, 8}    splus(m21 ) → {1, 4, 6, 8, 9, 10, 11, 12}
      splus(m19 ) → {3, 8, 9, 11, 13} splus(m26 ) → {1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13, 14}


Proposition 1. Each positive object description contains at least one essential value.

Proof. Assume that for an object description ti , i ∈ G+ , we have ti ∩ LEV = ∅.
Then ti ⊆ M \LEV. But M \LEV is included in at least one of the negative object
descriptions and, consequently, ti also possesses this property. This contradicts
the fact that ti is the description of a positive object.                       □

Proposition 2. Assume that X ⊆ M . If X ∩ LEV = ∅, then to_be_test(X) =
false.

    Proposition 2 is a consequence of Proposition 1.
    Note that the description t14 = {m23 , m24 , m26 } is closed because
obj({m23 , m24 , m26 }) = {1, 2, 12, 14} and val({1, 2, 12, 14}) = {m23 , m24 , m26 }. We
also know that s = {1, 2, 12, 14} is closed (we obtained this result during the
generalization of elements of STGOOD). So (obj({m23 , m24 , m26 }), {m23 , m24 ,
m26 }) is a maximally redundant test for positive objects and we can, conse-
quently, delete t14 from consideration. As a result of deleting m12 and t14 , we
have the modified set SPLUS (Tab. 5).
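
To make the closedness check concrete, the following small Python sketch (not
the authors' implementation; it uses a made-up toy dataset instead of Tab. 2 and
Tab. 3) applies the derivation operators obj and val, together with the test
criterion assumed here, namely that no negative description contains the
candidate intent:

# A minimal sketch (not the authors' code): obj/val operators and the
# closedness/test checks, on a small made-up set of object descriptions.
D_PLUS = {                      # hypothetical positive object descriptions
    1: {"m23", "m24", "m26"},
    2: {"m4", "m23", "m24", "m26"},
    3: {"m3", "m24", "m26"},
}
D_MINUS = [                     # hypothetical negative object descriptions
    {"m3", "m23"},
    {"m4", "m26"},
]

def obj(B):
    """Positive objects whose descriptions contain all values of B."""
    return {i for i, t in D_PLUS.items() if B <= t}

def val(A):
    """Values common to all positive objects in A."""
    ts = [D_PLUS[i] for i in A]
    return set.intersection(*ts) if ts else set()

def is_test(B):
    """Assumed criterion: no negative description contains B."""
    return all(not B <= t for t in D_MINUS)

def is_maximally_redundant(B):
    """The intent of a maximally redundant test is closed: val(obj(B)) = B."""
    return val(obj(B)) == B

B = {"m23", "m24", "m26"}
print(obj(B), is_test(B), is_maximally_redundant(B))   # {1, 2} True True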



                               Table 5. The set SPLUS1

     splus(m), m ∈ M                  splus(m), m ∈ M
     splus(ma ) → {2, 8, 10}         splus(m22 ) → {2, 7, 8, 9, 11}
     splus(m13 ) → {3, 8, 10}        splus(m23 ) → {1, 2, 5, 12, 13}
     splus(m16 ) → {4, 9, 12}        splus(m3 ) → {3, 7, 8, 10, 11, 12}
     splus(m1 ) → {1, 4, 11, 13}     splus(m4 ) → {2, 3, 4, 7, 10, 13}
     splus(m5 ) → {1, 4, 7, 10}      splus(m6 ) → {1, 4, 5, 7, 8, 10}
                                     splus(m7 ) → {2, 3, 4, 6, 8, 11}
     splus(m18 ) → {3, 9, 10, 13}    splus(m24 ) → {1, 2, 3, 4, 5, 7, 12}
     splus(m2 ) → {1, 5, 10, 11, 12} splus(m20 ) → {4, 6, 7, 8, 9, 10, 11, 12}
     splus(mb ) → {2, 3, 4, 7, 8}    splus(m21 ) → {1, 4, 6, 8, 9, 10, 11, 12}
     splus(m19 ) → {3, 8, 9, 11, 13} splus(m26 ) → {1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13}




    The main question is how we should approach the problem of selecting and
ordering subtasks (subcontexts). Consider Tab. 6 with auxiliary information. It
is clear that if we obtain all the intents of GMRTs entering into the descriptions
of objects 1, 2, 3, 5, 7, 9, 10, 12, then the main task will be over, because the
remaining object descriptions (objects 4, 6, 8, 11) give, in their intersection, the
intent of an already known test (see the initial content of STGOOD).
Thus we have to consider only the subcontexts of essential values associated with
object descriptions 1, 2, 3, 5, 7, 9, 10, 12, 13. The number of such subcontexts
is 39. But this estimate is not realistic.



                            Table 6. Auxiliary information
          index of object m16 m18 m19 m20 m21 m22 m23 m24 m26   Σj mij
          1                                   ×         ×    ×    × 4
          2                                        ×    ×    ×    × 4
          3                      ×   ×                       ×    × 4
          5                                             ×    ×      2
          7                              ×         ×         ×    × 4
          9                 ×    ×   ×   ×    ×    ×              × 7
          10                     ×       ×    ×                   × 4
          12                ×            ×    ×         ×    ×    × 4
          13                     ×   ×                  ×         × 4
          4                 ×            ×    ×              ×    ×
          6                              ×    ×                   ×
          8                          ×   ×    ×    ×              ×
          11                         ×   ×    ×    ×              ×
          Σ di             2   4   3   4    4    3    5    6    8   39




    We begin by ordering the indices of objects by the number of times they enter
tests in STGOOD1 , see Tab. 7.



                    Table 7. Ordering of object indices in STGOOD1

         Index of object                        9  13  5  10  1  2  3  12  7
         Number of occurrences in STGOOD1       0   0  1   2  3  4  4   4  5




    Then we continue with object descriptions t9 and t13 . Now we should select
the subcontexts (subtasks) based on proj(t × m), where t is an object description
containing the smallest number of essential values and m is an essential value in
t entering the smallest number of object descriptions. After solving each subtask,
we have to correct the sets SPLUS, STGOOD, and the auxiliary information.
So the first subtask is t9 × m16 . Solving this subtask, we do not obtain any new
test, but we can delete m16 from t9 ; then we solve the subtask t9 × m19 . As
a result, we introduce s = {9, 11} into STGOOD and delete t9 from consideration,
because m16 and m19 are the only essential values in this object description.
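
The selection rule just described can be phrased as a small procedure. The
Python sketch below is illustrative only (it is not the authors' implementation);
the dictionaries ess and splus stand for the auxiliary information of Tab. 6 and
the set SPLUS of Tab. 5, and the values shown are a reduced toy excerpt.

# Pick the next subtask t x m: t has the fewest essential values, and m is
# the essential value of t occurring in the fewest object descriptions.
def next_subtask(ess, splus):
    # ess: object index -> set of its essential values
    # splus: value -> set of objects whose descriptions contain it
    t = min(ess, key=lambda i: len(ess[i]))            # fewest essential values
    m = min(ess[t], key=lambda v: len(splus[v]))       # rarest essential value in t
    return t, m

ess = {9: {"m16", "m18", "m19"}, 13: {"m18", "m19", "m23", "m26"}}   # toy excerpt
splus = {"m16": {4, 9, 12}, "m18": {3, 9, 10, 13}, "m19": {3, 8, 9, 11, 13},
         "m23": {1, 2, 5, 12, 13}, "m26": {1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13}}

print(next_subtask(ess, splus))   # -> (9, 'm16'), the first subtask above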
    In the example (method 1), we have the following subtasks (Tab. 8).
    Tab. 10 shows the sets STGOOD and TGOOD. None of the subtasks required
recursion. A simpler method of ordering contexts is based on the basic recursive
procedure for solving any kind of subtask described in the previous section. At



                   Table 8. The sequence of subtasks (method 1)

         N subcontext Extent of New Test Deleted values       Deleted objects
         1 t9 × m16
         2 t9 × m19    (9, 11)                                t9
         3 t13 × m18
         4 t13 × m19   (13)                m16 , m18          t13
         5 t5 × m23                        m23
         6 t5 × m24                                           t5
         7 t10 × m20   (8, 10)
         8 t10 × m21
         9 t10 × m26                       ma , m13 , m4 , m5 t10
         10 t1 × m21
         11 t1 × m24                       m1 , m2            t1
         12 t2 × m22   (7, 8, 11)          m22
         13 t2 × m22
         14 t2 × m24                                          t2
         15 t3 × m19   (3, 11)             m19
         16 t3 × m24                       m24                t12 , t7
         17 t3 × m26                                          t3




each level of recursion, we can select the value entering into the greatest number
of object descriptions; the object descriptions not containing this value generate
the contexts in which to find the GMRTs whose intents are included in them.
For our example, value m26 does not cover two object descriptions: t5 and t8 .
The initial context is associated with m26 . The sequence of subtasks in the basic
recursive procedure is given in Tab. 9 (method 2). We assume, in this example,
that the GMRT whose intent is equal to t14 has already been obtained.
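
As an illustration of the selection step of method 2 (choose the value occurring
in the greatest number of positive object descriptions), the sketch below uses a
few entries of SPLUS1 from Tab. 5; it is not the authors' code.

# Method 2: the value with the largest splus coverage determines the initial
# context; descriptions not containing it generate the complementary subcontexts.
def most_covering_value(splus, objects):
    return max(splus, key=lambda v: len(splus[v] & objects))

splus = {"m20": {4, 6, 7, 8, 9, 10, 11, 12},
         "m24": {1, 2, 3, 4, 5, 7, 12},
         "m26": {1, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13}}
objects = set(range(1, 14))                      # positive objects t1, ..., t13

v = most_covering_value(splus, objects)
print(v, objects - splus[v])                     # -> m26 {8, 5}: t5 and t8 not covered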
    We have considered only two possible ways of constructing GMRTs based on
decomposing the main classification context into subcontexts and ordering them
by the use of essential values and objects. It is also possible to use the two sets
QT = {{i, j} ⊆ G+ | ({i, j}, val({i, j})) is a test for G+ } and QAT = {{i, j} ⊆
G+ | ({i, j}, val({i, j})) is not a test for G+ } for forming subcontexts and ordering
them in the form of a tree structure.

5   Conclusion
In this paper, the decomposition of inferring good classification tests into sub-
tasks of the first and second kinds is presented. This decomposition makes it
possible, in principle, to transform the process of inferring good tests into a
step-by-step reasoning process.
    The rules for forming and reducing subcontexts are given in this paper. Vari-
ous possibilities for constructing algorithms for inferring GMRTs with the use of
both kinds of subcontexts are considered, depending on the nature of the GMRT features.




                    Table 9. The sequence of subtasks (method 2)

                                                                        Object descriptions
         Context,             Extents of tests         Values deleted
N                                                                            deleted
      associated with            obtained               from context
                                                                          from context
                               (2, 10), (3, 10),ma , m13 , mb ,              t10
1     m26
                             (2, 3, 4, 7), (1, 4, 7)m5 , m6
                                              m3 , m20 , m23 , m1 ,
                               (3, 7, 12),
2 m26 , m24                                   m2 , m4 , m7 , m16 ,
                                (4, 7, 12)
                                                m18 , m19 , m22
   Subtask is over; return to the previous context and delete m24
                                              m3 , m7 , m16 , m18 ,
3 m26 , not m24 , m23             (13)
                                                m19 , m20 , m22
   Subtask is over; return to the previous context, delete m23
                                              m2 , m3 , m4 , m16 ,
4 m26 , not m24 , not m23
                                                m18 m19 , m21
   m26 , m22 , not m24 ,
5                            (9,11), (7,11)                                t2 , t7
   not m23
   Subtask is over; return to the previous context and delete m22
   m26 , not m24 ,                            m2 , m3 , m4 , m16 ,    t7 , t 9 , t 2 , t 3
6                           (3,11), (4,6,11)
   not m23 , not m22                          m18 , m19
   Subtask is over; we have obtained all GMRTs whose intents contain m26
7 Context t5                    (1,5,12)                                      t5
   Subtask is over; we have found all GMRTs whose intents are contained in t5                .
                                              m3 , m20 , mb , m6 ,
8 Context t8 × m22          (7,8,11), (2,7,8)
                                              ma , m13 , m19 , m21
   Subtask is over; return to the previous context and delete m22
   Context t8
9                                (8,10)               ma                   t2 , t7
   without m22
   Context t8 × m21
10                             (4,6,8,11)       m7 , m13 , m19         t6 , t10 , t11
   without m22
   Subtask is over; return to the previous context and delete m21 , m20
   Context t8 without
11                               (3, 8)                              t4 , t6 , t10 , t11
   m22 , m21 , m20
   Subtask is over; we have found all GMRTs whose intents are contained in t8 .



                      Table 10. The sets STGOOD and TGOOD

         N STGOOD TGOOD                       N STGOOD TGOOD
         1   13         m1 m4 m18 m19 m23 m26 9 2,7,8        mb m22
         2   2,10       m4 ma m26             10 1,5,12      m2 m23 m24
         3   3,10       m3 m4 m13 m18 m26     11 4,7,12      m20 m24 m26
         4   8,10       m3 m6 ma m13 m20 m21 12 3,7,12       m3 m24 m26
         5   9,11       m19 m20 m21 m22 m26   13 7,8,11      m3 m20 m22
         6   3,11       m3 m7 m19 m26         14 2,3,4,7     m4 m12 mb m24 m26
         7   3,8        m3 m7 m13 mb m19      15 4,6,8,11    m7 m20 m21
         8   1,4,7      m5 m6 m24 m26         16 1,2,12,14   m23 m24 m26




References
1. Chegis, I., Yablonskii, S.: Logical methods of electric circuit control. Trudy Mian
   SSSR 51, 270–360 (1958), (in Russian)
2. Ferré, S., Ridoux, O.: The use of associative concepts in the incremental building
   of a logical context. In: Priss, U., Corbett, D., Angelova, G. (eds.) ICCS. Lecture
   Notes in Computer Science, vol. 2393, pp. 299–313. Springer (2002)
3. Ganter, B., Wille, R.: Formal concept analysis: mathematical foundations. Springer,
   Berlin (1999)
4. Ganter, B., Kuznetsov, S.O.: Formalizing hypotheses with concepts. In: Conceptual
   Structures: Logical, Linguistic, and Computational Issues. pp. 342–356. Springer-
   Verlag (2000)
5. Naidenova, X.A.: DIAGARA: An Incremental Algorithm for Inferring Implicative
   Rules from Examples. Inf. Theories and Application 12 - 2, 171 – 196 (2005)
6. Naidenova, X.A., Plaksin, M.V., Shagalov, V.L.: Inductive Inferring All Good Clas-
   sification Tests. In: Valkman, J. (ed.) ”Knowledge-Dialog-Solution”, Proceedings of
   International Conference. vol. 1, pp. 79 – 84. Kiev Institute of Applied Informatics,
   Kiev, Ukraine (1995)
7. Naidenova, X.A., Polegaeva, J.G.: An Algorithm of Finding the Best Diagnostic
   Tests. In: Mintz, G., Lorents, E. (eds.) The 4-th All Union Conference ”Application
   of Mathematical Logic Methods”. pp. 87 – 92 (1986), (in Russian)
8. Naidenova, X., Ermakov, A.: The decomposition of good diagnostic test inferring
   algorithms. In: Alty, J., Mikulich, L., Zakrevskij, A. (eds.) ”Computer-Aided Design
   of Discrete Devices” (CAD DD2001), Proceedings of the 4-th Inter. Conf., vol. 3,
   pp. 61 – 68. Minsk (2001)
9. Ore, O.: Galois connections. Trans. Amer. Math. Soc 55, 494–513 (1944)
       Removing an incidence from a formal context

                           Martin Kauer* and Michal Krupka**

                             Department of Computer Science
                               Palacký University in Olomouc
                            17. listopadu 12, CZ-77146 Olomouc
                                       Czech Republic
                                    martin.kauer@upol.cz
                                   michal.krupka@upol.cz



           Abstract. We analyze changes in the structure of a concept lattice cor-
           responding to a context resulting from a given context with a known
           concept lattice by removing exactly one incidence. We identify the set
           of concepts affected by the removal and show how they can be used for
           computing concepts in the new concept lattice. We present algorithms
           for incremental computation of the new concept lattice, with or without
           structural information.


  1      Introduction

  When computing concept lattices of two very similar contexts (i.e., differing only
  in a small number of incidences), it does not seem efficient to compute both
  concept lattices independently. Rather, an incremental method computing one
  of the lattices from the other would be more desirable. Also, analyzing structural
  differences between concept lattices of two similar contexts is interesting
  from the theoretical point of view.
      This paper presents first results in this direction. Namely, we consider two
  formal contexts differing in just one incidence and develop a method of comput-
  ing the concept lattice of the context without the incidence from the other one.
  In other words, we give a first answer to the question “What happens to the
  concept lattice, if we remove one cross from the context?”.
      Our results are the following. We consider contexts hX, Y, Ii and hX, Y, Ji
  such that J results from I by removing exactly one incidence. Further we consider
  the respective concept lattices B(I) and B(J). For these contexts and concept
  lattices we

   1. identify concepts in B(I), affected by the removal (they form an interval in
      B(I)),
   *  The author acknowledges support by IGA of Palacky University, No. PrF 2014 034
   ** The author acknowledges support by the ESF project No. CZ.1.07/2.3.00/20.0059.
      The project is co-financed by the European Social Fund and the state budget of the
      Czech Republic.


© Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 195–207,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.


 2. show how they transform to concepts in the new concept lattice (they will
    either vanish entirely, or transform to one or two concepts),
 3. derive several further results on the correspondence between the two lattices,
 4. propose two basic algorithms for transforming incrementally B(I) to B(J).

   Several algorithms for incremental computation of concept lattices have been
developed in the past [1, 5, 8, 6, 7, 2] (see also [4] for a comparison of some of the
algorithms). In general, the algorithms build a concept lattice incrementally
by modifying formal contexts by adding or removing objects one by one. Our
approach is different as we focus on removing just one incidence.


2     Formal concept analysis

Formal Concept Analysis has been introduced in [9], our basic reference is [3].
A (formal) context is a triple C = hX, Y, Ii where X is a set of objects, Y a set
of attributes and I ⊆ X × Y a binary relation between X and Y . For hx, yi ∈ I
it is said “The object x has the attribute y”.
     For subsets A ⊆ X and B ⊆ Y we set

               A↑I = {y ∈ Y | for each x ∈ A it holds hx, yi ∈ I},
               B ↓I = {x ∈ X | for each y ∈ B it holds hx, yi ∈ I}.

The pair h↑I , ↓I i is a Galois connection between sets X and Y , i.e., it satisfies
for each A, A1 , A2 ⊆ X, B, B1 , B2 ⊆ Y ,

 1. If A1 ⊆ A2 , then A2↑I ⊆ A1↑I ; if B1 ⊆ B2 , then B2↓I ⊆ B1↓I .
 2. A ⊆ A↑I ↓I and B ⊆ B ↓I ↑I .

    If A↑I = B and B ↓I = A, then the pair hA, Bi is called a formal concept of
hX, Y, Ii. The set A is called the extent of hA, Bi, the set B the intent of hA, Bi.
    A partial order ≤ on the set B(X, Y, I) of all formal concepts of hX, Y, Ii is
defined by hA1 , B1 i ≤ hA2 , B2 i iff A1 ⊆ A2 (iff B2 ⊆ B1 ). B(X, Y, I) along with
≤ is a complete lattice and is called the concept lattice of hX, Y, Ii. Infima and
suprema in B(X, Y, I) are given by
                                       *        [       ↓I ↑I +
                      ^                  \
                        hAj , Bj i =       Aj ,      Bj           ,              (1)
                     j∈J                   j∈J       j∈J
                                          *                 +
                     _                         [  ↑I ↓I \
                           hAj , Bj i =         Aj      ,  Bj .                   (2)
                     j∈J                       j∈J         j∈J

One of immediate consequences of (1) and (2) is that the intersection of any
system of extents (resp. intents) is again an extent (resp. intent).
    Mappings γI : x ↦ h{x}↑I ↓I , {x}↑I i and µI : y ↦ h{y}↓I , {y}↓I ↑I i assign to
each object x its object concept and to each attribute y its attribute concept. We
call a subset K ⊆ L, where L is a complete lattice, ⋁-dense (resp. ⋀-dense) if
and only if any element of L can be expressed as a supremum (resp. infimum) of some
elements from K. The set of all object concepts (resp. attribute concepts) is ⋁-
dense (resp. ⋀-dense) in B(X, Y, I). This can be easily seen from (1) (resp. (2)).
    We will also need a notion of an interval in lattice L. We call a subset K ⊆ L
an interval, if and only if there exist elements a, b ∈ L such that K = {k ∈
L | a ≤ k ≤ b}. We denote K as [a, b].


3    Problem statement and basic notions

Let hX, Y, Ii, hX, Y, Ji be two contexts over the same sets of objects and at-
tributes such that hx0 , y0 i ∉ J and I = J ∪ {hx0 , y0 i}.
    We usually denote concepts of hX, Y, Ii by c, c1 , hA, Bi, hA1 , B1 i, etc., and
concepts of hX, Y, Ji by d, d1 , hC, Di, hC1 , D1 i, etc. The respective concept
lattices will be denoted B(I) and B(J).
    Our goal is to find an efficient way to compute the concept lattice B(J) from
B(I). We provide two solutions to this problem. First solution computes just
elements of B(J), the second one adds also information on its structure. In this
section we introduce some basic tools and prove simple preliminary results.
    The following proposition shows a correspondence between the derivation
operators of contexts hX, Y, Ii and hX, Y, Ji.

Proposition 1. For each A ⊆ X and B ⊆ Y it holds

    A↑J = A↑I if x0 ∉ A,   and   A↑J = A↑I \ {y0 } if x0 ∈ A;
    B ↓J = B ↓I if y0 ∉ B,   and   B ↓J = B ↓I \ {x0 } if y0 ∈ B.

In particular, A↑J ⊆ A↑I and B ↓J ⊆ B ↓I .

Proof. Immediate.

   Formal concepts from the intersection B(I) ∩ B(J) are called stable. These
concepts are not influenced by removing the incidence hx0 , y0 i from I. When
computing B(J) from B(I), stable concepts need not be recomputed.

Proposition 2. A concept c ∈ B(I) is not stable iff c ∈ [γI (x0 ), µI (y0 )].

Proof. If c = hA, Bi ∉ [γI (x0 ), µI (y0 )], then either x0 ∉ A, or y0 ∉ B. If, for
instance, x0 ∉ A, then by Proposition 1, B = A↑I = A↑J , showing B is the
intent of a d ∈ B(J). Now by Proposition 1,

    B ↓J = B ↓I = A                         if y0 ∉ B,
    B ↓J = B ↓I \ {x0 } = A \ {x0 } = A     if y0 ∈ B,

and so d = c. The case y0 ∉ B is dual.
    To prove the opposite direction it is sufficient to notice that c ∈ [γI (x0 ), µI (y0 )]
is equivalent to hx0 , y0 i ∈ A × B, excluding the case hA, Bi ∈ B(J).


   For concepts c = hA, Bi ∈ B(I), d = hC, Di ∈ B(J) we set

      c = hA , B  i = hA↑J ↓J , A↑J i,    c = hA , B i = hB ↓J , B ↓J ↑J i,
      d = hC  , D i = hD↓I , D↓I ↑I i,    d = hC , D i = hC ↑I ↓I , C ↑I i.

Evidently, c , c ∈ B(J) and d , d ∈ B(I). c (resp. c ) is called the upper
(resp. lower ) child of c. In our setting, d = d (it would not be the case if I \ J
had more than one element). It is the (unique) concept from B(I), containing,
as a rectangle, the rectangle represented by d.
    The following theorem shows basic properties of the pairs h ,  i and h ,  i.

Proposition 3 (child operators). The mappings c 7→ c , c 7→ c , and d 7→
d are isotone and satisfy

        c ≤ c ,         d ≤ d ,         c = c ,          d = d ,
        c ≥ c ,         d ≥ d ,         c = c ,          d = d .

Proof. Isotony follows directly from definition.
    Let c = hA, Bi. From Proposition 1 we have A↑J ⊆ A↑I . Thus, A = A↑I ↓I ⊆
A ↑J ↓I
        , whence c ≤ c . Similarly, for d = hC, Di, D↓J ⊆ D↓I , whence D↓I ↑J ⊆
  ↓J ↑J
D         = D.
    To prove c = c it suffices to show that for the extent A of c it holds
A↑J ↓I ↑J = A↑J . By Proposition 1, we have two possibilities: either A↑J = A↑I ,
or A↑J = A↑I \ {y0 }. In the first case A↑J ↓I ↑J = A↑J holds trivially, in the
second case A↑J ↓I = A↑J ↓J (by the same proposition, because y0 ∈    / A↑J ) and
  ↑J ↓I ↑J     ↑J ↓J ↑J    ↑J                      
A          =A           = A . The equality d      = d can be proved similarly.
    The assertions for lower children are dual.

Corollary 1. The mappings c 7→ c and d 7→ d are closure operators and
the mappings c 7→ c and d 7→ d are interior operators.

    The following two propositions utilize the operators  ,  ,  ,  to give several
equivalent characterizations of stable concepts. First we prove an auxiliary proposition.

Proposition 4. The following assertions are equivalent for any c = hA, Bi ∈
B(I).
1. c is stable,
2. A↑I = A↑J ,
3. B ↓I = B ↓J .

Proof. “2 ⇒ 3”: by Proposition 1, A ⊆ A↑J ↓J = B ↓J ⊆ B ↓I = A.
    “3 ⇒ 2”: dual.
    The other implications follow by definition, since c is stable iff both 2. and
3. are satisfied.

Proposition 5 (stable concepts in B(I)). The following assertions are equiv-
alent for a concept c ∈ B(I):


1. c is stable,
2. c ∈
     / [γI (x0 ), µI (y0 )],
3. c = c ,
4. c = c ,
5. c = c .
Proof. Directly from Proposition 4.
Proposition 6 (stable concepts in B(J)). The following assertions are equiv-
alent for a concept d ∈ B(J):
1. d is stable,
2. d = d ,
3. d is stable.
Proof. Directly from Proposition 4.

4    Computing B(J ) without structural information
Proposition 7. The following holds for c = hA, Bi ∈ B(I) and d = hC, Di ∈
B(J): If d = c , then B ∈ {D, D ∪ {y0 }} and if d = c , then A ∈ {C, C ∪ {x0 }}.
Proof. By definition of  , D = A↑J , which is by Proposition 1 either equal to
B, or to B \ {y0 }. Similarly for  .
Proposition 8. A non-stable concept d ∈ B(J) is a (upper or lower) child of
exactly one concept c ∈ B(I). This concept is non-stable and satisfies c = d =
d .
Proof. Let d = hC, Di. Since d is non-stable, either C ↑I ≠ C ↑J or D↓I ≠
D↓J . Suppose C ↑I ≠ C ↑J and set A = C, B = C ↑I . By Proposition 1, x0 ∈ C,
y0 ∉ D and B = D ∪ {y0 }. By the same proposition, A = C = D↓J = D↓I ,
whence A is an extent of I. Thus, c = hA, Bi ∈ B(I) and it is non-stable because
x0 ∈ A and y0 ∈ B (Proposition 2). Since D = C ↑J = A↑J , d = c . A = C
yields c = d .
    We prove uniqueness of c. By Proposition 7, if for c′ = hA′ , B ′ i ∈ B(I) we
have d = c′ , then either B ′ = D, or B ′ = D ∪ {y0 }. The first case is impossible,
because it would make D an intent of I and, consequently, d a stable concept.
The second case means c′ equals c above. There is a third case left: if d = c′ ,
then C = B ′↓J . Since x0 ∈ C, we have y0 ∉ B ′ (Proposition 1). Thus, C = B ′↓I
(Proposition 1 again). Consequently, C ↑I = B ′ and, since y0 ∉ B ′ , B ′ = C ↑J
(Proposition 1 for the last time). Thus, d = c′ , which is a contradiction with
the non-stability of d.
    The case D↓I 6= D↓J is proved dually (in this case we obtain d = c ).
    The meaning of the previous proposition is that for each non-stable concept in
B(J) there exists exactly one non-stable concept in B(I) such that these two
are related via the mappings  ,  or  ,  .
    The proposition leads to the following simple way of constructing B(J) from B(I).
For each c ∈ B(I) the following has to be done:


 1. If c is stable, then it has to be added to B(J).
 2. If c is not stable, then each of its non-stable children (i.e., each non-stable
    element of {c , c }) has to be added to B(J).
This method ensures that all proper elements will be added to B(J) (i.e., no
element will be omitted) and each element will be added exactly once.
    Stable (resp. non-stable) concepts can be identified by means of Proposition
11. The following proposition shows a simple way of detecting whether a child
of a non-stable concept from B(I) is stable. It also describes the role of fixpoints
of operators  and  .
Proposition 9. Let c ∈ B(I) be non-stable. Then
    – c is non-stable iff c is a fixpoint of  ,
    – c is non-stable iff c is a fixpoint of  .
Proof. If c is not stable, then c = (c ) by Proposition 8. On the other hand, if
c is stable, then c = c by Proposition 6, which rules out c = c, because in
that case c would be equal to c , which would make it stable by Proposition 5.
   The proof for c is dual.
Example 1. In Fig. 1 we can see some examples of contexts with concepts of
different types w.r.t. operators  ,  .
      The method is utilized in Algorithm 1.


Algorithm 1 Transforming B(I) into B(J) (without structural information).
    procedure TransformConcepts(B(I))
       B(J) ← B(I);
       for all c = hA, Bi ∈ [γI (x0 ), µI (y0 )] do
          B(J) ← B(J) \ {c};
          if c = c then
              B(J) ← B(J) ∪ {c };
          end if
          if c = c then
              B(J) ← B(J) ∪ {c };
          end if
       end for
       return B(J);
    end procedure




    Time complexity of Algorithm 1 is clearly O(|B(I)| · |X| · |Y |) in the worst case.
Indeed, the number of non-stable concepts is at most |B(I)|, and the computation
of the operators  ,  can be done in O(|X| · |Y |) time.
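
To illustrate the idea behind Algorithm 1 in executable form, the following
minimal Python sketch is included. It is not the paper's pseudocode: it uses a
brute-force enumeration of concepts as a baseline, a small assumed toy context,
and it relies on Proposition 2 (a concept of B(I) is non-stable iff x0 ∈ A and
y0 ∈ B) and Proposition 8 (every non-stable concept of B(J) is a child of a
non-stable concept of B(I)).

# Minimal sketch: compute B(J) from B(I) after removing the incidence <x0, y0>.
from itertools import combinations

X = {1, 2, 3}                                   # toy context (an assumption)
Y = {"a", "b", "c"}
I = {(1, "a"), (1, "b"), (2, "a"), (2, "b"), (2, "c"), (3, "c")}
x0, y0 = 2, "b"
J = I - {(x0, y0)}

def up(A, R):                                   # A^up: attributes common to A
    return frozenset(y for y in Y if all((x, y) in R for x in A))

def down(B, R):                                 # B^down: objects having all of B
    return frozenset(x for x in X if all((x, y) in R for y in B))

def concepts(R):                                # naive baseline: all concepts of <X, Y, R>
    out = set()
    for r in range(len(X) + 1):
        for subset in combinations(sorted(X), r):
            A = down(up(frozenset(subset), R), R)
            out.add((A, up(A, R)))
    return out

def remove_incidence(BI):                       # keep stable concepts, expand the rest
    BJ = set()
    for A, B in BI:
        if not (x0 in A and y0 in B):           # stable (Proposition 2): keep as is
            BJ.add((A, B))
        else:                                   # non-stable: add its children in J
            BJ.add((down(up(A, J), J), up(A, J)))     # upper child
            BJ.add((down(B, J), up(down(B, J), J)))   # lower child
    return BJ

BI = concepts(I)
assert remove_incidence(BI) == concepts(J)      # agrees with direct computation
print(len(BI), "->", len(remove_incidence(BI)))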


5      Computing B(J ) with structural information
To analyze changes in the structure of a concept lattice after removing an inci-
dence, we need to investigate deeper properties of the closure operator  and
the interior operator  and the sets of their fixpoints.
[Fig. 1 contains six small cross-table contexts, one per panel: (a) the least
concept is not stable and is a fixpoint of both operators; (b) several non-trivial
non-stable concepts are fixpoints of both operators; (c) concept h{x0 , x1 }, {y0 , y2 }i
is a fixpoint of one of the operators but not of the other; (d) concept
h{x0 , x1 }, {y0 , y1 }i is a fixpoint of one of the operators but not of the other;
(e) concept h{x0 , x1 }, {y0 }i is not a fixpoint of any operator; (f) two concepts
are not fixpoints of any operator.]

Fig. 1: Examples of contexts with concepts of different types w.r.t. operators  ,  .


Proposition 10. Each stable concept is a fixpoint of both  and  .

Proof. Follows directly from Proposition 5 and Proposition 6.
   Since  is an interior operator and  is a closure operator on B(I), we
have for each c ∈ B(I), c ≤ c ≤ c . Thus, we can consider the interval
[c , c ] ⊆ B(I).
Proposition 11. For any c ∈ B(I), each concept from [c , c ]\{c} is stable.
Proof. First we prove that either c equals c, or is its upper neighbor. Let
c = hA, Bi. By definition, the intent of c is equal to A↑J ↓I ↑I . By Proposition
1, A↑J ∈ {B, B \ {y0 }}. Thus, A↑J ↓I ↑I ∈ {B, B \ {y0 }}. If it equals B, then
c = c. Otherwise the intents of c and c differ in exactly one attribute,
which makes c and c neighbors. Also notice that in this case c is stable
because its intent does not contain y0 (Proposition 2).
   Now let c0 ≤ c be non-stable. If c = c , then c0 ≤ c. If c < c , then c is
non-stable (Proposition 10) whereas c is stable. Non-stable concepts in B(I)


form an interval (Proposition 5). Thus, c0 ∨ c is non-stable and should be less than
c . Hence, c0 ∨ c = c (c is a lower neighbor of c ), concluding c0 ≤ c again.
   In a similar way we obtain the inequality c0 ≥ c for each non-stable c0 ≥
c .

    The following proposition shows an important property of the sets of fixpoints
w.r.t. the ordering on B(I): The set of fixpoints of  is a lower set whereas the
set of fixpoints of  is an upper set.

Proposition 12. Let c ∈ B(I) be a non-stable concept. If c is a fixpoint of  ,
then each c0 ≤ c is also a fixpoint of  . If c is a fixpoint of  , then each c0 ≥ c
is also a fixpoint of  .

Proof. Let c = c and c0 ≤ c. If c0 is stable, then the assertion follows by
Proposition 10. Suppose c0 is not stable. By extensivity and isotony of  , c0 ≤
                                                                
c0    ≤ c = c. Thus, c0      is not stable (Proposition 2) and c0     = c0 by
Proposition 11.
    The case c = c is dual.

    The above results are used in Algorithm 2, which computes the lattice B(J)
together with information about its ordering. The algorithm is more complicated
than the previous one. We provide a short description of the algorithm, together
with some examples. Due to space limitations, we will not dwell on details. We
will also leave out dual parts of similar cases.
    The algorithm processes all non-stable concepts of B(I) in a bottom-up di-
rection, using an arbitrary linear ordering v such that if c1 ≤ c2 , then c1 v c2 .
Each concept is either modified (by removing x0 from the extent or y0 from the
intent), or disposed of entirely. Sometimes, new concepts are created. All concepts
also get their lists of upper and lower neighbors updated.

Let c = hA, Bi be an arbitrary non-stable concept from B(I) (c ∈ [γI (x0 ), µI (y0 )]).

 – If c = c , c = c , then c will “split” into d1 ≤ d2 .
      - We set d1 = c and d2 = c .
      - The concept d1 will be a lower neighbor of d2 .
      - If for a lower neighbor cl of c it holds cl = cl  , cl 6= cl , then it
        will be a lower neighbor of d2 . It is necessary to check whether d1 and
        cl will be neighbors. It certainly holds cl ≤ d1 , but there can be
        a concept k, such that cl ≤ k ≤ d1 .
      - Dually for upper neighbors.
      - If for a non-stable neighbor cn of c it holds cn = cn  , cn = cn , i.e.,
        the same conditions as for c (cn will split into dn1 , dn2 ), then d1 , dn1
        and d2 , dn2 will be neighbors.
      - All other upper (resp. lower) neighbors will be neighbors of d2 (resp. d1 ).
 – If c = c and c 6= c , then c will lose y0 from its intent.
      - Denote the transformed c as d = hC, Di = c = hA, B \ {y0 }i.


      - If for an upper neighbor cu it holds cu = cu , cu 6= cu  (cu will
        lose x0 from its extent), then cu and d will become incomparable. It
        is necessary to check whether c , cu and c, cu  should be neighbors
        (again, there can be a concept between them).
 – If c 6= c and c = c , then c will lose x0 from its extent.
      - Denote transformed c as d = hC, Di = c = hA \ {x0 }, Bi.
 – If c 6= c and c 6= c , then c will vanish entirely.
      - It is necessary to check whether c and c should be neighbors (again,
        a concept can lie between them).
      - Denote by U the set of all upper neighbors of c, except for c . There
        is no fixed point of  among the elements from U .
      - Denote by L the set of all lower neighbors of c, except for c .
      - Concepts from U and L will not be neighbors.
        Concepts will either become incomparable or one of them or both will
        vanish. There is also no need for additional checks regarding neighbor-
        hood relationship between concepts from U and c (resp. L and c )
        or their neighbors.
      - It holds ∀cl ∈ L : cl ≤ c ≤ c , but it is necessary to check if there is a
        concept between them.
      - Similarly, it holds ∀cu ∈ U : c ≤ c ≤ cu , but again it is necessary to
        check if there is a concept between them.
    The number of iterations in TransformConceptLattice is at most |B(I)|,
which occurs when each concept in B(I) is non-stable. In each of the iterations,
tests c = c and c = c are performed and one of the procedures Split-
Concept, RelinkReducedIntent, UnlinkVanishedConcept is called. It
can be easily seen that the tests can be performed quite efficiently and do not
add to the time complexity.
    The most time consuming among the above three procedures is SplitCon-
cept. It iterates through all upper (which can be bounded by |X|) and lower
(which can be bounded by |Y |) neighbors of the concept c. For each of the
neighbors it might be necessary to check whether the interval between the neighbor
and a certain other concept is empty (in which case a new edge should be added).
This can be done by checking the intents/extents of its neighbors.
    The above considerations lead to the result that time complexity of Algorithm
2 is in the worst case O(|B| · |X|2 · |Y |).

Example 2. In Fig. 2, we can see some examples of transformations of non-stable
concepts from B(I) into concepts of B(J).

In Algorithm 2 we assume that the following functions are already defined:
 – UpperNeighbors(c) – returns the upper neighbors of c;
 – LowerNeighbors(c) – returns the lower neighbors of c;
 – Link(c1 , c2 ) – introduces the neighborhood relationship between c1 and c2 ;
 – Unlink(c1 , c2 ) – cancels the neighborhood relationship between c1 and c2 .



Algorithm 2 Transforming B(I) with structural information into B(J).
 procedure LinkIfNeeded(c1 , c2 )
    if @k ∈ B(I) : c1 < k < c2 then
        Link(c1 , c2 );
    end if
 end procedure

 procedure SplitConcept(c ∈ [γI (x0 ), µI (y0 )])
    d1 = c ; d2 = c ;
    Link(d1 , d2 );
    for all u ∈ U pperN eighbors(c) do
       U nlink(c, u); Link(d2 , u);
    end for
    for all l ∈ LowerN eighbors(c) do
       U nlink(l, c); Link(l, d1 );
    end for
    for all u ∈ U pperN eighbors(c) do
       if u 6= u then
           U nlink(d2 , u); Link(d1 , u); LinkIf N eeded(d2 , u );
       end if
    end for
    for all l = hC, Di ∈ LowerN eighbors(c) do
       if y0 ∈/ D then
           U nlink(l, d1 ); Link(l, d2 ); LinkIf N eeded(l , d1 );
       end if
    end for
    return d1 , d2 ;
 end procedure

 procedure RelinkReducedIntent(c ∈ [γI (x0 ), µI (y0 )])
    for all u = hC, Di ∈ U pperN eighbors(c) do
       if u 6= u then
           U nlink(c, u);
           LinkIf N eeded(c , u); LinkIf N eeded(c, u );
       end if
    end for
 end procedure

 procedure UnlinkVanishedConcept(c ∈ [γI (x0 ), µI (y0 )])
    for all u ∈ U pperN eighbors(c) do
       U nlink(c, u); LinkIf N eeded(c , u);
    end for
    for all l ∈ LowerN eighbors(c) do
       U nlink(l, c);
    end for
 end procedure

 procedure TransformConceptLattice(B(I))
    for all c = hA, Bi ∈ [γI (x0 ), µI (y0 )] from least to largest w.r.t. v do
       if c = c and c = c then                                                  . Concept will split.
           B(I) ← B(I) \ {c};
           B(I) ← B(I) ∪ SplitConcept(c);
       else if c 6= c and c = c then                                       . Extent will be smaller.
           A ← A \ {x0 };
       else if c = c and c 6= c then                                        . Intent will be smaller.
           RelinkReducedIntent(c);
           B ← B \ {y0 };
       else if c 6= c and c 6= c then                                         . Concept will vanish.
           B(I) ← B(I) \ {c};
           U nlinkV anishedConcept(c);
       end if
    end for
 end procedure
[Fig. 2 contains four small lattice diagrams, one per panel: (a) concepts become
incomparable; (b) the concept in the middle “splits into two”; (c) the concept
in the middle vanishes; (d) the concept in the middle vanishes, and there is
already another concept between its children.]

Fig. 2: Examples of transformations of non-stable concepts from B(I) into con-
cepts of B(J).


6    Conclusion
We analyzed changes in the structure of a concept lattice caused by the removal
of exactly one incidence from the associated formal context. We proved some
theoretical results and presented two algorithms with time complexities O(|B| ·
|X| · |Y |) (Algorithm 1; without structural information) and O(|B| · |X|2 · |Y |)
(Algorithm 2; with structural information).
     There exist several algorithms for incremental computation of concept lattices
[1, 5, 8, 6, 7, 2], based on addition and/or removal of objects. Our approach is new
in that we recompute a concept lattice based on the removal of just one incidence.
     Note that the algorithm proposed by Nourine and Raynaud in [7] has time
complexity O((|Y | + |X|) · |X| · |B|), which is better than complexity of our
Algorithm 2. However, experiments presented in [5] indicate that this algorithm
sometimes performs slower than some algorithms with time complexity O(|B| ·
|X|2 ·|Y |). In the case of our Algorithm 2, some preliminary experiments indicate
that the size of the interval of non-stable concepts is usually relatively small,
which substantially reduces the overall processing time of the algorithm.
     A natural next step would be to investigate adding incidences to a formal con-
text instead of removing them. This problem, however, seems to be more difficult
than the first one, namely because the set of non-stable concepts in the lattice
B(J) has a more complicated structure (it is not an interval) and also because not


all non-stable concepts in B(I) can be computed via the operator  . We will try
to address these issues in the future. We will also focus on the following:

 – experimenting with proposed algorithms on various datasets and comparing
   them with other known algorithms,
 – generalizing the results to allow removing and adding more incidences at the
   same time.


References
1. Carpineto, C., Romano, G.: Concept Data Analysis: Theory and Applications. John
   Wiley & Sons (2004)
2. Dowling, C.E.: On the irredundant generation of knowledge spaces. J. Math. Psy-
   chol. 37(1), 49–62 (1993)
3. Ganter, B., Wille, R.: Formal Concept Analysis – Mathematical Foundations.
   Springer (1999)
4. Kuznetsov, S.O., Obiedkov, S.: Comparing performance of algorithms for generating
   concept lattices. Journal of Experimental and Theoretical Artificial Intelligence 14,
   189–216 (2002)
5. Merwe, D., Obiedkov, S., Kourie, D.: Addintent: A new incremental algorithm for
   constructing concept lattices. In: Eklund, P. (ed.) Concept Lattices, Lecture Notes
   in Computer Science, vol. 2961, pp. 372–385. Springer Berlin Heidelberg (2004)
6. Norris, E.M.: An algorithm for computing the maximal rectangles in a binary rela-
   tion. Revue Roumaine de Mathématiques Pures et Appliquées 23(2), 243–250 (1978)
7. Nourine, L., Raynaud, O.: A fast algorithm for building lattices. Inf. Process. Lett.
   71(5-6), 199–204 (1999)
8. Outrata, J.: A lattice-free concept lattice update algorithm based on *CbO. In:
   Ojeda-Aciego, M., Outrata, J. (eds.) CLA. CEUR Workshop Proceedings, vol. 1062,
   pp. 261–274. CEUR-WS.org (2013)
9. Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts.
   In: Rival, I. (ed.) Ordered Sets, pp. 445–470. Boston (1982)
            Formal L-concepts with Rough Intents*

                              Eduard Bartl and Jan Konecny

                            Data Analysis and Modeling Lab
                   Dept. Computer Science, Palacky University, Olomouc
                           17. listopadu 12, CZ-77146 Olomouc
                                      Czech Republic


           Abstract. We provide a new approach to synthesis of Formal Concept
           Analysis and Rough Set Theory. In this approach, the formal concept is
           considered to be a collection of objects accompanied with two collections
           of attributes—those which are shared by all the objects and those which
           are possessed by at least one of the objects. We define concept-forming
           operators for these concepts and describe their properties. Furthermore,
           we deal with reduction of the data by rough approximation by given
           equivalence. The results are elaborated in a fuzzy setting.


  1      Introduction
  Formal concept analysis (FCA) [12] is a method of relational data analysis iden-
  tifying interesting clusters (formal concepts) in a collection of objects and their
  attributes (formal context), and organizing them into a structure called concept
  lattice. Numerous generalizations of FCA, which allow one to work with graded data,
  have been provided; see [19] and references therein.
      In a graded (fuzzy) setting, two main kinds of concept-forming operators—
  an antitone and an isotone one—were studied [2, 13, 20, 21], compared [7, 8] and even
  covered under a unifying framework [4, 18]. We describe concept-forming oper-
  ators combining both isotone and antitone operators in such a way that each
  formal (fuzzy) concept is given by two sets of attributes. The first one is a
  lower intent approximation, containing attributes shared by all objects of the
  concept; the second one is an upper intent approximation, containing those at-
  tributes which are possessed by at least one object of the concept. Thus, one can
  consider the two intents to be a lower and upper approximation of attributes
  possessed by an object.
      Several authors dealing with the synthesis of FCA and Rough Set Theory have
  noticed that intents formed by isotone and antitone operators (in both the crisp
  and the fuzzy setting) correspond to upper and lower approximations, respectively
  (see e.g. [15, 16, 24]). To the best of our knowledge, no one has studied concept-
  forming operators which would provide both approximations within one concept
  lattice.
      In this paper we present such concept-forming operators, the structure of their
  concepts, and the reduction of the data by means of rough approximations by equiv-
  alences. Due to page limitations we omit the proofs of some theorems.
    * Supported by grant no. P202/14-11585S of the Czech Science Foundation.


© Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 207–219,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.


2     Preliminaries

In this section we summarize the basic notions used in the paper.

Residuated Lattices and Fuzzy Sets We use complete residuated lattices as basic
structures of truth-degrees. A complete residuated lattice [1, 14, 23] is a struc-
ture L = ⟨L, ∧, ∨, ⊗, →, 0, 1⟩ such that ⟨L, ∧, ∨, 0, 1⟩ is a complete lattice, i.e.
a partially ordered set in which arbitrary infima and suprema exist; ⟨L, ⊗, 1⟩ is
a commutative monoid, i.e. ⊗ is a binary operation which is commutative, asso-
ciative, and a ⊗ 1 = a for each a ∈ L; ⊗ and → satisfy adjointness, i.e. a ⊗ b ≤ c
iff a ≤ b → c. 0 and 1 denote the least and greatest elements. The partial order
of L is denoted by ≤. Throughout this work, L denotes an arbitrary complete
residuated lattice.
     Elements a of L are called truth degrees. Operations ⊗ (multiplication) and
→ (residuum) play the role of (truth functions of) “fuzzy conjunction” and
“fuzzy implication”. Furthermore, we define the complement of a ∈ L as ¬a =
a → 0.
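
As a small concrete instance (an illustration added here, not taken from the
paper), the three-element Łukasiewicz chain L = {0, 0.5, 1} with a ⊗ b =
max(0, a + b − 1) and a → b = min(1, 1 − a + b) is a complete residuated lattice;
the following Python fragment checks the adjointness condition exhaustively.

# Adjointness a (x) b <= c  iff  a <= b -> c, checked on the Lukasiewicz chain {0, 0.5, 1}.
L = [0.0, 0.5, 1.0]

def tensor(a, b):                  # Lukasiewicz multiplication
    return max(0.0, a + b - 1.0)

def residuum(a, b):                # Lukasiewicz residuum
    return min(1.0, 1.0 - a + b)

assert all((tensor(a, b) <= c) == (a <= residuum(b, c))
           for a in L for b in L for c in L)
print("adjointness holds on the three-element Lukasiewicz chain")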
     An L-set (or fuzzy set) A in a universe set X is a mapping assigning to each
x P X some truth degree Apxq P L. The set of all L-sets in a universe X is
denoted LX , or LX if the structure of L is to be emphasized.
     The operations with L-sets are defined componentwise. For instance, the
intersection of L-sets A, B P LX is an L-set A X B in X such that pA X Bqpxq “
Apxq ^ Bpxq for each x P X. An L-set A P LX is also denoted tApxq{x | x P
Xu. If for all y P X distinct from x1 , . . . , xn we have Apyq “ 0, we also write
tApx1 q{x1 , . . . , Apxn q{xn u.
     An L-set A P LX is called normal if there is x P X such that Apxq “ 1, and
it is called crisp if Apxq P t0, 1u for each x P X. Crisp L-sets can be identified
with ordinary sets. For a crisp A, we also write x P A for Apxq “ 1 and x R A
for Apxq “ 0.
     Binary L-relations (binary fuzzy relations) between X and Y can be thought
of as L-sets in the universe X ˆ Y . That is, a binary L-relation I P LXˆY
between a set X and a set Y is a mapping assigning to each x P X and each
y P Y a truth degree Ipx, yq P L (a degree to which x and y are related by I). For
L-relation I P LXˆY we define its transpose I T P LY ˆX as I T py, xq “ Ipx, yq
for all x P X, y P Y .
     The composition operators are defined by
                                               ł
                               pI ˝ Jqpx, zq “   Ipx, yq b Jpy, zq,
                                        yPY
                                        ľ
                      pI Ž Jqpx, zq “         Ipx, yq Ñ Jpy, zq,
                                        yPY
                                        ľ
                      pI Ż Jqpx, zq “         Jpy, zq Ñ Ipx, yq
                                        yPY


for every I P LXˆY and J P LY ˆZ .


    A binary L-relation E is called an L-equivalence if it satisfies IdX Ď E
(reflexivity), E “ E T (symmetry), E ˝ E Ď E (transitivity).
    An L-set B P LY is compatible w.r.t. L-equivalence E P LY ˆY if

                              Bpy1 q b Epy1 , y2 q ď Bpy2 q.

for any y1 , y2 P Y .

Formal Concept Analysis in the Fuzzy Setting An L-context is a triplet xX, Y, Iy
where X and Y are (ordinary) sets and I P LXˆY is an L-relation between X
and Y . Elements of X are called objects, elements of Y are called attributes,
I is called an incidence relation. Ipx, yq “ a is read: “The object x has the
attribute y to degree a.” An L-context may be described as a table with the
objects corresponding to the rows of the table, the attributes corresponding to
the columns of the table and Ipx, yq written in cells of the table (for an example
see Fig. 1).


                                     α      β     γ      δ
                              A     0.5     0     1      0
                              B      1     0.5    1     0.5
                              C      0     0.5   0.5    0.5
                              D     0.5    0.5    1     0.5


    Fig. 1. Example of L-context with objects A,B,C,D and attributes α, β, γ, δ.


    Consider the following pairs of operators induced by an L-context ⟨X, Y, I⟩.
First, the pair ⟨↑, ↓⟩ of operators ↑ : L^X → L^Y and ↓ : L^Y → L^X is defined by

    A↑(y) = ⋀_{x∈X} A(x) → I(x, y),     B↓(x) = ⋀_{y∈Y} B(y) → I(x, y).     (1)

Second, the pair ⟨∩, ∪⟩ of operators ∩ : L^X → L^Y and ∪ : L^Y → L^X is defined by

    A∩(y) = ⋁_{x∈X} A(x) ⊗ I(x, y),     B∪(x) = ⋀_{y∈Y} I(x, y) → B(y).     (2)

    To emphasize that the operators are induced by I, we also denote the op-
erators by ⟨↑I, ↓I⟩ and ⟨∩I, ∪I⟩. Fixpoints of these operators are called formal
concepts. The set of all formal concepts (along with set inclusion) forms a com-
plete lattice, called the L-concept lattice. We denote the sets of all concepts (as well
as the corresponding L-concept lattice) by B↑↓(X, Y, I) and B∩∪(X, Y, I), i.e.

    B↑↓(X, Y, I) = {⟨A, B⟩ ∈ L^X × L^Y | A↑ = B, B↓ = A},
    B∩∪(X, Y, I) = {⟨A, B⟩ ∈ L^X × L^Y | A∩ = B, B∪ = A}.                   (3)


    For an L-concept lattice BpX, Y, Iq, where B is either B ÒÓ or B XY , denote the
corresponding sets of extents and intents by ExtpX, Y, Iq and IntpX, Y, Iq. That
is,

            ExtpX, Y, Iq “ tA P LX | xA, By P BpX, Y, Iq for some Bu,
                                                                                                        (4)
                IntpX, Y, Iq “ tB P LY | xA, By P BpX, Y, Iq for some Au.

    When displaying L-concept lattices, we use labeled Hasse diagrams to include
all the information on extents and intents. In B ÒÓ pX, Y, Iq, for any x P X, y P Y
and formal L-concept xA, By we have Apxq ě a and Bpyq ě b if and only if
there is a formal concept xA1 , B1 y ď xA, By, labeled by a{x and a formal concept
xA2 , B2 y ě xA, By, labeled by b{y. We use labels x resp. y instead of 1{x resp.
1
{y and omit redundant labels (i.e., if a concept has both the labels a{x and b{x
then we keep only that with the greater degree; dually for attributes). The whole
structure of B ÒÓ pX, Y, Iq can be determined from the labeled diagram using the
results from [2] (see also [1]).
    In B XY pX, Y, Iq, for any x P X, y P Y and formal L-concept xA, By we have
Apxq ě a and Bpyq ď b if and only if there is a formal concept xA1 , B1 y ď
xA, By, labeled by a{x and a formal concept xA2 , B2 y ě xA, By, labeled by b{y
(see examples depicted in Fig. 2).



                                                              B, 0.5{β, 0.5{δ
                         0.5                                         ‚
                          {γ
                          ‚
                                                           D, 0.5{α ‚
A, 0.5{α, γ ‚                    ‚ C, 0.5{β, 0.5{δ
                                        A, 0 {β, 0.5{δ ‚                            ‚ C, 0.5{β, 0.5{γ

                     D ‚
                                                             0.5
                                                                {B ‚                           ‚ C, 0 {α
 0.5
   {C, β, δ ‚                    ‚ 0.5{A, B, α
                                                                   0.5
                                                                     {A, 0.5{D ‚
                       ‚
                   0.5
                     {B, 0.5{D
                                                                                    ‚
                                                                                0
                                                                                    {γ

Fig. 2. Concept lattice BÒÓ pX, Y, Iq (left) and BXY pX, Y, Iq (right) of the L-context in
Fig. 1.


3    L-rough concepts

We consider concept-forming operators induced by an L-context ⟨X, Y, I⟩ defined
as follows:

Definition 1. Let ⟨X, Y, I⟩ be an L-context. Define L-rough concept-forming
operators as

    A^M = ⟨A↑, A∩⟩   and   ⟨B, B⟩^O = B↓ ∩ B∪

for A ∈ L^X and B, B ∈ L^Y. An L-rough concept is then a fixed point of ⟨M, O⟩,
i.e. a pair ⟨A, ⟨B, B⟩⟩ ∈ L^X × (L × L)^Y such that A^M = ⟨B, B⟩ and ⟨B, B⟩^O = A.¹
A↑ and A∩ are called the lower intent approximation and the upper intent approximation,
respectively.
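
A minimal computational sketch of Definition 1 (an illustration added here, not
the authors' code): it evaluates the four concept-forming operators and the pair
⟨M, O⟩ on the L-context of Fig. 1, assuming the Łukasiewicz operations on the
chain {0, 0.5, 1} (the paper does not fix a particular residuated lattice for this
example).

# L-rough concept-forming operators over the L-context of Fig. 1,
# assuming Lukasiewicz operations on L = {0, 0.5, 1} (an assumption).
def tensor(a, b):   return max(0.0, a + b - 1.0)    # multiplication
def residuum(a, b): return min(1.0, 1.0 - a + b)    # residuum

X = ["A", "B", "C", "D"]
Y = ["alpha", "beta", "gamma", "delta"]
I = {("A", "alpha"): 0.5, ("A", "beta"): 0.0, ("A", "gamma"): 1.0, ("A", "delta"): 0.0,
     ("B", "alpha"): 1.0, ("B", "beta"): 0.5, ("B", "gamma"): 1.0, ("B", "delta"): 0.5,
     ("C", "alpha"): 0.0, ("C", "beta"): 0.5, ("C", "gamma"): 0.5, ("C", "delta"): 0.5,
     ("D", "alpha"): 0.5, ("D", "beta"): 0.5, ("D", "gamma"): 1.0, ("D", "delta"): 0.5}

def up(A):   return {y: min(residuum(A[x], I[x, y]) for x in X) for y in Y}   # antitone
def down(B): return {x: min(residuum(B[y], I[x, y]) for y in Y) for x in X}   # antitone
def cap(A):  return {y: max(tensor(A[x], I[x, y]) for x in X) for y in Y}     # isotone
def cup(B):  return {x: min(residuum(I[x, y], B[y]) for y in Y) for x in X}   # isotone

def M(A):                       # A^M = <lower intent approx., upper intent approx.>
    return up(A), cap(A)

def O(B_low, B_up):             # <B_low, B_up>^O = intersection of the two extents
    d, u = down(B_low), cup(B_up)
    return {x: min(d[x], u[x]) for x in X}

A = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}        # crisp singleton {A}
B_low, B_up = M(A)
print(B_low)                    # lower intent approximation of {A}
print(B_up)                     # upper intent approximation (equal here, cf. Theorem 2)
print(O(B_low, B_up))           # the corresponding extent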
    That means, M gives intents w.r.t. both xÒ, Óy and xX, Yy; O then gives inter-
section of extents related to the corresponding intents.
    We denote the set of all fixed-points of xM, Oy, in correspondence with (3),
as B MO pX, Y, Iq and call it L-rough concept lattice. Below, we present an analogy
of the Main theorem on concept lattices for L-rough setting.
Theorem 1 (Main theorem on L-rough concept lattices).

(a) L-rough concept lattice B MO pX, Y, Iq is a complete lattice with suprema and
    infima defined as follows
                    ľ                        č      ď     č OM
                        xAi , B i , B i y “ x Ai , x B i , B i y y,
                        i                            i              i            i
                                                                MO
                       ł                             ď                      č         ď
                            xAi , B i , B i y “ xp           Ai q       ,       Bi,       B i y.
                        i                                i                  i         i

(b) Moreover, a complete lattice V “ xV, ďy is isomorphic to B MO pX, Y, Iq iff
    there are mappings
                       γ :X ˆLÑV               and            µ:Y ˆLˆLÑV
     such that γpX ˆLq is supremally dense in V, µpY ˆLˆLq is infimally dense
     in V, and
       a b b ď Ipx, yq and Ipx, yq ď a Ñ b                   is equivalent to              γpx, aq ď µpy, b, bq
     for all x P X, y P Y, a, b, b P L.
    When drawing a concept lattice we label nodes as in B↑↓ for lower intent
approximations and as in B∩∪ for upper intent approximations. We write a/y̲ or a/ȳ
instead of just a/y to distinguish them. Fig. 3 (middle) shows the L-rough concept
lattice of the L-context from Fig. 1.
    The following theorem explains that normal extents have natural intent ap-
proximations, that is, B̲ ⊆ B̄.
¹ In what follows, we naturally identify ⟨A, ⟨B̲, B̄⟩⟩ with ⟨A, B̲, B̄⟩.


Theorem 2. For normal A ∈ L^X we have A↑ ⊆ A∩; for a crisp singleton A ∈ L^X
we have A↑ = A∩.

Proof. Since A is normal, there is x₁ ∈ X such that A(x₁) = 1. Then we have

    A↑(y) = ⋀_{x∈X} A(x) → I(x, y) ≤ A(x₁) → I(x₁, y) = I(x₁, y) =
                                                                            (5)
          = A(x₁) ⊗ I(x₁, y) ≤ ⋁_{x∈X} A(x) ⊗ I(x, y) = A∩(y)

for each y ∈ Y.
    For A being a crisp singleton, one can show A↑ = A∩ by changing all inequal-
ities in (5) to equalities.                                                   □

    Since ⟨△, ▽⟩ is defined via ⟨↑, ↓⟩ and ⟨∩, ∪⟩, one can expect that there is a
strong relationship between the associated concept lattices. In the rest of this
section, we summarize these relationships.

Theorem 3. For S ⊆ L^X, let [S] denote the L-closure span of S, i.e. the small-
est L-closure system containing S. We have

               [Ext↑↓(X, Y, I) ∪ Ext∩∪(X, Y, I)] = Ext△▽(X, Y, I).

Proof. “⊆”: Let A ∈ Ext↑↓(X, Y, I). Then A = ⟨A↑, A∩⟩▽ ∈ Ext△▽(X, Y, I).
Similarly for A ∈ Ext∩∪(X, Y, I).
    “⊇”: Let A ∈ Ext△▽(X, Y, I) and let ⟨B₁, B₂⟩ = A△. Then we have A =
B₁↓ ∩ B₂∪ ∈ [Ext↑↓(X, Y, I) ∪ Ext∩∪(X, Y, I)] since B₁↓ ∈ Ext↑↓(X, Y, I) and B₂∪ ∈
Ext∩∪(X, Y, I).

   From Theorem 3 one can observe that no extent from Ext↑↓(X, Y, I) and
Ext∩∪(X, Y, I) is lost.

Corollary 1. Ext↑↓(X, Y, I) ⊆ Ext△▽(X, Y, I) and Ext∩∪(X, Y, I) ⊆ Ext△▽(X, Y, I).

   In addition, no concept is lost.

Corollary 2. For each ⟨A, B⟩ ∈ B↑↓(X, Y, I) there is ⟨A, B, A∩⟩ ∈ B△▽(X, Y, I).
   For each ⟨A, B⟩ ∈ B∩∪(X, Y, I) there is ⟨A, A↑, B⟩ ∈ B△▽(X, Y, I).

Remark 1. One can observe from Fig. 3 that in Ext△▽(X, Y, I) there exist ex-
tents which are present neither in Ext↑↓(X, Y, I) nor in Ext∩∪(X, Y, I). On the
other hand, lower intent approximations are exactly those from Int↑↓(X, Y, I)
and upper intent approximations are exactly those from Int∩∪(X, Y, I).

   With the results on mutual reducibility from [8] we can state the following
theorem on the representation of B△▽ by B↑↓.
Fig. 3. B△▽(X, Y, I) (middle) and positions of the original concepts in B↑↓(X, Y, I)
(left) and B∩∪(X, Y, I) (right), with L being the three-element Łukasiewicz chain.


Theorem 4. For an L-context ⟨X, Y, I⟩, consider the L-context ⟨X, Y × L, J⟩
where J is defined by

                     J(x, ⟨y, a⟩) = I(x, y)        if a = 1,
                     J(x, ⟨y, a⟩) = I(x, y) → a    otherwise.

Then B↑↓(X, Y × L, J) is isomorphic to B△▽(X, Y, I) as a lattice. In addition,

                     Ext↑↓(X, Y × L, J) = Ext△▽(X, Y, I).

Proof (sketch). In [8] we show that for L-contexts ⟨X, Y, I⟩ and ⟨X, Y × L∖{1}, J⟩
such that
                          J(x, ⟨y, a⟩) = I(x, y) → a
it holds that Ext∩∪(X, Y, I) = Ext↑↓(X, Y × L∖{1}, J). Using this fact, one can
check that the mapping i defined as

                          i(⟨A, B̲, B̄⟩) = ⟨A, B̲′ ∪ B̄′⟩,

where B̲′ ∈ L^{Y×{1}}, B̄′ ∈ L^{Y×L∖{1}} with

                          B̲′(⟨y, 1⟩) = B̲(y),
                          B̄′(⟨y, a⟩) = B̄(y) → a,

is the desired isomorphism from B△▽(X, Y, I) to B↑↓(X, Y × L, J).

     Theorem 4 shows how we can obtain a concept lattice formed by ⟨↑, ↓⟩ which
is isomorphic to the L-rough concept lattice of a given L-context.
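
A small Python sketch of the construction used in Theorem 4 may help; it assumes the three-element Łukasiewicz chain and the dictionary-based L-relations used above, and the names are our own.

# Sketch of the context <X, Y x L, J> built from <X, Y, I> as in Theorem 4.

L = [0.0, 0.5, 1.0]

def residuum(a, b):
    return min(1.0, 1.0 - a + b)

def build_J(I, X, Y):
    J = {}
    for x in X:
        for y in Y:
            for a in L:
                # J(x, <y, a>) = I(x, y) if a = 1, and I(x, y) -> a otherwise
                J[(x, (y, a))] = I[(x, y)] if a == 1.0 else residuum(I[(x, y)], a)
    return J
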


4     Rough approximation of an L-context and L-concept
      lattice

In [17] Pawlak introduced Rough Set Theory, where uncertain elements are ap-
proximated with respect to an equivalence relation representing indiscernibility.
    Formally, given a Pawlak approximation space ⟨U, E⟩, where U is a non-empty
set of objects (universe) and E is an equivalence relation on U, the rough approx-
imation of a crisp set A ⊆ U by E is the pair ⟨A⇓E, A⇑E⟩ of sets in U defined
by

       x ∈ A⇓E    iff    for all y ∈ U, ⟨x, y⟩ ∈ E implies y ∈ A,
       x ∈ A⇑E    iff    there exists y ∈ U such that ⟨x, y⟩ ∈ E and y ∈ A.

A⇓E and A⇑E are called the lower and upper approximation of the set A by E,
respectively.


    In the fuzzy setting, one can generalize ⟨A⇓E, A⇑E⟩ as in [10, 11, 22]:

                        A⇓E(x) = ⋀_{y∈U} (E(x, y) → A(y)),

                        A⇑E(x) = ⋁_{y∈U} (A(y) ⊗ E(x, y))

for an L-equivalence E ∈ L^{U×U} and an L-set A ∈ L^U.
    Considering the L-context ⟨U, U, E⟩, we can easily see that ⇓E is equivalent to
∪E; and ⇑E is equivalent to ∩Eᵀ. Since E is symmetric, we can also write

                              ⟨⇓E, ⇑E⟩ = ⟨∪E, ∩E⟩.                              (6)

    Note that for an L-set A, A⇓E is its largest subset compatible with E and A⇑E
is its smallest superset compatible with E.
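
As an illustration of these formulas, the following Python sketch (assuming Łukasiewicz operations and dictionary-based L-sets, as above) computes the two approximations of an L-set A by an L-equivalence E on U; names are illustrative.

def tnorm(a, b):
    return max(0.0, a + b - 1.0)

def residuum(a, b):
    return min(1.0, 1.0 - a + b)

def lower_approx(A, E, U):
    # A⇓E(x) = meet over y of E(x, y) -> A(y)
    return {x: min(residuum(E[(x, y)], A[y]) for y in U) for x in U}

def upper_approx(A, E, U):
    # A⇑E(x) = join over y of A(y) (x) E(x, y)
    return {x: max(tnorm(A[y], E[(x, y)]) for y in U) for x in U}
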
     Below, we deal with the situation where lower and upper intent approximations
are further approximated using Pawlak's approach. In other words, instead of
the lower intent approximation A↑ we consider the largest subset of A↑ compatible
with a given indiscernibility relation E, and similarly, instead of the upper intent
approximation A∩ we consider its smallest superset compatible with E. In The-
orem 5 we show how to express this setting using L-rough concept-forming
operators.

Definition 2. Let ⟨X, Y, I⟩ be an L-context and E an L-equivalence on Y. Define
L-rough concept-forming operators as follows:

                             A△E = ⟨A↑⇓E, A∩⇑E⟩,

                             ⟨B̲, B̄⟩▽E = B̲⇑E↓ ∩ B̄⇓E∪.

   Directly from (6) and the results in [5] we have:

Theorem 5. Let ⟨X, Y, I⟩ be an L-context and E an L-equivalence on Y. We
have
         A△E = ⟨A↑_{I▹E}, A∩_{I∘E}⟩   and   ⟨B̲, B̄⟩▽E = B̲↓_{I▹E} ∩ B̄∪_{I∘E}.

   Again, for normal extents we obtain natural upper and lower intent approx-
imations.

Theorem 6. For normal A ∈ L^X we have A↑_{I▹E} ⊆ A∩_{I∘E}.

    In correspondence with (3) and (4), we denote the set of fixed points of
⟨△E, ▽E⟩ in the L-context ⟨X, Y, I⟩ by B△E▽E(X, Y, I) and the sets of its extents and
intents by Ext△E▽E(X, Y, I) and Int△E▽E(X, Y, I), respectively.
    The following theorem shows that the use of a rougher L-equivalence relation
leads to a reduction of the size of the L-rough concept lattice. Furthermore, this
reduction is natural, i.e. it preserves extents.


Theorem 7. Let ⟨X, Y, I⟩ be an L-context, and E₁, E₂ be L-equivalences on Y
such that E₁ ⊆ E₂. Then

                   Ext△E₂▽E₂(X, Y, I) ⊆ Ext△E₁▽E₁(X, Y, I).

Example 1. Fig. 4 shows the L-rough concept lattice of the L-context in Fig. 1 and
the rough L-concept lattice approximated using the following L-equivalence relation
on Y:

                                    α    β    γ    δ
                                 α  1   0.5   0    0
                                 β  0.5  1    0    0
                                 γ  0    0    1   0.5
                                 δ  0    0   0.5   1

    To demonstrate Theorem 7, the concepts with the same extents in the two
lattices are connected.



5     Conclusions and further research

We proposed a novel approach to the synthesis of RST and FCA. It opens up many
directions to be further explored. Our future research includes:

Study of attribute implications whose semantics is related to the present
setting. This will combine results on fuzzy attribute implications [9] and at-
tribute containment formulas [6].

Generalization of the current setting. Note that the operators ↑ and ∩, which
compute the universal and the existential intent, need not be induced by the
same relation to keep most of the described properties. Actually, this feature is
used in Section 4. In our future research, we want to elaborate more on this.
For instance, it can provide an interesting solution to the problem of missing values
in a formal fuzzy context: the idea is to use ↑ induced by the context with
missing values substituted by 0, and ∩ induced by the context with missing
values substituted by 1.

Reduction of the L-rough concept lattice via linguistic hedges. As two intents are
considered in each L-rough concept, the concept lattice can grow very
large. The RST approach to the reduction of data, i.e. the use of a rougher L-relation,
directly leads to a reduction of the L-rough concept lattice, as we showed in Theorem 7.
FFCA provides other ways to reduce the size; one of them is the parametrization of
the concept-forming operators using hedges.




Fig. 4. Rough L-concept lattices B△▽(X, Y, I) (left) and B△E▽E(X, Y, I) (right) with L
being the three-element Łukasiewicz chain. The corresponding extents are connected.


References
 1. Radim Belohlavek. Fuzzy Relational Systems: Foundations and Principles. Kluwer
    Academic Publishers, Norwell, USA, 2002.
 2. Radim Belohlavek. Concept lattices and order in fuzzy logic. Ann. Pure Appl.
    Log., 128(1-3):277–298, 2004.
 3. Radim Belohlavek. Optimal decompositions of matrices with entries from residu-
    ated lattices. Submitted to J. Logic and Computation, 2009.
 4. Radim Belohlavek. Sup-t-norm and inf-residuum are one type of relational product:
    Unifying framework and consequences. Fuzzy Sets Syst., 197:45–58, June 2012.
 5. Radim Belohlavek and Jan Konecny. Operators and spaces associated to matrices
    with grades and their decompositions. In NAFIPS 2008, pages 288–293.
 6. Radim Belohlavek and Jan Konecny. A logic of attribute containment, KAM ’08
    Proceedings of the 2008 International Symposium on Knowledge Acquisition and
    Modeling, pages 246–251.


 7. Radim Belohlavek and Jan Konecny. Closure spaces of isotone galois connec-
    tions and their morphisms. In Proceedings of the 24th international conference on
    Advances in Artificial Intelligence, AI’11, pages 182–191, Springer-Verlag, Berlin,
    Heidelberg, 2011.
 8. Radim Belohlavek and Jan Konecny. Concept lattices of isotone vs. antitone Galois
    connections in graded setting: Mutual reducibility revisited. Information Sciences,
    199: 133 – 137, 2012.
 9. Radim Belohlavek and Vilem Vychodil. A logic of graded attributes. submitted to
    Artificial Intelligence.
10. Didier Dubois and Henri Prade. Rough fuzzy sets and fuzzy rough sets. Interna-
    tional Journal of General Systems, 17(2–3):191–209, 1990.
11. Didier Dubois and Henri Prade. Putting rough sets and fuzzy sets together. In
    Roman Slowiński, editor, Intelligent Decision Support, volume 11 of Theory and
    Decision Library, pages 203–232. Springer Netherlands, 1992.
12. Bernard Ganter and Rudolf Wille. Formal Concept Analysis – Mathematical Foun-
    dations. Springer, 1999.
13. George Georgescu and Andrei Popescu. Non-dual fuzzy connections. Arch. Math.
    Log., 43(8):1009–1039, 2004.
14. Petr Hájek. Metamathematics of Fuzzy Logic (Trends in Logic). Springer, Novem-
    ber 2001.
15. Robert E. Kent. Rough concept analysis: A synthesis of rough sets and formal
    concept analysis. Fundam. Inf., 27(2,3):169–181, August 1996.
16. Hongliang Lai and Dexue Zhang. Concept lattices of fuzzy contexts: Formal con-
    cept analysis vs. rough set theory. International Journal of Approximate Reasoning,
    50(5):695 – 707, 2009.
17. Zdzislaw Pawlak. Rough sets. International Journal of Computer & Information
    Sciences, 11(5):341–356, 1982.
18. Jesus Medina, Manuel Ojeda-Aciego, and Jorge Ruiz-Calvino. Formal concept
    analysis via multi-adjoint concept lattices. Fuzzy Sets and Systems, 160(2):130–
    144, January 2009.
19. Jonas Poelmans, Dmitry I. Ignatov, Sergei O. Kuznetsov, and Guido Dedene. Fuzzy
    and rough formal concept analysis: a survey. International Journal of General
    Systems, 43(2):105–134, 2014.
20. Silke Pollandt. Fuzzy Begriffe: Formale Begriffsanalyse von unscharfen Daten.
    Springer–Verlag, Berlin–Heidelberg, 1997.
21. Andrei Popescu. A general approach to fuzzy concepts. Mathematical Logic Quar-
    terly, 50(3):265–280, 2004.
22. Anna M. Radzikowska and Etienne E. Kerre. Fuzzy rough sets based on residuated
    lattices. In James F. Peters, Andrzej Skowron, Didier Dubois, Jerzy W. Grzymala-
    Busse, Masahiro Inuiguchi, and Lech Polkowski, editors, Transactions on Rough
    Sets II, volume 3135 of Lecture Notes in Computer Science, pages 278–296. Springer
    Berlin Heidelberg, 2005.
23. Morgan Ward and Robert P. Dilworth. Residuated lattices. Transactions of the
    American Mathematical Society, 45:335–354, 1939.
24. Yiyu Yao. On unifying formal concept analysis and rough set analysis. Xi’an
    Jiaotong University Press, 2006.
    Reduction dimension of bags of visual words
                   with FCA

                  Ngoc Bich Dao, Karell Bertet, Arnaud Revel

                 Laboratoire L3i, University of La Rochelle, France



      Abstract. In image retrieval involving bags of visual words, dimension
      reduction is a fundamental data preprocessing task. In recent years,
      several methods have been proposed for the supervised and unsupervised
      cases. In the supervised case, the problem has been addressed with en-
      couraging results. However, in the unsupervised case, dimension reduc-
      tion is still an unavoidable challenge. In this article, we propose an appli-
      cation to image retrieval of a logical dimension reduction method based
      on Formal Concept Analysis. This method reduces a closure system
      without, theoretically, any loss of information. In our context, combining
      our proposed method with bags of visual words is original. Experimental
      results on five data sets (COREL, CALTECH256, VOC2005, VOC2012
      and MIR flickr) are analyzed to show the influence of the data structures
      and the parameters on the reduction factor.


1   Introduction
Thanks to the generalization of multimedia devices, huge collections of digital
images are available today. As far as mining in multimedia documents is con-
cerned, web search engines usually give poor results; such results fall far short
of expectations regarding the semantics of the documents. Content Based Image
Retrieval (CBIR)[1] has been investigated in order to give an answer to this
problem for decades. The main idea is to build a description based on the image
content, and to find similarities between descriptions. Classically, visual features
are extracted from images and then compiled into an index or signature to give
a dense description of images. To perform the retrieval, a similarity function is
computed to compare the index of the query with those of collection. A ranking
of the results according to the calculated similarity is proposed to the users. The
detection of visual features can be performed by a SIFT detector[2] or a dense
grid which both select an important number of interest points (up to several
thousands) from the images. Each of these points is then described thanks to a
SIFT-like descriptor. However, to limit the dimension of the description space, a
vector quantization (usually k-means) is performed in order to cluster similar in-
terest points into ”visual words”, and to generate a dictionary of ”visual words”
(usually up to 1000 words). Then, the signature of the image is composed of
the set of all the visual words corresponding to each feature point detected in
the image (forming a ”bag of visual words” [3]). The comparison between
the images then consists in comparing the bags of visual words of each image

c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 219–231,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.
   220     Ngoc Bich Dao, Karell Bertet and Arnaud Revel


in a dataset. The processing cost introduced by these techniques makes them
difficult to use with large amounts of images such as a query on the Internet.
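
As an illustration of this pipeline, the following Python sketch builds bag-of-visual-words signatures; the use of scikit-learn's KMeans, the parameter values and the function names are our own assumptions, and the extraction of local descriptors (e.g. by a SIFT detector or a dense grid) is assumed to be done elsewhere.

# Illustrative sketch only; each image is assumed to yield a NumPy array of
# local descriptors (one row per interest point).
import numpy as np
from sklearn.cluster import KMeans

def build_dictionary(all_descriptors, n_words=1000):
    # Vector quantization of the pooled descriptors into "visual words".
    return KMeans(n_clusters=n_words, n_init=10).fit(np.vstack(all_descriptors))

def bag_of_words(descriptors, kmeans):
    # Signature = histogram of visual-word occurrences in one image.
    words = kmeans.predict(descriptors)
    return np.bincount(words, minlength=kmeans.n_clusters)
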
    On the other hand, supervised data is labeled (the data has ground truth),
and classification methods are required to deal with the categorization problem.
In the unsupervised case the data is unlabeled, hence clustering methods are used
to gather similar observations into the same cluster. Classification and clustering
have many applications in many domains of computer science, such as bioinfor-
matics, numerical analysis, machine learning, data mining, pattern recognition,
etc., where the data may contain a large set of features, i.e. the description of
the data is high-dimensional and therefore needs to be reduced. However, reducing
it while preserving the quality of the data is still challenging.
    To be able to manage high-dimensional description spaces, reduction tech-
niques have been proposed. These techniques are widely used as a data prepro-
cessing step in machine learning and pattern recognition. This step can usually
increase the accuracy of the results of the subsequent steps, such as classification
or clustering, while their computational and time costs may be significantly de-
creased. Regarding the statistics and machine learning literature, we distinguish
two main strategies: feature transformation and feature selection. These methods
can be used in the supervised or in the unsupervised case. The main idea of
feature transformation consists in transforming the given set of features into a
new one. When the size of the new feature set is greater than that of the original
feature set, we call it feature generation; when the new feature set is smaller
than the original one, we speak of feature extraction. Feature selection methods
propose a manipulation of the data to select features from the original set. This
approach is interesting in domains where the existing features are preferred in
order to maintain their physical properties.
    In this article, we propose a logical and unsupervised feature reduction method
stemming from FCA to address the visual word reduction problem in a CBIR sys-
tem. In FCA, data are organised into a ”context” by a set of observations (called
”objects”, ”samples” or ”experimental units” in other fields) and a set of features
(also known as ”attributes”, ”parameters”, or ”variables” in computer science,
machine learning and statistic communities) that are associated with each ob-
servation.
    Context reduction is a simple and polynomial treatment in FCA classically
applied on the whole context, thus both reducing observations and features. This
treatment is based on a nice result establishing that the concept lattice of the
context can be reduced to a minimal one while preserving its graph structure
by deleting some redundant observations and features. For example, when two
attributes are shared by the same objects, then they belong to the same concepts
of the concept lattice; thus they are redundant and one of these two attributes can
be deleted while preserving the concept lattice structure. In our case, we focus
on feature reduction of a context. Our algorithm accepts as input the closure
operator of the context on attributes set, and returns the redundant attributes.
Thus, this algorithm extends the classical attributes reduction of a context to the


more general case of data described by a closure operator. Moreover, we propose
a new application in image analysis for features reduction of visual words.
    This paper is organized as follows: In order to introduce our approach, we
recall some definitions of formal concepts in Section 2.1. Section 2.2 details
our proposed method. Section 3 shows some experimental results on real data.
Finally, Section 4 ends this paper with a conclusion and perspectives.


2     The proposed features selection method

The feature reduction algorithm we propose is a logical and unsupervised method
stemming from FCA where a concept lattice, defined from a binary table, rep-
resents the description of all object-attribute combinations. When the concept
lattice structure is preserved after the deletion of some attributes and objects,
then these attributes are ”redundant” for the lattice structure and can be deleted
from the initial data without affecting the structure of object-attributes combi-
nations. Therefore, from a theoretical point of view, the description of data is
equivalently represented by a concept lattice where ”redundant” attributes and
objects are deleted.
    The reduction is a simple and polynomial treatment in FCA, classically de-
composed into two steps: attribute and object reduction. In this article, we focus
on attributes/features reduction, thus on the detection of redundant attributes
for the concept lattice structure reduced to attributes. A nice result establishes
that each subset of a concept (A,B) is a closure defined on the objects and at-
tributes set, and the concept lattice reduced to the attributes/objects is denoted
a closure lattice.
    In the first subsection, we introduce the notions of closure lattice according
to a closure operator, reduced closure lattice and redundant attributes. In the
second subsection, we present the reduction algorithm aiming at removing redun-
dant attributes, with a closure operator as input. This algorithm is thus a generic
algorithm that can be applied either on attributes or on objects of a binary table,
but also on any closure system.


2.1   Reduced lattice

In FCA, the relationship between a set of attributes I and a set of objects O
is described by a formal context (O, I, (α, β)), where α(A) is the set of attributes
shared by a subset A of objects, and β(B) is the set of objects sharing a subset
B of attributes. One can derive two closure systems from a context. The first
one is defined on the set of attributes I, with β ◦ α as closure operator. The
second one is defined on the set of objects O with α ◦ β as closure operator[18].
A closure system (ϕ, S) is defined by a closure operator ϕ on a set S, i.e. a map
on P(S) satisfying the three following properties: ϕ is isotone, extensive and
idempotent. A subset X ⊆ S is called closed if ϕ(X) = X (see Table 2). The set
system F of all closed subsets, fitted out with the inclusion relation ⊆, forms a
lattice usually called the closure lattice (see Fig. 1a). See the survey of Caspard
   222     Ngoc Bich Dao, Karell Bertet and Arnaud Revel


and Monjardet[19] for more details about closure systems. There are infinitely
many set systems whose closure lattices are isomorphic. A reduced closure lattice is a
closure lattice defined on a set S of the smallest size among all isomorphic closure
lattices. A nice result[20,18] establishes that a closure system is reduced when,
for each x ∈ S, the closure ϕ(x) is a join irreducible (Equation 1).

               ∀x ∈ S, ∀Y ⊆ S so that x ∉ Y, then ϕ(x) ≠ ϕ(Y )                (1)
    Therefore, a non-reduced closure system contains reducible elements - ele-
ments which do not satisfy Equation 1 - each reducible element x ∈ S is then
equivalent to a set Ex ⊆ S of equivalent elements with x ∉ Ex and ϕ(x) = ϕ(Ex ).
Reducible elements can be removed without affecting the structure of the closure
lattice. The reduction of a closure system consists then in removing or replacing
each reducible element x ∈ S by its equivalent set Ex .
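
For concreteness, a minimal Python sketch of the attribute closure operator derived from a context is given below; the dictionary-based representation and the function names are our own assumptions, not part of the paper.

# A context is represented as a dict: context[o] = set of attributes of object o.

def alpha(objects, context):
    # attributes shared by every object in the subset (all attributes if empty)
    universe = set().union(*context.values()) if context else set()
    return set.intersection(*(context[o] for o in objects)) if objects else universe

def beta(attrs, context):
    # objects possessing all the given attributes
    return {o for o in context if attrs <= context[o]}

def phi(attrs, context):
    # closure on the attribute set: apply beta (to objects), then alpha back
    # e.g. for the context of Table 1a, Table 2 gives phi({'c'}) = {'a', 'c', 'g'}
    return alpha(beta(attrs, context), context)
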

2.2   Proposed reduction algorithm
The algorithm we propose is a generic reduction algorithm since it only needs a
closure operator as input. Thus it can be applied with the same complexity on
any closure system, and in particular on a context by considering the attributes
- using β ◦ α as closure operator.



             a b c d e f g h                                 a b c d e f
           1 ×           ×                                 1 ×
           2× × × ×                                        2× × ×
           3       ×××××                                   3       ×××
           4×      ×××××                                   4×      ×××
           5×        ××××                                  5×        ××
           6 ×         ××                                  6 ×         ×
           7×          ××                                  7×          ×
           8×        × ×                                   8×        ×
           9××××××××                                       9××××××
              (a) The context                             (b) The attribute-
                                                          reduced context
                        Table 1: The example of context


                       x    a b      c     d      e f g h
                      ϕ(x) a,g b,g a,c,g d,e,f,g e,g f,g g e,f,g,h

 Table 2: Attributes x ∈ S and their closure ϕ(x) for the context in Table 1a
   A direct application of the definition (see Eq. 1) would imply an exponential
cost by checking if any subset Y ⊂ S is equivalent to each x ∈ S. We use the
precedence relation (precedence graph) for a polynomial reduction. The prece-
dence graph is defined on the set S, with an edge between two elements x, y ∈ S



        (a) The closure lattice of the context in Table 1a    (b) The reduced closure lattice of the context in Table 1b

                                Fig. 1: The example of closure lattices


when ϕ(x) ⊆ ϕ(y). This graph is clearly acyclic for a reduced closure system.
We propose a generic algorithm in 3 steps:

Step 1: Standardization. Check if there exists x, y ∈ S such that ϕ(x) =
   ϕ(y). When ϕ(x) = ϕ(y), then x and y belong to the same strongly connected
   components of the graph. Each strongly connected components X ⊆ S in-
   clude the elements xi , xj so that ϕ(xi ) = ϕ(xj ), ∀xi 6= xj ∈ X. Thus, we
   can delete all elements except one representative element x ∈ X of the com-
   ponent. The obtained precedence graph is then an acyclic graph.
Step 2: Clarification. Check if there exists x ∈ S such that ϕ(x) = ϕ(∅).
   When such an x exists, then ϕ(x) is included into ϕ(y) for any y ∈ S, thus
   x is the only source of the precedence graph. The clarification test has only
   to be performed for graphs with one source.
Step 3: Reduction. Check, for any x ∈ S, if there exists a set Ex ⊂ S such
   that x ∉ Ex and ϕ(x) = ϕ(Ex ). One can observe that an attribute x
   with only one immediate predecessor y is not reducible, because it would
   be equivalent to y, and thus belong to the same strongly connected com-
   ponent already removed in the previous step. If there exists Ex ⊂ S such
   that ϕ(x) = ϕ(Ex ), then elements of Ex are clearly predecessors of x in the
   precedence graph since, for ∀y ∈ Ex , ϕ(x) = ∩ϕ(y). Moreover, this test can
   be reduced to maximal predecessors of x. Therefore, this treatment has only
   to be performed for elements with more than one immediate predecessor,
   and the equality has to be checked with the set of immediate predecessors
   of x.

   This algorithm takes a closure operator ϕ on a set S as input. The output
of the algorithm is the reducible element set X ⊂ S and the equivalent
elements set Ex for each x ∈ X.
   224     Ngoc Bich Dao, Karell Bertet and Arnaud Revel


    Alg. 1 reduces a closure system in O(|S|·cϕ + |S|² log |S|) where cϕ is the cost
of a closure computation and |S| is the number of nodes. Indeed, the precedence
graph can be initialized in O(|S|cϕ + |S|² log|S|) by computing the closures in
O(|S|cϕ ), and then comparing two closures in O(|S|² log|S|). Then, the SCCs can
be computed using Kosaraju’s algorithm by two passes of depth-first search, thus
a complexity in O(|S| + |A|) ≤ O(|S|²), with |A| the number of edges in the graph. Stan-
dardization and clarification are clearly in O(|S|) by a simple pass over the graph.
Finally, reduction considers the immediate predecessors of each x ∈ S in O(|S|²),
and then computes and compares two closures in O(|S|cϕ + |S|² log|S|). Therefore,
Alg. 1 computes the attribute-reduced context in O(|I|²|O| + |I|² log|I|), since a
closure can be obtained in O(|I|·|O|).


   Input: a closure operator ϕ on a set S
   Output: the reducible elements set X ⊂ S, and the equivalent elements set Ex
                for each x ∈ X
   init a set Res with ∅;
   init a graph G with S as set of node;
   \\ Precedence graph;
   foreach (x, y) ∈ S × S do
        if ϕ(x) ⊆ ϕ(y) then
            add the edge (x, y) in G;
        end
   end
   compute the set CFC of the strongly connected components of G;
   let source be the sources of the graph G;
   \\ Step (1): Standardization;
   foreach C ∈ CFC do
        choose y ∈ C;
        foreach x ∈ C such that x 6= y do
            add x in Res with Ex = {y}; delete x from the graph G;
        end
   end
   \\ Step (2): Clarification;
   if |source| = 1 and ϕ(source) = ϕ(∅) then
        add source in Res with Esource = ∅; delete source from G;
   end
   \\ Step (3): Reduction;
   foreach x ∈ G do
        let P the set of immediate predecessors x in the graph G;
        if |P | 6= 1 and ϕ(x) = ϕ(P ) then
            add x in Res with Ex = P ; delete x from the graph G;
        end
   end
   return Res, (Ex )x∈Res ;
                Algorithm 1: Reduction of a closure system
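
The following Python sketch is a rough, unoptimized transcription of the three steps; it is ours, not the authors' implementation, and for simplicity the reduction step tests all strict predecessors (with respect to closure inclusion) instead of only the immediate predecessors of the precedence graph, while the clarification test is applied to every element rather than only to a unique source.

# Sketch, assuming `phi` maps a frozenset over S to its closure (a frozenset)
# and that the elements of S are hashable and orderable (e.g. strings).
from itertools import combinations

def reduce_closure_system(S, phi):
    closure = {x: phi(frozenset([x])) for x in S}
    removed, equiv = set(), {}

    # Step 1: standardization - keep one representative per group of elements
    # having the same closure (the strongly connected components of the graph).
    for x, y in combinations(sorted(S), 2):
        if x not in removed and y not in removed and closure[x] == closure[y]:
            removed.add(y)
            equiv[y] = {x}

    # Step 2: clarification - drop an element whose closure equals phi(empty).
    bottom = phi(frozenset())
    for x in sorted(S):
        if x not in removed and closure[x] == bottom:
            removed.add(x)
            equiv[x] = set()

    # Step 3: reduction - x is reducible if its closure equals the closure of
    # its strict predecessors.
    for x in sorted(S):
        if x in removed:
            continue
        preds = {y for y in S
                 if y not in removed and y != x and closure[y] < closure[x]}
        if preds and phi(frozenset(preds)) == closure[x]:
            removed.add(x)
            equiv[x] = preds

    return equiv
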


3     Experimentation

3.1     Datasets

In our experiments, we compare the performance of the method we propose on
different image data sets. Each image in a data set is described by a vector
composed of the occurrence frequencies of its visual words, where a set of visual
words is defined for each data set. Table 3 describes the different data sets we
used in our experiments, and the methods applied to generate the whole bag of
visual words.


      Database       Images nb Features    Detector      Descriptor     Dictionary of
                                  nb                                    visual words

    VOC2012[21]       17124      4096       Harris-      CMI (Colour       Random
                                            Laplace        Moment        selection of
                                                        Invariants)[22] all key points

    MIR flickr[23]    24991      4096       Harris-         CMI1          Random
                                            Laplace                     selection of
                                                                       all key points

     COREL[24]         4998      500         SIFT          SIFT[2]      K-means[25]
                                                                         (OpenCV)

      CALTECH         30607      500         SIFT           SIFT2         K-means
       256[26]                                                           (OpenCV)

      Dataset 1        1354      262        Harris-         SIFT          K-means
    (VOC2005)[27]                         Laplace and                    (OpenCV)
                                          Laplacian3
                         Table 3: Description of used datasets



3.2     Experimental protocol

As mentioned earlier, the algorithm we propose requires binary values indicating
for each object whether it possesses a given attribute or not. Since each image
is described by a visual word occurrence frequency vector, its values can vary
from 0 to a max value depending on the image size and the quantity of visual
words in the image. For instance, if an image is entirely black, there is only one
visual word ”black” for the whole image, with a high frequency, and the vector
1
  http://koen.me/research/colordescriptors/
2
  http://www.robots.ox.ac.uk/~vgg/research/affine/#software
3
  http://lear.inrialpes.fr/people/dorko/downloads.html


will be sparse. Conversely, an image with a patchwork of colors is described by
a frequency vector mainly composed of low but not zero values. To be able to
compare several images, it is thus necessary to normalize their frequency vector
before binarization.


Normalization As mentioned before, the visual word occurrence frequency
can be very important in some images, and insignificant in others. In order to
compare the visual words, several strategies can be adopted.
    First of all, it is necessary to find a ”max” value in the data set and then
divide the visual word frequency by this max value to transform the values into
the range 0 to 1. Two ways of defining the max value have been considered in
this article.

Normalization by line (image) With this type of normalization, a max value is
computed for each image as the maximum frequency value of the corresponding
image. The interpretation of this normalization is that we consider the ratio
between the different attributes of a given image as significant. This kind of
normalization depends neither on the database size nor on the image size.
However, the normalized values do not account for the ratio of the same
attribute between the images in the database.

Normalization by column (feature) Normalization by column finds out the max-
imum values of the frequency for each attribute in the database. With this ap-
proach, the correspondence between the images in the database is taken into ac-
count. The drawback is that each time a new image is inserted into the database,
the normalized values must be recomputed. Besides, the image size must also be
taken into account. Table 4 gives an illustrated example.



              f1 f2 f3 f4           f1 f2 f3 f4            f1 f2 f3    f4
         img1 1 0 50 5        img1 0.02 0 1 0.1      img1 0.1 0 1 0.05
         img2 10 9 1 8        img2 1 0.9 0.1 0.8     img2 1 1 0.02 0.08
         img3 0 0 0 99        img3 0 0 0 1           img3 0 0 0      1
          (a) Initial data   (b) After normalization (c) After normalization
                             by line                 by column

                  Table 4: Illustration for normalization types


Binarization After the normalization, we simply binarize the normalized values
by comparing them with a threshold varying from 0 to 0.9. With a threshold of 1,
in the normalization by line case, most of the attributes of an image could fall
below the threshold. To avoid removing all the visual words from an image, the
highest threshold has been set to 0.9.
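
A minimal Python sketch of the two normalizations and of the binarization step is given below; the NumPy-based representation (an images × visual-words frequency matrix) and the function names are our own assumptions.

# `freq` is an (images x visual words) matrix of occurrence frequencies.
import numpy as np

def normalize_by_line(freq):
    maxima = freq.max(axis=1, keepdims=True)      # one max value per image
    return np.divide(freq, maxima,
                     out=np.zeros_like(freq, dtype=float), where=maxima > 0)

def normalize_by_column(freq):
    maxima = freq.max(axis=0, keepdims=True)      # one max value per visual word
    return np.divide(freq, maxima,
                     out=np.zeros_like(freq, dtype=float), where=maxima > 0)

def binarize(normalized, threshold):
    # threshold varies from 0 to 0.9 in the experiments
    return normalized > threshold
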


Reduction The next phase of the algorithm is to apply our reduction method,
which is itself composed of three steps (clarification, standardisation, reduction).
Indeed, before applying the proposed method to the bags of visual words, we must
remove all the visual words that appear (resp. do not appear) in every (resp.
any) image. This step corresponds to the clarification. The standardization step
removes a feature whose vector of images is identical to the vector of images of
another feature. At last, in the reduction step, all the features which are
combinations of other features are removed.


3.3   Results

In this section, we detail the results obtained with our reduction method for 5
data sets, described in section 2.2. To analyze the behavior of our method, and
the contribution of each step of the algorithm, we introduce the ratio of removed
features for each step of the reduction algorithm as follows:

                 ∆1 = a / Natt ,    ∆2 = b / (Natt − a),    ∆3 = c / (Natt − a − b)
                                                           c



    Where a (resp. b and c) is the number of removed attributes in the standard-
ization (resp. clarification and reduction) step; Natt is the attribute number in
total. Figure 2 shows the evolution of ∆1 , ∆2 , ∆3 with regard to the threshold
level, for both normalization types: line and column.
    The maximum ratio of removed attributes of the data sets (CALTECH,
COREL, VOC2005, MIRflickr, VOC2012) are approximately equal to 0.67%,
2.6%, 22.5%, 95%, 96% respectively. The impact of the reduction is more in-
teresting in the last three datasets. This phenomenon can be explained by the
bag of visual words generation since the two data sets MIR flickr and VOC2012
are composed of randomly selected visual words stemming from the keypoints
set. Conversely, the data sets CALTECH, COREL and VOC2005, are composed
of bags of visual words defined by the SIFT detector and descriptor, and by a
K-means clustering. Thus, the randomly selected visual words are less consistent.
    We can also observe that the percentage of removed attributes increases while
the binarization threshold increases. With an increasing threshold, only the most
frequent words are kept, thus more attributes are potentially equivalent and
removed.
    At last, there is no attribute reduction in step 1 (the ∆1 value) with a nor-
malization by column, because this kind of normalization cannot generate empty
columns. Moreover, a normalization by line keeps the most frequent attributes in
each image whereas a normalization by column keeps the most frequent images
for each attribute. To summarize, the number of removed attributes depends
both on the visual word generation, on the chosen binarization threshold and
on the normalization process (by line or by column). However, care should be
taken that the greater the binarization threshold is, the smaller the number of
images remaining, except in the case of normalization by line.




[Per-dataset plots for CALTECH, COREL, VOC2005, MIRflickr and VOC2012: (a) Normalization by line, (b) Normalization by column]

Fig. 2: The ratio of removed attributes according to the initial attributes corre-
sponding to three cases of proposed method where red line is ∆1 , blue dash is
∆2 and green dash dot dot is ∆3 .


4    Conclusion and perspective

In this article, we present a logical feature selection method for bags of visual
words. This method, stemming from Formal Concept Analysis, is a closure sys-
tem reduction without, theoretically, any loss of information. This means that the
data description lattice is preserved by the reduction treatment. In our con-
text, combining our proposed method with bags of visual words is original.
The experiments show that the number of deleted features can be substantial,
depending on the data set and on the binarization treatment. Moreover, it is
possible to perform both an object and an attribute reduction.
    A finer analysis could be obtained in the supervised case, by comparing
classification performance before and after reduction. Moreover, the number of
potentially deleted objects could also be useful to automatically define a good
binarization threshold in the supervised case: while the suppression of objects
belonging to the same class should be promoted, we must avoid removing objects
of different classes. Object reduction can easily be performed by applying our
reduction algorithm to the set of objects.
    At last, we plan to study the number of deleted attributes and deleted objects
(of the same class / of different classes) to evaluate the complexity of a data set,
and the quality of its visual words.
    Acknowledgment: We would like to thank Thierry URRUTY, Nhu Van
NGUYEN and Dounia AWAD who extracted the bag of visual words we used
in this paper.


References
 1. Smeulders, A., Worring, M., Santini, S., Gupta, A., Jain, R.: Content-based image
    retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and
    Machine Intelligence 22 (2000) 1349–1380
 2. Lowe, D.G.: Object recognition from local scale-invariant features. In: Proceedings
    of the International Conference on Computer Vision, Kerkyra (1999) 1150–1157
 3. Bosch, A., Zisserman, A., Munoz, X.: Scene Classification Via pLSA. In Leonardis,
    A., Bischof, H., Pinz, A., eds.: 9th European Conference on Computer Vision.
    Volume 3954 of Lecture Notes in Computer Science., Graz, Austria, Springer Berlin
    Heidelberg (2006) 517–530
 4. Tufféry, S.: Data mining et statistique décisionnelle: L’intelligence des données.
    Technip edn. Volume 2010. (2010)
 5. Belohlavek, R., Kruse, R., Vychodil, V.: Discovery of optimal factors in binary data
    via a novel method of matrix decomposition. Journal of Computer and System
    Sciences 76 (2010) 3–20
 6. Fisher, R.A.: The use of multiple measurements in taxonomic problems. The
    Annals of Eugenics 7 (1936) 179–188
 7. Hotelling, H.: Analysis of a complex of statistical variables into principal compo-
    nents. Journal of Educational Psychology 24 (1933) 417–441
 8. Yu, L., Liu, H.: Feature selection for high-dimensional data: A fast correlation-
    based filter solution. In: Proceedings of the Twentieth International Conference
    on Machine Learning (ICML-2003), Washington DC (2003) 856–863


 9. Hall, M.A.: Correlation-based feature subset selection for machine learning. Doctor
    of philosophy, University of Waikato, Hamilton, NewZealand (1999)
10. Battiti, R.: Using mutual information for selecting features in supervised neural
    net learning. IEEE transactions on neural networks / a publication of the IEEE
    Neural Networks Council 5 (1994) 537–550
11. Rakotomalala, R., Lallich, S.: Construction d’arbres de decision par optimisation.
    Revue Extraction des Connaissances et Apprentissage 16 (2002) 685–703
12. Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In
    Bergadano, F., Raedt, L., eds.: Machine Learning: ECML-94. Volume 784 of Lec-
    ture Notes in Computer Science. Springer Berlin Heidelberg, Berlin, Heidelberg
    (1994) 171–182
13. He, X., Cai, D., Niyogi, P.: Laplacian score for feature selection. In: Neural
    Information Processing Systems Foundation, MIT Press (2005)
14. Devaney, M., Ram, A.: Efficient Feature Selection in Conceptual Clustering. In: Ma-
    chine Learning: Proceedings of the Fourteenth International Conference, Nashville,
    TN (1997)
15. Dy, J.G., Brodley, C.E.: Feature Selection for Unsupervised Learning. Journal of
    Machine Learning Research 5 (2004) 845–889
16. Wolf, L., Shashua, A.: Feature Selection for Unsupervised and Supervised Infer-
    ence: The Emergence of Sparsity in a Weight-Based Approach. The Journal of
    Machine Learning Research 6 (2005) 1855–1887
17. Elghazel, H., Aussem, A.: Unsupervised feature selection with ensemble learning.
    Machine Learning (2013)
18. Barbut, M., Monjardet, B.: Ordre et classification: algèbre et combinatoire. Ha-
    chette, Paris (1970)
19. Caspard, N., Monjardet, B.: The lattices of closure systems, closure operators, and
    implicational systems on a finite set: a survey. Discrete Applied Mathematics 127
    (2003) 241–269
20. Birkhoff, G.: Lattice Theory. 1st edn. American Mathematical Society (1940)
21. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The
    PASCAL Visual Object Classes (VOC) Challenge (2012)
22. Mindru, F., Tuytelaars, T., Gool, L.V., Moons, T.: Moment invariants for recog-
    nition under changing viewpoint and illumination. Computer Vision and Image
    Understanding 94 (2004) 3–27
23. Huiskes, M.J., Lew, M.S.: The MIR flickr retrieval evaluation. In: Proceeding of
    the 1st ACM international conference on Multimedia information retrieval - MIR
    ’08, New York, USA, ACM Press (2008) 39–43
24. Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised Learning
    of Semantic Classes for Image Annotation and Retrieval. IEEE Transactions on
    Pattern Analysis and Machine Intelligence 29 (2007) 394–410
25. Macqueen, J.B.: Some Methods for classification and Analysis of Multivariate Ob-
    servations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics
    and Probability. (1967) 281–297
26. Griffin, G., Holub, A.D., Perona, P.: Caltech-256 Object Category Dataset. Tech-
    nical report (2007)
27. Everingham, M., Zisserman, A., Williams, C.K.I., Van Gool, L., Al., A.: The 2005
    PASCAL Visual Object Classes Challenge. In: First PASCAL Machine Learning
    Challenges Workshop, MLCW 2005. Volume 3944 of Lecture Notes in Computer
    Science., Berlin, Heidelberg, Springer Berlin Heidelberg (2005) 117–176
          A One-Pass Triclustering Approach: Is There
                   any Room for Big Data?

      Dmitry V. Gnatyshak1 , Dmitry I. Ignatov1 , Sergei O. Kuznetsov1 , and Lhouari
                                        Nourine2
          1
              National Research University Higher School of Economics, Russian Federation
                                    dmitry.gnatyshak@gmail.com
                                         http://www.hse.ru
                          2
                            Blaise Pascal University, LIMOS, CNRS, France
                                  http://www.univ-bpclermont.fr/



                Abstract. An efficient one-pass online algorithm for triclustering of bi-
                nary data (triadic formal contexts) is proposed. This algorithm is a
                modified version of the basic algorithm for OAC-triclustering approach,
                but it has linear time and memory complexities with respect to the car-
                dinality of the underlying ternary relation and can be easily parallelized
                in order to be applied for the analysis of big datasets. The results of
                computer experiments show the efficiency of the proposed algorithm.

                Keywords: Formal Concept Analysis, triclustering, triadic data, data
                mining, big data


      1       Introduction
      Cluster analysis of multimodal data and specifically of dyadic and triadic re-
lations is a natural extension of the idea of ordinary clustering. In the dyadic case,
biclustering methods (the term bicluster was coined by B. Mirkin [17]) are used
to simultaneously find subsets of the sets of objects and attributes that form ho-
mogeneous patterns in the input object-attribute data. One of the most popular
applications of biclustering is gene expression analysis in Bioinformatics [16,3].
Triclustering methods operate in the triadic case, in which a set of conditions is
assigned to each object-attribute pair [18,8,5]. Both biclustering and triclus-
      tering algorithms are widely used in such areas as the analysis of gene expression
      [21,15,13], recommender systems [19,10,9], social networks analysis [6], etc. The
      processing of numeric multimodal data is also possible by modifications of ex-
      isting approaches for mining binary relations [12].
           Though there are methods that can enumerate all triclusters satisfying cer-
      tain constraints [1] (in most cases they ensure that triclusters are dense), their
      time complexity is rather high, as in the worst case the maximal number of tri-
      clusters usually is exponential (e.g. in case of formal triconcepts), showing that
      these methods are hardly scalable. To process big data algorithms need to have
      at most linear time complexity and be easily parallelizable. Also, in most cases,
      it is necessary that such algorithms output the results in one pass.




c Karell Bertet, Sebastian Rudolph (Eds.): CLA 2014, pp. 231–243,
  ISBN 978–80–8152–159–1, Institute of Computer Science, Pavol Jozef Šafárik
  University in Košice, 2014.

    In order to create an algorithm satisfying these requirements, we adapted a tri-
clustering method based on prime operators (the prime OAC-triclustering method)
[5]. As a result, we developed an online version of the prime OAC-triclustering
method, which is linear, one-pass and easily parallelizable.
       The rest of the paper is organized as follows: in Section 2 we recall the
   method and the basic version of the algorithm of prime OAC-triclustering. In
   Section 3 we describe the online setting for the problem and the corresponding
   online version of the basic algorithm with some optimizations. Finally, in Section
   4 we show the results of some experiments which demonstrate the efficiency of
   the online version of the algorithm.


   2    Prime object-attribute-condition triclustering method
    The prime object-attribute-condition triclustering method, based on the framework
    of Formal Concept Analysis [20,4,2], is an extension of the object-attribute
    biclustering method [7] to the triadic case. Triclusters generated by this method have the
    same structure as the corresponding biclusters, namely a cross-like structure
    of triples inside the input data cuboid (i.e. the formal tricontext).
       Let K = (G, M, B, I) be a triadic context, where G, M , B are respectively
   the sets of objects, attributes, and conditions, and I ⊆ G × M × B is a triadic
   incidence relation. Each prime OAC-tricluster is generated by applying the
   following prime operators to each pair of components of some triple:


                 (X, Y )′ = {b ∈ B | (g, m, b) ∈ I for all g ∈ X, m ∈ Y },
                 (X, Z)′ = {m ∈ M | (g, m, b) ∈ I for all g ∈ X, b ∈ Z},              (1)
                 (Y, Z)′ = {g ∈ G | (g, m, b) ∈ I for all m ∈ Y, b ∈ Z}

        Then the triple T = ((m, b)′ , (g, b)′ , (g, m)′ ) is called the prime OAC-tricluster
    based on the triple (g, m, b) ∈ I. The components of the tricluster are called, respec-
    tively, extent, intent, and modus. The triple (g, m, b) is called the generating triple
    of the tricluster T . Figure 1 shows the structure of an OAC-tricluster (X, Y, Z)
    based on a triple (g̃, m̃, b̃): triples corresponding to the gray cells are contained in
    the context, other triples may be contained in the tricluster (cuboid) as well.
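        To make the prime operators concrete, the following Python sketch computes the
    three prime sets and the tricluster generated by a triple on a small hypothetical toy
    tricontext; all names are illustrative and do not come from the authors' implementation.

        # Toy ternary relation: a set of (object, attribute, condition) triples.
        I = {("g1", "m1", "b1"), ("g1", "m2", "b1"),
             ("g2", "m1", "b1"), ("g2", "m1", "b2")}
        G = {g for (g, m, b) in I}
        M = {m for (g, m, b) in I}
        B = {b for (g, m, b) in I}

        def prime_conditions(X, Y):
            # (X, Y)' : conditions b such that (g, m, b) is in I for all g in X, m in Y
            return {b for b in B if all((g, m, b) in I for g in X for m in Y)}

        def prime_attributes(X, Z):
            # (X, Z)' : attributes m such that (g, m, b) is in I for all g in X, b in Z
            return {m for m in M if all((g, m, b) in I for g in X for b in Z)}

        def prime_objects(Y, Z):
            # (Y, Z)' : objects g such that (g, m, b) is in I for all m in Y, b in Z
            return {g for g in G if all((g, m, b) in I for m in Y for b in Z)}

        # Tricluster generated by the triple (g1, m1, b1):
        g, m, b = "g1", "m1", "b1"
        T = (prime_objects({m}, {b}),      # extent  (m, b)' = {g1, g2}
             prime_attributes({g}, {b}),   # intent  (g, b)' = {m1, m2}
             prime_conditions({g}, {m}))   # modus   (g, m)' = {b1}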
        The basic algorithm for the prime OAC-triclustering method is rather simple
    (Alg. 1). First of all, for each combination of elements from each two sets of K
    we compute the result of applying the corresponding prime operator (we will
    call the resulting sets prime sets). After that we enumerate all triples from I and,
    for each of them, generate a tricluster based on the corresponding triple,
    check whether this tricluster is already present in the tricluster set (by using
    hashing), and check the required conditions.
        The total time complexity of the algorithm depends on whether there is a
    non-zero minimal density threshold and on the complexity of the hashing
    algorithm used. If we use some basic hashing algorithm processing the
    tricluster’s extent, intent and modus and the minimal density threshold is equal
    to 0, the total time complexity of the main loop is O(|I|(|G| + |M | + |B|)), and of




                    Fig. 1. Structure of prime OAC-triclusters




Algorithm 1 Algorithm for prime OAC-triclustering.
Input: K = (G, M, B, I) — tricontext;
    ρmin — minimal density threshold
Output: T = {T = (X, Y, Z)}
 1: T := ∅
 2: for all (g, m) : g ∈ G, m ∈ M do
 3:   PrimesOA[g, m] = (g, m)′
 4: end for
 5: for all (g, b) : g ∈ G, b ∈ B do
 6:   PrimesOC[g, b] = (g, b)′
 7: end for
 8: for all (m, b) : m ∈ M, b ∈ B do
 9:   PrimesAC[m, b] = (m, b)′
10: end for
11: for all (g, m, b) ∈ I do
12:   T = (PrimesAC[m, b], PrimesOC[g, b], PrimesOA[g, m])
13:   Tkey = hash(T )
14:   if Tkey ∉ T .keys ∧ ρ(T ) ≥ ρmin then
15:      T [Tkey] := T
16:   end if
17: end for
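   A compact Python rendering of Algorithm 1 may be helpful. It is only a sketch under
   simplifying assumptions: the built-in hash of frozensets stands in for the hash function
   used by the authors, and the density ρ is computed by brute force; it is not the C#
   implementation evaluated in Section 4.

       from itertools import product

       def basic_oac_triclustering(G, M, B, I, rho_min=0.0):
           # Precompute all prime sets (lines 2-10 of Alg. 1).
           primes_oa = {(g, m): frozenset(b for b in B if (g, m, b) in I) for g, m in product(G, M)}
           primes_oc = {(g, b): frozenset(m for m in M if (g, m, b) in I) for g, b in product(G, B)}
           primes_ac = {(m, b): frozenset(g for g in G if (g, m, b) in I) for m, b in product(M, B)}

           def density(X, Y, Z):
               # Share of the triples of the cuboid X x Y x Z that belong to I.
               total = len(X) * len(Y) * len(Z)
               return sum((g, m, b) in I for g, m, b in product(X, Y, Z)) / total if total else 0.0

           # Main loop (lines 11-17): one candidate tricluster per triple of I.
           triclusters = {}
           for (g, m, b) in I:
               T = (primes_ac[m, b], primes_oc[g, b], primes_oa[g, m])
               key = hash(T)
               if key not in triclusters and density(*T) >= rho_min:
                   triclusters[key] = T
           return list(triclusters.values())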

   the whole algorithm is O(|G||M ||B| + |I|(|G| + |M | + |B|)). If we have a non-zero
   minimal density threshold, the time complexity of the main loop, as well as the
   time complexity of the algorithm, is O(|I||G||M ||B|).
       The memory complexity is O(|I|(|G| + |M | + |B|)), as we need to keep the
   dictionaries with the prime sets in memory.


   3    Online version of the OAC-triclustering algorithm

    At first, let us describe the online problem of finding the set of prime OAC-
    triclusters. Let K = (G, M, B, I) be a triadic context. The user has no a priori
    knowledge of the elements and even of the cardinalities of G, M , B, and I. At each
    iteration we receive some set of triples from I: J ⊆ I. After that we must
    process J and obtain the current version of the set of all triclusters. It is important
    in this setting to consider every pair of triclusters different if they have different
    generating triples, even if their extents, intents, and modi are equal, because
    any further triple can change only one of them, thus making them different.
    Figure 2 shows an example of such a situation (dark gray cells are the generating
    triples, light gray ones — prime sets).




             Fig. 2. Example of modification of triclusters by adding a triple




       The algorithm also requires that the dictionaries containing the prime sets
    are implemented as hash-tables: with this data structure the algorithm can
    access the prime sets efficiently for processing.
       The algorithm itself is also quite simple (Alg. 2). It takes some set of triples
    (J), the current version of the tricluster set (T ), and the dictionaries contain-
    ing prime sets (PrimesOA, PrimesOC, PrimesAC) as input and outputs the
    modified versions of the tricluster set and the dictionaries. The algorithm processes
    each triple (g, m, b) of J sequentially (line 1). On each iteration the algorithm
    modifies the corresponding prime sets:

 – adds b to (g, m)′ (line 2)
 – adds m to (g, b)′ (line 3)
 – adds g to (m, b)′ (line 4)

    Finally, it adds a new tricluster to the tricluster set. It is important to note
that this tricluster contains pointers to the corresponding prime sets (in the
corresponding dictionaries) instead of copies of the prime sets (line 5).
    In effect this algorithm is the same as the basic one but with some optimiza-
tions. First of all, instead of computing the prime sets at the beginning, we modify
them on the spot, as adding an additional triple to the relation modifies only three
prime sets by one element. Secondly, we remove the main loop by using pointers
for the triclusters’ extents, intents, and modi, as we can generate triclusters at
the same step as we modify the prime sets. The third important optimiza-
tion is the use of only one pass through the triples of the ternary relation I,
instead of enumerating the different pairwise combinations of objects, attributes,
and conditions.


Algorithm 2 Add function for the online algorithm for prime OAC-triclustering.
Input: J — set of triples;
    T = {T = (∗X, ∗Y, ∗Z)} — current set of triclusters;
     PrimesOA, PrimesOC, PrimesAC;
Output: T = {T = (∗X, ∗Y, ∗Z)};
     PrimesOA, PrimesOC, PrimesAC;
 1: for all (g, m, b) ∈ J do
 2:    PrimesOA[g, m] := PrimesOA[g, m] ∪ {b}
 3:    PrimesOC[g, b] := PrimesOC[g, b] ∪ {m}
 4:    PrimesAC[m, b] := PrimesAC[m, b] ∪ {g}
 5:    T := T ∪ {(&PrimesAC[m, b], &PrimesOC[g, b], &PrimesOA[g, m])}
 6: end for
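    In Python, the “pointers” of Algorithm 2 can be approximated by keeping references to
shared mutable sets; the sketch below follows this idea (all names are ours, and the
generating triple is stored with each tricluster since, as discussed above, triclusters
with different generating triples are considered distinct until post-processing).

    from collections import defaultdict

    def add_triples(J, triclusters, primes_oa, primes_oc, primes_ac):
        # Online update: each triple enlarges three prime sets by one element and
        # contributes one tricluster that merely references those (shared) sets.
        for (g, m, b) in J:
            primes_oa[(g, m)].add(b)
            primes_oc[(g, b)].add(m)
            primes_ac[(m, b)].add(g)
            triclusters.append(
                ((g, m, b),             # generating triple
                 primes_ac[(m, b)],     # extent (a reference, not a copy)
                 primes_oc[(g, b)],     # intent
                 primes_oa[(g, m)]))    # modus
        return triclusters, primes_oa, primes_oc, primes_ac

    # Usage: the dictionaries default to fresh empty sets.
    primes_oa, primes_oc, primes_ac = defaultdict(set), defaultdict(set), defaultdict(set)
    triclusters = []
    add_triples({("g1", "m1", "b1"), ("g2", "m1", "b1")},
                triclusters, primes_oa, primes_oc, primes_ac)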



     Let us estimate the complexities of this algorithm. Each step requires
constant time: we need to modify three sets and add one tricluster to the set of
triclusters. The total number of steps is equal to |I|. Thus the time complexity
is linear, O(|I|). Besides, the algorithm is one-pass.
     The memory complexity is the same: at each of the |I| steps the size of each dic-
tionary containing prime sets is increased either by one element (if the required
prime set is already present) or by one key-value pair (if not). Still, each of
these dictionaries requires O(|I|) memory. Thus, the memory complexity is also
linear, O(|I|).
     Another important step used in addition to this algorithm is post-processing.
Besides user-specific post-processing, there are some common useful
steps. First of all, at a given moment of time we may want to remove dupli-
cate triclusters with the same extent, intent, and modus from the output. Also,
some simple conditions like a minimal support condition can be checked during
236 6    Dmitry V. Gnatyshak et al.Dmitry V. Gnatyshak et al.

    this step without increasing the original complexity. This should be done only dur-
    ing the post-processing step, as the addition of a triple in the main algorithm
    can drastically change the set of triclusters and, respectively, the values used
    to check the conditions. Finally, if we need to check more complex conditions
    like a minimal density condition, the time complexity of the post-processing will
    be higher than the time complexity of the original algorithm, but it can also
    be implemented efficiently.
        To remove duplicate triclusters we need an efficient hashing procedure,
    which can be improved by integrating it into the main algorithm. For this, for
    all prime sets we keep their hash-values with them in memory. Finally, when
    using hash-functions other than an LSH function (Locality-Sensitive
    Hashing) [14], we can calculate the hash-value of a prime set as some function of
    its elements (for example, exclusive disjunction or sum). Then, when we modify a
    prime set, we only need to combine the current value with the new element. In
    this case, the hash-value of a tricluster can be calculated as the same function
    of the hash-values of its extent, intent, and modus.
       It is then enough to implement the tricluster set as a hash-set in
    order to efficiently remove the additional entries of the same tricluster.
         Pseudo-code for the basic post-processing is given in Alg. 3.



   Algorithm 3 Post-processing for the online algorithm for prime OAC-
   triclustering.
   Input: T = {T = (∗X, ∗Y, ∗Z)} — full set of triclusters;
   Output: T = {T = (∗X, ∗Y, ∗Z)} — processed hash-set of triclusters;
     1: for all T ∈ T do
     2:   Calculate hash(T )
     3:   if hash(T ) ∉ T then
     4:      T := T ∪ {T }
     5:   end if
     6: end for
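       The following sketch combines Algorithm 3 with the XOR-based hashing idea described
    above; it is an illustrative Python version only (on a hash collision the sets themselves
    should additionally be compared, which is omitted here for brevity).

        def set_hash(elements):
            # Order-independent hash of a set: XOR of the element hashes. It can be
            # maintained incrementally, since adding x just XORs hash(x) into the value.
            h = 0
            for x in elements:
                h ^= hash(x)
            return h

        def post_process(triclusters):
            # Keep one tricluster per (extent, intent, modus) value (sketch of Alg. 3).
            result = {}
            for (_, extent, intent, modus) in triclusters:   # the generating triple is ignored
                key = (set_hash(extent), set_hash(intent), set_hash(modus))
                if key not in result:
                    result[key] = (frozenset(extent), frozenset(intent), frozenset(modus))
            return list(result.values())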




       If the names of the objects, attributes, and conditions are small enough (so
   that we can consider the time complexity of computing their hash values as
   O(1)), the time complexity of the post-processing is O(|I|) if we do not need
   to calculate densities, and O(|I||G||M ||B|) otherwise. Also, the basic version
   of the post-processing does not require any additional memory, so its memory
   complexity is O(1).
       Finally, the algorithm can be easily parallelized by splitting the set of triples
    J into several subsets, processing each of them independently, and merging the
    resulting sets afterwards.
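       As a rough illustration of this parallelization scheme, the merging step could look as
    follows in Python, assuming each chunk of triples was processed independently with its
    own dictionaries by a function such as add_triples above; this is a sketch of the idea,
    not the authors' implementation.

        from collections import defaultdict

        def merge_chunks(chunk_results):
            # Each chunk result is (triples, primes_oa, primes_oc, primes_ac).
            # Prime sets with equal keys are united, and the tricluster list is
            # rebuilt so that it references the merged sets.
            primes_oa, primes_oc, primes_ac = defaultdict(set), defaultdict(set), defaultdict(set)
            all_triples = []
            for triples, oa, oc, ac in chunk_results:
                all_triples.extend(triples)
                for key, s in oa.items():
                    primes_oa[key] |= s
                for key, s in oc.items():
                    primes_oc[key] |= s
                for key, s in ac.items():
                    primes_ac[key] |= s
            triclusters = [((g, m, b), primes_ac[(m, b)], primes_oc[(g, b)], primes_oa[(g, m)])
                           for (g, m, b) in all_triples]
            return triclusters, primes_oa, primes_oc, primes_ac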

4     Experiments

Two series of experiments were conducted in order to verify the time complexity
and efficiency of the online algorithm: the first one was conducted on the first set of
synthetic contexts and on real-world datasets, the second one — on the second
set of synthetic contexts with a large number of triples in each. In each experiment
for the first set, both versions of the OAC-triclustering algorithm were used to
extract triclusters from a given context. Only the online version of the algorithm
was applied to the second set of contexts, as the computation time of the basic
version of the algorithm was too high. To evaluate the time more precisely, for
each context the algorithms were run 5 times and the average result was recorded.


4.1    Datasets

Synthetic datasets. As mentioned above, two sets of synthetic contexts were
generated.
       The first five contexts have the same size but different average densities. The
sets of objects, attributes, and conditions of these contexts consist of 50 elements
each (thus, the maximal number of triples for them is equal to 125,000). To form
the relation I, a pseudo-random number generator was used: it added each triple
to the context with a given probability that was different for each context.
These probabilities were 0.02, 0.04, 0.06, 0.08, and 0.1.
       The second set of uniform synthetic contexts consists of 10 contexts with the
same probability for each triple to be included (0.001), but with different sizes
of the sets of objects, attributes, and conditions. These sizes were 100, 200, 300,
. . . , 1000.
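For illustration, such contexts can be generated with a few lines of Python (a sketch of
the described procedure; the seed and function name are arbitrary):

    import random

    def random_tricontext(n_objects, n_attributes, n_conditions, p, seed=0):
        # Include every possible triple independently with probability p.
        rng = random.Random(seed)
        return {(g, m, b)
                for g in range(n_objects)
                for m in range(n_attributes)
                for b in range(n_conditions)
                if rng.random() < p}

    # First series: 50 x 50 x 50 contexts with p in {0.02, 0.04, 0.06, 0.08, 0.1}.
    context = random_tricontext(50, 50, 50, 0.02)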

IMDB. This dataset consists of the Top-250 list of the Internet Movie Database (the 250
best movies based on user reviews). For the analysis the following triadic context
was extracted: the set of objects consists of movie names, the set of attributes —
of tags, the set of conditions — of genres, and a triple of the ternary relation
means that the given movie has the given genre and is assigned the given tag.

Bibsonomy. Finally, a sample of the data of bibsonomy.org was used. This
website allows users to share bookmarks and lists of literature and to tag them.
For this research the following triadic context was extracted: the set of objects
consists of users, the set of attributes — of tags, the set of conditions — of bookmarks,
and a triple of the ternary relation means that the given user has assigned the
given tag to the given bookmark.
    Table 1 contains a summary of the contexts.


4.2    Results

The experiments were conducted on a computer running Windows 8, with an Intel
Core i7-3517U 2.40 GHz processor and 8 GB RAM. The algorithms

                         Table 1. Contexts for the experiments


                       Context        |G| |M | |B| # triples Density
                   Synthetic1 , 0.02 50 50 50          2530 0.02024
                   Synthetic1 , 0.04 50 50 50          5001 0.04001
                   Synthetic1 , 0.06 50 50 50          7454 0.05963
                   Synthetic1 , 0.08 50 50 50         10046 0.08037
                    Synthetic1 , 0.1   50 50 50       12462 0.09970
                   Synthetic2 , 100 100 100 100         996 0.001
                   Synthetic2 , 200 200 200 200        7995 0.001
                   Synthetic2 , 300 300 300 300       27161 0.001
                   Synthetic2 , 400 400 400 400       63921 0.001
                   Synthetic2 , 500 500 500 500 125104 0.001
                   Synthetic2 , 600 600 600 600 216021 0.001
                   Synthetic2 , 700 700 700 700 343157 0.001
                   Synthetic2 , 800 800 800 800 512097 0.001
                   Synthetic2 , 900 900 900 900 729395 0.001
                   Synthetic2 , 1000 1000 1000 1000 1000589 0.001
                        IMDB          250 795 22       3818 0.00087
                     BibSonomy         51 924 2844     3000 0.000022




    were implemented in C# under .NET Framework 4.5. Jenkins’ hash-function
    [11] was used to generate hash-values.
        Figure 3 shows the time performance of both versions of the algorithm for
    different values of the minimal density threshold. Figure 4 shows the computation
    time of the online version of the algorithm on the second set of synthetic con-
    texts. The “Basic” graph refers to the average time required by the basic algorithm,
    “Online, algorithm” — to the average time required by the main part of
    the online algorithm (addition of new triples), “Online, total” — to the aver-
    age time required by both the main algorithm and the post-processing. Table 2
    contains a summary of the results for the case of a zero minimal density threshold.
        As can be clearly seen from all the graphs, the online version of the algorithm
    significantly outperforms the basic version. However, post-processing in the case of
    a non-zero minimal density threshold can reduce the difference, especially in
    cases with small sets of objects, attributes, and conditions and a large ternary
    relation.
        In the case of several contexts of fixed size but increasing density, the total
    computation time converges to the same value for both algorithms, with
    the time for the online one being slightly smaller. For a non-zero minimal
    density threshold this convergence takes place for almost any average density
    value. In this case there is a rather large number of triclusters of big size,
    with many intersections, and thus it takes much time to calculate all the triclusters’
    densities. This situation is close to the worst case, where the time complexity is
    O(|G||M ||B|) for the main algorithm (because |I| converges to |G||M ||B|) and
    O(|I||G||M ||B|) for the post-processing. Also, in the case where the context’s




Fig. 3. Results of the experiments for both versions of OAC-triclustering algorithm




         Fig. 4. Computation time for the online algorithm for various numbers of triples




    density gets closer to 1, the total time for both algorithms should be almost the
    same even in the case of a zero minimal density threshold, as in the worst case for
    dense contexts |I| is equal to |G||M ||B| (though this is an extremely rare case for
    real datasets).
        The results for the second set of synthetic contexts confirm that the algorithm
    is indeed linear with respect to the number of triples. They also show that a
    large number of triples does not affect the performance as long as the
    context fits in memory.
        As for the other datasets, with large sets of objects, attributes, and conditions
    and a small ternary relation, the online algorithm significantly outperforms the
    basic one. The basic version spends much time on enumerating the large number
    of combinations of the elements of the different sets of the context, while the online
    one just passes through the existing triples. The time to compute densities is quite
    small for these datasets since, due to their sparseness, they contain a small number
    of rather small triclusters.
        Finally, as can be seen, for non-dense contexts the average density of the
    triclusters is rather high even in the case of a zero minimal density threshold.
    Because of that, in most cases it can be advised to use the online version
    of the algorithm without any hard conditions, like a minimal density condition, as
    the results will still be good, but the performance will be significantly better.

                         Table 2. Tricluster sets summary

               Context           Number of triclusters Average density
               Synthetic1 , 0.02                 2456            0.700
               Synthetic1 , 0.04                 4999            0.426
               Synthetic1 , 0.06                 7453            0.286
               Synthetic1 , 0.08                10046            0.218
               Synthetic1 , 0.1                 12462            0.193
               Synthetic2 , 100                   897            0.993
               Synthetic2 , 200                  6972            0.972
               Synthetic2 , 300                 23645            0.941
               Synthetic2 , 400                 56584            0.909
               Synthetic2 , 500                113041            0.871
               Synthetic2 , 600                199210            0.834
               Synthetic2 , 700                322447            0.796
               Synthetic2 , 800                487982            0.759
               Synthetic2 , 900                703374            0.722
               Synthetic2 , 1000               973797            0.686
               IMDB                              1276            0.539
               BibSonomy                         1290            0.946




5    Conclusion
In this paper we have presented an online version of the OAC-triclustering algorithm.
We have shown that the algorithm is efficient from both theoretical and practical
points of view. Its linear time complexity and one-pass operation (with an
additional pass for the required post-processing) allow us to use it for big data
problems. Moreover, the online algorithm, as well as the basic one, can be easily
parallelized to attain even higher efficiency.

Acknowledgements. The study was implemented in the framework of the
Basic Research Program at the National Research University Higher School of
Economics in 2013-2014, in the Laboratory of Intelligent Systems and Structural
Analysis (Russian Federation), and in the LIMOS (Laboratoire d’Informatique,
de Modelisation et d’Optimisation des Systemes) (France). The first three au-
thors were partially supported by Russian Foundation for Basic Research, grant
no. 13-07-00504.


References
 1. Cerf, L., Besson, J., Nguyen, K.N., Boulicaut, J.F.: Closed and noise-tolerant
    patterns in n-ary relations. Data Min. Knowl. Discov. 26(3), 574–619 (2013)
 2. Davey, B.A., Priestley, H.A.: Introduction to Lattices and Order. Cambridge Uni-
    versity Press, 2 edn. (2002)
 3. Eren, K., Deveci, M., Kucuktunc, O., Catalyurek, Umit V.: A comparative analysis
    of biclustering algorithms for gene expression data. Briefings in Bioinform. (2012)

    4. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations.
       Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1st edn. (1999)
    5. Gnatyshak, D.V., Ignatov, D.I., Kuznetsov, S.O.: From triadic FCA to tricluster-
       ing: Experimental comparison of some triclustering algorithms. In: Ojeda-Aciego,
       M., Outrata, J. (eds.) CLA. CEUR Workshop Proceedings, vol. 1062, pp. 249–260.
       CEUR-WS.org (2013)
    6. Gnatyshak, D.V., Ignatov, D.I., Semenov, A.V., Poelmans, J.: Gaining insight
       in social networks with biclustering and triclustering. In: BIR. Lecture Notes in
       Business Information Processing, vol. 128, pp. 162–171. Springer (2012)
    7. Ignatov, D.I., Kuznetsov, S.O., Poelmans, J.: Concept-based biclustering for in-
       ternet advertisement. In: ICDM Workshops. pp. 123–130. IEEE Computer Society
       (2012)
    8. Ignatov, D.I., Kuznetsov, S.O., Poelmans, J., Zhukov, L.E.: Can triconcepts be-
       come triclusters? International Journal of General Systems 42(6), 572–593 (2013)
    9. Ignatov, D.I., Nenova, E., Konstantinova, N., Konstantinov, A.V.: Boolean Matrix
       Factorisation for Collaborative Filtering: An FCA-Based Approach. In: Agre, G.,
       et al. (eds.) AIMSA 2014, Varna, Bulgaria, September 11-13, 2014. Proceedings.
       Lecture Notes in Computer Science, vol. 8722, pp. 47–58. Springer (2014)
   10. Jelassi, M.N., Yahia, S.B., Nguifo, E.M.: A personalized recommender system
       based on users’ information in folksonomies. In: Carr, L., et al. (eds.) WWW
       (Companion Volume). pp. 1215–1224. ACM (2013)
   11. Jenkins, B.: A hash function for hash table lookup (2006), http://www.
       burtleburtle.net/bob/hash/doobs.html
   12. Kaytoue, M., Kuznetsov, S.O., Macko, J., Napoli, A.: Biclustering meets triadic
       concept analysis. Ann. Math. Artif. Intell. 70(1-2), 55–79 (2014)
   13. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression
       data with pattern structures in formal concept analysis. Inf. Sci. 181(10), 1989–
       2001 (2011), http://dx.doi.org/10.1016/j.ins.2010.07.007
   14. Leskovec, J., Rajaraman, A., Ullman, J.: Mining of Massive Datasets, chap. Find-
       ing Similar Items, pp. 71–128. Cambridge University Press, England, Cambridge
       (2010)
   15. Li, A., Tuck, D.: An effective tri-clustering algorithm combining expression data
       with gene regulation information. Gene regulation and systems biology 3, 49–64
       (2009), http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2758278/
   16. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis:
       A survey. IEEE/ACM Trans. Comput. Biology Bioinform. 1(1), 24–45 (2004)
   17. Mirkin, B.: Mathematical Classification and Clustering. Kluwer, Dordrecht (1996)
   18. Mirkin, B.G., Kramarenko, A.V.: Approximate bicluster and tricluster boxes in
       the analysis of binary data. In: Kuznetsov, S.O., et al. (eds.) RSFDGrC 2011.
       Lecture Notes in Computer Science, vol. 6743, pp. 248–256. Springer (2011)
   19. Nanopoulos, A., Rafailidis, D., Symeonidis, P., Manolopoulos, Y.: Musicbox: Per-
       sonalized music recommendation based on cubic analysis of social tags. IEEE
       Transactions on Audio, Speech & Language Processing 18(2), 407–412 (2010)
   20. Wille, R.: Restructuring lattice theory: An approach based on hierarchies of con-
       cepts. In: Rival, I. (ed.) Ordered Sets, NATO Advanced Study Institutes Series,
       vol. 83, pp. 445–470. Springer Netherlands (1982)
   21. Zhao, L., Zaki, M.J.: Tricluster: An effective algorithm for mining coherent clusters
       in 3d microarray data. In: Özcan, F. (ed.) SIGMOD Conference. pp. 694–705. ACM
       (2005)
           Three Related FCA Methods for Mining
           Biclusters of Similar Values on Columns

   Mehdi Kaytoue1, Victor Codocedo2, Jaume Baixeries3, and Amedeo Napoli2
      1 Université de Lyon. CNRS, INSA-Lyon, LIRIS. UMR5205, F-69621, France.
      2 LORIA (CNRS - Inria Nancy Grand Est - Université de Lorraine), B.P. 239,
        F-54506, Vandœuvre-lès-Nancy.
      3 Universitat Politècnica de Catalunya. 08032, Barcelona. Catalonia.
                     Corresponding author: mehdi.kaytoue@insa-lyon.fr



           Abstract. Biclustering numerical data tables consists in detecting par-
           ticular and strong associations between subsets of objects and of at-
           tributes. Such biclusters are interesting since they model the data as
           local patterns. Whereas there exist several definitions of biclusters, de-
           pending on the constraints they should respect, we focus in this paper on
           biclusters of similar values on columns. There are several ad hoc methods
           for mining such biclusters in the literature. We focus here on two aspects:
           genericity and efficiency. We show that Formal Concept Analysis pro-
           vides a mathematical framework to characterize them in several ways,
           but also to compute them with existing and efficient algorithms. The
           proposed methods, which rely on pattern structures and triadic concept
           analysis, are evaluated and compared on two different datasets.

           Keywords: biclustering, triadic concept analysis, pattern structure


  1       Introduction
  Biclustering has attracted a lot of attention for many years now, as it was used in
  an extensive way for mining biological data [7]. Given a data-table with objects
  as rows and attributes as columns, the goal is to find “sub-tables”, or pairs of
  both subsets of objects and attributes, such that the values in the subtables
  respect well-defined constraints or maximize a given measure [17].
      There exist several types of biclusters depending on the relation the values
  should respect. For example, constant biclusters are subtables with equal val-
  ues [12, 6, 17]. Biclusters with similar values on columns (BSVC) are subtables
  where all values are pairwise similar for each column [4, 17]. The latter can also
  be generalized to biclusters of similar values (BSV): any two values in the sub-
  table are similar [2, 3, 12, 21]. Dozens of algorithms, mostly ad hoc, have been
   proposed for computing the different types of biclusters. In this paper, we are
   interested in possible extensions of the Formal Concept Analysis (FCA) for-
   malism for addressing the biclustering problem. This comes with two goals:
   (i) formalizing and understanding bicluster formation and structure, and (ii)
   reusing existing algorithms for genericity purposes.



     Actually, the present paper is a continuation of the work of the authors
on the use of pattern structures – an extension of FCA for mining complex data
[8, 12] – for discovering functional dependencies in crisp and fuzzy settings
[1], as well as on the adaptation of pattern structures to a specific biclustering
task: the discovery of biclusters of type BSV [6, 11]. Moreover, the biclustering
task is usually considered as a “two-dimensional” (2D) process where biclusters
are rectangles in a table verifying some prior constraints. It was one main idea
of [11] to transpose the problem into a “three-dimensional” setting by using and
adapting triadic concept analysis [16] for the biclustering task.
     Here we follow the same line and we propose a new approach for discovering
biclusters in a numerical dataset where biclusters have “similar values” w.r.t.
their columns (type BSVC). This work is a new attempt to extend the capabil-
ities of FCA and of pattern structures in dealing with the important problem of
biclustering. Actually, biclustering can also be considered in a (pure) numerical
setting, where it is sometimes called coclustering [18] and where kernel or spec-
tral methods are often used for achieving the task. Here we keep the discrete
setting and more precisely an FCA-based setting.
     The rest of this paper is organized as follows. In Section 2 we formally in-
troduce the biclustering problem. Then, we recall in Section 3 the FCA basics
that are necessary for developing our three methods in Section 4. We experiment
with these methods and compare them by processing two real-world datasets in
Section 5 before concluding.


2     Problem Definition

We introduce the problem of mining biclusters of similar values on columns, or
simply biclusters when no confusion can be made. A numerical dataset is defined
as a many-valued context in which biclusters are denoted as pairs of object and
attribute subsets for which a particular similarity constraint holds.

Definition 1 (Many-valued context and numerical dataset). A many-
valued context is a quadruple (G, M, W, I) where G is a set of objects,
M a set of attributes, W a set of attribute values, and I ⊆ G × M × W a ternary
relation. An element (g, m, w) ∈ I, also written m(g) = w or g(m) = w, can
be interpreted as: w is the value taken by the attribute m for the object g. The
relation I is such that g(m) = w and g(m) = v implies w = v.
    In the present work, W is a set of numbers and Knum = (G, M, W, I) denotes
a numerical dataset, i.e. a many-valued context where W is a set of numbers.
Example. A tabular representation of a numerical dataset is given in Table 1:
objects G = {g1 , g2 , g3 , g4 , g5 } are represented by rows while attributes
M = {m1 , m2 , m3 , m4 } are represented by columns. W = {0, 1, 2, 6, 7, 8, 9} and
we have for example g2 (m4 ) = 9.

                                     m1 m2 m3 m4
                                 g1   1  2  2  8
                                 g2   2  1  2  9
                                 g3   2  1  1  2
                                 g4   1  0  7  6
                                 g5   6  6  6  7

                                           Fig. 1. A numerical dataset


Definition 2 (Biclusters with similar values on columns). Given a nu-
merical dataset (G, M, W, I), a pair (A, B) (where A ⊆ G, B ⊆ M ) is called a
bicluster of similar values on columns when the following statement holds:

                          ∀g, h ∈ A, ∀m ∈ B, m(g) ≃θ m(h)

where ≃θ is a similarity relation: ∀w1 , w2 ∈ W, θ ∈ [0, max(W ) − min(W )],
w1 ≃θ w2 ⇐⇒ |w1 − w2 | ≤ θ. A bicluster (A, B) is maximal if ∄g ∈ G\A such
that (A ∪ {g}, B) is a bicluster, and ∄m ∈ M \B such that (A, B ∪ {m}) is a
bicluster.

Example. In Table 1, with θ = 1, we have that (A, B) = ({g1 , g2 }, {m1 , m2 , m3 })
is a bicluster. Indeed, consider each attribute of B separately: the values taken
by the objects A are pairwise similar. However, (A, B) is not maximal, since
we have that both (A ∪ {g3 }, B) and (A, B ∪ {m4 }) are also biclusters. Then,
({g1 , g2 , g3 }, {m1 , m2 , m3 }) and ({g1 , g2 }, {m1 , m2 , m3 , m4 }) are both maximal.
Problem (Biclustering). Given a numerical dataset (G, M, W, I) and a simi-
larity parameter θ, the goal of biclustering is to extract the set of all maximal
biclusters (A, B) respecting the similarity constraint.
Remark. It should be noticed that in the formal definition the similarity pa-
rameter is the same for all attributes. It is however possible to use a different
parameter for each attribute without changing either the problem definition or
its resolution. For real-world datasets, one can choose different similarity param-
eters θm (∀m ∈ M ), but one can also normalize/scale the attribute domains and use
a single similarity parameter θ.
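As an illustration of Definition 2, the following Python sketch checks the similarity
constraint and maximality for the running dataset of Table 1 (the dictionary encoding of
the dataset and all function names are ours, introduced only for this example):

    # Running numerical dataset: DATA[g][m] is the value m(g).
    DATA = {"g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 8},
            "g2": {"m1": 2, "m2": 1, "m3": 2, "m4": 9},
            "g3": {"m1": 2, "m2": 1, "m3": 1, "m4": 2},
            "g4": {"m1": 1, "m2": 0, "m3": 7, "m4": 6},
            "g5": {"m1": 6, "m2": 6, "m3": 6, "m4": 7}}

    def similar(w1, w2, theta):
        return abs(w1 - w2) <= theta

    def is_bicluster(A, B, data, theta):
        # Values must be pairwise similar, column by column.
        return all(similar(data[g][m], data[h][m], theta) for m in B for g in A for h in A)

    def is_maximal(A, B, data, theta):
        # No object and no attribute can be added without breaking the constraint.
        G, M = set(data), set(next(iter(data.values())))
        no_obj = all(not is_bicluster(A | {g}, B, data, theta) for g in G - set(A))
        no_att = all(not is_bicluster(A, B | {m}, data, theta) for m in M - set(B))
        return no_obj and no_att

    A, B = {"g1", "g2"}, {"m1", "m2", "m3"}
    print(is_bicluster(A, B, DATA, 1), is_maximal(A, B, DATA, 1))   # True False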


3    Basics on Formal Concept Analysis
In this paper, we show how our biclustering problem can be formalized and
answered in FCA in different ways: (i) using standard FCA [9], (ii) using pattern
structures [8], and (iii) using triadic concept analysis [16]. We recall below the
basics of each approach.
Dyadic Concept Analysis. Let G be a set of objects, M a set of attributes
and I ⊆ G × M be a binary relation. The fact (g, m) ∈ I is interpreted as “g
has attribute m”. The two following derivation operators (·)′ are defined:

                 A′ = {m ∈ M | ∀g ∈ A : gIm}              for A ⊆ G,
                 B ′ = {g ∈ G | ∀m ∈ B : gIm}             for B ⊆ M

which define a Galois connection between the powersets of G and M . For A ⊆ G,
B ⊆ M , a pair (A, B) such that A′ = B and B ′ = A, is called a (formal) concept.
Concepts are partially ordered by (A1 , B1 ) ≤ (A2 , B2 ) ⇔ A1 ⊆ A2 (⇔ B2 ⊆ B1 ).
With respect to this partial order, the set of all formal concepts forms a complete
lattice called the concept lattice of the formal context (G, M, I). For a concept
(A, B) the set A is called the extent and the set B the intent of the concept.
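For completeness, the two derivation operators can be written directly in Python (a
minimal sketch, with I given as a set of object-attribute pairs; the function names are
ours):

    def up(A, I, M):
        # A' : attributes shared by all objects of A.
        return {m for m in M if all((g, m) in I for g in A)}

    def down(B, I, G):
        # B' : objects having all attributes of B.
        return {g for g in G if all((g, m) in I for m in B)}

    def is_concept(A, B, I, G, M):
        return up(A, I, M) == set(B) and down(B, I, G) == set(A)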


Triadic Concept Analysis. A triadic context is given by (G, M, B, Y ) where
G, M , and B are respectively called sets of objects, attributes and conditions,
and Y ⊆ G × M × B. The fact (g, m, b) ∈ Y is interpreted as the statement “Ob-
ject g has the attribute m under condition b”. A (triadic) concept of (G, M, B, Y )
is a triple (A1 , A2 , A3 ) with A1 ⊆ G, A2 ⊆ M and A3 ⊆ B satisfying the two
following statements: (i) A1 × A2 × A3 ⊆ Y , and (ii) X1 × X2 × X3 ⊆ Y with A1 ⊆ X1 ,
A2 ⊆ X2 and A3 ⊆ X3 implies A1 = X1 , A2 = X2 and A3 = X3 . If (G, M, B, Y )
is represented by a three dimensional table, (i) means that a concept stands for
a 3-dimensional rectangle full of crosses while (ii) characterizes component-wise
maximality of concepts. For a triadic concept (A1 , A2 , A3 ), A1 is called the ex-
tent, A2 the intent and A3 the modus. To derive triadic concepts, two pairs of
derivation operators are defined. The reader can refer to [16] for their definitions
which are not necessary for the understanding of the present work.
Pattern Structures. Let G be a set of objects, let (D, ⊓) be a meet-semi-
lattice of potential object descriptions and let δ : G −→ D be a mapping. Then
(G, (D, ⊓), δ) is called a pattern structure. Elements of D are called patterns
and are ordered by a subsumption relation ⊑ such that given c, d ∈ D one has
c ⊑ d ⇐⇒ c ⊓ d = c. Within the pattern structure (G, (D, ⊓), δ) we can define the
following derivation operators (·)□ , given A ⊆ G and a description d ∈ (D, ⊓):

                 A□ = ⊓g∈A δ(g)          d□ = {g ∈ G | d ⊑ δ(g)}

These operators form a Galois connection between (℘(G), ⊆) and (D, ⊑). (Pat-
tern) concepts of (G, (D, ⊓), δ) are pairs of the form (A, d), A ⊆ G, d ∈ (D, ⊓),
such that A□ = d and d□ = A. For a pattern concept (A, d), d is called a pattern
intent and is the common description of all objects in A, called pattern extent.
When partially ordered by (A1 , d1 ) ≤ (A2 , d2 ) ⇔ A1 ⊆ A2 (⇔ d2 ⊑ d1 ), the set
of all concepts forms a complete lattice called a (pattern) concept lattice.
Computing Concepts and Concept Lattices. Processing a formal context
in order to generate its set of concepts can be achieved by various algorithms
(see [15] for a survey and a comparison, see also itemset mining [19]). For pro-
cessing pattern structures, such algorithms generally need minor adaptations.
Basically, one needs to override the code for (i) computing the intersection of
any two arbitrary descriptions, and (ii) testing the ordering between two descrip-
tions. Processing a triadic context is however not so direct and can be done with
nested FCA algorithms [10] or a dedicated data-mining algorithm [5].
Similarity relations in FCA. The notion of similarity can be formalized by a
tolerance relation: a symmetric, reflexive but not necessarily transitive relation.
The similarity relation ≃θ used for defining biclusters of similar values is a toler-
ance. Given a set of numbers W , any maximal subset of pairwise similar values
is called a block of tolerance.
Definition 3. A binary relation T ⊆ W × W is called a tolerance relation if:
      (i) ∀x ∈ W xT x (reflexivity)
      (ii) ∀x, y ∈ W xT y → yT x (symmetry)


Definition 4. Given a set W , a subset K ⊆ W , and a tolerance relation T on
W , K is a block of tolerance if:
      (i) ∀x, y ∈ K xT y (pairwise similarity)
      (ii) ∀z ∉ K, ∃u ∈ K ¬(zT u) (maximality)
It is shown that tolerance blocks can be obtained from the formal context of a
tolerance relation [14]. In the context (W, W, ≃θ ), one can characterize all blocks
of tolerance K (and only them) as formal concepts (K, K).
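Since the similarity ≃θ over numbers makes every tolerance block a maximal window of
width θ, the blocks can be computed directly after sorting, as in the following Python
sketch (the function name and the maximality test are ours):

    def tolerance_blocks(values, theta):
        # Maximal subsets of pairwise theta-similar values (blocks of tolerance).
        ws = sorted(set(values))
        blocks = []
        for i, w in enumerate(ws):
            block = [v for v in ws if w <= v <= w + theta]
            # The block is maximal iff its left end cannot be extended.
            if i == 0 or ws[i - 1] + theta < block[-1]:
                blocks.append(block)
        return blocks

    print(tolerance_blocks([8, 9, 2, 6, 7], 1))   # [[2], [6, 7], [7, 8], [8, 9]]

With the values of attribute m4 of Table 1 and θ = 1, this yields exactly the value sets
underlying the description δ(m4 ) used in Section 4.2.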

4      Mining biclusters of similar values on columns in FCA
The basic notions of FCA of the previous section allow us now to answer our
biclustering problem in various ways with: (i) an original method using inter-
val pattern structure, (ii) a recently introduced method using partition pattern
structures [6], and (iii) an original method relying on triadic concept analysis.
We emphasize the genericity of FCA to answer a data mining problem.

4.1     Interval Pattern Structure Approach
For a dataset Knum = (G, M, W, I), an interval pattern structure (G, (D, ⊓), δ)
is defined as follows [13]: the objects from G are described by vectors of intervals,
where each dimension gives a range of values for an attribute m ∈ M (following
a canonical ordering of the dimensions, i.e. dimension i corresponds to attribute
mi ∈ M ). Then, for m ∈ M , the semi-lattice of intervals (Dm , ⊓m ) is given by:

           Dm = {[w1 , w2 ] | ∃g, h ∈ G s.t. m(g) = w1 and m(h) = w2 }
           [a, b] ⊓m [c, d] = [min(a, c), max(b, d)]
           c ⊓m d = c ⇐⇒ c ⊑m d
           [a, b] ⊑m [c, d] ⇐⇒ [c, d] ⊇ [a, b]

The description space (D, ⊓) of the interval pattern structure is the product of
meet-semi-lattices (D, ⊓) = ×m∈M (Dm , ⊓m ), which is a semi-lattice.
Examples. In Table 1, ({g1 , g2 , g3 }, ⟨[1, 2], [1, 2], [1, 2], [2, 9]⟩) is a pattern concept:

                       δ(g1 ) = ⟨[1, 1], [2, 2], [2, 2], [8, 8]⟩
       {g1 , g2 , g3 }□ = δ(g1 ) ⊓ δ(g2 ) ⊓ δ(g3 ) = ⟨[1, 2], [1, 2], [1, 2], [2, 9]⟩
       ⟨[1, 2], [1, 2], [1, 2], [8, 9]⟩ ⊑ ⟨[1, 2], [1, 2], [1, 2], [2, 9]⟩
       {g1 , g2 , g3 }□□ = {g1 , g2 , g3 }
We now give the intuitive idea of how the interval pattern concept lattice can
be used to characterize the biclusters. Consider first the concept (A1 , d1 ) =
({g1 , g2 }, ⟨[1, 2], [1, 2], [1, 2], [8, 9]⟩). Consider also a function attr : D → ℘(M ) which
returns, for an interval pattern, the set of attributes whose interval is not larger
than the θ parameter: for d = ⟨[ai , bi ]⟩, i ∈ [1, |M |], attr(d) = {mi ∈ M | ai ≃θ
bi }. (A1 , attr(d1 )) = ({g1 , g2 }, {m1 , m2 , m3 , m4 }) is a maximal bicluster. Con-
sider the interval pattern concept (A2 , d2 ) = ({g1 , g2 , g3 }, ⟨[1, 2], [1, 2], [1, 2], [2, 9]⟩):
(A2 , attr(d2 )) = ({g1 , g2 , g3 }, {m1 , m2 , m3 }) is a maximal bicluster (with θ = 1).
This means that biclusters can be characterized thanks to pattern concepts.


Proposition 1. Consider a numerical dataset (G, M, W, I) as an interval pat-
tern structure (G, (D, ⊓), δ). For any maximal bicluster (A, B), there exists a
pattern concept (A, d) such that (A, B) = (A, attr(d)).

Proof. To ease reading, the proof is given in an appendix.                          ⊓⊔
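The characterization above can be checked on the running example with a small Python
sketch (interval vectors are written as tuples of pairs; interval_meet and attr are our
illustrative names for the meet ⊓ and the attr function):

    from functools import reduce

    # Interval-vector descriptions of g1, g2, g3 over (m1, m2, m3, m4), from Table 1.
    delta = {"g1": ((1, 1), (2, 2), (2, 2), (8, 8)),
             "g2": ((2, 2), (1, 1), (2, 2), (9, 9)),
             "g3": ((2, 2), (1, 1), (1, 1), (2, 2))}

    def interval_meet(d1, d2):
        # Component-wise meet: the smallest interval containing both intervals.
        return tuple((min(a, c), max(b, d)) for (a, b), (c, d) in zip(d1, d2))

    def attr(d, attributes, theta):
        # Attributes whose interval in d is not wider than theta.
        return {m for m, (a, b) in zip(attributes, d) if b - a <= theta}

    d = reduce(interval_meet, (delta[g] for g in ("g1", "g2", "g3")))
    print(d)                                      # ((1, 2), (1, 2), (1, 2), (2, 9))
    print(attr(d, ("m1", "m2", "m3", "m4"), 1))   # {'m1', 'm2', 'm3'}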


4.2    Partition pattern structure approach

     A partition pattern structure is a pattern structure instance where the de-
scription space is given by a semi-lattice of partitions over a set X [2]. Formally,
we have (G, (D, ⊓), δ) where D = Part(X) and d1 ⊓ d2 = ⋃ (pi ∩ pj ) where
pi , pj ⊆ X, pi ∈ d1 , pj ∈ d2 . The semi-lattice is actually a complete lattice of
set partitions in which the bottom element is not considered. In [1], we showed
that the definition of ⊓, and equivalently ⊑, needs a slight modification when
D = 2^(2^X) , i.e. a description d ∈ D is a set of subsets of X which cover X
(possibly with overlapping). In that case, we have that d1 ⊓ d2 = max(⋃ pi ∩ pj )
where pi , pj ⊆ X, pi ∈ d1 , pj ∈ d2 and max(.) returns the maximal sets w.r.t.
inclusion.
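A short Python sketch of this modified meet may make it concrete; descriptions are
represented as sets of frozensets, and the values used below are the descriptions δ(m3 )
and δ(m4 ) of the example further down (the function name is ours):

    def pps_meet(d1, d2):
        # All pairwise intersections, keeping only the maximal ones w.r.t. inclusion
        # (the empty set, i.e. the bottom element, is discarded).
        inters = {p & q for p in d1 for q in d2}
        inters.discard(frozenset())
        return {p for p in inters if not any(p < q for q in inters)}

    d_m3 = {frozenset({"g1", "g2", "g3"}), frozenset({"g4", "g5"})}
    d_m4 = {frozenset({"g4", "g5"}), frozenset({"g1", "g5"}),
            frozenset({"g1", "g2"}), frozenset({"g3"})}
    print(pps_meet(d_m3, d_m4))
    # {frozenset({'g1', 'g2'}), frozenset({'g4', 'g5'}), frozenset({'g3'})}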
     Now we show that such a pattern structure can be constructed from a nu-
merical dataset, and that the corresponding concepts allow generating all max-
imal biclusters. From a numerical dataset (G, M, W, I), we build the structure
(M, (D, ⊓), δ) where D = 2^(2^G) . The description of an object4 m ∈ M is given by
δ(m) = {p1 , p2 , ...} where p1 , p2 , ... ⊆ G and:

             m(g1 ) ≃θ m(g2 ), ∀g1 , g2 ∈ pi                  (similarity)
             ∄g3 ∈ G\pi with m(g3 ) ≃θ m(gk ), ∀gk ∈ pi       (maximality)
             ⋃i pi = G                                        (covering)

In other words, each original attribute m ∈ M is described by a family of subsets
of G, where each one corresponds to a block of tolerance w.r.t. the values of
attribute m. Let (A, d = {pi }) be a partition pattern concept; it is easy to see
how the pairs bici = (pi , A) are biclusters with rows g ∈ pi and columns m ∈ A5 .
While any bici = (pi , A) is a bicluster, it is not necessarily a maximal bicluster.
Nevertheless, maximal biclusters can be identified using the concept lattice.
Proposition 2. Consider a pattern concept (A, d = {pi }). The bicluster bici =
(pi , A) is maximal if there is no pattern concept (C, {pi , ...}) with A ⊆ C.

Proof. The proof of this proposition is very intuitive. Recall from Section 2 that
the bicluster (pi , A) is maximal if two conditions are met, namely ∄g ∈ G\pi
such that (pi ∪ {g}, A) is a bicluster and ∄m ∈ M \A such that (pi , A ∪ {m}) is
4
    Object in the pattern structure; attribute in the numerical dataset.
5
    In order to keep consistency with the previous notation, biclusters are written
    inversely as partition pattern concepts.


a bicluster. The first condition holds for bici given the maximality condition of
the tolerance block pi ; the second follows from the statement of the proposition. ⊓⊔
Example. The numerical dataset (G, M, W, I) given in Table 1 can be turned
into a pattern structure as follows with θ = 1:
      δ(m1 ) = {{g1 , g2 , g3 , g4 }, {g5 }}    δ(m2 ) = {{g2 , g3 , g4 }, {g1 , g2 , g3 }, {g5 }}
      δ(m3 ) = {{g1 , g2 , g3 }, {g4 , g5 }}    δ(m4 ) = {{g4 , g5 }, {g1 , g5 }, {g1 , g2 }, {g3 }}

    Indeed, each component of a description is a maximal set of objects hav-
ing pairwise similar values for a given attribute. The pattern concept lattice is
given in Figure 2. We remark that (i) any concept corresponds to a biclus-
ter, (ii) some of them correspond to a maximal bicluster, and, most impor-
tantly, (iii) any maximal bicluster can be found as a concept. For example,
from the concept (A1 , d1 ) = ({m3 , m4 }, {{g1 , g2 }, {g4 , g5 }, {g3 }}) we obtain the
following biclusters: bic1 = ({g1 , g2 }, {m3 , m4 }) and bic2 = ({g4 , g5 }, {m3 , m4 }).
Whereas bic2 is a maximal bicluster, bic1 is not, since we have that (A2 , d2 ) =
({m1 , m2 , m3 , m4 }, {{g1 , g2 }, {g3 }, {g4 }, {g5 }}) with (A2 , d2 ) ≤ (A1 , d1 ). In turn,
bic3 = ({g1 , g2 }, {m1 , m2 , m3 , m4 }) is a maximal bicluster.
Remark. It is noticeable that an equivalent formal context can be built. By
equivalent, we mean that the concept lattices produced by both structures are
isomorphic. To obtain this formal context, we use a slight modification of the data
transformation of [9] (p. 92): (M, B2 (G), I) s.t. (m, (g, h)) ∈ I ⇐⇒ m(g) ≃θ
m(h). The concept lattice is equivalent to the pattern concept lattice [2], and
thus it can be used in the same way to get maximal biclusters. In our running
example, such a context is given in Table 1, and its associated concept lattice is
given in Figure 2 (right), a lattice isomorphic to the one obtained from the pattern
structure (left). The proof can be done in a similar manner as in [2].


        (g1 , g2 ) (g1 , g3 ) (g1 , g4 ) (g1 , g5 ) (g2 , g3 ) (g2 , g4 ) (g2 , g5 ) (g3 , g4 ) (g3 , g5 ) (g4 , g5 )
   m1      ×          ×          ×                     ×          ×                     ×
   m2      ×          ×                                ×          ×                     ×
   m3      ×          ×                                ×                                                      ×
   m4      ×                                                                                                  ×
                                          Table 1. Formal context




4.3     Triadic Concept Analysis Approach
We present another original result: any maximal bicluster of similar values is
characterized as a triadic concept. The triadic context is derived from the nu-
merical dataset by encoding the tolerance relation between the values.

Proposition 3. Given a numerical dataset (G, M, W, I), consider the derived
triadic context given by (M, G, G, Y ) s.t. (m, g1 , g2 ) ∈ Y ⇐⇒ m(g1 ) ≃θ m(g2 ).




     Fig. 2. Pattern concept lattice on the left side, concept lattice on the right side.


There is a one-to-one correspondence between the set of all maximal biclusters
(A, B) and the set of all triadic concepts (B, A, A) of the derived context.

Proof. Consider a maximal bicluster (A, B). We have that ∀g, h ∈ A : m(g) ≃θ
m(h) ⇐⇒ m ∈ B, if and only if (by the definition of Y ) (B, A, A) ⊆ Y . Now
take (B ′ , A′ , A′ ) ⊆ Y such that B ⊆ B ′ and A ⊆ A′ . Since (A, B) is a maximal
bicluster, for any pair of objects g, h ∈ A′ and any m ∈ B ′ such that
g(m) ≃θ h(m), it follows that g, h ∈ A and m ∈ B. Conversely, let (B, A, A) be a triadic
concept. Then for any pair of objects g, h ∈ A and any m ∈ B we have
g(m) ≃θ h(m), that is, ∀g, h ∈ A : g(m) ≃θ h(m) ⇐⇒ m ∈ B, which is
the alternative definition of a maximal bicluster.                                ⊓⊔

Example. Taking again θ = 1, the triadic context derived from the numerical
dataset from Table 1 is given in Table 2. An example of triadic concept is:
({m3 , m2 , m1 }, {g1 , g3 , g2 }, {g1 , g2 , g3 }) which is in turn the maximal bicluster
({g1 , g3 , g2 }, {m3 , m2 , m1 }).
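The derivation of the triadic context itself is straightforward; the sketch below builds Y
from the running dataset in Python (the DATA dictionary repeats Table 1; mining the
triadic concepts would then be delegated to a dedicated algorithm such as Data-Peeler,
as done in Section 5):

    DATA = {"g1": {"m1": 1, "m2": 2, "m3": 2, "m4": 8},
            "g2": {"m1": 2, "m2": 1, "m3": 2, "m4": 9},
            "g3": {"m1": 2, "m2": 1, "m3": 1, "m4": 2},
            "g4": {"m1": 1, "m2": 0, "m3": 7, "m4": 6},
            "g5": {"m1": 6, "m2": 6, "m3": 6, "m4": 7}}

    def derived_triadic_context(data, theta):
        # Y = {(m, g1, g2) | m(g1) and m(g2) are theta-similar}.
        objects = list(data)
        return {(m, g1, g2)
                for g1 in objects for g2 in objects
                for m in data[g1]
                if abs(data[g1][m] - data[g2][m]) <= theta}

    Y = derived_triadic_context(DATA, 1)
    print(("m3", "g1", "g2") in Y, ("m4", "g1", "g3") in Y)   # True False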


5     Experiments
We experiment with the different FCA methods introduced in the previous sec-
tion. We report preliminary results in two aspects: efficiency (running time) and
compactness (number of concepts) to discuss the strengths and weaknesses of
the different methods.

       m1 g1 g2 g3 g4 g5 m2 g1 g2 g3 g4 g5 m3 g1 g2 g3 g4 g5 m4 g1 g2 g3 g4 g5
       g1 × × × ×        g1 × × ×          g1 × × ×          g1 × ×         ×
       g2 × × × ×        g2 × × × ×        g2 × × ×          g2 × ×
       g3 × × × ×        g3 × × × ×        g3 × × ×          g3       ×
       g4 × × × ×        g4    × × ×       g4          × × g4            × ×
       g5             × g5              × g5           × × g5 ×          × ×
              Table 2. Triadic context derived from Table 1 thanks to ≃1 .


Data and experimental settings. The first dataset, “Diagnosis”6 , contains
120 objects with 8 attributes. The first attribute provides temperature informa-
tion of a given patient with a range [35.5, 41.5] (numerical). For this attribute
we used θ = 0.1 and then θ = 0.3. The other 7 attributes are binary (θ = 0).
The second dataset, “dataSample 1.txt”, is provided with the BiCat software7 .
It contains 420 objects and 70 numerical attributes with range [−5.9, 6.7]. We
used θ = 0.05 for all attributes. We provide results in Table 3 for the three dif-
ferent FCA methods discussed in this article, namely interval pattern structure
(IPS), tolerance blocks/partition pattern structures (TBPS) and triadic concept
analysis (TCA). We also report on the use of standard FCA using the discretiza-
tion technique discussed at the end of Section 4.2 (FCA). We also discuss the
computing of clarified contexts, given that it can dramatically reduce the size
of the context while keeping the same concept lattice (FCA-CL). A context is
clarified when there exist neither two objects with the same description nor two
attributes shared by the same set of objects.
    For the methods based on FCA and pattern structures (IPS, TBPS), we used
a C++ version of the AddIntent algorithm [20]8 . No restrictions were imposed
over the size of the biclusters. The TCA method was implemented using Data-
Peeler [5]. All the experiments were performed using a Linux machine with
Intel Xeon E7 running at 2.67GHz with 1TB of RAM.
Discussion. Results in Table 3 show that for the Diagnosis dataset, the clar-
ified context using standard FCA (FCA-CL) is the best of the five methods
w.r.t. execution time while for the BicAt sample 1, the best is TCA. Times are
expressed as the sum of the time required to create the input representation
of the dataset for the corresponding technique and its execution time. In the case
of FCA and FCA-CL, the pre-processing can take as long as the execution of
the AddIntent algorithm itself. However, for large datasets such as the
BicAt example, this time can be ignored. It is also worth noticing that the
pre-processing depends on the chosen θ value, hence for each different θ config-
uration a new pre-processing task has to be executed. This is not the case for
interval and partition pattern structures, whose pre-processing is linear
6 http://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice
7 http://www.tik.ee.ethz.ch/sop/bicat/
8 https://code.google.com/p/sephirot/


                        Diagnosis, θ = 0.3        Diagnosis, θ = 0.1       BicAt sample 1, θ = 0.05
Technique        Time [s]          #Concepts  Time [s]         #Concepts  Time [s]        #Concepts
                 (Preproc + Exec.)            (Preproc + Exec.)           (Preproc + Exec.)
FCA              0.11 + 0.335      98         0.11 + 0.291     88         2.3 + 2,220     476,950
FCA-CL           0.11 + 0.02       98         0.11 + 0.011     88         2.3 + 2,220     476,950
TCA              0.04 + 33.3       3,322      0.04 + 31.34     2,127      3.17 + 360      741,421
IPS              0.011 + 0.303     928        0.001 + 0.178    301        0.02 + 2,340    722,442
TBPS             0.011 + 1.76      98         0.001 + 0.411    88         0.02 + 5,340    476,950

Table 3. Number of concepts and execution times (pre-processing + AddIntent run)


w.r.t. the number of objects (it is actually just a change of format). We can
also appreciate the more compact representation of the biclusters obtained with
partition pattern structures (TBPS) and its formal context versions (FCA and
FCA-CL). While TBPS is the slowest of the five methods, it is also the cheapest
one in terms of machine resources, more specifically RAM. TCA is the
most expensive method in terms of machine resources and data representation;
however, it yields results faster. Interval pattern structures are in the middle,
offering a good trade-off between compactness and execution time.
    For this initial experimentation we have not reported the number of maximal
biclusters nor the bicluster extraction algorithms that can be implemented for
each different technique, but only the FCA techniques themselves. Regarding
the number of maximal biclusters, it is the same for each technique since
all of them are bicluster enumeration techniques, i.e. all possible biclusters are
extracted. Hence, the difference among techniques is not given by the number
of maximal biclusters extracted, but by the number of formal concepts found
and the complexity of the post-processing needed to extract the maximal biclusters from
them. In general, it is easy to observe from Propositions 1, 2 and 3 that the
post-processing of TCA is linear w.r.t. the number of triadic concepts found,
while for IPS it is linear w.r.t. the number of interval pattern concepts times the
number of columns of the numerical dataset squared, and for TBPS it is linear
w.r.t. the number of super-sub concept relations in the tolerance block pattern
concept lattice. Nevertheless, different strategies for bicluster extraction can be
implemented for each technique, rendering the comparison unfair. For example,
in [6] an optimization is proposed regarding biclustering using partition pattern
structures (which can be easily adapted to TBPS) which cuts its execution
time in half by breaking the structure of the lattice. Similar strategies for IPS and TCA
could also be implemented but are still a matter of research.



6     Conclusion


Biclustering is an important data analysis task that is used in several applications
such as transcriptome analysis in biology and the design of recommender
systems. Biclustering methods produce a collection of local patterns that
are easier to interpret than a global model. There are several types of biclusters
and corresponding algorithms, ad hoc most of the time. In this paper, our
main contribution shows how biclusters of similar values on columns can be
characterized or generated from formal concepts, pattern concepts and triadic
concepts. Bringing this biclustering problem back into the setting of formal concept
analysis allows the use of existing and efficient algorithms without any
modification. However, and this is among the perspectives of research, several
optimizations can be made. For example, with the triadic method, one should
not generate both concepts (A, B, C) and (A, C, B): they are redundant since
only concepts with B = C correspond to maximal biclusters.


References
 1. J. Baixeries, M. Kaytoue, and A. Napoli. Computing similarity dependencies with
    pattern structures. In M. Ojeda-Aciego and J. Outrata, editors, CLA, volume 1062
    of CEUR Workshop Proceedings, pages 33–44. CEUR-WS.org, 2013.
 2. J. Baixeries, M. Kaytoue, and A. Napoli. Characterizing Functional Dependencies
    in Formal Concept Analysis with Pattern Structures. Annals of Mathematics and
    Artificial Intelligence, pages 1–21, Jan. 2014.
 3. J. Besson, C. Robardet, L. D. Raedt, and J.-F. Boulicaut. Mining bi-sets in nu-
    merical data. In S. Dzeroski and J. Struyf, editors, KDID, volume 4747 of Lecture
    Notes in Computer Science, pages 11–23. Springer, 2007.
 4. A. Califano, G. Stolovitzky, and Y. Tu. Analysis of gene expression microarrays for
    phenotype classification. In P. E. Bourne, M. Gribskov, R. B. Altman, N. Jensen,
    D. A. Hope, T. Lengauer, J. C. Mitchell, E. D. Scheeff, C. Smith, S. Strande,
    and H. Weissig, editors, Proceedings of the Eighth International Conference on
    Intelligent Systems for Molecular Biology, August 19-23, 2000, La Jolla / San
    Diego, CA, USA, pages 75–85. AAAI, 2000.
 5. L. Cerf, J. Besson, C. Robardet, and J.-F. Boulicaut. Closed patterns meet n-ary
    relations. TKDD, 3(1), 2009.
 6. V. Codocedo and A. Napoli. Lattice-based biclustering using Partition Pattern
    Structures. In 21st European Conference on Artificial Intelligence (ECAI), 2014.
 7. A. V. Freitas, W. Ayadi, M. Elloumi, J. Oliveira, J. Oliveira, and J.-K. Hao. Bio-
    logical Knowledge Discovery Handbook: Preprocessing, Mining, and Postprocessing
    of Biological Data, chapter Survey on Biclustering of Gene Expression Data. John
    Wiley & Sons, Inc., 2013.
 8. B. Ganter and S. O. Kuznetsov. Pattern structures and their projections. In ICCS
    ’01: Proceedings of the 9th International Conference on Conceptual Structures,
    pages 129–142. Vol. 2120, Springer-Verlag, 2001.
 9. B. Ganter and R. Wille. Formal Concept Analysis. Springer, 1999.
10. R. Jäschke, A. Hotho, C. Schmitz, B. Ganter, and G. Stumme. Trias - an algorithm
    for mining iceberg tri-lattices. In ICDM, pages 907–911, 2006.
11. M. Kaytoue, S. O. Kuznetsov, J. Macko, and A. Napoli. Biclustering meets triadic
    concept analysis. Annals of Mathematics and Artificial Intelligence, 70(1-2), 2014.
12. M. Kaytoue, S. O. Kuznetsov, and A. Napoli. Biclustering numerical data in formal
    concept analysis. In P. Valtchev and R. Jäschke, editors, ICFCA, volume 6628 of
    LNCS, pages 135–150. Springer, 2011.
13. M. Kaytoue, S. O. Kuznetsov, A. Napoli, and S. Duplessis. Mining gene expression
    data with pattern structures in formal concept analysis. Information Science,
    181(10):1989–2001, 2011.
14. S. O. Kuznetsov. Galois connections in data analysis: Contributions from the soviet
    era and modern russian research. In B. Ganter, G. Stumme, and R. Wille, editors,
    Formal Concept Analysis, volume 3626 of Lecture Notes in Computer Science,
    pages 196–225. Springer, 2005.
15. S. O. Kuznetsov and S. A. Obiedkov. Comparing performance of algorithms for
    generating concept lattices. J. Exp. Theor. Artif. Intell., 14(2-3):189–216, 2002.
16. F. Lehmann and R. Wille. A triadic approach to formal concept analysis. In ICCS,
    volume 954 of LNCS, pages 32–43. Springer, 1995.
17. S. Madeira and A. Oliveira. Biclustering algorithms for biological data analysis: a
    survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics,
    1(1):24–45, 2004.


18. N. Rogovschi, L. Labiod, and M. Nadif. A spectral algorithm for topographical
    co-clustering. In IJCNN, pages 1–6. IEEE, 2012.
19. T. Uno, M. Kiyomi, and H. Arimura. Lcm ver. 2: Efficient mining algorithms for
    frequent/closed/maximal itemsets. In R. J. B. Jr., B. Goethals, and M. J. Zaki,
    editors, FIMI, volume 126 of CEUR Workshop Proceedings. CEUR-WS.org, 2004.
20. D. van der Merwe, S. Obiedkov, and D. Kourie. AddIntent: A New Incremental Al-
    gorithm for Constructing Concept Lattices. In P. Eklund, editor, Concept Lattices,
    volume 2961 of LNCS, pages 205–206. Springer, Berlin/Heidelberg, 2004.
21. R. Veroneze, A. Banerjee, and F. J. V. Zuben. Enumerating all maximal biclusters
    in real-valued datasets. CoRR, abs/1403.3562, 2014.


7     Appendix: Proof of Proposition 1

We first introduce some notation before recalling and proving Proposition 1,
which relates maximal biclusters to the interval pattern concepts of a pattern
structure. The intuition lies in the relation between the set of attributes M of
(G, M, W, I) and the descriptions of the interval pattern structure (G, (D, ⊓), δ).
Let d = ⟨[a1, b1], [a2, b2], . . . , [an, bn]⟩ ∈ D be an interval pattern in an interval
pattern structure (G, (D, ⊓), δ), where |M| = n. For any mi ∈ M, we define
d(mi) = [ai, bi] and |d(mi)| = |ai − bi|.

Definition 5. Let d be a pattern in an interval pattern structure (G, (D, ⊓), δ).
The function attr : D → 2M is defined as attr(d) = {m ∈ M | |d(m)| ≤ θ}.

Definition 6. Let A ⊆ G be a set of objects and m ∈ M an attribute. We
define A(m) = {g(m) | g ∈ A}. For instance, in Table 1, if A = {g1, g2, g3},
then A(m4) = {2, 8, 9}.
Proposition 4. For A ⊆ G we have that, for all mi ∈ M:

      A□ = ⟨[min(A(m1)), max(A(m1))], . . . , [min(A(mn)), max(A(mn))]⟩

Proof. Since the operation ⊓ is associative and commutative, we have that

      A□ = ⊓gi∈A δ(gi) = ⟨[min(A(m1)), max(A(m1))], . . . , [min(A(mn)), max(A(mn))]⟩
                                                                                   ⊓⊔
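To make the notation concrete, the following Python sketch (ours, not part of the authors' implementation; the toy values are hypothetical, except that the last column reproduces A(m4) = {2, 8, 9} from Definition 6) computes d = A□ as the component-wise [min, max] interval and attr(d) as the set of attributes whose interval is not wider than θ.

# Minimal sketch of Proposition 4 and Definition 5 (hypothetical names and toy data).
# A numerical dataset: each object maps to its vector of attribute values.
data = {
    "g1": [1, 2, 2, 2],
    "g2": [2, 1, 1, 8],
    "g3": [2, 2, 1, 9],
}
attributes = ["m1", "m2", "m3", "m4"]

def interval_infimum(A):
    """d = A□: the component-wise [min, max] interval over the objects in A."""
    vectors = [data[g] for g in A]
    return [(min(col), max(col)) for col in zip(*vectors)]

def attr(d, theta):
    """attr(d): attributes whose interval width |b - a| is at most theta."""
    return {m for m, (a, b) in zip(attributes, d) if abs(b - a) <= theta}

d = interval_infimum({"g1", "g2", "g3"})
print(d)                 # [(1, 2), (1, 2), (1, 2), (2, 9)]
print(attr(d, theta=1))  # m1, m2 and m3 under these toy values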
    Now we reformulate and prove Proposition 1.
Proposition 5. Consider a numerical dataset (G, M, W, I) as an interval pattern
structure (G, (D, ⊓), δ). For any maximal bicluster (A, B), define d = A□.
Then: 1. B = attr(d), and 2. (A, d) is a pattern concept in (G, (D, ⊓), δ).

Proof. 1. B = attr(d). We prove that m ∈ attr(d) ↔ m ∈ B. By the definition of
    a maximal bicluster we have that, for all m ∈ M, m ∈ B ↔ |A(m)| ≤ θ, which
    holds if and only if |min(A(m)) − max(A(m))| ≤ θ, if and only if (by the
    definition of d) m ∈ attr(d).                                          ⊓⊔
 2. We need to prove that A□ = d and that A = d□. A□ = d holds by the
    definition of d. As for A = d□, take g ∈ d□, which means that ∀m ∈ M:
    g(m) ∈ d(m), in particular for every m ∈ B; hence A ∪ {g} still has similar
    values on B and, by the maximality of (A, B), g ∈ A.                   ⊓⊔
      Defining Views with Formal Concept Analysis
       for Understanding SPARQL Query Results

                       Mehwish Alam2,3 and Amedeo Napoli1,2
         1
           CNRS, LORIA, UMR 7503, Vandoeuvre-lès-Nancy, F-54506, France
                        2
                          Inria, Villers-lès-Nancy, F-54600, France
  3
    Université de Lorraine, LORIA, UMR 7503, Vandoeuvre-lès-Nancy, F-54506, France
                        {mehwish.alam,amedeo.napoli}@loria.fr



         Abstract. SPARQL queries over semantic web data usually produce
         lists of tuples as answers, which may be hard to understand and interpret.
         Accordingly, this paper focuses on Lattice-Based View Access (LBVA),
         a framework based on FCA. This framework provides a classification of
         the answers of SPARQL queries based on a concept lattice, which can be
         navigated for retrieving or mining specific patterns in query results. In
         this way, the concept lattice can be considered as a materialized view of
         the data resulting from a SPARQL query.


  Keywords: Formal Concept Analysis, SPARQL Query Views, Lattice-Based
  Views, SPARQL, Classification.


  1    Introduction
  At present, the Web has become a potentially large repository of knowledge, which
  is becoming mainstream for querying and extracting useful information. In particular,
  Linked Open Data (LOD) [2] provides a method for publishing structured
  data in the form of RDF resources. These RDF resources are interlinked with
  each other to form a cloud. SPARQL queries are used in order to make these
  resources usable, i.e., queryable. In some cases, queries in natural language against
  standard search engines are simple to use, but sometimes they are complex
  and may require the integration of several data sources. Then standard search engines
  are not able to easily answer such queries, e.g., "Currencies of all G8 countries".
  Such a complex query can be formalized as a SPARQL query over data
  sources present in the LOD cloud through SPARQL endpoints for retrieving answers.
  Moreover, users may sometimes execute queries which generate a huge number of
  results, giving rise to the problem of information overload [5]. A typical example
  is given by the answers retrieved by search engines, which mix several
  meanings of one keyword. In the case of huge result sets, the user has to go through
  a lot of results to find the interesting ones, which can be overwhelming without
  any specific navigation tool. The same holds for the answers obtained by
  SPARQL queries, which may be huge in number and from which it may be harder to
  extract the most interesting patterns. This problem of information overload raises new



challenges for data access, information retrieval and knowledge discovery w.r.t.
web querying.
    Accordingly, this paper proposes a new approach based on Formal Concept
Analysis (FCA [7]). It describes a lattice-based classification of the results obtained
by SPARQL queries by introducing a new clause VIEW BY in SPARQL
queries. This framework, called Lattice-Based View Access (LBVA), allows the
classification of SPARQL query results into a concept lattice, referred to as a
view, for data analysis, navigation, knowledge discovery and information retrieval
purposes. The new clause VIEW BY enhances the functionality of the already
existing GROUP BY clause of SPARQL by adding sophisticated classification
and knowledge discovery aspects. Hereafter, we describe how a lattice-based
view can be designed from a SPARQL query. Afterwards, a view is accessed for
analysis and interpretation purposes, which are fully supported by the concept
lattice. In the case of large data, only a part of the lattice [10] can be considered for
the analysis. In this way, this paper also investigates the capabilities of FCA to
deal with semantic web data.
    The intuition of classifying results obtained by SPARQL queries is inspired
by web clustering engines [3] such as Carrot2⁴. The general idea behind web
clustering engines is to group the results obtained for a query posed by the user
according to the different meanings of the terms related to the query. Such systems
deal with unstructured textual data on the web. By contrast, some studies
have been conducted to deal with structured RDF data. In [5], the authors introduce
a clause Categorize By to target the problem of managing large amounts of
results obtained by conjunctive queries with the help of the subsumption hierarchy
present in the knowledge base. By contrast, the VIEW BY clause generates lattice-
based views which provide a mathematically well-founded classification based on
formal concepts and an associated concept lattice. Moreover, it also paves the way
for navigation or information retrieval by traversing the concept lattice and for
data analysis by allowing the extraction of association rules from the lattice.
Such data analysis operations allow the discovery of new knowledge. Additionally,
unlike Categorize By, VIEW BY can deal with data that has no schema (which
is often the case with linked data). Moreover, VIEW BY has been evaluated over a
very large set of answers (roughly 100,000 results) obtained over real datasets. For
large numbers of answers, Categorize By does not provide any pruning
mechanism, while this paper describes how views can be pruned using iceberg
lattices.
    The paper is structured as follows: Section 2 introduces a motivating example.
Section 3 gives a brief introduction to the state of the art, while Section 4
defines LBVA and gives the overall architecture of the framework. Section 5 discusses
some experiments conducted using LBVA. Finally, Section 6 concludes
the paper.


4
    http://project.carrot2.org/index.html


2     Motivation

In this section we introduce a motivating example focusing on why LOD should
be queried and why SPARQL query results need classification. This scenario
will be used in the rest of the paper. Let us consider a query Q searching for
museums where the exhibitions of some famous artists are taking place, along with
the locations of the museums. Here, we do not discuss the interface aspects and we
assume that SPARQL queries are provided. A standard query engine is not
adequate for answering such kinds of questions, and a direct query over LOD will
give better results. One of the ways to obtain such information is to query
LOD through its SPARQL endpoint. This query will generate a huge number of
results, which will require further manual work to group the interesting links.


3     Background

3.1    Linked Open Data

Linked Open Data (LOD) [2] is a way of publishing structured data in the
form of RDF graphs. Given a set of URIs U, blank nodes B and literals L, an
RDF triple is represented as t = (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L), where
s is a subject, p is a predicate and o is an object. A finite set of RDF triples is
called an RDF graph G. It can be seen as a graph G = (V, E), where V is a set of
vertices and E is a set of labeled edges, and G ⊆ (U ∪ B) × U × (U ∪ B ∪ L).
Each pair of vertices connected through a labeled edge keeps the information
of a statement. Each statement is represented as ⟨subject, predicate, object⟩ and
referred to as an RDF triple. V includes subjects and objects while E includes
the predicates.
    SPARQL⁵ is the standard query language for RDF. In the current work we
focus on queries containing the SELECT clause. Let us assume that there
exists a set of variables V disjoint from U in the above definition of RDF; then
(U ∪ V) × (U ∪ V) × (U ∪ V) is a graph pattern called a triple pattern. If a
variable ?X ∈ V and ?X = c, then c ∈ U. Given U, V and a triple pattern t,
a mapping µ(t) is the triple obtained by replacing the variables in t with elements
of U. [[.]]G takes an expression of patterns and returns a set of mappings. Given
a mapping µ : V → U and a set of variables W ⊆ V, the restriction of µ to W is
denoted µ|W and is the mapping such that dom(µ|W ) = dom(µ) ∩ W and
µ|W (?X) = µ(?X) for every ?X ∈ dom(µ) ∩ W. Finally, the SPARQL SELECT
query is defined as follows:

Definition 1. A SPARQL SELECT query is a tuple (W, P ), where P is a graph
pattern and W is a set of variables such that W ⊆ var(P ). The answer of (W, P )
over an RDF graph G, denoted by [[(W, P )]]G , is the set of mappings:

                          [[(W, P )]]G = {µ|W |µ ∈ [[P ]]G }
5
    http://www.w3.org/TR/rdf-sparql-query/


    In Definition 1, var(P) is the set of variables occurring in the pattern P and W is the set
of variables in the SELECT clause. Here, P includes the triple patterns containing
variables. This pattern is evaluated against the RDF graph G, written
[[P]]G. It returns a set of mappings with respect to the variables in var(P).
Finally, a projection over each µ is performed w.r.t. the variables in W. The projected
set of mappings obtained is represented as µ|W. Further details on the formalization
and foundations of RDF databases are discussed in [1].
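As a small illustration (a sketch of ours with hypothetical bindings, not taken from the paper), the projection µ|W simply restricts each mapping to the variables of the SELECT clause:

# Sketch: projecting mappings onto the SELECT variables W (hypothetical data).
mappings = [
    {"?museum": "Musee du Louvre", "?city": "Paris", "?country": "France"},
    {"?museum": "Museo del Prado", "?city": "Madrid", "?country": "Spain"},
]
W = {"?museum", "?country"}

def project(mu, W):
    """mu|W: keep only the bindings of variables in dom(mu) ∩ W."""
    return {var: val for var, val in mu.items() if var in W}

answers = [project(mu, W) for mu in mappings]
print(answers)  # the ?city bindings are dropped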
Example 1. Continuing the scenario in Section 2, the following is the SPARQL query:

1 SELECT ?museum ?country ?artist WHERE {
   2 ?museum rdf:type dbpedia-owl:Museum .
   3 ?museum dbpedia-owl:location ?city .
   4 ?city dbpedia-owl:country ?country .
   5 ?painting dbpedia-owl:museum ?museum .
   6 ?painting dbpprop:artist ?artist}
7 GROUP BY ?country ?artist

This query retrieves the list of museums along with the artists whose work is
exhibited there and the location of each museum. Lines 5 and 6
retrieve information about the artists whose work is displayed in some museum.
More precisely, the page containing the information on a museum (?museum) is
connected to the page of an artist (?artist) through a page on the work of the
artist (?painting) displayed in the museum. In order to integrate these three
resources, two predicates are used: dbpedia-owl:museum and dbpprop:artist.
An excerpt of the answers obtained with the GROUP BY clause is shown below:
Pablo Picasso     Musee d’Art Moderne France
Leonardo Da Vinci Musee du Louvre     France
Raphael           Museo del Prado     Spain

    The problem encountered while browsing such an answer is that there are too
many statements to navigate through. Even after using the GROUP BY clause, the
answers are not organized in any ordered structure. By contrast, the clause VIEW
BY activates the LBVA framework, in which the user obtains a classification
of the statements as a concept lattice where statements are partially ordered
(see Figure 1a). To obtain the museums in the UK displaying the work of Goya, all
the museums displaying the work of Goya can be retrieved and then the specific
concept containing Goya and UK is reached by navigation. In this example the
answer obtained is National Gallery.

3.2   Formal Concept Analysis (FCA)
As the basics of Formal Concept Analysis (FCA) [7] are well known, we only
introduce the notions necessary to understand this paper.
FCA is a mathematical framework used for a number of purposes, among which
classification and data analysis, information retrieval and knowledge discovery
[4]. In some cases we obtain a huge number of concepts. In order to restrict the




          Fig. 1: Lattice-Based Views w.r.t. the Museum's and the Artist's Perspective.
          (a) Classes of Museums w.r.t. Artists and Countries (VIEW BY ?museum), e.g., the
          concept on the top left corner with the attribute France contains all the French
          museums, i.e., Musee du Louvre (Louvre) and Musee d'Art Moderne (MAM).
          (b) Classes of Artists w.r.t. Museums and Countries (VIEW BY ?artist).




number of concepts, iceberg concept lattices can be used [10]. Iceberg concept
lattices contain only the topmost part of the lattice. Along with iceberg lattices,
a stability index [9] is also used for filtering the concepts. The stability index
shows how much the concept intent depends on particular objects of the extent.
    FCA also allows knowledge discovery using association rules. An implication
over the attribute set M in a formal context is of the form B1 → B2, where
B1, B2 ⊆ M. The implication holds iff every object in the context having all the
attributes in B1 also has all the attributes in B2. For example, when (A1, B1) ≤
(A2, B2) in the lattice, we have that B1 → B2. The Duquenne-Guigues (DG) basis
of implications [8] is the minimal set of implications equivalent to the set of all
valid implications of a formal context K = (G, M, I). Actually, the DG-basis
contains all the information lying in the concept lattice.


4     Lattice-Based View Access

4.1     SPARQL Queries with Classification Capabilities

The idea of introducing a VIEW BY clause is to provide a classification of the
results and to add a knowledge discovery aspect to the results w.r.t. the variables
appearing in the VIEW BY clause. Let Q be a SPARQL query of the form Q
= SELECT ?X ?Y ?Z WHERE {pattern P} VIEW BY ?X; then the set of variables is
V = {?X, ?Y, ?Z}⁶. According to Definition 1, the answer of the tuple (V, P)
is represented as [[({?X, ?Y, ?Z}, P)]] = {µi} where i ∈ {1, . . . , k} and k is the
number of mappings obtained for the query Q. For the sake of simplicity, µ|W
is written µ. Here, dom(µi) = {?X, ?Y, ?Z}, which means that µ(?X) = Xi,
6
    As W represents the set of attribute values in the definition of a many-valued formal
    context, we denote the set of variables in the SELECT clause by V to avoid confusion.


µ(?Y) = Yi and µ(?Z) = Zi. Finally, a complete set of mappings can be given
as {{?X → Xi, ?Y → Yi, ?Z → Zi}}.
    The variable appearing in the VIEW BY clause is referred to as the object variable⁷
and is denoted by Ov, with Ov ⊆ V. In the current scenario Ov = {?X}.
The remaining variables are referred to as attribute variables and are denoted by
Av ⊆ V, such that Ov ∪ Av = V and Ov ∩ Av = ∅; so Av = {?Y, ?Z}.

Example 2. Following the example in Section 2, an alternative query with the VIEW
BY clause can be given as:

SELECT ?museum ?artist ?country WHERE {
   ?museum rdf:type dbpedia-owl:Museum .
   ?museum dbpedia-owl:location ?city .
   ?city dbpedia-owl:country ?country .
   ?painting dbpedia-owl:museum ?museum .
   ?painting dbpprop:artist ?artist}
VIEW BY ?museum


                           ?museum              ?artist         ?country
                       µ1  Musee d'Art Moderne  Pablo Picasso   France
                       µ2  Museo del Prado      Raphael         Spain
                       ...  ...                  ...             ...

                  Table 1: Generated Mappings for SPARQL Query Q
     Here, V = {?museum, ?artist, ?country} and P is the conjunction of patterns
in the WHERE clause; then the evaluation of [[({?museum, ?artist, ?country}, P)]]
generates the mappings shown in Table 1. Accordingly, dom(µi) =
{?museum, ?artist, ?country}. Here, µ1(?museum) = Musee d'Art Moderne,
µ1(?artist) = Pablo Picasso and µ1(?country) = France. We have Ov =
{?museum} because it appears in the VIEW BY clause, and Av = {?artist,
?country}. Figure 1a shows the generated view when Ov = {?museum}, and
in Figure 1b we have Ov = {?artist} and Av = {?museum, ?country}.


4.2    Designing a Formal Context of Answer Tuples

The results obtained by the query are in the form of a set of tuples, which are
then organized as a many-valued context.

Obtaining a Many-Valued Context (G, M, W, I): As described previously, we
have Ov = {?X}; then µ(?X) = {Xi}i∈{1,...,k}, where the Xi denote the values
obtained for the object variable and the corresponding mapping is given as
{{?X → Xi}}. Finally, G = µ(?X) = {Xi}i∈{1,...,k}. Let Av = {?Y, ?Z}; then
M = Av and the attribute values are W = {µ(?Y), µ(?Z)} = {{Yi}, {Zi}}i∈{1,...,k}.
The corresponding mappings for the attribute variables are {{?Y → Yi, ?Z → Zi}}.
7
    The object here refers to the object in FCA.


In order to obtain a ternary relation, let us consider an object value gi ∈ G and
an attribute value wi ∈ W; then we have (gi, "?Y", wi) ∈ I iff ?Y(gi) = wi, i.e.,
the value of gi for attribute ?Y is wi, with i ∈ {1, . . . , k}, as we have k values for ?Y.

Obtaining a Binary Context (G, M, I): Afterwards, a conceptual scaling is used for
binarizing the many-valued context into the form (G, M, I). Finally, we have
G = {Xi}i∈{1,...,k} and M = {Yi} ∪ {Zi} with i ∈ {1, . . . , k} for the object variable
Ov = {?X}. The binary context obtained after applying the above transformations
to the SPARQL query answers w.r.t. the object variable is called the formal
context of answer tuples and is denoted by Ktuple.
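The following Python sketch (ours; the function name build_context and the sample answers are hypothetical) illustrates this construction: the values of the object variable become the objects and, by nominal scaling, each value of an attribute variable becomes a binary attribute.

# Sketch of building the formal context of answer tuples (names are ours).
# Each answer is a mapping from SPARQL variables to values.
answers = [
    {"?museum": "Musee d'Art Moderne", "?artist": "Pablo Picasso", "?country": "France"},
    {"?museum": "Museo del Prado", "?artist": "Raphael", "?country": "Spain"},
    {"?museum": "Museo del Prado", "?artist": "Caravaggio", "?country": "Spain"},
]

def build_context(answers, object_var, attribute_vars):
    """Nominal scaling: each value of an attribute variable becomes a binary attribute."""
    G, M, I = set(), set(), set()
    for mu in answers:
        g = mu[object_var]
        G.add(g)
        for av in attribute_vars:
            m = mu[av]          # the scaled attribute is the value itself
            M.add(m)
            I.add((g, m))
    return G, M, I

G, M, I = build_context(answers, "?museum", ["?artist", "?country"])
print(sorted(G))  # the museums (objects)
print(sorted(M))  # the artist and country values (binary attributes)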
Example 3. In the example, Ov = {?museum} and Av = {?artist, ?country}. The
answers obtained by this query are organized into a many-valued context as
follows: the distinct values of the object variable ?museum are kept as the set of
objects, so G = {Musee du Louvre, Museo del Prado, . . . }, the attribute variables
provide M = {artist, country}, and W1 = {Raphael, Leonardo Da Vinci, . . . } and
W2 = {France, Spain, UK, . . . } are the value sets of the many-valued context.
The obtained many-valued context is shown in Table 2. Finally, it
is conceptually scaled to obtain the binary context shown in Table 3.
               Museum                                  Artist                     Country
               Musee du Louvre        {Raphael, Leonardo Da Vinci, Caravaggio}    {France}
               Musee d’Art Moderne                 {Pablo Picasso}                {France}
               Museo del Prado          {Raphael, Caravaggio, Francisco Goya}      {Spain}
               National Gallery    {Leonardo Da Vinci, Caravaggio, Francisco Goya} {UK}

                       Table 2: Many-Valued Context (Museum).

                                                Artist                      Country
           Museum              Raphael Da Vinci Picasso Caravaggio Goya France Spain UK
           Musee du Louvre       ×        ×                 ×             ×
           Musee d’Art Moderne                     ×                      ×
           Museo del Prado       ×                          ×       ×            ×
           National Gallery               ×                 ×       ×                ×

                    Table 3: Formal Context Ktuple w.r.t ?museum.
    The organization of the concept lattice depends on the choice of the object
variable and the attribute variables. Thus, to group the artists w.r.t. the
museums where their work is displayed and the location of these museums, the
object variable would be ?artist and the attribute variables would be ?museum
and ?country. Then, the scaling can be performed to obtain a formal context.
In order to complete the set of attributes, domain knowledge can also be
taken into account, such as an ontology related to the types of artists or museums.
This domain knowledge can be added with the help of pattern structures,
an approach linked to FCA, on top of the many-valued context without having to
perform scaling. For the sake of simplicity, we do not discuss it in this paper.

4.3   Building a Concept Lattice
Once the context is designed, the concept lattice can be built using an FCA
algorithm. There are some very efficient algorithms that can be used [7, 11]. However,
in the current implementation we use AddIntent [11], which is an incremental
concept lattice construction algorithm. In the case of large data, iceberg lattices can
be considered [10]. The use of the VIEW BY clause activates the LBVA process,
which transforms the SPARQL query answers (tuples) into a formal context Ktuple
from which a concept lattice is obtained; this lattice is referred to as a lattice-based
view. A view of the SPARQL query of Section 2, i.e., the concept lattice corresponding
to Table 3, is shown in Figure 1a.
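For illustration only, the following Python sketch (ours) enumerates the formal concepts of the toy context of Table 3 by naively closing every subset of objects; it is not the AddIntent algorithm used in our implementation, and the optional min_support parameter only mimics the iceberg pruning mentioned above.

# Naive enumeration of the formal concepts of the context of Table 3 (not AddIntent).
from itertools import combinations

G = {"Musee du Louvre", "Musee d'Art Moderne", "Museo del Prado", "National Gallery"}
M = {"Raphael", "Da Vinci", "Picasso", "Caravaggio", "Goya", "France", "Spain", "UK"}
I = {("Musee du Louvre", "Raphael"), ("Musee du Louvre", "Da Vinci"),
     ("Musee du Louvre", "Caravaggio"), ("Musee du Louvre", "France"),
     ("Musee d'Art Moderne", "Picasso"), ("Musee d'Art Moderne", "France"),
     ("Museo del Prado", "Raphael"), ("Museo del Prado", "Caravaggio"),
     ("Museo del Prado", "Goya"), ("Museo del Prado", "Spain"),
     ("National Gallery", "Da Vinci"), ("National Gallery", "Caravaggio"),
     ("National Gallery", "Goya"), ("National Gallery", "UK")}

def intent(A):
    """A': the attributes shared by all objects in A."""
    return {m for m in M if all((g, m) in I for g in A)}

def extent(B):
    """B': the objects having all attributes in B."""
    return {g for g in G if all((g, m) in I for m in B)}

def concepts(min_support=0):
    """Enumerate (extent, intent) pairs; min_support mimics iceberg pruning."""
    seen, result = set(), []
    for k in range(len(G) + 1):
        for A in combinations(sorted(G), k):
            B = intent(set(A))
            ext = extent(B)
            if frozenset(ext) not in seen and len(ext) >= min_support:
                seen.add(frozenset(ext))
                result.append((ext, B))
    return result

for ext, itt in concepts():
    print(sorted(ext), "|", sorted(itt))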

4.4   Interpretation Operations over Lattice-Based Views
A formal context effectively takes into account the relations in the data by keeping
the inherent structure of the relationships present in LOD as an object-attribute
relation. When we build a concept lattice, each concept keeps a group of terms
sharing some attributes (i.e., relationships with other terms). This concept
lattice can be navigated for searching and accessing particular LOD elements
through the corresponding concepts within the lattice. It can be drilled down
from general to specific concepts, or rolled up to obtain the more general ones, which
can be further interpreted by domain experts. For example, in order to search
for the museums where there is an exhibition of the paintings of Caravaggio,
the concept lattice in Figure 1a is explored levelwise. It can be seen that the
paintings of Caravaggio are displayed in Musee du Louvre, Museo del Prado
and National Gallery. This can be further filtered by country, i.e., looking
for French museums displaying Caravaggio: the same lattice is drilled
down and Musee du Louvre is retrieved as the answer. Next, to check the
museums located in France and Spain, a roll-up operation from the French
museums to the general concept containing all the museums with Caravaggio's
paintings can be applied, followed by a drill-down operation to the museums in France
or Spain displaying Caravaggio. The answers obtained are
Musee du Louvre and Museo del Prado.
     A different perspective on the same set of answers can also be retrieved,
namely the grouping of artists w.r.t. museums and countries. To select
French museums according to the artists they display, the object variable is
Ov = {?artist} and the attribute variables are Av = {?museum, ?country}. The
lattice obtained in this case reflects the artist's perspective (see Figure 1b).
Now, it is possible to retrieve Musee du Louvre and Musee d'Art Moderne,
which are the French museums; to obtain a specific French museum displaying
the work of Leonardo Da Vinci, the corresponding concept can be selected, which gives
the answer Musee du Louvre.
     FCA provides powerful means for data analysis and knowledge discovery.
VIEW BY can be seen as a clause that wraps the original SPARQL query and
enhances its capabilities by providing views, which can be reduced using iceberg
concept lattices. Iceberg lattices provide the topmost part of the lattice,
retaining only the most general concepts. The concept lattice is then explored levelwise
depending on a given threshold: only concepts whose extent is sufficiently
large are explored, the support of a concept being the cardinality of
its extent. If more specific concepts are required, the support threshold of the


iceberg lattices can be lowered and the resulting concept lattice can be explored
levelwise.
Knowledge Discovery: Among the means provided by FCA for knowledge
discovery, the Duquenne-Guigues basis of implications provides a minimal
set of implications representing all the implications (i.e., association
rules with confidence 1) that can be obtained by accessing the view, i.e., the concept
lattice. For example, the implications according to Figure 1a state that all the
museums in the current context which display Leonardo Da Vinci also display
Caravaggio (rule: Leonardo Da Vinci → Caravaggio). They also say that
only the museums which display the work of Caravaggio display the work of
Leonardo Da Vinci. Such a rule can be interesting if the museums which display
the work of both Leonardo Da Vinci and Caravaggio are to be retrieved.
The rule Goya, Raphael, Caravaggio → Spain suggests that the only
museum which has works of Goya, Raphael and Caravaggio is in Spain, more
precisely Museo del Prado. (These rules are generated from only the part of the
SPARQL query answers shown as a context in Table 3.)
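The following short Python sketch (ours; museum names are abbreviated) checks such implications directly on the context of Table 3 by testing whether the extent of the premise is included in the extent of the conclusion.

# Sketch: an implication B1 -> B2 holds iff B1' (objects with all attributes of B1)
# is a subset of B2'.
I = {("Louvre", "Raphael"), ("Louvre", "Da Vinci"), ("Louvre", "Caravaggio"), ("Louvre", "France"),
     ("MAM", "Picasso"), ("MAM", "France"),
     ("Prado", "Raphael"), ("Prado", "Caravaggio"), ("Prado", "Goya"), ("Prado", "Spain"),
     ("NG", "Da Vinci"), ("NG", "Caravaggio"), ("NG", "Goya"), ("NG", "UK")}
G = {g for g, _ in I}

def extent(B):
    return {g for g in G if all((g, m) in I for m in B)}

def holds(B1, B2):
    """True iff every object having all attributes of B1 also has those of B2."""
    return extent(B1) <= extent(B2)

print(holds({"Da Vinci"}, {"Caravaggio"}))                   # True
print(holds({"Goya", "Raphael", "Caravaggio"}, {"Spain"}))   # True (Museo del Prado only)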


5     Experimentation

The experiments were conducted on a real dataset. Our algorithm is implemented
in Java using the Jena⁸ platform, and the experiments were conducted on a laptop
with a 2.60 GHz Intel Core i5 processor and 3.7 GB RAM, running Ubuntu 12.04. We
extracted information about movies with their genre and location using a
SPARQL query enhanced with the VIEW BY clause. The experiment shows that, even
though the background knowledge (ontological information) was not extracted,
the views reveal the hidden hierarchical information contained in the SPARQL
query answers and can be navigated accordingly. Moreover, it also shows that
useful knowledge is extracted from the answers through the views using the
DG-basis of implications. We also performed a quantitative analysis in which we
discuss the sparsity of semantic web data. We also tested how our
method scales with a growing number of results. The number of answers obtained
from YAGO was 100,000. The resulting view kept the classes of movies with
respect to genre and location.


5.1    YAGO

The YAGO ontology is built by extracting instances and hierarchical
information from Wikipedia and WordNet. In the current experiment,
we sent a query to YAGO with the VIEW BY clause.

PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX yago: http://yago-knowledge.org/resource/
SELECT ?movie ?genre ?location WHERE {
8
    https://jena.apache.org/


   ?movie rdf:type yago:wordnet_movie_106613686 .
   ?movie yago:isLocatedIn ?location .
   ?movie rdf:type ?genre . }
VIEW BY ?movie

    While querying YAGO it was observed that the genre and location information
was also given in the ontology. The first level of the view obtained over the
SPARQL query results from YAGO kept the groups of movies with respect to
their languages, e.g., the movies with genre Spanish Language Films. However,
as we drill further down in the concept lattice, we get more specific categories
which include the values from the location variable, such as Spain, Argentina
and Mexico. Separate classes were obtained for movies based on novels,
which were then further specialized by the introduction of the country attribute
as we drill down the concept lattice. Finally, with the help of lattice-based views,
it can be concluded that the answers obtained by querying YAGO provide a
clean categorization of movies by making use of the partial order
between the concepts in the concept lattice.

DG-Basis of Implications: The DG-basis of implications for YAGO was
calculated. The implications were filtered in three ways. Firstly, pruning was
performed naively with respect to a support threshold: around 200 rules were
extracted at a support threshold of 0.2%. In order to make the rules easier to
inspect, a second type of filtering based on the number of elements in the body of the
rules was applied: all the implications with a single item in the body
were selected. However, if there is still a large number of implications to be
inspected, a third type of pruning can be applied, which involves selecting
implications with different attribute types in head and body, e.g., in rule #1
the head contains United States, which is of type country, while the body contains a
wikicategory. This kind of pruning helps in finding attribute-attribute relations.
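The following Python sketch (ours; the rule encoding with typed attributes and the support values are hypothetical, only loosely inspired by Table 4) illustrates the three pruning steps.

# Sketch of the three filtering steps (hypothetical rule representation).
rules = [
    {"supp": 96, "body": [("wikicategory", "RKO Pictures films")], "head": [("country", "United States")]},
    {"supp": 46, "body": [("wikicategory", "Oriya language films")], "head": [("country", "India")]},
    {"supp": 3,  "body": [("wikicategory", "Film remakes"), ("country", "France")], "head": [("wordnet", "remake")]},
]
MIN_SUPP = 40  # absolute counterpart of a relative support threshold such as 0.2%

step1 = [r for r in rules if r["supp"] >= MIN_SUPP]              # 1. support pruning
step2 = [r for r in step1 if len(r["body"]) == 1]                # 2. single-item bodies
step3 = [r for r in step2                                        # 3. head and body of different types
         if {t for t, _ in r["head"]}.isdisjoint({t for t, _ in r["body"]})]
print(step3)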
    Table 4 contains some of the implications. Calculating the DG-basis of
implications is actually useful in finding regularities in the SPARQL query answers
which cannot be discovered from the raw tuples obtained. For example, rule #1
states that RKO Pictures is an American film production and distribution
company, as all the movies it produced and distributed are from the United
States. Moreover, rule #2 says that all the movies in the Oriya language are from
India. This points to the fact that Oriya is one of the many languages
spoken in India; in this dataset, Oriya-language films are located only in
India. Rule #3 shows a link between a category from Wikipedia and WordNet,
which clearly says that the wikicategory is more specific than the WordNet
category, as remake is more general than Film remakes.
           Impl. ID Supp. Implication
              1.     96 wikicategory RKO Pictures films → United States
              2.     46 wikicategory Oriya language films → India
              3.     64 wikicategory Film remakes → wordnet remake

        Table 4: Some implications from DG-Basis of Implication (YAGO)




                  Fig. 2: Experimental Results. (a) Density of KYAGO: density of the formal
                  context (in %) w.r.t. the number of tuples (in %). (b) Runtime for building
                  LYAGO: execution time (in seconds) w.r.t. the number of tuples (in %).
5.2   Evaluation
Besides the qualitative evaluation of LBVA, we performed an empirical evaluation.
The characteristics of the dataset are shown in Table 5. The concepts were
pruned with the help of iceberg lattices and stability for the qualitative analysis.
    The plots for the experimentation are shown in Figure 2. Figure 2a shows a
comparison between the number of tuples obtained and the density of the formal
context. The density of the formal context is the proportion of pairs in I w.r.t.
the size of G × M. It is very low in both experiments, ranging
from 0.14% to 0.28%. This means in particular that semantic web data is
very sparse when considered as a formal context and deviates from the datasets
usually considered for FCA (which are dense). Here we can see that as the
number of tuples increases, the density of the formal context decreases, which
means that the sparsity of the data increases.
    We also tested how our method scales with a growing number of results. The
number of answers obtained from YAGO was 100,000. Figure 2b illustrates the
execution time for building the concept lattice w.r.t. the number of tuples
obtained. The execution time ranges from 20 to 100 seconds, which means that
the concept lattices were built efficiently and that large data can be considered
for these kinds of experiments. Usually the computation time for building
concept lattices depends on the density of the formal context, but in the case
of semantic web data, as the density is not more than 1%, the computation
mainly depends on the number of objects obtained, which increases
with the number of tuples (see Table 5).
                                                                       No. of Tuples |G| |M | No. of Concepts
                                                                           20%       3657 2198      7885
                                                                           40%       6783 3328     19019
                                                                           60%       9830 4012     31264
                                                                           80%      12960 4533     43510
                                                                           100%     15272 4895     55357

                                                         Table 5: Characteristics of Datasets (YAGO)

6     Conclusion and Discussion
With LBVA, we introduce a classification framework based on FCA for the sets of
tuples obtained as results of SPARQL queries over LOD. In this way, a view
is organized as a concept lattice, built through the use of the VIEW BY clause, that
can be navigated and within which information retrieval and knowledge discovery can be
performed. Several experiments show that LBVA is rather tractable and can be
applied to large data.
    For future work, we are interested in extending the VIEW BY clause by including
the available background knowledge about the resources using the formalism
of pattern structures [6]. Moreover, we intend to use implications for completing
the background knowledge. We also intend to use pattern structures with a
graph description for each considered object, where the graph is the set of all
triples accessible w.r.t. the reference object.


References
 1. Marcelo Arenas, Claudio Gutierrez, and Jorge Pérez. Foundations of rdf databases.
    In Sergio Tessaris, Enrico Franconi, Thomas Eiter, Claudio Gutierrez, Siegfried
    Handschuh, Marie-Christine Rousset, and Renate A. Schmidt, editors, Reasoning
    Web, volume 5689 of Lecture Notes in Computer Science, pages 158–204. Springer,
    2009.
 2. Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked data - the story so far.
    Int. J. Semantic Web Inf. Syst., 5(3):1–22, 2009.
 3. Claudio Carpineto, Stanislaw Osiński, Giovanni Romano, and Dawid Weiss. A
    survey of web clustering engines. ACM Comput. Surv., 41(3):17:1–17:38, 2009.
 4. Claudio Carpineto and Giovanni Romano. Concept data analysis - theory and
    applications. Wiley, 2005.
 5. Claudia d’Amato, Nicola Fanizzi, and Agnieszka Lawrynowicz. Categorize by:
    Deductive aggregation of semantic web query results. In Lora Aroyo, Grigoris An-
    toniou, Eero Hyvönen, Annette ten Teije, Heiner Stuckenschmidt, Liliana Cabral,
    and Tania Tudorache, editors, ESWC (1), volume 6088 of Lecture Notes in Com-
    puter Science, pages 91–105. Springer, 2010.
 6. Bernhard Ganter and Sergei O. Kuznetsov. Pattern structures and their projec-
    tions. In Harry S. Delugach and Gerd Stumme, editors, ICCS, volume 2120 of
    Lecture Notes in Computer Science, pages 129–142. Springer, 2001.
 7. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foun-
    dations. Springer, Berlin/Heidelberg, 1999.
 8. J.-L. Guigues and V. Duquenne. Familles minimales d’implications informatives
    résultant d’un tableau de données binaires. Mathématiques et Sciences Humaines,
    95:5–18, 1986.
 9. Sergei O. Kuznetsov. On stability of a Formal Concept. Ann. Math. Artif. Intell.,
    49(1-4):101–115, 2007.
10. Gerd Stumme, Rafik Taouil, Yves Bastide, and Lotfi Lakhal. Conceptual cluster-
    ing with iceberg concept lattices. In R. Klinkenberg, S. Rüping, A. Fick, N. Henze,
    C. Herzog, R. Molitor, and O. Schröder, editors, Proc. GI-Fachgruppentreffen
    Maschinelles Lernen (FGML’01), Universität Dortmund 763, October 2001.
11. Dean van der Merwe, Sergei A. Obiedkov, and Derrick G. Kourie. Addintent: A
    new incremental algorithm for constructing concept lattices. In Peter W. Eklund,
    editor, ICFCA, Lecture Notes in Computer Science, pages 372–385. Springer, 2004.
          A generalized framework to consider positive
           and negative attributes in formal concept
                            analysis.

               J. M. Rodriguez-Jimenez, P. Cordero, M. Enciso and A. Mora

                         Universidad de Málaga, Andalucı́a Tech, Spain.
                                  {pcordero,enciso}@uma.es
                             {amora,jmrodriguez}@ctima.uma.es



              Abstract. In Formal Concept Analysis the classical formal context is
              analyzed taking into account only the positive information, i.e. the presence
              of a property in an object. Nevertheless, the absence of a property
              in an object also provides significant knowledge, which can only
              be partially considered with the classical approach. In this work we
              modify the derivation operators to allow the treatment of both positive
              and negative attributes, which come, respectively, from the presence and
              the absence of properties. We define the new operators and
              prove that they constitute a Galois connection. Finally, we also study
              the correspondence between the formal context in the new framework
              and the extended concept lattice, providing new interesting properties.


      1    Introduction

       Data analysis is a well established discipline with tools and techniques
       developed to tackle the identification of hidden patterns in data.
       Data mining, and Knowledge Discovery in general, helps in the decision making
       process using pattern recognition, clustering, association and classification
       methods. One of the popular approaches used to extract knowledge is mining
       the patterns of the data expressed as implications (functional dependencies in
       the database community) or association rules.
           Traditionally, implications and similar notions have been built using positive
       information, i.e. information induced by the presence of attributes in objects.
       In Mannila et al. [6] an extended framework for enriched rules was introduced,
       considering negation, conjunction and disjunction. Rules with negated attributes
       were also considered in [1]: "if we buy caviar, then we do not buy canned tuna".
           In the framework of formal concept analysis, some authors have proposed the
       mining of implications with positive and negative attributes from the apposition
       of the context and its negation (K|K̄) [2, 4]. Working with (K|K̄) leads to
       a huge exponential problem and, as R. Missaoui et al. showed in [9], real
       applications tend to have sparse data in the context K and dense data in K̄
       (or vice versa), and therefore "generate a huge set of candidate itemsets and a
       tremendous set of uninteresting rules".





        R. Missaoui et al. [7, 8] propose mining, from a formal context K,
    a subset of all mixed implications, i.e. implications with positive and negative
    attributes, representing the presence and the absence of properties. As far as we
    know, the approach of these authors uses, for the first time for this problem, a set of
    inference rules to manage negative attributes.
        In [11] we followed the line proposed by Missaoui and presented an algorithm,
    based on the NextClosure algorithm, that allows one to obtain mixed implications.
    The proposed algorithm returns a feasible and complete basis of mixed
    implications by performing a reduced number of requests to the formal context.
    Beyond the benefits provided by the inclusion of negative attributes in terms
    of expressiveness, Revenko and Kuznetsov [10] use negative attributes to tackle
    the problem of finding some types of errors in new object intents.
    Their approach is based on finding implications from an implication basis of
    the context that are not respected by a new object. Their work illustrates the
    great benefit that a general framework for negative and positive attributes would
    provide.
        In this work we propose a deeper study of the algebraic framework for Formal
    Concept Analysis taking into account positive and negative information. The
    first step is to consider an extension of the classical derivation operators and to prove
    that they form a Galois connection. As in the classical framework, this fact allows us
    to build the two usual dual concept lattices; but in this case, as we shall see,
    the correspondence between concept lattices and formal contexts reveals several
    characteristics which induce interesting properties. The main aim of this work
    is to establish a full formal framework which allows the future development of new
    methods and techniques dealing with positive and negative information.
        In Section 2 we present the background of this work: the notions related to
    formal concept analysis and negative attributes. Section 3 introduces the main
    results which constitute the contribution of this paper.


   2        Preliminaries
   2.1       Formal Concept Analysis
    In this section, the basic notions related to Formal Concept Analysis (FCA)
    [12] and attribute implications are briefly presented. See [3] for a more detailed
    explanation. A formal context is a triple K = ⟨G, M, I⟩ where G and M are
    finite non-empty sets and I ⊆ G × M is a binary relation. The elements of G
    are named objects, the elements of M attributes, and ⟨g, m⟩ ∈ I means that the
    object g has the attribute m. From this triple, two mappings ↑: 2G → 2M and
    ↓: 2M → 2G, named derivation operators, are defined as follows: for any X ⊆ G
    and Y ⊆ M,
                        X↑ = {m ∈ M | ⟨g, m⟩ ∈ I for all g ∈ X}                   (1)
                        Y↓ = {g ∈ G | ⟨g, m⟩ ∈ I for all m ∈ Y}                   (2)
    X↑ is the subset of all attributes shared by all the objects in X and Y↓ is the
    subset of all objects that have all the attributes in Y. The pair (↑, ↓) constitutes

a Galois connection between 2G and 2M and, therefore, both compositions are
closure operators.
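For illustration, the derivation operators (1) and (2) and the behaviour of their compositions can be sketched in a few lines of Python on the context of Table 1 below (this sketch is ours and not part of the paper).

# Sketch of the classical derivation operators on the context of Table 1.
G = {"o1", "o2", "o3", "o4"}
M = {"a", "b", "c", "d", "e"}
I = {("o1", "b"), ("o1", "c"), ("o1", "e"),
     ("o2", "a"), ("o2", "b"),
     ("o3", "b"), ("o3", "c"), ("o3", "e"),
     ("o4", "c"), ("o4", "d")}

def up(X):
    """X↑: the attributes shared by all objects in X."""
    return {m for m in M if all((g, m) in I for g in X)}

def down(Y):
    """Y↓: the objects having all attributes in Y."""
    return {g for g in G if all((g, m) in I for m in Y)}

X = {"o1", "o3"}
print(up(X))        # b, c, e: the attributes shared by o1 and o3
print(down(up(X)))  # o1, o3: here X is closed, i.e. X↑↓ = X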
    A pair of subsets ⟨X, Y⟩ with X ⊆ G and Y ⊆ M such that X↑ = Y and
Y↓ = X is named a formal concept. X is named the extent and Y the intent of
the concept. Extents and intents coincide with the closed sets w.r.t. the closure
operators because X↑↓ = X and Y↓↑ = Y. Thus, the set of all formal concepts
is a lattice, named the concept lattice, with the relation

      ⟨X1, Y1⟩ ≤ ⟨X2, Y2⟩ if and only if X1 ⊆ X2 (or equivalently, Y2 ⊆ Y1)   (3)

    This concept lattice will be denoted by B(G, M, I).
    The concept lattice can be characterized in terms of attribute implications,
which are expressions A → B where A, B ⊆ M. An implication A → B holds in a
context K if A↓ ⊆ B↓, that is, any object that has all the attributes in A also has
all the attributes in B. It is well known that the set of attribute implications
that are valid in a context satisfies Armstrong's Axioms:
[Ref] Reflexivity: If B ⊆ A then ⊢ A → B.
[Augm] Augmentation: A → B ⊢ A ∪ C → B ∪ C.
[Trans] Transitivity: A → B, B → C ⊢ A → C.
    A set of implications Σ is considered an implicational system for K if an
implication holds in K if and only if it can be inferred from Σ by using Armstrong's
Axioms.
    Armstrong's Axioms allow us to define the closure of attribute sets w.r.t. an
implicational system (the closure of a set A is usually denoted by A+), and it
is well known that these closed sets coincide with the intents. On the other hand,
several kinds of implicational systems have been defined in the literature, the most
used being the so-called Duquenne-Guigues (or stem) basis [5]. This basis has
minimum cardinality among all implicational systems and can be
obtained from a context by using the renowned NextClosure algorithm [3].
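As a small illustration (ours), the closure of an attribute set w.r.t. an implicational system can be computed by repeatedly applying the implications whose premise is already contained in the current set; the basis used below is the Duquenne-Guigues basis of the context of Table 1 given in Section 2.2.

# Sketch: closure A+ of an attribute set w.r.t. an implicational system.
implications = [({"e"}, {"b", "c"}), ({"d"}, {"c"}), ({"b", "c"}, {"e"}), ({"a"}, {"b"})]

def closure(A, implications):
    """Add the conclusion of every implication whose premise is contained in the set, until stable."""
    closed = set(A)
    changed = True
    while changed:
        changed = False
        for body, head in implications:
            if body <= closed and not head <= closed:
                closed |= head
                changed = True
    return closed

print(closure({"a"}, implications))       # a, b
print(closure({"b", "d"}, implications))  # b, c, d, e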

2.2     Negative attributes
As we have mentioned in the introduction, classical FCA only discovers knowledge
limited to positive attributes in the context; it does not consider information
relative to the absence of properties (attributes). Thus, the Duquenne-Guigues
basis obtained from Table 1 is {e → bc, d → c, bc → e, a → b}. Moreover, the
implications b → c and b → d do not hold in Table 1 and therefore they cannot
be derived from the basis by using the inference system. Nevertheless, the two
implications correspond to different situations. In the first case, some objects
have attributes b and c (e.g. objects o1 and o3) whereas other objects (e.g. o2)
have the attribute b and do not have c. In the second case, no object that has
the attribute b also has the attribute d.
    A more general framework is necessary to deal with this kind of information.
In [11] we tackled this issue focusing on the problem of mining implications
with positive and negative attributes from formal contexts. As a conclusion of

                           I      a      b          c   d    e
                          o1             ×          ×        ×
                          o2      ×      ×
                          o3             ×          ×        ×
                          o4                        ×   ×
                                 Table 1. A formal context



    that work, we emphasized the necessity of a full development of an algebraic
    framework.
        First, we introduce an extended notation that allows us to consider the
    negation of attributes. From now on, the set of attributes is denoted by M, and
    its elements by the letter m, possibly with subscripts. That is, the lowercase
    character m is reserved for positive attributes. We use m̄ to denote the negation
    of the attribute m and M̄ to denote the set {m̄ | m ∈ M}, whose elements will
    be named negative attributes.
        Arbitrary elements in M ∪ M̄ are going to be denoted by the first letters of
    the alphabet: a, b, c, etc., and ā denotes the opposite of a. That is, the symbol a
    could represent a positive or a negative attribute, and if a = m ∈ M then ā = m̄,
    whereas if a = m̄ ∈ M̄ then ā = m.
        Capital letters A, B, C, . . . denote subsets of M ∪ M̄. If A ⊆ M ∪ M̄, then Ā
    denotes the set of the opposites of its attributes, {ā | a ∈ A}, and the following
    sets are defined:

         – Pos(A) = {m ∈ M | m ∈ A}
         – Neg(A) = {m ∈ M | m̄ ∈ A}
         – Tot(A) = Pos(A) ∪ Neg(A)

    Note that Pos(A), Neg(A), Tot(A) ⊆ M.
        Once we have introduced the notation, we summarize some results
    concerning the mining of knowledge from contexts in terms of implications
    with negative and positive attributes [11]. A trivial approach could be obtained
    by adding new columns to the context with the opposites of the attributes [4].
    That is, given a context K = ⟨G, M, I⟩, a new context (K|K̄) = ⟨G, M ∪ M̄, I ∪ Ī⟩
    is considered, where Ī = {⟨g, m̄⟩ | g ∈ G, m ∈ M, ⟨g, m⟩ ∉ I}. For example, if
    K is the context depicted in Table 1, the context (K|K̄) is the one presented in
    Table 2. Obviously, the classical framework and its corresponding machinery can
    be used to manage the new context and, in this (direct) way, negative attributes
    are considered. However, this rough approach induces a non-trivial growth of
    the formal context and, consequently, algorithms have a worse performance.
        In our opinion, a deeper study was done by R. Missaoui et al. in [7], where a
    more evolved approach was provided. For the first time, as far as we know, inference
    rules for the management of positive and negative attributes were introduced [8].
    The authors also developed new methods to mine mixed attribute implications
    by means of the notion of key [9].

       I ∪ Ī   a   b   c   d   e   ā   b̄   c̄   d̄   ē
         o1        ×   ×       ×   ×           ×
         o2    ×   ×                       ×   ×   ×
         o3        ×   ×       ×   ×           ×
         o4            ×   ×       ×   ×           ×

                    Table 2. The formal context (K|K̄)



    In [11], we developed a method to mine mixed implications whose main goal
is to avoid managing the large context (K|K̄), so that the corresponding method
has a controlled cost.
    First, we extend the definitions of the derivation operators, formal concept
and attribute implication.

Definition 1. Let K = ⟨G, M, I⟩ be a formal context. We define the operators
⇑ : 2G → 2M ∪M̄ and ⇓ : 2M ∪M̄ → 2G as follows: for X ⊆ G and Y ⊆ M ∪ M̄ ,

                   X ⇑ = {m ∈ M | ⟨g, m⟩ ∈ I for all g ∈ X}
                         ∪ {m̄ ∈ M̄ | ⟨g, m⟩ ∉ I for all g ∈ X}                (4)

                   Y ⇓ = {g ∈ G | ⟨g, m⟩ ∈ I for all m ∈ Y }
                         ∩ {g ∈ G | ⟨g, m⟩ ∉ I for all m̄ ∈ Y }               (5)

Definition 2. Let K = ⟨G, M, I⟩ be a formal context. A mixed formal concept
in K is a pair of subsets ⟨X, Y ⟩ with X ⊆ G and Y ⊆ M ∪ M̄ such that X ⇑ = Y
and Y ⇓ = X.

Definition 3. Let K = ⟨G, M, I⟩ be a formal context and let A, B ⊆ M ∪ M̄ .
The context K satisfies a mixed attribute implication A → B, denoted by
K |= A → B, if A⇓ ⊆ B ⇓ .

    For example, in Table 1, as we previously mentioned, two different situations
were presented. Thus, in this new framework we have that K ⊭ b → d but
K |= b → d̄, whereas neither K |= b → c nor K |= b → c̄ holds.
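    The following self-contained sketch of ours (using the pair encoding intro-
duced above and the context of Table 1) implements the two derivation operators
of Definition 1 and uses them to check mixed implications as in Definition 3:

      # Sketch (ours): mixed derivation operators for the context of Table 1.
      K = {"o1": {"b", "c", "e"}, "o2": {"a", "b"},
           "o3": {"b", "c", "e"}, "o4": {"c", "d"}}
      M = {"a", "b", "c", "d", "e"}

      def up(X):
          """X⇑: mixed attributes shared by all objects of X (Equation 4)."""
          return ({(m, True) for m in M if all(m in K[g] for g in X)} |
                  {(m, False) for m in M if all(m not in K[g] for g in X)})

      def down(Y):
          """Y⇓: objects having every positive and lacking every negated attribute of Y."""
          return {g for g in K
                  if all(m in K[g] for (m, s) in Y if s)
                  and all(m not in K[g] for (m, s) in Y if not s)}

      def satisfies(A, B):
          """K |= A -> B  iff  A⇓ ⊆ B⇓ (Definition 3)."""
          return down(A) <= down(B)

      assert not satisfies({("b", True)}, {("d", True)})    # K does not satisfy b -> d
      assert satisfies({("b", True)}, {("d", False)})       # K satisfies b -> d-bar
      assert not satisfies({("b", True)}, {("c", True)})    # neither b -> c ...
      assert not satisfies({("b", True)}, {("c", False)})   # ... nor b -> c-bar holds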
    Now, we introduce the mining method for mixed attribute implications. The
method is strongly based on the set of inference rules obtained by supplementing
Armstrong’s axioms with the following ones, introduced in [8]: let a, b ∈ M ∪ M̄
and A ⊆ M ∪ M̄ ,
[Cont] Contradiction: ⊢ aā → M M̄ .
[Rft] Reflection: Aa → b ⊢ Ab̄ → ā.
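    For instance, in the context of Table 1 we have K |= b → d̄ and, applying
[Rft] with A = ∅, we obtain d → b̄, which indeed holds in K: the only object
having d is o4 , and it does not have b.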
    The closure of an attribute set A wrt a set of mixed attribute implications Σ,
denoted by A++ , is defined as the largest set such that A → A++ can be inferred
from Σ by using Armstrong’s axioms plus [Cont] and [Rft]. Therefore, a mixed
implication A → B can be inferred from Σ if and only if B ⊆ A++ .
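Continuing the previous example, if Σ = {b → d̄} then b̄ ∈ {d}++ , since d → b̄
is derivable from Σ by [Rft].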

    The proposed mining method, depicted in Algorithm 1, uses the inference
rules in such a way that it is not centered on the notion of key but extends, in
a proper manner, the classical NextClosure algorithm [3].
     Algorithm 1: Mixed Implications Mining
       Data: K = ⟨G, M, I⟩
       Result: Σ, a set of mixed implications
     1  begin
     2      Σ := ∅;
     3      Y := ∅;
     4      while Y < M do
     5          foreach X ⊆ Y do
     6              A := (Y ∖ X) ∪ X̄;
     7              if Closed(A, Σ) then
     8                  C := A⇓⇑ ;
     9                  if A ≠ C then Σ := Σ ∪ {A → C ∖ A}
    10          Y := Next(Y ) // i.e. the successor of Y in the lectic order
    11      return Σ
    12  end

    The algorithm that calculates the mixed implicational system does not need
to exhaustively traverse all subsets of mixed attributes, but only those that are
closed w.r.t. the set of implications computed so far. The Closed function has
linear cost and is used to detect when a set of attributes is not closed; in that
case the context is not visited.
     Function Closed(A, Σ): boolean
       Data: A ⊆ M ∪ M̄ with Pos(A) ∩ Neg(A) = ∅ and Σ being a set of mixed
             implications.
       Result: ‘true’ if A is closed wrt Σ or ‘false’ otherwise.
     1  begin
     2      foreach B → C ∈ Σ do
     3          if B ⊆ A and C ⊈ A then exit and return false
     4          if B ∖ A = {a}, A ∩ C̄ ≠ ∅, and ā ∉ A then exit and return false
     5      return true
     6  end
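    A direct transcription of this function in the pair encoding used above could
look as follows (a sketch of ours; in particular, the second test reflects our reading
of the [Rft]-based pruning condition, with C̄ denoting the set of opposites of the
attributes in C):

      def opposite(a):
          name, sign = a
          return (name, not sign)

      def closed(A, sigma):
          """Sketch of Closed: is A closed w.r.t. the mixed implications in sigma?

          sigma is a list of pairs (B, C) of frozensets of (name, sign) attributes."""
          for B, C in sigma:
              # Armstrong-style test: B ⊆ A but C ⊈ A.
              if B <= A and not C <= A:
                  return False
              # [Rft]-based test: exactly one attribute a of B is missing from A,
              # A already contains the opposite of some attribute of C, but not ā.
              missing = B - A
              if len(missing) == 1:
                  a = next(iter(missing))
                  if {opposite(c) for c in C} & A and opposite(a) not in A:
                      return False
          return True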




   3       Mixed concept lattices

As we have mentioned, the goal of this paper is to develop a deep study of
the generalized algebraic framework. In this section we introduce the main
results of this paper, providing the properties of the generalized concept lattice.
The main pillars of our new framework are the two derivation operators
introduced in Equations (4) and (5). The following theorem ensures that the
pair of these operators is a Galois connection:

Theorem 1. Let K = ⟨G, M, I⟩ be a formal context. The pair of derivation
operators (⇑, ⇓) introduced in Definition 1 is a Galois connection.

Proof. We need to prove that, for all subsets X ⊆ G and Y ⊆ M ∪ M̄ ,

                          X ⊆ Y ⇓ if and only if Y ⊆ X ⇑

First, assume X ⊆ Y ⇓ . For all a ∈ Y , we distinguish two cases:
 1. If a is a positive attribute, i.e. a = m with m ∈ Pos(Y ), then for all g ∈ X,
    since X ⊆ Y ⇓ , ⟨g, m⟩ ∈ I and therefore a = m ∈ X ⇑ .
 2. If a is a negative attribute, i.e. a = m̄ with m ∈ Neg(Y ), then for all g ∈ X,
    since X ⊆ Y ⇓ , ⟨g, m⟩ ∉ I and therefore a = m̄ ∈ X ⇑ .
Conversely, assume Y ⊆ X ⇑ and g ∈ X. To ensure that g ∈ Y ⇓ , we need to
prove that ⟨g, m⟩ ∈ I for all m ∈ Pos(Y ) and ⟨g, m⟩ ∉ I for all m ∈ Neg(Y ),
which is straightforward from Y ⊆ X ⇑ .                                      ⊓⊔
    Therefore, the above theorem ensures that ⇑◦⇓ and ⇓◦⇑ are closure operators.
Furthermore, as in the classical case, both closure operators provide two dually
isomorphic lattices. We denote by B ] (G, M, I) the lattice of mixed concepts with
the relation

          ⟨X1 , Y1 ⟩ ≤ ⟨X2 , Y2 ⟩ iff X1 ⊆ X2 (or, equivalently, iff Y1 ⊇ Y2 )

Moreover, as in classical FCA, mixed implications and the mixed concept lattice
are two sides of the same coin, i.e. the information mined from the mixed formal
context may be dually represented by means of a set of mixed attribute
implications or by the mixed concept lattice.
    As we shall see later in this section, unlike in classical FCA, mixed concept
lattices are restricted to a specific subclass of lattices. There exist specific prop-
erties that a lattice must satisfy in order to be a valid structure corresponding
to a mixed formal context. In fact, this is one of the main goals of this paper:
the characterization of the lattices arising in mixed formal concept analysis.
    In Table 3 six different lattices are depicted. In the classical framework, all of
them may be associated with formal contexts, i.e. in the classical framework any
lattice corresponds to a collection of formal contexts. Nevertheless, in the mixed
attribute framework this property does not hold anymore. Thus, in Table 3, as
we shall prove later in this paper, lattices 3 and 5 cannot be associated with a
mixed formal context.
    The following two definitions characterize two kinds of significant sets of
attributes that will be used later:
Definition 4. Let K = ⟨G, M, I⟩ be a formal context. A set A ⊆ M ∪ M̄ is
called a consistent set if Pos(A) ∩ Neg(A) = ∅.
The set of consistent sets is denoted by Ctts, i.e.
                  Ctts = {A ⊆ M ∪ M̄ | Pos(A) ∩ Neg(A) = ∅}
If A ∈ Ctts then |A| ≤ |M | and, in the particular case where |A| = |M |, we have
Tot(A) = M . This situation induces the notion of full set:

              [Hasse diagrams of six small lattices, labelled Lattice 1 to
              Lattice 6; Lattice 3 is the three-element chain.]

                          Table 3. Skeletons of some lattices



Definition 5. Let K = ⟨G, M, I⟩ be a formal context. A set A ⊆ M ∪ M̄ is said
to be a full consistent set if A ∈ Ctts and Tot(A) = M .
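For instance, in the context of Table 1, the set {a, b, c̄} is consistent but not full,
whereas {o2 }⇑ = {a, b, c̄, d̄, ē} is a full consistent set.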
The following lemma, which characterizes the boundary cases, is straightforward
from Definition 1.
Lemma 1. Let K = ⟨G, M, I⟩ be a formal context. Then ∅⇑ = M ∪ M̄ , ∅⇓ = G
and (M ∪ M̄ )⇓ = ∅.
In the classical framework, the concept lattice B(G, M, I) is bounded by ⟨M ↓ , M ⟩
and ⟨G, G↑ ⟩. However, in this generalized framework, as a direct consequence
of the above lemma, the lower and upper bounds of B ] (G, M, I) are ⟨∅, M ∪ M̄ ⟩
and ⟨G, G⇑ ⟩, respectively.
Lemma 2. Let K = ⟨G, M, I⟩ be a formal context. The following properties
hold:
 1. For all g ∈ G, {g}⇑ is a full consistent set.
 2. For all g1 , g2 ∈ G, if g1 ∈ {g2 }⇑⇓ then {g1 }⇑ = {g2 }⇑ . 1
 3. For all X ⊆ G, X ⇑ = ⋂g∈X {g}⇑ .
Proof. 1. It is obvious because, for all m ∈ M , ⟨g, m⟩ ∈ I or ⟨g, m⟩ ∉ I and
   {g}⇑ = {m ∈ M | ⟨g, m⟩ ∈ I} ∪ {m̄ ∈ M̄ | ⟨g, m⟩ ∉ I} is a disjoint union.
   Thus, Tot({g}⇑ ) = M and Pos({g}⇑ ) ∩ Neg({g}⇑ ) = ∅.
 1
     That is, g1 and g2 have exactly the same attributes.

 2. Since (⇑, ⇓) is a Galois connection, g1 ∈ {g2 }⇑⇓ (i.e. {g1 } ⊆ {g2 }⇑⇓ ) implies
    {g2 }⇑ ⊆ {g1 }⇑ . Moreover, by item 1, both {g1 }⇑ and {g2 }⇑ are full consistent
    and, therefore, {g1 }⇑ = {g2 }⇑ .
 3. As in the classical framework, since (⇑, ⇓) is a Galois connection between
    (2G , ⊆) and (2M ∪M̄ , ⊆), for any X ⊆ G, we have that
    X ⇑ = (⋃g∈X {g})⇑ = ⋂g∈X {g}⇑ .                                          ⊓⊔
The above elementary lemmas lead to the following theorem, which emphasizes
a significant difference with respect to the classical construction: it focuses on
how the inclusion of new objects influences the structure of the mixed concept
lattice.
Theorem 2. Let K = ⟨G, M, I⟩ be a formal context, g0 be a new object, i.e.
g0 ∉ G, and Y ⊆ M be the set of attributes that g0 satisfies. Then, there exists
g ∈ G such that {g}⇑ = {g0 }⇑ if and only if there exists an isomorphism between
B ] (G, M, I) and B ] (G ∪ {g0 }, M, I ∪ {⟨g0 , m⟩ | m ∈ Y }).
That is, if a new different object (an object that differs in at least one attribute
from every object in the context) is added to the formal context then the mixed
concept lattice changes.
Proof. Obviously, if there exists g ∈ G such that {g}⇑ = {g0 }⇑ then, from
Lemma 2, g and g0 have exactly the same attributes and, therefore, the lattices
B ] (G, M, I) and B ] (G ∪ {g0 }, M, I ∪ {⟨g0 , m⟩ | m ∈ Y }) are isomorphic.
    Conversely, if the mixed concept lattices are isomorphic, there exists X ⊆ G
such that the closed set X ⇑ in B ] (G, M, I) coincides with {g0 }⇑ . Thus, in the
mixed concept lattice B ] (G ∪ {g0 }, M, I ∪ {⟨g0 , m⟩ | m ∈ Y }), by Lemma 2, we
have that {g0 }⇑ = X ⇑ = ⋂g∈X {g}⇑ . Moreover, since {g0 }⇑ is a full consistent
set, X ≠ ∅ because, by Lemma 1, ∅⇑ = M ∪ M̄ . Therefore, for all g ∈ X
(there exists at least one g ∈ X), {g0 }⇑ ⊆ {g}⇑ and, since both are full consistent
sets by Lemma 2, {g}⇑ = {g0 }⇑ .                                              ⊓⊔
Example 1. Let K1 = ({g1, g2}, {a, b, c}, I1 ) and K2 = ({g1, g2, g3}, {a, b, c}, I2 )
be formal contexts where I1 and I2 are the binary relations depicted in Table 4.
Note that K2 is built from K1 by adding the new object g3. In the classical frame-


          I1    a   b   c                   I2    a   b   c
          g1    ×       ×                   g1    ×       ×
          g2    ×   ×                       g2    ×   ×
                                             g3    ×

                     Table 4. The formal contexts K1 and K2



work, the concept lattices B({g1, g2}, {a, b, c}, I1 ) and B({g1, g2, g3}, {a, b, c}, I2 )
are isomorphic. See Figure 1.
   However, the lattices of mixed concepts cannot be isomorphic because the
new object g3 is not a repetition of an existing object. See Figure 2.
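   To make the example concrete, the following sketch of ours enumerates the
mixed concepts of K1 and K2 by closing every subset of objects with the
operators of Definition 1; it finds 4 and 7 concepts, respectively, so the two
mixed concept lattices cannot be isomorphic:

      from itertools import combinations

      def mixed_concepts(K, M):
          """All mixed concepts of a context given as a mapping object -> attributes."""
          def up(X):
              return ({(m, True) for m in M if all(m in K[g] for g in X)} |
                      {(m, False) for m in M if all(m not in K[g] for g in X)})
          def down(Y):
              return frozenset(g for g in K
                               if all(m in K[g] for (m, s) in Y if s)
                               and all(m not in K[g] for (m, s) in Y if not s))
          concepts = set()
          for r in range(len(K) + 1):
              for X in combinations(K, r):
                  Y = up(set(X))
                  concepts.add((down(Y), frozenset(Y)))   # <X⇑⇓, X⇑> is a mixed concept
          return concepts

      K1 = {"g1": {"a", "c"}, "g2": {"a", "b"}}
      K2 = dict(K1, g3={"a"})
      M = {"a", "b", "c"}
      print(len(mixed_concepts(K1, M)), len(mixed_concepts(K2, M)))   # prints: 4 7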
   The following theorem characterizes the atoms of the new concept lattice B ] .




              [Hasse diagrams of B({g1, g2}, {a, b, c}, I1 ) and
              B({g1, g2, g3}, {a, b, c}, I2 ); in both, the bottom concept
              is ⟨∅, {a, b, c}⟩.]

                    Fig. 1. Lattices obtained in the classical framework




              [Hasse diagrams of B ] ({g1, g2}, {a, b, c}, I1 ) and
              B ] ({g1, g2, g3}, {a, b, c}, I2 ); in both, the bottom concept
              is ⟨∅, {a, b, c, ā, b̄, c̄}⟩.]

                    Fig. 2. Lattices obtained in the extended framework

Theorem 3. Let K = ⟨G, M, I⟩ be a formal context. The set of atoms in the
lattice B ] (G, M, I) is {⟨{g}⇑⇓ , {g}⇑ ⟩ | g ∈ G}.

Proof. First, fix g0 ∈ G; we prove that the mixed concept ⟨{g0 }⇑⇓ , {g0 }⇑ ⟩ is an
atom in B ] (G, M, I). If ⟨X, Y ⟩ is a mixed concept such that
⟨∅, M ∪ M̄ ⟩ < ⟨X, Y ⟩ ≤ ⟨{g0 }⇑⇓ , {g0 }⇑ ⟩, then {g0 }⇑ ⊆ Y = X ⇑ ⊊ M ∪ M̄ . By
Lemma 2, {g0 }⇑ ⊆ X ⇑ = ⋂g∈X {g}⇑ . Moreover, for all g ∈ X ≠ ∅, by Lemma 2,
both {g0 }⇑ and {g}⇑ are full consistent sets and, since {g0 }⇑ ⊆ {g}⇑ , we have
{g0 }⇑ = {g}⇑ . Therefore, {g0 }⇑ = X ⇑ = Y and ⟨X, Y ⟩ = ⟨{g0 }⇑⇓ , {g0 }⇑ ⟩.
    Conversely, if ⟨X, Y ⟩ is an atom in B ] (G, M, I), then X ≠ ∅ and there
exists g0 ∈ X. Since (⇑, ⇓) is a Galois connection, {g0 }⇑ ⊇ X ⇑ = Y and,
therefore, ⟨{g0 }⇑⇓ , {g0 }⇑ ⟩ ≤ ⟨X, Y ⟩. Finally, since ⟨X, Y ⟩ is an atom, we have
that ⟨X, Y ⟩ = ⟨{g0 }⇑⇓ , {g0 }⇑ ⟩.                                            ⊓⊔

    The following theorem establishes the characterization of the mixed concept
lattice, proving that atoms and join-irreducible elements coincide.

Theorem 4. Let K = hG, M, Ii be a formal context. Any element in B ] (G, M, I)
is ∨-irreducible if and only if it is an atom.

Proof. Obviously, any atom is ∨-irreducible. We now prove that any ∨-irreducible
element belongs to {⟨{g}⇑⇓ , {g}⇑ ⟩ | g ∈ G}. Let ⟨X, Y ⟩ be a ∨-irreducible
element. Then, by Lemma 2, Y = X ⇑ = ⋂g∈X {g}⇑ . Let X ′ be a smallest set
such that X ′ ⊆ X and Y = ⋂g∈X ′ {g}⇑ . If X ′ is a singleton, then
⟨X, Y ⟩ ∈ {⟨{g}⇑⇓ , {g}⇑ ⟩ | g ∈ G}.
    Finally, we prove that X ′ is necessarily a singleton. Otherwise, a bipartition
of X ′ into two disjoint sets Z1 and Z2 can be made satisfying Z1 ∪ Z2 = X ′ ,
Z1 ≠ ∅, Z2 ≠ ∅ and Z1 ∩ Z2 = ∅. Then, Y = ⋂g∈Z1 {g}⇑ ∩ ⋂g∈Z2 {g}⇑ = Z1⇑ ∩ Z2⇑
and so ⟨X, Y ⟩ = ⟨Z1⇑⇓ , Z1⇑ ⟩ ∨ ⟨Z2⇑⇓ , Z2⇑ ⟩ with Z1⇑ ≠ Y ≠ Z2⇑ . However, this is
not possible because ⟨X, Y ⟩ is ∨-irreducible.                                  ⊓⊔

    As a final point of this study, we may conclude that, unlike in the classical
framework, not every lattice may be linked with a mixed formal context. Thus,
lattices number 3 and 5 from Table 3 cannot be associated with a mixed formal
context. Both of them have an element which is not an atom but which is, at
the same time, a join-irreducible element of the lattice. More specifically, there
does not exist a mixed concept lattice with three elements.


4     Conclusions
In this work we have presented an algebraic study of a general framework to
deal with negative and positive information. After introducing new derivation
operators, we prove that they constitute a Galois connection. The main results
of the work are devoted to establishing the relation between mixed concept
lattices and mixed formal contexts. Thus, the most outstanding conclusions are
the following:

 – the inclusion of a new (and different) object in a formal context has a direct
   effect on the structure of the lattice, producing a different lattice.
 – not every lattice may be associated with a mixed formal context, which
   induces a restriction on the structure that a mixed concept lattice may have.


   Acknowledgements
   Supported by grant TIN2011-28084 of the Science and Innovation Ministry of
   Spain, co-funded by the European Regional Development Fund (ERDF).


   References
    1. R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules in Large
       Databases. In Proceedings of the 20th International Conference on Very Large Data
       Bases (VLDB), pages 487–499, Santiago de Chile, Chile, 1994. Morgan Kaufmann
       Publishers Inc.
    2. J.F. Boulicaut, A. Bykowski, and B. Jeudy. Towards the tractable discovery of
       association rules with negations. In FQAS, pages 425–434, 2000.
    3. B. Ganter. Two basic algorithms in concept analysis. Technische Hochschule,
       Darmstadt, 1984.
    4. G. Gasmi, S. Ben Yahia, E. Mephu Nguifo, and S. Bouker. Extraction of association
       rules based on literalsets. In DaWaK, pages 293–302, 2007.
 5. J.L. Guigues and V. Duquenne. Familles minimales d’implications informatives
    résultant d’un tableau de données binaires. Mathématiques et Sciences Humaines,
    95:5–18, 1986.
    6. H. Mannila, H. Toivonen, and A. Inkeri Verkamo. Efficient algorithms for discov-
       ering association rules. In KDD Workshop, pages 181–192, 1994.
    7. R. Missaoui, L. Nourine, and Y. Renaud. Generating positive and negative exact
       rules using formal concept analysis: Problems and solutions. In ICFCA, pages
       169–181, 2008.
    8. R. Missaoui, L. Nourine, and Y. Renaud. An inference system for exhaustive
       generation of mixed and purely negative implications from purely positive ones. In
       CLA, pages 271–282, 2010.
    9. R. Missaoui, L. Nourine, and Y. Renaud. Computing implications with negation
       from a formal context. Fundam. Inform., 115(4):357–375, 2012.
10. A. Revenko and S.O. Kuznetsov. Finding errors in new object intents. In CLA,
    pages 151–162, 2012.
   11. J.M. Rodriguez-Jimenez, P. Cordero, M. Enciso, and A. Mora. Negative attributes
       and implications in formal concept analysis. Procedia Computer Science, 31(0):758
       – 765, 2014. 2nd International Conference on Information Technology and Quan-
       titative Management, ITQM 2014.
   12. R. Wille. Restructuring lattice theory: an approach based on hierarchies of con-
       cepts. In Rival, I. (ed.): Ordered Sets, pages 445–470. Boston, 1982.
                             Author Index


Aı̈t-Kaci, Hassan, 3               Liquière, Michel, 11
Al-Msie’Deen, Ra’Fat, 95           Loiseau, Yannick, 131
Alam, Mehwish, 255
Antoni, L’ubomı́r, 35, 83          Mora, Ángel, 145, 267
                                   Mouakher, Amira, 169
Baixeries, Jaume, 1, 243
Bartl, Eduard, 207                 Naidenova, Xenia, 181
Ben Yahia, Sadok, 169              Napoli, Amedeo, 243, 255
Bertet, Karell, 145, 219           Nebut, Clémentine, 11
Bich Dao, Ngoc, 219                Nourine, Lhouari, 231

Cabrera, Inma P., 157              Ojeda-Aciego, Manuel, 157
Ceglar, Aaron, 23                  Otaki, Keisuke, 47, 59
Cepek, Ondrej, 9
Codocedo, Victor, 243              Parkhomenko, Vladimir, 181
Cordero, Pablo, 145, 267           Pattison, Tim, 23
Coupelon, Olivier, 131             Peláez-Moreno, Carmen, 119
                                   Peñas, Anselmo, 119
Dia, Diyé, 131                    Pócs, Jozef, 157
Dimassi, Ilyes, 169                Priss, Uta, 7
Enciso, Manuel, 145, 267
                                   Raynaud, Olivier, 131
                                   Revel, Arnaud, 219
Gnatyshak, Dmitry V., 231
                                   Rodrı́guez Lorenzo, Estrella, 145
Guniš, Ján, 35
                                   Rodrı́guez-Jiménez, José Manuel, 267
Huchard, Marianne, 11, 95
                                   Saada, Hajer, 11
Ignatov, Dmitry I., 231            Seki, Hirohisa, 71
Ikeda, Madori, 47, 59              Seriai, Abdelhak, 95
Šnajder, L’ubomı́r, 35
Kamiya, Yohei, 71
Kauer, Martin, 195                 Trnecka, Martin, 107
Kaytoue, Mehdi, 243                Trneckova, Marketa, 107
Konecny, Jan, 207
Krajči, Stanislav, 35, 83         Urtado, Christelle, 95
Krı́dlo, Ondrej, 35, 83
Krupka, Michal, 195                Valverde Albacete, Francisco J., 119
Kuznetsov, Sergei O., 231          Vauttier, Sylvain, 95

Labernia, Fabien, 131              Yamamoto, Akihiro, 47, 59
Title:                CLA 2014, Proceedings of the Eleventh International
                      Conference on Concept Lattices and Their Applications
Publisher:            Pavol Jozef Šafárik University in Košice
Expert advice:        Library of Pavol Jozef Šafárik University in Košice
                      (http://www.upjs.sk/pracoviska/univerzitna-kniznica)
Year of publication:  2014
Number of copies:     70
Page count:           XII + 280
Authors sheets count: 15
Publication:          First edition
Print:                Equilibria, s.r.o.



                        ISBN 978–80–8152–159–1