Using Formal Concept Analysis for discovering knowledge patterns Mohamed Rouane-Hacene1 , Marianne Huchard2 , Amedeo Napoli3 , and Petko Valtchev1 1 Dépt. informatique, UQAM, C.P. 8888, Succ. CV, Montréal, Canada H3C 3P8 2 LIRMM, 161, rue Ada, F-34095 Montpellier 3 LORIA, BP 70239, F-54506 Vandœuvre-lès-Nancy Abstract. Design patterns are used in software engineering for guiding code design: they play the role of models to be followed for producing code of better quality. In the same way, knowledge patterns are intro- duced in knowledge engineering as ontology components that can be used as models and reused as ontology design patterns (ODPs) in ontology en- gineering. Accordingly, we present in this paper the use of both Formal Concept Analysis (FCA) and Relational Concept Analysis (RCA) for designing compact concept lattices that can be re-engineered as ODPs. Starting with a simple example, it is shown how to derive conceptual and relational abstractions that can be re-engineered as knowledge patterns, following and adapting the meta-modeling activity in software engineer- ing. This paper aims at showing that FCA and RCA are efficient and reliable tools for guiding knowledge engineering. Moreover, this is one of the few attempts to generate effective knowledge patterns from data available as sets of objects, attributes, and relations between objects. 1 Introduction An ontology is supposed to capture and organize domain knowledge within a set of concept and relation definitions encoded in a knowledge representation language [15]. While ontologies are modules of first importance in the frame- work of Semantic Web, there does not exist yet a universal approach to ontology engineering [12, 9] contrasting the widespread use of UML for software engi- neering [11, 14]. Moreover, there is a growing interest in considering ontology engineering as a composition of “knowledge patterns” in the same way as code is built using and assembling design patterns in an application software [3, 6]. Such an approach to software engineering favors model-driven engineering where the specification of a software is given by a “model ” which is further refined. This model is made of classes, relations between classes, and operations associated to the classes and modeling the basis of the software. Let us consider for example the “composite pattern” which is used to represent simple objects and their com- position in a composite object with a common interface, i.e. their operations are the same and they understand the same messages. It can be used to represent 224 M. Rouane-Hacene, M. Huchard, A. Napoli and P. Valtchev files and directories in a Unix system, where the common interface includes op- erations for renaming, changing access rights, moving, removing, etc. The design solution uses a class “Component” containing the common operations (mostly in abstract form), which is specialized into a class “Leaf” to represent simple objects and a class “Composite” which represents the composite objects [5, 10]. In this paper, we transpose ideas from code design in software engineering for guiding ontology engineering. Actually, we are interested in the refactoring of models such as UML models, as ontological and reusable components. We start with an informal and small-sized problem statement represented within sets of objects, attributes, and relations between objects, or UML diagrams. The later are the inputs of a re-engineering process based on Formal Concept Analysis (FCA [7]) and Relational Concept Analysis (RCA [13]) whose output are concept lattices. These lattices provide a number of potential constructions that can be then applied by a knowledge engineer (a designer) for building a final knowledge pattern. The resulting concepts and relations, and their hier- archical organization, can then be represented into Description Logics (DL [1]) as a set of concepts and roles to be reused as a pattern in ontology engineer- ing. This approach can be considered as a semi-automated method for designing “knowledge patterns’ or “ontology design patterns” (ODPs), i.e. conceptual com- ponents which are small-sized, compact, and generic enough to be reused as building blocks for ontology engineering [3, 6]. A parallel can be drawn between DL-based ontology engineering and con- cept lattice design [2, 16]. The notion of a concept covering a set of objects or instances (extent) sharing a set of attributes (intent) is common to both FCA and to DL. However, concept and relation descriptions are not obtained in the same way. In the DL framework, the design can be considered as top-down as in object-oriented design [14]. The representation begins with abstract atomic concepts and roles, that are then combined with operators for building more complex definitions. A classifier can be used to organize the concepts and roles into a subsumption hierarchy. This top-down approach is intensional and partial w.r.t. domain knowledge: concepts and roles are described as intents, i.e. sets of complex attributes, and only a part of the domain knowledge is represented. By contrast, FCA can be viewed as a learning process working bottom-up. FCA builds a concept lattice from a binary table Objects × Attributes where a con- cept has an extent and an intent, i.e. a maximal set of objects sharing a maximal set of attributes respectively [7]. This bottom-up approach is extensional as it directly works on sets of objects and attributes, and complete w.r.t. the initial data in the table. Moreover, relations between objects can be taken into account in the Relational Concept Analysis framework (RCA [13]). RCA builds a con- cept lattice from binary tables Objects × Attributes and Objects × Objects, where the intent of concepts is made of classical binary attributes and relational attributes as well, reifying links between objects. As an original contribution, this paper details a process based on FCA and RCA providing a designer a lattice structure supporting a knowledge pattern. This knowledge pattern can be reused for ontology engineering and can be used Using FCA for discovering knowledge patterns 225 either per se or for designing and completing an ontology. The process is semi- automated as it requires external interaction. Finally, this is one of the few attempts to automatize the discovery of lattice structures supporting knowledge patterns from data. The paper is organized as follows. Firstly, a working example is presented and modeling problems are discussed. Then the FCA and RCA frameworks are detailed. The evolution from modeling to concept lattices is discussed in each case. A synthesis and a discussion conclude the paper. 2 An example of domain modeling Fo The design of an ontology for a domain of interest is a key task in knowledge en- rN gineering, in particular for Semantic Web applications [9, 15]. The task is rarely fully automated: important steps are usually performed manually by the de- on signer. In fact, a parallel can be drawn between knowledge and software engi- -co neering. For example, use case-driven object modeling consists in deriving arti- mm facts of a software, including the class model, from use cases [11, 14]. A use case model captures high-level system requirements through a set of system usage erc “scripts”, whereas the class model describes an infrastructure supporting those ial requirements, i.e., classes realizing the necessary functionalities. The class model package Data[ Us package Initial]Classes [ Usecases emphasizes Initial diagram ] the data manipulated by the system and reflects domain entities, as- eO sociations composed of two roles, one for each extremity of the association. The problem is the same from a knowledge engineering perspective: how to build a nly satisfactory, concise, and complete model w.r.t. the domain, and then to repre- sent this model within a knowledge representation formalism. Author Fo -name Paper rN -username -password +co-authors +write -title -abstract -affiliation on -keywords -co -email -sourceFile mm submit paper +login() +checkStyle() +notifyAuthors() Author erc Fig. 1. The use case “submit a paper” and its class model in UML-like notation. ial Review paper Us Reviewer eO nly We introduce now the running example on which relies this paper, related to the submission and the evaluation of a paper for a conference. The diagrams on the right-hand side of Figures 1 and 2 capture theReport main classes of the use cases Fo “submit a paper” and “review a paper”, respectively, +reports +reviews -commentsboth modeling an opera- rN tion within a conference management system (CMS). considered as the result of a first+reviewers +rank() These two models can be step in modeling a CMS. They are both par- on tial and reveal various shortcomings w.r.t. quality criteria such as +rate completeness -co Reviewer Paper mm -name -title -username -abstract -password erc -affiliation -email -keywords -sourceFile ial +login() +checkStyle() Us +notifyPCMember() eO ... nly package Initial Classes [ Initial diagram ] Fo rN 226 Author M. Rouane-Hacene, M. Huchard, A. Napoli and P. Valtchev on -name Paper -username -title -co -password and redundancy. Indeed, classes modeling +co-authors conference -affiliation +write -abstract contributors share many mm superclass, say Participant. Moreover, a superclass representing the generic -keywords properties, including personal data, that -email should be centralized in a-sourceFile common ercconcept of conference artifacts, i.e., papers or reviews, is missing. Such a class, +login() +checkStyle() +notifyAuthors() iaSubmission, say l U the relationships between, on one hand, the different types of con- package Data[ abstracting should be linked to the Participant class by an association Usecases ] tributors, s eO and, on the other hand, the corresponding types of submissions. nly Report +reports Fo +reviews -comments rN +rank() on +reviewers +rate -name Reviewer -co Paper mm submit paper -title -username -abstract Author -password erc -keywords -affiliation -sourceFile -email +login() +checkStyle() ial Review paper +notifyPCMember() ... Us Reviewer eO Fig. 2. The use case “review a paper” and its class model in UML-like notation. nly Fo rN on -co els. Accordingly, Figure 3 shows a possible factorization of the two partial mod- msupport Due to its compactness and its maximal factorization, such a model could m a knowledge pattern to be reused in ontology engineering for CMS and –aftereadaptation– for CMS-like applications, e.g. participation to an event, an rci a competition, a course, etc. examination, al Us we show how the use of FCA and RCA leads to the dis- In the following, e O operations that can help the knowledge engineer covery of series of construction in building a global “class n relations between objects arely model” for CMS. In this resulting class model, the MagicDraw UML, 1-1 /Users/rouanehm/Documents/Research/Papers/CLA10/FiguresCLA-140810/cms-cla/cms-r of first importance and should be taken into ac- count. Yet, albeit suitable for abstracting class identification in software and knowledge engineering tasks [8], FCA ignores these links inherent to any real- istic datasets and models. At this point RCA comes into play which allows the analysis of objects and relations between objects. Below, a short overview of FCA and RCA precedes the analysis of the CMS problem and the design of a class model supporting and associated knowledge pattern. MagicDraw UML, 1-1 /Users/rouanehm/Documents/Research/Papers/ECTAI08/figures/confSys.mdzip Usecases May 27, 200 For Non-Commercial Use Only Using FCA for discovering knowledge patterns 227 Participant -name Submission +participants +submit -username -password -affiliation -email +login() Paper +rate Author -title +co-authors +write -abstract ... -keywords +notifyAuthors() -sourceFile +reports +checkStyle() Report Reviewer -comments ... +reviewers +reviews +rank() +notifyPCMember() Fig. 3. A possible class model for the “Conference Manager System” (CMS). 3 FCA and RCA frameworks 3.1 The basics of FCA The basics of FCA are introduced in [7]. In the present case, the formal context given in Table 1 and denoted by Kclass binds a set of objects –the four “classes” of the CMS class model– to attributes –the class variables and methods. The corresponding concept lattice Lclass in drawn on Figure 4, where, e.g. the concept Concept_5 represents the Reviewer class in the CMS system. The full extent of Concept_5 is {Reviewer} and the full intent is the set {name, username, password, affiliation, email, login(), notifyPCMember()}. notifyPCMember() notifyAuthors() checkStyle() affiliation sourceFile username password abstract keywords comments login() rank() email title name Author ××××× × × Paper ×××× × Reviewer × × × × × × × Report × × Table 1. A context encoding the relation between objects, i.e. the “classes” of the CMS model, and their attributes. A cross × means that object oi has attribute aj . L, 1-1 /Users/rouanehm/Documents/Research/Papers/CLA10/FiguresCLA-140810/cms-cla/cms-reference-model-cla10.m 228 M. Rouane-Hacene, M. Huchard, A. Napoli and P. Valtchev Fig. 4. The concept lattice Lclass corresponding to the formal context Kclass (Table 1) where the bottom concept ⊥ is omitted. Beside a class hierarchy, an integrated class model must include associations that are available within partial models, and possibly, abstractions of these asso- ciations. The abstraction of associations requires encoding association roles into a formal context together with their attributes. Indeed, the RCA framework pre- sented hereafter addresses these concerns, allowing FCA to take effectively and efficiently into account relational data. 3.2 The basics of RCA RCA was introduced and detailed in [13]. Data is described by a relational con- text family (RCF), composed of a set of contexts K = {Ki } and a set of binary relations R = {rk }. A relation rk ⊆ Oj × O` connects two object sets, a domain Oj (dom(rk ) = Oj ) and a range O` (ran(rk ) = O` ). For example, the RCF corre- sponding to the CMS example could be composed of two contexts, namely Kclass (Table 1) and Kassoc (Table 2), and a set of relations R = {owns, type} (see Ta- ble 3). Kclass encodes classes as objects and class characteristics as attributes, while Kassoc encodes association roles as objects and role names as attributes. Hereafter, we briefly recall the mechanism of RCA necessary for understand- ing the following (details are given in [13]). RCA is based on a relational scaling mechanism that transforms a relation rk into a set of relational attributes that are added to the context describing the object set dom(rk ). To that end, rela- tional scaling follows the DL semantics of role restrictions. For each relation rk ⊆ Oj × O` , there is an initial lattice for each object set, i.e. Lj for Oj and L` for O` . For example, considering owns ⊆ Oclass × Oassoc these initial lattices are respectively Lclass (Figure 4) and Lassoc (Figure 5). Given a relation rk ⊆ Oj × O` , a relational attribute ∃rk : c –c being a con- cept and ∃ the existential quantifier– is associated to an object o ∈ Oj whenever Using FCA for discovering knowledge patterns 229 ’write’ ’reviews’ ’co-authors’ ’reviewers’ ’rate’ ’reports’ write × reviews × co-authors × reviewers × rate × reports × Table 2. The context Kassoc describing association roles in Figures 1 and 2. Fig. 5. The concept lattice Lassoc corresponding to the formal context Kassoc (Table 2) where the bottom concept ⊥ is omitted. rk (o) ∩ extent(c) 6= ∅ (other quantifiers are available, see [13]). For example, let us consider the concept Concept_10 in the concept lattice Lassoc representing the association role reviews (see Figure 5). The attribute “owns : Concept_10” is associated to the concept Concept_5 in the lattice Lclass that models the class Reviewer in the CMS system (see Figure 6). 4 The design of the knowledge pattern “CMS” In this section, we detail the application of the FCA and then RCA processes on the relational context family K = {Kclass , Kassoc } and R = {owns, type}. 4.1 The three main steps of the design Design step#0. In the initial situation –step #0– of the RCA process, we have two lattices built from the class and association role contexts. Actually, these 230 M. Rouane-Hacene, M. Huchard, A. Napoli and P. Valtchev owns write reviews co-authors reviewers rate reports Author × Paper × × Reviewer × Report × × type Author Paper Reviewer Report write × reviews × co-authors × reviewers × rate × reports × Table 3. The relations owns and type between classes and association roles. An Author owns the role write, i.e. the domain of write is Author, and write type(s) Paper, i.e. the range of write is Paper. two initial lattices can be read on Figures 6 and 7 where their concepts have their names ended by (0) (actually, the numbering of concepts in the concept lattices Lclass and Lassoc is kept all along the design process). At this step, the relations owns and type are not yet considered. In the class lattice, the concepts of interest are: Concept_0 (Top), Concept_6 (Author), Concept_5 (Reviewer), Concept_4 (Report), and Concept_3 (Paper). In addition, Concept_1 corre- sponds to a generalization of the concepts Author and Reviewer and factorizes their attributes and methods. This generalization provides the knowledge engineer a first clue, i.e. a con- struction for building a concept that can be called Participant generalizing concepts Author and Reviewer. Design step#1. At this step, the relational contexts are considered with rela- tions owns, i.e. domains of association roles, and type, i.e. range of association roles. More precisely, a concept C has an attribute such as “owns : X” where X is a concept whenever C “owns” a role present in the extent of X. For example, the concept Concept_6 in Lclass has an attribute “owns : Concept_8”, meaning that Author (extent of Concept_6 in Lclass ) owns the role write (extent of Concept_8 in Lassoc ). Attributes of the form “type : Y” are built in the same way. The complements and generalizations appearing in the lattices at step#1 are the following: – In the class lattice, the concept Concept_6 (step#0) gets the attribute “owns : Concept_8” indicating that Author owns the role write. – In the association role lattice, Concept_15 corresponds to a generalization of co-authors and reviewers, whose type is Concept_1 in the class lattice, i.e. Using FCA for discovering knowledge patterns 231 Fig. 6. The final “class lattice” where the concept ⊥ is omitted. Participant. Actually, this provides a second clue and allows the knowledge engineer to discover the role participants of the final model. – In the same way, Concept_17 generalizes roles related to Report, i.e. reports and reviews, and Concept_16 generalizes roles related to Paper, i.e. rate and write. Design step#2. At this step, the association role lattice is not significantly modified but the following concepts are created in the class lattice: – Concept_18 denotes objects related to Participant through the relational attribute “owns : Concept_15”, generalization of the roles co-authors and reviewers. Concept_18 is particularly interesting and provides a third clue for the knowledge engineer, i.e. it allows to discover a generalization of con- cepts Paper and Report that can be called Submission. – Concept_19 denotes objects related to Paper through “owns : Concept_16” generalizing rate and write, – Concept_20 denotes objects related to Report through “owns : Concept_17” generalizing reports and reviews. Design step#3. At step#3, the following generalizations are created in the association role lattice: 232 M. Rouane-Hacene, M. Huchard, A. Napoli and P. Valtchev Fig. 7. The final “association role lattice” where the concept ⊥ is omitted. – Concept_21 is a role attached to objects related to Submission and provides a fourth clue, i.e. it allows to discover the role called submit of the knowledge pattern, – Concept_22 is a role whose type is composed of objects related to Paper, and Concept_23 is a role whose type is composed of objects related to Report. The following generalizations are created in the class lattice: – Concept_19 gets the role “owns : Concept_23”, i.e. Author and Report are related to objects themselves related to objects related to Report, – Concept_20 gets the role “owns : Concept_22”, i.e. Reviewer and Paper are related to objects themselves related to objects related to Paper... 4.2 Synthesis and discussion There are four main steps in the design of the class and association role lattice through the FCA and RCA processes. The two lattices will help the knowledge engineer to build an associated knowledge pattern based on the discovery of the different concepts and relations denoted above as “clues”: 1. At step#0, Concept_1 yields the generalization Participant (first clue). Using FCA for discovering knowledge patterns 233 2. At step#1, the association roles co-authors and reviewers are general- ized into Concept_15, interpreted as the role participants attached to the concept Participant in the final pattern (second clue). 3. At step#2, Paper and Report are sharing the role owns : Concept_15 (participants) and are classified in concept Concept_18, corresponding to the discovery of class Submission in the final pattern (third clue). 4. At step#3, the discovery of Concept_21 allows to complete Concept_0 with the role owns : Concept_21, i.e. objects related with Submission (fourth clue). This last role is inherited by Participant and can be attached to this class by the designer. The final global class model as it can be designed by the knowledge engineer is given on Figure 3. This class model can be then materialized as an effective knowledge pattern, through an implementation say in OWL to be considered in ontology engineering. This example is simple and low-sized w.r.t. the number of involved concepts and relations, but sufficiently generic too to be reused in a number of practical situations, after adaptation (e.g. renaming): e.g. supply and demand, recruitment, and selection committee (see for example an interesting and related discussion in [4]). However, as also shown in this paper, the design work is not simple even if guided by FCA and RCA, and asks for an active participation of the knowledge engineer (the designer). For example, the discovery and the naming of extracted roles and concepts as discussed above, is neither easy nor immediate. In this way, we still have to make a series of experiments on existing but also on new knowledge patterns for having a better understanding of the design process. For example, we should improve the scaling of the process: what are the patterns that can be easily built and those that are not, depending on the complexity of the initial data, i.e. sets of objects, attributes, and relations, and UML structures. At present, we can only consider small and compact object and relation sets, easy to read and to understand, and not all knowledge patterns have the same dimensions. 5 Conclusion In this paper, we have proposed a semi-automated process based on FCA and RCA for helping a knowledge engineer designing a knowledge pattern. This is an original contribution to what could be called “ontology engineering assisted by computer”. Interestingly, this approach is borrowed from software engineering and brings new and powerful ideas for helping ontology engineering. In addition, the approach is supported by working tools such as the Galicia platform1 and the ERCA (Eclispe Relational Concept Analysis) platform2 . Apart the consolidation of the preceding work and the need for experiments at a larger scale, there are many research perspectives. For example, there should 1 http://sourceforge.net/projects/galicia/ 2 http://code.google.com/p/erca/ 234 M. Rouane-Hacene, M. Huchard, A. Napoli and P. Valtchev be a deep study on the way how meta-modeling in software engineering could be adapted to knowledge engineering with the help of a platform such as Galicia. Meta-modeling aspects of software engineering have to be extended for taking into account the specificity of knowledge engineering requirements. Next, the design process should be abstracted and maximally automated for an optimal use in ontology engineering. References 1. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, editors. The Description Logic Handbook. Cambridge University Press, 2003. 2. F. Baader, B. Ganter, U. Sattler, and B. Sertkaya. Completing description logic knowledge bases using formal concept analysis. In IJCAI 2007, pages 230–235, 2007. 3. P. Clark, J. Thompson, and B.W. Porter. Knowledge patterns. In S. Staab and R. Studer, editors, Handbook on Ontologies (First Edition), pages 191–208. Springer, Berlin, 2004. 4. M. Ducassé and S. Ferré. Fair(er) and (almost) serene committee meetings with logical and formal concept analysis. In P.W. Eklund and O. Haemmerlé, editors, Proceedings of ICC 2008, Lecture Notes in Computer Science 5113, pages 217–230. Springer, Berlin, 2008. 5. M. Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley, Read- ing (MA), 2002. 6. A. Gangemi and V. Presutti. Ontology Design Patterns. In S. Staab and R. Studer, editors, Handbook on Ontologies (Second Edition), pages 221–243. Springer, Berlin, 2009. 7. B. Ganter and R. Wille. Formal Concept Analysis. Springer, Berlin, 1999. 8. R. Godin and P. Valtchev. Formal concept analysis-based normal forms for class hi- erarchy design in oo software development. In B. Ganter, G. Stumme, and R. Wille, editors, Formal Concept Analysis: Foundations and Applications, Lecture Notes in Artificial Intelligence 3626, pages 304–323. Springer, Berlin, 2005. 9. A. Gómez-Pérez, M. Fernàndez-López, and O. Corcho. Ontological Engineering. Springer, Berlin, 2004. 10. G. Hohpe and B. Woolf. Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley, Reading (MA), 2004. 11. P. Kruchten. The Rational Unified Process: An Introduction (2nd Edition). Addison-Wesley, 2000. 12. A. Maedche. Ontology Learning for the Semantic Web. Kluwer Academic Publish- ers, Boston, 2003. 13. M. Rouane-Hacene, M. Huchard, A. Napoli, and P. Valtchev. A proposal for combining formal concept analysis and description logics for mining relational data. In S.O. Kuznetsov and S. Schmidt, editors, Proceedings of ICFCA 2007, Lecture Notes in Artificial Intelligence 4390, pages 51–65. Springer, Berlin, 2007. 14. J. Rumbaugh, I. Jacobson, and G. Booch. The Unified Modeling Language Refer- ence Manual. Addison-Wesley, 2004. 15. S. Staab and R. Studer, editors. Handbook on Ontologies (Second Edition). Springer, Berlin, 2009. 16. R. Wille. Methods of conceptual knowledge processing. In R. Missaoui and J. Schmid, editors, Proceedings of ICFCA 2006, Lecture Notes in Artificial In- telligence 3874, pages 1–29. Springer, Berlin, 2006.