Conexp-Clj – A Research Tool for FCA Tom Hanika1 and Johannes Hirth1 Knowledge & Data Engineering Group, University of Kassel, Germany tom.hanika@cs.uni-kassel.de, jhi@cs.uni-kassel.de Abstract There is a plenitude of software programs to analyze data sets using notions from formal concept analysis (FCA). For example, there are 64 FCA related projects listed on GitHub. Those are developed in ten different programming languages and provide tools and libraries for computing formal concepts and alike. The research tool conexp-clj sticks out in this list. It is the only application in the FCA realm developed using the programming language Clojure, which is a modern, dynamic, and functional dialect of the Lisp programming language running on the Java platform. We summarize in this work the extensive set of notions from FCA that are covered by conexp-clj, show recent developments, present simple examples on a real world data set and depict a timeline for our next goals. Keywords: Formal Concept Analysis, Clojure, Functional Programming 1 Introduction With the increase of computing power as well as its broad availability the field of experimental mathematics [2] has flourished in the last decades. This has led to the development of uncountable many applications, libraries, and highly sophisticated tools for computing in various realms of pure and applied mathematics. There are at least two kinds of user for those: The first kind are users employing this kind of software to other research fields, e.g., sociology, physics, or biology. The second kind applies the developed algorithms and tools within the field they were developed in. The goal here is to discover new insights and to advance the research field itself. Formal concept analysis (FCA) is no exception to this categorization. There is a plenitude of tools for computing formal concepts, implications (functional depen- dencies) in data, etc. Many tools are extensive in their features, but limit the user through their user interfaces. Hence, formulating and expressing new ideas is bound to the expressiveness of the particular interface. Furthermore, getting your own new feature functions implemented and linked into the interface is costly and therefore not common in the academic realm. The software conexp-clj,1 developed by Daniel Borchmann as part of the DFG project (GA 216/10-1), aims in a different direction. Copyright c 2019 for this paper by its authors. Copying permitted for private and academic purposes. 1 https://github.com/exot/conexp-clj 2 T. Hanika and J. Hirth It embeds an extensive amount of FCA-functionality into a highly expressive pro- gramming language – Clojure. In this work we demonstrate this expressiveness on a real world data set and provide an overview about recent and future developments. 2 Basic Usage of conexp-clj In this work we use notions from formal concept analysis (FCA) as introduced in [6]. A formal context is a triple K = (G,M,I) with G being a set of object, M being a set of attributes and I ⊆ G×M an incidence relation between them. For conexp-clj use- cases we often refer with the symbol ctx to a formal context and with the symbol cpt to a formal concept. The declarative nature of functional programming languages, like Clojure, makes the computation of FCA knowledge acquisition tasks intuitive and adaptive. Being a Lisp dialect, Clojure uses prefix notation and puts statements into parenthesis. Basic collections and their notation in Clojure are sets #{}, vectors [ ], lists ’( ), and hash-maps { }. To create a context in conexp-clj one can use the make-context-from-matrix function, which creates a context from a binary matrix. The function def assigns the context to the symbol ctx: (def ctx (make-context-from-matrix ["platypus" "duck" "dog"] ["eggs" "mammal" "venomous"] [1 1 1 1 0 0 0 1 0])) There are plenty of other ways to input formal contexts. For example, using its incidence relation one can call (make-context #{1 2} #{1 2} #{[1 1] [1 2]}). Furthermore, by providing a function f as incidence one can input a context implicitly by (make-context objectset attributeset f), e.g., using the less or equal comparability (make-context [1 2 3] [1 2 3] <=). Note that Clojure treats functions as first-class citizens, i.e., the language supports passing func- tions as arguments to other functions. Following the functional paradigm, Clojure functions are idempotent (often called pure) and return when called twice the same output for a given input. The object/attribute set of a context ctx accessed through (attributes ctx) and (objects ctx) respectively. For attributes we call by (attribute-derivation ctx #{"mammal"}) the derivation operator and by (context-attribute-clojure ctx #{"mammal"}) the closure operator. The operators for objects are named accordingly. The set of all concepts can be com- puted with (concepts ctx) and we obtain the canonical base (stem base) with (canonical-base ctx). In terms of data exchange conexp-clj supports a variety of file formats. Among those are Burmeister, FCAalgs, Colibri, Conexp, CSV, and Galicia. In addition to binary contexts there is support for many-valued as well. 3 Unique Feature Coverage Besides the choice of a functional, and therefore very expressive, programming language, conexp-clj is equipped with abundance of features. This enables the Conexp-Clj – A Research Tool for FCA 3 Southern Woman: Stability vs Probability Southern Woman Randomized: Stability vs Probability 1.0 1.0 Concept Probability Concept Probability 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.2 0.4 0.6 0.8 1.0 Concept stability Concept stability Figure 1. Stability vs probability for concepts from the Southern Woman data set. researcher to easily implement new ideas and to experiment on them. First of all, conexp-clj tries to provide a diverse selection of algorithms for standard tasks like computing the set of concepts. For example, computing the set of concepts is possible through next closure [5], krajca [10], in-close [1], and others. Furthermore, em- ploying external binaries such as CbO or PCbO and interfacing with them is possible. Interestingness of Concepts. There is a plenitude of measures to express how interesting, relevant, useful or similar concepts or sets of concepts are in a formal context. A good overview is presented in [12], from which the majority is imple- mented in conexp-clj. For example, there is stability [11] of formal concepts. The stability of a concept (A,B) indicates how likely B can be generated using uniformly drawn object subsets from A. Another is robustness [13], which is a measures for how likely a concept remains closed after removing an object from the concept’s extent with probability 1−α. A different approach is taken by concept probability [9] that is the likelihood for an attribute set (object set) being closed. All those notions can be computed in conexp-clj for any formal context ctx in the following way: • (concept-stability ctx cpt) 0 computes Stab(A,B) = |{C⊆A|C 2|A| =B}| • (concept-robustness cpt (concepts ctx) α) computes r(c = (A,B),α) = Σ (−1)|Bd |−|Bc | (1−α)|Ac |−|Ad | d≤c • (concept-probability ctx cpt) n computes p(B = B 00 ) = Σ p(|B 0 | = k,B = B 00 ) k=0 We may remark that conexp-clj also includes weighted similarity for for- mal concepts. This function employs different measures for set similarity like the Jaccard Index, the Sorensen coefficient or symmetric set difference. Expressing New Ideas. In the following we demonstrate how novel ideas can easily be expressed in conexp-clj. For this we compute two examples using the well known Southern Woman data set [14]. The example computations were chosen 4 T. Hanika and J. Hirth Southern Woman Southern Woman Random 1.0 1.0 Relative Intent Size Relative Intent Size 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Relative Extent Size Relative Extent Size Figure 2. Stability vs probability for concepts from the Southern Woman data set. for our demonstration purpose only. We do not claim that there is deeper insight to grasp from those. Imagine one wants to investigate possible correlations between the probability and stability of a set of formal concepts in some formal context. This idea can be expressed in simple terms using conexp-clj as shown below for context ctx: (map (fn [x] [(concept-stability ctx x) (concept-probability ctx x)]) (concepts ctx))) This function call computes at first the set of all concepts for the given context. Secondly, it maps the functions concept stability as well as the probability to the beforehand computed formal concepts. Lastly, it returns a (lazy) list of two-element- vectors which then can be plotted using a suitable tool. We depicted the results in Figure 1, which does also include a plot for a random formal context which exhibits the same statistical properties as the Southern Woman data set, i.e., number of objects, number of attributes and density. The Clojure code shown above can be in- corporated in a new function by (defn somename [ctx] ...). Another idea could be the investigation of the relation between the sizes of a concept’s intent and extent for a given formal context ctx, as shown in the next code example, also see Figure 2: (map (fn [x] [(/ (count (first x)) (count (objects ctx))) (/ (count (second x)) (count (attributes ctx)))]) (concepts ctx))) Probably Approximately Correct Methods. When applying classical no- tions from FCA to large data sets one might encounter computationally infeasible problems. To date there are two approaches to cope with this implemented in conexp-clj. There is the probably approximately correct (PAC) canonical base, as investigated in [3]. Furthermore, there is a PAC version of the classical exploration algorithm present in the codebase of conexp-clj, which was introduced in [4]. Social Network Analysis and Graphs. Employing FCA for social network analysis has found an increasing attention in the last decade. Hence, various notions Conexp-Clj – A Research Tool for FCA 5 like average shortest path, average local clustering coefficient and degree distribution were added recently to conexp-clj. More functionality is developed continuously. 4 Outlook and Future Work We presented the research tool conexp-cljand its comprehensiveness. Yet, there are many new features to come. For example, enhanced lattice diagram drawing through novel developed algorithms, interfacing with Wikidata through the SPARQL end- point, motivated by [8], and dimension reduction methods like attribute selection [7]. References [1] S. Andrews. “In-Close, a Fast Algorithm for Computing Formal Concepts.” In: Proc. ICCS 2009. Ed. by Sebastian Rudolph, Frithjof Dau, and Sergei O. Kuznetsov. Vol. 483. CEUR Workshop Proceedings. CEUR-WS.org, 2009. [2] D. H. Bailey and J. M. Borwein. “Experimental mathematics: Examples, methods and implications.” In: Notices of the AMS 52.5 (2005), pp. 502–514. [3] D. Borchmann, T. Hanika, and S. Obiedkov. “On the Usability of Probably Approximately Correct Implication Bases.” In: ICFCA. Ed. by K. Bertet et al. Vol. 10308. Lecture Notes in Computer Science. Springer, 2017, pp. 72–88. [4] D. Borchmann, T. Hanika, and S. Obiedkov. “Probably approximately cor- rect learning of Horn envelopes from queries.” In: Accepted for publication: Journal Discrete Applied Mathematics abs/1807.06149 (2018). Accepted for publication: Journal Discrete Applied Mathematics. [5] B. Ganter. “Two Basic Algorithms in Concept Analysis.” In: Formal Concept Analysis. Ed. by Léonard Kwuida and Barış Sertkaya. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 312–340. [6] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Berlin, 1999, pp. x+284. [7] T. Hanika, M. Koyda, and G. Stumme. “Relevant Attributes in Formal Contexts.” In: Accepted for ICCS’19 abs/1812.08868 (2018). [8] T. Hanika, M. Marx, and G. Stumme. Discovering Implicational Knowledge in Wikidata. cite arxiv:1902.00916. 2019. [9] M. Klimushkin, S. Obiedkov, and C. Roth. “Approaches to the Selection of Relevant Concepts in the Case of Noisy Data.” In: Formal Concept Analysis. Ed. by Léonard Kwuida and Barış Sertkaya. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 255–266. [10] P. Krajca, J. Outrata, and V. Vychodil. “Parallel Recursive Algorithm for FCA.” In: Proc. CLA 2008. Ed. by Radim Belohlavek and Sergei O. Kuznetsov. Vol. 433. CEUR Workshop Proceedings. CEUR-WS.org, 2008, pp. 71–82. [11] S. O. Kuznetsov. “On stability of a formal concept.” In: Annals of Mathematics and Artificial Intelligence 49.1 (Apr. 2007), pp. 101–115. [12] S.O. Kuznetsov and T. Makhalova. “On interestingness measures of formal concepts.” In: Information Sciences 442-443 (2018). Ed. by W. Pedrycz, p. 202. 6 T. Hanika and J. Hirth [13] N. Tatti, F. Moerchen, and T. Calders. “Finding Robust Itemsets Under Subsampling.” In: ACM Trans. Database Syst. 39.3 (Oct. 2014), 20:1–20:27. [14] S. Wasserman and K. Faust. Social Network Analysis. Methods and Applica- tions. Structural Analysis in the Social Sciences. New York, USA: Cambridge University Press, 1994.