Steps Towards Interactive Formal Concept Analysis with LatViz

Mehwish Alam¹, Thi Nhu Nguyen Le², and Amedeo Napoli²

¹ Laboratoire d’Informatique de Paris-Nord, Université Paris 13, Paris, France
² LORIA (CNRS – Inria Nancy Grand Est – Université de Lorraine), BP 239, Vandoeuvre-lès-Nancy, F-54506, France
alam@lipn.univ-paris13.fr, {thi-nhu-nguyen.le,amedeo.napoli}@loria.fr

Abstract. With the growth of the Web of Data (WOD), many new challenges regarding exploration, interaction, analysis and discovery have surfaced. One of the basic building blocks of data analysis is classification. Many studies have been conducted concerning Formal Concept Analysis (FCA) and its variants over WOD. One fundamental question remains: once concept lattices are obtained on top of WOD, how can the user interactively explore and analyze the data through them? To achieve this goal, we introduce a new tool called LatViz, which implements several algorithms for constructing concept lattices and allows further navigation over the lattice structure. LatViz improves on existing tools and introduces various new functionalities such as interaction with the expert, visualization of Pattern Structures, AOC-posets, concept annotations, filtering of the concept lattice based on several criteria and, finally, an intuitive visualization of implications. This way the user can effectively perform an interactive exploration over a concept lattice, which is a basis for a strong user interaction with WOD for data analysis.

Keywords: Lattice Visualization, Interactive Exploration, Web of Data, Formal Concept Analysis.

1 Introduction

In the last decade, there has been a huge shift from the Web of Documents to the Web of Data (WOD). The Web of Documents represents data in the form of HTML pages which are linked with other HTML pages through hyperlinks.
This Web of Documents has evolved into the WOD, where all information is represented in the form of entities and relations, allowing the semantics to be embedded in the representation of the data. These data take the form of a (node-arc) labeled graph and belong to several domains such as newspapers, publications, biomedical data, etc. The growth in the publication of data sources in the WOD has made it an important source of data, which raises many challenges pertaining to the effective utilization of these data. The WOD mainly represents data using the Resource Description Framework (RDF)¹. There are several ways to access these data, such as data dumps and SPARQL queries, and the data can be used for several applications, one of which is visualization and interactive exploration for data analysis purposes.

¹ http://www.w3.org/RDF/

Several visualization tools have been developed for this purpose. One of them is LODLive² [6], where the user can choose data sets such as DBpedia and Freebase and specify an entity as a starting point for browsing the node-arc labeled graph. Another tool based on graphical display is RelFinder [15], which, given several entities, automatically finds the paths connecting them. The major drawback of LODLive is that after two hops the number of nodes grows so much that the data become hard to visualize. Moreover, these tools are good for getting an insight into what an RDF graph contains, but they are not built for the purpose of knowledge discovery.

In order to provide the user with the ability to perform data analysis and knowledge discovery over this kind of data, there is a need to perform classification, where the obtained classes are further made available to the user for exploration and subjective interpretation. In the current study we use Formal Concept Analysis as the basis for classification.
Several studies have already been conducted using FCA and its variants over RDF graphs or their generalization to knowledge graphs. Among these studies, RV-Xplorer [3] is so far the only tool that actually allows interactive exploration of clustered RDF data [2]. The purpose of this paper is to enhance the functionalities discussed in the previous studies.

In this study we introduce a new tool, LatViz, which increases the interpretability of a concept lattice by improving the user interaction with the concept lattice as compared to existing tools. Various new functionalities have been introduced, such as the visualization of Pattern Structures and AOC-posets, concept annotation, filtering of the concept lattice and the pattern concept lattice based on several criteria and, finally, an intuitive visualization of implications. This way the user can effectively perform an interactive exploration over a concept lattice, which in turn gives a basis for a strong user interaction with WOD for knowledge discovery purposes. In this paper, we detail the important interaction operations implemented in LatViz. In the rest of this paper we refer to the “user” as an “expert”, since (s)he is assumed to have basic knowledge about the lattice structure.

The paper is structured as follows: Section 2 introduces a motivating example. Section 3 introduces some of the important functionalities of LatViz, Section 4 discusses some of the related tools already developed and, finally, Section 5 details the future perspectives of the current work.

2 Motivating Example

Let us consider that an expert is searching for papers published by a particular team in conferences or journals related to his/her field of research. In order to locate the papers of interest, (s)he has to search for specific keywords or authors in the local portal.
To get an overview of which kinds of papers are available, (s)he has to run a broad query and then narrow it down to obtain papers on specific keywords or authors, or on groups of keywords or authors. The expert will end up running several queries to get what (s)he wants. Moreover, if the expert wants to know the collaborations of the team with other members of the research community outside the team, as well as the diversity and the specialization of the team members, this cannot be directly obtained by simple querying. To obtain this kind of knowledge there is a need to introduce support for data analysis. Based on this scenario, we show how an adapted visualization tool can guide the expert to such information of interest with the help of concept lattices.

² http://en.lodlive.it/

3 LatViz for Interactive Exploration of Concept Lattices

3.1 User Interface

The display of LatViz resembles ConExp³, which provides basic functionalities for building a concept lattice. LatViz implements two algorithms for building a concept lattice from a binary context: the algorithm introduced in [14] and AddIntent [20], an efficient incremental algorithm. A demo of LatViz is available on-line through http://latviz.loria.fr/latviz/.

The concept lattice for the scenario in Section 2 was created by mapping the RDF data to a formal context $K = (G, M, I)$. The subjects of the triples were considered as the set of objects $G$, and the objects of the RDF triples, i.e., keywords and authors, were considered as the set of attributes $M$. In this example, the RDF triples were created from the publications of the Knowledge Discovery (KDD) team in LORIA. The context contains 343 objects and 1516 attributes. Depending on the context size, the resulting concept lattices are often huge. LatViz provides several interactive operations that allow the expert to reduce the exploration space.
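The mapping from RDF triples to a formal context described above can be illustrated with a minimal Python sketch. The triples, predicate names and helper functions below are hypothetical and only serve as an illustration; they are not taken from the KDD team data set or from the LatViz source code.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; names are illustrative only.
triples = [
    ("paper1", "hasAuthor", "Amedeo Napoli"),
    ("paper1", "hasKeyword", "FCA"),
    ("paper2", "hasAuthor", "Amedeo Napoli"),
    ("paper2", "hasKeyword", "pattern structures"),
    ("paper3", "hasAuthor", "Mehdi Kaytoue"),
    ("paper3", "hasKeyword", "pattern structures"),
]


def build_context(triples):
    """Map triples to a context: subjects become G, triple objects become M."""
    incidence = defaultdict(set)
    for subject, _predicate, obj in triples:
        incidence[subject].add(obj)
    objects = set(incidence)
    attributes = set().union(*incidence.values())
    return objects, attributes, dict(incidence)


def common_attributes(objs, attributes, incidence):
    """Derivation A -> A': attributes shared by every object in objs."""
    result = set(attributes)
    for g in objs:
        result &= incidence[g]
    return result


def common_objects(attrs, incidence):
    """Derivation B -> B': objects that have every attribute in attrs."""
    return {g for g, row in incidence.items() if set(attrs) <= row}


G, M, I = build_context(triples)
extent = common_objects({"Amedeo Napoli"}, I)   # papers of this author
intent = common_attributes(extent, M, I)        # what those papers share
print(sorted(extent), sorted(intent))
```

The pair (extent, intent) computed at the end is the attribute concept of “Amedeo Napoli” in this toy context.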
To date, LatViz is the most interactive tool of its kind, with many unique functionalities that support data analysis, such as handling numeric data with the help of interval pattern structures, AOC-posets, concept lattice filtering and implications. Other functionalities, such as annotating the lattice and displaying a concept lattice level-wise, are discussed in many contexts but are not yet directly implemented in the commonly used tools. In the following we detail each of these functionalities for data analysis.

3.2 AOC-Posets

An AOC-poset is the partially ordered set of the attribute concepts and object concepts of a context, first introduced in [18, 19]. Object and attribute concepts are referred to as introducers in [5]. Once an attribute is introduced in a concept it is inherited from top to bottom while, dually, an introduced object is “inherited” from bottom to top. In this study, we implement the Hermes algorithm introduced in [5] for building the AOC-poset from a binary context. Figure 1 shows the highlighted AOC-poset of the concept lattice built for the running scenario.

³ http://conexp.sourceforge.net/

Fig. 1: AOC-Posets.

3.3 Displaying Concept Lattice Level-wise

AOC-posets reduce the exploration space, but a huge number of concepts still remains to be observed. LatViz allows the concept lattice to be created level-wise by interaction. When the expert clicks on the top concept, LatViz computes and displays the first level. The expert can then select a concept to continue the exploration, and LatViz computes the next level for that concept. In Figure 2, the top image shows the first level of the concept lattice built by selecting the top concept. The expert can view the contents of each concept on mouseover. In the running example, the expert locates the concept with all the papers of Amedeo Napoli (i.e., K#2), which shows that the total number of documents written by Amedeo Napoli is 152.
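The level-wise expansion described in this section can be sketched as computing, on demand, only the concepts directly below the selected one. The Python code below is a minimal sketch under stated assumptions: the toy context and all function names are hypothetical, and the closure-based computation of lower neighbours is one standard way to obtain them, not necessarily the procedure used in LatViz.

```python
# Hypothetical toy context: paper -> set of attributes (authors and keywords).
context = {
    "p1": {"Napoli", "FCA"},
    "p2": {"Napoli", "FCA", "RDF"},
    "p3": {"Napoli", "patterns"},
    "p4": {"Kaytoue", "patterns"},
}
ATTRIBUTES = frozenset().union(*context.values())


def extent_of(attrs):
    """Objects having all the given attributes."""
    return frozenset(g for g, row in context.items() if attrs <= row)


def intent_of(objs):
    """Attributes shared by all the given objects."""
    result = set(ATTRIBUTES)
    for g in objs:
        result &= context[g]
    return frozenset(result)


def closure(attrs):
    """Smallest concept intent containing attrs."""
    return intent_of(extent_of(attrs))


def lower_neighbours(intent):
    """Intents of the concepts directly below the concept with this intent."""
    candidates = {closure(intent | {m}) for m in ATTRIBUTES - intent}
    # Keep only the minimal candidates: the others lie deeper in the lattice.
    return {c for c in candidates if not any(d < c for d in candidates)}


top = closure(frozenset())
first_level = lower_neighbours(top)                       # shown after clicking the top concept
second_level = lower_neighbours(frozenset({"Napoli"}))    # shown after selecting one concept
```

Only the neighbours of the concepts the expert actually selects are computed, which is what keeps the displayed part of the lattice small.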
When the expert selects this concept, the next level of the lattice originating from it is computed (shown in the bottom image in Figure 2).

Fig. 2: Top image shows the first level of the concept lattice, the bottom image shows the second level built by interaction.

3.4 Display Sub/Super-Concepts of a Concept

In huge concept lattices it is sometimes hard to keep track of the ordering relations between the concepts. LatViz allows the expert to highlight only the sub/super-concepts of a concept. For example, if the expert wants to display all the publications along with the collaborations of the author Amedeo Napoli, (s)he can highlight the sub-lattice associated with the attribute concept of “Amedeo Napoli”. Figure 3 shows the highlighted sub-lattice in brown.

Fig. 3: The sub-lattice highlighted for the author “Amedeo Napoli”.

The expert can also highlight the super-concepts connected to a concept. If the expert is looking for all the papers having some keywords in common with the paper Knowledge Organization and Information Retrieval Using Galois Lattices, i.e., papers having one or more of the keywords in the intent of the concept, then (s)he can highlight the sub-lattice of super-concepts associated with it (see Figure 4).

Fig. 4: Highlighting super-concepts of a concept.

3.5 Display/Hide the Sub-lattice

This functionality was partially implemented in RV-Xplorer [3] to reduce the interaction space of the expert. Here the expert can show only the part of the concept lattice in which (s)he is interested. The expert can locate an interesting concept, i.e., one containing some intent or extent of interest, by navigation. If an intent is interesting and the expert marks the concept as interesting, then only the sub-concepts of this concept are shown to the expert, as intents are inherited from top to bottom. Dually, if an extent is interesting for the expert, then all the super-concepts are shown, as extents are inherited from bottom to top.
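The selection of sub-concepts and super-concepts described above amounts to comparing extents. The following Python sketch assumes a hypothetical list of concepts given as (extent, intent) pairs; this representation and the function names are illustrative and do not reflect the internal data model of LatViz.

```python
# Hypothetical concepts given as (extent, intent) pairs; not LatViz's data model.
concepts = [
    (frozenset({"p1", "p2", "p3"}), frozenset()),
    (frozenset({"p1", "p2"}), frozenset({"Napoli"})),
    (frozenset({"p2", "p3"}), frozenset({"FCA"})),
    (frozenset({"p2"}), frozenset({"Napoli", "FCA"})),
    (frozenset(), frozenset({"Napoli", "FCA", "RDF"})),
]


def sub_concepts(concept, concepts):
    """Concepts below `concept` (itself included): extent contained in its extent."""
    extent, _ = concept
    return [c for c in concepts if c[0] <= extent]


def super_concepts(concept, concepts):
    """Concepts above `concept` (itself included): extent containing its extent."""
    extent, _ = concept
    return [c for c in concepts if extent <= c[0]]


napoli = concepts[1]                          # attribute concept of "Napoli"
below = sub_concepts(napoli, concepts)        # intent of interest: show what inherits it
above = super_concepts(concepts[3], concepts) # extent of interest: show what contains it
```

Because intents grow downwards and extents grow upwards, every concept in `below` keeps "Napoli" in its intent, and every concept in `above` keeps "p2" in its extent.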
Previously, the expert highlighted the sub-lattice of the concept containing all the papers of Amedeo Napoli. If the expert is now interested only in the papers of Amedeo Napoli on Knowledge Representation, then (s)he can navigate downwards and see only this part of the concept lattice by marking it as interesting (see Figure 5). Similarly, we previously highlighted all the super-concepts of the concept having the paper entitled Knowledge Organization and Information Retrieval Using Galois Lattices in the extent; Figure 6 shows only the associated sub-lattice, which gives a clearer view.

Fig. 5: Showing only sub-lattice of the interesting concept.

Fig. 6: Showing only super-concepts of the interesting concept.

Fig. 7: Interval pattern concept lattice for publications.

3.6 Interval Pattern Structures

Interval Pattern Structures were first introduced in [16] for dealing with numerical data. Consider two descriptions $\delta(g_1) = \langle [l_i^1, r_i^1] \rangle$ and $\delta(g_2) = \langle [l_i^2, r_i^2] \rangle$, with $i \in [1..n]$, where $n$ is the number of intervals used for the description of entities. The similarity operation $\sqcap$ is defined as the component-wise convex hull of the two descriptions: $\delta(g_1) \sqcap \delta(g_2) = \langle [\min(l_i^1, l_i^2), \max(r_i^1, r_i^2)] \rangle$. The associated subsumption relation $\sqsubseteq$ between descriptions follows from it: $c \sqsubseteq d$ if and only if $c \sqcap d = c$. Based on this similarity operation, an interval pattern concept lattice can be built.

In the running scenario, three numerical attributes of the papers were used, i.e., the year of publication, the rank of the conference and the number of pages. The ranks of the conferences were taken from the COmputing Research and Education (CORE) rankings⁴. The ranks were A*, A, B, C and other, which were coded as 1, 2, 3, 4 and 5 respectively. The final concept lattice generated for the last five years of publications of the Knowledge Discovery Team is shown in Figure 7.
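The similarity operation on interval descriptions can be sketched in a few lines of Python. The three papers below are hypothetical examples described by (year, rank, pages), with ranks coded 1 to 5 as in the text; the function names are illustrative and are not part of LatViz.

```python
def meet(d1, d2):
    """Similarity of two interval descriptions: component-wise convex hull."""
    return tuple((min(l1, l2), max(r1, r2)) for (l1, r1), (l2, r2) in zip(d1, d2))


def subsumed_by(c, d):
    """c is subsumed by d iff meet(c, d) == c, i.e. each interval of c contains that of d."""
    return meet(c, d) == c


# Hypothetical papers described by (year, rank, pages); a single value v is the interval (v, v).
paper_a = ((2014, 2014), (2, 2), (12, 12))
paper_b = ((2015, 2015), (2, 2), (42, 42))
paper_c = ((2012, 2012), (4, 4), (2, 2))

common_ab = meet(paper_a, paper_b)      # description shared by papers a and b
common_abc = meet(common_ab, paper_c)   # adding paper c widens every interval
```

Note that the more objects a description covers, the wider its intervals become, so the more general description is the one that is subsumed: here `common_abc` is subsumed by `common_ab`, not the other way round.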
3.7 Lattice Filtering Criteria

LatViz provides two categories of filtering: one for the concept lattice created from binary data and one for the pattern concept lattice built with the help of interval pattern structures.

Filtering Concept Lattice. After a concept lattice is built by applying FCA, the expert can set several filtering criteria such as stability, lift, extent size, intent size and, finally, specific object or attribute names. Let us consider that in the running example the expert is looking for the papers published by Amedeo Napoli on the topics of pattern structures and FCA. A filter on the number of attributes in the intent is set to 3. The filtered concept lattice obtained over the complete lattice is shown in Figure 8. It further shows the authors with whom Amedeo Napoli has worked, i.e., Sergei O. Kuznetsov and Mehdi Kaytoue. This part of the concept lattice shows the community of authors working with Amedeo Napoli on the topic of pattern structures.

Fig. 8: Filtered concept lattice obtained from binary context.

⁴ http://portal.core.edu.au/conf-ranks/

Filtering Pattern Concept Lattice. Interval pattern concept lattices can also be filtered by specifying the number of attributes to be considered and the upper and lower limits for the interval of each attribute in the intent, along with stability, lift and extent size. Let us consider the pattern concept lattice in Figure 7, which is hard to interpret. To make it more readable according to what the expert wants, (s)he is allowed to specify filters. For example, if the expert is looking for papers published between 2012 and 2015 in a conference of rank 1-4, with no fewer than 2 and no more than 42 pages, then the respective filters can be set for the values of all three attributes. The filtered pattern concept lattice will then contain only the part of the lattice needed by the expert.
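The interval filter from this example (years 2012-2015, ranks 1-4, pages 2-42) can be sketched in Python as follows. The pattern concepts, the function names and the choice to keep a concept only when all of its intervals lie inside the given limits are assumptions made for illustration; the exact filtering semantics of LatViz are not specified in the text.

```python
# Hypothetical pattern concepts: (extent, intent), intent = intervals for (year, rank, pages).
pattern_concepts = [
    ({"p1", "p2"}, ((2014, 2015), (2, 2), (2, 42))),
    ({"p3"}, ((2010, 2011), (1, 1), (8, 8))),
    ({"p1", "p2", "p4"}, ((2012, 2015), (1, 4), (2, 42))),
    ({"p5", "p6"}, ((2013, 2016), (3, 5), (6, 60))),
]


def within(intent, limits):
    """True if every interval of the intent lies inside the user-given limits."""
    return all(lo <= l and r <= hi for (l, r), (lo, hi) in zip(intent, limits))


def filter_concepts(concepts, limits, min_extent=1):
    """Keep concepts whose intent respects the limits and whose extent is large enough."""
    return [
        (extent, intent)
        for extent, intent in concepts
        if len(extent) >= min_extent and within(intent, limits)
    ]


# Limits from the running example: years 2012-2015, ranks 1-4, pages 2-42.
limits = ((2012, 2015), (1, 4), (2, 42))
kept = filter_concepts(pattern_concepts, limits)
kept_large = filter_concepts(pattern_concepts, limits, min_extent=3)
```

Other criteria mentioned in the text, such as stability or lift, could be added as further predicates in the same comprehension.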
Figure 9 shows the concept containing the group of papers published in 2014-2015 in conferences with rank 2 and having 2-42 pages.

Fig. 9: Filtered Pattern Concept Lattice.

3.8 Attribute Implications

One of the many visualization techniques proposed for implications is the table-based view. The columns of the table represent the rule ID, the LHS and RHS of the rule, and the support and confidence measures. These views were used because of the simplicity of storage. However, as the number of rules can be very large, it is difficult for the expert to spot interesting rules at a glance. Another way of visualizing association rules is the matrix view, where rows represent the LHS and columns represent the RHS of the rules. Support and confidence are displayed with different colors at the intersection of the LHS and RHS. In the case of a formal context, the number of objects/attributes can be very large, leading to problems in displaying the matrix.

Taking the above drawbacks into account, we settle on visualizing implications with the help of scatter plots, where the x-axis shows increasing support and the y-axis shows increasing lift (as we are considering implications, the confidence of a rule is always 100%). This kind of display helps the expert to single out the rules (s)he wants to visualize based on the values of support and lift. Figure 10 shows the implications of the running example; the x-axis gives the support as a percentage and the y-axis gives the lift. The number on top of a circle shows the number of rules located at the same point in the plot. On mouseover, the expert can view the implications.

Fig. 10: Attribute implications for the running example.

4 Related Tools

In [2], the authors focus mainly on interactive data exploration over RDF data for interactive knowledge discovery. That work clusters RDF triples based on RDF Schema and then allows interactive exploration with the help of RV-Xplorer (Rdf View eXplorer) [3].
RV-Xplorer is a tool for visualizing views over RDF graphs, mainly for identifying interesting parts of the data and allowing data analysis. It has also been extended for clustering SPARQL query answers. To date, many other tools have been developed for reducing the effort of the expert in observing and interpreting a concept lattice. Many of these tools have been developed for more specific purposes. CREDO [8] and FooCA [17] are Web clustering engines [7] which take the answers to queries posed against search engines and create a concept lattice which is then displayed to the expert for interaction. CREDO allows only limited interaction, whereas FooCA allows the expert to edit the context and iteratively build the concept lattice. CEM [10] is an email manager which allows quick search through e-mails and usually deals with smaller concept lattices. Camelis [11] is a system based on FCA for the organization of documents, allowing several navigation operations. Another set of tools, such as Sewelis [13] and Sparklis [12], allows navigation/interaction over knowledge graphs. Many other tools such as Galicia⁵, ConExp and ToscanaJ⁶ were developed for academic purposes. LatViz takes the basic functionalities of ConExp to another level by providing visualization for many algorithms introduced over time to increase readability. Moreover, it reuses the ToscanaJ [4] source code for building a concept lattice with the algorithm in [14]. LatViz can be applied not only to WOD; it has been extended for interpreting any kind of data.

⁵ https://sourceforge.net/projects/galicia/
⁶ http://toscanaj.sourceforge.net/

5 Discussion and Future Improvements

LatViz is a tool built for allowing expert interaction for data analysis purposes. It provides many new functionalities for reducing the exploration space of the expert and enabling him/her to interpret the results.
As a future perspective, we also want to implement other variations of pattern structures, such as the pattern structures for structured sets of attributes discussed in [1] and Heterogeneous Pattern Structures [9]. We also want to extend the implementation of implications to association rules. Finally, we also want to take matrix factorization into account.

References

1. Mehwish Alam, Aleksey Buzmakov, Amedeo Napoli, and Alibek Sailanbayev. Revisiting pattern structures for structured attribute sets. In Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications, pages 241–252, 2015.
2. Mehwish Alam and Amedeo Napoli. Interactive exploration over RDF data using formal concept analysis. In 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA, pages 1–10, 2015.
3. Mehwish Alam, Matthieu Osmuk, and Amedeo Napoli. RV-Xplorer: A way to navigate lattice-based views over RDF graphs. In Proceedings of the Twelfth International Conference on Concept Lattices and Their Applications, pages 23–34, 2015.
4. Peter Becker and Joachim Hereth Correia. The ToscanaJ suite for implementing conceptual information systems. In Formal Concept Analysis, Foundations and Applications, pages 324–348, 2005.
5. Anne Berry, Alain Gutierrez, Marianne Huchard, Amedeo Napoli, and Alain Sigayret. Hermes: a simple and efficient algorithm for building the AOC-poset of a binary relation. Ann. Math. Artif. Intell., 72(1-2):45–71, 2014.
6. Diego Valerio Camarda, Silvia Mazzini, and Alessandro Antonuccio. LodLive, exploring the web of data. In I-SEMANTICS 2012 - 8th International Conference on Semantic Systems, I-SEMANTICS ’12, Graz, Austria, September 5-7, 2012, 2012.
7. Claudio Carpineto, Stanislaw Osiński, Giovanni Romano, and Dawid Weiss. A survey of web clustering engines. ACM Comput. Surv., 41(3):17:1–17:38, 2009.
8. Claudio Carpineto and Giovanni Romano. Exploiting the potential of concept lattices for information retrieval with CREDO. J. UCS, 10(8):985–1013, 2004.
9. Víctor Codocedo and Amedeo Napoli. A proposition for combining pattern structures and relational concept analysis. In Formal Concept Analysis - 12th International Conference, ICFCA 2014, Proceedings, pages 96–111, 2014.
10. Richard Cole and Gerd Stumme. CEM - A conceptual email manager. In 8th International Conference on Conceptual Structures, ICCS, Proceedings, 2000.
11. Sébastien Ferré. Camelis: a logical information system to organise and browse a collection of documents. Int. J. General Systems, 38(4):379–403, 2009.
12. Sébastien Ferré. SPARKLIS: a SPARQL endpoint explorer for expressive question answering. In Proceedings of the Posters & Demonstrations Track a track within the 13th International Semantic Web Conference, ISWC, 2014.
13. Sébastien Ferré and Alice Hermann. Reconciling faceted search and query languages for the semantic web. IJMSO, 7(1):37–54, 2012.
14. Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foundations. Springer, Berlin/Heidelberg, 1999.
15. Philipp Heim, Steffen Lohmann, and Timo Stegemann. Interactive relationship discovery via the semantic web. In The Semantic Web: Research and Applications, 7th Extended Semantic Web Conference, ESWC 2010, Heraklion, Crete, Greece, May 30 - June 3, 2010, Proceedings, Part I, pages 303–317, 2010.
16. Mehdi Kaytoue, Sergei O. Kuznetsov, and Amedeo Napoli. Revisiting numerical pattern mining with formal concept analysis. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pages 1342–1347, 2011.
17. Bjoern Koester. Conceptual knowledge retrieval with FooCA: Improving web search engine results with contexts and concept hierarchies. In Advances in Data Mining, Applications in Medicine, Web Mining, Marketing, Image and Signal Mining, 6th Industrial Conference on Data Mining, ICDM Proceedings, 2006.
18. R. Osswald and W. Petersen. A logical approach to data-driven classification. In KI, volume 2821 of Lecture Notes in Computer Science. Springer, 2003.
19. Wiebke Petersen. A set-theoretical approach for the induction of inheritance hierarchies. Electr. Notes Theor. Comput. Sci., 53:296–308, 2001.
20. Dean van der Merwe, Sergei A. Obiedkov, and Derrick G. Kourie. AddIntent: A new incremental algorithm for constructing concept lattices. In ICFCA, pages 372–385, 2004.