SubSect — An Interactive Itemset Visualization

               Joey De Pauw¹, Sandy Moens¹, and Bart Goethals¹,²

               ¹ University of Antwerp, Belgium
               ² Monash University, Australia


1     Introduction
Itemsets and association rules are among the simplest and most intuitive patterns used to explore transaction datasets. However, they carry little meaning without context and domain knowledge. Typically, a user has to sift through hundreds of these patterns before finding an interesting one, missing the forest for the trees. Furthermore, interestingness is subjective and can only be approximated by objective metrics or features [3].
     Previous work has tackled this problem by, for instance, sorting and filtering patterns based on different metrics [3], or by reducing the reported patterns to a minimal, most descriptive subset [1]. Another approach is to present patterns in informative visualizations and rely on end users to find what is interesting in their own domain [2].
     We propose a novel itemset and association rule visualization that makes it possible to inspect, assess, and compare patterns at a glance. This can save time and effort and also reduce errors caused by misconceptions. Our visualization is based on the double decker plot of Hofmann et al. [2] and exploits the monotonicity property of support: the support of an itemset is always lower than or equal to the support of each of its subsets.
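To make the monotonicity property concrete, the following minimal Python sketch computes relative support on a small, hypothetical transaction dataset (`TRANSACTIONS` is invented for illustration) and checks that no subset of an itemset has lower support than the itemset itself. The helper names `support` and `is_monotone` are our own and are not part of SubSect.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of items.
TRANSACTIONS = [
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B", "D"},
]


def support(itemset, transactions):
    """Relative support: fraction of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def is_monotone(itemset, transactions):
    """True if every non-empty proper subset has support >= that of `itemset`."""
    full = support(itemset, transactions)
    items = sorted(itemset)
    return all(
        support(subset, transactions) >= full
        for size in range(1, len(items))
        for subset in combinations(items, size)
    )


print(support({"A", "B", "C", "D"}, TRANSACTIONS))      # 0.2
print(support({"A", "B", "C"}, TRANSACTIONS))           # 0.4
print(is_monotone({"A", "B", "C", "D"}, TRANSACTIONS))  # True
```

On any real dataset this check always succeeds, because every transaction that contains an itemset also contains each of its subsets; the sketch only demonstrates the property the visualization relies on.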

2     Visualization
Consider the example in Figure 1a. Every item of the itemset is represented in the center, and the arcs around these center items show three levels of itemsets that can be formed from them. For example, the blue full circle near the center contains all four items A, B, C and D and has a frequency of 0.2, as indicated by its label and its radius. The other segments represent subsets, such as the cyan arc spanning items A, B and C. Since this itemset has a higher frequency (0.25), its arc also has a proportionally larger radius.
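The paper does not describe SubSect's drawing code, so the following Python sketch is only one possible geometry under stated assumptions: each center item gets an equal angular sector, an arc spans the sectors of the items it contains, and "proportionally larger radius" is read as radius = support × `max_radius`. The names `Arc`, `layout_arcs`, `_circular_run`, and `max_radius` are hypothetical; the supports 0.20 and 0.25 are the values given in the text for Figure 1a.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    items: frozenset   # itemset drawn by this arc
    start_deg: float   # start angle in degrees
    end_deg: float     # end angle in degrees (may exceed 360 when the arc wraps around)
    radius: float      # radius, assumed proportional to the itemset's support


def _circular_run(positions, k):
    """Return (first sector, sector count) for indices that are adjacent on a circle of k sectors."""
    pos = set(positions)
    if len(pos) == k:
        return 0, k  # the full itemset covers the whole circle
    # A contiguous circular run has exactly one index whose predecessor is missing.
    starts = [p for p in pos if (p - 1) % k not in pos]
    if len(starts) != 1:
        raise ValueError("items of this subset are not adjacent on the circle")
    return starts[0], len(pos)


def layout_arcs(center_items, supports, max_radius=100.0):
    """Hypothetical layout: one equal sector per center item; radius = support * max_radius."""
    order = list(center_items)
    sector = 360.0 / len(order)
    arcs = []
    for itemset, supp in supports.items():
        first, count = _circular_run([order.index(i) for i in itemset], len(order))
        start = first * sector
        arcs.append(Arc(frozenset(itemset), start, start + count * sector, supp * max_radius))
    return arcs


for arc in layout_arcs(["A", "B", "C", "D"], {frozenset("ABCD"): 0.20, frozenset("ABC"): 0.25}):
    print(sorted(arc.items), arc.start_deg, arc.end_deg, round(arc.radius, 2))
```

Removing one item from a circle of k items always leaves k-1 adjacent items, so every (k-1)-subset and every single item maps to one contiguous arc. Because of monotonicity, the full itemset always receives the smallest radius, so the subset arcs always surround it.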
    In every image, only the most informative subsets are rendered: for a k-itemset, these are the (k-1)-itemsets and the 1-itemsets. Together, these subsets provide complementary information: the 1-itemsets give a global context, while the (k-1)-itemsets place the k-itemset in a local context. Each itemset with more than one item is given a unique color, making it easier to link multiple instances of the visualization that have items in common.
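The rendering rule above can be sketched in Python as follows. `subsets_to_render` enumerates the (k-1)-subsets and 1-subsets of a k-itemset; for {A, B, C, D} this gives four 3-itemsets and four single items, which together with the full itemset are the nine labelled arcs of Figure 1a. `color_for` is a hypothetical stand-in for the color assignment: it derives a stable hue from a SHA-256 digest of the sorted item names, so an itemset keeps its color across views. Unlike the unique colors described in the text, a hash-based hue can collide, and the neutral gray for 1-itemsets (`NEUTRAL_GRAY`) is our own assumption.

```python
import colorsys
import hashlib
from itertools import combinations

NEUTRAL_GRAY = "#cccccc"  # hypothetical color for 1-itemsets


def subsets_to_render(itemset):
    """Subsets drawn for a k-itemset: all (k-1)-subsets plus all 1-subsets."""
    items = sorted(itemset)
    k = len(items)
    rendered = {frozenset([item]) for item in items}
    if k > 2:  # for k = 2 the (k-1)-subsets already are the 1-subsets
        rendered |= {frozenset(c) for c in combinations(items, k - 1)}
    return rendered


def color_for(itemset):
    """Deterministic color per itemset, so the same itemset looks the same in every view.

    Python's built-in hash() of strings is randomized per process, so a stable
    SHA-256 digest of the sorted item names is used instead.
    """
    if len(itemset) < 2:
        return NEUTRAL_GRAY
    key = ",".join(sorted(itemset)).encode("utf-8")
    hue = int(hashlib.sha256(key).hexdigest()[:8], 16) / 0xFFFFFFFF
    r, g, b = colorsys.hsv_to_rgb(hue, 0.6, 0.9)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


for s in sorted(subsets_to_render({"A", "B", "C", "D"}), key=lambda s: (-len(s), sorted(s))):
    print(sorted(s), color_for(s))
```

Hashing avoids keeping a global color table, at the cost of possible collisions; a fixed palette assigned on first use would guarantee distinct colors within a session.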
    Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


               [Figure 1: (a) Itemset {A, B, C, D}; (b) Itemset {A, B, C}]

Fig. 1. Our visualization for the arbitrary itemset {A, B, C, D} (a) and for one of its
subsets (b). Each arc represents a set of items and shows its respective support.


    Furthermore, the visualization supports two interactions: dive deeper and the α-conditional view³. Hover highlighting signals where these interactions are available, and gradual transition animations ease the change between "states" of the visualization, making the effect of each interaction clearer. For example, clicking the cyan arc dives into the corresponding itemset {A, B, C}. An animation removes item D from the center and expands the cyan arc into a full circle, revealing three new subsets: {A, B}, {A, C} and {B, C}. The result is shown in Figure 1b. This action can be repeated from the new view to dive deeper, or the user can return to the top level with the reset button that has now appeared.
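The dive-deeper and reset interactions can be modelled as a small navigation state. The Python sketch below is hypothetical (`Explorer`, `dive`, `can_reset`, and `reset` are our names, not SubSect's): diving replaces the current itemset by one of its (k-1)-subsets and returns the subsets that become newly visible. Diving from {A, B, C, D} into {A, B, C} yields exactly the three new subsets named in the text. We assume single items cannot be dived into, since clicking a single item adds it to the α set instead.

```python
from itertools import combinations


def subsets_to_render(itemset):
    """(k-1)-subsets plus 1-subsets of a k-itemset, following the rendering rule of Section 2."""
    items = sorted(itemset)
    rendered = {frozenset([item]) for item in items}
    if len(items) > 2:
        rendered |= {frozenset(c) for c in combinations(items, len(items) - 1)}
    return rendered


class Explorer:
    """Hypothetical navigation state for the dive-deeper and reset interactions."""

    def __init__(self, root):
        self.root = frozenset(root)
        self.current = self.root
        self.history = []  # itemsets visited before the current one

    def dive(self, subset):
        """Focus on a (k-1)-subset of the current itemset; return the newly visible subsets."""
        subset = frozenset(subset)
        if len(subset) < 2 or not (subset < self.current and len(subset) == len(self.current) - 1):
            raise ValueError("can only dive into a (k-1)-subset with at least two items")
        before = subsets_to_render(self.current)
        self.history.append(self.current)
        self.current = subset
        return subsets_to_render(subset) - before

    @property
    def can_reset(self):
        """The reset button is only available after at least one dive."""
        return bool(self.history)

    def reset(self):
        """Return to the top-level itemset."""
        self.current = self.root
        self.history.clear()


explorer = Explorer({"A", "B", "C", "D"})
new = explorer.dive({"A", "B", "C"})
print(sorted("".join(sorted(s)) for s in new))  # ['AB', 'AC', 'BC']
print(explorer.can_reset)  # True
explorer.reset()
print(sorted(explorer.current))  # ['A', 'B', 'C', 'D']
```

Keeping a history list rather than a single flag would also make a step-by-step "back" action straightforward, although the paper only describes a reset to the top level.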
    Similarly to selecting an itemset to dive deeper, the user can click a single item (in the center or on the outer edge) to add it to the α set, or "scope". In this α-conditional view, the scope is shown as a smaller visualization on the left. On the right-hand side, the remaining items and itemsets are shown, but now with their frequencies relative to the scope.
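The paper does not spell out how frequencies "relative to the scope" are computed. One natural reading, which we assume here, is the conditional frequency supp(X ∪ α) / supp(α): the fraction of transactions containing the scope α that also contain X, which equals the confidence of the association rule α ⇒ X. The Python sketch below uses a hypothetical toy dataset and hypothetical helper names (`support_count`, `conditional_frequency`, `conditional_view`), and uses `Fraction` to keep the results exact.

```python
from fractions import Fraction

# Hypothetical toy dataset: each transaction is a set of items.
TRANSACTIONS = [
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B", "D"},
]


def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions)


def conditional_frequency(itemset, scope, transactions):
    """Assumed alpha-conditional frequency: supp(itemset | scope) / supp(scope)."""
    scope_count = support_count(scope, transactions)
    if scope_count == 0:
        raise ValueError("the scope does not occur in any transaction")
    joint = support_count(set(itemset) | set(scope), transactions)
    return Fraction(joint, scope_count)


def conditional_view(items, scope, transactions):
    """Relative frequency of every remaining (non-scope) item, given the scope."""
    return {item: conditional_frequency({item}, scope, transactions)
            for item in sorted(set(items) - set(scope))}


view = conditional_view("ABCD", {"A"}, TRANSACTIONS)
print({item: str(freq) for item, freq in view.items()})
# {'B': '3/4', 'C': '3/4', 'D': '1/2'}
print(conditional_frequency({"B", "C"}, {"A"}, TRANSACTIONS))  # 1/2
```

Under this reading, monotonicity still holds inside the conditional view, since restricting to the transactions that contain α does not change the subset relation between itemsets.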


References
1. Calders, T., Goethals, B.: Non-derivable itemset mining. Data Mining and Knowledge Discovery 14(1), 171–206 (2007)
2. Hofmann, H., Siebes, A.P., Wilhelm, A.F.: Visualizing association rules with interactive mosaic plots. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 227–235. ACM (2000)
3. Tan, P.N., Kumar, V., Srivastava, J.: Selecting the right interestingness measure for association patterns. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 32–41. ACM (2002)
³ A live version with examples can be found at https://joeydp.github.io/SubSect/.