=Paper= {{Paper |id=Vol-2116/paper7 |storemode=property |title=A Case Study in Fitting Area-Proportional Euler Diagrams with Ellipses using eulerr |pdfUrl=https://ceur-ws.org/Vol-2116/paper7.pdf |volume=Vol-2116 |authors=Johan Larsson,Peter Gustafsson |dblpUrl=https://dblp.org/rec/conf/diagrams/LarssonG18 }} ==A Case Study in Fitting Area-Proportional Euler Diagrams with Ellipses using eulerr== https://ceur-ws.org/Vol-2116/paper7.pdf
       A Case Study in Fitting Area-Proportional
       Euler Diagrams with Ellipses using eulerr

                            Johan Larsson and Peter Gustafsson

    Department of Statistics, School of Economics and Management, Lund University,
                                      Lund, Sweden
                               johanlarsson@outlook.com
                             peter.gustafsson@stat.lu.se



        Abstract. Euler diagrams are common and user-friendly visualizations
        for set relationships. Most Euler diagrams use circles, but circles do
        not always yield accurate diagrams. A promising alternative is ellipses,
        which, in theory, enable accurate diagrams for a wider range of input.
        Elliptical diagrams, however, have not yet been implemented for more
        than three sets or three-set diagrams where there are disjoint or subset
        relationships. The aim of this paper is to present eulerr: a software package
        for elliptical Euler diagrams for, in theory, any number of sets. It fits Euler
        diagrams using numerical optimization and exact-area algorithms through
        a two-step procedure, first generating an initial layout using pairwise
        relationships and then finalizing this layout using all set relationships.


1      Background
The Euler diagram, first described by Leonard Euler in 1802 [1], is a generalization
of the popular Venn diagram. Venn and Euler diagrams both visualize set
relationships by mapping areas in the diagram to relationships in the data. They
differ, however, in that Venn diagrams require all intersections to be present—
even if they are empty—whilst Euler diagrams do not, which means that Euler
diagrams lend themselves well to be area-proportional.
    Euler diagrams may be fashioned out of any closed shape, and have been
implemented for triangles [2], rectangles [2], ellipses [3], smooth curves [4], poly-
gons [2], and circles [5, 2]. The latter are most common, and for good reason,
being that they are easiest to interpret [6]. Circles, however, sometimes cannot
be used to produce accurate area-proportional diagrams.
    With four or more sets that all intersect, for instance, exact Euler diagrams
are impossible with circles, given that we require 24 − 1 = 15 intersections but
with four circles can yield no more than 13 [7]. A solution to this problem is
offered with ellipses, which may intersect in up to four, rather than two, points,
consequently yielding the necessary 15 unique areas. Elliptical Euler diagrams
were first introduced with eulerAPE [3], which, however, only supports three
sets and prohibits empty intersections.
    Fitting elliptical or circular Euler diagrams must be done numerically even in
the two-set case [7] where the separation required by the circles has no closed-form
Y.Sato and Z.Shams (Eds.), SetVR 2018, pp. 84-91, 2018.
A Case Study in Fitting Area-Proportional...                      Larsson and Gustafsson

solution. Many algorithms accomplish this in two steps, first finding a coarse
starting layout that is finalized in a second, more thorough, algorithm. For the
initial layout, the aforementioned eulerAPE package [8], for instance, uses an
algorithm that tries to minimize the error in the three-way intersection by arrang-
ing circles representing the sets. The venneuler package [5], meanwhile, uses
multi-dimensional scaling (MDS). The javascript package venn.js [9] combines
a constrained version of the MDS algorithm from venneuler with a greedy
algorithm.
    In the final layout, we need to compute the areas of the overlaps in the dia-
gram. Frederickson [9] (venn.js) and Micallef and Rodgers [3] (eulerAPE) have
developed exact-area algorithms for circles and ellipses respectively—although the
latter, as we previously covered, restricts itself to three intersecting ellipses. The
parameters of the circles or ellipses are then optimized numerically to minimize
a loss measure, which vary depending on implementation.
    The R-package eulerr was created as part of a bachelor’s thesis [10] and
is the first package to support Euler diagrams for, in theory, any number of
ellipses, regardless of subset and disjoint intersections. In this paper, we aim to
demonstrate the package through a series of well-known examples from previous
literature on the subject as well as a simulation study for the three-set case.


2     Method

eulerr allows input in the form of disjoint subsets, unions and identities, a matrix
of binary or boolean indices, a list of sample spaces, or a two- or three-way table.
The Euler diagram is fit in two steps: first, an initial layout is formed with circles
using only the sets’ pairwise relationships. Second, this layout is fine-tuned taking
all intersections into consideration.


2.1     Initial layout

For our initial layout, we adopt a constrained version of multi-dimensional
scaling (MDS) that is used in venn.js [9], which in turn is a modification of an
algorithm from venneuler [5].
    We begin by placing P  circles representing each set uniformly at random in a
                             n
square space with area i=1 ri2 π, where ri is the radius of the ith circle. The
circles are initialized so that their areas are proportional to the size of their
respective sets. The algorithm then moves the circles so that the separation
between each pair of circles matches their respective sets’ intersection. If the two
sets are disjoint, however, the algorithm is indifferent to the relative locations of
those circles as long as they do not intersect. The equivalent applies to subset
sets: as long as the circle representing the smaller set remains within the larger
circle, their locations are free to vary. In all other cases, the loss function (1) is
the residual sums of squares of the separation of circles, d, required to obtain

                                               85
A Case Study in Fitting Area-Proportional...                               Larsson and Gustafsson

accurate pairwise overlaps and the actual distance in the layout.
                          
                          0
                                                             if disjoint
                   X 
       L(h, k) =            0                                 if subset                      (1)
                                                        2
                                      2            2
                          
                 1≤i δ then
        obtain Ψlast-ditch using a global optimizer
        if diagError(Ψfinal ) > diagError(Ψlast-ditch ) then return Ψlast-ditch
          else return Ψfinal
    else
        return Ψfinal



specified as disjoint subsets using the &-operator such that "A&B", for instance,
are the items unique to the intersection between A and B. We fit this specification
with venneuler and eulerr, in the latter case with both circles and ellipses,
using euler(): the workhorse of eulerr.

f1 <- venneuler::venneuler(wilkinson)                 # fit with venneuler
f2 <- euler(wilkinson)                                # fit with eulerr (circles)
f3 <- euler(wilkinson, shape = "ellipse")             # fit with eulerr (ellipses)

eulerr manages to fit this configuration perfectly using ellipses in addition to
producing a marginally better circular diagram. The stress values are 0.007,
0.004, and 4.687 × 10−12 for venneuler, eulerr (with circles), and eulerr (with
ellipses) respectively.
    Diagrams in eulerr are plotted using plot(), which allows considerable cus-
tomization of the resulting diagram. In the following code, we plot the circular
diagram using the default options and the elliptical one with a few modifica-
tions (Fig. 2).

plot(f2)
plot(f3,
     fills = rainbow(6, s = 0.5, v = 1),   # change fills
     edges = list(lex = 3, col = "white"), # white, broader edges
     labels = list(font = 3))              # italic labels


                                               88
A Case Study in Fitting Area-Proportional...                             Larsson and Gustafsson


                    E                               E                           E


         F
                                               F            D            F             D
                          D


                                                                C
                                                                                           C
    A                         C
                                        A                           A
                B                                       B                       B




Fig. 2. A comparison of a Euler diagram generated with venneuler (to the left) with
two generated from eulerr with circles (middle) and ellipses (right) respectively.



Micallef and Rodgers feature a diagram from Lenz et al. [23] that they remodeled
using eulerAPE [3]. We will do the same here, using eulerr, and compare the
results of the two packages. The data from the study—as disjoint subsets—is

lenz <- c("A" = 0.36, "B" = 0.03, "C" = 0,
          "A&B" = 0.41, "A&C" = 0.04, "B&C" = 0, "A&B&C" = 0.11)

Because eulerAPE cannot fit set configurations with empty intersections, the
authors used 0.00001 as a proxy for ∅. Using eulerr, however, we can fit the
diagram using the original data.

plot(euler(lenz, shape = "ellipse"), legend = TRUE) # add a legend

The fits from both packages are exact (Fig. 3). Although we instructed eulerr
to allow ellipses in the fit, the algorithm stuck to circles, which, given that the
fit is exact, is the appropriate choice since circles are easier to interpret [6].
eulerAPE, in contrast, did not. It tries to keep the three shapes intersecting,
albeit marginally, which cannot be done with circles if the layout is to be exact.




                                                                        Fig. 3.     Diagrams
                                                                        from eulerAPE and
                                        A
                                                                        eulerr based on data
                                        B
                                                                        from a diagram from
                                        C
                                                                        Lenz et al. [23]. Both
                                                                        diagrams are exact.




   Finally, we provide a benchmark of the accuracy of venneuler, eulerAPE,
and eulerr in reproducing 1,000 random three-set combinations sampled from

                                                   89
A Case Study in Fitting Area-Proportional...                                              Larsson and Gustafsson

U(10−6 , 1) (Fig. 4). We use three-set combinations with all intersections present
in order to enable all tested packages to fit the diagrams.
    In terms of both stress and diagError, the elliptical diagrams from eulerr
and eulerAPE perform the best, with the former coming out marginally ahead
with median stress and diagError at 3.123 × 10−13 and 1.638 × 10−7 respectively,
whilst the equivalent figures for eulerAPE are 7.834 × 10−12 and 8.219 × 10−7 .
    For the circular diagrams, eulerr achieves the lowest median stress at 0.022,
followed by venneuler and eulerAPE at 0.035 and 0.08 respectively. In terms
of diagError, venneuler performs best followed by eulerr and eulerAPE with
respective median diagErrors of 0.048, 0.055, and 0.067.

                                                     0.0   0.2     0.4      0.6   0.8

                                    stress                      diagError
                                                                                        Fig. 4. Tukey box
 eulerAPE (ellipses)
                                                                                        plots of diagError
  eulerAPE (circles)                                                                    and stress for Euler
                                                                                        diagrams based on
          venneuler
                                                                                        set relationships of
    eulerr (ellipses)                                                                   three sets with every
     eulerr (circles)                                                                   intersection present.

                        0.0   0.2     0.4    0.6   0.8




4      Discussion
In this paper, we have presented an R-based software package, eulerr, for
generating elliptical Euler diagrams for any number of sets. We have examined
its performance for set relationships from previous publications of software for
Euler diagrams and shown that eulerr performs adequately for our examples
and for random three-set combinations. In general, we have also shown that
elliptical Euler diagrams have the potential to outperform circular diagrams. The
reason for this is simple: elliptical Euler diagrams feature two additional degrees
of freedom for each shape in the diagram, provided by stretch and rotation.
    eulerr is the first software to feature area-proportional elliptical Euler di-
agrams for more than three sets. The only other software for elliptical Euler
diagrams, eulerAPE, is restricted to three sets. This limitation is discussed by
the authors of the package, who argue that Euler diagrams with more than three
sets often lack well-formed solutions and that their complexity make implemen-
tations difficult [8]. Whilst it is true that inputs with more than three sets do
not always reduce to adequate Euler diagrams, it is our stance that those that
do warrant a solution to find them.
    The results of this paper are limited to a few cases and it is not known
whether they generalize to other set combinations. This is a topic for future
research in the field, which should examine different software for Euler diagrams
in large-scale simulation studies.

                                                           90
A Case Study in Fitting Area-Proportional...                       Larsson and Gustafsson

References
 1. Euler, L.: Letters of Euler to a German princess, on different subjects in physics
    and philosophy. Murray and Highley (1802)
 2. Swinton, J.: Vennerable: Venn and Euler area-proportional diagrams. (2011) R
    package version 3.1.0.9000.
 3. Micallef, L., Rodgers, P.: eulerAPE: drawing area-proportional 3-Venn diagrams
    using ellipses. PLOS ONE 9(7) (July 2014) e101717
 4. Micallef, L., Rodgers, P.: eulerForce: force-directed layout for Euler diagrams.
    Journal of Visual Languages and Computing 25(6) (December 2014) 924–934
 5. Wilkinson, L.: Exact and approximate area-proportional circular Venn and Euler
    diagrams. IEEE Transactions on Visualization and Computer Graphics 18(2)
    (February 2012) 321–331
 6. Blake, A.: The impact of graphical choices on the perception of Euler diagrams.
    Ph.D. dissertation, Brighton University, Brighton, UK (February 2016)
 7. Chow, S.C.: Generating and drawing area-proportional Euler and Venn diagrams.
    Ph.D. dissertation, University of Victoria, Victoria, BC, Canada (2007)
 8. Micallef, L.: Visualizing set relations and cardinalities using Venn and Euler
    diagrams. Ph.D. dissertation, University of Kent (September 2013)
 9. Frederickson, B.: venn.js: area proportional Venn and Euler diagrams in JavaScript
    (November 2016) original-date: 2013-05-09.
10. Larsson, J.: eulerr: Area-proportional Euler diagrams with ellipses (2018) Available
    at: http://lup.lub.lu.se/student-papers/record/8934042.
11. R Core Team: R: A Language and Environment for Statistical Computing. R
    Foundation for Statistical Computing, Vienna, Austria. (2017)
12. Schnabel, R.B., Koonatz, J.E., Weiss, B.E.: A modular system of algorithms for
    unconstrained minimization. ACM Trans Math Softw 11(4) (December 1985)
    419–440
13. Richter-Gebert, J.: Perspectives on Projective Geometry: A Guided Tour Through
    Real and Complex Geometry. 1 edn. Springer, Berlin, Germany (February 2011)
14. Finley,     D.R.:          Ultra-easy    algorithm    with     C     code   sample.
    http://alienryderflex.com/polygon area/ (December 2006)
15. Eberly, D.: The area of intersecting ellipses (November 2016)
16. Micallef, L., Rodgers, P.: Computing the Region Areas of Euler Diagrams Drawn
    with Three Ellipses. In Burton, J., Stapleton, G., Klein, K., eds.: CEUR Workshop
    Proceedings. Volume 1244., Melbourne, Australia (July 2014) 1–15
17. Xiang, Y., Gubian, S., Suomela, B., Hoeng, J.: Generalized simulated annealing for
    global optimization: the GenSA package. The R Journal 5(1) (June 2013) 13–28
18. Sanderson, C., Curtin, R.: Armadillo: a template-based C++ library for linear
    algebra. The Journal of Open Source Software 1(2) (2016) 26
19. Eddelbuettel, D., François, R.: Rcpp: Seamless R and C++ integration. Journal of
    Statistical Software 40(8) (2011) 1–18
20. Eddelbuettel, D., Sanderson, C.: RcppArmadillo: accelerating R with high-
    performance C++ linear algebra. Computational Statistics and Data Analysis 71
    (March 2014) 1054–1063
21. R Core Team: The Comprehensive R Archive Network (November 2017)
22. Chang, W., Cheng, J., Allaire, J., Xie, Y., McPherson, J.: shiny: Web Application
    Framework for R. (2017) R package version 1.0.5.
23. Lenz, O., Fornoni, A.: Chronic kidney disease care delivered by US family medicine
    and internal medicine trainees: results from an online survey. BMC medicine 4
    (December 2006) 30


                                               91