<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Set Visualization Challenges for Big Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luana Micallef</string-name>
          <email>luana.micallef@hiit.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Helsinki Institute for Information Technology, Aalto University</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This talk will provide a brief overview of the state of the art in set visualization, followed by an in-depth discussion of challenges and open questions when dealing with real-world set-typed data.</p>
      </abstract>
      <kwd-group>
        <kwd>Sets</kwd>
        <kwd>visualization</kwd>
        <kwd>big data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <fig id="fig1">
        <label>Fig. 1.</label>
        <caption>
          <p>Different set visualization techniques depicting real-world data.
(A) eulerAPE's [<xref ref-type="bibr" rid="ref8">8</xref>] area-proportional 3-Venn diagram showing genomic variations of three tissue types.
(B) ComED's [<xref ref-type="bibr" rid="ref10">10</xref>] Euler diagram variant visualizing commonly used words in Shakespeare's plays.
(C) KelpFusion's [<xref ref-type="bibr" rid="ref7">7</xref>] overlay visualization showing cities that are members of different EU communities, such as the Eurozone.
(D) PivotPaths' [<xref ref-type="bibr" rid="ref5">5</xref>] node-link diagram depicting connections between publications, authors, and keywords.
(E) OnSet's [<xref ref-type="bibr" rid="ref12">12</xref>] matrix-based set visualization showing similarities between blood samples of different whale sharks.
(F) Radial Sets' [<xref ref-type="bibr" rid="ref1">1</xref>] aggregate-based set visualization depicting relationships between IMDb movies produced in different countries.</p>
        </caption>
      </fig>
      <p>Data is often organized into groups or sets to give analysts an
overview of shared properties and to help them identify patterns and relationships
between the data items and the sets. For instance, links between social
communities are analyzed to predict and disrupt crimes, while relationships between
groups of genes are studied to find cures for illnesses. Set visualization can support
the analysis of such set-typed data. However, as data collection technology
advances, real-world data is becoming larger and more complex, which poses
further challenges and increases the demand for scalable set visualizations that are
optimized for the user's data analysis tasks.</p>
      <p>
        Alsallakh et al. [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] categorized set visualization techniques into seven
categories: (A) Euler and Venn diagrams; (B) Euler diagram variants; (C)
overlays; (D) node-link diagrams; (E) matrix-based diagrams; (F) aggregate-based
diagrams; and (G) scatter plots and others. Figure 1 shows examples of
techniques from categories A-F, all of which visualize real-world set-typed data.
      </p>
      <p>
        Euler and Venn diagrams are widely used to reason about sets and their
relationships. For instance, they are used to teach set theory to
schoolchildren and to reason about biomedical data (e.g., Figure 1A). Their closed curves
form clearly bounded common regions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which have a preattentive pop-out effect that
is stronger than the powerful Gestalt laws of proximity and similarity [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        However, Euler diagrams do not scale well and cannot depict large data
collections with numerous sets and set relations [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This has led to the development
of diverse set visualization techniques, such as those in Figure 1B-F, in which, for instance,
relationships are depicted as the edges of a graph (Figure 1D) or as the cell values
of a matrix (Figure 1E). In some cases, the sets and their relations have to be
shown on a predefined visualization, such as a map, in which set elements have fixed
positions; overlay set visualization techniques, like Figure 1C, have been devised
for such cases. Visualizing aggregate information about the sets, such as their
cardinalities, is often helpful when reasoning about sets. For two or three sets, an
area-proportional Euler diagram like Figure 1A can be drawn, but for
more sets, other techniques like Figure 1F have to be used. No current
technique is considered appropriate for handling more than around 100 sets [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
even though most real-world data is set-typed, large, and often multi-dimensional,
particularly in areas like biosciences, security, and social networking.
      </p>
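      <p>To make the scalability argument concrete, the following minimal Python sketch computes the quantities that an area-proportional Euler diagram has to encode: the number of elements in each exclusive region, that is, the elements that belong to exactly one particular combination of sets. It is an illustration written for this text, not code from eulerAPE or any other cited tool; the function names and the toy data are hypothetical. With n sets there can be up to 2^n - 1 non-empty exclusive regions.</p>
      <preformat preformat-type="code">
```python
from collections import Counter


def exclusive_regions(sets):
    """Count the elements in each exclusive region of an Euler/Venn diagram.

    `sets` maps a set name (names must be mutually sortable, e.g. strings)
    to a Python set of hashable elements. Each element is assigned to the
    region identified by the sorted tuple of names of all sets containing
    it, so every element is counted exactly once. Empty regions do not
    appear in the returned Counter.
    """
    membership = {}
    for name, elements in sets.items():
        for element in elements:
            membership.setdefault(element, []).append(name)
    return Counter(tuple(sorted(names)) for names in membership.values())


def max_regions(n):
    """Upper bound on the number of non-empty exclusive regions for n sets."""
    return 2 ** n - 1


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    toy = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 6}}
    for region, size in sorted(exclusive_regions(toy).items()):
        print(region, size)
    print(max_regions(3), max_regions(100))
```
      </preformat>
      <p>For three sets the bound is 7 regions, which an area-proportional 3-Venn diagram such as Figure 1A can still try to match; for 100 sets it exceeds 10^30. In practice the number of non-empty regions is also bounded by the number of elements, but that can still be far more regions than can be laid out legibly. A view that shows only pairwise overlaps, for example as graph edges or as the cells of an n-by-n matrix, needs at most n(n - 1)/2 values, at the cost of no longer showing higher-order intersections directly.</p>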
      <p>
        There are various set visualization challenges and open questions that need
to be investigated further when dealing with big data:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>Faster drawing algorithms that are tailored to the user's data analysis tasks
and needs are required.</p>
        </list-item>
        <list-item>
          <p>Established information visualization and human-computer interaction
methodologies, such as focus+context techniques [<xref ref-type="bibr" rid="ref4">4</xref>] or Shneiderman's [<xref ref-type="bibr" rid="ref13">13</xref>]
information-seeking mantra of "overview first, zoom and filter, then details-on-demand",
should be adopted for set visualization.</p>
        </list-item>
        <list-item>
          <p>Cognitive and perception theories should also be taken into account, so that set
visualizations exploit the capabilities, and mitigate the limitations, of the
human information processing system.</p>
        </list-item>
        <list-item>
          <p>Data mining and machine learning techniques could facilitate the selection
and visualization of important aspects, patterns, and trends in set-typed data.</p>
        </list-item>
        <list-item>
          <p>Evaluating the effectiveness of set visualization techniques for big data is
also important, but difficult, because it depends on the features and characteristics
of both the visualization system and the data, and on the data analysis tasks the
user wants to accomplish.</p>
        </list-item>
      </list>
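      <p>As a hypothetical illustration of how a very simple data-mining step might be combined with the overview-first, zoom-and-filter, details-on-demand pattern for set-typed data, the following Python sketch lists the sets by cardinality (overview), keeps only the k pairs of sets with the highest Jaccard similarity, i.e. intersection size divided by union size (filter), and returns the shared elements of one pair only when asked (details on demand). Jaccard similarity, the parameter k, and all function names are assumptions made for this example; none of them is taken from the cited techniques.</p>
      <preformat preformat-type="code">
```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union.

    Defined as 0.0 when both sets are empty.
    """
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def overview(sets):
    """Overview: (name, cardinality) for every set, largest first."""
    return sorted(((name, len(s)) for name, s in sets.items()),
                  key=lambda item: (-item[1], item[0]))


def top_pairs(sets, k):
    """Filter: the k pairs of sets with the highest Jaccard similarity.

    Ties are broken by set name so that the result is deterministic.
    """
    scored = [(a, b, jaccard(sets[a], sets[b]))
              for a, b in combinations(sorted(sets), 2)]
    scored.sort(key=lambda t: (-t[2], t[0], t[1]))
    return scored[:k]


def details(sets, a, b):
    """Details on demand: sorted elements shared by sets a and b.

    Assumes the elements are mutually comparable so they can be sorted.
    """
    return sorted(sets[a].intersection(sets[b]))


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    toy = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {4, 6}}
    print(overview(toy))
    print(top_pairs(toy, 2))
    print(details(toy, "A", "B"))
```
      </preformat>
      <p>With the toy sets A = {1, 2, 3, 4}, B = {3, 4, 5}, and C = {4, 6}, top_pairs(toy, 2) keeps the pairs (A, B) with similarity 0.4 and (B, C) with 0.25, and details(toy, "A", "B") returns [3, 4]. Only the k retained pairs would need to be drawn, which is one way to keep a node-link or matrix view small when the number of sets grows.</p>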
      <p>We discuss these challenges and open questions in this talk, together with a brief
overview of the state of the art in set visualization.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><given-names>Bilal</given-names> <surname>Alsallakh</surname></string-name>,
          <string-name><given-names>Wolfgang</given-names> <surname>Aigner</surname></string-name>,
          <string-name><given-names>Silvia</given-names> <surname>Miksch</surname></string-name>, and
          <string-name><given-names>Helwig</given-names> <surname>Hauser</surname></string-name>.
          <article-title>Radial Sets: Interactive Visual Analysis of Large Overlapping Sets</article-title>.
          <source>IEEE Transactions on Visualization and Computer Graphics</source>,
          <volume>19</volume>(<issue>12</issue>),
          <year>2013</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
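        <mixed-citation>
          2.
          <string-name><given-names>Bilal</given-names> <surname>Alsallakh</surname></string-name>,
          <string-name><given-names>Luana</given-names> <surname>Micallef</surname></string-name>,
          <string-name><given-names>Wolfgang</given-names> <surname>Aigner</surname></string-name>,
          <string-name><given-names>Helwig</given-names> <surname>Hauser</surname></string-name>,
          <string-name><given-names>Silvia</given-names> <surname>Miksch</surname></string-name>, and
          <string-name><given-names>Peter</given-names> <surname>Rodgers</surname></string-name>.
          <article-title>Visualizing Sets and Set-typed Data: State-of-the-Art and Future Challenges</article-title>.
          In <source>Proceedings of the 16th Annual Eurographics Conference on Visualization (EuroVis), State of the Art Reports (STARs)</source>,
          pages <fpage>1</fpage>–<lpage>21</lpage>,
          <year>2014</year>.
        </mixed-citation>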
      </ref>
      <ref id="ref3">
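        <mixed-citation>
          3.
          <string-name><given-names>Bilal</given-names> <surname>Alsallakh</surname></string-name>,
          <string-name><given-names>Luana</given-names> <surname>Micallef</surname></string-name>,
          <string-name><given-names>Wolfgang</given-names> <surname>Aigner</surname></string-name>,
          <string-name><given-names>Helwig</given-names> <surname>Hauser</surname></string-name>,
          <string-name><given-names>Silvia</given-names> <surname>Miksch</surname></string-name>, and
          <string-name><given-names>Peter</given-names> <surname>Rodgers</surname></string-name>.
          <article-title>The State-of-the-Art of Set Visualization</article-title>.
          <source>Computer Graphics Forum</source>,
          <volume>35</volume>(<issue>1</issue>):<fpage>234</fpage>–<lpage>260</lpage>,
          <year>2015</year>.
        </mixed-citation>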
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><given-names>Stuart K.</given-names> <surname>Card</surname></string-name>,
          <string-name><given-names>Jock D.</given-names> <surname>Mackinlay</surname></string-name>, and
          <string-name><given-names>Ben</given-names> <surname>Shneiderman</surname></string-name>.
          <source>Readings in Information Visualization: Using Vision to Think</source>.
          <publisher-name>Morgan Kaufmann</publisher-name>,
          <year>1999</year>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
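        <mixed-citation>
          5.
          <string-name><given-names>Marian</given-names> <surname>Dörk</surname></string-name>,
          <string-name><given-names>Nathalie Henry</given-names> <surname>Riche</surname></string-name>,
          <string-name><given-names>Gonzalo</given-names> <surname>Ramos</surname></string-name>, and
          <string-name><given-names>Susan</given-names> <surname>Dumais</surname></string-name>.
          <article-title>PivotPaths: Strolling through Faceted Information Spaces</article-title>.
          <source>IEEE Transactions on Visualization and Computer Graphics</source>,
          <volume>18</volume>(<issue>12</issue>):<fpage>2709</fpage>–<lpage>2718</lpage>,
          <year>2012</year>.
        </mixed-citation>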
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><given-names>Kurt</given-names> <surname>Koffka</surname></string-name>.
          <source>Principles of Gestalt Psychology</source>.
          <publisher-name>Harcourt Brace</publisher-name>, <publisher-loc>New York, NY, USA</publisher-loc>,
          <year>1935</year>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><given-names>Wouter</given-names> <surname>Meulemans</surname></string-name>,
          <string-name><given-names>Nathalie Henry</given-names> <surname>Riche</surname></string-name>,
          <string-name><given-names>Bettina</given-names> <surname>Speckmann</surname></string-name>,
          <string-name><given-names>Basak</given-names> <surname>Alper</surname></string-name>, and
          <string-name><given-names>Tim</given-names> <surname>Dwyer</surname></string-name>.
          <article-title>KelpFusion: a Hybrid Set Visualization Technique</article-title>.
          <source>IEEE Transactions on Visualization and Computer Graphics</source>,
          <volume>19</volume>(<issue>11</issue>):<fpage>1846</fpage>–<lpage>1858</lpage>,
          <year>2013</year>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name><given-names>Luana</given-names> <surname>Micallef</surname></string-name> and
          <string-name><given-names>Peter</given-names> <surname>Rodgers</surname></string-name>.
          <article-title>eulerAPE: Drawing area-proportional 3-Venn diagrams using ellipses</article-title>.
          <source>PLoS ONE</source>,
          <volume>9</volume>(<issue>7</issue>):<elocation-id>e101717</elocation-id>,
          <year>2014</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name><given-names>Stephen E.</given-names> <surname>Palmer</surname></string-name>.
          <article-title>Common region: A new principle of perceptual grouping</article-title>.
          <source>Cognitive Psychology</source>,
          <volume>24</volume>(<issue>3</issue>):<fpage>436</fpage>–<lpage>447</lpage>,
          <year>1992</year>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name><given-names>Nathalie Henry</given-names> <surname>Riche</surname></string-name> and
          <string-name><given-names>Tim</given-names> <surname>Dwyer</surname></string-name>.
          <article-title>Untangling Euler Diagrams</article-title>.
          <source>IEEE Transactions on Visualization and Computer Graphics</source>,
          <volume>16</volume>(<issue>6</issue>):<fpage>1090</fpage>–<lpage>1099</lpage>,
          <year>2010</year>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Peter</given-names>
            <surname>Rodgers</surname>
          </string-name>
          .
          <article-title>A Survey of Euler Diagrams</article-title>
          .
          <source>Journal of Visual Languages and Computing, Special Issue on Visualization and Reasoning using Euler Diagrams</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name><given-names>Ramik</given-names> <surname>Sadana</surname></string-name>,
          <string-name><given-names>Timothy</given-names> <surname>Major</surname></string-name>,
          <string-name><given-names>Alistair</given-names> <surname>Dove</surname></string-name>, and
          <string-name><given-names>John</given-names> <surname>Stasko</surname></string-name>.
          <article-title>OnSet: a visualization technique for large-scale binary set data</article-title>.
          <source>IEEE Transactions on Visualization and Computer Graphics</source>,
          <volume>20</volume>(<issue>12</issue>):<fpage>1993</fpage>–<lpage>2002</lpage>,
          <year>2014</year>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name><given-names>Ben</given-names> <surname>Shneiderman</surname></string-name>.
          <article-title>The eyes have it: A task by data type taxonomy for information visualizations</article-title>.
          In <source>Proceedings of the IEEE Symposium on Visual Languages</source>,
          pages <fpage>336</fpage>–<lpage>343</lpage>. IEEE,
          <year>1996</year>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>