<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enabling Hierarchical Exploration for Large-Scale Multidimensional Data with Abstract Parallel Coordinates</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gaëlle Richer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joris Sansen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frédéric Lalanne</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Auber</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Romain Bourqui</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>gaelle.richer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>joris.sansen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>frederic.lalanne</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>david.auber</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>romain.bourqui}@u-bordeaux.fr</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LaBRI</institution>
          ,
          <addr-line>Univ. Bordeaux, CNRS, UMR5800</addr-line>
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>76</fpage>
      <lpage>83</lpage>
      <abstract>
        <p>As data collection grows more common in various domains, there is a call for adapted or newer methods of visualization to tackle magnitudes exceeding the number of available pixels on screens and challenging interactivity. Exploratory visualization of large data present two major challenges: perceptual scalability and processing scalability. The first is concerned with overcoming the fundamental limitation of screens and human perception. The second deals with eficiently processing large volumes of data to achieve responsive interactions. Multiscale visualizations are an efective technique for solving the first challenge that builds on several levels of data abstraction to provide the user with an initial overview and subsequent incremental detail. The focus of this paper is on multidimensional data, a ubiquitous form of data among large-scale data sets, and parallel coordinates, a representation largely used for this type of data. For this representation, defining abstractions and interactively generating levels is not straightforward. Building upon several previous aggregated parallel coordinates representations, we propose a unifying and thinking model for conceiving and describing multiscale parallel coordinates and their interactions. Using this formalism, we present a focus+context representation which bounds the number of visual items with a fixed resolution parameter while supporting exploration up to the item-level. Processing scalability is addressed by carrying out computation in a distributed manner on a remote data-intensive infrastructure. Bounding the visual items ensures perceptual scalability but also bounds the data transfer between this infrastructure and the rendering client.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Multidimensional data, a form of data common in many domains,
encompasses all lists of individuals composed of several attributes
(possibly temporal). Multidimensional items are studied from
several aspects: the particular behavior of individual items relative
to the whole, the relationship between values from two
dimensions and the distribution of values along each dimension [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Parallel coordinates, introduced by Inselberg and Dimsdale [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
is a well-known technique of visualization for such data. Each
item is represented by a polyline which anchors on axes are
positioned at the corresponding attribute value of the item (see
Fig 1b). Axes are traditionally aligned in parallel forming a
sequence of two-dimensional subplots sharing one axis with their
predecessor. Usual interactions are axis reordering to analyze
relationships between all dimension pairs and selection to trace
 2 2 2 2 
 1 1 2 6 
 
 5 2 2 6 
 6 3 3 1 
 
(a) Toy data
(b) Line-based
(c) Abstract
subsets of items across axes. Selection relates to interactive means
of choosing subsets of items and enhancing them such that they
can be discriminated from the rest.
      </p>
      <p>
        In this paper, we are interested in supporting the interactive
exploration of multidimensional datasets with a large number
of items and a moderate number of dimensions. We built upon
the assumption that large datasets, scaling to billions of records,
lead to major overplot when displayed using the traditional
linebased parallel coordinate representation. Indeed, as the number
of records increases, the plot becomes cluttered and analyses
may be hindered as specific patterns are concealed [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Heinrich
and Weiskopf [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] listed several clutter reduction approaches for
parallel coordinates. One solution lies in the display of aggregates
computed per-dimension as in [
        <xref ref-type="bibr" rid="ref20 ref29 ref32">20, 29, 32</xref>
        ] instead of single items
(see Fig 1c). Despite the aggregation, these abstract plots
successfully provide overview information similar to a traditional plot
and perceptually scale for any size of input data. Rendering time
and transfer time in client-server architectures are also reduced
since they are bounded by the number of aggregates. Sansen
et al. [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] leveraged this bound by precomputing the result of
several interactions and thus clear the dependency of linear scans
of the data upon interaction. Indeed, past a certain number of
records, data cannot fit in a desktop computer memory and linear
scans over the data afect performance more negatively since
they either involve reads on disk or network transfer between
several computing units.
      </p>
      <p>
        As for any abstract visualization, deeper analyses are limited
by the amount of information conveyed by the visual aggregates.
Two types of interactions can alleviate this limitation: changing
the level of detail (show more details) and adjusting the
aggregation (show diferent details). Supporting these interactions is
essential but strongly increases the number of states of the
visualization and reasonable precomputing and storing of all these
states as performed by Sansen et al. [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] is challenging. The
processing scalability challenge here has been addressed by
Rübel et al. [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] on a modern high-performance computing (HPC)
platform. HPC platforms are large and expensive computing
systems suited for highly complex and real-time computation. They
are composed of multiple processors connected through a fast
network and use fast memory. As such, they are particularly
adapted to tightly coupled tasks where several processors work
on the same task and exchange data. Distributed systems, on the
contrary, are networks of computing units, usually commodity
hardware, connected in a shared-nothing architecture (memory
and storage are independent to each unit). On these systems,
data transfer between computing units uses a slower network
connection and thus is critical for performances. Consequently,
they are most adapted to loosely coupled data-parallel tasks on
large amounts of data. In addition to being cheaper alternatives
for data-intensive computing, they ofer easier horizontal
scalability (allocation of additional computing units) as their hardware
and architecture are less sophisticated. The filtration and
aggregation problems at stake in abstract parallel coordinates are
data-parallel tasks. We focus on these less expensive and more
accessible infrastructures to address them.
      </p>
      <p>
        Our aim is to enable interactive exploration down to the
itemlevel over large-scale multidimensional data with an abstract
parallel coordinate plot. The main contributions of this paper
are summarized as follows: (i) a unifying model for abstract
parallel coordinate plots based on hierarchical aggregation, (ii)
formalization of multiscale and regular interactions in this model,
(iii) a focus+context representation with bounded visual items
and drill-down/roll-up interactions, extension of the work of [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ],
(iv) an example implementation using a distributed infrastructure
with components that benefits from horizontal scalability.
      </p>
      <p>The paper is organized as follows: we first present previous
work on perceptual scalability in parallel coordinate plots and
their interactions with a focus on multiscale approaches (Sec.
2). Then, we describe the proposed graph-based formalism for
abstract parallel coordinates based on hierarchical aggregation
(Sec. 3). In Sec. 4, we study how diferent interactions are
expressed in this model. This yields a prototype implementation
and design of a scalable parallel coordinate plot based on
hierarchical aggregation, using a so-called big data infrastructure. This
visualization is described and its scalability evaluated in Sec. 5.
Finally, in Sec. 6 we draw conclusions and present directions for
future work.
2</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Recent works have focused on the scalability of visualization
applications for large-scale data with diferent techniques, among
which are: data reduction, multi-threading, GPU-acceleration,
and incremental or approximate data processing. In the case of
visualization, the scalability of a system often refers to its
capacity to accommodate and handle growing amounts of data.
Handling massive datasets brings about two main challenges for
exploratory visualization: perceptual scalability and processing
scalability as noted by [
        <xref ref-type="bibr" rid="ref12 ref16 ref24">12, 16, 24</xref>
        ]. The first is concerned with the
legibility of visualizations representing numerous items relative
to the space available on a screen (so-called screen real-estate
problem) and human capabilities to apprehend them. The
second relates to the computational cost of processing numerous
items on each user input, that can create latencies responsible for
decreasing user performances [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. A taxonomy of diferent
techniques regarding the perceptual scalability aspect was established
by Ellis and Dix [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. A general solution is data reduction, which
can be categorized into two approaches: either representing a
subset of the data items (sampling, filtering) or meta-items
(aggregation, mathematical models). Several work proposed methods
(called multiresolution, multiscale, hierarchical or even stratified)
for navigating through multiple levels of detail supported by
precomputation (e.g. [
        <xref ref-type="bibr" rid="ref15 ref24 ref30">15, 24, 30</xref>
        ]). This work is related to general
techniques addressing perceptual and processing scalability for
interactive parallel coordinates.
2.1
      </p>
    </sec>
    <sec id="sec-3">
      <title>Perceptual Scalability in Parallel</title>
    </sec>
    <sec id="sec-4">
      <title>Coordinates</title>
      <p>
        Various approaches have been proposed to improve the legibility
of parallel coordinate plots by either reducing the clutter
produced by the multiplicity of overlapping and crossing lines or
enhancing their patterns. Approaches can be categorized into
geometry-based relying on computer graphics techniques and
data reduction approaches that use approximation or summary of
the data. Geometry-based approaches display all items with shape
or position modifications to alleviate the overdraw in-between
axes, for instance by bundling lines (e.g. [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]). These techniques
have the advantage of producing no or few loss of information
but have the drawback of still being prone to over-plot since no
reduction of the number of displayed items is performed. Data
reduction approaches limit the number of visual items either by
sampling items [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] or using meta-items. Model-based approaches
mathematically reduce the data to a continuous function, and
meta-items usually represents the density of the underlying data
(e.g. [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]). Aggregation approaches have been presented for
parallel coordinates through diferent schemes: aggregation of
dimensions (e.g. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), aggregation of items or values (e.g. [
        <xref ref-type="bibr" rid="ref13 ref28 ref29">13, 28, 29</xref>
        ])
or combinations of both (e.g. [
        <xref ref-type="bibr" rid="ref21 ref26">21, 26</xref>
        ]).
      </p>
      <p>
        In this paper, we are interested in scalability relative to the
number of items, not dimensions, hence we focus on aggregation
over items and their values. Previous work using aggregation
have used kernel density estimation, independently applied to
dimension [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], diferent hierarchical clustering algorithms over
multidimensional items or dimension (e.g. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]) but also binning
on two-dimensional sub-spaces (e.g. [
        <xref ref-type="bibr" rid="ref14 ref28">14, 28</xref>
        ]). Aggregates have
been represented by their statistical properties: extrema (e.g. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]),
cardinality (e.g. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]), mean (e.g. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]), and/or other metrics [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ].
      </p>
      <p>As reflected by previous work, multiple dimensions allow for
diferent levels of grouping: the item level (multidimensional), the
value level (per-dimension), the line level (two-dimensional). For
aggregation-based methods, the value level has the advantages
of losing less information, preserving continuity on axes
(potentially broken by two-dimensional aggregation) as well as an idea
of the relationship between dimensions (broken by item-level
aggregation).
2.2</p>
    </sec>
    <sec id="sec-5">
      <title>Multiscale Parallel Coordinates</title>
      <p>
        Abstract representations are usually conditioned by a parameter
that controls the resolution of the display. Multiscale
visualizations allow navigation between multiple levels of abstraction,
using a drill-dow/roll-up interaction (respectively for
increasing and decreasing levels of detail). Elmqvist and Fekete [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
provided general guidelines for multiscale representations and
interactions, based on such hierarchical aggregation. Bikakis et al.
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] presented a framework for hierarchical aggregation oriented
towards the computational aspects of hierarchical navigation.
Interactively changing the level of detail can be integrated in
different ways (see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for a general review). A first approach, similar
to geometry zooming, either (1) filters part of the representation
to maintain a fixed number of visual items on screen or instead
(2) purposely displays increasing number of items (e.g. [
        <xref ref-type="bibr" rid="ref29 ref6">6, 29</xref>
        ]).
Filtering has the advantage of being scalable when the amount of
displayed items is controlled; however, the overview and context
are lost along the way. One drawback of displaying an
uncontrolled number of items is that the screen pixel limitation could
always be reached for a suficiently massive dataset and
information would be lost. Additionally, from a certain point it may
overwhelm human capacities. Consequently, zooming without
ifltering does not allow interactive exploration to the item-level
complying with perceptual scalability. For zooming with
filtering, an alternative to the loss of context is the overview+detail
approach which displays both the filtered zoomed view and an
overview (e.g. dimension zooming of [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]).
      </p>
      <p>
        A second approach, called focus+context, consists in displaying
heterogeneous levels of detail: a selected portion of the data is
shown with greater details to the detriment of the rest. This
approach allows to gain detail locally while preserving the overview.
Fisheye lenses used in the work of [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] are an example applied in
screen space. In data space, [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], although using
diferent aggregation strategies, presented techniques where a subset
of items can be enhanced and displayed with fine-scale details
layered over the rest of the data, abstracted to some level.
      </p>
      <p>To the best of our knowledge, none of these solutions
propose to interactively change the level of detail locally, in a
focus+context fashion, while bounding the number of visual items.
2.3</p>
    </sec>
    <sec id="sec-6">
      <title>Processing Scalability of Multiscale</title>
    </sec>
    <sec id="sec-7">
      <title>Parallel Coordinates</title>
      <p>
        For the past ten years, large-scale or massive has been used to
qualify increasingly big data, now up to tera or petabytes in
size [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. For multidimensional data, above about 107 items, with
a dozen of dimensions, perceptual scalability and processing
scalability becomes problematic for interactive analysis. Abstract
representation and precomputation of interactions are solutions
respectively addressing perceptual and processing scalability,
as [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] presented. In addition to the enhancement of subsets of
items and the reordering of axes, abstract representations require
a drilling interaction to allow fine data analysis.
      </p>
      <p>
        Processing scalability can be tackled by parallelism, distributed
processing and precomputing of complete or partial results for
instance. For parallel coordinates, most interactions can be applied
to two-dimensional subplots independently and on separate
portions of the dataset without needing any communication (they
are pleasantly parallel). This property has been exploited by Rübel
et al. [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] on a HPC platform, and [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] on a distributed platform.
Rübel et al. [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] addressed both scalability challenges with an
histogram-based representation adapted from [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. Histograms
are two-dimensional aggregations of the data, precomputed in
parallel, potentially for diferent resolutions. On more afordable
hardware, Sansen et al. [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] addressed the same challenges with a
parallel sets representation for an Hadoop+Elasticsearch
ecosystem that also relies on full precomputation of certain types of
interactions. Both works presented good scalability evaluations
relative to the number of computing units used. However, these
techniques only support the display of balanced levels of detail i.e.
drill-down is global and necessarily increases the number of
displayed items. With this approach, gaining detail at the item-level
does not scale to large datasets.
Consider X ∈ Rn,d , a set of multidimensional quantitative data
with n items and d dimensions. On a traditional parallel
coordinate plot, only a fraction of the input data subspaces is
represented. Fig. 2c and 2d show two examples of representations
that display all subspaces at once: respectively the parallel
coordinate matrix [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and the many-to-many plot [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. The model
we propose integrates all subspaces without assumption on the
layout of axes and supports abstraction. A value in the matrix
inherently belongs to a tuple (or item), and a dimension. We
focus on abstraction based on aggregation of tuples and do not
directly address aggregation of dimensions. The rationale behind
this approach is that dimensions hold semantic meaning not
embedded in their numerical content. Therefore, automatically and
meaningfully aggregate them is not straightforward: some types
of values may not make sense aggregated together. Consequently,
we target datasets that challenge scalability by their size in tuples
(more than millions of items) and hold moderate amounts of
well-chosen dimensions.
In parallel coordinates, the values of the input matrix are
connected to each other by lines to materialize the tuple or tuples
they belong to. Extending this metaphor, an n × d matrix can be
seen a graph linking its values based on the relation induced by
tuples. This undirected graph is composed of n separate cliques
ba  150 67 160 100 
X = c  =  3 2 0 8 
  
d   18 48 13 110 
e 
of d vertices, that is, n complete graphs (see Fig. 2e). We note this
graph G = (V , E, w ), where V denotes the vertices, E the edges,
w is the weighting function: w : V → R that stores vertex values.
We note D = {Vi , i ∈ ⟦1, d ⟧} the partition of V that groups
vertices along their origin dimension. Given a dimension sequence,
a parallel coordinate plot is built by taking the subgraph of G
composed of only the edges joining vertices of consecutive
dimensions in the sequence. Vertices of this subgraph are positioned
on their corresponding axis along a vertical position given by
the weighting function. Shadowed regions on Fig. 2e represent a
vertex partition R that groups identical values of the same
dimension. Such partition is a refinement of the partition D. Merging
vertices of G belonging to the same subset of a partition R
corresponds to taking the quotient graph of G relative to R, noted G/R .
Two subsets S, S ′ in the partition R are represented by adjacent
vertices in G/R if, and only if, some vertex in S is adjacent to a
vertex in S ′ in G. Further, two aggregation functions are defined
to compute the properties of the meta-nodes and meta-edges, i.e.
vertices and edges of the quotient graph. Aggregation functions
return a tuple of weights given a set of edges or vertices as input.
Weights (cardinality, extrema, mean, etc) provided by such
functions are subsequently used for assigning visual properties to
meta-edges and meta-nodes. For instance, on Fig. 2b, the opacity
of lines on the parallel coordinate plot depends on the
cardinality of meta-edges. On this example, the aggregation of identical
values allows to find the optimal number of lines to draw.
Indeed, the aggregation done here in data-space relates to the one
done in screen-space when rendering the polylines with alpha
composing.
      </p>
      <p>Given some aggregation functions, the same process is applied
to obtain an abstract representation from any valid partition R
of V , that is, refinement of D. Fig. 3 presents an example of
partition obtained with per-dimension clustering and another type of
representation based on extrema values. Tuple-based approaches
are incorporated into the model by refining the input tuple
partition into a refinement of D. Each subset of the tuple partition
can be decomposed into d groups along their membership in</p>
      <p>D. Fig. 4 shows an example of tuple partition refinement and a
tuple-oriented representation. Using our model and the
introduced notations, the abstract representation is fully defined by:
(i) a valid partition of the matrix values, (ii) two aggregate
functions assigning weights to meta-nodes and meta-edges, (iii) an
axis layout and visual encodings for meta-node and meta-edge
weights (vertical position, color, etc).
Hierarchical aggregations are precomputation of diferent
levels of abstraction that naturally support multiscale
representations. Such precomputation outputs a rooted tree structure whose
leaves are the data objects to aggregate. Every node of a rooted
tree is said to cover the leaves of the subtree it is the root of.
An antichain (also called tree cut) in a tree is a set of nodes S,
such that no node of S is an ancestor of another node of S. An
antichain is maximal if it cannot be expanded by any other node
without violating the antichain property. The subsets covered
by the nodes of a maximal antichain form a partition of the tree
leaves: the antichain property makes them pairwise disjoint, the
maximal property ensures that their union is the set of leaves.</p>
      <p>In our model, an abstraction is directed by a partition refining
the partition D which is equivalent to d partitions, one for each
Vi of D. By augmenting each subset Vi with a tree, such partition
can be defined by one maximal antichain for each tree. This is
equivalent to defining a single maximal and non-trivial antichain
in the unified tree where the root’s children are the roots of all
dimension trees. We note this unified tree T (V ), and call its direct
subtrees dimension hierarchies.</p>
      <p>
        Overall, for some input data modeled by the clique graph
G = (V , E, w ) and its associated hierarchy T (V ), an abstraction
is defined as previously described with the valid partition of V
induced by a maximal antichain in T (V ). This unifying approach
accommodates multiple layouts of parallel coordinates (see Fig. 2)
and several abstract parallel coordinates representations. We
presented two abstract representations: per-dimension aggregation
on Fig. 3d that corresponds to the technique presented by Palmas
et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] and on Fig. 4 tuple aggregation like presented by Fua
et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. For tuple aggregation, the hierarchy defined on
tuples is used to form all dimension hierarchies. Additionally, to
represent the same tuple aggregates on all subplots, dimension
hierarchies should all be cut at the same level.
4
      </p>
    </sec>
    <sec id="sec-8">
      <title>COMMON INTERACTIONS</title>
      <p>We take advantage of the hierarchical graph model described in
the previous section to explore several hierarchical operations
and usual operations for parallel coordinates and investigate the
incremental computation they require.
4.1</p>
    </sec>
    <sec id="sec-9">
      <title>Level-of-Detail Operations</title>
      <p>The drill-down and roll-up operations correspond to changes in
the displayed level of detail. They respectively mean moving
deeper in the hierarchical aggregation to show finer details, and
upper to present a coarser display of the data. Since the model
rests upon precomputed hierarchies of dimension values, the
weights (cardinality, extrema, etc) of all possible meta-nodes, i.e.
nodes of the hierarchies, can be precomputed as well. The number
of edges of a quotient graph with respect to a maximal antichain
is always less than those of the original graph and generally
greatly depends on the properties of the hierarchies (arity, depth).
In general, the number of maximal antichains is exponential. For
example, complete binary trees, that are rooted binary trees with
every level full except the last, have Ω (2n ) maximal antichains
where n is their number of leaves. This makes the precomputation
of all possible quotient graphs, i.e. all meta-edge weights, with
respect to every possible maximal antichain for given dimension
hierarchies not possible for the scale of data we aim to tackle.</p>
      <p>In traditional parallel coordinates plots, and in many derived
techniques, each two-dimensional subplot is independent of the
others and can be computed and displayed separately. When
axes hold visual elements, like frequency plots, the
corresponding visual items are usually mirrored and the two composing
parts conceptually included in both sides subplots. If no edge
is precomputed, the cost of an abstraction can be decomposed
into independent substeps corresponding to all subplots, each
requiring aggregation over n edges, one for each tuple.</p>
      <p>In the context of hierarchical aggregation, gaining detail can
be defined locally as the substitution of a visual aggregate for its
children. In our model, the displayed level of detail is given by
a maximal antichain of the hierarchy. Drilling-down on a
metanode replaces it in the current antichain by all of its children
which still forms a maximal antichain of the tree. This entails
computing the outgoing edges of these children since they are not
precomputed. It corresponds to aggregating the tuples covered
by the drilled meta-node. The opposite operation, rolling up or
collapsing, is the replacement of sibling nodes with their parent
in the antichain.</p>
      <p>Globally increasing detail may result in an uncontrolled
number of elements on display, independently of the arity and depth
of hierarchies. We detail three drilling approaches that tackle this
issue by trading context detail for scalability.</p>
      <p>Detail &amp; Filter. One approach is to define drilling on a
metanode as a filtering operation. This corresponds to computing the
expansion of the meta-node over a filtered clique-graph, induced
solely by vertices adjacent to the leaves it covers in the original
clique-graph. This method efectively bounds the number of
aggregates on an axis by the arity of the dimension hierarchy, i.e.
the maximal number of siblings in the tree. This also bounds the
number of edges in an abstract plot. The limit of this approach is
that all subplots are modified since the weights of all meta-nodes
and meta-edges from the quotient graph are to be updated to
account for the filtering. Thus for each subplot, it costs a pass
over as many bottom-level edges as the drilled meta-node covers
items.</p>
      <p>Budgeted Detail. A second approach lies in constraining the
definition of maximal antichains such that their size comply with
a predefined budget. In this setting, drilling potentially implies
modifying the current maximal antichain in multiple points such
that previously acquired detail collapses to allow drilling-down
when the budget would have been exceeded otherwise. If these
changes are restricted to a single dimension hierarchy, the
incremental changes only involve one or two subplots. Computing
these subplots costs a pass over n bottom-level edges for each, in
addition to the cost of finding a suitable maximal antichain.</p>
      <p>For a budget defined in number of visual items, the minimal
budget allowing to define a maximal antichain that does contain
a leaf node depends on the arity and depth of the tree. In a tree T
with n leaves and arity a, a maximal antichain containing at least
one leaf and minimal in size can be expected to have between
depth(T ) and a · depth(T ) nodes. Considering the toy example
of binary trees which nodes all have either 0 or 2 children, the
minimum depth is ⌈loд2 (n)⌉ + 1 and the maximal depth is n.
Consequently, a visual budget allowing gaining detail up to the leaf
level has to be chosen with respect to the depth of the hierarchy.
In our context, since this depth can reach orders of magnitude
same as the number of tuples, this method does not scale without
strong constraints on the properties of hierarchies.</p>
      <p>Dynamic Context Aggregation. An alternative solution to
collapsing nodes when focusing on deeper nodes is to aggregate
them dynamically. The meta-nodes from the chosen antichain
that are not in focus (i.e. not the deepest in the hierarchy) are
represented aggregated which lowers their impact on the visual item
budget. In general, these dynamically-computed aggregates, do
not correspond to existing nodes in the precomputed hierarchy.
This approach limits the visual items on an axis to the siblings
of each focus and the aggregates of the rest of the meta-nodes
that forms the context. Thus, for a dimension hierarchy of arity
k, an axis with f foci corresponds to at most k · f + ( f + 1) visual
nodes, whatever the depth of these foci in the hierarchy is.
Forming these aggregates induces an aggregation of their outgoing
edges, therefore it similarly reduces the number of meta-edges.
Consequently, with additional constraints on the arity of
dimension hierarchies and the number of concurrent foci on an axis,
this approach enables drill-down up to the deepest level while
complying with a visual budget. Contrary to the previously
described approaches, dynamic context aggregation matches the
requirement of our system and thus is the one implemented (see
Sec. 5.1 and 5.2).
4.2</p>
    </sec>
    <sec id="sec-10">
      <title>Other Operations</title>
      <p>Axis Reordering can be conceived either as the computation of a
full plot or, incrementally, as an edition of one or more subplots
which requires one or two passes over n bottom-level edges, for
n the number of tuples. Subset Selection relates to the emphasis
of a subset of tuples in a given representation. In an aggregate
visualization setting, an abstraction can be computed over the
subset of tuples and the resulting weights displayed over the
complete abstraction using a discriminating encoding. Using this
approach, a subset selection of m tuples requires going through
m bottom-level edges per subplot, one per tuple, to compute their
contribution to the current meta-edge weights. The process is
the same for meta-node weights.
Our goal is to enable hierarchical exploration in abstract parallel
coordinates while complying with the following scaling
properties. On the representation-side, exploration should be possible
down to the item level, in a top-down manner, while the
number of visual items on display should be bounded for any size
of input data. On the processing-side, network transfer latency
between the displaying unit and the computing unit should be
bounded and traditional operations (axis reordering, subset
selection) should be supported. In the following, we present first the
cornerstone of our method: the budgeted number of visual items.
Then, we detail a novel focus+context display with intuitive
drilldown capabilities using the dynamic context aggregation. And
ifnally, we describe technical details and performance evaluation
of the proposed system.
5.1</p>
    </sec>
    <sec id="sec-11">
      <title>Bounded Number of Visual Items</title>
      <p>Bounding visual items is essential for perceptual scalability but
also ensures that data transfer between our rendering client and
back-end unit remains bounded in size thus predictable in time.
To this end, the properties of dimension hierarchies are
constrained and the number of foci restricted to one per dimension.</p>
      <p>Hierarchy Constraints. We use a user-defined k value that acts
as a resolution parameter, bounding the number of visual items
per displayed axis. This parameter is enforced as the maximal
arity of dimension hierarchies. k should be chosen large enough
for the representation to preserve an appropriate amount of
information but small enough for the visual items to fit the available
screen space. Additionally, dimension hierarchies should order
their leaves with respect to their values such that every maximal
antichain defines partitions of dimension intervals since the
chosen visual representation relies on this property. Binning (equal
range partitioning) and adaptive binning (equal size partitioning)
are examples of partitioning algorithms that can be applied in a
bottom-up fashion to produce hierarchies complying with these
two requirements. No depth constraint is required, we adopt the
dynamic context aggregation strategy to bound the number of
vertices from quotient graphs displayed.</p>
      <p>Level-of-Detail Navigation. We define as focus nodes, the nodes
that have the maximal depth in the current antichain. The rest
of the antichain nodes from one dimension are aggregated into
the minimal number of context nodes such that their order is
preserved. In the proposed implementation, the top-level nodes are
initially presented as focus nodes and each drill-down triggers
a dynamic aggregation (examples are presented in Fig. 5).
Consequently, there is no more than k focus nodes and two context
nodes at once per axis. Thus, the number of nodes on display is
bounded by k + 2 per axis and the number of edges by (k + 2)2
per subplot.
5.2</p>
    </sec>
    <sec id="sec-12">
      <title>Visual Encoding</title>
      <p>
        We propose an extension of the work presented in [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] which
included two visual encodings: one oriented towards the
distribution of tuples, the other towards the distribution of values
themselves. Basically, the first encoding maps aggregate height
to their cardinality, while the second maps it to their covered
interval size (distance between extrema). We extend the
encoding and positioning of aggregates to emphasize the focus regions,
composed on each dimension of the data displayed with the finest
context node
with aggregate
of 2 and 3
focus nodes
(children of 1)
(b)
4
(a)
detail. Context regions are displayed with slightly lower width
to emphasize the focus regions. Since regions of interest are
incrementally renfied by successive drill-down, they get smaller
in height relative to the whole both in terms of cardinality and
interval of values. Therefore, the height of focus and context
regions are rendered with diferent scaling factors. Focus nodes are
represented colored and augmented by a bidirectional smoothed
histogram. Context nodes are augmented by a pile of level blocks
(see Fig. 5c). The wider these blocks are, the lower in the
hierarchy are the nodes they represent (notice on Fig. 5c the diference
in width of the outer blocks from the top and bottom context).
The height and vertical order of level blocks follows the same
encoding as those of focus nodes. Level blocks are computed on
the client side and thus not transferred nor counted in the visual
item budget.
      </p>
      <p>In the drilling mode, focus nodes and level blocks are clickable.
Clicking on a focus node triggers its animated expansion which
split it into its children and merges its siblings into level blocks.
Level blocks represent an aggregation of nodes previously in
focus, thus they enable going back to this precise previous state.
5.3</p>
    </sec>
    <sec id="sec-13">
      <title>System Overview</title>
      <p>Our system consists of two main parts and follows a client/server
architecture where the client is the visualization endpoint and
the server is an interface to an on-demand computing and
preprocessing back-end. The latter could be implemented both in
a distributed environment for a Hadoop cluster and as a
multithreaded application for single machines (desktop computer or
dedicated server). Past a certain number of input records (for a
104
105
106
107
108</p>
      <p>109
n (number of tuples)
(a) Execution time for node selection relative to data size for two
implementations. The distributed version uses 15 executors.
Errors bars are not symmetrical and missing their bottom parts on
two points due to the log-log scale.
distributed
single
selection cardinality
mean measured
ideal
5</p>
      <p>10
number of executors
15
given number of dimensions and resolution parameter), a
distributed platform should be more eficient while facilitating load
expansion.</p>
      <p>The client is a WebGL application that displays the
representation, computes level blocks and context aggregation on node
drill-down, and queries the supporting back-end for other
interactions. The back-end server is a long-lived Spark application
which runs distributed job on demand while keeping prepared
data in memory. It computes dimension hierarchies in a
precomputation step and stores the resulting meta-node weights
(extrema and cardinality) for all the hierarchies in a distributed
database. Contrary to the previous system, the hierarchical
aspect makes the number of displayable meta-edges too large to
allow their precomputation in reasonable time (see Sec. 4.1).</p>
      <p>The membership of each input data value is stored in a
hierarchy matrix of the same size as the input data, where each value
holds the list of computed ancestors for the matching input data
value. This matrix is kept in memory and split among computing
units which will pass over their slice of the data to filter and
aggregate results on demand. The aggregation outputs the
metaedges (source, target and cardinality) as well as the weights of
context nodes since they do not exist in the precomputed
hierarchies. Upon user interaction and if necessary, the client requests
the server which in turn runs a distributed operation and merges
the partial results returned by computing units. Finally, the client
receives the incremental changes in plain text and updates the
view consequently.
5.4</p>
    </sec>
    <sec id="sec-14">
      <title>Performance Evaluation</title>
      <p>Our system supports moving an axis, selecting an aggregate node
or edge, drill-down on a focus node and rolling-up by clicking on
a context part. Drill-down, roll-up and moving an axis correspond
to the same low-level operation which consists in computing the
edges of two-dimensional subplots given the current focus on
each dimension involved. Compared to the other operations that
only modify two to three subplots, selection operations afect all
subplots, thus are more computationally expensive.</p>
      <p>Since we are interested in supporting "n ≫ d" datasets, we
evaluate our system for varying number of tuples n and using
the most expensive interactive operation: node selection of the
largest node (in covered tuples). To demonstrate the scalability
of the approach we also evaluate performance relative to the
resources allocated for computation.</p>
      <p>
        Each node of the distributed platform has 64GB RAM and 2x6
hyper-threaded cores at 2.10GHz each, connected via a 1Gbit/s
network. The single computer used for running the multi-threaded
implementation has 2x4 hyper-threaded cores at 3.3GHz and
64GB RAM. Test datasets are generated for varying n (from 104
to 109), with d = 15 and k = 31. Test datasets are generated
such that pairs of dimensions present a close to null correlation
factor [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] which tends to create close to the maximum number of
edges between dimensions. Dimension hierarchies are generated
using Canopy clustering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] applied in a bottom-up manner.
      </p>
      <p>On Fig. 6a, execution times of the selection of the largest
toplevel node are plotted for increasingly large datasets. We evaluate
both a multi-threaded single-computer implementation and the
distributed Spark implementation. The stairs-shaped curve of
the single computer can be explained by the similar trend of the
selected node cardinality for every test dataset. Indeed, due to the
hierarchy constraints and the bottom-up approach for clustering,
the number of nodes on the top-level varies and the cardinality</p>
      <p>(b) Scalability test using node selection, n = 2 · 108.
of the largest nodes does not increase linearly with n. We observe
that, on our infrastructure, under n = 2 · 108 (about 108 selected
tuples) the Spark application performances are stable despite
increasing workload. This suggests that execution time for datasets
of smaller size is dominated by costs related to network and disk
I/O, and/or by merging all executor results by the driver unit.
Indeed, the cost of merging results remains approximately the
same as it is a function of the output size and the number of task
running in parallel. Another visible aspect of the results on small
datasets is their high variation. The oversized allocated memory
can be an explanation: when triggered, garbage collection incurs
a substantial delay. Overall, for n less than approximately 106, the
distributed infrastructure underperforms the single-computer
despite having more resources. Past this limit, the system is better
tuned and seems to scale linearly relative to the number of tuples
in the selection. This observation holds for the other experiments
carried out for edge selection, roll-up, and drill-down.</p>
      <p>We investigate the scalability of the system using a dataset
larger than this limit. Fig. 6b presents the median execution time
for node selection on a 2 · 108-tuple dataset with varying numbers
of executors. The plotted ideal execution time corresponds to a
linear speedup, that is, halving the execution time when doubling
the number of executors. This experiment shows that the system
performs close to the ideal.
6</p>
    </sec>
    <sec id="sec-15">
      <title>CONCLUSION</title>
      <p>We presented a graph model for hierarchical aggregation and
interaction strategies for parallel-coordinate-based visualization.
This model formalizes aggregation over multidimensional data at
the most-expressive level by making use of per-dimension
hierarchies. This approach treats all dimensions equivalently which
matches the way dimensions are handled in parallel coordinates.</p>
      <p>Based on this model, we presented a client/server system for
interactive exploration of large multidimensional data using
abstract parallel coordinates. We address the limitation of the
previous system, based on fixed partitions of dimension values, with
hierarchies that allow to interactively change partition. The
proposed drill-down operation enables the definition of an
arbitrarydetailed focus region on each axis while retaining context in
a reduced form. On the client-side, we showed how to display
focus and context regions with intuitive navigation cues. The
strength of this approach is that it supports exploration down to
the item-level while controlling the number of visual items, thus
ensuring perceptual scalability.</p>
      <p>On the server-side, the back-end processing is handled on a
distributed platform with components that scale horizontally.
Besides drill-down/roll-up operations, the implementation supports
standard parallel coordinate operations. Experimental results
demonstrate the scalability of the back-end system relative to the
size of the input data and to the resources allocated for
computation. The results indicate that the proposed system can support
increasingly large datasets by expanding its network of
computing units. To a certain extent, adding computing resources can
also reduce interaction latencies.</p>
      <p>One focus of the design is the bounded data transfer between
the client and server parts which relies on the single-focus
approach (on each axis) and the precomputation of dimension
hierarchies with a bounded arity k of small orders of magnitude.
The hierarchies being predefined allows for eficient filtering and
aggregation of items based on ancestors but may be limiting for
exploration. As a counterbalancing measure, future work could
add support for user-driven hierarchy refinement. Depending on
the scope and scale of the refinements this could be handled on
the client-side and/or enforced on the server-side.</p>
      <p>Another path for future work is methods for latency reduction
other than horizontal scaling. Space is a possible trade-of. Even
if the number of possible meta-edges makes their total
precomputation infeasible, partial precomputation could be investigated:
either beforehand or as a background process targeting
metaedges that are the most likely to be requested given the current
state.</p>
    </sec>
    <sec id="sec-16">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work has been carried out as part of the "REQUEST"
(PIAO18062645401) and "SpeedData" (PIAO17298-398711) projects supported
by the French "Investissement d’Avenir" Program (Big Data –
Cloud Computing topic). The authors would also like to thank
the anonymous referees for their valuable comments and helpful
suggestions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nigam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Ungar</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Eficient clustering of highdimensional data sets with application to reference matching</article-title>
          .
          <source>In Knowledge Discovery and Data Mining. ACM</source>
          ,
          <volume>169</volume>
          -
          <fpage>178</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Andrews</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Osmic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Schagerl</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Aggregated parallel coordinates: integrating hierarchical dimensions into parallel coordinates visualisations</article-title>
          .
          <source>In Proc. of Int. Conf. on Knowledge Technologies and Data-driven Business. ACM.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Andrienko</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Andrienko</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Parallel coordinates for exploring properties of subsets</article-title>
          .
          <source>In Proc. of Int. Conf. on Coordinated &amp; Multiple Views in Exploratory Visualization</source>
          .
          <fpage>93</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Bikakis</surname>
          </string-name>
          , G. Papastefanatos,
          <string-name>
            <given-names>M.</given-names>
            <surname>Skourla</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Sellis</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A hierarchical aggregation framework for eficient multilevel visual exploration and analysis</article-title>
          .
          <source>Semantic Web</source>
          <volume>8</volume>
          ,
          <issue>1</issue>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Börzsönyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Stocker</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>The Skyline Operator</article-title>
          .
          <source>In Proc. of Int. Conf. on Data Engineering. IEEE Computer Society</source>
          ,
          <fpage>421</fpage>
          -
          <lpage>430</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Selçuk Candan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. Di</given-names>
            <surname>Caro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Sapino</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>PhC: Multiresolution Visualization and Exploration of Text Corpora with Parallel Hierarchical Coordinates</article-title>
          .
          <source>ACM TIST 3</source>
          ,
          <issue>2</issue>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. H. T.</given-names>
            <surname>Claessen</surname>
          </string-name>
          and
          <string-name>
            <surname>J. J. van Wijk.</surname>
          </string-name>
          <year>2011</year>
          .
          <article-title>Flexible Linked Axes for Multivariate Data Visualization</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>17</volume>
          ,
          <issue>12</issue>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cockburn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Karlson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B. B.</given-names>
            <surname>Bederson</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>A review of overview+detail, zooming, and focus+context interfaces</article-title>
          .
          <source>Comput. Surveys</source>
          <volume>41</volume>
          ,
          <issue>1</issue>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ellis</surname>
          </string-name>
          , E. Bertini,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Dix</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>The sampling lens</article-title>
          .
          <source>In CHI ' 05</source>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G. P.</given-names>
            <surname>Ellis</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Dix</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A Taxonomy of Clutter Reduction for Information Visualisation</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>13</volume>
          ,
          <issue>6</issue>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Elmqvist</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.-D.</given-names>
            <surname>Fekete</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Hierarchical Aggregation for Information Visualization: Overview, Techniques, and Design Guidelines</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>16</volume>
          ,
          <issue>3</issue>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fisher</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Big data exploration requires collaboration between visualization and data infrastructures</article-title>
          .
          <source>In Proc. Workshop on Human-in-the-Loop Data Analytics.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Fua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.O.</given-names>
            <surname>Ward</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.A.</given-names>
            <surname>Rundensteiner</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Hierarchical parallel coordinates for exploration of large datasets</article-title>
          .
          <source>In Proc. Visualization'99</source>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Laramee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Roberts</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Walker</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Angular Histograms: Frequency-Based Visualizations for Large, High Dimensional Data</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>17</volume>
          ,
          <issue>12</issue>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Godfrey</surname>
          </string-name>
          , ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gryz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lasek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Razavi</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Interactive Visualization of Big Data</article-title>
          . Springer International Publishing,
          <volume>3</volume>
          -
          <fpage>22</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P.</given-names>
            <surname>Godfrey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gryz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Lasek</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Interactive Visualization of Large Data Sets</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>28</volume>
          ,
          <issue>8</issue>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stasko</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Weiskopf</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>The Parallel Coordinates Matrix</article-title>
          .
          <string-name>
            <surname>EuroVis-Short Papers</surname>
          </string-name>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Weiskopf</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>State of the Art of Parallel Coordinates</article-title>
          . In Eurographics 2013. Eurographics Association.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Inselberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Dimsdale</surname>
          </string-name>
          .
          <year>1990</year>
          .
          <article-title>Parallel Coordinates: A Tool for Visualizing Multi-dimensional Geometry</article-title>
          . In IEEE Visualization.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kosara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bendix</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Hauser</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Parallel Sets: Interactive Exploration and Visual Analysis of Categorical Data</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>12</volume>
          ,
          <issue>4</issue>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lex</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Streit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Partl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kashofer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Schmalstieg</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Comparative Analysis of Multidimensional, Quantitative Data</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>16</volume>
          ,
          <issue>6</issue>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Johansson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Cooper</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Many-to-Many Relational Parallel Coordinates Displays</article-title>
          .
          <source>In IV 2009. IEEE Computer Society.</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Heer</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>The Efects of Interactive Latency on Exploratory Visual Analysis</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>20</volume>
          ,
          <issue>12</issue>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Heer</surname>
          </string-name>
          .
          <year>2013</year>
          . imMens:
          <article-title>Real-time Visual Querying of Big Data</article-title>
          .
          <source>Comput. Graph. Forum</source>
          <volume>32</volume>
          ,
          <issue>3</issue>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T. V.</given-names>
            <surname>Long</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Linsen</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>MultiClusterTree: Interactive Visual Exploration of Hierarchical Clusters in Multidimensional Multivariate Data</article-title>
          .
          <source>Comput. Graph. Forum</source>
          <volume>28</volume>
          ,
          <issue>3</issue>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , J. Xu,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liang</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Data Visualization of Web Service with Parallel Coordinates and NodeTrix</article-title>
          .
          <source>In SCC 2014. IEEE Computer Society.</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosen</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>DSPCP: A Data Scalable Approach for Identifying Relationships in Parallel Coordinates</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          . (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Novotny</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Hauser</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Outlier-Preserving Focus+Context Visualization in Parallel Coordinates</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>12</volume>
          ,
          <issue>5</issue>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>G.</given-names>
            <surname>Palmas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bachynskyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oulasvirta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. P.</given-names>
            <surname>Seidel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Weinkauf</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>An Edge-Bundling Layout for Interactive Parallel Coordinates</article-title>
          .
          <article-title>In 2014 IEEE Pacific Vis</article-title>
          . Symp. IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>A.</given-names>
            <surname>Perrot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bourqui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hanusse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lalanne</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Auber</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Large interactive visualization of density functions on big data infrastructure</article-title>
          .
          <source>In Symp. on Large Data Analysis and Visualization</source>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>O.</given-names>
            <surname>Rübel</surname>
          </string-name>
          , Prabhat,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Childs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Meredith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. G. R.</given-names>
            <surname>Geddes</surname>
          </string-name>
          , E. CormierMichel, S. Ahern,
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Messmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hagen</surname>
          </string-name>
          , B. Hamann, and
          <string-name>
            <given-names>E. W.</given-names>
            <surname>Bethel</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>High performance multivariate visual data exploration for extremely large data</article-title>
          .
          <source>In SC '08.</source>
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sansen</surname>
          </string-name>
          , G. Richer,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jourde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lalanne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Auber</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Bourqui</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Visual Exploration of Large Multidimensional Data Using Parallel Coordinates on Big Data Infrastructure</article-title>
          .
          <source>Informatics 4</source>
          ,
          <issue>3</issue>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Guang</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Multi-Resolution Climate Ensemble Parameter Analysis with Nested Parallel Coordinates Plots</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>23</volume>
          ,
          <issue>1</issue>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>