Explainable Structuring and Discovery of Relevant Cases for Exploration of High-Dimensional Data

Joris Falip (joris.falip@univ-reims.fr), Frédéric Blanchard, Michel Herbin
CReSTIC, Reims, France
ABSTRACT
Data described by numerous features create a challenge for domain experts as they are difficult to manipulate, explore and visualize. With the increased number of features, a phenomenon called the "curse of dimensionality" arises: sparsity increases and distance metrics become less relevant as most elements of the dataset become equidistant. The result is a loss of efficiency for traditional machine learning algorithms. Moreover, many state-of-the-art approaches act as black boxes from a user's point of view and are unable to provide explanations for their results. We propose an instance-based method to structure datasets around important elements called exemplars. The similarity measure used by our approach is less sensitive to high-dimensional spaces, and provides both explainable and interpretable results: important properties for decision-making tools such as recommender systems. The described algorithm relies on exemplar theory to provide a data exploration tool suited to the reasoning used by experts of various fields. We apply our method to synthetic as well as real-world datasets and compare the results to recommendations made using a nearest neighbor approach.

CCS CONCEPTS
• Information systems → Decision support systems; • Human-centered computing → Interactive systems and tools; • Computing methodologies → Knowledge representation and reasoning.

KEYWORDS
instance-based algorithm; high-dimensional data; exploratory analysis; information visualization; exemplar theory; explainable machine learning; recommendation system

ACM Reference Format:
Joris Falip, Frédéric Blanchard, and Michel Herbin. 2019. Explainable Structuring and Discovery of Relevant Cases for Exploration of High-Dimensional Data. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 7 pages.

IUI Workshops'19, March 20, 2019, Los Angeles, USA. Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION
This work addresses the task of structuring a dataset to facilitate exploration and visualization. The proposed idea is to create pairwise connections between similar elements in order to provide a similarity graph usable for exploration and recommendation. Our goal is a solution well suited for domain experts, one that takes into account and even leverages the analogical reasoning most experts use: they often compare new cases or problems to previously encountered ones and base their decisions on past experiences. This way, the tool and the answers it provides feel more intuitive for end-users. We also emphasize compatibility with high-dimensional data: experts often rely on datasets described by hundreds or thousands of features, meaning that any solution must remain efficient even with a high number of dimensions. Finally, explainability and interpretability are critical in many fields including health and medicine, insurance, finance or security: a recommendation system is even more useful when end-users understand the process behind suggested decisions. It allows them to adapt and adjust the algorithm to their needs but, most of all, it increases confidence in the results.

To reach these objectives, we created an unsupervised instance-based algorithm that enriches data and structures it, linking each element to an exemplar exhibiting typical characteristics. The resulting graph can be used by a recommender system, building a solution to explore data by following edges, each one representing strong similarities on specific features. When exploring the dataset, going from an instance to its exemplar means going from a specific element to a more generic and archetypal one. Our approach processes each dimension individually, then aggregates the results, making this method suitable for very high-dimensional analysis. Moreover, it identifies the features contributing to similarities between elements, making it straightforward to explain the structuring choices. This strategy can be adapted to multiple contexts as it does not require any a priori knowledge.

In the following section, we detail the problems encountered when manipulating high-dimensional datasets and elaborate on the requirements of a successful exploration tool that would allow visualization of similarities between data points and could give insights on hidden patterns. The next section describes the proposed algorithm, the resulting structure and how the input parameter influences it. We then detail applications to various datasets, illustrating the benefits of this method. Finally, we discuss the proposed approach, review current work in progress and possibilities for future work.

2 BACKGROUND
While high-dimensional databases are frequently encountered in many real-world applications, experts lack appropriate and simple tools to make the best use of these datasets. Many knowledge discovery tools provide relevant results but lack interpretability or explainability: essential features when making any critical decision [9].


Also, when exploring and visualizing data, most algorithms rely on prototypes [15] generated by averaging similar elements. These prototype-based methods can be effective in some cases, but when manipulating data, experts of a specific field tend to understand a new element (or instance) by comparing it to the most similar instance they are familiar with [8, 16]. Experts naturally favor this approach when confronted with problems within their field of expertise, and it provides better results [10]. This reasoning is known as analogical reasoning and is formalized by exemplar theory [11, 12]. For example, a physician will use his work experience to diagnose a new patient by linking him to the most similar element in a set of typical patients encountered in the past. In analogical reasoning, typical instances of a dataset are called exemplars, each one subsuming a set of similar elements. These exemplars form a subset that can describe the whole dataset but remains small enough to manipulate and explore easily.

When analyzing data described by a high number of features, it becomes harder to use the distance between instances as a measure of similarity. This is because working with data described by a high number of features comes with two inherent problems: data become sparser as density decreases, and distances between elements concentrate, so neighbors are harder to distinguish [1]. These phenomena, often referred to as the "curse of dimensionality" [5], make many analysis and recommendation algorithms unreliable [4] because they use distances between instances to compute similarities. A standard solution to this problem is to use feature projection methods to reduce dimensionality, but this is not suitable if we wish to retain the interpretability of the process. On the other hand, feature selection reduces the genericity of the approach and requires time and a priori knowledge to select and filter attributes.

High dimensionality also gives elements new properties, like the hubness phenomenon [18], which can help to improve and develop data mining techniques. Hubness is the fact that increased dimensionality correlates with some instances appearing in the nearest neighbors of the majority of the other elements, turning these instances into so-called hubs. Hubness reduction methods, when coupled with traditional approaches, do not improve their results [7]. Fortunately, this phenomenon can be used to adapt algorithms to high-dimensional data [17].

3 STRUCTURING AROUND EXEMPLARS
Our proposed approach finds the most typical and "interesting" neighbor of each data point, without relying on a distance metric in the entire space [19]. We process each dimension separately before aggregating the results: our algorithm is less affected by high dimensionality, and interpretability is improved as we can quantify the importance of each dimension regarding associations. We also chose to use ranks instead of distances, to avoid excluding outlier elements.

Structuring a dataset by linking each element to its exemplar results in a graph where each connected component is a group of elements similar enough to be subsumed by a few selected exemplars. This graph allows for exploration of the dataset, with edges representing strong similarity between individuals.

From a practical point of view, the proposed method is currently implemented as a recommender system used by medical experts. It allows them to visualize and understand diabetes-related data by exploring an association graph [6] where each vertex is a patient and patients in the same connected component had a similar evolution of the disease.

Our algorithm is based on the Degree of Representativeness (DoR) [2]. The DoR measures the typicality of an element and its ability to be an exemplar for other elements. Algorithm 1 illustrates the selection of an exemplar for each element according to the DoR, while Algorithm 2 details the computation of the DoR. This new algorithm addresses the problem of high dimensionality: our previous work used distances in the full description space when computing the DoR, and was thus unreliable on datasets described by a high number of features.

3.1 Algorithm
Let $\Omega$ be a set of $N$ objects in a multidimensional space. These $N$ objects, or elements, are characterized by $D$ qualitative or quantitative features.

For each of the $D$ dimensions, we compute the distance matrix. Each object then ranks every other object according to their similarity on this dimension: low distance translates to a good ranking, with the nearest element being ranked 1 and the farthest ranked $N - 1$. This step is repeated on each dimension to transform the $D$ distance matrices into $D$ rank matrices.

Let us transform the ranks into scores. Let $x$ be an object of $\Omega$: for each dimension $d$, $x$ assigns a relative score $Score_x^d$ to every other object $y$ of $\Omega$. $Score_x^d$, relative to $x$, can be any arbitrary function, but in this paper it will be defined by:

$$Score_x^d(y) = \max(1 + T - Rank_x^d(y),\ 0) \quad (1)$$

where $Rank_x^d(y)$ is the rank of an object $y$ relative to $x$ on dimension $d$, and each element only assigns a non-zero score to its $T$ nearest neighbors. For each element $y$, we can compute the sum of all scores it received on a specific dimension:

$$Score^d(y) = \sum_{x \in \Omega} Score_x^d(y) \quad (2)$$

Let us define $k$ as a numeric parameter allowing control over the total number of exemplars. To find the exemplar of a given object $x$, we introduce the $DoR_x(y)$ of another element $y$ as the sum of the scores $Score^d(y)$ for every dimension $d$ where $y$ is in the $k$ nearest neighbors of $x$:

$$DoR_x(y) = \sum_{d \in D'} Score^d(y) \quad (3)$$

where $D'$ is the set of dimensions on which $y$ is in the $k$-nearest neighbors of $x$. The element chosen as an exemplar for object $x$ is the element with the highest $DoR_x$.
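To make the procedure above concrete, here is a minimal NumPy sketch of the whole pipeline: per-dimension ranks, the scoring function of Formula (1), the summed scores of Formula (2) and the DoR of Formula (3). It assumes purely numeric features, and the function and variable names are ours, not those of the published implementation.

```python
import numpy as np

def rank_matrix(column):
    """ranks[x, y] = Rank_x^d(y): 1 for the nearest neighbor of x on this
    dimension, N - 1 for the farthest (0 on the diagonal, fixed by the caller)."""
    n = len(column)
    dist = np.abs(column[:, None] - column[None, :])  # per-dimension distance matrix
    order = np.argsort(dist, axis=1)                  # nearest first; x itself comes first
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return ranks

def structure_around_exemplars(X, T=10, k=None):
    """Return, for each element of the (N, D) array X, the index of its exemplar."""
    n, d = X.shape
    k = k if k is not None else max(1, int(0.3 * n))  # paper's suggestion: ~30% of N
    dor = np.zeros((n, n))                            # dor[x, y] = DoR_x(y)
    for dim in range(d):
        ranks = rank_matrix(X[:, dim])
        np.fill_diagonal(ranks, n)                    # an element never scores itself
        score_d = np.maximum(1 + T - ranks, 0).sum(axis=0)  # Formulas (1) and (2)
        dor += (ranks <= k) * score_d[None, :]        # Formula (3): dims where y is in the k-NN of x
    np.fill_diagonal(dor, -np.inf)                    # an element cannot be its own exemplar
    return dor.argmax(axis=1)
```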
Data: N elements defined on D dimensions, neighborhood size K
Result: list of N exemplars
foreach dimension d in D do
    Compute the dissimilarity matrix of the elements;
    Transform dissimilarities into ranks;
    Convert ranks into scores with an arbitrary scoring function;
    foreach element e in N do
        Score^d(e) ← Σ_{x ∈ N} Score_x^d(e), the sum of the scores given to e by each element x;
    end
end
foreach element e in N do
    exemplar(e) ← argmax_{x ∈ N} DoR(x, e, K);
end
Result ← list of exemplars determined above;

Algorithm 1: Structuring algorithm

[Figure 1: scatter plot of the Iris dataset, Sepal Length vs. Sepal Width, colored by species (setosa, versicolor, virginica), with exemplar links overlaid.]
Figure 1: Application to the Iris dataset with neighborhood size k set to 30 creates numerous exemplars closely matching the elements they subsume.

Data: elements x and e defined on D dimensions, neighborhood size K
Result: Degree of Representativeness of element x according to element e
DoR ← 0;
foreach dimension d in D do
    Knn_d(e) ← K-nearest neighbors of e on dimension d;
    if x ∈ Knn_d(e) then
        DoR ← DoR + Score^d(x);
    end
end
Result ← DoR;

Algorithm 2: DoR computing algorithm
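As a toy usage sketch of Algorithms 1 and 2, reusing the hypothetical structure_around_exemplars function sketched in Section 3.1 (names and data are ours, for illustration only):

```python
import numpy as np

# Two well-separated clusters of three elements each, described by two features.
X = np.array([[1.0, 10.0], [1.1, 10.2], [1.2, 10.1],
              [5.0,  2.0], [5.1,  2.2], [4.9,  2.1]])

exemplars = structure_around_exemplars(X, T=2, k=2)
for i, e in enumerate(exemplars):
    print(f"element {i} -> exemplar {e}")
# Each cluster should elect one of its own members as exemplar, so following
# the element -> exemplar edges stays within groups of similar elements.
```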
[Figure 2: scatter plot of the Iris dataset, Sepal Length vs. Sepal Width, colored by species (setosa, versicolor, virginica), with exemplar links overlaid.]
Figure 2: Application to the Iris dataset with neighborhood size k set to 50 creates few exemplars that summarize the entire dataset.

3.2 Neighborhood size k and the resulting structure
When establishing pairwise connections between elements, the parameter k determines the total number of exemplars used to summarize the dataset. A low value for the neighborhood size (k around 20% of the number of elements) gives numerous exemplars, each one closely matching the few instances it subsumes. The high number of connected components in the resulting graph limits exploration of related instances by following edges, but this configuration will only group very similar elements. On the other hand, selecting a higher value (around 30% to 50% of the dataset's size) gives fewer exemplars that each subsume a significant subset of the population. These exemplars provide a meaningful but condensed representation of the data, and exploration of the resulting structure is easy: with few connected components in the graph, following edges allows the discovery of many similar elements. Figure 6, discussed in the results section, illustrates the number of connected components obtained for every possible value of k on a real-world dataset, and thus the different granularities obtainable in the final graph.

Figures 1 and 2 illustrate these differences: the former uses a neighborhood size of 30 and is composed of 44 exemplars, with an average of 2 elements subsumed by each exemplar, while the latter is obtained with a parameter set to 50, resulting in 25 exemplars and an average of 5 elements represented by each exemplar. Either one of these two graphs makes for an excellent recommendation system where, for each element, a similar but more generic and typical one can be suggested. It is also useful as a visualization tool, guiding users as they explore the dataset. Features playing the most prominent role in each association can even be overlaid on the edges to inform on the nature of the similarities guiding the exploration process.

Given the graph resulting from the structuring process, another option is to study its evolution for every value of the parameter k; for a set of N elements, this means 1 ≤ k ≤ N, as in the sketch below. From this, we can also derive other measures, including the total number of times a given element is chosen as an exemplar, or the average number of elements subsumed by each exemplar. These additional insights put the usefulness of each instance as an exemplar, and its ability to subsume other elements, into perspective.
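The sketch below (our own illustration, again assuming the hypothetical structure_around_exemplars function from Section 3.1) rebuilds the graph for every value of k and counts its connected components with networkx; plotting the counts against k reproduces the kind of granularity curve shown in Figure 6. It is written for clarity, not speed.

```python
import networkx as nx

def components_per_k(X, T=10):
    """Number of connected components of the exemplar graph for each k in 1..N."""
    n = X.shape[0]
    counts = {}
    for k in range(1, n + 1):
        exemplars = structure_around_exemplars(X, T=T, k=k)
        edges = [(i, int(e)) for i, e in enumerate(exemplars)]  # element -> exemplar
        counts[k] = nx.number_connected_components(nx.Graph(edges))
    return counts
```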

4 EXPERIMENTS
To evaluate the efficiency of the described approach, we analyze graphs created by our structuring algorithm and compare them to graphs generated using the nearest neighbor algorithm on high-dimensional datasets. We compare both methods using key indicators related to connected components in the graphs, to study the usefulness of each structure as a recommendation tool in a data exploration scenario. Table 1 summarizes the studied datasets along with their respective numbers of elements and dimensions.

    dataset            topic              elements    dimensions
    normal             N/A                300         1000
    residential [13]   real estate        372         103
    communities [14]   law enforcement    2,215       101

Table 1: Description of the synthetic and real-world datasets used for comparison of the graph structure given by the proposed algorithm and a nearest neighbor selection.

[Figure 3: two example graph structures, A) and B), with the same number of elements.]
Figure 3: Two connected components featuring the same number of elements. Structuration A) does not provide insights regarding similarities, while B) allows step-by-step exploration with growing typicality when starting from any element.

normal is a synthetic dataset composed of 300 elements described on 1000 dimensions. 80% of the dimensions are sampled from a normal distribution centered on 0 with a standard deviation of 1. The remaining 20% of dimensions are noise sampled from a uniform distribution between 0 and 1. This very high-dimensional example outlines the previously mentioned "curse of dimensionality": with a thousand dimensions, it is a sparse dataset and distances between elements become hard to distinguish.

Residential Buildings [13] (abbreviated "residential") and Communities and Crime [14] (abbreviated "communities") are both datasets made available through the UCI machine learning repository [3]. They are described with more than a hundred features, some of them being potential targets for prediction algorithms. residential contains physical, financial and economic data related to 372 construction projects of residential apartments in Tehran, the capital and largest city of Iran. communities combines data related to law enforcement as well as crime and its socio-economic factors. It aggregates information about 2,215 cities from the United States, collected between 1990 and 1995.

Before any analysis, we removed prediction targets and irrelevant features (such as element IDs) from both datasets to guarantee that two similar elements will be close to one another in the full description space. Dimensions containing missing values were also pruned in the preprocessing step, for a total of 2 and 46 features removed from the residential and communities datasets respectively, the latter containing numerous features that are prediction targets. This was done to avoid noise that would be detrimental mostly to the nearest neighbor method. The number of dimensions noted in Table 1 accounts for the remaining dimensions.

To create the similarity graph with our structuring around exemplars, we set the parameter T from Formula 1 to 10, meaning the scoring function will give a positive score to the ten nearest neighbors of each instance. As a reminder, this function is chosen arbitrarily as a generic scoring function but can be adapted for a specific dataset, given some a priori knowledge. The parameter we can fine-tune is the neighborhood size used in the last step of our algorithm, when choosing an exemplar for each element. However, because this parameter allows for a tradeoff between edges representing very strong similarities and graphs exhibiting structure and properties suitable for exploration (a low number of connected components with a high diameter), we will not seek to optimize it, to avoid skewing the comparison. As suggested in the previous section, we define k as 30% of the number of elements in the dataset, so we can expect the right balance between meaningful associations and optimal graph structure.

This graph is compared to another graph generated by linking each element of the dataset to its nearest neighbor using the Minkowski distance of order 0.75. While using an order lower than 1 violates the triangular inequality, it provides significantly better results in high-dimensional spaces than Euclidean or even Manhattan distances [1]. This way we can be sure that elements linked together by the nearest neighbor method will be as close as possible in the full-dimensional space.

We select three criteria to evaluate the suitability of each graph as an exploration tool; a sketch computing them follows below. The first two are the number of connected components and their size: they inform us on whether, starting from an element, following edges representing similarities will allow visualization of a broad selection of related instances, or whether exploration will quickly stop due to a lack of similar elements. We also study the average diameter of the components for each graph: a lower diameter indicates that most elements chose the same exemplar, as in Figure 3.a.
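A minimal sketch of this experimental setup (our illustration, not the authors' code): it generates a dataset matching the description of normal, builds the nearest neighbor baseline graph under a Minkowski distance of order 0.75, and computes the three criteria with networkx. The exemplar graph would be built the same way from the structure_around_exemplars sketch of Section 3.1.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "normal" dataset: 300 elements, 80% Gaussian dimensions, 20% uniform noise.
X = np.hstack([rng.normal(0.0, 1.0, size=(300, 800)),
               rng.uniform(0.0, 1.0, size=(300, 200))])

def minkowski_nn_graph(X, p=0.75):
    """Link each element to its nearest neighbor under the order-p Minkowski distance."""
    n = X.shape[0]
    edges = []
    for i in range(n):
        d = (np.abs(X - X[i]) ** p).sum(axis=1) ** (1.0 / p)
        d[i] = np.inf                      # exclude the element itself
        edges.append((i, int(d.argmin())))
    return nx.Graph(edges)

def graph_criteria(g):
    """The three evaluation criteria: component count, mean size, mean diameter."""
    comps = [g.subgraph(c) for c in nx.connected_components(g)]
    return (len(comps),
            float(np.mean([len(c) for c in comps])),
            float(np.mean([nx.diameter(c) for c in comps])))

print(graph_criteria(minkowski_nn_graph(X)))
```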
                                                                        exemplar, as in Figure 3.a. A greater diameter, similar to the structure
A greater diameter, similar to the structure illustrated in Figure 3.b, gives the possibility to progressively explore from a specific element to the most typical one.

                  exemplar structuring           nearest neighbor
     dataset    number     size   diameter   number    size   diameter
    normal         18     16.7      5.2         22     13.6      2.8
  residential      64      5.8      2.7        105      3.5      2.1
 communities      256      8.7      3.2        400      5.5      2.5

Table 2: Analysis of graphs obtained with the proposed approach and a nearest neighbor selection approach. The number of connected components and their mean size and mean diameter are detailed.

5 RESULTS
Results of our experiments are shown in Table 2. For each method and dataset, we display the number of connected components obtained and compute their mean size and mean diameter. We can see that, even for very high-dimensional data, our approach results in a better structuring of the instances, where elements are linked to another, more archetypal exemplar within each component. This is outlined by the smaller number of connected components and their overall larger diameter. This trend can be seen for every dataset we studied: the number of components is reduced by 18% to 39%, with promising results on the two real-world applications.

[Figure 4: graph of the residential dataset structured by nearest neighbor linking.]
Figure 4: Structuring residential data by linking every element to its nearest neighbor using a Minkowski distance of order 0.75. There are numerous connected components with only two or three elements: these are not suitable for exploration based on similarity.

Figure 4 shows the structure obtained with the nearest neighbor approach on the residential dataset. In this case, the data are weakly structured, making it hard for a user to gain any insight or to use this graph for exploration. Figure 5 illustrates the same dataset where instances are structured around exemplars. The structure makes more sense if used to visualize similarities: connected components contain more elements and have a larger diameter. Navigating the data is easier, and more instances are available as exemplars, so users can choose the level of genericity they require for an exemplar by following multiple edges to an exemplar subsuming more nodes, as sketched below. Such an exemplar will be farther from the initial instance, but more archetypal, and may better suit analogical reasoning by exhibiting typical characteristics.
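A short sketch of this chain-following idea (our own illustration; exemplars is the array produced by the hypothetical structure_around_exemplars sketch from Section 3.1):

```python
def exemplar_chain(start, exemplars, max_steps=10):
    """Follow element -> exemplar edges until a fixed point or a cycle is reached."""
    chain, seen = [start], {start}
    for _ in range(max_steps):
        nxt = int(exemplars[chain[-1]])
        if nxt in seen:          # reached the most archetypal element of the component
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain                 # e.g. [instance, its exemplar, a more generic exemplar, ...]
```

Each extra step trades proximity to the starting instance for a more generic exemplar.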
[Figure 5: graph of the residential dataset structured around exemplars.]
Figure 5: Structuring residential data using exemplars to create meaningful associations. This creates bigger connected components containing similar instances.

With Figure 6 we illustrate the number of connected components in the graph created with various values of the parameter k. This figure demonstrates how k can be used as a granularity factor to choose on a spectrum ranging from a few components subsumed by very archetypal exemplars to numerous subsets composed of closely matching elements. The maximum number of components for any value of k is equal to the number of components created by the nearest neighbor approach, meaning that even for a very low value of k the proposed structure is no less suitable for exploration than the nearest neighbor one.

To provide insights on how the data is structured, we detail edges from a small connected component shown in Figure 7. This component was extracted from the residential dataset, with the same parameters previously used for Figure 5: neighborhood size k of 124 and the threshold T from Formula 1 set to 10. We can describe the following:

• Vertices 1 and 2 both chose vertex 3 as their exemplar because 3 is in the k-nearest neighbors of those elements on 89 dimensions, so they each are similar to 3 regarding most features of the dataset. The dimensions linking 1 to 3 are the same as those linking 2 to 3.
• Elements 1 and 2 are similar to each other on 96 features. However, they have not chosen each other as exemplars because they both gave instance 3 a cumulated score of more than 4,600, while they evaluated each other at only around 2,000.
• 3 is similar to 9 on 96 features, attributing it a total score of 5,000. As 3 only had a score of 4,600, 9's higher score confirms that it is indeed more typical. 1 and 2 were similar to 3 on 89 features, of which 88 are also among the 96 similar features between 3 and 9.

[Figure 6: curve of the number of connected components (y-axis, 0 to 100) against the input parameter K (x-axis, 0 to 300).]
Figure 6: Evolution of the number of connected components depending on the input parameter k, when structuring the residential dataset.

[Figure 7: a small connected component extracted from the structured residential dataset, with numbered vertices.]
Figure 7: A connected component extracted from the result shown in Figure 5. The labels are arbitrary and added for clarity.

6 CONCLUSION
To summarize this work, we proposed an instance-based structuring method that computes similarities between complex elements. It creates pairwise connections between related instances, with exemplars subsuming similar elements. This structure allows for intuitive exploration of a dataset and is relevant when creating a recommendation system. We validated our approach by studying a simulated dataset and data from the real estate market and law enforcement. We compared our results to a nearest neighbor approach suited to high dimensionality, where proximities in the full-dimensional space were computed using a Minkowski distance of order 0.75 to avoid concentration of the distances. On these high-dimensional examples we outlined better performances that are less subject to the "curse of dimensionality" usually encountered with these types of data. We also experimented with the use of the parameter k as a granularity factor giving users a straightforward way to influence similarities and the structuring process.

We are currently testing this algorithm's results when applied to medical data, to help endocrinologists better understand diabetes and diagnose patients more accurately. Future work includes further development of our currently deployed prototype to easily gather user feedback on each association. This feedback could be used to create an automatic weighting of the features, tailoring the recommendation to each user's preferences.

6.1 Source code
The source code implementing the proposed algorithm and used to create the figures and results in this paper is available on the following public GitHub repository: https://github.com/Elesday/ESIDA-IUI19.

REFERENCES
[1] Charu C Aggarwal, Alexander Hinneburg, and Daniel A Keim. 2001. On the Surprising Behavior of Distance Metrics in High Dimensional Space. In Database Theory — ICDT 2001. Springer Berlin Heidelberg, Berlin, Heidelberg, 420–434.
[2] Frédéric Blanchard, Amine Aït-Younes, and Michel Herbin. 2015. Linking Data According to Their Degree of Representativeness (DoR). EAI Endorsed Transactions on Industrial Networks and Intelligent Systems 2, 4 (June 2015), e2.
[3] Dua Dheeru and Efi Karra Taniskidou. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
[4] Pedro Domingos. 2012. A few useful things to know about machine learning. Commun. ACM 55, 10 (2012), 78–87.
[5] David L Donoho et al. 2000. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture 1 (2000), 32.
[6] Joris Falip, Amine Aït Younes, Frédéric Blanchard, Brigitte Delemer, Alpha Diallo, and Michel Herbin. 2017. Visual instance-based recommendation system for medical data mining. In KES. 1747–1754.
[7] Roman Feldbauer and Arthur Flexer. 2018. A comprehensive empirical comparison of hubness reduction in high-dimensional spaces. Knowledge and Information Systems (May 2018), 1–30.
[8] Gary Klein, Roberta Calderwood, and Anne Clinton-Cirocco. 2010. Rapid Decision Making on the Fire Ground: The Original Study Plus a Postscript. Journal of Cognitive Engineering and Decision Making 4, 3 (Sept. 2010), 186–209.
[9] Zachary C Lipton. 2016. The mythos of model interpretability. arXiv preprint arXiv:1606.03490 (2016).
[10] Michael L Mack, Alison R Preston, and Bradley C Love. 2013. Decoding the Brain's Algorithm for Categorization from Its Neural Implementation. Current Biology 23, 20 (Oct. 2013), 2023–2027.
[11] Douglas L Medin and Marguerite M Schaffer. 1978. Context theory of classification learning. Psychological Review 85, 3 (1978), 207–238.
[12] Robert M Nosofsky. 1986. Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General 115, 1 (1986), 39–57.
[13] Mohammad Hossein Rafiei and Hojjat Adeli. 2015. A novel machine learning model for estimation of sale prices of real estate units. Journal of Construction Engineering and Management 142, 2 (2015).
[14] Michael Redmond. 2009. Communities and crime data set. UCI Machine Learning Repository (2009).
[15] Eleanor H Rosch. 1973. Natural categories. Cognitive Psychology 4, 3 (1973), 328–350.
[16] Jeffrey N Rouder and Roger Ratcliff. 2006. Comparing Exemplar- and Rule-Based Theories of Categorization. Current Directions in Psychological Science 15, 1 (2006), 9–13.
[17] Nenad Tomasev and Dunja Mladenic. 2011. Nearest Neighbor Voting in High-Dimensional Data: Learning from Past Occurrences. In International Conference on Data Mining Workshops. IEEE, 1215–1218.
[18] Nenad Tomasev, Milos Radovanovic, Dunja Mladenic, and Mirjana Ivanovic. 2014. The Role of Hubness in Clustering High-Dimensional Data. IEEE Transactions on Knowledge and Data Engineering 26, 3 (2014), 739–751.
[19] Yingzhen Yang, Xinqi Chu, Feng Liang, and Thomas S Huang. 2012. Pairwise Exemplar Clustering. AAAI (2012).