=Paper=
{{Paper
|id=Vol-2327/ESIDA4
|storemode=property
|title=Explainable Structuring and Discovery of Relevant Cases for Exploration of High-Dimensional Data
|pdfUrl=https://ceur-ws.org/Vol-2327/IUI19WS-ESIDA-4.pdf
|volume=Vol-2327
|authors=Joris Falip,Frédéric Blanchard,Michel Herbin
|dblpUrl=https://dblp.org/rec/conf/iui/FalipBH19
}}
==Explainable Structuring and Discovery of Relevant Cases for Exploration of High-Dimensional Data==
Joris Falip (joris.falip@univ-reims.fr), CReSTIC, Reims, France
Frédéric Blanchard, CReSTIC, Reims, France
Michel Herbin, CReSTIC, Reims, France
ABSTRACT
Data described by numerous features create a challenge for domain experts as it is difficult to manipulate, explore and visualize them. With the increased number of features, a phenomenon called "curse of dimensionality" arises: sparsity increases and distance metrics are less relevant as most elements of the dataset become equidistant. The result is a loss of efficiency for traditional machine learning algorithms. Moreover, many state-of-the-art approaches act as black-boxes from a user point of view and are unable to provide explanations for their results. We propose an instance-based method to structure datasets around important elements called exemplars. The similarity measure used by our approach is less sensitive to high-dimensional spaces, and provides both explainable and interpretable results: important properties for decision-making tools such as recommender systems. The described algorithm relies on exemplar theory to provide a data exploration tool suited to the reasoning used by experts of various fields. We apply our method to synthetic as well as real-world datasets and compare the results to recommendations made using a nearest neighbor approach.

CCS CONCEPTS
• Information systems → Decision support systems; • Human-centered computing → Interactive systems and tools; • Computing methodologies → Knowledge representation and reasoning.

KEYWORDS
instance-based algorithm; high-dimensional data; exploratory analysis; information visualization; exemplar theory; explainable machine learning; recommendation system

ACM Reference Format:
Joris Falip, Frédéric Blanchard, and Michel Herbin. 2019. Explainable Structuring and Discovery of Relevant Cases for Exploration of High-Dimensional Data. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 7 pages.

IUI Workshops'19, March 20, 2019, Los Angeles, USA. Copyright © 2019 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION
This work addresses the task of structuring a dataset to facilitate exploration and visualization. The proposed idea is to create pairwise connections between similar elements in order to provide a similarity graph usable for exploration and recommendation. Our goal is a solution well-suited for domain experts, one that takes into account and even leverages the analogical reasoning most experts use: they often compare new cases or problems to previously encountered ones and base their decisions on past experiences. This way, the tool and the answers it provides would feel more intuitive for end-users. We also emphasize compatibility with high-dimensional data: experts often rely on datasets described by hundreds or thousands of features, meaning that any solution must remain efficient even with a high number of dimensions. Finally, explicability and interpretability are critical in many fields including health and medicine, insurance, finance or security: a recommendation system is even more useful when end-users understand the process behind suggested decisions. It allows them to adapt and adjust the algorithm to their needs but, most of all, it increases confidence in the results.

To reach these objectives, we created an unsupervised instance-based algorithm that enriches data and structures it, linking each element to an exemplar exhibiting typical characteristics. The resulting graph can be used by a recommender system, building a solution to explore data by following edges, each one representing strong similarities on specific features. When exploring the dataset, going from an instance to its exemplar means going from a specific element to a more generic and archetypal one. Our approach processes each dimension individually then aggregates the results, making this method suitable for very high-dimensional analysis. Moreover, it identifies the features contributing to similarities between elements, making it straightforward to explain the structuration choices. This strategy can be adapted to multiple contexts as it does not require any a priori knowledge.

In the following section, we detail the problems encountered when manipulating high-dimensional datasets and elaborate on the requirements of a successful exploration tool that would allow visualization of similarities between data points and could give insights on hidden patterns. The next section describes the proposed algorithm, the resulting structure and how the input parameter influences it. We then detail applications to various datasets, illustrating the benefits of this method. Finally, we discuss the proposed approach, review current work in progress and possibilities for future work.

2 BACKGROUND
While high-dimensional databases are frequently encountered with many real-world data, experts lack appropriate and simple tools to make the best use of these datasets and information. Many knowledge discovery tools provide relevant results but lack interpretability or explainability: essential features when making any critical decision [9].
Also, when exploring and visualizing data, most algorithms rely on prototypes [15] generated with averages from similar elements. These prototype-based methods can be effective in some cases but, when manipulating data, experts of a specific field tend to understand a new element (or instance) by comparing it to the most similar instance they are familiar with [8, 16]. Experts naturally favor this approach when confronted with problems within their field of expertise, and it provides better results [10]. This reasoning is known as analogical reasoning and is formalized by the exemplar theory [11, 12]. For example, a physician will use his work experience to diagnose a new patient by linking them to the most similar element in a set of typical patients he encountered in the past. In analogical reasoning, typical instances of a dataset are called exemplars, each one subsuming a set of similar elements. These exemplars form a subset that can describe the whole dataset but remains small enough to manipulate and explore easily.

When analyzing data described by a high number of features, it becomes harder to use the distance between instances as a measure of similarity. This is because working with data described by a high number of features comes with two inherent problems: data become sparser as density decreases, and distances between elements normalize, so neighbors are harder to distinguish [1]. These phenomena, often referred to as the "curse of dimensionality" [5], make many analysis and recommendation algorithms unreliable [4] because they use distances between instances to compute similarities. A standard solution to this problem is to use feature projection methods to reduce dimensionality, but this is not suitable if we wish to retain the interpretability of the process. On the other hand, feature selection reduces the genericity of the approach and requires time and a priori knowledge to select and filter attributes.

High-dimensionality also provides new properties to elements, like the hubness phenomenon [18], which can help to improve and develop data mining techniques. Hubness is the fact that increased dimensionality correlates with some instances being in the nearest neighbors of the majority of the other elements, turning these instances into so-called hubs. Hubness reduction methods, when coupled with traditional approaches, do not improve their results [7]. Fortunately, this phenomenon can be used to adapt algorithms to high-dimensional data [17].

3 STRUCTURING AROUND EXEMPLARS
Our proposed approach finds the most typical and "interesting" neighbor of each data point, without relying on a distance metric in the entire space [19]. We process each dimension separately before aggregating the results: our algorithm is less affected by high dimensionality, and interpretability is improved as we can quantify the importance of each dimension regarding associations. We also chose to use ranks instead of distances, to avoid excluding outlier elements.

Structuring a dataset by linking each element to its exemplar results in a graph where each connected component is a group of elements similar enough to be subsumed by a few selected exemplars. This graph allows for exploration of the dataset, with edges representing strong similarity between individuals.

From a practical point of view, the proposed method is currently implemented as a recommender system used by medical experts. This allows them to visualize and understand diabetes-related data by exploring an association graph [6] where each vertex is a patient and patients in the same connected component had a similar evolution of the disease.

Our algorithm is based on the Degree of Representativeness [2] (DoR). The DoR measures the typicality of an element and its ability to be an exemplar for other elements. Algorithm 1 illustrates the selection of an exemplar for each element according to the DoR, while Algorithm 2 details the computation of the DoR. This new algorithm addresses the problem of high-dimensionality: our previous work used distance in the full description space when computing the DoR, thus being unreliable with datasets described by a high number of features.

3.1 Algorithm
Let Ω be a set of N objects in a multidimensional space. These N objects, or elements, are characterized by D qualitative or quantitative features.

For each of the D dimensions, we compute the distance matrix. Each object then ranks every other object according to their similarity on this dimension: a low distance translates to a good ranking, with the nearest element being ranked 1 and the farthest ranked N − 1. This step is repeated on each dimension to transform the D distance matrices into D rank matrices.

Let us transform the ranks into scores. Let x be an object of Ω: for each dimension d, x assigns a relative score Score_x,d to every other object y of Ω. Score_x,d, relative to x, can be any arbitrary function, but in this paper it will be defined by:

    Score_x,d(y) = max(1 + T − Rank_x,d(y), 0)    (1)

where Rank_x,d(y) is the rank of an object y relative to x on dimension d, and each element only assigns a non-zero score to its T nearest neighbors. For each element y, we can compute the sum of all scores it received on a specific dimension:

    Score_d(y) = Σ_{x ∈ Ω} Score_x,d(y)    (2)

Let us define k as a numeric parameter allowing control over the total number of exemplars. To find the exemplar of a given object x, we introduce the DoR_x(y) of another element y as the sum of the scores Score_d(y) for every dimension d where y is in the k nearest neighbors of x:

    DoR_x(y) = Σ_{d ∈ D′} Score_d(y)    (3)

where D′ is the set of dimensions on which y is in the k-nearest neighbors of x. The element chosen as an exemplar for object x is the element with the highest DoR_x.
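To make the scoring step concrete, here is a minimal sketch of Formulas 1 and 2 for a single dimension. It is not the authors' released implementation: the function name, the toy values and the use of an absolute-difference dissimilarity are our own assumptions.

```python
import numpy as np

def scores_on_dimension(values, T):
    """Sum of scores received by each element on one dimension (Formula 2).

    values: 1-D array holding the value of every element on dimension d.
    T:      only the T nearest neighbors of an element receive a non-zero score.
    """
    # Pairwise dissimilarities on this single dimension (absolute difference here).
    dist = np.abs(values[:, None] - values[None, :])
    np.fill_diagonal(dist, np.inf)                  # an element does not rank itself
    # Rank_x,d(y): 1 for the nearest neighbor of x, ..., N-1 for the farthest.
    rank = dist.argsort(axis=1).argsort(axis=1) + 1
    # Formula 1: Score_x,d(y) = max(1 + T - Rank_x,d(y), 0).
    score = np.maximum(1 + T - rank, 0)
    np.fill_diagonal(score, 0)
    # Formula 2: Score_d(y), the sum over every x of the scores given to y.
    return score.sum(axis=0)

# Six elements described by one feature, with T = 3.
print(scores_on_dimension(np.array([0.1, 0.2, 0.25, 0.8, 0.9, 5.0]), T=3))
```

Repeating this computation on every dimension yields the D score vectors that the Degree of Representativeness then aggregates.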
Data: N elements defined on D dimensions, neighborhood size K
Result: list of N exemplars
foreach dimension d in D do
    Compute the dissimilarity matrix of the elements;
    Transform the dissimilarities into ranks;
    Convert the ranks into scores with an arbitrary scoring function;
    foreach element e in N do
        Score_d(e) ← Σ_{x ∈ N} Score_x,d(e), the sum of the scores given to e;
    end
end
foreach element e in N do
    exemplar(e) ← argmax_{x ∈ N} DoR(x, e, K);
end
Result ← list of exemplars determined above;

Algorithm 1: Structuring algorithm

Data: elements x and e defined on D dimensions, neighborhood size K
Result: Degree of Representativeness of element x according to element e
DoR ← 0;
foreach dimension d in D do
    Knn_d(e) ← K-nearest neighbors of e on dimension d;
    if x ∈ Knn_d(e) then
        DoR ← DoR + Score_d(x);
    end
end
Result ← DoR;

Algorithm 2: DoR computing algorithm
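The pseudocode above can be condensed into a short NumPy sketch. This is a naive re-implementation under our own assumptions (numeric features, absolute-difference dissimilarity on each dimension, self-matches excluded, k defaulting to 30% of the dataset size); the code actually used for the experiments is the one linked in Section 6.1.

```python
import numpy as np

def structure_around_exemplars(X, T=10, k=None):
    """Assign an exemplar to every row of X, following Algorithms 1 and 2.

    X: (n, d) array of n elements described by d numeric features.
    T: number of nearest neighbors receiving a non-zero score (Formula 1).
    k: neighborhood size used when accumulating the DoR (Formula 3).
    Returns the exemplar index of each element and the (d, n) score matrix.
    """
    n, d = X.shape
    k = k if k is not None else max(1, int(0.3 * n))
    ranks = np.empty((d, n, n), dtype=int)         # ranks[dim, x, y] = Rank_x,dim(y)
    scores = np.empty((d, n))                      # scores[dim, y]  = Score_dim(y)
    for dim in range(d):
        dist = np.abs(X[:, dim][:, None] - X[:, dim][None, :])
        np.fill_diagonal(dist, np.inf)             # elements do not rank themselves
        ranks[dim] = dist.argsort(axis=1).argsort(axis=1) + 1
        given = np.maximum(1 + T - ranks[dim], 0)  # Formula 1
        np.fill_diagonal(given, 0)
        scores[dim] = given.sum(axis=0)            # Formula 2
    exemplars = np.empty(n, dtype=int)
    for x in range(n):
        dor = np.zeros(n)                          # DoR_x(y) for every candidate y
        for dim in range(d):
            in_knn = ranks[dim, x] <= k            # dimensions where y is in the k-NN of x
            dor[in_knn] += scores[dim, in_knn]     # Formula 3
        dor[x] = -np.inf                           # an element is not its own exemplar
        exemplars[x] = int(dor.argmax())
    return exemplars, scores

# Example run on random data: 50 elements, 8 numeric features.
rng = np.random.default_rng(0)
exemplars, _ = structure_around_exemplars(rng.normal(size=(50, 8)), T=10)
print(exemplars[:10])
```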
3.2 Neighborhood size k and the resulting structure
When establishing pairwise connections between elements, parameter k determines the total number of exemplars used to summarize the dataset. A low value for the neighborhood size (k about 20% of the number of elements) gives numerous exemplars, each one closely matching the few instances they subsume. The high number of connected components in the resulting graph limits exploration of related instances by following edges, but this configuration will only group very similar elements. On the other hand, selecting a higher value (about 30% to 50% of the dataset's size) gives fewer exemplars that each subsume a significant subset of the population. These exemplars provide a meaningful but condensed representation of the data and exploration of the resulting structure is easy: with few connected components in the graph, following edges allows the discovery of many similar elements. Figure 6, discussed in the results section, illustrates the number of connected components obtained for every possible value of k on a real-world dataset and thus the different granularities obtainable in the final graph.

Figures 1 and 2 illustrate these differences: the former uses a neighborhood size of 30 and is composed of 44 exemplars, with an average of 2 elements subsumed by each exemplar; the latter is obtained with a parameter set to 50, resulting in 25 exemplars for an average of 5 elements represented by each exemplar. Either one of these two graphs makes for an excellent recommendation system where, for each element, a similar but more generic and typical one can be suggested. It is also useful as a visualization tool, guiding users as they explore the dataset. Features playing the most prominent role in each association can even be overlaid on the edges to inform on the nature of the similarities guiding the exploration process.

[Figure 1 plot: Iris dataset, Sepal Length vs. Sepal Width, colored by species (setosa, versicolor, virginica).]
Figure 1: Application to the Iris dataset with neighborhood size k set to 30 creates numerous exemplars closely matching the elements they subsume.

[Figure 2 plot: Iris dataset, Sepal Length vs. Sepal Width, colored by species (setosa, versicolor, virginica).]
Figure 2: Application to the Iris dataset with neighborhood size k set to 50 creates few exemplars that summarize the entire dataset.

Given the graph resulting from the structuring process, another option is to study its evolution for every value of the parameter k. For a set of N elements, this means 1 ≤ k ≤ N. From this, we can also establish other measures including the total number of iterations a given element is chosen as an exemplar, or the average number of elements subsumed by each exemplar. These additional
insights put more perspective on the usefulness of each instance as
an exemplar and its ability to subsume other elements.
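Since the output is an element-to-exemplar assignment, the indicators discussed above and used in the experiments below (number of connected components, mean size, mean diameter) can be read off the corresponding graph. A small sketch with networkx, on a hypothetical assignment (the indices are arbitrary and only serve as an illustration):

```python
import networkx as nx

def component_stats(exemplars):
    """Number of connected components, mean size and mean diameter of the
    graph linking each element i to its exemplar exemplars[i]."""
    g = nx.Graph()
    g.add_nodes_from(range(len(exemplars)))
    g.add_edges_from((i, e) for i, e in enumerate(exemplars) if i != e)
    parts = [g.subgraph(c) for c in nx.connected_components(g)]
    sizes = [p.number_of_nodes() for p in parts]
    diameters = [nx.diameter(p) for p in parts]
    return len(parts), sum(sizes) / len(sizes), sum(diameters) / len(diameters)

# Toy assignment with two components: {0, 1, 2, 3} around element 3 and {4, ..., 9} around element 9.
print(component_stats([3, 3, 3, 1, 9, 9, 9, 9, 9, 4]))   # -> (2, 5.0, 2.0)
```

Re-running such a computation for every value of k gives curves like the one shown later in Figure 6.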
4 EXPERIMENTS
To evaluate the efficiency of the described approach, we analyze
graphs created by our structuration algorithm and compare them
to graphs generated using the nearest neighbor algorithm in high-
dimensional datasets. We compare both methods using key indica-
tors related to connected components in the graphs, to study the
usefulness of each structure as a recommendation tool in a data
exploration scenario. Table 1 summarizes the studied datasets along
with their respective number of elements and dimensions.
dataset            topic              elements   dimensions
normal             N/A                300        1000
residential [13]   real estate        372        103
communities [14]   law enforcement    2215       101

Table 1: Description of the synthetic and real-world datasets used for comparison of the graph structure given by the proposed algorithm and a nearest neighbor selection.

normal is a synthetic dataset composed of 300 elements described on 1000 dimensions. 80% of their dimensions are sampled from a normal distribution centered on 0 with a standard deviation of 1. The remaining 20% of dimensions are noise sampled from a uniform distribution between 0 and 1. This very high-dimensional example outlines the previously mentioned "curse of dimensionality": with a thousand dimensions, it is a sparse dataset and distances between elements become hard to distinguish.

Residential Buildings [13] (abbreviated "residential") and Communities and Crime [14] (abbreviated "communities") are both datasets made available through the UCI machine learning repository [3]. They are described with more than a hundred features, some of them being potential goals for prediction algorithms. residential contains physical, financial and economic data related to 372 construction projects of residential apartments in Tehran, the capital and largest city of Iran. communities combines data related to law enforcement as well as crime and its socio-economic factors. It aggregates information about 2215 cities from the United States, collected between 1990 and 1995.

Before any analysis, we removed prediction goals and irrelevant features (such as element IDs) from both datasets to guarantee that two similar elements will be close to one another in the full description space. Dimensions containing missing values were also pruned in the preprocessing step, representing a total of 2 and 46 features removed from the residential and communities datasets respectively, the latter containing numerous features that are prediction goals. This was done to avoid noise that would be detrimental mostly to the nearest neighbor method. The number of dimensions noted in Table 1 accounts for the remaining dimensions.

To create the similarity graph with our structuration around exemplars, we set to 10 the parameter T from Formula 1, meaning the scoring function will give a positive score to the ten nearest neighbors of each instance. As a reminder, this function is chosen arbitrarily as a generic scoring function but can be adapted for a specific dataset, given some a priori knowledge. The parameter we can fine-tune is the neighborhood size used in the last step of our algorithm when choosing an exemplar for each element. However, because this parameter allows for a tradeoff between edges representing very strong similarities and graphs exhibiting structure and properties suitable for exploration (low number of connected components with high diameter), we will not seek to optimize this parameter to avoid skewing the comparison. As suggested in the previous section, we define k as 30% of the number of elements in the dataset so we can expect the right balance between meaningful associations and optimal graph structure.

This graph is compared to another graph generated by linking each element of the dataset to its nearest neighbor using the Minkowski distance of order 0.75. While using an order lower than 1 violates the triangular inequality, it provides significantly better results in high-dimensional spaces than Euclidean or even Manhattan distances [1]. This way we can be sure that elements linked together by the nearest neighbor method will be as close as possible in the full dimensional space.

Figure 3: Two connected components featuring the same number of elements. Structuration A) does not provide insights regarding similarities, while B) allows step by step exploration with growing typicality when starting from any element.

We select three criteria to evaluate the suitability of each graph as an exploration tool. The first two are the number of connected components and their size: they inform us on whether, starting from an element, following edges representing similarities will allow visualization of a broad selection of related instances or if exploration will quickly stop due to a lack of similar elements. We also study the average diameter of the components for each graph: a lower diameter indicates that most elements chose the same exemplar, as in Figure 3.a. A greater diameter, similar to the structure illustrated in Figure 3.b, gives the possibility to progressively explore from a specific element to the most typical one.
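As a reference, here is a sketch of this experimental setup: the synthetic normal dataset as described above and the fractional-Minkowski nearest neighbor baseline. The generation code and the helper name are our own illustration, not the released experiment scripts; the distance is computed explicitly because an order below 1 is rejected as a metric by many libraries.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic "normal" dataset: 300 elements on 1000 dimensions,
# 80% of the dimensions drawn from N(0, 1), 20% of uniform noise in [0, 1].
n, d = 300, 1000
d_normal = int(0.8 * d)
X = np.hstack([rng.normal(0.0, 1.0, size=(n, d_normal)),
               rng.uniform(0.0, 1.0, size=(n, d - d_normal))])

def nearest_neighbor_minkowski(X, p=0.75):
    """Index of each element's nearest neighbor under a Minkowski distance of order p."""
    nn = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        dist = (np.abs(X - x) ** p).sum(axis=1) ** (1.0 / p)
        dist[i] = np.inf                  # exclude the element itself
        nn[i] = int(dist.argmin())
    return nn

print(nearest_neighbor_minkowski(X)[:10])
```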
                 exemplar structuring            nearest neighbor
dataset          number   size    diameter       number   size    diameter
normal           18       16.7    5.2            22       13.6    2.8
residential      64       5.8     2.7            105      3.5     2.1
communities      256      8.7     3.2            400      5.5     2.5

Table 2: Analysis of graphs obtained with the proposed approach and a nearest neighbor selection approach. The number of connected components and their mean size and mean diameter are detailed.
5 RESULTS
Results of our experiments are shown in Table 2. For each method and dataset we display the number of connected components obtained, and compute their mean size and mean diameter. We can see that, even for very high-dimensional data, our approach results in a better structuring of the instances, where elements are linked to another, more archetypal exemplar within each component. This is outlined by the smaller number of connected components and their overall larger diameter. This trend can be seen for every dataset we studied: reducing the number of components by 18% to 39% and demonstrating promising results on the two real-world applications.

Figure 4 shows the structure obtained with the nearest neighbor approach on the residential dataset. In this case, data are weakly structured, making it hard for a user to gain any insight or to use this graph for exploration. Figure 5 illustrates the same dataset where instances are structured around exemplars. The structure makes more sense if used to visualize similarities: connected components contain more elements and have a larger diameter. Navigating the data is easier and more instances are available as exemplars, so users can choose the level of genericity they require for an exemplar by following multiple edges to an exemplar subsuming more nodes. Such an exemplar will be farther from the initial instance, but more archetypal, and may better suit analogical reasoning by exhibiting typical characteristics.

Figure 4: Structuring residential data by linking every element to its nearest neighbor using a Minkowski distance of order 0.75. There are numerous connected components with only two or three elements: these are not suitable for exploration based on similarity.

Figure 5: Structuring residential data using exemplars to create meaningful associations. This creates bigger connected components containing similar instances.

With Figure 6 we illustrate the number of connected components in the graph created with various values for the parameter k. This figure demonstrates how k can be used as a granularity factor to choose on a spectrum ranging from a few components subsumed by very archetypal exemplars to numerous subsets composed of closely matching elements. The maximum number of components for any value of k is equal to the number of components created by the nearest neighbor approach, meaning that even for a very low value of k the proposed structure is no less suitable for exploration than the nearest neighbor one.

[Figure 6 plot: number of connected components (0–100) as a function of the input parameter K (0–300).]
Figure 6: Evolution of the number of connected components depending on the input parameter k, when structuring the residential dataset.
To provide insights on how the data is structured, we detail edges from a small connected component shown in Figure 7. This component was extracted from the residential dataset, with the same parameters previously used for Figure 5: neighborhood size k of 124 and the threshold T from Formula 1 set to 10. We can describe the following:

• Vertices 1 and 2 both chose vertex 3 as their exemplar because 3 is in the k-nearest neighbors of those elements on 89 dimensions, so they each are similar to 3 regarding most features of the dataset. The dimensions linking 1 to 3 are the same as those linking 2 to 3.
• Elements 1 and 2 are similar to each other on 96 features. However, they have not chosen each other as exemplars because they both gave instance 3 a cumulated score of more than 4,600 while they evaluated each other at only around 2,000.
• 3 is similar to 9 on 96 features, attributing it a total score of 5,000. As 3 only had a score of 4,600, 9's higher score confirms that it is indeed more typical. 1 and 2 were similar to 3 on 89 features, of which 88 are also among the 96 similar features between 3 and 9.

Figure 7: A connected component extracted from the result shown in Figure 5. The labels are arbitrary and added for clarity.
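Explanations of this kind can be read directly from the intermediate rank and score matrices defined in Section 3.1. A minimal sketch, with hypothetical inputs, of a helper listing the dimensions that support an element-exemplar association and the resulting cumulated score:

```python
import numpy as np

def explain_association(x, y, ranks, scores, k):
    """Dimensions on which y is in the k-nearest neighbors of x, and DoR_x(y).

    ranks:  (d, n, n) array where ranks[dim, a, b] is Rank_a,dim(b).
    scores: (d, n) array where scores[dim, b] is Score_dim(b).
    """
    dims = [dim for dim in range(ranks.shape[0]) if ranks[dim, x, y] <= k]
    return dims, float(sum(scores[dim, y] for dim in dims))

# Hypothetical example: 3 dimensions, 4 elements, identical ranks on every dimension.
ranks = np.array([[[4, 1, 2, 3], [1, 4, 2, 3], [2, 1, 4, 3], [1, 2, 3, 4]]] * 3)
scores = np.array([[0.0, 5.0, 3.0, 1.0]] * 3)
print(explain_association(x=0, y=1, ranks=ranks, scores=scores, k=2))
# -> ([0, 1, 2], 15.0): element 1 supports element 0 on all three dimensions.
```

This per-dimension breakdown is the kind of information that, as mentioned in Section 3.2, can be overlaid on the edges of the graph to justify each association.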
6 CONCLUSION
To summarize this work, we proposed an instance-based structuration method that computes similarities between complex elements. It creates pairwise connections between related instances, with exemplars subsuming similar elements. This structure allows for intuitive exploration of a dataset and is relevant when creating a recommendation system. We validated our approach by studying a simulated dataset and data from the real estate market and law enforcement. We compared our results to a nearest neighbor approach suited for high-dimensionality, where proximities in the full dimensional space were computed using a Minkowski distance of order 0.75 to avoid concentration of the distances. On these high-dimensional examples we outlined better performances that are less subject to the "curse of dimensionality" usually encountered with these types of data. We also experimented on the use of parameter k as a granularity factor giving users a straightforward way to influence similarities and the structuring process.

We are currently testing this algorithm's results when applied to medical data, to help endocrinologists to better understand diabetes and diagnose patients more accurately. Future works include further development of our currently deployed prototype to easily gather user feedback on each association. This feedback could be used to create an automatic weighting of the features, tailoring the recommendation to each user's preferences.

6.1 Source code
The source code implementing the proposed algorithm and used to create the figures and results in this paper is available on the following public Github repository: https://github.com/Elesday/ESIDA-IUI19.
REFERENCES
[1] Charu C Aggarwal, Alexander Hinneburg, and Daniel A Keim. 2001. On the Surprising Behavior of Distance Metrics in High Dimensional Space. In Database Theory — ICDT 2001. Springer Berlin Heidelberg, Berlin, Heidelberg, 420–434.
[2] Frédéric Blanchard, Amine Aït-Younes, and Michel Herbin. 2015. Linking Data According to Their Degree of Representativeness (DoR). EAI Endorsed Transactions on Industrial Networks and Intelligent Systems 2, 4 (June 2015), e2.
[3] Dua Dheeru and Efi Karra Taniskidou. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml
[4] Pedro Domingos. 2012. A few useful things to know about machine learning. Commun. ACM 55, 10 (2012), 78–87.
[5] David L Donoho et al. 2000. High-dimensional data analysis: The curses and blessings of dimensionality. AMS Math Challenges Lecture 1 (2000), 32.
[6] Joris Falip, Amine Aït Younes, Frédéric Blanchard, Brigitte Delemer, Alpha Diallo, and Michel Herbin. 2017. Visual instance-based recommendation system for medical data mining. In KES. 1747–1754.
[7] Roman Feldbauer and Arthur Flexer. 2018. A comprehensive empirical comparison of hubness reduction in high-dimensional spaces. Knowledge and Information Systems (May 2018), 1–30.
[8] Gary Klein, Roberta Calderwood, and Anne Clinton-Cirocco. 2010. Rapid Decision Making on the Fire Ground: The Original Study Plus a Postscript. Journal of Cognitive Engineering and Decision Making 4, 3 (Sept. 2010), 186–209.
[9] Zachary C Lipton. 2016. The mythos of model interpretability. arXiv preprint arXiv:1606.03490 (2016).
[10] Michael L Mack, Alison R Preston, and Bradley C Love. 2013. Decoding the Brain's Algorithm for Categorization from Its Neural Implementation. Current Biology 23, 20 (Oct. 2013), 2023–2027.
[11] Douglas L Medin and Marguerite M Schaffer. 1978. Context theory of classification learning. Psychological Review 85, 3 (1978), 207–238.
[12] Robert M Nosofsky. 1986. Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General 115, 1 (1986), 39–57.
[13] Mohammad Hossein Rafiei and Hojjat Adeli. 2015. A novel machine learning model for estimation of sale prices of real estate units. Journal of Construction Engineering and Management 142, 2 (2015).
[14] Michael Redmond. 2009. Communities and crime data set. UCI Machine Learning Repository (2009).
[15] Eleanor H Rosch. 1973. Natural categories. Cognitive Psychology 4, 3 (1973), 328–350.
[16] Jeffrey N Rouder and Roger Ratcliff. 2006. Comparing Exemplar- and Rule-Based Theories of Categorization. Current Directions in Psychological Science 15, 1 (2006), 9–13.
[17] Nenad Tomasev and Dunja Mladenic. 2011. Nearest Neighbor Voting in High-Dimensional Data: Learning from Past Occurrences. In International Conference on Data Mining Workshops. IEEE, 1215–1218.
[18] Nenad Tomasev, Milos Radovanovic, Dunja Mladenic, and Mirjana Ivanovic. 2014. The Role of Hubness in Clustering High-Dimensional Data. IEEE Transactions on Knowledge and Data Engineering 26, 3 (2014), 739–751.
[19] Yingzhen Yang, Xinqi Chu, Feng Liang, and Thomas S Huang. 2012. Pairwise Exemplar Clustering. AAAI (2012).