=Paper= {{Paper |id=Vol-3078/paper-52 |storemode=property |title=Unsupervised Data Pattern Discovery on the Cloud |pdfUrl=https://ceur-ws.org/Vol-3078/paper-52.pdf |volume=Vol-3078 |authors=Thomas Cecconello,Lucas Puerari,Giuseppe Vizzari |dblpUrl=https://dblp.org/rec/conf/aiia/CecconelloPV21 }} ==Unsupervised Data Pattern Discovery on the Cloud== https://ceur-ws.org/Vol-3078/paper-52.pdf
Unsupervised Data Pattern Discovery on the Cloud
Thomas Cecconello1 , Lucas Puerari1 and Giuseppe Vizzari1
1
    Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Milan, Italy


                                         Abstract
                                         Scientific research implies the production of data describing phenomena still not studied and well under-
                                         stood. Sometimes the amount and rate of generation of produced data can be overwhelming, and anyway
                                         tools supporting a computer assisted analysis of scientific data can support systematic forms of data driven
                                         analysis. Machine learning can be an instrument in an overall flow including domain experts and computer
                                         scientists. Adopted machine learning approaches need to be unsupervised, employing just the input
                                         data as a teacher. We propose a two-step workflow: (i) achieving a compact representation of elements
                                         of the dataset by means of representation learning techniques, shifting the analysis from cumbersome
                                         representations to compact vectors in a latent space, and (ii) clustering points associated to instances to
                                         suggest patterns to the domain experts that will evaluate their potential meaning within the domain. The
                                         paper presents the rationale of the approach within a cloud based setting, and first experiments on an image
                                         dataset from the literature.

                                         Keywords
                                         Representation learning, Pattern discovery, Clustering, Cloud




1. Introduction
Scientific research is a human activity that often implies the production of new or improved
measurement tools, leading to the generation of new data, that needs to be analyzed, either to
corroborate existing theories, or to support the generation of new ones. Next generation astronom-
ical facilities, for instance, such as the Square Kilometre Array will generate an overwhelming
volume of data at a rate that simply cannot be matched by our ability to make sense out of them.
   Sometimes, moreover, it is impossible to use supervised techniques to support these researches:
the studied phenomena are often object of intense study and classification schemes are still
not agreed upon, they might be uncertain, or new data was acquired exactly with the goal of
defining a classification scheme that was impossible to come up with using previously available
data. Automated/semi-automated tools are needed to support this kind of research: machine
learning can be an instrument in an overall workflow including domain experts and computer
scientists. We emphasize that adopted machine learning approaches need to be essentially
unsupervised, employing just the input data as a teacher: as we will discuss more thoroughly
in Sect. 3.1, this kind of investigation has seen recently a certain interest and attention with
particular reference to so-called self–supervised learning approaches. This term refers to a way

AIxIA 2021 Discussion Papers
" thomas.cecconello@unimib.it (T. Cecconello); lucas.puerari@unimib.it (L. Puerari); giuseppe.vizzari@unimib.it
(G. Vizzari)
 0000-0002-7916-6438 (G. Vizzari)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
of framing an unsupervised learning problem so as to apply supervised learning algorithms to
solve it; this typically implies finding a way to create a loss function not requiring user labelling
of data (although it might consider forms of automatic labelling). The workflow proposed in
this paper comprises two steps: (i) achieving a compact representation of the elements of the
dataset by means of representation learning [1] techniques, shifting the following analysis from
cumbersome representations to compact vectors in a latent space, and (ii) clustering points
associated to instances of the starting dataset to suggest patterns to the domain experts that will
evaluate their potential meaning in the studied domain. The steps could, of course, be iterated,
with changes in the operating parameters and hyperparameters of the involved algorithms, and
proper user interfaces are required to support these activities by end users that are not computer
scientists. This kind of investigation certainly has significant relationships with techniques
sometimes referred to as deep clustering [2], which also mostly (but not exclusively) consider
unsupervised techniques employing deep learning approaches and techniques supporting analyses
based on clustering (some of which propose a very similar workflow, mostly excluding forms of
visual analysis, but clearly comprising forms of representation learning before actual clustering).
Nonetheless, we want to emphasize that we need to be as informative as possible, taking a
human-in-the-loop approach, and also consider that the human is not necessarily an expert in
machine learning techniques. Our goal here is basically to support knowledge creation taking a
data-driven perspective, either to suggest or support a form of validation of more theoretically
guided approaches.
   Another relevant aspect of our approach is cloud orientation. Due to the potentially significant
computational requirements of the tasks included in the workflow, cloud architectures can
represent extremely promising approaches, also supporting the integration with data repositories
complying FAIR1 principles within an Open Science perspective2 . This is the context in which
the NEANIAS project (Novel EOSC Services for Emerging Atmosphere, Underwater & Space
Challenges), and the present work are set. The project is aimed at contributing to the European
Open Science Cloud (EOSC)3 , also by means of the development and integration of innovative
cross-cutting services for tackling operationally space-related studies. The following Section
will elaborate the motivations and context in which this research is set. Sect. 3 will describe
the overall workflow and the comprised steps, while Sect 4 will describe experimental results
achieved with a preliminary version of the service implementing the workflow. Conclusions and
future developments will end the paper.


2. Motivation and Context
The European Open Science Cloud (EOSC) is a long term initiative recognised and funded by
the Council of the European Union having the ambition to European researchers, innovators,
companies and citizens with a federated and open multi-disciplinary environment where they
can publish, find and re-use data, tools and services for research, innovation and educational
purposes. Until 2020, i.e. within the Horizon 2020 programme, EOSC federated existing research

   1
     https://www.go-fair.org/fair-principles/
   2
     https://www.oecd.org/science/inno/open-science.htm
   3
     https://eosc-portal.eu/about/eosc
data infrastructures in Europe, and started the realization of a web of FAIR data and related
services for science, making research data interoperable and machine actionable following the
FAIR guiding principles [3]. Within this context, the NEANIAS project aims at contributing
by developing, integrating, and disseminating to the relevant communities a set of innovative
cross-cutting services, in particular for tackling operationally space-related studies. Some of
these services employ AI services and, in particular, Machine Learning tools supporting data
analysis.
   Within the Work Package 4 of the project (Space Research Services), we face several situations
that imply forms of analysis of astronomical images that can be framed as supervised machine
learning approaches: for instance, images depicting radio maps of the galactic plane in different
bands can be analyzed for the detection and classification of sources of signals [4]. In these
cases, in general, there are available annotated datasets, and of course experts have accumulated
sufficient knowledge supporting annotators in their operations. There are other situations, however,
in which studied images are still object of intense study, maybe because the number of considered
objects is still not really high (for instance, supernovae remnants [5]), or because knowledge
required for annotating astronomical images is still being constructed (the interpretation of certain
physical observations is still object of debate). Discussions with domain experts led to consider
the possibility to support analyses of the available (and soon to be generated and distributed)
datasets by means of unsupervised ML approaches, that could be used to generate visualizations
of the overall dataset and comprising images within a representation of a latent space associated to
the dataset itself, in the vein of [6] and not far from the extremely recent approach described in [7],
or even to perform pattern discovery operations and clustering, suggesting potential classifications
schemes. This kind of computer support to domain expert research activities could be useful at
least in two cases within NEANIAS (in particular the above mentioned study of infrared and
radio images of supernova remnants and also analysis of images associated to compact sources
(clumps) that could lead to interesting innovative insights on star formation), but we are confident
that this might be developed and deployed as a service of wider interest and applicability within
the EOSC.


3. Proposed Workflow
The proposed workflow and the set of implied tasks for unsupervised data pattern discovery,
in particular in image datasets, is summarized in Figure 1. For simplicity we excluded a data
preparation step, that however is often extremely important and that is definitely not trivial within
the context of astronomical and multi-spectral images (with serious methodological issues about
how to manage normalization especially considering images acquired by means of different types
of sensors). Nonetheless, for sake of space, we are going to focus on the steps bringing to the
construction of a latent space based on the dataset, and in subsequent phases of analysis of this
space, both through visualization techniques and clustering.
   While the representation learning task must be carried out immediately after data preparation,
so as to achieve a latent space from the points associated to images within the dataset, clustering
and visual analysis can be considered as potentially parallel tasks: first of all, the user might be
proposed immediately a visualization of the structure of the latent space, in which images that
Figure 1: Proposed workflow for unsupervised data pattern discovery.


generated it are positioned. To do this, however, since the latent space is typically characterized
by a relatively high dimensionality, a specific technique for dimensionality reduction must be
adopted to achieve a 2D or 3D map of the latent space. Clustering points in this space does not
strictly require a visualization, but, of course, being able to have an at-a-glance representation of
clustered images, their neighborhood in the latent space, being able to navigate it and visually
inspect the grouping suggested by the clustering algorithm execution can help the user evaluating
the achieved results, and their plausibility within the domain of research. In the following
we will articulate the techniques we selected for the different above mentioned tasks, that is,
representation learning, dimensionality reduction and visualization, and clustering.

3.1. Representation learning
A growing number of unsupervised or self-supervised representation learning techniques de-
veloped to deal with pictorial data are being developed and experimented to avoid or simplify
(partly automating) time consuming and potentially expensive image labelling tasks. Despite the
recent interest, one of the earliest approaches for this kind of task is based on Auto Encoders
(AE) [8]: within this approach, encoding and decoding neural networks are trained using a loss
signal related to the difference between the input and reconstructed image. At the point of contact
between encoder and decoder we have a latent vector representation of the image presented
as input. From this venerable approach, that produced extremely interesting results for noise
reduction and image compression limiting the loss of quality, several additional techniques have
been generated, in particular Variational Auto Encoders (VAE) [9] and Generative Adversarial
Networks (GAN) [10]. While basic AEs are very simple and easily implemented, the overall
approach has the final goal of effectively reconstructing the inputs minimizing errors, but the
resulting latent space might not help identifying meaningful groups of elements within the dataset
that generated the latent space. For related reasons, also VAE and GAN are not generally suited
to generate latent spaces simplifying an effective visual analysis and clustering of the starting
dataset4 .
   More recent works in the self-supervised representation learning context were specifically
devised for simplifying image classification tasks, significantly reducing the need of annotated
data. In particular, some self-supervised techniques are based on a two step approach [12]: a first
step not requiring supervision is used essentially to provide a first structure of the latent space that
is subsequently tuned by using a relatively small labeled dataset. The adopted transfer learning
approach implies that the labeled dataset covers relatively well the categories of subjects included
in the initial unlabeled dataset. Relevant representatives of this approach are MOCO [13] and
SimCLR [14]. Although the last approach is just partly applicable to our specific context, since
we do not have any annotated image due to the fact that the classification scheme is unknown,
the unsupervised part of SimCLR has been successfully used as a generator of latent space and
encoder for a subsequent clustering approach, which we will describe later on. Moreover, there is
a recent report of a successful adoption of this technique with a small initial dataset, with small
images and small network architectures [15]: in our case, the low requirements at least on the
number of images required for training is very important and it suggests this could be a promising
representation learning approach.

3.2. Latent Space Visualization
In order to support an at-a-glance evaluation of both the structure of the latent space and the
intuitive possibility to perform effectively a clustering of the images within the dataset, it
can be useful to produce a reduction of the latent space to 2D or 3D spaces. This of course
implies a reduction of dimensionality, since the dimension of the latent space is generally quite
substantial. Commonly used algorithms for performing this reduction are Principal Component
Analysis (PCA) [16], t-distributed stochastic neighbor embedding (TSNE) [17] and Uniform
Manifold Approximation and Projection (UMAP) [18]. These techniques can have problems in
simultaneously preserving the capability to represent local and global structures within the latent
space, so more recent approaches (in particular TriMAP[19], PaCMAP[20], and denseMAP [21])
are working towards an improvement of this aspect.
   Besides the techniques, it is relevant to mention the fact that relevant projects aimed at providing
fully fledged tools for the visualization of latent spaces and datasets that were used to generate
them can be found in the literature. In particular, Latent Space Cartography tool [6] is a general
tool providing 2D maps in which images within the dataset are positioned trying to capture
“semantic” dimensions (e.g. using a dataset of emojis, the tool can show the interpolation between
to points like a happy face and a sad one: moving along the line the smile fades gradually).
Moving to our context of application, a very recent work [7] visualizing a latent space of a
multidimensional dataset of galaxies generated through a self-supervised approach (SimCLR)
and adopting UMAP for sake of dimensionality reduction helps understanding the physics of a
certain type of analysed galaxies. In particular, in this case there is no particular discovery, but a
confirmation of the presence of two well-known categories of galaxies (rotating main-sequence
disks and massive slow rotators) from a purely data driven perspective. This application suggests
that the proposed workflow and overall approach is promising also for supporting the analysis of

    4
        Although the are some GAN based approaches actually focused on this task, such as [11]
situations in which instead classifications are still not present.

3.3. Clustering Latent Spaces
We do not intend, nor we think it is reasonable within this paper, to provide a brief introduction
to clustering here. In fact, we are not really facing a general clustering problem, since we are
actually proposing to analyze a latent space whose structure and “semantics” is actually unknown.
We will just briefly report a few relevant already tested or promising approaches that are currently
being evaluated, also considering that the project is still ongoing and we are still focused on the
representation learning and visualization steps.
   Within Sect. 3.1 we already suggested that the SimCLR self-supervised approach has been
used to generate a latent space later analysed by means of a clustering approach called SCAN [22].
Within this approach, authors propose training a network performing clustering (i.e. predicting the
cluster assignment for a given individual): of course, the number of clusters in which the dataset
must be subdivided must be known in advance, similarly as for K-Means. Unlike K-Means,
authors make assumptions on the distribution of points in the different clusters (since they are
considering image datasets they assume that it is very unlikely the situation in which one or
very few classes dominate others), and this is reflected on the defined loss function. Whereas
this assumption does not make sense in our case, this represents an interesting success case for
clustering a latent space.
   Moreover, within our situation, the number of clusters is actually unknown and also their
shapes within the latent space. For this reason we think that density based approaches such as
DBSCAN [23] could represent a reasonable starting point, although a distance metric should be
identified and proper calibration should be carried out. While this aspect can be problematic in a
situation in which complete automation is desirable, within our case the domain experts should
be kept in the loop and therefore it does not really represent a insurmountable problem, provided
that proper support to the selection of relevant hyperparameters is provided.


4. Experimental Framework and Results
Starting from the above discussion, we started developing and testing Latent Space Explorer, a
cloud oriented framework supporting the workflow described in Figure 1, first of all to support
the exploration of the practical issues related to latent space visual analysis, even before moving
in the clustering step. In fact, the number of alternative options already in the representation
learning and in the latent space dimensionality reduction steps call for an initial focusing on these
aspects, also on already existing datasets before moving to the specific context of application at
hand.

4.1. Architecture
A decade ago a manifesto named “twelve-factor app”5 was published: it describes a guidelines for
building software-as-a-service (SaaS) systems. Most of the proposed best-practices are actually

   5
       https://12factor.net
Figure 2: Architecture view focused on stateless and scalablility features. Two users contact the
same endpoint that redirects the requests to most available client resource. The pods does not
store any information locally, so the next request does not need to be redirected at the same pod.
The same happens for client server communication.


fulfilled by design through a proper adoption of Kubernetes6 and by current CI/CD systems,
while other ones still need attention in the design of an architecture.
   Looking at Figure 2, two related features are highlighted: stateless processes and process
scalability. If each pod receives complete information required to fulfill an incoming request
without saving any information, then the requester does not need to be aware of which pod to
contact, and therefore it could submit the request to any available pod. This feature, therefore,
helps scaling pods as needed.
   Another important feature for making cloud native applications more in tune with the “twelve-
factor app” manifesto is related to the management of enabling services. Figure 3, on the top
portion, represents an example of a a situation in which the actual system architecture depends
on unique services for tasks such as cloud storage and scheduling. On the other hand, if for
some reason a project needs to migrate to another storage service, this dependency leads to hard
refactoring of the code to support the migration. The “twelve-factor app” design guide suggests
to separate the management of supporting services into another component, in order to isolate
code changes.
   The approach adopted in the desing and implementation of the Latent Space Explorer tries to
combine a modern cloud oriented approach with the two above principles. The first version of
the service is publicly available for usage7 and a small user guide has also been released8 . The
overall project is going to be released adopting an open source license (still to be defined), and
all the relevant information will also be made available through the above links and through the
NEANIAS project communication channels.
     6
       Kubernetes is an open-source container-orchestration system for automating computer application deployment,
scaling, and management - See: https://kubernetes.io/
     7
       https://lse.neanias.eu/
     8
       https://docs.neanias.eu/projects/s3-service/en/latest/services/latent_space_explorer.html
Figure 3: Architecture view focused on backing services


4.2. Workflow and Visualization
While the framework is still under active development and it has still not reached the desired
level for the first release, it can already be used to support forms of visualizations of reductions
(to 2D and 3D spaces) of a latent space resulting from the representation learning step of the
proposed workflow. A sample screenshot of the working version of Latent Space Explorer
is shown in Figure 4: the left column shows information about the specific experiment to
perform and visualize, allowing the specification/selection of the dataset, associated latent space,
dimensionality reduction technique, and even clustering algorithm (each with the associated
hyperparameters when applicable). The central part is the navigable visualization of the latent
space, including points associated to the images of the dataset. The right column shows details
about the selected point, or statistics about the latent space and/or clustering.
   Within this phase of the work, the main value of the potential usage of the tool is actually
the visualization of the 2D or 3D reduction of a latent space, to have an at-a-glance idea of the
structure achieved through the representation learning step. Figure 5 shows a visualization of
the UMAP reduction the latent space associated to the Standford Dogs dataset [24] (a subset of
Imagenet with more fine-grained labels that represent 120 dog breeds) considering the latent space
associated to a resnet158 3x pre-trained on Imagenet with SimCLRV2. The color of the points is
associated to the actual breed of the dog associated to the image, and it can be easily seen that
visually peculiar dogs are positioned in areas of the space that are well separated from the others;
visual features of dogs are of course most relevant (it seems clear, for instance, that fur length is
short in the left hand side and long in the right hand side of the space). Nonetheless, contextual
elements (e.g. snowy background or a beach, presence of a cage) are also plausibly relevant in
Figure 4: A screenshot of Latent Space Explorer.


determining the positioning of points in space. Although the visual inspection of this kind of latent
space reveals potentially useful information to a viewer, clustering this kind of space with the
aim of identifying dog breeds would hardly produce immediately usable results. Let us consider,
however, different network architectures, to have an idea of the effect of changing the size of the
network on the capability of better representing “semantic” aspects of the dataset. Figure 6 shows
a comparison of different latent spaces associated to different network architectures, selected
from the list of available pre-trained (on Imagenet) networks. The base model is a Resnet [25]
with different hyperparameters: in particular, use of selective kernel (SK) [26], different depths
[50,101,158], and width [1x, 3x]. In general we can see that latent spaces associated to deeper and
wider networks are generally better at discriminating dog breeds, plausibly being able to grasp
more information about the dataset and comprised images. Even the selective kernel achieves
better results.
   The visual analysis of the reduction of the latent space to a 2D/3D structure is therefore clearly
important to have an idea of how the result of the representation learning step. It is very reasonable
to inspect this kind of visualization, maybe trying different alternative representation learning
approaches (or sets of values for the hyperparameters) before actually even trying to move to the
clustering phase. Our situation, the analysis of astronomical images, is characterised by smaller
datasets (although, as suggested in the introduction, initiatives like the Square Kilometre Array
will plausibly change the situation significantly) of images that are much less cluttered than
those present in Imagenet (and in its subset represented by the Stanford Dogs). On the one hand,
the representation learning approaches will have to face less “distractions” (although the data
preparation phase will necessary face previously mentioned challenges related to normalization
Figure 5: Visualization of the UMAP reduction (𝑚𝑖𝑛𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒 = 0.1, 𝑛𝑛 𝑒𝑖𝑔ℎ𝑏𝑜𝑟𝑠 = 15) of the
latent space associated to the Standford Dogs dataset using Resnet152 with layer widths 3x,
pre-trained on Imagenet with SimCLRV2.


within a multi-spectral images context), on the other we will not be able to use large deep neural
network architectures (the training would simply not converge with so little data available),
and this could make it hard to process some fine details that might be extremely important. In
order to produce interesting results we will need to consider recent developments within the
self-supervised techniques [15] context that promise to be able to provide useful results even with
small datasets (of potentially small images), with small networks.


5. Conclusions
The paper has presented the current status of development of a project aimed at supporting unsu-
pervised data pattern discovery within datasets of images, particulrarly focused on astronomical
images, but in general usable for performing workflows including representation learning, leading
to the definition of a latent space associated to the dataset, and forms of analysis ranging from
visualization of 2D/3D reductions of the latent space and even clustering. We have set the work
within the context of the NEANIAS project and more generally within EOSC, and we have
discussed the motivation, rationale of the approach, discussing the relevant state of the art. The
current status of the project was presented, and its current potential usage to visually inspect latent
spaces was discussed. We tried to clarify why the visual analysis of the latent space structure
is important, not just because a human-in-the-loop perspective is crucial within a knowledge
discovery and creation for scientific research scenario, but also since it is actually useful to
have a preliminary idea of the potential outcomes of clustering operations on the latent space.
Current and future works on the project are focused on (i) a preliminary evaluation of the potential
Figure 6: A comparison of different latent spaces representations including actual ground truth
(based on the Stanford dogs dataset).


results employing target datasets related to infrared and radio images of supernova remnants and
compact sources (clumps) that could lead to interesting innovative insights on star formation, (ii)
the adoption of techniques for the representation learning step that are particularly suited to deal
with small datasets. In the medium run, we also intend to release the produced framework as a
service within the NEANIAS project ecosystem and also as an open source project.

Acknowledgments
The authors want to acknowledge support by the NEANIAS (Novel EOSC Services for Emerging
Atmosphere, Underwater & Space Challenges) project, funded by the EC Horizon 2020 research
and innovation programme under Grant Agreement No. 863448.


References
 [1] Y. Bengio, A. Courville, P. Vincent, Representation learning: A review and new perspectives,
     IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (2013) 1798–1828.
 [2] E. Min, X. Guo, Q. Liu, G. Zhang, J. Cui, J. Long, A survey of clustering with deep learning:
     From the perspective of network architecture, IEEE Access 6 (2018) 39501–39514.
 [3] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak,
     N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, J. Bouwman, A. J. Brookes,
     T. Clark, M. Crosas, I. Dillo, O. Dumon, S. Edmunds, C. T. Evelo, R. Finkers, A. Gonzalez-
     Beltran, A. J. G. Gray, P. Groth, C. Goble, J. S. Grethe, J. Heringa, P. A. C. ’t Hoen, R. Hooft,
     T. Kuhn, R. Kok, J. Kok, S. J. Lusher, M. E. Martone, A. Mons, A. L. Packer, B. Persson,
     P. Rocca-Serra, M. Roos, R. van Schaik, S.-A. Sansone, E. Schultes, T. Sengstag, T. Slater,
     G. Strawn, M. A. Swertz, M. Thompson, J. van der Lei, E. van Mulligen, J. Velterop,
     A. Waagmeester, P. Wittenburg, K. Wolstencroft, J. Zhao, B. Mons, The FAIR guiding
     principles for scientific data management and stewardship, Scientific Data 3 (2016) 160018.
 [4] S. Riggi, F. Vitello, U. Becciani, C. Buemi, F. Bufano, A. Calanducci, F. Cavallaro, A. Costa,
     A. Ingallinera, P. Leto, et al., Caesar source finder: Recent developments and testing,
     Publications of the Astronomical Society of Australia 36 (2019) e037.
 [5] R. Z. E. Alsaberi, L. A. Barnes, M. D. Filipović, N. I. Maxted, H. Sano, G. Rowell, L. M.
     Bozzetto, S. Gurovich, D. Urošević, D. Onić, B. Q. For, P. Manojlović, G. Wong, T. J. Galvin,
     P. Kavanagh, N. O. Ralph, E. J. Crawford, M. Sasaki, F. Haberl, P. Maggi, N. F. H. Tothill,
     Y. Fukui, Radio emission from interstellar shocks: Young type ia supernova remnants and
     the case of n 103b in the large magellanic cloud, Astrophysics and Space Science 364
     (2019) 204.
 [6] Y. Liu, E. Jun, Q. Li, J. Heer, Latent space cartography: Visual analysis of vector space
     embeddings, Computer Graphics Forum 38 (2019) 67–78.
 [7] R. Sarmiento, M. Huertas-Company, J. H. K. S. F. Sànchez, H. D. Sànchez, N. Drory,
     J. Falcón-Barroso, Capturing the physics of MaNGA galaxies with self-supervised machine
     learning, The Astrophysical Journal (to appear). URL: https://arxiv.org/abs/2104.08292.
 [8] P. Baldi, K. Hornik, Neural networks and principal component analysis: Learning from
     examples without local minima, Neural Networks 2 (1989) 53–58.
 [9] D. P. Kingma, M. Welling, Auto-encoding variational bayes, in: Y. Bengio, Y. LeCun
     (Eds.), 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB,
     Canada, April 14-16, 2014, Conference Track Proceedings, 2014.
[10] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C.
     Courville, Y. Bengio, Generative adversarial nets, in: Z. Ghahramani, M. Welling, C. Cortes,
     N. D. Lawrence, K. Q. Weinberger (Eds.), Advances in Neural Information Processing
     Systems 27: Annual Conference on Neural Information Processing Systems 2014, December
     8-13 2014, Montreal, Quebec, Canada, 2014, pp. 2672–2680.
[11] J. Donahue, P. Krähenbühl, T. Darrell, Adversarial feature learning, in: 5th International
     Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017,
     Conference Track Proceedings, 2017.
[12] L. Jing, Y. Tian, Self-supervised visual feature learning with deep neural networks: A survey,
     IEEE Transactions on Pattern Analysis and Machine Intelligence 43 (2021) 4037–4058.
     ((2021)).
[13] K. He, H. Fan, Y. Wu, S. Xie, R. B. Girshick, Momentum contrast for unsupervised
     visual representation learning, in: 2020 IEEE/CVF Conference on Computer Vision and
     Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020, Computer Vision
     Foundation / IEEE, 2020, pp. 9726–9735.
[14] T. Chen, S. Kornblith, K. Swersky, M. Norouzi, G. E. Hinton, Big self-supervised models
     are strong semi-supervised learners, in: H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan,
     H. Lin (Eds.), Advances in Neural Information Processing Systems 33: Annual Conference
     on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020,
     virtual, 2020.
[15] Y. Cao, J. Wu, Rethinking self-supervised learning: Small is beautiful, CoRR
     abs/2103.13559 (2021). URL: https://arxiv.org/abs/2103.13559.
[16] H. Hotelling, Analysis of a complex of statistical variables into principal components.,
     Journal of educational psychology 24 (1933) 417.
[17] L. Van der Maaten, G. Hinton, Visualizing data using t-sne., Journal of machine learning
     research 9 (2008).
[18] L. McInnes, J. Healy, N. Saul, L. Grossberger, UMAP: Uniform manifold approximation
     and projection, The Journal of Open Source Software 3 (2018) 861.
[19] E. Amid, M. K. Warmuth, Trimap: Large-scale dimensionality reduction using triplets,
     arXiv preprint arXiv:1910.00204 (2019).URL: https://arxiv.org/abs/1910.00204.
[20] Y. Wang, H. Huang, C. Rudin, Y. Shaposhnik, Understanding how dimension reduction
     tools work: an empirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP
     for data visualization, arXiv preprint arXiv:2012.04456 (2020).URL: https://arxiv.org/abs/
     2012.04456.
[21] A. Narayan, B. Berger, H. Cho, Density-preserving data visualization unveils dynamic
     patterns of single-cell transcriptomic variability, bioRxiv (2020). URL: https://www.biorxiv.
     org/content/early/2020/05/14/2020.05.12.077776.
[22] W. V. Gansbeke, S. Vandenhende, S. Georgoulis, M. Proesmans, L. V. Gool, SCAN: learning
     to classify images without labels, in: A. Vedaldi, H. Bischof, T. Brox, J. Frahm (Eds.),
     Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28,
     2020, Proceedings, Part X, volume 12355 of Lecture Notes in Computer Science, Springer,
     2020, pp. 268–285.
[23] M. Ester, H. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters
     in large spatial databases with noise, in: E. Simoudis, J. Han, U. M. Fayyad (Eds.),
     Proceedings of the Second International Conference on Knowledge Discovery and Data
     Mining (KDD-96), Portland, Oregon, USA, AAAI Press, 1996, pp. 226–231.
[24] A. Khosla, N. Jayadevaprakash, B. Yao, F.-F. Li, Novel dataset for fine-grained image
     categorization: Stanford dogs, in: Proc. CVPR Workshop on Fine-Grained Visual Catego-
     rization (FGVC), volume 2, 2011.
[25] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in:
     Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp.
     770–778.
[26] X. Li, W. Wang, X. Hu, J. Yang, Selective kernel networks, in: Proceedings of the
     IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 510–519.