Astrophysical Data Analytics based on Neural Gas Models,
using the Classification of Globular Clusters as Playground

© Giuseppe Angora 1   © Massimo Brescia 2   © Giuseppe Riccio 2   © Stefano Cavuoti 3
© Maurizio Paolillo 3   © Thomas H. Puzia 4

1 Department of Physics "E. Pancini", University Federico II, Via Cinthia 6, 80126 Napoli, Italy
2 INAF Astronomical Observatory of Capodimonte, Via Moiariello 16, 80131 Napoli, Italy
3 Department of Physics "E. Pancini", University Federico II, Via Cinthia 6, 80126 Napoli, Italy
4 Institute of Astrophysics, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, Macul, Santiago, Chile

gius.angora@gmail.com
Abstract. In Astrophysics, the identification of candidate Globular Clusters through deep, wide-field, single band HST images is a typical data analytics problem, where methods based on Machine Learning have revealed high efficiency and reliability, demonstrating the capability to improve traditional approaches. Here we experimented with some variants of the known Neural Gas model, exploring both the supervised and unsupervised paradigms of Machine Learning, on the classification of Globular Clusters extracted from the NGC1399 HST data. The main focus of this work was to use a well-tested playground to scientifically validate such kinds of models for further extended experiments in astrophysics, using other standard Machine Learning methods (for instance Random Forest and the Multi Layer Perceptron neural network) for a comparison of performances in terms of purity and completeness.

Keywords: data analytics, astroinformatics, globular clusters, machine learning, neural gas.


Proceedings of the XIX International Conference "Data Analytics and Management in Data Intensive Domains" (DAMDID/RCDL'2017), Moscow, Russia, October 10–13, 2017

1 Introduction

The current and incoming astronomical synoptic surveys require efficient and automatic data analytics solutions to cope with the explosion of the amounts of scientific data to be processed and analyzed. This scenario, quite similar to that of other scientific and social contexts, has pushed all communities involved in data-driven disciplines to explore data mining techniques and methodologies, most of which are connected to the Machine Learning (hereafter ML) paradigms, i.e. supervised/unsupervised self-adaptive learning and parameter space optimization [3],[6],[7].

Following this premise, this paper is focused on the investigation of the use of a particular kind of ML methods, known as Neural Gas (NG) models [21], to solve classification problems within the astrophysical context, characterized by a complex multi-dimensional parameter space. In order to scientifically validate such models, we decided to approach a typical astrophysical playground, already solved with ML methods [8],[11], and to use in parallel two other ML techniques, chosen among the most standard, respectively Random Forest [5] and the Multi Layer Perceptron Neural Network [23], as comparison baseline.

The astrophysical case is related to the identification of Globular Clusters (GCs) in the galaxy NGC1399, using single band photometric data obtained through observations with the Hubble Space Telescope (HST) [8],[25],[27].

The physical identification and characterization of a Globular Cluster (GC) in external galaxies is considered important for a variety of astrophysical problems, from the dynamical evolution of binary systems, to the analysis of star clusters, galaxies and cosmological phenomena [27].

Here, the capability of ML methods to learn and recognize peculiar classes of objects, in a complex and noisy parameter space and by learning the hidden correlations among object parameters, has been demonstrated to be particularly suitable for the problem of GC classification [8]. In fact, multi-band wide-field photometric data (colours and luminosities) are usually required to recognize GCs within external galaxies, due to the high risk of contamination by background galaxies, which appear indistinguishable from GCs located a few Mpc away when observed by ground-based instruments. Furthermore, in order to minimize the contamination, high-resolution space-borne data are also required, since they are able to provide particular physical and structural features (such as concentration, core radius, etc.), thus improving the GC classification performance [25].



In [8] we demonstrated the capability of ML methods to classify GCs using only single band images from the Hubble Space Telescope, with a classification accuracy of 98.3%, a completeness of 97.8% and only 1.6% of residual contamination, thus confirming that ML methods may yield low contamination while minimizing the observing requirements and extending the investigation to the outskirts of nearby galaxies.

These results gave us an optimal playground in which to train NG models and to validate their potential to solve classification problems characterized by complex data with a noisy parameter space.

The paper is structured as follows: in Sect. 2 we describe the data used to test the various methods. In Sect. 3 we provide a short methodological and technical description of the models. In Sect. 4 we describe the experiments and results of the parameter space analysis and of the classification experiments, while in Sect. 5 we discuss the results and draw our conclusions.

2 The Astrophysical Playground

As introduced, the HST single band data used here are very suitable to investigate the classification of GCs. They are, in fact, deep and complete in terms of wide-field coverage, i.e. able to sample the GC population and to ensure the high S/N ratio required to measure structural parameters [10]. Furthermore, they provide the possibility to study the overall properties of the GC populations, which usually may differ from those of the central region of a galaxy.

With such data we intend to verify that Neural Gas based models are able to identify GCs with low contamination even with single band photometric information alone. Through the confirmation of such behavior, we are confident that these models could solve other astrophysical problems, as well as problems in other data-driven contexts.

2.1 The data

The data used in the described experiment consist of wide-field single band HST observations of the giant elliptical galaxy NGC1399, located in the core of the Fornax cluster [27]. Due to its distance (D = 20.13 Mpc, see [13]), it is considered an optimal case in which to cover a large fraction of its GC system with a restricted number of observations. This dataset was used by [25] to study the GC-LMXB connection and the structural properties of the GC population. The optical data were taken with the HST Advanced Camera for Surveys (ACS), in the broad V band filter, with 2108 seconds of integration time for each field. The observations were arranged in a 3x3 ACS mosaic with a scale of 0.03 arcsec/pix, and combined into a single image using the MultiDrizzle routine [19]. The field of view of the ACS mosaic covers ~100 square arcmin (Figure 1), extending out to a projected galactocentric distance of ~55 kpc.

The source catalog was generated using SExtractor [4],[2], by imposing a minimum area of 20 pixels: it contains 12915 sources and reaches a 7σ detection limit at m_V = 27.5, i.e. 4 mag below the GC luminosity function, thus allowing to sample the entire GC population (see [8] for details).

Figure 1 The FoV covered by the HST/ACS mosaic in the broad V band

The source subsample used to build our Knowledge Base (KB) to train the ML models is composed of 2100 sources with 11 features (7 photometric and 4 morphological parameters).

Such parameter space includes three aperture magnitudes within 2, 6 and 20 pixels (mag_aper1, mag_aper2, mag_aper3), the isophotal magnitude (mag_iso), the Kron radius (kron_rad), the central surface brightness (mu0), the FWHM (fwhm_im), and the four structural parameters, respectively the ellipticity and the King's tidal, effective and core radii (calr_t, calr_h, calr_c). The target values of the KB, required as ground truth for training and validation, i.e. the binary column indicating each source as GC or not GC, are provided through the typical selection based on multi-band magnitude and colour cuts. The original 2100 sources having a target assigned have been randomly shuffled and split into a training (70%) and a blind test set (30%).

3 The Machine Learning Models

In our work we tested three different variants of the Neural Gas model, using two additional machine learning methods, respectively a feed-forward neural network and Random Forest, as comparison benchmarks. In the following, the main features of these models are described.

3.1 Growing Neural Gas

Growing Neural Gas (GNG) is presented in [14] as a variant of the Neural Gas algorithm (introduced by [21]), which combines Competitive Hebbian Learning (CHL, [22]) with a vector quantization technique to achieve a learning that retains the topology of the dataset.

Vector quantization techniques [22] encode a data manifold, e.g. $V \subseteq \mathbb{R}^{n}$, using a finite set of reference vectors $w_{i} \in \mathbb{R}^{n}$, $i = 1, \dots, N$. Every data vector $v \in V$ is described by the best matching reference vector $w_{s(v)}$, for which the distortion error $\|v - w_{s(v)}\|$ is minimal. This procedure divides the manifold $V$ into a
number of subregions $V_{i} = \{v \in V : \|v - w_{i}\| \le \|v - w_{j}\|\ \forall j\}$, called Voronoi polyhedra [24], within which each data vector $v$ is described by the corresponding reference vector $w_{i}$.
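To make the quantization step concrete, here is a minimal Python sketch (our own illustration, not code from the models used in this paper) of the BMU search that defines the Voronoi partition:

```python
import numpy as np

def best_matching_unit(v, W):
    """Index of the reference vector closest to the input v.

    W has shape (N, n): N reference vectors spanning an n-dimensional
    manifold. The distortion error is the Euclidean distance ||v - w_i||;
    the BMU is the reference vector minimizing it.
    """
    return int(np.argmin(np.linalg.norm(W - v, axis=1)))

# Toy usage: 5 reference vectors in a 2-dimensional manifold
rng = np.random.default_rng(0)
W = rng.random((5, 2))
v = rng.random(2)
s = best_matching_unit(v, W)  # v lies in the Voronoi polyhedron of W[s]
```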
The Neural Gas network is a vector quantization model characterized by N neural units, each one associated to a reference vector and connected to each other. When an input is extracted, it induces a synaptic excitation detected by all the neurons in the graph and causes their adaptation. As shown in [21], the adaptation rule can be described as a "winner-takes-most" instead of a "winner-takes-all" rule:

$$\Delta w_{i} = \varepsilon \cdot h_{\lambda}\big(k_{i}(v, w)\big) \cdot (v - w_{i}) \qquad (1)$$

The step size $\varepsilon$ describes the overall extent of the adaptation, while $h_{\lambda}(k_{i}(v, w)) = e^{-k_{i}/\lambda}$ is a function of $k_{i}$, the "neighborhood-ranking" of the reference vectors. Simultaneously, the first and second Best Matching Units (BMUs) develop connections between each other [21].

Each connection has an "age"; when the age of a connection exceeds a pre-specified lifetime T, it is removed [21]. Martinetz's reasoning is interesting [22]: they demonstrate how the dynamics of the neural units can be compared to a gaseous system. Let us define the density of reference vectors at location $u$ through $\rho(u) = 1/\mathrm{Vol}(V_{i})$, where $\mathrm{Vol}(V_{i})$ is the volume of the corresponding Voronoi polyhedron. Hence, $\rho$ is a step function on each Voronoi polyhedron, but we can still imagine that the volumes change slowly from one polyhedron to the next, with $\rho$ continuous. In this way, it is possible to derive an expression for the average change:

$$\langle \Delta u \rangle \propto -\nabla V(u) - \eta\,\nabla \rho(u) \qquad (2)$$

where $V(u)$ is a potential determined by the data point distribution $P(u)$. The equation suggests the name Neural Gas: the average change of the reference vectors corresponds to a motion of particles in a potential $V(u)$. Superimposed on the gradient of this potential there is a force proportional to $-\nabla\rho(u)$, which points toward the direction of the space where the particle density is low.
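The rank-based rule (1) can be rendered compactly in Python; the following is a sketch under our own naming (eps for ε, lam for λ), not the authors' implementation:

```python
import numpy as np

def neural_gas_step(v, W, eps=0.5, lam=2.0):
    """One 'winner-takes-most' Neural Gas adaptation step (eq. 1).

    Every reference vector moves toward the input v with a strength that
    decays exponentially with its neighborhood rank k_i (rank 0 = BMU).
    """
    d = np.linalg.norm(W - v, axis=1)
    ranks = np.argsort(np.argsort(d))   # k_i: rank of each unit by distance
    h = np.exp(-ranks / lam)            # h_lambda(k_i) = exp(-k_i / lambda)
    W += eps * h[:, None] * (v - W)     # Delta w_i = eps * h * (v - w_i)
    return W
```

In practice, ε and λ are commonly annealed toward small values during training, so that the gas progressively freezes onto the data distribution.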
The main idea behind the GNG network is to successively add new units to an initially small network, by evaluating local statistical measures collected during the previous adaptation steps [14]. Therefore, each neural unit in the graph has an associated local reconstruction error, updated for the BMU at each iteration (i.e. each time an input is extracted): $\Delta E_{s} = \|v - w_{s}\|^{2}$.

Unlike the Neural Gas network, in the GNG the synaptic excitation is limited to the receptive fields related to the Best Matching Unit and its topological neighbors:

$$\Delta w_{s} = \varepsilon_{b}\,(v - w_{s}), \qquad \Delta w_{n} = \varepsilon_{n}\,(v - w_{n}) \quad \forall n \in N(s)$$

where $N(s)$ is the set of topological neighbors of the BMU $s$. It is no longer necessary to calculate the ranking for all neural units: it is sufficient to determine the first and the second BMU.

The increment of the number of units is performed periodically: during the adaptation steps, the error accumulation allows to identify the regions of the input space where the signal mapping causes the major errors. Therefore, to reduce this error, new units are inserted in such regions [14].

An elimination mechanism is also provided: once the connections whose age is greater than a certain threshold have been removed, if the connected units remain isolated (i.e. without emanating edges), those units are removed as well [14].
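A simplified sketch of one GNG adaptation step may help to fix the ideas; the periodic unit insertion and the removal of isolated units are omitted for brevity, and parameter values are illustrative:

```python
import numpy as np

def gng_step(v, W, E, edges, age, eps_b=0.2, eps_n=0.006, max_age=50):
    """One simplified GNG adaptation step.

    W: (N, n) reference vectors; E: (N,) accumulated local errors;
    edges: set of frozenset({i, j}) connections; age: dict edge -> age.
    """
    s1, s2 = (int(i) for i in np.argsort(np.linalg.norm(W - v, axis=1))[:2])
    E[s1] += np.sum((v - W[s1]) ** 2)    # accumulate the local error
    W[s1] += eps_b * (v - W[s1])         # adapt the BMU...
    for e in list(edges):
        if s1 in e:
            age[e] += 1
            (n,) = e - {s1}
            W[n] += eps_n * (v - W[n])   # ...and its topological neighbors
    link = frozenset({s1, s2})
    edges.add(link)                      # CHL: connect first and second BMU
    age[link] = 0
    for e in [e for e in edges if age[e] > max_age]:
        edges.remove(e)                  # drop connections older than the lifetime
        del age[e]
    return W, E, edges, age
```

In the full algorithm, every fixed number of steps a new unit is also inserted between the unit with the highest accumulated error and its worst neighbor, implementing the growth mechanism described above.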
3.2 GNG with Radial Basis Function

Fritzke describes an incremental Radial Basis Function (RBF) network suitable for classification and regression problems [14].

The network can be figured as a standard RBF network [9], with a GNG algorithm as embedded clustering method, used to handle the hidden layer.

Each unit of this hybrid model (hereafter GNGRBF) is a single perceptron with an associated reference vector and a standard deviation. For a given input-output pair $(v, y)$, the activation of the i-th unit is described by

$$a_{i} = \exp\left(-\frac{\|v - w_{i}\|^{2}}{\sigma_{i}^{2}}\right).$$

Each of the single perceptrons computes a weighted sum of the activations:

$$o_{j} = \sum_{i} w_{ji}\, a_{i}$$

The adaptation rule applies to both the reference vectors forming the hidden layer and the RBF weights. For the first, the adaptation rule is the same as the updating rule for the GNG network, while for the weights:

$$\Delta w_{ji} = \eta\,(y_{j} - o_{j})\,a_{i} \qquad (3)$$

Similarly to the GNG network, new units are inserted where the prediction error is high, updating only the Best Matching Unit at each iteration:

$$\Delta E_{s} = \sum_{j}(y_{j} - o_{j})^{2}.$$
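The forward pass and the weight update (3) can be sketched as follows; this is a minimal illustration assuming Gaussian activations with per-unit widths and linear output perceptrons, with eta standing for the learning rate:

```python
import numpy as np

def rbf_forward(v, W, sigma, weights):
    """Hidden Gaussian activations a_i followed by weighted sums o_j.

    W: (N, n) reference vectors; sigma: (N,) per-unit standard deviations;
    weights: (J, N) perceptron weights for J outputs.
    """
    a = np.exp(-np.sum((W - v) ** 2, axis=1) / sigma ** 2)
    return weights @ a, a                       # o_j = sum_i w_ji * a_i

def rbf_weight_update(v, y, W, sigma, weights, eta=0.05):
    """Delta-rule update of eq. (3): Delta w_ji = eta * (y_j - o_j) * a_i."""
    o, a = rbf_forward(v, W, sigma, weights)
    weights += eta * np.outer(y - o, a)
    return weights, float(np.sum((y - o) ** 2))  # prediction error for the BMU
```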




3.3 Supervised Growing Neural Gas

The Supervised Growing Neural Gas (SGNG) algorithm is a modification of the GNG algorithm that uses the class labels of the data to guide the partitioning of the data into optimal clusters [15],[20]. Each of the initial neurons is labelled with a unique class label. To reduce the class impurity inside the clusters, the original learning rule (1) is reformulated by considering whether or not the BMU belongs to the same class as the current input. Depending on such situation, the SGNG learning rule is expressed alternatively as:

$$\Delta w_{b} = \begin{cases} \varepsilon_{b}\,(v - w_{b}) & \text{if class}(w_{b}) = \text{class}(v) \\ -\varepsilon_{b}\,\beta(w_{b}, w_{k})\,(v - w_{b}) & \text{otherwise} \end{cases} \qquad (4)$$

where $w_{k}$ is the nearest neuron of the same class as the input and $\beta$ is a function specifically introduced to maintain neurons sufficiently distant from each other. For the neuron which is topologically close to the neuron $w_{k}$, the rule intends to increase the clustering accuracy [20]. The insertion mechanism has to reduce not only the intra-distances between data in a cluster, but also the impurity of the cluster. Each unit has two associated kinds of error: an aggregated and a class error. A new neuron is inserted close to the neuron having the highest accumulated class error, while its label is the same as that of the neighboring neuron with the greater aggregated error.
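A schematic Python rendering of the label-dependent update, following our reading of eq. (4), is shown below; the distance-keeping factor β is omitted and at least one neuron per class is assumed:

```python
import numpy as np

def sgng_update(v, label, W, labels, eps=0.2):
    """Label-aware BMU update (schematic reading of eq. 4).

    A BMU carrying the same class label as the input is attracted;
    otherwise it is repelled and the nearest neuron of the correct
    class is attracted in its place. labels is a NumPy array of class
    labels, one per neuron.
    """
    d = np.linalg.norm(W - v, axis=1)
    b = int(np.argmin(d))                  # best matching unit
    if labels[b] == label:
        W[b] += eps * (v - W[b])
    else:
        W[b] -= eps * (v - W[b])
        same = np.flatnonzero(labels == label)
        k = same[np.argmin(d[same])]       # nearest neuron of the input class
        W[k] += eps * (v - W[k])
    return W
```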
                                                                     both classes (true positives for both classes, hereafter tp)
3.4 Multi Layer Perceptron

The Multi Layer Perceptron (MLP) architecture is one of the most typical feed-forward neural networks [23]. The term feed-forward identifies the basic behavior of such neural models, in which the impulse is always propagated in the same direction, i.e. from the input layer towards the output layer, through one or more hidden layers (the network brain), by combining the weighted sums associated to all neurons.

The neurons are organized in layers, each with its proper role. The input signal, simply propagated throughout the neurons of the input layer, is used to stimulate the next hidden and output neuron layers. The output of each neuron is obtained by means of an activation function, applied to the weighted sum of its inputs.

The weights adaptation is obtained by the Logistic Regression rule [17], by estimating the gradient of the cost function, the latter being equal to the logarithm of the likelihood function between the target and the prediction of the model. In this work, our implementation of the MLP is based on the public library Theano [1].
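Our MLP is implemented in Theano [1]; as a minimal analogue, the same feed-forward scheme can be sketched with scikit-learn (the hyperparameters below are illustrative, not those used in the experiments):

```python
from sklearn.neural_network import MLPClassifier

# Feed-forward MLP: input layer -> two hidden layers -> output layer,
# logistic activations, gradient descent on a cross-entropy
# (log-likelihood based) cost function.
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), activation="logistic",
                    solver="sgd", learning_rate_init=0.01, max_iter=2000)
# mlp.fit(X_train, y_train)        # X_train: features, y_train: GC/notGC
# y_pred = mlp.predict(X_test)     # predictions on the blind test set
```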
3.5 Random Forest

Random Forest (RF) is one of the most widely known machine learning ensemble methods [5], since it uses a random subset of candidate data features to build an ensemble of decision trees. Our implementation makes use of the public library scikit-learn [26]. This method has been chosen mainly because it provides, for each input feature, a score of importance (rank), measured in terms of its percentage of informative contribution to the classification results. From the architectural point of view, a RF is a collection (forest) of tree-structured classifiers $\{h(x, \Theta_{k}),\ k = 1, \dots\}$, where the $\Theta_{k}$ are independent, identically distributed random vectors and each tree casts a unit vote for the most popular class at input $x$. Moreover, a fundamental property of the RF is the intrinsic absence of training overfitting [5].
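Since our RF comes from the public scikit-learn library [26], its use reduces to a few lines (the hyperparameter values below are illustrative, not those of the paper):

```python
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=500, random_state=42)
# rf.fit(X_train, y_train)      # builds the ensemble of decision trees
# rf.feature_importances_       # per-feature informative contribution (rank)
# y_pred = rf.predict(X_test)   # majority vote of the trees
```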
4 The Experiments

The five models previously introduced have been applied to the dataset described in Sec. 2.1 and their performances have been compared, in order to verify the capability of NG models to solve particularly complex classification problems, like the astrophysical identification of GCs from single-band observed data.

4.1 The Classification Statistical Estimators

In order to evaluate the performances of the selected classifiers, we decided to use four among the classical and widely used statistical estimators, respectively, average efficiency, purity, completeness and F1-score, which can be directly derived from the confusion matrix [28], shown in Figure 2. The average efficiency (also known as accuracy, hereafter AE) is the ratio between the sum of correctly classified objects in both classes (true positives for both classes, hereafter tp) and the total amount of objects in the test set. The purity (also known as precision, hereafter pur) of a class measures the ratio between the correctly classified objects and the sum of all objects assigned to that class (i.e. tp/[tp+fp], where fp indicates the false positives), while the completeness (also known as recall, hereafter comp) of a class is the ratio tp/[tp+fn], where fn is the number of false negatives of that class. The quantity tp+fn corresponds to the total amount of objects belonging to that class. The F1-score is a statistical test that considers both the purity and the completeness to compute the score (i.e. 2[pur*comp]/[pur+comp]).

By definition, the dual quantity of the purity is the contamination, another important measure, which indicates the amount of misclassified objects for each class.

Figure 2 The confusion matrix used to estimate the classification statistics. Columns indicate the class objects as predicted by the classifier, while rows refer to the true objects of the classes. Main diagonal terms contain the number of correctly classified objects for the two classes, while fp counts the false positives and fn the false negatives of the GC class

In statistical terms, the classical tradeoff between purity and completeness in any classification problem is well known, and is particularly accentuated in astrophysical problems [12]. In the specific case of GC identification, from the astrophysical point of view, we were mostly interested in the purity, i.e. in ensuring the highest level of true GCs correctly identified by the classifiers [8]. However, within the comparison experiments described in this work, our main goal was to evaluate the performances of the classifiers mostly related to the best tradeoff between purity and completeness.
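All the above estimators follow directly from the 2x2 confusion matrix of Figure 2; a minimal sketch, with GC taken as the positive class and names of our own choosing:

```python
from sklearn.metrics import confusion_matrix

def classification_scores(y_true, y_pred):
    """AE, purity, completeness, F1-score and contamination (GC class)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    ae = (tp + tn) / (tp + tn + fp + fn)   # average efficiency (accuracy)
    pur = tp / (tp + fp)                   # purity (precision)
    comp = tp / (tp + fn)                  # completeness (recall)
    f1 = 2 * pur * comp / (pur + comp)     # F1-score
    return ae, pur, comp, f1, 1.0 - pur    # contamination = 1 - purity
```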
4.2 Analysis of the Data Parameter Space

Before performing the classification experiments, we preliminarily investigated the parameter space, defined by the 11 features described in Sec. 2.1 and identifying each object within the KB dataset of 2100 objects. The main goal of this phase was to measure the importance of each feature, i.e. its relevance in terms of informative contribution to the solution of the problem. In the ML context, this analysis is usually called feature selection [16]. Its main role is to identify the most relevant features of the parameter space, trying to minimize the impact of the well-known problem of the curse of dimensionality, i.e. the fact that ML models exhibit a decrease of performance accuracy when the number of features is significantly higher than optimal [18]. This problem mainly concerns cases with a huge amount of data and dimensions. However, its effects may also impact contexts with a limited amount of data and a limited parameter space dimension.

The Random Forest model proved particularly suitable for such analysis, since it is intrinsically able to provide a feature importance ranking during the training phase. The feature importance ranking of the parameter space representing the dataset used in this work is shown in Figure 3.

From the astrophysical point of view, this ranking is in accordance with the physics of the problem. In fact, as expected, among the five most important features there are the four magnitudes, i.e. the photometric log-scale measures of the observed object's photonic flux through different apertures of the detector. Furthermore, almost all photometric features resulted among the most relevant. Moreover, by looking at Figure 3, there is an interesting gap between the first six and the last five features, whose cumulative contribution is just ~11% of the total. Finally, a very weak joint contribution (~3%) is carried by the two worst features (kron_rad and calr_c), which can be considered the most noisy/redundant features for the problem domain.
Figure 3 The feature importance ranking obtained by the Random Forest on the 11-feature domain of the input dataset during training (see Sec. 2.1 for details). The blue vertical lines report the importance estimation error bars
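The ranking of Figure 3 can be extracted directly from the fitted Random Forest, with the dispersion of the importance across the individual trees serving as the error bar; the sketch below assumes rf is the fitted scikit-learn classifier of Sec. 3.5, and "ellipticity" is our placeholder name for the ellipticity feature:

```python
import numpy as np

def importance_ranking(rf, feature_names):
    """Feature importances sorted in decreasing order, with error bars."""
    imp = rf.feature_importances_
    err = np.std([t.feature_importances_ for t in rf.estimators_], axis=0)
    order = np.argsort(imp)[::-1]
    return [(feature_names[i], imp[i], err[i]) for i in order]

features = ["mag_aper1", "mag_aper2", "mag_aper3", "mag_iso", "kron_rad",
            "mu0", "fwhm_im", "ellipticity", "calr_t", "calr_h", "calr_c"]
# importance_ranking(rf, features) -> ranked (name, importance, error) triples
```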
Based on such considerations, the analysis of the parameter space provides a list of the most interesting classification experiments to be performed with the five selected ML models. This list is reported in Table 1.

The experiment E1 is useful to verify the efficiency obtained by considering the four magnitudes.

The experiment E2 is based on the direct evaluation of the best group of features, as derived from the importance results.

The classification efficiency of the full photometric subset of features is evaluated through the experiment E3.

Finally, the experiment E4 is performed to verify the results obtained by removing only the two worst features.

Table 1 List of selected experiments, based on the analysis of the parameter space. The third column reports the identifiers of the included features, according to the importance ranking (see legend in Figure 3)

EXP ID   # features   included features
E1       4            1,2,3,5
E2       6            1,2,3,4,5,6
E3       7            1,2,3,4,5,6,10
E4       9            1,2,3,4,5,6,7,8,9

4.3 The Classification Experiments

Following the results of the parameter space analysis, the original domain of features has been reduced by varying the number and types of included features. Therefore, the classification experiments have been performed on the dataset described in Sec. 2.1, composed of 2100 objects and represented by a parameter space with up to a maximum of 9 features (Table 1).

The dataset has been randomly shuffled and split into a training set of 1470 objects (70% of the whole KB) and a blind test set of 630 objects (the residual 30% of the KB).
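The split can be reproduced with a standard shuffled partition; a minimal sketch (the random seed and the stand-in arrays are arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# X: the KB feature matrix (2100 objects, up to 9 features); y: GC/notGC
# targets. Random arrays stand in here just to make the sketch executable.
X, y = np.random.rand(2100, 9), np.random.randint(0, 2, 2100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, shuffle=True, random_state=42)
print(len(X_train), len(X_test))   # -> 1470 630
```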
These datasets have been used to train and test the five selected ML classifiers. The analysis of the results, reported in Table 2, has been performed on the blind test set, in terms of the statistical estimators defined in Sec. 4.1.

Table 2 Statistical analysis of the classification performances obtained by the five ML models on the blind test set for the four selected experiments. All quantities are expressed in percentage and related to the average efficiency (AE), the purity for each class (purGC, purNotGC), the completeness for each class (compGC, compNotGC) and the F1-score for the GC class. The contamination is the dual value of the purity

ID   Estimator     RF     MLP    SGNG   GNGRBF   GNG
E1   AE            88.9   84.4   88.1   88.1     88.4
     purGC         85.9   80.1   89.7   85.4     83.7
     compGC        87.3   82.6   80.3   85.7     89.2
     F1-scoreGC    86.6   81.3   84.7   85.5     86.4
     purNotGC      91.0   87.6   87.2   90.0     92.1
     compNotGC     89.7   85.6   93.0   89.6     88.1
E2   AE            89.0   85.1   87.3   88.3     83.2
     purGC         84.9   77.0   81.0   82.9     74.0
     compGC        89.2   90.7   90.3   90.0     91.1
     F1-scoreGC    87.0   83.3   85.4   86.3     81.7
     purNotGC      92.2   92.6   92.7   92.6     92.6
     compNotGC     89.0   85.6   85.7   87.4     80.0
E3   AE            89.0   83.2   85.1   89.2     86.8
     purGC         85.2   77.2   80.0   86.0     84.1
     compGC        88.8   83.8   84.9   88.0     83.8
     F1-scoreGC    87.0   80.4   82.4   87.0     83.9
     purNotGC      91.9   88.0   89.0   91.5     88.7
     compNotGC     89.9   83.2   85.1   89.8     88.4
E4   AE            89.5   86.0   88.1   88.7     83.8
     purGC         85.3   82.5   84.1   83.8     78.3
     compGC        90.0   83.8   87.6   90.0     83.8
     F1-scoreGC    87.6   83.1   85.8   86.8     81.0
     purNotGC      92.7   88.6   91.1   92.6     88.1
     compNotGC     89.1   87.5   88.1   88.2     84.1

5 Discussion and Conclusions

As already underlined, the main goal of this work is the validation of NG models as efficient classifiers in noisy and multi-dimensional problems, with performances at least comparable to other ML methods considered "traditional" in terms of their use in such kinds of problems.

By looking at Table 2 and focusing on the statistics for the three NG models, it is evident that they are able to separate GCs from other background objects, reaching a satisfying tradeoff between purity and completeness in all experiments and for both classes. The occurrence of statistical fluctuations is mostly due to the different parameter spaces used in the four experiments. Nevertheless, none of the three NG models overcomes the others in terms of the measured statistics.

If we compare the NG models with the two additional ML methods (Random Forest and the MLP neural network), their performances appear almost the same. This implies that NG methods show classification capabilities fully comparable to those of other ML methods.

Another interesting aspect is the analysis of the degree of coherence among the NG models in terms of commonalities within the classified objects. Table 3 reports the percentages of common predictions for the objects correctly classified, by considering, respectively, both and single classes. On average, the three NG models are in agreement for about 80% of the objects correctly classified.

Table 3 Statistics for the three NG models related to the common predictions of the correctly classified objects. The second column refers to both classes, while the third and fourth columns report, respectively, the statistics for the single classes

EXP ID   GC+notGC %   GC %   notGC %
E1       86.0         85.4   86.9
E2       79.8         79.8   79.8
E3       81.1         82.5   79.2
E4       77.8         77.4   78.4

This is also confirmed by looking at Figure 4, where the tabular results of Table 3 are shown through Venn diagrams, reporting also more details about the classification commonalities.
Figure 4 The Venn diagrams related to the prediction of all (both GCs and not GCs) correctly classified objects performed by the three Neural Gas based models (GNG, GNGRBF and SGNG) for the experiments, respectively, E1 (a), E2 (b), E3 (c) and E4 (d). The intersection areas (dark grey in the middle) show the objects classified in the same way by the different models. Internal numbers indicate the amount of objects correctly classified in each sub-region

Finally, from the computational efficiency point of view, the NG models have theoretically a higher complexity than Random Forest and neural networks. But, since they are based on a dynamic evolution of the internal structure, their complexity strongly depends on the nature of the problem and its parameter space.

Nevertheless, all the presented ML models have a variable architectural aptitude to comply with the
parallel computing paradigms. Besides the embarrassingly parallel architecture of the Random Forest, the use of optimized libraries, like Theano [1], makes models like the MLP highly efficient as well. From this point of view, NG models have a high potentiality to be parallelized. By optimizing the GNG, the GNGRBF would automatically benefit, since both share the same search space, except for the additional cost of the RBF training. In practice, the hidden layer of the supervised network behaves just like a GNG network whose neurons act as inputs for the RBF network. Consequently, with the same number of iterations, the GNGRBF network performs a greater number of operations.

On the other hand, the SGNG network is similar to the GNG network, although characterized by a neural insertion mechanism acting over a long period, thus avoiding too rapid changes in the number of neurons and excessive oscillations of the reference vectors. Therefore, on average, the SGNG network computational costs are higher than those of the models based on the standard Neural Gas mechanism.

In conclusion, although a more intensive test campaign on these models is still ongoing, we can assert that Neural Gas based models are very promising as problem-solving methods, also in the presence of complex and multi-dimensional classification and clustering problems, especially if preceded by an accurate analysis and optimization of the parameter space within the problem domain.

Acknowledgements

MB acknowledges the PRIN-INAF 2014 Glittering kaleidoscopes in the sky: the multifaceted nature and role of Galaxy Clusters, and the PRIN-MIUR 2015 Cosmology and Fundamental Physics: illuminating the Dark Universe with Euclid.

MB, GL and MP acknowledge the H2020-MSCA-ITN-2016 SUNDIAL (SUrvey Network for Deep Imaging Analysis and Learning), financed within the Call H2020-EU.1.3.1.
References

[1] Al-Rfou, R., Alain, G., Almahairi, A. et al.: Theano: A Python Framework for Fast Computation of Mathematical Expressions. arXiv e-prints, abs/1605.02688 (2016)
[2] Annunziatella, M., Mercurio, A., Brescia, M., Cavuoti, S., Longo, G.: Inside Catalogs: A Comparison of Source Extraction Software. PASP, 125, 923 (2013). doi: 10.1086/669333
[3] Astroinformatics. In: Brescia, M., Djorgovski, S.G., Feigelson, E.D., Longo, G., Cavuoti, S. (eds.) International Astronomical Union Symposium, 325 (2017). ISBN: 9781107169951
[4] Bertin, E., Arnouts, S.: SExtractor: Software for Source Extraction. A&A Suppl. Series, 117, pp. 393-404 (1996). doi: 10.1051/aas:1996164
[5] Breiman, L.: Random Forests. Machine Learning, 45 (1), pp. 5-32, Springer (2001)
[6] Brescia, M., Cavuoti, S., Longo, G., Nocella, A., Garofalo, M., et al.: DAMEWARE: A Web Cyberinfrastructure for Astrophysical Data Mining. PASP, 126, 942 (2014). doi: 10.1086/677725
[7] Brescia, M., Longo, G.: Astroinformatics, Data Mining and the Future of Astronomical Research. Nuclear Instruments and Methods in Physics Research A, 720, pp. 92-94, Elsevier (2013). doi: 10.1016/j.nima.2012.12.027
[8] Brescia, M., Cavuoti, S., Paolillo, M., Longo, G., Puzia, T.: The Detection of Globular Clusters in Galaxies as a Data Mining Problem. MNRAS, 421 (2), pp. 1155-1165 (2012). doi: 10.1111/j.1365-2966.2011.20375.x
[9] Broomhead, D.S., Lowe, D.: Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks. Technical report, RSRE 4148 (1988)
[10] Carlson, M.N., Holtzman, J.A.: Measuring Sizes of Marginally Resolved Young Globular Clusters with the Hubble Space Telescope. PASP, 113 (790), pp. 1522-1540 (2001). doi: 10.1086/324417
[11] Cavuoti, S., Garofalo, M., Brescia, M., Paolillo, M., Pescapè, A., Longo, G., Ventre, G.: Astrophysical Data Mining with GPU. A Case Study: Genetic Classification of Globular Clusters. New Astronomy, 26, pp. 12-22 (2014). doi: 10.1016/j.newast.2013.04.004
[12] D'Isanto, A., Cavuoti, S., Brescia, M., Donalek, C., Longo, G., Riccio, G., Djorgovski, S.G.: An Analysis of Feature Relevance in the Classification of Astronomical Transients with Machine Learning Methods. MNRAS, 457 (3), pp. 3119-3132 (2016). doi: 10.1093/mnras/stw157
[13] Dunn, L.P., Jerjen, H.: First Results from SAPAC: Toward a Three-dimensional Picture of the Fornax Cluster Core. AJ, 132 (3), pp. 1384-1395 (2006). doi: 10.1086/506562
[14] Fritzke, B.: A Growing Neural Gas Network Learns Topologies. In: Advances in Neural Information Processing Systems, 7, Tesauro, G., Touretzky, D.S., Leen, T.K. (eds.), MIT Press, Cambridge, MA (1995)
[15] Fritzke, B.: Supervised Learning with Growing Cell Structures. In: Advances in Neural Information Processing Systems, 6, Cowan, J.D., Tesauro, G., Alspector, J. (eds.), Morgan Kaufmann, pp. 255-262 (1994)
[16] Guyon, I., Elisseeff, A.: An Introduction to Variable and Feature Selection. JMLR, 3, pp. 1157-1182 (2003)
[17] Harrell, F.E.: Regression Modeling Strategies. Springer-Verlag (2001). ISBN 0-387-95232-2
[18] Hughes, G.F.: On the Mean Accuracy of Statistical Pattern Recognizers. IEEE Transactions on Information Theory, 14 (1), pp. 55-63 (1968). doi: 10.1109/TIT.1968.1054102
[19] Koekemoer, A.M., Fruchter, A.S., Hook, R.N., Hack, W.: MultiDrizzle: An Integrated Pyraf Script for Registering, Cleaning and Combining Images. In: The 2002 HST Calibration Workshop, Arribas, S., Koekemoer, A., Whitmore, B. (eds.), Space Telescope Science Institute, Baltimore, MD (2002)
[20] Jirayusakul, A., Aryuwattanamongkol, S.: A Supervised Growing Neural Gas Algorithm for Cluster Analysis. Springer-Verlag (2006)
[21] Martinetz, T., Schulten, K.: A "Neural-Gas" Network Learns Topologies. In: Artificial Neural Networks, Kohonen, T., Makisara, K., Simula, O., Kangas, J. (eds.), Elsevier, Amsterdam, pp. 397-402 (1991)
[22] Martinetz, T., Berkovich, S.G., Schulten, K.J.: "Neural-Gas" Network for Vector Quantization and its Application to Time-Series Prediction. IEEE Transactions on Neural Networks, 4 (4), pp. 558-569 (1993)
[23] McCulloch, W.S., Pitts, W.: A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5 (4), pp. 115-133 (1943)
[24] Montoro, J.C.G., Abascal, J.L.F.: The Voronoi Polyhedra as Tools for Structure Determination in Simple Disordered Systems. J. Phys. Chem., 97 (16), pp. 4211-4215 (1993). doi: 10.1021/j100118a044
[25] Paolillo, M., Puzia, T., Goudfrooij, P. et al.: Probing the GC-LMXB Connection in NGC 1399: A Wide-field Study with the Hubble Space Telescope and Chandra. ApJ, 736 (2), p. 90 (2011). doi: 10.1088/0004-637X/736/2/90
[26] Pedregosa, F., Varoquaux, G., Gramfort, A. et al.: Scikit-learn: Machine Learning in Python. JMLR, 12, pp. 2825-2830 (2011)
[27] Puzia, T., Paolillo, M., Goudfrooij, P., Maccarone, T.J., Fabbiano, G., Angelini, L.: Wide-field Hubble Space Telescope Observations of the Globular Cluster System in NGC 1399. ApJ, 786 (2), p. 78 (2014). doi: 10.1088/0004-637X/786/2/78
[28] Stehman, S.V.: Selecting and Interpreting Measures of Thematic Classification Accuracy. Remote Sensing of Environment, 62 (1), pp. 77-89 (1997). doi: 10.1016/S0034-4257(97)00083-7