<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Employing Evolutionary Algorithms for Classification of Astrophysical Spectra</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>David Bednárek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Kruliš</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jakub Yaghob</string-name>
          <email>yaghob@ksi.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filip Zavoral</string-name>
          <email>zavoral@ksi.mff.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Parallel Architectures/Algorithms/Applications Research Group Faculty of Mathematics and Physics, Charles University in Prague Malostranské nám.</institution>
          <addr-line>25, Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>1214</volume>
      <fpage>7</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>In the past decade, automated astronomical observatories collected huge amounts of data which can no longer be explored by astronomers individually. In our case, we deal with optical spectra produced by multiobject low-resolution spectrographs. Due to lower resolution and higher level of noise in such surveys, individual spectra rarely offer reliable information; however, since many similar objects expectedly exist in the universe, global analysis of the spectrum database may reveal classes of objects sharing similar properties. In this paper, we propose a novel evolutionary approach to classification of spectral data which is expected to achieve finer level of detail than traditional methods. Furthermore, we describe the most computationally-intensive parts of the method in the form of parallel cache-aware algorithm.</p>
      </abstract>
      <kwd-group>
        <kwd>evolutionary algorithm</kwd>
        <kwd>spectrum</kwd>
        <kwd>classification</kwd>
        <kwd>astrophysics</kwd>
        <kwd>astroinformatics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Studying the spectra of celestial objects was the key to
many (if not the majority) of astronomical discoveries of
the last two centuries and it still remains the most valuable
instrument in stellar astronomy.</p>
      <p>Stellar spectrum is a recording of radiation intensity in
the frequency domain, usually over a range of visible or
near-infrared wavelengths. The most prominent features
of a stellar spectrum are its general shape (also called
continuum), absorption lines, and emission lines. In most
studies including our approach, the absorption lines are
considered the most important features.</p>
      <p>Spectra reveal significant clues about the chemical
composition, temperature, and velocity of the observed object;
however, the interpretation of the observed facts is difficult
because different physical processes may result in similar
observations.</p>
      <p>
        Large surveys like the Sloan Digital Sky Survey
(SDSS) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] produce hundreds of thousands spectral
measurements using multi-object spectrographs. These
devices have lower resolution than single-object
spectrographs by design; in addition, the nature of a large survey
requires that relatively fainter objects are measured.
Consequently, the measured spectra often lack enough detail
required by traditional classification methods; in
particular, individual measurement of absorption lines is possible
only in the cases of most prominent lines.
      </p>
      <p>
        This lack of detail may be balanced by a global
approach where a model of the spectrum is compared to
the measured intensity along the complete width of the
spectrum, instead of focusing on the prominent lines only.
Models of stellar spectra can either by synthesized from
the theory or based on the measurement of well-studied
prototype objects, including the Sun. Astronomers have
created libraries of synthetic spectra with varying number
of parameters [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] . To match a real observation, the
model parameters must be determined; a number of
methods has been already proposed based on machine
learning [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], principal component analysis [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], or combined
methods [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Unfortunately, some scientifically interesting classes of
objects like Be stars still lack sufficiently general
models and their variability, given by their nontrivial
geometry, makes parameter determination difficult even for
simple models. Therefore, the classification of such objects
into subclasses is still based rather on specific features
observed in their spectra [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] than on the parameters of
a matching physical model.
      </p>
      <p>Our proposed approach is inspired by evolutionary
algorithms. The goal of the method is to create synthetic
spectra to match the observations; however, the synthetic
spectra are not based on a physical model of the object.
Instead, each synthetic spectrum is matched against as many
observations as possible, trying to cover the given set of
observations by as few synthetic spectra as possible. Of
course, the observed objects are often similar but not
completely identical. Therefore, each synthetic spectrum is
allowed to undergo a transformation before matching to
an observation; the evolutionary algorithm tries to evolve
both the synthetic spectra and the transformation
parameters at once.</p>
      <p>Compared to the traditional meaning of the synthetic
spectra, the physics in our model is greatly reduced: The
set of the lines in our synthetic spectrum is not derived
from the assumed chemistry of the object but simply
placed to match the observations. On the other hand, the
profiles of the lines are physically sound (corresponding
to the effect of Doppler broadening), and also the
transformations of the spectra correspond to physical variations
like differences in temperature or radial velocity.</p>
      <p>If such a synthetic spectrum is successfully matched to a
set of observed objects, each of the matched observations
require different parameters of the spectrum
transformation. Consequently, the synthetic spectrum corresponds
to a hypothetical object whose spectral characteristics are
close to all the matched objects and the transformations
required to match the individual objects are related to the
difference between the hypothetical object and the observed
object.</p>
      <p>Our synthetic spectrum does not offer any physical
model; however, if a model is assigned to the synthetic
spectrum by other means, e.g., by the inspection by an
astronomer, the model will probably apply to the observed
objects as well. In addition, the physical meaning of the
spectrum transformation allows the determination of the
required change in parameters of the assigned model,
including the verification whether the change is physically
plausible. Nevertheless, the physical interpretation of the
transformation is not a part of our method; the
physical background of the transformation merely serves as a
means of defining physically relevant notion of similarity.</p>
      <p>The main purpose of the method is the reduction of the
number of spectra that must be inspected manually;
consequently, there is no strict requirement of separation of
the resulting clusters, only the requirement for high
intracluster similarity.</p>
      <p>The paper is organized as follows. Section 2 revises
related work in the fields of spectra classification and
evolutionary algorithms. The following section describes the
mathematical background of our synthetic spectra, their
transformation, and how they are matched to the
observations. Section 4 describes our co-evolution
algorithm. Section 5 summarizes the approach and suggests
the modes of its application.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The idea of the astronomic spectra classification is not
completely new. In the past, the most preferred
approach was based on the examination of significant
spectral line [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ].
      </p>
      <p>
        Bazaghan [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] proposed the self organizing maps as an
unsupervised artificial neural network algorithm for
classification of the stellar spectra. Jiang et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
usedprincipal component analysis methods to reduce
dimensionality of the data, where only the first two eigenvectors are
selected. Furthermore, they proposed a hierarchical
clustering method for the data mining approach.
      </p>
      <p>
        Bromová et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] attempted to employ wavelets as
descriptors of the stellar spectra. The spectra were sampled
by discrete wavelet transformation and various
transformations of the coefficients into Euclidean space were used,
thus the descriptors were simple vectors. The k-means
algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] was applied on the descriptors to find similar
spectra, especially to identify the spectra of Be stars. Their
implementation achieved 76% precision on a sample set of
656 spectra with manually annotated ground truth;
however, the sample set consisted of high-precision low-noise
spectra from a single-object spectrograph. When applied
to multi-object spectrograph measurements, the precision
was lost.
      </p>
      <p>
        A simpler technique [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] used 2D curves like the Bezier
curve to approximate the histogram and then compare the
coefficients or the defining points of the curves.
      </p>
      <p>
        Evolutionary algorithms and especially genetic
algorithms have been used for various types of classification
and clustering problems. As a representative, we have
selected the work of Maulik [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which proposes a
clustering technique based on genetic algorithm. The algorithm
is similar to the k-means clustering algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], but the
centroids are a population of individuals which is refined
using the genetic approach.
      </p>
      <p>
        In physics, genetic algorithms have been used for
classification and pattern recognitions in mass spectra. Lavine
et al. presented a genetic algorithm for classification of
wood types measured by Raman spectroscopy [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. A
year later, Lavine presented more generalized version of
the genetic algorithm for pattern recognition in mass
spectra [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. However, the aim of these methods is the
classification into reliably defined and well separated classes
of materials while our goals include discovering of such
classes.
      </p>
      <p>
        Another approach to mass spectra analysis was devised
by Geurts et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Their method is based on assembling
a decision tree which detects proteomic biomarkers in the
spectrum. Their objective was to devise a method for
automated detection of various diseases in the body fluids.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Mathematic Model</title>
      <p>Each synthetic spectrum Si is defined as a set of lines; each
line is determined by its position l, width w, and
intensity d.</p>
      <p>Si = {hli,1, wi,1, di,1i, ..., hli,ni , wi,ni , di,ni i}</p>
      <p>The line parameters are expressed in units that allow
easy application of physically relevant transformations:
the position is expressed as the logarithm of wavelength
because Doppler shifts act as multipliers of the
wavelength, the width uses a unit corresponding to the
temperature associated to Doppler broadening, and the intensity
is measured using the logarithm of attenuation which is
proportional to the density of the gas generating the
absorption line.</p>
      <p>Each line generates wavelength-dependent attenuation
corresponding to Doppler broadening, described by the
function</p>
      <p>Al,w,d (λ ) = e−de−((log10λ−l)/w)2</p>
      <p>Since computing the value of this function is expensive
and cannot be vectorized, the function is tabulated in our
implementation, using equidistant sampling in the four
dimensions log10λ , l, w, and d. In log10λ , the sampling
points are equal to the sampling points of the observed
spectra (which are, fortunately, normalized to equidistant
in our database). In l, w, and d, the approximation was
improved by quadratic interpolation, i.e., using tabulated
values of A and all its first and second partial derivatives.
In addition, the following equivalences are used to reduce
the number of samples:</p>
      <p>Al+x,w,d (λ ) = Al,w,d (λ · 10−x)</p>
      <p>Al,w,d1+d2 (λ ) = Al,w,d1 (λ ) · Al,w,d2 (λ )</p>
      <p>Using these tricks, the number of samples, required to
achieve the precision similar to the precision of the
measured spectra, was reduced to approx. 3 · 106. This amount
could easily fit into the memory but still poses significant
burden on the cache hierarchy, requiring careful design of
the algorithm.</p>
      <p>The transformation of the spectrum consists of a global
change to all line parameters and multiplication of the
resulting attenuation curve by a black-body radiation curve
BT for the temperature T . The resulting synthetic
spectrum curve is defined as:
ni
Ci,T,l0,w0,d0 (λ ) = BT (λ ) · ∏ Ali, j+l0,wi, j·w0,di, j·d0 (λ )
j=1</p>
      <p>Thus, each transformed spectrum is defined by the
quintuple hi, T, l0, w0, d0i where i is the index of base spectrum
and P = hT, l0, w0, d0i are the parameters of the
transformation
3.1</p>
      <sec id="sec-3-1">
        <title>Matching Synthetic Spectra to Observations</title>
        <p>Matching against the observed spectrum O incorporates
another physically relevant transformation – multiplying
the value by a factor m reflecting the both the absolute
luminosity of the object and its distance from the observer.
The value of m is determined using the method of least
squares, i.e., minimizing the sum</p>
        <p>R = ∑
λ
(O(λ ) − m · C(λ ))2</p>
        <p>σ 2(λ )</p>
        <p>In this definition, the quadratic residua are divided by
variance σ 2(λ ) estimated for each measured wavelength.
The inverse variance σ21(λ ) is a part of the SDSS data along
the flux O(λ ). The application of estimated variance
allows to suppress the parts of the measured spectra affected
by the noise caused by the atmospheric background.</p>
        <p>The minimal value of R is</p>
        <p>Rmin = ∑
λ σ 2(λ ) −</p>
        <p>O(λ )2
(∑λ O(σλ2)(·Cλ()λ ) )2</p>
        <p>C(λ )2
∑λ σ2(λ )</p>
        <p>Since we need a measure of the match quality which is
consistent over differently luminous objects, we use a
normalized form of the sum:
Δ(O,C) = 1 −
(∑λ O(σλ2)(·Cλ()λ ) )2</p>
        <p>O(λ )2 C(λ )2
∑λ σ2(λ ) · ∑λ σ2(λ )</p>
        <p>This function may act as a distance between the
observed spectrum O and the synthetic spectrum S, albeit its
symmetry is broken by the fact that the variance σ 2 is
associated to the observation. Since the presence of the σ 2
factors is merely a technical trick to minimize the
influence the sky background and it does not significantly
affect the method, we will omit the σ 2 data in the description
of the evolutionary algorithm, for simplicity.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evolutionary Algorithm</title>
      <p>Assume that we have a set of observations O =
{O1, ..., Om}.</p>
      <p>As stated in the previous section, our goal is to establish
a set of synthetic spectra S = {S1, ..., Sn}, and to assign one
of the synthetic spectra to every observation together with
a set of transformation parameters.</p>
      <p>Nevertheless, the nature of evolutionary algorithms
require a population of candidate solutions – in our case, it
means that every observation may be assigned to several
spectra from the set S, each with different transformation
parameters.</p>
      <p>Thus, our population consists of two parts: A set S of
synthetic spectra and a set P = {P1, ..., Pp} of pairings.
Each pairing is a tuple</p>
      <p>Pk = hik, jk, Tk, lk0, w0k, dk0i
where ik is the index of a base synthetic spectrum, jk is
the index of an observed spectrum, and hTk, lk0, w0k, dk0i are
the parameters of the transformation as described in the
previous section. In other words, Pk is a link between the
synthetic spectrum Sik and the observation O jk .</p>
      <p>The quality of each pairing Pk is evaluated using the
previously defined distance functionΔ:</p>
      <p>q(Pk) = 1 − Δ(O jk ,Cik,Tk,lk0,w0k,dk0 )</p>
      <p>In traditional settings, the fitness functionq would
control the evolution of the population P and defining
mutation and/of crossover operators over P members would
be sufficient to create a working evolutionary algorithm.
However, in our case, we must simultaneously evolve also
the set S of the synthetic spectra.</p>
      <p>The structure and flow of data is depicted in Figure 1.
4.1</p>
      <sec id="sec-4-1">
        <title>Symbiotic Evolution</title>
        <p>Our population consists of two parts, S and P, which shall
evolve simultaneously like a pair of different species
living in a common environment. In addition, the members
Tabulated line
curves</p>
        <p>Line lists</p>
        <p>Base synthetic
spectra</p>
        <p>Transformation</p>
        <p>Transformed
synthetic spectra</p>
        <p>Fitness</p>
        <p>Observed
spectra
&lt;l,w,d&gt;
ΦT,l’,w’,d’
of P are linked to members of S, resembling symbiosis
between the two species. The symbiosis is asymmetric as
each individual from S hosts several individuals from P.</p>
        <p>
          Although there are numerous examples of such
symbiosis in the nature, only few attempts [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] exist to transfer
this mechanism to the world of evolutionary algorithms.
        </p>
        <p>Such a symbiosis requires to solve a set of additional
problems:
• If S was fixed, the population P could be divided
into isolated islands attached to individual
observations from O, evolving independently. However, the
co-evolution of S causes that all members of P may
mutually interfere through the S population.
• The transformation expressed by a P individual is not
able to hit the associated observation perfectly; the
minimal possible distance (however measured) from
the observation depends on the associated S
individual. Thus, some P individuals may approximate their
objects more easily than others. Consequently,
enforcing a global fitness function for P would
prematurely kill individuals attached to those
observations from O which are hard to approximate. In other
words, there is no globally acceptable fitness function
for P members; instead, P members must be
compared only locally among the subset attached to the
same observation.
• A fitness function must be defined for the members
of S. Naturally, it shall be based on the fitness of the
P members linked to the evaluated individual of S.
However, merely summing the fitness will not work
due to the expected large differences in the number of
linked P members.
• Migration (i.e., re-linking a P member to a different S
member) must be supported as an equivalent to
mutation of the P member. Consequently, a notion of
distance must be defined on S in order to favorize
short-distance migrations over long-distance ones –
i.e., small mutations over large ones.
• Poor-fitness S members must not die-out because P
members may be linked to them.</p>
        <p>In our approach, these problems are addressed as
follows:</p>
        <p>The system keeps track of family relationships in the set
S. It means that, when an individual is created by
mutation or crossover, the source individuals are preserved and
the parent-child relation is saved in the form of a directed
acyclic graph. Fortunately, the S individuals are shared
and thus significantly less numerous than theP population;
consequently, storing the complete history of its evolution
is feasible.</p>
        <p>Keeping S members forever solves the problem of
orphaned P members. Nevertheless, the main advantage is
elsewhere:</p>
        <p>The graph of relationships allows distance measurement
between the members of S, consistent with the factual
difference of the corresponding synthetic spectra. If two
members of S share a common ancestor, the number of
generations between them and the ancestor may be used
as a measure of distance. Assuming that the genetic
operators represent movement to small distances in the space
of spectra, close relatives in the S graph are close also in
the space. Of course, the converse implication is not true,
because distant nodes in the S graph may also represent
neighbors in the spectral space.</p>
        <p>This notion of distance is used in the mutation of P
members: A P member may randomly relocate to a
different S individual; however, only to a sibling or a child of
its previous host – this way, the relocation preserves
locality in the space of spectra.</p>
        <p>In addition, the relocation of the P members is
controlled by their fitness: Poorly fit individuals relocate to
siblings in an attempt to find a replacement for the current
poorly fitting host. Well fit individuals relocate to children
of their host, attempting to improve the fitness.</p>
        <p>The evolution of S members must support the need for
relocation of P members. It means that S members
occupied by a number of well-fit P members must generate
offspring to enable their relocation. In other words, the
fitness of an S member, being a controller of its fertility, must
reflect the sole presence of well-fit P members, regardless
of their number and independently of the presence of other
P members.</p>
        <p>This approach is detailed in Algorithm 1. The algorithm
contains a number of steps that require tuning or may be
realized in different ways. For instance, the vague final
condition on line 3 may be implemented as a check for
stagnating summary fitness over the P population;
however, in many cases, it would be rather limited by the
computing resources available.</p>
        <p>Algorithm 1 The co-evolution algorithm
Require: O the observations ; N, t, MM, MX evolution
parameters
1: S := random population of spectra
2: P := random population of pairings on S × O
3: while not satisfieddo
4: compute the fitnessq(Pk) for every Pk ∈ P
5: for every O j ∈ O, determine Pj0 = the top N
associated P members according to q
6: P0 := S j Pj0
7: for every Si ∈ S, determine n(Si) = the number of P0
members hosted by Si
8: S0 := {Si ∈ S | n(Si) &gt; t}
9: for every Si ∈ S0 generate MM children of Si by
random mutation
10: randomly select MX pairs from S0 for crossover
11: relocate every Pk ∈ P0 to a randomly selected child
of the linked S member
12: relocate every Pk ∈ P \ P0 to a randomly selected
sibling of the linked S member
13: end while
4.2</p>
      </sec>
      <sec id="sec-4-2">
        <title>Parallel Implementation</title>
        <p>The most computationally intensive part of the
coevolution algorithm is the calculation of population fitness.
Thanks to the tabulation of line curves, the computation
consists mostly of multiplication and addition. These
simple operations are well supported by SIMD instructions of
contemporary CPU’s; consequently, the throughput of the
arithmetic unit is very high, in the order of 1010 operations
per second per core.</p>
        <p>Given the high performance of the arithmetic unit, the
memory and cache subsystem becomes the bottleneck of
the algorithm. Furthermore, the observed spectra are
matched against the base synthetic spectra almost
randomly and the base spectra are also created from
essentially randomly selected line curves. Consequently,
iteration along the population would lead to random access
both to the tabulated line curves and to the database of
observed spectra. Thus, such a naive approach would lead to
poor cache hit ratios and, consequently, poor performance.</p>
        <p>To improve the performance of the fitness calculation,
we developed the Algorithm 2. The algorithm is based
on dividing the data into appropriately sized groups which
can fit in a level of the memory hierarchy:</p>
        <p>The data set O of observed spectra may be so large that
it must be located in external storage. Consequently, it
must be divided into groups {G Oj} and processed
groupby-group. The size of every G Oj group shall be selected
so that the corresponding spectra fit into the last level of
cache. For our Xeon CPUs with 8 MB L3 caches, the
optimal group size was about 50 spectra.</p>
        <p>The set of tabulated line curves is typically slightly
larger than the last level of cache; therefore, it is
divided into groups {GnA}. To manage the division, the lines
that constitute the synthetic spectra must be collected and
sorted according to the associated line curves (line 9 of
Algorithm 2).</p>
        <p>To employ parallelism while avoiding locking, every
G Oj group must be further divided into as many groups
as there are computing units. To balance the size of the
groups, the division is done indirectly, dividing the set of
lines sorted along the observed spectra (line 12 and 13)
into equivalently sized groups {GLm}.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Our evolutionary algorithm categorizes observations into
sets represented by a common synthetic spectrum – this
spectrum offers a reasonable representative of the set of
observations.</p>
      <p>Furthermore, the method may improve the
comprehensibility of the spectrum, because the evolution of the
synthetic spectrum produces results similar to the averaging of
the observations in the associated set. Averaging
measurements is a well-established technique used for improving
the signal-to-noise ratio; however, raw averaging would
produce invalid results due to differences in the
measurement conditions like Doppler shifts. Our method helps to
find the set to be averaged and, at the same time, it suggests
transformations whose inversions shall be applied before
averaging.</p>
      <p>Although our approach is similar to clustering and
similarity-based methods, there is a principal difference:
Our method does not guarantee that similar objects will
be arranged in the same set. There is only a
complementary guarantee that the objects in the same set are similar.
Algorithm 2 Parallel fitness calculation algorithm
Require: O the observations ; A line curves ; S synthetic
spectra line lists ; P pairings and transformation
parameters
Ensure: fitness valueq(Pk) for every Pk ∈ P
1: for each group GO</p>
      <p>j ⊆ O of observed spectra do
2: read the group G Oj into memory
3: L0 := 0/
4: for each pairing Pk ∈ P associated to a spectrum
from G Oj [in parallel] do
5: allocate and initialize buffer Ck for the
transformed spectrum
6: compute transformed line list Lk0 from the base
line list and the transformation parameters
7: L0 := L0 ∪ Lk0
8: end for
9: sort L0 by the index of the referenced line curve
10: for each group GnA ⊆ A of the line curves do
11: determine the range L00 ⊆ L0 corresponding to GnA
12: sort L00 by the index of the observed spectrum
13: for each group GLm ⊆ L00 [in parallel] do
14: for each transformed line hk, l, w, di ∈ GLm do
15: multiply the line curve Al,w,d to the buffer Ck
16: end for
17: end for
18: end for
19: for each pairing Pk ∈ P associated to a spectrum
from G Oj [in parallel] do
20: compute fitness q(Pk) of Ck w.r.t. the associated
spectrum
21: end for
22: end for
Even more, the sets may intersect, so the method must be
perceived only as a means of reducing the number of
observations to be inspected manually.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>This paper was supported by Czech Science Foundation
(GACR) projects P103/13/08195 and P103/14/14292P and
by SVV-2014-260100.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>[1] Sloan digital sky survey</article-title>
          . [Online]. Available: http: //www.sdss3.org/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Coelho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Barbuy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Meléndez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schiavon</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Castilho</surname>
          </string-name>
          , “
          <article-title>A library of high resolution synthetic stellar spectra from 300 nm to 1.8 μm with solar and α-enhanced composition</article-title>
          ,
          <source>” Astronomy &amp; Astrophysics</source>
          , vol.
          <volume>443</volume>
          , pp.
          <fpage>735</fpage>
          -
          <lpage>746</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Palacios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gebran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Josselin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Martins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Belmas</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Lebre</surname>
          </string-name>
          , “
          <article-title>Pollux: a database of synthetic stellar spectra</article-title>
          ,
          <source>” arXiv preprint arXiv:1003.4682</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Fuentes</surname>
          </string-name>
          , “
          <article-title>Automatic determination of stellar atmospheric parameters using neural networks and instancebased machine learning</article-title>
          ,
          <source>” Experimental Astronomy</source>
          , vol.
          <volume>12</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>31</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Recio-Blanco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bijaoui</surname>
          </string-name>
          , and P. De Laverny, “
          <article-title>Automated derivation of stellar atmospheric parameters and chemical abundances: the matisse algorithm,” Monthly Notices of the Royal Astronomical Society</article-title>
          , vol.
          <volume>370</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>141</fpage>
          -
          <lpage>150</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-L.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-N.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prugniel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-N.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-R.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wei</surname>
          </string-name>
          et al.,
          <article-title>“Automatic determination of stellar atmospheric parameters and construction of stellar spectral templates of the Guoshoujing telescope</article-title>
          (LAMOST),
          <source>” Research in Astronomy and Astrophysics</source>
          , vol.
          <volume>11</volume>
          , no.
          <issue>8</issue>
          , p.
          <fpage>924</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bromová</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Škoda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zendulka</surname>
          </string-name>
          , “
          <article-title>Wavelet based feature extraction for clustering of Be stars,” in Nostradamus 2013: Prediction, Modeling and Analysis of Complex Systems</article-title>
          . Springer,
          <year>2013</year>
          , pp.
          <fpage>467</fpage>
          -
          <lpage>474</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Veilleux</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Osterbrock</surname>
          </string-name>
          , “
          <article-title>Spectral classification of emission-line galaxies,”</article-title>
          <source>The Astrophysical Journal Supplement Series</source>
          , vol.
          <volume>63</volume>
          , pp.
          <fpage>295</fpage>
          -
          <lpage>310</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Phillips</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Terlevich</surname>
          </string-name>
          , “
          <article-title>Classification parameters for the emission-line spectra of extragalactic objects,” Publications of the Astronomical Society of the Pacific</article-title>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>19</lpage>
          ,
          <year>1981</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bazarghan</surname>
          </string-name>
          , “
          <article-title>Application of self-organizing map to stellar spectral classifications</article-title>
          ,
          <source>” Astrophysics and Space Science</source>
          , vol.
          <volume>337</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>93</fpage>
          -
          <lpage>98</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. Z.</given-names>
            <surname>Ping</surname>
          </string-name>
          , and G. Qiang, “
          <article-title>A data mining application in stellar spectra,” in Computer Science</article-title>
          and Computational Technology,
          <year>2008</year>
          . ISCSCT '08. International Symposium on, vol.
          <volume>2</volume>
          ,
          <issue>2008</issue>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>69</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Hartigan</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Wong</surname>
          </string-name>
          , “Algorithm AS 136:
          <article-title>A k-means clustering algorithm</article-title>
          ,” Applied statistics, pp.
          <fpage>100</fpage>
          -
          <lpage>108</lpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          ,
          <article-title>Curves and surfaces in computer aided geometric design</article-title>
          . Springer-Verlag Berlin,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>U.</given-names>
            <surname>Maulik</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bandyopadhyay</surname>
          </string-name>
          , “
          <article-title>Genetic algorithmbased clustering technique,” Pattern recognition</article-title>
          , vol.
          <volume>33</volume>
          , no.
          <issue>9</issue>
          , pp.
          <fpage>1455</fpage>
          -
          <lpage>1465</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Lavine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Moores</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Griffiths</surname>
          </string-name>
          , “
          <article-title>Raman spectroscopy and genetic algorithms for the classification of wood types</article-title>
          ,
          <source>” Applied Spectroscopy</source>
          , vol.
          <volume>55</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>960</fpage>
          -
          <lpage>966</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Lavine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Moores</surname>
          </string-name>
          , “
          <article-title>Genetic algorithms for spectral pattern recognition,” Vibrational Spectroscopy</article-title>
          , vol.
          <volume>28</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>83</fpage>
          -
          <lpage>95</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Geurts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fillet</surname>
          </string-name>
          , D. De Seny, M.
          <article-title>-</article-title>
          <string-name>
            <surname>A. Meuwis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Malaise</surname>
            ,
            <given-names>M.-P.</given-names>
          </string-name>
          <string-name>
            <surname>Merville</surname>
          </string-name>
          , and L. Wehenkel, “
          <article-title>Proteomic mass spectra classification using decision tree based ensemble methods</article-title>
          ,
          <source>” Bioinformatics</source>
          , vol.
          <volume>21</volume>
          , no.
          <issue>14</issue>
          , pp.
          <fpage>3138</fpage>
          -
          <lpage>3145</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vahdat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Heywood</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Zincir-Heywood</surname>
          </string-name>
          , “
          <article-title>Symbiotic evolutionary subspace clustering,” in Evolutionary Computation (CEC), 2012 IEEE Congress on</article-title>
          .
          <source>IEEE</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>