Algorithms for Stellar Populations in 3D Spectroscopy

Luis Alvarez-Ochoa (1,2), Leticia Flores-Pulido (3), Roberto Terlevich (2), Oleg Starostenko (3)

(1) Universidad Politécnica de Tlaxcala, Tlaxcala, C.P. 90180, México
(2) Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, C.P. 72840, México
(3) Universidad de las Américas Puebla, Puebla, C.P. 72820, México

lochoa@inaoep.mx, leticia.florespo@udlap.mx, rjt@inaoep.mx, oleg.starostenko@udlap.mx

Abstract. The storage capacity of digital databases exceeds our ability to analyze data, so narrowing this gap as much as possible is imperative. In astronomy, telescopes can perform automatic surveys of the sky, gathering a huge amount of data that are waiting to be analyzed in more detail. The deeper the analysis, the more elaborate and slower the procedures. Computer science is facing the challenge of creating efficient algorithms. Machine learning (ML), a branch of computer science, is an option for many problems where there is no simple definition of the input-output relation. ML techniques have been successfully used in several astronomical and knowledge-storage problems. We focus the research on various branches of machine learning by solving a practical problem: the estimation of stellar population parameters (SPP) and kinematics from galactic spectra. SPP estimation has been tackled in multiple ways, but as the complexity and number of spectra grow, ML techniques gain importance. We define stellar populations and survey some of the most common methods for estimating them. We concentrate on a special type of spectra, called 3D spectra.

Keywords: Algorithms, Galactic spectra, Stellar population, Simulated annealing, Evolution strategies.

1 Introduction

Nowadays, storage and instrumentation technologies are producing huge databases in several sciences; astronomy is a good example of this situation.
Modern telescopes and instruments record spectra and images daily. These data contain a great deal of valuable implicit information that needs to be analyzed by efficient algorithms. Sophisticated computational algorithms have already been used for classifying astronomical objects from images and for estimating physical parameters from spectra. The complexity of the techniques changes according to the size and intrinsic difficulty of the data. However, astronomical observational techniques keep evolving and are producing increasingly complex data. A relatively new technique for obtaining spectra is Integral Field Spectroscopy (IFS), which can produce thousands of spectra of a single object. The data collected by this kind of instrument require efficient methods of analysis. Examples of the type of information that can be retrieved from the IFS spectra of one galaxy are the distribution of ages of the component stars in each region of the galaxy and the kinematics of stars and gas. In principle, with these data one can reconstruct the history of the galaxy, a very important problem in astrophysics. The study of the problems posed by IFS will surely require the development of new algorithms that will contribute to computer science and to other disciplines.

The main methods for estimating stellar populations are introduced in Section 2, two new strategies for improving efficiency are presented in Section 3, and the experimental results are discussed in Section 4.

2 Stellar Populations

The stellar populations in galaxies are characterized by an age distribution, the chemical composition (metallicity), and the amount of gas that obscures starlight (extinction). These properties can be directly inferred from the characterization of the absorption and emission lines of their spectra [1]; this technique is appropriate when only a few spectra and a restricted set of properties have to be analyzed.
Another common and more practical approach for estimating physical parameters is to synthesize a galactic spectrum from a spectral base set of simple stellar populations with known physical features, in such a way that the synthetic spectrum resembles the observed spectrum [2]. Basically, the method tries to find the synthetic spectrum that best fits the observed one; see Figure 3.

3D spectra of galaxies reveal the stellar populations in greater spatial detail. A 3D spectrum is formed by a grid of spaxels, where each spaxel contains a spectrum that samples a region of the galaxy, providing the spatial resolution necessary to synthesize maps of stellar populations and kinematics [3], [4], [5].

3 Methods

We believe that a machine learning approach will be successful in addressing our problem; this claim is based both on a review of the state of the art and on our own experience. The problem can be posed as an optimization problem, so we must review the available optimization algorithms, mainly those in the machine learning area. We also explore the preprocessing of data as a means of improving system performance. We hope that the research will lead us to design algorithms of wide application.

Our stellar population parameter problem is non-linear and constrained. The different objective functions that model galactic spectra contain several independent variables, so the problem is high-dimensional; an example is the objective function (2), based on the model (1). In addition, the data have a certain degree of noise. Therefore, we have selected stochastic and heuristic methods to start our experiments, in particular Simulated Annealing (SA) and Evolution Strategies (ES).

$$ g(\lambda) = \sum_{i=1}^{3} c_i \, s(a_i, \lambda) \, (1 - e^{r_i \lambda}) \qquad (1) $$

$$ f(P) = \sum_{\lambda} |o - g|^2 \qquad (2) $$

In any machine learning system the preprocessing stage has particular relevance to the final performance, even more so when the data are of high dimension [6], [7].
Each spectrum of interest in this proposal has several thousand variables, so some research on this topic is mandatory before we focus on the optimization algorithms.

One way of accelerating the convergence of an optimization algorithm is to start from an initial guess that brings us close to an acceptable solution. The guess can be produced by another algorithm. We test this strategy by designing two architectures. The first architecture sorts the 3D spectra by means of a similarity metric. This ordering allows us to estimate the parameters of a given spectrum and then use them as the initial solution for a similar spectrum; the optimization algorithm then refines this initial solution (see Figure 1). In the second architecture, a neural network trained on ordered pairs (spectral features, stellar population parameters) gives the initial guess (see Figure 2).

Fig. 1. Estimation architecture based on an ordering of spectra according to a similarity metric.

Fig. 2. Estimation architecture based on a classifier.

We start our work by taking both simulated annealing and evolution strategies [8], [9], a type of evolutionary algorithm (EA), as an initial basis; these were chosen for their ability to optimize complex functions. We will use active learning, feature selection and prior knowledge to address our problem.

4 Preliminary Results

The first experiments focus on stellar populations; the issue of kinematics will be treated later.

4.1 The Graphical User Interface

We have designed a simple graphical user interface (GUI) because, given the amount of data, graphical results can help us debug the approach, besides integrating the parts of the system in an orderly way. Figure 3 shows the GUI. It has a menu bar and a central window that shows a test spectrum and its corresponding estimate. The upper-right group, galaxy components, displays the options selected for the data set and model. The lower group, model parameters, shows the parameter values for both the test and the estimated spectrum; the number and type of parameters depend on the chosen model. The GUI already contains menus for the following tasks:

– Select data and models: menu Model. The data are spectra of simple stellar populations, and the model specifies how to combine these data in order to form a synthetic galaxy.
– Create training and test sets: menu Samples. The training set is used to train a neural network that can be used to make a first estimate of the parameters.
– Calculate features and train a neural network: menu Preprocessing. The multiple metric option extracts features from the training and test sets as inputs to a backpropagation neural network.
– Select the optimization algorithm: menu Estimation. Here, we can choose between Simulated Annealing and Evolution Strategies as the estimation algorithm.

Fig. 3. The GUI

4.2 The model

We can choose from several models of synthetic galaxy spectra. They basically form linear combinations of spectra of different ages and then include a non-linear factor to take the extinction into account. We test the model reported in [10], [11]; see Equation (3).

$$ g(\lambda) = \sum_{i=1}^{3} c_i \, s(a_i, \lambda) \, (1 - e^{r_i \lambda}) \qquad (3) $$

where $g(\lambda)$ is the value of the synthetic galaxy spectrum at wavelength $\lambda$; $a_1$, $a_2$ and $a_3$ are the ages of the component spectra; $c_1$, $c_2$ and $c_3$ are their relative contributions to the galaxy; and $r_1$, $r_2$ and $r_3$ are the extinction values. $s(a_i, \lambda)$ is the value at wavelength $\lambda$ of a spectrum of age $a_i$. A basis of seven high-resolution spectra was calculated, a suitably small basis for initial experiments. The ages of the spectra range from 1 million to 1,000 million years. Each spectrum has 13,323 pixels that cover a spectral interval from 3,000 to 7,000 Å.
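As an illustration of Equations (3) and (2), the following is a minimal sketch in Python with NumPy, not the implementation used in the experiments. The function names, the representation of each age $a_i$ as a row index into the basis array, and the toy random basis are assumptions made for this example only.

```python
import numpy as np

def synthetic_spectrum(c, age_idx, r, basis, wavelengths):
    """Evaluate g(lambda) = sum_i c_i * s(a_i, lambda) * (1 - exp(r_i * lambda)).

    Assumption: each age a_i is given as a row index into `basis`,
    whose rows are the simple stellar population spectra s(a, lambda).
    """
    c = np.asarray(c, dtype=float)
    r = np.asarray(r, dtype=float)
    components = basis[np.asarray(age_idx)]               # shape (3, m)
    extinction = 1.0 - np.exp(np.outer(r, wavelengths))   # shape (3, m)
    return np.sum(c[:, None] * components * extinction, axis=0)

def objective(observed, synthetic):
    """Evaluate f(P) = sum over lambda of |o - g|^2."""
    return float(np.sum(np.abs(observed - synthetic) ** 2))

# Toy data (hypothetical): 7 basis spectra sampled at 50 wavelengths.
rng = np.random.default_rng(0)
m = 50
wavelengths = np.linspace(3000.0, 7000.0, m)
basis = rng.uniform(0.5, 1.5, size=(7, m))

ages = [0, 3, 6]
r_values = [-1e-4, -2e-4, -3e-4]
observed = synthetic_spectrum([0.5, 0.3, 0.2], ages, r_values, basis, wavelengths)
candidate = synthetic_spectrum([0.2, 0.3, 0.5], ages, r_values, basis, wavelengths)

perfect_fit = objective(observed, observed)   # identical spectra
misfit = objective(observed, candidate)       # different contributions c_i
```

An optimizer such as SA or ES would then search over the nine parameters (three contributions, three ages, three extinction values) for the set that minimizes `objective`.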
As a first set of features we chose: Euclidean distance, sum of squared differences, difference in total luminosity, maximal difference, and difference between means in ten equal intervals across the spectra; all of these are defined below. The fourteen features are mapped to the nine parameters using a neural network; this is the first estimate, which is later used by the optimization algorithm.

Let $N$ be the number of spectra and $f^k_\lambda$ the energy flux of spectrum $k$ at the point $\lambda$, where $k = 1 \ldots N$ and $\lambda = 1 \ldots m$. We took an arbitrary spectrum and named it $f^0$; all the features are metrics of similarity with respect to this reference spectrum. The features are defined as follows:

$$ f^k_{Tot} = \sum_{i=1}^{m} f^k_i \qquad (4) $$

$$ d_{lum}(f^0, f^k) = |f^0_{Tot} - f^k_{Tot}| \qquad (5) $$

$$ d_{max}(f^0, f^k) = \max(\{|f^0_\lambda - f^k_\lambda|\}) \qquad (6) $$

$$ d_{euc}(f^0, f^k) = \sqrt{\sum_{\lambda=1}^{m} (f^0_\lambda - f^k_\lambda)^2} \qquad (7) $$

$$ d_{sqreuc} = \sum_{\lambda=1}^{m} (f^0_\lambda - f^k_\lambda)^2 \qquad (8) $$

We divide the spectrum into ten intervals of the same length, indexed by $j$, and we measure their means; the intervals are designated by $[j_{low}, j_{up}]$.

$$ mean^k_j = \frac{1}{(j_{up} - j_{low} + 1)} \sum_{\lambda=j_{low}}^{j_{up}} f^k_\lambda, \qquad j = 1 \ldots 10 \qquad (9) $$

$$ d_{mean_j}(f^0, f^k) = |mean^0_j - mean^k_j| \qquad (10) $$

4.3 Experimental Results

In order to assess the relevance of the selected features in the estimation process, we prepared a training set of 100 spectra; this set is the input to the neural network that will predict the population parameters. The prediction made by the network is the starting point of the optimization algorithms SA-NN and ES-NN. Another way of generating a first guess is to order the spectra by a similarity metric, so that when estimating a given spectrum we can use the solution of a neighboring spectrum as the starting point of the optimization algorithms SA-PCA and ES-PCA. The proximity in the feature space should reflect the proximity in the parameter space.
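The fourteen similarity features of Equations (4) to (10), and the idea of ordering spectra along a single principal component, can be sketched as follows. This is a hedged illustration in Python with NumPy: the function names, the use of `np.array_split` to form the ten intervals, the centering-only PCA via SVD, and the toy spectra are assumptions of this example rather than details given in the text.

```python
import numpy as np

def similarity_features(reference, spectrum, n_intervals=10):
    """Return the 14 features of a spectrum relative to the reference f^0."""
    diff = reference - spectrum
    d_sqreuc = np.sum(diff ** 2)                      # Eq. (8)
    d_euc = np.sqrt(d_sqreuc)                         # Eq. (7)
    d_lum = abs(reference.sum() - spectrum.sum())     # Eqs. (4), (5)
    d_max = np.max(np.abs(diff))                      # Eq. (6)
    # Assumption: intervals are as equal as possible (array_split).
    ref_means = np.array([part.mean() for part in np.array_split(reference, n_intervals)])
    spec_means = np.array([part.mean() for part in np.array_split(spectrum, n_intervals)])
    d_mean = np.abs(ref_means - spec_means)           # Eqs. (9), (10)
    return np.concatenate(([d_euc, d_sqreuc, d_lum, d_max], d_mean))

def order_by_first_component(features):
    """Sort rows by their projection onto the first principal component."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[0]
    return np.argsort(scores)

# Toy data (hypothetical): spectra that differ from a base spectrum by a
# brightness factor k, presented in shuffled order.
rng = np.random.default_rng(1)
base = rng.uniform(0.5, 1.5, size=200)
ks = [3, 0, 5, 1, 4, 2]
spectra = [base * (1.0 + 0.1 * k) for k in ks]

features = np.array([similarity_features(base, s) for s in spectra])
order = order_by_first_component(features)
recovered = [ks[i] for i in order]   # k values in the recovered ordering
```

Because the sign of a principal component is arbitrary, the recovered ordering may run in either direction; neighbors in the ordering are what matter when reusing a solution as a starting point.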
We project the features onto the principal components instead of calculating the distances over all pairs of spectra, and we use the first component as the criterion of nearness.

The best results were obtained using a neural network (see Figure 4). PCA did not work better than the neural network; one explanation of this result could be that the test set is randomly sparse, so the mean distance between sets of parameters is probably not suitable for this technique. In addition, in the SA and ES algorithms the rate of change of the current solution is high at the beginning, so the parameters of the algorithms must be chosen appropriately to control this rate when a starting guess is used. The real spectra of galaxies cover only certain regions of the parameter space, so we do not discard the use of PCA until we deal with real 3D spectra.

Figure 5 shows that a set of parameters along a given direction can be ordered by the PCA of the features. The distribution of the parameters along a direction was formed by sampling each parameter range (coordinate) uniformly from its lowest to its highest value. The 100 sets of parameters were built by combining the sampled values as follows: the first point is formed by the first elements of each coordinate, the second point by the second elements of each coordinate, and so on.

The stop criterion was the number of evaluations of the objective function: 2000. The training of the backpropagation network lasted three minutes with a training set of 100 spectra, and its stop criterion was the prediction accuracy: a sum of squared differences less than or equal to 0.01.

5 Conclusions

From these preliminary results, we conclude that the selected features, used as inputs to a neural network, can increase the accuracy of the optimization algorithms. At the same time, the features can be useful in defining a measure of nearness in the parameter space that can be exploited in the optimization of multiple instances, under certain restrictions, i.e.
the data should be clustered and not distributed over the whole parameter space. Furthermore, the parameters of the optimization algorithms must be initialized in a non-standard way in order to take advantage of the first guess.

Fig. 4. Mean absolute errors of 100 spectra using Evolution Strategies.

Fig. 5. The first and second principal components of the features corresponding to spectra generated by 100 sets of parameters distributed along a hyper-line in the parameter space. The first component preserves the same order of the parameter space; this is indicated by the numbers.

References

1. Terlevich, E. and Terlevich, R. and Torres-Papaqui, J.P. and Fernandes, Roberto Cid and Melnick, J. and Kunth, D. and Gu, Q. and Bressan, A., Characterizing the stellar population of the nuclei of active galaxies, Resolved Stellar Populations, D. Valls-Gabaud and M. Chavez, TBA, ASP Conference Series, (2005).
2. Fernandes, Roberto Cid and Mateus, Abílio and Sodré, Laerte Jr. and Stasińska, Grazyna and Gomes, Jean M., Semi-empirical analysis of Sloan Digital Sky Survey galaxies - I. Spectral synthesis method, Monthly Notices of the Royal Astronomical Society, vol. 358, 363-368, April, (2005).
3. Rosado, Margarita, Some Astronomical Niches With 3D Spectroscopy, Revista Mexicana de Astronomia, vol. 24, 92-101, (2005).
4. Kuntschner, Harald and Emsellem, Eric and Bacon, Roland and Bureau, Martin and Cappellari, Michele and Davies, Roger L. and de Zeeuw, Tim and Falcón-Barroso, Jesús and Krajnovic, Davor and McDermid, Richard M. and Peletier, Reynier F. and Sarzi, Marc, The stellar populations of E and S0 galaxies as seen with SAURON, arXiv, January, (2006).
5. Roth, Martin M. and Kelz, Andreas and Fechner, Thomas and Hahn, Thomas and Bauer, Svend-Marian and Becker, Thomas and Bohm, Petra and Christensen, Lise and Dionies, Frank and Paschke, Jens and Popow, Emil and Wolter, Dieter, PMAS: The Potsdam Multi-Aperture Spectrophotometer. I.
Design, Manufacture and Performance, Publications of the Astronomical Society of the Pacific, no. 117, 620-642, June, (2005).
6. Bishop, Christopher M., Neural Networks for Pattern Recognition, chapter 8, Oxford University Press, (2000).
7. Guyon, Isabelle and Elisseeff, André, An Introduction to Variable and Feature Selection, Journal of Machine Learning Research, 1157-1182, March, (2003).
8. Bäck, Thomas, Evolutionary Algorithms in Theory and Practice, Oxford University Press, (1996).
9. Michalewicz, Zbigniew and Fogel, David B., How to Solve It: Modern Heuristics, Springer, (2002).
10. Solorio, Thamar and Fuentes, Olac and Terlevich, Roberto and Terlevich, Elena, An Active Instance-based Machine Learning method for Stellar Population Studies, Monthly Notices of the Royal Astronomical Society, (2005).
11. Alvarez, Luis and Fuentes, Olac and Terlevich, Roberto, Extracting stellar populations parameters of galaxies from photometric data using evolution strategies and locally weighted regression, Lecture Notes in Computer Science, vol. 3215, 395, September, KES2004, Springer Verlag, (2004).