Towards a Unifying View on Deconvolution in Cherenkov Astronomy*

Mirko Bunse, Nico Piatkowski, and Katharina Morik

TU Dortmund, AI Group, 44221 Dortmund, Germany
{firstname.lastname}@tu-dortmund.de

Abstract. Obtaining the distribution of a physical quantity is a frequent objective in experimental physics. In cases where the distribution of the relevant quantity cannot be accessed experimentally, it has to be reconstructed from distributions of correlated quantities that are measured instead. This reconstruction is called deconvolution. While several approaches to deconvolution exist in experimental physics, the problem is rather unknown in data science. Both fields can benefit from each other, but notational differences tend to prevent this. In this work, we outline the unification of existing approaches in order to pave the way for new joint research directions. Cherenkov astronomy is employed as a use case which illustrates the deconvolution problem.

Keywords: deconvolution · unfolding · Cherenkov astronomy

1 Introduction

Obtaining the distribution of a physical quantity is a frequent objective in experimental physics. An accurate and reliable estimate of the sought-after distribution, e.g. the energy spectrum of an astrophysical particle source, is crucial for understanding the underlying physical principles. It is also important for the verification or exclusion of certain theoretical models. In cases where the distribution of the relevant quantity cannot be accessed experimentally, it has to be reconstructed from distributions of correlated quantities that are measured instead. This reconstruction is called deconvolution (or unfolding); it is a prominent problem in experimental physics.

While several approaches to deconvolution exist in physics [2, 6, 8], the problem is rather unknown in data science. Both fields can benefit from each other, but notational inconsistencies tend to prevent this.
In this work, we outline the unification of existing approaches in order to pave the way for new joint research directions. We consider Cherenkov astronomy as a use case of deconvolution which illustrates the problems faced in reconstructing the distribution of a physical quantity from correlated quantities.

* This work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 "Providing Information by Resource-Constrained Data Analysis", projects C3 and A1. http://sfb876.tu-dortmund.de

Cherenkov astronomy studies the energy distribution of cosmic γ radiation to reason about the characteristics of the celestial objects emitting such radiation. Prominent tools in this field are Imaging Air Cherenkov Telescopes (IACTs), which are placed at high altitudes on the surface of our planet. Since it is not viable to detect high-energy γ particles directly there, IACTs instead record the Cherenkov light emitted by a cascade of secondary particles. Such cascades, called air showers, are induced by γ particles interacting with Earth's atmosphere. Since the γ radiation is not directly measured by the telescopes, deconvolution is applied to reconstruct the γ energy distribution from the related Cherenkov light recorded by the IACTs.

Fig. 1. A γ particle interacting in Earth's atmosphere produces a cascade of secondary particles. This air shower emits Cherenkov light, which is measured by an IACT [3]. The energy distribution of γ particles is reconstructed from IACT measurements.

The Cherenkov light emitted by each shower is recorded as a video sequence, from which features are extracted. These features represent each air shower on a higher level of abstraction, e.g. by its geometrical shape and size. Deconvolution adopts each feature as one observable quantity, from which the energy distribution of the observed γ particles is reconstructed.
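The feature-extraction step can be sketched with weighted image moments, which describe a light distribution by its total intensity and its principal-axis spreads. This is a simplified stand-in for the shower parametrizations used in real IACT pipelines; the function name, the chosen descriptors, and the toy image are all illustrative:

```python
import numpy as np

def shower_features(image: np.ndarray) -> dict:
    """Describe a camera image by its total intensity and the geometry
    of its light distribution, via weighted image moments.  A simplified
    stand-in for the features actually extracted in IACT analyses."""
    ys, xs = np.indices(image.shape)             # pixel coordinates
    w = image / image.sum()                      # normalized intensities
    cx, cy = (w * xs).sum(), (w * ys).sum()      # center of gravity
    cov = np.array([                             # weighted covariance matrix
        [(w * (xs - cx) ** 2).sum(), (w * (xs - cx) * (ys - cy)).sum()],
        [(w * (xs - cx) * (ys - cy)).sum(), (w * (ys - cy) ** 2).sum()],
    ])
    eigvals = np.sort(np.linalg.eigvalsh(cov))   # spreads along principal axes
    return {
        "size": float(image.sum()),              # total recorded light
        "width": float(np.sqrt(eigvals[0])),     # minor-axis spread
        "length": float(np.sqrt(eigvals[1])),    # major-axis spread
    }

# A toy 5x5 "shower image": a small blob of recorded light
img = np.array([[0, 0, 0, 0, 0],
                [0, 1, 2, 1, 0],
                [0, 2, 4, 2, 0],
                [0, 1, 2, 1, 0],
                [0, 0, 0, 0, 0]], dtype=float)
feats = shower_features(img)
```

Each such feature vector is one observation from which the energy distribution is later reconstructed.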
2 The Deconvolution Problem

Formally, the task of deconvolution is to estimate a density function f : 𝒴 → ℝ of a random variable Y which represents a physical quantity, where 𝒴 denotes the state space of Y. In Cherenkov astronomy, the most prominent Y is the energy of the γ particles whose Cherenkov light is observed by an IACT. The task is aggravated by the following deficiencies inherent to the experimental setup [2]:

Transformation: When Y cannot be measured directly, one has to measure a related but different quantity X instead. Only with sufficient knowledge about the relation between X and Y can f be reconstructed from the density g of the measurable quantity X. For IACTs, the features of the observed light cone are measurable quantities which are related to the energy Y.

Finite resolution: The measurement may not be exact, resulting in noisy measurements of the quantity X. IACTs may fail to capture the exact shape and size of a light cone if this cone is observed only partially.

Background noise: Additional events may be recorded which do not originate from the process under study. For example, not every observed air shower is induced by a γ particle emitted by the monitored γ-ray source.

Limited acceptance: The detector may fail to recognize some of the events that occur in the studied process. For example, IACT observations are dropped if the trigger is not sufficiently certain about the recording.

A reliable estimate of the relevant density f is only obtained if these deficiencies are rectified, i.e. if f is appropriately reconstructed from the measured density g. The name "deconvolution" is motivated by g being modeled as a convolution of f with a detector response function R : 𝒳 × 𝒴 → ℝ:

    g(x) = ∫_𝒴 R(x | y) · f(y) dy        (1)

R provides the link between X and Y, incorporating background knowledge about their relation.
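A toy sketch makes this convolution model concrete. Here, a made-up three-state spectrum f is folded with a made-up response matrix R, whose columns give the distribution of observed states for each true state. Because this particular R is well-conditioned and g is noise-free, naive matrix inversion recovers f exactly; with a realistic, ill-conditioned response and a g estimated from finite, noisy data, such an inversion fails, which is why dedicated deconvolution methods are needed:

```python
import numpy as np

# Toy discretization of the convolution model: the observed density g is
# the true density f folded with a detector response R, where
# R[j, i] = P(observe state j | true state i).  All numbers are made up.
f = np.array([0.6, 0.3, 0.1])        # true (normalized) spectrum

R = np.array([[0.8, 0.1, 0.0],       # each column i holds the distribution
              [0.2, 0.8, 0.2],       # of observed states for true state i,
              [0.0, 0.1, 0.8]])      # so every column sums to 1

g = R @ f                            # the density the detector would see

# In this noise-free, well-conditioned toy case, inversion recovers f.
f_recovered = np.linalg.solve(R, g)
```

The noise-free forward folding preserves the normalization of the spectrum, while the inversion step is exactly what becomes unstable in practice.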
Specifically, R(x | y) represents the conditional probability of measuring some x ∈ 𝒳 when the actual value of the relevant quantity is y ∈ 𝒴. To obtain f, this model of g has to be inverted, i.e. it has to be "deconvolved". This is done by fitting f to the model of g when g and R are given.¹

Classical deconvolution algorithms solve a discrete variant of the continuous deconvolution problem from Equation 1. This discrete variant, presented in Equation 2, is a linear system of equations in which f, g, and R from the continuous problem are replaced by discretized versions. The relation between the continuous case (Eq. 1) and the discrete case (Eq. 2) is clarified by Equation 3, which states the discretized problem for each discretized state of the observed data g = (g_1, …, g_J) ∈ ℝᴶ:

    g = R f        (2)

    ⇔  g_j = Σ_{i=1}^I R_ij · f_i ,   1 ≤ j ≤ J        (3)

Here, each dimension of the vector g corresponds to a specific discrete value disc_g(x), and each dimension of the vector f corresponds to a specific discrete value disc_f(y). Since the domain 𝒳 of g is usually multi-dimensional, clustering is employed to map each vector-valued observation x ∈ 𝒳 to the index j of the cluster which the observation belongs to, effectively aggregating the multi-dimensional feature space into a single dimension. The conditional probabilities in R are estimated from a set of training observations. Two deconvolution methods are considered state of the art, even though they have not yet been benchmarked on IACT data:

¹ Readers familiar with deconvolution will notice that our nomenclature differs slightly from the notation that is conventional in particle physics. Specifically, the roles of the letters x and y are exchanged here for compliance with the notation that is common in machine learning: here, y refers to the value of the target variable and x is the observed value.
This notation clarifies the relation between the deconvolution problem and classification, a relation that is important to our unifying view.

Regularized Unfolding performs a maximum likelihood fit of Equation 2, assuming that the absolute number of events for each discretized state of g(x) is Poisson-distributed conditioned on f [1, 2]. The optimization is carried out via iterative numerical second-order optimization. Since a large variance between consecutive states is not physically plausible, regularization is employed to produce sufficiently stable results. The amount of regularization is controlled by the degrees of freedom in the second-order local model.

Iterative Bayesian Unfolding estimates the target distribution by applying Bayes' theorem to the observed frequencies g and the conditional probabilities embodied by the detector response matrix R [5, 6]. Bayes' theorem is applied multiple times, each time updating the prior distribution with the respective latest estimate. Repeating the update reduces the influence of the (potentially inappropriate) initial prior distribution.

3 Deconvolution as a Classification Task

The novel Dortmund spectrum estimation algorithm (DSEA) [8] is of particular interest for the data science community because it translates the deconvolution problem into a multinomial classification task instead of solving Equation 2. It thus opens the deconvolution problem to the field of machine learning in a principled way. Here, the discretized target variable Y serves as the label of the classification task. Thus, we may apply any multi-class classification algorithm to predict the range of Y values which belong to an individual observation.

DSEA is based on the intriguing idea that f_i = P(Y ≡ i) can be recovered from a classifier's confidence. To see this, consider Equation 4, where the conditional probabilities P̂(Y ≡ i | X = x) are estimated by the confidence values c_M(i | x_n) of the underlying classifier, e.g.
a random forest. In addition, a uniform prior is imposed on x (given that 𝒳 is a closed subset of ℝᵈ). The outcome is the DSEA estimator shown in Equation 5, which estimates f from confidence values:

    P̂(Y ≡ i) = Σ_{x∈𝒳} P̂(Y ≡ i | X = x) · P̂(X = x)        (4)

    f̂_i = (1/N) Σ_{n=1}^N c_M(i | x_n)        (5)

The deconvolution result is then improved by iterating the reconstruction, updating the distribution of the training set by weighting the examples according to the latest estimate of the target distribution. We suggest scaling the update step between iterations to ensure convergence of the algorithm. In fact, the original DSEA diverges from the true f after having found a suitable estimate in some iteration. Our suggestion is inspired by a common algorithmic pattern in numerical optimization, where the basic algorithm selects a search direction and a sub-routine called line search selects an appropriate step size for this direction.

(Figure 2 plots the EMD to f over the DSEA iteration k; the left panel shows constant step sizes α(k) ∈ {0.3, 0.6}, the right panel decaying step sizes α(k) ∈ {1/k, 1/√k}, each compared to the original DSEA.)

Fig. 2. The convergence of DSEA is controlled by scaling each step with α(k), as indicated by the Earth Mover's Distance [7] between the estimate f̂(k) in iteration k and the true density f. The original DSEA uses α(k) = 1. Left: constant step sizes approach and leave the optimal solution slowly, facilitating the choice of a convergence threshold. Right: decaying step sizes enforce convergence in proximity of the optimum.

4 Conclusion and Future Work

We have illustrated the problem of deconvolution, employing Cherenkov astronomy as a use case. Our presentation provides a basis for unifying the view on current state-of-the-art algorithms from particle physics in the context of machine learning.
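The DSEA iteration with a scaled update step can be sketched as follows. This is a toy stand-in, not the published implementation: the "classifier" is a simple Gaussian class-conditional model, DSEA's reweighting of the training set is emulated by using the current spectrum estimate as the class prior, and all data, class definitions, and constants are made up:

```python
import numpy as np

# Toy sketch of DSEA: average classifier confidences (Equation 5) and
# iterate, scaling each update step with a decaying alpha(k).
rng = np.random.default_rng(0)
I, means_true = 3, np.array([0.0, 2.0, 4.0])
f_true = np.array([0.5, 0.3, 0.2])     # unknown target spectrum (simulation only)

y_train = rng.integers(0, I, 3000)     # uniformly distributed training labels
x_train = rng.normal(means_true[y_train], 1.0)
x_obs = rng.normal(means_true[rng.choice(I, p=f_true, size=3000)], 1.0)

mu = np.array([x_train[y_train == i].mean() for i in range(I)])  # "train" classifier

def confidences(x, prior):
    """P(Y = i | x) from Gaussian class models; the prior emulates
    DSEA's reweighting of the training set by the current estimate."""
    post = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) * prior
    return post / post.sum(axis=1, keepdims=True)

f_hat = np.full(I, 1.0 / I)            # start from a uniform spectrum
alpha = lambda k: 1.0 / k              # decaying step size
for k in range(1, 11):
    f_next = confidences(x_obs, f_hat).mean(axis=0)   # Equation 5
    f_hat = f_hat + alpha(k) * (f_next - f_hat)       # scaled update step
```

Since each confidence row sums to one, every estimate f̂ along the iteration remains a valid probability distribution.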
Against the background of the many possible synergies arising from collaborative research on deconvolution, we have given one example of an algorithmic improvement inspired by numerical optimization.

Even though deconvolution methods have been surveyed elsewhere [4], a comparative study evaluating these methods on benchmark data is still missing. We plan to deliver such a study on data taken with IACTs.

References

1. Blobel, V.: Unfolding methods in high-energy physics experiments. Tech. rep., CERN (1985)
2. Blobel, V.: An unfolding method for high energy physics experiments. In: Adv. Stat. Techniques in Part. Phys., pp. 258–267 (2002)
3. Bockermann, C., Brügge, K., Buss, J., Egorov, A., Morik, K., Rhode, W., Ruhe, T.: Online analysis of high-volume data streams in astroparticle physics. In: Proc. of the ECML-PKDD 2015, pp. 100–115. Springer (2015)
4. Cowan, G.: A survey of unfolding methods for particle physics. In: Adv. Stat. Techniques in Part. Phys. (2002)
5. D'Agostini, G.: A multidimensional unfolding method based on Bayes' theorem. Nucl. Instrum. Methods Phys. Res. A 362(2-3), 487–498 (1995)
6. D'Agostini, G.: Improved iterative Bayesian unfolding. arXiv:1010.0632 (2010)
7. Rubner, Y., Tomasi, C., Guibas, L.J.: A metric for distributions with applications to image databases. In: Proc. of the 6th ICCV, pp. 59–66. IEEE (1998)
8. Ruhe, T., Börner, M., Wornowizki, M., et al.: Mining for spectra – the Dortmund spectrum estimation algorithm. In: Proc. of the ADASS XXVI (2016)