=Paper=
{{Paper
|id=Vol-2679/short16
|storemode=property
|title=Modifications to the EMC Algorithm for Orientation Recovery in Single Particle Imaging Experiments on X-ray Free Electron Lasers
|pdfUrl=https://ceur-ws.org/Vol-2679/short16.pdf
|volume=Vol-2679
|authors=Sergei Zolotarev
}}
==Modifications to the EMC Algorithm for Orientation Recovery in Single Particle Imaging Experiments on X-ray Free Electron Lasers==
Modifications to the EMC Algorithm for Orientation Recovery in Single Particle Imaging Experiments on X-ray Free Electron Lasers Sergei Zolotarev National Research Centre “Kurchatov Institute,”, pl. Akademika Kurchatova 1, Moscow, 123182, Russia Abstract. The emergence of super-bright light sources - X-ray free elec- tron lasers(XFELs) combined with Single Particle Imaging(SPI) method, makes it possible to obtain nanometer resolution 3D structure of biolog- ical particles such as proteins or viruses without needing to freeze them. SPI relies on the “diffraction before destruction” principle, meaning that each sample only produces a single diffraction image before being de- stroyed by an X-ray pulse. The orientation of the particle in the beam is random for each shot. This gives rise to the problem of orientation recovery, in which an array of 2D diffraction images has to be combined into a single 3D image, necessary for the reconstruction of 3D structure of the studied particle. The orientation recovery problem is most com- monly solved by the EMC algorithm, which is the most computationally expensive part of data analysis for SPI experiments. In this work we in- troduce several modifications to the EMC algorithm aimed at improving the quality of reconstruction and/or increasing the algorithm’s speed of convergence. We analyse the effectiveness of these modifications using simulated diffraction data. 1 Introduction The emergence of fourth generation light sources - X-ray Free Electron Lasers (XFEL), opens up new possibilities in many fields of science. Compared to third generation synchrotron light sources, XFELs produce extremely bright (more than 1012 photons) and extremely short (tens of femtoseconds) x-ray pulses [6]. These properties provide a new way to study the 3D structure of bioparticles such as proteins and viruses - Single Particle Imaging (SPI) [8]. Compared to other methods such as X-Ray crystallography or cryogenic electron microscopy, SPI has the advantage of being able to study non-crystalline bioparticles in their natural state (suspended in water). SPI relies on ”diffraction before destruction”[2] principle, which abuses the unique properties of XFEL pulses. The pulses are bright enough to produce an Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). informative diffraction pattern, and they are at the same time are short enough, so that all the scattering happens before the radiation damage takes place. Since the scattering particle is destroyed in the process, multiple identical particles are injected into the XFEL beam during the experiment. This gives rise to orientation recovery problem: SPI produces a number of flat diffraction images of the particle in an unknown orientation. In order to reconstruct the 3D structure of said particle these images need to be combined to produce a single 3D diffraction density. The established way to solve this problem is by using the EMC algorithm [5]. In this work we introduce three modifications of orientation recovery algo- rithm EMC, which aim to improve quality of reconstruction and to increase the speed of convergence. We examine their effectiveness using simulated diffraction data. 2 Methods 2.1 Orientation recovery In orientation recovery problem we have an array of Mdata diffraction patterns scattered by identical particles in unknown orientations. Each such pattern is a spherical slice of a single 3D density in reciprocal space, and the goal of an orientation recovery algorithm is to reconstruct this 3D density W by finding to which orientation each diffraction pattern belongs to. This problem is most commonly solved by the EMC algorithm. EMC algorithm starts with input data (diffraction images K) and an initial approximation of 3D diffraction density W 0 . Then this initial approximation is iteratively improved until it converges to the final value of W T . Each iteration of the algorithm consists of three steps which give the name to the algorithm: Expand, Maximize and Compress. During Expansion step the current approximation W, which is typically rep- resented by its values on regular 3D grid, is converted to tomographic represen- tation Wij - values of W in points corresponding to i-th pixel of the diffraction image with scattering particle in j-th orientation. In order to do this we sample the 3D rotation group SO(3) by a finite number of ”evenly spaced” rotations Rj (j = 1 . . . Mrot ). And for each pixel of the detector we calculate a corresponding point in reciprocal space qi (i = 1 . . . Mpix ). For typical flat detectors all these points will be lying on Ewald’s sphere. After calculating rotations Rj and points qi we can define Wij as W (Rj · qi ), which is calculated via linear interpolation. The Maximization step updates current approximation of 3D diffraction den- sity Wij → Wij0 based on maximizing log-likelihood function Q(W 0 ). This step is equivalent to one iteration of EM algorithm [3] and in itself consists of two steps. First we calculate Pjk - the probabilities of each image Dk (k = 1 . . . Mdata ) to be produced by the particle in j-th orientation, conditional on current model val- ues Wij as the product of Poisson probabilities at each detector pixel. Then we calculate new values of 3D density Wij0 , maximizing the expected log-likelihood: MP data Pjk Dik k=1 Wij0 = MP . data Pjk k=1 Finally, Compression step converts Wij0 back onto regular 3D grid by revers- ing the interpolation procedure used in the expansion step. In essence, EMC algorithm is equivalent to EM where after each iteration we perform combi- nation of compression and expansion steps, which enforce extra constraints on current model Wij . Whereas for EM all Wij are treated as independent vari- ables, in reality they are derived from values on 3D intensity grid, and when the distance between points Rj · qi and Rj 0 · qi0 is small, values Wij and Wi0 j 0 are not independent. In this work we propose three modifications to the EMC algorithm: Incremental EMC modifies the maximization procedure, instead of perform- ing a single update of 3D intensity W using data from all images, only one image is used on each iteration. For randomly selected image Dk∗ probabilities Pjk∗ are calculated and then the current values of Wij are immediately updated: MP data old old ( Pjk Dik ) − Pjk ∗ Dik ∗ + Pjk ∗ Dik ∗ k=1 Wij0 = MP . data ( old ) − P old + P ∗ Pjk jk∗ jk k=1 Such an update can be performed in O(1) time if the numerator and denominator old of Wij are saved separately, as well as all the values of Pjk [7]. This way Mdata of such updates can be performed in the same time as the single maximization step of EMC algorithm. This incremental M-step is followed by compression and expansion steps same as in regular EMC algorithm. However, expansion steps outputs only values of Wij , and in order to use out incremental maximization step again we P first need to perform a normalPM-step to obtain separate values Mdata Mdata old of numerators k=1 Pjk Dik , denominators k=1 Pjk and probabilities Pjk . Effectively, this modification of EMC alternates its M-steps between normal and incremental, and only performs compression and expansion steps after every other iteration. Stepwise EMC uses batches of images in its maximization step. First a new 3D diffraction density Wij∗ is calculated using only a subset of images D∗ ∈ D. Then we take weighted average between current desity Wij and Wij∗ as the result of maximization: Wij0 = (1 − (2 + t)−γ )Wij + (2 + t)−γ Wij∗ , where t is the number of iteration and 0.5 < γ ≤ 1 is a coefficient that ensures that the weight before newly calculated density W ∗ exponentially decreases, thus guaranteeing convergence of the algorithm [10]. This maximization step is then performed several more times with different subsets of images, until all Mdata images have been used. Then follow normal compression and expansion steps. Adaptive EMC is the final proposed modification which does not modify any of the three steps of EMC, and instead it only takes effect after the M-step, when the new values of Wij0 are calculated. Instead of taking the point W 0 itself, which maximizes the expected log-likelihood function Q(W 0 ), we try to go further in the same direction: Wˆij = Wij + α(Wij0 − Wij ), where coefficient α ≥ 1 is adaptively changed based on log-likelihood Q(Ŵ ). If Q(Ŵ ) ≥ Q(W ) then α is increased and the Ŵ is accepted as the result of current iteration, otherwise α is reset to 1 and W 0 is taken as a result instead. This approach guarantees that log-likelihood never decreases and the algorithm thus converges [9]. 2.2 Testing the algorithms To evaluate the performance of our modifications and to compare them to regular EMC we tested them using simulated diffraction data. In the first test we used a binary contrast torus in reciprocal space as our object. Using such a simple model allows us to successfully perform orientation recovery in a short time and without tuning of any reconstruction parameters. This model allows us to easily compare different algorithms, but it has a draw- back of being very far removed from actual experimental data. For that reason we performed a second more life-like test. In the second test we used diffraction patterns of keyhole limpet hemocyanin [4] simulated by Dragonfly [1] software package. In order to produce successful reconstructions from this data, we used deterministic annealing modification of EMC as described in [1]. In both tests for each algorithm we performed the reconstruction 5 times, using a random initial approximation of 3D diffraction density W 0 . Then we compared the output of each iteration W t with the initial density Wtrue , that was used to generate the diffraction images. We used root mean square difference (RMS difference) between these two 3D diffraction densities as our metric of quality of reconstruction. 3 Results In the first test we used 1000 images with 4900 pixels in each image. Number of possible rotations Mrot was increasing every 10 iterations. Starting from 420 0.24 0.22 0.20 RMS difference 0.18 0.16 0.14 regular incremental stepwise 0.12 adaptive 0 10 20 30 40 50 Iteration Fig. 1. Average and standard deviation of root mean square difference between the output of each algorithm and the initial 3D diffraction density. Dashed lines indicate the points where the number of rotations considered in reconstruction changed. possible orientations on the first ten iterations and up to 10860 orientations for iterations 41 through 50. The results of this test are presented on Fig. 1. One can see that out of three proposed modifications, only incremental EMC demonstrated better results than unmodified algorithm. Stepwise EMC proved to be too dependent on the random selection of the first batch of images used in reconstruction. Adaptive EMC didn’t provide any boost to the speed of conver- gence, due to the fact that the coefficient α never became greater than 1 without decreasing the expected log-likelihood. For these reasons stepwise and adaptive modifications were ruled out as in- effective, and for the second test only Incremental EMC was evaluated. In this test we used 5000 images with 10000 pixels and 10860 possible orientations. The results of this test are presented on Fig. 2. Both algorithms demonstrated similar results, however our chosen metric of quality doesn’t perform very well on this test. The difference between 0-th iteration which is just random noise and the final output of the algorithm is quite small, when looking only at the RMS difference between the reconstructed and initial 3D diffraction density. Due to this we cannot conclusively evaluate the performance of incremental modification of EMC algorithm. 4 Conclusion In this work we proposed, developed, and tested three modifications to the EMC algorithm, used to solve orientation recovery problem in SPI experiments. After the first preliminary test, adaptive and stepwise modifications have shown worse results than unmodified algorithm and were thus ruled out as non-viable. For the more promising incremental modification we performed a second set of tests with more life-like input data. In these tests both incremental and regular EMC 0.091 Regular Incremental 0.090 RMS difference 0.089 0.088 0.087 0.086 0 5 10 15 20 25 30 Iteration Fig. 2. Average and standard deviation of root mean square difference between the output of regular and incremental EMC algorithms and the initial 3D diffraction den- sity. demonstrated similar results, but due to instability of reconstruction process and difficulty of the evaluation of reconstruction quality, we can not definitively say that one algorithm is better than the other. Additional tests may be required to establish that. 5 Acknowledgements This research was supported by Russian Science Foundation (project No. 18-41- 06001) and Helmholtz Associations Initiative Networking Fund (HRSF-0002). References 1. Ayyer, K., Lan, T.Y., Elser, V., Loh, N.D.: Dragonfly: an implementa- tion of the expand–maximize–compress algorithm for single-particle imag- ing. Journal of Applied Crystallography 49(4), 1320–1335 (Aug 2016). https://doi.org/10.1107/S1600576716008165 2. Chapman, H.N., Barty, A., Bogan, M.J., Boutet, S., Frank, M., Hau-Riege, S.P., Marchesini, S., Woods, B.W., Bajt, S., Benner, W.H., London, R.A., Plonjes, E., Kuhlmann, M., Treusch, R., Dusterer, S., Tschentscher, T., Schneider, J.R., Spiller, E., Moller, T., Bostedt, C., Hoener, M., Shapiro, D.A., Hodgson, K.O., van der Spoel, D., Burmeister, F., Bergh, M., Caleman, C., Huldt, G., Seibert, M.M., Maia, F.R.N.C., Lee, R.W., Szoke, A., Timneanu, N., Hajdu, J.: Femtosecond diffractive imaging with a soft-x-ray free-electron laser. Nature Physics 2(12), 839–843 (Nov 2006). https://doi.org/10.1038/nphys461 3. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incom- plete data via the em algorithm. Journal of the Royal Statistical Society: Series B (Methodological) 39(1), 1–22 (1977). https://doi.org/10.1111/j.2517- 6161.1977.tb01600.x 4. Gatsogiannis, C., Markl, J.: Keyhole limpet hemocyanin: 9-a CryoEM structure and molecular model of the KLH1 didecamer reveal the interfaces and intricate topology of the 160 functional units. Journal of Molecular Biology 385(3), 963– 983 (Jan 2009). https://doi.org/10.1016/j.jmb.2008.10.080 5. Loh, N.T.D., Elser, V.: Reconstruction algorithm for single-particle diffraction imaging experiments. Physical Review E 80(2) (Aug 2009). https://doi.org/10.1103/physreve.80.026705 6. Mcneil, B., Thompson, N.: X-ray free-electron lasers. Nature Photonics 4 (12 2010). https://doi.org/10.1038/nphoton.2010.239 7. Neal, R.M., Hinton, G.E.: A View of the Em Algorithm that Justifies Incremental, Sparse, and other Variants, pp. 355–368. Springer Netherlands, Dordrecht (1998). https://doi.org/10.1007/978-94-011-5014-9-12 8. Neutze, R., Wouts, R., van der Spoel, D., Weckert, E., Hajdu, J.: Potential for biomolecular imaging with femtosecond x-ray pulses. Nature 406(6797), 752–757 (Aug 2000). https://doi.org/10.1038/35021099 9. Salakhutdinov, R., Roweis, S.: Adaptive overrelaxed bound optimization methods 2 (10 2003) 10. Sato, M., Ishii, S.: On-line EM algorithm for the normalized gaus- sian network. Neural Computation 12(2), 407–432 (Feb 2000). https://doi.org/10.1162/089976600300015853