=Paper=
{{Paper
|id=Vol-2744/paper7
|storemode=property
|title=The Two-Level Semi-Synchronous Parallelization Method for the Caustic and Indirect Luminance Calculation in Realistic Rendering
|pdfUrl=https://ceur-ws.org/Vol-2744/paper7.pdf
|volume=Vol-2744
|authors=Andrey Zhdanov,Dmitry Zhdanov
}}
==The Two-Level Semi-Synchronous Parallelization Method for the Caustic and Indirect Luminance Calculation in Realistic Rendering==
The Two-Level Semi-Synchronous Parallelization Method for the Caustic and Indirect Luminance Calculation in Realistic Rendering*

Andrey Zhdanov [0000-0002-2569-1982], Dmitry Zhdanov [0000-0001-7346-8155]

ITMO University, 49 Kronverksky Pr., St. Petersburg, 197101, Russia
adzhdanov@itmo.ru, ddzhdanov@mail.ru

Abstract. The paper considers an original approach to the semi-synchronous calculation of the luminance of caustic and indirect illumination for the group of methods based on bidirectional stochastic ray tracing with backward photon maps. The designed parallelization method uses a two-level thread hierarchy. The low level of this hierarchy performs synchronous calculation of a part of the whole image defined by a randomly generated pixel mask. The top level is the semi-synchronous parallelization level, which consists of groups of low-level threads, each of which calculates its own part of the whole image in a way similar to asynchronous calculations. Since the top level is semi-synchronous, the low-level threads, when calculating the luminance of the caustic and indirect illumination, have access to the data accumulated in the backward photon maps of the other parallel threads of the semi-synchronous level. A special algorithm for organizing access to the data of the upper-level threads avoids delays associated with data synchronization. A comparison of the developed solution with purely synchronous and asynchronous parallelization methods is presented.

Keywords: Ray Tracing, Photon Maps, Backward Photon Maps, Parallel Computing.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
* This work was funded by the Russian Science Foundation (project No. 18-79-10190).

1 Introduction

Realistic rendering is a significant component that is commonly used in modern realistic visualization, virtual prototyping, and virtual reality systems. In addition, it is used to solve a wide range of applied problems, including forming realistic images, simulating optical effects, virtual prototyping of complex optical systems, etc. With the increasing computing power and computation architecture complexity of modern computer systems, both the complexity of virtual prototyping tasks and the required accuracy of calculations increase. The virtual prototyping tasks solved by physically correct realistic visualization methods include modeling the illumination of optical systems and modeling the human perception of synthesized images formed by complex optical systems, such as, for example, virtual or mixed reality systems, windshield indicators, and others.

Traditionally, realistic rendering algorithms are based on Monte-Carlo ray tracing methods, which are used to calculate the luminance. These ray tracing methods can be forward, backward, or bidirectional. The most universal of these methods for calculating the physically correct luminance of indirect and caustic illumination are the methods based on the use of photon maps [1, 2]. These rendering methods have good parallelization capability; however, efficient usage of the photon map in a multi-core environment is a challenging task.
Although there are existing solutions aimed at efficient CPU or GPU ray tracing, such as Intel Embree [3], Nvidia RTX [4], or AMD RadeonRays, they do not solve the problem of efficient processing of the traced rays for the needs of realistic rendering with photon mapping. So, the research and development of effective rendering methods using all available computation resources of modern multi-core CPUs with a continuously growing number of cores is still an urgent challenge.

There are existing solutions aimed at parallelizing photon mapping rendering algorithms using synchronous calculations on multi-core CPUs [5], out-of-core photon mapping [6, 7], massively parallel calculations with GPUs [8, 9], distributed simulations [6, 10], and cloud rendering [11]. At the same time, a rendering algorithm might be aimed either at speed, to achieve real-time rendering, or at the physical correctness of the simulated image. In the scope of the current article we concentrate on the effective usage of backward photon maps in a multi-thread environment when rendering with the stochastic progressive backward photon mapping (SPBPM) method on a single-CPU workstation with multiple cores, with the aim of physical correctness of the simulation.

2 Rendering parallelization

2.1 Rendering method

In the scope of the current research we used the method based on bidirectional ray tracing with backward photon mapping [12, 13]. In contrast to the traditional photon mapping based methods, the backward photon maps are formed by the backward rays emitted from the camera, and the luminance transferred by forward rays is accumulated in these maps to form the final image. The general workflow of the rendering method is shown in Fig. 1. The rendering method consists of four main steps:

1. Backward path tracing, when the backward paths from the camera are generated and traced in the scene with direct light and BDF sampling to account for the direct luminance.
2. Backward photon map forming, along with creation of the acceleration structures.
3. Forward ray tracing, when forward rays from light sources are generated and traced in the scene. By intersecting them with the previously formed backward photon maps, the indirect and caustic luminance is accumulated.
4. Final image forming, when the luminance accumulated in the backward photon maps is added to the corresponding pixels of the image and weighted. The image accuracy is estimated, and if the required accuracy is not achieved, the calculation continues from the first step.

Fig. 1. General workflow of the used backward photon mapping rendering method.

2.2 Traditional parallelization methods

The traditional parallelization methods used for ray tracing based rendering are synchronous and asynchronous calculations. Both traditional methods were implemented and tested on the 12 cores of an Intel Xeon 6230 CPU with the standard Cornell box scene. Ray tracing and processing speedup test results are presented in Fig. 2 and Tables 1 and 2.

The synchronous calculation method parallelizes the rendering process on a multi-core system by using all available cores in a synchronous way. In this case all threads share the same memory pool and render the same scene synchronously. The main problem of this approach is the presence of non-parallelizable parts of the algorithm, which, according to Amdahl's law [14], cause a significant ray tracing and processing slowdown as the number of cores increases.
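For reference, Amdahl's law (the standard formulation, not restated in the paper) bounds the speedup S achievable on n cores when only a fraction p of the work can be parallelized:

<math>S(n) = \frac{1}{(1 - p) + p/n}</math>

For example, with an illustrative (not measured) parallel fraction of p = 0.9, the speedup on 12 cores is limited to 1/(0.1 + 0.9/12) ≈ 5.7, which is consistent with the saturation behaviour described below.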
As can be seen from the presented graph, the ray tracing and processing speed growth almost stops at some point as the number of used computation cores increases.

In contrast to the synchronous calculation method, the asynchronous calculation method uses one main thread and a group of computation threads. Each of these computation threads performs independent rendering of the whole image, and the main thread is used to control the computation threads and gather the rendering results from them. Due to the asynchronous nature of the calculations, each computation thread uses its own private memory, which multiplies the memory usage of the whole rendering process by the number of computation threads. As a result, as can be seen from the presented graph, the number of traced and processed rays grows linearly with the number of used cores.

Fig. 2. Test results for traditional synchronous and asynchronous parallelization methods.

At the same time, after 10 minutes of calculations using all computation cores, the synchronous method achieved an accuracy of 3.9%, while the asynchronous method achieved an accuracy of 5.1%. So even if the asynchronous method traces and processes more rays, the accuracy attained after 10 minutes of calculations of the same scene is lower compared to the synchronous calculations. The main reason for this slowdown is that in the case of asynchronous calculations the whole available memory is split between threads; as a result, rays are processed in smaller groups, resulting in fewer connections made between light sources and camera pixels.

2.3 Semi-synchronous parallelization method

The research goal was to unite the scalability of the asynchronous calculations with the higher accuracy of the synchronous calculations in the most effective way. So, in the scope of the current research the semi-synchronous parallelization method was developed. This method consists of two parallelization levels: synchronous and semi-synchronous.

The first parallelization level is synchronous rendering of a part of the whole scene image defined by a random mask, typically of 32x32 pixels in size. This mask defines the image pixels that are used at the backward ray tracing step for the backward photon map forming. Each rendering step is parallelized independently in a synchronous way with one main thread that controls the rendering method workflow. The synchronous parallelization level is shown in Fig. 3.

Fig. 3. Synchronous parallelization level.

The second parallelization level is semi-synchronous; it unites several synchronous thread groups to render the whole scene image in a semi-synchronous way. Each group of synchronous threads has its own random mask. These masks are generated by the main thread so that they cover all image pixels and at the same time do not intersect with each other. Once every several rendering phases, all threads are synchronized, the whole rendered image is formed, and the masks are re-randomized. This re-randomization is required to guarantee that all synchronous thread groups have approximately equal load and all image pixels are processed equally. Each synchronous thread group uses its own backward photon maps, which multiplies the memory usage required to store the maps by the number of synchronous thread groups; however, each of these maps can be made smaller compared to a map built for the whole image. The semi-synchronous parallelization level is shown in Fig. 4.

Fig. 4. Semi-synchronous parallelization level.
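A minimal sketch of how the main thread could generate such disjoint, covering, re-randomizable masks is given below. It assumes the masks are assembled from 32x32 pixel tiles dealt out to the thread groups; this tile-based construction and all names here are illustrative assumptions, not the paper's implementation.

<pre>
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct Tile { uint32_t x0, y0; };  // top-left corner of a 32x32 pixel tile

// Deal all tiles of the image out to numGroups disjoint random masks.
std::vector<std::vector<Tile>> randomizeMasks(uint32_t width, uint32_t height,
                                              uint32_t numGroups, std::mt19937& rng) {
    constexpr uint32_t kTile = 32;
    std::vector<Tile> tiles;
    for (uint32_t y = 0; y < height; y += kTile)
        for (uint32_t x = 0; x < width; x += kTile)
            tiles.push_back({x, y});
    // A random permutation dealt round-robin keeps the masks disjoint,
    // jointly covering the image, and roughly equal in size (load balance).
    std::shuffle(tiles.begin(), tiles.end(), rng);
    std::vector<std::vector<Tile>> masks(numGroups);
    for (std::size_t i = 0; i < tiles.size(); ++i)
        masks[i % numGroups].push_back(tiles[i]);
    return masks;
}
</pre>

Re-randomization at a synchronization point would then amount to calling this routine again and handing the new masks to the groups.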
It is possible that at the forward ray tracing step of the rendering method a forward ray traced by one synchronous thread group does not find an intersection with that group's own backward photon maps; however, this ray might contribute luminance to the backward photon maps of the other synchronous thread groups. To increase the processing ratio of the traced forward rays, it is necessary to provide access to all backward photon maps that exist at the moment. This means that a flag should be maintained by the backward photon maps owner indicating that the maps are available and open for use by other threads. At the same time, these maps should not be closed and deleted while some other thread is using them. The workflow of the rendering process in a synchronous group of threads, with opening and closing access to its own backward photon maps, is shown in Fig. 5.

Fig. 5. General workflow of a single thread of the backward photon mapping rendering method with shared photon maps.

Opening and closing access to the own backward photon maps should be performed without interrupting the calculation process and without using critical sections. So, in the scope of the current research, an algorithm of asynchronous access to the backward photon maps was developed. This algorithm uses only atomic operations, both to open and close access to the own backward photon maps and to gain and release access to the backward photon maps of other threads.

For this purpose, an active thread counter is stored along with each of the backward photon maps. This counter shows how many independent threads are currently accessing the corresponding maps. If this counter is equal to zero, the maps either do not exist or are not open for access by non-owner threads. If the counter is greater than zero, the maps are available for other threads to use. Only the thread that created the photon maps can increase the counter from the zero state. The owner thread can decrease the corresponding counter to zero only if it is equal to 1, which means that no other thread is currently using these maps.

To open and close access to the own backward photon maps at the beginning and end of the forward ray tracing step, the following algorithm is used (see the sketch after this list):

1. Increase the own backward photon maps active thread counter by 1 with an atomic increment operation. This marks the own backward photon maps as ready to be used during the forward ray tracing step for indirect and caustic luminance calculation by other threads.
2. Proceed to the forward ray tracing along with indirect and caustic illumination calculation, using both the own backward photon maps and the backward photon maps of other threads that are open for access at the moment.
3. When the desired number of forward rays has been traced and processed, try to perform an atomic compare-and-swap operation to decrease the own backward photon maps active thread counter from 1 to 0.
   a. If the compare-and-swap operation succeeds, the current thread was the last one accessing these backward photon maps, and access is successfully closed for other threads.
   b. If the compare-and-swap operation fails, some other thread is still accessing the current backward photon maps and they cannot be closed at the moment. In this case, more forward rays are traced and processed until this compare-and-swap operation can be successfully completed.
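A minimal sketch of this owner-side open/close protocol, using C++ std::atomic, is given below. It is an illustration under assumptions, not the paper's code: the helper functions and the ray-count threshold are hypothetical stand-ins for the real forward tracing loop.

<pre>
#include <atomic>
#include <cstdint>

struct BackwardPhotonMaps {
    std::atomic<uint32_t> activeThreads{0};  // 0 = maps do not exist or are closed
    // ... photon map data ...
};

// Hypothetical stand-ins for the real tracing code, stubbed so the sketch compiles.
static int raysTraced = 0;
void traceForwardRayBatch(bool /*useOtherMaps*/) { ++raysTraced; /* trace and process a batch */ }
bool desiredRayCountReached() { return raysTraced >= 100000; }

void forwardRayTracingStep(BackwardPhotonMaps& own) {
    // Step 1: open the own maps; only the owner may raise the counter from 0.
    own.activeThreads.fetch_add(1, std::memory_order_acq_rel);

    // Step 2: trace forward rays using the own maps and any other maps open right now.
    while (!desiredRayCountReached())
        traceForwardRayBatch(/*useOtherMaps=*/true);

    // Step 3: close the own maps; this may only succeed while no other thread
    // holds them, i.e. while the counter is exactly 1.
    uint32_t expected = 1;
    while (!own.activeThreads.compare_exchange_strong(expected, 0, std::memory_order_acq_rel)) {
        expected = 1;
        // Step 3b: another thread is still reading these maps; trace extra rays
        // without touching other threads' maps (so every thread eventually releases
        // what it holds) and retry the compare-and-swap.
        traceForwardRayBatch(/*useOtherMaps=*/false);
    }
}
</pre>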
As a result, the thread that created the backward photon maps performs forward ray tracing and processing until two conditions are fulfilled: first, the desired number of forward rays has been traced, and second, access to the own backward photon maps can be successfully closed. To guarantee that the thread does not get stuck in the forward ray tracing step, the thread stops using other threads' backward photon maps after the required number of forward rays is reached. The workflow of the algorithm for opening and closing access to the own backward photon maps at the forward ray tracing step is shown in Fig. 6.

Fig. 6. Opening and closing the access to the own backward photon maps at the forward ray tracing step.

The forward ray tracing step is also modified to account for the indirect and caustic luminance that might be formed in the backward photon maps of other threads available at the moment of the forward ray tracing. To gain access to the backward photon maps of another thread, the thread should temporarily increase that thread's backward photon maps active thread counter to indicate that the maps are currently being used. The following operations are performed (see the sketch below):

1. First, check the active thread counter of the required backward photon maps. If it is equal to zero, the maps are not available and should not be processed when calculating the indirect and caustic luminance formed by the current ray.
2. If the backward photon maps active thread counter is greater than zero, try to increment it by 1 with an atomic compare-and-swap operation.
3. If the compare-and-swap operation fails, access was not granted, and this map is ignored when processing the current forward ray.
4. If the compare-and-swap operation succeeds, access to the other thread's backward photon maps has been successfully granted.
5. If access was granted, the intersection of the forward ray with the other thread's backward photon maps is analyzed, and in case of success the indirect and caustic luminance is accounted in the corresponding backward photons. Also, the forward ray is accounted in the total energy processed by this backward photon map.
6. After the backward photon map processing is finished, the corresponding backward photon maps active thread counter is decreased by 1 with an atomic decrement operation to release the access.

All operations related to increasing the luminance accumulated in the backward photon maps and the corresponding ray counters are implemented using solely atomic operations to ensure correct luminance accumulation from concurrent threads. The workflow of the forward ray tracing with accounting of luminance in all available backward photon maps is shown in Fig. 7.

Fig. 7. The forward ray tracing with accounting luminance both in own backward photon maps and in backward photon maps of other threads.

As a result, at the step when the forward ray tracing is performed, the thread's own backward photon maps are available for other threads to accumulate indirect and caustic luminance, and all backward photon maps that are available at the moment of tracing the forward ray are used to account for the ray's luminance.
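A minimal sketch of this acquire/release sequence for another thread's maps is shown below. As before, it is only an illustration under assumptions; accumulateLuminance() is a hypothetical stand-in for the intersection search and the atomic luminance accumulation described above.

<pre>
#include <atomic>
#include <cstdint>

struct BackwardPhotonMaps {
    std::atomic<uint32_t> activeThreads{0};  // same counter as in the previous sketch
    // ... photon map data ...
};

// Hypothetical stand-in: intersect the forward ray with the maps and atomically
// add the indirect and caustic luminance and the processed ray energy.
void accumulateLuminance(BackwardPhotonMaps& /*maps*/) {}

void processRayAgainstOtherMaps(BackwardPhotonMaps& other) {
    // Steps 1-2: the counter must already be non-zero (the owner has opened the
    // maps); the increment is done with compare-and-swap so that maps which are
    // being closed are never resurrected.
    uint32_t observed = other.activeThreads.load(std::memory_order_acquire);
    if (observed == 0)
        return;  // maps do not exist or are closed for non-owner threads
    if (!other.activeThreads.compare_exchange_strong(observed, observed + 1,
                                                     std::memory_order_acq_rel))
        return;  // step 3: access not granted, ignore this map for the current ray

    // Steps 4-5: access granted, accumulate the luminance carried by the ray.
    accumulateLuminance(other);

    // Step 6: release the access with an atomic decrement.
    other.activeThreads.fetch_sub(1, std::memory_order_acq_rel);
}
</pre>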
As only atomic operations are used in the backward photon map access algorithms, no special synchronization between threads is required, which gives linear scalability of the rendering process when increasing the number of cores. Due to the uniform task distribution between the different synchronous thread groups, the overlap between the periods when their backward photon maps are open for access is quite high, which results in more effective forward ray tracing along with indirect and caustic luminance accumulation.

3 Results

The presented parallelization method was implemented and integrated into the Lumicept light simulation software [15] and tested along with the traditional synchronous and asynchronous methods on the same PC with the 12 cores of the Intel Xeon 6230 CPU and the standard Cornell box scene. Ray tracing and processing speedup test results are presented in Tables 1-3 and in Fig. 9. The corresponding rendering results are shown in Fig. 8.

Fig. 8. Rendering results attained with synchronous, asynchronous and semi-synchronous methods (from left to right) after 10 minutes.

Table 1. Test results for the synchronous parallelization method.

  Cores used      1 core     2 cores    4 cores     8 cores     12 cores
  Forward rays    20689305   34199691   53129762    64440303    71715295
  Backward paths  26788848   44482520   68986597    83987647    93785374
  Speedup         1          1.66       2.57        3.13        3.49

Table 2. Test results for the asynchronous parallelization method.

  Cores used      1 core     2 cores    4 cores     8 cores     12 cores
  Forward rays    20717568   34101290   89476702    220730378   315318894
  Backward paths  26790396   44481495   72347407    102960933   120367282
  Speedup         1          1.65       3.41        7.81        9.17

Table 3. Test results for the semi-synchronous parallelization method.

  Cores used      1 core     2 cores    4 cores     8 cores     12 cores
  Forward rays    20647944   53498805   120166069   220579897   338469540
  Backward paths  26859521   43273242   55467718    72378193    93121907
  Speedup         1          2.03       3.7         6.17        9.08

Fig. 9. Comparison of test results for synchronous, asynchronous and semi-synchronous parallelization methods.

As can be seen, the proposed approach has the same linear scalability of traced and processed rays as the asynchronous calculation method. After 10 minutes of calculations with the Cornell box test scene, the synchronous calculation method achieved an accuracy of 3.9%, the asynchronous calculation method achieved an accuracy of 5.1%, and the proposed semi-synchronous calculation method achieved an accuracy of 2.6% as a result of the more effective usage of the backward photon maps formed by the different computation threads.
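As a rough cross-check of the ray-throughput scalability (computed here from Tables 1-3, not reported in the paper), the parallel efficiency on 12 cores, i.e. speedup divided by core count, is

<math>E_{12} = \frac{S_{12}}{12}: \quad E_{\mathrm{sync}} \approx \frac{3.49}{12} \approx 0.29, \quad E_{\mathrm{async}} \approx \frac{9.17}{12} \approx 0.76, \quad E_{\mathrm{semi}} \approx \frac{9.08}{12} \approx 0.76,</math>

so the semi-synchronous method keeps essentially the asynchronous throughput efficiency while delivering the best accuracy of the three.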
Similar acceleration was achieved not only on the Cornell box test scene, but also on scenes with light-guiding optical systems, scenes containing volume scattering, and others. Rendered images of the corresponding test scenes are shown in Fig. 10.

Fig. 10. Light guiding optical systems and volume scattering test scenes.

4 Conclusion

The developed two-level semi-synchronous parallelization method for the caustic and indirect luminance calculation can significantly increase the efficiency of calculating these luminance components on multi-core systems. This approach allowed us to increase the rendering efficiency in non-parallelizable or poorly parallelizable algorithm sections and, when using a small number of lower-level computation threads, to achieve an almost linear dependence of the rendering performance on the number of cores used. In addition, the two-level semi-synchronous parallelization allows reducing the size of the backward photon maps in the upper-level threads, which makes it possible to speed up the process of map voxelization and the search for intersections with traced rays. The developed approach can be used as a part of a distributed rendering system, providing high performance as a remote server service.

References

1. Jensen, H. W.: Global illumination using photon maps. In: Proceedings of the Eurographics Workshop on Rendering Techniques '96, pp. 21-30. Springer-Verlag, London, UK (1996).
2. Kang, C. et al.: A survey of photon mapping state-of-the-art research and future challenges. In: Frontiers of Information Technology & Electronic Engineering, Vol. 17, pp. 185-199 (2016).
3. Wald, I.: Embree ray tracing kernels: overview and new features. In: SIGGRAPH '16, Art. 52. New York, NY, USA (2016).
4. Frolov, V.A.: Examination of the Nvidia RTX. In: Proceedings of Computer Graphics and Vision 2019, pp. 7-12. Bryansk, Russia (2019).
5. He, H., Wang, T., Xu, Q., Xing, Y.: Multi-core Parallel of Photon Mapping. In: Visual Information Communication, pp. 365-374. Springer, Boston, MA (2009).
6. Günther, T., Grosch, T.: Distributed Out-of-Core Stochastic Progressive Photon Mapping. In: Computer Graphics Forum, Vol. 33, pp. 154-166 (2014).
7. Schregle, R., Grobe, L.O., Wittkopf, S.: An out-of-core photon mapping approach to daylight coefficients. In: Journal of Building Performance Simulation 9:6, pp. 620-632 (2016).
8. Carlberg, K.: Stochastic Progressive Photon Mapping Using Parallel Hashing. Lund University, Sweden (2011).
9. Fabianowski, B.: Interactive Manycore Photon Mapping. University of Dublin, Ireland (2011).
10. Günther, J., Wald, I., Slusallek, P.: Realtime Caustics Using Distributed Photon Mapping. In: Proceedings of the 15th Eurographics Workshop on Rendering Techniques, Norrköping, Sweden (2004).
11. Jacobsson, M.: Distributed Progressive Photon Mapping. Lund University, Sweden (2014).
12. Hachisuka, T., Jensen, H. W.: Stochastic Progressive Photon Mapping. In: Proceedings of SIGGRAPH Asia '09, Nr. 41, pp. 1-8. New York, NY, USA (2009).
13. Havran, V., Herzog, R., Seidel, H.-P.: Fast Final Gathering via Reverse Photon Mapping. In: Proceedings of Eurographics 2005, Vol. 24, Nr. 5, pp. 323-333. Blackwell Publishing, Oxford, UK (2005).
14. Amdahl, G.M.: Validity of the single processor approach to achieving large scale computing capabilities. In: Proceedings of the April 18-20, 1967, Spring Joint Computer Conference (AFIPS '67 (Spring)), pp. 483-485. New York, NY, USA (1967).
15. Lumicept – Hybrid Light Simulation Software, http://www.integra.jp/en, last accessed 2020/07/01.