Параллельные вычислительные технологии (ПаВТ’2016) || Parallel computational technologies (PCT’2016)

                                             agora.guru.ru/pavt


        Spatially Efficient Tree Layout for GPU Ray-tracing of
                  Constructive Solid Geometry Scenes
                        D.Y. Ulyanov1,2, D.K. Bogolepov2, V.E. Turlapov1
                       University of Nizhniy Novgorod1, OpenCASCADE2

         A novel GPU-optimized CSG ray-tracing approach is presented that is fast and accurate,
         and allows achieving real-time frame rates at full-screen resolutions. It has no limitations
         on the maximum number of primitives, and produces final image in a single pass. We pro-
         pose an efficient procedure to transform an input CSG tree into equivalent spatially coher-
         ent and well-balanced form. Through various experiments, we show that our solution al-
         lows interactive rendering of CSG models consisting of more than a million CSG primi-
         tives on consumer graphics cards.

         Keywords: Constructive solid geometry, rendering, ray-tracing, GPU, optimization.


1. Introduction
    Constructive Solid Geometry (CSG) is the geometric method that forms complicated shapes from
simpler 3D primitives using the Boolean operations union (), intersection (), and subtraction (\).
The (sets of) primitives involved in each operation and the sequence of operations create a so-called
CSG tree. Thus, CSG tree is a binary tree with leaf nodes as primitives and interior nodes as Boolean
operations. CSG is often used as a fundamental modeling approach in CAD/CAM/CAE applications.
However, computation of the geometry corresponding to the CSG expression can be a slow process,
which is often unacceptable for interactive scene editing. In some cases, a near real-time rendering of a
CSG shape can be achieved by using multi-pass image-based techniques taking advantage of hardware
depth and stencil buffers. But these algorithms impose limitations on the maximum depth complexity
and, in general, are bounded by memory bandwidth. The main contribution of this paper is a novel
GPU-optimized CSG ray-tracing algorithm, as well as an efficient procedure for conversion of input
CSG tree into spatially coherent and well-balanced form. The proposed solution is relatively fast, can
be easily integrated into existing ray-tracing systems and, as we show in our experiments, outperforms
previously available algorithms.

2. Previous Work
     In general, there are two basic approaches to render a CSG model. The first one is based on pre-
computing of the boundary of a CSG shape which can be tessellated into a triangular mesh and then
rendered using conventional graphics methods. Since evaluation of CSG boundary is computationally
expensive, these algorithms are mainly limited to static models and do not allow interactive editing.
The second approach involves so-called image-based algorithms which generate just the image of a
CSG model without expensive computation of the full shape geometry. Most of these algorithms are
designed for graphics hardware and based on multi-pass, view-specific techniques making extensive
use of depth and stencil buffers. The typical algorithms in this class are Goldfeather algorithm [1, 2]
and the Sequenced Convex Subtraction (SCS) algorithm [3]. The first one allows handling all types of
CSG primitives, while the second one is optimized for models consisting of convex primitives only.
However, none of these algorithms is capable of rendering arbitrary CSG trees directly. Instead, an
input tree is transformed into a sum-of-products (normal) form that can lead to exponential growth of
the number of CSG operations and significantly reduces the performance for complex CSG shapes.
     An alternative approach has been proposed in the later work [4]. The so-called Blister algorithm
does not require a conversion to the sum-of-products form. Instead, it converts an arbitrary Boolean
combination of primitives into the Blist form [5] containing each input primitive only once. To render
a CSG shape, Blister uses peeling technique to produce layers of the entire primitive set in depth order


                                                    388
Параллельные вычислительные технологии (ПаВТ’2016) || Parallel computational technologies (PCT’2016)

                                            agora.guru.ru/pavt


(each layer is the Z-buffer representation of a 3D scene that allows only one fragment stored at each
pixel). Each peel is classified according to its CSG expression and then combined.
     The above algorithms can achieve interactivity for relatively complex CSG shapes (thousands of
primitives). However, all these techniques use many rendering passes, and thus are bandwidth limited.
For many years, GPU memory bandwidth grows slower than computing performance, resulting in a
data transfer bottleneck for many GPU-accelerated applications. A completely different approach was
adopted in [6]. In this work, an attempt has been made to distribute the workload between a CPU and a
GPU, by performing spatial decomposition of input CSG tree on a CPU and ray-tracing of its simple
parts on a GPU. The algorithm has proven to be effective for relatively simple CSG shapes (hundreds
of primitives). Whereas more complex models require subdivision into a larger number of parts that
leads to a huge number of draw calls and performance decrease.
     Ray-tracing of the entire CSG tree is possible and used quite widely. However, most approaches
to render CSG scenes require computing all intersections of a ray with a primitive. The ray is broken
into intervals corresponding to the intersected primitives. After that the Boolean operations are applied
to find out the first interval that is actually inside a CSG object. Due to a large amount of computation
and significant memory consumption this approach can be extremely expensive. Moreover, it is poorly
suited for a GPU, the effective use of which requires tens of thousands of threads running in parallel.
Since GPU hardware resources are divided among threads, low resource usage is crucial to support a
plurality of simultaneously-active threads. However, the implementation of interval CSG ray-tracer on
the GPU is still possible as shown in [7]. Unfortunately, this approach tends to be limited by the num-
ber of primitives and maximum depth complexity due to the necessity of storing interval representa-
tion of the whole scene in GPU memory.
             function INTERSECT(node, min)
              minL ← min
              minR ← min
              (tL,NL) ← INTERSECT(L(node), minL)
              (tR,NR) ← INTERSECT(R(node), minR)
              stateL ← CLASSIFY(tL, NL)
              stateR ← CLASSIFY(tR, NR)
              while true do
               actions ← table[stateL, stateR]
               if Miss ∈ actions then
                return miss
               if RetL ∈ actions or (RetLIfCloser ∈ actions and tL ≤ tR) then
                return (tL, NL)
               if RetR ∈ actions or (RetRIfCloser ∈ actions and tR ≤ tL) then
                if FlipR ∈ actions then
                 NR ← −NR
                return (tR, NR)
               else
                if LoopL ∈ actions or (LoopLIfCloser ∈ actions and tL ≤ tR) then
                 minL ← tL
                 (tL, NL) ← INTERSECT(L(node), minL)
                 stateL ← CLASSIFY(tL, NL)
                else
                 if LoopR ∈ actions or (LoopRIfCloser ∈ actions and tR ≤ tL) then
                  minR ← tR
                  (tR, NR) ← INTERSECT(R(node), minR)
                  stateR ← CLASSIFY(tR, NR)
                 else
                  return miss

                                    Fig. 1. Recursive CSG intersection
     A quite different approach based on single-hit ray-tracing (finding only nearest intersection) has
been proposed in [8]. The algorithm uses a concept of state machine to calculate the intersection with a
CSG model. The only limitation is that the basic CSG primitives should be closed (can be relaxed to
handle orientable surfaces), non-self-intersecting and have consistently oriented normals. This elegant
idea makes it quite easy to integrate CSG rendering into existing ray-tracing systems. Although the
paper does not contain any characteristics of the algorithm, it looks suitable for the GPU and inspired
our work. In the remainder of this section, we outline the main steps of this algorithm and point out
some inaccuracies in the original state tables.
     Let T be a CSG tree, and let L(T) and R(T) be the left and right sub-tree of T. To find the nearest
intersection of ray R and tree T the ray is shot at sub-trees L(T) and R(T), and then the intersection with
the each sub-tree is classified as one of entering, exiting or missing it. Based upon the combination of


                                                  389
Параллельные вычислительные технологии (ПаВТ’2016) || Parallel computational technologies (PCT’2016)

                                                                agora.guru.ru/pavt


these two classifications, one of several actions is taken: (a) returning a hit; (b) returning a miss; (c)
changing the starting point of ray R for one of sub-trees and then shooting this ray again, classifying
next intersection. In latter case, the state machine enters a new loop (see Figure 1). Kensler proposed 3
state tables (one for each Boolean operation) needed to ray-trace a CSG shape. Unfortunately, these
state tables are not complete and lead to incorrect visualization. In this paper we provide refined state
tables allowing correct visualization in all cases (see Table 1). Kensler’s algorithm is recursive and
poorly suited for a GPU. While recursion is supported on CUDA-enabled GPUs, the iterative version
with a manually-managed state stack provides a much better performance, and can be implemented in
the environment without recursion support (e.g., OpenGL, OpenCL). However, transforming of the
algorithm into iterative form is not trivial due to a large number of parameters and local variables.
                                                 Table 1. State tables for Boolean operations

        Enter R(T)     Exit R(T)       Miss R(T)           Enter R(T)       Exit R(T)      Miss R(T)     \     Enter R(T)     Exit R(T)       Miss R(T)
 Enter   RetLIfCloser   RetRIfCloser                 Enter   LoopLIfCloser    RetLIfCloser               Enter   RetLIfCloser   LoopLIfCloser
                                          RetL                                                 Miss                                               RetL
 L(T)    RetRIfCloser      LoopL                     L(T)    LoopRIfCloser      LoopR                    L(T)      LoopR        LoopRIfCloser
                                                                                                                 RetLIfCloser   RetRIfCloser
 Exit    RetLIfCloser   LoopLIfCloser                Exit    RetRIfCloser     RetLIfCloser               Exit
                                          RetL                                                 Miss              RetRIfCloser      FlipR          RetL
 L(T)      LoopR        LoopRIfCloser                L(T)       LoopL         RetRIfCloser               L(T)       FlipR          LoopL
 Miss                                                Miss                                                Miss
            RetR            RetR          Miss                   Miss              Miss        Miss                 Miss            Miss          Miss
 L(T)                                                L(T)                                                L(T)


2. GPU-Optimized CSG Ray-tracing
2.1 Stack-based CSG traverse

     As our main contribution, we propose the iterative CSG ray-tracing algorithm that uses minimal
state and is optimized for massively parallel architectures with limited (per thread) memory resources
like GPUs. For that purpose we define a high-level state machine that manages the execution of the
original algorithm in the iterative manner (see Figure 2).


                                                      Fig. 2. High-level state machine
     The use of state tables for each Boolean operation (let us call them CSG tables) is based on pre-
computed intersections with child objects of the current CSG node. Thus, all of the states of high-level
pushdown automata are divided into two categories: (a) calculation of intersections with child objects,
and (b) applying CSG tables for classification of the processed CSG node. The first category includes
the states GotoLft (finding the intersection with the left sub-tree), GotoRgh (finding the intersection
with the right sub-tree), and SaveLft (storing the intersection parameters with the left sub-tree and
then execution of GotoRgh). The last state is needed since the processing of the right sub-tree leads
to trashing local variables. The second category includes the following states: Compute (applying
CSG tables), LoadLft (loading intersection data for the left sub-tree and then execution of Compute),
LoadRgh (loading intersection data for the right sub-tree and then execution of Compute). General
scheme of transition between the states is shown in Figure 3. Here the GoTo() function (see Figure 4)
calculates intersection points with left and right sub-trees, while the Compute() function (see Figure
4) classifies these points in order to detect the first intersection of a ray with the actual boundary of
CSG shape. Note that GoTo() function enables the use of bounding boxes to improve the performance


                                                                             390
Параллельные вычислительные технологии (ПаВТ’2016) || Parallel computational technologies (PCT’2016)

                                            agora.guru.ru/pavt


of intersection function. Such bounds are calculated for each node of CSG tree to obtain a bounding
volume hierarchy [9].
             tmin ← 0
             node ← V // virtual root whose left subtree is the real root
             (tL, NL) ← invalid
             (tR, NR) ← invalid
             PUSHACTION(Compute)
             action ← GotoLft
             while true do
              if action ≡ SaveLft then
                tmin ← POPTIME()
                PUSHPRIMITIVE(tL, NL)
                action ← GotoRgh
              if action ∈ {GotoLft, GotoRgh} then
                GOTO()
              if action ∈ {LoadLft, LoadRgh, Compute} then
                COMPUTE()

                                      Fig. 3. Iterative CSG traversal
  function GOTO()                                       function COMPUTE()
   if action ≡ GotoLft then                              if action ∈ {LoadLft, LoadRgh} then
    node ← L(node)                                        if action ≡ LoadLft then
   else                                                    (tL, NL) ← POPPRIMITIVE()
    node ← R(node)                                        else
   if node is Operation then                               (tR, NR) ← POPPRIMITIVE()
    gotoL ← INTERSECTBOX(L(node))                        stateL ← CLASSIFY(tL, NL)
    gotoR ← INTERSECTBOX(R(node))                        stateR ← CLASSIFY(tR, NR)
   if gotoL and L(node) is Primitive then                actions ← table[stateL, stateR]
    (tL, NL) ← INTERSECT(L(node), tmin)                  if RetL ∈ actions or
    gotoL ← false                                         (RetLIfCloser ∈ actions and tL ≤ tR) then
   if gotoR and R(node) is Primitive then                  (tR, NR) ← (tL, NL)
    (tR, NR) ← INTERSECT(R(node), tmin)                    action ← POPACTION()
    gotoR ← false                                          GOTOPARENT(node)
   if gotoL or gotoR then                                if RetR ∈ actions or
    if gotoL then
                                                          (RetRIfCloser ∈ actions and tR < tL) then
     PUSHPRIMITIVE(L(node), tL)
     PUSHACTION(LoadLft)                                   if FlipNormR ∈ actions then
    else if gotoR then                                       NR ← −NR
     PUSHPRIMITIVE(R(node), tR)                            (tL, NL) ← (tR, NR)
     PUSHACTION(LoadRgh)                                   action ← POPACTION()
    else                                                   GOTOPARENT(node)
     PUSHTIME(tmin)                                      else if LoopL ∈ actions or
     PUSHACTION(LoadLft)                                  (LoopLIfCloser ∈ actions and tL ≤ tR) then
     PUSHACTION(SaveLft)                                  tmin ← tL
    if gotoL then                                         PUSHPRIMITIVE(tR, NR)
     action ← GotoLft                                     PUSHACTION(LoadRgh)
    else                                                  action ← GotoLft
     action ← GotoRgh                                    else if LoopR ∈ actions or
   else                                                   (LoopRIfCloser ∈ actions and tR < tL) then
    action ← Compute                                      tmin ← tR
  else                                                    PUSHPRIMITIVE(tL, NL)
   // node is a Primitive                                 PUSHACTION(LoadLft)
   if action ≡ GotoLft then                               action ← GotoRgh
    (tL, NL) = Intersect(node, tmin)                     else
   else                                                   tR ← invalid
    (tR, NR) = Intersect(node, tmin)                      action ← POPACTION()
   action ← Compute
   GOTOPARENT(node)


                            Fig. 4. GOTO stage (left) and COMPUTE stage (right)

2.3 Optimizing CSG Trees

     It is obvious that the performance of our algorithm greatly depends on the topology of CSG tree
that affects spatial coherence of primitives and height of the tree. However, the creation of a balanced,
unbalanced, or a perfect CSG tree depends generally on the user. Thus, it is necessary to transform an
input tree T into an equivalent well-balanced tree T of roughly the same size as T.
     We propose an efficient pipeline for optimizing CSG trees that runs in four phases: (a) converting
the input tree T to a positive form; (b) spatial optimization of tree topology; (c) minimizing height of
the tree; (d) reverse converting to a general form giving the output tree T.
2.3.1 Converting to positive form
     A CSG tree T is represented in the positive form using only  and  operations and negation of
leaf nodes. This conversion can be easily done using the following transformations:


                                                  391
Параллельные вычислительные технологии (ПаВТ’2016) || Parallel computational technologies (PCT’2016)

                                             agora.guru.ru/pavt


                          𝑥 ∪ 𝑦 = 𝑥 ∩ 𝑦,      𝑥 ∩ 𝑦 = 𝑥 ∪ 𝑦,        𝑥−𝑦 =𝑥∩𝑦
The above transformations are applied to the tree in a pre-order traversal, and thus all complements are
propagated to the leaf nodes. The reverse conversion to general form can be performed using a post-
order traversal (in this case all negations are first removed from the children of each node).
2.3.2 Spatial optimization
     For optimal performance, the tightness bounds of CSG tree nodes should be used which minimize
the probability of ray intersection. For this purpose, we propose the spatial optimization procedure
allowing minimizing the bounds of CSG nodes. Let us define treelet as the collection of immediate
descendants of the given CSG tree node. Our optimization procedure is based on repeatedly selecting
of treelets consisting of nodes with the same Boolean operation and their subsequent restructuring (in
positive form, we are free to change the order of treelet nodes). Treelets are constructed during a pre-
order traversal of CSG tree by expanding child nodes that have the same Boolean operation as the
treelet root. The resulting treelet is reorganized by means of surface area heuristic (SAH), widely used
for construction of accelerating structures such as k-d tree or Bounding Volume Hierarchy (BVH).
Thereafter, the traversal of CSG tree continues with the outer treelet nodes.
     The restructuring of treelet is based on the same binned technique as is used for construction of
BVH [10]. Binned BVH is constructed over all treelet leaves bounded by axis-aligned boxes pre-
computed for the input tree T given in general form. Because of this, for treelet leaves corresponding
to negative CSG primitives we use their original (non-complemented) bounds. This strategy produces
slightly better trees, because infinite bounding boxes do not provide any useful information related to
primitive positions.
2.3.3 Minimizing tree height
     Our algorithm evaluates intersection point in iterative manner by maintaining a stack. However,
on massively parallel architectures like GPUs managing full per-ray stacks leads to significant storage
and bandwidth costs. To reduce the traversal state size we desire a well-balanced CSG tree. Our next
optimization stage is aimed to address this problem by minimizing the height of CSG tree using local
transformations. At this stage, two types of treelets are considered. For brevity, let us call the child
node with a greater height (in the whole tree T) the heavy child. The first type is formed of treelets
which have the same Boolean operation ( or ) in root node N1 and its heavy child N2 (see Figure
5a). Let T3 be a heavy child of the node N2. Obviously if h(T3) > h(T1) + 1 it is beneficial to transpose
these subtrees. As with the rotations for binary search trees these result in elevating subtree T3 and
demoting subtree T1. Thus, the height of the treelet, rooted at N1, is decreased by one.


                         Fig. 5. Optimizations of first (left) and second (right) type
     The second type of treelets corresponds to the case where the operations in the root node N1, its
heavy child N2 and heavy grandchild N3 are interleaved (i.e. −− or −−). Let us consider the
−− sequence (see Figure 5b). In this case, the treelet rooted at N1 can be described by expression:
T1  (T2  T3  T4) = (T1  T2)  (T1  T3  T4). Let T4 be a heavy child of the node N3. Therefore, if
h(T4) > h(T1) + 2, then the normalization of the given treelet allows reducing its height by one. Please
note that this normalization is localized, and thus has no effect on other tree nodes. However, even this
optimization is undesirable because it results in duplication of the subtree T1. For this reason we use
such transformations only when optimizations of the first type have been exhausted. We use the multi-
pass scheme, where at each pass a CSG tree is traversed in post-order, and appropriate restructuring
patterns are applied.


                                                    392
Параллельные вычислительные технологии (ПаВТ’2016) || Parallel computational technologies (PCT’2016)

                                                      agora.guru.ru/pavt


4. Results and Discussion
     For this study, all results have been measured using an NVIDIA GeForce GTX 680, AMD Rade-
on HD 7870 and Intel HD 4000 GPUs. All timings correspond to rendering in a 1280 × 720 window.
The first test scene shows a CSG model of the city at different scales (see Figure 6). In all below cases
the whole City scene is modeled as a single CSG tree. In a simple configuration (a), the model con-
tains 3385 primitives. More complex configurations (b and c) contain 343K and 987K CSG primitives
correspondingly. Scene b from upper row shows the case with extreme number of depth layers that is
rather challenging for other approaches. Therefore, this test allows analyzing the performance depend-
ing on the complexity of the CSG model. For each GPU results are represented by two columns (see
Table 2): left one corresponds to measured FPS without spatial optimization (−), and the right one was
obtained with enabled spatial optimization (+). N/A markers shown were performance clearly cannot
be considered to be interactive.


                                       a)                                       b)                              c)


                                       g)                                       h)                              i)

                                            Fig. 6. City scene and Cheese scene
     The second test scene represents a procedural Swiss cheese CSG model with the holes of varying
radius (see Figure 6). Number of holes increases from 1000 (left) to 8000 (middle), and then to 32000
(right ) resulting in a larger number of overlapped primitives and greater depth complexity. Thus, un-
like the City model, the performance of the Swiss cheese model is affected greatly by spatial optimiza-
tion.
     The third test scene demonstrates a large number of satellites orbiting a planet (see Figure 8). In
this case, each satellite is represented by a separate CSG tree. A plurality of independent satellites are
placed into the scene as outer nodes of high-level BVH. Since we are able to interactively rebuild ac-
celerating structure each frame (at least for tens of thousands of objects), we can arbitrarily modify
transformations of particular CSG trees. As a result, it becomes possible to edit the scene or to animate
arbitrary shapes. In our test case, the satellites move across the planet along randomly selected orbital
tracks.
                                             Table 2. Measured performance
                                                      Intel 4000            Radeon HD 7870    GeForce GTX 680
   Scene         Primitives   Tree Depth
                                                 −                  +       −            +     −           +
  City (a)         3385          14              7                 7.5      50           60    51          57
  City (b)       343589          22             1.8                4.5     6.5           17     8          22
  City (c)       987218          24             2.3                 7      6.7           18   8.3          21
 Cheese (a)        1002          11             0.4                17      4.6          110   5.8         128
 Cheese (b)        8002          14             N/A                6.5     0.5           28   0.5          32
 Cheese (c)       32002          17             N/A                0.5     N/A          3.7   N/A           4
Satellites (a)    87565           7              5                  9       26           67    29          65
Satellites (b)   1120065          7             2.8                4.5       8           18     7          15
Satellites (c)   1120065          7             2.5                4.5     4.2            9   5.6          12

     We found that our implementation scales well with increasing the GPU clock speed (Figure 7
shows linear dependence on clock speed). Therefore, we can expect further performance increase on
later generations of GPUs. In contrast, the memory clock does not affect performance, which confirms
the assumption that the algorithm is not memory-bound.


                                                              393
Параллельные вычислительные технологии (ПаВТ’2016) || Parallel computational technologies (PCT’2016)

                                            agora.guru.ru/pavt


    From practical point of view, there are several factors which can affect the rendering perfor-
mance. The first one is screen resolution as for over ray-tracing methods. The frame rate decreases
almost linearly increasing the total number of processed pixels. The second important factor is the
number of primitives, but however, it does not affect performance directly. Experiments show that we
can easily render City scene containing more than 1 million CSG primitives while having trouble with
32K Swiss cheese model. This is due to extensive overlaps between the primitives in cheese model
which force the algorithm to iterate over the CSG sub-trees intensively in order to classify intersection
points. Moreover, the efficiency of spatial optimizer also suffers from a large number of overlapped
primitives. However, even in this stress scenario, we can show near linear performance degradation
depending on the number of primitives.
        35                                                     40
                   GF680                                                GF680
        30                                                     35
  FPS


                                                         FPS
        25                                                     30

        20                                                     25

        15                                                     20
             500      700          900         1100              2400   2600     2800     3000    3200

                    GPU Core Clock (MHz)                                GPU Memory Clock (MHz)

        Fig. 7. Rendering performance on Cheese 8K scene depending on GPU clock speed (GF GTX 680).
      Given that our solution is based on ray-tracing, it can be naturally extended to produce various
visual effects such as transparency, shadows, reflections, refractions, etc. Using OpenGL/GLSL as the
main API for GPU computations allows seamless interoperability between CSG rendering engine and
standard OpenGL pipeline. For that purpose we calculate the depth value for each processed fragment,
based on the intersection time and camera projection matrix. For example, the rendering can be ex-
tended with text annotations, axes, or generic triangulated objects drawn by OpenGL. Finally, our
solution allows implementing the hardware-accelerated selection mechanism. To this end, we write
unique IDs of intersected objects into a separate texture allowing to identify a CSG primitive, or even
its particular face, which lies under the given pixel.


                                  a)                                    b)                               c)
                                             Fig. 8. Satellites scene

4. Conclusion
     We proposed a GPU-optimized CSG rendering approach, which is fast and accurate, and allows
achieving real-time frame rates at full-screen resolutions. Unlike alternative image-based CSG algo-
rithms, our solution is more compute-bound than bandwidth-bound, and does not impose restrictions
on the maximum number of CSG primitives being limited only by available GPU memory. We also
proposed the efficient pre-processing stage to convert an input CSG tree into equivalent spatially co-
herent and well-balanced form. As a result, our CSG rendering system provides interactive or even
real-time performance for CSG models consisting of more than a million CSG primitives on consumer
graphics cards.


                                                   394
Параллельные вычислительные технологии (ПаВТ’2016) || Parallel computational technologies (PCT’2016)

                                          agora.guru.ru/pavt


References
1. Goldfeather, J., Monar, S., Turk, G., Fuchs, H. Near real-time CSG rendering using tree normali-
   zation and geometric pruning // IEEE Symposium on Computer Graphics and Applications. 1989.
   P. 20-28.
2. Kirsch, F., Döllner, J. Rendering techniques for hardware-accelerated image-based CSG // Journal
   of WSCG. 2004. Vol. 12, No. 1-3, P. 269-276.
3. Stewart, N., Leach, G., Sabu J. Linear-time CSG rendering of intersected convex objects // Journal
   of WSCG. 2002. Vol. 10, No. 1-2, P. 437-444.
4. Hable, J., Rossignac, J. Blister: GPU-based rendering of Boolean combinations of free-form trian-
   gulated shapes // ACM Transactions on Graphics. 2005. Vol. 24, No. 3, P. 1024-1031.
5. Rossignac, J. BLIST: A Boolean list formulation of CSG trees // Technical Report GIT-GVU-99-
   04 available from the GVU Center at Georgia Tech. http://www.cc.gatech.edu/gvu/reports/1999/
   1999.
6. Romeiro, F., Velho, L., De Figueiredo L. H. Hardware-assisted rendering of csg models // SIB-
   GRAPI'06. 19th Brazilian Symposium. 2006. P. 139-146.
7. Lefebvre, S., Grand-Est, L. I. N. IceSL: A GPU accelerated CSG modeller and slicer // AEFA'13,
   18th European Forum on Additive Manufacturing. 2013.
8. Kensler A. Ray tracing CSG objects using single hit intersections URL:
   http://xrt.wdfiles.com/local--files/doc%3Acsg/CSG.pdf.
9. Cameron, S. Efficient bounds in constructive solid geometry // IEEE Computer Graphics and Ap-
   plications. 1991. P. 68-74.
10. Wald, I. On fast construction of SAH-based bounding volume hierarchies // IEEE Symposium on
    Interactive Ray Tracing. 2007. P. 33-40).


                                                395