<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Restricted Extensions for GPU Photo-realistic Renderer</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Gubkin Russian State University of Oil and Gas</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Keldysh Institute of Applied Mathematics RAS</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Moscow State University</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>V.V. Sanzharov</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Photo-realistic rendering systems on CPU traditionally have significant flexibility, achieved mainly by the ability of the end user to write custom plugins or shaders. The same cannot be said about the majority of photo-realistic GPU renderers. Most «classic» approaches to the design of user-extendable software on CPU, such as object-oriented plugins, are not well suited for GPU programming. In this paper we propose a restricted approach to developing an extendable GPU rendering system at low development cost. Our hardware-agnostic, lightweight approach can be applied to existing rendering systems with minimal changes to them. We apply our approach to the problem of implementing procedural textures and show that, in addition to being simple, it is faster than existing GPU solutions.</p>
      </abstract>
      <kwd-group>
        <kwd>photo-realistic rendering</kwd>
        <kwd>ray tracing</kwd>
        <kwd>GPU</kwd>
        <kwd>procedural textures</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Modern photo-realistic rendering systems are moving
towards GPU implementation, with many strong players
actively developing GPU versions of their products
[19]. This tendency is becoming even more
pronounced with the advent of hardware-accelerated ray
tracing technology (e.g. Nvidia RTX), which is
available to the general public. However, for a long time
the industry (visual effects, animated films,
architectural visualization and others) used CPU renderers,
which are known for their flexibility and extensibility.
By these terms we mean the ability of the end user
of the rendering system to easily add new features
such as procedural shading, texturing, custom BSDF or
light source models.</p>
      <sec id="sec-1-1">
        <title>Related work</title>
        <sec id="sec-1-1-1">
          <title>Plugins</title>
          <p>One of the most powerful traditional approaches is
object-oriented plugins [14, 18]. This approach is
highly flexible, but it has well-known drawbacks
and limitations. One of the most serious problems is the
inability to use hardware at full speed due to
encapsulation [1]. For example, using SIMD requires changing the
interface, and even then portability has to be sacrificed.
Although the object-oriented approach is possible on
the GPU, its efficiency is extremely low because GPUs
are not designed for code with dynamic dispatch [2].
Another problem is the limited flexibility of the
interfaces themselves: it is impossible to create an interface that
will satisfy all users in the future. Thus, Domain
Specific Languages (DSLs) are the more powerful
approach.</p>
          <p>One of the first methods to formulate custom shading
operations was «shading trees» [3]: directed acyclic
graphs with input values in the leaves, operations
(such as addition and multiplication) in the nodes and
the final value in the root. Different materials, lights and
atmospheric effects are formulated as different trees,
which the rendering system combines by a specific
procedure. Such trees cannot express loops
or conditional execution. In the pixel stream editor proposed in
[13], a custom procedure is executed on every pixel
using arbitrary per-pixel data as input. These two
approaches served as a basis for RSL. In RSL there are
shaders of different types (light, surface, volume, etc.),
which are called by the rendering system.
Computations can be executed at different rates: per-batch
and per-sample.</p>
          <p>
            One of the first shading languages,
RSL (RenderMan Shading Language), was developed as a part of
RenderMan [
            <xref ref-type="bibr" rid="ref25">7</xref>
            ], and the more modern Open
Shading Language (OSL) [27] was initially developed for the Arnold
renderer. Both of these rendering systems are
currently CPU-based, although a GPU version of Arnold is
in development. OSL
was specifically designed for ray-tracing-based
algorithms. It uses the LLVM framework and just-in-time
(JIT) compilation: shaders are first compiled to
bytecode and then translated into x86/x64
instructions. Approaches based on full-fledged compilers are
certainly the most powerful and flexible. Their main
disadvantages are high complexity, high development
cost and difficult debugging. In any case, existing
solutions use the CPU: RSL [
            <xref ref-type="bibr" rid="ref25">7</xref>
            ], OSL [27], VEX [22] and
others [4]. An exception is the Octane GPU renderer, which
implements a subset of OSL for procedural textures [26].
          </p>
          <p>GPU programming imposes more restrictions and is
more difficult in general, so it is hard to design a
comparably flexible solution with an efficient GPU
implementation that preserves high performance. In
real-time graphics applications the difficulty is
mitigated by the existence of a standardized graphics
pipeline with known stages. In ray tracing a similar
«pipeline-like» approach was adopted in OptiX [11],
RTX [25] and OpenRL [21]. However, it requires the
whole rendering system infrastructure to be
implemented using these technologies, which are limited to
specific hardware. In contrast, our approach is
hardware-agnostic.</p>
          <p>There are several well-known hardware-accelerated
shading languages: GLSL, HLSL, Cg, Metal. Some
approaches propose new languages that are compiled to
one or several hardware shading languages, such as [15],
to improve portability across different rendering
engines. In [8] the authors propose an approach to building
shaders from modular components written in a
domain-specific language. In the shading language proposed in
[6], shaders are mapped to more than one graphics
pipeline stage and implement the concept of
inheritance from object-oriented programming to make
shaders easily extendable and reusable.</p>
          <p>In [10] a framework for creating
custom shader pipelines is proposed. These pipelines are
mapped to a series of kernels, which are then
scheduled for sequential execution. Pipelines can target
different hardware such as GPUs or multi-core CPUs.
The authors demonstrate applications to rasterization and
Reyes pipelines. The authors of [12] propose the Ray Tracing
Shading Language (RTSL), which is based on GLSL
and to some extent on RSL. RTSL was developed
with CPU rendering systems in mind and makes use of
SIMD extensions and packet tracing.</p>
          <p>Another solution is the OptiX ray-tracing engine by
Nvidia [11]. It lets the user
create programs of different types, an approach similar
to conventional graphics APIs. OptiX can be hardware-accelerated
on Nvidia Turing hardware [24] by
means of RTX technology. RTX is also available in
other graphics APIs (DirectX and Vulkan), and an
approach largely similar to OptiX can be used with these
APIs instead [25]. Caustic Graphics OpenRL [21] was
an earlier technological analogue of the same ideas with
hardware acceleration, which did not reach mass-production
scale.</p>
        </sec>
        <sec id="sec-1-1-2">
          <title>Disadvantages of existing GPU approaches</title>
          <p>
            OptiX [11] uses the mega-kernel execution model,
where user programs are linked together into a
monolithic kernel. It behaves like a state machine where
the state identifier selects which program should be
executed next, with some optimizations to reduce
register pressure (by using the same register set for
different stages). This approach is considered inefficient
for several reasons [9], [
            <xref ref-type="bibr" rid="ref28 ref29">5</xref>
            ]. First, it uses a
single, maximum register count for all programs, so
some states degrade the performance of others.
Second, it is hard to profile and has unstable
performance. Finally, high branch divergence between
states can significantly degrade performance. To be
fair, this approach still has
advantages. First, there is no kernel launch overhead
(which is present in other approaches) because all code is
merged into a single kernel. Second, this approach
supports recursion, function calls and exceptions
[11].
          </p>
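          <p>The state-machine structure described above can be sketched in plain C. This is only an illustration under simplifying assumptions, not OptiX code: the state names, the <monospace>ThreadCtx</monospace> structure and the <monospace>run_megakernel</monospace> function are hypothetical, and every ray is assumed to hit geometry until a depth limit is reached.</p>
          <preformat><![CDATA[
```c
#include <assert.h>

/* Hypothetical program identifiers; a real renderer would have many more. */
typedef enum { STATE_RAYGEN, STATE_TRACE, STATE_SHADE, STATE_DONE } State;

typedef struct {
    State state;  /* which program this thread runs next */
    int   depth;  /* current bounce number */
    int   hits;   /* how many times the shading program has run */
} ThreadCtx;

/* One monolithic loop: the thread carries a state id and a switch selects
   the program to execute. On a GPU all branches share a single register
   budget, which is the drawback discussed in the text. */
int run_megakernel(int max_depth)
{
    ThreadCtx ctx = { STATE_RAYGEN, 0, 0 };
    assert(max_depth >= 0);
    while (ctx.state != STATE_DONE) {
        switch (ctx.state) {
        case STATE_RAYGEN:
            ctx.depth = 0;
            ctx.state = STATE_TRACE;
            break;
        case STATE_TRACE:
            /* assumption: every ray hits until the depth limit is reached */
            ctx.state = (ctx.depth < max_depth) ? STATE_SHADE : STATE_DONE;
            break;
        case STATE_SHADE:
            ctx.hits  += 1;
            ctx.depth += 1;
            ctx.state  = STATE_TRACE;  /* continue with the next bounce */
            break;
        default:
            ctx.state = STATE_DONE;
            break;
        }
    }
    return ctx.hits;
}
```
]]></preformat>
          <p>In this toy model the shading program runs once per bounce, so a depth limit of three yields three shading invocations inside a single kernel launch.</p>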
          <p>
            An alternative to the mega-kernel is a simple divide and
conquer strategy called separate-kernel [
            <xref ref-type="bibr" rid="ref28 ref29">5</xref>
            ]. This
approach avoids high register pressure by manually
splitting the code into several kernels. Unfortunately, by itself it
cannot help with extending the renderer with custom user
programs, because separate-kernel is by definition a manual
optimization.
          </p>
          <p>In [9] wavefront path tracing was suggested. The
idea is to push and
execute different stages of the mega-kernel state machine in
separate GPU queues. Wavefront path tracing solves
both the branch divergence and the register pressure problems, as
different programs are actually executed in separate
kernels. This is true for both user-defined and render-system-defined
programs; thus, ray-sorting/thread-sorting
approaches are enabled here. The wavefront approach
also has limited support for recursion, as it can be
thought of as breadth-first search in a tree. It seems that
Nvidia RTX partially uses this approach (at least for
ray-scene intersection). Unfortunately, wavefront
path tracing also has disadvantages. First, it has an
unpredictable memory footprint and kernel execution
overhead in general. This is mainly because each «call
of a function» that actually executes in a separate queue
must perform at least several operations: (1) append all
arguments of the «function» to global memory, (2) save
the whole current state of the executing program in global
memory, (3) read the current state of the executing program
back from global memory (together with the returned result) and
continue execution once the child queue of the «function»
has completed. With the addition of recursion this
approach quickly becomes memory-hungry due to the high
cost of breadth-first search combined with a large number of
threads (100K–1M). Second, the wavefront approach has
limited ability to parallelize computations, because
user-defined procedures are not guaranteed to create work
in parallel, and usually they do not. Consider the
common case, seen in RTX samples, of shooting several rays
in a loop: the implementation has no choice
other than to process the rays in groups, all rays of the first
loop iteration, then of the second, and so on. Such an
approach simplifies things for the user but limits efficiency.</p>
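          <p>The queue-based execution described above can be illustrated with a small CPU-side simulation in C. It is a sketch under strong simplifications, not the implementation from [9]: the <monospace>Queue</monospace> type and the functions <monospace>enqueue</monospace> and <monospace>wavefront_passes</monospace> are hypothetical, there is a single stage, and each pass over a queue stands for one kernel launch.</p>
          <preformat><![CDATA[
```c
#include <assert.h>

#define MAX_RAYS 8

/* A queue of ray indices that all need the same program next. */
typedef struct { int ids[MAX_RAYS]; int count; } Queue;

void enqueue(Queue *q, int ray_id)
{
    assert(q->count < MAX_RAYS);
    q->ids[q->count++] = ray_id;
}

/* Toy wavefront loop. bounces_left[i] is how many more times ray i must be
   shaded. Returns the number of passes, i.e. simulated kernel launches. */
int wavefront_passes(int *bounces_left, int num_rays)
{
    Queue current = { {0}, 0 };
    int passes = 0;
    assert(num_rays >= 0 && num_rays <= MAX_RAYS);

    for (int i = 0; i < num_rays; ++i)
        if (bounces_left[i] > 0)
            enqueue(&current, i);

    while (current.count > 0) {
        Queue next = { {0}, 0 };
        /* one "kernel launch": every queued ray runs the same program */
        for (int k = 0; k < current.count; ++k) {
            int id = current.ids[k];
            bounces_left[id] -= 1;
            if (bounces_left[id] > 0)
                enqueue(&next, id);  /* still alive: handled in the next pass */
        }
        current = next;
        ++passes;
    }
    return passes;
}
```
]]></preformat>
          <p>The number of passes is set by the longest-living ray, while the per-ray state (here just a counter) has to be kept in memory between passes, which mirrors the memory traffic discussed in the text.</p>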
          <p>Thus, the mega-kernel is more natural for CPUs, and the
wavefront approach is better for GPUs.
Nevertheless, both mega-kernel and wavefront methods have an
enormous implementation cost, huge complexity and
difficult debugging for the end user, as the final
generated code and execution model differ greatly
from the original input code. These approaches are suitable
for large companies that invest heavily in compilers
and hardware.</p>
        </sec>
        <sec id="sec-1-1-3">
          <title>Closest analogues</title>
          <p>When using existing solutions, small development
teams become dependent on hardware and lose the
portability of their products. At the same time, they
have no resources for supporting their own compilers
and thus often prefer not to use these technologies at
all [20, 26]. Both Cycles [20] (open source) and
Octane [26] (commercial) restrict their programmability
to procedural textures. Shaders can evaluate input
parameters for material and light models, but cannot
change these models.</p>
          <p>There are several reasons for this decision. First,
programming or extending things like material or light
source models is too hard for the end user. This is mainly
because modern rendering systems use
advanced light transport algorithms with multiple
importance sampling, so each material and light must be
able not only to generate samples but also to calculate
their probability density (both forward and reverse if,
for example, BPT [17] is used). This is a tricky
task, and the correctness of light integration could easily be
broken by the user. Second, excessive extensibility
leads to recompilation of the whole system every
time (the OptiX approach), which can be too long and
inconvenient for the end user. Thus, the «shading trees»
approach is still widely used for extending materials.
Moreover, shading trees are convenient for artists, who
use a visual programming approach via a GUI to
simply «draw» them.</p>
          <p>The Cycles approach is mature but has disadvantages.
Cycles transforms an acyclic shader graph directly to
GLSL code, which is further compiled into a kernel.
Its first disadvantage is that the mapping of actual argument values
is produced during code generation.
The input values are thus placed in the GLSL code as
constants, and if some procedural texture is used twice
with different argument values, Cycles will duplicate the
calls (which will definitely lead to branch divergence
for different rays executing the same texture with different
parameters). Apart from the questionable efficiency
of such a decision, any change in the
parameters of procedural textures leads to
recompilation of the final texture kernel on each render
launch. Second, Cycles does not allow new
basic nodes to be added «on the fly». It requires all basic
nodes to be described in a single place and to be free
of name conflicts. Adding a new basic node leads to
recompilation of the whole system (Cycles). Finally, the generated
code is hard to debug due to
aggressive code generation.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Proposed solution</title>
      <p>
        Our render system is designed using the separate-kernel
approach [
        <xref ref-type="bibr" rid="ref28 ref29">5</xref>
        ] (fig. 1). Similarly to Cycles, we allow only
procedural textures to be programmable, in order to restrict
complexity and preserve the performance of the «fixed»
functionality of the system. This way we achieve a truly modular
architecture where it is easy to change a particular part
of the ray tracing algorithm (for example acceleration
structure traversal or BRDF evaluation) without affecting
the others.
      </p>
      <p>Fig. 1. Our architecture. Generated kernel is shown with
an arrow. Users do not write this kernel directly. Kernel is
generated in a certain way from a set of user defined
procedures.</p>
      <p>It is also easy to add new functionality by
introducing a new kernel as an additional computational step.
This is how we implement procedural textures: we
add a new kernel which executes procedural texture code
on hit and writes the results into global memory;
other stages are not affected. With this approach, the task
of integrating procedural texture code submitted by
the end user into the renderer is reduced to
developing a mechanism to properly «insert»
it into the procedural textures kernel (as in Cycles).
However, this is where the similarity ends.</p>
      <p>Unlike Cycles, we allocate a separate memory area
inside the material buffer for storing argument values
(fig. 2). We call this area the «fake stack». The
code generator can think of this memory region as a
stack and place arguments in it. But since all parameters
are constant, the «fake stack» is in fact just a read-only
region of global memory. Thus, we can update this
region from the CPU side without kernel
recompilation. Besides, the «fake stack» region is almost
unlimited in space, so (unlike Cycles) we are not limited in
the number and size of arguments. The
actual value of an argument is assigned when the
texture is attached to the material.</p>
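      <p>The «fake stack» idea can be sketched in plain C as follows. This is a minimal illustration, not the renderer's actual data layout: the <monospace>MaterialBuffer</monospace> structure, its size and the functions <monospace>push_args</monospace>, <monospace>update_arg</monospace> and <monospace>eval_texture</monospace> are hypothetical names, and the buffer here holds only the argument region.</p>
      <preformat><![CDATA[
```c
#include <assert.h>
#include <string.h>

#define MATERIAL_BUFFER_SIZE 64

/* Hypothetical layout: the argument region ("fake stack") is a flat array
   that the kernel side only reads. */
typedef struct {
    float data[MATERIAL_BUFFER_SIZE];
    int   fake_stack_top;  /* first free slot in the argument region */
} MaterialBuffer;

/* Host side: place the arguments of one texture call; returns their offset. */
int push_args(MaterialBuffer *buf, const float *args, int count)
{
    int offset = buf->fake_stack_top;
    assert(count >= 0 && offset + count <= MATERIAL_BUFFER_SIZE);
    memcpy(&buf->data[offset], args, (size_t)count * sizeof(float));
    buf->fake_stack_top += count;
    return offset;
}

/* Host side: overwrite one argument in place; no code is regenerated. */
void update_arg(MaterialBuffer *buf, int offset, int index, float value)
{
    assert(index >= 0 && offset + index < buf->fake_stack_top);
    buf->data[offset + index] = value;
}

/* "Kernel" side: the same texture code serves every instance and finds its
   arguments through the offset (args[0] = scale, args[1] = bias). */
float eval_texture(const MaterialBuffer *buf, int offset, float x)
{
    const float *args = &buf->data[offset];
    return x * args[0] + args[1];
}
```
]]></preformat>
      <p>Two materials that use the same texture with different parameters simply hold different offsets, so the texture code is neither duplicated nor recompiled when a parameter changes.</p>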
      <p>Fig. 2. Material layout in memory. Blue and green
material pages (first three rectangles) represent different
BRDF nodes (of shader tree) that may reference different
regions of argument storage in the same buffer.</p>
      <p>The next major difference between our approach and
existing ones (at least Cycles) is the processing of nested
procedural textures. Existing approaches usually just inline
the code of child textures into the parent. This
works well if the final nested texture is constructed
from a large number of small basic nodes. However,
for heavy nodes (like Perlin or other noises, which are
quite common in procedural textures) this leads to
excessive waste of registers, because heavy code is
duplicated several times.</p>
      <p>Our solution comes from the assumption that most basic
building blocks (defined by the user) are heavy enough.
In this case it is more efficient to
process them the way an interpreter does (fig.
3). This can be thought of as a restricted mega-kernel
approach. We allocate a small stack inside the kernel and use
DFS traversal of the texture graph to obtain a topological
order. This ensures that all
parameters needed for a function call have already
been calculated before the call. Results of intermediate
calls are also stored on the stack, because they can be
used as parameters in subsequent function calls.</p>
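      <p>The interpreter-like processing described above can be sketched in plain C. The sketch assumes a tiny node set: the <monospace>TexNode</monospace> structure, the three node kinds and the functions <monospace>dfs_order</monospace> and <monospace>eval_graph</monospace> are hypothetical stand-ins for user-defined procedures, and intermediate results are kept in one slot per node rather than on a compact stack.</p>
      <preformat><![CDATA[
```c
#include <assert.h>

#define MAX_NODES  16
#define MAX_INPUTS 2

/* Hypothetical node kinds standing in for user-defined procedures. */
typedef enum { NODE_CONST, NODE_ADD, NODE_MUL } NodeKind;

typedef struct {
    NodeKind kind;
    float    value;               /* used by NODE_CONST only */
    int      inputs[MAX_INPUTS];  /* indices of child nodes, -1 if unused */
} TexNode;

/* Depth-first post-order walk: children are emitted before their parent,
   which gives a topological order of the texture graph. */
void dfs_order(const TexNode *nodes, int id, int *visited, int *order, int *count)
{
    if (visited[id])
        return;
    visited[id] = 1;
    for (int i = 0; i < MAX_INPUTS; ++i) {
        int child = nodes[id].inputs[i];
        if (child >= 0)
            dfs_order(nodes, child, visited, order, count);
    }
    order[*count] = id;
    *count += 1;
}

/* Interpreter loop: each node is called once, after all of its inputs. */
float eval_graph(const TexNode *nodes, const int *order, int count)
{
    float results[MAX_NODES];  /* small store of intermediate results */
    int last = 0;
    assert(count > 0 && count <= MAX_NODES);
    for (int i = 0; i < count; ++i) {
        int id = order[i];
        const TexNode *n = &nodes[id];
        float r = n->value;    /* NODE_CONST result */
        if (n->kind == NODE_ADD)
            r = results[n->inputs[0]] + results[n->inputs[1]];
        else if (n->kind == NODE_MUL)
            r = results[n->inputs[0]] * results[n->inputs[1]];
        results[id] = r;
        last = id;
    }
    return results[last];      /* the root is last in post-order */
}
```
]]></preformat>
      <p>Because the loop dispatches on the node kind instead of inlining child code, a heavy procedure that is referenced several times in the graph appears only once in the kernel.</p>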
      <p>Finally, we implement some custom features like
ambient occlusion (AO) as «fixed functionality» by
introducing a new kernel that computes all AO rays in
parallel. This gives us additional performance over the
naive approach (table 1).</p>
      <sec id="sec-2-1">
        <title>Implementation details</title>
        <p>We let end users write their
own custom procedural textures in OpenCL C99. To
resolve name conflicts between different procedural textures
we transform function names using Clang
LibTooling. It makes it possible to build an AST for OpenCL
source code and then modify the original source
code using this tree. We add unique prefixes to all
user function calls, definitions and declarations according to
the actual texture ids. We also use Clang to replace
«embedded calls» and «embedded types», which allow a
procedural texture to read surface attributes
(«readAttr») and global engine settings and to sample from 2D
images («tex2D») inside user code:</p>
        <preformat>float3 pos  = readAttr("WorldPos");
float3 norm = readAttr("Normal");
float3 tang = readAttr("Tangent");</preformat>
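        <p>The renaming step can be illustrated with a small C helper. The prefix format <monospace>prtex&lt;id&gt;_</monospace> and the function <monospace>make_prefixed_name</monospace> are assumptions made for this sketch only; the text above does not specify the actual format, and the real transformation is performed on the Clang AST rather than on plain strings.</p>
        <preformat><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "prtex<id>_<name>" into out and returns the length of the full
   prefixed name, like snprintf. The prefix format is hypothetical. */
int make_prefixed_name(char *out, size_t out_size, int texture_id, const char *name)
{
    assert(out != NULL && name != NULL);
    assert(texture_id >= 0 && strlen(name) > 0);
    return snprintf(out, out_size, "prtex%d_%s", texture_id, name);
}
```
]]></preformat>
        <p>With such a scheme, two textures with ids 7 and 8 that both define <monospace>userProc</monospace> would end up as <monospace>prtex7_userProc</monospace> and <monospace>prtex8_userProc</monospace> and could be placed in the same kernel without conflicts.</p>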
        <p>This approach hides implementation details
from the user and allows us to silently change the
implementation in the future. Using Clang for code instrumentation
means we do not have to agree with the user in detail on
a concrete interface. We intentionally
did not introduce any new syntax, so the input user code is
still 100% C99 code. This way it can be easily
integrated with a separate C/C++ application for
debugging, testing or any other purpose. An example of a
simple user procedural texture which multiplies the colors of
two different images:</p>
        <preformat>float4 userProc(sam2D tex1, sam2D tex2)
{
  /* surface attribute supplied by the renderer */
  float2 texCoord  = readAttr("TexCoord0");
  float4 texColor1 = tex2D(tex1, texCoord);
  float4 texColor2 = tex2D(tex2, texCoord);
  return texColor1*texColor2;  /* component-wise product */
}</preformat>
        <p>It should be mentioned that, unlike
existing approaches, our instrumentation changes the code only
slightly. It is therefore possible to look at the
generated kernel and to modify and debug it directly, in the same
way as any other OpenCL kernel. This is essentially
different from the Cycles approach. It is also possible to
use the proposed approach to implement other
extensions to the rendering system (not just procedural
textures), such as procedural geometry.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Comparison</title>
        <p>We compared our approach with an RTX-based path
tracer implemented with the Vulkan API and with a simulated Cycles
approach: to simulate Cycles, we inserted all constants and textures directly
into the code of a single procedural texture.
As our test scenario we used the Sponza scene
with an inserted 3D model. First we rendered the scene
with a simple diffuse light gray material on the model
(the produced images were substantially identical) and then
changed the material to use a procedural texture (fig. 4). We
measured the drop in performance to see how well
relatively heavy shading computations are handled.</p>
        <p>Procedural rust is a blend between two materials:
simple diffuse gray and the base rust material. The base
rust material is based on noise functions and color ramps
(fig. 5). The blending mask uses ambient occlusion in addition
to noise functions. The scratches
material is based on noise functions and a Voronoi
pattern (fig. 6). Comparison results are shown in
table 1.</p>
        <p>Fig. 5. Example of rust procedural texture applied to a model close-up (left) and shader node tree in Blender (right,
some nodes are actually collapsed parts of the tree). Note, that different model and base material (with reflective
component) are used here for the sake of image clarity.</p>
        <table-wrap>
          <caption>
            <p>Table 1. Rendering time increase with procedural texture
compared to simple diffuse material, and stack frame size
obtained with the ''-cl-nv-verbose'' key for the OpenCL compiler.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Texture (metric)</th>
                <th>Simulated Cycles</th>
                <th>RTX</th>
                <th>Ours</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>rust (time)</td>
                <td>+46%</td>
                <td>+33%</td>
                <td>+21%</td>
              </tr>
              <tr>
                <td>scratches (time)</td>
                <td>+9%</td>
                <td>+4%</td>
                <td>+15%</td>
              </tr>
              <tr>
                <td>rust (stack frame)</td>
                <td>604 bytes</td>
                <td>unknown</td>
                <td>464 bytes</td>
              </tr>
              <tr>
                <td>scratches (stack frame)</td>
                <td>412 bytes</td>
                <td>unknown</td>
                <td>336 bytes</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>As table 1 shows, with the restricted mega-kernel
we outperform the Cycles approach for complex
procedural textures (with ambient occlusion rays) in both
speed and stack frame size, but lose a little speed
on the simpler texture (scratches). We also beat the RTX
implementation for rust, because the proposed implementation
processes ambient occlusion rays in parallel, while both
RTX and Cycles are limited to tracing one ray at a time
per thread. In comparison with the Cycles approach
we have a smaller stack frame size for both textures.</p>
        <p>This demonstrates that the restricted mega-kernel approach
works as intended.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Acknowledgments</title>
      <p>This work was sponsored by RFBR grant
18-31-20032.</p>
    </sec>
    <sec id="sec-4">
      <title>4. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref4">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Albrecht</surname>
            <given-names>T.</given-names>
          </string-name>
          <article-title>Pitfalls of Object Oriented Programming</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>Development Division archives</article-title>
          .
          <year>2013</year>
          . [2]
          <string-name>
            <surname>Barik</surname>
            <given-names>R.</given-names>
          </string-name>
          et al.
          <article-title>Efficient mapping of irregular C++</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>mization</surname>
          </string-name>
          .
          <source>- ACM</source>
          ,
          <year>2014</year>
          . - p.
          <fpage>33</fpage>
          . [3]
          <string-name>
            <surname>Cook</surname>
            <given-names>R. L</given-names>
          </string-name>
          . Shade trees //ACM Siggraph Computer
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Graphics</surname>
          </string-name>
          .
          <article-title>-</article-title>
          <year>1984</year>
          . - Vol.
          <volume>18</volume>
          . -
          <fpage>№</fpage>
          . 3. - p.
          <fpage>223</fpage>
          -
          <lpage>231</lpage>
          . [4]
          <string-name>
            <surname>Deryabin</surname>
            <given-names>N. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhdanov D. D.</surname>
          </string-name>
          ,
          <string-name>
            <surname>Sokolov</surname>
            <given-names>V. G.</given-names>
          </string-name>
          <string-name>
            <surname>Em-</surname>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <source>ware // Programming and Computer Software</source>
          .
          <year>2017</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          Vol.
          <volume>43</volume>
          , №1, pp
          <fpage>13</fpage>
          -
          <lpage>23</lpage>
          . [5]
          <string-name>
            <surname>Frolov</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kharlamov</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignatenko</surname>
            <given-names>A</given-names>
          </string-name>
          . Biased Global
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <article-title>Tracing on GPUs</article-title>
          .
          <source>GraphiCon'2010</source>
          , p.
          <fpage>49</fpage>
          -
          <lpage>56</lpage>
          . [6]
          <string-name>
            <surname>Foley</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanrahan</surname>
            <given-names>P</given-names>
          </string-name>
          . Spark: modular, composable
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <article-title>shaders for graphics hardware</article-title>
          .
          <source>- ACM</source>
          ,
          <year>2011</year>
          . - Vol.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          30. -
          <fpage>№</fpage>
          . 4. - p.
          <fpage>107</fpage>
          . [7]
          <string-name>
            <surname>Hanrahan</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawson</surname>
            <given-names>J.</given-names>
          </string-name>
          <article-title>A language for shading and</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <year>1990</year>
          .
          <article-title>-</article-title>
          .
          <volume>24</volume>
          . -
          <fpage>№</fpage>
          . 4. - p.
          <fpage>289</fpage>
          -
          <lpage>298</lpage>
          . [8]
          <string-name>
            <surname>He</surname>
            <given-names>Y.</given-names>
          </string-name>
          et al.
          <article-title>Shader components: modular and high per-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Graphics</surname>
          </string-name>
          .
          <article-title>-</article-title>
          <year>2017</year>
          . - Vol.
          <volume>36</volume>
          . -
          <fpage>№</fpage>
          . 4. - p.
          <fpage>100</fpage>
          . [9]
          <string-name>
            <surname>Laine</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karras</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aila</surname>
            <given-names>T</given-names>
          </string-name>
          . Megakernels considered
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>harmful: wavefront path tracing</article-title>
          on GPUs //HPG'13.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <source>- ACM</source>
          ,
          <year>2013</year>
          . - .
          <volume>137</volume>
          -
          <fpage>143</fpage>
          . [10]
          <string-name>
            <surname>Patney</surname>
            <given-names>A.</given-names>
          </string-name>
          et al.
          <article-title>Piko: a framework for authoring pro-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Graphics</surname>
          </string-name>
          .
          <article-title>-</article-title>
          <year>2015</year>
          . - .
          <volume>34</volume>
          . -
          <fpage>№</fpage>
          . 4. - p.
          <fpage>147</fpage>
          . [11]
          <string-name>
            <surname>Parker</surname>
            <given-names>S. G.</given-names>
          </string-name>
          et al. GPU ray tracing //Communica-
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <source>tions of the ACM. - 2013</source>
          . - Vol.
          <volume>56</volume>
          . -
          <fpage>№</fpage>
          . 5. - p.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          93-
          <fpage>101</fpage>
          . [12]
          <string-name>
            <surname>Parker</surname>
            <given-names>S. G.</given-names>
          </string-name>
          et al.
          <article-title>RTSL: a ray tracing shading lan-</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>guage //2007 IEEE Symposium on Interactive Ray</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Tracing</surname>
          </string-name>
          . - IEEE,
          <year>2007</year>
          . - p.
          <fpage>149</fpage>
          -
          <lpage>160</lpage>
          . [13]
          <string-name>
            <surname>Perlin</surname>
            <given-names>K.</given-names>
          </string-name>
          <article-title>An image synthesizer</article-title>
          //
          <source>ACM SIGGRAPH</source>
          . -
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <year>1985</year>
          . - Vol.
          <volume>19</volume>
          . - №
          <issue>3</issue>
          . - p.
          <fpage>287</fpage>
          -
          <lpage>296</lpage>
          . [14]
          <string-name>
            <surname>Pharr</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakob</surname>
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Humphreys</surname>
            <given-names>G.</given-names>
          </string-name>
          <source>Physically based rendering: From theory to implementation</source>
          . -
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <publisher-name>Morgan Kaufmann</publisher-name>
          ,
          <year>2016</year>
          . [15]
          <string-name>
            <surname>Sons</surname>
            <given-names>K.</given-names>
          </string-name>
          et al.
          <article-title>shade.js: Adaptive Material Descriptions</article-title>
          //
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <source>Computer Graphics Forum</source>
          . -
          <year>2014</year>
          . - Vol.
          <volume>33</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          - №
          <issue>7</issue>
          . - p.
          <fpage>51</fpage>
          -
          <lpage>60</lpage>
          . [16]
          <string-name>
            <surname>Stich</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Real-time raytracing with Nvidia RTX</article-title>
          ,
          <source>GTC EU</source>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <year>2018</year>
          . [17]
          <string-name>
            <surname>Veach</surname>
            <given-names>E.</given-names>
          </string-name>
          <article-title>Robust Monte Carlo methods for light transport simulation</article-title>
          . -
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <publisher-name>Stanford University</publisher-name>
          ,
          <year>1997</year>
          . - Vol.
          <volume>1610</volume>
          . [18]
          <string-name>
            <surname>Zhdanov</surname>
            <given-names>D. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ershov</surname>
            <given-names>S. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deryabin</surname>
            <given-names>N. B.</given-names>
          </string-name>
          Object-
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          Vol.
          <volume>5</volume>
          , №
          <issue>4</issue>
          ,
          <year>2013</year>
          , pp.
          <fpage>88</fpage>
          -
          <lpage>117</lpage>
          . [19]
          <article-title>Arnold renderer GPU demo press release</article-title>
          . URL =
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          arnold-5-3-gpu/ [20]
          <collab>Blender community</collab>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <article-title>Cycles Open Source Production Rendering</article-title>
          . URL = https://www.cycles-renderer.org/ [21]
          <collab>Caustic Graphics</collab>
          .
          <article-title>OpenRL: Open Ray Tracing Language</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <year>2010</year>
          . [22]
          <article-title>Houdini 17.5 VEX language reference</article-title>
          .
          <year>2019</year>
          . URL =
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          https://www.sidefx.com/docs/houdini/vex/lang.html [23]
          <article-title>Hydra Renderer. Open source rendering system</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          KIAM RAS, MSU.
          <year>2019</year>
          . URL =
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          https://github.com/Ray-Tracing-Systems/HydraAPI [24]
          <article-title>Nvidia Turing arch. paper</article-title>
          .
          <year>2019</year>
          . URL =
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          Whitepaper.pdf [25]
          <article-title>Nvidia RTX Ray tracing developer resources</article-title>
          .
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          URL = https://developer.nvidia.com/rtx/raytracing [26]
          <article-title>OctaneRender OSL Documentation</article-title>
          .
          <year>2019</year>
          . URL =
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          https://docs.otoy.com/osl/ [27]
          <article-title>Open Shading Language</article-title>
          .
          <year>2019</year>
          . URL =
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          https://github.com/imageworks/OpenShadingLanguage [28]
          <article-title>Vulkan specification</article-title>
          .
          <year>2019</year>
          . URL =
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>https://www.khronos.org/registry/vulkan/specs/1.1-</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>