<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Examination of the Nvidia RTX</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">V</forename><forename type="middle">V</forename><surname>Sanzharov</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Gubkin Russian State University of Oil and Gas</orgName>
								<address>
									<settlement>Moscow</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Gorbonosov</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Moscow State University</orgName>
								<address>
									<settlement>Moscow</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">V</forename><forename type="middle">A</forename><surname>Frolov</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Keldysh Institute of Applied Mathematics RAS</orgName>
								<address>
									<settlement>Moscow</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Moscow State University</orgName>
								<address>
									<settlement>Moscow</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">A</forename><forename type="middle">G</forename><surname>Voloboy</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Keldysh Institute of Applied Mathematics RAS</orgName>
								<address>
									<settlement>Moscow</settlement>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Examination of the Nvidia RTX</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">904A78705C4C076CAB2E3987939134A8</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T01:21+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>photo-realistic rendering</term>
					<term>ray tracing</term>
					<term>hardware acceleration</term>
					<term>GPU</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Hardware acceleration of ray tracing is an active research field, but only with the release of Nvidia Turing architecture GPUs did it become widely available. Nvidia RTX is a proprietary hardware ray tracing acceleration technology available in the Vulkan and DirectX APIs as well as through Nvidia OptiX. Since the implementation details are not public, many questions remain about what it actually does under the hood. To answer these questions, we implemented a classic path tracing algorithm using RTX via both DirectX and Vulkan and conducted several experiments to investigate the inner workings of this technology. We tested the actual hardware implementation of RTX on an RTX2070 GPU and the software fallback in the driver on a GTX1070 GPU. In this paper we present the results of these experiments and speculate on the internal architecture of RTX.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Ray tracing is a cornerstone of photo-realistic image synthesis. Since the first papers on ray tracing <ref type="bibr" target="#b18">[19]</ref>, <ref type="bibr" target="#b4">[5]</ref>, computer graphics researchers have developed a plethora of techniques to accelerate the computations associated with ray tracing.</p><p>Hardware acceleration of ray tracing had limited success outside of research papers until Nvidia released the RTX technology in its Turing architecture GPUs. Turing hardware was stated to contain special so-called «RT cores» which accelerate ray tracing. The official Turing architecture whitepaper <ref type="bibr" target="#b21">[22]</ref> states that an RT core contains two units which perform bounding box and ray-triangle intersection tests. But since RTX is closed source, we do not know for sure how exactly it is implemented and whether this is all there is to ray tracing acceleration in Turing GPUs. In this paper, we present several experiments we conducted with an RTX GPU. We analyze the results and speculate on possible techniques used in RTX hardware to accelerate ray tracing. But first, let's review the research on ray tracing acceleration hardware to understand which techniques have already been tried in hardware implementations and how well they performed.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">Related work in ray tracing acceleration hardware</head><p>The first dedicated hardware solutions closely related to ray tracing were PCI cards for volume data visualization which implemented ray casting and Phong shading (such as <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b11">12]</ref>). Even though this hardware traced only primary rays, it already implemented techniques to increase the efficiency of parallel tracing, such as grouping rays to exploit memory access coherence <ref type="bibr" target="#b8">[9]</ref>. Another notable product was the SaarCOR architecture <ref type="bibr" target="#b12">[13]</ref> and its updated version in an FPGA chip <ref type="bibr" target="#b13">[14]</ref>. The SaarCOR chip implemented the whole ray tracing algorithm: scene and camera data were uploaded from the host and the chip produced the rendered image. Like the ray casting solutions, SaarCOR used packet tracing (in groups of 64 rays). The architecture was fully pipelined to further mitigate memory access latency: it simultaneously traversed one group of rays, loaded data for the next group and performed intersection tests on another group. An example of commercially available ray tracing hardware is the ART AR250/350 rendering processor with a custom RISC processor core <ref type="bibr" target="#b3">[4]</ref>. The solution was used to accelerate offline rendering and was packaged as an x86 PC with 16, 36 or 48 rendering processors on PCI-X cards and a gigabit networking system. The software side included a RenderMan-compliant renderer, network communication interfaces and plugins for 3D applications (CATIA, 3ds Max, Maya). To our knowledge, details about the custom rendering processor were never published.</p><p>All works mentioned to this point concern fixed function hardware. 
One of the first solutions with programmable stages is the RPU (ray processing unit) <ref type="bibr" target="#b19">[20]</ref>. The traversal and primitive intersection tasks are implemented in fixed function units. RPU supported custom shaders with features such as recursive function calls, a trace instruction to initiate tracing of an arbitrary ray, and an asynchronous load instruction to hide memory latency. RPU also featured geometry shaders, instancing support and shader tables to look up the specific shader to execute for a particular geometry object. Like SaarCOR and the ray casting solutions, RPU uses packet ray tracing, which can result in performance drops in the case of incoherent rays. The TRaX architecture <ref type="bibr" target="#b15">[16]</ref> implements a different solution: many identical cores consisting of simple thread processors. It can be viewed as a general purpose architecture and is used in other papers to simulate their hardware <ref type="bibr" target="#b6">[7]</ref>. In the ray tracing application, TRaX accelerates single ray performance and features a MIMD execution model, as opposed to the groups of 4 or more rays and the SIMD model of the previously mentioned architectures. The authors of <ref type="bibr" target="#b9">[10]</ref> aimed to address problems with incoherent rays by using an N-wide SIMD processing architecture with filtering of rays to find coherent groups. The filtering is applied at the traversal, intersection and shading stages of the ray tracing algorithm.</p><p>In <ref type="bibr" target="#b0">[1]</ref> the authors simulate an architecture close to that of the Nvidia Fermi GPU. One of its key aspects (related to ray tracing) is work compaction. When more than half of the rays in a warp (a group of 32 threads) have terminated, the warp terminates and the non-terminated rays are copied to the next warp. This mechanism mitigates the effect of incoherent rays and preserves parallelism. Another suggestion in this work is related to the stack memory layout for threads. 
Also, <ref type="bibr" target="#b0">[1]</ref> implements the idea of partitioning the BVH into treelets (which approximately match cache sizes) and grouping rays according to the treelets they intersect. Another architecture, STRaTA <ref type="bibr" target="#b6">[7]</ref>, is built on top of TRaX <ref type="bibr" target="#b15">[16]</ref> and implements a modified version of the treelet technique of <ref type="bibr" target="#b0">[1]</ref> together with a streaming approach to processing the rays associated with each treelet. STRaTA adds special small buffers to the memory hierarchy to store rays.</p><p>In <ref type="bibr" target="#b14">[15]</ref> the authors focus on improvements related to memory access, in particular on completely avoiding random memory access during ray traversal. Their approach presents the data needed for ray tracing as two streams: a stream of geometry data split into segments, and a stream of rays collected as a queue per geometry segment they intersect. This allows geometry and rays to be fetched from main memory into caches before they are needed for traversal.</p><p>In addition to a MIMD execution model and treelets, <ref type="bibr" target="#b5">[6]</ref> proposes reduced precision BVH traversal, which also saves chip area and power. Another specific point of <ref type="bibr" target="#b5">[6]</ref> is that the authors propose a small solution which can be integrated into an existing GPU architecture. There are also works focused on developing mobile ray tracing hardware (such as <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b10">11]</ref>). These solutions usually share such properties as a MIMD execution model and hardware traversal and intersection units. 
Raycore <ref type="bibr" target="#b10">[11]</ref> has distinctive properties that separate it from other architectures: it implements fully fixed function Whitted-style ray tracing <ref type="bibr" target="#b18">[19]</ref>, uses a kD-tree as the acceleration structure and includes a hardware unit for kD-tree construction.</p><p>Summary. Overall, quite a few different architectures and hardware acceleration techniques for ray tracing have been proposed over the years. A detailed review and comparison can be found in <ref type="bibr" target="#b1">[2]</ref>. Some of the mentioned architectures have been implemented in FPGAs. Production-level hardware besides Nvidia RTX is represented by <ref type="bibr" target="#b3">[4]</ref> and the mobile GPUs by Imagination Technologies <ref type="bibr" target="#b20">[21]</ref>. However, neither has published details: <ref type="bibr" target="#b3">[4]</ref> is discontinued and <ref type="bibr" target="#b20">[21]</ref> is not yet available. Therefore, RTX is the first hardware ray tracing acceleration technology to reach the wider public. But since the implementation details are closed (as with <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b20">21]</ref>), it is unclear how exactly it works and which acceleration techniques it uses. In this paper, we aim to understand the principles behind ray tracing acceleration in Nvidia RTX hardware by measuring the performance in several scenarios using the Vulkan and DirectX 12 APIs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Experimental analysis of Nvidia RTX</head><p>First let's briefly review available information about inner workings of RTX. Access to RTX ray tracing functionality is available through Vulkan API, Microsoft DirectX 12 (DXR) and Nvidia OptiX API libraries <ref type="bibr">[23]</ref>. We used both Vulkan and DirectX 12 for our experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Known details</head><p>In summary, for both graphics APIs the corresponding extensions add functionality to create a ray tracing pipeline with new shader types, commands and objects for acceleration structures, and tools to associate shader groups with acceleration structures (i.e. the shader binding table).</p><p>The acceleration structure is represented as a two-level tree. Bottom level acceleration structure (BLAS) objects contain the actual vertices, and the top level acceleration structure (TLAS) contains BLAS object instances, i.e. transformation matrices. The structure is built on the GPU and is some form of BVH <ref type="bibr" target="#b16">[17]</ref>.</p><p>The ray tracing pipeline has five shader types: ray generation, miss, closest hit, any hit and intersection. Shader programs of the first three types are mandatory and the last two are optional. All stages of the ray tracing algorithm are programmable. There is a built-in ray-triangle intersection shader which is used by default. The official whitepaper <ref type="bibr" target="#b21">[22]</ref> states that the RT core has a ray-triangle intersection unit inside. In <ref type="bibr" target="#b17">[18]</ref> the authors show a 2-3.5 times performance improvement for their point location algorithm in tetrahedral meshes when using the built-in triangle intersection unit on Turing hardware, while Volta hardware (which has no RT cores, so a software fallback is used for RTX functionality) shows a performance loss in the same scenario.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Experiments</head><p>To understand how RTX works under the hood we conducted several experiments. As a base for our investigations we implemented a basic path tracing algorithm <ref type="bibr" target="#b4">[5]</ref> and compared it to the open source implementation of path tracing in Hydra Renderer <ref type="bibr">[24]</ref>.</p><p>Implementing a minimal path tracer using RTX in Vulkan or DirectX 12 requires the developer to:</p><p>1. build acceleration structures using the ray tracing extension API;</p><p>2. create a ray tracing pipeline containing at least ray generation, closest hit and miss shader programs;</p><p>3. create a shader table to bind shader programs to acceleration structures;</p><p>4. create and execute command buffers on the created pipeline.</p><p>Even a minimal implementation using RTX has several design options which can potentially affect performance. For example, the shading and lighting code can be executed in a ray generation shader, in a single (closest) hit shader or in several hit shaders. We tested two different implementations following RTX best practices for Vulkan and DX12:</p><p>1. impl_1 (Vulkan): the ray generation shader creates ray(s) for each pixel in a loop until the specified tracing depth is reached;</p><p>2. impl_2 (DirectX): the ray generation shader spawns the primary ray and the closest hit shader generates further rays until the specified depth is reached.</p><p>To measure performance in all our experiments we used Nvidia Nsight Graphics and two GPUs, GTX1070 and RTX2070. While RTX2070 has hardware acceleration for ray tracing, GTX1070 relies on a software implementation of RTX. Using this setup we captured frames from our path tracing application and logged the time spent by the vkCmdTraceRaysNV (Vulkan) or DispatchRays (DirectX 12) function and by the «BVH4TraversalInstKernel» kernel in Hydra Renderer. 
In our first set of experiments we ran the implemented path tracer on three scenes (Sponza, CrySponza, Hairballs) with different tracing depths.</p><p>From the measured time we calculated frames per second and the approximate number of rays traced per second as:</p><formula xml:id="formula_0">rays = width * height * spp * fps (1)</formula><p>where width and height give the rendering resolution, spp is the number of samples per pixel and fps is the number of frames per second.</p><p>Next, we modified impl_2 to trace several rays at each depth level, essentially transforming it into an implementation of branched (recursive) path tracing. As can be seen in fig. <ref type="figure" target="#fig_0">1</ref>, the time grows in proportion to the number of rays, and in some cases more slowly. For example, with 4 rays per depth level the total number of rays is 7 times higher than with 1 ray per depth level (21 against 3), while the performance drop is 6 times for Sponza and 3.6 times for Hairballs.</p></div>
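<div xmlns="http://www.tei-c.org/ns/1.0"><p>The arithmetic behind formula (1) and the ray counts quoted for branched path tracing can be sketched as follows. This is only an illustration and not code from our path tracer: the function names are hypothetical, and the per-level ray count assumes one primary ray per sample with every ray spawning the same number of rays at the next level, which reproduces the «21 against 3» figures for depth 3.</p>

```python
def rays_per_second(width, height, spp, fps):
    """Formula (1): rays traced per second for one depth level."""
    return width * height * spp * fps


def fps_from_mrays(mrays_per_second, width, height, spp):
    """Invert formula (1): frames per second from a Mrays/s figure."""
    return mrays_per_second * 1e6 / (width * height * spp)


def rays_per_pixel(rays_per_level, depth):
    """Total rays per pixel sample in branched path tracing.

    Assumption: level 0 holds the single primary ray and every ray
    at one level spawns rays_per_level rays at the next level.
    """
    return sum(rays_per_level ** level for level in range(depth))


if __name__ == "__main__":
    # 807 Mrays/s is the primary-ray entry for Sponza, impl_1 in Table 1.
    print(fps_from_mrays(807, 1024, 1024, 1))
    # Depth 3 with 1 ray and with 4 rays per depth level.
    print(rays_per_pixel(1, 3), rays_per_pixel(4, 3))
```

<p>At 1024 x 1024 resolution and 1 sample per pixel, the 807 Mrays/s entry corresponds to about 770 frames per second, and the two ray counts for depth 3 are 3 and 21, i.e. the 7 times ratio mentioned above.</p></div>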
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Results and discussion</head><p>Conclusion #1: Nvidia RTX is primarily aimed at accelerating random access to memory during ray tracing, more specifically, at traversing the BVH tree with sets of random rays. This conclusion stems from fig. 2 (right), where the hardware implementation on the small scene (Sponza) is only about 2 times faster (477 vs 1140) for «coherent» and «sorted» sets of primary rays, but pulls ahead 4-5 times on the same Sponza scene with incoherent rays (122 vs 561). Moreover, the large scene (Hairballs) shows the same 4-5 times speedup for both primary (58 vs 283) and secondary (50 vs 210) rays. The fact that the acceleration is preserved on a scene where memory is the bottleneck supports our conclusion.</p></div>
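<div xmlns="http://www.tei-c.org/ns/1.0"><p>The speedup factors behind Conclusion #1 can be recomputed directly from the pairs of numbers quoted in the text. The sketch below is illustrative only: it assumes each pair is (software, hardware) throughput in the same unit, and the dictionary and function names are hypothetical.</p>

```python
# Pairs quoted in the text as (software, hardware); the unit is assumed
# to be the same within each pair, so only the ratio matters.
QUOTED = {
    "Sponza, coherent primary": (477, 1140),
    "Sponza, incoherent": (122, 561),
    "Hairballs, primary": (58, 283),
    "Hairballs, secondary": (50, 210),
}


def speedup(software, hardware):
    """Ratio of hardware to software throughput."""
    return hardware / software


def speedups(pairs):
    """Speedup for every quoted pair, rounded to two decimals."""
    return {name: round(speedup(sw, hw), 2) for name, (sw, hw) in pairs.items()}


if __name__ == "__main__":
    for name, ratio in speedups(QUOTED).items():
        print(name, ratio)
```

<p>The ratios come out at roughly 2.4 for coherent primary rays on Sponza and between 4.2 and 4.9 for the other three cases, which is the «2 times» versus «4-5 times» contrast the conclusion relies on.</p></div>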
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion #2:</head><p>Nvidia RTX implements some form of ray grouping/ray sorting.</p><p>This is probably done in combination with GPU work creation (see conclusion #4). The assumption is supported by the fact that on simple scenes (like Sponza) the hardware implementation shows no significant performance drop when we move from primary to secondary rays (table 1, fig. 1), while the performance of the software implementation degrades much faster. However, on the scene where ray grouping cannot help (Hairballs), neither the hardware nor the software implementation shows a significant performance difference between primary and secondary rays.</p><p>Our experiment with recursive ray tracing (fig. <ref type="figure" target="#fig_0">1</ref>) also confirms the presence of GPU work creation, since the time is proportional to the number of rays.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Final conclusion</head><p>Our main conclusion is that Nvidia RTX is a rather «general» technology, oriented toward speeding up random memory access and irregular work distribution on GPUs. We can therefore expect that in the near future other classes of algorithms (at least some spatial search algorithms) will be hardware accelerated.</p><p>We believe Nvidia puts a lot of effort into its compiler and software support of GPU work creation. This technology illustrates that «the golden age of software» has ended and «the golden age of compilers and HW/SW projects» has started.</p><p>Despite the overall complexity of Vulkan and DX12, such improvements make a GPU implementation of a complex rendering engine much simpler for the developer. On the other hand, this simplicity is achieved at the cost of tying the project to a fairly heavy technology. We believe that an efficient software implementation of RTX will be complex and expensive due to GPU work creation and the specific compiler that Nvidia puts inside RTX: even Nvidia's software implementation on GTX1070 essentially loses to the simple and straightforward open source ray tracing implementation in Hydra Renderer.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Time spent by ray tracing "draw call" per frame (1 sample per pixel, 1024 x 1024 resolution) depending on rays traced per depth level. Depth = 3</figDesc><graphic coords="3,309.54,69.17,231.87,143.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 2 .Fig. 3 .</head><label>23</label><figDesc>Fig. 2. Comparison on GTX1070 (left) and RTX2070 (right) (Open Source implementation vs Nvidia RTX). The left part of each image (green) shows performance for primary (coherent) rays, and the right part (red) for secondary (random) rays.</figDesc><graphic coords="4,55.85,70.15,243.78,153.66" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Supposed internal architecture of Nvidia RTX. According to the results of our experiments, the hardware implementation should be closely connected with the texture units, or be part of a texture unit. We believe the most interesting part is related to the reordering of memory access, and thus it should work analogously to the well-known memory access reordering inside texture units. In this way, the traversal unit itself could be fairly small and probably implements reduced precision BVH traversal <ref type="bibr" target="#b5">[6]</ref> (or some analogue) for better cache efficiency and lower HW cost.</figDesc><graphic coords="4,53.86,506.76,487.56,179.17" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Million rays traced per second (Mrays/s), 1 sample per pixel, 1024 x 1024 resolution, RTX2070</figDesc><table><row><cell>scene</cell><cell>primary</cell><cell>secondary</cell><cell>tertiary</cell></row><row><cell>Sponza, impl_1</cell><cell>807</cell><cell>437</cell><cell>806</cell></row><row><cell>Sponza, impl_2</cell><cell>928</cell><cell>777</cell><cell>694</cell></row><row><cell>Sponza, Hydra_SW</cell><cell>480</cell><cell>122</cell><cell>130</cell></row><row><cell>Crysponza, impl_1</cell><cell>806</cell><cell>419</cell><cell>388</cell></row><row><cell>Crysponza, impl_2</cell><cell>754</cell><cell>635</cell><cell>216</cell></row><row><cell>Crysponza, Hydra_SW</cell><cell>276</cell><cell>92</cell><cell>80</cell></row><row><cell>Hairballs, impl_1</cell><cell>275</cell><cell>223</cell><cell>256</cell></row><row><cell>Hairballs, impl_2</cell><cell>567</cell><cell>155</cell><cell>141</cell></row><row><cell>Hairballs, Hydra_SW</cell><cell>61</cell><cell>50</cell><cell>56</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>Conclusion #3: Despite Nvidia's attempt, placing the whole code in a single kernel («CPU style» or «uber kernel») is still inefficient for GPUs. We draw this conclusion for 2 main reasons. First, the open source implementation with separate kernels in Hydra Renderer is almost 2 times faster than Nvidia RTX in the pure software case (fig. 2, left). Second, when comparing 2 slightly different implementations of RTX in Vulkan and DX12 we found dramatic changes in performance depending on a slight change in the complexity of shaders in «impl_1» (more complex) vs «impl_2» (simpler), table 1. This can be explained by an occupancy drop depending on code complexity and register pressure. Conclusion #4: Nvidia RTX uses GPU work creation for rays. This conclusion is confirmed by a simple observation. When we generated a random number of rays (10 to 40), rendering was 2 times slower than with 10 rays. In contrast to ray tracing, when we calculated Perlin Noise with a random number of noise function calls (10 to 40), the slowdown was exactly 4 times, which is what one would expect without GPU work creation. Our experiment with recursive ray tracing (fig. 1) also confirms GPU work creation presence since the time is proportional to the number of rays.</figDesc></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Acknowledgments</head><p>This work was sponsored by RFBR 18-31-20032 grant.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Architecture considerations for tracing incoherent rays //High-performance Graphics</title>
		<author>
			<persName><forename type="first">T</forename><surname>Aila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Karras</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2010">2010</date>
			<publisher>Eurographics Association</publisher>
			<biblScope unit="page" from="113" to="122" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Toward real-time ray tracing: A survey on hardware acceleration and microarchitecture techniques</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Deng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys (CSUR)</title>
		<imprint>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page">58</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Coherent ray tracing via stream filtering</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">P</forename><surname>Gribble</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ramani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Symposium on Interactive Ray Tracing</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="59" to="66" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The AR350: Today&apos;s ray trace rendering processor</title>
		<author>
			<persName><forename type="first">D</forename><surname>Hall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Eurographics/SIGGRAPH workshop on Graphics hardware -Hot 3D Session 1</title>
				<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">The rendering equation //ACM SIGGRAPH computer graphics</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Kajiya</surname></persName>
		</author>
		<idno>20. -№. 4</idno>
		<imprint>
			<date type="published" when="1986">1986</date>
			<publisher>ACM</publisher>
			<biblScope unit="page" from="143" to="150" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Reduced precision hardware for ray tracing</title>
		<author>
			<persName><forename type="first">S</forename><surname>Keely</surname></persName>
		</author>
		<idno>-2014</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. HPG</title>
				<meeting>HPG</meeting>
		<imprint>
			<biblScope unit="page" from="29" to="40" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">An energy and bandwidth efficient ray tracing architecture //High-performance Graphics</title>
		<author>
			<persName><forename type="first">D</forename><surname>Kopta</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>ACM</publisher>
			<biblScope unit="page" from="121" to="128" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">SGRT: A mobile GPU architecture for real-time ray tracing //High-performance graphics conference</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">J</forename><surname>Lee</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
			<publisher>ACM</publisher>
			<biblScope unit="page" from="109" to="119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">VIZARD II: a reconfigurable interactive volume rendering system</title>
		<author>
			<persName><forename type="first">M</forename><surname>Meißner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM Eurographics conf. on Graphics hardware. Eurographics Association</title>
				<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="137" to="146" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">T&amp;I engine: traversal and intersection engine for hardware accelerated ray tracing</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Nah</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Graphics</title>
		<imprint>
			<date type="published" when="2011">2011</date>
			<publisher>ACM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">RayCore: A ray-tracing hardware architecture for mobile devices</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Nah</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM Transactions on Graphics (TOG)</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page">162</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">The VolumePro real-time ray-casting system</title>
		<author>
			<persName><forename type="first">H</forename><surname>Pfister</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
			<publisher>Association for Computing Machinery</publisher>
			<biblScope unit="page" from="251" to="260" />
		</imprint>
		<respStmt>
			<orgName>N.Y.</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Computer graphics and interactive techniques</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">SaarCOR: a hardware architecture for ray tracing</title>
		<author>
			<persName><forename type="first">J</forename><surname>Schmittler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Wald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Slusallek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM SIGGRAPH conf. on Graphics hardware</title>
				<imprint>
			<publisher>Eurographics Association</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="27" to="36" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Realtime ray tracing of dynamic scenes on an FPGA chip</title>
		<author>
			<persName><forename type="first">J</forename><surname>Schmittler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM SIGGRAPH/EUROGRAPHICS conf. on Graphics hardware</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="95" to="106" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Dual streaming for hardware-accelerated ray tracing. High Performance Graphics</title>
		<author>
			<persName><forename type="first">K</forename><surname>Shkurko</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>ACM</publisher>
			<biblScope unit="page">12</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">TRaX: A multi-threaded architecture for real-time ray tracing. Symposium on Application Specific Processors</title>
		<author>
			<persName><forename type="first">J</forename><surname>Spjut</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<publisher>IEEE</publisher>
			<biblScope unit="page" from="108" to="114" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Real-time raytracing with Nvidia RTX</title>
		<author>
			<persName><forename type="first">M</forename><surname>Stich</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<pubPlace>GTC EU</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">RTX Beyond Ray Tracing: Exploring the Use of Hardware Ray Tracing Cores for Tet-Mesh Point Location</title>
		<author>
			<persName><forename type="first">I</forename><surname>Wald</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>Authors&apos; preprint, to be presented at High-Performance Graphics</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">An improved illumination model for shaded display</title>
		<author>
			<persName><forename type="first">T</forename><surname>Whitted</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM SIGGRAPH</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="1979">1979</date>
			<biblScope unit="page">13</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">RPU: a programmable ray processing unit for realtime ray tracing</title>
		<author>
			<persName><forename type="first">S</forename><surname>Woop</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schmittler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Slusallek</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Graphics</title>
		<imprint>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="434" to="444" />
			<date type="published" when="2005">2005</date>
			<publisher>ACM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<ptr target="https://www.imgtec.com/graphics-processors/architecture/powervr-ray-tracing/" />
		<title level="m">Imagination technologies</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>PowerVR Ray Tracing</note>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Nvidia Turing architecture whitepaper</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m">Solutions/design-visualization/technologies/turingarchitecture/NVIDIA-Turing-Architecture-Whitepaper</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<ptr target="https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/vkspec.html" />
		<title level="m">Nvidia RTX Ray tracing developer resources</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
		<respStmt>
			<orgName>Keldysh Institute of Applied Mathematics, Moscow State University; Hydra Renderer</orgName>
		</respStmt>
	</monogr>
	<note>Open source rendering system</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
