Conservative Out-of-Core Rendering of Large Dynamic Scenes Using HDLODs Vitaly Semenov 1,2,3, Vasily Shutkin 1 and Vladislav Zolotov 1 1 Ivannikov Institute for System Programming of the Russian Academy of Sciences, Alexander Solzhenitsyn st., 25, Moscow, 109004, Russia 2 Moscow Institute of Physics and Technology, 9 Institutskiy per., Dolgoprudny, 141701, Russia 3 National Research University Higher School of Economics, 11 Pokrovsky Bulvar, Moscow, 109028, Russia Abstract Rendering of large scenes in external memory is one of the most important problems of computer graphics, which is used in such areas as CAD/CAM/CAE, geoinformatics, project management, scientific visualization, virtual and augmented reality, computer games and animation. Unlike static scenes, for which a number of effective approaches have been proposed, in particular, levels of detail (LOD), rendering of large dynamic scenes with a given level of the tolerance and trustworthiness remains a big challenge. This paper discusses and investigates the possibility of using the previously proposed method of hierarchical dynamic levels of detail (HDLOD) for conservative rendering of dynamic scenes with a deterministic nature of events in external memory. The carried out series of computational experiments prove the feasibility and effectiveness of the method for real industrial scenes when employed in combination with memory management techniques. Keywords 1 Out-of-core rendering, dynamic 3D scenes, levels of detail, HDLOD 1. Introduction While the problem of out-of-core rendering of static scenes has been successfully solved and a lot of promising approaches and graphic systems have been developed and introduced into practice, rendering of dynamic scenes larger than main memory remains an acute urgent problem. The class of graphic applications in which scenes exhibit dynamic behavior and occupy extremely large amount of memory is quite wide and includes CAD/CAM/CAE systems, GIS, 4D project modelling systems, scientific visualization, computer games and animation, virtual and augmented reality. Unfortunately, the continuous growth of the complexity of displayed scenes is not compensated by the modernization of graphics hardware, although there are quality improvements. For this reason, new approaches are needed to render complex dynamic scenes with reasonable realism and fidelity, and when possible, with interactive frame rates. As opposed to the interactive or approximate mode, the presented paper focuses on the so-called conservative rendering aimed at ensuring a given level of tolerance and trustworthiness. The conservative mode is especially important for industrial applications that detect spatio-temporal collisions and interferences in scenes, as well as for applications that create photorealistic graphics and video-presentations. The performance requirements in such cases are relaxed in favor of high-quality conservative rendering. The problem of rendering complex scenes has been studied intensively for over many decades. Proposed methods include polygonal model simplifications, replacing distant geometry with imagery, levels of detail (LOD) and hierarchical levels of detail (HLOD), view-frustum and occlusion culling, spatial indexing (primarily octrees and kd-trees), from-point and from-region visibility analysis, parallel GraphiCon 2021: 31st International Conference on Computer Graphics and Vision, September 27-30, 2021, Nizhny Novgorod, Russia EMAIL: sem@ispras.ru (V. Semenov); v451ly@ispras.ru (V. Shutkin); vladislav.zolotov@ispras.ru (V. Zolotov) ORCID: 0000-0002-8766-8454 (V. Semenov); 0000-0002-9238-5029 (V. Shutkin); 0000-0002-1988-6548 (V. Zolotov) ©️ 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org) loading and rendering, speculative prefetching, geometry caching, approximate display, draw call optimizations, batching and texture packing. Out-of-core rendering systems usually employ one or another combination of these methods. iWalk [1] uses octree to build on-disk hierarchical representation for a large model. It uses multi-threaded rendering approach that combines speculative prefetching with from-point occlusion-culling algorithm based on prioritized-layered projection (PLP). Papers [2] and [3] both propose algorithms that are based on the HLOD method [4]. In [2], the scene graph representation is decoupled into skeleton and object- reps. The skeleton stores connectivity information of the scene graph, as well as auxiliary data like nodes bounding boxes and geometric error. The skeleton must be loaded into main memory, although it is typically not a problem because the skeleton has very small memory footprint. Object-reps (that are levels of details for the nodes of the scene graph) are stored on disk. The algorithm utilizes two processes that work in parallel. The first process traverses the scene graph and computes so-called front — the list of object representations that have to be rendered on this frame. The second process loads the object-reps in parallel and performs prefetching when there is time for that. Algorithm in [3] uses different approach — nodes are loaded successively starting from root with maintaining the following property: if a node has geometry loaded, so does all of its ancestors. This way the algorithm doesn’t require the full skeleton to be in main memory. Quick-VDR algorithm [5] represents the model as a clustered hierarchy of progressive meshes (CHPM). The cluster hierarchy is used for coarse-grained view-dependent refinement, whereas the progressive meshes provide fine-grained local refinement. The algorithm also performs occlusion culling. One frame of latency is added to fetch newly visible clusters from disk. Paper [6] presents an interactive out-of-core rendering frameworks that uses real-time ray tracing and a hierarchical, hybrid volumetric/lightfield-like approximation scheme for representing not- yet-loaded geometry. MMR system [7] renders near geometry using levels of details and far geometry is rendered as a textured depth mesh. The system automatically balances the screen-space errors resulting from geometric simplification with those from textured-depth-mesh distortion. Cesium 3D Tiles [8] is an open specification for sharing, visualizing, fusing, and interacting with massive heterogenous 3D geospatial content. It is based on HLOD concept and implies using JSON file for storing skeleton data and glTF files for geometric representations of the hierarchy nodes. HLOD method is a good basis for designing an out-of-core rendering algorithm. However, HLOD and most of existing out-of-core rendering algorithms are developed primarily for static scenes. They preliminary build certain data structures that are then utilized during rendering. When dynamic changes occur in the scene those data structures must be recomputed. The time required to recompute is typically greater than that for rendering the scene. That is why the authors of HLOD method admit that it is most suitable for the case of “infrequent motion” [4]. The situation with out-of-core algorithms is even worse, because in this case updating the data on disk would be required. The previously proposed method of hierarchical dynamic levels of detail (HDLOD) looks more promising for the purposes declared. It also assumes preliminary hierarchical clustering of objects, but takes into account not only their spatial proximity, but also the similarity of dynamic behavior [9]. As a result, each cluster encapsulates both geometry model, material and dynamic behavior and, therefore, can be reused unchanged over entire time intervals, rather than in particular time moments. HDLOD trees can be systematically traversed and processed to render the original scene with a certain threshold of the geometric error. It is noteworthy that the specified threshold can be provided for each viewing position at any time. Previous studies have proven the high efficiency of the method when HDLOD trees are allocated in main memory. Finally, the method is applicable for wide classes of large dynamic scenes with a deterministic nature of temporal events such as the appearance, removal and movement of objects. This paper presents an out-of-core generalization of the HDLOD method, suitable for conservative rendering of very large dynamic scenes, whose HDLOD representations are located in external memory and can be loaded into main memory only in parts. The underlying HDLOD method is shortly described in section 2 with emphasis on sorts and representations of spatio-temporal clusters. The out-of-core generalizations of the method, in particular, splitting the HDLOD cluster tree into a skeleton and a complement are specified in section 3. It also provides some details on memory management and caching techniques necessary for the method implementations. The carried out computational experiments are specified in section 4 and the results are summarized in conclusions. 2. HDLOD method Suppose that a scene 𝑆 = {𝑠(𝑔𝑠 , 𝑝𝑠 (𝑡), 𝑣𝑠 (𝑡))} is represented as a set of objects 𝑠 that are defined in a 3D Euclidean space 𝐸 3 on a simulation time interval [0, 𝑇]. Each object 𝑠 ∈ 𝑆 has an invariable geometric representation 𝑔𝑠 . The presence (or availability) of the object 𝑠 in the scene is determined by its presence function 𝑣𝑠 (𝑡): [0, 𝑇] → {0,1} in such a way that the function 𝑣𝑠 (𝑡) is one if the object appears in the scene at the instant 𝑡 and it is zero if it does not. The position of the object 𝑠 is determined by its position function 𝑝𝑠 (𝑡): [0, 𝑇] → 𝑃, where 𝑃 is a set of 4  4 matrices or any other structures used for determining the location and orientation of the scene objects. In terms of dynamic behavior, all objects in the scene can be divided into three non-overlapping classes. The class of static objects includes the objects whose presence and position do not change throughout the entire simulation period, i.e., ∀ 𝑡 ∈ [0, 𝑇] 𝑣𝑠 (𝑡) = 1, 𝑝𝑠 (𝑡) = 𝑝𝑠 , where 𝑝𝑠 is the unchanged position of the object 𝑠. In further consideration, for clarity, such objects will be denoted as 𝑠(𝑔𝑠 , 𝑝𝑠 , 𝑣𝑠 ). The class of pseudo-dynamic objects consists of the objects that appear and disappear without changing their positions: ∀ 𝑡 ∈ [0, 𝑇] 𝑝𝑠 (𝑡) = 𝑝𝑠 , but ∃ 𝑡1 , 𝑡2 ∈ [0, 𝑇] such that 𝑣𝑠 (𝑡1 ) ≠ 𝑣𝑠 (𝑡2 ). Further, such objects will be denoted as 𝑠(𝑔𝑠 , 𝑝𝑠 , 𝑣𝑠 (𝑡)). Finally, the class of dynamic objects implies that both the presence function and position function of the object vary with time: ∃ 𝑡1 , 𝑡2 , 𝑡3 , 𝑡4 ∈ [0, 𝑇], such that 𝑣𝑠 (𝑡1 ) ≠ 𝑣𝑠 (𝑡2 ) 𝑝𝑠 (𝑡3 ) ≠ 𝑝𝑠 (𝑡4 ). In further, to emphasize the dynamic character of such objects the notation 𝑠(𝑔𝑠 , 𝑝𝑠 (𝑡), 𝑣𝑠 (𝑡)) will be used. We refer to the hierarchical dynamic levels of detail (HDLOD) as a tree of clusters 𝐶 = {𝑐(𝑔𝑐 , 𝑝𝑐 (𝑡), 𝑣𝑐 (𝑡), 𝑏𝑐 (𝑡), 𝛿𝑐 (𝑡))} ∪ {≺} represented by a set of clusters 𝑐 and a set of agglomeration relations ≺ . Each cluster 𝑐 is specified by its geometric representation 𝑔𝑐 and the following behavior functions defined over the entire simulation interval [0, 𝑇]. These are presence function 𝑣𝑐 : [0, 𝑇] → {0,1}, position function 𝑝𝑐 : [0, 𝑇] → 𝑃, bounding volume function 𝑏𝑐 : [0, 𝑇] → 𝐵, where 𝐵 is a set of volumes in 𝐸 3 , and geometric error function 𝛿𝑐 : [0, 𝑇] → [0, 𝛥𝑐 ], where 𝛥𝑐 is the maximum local deviation of the cluster geometry from its originated objects throughout the simulation interval [0, 𝑇]. Geometric error function 𝛿𝑐 (𝑡) quantifies the spatial error of the simplified geometry compared with the original objects taking into account its presence status at a particular time point 𝑡 ∈ [0, 𝑇]. The function 𝑏𝑐 (𝑡) enables to determine the bounding volume of the cluster c at every moment 𝑡 ∈ [0, 𝑇]. The bounding volume can be axis aligned bounding box (AABB), oriented bounding box (OBB) or any other volumetric structure, including bounding volume hierarchies, potentially mobile and deformable in time. The only significant requirement for bounding volumes, is computationally inexpensive checks of the belonging of volumes to given primitive shapes, e.g. visibility cones, and making verdicts on the need to display the clusters contained within them. Each agglomeration relation 𝑐 ′ ≺ 𝑐 reflects the order in which the parent cluster 𝑐 ∈ 𝐶 was formed from simplified geometric and behavioral representations of the child cluster 𝑐 ′ ∈ 𝐶. Clusters at the top of the tree contain representations of the renderable content with lowest levels of detail, while bottom clusters provide content with higher levels of detail. Geometry simplification methods are well studied and have been widely used in the HLOD implementations. Methods of simplifying behavior have not yet been studied, therefore, more clarifications are needed. Since the objects agglomerated into clusters can exhibit different behaviors, the concept of presence takes on a broader interpretation related to the visual fidelity, which can be achieved by displaying some of the clusters, or vice versa, by ignoring them. The concept provides for cases when a parent cluster is considered absent at some point in time, while some of its children are present at the same moment, and, conversely, the parent cluster can be considered present, while some children are absent. Therefore, the presence status should always be understood in the context of the tolerance that can be provided when the cluster is displayed or, conversely, ignored. Similar to the introduced classification of scene objects, HDLOD clusters can also be divided into static, pseudo-dynamic and dynamic. Static clusters are those that agglomerate only static objects and clusters. Their positions, bounding volumes, presence statuses and tolerances do not depend on the time parameter and therefore can be denoted as 𝑐(𝑔𝑐 , 𝑝𝑐 , 𝑣𝑐 , 𝑏𝑐 , 𝛿𝑐 ). Pseudo-dynamic clusters are clusters that agglomerate pseudo-dynamic and possibly static objects. Their positions and bounding volumes are fixed, while presence statuses and tolerances may depend on the time parameter 𝑐(𝑔𝑐 , 𝑝𝑐 , 𝑣𝑐 (𝑡), 𝑏𝑐 , 𝛿𝑐 (𝑡)). Dynamic clusters are those that agglomerate dynamic and possibly pseudo- dynamic and static objects and therefore have all attributes as functions 𝑐(𝑔𝑐 , 𝑝𝑐 (𝑡), 𝑣𝑐 (𝑡), 𝑏𝑐 (𝑡), 𝛿𝑐 (𝑡)). Regardless of the specific HDLOD clustering method used, the following assumptions are made and must be guaranteed: 1. The bounding volume of the parent cluster 𝑏𝑐 (𝑡) covers the bounding volume of any child cluster 𝑏𝑐 ′ (𝑡) so that 𝑏𝑐 ′ (𝑡) ⊆ 𝑏𝑐 (𝑡) at every time point 𝑡 ∈ [0, 𝑇]; 2. The geometric error of the parent cluster 𝛿𝑐 (𝑡) exceeds the error of any child cluster 𝛿𝑐 ′ (𝑡) so that 𝛿𝑐 ′ (𝑡) ≤ 𝛿𝑐 (𝑡) at every time point 𝑡 ∈ [0, 𝑇]; 3. The geometric error for leaf clusters is taken to be zero 𝛿𝑐 (𝑡) = 0 (∀𝑐 ∈ 𝐶 𝑠𝑢𝑐ℎ 𝑡ℎ𝑎𝑡 ∄𝑐 ′ ∈ 𝐶 𝑐 ′ ≺ 𝑐 𝛿𝑐 (𝑡) = 0), thereby assuming that their geometric and behavioral representations are accurate and can satisfy any tolerance requirements. In most cases, this can be achieved by using the original objects of the scene as leaf clusters. More detailed information on clustering method can be found in [9]. Under these assumptions, the scene can be effectively rendered with a certain screen-space error (sse) explicitly expressed in terms of pixels of the screen. When rendering the scene, the cluster tree is traversed. At each cluster, the functions are evaluated to determine if the cluster can be immediately rendered or further traversal of the subtree is required to find and employ proper clusters. First of all, it is checked whether the bounding volume of the cluster 𝑏𝑐 (𝑡) falls within the view frustum. If so, the next check is performed. Otherwise, traversal is stopped and the cluster and its subtree are excluded from further analysis. Then it is checked whether the screen-space error caused by the cluster tolerance exceeds the given threshold 𝑠𝑠𝑒. For this purpose, the geometric error of the cluster at the current modelling time is transformed to the screen-space error according to a standard perspective projection and is compared with the threshold: ⍺ ( ) 𝑠𝑠𝑒 𝛿𝑐 (𝑡) ≤ 2 𝑑 𝑡𝑔 2 , 𝑟 where 𝑟 is the resolution (or height) of the rendering screen in pixels, 𝑑 is the distance from the camera point to the point on the bounding volume of the cluster which is closest to the camera, ⍺ is the opening angle of the view frustum. If the condition is satisfied, the cluster is selected for rendering. If not, the traversal of the cluster subtree continues and its children also undergo the same checks. void display_tree (Hdlod tree, Time time, Frustum frustum, Real sse) { visit (tree.root, time, frustum, sse, true); } void visit (HdlodCluster cluster, Time time, Frustum frustum, Real sse, Bool frustum_culling) { OverlapType overlap = INSIDE; if (frustum_culling) overlap = check_overlap (cluster.bounding_volume (time), frustum); if (overlap EQUAL OUTSIDE) return; if (cluster.children.size() EQUAL 0) { display_cluster (cluster, time); return; } if (geometric_error_to_sse (cluster.error (time), frustum) LESS_THAN sse) { display_ cluster (cluster, time); return; } for each (HdlodCluster child in cluster.children) visit (child, time, frustum, sse, overlap NOT_EQUAL INSIDE); } void display_ cluster (HdlodCluster cluster, Time time) { if (NOT cluster.is_present (time)) return; render (cluster.geometry, cluster.matrix (time)); } Figure 1: Pseudocode of the HDLOD rendering algorithm If the cluster is selected for rendering, it’s presence status 𝑣𝑐 (𝑡) is determined. If the cluster is present, its content is rendered. Otherwise, traversal is stopped and nothing is displayed. The traversal is guaranteed to terminate when it reaches leaf clusters with exact content. Thus, checks against bounding volume, error and presence status are done when traversing the cluster tree to decide on the need of rendering the cluster. The pseudocode of the HDLOD rendering procedure display_tree with formal parameters corresponding to the HDLOD tree representation, current simulation time, current view frustum, and required screen-space error is presented in Figure 1. The procedure is based on the visiting function visit with similar parameters. The visiting function is recursively called to traverse clusters and render them using the display_cluster function. At the start of each frame the visiting function is invoked for the root of the tree. The procedure render is intended to render the geometric model of the cluster at a given location with a proper orientation as determined by the position matrix. The function check_overlap determines whether the specified bounds are outside, inside, or intersecting the view frustum and returns one of the enumerated values OUTSIDE, INSIDE, INTERSECTS. If bounding volume of the cluster is completely inside the view frustum, then all of its descendants are inside the view frustum and there is no need to perform further frustum culling checks. The function geometric_error_to_sse transforms the geometric error of the cluster in accordance with the formula above and returns screen-space error. 3. Out-of-core HDLOD algorithm The HDLOD method allows for conservative out-of-core rendering based on splitting the cluster tree into a skeleton, which is mostly in main memory, and cluster contents, which are preferably in external memory and loaded into main memory on the fly when needed. From formal point of view, the cluster tree can be thought as the structure 𝐶 = 𝐶 ′ ∪ 𝐶 ′′ , where 𝐶 ′ = {𝑐 ′ (𝑏𝑐 (𝑡), 𝛿𝑐 (𝑡), 𝑣𝑐 (𝑡))} ∪ {≺}, 𝐶 ′′ = {𝑐 ′′ (𝑔𝑐 , 𝑝𝑐 (𝑡))}. In what follows, the first structure 𝐶 ′ is called the skeleton, which contains both the cluster headers 𝑐 ′ and agglomeration relations ≺ of the original tree. The second structure 𝐶 ′′ is referred to as the complement that contain the cluster contents 𝑐 ′′ . It is assumed that each cluster header is linked with the corresponding cluster content, regardless of where it is allocated either in external memory or already loaded into main memory. It does not matter whether the links are implemented as memory pointers, identifiers, URL references, or any other associative structures and mechanisms. The assumptions above do not prevent the decomposition of the skeleton into parts, which will be stored, loaded and processed separately. It is also allowed that simple clusters can be an integral part of the skeleton, while more complex ones are the parts belonging to the complement. The only requirement is that the skeleton consumes a small amount of main memory compared to the complement and entire representation. It is noteworthy that the idea of extracting the skeleton is not original and has been successfully applied for out-of-core rendering of static scenes within the HLOD methods [2]. However, the presented study proves that this idea can be feasible and fruitful for rendering deterministic dynamic scenes as well. Moreover, we argue that the presented in Figure 1 pseudocode for displaying HDLOD trees in main memory is suitable for conservative rendering of these structures stored in external memory. The only reasonable assumption is that the skeleton has been preliminary allocated in the main memory. Unlike interactive rendering, which focuses on viewing and manipulating scenes in real time, usually at the expense of image trustworthiness and accuracy, conservative rendering provides the required fidelity of the generated images, but with lower frame rates and less stringent performance requirements. The main reason is the inevitable costs of additional operations for loading and unloading the contents of the clusters. However, conservative rendering is important for industrial applications that face the challenges of accurate scene analysis, collision and interference detection, and proper visualization of the detected issues. It is also necessary for applications that generate photorealistic graphics and video-presentations. Conservative rendering must quickly analyze the whole HDLOD tree and select clusters that will provide the desired level of fidelity, that is why the skeleton is required to be in main memory. The out-of-core algorithm employs the pseudocode shown in Figure 1, except for the display_cluster function. Instead of immediately rendering the cluster the out-of-core algorithm must first ensure that the cluster contents (geometry and position) are loaded into main memory. To efficiently implement this without stalling the main thread, two additional threads are used. The threads communicate via queues (Figure 2). The main thread traverses HDLOD tree and makes decisions on what clusters have to be rendered, similarly to in-core HDLOD rendering algorithm. The checks against bounding volumes, tolerances and presence statuses can be done immediately since they access attributes of the skeleton which are always in main memory under the assumption above. If the cluster is selected for rendering, it is added to the load queue. Load thread monitors load queue and performs loading of requested cluster contents into main memory. If the cluster is loaded, load thread sends the command for rendering it via render queue. Rendering thread monitors render queue and performs all necessary API calls to set state, load cluster mesh into video memory and run its rendering on GPU. To minimize number of loading operations, load thread maintains a cache in main memory. It keeps track of the loaded clusters and stores their contents in main memory. A memory limit is given to the cache manager to ensure that no more memory than specified is used. The only restriction to this limit is that the skeleton and the rendering buffers have to fit into memory too. When the limit is reached, some clusters have to be removed from cache in order to allow for loading of the new clusters. To decide on which clusters to remove, the least recently used (LRU) replacement policy is used. The cache manager maintains the list of loaded clusters sorted by the time they were last accessed. Each time a cluster is selected for rendering it is placed in the head of the list. This way, the tail of the list contains the least recently used cluster that is the first candidate for removal. The LRU policy is commonly used in out-of-core rendering algorithms to determine what data to release [2][3]. However, implementers must take care to not release clusters that are in use by the rendering thread. For this purpose, a special Boolean flag, indicating that the cluster is in use, can be employed. Another solution is to use smart pointers to ensure that the data is not freed from memory prematurely. Often out-of-core algorithms employ some sort of prefetching technique that is aimed at speculative preliminary loading of those clusters that are likely to be needed in the next frames. However, the experiments done have found situations where prefetching can lead to significant degradation of the total performance, thereby offsetting minor acceleration effects. For example, such situations happen if the memory cache is completely consumed for rendering clusters on the current frame and there is no extra space to preload clusters that presumably will be needed in the next frames. Forced prefetching of new clusters will inevitably require unloading of previously used clusters, including those that may be required again. Accuracy requirements can also impact the reasonable balance of memory cache sizes required for the current frame and next frames. Finally, effectiveness of prefetching highly depends on the quality of prediction of the camera movement, changes of modelling time and other user actions. Some specific scenarios (for example, animation playthrough) allow for accurate predictions, other scenarios (for example, navigation after a long stay) are poorly predicted and will most likely lead to additional unproductive costs for loading unnecessary clusters. In order not to lose in performance, the prefetching technique must take into account all the mentioned factors, which is a serious problem. Since the technique cannot be straightforwardly used and widely recommended, we did not use it in the experiments and focused on the fundamental possibility of using the HDLOD method for conservative out-of-core rendering of large dynamic scenes which is the main topics of the presented research. Figure 2: Overview of the out-of-core HDLOD rendering algorithm 4. Computational experiments To validate the proposed method a series of computational experiments has been carried out. As a benchmark, synthetic scene has been generated using method described in [10]. The scene consists of 9 buildings. Each building is made of 30 by 30 bricks and has 30 floors. In total, the scene contains 1078920 triangles and 89910 meshes. HDLOD has been generated for the scene and stored in files on the data storage device. The computational experiments have been organized in such a way that the camera has been moving through the scene over 500 frames and the frame times have been measured. At the start of the trajectory, the camera is far away from the scene. For the first 100 frames it moves closer the scene and ends up at such a distance that the scene is still inside view frustum. For the next 100 frames camera rotates 90 degrees to the right around the center of the scene. Then the camera moves close to the central building through one of the buildings, some buildings are culled by the view frustum. After that the camera again rotates around the central building with some buildings appearing and some disappearing from the view frustum. In the last 100 frames the camera rapidly moves far away from the scene through one of the buildings. Experiments have been carried out for static and dynamic scenarios. Dynamic behavior has been generated for the objects of the static scene emulating simultaneous construction of the buildings. Screenshots taken during the experiment on dynamic scene are presented in Figure 3, clusters are visualized with different colors. Clusters are selected for rendering based on screen-space error, which is calculated from their geometric error. To ensure high fidelity of conservative rendering only clusters whose screen-space error is no more than the size of the pixel are displayed. This way, the difference between original model rendering and rendering with HDLODs is barely visible. The rendering performance has been measured on a typical hardware configuration: Intel Core i7-4790 CPU (3.6 GHz), 16 GB of RAM, GeForce GTX 750 Ti (2 GB), Seagate ST2000DM001 HDD (2 TB, 7200 RPM). Figure 3: Screenshots taken during experiment on dynamic scene. The buildings grow while the camera is flying through the scene. Clusters are visualized with different colors It is worth noting that data storage device performance affects cluster contents loading time. To validate the proposed method of out-of-core rendering an HDD storage has been selected as a worst- case scenario. SSDs tend to have much better performance. There is also an important factor that can affect the rendering performance. To render geometry on a GPU it must first be buffered in video memory. Most optimal way is to use some sort of caching for geometry in video memory, in order to minimize loading operations (i.e., video buffer with paging and LRU policy for unloading). Since this paper is focused on loading cluster contents into main memory, to eliminate the video memory buffering factor from the experiments, loading cluster geometries into video memory was performed on the fly before rendering them. This increases frame times but also helps emphasize simplification culling effectiveness. The resulting frame time graphs for static and dynamic scenarios are presented in Figure 4. “In-core no HDLOD” graph illustrates the performance when rendering only original scene objects (leaf HDLOD clusters), which are residing in main memory. In this case the frame times are stable except for the part when frustum culling comes in play. The “in-core HDLOD” case is HDLOD rendering with all cluster contents preliminary loaded into main memory. It produces much lower frame times than “no HDLOD” due to simplification culling. The “Out-of-core HDLOD infinite cache” test starts with an empty cache; the cluster contents are loaded on the go as they get selected for rendering. The results are comparable with in-core test, except for the frame time spikes which are caused by loading large numbers of new clusters. This is especially noticeable when the camera moves through buildings (frames 250-280 and 400-410), since moving through the buildings requires loading large numbers of leaf clusters. In the dynamic scenario, not only the camera has been moving through the scene, but also modelling time has been changing. The test starts with an empty scene. “In-core no HDLOD” graph shows gradual increase in frame time (except for the part with frustum culling in frames 200-400) which corresponds to growth of the buildings and increase of triangle and mesh count with time. It also shows steps in appearing of new building elements. “Out-of-core HDLOD infinite cache” shows large periodic spikes. They are connected with dynamic behavior, because new clusters are selected for rendering due to changing geometric error functions of the clusters. Frame time per frame (lower is better) for static scene rendering 400 350 300 Frame time (ms) 250 200 150 100 50 0 1 9 137 185 321 369 17 25 33 41 49 57 65 73 81 89 97 105 113 121 129 145 153 161 169 177 193 201 209 217 225 233 241 249 257 265 273 281 289 297 305 313 329 337 345 353 361 377 385 393 401 409 417 425 433 441 449 457 465 473 481 489 497 Frame number In-core no HDLOD In-core HDLOD Out-of-core HDLOD infinite cache Frame time per frame (lower is better) for dynamic scene rendering 400 350 300 Frame time (ms) 250 200 150 100 50 0 1 9 17 25 33 41 49 57 65 73 81 89 97 137 185 321 369 105 113 121 129 145 153 161 169 177 193 201 209 217 225 233 241 249 257 265 273 281 289 297 305 313 329 337 345 353 361 377 385 393 401 409 417 425 433 441 449 457 465 473 481 489 497 Frame Number In-core no HDLOD In-core HDLOD Out-of-core HDLOD infinite cache Figure 4: Performance comparison of in-core rendering without HDLOD, in-core HDLOD rendering and out-of-core HDLOD rendering with infinite cache for static and dynamic scenarios. In-core performance without HDLOD is the worst. Out-of-core performance with infinite cache size matches in-core performance with the exception of frame time spikes that occur when new clusters are loaded In “infinite cache” test the cluster contents never get unloaded. The series of experiments with different cache sizes gives an idea of how cache size affects performance (Figure 5). For simplicity, cache size represents the number of clusters that can fit into cache. Overall, the results are comparable with infinite cache, the significant difference is observed only on very low cache sizes, when the size is too small to fit all clusters required for the frame. In such case cluster contents are unloaded after they are rendered to allow for loading next cluster contents, which causes permanent loading/unloading. For example, in the static scenario with the size of the cache for 1000 clusters it happens during frames 250- 280 (moving through one of the buildings), frames 290-400 (closely rotating around the central building) and frames around 410 (moving through one of the buildings as camera rapidly flies away from the scene). With the cache size of 100 number of rendered clusters per frame exceeds the cache size most of the time, which results in significantly degraded performance, except for the beginning and the end of the experiment, where the camera is far away from the scene which causes small number of simplified clusters to be rendered. In the dynamic scenario there are periods when no dynamic changes occur. They correspond to the lower frame time periods with cache sizes 100 and 1000. This happens because at the moments of modelling time when no changes occur more simplified clusters are selected for rendering (their geometric error functions take lower values for these moments of modelling time), thus reducing number of clusters required per frame. Results with cache size larger than 5000 are practically indistinguishable from “infinite cache” and are not included in the graphs. This means that the LRU replacement policy allows to efficiently remove old clusters that are no longer needed. With low cache sizes the out-of-core HDLOD performance is worse than in-core no HDLOD performance, but these are extreme scenarios. Typically, the cache size should be selected as large as possible, to occupy most of main memory. Moreover, “no HDLOD” test has contents of 89910 clusters residing in main memory, which is much more than 100 or 1000 clusters. Frame time per frame (lower is better) for static scene rendering 800 700 600 Frame time (ms) 500 400 300 200 100 0 1 9 137 185 321 369 17 25 33 41 49 57 65 73 81 89 97 105 113 121 129 145 153 161 169 177 193 201 209 217 225 233 241 249 257 265 273 281 289 297 305 313 329 337 345 353 361 377 385 393 401 409 417 425 433 441 449 457 465 473 481 489 497 Frame number Out-of-core HDLOD infinite cache Cache size 2000 Cache size 1000 Cache size 100 Frame time per frame (lower is better) for dynamic scene rendering 800 700 600 Frame time (ms) 500 400 300 200 100 0 1 9 137 185 321 369 17 25 33 41 49 57 65 73 81 89 97 105 113 121 129 145 153 161 169 177 193 201 209 217 225 233 241 249 257 265 273 281 289 297 305 313 329 337 345 353 361 377 385 393 401 409 417 425 433 441 449 457 465 473 481 489 497 Frame Number Out-of-core HDLOD infinite cache Cache size 2000 Cache size 1000 Cache size 100 Figure 5: HDLOD out-of-core performance for low cache sizes compared to infinite cache for static and dynamic scenarios. Larger frame times occur when size of the cache is too small to fit all the clusters required to conservatively render the frame, which causes constant loading/unloading during frame rendering. Other than that, frame times are comparable with infinite cache which is due to effectiveness of LRU replacement policy 5. Conclusions Thus, the carried-out studies have proven the high efficiency of the HDLOD method when applied for rendering of large dynamic scenes with deterministic events. The method is suitable for scenes allocated both in main memory and in external memory. The out-of-core version of the method is based on splitting the preliminary formed HDLOD cluster tree into a skeleton, which is mostly in main memory, and cluster contents, which are preferably in external memory and loaded into main memory when needed. This is not a burdensome limitation, since memory overhead is mainly caused by geometric and behavioral models, rather than by the tree structure and simple cluster attributes such as bounding volumes, time intervals and geometric errors. The research has proven the possibility for conservative out-of-core rendering of large dynamic scenes with a given level of tolerance and trustworthiness. Quite an expected result is that the larger the size of the available memory cache, the more successfully the rendering problem can be solved using the HDLOD method. The research is planned to be continued to analyze the possibilities of using the method for interactive out-of-core rendering, which is expected to alleviate the memory requirements. 6. References [1] W. Corrêa, J. T. Klosowski, C. T. Silva, iWalk: Interactive Out-Of-Core Rendering of Large Models, 2002. URL: https://www.researchgate.net/publication/2862878_iWalk_Interactive_Out- of-Core_Rendering_of_Large_Models. [2] G. Varadhan, D. Manocha, Out-of-core rendering of massive geometric environments, in: IEEE Visualization, VIS 2002, 2002, pp. 69–76. doi:10.1109/VISUAL.2002.1183759. [3] P. J. Cozzi, Visibility Driven Out-of-Core HLOD Rendering, Master’s thesis, University of Pennsylvania, Philadelphia, USA, 2008. [4] C. Erikson, D. Manocha, W. V. Baxter, HLODs for faster display of large static and dynamic environments, in: Proceedings of the 2001 symposium on Interactive 3D graphics, I3D '01, Association for Computing Machinery, New York, NY, USA, 2001, pp. 111–120. doi:10.1145/364338.364376. [5] S.-E. Yoon, B. Salomon, R. Gayle, D. Manocha, Quick-VDR: Interactive View-Dependent Rendering of Massive Models, in: Proceedings of the conference on Visualization '04, VIS '04, IEEE Computer Society, USA, 2004, pp. 131–138. doi:10.1109/VISUAL.2004.86. [6] I. Wald, A. Dietrich, P. Slusallek, An interactive out-of-core rendering framework for visualizing massively complex models, in: ACM SIGGRAPH 2005 Courses, SIGGRAPH '05, Association for Computing Machinery, New York, NY, USA, 2005, pp. 17–es. doi:10.1145/1198555.1198756. [7] D. Aliaga, J. Cohen, A. Wilson, E. Baker, H. Zhang, C. Erikson, K. Hoff, T. Hudson, W. Stuerzlinger, R. Bastos, M. Whitton, F. Brooks, D. Manocha, MMR: an interactive massive model rendering system using geometric and image-based acceleration, in: Proceedings of the 1999 symposium on Interactive 3D graphics, I3D '99, Association for Computing Machinery, New York, NY, USA, 1999, pp. 199–206. doi:10.1145/300523.300554. [8] Cesium 3D Tiles, Specification for streaming massive heterogeneous 3D geospatial datasets. URL: https://github.com/CesiumGS/3d-tiles. [9] V. Semenov, V. Shutkin, V. Zolotov, S. Morozov, V. Gonakhchyan, Visualization of Large Scenes with Deterministic Dynamics, Programming and Computer Software, vol. 46(3) (2020), pp. 223– 232. doi:10.1134/S036176882003007X. [10] V. Gonakhchyan, O. Tarlapan, V. Semenov, Generating Dynamic 3D Scenes for Rendering Benchmarks, in: Proceedings of the International Conference on Computer Graphics, Visualization, Computer Vision and Image Processing 2019, CGVCVIP 2019, 2019, pp. 485–488. doi:10.33965/cgv2019_201906P076.