<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep learning for predictive rendering of 3D printed objects</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Akmaral Amanturdieva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davit Gigilashvili</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiří Filip</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Norwegian University of Science and Technology</institution>
          ,
          <addr-line>Gjøvik</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Czech Academy of Sciences, Institute of Information Theory and Automation</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>This study explores the development of a deep learning-based predictive rendering system for 3D printed objects, addressing the challenge of accurately predicting surface appearance from input parameters like surface normals, light angles, view positions, and tangent vectors. By utilizing the Deep Shading architecture, we present and explore a method that synthesizes rendered appearances. The dataset, sourced from controlled multi-view and illumination imaging conditions, serves as the foundation for training and evaluating the model. We tested various loss functions and training-data configurations, demonstrating promising performance in 3D printed appearance reproduction. Our findings contribute to the broader effort of improving predictive rendering systems for 3D printed objects, with potential applications in manufacturing, design, and material science.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Learning</kwd>
        <kwd>Computer Graphics</kwd>
        <kwd>Rendering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The introduction of 3D printing technology has revolutionized manufacturing and prototyping
industries, enabling rapid and cost-effective production of complex geometries with diverse materials [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Despite these advancements, accurately predicting and visualizing the final appearance of printed
objects remains a significant challenge [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Material properties, printing processes, and surface finishing
techniques substantially influence the visual characteristics of the final product, creating a gap between
digital design and physical reality. As a result, designers often resort to costly, time-consuming cycles
of trial-and-error—printing, inspecting, and re-printing to achieve the desired look. Robust, physically
based predictive rendering tools could close this loop, reducing wasted material and energy and making
the entire workflow more sustainable and environmentally friendly.
      </p>
      <p>
        The 3D printing workflow typically involves three primary stages: designing a digital model, preparing
it for printing, and the actual fabrication process [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. At the core of this workflow lies computer graphics
technology, which facilitates the visualization of the object before physical production.
      </p>
      <p>
        Classical computer graphics methodologies originate from physical modeling principles, focusing on
geometry, surface properties, and camera settings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The rendering process, transforming a scene
definition into a simulated camera image, typically follows one of two approaches: rasterization or ray
tracing. Rasterization maps geometry to the image domain in a feed-forward process, while ray tracing
simulates light paths by casting rays from image pixels into the virtual scene, recursively modeling
reflections and refractions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The quality of rendered images depends heavily on the accuracy of these
physical models and the sampling techniques employed to address the discrete nature of computer
simulations.
      </p>
      <p>While these classical rendering approaches have proven effective in many applications, they present
significant limitations for 3D printing visualization. First, physically accurate models require extensive
computational resources, making real-time visualization challenging. Second, traditional approaches
struggle to capture the complex material properties and printing artifacts that emerge during the
fabrication process. Finally, creating realistic models demands considerable time and costly manual effort
from skilled artists, rendering traditional computer graphics approaches time-consuming, expensive,
and error-prone when applied to 3D printing visualization.</p>
      <p>
        Recent advances in deep learning offer promising solutions to these challenges by learning to generate
realistic visualizations based on patterns identified in training datasets. Neural rendering, a subset of
these techniques, enables a statistical perspective on image generation [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], potentially bridging the gap
between theoretical designs and the actual visual output of 3D printed objects. These approaches have
demonstrated remarkable success in various applications, including view synthesis, material editing [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
and relighting [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] capabilities directly applicable to predicting the appearance of 3D printed objects.
      </p>
      <p>This study aims to leverage deep learning techniques to develop a predictive rendering system
specifically tailored for 3D printed objects. We focus on training a model capable of synthesizing the
appearance of 3D printed objects based on a set of intrinsic and extrinsic parameters, including surface
normals, light positions, viewing directions and tangent vectors. By incorporating these parameters, our
approach seeks to provide manufacturers, designers, and researchers with accurate visual predictions
before physical production, potentially reducing material waste and improving design iterations.</p>
      <p>The paper is organized as follows: Section 2 reviews existing approaches to predictive rendering,
focusing on image-based and intrinsic parameter-based methods. Section 3 describes our dataset and
its key attributes. Section 4 outlines the methodology, including data preprocessing, input encoding,
and architectural design of our models. Section 5 presents experimental results comparing different
model configurations under varied conditions. Finally, Section 6 discusses implications, limitations, and
future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>Approaches for predicting rendered 3D object appearance can be broadly categorized into two types:
image-based rendering methods and intrinsic image parameter-based techniques. Each category offers
distinct advantages and limitations for visualizing 3D printed objects.</p>
      <sec id="sec-2-1">
        <title>2.1. Image-Based Rendering</title>
        <p>
          Image-based rendering (IBR) methods generate new images by manipulating existing image sets,
typically through processes like warping and compositing to combine visual elements into a cohesive
result [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The quality of generated images depends on factors such as geometry precision, quantity
and spatial distribution of input views, and material properties, as certain materials exhibit significant
appearance variations from different viewpoints [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>
          Recent advances in deep learning have substantially improved IBR techniques. Hedman et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
introduced Deep Blending, a novel approach that employs convolutional neural networks (CNNs) to
predict optimal blending weights for each pixel, enabling seamless integration of input images without
relying on handcrafted heuristics. IBRNet, presented by Wang et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], offers a more generalized
solution for multiview image-based rendering by synthesizing novel views of complex scenes through
interpolation of sparse nearby views.
        </p>
        <p>
          Building on these advances, Sun et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] developed SIBRNet (Sparse Image-Based Rendering
Network), which addresses the challenge of image-based rendering in sparse scene geometry through a
two-stage approach combining geometry recovery and light blending.
        </p>
        <p>These image-based approaches require multiple reference images of the actual printed object, limiting
their predictive capabilities during the design phase.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Intrinsic Image Parameters-Based Approaches</title>
        <p>
          Intrinsic image parameters-based approaches focus on decomposing and manipulating fundamental
visual attributes that contribute to object appearance. These methods operate by explicitly modeling
and controlling properties such as surface geometry, material characteristics, illumination conditions,
and viewpoint parameters. Unlike image-based approaches, these methods attempt to understand and
model the underlying physical properties that generate the observed appearance. In this context, neural
rendering establishes a mapping I = M(c), where c ∈ ℝ^n_in represents the control parameters and
I ∈ ℝ^(H×W×3) denotes the corresponding output image with height H and width W, a process
that can be understood as complex sparse data interpolation [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          Santo et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] approached the inverse of this problem by introducing a Deep Photometric Stereo
Network (DPSN) that maps reflectance observations to surface normals and reflectance properties.
Unlike traditional methods, DPSN employs a data-driven approach using the MERL BRDF dataset [12],
which includes measured BRDFs of diverse materials. The network predicts surface normals and
reflectance coefficients per pixel from images under predefined light directions, enabling scene relighting
under arbitrary lighting conditions.
        </p>
        <p>
          Chen et al. [13] proposed PS-FCN, a fully convolutional neural network that predicts an object’s
normal map from an arbitrary number of input images taken under diverse lighting directions. Unlike
DPSN [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which requires a fixed set of predefined light directions, PS-FCN introduces an order-agnostic
mechanism through max-pooling operations that enable robust feature aggregation regardless of the
input images’ order or number. Nalbach et al. [14] introduced a paradigm shift by utilizing deep learning
to directly map deferred shading buffers to RGB outputs.
        </p>
        <p>A recent breakthrough by Zeltner et al. [15] presents Real-time Neural Appearance Models (NAMs),
which incorporates spatial material properties, per-pixel surface normals, BRDF parameters, lighting
properties, and viewpoint information to enable fast, real-time appearance prediction.</p>
        <p>These intrinsic parameter-based approaches show particular promise for predicting the appearance
of 3D printed objects. However, they typically require extensive training data and struggle to capture
the full range of printing artifacts and material behaviors that emerge during the physical fabrication
process.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>The dataset for this study was provided by the Institute of Information Theory and Automation [16]. It
represents a comprehensive dataset designed to capture the complex interactions between viewing
angles, illumination conditions, and surface properties. The dataset comprises multiple components
including multi-view images, geometric information, and angular data, enabling detailed analysis of
appearance characteristics under varying conditions.</p>
      <sec id="sec-3-1">
        <title>3.1. Image Acquisition</title>
        <p>The primary dataset consists of multi-view, multi-illumination images captured under systematically
controlled conditions. Examples of the images from the dataset can be found in Figure 1. The acquisition
protocol followed a structured sampling approach:
• Viewing Elevations: 6 distinct positions
• Illumination Positions: 81 distinct positions
• Total Images: 6 × 81 = 486 combinations
• Format: 8-bit SDR RGB PNG
Each image follows a standardized naming convention:</p>
        <p>image_tlXXX_plXXX_tvXXX_pvXXX.png
where:
• tl/pl: Light source elevation/azimuth (in degrees)
• tv/pv: Viewing position elevation/azimuth (in degrees)
(Figure 1: example images from the dataset: (a) object captured at 75° viewing position elevation and
120° light source azimuth; (b) at 15° light source elevation, 18° light source azimuth, and 15° viewing
position elevation; (c) at 60° light source elevation, 180° light source azimuth, and 75° viewing position
elevation; (d) at 75° light source elevation.)</p>
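        <p>For illustration, the following minimal Python sketch (the regular expression and function name are ours, assuming zero-padded three-digit angle fields as in the example file names above) parses this naming convention into angle values:</p>
        <preformat>
import re

# Hypothetical helper (ours, for illustration): parse the dataset naming
# convention image_tlXXX_plXXX_tvXXX_pvXXX.png into angles in degrees.
FILENAME_RE = re.compile(r"image_tl(\d+)_pl(\d+)_tv(\d+)_pv(\d+)\.png")

def parse_image_name(name):
    """Return (tl, pl, tv, pv): light elevation/azimuth and
    view elevation/azimuth, all in degrees."""
    m = FILENAME_RE.fullmatch(name)
    if m is None:
        raise ValueError("unexpected file name: " + name)
    tl, pl, tv, pv = (int(g) for g in m.groups())
    return tl, pl, tv, pv

# Example: parse_image_name("image_tl030_pl150_tv015_pv015.png")
# returns (30, 150, 15, 15).
        </preformat>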
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Auxiliary Data</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Binary Masks</title>
          <p>For each captured image, the dataset contains a corresponding binary mask (mask_*) that marks valid
pixel regions. These masks are encoded as binary images where white pixels (255) indicate valid regions,
facilitating precise spatial analysis and region-of-interest processing.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Normal and Tangent Maps</title>
          <p>Surface geometry information is encoded through ideal-fitted normal maps (normal_map_*) for
each viewpoint. In addition to surface normals, the dataset provides a per-pixel tangent map
(tangent_map_*) for every viewpoint. Tangent maps encode the direction of the increasing texture
coordinate on the surface and are useful for anisotropic appearance models or tangent-space shading.
The normal and tangent vectors are color-coded using the following transformation:

RGB = 255 × (n + 1) / 2,    (1)

where n represents the normalized surface normal/tangent vector at each pixel location.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Angular Information Maps</title>
          <p>The dataset includes per-pixel angular information encoded using a three-channel representation in
separate maps:
1. Light angle maps:
   • Elevation: light_elev_*
   • Azimuth: light_azim_*
2. View angle maps:
   • Elevation: view_elev_*
   • Azimuth: view_azim_*</p>
          <p>An example image (Figure 3) with all of its attributes can be observed in Figure 2:
(a) normal map visualization encoded in RGB colors; (b) light elevation map displaying the angular
distribution at 30°; (c) light azimuth map showing directional information at 150°; (d) view elevation
map indicating observation angles at 15°; (e) view azimuth map showing viewing directions at 15°;
(f) tangent map visualization encoded in RGB colors.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <sec id="sec-4-1">
        <title>4.1. Data Preparation</title>
        <p>Data preparation methodology follows multiple processing stages designed to transform raw input data
into a format optimized for deep learning model training.</p>
        <p>Normal and tangent maps to angles. For each pixel with 8-bit colour triplet C = (R, G, B) ∈
[0, 255]³, the unit normal/tangent is recovered by inverting Eq. (1) as

n = 2C/255 − 1.    (2)

The azimuth φ and elevation θ are then

φ = atan2(n_y, n_x),   θ = arccos(n_z),    (3)

with φ ∈ (−π, π] and θ ∈ [0, π]. In degrees we use φ° = (φ + π) · 180/π mod 360 and θ° = θ · 180/π.</p>
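        <p>A minimal NumPy sketch of this decoding follows (function names are ours; Eqs. (1)-(3) define the actual transformation):</p>
        <preformat>
import numpy as np

def decode_direction_map(rgb):
    """Invert Eqs. (1)-(2): 8-bit RGB map to unit normal/tangent vectors.

    rgb: (H, W, 3) uint8 array; returns (H, W, 3) float32 unit vectors.
    """
    n = 2.0 * rgb.astype(np.float32) / 255.0 - 1.0
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-8, None)  # renormalize against quantization

def direction_to_angles(n):
    """Eq. (3): azimuth phi in (-pi, pi], elevation theta in [0, pi]."""
    phi = np.arctan2(n[..., 1], n[..., 0])
    theta = np.arccos(np.clip(n[..., 2], -1.0, 1.0))
    phi_deg = (np.degrees(phi) + 180.0) % 360.0  # (phi + pi) * 180/pi mod 360
    theta_deg = np.degrees(theta)
    return phi_deg, theta_deg
        </preformat>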
        <p>Decoding light/view angles. The elevation and azimuth angles for both light direction and viewing
position are initially stored in color-coded three-channel images. These are subsequently decoded into
single-channel representations containing per-pixel angular values using the equation

α° = (256 · upper + lower) / 65535 × 360,    (4)

where upper is the first and lower is the second channel.</p>
        <p>Cartesian direction encoding. All directional quantities (surface normal, tangent, light, and view)
are finally expressed as unit vectors v = (x, y, z) and linearly mapped to [0, 1] through (v + 1)/2. For
spherical angles (θ, φ) (used by normals and tangents)

v = (sin θ cos φ, sin θ sin φ, cos θ),    (5)

while for azimuth α and elevation ε (used by light and view)

v = (cos ε cos α, cos ε sin α, sin ε).    (6)</p>
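        <p>The corresponding encoding, sketched below in NumPy (helper names are ours), maps the angles of Eqs. (5)-(6) to unit vectors and then to the [0, 1] network input range:</p>
        <preformat>
import numpy as np

def spherical_to_unit(theta, phi):
    """Eq. (5): unit vector from spherical angles (normals/tangents)."""
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)

def elev_azim_to_unit(elev, azim):
    """Eq. (6): unit vector from elevation/azimuth (light/view)."""
    return np.stack([np.cos(elev) * np.cos(azim),
                     np.cos(elev) * np.sin(azim),
                     np.sin(elev)], axis=-1)

def to_unit_range(v):
    """Linearly map unit-vector components from [-1, 1] to [0, 1]."""
    return (v + 1.0) / 2.0
        </preformat>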
        <p>This representation approach offers several significant advantages. The method ensures consistent
input standardization across different parameter types while reducing dimensionality through efficient
encoding. Furthermore, it facilitates the learning of rotation-independent features and provides a more
intuitive representation of directional information within well-defined numerical bounds (0-1).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Training Data Generation</title>
        <p>To enhance the training process and manage computational resources effectively, we implement a
sophisticated patch-based preprocessing strategy. The process begins by loading binary masks that
indicate valid pixel regions for each image. We then apply a sliding window approach using a 32 × 32
pixel window with a step size of a random number between 5 and 32 pixels, ensuring overlap between
adjacent patches. Only patches containing exclusively valid pixels, as determined by the mask, are
retained. For each valid patch, we record its coordinates, including the image index and spatial position
(y, x). These coordinates are then used to generate corresponding patches from both the output images
(rendered appearance) and input data (normal map, light, and view parameters).</p>
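        <p>A minimal sketch of this patch selection follows (we assume, for illustration, that a fresh random stride is drawn after every window; the exact sampling policy is an implementation detail):</p>
        <preformat>
import numpy as np

def extract_valid_patch_coords(mask, patch=32, min_step=5, max_step=32,
                               rng=None):
    """Sliding window with a random stride in [min_step, max_step];
    keep only windows whose pixels are all valid (mask == 255).

    mask: (H, W) uint8 binary mask. Returns a list of (y, x) corners.
    """
    rng = np.random.default_rng() if rng is None else rng
    coords = []
    y = 0
    while y + patch &lt;= mask.shape[0]:
        x = 0
        while x + patch &lt;= mask.shape[1]:
            if np.all(mask[y:y + patch, x:x + patch] == 255):
                coords.append((y, x))
            x += int(rng.integers(min_step, max_step + 1))
        y += int(rng.integers(min_step, max_step + 1))
    return coords
        </preformat>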
        <p>The implementation of randomly overlapping patches in our preprocessing strategy serves multiple
purposes. This approach significantly reduces block boundary artifacts in predictions while enhancing
the capture of contextual information. The overlap between adjacent patches also improves model
robustness by providing multiple perspectives of boundary regions. However, we must acknowledge
certain challenges inherent in this approach, particularly the potential loss of global context and
sensitivity to patch size and stride parameters.</p>
        <p>Our patch-based preprocessing significantly expands the dataset, transforming 486 source images
into approximately 80 000-100 000 patches. This expansion provides several key benefits for the
learning process. The increased quantity of training data enables more effective learning of local
features and patterns while maintaining invariance to global position. The approach also facilitates
efficient stochastic sampling during training and optimizes computational resource utilization through
controlled patch sizes.</p>
        <p>The substantial increase in data volume necessitates careful consideration of computational resources.
To address this challenge, we implement efficient batch processing techniques and optimize storage
strategies for the extracted patches. This ensures that the benefits of our comprehensive patch-based
approach are realized without overwhelming computational resources. Furthermore, the overlapping
nature of our patches, while increasing memory requirements, provides essential redundancy that
contributes to the robustness and accuracy of the model’s predictions.</p>
        <p>As a first step, we focus on the achromatic components of appearance; therefore, all images were
processed in grayscale. Full-color rendering is beyond the scope of this work and will be addressed in
future studies.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Model Architecture</title>
        <p>Due to its simplicity and alignment with our research questions, in this work we train and test the
Deep Shading model from [14].</p>
        <p>The Deep Shading model is a convolutional neural network (CNN) with a U-shaped architecture for
screen-space shading tasks. The architecture consists of an encoder-decoder structure with
skip connections to preserve spatial details, inspired by U-Net designs. The encoder progressively
downsamples spatial resolution while capturing high-level semantic features, while the decoder
upsamples the features back to the original resolution, enabling precise pixel-wise predictions. The model
takes a 12-channel input containing multiple per-pixel attributes. These attributes are concatenated and
processed through a series of convolutional layers with Leaky ReLU activations, batch normalization,
and pooling operations. The latent representation from the encoder is passed to the decoder, where
upsampling layers and transposed convolutions restore the original spatial dimensions. Skip connections
between corresponding encoder and decoder layers ensure the retention of fine-grained details critical
for pixel-wise tasks. The final output is a grayscale image representing the rendered object. Training is
conducted using an L1 loss function, also known as mean absolute error (MAE).</p>
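        <p>For concreteness, the following PyTorch sketch shows a small U-shaped network of this kind; the channel widths, depth, and layer names are our illustrative choices, not the exact configuration of [14]:</p>
        <preformat>
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Convolution + batch normalization + Leaky ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.01, inplace=True),
    )

class DeepShadingUNet(nn.Module):
    """Illustrative U-shaped encoder-decoder with skip connections."""

    def __init__(self, in_ch=12, base=32):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.out = nn.Conv2d(base, 1, 1)  # grayscale prediction

    def forward(self, x):
        e1 = self.enc1(x)                  # full resolution
        e2 = self.enc2(self.pool(e1))      # 1/2 resolution
        b = self.bottleneck(self.pool(e2)) # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip from e2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1)) # skip from e1
        return self.out(d1)
        </preformat>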
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results</title>
      <p>In this section, we present the experimental setup, evaluation metrics, and key results obtained from the
proposed methodology with the model introduced in [14]. The goal of these experiments is to assess
the performance of the model in terms of accuracy, generalization, and robustness. We compare the
method with a baseline approach. Both qualitative and quantitative evaluations are presented. All of the
experiments were conducted using the PyTorch framework on an NVIDIA RTX 4090 GPU.</p>
      <sec id="sec-5-1">
        <title>5.1. Loss Function Design</title>
        <p>L1 (photometric) loss The baseline uses the L1 photometric loss, the mean absolute error
between predicted and ground-truth pixel values, defined as:

ℒ1 = (1 / HW) ∑_(x,y) | I_pred(x, y) − I_gt(x, y) |,    (7)

where H and W represent the height and width of the image in pixels, (x, y) are the pixel coordinates,
I_pred is the predicted image output from our model, and I_gt is the ground truth image. L1 is preferred
over L2 (MSE) in image synthesis because it tends to preserve sharper details and avoid excessive
blurring [17].</p>
        <p>Gradient difference loss To enhance high-frequency detail, a gradient difference loss is added on
top of L1. This term penalizes differences in image gradients between prediction and target, effectively
aligning edges:

ℒgrad = (1 / HW) ∑_(x,y) ( | ∇x I_pred − ∇x I_gt | + | ∇y I_pred − ∇y I_gt | ),    (8)

where ∇x and ∇y represent the gradients in the horizontal and vertical directions, respectively, similar
to the Gradient Difference Loss of [18].</p>
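        <p>A PyTorch sketch of this term, using forward differences for the image gradients (an assumption; any discrete gradient operator could be substituted):</p>
        <preformat>
import torch

def gradient_difference_loss(pred, gt):
    """L1 on horizontal/vertical image gradients, as in Eq. (8).

    pred, gt: (B, 1, H, W) tensors.
    """
    dx_pred = pred[..., :, 1:] - pred[..., :, :-1]  # horizontal gradient
    dx_gt = gt[..., :, 1:] - gt[..., :, :-1]
    dy_pred = pred[..., 1:, :] - pred[..., :-1, :]  # vertical gradient
    dy_gt = gt[..., 1:, :] - gt[..., :-1, :]
    return (dx_pred - dx_gt).abs().mean() + (dy_pred - dy_gt).abs().mean()
        </preformat>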
        <sec id="sec-5-1-1">
          <title>Intensity-weighted loss</title>
          <p>The intensity loss term reweighs the pixel error according to brightness,
so that highly illuminated regions (e.g., highlights) contribute more. Using a weighting function
w_light(x, y) ∈ [0, 1],

ℒint = (1 / HW) ∑_(x,y) w_light(x, y) | I_pred(x, y) − I_gt(x, y) |,    (9)

where w_light(x, y) is a weighting factor proportional to the pixel intensity in the ground truth image.
The motivation comes from both human perception (errors in brightly lit areas are more noticeable)
and physical reasoning (bright regions often correspond to direct reflections or light sources).</p>
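          <p>A minimal PyTorch sketch follows; we assume, for illustration, that the normalized ground-truth intensity itself serves as w_light:</p>
          <preformat>
import torch

def intensity_weighted_loss(pred, gt):
    """Pixel-wise L1 reweighted by brightness, Eq. (9).

    Assumes gt is normalized to [0, 1]; the clamped ground-truth
    intensity is used as w_light (a design choice, not prescribed).
    """
    w = gt.clamp(0.0, 1.0)
    return (w * (pred - gt).abs()).mean()
          </preformat>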
        </sec>
        <sec id="sec-5-1-2">
          <title>Specular-highlight loss</title>
          <p>This term targets view-dependent specular reflections. With view direction
V, light direction L, surface normal N, and halfway vector H = (V + L) / ‖V + L‖, one can identify
pixels likely to lie on specular highlights [19]:

w_spec(x, y) = (N · H)^16,
ℒspec = (1 / HW) ∑_(x,y) w_spec(x, y) | I_pred(x, y) − I_gt(x, y) |,    (10)

where w_spec(x, y) is a weighting factor that highlights areas with likely specular reflections based on
the dot product of the normal vector and halfway vector raised to the power of 16. This power value
approximates a typical specular exponent in the Blinn-Phong reflectance model [20].</p>
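          <p>A PyTorch sketch of this weighting (tensor layouts are our assumptions; the direction maps are expected as per-pixel unit vectors):</p>
          <preformat>
import torch
import torch.nn.functional as F

def specular_weighted_loss(pred, gt, normal, view, light, exponent=16):
    """L1 reweighted by a Blinn-Phong-style highlight term, Eq. (10).

    pred, gt: (B, 1, H, W); normal, view, light: (B, 3, H, W) unit vectors.
    """
    half = F.normalize(view + light, dim=1)  # H = (V + L) / ||V + L||
    n_dot_h = (normal * half).sum(dim=1, keepdim=True).clamp(min=0.0)
    w_spec = n_dot_h ** exponent             # w_spec = (N . H)^16
    return (w_spec * (pred - gt).abs()).mean()
          </preformat>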
        </sec>
        <sec id="sec-5-1-3">
          <title>Tangent-space directional loss</title>
          <p>Finally, the tangent gradient loss encourages the predicted image
to vary along the surface in the same way as the ground truth when moving across the object’s surface.
Given surface tangent T and bitangent B at each pixel, the directional derivatives of an image I are
D_T I = T_x ∇x I + T_y ∇y I and D_B I = B_x ∇x I + B_y ∇y I, giving

ℒtang = (1 / HW) ∑_(x,y) ( | D_T I_pred − D_T I_gt | + | D_B I_pred − D_B I_gt | ),    (11)

where T_x, T_y, B_x, and B_y are the components of the tangent and bitangent vectors in the image plane.
These directional derivatives D_T I and D_B I measure how the image intensity changes when moving
along the tangent and bitangent directions on the surface. Intuitively, this term checks whether the
network’s output image has the correct directional lighting gradients as the viewer’s eye moves across
the surface. Recent work on polarized inverse rendering imposes tangent-space consistency across
views to better constrain shape and material estimation [21].</p>
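          <p>A PyTorch sketch of this term follows; the screen-space tangent/bitangent component layout and the forward-difference gradients are our illustrative assumptions:</p>
          <preformat>
import torch

def image_gradients(img):
    """Forward-difference gradients, cropped to a common size."""
    dx = img[..., :-1, 1:] - img[..., :-1, :-1]
    dy = img[..., 1:, :-1] - img[..., :-1, :-1]
    return dx, dy

def tangent_space_loss(pred, gt, tangent, bitangent):
    """L1 on directional derivatives along tangent/bitangent, Eq. (11).

    pred, gt: (B, 1, H, W); tangent, bitangent: (B, 2, H, W) holding the
    image-plane components (T_x, T_y) and (B_x, B_y).
    """
    tx, ty = tangent[:, :1, :-1, :-1], tangent[:, 1:, :-1, :-1]
    bx, by = bitangent[:, :1, :-1, :-1], bitangent[:, 1:, :-1, :-1]
    dxp, dyp = image_gradients(pred)
    dxg, dyg = image_gradients(gt)
    dt = (tx * dxp + ty * dyp) - (tx * dxg + ty * dyg)  # D_T residual
    db = (bx * dxp + by * dyp) - (bx * dxg + by * dyg)  # D_B residual
    return dt.abs().mean() + db.abs().mean()
          </preformat>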
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Training and Validation</title>
        <p>During training and validation, the network was optimized with the Adam optimizer (β₁ = 0.9,
β₂ = 0.999). The learning rate was initialized at 1 × 10⁻³ and followed a step-decay schedule that
halves the rate at predefined milestones. Each update processed mini-batches of 256 overlapping
32 × 32 pixel patches (random overlap) to enhance spatial coverage and reproducibility. To curb
overfitting, early stopping terminated training when the validation loss failed to improve for 150
iterations 5 consecutive times.</p>
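        <p>A schematic of this training loop follows; the milestone positions and the train_loader, val_loader, and validate helpers are hypothetical placeholders:</p>
        <preformat>
import torch

model = DeepShadingUNet()  # from the sketch in Section 4.3
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
# Step decay: halve the learning rate at predefined milestones
# (milestone values here are illustrative, not the ones used).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[5000, 10000, 20000], gamma=0.5)

best_val, strikes = float("inf"), 0
for step, (inputs, targets) in enumerate(train_loader):  # hypothetical loader
    optimizer.zero_grad()
    loss = (model(inputs) - targets).abs().mean()  # baseline L1 term
    loss.backward()
    optimizer.step()
    scheduler.step()
    if step % 150 == 0:  # check validation every 150 iterations
        val = validate(model, val_loader)  # hypothetical helper
        strikes = 0 if val &lt; best_val else strikes + 1
        best_val = min(best_val, val)
        if strikes == 5:  # stop after 5 consecutive non-improvements
            break
        </preformat>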
        <p>We trained separate models using each loss combination (L1 alone, L1 + Gradient, L1 + Intensity, L1
+ Specular, L1 + Tangent) on our predictive rendering task.</p>
        <p>(Figure 4: (a) GT; (b) L1; (c) L1+Gradient; (d) L1+Intensity; (e) L1+Specular; (f) L1+Tangent.)</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Qualitative Comparison</title>
        <p>We present a side-by-side visualization of representative test images under each trained model to
qualitatively assess the impact of the different loss terms (see Figure 4). For clarity, we compare: (a)
the ground truth rendered image, (b) output with L1 loss only, (c) output with L1 + Gradient loss, (d)
output with L1 + Intensity loss, (e) output with L1 + Specular loss, and (f) output with L1 + Tangent loss.
Figure 4 provides a visual comparison highlighting key areas (edges, highlights, and subtle shading
details).</p>
        <p>Looking at the L1 baseline result (Fig. 4b), the overall structure and general appearance are correctly
captured, but fine details show noticeable blurring. For instance, specular highlights on the objects’
glossy surfaces appear duller and more diffused compared to the ground truth. This demonstrates a
well-known limitation of pixel-wise losses: they tend to minimize average error by blurring high-intensity
features.</p>
        <p>The L1 + Gradient model (Fig. 4c) shows modest improvement in edge preservation, particularly
visible at object boundaries and contour transitions. However, the L1 + Intensity model (Fig. 4d) yields
more substantial improvements in reproducing bright features. The specular highlights, for example,
are much closer in intensity to the ground truth; they appear brighter and more prominent, whereas in
the baseline they were noticeably muted.</p>
        <p>The L1 + Specular model (Fig. 4e) demonstrates the most significant improvement in the placement,
shape, and intensity of specular reflections. Because this model specifically penalizes errors when
N · H is high (i.e., at mirror-reflection angles), it learns to accurately reproduce highlights with correct
size, position, and intensity. The results show that the specular highlights are not only bright but also
correctly localized.</p>
        <p>Similarly, the L1 + Tangent model (Fig. 4f) maintains coherent shading patterns that follow the
underlying geometry, resulting in more physically plausible renderings. The directional consistency of
lighting gradients across the surface is particularly visible on curved regions of the objects.</p>
        <p>Visually, the best performing models are those incorporating specular and tangent-space losses
(Fig. 4e,f). These models produce results that most closely resemble the ground truth in terms of
highlight reproduction and surface shading coherence. Interestingly, while the visual improvements are
clear to human observers, traditional computational metrics like PSNR and SSIM (Table 1) fail to fully
capture these perceptual enhancements. This highlights a known limitation in current image quality
metrics.</p>
        <p>In summary, our qualitative comparisons confirm that each added loss term effectively addresses
its intended visual aspect, with the specular and tangent-space losses providing the most visually
compelling results for realistic material rendering.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Quantitative Summary</title>
        <p>We evaluated each loss function configuration using standard image quality metrics: Mean Absolute
Error (MAE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM) as shown in
Table 1. While the baseline L1 loss achieved the best numerical scores across these metrics, the perceptual
quality of the results does not align with these measurements, which is a known limitation of traditional
metrics when evaluating specular and high-frequency details. Among the other approaches, L1+Specular
and L1+Tangent models produced the next best quantitative results, which better correspond with our
qualitative assessment of visual quality.</p>
        <p>This discrepancy between computational metrics and perceived quality highlights a fundamental
challenge in material appearance evaluation. Standard image metrics tend to favor results that minimize
average error across all pixels, regardless of perceptual importance, whereas human observers are
particularly sensitive to specular highlights and coherent shading patterns that maintain physical
plausibility. These findings suggest that for reflectance modeling applications, specialized perceptual
metrics that better capture these visual aspects may be more appropriate than generic image quality
measurements.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion and Conclusion</title>
      <p>The development of a predictive rendering system for 3D printed objects marks a significant step
toward more accurate appearance prediction in manufacturing workflows. By using the Deep Shading
architecture proposed in [14], we addressed the challenge of synthesizing rendered images from
per-pixel attributes, which include surface normals, tangents, light positions, and view directions. Our
approach aimed to provide a tool to visualize and analyze surface appearance before fabrication, thereby
reducing the need for costly iterative prototyping.</p>
      <p>Overall, the approach demonstrates promising results. However, a noticeable gap remains between
the ground-truth photographs and even the best-performing neural predictions, indicating limitations
that merit further investigation. In particular, the models still struggle to reproduce fine-scale layered
textures characteristic of the printing, to maintain consistency across specular highlights, and to
generalize to unseen lighting configurations.</p>
      <p>Our investigation into various loss functions reveals important insights for neural rendering systems.
While standard metrics favor the baseline L1 loss, visual assessment demonstrates that the L1+Specular
and L1+Tangent models (Fig. 4e,f) produce perceptually superior results. Notably, these models show
a promising tendency to reproduce hints of the layered structure inherent in 3D printed materials,
capturing subtle surface characteristics that traditional metrics fail to quantify.</p>
      <p>The divergence between computational metrics and visual quality underscores the need for
specialized evaluation approaches in appearance modeling. Our findings suggest that loss functions targeting
specific visual phenomena, such as specular highlights and directionally consistent shading, can
significantly enhance the realism of rendered outputs despite showing modest improvements in conventional
image quality metrics.</p>
      <sec id="sec-6-1">
        <title>6.1. Limitations and Future Work</title>
        <p>Despite these promising results, our current implementation represents a pilot experiment with several
inherent limitations that present opportunities for future work. The study was conducted with a
relatively small dataset and modest computational resources, which constrained both the model complexity
and training scope. Our grayscale implementation, while sufficient for proof-of-concept, limits
practical applicability in modern 3D printing workflows that increasingly rely on multi-material and color
printing technologies. The training dataset focused on a limited range of images of the printed object,
potentially limiting the generalization of fine surface details. Current parameter optimization uses fixed
weighting schemes for loss function combinations, whereas adaptive or learned weighting strategies
could yield superior results. The lighting model is restricted to directional illumination, excluding more
complex scenarios involving area lights, environment lighting, and multiple sources with varying color
temperatures that are common in real-world applications. Additionally, while our system demonstrates
good performance on controlled test images, comprehensive validation against actual photographs of
3D printed objects under varied conditions remains essential to evaluate practical utility and identify
potential discrepancies between predicted and manufactured appearances. Data augmentation techniques,
including controlled noise introduction to account for manufacturing variability, represent unexplored
avenues for improving model robustness. Moving forward, scaling to larger datasets, more sophisticated
architectures, and enhanced computational resources could significantly advance the capability and
reliability of predictive rendering systems for 3D printing applications, ultimately enabling more precise
control over final object appearance and reducing the design-to-manufacturing iteration cycle.</p>
        <p>We believe that continued advancement in appearance prediction systems will play a crucial role in
bridging the gap between digital design and physical manufacturing, ultimately enabling more precise
control over the final appearance of 3D printed objects. The development of specialized loss functions
that target perceptually important aspects of material appearance represents a promising direction for
future research in this area.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This research was partially supported by the Czech Science Foundation grant GA22-17529S.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any generative AI tools.</p>
    </sec>
    <sec id="sec-9">
      <title>Code and Data Availability</title>
      <p>The corresponding dataset is available upon request. The code is available via https://github.com/Dolphin000/deep-shading-project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Shahrubudin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramlan</surname>
          </string-name>
          ,
          <article-title>An Overview on 3D Printing Technology: Technological, Materials, and</article-title>
          <string-name>
            <surname>Applications</surname>
          </string-name>
          ,
          <source>Procedia Manufacturing</source>
          <volume>35</volume>
          (
          <year>2019</year>
          )
          <fpage>1286</fpage>
          -
          <lpage>1296</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S2351978919308169. doi:
          <volume>10</volume>
          .1016/j.promfg.
          <year>2019</year>
          .
          <volume>06</volume>
          .089.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Oropallo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Piegl</surname>
          </string-name>
          ,
          <article-title>Ten challenges in 3D printing</article-title>
          ,
          <source>Engineering with Computers</source>
          <volume>32</volume>
          (
          <year>2016</year>
          )
          <fpage>135</fpage>
          -
          <lpage>148</lpage>
          . URL: https://doi.org/10.1007/s00366-015-0407-0. doi:
          <volume>10</volume>
          .1007/s00366-015-0407-0.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Horvath</surname>
          </string-name>
          , Mastering 3D Printing, Apress, Berkeley, CA,
          <year>2014</year>
          . URL: http://link.springer.com/10. 1007/978-1-
          <fpage>4842</fpage>
          -0025-4. doi:
          <volume>10</volume>
          .1007/978-1-
          <fpage>4842</fpage>
          -0025-4.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tewari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Fried</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sitzmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lombardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sunkavalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martin-Brualla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Saragih</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nießner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fanello</surname>
          </string-name>
          , G. Wetzstein,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Theobalt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Agrawala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shechtman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Goldman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zollhöfer</surname>
          </string-name>
          ,
          <source>State of the Art on Neural Rendering, Computer Graphics Forum</source>
          <volume>39</volume>
          (
          <year>2020</year>
          )
          <fpage>701</fpage>
          -
          <lpage>727</lpage>
          . URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/cgf.14022. doi:
          <volume>10</volume>
          .1111/cgf.14022.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sawayama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <article-title>Unsupervised learning reveals interpretable latent representations for translucency perception</article-title>
          ,
          <source>PLOS Computational Biology</source>
          <volume>19</volume>
          (
          <year>2023</year>
          )
          <article-title>e1010878</article-title>
          . doi:
          <volume>10</volume>
          .1371/ journal.pcbi.
          <volume>1010878</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Righetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Khademizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Giachetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ponchio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gigilashvili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bettio</surname>
          </string-name>
          , E. Gobbetti,
          <article-title>Efficient and user-friendly visualization of neural relightable images for cultural heritage applications</article-title>
          ,
          <source>ACM Journal on Computing and Cultural Heritage</source>
          <volume>17</volume>
          (
          <year>2024</year>
          )
          <volume>54</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>54</lpage>
          :
          <fpage>24</fpage>
          . doi:
          <volume>10</volume>
          .1145/3690390.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Philip</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Price</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.-M. Frahm</surname>
          </string-name>
          , G. Drettakis, G. Brostow,
          <article-title>Deep blending for freeviewpoint image-based rendering</article-title>
          ,
          <source>ACM Trans. Graph</source>
          .
          <volume>37</volume>
          (
          <year>2018</year>
          )
          <volume>257</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>257</lpage>
          :
          <fpage>15</fpage>
          . URL: https://dl.acm. org/doi/10.1145/3272127.3275084. doi:
          <volume>10</volume>
          .1145/3272127.3275084.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Genova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Barron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martin-Brualla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Snavely</surname>
          </string-name>
          , T. Funkhouser, IBRNet: Learning
          <string-name>
            <surname>Multi-View Image-Based</surname>
            <given-names>Rendering</given-names>
          </string-name>
          ,
          <year>2021</year>
          . URL: http://arxiv.org/ abs/2102.13090. doi:
          <volume>10</volume>
          .48550/arXiv.2102.13090, arXiv:
          <fpage>2102</fpage>
          .
          <fpage>13090</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , R. Cheng, W. Tan,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <article-title>Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion</article-title>
          , in: 2022 IEEE/CVF Conference on
          <article-title>Computer Vision and Pattern Recognition (CVPR)</article-title>
          , IEEE, New Orleans, LA, USA,
          <year>2022</year>
          , pp.
          <fpage>7803</fpage>
          -
          <lpage>7813</lpage>
          . URL: https://ieeexplore.ieee.org/document/9878554/. doi:
          <volume>10</volume>
          .1109/CVPR52688.
          <year>2022</year>
          .
          <volume>00766</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tewari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mildenhall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          , E. Tretschk,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lassner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sitzmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Martin-Brualla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lombardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Simon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Theobalt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Niessner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Barron</surname>
          </string-name>
          , G. Wetzstein,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zollhoefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Golyanik</surname>
          </string-name>
          ,
          <source>Advances in Neural Rendering</source>
          ,
          <year>2022</year>
          . URL: http://arxiv.org/abs/2111. 05849. doi:
          <volume>10</volume>
          .48550/arXiv.2111.05849, arXiv:
          <fpage>2111</fpage>
          .05849 [cs].
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.</given-names>
            <surname>Santo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Samejima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sugano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Matsushita</surname>
          </string-name>
          , Deep Photometric Stereo Network, in: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), IEEE, Venice,
          <year>2017</year>
          , pp.
          <fpage>501</fpage>
          -
          <lpage>509</lpage>
          . URL: http://ieeexplore.ieee.org/document/8265276/. doi:
          <volume>10</volume>
          .1109/ICCVW.
          <year>2017</year>
          .
          <volume>66</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] W. Matusik, H. Pfister, M. Brand, L. McMillan, A data-driven reflectance model, ACM Transactions on Graphics (TOG) 22 (2003) 759-769. URL: https://www.merl.com/publications/TR2003-83. doi:10.1145/882262.882343.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] G. Chen, K. Han, K.-Y. K. Wong, PS-FCN: A Flexible Learning Framework for Photometric Stereo, 2018. URL: http://arxiv.org/abs/1807.08696. doi:10.48550/arXiv.1807.08696, arXiv:1807.08696 [cs].</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] O. Nalbach, E. Arabadzhiyska, D. Mehta, H.-P. Seidel, T. Ritschel, Deep Shading: Convolutional Neural Networks for Screen Space Shading, Computer Graphics Forum 36 (2017) 65-78. URL: https://onlinelibrary.wiley.com/doi/abs/10.1111/cgf.13225. doi:10.1111/cgf.13225.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] T. Zeltner, F. Rousselle, A. Weidlich, P. Clarberg, J. Novák, B. Bitterli, A. Evans, T. Davidovič, S. Kallweit, A. Lefohn, Real-time Neural Appearance Models, ACM Transactions on Graphics 43 (2024) 1-17. URL: https://dl.acm.org/doi/10.1145/3659577. doi:10.1145/3659577.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Institute of Information Theory and Automation, 2024. URL: http://www.utia.cas.cz/.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros, Image-to-image translation with conditional adversarial networks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1125-1134.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] M. Mathieu, C. Couprie, Y. LeCun, Deep multi-scale video prediction beyond mean square error, in: International Conference on Learning Representations (ICLR), 2016.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] S. Wu, S. Basu, T. Brödermann, L. Van Gool, C. Sakaridis, PBR-NeRF: Inverse rendering with physics-based neural fields, arXiv preprint arXiv:2412.09680 (2024). CVPR 2025, to appear.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] J. F. Blinn, Models of light reflection for computer synthesized pictures, in: Proceedings of the 4th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH ’77, Association for Computing Machinery, New York, NY, USA, 1977, pp. 192-198. doi:10.1145/563858.563893.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] C. Li, T. Ono, T. Uemori, S. Nitta, H. Mihara, A. Gatto, H. Nagahara, Y. Moriuchi, NeISF++: Neural incident Stokes field for polarized inverse rendering of conductors and dielectrics (2024). URL: http://arxiv.org/abs/2411.10189. doi:10.48550/arXiv.2411.10189, arXiv:2411.10189.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>