<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>3D Reconstruction of Gastrointestinal Regions from Single Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bilal Ahmad</string-name>
          <email>bilal.ahmad@ntnu.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pål Anders Floor</string-name>
          <email>paal.anders.floor@ntnu.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ivar Farup</string-name>
          <email>ivar.farup@ntnu.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Milan Kresović</string-name>
          <email>milank@stud.ntnu.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Norwegian University of Science &amp; Technology</institution>
          ,
          <addr-line>2815 Gjøvik</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
<p>3D shape reconstruction from images is one of the problems under investigation in the field of computer vision. Shape-from-shading (SfS) is an important approach, which requires the reflectance properties of the surface and the light source position to infer the 3D shape. In medical applications, SfS is usually tested without ground truth data, which makes the conclusions dubious. In this article, SfS is applied to synthetic gastrointestinal regions, and a precise comparison is made between the recovered shape and the ground truth data by measuring the depth error and the correlation between them. The results show that SfS can recover the shapes quite well if penalized correctly.</p>
      </abstract>
      <kwd-group>
<kwd>3D reconstruction</kwd>
        <kwd>Capsule endoscopy</kwd>
        <kwd>Shape-from-shading</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>With the advancement of the medical field, the current trend is to make surgery ever less invasive.
This implies smaller and smaller incisions in the patient’s skin, which do not give surgeons a
direct view of their work; they only leave enough space for small cameras to be introduced
into the patient’s body. If the resulting images in such conditions are of poor quality [1], the
surgeon’s work becomes even harder. 3D reconstruction can be helpful in such cases to better
diagnose, visualize, or analyze the areas of interest.</p>
<p>3D reconstruction is an inverse problem which can be addressed by applying different
techniques to the images [2]. It is vital to obtain information about the 3D structure or the
scene’s depth, since most tasks are carried out in the 3D world. Depth estimation
involves using various approaches or algorithms to obtain the spatial information of an object,
or to acquire the distances of all points in the scene with respect to a specific chosen
point.</p>
<p>Vision-based depth estimation methods are generally classified into different categories. Some
methods rely on special devices for depth estimation [3]. Examples of such techniques are
ultrasonic and optical time-of-flight estimation, in which an energy beam
is first transmitted and the reflected energy is then detected [4]. Other methods do not make use
of any artificial source of energy; natural outdoor scenes fall under this category, and various
monocular image-based techniques, such as texture gradient analysis and photometric methods,
are used. Yet other methods hinge on the motion or multiple relative positions of the camera
[5]. 3D reconstruction has numerous applications in robotics, medicine (including
diagnostics), video surveillance, monitoring, etc. [6].</p>
<p>Shape-from-shading (SfS) is one of the many computer vision techniques to reconstruct the
3D shape of an object. It is distinct from other methods because it requires only one image for
3D reconstruction. SfS consists of two steps. In the first step, a reflection model is developed
based on the reflectance properties of the surface and the positions of the camera and light source. In the
second step, a numerical scheme is designed to solve the image irradiance equation (IIE), which
is constructed using either partial differential equation (PDE) or optimization methods.</p>
<p>SfS was first discussed by Horn and Brooks [7], who developed an iterative scheme based
on a nonlinear first-order PDE relating the 3D shape to the intensity variation in its image. Kimmel
et al. [8] solved the SfS problem using the fast marching method. Tankus et al. re-examined the
SfS problem by solving the IIE under perspective projection, so that it could be treated on a
broader set of real-world cases. Wu et al. [9] also solved the IIE under perspective projection,
with multiple light sources around the camera center.</p>
<p>In real-world applications, SfS is useful in situations where only one shot of the scene is available.
One recent application of SfS is capsule endoscopy [10], where the positions of the light sources,
which are essential for this method, are usually known. Additionally, the rapid
movement of the capsule in certain areas of the gastrointestinal (GI) tract makes SfS a preferable choice,
because those areas might be captured only once. In recent years, SfS has been applied to endoscopic images for 3D
reconstruction [11, 12]. Although the results seem promising, the conclusions remain uncertain,
because SfS methods are mostly applied without ground truth data.</p>
<p>In this paper, a precise comparison is made between the recovered 3D shape and the ground
truth data of synthetic models of GI regions developed by [13]. The models
are imported into Blender and then modified for a true comparison. SfS was implemented with
anisotropic diffusion as a smoothness constraint to preserve details in the recovered geometry.
This is also novel in this work, because an L2 regularizer, which is unable to preserve edges,
is typically used as a smoothness constraint. The depth error and correlation between the
recovered shape and the ground truth are then measured to estimate the quality of the 3D reconstruction.</p>
<p>The remainder of this article is organized as follows. Section 2 explains the perspective SfS
model with anisotropic diffusion. Results are compared and discussed in Section 3, and Section
4 concludes the article.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Point Light Source Perspective SfS Model</title>
      <p>This section briefly explains the SfS model under a point light source and perspective projection, where
the light source is placed at the center of the camera projection as shown in Figure 1. Under
the assumption of a diffuse surface, the radiance emitted by the surface element S can be computed
according to Lambert’s cosine law and the inverse-square fall-off law of a point light source [9],
        <disp-formula id="eq1"><label>(1)</label><tex-math><![CDATA[R(\tilde{x},\tilde{y},z,p,q) = I\sigma\,\frac{\mathbf{n}(x,y,z,p,q)\cdot\mathbf{l}(x,y,z)}{r(x,y,z)^{2}},]]></tex-math></disp-formula>
        <disp-formula id="eq2"><label>(2)</label><tex-math><![CDATA[r(x,y,z) = \sqrt{x^{2}+y^{2}+z^{2}}, \qquad \mathbf{l}(x,y,z) = -\frac{[x,\,y,\,z]^{T}}{r(x,y,z)},]]></tex-math></disp-formula>
where I is the light intensity and σ is the surface albedo, p = ∂z/∂x and q = ∂z/∂y are the
components of the surface gradient, n is the surface unit normal, and l is a unit vector representing
the direction of the light ray incident at the point S. The factor 1/r² is the inverse-square distance fall-off law of an
isotropic point light. The light source is considered to be at the camera center, but the model can easily be
extended to multiple point light sources not necessarily at the center [9].
      </p>
      <p>The surface normal n can be represented in terms of the partial derivatives of the depth z with
respect to x and y [7]:
        <disp-formula id="eq3"><label>(3)</label><tex-math><![CDATA[\mathbf{n} = \frac{[-p,\,-q,\,1]^{T}}{\sqrt{p^{2}+q^{2}+1}},]]></tex-math></disp-formula>
where (x, y, z) are camera coordinates. Under perspective projection we have x = −x̃z/f and y = −ỹz/f,
where f is the focal length, (x̃, ỹ) are image coordinates, and the camera is pointing in the
negative z-direction as depicted in Figure 1.</p>
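      <p>As an illustration, the reflectance map of Equations (1)–(3) can be evaluated on a depth map with a few lines of NumPy. This is a minimal sketch under the stated camera geometry; the pixel-grid construction and the use of image-space gradients for p and q are simplifying assumptions, not the exact discretization used in the paper.</p>
      <preformat><![CDATA[
import numpy as np

def reflectance_map(z, f, I=1.0, sigma=1.0):
    # Depth map z < 0 on a pixel grid centered at the optical axis;
    # f is the focal length in the same units as the grid spacing.
    h, w = z.shape
    xt, yt = np.meshgrid(np.arange(w) - w / 2.0, np.arange(h) - h / 2.0)
    # Perspective back-projection; the camera looks down the negative z-axis.
    x, y = -xt * z / f, -yt * z / f
    r = np.sqrt(x**2 + y**2 + z**2)
    l = np.stack([-x, -y, -z]) / r          # unit vector towards the light (Eq. 2)
    q, p = np.gradient(z)                   # p ~ dz/dx, q ~ dz/dy (image-space approx.)
    n = np.stack([-p, -q, np.ones_like(z)])
    n = n / np.sqrt(p**2 + q**2 + 1.0)      # unit surface normal (Eq. 3)
    cos_theta = np.clip((n * l).sum(axis=0), 0.0, None)
    return I * sigma * cos_theta / r**2     # reflectance map (Eq. 1)
]]></preformat>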
      <p>According to Horn and Brooks [7], the IIE is
        <disp-formula id="eq4"><label>(4)</label><tex-math><![CDATA[E(\tilde{x},\tilde{y}) = R(\tilde{x},\tilde{y},z,p,q).]]></tex-math></disp-formula>
Equation (4) is solved to estimate z by minimizing the difference between the image irradiance
E(x̃, ỹ) and the reflectance map R(x̃, ỹ, z, p, q). Optimization is done on the depth z, whereas p and q
are updated by taking the gradient of the updated z. The relevant optimization problem is given by
        <disp-formula id="eq5"><label>(5)</label><tex-math><![CDATA[\arg\min_{z} E(z) = \alpha\,E_{I}(z) + (1-\alpha)\,E_{S}(z),]]></tex-math></disp-formula>
      </p>
      <p>where E<sub>I</sub> is the irradiance error and E<sub>S</sub> represents the smoothness constraint; α is the weighting
factor between E<sub>I</sub> and E<sub>S</sub>.</p>
      <p>E<sub>I</sub>(z) can be computed over the image domain (Ω ⊂ ℝ²) as
        <disp-formula id="eq6"><label>(6)</label><tex-math><![CDATA[E_{I}(z) = \int_{\Omega}\left(E(\tilde{x},\tilde{y}) - R(\tilde{x},\tilde{y},z,p,q)\right)^{2}d\Omega.]]></tex-math></disp-formula>
E<sub>S</sub>(z) is solved with anisotropic diffusion [14], a non-linear, space-variant technique
used to reduce noise on the surface without smoothing edges, lines, or other details
that are important for interpreting the surface. It is then combined with Equation (6) and
solved with gradient descent. A small time step Δt is introduced to ensure stability for
higher values of α.
      </p>
      <p>To impose anisotropic diffusion as a smoothness constraint, a 2 × 2 structure tensor is derived
as a first step from the gradient of the depth z,
        <disp-formula id="eq7"><label>(7)</label><tex-math><![CDATA[J = \nabla z\,\nabla z^{T}.]]></tex-math></disp-formula>
Afterwards, the corresponding eigenvalues (λ<sub>+</sub>, λ<sub>−</sub>) and eigenvectors (θ<sub>+</sub>, θ<sub>−</sub>) are derived similarly
to [15]. From (λ<sub>+</sub>, λ<sub>−</sub>) and (θ<sub>+</sub>, θ<sub>−</sub>), the diffusion tensor D is derived as
        <disp-formula id="eq8"><label>(8)</label><tex-math><![CDATA[D = f_{+}(\lambda_{+},\lambda_{-})\,\theta_{+}\theta_{+}^{T} + f_{-}(\lambda_{+},\lambda_{-})\,\theta_{-}\theta_{-}^{T}.]]></tex-math></disp-formula>
      </p>
      <p>The smoothness term is then
        <disp-formula id="eq9"><label>(9)</label><tex-math><![CDATA[E_{S}(z) = \int_{\Omega}\psi(\lambda_{+},\lambda_{-})\,d\Omega.]]></tex-math></disp-formula>
      </p>
      <p>In terms of (λ<sub>+</sub>, λ<sub>−</sub>), the Lagrangian density ψ is chosen as in [14]. Equations (6) and (9)
are combined in Equation (5), which can then be written as
        <disp-formula id="eq10"><label>(10)</label><tex-math><![CDATA[\arg\min_{z} E(z) = \int_{\Omega}\left(\alpha\,(E-R)^{2} + (1-\alpha)\,\psi(\lambda_{+},\lambda_{-})\right)d\Omega.]]></tex-math></disp-formula>
      </p>
      <p>The solution to Equation (10) is given by the Euler–Lagrange PDE,
        <disp-formula id="eq11"><label>(11)</label><tex-math><![CDATA[\alpha\,(E-R)\,\frac{\partial R}{\partial z} + (1-\alpha)\,\nabla\cdot(D\nabla z) = 0,]]></tex-math></disp-formula>
which we solve numerically by
        <disp-formula id="eq12"><label>(12)</label><tex-math><![CDATA[\frac{\partial z}{\partial t} = \nabla\cdot(D\nabla z) + \frac{\alpha}{1-\alpha}\,(E-R)\,\frac{\partial R}{\partial z}.]]></tex-math></disp-formula>
The image irradiance E is derived from the gray scale image I(x̃, ỹ), as explained in Section 3.3.</p>
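      <p>To make the numerical scheme concrete, the following sketch implements one gradient-descent step of Equation (12) in NumPy, with the diffusion tensor built from the structure tensor of ∇z as in Equations (7)–(9). The fall-off functions f<sub>+</sub> and f<sub>−</sub> follow a common Tschumperlé–Deriche-style choice; their exponents, the step size Δt, and the weighting α are illustrative values, not the settings used in the experiments.</p>
      <preformat><![CDATA[
import numpy as np

def diffusion_tensor(z, p1=0.5, p2=2.0):
    """Diffusion tensor D (Eq. 8) from the structure tensor of grad z (Eq. 7)."""
    zy, zx = np.gradient(z)
    a, b, c = zx * zx, zx * zy, zy * zy               # structure tensor entries
    root = np.sqrt(((a - c) / 2.0) ** 2 + b**2)
    lam_p, lam_m = (a + c) / 2.0 + root, (a + c) / 2.0 - root
    # Eigenvector for lam_p; fall back to the x-axis in flat regions,
    # where the tensor is isotropic anyway.
    vx, vy = lam_p - c, b
    nrm = np.hypot(vx, vy)
    flat = nrm < 1e-12
    vx = np.where(flat, 1.0, vx / np.where(flat, 1.0, nrm))
    vy = np.where(flat, 0.0, vy / np.where(flat, 1.0, nrm))
    f_p = (1.0 + lam_p + lam_m) ** -p2                # weak diffusion across edges
    f_m = (1.0 + lam_p + lam_m) ** -p1                # strong diffusion along edges
    d11 = f_p * vx * vx + f_m * vy * vy
    d12 = (f_p - f_m) * vx * vy
    d22 = f_p * vy * vy + f_m * vx * vx
    return d11, d12, d22

def sfs_step(z, E, R, dRdz, alpha=0.9, dt=0.05):
    """One explicit update of Eq. (12); R and dR/dz come from Eq. (1)."""
    d11, d12, d22 = diffusion_tensor(z)
    zy, zx = np.gradient(z)
    jx, jy = d11 * zx + d12 * zy, d12 * zx + d22 * zy  # D * grad z
    div = np.gradient(jx, axis=1) + np.gradient(jy, axis=0)
    return z + dt * (div + alpha / (1.0 - alpha) * (E - R) * dRdz)
]]></preformat>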
    </sec>
    <sec id="sec-3">
      <title>3. Results &amp; Discussions</title>
      <sec id="sec-3-1">
        <title>3.1. Ground Truth Models</title>
        <p>The Shape-from-shading algorithm is tested on different areas of synthetic GI regions [13]. The
model is imported into Blender to render images of different areas of the model. The highlighted
regions utilized for 3D reconstruction, along with the model, are shown in Figure 2. Blender is
chosen not only to construct a ground truth scenario, but also to control different parameters,
such as the light intensity I and the focal length f, which are needed for 3D reconstruction using SfS.</p>
        <p>An environment is created similar to Figure 1. The camera is placed at (0, 0, 0), and its
focal length f is set to 25 mm. A point light source is also placed at the camera center; a point light is
selected to imitate the illumination mechanism of pillcams, which have four light sources around
the camera center. The GI model is then cut into different regions of interest and placed under the
camera at z &lt; 0. The material properties of the model are set to Diffuse BSDF with a constant
albedo σ = 1.</p>
        <p>[Figure panels: (a) ROI 1, (b) ROI 2, (c) ROI 3]</p>
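        <p>The setup described above can be scripted through the Blender Python API. The following is a minimal sketch assuming Blender 2.8x+ node-based materials; names such as "cam" and "diffuse_white" are illustrative.</p>
        <preformat><![CDATA[
import bpy

scene = bpy.context.scene

# Camera at the origin with focal length f = 25 mm.
cam_data = bpy.data.cameras.new("cam")
cam_data.lens = 25.0
cam = bpy.data.objects.new("cam", cam_data)
cam.location = (0.0, 0.0, 0.0)
scene.collection.objects.link(cam)
scene.camera = cam

# Point light at the camera center (isotropic by definition).
light_data = bpy.data.lights.new("point", type='POINT')
light = bpy.data.objects.new("point", light_data)
light.location = (0.0, 0.0, 0.0)
scene.collection.objects.link(light)

# Diffuse BSDF material with constant albedo sigma = 1.
mat = bpy.data.materials.new("diffuse_white")
mat.use_nodes = True
nodes = mat.node_tree.nodes
nodes.clear()
bsdf = nodes.new("ShaderNodeBsdfDiffuse")
bsdf.inputs["Color"].default_value = (1.0, 1.0, 1.0, 1.0)
out = nodes.new("ShaderNodeOutputMaterial")
mat.node_tree.links.new(bsdf.outputs["BSDF"], out.inputs["Surface"])
]]></preformat>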
        <p>For a true comparison between the reconstructed surface and the ground truth, the
respective regions are modified using the Python API in Blender. When a model is placed under a
perspective camera, some occluded vertices/areas are not viewed by the camera. It is therefore
necessary to remove all occluded vertices and to build the model from only those
vertices which are inside the camera frustum and viewed by the camera. The modified
model is then exported in the OBJ format and finally imported into MATLAB. These ground
truth models are shown in Figures 4(a), 4(c), and 4(e).</p>
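        <p>The occlusion-culling step can likewise be sketched with the Blender Python API: each vertex is kept only if it projects inside the camera frustum and a ray cast from the camera first hits (approximately) that vertex. The object name "GI_Region" and the hit tolerance are hypothetical.</p>
        <preformat><![CDATA[
import bpy
from bpy_extras.object_utils import world_to_camera_view

scene = bpy.context.scene
cam = scene.camera
obj = bpy.data.objects["GI_Region"]                 # hypothetical object name
deps = bpy.context.evaluated_depsgraph_get()
cam_loc = cam.matrix_world.translation

visible = set()
for v in obj.data.vertices:
    world_co = obj.matrix_world @ v.co
    ndc = world_to_camera_view(scene, cam, world_co)
    # Inside the frustum: normalized device coords in [0, 1], in front of camera.
    if not (0.0 <= ndc.x <= 1.0 and 0.0 <= ndc.y <= 1.0 and ndc.z > 0.0):
        continue
    direction = (world_co - cam_loc).normalized()
    hit, loc, _, _, _, _ = scene.ray_cast(deps, cam_loc, direction)
    if hit and (loc - world_co).length < 1e-4:      # first hit is the vertex itself
        visible.add(v.index)

# Select and delete occluded / out-of-frustum vertices.
bpy.context.view_layer.objects.active = obj
bpy.ops.object.mode_set(mode="OBJECT")
for v in obj.data.vertices:
    v.select = v.index not in visible
bpy.ops.object.mode_set(mode="EDIT")
bpy.ops.mesh.delete(type="VERT")
bpy.ops.object.mode_set(mode="OBJECT")

bpy.ops.wm.obj_export(filepath="roi.obj")           # OBJ export (Blender 3.x+ operator)
]]></preformat>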
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Assessment Criteria</title>
        <p>In order to evaluate the quality of the 3D reconstruction, the reconstructed surfaces are compared
with the ground truth by measuring the depth error and correlation. These measures are chosen to
assess different features of the reconstructed surfaces. The depth error is chosen because it correctly
evaluates the geometric deformation of the reconstructed shape. The correlation is chosen because
it evaluates the shape of the reconstructed surface independently of scale and position.</p>
        <p>The correlation is computed by estimating the variance and covariance of the recovered shape and the
ground truth data, whereas the geometric deformation is investigated by measuring the average
depth error e<sub>d</sub> between the recovered shapes and the ground truth, given by
          <disp-formula id="eq13"><label>(13)</label><tex-math><![CDATA[e_{d} = \frac{1}{|\Omega|}\sum_{i,j\in\Omega}\left|\frac{\hat{z}_{i,j} - z_{i,j}}{z_{i,j}}\right|,]]></tex-math></disp-formula>
where z is the ground truth and ẑ is the recovered 3D shape. Ω represents the region of the 3D
model considered for error estimation.</p>
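        <p>Both measures are straightforward to compute; a minimal NumPy sketch, with ẑ and z as equally sized depth arrays over Ω, could look as follows.</p>
        <preformat><![CDATA[
import numpy as np

def depth_error(z_hat, z):
    """Average relative depth error of Eq. (13)."""
    return np.mean(np.abs((z_hat - z) / z))

def correlation(z_hat, z):
    """Pearson correlation from the variances and covariance of the
    recovered shape and the ground truth (scale/position independent)."""
    cov = np.mean((z_hat - z_hat.mean()) * (z - z.mean()))
    return cov / (z_hat.std() * z.std())
]]></preformat>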
        <p>[Figure 4 panels: (a) GT 1, (b) RS 1, (c) GT 2, (d) RS 2]</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Image Irradiance from Rendered Images</title>
        <p>Images of size 100 × 100 of each model are rendered as shown in Figure 3. The irradiance E(x̃, ỹ) falling on the
camera sensor is related to the gray scale image I(x̃, ỹ) via the camera response function f(·) [9],
          <disp-formula id="eq14"><label>(14)</label><tex-math><![CDATA[E(\tilde{x},\tilde{y}) = \frac{f^{-1}[I(\tilde{x},\tilde{y})]}{M(\tilde{x},\tilde{y})},]]></tex-math></disp-formula>
where M(x̃, ỹ) is the anisotropy of the light source. Point lights are perfectly isotropic by definition,
so M(x̃, ỹ) = 1. Images are saved in the Portable Network Graphics (PNG) file format;
therefore, the image irradiance is just the gamma correction (γ = 2.2) of the gray scale image, i.e.,
          <disp-formula id="eq15"><label>(15)</label><tex-math><![CDATA[E(\tilde{x},\tilde{y}) = I^{\gamma}(\tilde{x},\tilde{y}).]]></tex-math></disp-formula>
        </p>
        <p>E(x̃, ỹ) is also converted from pixel units to physical units in order to have correspondence
between E(x̃, ỹ) and R. The conversion to physical units is given by
          <disp-formula id="eq16"><label>(16)</label><tex-math><![CDATA[E_{p}(\tilde{x},\tilde{y}) = \frac{E(\tilde{x},\tilde{y}) - \min E(\tilde{x},\tilde{y})}{\max E(\tilde{x},\tilde{y}) - \min E(\tilde{x},\tilde{y})}\left(\frac{I\sigma\cos\theta_{1}}{r_{1}^{2}} - \frac{I\sigma\cos\theta_{2}}{r_{2}^{2}}\right) + \frac{I\sigma\cos\theta_{2}}{r_{2}^{2}},]]></tex-math></disp-formula>
where E<sub>p</sub>(x̃, ỹ) represents the physical value of the image irradiance, and (θ<sub>1</sub>, r<sub>1</sub>) and (θ<sub>2</sub>, r<sub>2</sub>)
determine the upper and lower bounds of E<sub>p</sub>(x̃, ỹ). (θ<sub>1</sub>, θ<sub>2</sub>) are the angles between the surface normal
and the light ray at the brightest and dimmest points on the surface, respectively; (r<sub>1</sub>, r<sub>2</sub>) are the
distances from the light source to those points.
These points are chosen by identifying the brightest and dimmest lit areas in the ground truth
model and then computing the angles and distances from the light source.</p>
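        <p>Equations (14)–(16) amount to a short image-processing pipeline: undo the PNG gamma, then rescale between the physical bounds computed at the brightest and dimmest surface points. The sketch below assumes a rendered file "roi.png"; the angles and distances are placeholders that must be measured on the ground truth model.</p>
        <preformat><![CDATA[
import numpy as np
import imageio.v3 as iio

I_light, sigma, gamma = 1.0, 1.0, 2.2

img = iio.imread("roi.png").astype(np.float64) / 255.0   # hypothetical file name
gray = img.mean(axis=-1) if img.ndim == 3 else img
E = gray ** gamma                                        # Eq. (15): gamma correction

theta1, r1 = 0.1, 0.05   # brightest point: angle [rad], distance (placeholders)
theta2, r2 = 0.8, 0.12   # dimmest point (placeholders)
E_hi = I_light * sigma * np.cos(theta1) / r1**2
E_lo = I_light * sigma * np.cos(theta2) / r2**2

# Eq. (16): pixel units -> physical irradiance between the two bounds.
E_phys = (E - E.min()) / (E.max() - E.min()) * (E_hi - E_lo) + E_lo
]]></preformat>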
      </sec>
      <sec id="sec-3-4">
        <title>3.4. 3D Reconstruction</title>
        <p>A flat surface is given as the initial condition in all three cases in order to test the robustness of
the method. The initial reflectance map is then computed using Equation (1). Updated z values are
calculated by solving Equation (12). The value of α differs between the cases and is chosen empirically
in our experiments, but more weight is given to the irradiance term to obtain a better reconstruction;
the iteration sketch below illustrates the procedure.
The 3D reconstructions of the different areas of the GI tract are shown in Figures 4(b), 4(d), and 4(f).
        </p>
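        <p>In terms of the earlier NumPy sketches, this corresponds to initializing z with a constant (flat) depth and iterating the update of Equation (12); all numeric values below are illustrative, not the experimental settings.</p>
        <preformat><![CDATA[
import numpy as np

z = -0.08 * np.ones((100, 100))            # flat initial surface (depth < 0)
alpha, dt, f = 0.9, 0.05, 25e-3            # illustrative parameters
for _ in range(5000):                      # illustrative iteration count
    R = reflectance_map(z, f)              # Eq. (1), sketched in Section 2
    eps = 1e-6                             # numerical dR/dz by finite differences
    dRdz = (reflectance_map(z + eps, f) - R) / eps
    z = sfs_step(z, E_phys, R, dRdz, alpha=alpha, dt=dt)   # Eq. (12)
]]></preformat>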
        <p>The correlation and depth error are computed between the recovered shapes and the ground truth models
for all three regions and are shown in Table 1. Due to occlusion and dim light in the lower areas of the GI tract,
some parts of the recovered surfaces were smoothed and therefore could not be recovered precisely.
In spite of that, the results are quite plausible, because a high correlation and a low
depth error are attained in all three cases. Certain simplifying assumptions are
made, because the authors are interested in a proof of concept of using SfS to reconstruct complex
GI geometry. However, starting from a flat surface as the initial condition and still reaching the
solution is quite reassuring for applying the SfS technique to real capsule images.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>In this article, a near-light-source perspective SfS method is applied to different GI regions. Given
a reflection model, a numerical scheme is formulated with anisotropic diffusion. The shape of each
region is recovered and then compared with the ground truth by measuring the average depth error
and correlation. The results show that SfS can handle complex geometries if penalized correctly.</p>
      <p>In future work, the SfS method will be applied to real capsule endoscopic images, where we will
have to deal with different textures, specularities, occlusions, and distorted images. In addition,
the brightest and dimmest image points in physical units will be estimated to obtain the right scale
between E(x̃, ỹ) and R. Radiometric calibration will be needed to compute the image irradiance,
and the intensity of the light sources will also be measured. All of this will be essential to correctly
implement the SfS technique on real capsule images.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Funding was provided by the Research Council of Norway under the project CAPSULE, no. 300031.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Yung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Plevris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Leenhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Dray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Koulaouzidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. S. B. R. W.</given-names>
            <surname>Group</surname>
          </string-name>
          , et al.,
          <article-title>Poor quality of small bowel capsule endoscopy images has a significant negative effect in the diagnosis of small bowel malignancy</article-title>
          ,
          <source>Clinical and Experimental Gastroenterology</source>
          <volume>13</volume>
          (
          <year>2020</year>
          )
          <fpage>475</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wesley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hendra</surname>
          </string-name>
          ,
          <article-title>Computer vision based 3d reconstruction: A review</article-title>
          ,
          <source>International Journal of Electrical and Computer Engineering</source>
          <volume>9</volume>
          (
          <year>2019</year>
          )
          <fpage>2394</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X.-B.</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <article-title>A real-time range finding system with binocular stereo vision</article-title>
          ,
          <source>International Journal of Advanced Robotic Systems</source>
          <volume>9</volume>
          (
          <year>2012</year>
          )
          <fpage>26</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Steckel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Peremans</surname>
          </string-name>
          ,
          <article-title>BatSLAM: Simultaneous localization and mapping using biomimetic sonar</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>8</volume>
          (
          <year>2013</year>
          )
          <fpage>e54076</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Floor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Farup</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <article-title>3D reconstruction of the human colon from capsule endoscope video (accepted)</article-title>
          ,
          <source>Colour and Visual Computing Symposium (CVCS)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Koulaouzidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Iakovidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Yung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mazomenos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Karagyris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dimas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Thorlacius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Toth</surname>
          </string-name>
          , et al.,
          <article-title>Novel experimental and software methods for image reconstruction and localization in capsule endoscopy</article-title>
          ,
          <source>Endoscopy International Open</source>
          <volume>6</volume>
          (
          <year>2018</year>
          )
          <fpage>E205</fpage>
          -
          <lpage>E210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Brooks</surname>
          </string-name>
          ,
          <article-title>The variational approach to shape from shading</article-title>
          ,
          <source>Computer Vision, Graphics, and Image Processing</source>
          <volume>33</volume>
          (
          <year>1986</year>
          )
          <fpage>174</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] R. Kimmel, J. A. Sethian, Optimal algorithm for shape from shading and path planning, Journal of Mathematical Imaging and Vision 14 (2001) 237–244.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] C. Wu, S. G. Narasimhan, B. Jaramaz, A multi-image shape-from-shading framework for near-lighting perspective endoscopes, International Journal of Computer Vision 86 (2010) 211–228.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] G. Iddan, G. Meron, A. Glukhovsky, P. Swain, Wireless capsule endoscopy, Nature 405 (2000) 417.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] A. Koulaouzidis, A. Karargyris, Three-dimensional image reconstruction in capsule endoscopy, World Journal of Gastroenterology 18 (2012) 4086.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] V. S. Prasath, I. N. Figueiredo, P. N. Figueiredo, K. Palaniappan, Mucosal region detection and 3D reconstruction in wireless capsule endoscopy videos using active contours, in: 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2012, pp. 4014–4017.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] K. İncetan, I. O. Celik, A. Obeid, G. I. Gokceler, K. B. Ozyoruk, Y. Almalioglu, R. J. Chen, F. Mahmood, H. Gilbert, N. J. Durr, et al., VR-Caps: A virtual environment for capsule endoscopy, Medical Image Analysis 70 (2021) 101990.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] D. Tschumperlé, R. Deriche, Vector-valued image regularization with PDEs: A common framework for different applications, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005) 506–517.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] G. Sapiro, D. L. Ringach, Anisotropic diffusion of multivalued images with applications to color filtering, IEEE Transactions on Image Processing 5 (1996) 1582–1586.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>