<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>DL-inferencing for 3D Cephalometric Landmarks Regression task using OpenVINO</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Institute of Information Technologies</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathematics</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mechanics</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lobachevsky State</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Ophthalmology and Optometry, Medical University of Vienna</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1919</year>
      </pub-date>
      <abstract>
        <p>In this paper, we evaluate the performance of the Intel Distribution of OpenVINO toolkit in the practical task of automatic three-dimensional cephalometric analysis using deep learning methods. This year, the authors proposed an approach to the detection of cephalometric landmarks from CT data that is resistant to skull deformities and uses convolutional neural networks (CNN). Resistance to deformations is due to the initial detection of 4 points that are basic for the parameterization of the skull shape. The approach was explored with three CNN architectures, and a record regression accuracy in comparison with analogs was obtained. This paper evaluates the inference performance of the trained CNN models. For a comparative study, the computing environments PyTorch and Intel Distribution of OpenVINO were selected, along with 2 of the 3 CNN architectures: a VGG-based model for direct regression of cephalometric landmarks and an Hourglass-based model with a ResNeXt backbone for landmark heatmap regression. The experimental dataset consisted of 20 CT images of patients with acquired craniomaxillofacial deformities and included pre- and post-operative CT scans of size 800x800x496 with voxel spacing of 0.2x0.2x0.2 mm. OpenVINO showed a large increase in performance over PyTorch, with an inference speedup of 13 to 16 times for the Direct Regression model and of 3.5 to 3.8 times for the more complex and precise Hourglass model.</p>
      </abstract>
      <kwd-group>
        <kwd>3D Cephalometry</kwd>
        <kwd>Automatic Cephalometry</kwd>
        <kwd>Keypoint Regression</kwd>
        <kwd>OpenVINO</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>At present, deep learning models are increasingly used in medicine. They make it
possible to analyze medical images without manual parameter extraction. Biological
organisms vary greatly in size, structure and shape, which makes it hard to build
an accurate mathematical description of biological systems. During training, deep
models select and extract features on their own and build an internal representation
of the objects of analysis, which allows us to solve medical data processing
problems.</p>
      <p>
        Once a deep model has been built, the problem arises of deploying it in the
software and hardware already used in the industry. Deep models require a lot of
computation, which also restricts where they can be used. To solve the problems of
efficient deep model inference on various hardware and embedding in existing software,
the Intel® Distribution of OpenVINO™ toolkit [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is used. The Intel® Distribution of
OpenVINO™ toolkit shows significant acceleration of deep learning models in
computer vision tasks [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ] and is also used to accelerate deep learning models in other
areas of research and in production.
      </p>
      <p>One of the tasks for which it is difficult to construct an exact mathematical
description and a deterministic solution algorithm is the problem of finding the key points of
the skull.</p>
      <p>In this paper, we analyse and accelerate the inference performance of a
state-of-the-art method for keypoint regression, based on a 3D deep convolutional
neural network (CNN), that solves the task of three-dimensional cephalometric analysis.</p>
    </sec>
    <sec id="sec-2">
      <title>Problem statement</title>
      <p>The main purpose of our work is to create a method for automatic detection of
cephalometric points that is fast and resistant to variations in the shape of the
human skull. In particular, we are interested in the location of the following 4 landmarks:
        <list list-type="order">
          <list-item><p>Left Orbitale.</p></list-item>
          <list-item><p>Right Orbitale.</p></list-item>
          <list-item><p>Left Porion.</p></list-item>
          <list-item><p>Right Porion.</p></list-item>
        </list>
      </p>
      <p>These four points are important because they are used to determine the Frankfort
horizontal plane. Each point is represented by 3 coordinates in the CT coordinate system.</p>
      <p>The task of annotating the source data is complex and ambiguous, since different
specialists can place the same landmark at different positions in 3D. In the current
formulation of the problem, a deviation of 4 millimeters for the points forming the Frankfort
horizontal plane is sufficient accuracy, comparable to the annotation accuracy of an
average specialist.</p>
    </sec>
    <sec id="sec-3">
      <title>Related works</title>
      <p>Besides the clinical application, landmark regression is used in a number of
other fields. For instance, facial landmark regression, human pose estimation, and even
crowd counting play a central part in intelligent surveillance systems. In these tasks,
detected landmarks can represent different entities such as face parts, body parts, or the whole
human body. In the literature, neural network based methods for solving the key point
regression task can be split into two groups: direct regression of the target variables and
regression through some intermediate representation, for example, a heatmap.</p>
      <p>
        Chen et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] used neural networks and genetic algorithms to find areas on the
radiograph containing cephalometric points. Osadchy et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] applied a convolutional
neural network based approach for mapping face image to a manifold, parametrized by
pose. Tompson et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposed a convolutional network based human pose
estimation method that outputs a heatmap. For each pixel, the heatmap describes the likelihood of a
landmark appearing at that spatial position. Newell et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] proposed a new
convolutional network architecture for this task called the Hourglass network. The base network
operates over all scales of the image. The authors also propose stacking multiple base
networks sequentially.
      </p>
      <p>
        Deep learning based approaches mainly focus on automatic detection of
cephalometric landmarks on lateral X-Ray images. Lee et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] utilize patch classification and
point estimation neural networks for the identification of 33 landmarks. Hwang et al.
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] proposed the YOLOv3 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] based system. For 3D CT scans Lee et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] proposed a
VGG-based [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] method for detecting landmarks on shadowed 2D projections. A
fully 3D approach based on a 3D convolutional neural network was
proposed by Kang et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        In our previous paper, Lachinov et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], we proposed 3 convolutional neural
networks for cephalometric landmark regression on 3D CT images. We found that the
Stacked Hourglass network [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the Softargmax-based model [
        <xref ref-type="bibr" rid="ref15 ref16">15,16</xref>
        ] achieve
remarkable accuracy, while the Direct Regression model [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] achieves the best computational performance.
      </p>
      <p>
        The Root Mean Squared Error (RMSE) values are reported in Table 1. Here we
can see that Direct Regression has the highest error. The likelihood of landmarks being
within a certain radius of the ground truth points is reported in Table 2. We pay special
attention to this characteristic since a radius of 4 mm is considered to be a threshold
for clinical applications. As we can see, only 61% of the points predicted by Direct
Regression fall within a 4 mm radius. In contrast, the Stacked Hourglass model places
95% of its predictions within a 4 mm radius. Its high accuracy for the 2
mm and 3 mm radii is also notable (Table 2). For more quantitative analysis, see the
article [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
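      <p>The two measures discussed above, RMSE and the share of predictions that fall within a given radius, can be sketched as follows. This is only an illustration under stated assumptions: errors are taken as per-landmark Euclidean distances in millimeters, the coordinates are made up, and the exact metric definition used in the cited article may differ.</p>
      <preformat><![CDATA[
```python
import math


def euclidean_errors(pred, truth):
    """Per-landmark Euclidean distance between predicted and true 3D points."""
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def rmse(errors):
    """Root mean squared error over a list of per-landmark distances."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def fraction_within(errors, radius_mm):
    """Share of landmarks whose error does not exceed the given radius."""
    return sum(e <= radius_mm for e in errors) / len(errors)


# Hypothetical landmark coordinates in mm (illustration only, not study data).
truth = [(10.0, 20.0, 30.0), (50.0, 20.0, 30.0), (5.0, 60.0, 25.0), (55.0, 60.0, 25.0)]
pred = [(11.0, 20.0, 30.0), (50.0, 23.0, 30.0), (5.0, 60.0, 30.0), (55.0, 60.0, 25.0)]

errors = euclidean_errors(pred, truth)
print("errors, mm:", errors)
print("RMSE, mm:", rmse(errors))
print("share within 4 mm:", fraction_within(errors, 4.0))
```
]]></preformat>
      <p>With these made-up points, three of the four landmarks lie within the 4 mm clinical threshold, which is the kind of figure Table 2 reports per model.</p>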
      <p><bold>Intel® Distribution of OpenVINO™ toolkit.</bold>
The Intel® Distribution of OpenVINO™ toolkit is developed to accelerate and deploy
neural network models, with a built-in model optimizer and an inference engine runtime for
hardware-specific acceleration. Inference optimization is provided through the
analysis and optimization of the computational graph, efficient scheduling of data processing
and vectorization, and various methods of deep model compression. The toolkit
focuses on cross-platform solutions to
computer vision problems and pays a lot of attention to medical imaging AI workloads.
OpenVINO has few dependencies, which makes it easier to integrate with existing
software. A promising feature is support for protecting deep models by encrypting
them.</p>
      <sec id="sec-3-1">
        <title>Components</title>
        <p>
          The Intel® Distribution of OpenVINO™ toolkit is an actively developed product
that regularly gains new features. The current version consists of several major
parts [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]:
          <list list-type="order">
            <list-item><p>Deep learning for computer vision. This part includes the Deep Learning Deployment Toolkit for high-performance inference of pretrained deep neural network models through a high-level application programming interface.</p></list-item>
            <list-item><p>Traditional computer vision. This part includes the accelerated computer vision library OpenCV [<xref ref-type="bibr" rid="ref17">17</xref>].</p></list-item>
            <list-item><p>Additional packages for optimized inference on different hardware (Intel® FPGA, Intel® Movidius™ Neural Compute Stick, Intel® GMM-GNA) and media encode/decode functions for improving the performance of graphics and video processing.</p></list-item>
            <list-item><p>Open Model Zoo, a public repository of more than 180 pretrained models for solving various computer vision problems, together with samples and demos.</p></list-item>
            <list-item><p>Post-Training Optimization Tool, designed to accelerate the inference of DL models by converting them into a more hardware-friendly representation using methods that do not require re-training, for example, post-training quantization.</p></list-item>
          </list>
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Scheme of using the Intel® Distribution of OpenVINO™ toolkit</title>
      <p>The basic variant of using OpenVINO involves the following steps:
        <list list-type="order">
          <list-item><p>Train a deep neural network model using any popular deep learning framework (Caffe, TensorFlow, Keras, PyTorch etc.) or download a pretrained model using Model Downloader.</p></list-item>
          <list-item><p>Convert the model to the intermediate representation (IR) by calling Model Optimizer.</p></list-item>
          <list-item><p>Load input data for the model, run efficient inference with the Inference Engine, and receive the model output for subsequent interpretation.</p></list-item>
        </list>
      </p>
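      <p>The third step above can be sketched with the Inference Engine Python API. This is a hedged sketch, assuming the OpenVINO 2020.x Python bindings (<monospace>IECore</monospace>, <monospace>read_network</monospace>, <monospace>load_network</monospace>, <monospace>infer</monospace>); later releases changed parts of this API. The function name and the file paths passed to it are hypothetical.</p>
      <preformat><![CDATA[
```python
def infer_with_openvino(xml_path, bin_path, volume, device="CPU"):
    """Run one inference on an IR model.

    `volume` is expected to be a NumPy array shaped like the model input.
    Assumes the OpenVINO 2020.x Python API; names below may differ in
    other releases.
    """
    # Imported lazily so this sketch can be loaded without OpenVINO installed.
    from openvino.inference_engine import IECore

    ie = IECore()
    # The IR produced by Model Optimizer is a pair of files: topology and weights.
    net = ie.read_network(model=xml_path, weights=bin_path)
    input_name = next(iter(net.inputs))    # single-input model assumed
    output_name = next(iter(net.outputs))  # single-output model assumed
    exec_net = ie.load_network(network=net, device_name=device)
    result = exec_net.infer(inputs={input_name: volume})
    return result[output_name]
```
]]></preformat>
      <p>A call might look like <monospace>infer_with_openvino("vgg_like.xml", "vgg_like.bin", volume)</monospace>, where both file names are placeholders for the IR files written by Model Optimizer.</p>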
      <sec id="sec-4-1">
        <title>Model conversion</title>
        <p>
          To perform the inference, the Inference Engine does not operate with the original model,
but with its Intermediate Representation (IR), which is optimized for execution on
endpoint target devices. To generate an IR for a trained model, the Model Optimizer
tool is used. The Model Optimizer loads a model into memory, reads it, builds the
internal representation of the model, optimizes it, and produces the Intermediate
Representation. OpenVINO supports the following framework formats: Caffe, MXNet, TensorFlow,
Kaldi, ONNX. OpenVINO supports the common deep learning layers and, since new layer types
appear constantly, provides a mechanism for adding custom
layers. To infer a PyTorch model using OpenVINO, it must first be converted to ONNX,
which is a simple operation. The next step is to convert the ONNX model to the Intermediate
Representation. The conversion of the proposed model from the ONNX format is listed below:
          <preformat>cd %OPENVINO_DIR%/deployment_tools/model_optimizer
python mo.py --input_model vgg_like.onnx \
    --input_shape [1,1,128,128,64] \
    --output_dir %YOUR_DIR%</preformat>
        </p>
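        <p>The PyTorch-to-ONNX step mentioned above is not listed in the text; a minimal sketch of it could look as follows. It assumes a trained <monospace>torch.nn.Module</monospace> and uses <monospace>torch.onnx.export</monospace> with a dummy input of the shape given to Model Optimizer. The function name and the tensor names <monospace>input</monospace> and <monospace>output</monospace> are hypothetical.</p>
        <preformat><![CDATA[
```python
def export_to_onnx(model, onnx_path="vgg_like.onnx", input_shape=(1, 1, 128, 128, 64)):
    """Export a trained PyTorch module to ONNX by tracing it with a dummy input."""
    # Imported lazily so this sketch can be loaded without PyTorch installed.
    import torch

    model.eval()  # switch off training-only behaviour before tracing
    dummy_input = torch.zeros(*input_shape)  # batch, channel, then the 3D volume
    torch.onnx.export(
        model,
        dummy_input,
        onnx_path,
        input_names=["input"],    # hypothetical tensor names
        output_names=["output"],
    )
    return onnx_path
```
]]></preformat>
        <p>The resulting file is what the <monospace>--input_model</monospace> argument of Model Optimizer points to.</p>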
      </sec>
      <sec id="sec-4-2">
        <title>Chosen Models</title>
        <p>
          Direct regression. We define direct regression as a convolutional neural network
followed by global pooling and a fully connected layer. The output of the fully connected
layer matches the target variables. The graph of the model is presented in Fig. 2. The
network is in line with the VGG-based model introduced by Simonyan et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. It
takes a 3D CT scan as input and processes it with a series of 3x3x3 convolutional blocks,
instance normalization [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] layers and ReLU activations. At the end of the fully
convolutional part of the network, global average pooling is performed, followed by
two fully connected layers with activations. The number of outputs of the final layer
corresponds to the number of regressed values. In our case it equals 4 points with 3
coordinates each, 12 in total.
        </p>
        <p>
          Heatmap regression. Unlike the previous model, in which we tried to directly predict
the target variable, here we focus on predicting the probability of a key point for each
voxel (a heatmap). Ground truth heatmaps are generated by the probability density
function of a Gaussian distribution with its mean at the target landmark. In the CNN
design, we follow the Stacked Hourglass network architecture proposed by Newell et
al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. It combines multiple subnetworks stacked one after another. Each
subnetwork consists of an encoder and a decoder that are connected by means of additive
skip-connections. The architecture of an individual network is displayed in Fig. 3. In
this model we use 3 stacked Hourglass networks with ResNeXt blocks [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and Group
Normalization [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The output layer consists of a single convolution and a sigmoid
activation. At the end of each network in the stack, we provide additional supervision by
attaching auxiliary output layers with the corresponding loss function.
        </p>
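        <p>The ground truth heatmap described above can be illustrated on a tiny grid. This sketch makes several assumptions not stated in the text: the Gaussian is isotropic, it is left unnormalized so that the peak equals 1 (which fits a sigmoid output), and the grid size, landmark position and <monospace>sigma</monospace> are made up.</p>
        <preformat><![CDATA[
```python
import math


def gaussian_heatmap(shape, center, sigma):
    """Return heatmap[x][y][z]: an unnormalized Gaussian peaked at `center`."""
    size_x, size_y, size_z = shape
    cx, cy, cz = center
    two_sigma_sq = 2.0 * sigma ** 2
    return [
        [
            [
                math.exp(-((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2) / two_sigma_sq)
                for z in range(size_z)
            ]
            for y in range(size_y)
        ]
        for x in range(size_x)
    ]


# Hypothetical 8x8x4 voxel grid with a landmark at voxel (3, 5, 2).
heatmap = gaussian_heatmap(shape=(8, 8, 4), center=(3, 5, 2), sigma=1.5)

# The landmark is recovered as the position of the maximum value.
peak = max(
    (value, (x, y, z))
    for x, plane in enumerate(heatmap)
    for y, row in enumerate(plane)
    for z, value in enumerate(row)
)
print("peak value and voxel:", peak)
```
]]></preformat>
        <p>Taking the position of the maximum is the simplest way to turn a predicted heatmap back into a landmark coordinate; other decoding schemes are possible.</p>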
        <p>
          In the basic implementation of the Hourglass model, the upscaling operation through
trilinear interpolation was not fast enough for 3D data processing using OpenVINO.
OpenVINO has a simple and convenient system for analyzing the execution time of
individual layers of the model, which made this bottleneck easy to find. A question was
asked in the OpenVINO repository on GitHub about how this operation can be optimized,
and an Intel engineer proposed replacing the upscaling with a specific deconvolution
layer, which gives exactly the
same result [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
        <p>
          A step-by-step tutorial on how to create code to infer deep models using OpenVINO
can be found in the article [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which details the sequence of actions and provides the
tutorial source code for working with OpenVINO.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <sec id="sec-5-1">
        <title>Infrastructure</title>
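        <p>The following hardware and software were used as a test infrastructure for measuring performance:
          <list list-type="simple">
            <list-item><p>CPU: Intel® Core™ i7-8700 CPU @ 3.20GHz</p></list-item>
            <list-item><p>RAM size: 64 GB</p></list-item>
            <list-item><p>OS version: Linux-5.3.0-51-generic-x86-64-with-Ubuntu-18.04-bionic</p></list-item>
            <list-item><p>Python version: 3.7.7</p></list-item>
            <list-item><p>OpenVINO version: 2020.3.194</p></list-item>
            <list-item><p>PyTorch version: 1.5.1 (from anaconda)</p></list-item>
          </list>
        </p>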
      </sec>
      <sec id="sec-5-2">
        <title>Data</title>
        <p>In our experiments, we use the dataset consisting of 20 CT images of patients with
acquired cranio-maxillofacial deformities. All scans were taken with two multispiral
CT devices. The dataset consists of pre- and post-operative CT scans. The resolution of
each image is 800x800x496 with voxel spacing of 0.2x0.2x0.2 mm. For every image 4
cephalometric landmarks were annotated: left and right orbitale and porion.</p>
        <p>As a preprocessing step, we downsample images to the size of 128x128x64 with
voxel spacing of 1.25x1.25x1.55 mm. Then we perform the z-score normalization of
the image I by subtracting its mean μ<sub>I</sub> and dividing by its standard deviation σ<sub>I</sub>:
          <disp-formula><tex-math><![CDATA[I_z = \frac{I - \mu_I}{\sigma_I}]]></tex-math></disp-formula>
        </p>
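        <p>The preprocessing arithmetic above can be checked with a short sketch. It reproduces the voxel spacing after downsampling from the stated sizes and applies z-score normalization to a handful of made-up intensity values. Using the population standard deviation is an assumption, as the text does not say which estimator was used.</p>
        <preformat><![CDATA[
```python
import statistics


def downsampled_spacing(size, spacing, new_size):
    """Voxel spacing after resampling a volume to `new_size` (same physical extent)."""
    return tuple(n * s / m for n, s, m in zip(size, spacing, new_size))


def z_score(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


spacing = downsampled_spacing((800, 800, 496), (0.2, 0.2, 0.2), (128, 128, 64))
print("spacing after downsampling, mm:", [round(s, 2) for s in spacing])

# Hypothetical intensity values standing in for a flattened CT volume.
normalized = z_score([-1000.0, 0.0, 400.0, 1200.0])
print("normalized:", normalized)
```
]]></preformat>
        <p>The computed spacing matches the 1.25x1.25x1.55 mm stated in the text, and the normalized values have zero mean and unit standard deviation.</p>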
      </sec>
      <sec id="sec-5-3">
        <title>Performance Analysis</title>
        <p>The VGG and Hourglass models contain 1.33 × 10<sup>6</sup> and 25.5 × 10<sup>6</sup> parameters and need
3.8 × 10<sup>9</sup> and 6.9 × 10<sup>9</sup> operations, respectively. In our experiments, OpenVINO
showed a large increase in inference performance over the original framework, PyTorch:
a speedup of 13 to 16 times
for the simpler VGG model and of 3.5 to 3.8 times for the more complex
Hourglass model. The Hourglass model has a more complex architecture and contains many
more parameters, which is why OpenVINO did not accelerate it as much as the
VGG model. Trilinear upscaling interpolation for 3D data in the decoder is the longest
computational operation in the Hourglass model.</p>
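        <p>A speedup figure of this kind is the ratio of two measured latencies. The sketch below shows one simple way to obtain it; it is not the measurement protocol used in this study. The warm-up and repeat counts are arbitrary, and the two workloads are stand-ins for a baseline and an optimized inference call.</p>
        <preformat><![CDATA[
```python
import statistics
import time


def median_latency(infer, sample, repeats=10, warmup=2):
    """Median wall-clock time of `infer(sample)` after a few warm-up calls."""
    for _ in range(warmup):
        infer(sample)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        infer(sample)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_latency, optimized_latency):
    """How many times faster the optimized run is than the baseline."""
    return baseline_latency / optimized_latency


# Stand-in workloads (hypothetical): the "baseline" does ten times more work.
def baseline(n):
    return sum(i * i for i in range(10 * n))


def optimized(n):
    return sum(i * i for i in range(n))


t_base = median_latency(baseline, 20000)
t_opt = median_latency(optimized, 20000)
print("measured speedup:", speedup(t_base, t_opt))
```
]]></preformat>
        <p>The median is used here because it is less sensitive to occasional slow runs than the mean; in a real comparison both calls would wrap the same model in PyTorch and in OpenVINO.</p>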
        <p>A common way to increase throughput and utilize processor power is to process
data in large batches. This approach is often used during training of deep networks
or during remote data processing, for example, when a server processes data from several
cameras simultaneously. Using a large batch makes it possible to achieve greater processor
performance in image classification, object detection, and
other tasks.</p>
        <p>In our case of the VGG-like model, the use of a large batch did not entail an increase
in system performance. The main obstacle to increasing throughput by increasing the
batch is the large amount of input data for the model. In the cephalometry problem, the input
tensor for the VGG model is roughly an order of magnitude larger than the input
image for the ResNet-50 classification model.</p>
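        <p>The size comparison above can be made concrete with a little arithmetic. The cephalometry input shape is the one given to Model Optimizer; the 3x224x224 ResNet-50 input and the 4-byte float32 element size are assumptions, since the text does not state them.</p>
        <preformat><![CDATA[
```python
from math import prod

cephalometry_input = (1, 128, 128, 64)  # one channel, downsampled 3D volume
resnet50_input = (3, 224, 224)          # assumption: the usual ImageNet input size

ceph_values = prod(cephalometry_input)
resnet_values = prod(resnet50_input)
ratio = ceph_values / resnet_values

# Memory per sample, assuming float32 (4 bytes per value).
ceph_mib = ceph_values * 4 / 2 ** 20

print("values per cephalometry sample:", ceph_values)
print("values per ResNet-50 sample:", resnet_values)
print("ratio:", round(ratio, 2))
print("MiB per cephalometry sample:", ceph_mib)
```
]]></preformat>
        <p>Under these assumptions one CT sample carries about seven times as many values as one ResNet-50 image, so each additional item in the batch adds several megabytes of input alone.</p>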
        <p>Tables 3 and 4 give the performance of the Direct Regression model on a generated data
set of 2048 objects, using batch size 1 and a range of batch sizes respectively;
Tables 5 and 6 give the same for the Hourglass model on a generated data set of
128 objects. Dashes indicate experiments that were not finished due to an
out-of-memory error on the target device.</p>
        <p>Analyzing the results of performance measurements, we can see that OpenVINO
can significantly accelerate the inference of deep neural networks on user hardware
without the use of specialized computing devices, only by optimizing the calculations.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>This article presented a study of using the Intel® Distribution of OpenVINO™ toolkit
in a complex medical problem. An overview of the Intel® Distribution of OpenVINO™
toolkit has been given. The practical application of the OpenVINO™ toolkit has been
demonstrated on the problem of cephalometric landmark regression on 3D computed
tomography data. The solution of this problem using deep neural networks has been described,
and the inference speedup results on typical hardware have been shown. The quality
and inference performance experiments have been performed with two CNN models:
a VGG-based model for direct regression of cephalometric landmarks and an
Hourglass-based model with a ResNeXt backbone for landmark heatmap regression, which
contain 1.33 × 10<sup>6</sup> and 25.5 × 10<sup>6</sup> parameters respectively. Total RMSE values of 4.58
for the VGG model and 1.72 for the Hourglass model were obtained. OpenVINO inference
showed a significant speedup of model execution over PyTorch: the speedup
was from 13 to 16 times for the VGG model and from 3.5 to 3.8 times for the more complex
and precise Hourglass model.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Intel® Distribution of OpenVINO™ toolkit</article-title>
          . https://docs.openvinotoolkit.org/latest/index.html, last accessed 30 Jun 2020
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kustikova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasiliev</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khvatov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumbrasiev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rybkin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kogteva</surname>
          </string-name>
          , N.:
          <article-title>Dli: Deep learning inference benchmark</article-title>
          .
          <source>Communications in Computer and Information Science 1129 CCIS</source>
          ,
          <fpage>542</fpage>
          -
          <lpage>553</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kustikova</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vasiliev</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khvatov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumbrasiev</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vikhrev</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Utkin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dudchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gladilov</surname>
          </string-name>
          , G.:
          <article-title>Intel distribution of openvino toolkit: a case study of semantic segmentation</article-title>
          .
          <source>AIST: International Conference on Analysis of Images, Social Networks and Texts 11832 LNCS</source>
          ,
          <fpage>11</fpage>
          -
          <lpage>23</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>H.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.C.</given-names>
          </string-name>
          :
          <article-title>Comparison of landmark identification in traditional versus computer-aided digital cephalometry</article-title>
          .
          <source>The Angle Orthodontist</source>
          <volume>70</volume>
          (
          <issue>5</issue>
          ),
          <fpage>387</fpage>
          -
          <lpage>392</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Osadchy</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Le</given-names>
            <surname>Cun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            ,
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.L.</surname>
          </string-name>
          :
          <article-title>Synergistic Face Detection and Pose Estimation with Energy-Based Models</article-title>
          , pp.
          <fpage>196</fpage>
          -
          <lpage>206</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Tompson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goroshin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bregler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Efficient object localization using convolutional networks</article-title>
          .
          <source>In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          . pp.
          <fpage>648</fpage>
          -
          <lpage>656</lpage>
          (
          <year>June 2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Newell</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
          </string-name>
          , J.:
          <article-title>Stacked hourglass networks for human pose estimation</article-title>
          .
          <source>In: ECCV</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanikawa</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lim</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yamashiro</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Deep learning based cephalometric landmark identification using landmark-dependent multi-scale patches</article-title>
          . ArXiv abs/
          <year>1906</year>
          .02961 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hwang</surname>
            ,
            <given-names>H.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moon</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Her</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srinivasan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aljanabi</surname>
            ,
            <given-names>M.N.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donatelli</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <article-title>Automated identification of cephalometric landmarks: Part 2-might it be better than human?</article-title>
          .
          <source>The Angle Orthodontist</source>
          <volume>90</volume>
          (
          <issue>1</issue>
          ),
          <fpage>69</fpage>
          -
          <lpage>76</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Redmon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farhadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Yolov3: An incremental improvement</article-title>
          .
          <source>ArXiv</source>
          abs/1804.02767 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>H.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jeon</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seo</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          :
          <article-title>Automatic 3d cephalometric annotation system using shadowed 2d image-based machine learning</article-title>
          .
          <source>Physics in Medicine &amp; Biology</source>
          <volume>64</volume>
          (
          <issue>5</issue>
          ),
          <elocation-id>055002</elocation-id>
          (
          <month>Feb</month>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Simonyan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zisserman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          .
          <source>CoRR</source>
          abs/1409.1556 (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jeon</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seo</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          :
          <article-title>Automatic three-dimensional cephalometric annotation system using three-dimensional convolutional neural networks: a developmental trial</article-title>
          .
          <source>Computer Methods in Biomechanics and Biomedical Engineering: Imaging &amp; Visualization</source>
          <volume>8</volume>
          (
          <issue>2</issue>
          ),
          <fpage>210</fpage>
          -
          <lpage>218</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lachinov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Getmanskaya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turlapov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Cephalometric landmark regression with convolutional neural networks on 3d computed tomography data</article-title>
          .
          <source>ArXiv</source>
          abs/2007.10052 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Nibali</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morgan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prendergast</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Numerical coordinate regression with convolutional neural networks</article-title>
          .
          <source>ArXiv</source>
          abs/1801.07372 (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Luvizon</surname>
            ,
            <given-names>D.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tabia</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Picard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Human pose regression by combining indirect part detection and contextual information</article-title>
          .
          <source>Comput. Graph.</source>
          <volume>85</volume>
          ,
          <fpage>15</fpage>
          -
          <lpage>22</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <source>OpenCV, Open Source Computer Vision Library</source>
          .
          <uri>http://opencv.org</uri>
          , last accessed 30 Jun
          <year>2020</year>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Ulyanov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vedaldi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lempitsky</surname>
            ,
            <given-names>V.S.</given-names>
          </string-name>
          :
          <article-title>Instance normalization: The missing ingredient for fast stylization</article-title>
          .
          <source>ArXiv</source>
          abs/1607.08022 (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollár</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Aggregated residual transformations for deep neural networks</article-title>
          .
          <source>2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          pp.
          <fpage>5987</fpage>
          -
          <lpage>5995</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Group normalization</article-title>
          .
          In:
          <source>ECCV</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <article-title>Issue about slow trilinear upscaling interpolation in OpenVINO</article-title>
          .
          Last accessed 30 Jun
          <year>2020</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>