=Paper=
{{Paper
|id=Vol-2655/paper22
|storemode=property
|title=Towards deep learning reliable gender estimation from dental panoramic radiographs
|pdfUrl=https://ceur-ws.org/Vol-2655/paper22.pdf
|volume=Vol-2655
|authors=Nicolás Vila Blanco, Raquel Rodríguez Vilas, María José Carreira Nouche, Inmaculada Tomás Carmona
|dblpUrl=https://dblp.org/rec/conf/ecai/BlancoVNC20
}}
==Towards deep learning reliable gender estimation from dental panoramic radiographs==
differences in the mandible, principally in some variables such as the bicondylar and ramus width or the gonial angle [VMGA13]. All these findings provided useful tools to determine the gender of a subject through relatively simple measurements in the oral cavity, which can be carried out in situ (directly on the bone) or with the help of imaging technologies such as X-ray or CT. As can be seen in Table 1, the majority of these sex estimation methods rely on mandibular measurements, which varied in number from 3 to 12. With this approach, the obtained accuracy ranged from 70 to 95%. The second most common approach is gender estimation through the Mandibular Canine Index (MCI), whose accuracy varied from 64 to 86%. However, these methods rely on measurements that have to be taken manually by one or several well-trained experts and are therefore time-consuming. They are also subject to inter- and intra-observer disagreement, which ultimately leads to reproducibility problems.

This is the main reason why computer-assisted approaches have been adopted in some clinical procedures, such as prosthesis design [ARR+17]. In particular, image processing techniques have proven very useful in oral-related assessment, tackling tasks such as mandible segmentation [AKM15], teeth outlining [VBCL+18] or disease diagnosis [RGD13]. In recent years, the increasing number of medical images, as well as the increasing computing power, have contributed to the development of more sophisticated machine learning approaches, such as Deep Neural Networks (DNNs). These methods have already been used to process dental images, with numerous successful showcases [VBCVQ+20, LHK+19, YGTY18]. Specifically, a recent work proposed the use of DNNs for gender estimation [MVGS19], with a top accuracy of 97%. The vast majority of these works focus on determining the sex of subjects older than 20, mainly because the permanent teeth are already developed and there are size-related anatomical features in the mature state which allow for a more accurate gender prediction. This is in line with the findings of our previous work, where a DNN architecture was proposed to estimate chronological age and sex [VBCVQ+20].

In this work, a comparison of three DNN architectures is proposed to determine the sex of a subject from a dental panoramic image (OPG), focusing on the influence of the patient's age on the prediction accuracy. Furthermore, the results for the group younger than 20, unexplored in [MVGS19], have been analysed.
Table 1: Performance of methods for gender estimation (M: males; F: females).
{| class="wikitable"
! Reference !! Sample !! Age range !! Method !! Accuracy
|-
| [FOOD06] || 40 (20M/20F) || 20-48 || Mandible measurements (10) || 95%
|-
| [AM08] || 53 (31M/22F) || 19-28 || Tooth crown measurements (buccolingual and mesiodistal) || 64-83%
|-
| [MSM10] || 200 (100M/100F) || 18-25 || Mandibular Canine Index (MCI) || 76%
|-
| [MPR13] || 200 (100M/100F) || 20-86 || Mandible measurements (3) || 84%
|-
| [BOTA15] || 419 (126M/293F) || 13-26 || Mandibular measurements (method from [LH96]) || 70.9%
|-
| [SGP+15] || 100 (45M/55F) || 20-30 || Mandibular Canine Index (MCI) || 85.5%
|-
| [SPG+16] || 120 (50M/70F) || 16-30 || Mandibular Canine Index (MCI) || 64.2%
|-
| [AIB+18] || 79 (48M/31F) || 18-74 || Mandible measurements (12) || 78.5%
|-
| [MVGS19] || 4000 (2352M/1648F) || 19-85 || Deep Neural Network || 96.7%
|-
| [VBCVQ+20] || 2289 (1030M/1257F) || 4.5-89.2 || Deep Neural Network || 85.4%
|}

===2 Material and Methods===

In this work, a set of 3400 OPG images provided by the School of Medicine and Dentistry, Universidade de Santiago de Compostela (Spain) was used. The images were collected under the approval of the ethical committee of the same university. The patients are distributed homogeneously in terms of sex and concentrated in the age groups between 5 and 30 (see Table 2).

Table 2: Age and sex distribution of the dataset.
{| class="wikitable"
! Age group !! Men !! Women !! Total
|-
| [5,10) || 256 || 254 || 510 (15%)
|-
| [10,20) || 595 || 606 || 1201 (35%)
|-
| [20,30) || 232 || 438 || 670 (20%)
|-
| [30,40) || 118 || 183 || 301 (9%)
|-
| [40,50) || 109 || 135 || 244 (7%)
|-
| [50,60) || 93 || 139 || 232 (7%)
|-
| [60,70) || 77 || 77 || 154 (4%)
|-
| [70,90) || 35 || 53 || 88 (3%)
|-
| Total || 1515 (45%) || 1885 (55%) || 3400 (100%)
|}

To build the sex classification system, three different CNN approaches have been compared. Firstly, the DASNet (Dental Age and Sex Network) architecture proposed in [VBCVQ+20] was evaluated. This method consists of a main CNN path to estimate the chronological age, plus a second, identical path added to classify the images according to sex and thus extract gender-dependent features. By propagating those gender features to intermediate stages of the age path, the method can improve the age predictions. Although gender classification was not its main objective, it achieves state-of-the-art results.

The second tested approach is an adaptation of the previous one. Given that the main objective of DASNet was to integrate gender features to improve the chronological age estimation, a new version with the roles inverted was developed (referred to as Dental Sex and Age Network, or DSANet). In this case, the main CNN path corresponds to the gender classifier, and the auxiliary path is designed to regress the chronological age and thus learn maturational features, which are propagated to the gender path in order to improve the sex classification.
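The exact layer configuration of DASNet and DSANet is specified in [VBCVQ+20]; the sketch below is only a schematic Keras illustration of the dual-path idea described above (a main sex-classification path receiving intermediate features from an auxiliary age-regression path). The number of blocks, all layer widths and the loss weighting are placeholders, not the published design.

```python
# Schematic sketch of a DSANet-style dual-path network: an auxiliary age-regression
# path whose intermediate feature maps are concatenated into the corresponding
# stages of the main sex-classification path. Sizes are illustrative placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    return layers.MaxPooling2D(2)(x)

inputs = layers.Input(shape=(256, 512, 1))

# Auxiliary path: learns maturational features by regressing chronological age.
a1 = conv_block(inputs, 16)
a2 = conv_block(a1, 32)
a3 = conv_block(a2, 64)
age_output = layers.Dense(1, name="age")(layers.GlobalAveragePooling2D()(a3))

# Main path: sex classification, with the auxiliary features propagated into its
# intermediate stages (here via simple channel-wise concatenation).
s1 = conv_block(inputs, 16)
s2 = conv_block(layers.Concatenate()([s1, a1]), 32)
s3 = conv_block(layers.Concatenate()([s2, a2]), 64)
sex_output = layers.Dense(1, activation="sigmoid", name="sex")(
    layers.GlobalAveragePooling2D()(layers.Concatenate()([s3, a3])))

model = models.Model(inputs, [sex_output, age_output])
model.compile(optimizer="adadelta",
              loss={"sex": "binary_crossentropy", "age": "mse"},
              loss_weights={"sex": 1.0, "age": 0.5})  # weighting is a placeholder
```

Swapping the roles of the two outputs (age as the main task, sex as the auxiliary one) gives the DASNet-style configuration.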
The third evaluated method is the VGG16 architecture [SZ14], which has been slightly modified (see Fig. 1). First, the input size of the first layer of the network was changed from 224x224 to 512 pixels in width by 256 pixels in height. Second, two convolutional blocks were added at the beginning of the network, each composed of a convolutional layer with a ReLU activation and a Batch Normalisation layer [IS15]. As each of these convolutional layers is designed to provide 3 output feature maps, the single-channel X-ray images can be transformed into the three-channel images expected by VGG16. Moreover, a last block of convolutional, Batch Normalisation and 2x2 pooling layers was added on top of the network to reduce the output size. Finally, the fully connected part of the original VGG16 architecture was simplified to a single 128-neuron layer and a single output giving the probability that the input image belongs to the positive class (female). All the convolutional layers contain 3x3 kernels in order to match the VGG16 behaviour. The layers corresponding to the VGG16 backbone were initialised with weights pre-trained on the ImageNet dataset [DDS+09], and only the last block of the network was allowed to be modified during training.

Figure 1: VGG16-based proposed architecture.
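As an illustration, the following Keras sketch reconstructs the modified VGG16 from the description above; it is not the authors' code. The filter count of the final added block, the trainability of the new stem blocks and dense head, and the binary cross-entropy loss are assumptions.

```python
# Minimal sketch of the modified VGG16 described above (illustrative reconstruction).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_modified_vgg16(input_shape=(256, 512, 1)):
    inputs = layers.Input(shape=input_shape)

    # Two added convolutional blocks (Conv + ReLU + BatchNorm), each producing
    # 3 feature maps, so the single-channel OPG becomes a 3-channel input.
    x = layers.Conv2D(3, 3, padding="same", activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(3, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)

    # VGG16 backbone initialised with ImageNet weights and kept frozen.
    backbone = tf.keras.applications.VGG16(include_top=False,
                                           weights="imagenet",
                                           input_shape=(256, 512, 3))
    backbone.trainable = False
    x = backbone(x)

    # Extra trainable block (Conv + BatchNorm + 2x2 pooling) to reduce the output
    # size; 512 filters is an assumption, the paper does not state the width.
    x = layers.Conv2D(512, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=2)(x)

    # Simplified fully connected part: one 128-neuron layer and a single
    # sigmoid output giving P(female).
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(inputs, outputs)

model = build_modified_vgg16()
model.compile(optimizer=tf.keras.optimizers.Adadelta(),
              loss="binary_crossentropy", metrics=["accuracy"])  # loss assumed
```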
===3 Experiments and results===

The networks were trained using the Adadelta optimiser, as it adapts the learning rate automatically [Zei12], and the batch size was set empirically to 16 as a good compromise between efficiency and regularisation effect. All the batches were transformed to increase the variability of the data and thus improve the performance of the network. The first column of Table 3 shows the transformations applied: translation in both axes, rotation, and contrast and brightness changes. The brightness was disturbed according to the power-law transform, where each pixel value in the image is raised to a given factor. The contrast was changed according to the formula f · (I − 0.5) + 0.5, where f is the factor of the transformation and I is the input image. After the transformations, pixel values under 0 or over 1 are clipped to preserve the [0,1] range. The second column of Table 3 shows the factor, which defines the boundaries of the uniform distribution used to draw the transformation parameters. The third column represents the probability of a batch being affected by the transformation.

Table 3: Data augmentation transformations used to make the training set more diverse and thus improve the network capabilities.
{| class="wikitable"
! Transformation !! Factor !! Probability
|-
| Horizontal flip || - || 0.5
|-
| Translation X || (-10,10) pixels || 1
|-
| Translation Y || (-8,8) pixels || 1
|-
| Rotation || (-1,1) degrees || 1
|-
| Contrast || (0.8,1.2)x || 0.8
|-
| Brightness || (0.8,1.2)x || 0.8
|}
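For illustration, the following NumPy/SciPy sketch applies the transformations of Table 3 with the stated factors and probabilities (brightness as a power-law transform, contrast as f · (I − 0.5) + 0.5, then clipping to [0,1]). It is a re-implementation applied per image rather than per batch, not the authors' code, and assumes images normalised to [0,1].

```python
# Illustrative data augmentation following Table 3 (per-image, not per-batch).
import numpy as np
from scipy.ndimage import rotate, shift

rng = np.random.default_rng()

def augment(img):
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        img = img[:, ::-1]
    # Translation in Y (+-8 px) and X (+-10 px), applied to every image.
    img = shift(img, (rng.uniform(-8, 8), rng.uniform(-10, 10)), mode="nearest")
    # Rotation of up to +-1 degree.
    img = rotate(img, rng.uniform(-1, 1), reshape=False, mode="nearest")
    # Brightness: power-law transform, each pixel raised to a random factor.
    if rng.random() < 0.8:
        img = np.power(img, rng.uniform(0.8, 1.2))
    # Contrast: f * (I - 0.5) + 0.5 with f drawn uniformly from (0.8, 1.2).
    if rng.random() < 0.8:
        img = rng.uniform(0.8, 1.2) * (img - 0.5) + 0.5
    # Clip back to [0, 1] after the intensity changes.
    return np.clip(img, 0.0, 1.0)

augmented = augment(np.random.rand(256, 512))  # example on a random image
```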
All the experiments to assess the performance of the gender estimation networks were carried out with 8-fold cross-validation. The set of images was divided into 8 parts or folds, with all parts distributed in a similar way according to the age and the sex of the patients. The model was then trained iteratively, using in each iteration 6 folds to train, 1 fold to validate the training and 1 fold to obtain the performance metrics. The folds used for each task are changed in each CV iteration, so the performance of the model can be averaged over the whole dataset.

The performance of the networks was assessed with four different metrics. Firstly, the accuracy gives a general idea of the proportion of correctly classified images. Then, the sensitivity and specificity metrics provide the percentage of well-classified female and male images, respectively. To calculate these three measurements, we set a threshold of 0.5 to decide if the predicted class is positive (female) or negative (male). Finally, the Area Under the ROC Curve (AUC) combines the sensitivity and specificity obtained for every possible classification threshold (not only 0.5), so it is useful to evaluate the robustness of the network.
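The four metrics and the stratified 8-fold split can be computed as in the sketch below (an illustrative scikit-learn implementation, not the authors' evaluation code). The combined age-group/sex stratification label and the synthetic data are assumptions so that the snippet runs stand-alone.

```python
# Illustrative evaluation: accuracy, sensitivity, specificity and AUC at a 0.5
# threshold (female = positive class), plus an 8-fold split stratified on a
# combined age-group/sex label. Synthetic data is used for the example.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def evaluate(y_true, y_prob, threshold=0.5):
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),        # well-classified female images
        "specificity": tn / (tn + fp),        # well-classified male images
        "auc": roc_auc_score(y_true, y_prob)  # threshold-independent
    }

rng = np.random.default_rng(0)
ages = rng.integers(5, 90, size=3400)
sexes = rng.integers(0, 2, size=3400)             # 1 = female, 0 = male
strata = [f"{a // 10}_{s}" for a, s in zip(ages, sexes)]

# 8 folds with similar age and sex distributions; in each CV iteration 6 folds
# train the network, 1 fold is used for validation and 1 fold for testing.
skf = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
for train_val_idx, test_idx in skf.split(np.zeros(3400), strata):
    pass  # train on 6 of the remaining folds, validate on the other one

print(evaluate(sexes, rng.random(3400)))  # random scores give AUC close to 0.5
```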
Table 4 shows the prediction results independently for each age group (following the same division as in Table 2 for people older than 20 and a finer division for younger people) and for each network. With DASNet, the accuracy of the classification goes beyond 90% in every age group older than 16 years of age. In younger people, the performance decreases down to 75% in the range 5-8, and the accuracy peak is reached in the group 18-20 (96.24%). The female images are classified better in 9 of the 13 evaluated age groups; the most noticeable difference appears in the group 5-8, with 12% in favour of the male images (greater specificity). The AUC exceeds 80% in every group, reaching a top value of 98.23% in the group 18-20.

The results of DSANet show an accuracy of over 83% in subjects older than 8 years of age and over 90% beyond 16, with the top accuracy of 96.68% obtained between 30 and 40. The performance is better on the images of females in every age group, the greatest difference (12%) appearing between 40 and 50 years of age. The AUC stays above 90% in every age group but the youngest, reaching a value of almost 99% in the range 30-40.

With VGG16, the gender estimation accuracy exceeds 90% in people aged from 16 to 60, peaking at 94% in the 30-40 age range. The performance starts to decrease in people older than 60, with an accuracy of 88%, falling to 83% in the oldest age group. The performance in younger groups stays between 84% and 90% in the 8-16 age range, and is 70% for subjects younger than 8. The images of females are classified better in almost every age group, which is especially noticeable in the groups 16-18 and 18-20 (sensitivity values of 96.88% and 96%). The most considerable differences between the performance on female and male images occur in the groups 12-14 and 16-18 (differences of 13%), while the most balanced predictions are made in the group 50-60 (difference of 0.03%). The AUC falls below 78% in the youngest age group, but exceeds 94% between 14 and 70 years of age.

Table 4: Gender estimation results by age group. Values in bold correspond to the most accurate network according to each specific metric.
{| class="wikitable"
! Age group !! Method !! Accuracy !! Sensitivity !! Specificity !! AUC
|-
| rowspan="3" | [5,8) || DASNet || '''75.00%''' || 68.29% || '''80.39%''' || '''82.80%'''
|-
| DSANet || 70.11% || '''70.73%''' || 69.61% || 80.06%
|-
| VGG16 || 70.65% || 69.51% || 71.57% || 77.91%
|-
| rowspan="3" | [8,10) || DASNet || 77.19% || 74.71% || 80.00% || 85.63%
|-
| DSANet || '''83.75%''' || '''84.12%''' || '''83.33%''' || '''89.99%'''
|-
| VGG16 || 81.88% || 82.94% || 80.67% || 77.91%
|-
| rowspan="3" | [10,12) || DASNet || 79.89% || 82.11% || 77.53% || 88.03%
|-
| DSANet || '''84.24%''' || '''86.84%''' || '''81.46%''' || '''91.36%'''
|-
| VGG16 || 81.97% || 84.74% || 78.65% || 89.39%
|-
| rowspan="3" | [12,14) || DASNet || 86.04% || 89.56% || 82.24% || 92.96%
|-
| DSANet || '''87.18%''' || 88.46% || '''85.80%''' || '''94.00%'''
|-
| VGG16 || 86.61% || '''92.86%''' || 79.88% || 92.28%
|-
| rowspan="3" | [14,16) || DASNet || 88.07% || '''92.63%''' || 84.55% || 94.95%
|-
| DSANet || 88.99% || 90.53% || '''87.80%''' || 95.75%
|-
| VGG16 || '''89.91%''' || '''92.63%''' || '''87.80%''' || '''95.88%'''
|-
| rowspan="3" | [16,18) || DASNet || 90.08% || 90.63% || 89.55% || 96.22%
|-
| DSANet || '''94.65%''' || 95.31% || '''94.02%''' || '''98.71%'''
|-
| VGG16 || 90.08% || '''96.88%''' || 83.58% || 98.60%
|-
| rowspan="3" | [18,20) || DASNet || '''96.24%''' || 96.00% || '''96.55%''' || '''98.23%'''
|-
| DSANet || '''96.24%''' || '''98.67%''' || 93.01% || 98.14%
|-
| VGG16 || 93.98% || 96.00% || 91.38% || 95.89%
|-
| rowspan="3" | [20,30) || DASNet || 89.40% || 92.23% || 84.05% || 95.02%
|-
| DSANet || '''91.79%''' || '''93.38%''' || '''88.79%''' || '''96.57%'''
|-
| VGG16 || 90.30% || 92.47% || 86.21% || 95.40%
|-
| rowspan="3" | [30,40) || DASNet || 93.02% || 93.44% || 92.37% || 97.60%
|-
| DSANet || '''96.68%''' || '''97.81%''' || '''94.91%''' || '''98.96%'''
|-
| VGG16 || 94.02% || 95.08% || 92.37% || 97.04%
|-
| rowspan="3" | [40,50) || DASNet || 89.34% || 91.85% || 86.24% || 93.72%
|-
| DSANet || '''93.03%''' || '''98.52%''' || 86.24% || '''96.03%'''
|-
| VGG16 || 91.39% || 94.07% || '''88.07%''' || 95.51%
|-
| rowspan="3" | [50,60) || DASNet || 89.22% || 87.05% || '''92.47%''' || 95.50%
|-
| DSANet || '''94.40%''' || '''96.40%''' || 91.40% || '''97.83%'''
|-
| VGG16 || 91.38% || 91.37% || 91.40% || 97.19%
|-
| rowspan="3" | [60,70) || DASNet || '''89.61%''' || 92.20% || '''87.02%''' || 93.77%
|-
| DSANet || 88.96% || '''92.21%''' || 85.71% || 94.60%
|-
| VGG16 || 88.31% || 89.61% || 87.01% || '''94.70%'''
|-
| rowspan="3" | [70,90) || DASNet || 88.63% || 88.68% || '''88.57%''' || 95.36%
|-
| DSANet || '''90.80%''' || '''94.23%''' || 85.71% || '''96.32%'''
|-
| VGG16 || 82.95% || 84.91% || 80.00% || 88.84%
|}

When comparing the three networks side by side, all the metrics are highly correlated. When focusing on subjects older than 20, DSANet outperforms DASNet and VGG16 in almost every case, with a substantial accuracy difference of 4-8% in the group 70-90. In terms of sensitivity/specificity, DSANet classifies the female images better in every case, with the largest margin appearing in the group 70-90 (6% with respect to DASNet and 10% with respect to VGG16). The classification of images of males is carried out better by DASNet in subjects older than 60, by DSANet in people between 20 and 40 years of age and by VGG16 in the remaining group (40 to 50). In general terms, DSANet produced the highest AUC, although the differences are normally lower than 2% (except for the group 70-90, where VGG16 performs worse by a large margin).

Although the accuracy of the methods is lower in people younger than 20, the performance metrics follow an improvement pattern throughout that period. As can be seen in Fig. 2, there is a jump at about 8 years of age which is especially noticeable in DSANet and VGG16, where the accuracy improves from 70 to 84% and from 71 to 82%, respectively. The best balance between sensitivity and specificity is obtained by DSANet in people younger than 18, and by DASNet in people between 18 and 20 years of age. Regarding AUC values, the greatest difference appears when classifying images of children aged from 8 to 10: 90% with DSANet, 86% with DASNet and only 78% with VGG16.

Figure 2: Evolution of the classification metrics in subjects from seven age groups, ranging from 5 up to 20 years old ((a) DASNet, (b) DSANet, (c) VGG16).

===4 Discussion and Conclusion===

In this work, three different deep learning architectures based on Convolutional Neural Networks have been used to tackle gender estimation from dental panoramic images. The first, DASNet, is a network architecture proposed in our previous work [VBCVQ+20], conceived to estimate the chronological age by combining both maturational- and gender-dependent features. The second is a proposed adaptation of DASNet (called DSANet) where the main objective moves from age estimation to gender classification, under the same idea of combining maturational and sexual features. The third is an adaptation of VGG16 pre-trained on the ImageNet dataset, an approach which has already demonstrated good results in another gender estimation method [MVGS19].

The results of all the networks show a strong correlation, performing better in young adults (18 to 20 years of age) and middle-aged adults (around 30-40). Also, they tend to classify the images of females better (higher sensitivity). The networks have also proved to obtain robust predictions (in terms of AUC), regardless of the specific threshold used to decide whether the output probability produces a female or a male classification. Although DSANet provides better results in general, it is noticeable that DASNet outperforms it in subjects younger than 8 by a significant margin. VGG16 performs better than the others in people aged from 14 to 16, but it tends to obtain worse results in general terms. We believe this has to do with the fact that the DASNet and DSANet architectures combine maturational and sexual features, and thus the network can learn in a more structured way. In general, the results support the fact that it is quite challenging to determine the sex of people younger than 16 by looking only at the oral cavity. In the youngest age groups, the images show great variability, caused by the presence of mixed dentition stages and the heterogeneity in mandibular growth patterns [FMA+15, MO14], and thus the networks cannot go beyond 90% accuracy. More research focused on these age groups should be conducted to improve these sex-prediction findings.

In conclusion, all the networks provide reliable predictions of sex, with DSANet being the most accurate in the majority of the age groups. The suitability of this approach is especially relevant in patients older than 16 years, with accuracies between 90 and 96.2%. Although the performance decreases in younger people, the method is still useful in subjects older than 8 when combined with other radiological methods, with accuracies over 83%, demonstrating the usefulness of automatic approaches in sex prediction.

===Acknowledgements===

This work has received financial support from the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2019-2022 ED431G-2019/04, 2017-2020 Potential Growth Group ED431B 2017/029, 2017-2020 Competitive Reference Group ED431C 2017/69, and N Vila-Blanco support ED481A-2017) and the European Regional Development Fund (ERDF), which acknowledges the CiTIUS-Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.

===References===

* [AIB+18] A Alias, AN Ibrahim, SNA Bakar, MS Shafie, S Das, N Abdullah, HM Noor, IY Liao, and FM Nor. Anthropometric analysis of mandible: an important step for sex determination. La Clinica Terapeutica, 169(5):e217–e223, 2018.
* [AKM15] AH Abdi, S Kasaei, and M Mehdizadeh. Automatic segmentation of mandible in panoramic x-ray. Journal of Medical Imaging, 2(4):044003, 2015.
* [AM08] AB Acharya and S Mainali. Sex discrimination potential of buccolingual and mesiodistal tooth dimensions. Journal of Forensic Sciences, 53(4):790–792, 2008.
* [ARR+17] DC Ackland, D Robinson, M Redhead, PVS Lee, A Moskaljuk, and G Dimitroulis. A personalized 3d-printed prosthetic joint replacement for the human temporomandibular joint: From implant design to implantation. J Mechanical Behavior of Biomedical Materials, 69:404–411, 2017.
* [BDR+12] MF Bilfeld, F Dedouit, H Rousseau, N Sans, J Braga, D Rougé, and N Telmon. Human coxal bone sexual dimorphism and multislice computed tomography: geometric morphometric analysis of 65 adults. Journal of Forensic Sciences, 57(3):578–588, 2012.
* [BOTA15] DH Badran, DA Othman, HW Thnaibat, and WM Amin. Predictive accuracy of mandibular ramus flexure as a morphologic indicator of sex dimorphism in Jordanians. International Journal of Morphology, 33(4), 2015.
* [CEV+11] D Charisi, C Eliopoulos, V Vanna, CG Koilias, and SK Manolis. Sexual dimorphism of the arm bones in a modern Greek population. Journal of Forensic Sciences, 56(1):10–18, 2011.
* [DDS+09] J Deng, W Dong, R Socher, L-J Li, K Li, and L Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conf. on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
* [FMA+15] MFN Feres, TS Muniz, SH de Andrade, M Lemos, and SSN Pignatari. Craniofacial skeletal pattern: is it really correlated with the degree of adenoid obstruction? Dental Press Journal of Orthodontics, 20(4):68–75, 2015.
* [FOOD06] D Franklin, P O'Higgins, CE Oxnard, and I Dadour. Determination of sex in South African blacks by discriminant function analysis of mandibular linear dimensions. Forensic Science, Medicine, and Pathology, 2(4):263–268, 2006.
* [HC12] SM Harris and DT Case. Sexual dimorphism in the tarsal bones: implications for sex determination. Journal of Forensic Sciences, 57(2):295–305, 2012.
* [IS15] S Ioffe and C Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
* [LH96] SR Loth and M Henneberg. Mandibular ramus flexure: a new morphologic indicator of sexual dimorphism in the human skeleton. American Journal of Physical Anthropology, 99(3):473–485, 1996.
* [LHK+19] J-H Lee, S-S Han, YH Kim, C Lee, and I Kim. Application of a fully deep convolutional neural network to the automation of tooth segmentation on panoramic radiographs. Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology, 2019.
* [Liv03] H Liversidge. Variation in modern human dental development. Cambridge Studies in Biological and Evolutionary Anthropology, pages 73–113, 2003.
* [MO14] ICL Muñoz and PB Orta. Comparison of cephalometric patterns in mouth breathing and nose breathing children. International Journal of Pediatric Otorhinolaryngology, 78(7):1167–1172, 2014.
* [MPR13] M Marinescu, V Panaitescu, and M Rosu. Sex determination in Romanian mandible using discriminant function analysis: Comparative results of a time-efficient method. Rom J Leg Med, 21(4):305–8, 2013.
* [MSM10] IA Mughal, AS Saqib, and F Manzur. Mandibular canine index (MCI). The Professional Medical Journal, 17(03):459–463, 2010.
* [MVGS19] D Milošević, M Vodanović, I Galić, and M Subašić. Estimating biological gender from panoramic dental x-ray images. In 2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA), pages 105–110. IEEE, 2019.
* [PPZP12] DH Parekh, SV Patel, AZ Zalawadia, and SM Patel. Odontometric study of maxillary canine teeth to establish sexual dimorphism in Gujarat population. Int J Biological and Medical Research, 3(3):1935–7, 2012.
* [RGD13] MG Roberts, J Graham, and H Devlin. Image texture in dental panoramic radiographs as a potential biomarker of osteoporosis. IEEE Trans. Biomedical Engineering, 60(9):2384–2392, 2013.
* [SD05] GT Schwartz and MC Dean. Sexual dimorphism in modern human permanent teeth. American Journal of Physical Anthropology, 128(2):312–317, 2005.
* [SGP+15] SK Singh, A Gupta, B Padmavathi, S Kumar, S Roy, A Kumar, et al. Mandibular canine index: A reliable predictor for gender identification using study cast in Indian population. Indian Journal of Dental Research, 26(4):396, 2015.
* [SPG+16] AM Silva, ML Pereira, S Gouveia, JN Tavares, A Azevedo, and IM Caldas. A new approach to sex estimation using the mandibular canine index. Medicine, Science and the Law, 56(1):7–12, 2016.
* [SZ14] K Simonyan and A Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
* [VBCL+18] N Vila-Blanco, TF Cootes, C Lindner, I Tomás, and MJ Carreira. Fully automatic teeth segmentation in adult OPG images. In International Workshop on Computational Methods and Clinical Applications in Musculoskeletal Imaging, pages 11–21. Springer, 2018.
* [VBCVQ+20] N Vila-Blanco, MJ Carreira, P Varas-Quintana, C Balsa-Castro, and I Tomás. Deep neural networks for chronological age estimation from OPG images. IEEE Trans. Medical Imaging, 2020.
* [VMGA13] G Vinay, SR Mangala Gowri, and J Anbalagan. Sex determination of human mandible using metrical parameters. Journal of Clinical and Diagnostic Research: JCDR, 7(12):2671, 2013.
* [YGTY18] M Yan, J Guo, W Tian, and Z Yi. Symmetric convolutional neural network for mandible segmentation. Knowledge-Based Systems, 159:63–71, 2018.
* [Zei12] MD Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.