=Paper=
{{Paper
|id=Vol-2655/paper22
|storemode=property
|title=Towards deep learning reliable gender estimation from dental panoramic radiographs
|pdfUrl=https://ceur-ws.org/Vol-2655/paper22.pdf
|volume=Vol-2655
|authors=Nicolás Vila Blanco, Raquel Rodríguez Vilas, María José Carreira Nouche, Inmaculada Tomás Carmona
|dblpUrl=https://dblp.org/rec/conf/ecai/BlancoVNC20
}}
==Towards deep learning reliable gender estimation from dental panoramic radiographs==
differences in the mandible, principally in some variables such as the bicondylar and ramus width or the gonial angle [VMGA13]. All these findings provided useful tools to determine the gender of a subject through relatively simple measurements in the oral cavity, which can be carried out in situ (directly on the bone) or with the help of imaging technologies such as X-ray or CT. As can be seen in Table 1, the majority of these sex estimation methods rely on mandibular measurements, which varied in number from 3 to 12. With this approach, the obtained accuracy ranged from 70 to 95%. The second most common approach is gender estimation through the Mandibular Canine Index (MCI), whose accuracy varied from 64 to 86%. However, these methods rely on measurements that have to be taken manually by one or several well-trained experts and are therefore time-consuming. They are also subject to inter- and intra-observer disagreement, which ultimately leads to reproducibility problems.

This is the main reason why computer-assisted approaches have been adopted in some clinical procedures, such as prosthesis design [ARR+17]. In particular, image processing techniques have proven very useful in oral-related assessment, tackling tasks such as mandible segmentation [AKM15], teeth outlining [VBCL+18] or disease diagnosis [RGD13]. In recent years, the increasing number of medical images, as well as the increasing computing power, have contributed to the development of more sophisticated machine learning approaches, such as Deep Neural Networks (DNNs). These methods have already been used to process dental images, with numerous successful showcases [VBCVQ+20, LHK+19, YGTY18]. Specifically, a recent work proposed the use of DNNs for gender estimation [MVGS19], with a top accuracy of 97%. The vast majority of these works focus on determining the sex of subjects older than 20, mainly because the permanent teeth are already developed and there are size-related anatomical features in the mature state which allow for a more accurate gender prediction. This is in line with the findings of our previous work, where a DNN architecture was proposed to estimate chronological age and sex [VBCVQ+20].

In this work, a comparison of three DNN architectures is proposed to determine the sex of a subject from a dental panoramic image (OPG), focusing on the influence of the patient's age on the prediction accuracy. Furthermore, the results for the group younger than 20, unexplored in [MVGS19], have been analysed.
Table 1: Performance of methods for gender estimation (M: males; F: females).
{| class="wikitable"
! Reference !! Sample !! Age range !! Method !! Accuracy
|-
| [FOOD06] || 40 (20M/20F) || 20-48 || Mandible measurements (10) || 95%
|-
| [AM08] || 53 (31M/22F) || 19-28 || Tooth crown measurements (buccolingual and mesiodistal) || 64-83%
|-
| [MSM10] || 200 (100M/100F) || 18-25 || Mandibular Canine Index (MCI) || 76%
|-
| [MPR13] || 200 (100M/100F) || 20-86 || Mandible measurements (3) || 84%
|-
| [BOTA15] || 419 (126M/293F) || 13-26 || Mandibular measurements (method from [LH96]) || 70.9%
|-
| [SGP+15] || 100 (45M/55F) || 20-30 || Mandibular Canine Index (MCI) || 85.5%
|-
| [SPG+16] || 120 (50M/70F) || 16-30 || Mandibular Canine Index (MCI) || 64.2%
|-
| [AIB+18] || 79 (48M/31F) || 18-74 || Mandible measurements (12) || 78.5%
|-
| [MVGS19] || 4000 (2352M/1648F) || 19-85 || Deep Neural Network || 96.7%
|-
| [VBCVQ+20] || 2289 (1030M/1257F) || 4.5-89.2 || Deep Neural Network || 85.4%
|}

===2 Material and Methods===

In this work, a set of 3400 OPG images provided by the School of Medicine and Dentistry, Universidade de Santiago de Compostela (Spain) was used. The images were collected under the approval of the ethical committee of the same university. The patients are distributed homogeneously in terms of sex and concentrated in the age groups between 5 and 30 (see Table 2).

Table 2: Age and sex distribution of the dataset.
{| class="wikitable"
! Age group !! Men !! Women !! Total
|-
| [5,10) || 256 || 254 || 510 (15%)
|-
| [10,20) || 595 || 606 || 1201 (35%)
|-
| [20,30) || 232 || 438 || 670 (20%)
|-
| [30,40) || 118 || 183 || 301 (9%)
|-
| [40,50) || 109 || 135 || 244 (7%)
|-
| [50,60) || 93 || 139 || 232 (7%)
|-
| [60,70) || 77 || 77 || 154 (4%)
|-
| [70,90) || 35 || 53 || 88 (3%)
|-
| Total || 1515 (45%) || 1885 (55%) || 3400 (100%)
|}

To build the sex classification system, three different CNN approaches have been compared. Firstly, the DASNet (Dental Age and Sex Network) architecture proposed in [VBCVQ+20] was evaluated. This method consists of a main CNN path to estimate the chronological age, plus a second, identical path added to classify the images according to sex and thus extract gender-dependent features. By propagating those gender features to intermediate stages of the age path, the method can improve the age predictions. Although gender classification was not its main objective, it achieves state-of-the-art results.

The second tested approach is an adaptation of the previous one. Given that the main objective of DASNet was to integrate gender features to improve the chronological age estimation, a new version with the roles inverted was developed (referred to as Dental Sex and Age Network, or DSANet). In this case, the main CNN path corresponds to the gender classifier, and the auxiliary path is designed to regress the chronological age and thus learn maturational features, which are propagated to the gender path in order to improve the sex classification.
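The exact layer configuration of DASNet and DSANet is specified in [VBCVQ+20]; the sketch below is only a schematic Keras illustration of the dual-path idea described above (a main sex-classification path receiving intermediate features from an auxiliary age-regression path). The number of blocks, all layer widths and the loss weighting are placeholders, not the published design.

```python
# Schematic sketch of a DSANet-style dual-path network: an auxiliary age-regression
# path whose intermediate feature maps are concatenated into the corresponding
# stages of the main sex-classification path. Sizes are illustrative placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    return layers.MaxPooling2D(2)(x)

inputs = layers.Input(shape=(256, 512, 1))

# Auxiliary path: learns maturational features by regressing chronological age.
a1 = conv_block(inputs, 16)
a2 = conv_block(a1, 32)
a3 = conv_block(a2, 64)
age_output = layers.Dense(1, name="age")(layers.GlobalAveragePooling2D()(a3))

# Main path: sex classification, with the auxiliary features propagated into its
# intermediate stages (here via simple channel-wise concatenation).
s1 = conv_block(inputs, 16)
s2 = conv_block(layers.Concatenate()([s1, a1]), 32)
s3 = conv_block(layers.Concatenate()([s2, a2]), 64)
sex_output = layers.Dense(1, activation="sigmoid", name="sex")(
    layers.GlobalAveragePooling2D()(layers.Concatenate()([s3, a3])))

model = models.Model(inputs, [sex_output, age_output])
model.compile(optimizer="adadelta",
              loss={"sex": "binary_crossentropy", "age": "mse"},
              loss_weights={"sex": 1.0, "age": 0.5})  # weighting is a placeholder
```

Swapping the roles of the two outputs (age as the main task, sex as the auxiliary one) gives the DASNet-style configuration.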
The third evaluated method is the VGG16 architecture [SZ14], which has been slightly modified (see Fig. 1). First, the input size of the first layer of the network was changed from 224x224 to 512 pixels in width by 256 pixels in height. Second, two convolutional blocks were added at the beginning of the network, each composed of a convolutional layer with a ReLU activation and a Batch Normalisation layer [IS15]. As each of these convolutional layers is designed to provide 3 output feature maps, the single-channel X-ray images can be transformed into the three-channel images expected by VGG16. Moreover, a last block of convolutional, Batch Normalisation and 2x2 pooling layers was added on top of the network to reduce the output size. Finally, the fully connected part of the original VGG16 architecture was simplified to a single 128-neuron layer and a single output giving the probability that the input image belongs to the positive class (female). All the convolutional layers contain 3x3 kernels in order to match the VGG16 behaviour. The layers corresponding to the VGG16 backbone were initialised with weights pre-trained on the ImageNet dataset [DDS+09], and only the last block of the network was allowed to be modified during training.

Figure 1: VGG16-based proposed architecture.
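As an illustration, the following Keras sketch reconstructs the modified VGG16 from the description above; it is not the authors' code. The filter count of the final added block, the trainability of the new stem blocks and dense head, and the binary cross-entropy loss are assumptions.

```python
# Minimal sketch of the modified VGG16 described above (illustrative reconstruction).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_modified_vgg16(input_shape=(256, 512, 1)):
    inputs = layers.Input(shape=input_shape)

    # Two added convolutional blocks (Conv + ReLU + BatchNorm), each producing
    # 3 feature maps, so the single-channel OPG becomes a 3-channel input.
    x = layers.Conv2D(3, 3, padding="same", activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(3, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)

    # VGG16 backbone initialised with ImageNet weights and kept frozen.
    backbone = tf.keras.applications.VGG16(include_top=False,
                                           weights="imagenet",
                                           input_shape=(256, 512, 3))
    backbone.trainable = False
    x = backbone(x)

    # Extra trainable block (Conv + BatchNorm + 2x2 pooling) to reduce the output
    # size; 512 filters is an assumption, the paper does not state the width.
    x = layers.Conv2D(512, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(pool_size=2)(x)

    # Simplified fully connected part: one 128-neuron layer and a single
    # sigmoid output giving P(female).
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(inputs, outputs)

model = build_modified_vgg16()
model.compile(optimizer=tf.keras.optimizers.Adadelta(),
              loss="binary_crossentropy", metrics=["accuracy"])  # loss assumed
```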
===3 Experiments and results===

The networks were trained using the Adadelta optimiser, as it adapts the learning rate automatically [Zei12], and the batch size was set empirically to 16 as a good compromise between efficiency and regularisation effect. All the batches were transformed to increase the variability of the data and thus improve the performance of the network. The first column of Table 3 shows the transformations applied: translation in both axes, rotation, and contrast and brightness changes. The brightness was disturbed according to the power-law transform, where each pixel value in the image is raised to a given factor. The contrast was changed according to the formula f · (I − 0.5) + 0.5, where f is the factor of the transformation and I is the input image. After the transformations, pixel values under 0 or over 1 are clipped to preserve the [0,1] range. The second column of Table 3 shows the factor, which defines the boundaries of the uniform distribution used to draw the transformation parameters. The third column represents the probability of a batch being affected by the transformation.

Table 3: Data augmentation transformations used to make the training set more diverse and thus improve the network capabilities.
{| class="wikitable"
! Transformation !! Factor !! Probability
|-
| Horizontal flip || - || 0.5
|-
| Translation X || (-10,10) pixels || 1
|-
| Translation Y || (-8,8) pixels || 1
|-
| Rotation || (-1,1) degrees || 1
|-
| Contrast || (0.8,1.2)x || 0.8
|-
| Brightness || (0.8,1.2)x || 0.8
|}
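For illustration, the following NumPy/SciPy sketch applies the transformations of Table 3 with the stated factors and probabilities (brightness as a power-law transform, contrast as f · (I − 0.5) + 0.5, then clipping to [0,1]). It is a re-implementation applied per image rather than per batch, not the authors' code, and assumes images normalised to [0,1].

```python
# Illustrative data augmentation following Table 3 (per-image, not per-batch).
import numpy as np
from scipy.ndimage import rotate, shift

rng = np.random.default_rng()

def augment(img):
    # Horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        img = img[:, ::-1]
    # Translation in Y (+-8 px) and X (+-10 px), applied to every image.
    img = shift(img, (rng.uniform(-8, 8), rng.uniform(-10, 10)), mode="nearest")
    # Rotation of up to +-1 degree.
    img = rotate(img, rng.uniform(-1, 1), reshape=False, mode="nearest")
    # Brightness: power-law transform, each pixel raised to a random factor.
    if rng.random() < 0.8:
        img = np.power(img, rng.uniform(0.8, 1.2))
    # Contrast: f * (I - 0.5) + 0.5 with f drawn uniformly from (0.8, 1.2).
    if rng.random() < 0.8:
        img = rng.uniform(0.8, 1.2) * (img - 0.5) + 0.5
    # Clip back to [0, 1] after the intensity changes.
    return np.clip(img, 0.0, 1.0)

augmented = augment(np.random.rand(256, 512))  # example on a random image
```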
All the experiments to assess the performance of the gender estimation networks were carried out with 8-fold cross-validation. The set of images was divided into 8 parts or folds, with all parts distributed in a similar way according to the age and the sex of the patients. The model was then trained iteratively, using in each iteration 6 folds to train, 1 fold to validate the training and 1 fold to obtain the performance metrics. The folds used for each task are changed in each CV iteration, so the performance of the model can be averaged over the whole dataset.

The performance of the networks was assessed with four different metrics. Firstly, the accuracy gives a general idea of the proportion of correctly classified images. Then, the sensitivity and specificity metrics provide the percentage of well-classified female and male images, respectively. To calculate these three measurements, we set a threshold of 0.5 to decide if the predicted class is positive (female) or negative (male). Finally, the Area Under the ROC Curve (AUC) combines the sensitivity and specificity obtained for every possible classification threshold (not only 0.5), so it is useful to evaluate the robustness of the network.
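The four metrics and the stratified 8-fold split can be computed as in the sketch below (an illustrative scikit-learn implementation, not the authors' evaluation code). The combined age-group/sex stratification label and the synthetic data are assumptions so that the snippet runs stand-alone.

```python
# Illustrative evaluation: accuracy, sensitivity, specificity and AUC at a 0.5
# threshold (female = positive class), plus an 8-fold split stratified on a
# combined age-group/sex label. Synthetic data is used for the example.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def evaluate(y_true, y_prob, threshold=0.5):
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),        # well-classified female images
        "specificity": tn / (tn + fp),        # well-classified male images
        "auc": roc_auc_score(y_true, y_prob)  # threshold-independent
    }

rng = np.random.default_rng(0)
ages = rng.integers(5, 90, size=3400)
sexes = rng.integers(0, 2, size=3400)             # 1 = female, 0 = male
strata = [f"{a // 10}_{s}" for a, s in zip(ages, sexes)]

# 8 folds with similar age and sex distributions; in each CV iteration 6 folds
# train the network, 1 fold is used for validation and 1 fold for testing.
skf = StratifiedKFold(n_splits=8, shuffle=True, random_state=0)
for train_val_idx, test_idx in skf.split(np.zeros(3400), strata):
    pass  # train on 6 of the remaining folds, validate on the other one

print(evaluate(sexes, rng.random(3400)))  # random scores give AUC close to 0.5
```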
Table 4 shows the prediction results independently for each age group (following the same division as in Table 2 for people older than 20 and a finer division for younger people) and for each network. With DASNet, the accuracy of the classification goes beyond 90% in every age group older than 16 years of age. In younger people, the performance decreases down to 75% in the range 5-8, and the accuracy peak is reached in the group 18-20 (96.24%). The female images are classified better in 9 of the 13 evaluated age groups; the most noticeable difference appears in the group 5-8, with 12% in favour of the male images (greater specificity). The AUC exceeds 80% in every group, reaching a top value of 98.23% in the group 18-20.

The results of DSANet show an accuracy of over 83% in subjects older than 8 years of age and over 90% beyond 16, with the top accuracy of 96.68% obtained between 30 and 40. The performance is better on the images of females in every age group, the greatest difference (12%) appearing between 40 and 50 years of age. The AUC stays above 90% in every age group but the youngest, reaching a value of almost 99% in the range 30-40.

With VGG16, the gender estimation accuracy exceeds 90% in people aged from 16 to 60, peaking at 94% in the 30-40 age range. The performance starts to decrease in people older than 60, with an accuracy of 88%, falling to 83% in the oldest age group. The performance in younger groups stays between 84% and 90% in the 8-16 age range, and is 70% for subjects younger than 8. The images of females are classified better in almost every age group, which is especially noticeable in the groups 16-18 and 18-20 (sensitivity values of 96.88% and 96%). The most considerable differences between the performance on female and male images occur in the groups 12-14 and 16-18 (differences of 13%), while the most balanced predictions are made in the group 50-60 (difference of 0.03%). The AUC falls below 78% in the youngest age group, but exceeds 94% between 14 and 70 years of age.

Table 4: Gender estimation results by age group. Values in bold correspond to the most accurate network according to each specific metric.
{| class="wikitable"
! Age group !! Method !! Accuracy !! Sensitivity !! Specificity !! AUC
|-
| rowspan="3" | [5,8) || DASNet || '''75.00%''' || 68.29% || '''80.39%''' || '''82.80%'''
|-
| DSANet || 70.11% || '''70.73%''' || 69.61% || 80.06%
|-
| VGG16 || 70.65% || 69.51% || 71.57% || 77.91%
|-
| rowspan="3" | [8,10) || DASNet || 77.19% || 74.71% || 80.00% || 85.63%
|-
| DSANet || '''83.75%''' || '''84.12%''' || '''83.33%''' || '''89.99%'''
|-
| VGG16 || 81.88% || 82.94% || 80.67% || 77.91%
|-
| rowspan="3" | [10,12) || DASNet || 79.89% || 82.11% || 77.53% || 88.03%
|-
| DSANet || '''84.24%''' || '''86.84%''' || '''81.46%''' || '''91.36%'''
|-
| VGG16 || 81.97% || 84.74% || 78.65% || 89.39%
|-
| rowspan="3" | [12,14) || DASNet || 86.04% || 89.56% || 82.24% || 92.96%
|-
| DSANet || '''87.18%''' || 88.46% || '''85.80%''' || '''94.00%'''
|-
| VGG16 || 86.61% || '''92.86%''' || 79.88% || 92.28%
|-
| rowspan="3" | [14,16) || DASNet || 88.07% || '''92.63%''' || 84.55% || 94.95%
|-
| DSANet || 88.99% || 90.53% || '''87.80%''' || 95.75%
|-
| VGG16 || '''89.91%''' || '''92.63%''' || '''87.80%''' || '''95.88%'''
|-
| rowspan="3" | [16,18) || DASNet || 90.08% || 90.63% || 89.55% || 96.22%
|-
| DSANet || '''94.65%''' || 95.31% || '''94.02%''' || '''98.71%'''
|-
| VGG16 || 90.08% || '''96.88%''' || 83.58% || 98.60%
|-
| rowspan="3" | [18,20) || DASNet || '''96.24%''' || 96.00% || '''96.55%''' || '''98.23%'''
|-
| DSANet || '''96.24%''' || '''98.67%''' || 93.01% || 98.14%
|-
| VGG16 || 93.98% || 96.00% || 91.38% || 95.89%
|-
| rowspan="3" | [20,30) || DASNet || 89.40% || 92.23% || 84.05% || 95.02%
|-
| DSANet || '''91.79%''' || '''93.38%''' || '''88.79%''' || '''96.57%'''
|-
| VGG16 || 90.30% || 92.47% || 86.21% || 95.40%
|-
| rowspan="3" | [30,40) || DASNet || 93.02% || 93.44% || 92.37% || 97.60%
|-
| DSANet || '''96.68%''' || '''97.81%''' || '''94.91%''' || '''98.96%'''
|-
| VGG16 || 94.02% || 95.08% || 92.37% || 97.04%
|-
| rowspan="3" | [40,50) || DASNet || 89.34% || 91.85% || 86.24% || 93.72%
|-
| DSANet || '''93.03%''' || '''98.52%''' || 86.24% || '''96.03%'''
|-
| VGG16 || 91.39% || 94.07% || '''88.07%''' || 95.51%
|-
| rowspan="3" | [50,60) || DASNet || 89.22% || 87.05% || '''92.47%''' || 95.50%
|-
| DSANet || '''94.40%''' || '''96.40%''' || 91.40% || '''97.83%'''
|-
| VGG16 || 91.38% || 91.37% || 91.40% || 97.19%
|-
| rowspan="3" | [60,70) || DASNet || '''89.61%''' || 92.20% || '''87.02%''' || 93.77%
|-
| DSANet || 88.96% || '''92.21%''' || 85.71% || 94.60%
|-
| VGG16 || 88.31% || 89.61% || 87.01% || '''94.70%'''
|-
| rowspan="3" | [70,90) || DASNet || 88.63% || 88.68% || '''88.57%''' || 95.36%
|-
| DSANet || '''90.80%''' || '''94.23%''' || 85.71% || '''96.32%'''
|-
| VGG16 || 82.95% || 84.91% || 80.00% || 88.84%
|}

When comparing the three networks side by side, all the metrics are highly correlated. When focusing on subjects older than 20, DSANet outperforms DASNet and VGG16 in almost every case, with a substantial accuracy difference of 4-8% in the group 70-90. In terms of sensitivity/specificity, DSANet classifies the female images better in every case, with the largest margin appearing in the group 70-90 (6% with respect to DASNet and 10% with respect to VGG16). The classification of images of males is carried out better by DASNet in subjects older than 60, by DSANet in people between 20 and 40 years of age and by VGG16 in the remaining group (40 to 50). In general terms, DSANet produced the highest AUC, although the differences are normally lower than 2% (except for the group 70-90, where VGG16 performs worse by a large margin).

Although the accuracy of the methods is lower in people younger than 20, the performance metrics follow an improvement pattern throughout that period. As can be seen in Fig. 2, there is a jump at about 8 years of age which is especially noticeable in DSANet and VGG16, where the accuracy improves from 70 to 84% and from 71 to 82%, respectively. The best balance between sensitivity and specificity is obtained by DSANet in people younger than 18, and by DASNet in people between 18 and 20 years of age. Regarding AUC values, the greatest difference appears when classifying images of children aged from 8 to 10: 90% with DSANet, 86% with DASNet and only 78% with VGG16.

Figure 2: Evolution of the classification metrics in subjects from seven age groups, ranging from 5 up to 20 years old ((a) DASNet, (b) DSANet, (c) VGG16).

===4 Discussion and Conclusion===

In this work, three different deep learning architectures based on Convolutional Neural Networks have been used to tackle gender estimation from dental panoramic images. The first, DASNet, is a network architecture proposed in our previous work [VBCVQ+20], conceived to estimate the chronological age by combining both maturational- and gender-dependent features. The second is a proposed adaptation of DASNet (called DSANet) where the main objective moves from age estimation to gender classification, under the same idea of combining maturational and sexual features. The third is an adaptation of VGG16 pre-trained on the ImageNet dataset, an approach which has already demonstrated good results in another gender estimation method [MVGS19].

The results of all the networks show a strong correlation, performing better in young adults (18 to 20 years of age) and middle-aged adults (around 30-40). Also, they tend to classify the images of females better (higher sensitivity). The networks have also proved to obtain robust predictions (in terms of AUC), regardless of the specific threshold used to decide whether the output probability produces a female or a male classification. Although DSANet provides better results in general, it is noticeable that DASNet outperforms it in subjects younger than 8 by a significant margin. VGG16 performs better than the others in people aged from 14 to 16, but it tends to obtain worse results in general terms. We believe this has to do with the fact that the DASNet and DSANet architectures combine maturational and sexual features, and thus the network can learn in a more structured way. In general, the results support the fact that it is quite challenging to determine the sex of people younger than 16 by looking only at the oral cavity. In the youngest age groups, the images show great variability, caused by the presence of mixed dentition stages and the heterogeneity in mandibular growth patterns [FMA+15, MO14], and thus the networks cannot go beyond 90% accuracy. More research focused on these age groups should be conducted to improve these sex-prediction findings.

In conclusion, all the networks provide reliable predictions of sex, with DSANet being the most accurate in the majority of the age groups. The suitability of this approach is especially relevant in patients older than 16 years, with accuracies between 90 and 96.2%. Although the performance decreases in younger people, the method is still useful in subjects older than 8 when combined with other radiological methods, with accuracies over 83%, demonstrating the usefulness of automatic approaches in sex prediction.

===Acknowledgements===

This work has received financial support from the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2019-2022 ED431G-2019/04, 2017-2020 Potential Growth Group ED431B 2017/029, 2017-2020 Competitive Reference Group ED431C 2017/69, and N Vila-Blanco support ED481A-2017) and the European Regional Development Fund (ERDF), which acknowledges the CiTIUS-Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.

===References===

* [AIB+18] A Alias, AN Ibrahim, SNA Bakar, MS Shafie, S Das, N Abdullah, HM Noor, IY Liao, and FM Nor. Anthropometric analysis of mandible: an important step for sex determination. La Clinica Terapeutica, 169(5):e217–e223, 2018.
* [AKM15] AH Abdi, S Kasaei, and M Mehdizadeh. Automatic segmentation of mandible in panoramic x-ray. Journal of Medical Imaging, 2(4):044003, 2015.
* [AM08] AB Acharya and S Mainali. Sex discrimination potential of buccolingual and mesiodistal tooth dimensions. Journal of Forensic Sciences, 53(4):790–792, 2008.
* [ARR+17] DC Ackland, D Robinson, M Redhead, PVS Lee, A Moskaljuk, and G Dimitroulis. A personalized 3d-printed prosthetic joint replacement for the human temporomandibular joint: From implant design to implantation. J Mechanical Behavior of Biomedical Materials, 69:404–411, 2017.
* [BDR+12] MF Bilfeld, F Dedouit, H Rousseau, N Sans, J Braga, D Rougé, and N Telmon. Human coxal bone sexual dimorphism and multislice computed tomography: geometric morphometric analysis of 65 adults. Journal of Forensic Sciences, 57(3):578–588, 2012.
* [BOTA15] DH Badran, DA Othman, HW Thnaibat, and WM Amin. Predictive accuracy of mandibular ramus flexure as a morphologic indicator of sex dimorphism in Jordanians. International Journal of Morphology, 33(4), 2015.
* [CEV+11] D Charisi, C Eliopoulos, V Vanna, CG Koilias, and SK Manolis. Sexual dimorphism of the arm bones in a modern Greek population. Journal of Forensic Sciences, 56(1):10–18, 2011.
* [DDS+09] J Deng, W Dong, R Socher, L-J Li, K Li, and L Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conf. on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
* [FMA+15] MFN Feres, TS Muniz, SH de Andrade, M Lemos, and SSN Pignatari. Craniofacial skeletal pattern: is it really correlated with the degree of adenoid obstruction? Dental Press Journal of Orthodontics, 20(4):68–75, 2015.
* [FOOD06] D Franklin, P O'Higgins, CE Oxnard, and I Dadour. Determination of sex in South African blacks by discriminant function analysis of mandibular linear dimensions. Forensic Science, Medicine, and Pathology, 2(4):263–268, 2006.
* [HC12] SM Harris and DT Case. Sexual dimorphism in the tarsal bones: implications for sex determination. Journal of Forensic Sciences, 57(2):295–305, 2012.
* [IS15] S Ioffe and C Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
* [LH96] SR Loth and M Henneberg. Mandibular ramus flexure: a new morphologic indicator of sexual dimorphism in the human skeleton. American Journal of Physical Anthropology, 99(3):473–485, 1996.
* [LHK+19] J-H Lee, S-S Han, YH Kim, C Lee, and I Kim. Application of a fully deep convolutional neural network to the automation of tooth segmentation on panoramic radiographs. Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology, 2019.
* [Liv03] H Liversidge. Variation in modern human dental development. Cambridge Studies in Biological and Evolutionary Anthropology, pages 73–113, 2003.
* [MO14] ICL Muñoz and PB Orta. Comparison of cephalometric patterns in mouth breathing and nose breathing children. International Journal of Pediatric Otorhinolaryngology, 78(7):1167–1172, 2014.
* [MPR13] M Marinescu, V Panaitescu, and M Rosu. Sex determination in Romanian mandible using discriminant function analysis: Comparative results of a time-efficient method. Rom J Leg Med, 21(4):305–8, 2013.
* [MSM10] IA Mughal, AS Saqib, and F Manzur. Mandibular canine index (MCI). The Professional Medical Journal, 17(03):459–463, 2010.
* [MVGS19] D Milošević, M Vodanović, I Galić, and M Subašić. Estimating biological gender from panoramic dental x-ray images. In 2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA), pages 105–110. IEEE, 2019.
* [PPZP12] DH Parekh, SV Patel, AZ Zalawadia, and SM Patel. Odontometric study of maxillary canine teeth to establish sexual dimorphism in Gujarat population. Int J Biological and Medical Research, 3(3):1935–7, 2012.
* [RGD13] MG Roberts, J Graham, and H Devlin. Image texture in dental panoramic radiographs as a potential biomarker of osteoporosis. IEEE Trans. Biomedical Engineering, 60(9):2384–2392, 2013.
* [SD05] GT Schwartz and MC Dean. Sexual dimorphism in modern human permanent teeth. American Journal of Physical Anthropology, 128(2):312–317, 2005.
* [SGP+15] SK Singh, A Gupta, B Padmavathi, S Kumar, S Roy, A Kumar, et al. Mandibular canine index: A reliable predictor for gender identification using study cast in Indian population. Indian Journal of Dental Research, 26(4):396, 2015.
* [SPG+16] AM Silva, ML Pereira, S Gouveia, JN Tavares, A Azevedo, and IM Caldas. A new approach to sex estimation using the mandibular canine index. Medicine, Science and the Law, 56(1):7–12, 2016.
* [SZ14] K Simonyan and A Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
* [VBCL+18] N Vila-Blanco, TF Cootes, C Lindner, I Tomás, and MJ Carreira. Fully automatic teeth segmentation in adult OPG images. In International Workshop on Computational Methods and Clinical Applications in Musculoskeletal Imaging, pages 11–21. Springer, 2018.
* [VBCVQ+20] N Vila-Blanco, MJ Carreira, P Varas-Quintana, C Balsa-Castro, and I Tomás. Deep neural networks for chronological age estimation from OPG images. IEEE Trans. Medical Imaging, 2020.
* [VMGA13] G Vinay, SR Mangala Gowri, and J Anbalagan. Sex determination of human mandible using metrical parameters. Journal of Clinical and Diagnostic Research: JCDR, 7(12):2671, 2013.
* [YGTY18] M Yan, J Guo, W Tian, and Z Yi. Symmetric convolutional neural network for mandible segmentation. Knowledge-Based Systems, 159:63–71, 2018.
* [Zei12] MD Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012.