<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>During the COVID-19 Pandemics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio De Magistris</string-name>
          <email>demagistris@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuele Iacobelli</string-name>
          <email>iacobelli@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafał Brociek</string-name>
          <email>Rafal.Brociek@polsl.pl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Napoli</string-name>
          <email>cnapoli@diag.uniroma1.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer, Control and Management Engineering, Sapienza University of Rome</institution>
          ,
          <addr-line>Via Ariosto 25, Roma, 00185</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Mathematics Applications and Methods for Artificial Intelligence, Faculty of Applied Mathematics, Silesian University of Technology</institution>
          ,
          <addr-line>Gliwice, 44-100</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institute for Systems Analysis and Computer Science, Italian National Research Council</institution>
          ,
          <addr-line>Via dei Taurini 19, Roma, 00185</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>36</fpage>
      <lpage>41</lpage>
      <abstract>
        <p>The ongoing COVID-19 pandemic has highlighted the importance of wearing face masks as a preventive measure to reduce the spread of the virus. In medical settings, such as hospitals and clinics, healthcare professionals and patients are required to wear surgical masks for infection control. However, the use of masks can hinder facial recognition technology, which is commonly used for identity verification and security purposes. In this paper, we propose a convolutional neural network (CNN) based approach to detect faces covered by surgical masks in medical settings. We evaluated the proposed CNN model on a test set comprising masked and unmasked faces. The results showed that our model achieved an accuracy of over 96% in detecting masked faces. Furthermore, our model demonstrated robustness to different mask types and fit variations commonly encountered in medical settings. Our approach reaches state-of-the-art results in terms of accuracy and generalization.</p>
      </abstract>
      <kwd-group>
        <kwd>COVID-19</kwd>
        <kwd>Face Mask</kwd>
        <kwd>CNN</kwd>
        <kwd>ResNet50</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>The use of face masks has become a critical preventive measure, particularly in the context of the ongoing COVID-19 pandemic. In medical settings, such as hospitals and clinics, healthcare professionals and patients are required to wear surgical masks to minimize the risk of transmission. However, the use of masks can hinder facial recognition technology, which is commonly used for identity verification and security purposes. Accurate and efficient detection of faces covered by surgical masks is thus crucial for maintaining security measures while adhering to infection control protocols.</p>
      <p>Traditional approaches for face detection and recognition may face challenges in the presence of masks, as masks can alter facial features, obstructing key facial landmarks and reducing facial visibility. To address this challenge, convolutional neural networks (CNNs), a type of deep neural network that can automatically learn hierarchical features from images, have been proposed as a promising solution. CNNs have shown great success in various computer vision tasks, including object detection, image classification, and image segmentation. They have the potential to learn complex patterns and representations from large datasets, which can aid this task.</p>
      <p>In this paper, we propose a CNN-based approach for detecting faces covered by surgical masks in medical settings. We aim to develop a model that can accurately and robustly identify masked faces, considering the unique challenges posed by different mask types, colors, and fit variations commonly encountered in medical environments. We collect a dataset of facial images with individuals wearing surgical masks, and fine-tune a ResNet50 model on the specific task. The contributions of our work include the development of a CNN-based approach tailored for face mask detection in medical settings, with strong accuracy and generalization in the presence of different mask types and fit variations. The proposed approach has the potential to enhance security measures while maintaining infection control, with applications in various real-world scenarios. The rest of the paper is organized as follows: in Section 2, we review related works in the field; in Section 3, we describe the dataset used in our study; in Section 4, we introduce the proposed method; in Section 5, we present the experimental results and discuss the findings; and finally, in Section 6, we conclude the paper and outline future directions of research in this area.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Works</title>
      <p>Machine learning and convolutional neural networks have been widely applied to the general field of face recognition. Face recognition is a rapidly developing field that has seen significant advancements in recent years, driven by the increasing availability of large-scale datasets and powerful machine learning algorithms. The latest algorithms for face recognition are based on deep learning architectures, 3D face recognition, and hybrid approaches. Deep learning has become the dominant approach for face recognition due to its superior performance on large-scale datasets. Some of the most widely used deep learning architectures for face recognition include Convolutional Neural Networks (CNNs), Siamese Networks, Triplet Networks, and Deep Belief Networks (DBNs). These models are trained using large-scale datasets such as VGGFace [<xref ref-type="bibr" rid="ref1 ref4">1</xref>], FaceNet [2], and IE-CNN models [3], which contain millions of face images. Deep learning models have achieved state-of-the-art results on various benchmarks such as the Labeled Faces in the Wild (LFW) [4] and MegaFace [5] datasets. Many other applications have been developed using machine learning and face recognition algorithms [<xref ref-type="bibr" rid="ref5">6, 7, 8, 9</xref>].</p>
      <p>3D face recognition is an emerging field that uses 3D information to improve the accuracy of face recognition systems. 3D face models can capture additional facial details such as the depth of the facial features, which are not present in 2D images. Some of the popular approaches for 3D face recognition include the use of 3D morphable models (3DMM), depth-based methods, and multi-view based methods. Face recognition can be divided into two categories: verification and identification. Verification aims to determine if two face images belong to the same person, while identification aims to identify a person from a set of images. Face verification systems are commonly used for security applications, while face identification systems are used in large-scale surveillance applications. Recent advancements in face recognition have focused on improving the accuracy of both verification and identification systems. Hybrid approaches combine multiple techniques to improve the performance of face recognition systems. For example, a hybrid approach may combine deep learning models with 3D face recognition techniques to improve accuracy. Another approach is to use facial landmark detection algorithms to improve the alignment of face images before recognition.</p>
      <p>As face recognition technology becomes more prevalent, there is a growing need to address privacy and security concerns. Several approaches have been proposed to address these concerns, such as anonymization techniques, which modify the facial features to protect the privacy of the individuals in the images. Other approaches include adversarial attacks, which aim to fool the face recognition system into misidentifying a person. ArcFace [10] is a face recognition algorithm that uses a margin-based softmax loss function to optimize feature representations for face recognition; similarly, CosFace [11], another deep learning-based face recognition algorithm, uses a cosine-based loss function to improve the discriminability of the learned features. Like ArcFace, it has achieved state-of-the-art performance on several benchmark datasets. Another algorithm, named SphereFace [12], uses an angular-based softmax loss function to improve the discriminability of the learned features. While the said algorithms have many similarities, another system, named DeepID [<xref ref-type="bibr" rid="ref7">13, 14</xref>], has been developed to offer a multi-task learning approach that learns multiple levels of features for face recognition. However, it has been surpassed by more recent approaches such as VGGFace2 [<xref ref-type="bibr" rid="ref1 ref4">1</xref>]. VGGFace2 is a large-scale face recognition dataset that has been used to train several deep learning-based face recognition algorithms, including some of the approaches mentioned above. It contains over 3 million face images of over 9,000 subjects, making it one of the largest face recognition datasets available. Another convolutional model is FaceNet [2], which, differently from the previous approaches, uses a triplet loss function to learn discriminative features for face recognition, and has been widely adopted in industry.</p>
      <p>With the outbreak of the COVID-19 pandemic, face recognition algorithms have been applied to the recognition of people wearing (or not wearing) face masks. One common approach is to use deep learning-based object detection methods to detect the presence of a face and then classify whether the face is wearing a mask or not. This approach typically involves training a CNN on a dataset of masked and unmasked faces. The CNN learns to extract features from the face images and uses them to classify whether a mask is present or not. Several studies have reported high accuracy rates for face mask detection using deep learning algorithms. In fact, during the COVID-19 pandemic the use of convolutional neural networks (CNNs) for face mask detection has gained significant attention in the literature. Several studies have proposed CNN-based approaches for detecting masked faces. In [<xref ref-type="bibr" rid="ref9">15</xref>] the authors apply a Long Short-Term Memory (LSTM) network to model time-dependencies in order to detect whether a person wears a face mask while speaking. In [<xref ref-type="bibr" rid="ref11">16</xref>] the authors are able to detect if face masks are worn by people in a closed environment. Overall, deep learning algorithms have shown promise for detecting whether a person is wearing a face mask [17], and their use could help to improve public health measures during the COVID-19 pandemic. For example, the authors of [18] propose a slightly modified version of LeNet [19] to detect masked faces in the wild. Similarly, the authors of [<xref ref-type="bibr" rid="ref15">20</xref>] propose a novel CNN architecture tailored for the specific task and evaluate their model on a custom dataset. For more information about existing methods of covered face detection we refer the reader to the recent survey [<xref ref-type="bibr" rid="ref19">21</xref>].</p>
      <p>Our method differs from the other works in the literature both in the dataset used and in the training strategy. In particular, we address the task as a multi-label classification problem. Previous works [<xref ref-type="bibr" rid="ref22 ref24">22, 23</xref>] have shown that this approach allows a better understanding of the context. We will see that this approach reduces overfitting and consequently generalizes better on unseen data.</p>
    </sec>
    <sec id="sec-3-1">
      <title>3. Dataset</title>
      <p>To train our model we created a custom dataset containing 151 images of uncovered faces extracted from the Flickr-Faces-HQ dataset [<xref ref-type="bibr" rid="ref29">24</xref>]. The Flickr-Faces-HQ (FFHQ) dataset is a large-scale dataset of high-quality facial images collected from the photo-sharing website Flickr. The dataset was created by NVIDIA Research in 2019 and contains 70,000 images of 1,024x1,024 resolution, with a diverse range of ages, ethnicities, and genders. The images in the FFHQ dataset are highly curated and filtered for quality, ensuring that they are of high fidelity, high resolution, and well-lit. The dataset is designed to be used for training and evaluating machine learning models for various computer vision tasks, including face recognition, facial expression analysis, and face synthesis. One of the key features of the FFHQ dataset is that it includes a wide range of facial expressions, poses, and lighting conditions, which makes it more challenging than other facial image datasets such as the popular Labeled Faces in the Wild (LFW) dataset. The FFHQ dataset also includes annotations for facial landmarks, which can be used for tasks such as face alignment and face tracking. The FFHQ dataset has been used in a range of computer vision research projects, including the development of generative models for face synthesis and style transfer, as well as the development of deep learning-based models for facial expression recognition and emotion detection. The availability of high-quality facial images in the FFHQ dataset has helped to advance the state of the art in these and other computer vision tasks, and it is likely to continue to be an important resource for researchers working in the field of computer vision.</p>
      <p>In this study, 1000 images of faces uncovered and covered with masks have been selected from the FFHQ dataset and merged, to preserve generality, with a large portion of the Kaggle face mask dataset [<xref ref-type="bibr" rid="ref31">25</xref>], while others were added manually in order to increase the variability in the mask types and colors; 160 images of masks with no faces were scraped from the web, and 150 images containing different classes of objects, all unrelated to faces and masks, were also scraped from the web.</p>
    </sec>
    <sec id="sec-3-2">
      <title>4. Method</title>
      <p>For the classification task we used ResNet50 [26]. ResNet50 is a 50-layer deep neural network based on residual connections, which allow for the reuse of features from previous layers in the network. The architecture includes several blocks of convolutional layers, batch normalization, and pooling layers, as well as shortcut connections that bypass one or more layers. These shortcut connections enable the network to learn residual functions, which can be more easily optimized during training and help to mitigate the vanishing gradient problem that occurs in very deep networks. ResNet50 has been pretrained on ImageNet [27], a large-scale image classification dataset.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Results</title>
      <p>We fine-tuned the network for 200 epochs on images of size 224x224 pixels. We used the feature extractor of a ResNet50 pretrained on ImageNet and added a single classification layer with sigmoid activation (recall that the labels are not mutually exclusive). We used the standard binary cross-entropy loss: considering the batch size B, the predicted value ŷ and the true value y, we get:</p>
      <p>ℒ(y, ŷ) = −(1/B) Σ_{i=1..B} Σ_{j=1..3} ℓ_{ij}, where ℓ_{ij} = y_i[j] log(ŷ_i[j]) + (1 − y_i[j]) log(1 − ŷ_i[j]).</p>
      <p>With this training strategy we obtained an accuracy of 96% on a balanced test set, against the 90% accuracy obtained by the network trained to classify only covered versus uncovered faces. To make a fair comparison between the two approaches, the accuracy in the multi-label classification problem is computed considering only the label that indicates if the face is covered or not, which is in fact a binary classification problem. Figure 2 shows some samples along with the network predictions.</p>
    </sec>
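The batch-averaged multi-label binary cross-entropy, and the comparison protocol that scores only the covered/uncovered label, can be written out directly as a sanity check. This is an illustrative sketch: the function names and the index of the "covered" label (`covered_idx`) are assumptions, since the paper does not state the label ordering.

```python
import math

def multilabel_bce(y_true, y_pred):
    """-(1/B) * sum_i sum_j [ y_i[j]*log(yhat_i[j]) + (1-y_i[j])*log(1-yhat_i[j]) ].

    y_true: B binary label vectors; y_pred: matching probabilities in (0, 1).
    """
    total = 0.0
    for labels, probs in zip(y_true, y_pred):
        for t, p in zip(labels, probs):
            total += t * math.log(p) + (1 - t) * math.log(1 - p)
    return -total / len(y_true)

def covered_accuracy(y_true, y_pred, covered_idx=0, threshold=0.5):
    """Binary accuracy on the single label that says whether the face is covered."""
    hits = sum(
        (probs[covered_idx] >= threshold) == bool(labels[covered_idx])
        for labels, probs in zip(y_true, y_pred)
    )
    return hits / len(y_true)
```

For example, a single sample with labels [1, 0, 0] and confident predictions [0.9, 0.1, 0.1] yields a loss of −3·log(0.9) ≈ 0.316, which shrinks toward zero as the predictions approach the labels.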
    <sec id="sec-5">
      <title>6. Conclusion</title>
      <p>Our findings suggest that CNNs can effectively detect
faces covered by surgical masks in medical settings,
which can be beneficial for enhancing security measures
while maintaining infection control protocols. The
proposed model has potential applications in healthcare
facilities, airports, and other settings where face mask
detection is critical for security and safety purposes. Future
work can explore additional data sources and further
optimization techniques to improve the model’s performance
and real-world applicability.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, A. Zisserman, Vggface2: A dataset for recognising faces across pose and age, in: 2018 13th IEEE international conference on automatic face &amp; gesture recognition (FG 2018), IEEE, 2018, pp. 67–74. [2] F. Schroff, D. Kalenichenko, J. Philbin, Facenet: A unified embedding for face recognition and clustering, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 815–823. [3] A.-P. Song, Q. Hu, X.-H. Ding, X.-Y. Di, Z.-H. Song, Similar face recognition using the ie-cnn model, IEEE Access 8 (2020) 45244–45253.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[4] G. B. Huang, E. Learned-Miller, Labeled faces in the wild: Updates and new reporting procedures, Dept. Comput. Sci., Univ. Massachusetts Amherst, Amherst, MA, USA, Tech. Rep. 14 (2014). [5] I. Kemelmacher-Shlizerman, S. M. Seitz, D. Miller, E. Brossard, The megaface benchmark: 1 million faces for recognition at scale, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4873–4882.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[6] V. Marcotrigiano, G. Stingi, S. Fregnan, P. Magarelli, P. Pasquale, S. Russo, G. Orsi, M. Montagna, C. Napoli, An integrated control plan in primary schools: Results of a field investigation on nutritional and hygienic features in the apulia region (southern italy), Nutrients 13 (2021). doi:10.3390/nu13093006.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[7] G. Capizzi, C. Napoli, S. Russo, M. Woźniak, Lessening stress and anxiety-related behaviors by means of ai-driven drones for aromatherapy, in: CEUR Workshop Proceedings, volume 2594, 2020, pp. 7–12. [10] J. Deng, J. Guo, N. Xue, S. Zafeiriou, Arcface: Additive angular margin loss for deep face recognition, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 4690–4699.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[8] V. Ponzi, S. Russo, V. Bianco, C. Napoli, A. Wajda, Psychoeducative social robots for an healthier lifestyle using artificial intelligence: a case-study, in: CEUR Workshop Proceedings, volume 3118, 2021, pp. 26–33. [11] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, W. Liu, Cosface: Large margin cosine loss for deep face recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 5265–5274. [12] W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, L. Song, Sphereface: Deep hypersphere embedding for face recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 212–220.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[9] G. De Magistris, R. Caprari, G. Castro, S. Russo, L. Iocchi, D. Nardi, C. Napoli, Vision-based holistic scene understanding for context-aware human-robot interaction, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 13196 LNAI (2022) 310–325. doi:10.1007/978-3-031-08421-8_21. [13] Y. Sun, D. Liang, X. Wang, X. Tang, Deepid3: Face recognition with very deep neural networks, arXiv preprint arXiv:1502.00873 (2015).</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[14] W. Ouyang, X. Zeng, X. Wang, S. Qiu, P. Luo, Y. Tian, H. Li, S. Yang, Z. Wang, H. Li, et al., Deepid-net: Object detection with deformable part based convolutional neural networks, IEEE Transactions on Pattern Analysis and Machine Intelligence 39. [25] Face mask detection dataset, kaggle.com/datasets/andrewmvd/face-mask-detection, 2020.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>net: Object detection with deformable part based</article-title>
          [26]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Ren,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Deep residual learnconvolutional neural networks, IEEE Transactions ing for image recognition</article-title>
          ,
          <source>in: Proceedings of the on Pattern Analysis and Machine Intelligence</source>
          <volume>39</volume>
          IEEE conference
          <article-title>on computer vision and pattern (</article-title>
          <year>2016</year>
          )
          <fpage>1320</fpage>
          -
          <lpage>1334</lpage>
          . recognition,
          <year>2016</year>
          , pp.
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mallol-Ragolta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Qian</surname>
          </string-name>
          , E. Parada- [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>FeiCabaleiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. W.</given-names>
            <surname>Schuller</surname>
          </string-name>
          ,
          <article-title>Capturing time Fei, Imagenet: A large-scale hierarchical imdynamics from speech using neural networks for age database</article-title>
          , in: 2009 IEEE conference
          <article-title>on comsurgical mask detection</article-title>
          ,
          <source>IEEE Journal of Biomedical puter vision and pattern recognition, Ieee</source>
          ,
          <year>2009</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <source>and Health Informatics</source>
          <volume>26</volume>
          (
          <year>2022</year>
          )
          <fpage>4291</fpage>
          -
          <lpage>4302</lpage>
          .
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sang</surname>
          </string-name>
          ,
          <article-title>Face-mask recognition for fraud</article-title>
          [28]
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brociek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wajda</surname>
          </string-name>
          ,
          <article-title>First prevention using gaussian mixture model, Journal studies to apply the theory of mind theory to green of Visual Communication and Image Representa- and smart mobility by using gaussian area clustertion 55 (</article-title>
          <year>2018</year>
          )
          <fpage>795</fpage>
          -
          <lpage>801</lpage>
          . ing,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3118</volume>
          , [17]
          <string-name>
            <given-names>R.</given-names>
            <surname>Brociek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Magistris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cardia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Coppa</surname>
          </string-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>71</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          , Contagion prevention of covid-19 by [29]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wajda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brociek</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <article-title>Napoli, means of touch detection for retail stores</article-title>
          ,
          <source>in: Analysis pre and post covid-19 pandemic rorschach CEUR Workshop Proceedings</source>
          , volume
          <volume>3092</volume>
          ,
          <year>2021</year>
          ,
          <article-title>test data of using em algorithms and gmm modpp</article-title>
          .
          <volume>89</volume>
          -
          <fpage>94</fpage>
          . els,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3360</volume>
          , [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ji</surname>
          </string-name>
          , Masked face detection
          <year>2022</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <article-title>via a modified lenet</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>218</volume>
          (
          <year>2016</year>
          ) [30]
          <string-name>
            <given-names>G.</given-names>
            <surname>Magistris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rametta</surname>
          </string-name>
          , G. Capizzi,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          , Fpga
          <volume>197</volume>
          -
          <fpage>202</fpage>
          .
          <article-title>implementation of a parallel dds for wide-band ap</article-title>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          , L. Bottou,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hafner</surname>
          </string-name>
          , Gradient- plications, in: CEUR Workshop Proceedings, volbased learning applied to document recognition,
          <source>ume 3092</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>12</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>Proceedings of the IEEE</source>
          <volume>86</volume>
          (
          <year>1998</year>
          )
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>W.</given-names>
            <surname>Bu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Peng</surname>
          </string-name>
          , A cascade
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <article-title>framework for masked face detection</article-title>
          , in: 2017 IEEE
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <article-title>automation and mechatronics (RAM)</article-title>
          , IEEE,
          <year>2017</year>
          ,
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          pp.
          <fpage>458</fpage>
          -
          <lpage>462</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. P.</given-names>
            <surname>Chen</surname>
          </string-name>
          , A survey on masked
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <source>against covid-19</source>
          , IEEE Transactions on Artificial
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>Intelligence</source>
          <volume>3</volume>
          (
          <year>2021</year>
          )
          <fpage>323</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          , W. Xu,
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <year>2016</year>
          , pp.
          <fpage>2285</fpage>
          -
          <lpage>2294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [23]
          <string-name>
            <surname>G. De Magistris</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Caprari</surname>
          </string-name>
          , G. Castro, S. Russo,
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <article-title>robot interaction</article-title>
          , in: AIxIA 2021-Advances in
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Artificial</surname>
            <given-names>Intelligence:</given-names>
          </string-name>
          20th International Confer-
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>gence</surname>
          </string-name>
          ,
          <source>Virtual Event, December 1-3</source>
          ,
          <year>2021</year>
          , Revised
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Selected</surname>
            <given-names>Papers</given-names>
          </string-name>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>310</fpage>
          -
          <lpage>325</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.</given-names>
            <surname>Karras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Laine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Aila</surname>
          </string-name>
          ,
          <article-title>A style-based generator</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <article-title>computer vision</article-title>
          and pattern recognition,
          <year>2019</year>
          , pp.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Kaggle</surname>
          </string-name>
          , Face mask detection, https://www.kag-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>