Face Recognition using Naive Bayes Classifier*
                                Aleksandra Stachecka1,∗,†, Tomasz Procek1,†
                                1
                                Faculty of Applied Mathematics, Silesian University of Technology, Kaszubska 23, 44100 Gliwice, POLAND


                                                 Abstract
                                                 The article is about the implementation and the performance of Face Recognition using Naive Bayes
                                                 Classifier. Firstly, it is explained how huge impact on today’s world has the face recognition system
                                                 and how it has change over past 60 years. Then the idea of PCA and eigenfaces is brought closer to a
                                                 reader. Moreover, the methodology of the program is explained. Not only, the most important functions
                                                 and variables but also the schema of Naive Bayes Classifier are shown on code fragments. The next
                                                 part is "Experiments" where viewer can find plots and specific information about dataset, examples of
                                                 generated eigenfaces and the performance and accuracy of the program which is estimated to be nearly
                                                 75%. Finally, there is a conclusion. Authors one more time remind the most important information,
                                                 explain the role of PCA used in the project and look to the future in order to improve their program.
                                                  Keywords
                                                  NaiveBayes, FaceRecognition, PCA


                                1. Introduction
                                Face recognition technology has its roots in 1960s. The groundbreaking steps where made by
                                Woody Bledsoe, Helen Chan Wolf and Charles Bisson. They were marking the most characteristic
                                facial features such as mouth, nose or eyes. Then computing distances between them and the
                                program was comparing values. Their pioneering efforts brought people closer to concepts
                                and techniques that later evolved with time, leading to creation of advanced face recognition
                                systems which everyone uses nowadays. Biometric algorithms are widely used across various
                                sectors of life [1, 2, 3]. Law enforcement agencies use it to identify suspects and find missing
                                persons. Airports and border control implement it to verify travelers in security process. In
                                businesses, it secures access to buildings and confidential information. Even Smartphones use it for
                                user authentication. Moreover, social media platforms employ face recognition for tagging
                                individuals in photos [4, 5].
                                   Since 1960s specialists have been constantly improving biometric algorithms. One of those
                                improvements is called Principal Component Analysis. The approach simplifies the complexity of
                                facial images by focusing on key features, reducing dimensions, and retaining essential infor-
                                mation. This method revolutionized face recognition by improving accuracy and computational
                                efficiency. PCA paved the way for more advanced algorithms, significantly influencing the
                                development of contemporary facial recognition systems.


                                *IVUS2024: Information Society and University Studies 2024, May 17, Kaunas, Lithuania
                                1,∗
                                      Corresponding author
                                †
CEUR
                  ceur-ws.org
                                      These author contributed equally.
Workshop
Proceedings
              ISSN 1613-0073
                                       as308025@student.polsl.pl (A. Stachecka); tp307987@student.polsl.pl (T. Procek)
                                       0009-0005-5736-9073 (T. Procek)
                                                ©️ 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2. Methodology
The process begins with acquiring and preprocessing the dataset. The "Labeled Faces in the Wild"
(LFW) dataset is fetched using the 𝑓𝑒𝑡𝑐ℎ_𝑙𝑓𝑤_𝑝𝑒𝑜𝑝𝑙𝑒 function from 𝑠𝑘𝑙𝑒𝑎𝑟𝑛𝑑𝑎𝑡𝑎𝑠𝑒𝑡𝑠
                                                                         .       . This dataset contains labeled images of
faces, and for this analysis, only inviduals with at least 50 face images are included to ensure
sufficient data per class. The images are resized to 50% of their original dimensions to reduce
computational complexity. The dataset is then split into a feature matrix 𝑋 and a target vector 𝑦.
The feature matrix 𝑋 contains the pixel values of the images, while the target vector 𝑦 contains
the corresponding class labels for each image. Additionally, the shape parameters of the images,
including the number of samples, height and width are stored for reference throughout the
analysis.

                             Listing 1: Fetching and Preprocessing LFW dataset
  1 lfw_people = fetch_lfw_people(min_faces_per_person=50, resize=0.5)
  2 X = lfw_people.data
  3 y = lfw_people.target
  4 target_names = lfw_people.target_names
  5 n_samples, h, w = lfw_people.images.shape
  6
  7   original_shape = (h, w)


   The feature matrix X is standarized using StandardScaler from sklearn.preprocessing. This
standarization is crucial for ensuring that the principal component analysis (PCA) operates
effectively. PCA is then applied to the standarized feature matrix 𝑋 to reduce its dimensionality
while retaining most of the variance. This reduction in dimensionality is achieved by extracting the
most significant features, known as principal components, from the data. In this analysis, 100
principal components are retained, as determined by the parameter 𝑛_𝑐𝑜𝑚𝑝𝑜𝑛𝑒𝑛𝑡𝑠. The transformed data is
represented in a lower-dimensional space, resulting in the matrix 𝑋_𝑃𝐶𝐴. PCA is particularly well-
suited for this task because it effectively reduces the high dimen- sionality of image data
while preserving essential features that contribute to variance. This reduction is crucial for
computational efficiency and helps in avoiding the curse of dimensional- ity, which can adversely
affect machine learning algorithms. PCA also helps in noise reduction and improves the
performance of subsequent classifiers by focusing on the most significant
features.

                         Listing 2: Standarization and PCA on the feature matrix
  1 scaler   = StandardScaler()
  2 X_scaled = scaler.fit_transform(X)
  3
  4   # Applying PCA
  5 n_components = 100          # Number of principal components to keep
  6 pca = PCA(n_components=n_components, svd_solver=’randomized’, whiten=True)
  7 X_pca    = pca.fit_transform(X_scaled)
  8
  9 print(f"Original shape: {X_scaled.shape}")
  10 print(f"Transformed shape: {X_pca.shape}")
   To visualize the principal components, known as eigenfaces, the components are reshaped to
the original image dimensions. The first 10 eigenfaces are displayed to show the main features
captured by PCA. Additionally, a function is defined to visualize the original and reconstructed
faces, allowing for an assessment of how well PCA captures important features. The PCA-
transformed data 𝑋_𝑝𝑐𝑎 is inversely transformed to reconstruct the original images, and a
subset of these reconstructed images is displayed alongside their original counterparts.

                 Listing 3: Eigenfaces visualization and reconstruction of original faces
  1 eigenfaces = pca.components_.reshape((n_components, h, w))
  2
  3 plt.figure(figsize=(15,         8))
  4 for i in range(10): # displaying the first 10 eigenfaces
  5     plt.subplot(2, 5, i + 1)
  6     plt.imshow(eigenfaces[i], cmap=’gray’)
  7     plt.title(f"Eigenface {i+1}")
  8     plt.xticks(())
  9     plt.yticks(())
 10 plt.show()
 11
 12 def plot_reconstructed_faces(X_original, X_reconstructed, n_faces=4):
 13       plt.figure(figsize=(15, 8))
 14       for i in range(n_faces):
 15            ax = plt.subplot(2, n_faces, i + 1)
 16            plt.imshow(X_original[i].reshape((h, w)), cmap=’gray’)
 17            plt.xticks(())
 18            plt.yticks(())
 19
 20            ax = plt.subplot(2, n_faces, i + 1 + n_faces)
 21            plt.imshow(X_reconstructed[i].reshape((h, w)), cmap=’gray’)
 22            plt.xticks(())
 23            plt.yticks(())
 24
 25       plt.show()
 26
 27 X_reconstructed = pca.inverse_transform(X_pca)
 28 plot_reconstructed_faces(X,           X_reconstructed)


     For classification, a Naive Bayes classifier [6] is implemented from scratch. This involves
defining methods for fitting the model to training data, calculating likelihoods and posteriors and
making predictions. The dataset is split into training and testing sets using and 80-20 ratio with the
𝑡𝑟𝑎𝑖𝑛_𝑡𝑒𝑠𝑡_𝑠𝑝𝑙𝑖𝑡 function from 𝑠𝑘𝑙𝑒𝑎𝑟𝑛.𝑚𝑜𝑑𝑒𝑙_𝑠𝑒𝑙𝑒𝑐𝑡𝑖𝑜𝑛. The Naive Bayes classifier is trained on the training set and used
to predict labels for the test set. The accuracy of the classifier is then calculated using 𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦_𝑠𝑐𝑜𝑟𝑒
from 𝑠𝑘𝑙𝑒𝑎𝑟𝑛.𝑚𝑒𝑡𝑟𝑖𝑐𝑠, providing a quantitative measure of the model’s performance.

                     Listing 4: Naive Bayes classifier implementation and prediction
  1 class NaiveBayes:
  2       def fit(self, X, y):
  3          self.classes = np.unique(y)
  4          self.mean = {}
  5         self.var = {}
  6         self.prior = {}
  7
  8         for c in self.classes:
  9             X_c = X[y == c]
 10             self.mean[c] = np.mean(X_c, axis=0)
 11             self.var[c] = np.var(X_c, axis=0)
 12             self.prior[c] = X_c.shape[0] / X.shape[0]
 13
 14     def _calculate_probability(self, mean, var, x):
 15         exponent = np.exp(-(x - mean) ** 2 / (2 * var))
 16         return exponent / np.sqrt(2 * np.pi * var)
 17
 18     def _calculate_posterior(self, x):
 19         posteriors = []
 20
 21         for c in self.classes:
 22              prior = np.log(self.prior[c])
 23              likelihood = np.sum(np.log(self._calculate_probability(self.mean[c], self.
                      var[c], x)))
 24              posterior = prior + likelihood
 25              posteriors.append(posterior)
 26
 27         return self.classes[np.argmax(posteriors)]
 28
 29     def predict(self, X):
 30         return [self._calculate_posterior(x) for x in X]
 31
 32 X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2,
        random_state=42)
 33
 34 nb = NaiveBayes()
 35 nb.fit(X_train, y_train)
 36
 37 predictions = nb.predict(X_test)
 38
 39 accuracy = accuracy_score(y_test, predictions)
 40 print(f"Accuracy: {accuracy:.2f}")


  Finally, a single image from the test set is selected to demonstrate the classifier’s prediction
capability. The selected image is transformed back to its original dimensions, and its true and
predicted labels are displayed. This visual representation helps in understanding the model’s
performance on individual instances, complementing overall accuracy metric.

                    Listing 5: Demonstrating classifier’s prediction capability
  1 index = 12
  2 single_image = X_test[index]
  3 single_image_original = pca.inverse_transform(single_image).reshape(original_shape)
  4
  5 predicted_label = nb.predict(np.array([single_image]))[0]
  6 true_label = y_test[index]
  7
  8 plt.imshow(single_image_original,         cmap=’gray’)
  9 plt.title(f"True label: {target_names[true_label]}\nPredicted label: {target_names[
          predicted_label]}")
 10 plt.xticks(())
 11 plt.yticks(())
 12 plt.show()


3. Experiments
The 𝑙𝑎𝑏𝑒𝑙𝑒𝑑_𝑓𝑎𝑐𝑒𝑠_𝑖𝑛_𝑡ℎ𝑒_𝑤𝑖𝑙𝑑_𝑑𝑎𝑡𝑎𝑠𝑒𝑡 is widely used for facial recognition tasks [7]. It contains JPEG images of
various famous people, and each image is labaled with the name of the person. Scikit-learn
[8] provides two loaders that will automatically download, parse the metadata files, decode
the JPEG and convert the slices into memmapped numpy arrays.


Figure 1: The exapmle of converted data


  For this analysis, we use a subset of the dataset where each person has at least 50 images.
This subset contains 1560 images of 12 different people.
  After decreasing the size of the photos and changing them into gray, PCA algorithm [9] leads to
generation of eigenfaces which create a basis set derived from all the images used to construct the
matrix. This results dimensional reduction by allowing the original training images to be
represented by a smaller set of basis images.
Figure 2: The plot shows the number of occurrences of each name of face in the dataset


Figure 3: Some examples of generated eigenfaces


  By performing PCA we can notice a decrease in variance ratio in increasing number of
components. So that the plot helps to decide how many principal components to select to retain as
much information as possible while reducing the number of dimensions. We calculated that the
best option for us is 100 components.
  To check the performance of our Face Recognition using PCA and Bayes Classificator pro-
grame we created the confusion matrix which is fundamental tool in this field. It provides a
detailed breakdown of how well the model’s predictions the actual class labels.
  Finally, in order to check detailed indicators as precision( ratio of true positive predictions to
the total predicted positives), recall(ratio of true positive predictions to the total actual
positives), F1-Score(average of precision and recall), support(the number of occurrences of
each class in dataset) the classification report was made.
  This report clearly shows the accuracy of our application which is estimated to be around
74%.
Figure 4: Proportion of variance


Figure 5: Confusion matrix


Figure 6: Classification report
4. Conclusion
This project demonstrates the application of machine learning techniques for facial recognition
using the "Labeled Faces in the Wild" (LFW) dataset. Through meticulous preprocessing and
feature extraction, we prepared the dataset for analysis, ensuring its suitability for subsequent
machine learning algorithms. Principal Component Analysis (PCA) played a pivotal role in
reducing the dimensionality of the dataset while retaining essential variance, effectively
capturing the underlying structure of the facial images. The visualization of eigenfaces provided
valuable insights into the primary features captured by PCA, enhancing our understanding
of the dataset’s characteristics. The implementation of a Naive Bayes classifier facilitated the
classification of facial images with satisfactory accuracy. By training the classifier on a subset of
the dataset and evaluating its performance on unseen data, we gained valuable insights into
its generalization capabilities. Furthermore, the visual representation of prediction results provided
a tangible demonstration of the classifier’s ability to accurately identify induviduals from facial
images, showcasing the practical utility of the developed model. Overall, this project
exemplifies the effectiveness of machine learning techniques in facial recognition tasks and
underscores their potential for diverse real-world applications, ranging from security and
surveillance to personalized user experience as beyond. As technology continues to evolve,
further advancements in machine learning algorithms and datasets hold promise for even more
accurate and robust facial recognition systems.

References
[1] M.-H. Le, N. Carlsson, Iddecoder: A face embedding inversion tool and its privacy and
    security implications on facial recognition systems, in: Proceedings of the Thirteenth ACM
    Conference on Data and Application Security and Privacy, 2023, pp. 15–26.
[2] A. Jaszcz, D. Połap, Aimm: Artificial intelligence merged methods for flood ddos attacks
    detection, Journal of King Saud University-Computer and Information Sciences 34 (2022)
    8090–8101.
[3] K. Prokop, D. Połap, G. Srivastava, J. C.-W. Lin, Blockchain-based federated learning
    with checksums to increase security in internet of things solutions, Journal of Ambient
    Intelligence and Humanized Computing 14 (2023) 4685–4694.
[4] M. Wieczorek, J. Siłka, M. Woźniak, S. Garg, M. M. Hassan, Lightweight convolutional
    neural network model for human face detection in risk situations, IEEE Transactions on
    Industrial Informatics 18 (2021) 4820–4829.
[5] B. U. H. Sheikh, A. Zafar, Unlocking adversarial transferability: a security threat towards
    deep learning-based surveillance systems via black box inference attack-a case study on
    face mask surveillance, Multimedia Tools and Applications 83 (2024) 24749–24775.
[6] A. Vidhya, Naive bayes explained, https://www.analyticsvidhya.com/blog/2017/09/
    naive-bayes-explained/, 2017. Accessed: 2024-05-20.
[7] A. A. Jha, Lfw people dataset, https://www.kaggle.com/datasets/atulanandjha/lfwpeople, 2023.
    Accessed: 2024-05-20.
[8] Scikit-learn developers, Scikit-learn: Machine learning in python, https://scikit-learn.org/
    stable/, 2023. Accessed: 2024-05-20.
[9] AIMonks, Principal component analysis (pca) in machine learning, https://medium.com/
    aimonks/principal-component-analysis-pca-in-machine-learning-407224cb4527, 2023. Ac-
    cessed: 2024-05-20.