<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Euro-Mediterranean Workshop on Artificial Intelligence and Smart Systems, October</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Deep Learning Methods for Arabic Sign Language Recognition and Translation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zaid Saad Bilal</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amir Gargouri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hanaa F Mahmood</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hassene Mnif</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Education for Pure Science, Department of Computer Science, University of Mosul</institution>
          ,
          <country country="IQ">Iraq</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Laboratory of Signals systeMs aRtificial Intelligence and neTworkS</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National School of Electronics and Telecommunications, LETI Laboratory-ENIS, University of Sfax</institution>
          ,
          <country country="TN">Tunisia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>15</volume>
      <issue>2024</issue>
      <fpage>15</fpage>
      <lpage>18</lpage>
      <abstract>
<p>Machine learning and deep learning neural network methods are widely used in big data analysis to train algorithms that interpret human natural language. Such algorithms can analyze data characteristics and identify relevant patterns, new information, and different aspects of modern society. Deep learning algorithms are highly beneficial in translation, voice search, and customer feedback analysis. This research concentrates on applying deep learning algorithms to help the deaf and hard of hearing, especially in translating what is said in sign language. Sign languages play an essential role in the exchange of information and interaction between the deaf and the hard of hearing. While translators and sign language interpreters mitigate communication barriers, they face limitations: translators are not always available, and the number of sign languages continues to grow. There is therefore a need to enhance current practices and make the developed systems more accessible. In line with this, this research includes a comparative analysis of deep learning algorithms for Arabic sign language recognition and translation, assessing the suitability of the technology for serving the needs of deaf and hard-of-hearing people in real-time interaction. The evidence supports the hypothesis on the effectiveness of deep learning techniques compared to other AI methods in translating Arabic sign language.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep learning</kwd>
        <kwd>Machine learning</kwd>
        <kwd>Neural networks</kwd>
        <kwd>Sign language recognition</kwd>
        <kwd>Arabic sign language</kwd>
        <kwd>Deaf and hard of hearing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Today, deep learning algorithms are widely applied to language understanding, translation, and voice
search, and to gauge public perceptions of the services offered by corporations. It has therefore become
necessary to use deep learning to extend these capabilities as a humanitarian service to a section of
society that needs help to improve its ability to communicate with other people. This community stands
to benefit from the capabilities highlighted above, which can help bring it out of the isolation it
currently experiences.</p>
      <p>
        Communication is among the most basic and vital functions in
people’s existence. It is a simple yet functional means of expressing oneself and of staying informed
about events in one’s own area and throughout the world. However, a significant portion of the world’s
population is unable to communicate in this way
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], for instance due to hearing impairment, weak speech, or both. Weak speech means
that a person cannot effectively communicate or argue their case. Hearing loss can be defined as the
partial or complete loss of hearing in one ear or both ears. Mutism is a disability that affects the
ability to speak; for example, when deafness occurs in childhood it can lead to language impairment
that leaves the affected individual unable to talk. These conditions are among the most prevalent
disabilities in the world today (Hassan and colleagues, 2020). Statistical data on children with
physical disabilities over the past decade indicates an increase in hearing loss among newborns,
which distorts the communication process between these infants and the rest of society [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        According to a WHO report, the population of people with hearing impairment in 2005 was
approximately 278 million [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Since then, this figure has risen to 360 million, an increase of
approximately 14% [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and the number has continued to grow. A more recent WHO
report states that 466 million people had hearing loss in 2019, about 5% of the
world’s population, of whom 432 million were adults and 34 million were children [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. WHO
also predicted that by 2050 this figure will almost double, reaching 900 million individuals [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This population
is growing very fast. There thus exists a communication gap that hinders the lives
of deaf and mute people and their interactions with others. The deaf are a dynamic
segment of society, and from this arose the need for a way of enabling the deaf and
mute to engage with the different segments of the communities in which they live; hence, sign language.
      </p>
      <p>
        Most deaf and hard-of-hearing individuals use sign languages as their primary mode of
communication [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Sign language is a useful means of coping with the lack of communication and interpersonal
interaction between them and others, and sign language interpreters help close the
communication gap by translating between sign languages and spoken languages.
However, these interpreters are not always present to assist
deaf and hard-of-hearing people, and many different sign languages are used around the world.
As stated by the World Federation of the Deaf, there are over 300 sign languages in use across deaf
communities in the world today [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Thus, there is a need for a technology-based system to
assist the work of sign language interpreters and, on top of that, to make this technology available to
deaf and hard-of-hearing people at the right time and place. This study examines machine learning and
the strategies and models used for sign language recognition, and applies them to build a recognition
model for the Arabic sign language.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Works</title>
      <p>
        The study in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] designed a system for collecting visual manual data in Arabic Sign
Language and translating the processed visual data into text. The dataset
incorporated into the work encompasses 54,049 images of the Arabic sign language alphabet,
where each category consists of 1,500 images, with a total of 25 categories, each
represented by a hand gesture or sign. Several transformations and image enhancements
were performed on the input images during the preprocessing stage. EfficientNetB4, a heavy-weight
architecture of considerably higher complexity, achieved 98% training accuracy and 95% testing
accuracy.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the aim was to describe an effective technique for the automatic recognition of Arabic sign language
that employs convolutional neural networks (CNNs) for feature extraction and long short-term memory
(LSTM) networks for classification. The AlexNet CNN is used to extract deep features from the input
image, while the LSTM maintains the sequential contextual information of the video frames. The method
was evaluated on fifty repetitions of one hundred and fifty signs typical of daily activities, performed
by the three signers who recorded the dataset. The overall recognition accuracy reached 95.9% for the
signer-dependent case and 43% for the signer-independent case based on the modified Hidden Markov
Model, while the proposed method significantly decreased the average number of iterations of the
optimization algorithm. The signer-independent results are 86% for non-word sequences and 62% for
the more challenging case.
      </p>
      <p>
        In the study [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], a novel system named ArSLR is introduced for localizing and recognizing the
various Arabic sign language alphabets using a Faster Region-based CNN (Faster R-CNN). Faster R-CNN
is used to identify and match features in a given image and to learn
the position of the hand. To verify the proposed Faster R-CNN-based
sign recognition system experimentally, VGG-16 and ResNet-18 models are employed, and an openly
available real ArSLR image dataset is gathered. The proposed approach achieved an accuracy
of 93 percent. In the [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] study, the authors reviewed scientific papers on intelligent systems for
sign language recognition published during the last twenty years. The study analysed a total
of 649 publications on decision support and intelligent systems for Sign Language Recognition (SLR)
retrieved from the Scopus database. Using triangulation, the extracted publications were analyzed with
the VOSviewer bibliometric software. This work is expected to advance understanding of
intelligence-driven SLR in various subject fields and to offer directions to readers, academics,
and professionals. The [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] study developed a novel ArSLR scheme utilizing an unsupervised DNN-based
approach for recognizing and classifying Arabic alphabet letters. After
resizing and normalization, 6,000 samples of 28 Arabic alphabetic signs were used to extract the
features. For classification, SoftMax regression was used and achieved an average accuracy
of 83.32%.
      </p>
      <p>
        In the study [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], a new technique is described for identifying Arabic sign language words using
the Leap Motion device, which builds three-dimensional models of the human hand
using infrared rays. The approach is based on mathematical features
extracted from the Leap Motion controller. The results of the presented research reveal that the
proposed method yielded accuracy as high as 89%, proving its efficiency. The recognition rate ranges
from 65% for one-handed gestures to 96% for two-handed gestures.
      </p>
      <p>In the study delineated by [14], the authors provided a new system that does not compel
deaf individuals to put on uncomfortable devices such as gloves to enable hand recognition.
The system is founded on motion descriptors from image planes. This task is done
with the Scale-Invariant Feature Transform (SIFT) technique, since it can extract features that are
robust to rotation and occlusion. Moreover, the dimensionality of the extracted feature vectors is
reduced with the Linear Discriminant Analysis (LDA) technique, and by increasing the distance between
the classes, the accuracy of the system is increased. The features are then classified into Arabic sign
characters using different methods, namely Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), and
minimum distance. Several experiments were carried out to test the efficiency of the
system, and they indicate that the accuracy of the results obtained is around 99%. Further
experiments showed that the proposed system is reliable even under rotation,
exhibiting an identification rate of about 99%. Furthermore, the evaluation demonstrated that the
developed system is competitive with other systems developed in related works.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>This section covers data collection and some simple enhancement of the data by applying basic
transformations to the images. A popular algorithm for detecting objects in images, which we employed
in our study, is the Single Shot MultiBox Detector (hereinafter SSD). This algorithm combines several
components:
1. VGGBase: This part generates the lower-level feature maps; the model adopted here is a
convolutional neural network based on VGG16.
2. Auxiliary convolutions: These layers operate on the feature maps produced by the VGGBase
to generate additional feature maps.
3. Prediction convolutions: These convolutions predict the bounding boxes and the classes present in
the boxes.
4. MultiBoxLoss: This is the loss function for training the SSD model, which takes an input
image and outputs the locations of the boxes containing the objects of interest. It combines a
localization loss and a confidence loss.
5. SSD: This is the main SSD model that combines all the components stated above. The final
stage takes the input image and generates the final estimate of the locations of the objects in the
image and their class probabilities.</p>
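The MultiBox component relies on a grid of pre-computed default (prior) boxes tiled over each feature map. The following is a minimal sketch of how such priors are typically generated for one SSD feature map; the feature-map size, scale, and aspect ratios below are illustrative assumptions, not the exact configuration used in this work:

```python
import itertools
import math

def generate_priors(fmap_size, scale, aspect_ratios):
    """Generate SSD-style default (prior) boxes in center-size form
    (cx, cy, w, h), normalized to [0, 1], for one square feature map."""
    priors = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx = (j + 0.5) / fmap_size   # box center sits in the middle of the cell
        cy = (i + 0.5) / fmap_size
        for ratio in aspect_ratios:
            w = scale * math.sqrt(ratio)   # wider boxes for ratio above 1
            h = scale / math.sqrt(ratio)   # taller boxes for ratio below 1
            priors.append((cx, cy, w, h))
    return priors

# One 4x4 feature map with three aspect ratios -> 4 * 4 * 3 = 48 priors
boxes = generate_priors(4, scale=0.3, aspect_ratios=[0.5, 1.0, 2.0])
```

In the full detector, each feature map contributes its own set of priors, and the prediction convolutions regress offsets relative to these boxes rather than absolute coordinates.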
      <p>The algorithm follows the SSD architecture, where the basic network extracts features, auxiliary
convolutions generate additional feature maps, prediction convolutions predict object locations and
classes, and a MultiBox loss function is used to train the model. The hand is first detected using
the Single Shot Multibox Detector (SSD) [15], and then the detected hand serves as the input for the
classification system. To achieve high performance and precision in solving object detection challenges,
[16] uses a machine learning model called SSD. As seen in Figure 1, the SSD structure in this paper is
based on the VGG-16 architecture. The fully connected layers are removed from the classification network
and a series of convolutional layers is introduced for feature extraction.</p>
      <p>In this way, the CNN progressively downsamples the input through successive layers while
extracting useful features at different scales. In MultiBox, we use pre-computed prior boxes of fixed
dimensions, chosen so that they approximate the distribution of the original ground-truth boxes as
closely as possible. These priors are selected so that the likelihood of a positive match with a
ground-truth box is high. The dataset was collected from Kaggle, which gathers datasets from different
sources and makes them available for research purposes. We accumulated a large amount of data on the
movements used in Arabic Sign Language; the collection contains both still images and motion clips of
some movements. The Arabic Alphabet RGB Sign Language (AASL) dataset comprises 7,857 raw, fully
labelled RGB sign language images of Arabic alphabets, the only such dataset available on Kaggle as of
now [17]. The dataset aims to assist people who want to develop models for different Arabic sign
language scenarios. Images were contributed by more than 200 participants in different settings,
varying in lighting, background, image orientation, image size, and image resolution.
Potential errors and incorrect images were avoided by having the collected images moderated,
validated, and filtered by a panel of experts in the field. Figure 2 shows samples of the data, which
are based on images collected from students in and outside class settings.</p>
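Matching priors to ground-truth boxes is conventionally done with the intersection-over-union (Jaccard) overlap. The following is a small sketch, assuming corner-form (xmin, ymin, xmax, ymax) boxes and the 0.5 matching threshold commonly used with SSD:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes in (xmin, ymin, xmax, ymax) form."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_priors(priors, gt_box, threshold=0.5):
    """Indices of priors counted as positive matches for one ground-truth box."""
    return [i for i, p in enumerate(priors) if iou(p, gt_box) > threshold]
```

Priors that fail the threshold become background examples for the confidence loss; matched priors also contribute to the localization loss.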
      <p>We then applied data-cleaning preprocesses, including scaling and normalization, and split the
data into training and validation sets. The data was preprocessed with cleaning and augmentation to
enhance both the quality and the size of the datasets.</p>
      <p>There were a few notable changes we made to adapt the dataset obtained from the Kaggle website
to the Single Shot Multibox Detector (SSD) algorithm. First, we enclosed the objects of interest with
rectangles on each image of the dataset, focusing on the hand gestures depicting the Arabic
letters. A rectangular box had to be drawn on a case-by-case basis to wrap
around each hand sign so that the SSD could localize the gestures within the images. We also built
the ground-truth table mapping all bounding boxes and annotations to the proper Arabic letters. The
point of this is that the SSD has to learn the correlation between the hand-sign visual patterns and
the corresponding Arabic letters, and this ground-truth table supports that process. These
adjustments enhanced the suitability of the dataset for SSD training and effective
real-time gesture recognition and Arabic sign language interpretation.</p>
      <p>The selected dataset presents a challenge since it lacks an annotation file, which is necessary
for the creation of bounding-box labels for object detection. The SSD network outputs both the location
and the label of the hand, so training data with two outputs (labels and
bounding-box annotations) needs to be generated. We propose the following steps to add annotations
to the dataset: all photos in the dataset are resized to 200x200 pixels, and
bounding-box information is added for each image. This means that in addition
to labeling the objects or areas of interest in our pictures, we also need to record the bounding-box
coordinates each item corresponds to. There should be a matching bounding box for each image in the
dataset, and code must be written to load the bounding-box data for every image in the training set. We
labeled the bounding boxes of objects in the images using a labeling annotation tool.
Finally, we checked and examined the annotations: to make sure they are accurate, it is essential to
review them, verifying that the class names are correct and that the bounding boxes around the objects
of interest are placed correctly.</p>
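Assuming the labeling tool exports Pascal VOC-style XML (the format produced by common bounding-box annotation tools; the paper does not name the export format, so this is an assumption), loading the bounding-box data for one image can be sketched as:

```python
import xml.etree.ElementTree as ET

def load_annotation(xml_text):
    """Parse one VOC-style annotation into (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((name,) + coords)
    return boxes

# Hypothetical annotation for one image of the letter "alef"
sample = """<annotation>
  <object><name>alef</name>
    <bndbox><xmin>42</xmin><ymin>30</ymin><xmax>150</xmax><ymax>180</ymax></bndbox>
  </object>
</annotation>"""
```

Reviewing a few parsed annotations against the images is a quick way to catch the class-name and box-placement errors mentioned above.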
      <p>Figure 3 shows the sign language for each letter of the Arabic language. We have labeled the Arabic
letters with numbers so that the algorithms can recognize them in training and testing.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussions</title>
      <p>The following section shows the results of the proposed models in terms of standard DL metrics so
that the overall performance of our model can be compared to other models. To achieve better results,
we developed additional code whose framework uses different deep learning models that other researchers
have found accurate for classifying images into multiple classes. The counts of test records of
different classes that have been accurately predicted by the proposed classification model, as well
as the counts of instances where misclassification has occurred, are employed for evaluating the model.
Hence, from the confusion matrix, a clear picture of a predictive model’s performance can be determined,
in particular which classes are being forecast accurately or inaccurately and what kinds of errors
are being made. Performance is evaluated with
Accuracy, Recall, Precision, and F1-Score, as follows: Accuracy is the ratio of correct predictions
(true positives and true negatives) to all predictions. Recall is the ratio of true positives to all
actual positives in the dataset. Precision is the ratio of true positives to all positives predicted by
the model. F1-Score provides a single number that balances the concerns of precision and recall. A good
F1 score means that the model has few false positives and few false negatives.</p>
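Using the definitions above, the four metrics can be computed directly from binary confusion-matrix counts; the counts in the example call are illustrative, not taken from the experiments:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics(tp=90, fp=10, fn=10, tn=90)
```

For the multi-class case, precision, recall, and F1 are computed per class from the confusion matrix and then averaged.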
      <p>In Figure 4 we show the confusion matrix for training, in Figure 5 we show the confusion
matrix for testing, and in Figure 6 we show the accuracy and loss levels during
training on the dataset. During training, our proposed model achieved impressive scores of 98%, 98%,
98%, and 98% for accuracy, precision, recall, and F1 score, respectively. During
testing, our proposed model achieved scores of 94%, 95%, 94%, and 94% for the same criteria. The
confusion matrix shows the performance of the model on the training dataset. It compares
the true labels (actual classes) on the vertical axis with the predicted labels (model’s output) on the
horizontal axis. Each number within the matrix represents how many instances were classified into a
particular true-versus-predicted label combination.</p>
      <p>The diagonal values of the matrix indicate the number of correct predictions made by the model for
each class. For instance, a value of 236 in the row for the "Alef" class and the column for the "Alef" class
shows that the model correctly predicted 236 instances of the "Alef" class. These diagonal elements
reflect the model’s success in accurately classifying instances of each class.</p>
      <p>The of-diagonal values represent misclassifications. For example, if the value at row "Alef" and column
"Dad" is 3, it means the model mistakenly predicted 3 instances of "Alef" as "Dad". Such misclassifications
can help in understanding where the model is confused and where it needs improvement.</p>
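Ranking the off-diagonal cells makes this kind of error analysis systematic. The following is a small sketch with an illustrative (not actual) confusion matrix and label set:

```python
def top_confusions(matrix, labels, k=3):
    """Return the k largest off-diagonal cells of a confusion matrix
    as (true_label, predicted_label, count) triples."""
    cells = [(labels[i], labels[j], matrix[i][j])
             for i in range(len(matrix)) for j in range(len(matrix))
             if i != j and matrix[i][j] > 0]
    return sorted(cells, key=lambda c: -c[2])[:k]

# Hypothetical 3-class confusion matrix (rows: true, columns: predicted)
m = [[236, 3, 0],
     [1, 55, 2],
     [0, 4, 60]]
worst = top_confusions(m, ["Alef", "Ain", "Dad"], k=2)
```

The returned pairs point directly at the classes the model confuses most often, which is where targeted data collection or augmentation pays off first.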
      <p>Overall, the matrix shows that the model performs reasonably well with many high diagonal values,
suggesting that the classifier is accurately predicting most of the classes. However, examining the
misclassifications (off-diagonal elements) provides a deeper understanding of where the model struggles,
indicating areas for improvement or potential issues in class overlap. The performance metrics derived
from this matrix, such as accuracy, precision, recall, and F1-score, can help quantify the model’s
effectiveness and guide future optimizations.</p>
      <p>The confusion matrix in Figure 5 represents the performance of the classifier on the test dataset,
similar to the previous one. It compares the true labels (actual classes) on the vertical axis with the
predicted labels (model’s output) on the horizontal axis, showing the model’s ability to classify the
data correctly.</p>
      <p>As in the training confusion matrix, the diagonal elements indicate the number of correct
predictions made by the model for each class. For example, the value 55 in the row for "Ain" and the
column for "Ain" shows that 55 instances of the "Ain" class were correctly identified. This is similar to
the diagonal behavior observed in the training dataset.</p>
      <p>The off-diagonal values indicate misclassifications. For example, the value 3 in the row for "Hah"
and the column for "Teh Marbuta" means that 3 instances of "Hah" were incorrectly predicted as "Teh
Marbuta." These off-diagonal elements are crucial for diagnosing where the model confuses classes.</p>
      <p>Overall, the matrix shows that the classifier performs similarly on the test dataset as it did on the
training dataset. The diagonal values are predominantly high, indicating that the model is correctly
predicting the majority of the classes. However, the off-diagonal values highlight instances where
misclassifications occur. A detailed analysis of these misclassifications can provide insights into the
model’s weaknesses and areas for improvement. This confusion matrix can also be used to calculate
evaluation metrics such as accuracy, precision, recall, and F1-score, which are vital for understanding
the classifier’s overall performance.</p>
      <p>In Figure 7 we show test images of the trained model, one image selected from each label,
displaying each image with its ground-truth box and with the box predicted by the trained model.</p>
      <p>As a result, the study’s thorough assessment of the proposed model under various training
circumstances demonstrates its resilience and adaptability. These findings emphasize the significance of
taking into account model performance under various training setups and offer useful insights for
practitioners and researchers working on related deep-learning problems. Table 1 presents the outcomes
of the competing models’ performance in sign language translation. Only the studies that reported
precise results have been taken into consideration, and the simulations were run on the datasets chosen
by their respective authors. As shown, some studies employ Arabic sign language datasets, others
American Sign Language, others Indian sign language, and still others Indonesian
sign language.</p>
      <p>Figure 6 shows the loss curves for both the training and validation datasets over the course of 100
epochs. The blue line represents the training loss, which initially drops sharply before stabilizing at a
lower value as the model improves. The orange line represents the validation loss, which starts higher
than the training loss and decreases gradually, though it remains consistently higher throughout. The
gap between the two lines suggests that while the model is improving on the training data, it is not
generalizing as well to the validation data, a potential sign of overfitting. The oscillations observed
in both loss curves after initial stabilization indicate that the model’s performance on the training
and validation sets reaches a plateau after a certain point, implying that further training may yield
diminishing returns without significant improvements.</p>
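The train/validation gap and the plateau described for Figure 6 can be quantified with two simple checks; the loss values below are illustrative placeholders, not the actual training curves:

```python
def generalization_gap(train_loss, val_loss, window=5):
    """Mean (validation - training) loss over the last `window` epochs;
    a persistently positive gap is the overfitting signal described above."""
    diffs = [v - t for t, v in zip(train_loss[-window:], val_loss[-window:])]
    return sum(diffs) / len(diffs)

def has_plateaued(losses, window=5, tol=1e-3):
    """True when the loss has moved by less than `tol` over the last `window` epochs."""
    recent = losses[-window:]
    return max(recent) - min(recent) < tol

# Illustrative loss histories over eight "epochs"
train = [1.0, 0.5, 0.30, 0.200, 0.199, 0.199, 0.198, 0.199]
val = [1.2, 0.8, 0.55, 0.420, 0.410, 0.405, 0.402, 0.401]
gap = generalization_gap(train, val)
```

Checks like these can drive early stopping: once the training loss plateaus while the gap stays large, further epochs mostly add overfitting risk.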
      <p>The study presented in this paper effectively uses deep learning methods (SSD, VGG16, and CNN)
for Arabic Sign Language (ArSL) recognition and achieves high accuracy (94-98%); however, it has
several significant limitations that could be addressed to increase its impact on the field. One
major drawback of the current approach is the emphasis on static recognition of the
alphabet, which does not capture the dynamic nature of sign language. It is important to understand
that sign languages are not just a set of fixed hand positions but rather dynamic movements,
facial expressions, and other contextual features. The use of temporal models such as LSTM networks or
RNNs could help enhance real-time gesture translation, since sign language is temporal and sequential
in nature. The dataset used in the study is small and covers only the Arabic alphabet, which makes it
difficult to apply the model to more complex sign language involving sentences, phrases, or contextual
information. Incorporating more signs and real-life sign language applications would improve the
model and its versatility in different situations. Another weakness
of the study is the use of manual annotations, which can be subjective and contain mistakes. The
training data could be improved by employing automated or semi-automated annotation methods,
and possibly crowd-sourced validation. The absence of real-time performance statistics and comparisons
with other models also poses a drawback. Measures such as inference time, latency,
and computational complexity would indicate the model’s suitability for practical
applications. Moreover, a more detailed comparison of the proposed model with other approaches
would provide a better understanding of its advantages and weaknesses.
Explaining why the specific model architectures and techniques were chosen, and discussing their
limitations, would enhance the paper’s originality and significance.</p>
      <p>Figure 7 shows test images for the trained model, one image selected from each label, each displayed with
its ground-truth bounding box and the box predicted by the trained model. The study’s thorough
assessment of the proposed model under various training conditions demonstrates its resilience and
adaptability. These findings emphasize the importance of considering model performance under
different training setups and offer useful insights for practitioners and researchers working on related
deep-learning problems.</p>
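      <p>The agreement between predicted and ground-truth boxes in Figure 7 is typically quantified with intersection-over-union (IoU). The snippet below is a minimal, framework-free sketch; the (x1, y1, x2, y2) corner format is an assumption, not taken from the study.</p>

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A predicted box partially overlapping a ground-truth box:
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```

      <p>Detectors such as SSD conventionally count a prediction as correct when its IoU with the ground-truth box exceeds a threshold (0.5 is a common choice).</p>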
      <p>Table 1 presents the performance of competing models for translating text into sign language; only
studies that reported precise results are considered. The simulations were run on the datasets chosen
by the respective authors: some studies employ Arabic Sign Language datasets, others American Sign
Language, Indian Sign Language, or Indonesian Sign Language datasets.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In conclusion, this study emphasizes the importance of deep neural networks in addressing the
communication barriers faced by the deaf and hard-of-hearing community. Our work leverages deep
learning algorithms for sign language interpretation in order to enhance accessibility and
communication for individuals who rely on sign languages. We integrated SSD, VGG16, and a CNN to
implement an Arabic Sign Language translator trained on the RGB Arabic Alphabets Sign Language
(AASL) dataset from Kaggle. During training, the proposed model achieved 98% accuracy, 98%
precision, 98% recall, and a 98% F1 score. During testing, it achieved 94%, 95%, 94%, and 94% for the
same metrics. These results demonstrate the effectiveness of deep learning algorithms in recognizing
and translating Arabic sign language. Despite the slight drop in performance between the training and
testing phases, the proposed model achieved commendable scores across key evaluation metrics,
demonstrating its potential to facilitate seamless communication for deaf and hard-of-hearing
communities.</p>
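      <p>For reference, the evaluation metrics reported above follow the standard per-class definitions. The sketch below uses hypothetical true-positive, false-positive, and false-negative counts for a single letter class, not the study’s actual confusion matrix.</p>

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from per-class detection counts."""
    precision = tp / (tp + fp)   # fraction of predictions that were correct
    recall = tp / (tp + fn)      # fraction of true instances that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts for one letter class:
p, r, f = prf1(tp=94, fp=5, fn=6)
print(round(p, 2), round(r, 2), round(f, 2))  # prints: 0.95 0.94 0.94
```

      <p>Multi-class figures such as those reported here are then obtained by averaging these per-class values across the alphabet.</p>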
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[14] I. C. Education, Neural networks, https://www.ibm.com/ae-ar/cloud/learn/neural-networks#toc------cFaMGv3g, 2020.</p>
      <p>[15] W. Liu, et al., SSD: Single shot multibox detector, in: Proc. of European Conference on Computer Vision, 2016, pp. 21–37.</p>
      <p>[16] C. Szegedy, et al., Going deeper with convolutions, in: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1–9.</p>
      <p>[17] M. Al-Barham, et al., RGB Arabic alphabets sign language dataset, arXiv (2023). URL: https://arxiv.org/abs/2301.11932.</p>
      <p>[18] R. M. Duwairi, Z. A. Halloush, Automatic recognition of Arabic alphabets sign language using deep learning, International Journal of Electrical and Computer Engineering (IJECE) 12 (2022) 2996–3004. doi:10.11591/ijece.v12i3.pp2996-3004.</p>
      <p>[19] V. Bheda, D. Radpour, Using deep convolutional networks for gesture recognition in American sign language, CoRR abs/1710.06836 (2017).</p>
      <p>[20] L. A. Khuzayem, et al., Efhamni: A deep learning-based Saudi sign language recognition application, Sensors 24 (2024). doi:10.3390/s24103112.</p>
      <p>[21] J. B. Idoko, Deep Learning-Based Sign Language Translation System, Ph.D. thesis, Near East University, 2020.</p>
      <p>[22] A. M. Buttar, et al., Deep learning in sign language recognition: A hybrid approach for the recognition of static and dynamic signs, Mathematics 11 (2023). doi:10.3390/math11173729.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hassan</surname>
          </string-name>
          ,
          <article-title>Arabic sign language characters recognition based on a deep learning approach and a simple linear classifier</article-title>
          ,
          <source>Jordanian Journal of Computers and Information Technology (JJCIT) 06</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Alnahhas</surname>
          </string-name>
          , et al.,
          <article-title>Enhancing the recognition of arabic sign language by using deep learning and leap motion controller</article-title>
          ,
          <source>International Journal of Scientific &amp; Technology Research</source>
          <volume>9</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zakariah</surname>
          </string-name>
          , et al.,
          <article-title>Sign language recognition for arabic alphabets using transfer learning technique</article-title>
          ,
          <source>Hindawi Computational Intelligence and Neuroscience</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tharwat</surname>
          </string-name>
          , et al.,
          <article-title>Sift-based arabic sign language recognition system</article-title>
          ,
          <source>in: Afro-European Conference for Industrial Advancement. Advances in Intelligent Systems and Computing</source>
          , volume
          <volume>334</volume>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>U.</given-names>
            <surname>Michelucci</surname>
          </string-name>
          ,
          <article-title>Applied Deep Learning: A Case-Based Approach to Understanding Deep Neural Networks</article-title>
          , APress, Switzerland,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alawwad</surname>
          </string-name>
          , et al.,
          <article-title>Arabic sign language recognition using Faster R-CNN</article-title>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Deep learning: Methods and applications</article-title>
          ,
          <source>Foundations and Trends in Signal Processing</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Suliman</surname>
          </string-name>
          , et al.,
          <article-title>Arabic sign language recognition using deep machine learning</article-title>
          ,
          <source>in: 4th International Symposium on Advanced Electrical and Communication Technologies (ISAECT)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Adeyanju</surname>
          </string-name>
          , et al.,
          <article-title>Machine learning methods for sign language recognition: A critical review and analysis</article-title>
          ,
          <source>Intelligent Systems with Applications</source>
          <volume>12</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Abul</surname>
          </string-name>
          ,
          <article-title>Survey on evolving deep learning neural network architectures</article-title>
          ,
          <source>Journal of Artificial Intelligence and Capsule Networks</source>
          <volume>01</volume>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Yasaka</surname>
          </string-name>
          , et al.,
          <article-title>Deep learning with convolutional neural network in radiology</article-title>
          ,
          <source>Japanese Journal of Radiology</source>
          <volume>36</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nielsen</surname>
          </string-name>
          ,
          <source>Neural Networks and Deep Learning</source>
          , ACADEMIA,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>DeMuro</surname>
          </string-name>
          ,
          <article-title>What is a neural network?</article-title>
          ,
          <source>TechRadar</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>