=Paper=
{{Paper
|id=Vol-3078/paper-17
|storemode=property
|title=Real-time Italian Sign Language Recognition with Deep Learning
|pdfUrl=https://ceur-ws.org/Vol-3078/paper-17.pdf
|volume=Vol-3078
|authors=Veronica J. Schmalz
|dblpUrl=https://dblp.org/rec/conf/aiia/Schmalz21
}}
==Real-time Italian Sign Language Recognition with Deep Learning==
Veronica J. Schmalz¹,²

¹ ITEC, imec research group at KU Leuven, Etienne Sabbelaan 51, 8500 Kortrijk, Belgium
² Freie Universität Bozen-Bolzano, Universitätsplatz 1, 39100 Bozen, Italy

AIxIA 2021, December 01–03, 2021, online
Contact: veronicajuliana.schmalz@kuleuven.be · https://github.com/VeroJulianaSchmalz · ORCID 0000-0002-1636-6133

Abstract: Image recognition systems have matured to the point where they can be used to address significant real-world challenges, such as facilitating communication for people with hearing impairments who rely on sign languages. This project applies deep learning and fine-tuning techniques to build an automatic recognition system for Italian Sign Language (LIS). More specifically, our goal is a real-time image recognition system capable of accurately identifying the letters of the LIS alphabet produced by a user in a Human-Computer Interaction (HCI) framework, by means of Python's Open Source Computer Vision (OpenCV) library and two models based on convolutional neural networks, namely a CNN trained from scratch and VGG19, applied for large-scale image and video recognition. In addition to testing the performance of different architectures, our work constitutes a novel step towards applying automatic image recognition techniques to the recently recognised LIS, using a lately released open-source dataset which currently represents the only resource available for this type of research on single-handed isolated signs. This project may play a role not only in the interpretation and learning of Italian Sign Language, encouraging its spread and study, but also in the inclusion of hearing-impaired individuals in the language research domain.

Keywords: sign languages, image recognition, deep learning, Italian Sign Language

===1. Introduction===

Sign languages (SLs) represent the most well-structured and organised means of communication apart from the oral languages spoken around the world. They are primarily used by individuals with hearing loss and acoustic impairments, via signs and gestures in the visual space. Like spoken languages, SLs do not constitute a universal language but differ according to the areas and community groups from which they originate. To date, there are no official data confirming the current number of sign languages used in the world, yet at least 161 SLs have been documented [1]. The main distinctive features of SLs are the multimodality, simultaneity and iconicity of the communicative act. Indeed, the linguistic information is conveyed by means of visual and manual interactions to which the interlocutor needs to pay simultaneous attention: hand shapes and movements, together with oriented gestures, facial expressions and mouthing [2, 3]. Given the complex set of elements that must be taken into account when analysing SL use, in this project we focus mainly on one-handed isolated static signs, generally performed with the signer's dominant hand. More specifically, we consider the letters of the LIS alphabet, one of the key elements to acquire when learning a sign language. The rest of the paper is organised as follows.
Section 2 provides relevant details concerning Italian Sign Language. A brief overview of the use of automatic recognition strategies for sign languages is presented in Section 3. Next, Section 4 describes the dataset taken into account for this project. In Section 5 we outline the tools and methodologies used for our experiments, which are analysed in detail in Section 6. In Section 7 we report the models' results, and in Section 8 we briefly describe an evaluation carried out in HCI using our best architecture and OpenCV. Finally, in Section 9 we draw our conclusions and consider possible future work.

===2. The Italian Sign Language (LIS)===

In Italy, approximately 10.2% of the population, namely 6,198,000 people, suffers from hearing problems, including complete deafness or hearing loss [4]. In particular, about one in every thousand people in the country is reported to have hearing impairments from birth. Nevertheless, only children born to hearing-impaired parents, approximately 5% of infantile deafness cases, spontaneously acquire LIS thanks to their exposure to it through their parents. On the contrary, the 95% of children with hearing loss who are born to non-deaf parents rarely have access to SL at an early age [5]. Although highly variable factors such as the degree of hearing loss, the age at which it occurred and the adoption of implants or rehabilitation therapies differentiate deaf subjects, there is scientific evidence that SL effectively supports communication, socialisation and integration [6]. Nevertheless, interest in SL in Italy has not always been keen: prejudices and misconceptions have taken time to be dismantled in families, the medical community and society in general [7, 4]. Recently, the Italian Parliament has recognised LIS [8] as an official language of the country. Although Italy was the last member state of the European Union to acknowledge the effective status of this linguistic minority, this act offers grounds for greater inclusion of hearing-impaired people in society and for the diffusion of their language. In fact, unlike other widespread sign languages, such as American Sign Language (ASL), LIS has a relatively small number of users and is rarely studied by either language learners or scholars. It is no coincidence that there are few datasets available for research in the field of language technologies and related areas [9, 10]. More specifically, for the creation of an automatic recognition system like ours, based on the static, isolated and one-handed letters of the LIS alphabet, there is currently only one corpus available online [11]. One of the aims of this work is therefore also to promote studies in this field and to encourage the creation of material available for further research in this domain.

===3. Related Work===

The goal of this project was the creation of a system for Italian Sign Language recognition based on static alphabet signs. During its conception, however, we were confronted with two main issues: the lack of isolated static LIS data, together with limited online accessibility and dissemination material. In fact, although current literature reviews [12, 13, 14] on sign language recognition count more than 400 articles and 90 deep sign language recognition models covering 25 different sign languages, mainly ASL, Indian Sign Language (ISL), Arabic Sign Language (ArSL) and Chinese Sign Language (CSL), there is little trace of research on LIS.
Nevertheless, Italian scholars in the domain of Applied Linguistics and related fields have recently begun to pay more attention to this topic, and new research trends have emerged, especially concerning dynamic sign language recognition and extraction [10, 15]. In the global sphere of SL recognition research, the first recognition experiments date back to the 1990s, using coloured gloves, neural networks [16, 17, 18] and Hidden Markov Models [19] for the classification of isolated signs. This type of research focuses on single letters or words at a time, as opposed to continuous and dynamic sign recognition, where transition movements and speed play a crucial role in the recognition task [20, 21]. Additionally, the latter requires powerful computer vision and image processing techniques in order to capture spatial and temporal features. Among these, skin colour detection, background subtraction, hand motion detection and trajectory estimation are some of the most widely adopted [22, 23, 24, 25, 26]. With the advent of Microsoft Kinect, new data formats were introduced using depth and RGB streams [27, 28]. The emergence of deep learning models, such as CNNs, contributed to the automatic extraction of features [29, 30, 31], while sequence models like RNNs, GRUs and LSTMs revolutionised the encoding of temporal information [32, 33].

As far as resources for SL recognition are concerned, several international benchmark corpora are available. Among them, the most widely adopted are those dedicated to ASL, for which a recent literature review [12] counts over 24 different systems applied to data of diverse nature. These include systems that achieve significantly high accuracy, such as 99.8% on single-handed, static isolated alphabet signs with KNN [34] and NN [35], and 82.3% on real-time signing with SVM [36]. Second in terms of quantity of research is ISL, for which one system based on signs acquired with a leap motion sensor achieves up to 100% accuracy in the recognition of single-handed, dynamic, isolated signs using an ANN [37]. Next are ArSL and CSL, reaching 99.97% accuracy with HMM-based systems on camera-acquired, single-handed, static isolated signs [38] and 100% on Kinect-acquired, single- and double-handed, dynamic isolated words [39]. In the Italian scenario, instead, the majority of existing corpora concern video contents, such as the one created by the PRIN project on spontaneous conversations, narratives and picture naming [40], the Italian Sign Language Bank, a large parallel corpus between Italian and LIS related to lexicon and meaning [9], and the one concerning the Italian Sign Language alphabet generated through the Myo Gesture Control Armband [10]. To our knowledge, only one open-source corpus containing static images of the LIS alphabet signs exists [11]. More information about it is provided in the following section.

===4. Dataset===

Like oral languages, LIS is supported by an alphabet consisting of 26 letters used to orthographically represent words. The first signed Italian alphabet, established by the theologian Assarotti in Genoa, dates back to the 19th century and involved combining single- and double-handed signs placed in specific body areas [4]. In the 1970s, however, due to the influence of other, more widely used manual alphabets, Italian signers began to adopt a different system which quickly became widespread. The latter involves using only the dominant hand and producing the signs in a neutral space.
Figure 1: The Italian Sign Language (LIS) alphabet represented with static frontal signs [10]. Arrows indicate the expected movements for the G, J, S and Z signs.

Figure 1 displays a representation of the LIS alphabet used to this day. As can be seen from the signs in Figure 1, most letters can be signed statically, whereas letters such as G, J, S and Z require slight movements, as indicated by the arrows. Fingerspelling is rarely used in sign languages, and especially in LIS, as it is generally applied to express concepts that do not have a corresponding sign in the language; this phenomenon is also known as local lexicalisation. However, the alphabet constitutes an important element of SL and is in fact one of the first components to be learned. Our project for static sign language recognition is based on it, and in particular on the images from the dataset created by Donnici and Monica [11] in collaboration with the Gualandi Foundation.

Figure 2: Examples from the dataset [11] for the O sign from five different perspectives, namely left (1), right (2), front (3), top (4) and bottom (5), from the same signer.

This dataset represents the only available corpus of isolated, static LIS alphabet signs and consists of 22 signed letter categories, excluding G, S, J and Z, photographed from five different angles, namely front, top, bottom, left and right. It contains a total of 11,008 images from eleven different signers, divided into three sub-datasets. The first sub-portion contains images on a black background from all angles (11,008 images; see Figure 2 for an example), the second contains only top, front and bottom images (5,954), and the third only frontal images (1,869). For our experiments we used the first and the third in particular. The adopted methods are described in detail in the following section.

===5. Methodology===

Automatic recognition of sign languages is a complex matter with which the research community has been dealing for some time. Since the advent of deep learning techniques, especially convolutional neural networks (CNNs), researchers have been experimenting with approaches to automatic feature extraction and classification via image recognition [41, 11, 42], achieving significantly accurate results (usually well above 90% accuracy). Given the ability of CNNs to extract features from the provided data with a high degree of generalisation, they can recognise highly variable elements in images, ranging from people's faces to animals, characters, etc., while guaranteeing robustness and efficiency. On account of their effectiveness, we use a CNN model trained from scratch and a pre-trained one, VGG19, for our experiments with the LIS alphabet. The former is inspired by the architecture used by Bheda and Radpour [41] for ASL recognition: it consists of three groups of convolutional, max-pooling and dropout layers, followed by two groups of fully connected layers, a dropout layer and a final output layer (see Figure 3, left, and the Keras sketch below).

Figure 3: CNN architecture (left) and VGG19 architecture (right).

Alongside this model, which we trained from scratch using isolated static LIS sign images, we also experimented with a pre-trained deep neural model, VGG19 (see Figure 3, right), consisting of a 16-layer convolutional encoder with five max-pooling layers, followed by two fully connected dense layers and a softmax layer [43].
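To make the architecture of the CNN trained from scratch (Figure 3, left) concrete, the following is a minimal Keras sketch. The input size, optimizer, loss, learning rate and the 22 output classes follow the paper (Table 1 and Section 6); the filter counts, dropout rates and dense-layer widths are illustrative assumptions, not the exact values used in our experiments.

<syntaxhighlight lang="python">
# Minimal sketch of the CNN trained from scratch (Figure 3, left).
# Filter counts, dropout rates and dense widths are assumptions for illustration.
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 22            # LIS letters in the dataset (G, J, S, Z excluded)
INPUT_SHAPE = (64, 64, 3)   # images resized to 64x64 pixels (Table 1)

def build_cnn() -> keras.Sequential:
    model = keras.Sequential()
    model.add(keras.Input(shape=INPUT_SHAPE))
    # three groups of convolutional, max-pooling and dropout layers
    for filters in (32, 64, 128):                       # assumed filter progression
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Dropout(0.25))
    # two fully connected groups, a dropout layer and the 22-way output layer
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(NUM_CLASSES, activation="softmax"))
    # Adam optimizer, categorical cross-entropy and a 0.01 learning rate (Section 6)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
</syntaxhighlight>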
We chose VGG19 because it had already been trained successfully on a much larger amount of data, namely 14,197,122 images from ImageNet [44]. However, since the original training set did not include SL signs, we had to fine-tune the model with our LIS images. In this process, we froze the first six layers to avoid re-training them and added four final dense layers, following [45]. The above-described models, implemented in Keras, received as input during the training, validation and test phases the images of the LIS alphabet, categorised into 22 classes. For data augmentation we adopted Keras' ImageDataGenerator, which generates batches of tensor images to be looped over during training. Following the training and evaluation of the models, we finally opted to test the best one from a Human-Computer Interaction (HCI) perspective. To do this, we employed OpenCV [46], a Python library for computer vision and image recognition adopting a cascade approach. After accessing the user's camera, we captured images of the hand signs produced in a predefined area and processed them through our pre-trained model, which classified the LIS signs on the fly, displaying three different hypotheses ranked by confidence. The specific details of our experiments are illustrated in the following section.

===6. Experiments===

To perform our automatic LIS alphabet sign recognition task, we conducted two types of experiments. First, we used the CNN architecture (see Figure 3) trained from scratch with the two sub-datasets containing fingerspelling images, taken either from five different perspectives or from a single frontal one [11]. In the second case, attempting to improve the results, we employed the pre-trained VGG19 model [43], fine-tuning 10 of its 16 layers, inspired by Cabana et al. [45]. Details about the training parameters of each experiment are provided in Table 1.

Table 1: Parameters and details of the experiments with the CNN and VGG19 models using either frontal or 5-distinct-angle images.

{| class="wikitable"
! Parameter !! CNN model !! VGG19
|-
| Input size (pixels) || 64x64 || 28x28
|-
| Batch size || 32/64 || 32/64
|-
| Epochs || 50/100 || 50/100
|-
| Optimizer || Adam || SGD
|-
| Loss || Categorical cross-entropy || Categorical cross-entropy
|}

In both cases we performed two types of experiments, first using the dataset with only frontal sign images (1,105 training images and 519 validation images) and then the larger dataset with images from different angles (6,028 training images and 2,952 validation images), training each configuration for 50 and for 100 epochs, with a learning rate of 0.01. The two models were fed images of different pixel sizes, 64x64 and 28x28 respectively, processed in batches of 32 or 64 in each experiment. For the CNN model we adopted Adam (Adaptive Moment Estimation) as the optimizer and applied the categorical cross-entropy loss. In contrast, for the experiments carried out with the fine-tuned, pre-trained VGG19 we selected SGD (Stochastic Gradient Descent) and the same loss as before.

===7. Results===

Once we had trained our eight different models, we proceeded to evaluate them. Since we had employed a data generator during training, we adopted Keras' evaluate_generator, to which we passed the test sets of the two respective sub-corpora. We then mainly considered the models' accuracy, i.e. the proportion of correct predictions, together with the precision, recall and F1-scores extracted with the classification_report function from scikit-learn.
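As a concrete illustration, the following is a minimal sketch of the fine-tuning, data-generation and evaluation pipeline just described, assuming the images are organised in train/validation/test directories with one sub-folder per letter class. The directory paths, dense-layer widths and input size are assumptions (Keras' stock VGG19 requires inputs of at least 32x32 pixels); the frozen layers, optimizer, loss and reported metrics follow the paper.

<syntaxhighlight lang="python">
# Minimal sketch of the VGG19 fine-tuning and evaluation pipeline (assumptions noted above).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG19
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import classification_report

NUM_CLASSES = 22
IMG_SIZE = (48, 48)   # assumed; Keras' stock VGG19 needs inputs of at least 32x32 pixels

# 1. ImageNet-pre-trained convolutional encoder; freeze its first six layers.
base = VGG19(weights="imagenet", include_top=False, input_shape=IMG_SIZE + (3,))
for layer in base.layers[:6]:
    layer.trainable = False

# 2. Add four final dense layers on top, ending in a 22-way softmax (widths are assumptions).
model = keras.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])

# 3. Batches of tensor images via ImageDataGenerator (hypothetical directory layout).
datagen = ImageDataGenerator(rescale=1.0 / 255)
train_gen = datagen.flow_from_directory("lis/train", target_size=IMG_SIZE,
                                        batch_size=32, class_mode="categorical")
val_gen = datagen.flow_from_directory("lis/val", target_size=IMG_SIZE,
                                      batch_size=32, class_mode="categorical")
test_gen = datagen.flow_from_directory("lis/test", target_size=IMG_SIZE,
                                       batch_size=32, class_mode="categorical",
                                       shuffle=False)

model.fit(train_gen, validation_data=val_gen, epochs=50)

# 4. Overall accuracy on the held-out test set, plus per-class precision, recall and F1.
loss, acc = model.evaluate(test_gen)
predictions = np.argmax(model.predict(test_gen), axis=1)
print(classification_report(test_gen.classes, predictions,
                            target_names=list(test_gen.class_indices)))
</syntaxhighlight>

Note that in recent TensorFlow/Keras versions the evaluate_generator method used in our experiments is deprecated in favour of passing the generator directly to evaluate, which the sketch does.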
A summary of the obtained accuracy values is available in Table 2 for the CNN and in Table 3 for VGG19. Both tables report the model, the number of training epochs and the number of test images, distinguishing the smaller sub-dataset with frontal signs only from the larger one with images taken from five distinct angles.

Table 2: Results of the different experiments with the CNN using frontal images and images from five different angles.

{| class="wikitable"
! Model !! Epochs !! Test images !! Accuracy
|-
| CNN1 || 50 || 238 || 92%
|-
| CNN2 || 50 || 1,220 || 94%
|-
| CNN3 || 100 || 238 || 97%
|-
| CNN4 || 100 || 1,220 || 93%
|}

Table 3: Results of the different experiments with VGG19 using frontal images and images from five different angles.

{| class="wikitable"
! Model !! Epochs !! Test images !! Accuracy
|-
| VGG19.1 || 50 || 238 || 95%
|-
| VGG19.2 || 50 || 1,220 || 98%
|-
| VGG19.3 || 100 || 238 || 97%
|-
| VGG19.4 || 100 || 1,220 || 99%
|}

The results in Table 3 show that the pre-trained VGG19 model performed better overall than the simple CNN (see Table 2), achieving 99% accuracy on the 22 characters after 100 epochs with the larger sub-dataset. This is presumably due both to the amount of data used to pre-train and fine-tune it, with 11,008 LIS images in total for fine-tuning, and to the possibility of extracting different features from the numerous angles at which the images of each letter of the LIS alphabet were taken. The classification report in Figure 4 confirms that the total accuracy of the model, 99%, is also reflected in the F1-scores, the harmonic mean of precision and recall, across all 22 classes: the values range from a minimum of 0.96 to a maximum of 1.00, the latter reached on 10 letters of the LIS alphabet.

Figure 4: Classification report of the results obtained with the VGG19.4 model for each sign image class.

In contrast, among the CNN models trained from scratch, the best is the one trained for 100 epochs using the dataset of frontal sign images only. It displays an overall accuracy of 97% and remarkable precision on several signs (see Figure 5 for more details).

Figure 5: Confusion matrix representing the classification results with the CNN3 model (97% accurate).

===8. Testing our best model with OpenCV===

In order to verify the practical behaviour of our best model, VGG19.4, in the recognition of LIS alphabet signs, we evaluated it in HCI using OpenCV [46]. We employed this real-time image recognition tool by means of a script through which we accessed the camera of the user's PC and detected, in real time, the sign performed within a pre-defined 28x28 square displayed in the user's window. During the image detection process, to distinguish the sign (i.e. the foreground area) from the background, we calculated an accumulated weighted average of the background and subtracted it from each frame. Then, with a threshold value of 25, we determined the sign contours so as to precisely define the bounding box to be processed, captured the frame, resized it and flipped it to avoid a mirror view. The image, in RGB colour format, was then passed to our pre-trained model, whose predictions appeared on the user's screen in the camera feed window. The highest-confidence prediction was displayed as a green character corresponding to the letter of the finger-spelled LIS sign, while the next two, less confident, predictions appeared in red at the bottom of the square where the user performed the signing. For this experiment a user signed each of the 22 letters of the dataset 50 times in three different settings with diverse backgrounds and lighting conditions. Note that the user is not a highly proficient signer but a learner.
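The following is a minimal OpenCV sketch of the real-time loop just described: a running weighted-average background model, a thresholded difference (threshold 25), contour-based hand segmentation and the top-3 predictions overlaid on the camera feed, with the best guess in green and the other two in red. The region-of-interest coordinates, model file name, class ordering, model input size and calibration period are assumptions for illustration, not the exact values of our script.

<syntaxhighlight lang="python">
# Minimal sketch of the real-time HCI loop (Section 8); coordinates, file names and
# the model input size are assumptions, not the exact values used in the paper.
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("vgg19_lis.h5")               # hypothetical path to the fine-tuned model
CLASSES = list("ABCDEFHIKLMNOPQRTUVWXY")         # 22 letters; G, J, S, Z excluded (assumed order)
TOP, LEFT, BOTTOM, RIGHT = 50, 200, 300, 450     # assumed signing box in the camera frame
ACCUM_WEIGHT, THRESHOLD, CALIBRATION_FRAMES = 0.5, 25, 60

background = None
frame_count = 0
cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)                   # flip to avoid a mirror view
    roi = frame[TOP:BOTTOM, LEFT:RIGHT]
    gray = cv2.GaussianBlur(cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY), (7, 7), 0)

    if frame_count < CALIBRATION_FRAMES:         # build the accumulated weighted average
        background = gray.astype("float") if background is None else background
        cv2.accumulateWeighted(gray, background, ACCUM_WEIGHT)
    else:
        # subtract the background and threshold the difference to find the hand contour
        diff = cv2.absdiff(gray, cv2.convertScaleAbs(background))
        _, mask = cv2.threshold(diff, THRESHOLD, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
            hand = cv2.cvtColor(roi[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
            hand = cv2.resize(hand, (48, 48)).astype("float32") / 255.0   # assumed model input
            probs = model.predict(hand[np.newaxis, ...], verbose=0)[0]
            best = probs.argsort()[::-1][:3]
            cv2.putText(frame, CLASSES[best[0]], (LEFT, TOP - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)    # best guess in green
            cv2.putText(frame, " ".join(CLASSES[i] for i in best[1:]), (LEFT, BOTTOM + 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)    # runners-up in red

    cv2.rectangle(frame, (LEFT, TOP), (RIGHT, BOTTOM), (255, 0, 0), 2)
    cv2.imshow("LIS recognition", frame)
    frame_count += 1
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
</syntaxhighlight>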
Figure 6 displays some examples.¹

Figure 6: Examples of HCI in LIS alphabet sign recognition with OpenCV and VGG19.4.

¹ A demo and more details about this experiment and project can be found at https://github.com/VeroJulianaSchmalz/LIS-Recognition-with-Deep-Learning-/

After considering the predictions for each letter, an overall accuracy of 77.2% was obtained, with an average of 3.46 s for the correct recognition of each sign. The signs with the most incorrect predictions were E, Q, X and M, very often misclassified as A, P, C and W. Considering the relative similarity between these signs and the fact that the user was not a highly experienced signer, these errors were somewhat predictable. Since we used the best model, trained with images from several angles (front, right, left, above and below), the experiments with the OpenCV library tools additionally demonstrated the model's capacity to effectively detect most of the signs, including some that were produced at unconventional or imprecise angles by the user, i.e. differently from what is represented in Figure 1. In addition, if users prefer, by pressing the space key on the keyboard they can first display, in a pop-up window on their screen, a random image from the dataset to be signed, which they can use as a reference, and then try to reproduce it. In this case, to assess the proper execution of the sign, models such as CNN3 (see Figure 5), trained with frontal images only and thus reflecting the references in Figure 1, could be used.

===9. Conclusion and Future Work===

In this paper we described the application of deep learning and fine-tuning techniques to the recognition of LIS alphabet signs. We described the different experiments we conducted using the only available isolated static LIS dataset for image recognition and classification, and two models: a CNN designed with Keras and trained from scratch, and a deeper pre-trained model, VGG19, which we fine-tuned. Based on two different sections of the corpus, one with numerous multi-angle images and one with frontal images only, we obtained an accuracy of 99% with VGG19.4 and of 97% with CNN3. Finally, by means of OpenCV we confirmed the accuracy of our best model in HCI through real-time user input. The obtained results are comparable with those of state-of-the-art sign recognition experiments, even though no similar studies on LIS one-handed isolated alphabet signs have been published so far. Additionally, we demonstrated that it is possible to achieve significant results with relatively simple image recognition technologies, as long as there is a sufficiently consistent dataset with which to train and test the neural models. It might be possible to build an even larger corpus with the collaboration of national hearing-impaired communities and organisations, as there are currently no other similar static hand-signed datasets concerning the basic LIS alphabet. In the future, we might consider extending the recogniser to interpret single-handed isolated words rather than simple alphabet signs. Moreover, given the dynamic and multi-channelled nature of sign language, another possibility would be to design a system that takes double-handed signs into account, again in real-time HCI or on continuous image data. Finally, with a larger amount of video data from different signers, a real-time LIS interpreting system could be envisaged.
Projects of this kind constitute important resources for valuing the diversity and inclusion of people with hearing impairments in society. Additionally, our system could also help promote the learning of Italian Sign Language, allowing learners to independently practise fingerspelling and become fluent SL users.

===References===

[1] Ethnologue, Sign language, 2021. URL: https://www.ethnologue.com/subgroups/sign-language.
[2] A. Kusters, M. Spotti, R. Swanwick, E. Tapio, Beyond languages, beyond modalities: Transforming the study of semiotic repertoires, International Journal of Multilingualism 14 (2017) 219–232.
[3] J. C. Lu, S. Goldin-Meadow, Creating images with the stroke of a hand: Depiction of size and shape in sign language, Frontiers in Psychology 9 (2018) 1276.
[4] C. Branchini, L. Mantovan, A grammar of Italian Sign Language (LIS), 2020.
[5] E. Tomasuolo, T. Gulli, V. Volterra, S. Fontana, The Italian deaf community at the time of coronavirus, Frontiers in Sociology 5 (2021) 125.
[6] P. Rinaldi, E. Tomasuolo, A. Resca, La sordità infantile: nuove prospettive di intervento, Erickson, 2018.
[7] V. Volterra, Chi ha paura della lingua dei segni?, Psicologia clinica dello sviluppo 18 (2014) 425–427.
[8] Decreto legge 22 marzo 2021, n. 41: Misure urgenti in materia di sostegno alle imprese e agli operatori economici, di lavoro, salute e servizi territoriali, connesse all'emergenza da COVID-19. Art. 34-ter: Misure per il riconoscimento della lingua dei segni italiana e l'inclusione delle persone con disabilità uditiva (2021).
[9] P. Prinetto, U. Shoaib, G. Tiotto, The Italian Sign Language sign bank: Using WordNet for sign language corpus creation, in: 2011 International Conference on Communications and Information Technology (ICCIT), IEEE, 2011, pp. 134–137.
[10] I. Pacifici, P. Sernani, N. Falcionelli, S. Tomassini, A. F. Dragoni, A surface electromyography and inertial measurement unit dataset for the Italian Sign Language alphabet, Data in Brief 33 (2020).
[11] M. Donnici, G. Monica, Italian Sign Language fingerspelling recognition, 2018. URL: https://github.com/maghid/italian_fingerspelling_recognition.
[12] A. Wadhawan, P. Kumar, Sign language recognition systems: A decade systematic literature review, Archives of Computational Methods in Engineering 28 (2021) 785–813.
[13] R. Rastgoo, K. Kiani, S. Escalera, Sign language recognition: A deep survey, Expert Systems with Applications 164 (2021) 113794.
[14] R. Elakkiya, Machine learning based sign language recognition: A review and its research frontier, Journal of Ambient Intelligence and Humanized Computing 12 (2021) 7205–7224.
[15] G. Saggio, P. Cavallo, M. Ricci, V. Errico, J. Zea, M. E. Benalcázar, Sign language recognition using wearable electronics: Implementing k-nearest neighbors with dynamic time warping and convolutional neural network algorithms, Sensors 20 (2020) 3879.
[16] K. Murakami, H. Taguchi, Gesture recognition using recurrent neural networks, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1991, pp. 237–242.
[17] S. S. Fels, G. E. Hinton, Glove-Talk: A neural network interface between a data-glove and a speech synthesizer, IEEE Transactions on Neural Networks 4 (1993) 2–8.
[18] S. A. Mehdi, Y. N. Khan, Sign language recognition using sensor gloves, in: Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02), volume 5, IEEE, 2002, pp. 2204–2206.
[19] K. Grobel, M. Assan, Isolated sign language recognition using hidden Markov models, in: 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation, volume 1, IEEE, 1997, pp. 162–167.
[20] R.-H. Liang, M. Ouhyoung, A sign language recognition system using hidden Markov model and context sensitive search, in: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, 1996, pp. 59–66.
[21] S. Marcel, O. Bernier, J.-E. Viallet, D. Collobert, Hand gesture recognition using input-output hidden Markov models, in: Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580), IEEE, 2000, pp. 456–461.
[22] N. B. Ibrahim, M. M. Selim, H. H. Zayed, An automatic Arabic sign language recognition system (ArSLRS), Journal of King Saud University - Computer and Information Sciences 30 (2018) 470–477.
[23] J. Han, G. Awad, A. Sutherland, Modelling and segmenting subunits for sign language recognition based on hand motion analysis, Pattern Recognition Letters 30 (2009) 623–633.
[24] X. Yang, X. Chen, X. Cao, S. Wei, X. Zhang, Chinese sign language recognition based on an optimized tree-structure framework, IEEE Journal of Biomedical and Health Informatics 21 (2016) 994–1004.
[25] F.-S. Chen, C.-M. Fu, C.-L. Huang, Hand gesture recognition using a real-time tracking method and hidden Markov models, Image and Vision Computing 21 (2003) 745–758.
[26] R. Elakkiya, K. Selvamani, S. Kanimozhi, R. Velumadhava, A. Kannan, Intelligent system for human computer interface using hand gesture recognition, Procedia Engineering 38 (2012) 3180–3191.
[27] Z. Zhang, Microsoft Kinect sensor and its effect, IEEE MultiMedia 19 (2012) 4–10.
[28] Z. Zafrulla, H. Brashear, T. Starner, H. Hamilton, P. Presti, American sign language recognition with the Kinect, in: Proceedings of the 13th International Conference on Multimodal Interfaces, 2011, pp. 279–286.
[29] O. Koller, H. Ney, R. Bowden, Deep hand: How to train a CNN on 1 million hand images when your data is continuous and weakly labelled, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3793–3802.
[30] D. Tran, L. Bourdev, R. Fergus, L. Torresani, M. Paluri, Learning spatiotemporal features with 3D convolutional networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 4489–4497.
[31] J. Huang, W. Zhou, H. Li, W. Li, Attention-based 3D-CNNs for large-vocabulary sign language recognition, IEEE Transactions on Circuits and Systems for Video Technology 29 (2018) 2822–2832.
[32] O. Koller, N. C. Camgoz, H. Ney, R. Bowden, Weakly supervised learning with multi-stream CNN-LSTM-HMMs to discover sequential parallelism in sign language videos, IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (2019) 2306–2320.
[33] O. M. Sincan, A. O. Tur, H. Y. Keles, Isolated sign language recognition with multi-scale features using LSTM, in: 2019 27th Signal Processing and Communications Applications Conference (SIU), IEEE, 2019, pp. 1–4.
[34] D. Aryanie, Y. Heryadi, American sign language-based finger-spelling recognition using k-nearest neighbors classifier, in: 2015 3rd International Conference on Information and Communication Technology (ICoICT), IEEE, 2015, pp. 533–536.
[35] M. Zamani, H. R. Kanan, Saliency based alphabet and numbers of American sign language recognition using linear feature extraction, in: 2014 4th International Conference on Computer and Knowledge Engineering (ICCKE), IEEE, 2014, pp. 398–403.
[36] C. Savur, F. Sahin, Real-time American sign language recognition system using surface EMG signal, in: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), IEEE, 2015, pp. 497–502.
[37] D. Naglot, M. Kulkarni, ANN based Indian sign language numerals recognition using the Leap Motion controller, in: 2016 International Conference on Inventive Computation Technologies (ICICT), volume 2, IEEE, 2016, pp. 1–6.
[38] A. A. Ahmed, S. Aly, Appearance-based Arabic sign language recognition using hidden Markov models, in: 2014 International Conference on Engineering and Technology (ICET), IEEE, 2014, pp. 1–6.
[39] J. Zhang, W. Zhou, C. Xie, J. Pu, H. Li, Chinese sign language recognition with adaptive HMM, in: 2016 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2016, pp. 1–6.
[40] C. Geraci, K. Battaglia, A. Cardinaletti, C. Cecchetto, C. Donati, S. Giudice, E. Mereghetti, The LIS corpus project: A discussion of sociolinguistic variation in the lexicon, Sign Language Studies 11 (2011) 528–574.
[41] V. Bheda, D. Radpour, Using deep convolutional networks for gesture recognition in American sign language, arXiv preprint arXiv:1710.06836 (2017).
[42] A. Wadhawan, P. Kumar, Deep learning-based sign language recognition system for static signs, Neural Computing and Applications 32 (2020) 7957–7968.
[43] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014).
[44] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al., ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (2015) 211–252.
[45] G. M. Cabana, Elisa, J. Viader, Sign language translator OpenCV, 2019. URL: https://github.com/ecabestadistica/sign-language-translator-python-opencv.
[46] G. Bradski, A. Kaehler, The OpenCV library, Dr. Dobb's Journal of Software Tools 25 (2000) 120.