A Real-Time Vision Based System for Recognition of Static Dactyls of Albanian Alphabet

Eriglen Gani, Department of Informatics, Faculty of Natural Sciences, University of Tirana, eriglen.gani@fshn.edu.al
Alda Kika, Department of Informatics, Faculty of Natural Sciences, University of Tirana, alda.kika@fshn.edu.al
Bruno Goxhi, Department of Informatics, Faculty of Natural Sciences, University of Tirana, bruno.goxhi@fshnstudent.info

Abstract

The aim of this paper is to present a real-time vision based system that is able to recognize the static dactyls of the Albanian alphabet. We use the Kinect device as the image receiving technology; it has simplified the process of vision based object recognition, especially the segmentation phase. Unlike hardware based methods, our approach does not require signers to wear extra objects such as data gloves. Two pre-processing techniques, border extraction and image normalization, are applied to the segmented images. The Fourier transform is then applied to the resulting images, generating 15 Fourier coefficients that represent each gesture uniquely. Classification is based on a similarity distance measure, the Euclidean distance: the gesture with the lowest distance is considered a match. Our system achieves an accuracy of 72.32% and is able to process 68 frames per second.

1 Introduction

Sign language is used as a natural way of communication between hearing impaired people. It is very important for the inclusion of deaf people in society. There exists a gap in communication between hearing impaired people and hearing ones, which comes from the inability of hearing people to understand sign language. To overcome this gap, interpreters can be used most of the time. The other, more comfortable solution is the use of technology. Natural interfaces can capture signs and understand their meaning from body positions, hand trajectories and head movements. Using technology to capture, process and translate dactyls into a form understandable to non-deaf people would help deaf people integrate faster into society [GK16]. A real-time dactyl translator system would provide many facilities for this community. Many countries have tried to develop real-time sign language translators, as in [Ull11, GK15c, TL11]. Unfortunately, Albanian sign language (AlbSL) has not received as much attention as other languages.

Deaf people in Albania communicate in a way that is based on finger-spelled Albanian words [ANA13]. Although not very efficient, dactyls play an important role in this type of communication; they form the basis of communication for deaf people. The Albanian dactyl alphabet is composed of 36 dactyls. Among them, 32 are static dactyls and 4 are dynamic ones, obtained from consecutive sequences of frames. The dynamic dactyls are Ç, Ë, SH and ZH [ANA13]. Our work focuses only on the 32 static dactyls.

The two most widely used approaches for building real-time translator systems are hardware based and vision based [ZY14]. In hardware based methods the signers have to wear data gloves or some other marker devices, which is not very natural for them. Vision based methods are more challenging to develop but are more natural for deaf people. Their two most common problems are a) complex background and b) illumination change [ZY14]. Sometimes it is hard to distinguish human hands from other objects in the same environment, and sometimes shadows or lighting effects hinder the correct identification of the human hand. The Kinect sensor by Microsoft has simplified the process of vision based object recognition, especially the segmentation phase. It offers several advantages: it provides color and depth data simultaneously, it is inexpensive, the body skeleton can be obtained easily and it is not affected by the light [GK16]. We are using the Kinect sensor as the real-time image receiving technology for our work.

Our Albanian sign language translator system includes a limited set of number signs and dactyls. In the future other numbers, dynamic dactyls and signs will be integrated, making the system usable in many scenarios that require the participation of deaf people. One possible usage of the system is a program in a bar that could help deaf people place orders by combining number and dactyl gestures.

Until now no gesture data set has existed for Albanian sign language. We are trying to build a system that is able to translate static dactyl signs of Albanian sign language; in the future it will be extended to dynamic dactyls and other signs. Creating and continuously adding new signs to an Albanian gesture data set would help build a more reliable and useful recognition system for our sign language.

The rest of the paper is organized as follows. Section 2 summarizes related work. Section 3 presents an overview of the methodology and a brief description of each of its processes. Section 4 describes the experiment environment. Section 5 presents the experiments and results. Section 6 concludes the paper and outlines future work.

2 Related Work

Many researchers have followed different methodologies for building sign language recognition systems. They can be categorized into several types based on input data and hardware dependency. Signs, which are mostly performed by human hands, can be static or dynamic, and the recognition systems themselves are categorized as hardware based or vision based.

Much work has been done on integrating hardware based technologies to capture and translate sign gestures; among them the most widely used are data gloves. Authors at [Sud14] built a portable system for deaf people using a smart glove capable of capturing finger and hand movements. In general data gloves achieve high performance, but they are expensive and not a proper choice from a human-computer interaction perspective [GK15b].

Web cameras with an image processing system can be used in vision based approaches. Research at [SSKK16] presents a vision based methodology using web cameras to recognize gestures from Indian sign language; the system achieves a high recognition rate. Authors at [WKSE02] and [LGS08] use a color camera to capture input gestures and then SVM (Support Vector Machine) and Fuzzy C-Means, respectively, to classify hand gestures. Despite this, web cameras in general produce low quality images and cannot capture other body parts. It is also hard to generalize the algorithms for web cameras due to the many different shapes and colors of hands [GK15b].

The Kinect sensor by Microsoft has simplified the process of vision based object recognition. It has many advantages: it provides color and depth data simultaneously, it is inexpensive, the body skeleton can be obtained easily and it is not affected by the light. Various researchers are using the Microsoft Kinect sensor for sign language recognition, as in [GK15c], [SB13], [VAC13].

Vision based hand gesture recognition provides a more intuitive interaction with a system, but identifying and classifying hand gestures is a challenging task. Shape and movement play an important role in gesture categorization. A comparison between the two most widely used algorithms for shape recognition is done at [CBM07]. It compares Fourier descriptors (FD) and Hu moments in terms of performance and accuracy; the algorithms are compared against a custom and a real-life gesture vocabulary. Experimental results show that FD is more efficient in terms of accuracy and performance.

Research at [BGRS11] addresses the issue of feature extraction for gesture recognition. It compares moment invariants and Fourier descriptors in terms of invariance to certain transformations and discrimination power. ASL images were used to form the gesture dictionary. Both approaches found it difficult to classify some classes of ASL correctly.

Authors at [BF12] compare different methods for shape representation in terms of accuracy and real-time performance. The compared methods include region based moments (Hu moments and Zernike moments) and Fourier descriptors. The conclusions showed that Fourier descriptors have the highest recognition rate.

Shape is an important factor for gesture recognition, and there exist many methods for shape representation and retrieval. Among them, Fourier descriptors achieve good representation and normalization. Authors at [ZL+02] compare different shape signatures used to derive Fourier descriptors, among them complex coordinates, centroid distance, curvature signature and cumulative angular function. The article concludes that the centroid distance is significantly better than the other three signatures.

Sign language is not limited to static gestures; the majority of signs are dynamic. Research at [RMP+15] proposes a hand gesture recognition method using Microsoft Kinect. It uses two different classification algorithms, DTW and HMM, and discusses the pros and cons of each technique.

3 Methodology

Figure 1 gives an overview of the methodology followed in our work: real-time image retrieval, segmentation, hand contour tracing, Fourier transformation and gesture classification.

Figure 1: Methodology

Microsoft Kinect is used for real-time image retrieval. The Kinect consists of an RGB camera, an IR emitter, an IR depth sensor, a microphone array and a tilt. The RGB camera can capture three-channel data at a 1280 x 960 resolution at 12 FPS or at a 640 x 480 resolution at 30 FPS. In our work images have a 640 x 480 resolution at 30 FPS. The valid operating distance of the Kinect is approximately 0.8 m to 4 m [MSD16]. Due to these advantages, it has simplified the process of vision based object recognition, especially the segmentation phase.

Fourier descriptors can be derived from complex coordinates, the centroid distance, the curvature signature or the cumulative angular function. In our case the centroid distance is used, following [ZL+02]. After locating the center of the white pixels in the image, we calculate the distance of every border pixel from it. This gives the centroid distance function, a one-dimensional signature of the two-dimensional hand area.

The normalization process consists of extracting the same number of pixels, equally distributed, along the hand border. Choosing a lower number of border pixels decreases the system accuracy, while choosing a higher number decreases the system performance. In our case 128 pixels have been chosen.

Fourier descriptors are used to transform the resulting signature into the frequency domain. For each image, only the first 15 Fourier coefficients are used to define it uniquely; further Fourier coefficients do not affect system accuracy. Every input gesture is compared against a training data set using a similarity distance measure, the Euclidean distance. The gesture with the lowest distance is considered a match.

4 Experiment Environment

The experiment environment used for implementing and testing our real-time static dactyl recognition system is composed of the following hardware: a notebook with a 2.5 GHz Intel Core i5 processor, 6 GB of RAM and a 64-bit Windows 10 operating system. Microsoft Kinect for Xbox 360 is used as the real-time image retrieval technology. It generates 30 frames per second, can be used as an RGB camera and can also provide depth data.

The system was developed using .NET technology. Kinect for Windows SDK 1.8.0.0 was used as the library.
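The feature extraction pipeline of Section 3 (a centroid distance signature over 128 equally spaced border pixels, followed by the first 15 Fourier coefficients) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the index-based resampling and the division by the first coefficient for scale invariance are our own assumptions.

```python
import numpy as np

def fourier_descriptor(border_pixels, n_samples=128, n_coeffs=15):
    """Centroid-distance Fourier descriptor sketch (illustrative only)."""
    pts = np.asarray(border_pixels, dtype=float)
    # Normalization: keep a fixed number of equally distributed
    # border pixels (128 in the paper).
    idx = np.linspace(0, len(pts) - 1, n_samples).astype(int)
    pts = pts[idx]
    # Centroid-distance signature: distance of every border pixel
    # from the center of the hand pixels.
    centroid = pts.mean(axis=0)
    signature = np.linalg.norm(pts - centroid, axis=1)
    # Fourier transform of the signature; keep only the first
    # coefficients, scaled by |F0| so the descriptor is size-invariant.
    coeffs = np.abs(np.fft.fft(signature))
    return coeffs[1:n_coeffs + 1] / coeffs[0]
```

With this scaling, two contours that differ only in size yield (numerically) the same 15-element descriptor, which is what makes a plain Euclidean comparison against the training set meaningful.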
It provides a way to process Kinect signals and acts as the bridge between the Kinect device and our application. An overview of the system architecture is given in Figure 2: the static dactyl recognition application sits on top of the Kinect for Windows SDK, which in turn communicates with the RGB camera and the depth sensor.

Figure 2: System Architecture

Every pixel generated by the Kinect device carries information about its depth layer and its player index. By using the player index we focus only on pixels that are part of a human body [WA12]; all other pixels are excluded. By applying a constant threshold we can then obtain the human hand, since it is the part of the human body closest to the Kinect device [GK15a].

In order to perform the Fourier transform we have to generate a centroid function, which is based on the hand image contour. Theo Pavlidis' algorithm is used for hand contour tracing [Pav12]. The segmented hand is first transformed to greyscale, where each pixel is classified as either white or black; after applying Theo Pavlidis' algorithm, the resulting image contains only the border pixels of the human hand.

5 Experiment and Results

To test the proposed system, several experiments were conducted. Each experiment examines two aspects: accuracy and computational latency. The first experiment measured the accuracy of correct identification and classification of static dactyls; our system handles only static dactyls and is not able to identify and classify dynamic ones. First, a training data set was created. It contains 320 dactyl gestures taken from two different signers: each gesture is performed 5 times by each signer and is represented by 15 Fourier coefficients, giving 4800 coefficients in total (15 x 320). For real-time testing, 4 different signers were used. Each signer performed 5 gestures for each dactyl sign.
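The segmentation step described above (player-index masking followed by a constant depth threshold, so that only the hand, the body part closest to the sensor, survives) can be sketched as follows. This is a minimal NumPy illustration under assumptions of ours: the array layout and the `hand_band_mm` tolerance are hypothetical, not values from the paper or the Kinect SDK.

```python
import numpy as np

def segment_hand(depth_mm, player_index, hand_band_mm=150):
    """Sketch of depth-based hand segmentation (illustrative only)."""
    # Keep only pixels the sensor attributes to a tracked player,
    # i.e. pixels that are part of a human body.
    body = player_index > 0
    depth = np.where(body, depth_mm, np.inf)   # ignore background pixels
    # The hand is the body part closest to the device, so a constant
    # threshold above the nearest body pixel isolates it.
    nearest = depth.min()
    return depth <= nearest + hand_band_mm     # binary hand mask
```

The resulting binary mask is what the greyscale conversion and the contour tracing step operate on.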
In total they performed 640 experiments. Each element of the testing data set is compared against all elements of the training data set, and the element with the lowest Euclidean distance is considered a match. The average recognition accuracy for each static dactyl is given in Table 1 and Table 2.

Table 1: Average Recognition Accuracy of Testing Data Set

Static Dactyl   True Recognition Rate (%)   False Recognition Rate (%)
A               100                         0
B               87.5                        12.5
C               70                          30
D               67                          33
DH              72.5                        27.5
E               70                          30
F               60                          40
G               75                          25
GJ              72                          28
H               87.5                        12.5
I               82.5                        17.5
J               98.75                       1.25
K               100                         0
L               50                          50
LL              52.5                        47.5
M               85                          15
N               55                          45

Table 2: Average Recognition Accuracy of Testing Data Set (continued)

Static Dactyl   True Recognition Rate (%)   False Recognition Rate (%)
NJ              87.5                        12.5
O               65                          35
P               54                          46
Q               52.5                        47.5
R               65                          35
RR              85                          15
S               84                          16
T               67.5                        32.5
TH              54                          46
U               67.5                        32.5
V               97.5                        2.5
X               87.5                        12.5
XH              60                          40
Y               50                          50
Z               52                          48

For all static dactyls, the system achieves an average accuracy rate of 72.32%. The results show that the dactyls with the highest accuracy rate are 'A', 'J', 'K' and 'V', whose accuracy rate is above 95%. The dactyls with the lowest accuracy rate are 'L', 'Y' and 'Z', whose accuracy rate is not above 52%.

Table 3 and Table 4 give information on the confusion percentages of the static dactyls. Some Albanian dactyls are easily confused with other dactyls due to their similarity; based on the experimental results, the dactyls "D", "E", "F", "N" and "O" are the most confused ones.

The second experiment deals with system performance. We want to achieve a performance that allows the system to be deployed in real time. For every sign we analyzed the time required for the following phases: hand segmentation, hand contour tracing, normalization, centroid function generation, Fourier transformation and gesture classification. Table 5 summarizes the results.

Table 5: Computational Latency Results

Phase                          Computational Latency (ms)
Hand Segmentation              5.49308
Hand Contour Tracing           0.08145
Normalization                  0.03261
Centroid Function Generation   0.03705
Fourier Transformation         2.45306
Gesture Classification         6.67130
Total                          14.7686

The system needs approximately 12 to 17 ms to process a static dactyl. Most of the overall time is consumed by the hand segmentation and gesture classification processes, which together occupy approximately 82% of the total time. The system can therefore be deployed without any latency in a real-time system that uses Microsoft Kinect.

6 Conclusion and Future Work

The aim of this paper was to build a real-time system that is able to recognize the static dactyls of the Albanian alphabet by using Microsoft Kinect. The Albanian alphabet is composed of 36 dactyls, 32 of which are static; the static dactyls are used as the inputs of our system. The Kinect device provides a vision based approach and is used as the image retrieval technology; its main feature is the depth sensor. For every static dactyl, a data set of 15 Fourier coefficients was built, so the data set consists of 4800 coefficients in total. For testing purposes, 4 different signers were used; each of them performed each static dactyl 5 times, for a total of 640 experiments. For classification, a similarity distance measure, the Euclidean distance, was used: every element of the testing data set is compared against each element of the training data set, and the element with the lowest Euclidean distance is considered a match. The system was tested for accuracy and performance. Based on the experimental results, the system achieves an accuracy rate of 72.32% and needs on average 14.05 ms to compute a static dactyl. It can be deployed with an image receiving technology that generates 68 frames per second.
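The classification rule used throughout the experiments, a nearest-neighbour match by Euclidean distance over the 15 Fourier coefficients, can be sketched as follows. The data layout (label plus coefficient vector pairs) is an assumption for illustration, not the authors' data structure.

```python
import numpy as np

def classify(sample, training_set):
    """Return the label of the training element nearest to `sample`
    in Euclidean distance (illustrative sketch only)."""
    best_label, best_dist = None, np.inf
    for label, coeffs in training_set:      # e.g. 320 gestures, 15 coefficients each
        dist = np.linalg.norm(np.asarray(sample) - np.asarray(coeffs))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

Because every test element is compared against all 320 training elements, this step dominates the latency budget together with segmentation, consistent with the timings in Table 5.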
Table 3: Confusion Dactyls Percentages (%)

Static Dactyl   Confusion Dactyls Percentages
A               {A,100}
B               {B,87.5}; {TH,12.5}
C               {C,70}; {E,12}; {X,12}; {Y,6}
D               {B,20}; {D,67}; {DH,5}; {H,5}; {U,3}
DH              {DH,72.5}; {E,2.5}; {F,12.5}; {U,12.5}
E               {C,2.5}; {E,70}; {J,2.5}; {TH,12.5}; {V,12.5}
F               {B,12.5}; {DH,12.5}; {F,60}; {TH,2.5}; {U,12.5}
G               {G,75}; {I,12.5}; {J,12.5}
GJ              {GJ,72}; {X,8}; {Z,20}
H               {B,5}; {F,5}; {H,87.5}; {TH,2.5}
I               {I,82.5}; {J,15}; {Y,2.5}
J               {I,1.25}; {J,98.75}
K               {K,100}
L               {L,50}; {XH,35}; {Z,15}

Table 4: Confusion Dactyls Percentages (%)

Static Dactyl   Confusion Dactyls Percentages
LL              {GJ,32.5}; {LL,52.5}; {M,15}
M               {E,12.5}; {M,85}; {Y,2.5}
N               {A,12.5}; {N,55}; {O,15}; {Q,15}; {Z,2.5}
NJ              {GJ,12.5}; {NJ,87.5}
O               {C,2.5}; {N,2.5}; {O,65}; {P,12.5}; {Q,17.5}
P               {O,23}; {P,54}; {R,23}
Q               {N,25}; {O,22.5}; {Q,52.5}
R               {J,18}; {N,17}; {R,65}
RR              {J,15}; {RR,85}
S               {S,84}; {Z,16}
T               {P,32.5}; {T,67.5}
TH              {DH,15}; {P,16}; {Q,15}; {TH,54}
U               {N,32.5}; {U,67.5}
V               {E,2.5}; {V,97.5}
X               {K,12.5}; {X,87.5}
XH              {DH,12.5}; {TH,15}; {X,12.5}; {XH,60}
Y               {A,25}; {J,25}; {Y,50}
Z               {DH,20}; {F,14}; {N,14}; {Z,52}

Future work consists of improving the overall system performance and accuracy by building a more reliable data set. This can be done by including more diverse signers who have good knowledge of Albanian sign language. Future work also consists of adding dynamic dactyls, as well as other gestures of Albanian sign language.

References

[ANA13] ANAD. Gjuha e Shenjave Shqipe 1. Shoqata Kombëtare Shqiptare e Njerëzve që nuk Dëgjojnë, 2013.

[BF12] Salah Bourennane and Caroline Fossati. Comparison of shape descriptors for hand posture recognition in video. Signal, Image and Video Processing, 6(1):147–157, 2012.

[BGRS11] Andre L. C. Barczak, Andrew Gilman, Napoleon H. Reyes, and Teo Susnjak. Analysis of feature invariance and discrimination for hand images: Fourier descriptors versus moment invariants. In International Conference Image and Vision Computing New Zealand IVCNZ2011, 2011.

[CBM07] Simon Conseil, Salah Bourennane, and Lionel Martin. Comparison of Fourier descriptors and Hu moments for hand posture recognition. In Signal Processing Conference, 2007 15th European, pages 1960–1964. IEEE, 2007.

[GK15a] Eriglen Gani and Alda Kika. Identifikimi i dores nepermjet teknologjise Microsoft Kinect. Buletini i Shkencave te Natyres, 20:82–90, 2015.

[GK15b] Eriglen Gani and Alda Kika. Review on natural interfaces technologies for designing Albanian sign language recognition system. The Third International Conference On: Research and Education Challenges Towards the Future, 2015.

[GK15c] Archana S. Ghotkar and Gajanan K. Kharate. Dynamic hand gesture recognition and novel sentence interpretation algorithm for Indian sign language using Microsoft Kinect sensor. Journal of Pattern Recognition Research, 1:24–38, 2015.

[GK16] Eriglen Gani and Alda Kika. Albanian sign language (AlbSL) number recognition from both hand's gestures acquired by Kinect sensors. International Journal of Advanced Computer Science and Applications, 7(7), 2016.

[LGS08] Yun Liu, Zhijie Gan, and Yu Sun. Static hand gesture recognition and its application based on support vector machines. In Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2008. SNPD'08. Ninth ACIS International Conference on, pages 517–521. IEEE, 2008.

[MSD16] MSDN. Kinect for Windows sensor components and specifications, April 2016.

[Pav12] Theodosios Pavlidis. Algorithms for Graphics and Image Processing. Springer Science & Business Media, 2012.

[RMP+15] J. L. Raheja, M. Minhas, D. Prashanth, T. Shah, and A. Chaudhary. Robust gesture recognition using Kinect: A comparison between DTW and HMM. Optik - International Journal for Light and Electron Optics, 126(11-12):1098–1104, June 2015.

[SB13] Kalin Stefanov and Jonas Beskow. A Kinect corpus of Swedish sign language signs. In Proceedings of the 2013 Workshop on Multimodal Corpora: Beyond Audio and Video, 2013.

[SSKK16] S. Shruthi, K. C. Sona, and S. Kiran Kumar. Classification on hand gesture recognition and translation from real time video using SVM-KNN. International Journal of Applied Engineering Research, 11(8):5414–5418, 2016.

[Sud14] B. H. Sudantha. A portable tool for deaf and hearing impaired people, 2014.

[TL11] Pedro Trindade and Jorge Lobo. Distributed accelerometers for gesture recognition and visualization. In Technological Innovation for Sustainability, pages 215–223. Springer, 2011.

[Ull11] Fahad Ullah. American sign language recognition system for hearing impaired people using Cartesian genetic programming. In Automation, Robotics and Applications (ICARA), 2011 5th International Conference on, pages 96–99. IEEE, 2011.

[VAC13] Harsh Vardhan Verma, Eshan Aggarwal, and Swarup Chandra. Gesture recognition using Kinect for sign language translation. In Image Information Processing (ICIIP), 2013 IEEE Second International Conference on, pages 96–100. IEEE, 2013.

[WA12] Jarrett Webb and James Ashley. Beginning Kinect Programming with the Microsoft Kinect SDK. Apress, 2012.

[WKSE02] Juan Wachs, Uri Kartoun, Helman Stern, and Yael Edan. Real-time hand gesture telerobotic system using fuzzy c-means clustering. In Automation Congress, 2002 Proceedings of the 5th Biannual World, volume 13, pages 403–409. IEEE, 2002.

[ZL+02] Dengsheng Zhang, Guojun Lu, et al. A comparative study of Fourier descriptors for shape representation and retrieval. In Proc. 5th Asian Conference on Computer Vision. Citeseer, 2002.

[ZY14] Yanmin Zhu and Bo Yuan. Real-time hand gesture recognition with Kinect for playing racing video games. In 2014 International Joint Conference on Neural Networks (IJCNN). IEEE, July 2014.