COVID-19 Research and Smart Healthcare


                                 TeleStroke System (TSS)

                       Rishabh Chandaliya, Praveen Joshi, and Haithem Afli

                                              ADAPT Centre,
                                 Cork Institute of Technology, Cork, Ireland


              Abstract. According to the World Health Organisation (WHO), cardiovascular
              diseases (CVDs) are the number one cause of death worldwide, claiming
              nearly 17.9 million lives each year [1]. Eighty-five percent of all CVD
              deaths are due to heart attacks and strokes. Despite its massive effect on
              countries' economic development, this growing crisis has received
              relatively little coverage to date. Early detection and treatment of
              stroke are essential for a good outcome. This research addresses the
              challenge of early stroke detection through droopy mouth detection on a
              client-server architecture, the TeleStroke System (TSS), which can
              support initial treatment. The input data come from Kaggle and the YouTube
              Facial Palsy (YFP) database. The experimental results show that a simple
              machine learning model gives better accuracy than the deep learning models
              we tested, which suggests that the system can effectively assist doctors
              in the early detection and diagnosis of stroke.

              Keywords: Stroke Detection · Data Augmentation · Face Detection ·
              Facial Palsy · Droopy Mouth Detection · Machine Learning.


    1      Introduction
    Stroke is commonly described as a clinical syndrome of presumed vascular origin,
    characterized by rapidly developing signs of focal or global cerebral dysfunction
    that last for more than 24 hours or lead to death [18]. A stroke occurs when a
    blood vessel carrying nutrients and oxygen to the brain is either blocked by a
    clot or bursts. This reduces the supply of oxygen and blood to the brain, causing
    behavioral problems, memory loss, impaired thinking, permanent brain damage, and
    even death [2].
        Generally, there are three types of stroke. The first type is the ischemic
    stroke. It occurs when an artery is blocked by gradual plaque build-up, blood
    clots, or other fatty deposits. The second type is the hemorrhagic stroke, which
    happens when a blood vessel in the brain ruptures and blood leaks into the brain.
    This type is responsible for 13 percent of all strokes and accounts for over
    30 percent of all stroke-related deaths. Lastly, an embolic stroke occurs when a
    clot breaks off from the artery wall and becomes an embolus that travels further
    down the bloodstream, blocking a smaller artery. Emboli typically originate in
    the heart, where uncommon diseases may cause clots to form.

Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CERC 2020

       According to the latest National Stroke Association statistics [3], stroke is
   now the fifth leading cause of death in the United States and a significant cause
   of adult disability. Every year, an estimated 0.8 million people in the U.S. have
   a stroke, and 0.13 million of them die from it, which means that, on average, one
   American dies from stroke every 4 minutes. A lack of understanding of stroke
   signs is one of the factors contributing to the increase in stroke-related
   deaths.


   2    Related Work

   When dealing with stroke, it is crucial to know which signs to look for, and it
   is safer to rely on a standardized procedure that targets specific, common stroke
   symptoms. One of the signs of stroke is facial paralysis: one side of the face
   sags, which is known as face drooping. A droopy mouth is not only a symptom of
   stroke but also a sign of a brain and nerve disorder known as Bell's palsy.
   Bell's palsy, in short, is muscle weakness or paralysis on one side of the face.
   It results from damage to the facial nerve that controls the muscles on one side
   of the face, which causes that side of the face to droop [4].
       In [5], a vision-based interface is presented in which the user's hand, foot,
   or head movements are identified and tracked. The framework is designed for
   child-computer interaction, and the authors base the assessment of the device on
   the HAAT (Human Activity Assistive Technology) model.
       In [6], the authors built a video-based prediction process that is intended to
   be less intrusive and cheaper than other solutions. It works as follows: first,
   the movements of different parts of the body are isolated; then, motion features
   are extracted and used to classify persons as healthy or impaired. The results
   show that visually collected motion data allows the identification of Bell palsy
   and Pseudoperipheral palsy about as well as state-of-the-art detection that uses
   data from electromagnetic sensors.
       The "Camera Mouse" was introduced in [7] to provide computer access for people
   with severe disabilities. Using a video camera, the device tracks the user's
   movements and converts them into movements of the mouse pointer on the screen.
       In [8], the authors built a visual monitoring system that passively observes
   moving objects at a site and learns patterns of activity from these observations.
   The system is based on motion tracking, camera coordination, and the recognition
   of activities and events. The method is useful for classifying sequences as well
   as individual instances of activity at a site.
       In [9], the authors convert a video into a sequence of frames, which are then
   used to predict human behavior. The system detects faces and analyzes them to
   find the expression ROI (Region of Interest). It uses Viola-Jones features and
   AdaBoost, together with color filtering, to identify the ears, nose, eyes, mouth,
   or upper body of people. These regions have higher entropy, which is used to
   detect emotions.
       A method that recognizes three facial expressions was proposed in [10]. It
   uses a geometric approach for feature extraction and a Multilayer Perceptron
   (MLP) neural network trained with backpropagation for classification. The facial
   expressions to be identified are optimistic, happy, and surprised. The overall
   identification rate is 93.33 percent.
       A system for recognizing facial emotions is presented in [11]. It is based on
   an Active Shape Model (ASM) for facial feature extraction and a Radial Basis
   Function Network (RBFN) to evaluate the symmetry of the face in different
   behavior patterns, with an accuracy of 90.73 percent.
       In [12], the authors developed a computer vision system that uses hidden
   Markov models (HMMs) to automatically identify individual action units or
   combinations of action units. They used three methods to extract information on
   facial expression:

     – Facial feature point tracking
     – Dense flow tracking with principal component analysis (PCA)
     – High gradient component detection.

   A system for face recognition and facial emotion detection is presented in [13],
   using the Open Source Computer Vision (OpenCV) library and machine learning in
   Python. Machine learning algorithms are used to recognize and classify different
   facial expressions and body movements.
       The research in [14] focuses on image processing to obtain facial features
   (landmarks), mainly the mouth corners, on determining a threshold value for the
   droopy mouth, and on deployment using Android Studio. The resulting mobile
   application targets specific user groups such as neurosurgeons and emergency
   medical services. However, it could also be useful for patients, potential stroke
   patients, and generally any health-conscious user, anywhere and at any time.


   3     Material

   3.1    Dataset

     – Stroke Faces on Kaggle [15]: This dataset consists of 1000 droopy faces
       curated from Google by Kaitav Mehta.
     – YouTube-Facial-Palsy-Database [16]: This dataset contains 32 video clips,
       collected from YouTube, of 22 patients with facial paralysis. It is the
       first public database made available for the visual inspection study of
       facial palsy symptoms. Gee-Sern Jison Hsu, Jiunn-Horng Kang, and Wen-Fong
       Huang [16] converted the videos into image sequences and had the images
       labeled for facial palsy by clinicians. As the number of patients is small,
       we removed multiple pictures of the same patient taken from the videos.
     – Yale Face Database B [17]: The collection comprises 5760 single-light-source
       images of 10 subjects, each seen under 576 viewing conditions. The subjects
       are healthy and do not show any symptoms of a stroke. Single images of
       subjects from this dataset are used as the normal face input to the CNN.



                               Number of images    Stroke    No Stroke
                               Training images        900          900
                               Testing images         100          100
                               Total                 1000         1000

   4     Method
   This section describes the proposed TeleStroke system, which analyzes the eye
   area and the mouth region. TeleStroke consists of a client stage and a server
   stage. The client side handles video pre-processing: face detection and landmark
   localization are used to locate the face in each frame of a sequence. In this
   phase, the Integrated Deep Model is used. The detected face images are then
   cropped to the face, and the number of frames per sequence is normalized to a
   fixed length. The server side hosts several models (the ML model, FCNN, CNN, and
   VGG16 with transfer learning), one for each of the face analysis tasks. The
   proposed structure of TeleStroke is shown in Fig. 1.
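
   The normalization of the number of frames per sequence can be sketched as
   follows. The paper does not state how the fixed length is reached, so this
   example assumes evenly spaced index sampling; the function name
   normalize_length and its parameters are hypothetical.

```python
def normalize_length(frames, target_len):
    """Return exactly target_len frames using evenly spaced index sampling.

    Assumption: longer sequences are subsampled and shorter ones repeat
    frames. The paper only states that the length is normalized.
    """
    if not frames:
        raise ValueError("sequence has no frames")
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    n = len(frames)
    if target_len == 1:
        return [frames[0]]
    # Map output position i to a source index in [0, n - 1] (floor division).
    return [frames[(i * (n - 1)) // (target_len - 1)] for i in range(target_len)]
```

   With this scheme the first and last frames of a sequence are always kept, so
   the start and end of a recorded facial movement are not lost.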


                              Fig. 1. Deep Client-Server Architecture


   4.1      Front End
   The front end is a web application built with HTML, CSS, JavaScript, and AJAX.
   It asks the user for permission to access the camera, which provides a live
   image stream of the user over an established WebSocket connection. A picture is
   extracted from this stream, and the trained model is applied to it. An AJAX call
   is used to speed up the request and the response.


                                    Fig. 2. Face - Droop Image

                                 Fig. 3. No Face - Droop Image


   5     Experimental Evaluation

   This section provides a detailed experimental evaluation of the stroke
   recognition system using the TeleStroke framework. All experiments are conducted
   on Google Colab with a GPU, using TensorFlow 2.0.
       The datasets are combined and treated as a single dataset in which droopy
   faces are labeled 0 and non-droopy faces are labeled 1. This dataset was passed
   through different processing steps and tested on various models.
       The evaluation methodologies are as follows:


   5.1    Simple ML model

   In this part, the area ratio of the eyes, the slope of the mouth, and the eye
   and mouth coordinates are saved in a CSV file. Droopy faces are labeled 0 and
   normal subjects are labeled 1; these labels are used for classification.
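
   A minimal sketch of how such features could be computed from landmark
   coordinates is given below. The exact formulas are not stated in the paper:
   this example assumes that the eye area ratio compares the polygon areas of the
   two eye contours and that the mouth slope is the slope of the line joining the
   two mouth corners. All function names are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon given as (x, y) vertices (shoelace formula)."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def eye_area_ratio(left_eye, right_eye):
    """Smaller eye area divided by the larger one; 1.0 means symmetric eyes."""
    a, b = polygon_area(left_eye), polygon_area(right_eye)
    if max(a, b) == 0:
        raise ValueError("degenerate eye contours")
    return min(a, b) / max(a, b)


def mouth_slope(left_corner, right_corner):
    """Slope of the line joining the mouth corners; 0.0 means a level mouth."""
    (x1, y1), (x2, y2) = left_corner, right_corner
    if x1 == x2:
        raise ValueError("mouth corners share the same x coordinate")
    return (y2 - y1) / (x2 - x1)


def feature_row(left_eye, right_eye, left_corner, right_corner, label):
    """One CSV row: [eye area ratio, mouth slope, label (0 droopy, 1 normal)]."""
    return [eye_area_ratio(left_eye, right_eye),
            mouth_slope(left_corner, right_corner),
            label]
```

   Both features are close to their symmetric values (ratio near 1, slope near 0)
   for a level, symmetric face, which is what makes them candidates for separating
   droopy from normal faces with a simple classifier.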
       Fig. 4 shows the results of the 10-fold cross-validation used to evaluate
   each algorithm on the training data. The same random seed is used for every
   algorithm so that each one receives identical splits of the training data and is
   evaluated in precisely the same way.
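
   The role of the shared random seed can be illustrated with a small,
   library-free sketch of shuffled k-fold splitting. It is not the evaluation code
   used in the paper; the function name kfold_indices and the seed value are
   assumptions.

```python
import random


def kfold_indices(n_samples, k=10, seed=7):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # A fixed seed gives every algorithm exactly the same shuffled order.
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```

   Calling the generator twice with the same seed yields identical folds, so
   differences in the scores of two algorithms cannot be caused by different
   splits.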
       Figs. 5 and 6 show the precision, recall, F1-score, and ROC curve for Naive
   Bayes, which was the only algorithm evaluated on the testing data. The results
   of the 10-fold cross-validation suggest that both Naive Bayes and Random Forest
   are worth further study for this task.
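
   For reference, precision, recall, and F1-score can be computed from predicted
   and true labels as in the sketch below. Treating the droopy class (label 0) as
   the positive class is an assumption made for this example, and the function
   name is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=0):
    """Precision, recall, and F1-score for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

   Recall on the droopy class is the more safety-relevant number for a screening
   tool, since a missed droopy face corresponds to a missed possible stroke sign.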




                               Fig. 4. Algorithmic Comparison


                              Fig. 5. Precision, Recall, F1-Score

                                     Fig. 6. ROC Curve


   5.2      Fully Connected Neural Network
   The first layer has 600 neurons, the second layer has 400 neurons, and the third
   layer has 200 neurons, as shown in Fig 4.7. The lower the loss, the better the
   model (unless the model has overfitted to the training data). The loss is
   computed on the training and validation sets, and it indicates how well the
   model is doing on each of them. Unlike accuracy, the loss is not a percentage;
   it is a summation of the errors made for each example in the training or
   validation set.
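
   The difference between loss and accuracy can be made concrete with a short
   sketch. It assumes binary cross-entropy as the per-example error, which the
   paper does not specify; the function names are hypothetical. Frameworks
   typically report the mean rather than the sum of these errors.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Per-example cross-entropy errors for binary labels (assumed loss)."""
    errors = []
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        errors.append(-(y * math.log(p) + (1 - y) * math.log(1.0 - p)))
    return errors


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p >= threshold) == (y == 1))
    return hits / len(y_true)


# Two hypothetical models with identical accuracy but different total loss.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
hesitant = [0.6, 0.4, 0.55, 0.45]
loss_confident = sum(binary_cross_entropy(labels, confident))
loss_hesitant = sum(binary_cross_entropy(labels, hesitant))
```

   Both prediction lists classify every example correctly, yet the hesitant one
   accumulates a larger loss, which is why the two curves are read separately.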
       There are also subtleties when reducing the loss. For example, we may run
   into the problem of overfitting, in which the model "memorizes" the training
   examples and becomes ineffective on the test set. We use Dropout for that
   reason.
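
   The idea behind Dropout can be sketched in a few lines. This is the common
   "inverted dropout" formulation, shown for illustration only; the dropout rate
   used in the paper is not stated, and the function below is hypothetical rather
   than the TensorFlow layer itself.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` in training.

    Surviving activations are scaled by 1 / (1 - rate) so that their expected
    value is unchanged; at inference time the input is returned as is.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

   Because a different random subset of units is silenced at every training step,
   the network cannot rely on any single unit to memorize individual training
   examples.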




                                         Fig. 7. Loss Function

                                           Fig. 8. Accuracy


                                  Name    Epochs    Accuracy     Loss
                                  FCNN       100    0.95-0.97     0-2

   5.3    CNN and VGG16 model
   The amount of "wiggle" in the loss is related to the batch size. When the batch
   size is 1, the wiggle is relatively large. When the batch size is the full
   dataset, the wiggle should be minimal, because each gradient update should
   decrease the loss monotonically (unless the learning rate is set too high).
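
   The relation between batch size and wiggle can be illustrated on a toy problem.
   The sketch below measures how much mini-batch gradient estimates scatter for a
   one-parameter linear model; the synthetic data, the model, and all names are
   assumptions made for this example and are unrelated to the paper's networks.

```python
import random
import statistics


def make_data(n=256, seed=1):
    """Synthetic (x, y) pairs with y roughly equal to 3 * x plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + rng.gauss(0.0, 0.5)) for x in xs]


def gradient(w, batch):
    """Gradient of the mean squared error of the model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def gradient_spread(data, batch_size, w=0.0, trials=200, seed=0):
    """Standard deviation of mini-batch gradient estimates at weight w."""
    rng = random.Random(seed)
    grads = [gradient(w, rng.sample(data, batch_size)) for _ in range(trials)]
    return statistics.pstdev(grads)
```

   Comparing gradient_spread for batch sizes of 1, 32, and the full dataset shows
   the scatter shrinking as the batch grows, which is the source of the smoother
   loss curve for large batches.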


                               Fig. 9. Loss without augmentation

                          Fig. 10. Training loss without augmentation


      The gap between the training accuracy and the validation accuracy indicates
   the amount of overfitting. In Figs. 9 and 10, we can observe that too much
   wiggle is formed and that the model starts to overfit in the 0-20 range. The
   data is not augmented.



       The yellow validation curve shows very low validation accuracy relative to
   the training accuracy, suggesting heavy overfitting (note that validation
   accuracy can also begin to go down after some point). Compared to Figs. 9 and
   10, Figs. 11 and 12 are better: the wiggle is smaller, and there is no sign of
   overfitting.


                             Fig. 11. Loss with augmentation

                         Fig. 12. Training loss with augmentation
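
   The paper does not list the augmentation transforms that were applied. As a
   hedged illustration, the sketch below shows two common label-preserving
   transforms, mirroring and a brightness shift, on grayscale images stored as
   lists of pixel rows. The function names and the shift value are hypothetical.

```python
def horizontal_flip(image):
    """Mirror an image given as a list of pixel rows."""
    return [row[::-1] for row in image]


def adjust_brightness(image, delta):
    """Shift every pixel by delta, clipped to the 0-255 range."""
    return [[min(255, max(0, px + delta)) for px in row] for row in image]


def augment(dataset, delta=20):
    """Return each (image, label) pair plus a mirrored and a brightened copy."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((horizontal_flip(image), label))
        augmented.append((adjust_brightness(image, delta), label))
    return augmented
```

   Mirroring a droopy face moves the droop to the other side but keeps the label,
   so the classifier sees both variants without any new data being collected.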


   6    Conclusion

   The proposed web application incorporates four critical processes: image
   acquisition, key point extraction using OpenCV, mathematical measurement, and
   droopy mouth detection. This paper uses a simple ML model, a fully connected NN,
   and a CNN architecture with a ResNet backbone to present an end-to-end
   framework, the TeleStroke System, for droopy mouth detection. We tested the
   design on a combination of three datasets and obtained an F1 score of 94 percent
   for droopy mouth detection.


   7    Discussion and Future Work

   In this paper, we took on the novel and challenging task of comparing different
   models with our proposed model. Although our model performed very well on the
   selected datasets, there are many directions in which this research can be
   pursued. Firstly, hyperparameter tuning and optimization could be carried out,
   and more complex feature vector representations could be added. There is also
   the potential to investigate more complex backbones and deeper networks, which
   were limited in this work by the computational overhead of 3D CNNs. This opens
   up a research opportunity for simplifying droopy mouth detection and recognition
   in a mobile application.



       The current dataset should be expanded into a more consistent one by
   including a larger number of face photos labeled with demographic details such
   as age, gender, and ethnicity. As the data collection expands, this additional
   information will become more reliable, and we will consider studying the impact
   of the population of origin on paralysis and its ranking. In addition, the
   performance comparison of different pre-trained CNN models could be improved by
   using more precise facial landmark detection techniques and by enabling the
   recognition of a droopy mouth in tilted or rotated images. The proposed model
   could also be enhanced to achieve faster identification of a droopy mouth and
   better recognition performance. These are left as open problems for future
   research.


   8     Acknowledgments

   This research was conducted with the financial support of ADAPT Core (Platform
   & Spokes) under Grant Agreement No. 13/RC/2106 and at the ADAPT SFI Research
   Centre at Cork Institute of Technology. Science Foundation Ireland funds the
   ADAPT SFI Centre for Digital Media Technology through the SFI Research Centres
   Programme. It is co-funded under the European Regional Development Fund (ERDF)
   through Grant 13/RC/2106.


   References
    1. Stroke: a global response is needed, https://www.who.int/bulletin/volumes/94/9/16-181636.pdf
    2. MayField Clinic. 2013. What Is Stroke? http://www.mayfieldclinic.com/pe-stroke.htm#.Va0rm mqqko
    3. National Stroke Association. 2016. Signs and Symptoms of Stroke
    4. The Nature of Bell’s Palsy. (Laryngoscope. 1949;59:228-235), R. Gacek, ”Hilger
    5. Vision based interface: an alternative tool for children with cerebral palsy,
       Magdalena Gonzalez, Debora Mulet, Elisa Perez, Carlos Soria, and Vicente Mut
    6. Video-based early cerebral palsy prediction using motion segmentation, Hodjat
       Rahmati, Ole Morten Aamo, Øyvind Stavdahl, Ralf Dragon, and Lars Adde
    7. The camera mouse: visual tracking of body features to provide computer access
       for people with severe disabilities, Margrit Betke, James Gips, and Peter Fleming
    8. Learning patterns of activity using real-time tracking, Chris Stauffer and
       W. Eric L. Grimson
    9. Human behavior prediction using facial expression analysis, Subarna Shakya,
       Suman Sharma, and Abinash Basnet
   10. Real time facial expression recognition using realsense camera and ann,
       Jayashree V Patil and Preeti Bailke
   11. Facial emotional expressions recognition based on active shape model and radial
       basis function network, Endang Setyati, Yoyon K Suprapto, and Mauridhi Hery
       Purnomo
   12. Automated facial expression recognition based on facs action units, James J
       Lien, Takeo Kanade, Jeffrey F Cohn, and Ching-Chung Li



   13. A robust method for face recognition and face emotion detection system using
       support vector machines, KM Rajesh and M Naveenkumar
   14. Mobile Health Awareness in Pre-Detection of Mild Stroke Symptoms, O.-M. Foong,
       J.-M. Yong, S. Sulaiman and D. Rambli
   15. Dataset for Facial Palsy, Kaitav Mehta,
       https://www.kaggle.com/kaitavmehta/facial-droop-and-facial-paralysis-image
   16. Deep Hierarchical Network With Line Segment Learning for Quantitative Analysis
       of Facial Palsy
   17. Acquiring Linear Subspaces for Face Recognition under Variable Lighting
   18. World Health Organisation. 2014. Stroke, Cerebrovascular,
       http://www.who.int/topics/cerebrovascular accident/en

