=Paper= {{Paper |id=Vol-1543/p11 |storemode=property |title=A Prototype Gaze-Controlled Speller for Text Entry |pdfUrl=https://ceur-ws.org/Vol-1543/p11.pdf |volume=Vol-1543 |authors=Mindaugas Vasiljevas,Justas Šalkevičius,Tadas Gedminas,Robertas Damaševičius |dblpUrl=https://dblp.org/rec/conf/system/VasiljevasSGD15 }} ==A Prototype Gaze-Controlled Speller for Text Entry== https://ceur-ws.org/Vol-1543/p11.pdf
  A Prototype Gaze-Controlled Speller for Text Entry

                   Mindaugas Vasiljevas, Justas Šalkevičius, Tadas Gedminas, Robertas Damaševičius
                                                 Software Engineering Department
                                                 Kaunas University of Technology
                                                         Kaunas, Lithuania
                                                   robertas.damasevicius@ktu.lt


    Abstract—Eye typing provides a means of communication that is useful for people with disabilities that prevent using their hands for text entry. In this paper, we describe the development of a prototype gaze-controlled speller and discuss its experimental evaluation. Using a scrollable virtual keyboard interface, a text input speed of 1.2 wpm was achieved.

    Keywords—gaze tracking; gaze writing; eye typing; hands-free text entry; speller; assistive technology.

                      I.   INTRODUCTION

    Communication is central to human life and experience. With the rise of electronic means of communication and internet-based social networks, as well as the wide-spread use of smartphones and tablet PCs, the role of texting has increased significantly. According to one report [1], text-based communication (text messaging, e-mail) is the preferred mode of communication for young people (text messaging – 54%, e-mail – 11%, vs., for example, cell-phone calls – 38% and face-to-face talk – 33%).

    To most people, text entry is a simple action. However, over a billion people are estimated to be living with some form of disability or impairment [2]. To those suffering from physical disabilities or age-related impairments, the text entry task may present a significant challenge. For example, disabilities such as amyotrophic lateral sclerosis (ALS) often lead to a complete loss of control over voluntary muscles, except the eye muscles. Today's computer systems are not suitable for such people, as input to computers is still largely limited to mechanical (keyboard, mouse), audio (speech) and tactile (touchpad) modalities. Inability to use a conventional physical input device, such as the mouse or the keyboard, raises the importance of other input modalities, such as the eyes, for connecting persons with severe motor impairments to the digital world. The design of the hardware and software that enables access of handicapped people to ICT services often fails to take the user needs into account [3]. Such limitations raise barriers that prevent people with major or minor disabilities, such as elderly people with motor impairments, from benefiting from modern ICT applications. Therefore, a large number of individuals are at risk of becoming excluded from the information and knowledge society [4].

    To overcome these barriers, new concepts and methods of human-computer interaction (HCI) must be researched and developed in order to efficiently and effectively address the accessibility problems in human interaction with software applications and services, while meeting the individual requirements of the users. Eye typing has been defined as the production of text using the focus of the eye (aka gaze) as a means of input [5]. Eye typing has been known for 30 years now [6], but it has recently received increased attention from researchers with the arrival of affordable eye tracking devices on the market.

    Systems using gaze-controlled eye typing may be called gaze spellers (by analogy to brain-controlled BCI spellers [7]). A gaze speller is a kind of assistive technology [8], specifically designed for the purpose of increasing or maintaining the capabilities of people with disabilities, which can be used in ambient assisted living (AAL) environments [9] for people with special needs. It has the general aim of bridging the digital divide and providing universal access [10] to anyone.

    The current research is important because existing eye typing systems still have many limitations (low entry speed, poor usability, etc.), and even small improvements in the design of such systems can lead to a significantly improved quality of life for impaired people.

    The structure of the remaining parts of the paper is as follows. Section 2 discusses related work. Section 3 describes the developed prototype of the gaze-controlled speller. Section 4 describes the experimental results. Finally, Section 5 presents conclusions and discusses future work.

                      II.  RELATED WORK

    Several different eye typing systems have been described in research papers. These systems mainly differ in their approach towards the presentation and layout of letters in the user interface. A typical example is an on-screen keyboard. The user has two options for action: looking at the desired letter or key to select it, and dwelling (i.e., pausing eye movements for a moment) on it for input. Known examples of such systems are GazeTalk [11], ERICA [12], pEYEwrite [13], and the scrollable keyboards of Špakov et al. [14].

    GazeTalk [11] uses a probabilistic character layout strategy to show only the 6 most likely next characters on-screen, together with the 6 most likely words predicted from the previous words in the sentence. Users have achieved text entry speeds from 4 words per minute (wpm) for character-only input up to 10 wpm using the most-likely-words feature.

 Copyright © 2016 held by the authors




    In ERICA [12], six large on-screen keys were used instead of an entire keyboard due to the limited resolution of the eye tracker. A prediction algorithm allowed eye-typewriting time to be decreased by 25%.

    pEYEwrite [13] groups the letters together in a hierarchical pie structure. To enter a letter, the user first dwells on the pie slice containing the desired group of letters, then dwells on the desired slice in a popup pie menu. Novice entry rates of 7.9 wpm were reported with a dwell time of 400 ms.

    Špakov et al. [14] use “scrollable keyboards”, where one or more rows are hidden to save space, combined with keyboard layout optimization according to letter-to-letter probabilities. Users achieved 8.86 wpm for the 1-row keyboard and 12.18 wpm for the 2-row keyboard, respectively.

    Other related works include different kinds of text entry systems using a virtual keyboard interface. Methods employed in these systems for increasing input speed can be directly transferred to the gaze speller domain, e.g., the predictive keyboard layouts in SoftType [15].

    AUK [16] uses a 12-key soft keyboard similar to the one used in mobile phones and supports several different entry modes (1-to-10 key, joystick), various layout configurations for different performance levels, and integration with additional performance-enhancing techniques, such as text prediction and dictionary or prefix-based word disambiguation.

    Alternative interfaces for gaze typing include Dasher [17], which allows users to write by zooming through a world of boxes. Each box represents an individual letter, and the size of a box is proportional to the probability of that letter given the preceding letters. The entry rates for Dasher range between 16–26 wpm [17].

    The dwell-free eye-typing interface [18] tracks how users simply look at or near their desired letters without stopping to dwell on each letter. Users reached a mean entry rate of 46 wpm with a perfect letter recognizer. While dwell-free eye-typing may be more than twice as fast as traditional eye-typing systems, a working prototype that deals effectively with entry errors still has to be implemented.

    SMOOVS [19] utilized smooth-pursuit eye movements combined with a two-stage interface that uses a hexagonal layout of letters. The system achieved a speed of 4.5 wpm, while the users complained about the low comprehensibility of the interface.

    Word/phrase prediction or completion is also widely used [20]. As a word is entered, the stem of the current word is expanded to form a list of matching complete words. The list is displayed in a dedicated region of the user interface, allowing the user to select a word early. An example is Filteryedping [21], a key filtering–based approach for supporting dwell-free eye typing that recognizes the intended word by performing a lookup in a word list for possible words that can be formed after discarding none or some of the letters that the user has looked at. It sorts the candidate words based on their length and frequency and presents them to the user for confirmation. The method has achieved a rate of 19.8 wpm.

                 III.  DEVELOPMENT OF GAZE SPELLER

A. Usage scenario
    Usually, eye-tracking interfaces are designed to imitate the operation of a standard pointing device such as a mouse. The gaze tracking system, either head mounted or placed in front of the user, tracks the user’s gaze and transforms it to screen coordinates.

    During eye typing, the user first locates the letter on a virtual keyboard by moving his/her gaze to it. The gaze tracking device follows the user’s point of gaze while the software records and analyses the gaze behavior. For input, the user has to fix his/her gaze on the letter for a pre-defined time interval (aka dwell time). When the dwell time has passed, the letter is selected by the system, and the user can move on to gaze at the next letter. Feedback is typically shown both on focus and on selection.

B. Advantages and disadvantages
    As an input method, gaze has both advantages and disadvantages. It is easy to focus on items by looking at them, and target acquisition using gaze is very fast, given that the targets are sufficiently large [22]. However, gaze is not as accurate as the mouse, partly due to technological reasons as well as some features of the eye [23]. The size of the fovea and the inability of the camera to resolve the fovea position restrict the accuracy of the measured point of gaze [5].

C. Technical limitations
    When humans look at things, they fix their gaze on them for 200 to 600 ms [23]. For a computer to distinguish whether the user is looking at an object, a longer interval of time is needed. Usually, 1000 ms is long enough to prevent false selections [22]. While requiring the user to fixate for long intervals prevents false selections, it may be uncomfortable for most users.

    The dwell time also places an upper limit on eye typing speed: e.g., if the dwell time is 1,000 ms, the upper limit for typing speed is 12 words per minute (wpm) (considering that 1 word is equal to about 5 characters in English text).

D. Accessibility/usability requirements and limitations
    Accessibility limitations of eye gaze tracking systems have been formulated by Hansen et al. [11] as follows:

    1. A large portion of users is not able to get a sufficiently good calibration due to false reflections from glasses, occlusion by eyelids or eyelashes, interference from ambient light, or low contrast between the iris and the pupil.

    2. Gaze tracking systems usually require that the user does not move. This is very difficult for most people and impossible for people with involuntary, e.g. spastic, movements.

    4. People’s eyes tend to dry out due to eye fatigue and long exposure to strong light.

    5. Present eye-tracking systems are only for stationary and indoor use.
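The dwell-time mechanics and the speed ceiling discussed in Section III.C can be illustrated with a short sketch. This is not the authors' implementation; all names here (`DwellSelector`, `max_wpm`, `dwell_ms`) are our own, and the re-arming behavior after a selection is an assumption:

```python
def max_wpm(dwell_ms: float, chars_per_word: int = 5) -> float:
    """Upper bound on dwell-based entry rate: at most one character per dwell period."""
    selections_per_minute = 60_000 / dwell_ms
    return selections_per_minute / chars_per_word


class DwellSelector:
    """Selects a key once the gaze has rested on it for `dwell_ms` milliseconds."""

    def __init__(self, dwell_ms: float = 1000):
        self.dwell_ms = dwell_ms
        self.current_key = None
        self.enter_time = None

    def update(self, key, now_ms: float):
        """Feed the key currently under the gaze point; returns the key when the dwell completes."""
        if key != self.current_key:
            # Gaze moved to a different key (or off the keyboard): restart the timer.
            self.current_key, self.enter_time = key, now_ms
            return None
        if key is not None and now_ms - self.enter_time >= self.dwell_ms:
            self.enter_time = now_ms  # re-arm so the same key can repeat after another dwell
            return key
        return None
```

With a 1,000 ms dwell, `max_wpm(1000)` gives the 12 wpm ceiling stated above; halving the dwell doubles the ceiling but increases the risk of false selections.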




    The requirements for interfaces for impaired users are [24]: 1) Limited access to details: complex and vital details of the system have to be hidden to avoid overwhelming and trapping the user. 2) Self-learning: common patterns detected in the behavior of the user should be used to automatically create rules or shortcuts that speed up and ease the use of the system. 3) System interruption: impaired users in most cases have no idea how the system works, therefore easy cancellation of the system’s activities must be ensured.

    According to Lopes [25], a user interface for persons with disabilities must: support variability, providing the means to adapt to user-specific requirements; support a wide range of input devices and output modes; provide a minimal user interface design; promote interaction and retain user attention on the tasks; and provide strong feedback mechanisms that may provide rewarding schemes for correct results.

E. Architecture
    The architecture of the developed prototype gaze speller system is quite simple (see Fig. 1). It consists of the gaze tracking device (Eye Tribe), which is connected to a PC via a USB 3.0 connection. On the PC, the core modules are responsible for the calibration procedure and gaze feedback.

        Fig. 1. Architecture of the gaze speller prototype system

F. Interface
    The primary driving motive for designing a user interface for a gaze speller is usability, as a good user experience would also enhance user acceptance of the system. Our interface was inspired by Špakov et al. [14] and is based on the concept of a “scrollable keyboard” (see Fig. 2).

             Fig. 2. Interface of the developed gaze speller

    The current implementation uses a standard QWERTY layout mapped to a single scrollable line of letters. Feedback is ensured by a black line which always stays at the center of the screen, while the one-line keyboard moves underneath it depending on the horizontal position of the gaze. Letter selection for input is provided by eye dwelling. Additional menu buttons are provided for calibration, connection to the gaze tracking device, loading of alternative keyboard layouts, and setting program options. A layout editor has been implemented for designing other keyboard layouts.

    Finally, the operation of the system can be imitated using a mouse if a gaze tracking device is disconnected.

                      IV.  EXPERIMENTS

A. Apparatus
    The eyeTribe eye tracker (tracking range 45 cm – 75 cm, tracking area 40 cm x 30 cm at 65 cm distance) was connected to an HP Ultrabook notebook running 64-bit Microsoft Windows 8 with an Intel Core i5-4202Y 1.60 GHz CPU and 4 GB RAM. The application was displayed on a 14” LCD display with LED backlight and a screen resolution of 1920x1080 (see Fig. 3). The eyeTribe eye tracker communicates with the notebook via a USB 3.0 interface.

            Fig. 3. Deployment of the eye tracking system.

B. Procedure
    Prior to collecting data, the experimenter explained the task and demonstrated the software. The experiment was carried out with one disabled person, who could not control his legs and whose hand movements were limited. The participant was instructed on the method of text entry, early word selection, error correction, and the audio feedback. He was instructed to enter the given phrases as quickly and accurately as possible and to make corrections only if an error was detected in the current or previous word. The participant was allowed to enter a few trial phrases to become familiar with the gaze-controlled selection and correction methods.
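The scrolling behavior described in Section III.F can be sketched as follows: the horizontal distance of the gaze from the screen center drives the velocity of the one-line letter strip, while a marker fixed at the center indicates the current letter. The linear-gain control law and all parameter values here are our own assumptions for illustration, not taken from the implementation:

```python
LETTERS = "QWERTYUIOPASDFGHJKLZXCVBNM_"   # one-line QWERTY strip ('_' stands for space)
KEY_WIDTH = 80          # px per key cell (assumed)
SCREEN_WIDTH = 1920     # px, matching the 1920x1080 display of Section IV.A
GAIN = 2.0              # scroll velocity per px of gaze offset from center (assumed)


class ScrollingKeyboard:
    """One-line keyboard that scrolls under a marker fixed at the screen center."""

    def __init__(self):
        self.offset = 0.0   # strip displacement in px; the marker itself never moves

    def step(self, gaze_x: float, dt: float) -> str:
        """Advance the strip according to the horizontal gaze position over a time
        step of `dt` seconds, then report the letter under the central marker."""
        self.offset += GAIN * (gaze_x - SCREEN_WIDTH / 2) * dt
        # Clamp so the marker always sits over a real key.
        self.offset = max(0.0, min(self.offset, (len(LETTERS) - 1) * KEY_WIDTH))
        return LETTERS[int(self.offset // KEY_WIDTH)]
```

Looking to the right of center scrolls later letters under the marker; looking at the center holds the current letter still, which is what allows a dwell to accumulate and trigger selection.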




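Given a session log of such a trial (presented phrase, final transcribed text, raw keystroke count, and duration), the standard text-entry metrics can be computed as in the sketch below. The 5-characters-per-word convention follows the paper; using Levenshtein distance for the error rate, and the example duration of 36.5 minutes (chosen only to be consistent with the reported 1.2 wpm for the 219-character phrase), are our own assumptions:

```python
def wpm(transcribed: str, seconds: float, chars_per_word: int = 5) -> float:
    """Entry rate: transcribed length in 5-character 'words' per minute."""
    return (len(transcribed) / chars_per_word) / (seconds / 60)


def kspc(keystrokes: int, transcribed: str) -> float:
    """Keystrokes per character: values above 1 reflect corrective overhead."""
    return keystrokes / len(transcribed)


def error_rate(presented: str, transcribed: str) -> float:
    """Uncorrected error rate as minimum string (Levenshtein) distance ratio."""
    m, n = len(presented), len(transcribed)
    d = list(range(n + 1))          # distance row, updated in place
    for i in range(1, m + 1):
        prev, d[0] = d[0], i
        for j in range(1, n + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                                # deletion
                       d[j - 1] + 1,                            # insertion
                       prev + (presented[i - 1] != transcribed[j - 1]))  # substitution
            prev = cur
    return d[n] / max(m, n)


# A 219-character phrase entered in about 36.5 minutes yields roughly 1.2 wpm.
print(round(wpm("x" * 219, 36.5 * 60), 1))  # 1.2
```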
    For the experiment, we used a fragment of the well-known novel “Alice in Wonderland” by Lewis Carroll (Charles Lutwidge Dodgson):

    “The rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that Alice had not a moment to think about stopping herself before she found herself falling down a very deep well.”

    The text consists of 219 characters (including spaces).

    A volunteer participant was recruited, who had no prior experience using an eye tracker, to enter the text.

C. Performance metrics
    Typing speed is measured in wpm, where a word is any sequence of five characters, including letters, spaces, punctuation, etc. When measuring accuracy, both corrected errors and errors left in the final text are taken into account.

    Keystrokes per character (KSPC) measures the average number of keystrokes used to enter each character of text. KSPC is an accuracy measure reflecting the overhead incurred in correcting mistakes.

    Error rate is calculated by comparing the text written by the participant with the presented text.

D. Results
    The mean typing speed achieved was 1.2 wpm. This is quite typical for traditional dwell-based eye typing, but is still too slow for fluent text entry. However, the experiment showed that the participant improved with practice over the four blocks of input. The error rate is quite low overall, as the participant generally chose to correct errors during text entry.

                TABLE I.         RESULTS OF EXPERIMENT

      Speed, wpm        KSPC        Error rate
      1.2               1.44        0.01

E. Evaluation
    We can compare the input speed of the developed gaze speller with other text typing systems using both traditional and alternative input methods and modalities. Average computer users achieve a 33 wpm text entry speed [26] when using a standard PC and keyboard. An average user of the “T9 input method” on a 12-key mobile phone keypad can produce up to 10 wpm [27]. The speed achieved using Brain Computer Interface (BCI) or Neural Computer Interface (NCI) spellers with electroencephalogram (EEG) / electromyogram (EMG) data as input is in the range of 0.2–2.55 wpm, while the eye-blink based EMG speller developed by the authors of this paper achieved 2.4 wpm [28, 29]. Other gaze tracking based text entry spellers report up to 12 wpm for dwell-based interfaces [14] and 20 wpm for dwell-free interfaces [21].

    The prototype gaze speller described in this paper is still at an early stage of development, and its performance is in the lower range of similar systems. However, there is still much room for improvement.

               V. CONCLUSION AND FUTURE WORK

    This paper has presented a new hands-free text entry system using gaze as the only source of input. The gaze speller is designed to assist severely motor impaired individuals who are unable to create motion input, but are able to voluntarily control their eyes.

    Further research is needed to perform more extensive experiments using a large group of participants (both healthy and impaired), to analyze more efficient letter layouts based on letter frequency and letter/word prediction, to implement adaptive control of dwell time, to evaluate usability of the user interface using common usability evaluation procedures such as NASA-TLX [30], to assess user learnability vs. fatigue with the gaze speller in prolonged sessions, and, possibly, to integrate several different input modalities (e.g., also using EMG signals) for text entry tasks.

                        ACKNOWLEDGMENT
    The authors would like to acknowledge the contribution of the COST Action IC1303 – Architectures, Algorithms and Platforms for Enhanced Living Environments (AAPELE).

                          REFERENCES
[1]  Interpersonal Communication: A First Look. SAGE. https://us.sagepub.com/sites/default/files/upm-binaries/52575_Gamble_(IC)_Chapter_1.pdf
[2]  M. Dawe, “Desperately Seeking Simplicity: How Young Adults with Cognitive Disabilities and Their Families Adopt Assistive Technologies,” CHI '06: Proc. of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1143-1152, 2006.
[3]  M.C. Domingo, “Review: An overview of the Internet of Things for people with disabilities,” J. Netw. Comput. Appl. 35, 2, pp. 584-596, 2012.
[4]  P. Gregor and A. Dickinson, “Cognitive difficulties and access to information systems: an interaction design perspective,” Universal Access in the Information Society, Vol. 5, pp. 393-400, 2006.
[5]  P. Majaranta, I. Scott MacKenzie, A. Aula, and K.-J. Räihä, “Effects of feedback and dwell time on eye typing speed and accuracy,” Universal Access in the Information Society 5(2), pp. 199-208, 2006.
[6]  P. Majaranta and K.-J. Räihä, “Twenty years of eye typing: systems and design issues,” Proc. of the ACM Symposium on Eye Tracking Research and Applications (ETRA 2002), pp. 15-22, ACM, New York, 2002.
[7]  I. Scott MacKenzie and K. Tanaka-Ishii, “Text Entry Systems: Mobility, Accessibility, Universality,” Morgan Kaufmann Publishers Inc., 2007.
[8]  A. Gillespie, C. Best, and B. O'Neill, “Cognitive function and Assistive Technology for Cognition: A Systematic Review,” Journal of the International Neuropsychological Society, 18, pp. 1-19, 2012.
[9]  A. Dohr, R. Modre-Opsrian, M. Drobics, D. Hayn, and G. Schreier, “The Internet of Things for Ambient Assisted Living,” in 7th Int. Conf. on Information Technology: New Generations (ITNG), pp. 804-809, 2010.
[10] C. Stephanidis, G. Salvendy, D. Akoumianakis, A. Arnold, N. Bevan, D. Dardailler, P.L. Emiliani, I. Iakovidis, P. Jenkins, A. Karshmer, P. Korn, A. Marcus, H. Murphy, C. Oppermann, C. Stary, H. Tamura, M. Tscheligi, H. Ueda, G. Weber, and J. Ziegler, “Toward an Information Society for All: HCI challenges and R&D recommendations,” Int. Journal of Human-Computer Interaction 11(1), pp. 1-28, 1998.




[11] J.P. Hansen, D.W. Hansen, and A.S. Johansen, “Bringing gaze based interaction back to basics,” Proc. of HCI International 2001, pp. 325-328, Erlbaum, Mahwah, NJ, 2001.
[12] M. Ashmore, A.T. Duchowski, and G. Shoemaker, “Efficient eye pointing with a fisheye lens,” Proc. of Graphics Interface 2005, pp. 203-210, 2005.
[13] A. Huckauf and M.H. Urbina, “Gazing with pEYE: new concepts in eye typing,” ACM SIGGRAPH Symposium on Applied Perception in Graphics and Visualization (APGV 2007), p. 141, 2007.
[14] O. Špakov and P. Majaranta, “Scrollable Keyboards for Casual Eye Typing,” PsychNology Journal 7(2), Special issue on “Gaze Control for Work and Play”, pp. 159-173, 2009.
[15] P.E. Jones, “Virtual keyboard with scanning and augmented by prediction,” Proc. of the 2nd European Conference on Disability, Virtual Reality and Associated Technologies, pp. 45-51, 1998.
[16] A. Mourouzis, E. Boutsakis, S. Ntoa, M. Antona, and C. Stephanidis, “An accessible and usable soft keyboard,” Proc. of HCI International 2007, pp. 961-970, Springer, Berlin, 2007.
[17] O. Tuisku, P. Majaranta, P. Isokoski, and K.-J. Räihä, “Now Dasher! Dash away! Longitudinal study of fast text entry by eye gaze,” Proc. of the 5th ACM Symposium on Eye-Tracking Research & Applications, ACM Press, pp. 19-26, 2008.
[18] P.O. Kristensson and K. Vertanen, “The potential of dwell-free eye-typing for fast assistive gaze communication,” Proc. of the Symposium on Eye Tracking Research and Applications (ETRA '12), ACM, pp. 241-244, 2012.
[19] O. Lutz, A. Venjakob, and S. Ruff, “SMOOVS: Towards calibration-free text entry by gaze using smooth pursuit movements,” Journal of Eye Movement Research 8(1):2, pp. 1-11, 2015.
[20] J. Miro and P.A. Bernabeu, “Text entry system based on a minimal scan matrix for severely physically handicapped people,” Proc. of the 11th Conference on Computers Helping People with Special Needs (ICCHP 2008), pp. 1216-1219, Springer, Berlin, 2008.
[21] D. Pedrosa, M. da Graça Pimentel, and K.N. Truong, “Filteryedping: A Dwell-Free Eye Typing Technique,” Proc. of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA '15), ACM, pp. 303-306, 2015.
[22] C. Ware and H.H. Mikaelian, “An evaluation of an eye tracker as a device for computer input,” Proc. of CHI/GI '87, ACM Press, pp. 183-188, 1987.
[23] R.J.K. Jacob, “Eye tracking in advanced interface design,” in W. Barfield and T.A. Furness (eds.), Virtual Environments and Advanced Interface Design, Oxford University Press, pp. 258-288, 1995.
[24] A. Marinc, C. Stocklöw, A. Braun, C. Limberger, C. Hofmann, and A. Kuijper, “Interactive personalization of ambient assisted living environments,” Proc. of the 2011 Int. Conf. on Human Interface and the Management of Information - Volume Part I (HI'11), LNCS vol. 6771, Springer-Verlag, pp. 567-576, 2011.
[25] J.B. Lopes, “Designing user interfaces for severely handicapped persons,” Proc. of the 2001 EC/NSF Workshop on Universal Accessibility of Ubiquitous Computing: Providing for the Elderly (WUAUC'01), ACM, pp. 100-106, 2001.
[26] C.M. Karat, C. Halverson, D. Horn, and J. Karat, “Patterns of entry and correction in large vocabulary continuous speech recognition systems,” Proc. of the SIGCHI Conference on Human Factors in Computing Systems (CHI '99), pp. 568-575, 1999.
[27] A. Cockburn and A. Siresena, “Evaluating Mobile Text Entry with the Fastap Keypad,” British Computer Society Conference on Human Computer Interaction, pp. 77-80, 2003.
[28] M. Vasiljevas, R. Turčinas, and R. Damaševičius, “EMG Speller with Adaptive Stimulus Rate and Dictionary Support,” Proc. of FedCSIS 2014: Federated Conference on Computer Science and Information Systems, pp. 233-240, 2014.
[29] M. Vasiljevas, R. Turčinas, and R. Damaševičius, “Development of EMG-based speller,” Proc. of INTERACCION 2014: XV International Conference on Human Computer Interaction, 2014.
[30] T. Hayashi and R. Kishi, “Utilization of NASA-TLX for Workload Evaluation of Gaze-Writing Systems,” Proc. of the 2014 IEEE International Symposium on Multimedia (ISM '14), pp. 271-272, 2014.



