Clinical Speech to Text
Evaluation Setting

Hanna Suominen (1), Jim Basilakis (2), Maree Johnson (3), Linda Dawson (4), Leif Hanlen (1),
Barbara Kelly (5), Anthony Yeo (2), Paula Sanchez (3)

(1) NICTA, National ICT Australia and The Australian National University, Locked Bag 8001,
    2601 Canberra, ACT, Australia
(2) University of Western Sydney, Locked Bag 1797, 2751 Penrith, NSW, Australia
(3) Centre for Applied Nursing Research (a joint facility of the South Western Sydney Local
    Health District & the University of Western Sydney), Locked Bag 7103, 1871 Liverpool BC,
    NSW, Australia
(4) University of Wollongong, 2522 Wollongong, NSW, Australia
(5) The University of Melbourne, 2010 Melbourne, VIC, Australia

hanna.suominen@nicta.com.au, J.Basilakis@uws.edu.au,
Maree.Johnson@sswahs.nsw.gov.au, lindad@uow.edu.au,
leif.hanlen@nicta.com.au, b.kelly@unimelb.edu.au,
anthony.yeo@uni.sydney.edu.au,
Paula.Sanchez@sswahs.nsw.gov.au


      Abstract. Failures in information flow from clinical handover are the leading
      cause of sentinel events in the USA and are associated with nearly half of all
      adverse events and over a tenth of preventable adverse events in Australia. Verbal
      clinical handover provides a good picture of the background clinical history and
      current state of clinical management of a group of patients cared for by a nursing
      team. However, all this valuable verbal information is lost after three
      consecutive shifts if no notes are taken during handover. When traditional
      note-taking by hand occurs, less than a third of data is transferred correctly
      after five shifts.
         We propose using an automated approach of cascading speech-to-text
      conversion, standardisation with respect to controlled thesauri, and structuring in
      accordance with documentation standards. This transcribes verbal handover
      information into written drafts for subsequent clinical review, editing, and
      addition to electronic health records.
         In this paper, we introduce the evaluation setting for this technology
      development in a laboratory environment. It ranks a wide range of recording devices
      used alone or in combination with headsets and lapel microphones based on
      clinicians’ preferences and their accuracy in speech-to-text conversion. The
      sample consists of four student nurses and four experienced academics from diverse
      clinical specialties and speaking styles. To simulate realistic nursing clinical
      handovers, twenty handover scenarios have been scripted. The subsequent
      evaluation in a clinical environment will address speech-to-text conversion,
      standardisation, and structuring with the short-listed devices in six hospitals,
      with a sample of thirty authentic handover situations per hospital.
         To compare recorder-microphone combinations across all participants,
      professional-level recording devices are used to record each participant. The
      recordings are then played using professional-level speakers across all
      recorder-microphone combinations to achieve equivalency in voice input. Statistical
      accuracy in speech-to-text conversion with noise experimentation is used to
      determine the most accurate combination. Two speech-to-text systems are
      compared against transcription by hand.
         An eighteen-item pre-experimental survey addresses initial perceptions of
      using the proposed automated approach in clinical settings. This includes
      participants’ opinion on the improvement of clinical handover with the proposed
      automated approach, their understanding of the related technologies and perceived
      problems with the clinical application. An eleven-item post-experimental
      survey examines device usability with reference to the specific experimental
      devices. Each participant is asked to complete both surveys and participate in a
      one-to-one interview. All participants are videoed using the recording devices
      and accessing typical device functions to further examine human-device
      interactions for usability assessment.
         We are seeking additional partners to further develop and evaluate the
      approach and setting.

       Keywords: Evaluation; Natural Language Processing; Nursing Informatics;
       Speech Recognition Software


1      Introduction
Failures in information flow from clinical handover are the leading cause of sentinel
events in the USA and associated with nearly half of all adverse events and over a
tenth of preventable adverse events in Australia [1-3]. Verbal clinical handover provides
a good picture of the background clinical history and current state of clinical
management of a group of patients cared for by a nursing team. However, all this valuable
verbal information is lost after three consecutive shifts if no notes are taken during
handover. When traditional note-taking by hand occurs, less than a third of data is
transferred correctly after five shifts [4-5].
   We propose using an automated approach of cascading speech-to-text conversion,
standardisation with respect to controlled thesauri, and structuring in accordance with
documentation standards. This transcribes verbal handover information into written
drafts for subsequent clinical review, editing, and addition to electronic health
records. We have already demonstrated the suitability of the document structure
scientifically and practically by introducing a documentation template to be populated by
typing. After its initial pilot testing in six wards, implementation across four major
teaching hospitals in Australia is nearing completion [6-7].

Fig. 1. Laboratory environment for evaluation

   In this work-in-progress paper, we introduce the evaluation setting for this
technology development in a laboratory environment (Figure 1). This setting aims to define
hardware to be used in a subsequent evaluation in a clinical environment. It ranks
hardware alternatives based on clinicians’ preferences and their accuracy in
speech-to-text conversion when using fixed software. The subsequent evaluation in a clinical
environment will address not only speech-to-text conversion but also the steps of
standardisation and structuring.


2      Materials and Methods
A wide range of recording devices is considered and compared. These include an
MP3 player, medium and high-end voice recorders, smart phones and tablet computers.
The devices are used alone or in combination with medium and high-end headsets
as well as omnidirectional and noise-cancelling lapel microphones. The sample
consists of four student nurses and four experienced academics from diverse clinical
specialties and speaking styles, including accents and voice qualities. To simulate
realistic nursing clinical handovers, twenty handover scenarios have been scripted.
Derived from existing clinical handover data, these fictitious and de-identified
scenarios reflect the full range of possible handover situations including structured
handover, unstructured handover, group presentation and individual presentation. Each
handover scenario includes the use of proper English, jargon terms, fragmented
language, atypical abbreviations and clinical terminology. In a second phase, the
short-listed 3–5 recording devices are tested in clinical practice with a sample of 180
authentic handover situations (i.e., thirty situations in each of six hospitals). We have
chosen this two-phase approach to minimise the evaluation bias caused by the burden of
wearing multiple devices in clinical practice when compared with the final goal of having
one device only.
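
The laboratory comparison grid described above can be sketched as follows. This is a minimal illustration only: the category labels are hypothetical identifiers derived from the device types named in the text, and the sketch assumes that every recorder can be paired with every microphone option, which the study does not state.

```python
from itertools import product

# Hypothetical labels for the device categories named in the text.
RECORDERS = [
    "mp3_player",
    "voice_recorder_medium",
    "voice_recorder_high_end",
    "smart_phone",
    "tablet_computer",
]

# None stands for "device used alone" (no external microphone).
MICROPHONES = [
    None,
    "headset_medium",
    "headset_high_end",
    "lapel_omnidirectional",
    "lapel_noise_cancelling",
]


def recorder_microphone_combinations(recorders, microphones):
    """Return every (recorder, microphone) pair.

    Assumption: all pairings are physically possible; in practice some
    recorders may not accept an external microphone.
    """
    return list(product(recorders, microphones))


combos = recorder_microphone_combinations(RECORDERS, MICROPHONES)
```

Under these assumptions the grid contains one entry per recorder used alone plus one per recorder-microphone pairing; incompatible pairings would simply be filtered out of `combos`.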


3      Results and Discussion
Evaluation of accuracy: To enable systematic comparison of recorder-microphone
combinations across all participants, professional-level recording devices are used to
record each participant. The recordings are subsequently replayed using
professional-level speakers across all recorder-microphone combinations to achieve
equivalency in voice input. Statistical accuracy in speech-to-text conversion is used to
determine the most accurate combination. This use of pre-recorded sound files also enables
systematic manipulation of, and experimentation with, a wide range of noise levels and
types (e.g., ambient, intrusive, continuous, intermittent, and other people in group
presentation). At least two speech-to-text systems are compared against transcription
by hand.
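
One common way to score a speech-to-text output against a hand transcription is the word error rate (WER). The text does not name the accuracy measure used, so WER is an assumption here, and the function names and example transcripts below are hypothetical illustrations rather than study data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: simple lower-casing and whitespace tokenisation; a real
    evaluation would also normalise punctuation and abbreviations.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def most_accurate_combination(reference: str, hypotheses: dict) -> str:
    """Return the key (e.g. a recorder-microphone label) with the lowest WER."""
    return min(hypotheses, key=lambda k: word_error_rate(reference, hypotheses[k]))


# Hypothetical example: one hand transcription and two system outputs.
hand_transcript = "patient in bed four is nil by mouth"
outputs = {
    "recorder_a+headset": "patient in bed for is nil by mouth",
    "recorder_b_alone": "patient bed for nil mouth",
}
best = most_accurate_combination(hand_transcript, outputs)
```

Because every combination is fed the same replayed recordings, the scores from `word_error_rate` are directly comparable across combinations and noise conditions.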
   Personalisation to clinical context: An eighteen-item pre-experimental survey
addresses initial perceptions of using the proposed automated approach in clinical
settings, prior to the introduction of experimental recording devices. This includes
participants’ opinion on the improvement of clinical handover with the proposed
automated approach, their understanding of the related technologies and perceived
problems with the clinical application. In addition to assessing the perceived benefits and
problems of recording devices, an eleven-item post-experimental survey examines
device usability with reference to the specific experimental devices. Each participant
is asked to complete both surveys and participate in a one-to-one interview or focus
group discussion. Our survey templates are available at http://bit.ly/JB0yHR.


4      Conclusion
We are seeking additional partners to further develop and evaluate the approach and
setting in order to gain understanding across specialties, jargons, genres,
and languages.


Acknowledgements
NICTA is funded by the Australian Government as represented by the Department of
Broadband, Communications and the Digital Economy and the Australian Research Council
through the ICT Centre of Excellence program. We gratefully acknowledge collaboration in
cross-country comparisons with Riitta Danielsson-Ojala, Heljä Lundgrén-Laine, and Sanna
Salanterä (University of Turku, Department of Nursing Science, Turku, Finland).


References
 1. Joint Commission on Accreditation of Healthcare Organisations: Health Care at the
    Crossroads: Strategies for Improving the Medical Liability System and Preventing Patient
    Injury, 2005, http://bit.ly/KJ8ylN.
 2. Australian Commission on Safety and Quality in Health Care: Windows into Safety and
    Quality in Health Care 2008, 2008, http://bit.ly/JwlxXz.
 3. Australian Commission on Safety and Quality in Health Care: The OSSIE Guide to
    Clinical Handover Improvement, 2009, http://bit.ly/JWZ5vb.
 4. Pothier D, Monteiro P, Mooktiar M, Shaw A: Pilot study to show the loss of important
    data in nursing handover. Br J Nurs. 2005;14(20):1090-3.
 5. Matic J, Davidson PM, Salamonson Y: Review: bringing patient safety to the forefront
    through structured computerisation during clinical handover. J Clin Nurs.
    2011;20(1-2):184-9.
 6. Johnson M, Jefferies D, Nicholls D: Exploring the structure and content of nursing clinical
    handovers. Int J Nurs Pract. 2012, in press.
 7. Johnson M, Jefferies D, Nicholls D: Developing and testing a minimum data set for
    electronic handover in nursing. J Clin Nurs. 2012;21(3-4):3331-43.