<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Artificial voice perception in the context of novel voice restoration technique for laryngectomees</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Konrad Zielinski</string-name>
          <email>konrad.zielinski01@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ryszard Szamburski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ewa Machnacz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Warsaw</institution>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Laryngectomy leads to voice loss. Current voice restoration techniques are insufficient. A new method is proposed, based on a silent speech interface and a digital copy of the user's voice. An analysis of artificial voice perception, particularly personality attribution, is conducted as a part of this project.</p>
      </abstract>
      <kwd-group>
        <kwd>laryngectomy</kwd>
        <kwd>voice restoration</kwd>
        <kwd>machine learning</kwd>
        <kwd>silent speech</kwd>
        <kwd>speech synthesis</kwd>
        <kwd>artificial voices</kwd>
        <kwd>intelligent interfaces</kwd>
        <kwd>personality perception</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Laryngectomy is an invasive, radical surgical treatment of laryngeal cancer
        <xref ref-type="bibr" rid="ref8">(Nowak
et al. 2015)</xref>
        . One of its consequences is voice loss. A novel voice restoration
technique for laryngectomees is proposed (specified in section 4). It incorporates,
among other components, an artificial voice. This article presents this new
possibility, focusing on artificial voice perception and how such perception influences
the technologies used. The first task of the project is to review current research
and answer the following questions:
1. Will users perceive a voice carrying their "biological trace", even one of lower
quality, as more natural than the best available (with respect to voice quality)
text-to-speech voice?
2. Is it worthwhile to create an artificial voice with the "biological trace" of a
patient in the context of the current project?
      </p>
    </sec>
    <sec id="sec-2">
      <title>Voice and personality</title>
      <p>
        How is our personality perceived by others? What factors should be taken into
account when analysing this issue? Most studies on personality perception
have focused on the visual modality. Recently it has been noticed that, along with
vision, other modalities, e.g. the aural (voice) and haptic (touch), could play a
significant role in that process
        <xref ref-type="bibr" rid="ref9">(Schirmer &amp; Adolphs 2017)</xref>
        . The authors investigated
emotion perception from those modalities on the behavioural and neuronal level and
concluded that each of them engages a different processing system and that the attributed
information does not simply duplicate the visual one.
      </p>
      <p>Other researchers have also reached the conclusion that, among other factors, the
sound of our voice plays a crucial role in personality perception. Nass &amp; Lee (2001)
conducted an experiment in which participants listened to an introvert or
extrovert artificial voice in a natural context. The subjects (introverts and extroverts)
showed similarity attraction to the matching voice (also introvert or extrovert).
This shows that voice is one of the factors that contribute to personality perception.</p>
      <p>
        Some authors even postulate a concept of artificial personality
        <xref ref-type="bibr" rid="ref12">(Wester et
al. 2015)</xref>
        . Their experiment shows that adding disfluencies (e.g. uh, um, like, you
know, I mean) to a synthesised voice can result in different personality traits being
assigned to it. The authors claim that adding simple disfluencies in fact increases
the level of naturalness attributed to these voices.
      </p>
      <p>The subject of artificial voices is especially important in the context of the
emerging array of voice assistants, e.g. Alexa (https://developer.amazon.com/alexa),
Siri (https://www.apple.com/ios/siri/), Google Home
(https://store.google.com/ca/product/google_home/) and Cortana
(https://www.microsoft.com/en-us/cortana), all retrieved 30.04.2017. The
perception of such systems would be an interesting research field in cognitive
science. However, the same approach can be applied in order to obtain important
information on how to help a specific group of people with the use of a synthesised
voice.</p>
    </sec>
    <sec id="sec-3">
      <title>Laryngectomy</title>
      <p>
        Let us consider people who have lost their voice. Laryngectomy is an invasive,
radical surgical treatment for laryngeal cancer
        <xref ref-type="bibr" rid="ref8">(Nowak et al. 2015)</xref>
        . It is called
salvage surgery because it is the most efficient method for the most advanced
tumors. Patients lose the ability to use their natural voice. In the light of the
research above, we can claim that with it they also lose the opportunity to convey
some aspects of their personality in interactions. Until now there has
been no technical possibility of preserving this "biological trace" of the voice, but
some new solutions have appeared recently. There are currently three methods of voice
restoration
        <xref ref-type="bibr" rid="ref10">(Tang &amp; Sinclair 2015)</xref>
        , listed below and rated according to the following criteria:
- (Criterion 1) non-invasiveness
- (Criterion 2) naturalness of communication
- (Criterion 3) quality of the voice
The three methods are:
- (VR 1) voice prosthesis (fulfills Criterion 2 and Criterion 3, does not meet Criterion 1)
- (VR 2) esophageal speech (fulfills Criterion 1 and Criterion 2, does not meet Criterion 3)
- (VR 3) electrolarynx (fulfills Criterion 1 and Criterion 3, does not meet Criterion 2)
As can be seen, none of these methods fulfills Criterion 1, Criterion 2 and Criterion
3 at the same time. Due to the drawbacks of the current methods of voice
restoration after laryngectomy, our research group has proposed an alternative:
- (VR 4) an intelligent interface based on neuromuscular input and a digitized
biological voice.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Novel voice restoration technique</title>
      <p>
        The application of novel technologies from the field of artificial intelligence could
help patients following laryngectomy. The main project goal is to
create a complete communication system for laryngectomees, combining two
approaches:
- (A1) The AlterEgo system
        <xref ref-type="bibr" rid="ref5">(Kapur et al. 2018)</xref>
        . AlterEgo allows control of
electronic devices without the need for any visible movement. The control relies
on silent speech, like counting under one's breath. The system captures
neuromuscular signals from selected face and neck areas responsible for speech
production and then, on the basis of a previously learnt model, predicts the words
that have been "silently spoken". At its current stage, the project allows the
user to count large numbers, play chess and Go with the help of a computer,
and make simple queries (e.g. What time is it?). Ultimately it is intended
to work with natural language, e.g. to allow writing Google queries without
mouth movement or taking out a smartphone.
- (A2) Developments in the field of biological voice digitization. In recent
years at least two commercially used models of English speech
digitization have appeared, developed by the companies Lyrebird AI
(https://lyrebird.ai/) and CereProc (https://cereproc.com/), both retrieved
30.04.2017. These companies offer a service of creating a "biological trace" of
a voice, i.e. the personal characteristics, or "fingerprint", of one's speech.
After a few short recording sessions a complete text-to-speech system is created,
with a voice easily recognizable by the user and their relatives as "their voice".
      </p>
      <p>The system that we propose draws from both (A1) and (A2), combining and
modifying them.</p>
      <p>The training session:
1. neuromuscular signals from the face and neck of the patient are detected
using EMG sensors; at the same time, his/her voice is recorded;
2. the recordings are manually transcribed to text;
3. once the voice is recorded, it is converted into an artificial voice model that
retains the "biological trace" of the user (M1);
4. based on the information from step 1, a machine learning model is built with the EMG
signal-derived features as inputs and the text transcriptions as the predicted values (M2).
Speech production:
1. the patient tries to speak normally (this is the difference between "silent
speech" interfaces and the system proposed by us); here, mouth movements are
desirable, as they increase the naturalness of the speech;
2. the physiological movement signals are collected from his/her mouth and
neck muscles with EMG sensors;
3. the system predicts the text that should be produced by similar movements,
based on the previously built model (M2);
4. this text is converted according to the previously built digital copy of the
user's voice (M1);
5. finally, the sound (speech) is played from a speaker attached to the
patient's body, ideally in a place that the patient will not see and that takes up
little space. At the current stage of the project a JBL Go speaker is attached
to the user's forehead with an elastic band.</p>
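      <p>The training and production phases above can be sketched end to end. The following is a minimal illustration, not the project's implementation: the EMG data is simulated, the feature extraction is a toy, the word predictor (M2) is a nearest-centroid classifier standing in for the actual machine learning model, and the voice model (M1) is a word-to-waveform lookup standing in for a personalised text-to-speech system.</p>

```python
# Illustrative sketch of the proposed pipeline -- NOT the project's actual
# implementation. EMG signals are simulated; the classifier (M2) and the
# voice model (M1) are deliberately simplistic stand-ins.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["hello", "yes", "no", "thanks"]

def emg_features(signal):
    """Toy feature vector: per-channel mean absolute value and variance."""
    return np.concatenate([np.abs(signal).mean(axis=1), signal.var(axis=1)])

# --- Training session ---
# Steps 1-2: simulated EMG recordings paired with manual transcriptions.
features, labels = [], []
for idx, word in enumerate(VOCAB):
    for _ in range(20):
        sig = rng.normal(loc=idx, scale=0.3, size=(4, 50))  # 4 channels x 50 samples
        features.append(emg_features(sig))
        labels.append(word)

# Step 4 (M2): a model mapping EMG-derived features to transcribed words,
# here a nearest-centroid classifier.
centroids = {
    w: np.mean([f for f, lab in zip(features, labels) if lab == w], axis=0)
    for w in VOCAB
}

# Step 3 (M1): stand-in for the personalised voice model (word to waveform).
m1_voice = {w: rng.normal(size=100) for w in VOCAB}

# --- Speech production ---
def speak(signal):
    """Predict the silently spoken word (M2), then synthesise it (M1)."""
    feat = emg_features(signal)
    word = min(VOCAB, key=lambda w: np.linalg.norm(centroids[w] - feat))
    return word, m1_voice[word]

# A new signal resembling the training recordings for "no" (index 2).
test_sig = rng.normal(loc=2, scale=0.3, size=(4, 50))
word, waveform = speak(test_sig)
print(word)  # the predicted word, to be played back in the user's voice
```

      <p>In the real system, (M2) would be a model trained on continuous EMG streams and (M1) a personalised neural text-to-speech voice as in (A2), but the data flow between the training session and speech production is the same.</p>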
      <p>
        There have been first attempts at using silent speech systems to help patients
following laryngectomy
        <xref ref-type="bibr" rid="ref3 ref4">(Fagan et al. 2008, Meltzner et al. 2017)</xref>
        , but none have yet
tried to combine them with a digital copy of the patient's voice using a method that
could be applied in clinical conditions. Combining (A1) and (A2) could lead to the
creation of a non-invasive, natural voice restoration technique with good voice
quality, which meets the established criteria of success (Criteria 1-3). It will
allow us to build a system that exceeds all three voice restoration methods
currently used: voice prosthesis, esophageal speech and the electrolarynx.
      </p>
      <p>The project has raised numerous research questions, among others:
- (RQ 1) Would the system feel natural to patients?
- (RQ 2) How would users perceive their own body with a speaker attached
to it, and how would others perceive them?
- (RQ 3) How can the solutions be integrated into a system for a specific task (voice
restoration after laryngectomy)?
- (RQ 4) How can the complexity of language be managed?
- (RQ 5) How can a system be built that is fast enough for use in everyday
situations?
- (RQ 6) How can the system provide prosody control?</p>
      <p>An investigation of the naturalness of the voice will be the first step within (RQ 1).
The basic hypothesis that we posit is that users will perceive a voice carrying
their "biological trace", even one of lower quality, as more natural than the
best available (with respect to voice quality) text-to-speech voice (H1).</p>
      <p>In order to test (H1) we have established two main tasks:
- (T1) analyse the current literature on artificial voice perception
and the psychological aspects of laryngectomy;
- (T2) conduct an empirical study that will answer this question.</p>
    </sec>
    <sec id="sec-5">
      <title>Psychological factors of voice perception</title>
      <p>Since people attribute personality traits to artificial voices, the output of our system
could affect the way patients are perceived by others, e.g. they could be
perceived as extroverted, deceptive etc. Based on that assumption we argue that
it is crucial to choose the better option: building an entirely artificial voice (perhaps with
an assigned artificial personality) or an artificial voice with the patient's "biological trace".</p>
      <p>
        Another issue to be considered is the psychological aspects of laryngectomy.
The operation is a daunting experience for patients and their relatives. The study by
Bussian et al. (2010) suggests that psychiatric disorders affect approximately
one fifth of laryngectomy patients. A mail survey study with a large group
of respondents after laryngectomy
        <xref ref-type="bibr" rid="ref6">(Kotake et al. 2017)</xref>
        suggests that the most
important psychological adjustment after the operation is the recognition of oneself as
a voluntary agent. Since, as indicated above, the voice is an important quality
contributing to how a person is perceived by others, restoring a voice that is as
natural as possible, and having full control over it, is one of the key factors in
restoring agentivity.
      </p>
      <p>Still, a study by Vilaseca et al. (2006) should be mentioned. They showed
that although patients identified speech among their most important problems,
no correlation was found between speech and long-term quality of life (QOL).
There is a need to establish which voice-related factors contribute to
self-perception. Such a study would be greatly aided by the system we are
going to create, in which various factors can be manipulated and their effects
measured.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>
        Following the analysis conducted in the previous sections, we have drawn a first
preliminary conclusion:
(Conclusion 1) Patients may prefer a system with the "biological trace" of their
voice. This is based on the following premise and assumption:
- (P1) other people seem to perceive a part of a person's personality traits based
on their voice
        <xref ref-type="bibr" rid="ref12 ref7">(Nass &amp; Lee 2001, Wester et al. 2015)</xref>
        ;
- (As 1) patients after laryngectomy want to be perceived similarly as before
the surgery.
      </p>
      <p>Combining (Conclusion 1) with our criterion of naturalness (Criterion 2) leads
to:
(Conclusion 2) The effort to create a digital model of the patient's voice is justified
and should be further investigated experimentally within (T2).</p>
      <p>It seems that users may perceive a voice carrying their "biological trace", even
one of lower quality, as more natural than the best available (with respect to voice
quality) text-to-speech voice. It therefore appears worthwhile to build an artificial
voice with the "biological trace" of a patient. We argue that the difference between
an artificial voice with and without the patient's "biological trace" is worth further
investigation, because the literature suggests that there is a difference in
the perception of the two but does not specify the character of this difference.</p>
      <p>During the conference we encouraged a discussion on the following
questions:
- (Q1) People have most probably developed a well-functioning system for
attributing emotions and personality traits to others based on their voice.
What are the possible cognitive biases in the context of the emergence of artificial
voices?
- (Q2) Is it really valuable to use the patient's natural voice in our novel voice
restoration technique? Do we need to try to falsify (H1) more fiercely, and
why?</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Belin</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fecteau</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bedard</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Thinking the voice: neural correlates of voice perception</article-title>
          .
          <source>Trends in cognitive sciences, 8</source>
          (
          <issue>3</issue>
          ),
          <fpage>129</fpage>
          -
          <lpage>135</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Bussian, C., Wollbruck, D., Danker, H., Herrmann, E., Thiele, A., Dietz, A., &amp; Schwarz, R. (2010). Mental health after laryngectomy and partial laryngectomy: a comparative study. European Archives of Oto-Rhino-Laryngology, 267(2), 261.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Fagan, M. J., Ell, S. R., Gilbert, J. M., Sarrazin, E., &amp; Chapman, P. M. (2008). Development of a (silent) speech recognition system for patients following laryngectomy. Medical Engineering &amp; Physics, 30, 419-425.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Meltzner, G. S., Heaton, J. T., Deng, Y., De Luca, G., Roy, S. H., &amp; Kline, J. C. (2017). Silent speech recognition as an alternative communication device for persons with laryngectomy. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(12).</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kapur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kapur</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>AlterEgo: A Personalized Wearable Silent Speech Interface</article-title>
          .
          <source>In 23rd International Conference on Intelligent User Interfaces</source>
          (pp.
          <fpage>43</fpage>
          -
          <lpage>53</lpage>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Kotake, K., Suzukamo, Y., Kai, I., Iwanaga, K., &amp; Takahashi, A. (2017). Social support and substitute voice acquisition on psychological adjustment among patients after laryngectomy. European Archives of Oto-Rhino-Laryngology, 274(3), 1557-1565.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Nass, C., &amp; Lee, K. M. (2001). Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction. Journal of Experimental Psychology: Applied, 7(3), 171-181.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Nowak</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szyfter</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wierzbicka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2015</year>
          ). Nowotwory w otolaryngologii,
          <source>rozdział XII: Nowotwory krtani [Tumours in otolaryngology, chapter XII: Laryngeal tumours]. Wydawnictwo Termedia</source>
          ,
          <fpage>279</fpage>
          -
          <lpage>335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Schirmer, A., &amp; Adolphs, R. (2017). Emotion perception from face, voice, and touch: comparisons and convergence. Trends in Cognitive Sciences, 21(3), 216-228. doi:10.1016/j.tics.2017.01.001.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Tang, C. G., &amp; Sinclair, C. F. (2015). Voice restoration after total laryngectomy. Otolaryngologic Clinics of North America, 48(4), 687-702.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Vilaseca</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>A. Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Backscheider</surname>
            ,
            <given-names>A. G.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>Long-term quality of life after total laryngectomy</article-title>
          . Head &amp; neck,
          <volume>28</volume>
          (
          <issue>4</issue>
          ),
          <fpage>313</fpage>
          -
          <lpage>320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Wester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aylett</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomalin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dall</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Artificial Personality and Disfluency</article-title>
          .
          <source>INTERSPEECH</source>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>