<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recognition of Psychologically Relevant Aspects of Context on the Basis of Features of Speech</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anthony Jameson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barbara Großmann-Hutter</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Müller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Wittig</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juergen Kiefer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralf Rummer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DFKI, German Research Center for Artificial Intelligence; Department of Computer Science, Saarland University; Department of Psychology, Saarland University</institution>
        </aff>
      </contrib-group>
      <fpage>124</fpage>
      <lpage>127</lpage>
      <abstract>
        <p>The importance of objective features of context is often due to the psychological effects that they have on the user. This abstract looks at one possible way of capturing such psychologically relevant aspects of context: the analysis of features of the users' speech. In a replication and extension of an earlier study by our group, we created four experimental conditions that varied in terms of whether the user was (a) navigating within a simulated airport terminal or standing still; and (b) subject to time pressure or not. The speech produced by these subjects was coded in terms of seven variables. We trained dynamic Bayesian networks on the resulting data in order to see how well the information in the users' speech could serve as evidence as to which condition the user had been in. The results give information about the accuracy that can be attained in this way, the methods that can be used to implement the classifiers, and the diagnostic value of some specific features of speech.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Background and Motivation</title>
      <p>When we think about modeling and representing context, we may think first in
terms of sensors that directly detect features of the environment, such as the location
of the user, the presence of other persons, physical features like temperature and noise
level, or activities that the user is engaged in. But the importance of these features of
context is often due to the psychological effects that they have on the user. For example,
the fact that a user is engaged in communication with other persons may be important
mainly because it implies that the user has little time and attention left over for
interacting with a system.</p>
      <p>It is therefore natural to view the contextually influenced psychological states of the
user as constituting an important part of the context. But how can these psychological
states be detected by a system?</p>
      <p>The research summarized here was supported by the German Science Foundation (DFG) in its
Collaborative Research Center on Resource-Adaptive Cognitive Processes, SFB 378, Projects
B2 (READY) and A2 (VEVIAG). We thank one of the anonymous reviewers for perceptive
comments on the submitted version of the manuscript.</p>
      <p>Two strategies, which are not mutually exclusive, can be distinguished:
1. The system can detect objective features of the context and make inferences about
the psychological states that they are likely to induce; and
2. The system can detect behavioral or other responses of the user that can be treated
as symptoms of the psychological states in question. In this abstract, we focus on
the second approach, though we believe that in general a combination of the two
approaches should be considered.</p>
      <p>
        One class of symptoms of psychological states that has often been considered
comprises physiological responses that can be detected by sensors attached to the user’s
body. In this abstract, we discuss another sort of symptom, which may be especially
useful when a system is involved that requires the user to produce a good deal of speech
(e.g., when giving commands via speech or creating voice recordings). An obvious first
question is: Is there a useful amount of information available in a user’s speech that
can enhance the recognition of contextually determined psychological states? A partial
answer to this question was given in an earlier publication from our group (Müller et al.
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) that studied the recognition of two states: cognitive load and time pressure. In the
present abstract, we summarize a replication and extension of the experiment reported
on in the earlier paper—a second experiment that both corroborates the initial results
and adds some new ones.
      </p>
      <p>
        Because of space limitations, for details of the methods and results the reader
is referred to the poster presentation at the workshop, the earlier paper by Müller
et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and/or the originally submitted longer version of the present abstract.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Summary of Methods and Results</title>
      <p>The basic design of the two experiments is illustrated in Figure 1. Each subject took
the role of a traveler in an airport terminal, which was simulated on a PC screen. The
subject’s main task was to ask questions of two fictitious airport helpers by speaking into
a microphone. In Experiment 1, which is illustrated on the left-hand side of Figure 1,
two variables were manipulated experimentally:
– Navigation: Whether or not subjects were required to navigate within the simulated
airport terminal while speaking.
– Short-term time pressure: Whether or not the subjects were motivated to formulate
their questions quickly.</p>
      <p>On the basis of the data acquired from 32 subjects, we experimented with the
learning of dynamic Bayesian networks that were designed to recognize what condition a
subject was in on the basis of several features of the subject's speech, such as the length
of utterances, articulation rate, frequency and duration of pauses, and several types of
disfluency.</p>
      <p>(By the time of the workshop, these supplementary materials will be available via the web page
http://dfki.de/~jameson/mrc05/.)</p>
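      <p>To make the classification setup concrete, the following sketch trains a Gaussian naive Bayes model on per-utterance speech features. This is a deliberately simplified, static stand-in for the dynamic Bayesian networks actually used in the study, and the feature names and numbers are invented for illustration; they are not data from the experiments.</p>

```python
import math

# Hypothetical per-utterance feature vectors: [articulation rate (syll/s),
# silent-pause rate (pauses/s), disfluencies per utterance].
# Synthetic illustrations only -- not data from the study.
TRAIN = {
    "time_pressure":    [[5.1, 0.20, 0.9], [5.3, 0.25, 1.1], [5.0, 0.22, 1.0]],
    "no_time_pressure": [[4.2, 0.45, 0.3], [4.0, 0.50, 0.2], [4.3, 0.40, 0.4]],
}

def fit(samples):
    """Estimate per-class mean and variance of each feature."""
    params = {}
    for label, rows in samples.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        varis = [sum((x - m) ** 2 for x in c) / len(c) + 1e-6
                 for c, m in zip(cols, means)]
        params[label] = (means, varis)
    return params

def log_likelihood(x, means, varis):
    """Gaussian log-likelihood of a feature vector, assuming independence."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, m, v in zip(x, means, varis))

def classify(params, x):
    """Pick the condition with the highest likelihood (uniform prior)."""
    return max(params, key=lambda lbl: log_likelihood(x, *params[lbl]))

params = fit(TRAIN)
print(classify(params, [5.2, 0.21, 1.0]))   # fast, disfluent speech
print(classify(params, [4.1, 0.48, 0.25]))  # slower speech, more pausing
```

      <p>A dynamic Bayesian network additionally links the hidden condition variable across successive utterances, so that evidence accumulates over time instead of each utterance being classified in isolation.</p>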
      <p>[Figure 1: Design of the two experiments. In each experiment, Navigation
vs. No navigation was crossed with Time pressure vs. No time pressure;
Experiment 1 (left) was run without acoustic distraction, Experiment 2
(right) with acoustic distraction.]</p>
      <sec id="sec-2-7">
        <title>Results</title>
        <p>The results were moderately encouraging, and they shed some light on the
diagnostic value of the various features of speech. But it seemed that many subjects were able
to handle the navigation task so easily that it induced too little cognitive load to affect
their speech. One motivation for Experiment 2 was the desire to see if the navigation
task would have more noticeable effects in a situation where the subject was already
distracted by another contextual factor. We therefore replicated the experiment in the
way illustrated on the right-hand side of Figure 1: while the subjects performed their
tasks, typical airport announcements (which had been recorded at Frankfurt Airport)
were played back to them.</p>
        <p>Even though the subjects were not required to pay attention to the content of the
announcements, they did report that the announcements made it more difficult for them
to generate appropriate questions. Consistent with this result, the difference between
the navigation and the no-navigation conditions was easier to detect in this experiment.
Evidently, because of the increased distraction, the subjects more often showed speech
symptoms of cognitive overload while navigating.</p>
        <p>In other respects, the results of Experiment 2 corroborated those of Experiment 1.</p>
        <p>We also repeated the learning experiments while systematically leaving out one
feature of speech at a time, so as to determine which ones might be dispensable. This
analysis revealed that the features that were most difficult to detect automatically could
be omitted with little loss in accuracy.</p>
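        <p>The leave-one-feature-out analysis can be sketched as follows. The feature names, the synthetic data, and the nearest-centroid classifier are all illustrative assumptions standing in for the study's actual features, data, and dynamic Bayesian networks; accuracy is measured on the training data purely to keep the sketch short.</p>

```python
# Leave-one-feature-out ablation: retrain with each feature omitted and
# compare accuracies to see which features are dispensable.
FEATURES = ["articulation_rate", "pause_rate", "disfluency_rate"]
DATA = [  # (feature vector, condition label) -- synthetic illustrations
    ([5.1, 0.20, 0.9], "time_pressure"),
    ([5.3, 0.25, 1.1], "time_pressure"),
    ([4.2, 0.45, 0.3], "no_time_pressure"),
    ([4.0, 0.50, 0.2], "no_time_pressure"),
]

def centroids(rows):
    """Mean feature vector per condition label."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    return {y: [sum(c) / len(c) for c in zip(*xs)]
            for y, xs in by_label.items()}

def predict(cents, x):
    """Assign the label whose centroid is nearest (squared Euclidean)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(cents, key=lambda y: dist(cents[y], x))

def accuracy(rows):
    cents = centroids(rows)
    return sum(predict(cents, x) == y for x, y in rows) / len(rows)

for omit in range(len(FEATURES)):
    reduced = [([v for j, v in enumerate(x) if j != omit], y)
               for x, y in DATA]
    print(f"without {FEATURES[omit]}: accuracy {accuracy(reduced):.2f}")
```

        <p>A feature whose omission leaves accuracy essentially unchanged, as in the study's analysis, is a candidate for being dropped, which matters in practice when that feature is hard to extract automatically.</p>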
        <p>In sum, the two experiments showed in a consistent way that contextually induced
cognitive load and time pressure can (at least in some situations) have effects on
features of the person’s speech that are strong enough to permit significantly above-chance
discrimination; and they yield information about the diagnostic value of particular
features.</p>
        <p>Any attempt to apply the ideas and results of these experiments in a particular
application scenario will necessarily involve considerable further work and creativity. But
we believe that the results of these experiments will be helpful as a starting point.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Müller</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Großmann-Hutter</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Jameson</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Rummer</surname>, <given-names>R.</given-names></string-name>,
          <string-name><surname>Wittig</surname>, <given-names>F.</given-names></string-name>:
          <article-title>Recognizing time pressure and cognitive load on the basis of speech: An experimental study</article-title>.
          In
          <string-name><surname>Bauer</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Gmytrasiewicz</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Vassileva</surname>, <given-names>J.</given-names></string-name>,
          eds.: UM2001, User Modeling: Proceedings of the Eighth International Conference. Springer, Berlin
          (<year>2001</year>)
          <fpage>24</fpage>-<lpage>33</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>