Learning preferences and soundscapes for augmented hearing
Maciej Jan Korzepa, Technical University of Denmark, Lyngby, Denmark, mjko@dtu.dk
Benjamin Johansen, Technical University of Denmark, Lyngby, Denmark, benjoh@dtu.dk
Michael Kai Petersen, Eriksholm Research Center, Snekkersten, Denmark, mkpe@eriksholm.com
Jan Larsen, Technical University of Denmark, Lyngby, Denmark, janla@dtu.dk
Jakob Eg Larsen, Technical University of Denmark, Lyngby, Denmark, jaeg@dtu.dk
Niels Henrik Pontoppidan, Eriksholm Research Center, Snekkersten, Denmark, npon@eriksholm.com


©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes.
HUMANIZE '18, March 11, 2018, Tokyo, Japan

ABSTRACT
Despite the technological advancement of modern hearing aids (HA), many users abandon their devices due to a lack of personalization. This is caused by limited hearing health care resources, which result in users getting only a default 'one size fits all' setting. However, the emergence of smartphone-connected HA enables the devices to learn behavioral patterns inferred from user interactions and the corresponding soundscape. Such data could enable adaptation of settings to individual user needs dependent on the acoustic environment. In our pilot study, we look into how two test subjects adjust their HA settings, and identify the main behavioral patterns that help to explain their needs and preferences in different auditory conditions. Subsequently, we sketch out the possibilities and challenges of learning the contextual preferences of HA users. Finally, we consider how to encompass these aspects in the design of intelligent interfaces enabling smartphone-connected HA to continuously adapt their settings to context-dependent user needs.

ACM Classification Keywords
H.5.2 Information Interfaces and Presentation (e.g. HCI): User Interfaces—User-centered design; K.8.m Personal Computing: Miscellaneous

Author Keywords
personalization; augmented hearing; intelligent interfaces

INTRODUCTION
Even though hearing loss is one of the leading lifestyle causes of dementia [11], up to one quarter of users fitted with hearing aids (HA) have been reported not to use them [5]. One of the reasons behind the prevalence of non-use of fitted HA is identified by McCormack et al. [12] as users feeling that they do not get sufficient benefit from their HA. However, in light of the technological advancement of HA as well as the abundance of research indicating clear benefits of HA usage, we rather seek the source of the problem in the lack of personalization in the current clinical approach. The increasing number of hearing-impaired people [6] and the lack of hearing health care resources often result in users getting a 'one size fits all' setting and thus not exploiting the full potential of modern HA.

Furthermore, the current clinical approach to measuring hearing loss is based on the pure tone audiogram (PTA). PTA captures the audible hearing thresholds in frequency bands, usually from 250 Hz to 10 kHz. However, PTA does not fully explain a hearing loss. Killion et al. showed that the ability to understand speech in noise may vary by up to a 15 dB difference in signal-to-noise ratio (SNR) for users with a similar hearing loss [8]. Likewise, users differ in how they perceive loudness. Le Goff showed that speech at 50 dB can be interpreted either as moderately soft or slightly loud [9]. This means that some users may perceive soft sounds as noise which they would rather attenuate than amplify. These aspects are rarely taken into account in current clinical workflows.

Earlier research by Dillon et al. [3] indicated potential benefits of customization both within and outside the clinic, including fewer visits to clinics, a greater choice of acoustic features for fitting, and end users' feeling of ownership. Previous studies that focused on customizing the settings of devices based on perceptual user feedback [13] or on using interactive tabletops in the fitting session [2] indicate that users prefer such customization. Aldaz et al. [1] used reinforcement learning to personalize HA settings based on auditory and geospatial context by prompting users to perform momentary A/B listening tests. However, only with the recent introduction of smartphone-connected HA like the Oticon Opn [15] has it become possible to go beyond ecological momentary assessment by continuously tracking the users' interactions with the HA and thereby learn individual coping strategies from data [7]. Such inferred behavioral patterns may provide a foundation for correlating user preferences with the corresponding auditory environment and potentially enable continuous adaptation of HA settings to the context.
When interpreting user preferences, one needs to consider how the brain interprets speech. Auditory streams are bottom-up processes fused into auditory objects, based on spatial cues related to binaural intensity and time differences [4, 10, 14, 16]. However, separating competing voices is a top-down process, applying selective attention to amplify one talker and attenuate others. HA may mimic this top-down process by either 1) increasing the brightness to enhance spatial cues, facilitating focusing on specific sounds, or 2) improving the signal to noise ratio by attenuating ambient sounds to facilitate better separation of voices. Incorporating these aspects into our experimental design, we hypothesize that we could learn top-down preferences for brightness or noise reduction based on HA program and volume adjustments combined with bottom-up sampling of how the HA perceive the auditory environment in terms of sound pressure level, modulation and signal to noise ratio. This allows us to assess in which listening scenarios the user relies on enhanced spatial cues provided by omnidirectionality with more high frequency gain to separate sounds, and in which environments the user instead reduces background noise to selectively allocate attention to specific sounds.

In our pilot study, we give two subjects HA programmed with four contrasting programs in terms of brightness and noise reduction, and register how they interact with programs and volume over a period of 6-7 weeks. The purpose of this work is to:

• show how the subjects interact with HA settings in real environments without any intervention,
• discover basic contextual preferences for the subjects,
• identify possibilities and challenges of learning contextual preferences of HA users,
• suggest applications of intelligent user interfaces that would continuously support users in optimizing their HA, not only by learning and adjusting to individual preferences but also by exploiting crowd-sourced patterns.

METHOD
Participants
Two male participants (from a screened population provided by Eriksholm Research Centre) volunteered for the study (Table 1). The participants suffer from a symmetrical hearing loss, ranging from moderate to moderate-severe as described by the WHO [17]. All test subjects signed an informed consent before the beginning of the experiment.

Subject   Age group   Hearing loss      Experience with Opn   Occupation
1         65          Moderate          Yes                   Retired
2         76          Moderate-severe   No                    Working

Table 1: Demographic information related to the subjects.

Apparatus
The subjects were fitted with a pair of research prototype EVOTION HA extending the Oticon Opn. The subjects used Android 6.0 or iOS 10, connected via Bluetooth. Data was logged using the nRF Connect app and shared via Google Drive.

Subject   Program   Mode                     Brightness   Soft Gain
1         P1        omnidirectional          +1           0
          P2        omnidirectional          0            0
          P3        low noise reduction      +2           +2
          P4        high noise reduction     -2           -2
2         P1        omnidirectional          +2           +2
          P2        low noise reduction      +1           +1
          P3        medium noise reduction   0            0
          P4        high noise reduction     -2           -2

Table 2: Program settings for subjects 1 and 2, with modified brightness {−2 ... +2} and soft gain for low sounds {−2 ... +2}, where 0 corresponds to the default level.

Procedure
Based on the individual hearing loss, the subjects were fitted with 4 programs as shown in Table 2. For all programs, the HA volume could be adjusted to one of the levels from −8 to +4, where 0 is the default volume. The subjects were instructed to explore different settings using the HA buttons over a period of 6-7 weeks. In the experimental setup, the HA always start up in the default program and volume. The default program for subject 1 was P2 in the first five weeks, which was then switched to P1 for the last two weeks at the subject's request. Subject 2 used P2 as the default program.

Soundscape data
To create an interpretable representation of the auditory features defining the context, we applied k-means clustering to the acoustic context data collected from the HA. The values comprise auditory features defining how the HA perceive the acoustic environment:

• sound pressure level: a measure of estimated loudness,
• noise floor: tracking the lower bound of the signal,
• modulation envelope: tracking the peaks in the signal,
• modulation index: estimated as the difference between modulation envelope and noise floor,
• signal to noise ratio: estimated as the difference between sound pressure level and noise floor.

The above parameters are captured as a snapshot across multiple frequency bands once per minute. Additionally, the HA perform a rough classification of the auditory environment and represent it as a categorical variable with one of the following values: 'quiet', 'noise', 'speech in quiet', and 'speech in noise'. These labels are used as ground truth for evaluating the performance of the clustering by means of the normalized mutual information (NMI) score. The optimal number of clusters K was estimated to be 4 with NMI = 0.35.

Figure 1: Applying the k-means algorithm to the soundscape data captured from the HA results in four clusters which estimate the acoustic context as C1 'quiet', C2 'speech in noise', C3 'clear speech' or C4 'normal speech'.
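As a rough illustration of this step, the sketch below clusters per-minute soundscape snapshots and scores the result against the HA's own environment labels using NMI. This is a minimal sketch only; the file name and column names (spl, noise_floor, mod_envelope, mod_index, snr, ha_class) are assumptions for illustration and not the study's actual pipeline.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score
from sklearn.preprocessing import StandardScaler

# Hypothetical log of one-minute soundscape snapshots; column names are illustrative.
# Features: sound pressure level, noise floor, modulation envelope, modulation index, SNR.
snapshots = pd.read_csv("soundscape_log.csv")
features = ["spl", "noise_floor", "mod_envelope", "mod_index", "snr"]

# Standardize the features so that no single parameter dominates the distance metric.
X = StandardScaler().fit_transform(snapshots[features])

# Cluster the snapshots into K acoustic contexts (K = 4 in the pilot study).
kmeans = KMeans(n_clusters=4, random_state=0, n_init=10).fit(X)
snapshots["cluster"] = kmeans.labels_

# Evaluate the clusters against the HA's coarse environment classification
# ('quiet', 'noise', 'speech in quiet', 'speech in noise') used as ground truth.
nmi = normalized_mutual_info_score(snapshots["ha_class"], snapshots["cluster"])
print(f"NMI = {nmi:.2f}")
```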
Figure 2: Time series data combining the contextual soundscape data captured from the HA (green gradient) with the corresponding interactions related to the user selected programs (yellow-red gradient) for subject 1 (top) and 2 (bottom).


The resulting four soundscape clusters were labeled according to the proportion of samples with different ground-truth labels within each cluster (Figure 1), while ambiguities were resolved by examination of the cluster centroids. The first cluster mainly captured the 'quiet' class, which is also validated by the cluster centroid having very low values of sound pressure level and noise floor. Thus, the environments assigned to this cluster will be represented as 'quiet'. The second cluster captured both the 'speech in noise' and 'noise' classes, which suggests that the numerical representations of these environments are similar. For simplicity, we label them as 'speech in noise'. The third and fourth clusters both captured mainly 'speech in quiet' with a small addition of other classes. As the third cluster captured samples with much higher sound pressure level and signal to noise ratio, it will be labeled as 'clear speech', while the fourth cluster, with attributes of the samples closer to the mean, will be represented as 'normal speech'.
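Continuing the hypothetical variables from the clustering sketch above, this labeling rule (majority ground-truth class per cluster, with centroids used to resolve ambiguities) could look roughly as follows:

```python
import pandas as pd

# Continuing the hypothetical `snapshots`, `kmeans` and `features` from the clustering sketch:
# assign each cluster the most frequent HA ground-truth class among its members.
label_share = pd.crosstab(snapshots["cluster"], snapshots["ha_class"], normalize="index")
cluster_names = label_share.idxmax(axis=1).to_dict()   # e.g. {0: 'quiet', 1: 'speech in noise', ...}

# Inspect the centroids (in standardized feature units) to resolve ambiguous clusters,
# e.g. separating 'clear speech' (high SPL and SNR) from 'normal speech'.
centroids = pd.DataFrame(kmeans.cluster_centers_, columns=features)
print(label_share.round(2))
print(centroids.round(2))
```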
RESULTS
We refer to the user's selected volume and program choice as user preferences, and to the corresponding auditory environment as the context. Juxtaposing user preferences and the context allows us to learn which HA settings are selected in specific listening scenarios. To facilitate interpretation, we assign each cluster a color from a white-to-green gradient, in which increasing darkness corresponds to increased noise in the context (quiet → clear speech → normal speech → speech in noise). Likewise, we assign each program a color from a yellow-to-red gradient. Lighter colors define programs with an omnidirectional focus and added brightness. Darker colors indicate increasing attenuation of noise. This coloring scheme applies throughout the paper.

Contextual user preferences
Figure 2 shows the user preference and context changes for both subjects, plotted across the hours of the day over the weeks constituting the full experimental period. Subject 1 most frequently selects programs which provide an omnidirectional focus with added brightness (the default program was changed from P2 to P1 after week 43). However, the default program is occasionally complemented with programs suppressing noise. This suggests that the user benefits from changing programs dependent on the context.
Subject 2 mainly selects two programs: P1, offering an omnidirectional focus with added soft gain and brightness, and P2 (default), providing slight attenuation of ambient sounds. Compared to subject 1, this user spends more time in the 'quiet' context. Comparing weekdays to weekends, the latter seem to contain a larger contribution of 'normal speech' and 'speech in noise' auditory environments.

Figure 3: Average HA usage time per hour (grey trace, right axis) and relative program usage over day (left axis) for subject 1 (top) and 2 (bottom).

Figure 3 illustrates the subjects' average usage of their HA and which programs are used most throughout the day. Days without any HA usage are excluded from the average. The HA usage for subject 1 steadily increases in the morning and early afternoon and peaks at around 4pm. P1 and P2 are the most used programs throughout the day. Interestingly, in the evening, P3 is used more frequently, reaching a similar usage level to P1 and P2 between 11pm and midnight. P4 is used very rarely yet consistently throughout the day. The HA usage of test subject 2 is shifted towards the morning, with peak activity around 2pm. The default P2 is the most commonly used program throughout the whole day. However, during the afternoon, P1 seems to be chosen more often.

Figure 4: Relative time spent in different contexts over day for subject 1 (top) and 2 (bottom).

Figure 4 shows in which contexts the subjects use their HA at different times of the day. The HA usage for subject 1 is dominated by speech-related contexts most of the day. Only after 5pm does the context have more 'quiet' and 'clear speech' and less 'speech in noise' contribution. From 9pm, the 'quiet' context rapidly overtakes contexts containing speech. Subject 2 appears to be exposed to different contextual patterns. Normal and noisy speech contexts seem to be dominated by 'quiet' soundscapes in the morning. Subsequently, their contributions increase and peak around 7pm. Afterwards, the 'quiet' context gradually increases. Both subjects seem exposed to more 'speech in noise' around midday, which is likely due to lunchtime activities.

Behavioral patterns
We quantify the relationship between program/volume interactions and context by assuming that the settings are preferred in the corresponding context only at the time when they are being selected. Under this assumption, we count how often programs are selected in different contexts. Table 3 shows the counts of program changes for both subjects. The total number of changes was 52 and 46 for subject 1 and 2, respectively. Considering the small number of changes, we outline only the most apparent behavioral patterns.

Context            Subject 1           Subject 2
                   P1  P2  P3  P4      P1  P2  P3  P4
QUIET               1   1   3   0       3   3   0   0
CLEAR SPEECH        3   2   1   0       1   0   0   1
NORMAL SPEECH      10   3   3   3       5   6   0   2
SPEECH IN NOISE     6   5   4   7      17   5   1   2

Table 3: Counts of changes to a given program in different contexts for both subjects.

Subject 1 switches to P4 mainly in the 'speech in noise' context (twice as often as in 'normal speech'). The fact that 'speech in noise' is a less common environment than 'normal speech' strengthens this behavioral pattern. It suggests that subject 1 copes by suppressing noise in challenging listening scenarios. Examples of this behavioral pattern are illustrated in Figure 5. Likewise, a clear behavioral pattern can be seen for subject 2. P1 is the preferred program in 'speech in noise' environments. Considering that P1 offers maximum brightness and omnidirectionality with reduced attenuation and noise reduction, this behavioral pattern suggests that the user compensates by enhancing high frequency gain as a coping strategy in complex auditory environments (examples in Figure 6).

Table 4 shows the number of volume changes for subject 2 (subject 1 rarely changes volume). All increases beyond the default volume level (0) were made in the 'speech in noise' context. On the other hand, changes back to the default volume were evenly distributed across all contexts. This suggests that increasing the volume is another coping strategy for subject 2 in more challenging listening scenarios.
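Counts like those in Tables 3 and 4 can be derived from an interaction log along the lines of the hypothetical sketch below; the file name and column names are illustrative assumptions rather than the actual logging format.

```python
import pandas as pd

# Hypothetical log of HA interaction events, one row per program or volume change,
# each tagged with the soundscape cluster active at the time of the change.
events = pd.read_csv("interaction_log.csv")   # columns: subject, context, program, volume (illustrative)

# Table 3-style counts: how often each program is selected in each context, per subject.
program_changes = (events.dropna(subset=["program"])
                         .groupby(["subject", "context", "program"])
                         .size()
                         .unstack("program", fill_value=0))

# Table 4-style counts: volume changes per context, here for subject 2 only.
volume_changes = (events.query("subject == 2").dropna(subset=["volume"])
                        .groupby(["context", "volume"])
                        .size()
                        .unstack("volume", fill_value=0))

print(program_changes)
print(volume_changes)
```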
Figure 5: Details of behavioral patterns for subject 1, indicating preferences for reduced gain and suppression of unwanted background noise (P4) in challenging 'speech in noise' environments (dark green).

Figure 6: Details of behavioral patterns for subject 2, indicating how omnidirectionality coupled with additional high frequency gain (P1) may enhance spatial cues to separate sounds in challenging 'speech in noise' listening scenarios (dark green).

                   Volume level
Context            0    +1   +2
QUIET              2     0    0
CLEAR SPEECH       2     0    0
NORMAL SPEECH      2     0    0
SPEECH IN NOISE    2    12    1

Table 4: Counts of changes to a given volume in different contexts for subject 2.

Figure 7: Details of behavioral patterns for subject 1, indicating preferences for additional soft gain and brightness (P3) in 'silent' (white) environments, in order to enhance the perceived intensity of the auditory scene.

Figure 7 shows a behavioral pattern that might be more difficult to interpret based on the auditory context alone. Occasionally, subject 1 selects P3 in a 'quiet' environment late in the evening. The test subject subsequently reported that these situations occur when going out for a walk and wanting to be immersed in subtle sounds such as rustling leaves or the surf of the ocean. The preference for P3 thus implies increasing both the intensity of soft sounds and the perceived brightness.

DISCUSSION
Inferring user needs from interaction data
Empowering users to switch between alternative settings on internet-connected HA, while simultaneously capturing their auditory context, allows us to infer how users cope in real-life listening scenarios. To the best of our knowledge, this has not been reported before.

Learning the mapping between preferences and context is a non-trivial task, as the chosen settings might not be the optimal ones in the context they appear in. For example, looking into the soundscape data, it is clear that the environment soundscape frequently changes without the user responding with an adjustment of the settings. Conversely, the auditory environment may remain stable whereas the user changes settings. We need to take into consideration not only the auditory environment but also the user's cognitive state due to fatigue or intents related to a specific task. Essentially, the user cannot be expected to exhibit clear preferences or consistent coping strategies at all times. We hypothesize that many reasons could explain why the user does not select an alternative program although the context changes:

• being too busy to search for the optimal settings,
• too high an effort required to change programs manually,
• accepting the current program as sufficient for the task at hand,
• cognitive fatigue caused by constantly adapting to different programs.

Similarly, we observe situations in which the user changes settings even though the auditory environment remains stable, which could be caused by:

• the user trying out the benefits of different settings,
• cognitive fatigue due to prolonged exposure to challenging soundscapes,
• the auditory environment not being classified correctly.

In our pilot study, the context classification was limited to the auditory features which are used for HA signal processing. However, smartphone connectivity offers almost unlimited possibilities for the acquisition of contextual data. Applying machine learning methods such as deep learning might facilitate higher level classification of auditory environments. Different types of listening scenarios might be classified as 'speech in noise' when limited to parameters such as signal to noise ratio or modulation index. In fact, these could encompass very different listening scenarios, such as an office or a party, where the user's intents would presumably not be the same. Here the acoustic scene classification could be supported by motion data, geotagging or activities inferred from the user's calendar to provide a more accurate understanding of needs and intents. Nevertheless, in some situations, as illustrated in Figure 6, the behavioral patterns seem very consistent; the user preferences appear to change simultaneously with the context, remain unchanged as long as the context remains stable, and change back when the context changes again. Identifying such behaviors could allow us to reliably detect user preferences with a limited amount of user interaction data. Furthermore, time as a parameter also highlights patterns, as illustrated in Figure 6, related to activities around lunch time or late in the evening (Figure 7), as well as the contrasting behavior on weekends versus specific weekdays.
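One simple, hypothetical way to surface such consistent patterns is sketched below: flag program changes that closely follow a change of soundscape cluster and check how consistently each context triggers the same choice. The data layout and the three-minute window are illustrative assumptions, not part of the study.

```python
import pandas as pd

# Hypothetical per-minute usage log with the active soundscape cluster and HA program.
log = pd.read_csv("usage_log.csv", parse_dates=["timestamp"]).sort_values("timestamp")

log["context_changed"] = log["context"].ne(log["context"].shift())
log["program_changed"] = log["program"].ne(log["program"].shift())

# Treat a program change as a "response" if it occurs within a few minutes of a context change.
window = 3  # minutes (one row per minute in this illustrative log)
recent_context_change = (log["context_changed"].astype(int)
                            .rolling(window, min_periods=1).max().astype(bool))
responses = log[log["program_changed"] & recent_context_change]

# How consistently does each context trigger the same program choice?
counts = responses.groupby(["context", "program"]).size()
consistency = counts / counts.groupby("context").transform("sum")
print(consistency.round(2))
```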
Even though our study was limited to only two users, we identified evident differences in the HA usage patterns. Subject 1 tends to use the HA mostly in environments involving speech, whereas subject 2 spends a substantial amount of time in quiet non-speech environments. This might translate into different expectations among HA users. Furthermore, our analysis suggests that users apply unique coping strategies in different listening scenarios, particularly for complex 'speech in noise' environments. Subject 1 relies on suppression of background noise to increase the signal to noise ratio in challenging scenarios. Subject 2 responds to speech in noise in a completely different way - he chooses maximum omnidirectionality with added brightness and increased volume to enhance spatial cues to separate sounds. These preferences are not limited to challenging environments but extend to the ambience and overall quality of sound, as subject 1 reported that he enhances brightness and amplification of quiet sounds to feel immersed in the subtle sounds of nature. We find this of particular importance as it indicates that users expect their HA not only to improve speech intelligibility, but in a broader sense to provide aspects of augmented hearing which might even go beyond what is experienced by normal hearing people.

Translating user needs into augmented hearing interfaces
We propose that learning and addressing user needs could be conceptualized as an adaptive augmented hearing interface that incorporates a simplified model reflecting the bottom-up and top-down processes in the auditory system. We believe that such an intelligent auditory interface should:

• continuously learn and adapt to user preferences,
• relieve users of manually adjusting the settings by taking over control whenever possible,
• recommend coping strategies inferred from the preferences of other users,
• actively assist users in finding the optimal settings based on crowdsourced data,
• engage the user to be an active part in their hearing care.

Such an interface would infer top-down preferences based on the bottom-up defined context and continuously adapt the HA settings accordingly. This would offer immense value to users by providing the optimal settings at the right time, dependent on the dynamically changing context. However, the system should not be limited to passively inferring intents, but should rather incorporate a feedback loop providing user input. We see tremendous potential in conversational audio interfaces, as HAs resemble miniature wearable smart speakers which would allow the user to directly interact with the device, e.g. by means of a chatbot or voice AI. First of all, such an interface might resolve ambiguities in order to interpret behavioral patterns. In a situation where the user manually changes the settings in a way that is not recognized by the learned model, the system could ask for a reason in order to update its beliefs. Ideally, questions would be formulated in a way allowing the system to directly learn and update the underlying parameters. This could be accomplished by validating specific hypotheses that refer to the momentary context as well as the characteristics captured in the HA user model, incorporating needs, behavior and intents, e.g. 'Did you choose this program because the environment got noisy / you are tired / you are in a train?'

Secondly, a voice interface could recommend new settings based on collaborative filtering methods. Users typically stick to their preferences and may be reluctant to explore available alternatives although these might provide additional value. Similarly, in the case of HA users, preferred settings might not necessarily be the optimal ones. Applying clustering analysis based on behavioral patterns, we could encourage users to explore the available settings space by proposing preferences inferred on the basis of 'users like me, in soundscapes like this'. For instance, the interface could say: 'Many users who share your preferences seem to benefit from these settings in a similar context - would you like to try them out?' This would encourage users to continuously exploit the potential of their HA to the fullest. Additionally, behavioral patterns shared among users, related to demographic (e.g. age, gender) and audiological (e.g. audiogram) data, could alleviate the cold start problem in this recommender system, thus enabling personalisation to kick in earlier even when little or even no HA usage data is available.
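As a toy illustration of the 'users like me, in soundscapes like this' idea, the sketch below recommends a program from the choices of behaviorally similar users in the same soundscape cluster. It is a hypothetical neighborhood-based recommender, not the system described here; the array shapes and names are assumptions.

```python
import numpy as np

# Hypothetical data: `program_counts[u, c, p]` holds how often user u selected program p in
# soundscape cluster c; `profiles` flattens these counts into a normalized behavioral profile.

def recommend_program(profiles, program_counts, user, cluster, k=5):
    """Suggest a program for `user` in soundscape `cluster` from the k most similar users."""
    # Cosine similarity between the target user's profile and all other users' profiles.
    target = profiles[user]
    norms = np.linalg.norm(profiles, axis=1) * np.linalg.norm(target) + 1e-12
    sims = profiles @ target / norms
    sims[user] = -np.inf                      # exclude the user themselves
    neighbours = np.argsort(sims)[-k:]        # indices of the k nearest neighbours
    # Pool the neighbours' program choices in this cluster and propose the most frequent one.
    pooled = program_counts[neighbours, cluster].sum(axis=0)
    return int(np.argmax(pooled))

# Toy example: 20 users, 4 soundscape clusters, 4 programs.
rng = np.random.default_rng(0)
program_counts = rng.integers(0, 10, size=(20, 4, 4))
profiles = (program_counts / program_counts.sum(axis=(1, 2), keepdims=True)).reshape(20, -1)
print(recommend_program(profiles, program_counts, user=0, cluster=2))
```

Demographic and audiological similarity could be folded into the same neighbourhood computation to address the cold start case mentioned above.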
Lastly, users should be able to communicate their intents, as the preferences inferred by the system might differ from the actual ones. In such scenarios, users could express their intents along certain rules easily interpreted by the system (e.g. 'I need more brightness.') or indicate the problem in the given situation (e.g. 'The wind noise bothers me.'). Naturally, translating the user's descriptive feedback into new settings is more challenging, but it could potentially offer huge value by relieving users of the need to understand how multiple underlying audiological parameters influence the perceived outcome.
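In its simplest form, such rule-based intent mapping could look like the hypothetical sketch below; the phrases and parameter names are illustrative, and free-form feedback would require more elaborate language understanding.

```python
# Hypothetical illustration of rule-based intent mapping: a few fixed phrases are
# translated into bounded adjustments of underlying fitting parameters.
INTENT_RULES = {
    "more brightness": {"brightness": +1},
    "wind noise": {"noise_reduction": +1},
    "too loud": {"volume": -1},
    "too soft": {"soft_gain": +1},
}

def apply_intent(utterance: str, settings: dict) -> dict:
    """Return adjusted settings if the utterance matches a known phrase, else unchanged."""
    text = utterance.lower()
    updated = dict(settings)
    for phrase, deltas in INTENT_RULES.items():
        if phrase in text:
            for param, delta in deltas.items():
                updated[param] = updated.get(param, 0) + delta
    return updated

print(apply_intent("The wind noise bothers me.", {"brightness": 0, "noise_reduction": 0}))
# {'brightness': 0, 'noise_reduction': 1}
```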
Combining learned preferences and soundscapes into intelligent augmented hearing interfaces would be a radical paradigm shift in hearing health care. Instead of a single default setting, users may navigate a multidimensional continuum of settings. The system could be optimized in real time by combining learned preferences with crowdsourced behavioral patterns. With growing numbers of people suffering from hearing loss, we need to make users an active part of hearing health care. Conversational augmented hearing interfaces may not only provide a scalable, sustainable solution but also actively engage users and thereby improve their quality of life.

ACKNOWLEDGEMENTS
This work is supported by the Technical University of Denmark and the Oticon Foundation. Oticon EVOTION HAs are partly funded by the European Union's Horizon 2020 research and innovation programme under Grant Agreement 727521 EVOTION. We would like to thank Eriksholm Research Centre and Oticon A/S for providing hardware, access to test subjects, clinical approval and clinical resources.
REFERENCES
1. Gabriel Aldaz, Sunil Puria, and Larry J. Leifer. 2016. Smartphone-Based System for Learning and Inferring Hearing Aid Settings. Journal of the American Academy of Audiology 27, 9 (2016), 732–749. DOI: http://dx.doi.org/10.3766/jaaa.15099
2. Yngve Dahl and Geir Kjetil Hanssen. 2016. Breaking the Sound Barrier: Designing for Patient Participation in Audiological Consultations. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16). ACM, New York, NY, USA, 3079–3090. DOI: http://dx.doi.org/10.1145/2858036.2858126
3. Harvey Dillon, Justin A. Zakis, Hugh McDermott, Gitte Keidser, Wouter Dreschler, and Elizabeth Convery. 2006. The trainable hearing aid: What will it do for clients and clinicians? 59, 4 (2006), 30.
4. Mounya Elhilali. 2017. Modeling the Cocktail Party Problem. Auditory System at the Cocktail Party 60 (2017), 111–135. DOI: http://dx.doi.org/10.1007/978-3-319-51662-2_5
5. David Hartley, Elena Rochtchina, Philip Newall, Maryanne Golding, and Paul Mitchell. 2010. Use of Hearing Aids and Assistive Listening Devices in an Older Australian Population. Journal of the American Academy of Audiology 21, 10 (2010), 642–653. DOI: http://dx.doi.org/10.3766/jaaa.21.10.4
6. Hearing Review. 2011. 35 million Americans suffering from hearing loss. (2011). https://www.hear-it.org/35-million-Americans-suffering-from-hearing-loss [Online; accessed 2017-01-29].
7. Benjamin Johansen, Yannis Paul Raymond Flet-Berliac, Maciej Jan Korzepa, Per Sandholm, Niels Henrik Pontoppidan, Michael Kai Petersen, and Jakob Eg Larsen. 2017. Hearables in Hearing Care: Discovering Usage Patterns Through IoT Devices. Springer International Publishing, Cham, 39–49. DOI: http://dx.doi.org/10.1007/978-3-319-58700-4_4
8. Mead C Killion. 2002. New thinking on hearing in noise: a generalized articulation index. (2002). DOI: http://dx.doi.org/10.1055/s-2002-24976
9. Nicolas Le Goff. 2015. Amplifying soft sounds - a personal matter. Technical Report (February 2015).
10. Ruth Y. Litovsky, Matthew J. Goupell, Sara M. Misurelli, and Alan Kan. 2017. Hearing with Cochlear Implants and Hearing Aids in Complex Auditory Scenes. Auditory System at the Cocktail Party 60 (2017), 261–291. DOI: http://dx.doi.org/10.1007/978-3-319-51662-2_10
11. Gill Livingston, Andrew Sommerlad, Vasiliki Orgeta, Sergi G Costafreda, Jonathan Huntley, David Ames, Clive Ballard, Sube Banerjee, Alistair Burns, Jiska Cohen-Mansfield, Claudia Cooper, Nick Fox, Laura N Gitlin, Robert Howard, Helen C Kales, Eric B Larson, Karen Ritchie, Kenneth Rockwood, Elizabeth L Sampson, Quincy Samus, Lon S Schneider, Geir Selbæk, Linda Teri, and Naaheed Mukadam. 2017. Dementia prevention, intervention, and care. The Lancet (2017). DOI: http://dx.doi.org/10.1016/S0140-6736(17)31363-6
12. Abby McCormack and Heather Fortnum. 2013. Why do people fitted with hearing aids not wear them? International Journal of Audiology 52, 5 (2013), 360–368. DOI: http://dx.doi.org/10.3109/14992027.2013.769066
13. J. B. B. Nielsen, J. Nielsen, and J. Larsen. 2015. Perception-Based Personalization of Hearing Aids Using Gaussian Processes and Active Learning. IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, 1 (Jan 2015), 162–173. DOI: http://dx.doi.org/10.1109/TASLP.2014.2377581
14. D. Oertel and E. D. Young. 2004. What's a cerebellar circuit doing in the auditory system? Trends in Neurosciences 27, 2 (2004), 104–110. DOI: http://dx.doi.org/10.1016/j.tins.2003.12.001
15. Oticon. 2017. Oticon Opn product guide. (2017). https://www.oticon.co.za/-/media/oticon/main/pdf/master/opn/pbr/177406uk_pbr_opn_product_guide_17_1.pdf [Online; accessed 2017-12-17].
16. David K. Ryugo. 2011. Introduction to Efferent Systems. Springer Handbook of Auditory Research 38 (2011), 1–15. DOI: http://dx.doi.org/10.1007/978-1-4419-7070-1_1
17. World Health Organization. 2011. Grades of hearing impairment. (2011). http://www.who.int/pbd/deafness/hearing [Online; accessed 2017-12-17].