=Paper= {{Paper |id=Vol-2009/fmt-proceedings-2017-paper19 |storemode=property |title=HoloKeys - An Augmented Reality Application for Learning the Piano |pdfUrl=https://ceur-ws.org/Vol-2009/fmt-proceedings-2017-paper19.pdf |volume=Vol-2009 |authors=Dominik Hackl,Christoph Anthes |dblpUrl=https://dblp.org/rec/conf/fmt/HacklA17 }} ==HoloKeys - An Augmented Reality Application for Learning the Piano== https://ceur-ws.org/Vol-2009/fmt-proceedings-2017-paper19.pdf
HoloKeys - An Augmented Reality Application for Learning the Piano

Dominik Hackl
University of Applied Sciences Upper Austria
4232 Hagenberg/Austria
Email: dominikhackl@gmx.at

Christoph Anthes
University of Applied Sciences Upper Austria
4232 Hagenberg/Austria
Email: christoph.anthes@fh-hagenberg.at




Abstract: This paper describes the design and the implementation approach of a piano training application. HoloKeys is an Augmented Reality tool which is capable of superimposing the keys to be played on a real piano. Musical pieces are loaded as MIDI files, interpreted, and can be displayed in two different ways. This prototype provides many possibilities for extension which can make it a powerful teaching tool.

I. INTRODUCTION

Augmented Reality (AR), described by Azuma as a technology where the user sees ‘the real world, with virtual objects superimposed upon or composited with the real world’ [1], has become a hot topic in recent years. The application areas are widespread and range far beyond simple advertisements and virtual manuals, from advanced training to sophisticated remote collaboration scenarios. Using AR to train musical instruments has a long tradition in the field, but because of the rapid development of AR Head-Mounted-Displays (HMDs) this application area has gained new attention.

We present HoloKeys, a prototypical implementation of an AR training tool for learning the piano. HoloKeys runs on an HMD which the user wears while sitting in front of a physical piano. The application indicates notes that are supposed to be played by displaying virtual keys superimposed on the physical keyboard, using two different approaches. Because the musical data is acquired dynamically by loading and processing MIDI (Musical Instrument Digital Interface) files, the application is fully agnostic with regard to the musical pieces to be trained. To achieve the required precision for the augmentations on the piano, the application was implemented using fiducial marker tracking. Since this application is a prototype, an extensive collection of possible enhancements and prospects for the future is given.

A. Outline

The remainder of this paper is structured as follows: The next chapter provides an overview of the related work in music teaching applications. Chapter III introduces the conceptual design of the application, describing the architecture and the user interface. Implementation details are provided in Chapter IV. Finally, conclusions are drawn and an outlook on future work is given.

II. RELATED WORK

Music education has a long tradition in the field of AR. In an early approach Cheng and Robinson provided a visual sheet music overlay displayed as a plane in the visual field of the user. The display of the augmentations is triggered when the user looks at their hands, and the type of sheet depends on which hand the user looks at. The augmentation is not registered (meaning it is not directly spatially interconnected) to a real object, as opposed to the approach presented in this publication. An HMD is used for display [2]. Cakmakci et al. overlaid information on which string to pluck on a guitar, with the intention of reducing cognitive discontinuities compared to the traditional way of learning an instrument. They were the first to present information on the required interaction directly on an instrument [3]. The registration of the guitar and the virtual hand is implemented with the help of fiducial markers.

In order to avoid the use of fiducial markers on the piano, Huang et al. use their knowledge of the application domain and track the keys of the piano for pose estimation with the help of natural feature recognition [4]. Unfortunately they provide no details on the display used, but the frame rate of 15 frames per second implies that the system has not been developed for a head-tracked system.

Chow et al. focus on the educational level of AR piano teaching, showing that with the help of augmentations and gamification components the motivation and interest in learning the piano could be increased. They provided a system illustrating the notes to be played by lines approaching the keys. Their findings also indicate that notation literacy does not increase using their system of illustration [5]. We use a similar approach for the augmentations of the notes to be played but rely on an optical see-through HMD instead of a video-based HMD.

In contrast to this visualisation approach, Torres-Fernandez et al. introduce a virtual character which illustrates how well the piano player has performed. To interpret the played music they
compare the input from a MIDI keyboard with an initially loaded MIDI file [6]. A similar analysis was suggested and implemented earlier by Barakonyi and Schmalstieg [7]. They make use of fiducials for tracking and a desktop AR system equipped with a webcam and a traditional screen.

In terms of visualisation, Weing et al. demonstrate a system in the area of Spatial Augmented Reality where they project the keys to be pressed directly onto the piano. Different modes show, for example, the current and the next keys to be pressed. If a wrong key is pressed, it is highlighted in red to provide feedback to the user [8].

Zhang et al. use a completely virtual keyboard and track the hand of the user with fiducial markers and the finger positions with a self-developed data glove. Their approach targets the rehabilitation of the motor function of stroke survivors rather than teaching the piano [9].

Compared to these existing approaches, our system is unique in terms of the display technology used.

III. CONCEPTUAL DESIGN

The following chapter gives an overview of the application’s hardware and software components and explains how the individual parts interact with each other.

A. Architecture Overview

The application’s setup is illustrated in Fig. 1 and consists of the following two hardware components.

1) The Piano: The core component is a physical piano which is used for the actual playing. Underneath the piano keyboard, which usually consists of 88 keys, a fiducial marker is placed which is used by the application for tracking. The keys of a regular piano are standardized in size, which makes the application fully independent of the type of piano. In case a keyboard is used, the key width can be adjusted.

2) The Head-Mounted-Display: The user sits in front of the piano and wears an HMD on which the application runs. Through the HMD the user sees augmentations in the form of highlighted keys on top of the real keyboard. The HMD also handles tracking by recognizing the image marker with the help of computer vision algorithms. The HMD therefore keeps track of the player’s position and displays the augmentations accordingly. Additionally, the HMD is responsible for sound output of the music to be played. This gives the user an impression of how the piece is supposed to sound and makes it easier to play along with it.

Fig. 1. Illustration of the conceptual design. The user, sitting in front of the piano and wearing an HMD, looks down at the keyboard. When there are notes to be played the respective key is highlighted. Underneath the keyboard there is an image marker which is used for tracking.

B. Interface

In order to manage different settings and control the playback, a simple user interface was implemented. The originally two-dimensional UI is placed inside the 3D scene using world-stabilized coordinates. Considering the usually static setup of the application with the user sitting in front of the piano, the world-stabilized menu is a reasonable approach [10]. User input works through gaze-based interaction combined with gestures.

1) The Main Menu: The initial scene of the application is the main menu. There the user can select the musical piece to play as well as the desired playback speed. When the user presses the start button, the application switches to playback mode and begins visualizing and playing the musical piece.

2) Playback Mode: In playback mode the user sees the augmentations of the keys to be played superimposed on the physical keyboard. Additionally, a timeline shows the current playback position and gives the user the option to jump to different positions inside the piece. With the pause button the user is able to interrupt the playback or return to the main menu.

3) Calibration Mode: In calibration mode the application displays an augmentation of only one key, the middle C. The user can adjust the position of the marker until the virtual key perfectly fits the real one. This is useful to set up the optimal position of the marker on the piano. Additionally, the user can adjust the pitch of the virtual piano sound in calibration mode because it does not necessarily match that of the real piano. Playback volume can be adjusted in the HMD.

C. Display of Augmentations

Generally the HMD displays an augmentation of a bright green key to indicate that the actual key at that position has to be pressed. Two different approaches, as seen in Fig. 2, were tested, and both have their advantages and disadvantages concerning predictability and Field Of View (FOV) limitations.

1) The Instant Approach: The moment a key is supposed to be pressed it becomes highlighted. Once it is supposed to be released it switches back to normal. This way the user can more or less observe the playing of the piece in real time, comparable to watching the fingers of an actual pianist. While this approach can be useful for advanced players, it is hardly possible to learn a new piece or even to play along with it, because the player has no way of predicting the next notes. Still, the effect is visually appealing and could be used for showcase purposes (self-playing piano), as the limited FOV is also less of a problem there.

2) The Beatmania Approach: Note objects are created far in the distance and from there start moving towards the particular keys. As soon as the virtual object reaches the real key, the note should be played. With this approach, which became
popular with the game ‘Beatmania’ [11] and is still used in many music rhythm games today, the user can anticipate the upcoming notes and prepare accordingly. When learning a piano piece the musician’s brain utilizes its ‘muscle memory’ and fine motor skills rather than memorizing each individual note [12]. Therefore learning a piece with the Beatmania approach should be as efficient as learning it from sheet music, especially for beginners.

Fig. 2. Comparing the two tested approaches. Left: The Instant Approach. Right: The Beatmania Approach.

IV. IMPLEMENTATION

This chapter goes into detail regarding the concrete implementation of HoloKeys. It starts with a brief overview of the hardware and software tools used, followed by an in-depth description of the two main development tasks, visualization and MIDI processing.

A. Used Technologies

The application was developed for tablet devices as well as the HoloLens. The tablet approach is mainly used for demonstration purposes, rather than actual training.

1) Hardware:

  • HoloLens¹
    The HoloLens as a current AR HMD provides good sensory support as well as spatial audio and stereoscopic display capabilities. Its main disadvantage, the limited FOV, poses an issue to the applicability of this use case.

2) Software: To allow cross-platform and cross-device development the following set of tools and libraries was used.

  • Unity²
    Unity is traditionally a game engine which has found wide adoption in the whole domain of Mixed Reality [13]. It allows scene setup and provides scripting capabilities. The applications developed with Unity can easily be deployed on a multitude of target platforms including iOS and Android devices as well as UWP (Universal Windows Platform) devices.

  • Vuforia³
    The Augmented Reality part of the project is based on Vuforia, an AR tracking library which integrates well with Unity. Vuforia supports several different tracking methods ranging from recognizing plain images to complex objects. With a specific setup, Vuforia can also be used on the HoloLens.

  • C# Synth Project and MIDI Support⁴
    The C# Synth Project is an open-source library which is used for processing MIDI data and synthesizing it to audio data. MIDI is an industry standard for interconnection between musical instruments and digital devices. Its file format represents musical information like note values, volume and tempo. Although MIDI is a complex format, it is still the most popular and commonly used format to store musical data. For piano pieces the format is usually sufficient because only one channel is required to store a series of notes and tempo changes.

B. Visualization and Tracking

The application’s visuals consist of a Unity 3D scene which renders the virtual keys, combined with Vuforia’s tracking abilities to provide the information on where to render the keys.

1) Vuforia’s image target: For this application tracking via fiducial marker and image target was used. The image target in Unity is a planar object in 3D space which is associated with a set of 2D images. These images represent the markers that are placed somewhere in the real world. Once the camera recognizes a marker the application can trace back the position of the HMD and can therefore project all augmented objects accordingly.

2) Tracking setup: Marker images and other tracking settings can be configured in Vuforia’s web interface. This configuration with all related assets is then compiled into a Unity package that can subsequently be imported into Unity. In Unity two components of Vuforia, ARCamera and ImageTarget, are used. Subordinate objects of the ImageTarget are affected by the marker-related projection.

3) Generating the keyboard: In order to display the currently played keys, first an entire virtual keyboard is displayed half-transparently, superimposed on the real one. A script takes care of automatically generating all 88 key objects. One base key object is placed in the scene and aligned at around 90 degrees relative to the ImageTarget. This registration has to match the real-world relation between marker and piano keyboard. All other keys are then generated as duplicates of the base object with respective offset and color (black or white).

C. Audio and MIDI Playback

The two core components of the C# Synth Project library are the MidiSequencer, which handles loading and processing MIDI data, and the MidiStreamSynthesizer, which handles the actual audio playback.

¹ https://www.microsoft.com/en-us/hololens
² https://unity3d.com/
³ https://www.vuforia.com/
⁴ https://csharpsynthproject.codeplex.com/
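
The keyboard generation described in Section IV-B (one base key duplicated with a per-key offset and color) can be illustrated with a small sketch. The actual implementation is a Unity script that is not reproduced in this paper, so the Python code below is only an illustration under stated assumptions: the names Key, generate_keyboard and key_for_midi_note are hypothetical, the white-key width of 23.5 mm is an assumed typical value, black keys are simply centered on the boundary between their neighbouring white keys, and the keys are assumed to follow the common MIDI numbering of an 88-key piano (21 for the lowest A up to 108 for the highest C).

```python
from dataclasses import dataclass

FIRST_MIDI_NOTE = 21        # lowest A of an 88-key piano (common MIDI numbering)
KEY_COUNT = 88
WHITE_KEY_WIDTH_MM = 23.5   # assumed typical width; would be adjustable for keyboards
BLACK_PITCH_CLASSES = {1, 3, 6, 8, 10}  # C#, D#, F#, G#, A# within an octave


@dataclass
class Key:
    midi_note: int
    is_black: bool
    x_offset_mm: float      # center of the key, measured from the left edge of the lowest key
    highlighted: bool = False


def generate_keyboard(white_width=WHITE_KEY_WIDTH_MM):
    """Create all 88 keys as offset copies of one base key."""
    keys = []
    white_index = 0  # number of white keys placed so far
    for note in range(FIRST_MIDI_NOTE, FIRST_MIDI_NOTE + KEY_COUNT):
        is_black = note % 12 in BLACK_PITCH_CLASSES
        if is_black:
            # Simplification: center the black key on the boundary
            # between the two neighbouring white keys.
            x = white_index * white_width
        else:
            x = (white_index + 0.5) * white_width
            white_index += 1
        keys.append(Key(note, is_black, x))
    return keys


def key_for_midi_note(keys, midi_note):
    """Map a MIDI note number to its key object, or None if out of range."""
    index = midi_note - FIRST_MIDI_NOTE
    if 0 <= index < len(keys):
        return keys[index]
    return None


def set_highlight(keys, midi_note, on):
    """Switch the highlight of one key on or off; ignores notes outside the keyboard."""
    key = key_for_midi_note(keys, midi_note)
    if key is not None:
        key.highlighted = on


keyboard = generate_keyboard()
set_highlight(keyboard, 60, True)  # highlight middle C
```

With this layout, middle C (MIDI note 60) is the 40th key from the left, which is the key displayed in calibration mode. Storing the keys in note order keeps the lookup from a MIDI note number to its key object a plain index operation.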




1) Handling key actions: During playback the MidiSequencer fires two events that are relevant for this application: MidiNoteOn and MidiNoteOff. These two events are fired when the playback of a note is triggered or terminated, respectively, and therefore indicate exactly the time when a key is pressed and released. The MIDI code of the affected note is passed as a parameter to the handlers of these two events. The only operation is to map this MIDI code to the corresponding key object and set its material color to either green (in NoteOn) or the default color (in NoteOff).

2) Combining the audio sources: The MidiStreamSynthesizer creates actual audio data based on the sequencer’s input. To make sure that this audio data is actually redirected to Unity’s audio source, the special method OnAudioFilterRead has to be implemented. This method supports direct writing into the audio buffer and therefore redirects the contents of the StreamSynthesizer to Unity’s audio source.

V. CONCLUSION

As a prototype the application serves well, but due to the limited FOV, which will most likely increase in the coming years with the following generations of AR hardware, its real-world usability is still doubtful. Furthermore, an evaluation of the different augmentation methods would be useful. Especially when trying out a few more possible approaches, a user test could find out which of the methods are most likely to work in a real-world scenario. A more in-depth study of musical augmentation methods would also be useful for teaching other instruments or even in completely different areas of music.

A. Future Work - The Virtual Piano Teacher

A long-term vision could be the creation of a full-featured virtual piano teacher using AR. Especially early-stage piano learning contains many tasks that could be implemented with AR technologies like the one explained in this paper combined with gamification elements.

1) Use Cases:

  • Learning notes and the piano keyboard
    Simple exercises or games to recognize the note names and match them with the proper keys could increase the early-stage learning rate. For beginners the note names could be augmented on top of every key until they become familiar with them.

  • Learning easy to intermediate musical pieces
    Especially for smaller pieces the AR learning approach could surpass traditional learning from sheet music. Beginners who are not used to reading music yet would still be able to learn pieces quickly on their own. Additionally, a lot more useful information like fingering, expression and dynamics could be displayed during playback.

  • Technical exercises
    The importance of regular technical exercises for piano students is huge, but they are generally underestimated and disliked. With the introduction of AR and gamification, a whole range of enjoyable and still pianistically valuable exercises could be realized. By adding some sort of level system, the student would be even more aware of his progress and more likely to remain motivated.

  • Dictionary of chords, scales etc.
    A very useful utility not only for beginners but also for advanced pianists would be a piano dictionary. The player could look up all possible chords and scales and would be able to see them highlighted right on top of his keyboard. Especially for jazz piano, where complex chords and scales are common, this technology would be of great service.

2) Further Improvements:

  • Using music sheets as markers
    The use of music sheets, perhaps in the form of a special music book, as fiducial markers could eliminate the need for additional markers placed on the piano. It could not only automatically detect the musical piece to be played but also indicate when to turn the sheets or even highlight musical attributes on the sheets.

  • Checking the learning performance
    Real-time feedback on the user’s playing could greatly contribute to the learning experience. This could be achieved on the one hand by using MIDI keyboards to directly receive the MIDI input of pressed keys or on the other hand by recording and deconstructing the audio data. The first approach would be technologically straightforward but would limit the application to electronic keyboard instruments, while the second approach would be more flexible but complicated to implement and perhaps inaccurate [14].

The possibilities of the virtual piano teacher are enormous, but all are based on the core concept of the technique explained in this paper. As soon as there are improvements in AR hardware, especially concerning FOV, virtual piano teachers can be implemented and actually start to become a helpful tool.

REFERENCES

[1] R. T. Azuma, “A survey of augmented reality,” Presence: Teleoperators and Virtual Environments, vol. 6, no. 4, pp. 355–385, August 1997.
[2] L.-T. Cheng and J. Robinson, “Personal contextual awareness through visual focus,” IEEE Intelligent Systems, vol. 16, no. 3, pp. 16–20, 2001.
[3] O. Cakmakci, F. Bérard, and J. Coutaz, “An augmented reality based learning assistant for electric bass guitar,” in 10th International Conference on Human-Computer Interaction, 2003.
[4] F. Huang, Y. Zhou, Y. Yu, Z. Wang, and S. Du, “Piano AR: A markerless augmented reality based piano teaching system,” in Third International Conference on Intelligent Human-Machine Systems and Cybernetics, 2011.
[5] J. Chow, H. Feng, R. Amor, and B. C. Wünsche, “Music education using augmented reality with a head mounted display,” in Fourteenth Australasian User Interface Conference (AUIC2013). Melbourne, Australia: ACM, Jan. 2013, pp. 73–79.
[6] C. A. T. Fernandez, P. Paliyawan, and C. C. Yin, “Piano learning application with feedback provided by an AR virtual character,” in 5th Global Conference on Consumer Electronics. Kyoto, Japan: IEEE, Oct. 2016.
[7] I. Barakonyi and D. Schmalstieg, “Augmented reality agents in the development pipeline of computer entertainment,” in 4th International Conference on Entertainment Computing (ICEC’05). Sanda, Japan: Springer, Sep. 2005, pp. 345–356.
[8] M. Weing, A. Röhlig, K. Rogers, J. Gugenheimer, F. Schaub, B. Könings, E. Rukzio, and M. Weber, “P.I.A.N.O.: Enhancing instrument learning via interactive projected augmentation,” in Conference on Pervasive and Ubiquitous Computing Adjunct Publication (UbiComp13). Zurich, Switzerland: ACM, Sep. 2013, pp. 75–78.
[9] D. Zhang, Y. Shen, S. Ong, and A. Nee, “An affordable augmented reality based rehabilitation system for hand motions,” in International Conference on Cyberworlds (CW ’10). Singapore, Singapore: IEEE, Oct. 2010.
[10] M. Billinghurst and H. Kato, “Collaborative mixed reality,” in International Symposium on Mixed Reality (ISMR ’99). Springer, 1999, pp. 261–284.
[11] S. Steinberg, Music Games Rock. P3: Power Play Publishing, 2011. [Online]. Available: http://www.musicgamesrock.com/
[12] R. Shusterman, “Muscle memory and the somaesthetic pathologies of everyday life,” Human Movement, vol. 12, no. 1, pp. 4–15, 2011.
[13] P. Milgram, H. Takemura, A. Utsumi, and F. Kishino, “Augmented reality: A class of displays on the reality-virtuality continuum,” Presence: Telemanipulator and Telepresence Technologies, vol. 2351, pp. 282–292, 1994.
[14] S. Dixon, “On the computer recognition of solo piano music,” in Proceedings of Australasian Computer Music Conference, 2000, pp. 31–37.