HoloKeys - An Augmented Reality Application for Learning the Piano

Dominik Hackl
University of Applied Sciences Upper Austria
4232 Hagenberg/Austria
Email: dominikhackl@gmx.at

Christoph Anthes
University of Applied Sciences Upper Austria
4232 Hagenberg/Austria
Email: christoph.anthes@fh-hagenberg.at

Abstract: This paper describes the design and implementation of a piano training application. HoloKeys is an Augmented Reality tool capable of superimposing the keys to be played on a real piano. Musical pieces are loaded as MIDI files, interpreted, and can be displayed in two different ways. The prototype offers many possibilities for extension which could make it a powerful teaching tool.

I. INTRODUCTION

Augmented Reality (AR), described by Azuma as a technology where the user sees 'the real world, with virtual objects superimposed upon or composited with the real world' [1], has become a hot topic in recent years. Its application areas are widespread and range far beyond simple advertisements and virtual manuals, from advanced training to sophisticated remote collaboration scenarios. Using AR to train musical instruments has a long tradition in the field, but because of the rapid development of AR Head-Mounted Displays (HMDs) this application area has gained new attention.

We present HoloKeys, a prototypical implementation of an AR training tool for learning the piano. HoloKeys runs on an HMD which the user wears while sitting in front of a physical piano. The application indicates the notes that are supposed to be played by displaying virtual keys superimposed on the physical keyboard, using two different approaches. Because the musical data is acquired dynamically by loading and processing MIDI (Musical Instrument Digital Interface) files, the application is fully agnostic with regard to the musical pieces to be trained. To achieve the required precision for the augmentations on the piano, the application was implemented using fiducial marker tracking. Since this application is a prototype, an extensive collection of possible enhancements and prospects for the future is given.

A. Outline

The remainder of this paper is structured as follows: The next chapter provides an overview of the related work in music teaching applications. Chapter III introduces the conceptual design of the application, describing the architecture and the user interface. Implementation details are provided in Chapter IV. Finally, conclusions are drawn and an outlook on future work is given.

II. RELATED WORK

Music education has a long tradition in the field of AR. In an early approach Cheng and Robinson provided a visual sheet music overlay displayed as a plane in the visual field of the user. The augmentations are displayed when the user looks at their hands, and the type of sheet depends on which hand is looked at. In contrast to the approach presented in this publication, the augmentation is not registered to a real object, meaning that it is not directly spatially connected to it. An HMD is used for display [2]. Cakmakci et al. augmented a guitar with information on which string to pluck, with the intention of reducing cognitive discontinuities compared to the traditional way of learning an instrument. They were the first to present information on the action to be performed directly on the instrument [3]. The registration of the guitar and the virtual hand is implemented with the help of fiducial markers.

In order to avoid the use of fiducial markers on the piano, Huang et al. use their knowledge of the application domain and track the keys of the piano for pose estimation with the help of natural feature recognition [4]. Unfortunately they provide no details on the display used, but the frame rate of 15 frames per second implies that the system was not developed for head tracking.

Chow et al. focus on the educational aspects of AR piano teaching, showing that motivation and interest in learning the piano could be increased with the help of augmentations and gamification components. They provided a system illustrating the notes to be played by lines approaching the keys. Their findings also indicate that notation literacy does not increase using their system of illustration [5]. We use a similar approach for the augmentations of the notes to be played but rely on an optical see-through HMD instead of a video-based HMD.

In contrast to this visualisation approach, Torres-Fernandez et al. introduce a virtual character which illustrates how well the piano player has performed. To interpret the played music they compare the input from a MIDI keyboard with an initially loaded MIDI file [6]. A similar analysis was suggested and implemented earlier by Barakonyi and Schmalstieg [7]. They make use of fiducials for tracking and a desktop AR system equipped with a webcam and a traditional screen.

In terms of visualisation Weing et al. demonstrate a system in the area of Spatial Augmented Reality where they project the keys to be pressed directly onto the piano. Different modes show, for example, the current and the next keys to be pressed. If a wrong key is pressed, it is highlighted in red to provide feedback to the user [8].

Zhang et al. use a completely virtual keyboard and track the hand of the user with fiducial markers and the finger positions with a self-developed data glove. Their approach targets the rehabilitation of the motor function of stroke survivors rather than teaching the piano [9].

Compared to the approaches presented above, our system is unique in terms of the display technology used.

III. CONCEPTUAL DESIGN

This chapter gives an overview of the application's hardware and software components and explains how the individual parts interact with each other.

A. Architecture Overview

The application's setup is illustrated in Fig. 1 and consists of the following two hardware components.

Fig. 1. Illustration of the conceptual design. The user, sitting in front of the piano and wearing an HMD, looks down at the keyboard. When there are notes to be played the respective key is highlighted. Underneath the keyboard there is an image marker which is used for tracking.

1) The Piano: The core component is a physical piano which is used for the actual playing. Underneath the piano keyboard, which usually consists of 88 keys, a fiducial marker is placed which is used by the application for tracking. The keys of a regular piano are standardized in size, which makes the application independent of the type of piano. If a keyboard is used, the key width can be adjusted.

2) The Head-Mounted-Display: The user sits in front of the piano and wears an HMD on which the application runs. Through the HMD the user sees augmentations in the form of highlighted keys on top of the real keyboard. The HMD also handles tracking by recognizing the image marker with the help of computer vision algorithms. The HMD therefore keeps track of the player's position and displays the augmentations accordingly. Additionally, the HMD is responsible for sound output of the music to be played. This gives the user an impression of how the piece is supposed to sound and makes it easier to play along with it.

B. Interface

In order to manage different settings and control the playback, a simple user interface was implemented. The originally two-dimensional UI is placed inside the 3D scene using world-stabilized coordinates. Considering the usually static setup of the application, with the user sitting in front of the piano, the world-stabilized menu is a reasonable approach [10]. User input works through gaze-based interaction combined with gestures.

1) The Main Menu: The initial scene of the application is the main menu. There the user can select the musical piece to play as well as the desired playback speed. When the start button is pressed, the application switches to playback mode and begins visualizing and playing the musical piece.

2) Playback Mode: In playback mode the user sees the augmentations of the keys to be played superimposed on the physical keyboard. Additionally, a timeline shows the current playback position and gives the user the option to jump to different positions inside the piece. With the pause button the user is able to interrupt the playback or return to the main menu.

3) Calibration Mode: In calibration mode the application displays an augmentation of only one key, the middle C. The user can adjust the position of the marker until the virtual key perfectly fits the real one. This is useful for setting up the optimal position of the marker on the piano. Additionally, the user can adjust the pitch of the virtual piano sound in calibration mode, because it does not necessarily match that of the real piano. Playback volume can be adjusted in the HMD.

C. Display of Augmentations

Generally the HMD displays an augmentation of a bright green key to indicate that the actual key at that position has to be pressed. Two different approaches, shown in Fig. 2, were tested. Both have their advantages and disadvantages concerning predictability and Field Of View (FOV) limitations.

Fig. 2. Comparing the two tested approaches. Left: The Instant Approach. Right: The Beatmania Approach.

1) The Instant Approach: The moment a key is supposed to be pressed it becomes highlighted. Once it is supposed to be released it switches back to normal. This way the user can more or less observe the playing of the piece in real time, comparable to watching the fingers of an actual pianist. While this approach can be useful for advanced players, it is hardly possible to learn a new piece or even to play along with it, because the player has no way of predicting the next notes. Still, the effect is visually appealing and could be used for showcase purposes (a self-playing piano), where the limited FOV is also less of a problem.

2) The Beatmania Approach: Note objects are created far in the distance and from there start moving towards the particular keys. As soon as the virtual object reaches the real key, the note should be played. With this approach, which became popular with the game 'Beatmania' [11] and is still used in many music rhythm games today, the user can anticipate the upcoming notes and prepare accordingly. When learning a piano piece the musician's brain utilizes its 'muscle memory' and fine motor skills rather than memorizing each individual note [12]. Therefore learning a piece with the Beatmania approach should be as efficient as learning it from sheet music, especially for beginners.

IV. IMPLEMENTATION

This chapter goes into detail regarding the concrete implementation of HoloKeys. It starts with a brief overview of the hardware and software tools used, followed by an in-depth description of the two main development tasks, visualization and MIDI processing.

A. Used Technologies

The application was developed for tablet devices as well as the HoloLens. The tablet version is mainly used for demonstration purposes rather than actual training.

1) Hardware:

• HoloLens¹
The HoloLens as a current AR HMD provides good sensory support as well as spatial audio and stereoscopic display capabilities. Its main disadvantage, the limited FOV, poses an issue for its applicability to this use case.

2) Software: To allow cross-platform and cross-device development the following set of tools and libraries was used.

• Unity²
Unity is traditionally a game engine which has found wide adoption in the whole domain of Mixed Reality [13]. It allows scene setup and provides scripting capabilities. Applications developed with Unity can easily be deployed on a multitude of target platforms including iOS and Android devices as well as UWP (Universal Windows Platform) devices.

• Vuforia³
The Augmented Reality part of the project is based on Vuforia, an AR tracking library which integrates well with Unity. Vuforia supports several different tracking methods ranging from recognizing plain images to complex objects. With a specific setup, Vuforia can also be used on the HoloLens.

• C# Synth Project⁴ and MIDI Support
The C# Synth Project is an open-source library which is used for processing MIDI data and synthesizing it to audio data. MIDI is an industry standard for the interconnection between musical instruments and digital devices. Its file format represents musical information like note values, volume and tempo. Although MIDI is a complex format, it is still the most popular and commonly used format to store musical data. For piano pieces the format is usually sufficient because only one channel is required to store a series of notes and tempo changes.

¹ https://www.microsoft.com/en-us/hololens
² https://unity3d.com/
³ https://www.vuforia.com/
⁴ https://csharpsynthproject.codeplex.com/

B. Visualization and Tracking

The application's visuals consist of a Unity 3D scene which renders the virtual keys, combined with Vuforia's tracking abilities to provide the information on where to render the keys.

1) Vuforia's image target: For this application tracking via fiducial marker and image target was used. The image target in Unity is a planar object in 3D space which is associated with a set of 2D images. These images represent the markers that are placed somewhere in the real world. Once the camera recognizes a marker, the application can derive the position of the HMD and can therefore project all augmented objects accordingly.

2) Tracking setup: Marker images and other tracking settings can be configured in Vuforia's web interface. This configuration with all related assets is then compiled into a Unity package that can be imported into Unity. In Unity two components of Vuforia, ARCamera and ImageTarget, are used. Child objects of the ImageTarget are affected by the marker-related projection.

3) Generating the keyboard: In order to display the currently played keys, first an entire virtual keyboard is displayed half-transparently, superimposed on the real one. A script takes care of automatically generating all 88 key objects. One base key object is placed in the scene and aligned at around 90 degrees relative to the ImageTarget. This registration has to match the real-world relation between marker and piano keyboard. All other keys are then generated as duplicates of the base object with their respective offset and color (black or white).

C. Audio and MIDI Playback

The two core components of the C# Synth Project library are the MidiSequencer, which handles loading and processing MIDI data, and the MidiStreamSynthesizer, which handles the actual audio playback.

1) Handling key actions: During playback the MidiSequencer fires two events that are relevant for this application: MidiNoteOn and MidiNoteOff. These two events are fired when the playback of a note is triggered or terminated, respectively, and therefore indicate exactly the time when a key is pressed and released. In the implementations of these two event handlers the MIDI code of the affected note is passed as a parameter. The only operation is to map this MIDI code to the corresponding key object and set its material color to either green (in NoteOn) or the default color (in NoteOff).

2) Combining the audio sources: The MidiStreamSynthesizer creates actual audio data based on the sequencer's input. To make sure that this audio data is actually redirected to Unity's audio source, the special method OnAudioFilterRead has to be implemented. This method supports direct writing into the audio buffer and thereby redirects the contents of the StreamSynthesizer to Unity's audio source.

V. CONCLUSION

As a prototype the application serves well, but its real-world usability is doubtful because of the limited FOV, which will most likely increase over the next years with coming generations of AR hardware. Furthermore, an evaluation of the different augmentation methods would be useful. Especially when trying out a few more possible approaches, a user test could find out which of the methods are most likely to work in a real-world scenario. A more in-depth study of musical augmentation methods would also be useful for teaching other instruments or even in completely different areas of music.

A. Future Work - The Virtual Piano Teacher

A long-term vision could be the creation of a full-featured virtual piano teacher using AR. Especially early-stage piano learning contains many tasks that could be implemented with AR technologies like the one explained in this paper, combined with gamification elements.

1) Use Cases:

• Learning notes and the piano keyboard
Simple exercises or games to recognize the note names and match them with the proper keys could noticeably increase the early-stage learning rate. For beginners the note names could be augmented on top of every key until they become familiar with them.

• Learning easy to intermediate musical pieces
Especially for smaller pieces the AR learning approach could surpass traditional learning from sheet music. Beginners who are not yet used to reading music would still be able to learn pieces quickly on their own. Additionally, much more useful information like fingering, expression and dynamics could be displayed during playback.

• Technical exercises
Regular technical exercises are hugely important for piano students, but they are generally underestimated and disliked. With the introduction of AR and gamification, many enjoyable and still pianistically valuable exercises could be realized. By adding some sort of level system, students would be even more aware of their progress and more likely to remain motivated.

• Dictionary of chords, scales etc.
A very useful utility, not only for beginners but also for advanced pianists, would be a piano dictionary. The player could look up all possible chords and scales and would be able to see them highlighted right on top of the keyboard. Especially for jazz piano, where complex chords and scales are common, this technology would be of great service.

2) Further Improvements:

• Using music sheets as markers
The use of music sheets, perhaps in the form of a special music book, as fiducial markers could eliminate the need for additional markers placed on the piano. The application could then not only detect the musical piece to be played automatically but also indicate when to turn the pages or even highlight musical attributes on the sheets.

• Checking the learning performance
Real-time feedback on the user's playing could greatly contribute to the learning experience. This could be achieved either by using MIDI keyboards to directly receive the MIDI input of pressed keys or by recording and deconstructing the audio data. The first approach would be technologically straightforward but would limit the application to electronic keyboard instruments, while the second approach would be more flexible but complicated to implement and perhaps inaccurate [14].

The possibilities of the virtual piano teacher are enormous, but all are based on the core concept of the technique explained in this paper. As soon as there are improvements in AR hardware, especially concerning FOV, virtual piano teachers can be implemented and actually start to become a helpful tool.

REFERENCES

[1] R. T. Azuma, “A survey of augmented reality,” Presence: Teleoperators and Virtual Environments, vol. 6, no. 4, pp. 355–385, August 1997.
[2] L.-T. Cheng and J. Robinson, “Personal contextual awareness through visual focus,” IEEE Intelligent Systems, vol. 16, no. 3, pp. 16–20, 2001.
[3] O. Cakmakci, F. Bérard, and J. Coutaz, “An augmented reality based learning assistant for electric bass guitar,” in 10th International Conference on Human-Computer Interaction, 2003.
[4] F. Huang, Y. Zhou, Y. Yu, Z. Wang, and S. Du, “Piano AR: A markerless augmented reality based piano teaching system,” in Third International Conference on Intelligent Human-Machine Systems and Cybernetics, 2011.
[5] J. Chow, H. Feng, R. Amor, and B. C. Wünsche, “Music education using augmented reality with a head mounted display,” in Fourteenth Australasian User Interface Conference (AUIC2013). Melbourne, Australia: ACM, Jan. 2013, pp. 73–79.
[6] C. A. T. Fernandez, P. Paliyawan, and C. C. Yin, “Piano learning application with feedback provided by an AR virtual character,” in 5th Global Conference on Consumer Electronics. Kyoto, Japan: IEEE, Oct. 2016.
[7] I. Barakonyi and D. Schmalstieg, “Augmented reality agents in the development pipeline of computer entertainment,” in 4th International Conference on Entertainment Computing (ICEC'05). Sanda, Japan: Springer, Sep. 2005, pp. 345–356.
[8] M. Weing, A. Röhlig, K. Rogers, J. Gugenheimer, F. Schaub, B. Könings, E. Rukzio, and M.
Weber, “P.I.A.N.O.: Enhancing instrument learning via interactive projected augmentation,” in Conference on Pervasive and Ubiquitous Computing Adjunct Publication (UbiComp '13). Zurich, Switzerland: ACM, Sep. 2013, pp. 75–78.
[9] D. Zhang, Y. Shen, S. Ong, and A. Nee, “An affordable augmented reality based rehabilitation system for hand motions,” in International Conference on Cyberworlds (CW '10). Singapore, Singapore: IEEE, Oct. 2010.
[10] M. Billinghurst and H. Kato, “Collaborative mixed reality,” in International Symposium on Mixed Reality (ISMR '99). Springer, 1999, pp. 261–284.
[11] S. Steinberg, Music Games Rock. P3: Power Play Publishing, 2011. [Online]. Available: http://www.musicgamesrock.com/
[12] R. Shusterman, “Muscle memory and the somaesthetic pathologies of everyday life,” Human Movement, vol. 12, no. 1, pp. 4–15, 2011.
[13] P. Milgram, H. Takemura, A. Utsumi, and F. Kishino, “Augmented reality: A class of displays on the reality-virtuality continuum,” Presence: Telemanipulator and Telepresence Technologies, vol. 2351, pp. 282–292, 1994.
[14] S. Dixon, “On the computer recognition of solo piano music,” in Proceedings of Australasian Computer Music Conference, 2000, pp. 31–37.
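The keyboard generation described in Chapter IV (duplicating one base key with a per-key offset and color) could look roughly like the following. This is a minimal, language-neutral sketch in Python, not the Unity C# script used in HoloKeys. The names `Key` and `generate_keys`, the white key width of 23.5 mm, and the simplification that each black key is centred on the boundary between its two white neighbours are all assumptions made for illustration. Offsets are measured from the centre of middle C, the key shown in calibration mode.

```python
from dataclasses import dataclass

# Pitch classes (MIDI note modulo 12) of the black keys: C#, D#, F#, G#, A#.
BLACK_PITCH_CLASSES = {1, 3, 6, 8, 10}
LOWEST_MIDI, HIGHEST_MIDI = 21, 108  # A0 .. C8 on an 88-key piano
MIDDLE_C = 60


@dataclass(frozen=True)
class Key:
    midi: int        # MIDI note number of the key
    is_black: bool   # True for black keys, False for white keys
    x_offset: float  # lateral offset of the key centre from middle C


def generate_keys(white_width: float = 23.5) -> list[Key]:
    """Return all 88 keys with lateral offsets relative to middle C.

    white_width is an assumed typical white key width in millimetres.
    """
    raw = []
    white_count = 0  # white keys already placed to the left
    for midi in range(LOWEST_MIDI, HIGHEST_MIDI + 1):
        if midi % 12 in BLACK_PITCH_CLASSES:
            # Simplification: centre the black key on the boundary
            # between the neighbouring white keys.
            raw.append((midi, True, white_count * white_width))
        else:
            raw.append((midi, False, (white_count + 0.5) * white_width))
            white_count += 1
    origin = next(x for midi, _, x in raw if midi == MIDDLE_C)
    return [Key(midi, black, x - origin) for midi, black, x in raw]
```

In a scene graph, each returned offset would be applied to a duplicate of the base key object, so that moving or rotating the base object (and hence the marker registration) moves the whole keyboard.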
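The key handling described under "Handling key actions" (mapping the MIDI code passed to the NoteOn and NoteOff handlers to a key object and switching its color) can be illustrated with a small sketch. It is written in Python rather than C#, and the class `VirtualKeyboard` and its method names are hypothetical; they do not reproduce the C# Synth Project API. One assumption is added that the paper does not mention: a note-on message with velocity 0 is treated as a note-off, which is a common MIDI convention.

```python
DEFAULT_COLOR = "default"
HIGHLIGHT_COLOR = "green"


class VirtualKeyboard:
    """Colour state of the virtual keys, indexed from the lowest key."""

    def __init__(self, lowest_midi: int = 21, highest_midi: int = 108):
        self.lowest_midi = lowest_midi
        self.colors = [DEFAULT_COLOR] * (highest_midi - lowest_midi + 1)

    def key_index(self, midi_note: int):
        """Map a MIDI note number to a key index, or None if off-keyboard."""
        index = midi_note - self.lowest_midi
        return index if 0 <= index < len(self.colors) else None

    def note_on(self, midi_note: int, velocity: int = 64) -> None:
        """Highlight the key; velocity 0 is handled as a note-off."""
        if velocity == 0:
            self.note_off(midi_note)
            return
        index = self.key_index(midi_note)
        if index is not None:
            self.colors[index] = HIGHLIGHT_COLOR

    def note_off(self, midi_note: int) -> None:
        """Reset the key to its default colour."""
        index = self.key_index(midi_note)
        if index is not None:
            self.colors[index] = DEFAULT_COLOR
```

Notes outside the 88-key range are ignored here instead of raising an error, since a MIDI file is not guaranteed to stay within the range of the physical instrument.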
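For the Beatmania approach in Chapter III, the motion of a note object towards its key can be expressed as a simple function of playback time. The sketch below is only one possible formulation under stated assumptions: linear motion, a fixed lead time between the appearance of a note and its onset, and a fixed spawn distance. The function names and the default values are hypothetical and are not taken from the HoloKeys implementation.

```python
def distance_to_key(now: float, note_on_time: float,
                    lead_time: float = 2.0, spawn_distance: float = 1.0):
    """Remaining distance between a note object and its key.

    now and note_on_time are playback times in seconds. Returns None
    while the note has not been spawned yet, and 0.0 once the note has
    reached the key (the moment it should be played) or passed it.
    """
    remaining = note_on_time - now
    if remaining > lead_time:
        return None  # too early: the note object does not exist yet
    if remaining <= 0:
        return 0.0   # the note has arrived at the key
    return spawn_distance * remaining / lead_time


def note_length(duration: float, lead_time: float = 2.0,
                spawn_distance: float = 1.0) -> float:
    """Visual length of a note object so that it passes the key for
    exactly as long as the note is held."""
    return spawn_distance * duration / lead_time
```

With the defaults above, a note with onset at 3.0 s appears at 1.0 s at the full spawn distance and is halfway to the key at 2.0 s. A slower playback speed, as selectable in the main menu, could be modelled by scaling the playback time before it is passed in.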