=Paper=
{{Paper
|id=Vol-2473/paper31
|storemode=property
|title=Acoustic Output of the Railway Information Systems for Visually Impaired Passengers
|pdfUrl=https://ceur-ws.org/Vol-2473/paper31.pdf
|volume=Vol-2473
|authors=Milan Rusko,Marián Trnka,Sachia Darjaa,Ladislav Schichman
|dblpUrl=https://dblp.org/rec/conf/itat/RuskoTDS19
}}
==Acoustic Output of the Railway Information Systems for Visually Impaired Passengers==
Milan Rusko, Marián Trnka, Sakhia Darjaa

Department of Speech Analysis and Synthesis, Institute of Informatics of the Slovak Academy of Sciences (II SAS), Dúbravská cesta 9, 845 07 Bratislava, Slovakia

Ladislav Schichman

ELEN s.r.o., Ľubochnianska 16, 080 06 Ľubotice, Slovakia

Abstract. The Decree of the Ministry of Environment of the Slovak Republic No. 532/2002, Art. 2.5.2, lays down details of the general technical requirements for constructions and for structures used by persons with reduced mobility. It states that every basic information system must be complemented by two things. The first is an alternative solution that provides blind and visually impaired persons with the information, for example an informant, an acoustic or tactile system, or a telephone information service. The second is an optical system for the hearing impaired. The installation of several new information boards for the Slovak Railways was a good opportunity to introduce an automatic, remotely controlled audio output. This output gives visually impaired passengers the same information that is displayed on the boards. The paper presents the architecture of the information system and introduces several types of speech synthesizers that were candidates for speech generation. It explains which of them was used in the final solution and why. Potential issues of the system are pointed out and the future of railway information systems is discussed. The system was installed as part of six new information boards at Spišská Nová Ves railway station. It is being tested, and by the time this article is published it should already be in regular operation.

Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

1.1 Motivation

In recent years, efforts have intensified to make information more accessible to blind people and to make their orientation in the urban environment easier. The most natural solution seems to be audio signals and speech announcements. These give the visually impaired the same information that the rest of the population receives in visual form.

1.2 Legal Status in the Czech Republic and Slovakia

The Czech Republic (CZ) has advanced considerably further than Slovakia (SK) in making information available to blind and partially sighted people. This is thanks to the activities of the Union of the Blind, the understanding of the responsible authorities and the support of equipment manufacturers. The solution has even been incorporated into Czech legislation, in the Building Act. This has probably become an inspiration for implementing similar systems in SK.

In CZ, the rules and obligations for building information systems that serve blind and partially sighted citizens are set by the Methodology of the Ministry for Regional Development of the CZ. This methodology accompanies the implementing Decree of the Building Act No. 398/2009 Coll., "On general technical requirements ensuring barrier-free use of buildings."

Guidance systems for the visually impaired can generally be divided into two main groups:

* systems giving information on the construction and operation of facilities in buildings, such as lifts;
* information systems that tell the visually impaired about the operation of transport systems, such as train departures and arrivals.

The information boards must have an acoustic output. According to the Decree, the basic information for public orientation must be visual and, where possible, also acoustic and tactile.

These systems are generally activated remotely, and the information is given as audio signals and voice messages. In CZ, remote control is to be provided by an electronic coded command that must be received from a distance of at least 40 m. The radio command signal frequency is 86.790 MHz in CZ and 87.100 MHz in SK.

In SK, the needs of the blind in information systems are addressed only by the Decree of the Ministry of Environment of the Slovak Republic (Decree No. 532/2002 Coll., Art. 2.5.2). This decree lays down details of the general technical requirements for construction and for structures used by persons with reduced mobility. It states that the basic information device must be complemented by two things. One is an alternative way of providing the information to a blind person, for example an informant, an acoustic or tactile system, or a telephone information service. The other is an optical system for the hearing impaired.

2 Devices for information system for the blind and visually impaired

The elements and devices of the information system for the visually impaired are designed to complement the existing hardware and software of electronic public information systems. Acoustic beacons for blind people at traffic lights have long been familiar, but today sound beacons also fulfil many other functions. The following subsections describe the devices most commonly used in SK and CZ.

2.1 Acoustic beacon

An acoustic (or sound) beacon is an electronic acoustic device with two main functions for the visually impaired: orientation/navigation and information. The beacon is controlled by a signal from a command transmitter operated by the visually impaired person. The navigation device emits a periodic sound signal that lets a person with limited vision locate an object, for example the entrance to a building. The beacon can also give other important information, such as the current state of a traffic light or the time remaining to cross the road safely.

Beacons and information systems are already widely used in some European countries. For instance, the SNCF (French National Railway Company) chose the NAVIGUEO+HIFI navigation and information devices for its train and subway stations. Okeenea won the RATP (Parisian Public Transportation Company) tender for audio beacons and will supply the audio beacons for the entire Parisian metro network. More than 1,500 of these devices are to guide visually impaired people by the end of 2019 [1].

In his 2015 article, Harušťák reports: "In CZ, there are 12 cities with urban public transport equipped with sound beacons. In SK, things have started to move in the past two years. The first was the Transport Company of the City of Žilina, which in early 2014 equipped its first 39 vehicles (buses and trolleybuses) with acoustic beacons, and all newly purchased vehicles will have beacons." [2]

2.2 The TYFLOSET® System by APEX

In SK, TYFLOSET® System devices by APEX are used to control the functions of audio information systems for the blind (see Fig. 1).

The handheld transmitter has six buttons:

* Buttons 1 and 2 are used in SK for orientation on streets, in public buildings, on escalators and sidewalks, and for activating sound information terminals.
* Buttons 3 and 4 are reserved for public transport vehicles.
* Buttons 5 and 6 are reserved for other applications.

A 4-button command transmitter can be built directly into the white stick. When a button is pressed, a control signal is transmitted and the beacon answers by playing the pre-recorded sound or speech information for the requested function.

The beacon can give a short beep that shows the visually impaired person the direction towards the marked location. In buildings, the system can guide blind people to the entrance and help them find their way or the contact person. In public transport and on railways, the system can provide the line number, the direction of travel, and driver announcements on boarding or at stops.

Sound-equipped information boards should provide voice information about the current timetable. Stations and terminals can also be equipped with acoustic beacons to ensure the orientation and safe movement of visually impaired people [2].

The TYFLOSET® transmitter/receiver system produced by APEX Ltd. consists of a set of portable, mobile and static devices that serve both for acoustic and voice information and for easier orientation of blind people. It is designed as a unified system for all types of acoustic information and orientation in the Slovak Republic. This means that one command transmitter, operated by the blind user, activates all acoustic and voice information and orientation systems. The transmitter frequency is 86.790 MHz in CZ and 87.100 MHz in SK [3].

Fig. 1. The TYFLOSET® System: a) pushbutton transmitter VPN 02, b) command transmitter in folding stick VPN 403. (Published by courtesy of APEX s.r.o.)

2.3 The STARMON information systems with LED boards by ELEN s.r.o.

ELEN Ltd. was established in 1991 in Prešov, Slovakia, by developers who had worked mainly on microprocessor, automation and robotics applications. Since its establishment, the company has focused on developing and producing electronic information panels and displays.

Its most important applications range widely. They include railway station information systems that show passengers train departures and arrivals, information displays at tram and bus stops, and special displays for hospitals. They also include Metro (underground) information boards in Prague, exchange-rate boards for banks, and a large-scale information board showing the state of the environment in Budapest [4].

Passenger information systems are an essential part of any modern station. They significantly affect the comfort and safety of passengers in public passenger transport. Most ELEN information boards currently use powerful LEDs as display elements.

ELEN Ltd. has cooperated for many years with the STARMON company from the Czech Republic, which designs passenger information systems for trains, buses and other types of public transport [5].

2.4 Automatic railway information systems with voice output

The idea of using automatic speech processing in railway information systems is straightforward, since speech is the most natural way for humans to communicate. In 1995, for example, PHILIPS introduced a sophisticated train timetable information service over the telephone [6]. It provided accurate connections between 1200 German cities, using speech recognition, speech understanding, dialogue management and voice output based on pre-recorded utterances.

A similar voice-controlled system was developed in Slovakia by four academic institutions between 2002 and 2006 [7].

3 The architecture and hardware components of the designed system

The Slovak Railways decided that the newly installed LED information boards by ELEN Ltd. should have an on-demand voice-information feature for the blind. STARMON delivered the hardware solution, and the Institute of Informatics of the Slovak Academy of Sciences (II SAS) designed a synthetic voice for this new feature.

3.1 The architecture of the system

A block diagram of the information system is presented in Fig. 2. The text information for the boards is sent from the STARMON Information Server via the RS 485 bus. It reaches the RS 485 interfaces of both the LED board motherboard and the Raspberry-based mini PC.

When a blind person presses the button on the TYFLOSET® VPN 02 handheld transmitter or on the VPN 403 command transmitter in the folding stick, an 87.1 MHz control signal is emitted. Once the TYFLOSET® receiver picks up this signal, the control unit of the Raspberry PC builds a message from the text currently displayed on the board. It then launches the text preprocessing program and the speech synthesis program itself.

The text preprocessing expands abbreviations and numerals into full text and corrects diacritics and pronunciation. The Text-to-Speech synthesizer (TTS) transforms the text into a synthetic, human-like speech signal. This signal is then amplified and played through a loudspeaker.

Fig. 2. The schematic diagram of the railway information system with visual (LED board) and speech (loudspeaker) outputs

3.2 The mini computer and audio hardware

The heart of the sound-generating system is a Raspberry Pi 2 mini PC equipped with RS 485 bus communication circuits (see Fig. 3). Raspberry Pi is a small single-board computer whose primary operating system is Raspbian Linux. The Raspberry Pi 2 has a Broadcom BCM2836 quad-core CPU clocked at 900 MHz, 1 GB RAM, four USB 2.0 ports, an HDMI interface and a microSD slot.

Fig. 3. Typical hardware configuration of the control unit by STARMON

A miniature audio amplifier mounted on the PC board provides enough output power to drive the loudspeaker system. The AMC VIVA 4IP loudspeaker used is suitable for outdoor installation in wet conditions (IP55). The IP55 category means almost complete protection against the ingress of dust and a good level of protection against water.

The main loudspeaker data are:

* Design: dual-band system with a 4" low-tone speaker and a 1" high-tone speaker.
* Maximum power: 20 W.
* Sensitivity: 89 dB SPL at 1 W and 1 m.
* Maximum sound pressure level (SPL) at 1 m: 102 dB (at 3254 Hz).
* Frequency response: 90 Hz to 20 kHz.
* Radiation angle at 1000 Hz: 150° horizontal and 120° vertical, which gives the system good spatial coverage.

4 Speech synthesis on the railways

Acoustic signals, voice messages and announcements are the most widely used way of informing passengers at railway stations and on trains (e.g. [8]). The traditional solution used recorded prompts with "slot filling", in which various sections of speech utterances are combined to produce the required voice message. This approach is, however, very inflexible: it cannot handle new messages with an unforeseen utterance structure or out-of-vocabulary words. On the other hand, until some years ago the intelligibility and naturalness of synthesized speech could hardly compete with voice messages built by concatenating pre-recorded words and phrases.

4.1 The "true" TTS on the foreign railways

The alternative, "true" speech synthesis, is now used more and more. For simplicity, we define "true" speech synthesis as a system that can interpret, i.e. read aloud, any (even unknown) text in a given language with sufficient naturalness and intelligibility. Some examples follow.

In 2010, the Swedish Transport Administration, Trafikverket, decided to use a text-to-speech public announcement system to relay passenger information to travelers at train stations across the country. The synthetic voice was created by Acapela Group [9], [10].

The TextSpeak company reports that their "TTS-EM modules have been integrated in 2017 to announce passenger information across New York City (DOT) and Los Angeles (LA Metro) for audio and ADA compliance in new smart bus shelters. Additional deployments in 2017 include 1000s of information displays across Europe including France, Germany and Scandinavia." [11]

Among nearby countries, Hungarian developers have probably gone furthest in deploying synthesizers in station announcement systems. Their system has been in operation at the largest passenger railway station in Hungary since June 2014 and has been installed at more than 60 other stations and stops [12].

4.2 TTS - The candidate systems for voice generation

Many technical approaches have been successfully tried in the history of modern speech synthesis, and we have tried most of them in our own systems. (Sorry, we skip the historical experiments, such as Wolfgang von Kempelen's speaking machine or even Homer Dudley's vocoder.)

4.2.1 Formant synthesis - eSpeak

One of these approaches is based on the role that the resonances of the speech spectrum, the formants, play in the human ability to identify phonemes. Using two simple types of excitation signal and formant filters, Klatt designed a formant synthesizer [13] that produced intelligible speech.

The STARMON company delivered its sound hardware equipped with the eSpeak Slovak voice. eSpeak is a compact open-source software speech synthesizer that uses a "formant synthesis" method, which keeps the TTS small. Its speech is relatively clear and is produced quickly. However, it is not as natural or smooth as the speech of synthesizers based on newer approaches that concatenate elements of pre-recorded human speech [14].

The Department of Speech Analysis and Synthesis of the II SAS has been working on speech synthesis in Slovak since 1989 and has developed several synthesizers of much higher quality. It therefore soon excluded eSpeak from the list of potential candidates because of the unnaturalness of its speech.

4.2.2 Concatenative synthesis - Kempelen 2.0

The expression "concatenative synthesis" generally designates any synthesis method that concatenates pre-recorded speech segments, e.g. sentences, phrases, words, syllables, diphones, phonemes or their parts. One of our concatenative synthesizers, Kempelen 2.0 [15], was used for about fifteen years in the SMS-to-Voice services of Slovak telephone operators.

Kempelen 2.0 was a diphone synthesizer. It modified pitch with an overlap-add method similar to PSOLA, guided by CART-tree-based F0 and duration models. At Slovak Telekom it was nicknamed "Robo-teta" (Robo-aunt) because of the robotic character of its female voice. This was mainly due to the small number of realizations of each diphone (synthesis element), which led to spectral monotony. The second weak point was imperfect modeling of intonation and speech rhythm, which produced repetitive, mechanical-sounding prosodic patterns.

4.2.3 Unit Selection synthesis (UniSel) - Kempelen 3.0

Unit Selection synthesis [16] is probably the most successful and most widely used of the methods based on waveform concatenation. The synthesis elements can be of different lengths (triphones, diphones, phones, subphones, etc.). They are chosen from multiple candidates in a large speech database according to their phonetic, word and sentence context and to their F0 and duration.

Our Unit Selection synthesizer, Kempelen 3.0, was developed entirely at the Department of Speech Analysis and Synthesis of the II SAS [15]. The first versions used CART trees for prosody modeling. These were later replaced by HMM models that generate the target F0 and duration values for every phoneme sought in the database.

The syllable was chosen as the basic unit of synthesis, which contributes to the natural rhythm of the resulting speech. Unwanted artifacts at the joins between syllables are rare and mostly come from imperfect automatic phoneme alignment in the database.

The speech database needs at least about two hours of recordings and has to be stored on disk or loaded into memory. Its memory footprint is large, and reading the candidate elements is time-consuming because the current version is not optimized for speed.

4.2.4 Statistical Parametric synthesis with Hidden Markov modeling (HMM TTS), Kempelen 4.0

Statistical parametric speech synthesis uses statistical models based on Hidden Markov Models (HMMs) to estimate F0, duration and the spectral envelope (in the form of Mel-cepstrum coefficients). These estimates drive a vocoder that generates the synthetic speech [17].

Our "HMM speech synthesizer" was developed in 2011 [18] and is based on the HTS Speech Synthesis Toolkit [19]. The context-dependent HMMs were trained on our Slovak speech databases as generative models for the synthesis process. The system was supplemented with various language-specific components, such as text preprocessing and letter-to-phoneme conversion.

The original version of the synthesizer uses the Mel Log Spectrum Approximation (MLSA) vocoder [20]. Speech parameters are generated from HMMs with dynamic features, namely multi-space probability distribution HMMs (MSD-HMMs). The MLSA filter is excited by a simple impulse/random-noise excitation.

We also ran experiments and comparisons with HMM synthesizers that use more sophisticated vocoders [21]. However, these vocoders were not public domain and would have increased the price of the system.

4.2.5 Statistical Parametric synthesis with DNN modeling (DNN TTS), Kempelen 5.0

Computing power and memory capacity have recently increased massively, and parallel computing on graphics processors has become common. This has made it possible to use various types of neural networks to build the models for statistical parametric synthesis [22].

Our "Deep Neural Network (DNN) synthesizer" was designed with the Merlin toolkit for building DNN models for statistical parametric speech synthesis [23]. It was combined with a front-end text processor designed at II SAS and the WORLD vocoder [24].

We found that the amount of training speech needed for a voice of satisfactory quality was highly speaker-dependent. About two and a half hours of speech from our male speaker Milan gave reasonable naturalness and intelligibility, whereas about ten hours were needed for our female voice Dagmar. Adding more training data should further improve quality, but the recordings must be consistent in style, recording channel, and so on.

The quality of DNN voices is generally very high, especially in natural intonation, rhythm and timbre. However, vocoding artifacts are still audible as a slight buzz.

5 Results and discussion

As mentioned in the description of the legal status, the Decree of the Ministry of Environment of the Slovak Republic No. 532/2002 Coll., Art. 2.5.2, introduces an obligation to provide information to a blind person in an appropriate way. Examples are an informant, an acoustic or tactile system, or a telephone information service, plus an optical system for the hearing impaired. This leaves several alternatives to voice messages.

Providing information through a human informant is relatively expensive. More than 60 railway stations in SK offer so-called "Comprehensive services for passengers" and would have to provide this service.

Tactile displays are both rare and expensive (see e.g. TeslaTouch [25]). Vidal-Verdu and Hafez survey graphical tactile displays that could be used by visually impaired people. However, most of them are research prototypes, and producing them commercially would currently be too expensive. The goal of an efficient low-cost tactile display for visually impaired people has therefore not yet been reached [26].

An information system with speech output from a speech synthesizer thus proves to be one of the most appropriate solutions at present.

The authors compared the properties, capabilities and hardware requirements of five types of synthesizers: one public-domain formant synthesizer and four synthesizers produced by II SAS.

* eSpeak was excluded from the list of potential candidates because its speech sounded unnatural.
* The Kempelen 2.0 diphone concatenative synthesizer is reliable and produces speech quickly, but it was judged outdated and unsuitable for the current public information system.
* The Kempelen 3.0 Unit Selection synthesizer produces a pleasant and natural voice but generates speech relatively slowly, because it must read the element candidates from memory. It therefore could not be used in the designed information system.
* The Kempelen 5.0 DNN synthesizer is four times slower than the Kempelen 4.0 HTS synthesizer, mainly because the WORLD vocoder requires much more computation. Its DNN models are about 100 times larger than the HMM models, so their memory requirements and loading times are considerably higher too.

Therefore, as a compromise between speech quality and speed, the Kempelen 4.0 parametric HMM statistical synthesizer was selected for the current version of the information system.

Six new voice-equipped information boards have been installed at Spišská Nová Ves railway station. The system is being tested and, by the time this paper is published, it should already be in regular operation.

Fig. 4. One of the new information boards in Spišská Nová Ves equipped with voice output for the blind (note the black loudspeaker mounted in the upper left corner of the board). The protective plastic film will be removed from the display after regular operation is started.

To conclude, several potential issues still have to be worked on.

The response time of Kempelen 4.0 is approximately 0.5 times real time on the Raspberry Pi 2 system. The current version processes the whole message and then reads it in one block. The next version is planned to generate speech sentence by sentence while the previous utterance is being played. This will significantly relax the reaction-time requirements and allow other types of synthesizers to be used.

It will also become possible to add an option for higher emotional arousal or vocal effort, as is usual in warning messages, Lombard speech [27] or emotion cues [28]. In that case, of course, the synthesizer would have to be trained on an emotional-speech database [29].

The intelligibility of the synthesizers' output speech should be tested with standard methods, e.g. the phonetically balanced SUS test [30].

The range of the radio transmitter has to be set correctly so that a single command does not trigger several information systems at once and make them read simultaneously.

6 Conclusion

We introduced a new voice-equipped information system developed for Slovak Railways. It combines visual text on LED information boards with on-demand reading of the same text by a Slovak speech synthesizer. Following analysis and experiments, the Kempelen 4.0 HMM synthesizer was implemented in the current version of the device. The authors hope that the product will help blind and partially sighted passengers obtain the information they need more comfortably.

Acknowledgment

The authors would like to thank Mr. Milan Slanina, Head of Service and Design at STARMON Ltd., who provided the authors with all the necessary information and participated in the development of this product.

This work was supported by the Slovak Scientific Grant Agency VEGA, grant No. 2/0161/18.

References

[1] Okeenea, http://www.okeenea.com/navigueo-hifi-audio-beacon/

[2] I. Harušťák, ÚNSS, "Akustické informačné systémy s diaľkovým ovládaním pre nevidiacich" [Remotely controlled acoustic information systems for the blind], in: Mosty inklúzie 7/2015. http://www.nrozp-mosty.sk/temy-cisla-7-2015/item/1690-akusticke-informacne-systemy-s-dialkovym-ovladanim-pre-nevidiacich.html

[3] TYFLOSET® electronic orientation and information system for the visually impaired persons, http://www.apex-jesenice.cz/tyfloset.php?lang=en

[4] ELEN s.r.o., https://www.elen.sk/

[5] STARMON s.r.o., http://www.starmon.cz/

[6] H. Aust, M. Oerder, F. Seide and V. Steinbiss, "The Philips automatic train timetable information system", Speech Communication, Vol. 17, 1995, pp. 249-262.

[7] J. Juhar, S. Ondas, A. Cizmar, M. Rusko, G. Rozinaj and R. Jarina, "Development of Slovak GALAXY/VoiceXML based spoken language dialogue system to retrieve information from the Internet", Proceedings of INTERSPEECH, 2006.

[8] E. Klabbers, "High-quality speech output generation through advanced phrase concatenation", Proceedings of the COST Workshop on Speech Technology in the Public Telephone Network, Rhodes, 1997, pp. 85-88.

[9] ACAPELA report, http://nationalpainreport.com/swedens-railway-stations-get-new-text-to-speech-technology-for-public-announcements-885121.html

[10] ACAPELA news, https://www.acapela-group.com/news/public-transport-acapela-group-creates-custom-voices-for-trafikverket/

[11] TextSpeak, https://www.textspeak.com/first-case-studies-2-2-2/

[12] Cs. Zainkó et al., "A polyglot domain optimised text-to-speech system for railway station announcements", Proceedings of Interspeech, 2015.

[13] D. H. Klatt, "Software for a cascade/parallel formant synthesizer", Journal of the Acoustical Society of America, 67(3), March 1980.

[14] eSpeak, http://espeak.sourceforge.net

[15] S. Darjaa, M. Rusko, M. Trnka, "Three generations of speech synthesis systems in Slovakia", Proceedings of the XI International Conference Speech and Computer (SPECOM), 2006, pp. 297-302.

[16] A. J. Hunt, A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database", 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) Conference Proceedings, 1996, ISBN: 0-7803-3192-3.

[17] K. Tokuda, H. Zen and A. W. Black, "An HMM-based speech synthesis system applied to English", Proceedings of the 2002 IEEE Workshop on Speech Synthesis, 2002, pp. 227-230.

[18] S. Darjaa et al., "HMM speech synthesizer in Slovak", In: GCCP 2011, Bratislava, 2011, pp. 212-221.

[19] H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. W. Black, K. Tokuda, "The HMM-based speech synthesis system version 2.0", Proc. of ISCA SSW6, Bonn, Germany, Aug. 2007.

[20] S. Imai, K. Sumita, Ch. Furuichi, "Mel Log Spectrum Approximation (MLSA) filter for speech synthesis", Electronics and Communications in Japan, 66(2), 1983, pp. 10-18.

[21] M. Sulír, J. Juhár, M. Rusko, "Development of the Slovak HMM-Based TTS System and Evaluation of Voices in Respect to the Used Vocoding Techniques", Computing and Informatics, 35, 2016, pp. 1467-1490.

[22] H. Zen, A. Senior, M. Schuster, "Statistical parametric speech synthesis using deep neural networks", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013, pp. 7962-7966.

[23] Z. Wu, O. Watts, S. King, "Merlin: An Open Source Neural Network Speech Synthesis System", Proc. 9th ISCA Speech Synthesis Workshop (SSW9), September 2016, Sunnyvale, CA, USA.

[24] M. Morise, F. Yokomori and K. Ozawa, "WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications", IEICE Transactions 99-D, 2016, pp. 1877-1884.

[25] O. Bau, I. Poupyrev, A. Israr and Ch. Harrison, "TeslaTouch: electrovibration for touch surfaces", Proceedings of the 23rd annual ACM symposium on User interface software and technology (UIST '10), ACM, New York, NY, USA, 2010, pp. 283-292.

[26] F. Vidal-Verdu, M. Hafez, "Graphical tactile displays for visually-impaired people", IEEE Transactions on Neural Systems and Rehabilitation Engineering, 15(1), 2007, pp. 119-130.

[27] J. Šimko, Š. Beňuš, M. Vainio, "Hyperarticulation in Lombard speech: Global coordination of the jaw, lips and the tongue", Journal of the Acoustical Society of America, vol. 139, no. 1, 2016, pp. 151-162.

[28] M. Hric, M. Chmulik, I. Guoth and R. Jarina, "SVM based speaker emotion recognition in continuous scale", Proceedings of the 25th International Conference Radioelektronika, 2015, pp. 339-342.

[29] R. Sabo, J. Rajčáni, "Designing the Database of Speech Under Stress", Journal of Linguistics (Jazykovedný časopis), Volume 68, Issue 2, 2017, pp. 326-335.

[30] M. Sulír, J. Staš, J. Juhár, "Design of phonetically balanced SUS test for evaluation of Slovak TTS systems", In: ELMAR-2014: 56th International Symposium, Zadar, Croatia, University of Zagreb, 2014, pp. 35-38.
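Section 3.1 describes the on-demand flow: the TYFLOSET® receiver triggers the Raspberry Pi, which preprocesses the board text, synthesizes it and plays it. Section 5 adds the planned sentence-by-sentence generation. The minimal Python sketch below illustrates both under several assumptions, none of which come from the paper:

* The abbreviation table, the speaking-rate constant `SECONDS_PER_CHAR` and all function names (including the `on_trigger` hook) are hypothetical. This is not the STARMON or II SAS code.
* "0.5 times real time" is read as a real-time factor `rtf` (synthesis time divided by audio duration) of 0.5.

```python
"""Hypothetical sketch of the on-demand announcement flow of Section 3.1.

Assumptions (not from the paper): the abbreviation table, the speaking-rate
constant and every function name below are illustrative only.
"""
import re

# Hypothetical abbreviation table (Slovak train categories and "platform").
ABBREVIATIONS = {
    "Os": "osobný vlak",
    "R": "rýchlik",
    "Ex": "expres",
    "nást.": "nástupište",
}

# Assumed average speaking rate of the synthetic voice (seconds per character).
SECONDS_PER_CHAR = 0.07


def preprocess(board_text: str) -> str:
    """Expand whole-token abbreviations. The real preprocessing also expands
    numerals and corrects diacritics and pronunciation; that is omitted here."""
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in board_text.split())


def split_sentences(text: str) -> list[str]:
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def estimate_duration(sentence: str) -> float:
    """Rough audio duration of a sentence under the assumed speaking rate."""
    return SECONDS_PER_CHAR * len(sentence)


def whole_message_latency(durations: list[float], rtf: float) -> float:
    """Current behaviour: synthesize the whole message, then play it in one block.
    rtf is the real-time factor (synthesis time / audio duration)."""
    return rtf * sum(durations)


def simulate_streaming(durations: list[float], rtf: float) -> tuple[float, float]:
    """Planned behaviour: synthesize sentence i+1 while sentence i is playing.

    Synthesis runs sequentially; each sentence starts playing as soon as it is
    ready and the previous one has finished.
    Returns (time to first audio, total silence between sentences)."""
    synth_done = 0.0  # moment the current sentence's audio becomes available
    play_end = 0.0    # moment the previous sentence finished playing
    first_audio, gaps = 0.0, 0.0
    for i, d in enumerate(durations):
        synth_done += rtf * d
        start = max(play_end, synth_done)
        if i == 0:
            first_audio = start
        else:
            gaps += start - play_end
        play_end = start + d
    return first_audio, gaps


def on_trigger(board_text: str, rtf: float = 0.5) -> tuple[list[str], float, float]:
    """Hypothetical hook called when the receiver reports a command.
    Returns the sentences to speak and the whole-message and streaming
    first-audio latencies."""
    sentences = split_sentences(preprocess(board_text))
    durations = [estimate_duration(s) for s in sentences]
    return (sentences, whole_message_latency(durations, rtf),
            simulate_streaming(durations, rtf)[0])
```

Take sentence durations of 2, 3 and 1 s. With a real-time factor of 0.5, whole-message synthesis delays the first audio by 3 s, whereas streaming starts after 1 s with no gaps. With a hypothetical factor of 2.0, for example if a four times slower synthesizer ran on the same hardware, streaming starts after 4 s but leaves 4 s of silence between sentences.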
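Section 3.2 gives three loudspeaker figures: a sensitivity of 89 dB SPL at 1 W and 1 m, a maximum power of 20 W, and a maximum SPL of 102 dB at 1 m. A quick consistency check needs two assumptions. First, the speaker is treated as an idealized free-field point source (inverse-square law, no reflections, directivity or air absorption). Second, the level is assumed to rise by 10 log10 of the power ratio:

```latex
\begin{aligned}
L_p(P, r) &\approx S + 10\log_{10}\frac{P}{1\,\mathrm{W}} - 20\log_{10}\frac{r}{1\,\mathrm{m}}, \qquad S = 89\ \mathrm{dB},\\
L_p(20\,\mathrm{W}, 1\,\mathrm{m}) &\approx 89 + 10\log_{10} 20 \approx 89 + 13.0 = 102.0\ \mathrm{dB},\\
L_p(20\,\mathrm{W}, 10\,\mathrm{m}) &\approx 102.0 - 20\log_{10} 10 = 82.0\ \mathrm{dB}.
\end{aligned}
```

The second line agrees numerically with the quoted 102 dB maximum. The third line gives only a rough idea of the level 10 m from the board, since real stations are reverberant and noisy.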
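Section 4.2.1 summarizes formant synthesis. The sketch below illustrates the principle only and does not reflect eSpeak's internals. It implements the second-order digital resonator described by Klatt [13]:

* y[n] = A x[n] + B y[n-1] + C y[n-2]
* C = -exp(-2 pi BW T)
* B = 2 exp(-pi BW T) cos(2 pi F T)
* A = 1 - B - C

A cascade of such resonators is driven by an impulse train. Only the voiced source is modeled (no noise source or glottal shaping). The function names and the example formant values are illustrative assumptions.

```python
"""Klatt-style second-order digital resonator and a toy cascade formant source.

Illustrates the formant-synthesis principle of Section 4.2.1; it is not eSpeak.
Formant and bandwidth values used with it are illustrative assumptions."""
import math


def resonator_coeffs(f_hz: float, bw_hz: float, fs_hz: float) -> tuple[float, float, float]:
    """Coefficients A, B, C of y[n] = A*x[n] + B*y[n-1] + C*y[n-2] (Klatt 1980)."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * f_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    return a, b, c


def resonate(signal: list[float], f_hz: float, bw_hz: float, fs_hz: float) -> list[float]:
    """Filter a signal through one resonator centred on f_hz with bandwidth bw_hz."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz: float, fs_hz: float, n_samples: int) -> list[float]:
    """Simplest voiced excitation: one unit impulse per pitch period."""
    period = int(round(fs_hz / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def cascade_vowel(formants: list[tuple[float, float]], f0_hz: float = 120.0,
                  fs_hz: float = 16000.0, duration_s: float = 0.3) -> list[float]:
    """Pass an impulse train through formant resonators connected in cascade."""
    x = impulse_train(f0_hz, fs_hz, int(round(fs_hz * duration_s)))
    for f_hz, bw_hz in formants:
        x = resonate(x, f_hz, bw_hz, fs_hz)
    return x
```

For example, `cascade_vowel([(730.0, 60.0), (1090.0, 90.0), (2440.0, 120.0)])` returns 0.3 s of a buzzy, open-vowel-like waveform sampled at 16 kHz. The output must be scaled before it is written to an audio file.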
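Section 4.2.3 describes unit selection [16]. Each candidate from the database is scored in two ways: against a prosodic target (in Kempelen 3.0, F0 and duration values produced by HMM models) and by how smoothly it joins its neighbours. The lowest-cost sequence is typically found with a Viterbi search.

The sketch below is minimal and rests on assumptions that are not those of Kempelen 3.0: the `Unit` and `Target` fields, the linear cost functions and the weights are all hypothetical.

```python
"""Minimal Viterbi unit-selection search (illustration of Section 4.2.3).

Hypothetical data model: each unit is a recorded syllable described only by
its edge F0 values and duration; the costs and weights are assumptions."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    label: str       # syllable label
    f0_start: float  # F0 at the left edge (Hz)
    f0_end: float    # F0 at the right edge (Hz)
    duration: float  # seconds


@dataclass(frozen=True)
class Target:
    label: str
    f0: float        # target F0 (Hz), e.g. from a prosody model
    duration: float  # target duration (s)


def target_cost(t: Target, u: Unit, w_f0: float = 1.0, w_dur: float = 100.0) -> float:
    """How far a candidate is from the predicted prosody."""
    mean_f0 = 0.5 * (u.f0_start + u.f0_end)
    return w_f0 * abs(mean_f0 - t.f0) + w_dur * abs(u.duration - t.duration)


def join_cost(prev: Unit, cur: Unit, w_join: float = 1.0) -> float:
    """Penalty for an F0 jump at the concatenation point."""
    return w_join * abs(prev.f0_end - cur.f0_start)


def select_units(targets: list[Target], candidates: list[list[Unit]]) -> tuple[list[Unit], float]:
    """Return the candidate sequence with the minimum total cost and that cost.
    candidates[i] holds the database units that can realize targets[i]."""
    # best[i][j] = (cost of the best path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), -1) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            path_cost, back = min(
                (best[i - 1][k][0] + join_cost(prev, u), k)
                for k, prev in enumerate(candidates[i - 1])
            )
            row.append((path_cost + target_cost(targets[i], u), back))
        best.append(row)
    # Backtrack from the cheapest final state.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1], total
```

The search matters because the locally best candidate is not always globally best. A unit with a slightly worse target cost can win if its F0 joins the next syllable smoothly, and a greedy per-position choice would miss this.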