=Paper=
{{Paper
|id=Vol-2473/paper31
|storemode=property
|title=Acoustic Output of the Railway Information Systems for Visually Impaired Passengers
|pdfUrl=https://ceur-ws.org/Vol-2473/paper31.pdf
|volume=Vol-2473
|authors=Milan Rusko,Marián Trnka,Sachia Darjaa,Ladislav Schichman
|dblpUrl=https://dblp.org/rec/conf/itat/RuskoTDS19
}}
==Acoustic Output of the Railway Information Systems for Visually Impaired Passengers==
<pdf width="1500px">https://ceur-ws.org/Vol-2473/paper31.pdf</pdf>
<pre>
       Acoustic Output of the Railway Information Systems for Visually Impaired
                                     Passengers
                                              Milan Rusko, Marián Trnka, Sakhia Darjaa
                                            Department of Speech Analysis and Synthesis
                                Institute of Informatics of the Slovak Academy of Sciences (II SAS)
                                            Dúbravská cesta 9, 845 07 Bratislava, Slovakia

                                                       Ladislav Schichman
                                                           ELEN s.r.o.
                                            Ľubochnianska 16, 080 06 Ľubotice, Slovakia

Abstract. The Decree of the Ministry of Environment of the            implementing Decree of the Building Act 398/2009 Coll.,
Slovak Republic, No.532 / 2002, art. 2.5.2, laying down details       “On general technical requirements ensuring barrier-free
of the general technical requirements for constructions and           use of buildings.”
structures used by persons with reduced mobility, states that
every basic information system must be complemented by an                The guidance systems for visually impaired can be
alternative solution for providing the blind and visually             generally divided into two main groups - information on the
impaired persons with information (for example, an informant,         construction and operation of facilities in the construction
acoustic or tactile system, or telephone information service)         site, such as lifts, and a group of information systems for
and an optical system for the hearing impaired. The                   the visually impaired, providing them with information on
installation of several new information boards for the Slovak         the operation of transport systems, such as departures, train
Railways was a good opportunity to introduce an automatic
remotely controlled information audio output providing the            arrivals, etc. The infotables must have acoustic output, as
same information as displayed on the information boards, to           according to the Decree the basic information for public
the visually impaired passengers. The architecture of the             orientation must be both visual and, if it is possible,
information system is presented in the paper. Several types of        acoustic and tactile.
speech synthesizers are introduced that were candidates for              These systems are generally activated remotely; the
the speech generation. It is explained which of them was used         information is given in a form of audio signals and voice
in the final solution and why. Potential issues of the system are
pointed out and the future solution of railway information            messages. The remote control shall in CZ be provided by
systems is discussed. The system was installed as a part of six       means of an electronic coded command receiver emitted
new information tables at Spišská Nová Ves railway station, it        from a distance of at least 40 m. The radio command signal
is being tested and at the time this article is published, it         frequency is 86.790 MHz for CZ (it is 87.100 MHz for
should be already in regular operation.                               SK).
                                                                         In SK, the rules for information systems, with regard to
1     Introduction                                                    the needs of the blind, are addressed only in the Decree of
1.1 Motivation                                                        the Ministry of Environment of the Slovak Republic laying
                                                                      down details of general technical requirements for
   In recent years, efforts have intensified to make                  construction and general technical requirements for
information more accessible to blind people and to make               structures used by persons with reduced mobility (Decree
their orientation in the urban environment easier. The most           no. 532/2002 Coll., Art. 2.5.2).
natural solution seems to be the use of audio signals and                It states that the basic information device must be
speech announcements to supply the visually impaired with             complemented by an alternative solution for providing
the information that is provided in a visual form to the rest         information to a blind person (for example, an informant,
of population.                                                        acoustic or tactile system, telephone information service)
                                                                      and an optical system for the hearing impaired.
1.2   Legal Status in the Czech Republic and Slovakia
   Thanks to the activities of the Union of the Blind and             2    Devices for information system for the
thanks to the understanding of the responsible authorities in              blind and visually impaired
the Czech Republic (CZ), as well as support equipment
manufacturers, the solution to making information
                                                                         Elements and devices for the information system for the
available to the blind and partially sighted people has
                                                                      visually impaired are designed as a complement to the
advanced considerably further than in Slovakia (SK). It has
                                                                      existing hardware and software elements of electronic
even been incorporated into the legislation and has been
                                                                      information systems for the public.
included in the Building Act. This has probably become an
                                                                         People have certainly long noticed the use of acoustic
inspiration for implementing similar systems in SK.
                                                                      beacons for blind people at traffic lights, however, many
   The rules and obligations of building information
                                                                      other functions are also fulfilled today by sound beacons.
systems with regard to the needs of blind and partially
                                                                      The following subsections describe the most common
sighted citizens are in the CZ given by the Methodology of
                                                                      devices used in SK and CZ.
the Ministry for Regional Development of the CZ for

Copyright ©2019 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).
2.1 Acoustic beacon                                               mobile and static devices that serve both for acoustic and
                                                                  voice information and easier orientation of blind people.
   An acoustic (or sound) beacon is an electronic acoustic
                                                                     It is designed as a unified system for all types of acoustic
device that has two main functions for the visually
                                                                  information and orientation in the Slovak Republic. This
impaired: orientation/navigation and information. The
                                                                  means that one command transmitter operates the blind,
beacon is controlled by a signal from a command
transmitter operated by the visually impaired person.
The navigation device emits a periodic sound signal that
allows a person with limited vision to locate the object (for
example, entrance to the building) or to receive other
important information, such as information
about the current status of the traffic light, or the length of
time remaining for safely crossing the road.
   Beacons and information systems are already
widely used in some European countries. For instance the
SNCF (French National Railway Company) chose the
NAVIGUEO+HIFI navigation and information devices for
its train and subway stations. Okeenea won the RATP
(Parisian Public Transportation Company) tender for audio
beacons and will be the audio beacons supplier for the
entire Parisian metro network. More than 1,500 of these
devices are going to guide visually impaired people by
the end of 2019 [1].
   In his article from 2015 Harušťák informs: “in the CZ,
there are 12 cities with sound-beacon equipped urban
public transport. In the SK, the past two years have moved.
The first swallow was the Transport Company of the City                  Fig. 1. The TYFLOSET® System: a) pushbutton
of Žilina, which in early 2014 equipped the first 39 vehicles       transmitter VPN 02, b) command transmitter in folding
(buses and trolleybuses) with acoustic beacons and all              stick VPN 403. (Published by courtesy of APEX s.r.o.)
newly purchased vehicles will have beacons.” [2]
                                                                  activating all acoustic and voice information and
                                                                  orientation systems. Transmitter frequency is 86.790 MHz
                                                                  for the CZ and 87.100 MHz for SK [3].
2.2 The TYFLOSET® System by APEX
                                                                  2.3 The STARMON information systems with LED
   In SK, TYFLOSET® System devices by APEX are used                   boards by ELEN s.r.o.
to control the functions of audio information systems for
the blind (see Fig. 1.).                                             ELEN Ltd. was established in 1991 in Prešov, Slovakia,
   The handheld transmitter has six buttons. Buttons 1 and        by developers who have been especially involved in
2 are used in the SK for orientation on streets, in public        microprocessor applications, automation and robotics
buildings, moving staircases, sidewalks; for activating           applications. Since its establishment, it has focused on the
sound information terminals. Buttons 3 and 4 are reserved         development and production of electronic information
for public transport vehicles, buttons 5 and 6 are reserved       panels and displays.
for other applications. The 4-button command transmitter             Their most important applications range from railway
can be built directly into the white stick. After pushing the     station information systems providing passengers with the
button a control signal is transmitted and the beacon             information on departures/arrivals of trains, information
answers by playing the pre-recorded sound or speech               displays on tram and bus stops, special displays for
information according to the desired function.                    hospitals, Metro (underground) information boards in
   The beacon can give a short beep that shows the visually       Prague, through exchange-rate boards for banks, to a large-
impaired person the direction towards the beacon marked           scale information board showing the state of the
location. In buildings, the system can guide the blind to         environment in Budapest [4].
enter the building and find their way or the contact person.         Passenger information systems are an essential part of
In public transport and railways the system can provide a         any modern station. Their use significantly affects the
line number, driving direction, and driver announcements          comfort and safety of passengers in public passenger
on boarding or stops.                                             transport. Most of the ELEN information boards are
   The sound-equipped information boards should provide           currently equipped with powerful LEDs as display
voice information about the current timetable. Furthermore,       elements.
it is possible to equip acoustic beacons to ensure the               ELEN Ltd. has been cooperating for many years with
orientation and safe movement of the visually impaired in the     STARMON company from the Czech Republic, which has been
areas of stations and terminals [2].                              designing information systems for passengers in trains,
   The TYFLOSET® transmitter/receiver system produced             buses, and other types of public transport [5].
by APEX Ltd. company consists of a set of portable,
2.4 Automatic railway information systems with voice            Fig. 2. The schematic diagram of the railway information
    output                                                      system with visual (LED board) and speech (loudspeaker)
                                                                                         outputs
   The idea of using automatic speech processing in railway
information system is straightforward, as the
communication via speech is the most natural one for
humans. To give an example, PHILIPS has introduced their       3.2 The mini computer and audio hardware
sophisticated train timetable information over the telephone      The heart of the sound generating system is a Mini PC
that provided accurate connections between 1200 German         Raspberry Pi2 computer equipped with RS 485 bus
cities, using speech recognition, speech understanding,        communication circuits (See Fig. 3.). Raspberry Pi is a
dialogue management and voice output based on pre-             small single-board computer whose primary operating
recorded utterances in 1995 [6].                               system is Raspbian Linux.
   A similar voice-controlled system was developed in             It is equipped with Broadcom Quad-Core CPU
Slovakia by four academic institutions in the years 2002 to    BCM2836 with 900MHz clock, 1GB RAM, four USB 2.0
2006 [7].                                                      interface connectors, HDMI interface and microSD slot.

3    The   architecture      and     hardware
     components of the designed system

   When it was decided that the newly installed information
LED boards by ELEN Ltd. for Slovak Railways should be
equipped with an on-demand voice-information feature for
the blind, STARMON delivered a hardware solution and
the Institute of Informatics of the Slovak Academy of
Sciences (II SAS) designed a synthetic voice for this new
feature.


3.1 The architecture of the system
   A block diagram of the information system is presented          Fig. 3. Typical hardware configuration of the control unit
in Fig. 2. The text information for the boards is sent from                            by STARMON
the STARMON Information Server via the RS 485 bus to
the RS 485 interfaces both in the LED board motherboard,          The output power sufficient for driving the loudspeaker system
and the Raspberry-based mini PC. When the blind person         is provided by a miniature audio amplifier mounted to the PC
presses the button on the TYFLOSET® VPN 02 handheld            board.
transmitter or the command transmitter in folding stick           The used AMC VIVA 4IP loudspeaker is suitable for outdoor
VPN 403, the 87.100 MHz control signal is emitted. Once it     installation in wet conditions (IP55). The IP55 category means
is received by the TYFLOSET® receiver, the control unit of     almost complete ingress protection from particles and a good level
the Raspberry PC prepares message based on the actual text     of protection against water. The dual band speaker system
displayed on the board, launches the text preprocessing        provides maximum power of 20 W and sound pressure level 89
program and the speech synthesis program itself. The text      dB at 1 W of power and 1 m distance. Maximum sound pressure
preprocessing turns abbreviations and numerals into full       level (SPL) at 1 m distance (at 3254 Hz) is 102 dB, and the
text, corrects diacritics and pronunciation. The Text to       frequency response is 90 Hz - 20 kHz. The radiation angle at 1000
Speech Synthesizer (TTS) transforms text into a synthetic      Hz is 150° (horizontal) and 120° (vertical), which gives the
human-speech-like audio signal. This is then amplified and     system a good spatial coverage. Low tone speaker is 4” and the
played via a loudspeaker.                                      high tone has 1” in diameter.
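
The text preprocessing step described in Section 3.1 can be illustrated with a minimal sketch. It is not the preprocessing program used in the installed system: the abbreviation table, the English wording and the function names below are hypothetical placeholders, and a real implementation would use Slovak expansions, declension rules and diacritics correction.

```python
import re

# Hypothetical abbreviation table (illustration only; the real system is Slovak).
ABBREVIATIONS = {"dep.": "departure", "plat.": "platform", "min.": "minutes"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def preprocess(text):
    """Expand abbreviations, clock times and small numerals into full words."""
    words = []
    for token in text.split():
        if token.lower() in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token.lower()])
        elif re.fullmatch(r"\d{1,2}:\d{2}", token):
            hours, minutes = token.split(":")
            words.append(f"{number_to_words(int(hours))} hours "
                         f"{number_to_words(int(minutes))} minutes")
        elif re.fullmatch(r"\d{1,2}", token):
            words.append(number_to_words(int(token)))
        else:
            words.append(token)  # left unchanged for the synthesizer
    return " ".join(words)


print(preprocess("dep. 14:05 plat. 2 delay 10 min."))
```

The output string is what would then be handed to the TTS engine; tokens the sketch does not recognise (for example train numbers above 99) are passed through unchanged.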

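The loudspeaker figures quoted in Section 3.2 are mutually consistent, which a short calculation can confirm. The sketch below assumes the usual rule that SPL grows by 10·log10 of the electrical power ratio; the distance function additionally assumes free-field (inverse-square) propagation, which is an idealisation and will not hold exactly on a real platform with reflections and background noise. The function names are illustrative.

```python
import math


def max_spl_at_1m(sensitivity_db, power_w):
    """SPL at 1 m when driving a speaker of given sensitivity (dB at 1 W, 1 m)."""
    return sensitivity_db + 10.0 * math.log10(power_w)


def spl_at_distance(spl_1m_db, distance_m):
    """Free-field estimate (assumption): SPL drops by 20*log10(distance)."""
    return spl_1m_db - 20.0 * math.log10(distance_m)


peak = max_spl_at_1m(89.0, 20.0)   # 89 dB sensitivity, 20 W maximum power
print(round(peak, 1))              # about 102 dB, matching the quoted maximum SPL
print(round(spl_at_distance(peak, 10.0), 1))
```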

                                                               4     Speech synthesis on the railways

                                                                  Playing the acoustic signals, voice messages and
                                                               announcements is the most widely used way of informing
                                                               passengers on railway stations and trains (e.g. [8]).
                                                                  The traditional solution was to use recorded prompts
                                                               with “slot filling” with various sections of speech
                                                               utterances to produce the required voice-message.
                                                                  This approach is however very inflexible. It is unable to
                                                               interpret new messages with unforeseen utterance structure
                                                               or out-of-vocabulary words. On the other hand some years
                                                               ago, the intelligibility and naturalness of the synthesized
speech utterances could hardly compete with the voice            much higher quality, they soon excluded eSpeak from the
messages obtained by concatenation of pre-recorded words         list of potential candidates because of unnaturalness of the
and phrases.                                                     produced speech.

4.1 The “true” TTS on the foreign railways                       4.2.2 Concatenative synthesis - Kempelen 2.0
   The alternative represented by the use of “true” speech
synthesis has been currently used more and more. For the            The expression “Concatenative synthesis” designates in
sake of simplicity, let us define the “true” speech synthesis    general any synthesis method using concatenation of pre-
as a system that is able to interpret – i.e. read aloud – any    recorded speech segments (e.g sentences, phrases, words,
(even unknown) text in a given language with sufficient          syllables, diphones, phonemes, or their parts). One of our
naturalness and intelligibility. Let us introduce some           concatenative synthesizers, Kempelen 2.0 [15], had been
examples.                                                        used in the services of the Slovak telephone operators in
   In 2010 it was decided that the Swedish Transport             their SMS-to-Voice services for about fifteen years. It was
Administration, Trafikverket will use a text-to-speech           a diphone synthesizer using an overlap-add method similar
public announcement system to relay passenger                    to PSOLA for pitch manipulation following the CART-
information to travelers at train stations across the country.   trees based F0 and Duration models. This synthesizer was
The text to speech synthetic voice was created by Acapela        called by the Slovak Telekom Robo-teta (Robo-aunt) for
Group [9], [10].                                                 the robotic character of its female voice. This was mainly
   The TextSpeak company reports, that their “TTS-EM             due to the small number of implementations of diphones
modules have been integrated in 2017 to announce                 (synthesis elements), which led to spectral monotony. The
passenger information across New York City (DOT) and             second weak point was the imperfect modeling of
Los Angeles (LA Metro) for audio and ADA compliance in           intonation and speech rhythm, leading to repetitive
new smart bus shelters. Additional deployments in 2017           prosodic patterns that sound mechanical.
include 1000s of information displays across Europe
including France, Germany and Scandinavia.” [11]                 4.2.3 Unit Selection synthesis (UniSel) - Kempelen 3.0
   Hungarian developers are probably the farthest in the
deployment of synthesizers in station reporting systems             The Unit Selection synthesis [16] is probably the most
from nearby countries. Their system has been in operation        successful and most used method among the approaches
at the largest passenger railway station of Hungary since        using waveform concatenation algorithms.
June 2014 and has been installed for more than 60 other             The synthesis elements can be of different length
stations and stops [12].                                         (triphones, diphones, phones, subphones, etc.). These are
                                                                 chosen from multiple candidates contained in a large
4.2 TTS - The candidate systems for voice generation             speech database according to their phonetic, word, sentence
                                                                 context and to their F0, and duration.
   There have been many technical approaches in the                 Our Unit Selection synthesizer, Kempelen 3.0 was
history of modern speech synthesis, that were successfully       completely developed at the Department of Speech
tried out. And most of them we have tried in our systems         Analysis and Synthesis of the II SAS [15]. The CART
too. (Sorry, we skip the historical experiments, like            trees, that were used in the first versions for prosody
Wolfgang von Kempelen’s speaking machine or even                 modeling were later replaced by HMM models that
Homer Dudley's vocoder.)                                         generate the target values of F0 and duration for every
                                                                 phoneme sought in the database.
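
The selection of units from multiple candidates according to their F0 and duration can be sketched as a dynamic-programming search. This is a generic illustration of unit selection, not the Kempelen 3.0 code: the cost functions, their weights and the `Unit` fields are assumptions made for the example, and a real system would also score phonetic context and spectral continuity at the joins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    label: str       # identifier of the recorded unit (hypothetical)
    f0: float        # mean F0 in Hz
    duration: float  # duration in seconds


def target_cost(unit, target_f0, target_dur):
    """Distance of a candidate from the prosody requested by the model."""
    return abs(unit.f0 - target_f0) / 100.0 + abs(unit.duration - target_dur)


def join_cost(prev, nxt):
    """Penalty for an F0 jump at the concatenation point."""
    return abs(prev.f0 - nxt.f0) / 100.0


def select_units(targets, candidates):
    """Return the candidate sequence with the lowest total cost.

    targets:    list of (f0, duration) pairs, one per position
    candidates: list of candidate lists, one per position
    """
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j], backpointer)
    best = [[(target_cost(u, *targets[0]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(u, *targets[i]), back))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return path[::-1]


targets = [(120.0, 0.20), (130.0, 0.15)]
candidates = [
    [Unit("a1", 120.0, 0.20), Unit("b1", 180.0, 0.20)],
    [Unit("c1", 185.0, 0.15), Unit("d1", 130.0, 0.15)],
]
print([u.label for u in select_units(targets, candidates)])
```

The search cost grows with the number of candidates per position, which is in line with the observation in the text that reading the candidate elements is time consuming.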
4.2.1 Formant synthesis - eSpeak                                    A syllable was chosen as the base unit of synthesis,
                                                                 which contributes to the natural rhythm of the resultant
   One of them was based on the role of the speech               speech. Unwanted artifacts at the connection points of the
spectrum resonances – formants in the human ability to           syllables are rare and mostly come from imperfect
identify various phonemes. Using two simple types of             automatic phoneme alignment in the database.
excitation signals and formant filters, Klatt was able to           The minimum size of the speech database is about two
design a formant synthesizer [13] creating an intelligible       hours of speech recordings. This database has to be stored
speech.                                                          on the disk or uploaded in the memory. The memory
   The STARMON company has delivered their sound                 footprint of this database is big and the process of reading
hardware equipped with eSpeak Slovak voice.                      the candidate elements is time consuming as it is not
   eSpeak is a compact open source software speech               optimized for speed in the current version.
synthesizer which uses a "formant synthesis" method. This
allows the TTS to be provided in a small size. The speech        4.2.4 Statistical Parametric synthesis with Hidden
is relatively clear and needs only a short time to produce             Markov modeling (HMM TTS), Kempelen 4.0
utterances, but is not as natural or smooth as the
synthesizers based on newer approaches of concatenation            The statistical parametric speech synthesis uses statistical
of the elements of pre-recorded human speech [14]. As the        modeling based on Hidden Markov Models (HMMs) to
Department of Speech Synthesis and Analysis of the II            create estimates of F0, duration and spectral envelope (in
SAS have been working on speech synthesis in Slovak
since 1989 and have developed several synthesizers of
a form of Mel-cepstrum coefficients) to drive the vocoder           Providing information by a human informant is a
and generate the synthetic speech. [17]                          relatively expensive solution taking into account that the
   Our “HMM speech synthesizer” was developed in 2011            number of railway stations with so called “Comprehensive
[18]. It is based on HTS Speech Synthesis Toolkit [19]. The      services for passengers” in SK, which should provide this
context-dependent HMM models were trained from our Slovak        service is more than 60.
speech databases, as generative models for speech synthesis         Tactile displays are both rare and expensive (see e.g.
process. The system was supplemented by various language-        TeslaTouch [25]). Vidal-Verdu and co-authors present an
specific components, such as text preprocessing, letter-to-      up-to-date survey of graphical tactile displays which could
phoneme conversion, etc.                                         be used for the visually impaired people. However most of
   The original version of the synthesizer uses the Mel Log      them are research prototypes and the expenses to produce
Spectrum Approximation (MLSA) Vocoder [20]. Speech               them commercially would be currently too high. Thus the
parameters are generated from HMMs with dynamic                  goal of an efficient low-cost tactile display for visually-
features, namely multi-space probability distribution            impaired people has not yet been reached [26].
HMMs (MSD-HMMs). The MLSA filter is excited using a                 An information system equipped with a speech output
simple impulse – random noise excitation.                        using a speech synthesizer thus proves to be one of the
   Experiments and comparisons were done with HMM                most appropriate solutions at present.
synthesizers using more sophisticated vocoders [21],                The authors considered the properties, possibilities and
however these were not public domain and would increase          hardware requirements of five types of synthesizers – one
the price of the system.                                         public domain formant synthesizer and four synthesizers
                                                                 produced by II SAS.
4.2.5 Statistical Parametric synthesis with DNN modeling            The eSpeak was excluded from the list of potential
      (DNN TTS), Kempelen 5.0                                    candidates because of unnaturalness of the produced
                                                                 speech.
                                                                    It was decided that despite its reliability and high speed
   Recent massive increase in available computing power
and memory capacity, the use of parallel computing and the       of speech production, the Kempelen 2.0 diphone
use of graphics processors has led to the possibility of using   concatenative synthesizer is outdated and should not be
                                                                 used in the current public information system.
different types of neural networks to build models for
                                                                    The Kempelen 3.0 Unit Selection synthesizer has a
statistical parametric synthesis [22].
                                                                 disadvantage of relatively slow speech generation caused
   Our “Deep Neural Network (DNN) synthesizer“ was
                                                                 by reading the element candidates from the memory. It was
designed using the Merlin toolkit for building DNN
models for statistical parametric speech synthesis [23]. It      thus impossible to use this synthesizer in the designed
was used in combination with a front-end text processor          information system even though it produces a pleasant and
                                                                 natural voice.
designed at II SAS and a WORLD vocoder [24].
                                                                    The Kempelen 5.0 DNN synthesizer is four times slower
   We found out that the amount of training speech data
                                                                 than Kempelen 4.0 HTS synthesizer mainly due to higher
necessary for getting satisfactory quality of the resulting
                                                                 volume of calculations needed by the WORLD vocoder.
voice was highly speaker-dependent. While it was
enough to use about two and a half hours of speech of our        DNN models are about 100 times larger than HMM and
                                                                 their memory requirements, as well as the time required to
male speaker Milan to get a reasonable naturalness and
                                                                 load them, are considerably higher too.
intelligibility, about ten hours was needed to create our
                                                                    Therefore, a compromise was made between speech
female voice Dagmar. Further increasing the volume of
                                                                 quality and the speed, and the Kempelen 4.0 parametric
training data should lead to an increase in quality, but one
has to make sure that the recordings are consistent in style,    HMM statistical synthesizer was selected to be used in the
recording channel, etc.                                          current version of the information system.
                                                                    Six new voice equipped information boards have been
   The quality of DNN voices is generally very high
                                                                 installed at Spišská Nová Ves railway station. The system
especially in terms of natural intonation and rhythm, and
                                                                 is being tested and at the time of publication of this paper
timbre of voice. However the artifacts of vocoding are still
                                                                 it should be already in regular operation.
audible in a form of a slight buzz.
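
The speed trade-off between the candidate synthesizers can be made concrete with a small timing model. Section 5 reports a response time of roughly 0.5 times real time for Kempelen 4.0 and states that Kempelen 5.0 is four times slower; treating the latter as a factor of about 2.0 is an inference made for this sketch, not a measured value. The sentence durations and function names are hypothetical.

```python
def startup_delay_whole_message(sentence_durations, rtf):
    """Delay before playback when the whole message is synthesized first."""
    return rtf * sum(sentence_durations)


def simulate_sentence_pipeline(sentence_durations, rtf):
    """Synthesize sentence by sentence while earlier sentences are playing.

    Returns (delay before the first sentence plays, total silence between
    sentences caused by synthesis lagging behind playback).
    """
    synth_done = 0.0     # time at which the current sentence is ready
    playback_end = 0.0   # time at which the previous sentence stops playing
    first_start = None
    total_gap = 0.0
    for duration in sentence_durations:
        synth_done += rtf * duration
        start = max(synth_done, playback_end)
        if first_start is None:
            first_start = start
        else:
            total_gap += start - playback_end
        playback_end = start + duration
    return first_start, total_gap


message = [4.0, 6.0, 5.0]  # hypothetical sentence lengths in seconds
print(startup_delay_whole_message(message, 0.5))
print(simulate_sentence_pipeline(message, 0.5))
print(simulate_sentence_pipeline(message, 2.0))
```

Under these assumptions, sentence-wise generation shortens the wait before the first audio, while a synthesizer slower than real time would still leave audible pauses between sentences.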


5    Results and discussion

   As mentioned in the description of the legal status, the
Decree of the Ministry of Environment of the Slovak
Republic no. 532/2002 Coll., Art. 2.5.2., introduces an
obligation to provide information to a blind person in an
appropriate way (for example, an informant, acoustic or
tactile system, telephone information service) and an
optical system for the hearing impaired. This offers several
alternative possibilities to the voice messages.
Fig. 4. One of the new information boards in Spišská Nová Ves
equipped with voice output for the blind (note the black
loudspeaker mounted in the upper left corner of the board).
The protective plastic film will be removed from the display
after regular operation starts.

   To conclude, we have to mention several potential issues
that still have to be worked on.
   The response time of Kempelen 4.0 is approximately 0.5
times real time on the Raspberry PC II system. The current
version processes the whole message and then reads it in one
block. It is planned that the following version will generate
speech sentence by sentence while the previous utterance is
being played. This will reduce the reaction time requirements
significantly and enable the use of other types of
synthesizers. It will also be possible to consider
implementing the option of setting a higher emotional
arousal, or voice effort, as is usual with warning messages,
Lombard speech [27], or emotion cues [28]. Of course, in
that case an emotional-speech database would have to be
used to train the synthesizer [29].
   The intelligibility of the output speech of the
synthesizers should be tested using standard methods,
e.g. the phonetically balanced SUS test [30].
   The range of the radio transmitter has to be set correctly
to prevent several information systems from being triggered
and reading simultaneously.

6    Conclusion

   We introduced a new voice-equipped information system
developed for Slovak Railways that combines visual text
information on LED information boards with reading-on-
demand of the same text content using speech synthesis in
Slovak.
   Following the analysis and experiments, the Kempelen 4.0
HMM synthesizer was implemented in the current version
of the device. The authors hope that their product will help
blind and partially sighted passengers to obtain the
needed information more comfortably.

Acknowledgment

   The authors would like to express thanks to Mr. Milan
Slanina, the Head of Service and Design of STARMON
Ltd., who provided the authors with all the necessary
information and participated in the development of this
product.
   This work was supported by the Slovak Scientific Grant
Agency VEGA, grant No. 2/0161/18.

References

[1] Okeenea,
    http://www.okeenea.com/navigueo-hifi-audio-beacon/
[2] I. Harušťák, ÚNSS, Akustické informačné systémy s
    diaľkovým ovládaním pre nevidiacich, in: Mosty
    inklúzie 7/2015.
    http://www.nrozp-mosty.sk/temy-cisla-7-2015/item/1690-akusticke-informacne-systemy-s-dialkovym-ovladanim-pre-nevidiacich.html
[3] TYFLOSET® electronic orientation and information
    system for the visually impaired persons,
    http://www.apex-jesenice.cz/tyfloset.php?lang=en
[4] ELEN s.r.o., https://www.elen.sk/
[5] STARMON s.r.o., http://www.starmon.cz/
[6] H. Aust, M. Oerder, F. Seide, and V. Steinbiss, “The
    Philips automatic train timetable information system”,
    Speech Communication, Vol. 17, 1995, pp. 249-262.
[7] J. Juhar, S. Ondas, A. Cizmar, M. Rusko, G. Rozinaj
    and R. Jarina, “Development of Slovak
    GALAXY/voiceXML based spoken language dialogue
    system to retrieve information from the internet.”
    Proceedings of INTERSPEECH (2006).
[8] E. Klabbers, “High-quality speech output generation
    through advanced phrase concatenation”, Proceedings
    of the COST Workshop on Speech Technology in the
    Public Telephone Network, Rhodes, 1997, pp. 85-88.
[9] ACAPELA report,
    http://nationalpainreport.com/swedens-railway-stations-get-new-text-to-speech-technology-for-public-announcements-885121.html
[10] ACAPELA news,
     https://www.acapela-group.com/news/public-transport-acapela-group-creates-custom-voices-for-trafikverket/
[11] TextSpeak,
     https://www.textspeak.com/first-case-studies-2-2-2/
[12] Zainkó, Csaba et al. “A polyglot domain optimised
     text-to-speech system for railway station
     announcements.” Interspeech (2015).
[13] Dennis H. Klatt, "Software for a cascade/parallel
     formant synthesizer", J. Acoustical Society of America,
     67(3), March 1980.
[14] eSpeak, http://espeak.sourceforge.net
[15] S. Darjaa, M. Rusko, M. Trnka: Three generations of
     speech synthesis systems in Slovakia, Proceedings of
     XI International Conference Speech and Computer
     (SPECOM), 2006, pp. 297-302.
[16] A.J. Hunt, A.W. Black, Unit selection in a
     concatenative speech synthesis system using a large
     speech database, 1996 IEEE International Conference
     on Acoustics, Speech, and Signal Processing (ICASSP)
     Conference Proceedings, 1996, ISBN: 0-7803-3192-3.
[17] K. Tokuda, H. Zen and A.W. Black, “An HMM-based
     speech synthesis system applied to English.”
     Proceedings of 2002 IEEE Workshop on Speech
     Synthesis, 2002, pp. 227-230.
[18] S. Darjaa, et al.: HMM speech synthesizer in Slovak.
     In: GCCP 2011, Bratislava, 2011, pp. 212-221.
[19] H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko,
     A.W. Black, K. Tokuda, The HMM-based speech
     synthesis system version 2.0, Proc. of ISCA SSW6,
     Bonn, Germany, Aug. 2007.
[20] S. Imai, K. Sumita, Ch. Furuichi. Mel Log Spectrum
     Approximation (MLSA) filter for speech synthesis,
     Electronics and Communications in Japan 66(2),
     10-18, 1983.
[21] M. Sulír, J. Juhár, M. Rusko, Development of the
     Slovak HMM-Based TTS System and Evaluation of
     Voices in Respect to the Used Vocoding Techniques.
     Computing and Informatics, 35, 2016, pp. 1467-1490.
[22] H. Zen, A. Senior, M. Schuster, Statistical parametric
     speech synthesis using deep neural networks, IEEE
     International Conference on Acoustics, Speech, and
     Signal Processing (ICASSP), 2013, pp. 7962-7966.
[23] Z. Wu, O. Watts, S. King, "Merlin: An Open Source
     Neural Network Speech Synthesis System" in Proc.
     9th ISCA Speech Synthesis Workshop (SSW9),
     September 2016, Sunnyvale, CA, USA.
[24] M. Morise, F. Yokomori and K. Ozawa, “WORLD:
     A Vocoder-Based High-Quality Speech Synthesis
     System for Real-Time Applications.” IEICE
     Transactions 99-D, 2016, pp. 1877-1884.
[25] O. Bau, I. Poupyrev, A. Israr, and Ch. Harrison,
     TeslaTouch: electrovibration for touch surfaces. In
     Proceedings of the 23rd annual ACM symposium on
     User interface software and technology (UIST '10),
     ACM, New York, NY, USA, 2010, pp. 283-292.
[26] F. Vidal-Verdu, M. Hafez, “Graphical tactile displays
     for visually-impaired people.” IEEE Transactions on
     Neural Systems and Rehabilitation Engineering,
     15(1), 2007, pp. 119-130.
[27] J. Šimko, Š. Beňuš, M. Vainio, “Hyperarticulation in
     Lombard speech: Global coordination of the jaw, lips
     and the tongue.” In Journal of the Acoustical Society of
     America, vol. 139, no. 1, 2016, pp. 151-162.
[28] M. Hric, M. Chmulik, I. Guoth and R. Jarina, “SVM
     based speaker emotion recognition in continuous
     scale.” Proceedings of the 25th International
     Conference Radioelektronika, 2015, pp. 339-342.
[29] R. Sabo, J. Rajčáni, “Designing the Database of
     Speech Under Stress”, Journal of Linguistics
     (Jazykovedný časopis), Volume 68, Issue 2, 2017, pp.
     326-335.
[30] M. Sulír, J. Staš, J. Juhár, “Design of phonetically
     balanced SUS test for evaluation of Slovak TTS
     systems.” In: ELMAR-2014: 56th International
     Symposium, Zadar, Croatia, University of Zagreb,
     2014, pp. 35-38.
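
Note on the sentence-by-sentence generation planned in Section 5.
The effect on the response time can be illustrated with a small
timing model. The sketch below is only an illustration under
stated assumptions, not the implementation used in the device:
the function names and the sentence durations are hypothetical,
the synthesizer is assumed to work through the sentences one
after another without pausing, and only the real-time factor of
0.5 is taken from the paper.

```python
from typing import List


def block_start_delay(durations: List[float], rtf: float) -> float:
    """Delay before playback starts when the whole message is
    synthesized first (the behaviour of the current version).

    durations: spoken length of each sentence in seconds (hypothetical).
    rtf: real-time factor, i.e. synthesis time / audio duration.
    """
    return rtf * sum(durations)


def pipelined_start_times(durations: List[float], rtf: float) -> List[float]:
    """Playback start time of each sentence when sentence i+1 is
    synthesized while sentence i is being played.

    Assumption: the synthesizer processes sentences back to back,
    and a sentence can be played only once it is fully synthesized
    and the previous sentence has finished playing.
    """
    starts: List[float] = []
    synth_done = 0.0      # time at which the current sentence is ready
    playback_free = 0.0   # time at which the loudspeaker becomes free
    for d in durations:
        synth_done += rtf * d
        start = max(synth_done, playback_free)
        starts.append(start)
        playback_free = start + d
    return starts
```

For a hypothetical message of three sentences lasting 4, 6 and
5 seconds, block processing at a real-time factor of 0.5 delays
the first audio by 7.5 s, whereas in the pipelined model the
first sentence starts after 2.0 s and the following ones play
without a pause. The model also shows a limitation: a short
sentence followed by a much longer one (for example 1 s, then
4 s) still produces a silent gap, because the second sentence
is not ready when the first one ends.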

</pre>