<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Acoustic Output of the Railway Information Systems for Visually Impaired Passengers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ladislav Schichman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Milan Rusko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marián Trnka</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sakhia Darjaa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ELEN s.r.o., Ľubochnianska 16</institution>
          ,
          <addr-line>080 06 Ľubotice</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Speech Analysis and Synthesis, Institute of Informatics of the Slovak Academy of Sciences (II SAS), Dúbravská cesta 9</institution>
          ,
          <addr-line>845 07 Bratislava</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Decree of the Ministry of Environment of the Slovak Republic No. 532/2002, art. 2.5.2, laying down details of the general technical requirements for constructions and structures used by persons with reduced mobility, states that every basic information system must be complemented by an alternative solution for providing the blind and visually impaired persons with information (for example, an informant, an acoustic or tactile system, or a telephone information service) and by an optical system for the hearing impaired. The installation of several new information boards for the Slovak Railways was a good opportunity to introduce an automatic, remotely controlled audio output that provides visually impaired passengers with the same information as displayed on the information boards. The architecture of the information system is presented in the paper. Several types of speech synthesizers that were candidates for the speech generation are introduced, and it is explained which of them was used in the final solution and why. Potential issues of the system are pointed out and the future of railway information systems is discussed. The system was installed as a part of six new information boards at the Spišská Nová Ves railway station; it is being tested, and by the time this article is published it should already be in regular operation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <sec id="sec-1-1">
        <title>1.1 Motivation</title>
        <p>In recent years, efforts have been intensifying to make
information more accessible to blind people and to make
their orientation in the urban environment easier. The most
natural solution seems to be the use of audio signals and
speech announcements to supply the visually impaired with
the information that is provided in visual form to the rest
of the population.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Legal Status in the Czech Republic and Slovakia</title>
      <p>Thanks to the activities of the Union of the Blind,
thanks to the understanding of the responsible authorities in
the Czech Republic (CZ), and thanks to the support of equipment
manufacturers, the solution for making information
available to blind and partially sighted people has
advanced considerably further there than in Slovakia (SK). It has
even been incorporated into the legislation and has been
included in the Building Act. This has probably become an
inspiration for implementing similar systems in SK.</p>
      <p>In the CZ, the rules and obligations for building information
systems with regard to the needs of blind and partially
sighted citizens are given by the Methodology of
the Ministry for Regional Development of the CZ for
implementing the Decree of the Building Act 398/2009 Coll.,
“On general technical requirements ensuring barrier-free
use of buildings.”</p>
      <p>The guidance systems for the visually impaired can
generally be divided into two main groups: information on the
construction and operation of facilities on the construction
site, such as lifts, and information systems
providing the visually impaired with information on
the operation of transport systems, such as train departures and
arrivals, etc. The infotables must have an acoustic output, as
according to the Decree the basic information for public
orientation must be both visual and, if possible,
acoustic and tactile.</p>
      <p>These systems are generally activated remotely; the
information is given in the form of audio signals and voice
messages. In the CZ, the remote control shall be provided by
means of a receiver of electronically coded commands emitted
from a distance of at least 40 m. The radio command signal
frequency is 86.790 MHz for the CZ (and 87.100 MHz for
SK).</p>
      <p>In SK, the rules for information systems, with regard to
the needs of the blind, are addressed only in the Decree of
the Ministry of Environment of the Slovak Republic laying
down details of general technical requirements for
construction and general technical requirements for
structures used by persons with reduced mobility (Decree
no. 532/2002 Coll., Art. 2.5.2).</p>
      <p>It states that the basic information device must be
complemented by an alternative solution for providing
information to a blind person (for example, an informant,
acoustic or tactile system, telephone information service)
and an optical system for the hearing impaired.</p>
      <sec id="sec-2-1">
        <title>2 Devices for the information system for the blind and visually impaired</title>
        <p>Elements and devices for the information system for the
visually impaired are designed as a complement to the
existing hardware and software elements of electronic
information systems for the public.</p>
        <p>People have certainly long noticed the use of acoustic
beacons for blind people at traffic lights; however, sound
beacons today fulfil many other functions as well.
The following subsections describe the most common
devices used in SK and CZ.</p>
        <p>An acoustic (or sound) beacon is an electronic acoustic
device that has two main functions for the visually
impaired: orientation/navigation and information. The
beacon is controlled by a signal from a command
transmitter operated by the visually impaired person.
The navigation device emits a periodic sound signal that
allows a person with limited vision to locate the object (for
example, the entrance to a building) or to receive other
important information, such as the current status of
the traffic light or the length of
time remaining for safely crossing the road.</p>
        <p>
          Beacons and information systems are already
widely used in some European countries. For instance, the
SNCF (French National Railway Company) chose the
NAVIGUEO+HIFI navigation and information devices for
its train and subway stations. Okeenea won the RATP
(Parisian Public Transportation Company) tender for audio
beacons and will be the audio beacon supplier for the
entire Parisian metro network. More than 1,500 of these
devices are going to guide visually impaired people by
the end of 2019 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          In his article from 2015, Harušťák reports: “in the CZ,
there are 12 cities with sound-beacon equipped urban
public transport. In the SK, things have moved forward in the past two years.
The first swallow was the Transport Company of the City
of Žilina, which in early 2014 equipped the first 39 vehicles
(buses and trolleybuses) with acoustic beacons, and all
newly purchased vehicles will have beacons.” [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 The TYFLOSET® System by APEX</title>
        <p>In SK, TYFLOSET® System devices by APEX are used
to control the functions of audio information systems for
the blind (see Fig. 1.).</p>
        <p>The handheld transmitter has six buttons. Buttons 1 and
2 are used in the SK for orientation on streets, in public
buildings, on escalators and sidewalks, and for activating
sound information terminals. Buttons 3 and 4 are reserved
for public transport vehicles, and buttons 5 and 6 are reserved
for other applications. The 4-button command transmitter
can be built directly into the white stick. After pushing the
button, a control signal is transmitted and the beacon
answers by playing the pre-recorded sound or speech
information according to the desired function.</p>
        <p>The beacon can give a short beep that shows the visually
impaired person the direction towards the location marked by
the beacon. In buildings, the system can guide the blind to
the entrance of the building and help them find their way or the contact person.
In public transport and on railways the system can provide the
line number, driving direction, and driver announcements
on boarding or stops.</p>
        <p>
          The sound-equipped information boards should provide
voice information about the current timetable. Furthermore,
acoustic beacons can be installed to ensure the
orientation and safe movement of the visually impaired in the
areas of stations and terminals [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>The TYFLOSET® transmitter/receiver system produced
by APEX Ltd. consists of a set of portable,
mobile and static devices that serve both for acoustic and
voice information and for easier orientation of blind people.</p>
        <p>
          It is designed as a unified system for all types of acoustic
information and orientation in the Slovak Republic. This
means that one command transmitter, operated by the blind person,
activates all acoustic and voice information and
orientation systems. The transmitter frequency is 86.790 MHz
for the CZ and 87.100 MHz for SK [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2.3 The STARMON information systems with LED boards by ELEN s.r.o.</title>
      <p>ELEN Ltd. was established in 1991 in Prešov, Slovakia,
by developers who had been involved especially in
microprocessor applications, automation and robotics.
Since its establishment, it has focused on the
development and production of electronic information
panels and displays.</p>
      <p>
        Their most important applications range from railway
station information systems providing passengers with the
information on departures/arrivals of trains, information
displays on tram and bus stops, special displays for
hospitals, Metro (underground) information boards in
Prague, through exchange-rate boards for banks, to a
large-scale information board showing the state of the
environment in Budapest [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>Passenger information systems are an essential part of
any modern station. Their use significantly affects the
comfort and safety of passengers in public passenger
transport. Most of the ELEN information boards are
currently equipped with powerful LEDs as display
elements.</p>
      <p>
        ELEN Ltd. has been cooperating for many years with
the STARMON company from the Czech Republic, which has been
designing information systems for passengers in trains,
buses, and other types of public transport [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        The idea of using automatic speech processing in railway
information systems is straightforward, as
communication via speech is the most natural one for
humans. To give an example, in 1995 PHILIPS introduced a
sophisticated train timetable information service over the telephone
that provided accurate connections between 1,200 German
cities, using speech recognition, speech understanding,
dialogue management and voice output based on
prerecorded utterances [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        A similar voice-controlled system was developed in
Slovakia by four academic institutions in the years 2002 to
2006 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>3 The architecture and hardware components of the designed system</title>
        <p>When it was decided that the information
LED boards by ELEN Ltd. newly installed for the Slovak Railways should be
equipped with an on-demand voice-information feature for
the blind, STARMON delivered a hardware solution and
the Institute of Informatics of the Slovak Academy of
Sciences (II SAS) designed a synthetic voice for this new
feature.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3.1 The architecture of the system</title>
      <p>A block diagram of the information system is presented
in Fig. 2. The text information for the boards is sent from
the STARMON Information Server via the RS 485 bus to
the RS 485 interfaces both in the LED board motherboard
and in the Raspberry-based mini PC. When the blind person
presses the button on the TYFLOSET® VPN 02 handheld
transmitter or on the command transmitter in the folding stick
VPN 403, the 87.1 MHz control signal is emitted. Once this signal
is received by the TYFLOSET® receiver, the control unit of
the Raspberry PC prepares a message based on the actual text
displayed on the board and launches the text preprocessing
program and the speech synthesis program itself. The text
preprocessing turns abbreviations and numerals into full
text and corrects diacritics and pronunciation. The Text-to-Speech
Synthesizer (TTS) transforms the text into a synthetic
human-speech-like audio signal. This is then amplified and
played via a loudspeaker.</p>
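      <p>To make the message flow concrete, the following minimal Python sketch mirrors the sequence described above (receiver trigger, message preparation from the currently displayed text, text preprocessing, synthesis, playback). All names used here (read_displayed_text, expand_abbreviations_and_numerals, tts.synthesize, audio_out.play) are hypothetical placeholders, not the actual STARMON/II SAS implementation.</p>
      <preformat>
# Hypothetical sketch of the on-demand voice output; the function and object
# names are illustrative placeholders, not the deployed STARMON / II SAS code.

def read_displayed_text(board):
    """Return the text currently shown on the LED board (received via RS 485)."""
    return board.current_text  # placeholder attribute

def expand_abbreviations_and_numerals(text):
    """Text preprocessing: expand abbreviations and numerals into full words
    (heavily simplified example table)."""
    replacements = {"hl. st.": "hlavná stanica", "nást.": "nástupište"}
    for short, full in replacements.items():
        text = text.replace(short, full)
    return text

def announce(board, tts, audio_out):
    """Triggered when the TYFLOSET receiver detects the 87.1 MHz command."""
    raw_text = read_displayed_text(board)
    full_text = expand_abbreviations_and_numerals(raw_text)
    waveform = tts.synthesize(full_text)   # Kempelen 4.0 HMM TTS in the deployed system
    audio_out.play(waveform)               # amplified and played via the loudspeaker
      </preformat>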
    </sec>
    <sec id="sec-5">
      <title>3.2 The mini computer and audio hardware</title>
      <p>The heart of the sound-generating system is a Mini PC
Raspberry Pi 2 computer equipped with RS 485 bus
communication circuits (see Fig. 3.). The Raspberry Pi is a
small single-board computer whose primary operating
system is Raspbian Linux.</p>
      <p>It is equipped with a Broadcom quad-core BCM2836 CPU
with a 900 MHz clock, 1 GB RAM, four USB 2.0
connectors, an HDMI interface and a microSD slot.</p>
      <p>The output power sufficient for driving the loudspeaker system
is provided by a miniature audio amplifier mounted to the PC
board.</p>
      <p>The AMC VIVA 4IP loudspeaker used is suitable for outdoor
installation in wet conditions (IP55). The IP55 category means
almost complete ingress protection against particles and a good level
of protection against water. The dual-band speaker system
provides a maximum power of 20 W and a sound pressure level of 89
dB at 1 W of power and 1 m distance. The maximum sound pressure
level (SPL) at 1 m distance (at 3254 Hz) is 102 dB, and the
frequency response is 90 Hz - 20 kHz. The radiation angle at 1000
Hz is 150° (horizontal) and 120° (vertical), which gives the
system good spatial coverage. The low-tone speaker is 4” and the
high-tone speaker is 1” in diameter.</p>
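      <p>For orientation, the quoted figures are mutually consistent: with a sensitivity of 89 dB (1 W / 1 m), driving the speaker at its maximum power of 20 W adds roughly 10·log10(20/1) ≈ 13 dB, which gives the stated maximum of about 102 dB SPL at 1 m.</p>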
      <sec id="sec-5-1">
        <title>4 Speech synthesis on the railways</title>
        <p>
          Playing acoustic signals, voice messages and
announcements is the most widely used way of informing
passengers at railway stations and on trains (e.g. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]).
        </p>
        <p>The traditional solution was to use recorded prompts
with “slot filling”, combining various sections of speech
utterances to produce the required voice message.</p>
        <p>This approach is, however, very inflexible. It is unable to
interpret new messages with an unforeseen utterance structure
or out-of-vocabulary words. On the other hand, some years
ago the intelligibility and naturalness of synthesized
speech utterances could hardly compete with voice
messages obtained by concatenation of pre-recorded words
and phrases.</p>
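        <p>As a schematic illustration of the slot-filling approach (an invented example, not any particular deployed system), a fixed carrier prompt is assembled from pre-recorded fragments; anything missing from the recorded inventory simply cannot be announced:</p>
        <preformat>
# Schematic illustration of "slot filling" with pre-recorded prompts.
# The file names and the prompt structure are invented for this example.

PROMPTS = {
    "carrier_1": "the_train.wav",       # "The train"
    "carrier_2": "from_platform.wav",   # "departs from platform"
    "Os 7812":   "os_7812.wav",
    "3":         "platform_3.wav",
}

def build_announcement(train_number, platform):
    """Return the ordered list of recordings to be played back."""
    return [PROMPTS["carrier_1"], PROMPTS[train_number],
            PROMPTS["carrier_2"], PROMPTS[platform]]

# Any train number or platform missing from PROMPTS raises a KeyError,
# which is exactly the inflexibility discussed above.
print(build_announcement("Os 7812", "3"))
        </preformat>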
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4.1 The “true” TTS on foreign railways</title>
      <sec id="sec-6-1">
        <title>4.2.2 Concatenative synthesis - Kempelen 2.0</title>
      <p>The alternative represented by the use of “true” speech
synthesis is currently being used more and more. For the
sake of simplicity, let us define “true” speech synthesis
as a system that is able to interpret – i.e. read aloud – any
(even unknown) text in a given language with sufficient
naturalness and intelligibility. Let us introduce some
examples.</p>
        <p>
        In 2010 it was decided that the Swedish Transport
Administration, Trafikverket, would use a text-to-speech
public announcement system to relay passenger
information to travelers at train stations across the country.
The text-to-speech synthetic voice was created by Acapela
Group [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          The TextSpeak company reports that their “TTS-EM
modules have been integrated in 2017 to announce
passenger information across New York City (DOT) and
Los Angeles (LA Metro) for audio and ADA compliance in
new smart bus shelters. Additional deployments in 2017
include 1000s of information displays across Europe
including France, Germany and Scandinavia.” [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]
        </p>
        <p>
          Of the nearby countries, Hungarian developers are probably
the farthest along in the deployment of synthesizers in station
announcement systems. Their system has been in operation
at the largest passenger railway station of Hungary since
June 2014 and has been installed at more than 60 other
stations and stops [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
    </sec>
    <sec id="sec-7">
      <title>4.2 TTS - The candidate systems for voice generation</title>
      <p>Many technical approaches in the
history of modern speech synthesis have been successfully
tried out, and we have tried most of them in our systems
too. (We skip the historical experiments, like
Wolfgang von Kempelen’s speaking machine or even
Homer Dudley's vocoder.)</p>
      <sec id="sec-7-1">
        <title>4.2.1 Formant synthesis - eSpeak</title>
        <p>
          One of them was based on the role of the speech
spectrum resonances (formants) in the human ability to
identify various phonemes. Using two simple types of
excitation signals and formant filters, Klatt was able to
design a formant synthesizer [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] producing intelligible
speech.
        </p>
        <p>The STARMON company delivered their sound
hardware equipped with the eSpeak Slovak voice.</p>
        <p>
          eSpeak is a compact open-source software speech
synthesizer which uses a “formant synthesis” method. This
allows the TTS to be provided in a small size. The speech
is relatively clear and needs only a short time to produce
utterances, but it is not as natural or smooth as
synthesizers based on the newer approaches of concatenating
elements of pre-recorded human speech [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. As the
Department of Speech Analysis and Synthesis of the II
SAS has been working on speech synthesis in Slovak
since 1989 and has developed several synthesizers of
much higher quality, eSpeak was soon excluded from the
list of potential candidates because of the unnaturalness of
the produced speech.
        </p>
      </sec>
      <sec id="sec-7-1b">
        <title>4.2.2 Concatenative synthesis - Kempelen 2.0</title>
        <p>
          The expression “concatenative synthesis” designates in
general any synthesis method using concatenation of
prerecorded speech segments (e.g. sentences, phrases, words,
syllables, diphones, phonemes, or their parts). One of our
concatenative synthesizers, Kempelen 2.0 [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], had been
used in the services of the Slovak telephone operators in
their SMS-to-Voice services for about fifteen years. It was
a diphone synthesizer using an overlap-add method similar
to PSOLA for pitch manipulation, following the
CART-tree-based F0 and duration models. This synthesizer was
called Robo-teta (Robo-aunt) by Slovak Telekom for
the robotic character of its female voice. This was mainly
due to the small number of implementations of diphones
(synthesis elements), which led to spectral monotony. The
second weak point was the imperfect modeling of
intonation and speech rhythm, leading to repetitive
prosodic patterns that sounded mechanical.
        </p>
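        <p>A heavily simplified sketch of diphone concatenation is shown below: stored diphone waveforms are simply joined with a short linear cross-fade. It only illustrates the general principle; the PSOLA-like overlap-add and the CART-based prosody models of Kempelen 2.0 are not reproduced here, and the waveforms are synthetic stand-ins.</p>
        <preformat>
import numpy as np

# Heavily simplified diphone concatenation: one recording per diphone, joined
# with a short linear cross-fade. Kempelen 2.0 additionally applied a
# PSOLA-like overlap-add with CART-based F0 and duration targets.

def concatenate_diphones(units, fade=80):
    """Join diphone waveforms with a linear cross-fade of `fade` samples."""
    out = units[0].astype(np.float32)
    ramp = np.linspace(0.0, 1.0, fade, dtype=np.float32)
    for unit in units[1:]:
        unit = unit.astype(np.float32)
        overlap = out[-fade:] * (1.0 - ramp) + unit[:fade] * ramp
        out = np.concatenate([out[:-fade], overlap, unit[fade:]])
    return out

# Example with synthetic "diphones" (sine bursts standing in for recordings).
t = np.arange(2000) / 16000.0
units = [np.sin(2 * np.pi * f * t) for f in (120.0, 140.0, 160.0)]
speech_like = concatenate_diphones(units)
        </preformat>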
      </sec>
      <sec id="sec-7-2">
        <title>4.2.3 Unit Selection synthesis (UniSel) - Kempelen 3.0</title>
        <p>
          The Unit Selection synthesis [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] is probably the most
successful and most used method among the approaches
using waveform concatenation algorithms.
        </p>
        <p>The synthesis elements can be of different lengths
(triphones, diphones, phones, subphones, etc.). These are
chosen from multiple candidates contained in a large
speech database according to their phonetic, word and sentence
context and to their F0 and duration.</p>
        <p>
          Our Unit Selection synthesizer, Kempelen 3.0 was
completely developed at the Department of Speech
Analysis and Synthesis of the II SAS [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The CART
trees that were used in the first versions for prosody
modeling were later replaced by HMM models that
generate the target values of F0 and duration for every
phoneme sought in the database.
        </p>
        <p>A syllable was chosen as the base unit of synthesis,
which contributes to the natural rhythm of the resulting
speech. Unwanted artifacts at the connection points of the
syllables are rare and mostly come from imperfect
automatic phoneme alignment in the database.</p>
        <p>The minimum size of the speech database is about two
hours of speech recordings. This database has to be stored
on disk or loaded into memory. The memory
footprint of this database is large and the process of reading
the candidate elements is time-consuming, as it is not
optimized for speed in the current version.</p>
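        <p>The selection itself can be viewed as a shortest-path search over the candidate units: each candidate is scored by a target cost (how well it matches the required context, F0 and duration) plus a join cost to its neighbour, and the cheapest sequence is kept. The following dynamic-programming sketch, with the cost functions left as user-supplied callables, is only an illustration of this principle, not the Kempelen 3.0 implementation.</p>
        <preformat>
# Minimal unit-selection sketch: dynamic programming over candidate units per
# target position, with user-supplied target and join costs (illustration only).

def select_units(candidates, target_cost, join_cost):
    """candidates: one list of candidate units per target position."""
    # best[i][j] = (accumulated cost, index of best predecessor)
    best = [[(target_cost(0, c), -1) for c in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for c in candidates[i]:
            costs = [best[i - 1][k][0] + join_cost(prev, c)
                     for k, prev in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[k_min] + target_cost(i, c), k_min))
        best.append(row)
    # Back-trace the cheapest path.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))
        </preformat>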
      </sec>
      <sec id="sec-7-3">
        <title>4.2.4 Statistical Parametric synthesis with</title>
      </sec>
      <sec id="sec-7-4">
        <title>Markov modeling (HMM TTS), Kempelen 4.0</title>
      </sec>
      <sec id="sec-7-5">
        <title>Hidden</title>
        <p>
          The statistical parametric speech synthesis uses statistical
modeling based on Hidden Markov Models (HMMs) to
create estimates of F0, duration and the spectral envelope (in
the form of Mel-cepstrum coefficients) to drive the vocoder
and generate the synthetic speech. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]
        </p>
        <p>
          Our “HMM speech synthesizer” was developed in 2011
[
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. It is based on HTS Speech Synthesis Toolkit [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. The
context-dependent HMM models were trained on our Slovak
speech databases as generative models for the speech synthesis
process. The system was supplemented by various
language-specific components, such as text preprocessing,
letter-to-phoneme conversion, etc.
        </p>
        <p>
          The original version of the synthesizer uses the Mel Log
Spectrum Approximation (MLSA) Vocoder [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. Speech
parameters are generated from HMMs with dynamic
features, namely multi-space probability distribution
HMMs (MSD-HMMs). The MLSA filter is excited using a
simple excitation: an impulse train for voiced and random
noise for unvoiced segments.
        </p>
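        <p>This simple excitation can be illustrated directly: voiced frames are driven by an impulse train at the predicted F0 and unvoiced frames by white noise; the resulting signal is then shaped by the MLSA filter built from the Mel-cepstral coefficients. The sketch below (a rough illustration with assumed frame length and sampling rate, the MLSA filtering step itself omitted) shows only the excitation generation.</p>
        <preformat>
import numpy as np

# Rough sketch of the pulse/noise excitation used with the MLSA filter.
# f0_per_frame comes from the HMMs (0.0 marks an unvoiced frame); the MLSA
# filtering step itself is omitted here. Frame length and sampling rate are
# assumed values for the illustration.

def make_excitation(f0_per_frame, frame_len=80, fs=16000, seed=0):
    rng = np.random.default_rng(seed)
    excitation = []
    phase = 0.0
    for f0 in f0_per_frame:
        frame = np.zeros(frame_len)
        if f0 > 0.0:                      # voiced: impulse train at F0
            period = fs / f0
            while phase &lt; frame_len:
                frame[int(phase)] = 1.0
                phase += period
            phase -= frame_len            # carry the pitch phase to the next frame
        else:                             # unvoiced: white noise
            frame = rng.standard_normal(frame_len) * 0.1
            phase = 0.0
        excitation.append(frame)
    return np.concatenate(excitation)

# Example: 50 voiced frames at 120 Hz followed by 20 unvoiced frames.
exc = make_excitation([120.0] * 50 + [0.0] * 20)
        </preformat>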
        <p>
          Experiments and comparisons were done with HMM
synthesizers using more sophisticated vocoders [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ],
however, these were not in the public domain and would have increased
the price of the system.
        </p>
      </sec>
      <sec id="sec-7-6">
        <title>4.2.5 Statistical Parametric synthesis with DNN modeling (DNN TTS), Kempelen 5.0</title>
        <p>
          The recent massive increase in available computing power
and memory capacity, together with the use of parallel computing and
graphics processors, has made it possible to use
different types of neural networks as models for
statistical parametric synthesis [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
        <p>
          Our “Deep Neural Network (DNN) synthesizer“ was
designed using the Merlin toolkit for building DNN
models for statistical parametric speech synthesis [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. It
was used in combination with a front-end text processor
designed at II SAS and the WORLD vocoder [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ].
        </p>
        <p>We found out that the amount of training speech data
necessary for getting a satisfactory quality of the resulting
voice was highly speaker-dependent. While it was
enough to use about two and a half hours of speech of our
male speaker Milan to get reasonable naturalness and
intelligibility, about ten hours were needed to create our
female voice Dagmar. Further increasing the volume of
training data should lead to an increase in quality, but one
has to make sure that the recordings are consistent in style,
recording channel, etc.</p>
        <p>The quality of DNN voices is generally very high,
especially in terms of natural intonation, rhythm and
timbre of voice. However, the artifacts of vocoding are still
audible in the form of a slight buzz.</p>
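        <p>Conceptually, the DNN replaces the decision-tree clustering of the HMM system with a network that maps per-frame linguistic feature vectors to acoustic features (Mel-cepstrum, log-F0, aperiodicity) which the WORLD vocoder then turns into a waveform. The toy feed-forward mapping below only illustrates this idea; the dimensionalities and weights are invented, and Merlin's actual architecture, feature sets and training are not reproduced.</p>
        <preformat>
import numpy as np

# Toy feed-forward mapping from per-frame linguistic features to acoustic
# features, for illustration only; not the Merlin architecture used in
# Kempelen 5.0, and the dimensionalities below are invented.

rng = np.random.default_rng(0)
n_ling, n_hidden, n_acoustic = 300, 256, 62

W1 = rng.standard_normal((n_ling, n_hidden)) * 0.01
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal((n_hidden, n_acoustic)) * 0.01
b2 = np.zeros(n_acoustic)

def predict_acoustic_frames(linguistic_frames):
    """Map (n_frames, n_ling) linguistic features to (n_frames, n_acoustic)."""
    hidden = np.tanh(linguistic_frames @ W1 + b1)
    return hidden @ W2 + b2        # e.g. Mel-cepstrum + log-F0 + aperiodicity

frames = rng.standard_normal((100, n_ling))   # stand-in for real linguistic features
acoustic = predict_acoustic_frames(frames)    # would be fed to the WORLD vocoder
        </preformat>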
        <sec id="sec-7-6-1">
          <title>5 Results and discussion</title>
          <p>As mentioned in the description of the legal status, the
Decree of the Ministry of Environment of the Slovak
Republic no. 532/2002 Coll., Art. 2.5.2., introduces an
obligation to provide information to a blind person in an
appropriate way (for example, an informant, acoustic or
tactile system, telephone information service) and an
optical system for the hearing impaired. This offers several
alternative possibilities to the voice messages.</p>
          <p>Providing information by a human informant is a
relatively expensive solution, taking into account that the
number of railway stations in SK with so-called “Comprehensive
services for passengers”, which should provide this
service, is more than 60.</p>
          <p>
            Tactile displays are both rare and expensive (see e.g.
TeslaTouch [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ]). Vidal-Verdu and co-authors present an
up-to-date survey of graphical tactile displays which could
be used by visually impaired people. However, most of
them are research prototypes and the expenses to produce
them commercially would currently be too high. Thus the
goal of an efficient low-cost tactile display for
visually impaired people has not yet been reached [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ].
          </p>
          <p>An information system equipped with a speech output
using a speech synthesizer thus proves to be one of the
most appropriate solutions at present.</p>
          <p>The authors considered the properties, possibilities and
hardware requirements of five types of synthesizers – one
public-domain formant synthesizer and four synthesizers
produced by II SAS.</p>
          <p>eSpeak was excluded from the list of potential
candidates because of the unnaturalness of the produced
speech.</p>
          <p>It was decided that despite its reliability and high speed
of speech production, the Kempelen 2.0 diphone
concatenative synthesizer is outdated and should not be
used in the current public information system.</p>
          <p>The Kempelen 3.0 Unit Selection synthesizer has the
disadvantage of relatively slow speech generation caused
by reading the element candidates from memory. It was
therefore not possible to use this synthesizer in the designed
information system, even though it produces a pleasant and
natural voice.</p>
          <p>The Kempelen 5.0 DNN synthesizer is four times slower
than the Kempelen 4.0 HTS synthesizer, mainly due to the
higher volume of calculations needed by the WORLD vocoder.
DNN models are about 100 times larger than HMM models, and
their memory requirements, as well as the time required to
load them, are considerably higher too.</p>
          <p>Therefore, a compromise was made between speech
quality and the speed, and the Kempelen 4.0 parametric
HMM statistical synthesizer was selected to be used in the
current version of the information system.</p>
          <p>Six new voice-equipped information boards have been
installed at the Spišská Nová Ves railway station. The system
is being tested, and at the time of publication of this paper
it should already be in regular operation.</p>
          <p>To conclude, we have to mention several potential issues
that still have to be worked on.</p>
          <p>
            The response time of Kempelen 4.0 is approximately 0.5
times real time on the Raspberry Pi 2 system. The current
version processes the whole message and then reads it out in one
block. It is planned that the next version will generate
speech sentence by sentence while the previous utterance is playing
(a possible scheme is sketched after this paragraph).
This will reduce the reaction time requirements
significantly and enable the use of other types of
synthesizers. It will also be possible to consider
implementing the option of setting a higher emotional
arousal, or voice effort, as is usual with warning messages,
Lombard speech [
            <xref ref-type="bibr" rid="ref27">27</xref>
            ], or emotion cues [
            <xref ref-type="bibr" rid="ref28">28</xref>
            ]. Of course, in
that case an emotional-speech database would have to be
used to train the synthesizer [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ].
          </p>
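          <p>One possible way to overlap synthesis and playback, sketched below, is a simple producer/consumer scheme in which sentence n+1 is synthesized while sentence n is being played. The synthesize and play callables stand for the TTS and audio-output calls of the real system and are placeholders here; this is a sketch of the planned behaviour, not the current implementation.</p>
          <preformat>
import queue
import threading

# Sketch of sentence-by-sentence synthesis overlapped with playback.
# `synthesize` and `play` are placeholders for the real TTS and audio output.

def stream_announcement(sentences, synthesize, play):
    q = queue.Queue(maxsize=1)            # at most one pre-synthesized sentence waiting

    def producer():
        for sentence in sentences:
            q.put(synthesize(sentence))   # runs while the previous item is playing
        q.put(None)                       # end-of-message marker

    threading.Thread(target=producer, daemon=True).start()
    while True:
        waveform = q.get()
        if waveform is None:
            break
        play(waveform)
          </preformat>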
          <p>
            The intelligibility of the output speech of the
synthesizers should still be tested using standard
methods, e.g. a phonetically balanced SUS test [
            <xref ref-type="bibr" rid="ref30">30</xref>
            ].
          </p>
          <p>The range of the radio transmitter has to be set correctly
to prevent multiple triggering and reading by several
information systems simultaneously.
</p>
        </sec>
        <sec id="sec-7-6-2">
          <title>6 Conclusion</title>
          <p>We introduced a new voice-equipped information system
developed for the Slovak Railways that combines visual text
information on LED information boards with
on-demand reading of the same text content using speech synthesis in
Slovak.</p>
          <p>Following the analysis and experiments, the Kempelen 4.0
HMM synthesizer was implemented in the current version
of the device. The authors hope that their product will help
blind and partially sighted passengers to obtain the
needed information more comfortably.</p>
        </sec>
        <sec id="sec-7-6-3">
          <title>Acknowledgment</title>
          <p>The authors would like to express thanks to Mr. Milan
Slanina, the Head of Service and Design of the STARMON
Ltd., who provided the authors with all the necessary
information and participated in the development of this
product.</p>
          <p>This work was supported by the Slovak Scientific Grant
Agency VEGA, grant No. 2/0161/18.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Okeenea</surname>
          </string-name>
          , http://www.okeenea.com/navigueo-hifi-audio-beacon/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>I. Harušťák</surname>
          </string-name>
          ,
          <string-name>
            <surname>ÚNSS</surname>
          </string-name>
          ,
          <article-title>Akustické informačné systémy s diaľkovým ovládaním pre nevidiacich</article-title>
          , in: Mosty inklúzie 7/
          <year>2015</year>
          . http://www.nrozp-mosty.sk/temycisla-7-2015/item/1690-akusticke
          <article-title>-informacnesystemy-s-dialkovym-ovladanim-pre-nevidiacich</article-title>
          .html
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>[3] TYFLOSET® electronic orientation and information system for the visually impaired persons</article-title>
          , http://www.apex-jesenice.cz/tyfloset.php?lang=en
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>[4] ELEN s</article-title>
          .r.o., https://www.elen.sk/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <article-title>[5] STARMON s</article-title>
          .r.o., http://www.starmon.cz/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Aust</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Oerder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Seide</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Steinbiss</surname>
          </string-name>
          , “
          <article-title>The Philips automatic train timetable information system”</article-title>
          ,
          <source>Speech Communication</source>
          , Vol.
          <volume>17</volume>
          ,
          <year>1995</year>
          , pp.
          <fpage>249</fpage>
          -
          <lpage>262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Juhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ondas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cizmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rusko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rozinaj</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Jarina</surname>
          </string-name>
          , “
          <article-title>Development of Slovak GALAXY/voiceXML based spoken language dialogue system to retrieve information from the internet</article-title>
          .
          <source>” Proceedings of INTERSPEECH</source>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Klabbers</surname>
          </string-name>
          , “
          <article-title>High-quality speech output generation through advanced phrase concatenation”</article-title>
          ,
          <source>Proceedings of the COST Workshop on Speech Technology in the Public Telephone Network., Rhodes</source>
          ,
          <year>1997</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <article-title>[9] ACAPELA report</article-title>
          , http://nationalpainreport.com
          <article-title>/swedens-railwaystations-get-new-text-to-speech-technology-for-publicannouncements-885121</article-title>
          .html
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] ACAPELA news, https://www.acapela-group.
          <article-title>com/news/public-transportacapela-group-creates-custom-voices-for-trafikverket/</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11] TextSpeak, https://www.textspeak.
          <source>com/first-case-studies-2-2-2/</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Zainkó</surname>
          </string-name>
          , Csaba et al. “
          <article-title>A polyglot domain optimised text-to-speech system for railway station announcements</article-title>
          .”
          <string-name>
            <surname>Interspeech</surname>
          </string-name>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Dennis H.</given-names>
            <surname>Klatt</surname>
          </string-name>
          ,
          <article-title>"Software for a cascade/parallel formant synthesizer"</article-title>
          <source>J. Acoustical Society of America</source>
          ,
          <volume>67</volume>
          (
          <issue>3</issue>
          ) March
          <year>1980</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] eSpeak, http://espeak.sourceforge.net</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Darjaa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rusko</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Trnka: Three generations of speech synthesis systems in Slovakia</article-title>
          ,
          <source>Proceedings of XI International Conference Speech and Computer (SPECOM)</source>
          ,
          <year>2006</year>
          , pp.
          <fpage>297</fpage>
          -
          <lpage>302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.J.</given-names>
            <surname>Hunt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.W.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <article-title>Unit selection in a concatenative speech synthesis system using a large speech database</article-title>
          ,
          <source>1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) Conference Proceedings</source>
          ,
          <year>1996</year>
          , ISBN:
          <fpage>0</fpage>
          -
          <lpage>7803</lpage>
          -3192-3.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Tokuda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zen</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.W.</given-names>
            <surname>Black</surname>
          </string-name>
          , “
          <article-title>An HMM-based speech synthesis system applied to English</article-title>
          .”
          <source>Proceedings of 2002 IEEE Workshop on Speech Synthesis</source>
          ,
          <year>2002</year>
          . (
          <year>2002</year>
          ):
          <fpage>227</fpage>
          -
          <lpage>230</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Darjaa</surname>
          </string-name>
          , et. al.:
          <article-title>HMM speech synthesizer in Slovak</article-title>
          .
          <source>In: GCCP</source>
          <year>2011</year>
          , Bratislava,
          <year>2011</year>
          , pp.
          <fpage>212</fpage>
          -
          <lpage>221</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yamagishi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sako</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Masuko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.W.</given-names>
            <surname>Black</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tokuda</surname>
          </string-name>
          ,
          <source>The HMM-based speech synthesis system version 2.0, Proc. of ISCA SSW6</source>
          , Bonn, Germany, Aug.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Imai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sumita</surname>
          </string-name>
          , Ch. Furuichi.
          <article-title>Mel Log Spectrum Approximation (MLSA) filter for speech synthesis</article-title>
          ,
          <source>Electronics and Communications in Japan</source>
          <volume>66</volume>
          (
          <issue>2</issue>
          ),
          <fpage>10</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>1983</year>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sulír</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Juhár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rusko</surname>
          </string-name>
          ,
          <article-title>Development of the Slovak HMM-Based TTS System and Evaluation of Voices in Respect to the Used Vocoding Techniques</article-title>
          .
          <source>Computing and Informatics</source>
          ,
          <volume>35</volume>
          ,
          <year>2016</year>
          , pp.
          <fpage>1467</fpage>
          -
          <lpage>1490</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Senior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schuster</surname>
          </string-name>
          ,
          <article-title>Statistical parametric speech synthesis using deep neural networks</article-title>
          ,
          <source>IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>7962</fpage>
          -
          <lpage>7966</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Watts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>King</surname>
          </string-name>
          ,
          <article-title>"</article-title>
          <source>Merlin: An Open Source Neural Network Speech Synthesis System" in Proc. 9th ISCA Speech Synthesis Workshop (SSW9)</source>
          ,
          <year>September 2016</year>
          , Sunnyvale, CA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Morise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Yokomori</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Ozawa</surname>
          </string-name>
          , “WORLD:
          <article-title>A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications</article-title>
          .” IEICE Transactions 99-D (
          <year>2016</year>
          ):
          <fpage>1877</fpage>
          -
          <lpage>1884</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>O.</given-names>
            <surname>Bau</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Poupyrev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Israr</surname>
          </string-name>
          , and
          <string-name>
            <surname>Ch</surname>
          </string-name>
          . Harrison,
          <article-title>TeslaTouch: electrovibration for touch surfaces</article-title>
          .
          <source>In Proceedings of the 23nd annual ACM symposium on User interface software and technology (UIST '10</source>
          ,. ACM, New York, NY, USA,
          <year>2010</year>
          ,
          <fpage>283</fpage>
          -
          <lpage>292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>F.</given-names>
            <surname>Vidal-Verdu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hafez</surname>
          </string-name>
          , “
          <article-title>Graphical tactile displays for visually-impaired people</article-title>
          .
          <source>” IEEE Transactions on Neural Systems and Rehabilitation Engineering</source>
          ,
          <year>2007</year>
          ,
          <volume>15</volume>
          (
          <issue>1</issue>
          ),
          <year>2007</year>
          , pp.
          <fpage>119</fpage>
          −
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J.</given-names>
            <surname>Šimko</surname>
          </string-name>
          , Š. Beňuš, M. Vainio, “
          <article-title>Hyperarticulation in Lombard speech: Global coordination of the jaw, lips and the tongue</article-title>
          .”
          <source>In Journal of the Acoustical Society of America</source>
          ,
          <year>2016</year>
          , vol.
          <volume>139</volume>
          , no.
          <issue>1</issue>
          ,
          <issue>2016</issue>
          , pp.
          <fpage>151</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hric</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chmulik</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Guoth</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Jarina</surname>
          </string-name>
          , “
          <article-title>SVM based speaker emotion recognition in continuous scale</article-title>
          .
          <source>” Proceedings of the 25th International Conference Radioelektronika</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>339</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sabo</surname>
          </string-name>
          , J. Rajčáni, “
          <article-title>Designing the Database of Speech Under Stress”</article-title>
          ,
          <source>Journal of Linguistics (Jazykovedný časopis)</source>
          ,
          <source>Volume 68: Issue</source>
          <volume>2</volume>
          ,
          <issue>4016</issue>
          , pp.
          <fpage>326</fpage>
          -
          <lpage>335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sulír</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Staš</surname>
          </string-name>
          , J. Juhár, “
          <article-title>Design of phonetically balanced SUS test for evaluation of Slovak TTS systems” / - 2014</article-title>
          . In: Elmar-2014
          <source>: 56th International Symposium</source>
          , Zadar, Croatia, University of Zagreb,
          <year>2014</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>