=Paper=
{{Paper
|id=Vol-2714/icaiw_waai_2
|storemode=property
|title=Edge Solution with Machine Learning and Open Data to Interpret Signs for People with Visual Disability
|pdfUrl=https://ceur-ws.org/Vol-2714/icaiw_waai_2.pdf
|volume=Vol-2714
|authors=Fabian Velosa,Hector Florez
|dblpUrl=https://dblp.org/rec/conf/icai2/VelosaF20
}}
==Edge Solution with Machine Learning and Open Data to Interpret Signs for People with Visual Disability==
Fabian Velosa, Hector Florez
Universidad Distrital Francisco Jose de Caldas, Bogota, Colombia
Abstract
Transport systems in growing cities present communication barriers for people with visual impairment
that can prevent their caregivers’ locomotion, making a city unequal, and limiting the full exercise of
rights. Some current solutions and prototypes do not fully solve this problem. This paper proposes an
application that uses machine learning and open data to identify, extend, and communicate the available
information on the signs placed by the urban transport system of Bogota, Colombia using a mobile
device without the need to use any internet connection. The solution presents a replicable model suitable
for other transport systems. The result is a free mobile app, available in the Android store, that covers
the bus routes in Bogota. Future work considers extending the strategy to the public transport
platform of any city.
Keywords
Machine Learning, Object Detection, Optical Character Recognition, Accessibility, Transport System
1. Introduction
Transport systems shape many aspects of people’s lives in cities that experience strong urban
growth. However, these systems present different types of barriers for people with disabilities.
Making a city accessible requires not only specialized infrastructure but also the possibility of
reaching the city through the transport system, which is itself a tool to guarantee rights. The set
of situations that prevent citizens from fully exercising their rights is known as barriers [1].
Communication barriers are the obstacles that hinder access to information and its flow. These
barriers particularly affect people with partial or complete sensory disabilities, making it
difficult for them to communicate and access information. There is therefore a communication
barrier in the information systems related to urban transport and accessibility. Thus, appropriate
technologies can be used to extend the information available on graphic signs and thereby
provide auditory accessibility.
In this project, we developed a model that has been tested through an application for mobile
devices with the Android operating system. The application implements Machine Learning
functionalities and open data to consult the information of the routes at the zonal stops of the
Integrated System of Urban Transport (SITP) of the city of Bogota, Colombia, without requiring
an Internet connection. This mobile application is designed for people with partial or complete
visual disability.
ICAIW 2020: Workshops at the Third International Conference on Applied Informatics 2020, October 29–31, 2020, Ota,
Nigeria
velosaf@acm.org (F. Velosa); haflorezf@udistrital.edu.co (H. Florez)
ORCID 0000-0002-5339-4459 (H. Florez)
© 2020 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
The paper is structured as follows. Section 2 contextualizes the city of Bogotá, the SITP trans-
port system, and visual disability. Section 3 presents the most relevant technological solutions
from different approaches, the available applications related to routes, academic prototypes,
accessibility applications in general, and the current infrastructure. Section 4 illustrates
the solution developed based on the life cycle of a Machine Learning project. Section 5 reports
the results obtained and, finally, Section 6 presents the most relevant conclusions of the project.
2. Disability and SITP
Disability is a manifestation of human diversity, which occurs either congenitally or at any
time in life. It materializes when a person who tries to interact with society or with the
environment finds barriers in a context that is not prepared for this interaction, which generates
exclusion [1].
According to the contract "Design and Evaluation of Accessibility Alternatives for SITP in
Condition of Disability" between the town hall of Bogota and a private consultant, Bogota had
approximately 195,821 people with disabilities between 2011 and 2012. Of the characterized
population, 20% have some form of visual disability, and this group accounts for 21% of the daily
trips made by people with disabilities in 2011. The contract also identifies the absence of aids
for sensory disabilities at stops and stations as a barrier that makes it difficult to carry out
activities and travel. Of all the people with disabilities who carry out some economic activity,
47% had some kind of visual disability, which reflects the importance of accessibility as a
guarantee of rights.
Colombia’s statutory law 1618 of 2013 defines accessibility as the conditions that must be
met by services and facilities in order to provide equal access for people with disabilities
to physical environments, transport, information, and communications. It also determines
that technical aids shall be made with appropriate technology, taking into account people’s
needs [2].
3. Related work
The related work is classified in the following aspects: accessibility, transport routes, proto-
types, and infrastructure.
3.1. Accessibility
Currently, it is feasible to find applications that make use of machine learning techniques with
the aim of helping the visually impaired. In this field, different proposals have been developed
such as:
Seeing-ai [3] is an application developed by the Microsoft team and available for iOS since
mid-2017. It is a free application that uses the camera to facilitate the recognition and descrip-
tion of the environment of blind or low vision people identifying faces, objects, colors, and
even the value of paper money. One of the most attractive qualities of this application is that
it is available in more than 5 languages, facilitating its use in different countries.
Google Lookout [4] is an initiative that uses artificial intelligence to recognize images and
objects with the purpose of reducing accidents when visually impaired people travel. This
application reads aloud what it detects, which allows users to distinguish the objects in front
of the device camera. It also has different modes that adapt its operation.
Tap Tap See [5] is an application available for most mobile devices. It combines Internet-based
APIs with VoiceOver functions to identify and recognize objects and images and then describe
them by voice.
NaviLens [6] is an application that guides people with disabilities through the city and
provides signaling functionalities for transport systems. Developed by Neosistec and the
University of Alicante in 2017, it is software capable of reading optimized QR codes that are
displayed in printed form. The application is able to detect each code at a distance of 15 meters,
and it is used in some transport systems and museums in Spain.
3.2. Routes
Among the applications that provide information for route planning, it is possible to identify
Moovit, TransMilenio and SITP, and TransMi App, which are available for free in the Google
Play store.
Moovit is a world-class mobility application with over 720 million users, available in
more than 3000 cities. It offers route information services through a combination of public
information, authority information, and community information. In Bogota, it has been in
operation since 2013 and its access is free for users. This app plans trips and suggests
a variety of modes of transport according to availability and the state of the traffic. Its main
functions require online artificial intelligence services, and its interface has TalkBack support
for mobile devices as well as a system map in PDF format.
TransMilenio and SITP [7] is a proposal whose main function is to find and calculate the
optimal route for the user by reducing the number of stops within this transport system; it
even provides the possibility of requesting taxi services easily and safely. Users can access
information such as routes, stations, buses, schedules, and recharging points, and can view
maps of the system that point out food service places, tourist sites of the city, and bike
paths. It uses GPS and does not depend on an internet connection except for access to maps.
TransMi App [8] is the official application of the TransMilenio transport system. Its main
function is planning the route given an origin and a destination. It includes the routes of the
SITP and has an additional function that monitors the routes in real time and reports the time
a bus takes to reach the stop. It uses an intelligent search to find the best paths and Google
Maps technology to suggest routes.
3.3. Prototypes
The applications developed so far do not have a function to obtain information from the SITP
stops by means of the camera of the mobile device. Likewise, they do not have a usability
interface designed for people with visual disabilities or low vision. However, some prototypes
use spatial data to obtain information related to the routes or the state of the SITP. The
following solutions propose new approaches that consider the space and the position of the
user in relation to the bus stop.
NeuroSITP [9] is a prototype proposed in 2016. This application implements an artificial
neural network to estimate the time that the buses of each route take to reach different stops
in one specific area, providing real-time data to users. Its operation is relatively simple: the
user only indicates the destination neighborhood, and the system provides the most appropriate
options for a trip in the shortest possible time by means of a client-server architecture over
the Internet.
QRSITP [10] is a prototype proposed in 2016. It was planned as an information system
related to the SITP in one specific area of Bogota. By means of software and hardware tools,
it is possible to spatially consult the data related to destinations through the use of QR codes.
Its operation relies on a client-server architecture over an Internet connection.
ARComp [11] is a prototype created to serve as a technological tool for mobile devices that
shows the routes and access points belonging to the Integrated Transport System of Bogota. For
this, it uses the space as a source of information through augmented reality techniques, and it
also employs a client-server architecture over the Internet.
3.4. Infrastructure
In Bogota, there are 4,881 bus stops with Braille plates, as shown in Figure 1, that offer
information such as name, address, and telephone number for user service. Although these
initiatives provide a tangible accessibility option for the visually impaired sector of the
population, around 700 plates were replaced in 2018 due to physical damage. The plates are
limited by different situations such as lack of information, outdated information, and physical
damage, which has led to partial and out-of-context communication.
The information screens are part of a pilot test implemented at five points in the city. They
provide real-time information on the status of services, routes, and estimated bus arrival times
thanks to a GPS system and the Internet, which make it possible to locate the buses on the
routes in relation to the screens.
Therefore, none of these options is available as open-source software, and they depend on
specific technologies or hardware. The first three accessibility applications were designed as
general tools that help people relate to their surroundings, but they do not take a specific
approach to providing additional information. In most cases, these applications require
Internet access. NaviLens [6] increases the perceived information but still has additional
physical requirements, as it needs previously programmed codes in the places of use.
Figure 2 summarizes the different software initiatives in this context, grouping them into the
following categories: route apps, prototypes, accessibility apps, and infrastructure.
4. Proposed solution
Figure 1: Braille plates, by Bogota mobility department
In this project, a machine learning model is proposed covering all stages of its life cycle, which
is described in Figure 3. This model is scalable and works in the deployment phase without the
need for an internet connection. It can be replicated with the signaling graphics of other
transport systems, supported by the related open data sets. Based on this, it was necessary
to create an application capable of using the camera of the Android device. The captured image
is converted to a tensor and then evaluated by the model. Based on the results, the app
infers the position of the unique code of the sign. At this point, the implemented
algorithm recognizes the optical characters of the image and outputs an alphanumeric code
that is compared to the available routes in the information source obtained from the open data
of Bogota. Finally, the information is presented to the user in an audible manner using the
TalkBack accessibility assistant on Android devices and, optionally, on the device screen.
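The processing chain described above can be summarized in a few lines. The following Python sketch is illustrative only: the function names (`recognize_stop`, `detect_code_region`, `read_code`), the layout of the route dictionary, the route names, and the confidence threshold are assumptions made for this example and not the structure of the published app. The detector and OCR stages are passed in as plain callables so that the sketch does not depend on any library.

```python
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
Image = Sequence[Sequence[int]]  # stand-in for the camera frame


def recognize_stop(
    image: Image,
    detect_code_region: Callable[[Image], Tuple[Optional[Box], float]],
    read_code: Callable[[Image, Box], str],
    routes_by_stop: dict,
    min_confidence: float = 0.5,  # assumed threshold, not taken from the paper
) -> Optional[dict]:
    """Run detection, OCR, and the open-data lookup for one camera frame."""
    box, confidence = detect_code_region(image)
    if box is None or confidence < min_confidence:
        return None  # no sign found in this frame

    code = read_code(image, box).strip().upper()
    routes = routes_by_stop.get(code)
    if routes is None:
        return None  # OCR output does not match any known stop

    # The app would hand this text to the accessibility service (TalkBack).
    message = f"Stop {code}. Routes: {', '.join(routes)}."
    return {"code": code, "confidence": confidence, "message": message}


# Usage with stand-in stages and hypothetical route data.
fake_image = [[0] * 4 for _ in range(4)]
result = recognize_stop(
    fake_image,
    detect_code_region=lambda img: ((0, 0, 4, 2), 0.99),
    read_code=lambda img, box: " 190a09 ",
    routes_by_stop={"190A09": ["Route A", "Route B"]},
)
```

Returning `None` when either stage fails keeps the caller free to retry with the next camera frame, which is one plausible way to handle the cases where recognition does not succeed.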
The Android operating system was chosen due to its popularity around the world, the low prices
of its devices, and the fact that most people who use the public transport system in the city
use Android devices [1]. The software tools chosen to develop the project are open source, which
facilitates running the developed models on Android.
4.1. Open Data
Open data policies were born as an initiative of governments to provide citizens with the
information available in public entities so that they can use it and contribute to the
development of their countries. They allow access to information as a tool for social control of
the public administration. Their most important value is the ability they give citizens to
understand their environment and find value that improves their conditions.
Figure 2: Related apps
The open datasets are used at two moments in this solution. The first moment concerns
the understanding of the data and the use of spatial geoinformation to locate the available
signs and to obtain both the photographs and the images provided by Google Street View.
The second moment occurs once the optical character recognition has produced its result.
For this, the application stores information about the routes and stops of the SITP. The
information was extracted from files in various formats such as csv, json, and geojson. To
normalize the data, we used a Python script that structures the information in the json
standard and stores it in two different files available to the search services of the
application.
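The normalization step can be pictured as follows. This is a minimal sketch, assuming hypothetical field names (`stop_code`, `route`, `name`) and small in-memory inputs; the actual open data schemas and the authors' script are not reproduced in the paper.

```python
import csv
import io
import json


def normalize_routes(csv_text: str) -> dict:
    """Group route names by stop code from CSV rows (assumed columns)."""
    routes_by_stop: dict = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = row["stop_code"].strip().upper()
        routes_by_stop.setdefault(code, []).append(row["route"].strip())
    return routes_by_stop


def normalize_stops(geojson_text: str) -> dict:
    """Index stop name and coordinates by stop code from a GeoJSON FeatureCollection."""
    stops = {}
    for feature in json.loads(geojson_text)["features"]:
        props = feature["properties"]
        lon, lat = feature["geometry"]["coordinates"][:2]  # GeoJSON order is lon, lat
        stops[props["stop_code"].strip().upper()] = {
            "name": props["name"],
            "lat": lat,
            "lon": lon,
        }
    return stops


# Hypothetical sample inputs standing in for the downloaded open data files.
sample_csv = "stop_code,route\n190A09,Route A\n190a09,Route B\n555A09,Route C\n"
sample_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-74.07, 4.65]},
        "properties": {"stop_code": "190A09", "name": "Example stop"},
    }],
})

# Two JSON documents, mirroring the two files bundled with the application.
routes_json = json.dumps(normalize_routes(sample_csv), sort_keys=True)
stops_json = json.dumps(normalize_stops(sample_geojson), sort_keys=True)
```

Keying both structures by the stop code means that the code returned by the OCR step can be used directly as the lookup key, without any search over the records.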
4.2. Data sets
Machine Learning techniques depend on multiple data samples and variations for the correct
training of the model, which learns relationships by repetition [12, 13, 14]. The data necessary
for this project was collected from different sources. A sample of each source is presented
in Figure 4. Three sources were considered; the data sets were formed using two of them, and
one was discarded.
Figure 3: Proposed solution diagram
• Pictures captured with mid-range Android devices, which usually offer medium
performance among the range of features offered by the market. The dimensions and
resolution of the photographs were limited to decrease the computational cost of processing
them as tensors. This data source is accurate but inefficient due to the travel required
to obtain a large number of samples.
• Google Street View has a photo gallery of various cities around the world, including Bogota.
This photo gallery is organized according to the streets and avenues. In some of these
photos, it is possible to see the stops of the SITP. This information can be consulted with
a web browser by passing the latitude and longitude of the sign as parameters in the
URL. However, the quality of the images obtained is low.
• Computer-generated images. A workaround for the absence of data is to generate the
images using a computerized method. There are multiple options for this purpose. In
this solution, a base design was modeled in Blender, an open-source software for 3D
creation, using the technical specifications of the signage. Multiple samples were
generated using a Python script that randomly varies the position of the camera, emulating
the possible approaches of the users, and varies the unique alphanumeric code shown
on the sign. Thus, 500 images were generated and split into two data sets. The first
one is used in the training process, while the second one is used for testing purposes.
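The randomization behind the synthetic images can be sketched without Blender itself. The example below is an illustration under stated assumptions: it only produces the per-image parameters (a random stop code in the documented format and a random camera placement) and a train/test split. The value ranges, the letter alphabet, the function names, and the split size of 350 training items out of 500 are hypothetical choices for this sketch, and the rendering would be done by a separate Blender script.

```python
import random
import string


def random_stop_code(rng: random.Random) -> str:
    """Three digits, one letter, two digits (for example 190A09)."""
    digits = string.digits
    return (
        "".join(rng.choice(digits) for _ in range(3))
        + rng.choice(string.ascii_uppercase)  # assumed alphabet
        + "".join(rng.choice(digits) for _ in range(2))
    )


def random_camera_pose(rng: random.Random) -> dict:
    """Camera placement relative to the sign; ranges are illustrative guesses."""
    return {
        "distance_m": round(rng.uniform(1.0, 5.0), 2),
        "azimuth_deg": round(rng.uniform(-45.0, 45.0), 1),
        "height_m": round(rng.uniform(1.2, 1.8), 2),
    }


def make_sample_specs(count: int, train_count: int, seed: int = 0):
    """Return (train, test) lists of render specifications."""
    rng = random.Random(seed)
    specs = [
        {"id": i, "code": random_stop_code(rng), "camera": random_camera_pose(rng)}
        for i in range(count)
    ]
    rng.shuffle(specs)  # avoid any ordering bias between the two sets
    return specs[:train_count], specs[train_count:]


train_specs, test_specs = make_sample_specs(count=500, train_count=350)
```

Fixing the seed makes the generated set reproducible, which helps when the same images need to be rendered again after a change to the base model.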
The images obtained using the device’s camera require more time as well as manual labeling
of the code to be detected in each image. The images obtained from Google Street View have a
low quality that prevents the contours of the signs and their parts from being clearly identified;
moreover, building derived data sets from them is not allowed and infringes the Google
permissions.
Figure 4: Dataset sources
4.3. Object Detection Model
TensorFlow Object Detection API [15] is a framework supported by Google that works on top of
TensorFlow (footnote 1) and is available in Python. It allows reusing large CNN (Convolutional
Neural Network) models by means of Transfer Learning. Among the multiple pre-trained models
available, we used the SSD MobileNet v1 model [16], which implements an architecture with
depth-wise separable convolutions that improve quality attributes on mobile architectures such
as latency, accuracy, and performance. It was trained for 50,000 steps with a batch size of 12
and 300 x 300-pixel inputs over a data set of 350 training samples and 120 test samples,
achieving an approximate loss of 2.4. The outcome is a model, that is, a mathematical algorithm
that uses statistics to make detections or predictions.
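As a rough illustration of what these training settings imply, the following sketch computes how many samples the optimizer processes and the equivalent number of passes over the training set. It assumes that each step consumes exactly one batch and it ignores any data augmentation, which the paper does not describe.

```python
steps = 50_000
batch_size = 12
training_samples = 350

samples_seen = steps * batch_size          # images processed during training
epochs = samples_seen / training_samples   # equivalent passes over the training set

print(f"samples seen: {samples_seen}")
print(f"approximate epochs: {epochs:.0f}")
```

Under these assumptions, each training image is revisited on the order of 1,700 times, which reflects how strongly the training relies on repetition over a small data set.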
4.4. Optical characters recognition trained data
Tesseract (footnote 2) is an OCR engine used to recognize the characters of the unique code of
each sign. The trained data file used for optical character recognition was trained to limit the
alphabet to digits and the characters of the available modules from the technical stop
guidelines. All codes match the following pattern: three digits followed by a letter and finally
two digits.
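The code pattern lends itself to a simple validation of the OCR output. The sketch below is an assumption-based example: the function name `extract_stop_code` is hypothetical, and the letter position accepts any uppercase letter, whereas the paper restricts it to the characters of the available modules.

```python
import re
from typing import Optional

# Three digits, one uppercase letter, two digits (for example 190A09).
STOP_CODE = re.compile(r"\d{3}[A-Z]\d{2}")


def extract_stop_code(ocr_text: str) -> Optional[str]:
    """Return the first stop code found in raw OCR text, or None."""
    # Remove whitespace that OCR may insert between characters, then normalize case.
    cleaned = re.sub(r"\s+", "", ocr_text).upper()
    match = STOP_CODE.search(cleaned)
    return match.group(0) if match else None
```

For example, `extract_stop_code(" 190a09\n")` yields `"190A09"`, while a misread such as `"19OA09"` (letter O instead of zero) yields `None`, so the application can reject it instead of looking up a wrong stop.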
4.5. Mobile application
The developed Android application, called Supqua, is available from Android API version 23,
which corresponds to Android 6.0 Marshmallow, as the minimum version with support for the
TensorFlow Lite library. This minimum version covers 84.9% of devices on the market according
to Google statistics. The architecture of the app is described in the application component
diagram presented in Figure 5.
Figure 5: Application components diagram
Footnote 1: https://www.tensorflow.org/
Footnote 2: https://tesseract-ocr.github.io/
The application uses the device’s camera and passes its image to the object detection model,
which locates the position of the sign border. The resulting image region is evaluated by
optical character recognition using Tesseract, and the resulting code is looked up in the open
data files related to routes and stops. The workflow is illustrated in Figure 6.
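The hand-off between the detector and the OCR stage can be sketched as a small geometric step. The example assumes that the detector reports boxes in the normalized `[ymin, xmin, ymax, xmax]` layout used by TensorFlow Object Detection API models; the score threshold, the extra margin around the box, and the function names are assumptions for illustration and are not taken from the app.

```python
from typing import Optional, Sequence, Tuple


def best_detection(
    boxes: Sequence[Sequence[float]],
    scores: Sequence[float],
    min_score: float = 0.5,  # assumed threshold
) -> Optional[Tuple[Sequence[float], float]]:
    """Pick the highest-scoring box at or above the threshold."""
    if not scores:
        return None
    index = max(range(len(scores)), key=lambda i: scores[i])
    if scores[index] < min_score:
        return None
    return boxes[index], scores[index]


def to_pixel_box(
    box: Sequence[float], width: int, height: int, margin: float = 0.05
) -> Tuple[int, int, int, int]:
    """Convert a normalized [ymin, xmin, ymax, xmax] box to clamped pixel bounds."""
    ymin, xmin, ymax, xmax = box
    pad_y = (ymax - ymin) * margin  # small margin so characters are not cut off
    pad_x = (xmax - xmin) * margin
    left = max(0, int((xmin - pad_x) * width))
    top = max(0, int((ymin - pad_y) * height))
    right = min(width, int((xmax + pad_x) * width))
    bottom = min(height, int((ymax + pad_y) * height))
    return left, top, right, bottom
```

The returned `(left, top, right, bottom)` tuple describes the region of the camera frame that would be cropped and passed to the OCR engine.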
5. Results
As a result, we published a mobile application on Google Play (footnote 3), available in
Colombia, which has more than 100 downloads. The application has a size of around 17 MB, of
which approximately 6 MB are for the object detection model, 142 KB for alphanumeric character
recognition, and 2.3 MB for the open data files with route information.
The results of the app in production are detailed in Table 1. These results were obtained
using the debug mode of the application to log its time and performance. We can observe
that the OCR recognition failed in two cases due to different environmental conditions.
The second column shows the time used by the software to detect the sign.
Finally, the confidence value is the probability that the object belongs to the sign class.
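The figures reported in Table 1 can be condensed into a few summary values. The following sketch simply re-enters the six rows of the table and computes the OCR success rate, the mean detection time, and the mean confidence; the variable names are chosen for this example, and with only six measurements the values should be read as indicative rather than as a benchmark.

```python
from statistics import mean

# (bus stop, detection time in s, OCR time in s or None, confidence in %), from Table 1.
results = [
    ("190A09", 1, 6, 99.61),
    ("659A09", 1, None, 73.05),
    ("555A09", 6, 10, 87.09),
    ("055A04", 26, None, 80.05),
    ("048A09", 14, 22, 95.31),
    ("069A06", 9, 15, 73.05),
]

recognized = [row for row in results if row[2] is not None]
ocr_success_rate = len(recognized) / len(results)
mean_detection_s = mean(row[1] for row in results)
mean_confidence = mean(row[3] for row in results)

print(f"OCR success: {len(recognized)}/{len(results)} ({ocr_success_rate:.0%})")
print(f"mean detection time: {mean_detection_s:.1f} s")
print(f"mean confidence: {mean_confidence:.2f}%")
```

For these six rows, the code was read successfully in 4 of 6 cases, the mean detection time is 9.5 seconds, and the mean confidence is about 84.7%.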
Footnote 3: https://play.google.com/store/apps/details?id=com.catcode.supqua
Figure 6: App result working
6. Conclusions
The proposed model can be extended to other transport systems in different cities by modifying
the inputs that refer to technical guidelines and open data and by creating enough new
synthetic data to train a multi-class machine learning model.
Machine learning models are always open to continuous improvement through feedback with
new hyperparameters, better quality of training data, and different samples, which is why this
solution is ready for future updates.
Table 1: Results
Bus stop | TF Detection | OCR Recognition | Confidence
190A09 | 1 second | 6 seconds | 99.61%
659A09 | 1 second | did not recognize | 73.05%
555A09 | 6 seconds | 10 seconds | 87.09%
055A04 | 26 seconds | did not recognize | 80.05%
048A09 | 14 seconds | 22 seconds | 95.31%
069A06 | 9 seconds | 15 seconds | 73.05%
To get the expected results in this project, it was necessary to understand the problem and
the available data. A sufficient volume of data improves performance across the machine
learning cycle, whose outcome is a model that can be used on mobile devices.
The use of machine learning in daily tasks could help compensate for the lack of infrastructure
planning in growing cities that do not take into account the different conditions of their
citizens. There is an opportunity to positively impact people’s lives with new applications
of artificial intelligence.
References
[1] A. F. C. Ordóñez, Acceso al transporte público para personas con discapacidad en bogotá:
caso sitp, 2015. URL: http://bdigital.unal.edu.co/49986/, maestría en Derecho. Área
de Profundización: Derecho Constitucional.
[2] Congreso de la república, Ley estatutaria 1618 de 2013, 2013.
[3] M. Corporation, Seeing-ai, 2019. URL: https://www.microsoft.com/en-us/ai/seeing-ai.
[4] G. LLC, Lookout, 2020. URL: https://taptapseeapp.com/.
[5] I. Cloudsight, Tap tap see, 2018. URL: https://taptapseeapp.com/.
[6] Neosistec, Navilens, 2020. URL: https://www.navilens.com/.
[7] MoviliXa SAS, Transmilenio y sitp, 2020. URL:
https://play.google.com/store/apps/details?id=com.rutasdeautobuses.transmileniositp&hl=es_CO.
[8] TRANSMILENIO S.A., Transmi app | transmilenio, 2020. URL:
https://play.google.com/store/apps/details?id=com.nexura.transmilenio&hl=es_CO.
[9] G. E. Palomino Contreras, P. L. Pineda Acero, et al., Diseño de un prototipo de software
para el uso del sistema integrado de transporte público (sitp), en tiempo real en la localidad
de chapinero, 2016. URL: http://hdl.handle.net/11349/3105.
[10] B. Parra, D. Marcela, J. A. Parra Barrera, et al., Qrsitp (que ruede el sitp): Herramienta de
software para consultar espacialmente las rutas del sitp, 2016. URL:
http://hdl.handle.net/11349/3105.
[11] J. Chawez, J. Orlando, J. L. Candamil Acevedo, et al., Diseño de aplicación de realidad au-
mentada en dispositivos móviles para usuarios del sistema integrado de transporte público
de bogotá, 2015. URL: http://hdl.handle.net/11349/3050.
[12] J. Hernandez, K. Daza, H. Florez, Alpha-beta vs scout algorithms for the othello game, in:
CEUR Workshop Proceedings, 2019, pp. 65–79.
[13] C. Vegega, P. Pytel, M. F. Pollo-Cattaneo, Evaluation of the bias in the management of
patient’s appointments in a pediatric office, ParadigmPlus 1 (2020) 1–21.
[14] D. Sanchez, H. Florez, Improving game modeling for the quoridor game state using
graph databases, in: International Conference on Information Technology and Systems,
Springer, 2018, pp. 333–342.
[15] J. Huang, V. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song,
S. Guadarrama, et al., Speed/accuracy trade-offs for modern convolutional object detec-
tors, in: Proceedings of the IEEE conference on computer vision and pattern recognition,
2017, pp. 7310–7311.
[16] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto,
H. Adam, Mobilenets: Efficient convolutional neural networks for mobile vision applica-
tions, arXiv preprint arXiv:1704.04861 (2017).