=Paper=
{{Paper
|id=Vol-2083/paper-10
|storemode=property
|title=Using a NoSQL Graph Oriented Database to Store Accessible Transport Routes
|pdfUrl=https://ceur-ws.org/Vol-2083/paper-10.pdf
|volume=Vol-2083
|authors=Belén Vela,José María Cavero,Paloma Cáceres,Almudena Sierra,Carlos E. Cuesta
|dblpUrl=https://dblp.org/rec/conf/edbt/VelaCCSC18
}}
==Using a NoSQL Graph Oriented Database to Store Accessible Transport Routes==
Belén Vela (belen.vela@urjc.es), José María Cavero (josemaria.cavero@urjc.es), Paloma Cáceres (paloma.caceres@urjc.es), Almudena Sierra (almudena.sierra@urjc.es), Carlos E. Cuesta (carlos.cuesta@urjc.es)

Escuela Técnica Superior de Ingeniería Informática, Rey Juan Carlos University, 28933 Móstoles, Spain

© 2018 Copyright held by the owner/author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018, Vienna, Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

ABSTRACT

Each day, people have to move to carry out their daily tasks, such as going to work, studying, shopping, etc., which means that thousands of trips are taken on public transport on a daily basis. A huge number of these trips are taken by people with special mobility needs. In spite of the existence of numerous websites and apps that provide information about public transport services, there is a lack of information regarding the accessibility of the routes and sites. We are, therefore, working on the development of a technological framework for the processing, management and exploitation of open data, with the goal of promoting accessibility to city public transport within the framework of the Access@City project. In this paper we specifically focus on the design and storage of accessible transport routes, obtained by means of crowdsourcing techniques, in a NoSQL graph oriented database.

1. INTRODUCTION

According to the World Bank [16], one billion people, or 15% of the world’s population, have some type of disability. Although this depends on each country, a significant percentage of the people who use public transport have special mobility needs. One of the goals of smart cities is to improve the quality of life of all citizens [12]. In fact, in a smart city, anyone should be able to move easily and according to their needs. There are, therefore, several initiatives whose objective is to improve accessibility to public transport for people with disabilities. For example, the World Health Organization in its “World Report on Disability 2011” [17] proposes to improve accessibility to public transport for people with disabilities, and this includes “making public transport systems more flexible for the user by optimizing the use of information technology”.

Various projects and software tools address the issue of public transport and its accessibility. We have carried out a large-scale study of websites and mobile applications that offer information regarding the accessibility of public transport. The ultimate aim of our study was to discover the strengths and weaknesses of the public transport information provided and the services offered.

With regard to public transport users, we have analyzed the quality and quantity of accessibility information and services; in this case, we have defined six accessibility levels according to the accessibility features related to users’ mobility, visual and auditory needs, along with other user needs, and the capacity to provide accessible routes related to those user needs, in addition to assigning an accessibility level to each of the applications studied.

With regard to public transport data, in addition to identifying the accessibility information contained in them, we have also identified their format in order to determine whether the data provided can be simply managed and reused, thus facilitating their extraction from the Internet and their subsequent use.

All of the websites and mobile applications analyzed provide maps and services and some type of accessibility information, but none of them provides generic mechanisms with which to obtain accessible transportation data and which would improve mobility in a smart city. For example, the website accessible.net shows maps with accessibility information, but does not include search options. The website for disabled people, www.discapnet.es, presents information about training, education, employment, legislation, documentation, organization and related services, and includes guides for accessible transport with the option of searching for routes. There are also websites that provide information regarding accessibility for wheelchair users, such as wheelmap.org and Rollstuhlrouting.de. The abil.io website provides information regarding accessible journeys and service-based routing using public transport. The main reason for these limitations is that there is a significant lack of open and reusable data concerning public transport and its accessibility.

In order to address this lack of open transport data and of information regarding accessibility, we are defining an open data repository for accessible public transport within the framework of the Access@City project. The repository will be developed using a NoSQL database owing to its capacity to manage huge volumes of information, along with its flexibility and scalability [14]. We have specifically selected a NoSQL graph oriented database as we are
dealing with highly connected data and wish to be able to make queries that are more efficient in a graph oriented database [7].

We propose to develop the graph oriented database, which will be designed from scratch, by using a methodological approach. In general, we have detected a lack of specific methodologies for the design of NoSQL databases that take into account the application characteristics and the most frequent data queries, which is a particularly important aspect in this kind of system.

A trend concerning how to incorporate traditional modeling notions in this context has recently emerged [2]. For example, in [7], Kaur and Rani show an example of NoSQL database design. They use an Entity Relationship Model [4] to obtain a conceptual representation of the model and different data models for each NoSQL database (for example, a class diagram for a document database). Bugiotti et al. propose a methodology based on an abstract data model for NoSQL databases called NoAM (NoSQL Abstract Model) [3].

In summary, it could be said that different approaches exist but that no solution has, as yet, been commonly accepted. In our opinion, the key concept here is neither the model nor the representation techniques to be used, but rather the design process and the aspects to be considered. The characteristics of NoSQL databases are different in nature from those of SQL databases. Denormalization and queries must be taken into account from the beginning of the process.

In order to address this lack, in our previous work [15], we proposed some guidelines for the design of document databases in which we integrated the final use of the data and the most frequent queries into the design process.

In this paper, we shall show how to design a NoSQL graph oriented database in which to store accessible routes generated by the users of a mobile application. The accessible routes are obtained for users with special needs by using crowdsourcing techniques (micro tasks) [6][8][9]. For the storage of the routes we specifically use what is, according to [5], the most popular graph oriented database, i.e., Neo4j [11].

The remainder of the paper is organized as follows: the framework of our work is briefly presented in Section 2. In Section 3 we present our approach for the design of a graph oriented database for the storage of the accessible public transport routes generated. In Section 4 we show a validation of our proposal, along with a brief description of the mobile application developed for the generation of accessible transport routes and their storage in a Neo4j graph oriented database. Finally, our main conclusions and future work are summarized in Section 5.

2. FRAMEWORK

The framework of our paper is the Access@City project, whose objective is to define a technological framework for the processing, management and exploitation of open data with the goal of promoting accessibility to city public transport (see Figure 1). We therefore address the integration of accessibility data derived from three kinds of sources: 1) existing open data, available from Linked Open Data (LOD) initiatives or obtained using the web scraping of non-semantic data sources; 2) private data concerning actual accessible routes, obtained by means of crowdsourcing and provided by users themselves through their mobile devices and also processed using Big Data techniques, integrating both historical and real-time data in a datastore [10] denominated as “REPOSITORY OF ACCESSIBLE ROUTES” shown in Figure 1, and 3) data obtained from already existing traffic sensors, in the context of a smart city.

These data sources will be semantically harmonized, while maintaining their diversity, and will feed an open data management platform, which consists of a repository (data hub) and a service generation layer. This layer will be able to provide access to data consumers, through the automatic generation of customized APIs composed of services adapted to available data, which will be exploited by different applications. In particular, we consider the case of mobile applications, which would make it possible for citizens to obtain accessible routes between two points in a city in real time, and even combine different transport networks. These apps would translate the information regarding our smart city into an accessibility context, thus resulting in the definition of an accessible city.

The case study that we present in the following sections of this paper is focused on the marked part of Figure 1, which includes the application that captures the accessible routes obtained and validated by users using crowdsourcing techniques and the Big Data repository (REPOSITORY OF ACCESSIBLE ROUTES) that will store the routes. We consider that a route is accessible if a person with a special need can use it to reach his or her destination.

Figure 1. Architecture of Access@City

The remaining parts of Figure 1 show the rest of the architecture on which we are working in the Access@City project.

3. DESIGNING A NOSQL GRAPH ORIENTED DATABASE

Our proposal consists of developing the NoSQL graph oriented database by following a process based on traditional database design. The proposed approach is summarized in Figure 2.

In a first step, we acquire and analyze the data sources or the specification in order to be able to determine the entities and their relationships, along with their properties. This specification is used to define the conceptual data required to design the conceptual schema of the database from scratch. The conceptual model can be represented using, for example, the Entity-Relationship Model [4] or the UML class diagram [13].

In the second step, taking into account the conceptual data model (which is independent of any database technology) and carrying out a study of the application-specific access model and the frequent types of queries, we design the logical graph oriented database model, which is independent of any product. This step provides an initial product-independent specification, thus improving the maintainability of the NoSQL database, in addition to making migrations between products easier.

In the third step, we carry out the physical design and the implementation for a specific NoSQL product, and the product database model is obtained. In our case, we have chosen Neo4j [11], which is, according to the database ranking [5], the most popular graph oriented database. Finally, the implementation phase includes
various physical design tasks, such as balancing the need for scalability, availability, consistency, partition protection and durability.

Figure 2. Graph Oriented Database Design Approach (Data Requirements, Conceptual Data Model, Logical Graph Oriented DB Model, Product DB Model; inputs to the logical model: application-specific access patterns and frequent queries analysis; inputs to the product model: application-specific needs and balancing physical needs)

Focusing on the second step, that is, on the transformation of the Conceptual Data Model into the Logical Graph Oriented Model (red arrow in Figure 2), we shall begin with a conceptual model represented using the Entity-Relationship (E/R) Model. For the logical graph oriented model, we shall consider “directed graphs”, which are graphs composed of nodes (or “vertices”) connected by relationships called “edges”, each of which is associated with a direction. The direction of the edges is represented by means of an arrowhead on the connecting line between the nodes.

With regard to the transformation, we consider the E/R schema obtained in the first step, the most common queries as regards the data (defined in natural language) and the update operations performed in the database by the applications in an iterative process.

Bearing these aspects in mind, along with the fact that the data can be queried in many ways, we have to decide when to transform an entity or a relationship into a node type or an edge.

In a first iteration, the summarized rules are:
• Each entity will be transformed into a node type labelled as the entity and its attributes into properties of this node type. The constraints (uniqueness or not null) of the attributes will be transformed into constraints on the properties of the node type.
• Each relationship will, in general, be transformed into an edge between the nodes, depending specifically on the cardinality of the relationship.
o One-to-one relationships (0/1 to 0/1) will be transformed into an edge (without an arrowhead) between both node types to denote the 1:1 relation between the entities.
o One-to-many relationships (0/1 to 0/n) will be transformed into an edge with an arrowhead to denote the 1:N relation between the entities.
o Many-to-many relationships (0/n to 0/m) will be transformed into an edge between both node types with an arrowhead on each end to denote the N:M relation between the entities. At this point, we have to decide and check whether this relationship should be transformed into an edge or a node.
o A generalization is a special kind of relationship and will be transformed in the same way as the other types of relationships, according to its cardinality and including an edge labelled “is-a”.
o A composition is a special kind of relationship and will be transformed in the same way as the other types of 1:N relationships, according to its cardinality and including an edge labelled “is-composed-of”.

After this first iteration, we have to refine our logical graph oriented DB model, taking into account both the access patterns of the applications and frequent queries in order to be able to query the connected data in many ways, as required by the users.

4. VALIDATION: APPLICATION FOR GENERATING AND STORING ACCESSIBLE ROUTES USING A GRAPH ORIENTED DATABASE

In order to validate our proposal, we have developed a native Android application as we need to use the GPS of the users’ devices. In general, native applications have significant advantages over hybrid applications because they are able to easily access and use the built-in capabilities of the users’ devices (e.g., GPS) [1].

This application will allow users to register for the generation of accessible routes. They can then use the “start a route” option, indicating which special need (wheelchair, bike, baby stroller, baby buggy, etc.) they will have on their journey. During the journey, the application will periodically register the GPS position (initially, every 25 seconds, although this could change depending on the route, the special need, etc.). When the user finishes the journey, he/she can either discard the route or save it. Users may include comments about the routes taken, reporting possible incidents and/or including photos.

Figure 3 shows the main functionalities of the application by means of a Use Case Diagram:

Figure 3. Use Case Diagram

In Figure 4, some of the main screenshots of the application developed are shown.
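The recording behaviour described above (periodic GPS sampling, then saving or discarding the route) can be illustrated with a minimal Python sketch. It is not the authors' Android code: the class and method names (`RouteRecorder`, `on_gps_fix`, `finish`) are hypothetical, timestamps are assumed to be plain seconds, and the point types and the two-point minimum are taken from the conceptual model presented later in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SAMPLE_INTERVAL_S = 25  # initial sampling period mentioned in the text


@dataclass
class Point:
    x: float
    y: float
    type: str = "intermediate point"


@dataclass
class Route:
    special_need: str
    points: List[Point] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


class RouteRecorder:
    """Hypothetical in-memory recorder (illustrative names, not the real app)."""

    def __init__(self, special_need: str, interval_s: int = SAMPLE_INTERVAL_S):
        self.route = Route(special_need)
        self.interval_s = interval_s
        self._last_t: Optional[float] = None

    def on_gps_fix(self, t: float, x: float, y: float) -> bool:
        """Keep a fix only if interval_s seconds have passed since the last kept one."""
        if self._last_t is not None and t - self._last_t < self.interval_s:
            return False
        self.route.points.append(Point(x, y))
        self._last_t = t
        return True

    def finish(self, save: bool) -> Optional[Route]:
        """Return the typed route when saved; None when discarded or too short."""
        pts = self.route.points
        if not save or len(pts) < 2:  # a route needs at least two points
            return None
        pts[0].type = "start point"
        pts[-1].type = "end point"
        return self.route


if __name__ == "__main__":
    recorder = RouteRecorder("wheelchair")
    for t in range(0, 101, 5):  # simulated fix every 5 s for 100 s
        recorder.on_gps_fix(t, 40.0 + t * 1e-4, -3.0)
    saved = recorder.finish(save=True)
    print([p.type for p in saved.points])
```

Keeping the sampling interval as a constructor parameter reflects the remark that the period could change depending on the route or the special need.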
Figure 4. Screenshots of the application “Gestión de rutas accesibles” (Generation and Storage of Accessible Routes)

For the storage of the routes, we have developed a Big Data repository in which to store accessible routes obtained by means of the aforementioned application using crowdsourcing techniques.

The Big Data repository in our case study was developed by first identifying the data requirements. We then defined the conceptual data model using an Entity-Relationship Data Model (Figure 5).

Figure 5. Conceptual Data Model (entities and attributes legible in the figure: User with User_cod, Login, Password, Name, E-Mail, Birth_date; Special_Need with Need_cod, Name, Description; Route with Route_cod, Duration, Comments, Photos; Point with Point_cod, X_Coor, Y_Coor, Type; relationships: Has Default, Create, Contain)

This conceptual data model includes user registration data (USER Entity) and information about the routes (ROUTE Entity) taken by users with special needs (SPECIAL_NEED Entity). We also consider possible comments made and/or photos taken by users on their journeys. The routes are composed of at least two GPS points (POINT Entity). Each of these points is composed of X and Y coordinates and has a Type, which can be “start point”, “intermediate point” or “end point”.

At this point, another necessary and important decision that had to be made was which kind of NoSQL database to use for the development of our Big Data repository. In this work, we have chosen a graph oriented database owing to the nature of the route data, which are highly connected, and the need to query the data in many ways. The methodology shown in Figure 2 assumes that the database chosen is a NoSQL graph oriented database. The case study has been implemented in Neo4j. The application will be connected to the Neo4j database using a REST API over an HTTP connection.

In a previous work [15], we developed our repository using a NoSQL document database because of its flexibility and ability to manage complex data structures [14]. However, as the data in our case study are highly connected (the points of a route), the traversal is much simpler using a graph oriented database.

In order to obtain the logical graph oriented database model, in a first iteration we have to apply the proposed guidelines. We then have to consider how the data will be used by the applications, that is, what the most frequent queries and application-specific access models are.

In our case study, the most common queries will be related to users or to routes. The most frequent queries are:
1) Data of the registered users, including their default special need.
2) Is a route between two given points accessible?
3) Comments concerning or photos of the stored accessible routes.
4) Of which points is a route composed?

Apart from the queries, there will also be two basic update operations in the database: inserting a new registered user or inserting a new route.

Bearing in mind the conceptual data model and the aforementioned queries (defined in natural language), we have designed the new model (according to our proposed guidelines):

a) A node type labelled for each entity: User, Special_Need, Route and Point. The attributes of the entity become the properties of each node type, as can be seen in Figure 6.

Figure 6. Node of User type with its properties

b) It is now necessary to carry out the transformation of the existing relationships. Here, we have to decide whether relationships will be implemented using an edge or a node.
There are three relationships: Has Default relationship (1:N), Create relationship (1:N:M) and Contain relationship (1:N).

c) When analyzing the queries and the application data, we consider grouping the information of certain entities of the conceptual data model that will be used together. We therefore exclude the Special_Need node, as its information can be considered as information regarding the User and the Route. We shall, therefore, include the Special_Need information as properties of the User and Route node types. This signifies that we now continue working with only three node types and two relationships.

In order to transform both 1:N relationships, we create an edge labelled as the relationship with an arrowhead to denote the 1:N relation between the entities: Create and Contain.

Figure 7. Node types with edges (User, Create, Route, Contain, Point)

The final graph oriented database design will, therefore, consist of three node types: USER, ROUTE and POINT. USER nodes will store, for each user, a set of information (name, email, birth date, etc.) and information on their default special needs. ROUTE nodes will store the special needs for that route, and some additional information, such as comments, photos, etc. POINT nodes will store information concerning the X and Y coordinates and the type of point (start, intermediate or end point). The information regarding routes and subroutes with a special need can be easily obtained with this design.

5. CONCLUSIONS

Although document databases have a dynamic schema (but are not schemaless, as stated in some forums), it is very important to design this schema correctly because of its impact on the performance of the database. A methodological process with which to guide the user in the design process of a NoSQL database is, therefore, required. However, despite the existence of several works and many websites with best practices, there is no commonly accepted solution for the design of NoSQL databases.

In [15], we proposed some guidelines for the design of a document database. In this paper, and in order to complete our NoSQL Design Methodology, we address the design of NoSQL graph oriented databases. In order to validate this proposal, we present a case study in which we generate accessible routes created by users with special mobility needs using a micro-task based on crowdsourcing and store them in a NoSQL graph oriented database in Neo4j. The design process is principally based on the requirements of the applications and the most frequent queries that the system will have to deal with.

One line of future work will be the formal representation of the queries and the specification of a formal method with which to transform the conceptual model and the query model into a logical design model based on a NoSQL database (document DB or graph oriented DB). We plan to put the application into use with users with disabilities in the near future. Moreover, an immediate future task is to extend the functionalities of the mobile application in order to create a version that can be evaluated by users with special needs.

ACKNOWLEDGMENTS

This work was partially supported by the Access@City project (TIN2016-78103-C2-1-R), funded by the Spanish Ministry of Economy and Competitiveness.

REFERENCES

[1] Abed, R. (2016). Mobile. Hybrid vs Native Mobile Apps – The Answer is Clear. Retrieved from: https://ymedialabs.com/hybrid-vs-native-mobile-apps-the-answer-is-clear/
[2] Atzeni, P. (2015). Models for NoSQL databases: a contradiction? Presentation. Sezione di Informatica e Automazione.
[3] Bugiotti, F., Cabibbo, L., Atzeni, P. and Torlone, R. (2014). Database Design for NoSQL Systems. ER 2014, LNCS 8824, pp. 223-231.
[4] Chen, P. (1976). The Entity-Relationship Model - Toward a Unified View of Data. ACM Transactions on Database Systems, 1(1): 9–36.
[5] DB-Engines Ranking. Solid IT (2017). https://db-engines.com/en/ranking
[6] Estellés Arolas, E., González Ladrón de Guevara, F. (2012). Towards an integrated crowdsourcing definition. Journal of Information Science, 38(2): 189-200.
[7] Frisendal, T. (2016). Graph Data Modeling for NoSQL and SQL. Visualize Structure and Meaning. Technics Publications.
[8] Grier, D. A. (2013). Crowdsourcing For Dummies. Paperback.
[9] Mao, K., Capra, L., Harman, M. and Jia, Y. (2016). A survey of the use of crowdsourcing in software engineering. The Journal of Systems and Software.
[10] Marz, N., Warren, J. (2015). Big Data: Principles and best practices of scalable realtime systems. Manning.
[11] Neo4j. https://neo4j.com/product/
[12] Neirotti, P., De Marco, A., Cagliano, A. C., Mangano, G., & Scorrano, F. (2014). Current trends in Smart City initiatives: Some stylised facts. Cities, 38, 25-36.
[13] Object Management Group (2015). OMG Unified Modeling Language (OMG UML), Version 2.5. Retrieved from http://www.omg.org/spec/UML/2.5/
[14] Sullivan, D. (2015). NoSQL For Mere Mortals. Addison-Wesley.
[15] Vela, B., Cavero, J.M., Cáceres, P., Sierra, A. & Cuesta, C.E. (2017). Defining a NoSQL document of accessible transport routes. Darli-Ap 2017 in iThings-GreenCom-CPSCom-SmartData 2017. Exeter, UK.
[16] World Bank. http://www.worldbank.org/en/topic/disability/overview
[17] World Report on Disability, 2011. http://www.who.int/disabilities/world_report/2011/report/en/