=Paper= {{Paper |id=Vol-2083/paper-10 |storemode=property |title=Using a NoSQL Graph Oriented Database to Store Accessible Transport Routes |pdfUrl=https://ceur-ws.org/Vol-2083/paper-10.pdf |volume=Vol-2083 |authors=Belén Vela,José María Cavero,Paloma Cáceres,Almudena Sierra,Carlos E. Cuesta |dblpUrl=https://dblp.org/rec/conf/edbt/VelaCCSC18 }} ==Using a NoSQL Graph Oriented Database to Store Accessible Transport Routes== https://ceur-ws.org/Vol-2083/paper-10.pdf
 Using a NoSQL graph oriented database to store accessible
                    transport routes
                Belén Vela                                   José María Cavero                                      Paloma Cáceres
      Escuela Técnica Superior de                        Escuela Técnica Superior de                           Escuela Técnica Superior de
         Ingeniería Informática                             Ingeniería Informática                                Ingeniería Informática
       Rey Juan Carlos University                         Rey Juan Carlos University                            Rey Juan Carlos University
        28933 Móstoles, Spain                               28933 Móstoles, Spain                                 28933 Móstoles, Spain
           belen.vela@urjc.es                             josemaria.cavero@urjc.es                               paloma.caceres@urjc.es



           Almudena Sierra                                    Carlos E. Cuesta
      Escuela Técnica Superior de                        Escuela Técnica Superior de
          Ingeniería Informática                            Ingeniería Informática
       Rey Juan Carlos University                         Rey Juan Carlos University
         28933 Móstoles, Spain                             28933 Móstoles, Spain
        almudena.sierra@urjc.es                             carlos.cuesta@urjc.es



© 2018 Copyright held by the owner/author(s). Published in the Workshop
Proceedings of the EDBT/ICDT 2018 Joint Conference (March 26, 2018,
Vienna, Austria) on CEUR-WS.org (ISSN 1613-0073). Distribution of this
paper is permitted under the terms of the Creative Commons license
CC-by-nc-nd 4.0.

ABSTRACT
Each day, people have to move to carry out their daily tasks, such as
going to work, studying, shopping, etc., signifying that thousands of
trips are taken on public transport on a daily basis. A huge number of
these trips are taken by people with special mobility needs. In spite of
the existence of numerous websites and apps that provide
information about public transport services, there is a lack of
information regarding the accessibility of the routes and sites. We
are, therefore, working on the development of a technological
framework for the processing, management and exploitation of open
data, with the goal of promoting accessibility to city public transport
within the framework of the Access@City project. In this paper we
specifically focus on the design and storage of accessible transport
routes, obtained by means of crowdsourcing techniques, in a NoSQL
graph oriented database.

1. INTRODUCTION
According to the World Bank [16], one billion people, or 15% of the
world’s population, have some type of disability. Although this
depends on each country, a significant percentage of the people who
use public transport have special mobility needs. One of the goals of
smart cities is to improve the quality of life of all citizens [12]. In
fact, in a smart city, anyone should be able to move easily and
according to their needs. There are, therefore, several initiatives
whose objective is to improve accessibility to public transport for
people with disabilities. For example, the World Health Organization
in its “World Report on Disability 2011” [17] proposes to improve
accessibility to public transport for people with disabilities, and this
includes “making public transport systems more flexible for the user
by optimizing the use of information technology”.

    Various projects and software tools address the issue of public
transport and its accessibility. We have carried out a large-scale study
of websites and mobile applications that offer information regarding
the accessibility of public transport.
    The ultimate aim of our study was to discover the
strengths and weaknesses of the public transport information provided
and the services offered.
    With regard to public transport users, we have analysed the
quality and quantity of accessibility information and services; in this
case, we have defined six accessibility levels according to the
accessibility features related to users’ mobility, visual and hearing needs,
along with other user needs, and the capacity to provide accessible
routes related to those user needs, in addition to assigning an
accessibility level to each of the applications studied.
    With regard to public transport data, in addition to identifying
the accessibility information contained in them, we have also
identified their format in order to determine whether the data
provided can be simply managed and reused, thus facilitating their
extraction from the Internet and their subsequent use.
    All of the websites and mobile applications analysed provide
maps and services and some type of accessibility information, but
none of them provides generic mechanisms with which to obtain
accessible transportation data and thereby improve mobility in a
smart city. For example, the website accessible.net shows maps with
accessibility information, but does not include search options. The
website for disabled people, www.discapnet.es, presents information
about training, education, employment, legislation, documentation,
organization and related services, and includes guides for accessible
transport with the option of searching for routes. There are also
websites that provide information regarding accessibility for
wheelchair users, such as wheelmap.org and Rollstuhlrouting.de. The
abil.io website provides information regarding accessible journeys
and service-based routing using public transport. The main reason for
these shortcomings is that there is a significant lack of open and
reusable data concerning public transport and its accessibility.

    In order to address this lack of open transport data and of
information regarding accessibility, we are defining an open data
repository for accessible public transport within the framework of the
Access@City project. The repository will be developed using a
NoSQL database owing to its capacity to manage huge volumes of
information, along with its flexibility and scalability [14]. We have
specifically selected a NoSQL graph oriented database as we are




dealing with highly connected data and wish to be able to make
queries that are more efficient in a graph oriented database [7].

    We propose to develop the graph oriented database, which will
be designed from scratch, by using a methodological approach. In
general, we have detected a lack of specific methodologies for the
design of NoSQL databases that take into account the application
characteristics and the most frequent data queries, which is a
particularly important aspect in this kind of system.
    A trend concerning how to incorporate traditional modeling
notions in this context has recently emerged [2]. For example, in [7],
Kaur and Rani show an example of NoSQL database design. They
use an Entity Relationship Model [4] to obtain a conceptual
representation of the model and different data models for each
NoSQL database (for example, a class diagram for a document
database). Bugiotti et al. propose a methodology based on an
abstract data model for NoSQL databases called NoAM (NoSQL
Abstract Model) [3].
    In summary, it could be said that different approaches exist but
that no solution has, as yet, been commonly accepted. In our opinion,
the key concept here is neither the model nor the representation
techniques to be used, but rather the design process and the aspects to
be considered. The characteristics of NoSQL databases are different
in nature from those of SQL databases. Denormalization and queries
must be taken into account from the beginning of the process.
    In order to address this gap, in our previous work [15], we
proposed some guidelines for the design of document databases in
which we integrated the final use of the data and the most frequent
queries into the design process.
    In this paper, we shall show how to design a NoSQL graph
oriented database in which to store accessible routes generated by the
users of a mobile application. The accessible routes are obtained for
users with special needs by using crowdsourcing techniques (micro
tasks) [6][8][9]. For the storage of the routes we specifically use what
is, according to [13], the most popular graph oriented database, i.e.,
Neo4j [11].
    The remainder of the paper is organized as follows: the
framework of our work is briefly presented in Section 2. In Section 3
we present our approach for the design of a graph oriented database
for the storage of the accessible public transport routes generated. In
Section 4 we show a validation of our proposal, along with a brief
description of the mobile application developed for the generation of
accessible transport routes and their storage in a Neo4j graph oriented
database. Finally, our main conclusions and future work are
summarized in Section 5.

2. FRAMEWORK
The framework of our paper is the Access@City project, whose
objective is to define a technological framework for the processing,
management and exploitation of open data with the goal of promoting
accessibility to city public transport (see Figure 1). We therefore
address the integration of accessibility data derived from three kinds
of sources: 1) existing open data, available from Linked Open Data
(LOD) initiatives or obtained using the web scraping of non-semantic
data sources; 2) private data concerning actual accessible routes,
obtained by means of crowdsourcing and provided by users
themselves through their mobile devices and also processed using
Big Data techniques, integrating both historical and real-time data in
a datastore [10] called “REPOSITORY OF ACCESSIBLE
ROUTES” (shown in Figure 1), and 3) data obtained from already
existing traffic sensors, in the context of a smart city.

    These data sources will be semantically harmonized, while
maintaining their diversity, and will feed an open data management
platform, which consists of a repository (data hub) and a service
generation layer. This layer will be able to provide access to data
consumers, through the automatic generation of customized APIs
composed of services adapted to available data, which will be
exploited by different applications. In particular, we consider the case
of mobile applications, which would make it possible for citizens to
obtain accessible routes between two points in a city in real time, and
even combine different transport networks. These apps would
translate the information regarding our smart city into an accessibility
context, thus resulting in the definition of an accessible city.
    The case study that we present in the following section of this
paper is focused on the marked part of Figure 1, which includes the
application that captures the accessible routes obtained and validated
by users using crowdsourcing techniques and the Big Data repository
(REPOSITORY OF ACCESSIBLE ROUTES) that will store the
routes. We consider that a route is accessible if a person with a
special need can use it to reach his or her destination.

[Figure 1: diagram not reproduced. Labels recoverable from the figure:
Access@City, Multiply@City, Publi@City, Scraper, Script, Source
processing, HARMONIZATION, Intelligent generation of ETL processes,
Data hub, Open Data, Datalog, Repository, BIG DATA, REPOSITORY OF
ACCESSIBLE ROUTES, Automatic Generation of API services, Service 1,
Service 2, Service 3, Service n.]
                 Figure 1. Architecture of Access@City

    The remaining parts of Figure 1 show the rest of the architecture
on which we are working in the Access@City project.

3. DESIGNING A NOSQL GRAPH
ORIENTED DATABASE
Our proposal consists of developing the NoSQL graph oriented
database by following a process based on the traditional database
design. The proposed approach is summarized in Figure 2.

    In a first step, we acquire and analyze the data sources or the
specification in order to be able to determine the entities and their
relationships, along with their properties. This specification is used to
define the conceptual data required to design the conceptual schema
of the database from scratch. The conceptual model can be
represented using, for example, the Entity-Relationship Model [4] or
the UML class diagram [13].

    In the second step, taking into account the conceptual data model
(which is independent of any database technology) and carrying out a
study of the application specific access model and the frequent types
of queries, we design the logical graph oriented database model,
which is independent of any product. This step provides an initial
product-independent specification, thus improving the maintainability
of the NoSQL database, in addition to making migrations between
products easier.

    In the third step, we attain the physical design and the
implementation for a specific NoSQL product, and the product
database model is obtained. In our case, we have chosen Neo4j [11],
which is, according to the database ranking [5], the most popular
graph oriented database. Finally, the implementation phase includes




various physical design tasks, such as balancing the need for
scalability, availability, consistency, partition tolerance and
durability.

[Figure 2: diagram not reproduced. Stages shown: TEXT, Conceptual
Data Model, Logical Graph Oriented DB Model, Product DB Model.
Annotations shown: Data Requirements; application-specific access
patterns and Frequent Queries Analysis; application-specific needs
and balancing physical needs.]
       Figure 2. Graph Oriented Database Design Approach

    Focusing on the second step, that is, on the transformation
of the Conceptual Data Model into the Logical Graph Oriented
Model (red arrow in Figure 2), we shall begin with a conceptual
model represented using the Entity-Relationship (E/R) Model. For
the logical graph oriented model, we shall consider “directed graphs”,
which are graphs composed of nodes (or “vertices”) connected by
relationships called “edges”, each of which is associated with a
direction. The direction of the edges is represented by means of an
arrowhead on the connecting line between the nodes.

    With regard to the transformation, we consider the E/R schema
obtained in the first step, the most common queries as regards the
data (defined in natural language) and the update operations
performed in the database by the applications in an iterative process.
    Bearing these aspects in mind, along with the fact that the data
can be queried in many ways, we have to decide when to transform
an entity or a relationship into a node type or an edge.
   In a first iteration, the summarized rules are:
   •     Each entity will be transformed into a node type labelled as
         the entity and its attributes into properties of this node type.
         The constraints (uniqueness or not null) of the attributes
         will be transformed into constraints of the property/ies of a
         node type.
   •     Each relationship will, in general, be transformed into an
         edge between the nodes, depending specifically on the
         cardinality of the relationship.
       o One-to-one relationships (0/1 to 0/1) will be transformed
         into an edge (without an arrowhead) between both node
         types to denote the 1:1 relation between the entities.
       o One-to-many relationships (0/1 to 0/n) will be
         transformed into an edge with an arrowhead to denote the
         1:N relation between the entities.
       o Many-to-many relationships (0/n to 0/m) will be
         transformed into an edge between both node types with an
         arrowhead on each end to denote the N:M relation
         between the entities. At this point, we have to decide and
         check whether this relationship should be transformed into
         an edge or a node.
       o A generalization is a special kind of relationship and will
         be transformed in the same way as the other types of
         relationships, according to its cardinality and including an
         edge labelled “is-a”.
       o A composition is a special kind of relationship and will be
         transformed in the same way as the other types of 1:N
         relationships, according to its cardinality and including an
         edge labelled “is-composed-of”.
    After this first iteration, we have to refine our logical graph
oriented DB model, taking into account both the access patterns of
the applications and frequent queries in order to be able to query the
connected data in many ways, as required by the users.

4. VALIDATION: APPLICATION FOR
GENERATING AND STORING ACCESSIBLE
ROUTES USING A GRAPH ORIENTED
DATABASE
    In order to validate our proposal, we have developed a native
Android application as we need to use the GPS of the users’ device.
In general, native applications have significant advantages over
hybrid applications because they are able to easily access and use the
built-in capabilities of the user’s devices (e.g., GPS) [1].
    This application will allow users to register for the generation of
accessible routes. They can then use the “start a route” option,
indicating which special need (wheelchair, bike, baby stroller, baby
buggy, etc.) they will have on their journey. During the journey, the
application will periodically register the GPS position (initially, every
25 seconds, although this could change depending on the route, the
special need, etc.). When the user finishes the journey, he/she can
either discard the route or save it. Users may include comments about
the routes taken, reporting possible incidents and/or including photos.

    Figure 3 shows the main functionalities of the application by
means of a Use Case Diagram:

                 Figure 3. Use Case Diagram

    In Figure 4, some of the main screenshots of the application
developed are shown.




                                                                                                           their journeys. The routes are composed of at least two GPS points
                                                                                                           (POINT Entity). Each of these points is composed of X and Y
                                                                                                           coordinates and has a Type, which can be “start point”, “intermediate
                                                                                                           point” or “end point”.
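As an illustration of the route structure just described, the following minimal Python sketch turns a list of periodically registered GPS fixes into typed points. It is not the application's code: the names `Point` and `build_route_points` are hypothetical, and the assumption that the first fix is the start point and the last fix is the end point is ours.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Point:
    """One GPS point of a route (hypothetical structure)."""
    x: float
    y: float
    type: str  # "start point", "intermediate point" or "end point"


def build_route_points(fixes: List[Tuple[float, float]]) -> List[Point]:
    """Label GPS fixes in recording order; a route needs at least two points."""
    if len(fixes) < 2:
        raise ValueError("a route is composed of at least two GPS points")
    last = len(fixes) - 1
    points = []
    for i, (x, y) in enumerate(fixes):
        if i == 0:
            kind = "start point"          # assumption: first fix starts the route
        elif i == last:
            kind = "end point"            # assumption: last fix ends the route
        else:
            kind = "intermediate point"
        points.append(Point(x, y, kind))
    return points
```

A route with only two fixes yields a start point and an end point and no intermediate points.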
                                                                                                               At this point, another necessary and important decision that had
                                                                                                           to be made was which kind of NoSQL database to use for the
                                                                                                           development of our big data repository. In this work, we have chosen
                                                                                                           a graph oriented database owing to the nature of the data of the
                                                                                                           routes, which is highly connected, and the need to query the data in
                                                                                                           many ways. The methodology shown in Figure 2 assumes that the
                                                                                                           database chosen is a NoSQL graph oriented database. The case study
                                                                                                           has been implemented in Neo4j. The application will be connected to
                                                                                                           the Neo4j database using a REST API over an HTTP connection.
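To make the HTTP connection more concrete, the sketch below builds (but does not send) a request for Neo4j's transactional HTTP endpoint using only the Python standard library. Several things here are assumptions rather than details from the paper: the URL path is the one used by Neo4j 3.x (later versions expose `/db/<database>/tx/commit` instead), the host and port are placeholders, and the `User`/`Route` labels, the `CREATES` relationship type and the property names in the Cypher statement are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint (Neo4j 3.x style path, placeholder host and port).
NEO4J_TX_URL = "http://localhost:7474/db/data/transaction/commit"


def build_save_route_request(user_cod, route_cod, duration, url=NEO4J_TX_URL):
    """Build a POST request that would store a new route for an existing user."""
    # Hypothetical Cypher statement; labels and property names are illustrative.
    statement = (
        "MATCH (u:User {User_cod: $user_cod}) "
        "CREATE (u)-[:CREATES]->(r:Route {Route_cod: $route_cod, Duration: $duration})"
    )
    payload = {
        "statements": [
            {
                "statement": statement,
                "parameters": {
                    "user_cod": user_cod,
                    "route_cod": route_cod,
                    "duration": duration,
                },
            }
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; authentication and error handling are omitted from this sketch.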
                                                                                                               In a previous work [15], we developed our repository using a
                                                                                                           NoSQL document database because of its flexibility and ability to
                                                                                                           manage complex data structures [14]. However, as the data in our
                                                                                                           case study are highly connected (the points of a route), traversal is
                                                                                                           much simpler using a graph oriented database.
                                                                                                                In order to obtain the logical graph oriented database model, in
                                                                                                           a first iteration we have to apply the proposed guidelines. We then
                                                                                                           have to consider how the data will be used by the applications, that is,
                                                                                                           what the most frequent queries and application-specific access
                                                                                                           models are.
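The first iteration of the guidelines from Section 3 can be sketched in Python as a purely mechanical mapping. This is an illustrative sketch rather than the authors' tooling: the `Entity` and `Relationship` classes and the `to_logical_graph` function are hypothetical, and the entity attributes, relationship names, directions and cardinalities used in the example are our reading of the partially legible E/R diagram (Figure 5), so they should be treated as assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    name: str
    attributes: List[str]


@dataclass
class Relationship:
    name: str
    source: str
    target: str
    cardinality: str           # "1:1", "1:N" or "N:M"
    kind: str = "association"  # or "generalization" / "composition"


# Number of arrowheads drawn on the edge for each cardinality (first-iteration rules).
ARROWHEADS = {"1:1": 0, "1:N": 1, "N:M": 2}
# Special relationships receive a fixed edge label.
SPECIAL_LABELS = {"generalization": "is-a", "composition": "is-composed-of"}


def to_logical_graph(entities, relationships):
    """Map entities to node types and relationships to edges."""
    node_types = {e.name: list(e.attributes) for e in entities}
    edges = []
    for r in relationships:
        if r.cardinality not in ARROWHEADS:
            raise ValueError(f"unknown cardinality: {r.cardinality}")
        edges.append({
            "label": SPECIAL_LABELS.get(r.kind, r.name),
            "from": r.source,
            "to": r.target,
            "arrowheads": ARROWHEADS[r.cardinality],
        })
    return {"node_types": node_types, "edges": edges}


# Example input: assumed reading of the case-study E/R schema.
ENTITIES = [
    Entity("User", ["User_cod", "Login", "Password", "Name", "E-Mail", "Birth_date"]),
    Entity("Special_Need", ["Need_cod"]),
    Entity("Route", ["Route_cod", "Duration"]),
    Entity("Point", ["Point_cod", "X_Coor"]),
]
RELATIONSHIPS = [
    Relationship("Create", "User", "Route", "1:N"),
    Relationship("Has_Default", "Special_Need", "User", "1:N"),
    Relationship("Contain", "Route", "Point", "1:N"),
]
CASE_STUDY = to_logical_graph(ENTITIES, RELATIONSHIPS)
```

The result is only a starting point: the decision of whether an N:M relationship should become an edge or a node, and the refinement driven by frequent queries, remain design decisions that this mapping does not automate.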
                                                                                                               In our case study, the most common queries will be related to
                                                                                                           users or to routes. The most frequent queries are:
                                                                                                               1) Data of the registered users, including their default special
                                                                                                           need.
                                                                                                               2) Is a route between two given points accessible?
                                                                                                               3) Comments concerning or photos of the stored accessible
                                                                                                           routes.
                                                                                                               4) Of which points is a route composed?
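Queries 2 and 4 above can be illustrated with a toy in-memory stand-in for the graph, written in Python. This sketch makes several assumptions that are not in the paper: the class `RouteStore` and its methods are hypothetical, points are compared by exact coordinates (a real system would need some spatial tolerance), and query 2 is interpreted as "does a stored accessible route start and end at the given points?".

```python
from typing import Dict, List, Tuple

Coord = Tuple[float, float]


class RouteStore:
    """Toy stand-in for Route nodes linked to their ordered Point nodes."""

    def __init__(self) -> None:
        self._points: Dict[str, List[Coord]] = {}

    def insert_route(self, route_cod: str, points: List[Coord]) -> None:
        """Update operation: store a new route with its ordered points."""
        self._points[route_cod] = list(points)

    def points_of(self, route_cod: str) -> List[Coord]:
        """Query 4: of which points is a route composed?"""
        return list(self._points[route_cod])

    def routes_between(self, start: Coord, end: Coord) -> List[str]:
        """Codes of stored routes whose first and last points match."""
        return [
            cod for cod, pts in self._points.items()
            if pts and pts[0] == start and pts[-1] == end
        ]

    def is_accessible(self, start: Coord, end: Coord) -> bool:
        """Query 2 (assumed reading): is there a stored route between the points?"""
        return bool(self.routes_between(start, end))
```

In a graph oriented database the same questions become traversals from a route node to its point nodes, which is the access pattern that motivates the choice of model.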

                                                                                                               Apart from the queries, there will also be two basic update
     Figure 4. Screenshots of the application “Gestión de rutas                                            operations in the database: inserting a new registered user or inserting
     accesibles” (Generation and Storage of Accessible Routes)                                             a new route.
    For the storage of the routes, we have developed a Big Data                                                Bearing in mind the conceptual data model and the
repository in which to store accessible routes obtained by means of                                        aforementioned queries (defined in natural language), we have
the aforementioned application using crowdsourcing techniques.                                             designed the new model (according to our proposed guidelines):
    The Big Data repository in our Case Study was developed by first                                            a) A node type labelled for each entity: User, Special_Need,
identifying the data requirements. We then defined the conceptual                                               Route and Point. The attributes of the entity become the
data model using an Entity-Relationship Data Model (Figure 5).                                                  properties of each node type, as can be seen in Figure 6.
                                   Comments            Photos
   User_cod                                                                               Route_cod                                      Special
                                                                                                                             User        _Need
                                                                                                                                                      Route       Point
    Login
                                                                                           Duration
   Password                    (1,1)                            (0,n)
                    User                      Create                    Route
    Name
                           (0,n)                                            (1,1)
    E-Mail

   Birth_date
                   Has                                                  Contain
                  Default

                                                                                          Point_cod
   Need_cod                                                                      (2,.n)
                           (1,1)
                                                                                            X_Coor
    Name                            (0,n)                                Point
                Special_Need                                                                 Y_Coor
  Description
                                                                                              Type

                                                                                                                     Figure 6. Node of User type with its properties
                 Figure 5. Conceptual Data Model
                                                                                                                b) It is now necessary to carry out the transformation of the
    This conceptual data model includes user registration data                                                  existing relationships. Here, we have to decide whether
(USER Entity) and information about the routes (ROUTE Entity)                                                   relationships will be implemented using an edge or a node.
taken by users with special needs (SPECIAL_NEED Entity). We also
consider possible comments made and/or photos taken by users on




     There are three relationships: Has Default relationship (1:N), Create relationship (1:N:M) and Contain relationship (1:N).

     c) When analyzing the queries and the application data, we consider grouping the information of certain entities of the conceptual data model that will be used together. We therefore exclude the Special_Need node, as its information can be considered as information regarding the User and the Route. We shall, therefore, include the Special Need information as properties of the User and Route node types. This means that we now continue working with only three node types and two relationships.

   In order to transform both 1:N relationships, Create and Contain, we create for each one an edge labelled with the name of the relationship, with an arrowhead to denote the 1:N relation between the entities.

    [Figure 7: node types User, Route and Point, joined by the edges Create (User, Route) and Contain (Route, Point).]

                  Figure 7. Node types with edges

    The final graph oriented database design will, therefore, consist of three node types: USER, ROUTE and POINT. USER nodes will store, for each user, a set of information (name, email, birth date, etc.) and information on their default special needs. ROUTE nodes will store the special needs for that route and some additional information, such as comments and photos. POINT nodes will store the X and Y coordinates and the type of point (initial, intermediate or end point). The information regarding routes and subroutes with a special need can be easily obtained with this design.

5. CONCLUSIONS
Although document databases have a dynamic schema (but are not schemaless, as stated in some forums), it is very important to design this schema correctly because of its impact on the performance of the database. A methodological process with which to guide the user in the design process of a NoSQL database is, therefore, required. However, despite the existence of several works and many websites with best practices, there is no commonly accepted solution for the design of NoSQL databases.

    In [15], we proposed some guidelines for the design of a document database. In this paper, and in order to complete our NoSQL Design Methodology, we address the design of NoSQL graph oriented databases. In order to validate this proposal, we present a case study in which we generate accessible routes created by users with special mobility needs using a micro-task based on crowdsourcing and store them in a NoSQL graph oriented database in Neo4j. The design process is principally based on the requirements of the applications and the most frequent queries that the system will have to deal with.

    One line of future work will be the formal representation of the queries and the specification of a formal method with which to transform the conceptual model and the query model into a logical design model based on a NoSQL database (document DB or graph oriented DB). We plan to put the application into use with users with disabilities in the near future. Moreover, an immediate future task is to extend the functionalities of the mobile application in order to create a version that can be evaluated by users with special needs.

ACKNOWLEDGMENTS
This work was partially supported by the Access@City project (TIN2016-78103-C2-1-R), funded by the Spanish Ministry of Economy and Competitiveness.

REFERENCES
[1] Abed, R. (2016). Mobile. Hybrid vs Native Mobile Apps – The Answer is Clear. Retrieved from: https://ymedialabs.com/hybrid-vs-native-mobile-apps-the-answer-is-clear/
[2] Atzeni, P. (2015). Models for NoSQL databases: a contradiction? Presentation. Sezione di Informatica e Automazione.
[3] Bugiotti, F., Cabibbo, L., Atzeni, P. and Torlone, R. (2014). Database Design for NoSQL Systems. ER 2014, LNCS 8824, pp. 223-231.
[4] Chen, P. (March 1976). The Entity-Relationship Model - Toward a Unified View of Data. ACM Transactions on Database Systems, 1(1): 9–36.
[5] DB-Engines Ranking (2017). https://db-engines.com/en/ranking.
[6] Estellés Arolas, E., González Ladrón de Guevara, F. (2012). Towards an integrated crowdsourcing definition. Journal of Information Science, 38(2): 189-200.
[7] Frisendal, T. (2016). Graph Data Modeling for NoSQL and SQL. Visualize Structure and Meaning. Technics Publications.
[8] Grier, D. A. (2013). Crowdsourcing For Dummies. Paperback.
[9] Mao, K., Capra, L., Harman, M., Jia, Y. (2016). A survey of the use of crowdsourcing in software engineering. The Journal of Systems and Software.
[10] Marz, N., Warren, J. (2015). Big Data: Principles and best practices of scalable realtime systems. Manning.
[11] Neo4j. https://neo4j.com/product/
[12] Neirotti, P., De Marco, A., Cagliano, A. C., Mangano, G., & Scorrano, F. (2014). Current trends in Smart City initiatives: Some stylised facts. Cities, 38, 25-36.
[13] Object Management Group (2015). OMG Unified Modeling Language TM (OMG UML) Version 2.5. Retrieved from http://www.omg.org/spec/UML/2.5/ Solid IT (2017).
[14] Sullivan, D. (2015). NoSQL For Mere Mortals. Addison-Wesley.
[15] Vela, B., Cavero, J.M., Caceres, P., Sierra, A. & Cuesta, C.E. (2017). Defining a NoSQL document of accessible transport routes. Darli-Ap 2017 in iThings-GreenCom-CPSCom-SmartData 2017. Exeter, UK.
[16] World Bank. http://www.worldbank.org/en/topic/disability/overview
[17] World Report on Disability, 2011. http://www.who.int/disabilities/world_report/2011/report/en/
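
    The graph design described above (three node types, USER, ROUTE and POINT, joined by Create and Contain edges) can be illustrated with a minimal in-memory sketch in Python. This is not the Neo4j implementation used in the case study, only a hypothetical model of it. The class and method names, the edge directions (user to route, route to point), the reading of query 2 as "some stored route tagged with the given special need visits both points in order", the minimum of two points per route (taken from the (2,n) label in Figure 5), and the sample data are all assumptions made for illustration. Some User properties (password, e-mail, birth date) are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    point_cod: int
    x_coor: float
    y_coor: float
    point_type: str  # "initial", "intermediate" or "end"


@dataclass
class Route:
    route_cod: int
    duration: int  # unit is an assumption (minutes)
    special_needs: set = field(default_factory=set)  # folded in from Special_Need
    comments: list = field(default_factory=list)
    photos: list = field(default_factory=list)


@dataclass
class User:
    user_cod: int
    login: str
    name: str
    default_special_need: str  # folded in from Special_Need


class RouteGraph:
    """Hypothetical property graph: nodes stored per type, edges as adjacency lists."""

    def __init__(self):
        self.users, self.routes, self.points = {}, {}, {}
        self.create = {}   # Create edges: user_cod -> [route_cod, ...]
        self.contain = {}  # Contain edges: route_cod -> [point_cod, ...] in travel order

    # Update operation: insert a new registered user.
    def insert_user(self, user):
        self.users[user.user_cod] = user
        self.create.setdefault(user.user_cod, [])

    # Update operation: insert a new route created by an existing user.
    def insert_route(self, user_cod, route, points):
        if len(points) < 2:
            raise ValueError("a route is assumed to contain at least two points")
        self.routes[route.route_cod] = route
        for p in points:
            self.points[p.point_cod] = p
        self.create[user_cod].append(route.route_cod)
        self.contain[route.route_cod] = [p.point_cod for p in points]

    # Query 1: data of a registered user, including the default special need.
    def user_data(self, user_cod):
        return self.users[user_cod]

    # Query 2: is there an accessible route (or subroute) between two points?
    def is_accessible(self, start_cod, end_cod, need):
        for route_cod, pts in self.contain.items():
            if need not in self.routes[route_cod].special_needs:
                continue
            if start_cod in pts and end_cod in pts:
                if pts.index(start_cod) < pts.index(end_cod):
                    return True
        return False

    # Query 3: comments and photos of a stored route.
    def route_feedback(self, route_cod):
        route = self.routes[route_cod]
        return route.comments, route.photos

    # Query 4: points of which a route is composed.
    def route_points(self, route_cod):
        return list(self.contain[route_cod])


# Usage with made-up data.
graph = RouteGraph()
graph.insert_user(User(1, "ana", "Ana", "wheelchair"))
stops = [
    Point(10, 0.0, 0.0, "initial"),
    Point(11, 1.0, 0.5, "intermediate"),
    Point(12, 2.0, 1.0, "end"),
]
graph.insert_route(1, Route(100, 15, {"wheelchair"}, ["ramp at exit"], []), stops)
print(graph.is_accessible(10, 11, "wheelchair"))
print(graph.route_points(100))
```

    Because the special need is stored as a property of the route node, query 2 only has to follow Contain edges and never has to join against a separate Special_Need node, which is the motivation given for grouping that information in step c).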



