=Paper= {{Paper |id=Vol-2277/paper17 |storemode=property |title= Storing Metagraph Model in Relational, Document-Oriented, and Graph Databases |pdfUrl=https://ceur-ws.org/Vol-2277/paper17.pdf |volume=Vol-2277 |authors=Valeriy Chernenkiy,Yuriy Gapanyuk,Yuriy Kaganov,Ivan Dunin,Maxim Lyaskovsky,Vadim Larionov |dblpUrl=https://dblp.org/rec/conf/rcdl/ChernenkiyGKDLL18 }} == Storing Metagraph Model in Relational, Document-Oriented, and Graph Databases == https://ceur-ws.org/Vol-2277/paper17.pdf
          Storing Metagraph Model in Relational, Document-
                   Oriented, and Graph Databases
                 © Valeriy M. Chernenkiy © Yuriy E. Gapanyuk © Yuriy T. Kaganov
                   © Ivan V. Dunin © Maxim A. Lyaskovsky © Vadim S. Larionov
                            Bauman Moscow State Technical University,
                                         Moscow, Russia
                    chernen@bmstu.ru, gapyu@bmstu.ru, kaganov.y.t@bmstu.ru,
                johnmoony@yandex.ru, maksim_lya@mail.ru, larionov.vadim@mail.ru
            Abstract. This paper proposes an approach for metagraph model storage in databases with different
     data models. The formal definition of the metagraph data model is given. The approaches for mapping the
     metagraph model to the flat graph, document-oriented, and relational data models are proposed. The
     limitations of the RDF model in comparison with the metagraph model are considered. It is shown that the
     metagraph model addresses RDF limitations in a natural way without emergence loss. The experimental results
     for storing the metagraph model in different databases are given.
            Keywords: metagraph, metavertex, flat graph, graph database, document-oriented database, relational
     database.

Proceedings of the XX International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL’2018), Moscow, Russia, October 9-12, 2018

1 Introduction
At present, on the one hand, domains are becoming more and more complex. Therefore, models based on complex graphs are increasingly used in various fields of science, from mathematics and computer science to biology and sociology.
On the other hand, the graph databases currently available are based only on flat graph or hypergraph models, which are not capable enough to serve as repositories for the complex relations found in these domains.
We propose to use a metagraph data model that allows storing more complex relationships than flat graph or hypergraph data models.
This paper is devoted to methods of storing the metagraph model on top of the flat graph, document-oriented, and relational data models.
We have tried to offer a general approach for storing metagraph data in any database that uses one of the above-mentioned data models. At the same time, we conducted experiments on several databases. The results of the experiments are presented in the corresponding section.

2 The description of the metagraph model
In this section, we describe the metagraph model. This model may be considered as a “logical” model of the metagraph storage.
A metagraph is a kind of complex network model, proposed by A. Basu and R. Blanning [1] and then adapted for information systems description by the present authors [2]. According to [2]:
    $$MG = \langle MG^{V}, MG^{MV}, MG^{E} \rangle,$$
where $MG$ is the metagraph; $MG^{V}$ is the set of metagraph vertices; $MG^{MV}$ is the set of metagraph metavertices; $MG^{E}$ is the set of metagraph edges.
A metagraph vertex is described by the set of attributes: $v_i = \{atr_k\}$, $v_i \in MG^{V}$, where $v_i$ is a metagraph vertex and $atr_k$ is an attribute.
A metagraph edge is described by the set of attributes, the source and destination vertices, and the edge direction flag:
    $$e_i = \langle v_S, v_E, eo, \{atr_k\} \rangle, \quad e_i \in MG^{E}, \quad eo = true \mid false,$$
where $e_i$ is a metagraph edge; $v_S$ is the source vertex (metavertex) of the edge; $v_E$ is the destination vertex (metavertex) of the edge; $eo$ is the edge direction flag ($eo = true$ for a directed edge, $eo = false$ for an undirected edge); $atr_k$ is an attribute.
The metagraph fragment:
    $$MG_i = \{ev_j\}, \quad ev_j \in (MG^{V} \cup MG^{MV} \cup MG^{E}),$$
where $MG_i$ is a metagraph fragment and $ev_j$ is an element that belongs to the union of vertices, metavertices, and edges.
The metagraph metavertex:
    $$mv_i = \langle \{atr_k\}, MG_j \rangle, \quad mv_i \in MG^{MV},$$
where $mv_i$ is a metagraph metavertex that belongs to the set of metagraph metavertices $MG^{MV}$; $atr_k$ is an attribute; $MG_j$ is a metagraph fragment.
Thus, a metavertex includes a fragment of the metagraph in addition to its attributes. The presence of private attributes and connections for a metavertex is a distinguishing feature of a metagraph. It makes the definition of the metagraph holonic: a metavertex may include a number of lower-level elements and, in turn, may be included in a number of higher-level elements.
From the general system theory point of view, a
metavertex is a special case of the manifestation of the emergence principle, which means that a metavertex with its private attributes and connections becomes a whole that cannot be separated into its component parts. An example of a metagraph is shown in Figure 1.

Figure 1 The example of metagraph

This example contains three metavertices: mv1, mv2, and mv3. Metavertex mv1 contains vertices v1, v2, v3 and the edges e1, e2, e3 that connect them. Metavertex mv2 contains vertices v4, v5 and the edge e6 that connects them. Edges e4 and e5 are examples of edges connecting vertices contained in different metavertices mv1 and mv2 (v2-v4 and v3-v5, respectively). Edge e7 is an example of an edge connecting metavertices mv1 and mv2. Edge e8 is an example of an edge connecting vertex v2 and metavertex mv2. Metavertex mv3 contains metavertex mv2, vertices v2, v3 and edge e2 from metavertex mv1, and also edges e4, e5, e8, showing the holonic nature of the metagraph structure. Figure 1 shows that the metagraph model allows describing complex data structures, and that it is the metavertex that allows implementing the emergence principle in data structures.
It should be noted that, according to [2], the metagraph model also includes more complex elements such as metaedges and metagraph agents. However, they are derived from the model elements considered here and do not affect the methods of metagraph storage in different databases.

3 Mapping the metagraph model to storage models
The logical model described in the previous section is a higher-level model. To store the metagraph model efficiently, we must create mappings from the “logical” model to the “physical” models used in different databases.
In this section, we consider mappings of the metagraph model to the flat graph model, the document model, and the relational model.

3.1 Mapping metagraph model to the flat graph model
The main idea of this mapping is to flatten the hierarchical metagraph model.
Of course, it is impossible to turn a hierarchical graph model into a flat one directly. The key idea is to use multipartite graphs [3].
Consider a flat graph:
    $$FG = \langle FG^{V}, FG^{E} \rangle,$$
where $FG^{V}$ is the set of graph vertices and $FG^{E}$ is the set of graph edges.
Then the flat graph $FG$ may be unambiguously transformed into a bipartite graph $BFG$:
    $$BFG = \langle BFG^{VERT}, BFG^{EDGE} \rangle,$$
    $$BFG^{VERT} = \langle FG^{BV}, FG^{BE} \rangle,$$
    $$FG^{V} \leftrightarrow FG^{BV}, \quad FG^{E} \leftrightarrow FG^{BE},$$
where $BFG^{VERT}$ is the set of graph vertices and $BFG^{EDGE}$ is the set of graph edges. The set $BFG^{VERT}$ can be divided into two disjoint and independent sets $FG^{BV}$ and $FG^{BE}$, and there are two isomorphisms $FG^{V} \leftrightarrow FG^{BV}$ and $FG^{E} \leftrightarrow FG^{BE}$. Thus, we transform the edges of graph $FG$ into a subset of the vertices of graph $BFG$. The set $BFG^{EDGE}$ stores the information about relations between vertices and edges in graph $FG$.
It is important to note that, from the bipartite graph point of view, it makes no difference whether the original graph $FG$ is oriented or not, because the edges of graph $FG$ are represented as vertices and the orientation flag becomes a property of the new vertex.
From the general system theory point of view, by transforming an edge into a vertex we treat the relation between entities as a special kind of higher-order entity that includes lower-level entities.
Now we apply this flattening approach to metagraphs. In the case of a metagraph, the target graph is not bipartite but tripartite, $TFG$:
    $$TFG = \langle TFG^{VERT}, TFG^{EDGE} \rangle,$$
    $$TFG^{VERT} = \langle TFG^{V}, TFG^{E}, TFG^{MV} \rangle,$$
    $$TFG^{V} \leftrightarrow MG^{V}, \quad TFG^{E} \leftrightarrow MG^{E}, \quad TFG^{MV} \leftrightarrow MG^{MV}.$$
The set $TFG^{VERT}$ can be divided into three disjoint and independent sets $TFG^{V}$, $TFG^{E}$, $TFG^{MV}$. There are three isomorphisms between the metagraph vertices, metavertices, and edges and the corresponding subsets of $TFG^{VERT}$: $TFG^{V} \leftrightarrow MG^{V}$, $TFG^{E} \leftrightarrow MG^{E}$, $TFG^{MV} \leftrightarrow MG^{MV}$. The set $TFG^{EDGE}$ stores the information about relations between vertices, metavertices, and edges in the original metagraph.

Figure 2 The example of metagraph for flattening (elements shown: metavertices mv1, mv2; vertices v1, v2, v3; edges e1, e2, e3)
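The flattening described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the class names, the `flatten` function, and the role labels `source`, `target`, and `contains` on the flat links are assumptions introduced here, and element names are assumed to be unique across vertices, edges, and metavertices. The toy metagraph at the end is hypothetical and is not the one drawn in Figure 2.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    attrs: dict = field(default_factory=dict)


@dataclass
class Edge:
    name: str
    source: str            # v_S: name of the source vertex or metavertex
    target: str            # v_E: name of the destination vertex or metavertex
    directed: bool = True  # eo flag
    attrs: dict = field(default_factory=dict)


@dataclass
class Metavertex:
    name: str
    members: list          # names of the elements in the fragment MG_j
    attrs: dict = field(default_factory=dict)


def flatten(vertices, edges, metavertices):
    """Map a metagraph to a flat graph whose nodes fall into three parts."""
    nodes = {}
    for v in vertices:
        nodes[v.name] = {"kind": "vertex", **v.attrs}
    for e in edges:
        # The direction flag becomes a property of the node that replaces the edge.
        nodes[e.name] = {"kind": "edge", "directed": e.directed, **e.attrs}
    for mv in metavertices:
        nodes[mv.name] = {"kind": "metavertex", **mv.attrs}

    links = []
    for e in edges:
        links.append((e.source, e.name, "source"))
        links.append((e.name, e.target, "target"))
    for mv in metavertices:
        for member in mv.members:
            links.append((mv.name, member, "contains"))
    return nodes, links


# Hypothetical toy metagraph: metavertex mv1 holds v1, v2 and the edge e1 between them.
nodes, links = flatten(
    [Vertex("v1"), Vertex("v2")],
    [Edge("e1", "v1", "v2")],
    [Metavertex("mv1", ["v1", "v2", "e1"])],
)
print(sorted(nodes))
print(links)
```

The role labels are a design choice of this sketch: without them, a link from a metavertex to an edge node could mean either "is the source of" or "contains".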
Figure 3 The example of flattened metagraph (nodes shown: mv1, mv2, v1, v2, v3, e1, e2, e3)

Consider the example of flattening the metagraph model. The original metagraph is shown in Fig. 2 and the corresponding flat graph is shown in Fig. 3. The vertices, metavertices, and edges of the original metagraph are represented by vertices of different shapes.
From the general system theory point of view, emergent metagraph elements such as vertices, metavertices, and edges are transformed into independent vertices of the flat graph.
The proposed mapping may be used for storing metagraph data in graph or hybrid databases such as Neo4j or ArangoDB.
It is important to note that flattening the metagraph model does not solve all problems of graph database usage. Consider an example of a query in “Cypher”, the query language of the Neo4j database:
  MATCH (n1:Label1)-[rel:TYPE]->(n2:Label2) RETURN n1, rel, n2
One can see that the notation is RDF-like and presupposes that graph edges are named. But the flattened metagraph model does not use named edges, because metagraph edges are transformed into vertices.
Thus, query languages of flat graph databases are not suitable for the metagraph model because they blur the semantics of the metagraph model.

3.2 Mapping metagraph model to the document model
From the general system theory point of view, emergent metagraph elements such as vertices, metavertices, and edges should be represented as independent entities.
In the previous subsection, we used flat graph vertices for such a representation. But instead of graph vertices, we can also represent independent entities as documents in a document-oriented database. Flat graph edges are represented as relations via id-idrefs between documents.
For the sake of clarity, we use a Prolog-like predicate textual representation. This representation may be easily converted into JSON or XML formats because it is compliant with JSON semantics and contains nested key-value pairs and collections.
Classical Prolog uses the following form of the predicate: $predicate(atom_1, atom_2, \cdots, atom_N)$. We use an extended form of the predicate where, along with atoms, a predicate can also include key-value pairs and nested predicates: $predicate(atom, \cdots, key = value, \cdots, predicate(\cdots), \cdots)$. The mapping of metagraph model fragments into the predicate representation is described in detail in [2].
The proposed textual representation may be used for storing metagraph data in a document-oriented database, or in text or document fields of a relational database, using JSON or XML formats.

3.3 Mapping metagraph model to the relational model
Nowadays NoSQL databases are very popular. But traditional relational databases are still the most mature solution and are widely used in information systems. Therefore, we also need a relational representation of the metagraph model. There are two ways to store metagraphs in a relational database.
The first way is to use a pure relational schema. In this case, the proposed metagraph model may be transformed into the database schema directly or with some optimization. The tables vertices, metavertices, and edges may be used. Figure 4 contains a graphical representation of such a schema for the PostgreSQL database. The table “metavertex” contains the representation of vertices and metavertices. The table “relation” contains the representation of edges.

Figure 4 The database schema for pure relational metagraph representation

The second way is to use the document-oriented capabilities of a relational database. For example, the latest versions of the PostgreSQL database provide such capabilities. Figure 5 contains a graphical representation of such a schema.

Figure 5 The database schema for document-relational metagraph representation
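The second, document-relational way can be sketched as follows. This is only an illustration under stated assumptions: SQLite from the Python standard library stands in for PostgreSQL, and the table name `element`, its columns, and the JSON field names (`kind`, `source`, `target`, `members`) are hypothetical and are not taken from the schema in Figure 5.

```python
import json
import sqlite3

# In-memory SQLite database standing in for a relational server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE element (id TEXT PRIMARY KEY, doc TEXT NOT NULL)")

# Each vertex, edge, and metavertex is one JSON document; links are plain id references.
docs = [
    {"id": "v1", "kind": "vertex"},
    {"id": "v2", "kind": "vertex"},
    {"id": "e1", "kind": "edge", "source": "v1", "target": "v2", "directed": True},
    {"id": "mv1", "kind": "metavertex", "members": ["v1", "v2", "e1"]},
]
conn.executemany(
    "INSERT INTO element (id, doc) VALUES (?, ?)",
    [(d["id"], json.dumps(d)) for d in docs],
)


def load(element_id):
    """Fetch one document by id, or None if it is not stored."""
    row = conn.execute(
        "SELECT doc FROM element WHERE id = ?", (element_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def resolve_members(metavertex_id):
    """Follow the id references of a metavertex in application code."""
    metavertex = load(metavertex_id)
    return [load(ref) for ref in metavertex["members"]]


print([d["id"] for d in resolve_members("mv1")])
```

In this sketch the database knows nothing about the references inside the documents, so every reference is followed by a separate lookup issued from application code.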
In this case, vertices, metavertices, and edges are stored as XML or JSON documents in relational tables. The drawback of this approach is the storage of id-idrefs between documents. In a relational database, we have to resolve them programmatically, which decreases the overall system performance.

4 Why not using RDF model?
Nowadays the semantic web approach to knowledge storage is widely used. In this case, the Resource Description Framework (RDF) is used as the data model, and SPARQL is used as the query language. RDFS (RDF Schema) and OWL (OWL2), built on top of RDF, are used as ontology definition languages. Using RDFS and OWL, it is possible to express various relationships between ontology elements (class, subclass, equivalent class, etc.) [4]. For RDF persistence and SPARQL processing, special storage systems are used, e.g., Apache Jena.
Unfortunately, the RDF approach has several limitations when describing complex situations. In this section, we consider these limitations according to our paper [5]. The root of the limitations is the absence of the emergence principle in the flat graph RDF model.

4.1 The reification limitation
Reification is used to define RDF statements about other RDF statements. According to the RDF Primer [6]: ‘the purpose of reification is to record information about when or where statements were made, who made them, or other similar information (this is sometimes referred to as “provenance” information)’. Thus, reification is considered an auxiliary technique to “log” provenance information about statements.
RDF contains the reified triple construction to describe reification in the following form:
    StatementID subject predicate object
Consider the example of the complex statement: ‘James noted that Paul noted at 4 p.m. that John arrived in London’. In the reified triples form, this example may be represented as follows:
1. StatementID_1 John arrived_in London
2. StatementID_2 StatementID_1 has_author Paul
3. StatementID_3 StatementID_1 has_time “4 p.m.”
4. StatementID_4 StatementID_2 has_author James
5. StatementID_5 StatementID_3 has_author James
In statements 2 and 3, StatementID_1 is used as the subject. Statements 2 and 3 contain provenance information about the author and time of statement 1. Statements 4 and 5 contain provenance information about the author of statements 2 and 3. The RDF graph form of this example is shown in Fig. 6.

Figure 6 The example of RDF reification

In Fig. 6, statements 1, 2, and 3 are highlighted, whereas statements 4 and 5 are not highlighted so as not to clutter the figure. Fig. 6 shows that a reified triple may be considered as a metavertex, but in a very restrictive form, containing only one subject, predicate, and object.
The problem shown in this example is emergence loss caused by the artificial splitting of the whole situation into a few RDF statements. Statements 4 and 5 are represented by separate RDF statements, but they would more intuitively be represented by a single unit containing the whole situation.
The metagraph approach helps to represent this example more naturally and holistically. From the metagraph point of view, this example contains three nested situations:
• Situation 1. John arrived in London;
• Situation 2. Paul noted at 4 p.m. situation 1;
• Situation 3. James noted situation 2.
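The contrast between the five reified statements and the three nested situations can be sketched in Python. This is only an illustration: the dictionary layout, the field names `noted_by`, `has_time`, and `content`, and both helper functions are assumptions made for this sketch and are not part of RDF or of the metagraph model.

```python
# Reified form: StatementID -> (subject, predicate, object).
reified = {
    "StatementID_1": ("John", "arrived_in", "London"),
    "StatementID_2": ("StatementID_1", "has_author", "Paul"),
    "StatementID_3": ("StatementID_1", "has_time", "4 p.m."),
    "StatementID_4": ("StatementID_2", "has_author", "James"),
    "StatementID_5": ("StatementID_3", "has_author", "James"),
}


def statements_about(statement_id):
    """IDs of the statements whose subject is the given statement."""
    return sorted(
        sid for sid, (subject, _, _) in reified.items() if subject == statement_id
    )


# Nested form: each situation holds the previous one as its content.
situation1 = {"name": "Situation1", "content": ("John", "arrived_in", "London")}
situation2 = {
    "name": "Situation2",
    "noted_by": "Paul",
    "has_time": "4 p.m.",
    "content": situation1,
}
situation3 = {"name": "Situation3", "noted_by": "James", "content": situation2}


def depth(situation):
    """Number of situations nested inside each other, counting this one."""
    inner = situation["content"]
    return 1 + depth(inner) if isinstance(inner, dict) else 1


print(statements_about("StatementID_1"))
print(depth(situation3))
```

In the reified form, Paul's note is spread over statements 2 and 3 and has to be reassembled by following statement IDs; in the nested form it is a single unit that carries both the author and the time.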
Figure 7 The metagraph representation of RDF reification

Each situation is represented by a metavertex as shown in Fig. 7. The attribute “has_time=4 p.m.” may be bound either to the edge “noted” or to the metavertex “Situation 2” (Fig. 7 shows both cases).
The textual representation of Fig. 7 is shown below:
Metavertex(Name=Situation3,
  Vertex(Name=James),
  Metavertex(Name=Situation2,
    Attribute(has_time,"4 p.m."),
    Vertex(Name=Paul),
    Metavertex(Name=Situation1,
      Vertex(Name=John),
      Vertex(Name=London),
      Edge(Name=arrived_in, vS=John, vE=London, eo=true)),
    Edge(Name=noted, vS=Paul, vE=Situation1, eo=true,
      Attribute(has_time,"4 p.m."))),
  Edge(Name=noted, vS=James, vE=Situation2, eo=true))
This example shows that the metagraph approach allows representing reification without emergence loss, keeping each nested situation in its own metavertex.

4.2 The N-ary relationship limitation
An N-ary relationship is a situation where a predicate combines several subjects or objects or has nested predicates. Such a situation is a problem from the RDF point of view. To address this problem, a W3C Working Group Note was published [7].
Consider the example of the complex statement: ‘John arrived to London at 4 p.m. by train in order to meet his classmates James and Paul’. This is a typical example of an N-ary relationship, as shown in Fig. 8. Neither of the two problems shown in Fig. 8 can be represented by a pure RDF triple model.

Figure 8 The example of N-ary relationship (two problem areas are marked: Problem_meet and Problem_arrived)

The “Problem_arrived” is that the predicate “arrived_to” has nested predicates “has_time” and “by_transport”. According to [7], we add a supporting subject “Problem_arrived” representing an instance of the relation.
The “Problem_meet” is that the predicate “to_meet” has two objects, “James” and “Paul”. According to [7], there are several ways to solve this problem. We may use the list construct of RDF, or we may join the objects “James” and “Paul” into a classmates group. We do the latter in this example.
The solution is shown in Fig. 9. We have added the supporting vertices “Classmates_group” and “Problem_arrived”, which are shown in rounded boxes. In the predicate “to_meet”, “Classmates_group” is an object, while in the predicate “includes” it is a subject. In the predicate “has_person”, “John” is an object, while in the predicate “to_meet” he is a subject.
Since we do not use reification, this may be represented in the RDF triple form “subject predicate object” as follows:
1. Problem_arrived has_person John
2. Problem_arrived arrived_to London
3. Problem_arrived by_transport train
4. Problem_arrived has_time “4 p.m.”
5. John to_meet Classmates_group
6. Classmates_group includes James
7. Classmates_group includes Paul
As in the reification example, the problem here is emergence loss due to the artificial splitting of the situation. The “Problem_arrived” vertex is added not because it describes the situation in a natural way, but because it is required to keep a consistent triple structure. In a large RDF graph, many supporting vertices may obscure meaningful understanding of the situation.
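The effect of the supporting vertices can be seen in a small Python sketch that holds the seven triples above as plain tuples. The helper functions `objects`, `whom_does_meet`, and `arrival_details` are hypothetical names introduced for this illustration; no RDF library is used.

```python
# The seven triples from the example, as (subject, predicate, object) tuples.
triples = [
    ("Problem_arrived", "has_person", "John"),
    ("Problem_arrived", "arrived_to", "London"),
    ("Problem_arrived", "by_transport", "train"),
    ("Problem_arrived", "has_time", "4 p.m."),
    ("John", "to_meet", "Classmates_group"),
    ("Classmates_group", "includes", "James"),
    ("Classmates_group", "includes", "Paul"),
]


def objects(subject, predicate):
    """All objects of the triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def whom_does_meet(person):
    """People the person meets, reached through the supporting group vertex."""
    people = []
    for group in objects(person, "to_meet"):
        people.extend(objects(group, "includes"))
    return people


def arrival_details(person):
    """Transport and time, reached through the supporting relation vertex."""
    details = []
    for s, p, o in triples:
        if p == "has_person" and o == person:
            details.append((objects(s, "by_transport"), objects(s, "has_time")))
    return details


print(whom_does_meet("John"))
print(arrival_details("John"))
```

Both questions about John require an extra hop through a vertex that does not appear in the original sentence, which is the emergence loss discussed above.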
                                                                     without emergence loss, keeping each nested situation in
                           Problem_meet
                                                                     its own metavertex.
               object:James          object:Paul                              edge:
Figure 9 The RDF representation of N-ary relation example

    As in the reification example, the metagraph approach helps to represent
this example in a more natural and holistic way, as shown in Fig. 10.
    The “Problem_arrived” is solved by binding attributes “has_time=4 p.m.”
and “by_transport=train” to the edge “arrived_to”. The “Problem_meet” is
solved by using metavertex “Classmates_group”, which includes vertices
“James” and “Paul”.
    The implicit knowledge about “Classmates_group” living in London may be
shown either by the edge “living” or by the inclusion of metavertex
“Classmates_group” into metavertex “London” (Fig. 10 shows both cases).
    The textual representation of Fig. 10 is shown below:
Metavertex(Name=London,
  Metavertex(Name=Classmates_group,
    Vertex(Name=James),
    Vertex(Name=Paul),
    Edge(Name=living, vS=Classmates_group, vE=London,
      eo=true)))
Vertex(Name=John)
Edge(Name=to_meet, vS=John, vE=Classmates_group,
  eo=true)
Edge(Name=arrived_to, vS=John, vE=London, eo=true,
  Attribute(has_time,"4 p.m."),
  Attribute(by_transport, train))
    This considered example shows that the metagraph approach allows the
representation of N-ary relations

Figure 10 The Metagraph representation of N-ary relation example

    Summing up this section, it should be noted that the metagraph model
addresses RDF limitations in a natural way without emergence loss. The
proposed textual representation of the metagraph allows a clear and emergent
description of the examined problems.
    Therefore, despite the prevalence of the RDF model, we consider the
development of a storage system for the metagraph model an important task.

5 The experiments

In this section, we present experimental results for storing the metagraph
“logical” model in several databases with different “physical” data models.
    It should be noted that these are just entry-level experiments that
should help to choose the right data model prototype for the metagraph
backend storage.
    The experiments were carried out with the following “physical” data
models:
• Neo4j – the Neo4j database using the flat graph model (according to
  subsection 3.1);
• ArangoDB(graph) – the ArangoDB database using the flat graph model
  (according to subsection 3.1);
• ArangoDB(doc) – the ArangoDB database using the document-oriented model
  (according to subsection 3.2);
• PostgreSQL(rel) – the PostgreSQL database using a pure relational schema
  (according to subsection 3.3);
• PostgreSQL(doc) – the PostgreSQL database using the document-oriented
  possibilities of a relational database (according to subsections 3.2
  and 3.3).
    The characteristics of the test server: Intel Xeon E7-4830 2.2 GHz,
4 GB RAM, 1 TB SSD, OS Ubuntu 16.04 (clean installation on a server).
Python 3.5 was used for running test scripts. The scripts connected to Neo4j
and ArangoDB via official Python drivers. Queries to these
databases were written in query languages (Cypher and AQL respectively)
without an ORM and executed by Python drivers. However, queries for
PostgreSQL were made with SQLAlchemy ORM in order to simplify database
manipulations from the Python script. In all cases, the database was
generated by scripts in CSV format. The database was reloaded from the dump
after every test that modified its state.
    Each operation was repeated several times to get the average time of
execution.

Figure 14 The test results for “Updating existing vertex value”
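
The repeat-and-average measurement described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' test script: the helper name `average_time_ms`, the `reset` callback (standing in for reloading the database from a dump) and the in-memory list used in place of a real database call are all assumptions.

```python
import statistics
import time


def average_time_ms(operation, repeats=10, reset=None):
    """Run `operation` `repeats` times and return the mean wall-clock time in ms.

    `reset`, if given, is called before every run and is not timed. It is a
    hypothetical stand-in for reloading the database from a dump.
    """
    samples = []
    for _ in range(repeats):
        if reset is not None:
            reset()
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


# Hypothetical stand-in for a database call (for example, inserting a vertex).
store = []
mean_ms = average_time_ms(lambda: store.append("vertex"), repeats=5, reset=store.clear)
print(f"mean time: {mean_ms:.4f} ms")
```

Keeping the reset step outside the timed region means that only the operation itself contributes to the reported average.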
    The experimental dataset consisted of 1 000 000
vertices, randomly connected with 1 000 000 edges. Each
vertex of the dataset included one random integer
attribute and one random string attribute of fixed length.
For read operations (selecting hierarchy), additional ten
vertices of fixed structure (100 nested levels) were added
to the dataset to get an average time of ten reads.
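
A dataset of this shape could be generated along the following lines. This is a sketch under stated assumptions: the function names, the CSV column names, the string length of 16 and the scaled-down sizes are hypothetical, and the nested structure is modelled simply as a chain of parent-child inclusion pairs.

```python
import csv
import io
import random
import string


def generate_dataset(n_vertices, n_edges, str_len=16, seed=0):
    """Return (vertices, edges) for a random flat graph.

    Each vertex row is (id, random integer, random fixed-length string);
    each edge row is (id, source vertex id, target vertex id).
    """
    rng = random.Random(seed)
    vertices = [
        (i, rng.randint(0, 10**6), "".join(rng.choices(string.ascii_lowercase, k=str_len)))
        for i in range(n_vertices)
    ]
    edges = [
        (i, rng.randrange(n_vertices), rng.randrange(n_vertices))
        for i in range(n_edges)
    ]
    return vertices, edges


def nested_chain(first_id, levels=100):
    """Return (parent_id, child_id) pairs for one structure nested `levels` deep."""
    return [(first_id + d, first_id + d + 1) for d in range(levels - 1)]


def to_csv(rows, header):
    """Serialize rows to CSV text (in memory here; a real script would write a file)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Scaled-down sizes for illustration; the paper uses 1 000 000 vertices and edges.
vertices, edges = generate_dataset(n_vertices=1000, n_edges=1000)
hierarchy = nested_chain(first_id=len(vertices), levels=100)
vertex_csv = to_csv(vertices, ["id", "int_attr", "str_attr"])
print(vertex_csv.splitlines()[0])
```

Calling `nested_chain` ten times with different starting identifiers would give the ten fixed structures used for the read test.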
    The numerical axis of the charts shows the operation execution time in
milliseconds; lower values are better.
    The main test results are presented in the following figures and summed
up in Table 1. If the best result is approximately the same for several
databases, then all these variants are marked in Table 1.

Figure 11 The test results for “Inserting vertex to the existing metavertex”
Figure 12 The test results for “Inserting vertex to the metagraph”
Figure 15 The test results for “Deleting vertex from the existing metavertex”
Figure 16 The test results for “Deleting edge from the existing metavertex”
Figure 17 The test results for “Selecting hierarchy of 100 related metavertices”
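
The rule for marking several databases as best in Table 1 can be illustrated with a short sketch. The times are copied from Table 1; the 1 ms tolerance used for “approximately the same” and the helper name `best_databases` are assumptions, since the paper does not state the threshold.

```python
# Execution times in milliseconds, copied from Table 1.
DATABASES = ["Neo4j", "ArangoDB(graph)", "ArangoDB(doc)", "PostgreSQL(rel)", "PostgreSQL(doc)"]
RESULTS = {
    "Inserting vertex to the existing metavertex": [40, 2, 5, 8, 6],
    "Inserting vertex to the metagraph": [253, 3, 3, 3, 4],
    "Inserting edge to the metagraph": [148, 32, 7, 8, 6],
    "Updating existing vertex value": [267, 5, 5, 3, 9],
    "Deleting vertex from the existing metavertex": [45, 6, 5, 6, 9],
    "Deleting edge from the existing metavertex": [57, 6, 16, 9, 6],
    "Selecting hierarchy of 100 related metavertices": [45, 5, 323, 218, 187],
}


def best_databases(times, tolerance_ms=1):
    """Return all databases whose time is within `tolerance_ms` of the fastest.

    The tolerance is a hypothetical reading of "approximately the same".
    """
    fastest = min(times)
    return [db for db, t in zip(DATABASES, times) if t - fastest <= tolerance_ms]


for test_case, times in RESULTS.items():
    print(f"{test_case}: {', '.join(best_databases(times))}")
```

A different tolerance would change which variants count as ties, so the output should be read as an illustration of the rule, not as a reconstruction of the marks in the original table.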

Figure 13 The test results for “Inserting edge to the metagraph”

Table 1 The test results (test time in milliseconds)
Test case                                        Neo4j  ArangoDB  ArangoDB  PostgreSQL  PostgreSQL
                                                         (graph)     (doc)       (rel)       (doc)
Inserting vertex to the existing metavertex         40         2         5           8           6
Inserting vertex to the metagraph                  253         3         3           3           4
Inserting edge to the metagraph                    148        32         7           8           6
Updating existing vertex value                     267         5         5           3           9
Deleting vertex from the existing metavertex        45         6         5           6           9
Deleting edge from the existing metavertex          57         6        16           9           6
Selecting hierarchy of 100 related metavertices     45         5       323         218         187

    Let us draw intermediate conclusions from the experimental results.
    The Neo4j implementation must be recognized as inefficient compared to
the other cases. But this is not a disadvantage of the graph model itself,
because the graph implementation in ArangoDB is quite efficient.
    The inserting, updating and deleting operations are very efficient in
PostgreSQL (both relational and document-oriented schemas) and ArangoDB
(document-oriented schema), but this is not the case for hierarchical
selecting, which is a typical metagraph operation.
    The time for hierarchical selecting for graph databases (both Neo4j and
ArangoDB) is comparable to the time of their other tests, while for
relational and document-oriented databases it is far longer (187 to 323 ms
against 3 to 16 ms for their other tests in Table 1).
    Thus, if the system architect is forced to use a relational or
document-oriented database as a metagraph storage backend, then hierarchical
selecting queries should be the subject of careful optimization.
    Summarizing, we can say that, provided an effective graph database is
used, the flat graph model is most suitable for metagraph storage.

6 The related work

Nowadays, there is a tendency to complicate the graph database data model.
An example of this tendency is the HyperGraphDB [8] database. As the name
implies, HyperGraphDB uses the hypergraph as a data model. The reasoning
capabilities are implemented via integration with TuProlog.
    Another interesting project is GRAKN.AI [9], aimed at AI purposes, which
explicitly combines graph-based and ontology-based approaches for data
analysis. Flat graphs and hypergraphs may be used as the data model. The
Graql query language is used both for data manipulation and reasoning.
    The drawback of both projects is that the most complex data model they
support is the hypergraph. It was shown in [2] that the metagraph is a
holonic graph model whereas the hypergraph is a near flat graph model that
does not fully implement the emergence principle. Therefore, the hypergraph
model does not fit well for the description of complex data structures.

7 Conclusions

The models based on complex graphs are increasingly used in various fields
of science from mathematics and computer science to biology and sociology.
    Nowadays, there is a tendency to complicate the graph database data model
in order to support the complexity of the domains.
    We propose to use a metagraph data model that allows storing more complex
relationships than a hypergraph data model.
    The metagraph model may be mapped to the flat graph model, the document
model and the relational model. The main idea of this mapping is the
flattening of the metagraph to a flat multipartite graph. The flat graph may
then be represented as a document model or a relational model.
    The experimental results show that the flat graph model is most suitable
for metagraph storage.
    In the future, it is planned to develop a metagraph data manipulation
language and design a stable version of the metagraph storage based on a flat
graph database.

References

[1] Basu, A., Blanning, R. Metagraphs and their applications. Springer,
    New York (2007)
[2] Chernenkiy, V., Gapanyuk, Yu., Revunkov, G., Kaganov, Yu., Fedorenko, Yu.,
    Minakova, S. Using metagraph approach for complex domains description.
    In: Proceedings of the XIX International Conference on Data Analytics and
    Management in Data Intensive Domains (DAMDID/RCDL 2017), Moscow, Russia,
    pp. 342-349 (2017)
[3] Chartrand, G., Zhang, P. Chromatic Graph Theory. CRC Press, New York
    (2008)
[4] Allemang, D., Hendler, J. Semantic Web for the working ontologist:
    effective modeling in RDFS and OWL – 2nd ed. Elsevier, Amsterdam (2011)
[5] Chernenkiy, V., Gapanyuk, Yu., Nardid, A., Skvortsova, M., Gushcha, A.,
    Fedorenko, Yu., Picking, R. Using the metagraph approach for addressing
    RDF knowledge representation limitations. In: Proceedings of Internet
    Technologies and Applications (ITA’2017), Wrexham, United Kingdom,
    pp. 47-52 (2017)
[6] RDF Primer. W3C Recommendation 10 February 2004.
    https://www.w3.org/TR/2004/REC-rdf-primer-20040210/
[7] Defining N-ary Relations on the Semantic Web. W3C Working Group Note
    12 April 2006. http://www.w3.org/TR/swbp-n-aryRelations/
[8] HyperGraphDB website. http://hypergraphdb.org/
[9] GRAKN.AI website. https://grakn.ai/