<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scalable Graph Query Evaluation and Benchmarking with Realistic Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gábor Szárnyas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Budapest University of Technology and Economics, Department of Measurement and Information Systems, MTA-BME Lendület Research Group on Cyber-Physical Systems</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2004</year>
      </pub-date>
      <abstract>
        <p>Model queries are widely used in model-driven engineering toolchains: models are checked for errors with validation queries, model simulations and transformations require complex pattern matching, while injective mappings for views are defined with model queries. Efficient and scalable evaluation of complex queries on large models is a challenging task. To achieve scalable graph query evaluation, I identified key challenges such as the lack of credible benchmarks and the difficulty of obtaining real models for performance testing. To address these challenges, my contributions target (1) distributed incremental graph queries, (2) a cross-technology benchmark for model validation, (3) characterization of realistic models, and (4) realistic model generation.</p>
      </abstract>
      <kwd-group>
        <kwd>distributed queries</kwd>
        <kwd>model validation</kwd>
        <kwd>model generation</kwd>
        <kwd>benchmarking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Model-Driven Engineering (MDE) is a development
methodology used in many application domains such as
critical applications (automotive, avionics and railway
systems [
        <xref ref-type="bibr" rid="ref28 ref53">7, 31, 58</xref>
        ]). To increase the efficiency of development,
MDE facilitates the use of models in various modelling
languages targeting different levels of abstraction. Models can
be used not only for presenting the structure and behaviour
of the system, but also for synthesizing various design
artifacts (such as source code, configuration files,
documentation). To catch design flaws early, model validation
techniques check the well-formedness of models. Design rules
and well-formedness constraints are often captured in the
form of graph patterns [
        <xref ref-type="bibr" rid="ref6">9</xref>
        ] to highlight invalid model
elements to systems engineers. MDE tools check these patterns
by evaluating graph queries.¹
      </p>
    </sec>
    <sec id="sec-2">
      <title>1.1 Scalable Graph Queries</title>
      <p>
        As models are rapidly increasing in size and complexity,
efficient execution of model validation operations is
challenging for the currently available toolchains, like ARTOP [2],
Capella [
        <xref ref-type="bibr" rid="ref35">38</xref>
        ] or Papyrus [
        <xref ref-type="bibr" rid="ref48">53</xref>
        ].
      </p>
      <p>The last decade brought considerable improvements
in distributed storage and query technologies, known as
NoSQL systems. These systems provide quick evaluation
of simple retrieval operations and they are able to answer
complex queries in a scalable manner, albeit not instantly.
Providing quick response times for evaluating such queries
over large and evolving data sets is still a challenging task.</p>
      <p>
        Graph queries capturing validation constraints are often
complex, including many join, antijoin and filtering
operations. However, most query technologies cannot efficiently
evaluate such operations for models with 10 million model
elements [
        <xref ref-type="bibr" rid="ref43">48</xref>
        ], while models of critical systems, software
and geospatial models are often 1–2 orders of magnitude
larger [
        <xref ref-type="bibr" rid="ref38">43</xref>
        ]. A possible solution for scalable graph queries
is to use distributed query processing techniques [
        <xref ref-type="bibr" rid="ref13 ref54">16, 59</xref>
        ].
This brings us to the first research question I investigated.
      </p>
      <p>RQ 1. How to incrementally evaluate graph queries
over a distributed platform?</p>
    </sec>
    <sec id="sec-3">
      <title>1.2 Benchmarking</title>
      <p>
        To assess the performance of a graph query engine, a
benchmark framework is of high importance. According to the
Benchmark Handbook [
        <xref ref-type="bibr" rid="ref18">21</xref>
        ], a useful benchmark is (1)
relevant, (2) portable, (3) scalable, and (4) simple. To ensure
relevance, the benchmark must use a representative
workload and models similar to realistic ones. Providing relevant
results, while also guaranteeing the other three properties
(portability, scalability and simplicity) is a major challenge.
      </p>
      <p>¹ In this paper, I use the term graph as a synonym for instance model.
      </p>
      <p>For real-world industrial systems, both metamodels and
instance models are protected by intellectual property rights
(IPR). For example, AUTOSAR [7] is not an open standard:
it is only available to members of the consortium, therefore
it is not suitable for an open performance benchmark.
Similarly, engineering models in the avionics and railway
domains are also not available to the public.</p>
      <p>These challenges confirm the need for a benchmark
framework, which provides a real-world-like workload
scenario and evaluates realistic queries on realistic models.
Therefore, the second research question is the following.</p>
      <p>RQ 2. How to assess query technologies for a
continuous model validation scenario?</p>
    </sec>
    <sec id="sec-4">
      <title>1.3 Characterization of Realistic Models</title>
      <p>While existing generators may produce large models in
increasing sizes, these models are usually simple and
synthetic, which hinders their credibility for industrial and
research benchmarking purposes. To the best of my knowledge,
there are no existing techniques to characterize models used
in MDE practice. To develop such a technique, first I had to
address questions about model metrics, such as:</p>
      <p>Which metrics can be used for characterizing models?
Is it possible to distinguish models of different domains,
purely based on their metrics?</p>
      <p>To answer these questions, I conducted a literature review
in other disciplines, e.g. network theory and social network
analysis. The high-level goal of the research is to answer the
following question.</p>
      <p>RQ 3. What makes a model realistic?</p>
    </sec>
    <sec id="sec-5">
      <title>1.4 Generating Realistic Models</title>
      <p>
        Custom generators of graph-based models are used in MDE
for many purposes such as functional testing and
performance benchmarking of modeling environments to ensure
the correctness and scalability of tools. However, none is
capable of generating realistic models scalable in size:
logic-based synthesis approaches (like Alloy [
        <xref ref-type="bibr" rid="ref22">25</xref>
        ]) generate
well-formed models but lack scalability.
      </p>
      <p>
        Rule-based approaches [
        <xref ref-type="bibr" rid="ref43">48</xref>
        ] are capable of generating
large models by using transformation rules or random
mutations to add new elements. However, they provide no
guarantees that the resulting model is realistic. Some
approaches do not even guarantee well-formedness, which
is a prerequisite for realistic models.
      </p>
      <p>It is an open research question whether realism and well-formedness
can be guaranteed while keeping the generation scalable.</p>
      <p>RQ 4. How to generate scalable and realistic models?</p>
      <sec id="sec-5-1">
        <title>2. Preliminaries</title>
        <p>This section introduces an example used throughout the
paper and presents the concept of incremental queries.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>2.1 Running Example: Railway Network</title>
      <p>
        As a running example, I use a small railway network, defined
on the metamodel of the Train Benchmark [
        <xref ref-type="bibr" rid="ref43">48</xref>
        ], a model
validation benchmark (the benchmark and my related
contributions are discussed in Section 4.2).²
      </p>
      <p>Figure 1 shows a schematic representation of the network,
with routes (1–3), switches and segments. As the first switch
is set to a straight position and the second switch is set to a
diverging position, a train passing through this track would
follow route #3, hence that route is active.</p>
      <p>Figure 1: Schematic representation of the example railway network with routes 1–3, switches, segments and switch positions.</p>
      <p>Modeling tools often represent their models as graphs.
Figure 2 shows the example network as a labelled, attributed
graph, along with the metamodel of the graph. Routes follow
a set of switch positions that contain the prescribed position
(straight or diverging) of the switch. The railway track
consists of connected switches and segments.</p>
      <p>² To keep the example concise and easy to understand, it
uses only a fraction of the Train Benchmark metamodel. The
benchmark itself uses significantly more complex models, which contain
more metamodel elements (types) and consist of more elements (objects).</p>
      <p>The active route can be determined by evaluating a graph
query (by graph pattern matching). A route is active if all
its switches are in the position prescribed by the switch
positions of the route. In other words, a route is active if none
of its switches is set to a position different from the prescribed
position. This results in the pattern shown in the upper right
corner of Figure 2. In the example, the graph query selects
route #3 as the active one, as both its switch positions (6 and
8) are satisfied by the corresponding switches (10 and 13).</p>
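      <p>The ActiveRoute pattern can be illustrated with a minimal, search-based (non-incremental) sketch in Python. It assumes that the graph of Figure 2 is encoded as plain dictionaries keyed by the node identifiers of the example (routes 1–3, switch positions 4–8, switches 10 and 13); the dictionary and function names are illustrative and not part of the Train Benchmark. The negative condition of the pattern becomes a search for a violating switch position.</p>
      <preformat>
```python
# Illustrative encoding of the Figure 2 graph (identifiers as in the example).
FOLLOWS = {1: [4], 2: [5, 7], 3: [6, 8]}            # route -> switch positions it follows
TARGET = {4: 10, 5: 10, 6: 10, 7: 13, 8: 13}         # switch position -> target switch
POSITION = {4: "div", 5: "str", 6: "str", 7: "str", 8: "div"}  # prescribed positions
CURRENT_POSITION = {10: "str", 13: "div"}            # current position of each switch


def active_routes(follows, target, position, current_position):
    """Return routes that have no switch position whose target switch is set differently."""
    result = set()
    for route, switch_positions in follows.items():
        # Negative condition: search for a violating switch position of this route.
        violated = any(
            current_position[target[sp]] != position[sp] for sp in switch_positions
        )
        if not violated:
            result.add(route)
    return result


print(active_routes(FOLLOWS, TARGET, POSITION, CURRENT_POSITION))  # {3}
```
      </preformat>
      <p>Calling active_routes again after every change repeats the whole search over all routes, which is exactly the cost that incremental evaluation tries to avoid.</p>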
    </sec>
    <sec id="sec-7">
      <title>2.2 Incremental Query Evaluation</title>
      <p>In many use cases, queries are continuously evaluated, while
changes affect only a restricted part of the data. The queries and
transformations for simulation and well-formedness
validation in MDE are typical examples of such a workload. The
goal of incremental query evaluation is to speed up such
queries by reusing the (partial) results obtained during
previous executions of the query, so that only the effects of the
latest set of changes have to be computed. For example, if the current position of the second
switch in Figure 1 changes from diverging to straight, the
change only affects a small part of the graph (node 13 in
Figure 2). This allows the incremental query engine to quickly
reevaluate the query: in this case, the active route is changed
from #3 to #2.</p>
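      <p>A minimal sketch of this idea, assuming a simple counting-based view maintenance scheme rather than the Rete network used later in the paper: for every route, the engine caches how many of its switch positions are currently violated, and a change to one switch only revisits the switch positions that target that switch. The class IncrementalActiveRoute and its methods are hypothetical.</p>
      <preformat>
```python
from collections import defaultdict


class IncrementalActiveRoute:
    """Caches, per route, the number of violated switch positions (hypothetical sketch)."""

    def __init__(self, follows, target, position, current_position):
        self.target = target
        self.position = position
        self.current_position = dict(current_position)
        # Inverse indexes built once: switch -> switch positions, switch position -> routes.
        self.sps_of_switch = defaultdict(list)
        for sp, sw in target.items():
            self.sps_of_switch[sw].append(sp)
        self.routes_of_sp = defaultdict(list)
        for route, sps in follows.items():
            for sp in sps:
                self.routes_of_sp[sp].append(route)
        # Cached interim results: violation counters per route.
        self.violations = {route: 0 for route in follows}
        for sp in target:
            if self._violated(sp):
                for route in self.routes_of_sp[sp]:
                    self.violations[route] += 1

    def _violated(self, sp):
        return self.current_position[self.target[sp]] != self.position[sp]

    def set_switch(self, switch, new_position):
        """Apply one change and update only the counters it affects."""
        affected = self.sps_of_switch[switch]
        before = {sp: self._violated(sp) for sp in affected}
        self.current_position[switch] = new_position
        for sp in affected:
            delta = int(self._violated(sp)) - int(before[sp])
            for route in self.routes_of_sp[sp]:
                self.violations[route] += delta

    def result(self):
        return {route for route, count in self.violations.items() if count == 0}


FOLLOWS = {1: [4], 2: [5, 7], 3: [6, 8]}
TARGET = {4: 10, 5: 10, 6: 10, 7: 13, 8: 13}
POSITION = {4: "div", 5: "str", 6: "str", 7: "str", 8: "div"}

engine = IncrementalActiveRoute(FOLLOWS, TARGET, POSITION, {10: "str", 13: "div"})
print(engine.result())        # {3}
engine.set_switch(13, "str")  # the second switch becomes straight
print(engine.result())        # {2}
```
      </preformat>
      <p>The inverse indexes and counters are the kind of cached interim results discussed in the next paragraph: they make re-evaluation cheap, at the price of extra memory.</p>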
      <p>
        Incremental query evaluation algorithms use additional
data structures for caching interim results, hence they
consume more memory than search-based, non-incremental
algorithms. In other words, they trade memory consumption
for execution speed. While incremental query engines
provide quick response times for various use cases [
        <xref ref-type="bibr" rid="ref43 ref6">9, 48</xref>
        ], their
excessive memory consumption limits their scalability.
      </p>
      <sec id="sec-7-1">
        <title>3. Related Work</title>
        <p>To appropriately address all the research questions in the
context of MDE, a wide range of multidisciplinary topics
needs to be covered.</p>
        <p>
          Distributed incremental graph queries. The Rete
algorithm was originally created by Charles Forgy for rule-based
expert systems [
          <xref ref-type="bibr" rid="ref17">20</xref>
          ]. Bunke et al. [
          <xref ref-type="bibr" rid="ref11">14</xref>
          ] were the first to
propose the Rete algorithm in the context of graph
transformations. Bergmann et al. adapted the algorithm for the Eclipse
Modeling Framework in the EMF-INCQUERY project [
          <xref ref-type="bibr" rid="ref6">9</xref>
          ],
now part of the VIATRA project [
          <xref ref-type="bibr" rid="ref50">55</xref>
          ].
        </p>
        <p>
          Query languages and execution engines have been
developed to support incremental graph queries in a
single-machine environment. Drools [
          <xref ref-type="bibr" rid="ref24">27</xref>
          ] is an incremental
business rule engine for Java-based systems. INSTANS [40]
provides incremental queries over RDF [
          <xref ref-type="bibr" rid="ref52">57</xref>
          ].
        </p>
        <p>
          Various distributed, but non-incremental graph query
systems exist, including an approach based on SAP HANA [
          <xref ref-type="bibr" rid="ref26">29</xref>
          ],
a graph transformation tool using the Bulk Synchronous
Parallel graph processing model [
          <xref ref-type="bibr" rid="ref27">30</xref>
          ], and Trinity, an
RDF-based query engine [
          <xref ref-type="bibr" rid="ref40">45</xref>
          ].
        </p>
        <p>Cross-technology benchmark for continuous validation.
Numerous benchmarks have been proposed to compare the
performance of query and transformation engines, but no
openly available cross-technology benchmarks exist for
continuous model validation.</p>
        <p>
          The first transformation benchmark was proposed in [
          <xref ref-type="bibr" rid="ref51">56</xref>
          ],
which gave an overview of typical application scenarios of
graph transformations together with their characteristic
features. Many transformation challenges have been proposed
as cases for graph and model transformation contests.
However, only [
          <xref ref-type="bibr" rid="ref19 ref56">22, 61</xref>
          ] focus on query performance, while others
measure the usability of the tools, the conciseness and
readability of the query languages, or test various advanced
features, including reflection and traceability.
        </p>
        <p>
          There are numerous benchmarks from the area of
semantic databases. SP2Bench [
          <xref ref-type="bibr" rid="ref39">44</xref>
          ] features a synthetic
DBLP-like dataset, the Berlin SPARQL Benchmark (BSBM) [
          <xref ref-type="bibr" rid="ref8">11</xref>
          ]
simulates an e-commerce application, while the
DBpedia SPARQL benchmark [
          <xref ref-type="bibr" rid="ref32">35</xref>
          ] features a real data set with
queries based on real-world user queries. The Linked Data
Benchmark Council (LDBC) recently developed the Social
Network Benchmark [
          <xref ref-type="bibr" rid="ref16">19</xref>
          ], a cross-technology benchmark,
which provides an interactive workload and focuses on
navigational pattern matching (i.e. traversal operations). While
some of these benchmarks feature update operations and
hence measure incremental query performance, they provide
workloads that significantly differ from MDE use cases.
        </p>
        <p>
Characterization of realistic models. Revealing
essential structural similarities and differences among
networks from different fields is a fundamental objective in
network theory with a wide range of applications. The
authors of [
          <xref ref-type="bibr" rid="ref12">15</xref>
          ] list 22 areas using network theory, including
social network analysis, transportation, biomolecular networks
and chemistry. Network theory is also studied in physics,
e.g. in the context of statistical mechanics [
          <xref ref-type="bibr" rid="ref3">5</xref>
          ]. However,
most of these applications use untyped (one-dimensional)
networks. So far, existing multidimensional studies have only
focused on models of a single application domain, such
as neighbourhood and centrality analysis of a social
network [
          <xref ref-type="bibr" rid="ref9">12</xref>
          ], relevance and correlation analysis of different
dimensions in Flickr [
          <xref ref-type="bibr" rid="ref25">28</xref>
          ], and community detection in the
network of YouTube [
          <xref ref-type="bibr" rid="ref47">52</xref>
          ].
        </p>
        <p>
          The authors of [
          <xref ref-type="bibr" rid="ref7">10</xref>
          ] use graph metrics to capture the
structure and evolution of software products and processes in
order to detect significant structural changes, help estimate
bug severity, prioritize debugging efforts, and predict
defect-prone releases in software engineering. Metrics are also
used for understanding the main characteristics of
domain-specific metamodels, to study model transformations with
respect to the corresponding metamodels, and to search for
correlations between them via analytical measures [
          <xref ref-type="bibr" rid="ref36">41</xref>
          ].
        </p>
        <p>
Realistic model generation. The SP2Bench [
          <xref ref-type="bibr" rid="ref39">44</xref>
          ]
benchmark uses a generator based on the statistics of the DBLP
library. The authors of [
          <xref ref-type="bibr" rid="ref33">36</xref>
          ] use Boltzmann samplers [
          <xref ref-type="bibr" rid="ref14">17</xref>
          ] to
ensure efficient generation of uniform models.
        </p>
        <p>
          OMOGEN [
          <xref ref-type="bibr" rid="ref10">13</xref>
          ] is a tool for automatically generating test
models, used for testing model transformations. The tool
takes a metamodel and a set of model fragments as its inputs
and combines the fragments using several strategies to build
valid instances.
        </p>
        <p>
          gMark [
          <xref ref-type="bibr" rid="ref5">8</xref>
          ] is a domain-independent framework for
synthesizing large graphs, allowing the user to specify
parameters – size, types, degree distributions and other constraints
– for the graphs to be generated. gMark is also able to
generate query workloads with queries of different size, shape
and selectivity.
        </p>
      </sec>
      <sec id="sec-7-2">
        <title>4. Approach and Contributions</title>
        <p>To achieve scalable incremental query evaluation, I adapted
the Rete algorithm for distributed systems. I demonstrate how the
Rete algorithm works on the ActiveRoute query (Figure 2).
As described in Section 2.1, the query collects Routes where
all Switches along the route are in the position prescribed by
the corresponding SwitchPosition. In other words, without
using the universal quantifier (∀), it searches for routes that
do not have a SwitchPosition which prescribes a position
different from the current position of its target Switch [39].
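Hence, the query can be formalized in relational algebra as:³
$$\mathit{route} \mathbin{\triangleright} \left(\mathit{follows} \bowtie \sigma_{\mathit{currentPosition} \neq \mathit{position}}\left(\mathit{switch} \bowtie \mathit{target} \bowtie \mathit{switchPosition}\right)\right) = \{\langle 3 \rangle\}$$
        </p>
        <p>
where $\bowtie$ denotes the natural join operator that joins its
operands based on their common attributes, the selection
$\sigma_{\mathit{currentPosition} \neq \mathit{position}}$ keeps the tuples whose
current switch position differs from the prescribed one, and $\triangleright$ denotes
the antijoin operator (also known as the anti-semijoin [
          <xref ref-type="bibr" rid="ref41">46</xref>
          ])
that keeps the tuples from its left operand which do not have
a matching tuple in its right operand.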
        </p>
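        <p>To make the operators concrete, the following Python sketch evaluates the expression over relations represented as lists of dictionaries (attribute name to value). The helper names natural_join, select and antijoin are illustrative, and the relation contents are assumed to follow the example of Figure 2: switch positions 4–8 with their prescribed positions, and switches 10 (straight) and 13 (diverging).</p>
        <preformat>
```python
def natural_join(left, right):
    """Combine every pair of tuples that agree on all common attributes."""
    result = []
    for lt in left:
        for rt in right:
            common = set(lt).intersection(rt)
            if all(lt[a] == rt[a] for a in common):
                result.append({**lt, **rt})
    return result


def select(relation, predicate):
    """Selection (sigma): keep the tuples satisfying the predicate."""
    return [t for t in relation if predicate(t)]


def antijoin(left, right):
    """Keep the tuples of `left` that have no matching tuple in `right`."""
    def matches(lt, rt):
        return all(lt[a] == rt[a] for a in set(lt).intersection(rt))
    return [lt for lt in left if not any(matches(lt, rt) for rt in right)]


route = [{"route": r} for r in (1, 2, 3)]
follows = [{"route": r, "switchPosition": sp}
           for r, sp in [(1, 4), (2, 5), (2, 7), (3, 6), (3, 8)]]
target = [{"switchPosition": sp, "switch": sw}
          for sp, sw in [(4, 10), (5, 10), (6, 10), (7, 13), (8, 13)]]
switch_position = [{"switchPosition": sp, "position": p}
                   for sp, p in [(4, "div"), (5, "str"), (6, "str"), (7, "str"), (8, "div")]]
switch = [{"switch": sw, "currentPosition": p} for sw, p in [(10, "str"), (13, "div")]]

# sigma_{currentPosition != position}(switch JOIN target JOIN switchPosition)
mismatched = select(
    natural_join(natural_join(switch, target), switch_position),
    lambda t: t["currentPosition"] != t["position"],
)
# route ANTIJOIN (follows JOIN mismatched)
active = antijoin(route, natural_join(follows, mismatched))
print(active)  # [{'route': 3}]
```
        </preformat>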
        <p>Figure 4 shows a distributed Rete network implementing
this relational algebra expression. The network is allocated
to two machines, Server 1 and Server 2. This allows the
query engine to scale for larger graphs, for which the Rete
network would not fit in the memory of a single workstation.
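However, this approach still has a bottleneck limiting
scalability: if a single Rete node does not fit into the memory of a
workstation, the query engine runs out of memory.</p>
        <p>³ To formalize the query, the relations for the vertices and edges in Figure 2
can be defined as follows:
$$\mathit{route}(\mathit{route}) = \{\langle 1 \rangle, \langle 2 \rangle, \langle 3 \rangle\}$$
$$\mathit{follows}(\mathit{route}, \mathit{switchPosition}) = \{\langle 1, 4 \rangle, \langle 2, 5 \rangle, \langle 2, 7 \rangle, \langle 3, 6 \rangle, \langle 3, 8 \rangle\}$$
$$\mathit{switch}(\mathit{switch}, \mathit{currentPosition}) = \{\langle 10, \mathrm{str} \rangle, \langle 13, \mathrm{div} \rangle\}$$
$$\mathit{target}(\mathit{switchPosition}, \mathit{switch}) = \{\langle 4, 10 \rangle, \langle 5, 10 \rangle, \langle 6, 10 \rangle, \langle 7, 13 \rangle, \langle 8, 13 \rangle\}$$
$$\mathit{switchPosition}(\mathit{switchPosition}, \mathit{position}) = \{\langle 4, \mathrm{div} \rangle, \langle 5, \mathrm{str} \rangle, \langle 6, \mathrm{str} \rangle, \langle 7, \mathrm{str} \rangle, \langle 8, \mathrm{div} \rangle\}$$</p>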
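        <p>The key to Rete's incrementality is that each node stores its inputs in memories and propagates only changes (deltas). The sketch below shows a single join node under simplifying assumptions: tuples are dictionaries, deltas are ("+", tuple) or ("-", tuple) pairs, and the node runs in a single process. It is a generic illustration of the Rete idea, not the INCQUERY-D implementation; in a distributed network such as the one in Figure 4, each node would run on one server and exchange deltas as messages. Because the memories are indexed by join key, hash-partitioning them is one plausible way to shard a node, but that choice is an assumption here.</p>
        <preformat>
```python
from collections import defaultdict


class JoinNode:
    """Incremental natural join node with two memories indexed by join key."""

    def __init__(self, keys):
        self.keys = tuple(keys)
        self.left = defaultdict(list)   # join key -> tuples received on the left input
        self.right = defaultdict(list)  # join key -> tuples received on the right input

    def _key(self, t):
        return tuple(t[k] for k in self.keys)

    @staticmethod
    def _update(bucket, sign, t):
        if sign == "+":
            bucket.append(t)
        else:
            bucket.remove(t)

    def on_left(self, sign, t):
        """Store or forget a left tuple and emit the matching output deltas."""
        key = self._key(t)
        self._update(self.left[key], sign, t)
        return [(sign, {**t, **r}) for r in self.right[key]]

    def on_right(self, sign, t):
        """Store or forget a right tuple and emit the matching output deltas."""
        key = self._key(t)
        self._update(self.right[key], sign, t)
        return [(sign, {**lt, **t}) for lt in self.left[key]]


# target(switchPosition, switch) joined with switch(switch, currentPosition).
node = JoinNode(["switch"])
node.on_left("+", {"switchPosition": 7, "switch": 13})
node.on_left("+", {"switchPosition": 8, "switch": 13})
node.on_right("+", {"switch": 13, "currentPosition": "div"})

# Switch 13 changes from diverging to straight: one deletion and one insertion.
changes = node.on_right("-", {"switch": 13, "currentPosition": "div"})
changes += node.on_right("+", {"switch": 13, "currentPosition": "str"})
for sign, t in changes:
    print(sign, t)
```
        </preformat>
        <p>Only the four output deltas concerning switch 13 are produced; tuples involving switch 10 are never touched.</p>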
        <p>Using these techniques and algorithms, I made the following
contributions.</p>
        <p>
          Combine distributed actor model with Rete-based query
evaluation network. I designed a distributed architecture
and prototyped INCQUERY-D, a Rete-based query engine
using actors for distributed scalability. I presented a detailed
performance evaluation in the context of incremental model
well-formedness validation. The results showed nearly
instantaneous re-evaluation of complex queries for models well beyond
10 million elements [
          <xref ref-type="bibr" rid="ref42">47</xref>
          ]. To further extend the scalability of the
system, I proposed sharding individual Rete nodes in [
          <xref ref-type="bibr" rid="ref29">32</xref>
          ].
        </p>
        <p>
Distributed termination protocol for asynchronous Rete.
As Rete is an asynchronous algorithm, determining if the
network is in a consistent state w.r.t. the latest change set
requires a distributed termination protocol. The protocol was
also presented in [
          <xref ref-type="bibr" rid="ref42">47</xref>
          ] and [
          <xref ref-type="bibr" rid="ref29">32</xref>
          ].
        </p>
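        <p>Why a termination protocol is needed can be illustrated with a single-process simulation. The sketch assumes a generic message-counting scheme, which is not necessarily the protocol presented in the cited papers: every message sent between nodes increments an in-flight counter, every fully processed message decrements it, and the network is considered consistent with the latest change set once the counter is zero. The class Network, its methods and the three-node pipeline are hypothetical.</p>
        <preformat>
```python
import random
from collections import deque


class Network:
    """Single-process simulation of asynchronous message passing between Rete nodes."""

    def __init__(self, handlers, seed=0):
        self.handlers = handlers      # node name -> function(message) -> [(destination, message)]
        self.queues = {name: deque() for name in handlers}
        self.in_flight = 0            # messages sent but not yet fully processed
        self.rng = random.Random(seed)

    def send(self, destination, message):
        self.in_flight += 1
        self.queues[destination].append(message)

    def step(self):
        """Deliver one pending message to a randomly chosen node (arbitrary interleaving)."""
        ready = [name for name, queue in self.queues.items() if queue]
        name = self.rng.choice(ready)
        message = self.queues[name].popleft()
        for destination, out in self.handlers[name](message):
            self.send(destination, out)  # register follow-up messages first...
        self.in_flight -= 1              # ...then mark this message as processed

    def terminated(self):
        # Consistent with the change set once no message is in flight.
        return self.in_flight == 0

    def run_until_terminated(self):
        steps = 0
        while not self.terminated():
            self.step()
            steps += 1
        return steps


results = []


def production(change):
    results.append(change["switchPosition"])
    return []


handlers = {
    "input": lambda change: [("filter", change)],
    "filter": lambda change: [("production", change)] if change["violated"] else [],
    "production": production,
}
net = Network(handlers)
for sp, violated in [(4, True), (5, False), (7, True)]:
    net.send("input", {"switchPosition": sp, "violated": violated})
print(net.run_until_terminated(), sorted(results))  # 8 [4, 7]
```
        </preformat>
        <p>In a real distributed system the ordering inside step matters: if a node acknowledged its message before announcing the messages it produced, the counter could briefly reach zero while work is still pending, and termination would be reported too early.</p>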
        <p>
          Experimental evaluation over distributed NoSQL databases.
The proposed architecture and algorithms are
representation-agnostic. They have been integrated with the Neo4j graph
database [
          <xref ref-type="bibr" rid="ref20">23</xref>
          ], the Titan distributed graph database and
4store, a semantic database [
          <xref ref-type="bibr" rid="ref42">47</xref>
          ].
        </p>
        <p>
          Evaluation of Rete network optimization and allocation
strategies. Allocating the Rete nodes in the cloud is a
complex optimization problem, where the goal is to minimize
the cost of communication between the nodes. I presented
a solver-based approach for allocating Rete nodes in [
          <xref ref-type="bibr" rid="ref30">33</xref>
          ].
I also proposed optimization techniques used in relational
query optimization for enhancing the performance of graph
queries [
          <xref ref-type="bibr" rid="ref45">50</xref>
          ].
        </p>
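        <p>The shape of the allocation problem can be shown with a brute-force sketch: enumerate all assignments of a few Rete nodes to machines, discard those that exceed a machine's memory, and keep the one with the lowest cross-machine traffic. Exhaustive search only works for tiny networks; it stands in here for a solver-based approach, and the node names, memory estimates and traffic volumes are made-up inputs.</p>
        <preformat>
```python
from itertools import product


def allocate(nodes, memory, traffic, capacity):
    """Exhaustively search assignments of Rete nodes to machines.

    memory:   node -> estimated memory footprint
    traffic:  (node_a, node_b) -> expected message volume between the two nodes
    capacity: machine -> available memory
    Returns (cost, assignment) with minimal cross-machine traffic, or None if nothing fits.
    """
    machines = list(capacity)
    best = None
    for choice in product(machines, repeat=len(nodes)):
        assignment = dict(zip(nodes, choice))
        load = dict.fromkeys(machines, 0)
        for node, machine in assignment.items():
            load[machine] += memory[node]
        if any(load[m] > capacity[m] for m in machines):
            continue  # infeasible: a machine would run out of memory
        # Communication cost: only edges whose endpoints sit on different machines count.
        cost = sum(v for (a, b), v in traffic.items() if assignment[a] != assignment[b])
        if best is None or best[0] > cost:
            best = (cost, assignment)
    return best


# Made-up figures for a four-node network: two inputs, one join, one production node.
nodes = ["input_a", "input_b", "join", "production"]
memory = {"input_a": 4, "input_b": 4, "join": 6, "production": 1}
traffic = {("input_a", "join"): 10, ("input_b", "join"): 2, ("join", "production"): 1}
capacity = {"server1": 10, "server2": 10}

cost, assignment = allocate(nodes, memory, traffic, capacity)
print(cost)  # 3
print(assignment)
```
        </preformat>
        <p>The join node ends up on the same server as the input that sends it the most messages, while the memory limit forces the remaining nodes onto the other server.</p>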
        <p>
          Uniqueness. To the best of my knowledge, existing
technologies are either distributed [
          <xref ref-type="bibr" rid="ref27 ref40">30, 45</xref>
          ] or incremental [
          <xref ref-type="bibr" rid="ref50">55</xref>
          ], but
there is no system that provides scalable, distributed
incremental graph queries.
        </p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>4.2 Cross-technology benchmark for continuous validation</title>
      <p>In Section 2.1, I used a running example from the Train
Benchmark framework. The Train Benchmark is an
incremental model validation benchmark, continuously
developed by the Fault-Tolerant Systems Research Group since
2010. I have significantly extended the Train Benchmark,
both conceptually and implementation-wise. Figure 5 shows
the inputs of the benchmark process, the benchmark phases
and the benchmark results.</p>
      <p>
        The Train Benchmark is a macro benchmark that aims
to measure the performance of continuous model
validation with graph-based models and constraints captured
as queries. The benchmark is cross-technology, i.e. it is
implemented on a range of technologies. The supported model
representations include those of Eclipse-based model-driven engineering
toolchains (EMF), graph databases [
        <xref ref-type="bibr" rid="ref37">42</xref>
        ], relational databases
(SQL) and semantic technologies (RDF [
        <xref ref-type="bibr" rid="ref52">57</xref>
        ]). The query
engines include relational engines (SQLite, MySQL), graph
transformation frameworks (VIATRA [
        <xref ref-type="bibr" rid="ref50">55</xref>
        ]), rule engines
(Drools [
        <xref ref-type="bibr" rid="ref24">27</xref>
        ]), graph query engines (Neo4j [
        <xref ref-type="bibr" rid="ref34">37</xref>
        ]) and SPARQL
engines (Sesame [3], Jena [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). Also, the framework is
extensible, which allows users of the benchmark to incorporate
new technologies.
      </p>
      <p>
        Earlier versions of the benchmark have been continuously
used for performance measurements since 2012 [
        <xref ref-type="bibr" rid="ref42 ref49">47, 54</xref>
        ].
The benchmark is also part of the benchmark suite used by
the MONDO EU FP7 [
        <xref ref-type="bibr" rid="ref31">34</xref>
        ] project and was selected as a
case for the 2015 Transformation Tool Contest [
        <xref ref-type="bibr" rid="ref46">51</xref>
        ] as well.
The benchmark framework is available as an open-source
project.⁴
      </p>
      <p>
Scalable technology-agnostic model generator. While the
original benchmark framework included a model generator,
its scalability was limited. I redesigned the model generator
focusing on two aspects: (1) ensuring scalability for large
models, and (2) allowing the framework users to easily add support for
new representations.
      </p>
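      <p>One way to make a generator technology-agnostic, sketched here under stated assumptions, is to separate what is generated from how it is persisted: the generator only calls an abstract serializer interface, and each representation (EMF, graph databases, RDF, SQL) would supply its own implementation. The ModelSerializer interface, its method names and the deliberately simplified railway generator (one route, switch and switch position at a time) are hypothetical and not the Train Benchmark's actual API.</p>
      <preformat>
```python
import random
from abc import ABC, abstractmethod


class ModelSerializer(ABC):
    """Hypothetical interface that each model representation would implement."""

    @abstractmethod
    def create_vertex(self, vid, label, attributes):
        ...

    @abstractmethod
    def create_edge(self, label, source, target):
        ...


class InMemorySerializer(ModelSerializer):
    """Simplest possible backend: keeps vertices and edges in Python lists."""

    def __init__(self):
        self.vertices, self.edges = [], []

    def create_vertex(self, vid, label, attributes):
        self.vertices.append((vid, label, attributes))

    def create_edge(self, label, source, target):
        self.edges.append((label, source, target))


def generate_railway(serializer, n_routes, seed=0):
    """Emit routes, switches and switch positions through `serializer` only."""
    rng = random.Random(seed)  # fixed seed: same model for every representation
    next_id = 0
    for _ in range(n_routes):
        route, sw, sp = next_id, next_id + 1, next_id + 2
        next_id += 3
        serializer.create_vertex(route, "Route", {})
        serializer.create_vertex(sw, "Switch",
                                 {"currentPosition": rng.choice(["straight", "diverging"])})
        serializer.create_vertex(sp, "SwitchPosition",
                                 {"position": rng.choice(["straight", "diverging"])})
        serializer.create_edge("follows", route, sp)
        serializer.create_edge("target", sp, sw)


backend = InMemorySerializer()
generate_railway(backend, n_routes=1000)
print(len(backend.vertices), len(backend.edges))  # 3000 2000
```
      </preformat>
      <p>Because elements are streamed to the serializer instead of being built in memory first, the generator itself does not limit model size, and adding a representation only means writing another serializer.</p>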
      <p>
        Propose novel query and transformation mixes for
the benchmark. The workload profile of the benchmark simulates
real-world model validation scenarios of users loading,
validating and transforming their models. The transformations
capture user edits and quick-fix like automated
refactoring operations. Some queries in the benchmark are
structurally similar to AUTOSAR [7] validation queries
(presented in [
        <xref ref-type="bibr" rid="ref6">9</xref>
        ]), while others aim to test various features of
graph query engines (such as efficient filtering and
evaluation of negative conditions).
      </p>
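      <p>A benchmark run can be sketched as a small timing harness around the phases of the workload (read, check, transformation, recheck), repeated over several runs. The Tool protocol, run_benchmark and the toy repair scenario on the running example are hypothetical stand-ins; this is not the Train Benchmark's own harness, which also reports memory consumption.</p>
      <preformat>
```python
import time
from typing import Callable, Protocol


class Tool(Protocol):
    """Hypothetical adapter interface for one technology under benchmark."""

    def read(self) -> None: ...
    def check(self) -> set: ...
    def transform(self, invalid: set) -> None: ...


def timed(fn, *args):
    start = time.perf_counter()
    value = fn(*args)
    return value, time.perf_counter() - start


def run_benchmark(tool_factory: Callable[[], Tool], runs: int) -> list:
    """Execute the read, check, transformation and recheck phases `runs` times."""
    records = []
    for run in range(runs):
        tool = tool_factory()  # fresh tool instance per run
        _, t_read = timed(tool.read)
        invalid, t_check = timed(tool.check)
        _, t_transform = timed(tool.transform, invalid)
        invalid_after, t_recheck = timed(tool.check)  # recheck
        records.append({
            "run": run,
            "read": t_read, "check": t_check,
            "transformation": t_transform, "recheck": t_recheck,
            "invalid_before": len(invalid), "invalid_after": len(invalid_after),
        })
    return records


class ToyRepairTool:
    """Toy 'repair' scenario on the running example: quick-fix every violation."""

    def read(self):
        self.position = {4: "div", 5: "str", 6: "str", 7: "str", 8: "div"}
        self.target = {4: 10, 5: 10, 6: 10, 7: 13, 8: 13}
        self.current = {10: "str", 13: "div"}

    def check(self):
        return {sp for sp, sw in self.target.items() if self.current[sw] != self.position[sp]}

    def transform(self, invalid):
        for sp in invalid:
            self.position[sp] = self.current[self.target[sp]]


for record in run_benchmark(ToyRepairTool, runs=3):
    print(record["run"], record["invalid_before"], record["invalid_after"])
```
      </preformat>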
      <p>
        Automated visualization and reporting. The framework
features end-to-end automation [
        <xref ref-type="bibr" rid="ref21">24</xref>
        ] to (1) set up
configurations of benchmark runs, (2) generate large model instances,
(3) execute benchmark measurements, and (4) synthesize
diagrams for measurements using R scripts.⁵
      </p>
      <sec id="sec-8-1">
        <p>⁴ https://github.com/FTSRG/trainbenchmark</p>
        <p>Figure 5: Overview of the Train Benchmark. Inputs: model, query and scenario (batch, inject, repair). Phases, executed in k runs: read, check, transformation and recheck. Results: number of invalid elements, execution times and memory consumption.</p>
        <p>
          Cross-technology evaluation of incremental query
execution time and memory consumption. This cross-technology
benchmark can be adapted to different model representation
formats and query technologies. This is demonstrated by 12+
reference implementations over four different technological
spaces (EMF, graph databases, RDF and SQL) presented
in [
          <xref ref-type="bibr" rid="ref43">48</xref>
          ].
        </p>
        <p>Uniqueness. Compared to other benchmarks, the Train
Benchmark has the following set of distinguishing features:
The workload profile follows a real-world model
validation scenario by updating the model with changes derived
from simulated user edits or transformations.</p>
        <p>The benchmark measures the performance of both initial
validation and incremental revalidation.</p>
        <p>This benchmark was designed with cross-technology
adaptations in mind. It can be implemented with different
model representation formats and query technologies.</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>4.3 Characterization of realistic models</title>
      <p>
        In [
        <xref ref-type="bibr" rid="ref44">49</xref>
        ], I presented multidisciplinary graph metrics and
evaluated them on instance models from different domains. As a
result, I proposed some metrics which turned out to be useful
for characterizing the structure of models.
      </p>
      <p>Adapt multidisciplinary metrics for engineering models.
I performed a literature review and identified several graph
metrics from other disciplines. For evaluating these metrics,
I gathered instance models from software and systems
engineering domains:</p>
      <p>
        AutoFOCUS system models [
        <xref ref-type="bibr" rid="ref4">6</xref>
        ],
Building Information Models (BIM) [
        <xref ref-type="bibr" rid="ref15">18</xref>
        ],
Capella system models [
        <xref ref-type="bibr" rid="ref35">38</xref>
        ],
JaMoPP code models [
        <xref ref-type="bibr" rid="ref23">26</xref>
        ],
railway models from the Train Benchmark [
        <xref ref-type="bibr" rid="ref43">48</xref>
        ],
and Yakindu [
        <xref ref-type="bibr" rid="ref55">60</xref>
        ] statecharts.
      </p>
      <p>Statistical characterization of different domains and
models. I used both exploratory and confirmatory data analysis
techniques in order to determine the “usefulness” of metrics.
I considered a metric useful if it separates models of different
domains from each other, while providing similar values for
models within the same domain. I also investigated whether
some of these metrics can distinguish real models from
automatically generated synthetic ones.</p>
      <sec id="sec-9-1">
        <p>⁵ https://www.r-project.org/</p>
        <p>Exploratory analysis relied on data visualization, while
confirmatory analysis used statistical methods (such as
performing Kolmogorov–Smirnov tests on the distributions of the derived
metrics). My initial finding is that different versions of
clustering coefficients (i.e. how tightly connected the model
elements are) were particularly useful for such
classifications. However, unsurprisingly, no single metric was able to
sufficiently handle all the domains. The analysis also provides
some insights that need to be considered in future model
generators to synthesize realistic models.</p>
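        <p>As an illustration of this kind of analysis, the sketch below computes the local clustering coefficient of every vertex of an undirected, untyped graph and compares two such distributions with the two-sample Kolmogorov–Smirnov statistic, the largest gap between the two empirical distribution functions. It uses plain Python and made-up toy graphs, treats vertices with fewer than two neighbours as having coefficient 0, and ignores the multidimensional (typed) metric variants; a statistics library such as SciPy would also provide p-values.</p>
        <preformat>
```python
from bisect import bisect_right
from itertools import combinations


def undirected(edges):
    """Build an adjacency map (vertex -> set of neighbours) from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def local_clustering(adjacency):
    """Fraction of each vertex's neighbour pairs that are themselves connected."""
    result = {}
    for v, neighbours in adjacency.items():
        k = len(neighbours)
        if k >= 2:
            links = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
            result[v] = links / (k * (k - 1) / 2)
        else:
            result[v] = 0.0  # convention for vertices with fewer than two neighbours
    return result


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


clustered = undirected([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])  # complete graph K4
star = undirected([(0, 1), (0, 2), (0, 3)])                               # no triangles
d = ks_statistic(list(local_clustering(clustered).values()),
                 list(local_clustering(star).values()))
print(d)  # 1.0
```
        </preformat>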
        <p>Automated classification of domain models using machine
learning. As a future research objective, I plan to use
machine learning techniques for automated classification of
domain models.</p>
        <p>Uniqueness. To the best of my knowledge, this is the first
investigation of using multidimensional graph metrics for
both characterizing the realism of models and distinguishing
different domain models from each other.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>4.4 Realistic model generation</title>
      <p>
        As a proposed contribution, I plan to design and develop a
generator that is capable of producing realistic models
scalable in size. While there are solutions for generating either
scalable or realistic models, there are no known approaches
for the combination of both, rendering this a high-risk
research task. The long-term research objective of generating
scalable and realistic models breaks down into the following
steps:
1. metrics-guided generation of realistic models,
2. domain model generation by design space exploration [
        <xref ref-type="bibr" rid="ref2">4</xref>
        ],
3. scalable rule-based generation of domain models.
      </p>
      <sec id="sec-10-1">
        <title>Acknowledgments</title>
        <p>I would like to thank my advisor, Dániel Varró, for his
guidance during my research. I would also like to express my
gratitude to István Ráth, Gábor Bergmann and numerous
colleagues and co-authors for sharing their suggestions and
ideas.</p>
        <p>[2] Artop: The AUTOSAR Tool Platform. https://www.artop.org/.</p>
        <p>[3] Sesame: RDF API and query engine. http://www.openrdf.org/.</p>
        <p>[7] AUTOSAR Consortium. The AUTOSAR Standard. http://www.autosar.org/.</p>
        <p>[40] M. Rinne, E. Nuutila, and S. Törmä. INSTANS: high-performance event processing with standard RDF and SPARQL. In ISWC, 2012.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          Apache Jena. http://jena.apache.org/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Varró</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Sahraoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Nagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Debreceni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Á.</given-names>
            <surname>Hegedüs</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Á.</given-names>
            <surname>Horváth</surname>
          </string-name>
          .
          <article-title>Multi-objective optimization in rule-based design space exploration</article-title>
          .
          <source>In ASE</source>
          , pages
          <fpage>289</fpage>
          -
          <lpage>300</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Albert</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.-L.</given-names>
            <surname>Barabási</surname>
          </string-name>
          .
          <article-title>Statistical mechanics of complex networks</article-title>
          .
          <source>Rev. Mod. Phys.</source>
          ,
          <volume>74</volume>
          (
          <issue>1</issue>
          ):
          <fpage>47</fpage>
          -
          <lpage>97</lpage>
          ,
          <year>January 2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V.</given-names>
            <surname>Aravantinos</surname>
          </string-name>
          et al.
          <article-title>AutoFOCUS 3: Tooling concepts for seamless, model-based development of embedded systems</article-title>
          .
          <source>In Joint Proceedings of ACES-MB &amp; WUCOR co-located with MoDELS</source>
          , pages
          <fpage>19</fpage>
          -
          <lpage>26</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bagan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonifati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ciucanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. H. L.</given-names>
            <surname>Fletcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lemay</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Advokaat</surname>
          </string-name>
          .
          <article-title>Generating flexible workloads for graph databases</article-title>
          .
          <source>Proc. VLDB Endow.</source>
          ,
          <volume>9</volume>
          (
          <issue>13</issue>
          ):
          <fpage>1457</fpage>
          -
          <lpage>1460</lpage>
          , Sept.
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Á.</given-names>
            <surname>Horváth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ráth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Varró</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Balogh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Balogh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Ökrös</surname>
          </string-name>
          .
          <article-title>Incremental evaluation of model queries over EMF models</article-title>
          .
          <source>In MODELS</source>
          , pages
          <fpage>76</fpage>
          -
          <lpage>90</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bhattacharya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Iliofotou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Neamtiu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          .
          <article-title>Graph-based analysis and prediction for software evolution</article-title>
          .
          <source>In ICSE</source>
          , pages
          <fpage>419</fpage>
          -
          <lpage>429</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Schultz</surname>
          </string-name>
          .
          <article-title>The Berlin SPARQL benchmark</article-title>
          .
          <source>International Journal on Semantic Web &amp; Information Systems</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bródka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kazienko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Musial</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Skibicki</surname>
          </string-name>
          .
          <article-title>Analysis of neighbourhoods in multi-layered dynamic social networks</article-title>
          .
          <source>Int. J. Computational Intelligence Systems</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Brottier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Fleurey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Steel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Baudry</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y. L.</given-names>
            <surname>Traon</surname>
          </string-name>
          .
          <article-title>Metamodel-based test generation for model transformations: an algorithm and a tool</article-title>
          .
          <source>In ISSRE</source>
          , pages
          <fpage>85</fpage>
          -
          <lpage>94</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bunke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Glauser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Tran</surname>
          </string-name>
          .
          <article-title>An efficient implementation of graph grammars based on the RETE matching algorithm</article-title>
          .
          <source>In Graph-Grammars and Their Application to Computer Science, 4th International Workshop, Bremen, Germany, March 5-9, 1990, Proceedings</source>
          , pages
          <fpage>174</fpage>
          -
          <lpage>189</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L. d. F.</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. N.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Travieso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R. Villas</given-names>
            <surname>Boas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiqueira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. P.</given-names>
            <surname>Viana</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L. E. Correa</given-names>
            <surname>Rocha</surname>
          </string-name>
          .
          <article-title>Analyzing and modeling real-world phenomena with complex networks: a survey of applications</article-title>
          .
          <source>Advances in Physics</source>
          ,
          <volume>60</volume>
          (
          <issue>3</issue>
          ):
          <fpage>329</fpage>
          -
          <lpage>412</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghemawat</surname>
          </string-name>
          .
          <article-title>MapReduce: Simplified data processing on large clusters</article-title>
          .
          <source>In OSDI</source>
          , pages
          <fpage>137</fpage>
          -
          <lpage>150</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Duchon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Flajolet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Louchard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Schaeffer</surname>
          </string-name>
          .
          <article-title>Boltzmann samplers for the random generation of combinatorial structures</article-title>
          .
          <source>Combinatorics, Probability &amp; Computing</source>
          ,
          <volume>13</volume>
          (
          <issue>4-5</issue>
          ):
          <fpage>577</fpage>
          -
          <lpage>625</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>C.</given-names>
            <surname>Eastman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Teicholz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sacks</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Liston</surname>
          </string-name>
          .
          <source>BIM Handbook: A Guide to Building Information Modeling for Owners, Managers, Designers, Engineers and Contractors</source>
          . Wiley Publishing,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>O.</given-names>
            <surname>Erling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Averbuch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Larriba-Pey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chafi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prat-Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pham</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Boncz</surname>
          </string-name>
          .
          <article-title>The LDBC social network benchmark: Interactive workload</article-title>
          .
          <source>In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data</source>
          , Melbourne, Victoria, Australia, May 31 - June 4,
          <year>2015</year>
          , pages
          <fpage>619</fpage>
          -
          <lpage>630</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Forgy</surname>
          </string-name>
          .
          <article-title>Rete: A fast algorithm for the many patterns/many objects match problem</article-title>
          .
          <source>Artif. Intell.</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ):
          <fpage>17</fpage>
          -
          <lpage>37</lpage>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Gray</surname>
          </string-name>
          , editor.
          <source>The Benchmark Handbook for Database and Transaction Systems (2nd Edition)</source>
          . Morgan Kaufmann,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Krause</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tichy</surname>
          </string-name>
          .
          <article-title>The TTC 2014 movie database case</article-title>
          .
          <source>TTC 2014</source>
          , page
          <fpage>93</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [23]
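          <string-name>
            <given-names>B.</given-names>
            <surname>Izsó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Szárnyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ráth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Varró</surname>
          </string-name>
          .
          <article-title>IncQuery-D: Incremental graph search in the cloud</article-title>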
          .
          <source>In BigMDE</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [24]
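          <string-name>
            <given-names>B.</given-names>
            <surname>Izsó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Szárnyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ráth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Varró</surname>
          </string-name>
          .
          <article-title>MONDO-SAM: A framework to systematically assess MDE scalability</article-title>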
          .
          <source>In BigMDE@STAF</source>
          , pages
          <fpage>40</fpage>
          -
          <lpage>43</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jackson</surname>
          </string-name>
          .
          <article-title>Alloy: a lightweight object modelling notation</article-title>
          .
          <source>ACM Trans. Softw. Eng. Methodol.</source>
          ,
          <volume>11</volume>
          (
          <issue>2</issue>
          ):
          <fpage>256</fpage>
          -
          <lpage>290</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [26]
          <string-name>
            <surname>JaMoPP</surname>
          </string-name>
          .
          <source>The Java Model Parser and Printer</source>
          ,
          <year>2016</year>
          . http://www.jamopp.org/index.php/JaMoPP.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [27]
          <string-name>
            <surname>JBoss</surname>
          </string-name>
          . Drools. http://www.jboss.org/drools.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kazienko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Musial</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Kajdanowicz</surname>
          </string-name>
          .
          <article-title>Multidimensional social network in the social recommender system</article-title>
          .
          <source>IEEE Trans. Systems, Man, and Cybernetics</source>
          ,
          <volume>41</volume>
          (
          <issue>4</issue>
          ):
          <fpage>746</fpage>
          -
          <lpage>759</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>C.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Johannsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deeb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sattler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Knacker</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Niadzelka</surname>
          </string-name>
          .
          <article-title>An SQL-based query language and engine for graph pattern matching</article-title>
          .
          <source>In ICGT</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>C.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tichy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Giese</surname>
          </string-name>
          .
          <article-title>Implementing graph transformations in the bulk synchronous parallel model</article-title>
          .
          <source>In FASE</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>B.</given-names>
            <surname>Luteberget</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Johansen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Steffen</surname>
          </string-name>
          .
          <article-title>Rule-based consistency checking of railway infrastructure designs</article-title>
          .
          <source>In Integrated Formal Methods - 12th International Conference, IFM 2016</source>
          , Reykjavik, Iceland, June 1-5,
          <year>2016</year>
          , Proceedings, pages
          <fpage>491</fpage>
          -
          <lpage>507</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J.</given-names>
            <surname>Maginecz</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Szárnyas</surname>
          </string-name>
          .
          <article-title>Sharded joins for scalable incremental graph queries</article-title>
          .
          <source>In 23rd PhD Mini-Symposium</source>
          , Budapest University of Technology and Economics,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J.</given-names>
            <surname>Makai</surname>
          </string-name>
          ,
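          <string-name>
            <given-names>G.</given-names>
            <surname>Szárnyas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ráth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Á.</given-names>
            <surname>Horváth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Varró</surname>
          </string-name>
          .
          <article-title>Optimization of incremental queries in the cloud</article-title>
          .
          <source>In 3rd International Workshop on Model-Driven Engineering on and for the Cloud (CloudMDE) at MODELS</source>
          ,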
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [34]
          <article-title>MONDO project</article-title>
          .
          <source>Scalable Modeling and Model Management on the Cloud Project, 7th EU Framework Programme</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          .
          <article-title>DBpedia SPARQL benchmark: Performance assessment with real queries on real data</article-title>
          .
          <source>In ISWC</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mougenot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Darrasse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Blanc</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Soria</surname>
          </string-name>
          .
          <article-title>Uniform random generation of huge metamodel instances</article-title>
          .
          <source>In ECMDA-FA</source>
          , pages
          <fpage>130</fpage>
          -
          <lpage>145</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [37]
          Neo Technology. Neo4j. http://neo4j.org/.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>[38] PolarSys. capella/.</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>J.</given-names>
            <surname>Di Rocco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Di Ruscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Iovino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Pierantonio</surname>
          </string-name>
          .
          <article-title>Mining correlations of ATL model transformation and metamodel metrics</article-title>
          .
          <source>In MiSE</source>
          , pages
          <fpage>54</fpage>
          -
          <lpage>59</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Neubauer</surname>
          </string-name>
          .
          <article-title>Constructions from Dots and Lines</article-title>
          .
          <source>Bulletin of American Society for Information Science &amp; Technology, August/September</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>M.</given-names>
            <surname>Scheidgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zubow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fischer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Kolbe</surname>
          </string-name>
          .
          <article-title>Automated and transparent model fragmentation for persisting large models</article-title>
          .
          <source>In Model Driven Engineering Languages and Systems - 15th International Conference, MODELS 2012, Innsbruck, Austria, September 30-October 5, 2012. Proceedings</source>
          , pages
          <fpage>102</fpage>
          -
          <lpage>118</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hornung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lausen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Pinkel</surname>
          </string-name>
          .
          <article-title>SP2Bench: A SPARQL performance benchmark</article-title>
          . Shanghai, China,
          <year>2009</year>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>B.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Trinity: a distributed graph engine on a memory cloud</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>A.</given-names>
            <surname>Silberschatz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. F.</given-names>
            <surname>Korth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sudarshan</surname>
          </string-name>
          .
          <source>Database System Concepts</source>
          ,
          <edition>5th Edition</edition>
          .
          <publisher-name>McGraw-Hill Book Company</publisher-name>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>G.</given-names>
            <surname>Szárnyas</surname>
          </string-name>
          , B. Izsó, I. Ráth,
          <string-name>
            <given-names>D.</given-names>
            <surname>Harmath</surname>
          </string-name>
          , G. Bergmann, and D. Varró.
          <article-title>IncQuery-D: A distributed incremental model query framework in the cloud</article-title>
          .
          <source>In MODELS</source>
          , pages
          <fpage>653</fpage>
          -
          <lpage>669</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>G.</given-names>
            <surname>Szárnyas</surname>
          </string-name>
          , B. Izsó, I. Ráth, and D. Varró.
          <article-title>The Train Benchmark: Cross-technology performance evaluation of continuous model validation</article-title>
          .
          <source>Software and Systems Modeling</source>
          ,
          <year>2017</year>
          . Accepted.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>G.</given-names>
            <surname>Szárnyas</surname>
          </string-name>
          , Z. Kővári, A. Salánki, and D. Varró.
          <article-title>Towards the characterization of realistic models: Evaluation of multidisciplinary graph metrics</article-title>
          .
          <source>In MODELS</source>
          , pages
          <fpage>87</fpage>
          -
          <lpage>94</lpage>
          , New York, NY, USA,
          <year>2016</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>G.</given-names>
            <surname>Szárnyas</surname>
          </string-name>
          , J. Maginecz, and D. Varró.
          <article-title>Evaluation of optimization strategies for incremental graph queries</article-title>
          .
          <source>Periodica Polytechnica, EECS</source>
          ,
          <year>2017</year>
          . Accepted.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>G.</given-names>
            <surname>Szárnyas</surname>
          </string-name>
          , O. Semeráth, I. Ráth, and D. Varró.
          <article-title>The TTC 2015 Train Benchmark case for incremental model validation</article-title>
          .
          <source>In Proceedings of the 8th TTC, a part of STAF</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          .
          <article-title>Community detection via heterogeneous interaction analysis</article-title>
          .
          <source>Data Min. Knowl. Discov.</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [53]
          The Eclipse Foundation.
          <article-title>Papyrus</article-title>
          ,
          <year>2015</year>
          . https://eclipse.org/papyrus/.
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ujhelyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Á.</given-names>
            <surname>Hegedüs</surname>
          </string-name>
          , Á. Horváth, B. Izsó, I. Ráth, Z. Szatmári, and D. Varró.
          <article-title>EMF-IncQuery: An integrated development environment for live model queries</article-title>
          .
          <source>Sci. Comput. Program.</source>
          ,
          <volume>98</volume>
          :
          <fpage>80</fpage>
          -
          <lpage>99</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>D.</given-names>
            <surname>Varró</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          , Á. Hegedüs, Á. Horváth, I. Ráth, and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ujhelyi</surname>
          </string-name>
          .
          <article-title>Road to a reactive and incremental model transformation platform: three generations of the VIATRA framework</article-title>
          .
          <source>SOSYM</source>
          ,
          <volume>15</volume>
          (
          <issue>3</issue>
          ):
          <fpage>609</fpage>
          -
          <lpage>629</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>G.</given-names>
            <surname>Varró</surname>
          </string-name>
          , A. Schürr, and D. Varró.
          <article-title>Benchmarking for graph transformation</article-title>
          .
          <source>In VL/HCC</source>
          . IEEE Press,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [57]
          W3C.
          <article-title>Resource Description Framework (RDF)</article-title>
          . http://www.w3.org/standards/techs/rdf/.
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [58]
          <string-name>
            <given-names>J.</given-names>
            <surname>Whittle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Hutchinson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Rouncefield</surname>
          </string-name>
          .
          <article-title>The state of practice in model-driven engineering</article-title>
          .
          <source>IEEE Software</source>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ):
          <fpage>79</fpage>
          -
          <lpage>85</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [59]
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Xin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Franklin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Stoica</surname>
          </string-name>
          .
          <article-title>GraphX: a resilient distributed graph system on Spark</article-title>
          .
          <source>In GRADES co-located with SIGMOD/PODS, page 2</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [60]
          Yakindu.
          <article-title>Statechart Tools</article-title>
          . http://statecharts.org/.
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [61]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zündorf</surname>
          </string-name>
          .
          <article-title>AntWorld benchmark specification</article-title>
          ,
          <source>GraBaTs</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>