<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>The TTC 2015 Train Benchmark Case for Incremental Model Validation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gábor Szárnyas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oszkár Semeráth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>István Ráth</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dániel Varró</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Budapest University of Technology and Economics, Department of Measurement and Information Systems</institution>
          <addr-line>H-1117 Magyar tudósok krt. 2, Budapest</addr-line>
          ,
          <country country="HU">Hungary</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <abstract>
        <p>In model-driven development of safety-critical systems (like automotive, avionics or railways), the well-formedness of models is repeatedly validated in order to detect design flaws as early as possible. Validation rules are often implemented by a large amount of imperative model traversal code, which makes those rule implementations complicated and hard to maintain. Additionally, as models are rapidly increasing in size and complexity, efficient execution of these operations is challenging for the currently available toolchains. However, checking well-formedness constraints can be interpreted as the evaluation of model queries, and repair operations as model transformations, which allows the validation task to be specified concisely and executed efficiently. This paper presents a benchmark case and an evaluation framework to systematically assess the scalability of validating and revalidating well-formedness constraints over large models. The benchmark case defines a typical well-formedness validation scenario in the railway domain, including the metamodel, an instance model generator, and a set of well-formedness constraints captured by queries and repair operations (imitating the work of systems engineers by model transformations). The benchmark case focuses on the execution time of the query evaluations, with a special emphasis on reevaluations, as well as on simple repair transformations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>This work was partially supported by the MONDO (EU ICT-611125) project and Red Hat Inc.</p>
      <p>The workload characteristics of modeling tools are significantly different from those of typical transactional systems [2]. Specifically, modeling tools use much more complex queries, and their real-world performance is affected more by response time (i.e. the execution time of a specific operation such as validation or transformation) than by throughput (i.e. the number of parallel transactions). Also, previous TTC cases did not focus on measuring the performance of query reevaluation.</p>
      <p>The source code is available at https://github.com/FTSRG/trainbenchmark-ttc. This case is closely based on the Train Benchmark [1], an ongoing benchmark project of our research group.</p>
    </sec>
    <sec id="sec-3">
      <title>Case Description</title>
      <p>A benchmark case configuration in the Train Benchmark consists of an instance model (Section 2.2), and a query and a repair transformation (Section 3) describing constraint-violating elements. As a result of a benchmark case run, the execution time of each phase, the memory usage and the number of invalid elements are measured and recorded. The number of invalid elements is used to check the correctness of the validation; however, the collection of element identifiers must also be available for later processing.</p>
      <sec id="sec-3-0">
        <title>Metamodel</title>
        <p>The metamodel of the Train Benchmark is shown in Figure 3. A train route is defined by a sequence of sensors. Sensors are associated with track elements, which are either segments (with a specific length) or switches. A route follows certain switch positions, which describe the required state of a switch belonging to the route. Different route definitions can specify different states for a specific switch. Each route has a semaphore on its entry and exit. Figure 1 shows a typical railway network.</p>
        <p>Every railway element is a subtype of the class RailwayElement, which has a unique identifier (id). The root of the model is a RailwayContainer, which contains the semaphores and the routes of the model. Additionally, the railway container has an invalids reference for storing elements. This is used for serializing EMF models (Section B.1.1).</p>
      </sec>
      <sec id="sec-3-1">
        <title>Instance Models</title>
        <p>The instance models are systematically generated for the metamodel: small model fragments are created and connected to each other. Based on the model queries, the generator injects errors into the model by removing edges and changing attribute values with a certain probability. The probability of injecting an error that violates a pattern (Section 3) is shown in Table 1.</p>
        <p>This generation method controls the number of matches of all defined model queries. To avoid highly symmetric models, the exact number of elements and the cardinalities are randomized. This brings the artificially generated models closer to real-world instances and prevents query tools from exploiting the artificial regularity of the model. To assess scalability, the benchmark uses instance models of growing sizes, each model containing twice as many model elements as the previous one. The instance models are designated by powers of two (1, 2, 4, 8, . . . ), the smallest model containing about 5000 model elements.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Benchmark Phases</title>
        <p>Figure 2 gives an overview of the benchmark setup: for each model of increasing size and each change set size (fixed or proportional), a query is benchmarked in 5 runs; a run consists of a read and a check phase, followed by 10 repair–recheck iterations, while the number of invalid elements, the execution time and the memory usage are measured in each phase.</p>
        <p>To simulate a typical validation workload, four phases were defined (Figure 2).
1. During the read phase, the instance model is loaded from disk into memory. This includes parsing the input as well as initializing the data structures (e.g. indexes) of the tool.
2. In the check phase, the instance model is queried to identify invalid elements. The result of this phase is a set of the invalid elements, which is used in the next phase.
3. In the repair phase, the model is changed to simulate the effects (and measure the performance) of model modifying operations. The transformations are always performed on a subset of the model elements returned by the check phase.
4. The revalidation of the model is carried out in the recheck phase, similarly to the check phase. In real-world scenarios, there are often multiple transformations in the system which may interfere with the results of the query. Because of this, we require the tools to reevaluate the query with regard to the current state of the model.</p>
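The four phases can be sketched as a simple loop. The ToyTool below is a deliberately simplified stand-in (a map of segment ids to lengths checked against a PosLength-style constraint), not the actual API of the benchmark framework:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class BenchmarkRunner {
    /** Minimal stand-in for a tool under test: a map of segment id -> length. */
    public static class ToyTool {
        public final Map<Long, Integer> lengths = new HashMap<>();

        public void read(Map<Long, Integer> model) {      // 1. read phase
            lengths.putAll(model);
        }

        public Set<Long> check() {                        // 2. check / 4. recheck phase
            Set<Long> invalid = new HashSet<>();
            for (Map.Entry<Long, Integer> e : lengths.entrySet())
                if (e.getValue() <= 0) invalid.add(e.getKey());
            return invalid;
        }

        public void repair(Set<Long> invalid) {           // 3. repair phase
            for (Long id : invalid) lengths.put(id, -lengths.get(id) + 1);
        }
    }

    /** Executes read, check, then `iterations` repair-recheck rounds. */
    public static int run(ToyTool tool, Map<Long, Integer> model, int iterations) {
        tool.read(model);
        Set<Long> invalid = tool.check();
        for (int i = 0; i < iterations; i++) {
            tool.repair(invalid);
            invalid = tool.check();  // the query is fully reevaluated each time
        }
        return invalid.size();
    }
}
```

In the real framework, execution time and memory usage would be recorded around each of these calls.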
        <p>The repair operation intends to fix invalid model elements based on the invalid objects identified during the previous check or recheck phase. We defined two strategies to determine the size of the change set:
fixed: 10 of the invalid model elements are modified. This tests the efficiency of handling small change sets.
proportional: 10% of the result set is modified. This tests the efficiency of handling large change sets.</p>
        <sec id="sec-3-8">
          <title>Queries</title>
          <p>The queries used in the validation scenario are introduced both informally and as graph patterns. In complexity, the queries range from simple attribute value checks to complex path constraints consisting of several join operations: two simple queries use at most 2 objects (PosLength and SwitchSensor) and three complex queries use 4–8 objects and multiple join operations (RouteSensor, SemaphoreNeighbor, SwitchSet).</p>
          <sec id="sec-3-9">
            <title>Graph Patterns and Transformations</title>
            <p>The purpose of the queries is to check well-formedness constraints by matching graph patterns looking for errors in the model. The graph patterns are defined by a name, a list of symbolic object parameters and the constraints to be satisfied by the parameters. A pattern match maps each symbolic parameter to a model object, where the mapping satisfies the conditions defined by the constraints. The result of the query is the set of all possible matches. The absence of pattern matches means that the model is well-formed, while the matches of the error pattern mark the invalid elements. The match set contains all matches for a given pattern.</p>
            <p>In the repair phase, some model elements are deterministically selected and repaired. In order to ensure repeatable results, (1) the elements for transformation are chosen using a pseudorandom generator, and (2) the elements are always selected from the deterministically sorted list (Section 2.4).</p>
          </sec>
        </sec>
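The deterministic selection can be sketched as follows. This is an illustrative implementation, not the framework's actual code: the class and method names are assumptions, matches are reduced to strings, and the minimum of one element for the proportional strategy is a choice made here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RepairSelector {
    /**
     * Picks the matches to repair. The candidates are first sorted (natural
     * String order stands in for the match ordering of Section 2.4), then
     * sampled with a fixed-seed Random so every run picks the same elements.
     */
    public static List<String> select(List<String> matches, boolean proportional, long seed) {
        List<String> sorted = new ArrayList<>(matches);
        Collections.sort(sorted);                       // deterministic base order
        int n = proportional
                ? Math.max(1, sorted.size() / 10)       // proportional: 10% of the result set
                : Math.min(10, sorted.size());          // fixed: at most 10 elements
        Random rnd = new Random(seed);                  // pseudorandom, but repeatable
        List<String> picked = new ArrayList<>();
        for (int i = 0; i < n; i++)
            picked.add(sorted.remove(rnd.nextInt(sorted.size())));
        return picked;
    }
}
```

Sorting before sampling is what makes the result independent of the order in which a tool returns its matches.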
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Tasks</title>
      <p>For each task, we present the well-formedness constraint. The queries look for violations of these constraints. We describe the meaning and the goal of each query and show a graphical notation of the associated graph pattern. We also define the matches as tuples to ensure that the ordering of the matches is consistent between the implementations (Section B.2). The repair transformations are represented as graph transformations. For defining the patterns and transformations, we used a graphical syntax similar to GROOVE [3] with a couple of additions:
• Filter conditions are shown in an italic font.
• Negative application conditions are shown in a red rectangle with the NEG caption.
• Insertions are marked with a «new» caption. Attribute updates are also shown in green.</p>
      <p>PosLength. Every segment must have a positive length.
Query. The query checks for segments with a length less than or equal to zero. Match: ⟨segment⟩
Repair transformation. The length attribute of the segment in the match is updated to −length + 1.
Goal. This query defines an attribute check. This is a common use case in validation scenarios.</p>
      <p>SwitchSensor. Every switch must have at least one sensor connected to it.
Query. The query checks for switches that have no sensors associated with them. Match: ⟨sw⟩
Repair transformation. A sensor is created and connected to the switch.
Goal. This query checks whether an object is connected to a relation. This pattern is common in more complex queries, e.g. it is used in the RouteSensor and the SemaphoreNeighbor queries.</p>
      <p>SwitchSet. The entry semaphore of a route may only show GO if all switches along the route are in the position prescribed by the route.</p>
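The PosLength task reduced to executable form — a sketch over a plain array of segment lengths rather than an EMF model. The update −length + 1 used here guarantees a positive result for any non-positive length:

```java
import java.util.ArrayList;
import java.util.List;

// Executable sketch of the PosLength constraint: the model is reduced to an
// array of segment lengths (index = segment id); this is not the EMF-based
// reference implementation, only the pattern logic.
public class PosLength {
    /** Check phase: collect the ids of segments violating length > 0. */
    public static List<Integer> check(int[] lengths) {
        List<Integer> invalid = new ArrayList<>();
        for (int id = 0; id < lengths.length; id++)
            if (lengths[id] <= 0) invalid.add(id);
        return invalid;
    }

    /** Repair: length := -length + 1, always positive when length <= 0. */
    public static void repair(int[] lengths, List<Integer> invalid) {
        for (int id : invalid) lengths[id] = -lengths[id] + 1;
    }
}
```

After one repair pass over all invalid segments, a recheck finds no matches, which is the expected behavior for this query.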
      <p>Query. The query checks for routes which have a semaphore that shows the GO signal. Additionally, the route follows a switch position (swP) that is connected to a switch (sw), but the switch position (swP.position) defines a different position from the current position of the switch (sw.currentPosition). Match: ⟨semaphore, route, swP, sw⟩
Repair transformation. The currentPosition attribute of the switch is set to the position of swP.
Goal. This pattern tests the efficiency of the join and filtering operations.</p>
      <p>RouteSensor. All sensors that are associated with a switch that belongs to a route must also be associated directly with the same route.
Query. The query looks for sensors that are connected to a switch, but the sensor and the switch are not connected to the same route. Match: ⟨route, sensor, swP, sw⟩
Repair transformation. The missing definedBy edge is inserted by connecting the route in the match to the sensor.</p>
      <p>Goal. This pattern checks for the absence of circles, so the efficiency of the join and the antijoin
operations is tested.</p>
      <p>SemaphoreNeighbor. Routes that are connected through sensors and track elements must belong to the same semaphore.
Query. The query checks for routes (route1) which have an exit semaphore (semaphore) and a sensor (sensor1) connected to a track element (te1). This track element is connected to another track element (te2) which is connected to another sensor (sensor2) which (partially) defines another, different route (route2), while the semaphore is not on the entry of this route (route2). Match: ⟨semaphore, route1, route2, sensor1, sensor2, te1, te2⟩
Repair transformation. The route2 node is connected to the semaphore node with an entry edge.</p>
      <p>Goal. This pattern checks for the absence of circles, so the efficiency of the join operation is tested.
One-way navigable references are also present in the constraint, so the efficiency of their evaluation
is also measured. Subsumption inference is required, as the two track elements can be switches or
segments.</p>
      <p>Acknowledgements. The authors would like to thank Benedek Izsó for originally designing and implementing the Train Benchmark, Tassilo Horn for providing valuable comments regarding both the specification of the case and the implementation of the benchmark framework, and Zsolt Kővári for his contributions to the benchmark and visualization scripts.</p>
    </sec>
    <sec id="sec-5">
      <title>A Metamodel</title>
      <p>Figure 3a (containment hierarchy and references): the root RailwayContainer contains semaphores (0..* Semaphore) and routes (0..* Route). A Semaphore has a signal attribute of the enumeration type Signal (FAILURE, STOP, GO). A Route has exactly one entry and one exit Semaphore, is definedBy 2..* Sensors and follows 0..* SwitchPositions. A SwitchPosition has a position attribute of the enumeration type Position (FAILURE, LEFT, RIGHT, STRAIGHT) and refers to exactly one Switch, which in turn has a currentPosition attribute (Position) and 0..* positions. A Sensor is associated with 0..* TrackElements (elements); a TrackElement has 0..1 sensor and 0..* connectsTo references to other TrackElements. TrackElement is specialized by Segment (with a length attribute of type EInt) and Switch.</p>
      <p>Figure 3b (supertype relations): Semaphore, Route, SwitchPosition, Sensor, TrackElement, Segment and Switch are all subtypes of RailwayElement, which has an id attribute of type EInt. The RailwayContainer additionally holds an invalids reference (0..* RailwayElement).</p>
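The metamodel can be rendered as plain Java classes. This is only a structural sketch: the actual benchmark uses generated EMF code, so these classes merely mirror the attributes, references and supertype relations of Figure 3.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the railway metamodel (Figure 3); not the generated EMF code.
public class RailwayMetamodel {
    public enum Signal { FAILURE, STOP, GO }
    public enum Position { FAILURE, LEFT, RIGHT, STRAIGHT }

    public static class RailwayElement { public int id; }             // common supertype
    public static class Semaphore extends RailwayElement { public Signal signal; }
    public static class TrackElement extends RailwayElement {
        public Sensor sensor;                                          // 0..1 sensor
        public List<TrackElement> connectsTo = new ArrayList<>();     // 0..* connectsTo
    }
    public static class Segment extends TrackElement { public int length; }
    public static class Switch extends TrackElement { public Position currentPosition; }
    public static class Sensor extends RailwayElement {
        public List<TrackElement> elements = new ArrayList<>();       // 0..* elements
    }
    public static class SwitchPosition extends RailwayElement {
        public Switch sw;                                              // exactly 1 switch
        public Position position;
    }
    public static class Route extends RailwayElement {
        public Semaphore entry, exit;                                  // 1 entry, 1 exit
        public List<Sensor> definedBy = new ArrayList<>();            // 2..* definedBy
        public List<SwitchPosition> follows = new ArrayList<>();      // 0..* follows
    }
    public static class RailwayContainer {                             // model root
        public List<Semaphore> semaphores = new ArrayList<>();
        public List<Route> routes = new ArrayList<>();
        public List<RailwayElement> invalids = new ArrayList<>();     // for serialization
    }
}
```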
    </sec>
    <sec id="sec-6">
      <title>B Implementation</title>
      <sec id="sec-6-1">
        <title>B.1 Instance Model Formats</title>
        <sec id="sec-6-1-1">
          <title>B.1.1 EMF Models</title>
          <p>To aid the development of case solutions, we provide a framework using predefined input and output formats, along with two reference implementations.</p>
          <p>The EMF models are serialized to the standard XMI format using the generated EMF code. The injection of errors during the instance model generation (Section 2.2) causes some containment errors: invalid elements violating the containment hierarchy could not be serialized. As the benchmark requires invalid models, the invalid elements are connected to the root element of the instance model by the invalids reference (Figure 3b). Table 1 lists the error injection probabilities for the affected attributes and edges: Segment.length, Route.definedBy, Route.exit, Switch.sensor and SwitchPosition.position.</p>
          <p>The generator defines a graph-like interface for creating the models. The EMF model generator is an implementation of this interface. To generate non-EMF models, the following approaches are recommended: (1) create a custom class which implements the Generator interface, or (2) generate the EMF models and convert them to another representation.</p>
        </sec>
      </sec>
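A graph-like generator interface could take the following shape. The method names here are assumptions for illustration, not the actual signatures of the framework's Generator interface; the map-backed implementation shows the non-EMF route in miniature.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of the graph-like generator interface (the framework's
// actual method names may differ), with a minimal in-memory implementation.
public class InMemoryGenerator {
    public interface Generator {
        long createVertex(String type);                         // e.g. "Segment"
        void createEdge(String label, long from, long to);      // e.g. "connectsTo"
        void setAttribute(long vertex, String name, Object value);
    }

    /** Stores the model as simple type, edge and attribute maps. */
    public static class MapBackedGenerator implements Generator {
        public long nextId = 0;
        public final Map<Long, String> types = new HashMap<>();
        public final List<long[]> edges = new ArrayList<>();    // {from, to}
        public final Map<String, Object> attributes = new HashMap<>();

        public long createVertex(String type) {
            types.put(nextId, type);
            return nextId++;
        }
        public void createEdge(String label, long from, long to) {
            edges.add(new long[]{from, to});
        }
        public void setAttribute(long vertex, String name, Object value) {
            attributes.put(vertex + "." + name, value);
        }
    }
}
```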
      <sec id="sec-6-2">
        <title>B.2 Ordering of the Match Set</title>
        <p>The matches in the match set may be returned in any collection (e.g. a list or a set) in any order, given that the collection is unique. In order to ensure that the benchmark is repeatable, this collection is copied to a sorted list. The sorting is carried out by defining an ordering between matches.</p>
        <p>To compare matches M1 = ⟨a1, a2, . . . , an⟩ and M2 = ⟨b1, b2, . . . , bn⟩, we take the first elements in each match (a1 and b1) and compare their identifiers. If the first elements are equal, we compare the second elements (a2 and b2), and so on, until we find two different model elements. Such elements are guaranteed to exist by the fact that the collection is unique, so it cannot contain two identical matches.</p>
        <p>For example, for the RouteSensor query, a match set may be returned by tool A as the list (⟨route: 8, sensor: 12, switchPosition: 4, sw: 10⟩; ⟨route: 5, sensor: 1, switchPosition: 13, sw: 7⟩) and by tool B as the set {⟨route: 5, sensor: 1, switchPosition: 13, sw: 7⟩; ⟨route: 8, sensor: 12, switchPosition: 4, sw: 10⟩}. For both implementations, the framework creates a sorted copy, resulting in the list (⟨route: 5, sensor: 1, switchPosition: 13, sw: 7⟩; ⟨route: 8, sensor: 12, switchPosition: 4, sw: 10⟩).</p>
        <p>The ordered list is also used to ensure that the transformations are performed on the same model elements, regardless of the return order of the match set.</p>
      </sec>
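The lexicographic ordering described above amounts to a short comparator. In this sketch a match is an array of element ids (the framework's own match representation will differ):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// The ordering of Section B.2: matches are tuples of element ids,
// compared lexicographically.
public class MatchOrder {
    public static final Comparator<int[]> LEXICOGRAPHIC = (a, b) -> {
        for (int i = 0; i < a.length; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) return c;   // first differing element decides
        }
        return 0;                   // unreachable for a unique match collection
    };

    /** Returns a sorted copy, leaving the tool's own collection untouched. */
    public static List<int[]> sortedCopy(List<int[]> matches) {
        List<int[]> copy = new ArrayList<>(matches);
        copy.sort(LEXICOGRAPHIC);
        return copy;
    }
}
```

Running it on the RouteSensor example reproduces the sorted list shown above, independently of the tool's return order.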
      <sec id="sec-6-3">
        <title>B.3 Building the Projects</title>
        <p>The Train Benchmark case defines a framework and an application programming interface that enable the integration of additional tools. The reference implementation contains a benchmark suite for queries implemented in Java and EMF-INCQUERY. Both the framework and the reference implementations are written in Java 7.</p>
        <p>For building the projects, we used Apache Maven, one of the most widely used Java build systems. The build is configured so that the binaries are able to run without an Eclipse application. A significant proportion of modeling tools are integrated into the Eclipse plug-in environment. In order to support such systems, our projects also have a plug-in nature: this way, they can be integrated with Eclipse (and OSGi) plug-ins as well, and can be built without Maven.</p>
      </sec>
      <sec id="sec-6-4">
        <title>B.4 Running the Projects</title>
        <p>The scripts can be parametrized by a simple JSON configuration file which defines:
• the range of the instance models from minSize to maxSize,
• the list of queries (Section 2.4),
• the list of tools,
• the number of runs,
• the number of repair–recheck iterations,
• the change set strategies,
• the JVM arguments (e.g. maximum heap memory).</p>
        <p>The default configuration is stored in the config/config.json file. Please use this as a basis for your configuration.
{
  "MinSize": 1,
  "MaxSize": 2,
  "Queries": ["PosLength", "RouteSensor", "SwitchSensor", "SwitchSet", "SemaphoreNeighbor"],
  "Tools": [&lt;your tool&gt;],
  "ChangeSets": ["fixed", "proportional"],
  "Runs": 1,
  "IterationCount": 5,
  "JVM": {"vmargs": "-Xmx4G"}
}</p>
      </sec>
      <sec id="sec-6-5">
        <title>B.5 Interpreting the Output</title>
        <p>Measurements are automatically recorded by our benchmark framework and stored in TSV (tab-separated values) format. This can be used to automatically create diagrams with the provided R script and produce comparable plots. For publishing performance results, please stick to the format generated by the framework.</p>
        <p>Table 2 shows an example output. The ChangeSet attribute defines the change set size (fixed or proportional, see Figure 2). The Train Benchmark is executed 5 times; the index of the current run is stored in the RunIndex attribute. The Query is executed by the Tool on the model with the given Size. The validation errors are repaired in multiple iterations; the index of the current iteration is shown in the Iteration attribute. Multiple values (MetricValue) of different metrics (MetricName) are measured during the benchmark. The execution time (time) and memory consumption (memory) for the read, check, repair and recheck phases are collected. The name of the current phase is given by the PhaseName attribute. Additionally, the result set size (rss) is stored for the check phase and for the iterations of the recheck phase.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>C Evaluation Criteria</title>
      <p>The solutions are checked and evaluated for functional, usability and performance aspects.</p>
      <sec id="sec-7-1">
        <title>C.1 Correctness and Completeness of Model Queries and Transformations</title>
        <p>The goal of the correctness check is to determine whether the different model query and transformation tasks are correctly and fully implemented in the submitted solutions. We provide the number of invalid model elements detected by our reference implementation in several models for each query and iteration step. If the result sizes are consistently equal, the solution is considered to be correct.</p>
        <p>The expected results are available at https://github.com/FTSRG/trainbenchmark-ttc/tree/master/expected-results.</p>
        <p>Each task is scored independently with 0–3 points by the following rules:
• 0 points: The task is not solved.
• 1–2 points: The task is partially solved; the solution provides a subset or a superset of the expected results.
• 3 points: The task is completely and correctly solved.
• −1 point: Only the query is implemented, but the transformation is not.</p>
        <p>Correctness and completeness: 5 tasks × 3 points = 15 points</p>
      </sec>
      <sec id="sec-7-2">
        <title>C.2 Conciseness</title>
        <p>The validation rules are frequently changed and extended, therefore it is important to be able to define queries and transformations in a concise manner. These properties are scored based on the following rules:
• 0 points: The task is not solved.
• 1 point: The task is solved, but the solution is not significantly more concise than it would be in a general-purpose imperative language (e.g. Java), or the task is partially solved and the result set needs additional processing.
• 2 points: The task is solved, and the query and the transformation are defined in a declarative, visual or other query language, but the specification is hard to formulate.
• 3 points: The solution is compact, and the query and the transformation are defined in a concise manner.
• −1 point: Only the query or only the transformation is implemented.</p>
        <p>Conciseness: 5 tasks × 3 points = 15 points</p>
      </sec>
      <sec id="sec-7-3">
        <title>C.3 Readability</title>
        <p>The readability and descriptive power of each query and transformation are scored with respect to a model validation use case. The score represents how well model queries can be used as model constraints, and how well repair operations can be expressed by model transformations. The score is given based on the following rules:
• 0 points: The task is not solved.
• 1 point: The task is solved, but the solution is not significantly more readable than it would be in a general-purpose imperative language (e.g. Java), or the task is only partially solved. For example, a typical EMF validator should get 1 point.
• 2 points: The task is solved, and the query and the transformation follow the description of the constraint and the repair rule, but it is difficult to comprehend the meaning of the solution. For example, a foreign key constraint checked by a query formulated in SQL should get 2 points.
• 3 points: The solution could be presented in the documentation of the modeling domain, and it is easier to comprehend than a textual description in natural language. For example, a solution similar to the graphical notation used in this paper should get 3 points.
• −1 point: The language is only able to express either the constraint (e.g. OCL) or the repair operation.</p>
        <p>Readability: 5 tasks × 3 points = 15 points</p>
      </sec>
      <sec id="sec-7-4">
        <title>C.4 Performance on Large Models</title>
        <p>The goal of the performance measurements is to check the applicability of the submitted solutions to large industrial models. During the performance tests, the execution times are measured for different scenarios and increasing model sizes.</p>
        <p>Please restrict your benchmarks to those input models that can be processed within 5 minutes or less. Runs that take longer than 5 minutes will not be considered in the evaluation. Please provide a solution that can run on an x64-based Linux system with 4+ GB of memory, and that can be started from the command line. This is important for reproducing your results on a remote testing system. Please document the setup of your solution and its requirements regarding the system environment.</p>
        <p>We defined two validation scenarios, based on the phases defined in Section 2.3:
batch: The model is loaded (read) and validated (check).
repeated: The model is loaded (read) and validated (check), then the model is edited (repair) and revalidated (recheck) 10 times.</p>
        <p>The performance of the solutions is compared in 20 tournaments:
• The tournaments are calculated for the 5 tasks. If a solution skips a task, it is not considered in the tournament.
• Each solution is measured for both batch and repeated validation.
• Each solution is measured for both fixed and proportional change sets.</p>
        <p>A solution gets from 0 to 1 points for a tournament, which is launched for increasing model sizes. The score is based on the maximum size that the solution is able to handle and on its execution time relative to the fastest solution. Each measurement is executed 5 times and the median value is taken.
• The model size is increased as long as there is a solution that is able to solve it within the given time limit. This results in rounds k = 1, 2, 3, . . . , n for sizes 2^(k−1), i.e. 1, 2, 4, . . . , 2^(n−1).
• For each tournament, a solution earns a score between 0 and 1, determined by (∑_{k=1}^{n} score(k)) / (∑_{k=1}^{n} k), where score(k) = score_size(k) × score_time(k) if the solution runs correctly and within the given time limit, and score(k) = 0 if the solution fails to run correctly or exceeds the time limit. The denominator ∑_{k=1}^{n} k = n(n+1)/2 normalizes the result.
• For each round k from 1 to n, if a solution is able to complete the validation, it is rewarded k points:
– round 1 (size 1): the winner earns 1 point,
– round 2 (size 2): the winner earns 2 points,
– round 3 (size 4): the winner earns 3 points,
– . . .
– round n (size 2^(n−1)): the winner earns n points.
The formula is specified as: score_size(k) = k.
• The fastest solution in each round earns 1 point; the other solutions earn partial points, based on the ratio of the current solution's execution time to the fastest execution time. The base-2 logarithm of this ratio defines the score. For example:
– if a solution takes 2× as long, it earns 1/2 point,
– if a solution takes 4× as long, it earns 1/3 point,
– if a solution takes 8× as long, it earns 1/4 point,
– and so on.
The formula is specified as: score_time(k) = 1 / (1 + log2(t_k / t*_k)), where t_k is the solution's execution time in round k and t*_k is the fastest execution time in round k.</p>
        <p>In conclusion, a solution earns up to 20 points for performance: 5 tasks × 2 validation scenarios × 2 change set sizes × up to 1 point = 20 points.</p>
      </sec>
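The scoring formulas can be verified with a few lines of code. This sketch encodes a round the solution did not finish as a negative time — an encoding chosen here for illustration, not prescribed by the case description:

```java
public class TournamentScore {
    /** score_time(k) = 1 / (1 + log2(time / fastestTime)). */
    public static double scoreTime(double time, double fastest) {
        return 1.0 / (1.0 + Math.log(time / fastest) / Math.log(2));
    }

    /**
     * Normalized tournament score: sum of k * score_time(k) over the rounds the
     * solution completed within the time limit, divided by n(n+1)/2.
     * times[k-1] < 0 marks a failed round, contributing 0.
     */
    public static double tournament(double[] times, double[] fastest) {
        int n = times.length;
        double sum = 0;
        for (int k = 1; k <= n; k++)
            if (times[k - 1] >= 0)                   // completed within the limit
                sum += k * scoreTime(times[k - 1], fastest[k - 1]);
        return sum / (n * (n + 1) / 2.0);
    }
}
```

A solution that is fastest in rounds 1 and 2 but fails round 3 of a three-round tournament thus earns (1 + 2) / 6 = 0.5.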
      <sec id="sec-7-5">
        <title>C.5 Overall Evaluation</title>
        <p>The scores of each aspect of the submitted solution are summarized to derive the final score (max. 65 points) used for ranking the submitted solutions.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Benedek</given-names>
            <surname>Izsó</surname>
          </string-name>
          , Gábor Szárnyas &amp; István
          <string-name>
            <surname>Ráth</surname>
          </string-name>
          (
          <year>2014</year>
          )
          <article-title>: Train Benchmark</article-title>
          .
          <source>Technical Report</source>
          , Budapest University of Technology and Economics.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Benedek</given-names>
            <surname>Izsó</surname>
          </string-name>
          , Zoltán Szatmári, Gábor Bergmann, Ákos Horváth &amp; István
          <string-name>
            <surname>Ráth</surname>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>: Towards Precise Metrics for Predicting Graph Query Performance</article-title>
          .
          <source>In: ASE 2013</source>
          , IEEE, pp.
          <fpage>412</fpage>
          -
          <lpage>431</lpage>
          , doi:10.1109/ASE.2013.6693100.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Arend</given-names>
            <surname>Rensink</surname>
          </string-name>
          (
          <year>2004</year>
          ):
          <article-title>The GROOVE simulator: A tool for state space generation</article-title>
          .
          <source>In: Applications of Graph Transformations with Industrial Relevance</source>
          , Springer, pp.
          <fpage>479</fpage>
          -
          <lpage>485</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>