<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Stream Processing: The Matrix Revolutions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Romana Pernischova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florian Ruosch</string-name>
          <email>florian.ruosch@uzh.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Dell'Aglio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abraham Bernstein</string-name>
          <email>bernsteing@ifi.uzh.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Zurich</institution>
          ,
          <addr-line>Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <fpage>15</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>The growth of data velocity creates new requirements for solutions able to manage large amounts of dynamic data. The setting becomes even more challenging when such data is heterogeneous in schema or format, such as triples, tuples, relations, or matrices. Looking at the state of the art, traditional stream processing systems only accept data in one of these formats. Semantic technologies enable the processing of streams combining different shapes of data. This article presents a prototype that transforms SPARQL queries into Apache Flink topologies using the Apache Jena parser. With a custom data type and tailored functions, we integrate matrices into Jena and thereby allow graphs, relations, and linear algebra to be mixed in an RDF graph. This provides a proof of concept that queries written for static data can easily be run on streams with the streaming engine Flink, even if they contain several of the aforementioned types.</p>
      </abstract>
      <kwd-group>
        <kwd>query</kwd>
        <kwd>continuous queries</kwd>
        <kwd>streams</kwd>
        <kwd>RDF</kwd>
        <kwd>SPARQL</kwd>
        <kwd>Flink</kwd>
        <kwd>linear algebra</kwd>
        <kwd>relational algebra</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The processing of real-time information is getting more and more critical, as
the number of data stream sources is rapidly increasing. Often, reactivity is an
important requirement when working with this kind of data: the value of the
output decreases quickly over time. The state of the art to process unbounded data
reactively relies on stream processing engines which set their roots in database
and middleware research.</p>
      <p>The processing of this type of data is also relevant on the Web, where several
use cases can be found in the context of Internet of Things (and the related Web
of Things), as well as in social network and social media analytics. An interesting
challenge that emerges from the Web setting is the data heterogeneity, as shown
in the example below.</p>
      <p>A market research company is tasked with developing a system to analyze the behavior of users of an online TV platform. In particular, they want to investigate whether certain images in TV programs cause customers to change TV stations and whether this behavior is similar among people who know each other. This can result in customer-specific programs and tailored advertisements that would induce the user to change the TV station or to stay on. Such an analysis needs to combine data of different formats: the TV program (i.e. a stream of images and sounds), the user activities (i.e. a relational stream), and the program schedules and advertisement descriptions (i.e. a relational or graph database). When performing this kind of analysis, it is common practice to represent the TV program as a sequence of matrices, obtained by applying matrix-specific functions like the Fast Fourier Transform (FFT). The FFT computes the frequencies of the different images that appear in the video, and it enables an association which can be used in in-depth analyses that include the behavior and relationship data. The additional data has a different shape than the images: it is usually given through tables or graphs.</p>
      <p>To find the results, this data needs to be combined: the stream data has to be integrated to identify images which were last seen before switching stations. The data containing the time spent watching a specific TV program is also a stream, since it is unbounded.</p>
      <p>To the best of our knowledge, we currently lack scalable big data platforms able to manage streams of different types in a reactive and continuous way. In this paper, we make a first step in this direction by analyzing the problem of processing three different types of data streams: matrices, relations, and graphs. In other words, we want to investigate how to build a big data platform to process streams containing matrices, tables, and graphs.</p>
      <p>The combination of the different types of streams requires some common data model or strategy for handling the heterogeneity while processing a query. In addition, such a platform should allow users to issue complex queries and enable them to exploit different types of operators depending on the underlying data. A query language is therefore needed to capture the needs of the user, including operators to express complex functions and combinations of streams. This language depends on the chosen strategy for the integration of the different streams. Finally, the query has to be processed over the streams in a continuous fashion and should return a sequence of answers which are updated according to the input streams.</p>
      <p>Our main contribution is a model to process streams of data in different formats through relational and linear algebra operators. We exploit semantic web technologies to cope with the format heterogeneity, and we adopt distributed stream processing engine models as a basis to build an execution environment. We show the feasibility of our approach through an implementation based on Apache Jena and Flink.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Processing data in the context of the Web is challenging, since it often inherits the issues that characterize big data. It suffers from a variety of problems: data from multiple sources has different serializations, formats, and schemas. The Semantic Web has shown to be a proper solution to cope with these kinds of issues: it offers a broad set of technologies to model, exchange, and query data on the Web. RDF [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is a model to represent data. Conceptually, it organizes data in a graph-based structure, where the minimal information unit is the statement, a triple composed of a predicate (the edge), a subject, and an object (the vertices). Subjects and predicates are resources, i.e. URIs denoting entities or concepts; objects can be either URIs or literals, i.e. strings with an optional data type, such as integer or date.
      </p>
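As a minimal illustration in plain Java (a hypothetical sketch of the triple structure, not the Apache Jena API; the names `Statement`, `triple`, and `literal` are ours), a statement can be assembled from a subject, a predicate, and an object that is either a resource or a typed literal:

```java
// Hypothetical sketch: formatting an RDF statement in plain Java.
// A real application would use an RDF framework such as Apache Jena.
class Statement {
    // subject and predicate are URIs (here abbreviated as CURIEs);
    // the object is either a URI or a literal
    public static String triple(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    // a literal with an optional data type, e.g. xsd:integer
    public static String literal(String value, String datatype) {
        return "\"" + value + "\"^^" + datatype;
    }
}
```

For example, `Statement.triple(":m1", "rlg:rows", Statement.literal("3", "xsd:integer"))` produces the Turtle-like statement `:m1 rlg:rows "3"^^xsd:integer .`.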
      <p>
        SPARQL [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] is a protocol and RDF query language, used to manipulate and
retrieve linked data. It uses sets of triple patterns, called Basic Graph Patterns
(BGP), to match subgraphs. The language is similar to SQL and uses keywords
like SELECT and WHERE to address the underlying concepts. To create graphs
and run queries, the framework Apache Jena 1 can be used.
      </p>
      <p>When data is very dynamic and its processing needs to be reactive, solutions like RDF and SPARQL may not suffice. Recently, several research groups started to investigate how to adapt the Semantic Web stack to cope with velocity. In this context, it is worth mentioning the work of the W3C RDF Stream Processing (RSP) Community Group 2, which collected such efforts and led several initiatives to disseminate the results. Relevant results of this trend are RDF streams, i.e. (potentially unbounded) sequences of time-annotated RDF graphs, and continuous extensions of SPARQL, which enable users to define tasks, as well as queries to be evaluated over RDF streams. Windows are introduced to be able to treat the unbounded data, enabling calculations over the data inside the window. Without windowing there is no data completeness, and the triggering of executions is problematic.</p>
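The windowing idea can be sketched as a count-based tumbling window (an illustrative sketch; the class and its behavior are ours, not taken from any particular RSP engine): items are buffered until the window is full, at which point the whole window is released for evaluation and the buffer restarts.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of a count-based tumbling window: every `width`
// consecutive items form one window, which is released for evaluation
// as soon as it is full. Width 1 triggers one evaluation per item.
class TumblingWindow<T> {
    private final int width;
    private final List<T> buffer = new ArrayList<>();

    TumblingWindow(int width) {
        this.width = width;
    }

    // Returns the closed window, or null while the window is still filling up.
    public List<T> add(T item) {
        buffer.add(item);
        if (buffer.size() == width) {
            List<T> window = new ArrayList<>(buffer);
            buffer.clear();
            return window;
        }
        return null;
    }
}
```

With width 1, every incoming stream item immediately yields a window, which is the behavior the example query in Section 4.2 relies on.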
      <p>While the RDF Stream Processing trend introduced several notions to manage streams, only an initial effort has been dedicated to the creation of solutions to cope with the volume of data generated in the Web context. The state of the art in the processing of large amounts of streaming data relies on distributed stream processing engines (DSPE). These platforms emerged as successors of MapReduce frameworks and are developed to be deployed into clusters and to run the processing of streams of data in a distributed fashion. Users are required to design topologies, i.e. logical workflows of operations arranged in directed acyclic graphs, which are taken as input by the DSPE and are deployed according to the configuration settings and the hardware availability.</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        Several studies investigated different types of data and how to combine them. With regard to the three types of data we are considering, Figure 1 shows some of the query languages we considered as foundations of this study.
Graph stream processing There is no common definition of graph stream processing. In the survey presented by McGregor [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], the focus is on processing
very large graphs: since they cannot be kept in memory, they are streamed
      </p>
      <sec id="sec-3-1">
        <title>1 Cf. https://jena.apache.org/.</title>
      </sec>
      <sec id="sec-3-2">
        <title>2 Cf. https://www.w3.org/community/rsp/.</title>
        <p>
          into the system, and typical graph operations are run as on-line algorithms. A different approach is the one taken by the RSP community group, which models streams whose data items are composed of graphs. In this case, the processing consists of the execution of relational operators over portions of the stream (such as aggregations), event pattern matching, or deductive inference processing [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
None of the studies mentioned above investigated the integration of streams of
graphs with other types of streams.
        </p>
        <p>
          Dealing with linear and relational algebras. SQL and SPARQL are two examples of query languages to process tuples and graph-based data through relational algebra. However, these kinds of operators can hardly be used to perform linear algebra operations over matrices, such as transposition or calculating the determinant. SciDB [
          <xref ref-type="bibr" rid="ref21 ref22">22, 21</xref>
          ] is one example of a system that bridges these two worlds. This database stores arrays rather than tuples, and tasks are defined through an SQL-like language called AQL (Array Query Language). Moreover, Andrejev, He, and Risch [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] introduce their prototype that can be accessed with Matlab.
It provides storage of arrays in an RDF graph and retrieval of the data and its
meta-information using SciSPARQL. SciSPARQL is an extension of SPARQL
that incorporates array operations within the query. The authors focus on the
integration of the different formats rather than on stream processing. They make the processing of large amounts of static data easier.
        </p>
        <p>
          Another effort in such a direction is LaraDB [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], which proposes Lara that
combines relational and linear algebra. It uses a new representation, called
associative table, into which relations, scalars, and matrices are recast. They map
operators from relational and linear algebra onto their functions and in this way
are able to express combinations of those.
        </p>
        <p>
          Looking at query languages, LARA [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] relies on abstract data types and
local optimizations; however, there is no known system that would support such
a language. EMMA [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is a language for parallel data analysis: its goal is to hide
the notion of parallelism behind a declarative language, which is realized using
monad comprehensions, which are based on set-comprehension syntax. EMMA introduces bags as the algebraic data type and enables the use of different algebras by replacing the general union representation in a binary tree.
        </p>
        <p>
          While there is an ongoing trend in research to combine linear and relational
algebra, we are not aware of studies that focus on a streaming setting.
Stream Processing Engines Research on stream processing sets its foundation in
the database and the middleware communities [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The former proposed models
and methods to process streams according to the relational model, like CQL [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ],
while the latter took a different perspective, developing techniques to identify
relevant sequences of events in the input streams [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          The research in this field has been revitalized in recent years as an evolution of the MapReduce paradigm, which led to the development of distributed stream processing engines (DSPE). Apache Spark Streaming [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] sits on top of the initial Spark architecture, which implements batch processing. It focuses on stateless operations and stateful windows. In contrast, Apache Storm [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] is natively a stream processing engine and supports query operations such as joins and aggregations. It provides a low-level API which allows for the use of different programming languages. Apache Flink [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is optimized for cyclic or iterative processes. Unlike Spark, it adopts a native streaming approach and can handle data that does not fit into RAM. The Google Dataflow model [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and its implementation in Apache Beam 3 present a different approach: they aim to act as a facade, running a Dataflow-compliant topology in a DSPE, such as Apache Spark, Flink, or Google Cloud Dataflow.
        </p>
        <p>All of the above systems support windowing and typical relational algebra operators. Such platforms also offer support for linear algebra operations (through plug-ins and extensions). However, the topologies are specified through programmable APIs rather than a query language. Having such a tool would be useful to let users with limited programming skills express their tasks through a declarative language, without requiring them to code the topologies.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The Model</title>
      <p>In this section, we describe the model we envision for processing queries over heterogeneous streams. Figure 2 shows a logical representation of the model with a highlight on the three main challenges we identified. The first one (denoted by 1 in the picture) relates to data integration: given a set of streams containing graphs, relations, and matrices, how can they be integrated in a common data model? The second one captures the user's needs: what is a suitable query language to let the user express tasks combining relational and linear algebra operators? The third one puts the pieces together: how to execute the queries over the input data? In the following sections, we discuss the challenges and propose our solution.</p>
      <sec id="sec-4-1">
        <title>3 Cf. https://beam.apache.org/.</title>
        <p>[Figure 2: logical representation of the model, showing (1) the integration of RDF and tuple streams together with context/background data, (2) query modeling, and (3) query execution.]</p>
        <sec id="sec-4-1-7">
          <title>Tuple Stream</title>
          <p>
            The idea of integrating data by exploiting semantic web technologies is well-known and consolidated [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]. This also holds in the streaming context, where
recent studies investigated how to integrate streams of relational or graph-based
data through RDF streams [
            <xref ref-type="bibr" rid="ref13 ref17 ref5">5, 13, 17</xref>
            ].
          </p>
          <p>
            How to lift streams of matrices to RDF streams is still unexplored and requires some consideration. Given a matrix, there are ways to convert it into a graph-based structure and consequently into RDF; e.g., each cell of the matrix can be represented by a node, annotated with its position in the matrix, its value, and properties relating it to adjacent cells. However, the representation of the matrix data has a significant impact on the query language, which may require long and complex descriptions to declare the linear algebra operations. Therefore, an option is to keep the matrix data as is, and only transform it if and when the query execution requires it. In this regard, the authors of LARA [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] point out that the transformation of a matrix to a graph is possible, but the other way around requires an ordering function. This drawback becomes relevant if users want to execute matrix-specific functions on other data formats.
          </p>
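The cell-based encoding discussed above can be sketched in a few lines of Java (a hypothetical illustration: the node pattern `:cell_i_j` and the `:value`/`:right`/`:down` properties are invented for this sketch; the paper deliberately avoids this encoding and keeps the matrix as a single literal instead):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: encode a matrix as cell-level triples, one node per
// cell, annotated with its value and linked to the adjacent cells.
// Property names (:value, :right, :down) are invented for this sketch.
class MatrixToGraph {
    public static List<String> toTriples(double[][] m) {
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                String cell = ":cell_" + i + "_" + j;
                triples.add(cell + " :value \"" + m[i][j] + "\" .");
                if (j + 1 < m[i].length)   // link to the right neighbor
                    triples.add(cell + " :right :cell_" + i + "_" + (j + 1) + " .");
                if (i + 1 < m.length)      // link to the lower neighbor
                    triples.add(cell + " :down :cell_" + (i + 1) + "_" + j + " .");
            }
        }
        return triples;
    }
}
```

Even this small sketch makes the drawback visible: an n×n matrix already produces on the order of 3n² triples, which is what motivates keeping the matrix as a single literal instead.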
          <p>To append matrix data to an RDF stream, we defined some properties to annotate the matrix and a custom data type to serialize its content. This allows us to add matrices to streams as literal nodes, bringing advantages to the execution of matrix-specific functions. Listing 1.1 shows an example of an RDF stream encoding matrices. The snippet uses TriG as the serialization format, and
:streamItem1 {
  :m1 rlg:data "[ 3 4 8 ] [ 8 7 2 ] [ 1 8 2 ]"^^rlg:matrix ;
      rlg:columns 3 ;
      rlg:rows 3 .
}
:streamItem1 prov:generatedAt 15 .
:streamItem2 {
  :m2 rlg:data "[ 1 0 2 ] [ 9 6 2 ] [ 6 4 0 ]"^^rlg:matrix ;
      rlg:columns 3 ;
      rlg:rows 3 .
  :m1 rlg:evolvesTo :m2 .
}
:streamItem2 prov:generatedAt 17 .</p>
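An `rlg:matrix` literal such as the one above can be turned into a numeric structure with little code. The following is a simplified sketch (rows in square brackets, whitespace-separated values), not the prototype's actual parser:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch of parsing an rlg:matrix literal such as
// "[ 3 4 8 ] [ 8 7 2 ] [ 1 8 2 ]" into a double[][].
// The prototype's real parser may differ.
class MatrixLiteral {
    public static double[][] parse(String literal) {
        List<double[]> rows = new ArrayList<>();
        // each [...] group is one matrix row
        Matcher rowMatcher = Pattern.compile("\\[([^\\]]*)\\]").matcher(literal);
        while (rowMatcher.find()) {
            String[] parts = rowMatcher.group(1).trim().split("\\s+");
            double[] row = new double[parts.length];
            for (int i = 0; i < parts.length; i++) {
                row[i] = Double.parseDouble(parts[i]);
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }
}
```

In the prototype, the resulting array would then be handed to a matrix library such as JAMA; here the parsing step alone is shown.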
          <p>
            Listing 1.1. RDF example including a matrix node
the stream is encoded according to the model proposed in [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]. It contains two
stream items (represented as RDF graphs), generated at time instants 15 and 17.
Each stream item contains a matrix: data is a data type property having literals
of type matrix as the range; columns and rows are additional annotations. It
is worth noting that the snippet is compliant with the RDF model, making it
possible to process it with the usual semantic web related frameworks. Moreover,
the object representing the matrix can be annotated with additional properties
and can be linked with other resources.
          </p>
          <p>4.2 Query Modeling
The choice of the data model has a significant impact on the design of the
query language. As explained above, our data model is compliant with RDF,
and carries additional information to account for the streaming nature of the
data and the presence of matrices. It follows that SPARQL is the best starting
point to design the query language. SPARQL is the W3C recommended query
language for RDF with operators to manipulate RDF graphs based on relational
algebra, similar to how SQL works on relations.</p>
          <p>
            We need to accommodate matrix-specific functions. Having matrices as nodes makes retrieval easy, because we can refer to them by exploiting variables and accessing their data value. When looking at use cases, we are not interested in representing the same data in multiple formats for the sake of achieving high computation velocity, but in enabling the combination of data. With this in mind, we decided to add the matrix-specific operators to the query language as SPARQL functions. This solution does not lead to a custom version of SPARQL, since it is the recommended practice for this type of extension [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. An example query is shown in Listing 1.2, where the contents of matrix resources are retrieved (Lines 7-10), their inverses computed (Lines 11-12), added (Line 13), and emitted (Line 3).
          </p>
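Conceptually, such SPARQL functions amount to a registry of named matrix operations. The sketch below illustrates this idea in plain Java (the registry class is ours, and only `add` is implemented; actual Jena extension functions are registered through Jena's function machinery, which is not shown here):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Illustrative registry of binary matrix functions keyed by name, standing
// in for SPARQL custom functions such as afn:add in Listing 1.2. Only add
// is implemented in this sketch.
class MatrixFunctions {
    static final Map<String, BinaryOperator<double[][]>> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("add", MatrixFunctions::add);
    }

    // element-wise matrix addition; assumes equal dimensions
    static double[][] add(double[][] a, double[][] b) {
        double[][] out = new double[a.length][a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                out[i][j] = a[i][j] + b[i][j];
        return out;
    }

    public static double[][] apply(String name, double[][] a, double[][] b) {
        return REGISTRY.get(name).apply(a, b);
    }
}
```

During query evaluation, a BIND such as `afn:add(?inverse1, ?inverse2)` would conceptually resolve the function by name and apply it to the parsed matrix values.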
          <p>
            Additionally, our query language needs a way to manage streams. Several
studies proposed extensions to SPARQL [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], with recent ongoing efforts to unify
1  REGISTER STREAM :outStr AS
2  CONSTRUCT RSTREAM {
3    ?m1 :hasInverse [ rlg:data ?addInverse ] .
4  }
5  FROM NAMED WINDOW :win ON :inStr [RANGE 1 STEP 1]
6  WHERE {
7    ?m1 rdf:type rlg:Matrix .
8    ?m2 rdf:type rlg:Matrix .
9    ?m1 rlg:data ?data1 .
10   ?m2 rlg:data ?data2 .
11   BIND (afn:inverse(?data1) AS ?inverse1) .
12   BIND (afn:inverse(?data2) AS ?inverse2) .
13   BIND (afn:add(?inverse1, ?inverse2) AS ?addInverse) .
14 }
</p>
          <p>
            Listing 1.2. Query that computes the inverse matrices (prefixes are omitted for brevity).
them in a common and shared language. The introduction of windows and streams cannot be managed while entirely preserving the original semantics of SPARQL. In particular, the continuous evaluation requires an extension of the original SPARQL semantics: the notion of evaluation time instant needs to be included in the operational semantics to describe when and on which portion of the stream the query should be evaluated [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. In the example in Listing 1.2, we are adopting the syntax proposed by the W3C RSP community group. An output stream :outStr is declared (Line 1) and its items are defined as graphs containing the matrices and their inverses (Lines 2-4). The window on Line 5 is declared over a stream :inStr as a tumbling window of one stream item, i.e. the query processes one stream item at a time.
          </p>
          <p>4.3 Query Execution
The last step of our model consists in creating a DSPE topology that puts
together the data and the query described above. Given a (continuous) SPARQL
query, a way to generate a topology is shown in Figure 3. First, a parser creates
a logical query plan from the string of the SPARQL query. As usual, the logical
plan can be modi ed and optimized. Being a SPARQL query, the leaves of the
tree correspond to the Basic Graph Patterns, which are de ned in the WHERE
clause. Those operators generate solution mappings, which are further processed
by the other operators.</p>
          <p>To generate the topology, we exploit the logical plan, as highlighted in
Figure 3. In the topology, the BGP operators are on the left, which are fed with
portions of the stream selected by the windows. Such BGP operators process
the data and push the outputs to the correct operators, which continue the
processing, sending the data towards the sinks. A converter traverses the logical
query plan and creates a task in the topology for each operator. In this way, it
is easier to track what happens during the execution of the query. Moreover, the
decision to optimize the logical query plan allows us to exploit well-known
techniques from database research. The main drawback is the fact that our converter
may not nd the best possible topology (regarding time performance). The
converter always creates tree-shaped topologies, and it cannot generate other types
of DAG.</p>
          <p>[Figure 3: the SPARQL query of Listing 1.2 is parsed into an operator tree, which the converter turns into a topology connecting the stream sources to the sink.]</p>
          <p>
To verify the feasibility of our model, we built a proof of concept. We started
from some existing frameworks: as a DSPE, we opted for Apache Flink [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ]; we
used Apache Jena 4 to manage the SPARQL query; and we used JAMA [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] as
a library providing matrix-related functions. In the following, we highlight some
parts of our experience.
          </p>
          <p>
            Whenever a literal is specified as a matrix, the string is parsed into a matrix data structure. Functions that are specific to matrices can be executed, and the result can then be returned to the query. We implemented such functions according to the SPARQL specification; they are listed in Table 1. We exploited the JAMA library from MathWorks and NIST [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] for the matrix data structures as well as for the functions manipulating the matrices. In our current implementation, the query language does not support the SPARQL extensions for streams; this is scheduled for future work. At the moment, such information is provided as a set of parameters. It is worth noting that this is not a limitation, since several prototypes have already shown the feasibility of these features [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ].
          </p>
          <p>5.2 Topology Creation and Execution
We decided to use Apache Flink as the basis for the execution environment, since it offers a flexible and well-documented API. However, our approach can be ported to other DSPEs, since the notion of topology is shared among them.</p>
          <p>When defining a Flink topology, it is necessary to declare the type of data that tasks exchange. Flink offers a set of native data types, among which Tuple is the most prominent. It is a list of values, indexed by their position number. We use Tuple for most of the data exchanges between nodes.</p>
          <p>Given a query (partially defined through SPARQL, partially defined through extra parameters), the conversion process derives a topology. For each SPARQL operator, the process creates a task. At the moment, the projection, FILTER,
BIND, LIMIT, and BGP operators are supported. Furthermore, our prototype
supports several window operators (since they are natively supported by Flink),
and the matrix-related operations in Table 1. Besides, the process extracts the
variable names, which are stored in a dedicated data structure. Tasks use this
structure to manage the solution mappings as Tuple objects, inferring the
position of the variable content during the query execution.</p>
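The variable-position structure can be sketched as a list of variable names whose indices give each variable's slot in a solution mapping (a hypothetical illustration: the class name is ours, and a plain `Object[]` stands in for Flink's fixed-arity `Tuple1`…`Tuple25` classes):

```java
import java.util.List;

// Sketch of the variable-position lookup: the variable names are stored
// once per query, and each solution mapping travels as a plain array
// (standing in for a Flink Tuple), indexed by the variable's position.
class SolutionMapping {
    private final List<String> variables; // e.g. ["m1", "data1", "inverse1"]
    private final Object[] values;

    SolutionMapping(List<String> variables, Object[] values) {
        this.variables = variables;
        this.values = values;
    }

    public Object get(String variable) {
        return values[variables.indexOf(variable)];
    }
}
```

Storing the names once and shipping only positional values keeps the per-item payload small, which matters when every stream item produces new solution mappings.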
          <p>Streams among tasks exchange Tuple objects; the only exceptions are the tasks implementing BGP operators. The input of a BGP operator is a finite sequence of stream items, expressed as a set of RDF graphs. They are merged
into a new RDF graph, which represents the window content, and the BGP
is evaluated over it. The resulting solution mappings are converted into Tuple
objects and are sent to the other tasks of the topology.</p>
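A toy version of that step (ours, heavily simplified): merge the window's graphs into one set of triples and match a single triple pattern against it, with `?`-prefixed terms acting as variables. Real BGP evaluation joins several patterns and produces proper solution mappings; this sketch handles one pattern only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy BGP step: merge the window's graphs (lists of s/p/o triples) into
// one graph and match a single triple pattern against it. Terms starting
// with "?" are variables and match any term.
class BgpMatcher {
    public static List<String[]> match(List<List<String[]>> windowGraphs,
                                       String s, String p, String o) {
        List<String[]> merged = new ArrayList<>();
        windowGraphs.forEach(merged::addAll); // window content as one graph
        List<String[]> solutions = new ArrayList<>();
        for (String[] t : merged) {
            if (matches(s, t[0]) && matches(p, t[1]) && matches(o, t[2])) {
                solutions.add(t);
            }
        }
        return solutions;
    }

    private static boolean matches(String pattern, String term) {
        return pattern.startsWith("?") || pattern.equals(term);
    }
}
```

In the prototype, each matching triple would be turned into a Tuple of variable bindings before being forwarded to the downstream tasks.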
          <p>The conversion process returns a snippet of code with the topology
description. This code can be fed into Flink, which instantiates the topology and
executes it over the input streams. The code of the project can be found at
https://gitlab.ifi.uzh.ch/DDIS-Public/ralagra/.</p>
          <p>5.3 Limitations
While our prototype shows the feasibility of our model, it has several limitations. The current implementation does not include the system integration component, i.e. the system expects as input one RDF stream compliant with the data model described in Section 4.1. Our system is not able to receive multiple streams and therefore cannot combine them on the fly. This is subject to future work. As explained above, several studies show the feasibility of this component, and we are going to implement it in the next version.</p>
          <p>Moreover, we aim at automating the submission of the topology to Flink. When the conversion process creates the topology from the input query, the code snippet should automatically be injected into Flink. Techniques like Java reflection 5 or template engines may help in tackling this problem.</p>
          <p>We are also working to extend our system to other SPARQL operators. At
the moment, it supports the most common SPARQL features, but it is important
to extend the coverage to a wider set of operators.</p>
          <p>Finally, serializing matrices as plain strings is not the best solution in terms of space and time to process them. In future work, we plan to explore other serialization formats for matrices (and the RDF streams carrying them), such as Protocol Buffers and Apache Thrift.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions &amp; Future Work</title>
      <p>In this study, we proposed a model to handle data streams carrying different
types of data: relations, graphs, and matrices. We defined a data model by
exploiting RDF, where streams are modeled as sequences of time-annotated RDF
graphs and matrices are represented as literals. We also described a query
language to manage such streams and to perform relational and linear algebra
operations over their items. We developed a proof of concept implementing the
most distinctive parts of the model.</p>
      <sec id="sec-5-1">
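<p>To make the summarized data model concrete, a single stream item could be serialized as a time-annotated named graph whose triple carries a matrix literal. The TriG sketch below is purely illustrative: the ex: prefix, the ex:matrix datatype, and the use of prov:generatedAtTime for the timestamp are assumed names, not the prototype's actual vocabulary.</p>

```trig
@prefix ex:   <http://example.org/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# One stream item: an RDF graph holding a matrix literal.
ex:g1 {
  ex:sensor1 ex:readings "1.0,2.0;3.0,4.0"^^ex:matrix .
}

# The graph is annotated with its timestamp in the default graph.
ex:g1 prov:generatedAtTime "2018-04-03T10:00:00Z"^^xsd:dateTime .
```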
        <p>Over the course of the next months, we will work on consolidating the
prototype and adding the missing parts, in order to obtain a full RDF stream processing engine.
We also aim to perform an extensive evaluation of the system. We are
interested in studying its performance, measuring the overhead introduced by our extensions,
and comparing our system with the other prototypes developed so far. It will also
be important to study in more depth to what extent our query language can
support the modeling of users' needs and tasks.</p>
        <p>
          Finally, our prototype sets the basis for studying the problem of
distribution. So far, only a few studies have targeted distributed RDF stream
processing engines, such as Strider [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The main difference in our setting is the
presence of matrices and operators over them, which requires different
distribution strategies.
        </p>
        <p><bold>Acknowledgments</bold> We thank the Swiss National Science Foundation (SNF)
for partial support of this work under contract number #407550 167177.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Akidau</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bradshaw</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chambers</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chernyak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernandez-Moctezuma</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lax</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McVeety</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mills</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perry</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , et al.:
          <article-title>The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          <volume>8</volume>
          (
          <issue>12</issue>
          ),
          <fpage>1792</fpage>
          –
          <lpage>1803</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alexandrov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kunft</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katsifodimos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Schuler,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Thamsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Kao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            ,
            <surname>Herb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Markl</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          :
          <article-title>Implicit parallelism through deep language embedding</article-title>
          .
          <source>In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data</source>
          . pp.
          <fpage>47</fpage>
          –
          <lpage>61</lpage>
          . ACM (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Andrejev</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Risch</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Scientific data as RDF with arrays: Tight integration of SciSPARQL queries into MATLAB</article-title>
          . In: International Semantic Web Conference (Posters &amp; Demos).
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1272</volume>
          , pp.
          <fpage>221</fpage>
          –
          <lpage>224</lpage>
          . CEUR-WS.org (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Arasu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Babu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widom</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>CQL: A Language for Continuous Queries over Streams and Relations</article-title>
          .
          <source>In: DBPL. Lecture Notes in Computer Science</source>
          , vol.
          <volume>2921</volume>
          , pp.
          <fpage>1</fpage>
          –
          <lpage>19</lpage>
          . Springer (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Calbimonte</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jeung</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aberer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Enabling Query Technologies for the Semantic Sensor Web</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst</source>
          .
          <volume>8</volume>
          (
          <issue>1</issue>
          ),
          <fpage>43</fpage>
          –
          <lpage>63</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Carbone</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katsifodimos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ewen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markl</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haridi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tzoumas</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Apache Flink: Stream and batch processing in a single engine</article-title>
          .
          <source>Bulletin of the IEEE Computer Society Technical Committee on Data Engineering</source>
          <volume>36</volume>
          (
          <issue>4</issue>
          ) (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Cugola</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Margara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Processing flows of information: From data stream to complex event processing</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>44</volume>
          (
          <issue>3</issue>
          ),
          <fpage>15:1</fpage>
          –
          <lpage>15:62</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wood</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanthaler</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>RDF 1.1 Concepts and Abstract Syntax</article-title>
          .
          <source>W3C Recommendation</source>
          ,
          <source>W3C</source>
          (
          <year>2014</year>
          ), https://www.w3.org/TR/rdf11-concepts/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Stream reasoning: A survey and outlook</article-title>
          .
          <source>Data Science</source>
          <volume>1</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>59</fpage>
          –
          <lpage>83</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calbimonte</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>RSP-QL Semantics: A Unifying Query Model to Explain Heterogeneity of RDF Stream Processing Systems</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst</source>
          .
          <volume>10</volume>
          (
          <issue>4</issue>
          ),
          <fpage>17</fpage>
          –
          <lpage>44</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seaborne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SPARQL 1.1 Query Language</article-title>
          .
          <source>W3C Recommendation</source>
          ,
          <source>W3C</source>
          (
          <year>2013</year>
          ), https://www.w3.org/TR/sparql11-query/
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hutchison</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Howe</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suciu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>LaraDB: A minimalist kernel for linear and relational algebra computation</article-title>
          .
          <source>In: BeyondMR@SIGMOD</source>
          , pp.
          <fpage>2:1</fpage>
          –
          <lpage>2:10</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kharlamov</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovland</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiménez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lanti</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lie</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinkel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rezk</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skjæveland</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thorstensen</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheleznyakov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Ontology Based Access to Exploration Data at Statoil</article-title>
          .
          <source>In: International Semantic Web Conference (2). Lecture Notes in Computer Science</source>
          , vol.
          <volume>9367</volume>
          , pp.
          <fpage>93</fpage>
          –
          <lpage>112</lpage>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kunft</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alexandrov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katsifodimos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Markl</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Bridging the gap: towards optimization across linear and relational algebra</article-title>
          .
          <source>In: Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond</source>
          . p.
          <fpage>1</fpage>
          . ACM (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Luckham</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems</article-title>
          .
          <source>In: RuleML. Lecture Notes in Computer Science</source>
          , vol.
          <volume>5321</volume>
          , p.
          <fpage>3</fpage>
          . Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>MathWorks</surname>
          </string-name>
          , T., NIST: JAMA:
          <article-title>Java matrix package</article-title>
          . http://math.nist.gov/javanumerics/jama/ (
          <year>2012</year>
          ), accessed: 2017-12-04
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mauri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calbimonte</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balduini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brambilla</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Della Valle</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aberer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>TripleWave: Spreading RDF Streams on the Web</article-title>
          .
          <source>In: International Semantic Web Conference (2). Lecture Notes in Computer Science</source>
          , vol.
          <volume>9982</volume>
          , pp.
          <fpage>140</fpage>
          –
          <lpage>149</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>McGregor</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Graph stream algorithms: a survey</article-title>
          .
          <source>ACM SIGMOD Record</source>
          <volume>43</volume>
          (
          <issue>1</issue>
          ),
          <fpage>9</fpage>
          –
          <lpage>20</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Noy</surname>
            ,
            <given-names>N.F.</given-names>
          </string-name>
          :
          <article-title>Semantic Integration: A Survey Of Ontology-Based Approaches</article-title>
          .
          <source>SIGMOD Record</source>
          <volume>33</volume>
          (
          <issue>4</issue>
          ),
          <fpage>65</fpage>
          –
          <lpage>70</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cur</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <string-name>
            <surname>Strider</surname>
            :
            <given-names>A Hybrid</given-names>
          </string-name>
          <string-name>
            <surname>Adaptive Distributed RDF Stream Processing</surname>
          </string-name>
          <article-title>Engine</article-title>
          .
          <source>In: International Semantic Web Conference (1). Lecture Notes in Computer Science</source>
          , vol.
          <volume>10587</volume>
          , pp.
          <fpage>559</fpage>
          –
          <lpage>576</lpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soroush</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velikhov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Balazinska</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeWitt</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madden</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>Overview of SciDB</article-title>
          .
          <source>In: 2010 International Conference on Management of Data, SIGMOD '10</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Stonebraker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Becla</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>SciDB: A database management system for applications with complex analytics</article-title>
          .
          <source>Computing in Science &amp; Engineering</source>
          <volume>15</volume>
          (
          <issue>3</issue>
          ),
          <fpage>54</fpage>
          –
          <lpage>62</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Toshniwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taneja</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shukla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramasamy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kulkarni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jackson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gade</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donham</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>Storm@twitter</article-title>
          .
          <source>In: Proceedings of the 2014 ACM SIGMOD international conference on Management of data</source>
          . pp.
          <fpage>147</fpage>
          –
          <lpage>156</lpage>
          . ACM (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Zaharia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xin</surname>
            ,
            <given-names>R.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wendell</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Armbrust</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dave</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venkataraman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franklin</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          , et al.:
          <article-title>Apache Spark: A unified engine for big data processing</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>59</volume>
          (
          <issue>11</issue>
          ),
          <fpage>56</fpage>
          –
          <lpage>65</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>