Stream Processing: The Matrix Revolutions

Romana Pernischova, Florian Ruosch, Daniele Dell'Aglio, and Abraham Bernstein
University of Zurich, Zurich, Switzerland
{pernischova,dellaglio,bernstein}@ifi.uzh.ch, florian.ruosch@uzh.ch

Abstract. The growth of data velocity sets new requirements for solutions that manage large amounts of dynamic data. The setting becomes even more challenging when such data is heterogeneous in schema or format, e.g. triples, tuples, relations, or matrices. Looking at the state of the art, traditional stream processing systems only accept data in one of these formats. Semantic technologies enable the processing of streams that combine different shapes of data. This article presents a prototype that transforms SPARQL queries into Apache Flink topologies using the Apache Jena parser. With a custom data type and tailored functions, we integrate matrices into Jena and thereby allow graph, relational, and linear algebra to be mixed in an RDF graph. This provides a proof of concept that queries written for static data can easily be run on streams with the streaming engine Flink, even if they contain multiple of the aforementioned types.

Keywords: query · continuous queries · streams · RDF · SPARQL · Flink · linear algebra · relational algebra

1 Introduction

The processing of real-time information is becoming more and more critical, as the number of data stream sources is rapidly increasing. Often, reactivity is an important requirement when working with this kind of data: the value of the output decreases quickly over time. The state of the art in processing unbounded data reactively relies on stream processing engines, which have their roots in database and middleware research. The processing of this type of data is also relevant on the Web, where several use cases can be found in the context of the Internet of Things (and the related Web of Things), as well as in social network and social media analytics.
An interesting challenge that emerges from the Web setting is data heterogeneity, as shown in the example below. A market research company is tasked with developing a system to analyze the behavior of users of an online TV platform. In particular, they want to investigate whether certain images in TV programs cause customers to change TV stations and whether this behavior is similar among people who know each other. This can result in customer-specific programs and tailored advertisements that would induce the user to change the TV station or to stay on. Such an analysis needs to combine data of different formats: the TV program (i.e. a stream of images and sounds), the user activities (i.e. a relational stream), and the program schedules and advertisement descriptions (i.e. a relational or graph database).

When performing this kind of analysis, it is common practice to represent the TV program as a sequence of matrices, obtained by applying matrix-specific functions like the Fast Fourier Transform (FFT). The FFT computes the frequencies of the different images that appear in the video, and it enables an association that can be used in in-depth analyses involving the behavior and relationship data. The additional data has a different shape than the images: it is usually given as tables or graphs. To find the results, this data needs to be combined: the stream data has to be integrated to identify images that were last seen before a switch of stations. The data capturing the time spent watching a specific TV program is also a stream, since it is unbounded.

To the best of our knowledge, we currently lack scalable big data platforms able to manage streams of different types in a reactive and continuous way. In this paper, we make a first step in this direction by analyzing the problem of processing three different types of data streams: matrices, relations, and graphs.
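As a toy illustration of the frequency analysis mentioned above, the following sketch computes a naive discrete Fourier transform over a 1-D sample sequence (pure Python, for exposition only; a real pipeline would apply an optimized 2-D FFT library to each video frame):

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform: X[k] = sum_n x[n] * e^(-2*pi*i*k*n/N)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

# A signal that alternates every sample concentrates its energy
# in the highest resolvable frequency bin (k = N/2).
spectrum = dft([1, -1, 1, -1])
magnitudes = [round(abs(x), 6) for x in spectrum]
print(magnitudes)  # [0.0, 0.0, 4.0, 0.0]
```

Applied frame by frame, such magnitudes yield exactly the kind of matrix-shaped stream that then has to be joined with relational and graph data.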
In other words, we want to investigate how to build a big data platform to process streams containing matrices, tables, and graphs. The combination of the different types of streams requires a common data model or a strategy for handling the heterogeneity while processing a query. In addition, such a platform should allow users to issue complex queries and enable them to exploit different types of operators depending on the underlying data. A query language is therefore needed to capture the needs of the user, including operators to express complex functions and combinations of streams. This language depends on the chosen strategy for the integration of the different streams. Finally, the query has to be processed over the streams in a continuous fashion and should return a sequence of answers that are updated according to the input streams.

Our main contribution is a model to process streams of data in different formats through relational and linear algebra operators. We exploit semantic web technologies to cope with the format heterogeneity, and we adopt distributed stream processing engine models as a basis to build an execution environment. We show the feasibility of our approach through an implementation based on Apache Jena and Flink.

2 Background

Processing data in the context of the Web is challenging, since it often inherits the issues that characterize big data. It suffers from a variety of problems: data from multiple sources has different serializations, formats, and schemas. The Semantic Web has been shown to be a proper solution to cope with these kinds of issues: it offers a broad set of technologies to model, exchange, and query data on the Web. RDF [8] is a model to represent data. Conceptually, it organizes data in a graph-based structure, where the minimal information unit is the statement, a triple composed of a predicate (the edge), a subject, and an object (the vertices).
Subjects and predicates are resources, i.e. URIs denoting entities or concepts; objects can be either URIs or literals, i.e. strings with an optional data type, such as integer or date. SPARQL [11] is a protocol and RDF query language, used to manipulate and retrieve linked data. It uses sets of triple patterns, called Basic Graph Patterns (BGP), to match subgraphs. The language is similar to SQL and uses keywords like SELECT and WHERE to address the underlying concepts. To create graphs and run queries, the framework Apache Jena (https://jena.apache.org/) can be used.

When data is very dynamic and its processing needs to be reactive, solutions like RDF and SPARQL may not suffice. Recently, several research groups started to investigate how to adapt the Semantic Web stack to cope with velocity. In this context, it is worth mentioning the work of the W3C RDF Stream Processing (RSP) Community Group (https://www.w3.org/community/rsp/), which collected such efforts and led several initiatives to disseminate the results. Relevant results of this trend are RDF streams, i.e. (potentially unbounded) sequences of time-annotated RDF graphs, and continuous extensions of SPARQL, which enable users to define tasks, as well as queries to be evaluated over RDF streams. Windows are introduced to treat the unbounded data, enabling calculations over the data inside the window. Without windowing, data completeness cannot be assumed, and triggering query executions becomes problematic.

While the RDF Stream Processing trend introduced several notions to manage streams, only an initial effort has been dedicated to the creation of solutions to cope with the volume of data generated in the Web context. The state of the art in the processing of large amounts of streaming data relies on distributed stream processing engines (DSPE). These platforms emerged as successors of MapReduce frameworks and are developed to be deployed into clusters and to run the processing of streams of data in a distributed fashion.
Users are required to design topologies, i.e. logical workflows of operations arranged in directed acyclic graphs, which are taken as input by the DSPE and are deployed according to the configuration settings and the hardware availability.

3 Related Work

Several studies have investigated different types of data and how to combine them. With regard to the three types of data we are considering, Figure 1 shows some of the query languages we considered as foundations of this study.

Graph stream processing. There is no common definition of graph stream processing. In the survey presented by McGregor [18], the focus is on processing very large graphs: since they cannot be kept in memory, they are streamed into the system, and typical graph operations are run as online algorithms. A different approach is the one taken by the RSP Community Group, which models streams whose data items are composed of graphs. In this case, the processing consists of the execution of relational operators over portions of the stream (such as aggregations), event pattern matching, or deductive inference processing [9]. None of the studies mentioned above investigated the integration of streams of graphs with other types of streams.

Dealing with linear and relational algebras. SQL and SPARQL are two examples of query languages to process tuples and graph-based data through relational algebra. However, these kinds of operators can hardly be used to perform linear algebra operations over matrices, such as transposition or calculating the determinant. SciDB [22, 21] is one example of a system that bridges these two worlds. This database stores arrays rather than tuples, and tasks are defined through an SQL-like language called AQL (Array Query Language). Moreover, Andrejev, He, and Risch [3] introduce a prototype that can be accessed with MATLAB.
It provides storage of arrays in an RDF graph and retrieval of the data and its meta-information using SciSPARQL, an extension of SPARQL that incorporates array operations within the query. The authors focus on the integration of the different formats rather than on stream processing; they make the processing of large amounts of static data easier. Another effort in this direction is LaraDB [12], which proposes Lara, a language combining relational and linear algebra. It uses a new representation, called an associative table, into which relations, scalars, and matrices are recast. The authors map operators from relational and linear algebra onto their functions and are thereby able to express combinations of those. Looking at query languages, LARA [14] relies on abstract data types and local optimizations; however, there is no known system that supports such a language. EMMA [2] is a language for parallel data analysis: its goal is to hide the notion of parallelism behind a declarative language, realized using monad comprehensions based on set comprehension syntax. EMMA introduces bags as the algebraic data type and enables the use of different algebras by replacing the general union representation in a binary tree. While there is an ongoing trend in research to combine linear and relational algebra, we are not aware of studies that focus on a streaming setting.

Stream Processing Engines. Research on stream processing has its foundations in the database and middleware communities [7]. The former proposed models and methods to process streams according to the relational model, like CQL [4], while the latter took a different perspective, developing techniques to identify relevant sequences of events in the input streams [15]. Research in this field has been revitalized in recent years as an evolution of the MapReduce paradigm, which led to the development of distributed stream processing engines (DSPE).
Apache Spark Streaming [24] sits on top of the initial Spark architecture, which implements batch processing. It focuses on stateless operations and stateful windows. In contrast, Apache Storm [23] is natively a stream processing engine and supports query operations such as joins and aggregations. It provides a low-level API which allows for the use of different programming languages. Apache Flink [6] is optimized for cyclic or iterative processes. Unlike Spark, it adopts a native streaming approach and can handle data that does not fit into RAM. The Google Dataflow model [1] and its implementation in Apache Beam (https://beam.apache.org/) present a different approach: they aim to act as a façade, running a Dataflow-compliant topology in a DSPE such as Apache Spark, Flink, or Google Cloud Dataflow.

Fig. 1. Existing languages combining different algebras.

All of the above systems support windowing and the typical relational algebra operators. Such platforms also offer support for linear algebra operations (through plug-ins and extensions). However, the topologies are specified through programmable APIs rather than a query language. Having such a tool would be useful to let users with limited programming skills express their tasks through a declarative language, without requiring them to code the topologies.

4 The Model

In this section, we describe the model we envision for processing queries over heterogeneous streams. Figure 2 shows a logical representation of the model with a highlight on the three main challenges we identified. The first one (denoted by 1 in the picture) relates to data integration: given a set of streams containing graphs, relations, and matrices, how can they be integrated into a common data model? The second one captures the user's needs: what is a suitable query language to let the user express tasks combining relational and linear algebra operators?
The third one puts the pieces together: how can the queries be executed over the input data? In the following sections, we discuss the challenges and propose our solution.

Fig. 2. Model of the proposed system for processing multiple and heterogeneous data sources.

4.1 Stream Integration

The idea of integrating data by exploiting semantic web technologies is well-known and consolidated [19]. This also holds in the streaming context, where recent studies investigated how to integrate streams of relational or graph-based data through RDF streams [5, 13, 17].

How to lift streams of matrices to RDF streams is still unexplored and requires some consideration. Given a matrix, there are ways to convert it into a graph-based structure and consequently into RDF, e.g. each cell of the matrix can be represented by a node, annotated with its position in the matrix, its value, and properties relating it to adjacent cells. However, the representation of the matrix data has a significant impact on the query language, which may require long and complex descriptions to declare the linear algebra operations. Therefore, an option is to keep the matrix data as is, and only transform it if and when the query execution requires it. In this regard, the authors of LARA [14] point out that the transformation of a matrix to a graph is possible, but the other way around requires an ordering function. This drawback becomes relevant if users want to execute matrix-specific functions on other data formats.

To append matrix data to an RDF stream, we defined some properties to annotate the matrix and a custom data type to serialize its content. This allows us to add matrices to streams as literal nodes, bringing advantages to the execution of matrix-specific functions. Listing 1.1 shows an example of an RDF stream encoding matrices.
:streamItem1 {
    :m1 rlg:data "[3 4 8][8 7 2][1 8 2]"^^rlg:matrix ;
        rlg:columns 3 ;
        rlg:rows 3 .
}
{ :streamItem1 prov:generatedAt 15 . }
:streamItem2 {
    :m2 rlg:data "[1 0 2][9 6 2][6 4 0]"^^rlg:matrix ;
        rlg:columns 3 ;
        rlg:rows 3 .
    :m1 rlg:evolvesTo :m2 .
}
{ :streamItem2 prov:generatedAt 17 . }

Listing 1.1. RDF example including a matrix node.

The snippet uses TriG as the serialization format, and the stream is encoded according to the model proposed in [17]. It contains two stream items (represented as RDF graphs), generated at time instants 15 and 17. Each stream item contains a matrix: data is a data type property having literals of type matrix as its range; columns and rows are additional annotations. It is worth noting that the snippet is compliant with the RDF model, making it possible to process it with the usual Semantic Web frameworks. Moreover, the object representing the matrix can be annotated with additional properties and can be linked with other resources.

4.2 Query Modeling

The choice of the data model has a significant impact on the design of the query language. As explained above, our data model is compliant with RDF and carries additional information to account for the streaming nature of the data and the presence of matrices. It follows that SPARQL is the best starting point for designing the query language.
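Before turning to the query language, the custom serialization above can be made concrete: the following Python sketch (for illustration only; the actual datatype handler in our prototype is implemented in Java on top of Jena) parses the lexical form of an rlg:matrix literal into a row-major structure and validates it against the rlg:rows and rlg:columns annotations.

```python
import re

def parse_matrix_literal(lexical, rows=None, columns=None):
    """Parse the '[a b c][d e f]...' lexical form of an rlg:matrix literal
    into a list of rows; optionally validate it against the rlg:rows and
    rlg:columns annotations carried alongside the literal."""
    row_strings = re.findall(r"\[([^\]]*)\]", lexical)
    matrix = [[float(v) for v in row.split()] for row in row_strings]
    # All rows must exist and have the same length.
    if not matrix or len({len(r) for r in matrix}) != 1:
        raise ValueError("malformed matrix literal: " + lexical)
    if rows is not None and len(matrix) != rows:
        raise ValueError("rlg:rows annotation does not match data")
    if columns is not None and len(matrix[0]) != columns:
        raise ValueError("rlg:columns annotation does not match data")
    return matrix

m1 = parse_matrix_literal("[3 4 8][8 7 2][1 8 2]", rows=3, columns=3)
print(m1)  # [[3.0, 4.0, 8.0], [8.0, 7.0, 2.0], [1.0, 8.0, 2.0]]
```

The exact grammar of the literal (whitespace-separated values, bracketed rows) is inferred from Listing 1.1 and should be read as an assumption of this sketch.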
SPARQL is the W3C recommended query language for RDF, with operators to manipulate RDF graphs based on relational algebra, similar to how SQL works on relations. We need to accommodate matrix-specific functions. Having matrices as nodes makes their retrieval easy, because we can refer to them by exploiting variables and accessing their data value. When looking at use cases, we are not interested in representing the same data in multiple formats for the sake of achieving high velocity in computation, but in enabling the combination of data. With this in mind, we decided to add the matrix-specific operators to the query language as SPARQL functions. This solution does not lead to a custom version of SPARQL, since it is the recommended practice for this type of extension [11]. An example query is shown in Listing 1.2, where the contents of matrix resources are retrieved (Lines 7–10), their inverses computed (Lines 11–12), added (Line 13), and emitted (Line 3).

 1  REGISTER STREAM :outStr AS
 2  CONSTRUCT RSTREAM {
 3      ?m1 :hasInverse [ rlg:data ?addInverse ] .
 4  }
 5  FROM NAMED WINDOW :win ON :inStr [RANGE 1 STEP 1]
 6  WHERE {
 7      ?m1 rdf:type rlg:Matrix .
 8      ?m2 rdf:type rlg:Matrix .
 9      ?m1 rlg:data ?data1 .
10      ?m2 rlg:data ?data2 .
11      BIND (afn:inverse(?data1) AS ?inverse1) .
12      BIND (afn:inverse(?data2) AS ?inverse2) .
13      BIND (afn:add(?inverse1, ?inverse2) AS ?addInverse) .
14  }

Listing 1.2. Query that computes the inverse matrices (prefixes are omitted for brevity).

Additionally, our query language needs a way to manage streams. Several studies proposed extensions to SPARQL [9], with recent ongoing efforts to unify them in a common and shared language. The introduction of windows and streams cannot be managed while entirely preserving the original semantics of SPARQL. In particular, the continuous evaluation requires an extension of the original SPARQL semantics: the notion of evaluation time instant needs to be included in the operational semantics to describe when and on which portion of the stream the query should be evaluated [10]. In the example in Listing 1.2, we adopt the syntax proposed by the W3C RSP Community Group. An output stream :outStr is declared (Line 1) and its items are defined as graphs containing the matrices and their inverses (Lines 2–4). The window on Line 5 is declared over a stream :inStr as a tumbling window of one stream item, i.e. the query processes one stream item at a time.

4.3 Query Execution

The last step of our model consists of creating a DSPE topology that puts together the data and the query described above. Given a (continuous) SPARQL query, a way to generate a topology is shown in Figure 3. First, a parser creates a logical query plan from the string of the SPARQL query. As usual, the logical plan can be modified and optimized. Being a SPARQL query, the leaves of the tree correspond to the Basic Graph Patterns, which are defined in the WHERE clause. These operators generate solution mappings, which are further processed by the other operators. To generate the topology, we exploit the logical plan, as highlighted in Figure 3. In the topology, the BGP operators are on the left, and they are fed with portions of the stream selected by the windows.
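The plan-to-topology conversion can be sketched as a tree traversal (a pure-Python sketch with hypothetical operator labels; our actual converter emits Apache Flink code): a post-order walk of the logical plan creates one task per operator and wires each task to its parent.

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    op: str                       # operator label, e.g. "BGP" or "BIND(...)"
    children: list = field(default_factory=list)

def plan_to_topology(root):
    """Walk the logical query plan bottom-up, emitting one task per operator
    and one edge per child-to-parent data flow. The result mirrors the plan,
    so it is always tree-shaped."""
    tasks, edges = [], []

    def visit(node):
        child_ids = [visit(child) for child in node.children]
        task_id = len(tasks)
        tasks.append(node.op)
        for cid in child_ids:
            edges.append((cid, task_id))  # solution mappings flow child -> parent
        return task_id

    visit(root)
    return tasks, edges

# A plan shaped like the query of Listing 1.2 (labels are illustrative).
plan = PlanNode("CONSTRUCT",
                [PlanNode("BIND(add)",
                          [PlanNode("BIND(inverse)",
                                    [PlanNode("BGP")])])])
tasks, edges = plan_to_topology(plan)
print(tasks)  # ['BGP', 'BIND(inverse)', 'BIND(add)', 'CONSTRUCT']
print(edges)  # [(0, 1), (1, 2), (2, 3)]
```

The leaf (the BGP task) sits at the source end of the topology, and data flows along the edges towards the sink, matching the layout described here.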
Such BGP operators process the data and push their outputs to the correct operators, which continue the processing, sending the data towards the sinks. A converter traverses the logical query plan and creates a task in the topology for each operator. In this way, it is easier to track what happens during the execution of the query. Moreover, the decision to optimize the logical query plan allows us to exploit well-known techniques from database research. The main drawback is that our converter may not find the best possible topology (in terms of time performance): it always creates tree-shaped topologies and cannot generate other types of DAG.

Fig. 3. Processing a query in a stream setting.

5 Implementation

To verify the feasibility of our model, we built a proof of concept. We started from some existing frameworks: as a DSPE, we opted for Apache Flink [6]; we used Apache Jena (https://jena.apache.org/) to manage the SPARQL query; and we used JAMA [16] as a library providing matrix-related functions. In the following, we highlight some parts of our experience.

5.1 Query language

We used Apache Jena since it already provides tools to manage SPARQL, such as a parser and implementations of its operators. Moreover, Jena offers a well-documented API to extend its functions. As explained in Section 4.2, our data model manages matrices through individuals of a Matrix class and a new literal data type that serializes their content.
Whenever a literal is specified as a matrix, the string is parsed into a matrix data structure. Functions that are specific to matrices can be executed, and the result can then be returned to the query. We implemented such functions according to the SPARQL specification; they are listed in Table 1. We exploited the JAMA library from MathWorks and NIST [16] for the matrix data structures as well as for the functions manipulating the matrices.

Function      Returns
plus          result of an addition
minus         result of a subtraction
times         result of an element-wise multiplication
times         result of a multiplication by a scalar
times         result of a matrix-matrix multiplication
divide        result of an element-wise division
between       partial matrix specified by indices
join          result of joining two matrices with an operator
merge         result of merging two matrices
transpose     result of a transposition
rank          calculated rank of the matrix
determinant   calculated determinant of the matrix
inverse       inverse of the input matrix
condition     ratio of largest to smallest singular value
trace         result of the sum of the diagonal

Table 1. Functions added to SPARQL using the JAMA library [16]

In our current implementation, the query language does not support the SPARQL extensions for streams; this is on the schedule for our future work. At the moment, such information is provided as a set of parameters. It is worth noting that this is not a limitation, since several prototypes have already shown the feasibility of these features [9].

5.2 Topology Creation and Execution

We decided to use Apache Flink as the basis for the execution environment, since it offers a flexible and well-documented API. However, our approach can be ported to other DSPEs, since the notion of topology is shared among them. When defining a Flink topology, it is necessary to declare the type of data that tasks exchange. Flink offers a set of native data types, among which Tuple is the most prominent.
It is a list of values, indexed by their position number. We use Tuple for most of the data exchanges between nodes.

Given a query (partially defined through SPARQL, partially defined through extra parameters), the conversion process derives a topology. For each SPARQL operator, the process creates a task. At the moment, the projection, FILTER, BIND, LIMIT, and BGP operators are supported. Furthermore, our prototype supports several window operators (since they are natively supported by Flink) and the matrix-related operations in Table 1. In addition, the process extracts the variable names, which are stored in a dedicated data structure. Tasks use this structure to manage the solution mappings as Tuple objects, inferring the position of the variable content during the query execution.

Streams among tasks exchange Tuple objects; the only exceptions are the tasks implementing BGP operators. The input of a BGP operator is a finite sequence of stream items, expressed as a set of RDF graphs. They are merged into a new RDF graph, which represents the window content, and the BGP is evaluated over it. The resulting solution mappings are converted into Tuple objects and are sent to the other tasks of the topology.

The conversion process returns a snippet of code with the topology description. This code can be fed into Flink, which instantiates the topology and executes it over the input streams. The code of the project can be found at https://gitlab.ifi.uzh.ch/DDIS-Public/ralagra/.

5.3 Limitations

While our prototype shows the feasibility of our model, it has several limitations. The current implementation does not include the stream integration component, i.e. the system expects as input one RDF stream compliant with the data model described in Section 4.1. Our system is not able to receive multiple streams and therefore cannot combine them on the fly. This is subject to future work.
As explained above, several studies show the feasibility of this component, and we are going to implement it in the next version. Moreover, we aim at automating the submission of the topology to Flink: when the conversion process creates the topology from the input query, the code snippet should automatically be injected into Flink. Techniques like Java reflection (cf. https://docs.oracle.com/javase/tutorial/reflect/) or template engines may help in tackling this problem. We are also working on extending our system to other SPARQL operators. At the moment, it supports the most common SPARQL features, but it is important to extend the coverage to a wider set of operators. Finally, serializing matrices as plain strings is not the best solution in terms of space and processing time. In future work, we plan to explore other serialization formats for matrices (and the RDF streams carrying them), such as Protocol Buffers and Apache Thrift.

6 Conclusions & Future Work

In this study, we proposed a model to handle data streams carrying different types of data: relations, graphs, and matrices. We defined a data model by exploiting RDF, where streams are modeled as sequences of time-annotated RDF graphs and matrices are represented as literals. We also described a query language to manage such streams and to perform relational and linear algebra operations over their items. We developed a proof of concept implementing the most distinctive parts of the model.

Over the course of the next months, we will work to consolidate the prototype and to add the remaining parts, in order to have a full RDF stream processing engine. We also aim at performing an extensive evaluation of the system. We are interested in studying the performance and the overhead introduced by our extensions, and in comparing our system with other prototypes developed so far.
It will also be important to study in more depth to what extent our query language can support the modeling of users' needs and tasks.

Finally, our prototype sets the basis to study the problem of distribution. So far, only a few studies have targeted the problem of distributed RDF stream processing engines, such as Strider [20]. The main difference in our setting is the presence of matrices and operators over them, which requires different distribution strategies.

Acknowledgments

We thank the Swiss National Science Foundation (SNF) for partial support of this work under contract number #407550 167177.

References

1. Akidau, T., Bradshaw, R., Chambers, C., Chernyak, S., Fernández-Moctezuma, R.J., Lax, R., McVeety, S., Mills, D., Perry, F., Schmidt, E., et al.: The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proceedings of the VLDB Endowment 8(12), 1792–1803 (2015)
2. Alexandrov, A., Kunft, A., Katsifodimos, A., Schüler, F., Thamsen, L., Kao, O., Herb, T., Markl, V.: Implicit parallelism through deep language embedding. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. pp. 47–61. ACM (2015)
3. Andrejev, A., He, X., Risch, T.: Scientific data as RDF with arrays: Tight integration of SciSPARQL queries into MATLAB. In: International Semantic Web Conference (Posters & Demos). CEUR Workshop Proceedings, vol. 1272, pp. 221–224. CEUR-WS.org (2014)
4. Arasu, A., Babu, S., Widom, J.: CQL: A Language for Continuous Queries over Streams and Relations. In: DBPL. Lecture Notes in Computer Science, vol. 2921, pp. 1–19. Springer (2003)
5. Calbimonte, J.P., Jeung, H., Corcho, Ó., Aberer, K.: Enabling Query Technologies for the Semantic Sensor Web. Int. J. Semantic Web Inf. Syst. 8(1), 43–63 (2012)
6. Carbone, P., Katsifodimos, A., Ewen, S., Markl, V., Haridi, S., Tzoumas, K.: Apache Flink: Stream and batch processing in a single engine.
Bulletin of the IEEE Computer Society Technical Committee on Data Engineering 38(4) (2015)
7. Cugola, G., Margara, A.: Processing flows of information: From data stream to complex event processing. ACM Comput. Surv. 44(3), 15:1–15:62 (2012)
8. Cyganiak, R., Wood, D., Lanthaler, M.: RDF 1.1 Concepts and Abstract Syntax. W3C Recommendation, W3C (2014), https://www.w3.org/TR/rdf11-concepts/
9. Dell'Aglio, D., Della Valle, E., van Harmelen, F., Bernstein, A.: Stream reasoning: A survey and outlook. Data Science 1(1–2), 59–83 (2017)
10. Dell'Aglio, D., Della Valle, E., Calbimonte, J.P., Corcho, Ó.: RSP-QL Semantics: A Unifying Query Model to Explain Heterogeneity of RDF Stream Processing Systems. Int. J. Semantic Web Inf. Syst. 10(4), 17–44 (2014)
11. Harris, S., Seaborne, A.: SPARQL 1.1 Query Language. W3C Recommendation, W3C (2013), https://www.w3.org/TR/sparql11-query/
12. Hutchison, D., Howe, B., Suciu, D.: LaraDB: A minimalist kernel for linear and relational algebra computation. BeyondMR@SIGMOD pp. 2:1–2:10 (2017)
13. Kharlamov, E., Hovland, D., Jiménez-Ruiz, E., Lanti, D., Lie, H., Pinkel, C., Rezk, M., Skjæveland, M.G., Thorstensen, E., Xiao, G., Zheleznyakov, D., Horrocks, I.: Ontology Based Access to Exploration Data at Statoil. In: International Semantic Web Conference (2). Lecture Notes in Computer Science, vol. 9367, pp. 93–112. Springer (2015)
14. Kunft, A., Alexandrov, A., Katsifodimos, A., Markl, V.: Bridging the gap: towards optimization across linear and relational algebra. In: Proceedings of the 3rd ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond. p. 1. ACM (2016)
15. Luckham, D.: The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. In: RuleML. Lecture Notes in Computer Science, vol. 5321, p. 3. Springer (2008)
16. MathWorks, T., NIST: JAMA: Java matrix package. http://math.nist.gov/javanumerics/jama/ (2012), accessed: 2017-12-04
17.
Mauri, A., Calbimonte, J.P., Dell'Aglio, D., Balduini, M., Brambilla, M., Della Valle, E., Aberer, K.: TripleWave: Spreading RDF Streams on the Web. In: International Semantic Web Conference (2). Lecture Notes in Computer Science, vol. 9982, pp. 140–149. Springer (2016)
18. McGregor, A.: Graph stream algorithms: a survey. ACM SIGMOD Record 43(1), 9–20 (2014)
19. Noy, N.F.: Semantic Integration: A Survey Of Ontology-Based Approaches. SIGMOD Record 33(4), 65–70 (2004)
20. Ren, X., Curé, O.: Strider: A Hybrid Adaptive Distributed RDF Stream Processing Engine. In: International Semantic Web Conference (1). Lecture Notes in Computer Science, vol. 10587, pp. 559–576. Springer (2017)
21. Rogers, J., Simakov, R., Soroush, E., Velikhov, P., Balazinska, M., DeWitt, D., Heath, B., Maier, D., Madden, S., Patel, J., et al.: Overview of SciDB. In: 2010 International Conference on Management of Data, SIGMOD'10 (2010)
22. Stonebraker, M., Brown, P., Zhang, D., Becla, J.: SciDB: A database management system for applications with complex analytics. Computing in Science & Engineering 15(3), 54–62 (2013)
23. Toshniwal, A., Taneja, S., Shukla, A., Ramasamy, K., Patel, J.M., Kulkarni, S., Jackson, J., Gade, K., Fu, M., Donham, J., et al.: Storm@twitter. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. pp. 147–156. ACM (2014)
24. Zaharia, M., Xin, R.S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M.J., et al.: Apache Spark: A unified engine for big data processing. Communications of the ACM 59(11), 56–65 (2016)