Continuous Queries and Real-time Analysis of Social Semantic Data with C-SPARQL

Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, and Michael Grossniklaus

Dipartimento di Elettronica e Informazione, Politecnico di Milano
Piazza Leonardo da Vinci 32, I-20133 Milano, Italy
{dbarbieri|braga|ceri|dellavalle|grossniklaus}@elet.polimi.it

Abstract. Social semantic data are becoming a reality, but their streaming nature has apparently been ignored so far. Streams, being unbounded sequences of time-varying data elements, should not be treated as persistent data to be stored “forever” and queried on demand, but rather as transient data to be consumed on the fly by queries that are registered once and for all and keep analyzing such streams, producing answers triggered by the streaming data and not by explicit invocation. In this paper, we propose an approach to continuous queries and real-time analysis of social semantic data with C-SPARQL, an extension of SPARQL for querying RDF streams.

Keywords: Continuous SPARQL, Social Semantic Data, Continuous Query, Real-time Analysis

1 Introduction and Motivation

"Which are the hottest topics under discussion on Twitter?"
"Who is discussing Italian food right now?"
"What have my close friends been discussing in the last hour?"
"Who is now discussing Tuscany red wines in my social network?"
"How many people have been twittering about northern Italy white wines in the last three hours?"

The information required to answer those queries is increasingly becoming available on the Web. On the one side, we observe a trend in Web 2.0 as blogs, feeds and microblogs are adopted to disseminate and publish information in real-time streams through social networking Web sites. This trend is often referred to as the Twitter phenomenon or, in broader terms, as the so-called blogosphere.
On the other side, a trend can also be observed towards the interlinking of the Social Web with semantics [1], using vocabularies such as Semantically-Interlinked Online Communities¹ (SIOC), Friend-of-a-Friend² (FOAF) and Simple Knowledge Organization System³ (SKOS).

¹ http://rdfs.org/sioc/spec/
² http://xmlns.com/foaf/spec/
³ www.w3.org/2004/02/skos/

As concrete examples, we can refer to three pioneers in this field:
– SMOB [2] is a distributed and decentralized microblogging system built on SIOC and FOAF, for which the authors implemented both a publishing and an aggregating service prototype.
– Smesher [3] is a Semantic Microblogging client that integrates Twitter and identi.ca, detects structure in microposts, extracts content in RDF and allows SPARQL queries over the extracted information.
– SemanticTweet⁴ is a simple Web service that generates a FOAF RDF document from the list of Twitter friends and followers of any Twitter user using the Twitter REST API.

Several attempts to answer those questions exist, employing both Information Retrieval (IR) methods (e.g., technorati.com, icerocket.com, blogsearchengine.com, blogsearch.google.com) and Semantic Web methods (e.g., the SIOC API of sindice.com, the SMOB aggregator and Smesher). Our claim is that answering such questions within the one-time query semantics of current IR and Semantic Web tools is difficult, because the underlying data are streams.

Data streams are unbounded sequences of time-varying data elements. They have been recognized in a variety of modern applications, such as network monitoring, traffic engineering, sensor networks, RFID tag applications, telecom call records and financial applications.
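The difference between the one-time semantics of current tools and the continuous semantics advocated here can be sketched in a few lines of Python. This is a hypothetical illustration, not code from any DSMS: `one_time_query`, `ContinuousQuery` and `feed` are names introduced only for this example, and a "query" is reduced to a Python predicate over short text posts.

```python
from typing import Callable, Iterable, List


def one_time_query(store: List[str], predicate: Callable[[str], bool]) -> List[str]:
    """One-time semantics: evaluated once, on demand, over data already stored."""
    return [item for item in store if predicate(item)]


class ContinuousQuery:
    """Continuous semantics: registered once, then triggered by each arriving item.

    Hypothetical illustration only: answers are pushed to the `emit` callback
    instead of being returned to a caller that explicitly invoked the query.
    """

    def __init__(self, predicate: Callable[[str], bool], emit: Callable[[str], None]) -> None:
        self.predicate = predicate
        self.emit = emit

    def on_arrival(self, item: str) -> None:
        # The item is consumed on the fly: it is inspected and then discarded.
        if self.predicate(item):
            self.emit(item)


def feed(stream: Iterable[str], query: ContinuousQuery) -> None:
    """Push every element of a (possibly unbounded) iterable into the query."""
    for item in stream:
        query.on_arrival(item)


def about_wine(post: str) -> bool:
    return "wine" in post.lower()


if __name__ == "__main__":
    answers: List[str] = []
    feed(["Tuscany red wine", "Mario Kart Wii", "Soave white wine"],
         ContinuousQuery(about_wine, answers.append))
    print(answers)  # ['Tuscany red wine', 'Soave white wine']
```

The point of the sketch is the inversion of control: with `one_time_query` the caller pulls an answer out of stored data, whereas a `ContinuousQuery` is registered once and its answers are pushed out as the data arrive, without anything being stored.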
Processing of data streams has been largely investigated in the last decade [4], specialized Data Stream Management Systems (DSMS) have been developed, and features of DSMS are becoming supported by major database products, such as Oracle and DB2. DSMS represent a paradigm change in the database world, as they move from persistent relations and user-invoked queries to transient streams and continuous queries. The innovative assumption is that streams can be consumed on the fly rather than being stored forever, and that queries which persistently monitor streams can produce their answers even in the absence of an explicit invocation. DSMS can support parallel query answering over data originating in real time, and can cope with bursts of data by gracefully degrading answer accuracy through higher levels of approximation. However, even though DSMS have proved well suited to on-the-fly analysis of data streams, they cannot perform complex reasoning tasks, such as the ones required to compute the answers to the questions above.

At the same time, while Semantic Web reasoners scale up year after year in the classical, time-invariant domain of RDF triples and ontological knowledge, reasoning upon rapidly changing information has so far been neglected. Reasoning systems assume static knowledge and do not manage “changing worlds”; at most, one can update the ontological knowledge and then repeat the reasoning tasks. In [5], we propose Stream Reasoning as a new multi-disciplinary approach that will provide the abstractions, foundations, methods, and tools required to integrate data streams and reasoning systems, thus answering the above questions as well as others from different domains.

⁴ http://semantictweet.com/

The rest of the paper is organized as follows.
In Section 2, we provide the background needed to understand the proposed extensions to RDF and SPARQL, introduced in Sections 3.1 and 3.3, respectively. In Section 4, we describe an architecture and implementation of a C-SPARQL engine. Section 5 is dedicated to the comparison of C-SPARQL with SPARQL, using the real-world social data streams described in Section 3.2. We close the paper by discussing related work in Section 6 and drawing conclusions in Section 7.

2 Background

This section illustrates previous work on data streams and the SPARQL language.

2.1 Data Streams

DSMS are based on the observation that not only is it impossible to control the order in which data items arrive in a stream, but, even more importantly, it is not feasible to locally store a stream in its entirety [6]. The Chronicle data model [7] is one of the first models proposed for data streams. It introduced the concept of chronicles, append-only ordered sequences of tuples, as well as a restricted view definition language and an algebra that operates both over chronicles and traditional relations. OpenCQ [8], NiagaraCQ [9] and Aurora [10] are representative implementations of DSMS addressing continuous queries and distribution issues. The first query language tailored to data streams, CQL [11, 12], was the result of research done by Babu et al. [13] on the problem of continuous queries over data streams, addressing semantic issues as well as efficiency concerns. They specify a general and flexible architecture for query processing in the presence of data streams. More recently, Law et al. [14] put particular emphasis on the problem of mining data streams [15]. They conceived and developed Stream Mill [16], which extensively addresses data mining issues, specifically with respect to the problem of online data aggregation and to the distinguishing notion of blocking and non-blocking operators.
Stream Mill's query language (ESL) efficiently supports physical and logical windows (with optional slides and tumbles) on both built-in and user-defined aggregates. The constructs introduced in ESL extend the power and generality of DSMS.

2.2 SPARQL

SPARQL has been developed under the patronage of the W3C as the standard query language for RDF. Therefore, the most authoritative source on its syntax and semantics is the W3C recommendation [17]. Several papers, however, discuss extensions to the SPARQL language as defined by the W3C. With the goal of proposing a syntactic and semantic extension to SPARQL, we found the following works particularly useful:
– Gutierrez et al. [18], who define a conjunctive query language for RDF with basic patterns, which is a formal and unambiguous basis for defining the semantics of SPARQL query evaluation;
– Pérez et al. [19], who analyze the semantics and complexity of SPARQL; and
– Cyganiak [20] and Haase et al. [21], who independently present a relational model of SPARQL which allows SPARQL queries to be implemented over a relational database engine.

3 RDF Streams and Continuous SPARQL

Data models and query languages for DSMS are not sufficient for continuously querying and analyzing streams of RDF in real time. Indeed, we deem that there is a potential interest in giving up one-time semantics in RDF repositories as well, so as to explore the benefits provided by continuous semantics. Therefore, we introduce RDF streams as the natural extension of the RDF data model to the new continuous scenario, and Continuous SPARQL (or simply C-SPARQL) as the extension of SPARQL for querying RDF streams.

3.1 RDF Streams

An RDF stream is defined as an ordered sequence of pairs, where each pair is constituted by an RDF triple and its timestamp $\tau$:

$$\ldots\ (\langle subj_i, pred_i, obj_i \rangle, \tau_i)\ \ (\langle subj_{i+1}, pred_{i+1}, obj_{i+1} \rangle, \tau_{i+1})\ \ldots$$
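The definition above can be mirrored by a small Python data structure. This is a minimal sketch under our own assumptions: terms are plain strings (prefixed names here), `StreamElement` and `RDFStream` are hypothetical names, and elements are kept in a list only to keep the example short, whereas a real RDF stream would be consumed through windows rather than stored. It also enforces the ordering constraint on timestamps discussed next.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, List, Tuple

# (subject, predicate, object); plain strings are an assumption of this sketch.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class StreamElement:
    """One pair (<subj_i, pred_i, obj_i>, tau_i) of an RDF stream."""
    triple: Triple
    timestamp: datetime


class RDFStream:
    """Append-only, ordered sequence of timestamped triples."""

    def __init__(self) -> None:
        self._elements: List[StreamElement] = []

    def append(self, triple: Triple, timestamp: datetime) -> None:
        # Equal timestamps are allowed (tau_i <= tau_{i+1}); decreasing ones are not.
        if self._elements and timestamp < self._elements[-1].timestamp:
            raise ValueError("timestamps must be monotonically non-decreasing")
        self._elements.append(StreamElement(triple, timestamp))

    def __iter__(self) -> Iterator[StreamElement]:
        # Elements are yielded in stream (positional) order.
        return iter(self._elements)

    def __len__(self) -> int:
        return len(self._elements)


if __name__ == "__main__":
    t = datetime(2009, 7, 20, 22, 48, 52, tzinfo=timezone.utc)
    stream = RDFStream()
    stream.append(("glueinter:i1", "rdf:type", "sioc:Post"), t)
    # A second triple with the same timestamp is accepted.
    stream.append(("glueinter:i1", "sioc:topic", "gluecat:movies"), t)
    print(len(stream))  # 2
```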
Timestamps can be considered the context of RDF triples. They are monotonically non-decreasing in the stream ($\tau_i \le \tau_{i+1}$). They are not strictly increasing, because timestamps are not required to be unique. Any (unbounded, though finite) number of consecutive triples can have the same timestamp, meaning that they occur at the same time, although they are sequenced in the stream according to some positional order. Our definition of RDF streams extends RDF in the same way as the stream type in CQL extends the relation type. Named graphs [22] and N-Quads [23], a format that extends N-Triples with context, can both be adopted as a concrete serialization for RDF streams. For our experiments we adopt N-Quads and use as context the timestamp, encoded as an RDF literal of type xsd:dateTime.

3.2 A Real Source of Social Semantic Data Streams

Given that RDF streams of social semantic data are not readily available yet, we decided to use the data provided by the social network Glue⁵. Glue enables users to connect with their friends on the Web based on the pages the users visit online. Using semantic recognition technologies to automatically identify books, music, movies, wines, stocks, movie stars and many other similar topics, it generates a continuous stream of the identified objects, which is accessible in real time using a REST API⁶. The REST request “http://api.getglue.com/v1/glue/recent” returns the 250 most recent public interactions. We adopted the GRDDL approach [24] and implemented a simple way to translate the resulting XML into RDF. A live version is running at http://c-sparql.cefriel.it/sdow-demo. Below, we provide a snapshot of the resulting RDF stream.
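A single stream element can be written as an N-Quads line with the timestamp in the context position, as described above. The sketch assumes that subject, predicate and object are already valid N-Triples terms and that timestamps are timezone-aware; `to_nquad` and the example.org IRIs are hypothetical. Note that the later W3C N-Quads Recommendation (2014) allows only an IRI or a blank node as graph label, so strict parsers may reject a literal in the fourth position.

```python
from datetime import datetime, timezone
from typing import Tuple

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def to_nquad(triple: Tuple[str, str, str], timestamp: datetime) -> str:
    """Serialize one RDF stream element as an N-Quads line.

    Subject, predicate and object must already be valid N-Triples terms
    (e.g. "<http://...>" or '"literal"'); the context position holds the
    timestamp as an xsd:dateTime literal, normalized to UTC.
    """
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    s, p, o = triple
    ts = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f'{s} {p} {o} "{ts}"^^<{XSD_DATETIME}> .'


if __name__ == "__main__":
    post = "<http://example.org/glue/interaction/i1>"  # hypothetical IRI
    print(to_nquad(
        (post, "<http://rdfs.org/sioc/ns#topic>", "<http://example.org/glue/category/movies>"),
        datetime(2009, 7, 20, 22, 48, 52, tzinfo=timezone.utc),
    ))
```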
Subject        Predicate            Object                             Timestamp
glueinter:i1   rdf:type             sioc:Post                          "2009-07-20T22:48:52Z"
glueinter:i1   sioc:content         "The Proposal on imdb.com"         "2009-07-20T22:48:52Z"
glueinter:i1   sioc:has_container   http://www.getglue.com             "2009-07-20T22:48:52Z"
glueinter:i1   dc:title             "The Proposal"                     "2009-07-20T22:48:52Z"
glueuser:id1   rdf:type             sioc:User                          "2009-07-20T22:48:52Z"
glueinter:i1   sioc:has_creator     glueuser:id1                       "2009-07-20T22:48:52Z"
glueinter:i1   sioc:topic           gluecat:movies                     "2009-07-20T22:48:52Z"
glueinter:i2   rdf:type             sioc:Post                          "2009-07-20T22:48:54Z"
glueinter:i2   sioc:content         "Mario Kart Wii on gamefaqs.com"   "2009-07-20T22:48:55Z"
glueinter:i2   sioc:has_container   http://www.getglue.com             "2009-07-20T22:48:55Z"
glueinter:i2   dc:title             "Mario Kart Wii"                   "2009-07-20T22:48:55Z"
glueuser:id2   rdf:type             sioc:User                          "2009-07-20T22:48:55Z"
glueinter:i2   sioc:has_creator     glueuser:id2                       "2009-07-20T22:48:55Z"
glueinter:i2   sioc:topic           glueint:video_games                "2009-07-20T22:48:55Z"

Similarly to SemanticTweet, we are also able to translate the social relationships obtained using the REST service “/user/friends” into FOAF.

⁵ http://getglue.com
⁶ http://getglue.com/api

3.3 Continuous SPARQL

C-SPARQL is an extension of SPARQL for querying both RDF graphs and RDF streams. The complete definition of the language in terms of syntax and semantics is given in [25]. Here, we briefly repeat the definitions of the distinguishing features of the language and show how to write queries that compute the answers to the questions that opened this paper.

Continuous Queries. The distinguishing feature of C-SPARQL is the support for continuous queries, i.e., queries that are registered and then executed continuously over windows opened on RDF streams and standard RDF graphs. Continuous queries that make use of aggregates are particularly relevant. A C-SPARQL query is registered using the grammar extension provided by the first of the following two production rules.

Registration → ‘REGISTER QUERY’ QueryName ‘AS’ Query
Registration → ‘REGISTER STREAM’ QueryName ‘AS’ Query

As output, C-SPARQL queries produce the same types as SPARQL queries: boolean answers, selections of variable bindings, RDF descriptions of the involved resources or constructions of new RDF triples. These outputs are continuously renewed at each query execution. In addition, C-SPARQL queries can be registered to produce new RDF streams using the grammar extension provided by the second production rule given above. In this second case, only CONSTRUCT and DESCRIBE queries can be registered, as they produce RDF triples that, once associated with a timestamp, yield RDF streams that can be managed in C-SPARQL.

Windows. Given that RDF streams are intrinsically infinite, we introduce the notion of windows upon RDF streams, whose types and characteristics are inspired by those of the windows in continuous query languages such as CQL [12]. Identification and windowing are expressed in C-SPARQL by means of the FROM STREAM clause.

FromStrClause → ‘FROM’ [‘NAMED’] ‘STREAM’ StreamIRI ‘[ RANGE’ Window ‘]’

From the RDF stream identified by StreamIRI, a window extracts the most recent triples, which are considered by the query. The extraction can be physical (a given number of triples) or logical (a variable number of triples that occur during a given time interval).

The part of C-SPARQL introduced so far is sufficient to address the question “What have my close friends been visiting in the last hour?” Below, we show how to formulate this query in C-SPARQL over the Glue interaction stream and the graph of FOAF relationships described in Section 3.2.

REGISTER QUERY WhatHaveMyCloseFriendsBeenVisitingInTheLastHour AS
PREFIX sioc:
PREFIX foaf:
PREFIX glue:
SELECT DISTINCT ?friend ?topic
FROM
FROM STREAM [ RANGE 60m STEP 5m ]
WHERE {
  glue:id1 foaf:knows ?friend .
  ?post sioc:has_creator ?friend .
  ?post rdf:type sioc:Post .
  ?post sioc:topic ?topic .
}

The first triple pattern matches triples in the FOAF graph, whereas the other three triple patterns match triples in a sliding window of 60 minutes opened on the RDF stream, which advances in steps of 5 minutes.

Aggregation. Another question that we can answer using C-SPARQL is “Which are the top topics in Glue?” To do so, we need to introduce the aggregation capabilities that we added to C-SPARQL. We allow multiple independent aggregations within the same C-SPARQL query, thus pushing the aggregation capabilities beyond those of SQL and of other proposals for aggregation in SPARQL⁷. Aggregation clauses have the following syntax.

AggregateClause → ( ‘AGGREGATE { (’ var ‘,’ Fun ‘,’ Group ‘)’ [Filter] ‘}’ )*
Fun → ‘COUNT’ | ‘SUM’ | ‘AVG’ | ‘MIN’ | ‘MAX’
Group → var | ‘{’ var ( ‘,’ var )* ‘}’

An aggregation clause starts with a new variable not occurring in the WHERE clause, followed by an aggregation function, and is closed by a set of one or more variables occurring in the WHERE clause, which express the grouping criteria. For instance, the question above can be expressed in C-SPARQL as follows.

REGISTER QUERY TopTopicsGlueUsersAreInterestedIn AS
PREFIX sioc:
SELECT DISTINCT ?topic ?number
FROM STREAM [ RANGE 30m STEP 10m ]
WHERE { ?post sioc:topic ?topic . }
AGGREGATE {(?number, COUNT, {?topic})}

4 A C-SPARQL Engine

The C-SPARQL engine was designed based on a separation of concerns between stream management and query evaluation. This separation is the foundation for a simple architecture for C-SPARQL, built upon known database and reasoning technologies. Figure 1 shows the three main components of our C-SPARQL execution framework. The module named C-SPARQL Query Parser gets a C-SPARQL query as input and produces the information needed by the Data Stream Manager Layer and the SPARQL EndPoint Layer to execute the query.
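The split performed by the C-SPARQL Query Parser can be approximated with regular expressions. This is a hypothetical sketch, not the authors' parser: it recognizes only the REGISTER clause and time-based `FROM STREAM <iri> [ RANGE n STEP m ]` windows with s/m/h units, as in the examples of this paper, and returns the rest of the text for the SPARQL endpoint. It assumes the stream IRI is written in angle brackets (the IRIs are not shown in the queries above), and it does not handle physical windows or the AGGREGATE clause, which would remain in the returned text.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, regex-based split of a C-SPARQL query; not the authors' parser.
REGISTER_RE = re.compile(r"^\s*REGISTER\s+(QUERY|STREAM)\s+(\w+)\s+AS\s+", re.IGNORECASE)
STREAM_RE = re.compile(
    r"FROM\s+(?:NAMED\s+)?STREAM\s+<([^>]+)>\s*"
    r"\[\s*RANGE\s+(\d+)([smh])\s+STEP\s+(\d+)([smh])\s*\]",
    re.IGNORECASE,
)
SECONDS = {"s": 1, "m": 60, "h": 3600}


@dataclass
class ParsedQuery:
    kind: str                            # "QUERY" or "STREAM"
    name: str                            # registered query name
    windows: List[Tuple[str, int, int]]  # (stream IRI, range in s, step in s)
    sparql: str                          # remaining text, for the SPARQL endpoint


def parse_csparql(text: str) -> ParsedQuery:
    match = REGISTER_RE.match(text)
    if match is None:
        raise ValueError("a C-SPARQL query must start with REGISTER QUERY/STREAM")
    body = text[match.end():]
    # Continuous part: one time-based window per FROM STREAM clause.
    windows = [
        (iri, int(rng) * SECONDS[ru.lower()], int(step) * SECONDS[su.lower()])
        for iri, rng, ru, step, su in STREAM_RE.findall(body)
    ]
    # Standard part: the query text with the FROM STREAM clauses removed.
    sparql = STREAM_RE.sub("", body).strip()
    return ParsedQuery(match.group(1).upper(), match.group(2), windows, sparql)


if __name__ == "__main__":
    query = """REGISTER QUERY TopTopics AS
    SELECT ?topic
    FROM STREAM <http://example.org/glue/stream> [ RANGE 30m STEP 10m ]
    WHERE { ?post <http://rdfs.org/sioc/ns#topic> ?topic . }"""
    parsed = parse_csparql(query)
    print(parsed.windows)  # [('http://example.org/glue/stream', 1800, 600)]
    print(parsed.sparql)
```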
The data stream manager registers the data streams specified in the query and applies logical or physical windows. When the resulting graph has been produced, the SPARQL part of the C-SPARQL query is executed by the SPARQL endpoint. This process is executed as frequently as specified in the REGISTER clause of the C-SPARQL query. Finally, the computed result is timestamped and passed on. Both the data stream manager and the SPARQL endpoint are considered plugins, in order to be independent of the actual DSMS and SPARQL engine implementations that will be used.

Fig. 1. C-SPARQL Engine Architecture. (Figure: the C-SPARQL Query Parser splits a query into its continuous part, REGISTER and FROM STREAM, handled by the Data Stream Manager Layer, and its standard SPARQL query, handled by the SPARQL EndPoint Layer.)

We have implemented a prototype based on this architecture using ESPER as DSMS and Jena as SPARQL endpoint. ESPER⁸ is a component for stream processing applications that require high throughput to process large volumes of data elements (between 1,000 and 100k messages per second) and low latency to react in real time (from a few milliseconds to a few seconds). In particular, it supports the various forms of windows we defined in C-SPARQL (i.e., sliding and tumbling windows, both in terms of time and length) and several forms of aggregation that we plan to exploit for optimizations based on query rewriting. Jena⁹ is a Java framework for building Semantic Web applications. We chose it because it includes a custom RDF storage engine for high-performance applications and a SPARQL query engine. The SPARQL engine supports standard SPARQL and, as SPARQL extensions, aggregation, GROUP BY and assignment. The adoption of off-the-shelf stream management systems and reasoning tools provides both a solid framework and a fast way of prototyping.

⁷ http://esw.w3.org/topic/SPARQL/Extensions/Aggregates
⁸ http://esper.codehaus.org/
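The loop described above (window, evaluation, timestamped answer) can be sketched in plain Python for the top-topics query of Section 3.3. Under our own assumptions, `sliding_windows` stands in for the data stream manager (ESPER in the prototype) and `count_topics` for the SPARQL endpoint (Jena); windows close every STEP and contain the triples with timestamps in (t - RANGE, t]. The precise window semantics of C-SPARQL are given in [25], and all function names here are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]
SIOC_TOPIC = "http://rdfs.org/sioc/ns#topic"


def sliding_windows(
    stream: Iterable[Tuple[Triple, datetime]],
    range_: timedelta,
    step: timedelta,
    start: datetime,
) -> Iterator[Tuple[datetime, List[Triple]]]:
    """Data stream manager role: every `step`, close a logical window holding
    the triples with timestamp in (t - range_, t]. Input must be time-ordered."""
    buffer: List[Tuple[Triple, datetime]] = []
    t = start + step
    for triple, ts in stream:
        while ts > t:  # emit every window that closes before this element
            yield t, [tr for tr, tts in buffer if tts > t - range_]
            t += step
        buffer.append((triple, ts))
        # Evict triples that can no longer fall into any future window.
        buffer = [(tr, tts) for tr, tts in buffer if tts > t - range_]
    yield t, [tr for tr, tts in buffer if tts > t - range_]


def count_topics(window: List[Triple]) -> Dict[str, int]:
    """SPARQL endpoint role, replaced here by plain Python: the equivalent of
    WHERE { ?post sioc:topic ?topic } AGGREGATE {(?number, COUNT, {?topic})}."""
    graph = set(window)  # the window becomes an RDF graph, so duplicates collapse
    return dict(Counter(o for _, p, o in graph if p == SIOC_TOPIC))


def run(
    stream: Iterable[Tuple[Triple, datetime]],
    range_: timedelta,
    step: timedelta,
    start: datetime,
) -> Iterator[Tuple[datetime, Dict[str, int]]]:
    """Evaluate the query on every window and timestamp each answer."""
    for closed_at, window in sliding_windows(stream, range_, step, start):
        yield closed_at, count_topics(window)


if __name__ == "__main__":
    t0 = datetime(2009, 7, 20, 22, 48, 0)
    events = [  # hypothetical interactions
        (("i1", SIOC_TOPIC, "movies"), t0 + timedelta(seconds=5)),
        (("i2", SIOC_TOPIC, "video_games"), t0 + timedelta(seconds=12)),
        (("i3", SIOC_TOPIC, "movies"), t0 + timedelta(seconds=25)),
    ]
    for closed_at, counts in run(events, timedelta(seconds=20), timedelta(seconds=10), t0):
        print(closed_at.time(), counts)
```

With RANGE 20 s and STEP 10 s, each yielded pair is one timestamped answer that the engine would pass on; a triple stays in the buffer only as long as some future window can still contain it.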
5 Evaluation

In order to evaluate our approach, we compared the time required to compute a C-SPARQL query with our engine to the time needed to execute an equivalent SPARQL query in Jena, using its SPARQL engine over its custom RDF storage engine. The tests were run on an Intel Core 2 Quad 2.0 GHz with 2 GB of main memory. As a representative C-SPARQL query we chose the following one, which counts, for each topic recognized by Glue, the posts about it in the last 3 minutes, i.e., how many times Glue users expressed interest in it.

REGISTER QUERY CountHowManyGlueUsersAreInterestedInEachTopic AS
PREFIX sioc:
SELECT DISTINCT ?topic ?number
FROM STREAM [ RANGE 3m STEP 10s ]
WHERE { ?post sioc:topic ?topic . }
AGGREGATE {(?number, COUNT, {?topic})}

This simple query showcases all the distinguishing characteristics of C-SPARQL, namely registration, selection of triples based on a window over an RDF stream, and an aggregate function. We registered this C-SPARQL query in our engine and executed it continuously every second. We ran two experiments, feeding new triples from the RDF stream into the C-SPARQL engine at two different rates: 5 triples per second (5 t/s) and 200 t/s. In both experiments, we measured the time required to compute the answer with the triples in the window.

⁹ http://jena.sourceforge.net/

It is possible to write a SPARQL query that computes the same results as the C-SPARQL query above by adding (a) a triple pattern that matches the creation date of the post, (b) a FILTER clause that selects the same time interval as the C-SPARQL query, and (c) an aggregate function that counts the number of topics, using the Jena SPARQL extensions for aggregates.

PREFIX sioc:
PREFIX dcterms:
PREFIX xsd:
SELECT ?topic count(?topic)
WHERE {
  ?post sioc:topic ?topic .
  ?post dcterms:created ?date .
  FILTER (?date > "2009-07-20T22:47:00Z"^^xsd:dateTime &&
          ?date < "2009-07-20T22:50:00Z"^^xsd:dateTime)
}
GROUP BY ?topic

Using Jena, we executed the above SPARQL query six times against repositories containing the first 100, 500, 1000, 1500, 2000 and 2500 triples, respectively. In these experiments, we again measured the time required to compute each answer. The results are shown in Figure 2, where they are named SPARQL. Comparing the linear regressions of the three experiments, named Linear (SPARQL), Linear (C-SPARQL 5 t/s) and Linear (C-SPARQL 200 t/s), we see that the window-based selection of C-SPARQL always performs significantly better than the FILTER-based selection of SPARQL in Jena. Notably, this result holds both for a low rate of 5 t/s and for a reasonably high rate of 200 t/s.

Fig. 2. The window-based selection of C-SPARQL outperforms the FILTER-based selection of SPARQL. (Plot “Response Time C-SPARQL vs. SPARQL”: response time in ms, from 0 to 40, against the number of triples in the window (C-SPARQL) or in the repository (SPARQL), from 0 to 2500; series SPARQL, C-SPARQL 5 t/s and C-SPARQL 200 t/s, each with its linear regression.)

6 Related Work

A previous effort to combine SPARQL and data streams is presented by Bolles et al. [26]. They introduce a syntax for the specification of logical and physical windows in SPARQL queries by means of local grammar extensions. Our approach differs from theirs in several key aspects. First, Bolles et al. omit essential ingredients such as aggregate functions; thus, the resulting expressive power is not sufficient to express interesting practical queries such as “Which are the top topics under discussion?”. Second, the authors do not follow the approach, established by DSMS, of using windows only to transform streaming data into non-streaming data in order to apply standard algebraic operations. Bolles et al. choose to also change the standard SPARQL operators by making them timestamp-aware and, thereby, effectively introduce a new language semantics. Finally, their approach allows window clauses to appear within SPARQL group graph pattern expressions. This makes the query syntax more intricate and complicates query evaluation. Moreover, it violates the separation of concerns between stream management and query evaluation that is the basis of our simple architecture for C-SPARQL engines.

7 Conclusions

We began this paper with a list of questions related to social data on the Web that stress the streaming nature of blogs and microblogs. Our initial claim was that social data should not be treated as persistent data to be stored forever and queried on demand, but rather as transient data to be consumed on the fly by registered queries. In order to support this claim, we have made the following arguments in this paper:
– RDF streams can be defined by extending the RDF data type with a notion of timestamp;
– RDF streams can be serialized as N-Quads;
– sources of RDF streams are available and can be obtained from blogs and microblogs with the same approach used to obtain the RDF representations used for social semantic data;
– SPARQL can be extended with the notion of continuous queries, registered once and for all, that keep monitoring such RDF streams, producing answers triggered by the streaming data and not by explicit invocation;
– C-SPARQL queries can be evaluated using a simple architecture based on the decision to keep stream management and query evaluation separate; and
– in terms of response time, even in a naive implementation of this architecture, the window-based selection of C-SPARQL outperforms the FILTER-based selection needed to formulate the equivalent query in SPARQL.
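One structural reason why window-based selection suits streams can be illustrated with a small Python sketch. It is not a model of Jena's or ESPER's internals and does not reproduce Figure 2, which compares windows and repositories of equal size; it only shows, on hypothetical data, that both selections return the same answer while the FILTER-style scan must examine every stored item and the window only holds the items of its time range. `filter_select`, `TimeWindow` and the simplified `(topic, timestamp)` items are assumptions of this sketch.

```python
from collections import deque
from typing import Deque, List, Tuple

Item = Tuple[str, int]  # (topic, timestamp in seconds); a stand-in for triples


def filter_select(repository: List[Item], lo: int, hi: int) -> Tuple[List[str], int]:
    """FILTER-style selection: scan the whole stored repository, keep lo < ts <= hi."""
    examined = 0
    hits: List[str] = []
    for topic, ts in repository:
        examined += 1
        if lo < ts <= hi:
            hits.append(topic)
    return hits, examined


class TimeWindow:
    """Window-style selection: hold only the items of the last `range_` seconds."""

    def __init__(self, range_: int) -> None:
        self.range_ = range_
        self.items: Deque[Item] = deque()

    def push(self, item: Item) -> None:
        self.items.append(item)
        now = item[1]
        # Items arrive in timestamp order, so expired ones sit at the left end.
        while self.items and self.items[0][1] <= now - self.range_:
            self.items.popleft()

    def select(self) -> Tuple[List[str], int]:
        return [topic for topic, _ in self.items], len(self.items)


if __name__ == "__main__":
    repository = [(f"topic{t % 3}", t) for t in range(100)]
    window = TimeWindow(range_=10)
    for item in repository:
        window.push(item)
    print(window.select()[1], filter_select(repository, 89, 99)[1])  # 10 100
```

The work of the FILTER grows with everything ever stored, whereas the window's work is bounded by its own contents; this is the same "consume, do not store" assumption that motivates continuous queries.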
Moreover, we do not yet exploit several optimization opportunities. On the one hand, we can adopt smarter query rewriting techniques that push part of a C-SPARQL query evaluation from the SPARQL engine to the DSMS. On the other hand, we are not considering the parallel nature of streams and thus the opportunity for parallel continuous query processing.

Finally, we have limited ourselves to treating RDF as relational data, without considering it part of the Semantic Web stack. We are currently investigating techniques [27] to incrementally maintain the materialization of knowledge derived from the triples currently selected by the window.

Acknowledgements

The work described in this paper has been partially supported by the European project LarKC (FP7-215535). Michael Grossniklaus's work is carried out under SNF grant number PBEZ2-121230.

References

1. Bojars, U., Breslin, J.G., Peristeras, V., Tummarello, G., Decker, S.: Interlinking the social web with semantics. IEEE Intelligent Systems 23(3) (2008) 29–40
2. Passant, A., Hastrup, T., Bojars, U., Breslin, J.: Microblogging: A semantic web and distributed approach. In: 4th Workshop Scripting For the Semantic Web (SFSW2008) co-located with ESWC2008. (2008)
3. Nowack, B.: Semantic microblogging. In: Microblogging Conference. (2009)
4. Garofalakis, M., Gehrke, J., Rastogi, R.: Data Stream Management: Processing High-Speed Data Streams (Data-Centric Systems and Applications). Springer-Verlag New York, Inc., Secaucus, NJ, USA (2007)
5. Della Valle, E., Ceri, S., Braga, D., Celino, I., Fensel, D., van Harmelen, F., Unel, G.: Research chapters in the area of stream reasoning. In: SR2009. Volume 466 of CEUR Workshop Proceedings, CEUR-WS.org (2009). Online: http://ceur-ws.org/Vol-466/sr2009-intro.pdf
6. Golab, L., DeHaan, D., Demaine, E.D., López-Ortiz, A., Munro, J.I.: Identifying Frequent Items in Sliding Windows over On-line Packet Streams. In: Proc. Intl. Conf. on Internet Measurement (IMC 2003). (2003) 173–178
7. Jagadish, H.V., Mumick, I.S., Silberschatz, A.: View Maintenance Issues for the Chronicle Data Model. In: Proc. ACM Symp. on Principles of Database Systems (PODS 1995). (1995) 113–124
8. Liu, L., Pu, C., Tang, W.: Continual Queries for Internet Scale Event-Driven Information Delivery. IEEE Trans. Knowl. Data Eng. 11(4) (1999) 610–628
9. Chen, J., DeWitt, D.J., Tian, F., Wang, Y.: NiagaraCQ: A Scalable Continuous Query System for Internet Databases. In Chen, W., Naughton, J.F., Bernstein, P.A., eds.: Proc. ACM Intl. Conf. on Management of Data (SIGMOD 2000). (2000) 379–390
10. Balakrishnan, H., Balazinska, M., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Galvez, E., Salz, J., Stonebraker, M., Tatbul, N., Tibbetts, R., Zdonik, S.: Retrospective on Aurora. The VLDB Journal 13(4) (2004) 370–383
11. Arasu, A., Babcock, B., Babu, S., Datar, M., Ito, K., Nishizawa, I., Rosenstein, J., Widom, J.: STREAM: The Stanford Stream Data Manager (Demonstration Description). In: Proc. ACM Intl. Conf. on Management of Data (SIGMOD 2003). (2003) 665
12. Arasu, A., Babu, S., Widom, J.: The CQL Continuous Query Language: Semantic Foundations and Query Execution. The VLDB Journal 15(2) (2006) 121–142
13. Babu, S., Widom, J.: Continuous Queries over Data Streams. SIGMOD Rec. 30(3) (2001) 109–120
14. Law, Y.N., Wang, H., Zaniolo, C.: Query Languages and Data Models for Database Sequences and Data Streams. In: Proc. Intl. Conf. on Very Large Data Bases (VLDB 2004). (2004) 492–503
15. Law, Y.N., Zaniolo, C.: An Adaptive Nearest Neighbor Classification Algorithm for Data Streams. In: Proc. Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD 2005). (2005) 108–120
16. Bai, Y., Thakkar, H., Wang, H., Luo, C., Zaniolo, C.: A Data Stream Language and System Designed for Power and Extensibility. In: Proc. Intl. Conf. on Information and Knowledge Management (CIKM 2006). (2006) 337–346
17. Prud’hommeaux, E., Seaborne, A.: SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/
18. Gutierrez, C., Hurtado, C., Mendelzon, A.O.: Foundations of Semantic Web Databases. In: Proc. ACM Symp. on Principles of Database Systems (PODS 2004). (2004) 95–106
19. Pérez, J., Arenas, M., Gutierrez, C.: Semantics and Complexity of SPARQL. In: Proc. Intl. Semantic Web Conf. (ISWC 2006). (2006) 30–43
20. Cyganiak, R.: A Relational Algebra for SPARQL. Technical report, HP-Labs
21. Haase, P., Broekstra, J., Eberhart, A., Volz, R.: A Comparison of RDF Query Languages. In: Proc. Intl. Semantic Web Conf. (ISWC 2004). (2004) 502–517
22. Carroll, J.J., Bizer, C., Hayes, P.J., Stickler, P.: Named graphs, provenance and trust. In: WWW. (2005) 613–622
23. Cyganiak, R., Harth, A., Hogan, A.: N-Quads: Extending N-Triples with context. http://sw.deri.org/2008/07/n-quads/ (2008)
24. Connolly, D., et al.: Gleaning Resource Descriptions from Dialects of Languages (GRDDL). W3C Recommendation. Available on the Web at http://www.w3.org/TR/grddl/ (11 September 2007)
25. Barbieri, D.F., Braga, D., Ceri, S., Della Valle, E., Grossniklaus, M.: C-SPARQL: SPARQL for continuous querying. In: WWW. (2009) 1061–1062
26. Bolles, A., Grawunder, M., Jacobi, J.: Streaming SPARQL – Extending SPARQL to Process Data Streams. In: Proc. Europ. Semantic Web Conf. (ESWC 2008). (2008) 448–462
27. Volz, R., Staab, S., Motik, B.: Incrementally maintaining materializations of ontologies stored in logic databases. J. Data Semantics 2 (2005) 1–34