A Data Mashup Language for the Data Web

Mustafa Jarrar (University of Cyprus, mjarrar@cs.ucy.ac.cy)
Marios D. Dikaiakos (University of Cyprus, mdd@cs.ucy.ac.cy)

ABSTRACT
This paper is motivated by the massively increasing structured data on the Web (the Data Web), and by the need for novel methods to exploit these data to their full potential. Building on the remarkable success of Web 2.0 mashups, this paper regards the internet as a database, where each web data source is seen as a table, and a mashup is seen as a query over these sources. We propose a data mashup language, which allows people to intuitively query and mash up structured and linked data on the web. Unlike existing query methods, the novelty of MashQL is that it allows people to navigate, query, and mash up data sources without any prior knowledge about their schema, vocabulary, or technical details. We do not even assume that a data source has an online or inline schema. Furthermore, MashQL supports query pipes as a built-in concept, rather than merely a visualization of links between modules.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Copyright is held by the author/owner(s). LDOW2009, April 20, 2009, Madrid, Spain.

1. INTRODUCTION AND MOTIVATION
In this short article we propose a data mashup approach in a graphical, Yahoo Pipes style. This research is still a work in progress; please refer to [13] for the latest findings.

In parallel to the continuous development of the hypertext web, we are witnessing a rapid emergence of the Data Web. Not only is the amount of social metadata increasing, but many companies (e.g., Google Base, Upcoming, Flickr, eBay, Amazon, and others) have also started to make their content freely accessible through APIs. Many others (see linkeddata.org) are also making their content directly accessible in RDF and in a linked manner [3]. We are also witnessing the launch of RDFa, which allows people to access and consume HTML pages as structured data sources.

This trend of structured and linked data is shifting the focus of web technologies towards new paradigms of structured-data retrieval. Traditional search engines cannot serve such data, because their core design is based on keyword search over unstructured data. For example, imagine what the results would be when using Google to search a database of job vacancies for, say, "well-paid research-oriented job in Europe". The results will not be precise or clean, because the query itself is still ambiguous, although the underlying data is structured. People demand not only to retrieve job links, but also to know the starting date, salary, and location, and perhaps to render the results on a map.

Web 2.0 mashups are a first step in this direction. A mashup is a web application that consumes data originating from third parties and retrieved via APIs. For example, one can build a mashup that retrieves only well-paid vacancies from Google Base and mixes them with similar vacancies from LinkedIn. The problem is that building mashups is an art that is limited to skilled programmers. Although some mashup editors have been proposed by the Web 2.0 community to simplify this art (such as Google Mashups, Microsoft's Popfly, IBM's sMash, and Yahoo Pipes), what can be achieved by these editors is limited. They focus only on providing encapsulated access to some APIs, and still require programming skills. In other words, these mashup methods are motivating for, rather than solving, the problem of structured-data retrieval. To exploit the massive amount of structured data to its full potential, people should be able to query and mash up this data easily and effectively.

Position: To build on the success of Web 2.0 mashups and overcome their limitations, we propose to regard the web as a database, where each data source is seen as a table, and a mashup is seen as a query over one or multiple sources. In other words, instead of developing a mashup as an application that accesses structured data through APIs, this art can be simplified by regarding a mashup as a query. For example, instead of developing a "program" to retrieve and fuse certain jobs from Google Base and Jobs.ac.uk, this program should be seen as a query over two remote sources. Query formulation (i.e., mashup development or data fusion) should be fast and should not require any programming skills.

Challenges: Before a user formulates a query on a data source, she needs to know how the data is structured and what the labels of the data elements are, i.e., the schema. Web users cannot be expected to investigate "what is the schema" each time they search or filter structured information. This issue is particularly difficult in the case of RDF and linked data. RDF data may come without a schema or ontology, and if one exists, the schema is mixed up with the data. In addition, as RDF data is a graph, one has to manually navigate this graph in order to formulate a query about it. Imagine large and multiple linked data sources, with diverse content and vocabularies: how would you manage to understand the data structure, inter-relationships, namespaces, and the unwieldy labels of the data elements? In short, formulating queries in open environments, where data structures and vocabularies are unknown in advance, is a hard challenge, and may hamper the building of data mashups by non-IT people.

To allow people to query and mash up data sources intuitively, we propose a data mashup language, called MashQL. The main novelty of MashQL is that it allows non-IT-skilled people to query and explore one or multiple RDF sources without any prior knowledge about the schema, structure, vocabulary, or any technical details of these sources. To be more robust and cover most cases in practice, we do not even assume that a data source has an offline or online schema or ontology at all. In the background, MashQL queries are translated into and executed as SPARQL queries.

Paper organization: Before presenting MashQL, in the next section we overview the art of query formulation, which has been studied by different research communities. We present MashQL in Section 3, and in Section 4 we introduce the notion of query pipes. The implementation of MashQL and three use cases are presented in Sections 5 and 6, respectively. The coverage and limitations of MashQL and its future directions are discussed in Section 7.

2. RELATED WORK
Several approaches have been proposed by the database community to query structured data sources, such as query-by-example [23] and conceptual queries [4,6,17]. However, none of these approaches has been adopted by casual users, because they still assume knowledge about the relational or conceptual schema. Among them, we found that ConQuer [4] has some nice features, especially the tree structure of its queries, but it also assumes that one starts from the schema. In the natural language processing community, it has been proposed to let people write queries as natural language sentences, and then translate these sentences into a formal language (SQL [16] or XQuery [15]). However, these approaches are challenged by language ambiguity and the "free mapping" between sentences and data schemes.

This topic has started to receive high importance within the Semantic Web community. Several approaches (GRQL [1], iSPARQL [11], NITELIGHT [19], and RDFAuthor [18]) propose to represent triple patterns graphically as ellipses connected with arrows. However, these approaches assume advanced knowledge of RDF and SPARQL. Other approaches use visual scripting languages (e.g., SPARQLMotion [21] and DERI Pipes [22]), visualizing links between query modules; but a query module is merely a window containing a SPARQL script in textual form. These approaches are inspired by industrial mashup editors such as Popfly, sMash, and Yahoo Pipes. These industry editors provide a nice visualization of APIs' interfaces and some operators between them. However, when a user needs to express a query over structured data, she needs to use the formal language of that editor, such as YQL for Yahoo Pipes. MashQL visualizes links between query modules similarly to Yahoo Pipes and other mashup editors, but the main purpose of MashQL is to help people formulate what is inside these query modules.

Differently from the above Web 2.0 mashup editors, a more sophisticated editor, called MashMaker, has been proposed in [8]. It is a functional programming environment that allows one to mash up web content in a spreadsheet-style user interface. Like a spreadsheet, MashMaker stores every value that is computed in a single, central data structure. MashMaker is not comparable with MashQL, since it cannot serve as a query language on its own.

In XML databases, the Lore query language [9] has been proposed to allow people to query XML data graphically, without prior knowledge about the data. Lore assumes that data is represented as a graph, called OEM, which is close to RDF. The difference between Lore and MashQL is not only intuitiveness and expressivity; essentially, MashQL does not assume the data graph to have a certain schema, whereas Lore assumes that a data graph has a DataGuide, a computed summary of the data that plays the role of a schema.

More about query formulation scenarios, and about which scenario is more intuitive for the casual user, can be found in a recent usability study [14]. It concluded that a query language should be close to natural language and graphically intuitive, and that it should not assume knowledge about the data source.

3. THE MASHQL LANGUAGE
The main goal of MashQL is to allow people to mash up and fuse data sources easily. In the background, MashQL queries are automatically translated into and executed as SPARQL queries. Without prior knowledge about a data source, one can navigate this source and fuse it with another source easily. To allow people to build on each other's results, MashQL supports query pipes as a built-in concept. The example below shows two web data sources and a SPARQL query to retrieve "the book titles authored by Lara and published after 2007". The same query in MashQL is shown in Figure 2.

  http://Site1.com/RDF:
    :a1 :Title "Web 2.0"
    :a1 :Author "Hacker B."
    :a1 :Year 2007
    :a1 :Publisher "Springer"
    :a2 :Title "Web 3.0"
    :a2 :Author "Smith B."
  http://Site2.com/RDF:
    :4 :Title "Semantic Web"
    :4 :Author "Tom Lara"
    :4 :PubYear 2005
    :5 :Title "Web services"
    :5 :Author "Bob Hacker"

  Query:
    PREFIX S1: <http://Site1.com/RDF#>
    PREFIX S2: <http://Site2.com/RDF#>
    SELECT ?ArticleTitle
    FROM <http://Site1.com/RDF>
    FROM <http://Site2.com/RDF>
    WHERE {
      {{?X S1:Title ?ArticleTitle} UNION {?X S2:Title ?ArticleTitle}}
      {{?X S1:Author ?X1} UNION {?X S2:Author ?X1}}
      {{?X S1:Year ?X2} UNION {?X S2:PubYear ?X2}}
      FILTER regex(?X1, "^Hacker")
      FILTER (?X2 > 2000)}

  Results:
    ArticleTitle
    Web 2.0

Figure 1. An example of a SPARQL query.

The first module specifies the query input, and the second module specifies the query body. The output can be piped into a third module (not shown here), which renders the results into a certain format (such as HTML, XML, or CSV), or as RDF input to other queries. Notice that in this way one can easily build a query that fuses the content of two sources in a linked manner [3].

Figure 2. An example of a MashQL query.

The intuition of MashQL is as follows. Each query Q is seen as a tree. The root of this tree is called the query subject (e.g., Article), denoted Q(S); it is the subject matter being inquired about. Each branch of the tree is called a restriction R and is used to restrict a certain property of the query subject: Q(S) ≔ R1 AND … AND Rn. Branches can be expanded into subtrees (called query paths), which enable one to navigate the underlying dataset; in this case, the object in a restriction is considered the subject of its subquery. A projection symbol can be used before a variable to indicate that it will be returned in the results¹. While interacting with the editor, the editor queries the dataset in the background in order to generate the next list depending on the previous selections; in this way, people can navigate a graph without prior knowledge about it.

Similar to SPARQL, all restrictions in MashQL are considered necessary when evaluating a query. However, if a restriction is prefixed with "maybe", it is considered optional; and if it is prefixed with "without", it is considered unbound (see Figure 4). MashQL also supports union (denoted "\") between objects, predicates, subjects, and queries, as well as a type operator ("Any"), inverse predicates, datatype and language tags, and many object filters.

  PREFIX a: <…>
  PREFIX S1: <…>
  SELECT ?SongTitle, ?AlbumName
  FROM <…>
  WHERE {
    ?Song S1:Title ?SongTitle.
    {{?Song S1:Duration ?X1} UNION {?Song a:Length ?X1}} FILTER (?X1 > 3).
    {{?Song S1:Artist S1:Shakira} UNION {?Song S1:Artist S1:AxelleRed}}
    OPTIONAL {?Song S1:Album ?AlbumName}.
    OPTIONAL {?Song S1:Copyright ?X2}. FILTER (!bound(?X2)).}

Figure 4. A query involving optional and negative restrictions.
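The translation of "maybe" and "without" restrictions into OPTIONAL and unbound filters can be sketched in a few lines of code. The tree model below (a subject plus a list of restrictions, each marked "must", "maybe", or "without") follows the description above; the function name, variable names, and emitted SPARQL layout are illustrative only, not MashQL's actual implementation:

```python
# Sketch: serializing a MashQL-style restriction list to a SPARQL string.
# "maybe" becomes OPTIONAL; "without" becomes OPTIONAL plus FILTER(!bound).
def to_sparql(subject, restrictions):
    """restrictions: list of (mode, property, var), mode in {'must','maybe','without'}."""
    patterns, filters = [], []
    for mode, prop, var in restrictions:
        triple = f"?{subject} {prop} ?{var}."
        if mode == "must":
            patterns.append(triple)
        elif mode == "maybe":                    # optional restriction
            patterns.append(f"OPTIONAL{{{triple}}}")
        elif mode == "without":                  # unbound restriction
            patterns.append(f"OPTIONAL{{{triple}}}")
            filters.append(f"FILTER (!bound(?{var}))")
    body = " ".join(patterns + filters)
    return f"SELECT ?{subject} WHERE {{ {body} }}"

q = to_sparql("Song", [
    ("must", "s1:Title", "Title"),
    ("maybe", "s1:Album", "Album"),
    ("without", "s1:Copyright", "X1"),
])
```

The generated string mirrors the shape of Figure 4: a plain pattern for the necessary restriction, an OPTIONAL block for "maybe", and an OPTIONAL block plus an unbound filter for "without".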
As Figure 3 shows, a query can retrieve the title of every article published after 2005 and written by an author who has an address, where this address has a country called Cyprus.

  PREFIX S1: <…>
  SELECT ?ArticleTitle
  FROM <http://www.example.com>
  WHERE {
    ?X1 rdf:type :Article.
    ?X1 S1:Title ?ArticleTitle.
    ?X1 S1:Year ?X2. FILTER (?X2 > 2005).
    ?X1 S1:Author ?X3.
    ?X3 S1:Address ?X4.
    ?X4 S1:Country ?X5. FILTER regex(?X5, "Cyprus")}

Figure 3. A query involving paths, and its mapping into SPARQL.

Formulating queries in MashQL is designed to be an interactive process, by which the complexity of understanding data structures is moved to the query editor; users express their queries using only drop-down lists. The query subject is selected from a list generated dynamically from either (1) the set of subject types in the dataset, or (2) the union of all subject and object identifiers in the dataset; users can also choose to (3) introduce their own label, in which case the label is seen as a variable and displayed in italics. The default subject is the variable "Anything". To add a restriction, a list of properties (e.g., Title, Author) is generated, depending on the chosen subject. Users may then select a filter (e.g., Equals, Contains, Between, etc.), or select an object identifier from a list, which is in turn generated from the set of possible object identifiers, given the previous selections. Furthermore, users can choose to expand the tree to declare a query path.

¹ Some issues are too lengthy to illustrate here. For example, when a user moves the mouse over a restriction, it enters editing mode and all other restrictions enter verbalize mode (i.e., all boxes and lists are made invisible, and a verbalization of their content is generated and displayed instead). This is not only to make the readability of the queries closer to natural language, but also to allow users to validate whether what they did is what they intended. The editor also detects and normalizes namespaces: it finds similar URLs and hides them when necessary. For example, when two properties originating from different data sources have the same URL, their namespaces are found and hidden.

4. THE NOTION OF QUERY PIPES
Deploying MashQL in an open world faces some challenges. This section overviews these challenges (from a query formulation viewpoint) and introduces the notion of query pipes. As discussed earlier, one may create a mashup and redirect its output to another mashup. We call a chain of queries that connect to each other in this way a pipe. Allowing people to formulate query pipes is not merely a visualization of links between query modules: when compiling a pipe (i.e., translating it into SPARQL), several issues have to be considered.

First: Translating MashQL into SPARQL SELECT statements is not enough, because a SELECT statement produces the results in a tabular form. To allow queries to input each other (especially for producing linked data), the results of a query should instead be formed as a graph. In SPARQL, the CONSTRUCT statement produces a graph, but one then needs to manually specify how this graph should be produced. To overcome this, we propose the construct CONSTRUCT *. This is not part of the standard SPARQL, but it has also been proposed by others for inclusion in the next version of the standard [20].
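The difference between tabular and graph-shaped results can be illustrated with a toy evaluator. This is plain Python with invented predicate names, purely illustrative; it only shows why a pipe needs the second form:

```python
# Sketch: SELECT-style evaluation returns rows, which cannot feed another
# RDF query; returning the matched triples keeps the output consumable
# by the next query in a pipe.
data = [
    (":b1", ":Title", "Linked Data"),
    (":b1", ":Year", 2007),
    (":b2", ":Title", "Old Report"),
    (":b2", ":Year", 1999),
]

def select_titles(triples, after):
    """Tabular result: one row per matching title binding."""
    recent = {s for (s, p, o) in triples if p == ":Year" and o > after}
    return [(o,) for (s, p, o) in triples if p == ":Title" and s in recent]

def construct_titles(triples, after):
    """Graph result: the triples involved in the matched conditions."""
    recent = {s for (s, p, o) in triples if p == ":Year" and o > after}
    return [t for t in triples if t[0] in recent]

rows = select_titles(data, 2000)      # rows of values, a dead end for piping
graph = construct_titles(data, 2000)  # triples, valid input to a next query
```

The `graph` value is itself a set of triples about `:b1`, so a downstream query can consume it exactly like a base source, which is the intent behind CONSTRUCT *.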
In MashQL, CONSTRUCT * means: retrieve all the triples that are involved in the query conditions and satisfy them. For example, suppose the query in Figure 2 is piped into another query; its CONSTRUCT * translation will retrieve {<:b1 :Title "Linked Data">, <:b1 :Author "Lara T.">, <:b1 :Year 2007>}. When compiling a pipe of queries, if the output of a query is directed as input to another query, a CONSTRUCT * statement is generated; otherwise, a SELECT statement is generated.

Second: When executing a SPARQL query, all query engines assume that the queried data is stored locally; otherwise, the data must be downloaded and stored at the engine side before execution starts. The time complexity of executing a query on local data is usually small²; the bottleneck is rather the downloading time. When the input of a query is the output of another query (i.e., in the case of query pipes), the problem becomes even more difficult, as queries will be calling each other. Furthermore, it is also possible that users (intentionally or by mistake) end up with query loops (e.g., Q1→Q2→Q3→Q1), which may cause computational overheads.
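Detecting such loops amounts to a cycle check over the query-dependency graph. The helper below is a hypothetical sketch of such a check, not MashQL's code; an editor could run it before accepting a new query input:

```python
# Sketch: reject pipes that close a loop (e.g. Q1 -> Q2 -> Q3 -> Q1).
def has_cycle(inputs):
    """inputs: query -> list of queries it reads from (base sources omitted)."""
    visiting, done = set(), set()
    def visit(q):
        if q in done:
            return False
        if q in visiting:
            return True              # back-edge: a loop was closed
        visiting.add(q)
        cyclic = any(visit(d) for d in inputs.get(q, []))
        visiting.discard(q)
        done.add(q)
        return cyclic
    return any(visit(q) for q in inputs)

looping = has_cycle({"Q1": ["Q2"], "Q2": ["Q3"], "Q3": ["Q1"]})   # True
acyclic = has_cycle({"Q1": ["Q2"], "Q2": ["Q3"], "Q3": []})       # False
```

A pipe, as defined below, is required to be acyclic, so a check of this kind is enough to keep query chains well-founded.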
To face these challenges, MashQL allows users to materialize the results of their queries/pipes and to decide their refreshing strategies, as follows. The results of a query (called a derived source) are stored physically and deployed as a concrete RDF source. Primal input sources (called base sources) are also cached for performance purposes. Given a query Q over a set of base or derived sources {D1,..,Dm}, the result of this query is denoted D = Q(D1,..,Dm), with D ∉ {D1,..,Dm}. We define a pipe as an acyclic chain of queries, where the result of a query is an input to the next. The chain of queries that derives D is denoted as the pipe P(D).

We call the problem of keeping a pipe up to date the pipe's consistency. Let D be the result of a query Q(D1,..,Dm), and T the latest time at which the set {D1,..,Dm} has changed. Then D is consistent at T if D = Q(D1,..,Dm). To maintain pipe consistency, two updating strategies are used: query auto-refresh and pipe auto-refresh. MashQL maintains, for each base or derived source D, a timestamp of its last update (RDT) and an auto-refresh time interval (RDA); and, for each query Q, a timestamp of its previous successful execution (RQT) and an auto-refresh interval (RQA).

Query auto-refresh: Each query is automatically re-executed if its auto-refresh interval has expired and one of its inputs has been updated. Formally, let Qi be a query over a set of sources {D1,..,Dm}, and T a given time. Qi is re-executed if (RQiT + RQiA) ≤ T and (RQiT < RDjT) for some 1 ≤ j ≤ m.

Pipe auto-refresh: Each pipe P(D) is automatically refreshed if RDA expires, which implies re-executing the chain of queries in this pipe. Let P(D) be a pipe with D = Qn(D1,..,Dm), and T a given time. If (RDT + RDA) ≤ T, then each ith query Qi in P(D) is executed if (RQiT < RDjT), where 1 ≤ j ≤ m for Qi, and 1 ≤ i ≤ n. Queries in P(D) are executed from the bottommost to the topmost, or recursively as P(P(D1),…,P(Dm)).

As argued in the data warehousing literature [2,24], an efficient refreshing strategy is incremental updating, which suggests that if a base source receives new transactions, only these transactions are transformed and the affected queries refreshed. This strategy is still an open research issue for RDF in an open world [7], because RDF data and queries are developed and maintained autonomously by different people.

5. IMPLEMENTATION
First: We have developed an online mashup editor, which will be publicly available next month. Similar to creating feed mashups in Yahoo Pipes, MashQL users can query and fuse data sources, and the output of their queries can be redirected as input to other queries/pipes. In the background, Oracle 11g is used for storing and querying RDF. When a user specifies a data source as input, it is bulk-loaded into Oracle's semantic technology tables. MashQL queries are also translated into Oracle's SPARQL. While a user interacts with the editor to formulate a query, the editor performs background queries through AJAX. Each published query is given a URL; calling this URL means executing the query and getting its results back.

Second: We have also started to develop a Firefox add-on that allows people to develop mashups at the client side. The pages opened in the browser tabs are automatically selected as input sources, and a mashup can be created in a left-side panel; the results are rendered by the browser in a new tab. The idea is to allow web pages that embed RDF triples (as RDFa or microformats) to be queried and mashed up. For example, one will be able to compose his publication list from Google Scholar, DBLP, ACM, and CiteSeer; or filter all video lectures given by Berners-Lee from YouTube and VideoLectures. Because the mentioned web sites do not support RDFa yet, one can mine/distil the RDF triples using third-party services such as triplr.org, buzzword.org.uk, wandora.org, or Dapper.

² A query of medium complexity over a large dataset takes one or a few seconds [5].

6. USE CASES
In this section we present three hypothetical use cases to illustrate the use of MashQL for developing data mashups.

6.1 Use case: Retailer
Fnac is a large retailer of cultural and consumer electronics products. When a new product arrives at Fnac, it has to be entered into the inventory database. This is usually done by scanning the barcode on each product and then manually filling in the product specifications. Furthermore, as Fnac trades in many countries, product specifications have to be translated into several languages. To save the time of entering and translating this information manually, Fnac decided to reuse the product specifications (and their translations) that are produced at the factory side. For example, suppose Fnac received three packages from Canon, Alfred, and IMDB.

Figure 6. A mashup of product titles from different sources.
Fnac would like to scan the barcodes of the received products and then get their specifications directly from the online catalogues of those suppliers. Figure 5 shows samples of the online product catalogues of the three suppliers (we assume they are published in RDFa). Figure 6 illustrates a query that Fnac built to look up the multilingual titles of three products. This query is a mashup of three RDF data sources with a user input of three barcode numbers. The query takes each of these barcodes and finds the English and French titles. Notice that Fnac assumed that the short titles provided by Canon are in English; thus, they are joined with the other titles that are tagged with "@en". The retrieved results are shown in Figure 8. In the same way, a barcode reader could be connected to the user-input module, to retrieve the specifications (which could be stored at the supplier side) each time a product is scanned.

  http://www.cannon/products/rdf:
    _:P1 :ShortName "CanScan 4400F"
    _:P1 :FullName "Canon CanoScan 4400F Color Image Scanner"
    _:P1 :Producer "Canon"
    _:P1 :ShippingWeight "4 pounds"
    _:P1 :Barcode 9780133557022
    _:P2 :ShortName "PowerShot SD100"
    _:P2 :FullName "Canon PowerShot SD1000 7.1MP Camera 3x Zoom"
    _:P2 :Producer "Canon"
    _:P2 :ShippingWeight "2 pounds"
    _:P2 :Barcode 9781143557532
  http://www.alfred.com/books:
    <:B1> :Type <:Book>
    <:B1> :Title "The Prophet"@en
    <:B1> :Title "Le prophète"@fr
    <:B1> :BCode 8765422097653
    <:B1> :Authors "Kahlil Gibran"
    <:B1> :ISBN-10 0394404289
    <:B3> :Type <:Book>
    <:B3> :Title "Alfred Nobel"@en
    <:B3> :Title "Alfred Nobel"@fr
    <:B3> :BCode 75639898123
    <:B3> :Authors "Kenne Fant"
    <:B3> :ISBN-10 0531123286
  http://www.imdb.com/movies:
    _:1 rdf:Type <:Movie>
    _:1 :Title "All about my mother"@en
    _:1 :Title "Tout sur ma mère"@fr
    _:1 :ProdCode 3248765355133
    _:1 :NumberOfDiscs 1
    _:2 rdf:Type <:Movie>
    _:2 :Title "Lords of the rings"@en
    _:2 :Title "Seigneur des anneaux"@fr
    _:2 :ProdCode 4852834058083
    _:2 :NumberOfDiscs 3

Figure 5. Samples of RDF data about products.

  PREFIX s1: <http://www.cannon/products/rdf#>
  PREFIX s2: <http://www.alfred.com/books#>
  PREFIX s3: <http://www.imdb.com/movies#>
  SELECT ?Barcode ?EnglishTitle ?FrenchTitle
  FROM <http://www.cannon/products/rdf>
  FROM <http://www.alfred.com/books>
  FROM <http://www.imdb.com/movies>
  WHERE {
    {{?x s1:Barcode ?Barcode} UNION {?x s2:BCode ?Barcode}
     UNION {?x s3:ProdCode ?Barcode}}
    FILTER (regex(?Barcode, "9781143557532") ||
            regex(?Barcode, "8765422097653") ||
            regex(?Barcode, "3248765355133")).
    {{OPTIONAL {?x s1:ShortName ?EnglishTitle}} UNION
     {{{OPTIONAL {?x s2:Title ?EnglishTitle}} UNION
       {OPTIONAL {?x s3:Title ?EnglishTitle}}}
      FILTER (lang(?EnglishTitle) = "en")}}
    {{{OPTIONAL {?x s2:Title ?FrenchTitle}} UNION
      {OPTIONAL {?x s3:Title ?FrenchTitle}}}
     FILTER (lang(?FrenchTitle) = "fr")}}

Figure 7. The SPARQL equivalent of Figure 6.

  Barcode        EnglishTitle         FrenchTitle
  9781143557532  PowerShot SD100
  8765422097653  The Prophet          Le prophète
  3248765355133  All about my mother  Tout sur ma mère

Figure 8. Retrieved product titles.

6.2 Use case: Citations List
Bob would like to compile the list of articles that cited his articles (excluding self-citations). He built a mashup using MashQL to mix his citations retrieved from both Google Scholar and CiteSeer, and then filter out the self-citations. First, he performed a keyword search ("Bob Hacker") on both Google Scholar and CiteSeer³. Figure 9 shows a sample of the extracted RDF triples. Bob's MashQL query is shown in Figure 10, and its SPARQL equivalent in Figure 11. In this query, Bob wrote: retrieve every article that has a title (call it CitingArticle) and has an author that does not contain "Bob Hacker" or "Hacker B.", and that cites another article that has a title (call it CitedArticle) and has an author that contains "Bob Hacker" or "Hacker B.". Figure 12 shows the result of this query.

³ Similar to the previous use case, we assume that both Google Scholar and CiteSeer render their search results in RDFa (i.e., the RDF triples are embedded in the HTML), as many companies have started to do nowadays. Alternatively, Bob can use a third-party service (e.g., triplify.org) to extract triples from the HTML pages.

  http://scholar.google.com/scholar?q=bob+Hacker:
    :Title "Prostate Cancer"
    :Author "Hacker B., Hacker A."
    :Title "Best and Worst Lifestyles"
    :Author "Bob Hacker"
    :Cites
    :Title "Protein Categories"
    :Author "Bob Smith"
    :Cites
    :Cites
    :Title "Cancer Vaccines"
    :Author "Alice Hacker"
    :Cites
  http://www.citeseer.com/search?s="Bob Hacker":
    _:1 :Title "Prostate Cancer"
    _:1 :Author "Hacker B., Hacker A."
    _:2 :Title "Protocols in Molecular Biology"
    _:2 :Author "Bob Hacker"
    _:2 :ArticleCited _:1
    _:3 :Title "Cancer Vaccines"
    _:3 :Author "Eve Lee, Bob Hacker"
    _:4 :Title "Overview about Systems Biology"
    _:4 :Author "Tom Lara"
    _:4 :ArticleCited _:1
    _:4 :ArticleCited _:2

Figure 9. Sample of RDF data about Bob's articles.

Figure 10. A mashup of citations from different sites.

6.3 Use case: Job Seeking
Bob has a PhD in bioinformatics. He is looking for a full-time, well-paid, research-oriented job in certain European countries. He spent an enormous amount of time searching different job portals, trying many keywords and filters each time. Instead, Bob used MashQL to find the job that meets his specific preferences. Figure 13 shows Bob's queries on Google Base and on Jobs.ac.uk. First, he visited Google Base and performed a keyword search (bioinformatics OR "computational biology" OR "systems biology" OR e-health); he copied the link of the retrieved results (which are rendered in RDFa) into the RDFInput module, and then created a MashQL query on these results. He performed a similar task to query Jobs.ac.uk. The third MashQL module in Figure 13 mixes the results of the above two queries and filters them based on location preferences (provided in the UserInput module). The SPARQL equivalent of Bob's MashQL query is shown in Figure 14.
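Under the simplifying assumption that the two portals' results have already been extracted into records, Bob's pipe boils down to two per-source filters and a location filter over their union. The sketch below uses invented field names and sample records; it only mirrors the shape of the pipe, not the RDF machinery:

```python
# Sketch of the Section 6.3 mashup: filter each portal's results, merge,
# then filter the union by location. Records and field names are invented.
google_base = [
    {"title": "Research Fellow", "industry": "HealthCare", "type": "Full-Time",
     "currency": "Euro", "salary": 80000, "location": "Belgium"},
    {"title": "Sales Rep", "industry": "Retail", "type": "Full-Time",
     "currency": "Euro", "salary": 90000, "location": "UK"},
]
jobs_ac_uk = [
    {"title": "Lecturer", "category": "BioSciences", "role": "Research\\Academic",
     "currency": "UKP", "salary": 52000, "location": "UK"},
]

# First module: Google Base preferences (industry, type, salary range).
q1 = [j for j in google_base
      if j["industry"] in ("Education", "HealthCare")
      and j["type"] in ("Full-Time", "Fulltime", "Contract")
      and j["currency"] == "Euro" and 75000 <= j["salary"] <= 120000]

# Second module: Jobs.ac.uk preferences (category, role, minimum salary).
q2 = [j for j in jobs_ac_uk
      if j["category"] in ("Health", "BioSciences")
      and j["role"] == "Research\\Academic"
      and j["currency"] == "UKP" and j["salary"] > 50000]

# Third module: location filter over the merged results.
preferred = ("UK", "Belgium", "Germany", "Austria", "Holland")
jobs = [j["title"] for j in q1 + q2 if j["location"] in preferred]
```

The three list comprehensions correspond to the three MashQL modules of Figure 13; in the real pipe, the first two would be CONSTRUCT * queries whose graphs feed the final SELECT.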
  PREFIX s1: <http://scholar.google.com/scholar?q=bob+Hacker>
  PREFIX s2: <http://www.citeseer.com/search?s="Bob Hacker">
  SELECT ?CitingArticle ?CitedArticle
  FROM <http://scholar.google.com/scholar?q=bob+Hacker>
  FROM <http://www.citeseer.com/search?s="Bob Hacker">
  WHERE {
    {{?X1 s1:Title ?CitingArticle} UNION {?X1 s2:Title ?CitingArticle}}
    {{?X1 s1:Author ?X2} UNION {?X1 s2:Author ?X2}}
    {{?X1 s1:Cites ?X3} UNION {?X1 s2:ArticleCited ?X3}}
    {{?X3 s1:Title ?CitedArticle} UNION {?X3 s2:Title ?CitedArticle}}
    {{?X3 s1:Author ?X4} UNION {?X3 s2:Author ?X4}}
    FILTER (regex(?X4, "^Bob Hacker") || regex(?X4, "^Hacker B."))
    FILTER (!(regex(?X2, "^Bob Hacker") || regex(?X2, "^Hacker B.")))}

Figure 11. The SPARQL equivalent of Figure 10.

  CitingArticle                   CitedArticle
  Protein Categories              Prostate Cancer
  Protein Categories              Best and Worst Lifestyles
  Cancer Vaccines                 Prostate Cancer
  Overview about Systems Biology  Prostate Cancer
  Overview about Systems Biology  Protocols in Molecular Biology

Figure 12. The query results.

Figure 13. Bob's mashup of jobs.

  CONSTRUCT *
  WHERE {
    ?Job :JobIndustry ?X1;
         :Type ?X2;
         :Currency ?X3;
         :Salary ?X4.
    FILTER (?X1 = "Education" || ?X1 = "HealthCare")
    FILTER (?X2 = "Full-Time" || ?X2 = "Fulltime" || ?X2 = "Contract")
    FILTER (?X3 = "^Euro" || ?X3 = "^€")
    FILTER (?X4 >= 75000 && ?X4 <= 120000)}

  CONSTRUCT *
  WHERE {
    ?Job :Category ?X1;
         :Role ?X2;
         :SalaryCurrency ?X3;
         :SalaryLower ?X4.
    FILTER (?X1 = "Health" || ?X1 = "BioSciences")
    FILTER (?X2 = "Research\Academic")
    FILTER (?X3 = "UKP")
    FILTER (?X4 > 50000)}

  SELECT ?Job
  WHERE {
    ?Job :Location ?X1
    FILTER (?X1 = "^UK" || ?X1 = "^Belgium" || ?X1 = "^Germany" ||
            ?X1 = "^Austria" || ?X1 = "^Holland")}

Figure 14. The SPARQL equivalent of Figure 13.

7. DISCUSSION AND FUTURE DIRECTIONS
This article proposed a language that allows people to query and mash up structured data without any prior knowledge about the schema, structure, vocabulary, or technical details of this data. Not only can non-IT experts use MashQL; professionals can also use it to build advanced queries.

MashQL supports all constructs of the W3C standard SPARQL, except the NAMED GRAPH construct, which is introduced for advanced use, i.e., switching between different graphs within the same query. To be close to users' needs and intuition, we defined new constructs (e.g., OneOf, union "\", Without, Any, reverse "~", and others). These constructs are not directly supported in SPARQL, but they are emulated. We plan to include aggregation and grouping functions, especially as they are supported by Oracle's SPARQL.

Yet, MashQL does not support inferencing constructs (such as SubClass or SubProperty), which are indeed useful for data fusion. As these constructs are expensive to compute (and would thus hurt the interactivity of MashQL), we plan to replace the Oracle semantic technology that we currently use as an RDF store with an RDF index that we are developing, for speedy OWL inferencing.

We have downloaded most of the public RDF sources, on which our MashQL editor will be deployed online next month. Not only will people benefit from this, but we will also have the opportunity to better evaluate the usability of MashQL and its contribution to linking and fusing more data bottom-up.

Acknowledgement
We are indebted to Dr. George Pallis, Dr. Demetris Zeinalipour, and other colleagues for their valuable comments and feedback on early drafts of this paper. This research is partially supported by the SEARCHiN project (FP6-042467, Marie Curie Actions).

REFERENCES
[1] Athanasis N, Christophides V, Kotzinos D: Generating On the Fly Queries for the Semantic Web. ISWC (2004)
[2] Abiteboul S, Duschka O: Complexity of Answering Queries Using Materialized Views. ACM SIGACT-SIGMOD-SIGART (1998)
[3] Bizer C, Heath T, Berners-Lee T: Linked Data: Principles and State of the Art. WWW (2008)
[4] Bloesch A, Halpin T: Conceptual Queries using ConQuer-II. (1997)
[5] Chong E, Das S, Eadon G, Srinivasan J: An Efficient SQL-based RDF Querying Scheme. VLDB (2005)
[6] Czejdo B, Elmasri R, Rusinkiewicz M, Embley D: An Algebraic Language for Graphical Query Formulation Using an EER Model. ACM Computer Science Conference (1987)
[7] Deng Y, Hung E, Subrahmanian VS: Maintaining RDF Views. Tech. Rep. CS-TR-4612, University of Maryland (2004)
[8] Ennals R, Garofalakis M: MashMaker: Mashups for the Masses. SIGMOD (2007)
[9] Goldman R, Widom J: DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. VLDB (1997)
[10] Hofstede A, Proper H, Weide T: Computer Supported Query Formulation in an Evolving Context. Australasian DB Conference (1995)
[11] http://demo.openlinksw.com/isparql (Feb. 2009)
[12] Jarrar M, Dikaiakos M: MashQL: A Query-by-Diagram Topping SPARQL. Proceedings of the ONISW'08 workshop (2008)
[13] Jarrar M, Dikaiakos M: A Query-by-Diagram Language (MashQL). Technical Article TAR200805, University of Cyprus (2008). http://www.cs.ucy.ac.cy/~mjarrar/JD08.pdf
[14] Kaufmann E, Bernstein A: How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users? ISWC (2007)
[15] Li Y, Yang H, Jagadish H: NaLIX: An Interactive Natural Language Interface for Querying XML. SIGMOD (2005)
[16] Popescu A, Etzioni O, Kautz H: Towards a Theory of Natural Language Interfaces to Databases. 8th Conference on Intelligent User Interfaces (2003)
[17] Parent C, Spaccapietra S: About Complex Entities, Complex Objects and Object-Oriented Data Models. Information System Concepts (1989)
[18] http://rdfweb.org/people/damian/RDFAuthor (Jan. 2009)
[19] Russell A, Smart R, Braines D, Shadbolt N: NITELIGHT: A Graphical Tool for Semantic Query Construction. Semantic Web User Interaction Workshop (2008)
[20] http://esw.w3.org/topic/SPARQL/Extensions (Feb. 2009)
[21] http://www.topquadrant.com/sparqlmotion (Feb. 2009)
[22] Tummarello G, Polleres A, Morbidoni C: Who the FOAF Knows Alice? A Needed Step Toward Semantic Web Pipes. ISWC Workshops (2007)
[23] Zloof M: Query-by-Example: A Data Base Language. IBM Systems Journal, 16(4) (1977)
[24] Zhuge Y, Garcia-Molina H, Hammer J, Widom J: View Maintenance in a Warehousing Environment. SIGMOD (1995)