The Linked Data Benchmark Council (LDBC)

Irini Fundulaki¹, Josep Larriba Pey², David Dominguez-Sal², Ioan Toma³, Dieter Fensel³, Barry Bishop⁴, Thomas Neumann⁵, Orri Erling⁶, Peter Neubauer⁷, Paul Groth⁸, Frank van Harmelen⁸, and Peter Boncz⁸

¹ Foundation for Research and Technology - Hellas
² Universitat Politècnica de Catalunya
³ Universitaet Innsbruck
⁴ Ontotext AD
⁵ Technische Universität München
⁶ OpenLink Software
⁷ Neo Technology
⁸ Vrije Universiteit Amsterdam

1 Introduction

In recent years we have seen an explosion of graph-shaped data coming from a variety of applications related to social networks (Facebook, Twitter, blogs and other on-line media) and telecommunication networks. Furthermore, the W3C Linking Open Data Initiative [8] has boosted the publication and interlinkage of a large number of datasets on the Semantic Web [2], resulting in the Linked Open Data Cloud. Datasets with billions of RDF triples, such as Wikipedia [5], the U.S. Census Bureau [4], the CIA World Factbook [1], DBPedia [5], and government sites¹, have been created and published online. Moreover, numerous datasets and vocabularies from e-science are now published as RDF graphs, most notably in the life and earth sciences and astronomy [6][7][3], in order to facilitate community annotation and interlinkage of both scientific and scholarly data of interest.

Technology and bandwidth now provide the opportunities for compiling, publishing and sharing massive Linked Data datasets. A significant number of commercial semantic repositories (i.e., RDF databases with a reasoner and query engine), which are the cornerstone of the Semantic Web, exist. Nevertheless, at present there is no:

– comprehensive suite of benchmarks that encourages the advancement of technology by providing both academia and industry with clear targets for performance and functionality.
– independent authority for developing benchmarks and verifying the results of those engines.

The same holds for the emerging field of NoSQL graph databases, which share with RDF a graph data model and pattern- and path-oriented query languages.

¹ http://data.gov.uk/, http://www.data.gov/

The Linked Data Benchmark Council (LDBC) project aims to provide a solution to this problem by providing insight into the critical properties of graph and RDF data management technology, and by stimulating progress through competition. This is timely and urgent, since non-relational data management is emerging as a critical need for the new data economy based on large, distributed, heterogeneous, and complexly structured data sets. This new data management paradigm also provides an opportunity for research results to have an impact on young, innovative companies working on RDF and graph data management, enabling them to start playing a significant role in this new data economy.

2 Objectives & Outcomes

The main technical objective of the Linked Data Benchmark Council (LDBC) is the development of benchmarks for different technology areas, including core data management (query processing, query optimisation, transactions), graph analysis, data integration and reasoning. More specifically, LDBC aims at the:

– development of new benchmarks that will spur research and industry progress in large-scale graph and RDF data management. This includes setting challenges that will lead to significant progress in:
  • scalability, storage, indexing and query optimisation techniques for RDF and graph database solutions beyond Terabyte scales,
  • quantitative and qualitative assessment of different solutions for data integration, and
  • computationally cheaper reasoning in RDF engines.
– establishment of an industry-neutral entity, the LDBC Foundation, for developing graph and RDF benchmarks, auditing benchmark results, and publishing audited results.
  The LDBC Foundation will work in the same spirit as the Transaction Processing Performance Council (TPC), which has established a set of benchmarks for relational database management systems that is widely accepted by industry. It will be responsible for:
  • specifying benchmarks and benchmarking procedures, and verifying and publishing results.
  • providing a TPC-style auditing service for certifying results published by vendors for benchmarks endorsed by LDBC.
  • training auditors for its benchmarks, creating a long-lasting business model for auditing benchmark results.

3 Target Audiences

The target audiences of LDBC, which will form the core of the LDBC Foundation and will benefit from and use the project results in the areas of technology, market and education, are:

– Technology Users: This group includes both private and commercial users of RDF and graph databases who will use or integrate this technology for the benefits it has over traditional relational database management techniques.
– Researchers: This category includes a broad range of researchers, from those who focus on graph-shaped data representations, query languages and optimisations, all the way to researchers from other fields who use this technology.
– Technology Vendors: This group is made up of commercial developers of RDF and graph database software components. It includes vendors who sell the software they produce as well as those who sell only services around their (open-source) products.

References

1. CIA World Factbook. https://www.cia.gov/library/publications/the-world-factbook/index.html.
2. T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 2001.
3. The UMD Astronomy Information and Knowledge Group. Astronomy Ontology in OWL. archive.astro.umd.edu.
4. The 2000 US Census. www.rdfabout.com/demo/census.
5. DBPedia. www.dbpedia.org.
6. Gene Ontology. www.geneontology.org.
7. UniProtRDF. dev.isb-sib.ch/projects/uniprot-rdf.
8. W3C Linking Open Data. esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData.