The Linked Data Benchmark Council (LDBC)

Irini Fundulaki (1), Josep Larriba Pey (2), David Dominguez-Sal (2),
Ioan Toma (3), Dieter Fensel (3), Barry Bishop (4),
Thomas Neumann (5), Orri Erling (6), Peter Neubauer (7),
Paul Groth (8), Frank van Harmelen (8), and Peter Boncz (8)

(1) Foundation for Research and Technology - Hellas
(2) Universitat Politècnica de Catalunya
(3) Universitaet Innsbruck
(4) Ontotext AD
(5) Technische Universität München
(6) OpenLink Software
(7) Neo Technology
(8) Vrije Universiteit Amsterdam


1     Introduction

In recent years we have seen an explosion of massive amounts of graph-shaped
data coming from a variety of applications related to social networks
(Facebook, Twitter, blogs and other on-line media) and telecommunication
networks. Furthermore, the W3C Linking Open Data Initiative [8] has boosted
the publication and interlinkage of a large number of datasets on the Semantic
Web [2], resulting in the Linked Open Data Cloud. Datasets with billions of
RDF triples, such as Wikipedia [5], the U.S. Census Bureau [4], the CIA World
Factbook [1], DBPedia [5], and government sites¹, have been created and
published online. Moreover, numerous datasets and vocabularies from e-science
are now published as RDF graphs, most notably in the life and earth sciences
and astronomy [6][7][3], in order to facilitate community annotation and
interlinkage of both scientific and scholarly data of interest. Technology and
bandwidth now provide the opportunities for compiling, publishing and sharing
massive Linked Data datasets. A significant number of commercial semantic
repositories (i.e., RDF databases with a reasoner and query engine), which are
the cornerstone of the Semantic Web, already exist.
    Nevertheless, at the present time there is no:

    – comprehensive suite of benchmarks that encourage the advancement of tech-
      nology by providing both academia and industry with clear targets for per-
      formance and functionality.
    – independent authority for developing benchmarks and verifying the results
      of those engines. The same holds for the emerging field of NoSQL graph
      databases, which share with RDF a graph data model and pattern- and
      path-oriented query languages.

¹ http://data.gov.uk/, http://www.data.gov/

The Linked Data Benchmark Council (LDBC) project aims to provide a solution
to this problem by giving insight into the critical properties of graph and
RDF data management technology, and by stimulating progress through
competition. This is timely and urgent, since non-relational data management
is emerging as a critical need for the new data economy based on large,
distributed, heterogeneous, and complexly structured data sets. This new data
management paradigm also provides an opportunity for research results to help
young, innovative companies working on RDF and graph data management start
playing a significant role in this new data economy.


2     Objectives & Outcomes

The main technical objective of the Linked Data Benchmark Council (LDBC)
is the development of benchmarks for different technology areas including core
data management (query processing, query optimisation, transactions), graph
analysis, data integration and reasoning. More specifically, LDBC aims at the:

    – development of new benchmarks that will spur research and industry progress
      in large-scale graph and RDF data management. This includes setting chal-
      lenges that will lead to significant progress in:
       • scalability, storage, indexing and query optimization techniques for RDF
         and graph database solutions beyond terabyte scales,
       • quantitative and qualitative assessment of different solutions for
         data integration, and
       • computationally cheaper reasoning in RDF engines.
    – establishment of an industry-neutral entity, the LDBC Foundation, for
      developing graph and RDF benchmarks, auditing benchmark results, and
      publishing audited results. The LDBC Foundation will work in the same
      spirit as the Transaction Processing Performance Council (TPC), which has
      established a set of benchmarks for relational database management
      systems that is widely accepted by industry. It will be responsible for:
       • specifying benchmarks and benchmarking procedures, and verifying and
         publishing results.
       • providing a TPC-style auditing service for certifying results published
         by vendors for benchmarks endorsed by LDBC.
       • training auditors for its benchmarks, thereby creating a long-lasting
         business model for auditing benchmark results.


3     Target Audiences

The target audiences of LDBC, which will form the core of the LDBC Foundation
and will benefit from and use the project results in the areas of technology,
market and education, are:
 – Technology Users: This group includes both private and commercial users of
   RDF and graph databases that will use or integrate this technology for the
   benefits it has over traditional relational database management techniques.
 – Researchers: This category includes a broad range of researchers from those
   who focus on graph-shaped data representations, query languages and opti-
   misations, all the way to researchers from other fields who use this technol-
   ogy.
 – Technology Vendors: This group is made up of commercial developers of
   RDF and graph database software components. It includes vendors who sell
   the software they produce as well as those who sell only services around their
   (open-source) products.


References
1. CIA World Factbook. https://www.cia.gov/library/publications/the-world-factbook/index.html.
2. T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific
   American, 2001.
3. The UMD Astronomy Information and Knowledge Group. Astronomy Ontology in
   OWL. archive.astro.umd.edu.
4. The 2000 US Census. www.rdfabout.com/demo/census.
5. DBPedia. www.dbpedia.org.
6. Gene Ontology. www.geneontology.org.
7. UniProtRDF. dev.isb-sib.ch/projects/uniprot-rdf.
8. W3C Linking Open Data. esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData.