<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ParlBench: a SPARQL-benchmark for electronic publishing applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tatiana Tarasova</string-name>
          <email>T.Tarasova@uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maarten Marx</string-name>
          <email>maartenmarx@uva.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISLA, University of Amsterdam</institution>
          ,
          <addr-line>Science Park 904, 1098 XH Amsterdam</addr-line>
        </aff>
      </contrib-group>
      <abstract>
<p>ParlBench is a scalable RDF benchmark modelling a large scale electronic publishing scenario. The benchmark offers large collections of the Dutch parliamentary proceedings together with information about members of the parliament and political parties. The data is real, but free of intellectual property rights issues. On top of the benchmark data sets, several realistic application benchmarks as well as targeted micro benchmarks can be developed. This paper describes the benchmark data sets and 28 analytical queries covering a wide range of SPARQL constructs. The potential use of ParlBench is demonstrated by executing the query set for 8 different scalings of the benchmark data sets on the Virtuoso RDF store. Measured on a standard laptop, data loading times varied from 43 seconds (for 1% of the data set) to 48 minutes (for the complete data set), and execution of the complete set of queries (1520 queries in total) varied from 9 minutes to 13 hours.</p>
      </abstract>
      <kwd-group>
        <kwd>SPARQL</kwd>
        <kwd>RDF benchmark</kwd>
        <kwd>parliamentary proceedings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
<p>RDF stores are the backbone of RDF data driven applications. There is a wide range of RDF store systems available1, together with various benchmark systems2 to assess the performance of these systems.</p>
      <p>
As discussed in the Benchmark Handbook [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], different applications impose different requirements on a system, and the performance of a system may vary from one application domain to another. This creates the need for domain-specific benchmarks. The existing application benchmarks for RDF store systems often employ techniques developed by the Transaction Processing Performance Council [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (TPC) for relational databases and use synthetically generated data sets for their workloads. However, performance characteristics for loading and querying such data may differ from those measured on real-life data sets, as was shown by the DBpedia benchmark [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] on DBpedia [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. To the best of our knowledge, among the existing benchmarks for RDF store systems, only the DBpedia benchmark provides a real data set.
      </p>
<p>With this work we propose the ParlBench application benchmark that closely mimics a real-life scenario: large scale electronic publishing with OLAP-type queries. ParlBench consists of (1) real-life data and (2) a set of analytical queries developed on top of these data.</p>
      <p>This work was supported by the EU FP7 (FP7/2007-2013) project ENVRI (grant number 283465).</p>
      <sec id="sec-1-1">
        <title>1 http://www.w3.org/wiki/LargeTripleStores</title>
      </sec>
      <sec id="sec-1-2">
        <title>2 http://www.w3.org/wiki/RdfStoreBenchmarking</title>
<p>The benchmark data sets include the Dutch parliamentary proceedings, political parties and politicians. The ParlBench data fit very well the desiderata of Gerhard Weikum's recent SIGMOD blog post (http://wp.sigmod.org/?p=786): it is open, big, real, useful, linked to other data sources, mixing data values and free text, and comes with a number of real-life workloads.</p>
        <p>
The queries in the benchmark can be viewed as coming from one of two use cases: creating a report or performing scientific research. As an example of the latter, consider the question whether the performance of males and females in parliament differs, and how that has changed over the years. To enable more comprehensive analysis of RDF stores' performance, we grouped the benchmark queries into four micro benchmarks [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] with respect to their analytical aims: Average, Count, Factual and Top 10.
        </p>
<p>The paper is organized as follows. Section 2 gives an overview of related work. Section 3 describes the benchmark data sets. In Section 4 we define the benchmark queries and present the micro benchmarks. The evaluation of ParlBench on the Virtuoso RDF store is discussed in Section 5.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
There are a number of RDF store benchmarks available; the most relevant to our work are discussed below. The Berlin SPARQL Benchmark (BSBM) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] implements an e-commerce application scenario. Similarly to ParlBench, BSBM employs TPC [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] techniques, such as query permutations (for the Business Intelligence use case) and system ramp-up.
      </p>
      <p>
The SPARQL Performance Benchmark (SP2Bench) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is set in the DBLP scenario. SP2Bench queries are carefully designed to test the behavior of RDF stores with respect to common SPARQL constructs, different operator constellations and RDF access patterns. SP2Bench measures query response time in cold-run settings, i.e., query execution time is measured immediately after the server is started.
      </p>
      <p>
Both BSBM and SP2Bench use synthetically generated data sets, whereas the DBpedia SPARQL Benchmark (DBPSB) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] uses a real data set, DBpedia [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In addition to using a real data set, DBPSB uses real queries that were issued by humans and applications against DBpedia. These queries cover most of the SPARQL features and enable comprehensive analysis of RDF stores' performance on single features as well as on combinations of features. The main difference between ParlBench and DBPSB is that the latter was not developed with a particular application in mind. Thus, it is more useful for a general assessment of the performance of different RDF store implementations, while ParlBench is particularly targeted at developers of e-publishing applications and can support them in choosing systems that are more suitable for analytical query processing.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Benchmark Data Sets</title>
<p>The benchmark consists of five conceptually separate data sets, summarized in Table 1:</p>
      <p>Members: describes political players of the Dutch parliament.</p>
      <p>Parties: describes Dutch political parties.</p>
      <p>Proceedings: describes the structure of the Dutch parliamentary proceedings.</p>
      <p>Paragraphs: contains triples linking paragraphs to their content.</p>
      <p>Tagged entities: contains triples linking paragraphs to DBpedia entities, indicating that these entities were discussed in the paragraphs.</p>
      <table-wrap id="table-1">
        <label>Table 1</label>
        <caption><p>Benchmark data sets</p></caption>
        <table>
          <thead>
            <tr><th>dataset</th><th># of triples</th><th>size</th><th># of files</th></tr>
          </thead>
          <tbody>
            <tr><td>members</td><td>33,885</td><td>14M</td><td>3,583</td></tr>
            <tr><td>parties</td><td>510</td><td>612K</td><td>151</td></tr>
            <tr><td>proceedings</td><td>36,503,688</td><td>4.15G</td><td>51,233</td></tr>
            <tr><td>paragraphs</td><td>11,250,295</td><td>5.77G</td><td>51,233</td></tr>
            <tr><td>tagged entities</td><td>34,449,033</td><td>2.57G</td><td>34,755</td></tr>
            <tr><td>TOTAL</td><td>82,237,411</td><td>13G</td><td>140,955</td></tr>
          </tbody>
        </table>
      </table-wrap>
<p>The data model of the benchmark data sets is described in Appendix A.</p>
      <sec id="sec-3-1">
        <title>Scaling of the Benchmark Data Sets</title>
<p>The size of the ParlBench data sets can be changed in different ways. The data set can be scaled by the number of included proceedings. All proceedings files are ordered chronologically. The scaled data set of size 1/n consists of every n-th file in this list, plus the complete Parties and Members sets. Optionally, one can include the Tagged entities and/or Paragraphs data sets in the test collection. In this case Paragraphs and Tagged entities are scaled according to the included proceedings, i.e., only paragraphs and/or tags pointing to ids in the chosen proceedings are included.</p>
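<p>The scaling procedure above can be sketched in Python (a minimal sketch; the file-naming scheme is an assumption, and the actual scripts published with the benchmark may differ):</p>

```python
def scale_proceedings(files, n):
    """Keep every n-th file from the chronologically ordered proceedings list."""
    ordered = sorted(files)  # file names are assumed to sort chronologically
    return ordered[::n]

def scale_dependent(files, kept_ids):
    """Keep only Paragraphs / Tagged entities files whose proceeding id was kept."""
    return [f for f in files if f.split('.')[0] in kept_ids]

# Example: a 1/2 scaling of four (hypothetical) proceedings files.
kept = scale_proceedings(['1999-d1.xml', '2000-d2.xml', '2001-d3.xml', '2002-d4.xml'], 2)
```

The complete Parties and Members sets are then always added to the scaled collection, as described above.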
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Benchmark Queries</title>
      <p>ParlBench provides 19 SPARQL queries. The queries were grouped into four micro
benchmarks:
Average: 3 queries, numbered from A0 to A2, retrieve aggregated information.
Count: 5 queries, numbered from C0 to C4, count entities that satisfy certain
conditions.</p>
      <p>Factual: 6 queries, numbered from F0 to F5, retrieve instances of a particular class
that satisfy certain conditions.</p>
<p>Top 10: 5 queries, numbered from T0 to T4, retrieve the top 10 instances of a particular class that satisfy certain filtering conditions.</p>
<p>All the queries are listed in Appendix B. Their SPARQL representations can be seen in Appendix C. The benchmark queries cover a wide range of SPARQL language constructs. Table 1 shows the usage of SPARQL features by individual queries and the distribution of the features across the micro benchmarks.</p>
    </sec>
    <sec id="sec-5">
      <title>Experimental Run of the Benchmark</title>
      <p>
In this section we demonstrate the application of our benchmark on the OpenLink Virtuoso native RDF store, Open Source Edition4. Tested on the Berlin benchmark, Virtuoso showed some of the best performance results among the evaluated systems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <sec id="sec-5-1">
        <title>4 http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/</title>
        <sec id="sec-5-1-1">
          <title>Evaluation Metrics</title>
<p>Loading Time. The loading time is the time for loading RDF data into an RDF store, measured in seconds. The benchmark data sets are in RDF/XML format. Loading of data into Virtuoso was done one data set at a time. For loading Parties and Members we used the Virtuoso RDF bulk load procedure. For Proceedings we used the Virtuoso function DB.DBA.RDF_LOAD_RDFXML_MT to load large RDF/XML text.</p>
<p>Query Response Time. The query response time is the time it takes to execute a SPARQL query. To run the queries programmatically, we used isql, the Virtuoso interactive SQL utility. The execution time of a single query was taken as the real time returned by the bash /usr/bin/time command. 10 permutations of the benchmark queries were created, each containing all 19 SPARQL queries.</p>
<p>Before measuring the query response time, we warmed up Virtuoso by running 10 different permutations of all 19 benchmark queries 5 times each. In total, 950 queries were executed in the warm-up phase, and each query was run 50 times. After that, we ran the same permutations 3 more times and measured the execution time of each query. The query response time was computed as the mean response time over the 30 executions of each query.</p>
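<p>The warm-up and measurement protocol can be sketched as follows (a simplified sketch: execute is a stand-in for the actual isql invocation, and the query identifiers are placeholders):</p>

```python
import random
import time

def run_benchmark(queries, execute, n_perms=10, warmup_rounds=5, measured_rounds=3, seed=0):
    """Warm up with repeated permutations, then record per-query wall-clock times."""
    rng = random.Random(seed)
    perms = [rng.sample(queries, len(queries)) for _ in range(n_perms)]

    # Warm-up phase: every permutation executed warmup_rounds times.
    for _ in range(warmup_rounds):
        for perm in perms:
            for q in perm:
                execute(q)

    # Measurement phase: the same permutations, measured_rounds more times.
    times = {q: [] for q in queries}
    for _ in range(measured_rounds):
        for perm in perms:
            for q in perm:
                start = time.time()
                execute(q)
                times[q].append(time.time() - start)

    # Mean response time over measured_rounds * n_perms executions per query.
    return {q: sum(ts) / len(ts) for q, ts in times.items()}
```

With 19 queries, 10 permutations, 5 warm-up rounds and 3 measured rounds this reproduces the counts above: 950 warm-up executions and 30 timed executions per query.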
<p>Test Collections. Experiments were run on 8 test collections. Each collection includes the Parties and Members data sets and a scaled Proceedings data set, ranging from 1% to 100%. Table 2 gives an overview of the size of each test collection.</p>
          <table-wrap id="table-2">
            <label>Table 2</label>
            <caption><p>Sizes of the test collections</p></caption>
            <table>
              <thead>
                <tr><th>Scaling Factor</th><th>1%</th><th>2%</th><th>4%</th><th>8%</th><th>16%</th><th>32%</th><th>64%</th><th>100%</th></tr>
              </thead>
              <tbody>
                <tr><td># of triples</td><td>494,875</td><td>1,027,395</td><td>1,906,880</td><td>3,851,642</td><td>7,554,304</td><td>15,129,621</td><td>23,341,602</td><td>36,542,431</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>We report on three experiments, relating database size to execution time: (1) time needed to load the test collections (Fig. 2), (2) total time needed to execute all the queries in the micro benchmarks5 (Fig. 3), and (3) query execution time of all the queries on the largest collection (Fig. 4).</p>
<p>The y-axes of Fig. 2 and Fig. 3 use a log scale, and the numbers represent the loading and query response times in seconds. Appendix E contains larger versions of these plots.</p>
<p>To make the results reproducible, we publish the benchmark data sets, queries and scripts at http://data.politicalmashup.nl/RDF/data/. (For each micro benchmark group we summed the execution time of each query in the group.)</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
<p>ParlBench has the proper characteristics of an RDF benchmark: it can be scaled easily and it has a set of intuitive queries which measure different aspects of the SPARQL engine.</p>
<p>We believe that ParlBench is a good proxy for a realistic large scale digital publishing application. ParlBench provides real data that encompass major characteristics shared by most e-publishing use cases, including rich metadata and hierarchical organization of the content into text chunks.</p>
      <p>
        The data set is large enough to perform non-trivial experiments. In addition to the
analytical scenario presented, one can think of several other application scenarios that
can be developed on the same data sets. Due to the many and strong connections of
the benchmark to the Linked Open Data Cloud through the DBpedia links, natural
Linked Data integration scenarios can be developed from ParlBench. The ParlBench
data is also freely available in XML format [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], enabling cross-platform comparisons of
the same workload.
      </p>
<p>As future work, we will consider executing the benchmark on multiple RDF stores and comparing the results with those achieved on Virtuoso.</p>
      <p>
Another interesting direction for future work is to extend the set of queries. Currently, there are only two queries with the OPTIONAL operator, which has been shown to be the source of the high complexity of the SPARQL language [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Queries that use features of SPARQL 1.1 would be a good addition to the benchmark. ParlBench has many queries that extensively use the UNION operator to explore the transitive hasPart relation. These queries could be rewritten using SPARQL 1.1 property path expressions. Such queries would provide a good ground for testing the reasoning capabilities of RDF store systems.
      </p>
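<p>As an illustration (a sketch only, not one of the benchmark queries), the three-branch UNION over dcterms:hasPart used in queries such as A1 could be collapsed with a SPARQL 1.1 property path; the intermediate StageDirection and Scene type checks are dropped here, which is harmless because ?speech is still required to be a parlipro:Speech:</p>

```sparql
SELECT (COUNT(?speech) AS ?numOfSpeeches)
WHERE {
  ?topic rdf:type parlipro:Topic .
  ?speech rdf:type parlipro:Speech .
  # one or more hasPart steps replace the three explicit UNION branches
  ?topic dcterms:hasPart+ ?speech .
}
GROUP BY ?topic
```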
<p>The structural elements of the proceedings are connected to Proceedings through the dcterms:hasPart property. The speaker of a speech and the affiliated party of the speaker are attached to Speech via the refMember and refParty properties, respectively.</p>
<p>Vocabularies. Existing vocabularies and ontologies relevant to modelling parliamentary proceedings fall into two categories. In the first category are vocabularies that are too generic, such as the SALT Document Ontology6 or DoCO, the Document Components Ontology7. They do not provide means to represent such specific concepts as stage direction or scene. Vocabularies in the second category are too specific, like the Semantic Web Conference Ontology8 or the Semantic Web Portal Ontology9, which model proceedings of conferences.</p>
<p>We defined our own RDF vocabulary to model parliamentary proceedings, the Parliamentary Proceedings vocabulary10, and integrated it with other existing vocabularies. To represent biographical information of politicians, we used BIO, a vocabulary for biographical information11, together with the Friend of a Friend vocabulary (FOAF)12 and the DBpedia Ontology13. The Modular Unified Tagging Ontology14 (MUTO) was used to represent information about tagged entities of paragraphs. The Dublin Core Metadata Terms15 were used to encode metadata information.</p>
      <p>6 http://salt.semanticauthoring.org/ontologies/sdo#
7 http://purl.org/spar/doco/Paragraph
8 http://data.semanticweb.org/ns/swc/swc_2009-05-09.html#
9 http://sw-portal.deri.org/ontologies/swportal#
10 http://purl.org/vocab/parlipro#
11 http://vocab.org/bio
12 http://xmlns.com/foaf/0.1/
13 http://dbpedia.org/ontology/
14 http://muto.socialtagging.org/
15 http://purl.org/dc/terms/ and http://purl.org/dc/elements/1.1/</p>
    </sec>
    <sec id="sec-7">
      <title>ParlBench Queries</title>
<p>PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
PREFIX parlipro: &lt;http://purl.org/vocab/parlipro#&gt;
PREFIX dcterms: &lt;http://purl.org/dc/terms/&gt;
PREFIX dc: &lt;http://purl.org/dc/elements/1.1/&gt;
PREFIX bio: &lt;http://purl.org/vocab/bio/0.1/&gt;
PREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;
PREFIX dbpedia: &lt;http://dbpedia.org/resource/&gt;
PREFIX owl: &lt;http://www.w3.org/2002/07/owl#&gt;</p>
      <sec id="sec-7-1">
<title>A0: Retrieve average number of people who spoke per topic.</title>
        <p>SELECT AVG(?numOfMembers) as ?avgNumOfMembersPerTopic
WHERE {{
SELECT COUNT(?member) AS ?numOfMembers
WHERE {
?topic rdf:type parlipro:Topic .
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .</p>
        <p>{?topic dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?sd .
?sd rdf:type parlipro:StageDirection .</p>
        <p>?sd dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?scene .
?scene rdf:type parlipro:Scene .</p>
        <p>?scene dcterms:hasPart ?speech .}}</p>
        <p>GROUP BY ?topic}}</p>
      </sec>
      <sec id="sec-7-2">
        <title>A1: Retrieve average number of speeches per topic.</title>
        <p>SELECT AVG(?numOfSpeeches) as ?avgNumOfSpeechesPerTopic
WHERE {{
SELECT COUNT(?speech) AS ?numOfSpeeches
WHERE {
?topic rdf:type parlipro:Topic .
?speech rdf:type parlipro:Speech .</p>
        <p>{?topic dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?sd .
?sd rdf:type parlipro:StageDirection .</p>
        <p>?sd dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?scene .
?scene rdf:type parlipro:Scene .</p>
        <p>?scene dcterms:hasPart ?speech .}}</p>
        <p>GROUP BY ?topic}}</p>
      </sec>
      <sec id="sec-7-3">
        <title>A2: Retrieve average number of speeches per day.</title>
        <p>SELECT AVG(?numOfSpeeches) as ?avgNumOfSpeechesPerDay
WHERE {{
SELECT ?date COUNT(?speech) AS ?numOfSpeeches
WHERE {
?proc dcterms:hasPart ?topic .
?proc rdf:type parlipro:ParliamentaryProceedings .
?proc dc:date ?date .
?speech rdf:type parlipro:Speech .
?topic rdf:type parlipro:Topic .</p>
        <p>{?topic dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?sd .
?sd rdf:type parlipro:StageDirection .</p>
        <p>?sd dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?scene .
?scene rdf:type parlipro:Scene .</p>
        <p>?scene dcterms:hasPart ?speech .}}</p>
        <p>GROUP BY ?date}}</p>
      </sec>
      <sec id="sec-7-4">
        <title>C0: Count speeches of females.</title>
        <p>SELECT COUNT(?speech)
WHERE {
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .
?member bio:biography _:bio .
_:bio rdf:type bio:Biography .
_:bio foaf:gender dbpedia:Female .}</p>
      </sec>
      <sec id="sec-7-5">
        <title>C1: Count speeches of males.</title>
        <p>SELECT COUNT(?speech)
WHERE {
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .
?member bio:biography _:bio .
_:bio rdf:type bio:Biography .
_:bio foaf:gender dbpedia:Male .}</p>
      </sec>
      <sec id="sec-7-6">
        <title>C2: Count speeches of speakers who were born after 1960.</title>
        <p>SELECT COUNT(?speech)
WHERE {
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .
?member bio:biography _:bio .
_:bio rdf:type bio:Biography .
_:bio foaf:birthday ?birthday .</p>
        <p>FILTER (year(?birthday) &gt; 1960)}</p>
      </sec>
      <sec id="sec-7-7">
<title>C3: Count speeches of male speakers who were born after 1960. C4: Count speeches of female speakers in topics where only one female spoke.</title>
        <p>SELECT COUNT(?speech)
WHERE {
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .
?member bio:biography _:bio .
_:bio rdf:type bio:Biography .
_:bio foaf:gender dbpedia:Male .</p>
        <p>_:bio foaf:birthday ?birthday .</p>
        <p>FILTER (year(?birthday) &gt; 1960)}
SELECT ?topic ?member COUNT(?speech) as ?numOfSpeeches
WHERE {{
SELECT ?topic ?member COUNT(?member) AS ?numOfFemales ?speech
WHERE {
?topic rdf:type parlipro:Topic .
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .
?member bio:biography _:bio .
_:bio rdf:type bio:Biography .
_:bio foaf:gender dbpedia:Female .</p>
        <p>{?topic dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?sd .
?sd rdf:type parlipro:StageDirection .</p>
        <p>?sd dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?scene .
?scene rdf:type parlipro:Scene .</p>
        <p>?scene dcterms:hasPart ?speech .}}
GROUP BY ?topic ?member ?speech}
FILTER (?numOfFemales = 1 )}</p>
<p>GROUP BY ?topic ?member</p>
        <p>F0: Which members were born after 1950? Retrieve their parties and their dates of death, if they exist.
SELECT DISTINCT ?member ?party ?birthday
WHERE {
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .
?speech parlipro:refParty ?party .
?member rdf:type parlipro:ParliamentMember .
?member bio:biography _:bio .
_:bio rdf:type bio:Biography .
_:bio foaf:birthday ?birthday .
FILTER (year(?birthday) &gt; 1950)
OPTIONAL{_:bio dbpedia-ont:deathDate ?deathDate .}}</p>
        <p>F1: What is the gender of the politicians who spoke most within a certain timeframe?
SELECT ?gender COUNT(?member) AS ?numOfMembers
WHERE {
?proc rdf:type parlipro:ParliamentaryProceedings .
?proc dc:date ?date .
?proc dcterms:hasPart ?topic .
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .
?member rdf:type parlipro:ParliamentMember .
?member bio:biography _:bio .
_:bio rdf:type bio:Biography .
_:bio foaf:gender ?gender .</p>
        <p>{?topic dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?sd .
?sd rdf:type parlipro:StageDirection .</p>
        <p>?sd dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?scene .
?scene rdf:type parlipro:Scene .</p>
        <p>?scene dcterms:hasPart ?speech .}
FILTER (year(?date) &gt; 1995 AND year(?date) &lt; 2005)}
GROUP BY ?gender
ORDER BY DESC(?numOfMembers)</p>
        <p>LIMIT 1</p>
      </sec>
      <sec id="sec-7-8">
        <title>F2: What is the percentage of male speakers?</title>
        <p>SELECT (?numOfMaleMembers*100)/?numOfMembers
WHERE{{
SELECT COUNT(DISTINCT ?memberMale) as ?numOfMaleMembers</p>
        <p>COUNT(DISTINCT ?member) as ?numOfMembers
WHERE {{
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .</p>
        <p>?member rdf:type parlipro:ParliamentMember .}
UNION{
?memberMale bio:biography _:bio .
_:bio rdf:type bio:Biography .</p>
        <p>_:bio foaf:gender dbpedia:Male .</p>
        <p>FILTER (sameTerm(?member,?memberMale))}}}}</p>
      </sec>
      <sec id="sec-7-9">
        <title>F3: What is the percentage of female speakers?</title>
        <p>SELECT (?numOfFemaleMembers*100)/?numOfMembers
WHERE{{
SELECT COUNT(DISTINCT ?memberFemale) as ?numOfFemaleMembers</p>
        <p>COUNT(DISTINCT ?member) as ?numOfMembers
WHERE {{
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .</p>
        <p>?member rdf:type parlipro:ParliamentMember .}
UNION{
?memberFemale bio:biography _:bio .
_:bio rdf:type bio:Biography .</p>
        <p>_:bio foaf:gender dbpedia:Female .</p>
<p>FILTER (sameTerm(?member,?memberFemale))}}}}</p>
        <p>F4: Which politician has the most Wikipedia pages in different languages?
SELECT ?member ?numOfPages
WHERE {{
SELECT DISTINCT ?member COUNT(?dbpediaMember) AS ?numOfPages
WHERE {
?member rdf:type parlipro:ParliamentMember .
?member owl:sameAs ?dbpediaMember .}
GROUP BY ?member}}
ORDER BY DESC(?numOfPages)
LIMIT 1</p>
        <p>F5: What speeches were made by politicians without Wikipedia pages?
SELECT DISTINCT ?speech ?member
WHERE {
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .</p>
        <p>?member rdf:type parlipro:ParliamentMember .</p>
        <p>OPTIONAL {?member owl:sameAs ?dbpediaMember .}</p>
        <p>FILTER (!bound(?dbpediaMember))}</p>
      </sec>
      <sec id="sec-7-10">
        <title>T0: Retrieve top 10 members with the most speeches.</title>
        <p>SELECT ?member COUNT(?speech) as ?numOfSpeeches
WHERE {
?member rdf:type parlipro:ParliamentMember .
?speech parlipro:refMember ?member .}
GROUP BY ?member
ORDER BY DESC(?numOfSpeeches)</p>
        <p>LIMIT 10</p>
      </sec>
      <sec id="sec-7-11">
<title>T1: Retrieve top 10 topics in which the most people spoke.</title>
        <p>SELECT ?topic COUNT(?member) as ?numOfMembersSpokeInTopic
WHERE {</p>
        <p>?topic rdf:type parlipro:Topic .
?topic dcterms:hasPart ?speech .
?speech rdf:type parlipro:Speech .
?speech parlipro:refMember ?member .}</p>
        <p>GROUP BY ?topic
ORDER BY DESC(?numOfMembersSpokeInTopic)</p>
        <p>LIMIT 10</p>
      </sec>
      <sec id="sec-7-12">
        <title>T2: Retrieve top 10 topics with the most speeches.</title>
        <p>SELECT ?topic
COUNT(?speech) as ?numOfSpeeches
WHERE {
?topic rdf:type parlipro:Topic .
?speech rdf:type parlipro:Speech .
{?topic dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?sd .
?sd rdf:type parlipro:StageDirection .
?sd dcterms:hasPart ?speech .}
UNION{
?topic dcterms:hasPart ?scene .
?scene rdf:type parlipro:Scene .
?scene dcterms:hasPart ?speech .}}
GROUP BY ?topic
ORDER BY DESC(?numOfSpeeches)</p>
        <p>LIMIT 10</p>
      </sec>
      <sec id="sec-7-13">
        <title>T3: Retrieve top 10 days with the most topics.</title>
        <p>SELECT ?date COUNT(?topic) as ?numOfTopics
WHERE {
?proc rdf:type parlipro:ParliamentaryProceedings .
?proc dcterms:hasPart ?topic .</p>
        <p>?proc dc:date ?date .}
GROUP BY ?date
ORDER BY DESC(?numOfTopics)</p>
        <p>LIMIT 10</p>
      </sec>
      <sec id="sec-7-14">
<title>T4: Retrieve top 10 longest topics (i.e., by number of paragraphs).</title>
<p>For the benchmark evaluation we used a personal laptop, an Apple MacBook Pro running Mac OS X Lion 10.7.5 x64. The specification of the machine is the following:
Hardware
- CPUs: 2.8 GHz Intel Core i7 (2x2 cores)
- Memory: 8 GB 1333 MHz DDR3
- Hard Disk: 750GB
Software
- OpenLink Virtuoso: Open Source Edition v.06.01.3127, compiled from source for OS X
- MySQL Community Server (GPL) v. 5.5.15
- Scripts (bash 3.2, Python 2.7.3) to scale and upload the RDF data sets, and to create permutations of queries and run them on Virtuoso. The scripts are available for download at http://data.politicalmashup.nl/RDF/scripts/.</p>
          <p>Virtuoso Configuration. We configured the Virtuoso Server to handle the load of large data sets16:
NumberOfBuffers = 680000
MaxDirtyBuffers = 500000
16 http://www.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning</p>
          <p>The RDF index scheme remained as supplied with the default Virtuoso installation. Namely, the scheme consists of the following indices:
- PSOG: primary key.
- POGS: bitmap index for lookups on object value.
- SP: partial index for cases where only S is specified.
- OP: partial index for cases where only O is specified.
- GS: partial index for cases where only G is specified.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Large Plots</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The Benchmark Handbook for Database and Transaction Systems, (2nd Edition)</article-title>
          , Morgan Kaufmann, ISBN 1-55860-292-5.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Morsey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
          </string-name>
          , A.-C. N.:
          <article-title>DBpedia SPARQL benchmark: performance assessment with real queries on real data</article-title>
          .
          <source>In ISWC</source>
          , (
          <year>2011</year>
          ). LNCS, vol.
          <volume>7031</volume>
          , pp.
          <fpage>454</fpage>
          -
          <lpage>469</lpage>
          . Springer, Heidelberg (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
<ref id="ref3">
        <mixed-citation>3. Transaction Processing Performance Council (2008): TPC Benchmark H, Standard Specification, Revision 2.7.0. Retrieved March 2, (2009), http://www.tpc.org/tpch/spec/tpch2.7.0.pdf</mixed-citation>
      </ref>
<ref id="ref4">
        <mixed-citation>4. Transaction Processing Performance Council. http://www.tpc.org/</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Afanasiev</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manolescu</surname>
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michiels</surname>
            <given-names>P.:</given-names>
          </string-name>
          <article-title>MemBeR: a micro-benchmark repository for XQuery</article-title>
          . In XSym (
          <year>2005</year>
          ). LNCS, vol.
          <volume>3671</volume>
, pp.
          <fpage>144</fpage>
          -
          <lpage>161</lpage>
          . Springer, Heidelberg (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
<string-name>
  <surname>Bizer</surname>
  <given-names>C.</given-names>
</string-name>
,
<string-name>
  <surname>Schultz</surname>
  <given-names>A.</given-names>
</string-name>
          :
          <article-title>The Berlin SPARQL benchmark</article-title>
          .
          <source>Int. J. On Semantic Web and Information Systems</source>
          .
          <volume>5</volume>
          (
          <issue>2</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
<string-name>
  <surname>Schmidt</surname>
  <given-names>M.</given-names>
</string-name>
,
<string-name>
  <surname>Hornung</surname>
  <given-names>T.</given-names>
</string-name>
,
<string-name>
  <surname>Lausen</surname>
  <given-names>G.</given-names>
</string-name>
,
<string-name>
  <surname>Pinkel</surname>
  <given-names>C.</given-names>
</string-name>
:
<article-title>SP2Bench: A SPARQL Performance Benchmark</article-title>
. In
<source>ICDE</source>
, pp.
<fpage>222</fpage>
-
<lpage>233</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Bizer</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Becker</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
<string-name>
  <surname>Hellmann</surname>
  <given-names>S.</given-names>
</string-name>
:
<article-title>DBpedia - a crystallization point for the web of data</article-title>
.
<source>J. of Web Semantics</source>
<volume>7</volume>
(
<issue>3</issue>
),
<fpage>154</fpage>
-
<lpage>165</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
<string-name>
  <surname>Marx</surname>
  <given-names>M.</given-names>
</string-name>
:
<article-title>Advanced Information Access to Parliamentary Debates</article-title>
.
<source>J. of Dig. Inf.</source>
          <volume>10</volume>
          (
          <issue>6</issue>
          ), (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
<string-name>
  <surname>Perez</surname>
  <given-names>J.</given-names>
</string-name>
,
<string-name>
  <surname>Arenas</surname>
  <given-names>M.</given-names>
</string-name>
,
<string-name>
  <surname>Gutierrez</surname>
  <given-names>C.</given-names>
</string-name>
:
<article-title>Semantics and complexity of SPARQL</article-title>
.
<source>ACM Trans. Database Syst.</source>
          <volume>34</volume>
          (
          <issue>3</issue>
          ), (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>