Faceted Views over Large-Scale Linked Data
Orri Erling
OpenLink Software, Inc.
10 Burlington Mall Road
Suite 265
Burlington, MA 01803
U.S.A.
oerling@openlinksw.com
ABSTRACT
Faceted views over structured and semi-structured data have been popular in user interfaces for some years. Deploying such views of arbitrary linked data at arbitrary scale has been hampered by lack of suitable back end technology. Many ontologies are also quite large, with hundreds of thousands of classes.

Also, the linked data community has been concerned with the processing cost and potential for denial of service presented by public SPARQL end points.

This paper discusses how we use Virtuoso Cluster Edition for providing interactive browsing over billions of triples, combining full text search, structured querying and result ranking. We discuss query planning, run time inferencing and partial query evaluation. This functionality is exposed through SPARQL, a specialized web service and a web user interface.

Categories and Subject Descriptors
H.5.4 [Information Systems]: Hypertext/Hypermedia;
H.2.8 [Information Systems]: Database Applications

Keywords
Faceted Views, Linked Data, SPARQL, OpenLink Virtuoso, partial query evaluation, entity ranking, large ontologies

Copyright is held by the author/owner(s).
LDOW2009, April 20, 2009, Madrid, Spain.

1. INTRODUCTION
The transition of the web from a distributed document repository into a universal, ubiquitous database requires a new dimension of scalability for supporting rich user interaction. If the web is the database, then it also needs a query and report writing tool to match. A faceted user interaction paradigm has been found useful for aiding discovery and query of variously structured data. Numerous implementations exist but they are chiefly client side and are limited in the data volumes they can handle.

At the present time, linked data is well beyond prototypes and proofs of concept. This means that what was done in limited specialty domains before must now be done at real world scale, in terms of both data volume and ontology size.

On the schema, or T box side, there exist many comprehensive general purpose ontologies such as Yago[1], Open CYC[2], Umbel[3] and the DBpedia[4] ontology and many domain specific ones, such as [5]. For these to enter into the user experience, the platform must be able to support the user’s choice of terminology or terminologies as needed, preferably without blow up of data and concomitant slowdown.

Likewise, in the LOD world, many link sets have been created for bridging between data sets. Whether such linkage is relevant will depend on the use case. Therefore we provide fine grained control over which owl:sameAs assertions will be followed, if any.

Against this background, we discuss how we tackle incremental interactive query composition on arbitrary data with Virtuoso Cluster[6].

Using SPARQL or a web/web service interface, the user can form combinations of text search and structured criteria, including joins to an arbitrary depth. If queries are precise and select a limited number of results, the results are complete. If queries would select tens of millions of results, partial results are shown.

The system being described is being actively developed as of this writing, early March of 2009, and is online at lod.openlinksw.com. The data set is a combination of DBpedia, Musicbrainz, Freebase, web crawls from www.pingthesemanticweb.com, Uniprot, Neurocommons and Bio2RDF.

The hardware consists of two 8-core servers with 16G RAM and 4 disks each. The system runs on Virtuoso 6 Cluster Edition. All application code is written in SQL procedures with limited client side Ajax; the Virtuoso platform itself is in C.

The facets service allows the user to start with a text search or a fixed URI and to refine the search by specifying classes, property values etc., on the selected subjects or any subjects referenced therefrom.

This process generates queries involving combinations of text and structured criteria, often dealing with property and class hierarchies and often involving aggregation over millions of subjects, especially at the initial stages of query composition. To make this work within interactive time, two things are needed:

1. a query optimizer that can almost infallibly produce the right join order based on cardinalities of the specific constants in the query
2. a query execution engine that can return partial results after a timeout.

It is often the case, especially at the beginning of query formulation, that the user only needs to know if there are relatively many or few results that are of a given type or
involve a given property. Thus partially evaluating a query is often useful for producing this information. This must however be possible with an arbitrary query; simply citing precomputed statistics is not enough.

It has for a long time been a given that any search-like application ranks results by relevance. Whenever the facets service shows a list of results, not an aggregation of result types or properties, it is sorted on a composite of text match score and link density.

The paper is divided into the following parts:

• SPARQL query optimization and execution adapted for run time inference over large subclass structures.
• Resolving identity with inverse functional properties
• Ranking entities based on graph link density
• SPARQL partial query evaluation for displaying partial results in fixed time
• a facets web service providing an XML interface for submitting queries, so that the user interface is not required to parse SPARQL
• a sample web interface for interacting with this
• sample queries and their evaluation times against combinations of large LOD data sets

2. PROCESSING LARGE HIERARCHIES IN SPARQL
Virtuoso has for a long time had built-in superclass and superproperty inference. This is enabled by specifying the define input:inference "context" option, where context is previously declared to be all subclass, subproperty, equivalence, inverse functional property and same as relations defined in a given graph. The ontology file is loaded into its own graph and this is then used to construct the context. Multiple ontologies and their equivalences can be loaded into a single graph which then makes another context which holds the union of the ontology information from the merged source ontologies.

Let us consider a sample query combining a full text search and a restriction on the class of the desired matches:

  define input:inference "yago"
  prefix cy:
  select distinct ?s1 as ?c1,
    (bif:search_excerpt (
      bif:vector ('Shakespeare'), ?o1 ) ) as ?c2
  where {
    ?s1 ?s1textp ?o1 .
    filter (bif:contains (?o1, '"Shakespeare"')) .
    ?s1 a cy:Performer110415638 .
  } limit 20

This selects all Yago performers that have a property that contains “Shakespeare” as a whole word.

The define input:inference "yago" clause means that subclass, subproperty and inverse functional property statements contained in the inference context called yago are considered when evaluating the query. The built-in function bif:search_excerpt makes a search engine style summary of the found text, highlighting occurrences of Shakespeare.

The bif:contains function in the filter specifies the full text search condition on ?o1.

This query is a typical example of queries that are executed all the time when a user refines a search. We will now look at how we can make an efficient execution plan for the query. First, we must know the cardinalities of the search conditions.

To see the count of subclasses of Yago performer, we can do:

  prefix cy:
  select count (*)
  from
  where {
    ?s rdfs:subClassOf cy:Performer110415638
    option (transitive, t_distinct) }

There are 4601 distinct subclasses, including indirect ones.

Next we look at how many Shakespeare mentions there are:

  select count (*) where {
    ?s ?p ?o .
    filter (bif:contains (?o, 'Shakespeare')) }

There are 10267 subjects with Shakespeare mentioned in some literal.

  define input:inference "yago"
  prefix cy:
  select count (*) where {
    ?s1 a cy:Performer110415638 . }

There are 184885 individuals that belong to some subclass of performer.

This is the data that the SPARQL compiler must know in order to have a valid query plan. Since these values will vary wildly depending on the specific constants in the query, the actual database must be consulted as needed while preparing the execution plan. This is regular query processing technology but is now specially adapted for deep subclass and subproperty structures.

Conditions in the queries are not evaluated twice, once for the cardinality estimate and once for the actual run. Instead, the cardinality estimate is a rapid sampling of the index trees that reads at most one leaf page.

Consider a B tree index, which we descend from top to the leftmost leaf containing a match of the condition. At each level, we count how many children would match and always select the leftmost one. When we reach a leaf, we see how many entries are on the page. From these observations, we extrapolate the total count of matches.

With this method, the guess for the count of performers is 114213, which is acceptably close to the real number.

Given these numbers, we see that it makes sense to first find the full text matches and then retrieve the actual classes of each and see if this class is a subclass of performer. This last check is done against a memory resident copy of the Yago hierarchy, the same copy that was used for enumerating the subclasses of performer.

However, the query

  define input:inference "yago"
  prefix cy:
  select distinct ?s1 as ?c1,
    (bif:search_excerpt (
      bif:vector ('Shakespeare'), ?o1 ) ) as ?c2
  where {
    ?s1 ?s1textp ?o1 .
    filter (bif:contains (?o1, '"Shakespeare"')) .
    ?s1 a cy:ShakespeareanActors .
  }

will start with Shakespearean actors since this is a leaf class with only 74 instances and then check if the properties contain Shakespeare and return their search summaries.

In principle, this is common cost based optimization but is here adapted to deep hierarchies combined with text patterns. An unmodified SQL optimizer would have no possibility of arriving at these results.

The implementation reads the graphs designated as holding ontologies when first needed and subsequently keeps a memory based copy of the hierarchy on all servers. This is used for quick iteration over sub/superclasses or properties as well as for checking if a given class or property is a subclass/property of another. Triples with OWL predicates equivalentClass, equivalentProperty and sameAs are also cached in the same data structure if they occur in the ontology graphs.

Also, cardinality estimates for members of classes near the root of the class hierarchy take some time since a sample of each subclass is needed. These are cached for some minutes in the inference context, so that repeated queries will not redo the sampling.

3. INVERSE FUNCTIONAL PROPERTIES AND SAME AS
Especially when navigating social data, as in FOAF[7] and SIOC[8] spaces, there are many blank nodes that are identified by properties only. For this, we offer an option for automatically joining to subjects which share an IFP value with the subject being processed. For example, the query for the friends of friends of Kjetil Kjernsmo returns empty:

  select count (?f2) where {
    ?s a foaf:Person ; ?p ?o ; foaf:knows ?f1 .
    ?o bif:contains "'Kjetil Kjernsmo'" .
    ?f1 foaf:knows ?f2 };

But with the option

  define input:inference "b3sifp"
  select count (?f2) where {
    ?s a foaf:Person ; ?p ?o ; foaf:knows ?f1 .
    ?o bif:contains "'Kjetil Kjernsmo'" .
    ?f1 foaf:knows ?f2 };

we get 4022. We note that there are many duplicates since the data is blank nodes only, with people easily represented 10 times. The context b3sifp simply declares that foaf:name and foaf:mbox_sha1sum should be treated as inverse functional properties (IFP). The name is not an IFP in the actual sense but treating it as such for the purposes of this one query makes sense, otherwise nothing would be found.

This option is controlled by the choice of the inference context, which is selectable in the interface discussed below.

The IFP inference can be thought of as a transparent addition of a subquery into the join sequence. The subquery joins each subject to its synonyms given by sharing IFP’s. This subquery has the special property that it has the initial binding automatically in its result set. It could be expressed as:

  select ?f where {
    ?k foaf:name "Kjetil Kjernsmo" .
    { select ?org ?syn where {
        ?org ?p ?key .
        ?syn ?p ?key .
        filter ( bif:rdf_is_sub ("b3sifp", ?p,
          , 3) &&
          ?syn != ?org ) }
    } option (transitive,
      t_in (?org), t_out (?syn), t_min (0), t_max (1) )
    filter (?org = ?k) .
    ?syn foaf:knows ?f . }

It is true that each subject shares IFP values with itself but the transitive construct with 0 minimum and 1 maximum depth allows passing the initial binding of ?org directly to ?syn, thus getting first results more rapidly. The rdf_is_sub function is an internal that simply tests whether ?p is a subproperty of b3s:any_ifp.

Internally, the implementation has a special query operator for this and the internal form is more compact than would result from the above but the above could be used to the same effect.

The issues of run time vs precomputed identity inference through IFP’s and owl:sameAs are discussed in much more detail at [9].

Our general position is that identity criteria are highly application specific and thus we offer the full spectrum of choice between run time and precomputing. Further, weaker identity statements than sameness are difficult to use in queries, thus we prefer identity with semantics of owl:sameAs but make this an option that can be turned on and off query by query.

4. ENTITY RANKING
It is a common end user expectation to see text search results sorted by their relevance. The term entity rank refers to a quantity describing the relevance of a URI in an RDF graph.

This is a sample query using entity rank:

  prefix yago:
  prefix prop:
  select distinct ?s2 as ?c1 where {
    ?s1 ?s1textp ?o1 .
    ?o1 bif:contains 'Shakespeare' .
    ?s1 a yago:Writer110794014 .
    ?s2 prop:writer ?s1 .
  } order by desc ( (?s2))
  limit 20 offset 0

This selects works where a writer with Shakespeare in some property is the writer.

Here the query returns subjects, thus no text search summaries, so only the entity rank of the returned subject is used. We order text results by a composite of text hit score and entity rank of the RDF subject where the text occurs.

The entity rank of the subject is defined by the count of references to it, weighted by the rank of the referrers and the outbound link count of referrers. Such techniques are used in text based information retrieval[15].

One interesting application of entity rank and inference on IFP’s and owl:sameAs is in locating URI’s for reuse. We can easily list synonym URI’s in order of popularity as well as locate URI’s based on associated text. This can serve in applications such as the Entity Name Server[14].

Entity ranking is one of the few operations where we take a precomputing approach. Since a rank is calculated based on a possibly long chain of references, there is little choice but to precompute. The precomputation itself is straightforward enough: First all outbound references are counted for all subjects. Next all ranks of subjects are incremented by 1 over the referrer’s outbound link count. On successive iterations, the increment is based on the rank increment the referrer received in the previous round.

The operation is easily partitioned, since each partition increments the ranks of subjects it holds. The referrers are spread throughout the cluster, though. When rank is calculated, each partition accesses every other partition. This is done with relatively long messages: referrer ranks are accessed in batches of several thousand at a time, thus absorbing network latency.

On the test system, this operation performs a single pass over the corpus of 2.2 billion triples and 356 million distinct subjects in about 30 minutes. The operation has 100% utilization of all 16 cores. Adding hardware would speed it up, as would implementing it in C instead of the SQL procedures it is written in at present.

The main query in rank calculation is

  select O, P, iri_rank (S)
  from rdf_quad table option (no cluster)
  where isiri_id(O) order by O;

This is the SQL cursor iterated over by each partition. The no cluster option means that only rows in this process’ partition are retrieved. The RDF_QUAD table holds the RDF quads in the store, i.e. triple plus graph. The S, P, O columns are the subject, predicate and object respectively. The graph column is not used here. The iri_rank is a partitioned SQL function. This works by using the S argument to determine which cluster node should run the function. The specifics of the partitioning are declared elsewhere. The calls are then batched for each intended recipient and sent when the batches are full. The SQL compiler automatically generates the relevant control structures. This is like an implicit map operation in the map-reduce terminology.

An SQL procedure loops over this cursor, adds up the rank and when seeing a new O, the added rank is persisted into a table. Since links in RDF are typed, we can use the semantics of the link to determine how much rank is transferred by a reference. With extraction of named entities from text content, we can further place a given entity into a referential context and use this as a weighting factor. This is to be explored in future work. The experience thus far shows that we greatly benefit from Virtuoso being a general purpose DBMS, as we can create application specific data structures and control flows where these are efficient. For example, it would make little sense to store entity ranks as triples due to space consumption and locality considerations. With these tools, the whole ranking functionality took under a week to develop.

5. QUERY EVALUATION TIME LIMITS
When scaling the Linked Data model, we have to take it as a given that the workload will be unexpected and that the query writers will often be unskilled in databases. Insofar as possible, we wish to promote the forming of a culture of creative reuse of data. To this effect, even poorly formulated questions deserve an answer that is better than just a timeout.

If a query produces a steady stream of results, interrupting it after a certain quota is simple. However, most interesting queries do not work in this way. They contain aggregation, sorting, maybe transitivity.

When evaluating a query with a time limit in a cluster setup, all nodes monitor the time left for the query. When dealing with a potentially partial query to begin with, there is little point in transactionality. Therefore the facet service uses read committed isolation. A read committed query will never block since it will see the before-image of any transactionally updated row. There will be no waiting for locks and timeouts can be managed locally by all servers in the cluster.

Thus, when having a partitioned count, for example, we expect all the partitions to time out around the same time and send a ready message with the timeout information to the cluster node coordinating the query. The condition raised by hitting a partial evaluation time limit differs from a run time error in that it leaves the query state intact on all participating nodes. This allows the timeout handling to come fetch any accumulated aggregates.

Let us consider the query for the top 10 classes of things with “Shakespeare” in some literal. This is typical of the workload generated by the faceted browsing web service:

  define input:inference "yago"
  select ?c count (*) where {
    ?s a ?c ; ?p ?o .
    ?o bif:contains "Shakespeare" .
  } group by ?c order by desc 2 limit 10

On the first execution with an entirely cold cache, this times out after 2 seconds and returns:

  yago:class/yago/Entity100001740          566
  yago:class/yago/PhysicalEntity100001930  452
  yago:class/yago/Object100002684          452
  yago:class/yago/Whole100003553           449
  yago:class/yago/Organism100004475        375
  yago:class/yago/LivingThing100004258     375
  yago:class/yago/CausalAgent100007347     373
  yago:class/yago/Person100007846          373
  yago:class/yago/Abstraction100002137     150
  yago:class/yago/Communicator109610660    125

The next repeat gets about double the counts, starting with 1291 entities.

With a warm cache, the query finishes in about 300 ms (4 core Xeon, Virtuoso 6 Cluster) and returns:

  yago:class/yago/Entity100001740          13329
  yago:class/yago/PhysicalEntity100001930  10423
  yago:class/yago/Whole100003553           10210
  yago:class/yago/LivingThing100004258      8868
  yago:class/yago/Organism100004475         8868
  yago:class/yago/CausalAgent100007347      8853
  yago:class/yago/Person100007846           8853
  yago:class/yago/Abstraction100002137      3284

It is a well known fact that running from memory is thousands of times faster than from disk.

The query plan begins with the text search. The subjects with “Shakespeare” in some property get dispatched to the partition that holds their class. Since all partitions know the class hierarchy, the superclass inference runs in parallel, as [...] have finished, the process coordinating the query fetches the partial aggregates, adds them up and sorts them by count.

[...] classes of the text matches are being retrieved. When this happens, this part of the query is reset, but the aggregate states are left in place. The process coordinating the query then goes on as if the aggregates had completed. If there are many levels of nested aggregates, each timeout terminates [...] thus a query is guaranteed to return in no more than n timeouts, where n is the number of nested aggregations or [...]

6. FACETS WEB SERVICE
The Virtuoso Facets web service is a general purpose RDF query facility for facet based browsing. It takes an XML description of the view desired and generates the reply as an XML tree containing the requested data. The user agent [...] end user. The selection of facets and values is represented as an XML tree. The rationale for this is the fact that such a representation is easier to process in an application than the SPARQL source text or a parse tree of SPARQL and more [...] for faceted browsing. All such queries internally generate SPARQL and the SPARQL generated is returned with the results. One can therefore use this as a starting point for hand crafted queries.

The query has the top level element . The child elements of this represent conditions pertaining to a single subject. A join is expressed with the property or property-of element. This has in turn children which state conditions on a property of the first subject. Property and property-of elements can be nested to an arbitrary depth and many can occur inside one containing element. In this way, tree-shaped structures of joins can be expressed.

Expressing more complex relationships, such as intermediate grouping, subqueries, arithmetic or such requires writing the query in SPARQL. The XML format is for easy automatic composition of queries needed for showing facets, not a replacement for SPARQL.

Consider composing a map of locations involved with Napoleon. Below we list user actions and the resulting XML query descriptions.

• Enter in the search form “Napoleon”:

  napoleon

• Select the “types” view:

  napoleon

• Choose “MilitaryConflict” type:

  napoleon

• Choose “NapoleonicWars”:

  napoleon

• Select “any location” in the select list beside the “map” link, then hit “map” link:

  napoleon

This last XML fragment corresponds to the below text of SPARQL query:

  select ?location as ?c1 ?lat1 as ?c2 ?lng1 as ?c3
  where {
    ?s1 ?s1textp ?o1 .
    filter (bif:contains (?o1, '"Napoleon"')) .
    ?s1 a .
    ?s1 a .
    ?s1 ?anyloc ?location .
    ?location geo:lat ?lat1 ; geo:long ?lng1 .
  }
  limit 200 offset 0

The query takes all subjects with some literal property with “Napoleon” in it, then filters for military conflicts and Napoleonic wars, then takes all objects related to these where the related object has a location. The map has the objects and their locations.
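
The way a tree of facet conditions turns into SPARQL text, as in the Napoleon example, can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the Subject and Property classes and the compose function are hypothetical names, the example.org class IRIs are placeholders for the class IRIs that do not appear in the query text above, and the service itself is written in SQL procedures and may compose its queries quite differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Subject:
    """Conditions on one subject (hypothetical structure, not the service's XML)."""
    text: Optional[str] = None                       # whole-word text condition
    classes: List[str] = field(default_factory=list)
    properties: List["Property"] = field(default_factory=list)


@dataclass
class Property:
    """A join from the enclosing subject to a nested subject."""
    target: Subject
    iri: Optional[str] = None    # None means "any property"
    inverse: bool = False        # True: the nested subject points at the enclosing one


def compose(root: Subject, limit: int = 200) -> str:
    """Turn a tree of conditions into SPARQL text, numbering subjects ?s1, ?s2, ...

    The text value is pasted in without escaping, which is acceptable only
    for a sketch.
    """
    patterns: List[str] = []
    count = 0

    def walk(node: Subject) -> None:
        nonlocal count
        count += 1
        n, s = count, f"?s{count}"
        if node.text is not None:
            patterns.append(f"{s} {s}textp ?o{n} .")
            patterns.append(f"filter (bif:contains (?o{n}, '\"{node.text}\"')) .")
        for cls in node.classes:
            patterns.append(f"{s} a <{cls}> .")
        for prop in node.properties:
            child = f"?s{count + 1}"   # the number the nested subject will receive
            pred = f"<{prop.iri}>" if prop.iri else f"?p{count + 1}"
            if prop.inverse:
                patterns.append(f"{child} {pred} {s} .")
            else:
                patterns.append(f"{s} {pred} {child} .")
            walk(prop.target)

    walk(root)
    body = "\n".join("  " + p for p in patterns)
    return f"select distinct ?s1 where {{\n{body}\n}} limit {limit}"


# Text search, two class restrictions and one join over any property.
napoleon = Subject(
    text="Napoleon",
    classes=["http://example.org/MilitaryConflict",
             "http://example.org/NapoleonicWars"],
    properties=[Property(target=Subject())],
)
query = compose(napoleon)
print(query)
```

Because every nested Property simply adds one join pattern and then recurses, nesting to an arbitrary depth yields the tree-shaped joins described above without the caller ever handling SPARQL text.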
Figure 1: The displayed result

7. VOID DISCOVERABILITY
A long awaited addition to the LOD cloud is the Vocabulary of Interlinked Datasets (VoID)[10]. Virtuoso automatically generates VoID descriptions of data sets it hosts.

Virtuoso incorporates an SQL function rdf_void_gen which returns a Turtle representation of a given graph’s VoID statistics.

8. TEST SYSTEM AND DATA
The test system consists of two 2x4 core Xeon 5345, 2.33 GHz servers with 16G RAM and 4 disks each. The machines are connected by two 1Gbit Ethernet connections. The software is Virtuoso 6 Cluster. The Virtuoso server is split into 16 partitions, 8 for each machine. Each partition is managed by a separate server process.

The test database has the following data sets:

• DBpedia 3.2
• Musicbrainz
• Bio2RDF
• Neurocommons
• Uniprot
• Freebase (95M triples)
• Ping The Semantic Web (1.6 million miscellaneous files from http://www.pingthesemanticweb.com).

Ontologies:

• Yago
• Open CYC
• Umbel
• DBpedia

The database is 2.2 billion triples with 356 million distinct URI’s.

9. FUTURE WORK
All the functions discussed above are presently being productized for delivery with Virtuoso 6, so that single servers are open source and clusters commercial only. The most relevant future work is thus final debugging and tuning of existing functionality.

The technology will be first commercially used as a platform for an Amazon EC2 offering of the whole LOD cloud on a cluster of servers. This complements the existing line of data sets pre-packaged by OpenLink[11].

For more sophisticated, also editable user facing functionality, OpenLink is presently working with the developers of OntoWiki[12] on integrating the functionality discussed here into OntoWiki as a new large-scale back-end. From this development, we expect to have the functional equivalent of Freebase[13], except with more data, working with open, standard data models, being more integrable and above all having a full range of deployment options. This means anything from the desktop to the data center with either software as service or installation at end user sites as options.

We presently rank search results on text match scores and link density around the URI’s related to the text hits. We expect having semantics associated with links to open new possibilities in this domain. We plan to leverage link semantics for ranking but as of this writing have not extensively explored this.

10. CONCLUSIONS
We have presented a set of query processing techniques and a web service and user interface for interactive browsing of a large corpus of linked data. We have shown significant scalability on low cost server hardware, with open ended scale out capacity for larger data set sizes and more concurrent usage.

The service described is online and is also packaged with Virtuoso 6 open source distributions.

The technical experience derived from developing this service emphasizes the following:

• Central importance of a SPARQL/SQL cost model that is aware of hierarchies and is capable of sampling data as needed. Without the right execution plan, no amount of hardware will save the day.
• The importance of enforcing a cap on resource usage.
• The need for scale-out in order to have enough data in memory. Disk is a far greater bottleneck than processor or network speed. Scaling out in a shared nothing fashion is by far the most economical and scalable means of increasing total memory, disk bandwidth and processing power.
• Additional verification of our capacity to schedule parallel query processing on a distributed memory cluster without being killed by latency.
• Confirmation of the Virtuoso platform’s flexibility for building additional data intensive services, such as entity ranking.

Present work is therefore concentrated on refining and productizing the platform and its RDF applications. We believe this to be a significant infrastructure element enabling the take off of linked data.
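
Since the cost model's ability to sample data is singled out above, the single-path B tree sampling of Section 2 can be illustrated with a small Python sketch. This is a hedged illustration, not the Virtuoso implementation: the Leaf and Inner classes and the build and estimate functions are hypothetical, keys are plain integers, and the extrapolation used here (the product of the matching child counts along the path times the matches on the one leaf read) is an assumption, as the text does not spell out the exact formula.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    keys: List[int]                       # sorted keys stored on this page


@dataclass
class Inner:
    seps: List[int]                       # child i holds keys k with seps[i-1] <= k < seps[i]
    children: List[Union["Inner", Leaf]]


def build(keys: List[int], page: int = 4) -> Union[Inner, Leaf]:
    """Bulk-build a toy tree with `page` entries per node from sorted keys."""
    level: List[Union[Inner, Leaf]] = [
        Leaf(keys[i:i + page]) for i in range(0, len(keys), page)
    ]
    mins = [leaf.keys[0] for leaf in level]
    while len(level) > 1:
        groups = [level[i:i + page] for i in range(0, len(level), page)]
        gmins = [mins[i:i + page] for i in range(0, len(mins), page)]
        level = [Inner(m[1:], g) for g, m in zip(groups, gmins)]
        mins = [m[0] for m in gmins]
    return level[0]


def estimate(node: Union[Inner, Leaf], lo: int, hi: int) -> int:
    """Estimate how many keys fall in [lo, hi] by reading one root-to-leaf path."""
    fanout = 1
    while isinstance(node, Inner):
        first = bisect_right(node.seps, lo)   # leftmost child that may hold a match
        last = bisect_right(node.seps, hi)    # rightmost child that may hold a match
        fanout *= last - first + 1            # children matching at this level
        node = node.children[first]           # always descend into the leftmost one
    on_page = bisect_right(node.keys, hi) - bisect_left(node.keys, lo)
    return fanout * on_page                   # extrapolate from the single leaf


root = build(list(range(64)))
print(estimate(root, 0, 63))   # 64, the whole tree
print(estimate(root, 2, 31))   # 16, while the true count is 30
```

The second call shows the kind of deviation to expect: the leftmost leaf is only half covered by the range, so the estimate is low, much as the sampled guess for performers falls below the real count.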
11. REFERENCES
[1] Suchanek, F.M.; Kasneci, G.; Weikum, G.: YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. WWW2007, ACM 978-1-59593-654-7/07/0005.
[2] Overview of OpenCyc.
    http://www.cyc.com/cyc/opencyc/overview
[3] UMBEL Ontology, Vol. 1: Technical Documentation, TR 08-08-28-A1.
    http://www.umbel.org/doc/UMBELOntology_vA1.pdf
[4] Auer, S.; Bizer, C.; Lehmann, J.; Kobilarov, G.; Cyganiak, R.; Ives, Z.: DBpedia: A Nucleus for a Web of Open Data. In Aberer et al. (Eds.): The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007. LNCS 4825 Springer 2007, ISBN 9783-540762973.
[5] The National Center for Biomedical Ontology: Resources.
    http://bioontology.org/repositories.html
[6] OpenLink Software, Inc. Virtuoso 6 FAQ.
    http://virtuoso.openlinksw.com/Whitepapers/html/Virt6FAQ.html
[7] Brickley, D.; Miller, L.: FOAF Vocabulary Specification 0.91.
    http://xmlns.com/foaf/spec/
[8] Bojars, U.; Breslin, J.G. (eds.): SIOC Core Ontology Specification.
    http://rdfs.org/sioc/spec/
[9] Erling, O.: “E Pluribus Unum”, or “Inversely Functional Identity”, or “Smooshing Without the Stickiness”.
    http://www.openlinksw.com/dataspace/oerling/weblog/Orri%20Erling's%20Blog/1498
[10] Hausenblas, M.: Discovery and Usage of Linked Datasets on the Web of Data. NodMag #4. Available at
    http://www.talis.com/nodalities/pdf/nodalities_issue4.pdf
[11] OpenLink Software, Inc. Virtuoso Universal Server (Cloud Edition) AMI for EC2.
    http://virtuoso.openlinksw.com/wiki/main/Main/VirtuosoEC2AMI
[12] Auer, S.; Dietzold, S.; Riechert, T.: OntoWiki: A Tool for Social, Semantic Collaboration. 5th International Semantic Web Conference, Nov 5th–9th, Athens, GA, USA. In I. Cruz et al. (Eds.): ISWC 2006, LNCS 4273, pp. 736-749, 2006. Springer-Verlag Berlin Heidelberg 2006.
[13] Metaweb Technologies, Inc.: What is Freebase?
    http://www.freebase.com/view/en/what_is_freebase
[14] Stoermer, H.: Entity Name System: The Back-bone of an Open and Scalable Web of Data. In: Proceedings of the IEEE International Conference on Semantic Computing, ICSC 2008, number CSS-ICSC 2008-4-28-25. IEEE, August 2008. Available at
    http://www.okkam.org/publications/stoermer-EntityNameSystem.pdf/at_download/file
[15] Brin, S.; Page, L.: The Anatomy of a Large-Scale Hypertextual Web Search Engine. In: Seventh International World-Wide Web Conference (WWW 1998), April 14-18, 1998, Brisbane, Australia. Available at
    http://ilpubs.stanford.edu:8090/361/