Faceted Views over Large-Scale Linked Data

Orri Erling
OpenLink Software, Inc.
10 Burlington Mall Road, Suite 265
Burlington, MA 01803, U.S.A.
oerling@openlinksw.com

Copyright is held by the author/owner(s).
LDOW2009, April 20, 2009, Madrid, Spain.

ABSTRACT

Faceted views over structured and semi-structured data have been popular in user interfaces for some years. Deploying such views of arbitrary linked data at arbitrary scale has been hampered by lack of suitable back end technology. Many ontologies are also quite large, with hundreds of thousands of classes.

Also, the linked data community has been concerned with the processing cost and potential for denial of service presented by public SPARQL end points.

This paper discusses how we use Virtuoso Cluster Edition for providing interactive browsing over billions of triples, combining full text search, structured querying and result ranking. We discuss query planning, run time inferencing and partial query evaluation. This functionality is exposed through SPARQL, a specialized web service and a web user interface.

Categories and Subject Descriptors

H.5.4 [Information Systems]: Hypertext/Hypermedia; H.2.8 [Information Systems]: Database Applications

Keywords

Faceted Views, Linked Data, SPARQL, OpenLink Virtuoso, partial query evaluation, entity ranking, large ontologies

1. INTRODUCTION

The transition of the web from a distributed document repository into a universal, ubiquitous database requires a new dimension of scalability for supporting rich user interaction. If the web is the database, then it also needs a query and report writing tool to match. A faceted user interaction paradigm has been found useful for aiding discovery and query of variously structured data. Numerous implementations exist but they are chiefly client side and are limited in the data volumes they can handle.

At the present time, linked data is well beyond prototypes and proofs of concept. This means that what was done in limited specialty domains before must now be done at real world scale, in terms of both data volume and ontology size. On the schema, or T box side, there exist many comprehensive general purpose ontologies such as Yago[1], Open CYC[2], Umbel[3] and the DBpedia[4] ontology and many domain specific ones, such as [5]. For these to enter into the user experience, the platform must be able to support the user's choice of terminology or terminologies as needed, preferably without blow up of data and concomitant slowdown.

Likewise, in the LOD world, many link sets have been created for bridging between data sets. Whether such linkage is relevant will depend on the use case. Therefore we provide fine grained control over which owl:sameAs assertions will be followed, if any.

Against this background, we discuss how we tackle incremental interactive query composition on arbitrary data with Virtuoso Cluster[6].

Using SPARQL or a web/web service interface, the user can form combinations of text search and structured criteria, including joins to an arbitrary depth. If queries are precise and select a limited number of results, the results are complete. If queries would select tens of millions of results, partial results are shown.

The system being described is being actively developed as of this writing, early March of 2009, and is online at lod.openlinksw.com. The data set is a combination of DBpedia, Musicbrainz, Freebase, web crawls from www.pingthesemanticweb.com, Uniprot, Neurocommons and Bio2RDF.

The hardware consists of two 8-core servers with 16G RAM and 4 disks each. The system runs on Virtuoso 6 Cluster Edition. All application code is written in SQL procedures with limited client side Ajax; the Virtuoso platform itself is in C.

The facets service allows the user to start with a text search or a fixed URI and to refine the search by specifying classes, property values etc., on the selected subjects or any subjects referenced therefrom.

This process generates queries involving combinations of text and structured criteria, often dealing with property and class hierarchies and often involving aggregation over millions of subjects, especially at the initial stages of query composition. To make this work within interactive time, two things are needed:

1. a query optimizer that can almost infallibly produce the right join order based on cardinalities of the specific constants in the query

2. a query execution engine that can return partial results after a timeout.

It is often the case, especially at the beginning of query formulation, that the user only needs to know if there are relatively many or few results that are of a given type or involve a given property. Thus partially evaluating a query is often useful for producing this information. This must however be possible with an arbitrary query; simply citing precomputed statistics is not enough.

It has for a long time been a given that any search-like application ranks results by relevance. Whenever the facets service shows a list of results, not an aggregation of result types or properties, it is sorted on a composite of text match score and link density.

The paper is divided into the following parts:

• SPARQL query optimization and execution adapted for run time inference over large subclass structures

• Resolving identity with inverse functional properties

• Ranking entities based on graph link density

• SPARQL partial query evaluation for displaying partial results in fixed time

• a facets web service providing an XML interface for submitting queries, so that the user interface is not required to parse SPARQL

• a sample web interface for interacting with this

• sample queries and their evaluation times against combinations of large LOD data sets

2. PROCESSING LARGE HIERARCHIES IN SPARQL

Virtuoso has for a long time had built-in superclass and superproperty inference. This is enabled by specifying the define input:inference "context" option, where context is previously declared to be all subclass, subproperty, equivalence, inverse functional property and same as relations defined in a given graph. The ontology file is loaded into its own graph and this is then used to construct the context. Multiple ontologies and their equivalences can be loaded into a single graph which then makes another context which holds the union of the ontology information from the merged source ontologies.

Let us consider a sample query combining a full text search and a restriction on the class of the desired matches:

    define input:inference "yago"
    prefix cy: <...>
    select distinct ?s1 as ?c1,
      (bif:search_excerpt (bif:vector ('Shakespeare'), ?o1)) as ?c2
    where {
      ?s1 ?s1textp ?o1 .
      filter (bif:contains (?o1, '"Shakespeare"')) .
      ?s1 a cy:Performer110415638 .
    } limit 20

This selects all Yago performers that have a property that contains "Shakespeare" as a whole word.

The define input:inference "yago" clause means that subclass, subproperty and inverse functional property statements contained in the inference context called yago are considered when evaluating the query. The built-in function bif:search_excerpt makes a search engine style summary of the found text, highlighting occurrences of Shakespeare.

The bif:contains function in the filter specifies the full text search condition on ?o1.

This query is a typical example of queries that are executed all the time when a user refines a search. We will now look at how we can make an efficient execution plan for the query. First, we must know the cardinalities of the search conditions.

To see the count of subclasses of Yago performer, we can do:

    prefix cy: <...>
    select count (*)
    from <...>
    where {
      ?s rdfs:subClassOf cy:Performer110415638
        option (transitive, t_distinct) }

There are 4601 distinct subclasses, including indirect ones. Next we look at how many Shakespeare mentions there are:

    select count (*) where {
      ?s ?p ?o .
      filter (bif:contains (?o, 'Shakespeare')) }

There are 10267 subjects with Shakespeare mentioned in some literal. Finally, we count the members of the class:

    define input:inference "yago"
    prefix cy: <...>
    select count (*) where {
      ?s1 a cy:Performer110415638 . }

There are 184885 individuals that belong to some subclass of performer.

This is the data that the SPARQL compiler must know in order to have a valid query plan. Since these values will vary wildly depending on the specific constants in the query, the actual database must be consulted as needed while preparing the execution plan. This is regular query processing technology but is now specially adapted for deep subclass and subproperty structures.

Conditions in the queries are not evaluated twice, once for the cardinality estimate and once for the actual run. Instead, the cardinality estimate is a rapid sampling of the index trees that reads at most one leaf page.

Consider a B tree index, which we descend from the top to the leftmost leaf containing a match of the condition. At each level, we count how many children would match and always select the leftmost one. When we reach a leaf, we see how many entries are on the page. From these observations, we extrapolate the total count of matches.

With this method, the guess for the count of performers is 114213, which is acceptably close to the real number.

Given these numbers, we see that it makes sense to first find the full text matches and then retrieve the actual classes of each and see if this class is a subclass of performer. This last check is done against a memory resident copy of the Yago hierarchy, the same copy that was used for enumerating the subclasses of performer.

However, the query

    define input:inference "yago"
    prefix cy: <...>
    select distinct ?s1 as ?c1,
      (bif:search_excerpt (bif:vector ('Shakespeare'), ?o1)) as ?c2
    where {
      ?s1 ?s1textp ?o1 .
      filter (bif:contains (?o1, '"Shakespeare"')) .
      ?s1 a cy:ShakespeareanActors .
    }

will start with Shakespearean actors since this is a leaf class with only 74 instances and then check if the properties contain Shakespeare and return their search summaries.

In principle, this is common cost based optimization but is here adapted to deep hierarchies combined with text patterns. An unmodified SQL optimizer would have no possibility of arriving at these results.

The implementation reads the graphs designated as holding ontologies when first needed and subsequently keeps a memory based copy of the hierarchy on all servers. This is used for quick iteration over sub/superclasses or properties as well as for checking if a given class or property is a subclass/property of another. Triples with OWL predicates equivalentClass, equivalentProperty and sameAs are also cached in the same data structure if they occur in the ontology graphs.

Also, cardinality estimates for members of classes near the root of the class hierarchy take some time since a sample of each subclass is needed. These are cached for some minutes in the inference context, so that repeated queries will not redo the sampling.

3. INVERSE FUNCTIONAL PROPERTIES AND SAME AS

Especially when navigating social data, as in FOAF[7] and SIOC[8] spaces, there are many blank nodes that are identified by properties only. For this, we offer an option for automatically joining to subjects which share an IFP value with the subject being processed. For example, the query for the friends of friends of Kjetil Kjernsmo returns empty:

    select count (?f2) where {
      ?s a foaf:Person ; ?p ?o ; foaf:knows ?f1 .
      ?o bif:contains "'Kjetil Kjernsmo'" .
      ?f1 foaf:knows ?f2 };

But with the option

    define input:inference "b3sifp"
    select count (?f2) where {
      ?s a foaf:Person ; ?p ?o ; foaf:knows ?f1 .
      ?o bif:contains "'Kjetil Kjernsmo'" .
      ?f1 foaf:knows ?f2 };

we get 4022. We note that there are many duplicates since the data is blank nodes only, with people easily represented 10 times. The context b3sifp simply declares that foaf:name and foaf:mbox_sha1sum should be treated as inverse functional properties (IFP). The name is not an IFP in the actual sense but treating it as such for the purposes of this one query makes sense, otherwise nothing would be found.

This option is controlled by the choice of the inference context, which is selectable in the interface discussed below.

The IFP inference can be thought of as a transparent addition of a subquery into the join sequence. The subquery joins each subject to its synonyms given by sharing IFP's. This subquery has the special property that it has the initial binding automatically in its result set. It could be expressed as:

    select ?f where {
      ?k foaf:name "Kjetil Kjernsmo" .
      { select ?org ?syn where {
          ?org ?p ?key .
          ?syn ?p ?key .
          filter ( bif:rdf_is_sub ("b3sifp", ?p, <...>, 3) &&
                   ?syn != ?org ) }
      } option (transitive,
                t_in (?org), t_out (?syn), t_min (0), t_max (1) )
      filter (?org = ?k) .
      ?syn foaf:knows ?f . }

It is true that each subject shares IFP values with itself but the transitive construct with 0 minimum and 1 maximum depth allows passing the initial binding of ?org directly to ?syn, thus getting first results more rapidly. The rdf_is_sub function is an internal function that simply tests whether ?p is a subproperty of b3s:any_ifp.

Internally, the implementation has a special query operator for this and the internal form is more compact than would result from the above, but the above could be used to the same effect.

The issues of run time vs precomputed identity inference through IFP's and owl:sameAs are discussed in much more detail at [9].

Our general position is that identity criteria are highly application specific and thus we offer the full spectrum of choice between run time and precomputing. Further, weaker identity statements than sameness are difficult to use in queries, thus we prefer identity with semantics of owl:sameAs but make this an option that can be turned on and off query by query.

4. ENTITY RANKING

It is a common end user expectation to see text search results sorted by their relevance. The term entity rank refers to a quantity describing the relevance of a URI in an RDF graph. This is a sample query using entity rank:

    prefix yago: <...>
    prefix prop: <...>
    select distinct ?s2 as ?c1 where {
      ?s1 ?s1textp ?o1 .
      ?o1 bif:contains 'Shakespeare' .
      ?s1 a yago:Writer110794014 .
      ?s2 prop:writer ?s1 .
    } order by desc (<...> (?s2))
    limit 20 offset 0

This selects works where a writer with Shakespeare in some property is the writer.

Here the query returns subjects, thus no text search summaries, so only the entity rank of the returned subject is used. We order text results by a composite of text hit score and entity rank of the RDF subject where the text occurs. The entity rank of the subject is defined by the count of references to it, weighted by the rank of the referrers and the outbound link count of referrers. Such techniques are used in text based information retrieval.[15]

One interesting application of entity rank and inference on IFP's and owl:sameAs is in locating URI's for reuse. We can easily list synonym URI's in order of popularity as well as locate URI's based on associated text. This can serve in applications such as the Entity Name System[14].

Entity ranking is one of the few operations where we take a precomputing approach. Since a rank is calculated based on a possibly long chain of references, there is little choice but to precompute. The precomputation itself is straightforward enough: First all outbound references are counted for all subjects. Next all ranks of subjects are incremented by 1 over the referrer's outbound link count. On successive iterations, the increment is based on the rank increment the referrer received in the previous round.

The operation is easily partitioned, since each partition increments the ranks of subjects it holds. The referrers are spread throughout the cluster, though. When rank is calculated, each partition accesses every other partition. This is done with relatively long messages: referrer ranks are accessed in batches of several thousand at a time, thus absorbing network latency.

On the test system, this operation performs a single pass over the corpus of 2.2 billion triples and 356 million distinct subjects in about 30 minutes. The operation has 100% utilization of all 16 cores. Adding hardware would speed it up, as would implementing it in C instead of the SQL procedures it is written in at present.

The main query in rank calculation is

    select O, P, iri_rank (S)
    from rdf_quad table option (no cluster)
    where isiri_id(O) order by O;

This is the SQL cursor iterated over by each partition. The no cluster option means that only rows in this process' partition are retrieved. The RDF_QUAD table holds the RDF quads in the store, i.e. triple plus graph. The S, P, O columns are the subject, predicate and object respectively. The graph column is not used here. The iri_rank is a partitioned SQL function. This works by using the S argument to determine which cluster node should run the function. The specifics of the partitioning are declared elsewhere. The calls are then batched for each intended recipient and sent when the batches are full. The SQL compiler automatically generates the relevant control structures. This is like an implicit map operation in the map-reduce terminology.

An SQL procedure loops over this cursor, adds up the rank and, when seeing a new O, persists the added rank into a table. Since links in RDF are typed, we can use the semantics of the link to determine how much rank is transferred by a reference. With extraction of named entities from text content, we can further place a given entity into a referential context and use this as a weighting factor. This is to be explored in future work. The experience thus far shows that we greatly benefit from Virtuoso being a general purpose DBMS, as we can create application specific data structures and control flows where these are efficient. For example, it would make little sense to store entity ranks as triples due to space consumption and locality considerations. With these tools, the whole ranking functionality took under a week to develop.

5. QUERY EVALUATION TIME LIMITS

When scaling the Linked Data model, we have to take it as a given that the workload will be unexpected and that the query writers will often be unskilled in databases. Insofar as possible, we wish to promote the forming of a culture of creative reuse of data. To this effect, even poorly formulated questions deserve an answer that is better than just a timeout.

If a query produces a steady stream of results, interrupting it after a certain quota is simple. However, most interesting queries do not work in this way. They contain aggregation, sorting, maybe transitivity.

When evaluating a query with a time limit in a cluster setup, all nodes monitor the time left for the query. When dealing with a potentially partial query to begin with, there is little point in transactionality. Therefore the facet service uses read committed isolation. A read committed query will never block since it will see the before-image of any transactionally updated row. There will be no waiting for locks and timeouts can be managed locally by all servers in the cluster.

Thus, when having a partitioned count, for example, we expect all the partitions to time out around the same time and send a ready message with the timeout information to the cluster node coordinating the query. The condition raised by hitting a partial evaluation time limit differs from a run time error in that it leaves the query state intact on all participating nodes. This allows the timeout handling to come and fetch any accumulated aggregates.

Let us consider the query for the top 10 classes of things with "Shakespeare" in some literal. This is typical of the workload generated by the faceted browsing web service:

    define input:inference "yago"
    select ?c count (*) where {
      ?s a ?c ; ?p ?o .
      ?o bif:contains "Shakespeare" .
    } group by ?c order by desc 2 limit 10

On the first execution with an entirely cold cache, this times out after 2 seconds and returns:

    yago:class/yago/Entity100001740          566
    yago:class/yago/PhysicalEntity100001930  452
    yago:class/yago/Object100002684          452
    yago:class/yago/Whole100003553           449
    yago:class/yago/Organism100004475        375
    yago:class/yago/LivingThing100004258     375
    yago:class/yago/CausalAgent100007347     373
    yago:class/yago/Person100007846          373
    yago:class/yago/Abstraction100002137     150
    yago:class/yago/Communicator109610660    125

The next repeat gets about double the counts, starting with 1291 entities.

With a warm cache, the query finishes in about 300 ms (4 core Xeon, Virtuoso 6 Cluster) and returns:

    yago:class/yago/Entity100001740          13329
    yago:class/yago/PhysicalEntity100001930  10423
    yago:class/yago/Whole100003553           10210
    yago:class/yago/LivingThing100004258     8868
    yago:class/yago/Organism100004475        8868
    yago:class/yago/CausalAgent100007347     8853
    yago:class/yago/Person100007846          8853
    yago:class/yago/Abstraction100002137     3284

It is a well known fact that running from memory is thou[...]

The query plan begins with the text search. The subjects with "Shakespeare" in some property get dispatched to the partition that holds their class. Since all partitions know the class hierarchy, the superclass inference runs in parallel, as [...] have finished, the process coordinating the query fetches the partial aggregates, adds them up and sorts them by count.

[...] classes of the text matches are being retrieved. When this happens, this part of the query is reset, but the aggregate states are left in place. The process coordinating the query then goes on as if the aggregates had completed. If there are many levels of nested aggregates, each timeout terminates [...] thus a query is guaranteed to return in no more than n timeouts, where n is the number of nested aggregations or [...]

6. FACETS WEB SERVICE

The Virtuoso Facets web service is a general purpose RDF query facility for facet based browsing. It takes an XML description of the view desired and generates the reply as an XML tree containing the requested data. The user agent [...] end user. The selection of facets and values is represented as an XML tree. The rationale for this is the fact that such a representation is easier to process in an application than the SPARQL source text or a parse tree of SPARQL and more [...] for faceted browsing. All such queries internally generate SPARQL and the SPARQL generated is returned with the results. One can therefore use this as a starting point for hand crafted queries.

The query has the top level element <...>. The child elements of this represent conditions pertaining to a single subject. A join is expressed with the property or property-of element. This has in turn children which state conditions on a property of the first subject. Property and property-of elements can be nested to an arbitrary depth and many can occur inside one containing element. In this way, tree-shaped structures of joins can be expressed.

Expressing more complex relationships, such as intermediate grouping, subqueries, arithmetic or such requires writing the query in SPARQL. The XML format is for easy automatic composition of queries needed for showing facets, not a replacement for SPARQL.

Consider composing a map of locations involved with Napoleon. Below we list user actions and the resulting XML query descriptions.

• Enter in the search form "Napoleon":

    [...] napoleon [...]

• Select the "types" view:

    [...] napoleon [...]

• Choose "MilitaryConflict" type:

    [...] napoleon [...]

• Choose "NapoleonicWars":

    [...] napoleon [...]

• Select "any location" in the select list beside the "map" link, then hit the "map" link:

    [...] napoleon [...]

This last XML fragment corresponds to the following SPARQL query text:

    select ?location as ?c1 ?lat1 as ?c2 ?lng1 as ?c3
    where {
      ?s1 ?s1textp ?o1 .
      filter (bif:contains (?o1, '"Napoleon"')) .
      ?s1 a <...> .
      ?s1 a <...> .
      ?s1 ?anyloc ?location .
      ?location geo:lat ?lat1 ; geo:long ?lng1 .
    }
    limit 200 offset 0

The query takes all subjects with some literal property with "Napoleon" in it, then filters for military conflicts and Napoleonic wars, then takes all objects related to these where the related object has a location. The map has the objects and their locations.

Figure 1: The displayed result

7. VOID DISCOVERABILITY

A long awaited addition to the LOD cloud is the Vocabulary of Interlinked Data (VoID)[10]. Virtuoso automatically generates VoID descriptions of data sets it hosts. Virtuoso incorporates an SQL function rdf_void_gen which returns a Turtle representation of a given graph's VoID statistics.

8. TEST SYSTEM AND DATA

The test system consists of two 2x4 core Xeon 5345, 2.33 GHz servers with 16G RAM and 4 disks each. The machines are connected by two 1Gbit Ethernet connections. The software is Virtuoso 6 Cluster. The Virtuoso server is split into 16 partitions, 8 for each machine. Each partition is managed by a separate server process.

The test database has the following data sets:

• DBpedia 3.2

• Musicbrainz

• Bio2RDF

• Neurocommons

• Uniprot

• Freebase (95M triples)

• Ping The Semantic Web (1.6 million miscellaneous files from http://www.pingthesemanticweb.com).

Ontologies:

• Yago

• Open CYC

• Umbel

• DBpedia

The database is 2.2 billion triples with 356 million distinct URI's.

9. FUTURE WORK

All the functions discussed above are presently being productized for delivery with Virtuoso 6, so that single servers are open source and clusters commercial only. The most relevant future work is thus final debugging and tuning of existing functionality.

The technology will be first commercially used as a platform for an Amazon EC2 offering of the whole LOD cloud on a cluster of servers. This complements the existing line of data sets pre-packaged by OpenLink[11].

For more sophisticated, also editable user facing functionality, OpenLink is presently working with the developers of OntoWiki[12] on integrating the functionality discussed here into OntoWiki as a new large-scale back-end. From this development, we expect to have the functional equivalent of Freebase[13], except with more data, working with open, standard data models, being more integrable and above all having a full range of deployment options. This means anything from the desktop to the data center with either software as service or installation at end user sites as options.

We presently rank search results on text match scores and link density around the URI's related to the text hits. We expect having semantics associated with links to open new possibilities in this domain. We plan to leverage link semantics for ranking but as of this writing have not extensively explored this.

10. CONCLUSIONS

We have presented a set of query processing techniques and a web service and user interface for interactive browsing of a large corpus of linked data. We have shown significant scalability on low cost server hardware, with open ended scale out capacity for larger data set sizes and more concurrent usage.

The service described is online and is also packaged with Virtuoso 6 open source distributions.

The technical experience derived from developing this service emphasizes the following:

• Central importance of a SPARQL/SQL cost model that is aware of hierarchies and is capable of sampling data as needed. Without the right execution plan, no amount of hardware will save the day.

• The importance of enforcing a cap on resource usage.

• The need for scale-out in order to have enough data in memory. Disk is a far greater bottleneck than processor or network speed. Scaling out in a shared nothing fashion is by far the most economical and scalable means of increasing total memory, disk bandwidth and processing power.

• Additional verification of our capacity to schedule parallel query processing on a distributed memory cluster without being killed by latency.

• Confirmation of the Virtuoso platform's flexibility for building additional data intensive services, such as entity ranking.

Present work is therefore concentrated on refining and productizing the platform and its RDF applications. We believe this to be a significant infrastructure element enabling the take off of linked data.

11. REFERENCES

[1] Suchanek, F.M.; Kasneci, G.; Weikum, G.: YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. WWW2007, ACM 978-1-59593-654-7/07/0005.

[2] Overview of OpenCyc. http://www.cyc.com/cyc/opencyc/overview

[3] UMBEL Ontology, Vol. 1: Technical Documentation, TR 08-08-28-A1. http://www.umbel.org/doc/UMBELOntology_vA1.pdf

[4] Auer, S.; Bizer, C.; Lehmann, J.; Kobilarov, G.; Cyganiak, R.; Ives, Z.: DBpedia: A Nucleus for a Web of Open Data. In Aberer et al. (Eds.): The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15, 2007. LNCS 4825, Springer 2007, ISBN 978-3-540-76297-3.

[5] The National Center for Biomedical Ontology: Resources. http://bioontology.org/repositories.html

[6] OpenLink Software, Inc.: Virtuoso 6 FAQ. http://virtuoso.openlinksw.com/Whitepapers/html/Virt6FAQ.html

[7] Brickley, D.; Miller, L.: FOAF Vocabulary Specification 0.91. http://xmlns.com/foaf/spec/

[8] Bojars, U.; Breslin, J.G. (eds.): SIOC Core Ontology Specification. http://rdfs.org/sioc/spec/

[9] Erling, O.: "E Pluribus Unum", or "Inversely Functional Identity", or "Smooshing Without the Stickiness". http://www.openlinksw.com/dataspace/oerling/weblog/Orri%20Erling's%20Blog/1498

[10] Hausenblas, M.: Discovery and Usage of Linked Datasets on the Web of Data. NodMag #4. Available at http://www.talis.com/nodalities/pdf/nodalities_issue4.pdf

[11] OpenLink Software, Inc.: Virtuoso Universal Server (Cloud Edition) AMI for EC2. http://virtuoso.openlinksw.com/wiki/main/Main/VirtuosoEC2AMI

[12] Auer, S.; Dietzold, S.; Riechert, T.: OntoWiki - A Tool for Social, Semantic Collaboration. 5th International Semantic Web Conference, Nov 5th–9th, Athens, GA, USA. In I. Cruz et al. (Eds.): ISWC 2006, LNCS 4273, pp. 736-749, 2006. Springer-Verlag Berlin Heidelberg 2006.

[13] Metaweb Technologies, Inc.: What is Freebase? http://www.freebase.com/view/en/what_is_freebase

[14] Stoermer, H.: Entity Name System: The Back-bone of an Open and Scalable Web of Data. In: Proceedings of the IEEE International Conference on Semantic Computing, ICSC 2008, number CSS-ICSC 2008-4-28-25. IEEE, August 2008. Available at http://www.okkam.org/publications/stoermer-EntityNameSystem.pdf/at_download/file

[15] Brin, S.; Page, L.: The Anatomy of a Large-Scale Hypertextual Web Search Engine. In: Seventh International World-Wide Web Conference (WWW 1998), April 14-18, 1998, Brisbane, Australia. Available at http://ilpubs.stanford.edu:8090/361/
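
As an illustration of the rank precomputation described in Section 4 (ENTITY RANKING), the following is a minimal single-process sketch in Python. It is not the partitioned SQL procedure the paper refers to. The function name `entity_ranks`, the `rounds` parameter and the in-memory list of triples are assumptions made for the example, as is the choice to divide later-round increments by the referrer's outbound link count, which the text implies but does not state outright.

```python
from collections import defaultdict


def entity_ranks(triples, rounds=3):
    """Toy link-density rank over (subject, predicate, object) triples.

    Hypothetical helper: objects are assumed to be IRI-valued, so every
    triple is a reference from its subject to its object.
    """
    triples = list(triples)

    # Step 1: count outbound references for every subject.
    out_count = defaultdict(int)
    for s, _p, _o in triples:
        out_count[s] += 1

    rank = defaultdict(float)
    # Seed so that the first round transfers 1 / (referrer's outbound count).
    prev_inc = {s: 1.0 for s in out_count}

    for _ in range(rounds):
        inc = defaultdict(float)
        for s, _p, o in triples:
            # Later rounds pass on the increment the referrer itself
            # received in the previous round (assumed to be split evenly
            # over its outbound links).
            inc[o] += prev_inc.get(s, 0.0) / out_count[s]
        for node, delta in inc.items():
            rank[node] += delta
        prev_inc = inc

    return dict(rank)


if __name__ == "__main__":
    sample = [
        ("a", "p", "c"),
        ("b", "p", "c"),
        ("b", "p", "d"),
        ("c", "p", "d"),
    ]
    print(entity_ranks(sample, rounds=2))  # {'c': 1.5, 'd': 3.0}
```

In this toy graph, `d` ends up ranked above `c` because it is referenced by `c`, which is itself referenced, so rank flows along the chain of references. In the system described in the paper the same accumulation is spread over partitions, with referrer ranks fetched in batches.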