Linked Data Spaces & Data Portability

Kingsley Idehen
OpenLink Software, 10 Mall Road, Burlington, MA 01803, USA
kidehen@openlinksw.com

Orri Erling
OpenLink Software, 10 Mall Road, Burlington, MA 01803, USA
oerling@openlinksw.com

ABSTRACT
In the year 2007, the size of the Linked Data injected into the Web grew to several billion RDF triples, served by a network of interlinked data sources that cover domains such as general knowledge, geographic information, people, companies, online communities, films, music, books and scientific publications. Unfortunately, the growth rate of User Generated content from a variety of Web based unstructured and semi-structured data-silos continues to exceed that of structured Linked Data. Thus, we have a pressing need for technology capable of bridging this broadening divide via transparent generation of Linked Data from existing data-silos on the Web. Our Linked Data technology demonstration explores the use of the OpenLink Data Spaces platform as a solution to this problem.

Categories and Subject Descriptors
H.3.2 [Information Storage]
H.3.3 [Information Search & Retrieval]

General Terms
Management, Performance, Design, Standardization, Languages, Theory

Keywords
Linked Data, Semantic Web, SPARQL, Data Integration, Data Spaces

1. INTRODUCTION
User generated content is growing at an exponential rate behind corporate firewalls and across the Internet in general. The use of Web technologies has been the prime accelerator of this growth due to the pervasiveness of Web based distributed collaborative applications. Examples include: Social Networking, Weblogs, Wikis, Shared Bookmark Managers, Photo Sharing, Polls Management, Calendars, Discussion Forums, File Sharing, and Feed Aggregation, to name a few.

The exponential growth of user-generated content has resulted in the growth of silos comprised of unstructured and/or semi-structured content. Unfortunately, these silos have accelerated, rather than decelerated, the imminence of an “information overload” quagmire.

To alleviate the imminent challenges of global information overload, we need to unobtrusively construct a Web of interlinked structured data from today’s data silos, comprised of the following:

• RDF based structured data
• Standardized data serialization formats
• HTTP based Unique Identifiers for all Data Items (web resources and abstract & concrete things)
• HTTP based Data Set containers (Data Spaces)
• Data Servers that provide data management and data access services for one or more Data Spaces
• Key infrastructure oriented shared ontologies
• Query Language for interacting with structured data

We identify the items above, collectively, as critical components of Linked Data Spaces: points of presence on the Web that expose structured data via HTTP based URIs.

During this demonstration / presentation session we will explore the creation of “Data Junction Boxes in the Clouds” via OpenLink Data Spaces, which exploits built-in RDFization Middleware plus the ability to mesh User Identity and User Data, en route to surmounting the issues and challenges associated with Data Portability attainment.

2. Issues & Challenges

2.1 Data Portability
It’s no secret that data wants to be free of the tyranny of application logic confinement. In recent times, the realization that meshing Identity and Data ownership on the Web is a critical requirement of this pursuit of freedom has resulted in the emergence of a movement for Data Portability as yet another enclave within the broader Open Data movement.

Data portability addresses two key issues: data mobility and data referencing. Today, data mobility through the use of standard data formats for moving data across silos (import and export style) has emerged as the focal point of attention with regard to addressing the proliferation of data silos on the Web. Examples include: RSS 1.0, RSS 2.0, Atom, OPML, FOAF, SIOC, and others. Unfortunately, the ability to reference and de-reference data across data-silos is yet to catch the attention of those pursuing data portability.

2.2 RDFization Middleware
The traditional resistance to RDF adoption, which is critical to Linked Data comprehension and production, comes from the grounding of the RDF Data Model in Graph Theory and the unwillingness of most Web Application developers to interact with data formally. This reality has led to a genre of middleware tools, collectively known as RDFizers, that generate RDF on the fly. With regard to Linked Data, generating RDF on the fly is only part of the equation; the generated RDF must retain the core principles of Linked Data by providing URIs for physical web accessible resources, concrete entities, and abstract things. Of course, this process must include intelligent production of instance data associated with relevant shared schemas or ontologies.

2.3 Data Junction Boxes in the Clouds
It is our belief that the Linked Data Web will be more distributed than centralized in architecture. We envisage a Linked Data Web comprised of hubs that range in size from large (e.g. DBpedia, Geonames, Zitgist etc.), through medium-sized group hubs (e.g. RDFized Weblogs, Wikis, Bulletin Boards etc.), to smaller personal hubs enabled by operating system virtualization technologies like Amazon EC2. The medium and smaller hubs are best described as data junction boxes because they act as conduits between existing systems and Linked Data aware User Agents.

This demonstration will show a Data Space initialization process for end-users that covers:

• Domain Name Registration (e.g. .Name acquisition)
• DNS configuration
• Bonding with existing Web 2.0 platforms: Facebook, phpBB3, MediaWiki, Wordpress, Drupal, Del.icio.us, Flickr, and Bugzilla
• Production of dereferenceable URIs that expose the resulting Data Graph
• Interaction with the resulting data graph via a number of Linked Data aware User Agents

3. Identity & Data Meshing via Linked Data Spaces

4. Links
• http://en.wikipedia.org/wiki/OpenLink_Data_Spaces - OpenLink Data Spaces
• http://en.wikipedia.org/wiki/Virtuoso_Universal_Server - Virtuoso
• http://myopenlink.net/ods/index.html - Live OpenLink Data Spaces Demonstration
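
As a rough illustration of the RDFization step discussed in Section 2.2, the following minimal sketch turns one weblog-style record into N-Triples, giving both the post (a web resource) and its author (a concrete entity) their own HTTP URIs and describing them with shared vocabularies (SIOC, FOAF, Dublin Core terms). It is not the OpenLink Data Spaces implementation: the base URI `http://example.org/dataspace/`, the URI layout, the `#this` fragment convention, the record field names, and the function `rdfize_post` are all assumptions made for this example.

```python
from urllib.parse import quote

# Namespaces of real shared vocabularies.
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical Data Space base URI (an assumption, not from the paper).
BASE = "http://example.org/dataspace/"


def uri(term):
    """Wrap an absolute URI in N-Triples angle brackets."""
    return "<" + term + ">"


def literal(text):
    """Serialize a plain string literal, escaping the basic special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def rdfize_post(post):
    """Map one weblog-style record (hypothetical fields) to N-Triples lines."""
    # One HTTP URI for the post, and a separate one for the person who wrote it.
    post_uri = BASE + "weblog/post/" + quote(str(post["id"]), safe="")
    person_uri = BASE + "person/" + quote(post["author_nick"], safe="") + "#this"
    triples = [
        (uri(post_uri), uri(RDF_TYPE), uri(SIOC + "Post")),
        (uri(post_uri), uri(DCTERMS + "title"), literal(post["title"])),
        (uri(post_uri), uri(FOAF + "maker"), uri(person_uri)),
        (uri(person_uri), uri(RDF_TYPE), uri(FOAF + "Person")),
        (uri(person_uri), uri(FOAF + "name"), literal(post["author_name"])),
    ]
    return [" ".join(t) + " ." for t in triples]


if __name__ == "__main__":
    record = {"id": 1, "title": "Hello Linked Data",
              "author_nick": "jane", "author_name": "Jane Doe"}
    for line in rdfize_post(record):
        print(line)
```

The point of the sketch is that the generated triples use URIs rather than local database keys, so a record that previously lived only inside an application silo becomes something other Data Spaces can reference. Making those URIs actually dereferenceable over HTTP is a server-side concern that this example does not cover.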