Representing Linked Data as Virtual File Systems

Bernhard Schandl
University of Vienna
Department of Distributed and Multimedia Systems
Liebiggasse 4/3-4
A-1010 Wien, Austria
bernhard.schandl@univie.ac.at

ABSTRACT
One of the main characteristics of Linked Open Data (LOD) is the exclusive application of standards published and maintained by the World Wide Web Consortium. This strict adherence is kept on all levels, ranging from the identification and transportation (URI, HTTP) to the interpretation (RDF, RDFS, OWL) of resource descriptions. Because these standards are open and accessible to everybody, they have enabled broad acceptance and proliferation of LOD technologies in Web-based applications and services. On typical desktops, however, the majority of applications are not aware of Web standards, but use hierarchical file systems to organize and store information. This results in a gap between the two distinct information spaces of the Web and the desktop. To bridge this gap, we propose a virtual file system representation of LOD sets, through which they can be accessed as if they were present in the file system and thus easily be used within desktop applications.

Categories and Subject Descriptors
D.4.3 [Operating Systems]: File Systems Management; H.3.5 [Information Storage and Retrieval]: Online Information Services

General Terms
Algorithms, Design

Keywords
Linked Open Data, file systems, information representation, Semantic Desktop

Copyright is held by the author/owner(s).
LODW April 20, 2009, Madrid, Spain.

1. INTRODUCTION
The goal of Linked Open Data (LOD) [5] is to increase the value of publicly available data sets by exposing them on the Web using standardized technologies, and by interlinking related items so that clients can easily combine information from various sources. To accomplish this, the LOD principles [8] are fully integrated with the Web architecture [16] and its technologies: URIs are used to identify resources, RDF (usually serialized as XML) is used to describe them, and resource descriptions and representations are transferred using HTTP.

A large number of applications, however, are executed in desktop environments and are, in turn, designed and built under entirely different assumptions w.r.t. information representation, storage, and exchange. On the desktop, the file system is the main mechanism for the storage and organization of data. Consequently, data exchange on the desktop is mostly implemented based on files. This is reflected by the fact that many applications provide import and export filters, which allow them to read and write different file formats and hence exchange data with other applications. A number of de-facto standard file formats exist that are expected to work across platforms and applications.

The Semantic Web, and Linked Open Data in particular, are distinct from desktop environments in this respect. On the Web, different formats and mechanisms are in operation. As a consequence, we can observe distinct information spaces as well as conceptual and technical gaps between these two worlds. To close these gaps, it is desirable to connect LOD and desktop applications so that desktop users can directly access and integrate information from LOD sources while continuing to work with the applications they are familiar with. For instance, it would be desirable to directly insert a textual description of Berlin within one's favourite word processing application, or to seamlessly load structured descriptions about this city into a spreadsheet tool.

In this paper we present such a bridge: we propose a mechanism that represents LOD sets as virtual file systems, which enables applications and users to directly access resource descriptions and their representations. Such a representation can be useful in a number of scenarios, which we describe in Section 2. In Section 3 we discuss our mapping approach and a prototypical implementation. In Section 4 we discuss structural differences between file systems and LOD principles, and how the LOD technology family could be improved in order to extend its applicability.

2. APPLICATIONS FOR VIRTUAL LOD-BASED FILE SYSTEMS
The possible usage scenarios of file systems are manifold, as we can observe on our own desktop computers. A number of them are especially interesting in the context of Linked Open Data. In this section we outline such scenarios that would benefit from a virtual representation of LOD sets.

Browsing and Navigation.
Most desktop computer users are familiar with navigation in hierarchical file systems. The visual rendering of file system structures, provided by applications like Windows Explorer or Apple Finder, is similar on all desktop operating systems. These applications display files as atomic information entities, which are grouped into hierarchical collections, i.e., directories. Navigation within the directory hierarchy of a file system is understood by most end users: directories can be "opened" and their contents can be inspected. Similarly, files can be opened with their respective applications in order to view and manipulate their content.

A virtual file system representation of Linked Open Data applies the metaphors of directories and files to these data. It allows users to navigate through the RDF graph provided by a LOD set as if it were a hierarchical file structure on their personal desktop. Hence users are not required to mentally "switch" between the Web and the desktop contexts.

Data Import.
The data found in LOD sources can be relevant in many scenarios. However, common desktop applications usually do not provide means to import data directly from the Web, either in "traditional" formats (e.g., HTML pages) or in the form of RDF. Consequently, a user of such applications who wishes to reuse information from LOD sources is forced to perform intermediate data conversion. First, the data of interest must be located, then it must be downloaded to a local file, and as a final step it must be converted into a format that can be read by the target application.

To perform these tasks efficiently, extensive knowledge of LOD technologies (SPARQL and RDF) and of the target application's data formats is needed. With a virtual file system representation of LOD, at least the first two steps can be executed by the virtual file system driver, allowing users and applications to access data as if they were stored in the local file system. Additionally, the conversion from RDF to typical desktop file formats that can be interpreted by many applications (e.g., Rich Text Format or Comma-separated Values) can be performed directly by the virtual file system driver.

Integration with Desktop Resources.
Information resources on the desktop are typically organized using hierarchical file systems, which allow users to arrange documents within a tree of (nearly) arbitrarily named directories. Even applications that do not follow this pattern store and organize their data in file system structures [22]. These hierarchies help users to retrieve previously stored information, mostly by step-by-step navigation through the directory hierarchy, as described before.

The integration of resources other than files, like Web resources, into file systems is often cumbersome. Many systems allow users to link Web resources (URLs) into hierarchical file systems by saving the target address into a special file. This file, however, often cannot be used by applications that operate on the file system. By representing Linked Open Data (i.e., resource descriptions on the Semantic Web) as virtual file systems, these data can be directly integrated with other file-based resources and seamlessly processed by humans and applications.

3. REPRESENTING LOD AS VIRTUAL FILE SYSTEM

3.1 Design Considerations
A number of conceptual differences between Linked Open Data and hierarchical file systems have to be considered in order to define a useful and valid representation. In the following we outline these issues and, where possible, describe how they can be addressed.

• Structural Mismatch. Linked Open Data is published in the RDF format, which essentially is a graph model. Resources and literals can be interpreted as (labelled) nodes, and property relationships between them can be interpreted as directed, labelled edges. In contrast, file systems are trees, which consist of inner nodes (directories) and leaf nodes (files). In hierarchical file systems, each node is labelled, and there exists only one type of relationship between nodes (contains).
Consequently, in order to prevent information loss, the labelled edges of the RDF graph must be represented as labelled nodes in the file system representation. A graph cannot be reduced to a tree without the loss of edges, which in the case of RDF means information loss. However, one can circumvent the strictly hierarchical structure of file systems using shortcuts¹. Representing the edges of the RDF graph as file system shortcuts allows for a complete graph representation.

• Entry Point. File systems, due to their hierarchical structure, have a natural entry point, the root directory. This entry point is present in every file system. It commonly serves as the starting point for activities like browsing and searching, and also as the reference for unique naming within the file system tree. A graph structure does not have such a natural starting point. Two possibilities for selecting a starting point for a tree-based representation of a graph can be derived from typical usage patterns of the (classic) Web. Users either know a URL they want to visit (e.g., from a bookmarking system) and navigate directly to the corresponding Web site, or they use search engines to find resources that fulfil their information needs.

• Naming. RDF, the meta model for the representation of Linked Open Data, uses URIs [4] to identify resources and the relationships between them. By definition, URIs are globally unique, and two resources that are identified with the same URI are considered to be the same resource. In the RDF context, the inner structure of URIs is irrelevant, and a similarity in resource naming does not, per se, imply any kind of relationship between these resources.
Naming in file systems is different: the uniqueness of file and directory names is ensured only locally, i.e., in the context of the objects' parent directory. A file's full path is unique within the local machine context and can be interpreted as a sequence of local names. In this regard, file systems are more restrictive than RDF; this in principle allows for a lossless mapping from URIs to file names. However, the syntactic rules for valid URIs differ from the rules for valid file and directory names (for instance, several characters that are allowed in URIs are not allowed in file names). This mismatch must be resolved by suitable escaping algorithms.

• Literal Values. This naming mechanism does not apply to literals. Since literals are more convenient and intuitive substitutes for URIs (cf. Section 3.4 of [17]), it would seem obvious to map literals in the same manner as resource URIs. Literals, however, carry an important part of the information encoded in RDF. Without literals, resource descriptions would consist only of a graph interrelating abstract identifiers. With literals, humans and machines are enabled to display, process, and interpret actual data about resources.
In file systems, the actual information to be used and processed by applications is stored within files, and not in the structure of the file system hierarchy. This means that file-based applications are designed to read, write, interpret, and modify file contents, not directory hierarchies. Thus it is more practical and convenient to represent RDF literal values as file content rather than to encode them as file or directory names.

• Resource Representations. One basic idea of the Semantic Web is that it is used to describe resources. The actual representation of a resource, however, is out of the scope of RDF, since RDF deals only with the metadata layer. The connection between a resource's descriptions and its actual representations is usually established by dereferencing its URI. By doing so, a client can expect to retrieve a resource's representation (in the case of information resources) or an RDF-based metadata record about a resource (in the case of non-information resources, cf. [16], Section 2.2).
In file systems, only information resources in the sense of the Web architecture exist: the file as a conceptual entity cannot be separated from its representation. This is both an advantage and a disadvantage. On the one hand, it is possible to directly reflect a resource representation in the file system. On the other hand, a resource may have multiple representations of different types (for instance, different text formats), which (in the case of HTTP resources) clients can retrieve using content negotiation (cf. [14], Section 12). Since a file has only one (main) content stream², another mapping mechanism has to be found for resource representations.

¹ Different names and semantics are used for such mechanisms in different operating systems; e.g., alias or symbolic link. Essentially all these mechanisms allow file system objects (files or directories) to virtually appear in multiple locations, i.e., they can be accessed via multiple paths.
² Different file systems provide mechanisms to represent multiple content streams for files; e.g., Alternate Data Streams [3], file forks [1], or extended attributes. None of these approaches, however, is easily accessible for applications and users, and data stored this way often cannot be transferred across platforms.

Since file systems follow a relatively simple underlying model, the degrees of freedom for modeling a virtual LOD representation are limited. Summing up the issues described before, we arrive at a number of restrictions that determine our mapping definition. The following prerequisites for our virtual file system representation of Linked Open Data sets must be considered.

1. Resources cannot be mapped to files. File systems manifest inner structure only for directories, through the containment relationship described before, but not for file contents. Therefore RDF resources cannot be mapped to files, but must be mapped to directories in order to preserve their structured descriptions.
2. Properties cannot be mapped to files. An RDF property establishes a relationship either between two resources or between a resource and a literal string. Again, the only file system model element that can express such relationships between objects is the directory together with the elements it contains.
3. Literal values should be represented inside files. As described before, applications should be enabled to directly access literal values, but this can only be accomplished if they are represented as file contents.
4. Resource representations should be considered. In file systems, contents and metadata are tightly integrated, and a file cannot be considered separate from its contents. To sustain this assumption, on which file-based applications are designed, it is desirable to include resource representations of various content types into the virtual file system, thus extending the scope of RDF.
5. A meaningful root node should be defined. For a proper file system representation, a meaningful root node should be defined that is useful both to humans and to machines. This root node should also be chosen so that it provides a permanent mapping of resource identifiers (URIs) to file paths, so that file paths remain stable even if the LOD set changes.

We have defined a virtual file system representation of Linked Open Data sources that considers these conditions. In the following we present this approach and discuss our prototypical implementation.

3.2 Approach
According to the considerations described before, we represent each RDF resource within a LOD set as a virtual directory (cf. Figure 1), and we collect all (known) resources within one directory called /!resource/. Hence each resource obtains a unique absolute path, which corresponds to the RDF principle that each resource has a globally unique URI. To determine the name of the virtual resource directory, we convert full URIs to qualified names (cf. Section 4 of [10]). We encode URI characters that are not allowed in file systems, e.g., slashes or quotation marks, using UTF-8 character encoding³.

³ It depends on the operating system which characters are affected by this encoding: for instance, Windows does not permit colons in file names, whereas in UNIX-based operating systems they can be used as long as they are escaped properly.

The representation of each resource as a virtual directory allows us to collect all information about this resource
within one single point in the virtual file system, and also to uniquely refer to this resource across the entire file system.

    dbpedia:Berlin
      => /!resource/dbpedia:Berlin/
Figure 1: Mapping of RDF resources to virtual directories

Within the resource directory we can now represent all available information about this resource, i.e., all properties that have this resource as subject. Since a property can have multiple values, we represent each property as a virtual directory that contains all corresponding values. This is done differently for object properties (i.e., properties whose value is an RDF resource) and datatype properties (i.e., properties whose value is a literal). For the former, we represent the property value resource as a symbolic link⁴ that refers to the resource's virtual directory, as described before. This (i) avoids long path names, which would otherwise reduce the system's usability and may cause implementation problems, and (ii) avoids cycles and hence unbounded hierarchy depths. An example for the representation of an object property is depicted in Figure 2.

⁴ A symbolic link (symlink) is a special file that contains a reference to another file or directory.

    dbpedia:Berlin rdf:type dbpedia-owl:City
      => /!resource/dbpedia:Berlin/rdf:type/dbpedia-owl:City
         (symlink to /!resource/dbpedia-owl:City/)
Figure 2: Mapping of object properties to virtual directories

We store the lexical form of datatype property values (i.e., literals) not as file or directory names, but as the contents of a virtual file located within the virtual property directory. Since a resource may have multiple properties with the same property URI but different literal values, we distinguish the individual value files by numbering them (see Figure 3). This mapping allows applications to read literal strings directly, and also to search for them using file system full-text indices.

    dbpedia:Berlin p:area "891.82"^^xsd:double
      => /!resource/dbpedia:Berlin/p:area/value-17.txt
         (file content: "891.82"^^xsd:double)
Figure 3: Mapping of datatype properties to virtual files

In addition to representing a resource's "outgoing" properties (i.e., triples that have this resource as subject), we also represent "incoming" properties (i.e., triples that have this resource as object) for convenience. This representation allows a user or client application to traverse edges of the RDF graph backwards. To distinguish incoming properties from outgoing ones, we apply the same naming convention as popular LOD browsers (e.g., Tabulator [6]) and enclose the property name in the strings "is" and "of". This representation is depicted in Figure 4. Of course, this mapping only applies to object properties, since literals cannot be the subject of an RDF triple.

    dbpedia:Berlin rdf:type dbpedia-owl:City
      => /!resource/dbpedia:Berlin/rdf:type/dbpedia-owl:City
         (symlink to /!resource/dbpedia-owl:City/)
      => /!resource/dbpedia-owl:City/is rdf:type of/dbpedia:Berlin
         (symlink to /!resource/dbpedia:Berlin/)
Figure 4: Mapping of incoming object properties to virtual directories

Finally, we include resource representations in our virtual file system in order to enable applications and users to access them directly, without the need to deal with the HTTP protocol or other retrieval mechanisms. We represent resource contents as files that reside within the virtual resource directory, and include the representation's content type in the file name in order to distinguish them. Since the Web architecture [16] provides no means to determine which resource representations are available, we currently use three common content types (application/rdf+xml, text/rdf+n3, and text/html). Additionally, a comma-separated values (CSV) representation of all properties of the resource is created under the text/csv content type, which is of immediate use for many applications, e.g., spreadsheet tools. Figure 5 shows the resulting virtual files.

    dbpedia:Berlin
      => /!resource/dbpedia:Berlin/content-application_rdf.xml
      => /!resource/dbpedia:Berlin/content-text_rdf.n3
      => /!resource/dbpedia:Berlin/content-text.html
      => /!resource/dbpedia:Berlin/content-text.csv
Figure 5: Mapping of resource representations to virtual files

The combination of all these mappings constitutes a virtual tree-based representation of Linked Open Data sets. Using this representation, users and applications can navigate through the virtual directories that represent resources and properties, and access property values and resource representations, which are stored as virtual files. All resources whose URIs are known can be used as a starting point, since they are represented under the virtual /!resource/ directory.

However, a LOD set may contain descriptions of large numbers of resources, and retrieving all known resources from the endpoint is an expensive task. As described in Section 3.1, it is very common to use full text search engines as the starting point for information retrieval on the Web.
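The mapping rules of this section can be made concrete with a short sketch that derives virtual paths from a few triples. Each resource gets one directory under /!resource/, each literal gets a numbered value file, and each object property produces a symlink plus an "is ... of" symlink for the incoming edge. This is an illustration under stated assumptions, not the LODFS code, and the function names are hypothetical. Two details are our own choices: the percent-style escaping of "/", quotation marks and "%", and the content-file naming, which is copied from the transcript in Figure 7.

```python
from collections import defaultdict

# Hypothetical sketch of the path layout described in Section 3.2;
# it is not taken from the LODFS sources.
RESOURCE_ROOT = "/!resource/"

# Assumption: characters that are not allowed in file names are written as
# %XX escapes of their UTF-8 bytes; "%" itself is escaped so that the
# mapping stays reversible.
UNSAFE_CHARS = {"/", '"', "%"}


def escape(name):
    """Turn a QName into a single, file-system-safe path element."""
    parts = []
    for ch in name:
        if ch in UNSAFE_CHARS:
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            parts.append(ch)
    return "".join(parts)


def resource_dir(qname):
    """Each resource is represented by exactly one directory below /!resource/."""
    return RESOURCE_ROOT + escape(qname) + "/"


def representation_file(qname, content_type):
    """Path of a representation file; naming follows the Figure 7 transcript (assumed)."""
    return resource_dir(qname) + "content-" + content_type.replace("/", ".")


def map_triples(triples):
    """Map (subject, predicate, object, object_is_literal) tuples to virtual entries.

    Returns (files, symlinks): files maps a path to the literal stored in it,
    symlinks maps a link path to the resource directory it points to.
    """
    files = {}
    symlinks = {}
    value_count = defaultdict(int)  # running number per property directory
    for subj, pred, obj, obj_is_literal in triples:
        prop_dir = resource_dir(subj) + escape(pred) + "/"
        if obj_is_literal:
            # Datatype property: the literal becomes the content of a numbered file.
            value_count[prop_dir] += 1
            files[f"{prop_dir}value-{value_count[prop_dir]}.txt"] = obj
        else:
            # Object property: link to the object's resource directory ...
            symlinks[prop_dir + escape(obj)] = resource_dir(obj)
            # ... plus the incoming "is <property> of" link on the object side.
            incoming_dir = resource_dir(obj) + "is " + escape(pred) + " of/"
            symlinks[incoming_dir + escape(subj)] = resource_dir(subj)
    return files, symlinks
```

Applied to the triples of Figures 2 and 3, map_triples stores "891.82"^^xsd:double in /!resource/dbpedia:Berlin/p:area/value-1.txt. It also creates the symlinks /!resource/dbpedia:Berlin/rdf:type/dbpedia-owl:City and /!resource/dbpedia-owl:City/is rdf:type of/dbpedia:Berlin. Figure 3 shows value-17.txt, which suggests that LODFS numbers value files differently from this simple per-directory counter. Literals are kept in their full lexical form, including quotes and datatype or language tag, consistent with the cat output in Figure 7.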
To offer a comparable, search-based entry point for linked data that is represented as a virtual file system, we allow the user to execute full text searches. In addition to navigating directly to a virtual resource directory, the user can create a directory in the virtual file system's root folder, and the directory name is used as the search term⁵. When such a folder is created, a query is issued against the LOD set, and symlinks for the resulting resources are created within that directory. Similar behaviour is also implemented in a number of application-specific virtual file systems, some of which we present in Section 5.

⁵ For a discussion on the practical applicability of fulltext queries in the context of Linked Open Data, refer to Section 4.1.

Figure 6 shows an extract of a complete file system tree that represents data from one of the most popular LOD sources, DBpedia [2]. We can see the root directory for resources, which contains one virtual directory for each resource. Each resource directory contains files for representations as well as sub-directories for properties (in this example, "p:location" and "rdfs:label"), which again contain files or symlinks that represent the property values. Finally, a virtual keyword search folder ("berlin" in this example) is depicted that contains symlinks for each result.

    [Tree diagram. Recoverable labels: the root directory contains the resource directory !resource (with entries such as dbpedia:2raumwohnung, dbpedia:Berlin, and dbpedia:Vienna) and the search directory berlin (with symlinks such as dbpedia:Berlin and dbpedia:Wannsee). The dbpedia:Berlin directory holds representation files (e.g., content-application_rdf.xml, content-text.html) and the property directories p:location and rdfs:label, the latter containing value-1.txt, value-2.txt, .... Legend: directory, file, symlink.]
Figure 6: A complete virtual file system, representing resources from DBpedia

3.3 Implementation
We have implemented a virtual file system driver called LODFS⁶ that represents data from an arbitrary SPARQL endpoint as a virtual file system. This implementation uses the FUSE-J toolkit⁷, which allows file system drivers to be implemented in user space and thus relieves the developer of the need to develop kernel extensions. Currently, FUSE file systems can be used on Linux, FreeBSD, and Mac OS X platforms.

⁶ LODFS: http://lodfs.sourceforge.net

A LODFS instance is always bound to one SPARQL endpoint and potentially represents all data that is available through this endpoint. When the LODFS driver is launched, it only provides a root directory that contains an empty /!resource/ directory. The preferred way to access resources is through a full text search. Whenever a user creates a directory within the driver's root directory, a corresponding SPARQL SELECT query is issued, the resulting resources are added under the /!resource/ directory, and symlinks are created within the virtual search directory. Alternatively, the user can directly access resource descriptions by creating (mkdir) or changing into (cd) the corresponding resource directory, e.g., /!resource/dbpedia:Berlin/.

The implementation retrieves resource descriptions only on demand. When a request (e.g., a directory listing) to a virtual resource directory is issued and the data of the resource has not yet been retrieved, a SPARQL DESCRIBE query is issued, and the resource representations of various types (cf. Figure 5) are retrieved. Then, the resulting resources are represented as virtual directories, files, and symlinks.

Figure 7 shows a transcript of a console session that interacts with a LOD dataset. In this example, a full text search directory is created and its contents are listed. Then, all properties and representations of one resource (dbpedia:Wannsee) are listed. Finally, the contents of all literal values for the resource's rdfs:label property are printed.

4. PRELIMINARY EXPERIENCE
So far we have discussed a number of conditions for a virtual file system representation of LOD data (Section 3.1).
7 FUSE-J Framework: http://fuse-j.sourceforge.net In Section 3.2 we have presented our mapping approach, $ cd /Volumes/lodfs However, in principle there exists no globally valid mapping $ ls for URI prefixes since they are by definition valid only in a $ mkdir berlin local context. For generic client applications like our virtual $ cd berlin file system it is therefore hard to determine which URIs are $ ls used and which URI prefixes can be applied. URI prefixes 0 dbpedia:Berlin@ -> can be embedded in the various RDF serializations (e.g., /Volumes/lodfs/!resource/dbpedia:Berlin using namespaces in the RDF/XML serialization), but in 0 dbpedia:Wannsee@ -> practice often default prefixes are applied which have no /Volumes/lodfs/!resource/dbpedia:Wannsee meaning to the user (e.g., j_0: and similar prefixes are reg- [...] ularly found in RDF serializations produced by the Jena $ ls dbpedia:Wannsee Semantic Web framework). 26093 content-application.rdf+xml* To overcome this drawback one could imagine metadata 17060 content-text.csv* that describes a LOD set, and also indicates which vocabu- 30331 content-text.html* laries and URI prefixes are used therein. The recently pre- 17417 content-text.rdf+n3* sented Vocabulary of Interlinked Datasets (voiD)8 includes a 0 foaf:depiction/ property (void:vocabulary) to describe which vocabularies 0 foaf:img/ are used within a dataset, but does not consider the defini- 0 geo:lat/ tion of preferred URI prefixes. Thus, it would be an option 0 geo:long/ to extend the voiD vocabulary towards this direction. [...] Another approach to solve this problem is the usage of $ cat dbpedia:Wannsee/rdfs:label/* lookup indices like the recently presented prefix.cc ser- "Wannsee"@es vice9 , which maintains a list of mappings from prefixes to "Großer Wannsee"@nl URIs. Developers can use this service to submit their prefix- "Großer Wannsee"@de to-URI mapping and to look up the full URI for a given "Großer Wannsee"@da prefix. 
prefix.cc resolves prefix naming conflicts using a [...] voting mechanism, hence the most popular prefix mapping $ is determined by the user community. Currently, however, this service does not allow clients to query the preferred pre- fix for a given URI, which reduces its applicability for the Figure 7: Transcript of a LODFS session purposes described in this paper. Content Representation. and in Section 3.3 a prototypical implementation of this ap- The Web Architecture [16] does not provide means to proach was described. From the experience we have gained specify which content types can be used to retrieve a re- in the course of the design, implementation, and usage of source representation. Thus it is difficult for a generic client our approach, we can observe a number of open issues in to identify and retrieve all existing representations. Cur- the context of LOD related technologies. In the following rently, a client can only try to retrieve common content we outline several of these issues in order to indicate direc- types (e.g., text/html or application/rdf+xml). A mech- tions for further research and development. anism to obtain existing resource representations of specific content types would greatly increase the applicability of re- 4.1 Linked Open Data Issues source descriptions. Resource Rendering. RDF Language Features. URIs play a fundamental role in Linked Open Data, as A number of RDF language features (especially anony- they are used for the identification of resources and proper- mous resources, collections, and reification) are considered ties. Although they are not primarily designed for human problematic in the context of Linked Open Data (cf. [8], Sec- consumption, URIs are also often used for the visual render- tion 2.2). Their applicability in the context of virtual file ing of resources in user interfaces. 
A number of vocabularies provide properties designed to describe a resource's human-readable label (e.g., rdfs:label or skos:prefLabel). However, their presence is not guaranteed, in which case the URI serves as a fallback for rendering. Moreover, a resource may have multiple rdfs:label property values, or the labels of different resources may be identical, which causes confusion in user interfaces.

URI Prefixes.
Long URIs are hard to render in a user interface. They are also not directly suitable as file or directory names, because they may contain characters that are forbidden in file names (e.g., the slash). In our implementation, QNames are used to abbreviate URIs with human-friendly shortcuts, and a number of URI prefixes (e.g., rdf: or owl:) can be regarded as commonly accepted.

8 voiD vocabulary: http://rdfs.org/ns/void
9 Namespace lookup for RDF developers: http://prefix.cc

systems is also restricted, since file systems do not provide mechanisms to reflect these language elements (e.g., it is not possible to define files without a name to represent anonymous resources, or to represent reified files or directories). As it is considered good practice to avoid these features in Linked Open Data (cf. [8], Section 2.2), our approach also ignores blank nodes and treats collections and reification in the same manner as other RDF triples.

Fulltext Queries.
Currently, SPARQL provides fulltext search only through the regex() filter (cf. Section 11.4.13 of [21]). A typical fulltext query according to this specification is depicted in Figure 8. The implementation of this class of queries, however, is usually inefficient, because the pattern ?s ?p ?o matches every triple in the data set and the filter must be evaluated for each of them. For instance, the current DBpedia SPARQL implementation10 runs into a timeout when this query is issued.

    SELECT DISTINCT ?s WHERE {
      ?s ?p ?o .
      FILTER regex(?o, "vienna", "i") .
    }

Figure 8: Standards-compliant SPARQL fulltext query

On the other hand, different SPARQL implementations provide fulltext search through proprietary query language extensions. For DBpedia, fulltext queries can be issued efficiently through the virtual bif:contains property (cf. Figure 9), which is defined by the underlying OpenLink Virtuoso implementation [12]. This query form cannot be used in a generic client, since it depends on the implementation of the SPARQL endpoint. This contradicts the intention of a high-level query language, i.e., to abstract over a service's implementation specifics. For LOD endpoints to be usable by generic clients, it is crucial that they implement a standardized mechanism for fulltext search efficiently.

    SELECT DISTINCT ?s WHERE {
      ?s ?p ?o .
      ?o bif:contains "vienna" .
    }

Figure 9: Fulltext queries in OpenLink Virtuoso

Updates.
Linked Open Data does not provide a mechanism to update data, so the virtual file system is read-only. There are proposals for an update extension to SPARQL (e.g., the SPARQL/Update proposal, which is currently a W3C member submission [23]). However, the “Writable Web” has so far been addressed in only a small number of works (e.g., [7]). It is currently also the subject of a W3C community project11.

4.2 File System Issues

Operating System Specifics.
The file system implementations of common operating systems differ in a number of ways. For instance, the meaning of a backslash (\) in a path expression differs between Windows, where the backslash separates sub-directory names, and Linux/Unix-based systems, where it is used to escape special characters. Even on a single platform the behaviour can differ. For instance, Mac OS X allows slashes (/) in file names, but the underlying Unix file system implementation converts them to colons. Our prototype implementation follows the convention of using a colon to separate URI prefixes from local names. Consequently, these directory names show up with a slash in the Finder (cf. Figure 10). Of course, since colons are not permitted in Windows file names, a Windows implementation would have to use a different separator.

Figure 10: LOD resource representation in Mac OS X Finder

Path Lengths.
Many operating systems limit the maximum number of characters in an absolute file path. Although object properties are realized using symbolic links in our implementation, the virtual path to a resource may become very long, especially in the case of cyclic RDF properties. Currently, applications and file browsers can work around this by resolving symbolic links.

5. RELATED WORK
The current state of the art for the consumption of Linked Open Data by end users is represented by RDF browsers, a number of which have been presented previously (e.g., [6, 18, 20]). These provide useful navigation interfaces for end users. However, they do not allow applications to access Linked Open Data without implementing the corresponding client protocols or complex data transformation operations.

A number of approaches have been presented that use (semi-)structured object annotations to generate virtual file system views. Examples include interpreting file path elements as an AND-combination of attribute/value pairs [11, 15], as tags [9], or as automatically generated classifications [13]. In these approaches, the virtual file system path is translated into a query that is executed on the underlying data, and the results are presented as virtual files and sub-directories. With our virtual fulltext search directory (cf. Section 3.2) we follow a similar approach. In addition, we map each resource in the underlying data set to a fixed file system representation, which allows permanent file path references to be made.
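The following Python sketch makes the path-to-query translation described above more concrete. It turns a virtual path into a SPARQL query whose triple patterns are AND-combined. It is a hypothetical illustration under stated assumptions, not the implementation of [11, 15] or of our prototype. The path syntax, the function name path_to_sparql and the prefix table are all assumptions made for the example. The syntax expects one property=value pair per path element, with both sides written as QNames using the colon separator. Only QName values are handled; literal values would additionally need quoting and escaping.

```python
from typing import Dict, List, Optional

# Hypothetical prefix table: maps QName prefixes used in path elements to namespace URIs.
DEFAULT_PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbpedia": "http://dbpedia.org/resource/",
}


def path_to_sparql(path: str, prefixes: Optional[Dict[str, str]] = None) -> str:
    """Translate a virtual path such as
    '/rdf:type=foaf:Person/foaf:based_near=dbpedia:Vienna'
    into a SPARQL SELECT query in which every path element becomes one
    triple pattern; the patterns of a basic graph pattern are AND-combined."""
    prefixes = DEFAULT_PREFIXES if prefixes is None else prefixes
    patterns: List[str] = []
    used = set()
    for element in path.strip("/").split("/"):
        if not element:  # tolerate duplicate slashes in the path
            continue
        prop, sep, value = element.partition("=")
        if not sep or not prop or not value:
            raise ValueError(f"path element {element!r} is not of the form property=value")
        for qname in (prop, value):
            prefix, colon, _local = qname.partition(":")
            if not colon or prefix not in prefixes:
                raise ValueError(f"unknown or missing prefix in {qname!r}")
            used.add(prefix)
        patterns.append(f"  ?s {prop} {value} .")
    if not patterns:
        raise ValueError("path contains no property=value elements")
    # Declare only the prefixes that actually occur, in a stable order.
    header = "".join(f"PREFIX {p}: <{prefixes[p]}>\n" for p in sorted(used))
    return header + "SELECT DISTINCT ?s WHERE {\n" + "\n".join(patterns) + "\n}"


if __name__ == "__main__":
    print(path_to_sparql("/rdf:type=foaf:Person/foaf:based_near=dbpedia:Vienna"))
```

For the example path, the sketch emits PREFIX declarations for dbpedia, foaf and rdf, followed by the two patterns "?s rdf:type foaf:Person ." and "?s foaf:based_near dbpedia:Vienna .". Each additional path element narrows the result set, which is what makes such paths behave like conjunctive queries.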
A virtual hierarchical file system built entirely on Semantic Web technologies, which allows for additional annotations and expressive search using an extended file API, is presented in [22]. That work also shows that the performance of such systems is approaching a level sufficient for interactive usage.

10 DBpedia SPARQL endpoint: http://dbpedia.org/sparql
11 pushback – Write Data Back From RDF to Non-RDF Sources: http://esw.w3.org/topic/PushBackDataToLegacySources

The libferris virtual file system [19] provides a generic architecture for mounting a vast number of data sources, including relational databases, remote HTTP and FTP servers, and XML documents. Libferris can not only read from these sources but also store modifications made to the virtual file system in the underlying data source. For example, a new node can be inserted into an XML document by creating a directory in the virtual directory hierarchy. This includes locally stored RDF data, which is accessed by means of the Redland RDF framework12. RDF2FS [24] is a utility that transforms a given RDF file into an actual directory tree. Its mapping approach is comparable to the one presented in this paper. However, RDF2FS is limited to locally available RDF data and does not dynamically download data from remote LOD sources. Finally, [25] presents a virtual file system based on Topic Maps, which are conceptually close to RDF.

A number of approaches comparable to ours can be found for specific web applications, including flickrfs13, GmailFS14, and youtubefs15. These approaches translate file system calls into operations on the underlying service API and represent data from the user's account on the service as virtual files. Services that deal with multimedia content, like the ones described here, lend themselves to a file-based representation. Their APIs provide a unified view not only of content but also of annotations such as tags or user comments. To the best of our knowledge, the approach presented in this paper is the first that uses arbitrary data accessible via a SPARQL endpoint. It is also the first to consider fulltext search and resource representations in conjunction with RDF descriptions.

6. CONCLUSIONS
In this paper we have shown how Linked Open Data sets can be represented as virtual file systems, and hence be used directly by file-based applications without further conversion steps. We have sketched a number of potential application scenarios for such an implementation, and we have discussed design considerations that influence our mapping. Our prototypical implementation maps RDF resources to virtual directories, which contain sub-directories and files that represent object and datatype properties. We additionally include resource representations of various content types in our virtual file system so that applications can operate directly on these data. From our implementation we have drawn a number of conclusions. We have also identified issues that indicate how the Linked Open Data technology family can be extended and improved to better support generic client applications.

Currently, however, a virtual file system based on LOD is read-only, since there is no standardized way to modify linked datasets. We believe that controlled write access could significantly improve the applicability of Linked Open Data and related techniques, not only for virtual file systems as presented in this paper. We therefore plan to investigate this direction further in the future.

12 Redland RDF Libraries: http://librdf.org
13 http://manishrjain.googlepages.com/flickrfs
14 http://richard.jones.name/google-hacks/gmail-filesystem/gmail-filesystem.html
15 http://code.google.com/p/youtubefs/

Acknowledgements
Parts of this work have been funded by FIT-IT grants 812513 and 815133 from the Austrian Federal Ministry of Transport, Innovation, and Technology. The author thanks Niko Popitsch, Bernhard Haslhofer, and Stefan Zander for valuable comments on this paper.

7. REFERENCES
[1] Apple Inc. File Forks, 2005. Available at http://developer.apple.com/documentation/mac/Files/Files-14.html.
[2] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. DBpedia: A Nucleus for a Web of Open Data. In Proceedings of the 6th International Semantic Web Conference (ISWC 2007), Busan, Korea, 2007.
[3] Hal Berghel and Natasa Brajkovska. Wading into Alternate Data Streams. Communications of the ACM, 47(4):21–27, 2004.
[4] T. Berners-Lee, R. Fielding, and L. Masinter. Uniform Resource Identifier (URI): Generic Syntax (RFC 3986). Network Working Group, January 2005.
[5] Tim Berners-Lee. Linked Data. World Wide Web Consortium, 2006. Available at http://www.w3.org/DesignIssues/LinkedData.html, retrieved 08-Aug-2008.
[6] Tim Berners-Lee, Yuhsin Chen, Lydia Chilton, Dan Connolly, Ruth Dhanaraj, James Hollenbach, Adam Lerer, and David Sheets. Tabulator: Exploring and Analyzing Linked Data on the Semantic Web. In Proceedings of the 3rd International Semantic Web User Interaction Workshop, 2006.
[7] Tim Berners-Lee, J. Hollenbach, Kanghao Lu, J. Presbrey, Eric Prud'hommeaux, and m.c. schraefel. Tabulator Redux: Browsing and Writing Linked Data. In Proceedings of the Workshop on Linked Open Data on the Web (LDOW2008), 2008.
[8] Chris Bizer, Richard Cyganiak, and Tom Heath. How to Publish Linked Data on the Web, 2007. Available at http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/, retrieved 02-Dec-2008.
[9] Stephan Bloehdorn, Olaf Görlitz, Simon Schenk, and Max Völkel. TagFS – Tag Semantics for Hierarchical File Systems. In 6th International Conference on Knowledge Management (I-KNOW'06), 2006.
[10] Tim Bray, Dave Hollander, Andrew Layman, and Richard Tobin. Namespaces in XML (Second Edition) (W3C Recommendation 16 August 2006). World Wide Web Consortium, 2006. Available at http://www.w3.org/TR/REC-xml-names/.
[11] Paul Dourish, W. Keith Edwards, Anthony LaMarca, and Michael Salisbury. Using Properties for Uniform Interaction in the Presto Document System. In UIST '99: Proceedings of the 12th Annual ACM Symposium on User Interface Software and Technology, pages 55–64, New York, NY, USA, 1999. ACM.
[12] Orri Erling and Ivan Mikhailov. RDF Support in the Virtuoso DBMS. In Sören Auer, Christian Bizer, Claudia Müller, and Anna V. Zhdanova, editors, CSSW, volume 113 of LNI, pages 59–68. GI, 2007.
[13] Sebastian Faubel and Christian Kuschel. Towards Semantic File System Interfaces. In Christian Bizer and Anupam Joshi, editors, Proceedings of the Poster and Demonstration Session at the 7th International Semantic Web Conference (ISWC 2008), volume 401. CEUR Workshop Proceedings, 2008.
[14] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol – HTTP/1.1 (RFC 2616). Network Working Group, 1999.
[15] David K. Gifford, Pierre Jouvelot, Mark A. Sheldon, and James W. O'Toole Jr. Semantic File Systems. In SOSP '91: Proceedings of the 13th ACM Symposium on Operating Systems Principles, pages 16–25, New York, NY, USA, 1991. ACM Press.
[16] Ian Jacobs and Norman Walsh. Architecture of the World Wide Web, Volume One (W3C Recommendation 15 December 2004). World Wide Web Consortium, 2005. Available at http://www.w3.org/TR/webarch/.
[17] Graham Klyne and Jeremy J. Carroll. Resource Description Framework (RDF): Concepts and Abstract Syntax (W3C Recommendation 10 February 2004). World Wide Web Consortium, 2004.
[18] Georgi Kobilarov and Ian Dickinson. Humboldt: Exploring Linked Data. In Proceedings of the Linked Data on the Web Workshop (LDOW2008), 2008.
[19] Ben Martin. The World is a libferris Filesystem. Linux Journal, April 2006.
[20] Eyal Oren, Renaud Delbru, and Stefan Decker. Extending Faceted Navigation for RDF Data. In International Semantic Web Conference, pages 559–572, 2006.
[21] Eric Prud'hommeaux and Andy Seaborne. SPARQL Query Language for RDF (W3C Recommendation 15 January 2008). World Wide Web Consortium, 2008.
[22] Bernhard Schandl and Bernhard Haslhofer. The Sile Model – A Semantic File System Infrastructure for the Desktop. In Proceedings of the 6th European Semantic Web Conference (ESWC 2009), Heraklion, Greece, 2009.
[23] Andy Seaborne, Geetha Manjunath, Chris Bizer, John Breslin, Souripriya Das, Ian Davis, Steve Harris, Kingsley Idehen, Olivier Corby, Kjetil Kjernsmo, and Benjamin Nowack. SPARQL Update – A Language for Updating RDF Graphs (W3C Member Submission 15 July 2008). World Wide Web Consortium, 2008. Available at http://www.w3.org/Submission/2008/SUBM-SPARQL-Update-20080715/.
[24] Michael Sintek and Gunnar Aastrand Grimnes. RDF2FS – A Unix File System RDF Store. In Christian Bizer, Sören Auer, Gunnar Aastrand Grimnes, and Tom Heath, editors, Proceedings of the 4th Workshop on Scripting for the Semantic Web, 2008.
[25] Alexander Zangerl and Robert Barta. Virtual File System on Top of Topic Maps. In Proceedings of the Fourth International Conference on Topic Maps Research and Applications, 2008.