Lifting File Systems into the Linked Data Cloud with TripFS

Bernhard Schandl, Niko Popitsch
bernhard.schandl@univie.ac.at
niko.popitsch@univie.ac.at
University of Vienna, Department of Distributed and Multimedia Systems
Liebiggasse 4/3-4, 1010 Vienna, Austria

ABSTRACT
A major fraction of digital information is stored in file systems. File systems usually organize files in labelled directory trees and provide only minimal support for user-driven file annotation, linkage, and categorization. Although file systems play a major role in knowledge organization, both in enterprise contexts and in the personal information sphere, they have rarely been considered in Web-based information integration. To a large extent, this can be attributed to the limited metadata support of file systems and to the lack of stable identifiers for files and directories, which makes it hard to expose these objects in a global Web. We present TripFS, a lightweight approach for exposing parts of local file systems as Linked Data. Serving file system objects via dereferenceable HTTP URIs paves the way for integrating them with the Web of Data. It also enables new ways of exploiting file system data, for example by linking them with other data sources or by annotating them using Semantic Web technologies.

Categories and Subject Descriptors
D.4.3 [Operating Systems]: File Systems Management; H.3.5 [Information Storage and Retrieval]: Online Information Services

General Terms
Algorithms, Design

Keywords
Linked Data, file systems, file metadata, information representation, information integration, event detection

Copyright is held by the author/owner(s).
LDOW2010, April 27, 2010, Raleigh, North Carolina, USA.

1. INTRODUCTION
File systems store and organize data and documents of all sorts and of arbitrary complexity. These range from small information snippets that can be put into single files to large repositories of heterogeneous content organized within deep hierarchical structures. File systems act as the storage backbone of many information processing systems and can be considered a major foundation of personal and corporate information management. Since common file systems do not impose major restrictions on creating, naming, and arranging directories and files, they support a user's individual preferences for data organization. File systems do not only store files that were created or modified locally: a large share of files originates from other sources, like multimedia devices, other desktops, or the Web. In corporate environments it is common to store data of collective interest on shared file servers that enable a simple form of collaboration.

Overall, file systems can be considered one of the primary information sources both for organizations and for individuals, and it is quite likely that they will remain important in the future. Therefore they are of high interest for information integration. However, file systems have only rarely been considered in the field of Web-based data integration. This stems mostly from their limited possibilities of data organization¹, their limited metadata support, and the lack of stable identifiers for files and directories.

¹ By now it seems commonly accepted that a single hierarchical scheme is insufficient for organizing the large amounts of data we encounter in today's desktop environments.

One promising strategy for Web-based information integration is the Linked Data paradigm. This term denotes a set of technologies and best practices that facilitate information integration and linkage on a global scale. To expose information as Linked Data means to follow simple principles [6]:
- First, identify each resource of interest with a globally unique, dereferenceable HTTP URI.
- Second, provide useful information for clients when they access the URI (usually expressed in RDF and HTML).
- Third, include links to other resources so that clients can retrieve more potentially interesting information.

In this paper, we present TripFS, a lightweight approach that applies these principles to file systems in order to expose their contents as Linked Data, and therefore enables their direct inclusion in Web-based integration scenarios. It assigns stable, globally valid, dereferenceable URIs to files and directories, monitors changes in the system, serves metadata extracted from files as RDF data, and interlinks files with external data sources. It provides a plug-in architecture so that it can easily be extended to support additional file types and linking components. It adapts to the specifics of the underlying file system, and it provides a sophisticated file change tracking component that increases the stability of file identifiers.

Because it is easy to set up, TripFS also facilitates ad-hoc sharing of file-based resources using standardized (semantic) Web technologies. Moreover, it overcomes shortcomings of hierarchical organization mechanisms. Its metadata-centric approach allows users to query for descriptive information instead of file location, and to establish multiple, orthogonal views on file system data.

Section 2 outlines application scenarios and describes how users can benefit from exposing file systems as Linked Data. Section 3 discusses which steps have to be taken in order to realize this idea, and Section 4 presents details about the TripFS architecture and implementation. After a discussion of related work (Section 5) we conclude the paper in Section 6.

2. BENEFITS OF LINKED FILE SYSTEMS
The benefits of exposing data as Linked Data resources are manifold [9]. In this section we outline three scenarios that illustrate how the quality of file system usage can be increased by exposing files as Linked Data.

A) Integrating File Systems into Enterprise Data. A substantial fraction of enterprise data is available in the form of file systems. These data can be accessed in a distributed context using protocols like CIFS or WebDAV. Even so, it is difficult to integrate them in a global enterprise context, because files lack stable identifiers and there are no platform-independent, metadata-based file access mechanisms. Linked Data has been shown to be a viable approach for lightweight enterprise information integration [16]. Making file systems part of a global or enterprise-internal Web of Data therefore enables them to be seamlessly integrated with, and semantically connected to, other data sources.

B) Web-based Ad-hoc Data Sharing. Despite the vast number of possibilities for digital communication we have at our disposal, ad-hoc sharing of meaningful information (e.g., the exchange of digital documents between participants' laptops during face-to-face meetings) is still cumbersome. We can regularly observe that collaborators use e-mail or instant messaging to quickly exchange files. This approach, however, does not allow more complex data to be shared, nor files to be exchanged together with metadata that describe their correct context. Linked Data builds on top of common Web technologies, so any Linked Data source can be accessed directly using a common Web browser. Consider a tool that allows users to temporarily share selected parts of their local file systems as Linked Data, which means sharing not only plain files but also extracted metadata, annotations, and links. Such a tool facilitates efficient information exchange amongst collaborators.

C) Semantic Web-based File Annotations. Semantic annotation and interlinking of files are poorly supported today. Modern file systems support the storage, management, and retrieval of file annotations (e.g., extended attributes or file forks), but these data are not accessible in a standardized and platform-independent way. This makes it difficult to organize files into logically connected units, and it reduces the efficiency of file retrieval, especially in distributed environments. If file systems were published as part of a Web of Data, they could be annotated and interlinked using tools like the LEMO annotation framework [13] or the Silk framework [22]. This would improve search and retrieval, as well as linkage with other relevant data sources. In turn, these Linked Data and Web-based annotations could be propagated back into the working context of the file system user, e.g., by being considered by desktop search engines.

3. REPRESENTING FILE SYSTEMS AS LINKED DATA
Since the characteristics of file systems and Linked Data differ significantly, a number of steps have to be performed in order to lift file system data into a Web of Data:

1. Appropriate representations for files and directories have to be found, which comply with the Linked Data principles.
2. Vocabularies that convey the characteristics of data found in file systems have to be specified and aligned to already existing relevant vocabularies.
3. Descriptive metadata about files have to be extracted from the file system and transformed into the RDF data model.
4. Meaningful links to other, external data sources have to be detected and established.
5. Consistency between the file system and its corresponding Linked Data representation has to be ensured.
6. Data have to be served according to Linked Data principles, i.e., in a form that is usable for both humans and machines.

In the following we outline how each of these steps can be realized.

3.1 File URIs in the Web of Data?
Within the context of a file system, files and directories can be uniquely identified using their absolute paths, each of which consists of a sequence of directory names and a file name. The file: URI scheme [5] is a means to directly reuse these paths to form URIs, which can in turn be used to access local file resources in a computer system.

However, file URIs are not globally unique, because they describe a local path to a resource on a particular host. They are not stable either, because the referenced files and directories may be removed, moved, or renamed. Therefore they are not suitable for use in a global Web of Data.

To solve this identifier problem, we chose to assign opaque, randomly generated UUIDs to files and directories. Using random UUIDs in a global distributed context is assumed to be safe, since the probability of a collision is sufficiently low. Further, since UUIDs are fully opaque, they do not convey information about the physical location of files and directories. They therefore remain stable even when the underlying file system objects are changed. However, this requires maintaining a mapping between stable, UUID-based URIs on the one hand and unstable, path-based identifiers on the other. This mapping ensures that modifications in the file system are properly reflected in the Linked Data representation. In Section 3.6 we outline our strategy to accomplish this.

1
2      a tripfs:File ;
3      rdfs:label "eswc2009-schandl.pdf" ;
4      tripfs:local-name "eswc2009-schandl.pdf"^^xsd:string ;
5      tripfs:path "/Users/bs/Data/work/papers/2009/eswc/eswc2009-schandl.pdf"^^xsd:string ;
6      tripfs:size "425561"^^xsd:long ;
7      tripfs:modified "2009-03-11T02:38:45"^^xsd:dateTime ;
8      tripfs:parent .

Figure 1: A Linked Data representation of a PDF file

1
2      a tripfs:File, foaf:Document, nfo:FileDataObject ;
3      tripfs:parent ;
4      nfo:belongsToContainer .
5
6
7      a tripfs:Directory, dctype:Collection, nfo:FileDataObject, nfo:Folder ;
8      tripfs:child ;
9      nie:hasPart .
Figure 2: Interoperability through the usage of multiple overlapping vocabularies

3.2 Files and Directories as Web Resources
The parent-child relationships between files and directories can be represented as RDF triples with appropriate predicates. Several triples are added to each file or directory resource, conveying data retrieved directly from the file system:
- the local name (i.e., the actual file or directory name without the entire path information);
- the file size;
- the dates of creation and last modification.

An example of a file's RDF representation is depicted in Figure 1. Resources that represent files or directories are internally identified by UUID-based URNs. When they are served as Linked Data, these URNs are dynamically rewritten to HTTP URIs (cf. Section 3.7).

3.3 Vocabularies
To describe files, directories, their metadata, and their relations as RDF, we have developed a simple OWL vocabulary published at http://purl.org/tripfs/2010/02#. We have derived our vocabulary from existing semantic vocabularies as much as possible. However, since it is currently uncommon to expose file resources as Linked Data, we observed a lack of community-accepted vocabularies for this purpose. To the best of our knowledge, only the NEPOMUK File Ontology² (NFO) has been specifically defined to model the contents of file systems. It provides terms to describe files, directories, and their properties. Our vocabulary is aligned with NFO and provides more specialized terms, according to our system's requirements.

A number of other vocabularies, however, have a general notion of the concept of documents, and usually align this concept to the foaf:Document class. Several vocabularies also have a notion of collections, which can be compared to directories in a file system; examples are OAI-ORE [17] and Dublin Core³. The Dublin Core Type Vocabulary⁴, as another example, defines terms for different resource types as well as collections. Additionally, there exists a large number of vocabularies that can be used to identify media types and their specifics, e.g., the MPEG-7 ontology⁵, the Music Ontology⁶, or the set of NEPOMUK ontologies.

To reach a maximum level of interoperability, a data source should aim to adhere to commonly accepted vocabularies as much as possible. The RDF semantics allows different, unrelated vocabularies to be mixed arbitrarily. We therefore propose to use a custom vocabulary, to additionally model file system data using the NFO vocabulary, and to add type information from popular vocabularies like Dublin Core and FOAF where they fit. Serving data in multiple, already aligned vocabularies relieves data consumers of the need to perform additional inference. An example of such a mixed representation is presented in Figure 2.

² http://www.semanticdesktop.org/ontologies/nfo/
³ http://dublincore.org/groups/collections/collection-application-profile/
⁴ http://dublincore.org/documents/dcmi-type-vocabulary/
⁵ http://metadata.net/mpeg7
⁶ http://musicontology.com

3.4 Extracting Semantic File Metadata
Current file systems provide only a limited set of low-level metadata attributes associated with files, such as name, owner, size, creation and modification date, or permission attributes. Modern file systems provide additional means to store higher-level metadata, like extended attributes or multiple data streams. However, these are only useful if applications actually populate them, which is rarely the case.

1
2      nie:mimeType "application/pdf" ;
3      nie:title "The Sile Model --- A Semantic File System Infrastructure for the Desktop" ;
4      nfo:pageCount 15 .
5
6
7      nie:mimeType "audio/mpeg" ;
8      nid3:title "Bohemian Rhapsody" ;
9      nid3:leadArtist [ nco:fullname "Queen" ] ;
10     nid3:length 355106 .

Figure 3: Metadata extracted from a PDF and an MP3 file

1
2      owl:sameAs .
3
4
5      rdfs:seeAlso ,
6      ;
7      owl:sameAs ,
8      .
Figure 4: External links to DBLP and MusicBrainz

One of the Linked Data principles is to "provide useful information" about a resource when a client dereferences its URI. It is therefore desirable to extract additional, descriptive metadata from files and directories and to expose them as Linked Data as well. Reconsider, for example, Scenario A described in Section 2. There, file-system-level metadata (like file size, file type, or file permissions) are of limited value. What is required instead are higher-level descriptive metadata that can be used for selective retrieval of files or their descriptions, e.g., via SPARQL. Combined, these metadata enable sophisticated discovery, retrieval, and access methods based on:
(i) the parent/child relations of file system objects,
(ii) low-level file system metadata, and
(iii) high-level content-based metadata.

The problem of extracting metadata from file systems has been studied for a long time. The biggest challenge in this field is the data diversity found in file systems, which is caused by the multitude of different file types. To illustrate this, more than 51,000 file types are currently registered at the popular FILExt service⁷. Different file types exhibit different internal structures, and consequently different metadata can be extracted. It is therefore impractical to provide metadata extractors for this large number of file types within a single software component. It is more feasible to define a generic metadata extraction framework into which specific extraction components for different file types can be plugged. In this way, the system can be tailored to the respective application context.

⁷ http://filext.com

In our approach, extractors read files and extract an RDF graph that contains triples representing the extracted metadata. Multiple extractors can be cascaded into an extractor pipeline and are sequentially applied to each object. The resulting RDF graphs are stored in the triple store and are then served as part of the file's or directory's description via the Linked Data interface. Extractors may extract not only file metadata (i.e., data about the documents represented by files) but also entities that are related to files, e.g., the artist who performed the music stored in an MP3 file. These entities can in turn be linked to external data sources.

As an example, Figure 3 shows the RDF representation of metadata that have been extracted from two files. The first resource represents a PDF document containing a scientific publication; the second represents an MP3 audio file⁸. The blank node used to identify the artist in this example (line 9) needs to be dynamically rewritten to a stable, dereferenceable URI by the Web server (see Section 3.7).

⁸ In this example we have used terms from the OSCAF/NEPOMUK ontologies (http://www.semanticdesktop.org/ontologies).

3.5 Linking Files to External Sources
Once files and directories are represented as RDF resources, it is possible to link them to other related resources on the Web. Doing so allows clients to retrieve more, potentially interesting information about the resource. For instance, files may be classified according to a classification scheme that uses dereferenceable URIs as identifiers; in this case, clients can query for files using these terms.

The task of linking files and directories to external resources can be accomplished by tools that provide this functionality for generic Web resources. Such tools usually apply various heuristics to detect semantically related resources (e.g., shared identifiers or object similarity [22]). These heuristics depend on the information that is available for a particular entity. In the context of a file system, they therefore depend on the data provided by metadata extraction components, as described in the previous section.

As a consequence, we follow the same strategy as for metadata extractors. Rather than an all-in-one solution to the problem of linking files to external resources, we provide a framework into which specialized linking components can be plugged. These linking components can access not only the raw file data but also extracted metadata, and use this information as a basis for interlinking. Like extractors, linking components return RDF triples, which are added to the metadata model and served via the Linked Data interface.

As an example, Figure 4 shows to which external sources a scientific publication and a music file can be linked. The publication is linked based on the string similarity of its title, and the music file based on the combination of track title and artist name. In this example, the PDF document from Figure 3 has been linked to the Linked Data variant of the popular DBLP publication database, and the MP3 file has been linked to resources of the MusicBrainz service.

3.6 Maintaining Consistency
As described in Section 3.1, a UUID-based URI has to be minted for each file and directory, and these URIs can be considered globally unique from a practical point of view. Without further precautions, however, such URIs might be quite unstable. The mapping between a UUID-based external URI and a file-based internal URI is invalidated whenever a referenced file is moved, removed, or renamed. Further, updating such files may result in inconsistencies between a file and the metadata that have previously been extracted and stored. Note that this could also lead to invalid links between resources if these links were automatically created based on file metadata, as described in Section 3.5.

To preserve a stable mapping between these URIs and the local files and directories they represent, we employ a watcher component. It is responsible for detecting file system events that may change file URIs or modify the contents of referenced files. Whenever such an event is detected, appropriate actions have to be taken, and the RDF model has to be updated. In this sense, the mapping between stable UUID-based URIs and unstable file and directory paths acts as a kind of translation service. It translates between external, globally valid UUID-based URIs and the corresponding local file URIs, comparable to PURL or DOI services [2]. Table 1 summarizes the reactions that have to be taken after file system events have been detected.

Table 1: Reactions to file system events detected by the watcher component
Event       | Reaction
Creation    | Mint URI, add resource to RDF graph, perform extraction and linking
Deletion    | Delete respective resource and associated metadata from RDF graph
Move/Rename | Update local path properties in the RDF graph
Update      | Re-extract features, re-link, update RDF graph

3.7 Serving File Systems as Web Resources
Once the RDF-based representation of files and directories has been generated and enriched with extracted metadata and links to external data sources, the resulting RDF graph can be served according to Linked Data principles. For this purpose, internal UUID-based URNs are dynamically rewritten to HTTP-based URIs with a configurable host part, e.g., http://example.com:8080/resource/.

It is considered good practice [7] to serve at least two variants of the data: an RDF representation for machines and an HTML representation for human consumption. Clients choose which representation they prefer using HTTP content negotiation. In addition to serving resources according to Linked Data principles, it is recommended to provide a SPARQL endpoint [10] so that clients can search for resources based on their RDF descriptions. Furthermore, the actual file data can be downloaded to the client.

In the special case where the Linked Data resources are retrieved locally (i.e., server and client run on the same machine), the Web server can add links to the HTML interface. These links allow the user to open directories or launch files directly from the browser, which provides a seamless interaction experience. Figure 5 shows a screenshot of such an HTML-based interface, which provides these options to the user.

Figure 5: Accessing local files via a Linked Data representation (screenshot; callouts: path-based navigation, direct file access, metadata access, link-based navigation, extracted metadata)

4. IMPLEMENTATION
TripFS has been designed as a modular service framework that defines plug-in interfaces. These interfaces can be used to extend and adapt the system to the actual needs of the use case, the file types to be served, and the special characteristics of the underlying operating system. Such interfaces exist for:
- RDF storage components;
- file metadata extractors;
- file linkers;
- file system crawlers, which are responsible for crawling a configured subtree of the file system;
- watchers, which are responsible for maintaining the consistency of the mapping between external UUID-based URIs and internal file-based URIs.

The system's architecture is depicted in Figure 6.

Figure 6: TripFS architecture (diagram labels: HTTP, HTTP / SPARQL, Linked Data Interface, Watcher, Crawler, Extractors, Linkers, RDF, TripFS, Local Filesystem, FILE/CIFS/SMB/NFS/...)

The TripFS core is a standalone server application, which has been implemented in pure Java, based on the Jena Semantic Web framework⁹. On startup, it performs the following steps:
1. It crawls a configured sub-tree of the local file system.
2. It applies extractor and linker components to the crawled files.
3. It stores the resulting RDF triples in a triple store (either in memory or persistent).
4. It initializes the watcher component to monitor the exposed file system sub-tree; the watcher in turn notifies TripFS upon changes to files or directories.

After each such notification, the RDF model is updated accordingly, and extractors and linkers are re-applied to the modified objects.

Metadata Extraction and Linking. We have implemented simple extractors that extract low-level file metadata, such as name, file size, or a hash sum. The hash sum could, for example, be used to identify and link equal files across different TripFS instances. Further, we have implemented extractor components based on the Aperture metadata extraction framework¹⁰, which provides a multitude of extractors for many different file types, including Office documents and multimedia data. As a proof of concept, we have also implemented several linker components:
- one that links documents, based on their titles, to resources in the DBLP data set;
- one that links audio files to MusicBrainz by analyzing track title and artist name;
- one that links files to potentially interesting DBpedia resources via the DBpedia lookup service.

Both the extractors and the linkers are meant as proofs of concept; they are far from leveraging the full potential of the presented approach. However, as described before, further extractors and linkers can easily be integrated according to the needs of an actual use case.

⁹ An evaluation version of TripFS can be obtained from http://www.cs.univie.ac.at/tripfs.
¹⁰ http://aperture.sourceforge.net

Maintaining Consistency. We have used DSNotify [19] as an implementation of the watcher component. DSNotify is a change detection add-on for data sources that helps them maintain link integrity in their data. At its core, DSNotify extracts feature vectors from the data entities it considers. These vectors are used in heuristic comparisons to determine whether items that are no longer found at their original locations were in fact removed or moved to another location. DSNotify can easily be extended by implementing custom crawlers, feature extractors, and comparison heuristics.

We have implemented a generic file-feature extractor for DSNotify that extracts low-level features from local files (cf. Table 2)¹¹. Further, we have developed a simple heuristic that calculates the plausibility that a file described by the feature vector X was moved to another location, where the file is described by the feature vector Y. This heuristic consists of two parts.

First, plausibility checks are performed. For example, if the last modification date of file Y is before that of file X, Y cannot be a successor of X. Likewise, a file cannot become a directory or vice versa, which is checked by the isDirectory feature.

Second, a similarity metric between the remaining features is calculated using the strategies listed in Table 2. The resulting similarities are weighted¹², summed up, and normalized; for example, the name similarity is considered more important than equal file sizes. DSNotify then uses these similarities to detect move, remove, and create events. Furthermore, DSNotify reports update events based on changes in the extracted feature vectors (cf. [19]).

Table 2: Extracted features, their data type, and the strategy used to calculate a similarity between them. Features that are used only in plausibility checks are marked "Plausibility" in the Similarity column.
Feature           | Datatype  | Similarity               | Weight
Last access       | Date      | Plausibility             | -
Last modification | Date      | Plausibility             | -
IsDirectory       | Bool      | Plausibility             | -
Checksum          | Integer   | Plausibility             | -
Name              | String    | Levenshtein              | 3.0
Extension         | String    | Major MIME type equality | 1.0
Path              | String    | Levenshtein              | 0.5
Size              | Long      | Equality                 | 0.1
Permissions       | Bitstring | Equality                 | 0.1

¹¹ The set of features extracted by DSNotify overlaps with, but is not equal to, the set of metadata attributes extracted and exposed by TripFS. In the current implementation, the latter are stored in the RDF graph, while DSNotify stores features in its own indices.
¹² The selection of features and their weights was our own subjective choice based on several test runs with the system. We consider an extensive evaluation of DSNotify as a tool for detecting file system events as future work.

DSNotify periodically monitors the file subtree that is exposed by TripFS. It extracts feature vectors based on the file attributes described before and stores these vectors in an index. For efficiently monitoring the local file system, DSNotify uses a native C++ component that makes use of the Windows API function FindNextChangeNotification(). We have also implemented a generic, but less efficient, Java-based monitor component that should work on all common platforms. This allows us to re-crawl the respective subdirectory tree only when the operating system reports actual changes. The detected events are then forwarded to TripFS, which updates the file's path in the RDF model and re-applies extractors and linkers.

Linked Data Interface. TripFS includes an embedded Jetty Web server, which serves data from the triple store as described in Section 3.7. The server:
- dynamically rewrites the internally used UUIDs and blank nodes to dereferenceable HTTP URIs;
- provides XHTML+RDFa and pure RDF representations of file and directory resources, as well as a SPARQL endpoint;
- allows clients to download file contents directly and, in the case of local requests, to launch these files directly.

No component of TripFS makes any changes to the exposed file system; i.e., no special files or directories (such as those needed, e.g., by SVN) are created. Currently, TripFS also does not provide means to modify file systems via the Linked Data interface.

5. RELATED WORK
Modern file systems support the creation, storage, management, and retrieval of file-related metadata (e.g., using extended attributes or file forks), yet they remain mostly isolated from Web-based information integration and exchange contexts. Even file systems that provide sophisticated support for file annotations or links (e.g., LiFS [1] or AttrFS [23]) do not consider a global Web context, and often restrict their features to objects within the local system. Web-based file systems, on the other hand, usually focus on performance (e.g., [12]) or security (e.g., [4]) rather than on semantically rich file descriptions or metadata interoperability. In this respect, TripFS can be seen as complementary to metadata-rich or highly scalable file systems, bridging the gap between file systems and Web environments. Other works represent Web resources as virtual file systems (e.g., [21]). Combined with such approaches, local file systems and remote Web resources can be seamlessly integrated, providing unified programming interfaces and a consistent user experience.

As described before, file system contents are highly diverse and heterogeneous, and contain information that is valuable in many scenarios. TripFS presents a generic framework for exposing these contents as Linked Data, but does not by itself extract higher-level metadata from files. For this, it relies on additional components, of which a wide variety exists. The Aperture metadata extraction framework, mentioned before, is based on the Gnowsis adapter framework [20] and can extract RDF descriptions from a wide range of files and other data sources. For most file types there exist extractors that return RDF descriptions of the file content, ranging from BibTeX files to calendar data and JPEG images. A list of these extractors is maintained at the W3C ESW Wiki¹³. Such conversion or extraction components also exist for Web sources, e.g., PiggyBank [15] or the Virtuoso Sponger technology¹⁴, which create RDF descriptions from a multitude of Web sources on the fly.

¹³ http://esw.w3.org/topic/ConverterToRdf
¹⁴ http://docs.openlinksw.com/virtuoso/virtuososponger.html

TripFS is in line with a number of other generic frameworks that allow one to expose Linked Data based on a different underlying data representation. Frameworks in this area include D2R [8] and Triplify [3] for relational databases, SparqPlug [11] for DOM-based sources, OAI2LOD [14] for OAI-PMH repositories, and XLWrap [18] for spreadsheet data. With TripFS, file system contents can likewise be made "first-class citizens" of the Web of Data and can be seamlessly integrated with all these other data sources.

6. CONCLUSIONS AND FUTURE WORK
In this paper we have presented and discussed TripFS, a service that exposes local file systems according to Linked Data principles. This approach can potentially benefit a range of application scenarios (cf. Section 2).

In an enterprise information integration scenario (Scenario A), files are assigned stable, globally unique URIs and can therefore be referenced from external systems. Metadata extracted from files can be indexed by Semantic Web search engines. Links to other (enterprise-internal or external) data sources can improve information organization and data retrieval.

A lightweight component like TripFS can also be used in ad-hoc file sharing situations (Scenario B). Participants in a face-to-face meeting can easily set up and start the sharing server, which exposes a certain sub-tree of their file system as Linked Data. This enables collaborators in the same network to access and retrieve these files based not only on low-level characteristics like the file name, but also on extracted semantic metadata and links. With additional components, more intuitive approaches like faceted navigation can be built on top of the extracted data, and more experienced users can issue complex SPARQL queries over the file system.

A Linked Data representation of file systems also facilitates the application of Web-based annotation services (Scenario C), which overcomes the limitations of the hierarchical directory metaphor for file organization. Such annotations can refer to single files or even parts thereof. They can range from simple text-based comments to complex descriptions that refer to external entities and concepts. TripFS makes file systems part of a global, uniform Web of Data, and therefore allows Web-based annotation techniques to be applied directly to file system objects.

In future work, we plan an extensive evaluation of TripFS, in particular regarding the performance and scalability of our approach. For this purpose, we aim to apply TripFS in a concrete enterprise information integration setting. We also plan to develop a simple user interface that allows end users to share their files more easily using Linked Data technologies.

[11] Peter Coetzee, Tom Heath, and Enrico Motta. SparqPlug: Generating Linked Data from Legacy HTML, SPARQL and the DOM. In Proceedings of the First International Workshop on Linked Data on the Web (LDOW), 2008.
Further, we plan to improve and evaluate the accuracy [12] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak of the DSNotify component for detecting file system events. Leung. The Google File System. In 19th ACM Additionally, we plan to introduce a more fine-grained Symposium on Operating Systems Principles, 2003. model for selecting what file system objects are exposed via [13] Bernhard Haslhofer, Wolfgang Jochum, Ross King, TripFS (currently one can select only a single subtree of the Christian Sadilek, and Karin Schellner. The LEMO file system) and implement a secure HTTPS version that Annotation Framework: Weaving Multimedia takes privacy considerations into account. Annotations with the Web. International Journal on Digital Libraries, 10(1), 2009. Acknowledgements [14] Bernhard Haslhofer and Bernhard Schandl. The Parts of this work have been funded by FIT-IT grants 812513 OAI2LOD Server: Exposing OAI-PMH Metadata as and 815133 from Austrian Federal Ministry of Transport, Linked Data. In International Workshop on Linked Innovation, and Technology. Data on the Web (LDOW2008), 2008. [15] David Huynh, Stefano Mazzocchi, and David R. 7. REFERENCES Karger. Piggy Bank: Experience the Semantic Web [1] Sasha Ames, Nikhil Bobb, Kevin M. Greenan, Inside Your Web Browser. In International Semantic Owen S. Hofmann, Mark W. Storer, Carlos Maltzahn, Web Conference, volume 3729 of Lecture Notes in Ethan L. Miller, and Scott A. Brandt. LiFS: An Computer Science, pages 413–430. Springer, 2005. Attribute-Rich File System for Storage Class [16] Georgi Kobilarov, Tom Scott, Yves Raimond, Silver Memories. In Proceedings of the 23rd IEEE / 14th Oliver, Chris Sizemore, Michael Smethurst, Christian NASA Goddard Conference on Mass Storage Systems Bizer, and Robert Lee. Media Meets Semantic Web — and Technologies, 2006. How the BBC Uses DBpedia and Linked Data to [2] William Y. Arms. Uniform Resource Names: Handles, Make Connections. 
In Proceedings of the 6th European PURLs, and Digital Object Identifiers. Commun. Semantic Web Conference, pages 723–737, Berlin, ACM, 44(5):68, 2001. Heidelberg, 2009. Springer-Verlag. [3] Sören Auer, Sebastian Dietzold, Jens Lehmann, [17] Carl Lagoze and Herbert Van de Sompel. ORE Sebastian Hellmann, and David Aumueller. Triplify: Specification — Abstract Data Model. Open Archives Light-weight Linked Data Publication from Relational Initiative, 2008. Available at Databases. In WWW ’09: Proceedings of the 18th http://www.openarchives.org/ore/1.0/datamodel. international conference on World wide web, pages [18] Andreas Langegger and Wolfram Wöß. XLWrap - 621–630, New York, NY, USA, 2009. ACM. Querying and Integrating Arbitrary Spreadsheets with [4] Arati Baliga, Joe Kilian, and Liviu Iftode. A SPARQL. In International Semantic Web Conference. Web-based Covert File System. In Proceedings of the Springer, 2009. 11th Workshop on Hot Topics in Operating Systems, [19] Niko Popitsch and Bernhard Haslhofer. DSNotify: 2007. Handling Broken Links in the Web of Data. In 19th [5] T. Berners-Lee, L. Masinter, and M. McCahill. International WWW Conference (WWW2010), Uniform Resource Locators (URL) (RFC 1738). Raleigh, NC, USA, 2 2010. ACM. to be published. Network Working Group, 1994. [20] Leo Sauermann and Sven Schwarz. Gnowsis Adapter [6] Tim Berners-Lee. Linked Data. World Wide Web Framework: Treating Structured Data Sources as Consortium, 2006. Available at Virtual RDF Graphs. In Proceedings of the 4th http://www.w3.org/DesignIssues/LinkedData.html, International Semantic Web Conference (ISWC retrieved 08-Aug-2008. 2005), pages 1016–1028. Springer-Verlag GmbH, 2005. [7] Chris Bizer, Richard Cyganiak, and Tom Heath. How [21] Bernhard Schandl. Representing Linked Data as to Publish Linked Data on the Web, 2007. Available at Virtual File Systems. 
In Proceedings of the 2nd http://www4.wiwiss.fu-berlin.de/bizer/pub/ International Workshop on Linked Data on the Web LinkedDataTutorial/, retrieved 02-Dec-2008. (LDOW), Madrid, Spain, 2009. [8] Chris Bizer and Andy Seaborne. D2RQ - Treating [22] Julius Volz, Christian Bizer, Martin Gaedke, and Non-RDF Databases as Virtual RDF Graphs. In Georgi Kobilarov. Discovering and Maintaining Links Poster at the 3rd International Semantic Web on the Web of Data. In Proceedings of the 8th Conference (ISWC2004), 2004. International Semantic Web Conference (ISWC [9] Christian Bizer, Tom Heath, and Tim Berners-Lee. 2009), 2009. Linked Data — The Story So Far. International [23] C.E. Wills, D. Giampaolo, and M.S. Mackovitch. Journal on Semantic Web and Information Systems, Experience with an Interactive Attribute-based User 5(3), 2009. Information Environment. In Computers and [10] Kendall Grant Clark, Lee Feigenbaum, and Elias Communications, 1995. Conference Proceedings of the Torres. SPARQL Protocol for RDF (W3C 1995 IEEE Fourteenth Annual International Phoenix Recommendation 15 January 2008). World Wide Web Conference on, pages 359–365, Mar 1995. Consortium, 2008.