<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">FS2KG: From File Systems to Knowledge Graphs (Demo)</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Yannis</forename><surname>Tzitzikas</surname></persName>
							<email>tzitzik@ics.forth.gr</email>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">FORTH-ICS</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Computer Science Department</orgName>
								<orgName type="institution">University of Crete</orgName>
								<address>
									<country key="GR">Greece</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">FS2KG: From File Systems to Knowledge Graphs (Demo)</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">CF3FEAB71151354F7CEFFB6EDD785B52</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:13+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Knowledge Graph creation</term>
					<term>File systems</term>
					<term>Semantic Access over File Systems</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The tree-structured and semantics-neutral approach of file systems has been the dominant method for organizing information for decades. In this paper we elaborate on two questions: (a) can a file system structure benefit from a Knowledge Graph (KG), and (b) can the construction of a KG be facilitated by the file system? To this end, we propose an automatic method for producing KGs from folder structures. The method can be configured through small, easy-to-write configuration files that are placed in the desired folders to guide the KG construction. We present FS2KG, an implementation of the approach. The approach can facilitate the rapid creation of KGs, as well as various file-system-related tasks.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Motivation, Challenges, Methodology and Approach</head><p>File systems offer a tree structure consisting of folders and files, and cloud-based file systems offer the same structuring. This simple, tree-structured and semantics-neutral approach has been the dominant way of organizing information for decades. The idea of using the term (and metaphor) folder for designing hierarchical file systems dates back to 1958 <ref type="bibr" target="#b0">[1]</ref>, while the first file system to support arbitrary hierarchies of directories was used in the Multics operating system in 1965, more than half a century ago. The main benefits of the typical hierarchical organization of file systems are that: (a) it allows grouping resources (through named folders with unlimited nesting), (b) it allows naming resources relative to their parent folder, and (c) it allows moving/copying/deleting resources in one shot, i.e. all contained resources are moved/copied/deleted together. However, a weakness of this structuring method is that each resource (file or folder) must be placed in, and appears in, exactly one place. The "shortcuts" that file systems typically offer are a remedy, but a weak one, since they are one-way rather than bidirectional links. Consequently, file systems do not support a multifaceted approach for locating resources. Two questions arise: (a) since Knowledge Graphs (KGs) are labeled graphs rather than trees, could this extra expressiveness be leveraged for the contents of our file systems? and (b) since there is a need for practical and effective methods for producing KGs as automatically as possible, could the ubiquitous use of file systems (and users' familiarity with them) be leveraged to speed up, or simply facilitate, the creation of KGs? Both directions could have significant impact. The first would enable leveraging Semantic Web technologies in everyday tasks. 
The second would assist the creation of KGs. This is desirable because there is a need for practical and mature tools that foster knowledge engineering. For critiques of the practicality and availability of Semantic Web tools see, e.g., <ref type="bibr" target="#b1">[2]</ref>, and for an elaborate discussion of these critiques see <ref type="bibr" target="#b2">[3]</ref>.</p><p>Challenges: Enriching a file system with a KG is challenging. A file system holds very heterogeneous material, because it is used for many different purposes and tasks. For instance, one part of the file system may contain training material (books, papers, slides, assignments, student exercises), another part personal material (family documents, photos and videos, travel information), and other parts datasets, software code, systems, and so on. Moreover, applications also use the file system, creating and modifying parts of it.</p><p>Methodology: We started by inspecting existing file system structures and reflecting on what we would like to achieve and which file system weaknesses we would like to tackle. We came up with various ideas, which we implemented and tested. Only those that seemed effective were included in the proposed tool FS2KG.</p><p>Approach: In brief, we propose supporting two fundamental, interrelated aspects, the folder structure and the semantic network, together with connections between the two. The core schema is illustrated in Figure <ref type="figure" target="#fig_0">1</ref> (upper part), where "*" denotes multiplicity (as in UML class diagrams). The approach includes methods that create entities from the files and folders of the file system, as well as methods that extract entities from csv files. The big picture is sketched in Figure <ref type="figure" target="#fig_0">1</ref> (lower part).</p><p>Related Work: Compared with the line of research known as the "semantic desktop", which was developed 15 years ago (e.g. 
<ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8]</ref>), the current work has a more modest but realistic vision: not to integrate data, applications and tasks, but to focus on the data part (folders and files). As pointed out in <ref type="bibr" target="#b8">[9]</ref>, existing Semantic Desktops are either too complicated or do not scale well, and a real "killer app" is still missing. The approach proposed in this paper is more tightly related to classical file system usage. It adopts a modular configuration approach with no dependency on a central repository, a central configuration, or any other service.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">The Functionality of FS2KG</head><p>FS2KG supports a default operation that requires no configuration. It starts by traversing the file system from the desired folder(s). Each folder is represented by a class and subfolder relationships by rdfs:subClassOf, while each file is represented as a named individual classified under the class of its folder. In addition, the user can place a ".kg" file in some folders to configure the creation of the KG in the corresponding part of the file system. A ".kg" file contains configuration parameters in the form of key-value pairs, and it supports commands:</p><p>(I) Scope restriction. For example, with traverse=off the traversal stops, and files can be ignored based on their extension, e.g. with ignoreExt=tmp;aux.</p><p>(II) Automatic creation and classification of entities corresponding to subfolders. For example, with subFoldersClass=example:Student an entity (belonging to the Semantic Network view) is created for each subfolder of the hosting folder, with […] i.e. extra, explicitly specified triples can be associated with a file, say f1.pdf, by placing them in a file f1.kg in the same folder.</p><p>(V) Extraction and transformation of data from the desired csv files.</p><p>Specifically, FS2KG adopts the following convention: to extract data from a file, say "fn.y", we create a file "fn.kg" that contains the rules for extracting data from the corresponding file. We support an easy-to-use language with which RDF triples (data, taxonomies, ontologies) can be constructed from csv files. This language is much simpler than existing approaches such as <ref type="bibr" target="#b9">[10]</ref>. For example, suppose a file named Connections.txt that contains lines of the form "Leonardo;Rome;Football". We can place in the same folder a file Connections.kg containing the four lines "C1=example:Student", "C2=example:Location", "C3=example:Sport" and "R=C1,example:livesAt,C2;C1,example:likes,C3". Property C1 refers to the first column. Its value means that the values occurring in the first column of the data file should become instances of the class example:Student. Consequently, the first three lines (properties C1-C3) classify all values that appear in the csv file under the classes Student, Location and Sport. The last line contains two rules, separated by a semicolon.</p><p>Efficiency. Applying FS2KG to a file system of 140 GB that contains 60K folders and 382K files takes only 90 seconds and produces a Turtle (ttl) file of 140 MB.</p><p>Use Cases. We can identify two main scenarios:</p><p>(S1) Over existing file systems, to enable querying, identification and grouping of entities scattered in different subfolders. To this end, an immediate next step is the implementation of an explorer that combines the functionality of the classical file explorer with the query client (as shown in Figure <ref type="figure" target="#fig_2">2</ref>).</p><p>(S2) Over folder structures and files created for facilitating KG construction. For example, the user can use the file system to define a taxonomy (e.g. of papers organized in categories) instead of having to use a taxonomy/ontology editor. Moreover, arbitrary KGs can be constructed from csv files through FS2KG and the supported extraction language.</p></div>
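<div xmlns="http://www.tei-c.org/ns/1.0"><p>The default mapping and the scope-restriction and subfolder commands of Section 2 can be sketched in a few lines of Python. The sketch below is our own illustration under stated assumptions, not FS2KG's code. Triples are plain (subject, predicate, object) tuples of prefixed names. The prefixes fs:, kg: and file: and the helper functions local_name, read_kg and folder_to_triples are hypothetical. A ".kg" file is assumed to hold one key=value pair per line. The command traverse=off is assumed to skip everything below the hosting folder. The commands ignoreExt and subFoldersClass are assumed to apply to the hosting folder only. Per-file .kg files and csv rules are not handled.</p>
<eg>
```python
import os


def local_name(name):
    # Hypothetical naming rule: keep letters and digits, replace the rest by "_".
    return "".join(c if c.isalnum() else "_" for c in name)


def read_kg(folder):
    """Read the folder's optional ".kg" file as key=value pairs, one per line (assumed format)."""
    conf = {}
    path = os.path.join(folder, ".kg")
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if "=" in line:
                    key, value = line.split("=", 1)
                    conf[key.strip()] = value.strip()
    return conf


def folder_to_triples(folder, parent_class=None, triples=None):
    """Default mapping, sketched: a folder becomes a class, a subfolder becomes
    rdfs:subClassOf its parent folder's class, and a file becomes a named
    individual of its folder's class."""
    if triples is None:
        triples = []
    conf = read_kg(folder)
    cls = "fs:" + local_name(os.path.basename(folder))
    triples.append((cls, "rdf:type", "owl:Class"))
    if parent_class is not None:
        triples.append((cls, "rdfs:subClassOf", parent_class))
    if conf.get("traverse") == "off":
        return triples  # assumed: nothing below this folder is visited
    ignored = {ext for ext in conf.get("ignoreExt", "").split(";") if ext}
    entity_class = conf.get("subFoldersClass")
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if os.path.isdir(path):
            if entity_class:
                # One entity per subfolder, in the Semantic Network view.
                triples.append(("kg:" + local_name(entry), "rdf:type", entity_class))
            folder_to_triples(path, cls, triples)
            continue
        ext = os.path.splitext(entry)[1].lstrip(".")
        if entry == ".kg" or ext == "kg" or ext in ignored:
            continue  # configuration files and ignored extensions produce no triples here
        individual = "file:" + local_name(entry)
        triples.append((individual, "rdf:type", "owl:NamedIndividual"))
        triples.append((individual, "rdf:type", cls))
    return triples
```
</eg>
<p>Consider a folder Courses whose .kg file contains ignoreExt=tmp;aux. For it, the sketch yields the class fs:Courses and one rdfs:subClassOf triple per subfolder. It also yields an individual for every file directly in Courses, except files ending in .tmp or .aux. Tuples keep the sketch free of dependencies, and serializing them as Turtle is left out.</p></div>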
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Conclusion</head><p>Finding an effective method to reconcile the freedom of file system usage with Knowledge Graph integrity and usability is a challenging task. We will demonstrate FS2KG, a tool for the automatic creation of KGs from file systems. It supports a modular (and easy-to-use) configuration approach that relies on small configuration files in the folders, and it allows KG reconstruction at any moment. The tool is open source, available at https://github.com/YannisTzitzikas/FS2KG, and lends itself to a plethora of extensions. We have chosen to include only core functionality in FS2KG. On top of this, several straightforward extensions are applicable, since they have already been studied in isolation. These include: (a) representation of the file system's file metadata in RDF (as in <ref type="bibr" target="#b3">[4]</ref>), (b) extraction of the metadata embedded in files and its representation in RDF (as in <ref type="bibr" target="#b10">[11]</ref>), (c) instance matching over the KG to establish connections between entities whose names differ slightly across folders, (d) regex-based specification of the desired files/folders (as in web crawlers), (e) information extraction from files according to their type (text, images, etc.), based on the application context and requirements at hand (including scripts in the ".kg" files), (f) materialization of the triples extracted from big csv files, to avoid re-extracting them in the next KG reconstruction if the files have not changed in the meantime, and (g) keyword search over both the contents of the files and the produced KG (as in <ref type="bibr" target="#b11">[12]</ref>).</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The core connections and the big picture</figDesc><graphic coords="3,99.71,84.19,395.86,252.67" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>Leonardo;Rome;Football. We can place in the same folder a file Connections.kg with: C1=example:Student C2=example:Location C3=example:Sport R=C1,example:livesAt,C2;C1,example:likes,C3</figDesc></figure>
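<div xmlns="http://www.tei-c.org/ns/1.0"><p>The csv extraction language of Section 2 can likewise be sketched on the Connections example. This is our reading of the rule syntax, not FS2KG's parser. The functions parse_rules, column_index and extract are hypothetical. The rule file is assumed to contain one key=value pair per line. Following the text, Ci=Class makes every value of column i an instance of Class. Each rule in R (rules separated by ";") is assumed to read "column, property, column". Data fields are assumed to be separated by ";", as in Leonardo;Rome;Football. The IRI of a value is assumed to be the prefix example: followed by the value.</p>
<eg>
```python
def parse_rules(kg_text):
    """Split a rule file into column classes (C1=..., C2=...) and triple rules (R=...)."""
    classes, rules = {}, []
    for line in kg_text.splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "R":
            # Rules are separated by ";"; each one is assumed to be "column,property,column".
            for rule in value.split(";"):
                subj, prop, obj = (item.strip() for item in rule.split(","))
                rules.append((subj, prop, obj))
        elif key.startswith("C") and key[1:].isdigit():
            classes[int(key[1:])] = value
    return classes, rules


def column_index(ref):
    # "C2" names the second column, i.e. zero-based index 1.
    return int(ref[1:]) - 1


def extract(data_text, kg_text, sep=";", prefix="example:"):
    """Apply the rules to every non-empty line of the data file."""
    classes, rules = parse_rules(kg_text)
    triples = []
    for line in data_text.splitlines():
        if not line.strip():
            continue
        values = [v.strip() for v in line.split(sep)]
        # Hypothetical IRI rule: prefix plus value, spaces replaced by "_".
        ents = [prefix + v.replace(" ", "_") for v in values]
        for col, cls in sorted(classes.items()):
            triples.append((ents[col - 1], "rdf:type", cls))
        for subj, prop, obj in rules:
            triples.append((ents[column_index(subj)], prop, ents[column_index(obj)]))
    return triples
```
</eg>
<p>The sketch can be applied to the single data line Leonardo;Rome;Football with the four rule lines of the example. In that case extract returns five triples. Three of them type example:Leonardo, example:Rome and example:Football as example:Student, example:Location and example:Sport, respectively. The other two are (example:Leonardo, example:livesAt, example:Rome) and (example:Leonardo, example:likes, example:Football).</p></div>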
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Left: the two class hierarchies (folder view and Semantic Network) and their connection through entities. Right: the GUI of the query client.</figDesc><graphic coords="4,96.38,84.19,200.02,227.48" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Organization and retrieval of records generated in a large-scale engineering project</title>
		<author>
			<persName><forename type="first">G</forename><surname>Barnard</surname><genName>III</genName></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Papers and discussions presented at the December 3-5, 1958, eastern joint computer conference: Modern computers: objectives, designs, applications</title>
				<imprint>
			<date type="published" when="1958">1958</date>
			<biblScope unit="page" from="59" to="63" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The semantic web identity crisis: in search of the trivialities that never were</title>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vander Sande</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="19" to="27" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The semantic web: Two decades on</title>
		<author>
			<persName><forename type="first">A</forename><surname>Hogan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="169" to="185" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Automatic RDF metadata generation for resource discovery</title>
		<author>
			<persName><forename type="first">C</forename><surname>Jenkins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jackson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Burden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wallis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Networks</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="1305" to="1320" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Overview and outlook on the semantic desktop</title>
		<author>
			<persName><forename type="first">L</forename><surname>Sauermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bernardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dengel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Semantic Desktop Workshop</title>
				<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">175</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">SemDAV: a file exchange protocol for the semantic desktop</title>
		<author>
			<persName><forename type="first">B</forename><surname>Schandl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SemDesk&apos;06: Proceedings of the 5th International Conference on Semantic Desktop and Social Semantic Collaboration</title>
				<imprint>
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">PIMO - a framework for representing personal information models</title>
		<author>
			<persName><forename type="first">L</forename><surname>Sauermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Van Elst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dengel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of I-Semantics</title>
				<meeting>I-Semantics</meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="270" to="277" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Knowledge management on the desktop</title>
		<author>
			<persName><forename type="first">L</forename><surname>Drăgan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Decker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Knowledge Engineering and Knowledge Management</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="373" to="382" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Context spaces as the cornerstone of a near-transparent and self-reorganizing semantic desktop</title>
		<author>
			<persName><forename type="first">C</forename><surname>Jilek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schröder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schwarz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Maus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dengel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="89" to="94" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">X3ML mapping framework for information integration in cultural heritage and beyond</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Marketakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Minadakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kondylakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Konsolaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Samaritakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Theodoridou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Flouris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Doerr</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal on Digital Libraries</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="301" to="319" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Prescan: towards automating the preservation of digital objects</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Marketakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tzanakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Management of Emergent Digital EcoSystems</title>
				<meeting>the International Conference on Management of Emergent Digital EcoSystems</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="404" to="411" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Keyword search over RDF: Is a single perspective enough?</title>
		<author>
			<persName><forename type="first">C</forename><surname>Nikas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kadilierakis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fafalios</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tzitzikas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Big Data and Cognitive Computing</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page">22</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
