<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Analyzing SQL Query Logs using Multi-Relational Graphs</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Andreas</forename><forename type="middle">M</forename><surname>Wahl</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Richard</forename><surname>Lenz</surname></persName>
						</author>
						<title level="a" type="main">Analyzing SQL Query Logs using Multi-Relational Graphs</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F8A9B12D4D6D51102F9EE897B4CB4CCD</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T03:40+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Analytical SQL queries are a valuable source of information. They contain expert knowledge that cannot be inferred from schemas or content alone. Consider, for example, data lake scenarios, where relational and semi-structured data sources are combined in a single storage and processing environment. Data lakes often lack a structured curation process <ref type="bibr" target="#b0">[1]</ref>. Neither global schemas nor vocabularies might be established and data sources might be disparate. SQL provides effective mechanisms to apply these curation steps during querying in a demand-driven way (e.g. by using aliases, joins, casts, user-defined functions, conditional expressions). Hence, the resulting SQL query logs constitute a dynamic documentation of the data lake and the knowledge gathered by its users through previous pay-as-you-go integration tasks. This knowledge includes the purpose of data sources, their semantics, vocabularies, associations with other data sources, and their temporal and social usage context.</p><p>To leverage this knowledge, we have developed an extensible framework for analyzing SQL query logs. Query logs are mapped to a multi-relational <ref type="bibr" target="#b2">[3]</ref> graph model. We store query texts and corresponding abstract syntax trees to enable meta-querying for syntactic features. However, as SQL allows expressing queries with many different language constructs and the use of aliases, wildcards and unqualified attributes, meta-querying for semantic features requires a different query representation. We convert each query to a corresponding relational algebra tree and normalize it using algebraic transformation rules. Each tree is interlinked with a schema lineage tree, which captures attribute lineage and output schemas of each relational operator. Metadata about users, physical time and logical order allows to inspect the social and temporal context of each query. Meta-queries are specified using domain-specific graph traversal expressions.</p><p>Our framework can be used for a broad range of application scenarios. It facilitates collaborative data science by locating relevant queries. Other use cases include maintenance and monitoring tasks, schema evolution mechanisms and existing log mining algorithms. We rely on Apache TinkerPop <ref type="bibr" target="#b1">[2]</ref> to abstract from vendor-specific graph implementations. TinkerPop enables both interactive meta-querying and complex distributed computations on our graph model.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="1,0.00,159.54,612.00,472.91" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">The data lake fallacy: All water and little substance</title>
		<author>
			<persName><forename type="first">N</forename><surname>Heudecker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>White</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2014">2014</date>
			<publisher>Gartner Inc</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">The gremlin graph traversal machine and language (invited talk)</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Rodriguez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">DBPL&apos;</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A path algebra for multi-relational graphs</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Neubauer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ICDEW&apos;</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><surname>Copyright</surname></persName>
		</author>
		<ptr target="http://ceur-ws.org" />
		<title level="m">Proceedings of the LWDA 2017 Workshops: KDML, FGWM</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Leyer</surname></persName>
		</editor>
		<meeting>the LWDA 2017 Workshops: KDML, FGWM<address><addrLine>, IR, and FGDB; Rostock, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017-09">2017. 11. September 2017</date>
			<biblScope unit="page">13</biblScope>
		</imprint>
	</monogr>
	<note>by the paper&apos;s authors. Copying permitted only for private and academic purposes</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
