<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Semantic Data Management for Managing Heterogeneous Data Sources in Chemistry 4.0</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Sayed</forename><surname>Hoseini</surname></persName>
							<email>sayed.hoseini@hs-niederrhein.de</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Electr. Engineering and Computer Science</orgName>
								<orgName type="institution">Niederrhein University of Applied Sciences</orgName>
								<address>
									<settlement>Krefeld</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Semantic Data Management for Managing Heterogeneous Data Sources in Chemistry 4.0</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">D712DA4C102AD129DA05375A8B673506</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:57+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Semantic Data Management</term>
					<term>Semantic Data Lakes</term>
					<term>Semantic Machine Learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Managing large volumes of data poses significant challenges due to the variety of formats, distribution across departments, and different governance structures within organizations. In research and industry environments, this complexity is compounded by the need for streamlined data handling processes to support automated workflows and machine learning (ML) applications. Integrating implicit contextual knowledge alongside data artifacts is critical, especially for non-expert users accessing the data. Data lakes provide a scalable solution by aggregating raw data from disparate sources with minimal upfront integration costs. However, without proper integration, data analysis and interpretation is hindered, rendering the data lake effectively inoperable. This PhD research addresses these challenges by applying semantic data management (SDM) techniques inside a semantic data lake. While initial milestones have been achieved through a systematic literature review and a concrete implementation, further efforts lie ahead. First, the emergence of large language models offers numerous opportunities for automating previously manual processes. Leveraging these models can significantly improve the efficiency of common SDM tasks. Second, extending the application of SDM techniques to data analytics can facilitate the integration of diverse data sources into ML pipelines. Ultimately, we aim to bridge the gap between Big Data and Semantic Web technologies, anticipating the development of advanced semantic data lake solutions in the foreseeable future.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Problem statement</head><p>Large amounts of data are generated every second to enable the subsequent collection, storage, usage and analysis for various applications. However, managing data can be challenging not only due to the variety of data formats, but additionally, it is often distributed across different departments within a company, under different governance regimes, network topologies and data models. At the Institute for Surface Technology of the Niederrhein University of Applied Sciences (HIT), where this research takes place, similar challenges need to be addressed <ref type="bibr" target="#b0">[1]</ref>. There, an automation platform for material development is in operation and various datasets of experiments for paints and coatings need to be captured, cataloged, and made available for ML to be applied in chemistry for experiment suggestions and analysis. Data sources are very heterogeneous, e.g., streams of machine sensors, interfaces of control software, databases and images of microscopes, and scripts and model checkpoints from ML. As a result, tasks for analyzing data, such as collecting, accessing, searching, understanding and processing data, become very time-consuming. This makes it difficult to realize visions such as Chemistry 4.0, which refers to the digital transformation of the chemical industry and emphasizes the integration of data-driven systems for increasing degrees of automation <ref type="bibr" target="#b1">[2]</ref>. The centralized management of all (meta-) data with integrated data analytics using a uniform data management system is thus very attractive and actively researched <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>.</p><p>Data lakes are scalable schema-less repositories to ingest raw data in its original format from heterogeneous data sources. 
Only minimal effort is required to ingest data into a data lake, making it an efficient tool for collecting, storing, linking, and transforming datasets <ref type="bibr" target="#b3">[4]</ref>. However, this approach only postpones the cost of integration, which is why data lakes risk turning into data swamps <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>. In addition, many existing systems lack mature functions to support data analytics <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b7">8]</ref>. Furthermore, industrial ML suffers from low transparency towards non-ML experts and from poor, non-unified descriptions of ML practices, which hampers review and comprehension. Both problems stem from custom-made, ad-hoc solutions that are tailored to specific applications and are therefore hard to reuse <ref type="bibr" target="#b8">[9]</ref>.</p><p>The main goal of this research is to develop a prototype for the industrial chemistry context of the HIT that not only manages the various (meta-)data assets, but also facilitates data integration, ultimately empowering users unfamiliar with data analytics to derive ML models.</p><p>Data integration matters because the users who ingest data into the lake and are responsible for it may not be the data scientists who use it later on. Likewise, a data scientist crafting a specific model seeks clarity and easily understandable details about the design. Thus, implicit context knowledge needs to be committed alongside any created artifacts so that a third party with limited domain knowledge can interpret and use the received assets later on.</p><p>The problem statement can be formulated mathematically. Let:</p><p>• $D = \{d_1, d_2, \ldots, d_n\}$: the set of heterogeneous data sources,</p><p>• $A = \{a_1, a_2, \ldots, a_k\}$: the set of analytical models applied to $D$ for generating insights,</p><p>• $M = \{m_1, m_2, \ldots, m_m\}$: the set of metadata artefacts that describe and link data items managed and stored by a data lake.</p><p>The objective is to minimize the human effort required to prepare and integrate heterogeneous data sources through metadata, leveraging the capability of the lake to derive insights from ML with maximum automation and smart assistance.</p><p>Minimize:</p><formula xml:id="formula_0">E_{\text{total}}(D, A, M) = E_{\text{prep}}(D, M) + E_{\text{use}}(D, A, M) + E_{\text{meta}}(M)</formula><p>• $E_{\text{prep}}(D, M)$: Effort required to harmonize, transform, and integrate heterogeneous data sources $D$ using the available metadata $M$.</p><p>• $E_{\text{use}}(D, A, M)$: Effort required for users to interpret and utilize $D$, $A$, and $M$ for deriving insights and crafting ML models.</p><p>• $E_{\text{meta}}(M)$: Effort required to create and maintain metadata $M$.</p></div>
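<div xmlns="http://www.tei-c.org/ns/1.0"><p>The effort decomposition above (E_total = E_prep + E_use + E_meta) can be made concrete with a small sketch. The strategy names and all effort figures below are hypothetical placeholders chosen only to illustrate the trade-off between metadata effort and downstream effort; they are not measurements from this research.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffortEstimate:
    """Hypothetical effort figures (e.g. person-hours) for one metadata strategy."""
    prep: float  # E_prep(D, M): harmonizing and integrating the sources
    use: float   # E_use(D, A, M): interpreting data, models, and metadata
    meta: float  # E_meta(M): creating and maintaining the metadata

    def total(self) -> float:
        # E_total = E_prep + E_use + E_meta
        return self.prep + self.use + self.meta


# Placeholder numbers for illustration only; not results from the paper.
strategies = {
    "no_metadata": EffortEstimate(prep=40.0, use=30.0, meta=0.0),
    "manual_semantic_models": EffortEstimate(prep=10.0, use=10.0, meta=35.0),
    "assisted_semantic_models": EffortEstimate(prep=10.0, use=10.0, meta=12.0),
}

# Pick the strategy with the smallest total effort.
best = min(strategies, key=lambda name: strategies[name].total())
print(best, strategies[best].total())
```

<p>Under these assumed numbers, richer metadata lowers preparation and usage effort, and automation lowers the cost of producing that metadata, which is the direction the objective rewards.</p></div>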
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related work</head><p>Semantic data management (SDM) is one way of formalizing the context and domain knowledge of data sources <ref type="bibr" target="#b9">[10]</ref>. SDM proposes linking metadata to knowledge graphs (KG) based on the Linked Data principles <ref type="bibr" target="#b10">[11]</ref> to give more meaning to the data in the lake by establishing an additional semantic layer between the data and the knowledge layer <ref type="bibr" target="#b11">[12]</ref>. A semantic layer can be used not only for data management but also to address the challenge of integrating data from heterogeneous sources <ref type="bibr" target="#b12">[13]</ref>.</p><p>Semantic data lakes store and manage these serialized semantics across data sources. They are a specialization of traditional data lakes, extending their capabilities with a semantic layer that enriches and connects the stored data. The semantic data lake explicitly integrates semantic descriptions into its data management and governance capability <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>, where an ontology or KG serves as a universal data model, offering a conceptual representation of an organization's data assets. Figure <ref type="figure" target="#fig_0">1</ref> shows the four-layered data lake architecture we proposed in <ref type="bibr" target="#b15">[16]</ref>, in which metadata-related functions in particular are enriched with semantics. For example, a semantic labeling component in the ingestion layer adds semantic labels to the extracted metadata elements. The semantic information (labels, models, KG, etc.) is managed in the storage layer in an extended semantic metadata repository. To facilitate the usage and interpretation of data, the interaction layer has several additional components, e.g., for browsing the KG and semantic models, and editors for refining the semantic mappings and models. 
Figure <ref type="figure" target="#fig_1">2</ref> shows a particular instantiation of this architecture (see C2) and illustrates the technologies used in each of the four layers.</p><p>Data management for ML has been researched for at least ten years <ref type="bibr" target="#b16">[17]</ref>, and one of its subfields is known as MLOps <ref type="bibr" target="#b17">[18]</ref>. Hai et al. <ref type="bibr" target="#b2">[3]</ref> underline the importance of ML-driven metadata management and in-lake ML, i.e., supporting training and inference directly inside the data lake platform. Zhao et al. and Schlegel et al. <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b18">19]</ref> present metadata models for data lakes that capture not only descriptive but also analytical information about datasets and the analyses performed on them. MLSea <ref type="bibr" target="#b19">[20]</ref> is a resource consisting of MLSO, an ontology for modeling ML pipelines; MLST, a collection of taxonomies of ML-related concepts; and MLSea-KG, a KG containing ML datasets, pipelines, and scientific works from diverse sources. By leveraging semantic technologies, MLSea integrates ML datasets, experiments, software, and scientific works to improve search, explainability, and reproducibility.</p><p>Large Language Models (LLMs) are expected to have a major impact on the landscape of data utilization and exchange. LLMs have demonstrated remarkable capabilities in understanding, generating, and processing vast amounts of textual data <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b22">23]</ref>. Promising fields of LLM application are the integration of heterogeneous data sources in the sense of SDM <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b24">25]</ref> and automated machine learning (AutoML) <ref type="bibr" target="#b25">[26,</ref><ref type="bibr" target="#b26">27]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Research Questions</head><p>Closer human-machine and machine-machine collaboration has revolutionized the industrial landscape, leading to Industry 4.0 <ref type="bibr" target="#b27">[28]</ref>. This shift raises data management challenges that need to be addressed <ref type="bibr" target="#b4">[5]</ref>. The advantage of employing a data lake system lies in the centralized management of (meta-)data and analytics. Thus, all model artifacts and their associated datasets are accessible, registered, documented, and understandable by both humans and machines. The main goal of this research is to install such a prototype in the industrial chemistry context of the HIT, which leads to the following research questions and related hypotheses:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RQ1: What role does SDM play in improving the integration and usability of heterogeneous data generated in an industrial context, particularly facilitated within a semantic data lake? H1: SDM facilitates the integration of heterogeneous data sources and enhances data usability by providing a unified structure and enabling interoperability based on Linked Data principles.</head><p>To manage heterogeneous data it is important to have a clear and logical structure when presenting this information. This demands a common understanding across the data landscape, i.e., a lingua franca for data moderation <ref type="bibr" target="#b28">[29]</ref> based on the Linked Data principles.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RQ2: How can LLMs be utilized to identify and formalize the context of given datasets, creating a full semantic model? H2: LLMs automate substeps in semantic model creation, in particular semantic labeling.</head><p>Automating the semantic modeling task is complex, because creating semantic models entails deciphering the existing data source and establishing connections between data attributes and concepts drawn from a KG. Open questions remain on how to utilize the LLM for individual tasks along a pipeline or instead prepare the LLM for the entire task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RQ3: How can semantic descriptions of data sources, ML pipelines, and their context be used to enhance data analytics within the data lake? H3: Structured semantic knowledge about ML pipelines improves accuracy and efficiency of contemporary methods for automating ML workflows.</head><p>While demand is increasing, ML models are still often created manually, because the need for statistical and technical knowledge poses significant challenges for non-technical users <ref type="bibr" target="#b29">[30]</ref>. Current methods only assist with the substep of model creation <ref type="bibr" target="#b29">[30]</ref>, whereas data integration remains a major obstacle <ref type="bibr" target="#b30">[31]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Research Methods</head><p>Contribution 1 (C1): Systematic Literature Review. We systematically reviewed the literature of the last 20 years of research in the field of SDM, with a particular focus on semantic data lakes <ref type="bibr" target="#b9">[10]</ref>. The article gives an overview of semantics-based methods for data management, access, and integration and relates those findings to current semantic data lake proposals. Furthermore, we identified a gap in today's landscape between present data lakes, semantic technologies for data access, and the semantic modeling of heterogeneous datasets.</p><p>Contribution 2 (C2): The Semantic Data Reservoir (SEDAR) <ref type="bibr" target="#b12">[13]</ref> is an implementation to bridge this gap. SEDAR is a prototype (see Figure <ref type="figure" target="#fig_1">2</ref>) of a semantic data lake built on existing open-source technologies in the area of big data management. The implementation of SEDAR was inspired by the SDM pipeline (see Figure <ref type="figure" target="#fig_2">3</ref>). The pipeline models data at the schema level. After the schemas are extracted, the first phase is automated semantic labeling, because semantic labels are a prerequisite for automatically deriving a full semantic model. The derived model then undergoes semantic refinement, i.e., manual oversight to verify the automated outcomes. We then extended the pipeline and reinterpreted the storage phase conceptually, in the sense that we convert the resulting semantic model into RML mappings <ref type="bibr" target="#b32">[33]</ref> to be used for Ontology-based data access (OBDA) <ref type="bibr" target="#b33">[34]</ref>. OBDA allows for on-demand translation of queries against heterogeneous data sources directly in their original form without having to know how the data is organized physically, which is particularly attractive in data lake environments. 
Thus, SEDAR implements a polystore with a semantic query processing engine grounded in semantic models. The synergy between the automation platform at the HIT and SEDAR has been utilized in production and presented as original research at the ICPS'24 conference <ref type="bibr" target="#b34">[35]</ref>.</p><p>Contribution 3 (C3): Automated Semantic Labeling using LLMs. In a publication for the ESWC conference <ref type="bibr" target="#b24">[25]</ref>, we conduct experiments demonstrating the applicability of LLMs for semantic labeling and propose directions to address the challenges we discovered.</p><p>Contribution 4 (C4): Standardizing ML pipelines. Recently, we have continued to advance SEDAR towards supporting standard ML pipelines with higher degrees of automation <ref type="bibr" target="#b35">[36]</ref>.</p></div>
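<div xmlns="http://www.tei-c.org/ns/1.0"><p>The semantic labeling phase described above can be sketched roughly as follows, assuming a fixed set of candidate labels taken from a KG. This is a minimal illustration, not the procedure used in SEDAR or in the cited experiments: the function names, the prompt wording, the column name, and the ex: labels are all hypothetical, and the call to an actual LLM is deliberately left out.</p>

```python
from typing import Optional, Sequence


def build_labeling_prompt(column: str, samples: Sequence[str], labels: Sequence[str]) -> str:
    """Assemble a prompt asking a model to pick one label for a column."""
    lines = [
        "Assign exactly one semantic label to the column described below.",
        f"Column name: {column}",
        "Sample values: " + ", ".join(samples),
        "Candidate labels: " + ", ".join(labels),
        "Answer with the label only.",
    ]
    return "\n".join(lines)


def parse_label(answer: str, labels: Sequence[str]) -> Optional[str]:
    """Map a free-text answer back to a candidate label, or None if no match."""
    normalized = answer.strip().strip(".").lower()
    for label in labels:
        if label.lower() == normalized:
            return label
    return None  # unmatched answers are left for manual semantic refinement


# Hypothetical labels and column; a real setup would draw them from the KG and schema.
labels = ["ex:Viscosity", "ex:Temperature", "ex:CoatingThickness"]
prompt = build_labeling_prompt("visc_mPas", ["120.5", "98.1"], labels)
print(prompt)
print(parse_label(" ex:viscosity. ", labels))
```

<p>Restricting the accepted answers to the candidate list keeps the automated step checkable: anything the parser cannot match falls through to the manual refinement phase instead of entering the semantic model.</p></div>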
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Evaluation</head><p>This PhD is already at a later stage, so some research questions can already be addressed. Through C1, we have shown how SDM can help with the management of heterogeneous process data and surveyed the current state of the field to understand how other researchers implement particular SDM techniques. Open questions remain on how to convert these formal ideas into a concrete implementation. Through C2, we demonstrated to a wider audience how semantic processing can meet modern big data requirements. Therefore, we accept H1 on the basis of a comprehensive field survey and a practical demonstration of how the semantic layer of SEDAR enables more expressive data management, integration, and access. Through C3, we address the applicability of LLMs for the first steps in the semantic model creation process. The experiments demonstrate the feasibility of utilizing LLMs for semantic type detection with a fixed or limited set of labels derived from legacy KGs. The findings further suggest that LLMs can effectively perform semantic type detection even when presented with new, unfamiliar, or arbitrary domain ontologies, by leveraging their inherent knowledge and understanding of language as well as additional contextual information that may be provided alongside the ontology. Therefore, we accept the premise of H2. Through C4, we have been progressing towards standardizing ML pipelines. In the future, we plan to investigate how to fuse SDM techniques with existing work on automating ML. To this end, we want to propose a software system that allows users to reuse and generalize data analytics for arbitrary use cases. 
The goal is to answer RQ3 by incorporating structured semantic knowledge about previously conducted ML experiments, such as the MLSea KG <ref type="bibr" target="#b19">[20]</ref>, to improve the efficiency and accuracy of current automated ML methods. By addressing the more challenging preceding phases of any ML project, i.e., business &amp; data understanding, and especially data preparation &amp; integration <ref type="bibr" target="#b36">[37]</ref>, this research agenda aims to advance the state of the art.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion and Future Work</head><p>This doctoral research addresses the challenge of managing diverse data sources and integrating them semantically into common ML pipelines. So far, we have conducted a systematic literature review and presented SEDAR, an open-source data management platform. We then investigated the applicability of LLMs for semantic labeling and enhanced SEDAR to standardize ML pipelines by integrating principles from AutoML and MLOps. Because this PhD is already at a later stage, these contributions allowed us to answer two of the three research questions. The remaining phase will focus on integrating semantically standardized ML pipelines to improve the efficiency of automated ML methods.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Semantic data lake architecture from <ref type="bibr" target="#b9">[10]</ref> </figDesc><graphic coords="3,97.63,84.18,150.00,192.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: SEDAR architecture from <ref type="bibr" target="#b12">[13]</ref> </figDesc><graphic coords="3,347.64,85.66,149.99,189.36" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: SDM pipeline inspired by [32] and extended by Ontology-based data access (OBDA)</figDesc><graphic coords="5,151.80,84.19,291.60,53.20" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The author thanks Maribel Acosta and Christoph Quix for reviewing this article. This work has been sponsored by the German Federal Ministry of Education and Research in the funding program "Forschung an Fachhochschulen", project i²DACH (grant no. 13FH557KX0).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Data science: Accelerating innovation and discovery in chemical engineering</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A C</forename><surname>Beck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Carothers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">R</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pfaendtner</surname></persName>
		</author>
		<idno type="DOI">10.1002/aic.15192</idno>
		<ptr target="http://dx.doi.org/10.1002/aic.15192" />
	</analytic>
	<monogr>
		<title level="j">AIChE Journal</title>
		<imprint>
			<biblScope unit="volume">62</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Photochemistry with cyanines in the near infrared: A step to chemistry 4.0 technologies</title>
		<author>
			<persName><forename type="first">B</forename><surname>Strehmel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Schmitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cremanns</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Göttert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Chemistry-A European Journal</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="12855" to="12864" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Data lakes: A survey of functions and systems</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Koutras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jarke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE TKDE</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="12571" to="12590" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Data lake management: challenges and opportunities</title>
		<author>
			<persName><forename type="first">F</forename><surname>Nargesian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Miller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">Q</forename><surname>Pu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">C</forename><surname>Arocena</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. VLDB Endow</title>
				<meeting>VLDB Endow</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="1986" to="1989" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Data management in industry 4.0: State of the art and open challenges</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">P</forename><surname>Raptis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Conti</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2019.2929296</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="97052" to="97093" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Draining the data swamp: A similarity-based approach</title>
		<author>
			<persName><forename type="first">W</forename><surname>Brackenbury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mondal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Elmore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Franklin</surname></persName>
		</author>
		<idno type="DOI">10.1145/3209900.3209911</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA &apos;18</title>
				<meeting>the Workshop on Human-In-the-Loop Data Analytics, HILDA &apos;18<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Sawadogo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Darmont</surname></persName>
		</author>
		<title level="m">On data lake architectures and metadata management</title>
				<imprint>
			<publisher>JJIS</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Analysis-oriented metadata for data lakes</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th IDEAS, ACM</title>
				<meeting>the 25th IDEAS, ACM</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="194" to="203" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Executable knowledge graphs for machine learning: a bosch case of welding monitoring</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Soylu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kharlamov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="791" to="809" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A survey on semantic data management as intersection of ontology-based data access, semantic modeling and data lakes</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hoseini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Theissen-Lipp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.websem.2024.100819</idno>
		<ptr target="https://doi.org/10.1016/j.websem.2024.100819" />
	</analytic>
	<monogr>
		<title level="j">Journal of Web Semantics</title>
		<imprint>
			<biblScope unit="volume">81</biblScope>
			<biblScope unit="page">100819</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Linked data: The story so far</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Heath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Berners-Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Semantic services, interoperability and web applications: emerging concepts</title>
				<imprint>
			<publisher>IGI global</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="205" to="227" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Applying semantics to reduce the time to analytics within complex heterogeneous infrastructures</title>
		<author>
			<persName><forename type="first">A</forename><surname>Pomp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Paulus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kirmse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Kraus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Meisen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Technologies</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page">86</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">SEDAR: A semantic data reservoir for heterogeneous datasets</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hoseini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Shaker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 32nd ACM International Conference on Information and Knowledge Management</title>
				<meeting>the 32nd ACM International Conference on Information and Knowledge Management<address><addrLine>Birmingham, UK</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2023">October 21-25, 2023</date>
			<biblScope unit="page" from="5056" to="5060" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Using semantic technologies to manage a data lake: Data catalog, provenance and access control</title>
		<author>
			<persName><forename type="first">H</forename><surname>Dibowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Schmid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Svetashova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Henson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tran</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Scalable Semantic Web Knowledge Base Systems Workshop</title>
		<title level="s">CEUR WS</title>
		<meeting>Scalable Semantic Web Knowledge Base Systems Workshop</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">2757</biblScope>
			<biblScope unit="page" from="65" to="80" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Towards multimodal knowledge graphs for data spaces</title>
		<author>
			<persName><forename type="first">A</forename><surname>Usmani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Breslin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Curry</surname></persName>
		</author>
		<idno type="DOI">10.1145/3543873.3587665</idno>
	</analytic>
	<monogr>
		<title level="m">Companion Proceedings of the ACM Web Conference 2023</title>
				<meeting><address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1494" to="1499" />
		</imprint>
	</monogr>
	<note>WWW &apos;23 Companion</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hai</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-63962-8_7-1</idno>
		<title level="m">Data lake, in: Encyclopedia of Big Data Technologies</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Data management for machine learning: A survey</title>
		<author>
			<persName><forename type="first">C</forename><surname>Chai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Niu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE TKDE</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="4646" to="4667" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">What Is MLOps?</title>
		<author>
			<persName><forename type="first">S</forename><surname>Alla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Adari</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>Apress</publisher>
			<pubPlace>Berkeley, CA</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Extracting provenance of machine learning experiment pipeline artifacts</title>
		<author>
			<persName><forename type="first">M</forename><surname>Schlegel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sattler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">27th ADBIS Conference</title>
				<meeting><address><addrLine>Barcelona, Spain</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">13985</biblScope>
			<biblScope unit="page" from="238" to="251" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">MLSea: A semantic layer for discoverable machine learning</title>
		<author>
			<persName><forename type="first">I</forename><surname>Dasoulas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-60635-9_11</idno>
		<ptr target="https://doi.org/10.1007/978-3-031-60635-9_11" />
	</analytic>
	<monogr>
		<title level="m">The Semantic Web - 21st International Conference, ESWC 2024</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">A</forename><surname>Meroño-Peñuela</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Troncy</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Hartig</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Acosta</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Alam</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Paulheim</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Lisena</surname></persName>
		</editor>
		<meeting><address><addrLine>Hersonissos, Crete, Greece</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">May 26-30, 2024</date>
			<biblScope unit="volume">14665</biblScope>
			<biblScope unit="page" from="178" to="198" />
		</imprint>
	</monogr>
	<note>Proceedings, Part II</note>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Evaluating the logical reasoning ability of ChatGPT and GPT-4</title>
		<author>
			<persName><forename type="first">H</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Teng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.03439</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Performance of large language models in a computer science degree program</title>
		<author>
			<persName><forename type="first">T</forename><surname>Krüger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gref</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Artificial Intelligence. ECAI 2023 International Workshops</title>
		<imprint>
			<publisher>Springer Nature Switzerland</publisher>
			<pubPlace>Cham</pubPlace>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="409" to="424" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">A survey on evaluation of large language models</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xie</surname></persName>
		</author>
		<idno type="DOI">10.1145/3641289</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Intell. Syst. Technol</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note>Just Accepted</note>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Korini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2306.00745</idno>
		<title level="m">Column type annotation using ChatGPT</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Towards LLM-augmented creation of semantic models for dataspaces</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hoseini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Burgdorf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Paulus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Meisen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pomp</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Second International Workshop on Semantics in Dataspaces (SDS 2024) co-located with the 21st Extended Semantic Web Conference (ESWC 2024)</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">J</forename><surname>Theissen-Lipp</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Colpaert</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Sowe</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Curry</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Decker</surname></persName>
		</editor>
		<meeting>the Second International Workshop on Semantics in Dataspaces (SDS 2024) co-located with the 21st Extended Semantic Web Conference (ESWC 2024)<address><addrLine>Hersonissos, Greece</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2024-05-26">May 26, 2024</date>
			<biblScope unit="volume">3705</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Hassan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.13657</idno>
		<title level="m">ChatGPT as your personal data scientist</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Large language models for automated data science: Introducing CAAFE for context-aware automated feature engineering</title>
		<author>
			<persName><forename type="first">N</forename><surname>Hollmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Ustundag</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Cevikcan</surname></persName>
		</author>
		<title level="m">Industry 4.0: Managing the Digital Transformation</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Semantic Integration and Interoperability</title>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-93975-5_12</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
			<publisher>Springer International Publishing</publisher>
			<biblScope unit="page" from="195" to="210" />
			<pubPlace>Cham</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">AutoML to date and beyond: Challenges and opportunities</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Karmaker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Comput. Surv</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Amalur: Data integration meets machine learning</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bozzon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Recent advances and future challenges of semantic modeling</title>
		<author>
			<persName><forename type="first">A</forename><surname>Paulus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Burgdorf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Pomp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Meisen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 15th International Conference on Semantic Computing (ICSC)</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="70" to="75" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">RML: A generic language for integrated RDF mappings of heterogeneous data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Dimou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vander Sande</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Colpaert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Verborgh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mannens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van De Walle</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">LDOW</title>
		<imprint>
			<biblScope unit="volume">1184</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Ontology-based data access: A survey</title>
		<author>
			<persName><forename type="first">G</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Calvanese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kontchakov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lembo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Poggi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Rosati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zakharyaschev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Joint Conferences on Artificial Intelligence</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Coatings intelligence: Data-driven automation for Chemistry 4.0</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hoseini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 7th (ICPS)</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
	<note>In-press</note>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Enhancing machine learning capabilities in data lakes with AutoML and LLMs</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hoseini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ibbels</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Quix</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Conference on Advances in Databases and Information Systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note>Accepted</note>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">CRISP-DM: Towards a standard process model for data mining</title>
		<author>
			<persName><forename type="first">R</forename><surname>Wirth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hipp</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th international conference on the practical applications of knowledge discovery and data mining</title>
				<meeting>the 4th international conference on the practical applications of knowledge discovery and data mining<address><addrLine>Manchester</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="29" to="39" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
