<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Hersonissos, Greece</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Towards LLM-augmented Creation of Semantic Models for Dataspaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sayed Hoseini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Burgdorf</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Paulus</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobias Meisen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph Quix</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>André Pomp</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer FIT</institution>
          ,
          <addr-line>St. Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Hochschule Niederrhein</institution>
          ,
          <addr-line>Krefeld</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for Technologies and Management of Digital Transformation, University of Wuppertal</institution>
          ,
          <addr-line>Wuppertal</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Dataspaces aim to enable smooth and reliable data exchange between different organizations. They have gained increasing attention in Europe following the enactment of the European Data Governance Act. This legislation emphasizes trust, accessibility, and shared dataspaces, which require semantic interoperability grounded in the FAIR principles. Although semantic descriptions in the form of semantic models and ontologies are integral to dataspaces, their full potential remains underutilized. Meaningful metadata, including contextual information, enhances data usability, but manually creating semantic models can be challenging. Large Language Models (LLMs) offer a new way to utilize data in dataspaces. Their advanced natural language processing capabilities enable context-aware data processing and semantic understanding. This paper presents initial experiments on customizing and optimizing LLMs for semantic labeling and modeling tasks. The contributions of this work include research questions for future investigations, early experiments demonstrating the applicability of LLMs for semantic labeling, and proposed directions to address discovered challenges.</p>
      </abstract>
      <kwd-group>
        <kwd>Dataspace</kwd>
        <kwd>Semantic Modeling</kwd>
        <kwd>LLMs</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The European Data Governance Act [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] outlines definitions and objectives aimed at bolstering
trust, broadening data accessibility, and promoting shared dataspaces. Its impact extends across
various data consumers and providers from academia as well as businesses. Efficient data
sharing within dataspaces necessitates semantic interoperability as an essential design principle,
grounded in the required adherence to the FAIR principles [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Accordingly, the utilization of semantic
descriptions and ontologies is already part of many dataspaces, but their potential is far from
being fully realized in actual implementations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. An example of this is semantic interoperability
and integration, a key aspect of dataspaces that requires aggregating and integrating large
amounts of heterogeneous data from different sources. Managing such data is challenging, not
only due to the variety of data formats, such as XML, CSV, JSON, relational data, and graph
data, but also because data is often distributed across different departments within an organization,
under different governance regimes and data models. A clear and logical
structure of information fosters a common understanding in dataspaces, i.e., a lingua
franca for data moderation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] based on the Linked Data principles.
      </p>
      <p>
        Meaningful metadata is crucial for enhancing data usability, particularly for users with limited
domain knowledge or those unfamiliar with a dataset. Annotating raw data from heterogeneous
data sources with semantically rich models enhances data interpretability and usability [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
This type of semantic data expands beyond typical extractable metadata, such as schema, data
types, sizes, and formats, to include contextual information that is not inherent to the specific
data source. The field of Semantic Data Management (SDM) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] aims to represent the metadata
about heterogeneous data sources in the form of ontologies or knowledge graphs (KG) serialized
in a language of the Semantic Web. Hence, the goal is to establish an additional layer between the
data and the knowledge layer [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This is highly relevant for dataspaces because they integrate
data from various systems and platforms, which requires data to be interoperable and seamlessly
exchangeable between systems. In order to implement SDM in practice, conceptualizations in
the form of KGs and/or ontologies [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and a mapping between concepts and data items are
required. Semantic models provide these mappings from single datasets to a common data model
to represent data consistently across different applications in a way that is understandable and
interpretable by both humans and machines [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        With companies increasingly acknowledging the importance of data for their business
operations, semantic descriptions are often integrated into data management and governance
strategies [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ], where an ontology or KG serves as a conceptual representation of an
organization’s data assets. A data source that is semantically well-annotated can be identified and
interpreted by leveraging conceptual representations of the data and by comprehending the
provided context information stored in the model. However, a huge initial overhead, coming
from the time-consuming manual process of creating meaningful semantic descriptions for data
sources, hampers the widespread adoption of SDM in practice [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Creating semantic models
entails deciphering the existing data source, consulting appropriate conceptualizations, and
establishing connections between data attributes and concepts provided by the conceptualization.
      </p>
      <p>
        Automating this task can be challenging and complex. Futia et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] present a method
based on graph neural networks covering the process of semantic modeling.
However, the model can only optimize semantic models for which historical training data exists.
Xu et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] train a cross-modal network to learn semantic features between data sources and
semantic models. They admit that the method has shortcomings in dynamically augmenting
the semantic models to cover concepts that are not part of the original training data. Moreover,
challenges remain with detecting and correcting potentially incorrect attribute types if a source
attribute has more than one attribute type, and distinguishing similar attributes with the same
entities and semantic types.
      </p>
      <p>
        Following the rise of Large Language Models (LLMs), one can expect to see a major impact on
the landscape of data utilization and exchange within dataspaces. LLMs, such as OpenAI’s
GPT-3.5 and GPT-4, have demonstrated remarkable capabilities in understanding, generating, and
processing vast amounts of textual data [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ]. Their abilities in natural language processing
enable advanced semantic understanding and context-aware data processing within dataspaces.
A promising field of LLM application is the integration of heterogeneous data sources stored in
a dataspace.
      </p>
      <p>[Figure 1: Semantic models as a projection from a shared conceptualization (the schema.org ontology) onto two heterogeneous movie datasets, a table and a JSON document. The figure distinguishes ontology classes, semantic model classes, properties, literals, mappings, and newly added properties such as :ticketsSold and :screeningHall.]</p>
      <p>In this article, we highlight some initial experiments in this direction to examine the question
of how such general-purpose AI systems can be customized and optimized for data integration
tasks in the sense of SDM. In particular, we make the following contributions:
• A set of research questions to be answered in future research endeavors
• Early experiments to illustrate the applicability of LLMs to the task of semantic labeling
• Potential future research directions to address the identified challenges with the
applicability of LLMs to the tasks of semantic labeling and modeling.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Semantic Data Management</title>
      <p>Figure 1 illustrates the basic idea of semantic models. The raw datasets in the dataspace are
represented at the bottom; they can be in different formats and structures, such as tabular data
or hierarchical JSON data, but have partially overlapping content. The semantic model is a
projection from a shared conceptualization onto the different datasets. It utilizes relevant entities
and relationships of the conceptualization, in this case, the schema.org ontology (prefix schema:),
to formalize the context information of the dataset. An essential part of each semantic model
is the set of mappings, indicated as dotted lines, which link attributes in the datasets to classes in
the semantic model using properties of these classes. These elementary mappings are referred
to as semantic labels. The semantic model captures the precise meaning of the dataset, explicitly
encoding the semantic types and relationships among its attributes within the graph.</p>
      <p>[Figure 2: The semantic model creation process: schema analysis, semantic labeling, semantic modeling, semantic refinement, and storage/usage, with semantic type detection and semantic relation inference as the automated steps.]</p>
      <p>
        Following the definition of semantic labeling by Pham et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], a semantic label is an
annotation of a dataset attribute by a tuple consisting of a class (subject) and a property
(predicate). In this work, a semantic label is represented as a triple: (subject, predicate, schema
attribute). For example, the semantic label of the table’s column ’Title’ is constructed through
the subject ’schema:Movie’ and the predicate ’schema:title’ modeling the relationship between
them. This connects the table’s content to the attribute ’titel’ in the JSON object, indicating
an entry point for data integration between the two (heterogeneous) datasets. Moreover, the
semantic model doesn’t merely rely on a static conceptualization; it can also introduce novel
classes and properties [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The necessity for evolution becomes apparent when users contribute
datasets containing concepts and relationships not yet included in the conceptualization. The
semantic label for the JSON key ’verkaeufe’ is represented as the triple: (schema:ScreeningEvent,
:ticketsSold, ’verkaeufe’). Here, the predicate is a novel property for that specific domain,
which is not (yet) present in this form in the general-purpose schema.org ontology. This new
knowledge can be systematically integrated, thus perpetually advancing the conceptualization
layer [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Semantic models complement extractable metadata (such as data types, sizes, formats,
etc.) to convey context information that may not be inherent to the dataset at hand, for instance,
a starting date of a ’schema:ScreeningEvent’ as shown in Figure 1.
      </p>
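      <p>To make the triple representation concrete, the following Python sketch (ours, not from any cited system; names and values are taken from the Figure 1 example) encodes semantic labels as (subject, predicate, attribute) tuples and uses shared (subject, predicate) pairs to find integration points between two heterogeneous datasets:</p>
      <preformat>
```python
# Illustrative sketch: semantic labels as (subject class, predicate, source
# attribute) triples, plus a lookup that finds integration points between
# heterogeneous datasets that share a label.

# Labels for the tabular dataset and the JSON dataset from Figure 1.
table_labels = [
    ("schema:Movie", "schema:title", "Title"),
    ("schema:Movie", "schema:duration", "Length"),
]
json_labels = [
    ("schema:Movie", "schema:title", "titel"),
    ("schema:ScreeningEvent", ":ticketsSold", "verkaeufe"),  # novel property
]

def integration_points(labels_a, labels_b):
    """Attributes annotated with the same (subject, predicate) pair in both
    datasets are candidate entry points for data integration."""
    index = {(s, p): attr for s, p, attr in labels_a}
    return [
        (index[(s, p)], attr)
        for s, p, attr in labels_b
        if (s, p) in index
    ]
```
      </preformat>
      <p>Applied to the two label sets above, the shared (schema:Movie, schema:title) pair links the table column ’Title’ to the JSON key ’titel’.</p>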
      <p>
        The underlying process of semantic model creation has been formalized by [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] and is
visualized in Figure 2, starting with the identification of the schema of the dataset, followed by
a semantic labeling phase, in which basic concepts are assigned to the identified attributes.
Automated semantic labeling, referred to as Semantic Type Detection [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], is the process of
identifying these labels using algorithms and machine learning models. Subsequent to the semantic
labeling, the semantic modeling phase builds the remaining semantic model by formalizing
the context information. During semantic modeling, semantic relation inference [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] refers to
the process of automated identification of relationships and additional concepts, resulting in
a generated semantic model. All automation is followed by the semantic refinement phase,
where the modeler is involved in the modeling process to correct any errors present before
the semantic model is finalized and stored for documentation purposes. In practice, semantic
relation inference depends heavily on accurate semantic labels [
        <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
        ], which underscores the
importance of semantic type detection in fully automated systems to induce as few errors as
possible for the modeler to correct.
      </p>
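      <p>The phases of Figure 2 can be sketched as a simple pipeline. The following Python outline is purely illustrative: each phase is reduced to a minimal placeholder so the control flow is visible, and a real system would plug in actual detectors and inference models.</p>
      <preformat>
```python
# Hypothetical sketch of the semantic model creation process (Figure 2).

def analyze_schema(dataset):
    # Schema analysis: extract the attribute names of the first record.
    return list(dataset[0].keys())

def detect_semantic_types(attributes, ontology):
    # Semantic labeling: map each attribute to an ontology concept, if known.
    return {a: ontology.get(a.lower()) for a in attributes}

def infer_semantic_relations(labels):
    # Semantic relation inference: here, simply group attributes by class.
    model = {}
    for attr, label in labels.items():
        if label:
            cls, prop = label
            model.setdefault(cls, []).append((prop, attr))
    return model

def create_semantic_model(dataset, ontology, refine=lambda m: m):
    model = infer_semantic_relations(
        detect_semantic_types(analyze_schema(dataset), ontology))
    # Semantic refinement: the human modeler corrects errors before storage.
    return refine(model)
```
      </preformat>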
      <sec id="sec-2-1">
        <title>2.1. Research Questions</title>
        <p>With an introduction to the field of SDM at hand, we move on to formulate the research questions
that have motivated this work. The goal is to leverage LLMs to improve the automation of
semantic model creation for large quantities of heterogeneous data sources that share a common
domain in a dataspace.</p>
        <p>• RQ1: How to utilize LLMs to perform semantic type detection with a fixed set of labels
coming from a pre-selected conceptualization (such as WikiData, or schema.org)?
• RQ2: How to utilize LLMs to perform semantic type detection against an arbitrary domain
ontology, i.e., with no labeled dataset or zero-shot classification?
• RQ3: How can LLMs be utilized to identify and formalize the context of a given dataset,
creating a full semantic model?</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>First, in the scope of this contribution, we consider only works after the launch of ChatGPT in
November 2022. While one can make a strong case that (large) language models have existed
before this date, we decided to draw the line there due to the massive performance increase that
quite suddenly became accessible to the public. Most of the works found in this limited range are
pre-prints that have not yet been peer-reviewed and published in scientific journals. To the best
of our knowledge, so far, except for the below-mentioned approaches, there seems to be no
further LLM-based efforts on the integration of the semantics of several heterogeneous data
sources modeled directly in a language of the Semantic Web in order to generate semantic
models in the sense of Figure 1.</p>
      <p>
        Korini et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] are among the first to report the application of LLMs for the Column Type
Annotation (CTA) task. CTA is a schema-level annotation task that represents a simplified
form of our interpretation of semantic labeling (no predicate) as it aims to map the underlying
table schema to a conceptualization. They view CTA as a multi-class classification problem and
evaluate different prompt designs. One important finding they highlight is that ChatGPT
tends to ignore the instruction to use terms from the label space and instead answers using
different terms. This is a known drawback of contemporary LLMs, the hallucination
problem [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Their proposed solution to this challenge involves first determining the set of
classes of the entities described in the table and, depending on this set, asking ChatGPT to
annotate columns using only the relevant subset of the overall vocabulary. The evaluation of
the approach reports competitive performance when evaluated against the more traditional
models which are mostly directly fine-tuned for the CTA task and require significant amounts
of task-specific training data [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
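      <p>The two-step strategy can be sketched as follows; this is our paraphrase of the idea, not Korini et al.’s code, and the <monospace>llm</monospace> parameter stands in for any prompt-to-text completion endpoint:</p>
      <preformat>
```python
# Sketch of two-step CTA prompting: first determine the entity classes in the
# table, then restrict the annotation prompt to the matching label subset.
def annotate_columns(table_text, label_space, llm):
    """`llm` is any callable mapping a prompt string to a response string."""
    classes = llm(
        "Which classes of entities does this table describe? "
        f"Answer with comma-separated class names.\n{table_text}")
    # Keep only labels mentioning one of the detected classes.
    relevant = [l for l in label_space
                if any(c.strip().lower() in l.lower()
                       for c in classes.split(","))]
    return llm(
        "Annotate each column using ONLY these labels: "
        f"{', '.join(relevant or label_space)}.\n{table_text}")
```
      </preformat>
      <p>Restricting the label space in the second prompt is what counteracts the tendency, noted above, to answer with terms outside the given vocabulary.</p>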
      <p>
        A dataspace stores heterogeneous data of any type, of which the majority may be in some form
relational, but an important fraction may be more complex, e.g., nested or even unstructured
(video, text, audio, ...). We found several works [
        <xref ref-type="bibr" rid="ref28 ref29 ref30">28, 29, 30</xref>
        ] that aim at customizing LLMs via
fine-tuning for tables in particular. The goal is to solve the challenge of table understanding
which is closely related to understanding the semantics of a data source as it includes the CTA
task for example. Usmani et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] highlight the importance of multi-modal knowledge graphs
for dataspaces, present a review of the current state, and propose an ontology towards further
development. Furthermore, since many important use cases for datasets involve numerical
data, solid numerical reasoning skills are essential [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. Several works suggest that
modern LLMs excel in simple problem settings, but they fall short of human expert performance
in problems requiring numerical reasoning over long contexts. As the complexity of challenging
mathematical problems increases, LLMs currently exhibit suboptimal performance [
        <xref ref-type="bibr" rid="ref17 ref32">17, 32</xref>
        ].
      </p>
      <p>
        Several works investigate the use of LLMs for Knowledge Graph
Engineering [
        <xref ref-type="bibr" rid="ref33 ref34 ref35">33, 34, 35</xref>
        ]. Here the goal is to utilize the LLM for common tasks related to KGs. For example,
Meyer et al. [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] investigate SPARQL query generation as well as knowledge extraction from
fact sheets and KG exploration among others. They present several prompts that essentially test
how to pose questions in natural language executed against serialized KGs. The experiments
show that LLMs can return syntactically correct SPARQL queries and even entire serialized RDF
models with the desired form based on a task formulated in natural language. Further, LLMs can
find relationships in KGs and answer basic questions correctly, e.g., "Are there any connections
between US and UK?". They report major performance differences between GPT-3.5 and GPT-4
in favor of the latter. One particular prompt aims at extracting knowledge from tables and
converting it into a serialized KG, which is very close to the idea of semantic modeling. The
experiment illustrates several problems with the output of contemporary LLMs:
• A tendency to prioritize the usage of schema.org vocabulary. While this works well for
well-known entities and properties, the LLMs invent reasonable, but non-existent classes
and properties (in the schema.org namespace) for concepts and relations that are too
specific.
• Non-deterministic output: For multiple runs of the same prompt to the LLM, the output
varies. For instance, while in three out of four runs a printer manufacturer was represented
as a separate typed entity, in one run it was only expressed as a string literal.
• Invention of non-existent properties, prefixes, and classes: If the LLM cannot identify
a fully matching class for a concept or a relation, URIs for those elements are invented
for the raw RDF output. While this would be acceptable in a dedicated namespace, the classes
and properties are placed in existing namespaces, such as schema.org, resulting in the
generated URIs not being resolvable.
• Non-functional queries: SPARQL queries generated by ChatGPT-3 did not return the
expected results when executed against a knowledge graph, despite being syntactically
correct. All queries needed slight modifications to work, such as correcting the referencing
of non-existent classes.
      </p>
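      <p>A first line of defense against the invented-URI problem observed above is to validate every LLM-generated term against the vocabulary the ontology actually defines. The following is a minimal sketch of ours (here the vocabulary is just a Python set; a real pipeline might instead dereference the URIs or load the vocabulary with rdflib):</p>
      <preformat>
```python
# Partition LLM-generated RDF terms into those defined by the ontology and
# those that were (plausibly) invented and must not be emitted as-is.
def split_valid_invented(generated_terms, known_vocabulary):
    valid, invented = [], []
    for term in generated_terms:
        (valid if term in known_vocabulary else invented).append(term)
    return valid, invented
```
      </preformat>
      <p>For example, against a small schema.org subset, a generated term like a hypothetical ’schema:printerManufacturer’ would be flagged as invented rather than silently accepted.</p>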
      <p>To conclude, the results obtained by Meyer et al. show that the problems commonly observed
with LLMs also limit their ability to conduct tasks in the semantic domain. It is therefore
not possible to use LLMs out-of-the-box for semantic relation inference. Since semantic type
detection is simpler than semantic relation inference and also relies heavily on obtaining context,
this area of automation is investigated more closely.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Semantic Type Detection with LLMs</title>
      <p>
        To investigate the suitability of LLMs for semantic type detection, we conducted four exemplary
experiments using ChatGPT 4.0. To this end, we manually selected three datasets from the
VC-SLAM corpus [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], which contains datasets in combination with
their data descriptions and semantic models created by human modelers, as evaluation datasets. Dataset 1 (VC-SLAM 0001)
has seven labels that are close to natural language, Dataset 2 (VC-SLAM 0018) has 21 labels that
are mostly human readable, and Dataset 3 (VC-SLAM 0068) consists of 24 labels, some of which
are abbreviations. The experiments are briefly described in the following and the results are
given in Table 1.
      </p>
      <sec id="sec-4-1">
        <title>4.1. Experiments</title>
        <p>Experiment 1 - Mapping to VC-SLAM: This experiment explores ChatGPT’s ability to map
dataset labels to the corresponding concepts within the VC-SLAM ontology, provided solely in
Turtle (TTL) format, without any additional contextual information. This setup aims to assess
the base capability of ChatGPT to utilize the ontology’s structure and content for semantic type
detection. The task involves presenting dataset labels to ChatGPT and instructing it to identify
the most fitting ontology concept for each label.</p>
        <p>Prompt Experiment 1
• You are a tool for semantic type detection. I will provide you an owl ontology that
consists of all the concepts you know. This ontology is called VC-SLAM. Later I
will additionally provide the labels of three data sets. For each label you return
the fitting concept from the ontology.
• The first data set consists of the following labels: type, longitude, address, latitude,
tvm_identifier, pay_by_credit_card, pay_by_cash</p>
        <p>Please return the results in the following form: label,concept</p>
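        <p>Responses in the requested label,concept form can be scored automatically. The helper below is a hypothetical sketch of how such answers are evaluated against gold-standard semantic labels (the concept names in the example are made up for illustration):</p>
        <preformat>
```python
# Parse "label,concept" response lines and compute the fraction of labels
# mapped to the gold-standard concept (the accuracy reported in Table 1).
def score_response(response, gold):
    predictions = {}
    for line in response.strip().splitlines():
        if "," in line:
            label, concept = line.split(",", 1)
            predictions[label.strip()] = concept.strip()
    correct = sum(1 for label, concept in gold.items()
                  if predictions.get(label) == concept)
    return correct / len(gold)
```
      </preformat>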
        <p>Experiment 2 - VC-SLAM with Documentation: In this experiment, the methodology is
similar to the Mapping to VC-SLAM experiment, but it includes comprehensive documentation
of the VC-SLAM ontology and datasets. This tests the hypothesis that additional contextual
information enhances the accuracy of semantic type detection.</p>
        <p>Experiment 3 - schema.org Ontology: This experiment shifts the focus to a general-purpose
ontology to compare ChatGPT’s adaptability and performance with a different ontology
structure. It provides insights into the model’s versatility and the challenges
of applying a broad ontology like schema.org to a specific dataset, highlighting differences in
specificity and applicability.</p>
        <p>Experiment 4 - Simplified VC-SLAM: The final experiment aims to investigate the impact
of ontology complexity on semantic type detection accuracy. By reducing the VC-SLAM
ontology to only include concept names and their descriptions without further relations, this
experiment seeks to determine whether a simplified ontology framework would enhance
ChatGPT’s mapping accuracy due to decreased complexity and ambiguity.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Number of correctly mapped labels and accuracy per experiment and dataset.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Experiment</th><th>Dataset 1 (7 labels)</th><th>Dataset 2 (21 labels)</th><th>Dataset 3 (24 labels)</th></tr>
            </thead>
            <tbody>
              <tr><td>Mapping to VC-SLAM</td><td>4 (0.571)</td><td>7 (0.333)</td><td>3 (0.125)</td></tr>
              <tr><td>VC-SLAM with Documentation</td><td>5 (0.714)</td><td>13 (0.619)</td><td>5 (0.208)</td></tr>
              <tr><td>schema.org Ontology</td><td>7 (1.000)</td><td>16 (0.762)</td><td>11 (0.458)</td></tr>
              <tr><td>Simplified VC-SLAM</td><td>4 (0.571)</td><td>9 (0.429)</td><td>12 (0.500)</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>The outcomes of the experiments are measured across the three different datasets. Experiment
1 - Mapping to VC-SLAM reveals a varying performance with a 57.1% accuracy rate for the
first dataset (4 out of 7 labels correctly mapped), 33.3% for the second (7 out of 21), and a notably
lower 12.5% for the third (3 out of 24). These results indicate that ChatGPT can achieve some
level of correct mapping based on the ontology structure alone, although accuracy drops sharply
as the labels become less human-readable.</p>
        <p>Experiment 2 - VC-SLAM with Documentation showed improved performance with a
71.4% accuracy rate for the first dataset (5 out of 7 labels correctly mapped), 61.9% for the second
(13 out of 21), and 20.8% for the third (5 out of 24). These results highlight the significant impact
of extra contextual information in enhancing ChatGPT’s semantic type detection capabilities,
leading to more accurate mappings.</p>
        <p>Experiment 3 - schema.org Ontology further demonstrated ChatGPT’s adaptability with
impressive accuracies: 100% for the first dataset (7 out of 7 labels correctly mapped), 76.2% for
the second (16 out of 21), and 45.8% for the third (11 out of 24). Reasons for this may be that
ChatGPT is better at handling ontologies that were already part of the training data, or that the
descriptions in schema.org are more meaningful than those of the VC-SLAM ontology.</p>
        <p>Experiment 4 - Simplified VC-SLAM yielded mixed results: 57.1% accuracy for the first
dataset (4 out of 7 labels correctly mapped), 42.9% for the second (9 out of 21), and 50% for
the third (12 out of 24). These outcomes suggest that simplification of the ontology does not
necessarily lead to improved performance across all datasets, reflecting the complex balance
between ontology complexity and the effectiveness of semantic type detection with ChatGPT.</p>
        <p>During these experiments, several key findings emerged. First, the availability of additional
context, such as ontology documentation, significantly improves ChatGPT’s ability to
accurately map dataset labels to ontology concepts, underscoring the importance of rich contextual
information for semantic type detection tasks. Second, the experiments revealed ChatGPT’s
adaptability to different ontologies, with performance variations highlighting the model’s
capability to handle both specialized and general-purpose ontologies. Lastly, simplifying
the ontology structure enhanced semantic type detection accuracy only in some cases,
suggesting that the complexity of an ontology can affect the efficiency and effectiveness of label
mapping. These findings contribute valuable insights into the potential of leveraging LLMs for
semantic type detection, indicating promising pathways for automating and refining the data
categorization process. The experiments underscore the significance of ontology design and
contextual information in optimizing the performance of semantic type detection tasks using
AI models like ChatGPT.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Semantic Model Creation with LLMs</title>
      <p>Integrating LLMs into the process of semantic model creation allows automation algorithms
to profit from the advantages that these pre-trained models offer. Taking
the findings from both related work and our experiments into account, it can be deduced that
results obtained from LLMs need to be verified and checked before being applied in a semantic
model creation scenario within dataspaces. In the following, two approaches on how to utilize
LLM-generated output in automated semantic model creation are conceptualized.</p>
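      <p>One simple way to operationalize this verify-before-apply step is to triage LLM suggestions by whether their terms already exist in the conceptualization. The sketch below is ours; review workflows and the subsequent evolution of the conceptualization are deliberately omitted:</p>
      <preformat>
```python
# Accept an LLM-proposed semantic label only if its subject and predicate
# exist in the conceptualization; otherwise queue it for human review.
def triage_suggestions(suggestions, ontology_terms):
    accepted, needs_review = [], []
    for subject, predicate, attribute in suggestions:
        if subject in ontology_terms and predicate in ontology_terms:
            accepted.append((subject, predicate, attribute))
        else:
            # Possibly a hallucination, or genuinely new domain knowledge
            # that should evolve the conceptualization after human approval.
            needs_review.append((subject, predicate, attribute))
    return accepted, needs_review
```
      </preformat>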
      <sec id="sec-5-1">
        <title>5.1. Unifying KGs with LLMs for Semantic Modeling</title>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Pros and cons of LLMs and KGs as contrasted by Pan et al.</p>
          </caption>
          <table>
            <thead>
              <tr><th></th><th>LLM</th><th>KG</th></tr>
            </thead>
            <tbody>
              <tr><td>Pros</td><td>General knowledge; language processing; generalizability</td><td>Structural knowledge; accuracy; decisiveness; interpretability; domain-specific knowledge; evolving knowledge</td></tr>
              <tr><td>Cons</td><td>Implicit knowledge; hallucinations; indecisiveness; black-box; lacking domain-specific/new knowledge</td><td>Incompleteness; lacking language understanding; unseen facts</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
          Although the technologies for linked data and the Semantic Web have become more mature
in recent years, the amount of data considered in Semantic Web applications is far less than
in Big Data applications [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ]. Thus, scalability to large, heterogeneous data sets is a major
challenge for applying Semantic Web technologies in dataspaces for which LLMs can be a great
help. However, even though LLMs can effectively possess rich knowledge learned from massive
amounts of training data and benefit downstream tasks at the fine-tuning stage, as previously
described, they still have significant limitations due to the lack of external knowledge [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. In
contrast, KGs are structured knowledge models that explicitly store rich factual knowledge.
However, KGs are by nature difficult to construct and evolve, making it challenging to generate
new facts and represent unseen knowledge [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]. Therefore, it is reasonable to view KGs and
LLMs as two complementary technologies whose integration has the potential to produce
synergy, capitalizing on the strengths of each while mitigating their respective weaknesses.
        </p>
        <p>
          For a detailed discussion on research towards the unification of language models and KGs
we refer to the survey by Pan et al. [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ]. They contrast the pros and cons of (large) language
models vs KGs. Table 2 confirms the previous findings about the drawbacks of LLMs, namely
hallucination, a lack of domain-specific knowledge, indecisiveness, and a lack of interpretability.
Conversely, the automated construction of KGs is equally challenging, and current approaches
to KGs are inadequate in handling the incomplete and dynamically changing nature of
real-world KGs. Additionally, many current KG techniques are tailored to particular
tasks and are, therefore, not easy to generalize to broader applications. This suggests that KGs and
LLMs indeed complement each other and may synergize. Pan et al. further predict three
main directions for future research toward this goal: KG-enhanced LLMs, which incorporate
KGs during the pre-training and inference phases of LLMs to enhance understanding of the
knowledge learned by LLMs. Here we direct the interested reader to the survey by Hu et al. [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ].
Then there are LLM-augmented KGs, mentioned in a similar form by Meyer et al. [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] (see
Section 3). Ultimately, these directions may be integrated to produce Synergized LLMs + KGs,
in which LLMs and KGs play equal roles and work in a mutually beneficial way to facilitate
reasoning driven by both data and knowledge. This fusion may possibly address some of the
contemporary challenges discussed in Section 3 and represent one answer to the third research
question (RQ3) on how LLMs can enhance the semantic model creation process. Figure 3
illustrates the integration of the two directions and its stated goals. As the most basic semantic
unit, entities play a crucial role, and incorporating their knowledge into LLMs helps to improve
semantic understanding. In addition, there are also a large number of relational triples in the
knowledge graph, which can provide sufficient structured information to further improve the
semantic understanding. Since conventional LLMs trained on plain text data are not designed
to understand (graph-)structured data such as knowledge graphs, they might not fully grasp or
understand the information conveyed by the KG structure. This assumption is confirmed by
our experiments (see Section 4), since reducing the representation of ontologies to plain text
significantly improves the performance. This indicates that ChatGPT does not handle the graph
structure well. Synergized LLMs + KGs promise to understand the underlying graph
structure, which could improve the performance of KG technology, e.g., in discovering unseen
facts and in exploration.
        </p>
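        <p>A minimal sketch of such a plain-text reduction, assuming the ontology has already been extracted into concept names with short descriptions (the concepts shown are invented), could look as follows.</p>

```python
# Illustrative sketch: flatten an ontology into plain text (concept names
# plus short descriptions) before prompting, mirroring the reduction that
# improved semantic type detection in the experiments. Concepts are made up.

concepts = [
    ("Vehicle", "A mobile machine that transports people or cargo."),
    ("Route", "A planned path between two geographic points."),
    ("Timestamp", "A point in time at which an observation was made."),
]

def ontology_to_prompt_text(concepts):
    """Flatten (name, description) pairs into one prompt-friendly block."""
    lines = [f"- {name}: {description}" for name, description in concepts]
    return "Available concepts:\n" + "\n".join(lines)

prompt_context = ontology_to_prompt_text(concepts)
```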
        <p>
          Multimodal KGs are becoming increasingly important for dataspaces as they integrate
different modalities, including text, image, audio, and video data, into a single graph, allowing for a
comprehensive representation of complex data [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. As previously described, it is important for
semantic modeling to have solid numerical reasoning skills. Similar to KGs, regular datasets
like CSVs or JSONs can be viewed as (semi-)structured data that represent a further modality.
Therefore, effectively leveraging representations from multiple modalities, in particular tables
and spreadsheets, would be a significant milestone towards the unification of KGs and LLMs.</p>
        <p>
Finally, to remedy the problems with hallucinations and with updating the internal knowledge of
LLMs as real-world situations change, the incorporation of knowledge from KGs represents a
logical solution. Against hallucination, KGs can be leveraged as an external source to validate
or fact-check the output of an LLM. Editing the knowledge of an LLM live without re-training
is an attractive idea. However, current methods have severe problems, and further research is
required [
          <xref ref-type="bibr" rid="ref42 ref43">42, 43</xref>
          ]. A potential solution to this problem is presented in the next section.
        </p>
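        <p>A minimal sketch of such KG-based fact-checking, with the KG modeled as a plain set of triples (all IRIs invented), is shown below; a production system would query an actual triple store rather than an in-memory set.</p>

```python
# Illustrative sketch: fact-check an LLM-asserted statement against a KG.
# The KG is modeled as a plain set of (subject, predicate, object) strings;
# a real dataspace would query a triple store instead. IRIs are made up.

kg_triples = {
    ("ex:Berlin", "ex:locatedIn", "ex:Germany"),
    ("ex:Germany", "ex:memberOf", "ex:EU"),
}

def fact_check(triple):
    """Accept an LLM-asserted triple only if the KG explicitly contains it."""
    return triple in kg_triples
```

Note that this simple membership test treats the KG as complete; in practice, a missing triple may reflect KG incompleteness rather than an LLM error, which is exactly the trade-off discussed above.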
      </sec>
      <sec id="sec-5-2">
        <title>5.2. LLM-Supported Interactive Semantic Model Design</title>
        <p>
          Since semantic model creation is usually performed inside a semantic modeling platform,
integrating LLMs into the semantic model creation following an interactive pattern requires the
surrounding platforms to offer additional functions. None of the existing semantic modeling
platforms, such as SAND [
          <xref ref-type="bibr" rid="ref44">44</xref>
          ], MantisTable [
          <xref ref-type="bibr" rid="ref45">45</xref>
          ] or PLASMA [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ], offer the ability to communicate
with a generative AI to refine semantic models. Future platforms to support manual semantic
model creation will likely integrate the interaction with an LLM as a central component of their
design. For example, a semantic model creation could be fitted into a session with an interactive
LLM. Any type of generative LLM can be used, however, using knowledge-enhanced LLMs (see
Section 5.1) helps to reduce the efects of unwanted phenomena, such as hallucinations.
        </p>
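        <p>The interactive session pattern described here can be sketched as a loop in which each user instruction is appended to the session and answered with an updated model; call_llm is a placeholder for a real LLM API, and the message structure is an illustrative assumption.</p>

```python
# Illustrative sketch of the interactive refinement loop: each instruction
# is appended to a session, and a (placeholder) LLM call returns an updated
# semantic model. call_llm stands in for a real generative-AI API.

def call_llm(session):
    """Placeholder for the LLM API; echoes a trivial 'updated' model."""
    last = session[-1]["content"]
    return f"model updated per: {last}"

def refine_model(model, instruction, session):
    """Append the user's instruction and return the LLM-updated model."""
    session.append({"role": "user", "content": instruction})
    updated = call_llm(session)  # LLM proposes an updated model
    session.append({"role": "assistant", "content": updated})
    return updated

session = [{"role": "system", "content": "You refine semantic models."}]
updated = refine_model("m0", "link temp_c to Temperature", session)
```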
        <p>Users input their desired changes using natural language, and the LLM alters the model
accordingly. This process requires two central features to be realized: First, users must be able
to formulate their changes using a prompt-like interface. Any identified shortcomings can be
expressed in text form, even using natural language, to interact with the system and improve
the usability of the semantic modeling platform for users with little or no previous knowledge
of semantic technologies. Additionally, the platform should provide a process for piping and
filtering LLM output to minimize the impact of known drawbacks, such as hallucinations.</p>
        <p>Figure 4 visualizes this process in which the modeler and the LLM serve as the interacting
participants of a communication. All interactions between both parties are conducted through
various services, such as the modeling platform and the LLM’s API. These services apply
modifications and transform the contained data to match the other side’s data model. For
example, when a semantic model generation is requested using an LLM, the current semantic
model is provided to the LLM, preferably using a pre-configured, session-based GPT specialized
in semantic model creation. The request is appended to the interactive session, resulting in an
updated model being generated by the LLM. The LLM’s extensive knowledge and advanced
capability to process natural speech input allows it to modify the semantic model based on
the modeler’s intentions, proposing a formalized solution to shortcomings such as syntactical
errors. The LLM-generated output undergoes post-processing to ensure presentability to the
modeler, particularly when generating large semantic models. The changes made by the LLM
in the last iteration are highlighted in the generated model, making them easy to
identify when the results are displayed in the modeling
platform. If the LLM generates corresponding textual output, it is parsed and attached to
the updated model using a special set of RDF properties. This enables the modeler to verify
the reasoning behind the modifications made to specific elements. Once the post-processing is
complete, the proposed semantic model is transferred back to the user and displayed.</p>
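        <p>Attaching the LLM's reasoning to model elements via a special set of RDF properties could, for instance, look as follows; the ex:llmRationale property and all identifiers are hypothetical, and the Turtle-like output is only a sketch of the serialization step.</p>

```python
# Illustrative sketch: attach the LLM's textual rationale to a model element
# via a hypothetical annotation property (ex:llmRationale), rendered as a
# Turtle-style line so the modeler can inspect the reasoning.

def annotate_with_rationale(element_iri, rationale):
    """Render one annotation triple as a Turtle-style line."""
    escaped = rationale.replace('"', '\\"')  # keep the literal well-formed
    return f'{element_iri} ex:llmRationale "{escaped}" .'

line = annotate_with_rationale(
    "ex:column_temp", "Mapped to Temperature because values are in degrees."
)
```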
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This article explores the applicability of modern LLMs for semantic data management in
dataspaces, in particular for the task of semantic model creation. Our objective is to address the
stated research questions and offer directions for future research on preparing LLMs for the
complex task of creating semantic models for vast amounts of heterogeneous data sources in
dataspaces.</p>
      <p>Regarding RQ1 and RQ2, the experiments in Section 4 demonstrate the feasibility of utilizing
LLMs for semantic type detection with a fixed or limited set of labels derived from legacy
knowledge graphs. LLMs show promise in achieving significant accuracy in semantic type
detection tasks, especially when additional contextual information or documentation is provided
alongside the ontology. In particular, Experiment 3, which used the schema.org ontology,
showcases the high adaptability and potential of LLMs to accurately map dataset labels to
ontology concepts with an accuracy reaching up to 100% for certain datasets. This indicates
that LLMs can serve as a powerful tool for semantic type detection. Experiment 4’s approach,
using a simplified version of the VC-SLAM ontology, offers insight into how LLMs might
tackle semantic type detection tasks when the ontology is minimized to basic concept names
and descriptions, achieving up to 57.1% accuracy in some cases. The findings suggest that
LLMs, including ChatGPT, can effectively engage in semantic type detection tasks even when
presented with new, unfamiliar, or arbitrary domain ontologies, by leveraging their inherent
understanding of language and context.</p>
      <p>
        Regarding RQ3, exploiting the vast knowledge and reasoning capabilities of LLMs to automate
semantic modeling is an attractive idea. However, significant research is still necessary to
integrate KGs with LLMs to produce synergy between these two complementary technologies
(see Section 5.1). LLMs do not navigate graphs or handle numerical data sets well. They may
suffer from hallucinations and cannot acquire domain-specific knowledge [
        <xref ref-type="bibr" rid="ref47">47</xref>
        ] easily.
      </p>
      <p>The LLM-supported interactive semantic model design (see Section 5.2) establishes a unique
way of generating semantic models, providing another possible answer to RQ3 on how LLMs
can enhance the semantic model creation process. However, it requires several additions to
today’s semantic modeling platforms. In theory, the creation of a semantic model can be a fully
immersive experience, where modifications can even be made through voice commands. These
modifications are then converted to prompts and interpreted by the natural language processing
capabilities of LLMs. The resulting changes are automatically visualized, effectively utilizing
the LLM as a semantic modeling system. While the presented results and concepts represent a
first approach to the topic, the stated research questions remain open to inspire future research
in this area.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work has been sponsored by the German Federal Ministry of Education and Research in
the funding program "Forschung an Fachhochschulen" (grant no. 13FH557KX0) and funding
program "Datenkompetenzzentren für die Wissenschaft" (grant no. 16DKZ2056B).</p>
      <p>Declaration of generative AI and AI-assisted technologies in the writing process: During
the preparation of this work, the author(s) used OpenAI’s generative AI (ChatGPT v3.5 &amp; v4),
DeepL and Grammarly to improve the writing, make suggestions, and for rephrasing. After
using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full
responsibility for the content of the publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Baloup</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bayamlıoğlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Benmayor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ducuing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dutkiewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lalova-Spinks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Miadzvetskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Peeters</surname>
          </string-name>
          ,
          <article-title>White paper on the data governance act</article-title>
          ,
          <source>CiTiP Working Paper</source>
          <year>2021</year>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Nagel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Hierro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Perea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lycklama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mertens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-S.</given-names>
            <surname>Taillandier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marques</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gelhaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marguglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Ahle</surname>
          </string-name>
          , et al.,
          <source>Design Principles for Data Spaces: Position Paper</source>
          ,
          Technical Report
          , E. ON Energy Research Center,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          , G. Appleton,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The fair guiding principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific data 3</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Theissen-Lipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paulus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pomp</surname>
          </string-name>
          , E. Curry,
          <article-title>Semantics in dataspaces: Origin and future directions</article-title>
          ,
          <source>in: Companion Proceedings of the ACM Web Conference</source>
          <year>2023</year>
          , WWW '23 Companion, ACM,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <source>Semantic Integration and Interoperability</source>
          , Springer International Publishing,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Meckler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dorsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Henselmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          ,
          <article-title>The web and linked data as a solid foundation for dataspaces</article-title>
          ,
          <source>in: Companion Proceedings of the ACM Web Conference</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yahya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Breslin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <article-title>Semantic web and knowledge graphs for industry 4.0</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>11</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hoseini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Theissen-Lipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quix</surname>
          </string-name>
          ,
          <article-title>Semantic data management in data lakes</article-title>
          ,
          <source>arXiv:2310.15373</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pomp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paulus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kirmse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kraus</surname>
          </string-name>
          , T. Meisen,
          <article-title>Applying semantics to reduce the time to analytics within complex heterogeneous infrastructures</article-title>
          ,
          <source>Technologies</source>
          <volume>6</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , et al.,
          <article-title>Knowledge graphs</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>54</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Solmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cirillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fürst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jacobs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kovacs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Santana</surname>
          </string-name>
          , L. Sánchez,
          <article-title>Enabling data spaces: existing developments and challenges</article-title>
          ,
          <source>in: Proceedings of the 1st International Workshop on Data Economy</source>
          , DE '22,
          ACM
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Dibowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Svetashova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Henson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <article-title>Using semantic technologies to manage a data lake: Data catalog, provenance and access control</article-title>
          ,
          <source>in: Proc. Scalable Semantic Web Knowledge Base Systems Workshop</source>
          , volume
          <volume>2757</volume>
          <source>of CEUR WS</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Usmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Breslin</surname>
          </string-name>
          , E. Curry,
          <article-title>Towards multimodal knowledge graphs for data spaces</article-title>
          ,
          <source>in: Companion Proceedings of the ACM Web Conference</source>
          <year>2023</year>
          , WWW '23 Companion, ACM,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Futia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vetrò</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>De Martin</surname>
          </string-name>
          ,
          <article-title>Semi: A semantic modeling machine to build knowledge graphs with graph neural networks</article-title>
          ,
          <source>SoftwareX</source>
          <volume>12</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Mayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , H.-Y. Zhang,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <article-title>Automatic semantic modeling of structured data sources with cross-modal retrieval</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          <volume>177</volume>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Teng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Evaluating the logical reasoning ability of chatgpt and gpt-4</article-title>
          ,
          <source>arXiv:2304.03439</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          , et al.,
          <article-title>A survey on evaluation of large language models</article-title>
          ,
          <source>ACM Trans. Intell. Syst. Technol</source>
          . (
          <year>2024</year>
          ). Just Accepted.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Alse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szekely</surname>
          </string-name>
          ,
          <article-title>Semantic Labeling: A Domain-Independent Approach</article-title>
          , in: The Semantic Web - ISWC 2016, Springer International Publishing,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pomp</surname>
          </string-name>
          ,
          <article-title>Bottom-up Knowledge Graph-based Data Management</article-title>
          ,
          <source>Berichte aus dem Maschinenbau</source>
          , Shaker,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pomp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Meisen</surname>
          </string-name>
          ,
          <article-title>You are missing a concept! Enhancing ontology-based data access with evolving ontologies</article-title>
          ,
          <source>in: Proc. ICSC</source>
          , IEEE,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paulus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burgdorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pomp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Meisen</surname>
          </string-name>
          ,
          <article-title>Recent advances and future challenges of semantic modeling</article-title>
          ,
          <source>in: Proc. 15th IEEE ICSC, IEEE</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bakker</surname>
          </string-name>
          , et al.,
          <article-title>Sherlock: A deep learning approach to semantic data type detection</article-title>
          ,
          <source>in: Proceedings of the 25th ACM SIGKDD, ACM</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>B.</given-names>
            <surname>Vu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pujara</surname>
          </string-name>
          ,
          <article-title>Learning Semantic Models of Data Sources Using Probabilistic Graphical Models</article-title>
          ,
          <source>in: The World Wide Web Conference, WWW '19</source>
          , ACM
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Taheriyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szekely</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Ambite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Leveraging Linked Data to Infer Semantic Relations within Structured Sources</article-title>
          ,
          <source>in: Proceedings of the 6th International Workshop on Consuming Linked Data (COLD 2015)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Korini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <article-title>Column type annotation using ChatGPT</article-title>
          ,
          <source>arXiv:2306.00745</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Frieske</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ishii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. J.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fung</surname>
          </string-name>
          ,
          <article-title>Survey of hallucination in natural language generation</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>55</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Suhara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Demiralp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Annotating columns with pre-trained language models</article-title>
          ,
          <source>in: Proceedings of the 2022 International Conference on Management of Data, SIGMOD '22</source>
          , ACM
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Yashar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Fainman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          ,
          <article-title>Table-GPT: Table-tuned GPT for diverse table tasks</article-title>
          ,
          <source>arXiv:2310.09263</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hegselmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Buendia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sontag</surname>
          </string-name>
          ,
          <article-title>TabLLM: Few-shot classification of tabular data with large language models</article-title>
          ,
          <source>in: International Conference on Artificial Intelligence and Statistics</source>
          , PMLR,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>TableLlama: Towards open large generalist models for tables</article-title>
          ,
          <source>arXiv:2311.09206</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gottschalk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Demidova</surname>
          </string-name>
          ,
          <article-title>Tab2KG: Semantic table interpretation with lightweight semantic profiles</article-title>
          ,
          <source>Semantic Web</source>
          <volume>13</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          , et al.,
          <article-title>DocMath-Eval: Evaluating numerical reasoning capabilities of LLMs in understanding long documents with tabular data</article-title>
          ,
          <source>arXiv:2311.09805</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Allen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Stork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <article-title>Knowledge engineering using large language models</article-title>
          ,
          <source>arXiv:2310.00637</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Junghanns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bulert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gründer-Fahrer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <article-title>Developing a scalable benchmark for assessing large language models in knowledge graph engineering</article-title>
          ,
          <source>arXiv:2308.16622</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>J.</given-names>
            <surname>Frey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Arndt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bulert</surname>
          </string-name>
          ,
          <article-title>Benchmarking the abilities of large language models for RDF knowledge graph creation and comprehension: How well do LLMs speak Turtle?</article-title>
          ,
          <source>arXiv:2309.17122</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>L.-P.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frey</surname>
          </string-name>
          , et al.,
          <article-title>LLM-assisted knowledge graph engineering: Experiments with ChatGPT</article-title>
          ,
          <source>arXiv:2307.06917</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burgdorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paulus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pomp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Meisen</surname>
          </string-name>
          ,
          <article-title>VC-SLAM - A Handcrafted Data Corpus for the Construction of Semantic Models</article-title>
          ,
          <source>Data</source>
          <volume>7</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , et al.,
          <article-title>Unifying large language models and knowledge graphs: A roadmap</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>A.</given-names>
            <surname>Haller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dobriy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferranti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J. Rodríguez</given-names>
            <surname>Méndez</surname>
          </string-name>
          ,
          <article-title>An analysis of links in wikidata</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>E.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jozashoori</surname>
          </string-name>
          , M.-E. Vidal,
          <article-title>Scaling up knowledge graph creation to large and heterogeneous data sources</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>75</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A survey of knowledge enhanced pre-trained language models</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Editing large language models: Problems, methods, and opportunities</article-title>
          ,
          <source>arXiv:2305.13172</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Biran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yoran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Globerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Geva</surname>
          </string-name>
          ,
          <article-title>Evaluating the ripple effects of knowledge editing in language models</article-title>
          ,
          <source>arXiv:2307.12976</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>B.</given-names>
            <surname>Vu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <article-title>SAND: A Tool for Creating Semantic Descriptions of Tabular Sources</article-title>
          ,
          <source>in: The Semantic Web</source>
          , volume
          <volume>13384</volume>
          of LNCS
          , Springer,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>R.</given-names>
            <surname>Avogadro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cremaschi</surname>
          </string-name>
          ,
          <article-title>MantisTable V: A novel and efficient approach to semantic table interpretation</article-title>
          ,
          <source>in: SemTab@ISWC</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paulus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burgdorf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Puleikis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Langer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pomp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Meisen</surname>
          </string-name>
          ,
          <article-title>PLASMA: Platform for Auxiliary Semantic Modeling Approaches</article-title>
          ,
          <source>in: International Conference on Enterprise Information Systems</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>N.</given-names>
            <surname>Kandpal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <article-title>Large Language Models Struggle to Learn Long-Tail Knowledge</article-title>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>