<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Accessing Semi-structured Data with RML and LLMs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shikhat Karkee</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Botoeva</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sam Coombes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Jordanous</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Özgür Kafali</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Lanti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>East Kent Hospitals University Foundation Trust</institution>
          ,
          <addr-line>Ashford</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <addr-line>Bolzano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Kent</institution>
          ,
          <addr-line>Canterbury</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>We demonstrate a principled way to query semi-structured data by integrating natural language processing capabilities into knowledge graph generation. We instantiate RML mappings with LLM queries to extract ontological terms from plain text. We evaluate our approach on pharmaceutical data in JSON format.</p>
      </abstract>
      <kwd-group>
        <kwd>Semi-structured data</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Ontology-Based Data Access</kwd>
        <kwd>RML</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The Ontology-Based Data Access (OBDA) framework [1] has provided the theoretical foundations for representing data stored in a database as a knowledge graph, that is, a collection of facts formulated using high-level concepts and relationships. A standard formalism for describing knowledge graphs in practice is the Resource Description Framework (RDF) [2]. Once the data has been represented as a knowledge graph, it can be analysed using a formal query language with a well-defined semantics, such as SPARQL, a standard query language for RDF [3]. Importantly, for such structured data representations and query languages, there is a guarantee that the answers are correct as long as the data is correct.</p>
      <p>A crucial component of the OBDA framework is mappings, which declaratively specify how knowledge graph entities should be populated from values found in the data. Historically, the prevailing database management systems were relational, so OBDA was originally used to facilitate access to relational data, and the mapping languages were only concerned with relational data sources. R2RML, the RDB to RDF Mapping Language, was developed as a W3C recommendation for specifying declarative mappings from relational databases to RDF datasets [4]. With the advent of non-relational databases, the OBDA framework has been extended to accommodate alternative data models [5, 6]. Likewise, in line with the general trend of data in non-relational formats being widely available on the Web, an extension of R2RML to support generic data sources has been under development, resulting in RML [7, 8], the RDF Mapping Language. RMLMapper is a library for generating knowledge graphs from various data formats, including CSV, JSON, and XML [9].</p>
      <p>Despite the recent advances in accommodating a wider range of data formats, a fundamental assumption when generating knowledge graphs following the OBDA principle is that the data are inherently structured. However, a significant portion of data available on the Internet is, at best, semi-structured. As a running example, consider online databases that catalogue medicines and information about them [10]. Notably, information such as the name of a drug or its interactions with other drugs is structured. Other pieces of information, though, may come as unstructured text. For instance, “1 g, every 4-6 hours; maximum 4 g per day” is dosage information for paracetamol. In general, dosage instructions may come in a variety of formats, even for the same drug, e.g., “1 g, every 4-6 hours, dose to be administered over 15 minutes; maximum 4 g per day”. Assuming a scenario where the goal is to
extract a knowledge graph from such a database in order to provide answers to pharmaceutical queries,
the dosage information would be expected to have a structured representation. This highlights the
necessity for extracting structured information from unstructured data.</p>
      <p>
        The standard solution when dealing with unstructured data nowadays is to employ large language
models (LLMs), natural language processing tools that excel at understanding and generating language
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. For instance, a state-of-the-art LLM would be capable of extracting the dosage amount and
recommended administration frequency from a dosage instruction string as above. In the context of
knowledge graph construction from text corpora, LLMs have been used for named entity recognition,
relation extraction, as well as for fact generation [
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15 ref16 ref17">12, 13, 14, 15, 16, 17</xref>
        ]. Despite these advances,
the main challenge remains the factual correctness of the extracted assertions, owing to
so-called hallucinations [18], a consequence of LLMs being essentially statistical models that output
words with a high probability of continuing the current phrase. A pure LLM-based solution to
knowledge graph construction may therefore not be acceptable in a high-stakes application, where
either the generated knowledge graph is expected to be factually correct, or the correctness of
query answers needs to be easily verifiable.
      </p>
      <p>To combine the best of both worlds, that is, the ability of LLMs to intelligently process arbitrary natural
language and the correctness guarantees associated with the OBDA approach, we propose leveraging
LLMs in a more surgical fashion. Specifically, our methodology involves integrating LLM queries within
the RML mappings, where the input to the LLM is computed by the source part of a mapping rule and
the output of the LLM is used to generate RDF terms. There are several advantages of our approach
over an end-to-end LLM solution, where pharmaceutical queries are answered by an LLM directly over
the data sources. First, the LLM is not used to deal with information that is already structured and can
be processed using the standard RML rules. Second, the LLM is fed only short and focused snippets of
text, thus reducing the risk of confusing it. Finally, it allows for a more straightforward verification
of the correctness of the knowledge graph/query answers, as each fact generated by an LLM could be
annotated with the text used to extract it. To the best of our knowledge, this is the first such application
of LLMs in the OBDA context. The only other work that used LLMs within the OBDA framework
employed LLMs to generate mappings automatically [19]. Other information extraction techniques
have been used to convert textual documents to structured representations [20, 21]; however, these
techniques are less powerful than LLMs.</p>
      <p>We instantiate and evaluate our approach on a Medicines Information (MI) use-case. We use a small
JSON dataset created to mimic the existing online MI datasets and a number of SPARQL queries. We
compare our approach to the pure LLM-based solution and to the pure RML-based solution. We assess
the correctness and completeness of the resulting knowledge graphs and of the query answers. Our
experiments show that integrating LLMs into RML results in fewer hallucinations than the pure LLM
solution and, as expected, makes more information available for structured processing than
the pure RML solution.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>
        Knowledge Graphs. A Knowledge Graph (KG) is a structured representation of information in the form
of a graph, where nodes represent entities of interest, and edges represent the semantic relationships
between those entities. The Resource Description Framework (RDF) is a W3C-standard model for data
interchange on the Web and serves as the foundational representation format for specifying KGs.
In RDF, knowledge is encoded as a set of triples of the form (s p o), where s denotes the
subject, p the predicate, and o the object. The subject s is either an Internationalized Resource Identifier
(IRI) or a blank node (an unnamed resource), the predicate p is an IRI specifying the relationship, and the object o is an IRI,
a blank node, or a literal (a concrete data value such as a string, number, or date). For example, the triple :aspirin :usage :indication1 expresses that
the drug aspirin has usage for an indication identified by the :indication1 IRI, whereas the triple
:dosage1 :amount "100"^^xsd:decimal states that the amount for the dosage entity :dosage1 is
100, where "100"^^xsd:decimal is an RDF literal with an associated datatype. An RDF graph is a finite
set of RDF triples. SPARQL is a standard query language for RDF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Knowledge graph extraction from data. In traditional OBDA, one provides access to an (external)
relational database (DB) through an ontology, which is connected to the DB by means of mappings.</p>
      <sec id="sec-2-1">
        <title>Given a DB schema S and an ontology O, a (GAV) mapping assertion between S and O is an expression</title>
        <p>of the form q(x⃗) ⇝ t(x⃗), where q(x⃗) is a (SQL) query over S, also called the source query, and t(x⃗) an</p>
      </sec>
      <sec id="sec-2-2">
        <title>RDF triple template over O, also called the target query. A triple template (f(x⃗) rdf:type A), for a</title>
        <p>class name A in O and a template function f, asserts that the IRI computed as f(x⃗) is an instance of
A, whereas (f(x⃗) P f′(x⃗)), for a property name P in O, asserts that f(x⃗) is connected to f′(x⃗) via P.
Note that template functions are used to build IRIs, literals, or blank nodes in an RDF triple.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Given a database instance D of S and a set ℳ of mapping assertions between S and O, we denote by ℳ(D) the KG generated by ℳ from D (which is an RDF graph):</title>
        <p>{(f(a⃗) rdf:type A) | a⃗ ∈ ans(q, D) and q ⇝ (f(x⃗) rdf:type A) is in ℳ} ∪</p>
        <p>{(f(a⃗) P f′(a⃗)) | a⃗ ∈ ans(q, D) and q ⇝ (f(x⃗) P f′(x⃗)) is in ℳ}.</p>
        <p>In this paper, we follow the materialisation approach to query answering, i.e., SPARQL queries are
answered by a triplestore into which the generated KG is loaded (i.e., materialised). This is mainly
motivated by the fact that the data sources we target (such as online databases with API access to JSON
files) might not support arbitrary queries.</p>
        <p>RML is a flexible, declarative language designed for specifying (GAV) mappings from heterogeneous
data sources to RDF. Building on R2RML, it extends support beyond relational databases to handle</p>
        <sec id="sec-2-3-1">
          <title>JSON, XML, and CSV formats. Its modular architecture [22] includes the RML-core module<sup>3</sup>, which</title>
          <p>inherits the core mechanisms of R2RML for RDF triple generation, and the RML-IO module<sup>4</sup>, which
enables access to diverse data sources by defining logical sources and providing a relational abstraction
for them.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>The use of LLM queries in RML mappings is possible thanks to the RML-FNML module<sup>5</sup>, which</title>
          <p>enables the specification of user-defined functions within triple maps (i.e., in the target of GAV-style
mappings). We can distinguish between two disjoint classes of function symbols: (i) uninterpreted
function symbols (e.g., IRI templates), and (ii) function symbols with associated semantics (e.g., concat
computing the concatenation of two strings, or an LLM query).</p>
          <p>For conciseness and readability, throughout the paper we adopt the mapping language of Ontop, a
popular Virtual Knowledge Graph system [23], for concrete examples of (RML) mapping rules. This
language closely aligns with the abstract syntax introduced above. Below is an example
of a mapping assertion in this syntax:</p>
          <preformat>source  SELECT m_id, name FROM medicines
target  :med-{m_id} rdf:type :Drug .
        :med-{m_id} :prefLabel {name} .</preformat>
          <p>This example corresponds to two GAV-style mapping assertions. The template :med-{m_id} denotes
an uninterpreted unary function symbol applied to the database attribute m_id, while :Drug and
:prefLabel are IRIs in the target RDF graph. Assuming a database instance where the SQL query in
the source clause returns the tuple (m_id ↦ "001-100", name ↦ "Amoxicillin"), the mapping
generates the following RDF triples<sup>6</sup>:</p>
          <preformat>:med-001-100 rdf:type :Drug .
:med-001-100 :prefLabel "Amoxicillin" .</preformat>
          <p><sup>3</sup> https://kg-construct.github.io/rml-core/ontology/documentation/index-en.html</p>
          <p><sup>4</sup> https://kg-construct.github.io/rml-io/ontology/documentation/index-en.html</p>
          <p><sup>5</sup> https://kg-construct.github.io/rml-fnml/ontology/documentation/index-en.html</p>
          <p><sup>6</sup> Expressed in Turtle syntax (https://www.w3.org/TR/turtle/).</p>
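          <p>The way a mapping assertion of this kind produces triples can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how Ontop or RMLMapper is implemented: the helper names instantiate and apply_mapping are hypothetical, the result of the source query is given directly as a list of dictionaries, and the literal for {name} is quoted inside the template itself.</p>
          <preformat>
```python
import re


def instantiate(template: str, row: dict) -> str:
    """Replace each {attr} placeholder with the value of attr in the given row."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def apply_mapping(rows, target_templates):
    """Return one (subject, predicate, object) triple per row and target template."""
    triples = []
    for row in rows:
        for subj, pred, obj in target_templates:
            triples.append((instantiate(subj, row), pred, instantiate(obj, row)))
    return triples


# Assumed answer of the source query: SELECT m_id, name FROM medicines
rows = [{"m_id": "001-100", "name": "Amoxicillin"}]

# The two target triple templates of the example mapping
targets = [
    (":med-{m_id}", "rdf:type", ":Drug"),
    (":med-{m_id}", ":prefLabel", '"{name}"'),
]

triples = apply_mapping(rows, targets)
for triple in triples:
    print(" ".join(triple), ".")
```
          </preformat>
          <p>Each answer tuple of the source query is substituted into every target template, which mirrors the set-based definition of the generated KG given earlier in this section.</p>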
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. LLM-Enhanced Mappings with RML</title>
      <p>The novel idea of this paper is to integrate LLMs into OBDA mappings so as to extract more structured
information from semi-structured data. Our approach leverages the expressiveness of the RML-FNML
module, which permits the specification of arbitrarily complex user-defined functions within mapping
assertions. This integration enables the dynamic extraction of structured facts from unstructured fields
at mapping time, while preserving the declarative nature of RML mappings. In this section we present
it conceptually. Then in Section 4 we instantiate it on a medicines information use-case.</p>
      <p>To begin with, we define a logical source L over the raw data (JSON documents in our case). For
our purposes, L can be viewed as an instance of a relation over a fixed schema, where the attributes
correspond to references specified in the RML mappings for L. In L, we distinguish structured attributes,
the values of which can be directly mapped to RDF terms using standard RML templates, and unstructured
attributes, the values of which are free-text fields in natural language and so may require natural language
processing to meaningfully map them to RDF terms.</p>
      <p>We assume a fixed LLM Λ. For simplicity, we abstract it as a total deterministic function from strings
to strings, where Λ(s) denotes the output of Λ on an input string s. Furthermore, we introduce a
number of user-defined function symbols, such as getAnswerFromLLM and getFloatFromLLM, with
semantics parameterised by Λ. These function names are interpreted by querying Λ and then extracting
the answer of the expected type from the output of Λ. For example, assuming that toJSONObject
converts a string to a JSON object, the output of Λ is of the form “{"answer": ". . . "}”, and for a
JSON object o = {k: v}, the expression o.k evaluates to v, the semantics of getAnswerFromLLM on a
string s can be defined as:</p>
      <p>getAnswerFromLLM<sub>Λ</sub>(s) := toJSONObject(Λ(s)).answer</p>
      <p>Now, for each property P the value of which should be extracted from an unstructured attribute a, we
assume to have a prompt template π<sub>P</sub> (an instruction string or a few-shot example). Then the expression
getAnswerFromLLM(concat(π<sub>P</sub>, a))
may appear as a term in the target of a mapping assertion for P. Its semantics is as one would expect,
with a evaluating to the textual content of attribute a for a given tuple in L.</p>
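      <p>The semantics above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the paper: make_get_answer and stub_llm are hypothetical names, and the LLM is replaced by a deterministic stub that returns the first token of the text following the instruction, so that the example runs without any model.</p>
      <preformat>
```python
import json
from typing import Callable


def make_get_answer(llm: Callable[[str], str]) -> Callable[[str], object]:
    """Build getAnswerFromLLM for a fixed LLM, modelled as a function from strings to strings."""
    def get_answer_from_llm(s: str):
        # Corresponds to toJSONObject(llm(s)).answer
        return json.loads(llm(s))["answer"]
    return get_answer_from_llm


def concat(a: str, b: str) -> str:
    """String concatenation, as used in the target of the mapping."""
    return a + b


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for the LLM: echoes the first token after the instruction."""
    text = prompt.split("Extract dosage amount:", 1)[1].strip()
    return json.dumps({"answer": text.split()[0]})


PROMPT = (
    'Produce output as a JSON string of format {"answer": DOSAGE_AMOUNT}.\n'
    "Extract dosage amount: "
)

get_answer = make_get_answer(stub_llm)
result = get_answer(concat(PROMPT, "1 g, every 4-6 hours; maximum 4 g per day"))
print(result)
```
      </preformat>
      <p>Fixing the LLM once, through the outer function, reflects the assumption that the user-defined functions are parameterised by a single deterministic LLM.</p>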
      <sec id="sec-3-1">
        <title>Finally, the definition of the generated KG ℳ(L) should be extended to take into account Λ. It is</title>
        <p>straightforward to do so; we omit it due to lack of space.</p>
        <p>Example. Suppose L contains an attribute doseInfo and we want to extract from it the value for
the property hasDoseAmount. This can be done with an RML triple pattern of the form:</p>
        <preformat>:med-{m_id} :hasDoseAmount getAnswerFromLLM(concat(PROMPT, {doseInfo})) .</preformat>
        <p>where PROMPT is the string:</p>
        <preformat>Produce output as a JSON string of format {"answer": DOSAGE_AMOUNT}.
Extract dosage amount:</preformat>
        <p>For a value “1 g, every 4–6 hours; maximum 4 g per day” of doseInfo and 001-100 of the structured
attribute m_id, the generated RDF triple could be:</p>
        <preformat>:med-001-100 :hasDoseAmount "1" .</preformat>
        <p><bold>4. Medicines Information Use-Case with LLM-Enhanced RML</bold></p>
        <p>In this section we demonstrate our approach on a medicines information use-case. We start with a
motivation for the use-case in Section 4.1 and then in Section 4.2 describe our full OBDA setup. All our
artefacts and code can be found at https://github.com/pharmaKG/rmlai.</p>
        <p><bold>4.1. Medicines Information</bold></p>
        <p>Annually, an estimated 66 million medication errors that are harmful to patients occur within the NHS.
Each year, avoidable adverse drug events are estimated to cause or contribute to a thousand
deaths [24]. The use and application of medicines are often complex, requiring choosing the right
drug, its dosage and administration modality, and considering interactions with existing treatments or
conditions [25]. For instance, a nurse may need to know if a medication can be administered through a
nasogastric tube or if a particular drug is safe to use while breastfeeding. A consultant might inquire
about the occurrence of rare side effects.</p>
        <p>Figure 1: A fragment of one of the JSON documents (apixaban).</p>
        <preformat>{
  "identifier": "apixaban",
  "title": "Apixaban",
  "breastFeeding": {
    "specificInfo": "A risk cannot be excluded: animal studies shows presence in milk."
  },
  ...
  "sideEffects": {
    "specificInfo": "Common: haemorrhage, contusion, epistaxis, haematoma. Uncommon: thrombocytopenia, hypotension, post procedural haematoma."
  },
  "therapeuticPlan": [{
    "indications": ["Treatment of deep-vein thrombosis", "Treatment of pulmonary embolism"],
    "dosages": [{
      "id": "4b59244d-c27d-4be2-a954-6a793f1c7cb0",
      "patientGroup": "adult",
      "detailedPatientGroup": "Adult",
      "doseInfo": "10 mg taken twice daily for 7 days.",
      "routeOfAdministration": "By mouth"
    }]
  }]
}</preformat>
        <p>
          To ensure proper use of medicines, medicines information (MI) departments exist to assist healthcare
staff and patients with their queries [26]. When an MI department receives a query, it is their
responsibility to sift through a number of dedicated online databases [
          <xref ref-type="bibr" rid="ref10">10, 27, 28</xref>
          ] to find accurate answers,
which then should be compiled into a document including all relevant references. A limitation of this
manual process is that even for relatively straightforward questions, it requires significant time, as each
database must be reviewed individually, and the answer must be properly collated and referenced.
        </p>
        <p>Automating the MI delivery service, at least for basic enquiries, has the potential to reduce the
number of medication errors by providing health professionals with timely and accurate medicines
information. We demonstrate how our approach can be used towards this automation when the
medicines data comes as semi-structured information in JSON format. Our idea is to extract a KG from
the data using OBDA mappings, so that answering MI enquiries amounts to answering SPARQL
queries over the extracted KG.</p>
        <p><bold>4.2. The OBDA Setup</bold></p>
        <p>We describe our OBDA instantiation of the medicines information use-case.</p>
        <p>
          Data. We assume that information about drugs is available in JSON format, where part of the
information is structured, while the rest is given in the form of plain text, similarly to
online databases such as BNF [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and EMC [27]. For the experiments, we created JSON documents about 20 drugs,
each stored in a separate file on the local file system, providing (a subset of) information
on their usage (including patient group, indications, dosage, frequency), side effects, as well as advice
for patients who are breastfeeding, pregnant, or have hepatic or renal impairment.
        </p>
        <p>A fragment of one such document is shown in Figure 1. Here, the route of administration in
dosages is structured, given that it takes one of the possible values, “By mouth”. However, the dosage
amount and frequency are stated in one string, “10 mg taken twice daily for 7 days”. Advice for special
categories of patients and the side effects are unstructured as well.</p>
        <p>We note that, for the sake of our small-scale experiment, there is always at most one entry in the
therapeutic plan and at most one entry in dosages. This does not restrict the scope of our methodology.
Also, we created auxiliary identifiers at various levels of the JSON documents to simplify the subsequent
linking of different entities (see, e.g., the id field inside dosages).</p>
        <p>Figure 2: Overview of the ontology (classes with their properties and cardinalities, as recoverable from the diagram).</p>
        <preformat>Drug
  + prefLabel: string (1)
  + sideEffect: string (0..n)
  + sideEffectDesc: string (0..1)

Usage
  + indication: string (0..1)
  + patientCondition: string
  + safetyLevel: string (1)
  + description: string (0..1)

Dosage
  + route: string (0..n)
  + description: string (0..1)
  + amount: decimal (0..1)
  + unit: string (0..1)
  + frequency: decimal (0..1)
  + duration: xs:duration (0..1)
  + group: string (1)
  + patientGroupDesc: string (0..1)
  + minAge: decimal (0..1)
  + maxAge: decimal (0..1)
  + ageUnit: string (0..1)
  + minWeight: decimal (0..1)
  + maxWeight: decimal (0..1)
  + weightUnit: string (0..1)
  + gender: string (0..1)

DrugInteraction

InteractionDetail
  + severity: string (0..1)
  + evidence: string (0..1)
  + description: string (0..1)

Relations (with cardinalities): usage 1..n, dosage 0..n, interactionDetail 1..n, interactingDrug 2</preformat>
        <p>Ontology. We built a simple ontology to capture the data in our dataset along with a SHACL file that
defines the schema of a valid RDF graph. There are 5 classes and 24 properties, see Figure 2 for an
overview. For every drug there may be several usages, one for each indication and patient condition.
Usually, one usage is reserved for patients without special conditions. Further, there may be up to 4
different usage instances: for pregnant patients, breastfeeding patients, and patients with hepatic or renal impairment.
These typically include safety advice. Then, each usage is associated with dosage instances, one
for each patient group. Dosage information includes the route of administration, the amount, unit,
frequency and duration of the dosage, as well as the information pertaining to the patient group such
as gender and the age and weight ranges.</p>
        <p>Note that the values for properties highlighted in bold are those that need to be extracted from
unstructured text. We refer to those properties as LLM-extracted properties. All IRIs and the rest of
the values can be extracted directly from JSON. We refer to properties whose values can be
mapped directly from JSON as immediately accessible. When an entity has LLM-extracted properties,
the original unstructured string is stored in one of the description or xxxDesc properties.</p>
        <p>Mappings. We created RML mapping rules for all our classes and properties. For each class and
immediately accessible property, we created standard rules. For each LLM-extracted property, the
mapping rule involves a call to a user-defined function implementing an LLM query. For the sideEffect
and safetyLevel prompts, we adopted a zero-shot prompting strategy, while for the remaining
properties we used a few-shot prompting strategy, where each prompt included a few examples consisting of an
input and its corresponding expected output. The choice of strategy was based on initial
testing; a more thorough evaluation of different prompting strategies is left for future
work.</p>
        <p>Below we provide mapping rules for the Dosage class and the route and amount properties (instantiated
for apixaban). For readability, we present them using the more concise Ontop-like notation:</p>
        <preformat>source  json_to_table("$.therapeuticPlan[*].dosages[*]", apixaban.json)
target  http://drug.uk/dosage/{id} rdf:type drug:Dosage ;
          drug:route {routeOfAdministration} ;
          drug:amount grel:getFloatFromLLM(
            grel:concat(DOSAGE_PROMPT, {doseInfo})
          ) .</preformat>
        <p>The source declaration in the first line essentially creates a relation with the attributes
(id, patientGroup, detailedPatientGroup, doseInfo, routeOfAdministration), through a
JSONPath<sup>7</sup> query over the file “apixaban.json”. Each tuple in this relation corresponds to an element of the
dosages array (see Figure 1). In the target part, the first two rules are standard R2RML/RML rules. The
rule for the amount property states that the object value is computed as grel:getFloatFromLLM(s),
where</p>
        <list list-type="bullet">
          <list-item><p>s is the string obtained as the concatenation of the fixed string DOSAGE_PROMPT with the value of
doseInfo.</p></list-item>
          <list-item><p>grel:getFloatFromLLM is a user-defined function returning a decimal number. Its definition is
provided in the functions.ttl file, which defines the standard and custom functions used within RML
mapping rules to transform, enrich or filter data during the mapping process. In particular, this file
specifies the Java class and the name of the method within it that implements the actual call to the LLM
and the extraction of the floating-point value from its reply.</p></list-item>
          <list-item><p>DOSAGE_PROMPT is the following prompt template:</p></list-item>
        </list>
        <preformat>task: Extract dosage amount from the provided text.
output: a json string of format: {"response": DOSAGE_AMOUNT}. DO NOT RETURN ADDITIONAL EXPLANATIONS OR TEXT OR MARKDOWN FORMATTING. Only JSON as string.
rule: DOSAGE_AMOUNT is a float value. And it should be a valid xsd:decimal. If no relevant information is found in the text, DOSAGE_AMOUNT is null. Do not put null within double quotes.
example: if the text says 50mg daily, DOSAGE_AMOUNT is 50. if it says 50mg in 2 divided dose, as 50 divided by 2 is 25, DOSAGE_AMOUNT is 25. if text says apply 1 millimeter, DOSAGE_AMOUNT is 1.
text:</preformat>
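        <p>The extraction step performed after the LLM call can be illustrated with a short Python sketch. According to the text, the actual function is implemented as a Java method registered in functions.ttl; the code below is not that implementation. The function name parse_float_reply is hypothetical, and the handling of malformed replies (returning no value) is an assumption made for the sake of the example.</p>
        <preformat>
```python
import json
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_float_reply(reply: str) -> Optional[Decimal]:
    """Read the value under "response" and return it as a decimal, or None if unusable."""
    try:
        value = json.loads(reply)["response"]
    except (json.JSONDecodeError, KeyError, TypeError):
        # Reply is not JSON, or not an object with a "response" key
        return None
    if value is None or isinstance(value, bool):
        # JSON null means that no dosage amount was found in the text
        return None
    try:
        number = Decimal(str(value))
    except InvalidOperation:
        return None
    # Reject NaN and infinities, which are not valid xsd:decimal values
    return number if number.is_finite() else None


print(parse_float_reply('{"response": 10}'))
print(parse_float_reply('{"response": null}'))
```
        </preformat>
        <p>Returning no value for a null or malformed reply means that no triple is generated for the property, rather than a triple with an invalid literal.</p>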
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Evaluation</title>
      <p>In this section we evaluate our approach in terms of the correctness of the generated knowledge graph and
of the answers to SPARQL queries over this graph. For all experiments, we employed three versions of
varying sizes of DeepSeek-R1-Distill-Qwen, an open-source LLM with reasoning capabilities [29], with
7, 14 and 32 billion parameters. We chose these models because they could be installed and run locally,
while their increasing sizes allowed us to explore the trade-off between their computational cost and
performance. All experiments were run on an Intel Xeon Gold 5317 CPU with 32 cores and an NVIDIA A100
GPU with 80 GB of memory.</p>
      <p><bold>5.1. Correctness of the Generated Knowledge Graph</bold></p>
      <p>We report our findings on the quality of the KGs generated using our OBDA setup, RMLMapper v7.3.3,
and the three LLMs.</p>
      <p>For each LLM, we repeated the KG generation process three times to account for possible differences
in its outputs. Since for our tasks we were interested in a more deterministic behaviour of the LLMs, we
set the temperature parameter to 0. Typically, lower values of the temperature parameter correspond to
more deterministic behaviour of LLMs: 0 forces an LLM to output the token with the highest probability.
In our experiments, this ensured that all three runs for each LLM were identical. In what follows, all
statistics refer to one run for an LLM.</p>
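      <p>The effect of the temperature parameter can be illustrated with a small, self-contained Python sketch of temperature-scaled softmax. It is a textbook illustration, not the decoding code of the models used here; the function name next_token_distribution and the example scores are hypothetical.</p>
      <preformat>
```python
import math


def next_token_distribution(logits, temperature):
    """Return token probabilities for the given scores at the given temperature."""
    if temperature == 0:
        # Greedy decoding: all probability mass goes to the highest-scoring token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtracted for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
greedy = next_token_distribution(logits, 0)
soft = next_token_distribution(logits, 1.0)
sharp = next_token_distribution(logits, 0.1)
print(greedy, soft, sharp)
```
      </preformat>
      <p>As the temperature decreases, the distribution concentrates on the top-scoring token, and at 0 the choice becomes deterministic, which is consistent with the identical runs reported above.</p>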
      <p>We now discuss the correctness of the LLM-extracted values (the ground truth was established
manually). Across 20 drugs, the total number of triples in the extracted RDF graph was 14,641. Out of
these, the object value of 963 triples was LLM-extracted. For each LLM, we summarise the number and
ratio of errors within LLM-extracted facts, as well as the ratio of errors in the context of the complete
RDF graph, and report the average generation time in Table 1. We also present a breakdown of the
errors per property together with the count of values to be extracted using the LLM. The smallest model
performed worst (0.25% error rate for LLM-extracted values and 0.16% error rate for the whole graph),
and one run took around 13 minutes. The two bigger models performed similarly (0.05% error rate for
LLM-extracted values and 0.003% error rate for the whole graph), while DeepSeek-14b was about twice
as fast as DeepSeek-32b.</p>
      <p><sup>7</sup> https://en.wikipedia.org/wiki/JSONPath</p>
      <p>We cautiously consider our approach to be successful at generating “almost correct” knowledge
graphs from semi-structured information: 50 errors in 14,641 facts amount to an error rate of about 0.34% (a fraction of roughly 0.003). Although
we do not include a comparison with a pure LLM-based solution (left for future work), we conjecture
that such a solution would either produce significantly more errors than ours, or would require relying
on more advanced, and hence more expensive, LLMs. Another advantage of our solution is that it
guarantees the conformance of the generated graph with the ontology. For each LLM, the generated RDF
graph successfully passed SHACL validation for the portion of the data available in our JSON dataset.
By contrast, a pure LLM-based solution may produce RDF triples formulated using the wrong vocabulary or
not satisfying integrity constraints.</p>
      <p><bold>5.2. Answers to queries</bold></p>
      <p>We now evaluate our approach in terms of SPARQL query answers over the generated RDF graphs,
and compare it to three other solutions for accessing data: (i) SPARQL queries evaluated over the
KG generated using ‘standard’ (i.e., not LLM-enhanced) RML mappings, referred to as
SPARQL-over-RML-KG; (ii) natural language queries answered by an LLM, referred to as NL-over-LLM; (iii) natural
language queries answered by an LLM, each over a single JSON document, referred to as NL-over-JSON.
In NL-over-LLM, we are essentially probing the knowledge acquired by the LLM from its training data,
while NL-over-JSON follows a RAG-like (retrieval-augmented generation) architecture, where the LLM is
expected to pull relevant information from a provided document, a JSON document about a drug in our
case.</p>
      <p>We selected 28 queries, loosely based on actual MI enquiries, to be answerable over our dataset. Each
query is about a specific drug. The first 8 queries are expected to return one of three values: Yes, No or
Unknown. The latter value should be returned when there is no evidence to back up a positive or a
negative answer. The remaining 20 queries ask for the dosage amount and unit for an indication, for each
of the 20 drugs in our dataset. The correct answers were determined by evaluating the queries over the
ground truth graph.</p>
      <p>We employed a zero-shot prompting strategy for the LLM queries in NL-over-LLM and NL-over-JSON.
Based on our initial experiments, we asked the LLMs to provide their answers in a structured format. Below
are the zero-shot instructions we used for the dosage queries:
Rule:
- Provide the answer in a structured plain text JSON format
- JSON should contain "dosage amount" and "dosage unit" as keys and their respective values.
- Do not include any additional information or explanation.
- Do not provide JSON in markdown formatting. Only provide plain text JSON.</p>
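      <p>A minimal sketch of how a reply that follows these instructions could be checked is given below. It is an illustration under stated assumptions, not the code used in our experiments: the function names build_prompt and parse_dosage_answer are hypothetical, the call to the LLM itself is omitted, and the drug, indication and reply in the usage note are placeholders rather than data from our dataset.</p>
      <preformat>
```python
import json

# Zero-shot instructions quoted from the text above.
RULES = (
    "Rule:\n"
    "- Provide the answer in a structured plain text JSON format\n"
    '- JSON should contain "dosage amount" and "dosage unit" as keys '
    "and their respective values.\n"
    "- Do not include any additional information or explanation.\n"
    "- Do not provide JSON in markdown formatting. Only provide plain text JSON.\n"
)

REQUIRED_KEYS = ("dosage amount", "dosage unit")


def build_prompt(question: str) -> str:
    """Append the zero-shot rules to a natural language dosage question.

    Hypothetical helper: the exact prompt layout is an assumption.
    """
    return question.strip() + "\n\n" + RULES


def parse_dosage_answer(reply: str):
    """Return (amount, unit) as strings, or None if the reply breaks the rules."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        # Markdown fences or explanatory text around the JSON end up here.
        return None
    if not isinstance(data, dict):
        return None
    if any(key not in data for key in REQUIRED_KEYS):
        return None
    return str(data["dosage amount"]), str(data["dosage unit"])
```
</preformat>
      <p>For a placeholder question such as "What is the dosage of drug X for indication Y?", a reply consisting only of a JSON object with both keys is accepted, whereas a reply that wraps the JSON in extra text, or that omits one of the keys, is rejected and can be counted as an incorrect answer.</p>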
      <p>For SPARQL-over-RML-KG, there were two correct Yes answers; all other answers were Unknown,
since the relevant data was not accessible when using standard RML mappings. Six of the Unknown
answers coincided with the ground truth, so we consider that SPARQL-over-RML-KG returned
20 incorrect answers out of 28. As for the approaches involving calls to LLMs, either for KG construction or for
direct query answering, we summarise the numbers of incorrect query answers in Table 2. Asking for
answers directly from LLMs results in over 20 errors for all three models. Providing LLMs with relevant
information in a JSON file reduces the number of errors to 14 for the smallest, 8 for the medium and
6 for the largest model. The number of errors in SPARQL query answers over the three generated
graphs ranges from 3 for the smallest LLM, to 2 for the medium and 1 for the largest one.</p>
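      <p>The counts above can be recomputed with a few lines of code. The sketch below is illustrative only: the error counts are copied from the text, the labels "smallest", "medium" and "largest" are placeholders for the three LLMs, and count_incorrect is a hypothetical helper showing how answers could be compared against the ground truth, not the evaluation script used in our experiments.</p>
      <preformat>
```python
TOTAL_QUERIES = 28  # 8 Yes/No/Unknown queries plus 20 dosage queries

# Incorrect answers per approach, as reported in the text above.
# The model labels are placeholders for the three LLMs.
REPORTED_ERRORS = {
    "NL-over-JSON": {"smallest": 14, "medium": 8, "largest": 6},
    "SPARQL over LLM-enhanced KG": {"smallest": 3, "medium": 2, "largest": 1},
}


def count_incorrect(answers, ground_truth):
    """Count queries whose answer differs from the ground truth answer.

    A query with no returned answer is counted as incorrect.
    """
    return sum(1 for qid, expected in ground_truth.items()
               if answers.get(qid) != expected)


def correct_counts(errors_by_model, total=TOTAL_QUERIES):
    """Turn per-model error counts into per-model correct-answer counts."""
    return {model: total - errors for model, errors in errors_by_model.items()}


# SPARQL-over-RML-KG: 2 correct Yes answers and 6 correct Unknown answers,
# so every remaining query counts as incorrect.
rml_incorrect = TOTAL_QUERIES - 2 - 6

if __name__ == "__main__":
    print("SPARQL-over-RML-KG incorrect:", rml_incorrect)
    for approach, errors in REPORTED_ERRORS.items():
        print(approach, correct_counts(errors))
```
</preformat>
      <p>Under these counts, SPARQL-over-RML-KG answers 8 of the 28 queries correctly, while SPARQL queries over the LLM-enhanced graphs answer between 25 and 27 of them correctly, depending on the model.</p>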
      <p>This simple comparison shows that our approach advances the existing work along two dimensions.
First, with respect to the standard RML mappings, our LLM-enhanced mappings make more data
accessible for structured query answering, whose benefits include better explainability of the answers.
Second, providing LLMs with fine-grained context in LLM-enhanced mappings can reduce the risk
of incorrect answers, compared to probing the implicit knowledge of an LLM as in NL-over-LLM,
or to providing a potentially much bigger context as in NL-over-JSON. In addition, though not currently
implemented, our approach allows for easy verification of query answers, by showing the user
the original (relatively short) unstructured text from which the values involved in deciding the query
outcome were extracted. Arguably, verification in this case is more straightforward than in
NL-over-LLM or NL-over-JSON, where the user would have to look at a whole (possibly quite large) JSON
document.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions</title>
      <p>In this paper, we presented a promising approach for generating Knowledge Graphs (KGs) from
semi-structured data by integrating LLM queries within RML mappings. Our method improves the correctness
of query answers compared to purely LLM-based query answering, as demonstrated by
our preliminary experiments.</p>
      <p>In the future, we plan to address the limitation of RMLMapper that prevents the simultaneous
extraction of semantically related property values, requiring each property to be extracted individually.
This may increase the risk of inconsistencies when these values are interpreted collectively. For instance,
“1 dose every 6 hours” is equivalent to “4 doses every day”. Two different LLM queries might therefore
extract 1 as the frequency (following the first reading) and 1 as the duration (following the second),
resulting in an incorrect pair. Moreover, this constraint increases
the number of LLM queries, which in turn increases the overall KG generation time. To address these
issues, future work will explore using LLMs in the source component of mapping assertions, e.g.,
by relying on the RML-LV module<sup>8</sup>, which introduces the notion of logical views specifying virtual
relational schemas over data, and consequently allows for a two-stage mapping process.</p>
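      <p>The inconsistency described above can be made concrete with a small sketch. It assumes, for illustration only, that a dosing instruction is represented as a (frequency, duration, unit) triple and normalised to doses per day; the function doses_per_day and the unit table are hypothetical and are not part of RMLMapper or of our mappings.</p>
      <preformat>
```python
from fractions import Fraction

# Illustrative subset of time units, expressed in hours.
HOURS_PER_UNIT = {"hour": 1, "day": 24}


def doses_per_day(frequency: int, duration: int, unit: str) -> Fraction:
    """Normalise 'frequency doses every duration unit(s)' to doses per day."""
    hours = duration * HOURS_PER_UNIT[unit]
    return Fraction(frequency * 24, hours)


# Two consistent readings of the same instruction.
reading_a = (1, 6, "hour")  # "1 dose every 6 hours"
reading_b = (4, 1, "day")   # "4 doses every day"

# A mixed pair: frequency taken from reading_a, duration and unit from
# reading_b, as could happen when each value is extracted by a separate query.
mixed = (reading_a[0], reading_b[1], reading_b[2])

if __name__ == "__main__":
    print(doses_per_day(*reading_a))  # consistent reading
    print(doses_per_day(*reading_b))  # consistent reading, same value
    print(doses_per_day(*mixed))      # differs from both readings above
```
</preformat>
      <p>Both consistent readings normalise to 4 doses per day, whereas the mixed pair normalises to 1 dose per day. Extracting the related values in a single step would rule out such mixed pairs by construction.</p>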
      <p>In our experiments, we materialised the whole KG. This would not be feasible in practice when
offering a fully-fledged solution for accessing online databases. Therefore, future work could look into
materialising only the portion of the KG relevant to answering the current query.</p>
      <p>Acknowledgements. This research was partially supported by the HEU project CyclOps (grant
agreement n. 101135513), by the Province of Bolzano and FWF through the project Ontegra (DOI
10.55776/PIN8884924), by the Province of Bolzano and EU through the project EFRE/FESR 1078 CRIMA,
and by the Italian PRIN project S-PIKACHU.</p>
      <p><sup>8</sup> https://kg-construct.github.io/rml-lv/ontology/documentation/index-en.html</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>https://www.sciencedirect.com/science/article/pii/S0950705124015740. doi:10.1016/j.knosys.2024.112940.
[18] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, et al.,
A survey on hallucination in large language models: Principles, taxonomy, challenges, and open
questions, ACM Transactions on Information Systems 43 (2025) 1–55.
[19] G. Xiao, L. Ren, G. Qi, H. Xue, M. D. Panfilo, D. Lanti, LLM4VKG: Leveraging large language
models for virtual knowledge graph construction, in: Proceedings of the 34th International Joint
Conference on Artificial Intelligence (IJCAI), 2025. To appear.
[20] G. Gottlob, C. Koch, R. Baumgartner, M. Herzog, S. Flesca, The Lixto data extraction project: back
and forth between theory and practice, in: Proceedings of the Twenty-Third ACM
SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’04, Association for
Computing Machinery, New York, NY, USA, 2004, pp. 1–12. doi:10.1145/1055558.1055560.
[21] D. Lembo, F. M. Scafoglieri, Ontology-based document spanning systems for information extraction,
Int. J. Semantic Comput. 14 (2020) 3–26. doi:10.1142/S1793351X20400012.
[22] A. Iglesias-Molina, D. V. Assche, J. Arenas-Guerrero, B. D. Meester, C. Debruyne, S. Jozashoori,
P. Maria, F. Michel, D. Chaves-Fraga, A. Dimou, The RML ontology: A community-driven modular
redesign after a decade of experience in mapping heterogeneous data to RDF, in: Proc. ISWC,
volume 14266 of LNCS, Springer, 2023, pp. 152–175. URL: https://doi.org/10.1007/978-3-031-47243-5_9.
doi:10.1007/978-3-031-47243-5_9.
[23] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-Muro,
G. Xiao, Ontop: Answering SPARQL queries over relational databases, Semantic Web J. (2016).
doi:10.3233/SW-160217.
[24] R. A. Elliott, E. Camacho, D. Jankovic, M. J. Sculpher, R. Faria, Economic analysis of the prevalence
and clinical and economic burden of medication error in England, BMJ Quality &amp; Safety 30 (2021)
96–105. URL: https://qualitysafety.bmj.com/content/30/2/96. doi:10.1136/bmjqs-2019-010206.
[25] G. P. Velo, P. Minuz, Medication errors: prescribing faults and prescription errors, British journal
of clinical pharmacology 67 (2009) 624–628.
[26] J. Rutter, R. Fitzpatrick, P. Rutter, What effect does medicine advice provided by UK Medicines
Information pharmacists have on prescriber practice and patient care: a qualitative primary care
study, Journal of evaluation in clinical practice 21 (2015) 307–312.
[27] Electronic medicines compendium, https://www.medicines.org.uk/emc/, Accessed: 2025-05-15.
[28] Medusa, the NHS injectable medicines guide, https://www.medusaimg.nhs.uk/, Accessed:
2025-05-15.
[29] DeepSeek-AI, Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning,
2025. URL: https://arxiv.org/abs/2501.12948. arXiv:2501.12948.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <article-title>Linking data to ontologies</article-title>
          ,
          <source>J. on Data Semantics</source>
          (
          <year>2008</year>
          )
          <fpage>133</fpage>
          -
          <lpage>173</lpage>
          . doi:10.1007/978-3-540-77688-8_5.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lanthaler</surname>
          </string-name>
          ,
          <article-title>RDF 1.1 Concepts and Abstract Syntax</article-title>
          ,
          <source>W3C Recommendation, W3C</source>
          ,
          <year>2014</year>
          . Available at http://www.w3.org/TR/rdf11-concepts/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          ,
          <article-title>SPARQL 1.1 Query Language</article-title>
          ,
          <source>W3C Recommendation, W3C</source>
          ,
          <year>2013</year>
          . Available at http://www.w3.org/TR/sparql11-query.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <article-title>R2RML: RDB to RDF Mapping Language</article-title>
          ,
          <source>W3C Recommendation, W3C</source>
          ,
          <year>2012</year>
          . Available at http://www.w3.org/TR/r2rml/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Botoeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cogrel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Corman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <article-title>Ontology-based data access - beyond relational sources</article-title>
          ,
          <source>Intelligenza Artificiale</source>
          <volume>13</volume>
          (
          <year>2019</year>
          )
          <fpage>21</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ruckhaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Priyatna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <article-title>Enhancing virtual ontology based access over tabular data with morph-csv</article-title>
          ,
          <source>Semantic Web</source>
          <volume>12</volume>
          (
          <year>2021</year>
          )
          <fpage>869</fpage>
          -
          <lpage>902</lpage>
          . doi:10.3233/SW-210432.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Slepicka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Szekely</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mannens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>de Walle</surname>
          </string-name>
          ,
          <article-title>Mapping hierarchical sources into RDF using the RML mapping language</article-title>
          ,
          <source>in: Proc. of ICSC</source>
          , IEEE,
          <year>2014</year>
          , pp.
          <fpage>151</fpage>
          -
          <lpage>158</lpage>
          . URL: https://doi.org/10.1109/ICSC.2014.25. doi:10.1109/ICSC.2014.25.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Assche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Arenas-Guerrero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Debruyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jozashoori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Maria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <article-title>RML ontology portal: Modular resources for the RDF mapping language</article-title>
          , https://kg-construct.github.io/rml-resources/portal/,
          <year>2023</year>
          . Accessed: 2025-06-05.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9] RML.io,
          <article-title>RMLMapper: A Java implementation for executing RML rules to generate Linked Data</article-title>
          , https://github.com/RMLio/rmlmapper-java,
          <year>2025</year>
          . Version 7.3.3, MIT License.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <collab>Joint Formulary Committee</collab>
          ,
          <article-title>British national formulary</article-title>
          , https://bnf.nice.org.uk/, Accessed: 2025-05-15.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sulem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P. B.</given-names>
            <surname>Veyseh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Sainz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Agirre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Heintz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Recent advances in natural language processing via large pre-trained language models: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>56</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hakala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pyysalo</surname>
          </string-name>
          ,
          <article-title>Biomedical named entity recognition with multilingual BERT</article-title>
          ,
          <source>in: Proc. of the 5th Workshop on BioNLP Open Shared Tasks, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>61</lpage>
          . URL: https://aclanthology.org/D19-5709/. doi:10.18653/v1/D19-5709.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Alt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hübner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hennig</surname>
          </string-name>
          ,
          <article-title>Improving relation extraction by pre-trained language representations</article-title>
          , arXiv preprint arXiv:1906.03088 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>KG-BERT: BERT for knowledge graph completion</article-title>
          , arXiv preprint arXiv:1909.03193 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.</given-names>
            <surname>Melnyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dognin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <article-title>Grapher: Multi-stage knowledge graph construction using pretrained language models</article-title>
          ,
          <source>in: NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ayoola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tyagi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fisher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christodoulopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pierleoni</surname>
          </string-name>
          ,
          <article-title>ReFinED: An efficient zero-shot-capable approach to end-to-end entity linking</article-title>
          , in: A. Loukina, R. Gangadharaiah, B. Min (Eds.),
          <source>Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track</source>
          , Association for Computational Linguistics, Hybrid: Seattle, Washington + Online,
          <year>2022</year>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>220</lpage>
          . URL: https://aclanthology.org/2022.naacl-industry.24/. doi:10.18653/v1/2022.naacl-industry.24.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ciatto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agiollo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Omicini</surname>
          </string-name>
          ,
          <article-title>Large language models as oracles for instantiating ontologies with domain-specific knowledge</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>310</volume>
          (
          <year>2025</year>
          ) 112940. URL: https://www.sciencedirect.com/science/article/pii/S0950705124015740. doi:10.1016/j.knosys.2024.112940.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>