<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Preface of the First International TEXT2SPARQL Challenge (TEXT2SPARQL'25)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edgard Marx</string-name>
          <email>edgard.marx@htwk-leipzig.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paulo Viviurka do Carmo</string-name>
          <email>paulo.carmo@htwk-leipzig.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcos Gôlo</string-name>
          <email>marcosgolo@usp.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Tramp</string-name>
          <email>sebastian.tramp@eccenca.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>First International TEXT2SPARQL Challenge, Co-Located with Text2KG at ESWC25</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Leipzig University of Applied Sciences</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of São Paulo</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>eccenca GmbH</institution>
          ,
          <addr-line>Hainstr. 8, 04109 Leipzig, Germany (corresponding editor)</addr-line>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Adrian Brasoveanu (Modul University Vienna, Austria) • Aidan Hogan (DCC, Universidad de Chile) • Axel Ngonga (University of Paderborn, Germany) • Andreas Both (HTWK, Germany) • Gong Cheng (Nanjing University, China) • Gustavo Publio (Schwarz IT, Germany) • Muhammad Saleem (University of Paderborn, Germany) • Ricardo Usbeck (Leuphana Universität Lüneburg, Germany) • Ricardo Marcondes Marcacini (USP, Brazil) • Sanju Tiwari (Sharda University, India)</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The TEXT2SPARQL challenge invited researchers to participate by sharing an endpoint capable of translating natural language questions into SPARQL queries. The challenge procedure consisted of three steps: registration, ask, and evaluation, as illustrated in Figure 1. This preface presents information about the challenge, the accepted papers, the new benchmark datasets, the evaluation metrics, and the ranking procedure in the following sections.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>https://aksw.org/SebastianTramp (S. Tramp)</p>
      <p>CEUR Workshop Proceedings (ISSN 1613-0073)</p>
    </sec>
    <sec id="sec-2">
      <title>New Benchmark Datasets</title>
      <p>
        The TEXT2SPARQL challenge introduced 250 new question/query pairs over two new benchmark
datasets: DB25, a DBpedia benchmark with English and Spanish queries over the 2015-10 core, and
CK25, a corporate dataset with a showcase ontology built from scratch to demonstrate the capabilities
of eccenca Corporate Memory. For the DBpedia benchmark, 200 question/query pairs were created
by automatically modifying pairs from QALD 1-8 and LC-QuAD 1.0. These queries were then rewritten
using GPT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and manually checked and modified to improve their syntax and semantics, as shown in Figure
2. After the human check stage, 35 pairs were deemed initially valid; the remaining 165 were further
checked and modified until 100 question/query pairs were reached. Finally, these questions were translated
into Spanish. For the corporate dataset, 50 question/query pairs were manually curated, considering
classic stakeholders. For details on this new dataset, refer to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It is essential to mention that for both
endpoints, we tried to use different SPARQL querying strategies (e.g., ASK, GROUP BY, ORDER BY) in
order to balance the endpoints’ evaluation.
      </p>
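      <p>To illustrate the kind of entry the benchmarks contain, here is a minimal sketch of a question/query pair together with a keyword check for the balancing strategies mentioned above. The pair itself is hypothetical, not taken from the released datasets, and the dictionary layout is an assumption for illustration only.</p>

```python
# Hypothetical DB25-style question/query pair (illustrative only; the
# real dataset entries and their YAML schema may differ).
pair = {
    "question": {
        "en": "How many films did Alfred Hitchcock direct?",
        "es": "¿Cuántas películas dirigió Alfred Hitchcock?",
    },
    "query": (
        "SELECT (COUNT(?film) AS ?count) WHERE { "
        "?film <http://dbpedia.org/ontology/director> "
        "<http://dbpedia.org/resource/Alfred_Hitchcock> }"
    ),
}

# The challenge balanced SPARQL strategies (ASK, GROUP BY, ORDER BY, ...);
# a simple case-insensitive keyword check classifies which one a query uses.
def uses_strategy(query: str, keyword: str) -> bool:
    return keyword.upper() in query.upper()

print(uses_strategy(pair["query"], "COUNT"))     # aggregation query
print(uses_strategy(pair["query"], "ORDER BY"))  # not an ordering query
```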
      <p>QALD: https://github.com/ag-sc/QALD; LC-QuAD: https://github.com/AskNowQA/LC-QuAD</p>
    </sec>
    <sec id="sec-3">
      <title>Evaluation Metrics and Ranking</title>
      <p>
        The pipeline presented in Figure 3 was used to evaluate the teams. We used Pytrec_eval [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], an
information retrieval evaluation tool, to compute information retrieval measures. The challenge team
obtained the gold question/query pairs from the YAML datasets and collected the predicted queries
from the participants’ endpoints. Both the gold and the predicted queries were then sent to the SPARQL
endpoints, and the retrieved answers were saved in JSON format. Each result was transformed into the
Pytrec_eval standard format, consisting of gold and predicted lists. Finally, Pytrec_eval compared the
two lists, which enabled us to calculate the metrics used for the final ranking.
      </p>
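      <p>The conversion from a SPARQL JSON answer into a flat list, as used before handing the data to Pytrec_eval, can be sketched as follows. The helper name and the exact flattening rule are assumptions for illustration; the JSON layout follows the standard SPARQL 1.1 query-results format.</p>

```python
# Minimal sketch of the list-conversion step (the helper name is
# illustrative, not the challenge team's actual code).
def bindings_to_list(sparql_json: dict) -> list[str]:
    """Flatten a SPARQL SELECT JSON result into a list of value strings."""
    rows = sparql_json["results"]["bindings"]
    var_names = sparql_json["head"]["vars"]
    return [row[v]["value"] for row in rows for v in var_names if v in row]

# A toy gold answer in SPARQL 1.1 JSON results format.
gold_json = {
    "head": {"vars": ["film"]},
    "results": {"bindings": [
        {"film": {"type": "uri", "value": "http://dbpedia.org/resource/Psycho"}},
        {"film": {"type": "uri", "value": "http://dbpedia.org/resource/Vertigo"}},
    ]},
}

print(bindings_to_list(gold_json))
```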
      <p>This challenge uses the precision, recall, and F1 metrics. Precision and recall are defined in Equations
(1) and (2): precision is the proportion of retrieved documents that are relevant to the user, and recall is
the proportion of relevant documents that were retrieved. F1 is the evenly weighted harmonic mean of
precision and recall, as shown in Equation (3).</p>
      <p>precision = |{relevant documents} ∩ {retrieved documents}| / |{retrieved documents}|   (1)</p>
      <p>recall = |{relevant documents} ∩ {retrieved documents}| / |{relevant documents}|   (2)</p>
      <p>F1 = 2 · precision · recall / (precision + recall)   (3)</p>
      <p>There are queries in both datasets where the order of the results matters, as indicated by a flag
in the YAML files. In these cases, we calculate nDCG, a normalized measure of the Discounted Cumulative
Gain (DCG) metric. The DCG considers the position i at which a document was retrieved and penalizes
its relevance rel_i by a logarithmically proportional reduction, as shown in Equation (4). To calculate the
nDCG, the DCG value is divided by the ideal DCG (IDCG), obtained by ranking the relevant documents in
the best possible order; the final score is obtained by averaging the nDCG scores of all retrieved
documents.</p>
      <p>DCG_p = Σ_{i=1}^{p} rel_i / log2(i + 1)   (4)</p>
      <p>Finally, the organizers compute the overall metric as the average of the F1 measure over every
question, except for those flagged as order-sensitive, for which nDCG is used instead, as shown in
Equation (5), where n is the number of questions. All these steps guarantee a final value between 0 and 1
that considers both the maximum retrieval of relevant documents and the order in which the documents
were retrieved.</p>
      <p>score = (1/n) Σ_{i=1}^{n} { nDCG_i if question i is order-sensitive; F1_i otherwise }   (5)</p>
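      <p>A minimal plain-Python sketch of these metrics, using binary relevance. The challenge itself computed them via Pytrec_eval; this standalone version only mirrors Equations (1)-(4) for clarity.</p>

```python
import math

def precision_recall_f1(relevant: set, retrieved: list):
    """Set-based precision, recall, and F1 (Equations 1-3)."""
    hits = len(relevant & set(retrieved))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def ndcg(relevant: set, retrieved: list) -> float:
    """Binary-relevance nDCG (Equation 4): DCG divided by the ideal DCG."""
    # Position i is 0-based here, so the discount is log2(i + 2).
    dcg = sum(1 / math.log2(i + 2)
              for i, doc in enumerate(retrieved) if doc in relevant)
    idcg = sum(1 / math.log2(i + 2) for i in range(len(relevant)))
    return dcg / idcg if idcg else 0.0

p, r, f1 = precision_recall_f1({"a", "b", "c"}, ["a", "d", "b"])
print(round(p, 4), round(r, 4), round(f1, 4))   # 0.6667 0.6667 0.6667
print(round(ndcg({"a", "b", "c"}, ["a", "d", "b"]), 4))
```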
      <sec id="sec-3-1">
        <title>Text2SPARQL baseline</title>
        <p>
          In recent years, large language models (LLMs) have become a central tool in text mining tasks due to
their ability to understand, synthesize, and generate natural language with high accuracy [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In tasks
involving the translation of natural language into formal representations, such as generating SPARQL
queries from natural language questions (text-to-SPARQL), LLMs have demonstrated SoTA results [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
These models can be explored in various ways, ranging from the direct use of pre-trained versions to more sophisticated
approaches involving task-specific fine-tuning. As part of the challenge proposed in this workshop,
open-source LLMs were used as baselines, evaluating their performance in controlled and reproducible
settings to provide a solid foundation for comparison among participants.
        </p>
        <p>There is a need for datasets that accurately and diversely represent the target task, enabling effective
fine-tuning of language models. In the context of this challenge, the focus is on generating SPARQL
queries that target the DBpedia ontology. Over the past years, several datasets have been proposed for
the text-to-SPARQL task using DBpedia as a reference. Among them, we highlight four main sources
employed in our preparation: QALD 1-9, LC-QuAD 1.0, ParaQA, and Question-Sparql. These datasets
were merged into a unified corpus to train our models robustly. Only queries in English and Spanish
were used in these datasets, as these languages were the focus of the challenge. The organizers applied
a preprocessing pipeline that involved filtering out inconsistent, duplicate, or non-executable SPARQL
queries when tested against the DBpedia endpoint adopted in the challenge. This process ensured that
the training data reliably reflected the constraints and characteristics of the target knowledge base.</p>
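        <p>The preprocessing step described above can be sketched as follows. The function name and the validator are stand-ins: the organizers executed queries against the challenge's DBpedia endpoint, whereas this sketch uses a trivial bracket-balance check in its place.</p>

```python
# Sketch of the corpus-preprocessing pipeline (names are illustrative;
# the real validator executed each query against the DBpedia endpoint).
from typing import Callable

def clean_corpus(pairs: list[dict],
                 is_executable: Callable[[str], bool]) -> list[dict]:
    """Drop duplicate and non-executable question/query pairs."""
    seen = set()
    cleaned = []
    for pair in pairs:
        key = (pair["question"], pair["query"])
        if key in seen:
            continue  # exact duplicate of an earlier pair
        seen.add(key)
        if not is_executable(pair["query"]):
            continue  # query fails against the target endpoint
        cleaned.append(pair)
    return cleaned

raw = [
    {"question": "Q1", "query": "SELECT ?x WHERE { ?x a ?y }"},
    {"question": "Q1", "query": "SELECT ?x WHERE { ?x a ?y }"},  # duplicate
    {"question": "Q2", "query": "SELECT ?x WHERE {"},            # broken
]
# Stub validator: balanced braces stand in for real endpoint execution.
print(len(clean_corpus(raw, lambda q: q.count("{") == q.count("}"))))
```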
        <p>
          Qwen 2.5, a high-performance open-source LLM, was selected for fine-tuning [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The
Unsloth library was used, which implements an efficient fine-tuning strategy based on QLoRA (Quantized
Low-Rank Adaptation) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Training was conducted for a total of 100 steps, using a learning rate of 0.001,
which was chosen to strike a balance between training time and result quality. During the generation
phase, the organizers evaluated the model’s performance using four different temperature values (0.01,
0.25, 0.5, and 0.75) to assess the impact of variability on SPARQL query generation. Our results showed
that intermediate temperature values, particularly 0.25 and 0.5, outperformed the other settings. These
values introduced a moderate level of diversity that helped the model produce more
accurate and contextually appropriate queries without compromising the syntactic correctness of the
SPARQL language. Table 1 presents our baseline results.
        </p>
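        <p>The role of the temperature sweep can be illustrated with a minimal softmax sketch (pure Python on toy logits, not the actual generation code): low temperatures concentrate probability mass on the top token, producing near-deterministic output, while higher temperatures spread it, introducing the diversity discussed above.</p>

```python
import math

def softmax_with_temperature(logits: list[float],
                             temperature: float) -> list[float]:
    """Temperature-scaled softmax over next-token logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # toy next-token logits
for t in (0.01, 0.25, 0.5, 0.75):  # the temperatures evaluated in the challenge
    probs = softmax_with_temperature(logits, t)
    print(t, round(max(probs), 3))  # top-token probability shrinks as t grows
```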
        <p>Our best baseline result was achieved using the fine-tuned Qwen 2.5 14B model. In comparison, our
worst baseline relied on the pretrained Qwen 2.5 7B model. The results demonstrate that fine-tuning
significantly improved model performance, leading to the generation of more accurate and contextually
appropriate SPARQL queries. We highlight that smaller fine-tuned models outperform larger pretrained
models. All three models and the constructed dataset are publicly available on Hugging Face:
• Text2SPARQL-S refers to the small version (7B), which requires approximately 6 GB of GPU
memory. Model: https://huggingface.co/aksw/text2sparql-S;
• Text2SPARQL-M denotes the medium version (14B), which requires approximately 11 GB of
GPU memory. Model: https://huggingface.co/aksw/text2sparql-M.</p>
        <sec id="sec-3-1-1">
          <p>QALD: https://github.com/ag-sc/QALD;
LC-QuAD: https://github.com/AskNowQA/LC-QuAD;
ParaQA: https://huggingface.co/datasets/Orange/paraqa-sparqltotext;
Question-Sparql: https://huggingface.co/datasets/julioc-p/Question-Sparql;
constructed training corpus: https://huggingface.co/datasets/aksw/Text2SPARQL-Raw</p>
          <p>Table 1: Results of the teams WSE, INFAI, IIS-Q, IIS-L, MIPT, AIFB∗, DBPEDIA-SC∗, DBPEDIA-CL∗, DBPEDIA-CG∗, and the Challenge Baseline, averaged over the evaluation scenarios Corporate, DBpedia en, DBpedia es, DBpedia, and Overall. Best is bold, second best is underlined, and third best is in italic.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Text2SPARQL Awards</title>
        <p>Considering datasets and languages, we have four categories for the Text2SPARQL challenge awards:
1. Corporate
1st INFAI: Daniel Gerber, Lorenz Bühmann, Lars-Peter Meyer, Felix Brei, Claus Stadler
2nd IIS-Q: Daniel Henselmann, Rene Dorsch, and Andreas Harth
3rd IIS-L: Daniel Henselmann, Rene Dorsch, and Andreas Harth
2. DBpedia English
1st INFAI: Daniel Gerber, Lorenz Bühmann, Lars-Peter Meyer, Felix Brei, Claus Stadler
2nd IIS-Q: Daniel Henselmann, Rene Dorsch, and Andreas Harth
3rd AIFB: Jan Wardenga and Tobias Käfer
3. DBpedia Spanish
1st WSE: Aleksandr Perevalov and Andreas Both
2nd AIFB: Jan Wardenga and Tobias Käfer
3rd MIPT: Oleg Somov, Daniil Berezin, and Roman Avdeev
4. Overall
1st WSE: Aleksandr Perevalov and Andreas Both
2nd INFAI: Daniel Gerber, Lorenz Bühmann, Lars-Peter Meyer, Felix Brei, Claus Stadler
3rd IIS-Q: Daniel Henselmann, Rene Dorsch, and Andreas Harth</p>
      </sec>
      <sec id="sec-3-3">
        <title>Organizing Committee</title>
        <p>• Edgard Marx, Leipzig University of Applied Sciences (HTWK), Germany
• Sebastian Tramp, eccenca GmbH, Germany
• Diego Moussallem, Paderborn University, Germany
• Paulo Viviurka do Carmo, Leipzig University of Applied Sciences (HTWK), Germany
• Marcos Paulo Silva Gôlo, University of São Paulo, Brazil</p>
      </sec>
      <sec id="sec-3-4">
        <title>Acknowledgements</title>
        <p>The editors would like to thank the advisory team, authors, program committee, and other organizers
for their ongoing support in making this event a success.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] OpenAI,
          <source>GPT-4 Technical Report</source>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2303.08774. arXiv:2303.08774.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Tramp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pietzsch</surname>
          </string-name>
          ,
          <article-title>The CK25 Corporate Knowledge Reference Dataset for Benchmarking Text 2 SPARQL Question Answering Approaches</article-title>
          ,
          <source>in: The 1st GOBLIN Workshop on Knowledge Graph Technologies, DBpedia Association</source>
          ,
          <year>2025</year>
          . URL: https://github.com/eccenca/ck25-dataset.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Van Gysel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Rijke</surname>
          </string-name>
          ,
          <article-title>Pytrec_eval: An extremely fast python interface to trec_eval</article-title>
          , in: SIGIR, ACM,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          , et al.,
          <article-title>A survey on evaluation of large language models</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Perevalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Both</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C.</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          ,
          <article-title>Multilingual question answering systems for knowledge graphs-a survey</article-title>
          ,
          <source>Semantic Web</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>2089</fpage>
          -
          <lpage>2124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Dang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          , et al.,
          <article-title>Qwen technical report</article-title>
          , arXiv preprint arXiv:2309.16609
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dettmers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pagnoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <article-title>QLoRA: Efficient finetuning of quantized LLMs</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>