<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Common Student Errors in SQL Query Formulation to Enhance Learning Support</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Davide Ponzini</string-name>
          <email>davide.ponzini@edu.unige.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abdolhamid Livani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanna Guerrini</string-name>
          <email>giovanna.guerrini@unige.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barbara Catania</string-name>
          <email>barbara.catania@unige.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Coccoli</string-name>
          <email>mauro.coccoli@unige.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Genoa</institution>
          ,
          <addr-line>Dipartimento di Informatica, Bioingegneria, Robotica e Ingegneria dei Sistemi, Genoa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Genoa, Dipartimento di Lingue e Culture Moderne</institution>
          ,
          <addr-line>Genoa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Learning SQL is challenging for many undergraduate students, leading to recurring errors during query formulation. This paper systematically analyzes common SQL errors made by undergraduate students enrolled in Database courses at the University of Genoa during the 2023-24 academic year. Employing a comprehensive taxonomy, we evaluated 561 student queries collected from written exams, laboratory assignments, and unsupervised contexts. Results reveal that syntax errors are the most prevalent, especially in exam settings where queries cannot be executed. Logical errors and complications were also frequently identified, underscoring issues not only with syntax but also with logical reasoning. These insights highlight the need for pedagogical strategies that balance syntactic mastery and logical problem-solving skills. The paper concludes by proposing directions for enhancing learning support through automated error detection and personalized, AI-driven tutoring tools.</p>
      </abstract>
      <kwd-group>
        <kwd>Data Education</kwd>
        <kwd>learning SQL</kwd>
        <kwd>SQL misconceptions</kwd>
        <kwd>educational data analysis</kwd>
        <kwd>computer-assisted learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In tertiary education, relational databases are an integral part of many undergraduate and postgraduate
degree programmes in computer science, computer engineering, data science, business informatics
and related subjects. The practical use of the Structured Query Language (SQL) is part of introductory
relational database courses in such programmes, as SQL is the standard language for manipulating
relational databases. Thus, students are asked to produce simple queries and, in some cases, to develop
more significant projects manipulating data in SQL.</p>
      <p>
        Despite its importance, many students find SQL difficult, and the queries they produce contain
repeated errors that reflect common misunderstandings. These difficulties include grasping the ideas
of relational databases, understanding difficult query structures, and applying theoretical ideas to
real-world problems. One of the main barriers to learning SQL is the transfer of knowledge acquired in
other contexts, such as mathematics and programming [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Students tend to apply familiar thinking
patterns and language structures to SQL, and errors can also result from previous experience with
other programming languages. In addition to syntax errors, errors can also occur at the semantic level,
revealing an incomplete mental model of SQL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In some cases, students see the database management
system as a black box without understanding its underlying principles. Typical concepts that are prone
to error are JOIN clauses, GROUP BY, sub-queries and self-joins [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ].
      </p>
      <p>
        Since the development of targeted teaching tactics relies on a deep analysis and understanding
of common mistakes, approaches proposed in the literature aim, on the one hand, to categorize
common errors into taxonomies [5] and, on the other, to identify the most common
misconceptions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>In this paper, we analyze the mistakes that our students make when learning SQL and categorize them
according to the most comprehensive established taxonomy [5]. The reference target consists of second-year
Bachelor students enrolled in database courses in the Computer Science and Computer Engineering
Bachelor Degrees at the University of Genoa in the academic year 2023-24. The research looks for trends in the errors
made by students, to gain insights into the areas that can be strengthened to improve student learning.
The classification involved 561 queries collected from 27 different information requests (i.e., request
specifications in natural language), formulated in different settings by a total of 81 students/teams.
Analysis of errors across different contexts and student groups highlights common difficulties in query
construction. Syntax errors are the most common, accounting for almost half of all errors, especially in settings
in which the students cannot execute the queries, highlighting the need to reinforce syntax rules and
debugging skills. Logical errors are also common, suggesting that students struggle not only with
syntax, but also with query logic, optimization, and interpreting the constraints in the request.</p>
      <p>CEUR Workshop, ISSN 1613-0073</p>
      <p>Although the size of the study is significant and allows for meaningful initial insights, automated
support is needed to assist large numbers of students throughout the courses and to avoid the risk of
errors inherent in manual activities. The paper also discusses how the classification can be exploited and
incorporated into other tools currently under development for assisting and enhancing student learning.</p>
      <p>The remainder of the paper is organized as follows. After discussing related work in Section 2, the
methodology adopted for the study is presented in Section 3. Results are presented and discussed in
Sections 4 and 5, respectively. The limitations and ongoing directions to enhance learning support are
discussed in Section 6 while Section 7 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>In this section, we first review the most relevant approaches related to the analysis of typical errors and
misconceptions and then briefly survey tools providing automatic feedback or proposing personalized
exercises and learning paths.</p>
      <sec id="sec-2-1">
        <title>2.1. Analysis of SQL Query Errors</title>
        <p>Learning SQL poses challenges for students, involving both syntax and semantics [6]. To effectively
address these issues, researchers emphasize the need for structured error categorization. Frameworks
like those proposed by Ahadi et al. [7] and Taipalus et al. [5] provide taxonomies that help identify,
analyze, and correct common student errors. This categorization aids in designing targeted instructional
materials and feedback strategies.</p>
        <p>
          Understanding the root causes of SQL errors is equally important [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Errors often stem from
misconceptions, lack of foundational knowledge, or cognitive mismatches due to SQL’s declarative
nature. Students with experience in procedural programming or mathematics may struggle to adapt
to SQL’s logic and structure [8]. Misconceptions also arise from linguistic transfer, such as misusing
keywords that resemble natural language or programming syntax [9].
        </p>
        <p>
          Research [
          <xref ref-type="bibr" rid="ref1 ref2 ref4">1, 2, 4</xref>
          ] highlights frequent errors in subqueries, GROUP BY, and JOIN clauses, as well as
improper use of primary keys and misunderstanding of DISTINCT. Additional issues include using
incorrect operators (e.g., == instead of =), failing to apply self-joins, or confusing keyword syntax.
Think-aloud studies [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and expert analyses [10] further underline the importance of addressing these
conceptual gaps to support effective SQL learning.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Automatic Feedback on SQL Queries and Personalization</title>
        <p>Several approaches and prototypes have been designed to provide automatic feedback and correction for
SQL teaching and learning. The tools aim to enhance learning outcomes by identifying errors, offering
constructive feedback, and supporting self-guided improvement [11]. Many approaches target automatic
correction and grading of SQL queries. The support ranges from the integration of CodeRunner in a
Learning Management System like Moodle for providing immediate feedback by executing the query
formulated by the student [12], to automatic grading systems [13, 14] and automatic correction based
on Large Language Models [15].</p>
        <p>The full potential of such tools in enhancing learning can only be leveraged if they extend beyond
grading efficiency by also providing tutoring capabilities to the students. Hint generation is, for instance,
the focus of [16, 17]. In [18], fine-tuned GPT models are used to detect and provide feedback on semantic
errors in SQL queries. Another possible use of generative AI, explored in [19], is automatic
exercise generation.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>In this study, a systematic analysis of SQL query errors produced by students is carried out. The
student-generated SQL queries were collected from the following sources:</p>
      <p>• Written exams from the Databases courses in Computer Science and Computer Engineering<sup>1</sup>.
In this context, students were not allowed to execute any queries or to use any supporting tools.</p>
      <p>• Laboratory assignments specifically designed for the Database course in Computer Science [20].
In this setting, students had the opportunity to execute queries, observe the corresponding results,
and compare them with the expected answers, thus allowing for iterative refinement.</p>
      <p>• The queries described in Section 4 of [21] (referred to as the Miedema dataset in the following),
which were collected in an unsupervised environment in which students autonomously decided whether
or not to execute their queries. In this setting, however, no reference results were provided
to students for comparison. We included this dataset to compare the errors
we detected with those presented in the thesis.</p>
      <p>Throughout the paper, we refer to the laboratory assignments and the Miedema dataset collectively as
interactive assignments. In both cases, students could execute their queries and observe
the corresponding outputs in real time. This interactive nature distinguishes these environments from
written exams, where students are required to write queries without any execution or feedback. Despite
this similarity, it is worth noting that the level of guidance differs: laboratory sessions are supervised
and provide students with expected results for comparison, while the Miedema dataset was collected in
an unsupervised context without reference solutions.</p>
      <p>Each collected query was subjected to a manual review process, which entailed testing the query
execution against the corresponding database to ascertain its correctness and identify any errors.
Queries were evaluated based on their ability to execute without raising errors, to produce the expected
result on a reference dataset, and their alignment with the requests outlined in the query specification.
The manual analysis entailed evaluating each query for specific errors by comparing the student queries
against the expected correct solutions.</p>
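      <p>The first two checks of the review described above (does the query execute, and does it return the
expected result on a reference dataset) can be approximated in code. The following is only a minimal sketch
and not the procedure used in this study, which was manual. It assumes Python's standard sqlite3 module,
a hypothetical toy table, and a hypothetical helper named check_query.</p>
      <preformat><![CDATA[
```python
import sqlite3
from collections import Counter


def check_query(conn, student_sql, reference_sql):
    """Return 'not executable', 'wrong result', or 'matches reference'."""
    try:
        student_rows = conn.execute(student_sql).fetchall()
    except sqlite3.Error:
        # The DBMS rejected the query, so it cannot be executed as written.
        return "not executable"
    reference_rows = conn.execute(reference_sql).fetchall()
    # Compare as multisets: row order is ignored, duplicate rows still count.
    if Counter(student_rows) == Counter(reference_rows):
        return "matches reference"
    return "wrong result"


# Hypothetical toy schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, degree TEXT);
    INSERT INTO student VALUES (1, 'Ada', 'CS'), (2, 'Bob', 'CE'), (3, 'Cy', 'CS');
    """
)
reference = "SELECT name FROM student WHERE degree = 'CS'"

# A misspelled table name, a wrong filter, and an equivalent formulation.
for sql in (
    "SELECT name FROM students WHERE degree = 'CS'",
    "SELECT name FROM student WHERE degree = 'CE'",
    "SELECT s.name FROM student AS s WHERE s.degree = 'CS' ORDER BY s.name DESC",
):
    print(check_query(conn, sql, reference))
```
]]></preformat>
      <p>Matching the reference result on a single dataset does not prove that a query is correct, and
unnecessary complexity cannot be detected this way, so a comparison against the request itself remains
necessary. The outcome also depends on the SQL dialect of the engine used for the check.</p>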
      <p>Errors were identified and categorized following the established and comprehensive framework by
Taipalus et al. [5], distinguishing four main categories:</p>
      <p>• Syntax errors: queries that are not executable due to incorrect SQL grammar.</p>
      <p>• Semantic errors: queries that can be executed but are incorrect, irrespective of the specific
query request.</p>
      <p>• Logical errors: queries that can be executed but are not aligned with the specific query request.</p>
      <p>• Complications: queries that accurately align with the specified query request, yet exhibit
unnecessary complexity. Note that this does not concern optimization, but rather avoidable
design choices, such as using unnecessary tables or overly elaborate constructs.</p>
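      <p>To make the four categories concrete, the sketch below runs one illustrative query per category
against a small in-memory SQLite database, using Python's standard sqlite3 module. The schema, the data, the
request, and the queries are all hypothetical and are not taken from the analyzed datasets. Each query is an
assumed instance of an error named in the taxonomy: a reference to a non-existing column, an inconsistent
expression, a missing logical expression, and a join with an unnecessary table.</p>
      <preformat><![CDATA[
```python
import sqlite3

# Hypothetical toy database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE exam (
        student_id INTEGER REFERENCES student(id),
        course TEXT,
        grade INTEGER,
        PRIMARY KEY (student_id, course)
    );
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO exam VALUES (1, 'DB', 30), (2, 'DB', 25), (3, 'DB', 28), (2, 'AI', 29);
    """
)

# Hypothetical request: "Return the ids of the students who scored
# at least 28 in the DB exam."
queries = {
    "reference": "SELECT student_id FROM exam WHERE course = 'DB' AND grade >= 28",
    # Syntax error: the column studentid does not exist.
    "syntax error": "SELECT studentid FROM exam WHERE course = 'DB' AND grade >= 28",
    # Semantic error: the condition is inconsistent for any request.
    "semantic error": "SELECT student_id FROM exam WHERE course = 'DB' AND course = 'AI'",
    # Logical error: executable and sensible, but the course filter is missing.
    "logical error": "SELECT student_id FROM exam WHERE grade >= 28",
    # Complication: correct result, but the join with student is unnecessary.
    "complication": (
        "SELECT e.student_id FROM exam AS e JOIN student AS s ON s.id = e.student_id "
        "WHERE e.course = 'DB' AND e.grade >= 28"
    ),
}


def run(sql):
    """Return the sorted ids selected by sql, or None if it cannot be executed."""
    try:
        return sorted(row[0] for row in conn.execute(sql))
    except sqlite3.Error:
        return None


results = {label: run(sql) for label, sql in queries.items()}
for label, ids in results.items():
    print(f"{label}: {ids}")
```
]]></preformat>
      <p>In this sketch only the syntax error is rejected by the database engine. The semantic and logical
errors return a result that differs from the reference one, while the complication returns the same result
and can only be recognized by inspecting how the query is written.</p>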
      <p>Additionally, queries were categorized based on the complexity of the request: simple queries involve
basic selection, filtering, and limited joins; medium-complexity queries introduce aggregation,
sorting, or conditional logic; hard queries involve advanced SQL features such as complex joins, nested
conditions, and multi-table aggregations.</p>
      <p><sup>1</sup> Note that the assignments of the written exams are different, as are the corresponding syllabi. Thus, this study
by no means aims to compare the two groups of students.</p>
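      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th />
              <th>Computer Engineering Exam</th>
              <th>Computer Science Exam</th>
              <th>Computer Science Laboratories</th>
              <th>Miedema dataset [21]</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Student Group (Degree)</td>
              <td>Computer Engineering</td>
              <td>Computer Science</td>
              <td>Computer Science</td>
              <td>Computer Science</td>
            </tr>
            <tr>
              <td>Number of Requests</td>
              <td>3</td>
              <td>8</td>
              <td>9</td>
              <td>7</td>
            </tr>
            <tr>
              <td>Number of Subjects (Students/Teams)</td>
              <td>12</td>
              <td>25</td>
              <td>23</td>
              <td>21</td>
            </tr>
            <tr>
              <td>Language</td>
              <td>Italian</td>
              <td>Italian</td>
              <td>Italian</td>
              <td>English</td>
            </tr>
            <tr>
              <td>Data Collection Method</td>
              <td>Written Exam</td>
              <td>Written Exam</td>
              <td>Online Submission</td>
              <td>Online Submission</td>
            </tr>
            <tr>
              <td>Answer Time</td>
              <td>Limited</td>
              <td>Limited</td>
              <td>Not Limited</td>
              <td>Not Limited</td>
            </tr>
            <tr>
              <td>Sample Data Provided</td>
              <td>No</td>
              <td>No</td>
              <td>Yes</td>
              <td>Yes</td>
            </tr>
            <tr>
              <td>Work in Teams</td>
              <td>No</td>
              <td>No</td>
              <td>Yes</td>
              <td>No∗</td>
            </tr>
            <tr>
              <td>Possibility of Executing the Queries</td>
              <td>No</td>
              <td>No</td>
              <td>Yes</td>
              <td>Yes</td>
            </tr>
            <tr>
              <td>Expected Results Available</td>
              <td>No</td>
              <td>No</td>
              <td>Yes</td>
              <td>No</td>
            </tr>
          </tbody>
        </table>
        <table-wrap-foot>
          <p>∗ The queries were formulated in an unsupervised context, precluding the possibility of determining whether
students worked alone.</p>
        </table-wrap-foot>
      </table-wrap>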
      <p>This structured approach allowed us to systematically record, categorize, and analyze errors,
facilitating detailed statistical analysis and deeper insights into the error patterns and common misconceptions
encountered by students. For a more thorough exposition on the modalities of data collection, refer to
Table 1. Further details are presented in [22].</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>We assigned 724 labels across 561 queries, identifying a total of 536 SQL errors, as well as 188 correct
queries. A detailed breakdown of the error labels across the four datasets is presented in Table 2.</p>
      <p>The analysis revealed that 66% of the queries contained at least one error or complication. Specifically,
37% of all queries exhibited at least one syntax error, 9% exhibited at least one semantic error, 22%
exhibited at least one logical error, and 18% exhibited at least one complication, as illustrated in the
Overall category of Figure 2. It is important to note that these categories are not mutually exclusive; a
single query may exhibit multiple types of errors.</p>
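      <p>The figures above can be cross-checked with simple arithmetic, and the per-category shares follow
from counting each category at most once per query. The sketch below is illustrative only: the totals are
those reported in the text, while the labelled queries, the category codes, and the helper name
share_per_category are hypothetical. The label total is consistent with one label per correct query plus one
label per error.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Totals reported in the text above.
total_queries = 561
correct_queries = 188
error_labels = 536

# Assumption: each correct query carries one label, each error carries one label.
total_labels = error_labels + correct_queries
share_with_issue = (total_queries - correct_queries) / total_queries
print(total_labels, round(share_with_issue * 100))

# Hypothetical labelled queries: query id mapped to the labels assigned to it.
labels_by_query = {
    "q1": ["SYN", "SYN", "LOG"],  # two syntax errors count once for q1
    "q2": [],                     # no error label: a correct query
    "q3": ["COM"],
    "q4": ["SYN", "SEM"],
}


def share_per_category(labels_by_query):
    """Share of queries showing at least one label of each category."""
    counts = Counter()
    for labels in labels_by_query.values():
        counts.update(set(labels))  # deduplicate categories within a query
    n = len(labels_by_query)
    return {category: counts[category] / n for category in sorted(counts)}


shares = share_per_category(labels_by_query)
print(shares)
```
]]></preformat>
      <p>Because a query can fall into several categories, the per-category shares may add up to more than
the share of queries with at least one error or complication, as they do for the percentages reported above.</p>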
      <p>Figure 1 shows the error distributions overall and by each context, comparing the different modalities
and student groups. Overall, syntax errors are the most prevalent type of error, followed by logical errors
and complications. Semantic errors are the least prevalent type of error. The prevalent types of errors
across different contexts are shown in Table 3. The most prevalent errors and complications exhibited by
both student groups are analogous to the overall errors previously delineated. A finer-grained analysis
of the most common errors and misconceptions detected in these queries is presented in [22].</p>
      <p>As illustrated in Figure 2, the error type distribution for each query is compared across different
modalities and student groups. It is important to note that a single query can exhibit multiple error
types. In instances where a query exhibits multiple errors of the same type, it is considered as a single
occurrence. Correct queries are defined as those that do not present any error or complication.</p>
      <table-wrap id="tab3">
        <label>Table 3</label>
        <table>
          <thead>
            <tr>
              <th />
              <th>Most Common Errors</th>
              <th>Most Common Complications</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Overall</td>
              <td>SYN: referencing a non-existing schema (34)<break />
LOG: superfluous columns in SELECT (33)<break />
LOG: omission of requested columns (30)<break />
SYN: referencing a non-existing column (29)<break />
SEM: returning many duplicate rows (27)<break />
SYN: use of non-standard keywords (21)<break />
SYN: use of non-standard operators (20)<break />
SEM: tautological or inconsistent expression (19)<break />
LOG: missing logical expression (19)<break />
SYN: invoking a non-existing function (14)</td>
              <td>Join with unnecessary tables (48)<break />
Unnecessary DISTINCT (16)</td>
            </tr>
            <tr>
              <td>Written</td>
              <td>SYN: referencing a non-existing schema (34)<break />
SYN: referencing a non-existing column (26)<break />
SYN: use of non-standard keywords (21)<break />
SYN: use of non-standard operators (20)<break />
LOG: omission of requested columns (19)</td>
              <td>Join with unnecessary tables (7)</td>
            </tr>
            <tr>
              <td>Interactive</td>
              <td>SEM: returning too many duplicate rows (27)<break />
LOG: superfluous columns in SELECT (26)<break />
LOG: missing logical expression (18)<break />
LOG: extraneous logical expression (12)<break />
SYN: confusing the logic of keywords (12)</td>
              <td>Join with unnecessary tables (41)<break />
Unnecessary DISTINCT (14)</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Overall, syntax errors affect the largest share of queries (37% exhibit at least one). After excluding
these queries, the majority of the remaining ones present no errors or complications. Logical errors are also
frequent, followed by complications. The least prevalent error type is semantic errors. During written
exams most queries contain syntax errors, with a mean value of 0.83 syntax errors per query. At the
same time they present few semantic or logical errors. Complications are uncommon, with a mean
value of 0.08 complications per query. These findings are supported by the data presented in Table 2.</p>
      <p>Interactive assignments present a different distribution: most of them are completely correct, followed
by logical errors and complications. A small number of queries present syntax errors. The least common
error type is semantic errors. Syntax, logical and semantic errors are more common among Computer
Engineering students than among Computer Science students. By contrast, both student groups introduce
complications with similar frequency. Interestingly, in the data we analyzed from Computer
Engineering students, we were unable to find a query that did not present any error or complication. It
is important to note that the queries and experimental settings differ between the two student groups.
Furthermore, the Computer Engineering dataset is considerably smaller than the Computer Science one.
Consequently, it is not feasible to draw any direct conclusions regarding their comparative performance.</p>
      <p>Figure 3 analyzes error distribution by query difficulty. A close examination of the data reveals that
syntax errors are more prevalent in complex queries, while logical errors are more prevalent in simpler
queries. Semantic errors appear to be more frequent in medium-complexity queries. Complications are
present in all queries, regardless of their complexity.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The analysis of SQL errors across various educational contexts and student groups provides significant
insights into common issues students encounter when constructing database queries. The prevalence
of errors identified across contexts underscores the challenges students face, particularly with syntax
and logical reasoning in query construction.</p>
      <p>Overall, syntax errors emerged as the most common type of error, representing nearly half of all
errors. This highlights a fundamental issue students experience in mastering SQL syntax, suggesting
a need for reinforcing syntax rules and debugging skills in teaching strategies. Logical errors and
complications were also frequent, emphasizing that students not only struggle with basic syntax but
also with query logic, optimization, and adherence to problem constraints.</p>
      <p>When comparing written and interactive assignment modalities, a distinct pattern emerges. Written
tests predominantly exhibited syntax errors, indicative of difficulties in recalling precise SQL syntax
without interactive feedback. The high occurrence of referencing non-existent database objects suggests
that students might benefit from additional practice in accurately memorizing and understanding
database schemas. Conversely, interactive assignments showed fewer syntax errors but more logical
errors and complications, suggesting that while interactive environments reduce syntactical mistakes,
students might over-complicate solutions or misunderstand query requirements when immediate
feedback is available. Therefore, teaching methodologies could integrate both written and interactive
elements to balance syntax learning and logical problem-solving.</p>
      <p>Analysis by query difficulty reveals additional trends. Complex queries correlated strongly with
increased syntax errors, pointing to students’ struggles with managing intricate query structures.
Conversely, simpler queries saw increased logical errors, indicating complacency or misunderstandings
about fundamental database query logic. Semantic errors were notably higher in queries of medium
complexity, perhaps because intermediate queries involve nuanced schema understanding and data
relationships without the explicit complexity of more challenging problems. Educational approaches
could thus differentiate between levels of query difficulty, tailoring instructions and practice exercises
to student needs at each complexity level.</p>
      <p>The presence of complications across all difficulty levels suggests a pervasive inclination toward
unnecessarily complex query formulations. This underscores a pedagogical opportunity to reinforce
principles of efficient and simplified query design, potentially through examples contrasting complicated
and straightforward solutions.</p>
      <p>
        With regard to Miedema’s queries, we found only a partial correspondence with the problems
highlighted in [
        <xref ref-type="bibr" rid="ref1">1, 21</xref>
        ], perhaps also due to the different settings in which the queries were formulated.
      </p>
      <p>In conclusion, our findings advocate for pedagogical adjustments, including enhanced syntax training,
targeted logical reasoning exercises, and focused interventions based on student backgrounds and
assignment modalities. Such targeted educational strategies promise to effectively address the identified
challenges, thereby improving overall proficiency in SQL query formulation among students.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Limitations &amp; Ongoing Research</title>
      <p>This study presents some limitations that must be acknowledged. The primary limitation is the restricted
amount of data, particularly from Computer Engineering students, which impacts the generalizability
of our findings. The dataset from Computer Engineering students was notably smaller than that from
Computer Science students, potentially skewing the analysis toward issues predominant within a single
group. Additionally, the necessity of manually examining each query introduces potential for human
error or bias in error categorization. Manual evaluation, while thorough, is time-intensive and not
scalable to larger datasets without introducing considerable resource constraints.</p>
      <p>Based on the findings of this study, future research focuses on the development and assessment of
targeted educational interventions using innovative technologies. Currently, we are developing three
distinct tools (further details can be found in [23]):</p>
      <p>• Automatic Error Identification and Categorization Tool: This tool automatically analyzes
SQL queries to detect and classify errors according to established taxonomies. It can be
used to analyze students’ weaknesses in more depth and to allow educators to rapidly pinpoint
and address students’ specific problem areas.</p>
      <p>• Generative AI-based Assignment Creator: Leveraging generative AI, this tool generates
tailored SQL assignments reflecting students’ personal interests and targeting common error
patterns. This personalized approach aims to improve student engagement and targeted skill
development.</p>
      <p>• Interactive AI-based SQL Tutor: An AI-driven interactive tutor designed to provide
personalized guidance and immediate feedback on SQL queries. The tutor interacts dynamically with
students, addressing misconceptions and facilitating a deeper understanding of SQL concepts.</p>
      <p>These tools are anticipated to significantly enhance the precision of educational interventions,
enabling educators to more effectively address specific SQL learning difficulties and misconceptions.</p>
      <p>As far as the analysis is concerned, it can be expanded to other degrees offering a database course, as
well as to other universities and high schools, ideally on a common set of reference queries. Further
refinements relate to the investigation of the impact of previous knowledge, both in terms of previous
knowledge of SQL (e.g., students who have studied SQL during high school versus those who have
not) and of experience with other programming languages, which can differ among students of
different degrees.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>This study analyses common SQL errors made by undergraduate students, revealing significant
challenges with syntax and logic. Syntax errors were most common in exams, while interactive assignments
highlighted logical errors. Computer Engineering students had higher error rates, probably due to
less exposure to SQL. These findings highlight the need for tailored teaching approaches. To address
these issues, we are developing AI-driven tools for error detection, personalized practice and real-time
feedback to improve SQL learning and student proficiency.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and DeepL for: abstract drafting,
drafting content, paraphrasing and rewording, improving writing style, and grammar and spelling checks. After
using these tools, the authors reviewed and edited the content as needed and take full responsibility for
the publication’s content.</p>
      <p>[5] T. Taipalus, M. Siponen, T. Vartiainen, Errors and complications in SQL query formulation, ACM
Transactions on Computing Education (TOCE) 18 (2018) 1–29.</p>
      <p>[6] P. Garner, J. Mariani, Learning SQL in steps, Learning 12 (2015) 23.</p>
      <p>[7] A. Ahadi, J. Prior, V. Behbood, R. Lister, Students’ semantic mistakes in writing seven different
types of SQL queries, in: Proceedings of the 2016 ACM Conference on Innovation and Technology
in Computer Science Education, 2016, pp. 272–277.</p>
      <p>[8] H. Al-Shuaily, SQL pattern design, development &amp; evaluation of its efficacy, Ph.D. thesis, University
of Glasgow, 2013.</p>
      <p>[9] D. Traversaro, Insegnare SQL a chi non ha mai programmato: Analisi delle misconcenzioni, in:
ITADINFO Secondo Convegno Italiano sulla Didattica dell’Informatica (In Italian), 2024.</p>
      <p>[10] D. Miedema, G. Fletcher, E. Aivaloglou, Expert Perspectives on Student Errors in SQL, ACM Trans.
Comput. Educ. 23 (2022).</p>
      <p>[11] C. Kenny, C. Pahl, Automated tutoring for a database skills training environment, in: Proceedings
of the 36th SIGCSE Technical Symposium on Computer Science Education, 2005, pp. 58–62.</p>
      <p>[12] A. Wójtowicz, M. Prill, Relational Database Courses with CodeRunner in Moodle: Extending SQL
Programming Assignments to Client-Server Database Engines, in: Proceedings of the 56th ACM
Technical Symposium on Computer Science Education V. 1, 2025, pp. 1239–1245.</p>
      <p>[13] K. Manikani, R. Chapaneri, D. Shetty, D. Shah, SQL Autograder: Web-based LLM-powered
Autograder for Assessment of SQL Queries, International Journal of Artificial Intelligence in
Education (2025) 1–31.</p>
      <p>[14] B. Chandra, B. Chawda, B. Kar, K. M. Reddy, S. Shah, S. Sudarshan, Data generation for testing
and grading SQL queries, The VLDB Journal 24 (2015) 731–755.</p>
      <p>[15] Z. Chen, S. Chen, M. White, R. Mooney, A. Payani, J. Srinivasa, Y. Su, H. Sun, Text-to-SQL error
correction with language models of code, arXiv preprint arXiv:2305.13073 (2023).</p>
      <p>[16] C. Kleiner, F. Heine, Enhancing Feedback Generation for Autograded SQL Statements to Improve
Student Learning, in: Proceedings of the 2024 on Innovation and Technology in Computer Science
Education V. 1, ITiCSE 2024, Association for Computing Machinery, New York, NY, USA, 2024, pp.
248–254.</p>
      <p>[17] Y. Hu, A. Gilad, K. Stephens-Martinez, S. Roy, J. Yang, Qr-hint: Actionable hints towards correcting
wrong sql queries, Proceedings of the ACM on Management of Data 2 (2024) 1–27.</p>
      <p>[18] A. AlRabah, S. Yang, A. Alawini, Optimizing Database Query Learning: A Generative AI
Approach for Semantic Error Feedback, in: ASEE Annual Conference and Exposition, Conference
Proceedings, American Society for Engineering Education, 2024.</p>
      <p>[19] W. Aerts, G. Fletcher, D. Miedema, A Feasibility Study on Automated SQL Exercise Generation
with ChatGPT-3.5, in: Proceedings of the 3rd International Workshop on Data Systems Education:
Bridging education practice with education research, 2024, pp. 13–19.</p>
      <p>[20] B. Catania, G. Guerrini, D. Traversaro, Collaborative learning in an introductory database course:
A study with think-pair-share and team peer review, in: Proceedings of the 1st International
Workshop on Data Systems Education, DataEd ’22, 2022, pp. 60–66.</p>
      <p>[21] D. E. Miedema, On learning SQL: Disentangling concepts in data systems education, 2024. URL:
https://pure.tue.nl/ws/portalfiles/portal/314131184/20240112_Miedema_hf.pdf.</p>
      <p>[22] A. Livani, Do the errors produced by generative AI in formulating queries reflect students’
misconceptions in learning SQL?, Master’s thesis, MSc in Computer Science, University of Genova,
2024. URL: https://unire.unige.it/bitstream/handle/123456789/10623/tesi31530643.pdf?sequence=1.</p>
      <p>[23] D. Ponzini, B. Catania, G. Guerrini, Leveraging Frequent Errors, Generative AI, and Peer
Collaboration to Enhance SQL Learning, Submitted for publication, 2025.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Miedema</surname>
          </string-name>
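          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Aivaloglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fletcher</surname>
          </string-name>
          ,
          <article-title>Identifying SQL misconceptions of novices: Findings from a think-aloud study</article-title>
          ,
          <source>ACM Inroads</source>
          <volume>13</volume>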
          (
          <year>2022</year>
          )
          <fpage>52</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Brass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Goldberg</surname>
          </string-name>
          ,
          <article-title>Semantic errors in SQL queries: A quite complete list</article-title>
          ,
          <source>Journal of Systems and Software</source>
          <volume>79</volume>
          (
          <year>2006</year>
          )
          <fpage>630</fpage>
          -
          <lpage>644</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Taipalus</surname>
          </string-name>
          ,
          <article-title>Explaining causes behind SQL query formulation errors</article-title>
          ,
          <source>in: 2020 IEEE Frontiers in Education Conference (FIE)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Presler-Marshall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Heckman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stolee</surname>
          </string-name>
          ,
          <article-title>SQLRepair: Identifying and repairing mistakes in student-authored SQL queries</article-title>
          ,
          <source>in: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>199</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>