<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Survey on Dataset Development Techniques for QA Systems ⋆</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Aicha</forename><surname>Aggoune</surname></persName>
							<email>aggoune.aicha@univ-guelma.dz</email>
							<affiliation key="aff0">
								<orgName type="department">Computer science department</orgName>
								<orgName type="institution">University 8th May</orgName>
								<address>
									<postCode>1945</postCode>
									<settlement>Guelma</settlement>
									<country key="DZ">Algeria</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="laboratory">LabSTIC Laboratory</orgName>
								<orgName type="institution">University 8th May</orgName>
								<address>
									<postCode>1945</postCode>
									<settlement>Guelma</settlement>
									<country key="DZ">Algeria</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Survey on Dataset Development Techniques for QA Systems ⋆</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">93381075D3F5C54836B8EF10581C58D2</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:11+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>QA systems</term>
					<term>Dataset development</term>
					<term>Metrics</term>
					<term>Techniques</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Question-answering (QA) systems are pivotal in natural language processing, driving advancements in conversational AI, virtual assistants, and automated knowledge retrieval. The quality and structure of datasets play a critical role in the performance, reliability, and adaptability of these systems. This paper presents a comprehensive review of dataset development techniques for QA systems. We classify these techniques into three categories: manual techniques, based on domain expertise and crowdsourcing; automatic techniques, divided into knowledge-based and machine learning-based methods; and innovative techniques based on data augmentation. We compare several important QA datasets according to different criteria, with a special focus on the evaluation metrics used to assess dataset quality. This study can guide practitioners in developing robust, high-quality datasets for future QA systems.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Natural language processing (NLP) has seen remarkable advancements in recent years, with question-answering (QA) systems emerging as one of the most impactful applications. QA systems, designed to retrieve precise answers from vast textual information, are now integral to technologies such as search engines, virtual assistants, and knowledge-based systems. The performance of these systems hinges not only on sophisticated algorithms and model architectures but also on the quality and relevance of the datasets used to train them. High-quality datasets provide the essential foundation for these models to understand complex language structures, reason over context, and accurately respond to user queries <ref type="bibr" target="#b0">[1]</ref>.</p><p>Developing robust datasets for QA is a complex and resource-intensive process. Key challenges in dataset development include ensuring data diversity and balancing language complexity. Various techniques have emerged to address these challenges, ranging from traditional manual annotation to innovative approaches based on data augmentation.</p><p>This paper aims to provide a comprehensive review of the techniques used in developing datasets for QA systems, focusing on their strengths, limitations, and areas of application. By systematically examining these methods, we seek to illuminate best practices and emerging trends in QA dataset development. Furthermore, this review addresses the importance of dataset validation and quality metrics, highlighting how they contribute to the reliability and effectiveness of QA systems. Ultimately, our goal is to guide researchers and practitioners in creating datasets that better serve the needs of future QA models, fostering continued innovation and performance improvements in the field.</p><p>The remainder of this paper is organized as follows: Section 2 introduces the theoretical foundations. Section 3 reviews techniques for dataset development. Section 4 presents a comparison between dataset structures. Section 5 describes important metrics for assessing datasets. Conclusions are drawn in the last section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Theoretical foundations</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Question-Answering systems</head><p>Question-answering (QA) systems offer an intuitive interface for querying vast stores of information across diverse data formats, including both structured and unstructured data in natural languages. These systems play a crucial role in transforming raw data into usable knowledge, enabling users to retrieve specific answers to questions rather than sifting through large documents or databases <ref type="bibr" target="#b1">[2]</ref>. QA systems are increasingly employed in applications ranging from customer support and virtual assistants to research and education, where they can quickly extract insights from sources such as documents, databases, and even multimedia content.</p><p>To operate effectively, QA systems need to handle the variability and complexity of natural language, requiring them to interpret nuanced questions and extract relevant answers accurately. This involves the integration of techniques from fields such as natural language processing (NLP), information retrieval (IR), and machine learning (ML). Additionally, QA systems must accommodate the inherent diversity in question formulations and adapt to different data types, including text documents, tables, knowledge graphs, and multimodal data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Closed-domain Question-Answering systems</head><p>Closed-domain Question-answering systems (CQA) are specialized to respond to queries within defined subject areas, such as sports, healthcare, education, or entertainment <ref type="bibr" target="#b2">[3]</ref>. These systems leverage domain-specific knowledge, often structured in detailed ontologies or databases, to streamline information retrieval and enhance accuracy in answering questions. The focus on a particular domain simplifies the task for natural language processing (NLP) models, as the system can utilize a well-defined vocabulary, set of concepts, and relationships unique to that domain. For example, in a medical QA system, structured knowledge about diseases, symptoms, and treatments can help the system precisely interpret and respond to health-related inquiries.</p><p>Unlike closed-domain systems, open-domain QA systems rely on vast, unstructured sources of information, such as large text corpora, encyclopedic databases (like Wikipedia), or even the internet itself, rather than predefined, domain-specific knowledge structures. This allows them to provide answers on diverse subjects, from historical events and scientific concepts to general trivia and current events.</p><p>Closed-domain QA systems are specifically tailored to operate in contexts where general-purpose, open-domain solutions may lack the required depth, precision, or contextual understanding <ref type="bibr" target="#b3">[4]</ref>. The development of high-quality datasets specifically tailored for QA systems is essential to training models that are reliable, accurate, and generalizable across domains. These datasets need to account for linguistic diversity, context sensitivity, and a wide range of question types, from simple fact-based queries to complex, reasoning-based questions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Techniques of dataset development for CQA systems</head><p>A variety of techniques have been developed to construct datasets for question-answering (QA) systems, each designed to address particular challenges in generating comprehensive and high-quality data for training and evaluation purposes. In this survey, we categorize these techniques into three main types: manual methods, automated methods, and innovative approaches.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Manual methods</head><p>Manual methods refer to dataset creation techniques that rely on human effort for data collection, question generation, and answer annotation <ref type="bibr" target="#b4">[5]</ref>. These methods are highly valuable for ensuring data quality, relevance, and contextual accuracy, as they allow human annotators to apply their expertise and judgment in curating the dataset. However, manual methods are often labor-intensive, time-consuming, and costly, especially for large-scale datasets. Human annotators create question-answer pairs based on a given text or knowledge source. Annotators carefully read through documents, extract meaningful information, and formulate questions that can be answered directly from the content <ref type="bibr" target="#b5">[6]</ref>. Another method is based on crowdsourcing, which involves outsourcing the task of question and answer generation to a large pool of workers on platforms like Amazon Mechanical Turk or Figure Eight <ref type="bibr" target="#b6">[7]</ref>. This approach allows for rapid data collection from a diverse group of contributors.</p><p>In specialized fields, such as medicine, law, or finance, domain experts are employed to create or validate question-answer pairs. Their expertise ensures that the information is accurate, contextually relevant, and adheres to domain-specific standards.</p></div>
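A simple automated sanity check is often used alongside manual annotation of this kind. The sketch below is illustrative only (the passage and pairs are invented, and it assumes an extractive-style dataset where answers must be text spans): it keeps only the QA pairs whose answer occurs verbatim in the source passage.

```python
# Illustrative sketch (not from the paper): a minimal quality check for
# manually annotated QA pairs in an extractive-style dataset, keeping
# only pairs whose answer appears verbatim in the source passage.

def validate_qa_pairs(passage, qa_pairs):
    """Return the subset of QA pairs whose answer occurs in the passage."""
    valid = []
    for pair in qa_pairs:
        answer = pair["answer"].strip()
        if answer and answer.lower() in passage.lower():
            valid.append(pair)
    return valid

passage = "Guelma is a city in northeastern Algeria, known for its Roman ruins."
pairs = [
    {"question": "Where is Guelma located?", "answer": "northeastern Algeria"},
    {"question": "What is Guelma known for?", "answer": "its beaches"},  # not in text
]
print(validate_qa_pairs(passage, pairs))  # keeps only the first pair
```

Checks like this catch annotation slips early, before they propagate into training data.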
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Automated methods</head><p>These methods significantly reduce the time and cost required to produce vast amounts of question-answer pairs, making it possible to construct datasets for training and evaluating models on a large scale. Automatic techniques for creating question-answering (QA) datasets can be broadly divided into two main classes: knowledge-based methods and machine learning-based methods.</p><p>Knowledge-based methods rely on structured information sources, such as ontologies, knowledge graphs, and databases, to automatically generate question-answer pairs <ref type="bibr" target="#b7">[8]</ref>. These methods use predefined rules, templates, and structured data to produce questions and identify corresponding answers.</p><p>Machine learning-based methods, especially those using natural language processing (NLP) and deep learning, have transformed QA dataset creation by automating the generation of complex, context-rich question-answer pairs <ref type="bibr" target="#b8">[9]</ref>. These methods use trained models to generate or extract questions and answers from unstructured text, offering greater flexibility and adaptability <ref type="bibr" target="#b9">[10]</ref>.</p><p>More advanced automated approaches involve using machine learning models, particularly large pre-trained language models (e.g., GPT-3, BERT, T5), to generate question-answer pairs synthetically <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12]</ref>. These models are trained on extensive text corpora, enabling them to produce realistic and contextually varied questions based on input content.</p></div>
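The knowledge-based class described above can be sketched in a few lines. The relation names and templates below are invented for illustration and are not taken from the cited systems: each (subject, relation, object) fact with a known template yields one question-answer pair, with the object serving as the gold answer.

```python
# Sketch of template-based QA generation from structured facts.
# The relations and templates are toy assumptions, not the cited systems.
TEMPLATES = {
    "has_capital": "What is the capital of {subject}?",
    "directed_by": "Who directed {subject}?",
}

def generate_qa(facts):
    """Yield one (question, answer) pair per fact with a known template."""
    for subject, relation, obj in facts:
        template = TEMPLATES.get(relation)
        if template:
            yield template.format(subject=subject), obj

facts = [
    ("Algeria", "has_capital", "Algiers"),
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Algiers", "located_in", "Algeria"),  # no template: skipped
]
for question, answer in generate_qa(facts):
    print(question, "->", answer)
```

Real systems of this kind draw facts from a knowledge graph or database and use far larger template banks, but the pipeline shape is the same.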
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Innovative approaches</head><p>In recent years, data augmentation techniques have gained traction as a way to enhance and diversify QA datasets without the need for entirely new data sources. These techniques manipulate existing question-answer pairs to create new, varied versions, expanding the dataset and exposing models to a wider range of language patterns, contexts, and question types <ref type="bibr" target="#b12">[13]</ref>. Data augmentation approaches are particularly useful for improving model generalization and robustness, helping QA systems perform better in real-world scenarios <ref type="bibr" target="#b13">[14]</ref>.</p><p>Data augmentation techniques like synonym substitution, paraphrasing, and entity replacement are used to increase dataset size and diversity automatically <ref type="bibr" target="#b14">[15]</ref>. By modifying existing question-answer pairs, these methods create variations that expose models to different phrasings and vocabulary without needing new data sources.</p></div>
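Two of the augmentation methods mentioned above, synonym substitution and entity replacement, can be sketched as follows. This is a toy illustration: the synonym and entity tables are assumptions, and real pipelines would typically use a lexical resource such as WordNet and a named entity recognizer instead of fixed dictionaries.

```python
# Toy sketch of two augmentation methods: synonym substitution and
# entity replacement. The word tables below are illustrative assumptions.
SYNONYMS = {"movie": "film", "actor": "performer"}
ENTITIES = {"Paris": "London"}

def synonym_substitution(question):
    # Replace each whitespace-separated token that has a known synonym.
    return " ".join(SYNONYMS.get(w, w) for w in question.split())

def entity_replacement(question, answer):
    # Swap a named entity consistently in both question and answer so
    # the augmented pair stays internally coherent.
    for old, new in ENTITIES.items():
        question = question.replace(old, new)
        answer = answer.replace(old, new)
    return question, answer

print(synonym_substitution("Which actor starred in the movie filmed in Paris?"))
print(entity_replacement("Which landmarks are in Paris?", "Paris has many landmarks"))
```

Note the design point the second function illustrates: an entity must be replaced in the question and the answer together, otherwise the augmented pair becomes inconsistent.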
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Comparison between dataset structures</head><p>When evaluating QA datasets, it is crucial to consider the structure of the dataset and the type of question-answer (Q&amp;A) pairs it contains. Different datasets follow various organizational structures based on their intended use.</p><p>Most existing QA datasets consist of pairs of questions and corresponding answers. For example, in SQuAD (the Stanford Question Answering Dataset), questions are based on a paragraph, and answers are specific spans of text from that paragraph <ref type="bibr" target="#b15">[16]</ref>. TriviaQA, similar to SQuAD, contains questions with answers that are directly extracted from documents or web pages <ref type="bibr" target="#b16">[17]</ref>. Natural Questions (NQ) contains questions whose answers are extracted from long documents.</p><p>Another innovative approach involves query generation from natural language questions. This structure focuses on generating queries that can be used to retrieve answers from a database, knowledge graph, or other structured data sources <ref type="bibr" target="#b17">[18]</ref>. This type of dataset emphasizes the process of converting a natural language question into a structured query, such as SQL, that can be executed on a database or other structured system. WikiSQL <ref type="bibr" target="#b1">[2]</ref> is a large-scale dataset for natural language to SQL query generation. It contains questions based on data tables from Wikipedia and includes SQL queries that extract answers from these tables.</p><p>More recent work focuses on the generation of MongoDB queries from natural language questions with the application of three data augmentation techniques: paraphrasing, back translation, and named entity substitution <ref type="bibr" target="#b18">[19]</ref>. 
An extended work aims to generate more complex queries with auto-validation of the augmented data <ref type="bibr" target="#b19">[20]</ref>.</p><p>Query generation-based datasets are a valuable tool for developing information retrieval systems that bridge the gap between natural language and structured data. By converting natural language questions into executable queries (e.g., SQL, SPARQL, MQL), these datasets enable systems to access and retrieve information from structured sources.</p><p>Table <ref type="table" target="#tab_0">1</ref> outlines the key criteria used to assess various datasets for question-answering (QA) systems. </p></div>
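The question-to-query structure described above can be illustrated with a minimal sketch. The patterns and schema below are hypothetical (not the WikiSQL or M2Q2 pipelines from the cited work): a natural language question is matched against a regular-expression template and converted into a MongoDB-style filter document.

```python
# Minimal sketch of natural language to structured query conversion.
# Patterns, collection, and field names are hypothetical examples.
import re

PATTERNS = [
    (re.compile(r"which movies were released in (\d{4})\??", re.I),
     lambda m: {"collection": "movies", "filter": {"year": int(m.group(1))}}),
    (re.compile(r"who directed (.+?)\??$", re.I),
     lambda m: {"collection": "movies", "filter": {"title": m.group(1)},
                "projection": {"director": 1}}),
]

def question_to_query(question):
    """Return a MongoDB-style query document, or None if no pattern matches."""
    for pattern, build in PATTERNS:
        match = pattern.fullmatch(question.strip())
        if match:
            return build(match)
    return None

print(question_to_query("Which movies were released in 1999?"))
```

Template banks like this are also how such datasets are often seeded before data augmentation expands the phrasings, as in the work cited above.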
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Metrics for Assessing Datasets</head><p>For datasets designed for generative QA, where the model must generate queries or answers in natural language, several metrics are used to evaluate the quality of the generated output.</p><p>BLEU is a widely recognized metric in the field of machine translation <ref type="bibr" target="#b22">[23]</ref>, while ROUGE is commonly used for evaluating text summarization and other natural language generation tasks <ref type="bibr" target="#b22">[23]</ref>. A higher score on these metrics indicates greater similarity to the reference and thus a more accurate generation.</p><formula xml:id="formula_0">BLEU = BP × exp(∑_{n=1}^{N} w_n · log P_n).<label>(1)</label></formula><p>Where:</p><p>• N is the maximum n-gram size (usually up to 4).</p><p>• P_n is the modified precision for n-grams.</p><p>• w_n is the weight assigned to each precision, usually set to 1/N.</p><p>• BP (brevity penalty) lowers the score of translations shorter than the reference.</p><p>ROUGE evaluates the n-gram overlap between the generated summary and one or more reference summaries <ref type="bibr" target="#b23">[24]</ref>. The ROUGE measure combines the following components:</p><formula xml:id="formula_1">ROUGE = ROUGE_N/m + ROUGE_L/m + ROUGE_S/m.<label>(2)</label></formula><p>Where:</p><formula xml:id="formula_2">ROUGE_N = (Number of overlapping n-grams) / (Total number of n-grams in the reference).<label>(3)</label></formula><formula xml:id="formula_3">ROUGE_L = ∑_{ref summaries}(longest common subsequence) / ∑_{ref summaries}(summary length).<label>(4)</label></formula><formula xml:id="formula_4">ROUGE_S = ∑_{ref summaries} ∑_{skip bigrams} count_match(skip bigram) / ∑_{ref summaries} ∑_{skip bigrams} count(skip bigram).<label>(5)</label></formula><p>METEOR (Metric for Evaluation of Translation with Explicit ORdering) <ref type="bibr" target="#b24">[25]</ref> evaluates text generation based on synonyms, stemming, and word order. It is more flexible than BLEU, as it rewards synonyms and paraphrased text. The metric is based on the harmonic mean of unigram precision and recall, with recall weighted higher than precision.</p><p>The METEOR score is calculated as follows:</p><formula xml:id="formula_5">METEOR = F_mean × (1 − Penalty).<label>(6)</label></formula><p>where the weighted harmonic mean of precision and recall is:</p><formula xml:id="formula_6">F_mean = (10 · Precision · Recall) / (9 · Precision + Recall),<label>(7)</label></formula><formula xml:id="formula_7">Penalty = γ · (chunks / matches)^β,<label>(8)</label></formula><p>matches: total number of matched unigrams; chunks: number of contiguous groups of matches in the same order; γ and β: tunable parameters controlling the penalty's impact (default values are usually γ = 0.5 and β = 3.0).</p><p>Finally, a key indicator is how well a model performs on the dataset. Training loss and accuracy reflect how well the model learns from the dataset during training: a lower loss and higher accuracy indicate a model that fits the data well. A low training loss and high accuracy on tasks such as extractive QA or question answering over a knowledge base suggest that the dataset is well constructed and provides enough relevant information.</p></div>
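The BLEU definition in equation (1) can be computed directly. The following is a self-contained sketch using clipped n-gram precision, uniform weights w_n = 1/N, and the brevity penalty; it omits the smoothing used by production toolkits such as sacreBLEU, so it is for illustration rather than benchmarking.

```python
# Self-contained sketch of BLEU as in equation (1): clipped n-gram
# precisions combined geometrically, times a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped (modified) n-gram precision P_n.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(1, len(cand) - n + 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # any zero precision drives the geometric mean to zero
    # Brevity penalty BP: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Uniform weights w_n = 1/N, as in the formula above.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # identical: 1.0
```

Without smoothing, any candidate missing all 4-grams scores zero, which is why toolkits add smoothing for short sentences.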
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>This paper has surveyed various techniques for dataset creation and validation in the field of question-answering (QA) systems. These techniques are essential for advancing the effectiveness of QA systems across multiple domains and ensuring that they can handle a diverse set of questions and answer types. This survey offers valuable insights into the diversity of datasets available for training and evaluating QA systems. The datasets reviewed here span a wide range of domains, question types, and answer formats, each designed to address specific challenges in QA. While progress has been made in creating large-scale, diverse, and specialized datasets, challenges related to scalability, dataset quality, and domain generalization remain. As QA systems continue to evolve, the development of new datasets and evaluation metrics will play a crucial role in advancing the capabilities of these systems, allowing them to handle increasingly complex tasks in real-world applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Declaration on Generative AI</head><p>During the preparation of this work, the author used ChatGPT and Grammarly in order to check grammar and spelling and to paraphrase and reword text. After using these tools, the author reviewed and edited the content as needed and takes full responsibility for the publication's content.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Review of some popular datasets</figDesc><table><row><cell>Ref</cell><cell>Dataset</cell><cell>Source</cell><cell>Field</cell><cell>Methodology</cell><cell>Data size</cell></row><row><cell>[16]</cell><cell>SQuAD</cell><cell>Wikipedia</cell><cell>Diverse</cell><cell>Selection of articles, question generation, answer annotation</cell><cell>+100K</cell></row><row><cell>[21]</cell><cell>DBPal</cell><cell>Synthetic</cell><cell>Diverse</cell><cell>Generator, data augmentation, lemmatizer</cell><cell>3 million</cell></row><row><cell>[18]</cell><cell>NarratiQA</cell><cell>Books</cell><cell>Movies</cell><cell>Data collection, question generation</cell><cell>46,765</cell></row><row><cell>[22]</cell><cell>BabiMovie</cell><cell>Wikipedia</cell><cell>Movies</cell><cell>Data collection, data structuring, dialog generation, question formulation</cell><cell>10,000</cell></row><row><cell>[19]</cell><cell>M2Q2</cell><cell>Mflix</cell><cell>Movies</cell><cell>Creating templates, data augmentation, data revision</cell><cell>88,100</cell></row><row><cell>[20]</cell><cell>M2Q2+</cell><cell>Mflix</cell><cell>Movies</cell><cell>Creating templates, data augmentation, auto-validation</cell><cell>100K</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Text-to-sql generation for question answering on electronic medical records</title>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K</forename><surname>Reddy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The Web Conference 2020</title>
				<meeting>The Web Conference 2020</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="350" to="361" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1709.00103</idno>
		<title level="m">Seq2sql: Generating structured queries from natural language using reinforcement learning</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2205.06983</idno>
		<title level="m">Rasat: Integrating relational structures into pretrained seq2seq model for text-to-sql</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Cobert: Covid-19 question answering system using bert</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Alzubi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Parwekar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gupta</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Arabian journal for science and engineering</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="page" from="11003" to="11013" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Review and analysis of synthetic dataset generation methods and techniques for application in computer vision</title>
		<author>
			<persName><forename type="first">G</forename><surname>Paulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ivasic-Kos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial intelligence review</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="page" from="9221" to="9265" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Optimizing dataset creation: A general purpose data filtering system for training large language models</title>
		<author>
			<persName><forename type="first">S</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Gu</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Clotho-aqa: A crowdsourced dataset for audio question answering</title>
		<author>
			<persName><forename type="first">S</forename><surname>Lipping</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sudarsanam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Drossos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Virtanen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2022 30th European Signal Processing Conference (EUSIPCO)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="1140" to="1144" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Systematic review of question answering over knowledge bases</title>
		<author>
			<persName><forename type="first">A</forename><surname>Pereira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Trifan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">P</forename><surname>Lopes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Oliveira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IET Software</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="1" to="13" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Deep learning based active learning technique for data annotation and improve the overall performance of classification models</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">U</forename><surname>Amin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hussain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Seo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Expert Systems with Applications</title>
		<imprint>
			<biblScope unit="volume">228</biblScope>
			<biblScope unit="page">120391</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Transformer models used for text-based question answering systems</title>
		<author>
			<persName><forename type="first">K</forename><surname>Nassiri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Akhloufi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Intelligence</title>
		<imprint>
			<biblScope unit="volume">53</biblScope>
			<biblScope unit="page" from="10602" to="10635" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">BERT model-based natural language to NoSQL query conversion using deep learning approach</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Hossen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">N</forename><surname>Uddin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Arefin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Uddin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Advanced Computer Science and Applications</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Towards User-Friendly NoSQL: A Synthetic Dataset Approach and Large Language Models for Natural Language Query Translation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Tola</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
		<respStmt>
			<orgName>Politecnico di Torino</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">An empirical survey of data augmentation for limited data learning in NLP</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bansal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="191" to="211" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Gotta: Generative few-shot question answering by prompt-based cloze data augmentation</title>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), SIAM</title>
				<meeting>the 2023 SIAM International Conference on Data Mining (SDM), SIAM</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="909" to="917" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Data augmentation techniques in natural language processing</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F A O</forename><surname>Pellicer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Ferreira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H R</forename><surname>Costa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Applied Soft Computing</title>
		<imprint>
			<biblScope unit="volume">132</biblScope>
			<biblScope unit="page">109803</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Rajpurkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1806.03822</idno>
		<title level="m">Know what you don&apos;t know: Unanswerable questions for SQuAD</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Weld</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1705.03551</idno>
		<title level="m">TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">The NarrativeQA reading comprehension challenge</title>
		<author>
		<persName><forename type="first">T</forename><surname>Kočiský</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schwarz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Blunsom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Hermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Melis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grefenstette</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="317" to="328" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">M2Q2: A text-to-MQL dataset for movie QA systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Aggoune</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Mihoubi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th Mediterranean Conference on Pattern Recognition and Artificial Intelligence (MedPRAI)</title>
				<meeting>the 6th Mediterranean Conference on Pattern Recognition and Artificial Intelligence (MedPRAI)</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="10" to="18" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Towards efficient dataset development: A case study of M2Q2+ in movie QA systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Aggoune</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Mihoubi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th Edition of the International Conference on Advanced Aspects of Software Engineering (ICAASE)</title>
				<meeting>the 6th Edition of the International Conference on Advanced Aspects of Software Engineering (ICAASE)</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="15" to="22" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">DBPal: A fully pluggable NL2SQL training pipeline</title>
		<author>
			<persName><forename type="first">N</forename><surname>Weir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Utama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Galakatos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Crotty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ilkhechi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ramaswamy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Bhushan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Geisler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hättasch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Eger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data</title>
				<meeting>the 2020 ACM SIGMOD International Conference on Management of Data</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="2347" to="2361" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Querying NoSQL with deep learning to answer natural language questions</title>
		<author>
			<persName><forename type="first">S</forename><surname>Blank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wilhelm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-P</forename><surname>Zorn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rettinger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI Conference on Artificial Intelligence</title>
				<meeting>the AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="9416" to="9421" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">BLEU: A method for automatic evaluation of machine translation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Papineni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Roukos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ward</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-J</forename><surname>Zhu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 40th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="311" to="318" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">ROUGE: A package for automatic evaluation of summaries</title>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Text summarization branches out</title>
				<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="74" to="81" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">METEOR: An automatic metric for MT evaluation with improved correlation with human judgments</title>
		<author>
			<persName><forename type="first">S</forename><surname>Banerjee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lavie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization</title>
				<meeting>the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="65" to="72" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
