<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">GECKO: A Question Answering System for Official Statistics</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Lucas</forename><surname>Lageweg</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Statistics Netherlands</orgName>
								<address>
									<addrLine>Henri Faasdreef 312</addrLine>
									<settlement>Den Haag</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">University of Amsterdam</orgName>
								<address>
									<addrLine>Science Park 900</addrLine>
									<postCode>1098 XH</postCode>
									<settlement>Amsterdam</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jonas</forename><surname>Kouwenhoven</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Benno</forename><surname>Kruit</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Vrije Universiteit Amsterdam</orgName>
								<address>
									<addrLine>De Boelelaan 1105</addrLine>
									<settlement>Amsterdam</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">GECKO: A Question Answering System for Official Statistics</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">0DC3FB185DE3AFD56F83501EBC04AB28</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:47+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents GECKO, a knowledge graph-based statistical question answering system currently in beta deployment. GECKO aims to facilitate the retrieval of single statistical values from an extensive database containing over a billion values across more than 4,000 tables. The system integrates a comprehensive framework including data augmentation, entity retrieval, and large language model (LLM)-based query generation. A key feature of the beta deployment is the collection of user feedback, which is critical for improving system performance and accuracy. This feedback mechanism allows users to report issues directly, ensuring continuous improvement based on real-world use.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Statistics Netherlands (Centraal Bureau voor de Statistiek; CBS) is an independent administrative body of the Dutch government tasked with the creation of statistics over a broad spectrum of social topics and the responsibility to make them accessible to the general public. However, in-house studies have shown that users struggle to find the correct tables for their needs in the vast amount of data available. This research aims to develop a Question Answering (QA) system to provide specific statistical observations from this data as responses to natural-language user questions.</p><p>QA systems can take several forms, with most recently free-form generative Large Language Models (LLMs) like ChatGPT and GPT4 <ref type="bibr" target="#b0">[1]</ref> getting much attention. Due to the nature of these models, they are able to generalize very well on a large range of topics, but have shown to be prone to 'hallucinations', where plausible but incorrect or even nonsensical answers are generated <ref type="bibr" target="#b1">[2]</ref>. Especially for official data like governmental statistics, this is highly undesirable behavior.</p><p>Knowledge Graph Question Answering (KGQA) is a field where knowledge graphs (KGs) containing real-world facts and relations in structured form are used as a basis for QA systems. Answers of such systems should always adhere to the KG. Therefore, assuming it contains correct information, answering by returning parts of the KG, or reasoning over it, cannot lead to nonsensical answers. In this paper, we introduce an end-to-end pipeline for a generation-based KGQA system of CBS data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>SEMANTICS 2024, Demo Track</head><p>Envelope l.lageweg@cbs.nl (L. Lageweg); jonaskouwenhoven@live.nl (J. Kouwenhoven); b.b.kruit@vu.nl (B. Kruit) Our approach introduces a data augmentation process for enhancing model training, explores various encoder architectures for entity retrieval, and proposes a new query generator mechanism enhanced by Low Rank Adaptation (LoRA) <ref type="bibr" target="#b2">[3]</ref>. Additionally, we propose a new prompting technique that utilizes dynamic prompts, constructing specific prompts based on the generation phase. These improvements help the process of generating symbolic expressions for querying a KG, thereby enhancing the overall performance of the QA system.</p><p>This paper details the beta deployment of GECKO and its feedback collection mechanism, emphasizing the role of user input in refining the system. The beta phase is critical for identifying and addressing potential issues, ultimately enhancing the system's robustness and reliability.  </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Query generation systems, particularly those involving text-to-SQL and KGQA, have made significant strides <ref type="bibr" target="#b3">[4]</ref>. Recent work <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref> focus on grounding queries in knowledge graphs to avoid hallucinations. Recent advancements <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref> highlight the use of LLMs in generating logical forms for querying databases. Data augmentation techniques <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>, are essential for creating diverse and realistic training datasets. Entity retrieval methods including sparse and dense retrieval approaches <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13]</ref>, play a crucial role in identifying relevant data within vast datasets.</p><p>Compared to existing KBQA or text-to-SQL systems, we provide a hybrid solution where statistical tabular data can be represented as knowledge graphs, to which the techniques for symbolic expression generation instead of more complex query language generation (SQL or SPARQL) can be applied. With this approach, we propose a novel system that can help find relevant information in official statistics and similar systems, which is vital for governmental decision making and all fields of research utilising and relying on these statistics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">System Design</head><p>In processing a question, GECKO performs four core steps: entity retrieval, filters retrieval, constrained S-expression decoding (i.e. symbolic expression generation) and observation validation. We restrict the querying space by performing entity retrieval based on the input question to determine the closest KB nodes. This is done through either sparse retrieval using BM25+ <ref type="bibr" target="#b13">[14]</ref> as a baseline method, using a trained dual encoder <ref type="bibr" target="#b11">[12]</ref> or a finetuned ColBERT model <ref type="bibr" target="#b12">[13]</ref>.</p><p>After obtaining the closest matching entities based on the query, we retrieve all possible filters for tables by exploding a subgraph using the entities found. The result of the subgraph exploding is a graph containing all table nodes and their related measures and dimensions having nodes intersecting with the retrieved entities from the previous step. The subgraph contains all relevant nodes to the query, connected to one or more tables.</p><p>The query and subgraph are used as input for the constrained S-expression decoding. The S-expressions are generated token-by-token such that, given the subgraph, admissible tokens are returned at every step. A rule-based baseline was created using the entity retrieval scores to greedily determine what token from the admissible tokens to select. The second method uses a transformer-based decoder-only seq2seq model and dynamic prompting.</p><p>When generating a token at a given timestep, the model evaluates the sequences in the list of admissible/constrained tokens and selects the sequence with the highest assigned score. For example, when 7425eng is given to the decoder as one of the admissible next tokens, but only a decomposition of sub-tokens can be embedded by the model (e.g. 
7425 followed by ##eng), the summed log probability for these subtokens will determine the total probability of selecting this identifier for generation.</p><p>The novelty in this constraining method is the introduction of dynamic prompting, which, instead of calculating the likelihood of a token sequence based on a static prompt (i.e. text input for the model), adjusts the prompts according to the generation phase. For example, when generating a table ID, the prompt is altered to only include the most relevant table IDs and their descriptions. Similarly, when measures are generated in the next phase, it retrieves the measures related to the previously generated table ID, and using those to construct a new prompt. This method applies to the different dimension groups as well. Figure <ref type="figure" target="#fig_2">2</ref> contains a schematic overview of the dynamic prompting technique.</p></div>
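The sub-token scoring step can be sketched as follows. Everything here is a toy stand-in: the log-probability table, the tokenizer, and the candidate identifiers are hypothetical illustrations, whereas a real decoder would condition each sub-token's probability on the full prompt and the previously generated tokens.

```python
import math

# Hypothetical sub-token log-probabilities, standing in for the scores a
# real decoder-only LM would assign; invented for illustration only.
SUBTOKEN_LOGPROBS = {
    "7425": math.log(0.6),
    "##eng": math.log(0.5),
    "8372": math.log(0.3),
    "##ned": math.log(0.4),
}

def tokenize(identifier):
    """Toy sub-word split: 4-character prefix, remainder as a '##' piece."""
    head, tail = identifier[:4], identifier[4:]
    return [head] + (["##" + tail] if tail else [])

def score_candidate(identifier):
    """Sum the sub-token log probs; the total scores the whole identifier.

    Unknown sub-tokens get a large negative log prob. In the real model
    each sub-token's probability is conditioned on the running prefix.
    """
    return sum(SUBTOKEN_LOGPROBS.get(t, math.log(1e-9))
               for t in tokenize(identifier))

def pick_admissible(candidates):
    """One constrained decoding step: only the admissible identifiers
    supplied by the subgraph are scored, and the best one is emitted."""
    return max(candidates, key=score_candidate)
```

Because log(0.6) + log(0.5) exceeds log(0.3) + log(0.4), `pick_admissible(["7425eng", "8372ned"])` emits the identifier 7425eng in this toy setting.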
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Model training &amp; beta deployment</head><p>For creating training data, we developed a method for manual data annotation. This method involves annotators writing queries that can be answered by a specific table cell. Annotators were instructed to write their questions both as full sentences and in a more casual style, aiming to simulate the formulation of questions posed by users in a search engine. The data obtained from this manual annotation process contains queries and their corresponding S-expression, resulting in 2300 annotated pairs. The annotated queries were distributed over random tables from the CBS datapool, and contained a strong class imbalance towards tables that were more easily annotated. This class imbalance and random distribution motivates extending this study with data augmentation. In this extension, annotated S-expressions and their associated queries are used to fine-tune a GPT-3.5 model through the OpenAI fine-tuning services. The query-expression pairs were transformed into prompts using the descriptions of the IDs for various measures, dimensions, and table <ref type="table">IDs</ref>. Training such a model reduces the need for additional manual annotation, while also significantly increasing the amount of annotated data.</p><p>The initial GECKO model (𝑣 1 ) and model containing the improvements discussed here (𝑣 2 ) were evaluated using a selected sample of this dataset. This was done by evaluation exact table matches of generated S-expressions (𝑣 1 0.35; 𝑣 2 0.63), F1-scores for selected dimensions in said S-expressions (𝑣 1 0.62; 𝑣 2 0.71) and by manually annotation answer relevancy, as an answer can be a non-exact match but still be relevant to the question (𝑣 1 0.38; 𝑣 2 0.71).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion</head><p>The beta deployment of GECKO<ref type="foot" target="#foot_0">1</ref> , a generation-based KGQA system for CBS data, marks a significant milestone in improving user interaction with governmental statistics. This phase includes mechanisms for feedback collection, which will play a crucial role in refining and enhancing the system based on user input. The feedback gathered during the beta deployment will help identify and address potential issues, ensuring the system's robustness and reliability. This process is essential for developing a reliable QA system capable of providing accurate and relevant statistical observations in response to natural-language user questions.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>P</head><label></label><figDesc>Pe er ri io od ds s D Da ai ir ry y p pr ro od du uc ct ti io on n</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: (a) Overview of our pipeline from query to answer. Candidate table nodes (green) are retrieved for the query, after which the measure and dimension filter candidates are retrieved (blue), resulting in a complete subgraph for the table candidates. The subgraph is used as input for the constrained S-expression decoding by either the baseline method or trained model. (b) Example CBS table fragment (from 7425eng), showing one dimension (time periods) and two measures.</figDesc><graphic coords="2,90.13,218.09,229.16,165.31" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Schematic overview of dynamic promting for a decoder-only model architecture. If there are multiple tokens that can be generated, a custom prompt is given to the decoder with labels for all the possible options.</figDesc><graphic coords="4,110.13,84.18,375.00,194.47" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://gecko.cbs.nl</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<idno type="arXiv">arXiv:2303.08774</idno>
		<title level="m">GPT-4</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
		<respStmt>
			<orgName>OpenAI</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">How Language Model Hallucinations Can Snowball</title>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Press</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Merrill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.13534</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">LoRA: Lowrank adaptation of large language models</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Geng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Si</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2208.13629</idno>
		<title level="m">A survey on text-to-sql parsing: Concepts, methods, and future directions</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Three levels of generalization for question answering on knowledge bases</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kase</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vanni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sadler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Su</surname></persName>
		</author>
		<idno type="DOI">10.1145/3442381.3449992</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Web Conference 2021</title>
				<meeting>the Web Conference 2021</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Decaf: Joint decoding of answers and logical forms for question answering over knowledge bases</title>
		<author>
			<persName><forename type="first">D</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xiang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2210.00063</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Evaluating large language models in semantic parsing for conversational question answering over knowledge graphs</title>
		<author>
			<persName><forename type="first">P</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Klettner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Jokinen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Simperl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Matthes</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2401.01711</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Enhancing text-to-SQL capabilities of large language models: A study on prompt design strategies</title>
		<author>
			<persName><forename type="first">L</forename><surname>Nan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cohan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Radev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 2023 Conference on Empirical Methods in Natural Language Processing</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">InPars: Unsupervised dataset generation for information retrieval</title>
		<author>
			<persName><forename type="first">L</forename><surname>Bonifacio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Abonizio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fadaee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nogueira</surname></persName>
		</author>
		<idno type="DOI">10.1145/3477495.3531863</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;22</title>
				<meeting>the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR &apos;22<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="2387" to="2392" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Jeronymo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bonifacio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Abonizio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fadaee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lotufo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zavrel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nogueira</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2301.01820</idno>
		<title level="m">Inpars-v2: Large language models as efficient dataset generators for information retrieval</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Recent trends in deep learning based open-domain textual question answering systems</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="94341" to="94356" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Dense passage retrieval for open-domain question answering</title>
		<author>
			<persName><forename type="first">V</forename><surname>Karpukhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Oguz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Edunov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-T</forename><surname>Yih</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.550</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="6769" to="6781" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Colbert: Efficient and effective passage search via contextualized late interaction over bert</title>
		<author>
			<persName><forename type="first">O</forename><surname>Khattab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zaharia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval</title>
				<meeting>the 43rd International ACM SIGIR conference on research and development in Information Retrieval</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="39" to="48" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Lower-bounding term frequency normalization</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhai</surname></persName>
		</author>
		<idno type="DOI">10.1145/2063576.2063584</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM &apos;11</title>
				<meeting>the 20th ACM International Conference on Information and Knowledge Management, CIKM &apos;11<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="7" to="16" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
