<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Understanding Tables in Financial Documents Shared Tasks for Table Retrieval and Table QA on Japanese Annual Securities Reports</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Yasutomo</forename><surname>Kimura</surname></persName>
							<email>kimura@res.otaru-uc.ac.jp</email>
							<affiliation key="aff0">
								<orgName type="institution">Otaru University of Commerce</orgName>
								<address>
									<settlement>Hokkaido</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Eisaku</forename><surname>Sato</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Otaru University of Commerce</orgName>
								<address>
									<settlement>Hokkaido</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Kazuma</forename><surname>Kadowaki</surname></persName>
							<email>kadowaki.kazuma@jri.co.jp</email>
							<affiliation key="aff1">
								<orgName type="institution">The Japan Research Institute Limited</orgName>
								<address>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hokuto</forename><surname>Ototake</surname></persName>
							<email>ototake@fukuoka-u.ac.jp</email>
							<affiliation key="aff2">
								<orgName type="institution">Fukuoka University</orgName>
								<address>
									<settlement>Fukuoka</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="department">Testbeds and Community for Information Access Research</orgName>
								<orgName type="laboratory">The First Workshop on Evaluation Methodologies</orgName>
								<address>
									<addrLine>December 12</addrLine>
									<postCode>2024</postCode>
									<settlement>Tokyo</settlement>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Understanding Tables in Financial Documents Shared Tasks for Table Retrieval and Table QA on Japanese Annual Securities Reports</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7D64228A3DA3121A8622AC31AB865FC8</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>annual securities report</term>
					<term>shared task</term>
					<term>table retrieval</term>
					<term>table question-answering</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents a framework for the "NTCIR-18 U4" and "SIG-FIN UFO-2024" shared tasks, which focus on tables within annual securities reports. Annual securities reports are critical documents that provide insights into a company's financial status and business performance. However, challenges remain in accurately and efficiently analyzing the data they contain. To address these issues, we propose two sub-tasks for the above shared tasks: Table Retrieval and Table QA tasks, which utilize datasets from TOPIX100 and TOPIX500 annual securities reports. Participants are tasked with developing systems (programs) that automatically process data for the two tasks and compete for top performance on a leaderboard. Accuracy scores and rankings are determined by submitting the task's output, in JSON format, to the leaderboard. Through these shared tasks, we aim to enhance the utility of annual securities reports and advance natural language processing technologies for financial data analysis.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, financial disclosures have become essential for investors seeking to make informed decisions based on reliable corporate data. In Japan, listed companies are required to submit an annual securities report, a statutory disclosure document that provides comprehensive information on business operations, financial data, risk factors, corporate governance, and shareholder information. These reports, accessible via the Electronic Disclosure for Investors' NETwork (EDINET) 1 , serve as a critical information source for investors aiming to compare companies effectively.</p><p>These securities reports are structured in XBRL (eXtensible Business Reporting Language), an XML-based format designed to standardize and facilitate the production, distribution, and reuse of financial information. By incorporating "taxonomies" that define the structure and meaning of data, XBRL enables automated processing, potentially streamlining financial analysis.</p><p>However, practical challenges arise due to the presence of untagged data and the existence of unique taxonomies created by different report submitters, complicating the identification of comparable elements across reports.</p><p>To this end, we propose two tasks that aim to facilitate cross-company comparisons by focusing on the tables and text within annual securities reports. The first task is the NTCIR-18 U4 task, adopted by Japan's National Institute of Informatics (NII) as part of NTCIR-18 2 . The second is the SIG-FIN UFO-2024 task, organized by the Financial Informatics Study Group (SIG-FIN) under the Japanese Society for Artificial Intelligence (JSAI). The former focuses on TOPIX100 annual securities reports submitted between April 1, 2020, and March 31, 2021, while the latter focuses on TOPIX500 annual securities reports submitted between July 1, 2023, and June 30, 2024.
We organized these shared tasks in collaboration with the NII Testbeds and Community for Information Access Research (NTCIR) <ref type="bibr" target="#b0">[1]</ref>, which specializes in information retrieval, and the SIG-FIN group of the JSAI, which focuses on financial technology. Through these initiatives, we aim to attract researchers and practitioners interested in these fields and contribute to further advancing technologies at the intersection of finance and information retrieval.</p><p>In each shared task, we conducted two sub-tasks: the Table Retrieval task, which involves searching for tables, and the Table Question Answering (Table QA) task, which involves answering questions by identifying the target cells within the tables, as illustrated in Figure <ref type="figure" target="#fig_0">1</ref>. We designed each sub-task and constructed datasets for each task.</p><p>The contributions of this study are as follows:</p><p>• Design of two tasks, Table Retrieval and Table QA, targeting securities reports.</p><p>• Construction of datasets for Table Retrieval and Table QA, and their release on GitHub<ref type="foot" target="#foot_2">3</ref>.</p><p>• Organization of the NTCIR-18 U4 task and the SIG-FIN UFO-2024 task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Research on Tables</head><p>A table is a data format with a two-dimensional structure used to organize and manage knowledge or information, and it is widely utilized in various contexts. However, not all tables have a highly structured database format; they are often represented as semi-structured data. Furthermore, the data contained in table cells is not limited to numerical values; it frequently includes strings and other non-numerical data. Numerous methods have been proposed to accommodate such diverse table data.</p><p>Zhang and Balog <ref type="bibr" target="#b1">[2]</ref> surveyed tables on the web and classified approaches to accessing table data into six main categories: table extraction, table interpretation, table search, table question answering, knowledge base augmentation, and table augmentation. In addition, table-related tasks include table fact verification <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>, table detection (searching for tables within documents) <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>, spreadsheet manipulation <ref type="bibr" target="#b6">[7]</ref>, column type annotation <ref type="bibr" target="#b7">[8]</ref>, and entity linking (linking to knowledge bases) <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>. These tasks are critical in information retrieval and data analysis based on table data, and they are particularly anticipated in fields where handling large-scale data and automation are required. Recently, approaches utilizing large language models (LLMs) and visual language models (VLMs) have been increasing, and research on learning methods, prompt engineering, and agents is also gaining attention <ref type="bibr" target="#b10">[11]</ref>. Our shared tasks (NTCIR-18 U4 and SIG-FIN UFO-2024) are related to table search, table detection, and table question answering (Table QA).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Table Retrieval and Table QA</head><p>Table retrieval aims to identify appropriate tables from vast datasets <ref type="bibr" target="#b11">[12]</ref>. In this task, methods that assign relevance scores on the basis of the relationship between natural language queries and individual tables are commonly used.</p><p>Table QA refers to the technology that provides appropriate answers from tables in response to user questions. Approaches to Table QA include semantic parsing-based, generation-based, extraction-based, matching-based, and retrieval-based methods <ref type="bibr" target="#b12">[13]</ref>. The difficulty in Table QA lies in the need to handle semi-structured or unstructured data, as it also involves non-database tables.</p><p>Compared to existing Table QA datasets such as FinQA <ref type="bibr" target="#b13">[14]</ref> and TAT-QA <ref type="bibr" target="#b14">[15]</ref>, which are primarily English-language datasets designed to handle numerical reasoning in financial contexts, our proposed shared tasks specifically target the Japanese language. Japanese tabular and textual data often exhibit unique linguistic and structural features distinct from those in English datasets, including variations in numerical data formats, context-dependent expressions, and implicit relational cues.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Tables in the Financial Domain</head><p>Hybrid data, which includes both tables and text, such as in financial reports, is quite prevalent in the real world <ref type="bibr" target="#b15">[16]</ref>. Zhu et al. constructed a question-answering benchmark dataset focused on the hybrid content of tabular and textual data in the financial domain <ref type="bibr" target="#b14">[15]</ref>.</p><p>Pan et al. proposed CLTR, an architecture for end-to-end table retrieval at the cell level <ref type="bibr" target="#b16">[17]</ref>. While CLTR can be applied to open-domain datasets, including finance and healthcare ones, its performance specifically within the financial domain has not been clarified, nor does it target the Japanese language.</p><p>One of the tasks focused on Japanese financial table structure analysis is the UFO (Understanding of non-Financial Objects in Financial Reports) task <ref type="bibr" target="#b17">[18]</ref>. The UFO task aims to extract structured information from tables and text found in annual securities reports and consists of two sub-tasks: the Table Data Extraction (TDE) task and the Text-to-Table Relationship Extraction (TTRE) task. The TDE task classifies cells in tables into four categories with the goal of identifying the type of each cell: metadata, header, attribute, and data <ref type="bibr" target="#b18">[19]</ref>. The main focus of TDE was on cell classification, and additional processing to enable inter-company comparisons remained unexplored.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">NTCIR-18 U4 and SIG-FIN UFO-2024 Tasks</head><p>Both the NTCIR-18 U4 and SIG-FIN UFO-2024 tasks consist of two sub-tasks: the Table Retrieval task, which involves searching for tables, and the Table Question Answering (Table <ref type="table">QA</ref>) task, which involves answering questions by identifying the target cells within the tables <ref type="bibr" target="#b19">[20]</ref>. Figure <ref type="figure" target="#fig_0">1</ref> illustrates the concept of these two sub-tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Table Retrieval: Table Search Task</head><p>Table Retrieval is a task that involves searching for a "table" containing the values that answer a given question from the tables included in a company's annual securities report. On average, a company's annual securities report contains 221.9 tables <ref type="bibr" target="#b20">[21]</ref>, so the specific table that contains the answer to the question needs to be identified. The input, output, and evaluation criteria for this task are as follows.</p></div>
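As a minimal sketch, an embedding-based baseline for this task scores each candidate table by the cosine similarity between the question embedding and the table's serialized text embedding. Here `embed` is a deterministic toy stand-in, not the real encoder; the reported baselines used OpenAI text-embedding-3 models over Cell Text, HTML, or Markdown serializations (see Table 1).

```python
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for a sentence encoder such as
    text-embedding-3-small: derives a deterministic unit vector
    from an MD5 seed, for illustration only."""
    seed = int(hashlib.md5(text.encode("utf-8")).hexdigest()[:8], 16)
    v = np.random.default_rng(seed).standard_normal(64)
    return v / np.linalg.norm(v)

def retrieve_table(question: str, tables: dict) -> str:
    """Rank candidate tables (Table ID -> serialized table text) by cosine
    similarity with the question embedding; return the best Table ID."""
    q = embed(question)
    return max(tables, key=lambda tid: float(embed(tables[tid]) @ q))
```

In practice the `tables` dictionary would map each Table ID (e.g. "S100ISF1-0101010-tab2") to the text extracted from the corresponding &lt;table&gt; element.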
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Input</head><p>1. Question 2. HTML file of the annual securities report</p><p>For the output, the Table <ref type="table" target="#tab_5">ID</ref> of the table that answers the question is used. In the output example above, the Table ID "S100ISF1-0101010-tab2" refers to the second table in the "S100ISF1-0101010.html" file, with "-tab{table number}" appended to the file name.</p><p>The metric used for evaluation is accuracy, which is calculated by dividing the number of correct outputs by the total number of inputs in the test dataset.</p><p>We evaluated a few baseline methods using our validation datasets, which contain 3,131 and 1,533 questions for NTCIR-18 U4 and SIG-FIN UFO-2024, respectively. The results are shown in Table <ref type="table" target="#tab_6">1</ref>. For the NTCIR-18 U4 task, the highest accuracy of 0.2111 was achieved by using the text-embedding-3-small model to create embeddings based on Cell Text. Similarly, for the SIG-FIN UFO-2024 task, a top accuracy of 0.1937 was obtained using the text-embedding-3-large model for Cell Text embeddings.</p><p>For input data, HTML files and Table IDs are provided, allowing the system to extract the range enclosed by the &lt;table&gt; tag from the HTML file. Additionally, if necessary, the system can utilize the surrounding context of the table.</p><p>Similar to Table <ref type="table" target="#tab_5">IDs</ref>, each cell (&lt;th&gt; and &lt;td&gt; tags) within the table in the HTML file is assigned a unique Cell ID. When outputting the value corresponding to the answer to the given question, this Cell ID is used. 
In the output example above, the Cell ID is the Table ID of the table containing the cell, with "-r{row number}c{column number}" appended, so the Cell ID "S100ISF1-0101010-tab2-r8c1" refers to the cell in the 8th row and 1st column of the table "S100ISF1-0101010-tab2".</p><p>For evaluation, similar to the Table Retrieval task, accuracy is calculated by dividing the number of correct outputs by the total number of inputs in the test dataset. However, discrepancies between the value contained in the HTML cell and the expected answer are frequently observed. For example, if the expected answer is "4448000000", the corresponding cell in the HTML might contain the string "4,448", while another cell, such as one in the top-right corner or a column header, might indicate "(in millions of yen)". In this case, the system answering the task must reference both cells to generate the answer "4,448 million yen". While this is equivalent to the correct answer, to compare it accurately, the system must replace the string "million yen" with "000000" and remove the comma.</p><p>For this reason, in this task, both the response and the correct answer are normalized before calculating accuracy. The normalization specification was continually revised during the "dry run" period, considering feedback from participants.</p><p>We evaluated a few baseline methods using our validation datasets, which contain 3,132 and 1,534 questions for NTCIR-18 U4 and SIG-FIN UFO-2024, respectively. These baseline methods involved converting the target table into text format and inputting it, along with the question, into an LLM to generate the desired values. The results are shown in Table <ref type="table" target="#tab_8">2</ref>. For the NTCIR-18 U4 task, the highest accuracy of 0.7471 was achieved by using the Claude 3 Opus model. Similarly, for the SIG-FIN UFO-2024 task, a top accuracy of 0.5750 was obtained using the GPT-4o model.</p></div>
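A minimal normalizer along these lines might look as follows. This is a hypothetical sketch, not the official specification (which was revised during the dry run and covers more cases); the "thousand yen" rule is an assumed additional example.

```python
# Hypothetical unit-suffix table; the official normalization covers more cases.
UNIT_SUFFIXES = {
    "million yen": "000000",
    "thousand yen": "000",
}

def normalize_answer(value: str) -> str:
    """Normalize an answer string so surface variants compare equal:
    strip surrounding whitespace, remove thousands separators, then
    expand a known unit suffix into the corresponding trailing zeros."""
    s = value.strip().replace(",", "")
    for unit, zeros in UNIT_SUFFIXES.items():
        if s.endswith(unit):
            s = s[: -len(unit)].strip() + zeros
            break
    return s
```

With this, both "4,448 million yen" and "4448000000" normalize to the same string before the accuracy comparison.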
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Dataset</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Securities Reports Used in Our Dataset</head><p>The NTCIR-18 U4 task focuses on analyzing securities reports from companies in the TOPIX100 index. The dataset consists of securities reports from companies that are part of the TOPIX100, submitted between April 1, 2020, and March 31, 2021.</p><p>The SIG-FIN UFO-2024 task, on the other hand, focuses on analyzing securities reports from companies in the TOPIX500 index. The annual securities reports used in this task are drawn from the TOPIX500, which comprises publicly listed companies with high market capitalization and liquidity. For this task, we target the annual securities reports of 497 companies<ref type="foot" target="#foot_3">4</ref> constituting the TOPIX 500 as of April 30, 2024. The dataset includes 494 financial statements submitted to EDINET between July 1, 2023, and June 30, 2024.</p><p>To account for differences in the structure of annual securities reports across industries, we ensure that the dataset is balanced by industry. The annual securities reports are distributed across the training, validation, and test sets with minimal industry bias. Specifically, we use the ten major categories from the Tokyo Stock Exchange's 33 industry classifications (service industry, transportation and communications, finance and insurance, construction, mining, commerce, fisheries, agriculture and forestry, manufacturing, electricity and gas, and real estate). The data is divided such that the ratio of train:validation:test is approximately 6:1:3 within each industry category. This results in 289 companies' reports being used for training, 52 for validation, and 153 for testing.</p><p>We retrieve the financial data using the EDINET API v2, utilizing the XBRL, HTML, and CSV files available through the API. The XBRL files contain tabular data, such as taxonomies and instances, referred to as "XBRL information" below, which is also embedded in the corresponding HTML files. 
The CSV files, referred to below as "annual securities report CSVs," provide the XBRL data in a more accessible format for easier handling in this study.</p></div>
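The industry-balanced 6:1:3 split described above can be sketched as follows. This is a hypothetical re-implementation of the procedure, not the organizers' actual script; function and field names are illustrative.

```python
import random
from collections import defaultdict

def split_by_industry(companies, ratios=(6, 1, 3), seed=0):
    """Split (company_id, industry) pairs into train/validation/test so
    the 6:1:3 ratio holds approximately within each industry category."""
    rng = random.Random(seed)
    by_industry = defaultdict(list)
    for cid, industry in companies:
        by_industry[industry].append(cid)
    train, valid, test = [], [], []
    total = sum(ratios)
    for cids in by_industry.values():
        rng.shuffle(cids)
        n = len(cids)
        n_train = round(n * ratios[0] / total)
        n_valid = round(n * ratios[1] / total)
        train += cids[:n_train]
        valid += cids[n_train:n_train + n_valid]
        test += cids[n_train + n_valid:]
    return train, valid, test
```

Stratifying per industry before shuffling is what keeps each category close to the global 6:1:3 ratio, rather than splitting the pooled company list.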
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Question Creation</head><p>Questions are created using annual securities report CSVs and question templates. In the annual securities report CSV, each row represents data, and each column shows the corresponding XBRL information (element ID, item name, context ID, relevant year, consolidated or individual, period or point in time, unit ID, unit, value). Among this XBRL information, the element ID and context ID are crucial for data extraction. The element ID indicates what the data represents, but it is not unique within a single annual securities report. Therefore, combining the element ID with the context ID, which represents the period and dimension, enables data within a report to be uniquely identified and the desired information to be extracted. Thus, the question must include both element ID and context ID.</p><p>On the basis of this, the initial version of the question is created as follows: Question (Initial Version)</p><p>What is the value of "{Element ID}" for {Company Name} in {Context ID}?</p><p>However, if the element ID and context ID are used as they appear in the annual securities report CSV, the question will not be meaningful in Japanese as they are simply IDs. Therefore, the context ID is represented using the relative year, consolidated or individual, and period or point in time, while the element ID is expressed as the item name. The final version of the question is defined as follows:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Question (Detailed Version)</head><p>What is the value of "{Item Name}" in the {Year} {Period or Point in Time} {Consolidated or Individual (optional)} annual securities report of {Company Name} for {Member Element (optional)}?</p><p>The explanations for each part are as follows:</p><p>• Year: Calculated on the basis of the relevant year, and the string is included in the question.</p><p>• Period or Point in Time: If it is a point in time, the word "point" is added right after the year.</p><p>• Consolidated or Individual: If it is consolidated or individual, the corresponding string is included; otherwise, "annual securities report of" is omitted.</p><p>• Member Element: If the context ID contains a member element, the string is included. This element is not translated into Japanese to ensure uniqueness and is used as-is from the annual securities report CSV (ensuring uniqueness is a future challenge).</p><p>• Item Name: This is essentially the Japanese translation of the element ID, so the string is included.</p><p>An example of a question created using the template is as follows: Example Created with Question Template What is the value of "Building (net amount)" in the 2020 individual annual securities report of Daiwa House Industry Co., Ltd. for NonConsolidatedMember?</p><p>When creating questions for the SIG-FIN UFO-2024 dataset, we also performed data sampling to avoid generating too many similar questions. For data sampling, a unique list of item names is created for each company, and random sampling is performed so that 1/10th of the entire dataset is selected. Additionally, the number of samples per item name is adjusted on the basis of the number of data entries for each item name <ref type="foot" target="#foot_4">5</ref>.</p><p>As a result of these procedures, we constructed the NTCIR-18 U4 dataset consisting of 32,587 entries for the Table Retrieval task and 32,589 entries for the Table QA task, and the SIG-FIN UFO-2024 dataset consisting of 14,410 entries for the Table Retrieval task and 14,412 entries for the Table QA task, as shown in Table <ref type="table" target="#tab_10">3</ref><ref type="foot" target="#foot_5">6</ref>.</p></div>
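The template assembly described above can be sketched as follows. Field names such as `item_name`, `consolidated_or_individual`, and `member` are hypothetical stand-ins for the annual securities report CSV columns; the real questions are generated in Japanese.

```python
def build_question(row: dict) -> str:
    """Assemble the detailed question template from CSV-derived fields.
    Optional parts are included only when the corresponding field is set."""
    q = f'What is the value of "{row["item_name"]}" in the {row["year"]}'
    if row.get("is_point_in_time"):        # period vs. point in time
        q += " point"
    if row.get("consolidated_or_individual"):
        q += f' {row["consolidated_or_individual"]} annual securities report of'
    q += f' {row["company"]}'
    if row.get("member"):                  # member element, kept untranslated
        q += f' for {row["member"]}'
    return q + "?"
```

Applied to the Daiwa House example above, this reproduces the question shown in the text.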
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Schedule</head><p>To encourage broad participation from those interested in finance, the organizers have introduced two complementary tasks: the NTCIR-18 U4 and the SIG-FIN UFO-2024. The SIG-FIN community includes researchers and practitioners actively engaged in finance, while NTCIR attracts participants interested in shared tasks, especially those with a focus on information retrieval and natural language processing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>This paper proposed a framework for two shared tasks, NTCIR-18 U4 and SIG-FIN UFO-2024, which focus on tables within annual securities reports. In these shared tasks, two sub-tasks are conducted: Table Retrieval and Table Question Answering (Table QA), which target the annual securities reports of companies belonging to the TOPIX 100 or TOPIX 500 indexes.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of the Table Retrieval Task and Table QA Task</figDesc><graphic coords="2,72.00,65.61,453.54,149.51" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>SIG-FIN UFO-2024: GPT-4o (gpt-4o-2024-05-13) 0.5750; GPT-4o-mini (gpt-4o-mini-2024-07-18) 0.3957</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Table Retrieval Task and Table QA Task</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table ID</head><label>ID</label><figDesc>For the input, HTML files downloaded from EDINET are used. Each table element (&lt;table&gt;) in the HTML files is assigned a unique Table ID, and when outputting the table that answers the question, this Table ID is used.</figDesc><table><row><cell>Evaluation Accuracy</cell></row><row><cell>An example of input and output is shown below.</cell></row><row><cell>Input 1. For Bandai Namco Holdings Inc.,</cell></row><row><cell>what were the "net assets and key management indicators" as of 2020?</cell></row><row><cell>2. S100ISF1-0000000.html, S100ISF1-0101010.html, ...</cell></row><row><cell>Output S100ISF1-0101010-tab2</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 1</head><label>1</label><figDesc>Validation Results for the Table Retrieval Task</figDesc><table><row><cell>Task</cell><cell>Baseline methods</cell><cell>Accuracy</cell></row><row><cell>NTCIR-18 U4</cell><cell>text-embedding-3-small + Cell Text</cell><cell>0.2111</cell></row><row><cell></cell><cell>text-embedding-3-large + Cell Text</cell><cell>0.1833</cell></row><row><cell></cell><cell>text-embedding-3-small + HTML Text</cell><cell>0.1843</cell></row><row><cell></cell><cell>text-embedding-3-large + HTML Text</cell><cell>0.1418</cell></row><row><cell></cell><cell>text-embedding-3-small + Markdown Text</cell><cell>0.1233</cell></row><row><cell></cell><cell>text-embedding-3-large + Markdown Text</cell><cell>0.1383</cell></row><row><cell>SIG-FIN UFO-2024</cell><cell>text-embedding-3-small + Cell Text</cell><cell>0.1657</cell></row><row><cell></cell><cell>text-embedding-3-large + Cell Text</cell><cell>0.1937</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_7"><head>Table QA: Table Question Answering Task</head><label></label><figDesc>The Table QA task, given a target table, identifies the "value" that answers the question. To accurately determine the answer, complex tables included in annual securities reports need to be handled<ref type="bibr" target="#b21">[22]</ref>.</figDesc><table><row><cell>Input</cell><cell>1. Question</cell></row><row><cell></cell><cell>2. Target table (Table ID)</cell></row><row><cell></cell><cell>3. HTML file of the annual securities report</cell></row><row><cell>Output</cell><cell>Value, Cell ID</cell></row><row><cell cols="2">Evaluation Accuracy (value), Accuracy (cell ID)</cell></row><row><cell cols="2">An example of input and output is shown below.</cell></row><row><cell cols="2">Input 1. For Bandai Namco Holdings Inc.,</cell></row><row><cell></cell><cell>what were the "net assets and key management indicators" as of 2020?</cell></row><row><cell></cell><cell>2. S100ISF1-0101010-tab2</cell></row><row><cell></cell><cell>3. S100ISF1-0000000.html, S100ISF1-0101010.html, ...</cell></row><row><cell cols="2">Output S100ISF1-0101010-tab2-r8c1</cell></row><row><cell cols="2">The input, output, and evaluation criteria for this task are as follows:</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head>Table 2</head><label>2</label><figDesc>Validation Results for the Table QA Task</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_10"><head>Table 3</head><label>3</label><figDesc>Breakdown of Each Dataset for Dry Run</figDesc><table><row><cell>Task</cell><cell></cell><cell cols="3">Train Validation Test Total</cell></row><row><cell>NTCIR-18 U4</cell><cell cols="2">Table Retrieval 22,982</cell><cell>3,131</cell><cell>6,474 32,587</cell></row><row><cell></cell><cell>Table QA</cell><cell>22,982</cell><cell>3,132</cell><cell>6,475 32,589</cell></row><row><cell cols="3">SIG-FIN UFO-2024 Table Retrieval 8,390</cell><cell>1,533</cell><cell>4,487 14,410</cell></row><row><cell></cell><cell>Table QA</cell><cell>8,390</cell><cell>1,534</cell><cell>4,488 14,412</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://disclosure2.edinet-fsa.go.jp/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://research.nii.ac.jp/ntcir/ntcir-18/index-en.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://github.com/nlp-for-japanese-securities-reports/ntcir18-u4, https://github.com/nlp-for-japanese-securities-reports/ufo-2024</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">Despite the name TOPIX500, as of 30 April 2024, it only includes 497 companies. To be more precise, the TOPIX500 is a stock price index composed of the TOPIX Core30, TOPIX Large70 and TOPIX Mid400, but of these, only 397 companies are included in the TOPIX Mid400.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">If there is only one data entry for an item name, one sample is taken; if there are two to five, two samples are taken; if there are six or more, three samples are randomly selected. Data with submitter-specific taxonomies that do not include an item name in the annual securities report CSV, or data not from tables (i.e., data not in HTML's &lt;td&gt; tags) are excluded.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">These are the breakdowns of the initial datasets used for the Dry Run. The datasets used in the Formal Run phase have been modified to fix a couple of issues, and as a result, they contain a slightly different number of entries.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://www.kaggle.com/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research was supported by JSPS KAKENHI Grant Number 21H03769. We would also like to express our gratitude to everyone at the National Institute of Informatics, Japan, the NTCIR Co-chairs, the members of the SIG-FIN Research Group, and our corporate sponsor, Preferred Networks, Inc., for their valuable cooperation in planning these shared tasks.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>By running similar tasks across these two distinct communities, we aim to foster interaction among participants with diverse expertise and perspectives on finance, creating an opportunity for knowledge exchange and collaboration. We look forward to welcoming a diverse group of participants to build a comprehensive and impactful competition.</p><p>The schedule for each task is outlined in Table <ref type="table">4</ref>, showing the parallel timelines and key phases for both NTCIR-18 U4 and SIG-FIN UFO-2024. As illustrated in the table, both tasks share similar phases such as a dry run, formal run, and evaluation period, which will allow participants to apply and refine their approaches across both tasks seamlessly. This alignment ensures that participants will have the opportunity to benefit from complementary insights across the tasks and fosters collaborative learning within the community. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">NTCIR-18 U4 Task</head><p>The schedule for the NTCIR-18 U4 task is as follows: The dataset for the NTCIR-18 U4 shared task was released in July 2024, followed by an online briefing session on July 20, 2024, where participants received essential information about the task. The dry run phase ran from July 2024 to October 31, 2024, during which participants worked on the dataset and refined their methods. Any issues identified in the dataset during this phase were addressed and resolved to ensure a smooth formal run. The formal run phase is scheduled from November 1, 2024, to December 28, 2024. Throughout the NTCIR-18 U4 task, a leaderboard will be used to provide participants with real-time feedback on their performance. Similar to the SIG-FIN UFO-2024 task, the NTCIR-18 U4 leaderboard will display a Public score on the basis of a subset of the test data during the task period, allowing participants to gauge their progress. Evaluation results and final rankings are scheduled to be returned to participants on February 1, 2025, along with a partial publication of the task overview paper summarizing key outcomes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">SIG-FIN UFO-2024 Task</head><p>The schedule for the SIG-FIN UFO-2024 task is as follows: The dataset for this shared task was released on August 15, 2024, with the dry run phase extending until October 31, 2024. During this phase, participants worked on developing their methods using the dataset, and any data issues identified during this period were addressed and corrected. The formal run phase is scheduled from November 1, 2024, to December 28, 2024. The shared task ranking will be determined on the basis of the evaluation method used in Kaggle<ref type="foot" target="#foot_6">7</ref>, incorporating both Public and Private scores. Throughout the shared task, the Public score (calculated from a subset of the test data) will be displayed on the leaderboard. After the shared task concludes, the Private score (evaluated on the remaining portion of the test data) will be calculated. The final results, based on the Private score, are scheduled to be announced at the 34th SIG-FIN in March 2025.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Evaluating Information Retrieval and Access Tasks: NTCIR&apos;s Legacy of Research Impact, The Information Retrieval Series</title>
		<author>
			<persName><forename type="first">T</forename><surname>Sakai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Oard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kando</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-15-5554-1</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>Springer Nature</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Web table extraction, retrieval, and augmentation: A survey</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Balog</surname></persName>
		</author>
		<idno type="DOI">10.1145/3372117</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Intelligent Systems and Technology (TIST)</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1" to="35" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">TabFact: A large-scale dataset for table-based fact verification</title>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Wang</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=rkeJRhNYDH" />
	</analytic>
	<monogr>
		<title level="m">8th International Conference on Learning Representations, ICLR 2020</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Table-based fact verification with salience-aware learning</title>
		<author>
			<persName><forename type="first">F</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pujara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Szekely</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.findings-emnlp.338</idno>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2021</title>
				<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="4025" to="4036" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">TableVLM: Multi-modal pre-training for table structure recognition</title>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.acl-long.137</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<meeting>the 61st Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="2437" to="2449" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents</title>
		<author>
			<persName><forename type="first">D</forename><surname>Prasad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gadpal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kapadni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Visave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sultanpure</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPRW50498.2020.00294</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="2439" to="2447" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2406.14991</idno>
		<title level="m">SpreadsheetBench: Towards challenging real world spreadsheet manipulation</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Table-GPT: Table fine-tuned GPT for diverse table tasks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yashar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Cui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Rifinski Fainman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chaudhuri</surname></persName>
		</author>
		<idno type="DOI">10.1145/3654979</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACM on Management of Data</title>
				<meeting>the ACM on Management of Data</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1" to="28" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">TURL: table understanding through representation learning</title>
		<author>
			<persName><forename type="first">X</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lees</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yu</surname></persName>
		</author>
		<idno type="DOI">10.14778/3430915.3430921</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the VLDB Endowment</title>
				<meeting>the VLDB Endowment</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="307" to="319" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">TableLlama: Towards open large generalist models for tables</title>
		<author>
			<persName><forename type="first">T</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Sun</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2024.naacl-long.335</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long Papers</title>
		<meeting>the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="6024" to="6044" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Large language model for table processing: A survey</title>
		<author>
			<persName><forename type="first">W</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Du</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2402.05121</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">cTBLS: Augmenting large language models with conversational tables</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S</forename><surname>Sundar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Heck</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2023.nlp4convai-1.6</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 5th Workshop on NLP for Conversational AI</title>
				<meeting>the 5th Workshop on NLP for Conversational AI (NLP4ConvAI)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="59" to="70" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A survey on table question answering: Recent advances</title>
		<author>
			<persName><forename type="first">N</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Siebert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-981-19-7596-7_14</idno>
	</analytic>
	<monogr>
		<title level="m">Knowledge Graph and Semantic Computing: Knowledge Graph Empowers the Digital Economy</title>
				<meeting><address><addrLine>Singapore</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Nature</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="174" to="186" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">FinQA: A dataset of numerical reasoning over financial data</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Smiley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Borova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Langdon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Moussa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Beane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-H</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Routledge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">Y</forename><surname>Wang</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.emnlp-main.300</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2021 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="3697" to="3711" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">TAT-QA: A question answering benchmark on a hybrid of tabular and textual content in finance</title>
		<author>
			<persName><forename type="first">F</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-S</forename><surname>Chua</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.acl-long.254</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</title>
		<title level="s">Long Papers</title>
		<meeting>the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="3277" to="3287" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">Romanus</forename><surname>Myrberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Danielsson</surname></persName>
		</author>
		<ptr target="http://lup.lub.lu.se/student-papers/record/9126226" />
		<title level="m">Question-Answering in the Financial Domain</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
		<respStmt>
			<orgName>Department of Computer Science, Lund University</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master&apos;s thesis</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">CLTR: An end-to-end, transformer-based system for cell-level table retrieval and table question answering</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Canim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Glass</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gliozzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fox</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2021.acl-demo.24</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations</title>
				<meeting>the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="202" to="209" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">UFO: Proposal for an information extraction task for tables in annual securities reports</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kimura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kondo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kadowaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kato</surname></persName>
		</author>
		<idno type="DOI">10.11517/jsaisigtwo.2022.FIN-029_32</idno>
	</analytic>
	<monogr>
		<title level="m">JSAI Technical Report, Type 2 SIG</title>
				<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">FIN-029</biblScope>
			<biblScope unit="page" from="32" to="38" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Toward the construction of a dataset for table structure analysis for annual securities reports</title>
		<author>
			<persName><forename type="first">K</forename><surname>Kadowaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kimura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kondo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ototake</surname></persName>
		</author>
		<idno type="DOI">10.11517/jsaisigtwo.2023.FIN-030_100</idno>
	</analytic>
	<monogr>
		<title level="m">JSAI Technical Report, Type 2 SIG</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">FIN-030</biblScope>
			<biblScope unit="page" from="100" to="105" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Creating a question-answering dataset for securities reports and evaluation of the method using LLM (in Japanese)</title>
		<author>
			<persName><forename type="first">E</forename><surname>Sato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kimura</surname></persName>
		</author>
		<ptr target="https://www.ieice.org/publications/search/summary.php?id=132450&amp;tbl=ken&amp;lang=jp" />
	</analytic>
	<monogr>
		<title level="j">IEICE Technical Report</title>
		<imprint>
			<biblScope unit="volume">124</biblScope>
			<biblScope unit="issue">173</biblScope>
			<biblScope unit="page" from="93" to="98" />
			<date type="published" when="2024">2024</date>
			<publisher>The Institute of Electronics, Information and Communication Engineers</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Analysis of tabular data contained in the TOPIX100 annual securities report</title>
		<author>
			<persName><forename type="first">E</forename><surname>Sato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kaji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kimura</surname></persName>
		</author>
		<idno>E-021</idno>
		<ptr target="https://www.ieice.org/publications/conferences/summary.php?id=FIT0000015362&amp;ConfCd=F&amp;conf_type=F&amp;year=2022" />
	</analytic>
	<monogr>
		<title level="m">The 21st Forum on Information Technology (FIT2022)</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note>in Japanese</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Analysis of machine-unreadable table structures in securities reports</title>
		<author>
			<persName><forename type="first">K</forename><surname>Okuyama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kimura</surname></persName>
		</author>
		<ptr target="https://www.anlp.jp/proceedings/annual_meeting/2024/pdf_dir/P3-20.pdf" />
	</analytic>
	<monogr>
		<title level="m">The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024)</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page">P3-20</biblScope>
		</imprint>
	</monogr>
	<note>in Japanese</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
