<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Enhancing Natural Language Understanding in Large Language Models by Symbolic Representation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Bingqian</forename><surname>Li</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">ShanghaiTech University</orgName>
								<address>
									<settlement>Shanghai</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Baiyang</forename><surname>Song</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">University of Science and Technology of China</orgName>
								<address>
									<settlement>Hefei</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Yi</forename><surname>Zhou</surname></persName>
							<email>yi_zhou@ustc.edu.cn</email>
							<affiliation key="aff1">
								<orgName type="institution">University of Science and Technology of China</orgName>
								<address>
									<settlement>Hefei</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Enhancing Natural Language Understanding in Large Language Models by Symbolic Representation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">8832E87C20B119C692936C1D14320348</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:57+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Domain Knowledge</term>
					<term>Semantic Parsing</term>
					<term>Symbolic Representation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents the Symbolically Enhanced Neural Inference Framework (SENIF), which enhances the natural language understanding (NLU) capabilities of large language models (LLMs) such as GPT-4 by combining them with symbolic representations. The proposed method improves the performance of LLMs by enabling them to infer over formalized statements. The framework employs Assertional Logic (AL) as its foundational representation. After developing a Concept-Operator (CO) diagram for the domain, the framework translates natural language utterances into logical expressions. We propose a zero-shot parser that enables smaller language models to yield high-quality parsing results for a given CO diagram. We then design a Chain-of-Thought (CoT) prompt that takes both the original text and the parsing results from the preceding step as inputs. Experimental results show that LLMs such as GPT-4 can greatly benefit from these high-quality parsing results. Our framework substantially improves GPT-4's performance, elevating the most challenging measure, C@90, by 46.67% (40% → 86.67%). We have also verified the framework's feasibility for modeling in other domains and with medium-sized language models. This research provides a promising direction for enhancing the inference capabilities of large language models.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Natural Language Understanding (NLU) is a challenging task, even for the most advanced and powerful language models. It demands comprehension not only of the syntactic structure of the language but also of semantic meanings, contextual cues, and pragmatic factors. This intricate nature of language comprehension presents a formidable challenge even for large models such as ChatGPT or GPT-4.</p><p>Human comprehension of the world is a synthesis of perception and cognition, indicating that our understanding is not purely data-driven <ref type="bibr" target="#b0">[1]</ref>. Rather, it involves a combination of learned knowledge, experiences, and symbolic reasoning. It therefore stands to reason that integrating symbolic representations into large language models may enhance their language understanding capabilities <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. With symbolic representations, models may better encode and utilize the abstract, high-level concepts and relationships inherent in language.</p><p>Both formal reasoning and language models exhibit imperfections in language understanding. Formal reasoning, despite its proficiency in concept comprehension and inference, is often hindered by generalization issues, impeding its practical application. In contrast, large language models, despite their expansive coverage, often fail to accurately capture complex reasoning processes, limiting their reliability. One could even argue that the accuracy of language models in machine reading comprehension tasks relies more on suitable QA pairs than on a genuine understanding of the question. 
This point is emphasized and robustly tested by the ZEST benchmark, which is why we have chosen to focus our efforts on this dataset <ref type="bibr" target="#b3">[4]</ref>.</p><p>In light of these challenges, we first use a CO diagram based on assertional logic to obtain a symbolic representation of domain prior knowledge, and then use a CoT prompt-based approach to incorporate it into the neural network. This method integrates the generalization and fuzzy matching capabilities of language models with the precision of formal representations, and it significantly improves model performance on language understanding tasks. Moreover, to efficiently obtain formal representations in an open domain, we present a semantic parser for assertional logic <ref type="bibr" target="#b4">[5]</ref>. This algorithm confers several advantages, including swift cross-domain migration, ease of improvement, and independence from annotated data, addressing core challenges in the field of semantic parsing.</p><p>To validate our claims, we apply our proposed methodology to approximately 200 examples extracted from the ZEST benchmark. We further annotate about 400 assertions in assertional logic to evaluate the performance of our zero-shot parser. Meanwhile, we used a subset of ZEST for rapid automatic modeling, and fine-tuned LLaMA3 on the data parsed with the resulting CO diagram. Our experiments yield two key insights: 1) formal reasoning is an essential complement to neural inference (40.00% → 73.33%); 2) high-quality parsing results are key to benefiting the language model (40.00% → 86.67%). Our approach is effective both for quick, rough domain modeling and for fine-tuning medium-sized models. 
However, if the parsing and reasoning processes are suboptimal, they may significantly degrade performance in Machine Reading Comprehension (MRC) (30.00% → 6.67% for turbo).</p><p>In conclusion, our contributions are as follows:</p><p>1. We introduce the Symbolically Enhanced Neural Inference Framework (SENIF), which mimics the way humans process semantics and combines the powerful capabilities of language models with symbolic representations. This blend leverages the former's generalization and fuzzy matching capabilities, along with the precision of the latter, to markedly improve model performance on NLU tasks. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Concept-Operator Diagram</head><p>Figure <ref type="figure" target="#fig_1">1</ref> is an illustration of the CO diagram. A concept is represented by a rectangle and an operator by a diamond; we capitalize concept names to distinguish concepts from operators, especially when written as logical expressions. In this figure, 'NUMBER' refers to the set of numbers in mathematics, such as 1, 5.201, 1/3, and so on, while 'addition' represents a logical operation: a relation, or a map from the left-hand side (LHS) to the right-hand side (RHS). The logical expression corresponding to Figure <ref type="figure" target="#fig_1">1</ref> is addition (NUMBER, NUMBER) = NUMBER. Its semantics is that the sum of two numbers equals another number. An instance of this operator is 2 + 3 = 5.</p><p>Concepts and operators can be nested and can themselves be treated as individuals. Additionally, the CO diagram serves assertional logic, which possesses at least higher-order expressiveness. This allows complex relationships and rules, such as the Pythagorean theorem, to be represented, which is challenging for tuple-based KBs. The CO model is an expressive model that enhances traditional data models by enabling reasoning and inference capabilities, overcoming the traditional models' inability to perform inference. This allows the CO model to describe broad knowledge by modeling various types of concepts and their relationships.</p><p>The CO diagram is a powerful tool for representing knowledge in a way that is both intuitive and expressive. It allows logical relationships to be expressed clearly and concisely.</p></div>
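To make the semantics concrete, the CO diagram can be mirrored in code. The sketch below is our own illustration (not part of the paper's implementation): concepts become named domains and operators become typed maps, so the assertion addition (NUMBER, NUMBER) = NUMBER is a checkable signature and 2 + 3 = 5 is an instance of it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    """A concept from the CO diagram (drawn as a rectangle), e.g. NUMBER."""
    name: str

    def contains(self, individual) -> bool:
        # Minimal membership test for the illustration: NUMBER holds any
        # numeric individual; other concepts accept anything.
        if self.name == "NUMBER":
            return isinstance(individual, (int, float))
        return True

@dataclass(frozen=True)
class Operator:
    """An operator (drawn as a diamond): a map from LHS concepts to an RHS concept."""
    name: str
    lhs: tuple
    rhs: Concept

NUMBER = Concept("NUMBER")
addition = Operator("addition", (NUMBER, NUMBER), NUMBER)

def well_typed(op: Operator, args: tuple, result) -> bool:
    """Check that an assertion such as addition(2, 3) = 5 fits the diagram."""
    return (len(args) == len(op.lhs)
            and all(c.contains(a) for c, a in zip(op.lhs, args))
            and op.rhs.contains(result))
```

For instance, `well_typed(addition, (2, 3), 5)` holds, while a malformed assertion with the wrong arity or a non-numeric argument does not.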
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Pipeline</head><p>The overall pipeline of the proposed SENIF is shown in Figure <ref type="figure" target="#fig_3">2</ref>. To enhance the performance of the traditional method that leverages language models for NLU tasks, our research introduces symbolic representations and simple reasoning into the existing framework. The central hypothesis is that by infusing these two elements, the model can handle the higher-level, abstract thoughts that often elude pre-trained language models, thereby improving overall performance. To allow for generalization, we have designed a zero-shot parser for this step (Figure <ref type="figure" target="#fig_3">2b</ref>); we treat the parsing task as a combination of Named Entity Recognition (NER) and MRC tasks. To integrate symbolic representation and reasoning, we add the semantic parsing results as a third input alongside the existing question and context, and we designed a chain-of-thought prompt that effectively integrates these three inputs (question, context, and semantic parsing results) for further analysis, as illustrated in Figure <ref type="figure" target="#fig_3">2c</ref>.</p></div>
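As a sketch of how the three inputs might be combined in the final step, the hypothetical template below assembles question, context, and parsing results into one CoT prompt whose five steps mirror the inference procedure described in the case study of Section 3.4. It is our illustration, not the exact prompt from the appendix.

```python
# Hypothetical CoT prompt template; the exact wording used in the paper's
# appendix may differ.
COT_TEMPLATE = """Question: {question}
Context: {context}
Parsed facts (assertional logic): {facts}

Let's reason step by step:
1. Identify the primary information in the question.
2. Select the relevant parsed facts.
3. Synthesize the original context with the parsing results.
4. Perform reasoning over the combined information.
5. Provide the answer.
Answer:"""

def build_senif_prompt(question: str, context: str, facts: list) -> str:
    """Assemble the three SENIF inputs into a single chain-of-thought prompt."""
    return COT_TEMPLATE.format(question=question, context=context,
                               facts="; ".join(facts) or "(none)")
```

The resulting string is what a generative model such as GPT-4 would receive in place of the plain question/context pair.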
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Domain-specific CO Diagram</head><p>To begin with, we build a corpus from https://www.whitehouse.gov/about-the-white-house/presidents/ to model the presidential domain of the ZEST benchmark; it contains concise but essential information about the presidents. The information gathered from the website is used to abstract the core concepts and extract the relationships, called operators. This information is in natural language format and requires no annotation or processing. The operators help algorithms understand how the different concepts relate to each other and help them integrate domain-specific knowledge.</p><p>Based on this corpus, we use both manual processing and automatic processing by a large language model to abstract concepts and operators from natural language, expanding outward with different conceptual relationships, ultimately establishing a model that covers the domain and meets modeling quality standards.</p><p>The criteria for modeling quality include low semantic information loss, simplicity, etc. We now explore some of these criteria in detail to help understand how they can be achieved when modeling the educational experience of presidents.</p><p>The first criterion is low semantic information loss. Compare the flawed modeling "resident_period (PERSON) = PERIOD" with the correct one "resident_info (PERSON, PERIOD) = PLACE" for contexts like "The family lived in Lamar until Harry was ten months old". The first loses the dependency between a certain place and a certain period; in other words, the inference system will be confused if there are multiple places and periods of residence.</p><p>For simplicity, too many variables make the model difficult to extract from and infer over. For instance, "school_of (PERSON) = SCHOOL and belong_to (CLASS) = SCHOOL" is better than "class_of (PERSON, SCHOOL) = CLASS" because the information of the latter can be derived from the simpler former. 
Another example is "birth_date (PERSON) = DATE and birth_place (PERSON) = PLACE" versus "birth_info (PERSON, DATE) = PLACE". We prefer the former: the two have the same semantics, since a person is born only once, and the former is simpler.</p><p>Achieving all quality criteria simultaneously is nearly impossible; we need to balance them to obtain the best model. The balance differs between fields and requires experimentation in the field being modeled. </p></div>
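The information-loss criterion can be checked mechanically. In the hypothetical fact stores below (our illustration; the period labels and the second residence entry are invented placeholders), the faithful modeling resident_info (PERSON, PERIOD) = PLACE can answer where a person lived during a given period, while the lossy resident_period (PERSON) = PERIOD cannot.

```python
# Lossy modeling: resident_period(PERSON) = PERIOD drops the place/period link.
lossy = {("resident_period", ("Harry",)): "infancy"}

# Faithful modeling: resident_info(PERSON, PERIOD) = PLACE keeps the dependency.
faithful = {
    ("resident_info", ("Harry", "infancy")): "Lamar",
    ("resident_info", ("Harry", "childhood")): "PlaceB",  # hypothetical entry
}

def where_lived(facts: dict, person: str, period: str):
    """Answer 'where did <person> live during <period>?'; None if unanswerable."""
    return facts.get(("resident_info", (person, period)))
```

`where_lived(faithful, "Harry", "infancy")` returns "Lamar", while the same query against the lossy store returns None; this is exactly the confusion described above when there are multiple places and periods of residence.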
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Zero-shot Semantic Parser</head><p>Most existing semantic parsing datasets are limited to parsing short sentences and single facts <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>. Although MIVS <ref type="bibr" target="#b7">[8]</ref> introduced a semantic parsing dataset for multiple facts, it is essentially a compilation of single-fact datasets, making it relatively mechanical and challenging to apply to real-world scenarios. We therefore developed a simple zero-shot semantic parser.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1.">Two-stage algorithm</head><p>This paper presents a semantic parsing process that is controlled by a given CO diagram and designed for an open-domain task. This parsing process is difficult to accomplish with traditional algorithms, or even with advanced language models such as ChatGPT or Davinci, without fine-tuning.</p><p>We use a two-stage algorithm. In the first stage, we utilize an open-domain named entity recognition model (hereafter referred to as OpenNER) to recognize individuals belonging to certain concepts; in the second stage, an MRC system fills the variables of the operators related to the concepts identified in stage one. The MRC step is based on automatically generated question templates. This two-stage approach allows us to capture the relationships between individuals more accurately and efficiently. In this paper, we use UIE <ref type="bibr" target="#b8">[9]</ref> and DeBERTa-v3-large-squad2 as the base models.</p></div>
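The two-stage algorithm can be sketched as follows. Here `ner` and `mrc` are stand-ins for the OpenNER model (UIE) and the MRC model (DeBERTa-v3-large-squad2), and the operator/template dictionary format is our own simplification, not the paper's data structures.

```python
def two_stage_parse(context, operators, ner, mrc):
    """Stage 1: recognize (concept, individual) pairs with open-domain NER.
    Stage 2: for each operator whose result concept was recognized, fill its
    argument slots by querying the MRC model with auto-generated templates."""
    individuals = ner(context)                      # e.g. [("DEGREE", "bachelor")]
    found = {concept for concept, _ in individuals}
    facts = []
    for op in operators:
        if op["result"] not in found:
            continue
        value = next(ind for c, ind in individuals if c == op["result"])
        # One MRC question per unfilled argument slot; "[X]" marks where the
        # recognized individual is spliced into the template.
        args = [mrc(op["templates"][slot].replace("[X]", value), context)
                for slot in op["args"]]
        facts.append(f'{op["name"]}({", ".join(args)}) = {value}')
    return facts
```

With stub `ner`/`mrc` functions, this reproduces the shape of the case study in Section 3.4, where recognizing 'bachelor' as a DEGREE triggers MRC queries that fill the PERSON and PERIOD slots of degree_obtained.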
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.2.">Templates for MRC step</head><p>The MRC system starts with a pre-compiled set of templates, where each template corresponds to a specific operator. For example, the MRC system can answer questions like "Who is six feet tall?" by using the template "Who is [HEIGHT] tall?", which corresponds to questions asking for the person with a particular height. In a zero-shot scenario, it is therefore necessary to construct such templates automatically.</p><p>Capitalizing on the advancements in in-context learning, it has become feasible to generate question-answer templates for each operator, completing the final step towards constructing a parser for a given CO diagram with almost complete automation and without annotations.</p><p>The generation process is prompted by a combination of instruction, chain-of-thought, and standard prompting, which we have found to strike an appropriate balance between quality and variety. We present a brief overview of this schema in Table <ref type="table" target="#tab_1">1</ref>. We found this combination to be better than using only instructions or only the chain-of-thought prompt with more examples.</p><p>In fact, incorrect templates outnumber correct ones during the generation process. Fortunately, some hard constraints can be employed to detect all faults when using the prompt shown in Table <ref type="table" target="#tab_1">1:</ref> • The number of question templates for each operator should equal the number of concepts that need to be filled. • Every question template is only permitted to use concepts with known values, because the slots are queried one by one.</p><p>The complete generation process involves the following steps:</p><p>1. Set the temperature to 0.0 and the maximal number of tries to 20. 2. Alternate between the text-davinci-003 and gpt-3.5-turbo models to generate the templates. 3. Verify the results using the aforementioned hard constraints. 
If the templates do not pass the test, increase the temperature by 0.1 and repeat. 4. Repeat steps 2-3 until correct question templates are generated or the maximal number of tries is reached.</p><p>With this schema, correct templates are always obtained whenever the constraints can be satisfied; only two operators failed. The absence of templates for a few operators is insignificant in practice.</p><p>Moreover, the davinci model is more reliable than the turbo model in precise scenarios, which is consistent with our observations when the two are used as baselines for zero-shot semantic parsing.</p></div>
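The constraint check and retry loop described above can be sketched as follows. The `[CONCEPT]` placeholder syntax and the `generate` callable are our assumptions; in the paper, generation alternates between text-davinci-003 and gpt-3.5-turbo.

```python
import re

def satisfies_constraints(templates, n_slots, known_concepts):
    """Hard constraints from the paper: one question template per slot to fill,
    and every template may only mention concepts whose values are already known."""
    if len(templates) != n_slots:
        return False
    for t in templates:
        if any(c not in known_concepts for c in re.findall(r"\[([A-Z_]+)\]", t)):
            return False
    return True

def generate_templates(generate, n_slots, known_concepts, max_tries=20):
    """Retry with rising temperature until the templates pass the constraints."""
    temperature = 0.0
    for _ in range(max_tries):
        templates = generate(temperature)   # LLM call, stubbed for illustration
        if satisfies_constraints(templates, n_slots, known_concepts):
            return templates
        temperature += 0.1                  # loosen the sampling and retry
    return None  # a few operators may never yield valid templates
```

Because the constraints are purely structural, they can reject every malformed generation without any annotated data, which is what keeps the whole parser zero-shot.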
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Case study for Symbolic-Enhanced Neural Inference Framework</head><p>Finally, a case study illustrates the complete SENIF pipeline (Figure <ref type="figure" target="#fig_3">2</ref>). Consider the question "What academic credentials does this president hold?" and the context "Trump received a bachelor's degree in 1968.". Suppose that we have constructed a CO diagram (Figure <ref type="figure" target="#fig_3">2a</ref>); the zero-shot parser then extracts the structural information with the two-stage algorithm (Figure <ref type="figure" target="#fig_3">2b</ref>):</p><p>1. Identify the degree concept and its individual 'bachelor', and turn to filling "degree_obtained (PERSON, PERIOD) = DEGREE".</p><p>2. Query the MRC models via automatically generated templates and obtain the symbolic representation "degree_obtained (Trump, 1968) = bachelor".</p><p>Next, the generative models receive the question, context, and symbolic representations as inputs (Figure <ref type="figure" target="#fig_3">2c</ref>). The inference process is then completed in five steps: identifying the primary information, selecting the relevant knowledge, synthesizing the original context with the parsing results, performing reasoning, and finally, providing the answer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Setup</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Datasets and Metrics</head><p>Datasets To demonstrate the practical significance of our framework and to preliminarily explore the potential of integrating symbolic logic reasoning with large language models, we selected a subset of approximately 200 question-answer pairs from the ZEST dataset to test within the specific domain that we manually modeled. With its innovative scoring mechanism (C@K) and challenging problem design, ZEST effectively measures whether models truly understand the questions, rather than merely obtaining correct answers by chance from input pairs that happen to fit the model well. Meanwhile, because our methodology depends on parsing quality, we also need a dataset for analyzing that quality.</p><p>Due to the lack of a publicly available benchmark for assessing semantic parsing into assertional logic, we have annotated a dataset of 400 assertions to serve as the test set. Notably, our approach to semantic parsing does not require training datasets. To improve the reliability of the evaluation, there are some differences in detail; see Appendix B.1.</p><p>Furthermore, to quickly verify the effectiveness of our method in other fields, we selected all questions matching the prompt words from the training set of the ZEST benchmark and used a large language model for zero-shot modeling (different from the previous manual-plus-automatic modeling), covering questions in various fields such as presidents, national parks, and dog breeds. We tested about 800 question-answer pairs under this modeling to verify the versatility of our method.</p><p>Metrics for the NLU task In line with the metrics employed in the foundational study by <ref type="bibr" target="#b3">[4]</ref>, we utilize Mean F1, C@75, and C@90 for assessment. 
In this benchmark, each question is associated with around 20 ⟨context, answer⟩ pairs. Mean F1 denotes the average F1 score, while C@A is a specialized metric under which an algorithm receives a score of 1 only if the average F1 across a question's roughly 20 ⟨context, answer⟩ pairs surpasses A%.</p><p>Metrics for the parsing task We report precision and recall under the exact-match condition, as employed in the SQuAD 2.0 <ref type="bibr" target="#b10">[10]</ref> benchmark. Specifically, we perform a variable-wise matching of all assertions, assigning a score of 1 when they are the same and 0 otherwise. The maximal score across all the gold assertions is then taken as the final score. Note that a score of 0 is assigned whenever the operators do not match, as this implies inconsistent underlying semantics.</p><p>Due to the zero-resource constraint, we employ NER and QA models to extract facts that align with the semantics of the original context. We do not refine these facts by checking whether they correspond to the original sentences or merely possess similar semantics. For instance, given the context "Alice is the mother of Bob.", the facts "mother_of (Bob) = Alice" and "child_of (Alice) = Bob" are both correct, although the latter is not stated as such in the original sentence. However, this inherent deficiency has no practical implications and can even be regarded as advantageous, as it alleviates the difficulties associated with reasoning.</p><p>To incorporate these accurate facts into the computation of precision and recall, we developed an inference system to augment the given parsing outcomes. A notable observation is that more extensive language models yield a greater quantity of supplementary facts. 
This can be ascribed to the superior inference capabilities of larger models, which can generate novel facts when processing contexts.</p><p>The details of the inference system and the ablation experiments are given in Appendix B.2.</p></div>
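As a sketch of the headline metric, C@K as described above can be computed like this. Each question carries the F1 scores of its roughly 20 ⟨context, answer⟩ pairs; this is our minimal illustration, not the official ZEST scorer.

```python
def mean_f1(scores):
    """Average F1 over one question's <context, answer> pairs."""
    return sum(scores) / len(scores)

def c_at_k(per_question_f1, k):
    """C@K: a question scores 1 only if its mean F1 exceeds K%; report the
    percentage of questions that clear that bar."""
    hits = sum(1 for scores in per_question_f1 if mean_f1(scores) > k / 100)
    return 100.0 * hits / len(per_question_f1)
```

Under this reading, a model must be consistently right across all contexts of a question to earn any credit, which is why C@90 is the most demanding of the three measures.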
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Baselines</head><p>In this study, we evaluate our proposed algorithm by comparing it with the state-of-the-art baselines of ZEST (BART and T5) and the most powerful generative models: Text-Davinci-003, GPT-Turbo-3.5, and GPT-4, all renowned for their few-shot and zero-shot learning capabilities. To ensure a fair comparison and reproducibility, we maintain the same parameters and prompts across the GPT-family models, including temperature (0.0), max_tokens (2048), and a '\n' stop marker. The complete prompts can be found in Appendix C.2. The training details of BART and T5 can be found in Appendix B. <ref type="bibr" target="#b2">3</ref>.</p><p>Due to the non-determinism of generative models, we repeated each experiment three times and report the mean value.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results and Analysis</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">NLU task performance</head><p>To demonstrate the superiority of the proposed SENIF in enhancing the language understanding capabilities of large models, we compared it with advanced generative models on the NLU task. In the experiments, we employed three types of prompts:</p><p>• A few-shot prompt, requiring the model to respond to the question directly; • A CoT prompt, which requires the model to first parse the input into formal expressions, followed by inference and response. We anticipate that this methodology enhances both the reliability and the interpretability of reading comprehension tasks. • Almost the same prompt, but with the parsing step replaced by the results of our zero-shot parser (SENIF).</p><p>As evidenced in Table <ref type="table" target="#tab_2">2</ref>, our scheme considerably outperforms the baseline methods on the test examples. It is important to note that our approach does not focus solely on reading comprehension tasks; it views them merely as one means of validating its effectiveness. The success reveals the feasibility of integrating symbolic logic with neural network-based inference.</p><p>Second, it can be observed that the prompt requiring the model to parse the input before answering yields weaker results than the simple prompt for davinci and turbo. We attribute this to two main factors:</p><p>• The second type of prompt does not provide sample data for the model to learn from in context; • Insufficiently accurate and reliable parsing results may interfere with the model's output.</p><p>However, it is worth noting that replacing the parsing step with our algorithm's parsing results achieves a significant improvement. 
We believe this demonstrates the potential of incorporating symbolic reasoning to enhance the inference reliability of language models (the ZEST dataset assesses whether the model genuinely comprehends the questions). This improvement, however, relies on high parsing accuracy, an observation that parallels the success of CoT, which depends on the model's consistency and factual accuracy. To verify the relationship between our method and parsing quality, we evaluated the parsing quality of the GPT family and of our method. Table <ref type="table" target="#tab_3">3</ref> presents an overview of the semantic parsing performance of the GPT models and ours.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Evaluation of Semantic Parsing</head><p>In our experiments, the proposed model, with only about 700M parameters, demonstrates a significant performance improvement, achieving approximately a 40.40% increase in precision compared to turbo while surpassing the recall of davinci by 15.23%. Notably, the Turbo and Davinci models struggle to achieve high precision and recall simultaneously, whereas our model attains state-of-the-art results in both.</p><p>We attribute this enhancement primarily to the alignment between assertional logic and our structure. More importantly, these results suggest the potential for driving existing knowledge representation towards greater complexity and controllability (stemming from the construction of the modeling process), ultimately aiding in constructing a more sophisticated knowledge base. This approach holds promise for addressing challenges in knowledge computation that arise from inconsistencies between knowledge representations and knowledge bases, as well as for reducing the high resource demands of semantic parsing for specific or complex languages.</p><p>To show the relationship between NLU and parsing performance, we plot the performance difference on the ZEST dataset before and after incorporating the parsing step against the performance of the baseline models on the parsing data. From Figure <ref type="figure" target="#fig_5">4</ref>, a positive correlation can be observed: parsing results with high precision are a key element for the validity of the extra formal steps, and precision matters more than recall, as seen by comparing Figure <ref type="figure" target="#fig_5">4a</ref> and Figure <ref type="figure" target="#fig_5">4b</ref>. 
This finding provides further evidence supporting the claim that our framework relies on the precision of symbolic representation, in conjunction with the fuzzy matching capabilities of large language models, to enable broader reasoning. This observation is in line with our initial hypothesis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Generalization Experiment</head><p>To quickly and comprehensively validate the generalizability of our approach, we automatically model additional domains from the ZEST benchmark in zero-shot scenarios. Our approach to rapid domain-specific modeling involves the following steps: • Entity Extraction: identification of all entities within the text for subsequent concept formation. • Entity to Concept: abstraction of entities into real-world concepts; for example, the entity "red" is abstracted into the concept "COLOR". • Relation Extraction: identification and extraction of the relevant relationships between the extracted entities and their corresponding concepts.</p><p>To enhance the quality of modeling, we applied filter conditions to the final results using the prompts detailed in Appendix C.3. We counted the frequency of all concepts, removing concepts, and their corresponding operators, that appeared too infrequently. Additionally, we filtered out operators with identical meanings based on semantic similarity.</p><p>As shown in Table <ref type="table" target="#tab_4">4</ref>, our method consistently achieves optimal results even with rough modeling. This not only verifies the superior generalization capability of our approach but also highlights the potential of combining symbolic language with large language models. We also analyzed the reasons for the performance decline when extending to other fields: the quality of zero-shot modeling is significantly worse than that of the manually constructed domain CO diagram, with obvious problems such as semantic loss and high complexity. 
For example, for the sentence "Malamutes were thought to be bred by the Malemiut Inupiaq people of Alaska's Norton Sound region.", automatic modeling tends to focus on the main part of the sentence, modeling "(ANIMAL) be_bred_by (PERSON)", while missing another important piece of semantics: (PERSON) live_in (PLACE). Such cases cause the performance drop in other domains, which again verifies the importance of high-quality domain knowledge for model reasoning.</p><p>Furthermore, to show that other models can also combine symbols to improve their language understanding, we fine-tune LLaMA3 with the LoRA framework, using the zero-shot parser over data built from automatically generated CO diagrams. We use the zero-shot parser to process a subset of the ZEST training set, 700 question-answer pairs in total, as the fine-tuning dataset. We fine-tune LLaMA3 in two settings: question-answer pairs (Q/A) and question-answer pairs plus our parsing results (Q/A/R). As shown in Table <ref type="table">5</ref> (the performance of SENIF on other models), our method continues to achieve superior performance with the fine-tuned LLaMA3, suggesting that models can benefit from domain knowledge or structured knowledge.</p></div>
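The frequency and similarity filters used in the rapid modeling step can be sketched as follows. Here `similar` stands in for the semantic-similarity test (in practice presumably an embedding comparison), and the operator dictionary format is our own simplification.

```python
from collections import Counter

def filter_diagram(operators, min_count=2, similar=None):
    """Drop operators built on rarely occurring concepts, then drop operators
    whose meaning duplicates one that has already been kept."""
    counts = Counter(c for op in operators for c in op["concepts"])
    kept = []
    for op in operators:
        if any(counts[c] < min_count for c in op["concepts"]):
            continue                      # a concept appears too infrequently
        if similar and any(similar(op["name"], k["name"]) for k in kept):
            continue                      # same meaning as a kept operator
        kept.append(op)
    return kept
```

For example, with a similarity test that merges a hypothetical `bred_by` into `be_bred_by`, only one of the two survives, and an operator built on a concept seen only once is discarded.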
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 5</head><label>5</label><figDesc>The performance of SENIF on other models.</figDesc><table><row><cell>Models</cell><cell>Mean</cell><cell>C@75</cell><cell>C@90</cell></row><row><cell>llama3-8b-instruct (Q/A)</cell><cell>40</cell><cell>9</cell><cell>0</cell></row><row><cell>llama3-8b-instruct (Q/A/R)</cell><cell>46</cell><cell>20</cell><cell>0</cell></row></table></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Related work</head><p>Symbolic systems Traditional symbolic approaches encode precise knowledge in frameworks such as knowledge bases <ref type="bibr" target="#b11">[11,</ref><ref type="bibr" target="#b12">12]</ref>, axiom systems for highly specialized domains such as pouring water <ref type="bibr" target="#b13">[13,</ref><ref type="bibr" target="#b14">14]</ref>, and so on. However, these systems struggle with over-generalization and are difficult to acquire.</p><p>NLU in LLMs On the other hand, language models have powerful universal capabilities across many downstream tasks <ref type="bibr" target="#b15">[15,</ref><ref type="bibr" target="#b16">16]</ref>, but they lack a true understanding of the world and are weak at reasoning <ref type="bibr" target="#b17">[17,</ref><ref type="bibr" target="#b18">18]</ref>, <ref type="bibr" target="#b19">[19,</ref><ref type="bibr" target="#b20">20]</ref>. LLMs may rely only on surface patterns <ref type="bibr" target="#b19">[19]</ref>, suitable input pairs <ref type="bibr" target="#b3">[4]</ref>, or shortcuts <ref type="bibr" target="#b21">[21]</ref> to infer, without truly understanding the background context.</p><p>Symbolic-enhanced systems Therefore, researchers have made numerous efforts to combine traditional AI with language models. Approaches include neuralizing rule-based systems <ref type="bibr" target="#b23">[22,</ref><ref type="bibr" target="#b24">23]</ref>, neural module networks <ref type="bibr" target="#b25">[24,</ref><ref type="bibr" target="#b26">25]</ref>, soft or hard symbolic constraints <ref type="bibr" target="#b27">[26,</ref><ref type="bibr" target="#b2">3]</ref>, formal reasoning-based systems <ref type="bibr" target="#b28">[27]</ref>, and so on. Despite these attempts, these methods have yet to successfully combine the advantages of symbolism and connectionism, often relying too heavily on one over the other. 
We believe that the most beneficial elements of these two technology pathways are the fuzzy matching capability of large language models and the high precision of symbolic systems. Our work focuses on merging these elements within advanced generative models: we use symbolic representation to provide precise knowledge and language models to enable universal inference.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>We have explored an innovative approach (SENIF) for augmenting the comprehension capabilities of large language models. Our findings suggest that integrating symbolic representation into LLMs significantly improves NLU ability, offering promising directions for future advancements in the field.</p><p>Further, the introduction of a zero-shot parser designed for the CO diagram is another significant contribution of our work. The parser's capacity for quick cross-domain migration, ease of enhancement, and independence from annotated data make it a potent tool for translating natural language into formal representations, a critical step in improving NLU tasks.</p><p>We conduct empirical validation on NLU examples and on our own annotated semantic parsing dataset. The results offer strong evidence of our approach's efficacy, while our findings also underscore its potential for cross-domain applicability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Limitations</head><p>Our approach works well in zero-shot scenarios and naturally benefits from improvements in NER and MRC models without additional effort. However, using information extraction for approximate semantic parsing introduces issues of reasoning efficiency, extraction redundancy, and the inherent gap between the two tasks, which limit further scaling in size and accuracy. Meanwhile, our zero-shot parsing algorithm is affected by scale: when facing large-scale domain-knowledge CO diagrams, its complexity slows down reasoning.</p><p>Furthermore, the challenge of multi-step reasoning tasks remains unresolved for large language models. It is therefore imperative to pursue further investigations based on the proposed framework in order to integrate the capabilities of large language models more deeply into the reasoning process. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Modeling results for CO diagram</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>A Concept-Operator Diagram (CO diagram) is a graphical representation of a knowledge representation model based on assertional logic. In this logic, knowledge is represented in the form "𝑎 = 𝑏", where 𝑎 and 𝑏 are either atomic or compound individuals. Its syntax has three components: individuals, concepts, and operators. Concepts are represented as rectangles in the diagram, while operators are represented as diamonds. Since individuals only represent specific instances of concepts, they are typically not included in a CO diagram.</figDesc></figure>
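As an illustration of the assertional-logic reading of a CO diagram, the sketch below encodes a few assertions of the form 𝑎 = 𝑏 using two operators that do appear in the paper's operator table (birth_place, mother_of); the individuals, concept membership, and the `query` helper are invented for illustration only:

```python
# Concepts type the individuals; operators map individuals to individuals;
# every piece of knowledge is an assertion of the form a = b.
concepts = {"PERSON": {"Obama", "Ann Dunham"}, "PLACE": {"Honolulu"}}
operators = {"birth_place": ("PERSON", "PLACE"), "mother_of": ("PERSON", "PERSON")}

# Assertions of the form operator(individual) = individual.
assertions = {
    ("birth_place", "Obama"): "Honolulu",
    ("mother_of", "Obama"): "Ann Dunham",
}

def query(op, individual):
    """Look up the assertion operator(individual) = ?, checking the LHS type."""
    lhs_concept, rhs_concept = operators[op]
    assert individual in concepts[lhs_concept], f"{individual} is not a {lhs_concept}"
    return assertions.get((op, individual))
```

Here `query("birth_place", "Obama")` resolves the compound individual birth_place(Obama) to the atomic individual Honolulu, mirroring how operators connect concept rectangles in the diagram.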
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: A simple example for CO diagram.</figDesc><graphic coords="2,72.54,489.37,212.61,61.12" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>•</head><label></label><figDesc>Domain-specific CO diagram We construct a domain-specific CO diagram based on the collected domain information text, which contains the necessary meta-knowledge in a domain. • Parsing based on CO diagram Our parsing procedure is conducted based on a predefined domainspecific CO diagram, as shown in Figure 2a and Figure 3.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: The pipeline of Symbolic-Enhanced Neural Inference Framework (SENIF).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: The part of our CO diagram.</figDesc><graphic coords="3,72.00,527.52,226.76,132.76" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: The relationship between performance on parsing task and NLU task</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>A semantic parser for assertional logic is proposed to facilitate the efficient translation of natural language into formal representations in an open domain. It achieves state-of-the-art performance on a semantic parsing dataset annotated with assertional logic.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Question template generation for operators. CoT Your aim is given the question templates for every function and its variables. For example, the input 'age_of ': ['PERSON0', 'AGE1'] indicates... The semantics is...Only after an question template is given, we can suppose that value can be obtained and use it in next template... For instance, you can only use AGE1 in the first step ...</figDesc><table><row><cell>Whole prompt</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Comparison on ZEST samples.</figDesc><table><row><cell cols="2">Models</cell><cell cols="3">Performance Mean C@75 C@90</cell></row><row><cell></cell><cell>BART-large</cell><cell>51</cell><cell>30</cell><cell>20</cell></row><row><cell>Finetuned models</cell><cell>T5-3B</cell><cell>70</cell><cell>60</cell><cell>50</cell></row><row><cell></cell><cell>T5-11B</cell><cell>73</cell><cell>70</cell><cell>60</cell></row><row><cell></cell><cell>+ few-shot prompt</cell><cell>64</cell><cell>40</cell><cell>10</cell></row><row><cell>Davinci</cell><cell>+ parsing prompt</cell><cell>54</cell><cell>30</cell><cell>6.67</cell></row><row><cell></cell><cell>+ SENIF (ours)</cell><cell>64.33</cell><cell>36.67</cell><cell>10</cell></row><row><cell></cell><cell>+ few-shot prompt</cell><cell>73</cell><cell>50</cell><cell>30</cell></row><row><cell>Turbo</cell><cell>+ parsing prompt</cell><cell>67.67</cell><cell>30</cell><cell>6.67</cell></row><row><cell></cell><cell>+ SENIF (ours)</cell><cell>84</cell><cell>76.67</cell><cell>33.33</cell></row><row><cell></cell><cell>+ few-shot prompt</cell><cell>88.67</cell><cell>90</cell><cell>40</cell></row><row><cell>GPT-4</cell><cell>+ parsing prompt</cell><cell>93.66</cell><cell>100</cell><cell>73.33</cell></row><row><cell></cell><cell>+ SENIF (ours)</cell><cell>97</cell><cell>100</cell><cell>86.67</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3</head><label>3</label><figDesc>Comparison between models of GPT family and ours on semantic parsing task.</figDesc><table><row><cell>Models</cell><cell cols="2">precision recall</cell><cell>F1</cell></row><row><cell>gpt-turbo-3.5</cell><cell>25.63</cell><cell>32.04</cell><cell>28.47</cell></row><row><cell>text-davinci-003</cell><cell>38.98</cell><cell>26.30</cell><cell>31.41</cell></row><row><cell>GPT-4</cell><cell>56.59</cell><cell>38.38</cell><cell>45.73</cell></row><row><cell>Ours</cell><cell>66.03</cell><cell>41.53</cell><cell>50.99</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4</head><label>4</label><figDesc>Performance of SENIF in other fields.</figDesc><table><row><cell></cell><cell>Models</cell><cell cols="3">Performance Mean C@75 C@90</cell></row><row><cell></cell><cell>+ few-shot prompt</cell><cell>40</cell><cell>12</cell><cell>0</cell></row><row><cell>Turbo</cell><cell>+ parsing prompt</cell><cell>37</cell><cell>11</cell><cell>0</cell></row><row><cell></cell><cell>+ SENIF (ours)</cell><cell>42</cell><cell>16</cell><cell>0</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>A.1. Concepts and Operators</head><label></label><figDesc>Concepts are shown in Table 6, while operators are shown in Table 7.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 6</head><label>6</label><figDesc>Concepts</figDesc><table><row><cell>Concepts</cell><cell>Explanation</cell></row><row><cell>ACADEMY</cell><cell>A section of a university or college</cell></row><row><cell>AGE</cell><cell>The length of time a person has lived, typically measured in years</cell></row><row><cell>AWARD</cell><cell>A prize or recognition given for achievement or merit</cell></row><row><cell>BOOL</cell><cell>A data type that can hold one of two values, typically "true" or "false"</cell></row><row><cell>DATE</cell><cell>A specific day</cell></row><row><cell>DEGREE</cell><cell>An academic title awarded for completion of a program of study</cell></row><row><cell>DESCENT</cell><cell>Refers to a person's ancestry or ethnic background</cell></row><row><cell>GENDER</cell><cell>The state of being male or female (or non-binary)</cell></row><row><cell>HEIGHT</cell><cell>The vertical measurement of a person</cell></row><row><cell>ILLNESS</cell><cell>A medical condition or disease</cell></row><row><cell>INT</cell><cell>A data type that can hold integer values</cell></row><row><cell>MAJOR</cell><cell>The main subject of study in a college or university program</cell></row><row><cell>NATIONALITY</cell><cell>The status of belonging to a particular country or nation</cell></row><row><cell>PARTY</cell><cell>A group or organization with shared beliefs or goals, often in a political context</cell></row><row><cell>PERIOD</cell><cell>A specific time period</cell></row><row><cell>PERSON</cell><cell>A human being</cell></row><row><cell>PLACE</cell><cell>A location or area</cell></row><row><cell>PRESIDENT</cell><cell>The head of a country or organization, often in a political context</cell></row><row><cell>PROFESSION</cell><cell>A person's responsibility in a certain period; generally refers to the profession, work, occupation, job, career, position, etc.</cell></row><row><cell>RACE</cell><cell>A human population</cell></row><row><cell>RANK</cell><cell>A position in a hierarchy or order of importance</cell></row><row><cell>SCHOOL</cell><cell>An institution for education, often referring to a university or college</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 7</head><label>7</label><figDesc>Operators</figDesc><table><row><cell>Operator</cell><cell>Explanation</cell><cell>LHS</cell><cell>RHS</cell></row><row><cell>degree_obtained</cell><cell>Indicates the degree obtained by a person during a period</cell><cell>PERSON, PERIOD_S, PERIOD_T</cell><cell>DEGREE</cell></row><row><cell>majored_in</cell><cell>Indicates the major subject studied by a person during a period and while obtaining a certain degree</cell><cell>PERSON, PERIOD_S, PERIOD_T, DEGREE</cell><cell>MAJOR</cell></row><row><cell>school_educated_of</cell><cell>Indicates the school where a person obtained a certain degree during a period</cell><cell>PERSON, PERIOD_S, PERIOD_T, DEGREE</cell><cell>SCHOOL</cell></row><row><cell>academy_educated_of</cell><cell>Indicates the academic institution where a person obtained a certain degree during a period</cell><cell>PERSON, PERIOD_S, PERIOD_T, DEGREE</cell><cell>ACADEMY</cell></row><row><cell>academy_belongs_to</cell><cell>Indicates the school which an academy or department belongs to</cell><cell>ACADEMY</cell><cell>SCHOOL</cell></row><row><cell>school_located_in</cell><cell>Indicates the location of a school</cell><cell>SCHOOL</cell><cell>PLACE</cell></row><row><cell>school_former_name_of</cell><cell>Indicates the former name of a school</cell><cell>PERSON</cell><cell>PERSON</cell></row><row><cell>death_date</cell><cell>Indicates the date of death of a person</cell><cell>PERSON</cell><cell>DATE</cell></row><row><cell>birth_date</cell><cell>Indicates the date of birth of a person</cell><cell>PERSON</cell><cell>DATE</cell></row><row><cell>birth_place</cell><cell>Indicates the place of birth of a person</cell><cell>PERSON</cell><cell>PLACE</cell></row><row><cell>GetHeight</cell><cell>Indicates the height of a person</cell><cell>PERSON</cell><cell>HEIGHT</cell></row><row><cell>resident_in</cell><cell>Indicates the place of residence of a person during a period</cell><cell>PERSON, PERIOD_S, PERIOD_T</cell><cell>PLACE</cell></row><row><cell>died_in</cell><cell>Indicates the place where a person died</cell><cell>PERSON</cell><cell>PLACE</cell></row><row><cell>father_of</cell><cell>Indicates the father of a person</cell><cell>PERSON</cell><cell>PERSON</cell></row><row><cell>mother_of</cell><cell>Indicates the mother of a person</cell><cell>PERSON</cell><cell>PERSON</cell></row><row><cell>spouse_of</cell><cell>Indicates the spouse of a person</cell><cell>PERSON</cell><cell>PERSON</cell></row><row><cell>son_of</cell><cell>Indicates the son of a person</cell><cell>PERSON</cell><cell>PERSON</cell></row><row><cell>daughter_of</cell><cell>Indicates the daughter of a person</cell><cell>PERSON</cell><cell>PERSON</cell></row><row><cell>sibling_of</cell><cell>Indicates the sibling of a person</cell><cell>PERSON</cell><cell>PERSON</cell></row><row><cell>gradeparent_of</cell><cell>Indicates the grandparent of a person</cell><cell>PERSON</cell><cell>PERSON</cell></row><row><cell>grandchild_of</cell><cell>Indicates the grandchild of a person</cell><cell>PERSON</cell><cell>PERSON</cell></row><row><cell>profession_of</cell><cell>Profession of a person during a period</cell><cell>PERSON, PERIOD_S, PERIOD_T</cell><cell>PROFESSION</cell></row><row><cell>which_president_rank_of</cell><cell>Rank of a president</cell><cell>PRESIDENT</cell><cell>RANK</cell></row><row><cell>race_of</cell><cell>Race of a person</cell><cell>PERSON</cell><cell>RACE</cell></row><row><cell>gender_of</cell><cell>Gender of a person</cell><cell>PERSON</cell><cell>GENDER</cell></row><row><cell>nationality_of</cell><cell>Nationality of a person</cell><cell>PERSON</cell><cell>NATIONALITY</cell></row><row><cell>descent_of</cell><cell>Descent of a person</cell><cell>PERSON</cell><cell>DESCENT</cell></row><row><cell>which_children_rank_of</cell><cell>Rank of a child in a family</cell><cell>PERSON</cell><cell>RANK</cell></row><row><cell>party_affiliation_of</cell><cell>Political party affiliation of a person</cell><cell>PERSON</cell><cell>PARTY</cell></row><row><cell>alias_of</cell><cell>Alias or nickname of a person</cell><cell>PERSON</cell><cell>NAME</cell></row><row><cell>age_of</cell><cell>Age of a person</cell><cell>PERSON</cell><cell>AGE</cell></row><row><cell>illness_of</cell><cell>Illness of a person during a period</cell><cell>PERSON, PERIOD_S, PERIOD_T</cell><cell>ILLNESS</cell></row><row><cell>studied_subject_of</cell><cell>Major studied by a person during a period</cell><cell>PERSON, PERIOD_S, PERIOD_T</cell><cell>MAJOR</cell></row><row><cell>someone_nominate_someone_for_profession</cell><cell>Nominate someone for a profession during a period</cell><cell>PERSON, PERIOD_S, PERIOD_T, PERSON</cell><cell></cell></row></table></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Details of evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B.1. Restriction for operators</head><p>Certain operators may possess ambiguities that are not aligned with the annotation standard. For instance, the alias_of operator is designed to capture distinct names used by an individual in varying periods or circumstances, such as nicknames, former names, pseudonyms, etc. However, we notice that a full name and its abbreviations may also be regarded as aliases of a person, as exemplified by Barack Hussein Obama II, Barack Hussein Obama, Barack Obama, and Obama. Recording such information may be meaningless and challenging to label without omissions. Consequently, these operators are omitted when calculating the precision and recall scores. Meanwhile, two operators encountered failure during the template generation step: "succeeded_by" and "someone_nominate_someone_for_profession". To make a fair comparison without manual intervention, we refrained from creating the corresponding question templates. As a result, these two operators were excluded from the evaluation.</p></div>
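A minimal sketch of how such exclusions interact with the precision/recall computation used in Table 3. The assertion format (operator, subject, object) and the helper are assumptions for illustration, not the paper's evaluation code:

```python
# Operators excluded from scoring, as described in B.1.
EXCLUDED = {"alias_of", "succeeded_by", "someone_nominate_someone_for_profession"}

def prf(predicted, gold, excluded=EXCLUDED):
    """Precision/recall/F1 over assertion triples, ignoring excluded operators."""
    pred = {a for a in predicted if a[0] not in excluded}
    ref = {a for a in gold if a[0] not in excluded}
    tp = len(pred & ref)  # exact-match true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, a predicted alias_of assertion neither helps nor hurts precision, since it is filtered out before matching.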
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B.2. Inference system</head><p>We utilize 29 rules about family relationships and personal information to generate complete semantics; see Table <ref type="table">8</ref>. Table <ref type="table">9</ref> shows the relevant ablation experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B.3. Training settings for BART and T5</head><p>For BART-large, we use the same setup as in <ref type="bibr" target="#b3">[4]</ref>. However, for the T5-3B and T5-11B models, as we did not have access to TPUs, we replicate the experiments using 4x3090 24G GPUs and 2xA800 80G GPUs. We observed that, under these resource constraints, the setup described in the paper employing 16x8 TPUs yielded poor results (even worse than BART-large). We therefore opted for an alternative configuration that produced the best performance for these two baselines. Specifically, we trained with an initial learning rate of 5e-5 for 3 epochs (in fact, the best T5-11B checkpoint is the one after two epochs of training). Moreover, we set the batch size to 32 but achieve it with batchsize=1 and gradient_accumulation_steps=32, because we find that any other optimization may result in T5 not converging; training is thus significantly limited by memory.</p></div>
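The batch-size workaround can be sketched numerically: accumulating 32 per-sample gradients, each scaled by 1/32, before a single parameter update is equivalent to one step on a full batch of 32. The toy one-parameter model below is purely illustrative of that equivalence:

```python
def grad(w, x, t):
    """d/dw of the squared error (w*x - t)**2 for one sample."""
    return 2 * x * (w * x - t)

def accumulated_step(w, samples, lr=5e-5):
    """Micro-batches of size 1 with gradient accumulation."""
    accum = 0.0
    for x, t in samples:
        accum += grad(w, x, t) / len(samples)  # scale so the sum is a mean
    return w - lr * accum                      # one optimizer step per 32 samples

def full_batch_step(w, samples, lr=5e-5):
    """Reference: a single step on the whole batch."""
    g = sum(grad(w, x, t) for x, t in samples) / len(samples)
    return w - lr * g
```

Both functions produce the same update (up to floating-point rounding), which is why batchsize=1 with gradient_accumulation_steps=32 emulates an effective batch of 32 while keeping only one sample's activations in GPU memory.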
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Complete prompts C.1. Templates generation prompt</head><p>We generate MRC templates with the prompt provided in Table <ref type="table">10</ref>: </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.2. Parsing baselines</head><p>The prompt used for semantic parsing task is given in Table <ref type="table">11</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.3. Auto modeling</head><p>The templates we use for automatic modeling are provided in Table <ref type="table">12</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C.4. Downstream task baselines</head><p>In this section, we present the prompts utilized for the baseline and our semantic parsing results, with the differences between these two prompts highlighted in red for easy identification (Table <ref type="table">13</ref> and Table <ref type="table">14</ref>). Our objective was to facilitate fair comparisons; thus, we intentionally introduced only subtle discrepancies in the first set of prompts. These modifications were primarily focused on incorporating our parsing results into the first prompt. Moreover, because including a lengthy text (i.e., the parsing results) at the end of a prompt may confuse the language model and cause it to lose track of its task, we incorporated reminders ("Follow above ... the question") to maintain consistency and ensure that all steps are successfully executed.</p><p>For the few-shot prompt, please see the   Combine the verified pieces of information and present your line of formal reasoning in first order logic. 5. Output the answer without any extra details by "Answer:{answer}" format. The answer should be yes, no, n/a or a brief phrase from the input words based on the question and context. n/a means no answer."'</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 13</head><p>Prompt for semantic parsing (baselines)</p><p>For a given question '{question}', the original context '{context}', and corresponding semantic parsing results (at the end), please:</p><p>1. Identify the main concepts and relationships involved in the question. 2. Select necessary information from both the semantic parsing results and the original context. 3. Compare the information from these two sources. If there is a discrepancy, resolve it by deciding which source is likely to be more accurate. 4. Combine the verified pieces of information and present your line of formal reasoning in logic. 5. Output the answer without any extra details by "Answer:{answer}" format. The answer should be yes, no, n/a or a brief phrase from the input words based on the question and context. n/a means no answer. Semantic parsing results:{parsing_results} Follow above five steps exactly to complete the question</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 14</head><p>Prompt for adding our semantic parsing</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Mahowald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Ivanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">A</forename><surname>Blank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kanwisher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">B</forename><surname>Tenenbaum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Fedorenko</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2301.06627</idno>
		<title level="m">Dissociating language and thought in large language models: a cognitive perspective</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Grounded conversation generation as guided traverses in commonsense knowledge graphs</title>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.acl-main.184</idno>
		<ptr target="https://aclanthology.org/2020.acl-main.184" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</title>
				<meeting>the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="2031" to="2043" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pryor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Getoor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><forename type="middle">E</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2301.13166</idno>
		<title level="m">Esc: Exploration with soft commonsense constraints for zero-shot object navigation</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Learning from task descriptions</title>
		<author>
			<persName><forename type="first">O</forename><surname>Weller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Lourie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gardner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2020.emnlp-main.105</idno>
		<ptr target="https://aclanthology.org/2020.emnlp-main.105" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1361" to="1375" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">From first-order logic to assertional logic</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Artificial General Intelligence</title>
				<editor>
			<persName><forename type="first">T</forename><surname>Everitt</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Goertzel</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Potapov</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="87" to="97" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Contextual semantic parsing for multilingual taskoriented dialogues</title>
		<author>
			<persName><forename type="first">M</forename><surname>Moradshahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tsai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Campagna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">S</forename><surname>Lam</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2111.02574</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Shah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mohit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.07942</idno>
		<title level="m">Semantic parsing for task oriented dialog using hierarchical representations</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.18258</idno>
		<title level="m">A birgat model for multi-intent spoken language understanding with hierarchical semantic frames</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Unified structure generation for universal information extraction</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wu</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2022.acl-long" />
		<idno type="DOI">10.18653/v1/2022.acl-long.395</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</title>
				<meeting>the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="5755" to="5772" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Rajpurkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1806.03822</idno>
		<title level="m">Know what you don&apos;t know: Unanswerable questions for SQuAD</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Cyc: Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">B</forename><surname>Lenat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Prakash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Shepherd</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">AI magazine</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="65" to="65" />
			<date type="published" when="1985">1985</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">ConceptNet 5.5: An open multilingual graph of general knowledge</title>
		<author>
			<persName><forename type="first">R</forename><surname>Speer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Havasi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AAAI conference on artificial intelligence</title>
				<meeting>the AAAI conference on artificial intelligence</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">31</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Pouring liquids: A study in commonsense physical reasoning</title>
		<author>
			<persName><forename type="first">E</forename><surname>Davis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">172</biblScope>
			<biblScope unit="page" from="1540" to="1578" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Logical formalizations of commonsense reasoning: a survey</title>
		<author>
			<persName><forename type="first">E</forename><surname>Davis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Artificial Intelligence Research</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page" from="651" to="723" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="1877" to="1901" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Bubeck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Chandrasekaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Eldan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gehrke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Horvitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kamar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">T</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Nori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Palangi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tulio Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2303.12712</idno>
		<idno type="arXiv">arXiv:2303.12712</idno>
		<title level="m">Sparks of Artificial General Intelligence: Early experiments with GPT-4</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Huang</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2304.02015</idno>
		<idno type="arXiv">arXiv:2304.02015</idno>
		<title level="m">How well do Large Language Models perform in Arithmetic tasks?</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv e-prints</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">&quot;Going on a vacation&quot; takes longer than &quot;Going for a walk&quot;: A study of temporal commonsense understanding</title>
		<author>
			<persName><forename type="first">B</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Khashabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Ning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roth</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D19-1332</idno>
		<ptr target="https://aclanthology.org/D19-1332" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</title>
				<meeting>the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)<address><addrLine>Hong Kong, China</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3363" to="3369" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Against AI understanding and sentience: Large language models, meaning, and the patterns of human language use</title>
		<author>
			<persName><forename type="first">C</forename><surname>Durt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Froese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Fuchs</surname></persName>
		</author>
		<ptr target="http://philsci-archive.pitt.edu/21983/" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.04229</idno>
		<title level="m">Understanding natural language understanding systems. A critical analysis</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Why machine reading comprehension models learn shortcuts?</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhao</surname></persName>
		</author>
		<ptr target="https://aclanthology.org/2021.findings-acl" />
		<idno type="DOI">10.18653/v1/2021.findings-acl.85</idno>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021</title>
				<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="989" to="1002" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Generalize symbolic knowledge with neural rule engine</title>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lu</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1808.10326</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Cold-start and interpretability: Turning regular expressions into trainable recurrent neural networks</title>
		<author>
			<persName><forename type="first">C</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Tu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3193" to="3207" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Neural module networks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Andreas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rohrbach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Darrell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Klein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="39" to="48" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Neural-symbolic VQA: Disentangling reasoning from vision and language understanding</title>
		<author>
			<persName><forename type="first">K</forename><surname>Yi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Torralba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kohli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tenenbaum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Reasoning about actions and state changes by injecting commonsense knowledge</title>
		<author>
			<persName><forename type="first">N</forename><surname>Tandon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dalvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Grus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">T</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bosselut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Clark</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/D18-1006</idno>
		<ptr target="https://aclanthology.org/D18-1006" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2018 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Brussels, Belgium</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="57" to="66" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Reliable natural language understanding with large language models and answer set programming</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rajasekharan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Padalkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gupta</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.03780</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
