A Novel Multi-Step Prompt Approach for LLM-based Q&As on Banking Supervisory Regulation

A Novel Multi-Step Prompt Approach for LLM-based Q&As on Banking Supervisory Regulation DanieleLicari daniele.licari@bancaditalia.it Banca d'Italia

Via Nazionale, 91 00184 Rome Italy

Scuola Superiore Sant'Anna

P.zza dei Martiri della Libertà, 33 56100 Pisa Italy

CanioBenedetto canio.benedetto@bancaditalia.it Banca d'Italia

Via Nazionale, 91 00184 Rome Italy

PraveenBushipaka praveen.bushipaka@santannapisa.it Scuola Superiore Sant'Anna

P.zza dei Martiri della Libertà, 33 56100 Pisa Italy

AlessandroDe Gregorio alessandro.degregorio@bancaditalia.it Banca d'Italia

Via Nazionale, 91 00184 Rome Italy

MarcoDe Leonardis marco.deleonardis@bancaditalia.it Banca d'Italia

Via Nazionale, 91 00184 Rome Italy

TommasoCucinotta tommaso.cucinotta@santannapisa.it Scuola Superiore Sant'Anna

P.zza dei Martiri della Libertà, 33 56100 Pisa Italy

Tenth Italian Conference on Computational Linguistics

Dec 04 -06 2024 Pisa Italy

A Novel Multi-Step Prompt Approach for LLM-based Q&As on Banking Supervisory Regulation 1613-0073 CAA2D12072E661C3813812268E84F93D GROBID - A machine learning software for extracting information from scholarly documents Regulatory Q&A, Banking Supervisory Reporting Regulation, Artificial Intelligence, GenAI, GPT-4o, RAG, LLM Evaluator (T. Cucinotta) 0000-0002-2963-9233 (D. Licari) 0000-0002-8446-9468 (C. Benedetto) 0009-0009-7753-8662 (P. Bushipaka) 0000-0001-7577-3655 (A. De Gregorio) 0009-0004-6523-186X (M. De Leonardis) 0000-0002-0362-0657 (T. Cucinotta)

This paper investigates the use of large language models (LLMs) in analyzing and answering questions related to banking supervisory regulation concerning reporting obligations. We introduce a multi-step prompt construction method that enhances the context provided to the LLM, resulting in more precise and informative answers. This multi-step approach is compared with standard "zero-shot" and "few-shot" approaches, which lacks context enrichment. To assess the quality of the generated responses, we utilize an LLM evaluator. Our findings indicate that the multi-step approach significantly outperforms the zero-shot method, producing more comprehensive and accurate responses.

Introduction

The advent of generative AI (GenAI), and specifically of large language models (LLMs), offers significant opportunities, among others, in the legal and financial sector, facilitating the implementation of innovative solutions across various domains of activities [1,2,3,4,5]. One of the most promising applications is the business case for supporting the navigation and analysis of complex regulatory documents [6,7,8,9], which can be particularly valuable for compliance officers, legal teams, and other professionals working in financial institutions who need to have a clear and timely understanding of the regulations and the consequently derived obligations.

Supervisory authorities could benefit from a tool that streamlines the consultation of complex legislation, providing swift responses to entities and enhancing efficiency [10]. While LLMs offer advantages for this purpose, they also pose risks like bias and inaccuracies [11].

Therefore, it is essential to establish strong verification procedures and retain human supervision to counter these risks. The complexity of regulatory documents, with their dense network of cross-referenced texts/cats and specialized content, necessitates careful analysis to retrieve the needed information ensuring at the same time effective risk management and limit the burden of such manual compliance.

This study introduces a novel methodology to automate and expedite the "question & answer" (Q&A) process in regulatory compliance, leveraging advanced large language models (LLMs) to provide accurate and timely responses to inquiries about the European Banking Authority's (EBA) reporting regulations. Our multi-step approach aligns with Retrieval-Augmented Generation (RAG) principles, enhancing context retrieval and generative capabilities through mechanisms like explicit extraction of Capital Requirements Regulation (CRR) references, implicit reference analysis, and a dedicated cross-encoder for precise regulatory text retrieval. This methodology ensures tailored response generation suited to the complex regulatory compliance context, where precise and comprehensive answers are crucial.

Our work finds particular applications within the domain of EBA regulatory reporting because it is characterized by a large and complex set of interrelated documents, including delegated and implementing acts, technical standards, guidelines, and recommendations, which cover various aspects of financial entities. Such complexity makes the business case both challenging and rewarding.

In this work, we focus on Regulation (EU) N.2013/575, also called Capital Requirements Regulation (CRR) https://eur-lex.europa.eu/legal-content/en/ALL/?uri= celex%3A32013R0575, specifically on the topic of Liquidity Risk as a first use case to evaluate the potential benefit of enriched context for an accurate response generation. The main reason for this choice is that this topic is supported by a relatively limited number of regulatory documents, so it was a good starting point since the regulation is not readily available in the form of a structured dataset and its pre-processing is usually a time-consuming task.

We used the actual EBA Q&As dataset [12] as the foundation for developing a system capable of generating automated responses to questions formulated by analysts on EBA reporting requirements and rules. By harnessing the capabilities of LLMs we aim to create a tool that can deliver accurate and contextually relevant answers to any inquiry on the content of the CRR.

Recent studies highlight the potential of LLMs for qualitative assessment [13,14,15,16]. For this reason, in this work we also propose the use of an "LLM Evaluator" to automate the validation process.

The structure of this paper is the following. Chapter 2 introduces the methodology and provides a detailed description of the approach adopted in this study; it explains the dataset utilized and the normative retrieval techniques employed to identify the regulatory documents necessary to address the EBA's Q&As. Chapter 3 presents the LLM Evaluator and the evaluation criteria. Chapter 4 reports experimental results and results and presents the main outcomes of the study. Chapter 5 discusses challenges as well as potential areas for future developments.

Methodology

This research employs a multi-step methodology to construct a comprehensive prompt for the GPT-4 omni (GPT-4o) language model [17], enabling it to answer EBArelated questions effectively. This step-wise approach focuses on enriching the context provided by the user's question. First, it identifies relevant EBA regulations (specifically CRR references) within the inquiry. Second, it incorporates response examples to guide the LLM's output format ensuring alignment with EBA regulations. This enriched context is then leveraged by a powerful LLM to generate more accurate and informative responses (details in Appendix B.1).

Dataset Construction

To develop and then evaluate our LLM-based Q&A system, firstly we extracted a subset from the EBA's Singlerule-book-qa online resource [12], comprising "questionand-answer" pairs submitted to the EBA between 2013 and 2020. In particular, we focused on the following

Context Enrichment

The context enrichment process is a three-step approach designed to identify, within the data set, the most relevant CRR references to provide an appropriate content to formulate the answer to the inquiry. The first step simply involves extracting explicit CRR references, if directly mentioned in the question (Article in tab 4). The second step leverages on the capabilities of the GPT-4o (prompt in Appendix C.1) to analyse the "question" and the "background information" to identify other CRR references that are not explicitly stated by the user. The last step of the process utilizes our CRR Ranker model, a crossencoder architecture that has been trained to identify and retrieve pertinent references from the Capital Requirements Regulation in response to specific inquiries. This 3-steps comprehensive approach ensures a broader and potentially more accurate understanding of the the inquiry and the specific legal act(s) related to the CRR that the Q&A tool deems applicable.

CRR Ranker Training

With regard to the context enrichment, i.e. the CRR Ranker Training, we employed a specifically trained cross-encoder model [18] to identify relevant CRR references for enriching inquiry context. We used a dedicated "question-article" pair dataset derived from our EBA Q&A Train Dataset, excluding questions related to CRR Article 99 https://www.eba.europa.eu/regulation-and-policy/ single-rulebook/interactive-single-rulebook/14212 due to their frequent lack of topical relevance. Each data point consisted of a question (user query and background information), an associated CRR article, and a binary label indicating relevance (1 for relevant, 0 for not applicable).

We constructed the training dataset by selecting positive and negative samples. Positive samples comprised question-article pairs where the article explicitly addressed the user's query. Additionally, we included pairs formed by questions and implicit CRR references extracted from the user's text, context information, and official response using GPT-4o (used prompt in Appendix C.1).

Negative training samples were mined by using the BAAI bge-large-en-v1.5 pre-trained language model [19]. For the CRR Ranker Training we employed a two-phase process for negative sample selection: first, all CRR articles were encoded using the bge-large-en-v1.5 model, and cosine similarity was utilized to rank them relative to the user's question; second, a set of 20 negative examples was randomly chosen from a pre-defined ranking interval (250-300). The choice of 20 negative samples provides a good balance between computational efficiency and the availability of enough training data. This approach aimed to balance the representation of relevant and irrelevant information within the training data, ensuring the model learns to distinguish between the user's query and potentially related but ultimately off-topic CRR articles [20].

The final dataset comprised 12,533 unique "questionarticle" pairs with positive and negative labels. This data was split into training (10,179 pairs) and development (2,354 pairs) sets for model fine-tuning. This fine-tuning aimed to learn robust semantic representations for questions and CRR articles, enabling the model to effectively identify relevant CRR references for enriching user query context.

We selected the BAAI BGE Reranker v2 m3 model [18] as the basis for our cross-encoder, owing to its taskspecific aptness and its demonstrated superior performance relative to the BGE Reranker Large [19], as reported in Section 4. We adopted the Cross-Entropy Binary Classification loss function, following the approach suggested in the BGE Rerank Git repository [21]. To promote stable convergence, we incorporated a warmup schedule ( with a number of steps 0.1 × len(train_data) × num_epochs step) that gradually increases the learning rate during the initial phase of training. The entire finetuning process was conducted over 4 epochs. We employed an evaluation interval of 800 steps during training and saved the model that achieved the highest F1 score on the development set.

Finally, we evaluated the model's retrieval ability of CRR items for a given user question on EBA Q&A Test Dataset. This evaluation employed recall metrics at various retrieval cutoffs, including recall@5, recall@10, recall@20, and recall@30 (results in Section 4).

Examples Enrichment

To improve the model's understanding of the desired response format, tone, and content, we adopted a few-shot prompting approach [22]. This involved extracting five relevant examples from the EBA Q&A Train Dataset with the same topic as the user question we want to answer. These examples served as demonstrations for the model, showcasing the ideal structure, language style, and level of detail expected in the final responses. Notably, the selection process ensured heterogeneity within the chosen topic, meaning the examples covered various aspects to promote a broader understanding. Limiting the number of examples to five struck a balance between providing diverse demonstrations and maintaining cost-efficiency during inference, as the LLM's input token length has limitations.

Answer Generation

Figure 2 in Appendix B.1 details how we construct a comprehensive prompt that enhances GPT-4o's ability to effectively answer user questions. The final prompt in Appendix C.2 integrates the enriched context (extracted CRR references) and the example enrichment (demonstrations of desired response format, tone, and content). This comprehensive prompt is fed to GPT-4o through the OpenAI API, enabling it to generate a well-reasoned and informative response that adheres to the EBA's regulatory framework and professional tone.

Comparison with RAG Principles

Our multi-step prompt approach aligns with the core principles of Retrieval-Augmented Generation (RAG) while incorporating tailored enhancements that improve context enrichment for regulatory Q&A tasks. Like RAG, our method integrates information retrieval with language generation, but it adds specialized steps to enhance context enrichment. These include explicit extraction of CRR references, implicit analysis using LLM capabilities, and precise retrieval through a dedicated cross-encoder. Compared to standard RAG, which often relies on singlestage retrieval, our structured multi-step process adds a higher level of granularity, including example enrichment through few-shot prompts. This ensures not only factual accuracy but also alignment with domain-specific language standards, ultimately improving response quality for complex regulatory inquiries. Overall, our approach extends the RAG principles to generate tailored, contex-tually enriched answers, which is particularly beneficial for the intricate requirements of regulatory compliance.

LLM Evaluator

In our pipeline, we employ an LLM Evaluator to assess the quality of generated responses, defined in Section 2, compared to the EBA's answers already provided. Employing an LLM Evaluator offers significant advantages in terms of cost-effectiveness and efficiency compared to traditional human evaluation/comparison methods. Recent research highlights the potential of LLMs for large-scale natural language evaluation tasks [23,24,25].

The evaluation process uses a scale from one to four, based on two evaluation criteria: correctness and completeness. A generated response is considered correct if its content aligns with the information presented in the official answer. Additionally, a response is deemed complete if it incorporates all relevant regulatory references provided in the official answer. The following scoring rubric outlines the evaluation criteria:

• Score 1: The generated answer is completely incorrect and incomplete compared to the official answer.

• Score 2: The generated answer is incorrect but either complete or partially complete compared to the official answer. It contains some useful information found in the official answer, but the main statement is incorrect. • Score 3: The generated answer is correct but only partially complete. The main statement matches the official answer, but some information from the official answer is missing. • Score 4: The generated answer is fully correct and complete. It is essentially a rephrased version of the official answer with no significant differences.

To preliminary validate the effectiveness of our LLM evaluator, we conducted an experiment using a synthetic dataset. This dataset was carefully designed to test various aspects of language generation and was evaluated by both a human expert and the LLM. The alignment between the human expert's assessments and those of the LLM was then analyzed. The complete details of the final prompt used for LLM evaluator are provided in Appendix C. 3.

The dataset comprises 60 Q&A pairs, balanced across the four score categories. For each category, two pairs were excluded as they were used as examples for the prompt for the LLM evaluator, resulting in a final dataset of 52 Q&A pairs to measure the alignment between the human and LLM evaluator. Using GPT-4o, we obtained a Kendall-tau coefficient of 0.77, with a p-value of 6•10 −11 . These results justified the adoption of the LLM evaluator over a human one, especially for tasks involving prompt optimization and evaluation. The figure in Appendix B.2 illustrates the complete process of evaluating agreement between the LLM evaluator and the human expert.

Experiments and Results

This section describes the results obtained by measuring retrieval effectiveness and answer quality. Retrieval performance is measured by the number of relevant regulations retrieved (recall) using different encoder models. Answer quality is then evaluated by a separate LLM, which scores each generated response based on factors like relevance and adherence to EBA legal acts. We compare the multi-step prompt approach with a few-shot and zero-shot one focusing on a single topic within the EBA Q&A framework, specifically Liquidity Risk. Finally, we test our Multi-Step pipeline with other LLM models, such as Google Gemini Flash 1.5 and Llama 3.1 70B.

CRR Retrieval

We employed "recall" as the primary metric to assess the effectiveness of bi and cross encoder models in retrieving relevant CRR articles based on the information submitted with the inquiry. "Recall" signifies the proportion of truly relevant CRR articles retrieved from the dataset compared to all the pertinent actual articles [26]. In the context of legal information retrieval, prioritizing the retrieval of all crucial regulatory information for the inquiry makes the recall a particularly relevant metric.

Our primary objective was to identify a model that delivers exceptional retrieval accuracy while maintaining computational efficiency. This potentially excluded models with an extremely large number of parameters, as they can be computationally expensive to run.

We conducted a performance comparison between our fine-tuned CRR Ranker and several pre-trained models:

• Bi-encoders: all-MiniLM-L6-v2 [27], gte-large-en-v1.5 [28], and bge-large-en-v1.5 [19]. • Cross-encoders: bge-reranker-large [19], bgereranker-v2-m3 [29,18].

The detailed results (presented in table 2) show the achieved recall scores on EBA Q&As Test Dataset for each model. Our fine-tuned CRR Ranker significantly outperformed all other models, achieving a more than 20% improvement compared to the best pre-trained model (bge-large-en-v1.5).

Answer Generation

Here we compare the performance of our multi-step approach with a zero-shot one for answering EBA liquidity We tested:

• Zero-Shot Approach: for each question, a standard prompt was provided to the LLM. It encompassed both the specific query and any relevant contextual information they provided. • Few-Shot Approach: for each question, a few examples were provided along with the query to guide the LLM in generating responses. • Multi-Step Approach: for each question, we created prompts following our established multistep approach, incorporating context enrichment and example enrichment (as detailed in previous sections).

The LLM Evaluator assessed each response based on its correctness and completeness relative to the official EBA response. As described in Section 3, the LLM Evaluator assigned an overall score on a scale of 1 (completely incorrect and incomplete) to 4 (fully correct and comprehensive).

Table 3 summarizes the evaluation results for responses generated by the different approaches. The "multi-step" approach consistently achieved higher counts in the high-quality rating categories compared to both the "zero-shot" and "few-shot" ones. This demonstrates that the multi-step approach significantly outperformed the other methods in terms of response quality. The LLM evaluator awarded the multi-step approach an average score of 2.7, representing a 12.5% improvement over the zero-shot and few-shot approaches, which both received an average score of 2.4. Notably, a larger portion of the responses generated by our multi-step approach received scores of 3 or higher, indicating correct answers. In contrast, only 2 out of 46 responses generated by the multi-step approach were rated as completely incorrect (score 1), compared to 6 such responses for the zero-shot approach and 11 for the few-shot approach. These findings suggest that the context enrichment in the multi-step prompts effectively guides the primary LLM toward generating more comprehensive and informative responses that accurately reflect the EBA regulations.

Other LLMs

In this section, we extend our analysis of the multi-step pipeline by incorporating evaluations using additional large language models (LLMs), specifically Google Gemini Flash 1.5 and Llama 3.1 70B. Google Gemini Flash 1.5 is widely recognized for its high-speed processing capabilities and efficiency in response generation, making it a suitable benchmark for comparative performance analysis. Conversely, Llama 3.1 70B is noted for its robustness in handling complex queries while maintaining moderate computational demands, providing an interesting contrast in terms of performance and resource efficiency.

Our experimental results indicate that the average evaluation score achieved by Google Gemini Flash 1.5 was 2.0, whereas Llama 3.1 70B attained an average score of 2.2. Notably, these scores did not surpass the performance of the GPT-4o zero-shot approach, which underscores the advanced capabilities of GPT-4o in addressing the complexities of regulatory compliance inquiries. This observation highlights the inherent strength of GPT-4o in generating accurate and contextually relevant responses, outperforming the other models under similar conditions. Future research will focus on an in-depth analysis of these models with a view toward optimizing each step of the multi-step pipeline in a model-specific manner. By tailoring our methodology to align with the distinctive strengths and limitations of each model, we aim to further enhance the overall accuracy and reliability of the generated responses.

Challenges and Advancements

Our work has highlighted several key challenges that are worth discussing. One of the primary issues concerns the limited size of our test dataset. This constraint arose because we focused on the single topic of Liquidity Risk. However, to achieve robust human alignment and ensure the system addresses diverse user inquiries across EBA topics, future efforts should prioritize dataset expansion and human evaluation integration.

Another topic for reflection is that the study emphasizes the need to retrieve relevant CRR articles. Future research could investigate methods to further refine the generated responses by incorporating legal reasoning and argumentation capabilities into the LLM [30,31], and the most relevant Q&As as examples for few-shot prompting [6].

It is also crucial to underscore the importance of optimizing prompts for this kind of application, and we plan to address this moving forward. Our future research endeavors will focus on investigating automatic prompt engineering techniques [32] leveraging the LLM Evaluator as a metric to optimize. These techniques aim to tailor and optimize prompts based on the specific topic of inquiries, enhancing overall performance.

Moreover, currently we have utilized only one model, GPT-4o, but we intend to extend our testing to include other models that have demonstrated similar performance levels in the field of open question answering [33]. This will help us identify the most effective model for our application with an unbiased evaluation [34].

Similarly, in the context of LLM evaluators, we also intend to explore additional models, including open-source options [35,36], that have shown strong performance in assessing the quality of responses from various LLMs. This approach is expected to increase the correlation between human and LLM evaluations, thereby enhancing the system's overall accuracy and reliability. The scientific community is very active in this area to better understand the limitations of the different types of models considered as evaluators [37].

By addressing the identified limitations through increased human involvement, expanded data coverage, and domain-specific evaluation methods, we believe it is possible to enhance the system's effectiveness and generalizability across a wide range of regulatory domains.

Conclusion

This study explored a novel approach for generating automated responses to inquiries on the Regulation (EU) N.2013/575, specifically on the liquidity risk subject. We proposed a multi-step prompt construction method that enriches the context to be provided to LLMs, enabling them to generate more accurate and informative answers. An LLM Evaluator, which demonstrated strong agreement with human experts, was employed to compare our multi-step approach with standard zero-shot and fewshot methods that lack context enrichment. The quality of the generated responses was assessed, and our findings indicate that the multi-step approach significantly outperforms both the zero-shot and few-shot methods, resulting in responses that are more comprehensive and accurate in relation to the EBA regulation. These results suggest that the multi-step prompt construction is a promising approach for enhancing LLM performance in legal information retrieval tasks, particularly within domains with complex regulatory frameworks like regulatory reporting. Even at this early stage, the tool has demonstrated its ability to make the work of the human analyst more efficient. Future research directions include exploring the use of different LLM architectures and investigating alternative methods for incorporating human feedback into the prompt construction process. Lastly, exploring the generalization of this approach to other regulatory domains would be valuable.

B.1. Multi-Step Approach for Answer Generation

C.3. LLM as Evaluator

Gpt4-omni Prompt I will provide you with two answers to a question. One is the #official answer, which serves as the benchmark. The other is the #generated answer, which needs to be evaluated against the #official answer. You must compare the answers step by step.

Consider the following definitions for this evaluation:

-Correctness: A #generated answer is correct if its content aligns with that of the #official answer.

-Completeness: A #generated answer is complete if it includes all the information present in the #official answer. Your task is to act as an evaluator and rate the #generated answer according to the following scale: RATING 1: The #generated answer is completely incorrect and incomplete compared to the #official answer. RATING 2: The #generated answer is incorrect but either complete or partially complete compared to the #official answer. It contains some useful information found in the #official answer but the main statement is incorrect. RATING 3: The #generated answer is correct but only partially complete. The main statement matches the #official answer, but some information from the #official answer is missing. RATING 4: The #generated answer is fully correct and complete. It is essentially a rephrased version of the #official answer with no significant differences. Please provide a single numerical rating (1-4) followed by a brief explanation for your rating This prompt was used to compare an AI-generated answer (#generated answer) to an official one (#official answer), rating its correctness, completeness, and providing an explanation.

Figure 2 :Figure 3 :Figure 4 :234Figure 2: Multi-Step Approach for Answer Generation

Table 11Sample distribution across training, validation, and test sets for CRR-related Q&A and the subset of only Liquidity Risk Q&A.SetCRR-related Q&A Liquidity Risk Q&ATraining79858Validation16212Test63746variables: question ID, question, submission date, status,topic, legal act, article [within that act], background infor-mation,final answer, submission date and status (detailsin Table 4, Appendix 4) Secondly, we implemented a two-step filtering process aimed at ensuring model efficacy:by excluding non-English entries, and by focusing onCRR-related questions within the same timeframe. Thisresulted in a final dataset of 1597 CRR-related questionsand answers, which was then split into training (50%),validation (10%), and test sets (40%) for robust evaluation(token number distribution in Figure 1 in Appendix A).The distribution of samples for the dataset is summarizedin Table 1.

Table 22Recall scores on EBA Q&As Test DatasetModelsr@5 r@10 r@20 r@30all-MiniLM0.370.460.550.59gte-large0.390.480.570.63bge-large0.410.520.620.67bge-reranker-large0.170.230.310.38bge-reranker-v2-m30.240.310.390.44CRR Ranker (ours) 0.510.670.810.86risk inquiries, using our LLM as the evaluation system(Figure in Appendix B.3). To this end, we utilized a subsetof 46 Q&As from our EBA Q&A Test dataset specificallyfocused on liquidity risk.

Table 33Evaluation results for responses generated by zero-shot, fewshot and multi-stepRating zero-shot few-shot multi-step (gpt4o)16122218111431916264374

Acknowledgments

The authors would like to express their sincere gratitude to Vincenzo Capone, Pamela Maggiori, Daniele Bovi, Fabio Zambuto, Francesca Monacelli, and Roberto Sabbatini (Bank of Italy) for their insightful comments and stimulating discussions on an earlier draft of this paper. Their feedback greatly enhanced the clarity and focus of our work. They would also like to thank the anonymous reviewers for their invaluable suggestions and constructive feedback.

A. Dataset Table 4

EBA Q&As dataset. For this research, we focused on the fields highlighted in yellow.

Variable Name Description

Question ID The unique identifier for each question.

Topic

The general topic or category under which the question falls.

Subject matter

The specific subject matter of the question.

Legal act

The specific legal act to which the question relates. (e.g., CRR) Article

The specific article of the legal to which the question relates. COM Delegated or Implementing Acts/RTS/ITS/GLs/Recommendations Other legislation, standards, guidelines or recommendations to which the question relates.

Article/Paragraph

The specific article or paragraph within the above-mentioned Question

The actual question asked. Background on the question Any additional information or context provided by the question submitter.

Final answer

The official answer provided to the question.

Submission date

The date when the question was submitted.

Final publishing date

The date when the final answer to the question was published.

Status

The current status of the question (e.g. Final, rejected, etc.).

Type of submitter

The type of entity that submitted the question (e.g. Credit institution, investment firm, etc.).

Answer prepared by

The entity that prepared the answer to the question.

BloombergGPT: A Large Language Model for Finance SWu OIrsoy SLu VDabravolski MDredze SGehrmann PKambadur DRosenberg GMann arXiv:2303.17564 2023 cs, q-fin Large Language Models in Law: A Survey JLai WGan JWu ZQi PSYu 10.48550/arXiv.2312.03718 arXiv:2312.03718 2023 CBiancotti CCamassa 10.2139/ssrn.4533699 Loquacity and Visible Emotion: ChatGPT as a Policy Advisor 2023 Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? JJHorton 2023 Large language models and their possible uses in law PHomoki ZZődi 10.1556/2052.2023.00475 Akadémiai Kiadó Section 2024 64 Hungarian Journal of Legal Studies Cbr-rag: Case-based reasoning for retrieval augmented generation in llms for legal question answering NWiratunga RAbeyratne LJayawardena KMartin SMassie INkisi-Orji RWeerasinghe ALiret BFleisch 2024 Interpretable Long-Form Legal Question Answering with Retrieval-Augmented Large Language Models ALouis GVan Dijck GSpanakis 10.48550/arXiv.2309.17050 arXiv:2309.17050 2023 GLQA: A Generation-based Method for Legal Question Answering WZhang HShen TLei QWang DPeng XWang 10.1109/IJCNN54540.2023.10191483 International Joint Conference on Neural Networks (IJCNN) 2023. 2023 Exploring the state of the art in legal QA systems AAbdallah BPiryani AJatowt 10.1186/s40537-023-00802-8 Journal of Big Data 10 127 2023 JPrenio Peering through the hype -assessing suptech tools' transition from experimentation to supervision 2024 LHuang WYu WMa WZhong ZFeng HWang QChen WPeng XFeng BQin TLiu A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions 2023 Single Rulebook Q&A | European Banking Authority 2013-2024 FLASK: Finegrained Language Model Evaluation based on Alignment Skill Sets SYe DKim SKim HHwang SKim YJo JThorne JKim MSeo 10.48550/arXiv.2307.10928 arXiv:2307.10928 2024 Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena LZheng W.-LChiang YSheng SZhuang ZWu YZhuang ZLin ZLi DLi EPXing HZhang JEGonzalez IStoica 10.48550/arXiv.2306.05685 arXiv:2306.05685 2023 G-Eval: NLG Evaluation using Gpt-4 with Better Human Alignment YLiu DIter YXu SWang RXu CZhu 10.18653/v1/2023.emnlp-main.153 Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics HBouamor JPino KBali the 2023 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics

Singapore

2023 ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate C.-MChan WChen YSu JYu WXue SZhang JFu ZLiu 10.48550/arXiv.2308.07201 arXiv:2308.07201 2023 <author> <persName><forename type="first">J</forename><surname>Openai</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Achiam</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Adler</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Agarwal</surname></persName> </author> <author> <persName><forename type="first">I</forename><surname>Ahmad</surname></persName> </author> <author> <persName><forename type="first">F</forename><forename type="middle">L</forename><surname>Akkaya</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Aleman</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Almeida</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Altenschmidt</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Altman</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Anadkat</surname></persName> </author> <author> <persName><forename type="first">I</forename><surname>Avila</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Babuschkin</surname></persName> </author> <author> <persName><forename type="first">V</forename><surname>Balaji</surname></persName> </author> <author> <persName><forename type="first">P</forename><surname>Balcom</surname></persName> </author> <author> <persName><forename type="first">H</forename><surname>Baltescu</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Bao</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Bavarian</surname></persName> </author> <author> <persName><forename type="first">I</forename><surname>Belgum</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Bello</surname></persName> </author> <author> <persName><forename type="first">G</forename><surname>Berdine</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Bernadett-Shapiro</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Berner</surname></persName> </author> <author> <persName><forename type="first">O</forename><surname>Bogdonoff</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Boiko</surname></persName> </author> <author> <persName><forename type="first">A.-L</forename><surname>Boyd</surname></persName> </author> <imprint/> </monogr> </biblStruct> <biblStruct xml:id="b17"> <monogr> <title/> <author> <persName><forename type="first">G</forename><surname>Brakman</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Brockman</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Brooks</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Brundage</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Button</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Cai</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Campbell</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Cann</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Carey</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Carlson</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Carmichael</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Chan</surname></persName> </author> <author> <persName><forename type="first">F</forename><surname>Chang</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Chantzis</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Chen</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Chen</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Chen</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Chen</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Chen</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Chess</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Cho</surname></persName> </author> <author> <persName><forename type="first">H</forename><forename type="middle">W</forename><surname>Chu</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Chung</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Cummings</surname></persName> </author> <author> <persName><forename type="first">Y</forename><surname>Currier</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Dai</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Decareaux</surname></persName> </author> <author> <persName><forename type="first">N</forename><surname>Degry</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Deutsch</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Deville</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Dhar</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Dohan</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Dowling</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Dunning</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Ecoffet</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Eleti</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Eloundou</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Farhi</surname></persName> </author> <author> <persName><forename type="first">N</forename><surname>Fedus</surname></persName> </author> <author> <persName><forename type="first">S</forename><forename type="middle">P</forename><surname>Felix</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Fishman</surname></persName> </author> <author> <persName><forename type="first">I</forename><surname>Forte</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Fulford</surname></persName> </author> <author> <persName><forename type="first">E</forename><surname>Gao</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Georges</surname></persName> </author> <author> <persName><forename type="first">V</forename><surname>Gibson</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Goel</surname></persName> </author> <author> <persName><forename type="first">G</forename><surname>Gogineni</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Goh</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Gontijo-Lopes</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Gordon</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Grafstein</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Gray</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Greene</surname></persName> </author> <author> <persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Gross</surname></persName> </author> <author> <persName><forename type="first">Y</forename><surname>Gu</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Guo</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Hallacy</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Han</surname></persName> </author> <author> <persName><forename type="first">Y</forename><surname>Harris</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>He</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Heaton</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Heidecke</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Hesse</surname></persName> </author> <author> <persName><forename type="first">W</forename><surname>Hickey</surname></persName> </author> <author> <persName><forename type="first">P</forename><surname>Hickey</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Hoeschele</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Houghton</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Hsu</surname></persName> </author> <author> <persName><forename type="first">X</forename><surname>Hu</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Hu</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Huizinga</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Jain</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Jain</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Jang</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Jiang</surname></persName> </author> <author> <persName><forename type="first">H</forename><surname>Jiang</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Jin</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Jin</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Jomoto</surname></persName> </author> <author> <persName><forename type="first">H</forename><surname>Jonn</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Jun</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Kaftan</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Kaiser</surname></persName> </author> <author> <persName><forename type="first">I</forename><surname>Kamali</surname></persName> </author> <author> <persName><forename type="first">N</forename><forename type="middle">S</forename><surname>Kanitscheider</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Keskar</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Khan</surname></persName> </author> <author> <persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Kilpatrick</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Kim</surname></persName> </author> <author> <persName><forename type="first">Y</forename><surname>Kim</surname></persName> </author> <author> <persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Kim</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Kirchner</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Kiros</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Knight</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Kokotajlo</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Kondraciuk</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Kondrich</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Konstantinidis</surname></persName> </author> <author> <persName><forename type="first">G</forename><surname>Kosic</surname></persName> </author> <author> <persName><forename type="first">V</forename><surname>Krueger</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Kuo</surname></persName> </author> <author> <persName><forename type="first">I</forename><surname>Lampe</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Lan</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Lee</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Leike</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Leung</surname></persName> </author> <author> <persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Levy</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Li</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Lim</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Lin</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Lin</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Litwin</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Lopez</surname></persName> </author> <author> <persName><forename type="first">P</forename><surname>Lowe</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Lue</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Makanju</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Malfacini</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Manning</surname></persName> </author> <author> <persName><forename type="first">Y</forename><surname>Markov</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Markovski</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Martin</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Mayer</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Mayne</surname></persName> </author> <author> <persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Mcgrew</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Mckinney</surname></persName> </author> <author> <persName><forename type="first">P</forename><surname>Mcleavey</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Mcmillan</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Mcneil</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Medina</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Mehta</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Menick</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Metz</surname></persName> </author> <author> <persName><forename type="first">P</forename><surname>Mishchenko</surname></persName> </author> <author> <persName><forename type="first">V</forename><surname>Mishkin</surname></persName> </author> <author> <persName><forename type="first">E</forename><surname>Monaco</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Morikawa</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Mossing</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Mu</surname></persName> </author> <author> <persName><forename type="first">O</forename><surname>Murati</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Murk</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Mély</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Nair</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Nakano</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Nayak</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Neelakantan</surname></persName> </author> <author> <persName><forename type="first">H</forename><surname>Ngo</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Noh</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Ouyang</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>O'keefe</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Pachocki</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Paino</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Palermo</surname></persName> </author> <author> <persName><forename type="first">G</forename><surname>Pantuliano</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Parascandolo</surname></persName> </author> <author> <persName><forename type="first">E</forename><surname>Parish</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Parparita</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Passos</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Pavlov</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Peng</surname></persName> </author> <author> <persName><forename type="first">F</forename><forename type="middle">D A B</forename><surname>Perelman</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Peres</surname></persName> </author> <author> <persName><forename type="first">H</forename><forename type="middle">P D O</forename><surname>Petrov</surname></persName> </author> <author> <persName><surname>Pinto</surname></persName> </author> <author> <persName><surname>Michael</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Pokorny</surname></persName> </author> <author> <persName><forename type="first">V</forename><forename type="middle">H</forename><surname>Pokrass</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Pong</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Powell</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Power</surname></persName> </author> <author> <persName><forename type="first">E</forename><surname>Power</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Proehl</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Puri</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Radford</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Rae</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Ramesh</surname></persName> </author> <author> <persName><forename type="first">F</forename><surname>Raymond</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Real</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Rimbach</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Ross</surname></persName> </author> <author> <persName><forename type="first">H</forename><surname>Rotsted</surname></persName> </author> <author> <persName><forename type="first">N</forename><surname>Roussez</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Ryder</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Saltarelli</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Sanders</surname></persName> </author> <author> <persName><forename type="first">G</forename><surname>Santurkar</surname></persName> </author> <author> <persName><forename type="first">H</forename><surname>Sastry</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Schmidt</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Schnurr</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Schulman</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Selsam</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Sheppard</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Sherbakov</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Shieh</surname></persName> </author> <author> <persName><forename type="first">P</forename><surname>Shoker</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Shyam</surname></persName> </author> <author> <persName><forename type="first">E</forename><surname>Sidor</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Sigler</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Simens</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Sitkin</surname></persName> </author> <author> <persName><forename type="first">I</forename><surname>Slama</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Sohl</surname></persName> </author> <author> <persName><forename type="first">Y</forename><surname>Sokolowsky</surname></persName> </author> <author> <persName><forename type="first">N</forename><surname>Song</surname></persName> </author> <author> <persName><forename type="first">F</forename><forename type="middle">P</forename><surname>Staudacher</surname></persName> </author> <author> <persName><forename type="first">N</forename><surname>Such</surname></persName> </author> <author> <persName><forename type="first">I</forename><surname>Summers</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Sutskever</surname></persName> </author> <author> <persName><forename type="first">N</forename><surname>Tang</surname></persName> </author> <author> <persName><forename type="first">M</forename><forename type="middle">B</forename><surname>Tezak</surname></persName> </author> <author> <persName><forename type="first">P</forename><surname>Thompson</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Tillet</surname></persName> </author> <author> <persName><forename type="first">E</forename><surname>Tootoonchian</surname></persName> </author> <author> <persName><forename type="first">P</forename><surname>Tseng</surname></persName> </author> <author> <persName><forename type="first">N</forename><surname>Tuggle</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Turley</surname></persName> </author> <author> <persName><forename type="first">J</forename><forename type="middle">F C</forename><surname>Tworek</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Uribe</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Vallone</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Vijayvergiya</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Voss</surname></persName> </author> <author> <persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Wainwright</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Wang</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Wang</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Wang</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Ward</surname></persName> </author> <author> <persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Wei</surname></persName> </author> <author> <persName><forename type="first">A</forename><surname>Weinmann</surname></persName> </author> <author> <persName><forename type="first">P</forename><surname>Welihinda</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Welinder</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Weng</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Weng</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Wiethoff</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Willner</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Winter</surname></persName> </author> <author> <persName><forename type="first">H</forename><surname>Wolrich</surname></persName> </author> <author> <persName><forename type="first">L</forename><surname>Wong</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Workman</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Wu</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Wu</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Wu</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Xiao</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Xu</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Yoo</surname></persName> </author> <author> <persName><forename type="first">Q</forename><surname>Yu</surname></persName> </author> <author> <persName><forename type="first">W</forename><surname>Yuan</surname></persName> </author> <author> <persName><forename type="first">R</forename><surname>Zaremba</surname></persName> </author> <author> <persName><forename type="first">C</forename><surname>Zellers</surname></persName> </author> <author> <persName><forename type="first">M</forename><surname>Zhang</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Zhang</surname></persName> </author> <author> <persName><forename type="first">T</forename><surname>Zhao</surname></persName> </author> <author> <persName><forename type="first">J</forename><surname>Zheng</surname></persName> </author> <author> <persName><forename type="first">W</forename><surname>Zhuang</surname></persName> </author> <author> <persName><forename type="first">B</forename><surname>Zhuk</surname></persName> </author> <author> <persName><surname>Zoph</surname></persName> </author> <idno type="DOI">10.48550/arXiv.2303.08774</idno> <idno type="arXiv">arXiv:2303.08774</idno> <ptr target="http://arxiv.org/abs/2303.08774.doi:10.48550/arXiv.2303.08774" /> <imprint> <date type="published" when="2024">2024</date> </imprint> </monogr> <note type="report_type">GPT-4 Technical Report</note> </biblStruct> <biblStruct xml:id="b18"> <monogr> <author> <persName><forename type="first">J</forename><surname>Chen</surname></persName> </author> <author> <persName><forename type="first">S</forename><surname>Xiao</surname></persName> </author> <author> <persName><forename type="first">P</forename><surname>Zhang</surname></persName> </author> <author> <persName><forename type="first">K</forename><surname>Luo</surname></persName> </author> <author> <persName><forename type="first">D</forename><surname>Lian</surname></persName> </author> <author> <persName><forename type="first">Z</forename><surname>Liu</surname></persName> </author> <idno type="arXiv">arXiv:2402.03216</idno> <title level="m">Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation 2024 SXiao ZLiu PZhang NMuennighoff arXiv:2309.07597 C-pack: Packaged resources to advance general chinese embedding 2023 Hard negative examples are hard, but useful HXuan AStylianou XLiu RPless 2021 SXiao ZLiu PZhang NMuennighoff FlagEmbedding/FlagEmbedding/reranker at master • FlagOpen/FlagEmbedding 2024 Language models are few-shot learners TBBrown BMann NRyder MSubbiah JKaplan PDhariwal ANeelakantan PShyam GSastry AAskell SAgarwal AHerbert-Voss GKrueger THenighan RChild ARamesh DMZiegler JWu CWinter CHesse MChen ESigler MLitwin SGray BChess JClark CBerner SMccandlish ARadford ISutskever DAmodei 2020 Geval: Nlg evaluation using gpt-4 with better human alignment YLiu DIter YXu SWang RXu CZhu 2023 Alpacafarm: A simulation framework for methods that learn from human feedback YDubois XLi RTaori TZhang IGulrajani JBa CGuestrin PLiang TBHashimoto 2024 Gptscore: Evaluate as you desire JFu S.-KNg ZJiang PLiu 2023 CDManning PRaghavan HSchütze Introduction to Information Retrieval

USA

Cambridge University Press 2008 PAQ: 65 million probably-asked questions and what you can do with them PS HLewis YWu LLiu PMinervini HKüttler APiktus PStenetorp SRiedel CoRR abs/2102.07033 2021 ZLi XZhang YZhang DLong PXie MZhang arXiv:2308.03281 Towards general text embeddings with multi-stage contrastive learning 2023 arXiv preprint Making large language models a better foundation for dense retrieval CLi ZLiu SXiao YShao arXiv:2312.15503 2023 Exploring the effectiveness of prompt engineering for legal reasoning tasks FYu LQuartey FSchilder 10.18653/v1/2023.findings-acl.858 Findings of the Association for Computational Linguistics: ACL 2023, Association for Computational Linguistics ARogers JBoyd-Graber NOkazaki

Toronto, Canada

2023 yuan at semeval-2024 task 5: Enhancing legal argument reasoning with structured prompts YLu HKao International Workshop on Semantic Evaluation 0x. 2024 Prompt engineering a prompt engineer QYe MAxmed RPryzant FKhani 2024 Olympicarena medal ranks: Who is the most intelligent ai so far? ZHuang ZWang SXia PLiu 2024 Llm evaluators recognize and favor their own generations APanickssery SRBowman SFeng 2024 Prometheus 2: An open source language model specialized in evaluating other language models SKim JSuk SLongpre BYLin JShin SWelleck GNeubig MLee KLee MSeo 2024 The biggen bench: A principled benchmark for fine-grained evaluation of language models with language models SKim JSuk JYCho SLongpre CKim DYoon GSon YCho SShafayat JBaek SHPark HHwang JJo HCho HShin SLee HOh NLee NHo SJJoo MKo YLee HChae JShin JJang SYe BYLin SWelleck GNeubig MLee KLee MSeo 2024 On the limitations of fine-tuned judge models for llm evaluation HHuang YQu HZhou JLiu MYang BXu TZhao 2024