<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Improving AutoML for LLMs via Knowledge-Based Meta-Learning</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Ernesto</forename><surname>Luis Estevanell-Valladares</surname></persName>
							<email>elev1@alu.ua.es</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Computer Science</orgName>
								<orgName type="institution">University of Havana</orgName>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Natural Language Processing and Information Systems Group</orgName>
								<orgName type="institution">University of Alicante</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Improving AutoML for LLMs via Knowledge-Based Meta-Learning</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7352DD59BD838AAEA1ECEB651BA54EFF</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>AutoML</term>
					<term>Large Language Model</term>
					<term>Meta-Learning</term>
					<term>Natural Language Processing</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Recent advancements in Large Language Models (LLMs) such as BERT, GPT-4, and T5 have revolutionized the field of Natural Language Processing (NLP), unlocking numerous applications. However, finetuning these models for specific tasks remains a complex and resource-intensive process, often relying heavily on expert knowledge. This research proposes integrating meta-learning into Automatic Machine Learning (AutoML) systems to optimize LLM fine-tuning and pipeline construction. We hypothesize that knowledge-based meta-learning can overcome the inefficiencies of current AutoML approaches by embedding expert-derived heuristics into the optimization process. Our methodology involves compiling extensive LLM usage data, training meta-learning estimators, and integrating these into the AutoGOAL AutoML framework. By doing so, we aim to reduce computational costs and enhance the efficiency of LLM-based NLP applications. The proposed system will be evaluated against traditional AutoML methods and human experts on various text classification tasks to validate its effectiveness. This research can further democratize NLP by making advanced LLM capabilities more accessible and efficient.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Recent advances in large language models (LLMs), such as BERT <ref type="bibr" target="#b0">[1]</ref>, the different versions of GPT <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>, and others like T5 <ref type="bibr" target="#b3">[4]</ref> or Mistral <ref type="bibr" target="#b4">[5]</ref>, have unlocked a whole new landscape of applications. With their sophisticated internal language representations, these models have demonstrated the potential to generalize across numerous tasks <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8]</ref>, thus democratizing access to advanced NLP capabilities. However, achieving satisfactory performance typically requires model fine-tuning, which involves selecting the appropriate model, fine-tuning method, and hyperparameters, often relying on researchers' prior experience and trial-and-error approaches <ref type="bibr" target="#b8">[9]</ref>.</p><p>On the other hand, Automatic Machine Learning (AutoML) <ref type="bibr" target="#b9">[10]</ref> democratizes traditional Machine Learning (ML) by automating the process of building adequate ML pipelines for specific tasks, reducing user interaction. These systems have proven their efficacy in Model Selection (MS) <ref type="bibr" target="#b10">[11]</ref> and Hyper-parameter Optimization (HPO) <ref type="bibr" target="#b11">[12]</ref>, showing relevant results in various ML tasks <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b14">15]</ref>. Some systems, like AutoGOAL <ref type="bibr" target="#b14">[15]</ref>, can even tackle NLP tasks and have shown the ability to compete with manually designed models by human experts within a fraction of the time.</p><p>Building ML pipelines and LLM solutions are similar in that both depend on numerous design decisions. NLP pipelines could include multiple steps (e.g., data preprocessing, feature extraction, classification), combining algorithms and hyper-parameters that work in conjunction. On the other hand, LLMs have many life-cycle stages, each consisting of different tasks and metrics that need optimization <ref type="bibr" target="#b15">[16]</ref>. However, it is more common (and accessible) to fine-tune an LM rather than retrain it from the beginning. This is mainly due to the massive computational cost of pre-training, the considerable availability of pretrained LLMs, and the reported performance of even dated LLMs (e.g., BERT <ref type="bibr" target="#b0">[1]</ref>, RoBERTa <ref type="bibr" target="#b5">[6]</ref>, and DistilBERT <ref type="bibr" target="#b16">[17]</ref>) when fine-tuned.</p><p>Just as AutoML is used for building traditional ML pipelines, it can automatically create LLM pipelines or fine-tune LLMs based on pretrained models, as there is no technical difference between both types of pipelines. However, evaluating an LLM pipeline incurs a significant computational cost, and fine-tuning a model could take hours, depending on the training data and available computational resources. 
Additionally, the complexity of the search spaces, which include multiple LLMs, fine-tuning methods, and hyperparameters, could make zero-shot AutoML less efficient than human experts who rely on prior knowledge.</p><p>Our research proposes modeling knowledge from the fine-tuning stage of LLMs and integrating it into an AutoML process to efficiently generate optimal LLM pipelines for any specific NLP task. As such, our central hypothesis is that (H1) knowledge-based meta-learning can mitigate the drawbacks of AutoML for LLMs and help build LLM-based applications more effectively. To test our hypothesis, we will design, develop, and integrate such meta-learning components into an AutoML system. In particular, we will focus on the Text Classification task, as it is relevant and allows for a straightforward evaluation of our proposal. Then, we will compare our meta-learning-based AutoML system against zero-shot AutoML and human experts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Motivational Example</head><p>Imagine a mid-sized company wanting to implement an advanced customer support chatbot using pre-trained LLMs like the GPTs <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref> or T5 <ref type="bibr" target="#b3">[4]</ref>. Traditionally, customizing one of these models could take weeks or months, delaying deployment and impacting productivity. Our proposed knowledge-based meta-learning approach within an AutoML framework aims to automatically predict the most suitable LLM, tuning method, and settings for the specific task.</p><p>This approach reduces time and computational resources, improving model development efficiency and quality. Integrating expert knowledge into the AutoML process can speed up the entire production pipeline and lead to faster and more effective deployment of LLM-based applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">LLMs for AutoML</head><p>The term LLMs for AutoML refers to using Language Models to enhance an AutoML process or system. The two most popular approaches in this category involve using LLMs to improve human interaction with AutoML systems or using the knowledge embedded in LLMs to actively contribute to the solution-building process of AutoML <ref type="bibr" target="#b15">[16]</ref>.</p><p>Human-to-Machine Interaction: LLM-based applications like ChatGPT <ref type="bibr" target="#b2">[3]</ref> from OpenAI and Gemini <ref type="bibr" target="#b17">[18]</ref> from Google demonstrate how LLMs can be employed for human-to-machine interaction with millions of users. From this experience stems the potential of LLMs for improving user interaction with complex AutoML systems. According to Tornede et al. <ref type="bibr" target="#b15">[16]</ref>, language models could serve as the interface for setting up the necessary configurations for the AutoML system to function properly and could also facilitate some level of result interpretability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>LLMs as Controllers:</head><p>Due to the vital amount of knowledge embedded into LLMs during training, they can also be used to participate in the solution-building process of AutoML actively. Shen et al. <ref type="bibr" target="#b18">[19]</ref> and Luo and Shen <ref type="bibr" target="#b19">[20]</ref> proposed using LLMs as controllers for building pipelines. HuggingGPT <ref type="bibr" target="#b18">[19]</ref> parses user inputs into sorted tasks, finds suitable huggingface <ref type="bibr" target="#b20">[21]</ref> models for each, and computes the response orderly. AutoM3L <ref type="bibr" target="#b19">[20]</ref> goes a step further, allowing users to have a more active role in each step of the system via directives to the LLM. Other proposals by Sayed et al. <ref type="bibr" target="#b21">[22]</ref>, Morris et al. <ref type="bibr" target="#b22">[23]</ref>, and Zhang et al. <ref type="bibr" target="#b23">[24]</ref> also implement this type of approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">AutoML for LLMs</head><p>Another point of interest in the relationship between AutoML and LLMs is the fact that AutoML could be used to produce optimal LLM solutions streamlined for specific scenarios automatically. This approach is known as AutoML for LLMs Tornede et al. <ref type="bibr" target="#b15">[16]</ref>. However, this direction stems several challenges that must be addressed, namely:</p><p>(i) The different stages of the life-cycle of LLMs require optimization on different objectives, of which current AutoML systems are incapable. (ii) LLMs are extremely resource-intensive <ref type="bibr" target="#b24">[25]</ref>, even when only considering their latest stages (e.g., fine-tuning, inference).</p><p>In their work, Mallik et al. <ref type="bibr" target="#b8">[9]</ref> emphasize the gap between current HPO algorithms and modern Deep Learning (DL) methods. They introduce an HPO approach incorporating expert knowledge and inexpensive proxy tasks to reduce optimization costs. On the other hand, Zhang et al. <ref type="bibr" target="#b23">[24]</ref> proposes AutoML-GPT, capable of optimizing LLM pipelines for many tasks. This system optimizes the hyperparameters of such pipelines by simulating their training. This way, all responsibility falls into the coordinator LLM (and collaborator models), and no actual evaluation is executed. Both methods leverage expert knowledge to minimize resource consumption during their hyperparameter optimization search. Furthermore, Zhang et al. <ref type="bibr" target="#b25">[26]</ref> investigated the impact of data, model, and fine-tuning method selection on various NLP tasks, concluding that the optimal approach varies depending on the task. Currently, no system combines Model Selection and HPO. Therefore, we propose an AutoML system with these specifications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposal and Methodology</head><p>The expertise in machine learning, mainly when data is limited and training is not feasible, involves leveraging expert knowledge to navigate the complexities of ML tasks. Experts utilize scalability rules and heuristics to make informed decisions about model architecture, training data selection, and fine-tuning techniques based on the specific requirements of each task. These decisions help optimize resource usage and achieve efficient outcomes. Our proposal aims to model these heuristics within an AutoML system using meta-learning to avoid sub-optimal decisions. We propose the following specific objectives:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>O1 Extract, compile, and store knowledge from AutoML logs</head><p>We will analyze AutoML logs to identify patterns and insights that can be extracted from the exploration experience. This involves collecting data on configurations, performance metrics, and outcomes of AutoML processes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>O2 Open a federated knowledge venue (Optional, Long Term)</head><p>The knowledge extracted from every AutoML instance will be transformed into a reusable format, stored, and shared across multiple devices. We can recycle all the unused knowledge on LLM experimentation by providing a logging framework connecting to the centralized knowledge base. This federated knowledge will be a foundation for training models that can be generalized across diverse tasks and settings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>O3 Train and test an estimator on such knowledge</head><p>We will develop and evaluate an estimator trained on the compiled knowledge to predict optimal configurations and settings for new tasks. Federated Knowledge is not required to test our main hypotheses but would enhance our estimators. Hence, we can simply train and test our estimator using the initially generated data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>O4 Integrate the estimator into an AutoML system</head><p>Finally, the trained estimator will be integrated into an AutoML system. This integration will enable the system to automatically apply expert-derived heuristics and avoid suboptimal decisions, improving overall efficiency and performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Knowledge Compilation</head><p>The initial step involves collecting and organizing LLM usage data from various scenarios, specifically AutoML logs. Our focus will be on text classification to support the testing of our hypotheses. The data we gather will cover the following components:</p><p>• ML task specifics (text classification).</p><p>• Dataset characteristics (e.g., number of samples and classes, mean length of samples, domain).</p><p>• LLM features (e.g., number of parameters and layers, pre-training target task, pre-training data domains). • Fine-tuning method features (e.g., method name, hyperparameters). • Outcome metrics (e.g., performance, resource utilization).</p><p>We acknowledge that a limited amount of data is available for experiments that align with our specific requirements for fine-tuning LLMs. Additionally, many models are not open-source, making it difficult to access necessary features. Therefore, our proposal involves generating the required data for our research. At the time of writing, we have over 2000 LLM evaluation entries on three text classification tasks: IDMB, Yelp Reviews Full <ref type="bibr" target="#b26">[27]</ref>, and AG News <ref type="bibr" target="#b26">[27]</ref>.</p><p>First, we should select an appropriate set of LLMs, fine-tuning methods (with their hyperparameters), and NLP tasks to evaluate. Table <ref type="table">1</ref> lists the LLMs we have selected for study participation. We amount to 44 models (accounting for variants), half of which are generative. Models range from 65.8 million parameters (DistilBERT) to 11 billion (T5).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>LLM Variants</head><p>BERT <ref type="bibr" target="#b0">[1]</ref> (cased, uncased) base, large, base-multilingual (only cased) DistilBERT <ref type="bibr" target="#b16">[17]</ref> base (cased, uncased), base-multilingual (cased) RoBERTa <ref type="bibr" target="#b5">[6]</ref> base, large XLM-RoBERTa <ref type="bibr" target="#b27">[28]</ref> base, large DeBERTa <ref type="bibr" target="#b28">[29]</ref> base DeBERTaV3 <ref type="bibr" target="#b29">[30]</ref> base MDeBERTaV3 <ref type="bibr" target="#b29">[30]</ref> base ALBERT-v1 <ref type="bibr" target="#b30">[31]</ref> base, large, xlarge, xxlarge ELECTRA <ref type="bibr" target="#b31">[32]</ref> (discriminator) small, base, large T5 <ref type="bibr" target="#b3">[4]</ref> small, base, large, 3B, 11B FLAN-T5 <ref type="bibr" target="#b32">[33]</ref> base, large, xxl, xl GPT-2 <ref type="bibr" target="#b1">[2]</ref> base, medium, large, xl</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>List of LLMs selected for study participation.</p><p>Fine-tuning has been the preferred choice for adapting Language Models to specific tasks <ref type="bibr" target="#b33">[34]</ref>. However, some methods might render different results depending on their use case. For our research, we have included vanilla fine-tuning, the Low-Rank Adaptation (LoRA) adapter <ref type="bibr" target="#b34">[35]</ref> as a Parameter Efficient Fine-Tuning alternative. Lastly, we added a naive Partial Fine-tuning method consisting of freezing the initial layers so general knowledge is not lost during training <ref type="bibr" target="#b35">[36]</ref>, a way of adaptive fine-tuning.</p><p>Because our hypothesis is domain-agnostic, we propose testing these LLMs and fine-tuning methods in Text Classification tasks. However, evaluating every possible combination is inefficient due to the high cost of experimentation and the sheer number of combinations available (taking into account fine-tuning hyperparameters). Therefore, we resort to AutoML to sample good-performing and efficient samples.</p><p>AutoGOAL <ref type="bibr" target="#b14">[15]</ref> is a heterogeneous AutoML system capable of multi-objective optimization that includes LLMs in its algorithm pool. However, one of its limitations is that it can only employ LLMs for inference. Hence, we also need to extend the system to support fine-tuning.</p><p>Optimizing performance and training time could help us produce substantial data in the shortest possible time. Moreover, training time is a substantial estimator of how computeintensive training certain LLM is <ref type="bibr" target="#b36">[37]</ref>; hence, optimizing it would help steer the data towards the greener combinations. However, although theoretically, this could raise the number of samples generated in a period, we could lean onto other venues for recollecting more data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1.">Federated Knowledge and Knowledge Recycling</head><p>Due to the rise in popularity of LLMs, a massive amount of work is directed toward finetuning these models to specific tasks. Only Huggingface <ref type="bibr" target="#b20">[21]</ref> hosts around 60000 models for text classification, and many of these could have been the final products of a long series of experiments that ended up under-performing or straight-out invalid. If correctly reported and utilized, this (disposed of) knowledge could be of great value for meta-learning.</p><p>We propose exploiting this venue by building a logging framework to collect relevant data from experiments regarding LLMs and store them in a centralized knowledge base. This Federated Knowledge Base could be the base of further meta-learning approaches to optimizing LLMs and potentially support many researchers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Meta-Learning Estimator</head><p>Once we have our Dataset, we will design multiple estimators that utilize (and represent) the extracted knowledge to predict how adequate a particular combination of LLM and fine-tuning method (and hyperparameters) are for a target task. We will follow multiple strategies for generating such estimators. AutoGOAL (or any other system) could again be employed to find optimal ML pipelines for our dataset automatically. Additionally, experts will manually design some explainable solutions and add them to the pool of candidate solutions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">AutoML Integration</head><p>Depending on the chosen system optimization strategy, the integration of the meta-learning estimator into AutoML can be approached in various ways. We selected AutoGOAL as the target framework because, to our knowledge, it is the only AutoML system capable of modeling a broad search space of LLM pipelines.</p><p>AutoGOAL follows a Probabilistic Grammatical Evolution <ref type="bibr" target="#b37">[38]</ref> strategy consisting of a cycle in which each generation produces a population of solutions (pipelines). These pipelines are then evaluated and ranked by their performance. The top solutions are then selected to shift the system's probabilistic model, from which all pipelines are sampled. This way, AutoGOAL converges into the section of the space more likely to generate optimal solutions.</p><p>A meta-learning component could determine whether an LLM pipeline should be evaluated based on its predicted performance. If the predicted performance is notably lower than the current best by a certain threshold, such evaluation could be considered a waste of resources. If not, the LLM could be trained, and its logs could be stored (or published) for later use by newer estimators.</p><p>Another potential benefit is leveraging the extracted knowledge to provide the system with an initial advantage. Specifically, we could initialize the probabilistic distribution (which is uniform by default) with a bias for the best-performing methods we previously found for similar tasks. This approach could improve the system's speed and performance in converging to optimal solutions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>To challenge H1 (See Section 1), we propose to test first whether inference based on the extracted knowledge effectively predicts new scenarios independently from AutoML. Then, we must evaluate the benefits of integrating the meta-learning component into AutoML.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Knowledge</head><p>To gauge the quality of our compiled knowledge, we must evaluate the performance of our inferred rules and estimators against our proposed baselines:</p><p>• Random Estimator. • LLM Estimators.</p><p>Evaluating estimators can be done as evaluating any ML model. We will automatically compare each via k-fold cross-validation on our dataset. We will selectively hide LLMs and Tasks from our dataset to further support our results and test whether the estimators can generalize to unseen data points. This can also be achieved by repeating the dataset generation procedure and sampling a test dataset for a new task or with LLMs not previously included.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Meta-Learning for AutoML</head><p>To empirically test the effectiveness of our meta-learning approach, we propose comparing our meta-learning enhanced AutoGOAL against its original implementation, other AutoML systems, and human experts on text classification tasks. By doing so, we intend to test whether our tool can generalize to different, previously unseen tasks. This would also highlight the quality of our selected features for both the dataset and the models.</p></div>		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">Bert: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Language models are unsupervised multitask learners</title>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Luan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">OpenAI blog</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page">9</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<idno type="arXiv">arXiv:2303.08774</idno>
		<title level="m">GPT-4</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
		<respStmt>
			<orgName>OpenAI</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Exploring the limits of transfer learning with a unified text-to-text transformer</title>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Narang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Matena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Liu</surname></persName>
		</author>
		<ptr target="http://jmlr.org/papers/v21/20-074.html" />
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="1" to="67" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Q</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sablayrolles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mensch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bamford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">S</forename><surname>Chaplot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Casas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bressand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lengyel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Saulnier</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.06825</idno>
		<title level="m">Mistral 7b</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<title level="m">Roberta: A robustly optimized bert pretraining approach</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Xlnet: Generalized autoregressive pretraining for language understanding</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Carbonell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">How to fine-tune bert for text classification?</title>
		<author>
			<persName><forename type="first">C</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Qiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Chinese computational linguistics: 18th China national conference, CCL 2019</title>
				<meeting><address><addrLine>Kunming, China</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">October 18-20, 2019. proceedings 18. 2019</date>
			<biblScope unit="page" from="194" to="206" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Priorband: Practical hyperparameter optimization in the age of deep learning</title>
		<author>
			<persName><forename type="first">N</forename><surname>Mallik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bergman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hvarfner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Stoll</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Janowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lindauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Nardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Automated Machine Learning</title>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kotthoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanschoren</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Auto-weka: combined selection and hyperparameter optimization of classification algorithms</title>
		<author>
			<persName><forename type="first">C</forename><surname>Thornton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">H</forename><surname>Hoos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Leyton-Brown</surname></persName>
		</author>
		<idno type="DOI">10.1145/2487575.2487629</idno>
	</analytic>
	<monogr>
		<title level="j">ACM</title>
		<imprint>
			<biblScope unit="page" from="847" to="855" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Hyperparameter Optimization</title>
		<author>
			<persName><forename type="first">M</forename><surname>Feurer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-030-05318-5_1</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>Springer International Publishing</publisher>
			<biblScope unit="page" from="3" to="33" />
			<pubPlace>Cham</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Feurer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Eggensperger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Falkner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lindauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<title level="m">Auto-sklearn 2.0: The next generation</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">H2o automl: Scalable automatic machine learning</title>
		<author>
			<persName><forename type="first">E</forename><surname>Ledell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Poirier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the AutoML Workshop at ICML</title>
				<meeting>the AutoML Workshop at ICML</meeting>
		<imprint>
			<date type="published" when="2020">2020. 2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Automatic discovery of heterogeneous machine learning pipelines: An application to natural language processing</title>
		<author>
			<persName><forename type="first">S</forename><surname>Estevez-Velarde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Gutiérrez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Montoyo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">A</forename><surname>Cruz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th International Conference on Computational Linguistics</title>
				<meeting>the 28th International Conference on Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="3558" to="3568" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Tornede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Eimer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Giovanelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mohan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ruhkopf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Segel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Theodorakopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tornede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wachsmuth</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2306.08107</idno>
		<title level="m">Automl in the age of large language models: Current challenges, future opportunities and risks</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.01108</idno>
		<title level="m">Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Team</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Anil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Borgeaud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-B</forename><surname>Alayrac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Soricut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schalkwyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hauth</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2312.11805</idno>
		<title level="m">Gemini: a family of highly capable multimodal models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhuang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.17580</idno>
		<title level="m">Hugginggpt: Solving ai tasks with chatgpt and its friends in huggingface</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<title level="m">Autom3l: Automated multimodal machine learning with large language model</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Funtowicz</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1910.03771</idno>
		<title level="m">Huggingface&apos;s transformers: State-of-the-art natural language processing</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Gizaml: A collaborative meta-learning based framework using llm for automated time-series forecasting</title>
		<author>
			<persName><forename type="first">E</forename><surname>Sayed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Maher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Sedeek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Eldamaty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kamel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">El</forename><surname>Shawi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EDBT</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="830" to="833" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Morris</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jurado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zutty</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2403.11446</idno>
		<title level="m">Llm guided evolution-the automation of models advancing models</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhou</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2305.02499</idno>
		<title level="m">Automl-gpt: Automatic machine learning with gpt</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Evaluating the carbon footprint of nlp methods: a survey and analysis of existing tools</title>
		<author>
			<persName><forename type="first">N</forename><surname>Bannour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ghannay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A.-L</forename><surname>Ligozat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the second workshop on simple and efficient natural language processing</title>
				<meeting>the second workshop on simple and efficient natural language processing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="11" to="21" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">When scaling meets llm finetuning: The effect of data, model and finetuning method</title>
		<author>
			<persName><forename type="first">B</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cherry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Firat</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2402.17193.arXiv:2402.17193" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Character-level convolutional networks for text classification</title>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Chaudhary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wenzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Guzmán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1911.02116</idno>
		<title level="m">Unsupervised cross-lingual representation learning at scale</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Deberta: Decoding-enhanced bert with disentangled attention</title>
		<author>
			<persName><forename type="first">P</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<ptr target="https://openreview.net/forum?id=XPZIaotutsD" />
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2111.09543</idno>
		<title level="m">Debertav3: Improving deberta using electra-style pre-training with gradient-disentangled embedding sharing</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">ALBERT: A lite BERT for self-supervised learning of language representations</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Lan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Goodman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Soricut</surname></persName>
		</author>
		<idno>CoRR abs/1909.11942</idno>
		<idno type="arXiv">arXiv:1909.11942</idno>
		<ptr target="http://arxiv.org/abs/1909.11942" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-T</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2003.10555</idno>
		<title level="m">Electra: Pre-training text encoders as discriminators rather than generators</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Scaling instruction-finetuned language models</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">W</forename><surname>Chung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Longpre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zoph</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Fedus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dehghani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brahma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="1" to="53" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Heng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Lam</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2403.09162</idno>
		<title level="m">Unveiling the generalization power of fine-tuned large language models</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b34">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2106.09685</idno>
		<title level="m">Lora: Low-rank adaptation of large language models</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Prasad Varadarajan Srinivasan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gumpena</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yattapu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">H</forename><surname>Brahmbhatt</surname></persName>
		</author>
		<idno>arXiv-2405</idno>
		<title level="m">Comparative analysis of different efficient fine tuning methods of large language models (llms) in lowresource setting</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv e-prints</note>
</biblStruct>

<biblStruct xml:id="b36">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Na</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Strubell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Friedler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Luccioni</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2311.10267</idno>
		<title level="m">Energy and carbon considerations of fine-tuning bert</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">A new grammatical evolution based on probabilistic contextfree grammar</title>
		<author>
			<persName><forename type="first">H.-T</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">W</forename><surname>Ahn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th Asia Pacific Symposium on Intelligent and Evolutionary Systems-Volume</title>
				<meeting>the 18th Asia Pacific Symposium on Intelligent and Evolutionary Systems-Volume</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="1" to="12" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
