=Paper=
{{Paper
|id=Vol-3797/paper9
|storemode=property
|title=
Improving AutoML for LLMs via Knowledge-Based Meta-Learning

|pdfUrl=https://ceur-ws.org/Vol-3797/paper9.pdf
|volume=Vol-3797
|authors=Ernesto Luis Estevanell-Valladares
|dblpUrl=https://dblp.org/rec/conf/sepln/Estevanell-Valladares24
}}
==
Improving AutoML for LLMs via Knowledge-Based Meta-Learning
==
<pdf width="1500px">https://ceur-ws.org/Vol-3797/paper9.pdf</pdf>
<pre>
                                Improving AutoML for LLMs via Knowledge-Based
                                Meta-Learning
                                Ernesto Luis Estevanell-Valladares1,2
                                1
                                    Faculty of Mathematics and Computer Science, University of Havana
                                2
                                    Natural Language Processing and Information Systems Group, University of Alicante


                                              Abstract
                                              Recent advancements in Large Language Models (LLMs) such as BERT, GPT-4, and T5 have revolutionized
                                              the field of Natural Language Processing (NLP), unlocking numerous applications. However, fine-
                                              tuning these models for specific tasks remains a complex and resource-intensive process, often relying
                                              heavily on expert knowledge. This research proposes integrating meta-learning into Automatic Machine
                                              Learning (AutoML) systems to optimize LLM fine-tuning and pipeline construction. We hypothesize
                                              that knowledge-based meta-learning can overcome the inefficiencies of current AutoML approaches by
                                              embedding expert-derived heuristics into the optimization process. Our methodology involves compiling
                                              extensive LLM usage data, training meta-learning estimators, and integrating these into the AutoGOAL
                                              AutoML framework. By doing so, we aim to reduce computational costs and enhance the efficiency
                                              of LLM-based NLP applications. The proposed system will be evaluated against traditional AutoML
                                              methods and human experts on various text classification tasks to validate its effectiveness. This research
                                              can further democratize NLP by making advanced LLM capabilities more accessible and efficient.

                                              Keywords
                                              AutoML, Large Language Model, Meta-Learning, Natural Language Processing


                                1. Introduction
                                Recent advances in large language models (LLMs), such as BERT [1], the different versions
                                of GPT [2, 3], and others like T5 [4] or Mistral [5], have unlocked a whole new landscape of
                                applications. With their sophisticated internal language representations, these models have
                                demonstrated the potential to generalize across numerous tasks [6, 7, 8], thus democratizing
                                access to advanced NLP capabilities. However, achieving satisfactory performance typically
                                requires model fine-tuning, which involves selecting the appropriate model, fine-tuning method,
                                and hyperparameters, often relying on researchers’ prior experience and trial-and-error ap-
                                proaches [9].
                                   On the other hand, Automatic Machine Learning (AutoML) [10] democratizes traditional
                                Machine Learning (ML) by automating the process of building adequate ML pipelines for specific
                                tasks, reducing user interaction. These systems have proven their efficacy in Model Selection
                                (MS) [11] and Hyper-parameter Optimization (HPO) [12], showing relevant results in various
                                ML tasks [11, 13, 14, 15]. Some systems, like AutoGOAL [15], can even tackle NLP tasks and


                                Doctoral Symposium on Natural Language Processing, 26 September 2024, Valladolid, Spain.
                                Envelope-Open elev1@alu.ua.es (E. L. Estevanell-Valladares)
                                Orcid 0000-0002-1168-1767 (E. L. Estevanell-Valladares)
                                            © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
have shown the ability to compete with manually designed models by human experts within a
fraction of the time.
   Building ML pipelines and LLM solutions are similar in that both depend on numerous design
decisions. NLP pipelines could include multiple steps (e.g., data preprocessing, feature extraction,
classification), combining algorithms and hyper-parameters that work in conjunction. On the
other hand, LLMs have many life-cycle stages, each consisting of different tasks and metrics
that need optimization [16]. However, it is more common (and accessible) to fine-tune an LM
rather than retrain it from the beginning. This is mainly due to the massive computational cost
of pre-training, the considerable availability of pretrained LLMs, and the reported performance
of even dated LLMs (e.g., BERT [1], RoBERTa [6], and DistilBERT [17]) when fine-tuned.
   Just as AutoML is used for building traditional ML pipelines, it can automatically create LLM
pipelines or fine-tune LLMs based on pretrained models, as there is no technical difference
between both types of pipelines. However, evaluating an LLM pipeline incurs a significant
computational cost, and fine-tuning a model could take hours, depending on the training
data and available computational resources. Additionally, the complexity of the search spaces,
which include multiple LLMs, fine-tuning methods, and hyperparameters, could make zero-shot
AutoML less efficient than human experts who rely on prior knowledge.
   Our research proposes modeling knowledge from the fine-tuning stage of LLMs and integrat-
ing it into an AutoML process to efficiently generate optimal LLM pipelines for any specific
NLP task. As such, our central hypothesis is that (H1) knowledge-based meta-learning can
mitigate the drawbacks of AutoML for LLMs and help build LLM-based applications more
effectively. To test our hypothesis, we will design, develop, and integrate such meta-learning
components into an AutoML system. In particular, we will focus on the Text Classification task
as it is relevant and allows for a more straightforward proposal evaluation process. Then, we
will compare our meta-learning-based AutoML system against zero-shot AutoML and human
experts.

1.1. Motivational Example
Imagine a mid-sized company wanting to implement an advanced customer support chatbot
using pre-trained LLMs like the GPTs [2, 3] or T5 [4]. Traditionally, customizing one of these
models could take weeks or months, delaying deployment and impacting productivity. Our
proposed knowledge-based meta-learning approach within an AutoML framework aims to
automatically predict the most suitable LLM, tuning method, and settings for the specific task.
   This approach reduces time and computational resources, improving model development
efficiency and quality. Integrating expert knowledge into the AutoML process can speed up
the entire production pipeline and lead to faster and more effective deployment of LLM-based
applications.


2. Related Work
Much research is currently being conducted to explore how AutoML and LLMs can benefit each
other. [16] summarizes this symbiotic relationship from two different viewpoints: LLMs for
AutoML and AutoML for LLMs.
2.1. LLMs for AutoML
The term LLMs for AutoML refers to using Language Models to enhance an AutoML process
or system. The two most popular approaches in this category involve using LLMs to improve
human interaction with AutoML systems or using the knowledge embedded in LLMs to actively
contribute to the solution-building process of AutoML [16].

      Human-to-Machine Interaction: LLM-based applications like ChatGPT [3] from
      OpenAI and Gemini [18] from Google demonstrate how LLMs can be employed for
      human-to-machine interaction with millions of users. From this experience stems
      the potential of LLMs for improving user interaction with complex AutoML systems.
      According to Tornede et al. [16], language models could serve as the interface for setting
      up the necessary configurations for the AutoML system to function properly and could
      also facilitate some level of result interpretability.

      LLMs as Controllers: Due to the vital amount of knowledge embedded into LLMs during
      training, they can also be used to participate in the solution-building process of AutoML
      actively. Shen et al. [19] and Luo and Shen [20] proposed using LLMs as controllers for
      building pipelines. HuggingGPT [19] parses user inputs into sorted tasks, finds suitable
      huggingface [21] models for each, and computes the response orderly. AutoM3L [20]
      goes a step further, allowing users to have a more active role in each step of the system
      via directives to the LLM. Other proposals by Sayed et al. [22], Morris et al. [23], and
      Zhang et al. [24] also implement this type of approach.

2.2. AutoML for LLMs
Another point of interest in the relationship between AutoML and LLMs is the fact that AutoML
could be used to produce optimal LLM solutions streamlined for specific scenarios automatically.
This approach is known as AutoML for LLMs Tornede et al. [16]. However, this direction
stems several challenges that must be addressed, namely:

   (i) The different stages of the life-cycle of LLMs require optimization on different objectives,
       of which current AutoML systems are incapable.
  (ii) LLMs are extremely resource-intensive [25], even when only considering their latest
       stages (e.g., fine-tuning, inference).

  In their work, Mallik et al. [9] emphasize the gap between current HPO algorithms and
modern Deep Learning (DL) methods. They introduce an HPO approach incorporating expert
knowledge and inexpensive proxy tasks to reduce optimization costs. On the other hand,
Zhang et al. [24] proposes AutoML-GPT, capable of optimizing LLM pipelines for many tasks.
This system optimizes the hyperparameters of such pipelines by simulating their training.
This way, all responsibility falls into the coordinator LLM (and collaborator models), and no
actual evaluation is executed. Both methods leverage expert knowledge to minimize resource
consumption during their hyperparameter optimization search.
   Furthermore, Zhang et al. [26] investigated the impact of data, model, and fine-tuning method
selection on various NLP tasks, concluding that the optimal approach varies depending on
the task. Currently, no system combines Model Selection and HPO. Therefore, we propose an
AutoML system with these specifications.


3. Proposal and Methodology
The expertise in machine learning, mainly when data is limited and training is not feasible,
involves leveraging expert knowledge to navigate the complexities of ML tasks. Experts utilize
scalability rules and heuristics to make informed decisions about model architecture, training
data selection, and fine-tuning techniques based on the specific requirements of each task. These
decisions help optimize resource usage and achieve efficient outcomes. Our proposal aims to
model these heuristics within an AutoML system using meta-learning to avoid sub-optimal
decisions. We propose the following specific objectives:

 O1 Extract, compile, and store knowledge from AutoML logs
    We will analyze AutoML logs to identify patterns and insights that can be extracted from
    the exploration experience. This involves collecting data on configurations, performance
    metrics, and outcomes of AutoML processes.
 O2 Open a federated knowledge venue (Optional, Long Term)
    The knowledge extracted from every AutoML instance will be transformed into a reusable
    format, stored, and shared across multiple devices. We can recycle all the unused knowl-
    edge on LLM experimentation by providing a logging framework connecting to the
    centralized knowledge base. This federated knowledge will be a foundation for training
    models that can be generalized across diverse tasks and settings.
 O3 Train and test an estimator on such knowledge
    We will develop and evaluate an estimator trained on the compiled knowledge to predict
    optimal configurations and settings for new tasks. Federated Knowledge is not required
    to test our main hypotheses but would enhance our estimators. Hence, we can simply
    train and test our estimator using the initially generated data.
 O4 Integrate the estimator into an AutoML system
    Finally, the trained estimator will be integrated into an AutoML system. This integration
    will enable the system to automatically apply expert-derived heuristics and avoid sub-
    optimal decisions, improving overall efficiency and performance.

3.1. Knowledge Compilation
The initial step involves collecting and organizing LLM usage data from various scenarios,
specifically AutoML logs. Our focus will be on text classification to support the testing of our
hypotheses. The data we gather will cover the following components:

    • ML task specifics (text classification).
    • Dataset characteristics (e.g., number of samples and classes, mean length of samples,
      domain).
    • LLM features (e.g., number of parameters and layers, pre-training target task, pre-training
      data domains).
    • Fine-tuning method features (e.g., method name, hyperparameters).
    • Outcome metrics (e.g., performance, resource utilization).

  We acknowledge that a limited amount of data is available for experiments that align with
our specific requirements for fine-tuning LLMs. Additionally, many models are not open-source,
making it difficult to access necessary features. Therefore, our proposal involves generating
the required data for our research. At the time of writing, we have over 2000 LLM evaluation
entries on three text classification tasks: IDMB, Yelp Reviews Full [27], and AG News [27].
  First, we should select an appropriate set of LLMs, fine-tuning methods (with their hyper-
parameters), and NLP tasks to evaluate. Table 1 lists the LLMs we have selected for study
participation. We amount to 44 models (accounting for variants), half of which are generative.
Models range from 65.8 million parameters (DistilBERT) to 11 billion (T5).

          LLM                    Variants
          BERT [1]               (cased, uncased) base, large, base-multilingual (only cased)
          DistilBERT [17]        base (cased, uncased), base-multilingual (cased)
          RoBERTa [6]            base, large
          XLM-RoBERTa [28]       base, large
          DeBERTa [29]           base
          DeBERTaV3 [30]         base
          MDeBERTaV3 [30]        base
          ALBERT-v1 [31]         base, large, xlarge, xxlarge
          ELECTRA [32]           (discriminator) small, base, large
          T5 [4]                 small, base, large, 3B, 11B
          FLAN-T5 [33]           base, large, xxl, xl
          GPT-2 [2]              base, medium, large, xl

Table 1
List of LLMs selected for study participation.

   Fine-tuning has been the preferred choice for adapting Language Models to specific tasks [34].
However, some methods might render different results depending on their use case. For our
research, we have included vanilla fine-tuning, the Low-Rank Adaptation (LoRA) adapter [35]
as a Parameter Efficient Fine-Tuning alternative. Lastly, we added a naive Partial Fine-tuning
method consisting of freezing the initial layers so general knowledge is not lost during training
[36], a way of adaptive fine-tuning.
   Because our hypothesis is domain-agnostic, we propose testing these LLMs and fine-tuning
methods in Text Classification tasks. However, evaluating every possible combination is ineffi-
cient due to the high cost of experimentation and the sheer number of combinations available
(taking into account fine-tuning hyperparameters). Therefore, we resort to AutoML to sample
good-performing and efficient samples.
   AutoGOAL[15] is a heterogeneous AutoML system capable of multi-objective optimization
that includes LLMs in its algorithm pool. However, one of its limitations is that it can only
employ LLMs for inference. Hence, we also need to extend the system to support fine-tuning.
   Optimizing performance and training time could help us produce substantial data in the
shortest possible time. Moreover, training time is a substantial estimator of how compute-
intensive training certain LLM is [37]; hence, optimizing it would help steer the data towards
the greener combinations. However, although theoretically, this could raise the number of
samples generated in a period, we could lean onto other venues for recollecting more data.

3.1.1. Federated Knowledge and Knowledge Recycling
Due to the rise in popularity of LLMs, a massive amount of work is directed toward fine-
tuning these models to specific tasks. Only Huggingface [21] hosts around 60000 models for
text classification, and many of these could have been the final products of a long series of
experiments that ended up under-performing or straight-out invalid. If correctly reported and
utilized, this (disposed of) knowledge could be of great value for meta-learning.
   We propose exploiting this venue by building a logging framework to collect relevant data
from experiments regarding LLMs and store them in a centralized knowledge base. This
Federated Knowledge Base could be the base of further meta-learning approaches to optimizing
LLMs and potentially support many researchers.

3.2. Meta-Learning Estimator
Once we have our Dataset, we will design multiple estimators that utilize (and represent) the
extracted knowledge to predict how adequate a particular combination of LLM and fine-tuning
method (and hyperparameters) are for a target task. We will follow multiple strategies for
generating such estimators. AutoGOAL (or any other system) could again be employed to find
optimal ML pipelines for our dataset automatically. Additionally, experts will manually design
some explainable solutions and add them to the pool of candidate solutions.

3.3. AutoML Integration
Depending on the chosen system optimization strategy, the integration of the meta-learning
estimator into AutoML can be approached in various ways. We selected AutoGOAL as the
target framework because, to our knowledge, it is the only AutoML system capable of modeling
a broad search space of LLM pipelines.
   AutoGOAL follows a Probabilistic Grammatical Evolution [38] strategy consisting of a cycle
in which each generation produces a population of solutions (pipelines). These pipelines are
then evaluated and ranked by their performance. The top solutions are then selected to shift
the system’s probabilistic model, from which all pipelines are sampled. This way, AutoGOAL
converges into the section of the space more likely to generate optimal solutions.
   A meta-learning component could determine whether an LLM pipeline should be evaluated
based on its predicted performance. If the predicted performance is notably lower than the
current best by a certain threshold, such evaluation could be considered a waste of resources. If
not, the LLM could be trained, and its logs could be stored (or published) for later use by newer
estimators.
   Another potential benefit is leveraging the extracted knowledge to provide the system with
an initial advantage. Specifically, we could initialize the probabilistic distribution (which is
uniform by default) with a bias for the best-performing methods we previously found for similar
tasks. This approach could improve the system’s speed and performance in converging to
optimal solutions.


4. Experiments
To challenge H1 (See Section 1), we propose to test first whether inference based on the extracted
knowledge effectively predicts new scenarios independently from AutoML. Then, we must
evaluate the benefits of integrating the meta-learning component into AutoML.

4.1. Knowledge
To gauge the quality of our compiled knowledge, we must evaluate the performance of our
inferred rules and estimators against our proposed baselines:
    • Random Estimator.
    • LLM Estimators.
  Evaluating estimators can be done as evaluating any ML model. We will automatically
compare each via k-fold cross-validation on our dataset. We will selectively hide LLMs and
Tasks from our dataset to further support our results and test whether the estimators can
generalize to unseen data points. This can also be achieved by repeating the dataset generation
procedure and sampling a test dataset for a new task or with LLMs not previously included.

4.2. Meta-Learning for AutoML
To empirically test the effectiveness of our meta-learning approach, we propose comparing our
meta-learning enhanced AutoGOAL against its original implementation, other AutoML systems,
and human experts on text classification tasks. By doing so, we intend to test whether our tool
can generalize to different, previously unseen tasks. This would also highlight the quality of
our selected features for both the dataset and the models.


References
 [1] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep bidirectional
     transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
 [2] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al., Language models are
     unsupervised multitask learners, OpenAI blog 1 (2019) 9.
 [3] OpenAI, GPT-4 Technical Report, Technical Report, 2023. ArXiv:2303.08774.
 [4] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu,
     Exploring the limits of transfer learning with a unified text-to-text transformer, Journal of
     Machine Learning Research 21 (2020) 1–67. URL: http://jmlr.org/papers/v21/20-074.html.
 [5] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. d. l. Casas, F. Bressand,
     G. Lengyel, G. Lample, L. Saulnier, et al., Mistral 7b, arXiv preprint arXiv:2310.06825
     (2023).
 [6] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoy-
     anov, Roberta: A robustly optimized bert pretraining approach, 2019. arXiv:1907.11692 .
 [7] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, Q. V. Le, Xlnet: Generalized
     autoregressive pretraining for language understanding, Advances in neural information
     processing systems 32 (2019).
 [8] C. Sun, X. Qiu, Y. Xu, X. Huang, How to fine-tune bert for text classification?, in: Chinese
     computational linguistics: 18th China national conference, CCL 2019, Kunming, China,
     October 18–20, 2019, proceedings 18, Springer, 2019, pp. 194–206.
 [9] N. Mallik, E. Bergman, C. Hvarfner, D. Stoll, M. Janowski, M. Lindauer, L. Nardi, F. Hutter,
     Priorband: Practical hyperparameter optimization in the age of deep learning, Advances
     in Neural Information Processing Systems 36 (2024).
[10] F. Hutter, L. Kotthoff, J. Vanschoren, Automated Machine Learning, Springer, 2019.
[11] C. Thornton, F. Hutter, H. H. Hoos, K. Leyton-Brown, Auto-weka: combined selection
     and hyperparameter optimization of classification algorithms, ACM, 2013, pp. 847–855.
     doi:10.1145/2487575.2487629 .
[12] M. Feurer, F. Hutter, Hyperparameter Optimization, Springer International Publishing,
     Cham, 2019, pp. 3–33. URL: https://doi.org/10.1007/978-3-030-05318-5_1. doi:10.1007/
     978- 3- 030- 05318- 5_1 .
[13] M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, F. Hutter, Auto-sklearn 2.0: The next
     generation, arXiv: Learning (2020).
[14] E. LeDell, S. Poirier, H2o automl: Scalable automatic machine learning, in: Proceedings of
     the AutoML Workshop at ICML, volume 2020, 2020.
[15] S. Estevez-Velarde, Y. Gutiérrez, A. Montoyo, Y. A. Cruz, Automatic discovery of hetero-
     geneous machine learning pipelines: An application to natural language processing, in:
     Proceedings of the 28th International Conference on Computational Linguistics, 2020, pp.
     3558–3568.
[16] A. Tornede, D. Deng, T. Eimer, J. Giovanelli, A. Mohan, T. Ruhkopf, S. Segel, D. Theodor-
     akopoulos, T. Tornede, H. Wachsmuth, et al., Automl in the age of large language models:
     Current challenges, future opportunities and risks, arXiv preprint arXiv:2306.08107 (2023).
[17] V. Sanh, L. Debut, J. Chaumond, T. Wolf, Distilbert, a distilled version of bert: smaller,
     faster, cheaper and lighter, 2020. arXiv:1910.01108 .
[18] G. Team, R. Anil, S. Borgeaud, Y. Wu, J.-B. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M.
     Dai, A. Hauth, et al., Gemini: a family of highly capable multimodal models, arXiv preprint
     arXiv:2312.11805 (2023).
[19] Y. Shen, K. Song, X. Tan, D. Li, W. Lu, Y. Zhuang, Hugginggpt: Solving ai tasks with
     chatgpt and its friends in huggingface, arXiv preprint arXiv:2303.17580 (2023).
[20] D. Luo, Y. Shen, Autom3l: Automated multimodal machine learning with large language
     model (2023).
[21] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf,
     M. Funtowicz, et al., Huggingface’s transformers: State-of-the-art natural language
     processing, arXiv preprint arXiv:1910.03771 (2019).
[22] E. Sayed, M. Maher, O. Sedeek, A. Eldamaty, A. Kamel, R. El Shawi, Gizaml: A collaborative
     meta-learning based framework using llm for automated time-series forecasting., in: EDBT,
     2024, pp. 830–833.
[23] C. Morris, M. Jurado, J. Zutty, Llm guided evolution-the automation of models advancing
     models, arXiv preprint arXiv:2403.11446 (2024).
[24] S. Zhang, C. Gong, L. Wu, X. Liu, M. Zhou, Automl-gpt: Automatic machine learning with
     gpt, arXiv preprint arXiv:2305.02499 (2023).
[25] N. Bannour, S. Ghannay, A. Névéol, A.-L. Ligozat, Evaluating the carbon footprint of nlp
     methods: a survey and analysis of existing tools, in: Proceedings of the second workshop
     on simple and efficient natural language processing, 2021, pp. 11–21.
[26] B. Zhang, Z. Liu, C. Cherry, O. Firat, When scaling meets llm finetuning: The ef-
     fect of data, model and finetuning method, 2024. URL: https://arxiv.org/abs/2402.17193.
     arXiv:2402.17193 .
[27] X. Zhang, J. Zhao, Y. LeCun, Character-level convolutional networks for text classification,
     Advances in neural information processing systems 28 (2015).
[28] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave,
     M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at
     scale, 2020. arXiv:1911.02116 .
[29] P. He, X. Liu, J. Gao, W. Chen, Deberta: Decoding-enhanced bert with disentangled
     attention, in: International Conference on Learning Representations, 2021. URL: https:
     //openreview.net/forum?id=XPZIaotutsD.
[30] P. He, J. Gao, W. Chen, Debertav3: Improving deberta using electra-style pre-training with
     gradient-disentangled embedding sharing, 2023. arXiv:2111.09543 .
[31] Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, R. Soricut, ALBERT: A lite BERT for
     self-supervised learning of language representations, CoRR abs/1909.11942 (2019). URL:
     http://arxiv.org/abs/1909.11942. arXiv:1909.11942 .
[32] K. Clark, M.-T. Luong, Q. V. Le, C. D. Manning, Electra: Pre-training text encoders as
     discriminators rather than generators, arXiv preprint arXiv:2003.10555 (2020).
[33] H. W. Chung, L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, Y. Li, X. Wang, M. Dehghani,
     S. Brahma, et al., Scaling instruction-finetuned language models, Journal of Machine
     Learning Research 25 (2024) 1–53.
[34] H. Yang, Y. Zhang, J. Xu, H. Lu, P. A. Heng, W. Lam, Unveiling the generalization power
     of fine-tuned large language models, arXiv preprint arXiv:2403.09162 (2024).
[35] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, Lora: Low-rank
     adaptation of large language models, arXiv preprint arXiv:2106.09685 (2021).
[36] K. Prasad Varadarajan Srinivasan, P. Gumpena, M. Yattapu, V. H. Brahmbhatt, Comparative
     analysis of different efficient fine tuning methods of large language models (llms) in low-
     resource setting, arXiv e-prints (2024) arXiv–2405.
[37] X. Wang, C. Na, E. Strubell, S. Friedler, S. Luccioni, Energy and carbon considerations of
     fine-tuning bert, arXiv preprint arXiv:2311.10267 (2023).
[38] H.-T. Kim, C. W. Ahn, A new grammatical evolution based on probabilistic context-
     free grammar, in: Proceedings of the 18th Asia Pacific Symposium on Intelligent and
     Evolutionary Systems-Volume 2, Springer, 2015, pp. 1–12.

</pre>