Automatically Designing Machine Learning Models out of Natural Language

Ernesto Luis Estevanell-Valladares 1,2

1 Department of Software and Computer Systems, University of Alicante
2 Faculty of Mathematics and Computer Science, University of Havana

Abstract
The popularity of artificial intelligence has led to an increasing need for machine learning models tailored to specific needs. AutoML aims to automate the process of creating effective machine-learning solutions, but current systems are not versatile enough to meet the demand. While more flexible and extensible heterogeneous systems have overcome many limitations of traditional AutoML systems, they lack accessibility due to their programmatic interfaces. To address this issue, we propose a research project to develop a heterogeneous AutoML system that produces optimal machine-learning pipelines through a natural language interface.

Keywords
AutoML, Natural Language Processing, Large Language Models.

1. Introduction

Machine learning has expanded rapidly, presenting researchers and practitioners with many new algorithms and data sets. However, selecting the most suitable strategy for a given problem has become increasingly complex, requiring extensive experimentation and technical expertise. AutoML has emerged as a solution to this problem by providing powerful tools to search through large spaces of machine-learning pipelines [1]. Nevertheless, the range of possible techniques for natural language processing is vast, making it hard to combine and compare different algorithms. Thus, the algorithms in an AutoML system must agree on a standard protocol so that the output of one can serve as the input of any other.

To achieve the primary goal of AutoML, the systems must have interfaces that are easy to use for those with limited computer science and machine learning knowledge.
Furthermore, the systems should have strong generalization capabilities to create tools that can be utilized in various scenarios and produce machine learning models that can be applied to a wide range of applications. However, current AutoML systems focus on a specific set of algorithms, often tailored to a library or toolkit [2, 3, 4]. This reduces their ability to explore various algorithms from different domains and find optimal solutions to complex, multifaceted problems. In contrast, Heterogeneous AutoML systems generate learning solutions by mixing techniques from different domains [5]. However, they do not provide natural interfaces for novice users in programming or AutoML, and their execution requires preparation of the environment, the definition of the problem in appropriate terms, and the provision of data in specific formats [2, 3, 4, 6, 7, 8, 9].

Doctoral Symposium on Natural Language Processing from the Proyecto ILENIA, 28 September 2023, Jaén, Spain.
Email: elev1@alu.ua.es (E. L. Estevanell-Valladares)
ORCID: 0000-0002-1168-1767 (E. L. Estevanell-Valladares)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Using natural language as a user interface can significantly enhance the accessibility and user-friendliness of AutoML. Large Language Models (LLMs) have become popular for their ability to process raw text and effectively identify patterns and connections in data [10, 11]. However, these models can sometimes generate incorrect responses or struggle with inference tasks [12]. Recent studies suggest that incorporating external knowledge significantly improves the performance of LLMs [13].

Researchers are exploring combinations of these techniques to address the limitations of both AutoML and LLMs. Shen et al.
[14] employed an LLM to process queries and plan learning tasks using pre-trained models. This approach has successfully integrated image, audio, and text prediction, classification, and processing capabilities into a single system. However, it does not optimize model selection or hyperparameters, nor does it produce tuned learning models as standalone programs or allow exporting inference capabilities to arbitrary environments.

The research project consists of producing a Heterogeneous AutoML system that integrates natural language processing as its primary interface. The ultimate goal is to design a tool that generates optimal machine learning models that are flexible and adaptable to different contexts and heterogeneous situations. This leads to our Main Research Question: "In what way can we integrate Natural Language into a Heterogeneous AutoML process?".

2. Related Work

Most AutoML systems only use a limited set of algorithms specific to a particular library or toolkit. This limitation hinders their ability to solve complex problems by exploring various algorithms from different areas. On the other hand, Heterogeneous AutoML systems combine techniques from multiple domains to create better learning solutions. However, they still lack a user-friendly interface for those new to programming or AutoML: using them involves defining the problem correctly, providing data in specific formats, and setting up the environment.

Table 1 contrasts several existing AutoML systems with the system proposed in this research regarding their capabilities of dealing with heterogeneous scenarios. This evaluation focused on their capacity to handle diverse scenarios encompassing multiple algorithms. It is worth noting that this assessment was solely based on their ability to handle heterogeneous algorithms, without consideration for their overall performance, capacity, or applicability.
AutoML systems vary in capabilities and limitations, depending on the specific learning libraries they are built on. Some, like Auto-Sklearn [3], Auto-WEKA [4], and Auto-Keras [2], are restricted to using Scikit-learn [15], Weka [16], and Keras [17], respectively. Other systems, such as RECIPE [9] and Hyperopt [7], can incorporate algorithms from different libraries but require a concrete implementation. TPOT [6] and ML-Plan [8] provide a more flexible approach, combining technologies from multiple learning libraries to create concrete implementations of learning pipelines. AutoML systems are mostly focused on supervised learning, but some, like Hyperopt, offer the potential to integrate unsupervised learning functionality.

AutoML Systems | Multiple ML problems | Automatic Discovery | Deployable Pipelines | Multiple libraries | Multiobjective | Probabilistic | Distributed | Extensible | Year
Hyperopt ≈ ✓ ✓ ✓ ✓ ✓ 2013
Auto-Weka 2.0 ✓ ✓ ✓ ✓ 2017
RECIPE ≈ ≈ 2017
TPOT ✓ ✓ ✓ 2018
ML-Plan ✓ ✓ ✓ ✓ ✓ ✓ 2018
Auto-Keras ✓ ✓ ✓ ✓ 2019
Auto-Sklearn 2.0 ✓ ✓ ✓ ✓ ✓ 2020
AutoGOAL ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ 2023

Table 1
Comparison of several existing AutoML systems' capabilities to deal with heterogeneous machine learning problems. Entries marked with ≈ indicate that the system design enables the given capability, but we have no record of its implementation.

Systems like Auto-Sklearn and Auto-Keras benefit from a unified underlying API, while modularly designed systems such as ML-Plan allow the addition and modification of algorithms.

Several systems allow balancing different objectives or metrics during optimization, which is relevant in multiple development and research scenarios. For example, Auto-Keras and ML-Plan use a weighted sum approach to combine multiple evaluation metrics into a single objective function. Users specify both the metrics and the weight assigned to each metric, which controls the relative importance of each metric in the optimization process.
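As a minimal sketch of the weighted sum idea described above, the following Python snippet collapses several metric values into a single score. It is not taken from Auto-Keras or ML-Plan: the function name `weighted_sum_objective`, the metric names, and the numeric values are all hypothetical and chosen only for illustration.

```python
from typing import Dict


def weighted_sum_objective(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Collapse several metric values into one score using user-chosen weights."""
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"no value reported for metrics: {sorted(missing)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalise by the total weight so the score stays on the scale of the metrics.
    return sum(weights[name] * metrics[name] for name in weights) / total


# Hypothetical candidate pipelines with made-up metric values (higher is better).
candidates = {
    "pipeline_a": {"accuracy": 0.90, "speed": 0.50},
    "pipeline_b": {"accuracy": 0.80, "speed": 0.90},
}
# Hypothetical weights: accuracy counts three times as much as speed.
weights = {"accuracy": 3.0, "speed": 1.0}

scores = {name: weighted_sum_objective(m, weights) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Changing the weights changes which candidate wins, which is exactly the control over relative importance that the weighted sum gives the user. The price is that a single scalar hides the trade-off between the metrics.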
Hyperopt, ML-Plan, Auto-WEKA, and Auto-Sklearn include mechanisms to distribute search processes and resources among multiple computers, reducing search time and generating learning pipelines more quickly.

To meet the goal of AutoML, system solutions should be easily usable in arbitrary environments and applicable as portable machine learning algorithms outside the AutoML context. Systems such as Hyperopt, TPOT, and Auto-Sklearn allow exporting the best pipeline found during the search process as a Python script, while Auto-WEKA can export to Java [18] code. Auto-Keras allows exporting the models in various formats, including TensorFlow [19], PyTorch [20], and Keras [17].

Using probabilistic models to describe the space of possible pipelines is another interesting feature of AutoML systems. AutoML systems based on Bayesian optimization build an internal representation of the space of possible algorithm pipelines, which can be interpreted as assigning a probability to each particular pipeline. This representation describes the space of algorithm pipelines and lets researchers gather additional information by analyzing which regions have higher or lower probabilities of generating effective pipeline components.

Previous research addressed the limitations of AutoML systems. Estevez-Velarde et al. [21] introduced the concept of Heterogeneous AutoML, a more general formulation of the AutoML problem. Additionally, they introduced AutoGOAL, a flexible and efficient system for heterogeneous AutoML implemented in Python. With AutoGOAL, users can describe a specific machine learning problem, its input and output requirements, and a set of objectives. The system then automatically finds the best pipeline of algorithms from various libraries, including Scikit-learn [15], NLTK [22], Keras [17], and Gensim [23]. It is also customizable, allowing users to add and integrate new algorithms into the existing pipelines.
AutoGOAL uses a Pareto Front approach to multiobjective optimization and an optimization process based on probabilistic grammatical evolution over context-free grammars [24].

3. Proposed Research

AutoGOAL has achieved state-of-the-art performance compared with other AutoML systems and has been able to solve machine-learning tasks outside of supervised learning. Additionally, it can build complex pipelines targeting difficult NLP tasks like Named Entity Recognition by connecting algorithms of different natures. This research project will use AutoGOAL as a baseline because of its capabilities regarding Heterogeneous AutoML.

3.1. Heterogeneous AutoML

According to Estevez-Velarde et al. [21], the space of all possible pipelines in the Heterogeneous AutoML problem can be represented as a graph $G_A$. Its nodes represent each known algorithm $a_i$, and an edge exists from $a_i$ to $a_j$ whenever the output type of $a_i$ is compatible with the input type of $a_j$. Given a machine-learning task defined as a function that transforms an input type into an output type, we can build a specific search space graph $G'_A$ that only models valid pipelines. To extract $G'_A$, we must introduce two additional nodes to $G_A$: Input and Output. These nodes are then connected to the algorithms that can consume the task's input and produce its desired output, respectively. Any path in $G'_A$ that connects the Input and Output nodes yields a pipeline that addresses the machine-learning problem at hand.

A suitable computational implementation of this process requires solving the following problems:

1. Defining each algorithm and its respective input and output, such that it is computationally feasible to determine if two algorithms can be connected and to construct the graph.
2. Designing an optimization strategy that can effectively search in the space of all pipelines, algorithms, and their hyperparameters, given restricted computational resources.
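The construction of the task-specific search graph described above can be sketched as follows. This is a minimal illustration under stated assumptions, not AutoGOAL's implementation: the toy types, the `Algorithm` record, the algorithm names, and the use of `issubclass` as the compatibility test are all hypothetical. The sketch covers only the first problem (building the graph and reading pipelines off it) and enumerates paths exhaustively instead of optimizing over them.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


# Toy data types (assumption): inheritance stands in for "can be used as".
class Text: pass
class Sentence(Text): pass
class Tokens: pass
class Vector: pass
class Label: pass


@dataclass(frozen=True)
class Algorithm:
    name: str
    input_type: type
    output_type: type


def compatible(produced: type, expected: type) -> bool:
    """A produced value fits wherever its type (or a supertype) is expected."""
    return issubclass(produced, expected)


def build_task_graph(algorithms: List[Algorithm], task_input: type,
                     task_output: type) -> Dict[str, List[str]]:
    """Adjacency lists over algorithm names plus the extra Input and Output nodes."""
    graph: Dict[str, List[str]] = {"Input": [], "Output": []}
    for a in algorithms:
        graph[a.name] = []
    for a in algorithms:
        if compatible(task_input, a.input_type):
            graph["Input"].append(a.name)       # a can consume the task input
        if compatible(a.output_type, task_output):
            graph[a.name].append("Output")      # a can produce the task output
        for b in algorithms:
            if a is not b and compatible(a.output_type, b.input_type):
                graph[a.name].append(b.name)    # output of a feeds input of b
    return graph


def pipelines(graph: Dict[str, List[str]], node: str = "Input",
              path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield every cycle-free path from Input to Output, depth-first."""
    for nxt in graph[node]:
        if nxt == "Output":
            yield path
        elif nxt not in path:
            yield from pipelines(graph, nxt, path + (nxt,))


# Hypothetical algorithm catalogue for a Sentence -> Label task.
catalogue = [
    Algorithm("Tokenizer", Text, Tokens),
    Algorithm("BagOfWords", Tokens, Vector),
    Algorithm("SentenceEmbedder", Sentence, Vector),
    Algorithm("LogisticClassifier", Vector, Label),
]
graph = build_task_graph(catalogue, task_input=Sentence, task_output=Label)
found = list(pipelines(graph))
for pipeline in found:
    print(" -> ".join(pipeline))
```

Exhaustive enumeration is only workable for a toy catalogue. In a realistic setting the number of paths and hyperparameter configurations grows quickly, which is why the second problem calls for a dedicated optimization strategy.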
To implement the compatibility check, AutoGOAL uses a set of Semantic Type objects. Each semantic data type is a Python class that belongs to a hierarchy in which class inheritance directly represents the relation of type compatibility (e.g., a Word can also be interpreted as Text, a more general type). The data types have a semantic interpretation beyond their underlying computational structure. For example, a string in computational terms can be a Document, a Sentence, or a Word. Each algorithm is implemented as a class with a run(input: Tin) -> Tout method that performs the corresponding processing, potentially wrapping an underlying implementation from a machine learning library. Each algorithm's input and output types are specified using Semantic Types and represented by the Tin and Tout annotations.

While this method for computing compatibility has advantages, it is rigid and difficult to maintain. Due to the closed nature of the type system, any new algorithm added to the AutoGOAL search space must be matched to precise type definitions. Also, because adding new Semantic Types does not automatically update the annotations of current algorithms, users need to check every existing algorithm to identify which ones should be re-annotated. Moreover, this mechanism assumes a tree-like structure of type compatibility when there might be more complex relationships (e.g., a Stem might also be considered a Word, although these two types do not inherit from each other). This leads to an interesting question: "Can we model algorithm compatibility more openly?".

Recent proposals suggest using natural language to store information describing algorithms. Shen et al. [14] use JSON files, mainly text-based, to store information about pre-trained models. They parse natural language prompts into multiple tasks that are matched with suitable algorithms using an LLM.
However, this tool does not address the AutoML problem, as it does not optimize model selection or hyperparameter configurations for algorithms. In contrast, Zhang et al. [25] aim to develop an AutoML system called AutoML-GPT, which uses LLMs to automatically train models on datasets from user inputs and descriptions. The LLMs serve as an automatic training system that establishes connections with models and processes inputs.

3.2. Research Questions

In this research, we propose the integration of text-based description cards for the algorithms in AutoGOAL. By leveraging the power of LLMs, we can match algorithms based on their descriptions. This method adds a vital factor of generalization that might solve the previous limitations of AutoGOAL. Moreover, by substituting the current Semantic Type system with natural language, we open the tool's interface to accept user text prompts. To achieve our main objective, we must address various questions within the proposal:

1. Which LLM should we use?
2. What language should we support in the interface?
3. What machine learning tasks should we target?

To answer the first question, we must conduct experiments to determine which LLM is most suitable for our project. One idea is to use two different LLMs, each fine-tuned for a specific purpose. For example, one LLM can help determine the compatibility between algorithms during the optimization process, while the other can extract problem definitions (input and output types and objective functions) from natural language for better user interaction with the system.

For the second question, we aim to develop an interface not tailored to a specific language to make it fully inclusive. However, the performance of LLMs can vary significantly across languages due to differences in available training data.
Therefore, we will compare the performance of multilingual models against language-specific models in our objective tasks before deciding which approach to follow.

Finally, regarding the target tasks, our main objective is to extend existing tools, specifically AutoGOAL, to fully cover the definition of Heterogeneous AutoML, thus achieving more flexibility and integrating more tasks seamlessly. We can achieve this by using the compatibility function to discover algorithms that were once bound to a specific machine-learning problem but can be part of the solution to another one. The diversity and number of available algorithms determine the limit of tasks we can solve in the system.

The proposed system has valuable scientific, economic, and social implications. It can enhance our understanding of artificial intelligence and apply it to robotics and process automation. Economically, it can speed up the development of applications and decrease the cost and time required for building machine learning solutions. From a social standpoint, an AutoML system based on natural language can improve the accessibility and ease of use of machine learning, especially in critical areas such as healthcare and education. Furthermore, automating the building of learning models can lessen the need for human intervention in repetitive and monotonous tasks, thus reducing the carbon footprint associated with computer system operations.

4. Proposed Experimentation

To evaluate the potential of our proposed system, we plan to develop a benchmark that incorporates challenging tasks from multiple domains such as vision, NLP, audio, and others. We will perform ablation studies to understand the contribution of different LLMs and optimization strategies to the overall performance of our system. In addition, to make the evaluation more comprehensive, we will compare our new system with its previous version and other state-of-the-art AutoML systems.
By providing more flexibility, we aim to test the capabilities of our system against human adversaries in various challenges. Furthermore, we will explore the possibility of integrating human feedback into the learning process, which can provide valuable insights and lead to further improvement.

5. Conclusions and Future work

The purpose of this publication is to present the research framework for a thesis that aims to investigate the intersection between AutoML and Large Language Models (LLMs). Our objective is to improve AutoML systems, making them more accessible, user-friendly, and versatile. In order to achieve this goal, we will begin by examining the current state of the art in this field. Subsequently, we will develop corresponding description cards for each algorithm available in AutoGOAL and also include new algorithms, such as pre-trained models from Hugging Face along with their respective cards. The next step will be integrating an LLM to model the compatibility function between algorithms, thereby enabling a natural language interface for user interaction. Ultimately, our aim is to pave the way for more inclusive and efficient machine learning applications in various domains.

References

[1] Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren. Automated Machine Learning. Springer, 2019.
[2] Haifeng Jin, Qingquan Song, and Xia Hu. “Auto-Keras: An Efficient Neural Architecture Search System”. In: ACM, 2019, pp. 1946–1956. doi: 10.1145/3292500.3330648.
[3] Matthias Feurer et al. “Auto-Sklearn 2.0: The Next Generation”. In: arXiv: Learning (2020).
[4] Chris Thornton et al. “Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms”. In: ACM, 2013, pp. 847–855. doi: 10.1145/2487575.2487629.
[5] Suilan Estevez-Velarde et al. “Automl strategy based on grammatical evolution: A case study about knowledge discovery from text”. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019, pp. 4356–4365.
[6] Randal S. Olson and Jason H. Moore. “TPOT: A Tree-Based Pipeline Optimization Tool for Automating Machine Learning”. In: vol. 2019. Springer, Cham, 2018, pp. 66–74. doi: 10.1007/978-3-030-05318-5_8.
[7] Brent Komer, James Bergstra, and Chris Eliasmith. “Hyperopt-Sklearn: Automatic Hyperparameter Configuration for Scikit-Learn”. In: (2013), pp. 32–37. doi: 10.25080/MAJORA-14BD3278-006.
[8] Felix Mohr, Marcel Dominik Wever, and Eyke Hüllermeier. “ML-Plan: Automated machine learning via hierarchical planning”. In: Machine Learning 107.8 (2018), pp. 1495–1515. doi: 10.1007/S10994-018-5735-Z.
[9] Alex Guimarães Cardoso de Sá et al. “RECIPE: A Grammar-Based Framework for Automatically Evolving Classification Pipelines”. In: Springer, Cham, 2017, pp. 246–261. doi: 10.1007/978-3-319-55696-3_16.
[10] Bonan Min et al. Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey. 2021. arXiv: 2111.01243 [cs.CL].
[11] Frank F. Xu et al. “A Systematic Evaluation of Large Language Models of Code”. In: (Feb. 2022). doi: 10.48550/ARXIV.2202.13169. arXiv: 2202.13169 [cs.PL].
[12] Abubakar Abid, Maheen Farooqi, and James Zou. Persistent Anti-Muslim Bias in Large Language Models. 2021. doi: 10.1145/3461702.3462624.
[13] Baolin Peng et al. “Check your facts and try again: Improving large language models with external knowledge and automated feedback”. In: arXiv preprint arXiv:2302.12813 (2023).
[14] Yongliang Shen et al. “HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace”. In: arXiv preprint arXiv:2303.17580 (2023).
[15] Fabian Pedregosa et al. “Scikit-learn: Machine Learning in Python”. In: Journal of Machine Learning Research 12.85 (2011), pp. 2825–2830.
[16] Geoffrey Holmes, A. Donkin, and Ian H. Witten. “WEKA: a machine learning workbench”. In: IEEE, 1994, pp. 357–361. doi: 10.1109/ANZIIS.1994.396988.
[17] François Chollet. Keras: The Python Deep Learning library. 2018.
[18] James Gosling et al. The Java language specification. Addison-Wesley Professional, 2000.
[19] Martín Abadi et al. “TensorFlow: a system for large-scale machine learning”. In: USENIX Association, 2016, pp. 265–283. doi: 10.5555/3026877.3026899.
[20] Adam Paszke et al. “PyTorch: An Imperative Style, High-Performance Deep Learning Library”. In: vol. 32. hgpu.org, 2018, pp. 8026–8037.
[21] Suilan Estevez-Velarde et al. “Automatic discovery of heterogeneous machine learning pipelines: An application to natural language processing”. In: Proceedings of the 28th International Conference on Computational Linguistics. 2020, pp. 3558–3568.
[22] Edward Loper and Steven Bird. “NLTK: the natural language toolkit”. In: arXiv preprint cs/0205028 (2002).
[23] Petr Sojka. “Gensim - statistical semantics in python”. In: Retrieved from genism.org (2011).
[24] Hyun-Tae Kim and Chang Wook Ahn. “A new grammatical evolution based on probabilistic context-free grammar”. In: Proceedings of the 18th Asia Pacific Symposium on Intelligent and Evolutionary Systems-Volume 2. Springer. 2015, pp. 1–12.
[25] Shujian Zhang et al. “AutoML-GPT: Automatic Machine Learning with GPT”. In: arXiv preprint arXiv:2305.02499 (2023).