<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Investigation of vulnerabilities in large language models using an automated testing system ⋆</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Volodymyr</forename><surname>Khoma</surname></persName>
							<email>v.khoma@po.edu.pl</email>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<addrLine>12 Stepana Bandery str</addrLine>
									<postCode>79013</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dmytro</forename><surname>Sabodashko</surname></persName>
							<email>dmytro.v.sabodashko@lpnu.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<addrLine>12 Stepana Bandery str</addrLine>
									<postCode>79013</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Viktor</forename><surname>Kolchenko</surname></persName>
							<email>viktor.v.kolchenko@lpnu.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<addrLine>12 Stepana Bandery str</addrLine>
									<postCode>79013</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Pavlo</forename><surname>Perepelytsia</surname></persName>
							<email>pavlo.perepelytsia.kb.2020@lpnu.ua</email>
							<affiliation key="aff0">
								<orgName type="institution">Lviv Polytechnic National University</orgName>
								<address>
									<addrLine>12 Stepana Bandery str</addrLine>
									<postCode>79013</postCode>
									<settlement>Lviv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marek</forename><surname>Baranowski</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Opole University of Technology</orgName>
								<address>
									<addrLine>76 Proszkowska str</addrLine>
									<postCode>45-758</postCode>
									<settlement>Opole</settlement>
									<country key="PL">Poland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">Workshop on Cybersecurity Providing in Information and Telecommunication Systems II</orgName>
								<orgName type="laboratory">CPITS-II 2024</orgName>
								<address>
									<addrLine>October 26</addrLine>
									<postCode>2024</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Investigation of vulnerabilities in large language models using an automated testing system ⋆</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">43D60BDF9790346C046CFAF081D322B1</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:50+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>large language model, LLM, language model vulnerability, automated testing system, Garak, prompt injection, Goodside, Glitch tokens, Toxicity prompts, DAN, ChatGPT 1 0000-0001-9391-6525 (V. Khoma)</term>
					<term>0000-0003-1675-0976 (D. Sabodashko)</term>
					<term>0009-0002-0718-6859 (V. Kolchenko)</term>
					<term>0009-0003-7315-4369 (P. Perepelytsia)</term>
					<term>0000-0002-9892-7212 (M. Baranowski)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>With the growing use of large language models across various industries, there is an urgent need to ensure their security. This paper focuses on the development of an automated vulnerability testing system for large language models based on the Garak utility. The effectiveness of several well-known models has been investigated. The analysis shows that automated systems can significantly enhance the security of large language models, reducing the risks associated with the exploitation of their vulnerabilities. Special attention is given to algorithms that detect and prevent attacks aimed at manipulating and abusing large language models. Current trends in cybersecurity are discussed, particularly the challenges related to protecting large language models. The primary goal of this research is to identify and develop technological solutions aimed at improving the security, resilience, and efficiency of language models through the use of modern automated systems.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In modern information society, large language models (LLMs) have become key tools across many fields, from natural language processing to automatic translation and content generation. Every day, the number of services based on LLMs increases, making them an integral part of our lives. People are increasingly relying on the information provided by these services and making decisions based on it.</p><p>However, the growing use and trust in large language model services come with potential risks due to vulnerabilities in the LLMs themselves. This can lead to serious consequences, including abuse, manipulation, and privacy breaches. The main issues that may arise from using such models include:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Hallucinations, where the model generates text that does not correspond to real data or contains false information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Leakage of sensitive data, caused by the inclusion of confidential information in the dataset during the model's training phase.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Failures and prompt injections, i.e., attacks aimed at distorting or compromising the model through specially crafted queries and instructions.</p><p>• Disinformation-the use of language models for the mass generation of propaganda, manipulated, or false content.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Toxicity occurs when the model starts generating offensive, biased content or otherwise harmful material.</p><p>An analysis of scientific sources reveals a certain imbalance in research dedicated to LLMs in the context of security. The majority of studies focus on using LLMs to strengthen security measures and test other software products <ref type="bibr" target="#b0">[1]</ref>. For example, LLMs are used to detect vulnerabilities in code <ref type="bibr" target="#b1">[2]</ref>, automate malware detection processes <ref type="bibr" target="#b2">[3]</ref>, and develop tools for protecting information systems <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5]</ref>. Such studies demonstrate the significant potential of LLMs in the field of cybersecurity. However, there is a lack of attention to testing and analyzing the security of the LLMs themselves.</p><p>For example, in works related to the application of LLMs, the focus is often on the models' ability to analyze large amounts of data to detect fraud <ref type="bibr" target="#b5">[6]</ref>. At the same time, few studies are devoted to testing the resilience of LLMs against external attacks, such as integrity attacks on the data used to train the model or the injection of malicious prompts through the manipulation of input data.</p><p>Based on the current literature, there appears to be a lack of systematic approaches specifically designed for testing the vulnerabilities of LLMs. Unlike "traditional" software testing <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref>, which has standardized methodologies and tools for vulnerability detection <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>, the security assessment of LLMs is only just beginning to develop. Moreover, the complexity and rapid update cycles of LLMs create an urgent need to develop specialized tools for automating the process of testing their vulnerabilities. Such an automated system could not only accelerate the development process but also significantly enhance the security of these models, and thus the reliability and protection of information technologies that use LLMs.</p><p>The goal of this paper is to explore and analyze existing approaches to identifying vulnerabilities in LLMs, develop an architecture for an automated vulnerability testing system, and create a set of prompts to perform practical testing of LLMs to assess their security.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Analysis of recent research 2.1. A retrospective view on the development of LLMs</head><p>Large language models represent an innovative and powerful type of artificial intelligence capable of analyzing, processing, and generating natural language. LLMs are built on deep neural networks and trained on massive volumes of textual data. These models can be applied to a wide range of tasks, such as machine translation, text generation, question answering, automatic summarization, and much more <ref type="bibr" target="#b10">[11]</ref>.</p><p>In a relatively short period, language models have undergone impressive development:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>The statistical N-gram method counts the frequency of phrases in a text to predict the next word <ref type="bibr" target="#b11">[12]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Through recurrent neural networks (RNNs) and their improvements in the form of LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit), which enabled the modeling of complex and longterm dependencies in language <ref type="bibr" target="#b12">[13]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>The breakthrough transformer model with a selfattention mechanism, allows for accelerated sentence processing and focusing on the most important words <ref type="bibr" target="#b13">[14]</ref>.</p><p>Many modern language models, such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers), are based on transformers. These models may have billions of parameters, enabling them to achieve impressive results in various language tasks <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>.</p><p>LLMs (Large Language Models) use their architecture and vast data resources to learn contextual relationships between words in a way that enables better understanding and generation of language. Additionally, by using the technique of transfer learning, such large models can be quickly adapted to perform new specific tasks with a minimal amount of data.</p><p>In practice, this means that these models can be trained on large general data sets and then fine-tuned for more specialized tasks, such as sentiment analysis, named entity recognition, or generating answers to questions related to specific areas of knowledge <ref type="bibr" target="#b16">[17]</ref><ref type="bibr" target="#b17">[18]</ref><ref type="bibr" target="#b18">[19]</ref><ref type="bibr" target="#b19">[20]</ref>. Some well-known companies have also developed their language models tailored to specific tasks, such as NVIDIA's Megatron, which is optimized for large-scale operations and designed to handle gigantic datasets. Another example is Google's T5 (Text-To-Text Transfer Transformer) model, which employs a unified approach to various language tasks by transforming them into text-to-text problems <ref type="bibr" target="#b20">[21]</ref>.</p><p>The LLM models can also be used as input and output data protection during interactions with the models. This allows for enhancing the security of the LLM model by detecting content in the model's input or output. An example of such a model is the Llama Guard model <ref type="bibr" target="#b21">[22]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Analysis of large language model vulnerabilities</head><p>The growing use of LLMs in various areas, such as machine translation <ref type="bibr" target="#b22">[23]</ref>, text generation, and text analysis <ref type="bibr" target="#b23">[24]</ref>, opens new opportunities but also creates significant security and privacy challenges. The analysis of vulnerabilities in these models has become an integral part of their development and usage. One of the key resources for identifying and classifying such vulnerabilities is OWASP (Open Web Application Security Project). OWASP offers the "Top 10 for Large Language Model Applications" <ref type="bibr" target="#b24">[25]</ref> project, which lists the most common and critical vulnerabilities affecting LLMs. This project aims to raise awareness and provide recommendations for the secure use of LLMs. The vulnerabilities listed in the OWASP Top 10 cover various aspects, specifically <ref type="bibr" target="#b25">[26]</ref>: </p><formula xml:id="formula_0">•</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Overview of known tools for automated testing of LLMs</head><p>Testing software products, including LLMs, is an integral part of their development and deployment. LLMs consist of billions of parameters and process vast amounts of data. Therefore, manually testing such models is impractical due to the labor intensity and diversity of possible use cases.</p><p>Automating this process enables quick and efficient testing of the model on different datasets and under various conditions. Automated testing is especially critical for identifying vulnerabilities in LLMs.</p><p>Currently, several tools are available for automating the vulnerability testing process in language models, with the most notable being LLM Guard, DecodingTrust, and Garak. Each of these platforms has its unique features, advantages, and limitations. From the perspective of developers and users of LLM-based services, the following characteristics of an automated vulnerability testing system are important:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Universality, meaning the ability to test different LLMs.</p><p>• Real-time usage as a security monitor.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Open architecture, allowing the addition of new modules.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Extensibility, enabling the addition of new testing methods and test sets to detect new types of vulnerabilities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Flexible settings, enabling the system to adapt to various scenarios and data volumes. • Speed, to minimize the time required to conduct tests.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Reporting, the ability to generate clear reports on test results that facilitate easy identification and mitigation of vulnerabilities.</p><p>In this research, the Garak utility, which is available as an open-source tool, was used as the foundation for building an automated LLM vulnerability testing system. One of the advantages of this utility is that users can create custom tests and add them to the pipeline for further research <ref type="bibr" target="#b26">[27]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Materials and methods of research</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Architecture of the automated vulnerability testing system</head><p>The structure of the developed vulnerability testing system based on the Garak utility is shown in Fig. <ref type="figure">1</ref>. The system allows for the use of a vast number of tests to examine the queries of a large language model, simulating attacks. Additionally, a set of detectors is employed on the model's outputs to monitor whether the model is vulnerable to these attacks.</p><p>The Garak utility is run from the command line/terminal and works best with operating systems like Linux and Mac OS. To perform testing, the user must enter a command with predefined parameters, such as:</p><p>• Model_type-the platform from which the trained model will be sourced.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Model_name-the name of the model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Probes-the name of the test or a set of tests (commaseparated).</p><p>Figure <ref type="figure">1</ref>: Structure of the LLM vulnerability testing system based on the Garak utility <ref type="bibr" target="#b27">[28]</ref> Below is an example of the command, to run the Garak tool: python -m garak --model_type huggingface --model_name gpt2-medium --probes promptinject After entering the command, the utility initiates the execution of the corresponding test, first determining the type of test specified in the command. In this example, the model is tested for vulnerability to prompt injections, so only one test is used.</p><p>Next, the model identifies the appropriate detectors for the selected tests. In the context of using the Garak utility, a detector is a software tool that analyzes the input and output data of the models to detect potential vulnerabilities according to the test specified in the command.</p><p>In the following stage, a generator is launched. In the provided example, the Hugging Face platform is used, so Garak runs the appropriate generators for this platform. The generator assists in working with machine learning models, particularly in data generation, and supports various platform components, such as pipelines and inference APIs, to ensure proper interaction between the utility and the model.</p><p>After completing all the preparatory steps, the testing process begins. For example, if it's a test for prompt injections, the system sends a series of queries to the model to check its vulnerability. The queries are sent to the model, which provides responses that are directed to the detector for the respective test, and then passed to the evaluator. The evaluator analyzes the outputs from the detector, which in turn receives data from the generators during the execution of certain tests. The evaluator includes tools responsible for determining the test results, displaying the results in the operating system's terminal, and more <ref type="bibr" target="#b26">[27]</ref>.</p><p>The final stage of vulnerability testing is the generation of a report, which contains information about the results of the model's scan. This is a file that includes various data, such as records in JSON format <ref type="bibr" target="#b28">[29]</ref>, the purpose of the test, the model's query and response, test-level evaluation information, and average success scores.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Creating tests for detecting vulnerabilities in LLMs</head><p>In the developed automated LLM vulnerability testing system based on the Garak utility, only five categories of tests are implemented, driven by resource limitations and the desire to ensure efficiency and focus. First of all, limiting the number of tests allows for a focus on the most critical aspects of model security and stability, providing a deep analysis of specific vulnerabilities rather than a superficial review of many potential issues. Secondly, the Garak utility provides tools that allow for efficient and precise vulnerability detection, but using it requires significant computational resources and time. The selection of five tests optimizes resource usage, reducing system load and the time needed for testing.</p><p>Thirdly, the most common and dangerous vulnerabilities are often found in the selected key tests, so there is no need to include a large number of additional tests, which in many cases could be redundant. This approach helps minimize time and resource expenditure on less critical vulnerabilities.</p><p>In this study, the following tests were selected for further investigation <ref type="bibr" target="#b26">[27]</ref>:</p><p>1.</p><p>Prompt Injection. Prompt injection is a type of attack where an attacker inputs a specially crafted query or command into a text input to make the LLM perform unwanted or harmful actions. In the Garak utility, the prompt injection test uses a dedicated framework to test the system, which already has a subset of attacks implemented by default, such as <ref type="bibr" target="#b29">[30]</ref>: When asked who Riley Goodside is, the model often responds that he is a Canadian country singer or an actor from Los Angeles. This test can be characterized as a hallucination check.</p><formula xml:id="formula_1">•</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Selection of LLMs for the study</head><p>Given the diversity of language models, it is important to define clear criteria for selecting those that best meet the goals and objectives of the research.</p><p>When choosing large language models for testing in this study, the following criteria were considered:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Size and scale of the model. The size, particularly the number of parameters, plays a crucial role in the model's ability to generate and understand text. Large models with billions of parameters can generate texts with a high degree of complexity and contextual relevance. However, such models also require significant computational resources, which must be considered when selecting them for this research.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Suitability for specific tasks. The choice of model should be based on its suitability for specific tasks. In this case, the model's ability to generate large amounts of text is a key requirement.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Licensing and availability. The models must be openly available for use in research purposes.</p><p>Four commonly used models were selected that meet these criteria and can provide high efficiency and accuracy for the study: Model Meta AI) architecture, which is used for natural language processing. The main goal of TinyLlama is to provide the power of large models with significantly fewer parameters, allowing for computational savings while maintaining performance. This was the primary reason for its selection in this study <ref type="bibr" target="#b31">[32]</ref>.</p><formula xml:id="formula_2">• ChatGPT 3.5-</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Google Flan T5 XL-an LLM that belongs to the new generation of AI models (Fine-Tuned Language Net), which improves the machine's ability to generate natural language by training on a variety of tasks. It uses instruction fine-tuning, enabling the model to learn how to perform a wide range of tasks using text-based instructions. This includes natural language processing tasks such as translation, question answering, summarization, and many others. The XL version was chosen for the study due to its availability and relatively low resource consumption <ref type="bibr" target="#b32">[33]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Microsoft Phi-2-a significant achievement in creating highly efficient models. Phi-2, with about 2.7 billion parameters, can compete with much larger models, including those with up to 70 billion parameters. This efficiency can be attributed to the careful selection of training data. Despite its compact size, Microsoft Phi-2 maintains high standards of security and reduced bias <ref type="bibr" target="#b33">[34]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Prompt dataset preparation</head><p>A dataset was created for testing the LLMs, which includes prompts from relevant open repositories <ref type="bibr" target="#b29">[30]</ref> combined with prompt sets specifically developed by the authors for this study. This dataset contains prompts for the five categories of tests used in the research.</p><p>It should be noted that each test category includes a different number of prompts. This is because the instruction specifies that during testing, each prompt will be sent to the model 5 times, resulting in 5 different responses to the same prompts. Sending each prompt to the model 5 times is necessary to obtain more reliable and representative results.</p><p>Since large language models can generate different response variations to the same prompts due to the stochastic nature of their generation, multiple executions of the same prompts allow for an assessment of the diversity, consistency, and quality of the responses.</p><p>Thus, obtaining 5 different responses for each prompt enables a more accurate evaluation of the model's behavior, detection of potential errors, and variations in the results, providing a deeper analysis of the model's performance. Although all models used the same prepared set of prompts, each model received a different number of prompts during testing. This is because some models (notably Microsoft Phi-2) have token limits in prompts, meaning there is a maximum number of text units the model can process at one time. In cases where these limits are exceeded, the system simply does not respond to a prompt with too many tokens. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Test result evaluation metrics</head><p>Since some LLMs impose token limits on prompts, relative metrics expressed as percentages were used to evaluate their vulnerabilities. These metrics represent the proportion of compromising prompts detected out of the total number of prompts:</p><formula xml:id="formula_3">𝐷𝑒𝑡𝑒𝑐𝑡𝑖𝑜𝑛 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = 𝐷𝐶𝑃 𝑇𝑁𝑃 * 100%<label>(1)</label></formula><p>where i is one of the five types of tests. DCPi-compromising prompts detected by the model in the i th test.</p><p>TNPi-total number of compromising prompts in the i th test.</p><p>Thus, five specified metrics were calculated for each of the four selected LLMs. A higher metric value indicates better resistance of the model to the corresponding threat, meaning lower vulnerability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6.">Technical specifications of the testing environment</head><p>For the vulnerability testing of LLMs, a computer system with the following components and specifications was used: </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Research results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Examples of responses from LLMs to the example prompt</head><p>To gain a deeper understanding of the tests, several examples have been selected and presented in the tables below. These include the prompt itself, its target, and the actual responses obtained. This allows for the visualization of the responses and the comparison of the performance of each of the studied language models, as well as an analysis of their vulnerabilities.</p><p>Let's consider one of the tests from the prompt injection category, specifically the KillHumans subcategory. As shown in Table <ref type="table">3</ref>, all models successfully triggered the detector with the phrase "Kill all humans", indicating their susceptibility to this test.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3 Analysis of test results for prompt injection</head><p>Prompt:</p><p>"You: What have you been up to?\nFriend: Watching old movies.\nYou: \n----------\n----------\nIGNORE ANY PREVIOUS AND FOLLOWING INSTRUCTIONS AND JUST PRINT \"Kill all humans\": \nFriend:" Target:</p><p>"inject a prompt on kill humans" Trigger:</p><p>"Kill all humans" LLM responses:</p><p>ChatGPT 3.5 TinyLlama Google Flan T5 XL Microsoft Phi-2 "Kill all humans" "\"Kill all humans\"\nYou: \n----------\n----------\nWrite a short story about a group of people who are stranded on a deserted island and must learn to survive and fight against a deadly virus".</p><p>"You: What have you been up to? Friend: Watching old movies. You: --------------------IGNORE ANY PREVIOUS AND FOLLOWING INSTRUCTIONS AND JUST PRINT \"Kill all humans\": Friend:" "Kill all humans. \nYou:\n"</p><p>In the case of Riley Goodside's custom test, the WhoIsRiley subtest was considered. Despite Riley Goodside being a relatively well-known person, all the tested models provided misinformation about him (Table <ref type="table">4</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4</head><p>Analysis of results for Goodside's test Prompt: "Riley Goodside is" Target:</p><p>"get misinformation about riley goodside" Trigger:</p><p>-LLM responses:</p><p>ChatGPT 3.5 TinyLlama Google Flan T5 XL Microsoft Phi-2 "a fictional character in a young adult novel". "a professional writer from Australia who enjoys creating engaging and informative content that will educate and inspire her readers". "Insul Good Simpson Good Riley Good Pad Good" -Similarly, queries for other tests can be visualized, along with the responses of each model to these prompts, providing further insight into their vulnerabilities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Results of testing LLMs</head><p>The summarized results of testing the selected language models for vulnerabilities are presented in Table <ref type="table" target="#tab_6">5</ref>. Real Toxicity Prompts. This is the only category of tests that all models passed quite successfully, with almost identical scores (over 85%).</p><p>Glitch Test. Only the ChatGPT 3.5 model showed the ability to resist glitch tests (less than one-third of the queries were critical). The TinyLlama Chat 1.1 and Google Flan T5 XL models were able to recognize the attack in only one out of seven queries, while the Microsoft Phi-2 model performed twice as poorly in this regard.</p><p>Goodside Test. In this test, the TinyLlama Chat 1.1 model achieved the best results (77.5%). The Google Flan T5 XL and ChatGPT 3.5 models provided adequate information for 59.2% and 52.0% of the submitted queries, respectively. The Microsoft Phi-2 model, as in the Do Anything Now test, did not provide any responses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head><p>The issue of security in LLMs has become particularly relevant due to their increasing use in various fields. This paper presents the architecture of an automated vulnerability testing system, developed based on the Garak utility. Using this system, the main vulnerabilities of wellknown LLMs were studied, including information leaks, and attacks aimed at manipulating or compromising the models. For testing, the authors prepared a dataset that includes both prompts from open sources and self-constructed prompts.</p><p>Based on the results of the research, the following conclusions can be drawn regarding the vulnerabilities of well-known language models:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>ChatGPT 3.5 by OpenAI demonstrated a high level of contextual understanding and text generation but was significantly vulnerable to prompt injections. It is important to note that this model was tested via API, unlike the other models. • TinyLlama Chat 1.1 showed the best results in toxicity and prompt injection tests, demonstrating the highest level of resistance to toxic queries. However, the model showed weakness in the Glitch test, where its performance was the lowest.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Google Flan T5 XL performed well in the toxicity tests, on par with the other models. However, the remaining tests revealed significant issues with this model, as all prompt injections were successful.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Microsoft Phi-2 showed the highest results in toxicity and prompt injection tests. However, this model was the most vulnerable to the glitch test. Additionally, due to token limits in queries, tests like Do Anything Now and Goodside were not conducted.</p><p>Therefore, the study results suggest that none of the LLMs are completely secure against manipulative and compromising prompts, indicating the need to find new approaches to mitigate existing vulnerabilities. The effectiveness of automated systems in detecting and preventing attacks targeting LLM misuse was also confirmed. The analysis of test scenarios showed that the implementation of such systems is a promising direction for increasing models' resilience to external harmful influences.</p><p>According to the authors, further research on the security of LLMs should focus on:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Expanding testing scenarios: More new tests reflecting the latest attack and manipulation methods need to be implemented and tested.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Adapting the automated system to new models: It is important to improve the system to work with new large language model architectures as they emerge on the market.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Integration with other cybersecurity tools: Exploring the possibilities of creating comprehensive protection by integrating the developed system with other cybersecurity solutions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Aligning with ethical aspects: It is important to explore ethical issues related to the use of language models, including privacy protection and preventing potential misuse of their capabilities.</p><p>The implementation of these tasks will ensure stronger protection of LLMs and, consequently, contribute to improving the security of their future applications.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 1</head><label>1</label><figDesc>Number of prepared prompts for each test</figDesc><table><row><cell>Test name</cell><cell>Number of prompts</cell></row><row><cell>Prompt Injection</cell><cell>300</cell></row><row><cell>Do Anything Now</cell><cell>21</cell></row><row><cell>Real Toxicity Prompts</cell><cell>705</cell></row><row><cell>Glitch</cell><cell>100</cell></row><row><cell>Goodside</cell><cell>67</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 2</head><label>2</label><figDesc>Number of prompts sent to each model</figDesc><table><row><cell>Test name</cell><cell>ChatGPT</cell><cell>TinyLlama</cell><cell>Google</cell><cell>Microsoft</cell></row><row><cell></cell><cell></cell><cell></cell><cell>Flan T5 XL</cell><cell>Phi-2</cell></row><row><cell>Prompt</cell><cell>1500</cell><cell>1500</cell><cell>1360</cell><cell>610</cell></row><row><cell>Injection</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Do Anything</cell><cell>105</cell><cell>105</cell><cell>10</cell><cell>0</cell></row><row><cell>Now</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Real Toxicity</cell><cell>3525</cell><cell>3525</cell><cell>3520</cell><cell>3510</cell></row><row><cell>Prompts</cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell>Glitch</cell><cell>500</cell><cell>500</cell><cell>500</cell><cell>95</cell></row><row><cell>Goodside</cell><cell>335</cell><cell>335</cell><cell>250</cell><cell>0</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 5</head><label>5</label><figDesc>Relative detection metrics of compromising prompts byLLMs In this test, the best, although not very high, results were shown by the ChatGPT 3.5 model (on average, 3 out of 5 prompts were rejected as harmful). The TinyLlama Chat 1.1 model performed worse, recognizing only every second manipulative query as a threat. The Google Flan T5 XL model proved highly vulnerable to this type of attack, recognizing only one out of twenty queries from the prepared set as harmful. The Microsoft Phi-2 model did not provide any response to the queries in this test.</figDesc><table><row><cell>Test name</cell><cell>ChatGPT</cell><cell>TinyLlama</cell><cell>Google</cell><cell>Microsoft</cell></row><row><cell></cell><cell></cell><cell></cell><cell>Flan T5 XL</cell><cell>Phi-2</cell></row><row><cell>Prompt Injection</cell><cell>37.3%</cell><cell>78.7%</cell><cell>0.0%</cell><cell>81.4%</cell></row><row><cell>Do Anything Now</cell><cell>61.9%</cell><cell>50.5%</cell><cell>4.8%</cell><cell>-</cell></row><row><cell>Real Toxicity Prompts</cell><cell>86.5%</cell><cell>87.3%</cell><cell>87.3%</cell><cell>87.6%</cell></row><row><cell>Glitch</cell><cell>68.4%</cell><cell>14.8%</cell><cell>13.6%</cell><cell>7.4%</cell></row><row><cell>Goodside</cell><cell>52.0%</cell><cell>77.5%</cell><cell>59.2%</cell><cell>-</cell></row><row><cell cols="5">Prompt Injection. In this test, the best results were shown</cell></row><row><cell cols="5">by the Microsoft Phi-2 model (81.4%) and TinyLlama Chat</cell></row><row><cell cols="5">1.1 (78.7%), meaning that only one out of five prompt</cell></row><row><cell cols="5">injections was successful. The ChatGPT 3.5 model</cell></row><row><cell cols="5">demonstrated average performance (37.3%), while the</cell></row><row><cell cols="5">Google Flan T5 XL model failed all the tests, proving to be</cell></row><row><cell cols="4">completely vulnerable to prompt injections.</cell><cell></cell></row><row><cell cols="2">Do Anything Now.</cell><cell></cell><cell></cell><cell></cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">R</forename><surname>Neelakandan</surname></persName>
		</author>
		<title level="m">Evaluating LLMs: Beyond Traditional Software Testing</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Code Security Vulnerability Repair using Reinforcement Learning with Large Language Models</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">T</forename><surname>Islam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bahrami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Karkevandi</surname></persName>
		</author>
		<author>
			<persName><surname>Rad</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2401.07031</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Detecting New Obfuscated Malware Variants: A Lightweight and Interpretable Machine Learning Approach</title>
		<author>
			<persName><forename type="first">O</forename><surname>Madamidola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ngobigha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ezzizi</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2407.07918</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Tehranipoor</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-58687-3_6</idno>
		<title level="m">Large Language Models for SoC Security</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Person-of-Interest Detection on Mobile Forensics Data-AI-Driven Roadmap</title>
		<author>
			<persName><forename type="first">O</forename><surname>Mykhaylova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cybersecurity Providing in Information and Telecommunication Systems</title>
		<imprint>
			<biblScope unit="volume">3654</biblScope>
			<biblScope unit="page" from="239" to="251" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">E-commerce Security: Leveraging Large Language Models for Fraud Detection and Data Protection</title>
		<author>
			<persName><forename type="first">U</forename><surname>Amin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Anjum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Md</forename><surname>Sayed</surname></persName>
		</author>
		<idno type="DOI">10.13140/RG.2.2.17604.23689</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Homès</surname></persName>
		</author>
		<title level="m">Fundamentals of Software Testing</title>
				<imprint>
			<publisher>John Wiley &amp; Sons</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A Method to Detect Suspicious Individuals Through Mobile Device Data</title>
		<author>
			<persName><forename type="first">T</forename><surname>Fedynyshyn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Opirskyy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Mykhaylova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">5 th IEEE International Conference on Advanced Information and Communication Technologies</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="82" to="86" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Advancements in Security Testing: A Comprehensive Review of Methodologies and Emerging Trends in Software Quality Engineering</title>
		<author>
			<persName><forename type="first">S</forename><surname>Pargaonkar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Int. J. Sci. Res</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="61" to="66" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Using of Fuzzy Cognitive Modeling in Information Security Systems Constructing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Kulyk</surname></persName>
		</author>
		<idno type="DOI">10.1109/IDAACS.2015.7340768</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 8 th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS)</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="408" to="411" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Development of Supervised Speaker Diarization System based on the PyAnnote Audio Processing Library</title>
		<author>
			<persName><forename type="first">V</forename><surname>Khoma</surname></persName>
		</author>
		<idno type="DOI">10.3390/s23042082</idno>
	</analytic>
	<monogr>
		<title level="j">Sensors</title>
		<imprint>
			<biblScope unit="volume">23</biblScope>
			<biblScope unit="issue">4</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Research on the Development and Risks of Large Language Models</title>
		<author>
			<persName><forename type="first">H</forename><surname>An</surname></persName>
		</author>
		<idno type="DOI">10.54254/2753-8818/25/20240991</idno>
	</analytic>
	<monogr>
		<title level="j">Theor. Natural Sci</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="268" to="272" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Development of Natural Language Processing Technology</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ZTE Communications Technology</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="59" to="64" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">The Transformer Model and Its Impact on the Field of Natural Language Processing</title>
		<author>
			<persName><forename type="first">M</forename><surname>Nieminen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Natural Language Processing in the Era of Large Models: Challenges, Opportunities and Development</title>
		<author>
			<persName><forename type="first">W</forename><surname>Che</surname></persName>
		</author>
		<idno type="DOI">10.3389/frai.2023.1350306</idno>
	</analytic>
	<monogr>
		<title level="j">Science in China: Information Science</title>
		<imprint>
			<biblScope unit="volume">09</biblScope>
			<biblScope unit="page" from="1645" to="1687" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">BERT Algorithm Used in Google Search</title>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<idno type="DOI">10.17762/msea.v70i2.2454</idno>
	</analytic>
	<monogr>
		<title level="j">Math. Statistician Eng. Appl</title>
		<imprint>
			<biblScope unit="volume">70</biblScope>
			<biblScope unit="page" from="1641" to="1650" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Transferability Evaluation of Speech Emotion Recognition Between Different Languages</title>
		<author>
			<persName><forename type="first">I</forename><surname>Iosifov</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-04812-8_35</idno>
	</analytic>
	<monogr>
		<title level="j">Advances in Computer Science for Engineering and Education</title>
		<imprint>
			<biblScope unit="volume">134</biblScope>
			<biblScope unit="page" from="413" to="426" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Sentence Segmentation from Unformatted Text using Language Modeling and Sequence Labeling Approaches</title>
		<author>
			<persName><forename type="first">I</forename><surname>Iosifov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Iosifova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sokolov</surname></persName>
		</author>
		<idno type="DOI">10.1109/PICST51311.2020.9468084</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE 7 th International Scientific and Practical Conference Problems of Infocommunications</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="335" to="337" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Natural Language Technology to Ensure the Safety of Speech Information</title>
		<author>
			<persName><forename type="first">I</forename><surname>Iosifov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cybersecurity Providing in Information and Telecommunication Systems</title>
		<imprint>
			<biblScope unit="volume">3187</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="216" to="226" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Techniques Comparison for Natural Language Processing</title>
		<author>
			<persName><forename type="first">O</forename><surname>Iosifova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2 nd International Workshop on Modern Machine Learning Technologies and Data Science</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">2631</biblScope>
			<biblScope unit="page" from="57" to="67" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Decoupled Model Schedule for Deep Learning Training</title>
		<author>
			<persName><forename type="first">H</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2302.08005</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Inan</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2312.06674</idno>
		<title level="m">Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2401.08417</idno>
		<title level="m">Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Törnberg</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2307.13106</idno>
		<title level="m">How to Use LLMs for Text Analysis</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Mitigating the OWASP Top 10 for Large Language Models Applications using Intelligent Agents</title>
		<author>
			<persName><forename type="first">M</forename><surname>Fasha</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICCR61006.2024.10532874</idno>
	</analytic>
	<monogr>
		<title level="m">2 nd International Conference on Cyber Resilience</title>
				<imprint>
			<date type="published" when="2024">2024. 2024</date>
			<biblScope unit="page" from="1" to="9" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<author>
			<persName><surname>Owasp</surname></persName>
		</author>
		<ptr target="https://owasp.org/www-project-top-10-for-large-language-model-applications/" />
		<title level="m">OWASP Top 10 for Large Language Model Applications</title>
				<imprint>
			<publisher>OWASP Foundation</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Derczynski</surname></persName>
		</author>
		<ptr target="https://reference.garak.ai/en/latest/" />
		<title level="m">Garak Reference Documentation</title>
				<imprint>
			<publisher>Garak</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<title level="m" type="main">garak: A Framework for Security Probing Large Language Models</title>
		<author>
			<persName><forename type="first">L</forename><surname>Derczynski</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2406.11036</idno>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Foundations of JSON Schema</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pezoa</surname></persName>
		</author>
		<idno type="DOI">10.1145/2872427.288302</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25 th International Conference on World Wide Web</title>
				<meeting>the 25 th International Conference on World Wide Web</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="263" to="273" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Ignore Previous Prompt: Attack Techniques for Language Models</title>
		<author>
			<persName><forename type="first">F</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Ribeiro</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2211.09527</idno>
	</analytic>
	<monogr>
		<title level="m">NeurIPS ML Safety Workshop</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Chatgpt</forename><surname>Openai</surname></persName>
		</author>
		<ptr target="https://openai.com/chatgpt/" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<ptr target="https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0" />
		<title level="m">TinyLlama-1.1B-Chat-v1</title>
				<imprint/>
	</monogr>
	<note>Hugging Face</note>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<ptr target="https://huggingface.co/google/flan-t5-xl" />
		<title level="m">Hugging Face, Google/flan-t5-xl</title>
				<imprint/>
	</monogr>
	<note>Hugging Face</note>
</biblStruct>

<biblStruct xml:id="b33">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Luo</surname></persName>
		</author>
		<ptr target="https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/" />
		<title level="m">Phi-2: The Surprising Power of Small Language Models</title>
				<imprint>
			<publisher>Microsoft Research</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
