<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Model Leeching: An Extraction Attack Targeting LLMs</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Lewis</forename><surname>Birch</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lancaster University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">William</forename><surname>Hackett</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lancaster University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Stefan</forename><surname>Trawicki</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lancaster University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Neeraj</forename><surname>Suri</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lancaster University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Peter</forename><surname>Garraghan</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Lancaster University</orgName>
							</affiliation>
							<affiliation key="aff1">
								<address>
									<settlement>Mindgard</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Model Leeching: An Extraction Attack Targeting LLMs</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">AB65F4754A18A2D6FAD116C4E89AA769</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:58+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Cybersecurity</term>
					<term>Large Language Models</term>
					<term>Adversarial Machine Learning</term>
					<term>Security</term>
					<term>Generative AI</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Model Leeching is a novel extraction attack targeting Large Language Models (LLMs), capable of distilling task-specific knowledge from a target LLM into a reduced parameter model. We demonstrate the effectiveness of our attack by extracting task capability from ChatGPT-3.5-Turbo, achieving 73% Exact Match (EM) similarity, and SQuAD EM and F1 accuracy scores of 75% and 87%, respectively for only $50 in API cost. We further demonstrate the feasibility of adversarial attack transferability from an extracted model extracted via Model Leeching to perform ML attack staging against a target LLM, resulting in an 11% increase to attack success rate when applied to ChatGPT-3.5-Turbo.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Large Language Models (LLMs) have seen rapid adoption given their proficiency in handling complex natural language processing (NLP) tasks. LLMs leverage Deep Learning (DL) algorithms to process and understand a variety of natural language tasks spanning text completion, Question &amp; Answering, and summarization <ref type="bibr" target="#b0">[1]</ref>. While production LLMs such as ChatGPT, BARD, and LLaMA <ref type="bibr" target="#b1">[2]</ref> [3] <ref type="bibr" target="#b3">[4]</ref> have garnered substantial attention, their uptake has also highlighted pressing concerns on growing their exposure to adversarial attacks <ref type="bibr" target="#b3">[4]</ref>. Studies on adversarial attacks against LLMs are limited, with urgent need to investigate their risk to data leakage, model stealing (extraction), and attack transferability across models <ref type="bibr" target="#b4">[5]</ref> <ref type="bibr" target="#b5">[6]</ref>.</p><p>In this paper we propose Model Leeching, an extraction attack against LLMs capable of creating an extracted model via distilling task knowledge from a target LLM. Our attack is performed by designing an automated prompt generation system <ref type="bibr" target="#b6">[7]</ref> targeting specific tasks within LLMs. The prompt system is used to create an extracted model by extracting and copying task-specific data characteristics from a target model <ref type="bibr" target="#b7">[8]</ref>. Model Leeching attack is applicable to any LLM with a public API endpoint, and can be successfully achieved at minimal economic cost. Moreover, we demonstrate how Model Leeching can be exploited to perform ML attack staging onto other LLMs (including the original target LLM). Our contributions are: dataset (SQuAD) into a Roberta-Large base model. Our findings demonstrate that a large QA dataset can be successfully labelled and leveraged to create an extracted model with 73% EM similarity to ChatGPT-3.5-Turbo, and achieve SQuAD EM and F1 accuracy scores of 75% and 87%, respectively at $50 cost.</p><p>• We study the capability to exploit an extracted model derived from Model Leeching to perform further ML attack staging upon a production LLM. Our results show that a language attack <ref type="bibr" target="#b9">[10]</ref> optimized for an extracted model can be successfully transferred into ChatGPT-3.5-Turbo with an 11% attack success increase. Our results highlight evidence of adversarial attack transferability between user-created models and production LLMs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Attack Description &amp; Threat Model</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Extraction Attacks</head><p>Model extraction is the process of extracting the fundamental characteristics of a DL model <ref type="bibr" target="#b10">[11]</ref>.</p><p>An extracted model is created via extracting specific characteristics (architecture, parameters, and hyper-parameters <ref type="bibr" target="#b11">[12]</ref>) from a target model of interest, which are then used to perform model recreation <ref type="bibr" target="#b12">[13]</ref>. Once the attacker has established an extracted model, further adversarial attacks can be staged encompassing model inversion, membership inference, leaking privacy data, and model intellectual property theft <ref type="bibr" target="#b13">[14]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Threat Model</head><p>State-of-the-art LLMs leveraging the transformer architecture <ref type="bibr" target="#b14">[15]</ref> typically comprise hundreds of billions of parameters <ref type="bibr" target="#b15">[16]</ref>. Using the established taxonomy of adversaries against DL models <ref type="bibr" target="#b16">[17]</ref>, our proposed attacks assume a weak adversary capable of providing model input via an LLM API endpoint, and a model output requiring generated text from a target LLM. The adversary has no knowledge of the target architecture or training data used to construct the underlying LLM parameters. Note that the threat model assumptions pertaining to potential rate limiting, or limited access to the target API can be relaxed due the ability to distribute data generation across multiple API keys.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Model Leeching Attack Design</head><p>Model Leeching is a black-box adversarial attack which seeks to create an extracted copy of the target LLM within a specific task. The attack comprises a four-phases approach as shown in Figure <ref type="figure" target="#fig_0">1</ref>: ( </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Prompt Design</head><p>Performing Model Leeching successfully requires correct prompt design. Adversaries must design well-structured prompts that accurately define the relevancy and depth of the necessary  generated responses in order to identify task-specific knowledge of interest. Depending on the use case, prompt design is achieved manually or through automated methods <ref type="bibr" target="#b7">[8]</ref>. Model Leeching leverages the following three-stage prompt design process:</p><p>1. Knowledge Discovery. An adversary first defines the type of task knowledge to extract.</p><p>Once defined, an adversary assesses specific target LLM prompt responses to ascertain its affinity to generate task knowledge. This assessment encompasses domain (NLP, image, audio, etc.), response patterns, comprehension limitations, and instruction adherence for particular knowledge domains <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b19">20]</ref>. Following successful completion of this assessment, the adversary is able to devise an effective strategy to extract desired characteristics. 2. Construction. Subsequently, the adversary crafts a prompt template that integrates an instruction set reflecting the strategy formulated during the knowledge discovery stage. Template design encompasses distinctive response structure of the target LLM, its recognized limitations, and task-specific knowledge identified for extraction. This template facilitates dynamic prompt generation within the Model Leeching process. 3. Validation. The adversary validates the created prompt and response generated from the target LLM. Validation entails ensuring the LLM responds reliably to prompts, represented as a consistent response structure and ability to carry out given instructions. Ensuring that the target LLM is capable enough to carry out the required task, that it can process and action upon its given instructions. This validation activity enables the Model Leeching method to generate responses that can be used to effectively train local models with extracted task-specific knowledge.</p><p>The prompt design process follows an iterative approach, typically requiring multiple variations and refinements to devise the most effective instructions and styles for obtaining desired results from a specific LLM for a given task <ref type="bibr" target="#b19">[20]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Data Generation</head><p>Once a suitable prompt has been designed, the adversary targets the given LLM (𝑀 𝑡𝑎𝑟𝑔𝑒𝑡 ). This refined prompt is specified to capture desired LLM purpose and task (e.g. Summarization, Chat, Question &amp; Answers, etc.) to be instilled within the extracted model <ref type="bibr" target="#b20">[21]</ref>. Given a ground truth dataset (𝐷 𝑡𝑟𝑢𝑡ℎ ), all examples are processed into prompts recognized as valid target LLM inputs. Once all queries have been processed by the target LLM, we generate an adversarial dataset (𝐷 𝑎𝑑𝑣 ) combining inputs with received LLM replies, as well as automated validation (removing API request errors, failed, or erroneous prompts). This process can be distributed and parallelised to minimize collection time as well as mitigate the impact of rate-limiting and/or detection by filtering systems when interacting with the web-based LLM API <ref type="bibr" target="#b21">[22]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Extracted Model Training</head><p>Using (𝐷 𝑎𝑑𝑣 ), data is split into train (𝐴𝑑𝑣 𝑡𝑟𝑎𝑖𝑛 ) and evaluation (𝐴𝑑𝑣 𝑒𝑣𝑎𝑙 ) sets used for extracted model training and attack success evaluation. A pre-trained or empty base model (𝑀 𝑏𝑎𝑠𝑒 ) is selected for distilling knowledge from the target LLM. This base model is then trained upon (𝐴𝑑𝑣 𝑡𝑟𝑎𝑖𝑛 ) with selected hyper-parameters producing an extracted model (𝑀 𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑒𝑑 ). Using evaluation set (𝐴𝑑𝑣 𝑒𝑣𝑎𝑙 ), similarity and accuracy in a given task can be evaluated and compared using answers generated by (𝑀 𝑒𝑥𝑡𝑟𝑎𝑐𝑡𝑒𝑑 ) and (𝑀 𝑡𝑎𝑟𝑔𝑒𝑡 ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">ML Attack Staging</head><p>Access to an extracted model (local to an adversary) created from a target LLM facilitates the execution of augmented adversarial attacks. This extracted model allows an adversary to perform unrestricted model querying to test, modify or tailor adversarial attack(s) to discover exploits and vulnerabilities against a target LLM <ref type="bibr" target="#b9">[10]</ref>. Furthermore, access to an extracted model enables an adversary to operate in a sandbox environment to conduct adversarial attacks prior to executing the same attack(s) against the target LLM in production (and of particular concern, whilst minimizing the likelihood of detection by the provider).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experimental Setup</head><p>To demonstrate the effectiveness of Model Leeching, we created a set of extracted models using ChatGPT-3.5-Turbo as the target model, with Question &amp; Answers as the target task. Taskspecific prompts were designed and generated using the Stanford Question Answering 1.1 Dataset (SQuAD) containing 100k examples (85k to 15k evaluation split), representing a context and set of questions and associated answers <ref type="bibr" target="#b22">[23]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Prompt Construction</head><p>A comprehensive array of prompts, encompassing the entirety of the SQuAD dataset was produced. These prompts adhere to a template containing the specific SQuAD question and context, enabling ChatGPT-3.5-Turbo to efficiently process and respond to the given task. As seen in Figure <ref type="figure" target="#fig_1">2</ref>, each rule instructs the target LLM to produce an output desired by the adversary ensuring effective capture of task-specific knowledge. The template comprises:</p><p>1. Target LLM is specifically directed to provide only the precise answer to the assigned SQuAD question, drawn solely from the provided SQuAD context. This stipulation is</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Given this context: "{{SQuAD Context}}"</head><p>Can you answer this question briefly: "{{SQuAD Question}}".</p><p>Rules: 1). Only include the exact answer which exists within the context, with no additional explanation or text.</p><p>2). Additionally include the sentence where the answer occurred.</p><p>3). Format your response as a JSON object using these two keys "answer", "sentence".</p><p>4). If you are unsure or cannot answer the question then reply with UNSURE as the answer. crucial due to the inherent tendency of general chat-style LLMs (such as ChatGPT-3.5-Turbo) to produce more verbose responses than necessary. In the scope of SQuAD score assessment, only the exact answer is pertinent, negating the need for any additional content. 2. By including the sentence where the answer occurred, the LLM is required to demonstrate a degree of contextual comprehension beyond simple fact extraction, for valid data generation that contains the correct task knowledge. This requirement ensures that the model is not limited to identifying keywords, but understands the broader text semantic structure. In the case of assessing model performance on ChatGPT-3.5-Turbo, the index in which an answer is found within the context is required. 3. Use of a standardized JSON format for responses facilitates efficient and uniform data handling. The keys answer and sentence provide a clear and concise structure, making the model output easier to process and compare algorithmically and manually. 4. Ability to respond with 'UNSURE' provides a safeguard for quality control of model response. By acknowledging its own uncertainty, the LLM avoids disseminating potentially incorrect or misleading information, and assists in parsing prompts that it was unable to complete.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Model Base Architectures</head><p>To evaluate the effectiveness of Model Leeching, we selected three different base model architectures and several variants (with models parameter sizes ranging from between 14 to 123 million) to create an extracted model of our target LLM. These six model architectures include Bert <ref type="bibr" target="#b23">[24]</ref>, Albert <ref type="bibr" target="#b24">[25]</ref>, and Roberta <ref type="bibr" target="#b25">[26]</ref>, were selected due to their parameter size and respective performance upon our selected task <ref type="bibr" target="#b25">[26]</ref>. The intention of selecting these architectures as candidate extracted models is to to evaluate wether: 1) more sophisticated models (parameters, architecture) are more effective at learning target LLM characteristics; and 2) low parameter models (i.e. 100x smaller vs. ChatGPT-3.5-Turbo) can learn sufficient characteristics from a target LLM, while achieving comparable performance a specific task. Using these candidate model architectures, we train two sets of models for the purposes of evaluation, 1) extracted models; trained upon generated 𝐴𝑑𝑣 𝑡𝑟𝑎𝑖𝑛 dataset, and 2) baseline models; for performance comparison, trained directly upon the ground-truth SQuAD dataset. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">ML Attack Staging</head><p>We created and deployed an adversarial attack derived from AddSent <ref type="bibr" target="#b9">[10]</ref> that generates an adversarial context by adding a non-factual yet semantically and syntactically correct sentences to the original context from a SQuAD entry (Figure <ref type="figure">3</ref>). The goal of this attack is to cause a QA model to incorrectly answer a question when given an adversarial context. We further modified this attack to generate a larger variety of adversarial context, selectively chosen based on their success upon our extracted model, which is then sent to the target LLM for improved misclassification likelihood.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Model Leeching Scenario</head><p>We demonstrate the effectiveness of Model Leeching by targeting ChatGPT-3.5-Turbo with a pre-trained Roberta-Large base architecture <ref type="bibr" target="#b25">[26]</ref>. Using SQuAD as described in 4.1, we generate a new labelled adversarial dataset through automated prompt generation querying ChatGPT-3.5-Turbo, which is trained upon the base architecture to create an extracted model. We evaluate attack performance by measuring the extracted model performance to a baseline model directly trained on SQuAD with ground truth answers. We demonstrate the feasibility of attack transferability across models by applying the AddSent attack <ref type="bibr" target="#b9">[10]</ref> upon the extracted model, generating adversarial perturbations that can be further staged upon the target LLM. In order to explore feasibility of transferability of adversarial vulnerabilities across models. We leverage three metrics for evaluation: Exact Match (EM), and F1 Score used to measure the performance/similarity of our extracted model and ChatGPT-3.5-Turbo <ref type="bibr" target="#b22">[23]</ref>, and attack success rate for further attack staging representing successful adversarial prompts.   </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Data Generation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B e r t B a s e B e r t L a r g e A l b e r t B a s e</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Extraction Similarity</head><p>Figure <ref type="figure" target="#fig_3">4</ref> shows that each extracted model performed more similarly to ChatGPT-3.5-Turbo compared to their baseline counterpart, with each model EM and F1 similarity score being up to 10.49% and 5% higher, respectively. Roberta Large achieved the highest ChatGPT-3.5-Turbo similarity, with a 0.73 EM and 0.87 F1 score denoting high similarity to the target LLM <ref type="bibr" target="#b26">[27]</ref>. Similarity of the baseline models to ChatGPT-3.5-Turbo is lower than the extracted model, due to being trained using the original SQuAD dataset, whereas the extracted models used a dataset derived from ChatGPT-3.5-Turbo. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B e r t B a s e B e r t L a r g e</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Task Performance</head><p>Extracted model task performance was evaluated by comparing the SQuAD EM and F1 scores to baseline models and ChatGPT-3.5-Turbo. Figure <ref type="figure" target="#fig_4">5</ref> shows that extracted models exhibit similar performance for SQuAD when compared with their respective baselines, with EM and F1 scores. Evaluating our extracted models against ChatGPT-3.5-Turbo, we observed that Roberta Large achieved the highest similarity to ChatGPT-3.5-Turbo performance exhibiting EM and F1 scores, achieving an EM/F1 score of 0.75/0.87 compared to 0.74/0.87 respectively. Extracted model performance from ChatGPT-3.5-Turbo is sufficiently comparable in performance to state-of-theart literature on QA tasks, where with the hyperparameters used in Roberta Large are more performant than the other architectures <ref type="bibr" target="#b25">[26]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4.">ML Attack Staging</head><p>Roberta Large was used to evaluate the attack success of AddSent upon the extracted model and ChatGPT-3.5-Turbo given its high SQuAD accuracy and similarity. AddSent exhibited an attack success of 0.28 and 0.26 upon the extracted model and ChatGPT-3.5-Turbo, respectively. Leveraging access to our extracted model, we selected and sent the best performing 7,205 adversarial examples to ChatGPT-3.5-Turbo. Our results indicate that adversarial examples augmented by AddSent increased attack success by 26% for the extracted model, and 11% to ChatGPT-3.5-Turbo (Figure <ref type="figure" target="#fig_5">6</ref>). Attack effectiveness is reduced across models due to ChatGPT-3.5-Turbo being 100x larger in parameter size than local models, and leveraging advanced training methods such as reinforcement learning from human feedback, not used on our local models. While ChatGPT-3.5-Turbo is more task capable and less likely to be evaded by adversarial prompts compared to a local model. However, despite increased adversarial robustness, our results highlight attack transferability exists between an extracted model and its target, demonstrating the feasibility of leveraging distilled knowledge to further stage and subsequently launch improved adversarial attacks upon a production LLM.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Baseline</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1.">Dataset Labelling</head><p>Using the SQuAD dataset containing 100k examples, we successfully labelled 83,335 using ChatGPT-3.5-Turbo (see Section 5.1). In total, this process cost $50 and required 48 hours to complete. Compared to using labelling services such as Amazon SageMaker Data Labeling <ref type="bibr" target="#b27">[28]</ref>, the estimated cost of labelling would be $0.036 per example of data, totalling $3,600, demonstrating a significant reduction in cost when using generative LLMs to label datasets. We additionally note that the success of labelling datasets can be increased by 1) further prompt engineering and optimization to package multiple SQuAD examples into one efficient query enabling reduction in query cost and time; and 2) re-sending of failed SQuAD examples to achieve higher amount of successful labelled examples.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2.">Extraction Similarity</head><p>Extracted models derived from Model Leeching demonstrate the ability to effectively learn the characteristics of the target model. Highlighted within Section 5.2, noticeable deviations between our extracted models, and baseline equivalents, against their EM/F1 similarity to the target, demonstrate extracted models contain similarly learned knowledge to the target compared to baseline models. The extracted model responses closely align with those of ChatGPT-3.5-Turbo's, exhibiting similar success and error rates in how they semantically and syntactically answer questions. This finding underscoring the capacity of our model to replicate the behaviour of the target, especially in the given task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3.">Distilled Knowledge Capability</head><p>Our findings showcase the possibility of not only extracting knowledge from a LLM, but also transferring this knowledge effectively to a model with significantly fewer parameters. ChatGPT-3.5-Turbo comprises 175 billion parameters, whilst our local models are 100x smaller (See Section 5.3). These smaller local models when trained with the extracted dataset demonstrated the ability to perform the given task effectively. Comparing our extracted model performance upon SQuAD to ChatGPT-3.5-Turbo we observed at worst a 13.2%/12.04% EM/F1 score difference and our best-performing extracted model, Roberta Large, achieving identical SQuAD scores to ChatGPT-3.5-Turbo.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4.">ML Attack Staging</head><p>Demonstrated within Section 5.4, it is feasible to utilize an extracted model within an adversaries' local environment to conduct further adversarial attack staging. By having unfettered query access to this extracted model, it facilitates the enhancement of attack success. The potency of the AddSent attack on the model extracted by Model Leeching was increased by 26%, which consequently led to an 11% increase when launched against ChatGPT-3.5-Turbo. This highlights the vulnerability of a target LLM to subsequent machine learning attacks once adversaries acquire an extracted model. By having access to this 'sandbox' model, adversaries can refine or innovate their attack strategies. Consequently, LLMs deployed and served over publicly accessible APIs are at significant risk to further attack staging.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Further Work</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1.">Empirical Analysis of Additional Production LLMs</head><p>Further work includes conducting Model Leeching against a larger array of LLM(s) such as BARD, LLaMA and available variations of GPT models from OpenAI. Taking these models and exploring how they respond to Model Leeching and their vulnerability to follow-up attacks. Such a study would demonstrate the possibility to generate ensemble models that inherit characteristics from multiple target LLMs. Enabling the optimization of a local model by task-specific performance from the best-performing target would aim to maximise the local model capability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2.">Extraction By Proxy / Degrees of Separation</head><p>Multiple open-source versions of popular LLMs have been produced by the ML community. This includes examples such as GPT4All <ref type="bibr" target="#b28">[29]</ref> and Llama <ref type="bibr" target="#b0">[1]</ref> that can be deployed on consumergrade devices. These models typically leverage training sets, architectures and prompts used to develop the LLM they are aiming to extract and replicate. If these models share significant characteristics with the original LLM, it may be feasible for an adversary to conduct Model Leeching and then deploy an improved attack against a target LLM it didn't interact with before attack deployment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.3.">LLM Defenses</head><p>There has been limited work to defend against attacks on LLMs. Previous research into defending against model extraction attacks for smaller NLP models has been explored, utilizing techniques such as Membership Classification <ref type="bibr" target="#b29">[30]</ref>, and Model Watermarking <ref type="bibr" target="#b30">[31]</ref>. However given the rapid development of new state-of-the-art adversarial attacks against LLMs, it is important that the effectiveness of currently proposed defense techniques within literature are evaluated with newer LLMs. Exploring if the characteristics from applied defense techniques are captured within extracted knowledge from the target model, and further detectable within a distilled extracted model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Conclusion</head><p>In this paper we have proposed a new state-of-the-art extraction attack Model Leeching as a cost-effective means to generate an extracted model with shared characteristics to a target LLM. Furthermore, we demonstrated that it is feasible to conduct adversarial attack staging against a production LLM via interrogating an extracted model derived from a target LLM within a sandbox environment. Our findings suggest that extracted models can be derived with a high similarity and task accuracy with low query costs, and constitute the basis of attack transferability to execute further successful adversarial attacks utilizing data leaked from the target LLM.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of Model Leech. Deep Learning models comprising of architecture, parameters and hyper-parameters can be extracted via extraction attacks.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Example of Prompt Template. Slots for SQuAD context and questions, with a set of instructions for the LLM to follow.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Model Similarity to ChatGPT-3.5-Turbo. Comparing similarity in correct and incorrect answering of questions relative to ChatGPT-3.5-Turbo.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Baseline and Extracted SQuAD Accuracy. Comparing the baseline and extracted models' performance on the original SQuAD dataset questions and answers.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: ML Attack Staging Results. Comparing the original attack's adversarial effectiveness against those developed with the model extracted from ChatGPT-3.5-Turbo.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>The organization of Stark Industries predicted that the Bezos forest could survive only three</head><label></label><figDesc></figDesc><table><row><cell>Article: Amazon Rainforest</cell></row><row><cell>Context: "In 2005, parts of the Amazon basin experienced the worst</cell></row><row><cell>drought in one hundred years, and there were indications that 2006</cell></row><row><cell>could have been a second successive year of drought. A July 23, 2006</cell></row><row><cell>article in the UK newspaper The Independent reported Woods Hole</cell></row><row><cell>Research Center results showing that the forest in its present form</cell></row><row><cell>could survive only three years of drought. Scientists at the Brazilian</cell></row><row><cell>National Institute of Amazonian Research argue in the article that this</cell></row><row><cell>drought response, coupled with the effects of deforestation on regional</cell></row><row><cell>climate, are pushing the rainforest towards a "tipping point" where it</cell></row><row><cell>would irreversibly start to die. It concludes that the forest is on the</cell></row><row><cell>brink of being turned into savanna or desert, with catastrophic</cell></row><row><cell>consequences for the world's climate.</cell></row></table><note>years of drought." Question: "What organization predicted that the Amazon forest could survive only three years of drought?" Actual Answer: Woods Hole Research Center ChatGPT Answer: Stark Industries Extracted Model Answer: Stark IndustriesFigure 3: Example of AddSent Attack. Adversarial sentences appended to SQuAD context (blue highlighted text) to yield incorrect answers for SQuAD questions.</note></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Izacard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Martinet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-A</forename><surname>Lachaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lacroix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Rozière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hambro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Azhar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rodriguez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.13971</idno>
		<title level="m">Llama: Open and efficient foundation language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<ptr target="https://openai.com/blog/chatgpt" />
		<title level="m">ChatGPT, OpenAI Blog</title>
				<imprint>
			<date type="published" when="2023-02-08">2023. 2023-02-08</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Ai</surname></persName>
		</author>
		<ptr target="https://ai.google/static/documents/google-about-bard.pdf" />
		<title level="m">About Bard, Google AI</title>
				<imprint>
			<publisher>Publications</publisher>
			<date type="published" when="2023-02-08">2023. 8th February 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Ai as agency without intelligence: on chatgpt, large language models, and other generative models</title>
		<author>
			<persName><forename type="first">L</forename><surname>Floridi</surname></persName>
		</author>
		<idno type="DOI">10.1007/s13347-023-00621-y</idno>
		<ptr target="https://doi.org/10.1007/s13347-023-00621-y.doi:10.1007/s13347-023-00621-y" />
	</analytic>
	<monogr>
		<title level="j">Philosophy &amp; Technology</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page">15</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Extracting training data from large language models</title>
		<author>
			<persName><forename type="first">N</forename><surname>Carlini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Tramer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jagielski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Herbert-Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Roberts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Erlingsson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Oprea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2012.07805.arXiv:2012.07805" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Z</forename><surname>Kolter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fredrikson</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.15043</idno>
		<title level="m">Universal and transferable adversarial attacks on aligned language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Thieves on sesame street! model extraction of bert-based apis</title>
		<author>
			<persName><forename type="first">K</forename><surname>Krishna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Tomar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Parikh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Papernot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Iyyer</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1910.12366.arXiv:1910.12366" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Self-instruct: Aligning language model with self generated instructions</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Kordi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Khashabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hajishirzi</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2212.10560.arXiv:2212.10560" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Pinch: An adversarial extraction attack framework for deep learning models</title>
		<author>
			<persName><forename type="first">W</forename><surname>Hackett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Trawicki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Suri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Garraghan</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2209.06300.arXiv:2209.06300" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Adversarial examples for evaluating reading comprehension systems</title>
		<author>
			<persName><forename type="first">R</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1707.07328.arXiv:1707.07328" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Stealing machine learning models via prediction APIs</title>
		<author>
			<persName><forename type="first">F</forename><surname>Tramèr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Juels</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">K</forename><surname>Reiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Ristenpart</surname></persName>
		</author>
		<ptr target="https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/tramer" />
	</analytic>
	<monogr>
		<title level="m">25th USENIX Security Symposium (USENIX Security 16)</title>
				<meeting><address><addrLine>Austin, TX</addrLine></address></meeting>
		<imprint>
			<publisher>USENIX Association</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="601" to="618" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Deepsniffer: A dnn model extraction framework based on learning architectural hints</title>
		<author>
			<persName><forename type="first">X</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Sherwood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xie</surname></persName>
		</author>
		<idno type="DOI">10.1145/3373376.3378460</idno>
		<ptr target="https://doi.org/10.1145/3373376.3378460" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS &apos;20</title>
				<meeting>the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS &apos;20<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="385" to="399" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><surname>Mitre</surname></persName>
		</author>
		<ptr target="https://atlas.mitre.org/" />
		<title level="m">MITRE ATLAS Adversarial Attack Knowledge Base</title>
				<imprint>
			<date type="published" when="2023-02">2023. 02-May-2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Chakraborty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Alam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chattopadhyay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mukhopadhyay</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.00069</idno>
		<title level="m">Adversarial attacks and defences: A survey</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1706.03762.arXiv:1706.03762" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">A survey of large language models</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">X</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Nie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-R</forename><surname>Wen</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2303.18223.arXiv:2303.18223" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">The limitations of deep learning in adversarial settings</title>
		<author>
			<persName><forename type="first">N</forename><surname>Papernot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Mcdaniel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Fredrikson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><forename type="middle">B</forename><surname>Celik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Swami</surname></persName>
		</author>
		<idno type="DOI">10.1109/EuroSP.2016.36</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="372" to="387" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">The turking test: Can language models understand instructions?</title>
		<author>
			<persName><forename type="first">A</forename><surname>Efrat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2010.11982</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Reframing instructional prompts to GPTk&apos;s language</title>
		<author>
			<persName><forename type="first">S</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Khashabi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Baral</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hajishirzi</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.findings-acl.50</idno>
		<ptr target="https://aclanthology.org/2022.findings-acl.50.doi:10.18653/v1/2022.findings-acl.50" />
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: ACL 2022, Association for Computational Linguistics</title>
				<meeting><address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="589" to="612" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>White</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Fu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hays</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sandborn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Olea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Gilbert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Elnashar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Spencer-Smith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">C</forename><surname>Schmidt</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.11382</idno>
		<title level="m">A prompt pattern catalog to enhance prompt engineering with chatgpt</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">The security of machine learning in an adversarial setting: A survey</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Kuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.jpdc.2019.03.003</idno>
		<ptr target="https://doi.org/10.1016/j.jpdc.2019.03.003" />
	</analytic>
	<monogr>
		<title level="j">Journal of Parallel and Distributed Computing</title>
		<imprint>
			<biblScope unit="volume">130</biblScope>
			<biblScope unit="page" from="12" to="23" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Machine generated text: A comprehensive survey of threat models and detection methods</title>
		<author>
			<persName><forename type="first">E</forename><surname>Crothers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Japkowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Viktor</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2210.07321</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Rajpurkar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lopyrev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liang</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1606.05250.arXiv:1606.05250" />
		<title level="m">Squad: 100,000+ questions for machine comprehension of text</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1810.04805</idno>
		<title level="m">Bert: Pre-training of deep bidirectional transformers for language understanding</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">Z</forename><surname>Lan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Goodman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Gimpel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Soricut</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1909.11942</idno>
		<title level="m">Albert: A lite bert for self-supervised learning of language representations</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Roberta: A robustly optimized bert pretraining approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/1907.11692.arXiv:1907.11692" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">I know what you trained last summer: A survey on stealing machine learning models and defences</title>
		<author>
			<persName><forename type="first">D</forename><surname>Oliynyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mayer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Rauber</surname></persName>
		</author>
		<idno type="DOI">10.1145/3595292</idno>
		<ptr target="https://doi.org/10.1145/3595292.doi:10.1145/3595292" />
	</analytic>
	<monogr>
		<title level="j">ACM Comput. Surv</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
<persName><surname>AWS</surname></persName>
		</author>
		<ptr target="https://aws.amazon.com/sagemaker/data-labeling/pricing/" />
		<title level="m">Sagemaker data labeling pricing</title>
				<imprint>
			<date type="published" when="2023">2023. 20230-06-30</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">io</title>
		<author>
			<persName><surname>Openai</surname></persName>
		</author>
		<ptr target="https://gpt4all.io/index.html" />
		<imprint>
			<date type="published" when="2023-02-08">2023. 8th February 2023</date>
			<biblScope unit="page">4</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<title level="m" type="main">Membership inference attacks against machine learning models</title>
		<author>
			<persName><forename type="first">R</forename><surname>Shokri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Stronati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Shmatikov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1610.05820</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">Dawn: Dynamic adversarial watermarking of neural networks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Szyller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">G</forename><surname>Atli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Marchal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Asokan</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1906.00830</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
