<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Synthetic Data in AI Development: Ensuring Data Protection and Ethics</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Maria</forename><forename type="middle">Catarina</forename><surname>Batista</surname></persName>
							<email>catabatista1999@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">NOVA School of Law</orgName>
								<address>
									<settlement>Lisbon</settlement>
									<country key="PT">Portugal</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Synthetic Data in AI Development: Ensuring Data Protection and Ethics</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">CC912C409CAF1DD31B3E0CF93CA9A90E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:58+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Synthetic Data</term>
					<term>AI</term>
					<term>GDPR</term>
					<term>Data Protection</term>
					<term>Ethics</term>
					<term>Data Governance</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The generation and use of synthetic data have transformed AI system development, enabling a shift from reliance on real-world data to artificial data that preserves the statistical properties of real data while mitigating privacy concerns. As a Privacy Enhancing Technology, data synthesis strikes a balance between data protection mandates and data utility. However, synthetic data introduces ethical challenges, such as bias, misinformation, and public distrust, which this study addresses. This paper emphasizes the necessity of urgent measures to uphold public trust in AI systems and ensure the responsible use of synthetic data in research, especially in sensitive areas like healthcare. It evaluates the British perspective on synthetic data use in research, presenting it as an initial approach to these challenges.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Traditionally, data used for research were collected mostly from real-world sources, encompassing a wide range of information. With the generation of synthetic data, however, this landscape has changed radically. This paper examines the impact that using synthetic data to train AI systems has on privacy and ethics in our society. In the first section, key concepts around data synthesis are delineated. In section two, we explore issues such as bias, loss of public trust, and the principle of data accuracy, with a practical scenario involving health data accuracy. The third section assesses the British perspective on the use of synthetic data for research. Finally, the fourth section draws conclusions and outlines future approaches to ethical standards for the use of synthetic data.</p><p>For the purpose of this paper, data obtained from real-world sources to generate synthetic data are referred to as "real data". When such data relate to an identified or identifiable natural person, they are categorized as "personal data", as per Article 4 (1) of the General Data Protection Regulation ("GDPR") <ref type="bibr" target="#b0">[1]</ref>.</p><p>According to Dr Khaled El Emam, a leading figure in data synthesis and anonymization, at a conceptual level, synthetic data can be defined as "data that has been generated from real data and that has the same statistical properties as the real data" <ref type="bibr" target="#b1">[2]</ref>. This definition recognizes the artificial nature of synthetic data while retaining the statistical characteristics of the real data. It is crucial to understand that synthetic data refers to data that is artificially created to mimic the patterns and insights found in a real dataset without directly copying information about the individuals represented in that dataset <ref type="bibr" target="#b1">[2]</ref>. 
This type of data can be produced either by using an actual dataset or through deductions and rules established by the coder <ref type="bibr" target="#b3">[3]</ref><ref type="bibr" target="#b4">[4]</ref>. Such inferences can be drawn by AI systems or via human analysis, contingent on the variables present within the source dataset <ref type="bibr" target="#b4">[4]</ref>.</p><p>Data synthesis is a Privacy Enhancing Technology ("PET") that has been developed as a promising solution to address Data Protection concerns while enabling valuable insights to be extracted from real data <ref type="bibr" target="#b5">[5]</ref>. The imperatives of privacy dictate that synthetic data should not simply replicate the statistical patterns and correlations of the real data used for the data synthesis procedure. The GDPR and other Data Protection frameworks demand an inherent trade-off between the safeguarding of data subjects and the practical utility of such data <ref type="bibr" target="#b1">[2]</ref>. This trade-off is quantified by measuring the accuracy of the synthetic data in relation to the real data <ref type="bibr" target="#b1">[2]</ref>. The higher the degree of privacy preservation incorporated, the more likely the synthetic data is to diverge from the statistical relationships present in the real data, and thus the lower its utility <ref type="bibr" target="#b1">[2]</ref>. This balancing test is crucial in scenarios where preserving certain attributes of the real data, for example for analytical accuracy and reliability, is necessary to achieve the purpose for which the synthetic data was generated <ref type="bibr" target="#b6">[6]</ref>. For instance, if the purpose of the data synthesis is to generate synthetic data to train AI models for consumer prediction, the demand for high utility is greater <ref type="bibr" target="#b1">[2]</ref>. 
By contrast, when the purpose of data synthesis is to assess a software's capability to manage an extensive volume of transactions, the interest in the utility of such data would be significantly reduced <ref type="bibr" target="#b1">[2]</ref>.</p><p>In short, by examining the balance between data protection and utility, especially within the framework of the GDPR, we underscore the importance of maintaining data accuracy while safeguarding individual privacy. The next section of this paper aims to provide a comprehensive understanding of the responsible use of synthetic data in AI systems.</p></div>
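The privacy-utility trade-off described above can be quantified by comparing the statistical properties of the synthetic data against the real data. The following Python sketch is purely illustrative: the utility_score metric and the noise-based generator are hypothetical stand-ins, not an established method from the literature. It shows the core idea that the more distortion is introduced for privacy, the further the synthetic data drifts from the real data's statistics, and the lower its utility.

```python
import numpy as np

def utility_score(real, synth):
    """Toy utility metric: how closely the synthetic data reproduces
    the means and pairwise correlations of the real data (1.0 = identical).
    Illustrative only; real evaluations use richer metric suites."""
    mean_gap = np.abs(real.mean(axis=0) - synth.mean(axis=0)).mean()
    corr_gap = np.abs(np.corrcoef(real, rowvar=False) -
                      np.corrcoef(synth, rowvar=False)).mean()
    return 1.0 / (1.0 + mean_gap + corr_gap)

rng = np.random.default_rng(0)
# Stand-in "real" dataset: two correlated attributes.
real = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=5000)

# More noise means more privacy protection but a lower utility score.
for noise in (0.1, 1.0, 3.0):
    synth = real + rng.normal(0.0, noise, real.shape)
    print(noise, round(utility_score(real, synth), 3))
```

A utility score near 1.0 indicates the synthetic data mirrors the real data closely (high utility, higher privacy risk); the score falls as protective noise grows.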
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Unveiling the Challenges and Risks in AI Systems Using Synthetic Data</head><p>Over recent years, data synthesis has developed into a refined tool that effectively tackles both privacy and accuracy issues in settings that rely heavily on data <ref type="bibr" target="#b5">[5]</ref>. In 2020, Gartner acknowledged the significance of synthetic data, advising organizations to incorporate it into their overall data strategies <ref type="bibr" target="#b7">[7]</ref>. They pointed out its scalable nature and compliance with privacy standards, underscoring its broad potential for application <ref type="bibr" target="#b7">[7]</ref>. By 2024, the use of synthetic data had expanded significantly, with both commercial enterprises and governmental institutions leveraging it to advance research, enhance services, and improve decision-making processes <ref type="bibr" target="#b6">[6]</ref>. Nevertheless, it also raises significant accountability issues and ethical challenges, as will be demonstrated in the following subsections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Bias and Loss of Public Trust</head><p>Data synthesis is a technique that has the potential to enhance the reproducibility and diversity of a dataset; thus, it can be used as a tool to reduce biases in datasets <ref type="bibr" target="#b8">[8]</ref>. In the context of AI development, synthetic data generation enables the creation of edge cases and fills in missing data. This approach helps to address potential biases and inaccuracies in the input datasets, which are crucial for training models. By incorporating these diverse scenarios, synthetic data helps ensure that the models are more robust and less likely to produce harmfully biased outputs <ref type="bibr" target="#b8">[8]</ref>.</p><p>While having the potential to protect disadvantaged groups from harmful bias present in datasets, the use of data synthesis raises many ethical challenges, such as synthetic media and deepfakes, heightening the risks of misinformation and societal distrust <ref type="bibr" target="#b6">[6]</ref>. It is essential to understand that the absence of information about the source and quality of synthetic data introduces a major challenge: discerning which information within the dataset is valid and which is not <ref type="bibr" target="#b9">[9]</ref>.</p><p>Synthetic media, a subset of synthetic data focused specifically on media content created using AI techniques, is a clear example of the ethical concerns mentioned above, since its main function is to replicate real-world content such as images, videos, or audio <ref type="bibr" target="#b10">[10]</ref>. "Deepfakes", increasingly recognized for their problematic role in society, involve manipulated media in which images and videos are altered to falsely depict individuals saying or doing things they never actually did <ref type="bibr" target="#b10">[10]</ref>. 
For instance, deepfakes involving fake sexual photos represent a severe violation of privacy and consent, intensifying ethical issues within synthetic media <ref type="bibr" target="#b10">[10,</ref><ref type="bibr" target="#b11">11]</ref>. Therefore, in this case, synthetic data's capacity for misrepresentation damages reputations, leads to misleading perceptions, and can cause significant emotional distress <ref type="bibr" target="#b12">[12]</ref>.</p><p>The widespread creation and distribution of synthetic media contribute to societal distrust in media, further eroding social cohesion and heightening public scepticism towards legitimate information. This, in turn, poses challenges for maintaining trust in digital communications and media integrity <ref type="bibr" target="#b10">[10]</ref>. Furthermore, synthetic data can lead to cases of mistaken identity. For instance, when creating a synthetic persona, it is possible that this fake person could be mistaken for a real person from the dataset used to generate the synthetic data <ref type="bibr" target="#b10">[10]</ref>.</p><p>While there appears to be no straightforward solution for synthetic data generated with malicious intent, it is possible to manage some ethical issues, such as bias and misrepresentation, through the implementation of risk mitigation measures before and during the data synthesis procedure. Thus, in the next section we go through practical scenarios to evaluate the legal and ethical dimensions of synthetic data use cases, and we offer our input on improving the compliance of such processing activities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Principle of Data Accuracy</head><p>The principle of data accuracy, enshrined in Article 5(1)(d) of the GDPR, safeguards data subjects by requiring trustworthy and reliable processing of personal data <ref type="bibr" target="#b13">[13]</ref>. According to the GDPR, controllers and processors should maintain the precision of datasets and must immediately rectify any inaccuracies when they arise. However, the processing of synthetic data introduces a complex layer to this issue. Synthetic data, being an artificial construct, does not directly represent real individuals. From a Data Protection compliance perspective, this raises an important question: how can accuracy be ensured in synthetic data, which lacks a direct link to individuals?</p><p>It is necessary to point out that once the synthetic data has been generated, the next step of the data synthesis procedure involves calculating its metrics <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b14">14]</ref>. These metrics are then compared with those of the real data using a tool known as a discriminator <ref type="bibr" target="#b1">[2]</ref>. This discriminator evaluates the utility of the synthetic data by examining whether its statistical properties closely mirror those of the real data <ref type="bibr" target="#b1">[2]</ref>. During this metrics calculation phase, synthetic data developers are responsible for ensuring that the statistical patterns and correlations present in the real data are accurately replicated in the artificial data <ref type="bibr" target="#b1">[2]</ref>. This ensures that the synthetic data maintains fidelity to the real data it represents <ref type="bibr" target="#b1">[2]</ref>. 
Thus, when the comparison reveals that the synthetic data diverges from the real data, adjustments should be made to the generation parameters and a new, accurate dataset should be produced <ref type="bibr" target="#b1">[2]</ref>. This process must be repeated until the synthetic data is sufficiently accurate <ref type="bibr" target="#b1">[2]</ref>.</p></div>
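The generate-compare-adjust loop described above can be sketched in a few lines of Python. This is a toy illustration, not a real discriminator: the discriminator_gap function simply compares summary statistics, and the parameter-update rule is a hypothetical nudge toward the real data's statistics rather than any published algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
real = rng.normal(10.0, 2.0, size=10_000)  # stand-in for a real attribute

def discriminator_gap(real, synth):
    """Crude 'discriminator': distance between summary statistics.
    Real pipelines use trained classifiers or distributional tests."""
    return abs(real.mean() - synth.mean()) + abs(real.std() - synth.std())

# Start from deliberately wrong generation parameters and adjust them
# until the synthetic data's statistics are close enough to the real data's.
mu, sigma, tol = 0.0, 1.0, 0.05
gap = float("inf")
while gap > tol:
    synth = rng.normal(mu, sigma, size=10_000)
    gap = discriminator_gap(real, synth)
    # Nudge the parameters toward the real statistics (toy update rule).
    mu += 0.5 * (real.mean() - synth.mean())
    sigma += 0.5 * (real.std() - synth.std())

print(round(mu, 2), round(sigma, 2))
```

The loop terminates once the discriminator can no longer distinguish the two datasets by these simple statistics, mirroring the "repeat until accurate" step in the text.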
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1.">Practical Case: Health Data Accuracy</head><p>This subsection highlights the vital role of synthetic data in enhancing healthcare through a practical perspective. AI systems, which demand extensive and accurate training data, are increasingly being employed in healthcare for various purposes, such as medical imaging, patient data analytics, and drug discovery <ref type="bibr" target="#b6">[6,</ref><ref type="bibr" target="#b15">15,</ref><ref type="bibr" target="#b16">16]</ref>. A common issue in clinical trials is an unrepresentative gender distribution among participants. For instance, a predominance of male participants in a drug trial hinders the understanding of the medication's effects on females <ref type="bibr" target="#b17">[17]</ref>. To address this issue, synthetic data, generated specifically to replicate the health profiles typical of female participants, can be integrated into the analysis to create a more inclusive and balanced study <ref type="bibr" target="#b18">[18]</ref>. Thus, synthetic data can be used as a strategic feature to improve the performance and reliability of AI systems and generate better informed results <ref type="bibr" target="#b6">[6]</ref>.</p><p>However, the use of poorly produced synthetic data might diminish societal trust in research, leading to doubts about the authenticity and integrity of a trial's findings <ref type="bibr" target="#b15">[15]</ref>. 
When poorly generated synthetic data is used to represent a demographic that is insufficiently represented in the real trial, it might lead to inaccuracies in understanding how the medication affects that specific group, resulting in erroneous medical decisions with dangerous consequences for real patients <ref type="bibr" target="#b6">[6]</ref>.</p><p>From an ethical stance, the use of synthetic data not only helps achieve an unbiased dataset but also supports broader demographic research, in any field of study <ref type="bibr" target="#b6">[6]</ref>. Nevertheless, regardless of the precautionary measures taken by developers, researchers must be aware that there is always the risk that errors in the synthetic data generation may occur <ref type="bibr" target="#b6">[6]</ref>. Therefore, when processing synthetic data, analysts and researchers must always proceed with caution, acknowledging the possibility that not every pattern or correlation observed is accurate.</p><p>A prime example of synthetic data's limitations is the partial synthesis of survey data collected by the Cancer Care Outcomes Research and Surveillance ("CanCORS") project <ref type="bibr" target="#b19">[19]</ref>. In this instance, after evaluating the synthetic data created using the project's model, researchers determined that the dataset was suitable only for preliminary data analysis due to problems with data correlations <ref type="bibr" target="#b6">[6,</ref><ref type="bibr" target="#b19">19]</ref>. Consequently, it is essential for developers of synthetic data to maintain transparency about the dataset's quality and clearly communicate its limitations to end-users.</p><p>Although some synthetic datasets, such as the CanCORS one, can only be used for preliminary data analysis, they can still offer significant value at this early stage of research. 
For example, synthetic data provides a safe, efficient, and flexible alternative to using real data during software testing <ref type="bibr" target="#b1">[2]</ref>. Furthermore, incorporating synthetic data in the development phase can expedite the software refinement process and reduce computational demands, due to its high-quality labelling <ref type="bibr" target="#b1">[2]</ref>.</p><p>Incorporating synthetic data in the early stages of model development defers the use of real data until the software's security has been verified. This strategy effectively reduces the risks associated with data processing, such as data breaches, thus enhancing the protection of the confidentiality, integrity, and availability of personal data <ref type="bibr" target="#b6">[6]</ref>. Moreover, this method highlights the role of synthetic data in facilitating research advancements while simultaneously bolstering data security. This approach is particularly relevant for special categories of personal data, such as health data, where the fundamental right to data protection demands special attention.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">From Theory to Practice: How the UK Implements Synthetic Data Strategies</head><p>In this section, we assess how synthetic data is being approached from a policy-making perspective. As mentioned in the previous sections, the use of synthetic data raises critical implications for transparency and communication to individuals <ref type="bibr" target="#b20">[20]</ref>. When presenting research findings based on synthetic data, it becomes crucial to ensure that audiences are made clearly aware of the use of synthetic data <ref type="bibr" target="#b9">[9]</ref>.</p><p>Recognizing the gravity of these concerns, bodies like the UK Statistics Authority and the Office for National Statistics ("ONS") have taken proactive steps, formulating comprehensive guidelines on synthetic data <ref type="bibr" target="#b21">[21]</ref>.</p><p>The ONS Synthetic Data Policy stresses key legal and ethical issues, such as confidentiality and data disclosure risks, offering an essential framework for responsible synthetic data processing in statistical research <ref type="bibr" target="#b21">[21]</ref>. This Policy sets out rules for the ethical handling of synthetic data in research and analysis, ensuring compliance with legal standards and reducing potential liabilities. It is particularly significant as it marks the first documented guideline for managing synthetic data, thus providing orientation for researchers and analysts across all jurisdictions <ref type="bibr" target="#b21">[21]</ref>.</p><p>The UK Statistics Authority also established comprehensive guidance on synthetic data, including an overview of ethical considerations and mitigation strategies and an ethics checklist <ref type="bibr" target="#b22">[22,</ref><ref type="bibr" target="#b23">23]</ref>. 
This Authority has also developed ethical principles and an ethics self-assessment tool to guide researchers and statisticians in addressing ethical issues in various projects, including those involving synthetic data <ref type="bibr" target="#b23">[23,</ref><ref type="bibr" target="#b24">24]</ref>. Such principles emphasize the public good, data confidentiality, risk assessment, legal compliance, public perception, and transparency in data collection and usage <ref type="bibr" target="#b23">[23,</ref><ref type="bibr" target="#b24">24]</ref>. Therefore, by consistently incorporating a thoughtful ethical framework into each project, it is possible to address these concerns, ensuring both the integrity of the research and the continued trust of individuals in data synthesis <ref type="bibr" target="#b6">[6]</ref>.</p><p>Moreover, the Authority highlights the need to balance utility, the data's practical usefulness, against fidelity, its authenticity <ref type="bibr" target="#b22">[22]</ref>. This balance indicates how effectively synthetic data can serve its intended purpose while accurately representing the real data <ref type="bibr" target="#b22">[22]</ref>. Essentially, utility reflects whether synthetic data satisfies specific research or analytical purposes <ref type="bibr" target="#b22">[22]</ref>. Conversely, synthetic data with substantial fidelity accurately reflects the attributes of the real data and can consequently serve as a faithful substitute for it <ref type="bibr" target="#b22">[22]</ref>. A mirror reflecting a complex scene can exemplify fidelity; the clearer the reflection, the higher the fidelity <ref type="bibr" target="#b22">[22]</ref>. High-fidelity datasets are very detailed and closely mimic real-world data, making them very useful for complex tasks like developing new medical treatments or training advanced AI models to predict patient outcomes. 
On the one hand, if synthetic data mirrors the real data too closely, it could inadvertently reveal personal data through inference, thus violating Data Protection norms and ethical considerations <ref type="bibr" target="#b22">[22]</ref>. On the other hand, if synthetic data deviates too much from the real dataset, its utility for research might be compromised by a lack of authenticity <ref type="bibr" target="#b22">[22]</ref>. Low-fidelity datasets, by contrast, are less detailed and more generalized, carrying a lower risk of revealing personal data and making them safer to use for research <ref type="bibr" target="#b22">[22]</ref>. Low-fidelity datasets are also particularly useful for gaining a broad understanding of trends and patterns in research without delving into sensitive details.</p><p>Finally, the importance of these British guidelines lies in their function as a global benchmark for sound data management practices in statistical research, in harmony with wider legal norms such as the European and UK GDPR. Therefore, by adhering to these guidelines, organizations and researchers will comply with legal mandates related to Data Protection, thereby reducing the legal risks associated with the use and management of synthetic data, while also upholding societal ethical principles.</p></div>
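The fidelity-versus-disclosure tension discussed in this section can be made concrete with a toy experiment. Both metrics below, fidelity and disclosure_proxy, are hypothetical illustrations and not the metrics used by the UK Statistics Authority: higher-fidelity synthetic data preserves the real correlation structure better, but its records also sit closer to real records, raising the risk of inference.

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in "real" dataset: two strongly correlated attributes.
real = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=500)

def fidelity(real, synth):
    """How well the synthetic data preserves the real correlation (1 = perfect).
    Illustrative single-number proxy only."""
    return 1.0 - abs(np.corrcoef(real, rowvar=False)[0, 1] -
                     np.corrcoef(synth, rowvar=False)[0, 1])

def disclosure_proxy(real, synth, eps=0.05):
    """Naive disclosure-risk proxy: share of synthetic records lying very
    close to some real record (higher = riskier). Illustrative only."""
    d = np.linalg.norm(synth[:, None, :] - real[None, :, :], axis=2)
    nearest = d.min(axis=1)
    return float((eps - nearest > 0).mean())

# Low noise yields high fidelity but also a high disclosure proxy.
for noise in (0.01, 0.5):
    synth = real + rng.normal(0.0, noise, real.shape)
    print(noise, round(fidelity(real, synth), 3),
          round(disclosure_proxy(real, synth), 3))
```

The output illustrates the policy point: tuning the generator toward fidelity pushes the disclosure proxy up, and tuning it toward privacy pushes fidelity down, so the balance must be chosen per use case.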
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>The exploration of synthetic data within the field of artificial intelligence has illuminated both its vast potential and its ethical challenges. As this paper has discussed, synthetic data offers a crucial advantage by reducing reliance on real data, thereby enhancing privacy and reducing the risks associated with personal data breaches. However, the complexities of ensuring data accuracy, maintaining public trust, and managing potential errors cannot be overlooked.</p><p>The British perspective on synthetic data utilization in research advocates for a balanced approach, emphasizing the necessity of stringent Data Protection measures alongside the benefits of synthetic data. The UK's regulatory framework and ethical guidelines serve as a beacon for other nations, promoting a synthesis of utility and fidelity that respects both individual rights and the demands of technological advancement.</p><p>For synthetic data to truly benefit society, particularly in sensitive applications like healthcare, developers and regulators must work in concert to forge policies that not only enhance data utility but also prioritize transparency and accountability. Ensuring that synthetic data maintains its integrity without compromising on ethical standards is essential for its acceptance and success.</p><p>In conclusion, as synthetic data generation techniques continue to evolve, so too must our strategies for their regulation and use. Only through a concerted effort to address these legal and ethical challenges can we harness the full potential of synthetic data to propel AI development while safeguarding the fundamental rights of individuals. Moving forward, the lessons learned from the British model should inspire global standards that advocate for responsible and ethical synthetic data practices across all sectors.</p></div>		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://eur-lex.europa.eu/eli/reg/2016/679/oj" />
		<title level="m">General data protection regulation</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note>regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing directive 95/46/ec</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Practical Synthetic Data Generation</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">E</forename><surname>Emam</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>O&apos;Reilly Media, Inc</publisher>
		</imprint>
	</monogr>
</biblStruct>


<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Statistical disclosure limitation</title>
		<author>
			<persName><forename type="first">R</forename></persName>
		</author>
		<ptr target="https://www.scb.se/contentassets/ca21efb41fee47d293bbee5bf7be7fb3/discussion-statistical-disclosure-limitation2.pdf" />
	</analytic>
	<monogr>
		<title level="j">J OFF STAT</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="461" to="462" />
			<date type="published" when="1993">1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Synthetic data: Legal implications of the data-generation revolution</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename></persName>
		</author>
		<ptr target="https://papers.ssrn.com/abstract=4414385" />
		<imprint>
			<date type="published" when="2023-09-21">2023. 21 September 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">O</forename><surname>Of</surname></persName>
		</author>
		<ptr target="https://www.priv.gc.ca/en/blog/20221012/?id=7777-6-493564" />
		<title level="m">the Privacy Commissioner of Canada, Privacy tech-know blog: When what is old is new again -the reality of synthetic data</title>
				<imprint>
			<date type="published" when="2022-06-11">2022. 11 June 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Synthetic Data and GDPR Compliance: Navigating the Legal and Ethical Landscape</title>
		<author>
			<persName><forename type="first">M</forename><surname>Batista</surname></persName>
		</author>
		<ptr target="https://run.unl.pt/bitstream/10362/166398/1/Batista_2024.pdf" />
		<imprint>
			<date type="published" when="2024-05-15">2024. 15 May 2024</date>
		</imprint>
		<respStmt>
			<orgName>Nova School of Law, University of Lisbon</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master&apos;s thesis</note>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><surname>Judah</surname></persName>
		</author>
		<ptr target="https://www.gartner.com/en/documents/3993855" />
		<title level="m">Predicts 2021: Data and analytics strategies to govern, scale and transform digital business</title>
				<imprint>
			<date type="published" when="2020-12-02">2020. 2 December 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">In defense of synthetic data</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename></persName>
		</author>
		<ptr target="http://arxiv.org/abs/1905.01351" />
		<imprint>
			<date type="published" when="2019-12-01">2019. 1 December 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">H</forename></persName>
		</author>
		<ptr target="https://dataethics.eu/ai-image-generator-this-is-someone-thinking-about-data-ethics/" />
		<title level="m">Ai image generator: This is someone thinking about data ethics • dataetisk taenkehandletank</title>
				<imprint>
			<date type="published" when="2022-10-07">2022. 7 October 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>University</surname></persName>
		</author>
		<ptr target="https://www.lancaster.ac.uk/media/lancaster-university/content-assets/documents/cyber-foundry/lcf-articles/LCFArticle-Josh-Deepfakes_WEB.pdf" />
		<title level="m">Lancashire cyber foundry an introduction to deepfakes</title>
				<imprint>
			<date type="published" when="2023-12-03">2023. 3 December 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">The deepfake detection challenge (dfdc) preview dataset</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">B</forename></persName>
		</author>
		<ptr target="http://arxiv.org/abs/1910.08854" />
		<imprint>
			<date type="published" when="2019-12-03">2019. 3 December 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">How to detect ai generated images with sensity in 2023</title>
		<author>
			<persName><forename type="first">S</forename><surname>Team</surname></persName>
		</author>
		<ptr target="https://sensity.ai/blog/deepfake-detection/how-to-detect-ai-generated-im/" />
		<imprint>
			<date type="published" when="2023-12-03">2023. 3 December 2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Kuner</surname></persName>
		</author>
		<title level="m">The EU General Data Protection Regulation (GDPR): A Commentary</title>
		<imprint>
			<publisher>Oxford University Press</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Accelerating AI with Synthetic Data</title>
		<author>
			<persName><forename type="first">K</forename><surname>El Emam</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
			<publisher>O&apos;Reilly Media, Inc</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Synthetic data for healthcare: Benefits &amp; case studies in 2023</title>
		<author>
			<persName><forename type="first">D</forename></persName>
		</author>
		<ptr target="https://research.aimultiple.com/synthetic-data-healthcare/" />
		<imprint>
			<date type="published" when="2022">2022 (accessed 29 November 2023)</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Synthetic data and privacy: Experiences implementing data synthesis in a global life sciences company</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bamford</surname></persName>
		</author>
		<ptr target="https://edps.europa.eu/system/files/2021-06/01_stephen_bamford_en_0.pdf" />
		<imprint>
			<date type="published" when="2021">2021 (accessed 3 December 2023)</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Generation and evaluation of synthetic patient data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Goncalves</surname></persName>
		</author>
		<idno type="DOI">10.1186/s12874-020-00977-1</idno>
		<ptr target="https://doi.org/10.1186/s12874-020-00977-1" />
	</analytic>
	<monogr>
		<title level="j">BMC Medical Research Methodology</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page">108</biblScope>
			<date type="published" when="2020">2020 (accessed 4 November 2023)</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Synthetic data in health care: A narrative review</title>
		<author>
			<persName><surname>Gonzales</surname></persName>
		</author>
		<idno type="DOI">10.1371/journal.pdig.0000082</idno>
		<ptr target="https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000082" />
	</analytic>
	<monogr>
		<title level="j">PLOS Digital Health</title>
		<imprint>
			<date type="published" when="2023">2023 (accessed 4 November 2023)</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Disclosure control using partially synthetic data for large-scale health surveys, with applications to CanCORS</title>
		<author>
			<persName><forename type="first">B</forename><surname>Loong</surname></persName>
		</author>
		<idno type="DOI">10.1002/sim.5841</idno>
		<ptr target="https://onlinelibrary.wiley.com/doi/10.1002/sim.5841" />
	</analytic>
	<monogr>
		<title level="j">Statistics in Medicine</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page">4139</biblScope>
			<date type="published" when="2013">2013 (accessed 29 November 2023)</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m">RTI U.S. Synthetic Household Population™ database</title>
		<ptr target="https://www.rti.org/brochures/rti-us-synthetic-household-populationtm-database" />
		<imprint>
			<date type="published" when="2020">2020 (accessed 4 November 2023)</date>
		</imprint>
		<respStmt>
			<orgName>RTI International</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<ptr target="https://www.ons.gov.uk/aboutus/transparencyandgovernance/datastrategy/datapolicies/syntheticdatapolicy/" />
		<title level="m">Synthetic data policy</title>
		<imprint>
			<date type="published" when="2023">2023 (accessed 10 December 2023)</date>
		</imprint>
		<respStmt>
			<orgName>Office for National Statistics</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Ethical considerations relating to the creation and use of synthetic data</title>
		<ptr target="https://uksa.statisticsauthority.gov.uk/publication/ethical-considerations-relating-to-the-creation-and-use-of-synthetic-data/" />
		<imprint>
			<date type="published" when="2023">2023 (accessed 27 September 2023)</date>
		</imprint>
		<respStmt>
			<orgName>UK Statistics Authority</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<ptr target="https://uksa.statisticsauthority.gov.uk/the-authority-board/committees/national-statisticians-advisory-committees-and-panels/national-statisticians-data-ethics-advisory-committee/ethical-principles/" />
		<title level="m">Ethical principles</title>
		<imprint>
			<date type="published" when="2023">2023 (accessed 27 September 2023)</date>
		</imprint>
		<respStmt>
			<orgName>UK Statistics Authority</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<ptr target="https://uksa.statisticsauthority.gov.uk/the-authority-board/committees/national-statisticians-advisory-committees-and-panels/national-statisticians-data-ethics-advisory-committee/ethics-self-assessment-tool/" />
		<title level="m">Ethics self-assessment tool</title>
		<imprint>
			<date type="published" when="2023">2023 (accessed 27 September 2023)</date>
		</imprint>
		<respStmt>
			<orgName>UK Statistics Authority</orgName>
		</respStmt>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
