<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Bringing Rome to Life: Evaluating Historical Image Generation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Phillip</forename><forename type="middle">B</forename><surname>Ströbel</surname></persName>
							<email>phillip.stroebel@uzh.ch</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computational Linguistics</orgName>
								<orgName type="institution">University of Zurich</orgName>
								<address>
									<addrLine>Andreasstrasse 15</addrLine>
									<postCode>8050</postCode>
									<settlement>Zurich</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of History</orgName>
								<orgName type="institution">University of Zurich</orgName>
								<address>
									<addrLine>Karl Schmid-Strasse 4</addrLine>
									<postCode>8006</postCode>
									<settlement>Zurich</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Zejie</forename><surname>Guo</surname></persName>
							<email>zejie.guo@uzh.ch</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computational Linguistics</orgName>
								<orgName type="institution">University of Zurich</orgName>
								<address>
									<addrLine>Andreasstrasse 15</addrLine>
									<postCode>8050</postCode>
									<settlement>Zurich</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ülkü</forename><surname>Karagöz</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computational Linguistics</orgName>
								<orgName type="institution">University of Zurich</orgName>
								<address>
									<addrLine>Andreasstrasse 15</addrLine>
									<postCode>8050</postCode>
									<settlement>Zurich</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Eva</forename><forename type="middle">Maria</forename><surname>Willi</surname></persName>
							<email>evamaria.willi@uzh.ch</email>
							<affiliation key="aff1">
								<orgName type="department">Department of History</orgName>
								<orgName type="institution">University of Zurich</orgName>
								<address>
									<addrLine>Karl Schmid-Strasse 4</addrLine>
									<postCode>8006</postCode>
									<settlement>Zurich</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Felix</forename><forename type="middle">K</forename><surname>Maier</surname></persName>
							<email>felix.maier@hist.uzh.ch</email>
							<affiliation key="aff1">
								<orgName type="department">Department of History</orgName>
								<orgName type="institution">University of Zurich</orgName>
								<address>
									<addrLine>Karl Schmid-Strasse 4</addrLine>
									<postCode>8006</postCode>
									<settlement>Zurich</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Bringing Rome to Life: Evaluating Historical Image Generation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">8DD773794F2D18502FDF7B009E3F212B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:49+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Digital Humanities</term>
					<term>image generation</term>
					<term>human evaluation</term>
					<term>automatic evaluation</term>
					<term>history</term>
					<term>image dataset</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This study evaluates the potential of AI image generation for visualising historical events, focusing on two ancient Roman scenarios: the Roman triumph and the Lupercalia festival. Using DALL-E 3, we generated 600 images based on 100 prompts derived from scientific texts. We then conducted a two-part evaluation: (1) a human evaluation by 21 history students, who compared image pairs and rated individual images on accuracy and prompt alignment, and (2) two automated analyses, one modelled after the human evaluation protocol and one using visual question-answering (VQA) techniques.</p><p>Our results reveal both the promise and limitations of AI in historical visualisation. While DALL-E 3 produced many convincing images, there were notable discrepancies between human and automated assessments. We found that Large Language Models tend to rate images more favourably than human evaluators.</p><p>We contribute a novel dataset for historical image generation, initial human and automated evaluation protocols, and insights into the challenges of using AI for historical visualisation, a capability that is important for historians seeking to reconstruct past events. Our findings highlight the need for refined evaluation methods and underscore the complexity of assessing historical accuracy in AI-generated imagery. This study lays the groundwork for future research on improving AI models for historical visualisation and developing more robust evaluation frameworks.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Historians, akin to criminologists, analyse primary sources and eyewitness accounts to extract meaning and understand the motives and circumstances of historical events. However, unlike criminologists, who can re-enact events, historians face the challenge of studying occurrences that cannot be replicated or reproduced in experiments. This presents a significant challenge in their work.</p><p>Criminologists have developed methods to mitigate the uncertainties involved. Re-enacting crucial moments of an action or crime using real people or AI-based simulations has become</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Our Contribution</head><p>Our study focuses on these two events due to their significance in Roman culture and the varying levels of textual and visual documentation available for each. The Roman triumph, a well-documented celebration of military victory, provides a rich base of textual descriptions. In contrast, the Lupercalia, an ancient fertility festival, offers a more challenging scenario with fewer detailed contemporary accounts.</p><p>To assess DALL-E 3's capabilities in this domain, we generated 600 images: 450 for the triumph and 150 for the Lupercalia (see Section 3). Our evaluation process is twofold:</p><p>1. Human evaluation: We conducted a comprehensive review involving 21 advanced history students to assess the images' historical accuracy. 2. Automated analysis: We employed computer vision techniques to analyse the images for prompt alignment.</p><p>This dual approach allows us to measure the generated images' subjective impact on human viewers and their objective alignment with historical data. Our research contributes to the broader discussion of AI's potential and limitations in historical visualisation and comprises the following items:</p><p>1. A novel, automatically generated dataset comprising 100 prompts and 600 images for historical image generation. 2. An initial human evaluation of a subset of these automatically generated images.</p><p>3. An initial automatic evaluation of the same subset.</p><p>4. An assessment of how well human and automatic evaluation correlate.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>The evaluation of automatically generated images has recently gained traction, mainly due to the increasingly sophisticated image generation models. Otani, Togashi, Sawai, Ishigami, Nakashima, Rahtu, Heikkilä, and Satoh <ref type="bibr" target="#b23">[24]</ref> concluded, based on an extensive analysis of 37 papers, that human evaluation protocols are often not reproducible and lack a clear description. Moreover, evaluation usually relies on automatic measures that poorly align with human scores.</p><p>The advantage of human feedback is that it can improve text-to-image models, e.g., with reinforcement learning from human feedback (as used in Natural Language Processing <ref type="bibr" target="#b27">[28]</ref>). Xu, Liu, Wu, Tong, Li, Ding, Tang, and Dong <ref type="bibr" target="#b32">[33]</ref> exploited a dataset of 8,878 prompts and 136,892 image comparisons to fine-tune a reward model that aligns more closely with human preferences. Liang, He, Li, Li, Klimovskiy, Carolan, Sun, Pont-Tuset, Young, Yang, Ke, Dvijotham, Collins, Luo, Li, Kohlhoff, Ramachandran, and Navalpakkam <ref type="bibr" target="#b16">[17]</ref> used human feedback concerning Plausibility, Aesthetics, Text-image Alignment, and an Overall impression to predict human feedback scores. Due to the successful integration of human feedback into model fine-tuning by Xu, Liu, Wu, Tong, Li, Ding, Tang, and Dong <ref type="bibr" target="#b32">[33]</ref>, we created an evaluation scenario which allows us to integrate such feedback directly in future work (see Section 4.1).</p><p>While Xu, Liu, Wu, Tong, Li, Ding, Tang, and Dong <ref type="bibr" target="#b32">[33]</ref> focused on prompt-to-image alignment, other image properties are open for evaluation. 
Lee, Yasunaga, Meng, Mai, Park, Gupta, Zhang, Narayanan, Teufel, Bellagente, Kang, Park, Leskovec, Zhu, Li, Wu, Ermon, and Liang <ref type="bibr" target="#b15">[16]</ref> worked on holistic image evaluation and identified twelve aspects, including Alignment, Quality, Aesthetics, and Originality. Evaluating each aspect calls for different measures, some of them human, some of them automated. They created a holistic image evaluation benchmark for existing datasets and reported scores for all aspects and 26 models. While such an evaluation effort is valuable and provides a helpful overview, we focus on prompt-to-image alignment evaluation in this work.</p><p>The research mentioned above has had access to large and heterogeneous datasets and results from extensive evaluation campaigns. In the context of historical image generation, such work does not yet exist. One exception is the investigation of Fareed, Bou Nassif, and Nofal <ref type="bibr" target="#b7">[8]</ref> who tested the usage of Leonardo<ref type="foot" target="#foot_0">1</ref> for teaching purposes in the field of "History of Architecture". They evaluated the usability of Leonardo with a questionnaire after a workshop, which indicated a general need for evaluating AI-generated images for use in the historical domain.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Data Collection with DALL-E 3</head><p>Next, we outline the methodology for data collection using DALL-E 3 to generate images related to triumphal processions and the Lupercalia, which included the following steps:</p><p>1. Collecting Historical Documents: We collected resources (i.e., academic papers, books, and other relevant documents) about the triumph and the Lupercalia in ancient Rome. Specifically, we included five documents related to the Lupercalia <ref type="bibr" target="#b31">[32,</ref><ref type="bibr" target="#b28">29,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b9">10]</ref> and 15 documents focused on triumphal processions <ref type="bibr" target="#b26">[27,</ref><ref type="bibr" target="#b21">22,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b13">14,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b24">25,</ref><ref type="bibr" target="#b12">13,</ref><ref type="bibr" target="#b18">19,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b0">1,</ref><ref type="bibr" target="#b29">30]</ref>. 2. Creating Prompts from Documents: For each document, we manually derived five prompts. Each prompt was designed to capture a specific scene described in the texts. E.g., a document on triumphal processions could include prompts about the attire Romans wore, the types of vehicles used, or the procession sequence. In total, we created 100 prompts. 3. Image Generation with DALL-E 3: We used each prompt to generate six images using DALL-E 3 <ref type="bibr" target="#b3">[4]</ref> via the OpenAI API. 
<ref type="foot" target="#foot_1">2</ref> The 100 prompts resulted in 150 generated images for the Lupercalia and 450 for the triumphal processions. <ref type="foot" target="#foot_2">3</ref> Note that we did not force the model to produce realistic images. This led to a great variety of image styles, some of which are indeed life-like, while others are more in the style of a Renaissance painting or a black-and-white pencil sketch. All prompts, however, are based on scientific literature. See Figure <ref type="figure" target="#fig_0">1</ref> for example images and prompts from the dataset. <ref type="foot" target="#foot_3">4</ref></p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Evaluating Automatically Generated Data</head><p>The following sections focus on the different evaluation scenarios employing human annotators and automatic evaluation measures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Human Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.1.">Human Evaluation Setup</head><p>We generated two evaluation scenarios to obtain feedback from human annotators.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Image Comparison (IC)</head><p>The first scenario asks annotators to decide which of two images better reflects the prompt. This is the cognitively easier task. Much in the manner of Xu, Liu, Wu, Tong, Li, Ding, Tang, and Dong <ref type="bibr" target="#b32">[33]</ref>, we plan to use these ratings for fine-tuning models to produce more faithful images. The participants are instructed not to judge the image style. We only compared images generated with the same prompt, which, based on the formula 𝑛(𝑛−1)/2 for the number of unique pairings, results in 15 pairs per prompt (as mentioned in the previous section, we generated six images per prompt). Multiplied by the 100 prompts in the dataset, this yields 1,500 comparisons.</p></div>
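The pair arithmetic above can be sanity-checked with a few lines (an illustrative sketch; the function name is our own):

```python
from itertools import combinations


def unique_pairs(n_images: int) -> int:
    """Number of unordered image pairs per prompt, i.e. n(n-1)/2."""
    return len(list(combinations(range(n_images), 2)))


# Six images per prompt give 15 pairs; with 100 prompts, 1,500 comparisons.
pairs_per_prompt = unique_pairs(6)
total_comparisons = pairs_per_prompt * 100
```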
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Image Rating (IR)</head><p>The second task requires the participants to rate an image on a 5-point Likert scale with the following options:</p><p>1. The image does not match the prompt at all. 2. The image barely contains aspects of the prompt. 3. The image catches some aspects of the prompt, but it is not very accurate. 4. The image catches most of the aspects of the prompt. 5. The image completely matches the prompt.</p><p>Additionally, we asked the users to describe in a text field which aspects of the image did not correspond to the prompt. In this scenario, which demands more time and effort, one complete annotation of the dataset requires 600 ratings.</p><p>We set up a Prodigy interface, <ref type="foot" target="#foot_4">5</ref> which we used to obtain the annotators' assessments. See Figure <ref type="figure" target="#fig_1">2</ref> for an impression of the annotation environment. We recruited 21 advanced history students for the annotations. We did not ask the participants to annotate a specific number of pairs. They were compensated with book vouchers worth $30. An online meeting was organised to explain the guidelines, emphasising that in the first scenario, they should judge based on the alignment of the images with the prompts rather than their visual appeal. They should consider visual features only if the two images reflect the prompts equally well. The students spent approximately one afternoon annotating the data in both scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.2.">Results of Human Evaluation</head><p>Table <ref type="table" target="#tab_0">1</ref> gives an overview of the results from the human evaluation. In the IC setting, we received 1,569 comparisons, of which 103 samples were annotated more than once. For unknown reasons, 64 data points did not contain the human assessment, so we excluded them from further analysis. On average, each participant compared 74.71 (SD 43.32) image pairs. The IR scenario received less feedback, since the participants provided written feedback in a text field in addition to their rating. We obtained 568 ratings, of which 29 were duplicates; 24 submissions without scores had to be excluded.</p><p>We must note here that, due to a wrong parameter setting of Prodigy in both scenarios, the data samples to be evaluated were presented to the participants in sequential rather than random order. This led to only marginal annotation overlap. For this reason, we cannot compute inter-annotator agreements (IAA) yet. However, since we plan to improve the models with the feedback obtained from the participants, we will have further evaluation rounds during which we can address this limitation. Still, to the best of our knowledge, this is the first "large-scale" evaluation campaign dedicated to historical image generation. We can still analyse and compare the results obtained with these limitations in mind (see Section 4.1.3).</p><p>However, since previous studies reported low IAA in human evaluation scenarios (cf. <ref type="bibr" target="#b15">[16]</ref>), we hypothesise a similar outcome on our dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.3.">Comparison of Human Results with Large Language Model (LLM) Evaluation</head><p>To mitigate the missing information on IAA and to evaluate the suitability of multimodal LLMs for scoring tasks, we employed GPT-4o <ref type="bibr" target="#b20">[21]</ref>, Gemini 1.5 Pro <ref type="bibr" target="#b25">[26]</ref> and Claude 3.5 Sonnet. <ref type="foot" target="#foot_5">6</ref> We let the LLMs solve the same tasks as the annotators, i.e., we applied them to the IC (only GPT-4o) and IR (all three) evaluation scenarios. <ref type="foot" target="#foot_6">7</ref> For IC, Table <ref type="table" target="#tab_1">2</ref> shows the agreements of the human comparisons with GPT-4o's comparisons. We see that in 57.41% of the cases, human annotators and GPT-4o agree on which of the two images better corresponds to the prompt.</p><p>Figure <ref type="figure">3</ref> summarises the results for the IR setting. The left graph shows the differences between the human and the LLM ratings. The tendency is that LLMs rate images higher than human annotators. The right graph shows the LLMs' deviations from the human scores. E.g., in 164 (30.15%) ratings, GPT-4o agrees with the human scores. In 169 (31.07%) cases, GPT-4o scores one point higher on the Likert scale than the human annotators (i.e., GPT-4o rated an image a 3 where the human annotator rated it a 2). Claude, in particular, tends to rate images higher. Overall, the deviations seem normally distributed, a fact that might be exploited for future evaluations.</p><p>Choosing two scenarios to evaluate allows us to test for differences in assessing images  between the triumph and the Lupercalia scenario. Our null hypothesis 𝐻 0 is that there is no difference in the ratings of human annotators and, e.g., GPT-4o in the two historical scenarios. 
Table <ref type="table" target="#tab_2">3</ref> shows the results of Welch's t-tests <ref type="bibr" target="#b30">[31]</ref>, which we chose because of (i) unequal variances and (ii) unequal sample sizes. For the human evaluation (unifying the assessment results but excluding invalid samples), the p-value does not allow us to reject 𝐻 0 . The GPT and Gemini ratings show another picture. The p-values show a highly significant difference between ratings of the triumph and the Lupercalia images. Claude's p-value is on the brink of showing a statistically significant difference. The on-average lower LLM ratings of the Lupercalia images could indicate DALL-E's difficulties in generating adequate imagery. Firstly, since the Lupercalia is far less thoroughly described and illustrated, it is plausible that images portraying the festival do not reach the standard of those generated for the triumphal procession. Secondly, the automatic evaluation poses problems for LLMs because they do not "know" as much about the Lupercalia as they do about the triumph.</p><p>Although we cannot provide IAA scores for the human evaluation yet, we can do so for the automatically generated ratings by the LLMs. Table <ref type="table" target="#tab_3">4</ref> shows the results when we compare the ratings of the LLMs (again split into triumph- and Lupercalia-related scores). The scores are all around 0, indicating low agreement. Unifying all human scores and comparing them against the ratings obtained via GPT-4o also shows low agreement. These results hint at the very different rating "strategies" of the LLMs. We need further evaluation to shed more light on the origins of the discrepancies.</p></div>
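Welch's t statistic and the Welch-Satterthwaite degrees of freedom can be recomputed from the summary statistics in Table 3 (a sketch under our own naming; the p-values additionally require the t-distribution CDF, e.g. from scipy.stats):

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    appropriate for samples with unequal variances and unequal sizes."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Human ratings from Table 3: triumph (mean 3.46, SD 1.12, n=404)
# vs. Lupercalia (mean 3.31, SD 1.14, n=140)
t, df = welch_t(3.46, 1.12, 404, 3.31, 1.14, 140)
```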
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Automatic Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Automatic Evaluation Setup</head><p>For a further fully automatic evaluation procedure, we employed the Question Generation and Answering (QG/A) <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b5">6]</ref> framework for automatic image evaluation. The first step in this framework involves using a pre-trained language model to generate a set of questions based on a given prompt and question-generation instructions via few-shot learning. In the second step, a pre-trained multimodal model generates answers given the image and the generated set of questions.</p><p>Question Generation (QG) In our study, we utilised GPT-3.5 <ref type="bibr" target="#b4">[5]</ref> for QG employing the Davidsonian Scene Graph (DSG) <ref type="bibr" target="#b5">[6]</ref> method. DSG serves as an evaluation framework grounded in formal semantics. This method's main advantage is its ability to generate atomic and unique questions structured in dependency graphs, which (i) ensure comprehensive semantic coverage and (ii) avoid inconsistencies in responses. Cho, Hu, Garg, Anderson, Krishna, Baldridge, Bansal, Pont-Tuset, and Wang <ref type="bibr" target="#b5">[6]</ref> empirically demonstrated that DSG addresses the challenges of hallucinations, duplications, and omissions in QG.</p><p>Visual Question Answering (VQA) We employed GPT-4o for the VQA task. The following prompt instruction guides the model: "You are a helpful assistant. Please answer the question only with 'Yes' or 'No'. Do not give other outputs. Question: {question}." To ensure precise control over the output, specifically responding with either 'Yes' or 'No', we set the parameter logit_bias to 100 for both 'Yes' and 'No' tokens. Logit bias modifies the likelihood of specified tokens appearing in the model-generated output. 
We also set the top_p (nucleus sampling) parameter to 0.1 to restrict the model's consideration to a subset of tokens (the nucleus) whose cumulative probability mass reaches a designated threshold (top-p). In the context of a 0.1 top_p setting, the model exclusively considers tokens constituting the top 10% of the probability mass for the subsequent token. The combination of logit_bias and top_p configurations enables the outputs to adhere to predefined patterns ('Yes' and 'No'), rendering the model more deterministic and particularly suitable for our image evaluation task. <ref type="foot" target="#foot_7">8</ref> We assign a score of 1 for 'Yes' and 0 for 'No' and then compute an average score for each image. We observe that GPT occasionally generates questions such as "Is there an image?" or "Can you visualize a scene?" which are invalid in our context, as the input consistently includes an image and a set of questions. We excluded the scores of these invalid questions from our analysis.</p></div>
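The scoring step described above can be sketched as follows (the function name and the set of degenerate questions are our own illustration; the paper does not specify an implementation):

```python
# Degenerate questions we exclude; the two examples come from the text.
INVALID_QUESTIONS = {
    "Is there an image?",
    "Can you visualize a scene?",
}


def vqa_score(qa_pairs):
    """Average of binary answers per image: 'Yes' -> 1, 'No' -> 0,
    skipping questions that are invalid for image input."""
    valid = [(q, a) for q, a in qa_pairs if q not in INVALID_QUESTIONS]
    if not valid:
        return None
    return sum(1 if a == "Yes" else 0 for _, a in valid) / len(valid)
```

For example, an image whose valid questions receive one 'Yes' and one 'No' scores 0.5.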
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Results of VQA</head><p>Figure <ref type="figure" target="#fig_2">4</ref> shows a histogram of the VQA scores for all 600 images. Most scores lie between 0.5 and 0.9, with over 60 images obtaining a perfect score of 1, meaning that every 'Yes-or-No' question was answered with 'Yes'. Looking at the three results presented in Figure <ref type="figure" target="#fig_3">5</ref> in Appendix A, we find that VQA attributes a low score of 0.05 to image a). The human evaluator and GPT, however, scored this image a 4 in the IR scenario. In b), we have a medium VQA score of 0.61, a human score of 5 and a GPT score of 4. Lastly, c) shows an image with a VQA score of 1, but a human annotator scored this image a 3 and GPT a 4. These three examples alone already reveal discrepancies between the different scores. A comparison of the VQA scores between the 450 images from the triumphal procession and the 150 images from the Lupercalia based on Welch's t-test shows no significant difference between the two ratings (𝑝 = 0.88). From this, we conclude that ratings based on VQA produce more reliable results than those produced with a Likert scale.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Limitations and Outlook</head><p>The most significant limitation of our work is the missing IAA scores. For future evaluation rounds, we will set up the evaluation to allow for their computation. In this way, we will obtain reliable measures of how demanding the task of assessing the alignment between historical images and the prompts that produced them is. However, we argue that the results we obtained from the human evaluation are still valuable and allow for fine-tuning models based on human feedback (preferences in the IC and textual input in the IR scenario), albeit in a low-resource setting.</p><p>Moreover, we will employ more models to generate images in future experiments. This approach enables us to decide which models are the most suitable for historical image generation. The stable prompt base also allows for comparable results. Still, the significant number of images we will generate in future endeavours also calls for automatic evaluation methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In conclusion, our study provides valuable insights into the potential and challenges of using AI for historical image generation. The evaluation of 600 AI-generated images of triumphal processions and the Lupercalia revealed both promising capabilities and significant limitations.</p><p>Our findings hint at the discrepancies between human and automated assessments, underscoring the complexity of evaluating historical accuracy in AI-generated imagery. Ultimately, this study serves as a stepping stone towards more sophisticated use of AI in historical recreation and education while cautioning against over-reliance on automated systems for historical interpretation. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Additional Figures</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Four example images for the two scenarios generated with DALL-E 3. Top row (a and a'), triumphal procession, prompt: Generate an image of Trajan's Triumph as it passes through the Circus Maximus from the point of view of one of the around 150,000 to 250,000 spectators. Bottom row (b and b'), Lupercalia, prompt: Create a historical image of a group of Luperci running about naked and holding thongs made of goat hides during the Lupercalia ritual in 44 BCE at the foot of the Palatine Hill. As they run past people they strike them with the thongs. They are laughing, larking about and exchanging obscenities with those who attended the ritual. People seem to be happy with what's going on.</figDesc><graphic coords="5,198.57,204.86,95.63,95.63" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Parts of the Prodigy interface to obtain human assessments: a) the interface for image comparison with the side panel with an overview of how many image pairs have been annotated, b) the interface for the rating scenario with the 5-point Likert scale and a text comment field.</figDesc><graphic coords="6,326.46,191.04,111.82,70.87" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Histogram of scores obtained with the VQA evaluation.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Examples of the VQA ratings. a) from the triumphal procession based on Mittag [19], scored 0.05 based on 22 questions, prompt: "There exist coins minted in 326 CE which show Emperor Constantinus I. on an elephant quadriga during the celebrations of his viceannalia (20 years on the throne). Although textual sources do not confirm that elephant quadrigas were in use, create an image that shows Constantinus I. together with his son Constantius II. on a chariot pulled by four elephants during the vicennalia in Nicomedia. The chariot is accompanied by two lictores. The elephants are guided by Mahouts and Constantinus the I. wears the laurel wreath. ", scored a 4 by both human evaluators and GPT, b) from the Lupercalia based on Erker [7], scored 0.61 based on 18 questions, prompt: "Create an image that shows high-ranking magistrates of ancient Rome, dressed in loincloths. They are emerging from a cave of the Paletine Hill to start the traditional run of the Lupercalian festival. They are running on a rugged terrain under a blue sky. ", scored a 5 by a human annotator and a 4 by GPT, c) from the triumphal procession based on Madsen [18], scored 1.00 based on 19 questions, prompt: "Create a historical image of the spectacle of Pompey's triumph in 61 BC. Pompey adorned in triumphal regalia, parades through the streets of Rome atop his chariot, with captured treasures and defeated foes on display. Imagine the jubilation among the crowds as they celebrate Pompey's military prowess and the expansion of Roman territories under his command. ", scored a 3 by a human annotator and a 4 by GPT.</figDesc><graphic coords="14,354.90,122.48,105.71,105.71" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Overview of results in the IC and the IR scenarios.</figDesc><table><row><cell>Evaluation scenario</cell><cell cols="4">Total assessments Multiple annotations Excluded After exclusions</cell></row><row><cell>Image Comparison (IC)</cell><cell>1,569</cell><cell>103</cell><cell>64</cell><cell>1,505</cell></row><row><cell>Image Rating (IR)</cell><cell>568</cell><cell>29</cell><cell>24</cell><cell>544</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Agreement of human evaluation with GPT-4o's assessment.</figDesc><table><row><cell>Score Agreement</cell><cell>Count</cell><cell>Percentage</cell></row><row><cell>TRUE</cell><cell>864</cell><cell>57.41%</cell></row><row><cell>FALSE</cell><cell>641</cell><cell>42.59%</cell></row><row><cell>Total</cell><cell>1,505</cell><cell></cell></row></table><note>Left: Aggregation and comparison of scores of human ratings vs. LLM ratings. Right: Deviation of LLM scores from human ratings.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Data statistics and results of Welch's t-test.</figDesc><table><row><cell>Ratings</cell><cell cols="2">Human</cell><cell cols="2">GPT</cell><cell cols="2">Gemini</cell><cell cols="2">Claude</cell></row><row><cell></cell><cell>Triumph</cell><cell>Lupercalia</cell><cell>Triumph</cell><cell>Lupercalia</cell><cell>Triumph</cell><cell>Lupercalia</cell><cell>Triumph</cell><cell>Lupercalia</cell></row><row><cell># of samples</cell><cell>404</cell><cell>140</cell><cell>404</cell><cell>140</cell><cell>404</cell><cell>140</cell><cell>404</cell><cell>140</cell></row><row><cell>Average score</cell><cell>3.46</cell><cell>3.31</cell><cell>4.09</cell><cell>3.68</cell><cell>3.60</cell><cell>3.03</cell><cell>3.91</cell><cell>3.81</cell></row><row><cell>SD</cell><cell>1.12</cell><cell>1.14</cell><cell>0.70</cell><cell>0.82</cell><cell>0.90</cell><cell>0.74</cell><cell>0.46</cell><cell>0.52</cell></row><row><cell>p-value</cell><cell cols="2">0.18</cell><cell cols="2">0.0000002</cell><cell cols="2">0.00004</cell><cell cols="2">0.052</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Inter-annotator agreement between different groups using Krippendorff's alpha, computed on the same 544 images used for the t-test.</figDesc><table><row><cell></cell><cell cols="2">GPT vs. Gemini vs. Claude</cell><cell cols="2">GPT vs. human</cell></row><row><cell></cell><cell>Triumph</cell><cell>Lupercalia</cell><cell>Triumph</cell><cell>Lupercalia</cell></row><row><cell>𝛼</cell><cell>0.079</cell><cell>-0.044</cell><cell>-0.008</cell><cell>-0.005</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">See https://leonardo.ai.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">See https://openai.com/index/openai-api.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">The image generation costs amount to $48.06.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">The whole dataset (images and prompts) is available on GitHub. See https://github.com/AncientHistory-UZH/CHR2024_prompt-and-image-dataset.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">See https://prodi.gy.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">See https://www.anthropic.com/news/claude-3-5-sonnet.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">This generated costs of $19.14 for GPT-4o, $2.49 for Gemini and $5.27 for Claude.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">This evaluation scenario cost us $7.69. The whole experiment, i.e., image generation and LLM evaluation in the two scenarios from Section 4.1.3 plus the one mentioned in this section, totalled $80.67.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This research contributes a novel dataset and evaluation framework to the field, enabling future studies. As AI continues to evolve, our work suggests that while it holds promise for enhancing historical visualisation and understanding, it still requires careful human oversight and interpretation.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">The Roman Triumph: Participation, Historiography and Remembrance</title>
		<author>
			<persName><forename type="first">A</forename><surname>Algül</surname></persName>
		</author>
		<ptr target="https://www.academia.edu/43295099/The_Roman_Triumph_Participation_Historiography_and_Remembrance" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Claiming Victory: The Early Roman Triumph</title>
		<author>
			<persName><forename type="first">J</forename><surname>Armstrong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Rituals of triumph in the Mediterranean world. Culture and History of the Ancient Near East 63</title>
				<editor>
			<persName><forename type="first">Jeremy</forename><surname>Armstrong</surname></persName>
		</editor>
		<meeting><address><addrLine>Leiden u.a</addrLine></address></meeting>
		<imprint>
			<publisher>Brill</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="7" to="22" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">The Roman Triumph</title>
		<author>
			<persName><forename type="first">M</forename><surname>Beard</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2007">2007</date>
			<publisher>Harvard University Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Improving Image Generation with Better Captions</title>
		<author>
			<persName><forename type="first">J</forename><surname>Betker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Goh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brooks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guo</surname></persName>
		</author>
		<ptr target="https://cdn.openai.com/papers/dall-e-3.pdf" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Language Models are Few-Shot Learners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Herbert-Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krueger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Henighan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ziegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Winter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hesse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sigler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Litwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Berner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mccandlish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Ranzato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Hadsell</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Balcan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="1877" to="1901" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Garg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Anderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Krishna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Baldridge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bansal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pont-Tuset</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/2310.18235" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Das Lupercalia-Fest im augusteischen Rom: Performativität, Raum und Zeit</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">Š</forename><surname>Erker</surname></persName>
		</author>
		<idno type="DOI">10.1515/9783110208962.2.145</idno>
	</analytic>
	<monogr>
		<title level="j">Archiv für Religionsgeschichte</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="145" to="178" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Exploring the Potentials of Artificial Intelligence Image Generators for Educating the History of Architecture</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">W</forename><surname>Fareed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Nassif</surname></persName>
		</author>
		<author>
			<persName><surname>Nofal</surname></persName>
		</author>
		<idno type="DOI">10.3390/heritage7030081</idno>
	</analytic>
	<monogr>
		<title level="j">Heritage</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="1727" to="1753" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Der römische Triumph in Prinzipat und Spätantike: Probleme -Paradigmen -Perspektiven</title>
		<idno type="DOI">10.1515/9783110448009-003</idno>
	</analytic>
	<monogr>
		<title level="m">Der römische Triumph in Prinzipat und Spätantike</title>
				<editor>
			<persName><forename type="first">F</forename><surname>Goldbeck</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Wienand</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Boston</addrLine></address></meeting>
		<imprint>
			<publisher>De Gruyter</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1" to="26" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Augustus, the Lupercalia and the Roman identity</title>
		<author>
			<persName><forename type="first">D</forename><surname>Guarisco</surname></persName>
		</author>
		<idno type="DOI">10.1556/068.2015.55.1-4.16</idno>
		<ptr target="https://akjournals.com/view/journals/068/55/1-4/article-p223.xml" />
	</analytic>
	<monogr>
		<title level="j">Acta Antiqua Academiae Scientiarum Hungaricae</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="issue">1-4</biblScope>
			<biblScope unit="page" from="223" to="228" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kasai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ostendorf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Krishna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/2303.11897" />
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Mock the Triumph: Cassius Dio, Triumph and Triumph-Like Celebrations</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lange</surname></persName>
		</author>
		<idno type="DOI">10.1163/9789004335318_007</idno>
	</analytic>
	<monogr>
		<title level="m">Cassius Dio. Brill&apos;s Historiography of Rome and Its Empire Series</title>
				<imprint>
			<publisher>Brill</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="92" to="114" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">The Late Republican Triumph: Continuity and Change</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Lange</surname></persName>
		</author>
		<idno type="DOI">10.1515/9783110448009-004</idno>
		<ptr target="https://doi.org/10.1515/9783110448009-004" />
	</analytic>
	<monogr>
		<title level="m">Der römische Triumph in Prinzipat und Spätantike</title>
				<meeting><address><addrLine>Berlin, Boston</addrLine></address></meeting>
		<imprint>
			<publisher>De Gruyter</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="29" to="58" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The Triumph outside the City: Voices of Protest in the Middle Republic</title>
		<author>
			<persName><forename type="first">C</forename><surname>Lange</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Roman Republican Triumph</title>
				<editor>
			<persName><forename type="first">C</forename><forename type="middle">Hjort</forename><surname>Lange</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Vervaet</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="67" to="81" />
		</imprint>
	</monogr>
	<note>Quasar</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Triumph and Civil War in the Late Republic</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Lange</surname></persName>
		</author>
		<idno type="DOI">10.1017/s0068246213000056</idno>
	</analytic>
	<monogr>
		<title level="j">Papers of the British School at Rome</title>
		<imprint>
			<biblScope unit="volume">81</biblScope>
			<biblScope unit="page" from="67" to="90" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Holistic Evaluation of Text-to-Image Models</title>
		<author>
			<persName><forename type="first">T</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yasunaga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Narayanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Teufel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bellagente</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Leskovec</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-Y</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F.-F</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ermon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Liang</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2023/file/dd83eada2c3c74db3c7fe1c087513756-Paper-Datasets_and_Benchmarks.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Oh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Naumann</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Globerson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Saenko</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hardt</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Levine</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="69981" to="70011" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Rich Human Feedback for Text-to-Image Generation</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Klimovskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Carolan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pont-Tuset</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Young</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">D</forename><surname>Dvijotham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Collins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">J</forename><surname>Kohlhoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ramachandran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Navalpakkam</surname></persName>
		</author>
		<ptr target="https://openaccess.thecvf.com/content/CVPR2024/papers/Liang_Rich_Human_Feedback_for_Text-to-Image_Generation_CVPR_2024_paper.pdf" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR</meeting>
		<imprint>
			<biblScope unit="volume">2024</biblScope>
			<biblScope unit="page" from="19401" to="19411" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">The Loser&apos;s Prize: Roman Triumphs and Political Strategies during the Mithridatic Wars</title>
		<author>
			<persName><forename type="first">J</forename><surname>Madsen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Analecta Romana Instituti Danici. Supplementa XLV. Quasar</title>
		<imprint>
			<biblScope unit="page" from="117" to="130" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note>The Roman Republican Triumph Beyond the Spectacle</note>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Die Triumphatordarstellung auf Münzen und Medaillons in Prinzipat und Spätantike</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Mittag</surname></persName>
		</author>
		<idno type="DOI">10.1515/9783110448009-017</idno>
	</analytic>
	<monogr>
		<title level="m">Der römische Triumph in Prinzipat und Spätantike</title>
				<meeting><address><addrLine>Berlin, Boston</addrLine></address></meeting>
		<imprint>
			<publisher>De Gruyter</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="419" to="452" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Caesar at the Lupercalia</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>North</surname></persName>
		</author>
		<idno type="DOI">10.3815/007543508786239210</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Roman Studies</title>
		<imprint>
			<biblScope unit="volume">98</biblScope>
			<biblScope unit="page" from="144" to="160" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Hello GPT-4o</title>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<ptr target="https://openai.com/index/hello-gpt-4o" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Östenberg</surname></persName>
		</author>
		<idno type="DOI">10.1093/acprof:oso/9780199215973.001.0001</idno>
		<title level="m">Staging the World: Spoils, Captives, and Representations in the Roman Triumphal Procession</title>
				<meeting><address><addrLine>Oxford</addrLine></address></meeting>
		<imprint>
			<publisher>Oxford University Press</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Triumph and spectacle. Victory celebrations in the Late Republican civil wars</title>
		<author>
			<persName><forename type="first">I</forename><surname>Östenberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Roman Republican Triumph Beyond the Spectacle</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="181" to="193" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Otani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Togashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sawai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ishigami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Nakashima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rahtu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Heikkilä</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Satoh</surname></persName>
		</author>
		<ptr target="https://cvpr2023.thecvf.com/virtual/2023/poster/22014" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="14277" to="14286" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">The Architecture of the Roman Triumph: Monuments, Memory, and Identity</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Popkin</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>Cambridge University Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context</title>
		<author>
			<persName><forename type="first">M</forename><surname>Reid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Savinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Teplyashin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lepikhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lillicrap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-B</forename><surname>Alayrac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Soricut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lazaridou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Firat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schrittwieser</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2403.05530</idno>
		<idno type="arXiv">arXiv:2403.05530</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2403.05530" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Wege des Triumphes. Zum Verlauf der Triumphzüge im spätrepublikanischen und augusteischen Rom</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Schipporeit</surname></persName>
		</author>
		<ptr target="https://zenon.dainst.org/Record/001069375" />
	</analytic>
	<monogr>
		<title level="m">Triplici invectus triumpho : Der römische Triumph in augusteischer Zeit</title>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="95" to="136" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Learning to Summarize with Human Feedback</title>
		<author>
			<persName><forename type="first">N</forename><surname>Stiennon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ziegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lowe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">F</forename><surname>Christiano</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2020/file/1f89885d556929e98d3ef9b86448f951-Paper.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
		<editor>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Ranzato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Hadsell</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Balcan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="3008" to="3021" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">The Lupercalia and the Romulus and Remus Legend</title>
		<author>
			<persName><forename type="first">P</forename><surname>Tennant</surname></persName>
		</author>
		<ptr target="http://www.jstor.org/stable/24591847" />
	</analytic>
	<monogr>
		<title level="j">Acta Classica</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="page" from="81" to="93" />
			<date type="published" when="1988">1988</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Gendering the Roman Triumph: Elite Women and the Triumph in the Republic and Early Empire</title>
		<author>
			<persName><forename type="first">L</forename><surname>Webb</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Brännstedt</surname></persName>
		</author>
		<idno type="DOI">10.1163/9789004524774_005</idno>
	</analytic>
	<monogr>
		<title level="m">Gendering Roman Imperialism</title>
		<meeting><address><addrLine>Leiden, The Netherlands</addrLine></address></meeting>
		<imprint>
			<publisher>Brill</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="58" to="95" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">The Generalization of Student&apos;s Problem when Several Different Population Variances are Involved</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">L</forename><surname>Welch</surname></persName>
		</author>
		<idno type="DOI">10.1093/biomet/34.1-2.28</idno>
		<ptr target="https://doi.org/10.1093/biomet/34.1-2.28" />
	</analytic>
	<monogr>
		<title level="j">Biometrika</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page" from="28" to="35" />
			<date type="published" when="1947">1947</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Das Angebot des Diadems an Caesar und das Luperkalienproblem</title>
		<author>
			<persName><forename type="first">K.-W</forename><surname>Welwei</surname></persName>
		</author>
		<ptr target="http://www.jstor.org/stable/4434966" />
	</analytic>
	<monogr>
		<title level="j">Historia: Zeitschrift für Alte Geschichte</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="44" to="69" />
			<date type="published" when="1967">1967</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dong</surname></persName>
		</author>
		<ptr target="https://proceedings.neurips.cc/paper_files/paper/2023/file/33646ef0ed554145eab65f6250fab0c9-Paper-Conference.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
		<editor>
			<persName><forename type="first">A</forename><surname>Oh</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Naumann</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Globerson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Saenko</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Hardt</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Levine</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="15903" to="15935" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
