<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Dialogue in Abridged Literary Texts in Japanese: Observations and Automatic Reproducibility</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Iglika</forename><surname>Nikolova-Stoupak</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Sens Texte Informatique Histoire</orgName>
								<orgName type="institution">Sorbonne Université</orgName>
								<address>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Eva</forename><surname>Schaeffer-Lacroix</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Sens Texte Informatique Histoire</orgName>
								<orgName type="institution">Sorbonne Université</orgName>
								<address>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Gaël</forename><surname>Lejeune</surname></persName>
							<email>gael.lejeune@sorbonne-universite.fr</email>
							<affiliation key="aff0">
								<orgName type="department">Sens Texte Informatique Histoire</orgName>
								<orgName type="institution">Sorbonne Université</orgName>
								<address>
									<settlement>Paris</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Dialogue in Abridged Literary Texts in Japanese: Observations and Automatic Reproducibility</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7E2A0E6798F027C88720798A1835AA4D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Japanese dialogue</term>
					<term>dialogue in literature</term>
					<term>literary abridgement</term>
					<term>0000-0001-5230-3458 (I. Nikolova-Stoupak)</term>
					<term>0000-0002-6260-9095 (E. Schaeffer-Lacroix)</term>
					<term>0000-0002-4795-2362 (G. Lejeune)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This study investigates the language utilised in dialogues in abridged literary texts in Japanese that are commonly appreciated by language learners. The focus on dialogue is essential, as it uniquely illustrates the dynamics between interlocutors whilst highlighting cultural and linguistic norms. The study explores phenomena typical of the Japanese language, such as registers of formality, the use of end-of-sentence particles and explicit subject markers, as well as universal linguistic features commonly used in readability formulas and considered a proxy for textual complexity. Three world-famous literary texts as abridged by popular publishers (e.g. 'Aoitori Bunko') are considered, and insight is sought concerning how the dialogues within them compare to the texts in their entirety as well as to the dialogues in unabridged versions of the same texts. Ultimately, an experiment is proposed in order to test the overall validity of the identified linguistic patterns, in which the state-of-the-art GPT-4o model is provided with relevant guidelines and asked to simplify literary dialogue.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Advanced dialogue systems such as OpenAI's popular chatbot ChatGPT and a growing fascination with Japanese-language cultural artefacts like animations and comic books coincidentally mark the current global landscape, particularly in relation to a young audience. A natural intersection point of the two phenomena is the educational sphere, where AI-based tools are increasingly employed as learning support, and the number of learners of Japanese as a foreign language is perceptibly increasing <ref type="bibr">[1]</ref>. As original Japanese materials targeted at learners of the language are scarce, learners often resort to closely compatible alternative sources, such as the book series 'Aoitori Bunko' <ref type="bibr">[2]</ref> and 'Tsubasa Bunko' <ref type="bibr">[3]</ref>, which offer adaptations of popular texts targeted at young Japanese readers. Exploiting the qualities of these materials for the automatic or partially automatic generation of additional ones is a route worth exploring, as it would benefit learners in terms of material suitability and availability as well as professional educators/writers in terms of time and effort.</p><p>The current work aims to provide quantifiable insights into the characteristics of abridged texts in the Japanese language for future use in the context of automatic language generation. Specifically, the focus is initially placed on the rendition of dialogues, as they concentrate an array of specificities of Japanese culture, including interpersonal dynamics and levels of politeness, which in turn are expressed through language in standard ways, such as the chosen register (long vs short verb forms). The dialogues in three adapted literary texts (Alice's Adventures in Wonderland, Les Miserables and Tom Sawyer) are examined with a focus on typical features of the Japanese language pertaining to dialogue (e.g.
the use of end-of-sentence particles) as well as readability-based characteristics (e.g. type-to-token ratio). Comparisons are made, on the one hand, between the characteristics of the dialogue portions of the texts and the texts taken as a whole and, on the other hand, between dialogue in the abridged texts and in their unabridged counterparts. As a result of the comparison, guidelines for simplification of literary dialogue in the language are established and, by extension, conclusions can be drawn about the features of such dialogues that render them easy or difficult for readers. In a related experiment, the GPT-4o model (as per ChatGPT) is asked to provide a simplified version of a literary dialogue in Japanese in three discrete settings: zero-shot, one-shot (in which a reference pair of unabridged and abridged versions of a dialogue-filled text is provided) and through chain-of-thought prompt engineering that incorporates the quantified norms established by the current work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Background</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Automatic Dialogue Generation</head><p>Automatic dialogue generation differentiates between task-oriented and open-domain systems. The former are associated with a specific domain and, typically, specific knowledge and databases. In contrast, open-domain systems are broader in scope and may be generative, retrieval-based or ensemble. Dialogue systems, like the majority of NLP-related technology, have seen development over the past few decades from rule-based through machine-learning-driven to neural approaches. In the last couple of years, Large Language Models (LLMs) such as GPT have been particularly revolutionary in the field.</p><p>The evaluation of automatic dialogue is a challenging task due to dialogue's multiple and not readily quantifiable characteristics. Whilst task-oriented or retrieval-based open-domain dialogue systems might benefit from the application of BLEU scores, which compare their responses to human-made ones, the same cannot be said of systems that support a freer textual format. In their survey on the topic, <ref type="bibr" target="#b10">[4]</ref> mention the relevance of multiple evaluation criteria, including interactiveness, outside knowledge, coherence, speaker consistency, empathy and the system's ability to keep the conversation going.</p><p>Dialogue systems have been applied in foreign language education since the 1960s <ref type="bibr" target="#b11">[5]</ref>. Their advantages include ease of access, unlimited time of use, and the opportunity for students to practice their target language in a stress-free environment. In their review of 28 articles on the topic of AI dialogue systems used in language education, <ref type="bibr" target="#b11">[5]</ref> point out that an optimal system provides appropriate, conventional and efficient dialogue, and that it is especially challenging but important for it to also guarantee sufficient humour and cross-cultural engagement. 
Whilst most automatic dialogue systems conceived for educational purposes are focused on the English language, exceptions do exist, such as <ref type="bibr" target="#b12">[6]</ref>'s Subarashii <ref type="bibr">(2000)</ref>, which employed automatic speech recognition and rule-based logic to aid students of Japanese as a foreign language in beginner conversation. Today, chatbots like ChatGPT can engage in high-level conversation in multiple languages, including Japanese.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Dialogue in Literary Texts</head><p>A dialogue is an exchange between two or more parties that is associated with a specific setting and semantics. It is important to underline that the dialogues present in literary texts are by no means identical to those that occur in real life. <ref type="bibr">[7]</ref> points out that real-life dialogues last until a constraint (e.g. time) has been imposed or a conclusion has been reached. It is a common occurrence for simple repetitions or utterances lacking information to comprise a significant part of an exchange, as well as for so-called 'dialogue-of-the-deaf' or 'pseudo-dialogue' to occur due to overly large agreement or, conversely, a lack thereof. In contrast, literary dialogues have been pre-elaborated by the author and necessarily serve narrative purposes, such as advancement of the plot, demonstration of character qualities or the creation of a specific mood. <ref type="bibr">[8]</ref> goes so far as to make the following drastic distinction between the two types of dialogue: "it is not the purpose of dialogue to reproduce conversation naturalistically but rather, in the guise of conversation, to supply conversation's deficiencies -to be amusing where conversation is dull, to be economic where conversation is wasteful, to be articulate and lucid where conversation is mumbling or obscure" (p.7).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Dialogue in the Japanese Language</head><p>Japan is known for its specific (and often perplexing for foreigners) social norms, especially when it comes to formal communication. Naturally, these norms are to a great extent reflected verbally in interpersonal dialogue in the Japanese language.</p><p>To start with, several registers, associated with specific verb conjugation, occur in Japanese depending on the interlocutor's perceived 'status'. A very simplified rule of thumb would require that one utilise short forms (e.g. da<ref type="foot" target="#foot_0">1</ref>) with close friends and family, -masu forms (e.g. desu) in the majority of circumstances, 'humble' forms (e.g. de gozaimasu) when referring to oneself in front of a superior and 'honorific' forms (e.g. de irasshaimasu) when addressing or referring to a superior interlocutor. However, as <ref type="bibr">[9]</ref> notes, additional criteria of use are also at play, such as the participants' mood and dynamics or the intention to use a construction that is only compatible with certain verb forms.</p><p>Examples of constructions that exert significant influence on their context are end-of-sentence particles. They are specific to and almost indispensable in Japanese dialogue. The particles zo, ze, sa and yo are associated with insistence, whilst others like ne and na denote a request for confirmation. However, practice once again shows that the use of these particles is not so easily generalisable. In fact, multiple and sometimes conflicting theories have been put forward concerning the nature of individual particles and the differentiation between them. For instance, the speakers' 'territory of information' is often invoked <ref type="bibr">[9]</ref>. 
In contrast, other theories underline the role of particles in dialogue coordination and persuasion dynamics <ref type="bibr" target="#b16">[10]</ref>.</p><p>Unlike English and other Western languages, Japanese introduces subject and object roles within a sentence via specific case markers that precede the associated verb. In addition, a 'topic' marker (wa) is also commonly used, whose function, albeit similar to that of the subject marker, is different in that it is not directly associated with syntactic roles. The absence of explicit particles, commonly employed in dialogue, is a case of anaphora referred to as 'zero pronoun'. Its function is seen as comparable to that of personal pronouns in English <ref type="bibr" target="#b17">[11]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Texts</head><p>The primary literary texts investigated are abridged versions of Mark Twain's The Adventures of Tom Sawyer<ref type="foot" target="#foot_1">2</ref>, Victor Hugo's Les Miserables<ref type="foot" target="#foot_2">3</ref> and Lewis Carroll's Alice's Adventures in Wonderland<ref type="foot" target="#foot_3">4</ref>. The first two books belong to the series 'Aoitori Bunko' by the publisher Kodan <ref type="bibr">[2]</ref>, and the third one is from the series 'Popura Poketto' by Popura <ref type="bibr" target="#b18">[12]</ref>.</p><p>Both series, as well as others like 'Tsubasa Bunko' <ref type="bibr">[3]</ref>, seek to offer young Japanese readers access to Japanese and world classics in simple language and a child-friendly format. 'Aoitori Bunko' books, officially targeted at elementary school children, are commonly perceived as especially easy to read, as they contain a number of illustrations, use simple grammatical constructions, and provide the reading of all kanji<ref type="foot" target="#foot_4">5</ref> <ref type="bibr" target="#b19">[13]</ref>. According to the blog 'Japanese Book Club', aimed at learners of Japanese as a foreign language, a typical book of the series contains grammar suitable for the N3 and N2 levels of competence as per the Japanese certificate exam JLPT<ref type="foot" target="#foot_5">6</ref>  <ref type="bibr">[14]</ref>.</p><p>So that conclusions can be drawn that hold specifically for abridged Japanese texts of established quality and set them apart, each abridged text is paired with a non-abridged Japanese translation of the same original text.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Textual Processing</head><p>All texts are converted to plain text format, any metadata (including titles, character information and prefaces) is removed, and basic textual cleaning is applied, including the removal of non-alphanumeric symbols resulting from textual conversion. Tabs and new lines are removed to allow for further processing of the texts. For tokenisation of Japanese sentences into words, Python's janome library is utilised. For conversion of kanji into kana, the pykakasi library is used. Dialogue is defined as all text enclosed between '「' and '」' signs that ends in punctuation.</p></div>
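<div xmlns="http://www.tei-c.org/ns/1.0"><p>The dialogue definition above can be sketched in a few lines of Python. This is a minimal illustration and not necessarily the pipeline used in the study: it relies on the standard library only, it reads 'ends in punctuation' as 'the last character inside the brackets is a sentence-final mark' (an assumption), and the names SENTENCE_END, extract_dialogue and dialogue_proportion are hypothetical.</p>
<p>
```python
import re

# Assumption: these marks count as sentence-final punctuation.
SENTENCE_END = "。！？!?…"

# Text enclosed between the Japanese corner brackets, without nesting.
DIALOGUE_RE = re.compile("「([^「」]+)」")


def extract_dialogue(text):
    """Return bracketed spans whose last character is sentence-final punctuation."""
    spans = []
    for match in DIALOGUE_RE.finditer(text):
        content = match.group(1).strip()
        if content and content[-1] in SENTENCE_END:
            spans.append(content)
    return spans


def dialogue_proportion(text):
    """Characters inside dialogue divided by characters in the whole text."""
    if not text:
        return 0.0
    return sum(len(span) for span in extract_dialogue(text)) / len(text)


sample = "彼は「行くよ。」と言った。「青い鳥」という本だ。"
dialogue_spans = extract_dialogue(sample)
```
</p>
<p>Under this reading, a bracketed title or quoted term without final punctuation (the second bracketed span in the sample) is not counted as dialogue. Relaxing the filter only requires dropping the membership test on the last character.</p></div>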
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Linguistic Characteristics</head><p>Two sets of linguistic characteristics are measured in relation to dialogue instances in the selected texts. When applicable, the dialogue in the texts is compared with the texts taken as a whole and/or with the dialogue in the abridged texts' non-abridged counterparts. The first set is conceived to represent the specificity of Japanese dialogue and consists of the percentage of sentences:</p><p>discovered in relation to the professional texts but not exhibited by the current output. For the full generated texts, please refer to Appendix A; and for the prompts employed in the three scenarios, please refer to Appendix C.</p><p>The generated texts are then evaluated qualitatively via a survey administered to seven speakers of Japanese as a foreign language. There are two versions of the survey, each of which includes two of the automatically generated texts (version 1: zero-shot and one-shot; version 2: one-shot and chain-of-thought). The two versions share a common text to facilitate the calculation of agreement between participants; the one-shot text is chosen because the other two texts are very similar to each other and their co-occurrence may cause confusion. The evaluation protocol is adapted from <ref type="bibr" target="#b22">[16]</ref> and evaluates the following categories: (i) understandability, (ii) correctness, (iii) consistency, (iv) textual coherence and (v) aesthetic appeal. Four possible answers exist per category: 'yes', 'mostly yes', 'mostly no' and 'no'. Comments in free text are also encouraged. The full protocol can be found in Appendix B.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">General Observations</head><p>The detailed results of our analysis are shown in Table <ref type="table">1</ref> (Alice in Wonderland), Table <ref type="table" target="#tab_2">2</ref> (The Adventures of Tom Sawyer) and Table <ref type="table" target="#tab_4">3</ref> (Les Miserables). The length of each abridged and unabridged text was calculated, as was the ratio between the two lengths in each pair, which varies between 0.07 (for the longest original text, Les Miserables) and 0.87 (for Alice in Wonderland).</p><p>The percentage of dialogue was found to be consistently higher in the abridged texts, with ratios ranging from 1.14 for Tom Sawyer to 1.38 for Les Miserables. The number of words per sentence is lower for dialogue than for the text taken as a whole in all six texts. Given this clear trend, the ratios between the values of this feature for abridged and unabridged texts were also calculated in an attempt to discover a further trend that characterises abridged texts. Indeed, the ratios between the number of words per sentence in dialogue and in the full texts are generally higher for the abridged versions (or identical in the case of Alice in Wonderland).</p><p>No clear trends were discovered in relation to the number of letters per word (e.g. it is higher for dialogue in both of Alice in Wonderland's versions but this is the case only for Tom Sawyer's abridged version). In contrast, TTR does exhibit a trend, as it is higher in the case of dialogues in all texts (indicating higher lexical variety). Therefore, the ratios of the values in relation to the three abridged-unabridged pairs were also calculated; however, no further trends pertaining to abridged texts were discovered. 
The percentages of words that fall outside the utilised frequency list were higher for the texts when taken as a whole than for the dialogue within them (the sole exception being Tom Sawyer's unabridged version). No further trends are noticeable in relation to a text's 'unabridged' versus 'abridged' status.</p><p>The Japanese-specific feature 'percentage of sentences without topic or subject marker' reveals a general trend pointing to higher values in dialogue parts. Calculation of the values' ratios within a textual pair shows a further trend of higher ratios in unabridged as compared to abridged texts.</p><p>Table notes: (1) and the related forms datta, dewa/ja nai, and dewa/ja nakatta; (2) and the related forms deshita and dewa/ja arimasen; (3) yo, ne, na, sa, zo, ze, and wa; (4) the proportion of number of characters in dialogue over the number of characters in the whole text; (5) based on the multilingual Open Subtitles corpus and available at: https://github.com/hermitdave/FrequencyWord</p><p>Although the percentages of short and long verb forms within the different texts vary significantly as a consequence of the nature of the characters and their interactions (e.g. there is much higher formality in Les Miserables than in Alice in Wonderland), there is overall a higher number of short forms in all six texts. A calculation of ratios between the two types of verb forms in textual pairs was conducted; however, it revealed no further trend.</p><p>With the exception of Les Miserables, the abridged texts are associated with a lower percentage of end-of-sentence particles. Understandably, the choice of particles used is highly text-specific; for instance, Alice in Wonderland contains many instances of the particle wa, which is used almost exclusively by female speakers. 
The most common particles in both types of texts are yo, ne, and na; whilst the particle ze is generally avoided, in particular in abridged texts (even when used abundantly in the unabridged counterpart, like in the case of Tom Sawyer).</p></div>
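<div xmlns="http://www.tei-c.org/ns/1.0"><p>Several of the features compared in this section can be computed from tokenised sentences, as in the following sketch. It assumes that each sentence is already a list of tokens (for instance, the surface forms returned by a morphological analyser); the function names and the punctuation set are illustrative, and the particle inventory follows the one named in this study (yo, ne, na, sa, zo, ze and wa). Matching on the surface form alone does not disambiguate homographs, so the particle share should be read as an approximation.</p>
<p>
```python
# End-of-sentence particles: yo, ne, na, sa, zo, ze, wa (in kana).
FINAL_PARTICLES = {"よ", "ね", "な", "さ", "ぞ", "ぜ", "わ"}

# Assumption: tokens treated as punctuation and ignored when counting words.
PUNCTUATION = {"。", "、", "！", "？", "!", "?"}


def type_token_ratio(tokens):
    """Number of distinct tokens divided by the total number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def words_per_sentence(sentences):
    """Mean number of non-punctuation tokens per sentence."""
    if not sentences:
        return 0.0
    counts = [sum(1 for tok in sent if tok not in PUNCTUATION) for sent in sentences]
    return sum(counts) / len(counts)


def final_particle_share(sentences):
    """Share of sentences whose last non-punctuation token is a listed particle."""
    if not sentences:
        return 0.0
    hits = 0
    for sent in sentences:
        words = [tok for tok in sent if tok not in PUNCTUATION]
        if words and words[-1] in FINAL_PARTICLES:
            hits += 1
    return hits / len(sentences)


dialogue = [["行く", "よ", "。"], ["そう", "です", "ね", "。"], ["はい", "。"]]
particle_share = final_particle_share(dialogue)
mean_length = words_per_sentence(dialogue)
```
</p>
<p>The dialogue-to-full-text ratios discussed above are then simply the value of a feature on the dialogue sentences divided by its value on all sentences of the same text. As TTR depends strongly on text length, comparing it across texts of very different sizes calls for caution.</p></div>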
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Generation Experiment</head><p>ChatGPT Output. Table <ref type="table" target="#tab_6">4</ref> offers a detailed juxtaposition of the investigated features within the textual extract provided to ChatGPT versus its zero-shot output. The overall length of the two texts is, respectively, 2331 and 1457 characters, making the abridged version 0.63 of the length of the original one. This proportion falls neatly within the range exhibited by the previously examined professional texts. The percentage of dialogue is, naturally, high as the extract has been specifically selected to include extensive dialogue: 0.7 for the abridged vs 0.57 for the unabridged text. The abridged text contains a higher percentage of dialogue, thus matching the trend discovered in relation to the professional texts. TTR is higher for the automatically abridged text, which is also in line with the discovered trend.</p><p>Similarly, the number of words outside the frequency list is indeed higher in the full text than in its dialogue portions. However, when it comes to the number of words per sentence, the trend of there being fewer words in dialogue parts is not exhibited by ChatGPT's output. The percentage of sentences without subject or topic marker is higher in dialogue parts than in the full text both within the original extract and ChatGPT's output. Furthermore, the ratio between the two values is higher for the unabridged text, so that both trends related to this textual feature are matched. Interestingly, even though the percentages of sentences ending in da vs desu go against the discovered trend in the originally provided extract (the latter being more common), ChatGPT's output does in fact match the trend. Finally, a higher percentage of sentences end in a particle in the abridged text, going against the relevant trend. 
More importantly, there is very little variety in the included particles, with yo alone accounting for the vast majority.</p><p>Based on this evaluation, ChatGPT was asked to make two modifications in the chain-of-thought setting: (i) reduce the number of words in dialogue sentences and (ii) add a larger variety of end-of-sentence particles, such as the commonly used ne and na. Whilst the requested particles were indeed added<ref type="foot" target="#foot_6">23</ref>, the overall length of sentences in dialogue was not decreased<ref type="foot" target="#foot_7">24</ref>. Still, upon reflection, no further prompts were provided at this point, so that the model's unedited attempt at meeting the requirements could be evaluated in the ensuing survey.</p><p>Human Evaluation. The survey's 'version 1' (which consisted of the zero-shot and one-shot texts) was answered by three respondents and 'version 2' (one-shot and chain-of-thought) by five. Inter-rater agreement as per Fleiss's Kappa was low for all three texts<ref type="foot" target="#foot_8">25</ref>. Highest agreement was associated with 'correctness' in relation to the one-shot text and with 'textual coherence' in relation to the chain-of-thought one. Agreement was stable between categories for the zero-shot text.</p><p>The chain-of-thought text received the highest percentage of 'yes' answers, 76%, followed by 24% 'mostly yes'. The one-shot text received 70% 'yes', 20% 'mostly yes' and 10% 'mostly no'. Finally, the zero-shot text received 66.7% 'yes', 26.7% 'mostly yes' and 6.7% 'mostly no' answers. No text was given less than 50% of 'yes' answers for a given category.</p><p>The one-shot text was met with a higher number of negative free-text comments. Grammar was said to be complex, and the lack of kanji representations to be confusing. It was sometimes difficult to tell who was speaking, and the lack of explicit subject or owner rendered some sentences hard to understand. 
On the positive side, the text was said to be 'very flowing and natural'. Furthermore, the dialogue attributed to the characters was perceived as vivid and judged to match their personal traits ("Scrooge's dismissive attitude and the cheerful responses of his nephew").</p><p>The zero-shot text was said to be written in standard and fitting 'conversational Japanese', and it was pointed out that repetitions within dialogue were effective. The chain-of-thought text, in which ChatGPT was explicitly asked to replicate the trends discovered in this study, received mostly positive comments. Anaphora was said to be more readily resolvable (e.g. the pronoun kare<ref type="foot" target="#foot_9">26</ref> was employed several times). Compared with its one-shot counterpart, the text was said to be 'better and smoother' and to contain simpler grammar. A specific sentence was said to be more 'aesthetically appealing' than the parallel one used in the one-shot text. According to the negative comments, complex kanji were sometimes used, verb tense was inconsistent on one occasion and, importantly, there was said to be less of a striking difference in the dialogue styles attributable to the two main characters. </p></div>
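<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers unfamiliar with the agreement statistic reported above, the following sketch computes Fleiss's Kappa from a table of answer counts. It is a generic textbook implementation rather than the script used in the study. The layout is an assumption: each row is one rated item, each column is one of the four possible answers ('yes', 'mostly yes', 'mostly no', 'no'), and every row must sum to the same number of raters (at least two). The function name fleiss_kappa is hypothetical.</p>
<p>
```python
def fleiss_kappa(counts):
    """Fleiss's Kappa for counts[i][j] = raters who gave item i the answer j."""
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("at least one rated item is required")
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_answers = len(counts[0])

    # Observed agreement: mean share of agreeing rater pairs per item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Expected agreement from the overall share of each answer.
    shares = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_answers)
    ]
    expected = sum(share * share for share in shares)

    if expected == 1.0:
        # All ratings fall into a single answer: agreement is trivially complete.
        return 1.0
    return (observed - expected) / (1 - expected)


# Three raters, two items, full agreement on each item.
perfect = fleiss_kappa([[3, 0, 0, 0], [0, 3, 0, 0]])
```
</p>
<p>Values close to 1 indicate strong agreement, whilst values near or below 0 indicate agreement no better than chance. With as few raters and items as in the present survey, the statistic is sensitive to single answers, which is worth keeping in mind when interpreting low values.</p></div>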
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>An automatic evaluation of both universal and language-specific features as present in dialogue in abridged (and related unabridged) literary texts in Japanese led to the discovery of several distinct trends. Some of them simply pertain to dialogue in Japanese literature, whilst others go further to reveal specific traits of the dialogue present in abridged texts. The latter trends help shed light on the features of these texts that render them suitable for an audience of language learners and/or younger readers.</p><p>Abridged texts are associated with a higher percentage of dialogue and a lower percentage of end-of-sentence particles. There tends to be a higher number of infrequent words in the text when taken as a whole than in its dialogue portions. TTR is higher in dialogue compared to the literary texts taken as a whole, and there tend to be more short than long verb forms in dialogue. The number of words per sentence is lower in dialogue than in full texts, and the ratio between the two values is higher for abridged texts. Sentences without marked topic or subject are more frequent in dialogue, and the associated ratio is higher for unabridged texts.</p><p>The chain-of-thought setting of automatic abridgement, in which ChatGPT was asked to modify its initial output text so that it would conform to the mentioned trends, was met with significant encouraging feedback, such as the clear use of anaphora, which also characterises the zero-shot text. However, an important piece of criticism was the perceived lack of distinguishing features in the characters' language. There are indeed instances of dialogue in the chain-of-thought text that go against the grain of Scrooge being rather informal and patronising in contrast with his nephew being cheerful and always respectful. An example is the nephew informally skipping the question marker ka; e.g. "どうしてそんなことを言うんです？" ("Why would you say this?").
However, upon verification, these instances are identical in the zero-shot text. Therefore, one can safely conclude that it was not the fact of asking the model to make a couple of specific adjustments that caused a compromise in terms of creativity and characterisation. Rather, it was the one-shot setting that helped enhance creativity. For instance, there, the mentioned question is rendered in a very formal manner and does end in a question particle: "どうしてそんなことを言うんですか？"</p><p>Interestingly, increased use of kanji was seen as an advantage by some survey participants and as a drawback by others, which suggests that the texts' qualities cannot readily be generalised across levels of reader proficiency. Based on the respondents' feedback, the one-shot text seems to be suitable for more experienced readers of Japanese compared to the other two.</p><figure type="table"><head>Table 5: Juxtaposition of a part of the original extract of A Christmas Carol and the output provided by ChatGPT in the different examined scenarios. The major differences between the texts are underlined.</head><table><row><cell>Version</cell><cell>Text</cell></row><row><cell>(English)<ref type="foot" target="#foot_10">28</ref></cell><cell>"Don't be angry, uncle. Come! Dine with us to-morrow." Scrooge said that he would see him-Yes, indeed he did. He went the whole length of the expression, and said that he would see him in that extremity first. "But why?" cried Scrooge's nephew. "Why?" "Why did you get married?" said Scrooge. "Because I fell in love." "Because you fell in love!" growled Scrooge, as if that were the only one thing in the world more ridiculous than a merry Christmas. "Good afternoon!"</cell></row><row><cell>unabridged</cell><cell>「おこらないでください、おじさん。さあ 明日は僕らと一緒に夕食をとってください」 スクルージは、おまえが墓場に、確かにそう、まったくこの通り口にしたのだった、おまえが墓場に落ちるところをみたいものだなと。 「どうしてなんです?」甥は叫びました。「いったいどうして」 「どうしておまえは結婚したんだ?」 「恋に落ちたからです」 「恋に落ちたからとはな!」スクルージはまるでその言葉が、メリークリスマスより腹立たしい唯一の言葉であるかのように吐き捨てた。 「ごきげんよう」</cell></row><row><cell>zero-shot</cell><cell>「そんなに怒らないでくださいよ、おじさん。明日は僕たちと一緒に夕食を食べませんか？」 スクルージは冷たく言いました。「お前が墓場に落ちるところを見たいものだ」 「どうしてそんなことを言うんです？」と甥は叫びました。「どうしてですか？」 「お前が結婚したからだ」 「恋に落ちたからです」と甥は答えました。 「恋に落ちたから？そんな馬鹿な！ごきげんよう」</cell></row><row><cell>chain-of-thought</cell><cell>「そんなに怒らないでくださいよ、おじさん。明日は僕たちと一緒に夕食を食べませんか？」 スクルージは冷たく言いました。「お前が墓場に落ちるところを見たいものだ、ね」 「どうしてそんなことを言うんです？」と甥は叫びました。「どうしてですか、ね？」 「お前が結婚したからだ」 「恋に落ちたからです」と甥は答えました。 「恋に落ちたから？そんな馬鹿な！ごきげんよう」</cell></row><row><cell>one-shot</cell><cell>「怒らないでくださいよ、おじさん。明日は僕たちと一緒に夕食をとりましょう」 スクルージは、「お前が墓場に落ちるところを見たいものだ」と言いました。 「どうしてですか？」おいは叫びました。「どうしてそんなことを言うんですか？」 「どうしてお前は結婚したんだ？」 「恋に落ちたからです」 「恋に落ちたから？ くだらない」とスクルージは言い捨てました。 「さようなら」</cell></row></table></figure><p>Table <ref type="table">5</ref> presents parallel texts resulting from the discrete prompting scenarios experimented with. The text's full version, including in the original English, is also included for comparison. In all three scenarios, the second paragraph is turned into dialogue. Other common characteristics include simplification of grammar; e.g. the exclamation "さあ" is excluded and simple constructions are used to express the nephew's invitation: "ません"/"ましょう". In all texts, Scrooge makes use of short verb forms. The zero-shot and chain-of-thought versions, which differ only in their use of end-of-sentence particles, tend to clarify the referents of anaphora, such as with the phrase "と甥は答えました"<ref type="foot" target="#foot_11">29</ref>; in the one-shot text, however, the reader is left to guess who the speaker is. 
The emphasised use of end-of-sentence particles in the chain-of-thought text reads naturally and enhances the dialogue's dynamics, such as in the nephew's question "どうしてですか、ね？", where the particle marks the question as rhetorical. In fact, the one-shot version also makes effective use of particles, such as in the sentence "怒らないでくださいよ". An important quality of the one-shot text that is absent from the other abridged texts is simplification of vocabulary. While the zero-shot and chain-of-thought versions retain Scrooge's original "ごきげんよう", a greeting likely to be challenging for non-native readers, the one-shot version uses a much better-known synonym, "さようなら".</p><p>It is likely that the one-shot text's higher overall complexity depends on the sample extract that was provided to the model and is not an intrinsic feature of the generation scenario, a hypothesis that should be tested through the inclusion of multiple sample texts of different proficiency levels. High potential can therefore be expected from a future combination of the one-shot technique's natural tendency to provide variety and lexical simplification with the chain-of-thought addition's incorporation of specific instructions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion and Future Work</head><p>The current study's observations are general in nature, as they are based on a very limited number of texts. A deeper examination of the discovered trends will require a substantial increase in corpus size. Ideally, works of a variety of genres, years, and original languages of writing will be added. A larger corpus could allow not only trends but also reliable value ranges to be defined for the examined textual features (with the caveat that some features, such as TTR, are highly dependent on textual length).</p><p>Based on the conducted evaluation, it can be concluded that automatic generation may benefit from longer chains of thought and from a combination of the one-shot and chain-of-thought generation methods. The newly derived texts should then ideally be evaluated by a larger sample of readers of defined Japanese proficiency levels.</p><p>Comparison of the current experiment with similar ones that focus on different languages may provide interesting insight into the acquisition processes associated with each language, in particular in relation to literary dialogue. For instance, the use of anaphora in the face of the Japanese zero pronoun may be compared to the use of personal pronouns in English or other European languages.</p></div>
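<div xmlns="http://www.tei-c.org/ns/1.0"><p>The length dependence of TTR mentioned in the caveat above can be illustrated with a minimal Python sketch. It assumes that the text has already been segmented into words by a Japanese tokeniser; the function names and the toy token list are illustrative assumptions and are not taken from the study's implementation.</p>

```python
def type_token_ratio(tokens):
    """Return the number of distinct tokens divided by the number of tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def ttr_by_prefix(tokens, sizes):
    """Compute TTR on the first n tokens for each n in sizes."""
    return {n: type_token_ratio(tokens[:n]) for n in sizes}


# Toy, hand-segmented token list (an assumption for illustration only).
tokens = ["クリスマス", "は", "たのしい", "時", "だ",
          "クリスマス", "は", "親切", "な", "時", "だ"]

# The longer the measured sample, the more often frequent words repeat,
# so the ratio tends to fall as the prefix grows.
for size, value in ttr_by_prefix(tokens, [4, 8, 11]).items():
    print(size, round(value, 2))
```

<p>Because of this effect, TTR values are probably best compared between texts of similar length or between equally sized samples of them.</p></div>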
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Limitations</head><p>The limited size of the literary corpus, caused by restrictions of availability and of a financial nature, makes observations less generalisable, as the author's or translator's choices play a significant role in the texts' characteristics. Some of the automatically measured characteristics are not exhaustive in nature and are meant to help discover overall tendencies rather than specific values or frequencies. For instance, only the verb desu is used to differentiate between short and -masu forms, and cases in which the verb is not positioned at the end of the sentence are not accounted for.</p><p>Finally, as the survey participants are few in number and heterogeneous in terms of proficiency level, it is challenging to draw conclusions about the quality of the three generated texts solely on the basis of their responses.</p></div>
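<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sentence-ending heuristic discussed in the limitations above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual implementation: the function names, the punctuation-based sentence splitter, and the exact lists of endings (transcribed from the da, desu and particle forms named in the paper) are assumptions.</p>

```python
import re
from collections import Counter

# Surface endings, checked in this order (assumed transcription of the forms).
DESU_FORMS = ("です", "でした", "ではありません", "じゃありません")
DA_FORMS = ("だ", "だった", "ではない", "じゃない", "ではなかった", "じゃなかった")
PARTICLES = ("よ", "ね", "な", "さ", "ぞ", "ぜ", "わ")


def split_sentences(text):
    """Split on sentence-final punctuation and strip quotes and commas."""
    parts = re.split(r"[。！？!?]+", text)
    cleaned = [part.strip(" 　「」『』、") for part in parts]
    return [part for part in cleaned if part]


def classify_ending(sentence):
    """Label a sentence by its final characters only (a surface heuristic)."""
    if sentence.endswith(DESU_FORMS):
        return "desu"
    if sentence.endswith(DA_FORMS):
        return "da"
    if sentence.endswith(PARTICLES):
        return "particle"
    return "other"


def ending_shares(text):
    """Return the percentage of sentences per ending class."""
    sentences = split_sentences(text)
    if not sentences:
        return {}
    counts = Counter(classify_ending(s) for s in sentences)
    return {label: 100 * counts[label] / len(sentences)
            for label in ("desu", "da", "particle", "other")}


sample = "恋に落ちたからです。お前が結婚したからだ。どうしてですか、ね？"
print(ending_shares(sample))
```

<p>As the limitation above anticipates, a sentence whose predicate is followed by other material (for example a vocative such as おじさん) falls into the "other" class under this sketch, and any sentence that merely happens to end in だ, such as one ending in a past-tense verb, is counted as da.</p></div>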
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Texts Used and Generated in the Automatic Abridgement Experiment</head><p>Original </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Evaluation Protocol</head><p>Thank you for taking part in this evaluation, which is part of the project "Dialogue in Abridged Literary Texts in Japanese: Observations and Automatic Reproducibility."</p><p>You have been assigned two texts to evaluate based on the provided criteria. The texts are automatically abridged versions of the same extract from Charles Dickens' novel A Christmas Carol. Please fill in the two tables independently by marking your answers with a tick or cross (or by circling the relevant answer). The section Aesthetic Appeal is highly subjective; please provide your personal opinion. You are welcome to write additional comments in the rightmost field.</p><p>This protocol is adapted from </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Yes Mostly Yes Mostly No No</head><p>The text is easy to understand. It is free from overly complex words and sentence constructions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Correctness</head><p>Does the text exhibit correct vocabulary, grammar, punctuation, and structure?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Yes Mostly Yes Mostly No No</head><p>There are no non-existent or irrelevant words, grammatical mistakes, mistakes in punctuation, or non-optimal paragraph/sentence structure.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Consistency</head><p>Is the text consistent in terms of register, writing style, and the information presented?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Yes Mostly Yes Mostly No No</head><p>The text maintains the same genre/level of formality, verb tenses, character traits, and knowledge. There are no contradictions in the presented information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Textual Coherence</head><p>Are textual transitions present and natural? Is anaphora clearly resolvable? Is the text free of needless repetition?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Yes Mostly Yes Mostly No No</head><p>The text flows fluently between sentences (paragraphs). There are no unclear anaphora referents. Information (incl. separate words) is not overly repeated.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Aesthetic Appeal</head><p>Does the text have appropriate length, pacing, style, and register? Is the text engaging?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Yes Mostly Yes Mostly No No</head><p>The text feels neither too long nor too short. Appropriate textual length is accorded to plot segments. Formality levels are natural.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Prompts Used</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Zero-Shot Setting</head><p>Here is an extract from the novel A Christmas Carol by Charles Dickens, as translated into Japanese. Please generate an abridged version of this extract that is suitable for an audience of</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Analysis of dialogue-centered textual characteristics in abridged and unabridged versions of Alice's Adventures in Wonderland (registers of politeness and end-of-sentence particles are only associated with communication; therefore, features pertaining to them are only calculated for dialogue). Details about the characteristics are given in Section 3.3.</figDesc><table>
<row><cell>Characteristics</cell><cell cols="2">Abridged</cell><cell cols="2">Unabridged</cell></row>
<row><cell></cell><cell>Dialogue</cell><cell>Full text</cell><cell>Dialogue</cell><cell>Full text</cell></row>
<row><cell cols="5">Set 1: Characteristics typical to Japanese dialogue/text</cell></row>
<row><cell>% s-ces w/o wa/ga</cell><cell>20.41</cell><cell>21.40</cell><cell>66.60</cell><cell>34.85</cell></row>
<row><cell>% s-ces ending in da 1</cell><cell>8.26</cell><cell>N/A</cell><cell>3.00</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in desu 2</cell><cell>0.70</cell><cell>N/A</cell><cell>0.94</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in particles 3</cell><cell>15.82</cell><cell>N/A</cell><cell>18.57</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in yo</cell><cell>3.69</cell><cell>N/A</cell><cell>5.63</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in ne</cell><cell>4.39</cell><cell>N/A</cell><cell>5.63</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in na</cell><cell>2.11</cell><cell>N/A</cell><cell>3.19</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in sa</cell><cell>0.53</cell><cell>N/A</cell><cell>0.19</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in zo</cell><cell>1.58</cell><cell>N/A</cell><cell>1.5</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in ze</cell><cell>0</cell><cell>N/A</cell><cell>0</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in wa</cell><cell>3.51</cell><cell>N/A</cell><cell>2.44</cell><cell>N/A</cell></row>
<row><cell cols="5">Set 2: Universal readability-based characteristics</cell></row>
<row><cell>length (characters)</cell><cell>69642</cell><cell></cell><cell>80055</cell><cell></cell></row>
<row><cell>% dialogue 4</cell><cell>14.31</cell><cell></cell><cell>12.56</cell><cell></cell></row>
<row><cell>average # words/s-ce</cell><cell>13.28</cell><cell>26.62</cell><cell>9.62</cell><cell>19.10</cell></row>
<row><cell>average # letters (kana)/word</cell><cell>1.98</cell><cell>1.94</cell><cell>2.10</cell><cell>1.99</cell></row>
<row><cell>TTR</cell><cell>0.24</cell><cell>0.10</cell><cell>0.26</cell><cell>0.10</cell></row>
<row><cell>% words outside a 30k freq. list 5</cell><cell>58.44</cell><cell>60.19</cell><cell>58.41</cell><cell>60.35</cell></row>
</table><note>5 for Alice in Wonderland; 0.67 vs 0.64 for Tom Sawyer; 0.61 vs 0.59 for Les Miserables</note><note n="18">2.4 vs 2.6 for Alice in Wonderland; 1.6 vs 1.38 for Tom Sawyer; 1.6 vs 2.66 for Les Miserables</note><note n="19">ratios: constant 1 for Alice in Wonderland; 0.95 vs 1.15 for Tom Sawyer; 0.85 vs 0.82 for Les Miserables</note><note n="20">Alice in Wonderland's abridged version is an exception, showing a similar but slightly higher value in relation to the entire text</note><note n="21">0.95 vs 1.9 for Alice in Wonderland; 1.37 vs 1.62 for Tom Sawyer</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>Analysis of dialogue-centered textual characteristics for The Adventures of Tom Sawyer.</figDesc><table>
<row><cell>Characteristics</cell><cell cols="2">Abridged</cell><cell cols="2">Unabridged</cell></row>
<row><cell></cell><cell>Dialogue</cell><cell>Full text</cell><cell>Dialogue</cell><cell>Full text</cell></row>
<row><cell cols="5">Set 1: Characteristics typical to Japanese dialogue/text</cell></row>
<row><cell>% s-ces w/o wa/ga</cell><cell>50.38</cell><cell>36.82</cell><cell>51.14</cell><cell>31.46</cell></row>
<row><cell>% s-ces ending in da</cell><cell>19.64</cell><cell>N/A</cell><cell>11.86</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in desu</cell><cell>3.04</cell><cell>N/A</cell><cell>0.54</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in particles</cell><cell>17.79</cell><cell>N/A</cell><cell>33.79</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in yo</cell><cell>6.52</cell><cell>N/A</cell><cell>13.24</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in ne</cell><cell>1.93</cell><cell>N/A</cell><cell>3.26</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in na</cell><cell>5.34</cell><cell>N/A</cell><cell>3.96</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in sa</cell><cell>1.78</cell><cell>N/A</cell><cell>4.74</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in zo</cell><cell>1.19</cell><cell>N/A</cell><cell>1.69</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in ze</cell><cell>0.96</cell><cell>N/A</cell><cell>5.36</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in wa</cell><cell>0.07</cell><cell>N/A</cell><cell>1.53</cell><cell>N/A</cell></row>
<row><cell cols="5">Set 2: Universal readability-based characteristics</cell></row>
<row><cell>length (characters)</cell><cell>103758</cell><cell></cell><cell>229024</cell><cell></cell></row>
<row><cell>% dialogue</cell><cell>26.56</cell><cell></cell><cell>24.32</cell><cell></cell></row>
<row><cell>average # words/s-ce</cell><cell>13.22</cell><cell>19.62</cell><cell>15.17</cell><cell>23.60</cell></row>
<row><cell>average # letters/word</cell><cell>1.97</cell><cell>1.95</cell><cell>1.92</cell><cell>1.93</cell></row>
<row><cell>TTR</cell><cell>0.16</cell><cell>0.10</cell><cell>0.11</cell><cell>0.08</cell></row>
<row><cell>% words outside a 30k freq. list</cell><cell>61.45</cell><cell>65.85</cell><cell>64.54</cell><cell>57.20</cell></row>
</table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3</head><label>3</label><figDesc>Analysis of dialogue-centered textual characteristics for Les Miserables.</figDesc><table>
<row><cell>Characteristics</cell><cell cols="2">Abridged</cell><cell cols="2">Unabridged</cell></row>
<row><cell></cell><cell>Dialogue</cell><cell>Full text</cell><cell>Dialogue</cell><cell>Full text</cell></row>
<row><cell cols="5">Set 1: Characteristics typical to Japanese dialogue/text</cell></row>
<row><cell>% s-ces w/o wa/ga</cell><cell>56.04</cell><cell>33.37</cell><cell>45.91</cell><cell>21.62</cell></row>
<row><cell>% s-ces ending in da</cell><cell>10.48</cell><cell>N/A</cell><cell>14.10</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in desu</cell><cell>8.60</cell><cell>N/A</cell><cell>7.98</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in particles</cell><cell>19.65</cell><cell>N/A</cell><cell>13.65</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in yo</cell><cell>6.94</cell><cell>N/A</cell><cell>4.75</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in ne</cell><cell>5.92</cell><cell>N/A</cell><cell>3.6</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in na</cell><cell>2.31</cell><cell>N/A</cell><cell>2.17</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in sa</cell><cell>1.01</cell><cell>N/A</cell><cell>0.84</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in zo</cell><cell>1.45</cell><cell>N/A</cell><cell>0.46</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in ze</cell><cell>0.36</cell><cell>N/A</cell><cell>0.32</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in wa</cell><cell>1.66</cell><cell>N/A</cell><cell>1.51</cell><cell>N/A</cell></row>
<row><cell cols="5">Set 2: Universal readability-based characteristics</cell></row>
<row><cell>length (characters)</cell><cell>106044</cell><cell></cell><cell>1522950</cell><cell></cell></row>
<row><cell>% dialogue</cell><cell>22.91</cell><cell></cell><cell>16.63</cell><cell></cell></row>
<row><cell>average # words/s-ce</cell><cell>9.93</cell><cell>16.28</cell><cell>12.77</cell><cell>21.30</cell></row>
<row><cell>average # letters/word</cell><cell>1.97</cell><cell>1.97</cell><cell>2.03</cell><cell>2.08</cell></row>
<row><cell>TTR</cell><cell>0.16</cell><cell>0.10</cell><cell>0.08</cell><cell>0.03</cell></row>
<row><cell>% words outside a 30k freq. list</cell><cell>56.30</cell><cell>58.71</cell><cell>66.31</cell><cell>71.45</cell></row>
</table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 4</head><label>4</label><figDesc>Analysis of dialogue-centered textual characteristics in an extract from the unabridged novel A Christmas Carol and in its automatically abridged version.</figDesc><table>
<row><cell>Characteristics</cell><cell cols="2">Abridged</cell><cell cols="2">Unabridged</cell></row>
<row><cell></cell><cell>Dialogue</cell><cell>Full text</cell><cell>Dialogue</cell><cell>Full text</cell></row>
<row><cell cols="5">Set 1: Characteristics typical to Japanese dialogue/text</cell></row>
<row><cell>% s-ces w/o wa/ga</cell><cell>34.48</cell><cell>23.08</cell><cell>17.65</cell><cell>9.09</cell></row>
<row><cell>% s-ces ending in da</cell><cell>13.33</cell><cell>N/A</cell><cell>4.92</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in desu</cell><cell>11.67</cell><cell>N/A</cell><cell>11.48</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in particles</cell><cell>11.67</cell><cell>N/A</cell><cell>19.67</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in yo</cell><cell>8.33</cell><cell>N/A</cell><cell>16.39</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in ne</cell><cell>0</cell><cell>N/A</cell><cell>0</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in na</cell><cell>1.67</cell><cell>N/A</cell><cell>1.64</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in sa</cell><cell>0</cell><cell>N/A</cell><cell>0</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in zo</cell><cell>1.67</cell><cell>N/A</cell><cell>1.64</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in ze</cell><cell>0</cell><cell>N/A</cell><cell>0</cell><cell>N/A</cell></row>
<row><cell>% s-ces ending in wa</cell><cell>0</cell><cell>N/A</cell><cell>0</cell><cell>N/A</cell></row>
<row><cell cols="5">Set 2: Universal readability-based characteristics</cell></row>
<row><cell>length (characters)</cell><cell>1457</cell><cell></cell><cell>2331</cell><cell></cell></row>
<row><cell>% dialogue</cell><cell>69.80</cell><cell></cell><cell>57.02</cell><cell></cell></row>
<row><cell>average # words/s-ce</cell><cell>14.76</cell><cell>13.23</cell><cell>36.47</cell><cell>24.96</cell></row>
<row><cell>average # letters/word</cell><cell>2.23</cell><cell>2.11</cell><cell>2.07</cell><cell>2.04</cell></row>
<row><cell>TTR</cell><cell>0.37</cell><cell>0.3</cell><cell>0.38</cell><cell>0.32</cell></row>
<row><cell>% words outside a 30k freq. list</cell><cell>49.37</cell><cell>53.33</cell><cell>49.02</cell><cell>53.93</cell></row>
</table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_8"><head></head><label></label><figDesc><ref type="bibr" target="#b22">Mousavi et al. (2022)</ref>. If you have any questions, please address them to iglika.nikolova.stoupak@gmail.com</figDesc><table><row><cell>Dimension/Question</cell><cell>Answer Options</cell><cell>Definition</cell><cell>Comment (optional)</cell></row><row><cell>Understandability</cell><cell></cell><cell></cell><cell></cell></row><row><cell>Is the text generally understandable (in terms of storyline, vocabulary, and syntax)?</cell><cell></cell><cell></cell><cell></cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">to be</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Twain, M., Nishi, K., Iijima, A.: Tomy Souyaa no bouken. Kodan: Aoitori Bunko (2012).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Hugo, V., Katayama, W., Tsukahara, R.: Re mizeraburu. Kodan: Aoitori Bunko (2012).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">Carroll, L., Nishida, Y.: Fushigi no kuni no arisu. Popura: Poketto Bunko (2015).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">logographic Chinese-based characters</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">corresponding, respectively, to the B1 and B2 levels of the CEFR framework</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="23" xml:id="foot_6">ne: 11.86%; na: 8.47%; yo: 6.78%; zo: 1.69%</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="24" xml:id="foot_7">in fact, it was slightly increased to 14.82 due to the particles that were added simultaneously</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="25" xml:id="foot_8">the possible answers ('yes', 'mostly yes', 'mostly no', and 'no') were taken as categorical at the calculation of agreement</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="26" xml:id="foot_9">he</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="28" xml:id="foot_10">Dickens, C.: A Christmas Carol. Chapman &amp; Hall (1843).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="29" xml:id="foot_11">"the nephew answered"</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>「でも、どこかしらにはいきたいの」 アリスはいい足しました。 「たっぷりあるけば、どこかしらに着く」と、チェシャーネコその通りね、とアリスは思ったので、こんどはべつの質問をしてみました。 「ここらへんには、どんな人が住んでるの</head><p>ークリスマスだって？クリスマスは、金もないのに支はらいをしなきゃいけない時だ。また一年、年を取るが、少しも金持ちになっていないことを知る時だ。もしわしの思い通りになるなら」とスクルージは怒って言いました。「『メリークリスマス』なんて言うばか者は、プディングと一緒に煮こんでやりたい」 「おじさんったら」とおいは言いました。 「お前はお前のやり方でクリスマスを祝えばいい。わしはわしのやり方があるから、ほっといてくれ」 「やり方ですって？おじさんは何もしないじゃないですか」 「ほっといてくれ、それでいいだろう。クリスマスはたのしいんだろう？今までもいいことがたくさんあったんだろうし」 「はい、たくさんありました。お金は えなかったけど、クリスマスはいつもたのしいです。神さまの名前やクリスマスの意味を考えなくても、クリスマスは親切で、たのしい時だと思います。人々が心を開いて、他の人々を友達のように感じる時です。クリスマスが僕にお金をくれたことはないけど、それでもクリスマスは素晴らしいと思います。だから、神さまが守ってくださいますように」 事務員が、思わず拍手をしましたが、すぐに火をかき回して暖かさを消してしまいました。 「もう一度音を立ててみろ」スクルージは叫びました。「クビにするぞ」とおいの方を向いて、「国会議員にでもなった方がいいんじゃないか」と言いました。 「怒らないでくださいよ、おじさん。明日は僕たちと一緒に夕食をとりましょう」 スクルージは、「お前が墓場に落ちるところを見たいものだ」と言いました。 「どうしてですか？」おいは叫びました。「どうしてそんなことを言うんですか？」 「どうしてお前は結婚したんだ？」 「恋に落ちたからです」 「恋に落ちたから？くだらない」とスクルージは言い捨てました。 「さようなら」 「でも、おじさん、結婚する前だって来てくれなかったじゃないですか。どうして結婚が理由になるんですか」 「さようなら」 「べつにおじさんにどうこうしてもらおうとは思っていません。でも、仲良くできないんですか？」 「さようなら」 「おじさんがそんなに頑固なのは残念です。でも、クリスマスのき持ちを忘れずに、メリークリスマス、おじさん」 「さようなら」 「よいお年を」 「さようなら」 それでも、おいは怒らず、部屋を出ました。外に出る前に、事務員にクリスマスの挨拶をしました。事務員も、スクルージよりは心が温かかったので、心をこめて挨拶を返しました。</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>learners of Japanese as a foreign language at an intermediate level.</p><p>{text}</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>One-Shot Setting</head><p>{the prompt for the zero-shot setting}</p><p>Please see below an example of an original and a well-abridged version of an extract from a novel.</p><p>{unabridged example}</p><p>{abridged example}</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Chain-of-Thought Setting</head><p>Here is an extract from the novel A Christmas Carol by Charles Dickens, as translated into Japanese.</p><p>{text}</p><p>You previously provided me with the following abridged version of it:</p><p>{zero-shot output}</p><p>Could you please make the following small modifications in the text, whilst trying to alter it as little as possible:</p><p>- Make the sentences in the dialogue parts a little shorter (in terms of number of words).</p><p>- Include a larger variety of end-of-sentence particles in the dialogue, such as ne and na.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">and (v) the type-to-token ratio (TTR) (vi) the percentage of words that are not</title>
	</analytic>
	<monogr>
		<title level="m">ending with the short verb form da and the related forms datta</title>
				<imprint/>
	</monogr>
	<note>nai and dewa/ja nakatta 3. ending with the full verb form desu and the related forms deshita and dewa/ja arimasen 4. ending with the particles yo, ne, na, sa, zo, ze and wa. The second set consists of atomic readability characteristics that are not specific to the Japanese language and are commonly considered as representative of textual complexity: (i) the text&apos;s total length (in characters), (ii) the percentage of dialogue in the full text, (iii) the average number of words per sentence, (iv) the average number of letters per word. found in a frequency list 14 [15. For a detailed description of the investigated characteristics, please refer to section 4.1</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m">Automatic Text Generation and Adaptation Following the measurement of the characteristics</title>
				<imprint/>
	</monogr>
	<note>in abridged and unabridged versions of Japanese literary texts as detailed in section 3.3, an experiment in automatic generation of abridged text is conducted. For this purpose, ChatGPT is employed in its recent GPT-4o version. The model is asked to provide an abridged version of an extract from Charles Dickens&apos; A Christmas Carol 15 as translated into Japanese, which contains a large amount of dialogue. Three discrete generation scenarios are experimented with based on the nature of the provided prompts: zero-shot (in which only instructions are provided), one-shot (in which the model is given an example pair of unabridged and abridged extracts, taken from the respective professional versions of Alice in Wonderland) and chain-of-thought (which includes additional prompting following the model&apos;s generation). In the chain-of-thought setting, the model is asked to explicitly alter the text output in the zero-shot setting based on the trends that are</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Twain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yoshida</surname></persName>
		</author>
		<title level="m">Tomy Souyaa no bouken</title>
				<imprint>
			<publisher>Kaisei</publisher>
			<date type="published" when="1975">1975</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Hugo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Yoshio</surname></persName>
		</author>
		<title level="m">Re mizeraburu. Kodan: Aozora Bunko</title>
				<imprint>
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Fushigi no kuni no arisu</title>
		<author>
			<persName><forename type="first">L</forename><surname>Carroll</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yamagata</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2000">2000</date>
			<publisher>Asahi</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<ptr target="https://github.com/hermitdave/FrequencyWords" />
		<title level="m">The utilised frequency list is based on the top 30k words of the Open Subtitles multilingual corpus</title>
				<imprint/>
	</monogr>
	<note>Words are considered as types rather than tokens. Due to the nature of the frequency list, words consisting entirely of hiragana characters are not considered</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><surname>Dickens</surname></persName>
		</author>
		<author>
			<persName><surname>Katokt</surname></persName>
		</author>
		<title level="m">Kurisumasu Kyaroru</title>
				<imprint>
			<publisher>Tatsu-Zine</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<ptr target="https://www.jlpt.jp/e/statistics/archive.html" />
		<title level="m">Statistics of jlpt takers, JLPT Japanese-Language Proficiency Test</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Aoitori</forename><surname>Bunko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bunko</forename><surname>Aoitori</surname></persName>
		</author>
		<author>
			<persName><surname>Books</surname></persName>
		</author>
		<ptr target="https://cocreco.kodansha.co.jp/aoitori" />
		<imprint>
			<date type="published" when="2023">2023</date>
			<pubPlace>Japan</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Tsubasa</forename><surname>Bunko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bunko</forename><surname>Tsubasa</surname></persName>
		</author>
		<author>
			<persName><surname>Books</surname></persName>
		</author>
		<ptr target="https://tsubasabunko.jp" />
		<imprint>
			<date type="published" when="2023">2023</date>
			<pubPlace>Japan</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Recent advances in deep learning based dialogue systems: a systematic survey</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Young</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Pandelea</surname></persName>
		</author>
		<idno type="DOI">10.1007/s10462-022-10248-8</idno>
	</analytic>
	<monogr>
		<title level="j">Artificial Intelligence Review</title>
		<imprint>
			<biblScope unit="volume">56</biblScope>
			<biblScope unit="page" from="3055" to="3155" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A systematic review on artificial intelligence dialogue systems for enhancing English as foreign language students&apos; interactional competence in the university</title>
		<author>
			<persName><forename type="first">C</forename><surname>Zhai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wibowo</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.caeai.2023.100134</idno>
	</analytic>
	<monogr>
		<title level="j">Computers and Education: Artificial Intelligence</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page">100134</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">An interactive dialog system for learning Japanese</title>
		<author>
			<persName><forename type="first">F</forename><surname>Ehsani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bernstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Najmi</surname></persName>
		</author>
		<idno type="DOI">10.1016/S0167-6393(99)00042-4</idno>
	</analytic>
	<monogr>
		<title level="j">Speech Communication</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="167" to="177" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Dialogues with/and great books: With some serious reflections on Robinson Crusoe</title>
		<author>
			<persName><forename type="first">D</forename><surname>Fishelov</surname></persName>
		</author>
		<idno type="DOI">10.1353/nlh.0.0022</idno>
	</analytic>
	<monogr>
		<title level="j">New Literary History</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page" from="335" to="353" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
	<note>Project MUSE</note>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Dialogue in novels and plays</title>
		<author>
			<persName><forename type="first">C</forename><surname>Morgan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Études Anglaises</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page">97</biblScope>
			<date type="published" when="1953">1953</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">The interaction of style-shift and particle use in Japanese dialogue</title>
		<author>
			<persName><forename type="first">A</forename><surname>Janes</surname></persName>
		</author>
		<idno type="DOI">10.1016/S0378-2166(99)00118-6</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Pragmatics</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="1823" to="1853" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Dialogue functions of Japanese sentence-final particles &apos;yo&apos; and &apos;ne&apos;</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Katagiri</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.pragma.2007.02.013</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Pragmatics</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="page" from="1313" to="1323" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Identifying zero pronouns in Japanese dialogue</title>
		<author>
			<persName><forename type="first">K</forename><surname>Yoshimoto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Computational Linguistics</title>
		<imprint>
			<date type="published" when="1988">1988</date>
			<biblScope unit="volume">2</biblScope>
		</imprint>
	</monogr>
	<note>Coling Budapest</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<ptr target="https://www.poplar.co.jp/pocket/" />
		<title level="m">Popura Poketto Bunko books</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<ptr target="https://japanesebookclubcafecom.wordpress.com/2020/07/29/what-are-aoi-tori-bunko-books/" />
		<title level="m">What are Aoi Tori Bunko books?</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>Japanese Book Club Cafe</note>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<ptr target="https://japanesebookclubcafecom.wordpress.com/" />
		<title level="m">Japanese novel review: 霧のむこうのふしぎな町 The Marvelous Village Veiled in Mist by Kashiwaba Sachiko</title>
		<imprint>
			<publisher>Japanese Book Club Cafe</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">The Classic Readability Studies</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">H</forename><surname>DuBay</surname></persName>
		</author>
		<idno type="DOI">10.1109/TPC.2008.2007872</idno>
		<imprint>
			<date type="published" when="2007">2007</date>
			<publisher>ERIC Clearinghouse</publisher>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Evaluation of response generation models: Shouldn&apos;t it be shareable and replicable?</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Mousavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Roccabruna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lorandi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Caldarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Riccardi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)</title>
		<meeting>the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)<address><addrLine>Abu Dhabi, United Arab Emirates; Hybrid</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="136" to="147" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
