<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Unseen A+ Student: Evaluating the Performance and Detectability of Large Language Models in High School Assignments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matyáš Boháček</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Large Language Models, Generative Artificial Intelligence, Education, School Assignments</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Gymnasium of Johannes Kepler</institution>
          ,
          <addr-line>Parléřova 2/118, Prague, 169 00</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The recent boom of so-called generative artificial intelligence (AI) applications, namely large language models such as ChatGPT, took the public discourse by storm, disrupting many fields and industries. Education, being one of them, is now pressed to establish reactive policies on the use of this technology, often without enough insight and data. Thus, we present a dataset of authentic coursework (including long-form theses and short assignments) from a public high school in the Czech Republic, extended with AI-generated alternatives produced by various versions of ChatGPT. To evaluate their quality, we enlist a group of student peers from the same school and conduct multiple assessments. Our findings reveal that ChatGPT can generate high-quality, high school-level coursework off-the-shelf, even in a low-resourced language such as Czech. Additionally, we demonstrate that the AI text detectors, which are gradually being implemented in educational institutions and learning centers worldwide, fail to identify these AI-generated texts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>https://www.matyasbohacek.com (M. Boháček)</p>
      <p>CEUR Workshop Proceedings</p>
      <p>
        the internet [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Nonetheless, recent discourse includes it under the shortcut umbrella term of
artificial intelligence (AI).
      </p>
      <p>
        Hand in hand with the hype and excitement came worries about how such a powerful
technology could be misused, most prominently in education. OpenAI benchmarked the off-the-shelf
ChatGPT with GPT-4 on numerous academic exams and found that it performs well above
the average human student in many subjects [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. On the SAT, the standardized test for American college
applications, the model reached the 93rd and 89th percentiles on the Evidence-Based Reading &amp;
Writing and Math sections, respectively. On both the Advanced Placement (AP) Art History and
Biology exams, it scored 5, the highest possible score.
      </p>
      <p>
        Educational institutions have recently begun to respond by introducing policies on the use of
this technology. While some educators and organizations are pioneering frameworks to include AI in
the classroom and plan to experiment with different approaches in the upcoming months [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
many have strictly prohibited it, including the College Board [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which runs the SAT and AP exams.
Many high schools and universities soon followed [9, 10, 11]. Alongside these bans, they deployed detectors
of AI-generated text, which should, similarly to plagiarism detectors, spot the cheaters [12, 13].
However, unlike plain plagiarism, proving that students used an AI model to generate their text
is significantly more complex and prone to false positives [14].
      </p>
      <p>Amidst this rapid development and change in school policies, many questions remain unanswered.
OpenAI’s report, which many educational institutions refer to, includes mostly exams in the
English language. How well does the system perform in other languages, especially
low-resourced ones? Does it work for essays and creative written assignments, too? How
reliable are the publicly available AI text classifiers? And are they better than humans at spotting
generated homework?</p>
      <p>To answer these questions, we collect a novel dataset of coursework from a public high
school in the Czech Republic, including both long-form theses and short assignments, and
generate alternatives and continuations using different versions of ChatGPT. We evaluate
the quality and detectability of these texts with a group of student peers from the school
and present the results in this paper. To support future research and public debate in this
direction, we make the data publicly available for open-domain research and analyses at
https://www.matyasbohacek.com/topics/ai-education/.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Recently, the literature has begun exploring the implications of widely accessible AI tools
for education. One fundamental premise of this work is that such tools will enable personalized and
interactive learning, with tailored instructions and more continuous evaluation [15]. Moreover,
they are expected to accelerate students’ research and writing process, allowing for more
analytical and collaborative activities [16]. Some studies also focus on how AI and LLMs could
benefit specific subjects, most prominently medicine [17].</p>
      <p>On the other hand, many recent works outline the potential dangers AI and LLMs pose
for education. Megahed et al. [18] show that ChatGPT struggles with nuanced tasks, such as
explaining less widely known terms or creating factual content from scratch, and thus may
be untrustworthy when teaching new content. Rahman and Watanobe [19] describe specific
misuses (e.g., cheating on online exams or generating essay assignments) and hypothesize that
over-reliance on AI could eventually diminish critical thinking skills.</p>
      <p>Název: Feminizace migrace</p>
      <p>Předmět: Humanitní studia</p>
      <p>Abstrakt:
Práce se zaměřuje na ženskou migraci a její
specifika. V práci je popsáno, kterým okolnostem
ženy při migraci čelí a je snaha upozornit na mýty a
stereotypy, které kolem migrujících žen panují.</p>
      <p>Klíčová slova:
migrace, ženská migrace, migrace v ČR, teorie
push-pull, informativnost v migraci, care-drain,
integrace migrantů, překvalifikovanost migrantů</p>
      <p>Abstrakt:
Tato maturitní práce se zabývá feminizací migrace
jako spojením dvou sociálně zranitelných skupin,
žen a migrantů. Práce popisuje intenzitu feminizace
migrace, zdrojové faktory, které ji podporují a
konkrétní příklady feminizace migrace v České
republice.</p>
      <p>Klíčová slova:
feminizace migrace, ženská migrace, Česká
republika, teoretické popisy, praktické fakty.</p>
      <p>Title: Feminization of migration</p>
      <p>Subject: Humanities</p>
      <p>Abstract:
This thesis focuses on female migration and its
specifics. The thesis describes the circumstances
that women face during migration and tries to
highlight the myths and stereotypes that exist
around women migrants.</p>
      <p>Keywords:
Migration, female migration, migration in Czechia,
push-pull theory, informativeness in migration,
care-drain, integration of migrants, overqualification of migrants</p>
      <p>Abstract:
This thesis explores the feminization of migration as
the coming together of two socially vulnerable
groups, women and migrants. The thesis describes
the intensity of feminization of migration, the
resource factors that support it and specific
examples of feminization of migration in Czechia.</p>
      <p>Many recent works have studied whether humans can distinguish LLM-generated from
human-produced texts [20, 21]. The results suggest that, in most contexts, human judgment is no
better than guessing on this task. However, identification accuracy improves slightly when
participants are trained on which patterns to look for.</p>
      <p>Given the poor human accuracy, different automatic approaches to distinguishing AI-generated from
human-produced text have been introduced [22, 23, 24, 25]. Nevertheless, their precision varies
significantly with the context, and they usually require knowledge of the LLM architecture used for
the generation in the first place, which limits their practical use. Additional limitations, including
the bias of these systems against non-native English writers, have been identified [26].</p>
      <p>As for employing AI detectors in educational contexts, some opinion pieces have suggested
that their reliability may be problematic depending on the context [27]; nonetheless, to the best
of our knowledge, there are no systematic analyses of this phenomenon to date.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset</title>
      <p>To compare AI-generated (synthetic) content to human-produced coursework, we first collected
a dataset of coursework from a public high school in Prague, Czech Republic. All of the
assignments were completed between 2019 and 2023. Because the written assignments are of many different kinds,
we divided the dataset into 2 primary parts and 5 sub-splits, depending on the types
of enrichments and analyses performed on them. Every generation we performed using
ChatGPT atop the GPT 4.0 backbone was replicated with the GPT 3.5 and 3.5 Legacy backbones,
resulting in 3 variants of each synthesized text. We include a complete set of the prompts in
Appendix A.</p>
      <sec id="sec-3-1">
        <title>3.1. Long-form Theses</title>
        <p>We first assemble 20 final high school theses: 10 for the subject of ’Czech Language and
Literature’ and 10 for ’Humanities’. Each work was written in Czech, consists of some 30 to
60 pages, and follows the general guidelines of formal academic writing. On top of these, we
create 2 sub-splits, each holding an equal ratio of data from both subjects.</p>
        <p>Sub-split A: holds abstract and keyword pairs for 10 theses. We generated the 3 synthetic
alternative abstracts and keywords by including the introduction and conclusion of the respective
work in the prompt.</p>
        <p>Sub-split B: holds two subsequent paragraphs of text, with 3 synthetic alternatives that
replace the second paragraph.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Short Assignments</title>
        <p>Next, we assemble various assignments from different subjects. For each assignment, we
include 10 human-written responses and generate 3 alternatives using ChatGPT, given only the
instructions (i.e., we did not present the system with students’ work).</p>
        <p>Sub-split C: holds the instructions and responses of an essay assignment in an ’English as
the Second Language’ course.</p>
        <p>Sub-split D: holds the instructions and responses of an essay assignment in a ’German as
the Third Language’ course.</p>
        <p>Sub-split E: holds the instructions and responses of a quiz assignment in a ’Math’ class.</p>
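        <p>As a rough illustration of the layout described above, one dataset entry could be represented as follows. This is a minimal sketch and not the format of the released data: the class name Item, its field names, and the exact backbone label strings are assumptions made for illustration only.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, List

# Backbone labels mirror the paper's naming; the exact strings are an assumption.
BACKBONES = ("GPT 4.0", "GPT 3.5", "GPT 3.5 Legacy")


@dataclass
class Item:
    """One dataset entry (hypothetical schema, not the released file format)."""
    sub_split: str                      # "A" to "E"
    subject: str                        # e.g. "Humanities"
    authentic: List[str]                # 1 text for sub-splits A-B, 10 responses for C-E
    generated: Dict[str, str] = field(default_factory=dict)  # backbone label -> text

    def all_texts(self) -> List[str]:
        """Authentic texts followed by one generated text per backbone."""
        return self.authentic + [self.generated[b] for b in BACKBONES]


item = Item(
    sub_split="A",
    subject="Humanities",
    authentic=["(authentic abstract and keywords)"],
    generated={b: f"(abstract generated with {b})" for b in BACKBONES},
)
print(len(item.all_texts()))  # 1 authentic + 3 generated
```
]]></preformat>
        <p>Keeping the generated texts keyed by backbone makes it straightforward to group later assessments by model version.</p>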
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Human Assessment</title>
      <p>We recruited 6 student peers, aged 18 to 20, from the same high school where the data was collected.
Each participant was instructed on the task and then presented with the same data (i.e., the
set of questions and reference texts was identical for each participant). We present the set
of instructions and questions in Appendix B. Given average reading speeds, we designed the
overall annotation task to take 75 minutes.</p>
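      <p>[Figure 2: (a) Proportion of abstracts selected as relevant and (b) number of abstracts selected as the single best option, grouped by model version and subject.]</p>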
      <sec id="sec-4-1">
        <title>4.1. Quality Assessment</title>
        <p>First, we assessed how the generated and authentic abstracts compare in terms of relevance (as
judged by student peers). For all 10 theses in sub-split A, the participants were presented with 4
alternative abstracts and keywords (1 authentic, 3 generated). We did not disclose which one was
authentic and which were generated. The participants first had to select all options they deemed
relevant (i.e., meeting the formal criteria and corresponding to the topic) and then select the
single best one.</p>
        <p>Shown in Figure 2a are the proportions of abstracts selected as relevant, grouped by model
version and subject (the ’Overall’ bar averages the subject-specific scores). Shown in Figure 2b
are the absolute instances selected as the single best variants in the given selection, grouped by
model version and subject.</p>
        <p>We found that, on average, participants ranked abstracts generated by ChatGPT 3.5 Legacy
similarly to the authentic ones, with around 50% of instances deemed relevant. Abstracts
generated with ChatGPT 4.0 and 3.5 were perceived noticeably better: nearly 75% of their
instances were deemed relevant.</p>
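        <p>The relevance proportions discussed above can be derived from the raw selections roughly as follows. This is a minimal sketch under stated assumptions: the tuple layout of a response, the function name relevance_rates, and the toy data are hypothetical, and the ’Overall’ value is computed as the mean of the subject-specific rates, following the description of Figure 2a.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def relevance_rates(responses):
    """responses: iterable of (subject, source, deemed_relevant) tuples,
    one per participant and abstract.
    Returns {source: {subject: rate, ..., "Overall": mean of the subject rates}}."""
    counts = defaultdict(lambda: [0, 0])  # (source, subject) -> [relevant, shown]
    for subject, source, relevant in responses:
        counts[(source, subject)][0] += int(relevant)
        counts[(source, subject)][1] += 1

    rates = defaultdict(dict)
    for (source, subject), (relevant, shown) in counts.items():
        rates[source][subject] = relevant / shown

    for by_subject in rates.values():
        # Average the subject-specific scores, as the 'Overall' bar does.
        by_subject["Overall"] = sum(by_subject.values()) / len(by_subject)
    return dict(rates)


# Toy responses for illustration only; these are not the study's data.
toy = [
    ("Humanities", "authentic", True),
    ("Humanities", "authentic", False),
    ("Czech Language and Literature", "authentic", True),
    ("Czech Language and Literature", "authentic", True),
]
rates = relevance_rates(toy)
print(rates["authentic"])
```
]]></preformat>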
        <p>As for the best option selection task, texts from ChatGPT 4.0 dominated, with a total of 25
of its instances selected as the best option. GPT 3.5 texts ranked second with 15 instances;
authentic and GPT 3.5 Legacy texts share the last rank with 10 instances each. Overall, there seems
to be little to no statistically significant difference between the observed subjects.</p>
        <p>[Figure 3: Distribution of texts identified as authentic, grouped by origin. Panels: (a) overall, (b) prolonged continuations, (c) Humanities, (d) Czech Language and Literature.]</p>
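        <p>The best-option counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes that each of the 6 participants cast exactly one vote for each of the 10 theses; under that assumption the reported counts should sum to 60. The variable names are illustrative.</p>
        <preformat><![CDATA[
```python
# Counts reported in the text: times each source was picked as the single best option.
best_counts = {
    "ChatGPT 4.0": 25,
    "ChatGPT 3.5": 15,
    "Authentic": 10,
    "ChatGPT 3.5 Legacy": 10,
}

participants, theses = 6, 10              # from the study design
total_votes = sum(best_counts.values())   # assumed: one vote per participant per thesis

shares = {source: count / total_votes for source, count in best_counts.items()}
uniform_share = 1 / len(best_counts)      # share if all 4 options were picked equally often

for source, share in shares.items():
    print(f"{source}: {share:.1%} (uniform: {uniform_share:.0%})")
```
]]></preformat>
        <p>The counts do sum to 60, which matches 6 participants and 10 theses. Comparing each share with the uniform share of one quarter gives a quick sense of how strongly the selections favor one source.</p>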
      </sec>
      <sec id="sec-4-2">
        <title>4.2. AI Text Identification</title>
        <p>Next, we assessed whether participants could identify the authentic continuation of texts from
sub-split B. Given 4 options, they were asked to select the 1 authentic text among 3 generated
ones. In general, humans without prior briefing on how to spot AI text are not able to do
so [20, 21]; we were interested in whether this carries over to the educational setting.</p>
        <p>Shown in Figure 3a is the overall distribution of texts identified as authentic, grouped by
their origin (i.e., authentic or model version). Authentic texts were selected as such only 22% of
the time, slightly below the 25% expected from random guessing among 4 options, which suggests
that the participants are more likely to identify generated texts as authentic.</p>
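        <p>To put the 22% figure in context, it can be compared with the chance level of the 4-option task. The sketch below is only an illustration under stated assumptions: it takes 60 judgments (6 participants and 10 items), approximates the number of correct identifications from the reported percentage, and treats the judgments as independent, which repeated judgments by the same participant may not be.</p>
        <preformat><![CDATA[
```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_judgments = 6 * 10     # assumption: 6 participants x 10 items in sub-split B
chance = 1 / 4           # 1 authentic text among 4 options
observed_rate = 0.22     # reported share of authentic texts selected as authentic
k_correct = round(observed_rate * n_judgments)  # approximate count, not reported in the paper

p_at_most = binom_cdf(k_correct, n_judgments, chance)
print(f"chance = {chance:.0%}, approx. correct = {k_correct}/{n_judgments}, "
      f"P(X <= {k_correct}) under guessing = {p_at_most:.2f}")
```
]]></preformat>
        <p>A probability that is not small indicates that the observed rate is compatible with pure guessing, in line with the prior findings cited above.</p>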
        <p>Most continuations in sub-split B (8 of the 10) were just a paragraph long. We wondered if an
extended generation range would affect the participants’ judgment and created 2 special cases,
where the continuation spans 3 paragraphs. Figure 3b captures the ranking distribution for this
sub-case. Interestingly, prolonged authentic texts were even less likely to be deemed authentic
compared to their prolonged counterparts.</p>
        <p>Figures 3c and 3d break the analysis down by subject: ’Humanities’ and ’Czech
Language and Literature’, respectively. While in ’Humanities’ participants selected the
authentic texts more often than any of the remaining classes, in the latter subject the
AI-generated texts dominate.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Automatic Assessment</title>
      <p>Lastly, we tested the following publicly available services, which promise to identify texts generated
using ChatGPT:
• Content at Scale: AI Content Detector<sup>2</sup>, yielding a likelihood of the text being written
by a human;
• GPTZero<sup>3</sup>, classifying texts as human-written, mixed, or AI-written;
• OpenAI’s AI Text Classifier<sup>4</sup>, classifying texts as very unlikely, unlikely, unclear, possibly, or
likely AI-generated;
• Writer: AI Content Detector<sup>5</sup>, yielding a likelihood of the text being written by a human;
• ZeroGPT<sup>6</sup>, yielding a likelihood of the text being written by AI.</p>
      <p>Even though most of these services provide a nuanced assessment, we converted their outputs to a
binary classification for the purposes of our study. We do not report conventional metrics that
would indicate the performance of individual tools, as they all completely failed our test. When
evaluated on sub-split A, OpenAI’s AI Text Classifier predicted that all the items are AI-generated,
while the rest of the services classified all the items as human-produced. This means that, if
used in practice, all students who wrote the material in our dataset, regardless of whether
they used AI or not, would be uniformly classified either as cheaters or as rule-abiding students,
depending on the service. This shows that current services cannot detect AI content in Czech, at least in the
educational domain.</p>
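      <p>The following sketch illustrates why conventional metrics carry little information for detectors that return the same label for every item. It assumes that sub-split A contributes 10 authentic and 30 generated abstracts; the function name binary_metrics and the label encoding are illustrative choices, not part of the study.</p>
      <preformat><![CDATA[
```python
def binary_metrics(y_true, y_pred):
    """Accuracy, recall on AI texts, and false-positive rate (human texts flagged as AI).
    Labels: 1 = AI-generated, 0 = human-written."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "ai_recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


# Assumption: 10 authentic and 30 generated abstracts in sub-split A.
y_true = [0] * 10 + [1] * 30

# Behaviour reported for OpenAI's AI Text Classifier: everything flagged as AI.
always_ai = binary_metrics(y_true, [1] * len(y_true))
# Behaviour reported for the remaining services: everything labelled human.
always_human = binary_metrics(y_true, [0] * len(y_true))

print(always_ai)
print(always_human)
```
]]></preformat>
      <p>In both cases the accuracy merely mirrors the class ratio of the test set, while the false-positive rate or the recall on AI texts sits at an extreme, so neither detector separates the two classes at all.</p>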
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>To summarize, we collected a dataset of authentic high school coursework, including both
long-form theses and short assignments, from a public high school in the Czech Republic and
generated their AI alternatives and text continuations using ChatGPT with 4.0, 3.5, and 3.5
Legacy backbones. We make the data publicly available for open-domain research and analyses
at https://www.matyasbohacek.com/topics/ai-education/.</p>
      <p><sup>2</sup> https://contentatscale.ai/ai-content-detector/
<sup>3</sup> https://gptzero.me/
<sup>4</sup> https://platform.openai.com/ai-text-classifier
<sup>5</sup> https://writer.com/ai-content-detector/
<sup>6</sup> https://www.zerogpt.com/</p>
      <p>Through a study involving student peers, we found that ChatGPT can quickly produce
high-school-level coursework that peers consider to be better than human-written text, even in a
low-resourced language like Czech. Moreover, we show that the AI text detectors, which are
slowly rolling out to campuses and educational centers worldwide, fail to identify these texts in
Czech.</p>
      <p>These results should be particularly alarming to educators and legislators who are establishing
AI policies in their own contexts. We therefore call on them to gather relevant data for their specific
language and assignment types before making such decisions. At the same time, providers of AI
text detectors should be more transparent about their models’ performance, training data, and
supported languages.</p>
      <p>For future work, we aim to reproduce the study in various regional contexts while carefully
analyzing the nuanced cases where ChatGPT is successful or unsuccessful. We also plan on
including a group of teachers in addition to more student peer participants.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We would like to thank Dr. Činátlová for her valuable insight and initiative when
communicating with teachers and students at the subject high school, as well as for her many
thought-provoking comments. Additionally, we would like to thank the Progresus TOGETHER
foundation for their generous sponsorship of this research and mobility-associated costs.</p>
      <p>[9] C. Cassidy, Australian universities split on using new tool to detect AI plagiarism, 2023. URL: https://www.theguardian.com/australia-news/2023/apr/16/australian-universities-split-on-using-new-tool-to-detect-ai-plagiarism.</p>
      <p>[10] M. Yang, New York City schools ban AI chatbot that writes essays and answers prompts, 2023. URL: https://www.theguardian.com/us-news/2023/jan/06/new-york-city-schools-ban-ai-chatbot-chatgpt.</p>
      <p>[11] K. Jimenez, “This shouldn’t be a surprise” the education community shares mixed reactions to ChatGPT, 2023. URL: https://eu.usatoday.com/story/news/education/2023/01/30/chatgpt-going-banned-teachers-sound-alarm-new-ai-tech/11069593002/.</p>
      <p>[12] L. Lonas, Plagiarism finder Turnitin adds AI detection amid popularity of ChatGPT, 2023. URL: https://thehill.com/policy/technology/3928562-plagiarism-finder-turnitin-adds-ai-detection-amid-popularity-of-chatgpt/.</p>
      <p>[13] J. Hsu, Plagiarism tool gets a ChatGPT detector – some schools don’t want it, 2023. URL: https://www.newscientist.com/article/2367322-plagiarism-tool-gets-a-chatgpt-detector-some-schools-dont-want-it/.</p>
      <p>[14] V. S. Sadasivan, A. Kumar, S. Balasubramanian, W. Wang, S. Feizi, Can AI-generated text be reliably detected?, ArXiv abs/2303.11156 (2023).</p>
      <p>[15] D. Baidoo-Anu, L. O. Ansah, Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning, SSRN Electronic Journal (2023).</p>
      <p>[16] T. Adiguzel, M. H. Kaya, F. K. Cansu, Revolutionizing education with AI: Exploring the transformative potential of ChatGPT, Contemporary Educational Technology (2023).</p>
      <p>[17] M. Sallam, ChatGPT utility in healthcare education, research, and practice: Systematic review on the promising perspectives and valid concerns, Healthcare 11 (2023).</p>
      <p>[18] F. M. Megahed, Y.-J. Chen, J. A. Ferris, S. Knoth, L. A. Jones-Farmer, How generative AI models such as ChatGPT can be (mis)used in SPC practice, education, and research? An exploratory study, ArXiv abs/2302.10916 (2023).</p>
      <p>[19] M. M. Rahman, Y. Watanobe, ChatGPT for education and research: Opportunities, threats, and strategies, Applied Sciences (2023).</p>
      <p>[20] L. Dugan, D. Ippolito, A. Kirubarajan, S. Shi, C. Callison-Burch, Real or fake text?: Investigating human ability to detect boundaries between human-written and machine-generated text, in: The 37th AAAI Conference on Artificial Intelligence, 2023.</p>
      <p>[21] E. Clark, T. August, S. Serrano, N. Haduong, S. Gururangan, N. A. Smith, All that’s ‘human’ is not gold: Evaluating human evaluation of generated text, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Association for Computational Linguistics, Online, 2021, pp. 7282–7296. URL: https://aclanthology.org/2021.acl-long.565. doi:10.18653/v1/2021.acl-long.565.</p>
      <p>[22] G. Jawahar, M. Abdul-Mageed, L. V. S. Lakshmanan, Automatic detection of machine generated text: A critical survey, in: International Conference on Computational Linguistics, 2020.</p>
      <p>[23] D. Ippolito, D. Duckworth, C. Callison-Burch, D. Eck, Automatic detection of generated text is easiest when humans are fooled, in: Annual Meeting of the Association for Computational Linguistics, 2019.</p>
      <p>[24] S. Gehrmann, H. Strobelt, A. M. Rush, GLTR: Statistical detection and visualization of generated text, in: Annual Meeting of the Association for Computational Linguistics, 2019.</p>
      <p>[25] E. Crothers, N. Japkowicz, H. L. Viktor, Machine generated text: A comprehensive survey of threat models and detection methods, ArXiv abs/2210.07321 (2022).</p>
      <p>[26] W. Liang, M. Yuksekgonul, Y. Mao, E. Wu, J. Y. Zou, GPT detectors are biased against non-native English writers, ArXiv abs/2304.02819 (2023).</p>
      <p>[27] A. Alimardani, E. A. Jane, We pitted ChatGPT against tools for detecting AI-written text, and the results are troubling, 2023. URL: https://theconversation.com/we-pitted-chatgpt-against-tools-for-detecting-ai-written-text-and-the-results-are-troubling-199774.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Prompts</title>
      <p>Keywords: [keywords]</p>
      <sec id="sec-8-1">
        <title>Introduct:</title>
      </sec>
      <sec id="sec-8-2">
        <title>Toto je odborná práce na téma ”[topic]”. Pokračuj v psaní textu: ”</title>
      </sec>
      <sec id="sec-8-3">
        <title>This is a thesis concerning the topic of ”[topic]”. Continue writing the text:</title>
        <p>[portion of the text]
[portion of the text]</p>
      </sec>
      <sec id="sec-8-4">
        <title>Toto je úvod maturitní práce: ”[introduction]”</title>
      </sec>
      <sec id="sec-8-5">
        <title>Toto je závěr maturitní práce: ”[conclusion]”</title>
      </sec>
      <sec id="sec-8-6">
        <title>Napiš abstrakt ve stejném stylu:</title>
      </sec>
      <sec id="sec-8-7">
        <title>Toto je úvod maturitní práce: ”[introduction]”</title>
      </sec>
      <sec id="sec-8-8">
        <title>Toto je závěr maturitní práce: ”[conclusion]”</title>
      </sec>
      <sec id="sec-8-9">
        <title>Napiš krátkou anotaci a klíčová slova:</title>
      </sec>
      <sec id="sec-8-10">
        <title>Toto je zadání úkolu do předmětu [subject] na střední škole: ”[instructions]”. Vypracuj úkol:</title>
      </sec>
      <sec id="sec-8-11">
        <title>This is the introduction of a high school leaving thesis: ”[introduction]”</title>
      </sec>
      <sec id="sec-8-12">
        <title>This is the conclusion of a high school leaving thesis: ”[conclusion]”</title>
      </sec>
      <sec id="sec-8-13">
        <title>Write an abstract in the same style:</title>
      </sec>
      <sec id="sec-8-14">
        <title>This is the introduction of a high school leaving thesis: ”[introduction]”</title>
      </sec>
      <sec id="sec-8-15">
        <title>This is the conclusion of a high school leaving thesis: ”[conclusion]”</title>
      </sec>
      <sec id="sec-8-16">
        <title>Write a short annotation and keywords:</title>
      </sec>
      <sec id="sec-8-17">
        <title>This is an assignment in [subject] class at a high school: ”[instructions]”. Complete the assignment:</title>
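        <p>The prompt templates listed in this appendix could be filled in programmatically along the following lines. This is a minimal sketch, not the procedure used in the study: the function names, the use of straight quotation marks, and the line breaks between the template parts are assumptions, while the Czech wording is copied from the prompts above.</p>
        <preformat><![CDATA[
```python
# Czech templates copied from this appendix; line breaks and quote style are assumptions.
ABSTRACT_PROMPT = (
    'Toto je úvod maturitní práce: "{introduction}"\n'
    'Toto je závěr maturitní práce: "{conclusion}"\n'
    "Napiš abstrakt ve stejném stylu:"
)
ASSIGNMENT_PROMPT = (
    'Toto je zadání úkolu do předmětu {subject} na střední škole: "{instructions}". '
    "Vypracuj úkol:"
)


def build_abstract_prompt(introduction, conclusion):
    """Fill the abstract template with a thesis introduction and conclusion."""
    return ABSTRACT_PROMPT.format(
        introduction=introduction.strip(), conclusion=conclusion.strip()
    )


def build_assignment_prompt(subject, instructions):
    """Fill the short-assignment template with the subject and the task instructions."""
    return ASSIGNMENT_PROMPT.format(subject=subject, instructions=instructions.strip())


prompt = build_abstract_prompt("  (text of the introduction) ", "(text of the conclusion)")
print(prompt)
```
]]></preformat>
        <p>Keeping the templates as constants ensures that the same wording is sent to every model version, so that differences in the outputs can be attributed to the backbone and not to the prompt.</p>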
      </sec>
      <sec id="sec-8-18">
        <title>Pomocí tohoto dotazníku analyzujeme, zda jsou generativní AI modely schopné odpovídat na různé typy úkolů a zda jsou tyto texty rozpoznatelné od těch skutečných, lidsky napsaných.</title>
      </sec>
      <sec id="sec-8-19">
        <title>With this questionnaire, we seek to analyze whether generative AI models are able to complete different kinds of coursework and whether these texts are distinguishable from real, human-written ones.</title>
      </sec>
      <sec id="sec-8-20">
        <title>Níže uvidíte několik verzí abstraktu ke stejné maturitní práci z humanitních studií nebo českého jazyka.</title>
        <p>
U každé práce zodpovězte následující otázky:
1. Které z navrhovaných možností fungují jako
adekvátní abstrakt (tzn. nastiňují předmět a cíl
práce, krátce shrnují obsah, a hlavně navnazují
čtenáře*řku k tomu, aby si celou práci přečetl*la)?
— můžete zvolit libovolný počet odpovědí (tzn. klidně
všechny nebo žádnou)
2. Která z navrhovaných možností je, podle Vás,
pro svůj účel nejvhodnější? — volte právě jednu
možnost</p>
      </sec>
      <sec id="sec-8-21">
        <title>Below, you will be presented with different alternatives for an abstract to accompany graduation theses (from Humanities or Czech language subjects).</title>
        <p>For each thesis, answer the following
questions:
1. Which suggested options work as an adequate
abstract (i.e., outline the topic and aims of the
work, briefly summarize its contents, and, perhaps
most importantly, entice the reader to read the whole thesis)?
You may select any number of options (i.e., including all or
none).
2. Which of the proposed options do you think is
the most suitable for its purpose? You must
select exactly one option.</p>
      </sec>
      <sec id="sec-8-22">
        <title>Které z navrhovaných možností fungují jako adekvátní abstrakt?</title>
      </sec>
      <sec id="sec-8-23">
        <title>Which suggested options work as an adequate abstract?</title>
      </sec>
      <sec id="sec-8-24">
        <title>Která z navrhovaných možností je, podle Vás, pro svůj účel nejvhodnější?</title>
      </sec>
      <sec id="sec-8-25">
        <title>Which of the proposed options do you think is the most suitable for its purpose?</title>
      </sec>
      <sec id="sec-8-26">
        <title>Níže uvidíte několik krátkých úryvků z maturitních prací z humanitních studií nebo českého jazyka. U každého se nachází 4 alternativní pokračování – 1 skutečné (původní), 3 vygenerována pomocí GPT-4.</title>
      </sec>
      <sec id="sec-8-27">
        <title>Vyberte vždy tu variantu, u níž si myslíte, že pochází z původní, člověkem psané práce.</title>
      </sec>
      <sec id="sec-8-28">
        <title>Below, you will be presented with short excerpts from graduation theses (from Humanities or Czech language subjects).</title>
        <p>For each, there are 4 alternative
continuations: 1 real (original) and 3 generated by
GPT-4.</p>
      </sec>
      <sec id="sec-8-29">
        <title>For each thesis, select the variant you think comes from the original, human-written work.</title>
      </sec>
      <sec id="sec-8-30">
        <title>Která z navrhovaných možností, podle Vás, pochází z původní, člověkem psané práce?</title>
      </sec>
      <sec id="sec-8-31">
        <title>Which of the proposed options do you think comes from the original, human-written work?</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. U.</given-names>
            <surname>Haque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dharmadasa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. T.</given-names>
            <surname>Sworna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. N.</given-names>
            <surname>Rajapakse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <article-title>“I think this is the most disruptive technology”: Exploring sentiments of ChatGPT early adopters using Twitter data</article-title>
          ,
          <source>ArXiv</source>
          abs/2212.05856 (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Nuzula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Amri</surname>
          </string-name>
          ,
          <article-title>Will chatgpt bring a new paradigm to hr world? a critical opinion article</article-title>
          ,
          <source>Journal of Management Studies and Development</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Joublin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ceravola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Deigmoeller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gienger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franzius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eggert</surname>
          </string-name>
          ,
          <article-title>A glimpse in chatgpt capabilities and its impact for ai research</article-title>
          ,
          <source>ArXiv</source>
          abs/2305.06087 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Evaluating chatgpt's information extraction capabilities: An assessment of performance, explainability, calibration, and faithfulness</article-title>
          ,
          <source>arXiv preprint arXiv:2304.11633</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Whittle</surname>
          </string-name>
          ,
          <article-title>Towards responsible ai in the era of chatgpt: A reference architecture for designing foundation model-based ai systems</article-title>
          ,
          <source>ArXiv</source>
          abs/2304.11090 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] OpenAI,
          <article-title>GPT-4 technical report</article-title>
          ,
          <source>ArXiv</source>
          abs/2303.08774 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Kelly</surname>
          </string-name>
          ,
          <article-title>“Everybody is cheating”: Why this teacher has adopted an open ChatGPT policy</article-title>
          ,
          <year>2023</year>
          . URL: https://www.npr.org/2023/01/26/1151499213/chatgpt-ai-education-cheating-classroom-wharton-school.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. C. C.</given-names>
            <surname>Board</surname>
          </string-name>
          ,
          <article-title>2022-23 guidance for artificial intelligence tools and other services</article-title>
          , ???? URL: https://apcentral.collegeboard.org/exam-administration-ordering-scores/administering-exams/preparing-for-exam-day/exam-security/artificial-intelligence-tools.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>