=Paper=
{{Paper
|id=Vol-3605/7
|storemode=property
|title=Using ChatGPT to Enhance Students’ Behavior in Social Media via the Moral Foundation Theory (short paper)
|pdfUrl=https://ceur-ws.org/Vol-3605/7.pdf
|volume=Vol-3605
|authors=Daniele Schicchi,Apoorva Upadhyaya,Marco Fisichella,Davide Taibi
|dblpUrl=https://dblp.org/rec/conf/aixedu/SchicchiUF023
}}
==Using ChatGPT to Enhance Students’ Behavior in Social Media via the Moral Foundation Theory (short paper)==
Daniele Schicchi1,∗,†, Apoorva Upadhyaya2,∗,†, Marco Fisichella2 and Davide Taibi1

1 Institute for Education Technology, National Research Council of Italy, Palermo, Italy
2 L3S Research Center, Leibniz University, Hannover, Germany

Abstract

Social media poses numerous dangers, such as the spread of toxic content, hate speech, false information, and moral outrage. In particular, the propagation of moral outrage on social media can result in harmful outcomes like promoting conspiracy theories, violent protests, and political polarization. Teenagers frequently use social media and are especially susceptible to these threats, being exposed to several risks that can impact their lives. Recent studies have confirmed that a combination of human and AI efforts can be highly effective in several domains. In this study, we examine the potential and efficacy of human-AI collaboration in detecting morality in online posts. To this end, we conducted a pilot study with teenagers to determine their ability to recognize moral content in online posts. We then exploited prompt engineering to program ChatGPT to recognize morality and assessed its potential as an intelligent tutor supporting students in such a context.

Keywords

Human-AI collaboration, social media platforms, moral classification, ChatGPT

AIxEDU: 1st International Workshop on High-performance Artificial Intelligence Systems in Education, November 06–09, 2023, Rome, Italy
∗ Corresponding author.
† These authors contributed equally.
daniele.schicchi@itd.cnr.it (D. Schicchi); upadhyaya@l3s.de (A. Upadhyaya); mfisichella@l3s.de (M. Fisichella); davide.taibi@itd.cnr.it (D. Taibi)
https://orcid.org/0000-0003-0154-2736 (D. Schicchi); https://orcid.org/0000-0002-5796-8704 (A. Upadhyaya); https://orcid.org/0000-0002-6894-1101 (M. Fisichella); https://orcid.org/0000-0003-0154-2736 (D. Taibi)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction

Social media (SM) platforms are not immune to harmful elements such as online bullying, moral outrage, cyber harassment, and hate speech. These toxic components can lead to severe real-world consequences, such as depression and anxiety, especially for young individuals, who are more susceptible to the effects of such phenomena [1]. Prior research has placed considerable emphasis on moral sentiments when analyzing social media discourse, providing insightful perspectives on significant occurrences such as the practice of social distancing and the spread of misinformation. Evaluating moral sentiments on SM has proven useful for several tasks, such as describing comparative/cooperative attitudes towards the pandemic [2], comprehending public opinions on various societal issues [3], and measuring the social distance between people [4]. Therefore, the influence of moral values cannot be underestimated, as they profoundly affect people’s outlooks and actions. Whether acknowledged or not, these values significantly shape people’s positions and decisions on various issues. As such, it is imperative to scrutinize the prevalent moral sentiments on social media platforms.
This measure is essential in safeguarding young individuals from experiences that may negatively impact their well-being [5]. Recent advances in Artificial Intelligence have paved the way for new methodologies to support education [6, 7]. Recent intelligent systems rely on Large Language Models (LLMs). These systems can answer a wide array of questions across various subject areas, and they are designed to be user-friendly, with natural language interfaces that let users interact with the system conveniently. ChatGPT [8] is a chatbot that harnesses the power of LLMs to generate digital content that is both relevant and accurate with impressive speed. Moreover, it ensures a seamless and efficient communication experience by offering text-based responses to user queries, setting it apart as an advanced and valuable tool for every type of user. ChatGPT can be a valuable tool in education: it can help students understand complex concepts, gather preliminary information on research topics, and support writing tasks by generating outlines and improving sentence structure. Given its high performance, ChatGPT is a strong candidate for enhancing the learning experience, and its use is becoming more frequent.

The main interface for interacting with ChatGPT is a prompt, a set of instructions that programs the LLM, customizing, enhancing, and/or refining its capabilities [9]. A prompt has a significant impact on the behavior of the LLM. It defines the rules and guidelines the system must follow and sets the context for the conversation, indicating to the LLM what information is crucial and what the desired output form and content should be.

In this paper, we aim to analyze the capability of ChatGPT to act as a companion in detecting moral content on social media. To achieve this objective, we analyzed how ChatGPT classifies content when different prompt patterns are used. Using the Moral Foundations Twitter Corpus as the ground truth of our experiment, we evaluated the F1 score, which weighs the effects of precision and recall, to determine ChatGPT’s performance on this task.

The paper is organized as follows: in Section 2 we give an overview of prompt engineering and the use of ChatGPT in education. In Section 3 we describe the method that led to the results analyzed in Section 4. Section 5 discusses the limitations of this study. Finally, in Section 6 we give the final discussion and conclusion.

2. Literature Review

2.1. Importance for Adolescents of Detecting Moral Content on Social Media

Previous literature has applied AI techniques to combat societal issues such as those related to climate change and the communal riots resulting from misinformation and hate speech on social media [10, 11, 12, 13]. The presence of immoral and toxic content in online communication eventually leads to harmful effects, especially on teenagers, such as depression, anxiety disorders, mental illness, and more1. Moreover, a recent survey2 revealed that social media users aged 14-24 engage with hate speech because of its entertainment value. Therefore, it is crucial for teenagers to recognize moral/non-moral content on social media and to exercise caution when consuming such content.

1 https://www.newportacademy.com/resources/well-being/effect-of-social-media-on-teenagers/
2 https://www.medienanstalt-nrw.de/fileadmin/user_upload/NeueWebsite_0120/Themen/Hass/Ergebnisbericht_forsa-Befragung_zu_Hate_Speech_im_Internet_2022.pdf
This prompted us to adopt the moral classification task and conduct a pilot study analyzing teenagers’ and AI’s performance in detecting moral sentiments.

2.2. ChatGPT: Prompt Engineering and Student Assistance

Language models such as ChatGPT exploit prompts to personalize the model’s behavior and shape its outputs. A prompt [9] is defined as a set of instructions that programs the LLM to enhance and refine its capabilities. White et al. [14] review the most important patterns that can be used in the context of LLMs. In our work, we focused on the Interaction and Customization pattern categories. Patterns in the Interaction category allow the user to interact with the LLM in different ways, for example by letting the LLM ask the user questions in order to accomplish a task. The Customization category includes patterns that constrain the output to specific types and structures, such as instructing the LLM to play the role of a specific person.

Although LLMs have shown drawbacks for students, such as cheating, plagiarism, and overreliance [15], the education field has also benefited from their use. ChatGPT [16], currently the best-known LLM, has made significant contributions to helping students improve their competencies [17]. Students have been using ChatGPT to enhance their learning activities by quickly referencing it and utilizing it for self-study [18]. A recent work [19] investigates whether ChatGPT improves users’ moral judgments and decisions. The experiments in that study ask ChatGPT questions such as “whether it is right to sacrifice one person’s life to save those of five others”, along with questions on issues such as abortion, and show that ChatGPT corrupts rather than improves the moral judgment of its users. These works motivated us to investigate whether ChatGPT can support students in classifying moral content on social media platforms.

3. Methods

3.1. Dataset

This study uses the publicly available Moral Foundations Twitter (MFT) corpus [20]. Each tweet was annotated by at least three trained annotators, with labels assigned by majority vote (agreement ≥ 50%). Following Moral Foundation Theory [21], the annotators considered morality as a set of vices and virtues, namely care–harm, fairness–cheating, loyalty–betrayal, authority–subversion, and purity–degradation, and annotated each tweet with the vices/virtues present or as non-moral. Specifically, each tweet was assigned a label indicating the presence or absence of each virtue and vice, or a label indicating that the tweet was non-moral, resulting in a set of 11 labels per tweet. However, to simplify the study and follow existing work [22], we treat the moral task as a binary classification task, where each tweet belongs either to the moral class (i.e., it falls into any moral vice/virtue category) or to the non-moral class. As a result, every tweet in the dataset falls into one of two classes: moral or non-moral. We consider these moral/non-moral labels as ground truth and use them to evaluate the performance of students and ChatGPT on the binary moral classification task. Due to time constraints, we randomly selected 200 tweets from the publicly available MFT corpus and used these for the experiments involving ChatGPT in our research.
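For concreteness, the binarization and sampling steps above can be sketched as follows. This is a minimal illustration, not the authors’ code: the file name, column names, and 0/1 encoding are hypothetical placeholders, and the released MFT corpus has its own schema to which this would need to be adapted.

```python
# Minimal sketch of collapsing the 11 MFT labels into moral/non-moral
# and sampling 200 tweets. All column/file names are assumptions.
import pandas as pd

# Hypothetical layout: one row per tweet, one 0/1 column per vice/virtue.
MFT_LABELS = [
    "care", "harm", "fairness", "cheating", "loyalty", "betrayal",
    "authority", "subversion", "purity", "degradation",
]

df = pd.read_csv("mft_corpus.csv")  # placeholder path

# A tweet is "moral" if any vice/virtue label is present, else "non-moral".
df["binary_label"] = df[MFT_LABELS].any(axis=1).map(
    {True: "moral", False: "non-moral"}
)

# Random subsample of 200 tweets for the ChatGPT experiments.
sample = df.sample(n=200, random_state=42)
```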
In addition, to compare students’ performance with each other and with ChatGPT, we use a subset of 15 tweets for the binary moral classification task (moral/non-moral) as part of the preliminary research for the pilot study, in order to better understand the performance in qualitative terms.

3.2. Large Language Model: ChatGPT

We use the ChatGPT (GPT-3.5-Turbo) model via the Azure OpenAI Service3 to classify the tweet texts. Our current study examines ChatGPT’s performance only in a zero-shot setting, to explore its ability without training; we plan to analyze ChatGPT in one-shot and few-shot settings as part of our future research. Moreover, we utilized mixed prompt patterns, taking advantage of the ability of LLMs to mimic different personas and of the capacity to adjust prompts by gathering more information for improved results. In detail, we leveraged the Persona Pattern and the Flipped Interaction Pattern. The Persona Pattern (PP) allows users to generate personalized outputs from a specific perspective: it gives the LLM a specific character that guides its output and prioritizes details. We also used the Flipped Interaction Pattern (FIP) to manage the conversation with ChatGPT and arrive at the best prompt for our goal: after the user provides a brief description of the problem, the LLM takes charge of the conversation by asking specific questions that assist in creating an effective prompt capable of achieving the desired task.

3 https://learn.microsoft.com/en-us/azure/ai-services/openai/overview

3.3. Pilot Study Framework

We conducted a pilot study involving 67 adolescents from a secondary school. Each participant was assigned a unique identifier and provided with a link for direct access to the platform. The platform was designed as a basic user interface showing the tweet text together with moral/non-moral radio buttons, allowing each participant to provide their labels for the binary moral classification task. Each student was given the same set of 15 tweets, presented in random order, to classify as moral/non-moral content. To protect each individual’s privacy, we did not capture any personal data. Furthermore, all students were asked to provide informed consent to participate, in agreement with the Data Protection Officer of our University.

4. Results

We first analyzed ChatGPT’s results on the 200-tweet dataset under different contexts. We then compared the performance of students and ChatGPT on the 15 tweets as part of the pilot study. We use accuracy and the macro variants of precision, recall, and F1 score as performance metrics.

4.1. Performance of ChatGPT Based on Different Contexts (Roles/Personas)

From Table 1, it can be seen that ChatGPT’s performance differs when different roles and personas are specified in its prompt.
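To make the setup concrete, the following is a minimal sketch of persona-conditioned zero-shot classification followed by macro-averaged scoring. It assumes the openai v1 Python client against an Azure OpenAI deployment; the endpoint, deployment name, prompt wording, and the tweets/y_true variables are our own illustrative assumptions, not the exact prompt or code used in this study.

```python
# Sketch: zero-shot moral/non-moral classification with a persona prompt,
# then macro-averaged evaluation against the MFT ground truth.
from openai import AzureOpenAI
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-key>",                                       # placeholder
    api_version="2024-02-01",
)

def classify(tweet: str, persona: str = "a tutor for adults") -> str:
    """Ask the model for a one-word moral/non-moral judgement."""
    response = client.chat.completions.create(
        model="gpt-35-turbo",  # Azure deployment name (assumption)
        messages=[
            {"role": "system",
             "content": f"You are {persona}. Classify each tweet as expressing "
                        "moral content or not. Answer with exactly one word: "
                        "'moral' or 'non-moral'."},
            {"role": "user", "content": tweet},
        ],
        temperature=0,  # deterministic output for evaluation
    )
    return response.choices[0].message.content.strip().lower()

# `tweets` and `y_true` are assumed: the sampled texts and their gold labels.
y_pred = [classify(t) for t in tweets]
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
acc = accuracy_score(y_true, y_pred)
```

Swapping the persona string (e.g., "a tutor for kids", "a teenager") would reproduce the role variations compared in Table 1.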
Table 1: Performance comparison of ChatGPT with different contexts (roles/personas) [200-tweet dataset]

ChatGPT’s context (ChatGPT acts as)   Precision   Recall   F1 score   Accuracy
Trained AI language model             75.37       73.62    73.87      74.88
Tutor for kids                        74.60       74.74    74.65      74.87
Tutor for teenagers                   73.55       73.60    73.57      73.84
Tutor for adults                      76.69       76.48    76.57      76.92
Teenager                              76.12       74.89    75.14      75.89

Table 2: Performance comparison of ChatGPT and pilot-study students [15-tweet dataset]

Entity                                Precision     Recall        F1 score      Accuracy
ChatGPT as trained AI model           72.50         68.75         68.89         71.42
ChatGPT acting as teenager            65.15         60.42         59.06         64.28
All 67 students (avg ± std. dev.)     61.92±13.27   60.69±11.96   59.72±12.51   61.04±11.82
Best-performing student               86.60         86.60         86.60         86.66
Worst-performing student              33.33         33.92         33.05         33.34

It is observed that ChatGPT performs best when it acts as a tutor for adults, a 3.05% relative improvement in F1 score over the average of the other scenarios. We also noticed that ChatGPT does not perform well enough in its roles as a tutor for children and adolescents, with F1 scores of 74.65 and 73.57 respectively, even though these age groups need guidance the most and are the main focus groups for education on the morality/non-morality of content. Since ChatGPT’s ability to identify moral content varies across roles and is weakest for the roles aimed at children and young people, this suggests a future research direction: examining ChatGPT’s generated explanations for significant differences in how they help kids and teenagers understand moral/non-moral content.

4.2. Pilot Study Analysis: Performance Comparison of Students and ChatGPT

Table 2 shows that ChatGPT outperforms the average student, with relative accuracy improvements of 17.00% as the trained AI model and 5.31% as a teenager. The student with 86.66 accuracy performs best among all students and ChatGPT, while 14 out of 67 students (accuracy > 71.42) outperform ChatGPT’s best performance, indicating that some students are very knowledgeable about morality. The lowest-performing student scores only 33.34 accuracy, and 53.73% of students (36 out of 67) achieve less accuracy than the mean value, suggesting that sufficient guidance is urgently needed to help these teens grasp the overall concept of moral/non-moral content.

5. Limitations

In this study, we aim to investigate the extent to which ChatGPT can help students understand moral content on social media. However, our current work has certain limitations. Because of the short length of tweets, students and ChatGPT may have trouble identifying whether a tweet contains moral or non-moral content. Better training of students on moral content, and a few-shot setting for ChatGPT, could be beneficial to improve the performance of both in the classification task. While we use the GPT-3.5-Turbo model for the experiments, an investigation with a newer version of ChatGPT, such as GPT-4, or with other more advanced LLMs could reveal a better ability of LLMs to classify moral content. Time constraints limited the number of students and the dataset size for the pilot study; we plan to use a larger number of informative tweets from the ground-truth dataset to evaluate the performance of LLMs in identifying moral content.
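As a pointer for the few-shot direction mentioned above, the sketch below shows one common way to set it up: the same zero-shot request, but with a handful of labelled examples prepended as prior chat turns. The example tweets and labels are invented placeholders, and the message list would be passed to the client from the earlier sketch.

```python
# Sketch of a few-shot variant: demonstrate the task with labelled examples
# before asking about the target tweet. Examples below are invented, not
# drawn from the MFT corpus.
FEW_SHOT_EXAMPLES = [
    ("We must protect the vulnerable from this cruelty.", "moral"),
    ("Just tried the new pizza place downtown, highly recommend!", "non-moral"),
]

def few_shot_messages(tweet: str, persona: str = "a tutor for teenagers") -> list[dict]:
    """Build a chat history that demonstrates the task before the real query."""
    messages = [{
        "role": "system",
        "content": f"You are {persona}. Classify each tweet as expressing moral "
                   "content or not. Answer with exactly one word: 'moral' or 'non-moral'.",
    }]
    for example_text, example_label in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": example_text})
        messages.append({"role": "assistant", "content": example_label})
    messages.append({"role": "user", "content": tweet})
    return messages
```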
6. Discussion and Conclusion

In this research, we conducted a preliminary study whose main objective is to investigate the capability of ChatGPT to guide students, especially children and adolescents, in identifying moral/non-moral content disseminated on social media platforms such as Twitter. We first analyzed the performance of ChatGPT under different roles and personas. We then conducted a pilot study in which teenage students categorized tweets as moral/non-moral content. Our preliminary results suggest that ChatGPT performs better when it behaves as a tutor for adults than as a tutor for children and teenagers. Moreover, we find that ChatGPT outperforms the average student when acting as a trained AI model or as a teenager, but 20.89% of students still perform better than ChatGPT’s best results. At the same time, a large proportion of students (53.73%) perform worse than the average score across all students. These results prompted us to analyze in more detail how students and ChatGPT understand the moral and non-moral labels separately. This first round of analysis has also led us to explore the explanations ChatGPT gives when acting as a tutor for children or young people, to gain a deeper understanding of the impact of different personas on ChatGPT’s evaluations. We will consider this as part of our future research. Hence, in this study we perform investigations from a higher-level perspective, and we will look into in-depth details in our future work.

References

[1] S. Kaur, K. Kaur, R. Verma, et al., Impact of social media on mental health of adolescents, Journal of Pharmaceutical Negative Results (2022) 779–783.
[2] O. Araque, K. Kalimeri, L. Gatti, Sentiment and moral narratives during covid-19 (2020).
[3] R. Rezapour, L. Dinh, J. Diesner, Incorporating the measurement of moral foundations theory into analyzing stances on controversial topics, in: Proceedings of the 32nd ACM Conference on Hypertext and Social Media, 2021, pp. 177–188.
[4] M. Dehghani, K. Johnson, J. Hoover, E. Sagi, J. Garten, N. J. Parmar, S. Vaisey, R. Iliev, J. Graham, Purity homophily in social networks, Journal of Experimental Psychology: General 145 (2016) 366.
[5] D. Taibi, G. Fulantelli, V. Monteleone, D. Schicchi, L. Scifo, An innovative platform to promote social media literacy in school contexts, in: ECEL 2021 20th European Conference on e-Learning, Academic Conferences International Limited, 2021, p. 460.
[6] C. Limongelli, D. Schicchi, D. Taibi, Enriching didactic similarity measures of concept maps by a deep learning based approach, in: 2021 25th International Conference Information Visualisation (IV), IEEE, 2021, pp. 261–266.
[7] D. Schicchi, B. Marino, D. Taibi, Exploring learning analytics on YouTube: a tool to support students’ interactions analysis, in: Proceedings of the 22nd International Conference on Computer Systems and Technologies, 2021, pp. 207–211.
[8] OpenAI, 2023. URL: https://chat.openai.com/.
[9] P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, G. Neubig, Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing, ACM Computing Surveys 55 (2023) 1–35.
[10] A. Upadhyaya, M. Fisichella, W. Nejdl, A multi-task model for emotion and offensive aided stance detection of climate change tweets, in: Proceedings of the ACM Web Conference 2023, 2023, pp. 3948–3958.
[11] A. Upadhyaya, M. Fisichella, W. Nejdl, A multi-task model for sentiment aided stance detection of climate change tweets, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 17, 2023, pp. 854–865.
[12] A. Upadhyaya, M. Fisichella, W. Nejdl, Towards sentiment and temporal aided stance detection of climate change tweets, Information Processing & Management 60 (2023).
[13] A. Upadhyaya, M. Fisichella, W. Nejdl, Intensity-valued emotions help stance detection of climate change twitter data, in: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, IJCAI 2023, ijcai.org, 2023, pp. 6246–6254.
[14] J. White, Q. Fu, S. Hays, M. Sandborn, C. Olea, H. Gilbert, A. Elnashar, J. Spencer-Smith, D. C. Schmidt, A prompt pattern catalog to enhance prompt engineering with ChatGPT, arXiv preprint arXiv:2302.11382 (2023).
[15] A. Tlili, B. Shehata, M. A. Adarkwah, A. Bozkurt, D. T. Hickey, R. Huang, B. Agyemang, What if the devil is my guardian angel: ChatGPT as a case study of using chatbots in education, Smart Learning Environments 10 (2023) 15.
[16] N. Fijačko, L. Gosak, G. Štiglic, C. T. Picard, M. J. Douma, Can ChatGPT pass the life support exams without entering the American Heart Association course?, Resuscitation 185 (2023).
[17] C. K. Lo, What is the impact of ChatGPT on education? A rapid review of the literature, Education Sciences 13 (2023) 410.
[18] S. Nisar, M. S. Aslam, Is ChatGPT a good tool for T&CM students in studying pharmacology?, Available at SSRN 4324310 (2023).
[19] S. Krügel, A. Ostermaier, M. Uhl, ChatGPT’s inconsistent moral advice influences users’ judgment, Scientific Reports 13 (2023) 4569.
[20] J. Hoover, G. Portillo-Wightman, L. Yeh, S. Havaldar, A. M. Davani, Y. Lin, B. Kennedy, M. Atari, Z. Kamel, M. Mendlen, G. Moreno, C. Park, T. E. Chang, J. Chin, C. Leong, J. Y. Leung, A. Mirinjian, M. Dehghani, Moral foundations twitter corpus: A collection of 35k tweets annotated for moral sentiment, Social Psychological and Personality Science 11 (2020) 1057–1071.
[21] J. Graham, J. Haidt, S. Koleva, M. Motyl, R. Iyer, S. P. Wojcik, P. H. Ditto, Chapter two - Moral foundations theory: The pragmatic validity of moral pluralism, volume 47 of Advances in Experimental Social Psychology, Academic Press, 2013, pp. 55–130.
[22] J. W. Burton, N. Cruz, U. Hahn, Reconsidering evidence of moral contagion in online social networks, Nature Human Behaviour 5 (2021) 1629–1635.