<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Collaborative AI for Qualitative Analysis: Bridging AI and Human Expertise for Scalable Analysis⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Grace C. Lin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emma Anderson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carúmey Stevens</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brandon Hanks</string-name>
          <email>bhanks@mit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Disha Chauhan</string-name>
          <email>disha31@mit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amelia Farid</string-name>
          <email>mfarid@ucmerced.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mic Fenech</string-name>
          <email>mfenech@gardner-webb.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Klopfer</string-name>
          <email>klopfer@mit.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Massachusetts Institute of Technology</institution>
          ,
          <addr-line>Cambridge, MA 02139</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Qualitative research in learning settings often faces challenges of time consumption and iterative refinement. To address these issues, we developed CAILA (Collaborative AI for Learning and Analysis), a novel AI-assisted system designed to support researchers in thematic analysis and address challenges such as thematic saturation. Using a GPT-based model with adjustable parameters and stopping criteria, CAILA aids researchers in generating and refining themes efficiently while preserving the rigor of human oversight. Notably, CAILA's stopping criterion (three consecutive iterations with no new themes generated) ensures a balance between thoroughness and efficiency. We evaluated the CAILA tool by comparing the analysis of a set of student conversations (146 utterances) using CAILA with the thematic analysis conducted by two human researchers. While the human+CAILA approach found themes directly answering the question posed, the humans-only approach refined the research question, a staple in qualitative research. We discuss the implications of using AI-powered qualitative analytic tools.</p>
      </abstract>
      <kwd-group>
        <kwd>Qualitative methods</kwd>
        <kwd>Education/learning</kwd>
        <kwd>Conversation analysis with AI</kwd>
        <kwd>Collaborative and social computing ~ Collaborative and social computing design and evaluation methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Open up a book on qualitative research (e.g., [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), and a plethora of methods will meet your eyes. It
is said that there are as many methods of qualitative research as there are qualitative researchers;
after all, the researcher is considered an instrument in the research [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Yet, the field is
surprisingly unified on one matter. Ask any qualitative researcher about the pain points of their work,
and undoubtedly the dominant answer will be the onerous burden and tremendous time it takes to
iteratively code through page after page of text, whether from interview transcripts,
observation notes, online forum discussions, or even actual exchanges of text messages (see [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]).
      </p>
      <p>This paper introduces a large language model (LLM)-based qualitative analysis tool named
Collaborative Artificial Intelligence for Learning and Analysis (CAILA) meant to support and
alleviate the time burden of qualitative analysis. Additionally, we aim to explore how analysis with
tools such as CAILA differs from traditional (humans-only) methods of qualitative analysis.</p>
      <sec id="sec-1-1">
        <title>1.1. Positionality statements</title>
        <p>As the researchers are the tools through which data is analyzed in qualitative research, it is
imperative to know the researchers’ stance. Here, we present the positionality statements of the first
two authors GL and EA, who led the human+CAILA and the traditional humans-only approaches,
respectively.</p>
        <p>GL conducts mixed methods research and thinks very deeply about methods and methodology to
the point that some colleagues consider her a methodologist. While she does not typically enjoy
labels placed upon her, she does see their utility in demonstrating her approach to research. For
example, while she conducts mostly qualitative research (or mixed methods with more emphasis on
the qualitative nature of the work) at MIT, the position she holds at Harvard University is one of
Lecturer in Quantitative Psychology. She is comfortable transforming qualitative data into
quantitative ones for further analysis but at the same time realizes the limitations with such
approaches. Because of the divide she sees between the two worlds she walks, she at times tries to
bridge the gap. In this work, she is the main researcher using the LLM tool to explore its
potential.</p>
        <p>EA is also a mixed methods researcher. However, unlike GL, EA has a much deeper leaning
toward the qualitative side, with her undergraduate degree in Anthropology forming a foundation for how
she approaches exploring human interactions. Over the last seven years she has primarily worked
on qualitative research projects from deep analysis of classroom observational data, to interaction
analysis of video data, to interview studies all in an attempt to understand how and in what ways
learning is taking place. EA feels that by digging into what individuals are saying and the actions
they are taking, we can better understand how learning takes place and how to better support
it.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Using the newest technologies and techniques to analyze text is nothing new. Natural language
processing (NLP) was born over half a century ago in an effort to automatically translate languages
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Since its development in the late 1940s, researchers have continued to develop and use
associated techniques for text analysis. For example, topic modeling uses text mining and
unsupervised learning to extract key terms and topics represented in the document (see [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]). A
number of algorithms have been developed for topic modeling, such as latent Dirichlet allocation (LDA;
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) or latent semantic analysis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. These techniques have sped up researchers’ ability
to process texts and apply statistical modeling to text-based data (e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]).
      </p>
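      <p>As a minimal illustration of the classic text-analysis techniques mentioned above, the following pure-Python sketch computes term frequency-inverse document frequency (tf-idf) scores, which weight terms that are frequent in one document but rare across the corpus. It is a simplified teaching example under our own assumptions, not the implementation used by any of the cited tools.</p>

```python
import math
from collections import Counter

def tf_idf(docs: list[str]) -> list[dict]:
    """Score each term in each document by tf-idf: terms frequent in one
    document but rare across the corpus receive high scores."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append({term: (count / len(doc)) * math.log(n_docs / df[term])
                       for term, count in tf.items()})
    return scores
```

      <p>A term such as “data” that appears in every document receives an idf of log(1) = 0, while a term unique to one document is weighted up, which is the behavior topic-modeling pipelines exploit.</p>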
      <sec id="sec-2-1">
        <title>2.1. AI for qualitative analysis</title>
        <p>
          Though the use of emerging technology for analysis has been around for decades, the use of
generative AI for qualitative analysis was still “in the state of being born” [6, p. 999]. In this very
nascent stage, a number of research teams have started exploring techniques and processes ranging
from deductive coding (e.g., [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]) to thematic analysis [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], to the development of a computational
grounded theory framework [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-1-1">
        <title>2.1.1. Methodological and ethical considerations</title>
        <p>
          Computational Grounded Theory (CGT) is one of the integral human-computer interaction (HCI)
theories for Generative AI use in qualitative analysis. Specifically, CGT leverages the unique
contributions of both the human analyst and computer through an iterative three-step process. It
first conducts pattern detection through NLP and machine learning algorithms. This is followed by
“pattern refinement,” in which human analysts interpret patterns that AI detected. The final step,
“pattern confirmation,” aims to ensure the patterns detected and interpreted are applicable
throughout the entire dataset [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Tschisgale and colleagues (2023) applied CGT in a (physics)
educational research setting and found that CGT promotes efficiency by enabling researchers to
analyze large amounts of unstructured qualitative data at a faster pace. Moreover, it increases the
rigor of qualitative research, as the findings are more easily reproducible because they are encoded in the
trained ML model [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          These findings shed light on “mutual learning” frameworks which consider Generative AI as both
a tool and partner to human researchers in the qualitative analysis process for data synthesis and
codebook creation [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Specifically, Barany and colleagues (2024) demonstrated this in an
experimental study with multiple conditions including coding with human analysts only and
ChatGPT only as well as collaboration between the two [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. This study revealed that the hybrid
conditions in which the computer and human coded collaboratively (in codebook development or
refinement) resulted in the highest utility ratings, conceptual overlaps, and inter-rater reliability.
This research highlights the need for human participation in the analysis process: the
ChatGPT-only condition, a fully automated approach, was an outlier relative to the
human-AI partnered approaches, producing errors, inconsistencies, and missed themes. The caution
against overreliance on AI-powered analysis methods was echoed by researchers who developed
LLooM, a concept induction algorithm that leverages LLMs to derive meaningful concepts from
unstructured text [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Lam and colleagues warn that heavy reliance on LLooM outputs can result in
gaps and misses in the concepts generated.
        </p>
        <p>
          Both Lam’s and Barany’s findings connect to research by Christou (2023) that warns against the
overdependence on Generative AI in analyses [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. While Generative AI has clear benefits for
qualitative research analyses, it is important to consider and mitigate the biases and limitations LLMs
and other algorithms and models are well-known to have [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Christou has conducted research
to begin to address the gap in critical perspectives and practical methodological guidelines regarding
AI use in qualitative research analyses. In these guidelines, Christou emphasizes the importance of:
(1) familiarity with one’s dataset to be able to identify biases, (2) transparency around AI usage and
its limitations in analyses, and (3) cross-referencing to ensure accuracy and validate
AI-generated insights through triangulation. These suggestions of best practices keep humans in the
loop to mitigate some of the potential biases that LLMs may produce as partners or tools in
qualitative analyses.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.1.2. Supportive tools</title>
        <p>
          Along with the frameworks, guidelines, and recommendations, the development of tools and software
is keeping pace. Makers of commercially available qualitative analysis software that have partnered with
OpenAI (e.g., Atlas.ti and MAXQDA) have touted the integration of AI to help with the analytic
process and reduce time [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. However, when our research team tried to use these tools, we found
that they were lacking in flexibility and transparency. In terms of flexibility, we could not rerun the
AI support or easily iterate on the codes it found. In terms of transparency, it was difficult to
determine which parts of the data the AI was using to support the code it identified. Wanting greater
insight and control over the process, we concluded that these off-the-shelf tools did not allow us
to conduct qualitative research in a way that we felt honored our methodology.
        </p>
        <p>
          Innovative tools have also come out of the research community. CoAI Coder and CollabCoder,
for example, place the AI as a human collaborator in the process of qualitative coding [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ].
CoAI Coder used classic NLP (e.g., SpacyNLP) models and the Dual Intent Entity Transformer (DIET)
classifier [27]. The more recent CollabCoder integrated LLMs, specifically the GPT-3.5 model. The
purpose of the tool is to enable researchers to develop codes with AI-generated code suggestions and
more quickly resolve any disagreements [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. LLooM, in contrast, is focused on concept
induction. While concept induction is very similar to thematic analysis [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], Lam and colleagues (2024) situated LLooM
as a more advanced tool for extracting high-level concepts, comparing results from the tool to those
from BERTopic [28], [29] as well as large language models alone [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>
          The use of LLMs inevitably requires feeding preprompts to the large language model, and the
integration of AI in the qualitative analysis process requires users to know how to write the
preprompts and trust the AI in the process [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Therefore, the evaluation of the preprompt and the
responses that result from the preprompts is essential. ChainForge enables the comparison of
multiple LLMs in how they make sense of the data [30], [31]. In fact, researchers have combined
ChainForge’s capabilities with classic NLP techniques (e.g., term frequency-inverse document frequency [32])
as well as a novel Positional Diction Clustering (PDC) algorithm to make sense of text data
[33].
        </p>
        <p>
          As using LLMs for qualitative analysis is still in its infancy, no technique has been established as
the go-to. For the most part, the studies are concerned with speeding up the process of analysis,
using LLMs, for example, to extract the codes before going into the next stage of
deriving the themes (e.g., [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]). In general, there is still a tendency to stick with
previously established protocols for manual qualitative codebook development. This approach may
be due to researchers’ understandable concerns about adhering to guidelines that ensure researchers’
analytic control and cognitive input [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] as well as following well-defined steps for analysis (e.g.,
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]) in an otherwise subjective and fluid research methodology.
        </p>
        <p>
          We present a slightly altered analytic order through CAILA, where human coders are still in the
loop throughout the process but the themes are extracted without actively engaging in the first cycle
of initial coding [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In essence, we skipped the development of the codebook with the LLMs, but
aimed to see if we could still arrive at themes similar to those derived from the traditional manual
coding process. Furthermore, instead of taking for granted that LLMs are acceptable tools for
qualitative analysis and comparing various models (as ChainForge does), we take a step back to ask:
“How do the process and outcomes of thematic analysis using a generative AI-powered tool
differ from those of traditional (human) analysis?”
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>3. Collaborative AI for Learning and Analysis (CAILA) and its Process</title>
        <p>
          Collaborative AI for Learning and Analysis (CAILA) is an LLM-powered system meant to
support qualitative analysis. Its current capabilities are limited to inductive approaches such as
thematic analysis. (See Appendix A for the current user interface and descriptions of how to use the
tool.) We use the term “CAILyze” as a verb to indicate the process of using CAILA to analyze data.
In other words, CAILyze is our approach to using LLMs to support the analysis process. Similar to
Zhang et al.’s [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] framework, the process first involves cleaning the conversation into transcript
files. The file is fed into the system, which is connected to an LLM such as ChatGPT. On the backend,
the system is primed with a number of preprompts. One preprompt sets the input and output
expectations (e.g., “I’m going to give you a set of data from student group discussions” and “I want
you to generate themes that would answer the questions I pose. Give the output in a spreadsheet
format, containing the theme, the description and explanation of each theme”). The other allows the
user to input their question of interest. Once the preprompts are set and the data is entered into the
system, the user can run the program to “CAILyze” the data. The result of the CAILyze process
should then be inspected by the user.
        </p>
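        <p>To make the preprompt setup above concrete, the sketch below assembles the two preprompts and a transcript into a chat-style request and parses a tabular theme response. This is an illustrative sketch only: the function names, message layout, and the pipe-delimited output format are our assumptions, not CAILA's actual implementation.</p>

```python
# Hypothetical sketch of the CAILyze preprompt setup; wording of the
# system preprompt follows the examples quoted in the text.
SYSTEM_PREPROMPT = (
    "I'm going to give you a set of data from student group discussions. "
    "I want you to generate themes that would answer the questions I pose. "
    "Give the output in a spreadsheet format, containing the theme, the "
    "description and explanation of each theme."
)

def build_messages(transcript: str, question: str) -> list[dict]:
    """Assemble the chat messages for one CAILyze run: the fixed system
    preprompt plus the user's question of interest and the transcript."""
    return [
        {"role": "system", "content": SYSTEM_PREPROMPT},
        {"role": "user",
         "content": f"Question: {question}\n\nData:\n{transcript}"},
    ]

def parse_theme_table(response_text: str) -> list[dict]:
    """Parse a pipe-delimited 'theme | description | explanation' table
    (an assumed response format) into rows for inspection."""
    themes = []
    for line in response_text.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            themes.append({"theme": parts[0], "description": parts[1],
                           "explanation": parts[2]})
    return themes
```

        <p>In practice, the assembled messages would be sent to an LLM via an API client, and the parsed rows would then be inspected by the researcher as described above.</p>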
        <p>
          In contrast to approaches that focus on immediately editing and modifying the preprompt based on the output,
we recognize that LLM outputs can be ephemeral: even when asked the same question, the
LLM will respond differently each time. Therefore, we encourage a multiple-iteration approach as
demonstrated by Barany et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. However, instead of a fixed 13 iterations, the CAILyze process
uses a stopping criterion common in many neuropsychological measurements. The user will start
with one iteration and check the themes and their descriptions, explanations, and examples. They will
then run the program again. This time, they will check whether any generated themes are repeated
or new. They will continue this process until they reach three consecutive iterations where no new
theme emerges. The stopping criterion allows more flexibility as longer texts may result in more
theme variations across the iterations than shorter texts. Additionally, this stopping criterion also
serves to ensure that thematic saturation [34], [35], [36], [37] has been reached, such that the
researcher can be more reasonably certain that no other codes or themes would emerge. Figure 1
illustrates the CAILyze flow.
        </p>
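        <p>The stopping criterion above can be sketched as a simple loop. The code below is a minimal illustration under our own assumptions: the function names and the theme-set representation are hypothetical, and in CAILA the judgment of whether a theme is new or repeated is made by the human researcher rather than by exact string comparison.</p>

```python
def cailyze(run_iteration, max_iters=50, stop_after=3):
    """Repeat theme generation until `stop_after` consecutive iterations
    yield no new themes (the CAILyze thematic-saturation criterion).

    `run_iteration` is any callable returning a collection of theme
    labels, e.g., one LLM pass over the transcript.
    """
    seen = set()
    consecutive_no_new = 0
    iterations = 0
    while consecutive_no_new < stop_after and iterations < max_iters:
        themes = set(run_iteration())
        new = themes - seen          # themes not observed in prior runs
        seen |= themes
        consecutive_no_new = 0 if new else consecutive_no_new + 1
        iterations += 1
    return seen, iterations
```

        <p>Because the criterion counts consecutive no-new-theme runs rather than fixing the iteration count, longer transcripts that keep producing theme variations simply extend the loop, mirroring the flexibility described above.</p>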
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Method</title>
      <p>We demonstrate the CAILyze process through a case study of texts from a data science workshop
with high school students. The study has been approved by the authors’ institutional review board,
and parental consent and student assent were obtained prior to the start of the study. All names used
in this paper are pseudonyms.</p>
      <sec id="sec-3-1">
        <title>4.1. Context</title>
        <p>The data science workshop was held virtually over a week in February 2024 with nine high school
students, five of whom attended a project-based learning charter school in the American South and
the remaining four went to two urban public schools in a Northeastern city. Eight identified as female
and one as male. The group included five 10th graders, two 11th graders, and two 12th graders. Four
identified as African American or Black, four as White or Caucasian, and one as Asian. In a
presurvey, only one participant reported knowing how to engage in data science. The group included a
mix of students from on-level and honors math classes, with three reporting some level of math
anxiety. Five students typically received mostly A's in math, while the rest reported a mix of A's and
B's.</p>
        <p>In the five-day workshop, students worked in groups to examine data, organize and display
information using Excel and Google sheets (e.g., creating graphs and pivot tables), generate their
own research questions, and present their own findings. See Figure 2 for the workshop plan. The
scenario with which we investigated the CAILyze process occurred on Day 4 of the workshop, when
the students evaluated existing data displays from Our World in Data (https://ourworldindata.org/;
[38]) by explaining what the display showed and whether the information was accurately
represented.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Data source</title>
        <p>The data source was the transcripts of the three student group conversations. With three groups
combined, there were 146 lines of conversation utterances and 2,231 words. As we were interested
in whether and how the students had developed a sense of criticality in judging data
displays, the preprompt we entered into the system was, “generate themes about how students
are critical of the graphs they were investigating.”</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Analytic approach</title>
        <p>
          While we understand that qualitative research results may not be perfectly comparable from one
analysis to another, as researchers come in with their own lived experience that can shape their
analytic lens (see [39]), we believe that illustrating CAILyze with a more traditional manual coding
may lend support to this technique and offer additional insight beyond what other researchers have
already shown (e.g., [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]). Therefore, as a means of comparison, one researcher (GL) analyzed the data
using the CAILyze process, and two other researchers (EA and AF) manually coded the data to
answer the same question. All three researchers hold doctorate degrees in education and are
experienced qualitative researchers. The positionality of the two leading researchers (GL and EA)
can be found in the beginning of this article.
        </p>
        <p>The researcher employing CAILA approached the analysis much like quantitative researchers
handling large secondary datasets. She began by familiarizing herself with the data through an
examination of overall speaker utterances—such as word counts per speaker during the activity—
and network graphs. Notably, during dataset preparation, she also skimmed through the transcripts,
a step that some quantitative researchers might skip. Following this initial phase, she applied the
CAILA system and the CAILyze process to extract the themes, using GPT-4o—configured at its
default temperature—as the LLM.</p>
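      <p>The data-familiarization step described above (examining word counts per speaker) can be approximated with a few lines of code. This is a sketch only; the "Speaker: utterance" transcript line format is an assumption for illustration.</p>

```python
from collections import Counter

def words_per_speaker(transcript_lines):
    """Tally word counts per speaker from 'Speaker: utterance' lines --
    one simple way to get an overview of who talked, and how much."""
    counts = Counter()
    for line in transcript_lines:
        if ":" not in line:
            continue  # skip lines without a speaker label
        speaker, utterance = line.split(":", 1)
        counts[speaker.strip()] += len(utterance.split())
    return counts
```

      <p>Running such a tally over each group's transcript gives the kind of overall speaker-utterance summary the researcher used before applying the CAILyze process.</p>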
        <p>In contrast, EA and AF were involved in data collection and thus possessed an inherent
familiarity with the data from the outset. In this humans-only approach, they used process coding in
their initial cycle and pattern coding [40] in the second cycle to identify the themes. They resolved
any conflicts through social moderation [41].</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Results</title>
      <p>The CAILyze process resulted in 10 iterations. After the first iteration, the human researcher labeled each
generated theme as new or repeated and continued the process. A total of 50 themes were
generated. See Appendix B for the full table of each iteration’s generated themes and
descriptions/explanations. After reaching the stopping criterion, she organized the final 50 themes
into 4 major themes and 6 subthemes (see Table 1). All generated explanations and example quotes
were checked against the raw transcripts to ensure the LLM’s accuracy; the researcher was able to
verify that the LLM did not hallucinate any of the examples. The process took less than two working
days.</p>
      <p>On the other hand, the two other researchers went through two cycles of coding, and the
humans-only manual process resulted in themes beyond the ways students were critical of the data. Instead,
the final themes that emerged captured the progression through which students may demonstrate
criticality. Specifically, five themes emerged through this process with one theme, evaluation,
consisting of four sub-themes (see Table 1). The process took the two researchers over two weeks to
complete.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Discussion</title>
      <p>[Table 1 excerpt, humans-only themes. Progression of Criticality Development: 1. Describe data displays; 2. Ask questions about the displays; 3. Interpret the data display; 4. Evaluate (a. General accuracy; b. Gaps in data; c. Data representation; d. Alternative displays / graphical comparisons); 5. Meta-Discussions.]</p>
      <p>
In this section, we summarize and discuss our findings by incorporating illustrative student quotes
(with all names being pseudonyms). We then situate our results within the broader qualitative
analysis literature to highlight and explain the key differences between the two approaches.</p>
      <p><bold>6.1. Human with CAILA vs. Humans-only</bold></p>
      <p>While the CAILyze version identified only themes that captured the ways students were critical
of data, inspection and discussion of the humans-only version revealed that some of the “codes”
contributing to the themes were conditions that are necessary but insufficient for criticality. That is,
a student must satisfy the condition in order to reach critical thinking, but doing so does not in itself
mean the student is being critical of the data. A prime example is “describing data displays.” Being
able to describe a data display is a necessary precursor to being critical of it. Lea’s statement below
illustrates a thorough description of a data display:
“So the data set that I chose, um, like I said, talks about pandemics like over the years, um,
and it uses circles to represent the death toll because, like, they're looking specifically at the
number of people who died for each of these pandemics.”</p>
      <p>In it, she identified the purpose of the display (pandemics over the years) and the visual elements
used (e.g., circles). Without acquiring the ability to tell different elements of the data visualization
apart, students will not be able to critically assess its soundness. The next phase of criticality is then
to use the data they described and question their accuracy. For example, Mary demonstrated both
the precursor “describing data displays” and critical thinking about data by “evaluating the data
displays” when she commented,
“You could say one of the ways that it accurately portrays data is because it starts from zero
and increases. While some other graphs might start from like a random number that isn’t
zero.”</p>
      <p>If we follow this line of reasoning, we can see that in the humans-only coding process, the
researchers have shifted to answering a question about “what are the conditions that are necessary for
critical thinking around data” and how students approached the data displays, rather than the
original question of how students were critical of data. This development demonstrates the
flexibility in human thinking and coding, and the reflective process in qualitative research practice
as research questions are refined and developed [42]. The refinement of research questions is not
only acceptable, it is integral in qualitative research to ensure the question driving the study is in
line with researchers’ increased understanding of the phenomenon they are investigating [43], [44].
In our case, this shift occurred because the human coders held broader, big-picture objectives to which
the AI system was not privy. That is, the researchers understand that CAILA is ultimately meant to
serve the ultimate purpose of automating analyses and displaying information that is helpful for
teachers as part of a formative assessment of group discussions. Teachers may also be interested in
the progression and development of the criticality that emerged in the data beyond merely the way
that the students are critical of the data. The flexibility allowed the human researchers to expand the
RQ to encompass the information and data (e.g., students are capable of describing the data displays
in detail) that the CAILyze process treated as a given and neglected because it does not directly
answer the question about how students were critical.</p>
      <p>Despite the misalignment of research questions by the two approaches, the themes from the
humans-only approach that are directly related to the criticality surrounding data inspection were
similar to the themes derived with the CAILyze process. The “Evaluate” stage of criticality
progression contained four sub-themes, which were similar to the four main themes focusing only
on the criticality aspects of students’ data inspection. Both humans-only and CAILyzed results, for
example, captured students’ evaluation of data accuracy, (mis)representation, and alternative graph
types.</p>
      <p>
        In sum, the CAILyze process was far faster, which aligns with previous research on
leveraging LLMs for qualitative research [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], [45], and the derived themes were directly
reflective of the original question posed. In contrast, as human researchers coded through the raw
data, the progressions of skills needed to eventually develop criticality emerged as the more
important question. With only 10 iterations under the stopping criterion, the similarities in the
“criticality” themes lend credence to the CAILyze approach.</p>
      <p><bold>6.2. Implications and recommendations</bold></p>
      <p>Our findings also point to the affordances and challenges of using LLMs to assist in qualitative
coding. The model will only take in the questions asked and will not adapt the research question as
it iterates. While incapable of the research question refining process recommended for qualitative
research methodologies such as grounded theory [43], the inflexibility in changing the research
question may align more with researchers with more positivist or post-positivist epistemology [46],
[47], [48]. For qualitative researchers who may be concerned with the inflexibility of the system and
rigid RQ, we suggest that the researchers must have a deeper level of familiarity with the data (e.g.,
they are involved in the data collection or spent time living through the data with deep reading) such
that the question posed to the LLM is already the refined one. Furthermore, they can also repeat the
CAILyze process multiple times with other modified, refined preprompts.
      </p>
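The iterate-until-saturation logic behind the stopping criterion discussed above can be sketched in a few lines. Note that `extract_themes`, the stubbed outputs, and the `patience` parameter below are hypothetical illustrations, not CAILA's actual implementation:

```python
# Hedged sketch of a CAILyze-style saturation loop (not CAILA's real code).
# `extract_themes` stands in for an LLM call that returns theme labels for a
# transcript under a fixed research question; it is stubbed so the loop runs.

def cailyze(transcript, research_question, extract_themes,
            max_iterations=10, patience=2):
    """Repeat theme extraction until no new themes appear for `patience`
    consecutive iterations (thematic saturation) or the iteration cap is
    reached. The research question is never modified by the loop."""
    seen = set()       # themes accumulated across iterations
    stale = 0          # consecutive iterations yielding nothing new
    for i in range(max_iterations):
        themes = set(extract_themes(transcript, research_question, i))
        new = themes - seen
        seen |= new
        stale = 0 if new else stale + 1
        if stale >= patience:      # stopping criterion met
            break
    return sorted(seen)

# Stubbed "LLM" outputs: novelty tapers off, triggering the stop.
outputs = [
    ["accuracy", "simplicity"],
    ["misleading visuals", "accuracy"],
    ["accuracy", "simplicity"],       # nothing new (stale run 1)
    ["misleading visuals"],           # nothing new (stale run 2) -> stop
    ["source reliability"],           # never reached
]

themes = cailyze("transcript text", "How are students critical of data?",
                 lambda t, q, i: outputs[i])
print(themes)  # ['accuracy', 'misleading visuals', 'simplicity']
```

Under this sketch, saturation is operationalized as consecutive iterations that add no new themes; the research question passed to the model stays fixed throughout, which is exactly the inflexibility discussed above.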
    </sec>
    <sec id="sec-6">
      <title>7. Conclusion</title>
      <p>In this paper, we introduced the CAILyze process, adding to the nascent field of AI-assisted
qualitative analysis an approach whose stopping criterion accommodates the ephemeral nature of
LLMs, adapts to varying lengths of conversational transcripts, and ensures thematic saturation. We
demonstrated that the approach produced results similar to those of manual coding and illustrated
that the key difference lies in the human flexibility to adjust and refine the research question while
analyzing the textual data. This difference illustrates the circumstances under which the CAILyze
process may be suitable. Finally, we ended with suggestions for researchers who wish to engage in
qualitative analysis using generative AI.</p>
      <sec id="sec-6-1">
        <title>Acknowledgements</title>
        <p>This work was funded by the Emerson Collective. We would like to thank Beatriz Familia Azevedo
in reading earlier drafts of this paper and all of the high schoolers who participated in the data science
workshop. Thank you also to Mary McCrossan for helping us organize the project page on the
Scheller Teacher Education Program | The Education Arcade’s website.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Declaration on Generative AI</title>
        <p>The work in this paper uses OpenAI’s GPT-4o as the model underlying the qualitative analysis.
That is, generative AI was used in the human + AI qualitative analysis approach presented in this paper.
In preparing this paper, we also used the model to check grammatical accuracy, spelling, and language
clarity. Afterwards, we reviewed and edited the content as needed and take full responsibility for the
publication’s content.</p>
        <p>Human Factors in Computing Systems, Honolulu, HI, USA: ACM, May 2024, pp. 1–29. doi: 10.1145/3613904.3642002.
[27] T. Bunk, D. Varshneya, V. Vlasov, and A. Nichol, “DIET: Lightweight Language Understanding for Dialogue Systems,” May 11, 2020, arXiv: arXiv:2004.09936. doi: 10.48550/arXiv.2004.09936.
[28] M. Grootendorst, “BERTopic: Neural topic modeling with a class-based TF-IDF procedure,” Mar. 11, 2022, arXiv: arXiv:2203.05794. doi: 10.48550/arXiv.2203.05794.
[29] M. Grootendorst et al., MaartenGr/BERTopic: v0.16.3. (Jul. 22, 2024). Zenodo. doi: 10.5281/zenodo.12793147.
[30] I. Arawjo, C. Swoopes, P. Vaithilingam, M. Wattenberg, and E. L. Glassman, “ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing,” in Proceedings of the CHI Conference on Human Factors in Computing Systems, in CHI ’24. New York, NY, USA: Association for Computing Machinery, May 2024, pp. 1–18. doi: 10.1145/3613904.3642016.
[31] I. Arawjo, P. Vaithilingam, M. Wattenberg, and E. Glassman, “ChainForge: An open-source visual programming environment for prompt engineering,” in Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, in UIST ’23 Adjunct. New York, NY, USA: Association for Computing Machinery, Oct. 2023, pp. 1–3. doi: 10.1145/3586182.3616660.
[32] D. Jurafsky and J. H. Martin, “Question Answering, Information Retrieval, and Retrieval-Augmented Generation,” in Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd ed., 2024. [Online]. Available: https://web.stanford.edu/~jurafsky/slp3/14.pdf
[33] K. I. Gero, C. Swoopes, Z. Gu, J. K. Kummerfeld, and E. L. Glassman, “Supporting Sensemaking of Large Language Model Outputs at Scale,” in Proceedings of the CHI Conference on Human Factors in Computing Systems, in CHI ’24. New York, NY, USA: Association for Computing Machinery, May 2024, pp. 1–21. doi: 10.1145/3613904.3642139.
[34] M. Birks and J. Mills, Grounded Theory: A Practical Guide. SAGE, 2015.
[35] S. Rahimi and M. Khatooni, “Saturation in qualitative research: An evolutionary concept analysis,” International Journal of Nursing Studies Advances, vol. 6, p. 100174, Jun. 2024, doi: 10.1016/j.ijnsa.2024.100174.
[36] B. Saunders et al., “Saturation in qualitative research: exploring its conceptualization and operationalization,” Qual Quant, vol. 52, no. 4, pp. 1893–1907, Jul. 2018, doi: 10.1007/s11135-017-0574-8.
[37] C. Urquhart, Grounded Theory for Qualitative Research: A Practical Guide. SAGE Publications Ltd, 2013. doi: 10.4135/9781526402196.
[38] M. Roser, “OWID Homepage,” Our World in Data. Accessed: Sep. 09, 2024. [Online]. Available: https://ourworldindata.org
[39] M. Pownall, “Is replication possible in qualitative research? A response to Makel et al. (2022),” Educational Research and Evaluation, vol. 29, no. 1–2, pp. 104–110, Feb. 2024, doi: 10.1080/13803611.2024.2314526.
[40] M. B. Miles and A. M. Huberman, Qualitative Data Analysis, 2nd ed. Thousand Oaks, CA: SAGE Publications, 1994.
[41] D. W. Shaffer, Quantitative Ethnography. Madison, WI: Cathcart Press, 2017.
[42] J. Agee, “Developing qualitative research questions: a reflective process,” International Journal of Qualitative Studies in Education, Jul. 2009, doi: 10.1080/09518390902736512.
[43] K. Charmaz, Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis. SAGE, 2006.
[44] J. W. Creswell and C. N. Poth, Qualitative Inquiry and Research Design: Choosing Among Five Approaches. SAGE Publications, 2016.
[45] D. L. Morgan, “Exploring the Use of Artificial Intelligence for Qualitative Data Analysis: The Case of ChatGPT,” International Journal of Qualitative Methods, vol. 22, p. 16094069231211248, Jan. 2023, doi: 10.1177/16094069231211248.
[46] J. Armstrong, “Naturalistic Inquiry,” in Encyclopedia of Research Design, vol. 2. Thousand Oaks, CA: SAGE Publications, 2010, pp. 880–885.
[47] J. W. Creswell and J. D. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, 6th ed. SAGE Publications, 2022. Accessed: Sep. 09, 2024. [Online]. Available: https://collegepublishing.sagepub.com/products/research-design-6-270550
[48] D. C. Phillips and N. C. Burbules, Postpositivism and Educational Research. Lanham, MD, US: Rowman &amp; Littlefield, 2000.</p>
      </sec>
      <sec id="sec-6-3">
        <title>A. Current CAILA User Interface</title>
        <p>When the analytic work detailed in this paper was conducted, CAILA was accessible only through a
Jupyter notebook interface. The team (mostly BH and DC) have subsequently made CAILA even
more accessible. This appendix shows the current interface of the system, which is open source and
accessible through github. The link to the github page can also be found on our project page:
https://education.mit.edu/project/collaborative-ai-for-learning-cail/.</p>
        <p>The user clicks on the Upload Data File section, and a pop-up window appears that allows the user
to select a data file.</p>
        <p>The user can also change the research question. The preset information is prefilled with suggestions
such as output format, but the user can also adjust the input and output expectations.</p>
        <p>Once the file is uploaded, the Upload Data File section will turn green and change to “File Uploaded.”
Click the CAILyze button at the bottom to have CAILA analyze the data.</p>
        <p>If the user forgets to include the API key, they will get a reminder to fill out the field:
We are in the process of enabling selection of different models, and the new interface with
model selection looks something like this:</p>
        <p>After clicking on the “CAILyze!” button, a message will appear indicating that the system is
processing:
The results will appear as a table on the screen:
The user can also download the output as a CSV file:
Based on the CAILA process, if the stopping criteria have not been met, click the “CAILyze! Again!”
button to continue. If users notice repeated themes, they can select the checkboxes in the first column
for the repeated themes, then click the “Merge Selected Rows” button above the table.
There is also an undo button available to correct mistakes:
Currently, merging simply combines the selected rows (see screenshot below). In future iterations,
CAILA will synthesize the merged content and display the iteration number from which the
content was drawn.</p>
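The current merge behavior, which simply combines the selected rows, can be sketched as follows. The field names and the separator used here are illustrative assumptions, not CAILA's actual schema:

```python
# Hedged sketch of the "Merge Selected Rows" behavior (illustrative schema).

def merge_rows(rows, selected):
    """Combine the rows at the `selected` indices into a single entry by
    concatenating their text fields, keeping all unselected rows intact.
    Mirrors the current behavior, which combines rather than synthesizes."""
    picked = [rows[i] for i in selected]
    merged = {key: " / ".join(r[key] for r in picked) for key in rows[0]}
    kept = [r for i, r in enumerate(rows) if i not in set(selected)]
    return kept + [merged]

rows = [
    {"theme": "Accuracy", "description": "graphs start from zero"},
    {"theme": "Misleading Visuals", "description": "circle sizes distort"},
    {"theme": "Simplicity", "description": "clear, simple graphs"},
]
result = merge_rows(rows, [0, 1])   # merge two near-duplicate themes
print(result[-1]["theme"])  # Accuracy / Misleading Visuals
```

Because the original list is left untouched, undoing a merge is a matter of restoring the pre-merge list, which matches the undo button described above.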
        <p>The user also has the option to delete a row if it does not accurately capture the phenomenon
under investigation. Finally, the user may download the CSV file again after the merging process is
complete.</p>
        <p>Please note that all names in this paper, including both appendices, are pseudonyms.</p>
      </sec>
      <sec id="sec-6-4">
        <title>B. Themes Generated Through the CAILyze Process</title>
        <p>Iteration 1</p>
        <p>Themes: Simplicity and Comprehension; Misleading Visuals; Accuracy and Misrepresentation; Historical and Regional Context.</p>
        <p>Descriptions: Students analyze how data is visualized for clarity and accuracy, considering whether graphs start from zero to accurately portray trends. Students express a desire for the data to be presented in a simple and clear manner, emphasizing the need for graphs to be understandable like explaining games to a fifth grader. Students critique graphs for potentially misleading representations, focusing on how the size of visual elements can distort the perception of data. Students question the accuracy of data representation, discussing how the use of visuals like box plots and scatter plots can either clarify or mislead the understanding of data trends. Discussion points include whether graphs consider historical and regional contexts adequately, particularly in showing trends over time and differences between regions.</p>
        <p>Examples: “Okay. Oh yeah. You could say one of the ways that it accurately portrays data is because it starts from zero and increases.” (Mary). “Let's keep it simple and clear. Like explaining games to a fifth grader.” (Oscar). “But if you know, at least 23 million people died, then it comes off like it kind of misleads.” (Lea). “Yeah, visuals can be super misleading. Size should match the stats, or it's kind of like lying with pictures.” (Oscar). “Yeah. Focusing on why parts of Africa are still in the orange zone would tell a much bigger story.” (Oscar).</p>
        <p>Iteration 4</p>
        <p>Themes: Concerns about Misrepresentation or Oversimplification; Importance of Starting from Zero; Evaluation of Data Visualization Size Relevance; Accuracy and Reliability of Represented Data; Critical Analysis of Historical and Geographic Context; Accuracy and Representation; Simplicity and Clarity; Visual Misrepresentation; Outdated Information; Overlook of Socioeconomic Factors.</p>
        <p>Descriptions: Students expressed concern that visual representations such as graphs and charts might oversimplify or misrepresent data, potentially misleading viewers about the true nature or scale of the data. The starting point of a graph can influence its interpretation: graphs that start from zero are seen as providing a more accurate and less misleading portrayal of data, reflecting its natural progression. The size used in data visualization should accurately reflect the data's magnitude or significance; inaccuracies or misalignment here could skew a viewer's understanding of the data's importance. Students focused on whether the data represented in graphs and charts was accurate and reliable, considering whether it gives a true and fair view of the information it purports to represent. Students analyzed graphs through a critical lens of historical and geographical context, evaluating whether these visualizations account for significant external factors such as imperialism or economic changes. Students critically assessed whether the data was accurate and represented correctly; they examined whether graphs start from zero to accurately portray increases, which helps in showing true scales and trends without misleading. The conversation highlighted the importance of keeping data presentations simple and clear for easier understanding, much like explaining complex concepts to a younger audience, ensuring the information is accessible to all. There was concern over how visuals might mislead the audience, specifically how the size of elements (like circles) used in the graphs could fail to match the statistics they are meant to represent, potentially distorting the perceived impact of the data. The students critiqued graphs for using outdated data, pointing out the importance of presenting the most current data available to make accurate and relevant conclusions. There was a consideration of how graphs might overlook significant socio-economic contexts, such as the effects of imperialism and colonialism on poverty rates, suggesting a need for more nuanced presentations that account for underlying causes. Students critically analyze the accuracy and representation of data in graphs, questioning whether the data is presented in a misleading manner or if it accurately reflects the information.</p>
        <p>Examples: Lea's critique about how the pandemic data visualization with circles could be misleading, as it does not adequately represent the scale of death tolls, especially when exact numbers are not known for all pandemics (#74, #73). Mary's approval of graphs starting from zero as a method for accurately portraying increases in data, suggesting that starting from a different number might distort the data's progression (#18). Lea's and Oscar's discussion about the misleading nature of representing death tolls with circles, pointing out that the visual size should match the statistics to avoid misinterpretation (#74, #76). Teresa's assessment of the accuracy of a population graph, considering technology's impact on literacy and world population numbers (#121, #119). Lea and James's discussion about poverty trends in Eastern Europe and how different regions' economic statuses are depicted in the data, stressing the necessity to account for historical and geopolitical influences (#94, #97, #99). “Okay. Oh yeah. You could say one of the ways that it accurately portrays data is because it starts from zero and increases. While some other graphs might start from like a random number that isn't zero.” “Let's keep it simple and clear. Like explaining games to a fifth grader.” “Yeah, visuals can be super misleading. Size should match the stats, or it's kind of like lying with pictures.” “2018 data trying to talk 2023 stuff. That's like using floppy disks for homework.” “I would say I just think that is a little misleading because to me, like, I think it's very representative of what we classify as like a first world and a third world country.” “Yeah, visuals can be super misleading. Size should match the stats, or it's kind of like lying with pictures.”</p>
        <p>Themes: Consideration of Data Starting Points; Evaluation of Graphical Trends; Understanding of Graph Purpose and Clarity; Discussion on Completeness and Misleading Elements; Comparison and Contextual Analysis; Critical Analysis of Graph Accuracy; Evaluation of Graph Clarity and Simplicity; Graphs Representing Change Over Time; Critical Consideration of Graph Scales and Proportions; Assessing Relevance and Misleading Visuals.</p>
        <p>Descriptions: Some students focus on the importance of graphs starting from zero to accurately portray increases or changes in the data, noting that starting from a non-zero number could misrepresent the data. Participants evaluate how trends are shown in graphs, debating their effectiveness in displaying relationships between variables or time-based changes. The students discuss whether the purpose of the graph is clear and if the graph successfully conveys its intended message in a simple and understandable manner. There is an awareness among the students about the presence of certain elements in the graphs that could mislead the audience or omit important information. Students compare graphs not only internally for consistency or accuracy but also contextually, considering whether they represent broader truths effectively and are current. Students critically analyzed the accuracy of the graphs to ensure the data represented was accurate and not deceiving; for example, Mary and Alice found their graph to start from zero and cover data from at least 14 years ago, demonstrating increases throughout the year, which they considered an accurate representation of the data. Students emphasized the importance of clarity and simplicity in representing data; Oscar suggested keeping explanations simple and clear, akin to explaining games to a fifth grader, highlighting the need for easily understandable data representation. The students assessed how well graphs demonstrate changes over time; Lea discussed how her chosen dataset on pandemics visually represented the death tolls over the years but found it misleading when exact numbers were not available, indicating a need for accurate temporal representation. The students were critical about how graph scales and proportions could mislead or accurately depict the data; James and his group discussed how the colors represented different income levels and how the distribution changed over time, which required careful scrutiny to avoid misinterpretation. Students were attuned to the potential for graphs to mislead through their visuals; Lea critiqued her graph for potentially misleading viewers due to disproportionate circle sizes representing deaths from pandemics, showing a critical approach to the relevance and accuracy of visual aids.</p>
        <p>Examples: “Okay. Oh yeah. You could say one of the ways that it accurately portrays data is because it starts from zero and increases. While some other graphs might start from like a random number that isn't zero.” “Scatterplot could show how cases change over time. Box plot good for comparing months maybe.” “Let's keep it simple and clear. Like explaining games to a fifth grader.” “But if you know, at least 23 million people died, then it comes off like it kind of misleads. It comes off misleading only because, like, that's half of the amount of people that died in the Black Death.” “2018 data trying to talk 2023 stuff. That's like using floppy disks for homework.”</p>
        <p>Themes: Accuracy and Representation; Simplicity vs. Complexity.</p>
        <p>Descriptions: Students critically assessed how accurately and fairly the data was represented in the graphs, focusing on whether graphical elements like starting points, size of elements, and overall design gave a true picture of the underlying data. There was a discussion about the balance between simplicity and complexity in data representation, with students weighing the need for graphs to be easily understandable while still capturing the full scope and nuances of the data. Students were critical of visual elements that could potentially mislead the viewer about the data's true story, such as the size of elements not matching the scale of the data they represent or the selection of visual types like circles or colors. The students assessed the relevance and timeliness of the data being presented, understanding that outdated or non-current data could distort or diminish the utility of the information being conveyed. A theme emerged around the critical analysis of socio-economic factors not being represented in the data, where students expressed concern over the graphs not showing the 'why' behind patterns or distributions, particularly in representing global issues. Students critically evaluate how well graphs display data, focusing on accuracy and ease of understanding; they discuss whether graphs start at zero to accurately represent growth and whether the representation correctly portrays increases over time. Discussing different types of graphs, students assess their effectiveness in showing trends, relationships between variables, or changes over time, weighing the pros and cons of box plots versus scatter plots for visualizing data. Students express concern over how certain visual representations might mislead or fail to capture the full story behind the numbers, stressing the importance of visual aids that match statistical data accurately to avoid confusion or misinterpretation. Critical analysis extends to how graphs incorporate or neglect historical and geographical context, affecting the viewer's understanding of data trends over time or across different regions. Evaluating the credibility of the data sources behind graphs, students consider whether the information provided can be trusted, emphasizing the role of authoritative sources in ensuring data accuracy.</p>
        <p>Examples: Mary highlighted the importance of graphs starting from zero for accuracy in representation, expressing that it more accurately portrays data compared to graphs that start from arbitrary non-zero values. Oscar suggested keeping the explanation simple, akin to explaining games to a fifth-grader, highlighting the need for clarity in data presentation to make it accessible to all viewers. Lea talked about how the representation of death tolls using circles might mislead viewers, especially if the size of the circles doesn't correspond accurately to the numbers they're supposed to represent. James brought up concerns about using data from 2018 to talk about current situations, comparing it to using outdated technology like floppy disks, underlining the need for up-to-date information in making relevant analyses. Lea and Oscar discussed the importance of considering underlying factors such as imperialism and colonization in the analysis of poverty rates across different regions, implying that data without context could provide a misleading or incomplete narrative. Mary comments on how a graph accurately portrays data because it starts from zero, highlighting a critical analysis of graph initiation and its impact on data interpretation (#00:01:55#). Oscar points out the utility of scatter plots in showing changes over time compared to box plots, which might not show the relationship between variables (#00:00:38#, #00:01:34#). Lea discusses how the visualization of pandemic data might be misleading due to its representation of death tolls through circles and triangles, suggesting that visuals can distort the perceived impact of pandemics (#00:07:30#). James and Lea discuss a graph's depiction of poverty rates in Eastern Europe, debating whether it misleadingly portrays economic progress without considering outer context or regions (#00:10:57#, #00:13:04#). Teresa regards a graph as reliable because the data source is from UNESCO, showing an awareness of source validity in assessing graph credibility (#00:04:14#).</p>
        <p>Iterations 8 and 9</p>
        <p>Themes: Awareness of Misleading Visual Representations; Understanding the Importance of Starting Points in Graphs; Critical Evaluation of Data Representativeness; Insights on Data Presentation and Clarity; Concerns Over Data Currency and Relevance; Evaluating Data Representation Accuracy; Complexity and Clarity of Visualization; Critical Analysis of Visual Elements; Contextual Relevance and Updating of Data.</p>
        <p>Descriptions: Students expressed concern over how the visual representation of data might mislead viewers; for instance, Lea noted that the representation of the death toll in pandemics using circles could be misleading if not scaled accurately to reflect the magnitude of the data, as it could minimize the perceived impact of significant events. The importance of graphs starting at zero to accurately portray data increases was highlighted; Mary mentioned that one of the ways data is accurately portrayed is by graphs starting from zero, as opposed to starting from a random number, which could misrepresent data trends. Students critically evaluated whether the data presented was representative and accurately depicted; for example, James discussed how the distribution of population across different poverty thresholds might not be misleading but highlighted that the lack of updated data might pose issues for current applicability. The need for simplicity and clarity in presenting data was emphasized, with students suggesting that data should be explained in a manner that is understandable to individuals without expertise in the field; Oscar mentioned keeping explanations simple and clear, analogous to explaining games to a fifth grader. Students showed concern for the relevance of the data based on its currency, noting that using outdated data for current analysis can be misleading; James's critique of using data from 2018 to discuss poverty in 2023 exemplifies this concern, likening it to using floppy disks for modern homework. Students critiqued the effectiveness and accuracy of the data representations in conveying information. Students reflected on the importance of keeping data presentations simple and understandable, highlighting that complexity may hinder comprehension. Students critically analyzed the use of visual elements in graphs, such as size and color, and how they can mislead or accurately represent data. The relevance and timeliness of the data were considered, with students questioning how current the data was and whether it reflects recent changes or conditions.</p>
        <p>Examples: Mary remarked on how one of the ways data is accurately portrayed is by starting from zero, suggesting awareness of how graph starting points can affect interpretation. Oscar suggested keeping explanations simple and clear, like explaining games to a fifth grader, emphasizing the need for clarity in data visualization. Lea discussed how the use of circles to represent the death toll in pandemics could be misleading, especially when the size of the circles does not correspond with the numbers they represent. James critiqued a graph of the distribution of population between poverty thresholds for using data only up to 2018, pointing out that it had not been updated to reflect 2023 circumstances.</p>
        <p>Iteration 10</p>
        <p>Themes: Authenticity and Reliability of Data Sources.</p>
        <p>Descriptions: Students critically analyze how effectively data and trends are represented in graphs, scrutinizing the clarity and accuracy of the portrayal. Participants identify and discuss how certain visual elements can be misleading, emphasizing the importance of an accurate match between visuals and statistical data. The conversation includes concerns regarding the historical accuracy and the relevance of the data depicted, discussing how out-of-date or lacking information affects the understanding of the subject matter. Discussion covers the effectiveness of different types of graphs in comparing variables or showcasing trends clearly to enhance understanding.</p>
        <p>Examples: “Yeah, visuals can be super misleading. Size should match the stats, or it's kind of like lying with pictures.” “But if you know, at least 23 million people died, then it comes off like it kind of misleads.” “It's easier to see the trend; box plots won't show the relationship between variables.” “Yeah. Okay. That's nice. I also saw like the credentials or like the data source below by Unesco. Yeah, I'd say like. Yeah, I say this is reliable.”</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Saldaña</surname>
          </string-name>
          ,
          <article-title>The Coding Manual for Qualitative Researchers</article-title>
          , 4th ed.
          <source>SAGE Publications</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wa-Mbaleka</surname>
          </string-name>
          , “
          <article-title>The Researcher as an Instrument,</article-title>
          ” in Computer Supported Qualitative Research,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. P.</given-names>
            <surname>Reis</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <surname>A</surname>
          </string-name>
          . Moreira, Eds., Cham: Springer International Publishing,
          <year>2020</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>41</lpage>
          . doi:
          <volume>10</volume>
          .1007/978-3-
          <fpage>030</fpage>
          -31787-
          <issue>4</issue>
          _
          <fpage>3</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>M. A.</given-names> <surname>Xu</surname></string-name> and
          <string-name><given-names>G. B.</given-names> <surname>Storr</surname></string-name>
          , “<article-title>Learning the Concept of Researcher as Instrument in Qualitative Research</article-title>,”
          <source>TQR</source>, vol. <volume>17</volume>, no. <issue>42</issue>,
          pp. <fpage>1</fpage>-<lpage>18</lpage>, <year>2012</year>,
          doi: 10.46743/2160-3715/2012.1768.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>M. S.</given-names> <surname>Rahman</surname></string-name>
          , “<article-title>The Advantages and Disadvantages of Using Qualitative and Quantitative Approaches and Methods in Language 'Testing and Assessment' Research: A Literature Review</article-title>,”
          <source>JEL</source>, vol. <volume>6</volume>, no. <issue>1</issue>,
          p. <fpage>102</fpage>, <year>2017</year>, doi: 10.5539/jel.v6n1p102.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><given-names>P.</given-names> <surname>Johri</surname></string-name>,
          <string-name><given-names>S. K.</given-names> <surname>Khatri</surname></string-name>,
          <string-name><given-names>A. T.</given-names> <surname>Al-Taani</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Sabharwal</surname></string-name>,
          <string-name><given-names>S.</given-names> <surname>Suvanov</surname></string-name>, and
          <string-name><given-names>A.</given-names> <surname>Kumar</surname></string-name>
          , “<article-title>Natural Language Processing: History, Evolution, Application, and Future Work</article-title>,” in
          <source>Proceedings of 3rd International Conference on Computing Informatics and Networks</source>,
          <string-name><given-names>A.</given-names> <surname>Abraham</surname></string-name>,
          <string-name><given-names>O.</given-names> <surname>Castillo</surname></string-name>, and
          <string-name><given-names>D.</given-names> <surname>Virmani</surname></string-name>,
          Eds., Singapore: Springer, <year>2021</year>,
          pp. <fpage>365</fpage>-<lpage>375</lpage>. doi: 10.1007/978-981-15-9712-1_31.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>K. S.</given-names> <surname>Jones</surname></string-name>
          , “<article-title>Natural Language Processing: A Historical Review</article-title>,” in
          <source>Current Issues in Computational Linguistics: In Honour of Don Walker</source>,
          <string-name><given-names>A.</given-names> <surname>Zampolli</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Calzolari</surname></string-name>, and
          <string-name><given-names>M.</given-names> <surname>Palmer</surname></string-name>,
          Eds., Dordrecht: Springer Netherlands, <year>1994</year>,
          pp. <fpage>3</fpage>-<lpage>16</lpage>. doi: 10.1007/978-0-585-35958-8_1.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>I.</given-names> <surname>Vayansky</surname></string-name> and
          <string-name><given-names>S. A. P.</given-names> <surname>Kumar</surname></string-name>
          , “<article-title>A review of topic modeling methods</article-title>,”
          <source>Information Systems</source>, vol. <volume>94</volume>,
          p. <fpage>101582</fpage>, Dec. <year>2020</year>,
          doi: 10.1016/j.is.2020.101582.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>D. M.</given-names> <surname>Blei</surname></string-name>,
          <string-name><given-names>A. Y.</given-names> <surname>Ng</surname></string-name>, and
          <string-name><given-names>M. I.</given-names> <surname>Jordan</surname></string-name>
          , “<article-title>Latent Dirichlet allocation</article-title>,”
          <source>J. Mach. Learn. Res.</source>, vol. <volume>3</volume>,
          pp. <fpage>993</fpage>-<lpage>1022</lpage>, Mar. <year>2003</year>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>S. T.</given-names> <surname>Dumais</surname></string-name>
          , “<article-title>Latent Semantic Analysis</article-title>,”
          <source>Annual Review of Information Science and Technology (ARIST)</source>,
          vol. <volume>38</volume>, pp. <fpage>189</fpage>-<lpage>230</lpage>, <year>2004</year>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name><given-names>T. K.</given-names> <surname>Landauer</surname></string-name>,
          <string-name><given-names>P. W.</given-names> <surname>Foltz</surname></string-name>, and
          <string-name><given-names>D.</given-names> <surname>Laham</surname></string-name>
          , “<article-title>An introduction to latent semantic analysis</article-title>,”
          <source>Discourse Processes</source>, vol. <volume>25</volume>, no. <issue>2-3</issue>,
          pp. <fpage>259</fpage>-<lpage>284</lpage>, Jan. <year>1998</year>,
          doi: 10.1080/01638539809545028.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name><given-names>C.</given-names> <surname>Hacking</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Verbeek</surname></string-name>,
          <string-name><given-names>J. P. H.</given-names> <surname>Hamers</surname></string-name>, and
          <string-name><given-names>S.</given-names> <surname>Aarts</surname></string-name>
          , “<article-title>Comparing text mining and manual coding methods: Analysing interview data on quality of care in long-term care for older adults</article-title>,”
          <source>PLoS One</source>, vol. <volume>18</volume>, no. <issue>11</issue>,
          p. <fpage>e0292578</fpage>, Nov. <year>2023</year>,
          doi: 10.1371/journal.pone.0292578.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name><given-names>P.</given-names> <surname>Tschisgale</surname></string-name>,
          <string-name><given-names>P.</given-names> <surname>Wulff</surname></string-name>, and
          <string-name><given-names>M.</given-names> <surname>Kubsch</surname></string-name>
          , “<article-title>Integrating artificial intelligence-based methods into qualitative research in physics education research: A case for computational grounded theory</article-title>,”
          <source>Phys. Rev. Phys. Educ. Res.</source>, vol. <volume>19</volume>, no. <issue>2</issue>,
          p. <fpage>020123</fpage>, Sep. <year>2023</year>,
          doi: 10.1103/PhysRevPhysEducRes.19.020123.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name><given-names>S.</given-names> <surname>De Paoli</surname></string-name>
          , “<article-title>Performing an Inductive Thematic Analysis of Semi-Structured Interviews With a Large Language Model: An Exploration and Provocation on the Limits of the Approach</article-title>,”
          <source>Social Science Computer Review</source>, vol. <volume>42</volume>, no. <issue>4</issue>,
          pp. <fpage>997</fpage>-<lpage>1019</lpage>, Aug. <year>2024</year>,
          doi: 10.1177/08944393231220483.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name><given-names>A. F.</given-names> <surname>Zambrano</surname></string-name>,
          <string-name><given-names>X.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Barany</surname></string-name>,
          <string-name><given-names>R. S.</given-names> <surname>Baker</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Kim</surname></string-name>, and
          <string-name><given-names>N.</given-names> <surname>Nasiar</surname></string-name>
          , “<article-title>From nCoder to ChatGPT: From Automated Coding to Refining Human Coding</article-title>,” in
          <source>Advances in Quantitative Ethnography</source>,
          <string-name><given-names>G.</given-names> <surname>Arastoopour Irgens</surname></string-name> and
          <string-name><given-names>S.</given-names> <surname>Knight</surname></string-name>,
          Eds., Cham: Springer Nature Switzerland, <year>2023</year>,
          pp. <fpage>470</fpage>-<lpage>485</lpage>. doi: 10.1007/978-3-031-47014-1_32.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name><given-names>V.</given-names> <surname>Braun</surname></string-name> and
          <string-name><given-names>V.</given-names> <surname>Clarke</surname></string-name>
          , “<article-title>Using thematic analysis in psychology</article-title>,”
          <source>Qualitative Research in Psychology</source>, vol. <volume>3</volume>, no. <issue>2</issue>,
          pp. <fpage>77</fpage>-<lpage>101</lpage>, Jan. <year>2006</year>,
          doi: 10.1191/1478088706qp063oa.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Nelson</surname>
          </string-name>
          , “
          <article-title>Computational Grounded Theory: A Methodological Framework</article-title>
          ,”
          <source>Sociological Methods &amp; Research</source>
          , vol.
          <volume>49</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>42</lpage>
          , Feb.
          <year>2020</year>
          , doi: 10.1177/0049124117729703.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name><given-names>A.</given-names> <surname>Barany</surname></string-name>
          et al., “<article-title>ChatGPT for Education Research: Exploring the Potential of Large Language Models for Qualitative Codebook Development</article-title>,” in
          <source>Artificial Intelligence in Education</source>,
          <string-name><given-names>A. M.</given-names> <surname>Olney</surname></string-name>,
          <string-name><given-names>I.-A.</given-names> <surname>Chounta</surname></string-name>,
          <string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>,
          <string-name><given-names>O. C.</given-names> <surname>Santos</surname></string-name>, and
          <string-name><given-names>I. I.</given-names> <surname>Bittencourt</surname></string-name>,
          Eds., Cham: Springer Nature Switzerland, <year>2024</year>,
          pp. <fpage>134</fpage>-<lpage>149</lpage>. doi: 10.1007/978-3-031-64299-9_10.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Wu</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Xie</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Lyu</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Cai</surname></string-name>, and
          <string-name><given-names>J. M.</given-names> <surname>Carroll</surname></string-name>
          , “<article-title>Redefining Qualitative Analysis in the AI Era: Utilizing ChatGPT for Efficient Thematic Analysis</article-title>,”
          Sep. 19, <year>2023</year>. Accessed: Sep. 07, <year>2024</year>.
          [Online]. Available: https://arxiv.org/abs/2309.10771v3
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name><given-names>M. S.</given-names> <surname>Lam</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Teoh</surname></string-name>,
          <string-name><given-names>J. A.</given-names> <surname>Landay</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Heer</surname></string-name>, and
          <string-name><given-names>M. S.</given-names> <surname>Bernstein</surname></string-name>
          , “<article-title>Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM</article-title>,” in
          <source>Proceedings of the CHI Conference on Human Factors in Computing Systems</source>,
          in CHI '24. New York, NY, USA: Association for Computing Machinery, May <year>2024</year>,
          pp. <fpage>1</fpage>-<lpage>28</lpage>. doi: 10.1145/3613904.3642830.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name><given-names>P.</given-names> <surname>Christou</surname></string-name>
          , “<article-title>How to Use Artificial Intelligence (AI) as a Resource, Methodological and Analysis Tool in Qualitative Research?</article-title>,”
          <source>TQR</source>, vol. <volume>28</volume>, no. <issue>7</issue>,
          pp. <fpage>1968</fpage>-<lpage>1980</lpage>, Jul. <year>2023</year>,
          doi: 10.46743/2160-3715/2023.6406.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name><given-names>A.</given-names> <surname>Acerbi</surname></string-name> and
          <string-name><given-names>J. M.</given-names> <surname>Stubbersfield</surname></string-name>
          , “<article-title>Large language models show human-like content biases in transmission chain experiments</article-title>,”
          <source>Proceedings of the National Academy of Sciences</source>, vol. <volume>120</volume>, no. <issue>44</issue>,
          p. <fpage>e2313790120</fpage>, Oct. <year>2023</year>,
          doi: 10.1073/pnas.2313790120.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name><given-names>C.</given-names> <surname>O'Neil</surname></string-name>,
          <source>Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy</source>,
          Reprint edition. Crown, <year>2017</year>. Accessed: Sep. 12, <year>2024</year>.
          [Online]. Available: https://www.penguinrandomhouse.com/books/241363/weapons-of-math-destruction-by-cathy-oneil/
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          “<article-title>MAXQDA vs. ATLAS.ti | Best Qualitative Data Analysis Software</article-title>,”
          ATLAS.ti. Accessed: Sep. 09, <year>2024</year>.
          [Online]. Available: https://atlasti.com/maxqda-vs-atlasti-comparison
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name><given-names>J.</given-names> <surname>Gao</surname></string-name>,
          <string-name><given-names>K. T. W.</given-names> <surname>Choo</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Cao</surname></string-name>,
          <string-name><given-names>R. K.-W.</given-names> <surname>Lee</surname></string-name>, and
          <string-name><given-names>S.</given-names> <surname>Perrault</surname></string-name>
          , “<article-title>CoAIcoder: Examining the Effectiveness of AI-assisted Human-to-Human Collaboration in Qualitative Analysis</article-title>,”
          <source>ACM Trans. Comput.-Hum. Interact.</source>, vol. <volume>31</volume>, no. <issue>1</issue>,
          pp. <fpage>6:1</fpage>-<lpage>6:38</lpage>, Nov. <year>2023</year>,
          doi: 10.1145/3617362.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name><given-names>J.</given-names> <surname>Gao</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Guo</surname></string-name>,
          <string-name><given-names>T. J.-J.</given-names> <surname>Li</surname></string-name>, and
          <string-name><given-names>S. T.</given-names> <surname>Perrault</surname></string-name>
          , “<article-title>CollabCoder: A GPT-Powered WorkFlow for Collaborative Qualitative Analysis</article-title>,” in
          <source>Companion Publication of the 2023 Conference on Computer Supported Cooperative Work and Social Computing</source>,
          in CSCW '23 Companion. New York, NY, USA: Association for Computing Machinery, Oct. <year>2023</year>,
          pp. <fpage>354</fpage>-<lpage>357</lpage>. doi: 10.1145/3584931.3607500.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name><given-names>J.</given-names> <surname>Gao</surname></string-name>
          et al., “<article-title>CollabCoder: A Lower-barrier, Rigorous Workflow for Inductive Collaborative Qualitative Analysis with Large Language Models</article-title>,” in
          <source>Proceedings of the CHI Conference on</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>