<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Fair and Personalized Dementia Prediction Framework Using Longitudinal and Demographic Data from South Korea</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hong-Woo Chun</string-name>
          <email>hw.chun@kisti.re.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lee-Nam Kwon</string-name>
          <email>ynkwon@kisti.re.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hyeonho Shin</string-name>
          <email>shinhh9554@kisti.re.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>SungWha Hong</string-name>
          <email>shong@kisti.re.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jae-Min Lee</string-name>
          <email>jmlee@kisti.re.kr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Future Information Analysis Center, Korea Institute of Science and Technology Information</institution>
          ,
          <addr-line>66 Hoegi-ro, Dongdaemun-gu, Seoul, 02456</addr-line>
          ,
          <country country="KR">South Korea</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Early prediction of dementia is a critical public health challenge, yet conventional machine learning models often treat all patients as a single, uniform population. This approach overlooks subtle clinical differences between individuals and can lead to biased predictions that disproportionately affect specific demographic groups. This study proposes a novel framework that leverages a Large Language Model (LLM) to build a fair and personalized dementia prediction system. While traditional methods required separate modeling for men and women, the LLM, thanks to its reasoning capabilities, can perform customized predictions on all data without such separate modeling. We show that providing an LLM with specific demographic context, such as sex, yields more nuanced and accurate predictions than a generic, non-contextual prompt. This approach demonstrates that LLMs can be a powerful tool for developing personalized medical AI systems that respect individual differences and reduce algorithmic bias.</p>
      </abstract>
      <kwd-group>
        <kwd>LLM-based Dementia Early Prediction</kwd>
        <kwd>Personalization</kwd>
        <kwd>Fairness and Bias in Medical AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Dementia is a growing public health challenge worldwide, with global estimates predicting a sharp
increase in patient numbers – a trend that is particularly pronounced in South Korea where long-term
health databases offer unique insights into population trends [
        <xref ref-type="bibr" rid="ref1 ref12">1</xref>
        ]. Traditional dementia prediction
models—typically based on statistical learning from imaging, genetic markers, or cross-sectional clinical
data—tend to pool all patients into a single homogeneous group, often overlooking crucial demographic
differences that influence both disease onset and progression [
        <xref ref-type="bibr" rid="ref13 ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref15 ref3">3</xref>
        ]. Such a “one-size-fits-all” paradigm
has led to models that perform well in aggregate but tend to be biased against underrepresented
subpopulations, for example failing to capture sex-specific differences in Alzheimer’s disease risk and
presenting lower predictive accuracy for women [
        <xref ref-type="bibr" rid="ref17 ref4">4</xref>
        ]. Recent research has increasingly emphasized that
fairness and bias in medical artificial intelligence must be addressed throughout the model development
lifecycle – from data preprocessing to algorithmic design – to avoid perpetuating existing health
disparities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The advent of Large Language Models (LLMs) offers a transformative opportunity in
this context. Unlike conventional machine learning approaches, LLMs possess advanced reasoning and
generative capabilities that allow them to integrate unstructured data and contextual variables, such as
explicit demographic details, to deliver more personalized and equitable predictions [
        <xref ref-type="bibr" rid="ref1 ref12">1</xref>
        ]. This work
presents a novel dementia prediction framework that leverages an LLM with a demographically-aware
prompting strategy, enabling it to generate context-sensitive predictions by integrating longitudinal
clinical data with critical demographics. By doing so, our framework not only improves overall prediction
accuracy but also directly mitigates bias, ensuring that individualized risk factors are considered rather
than smoothed over in aggregate models [
        <xref ref-type="bibr" rid="ref17 ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref15 ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>In this section, we briefly review research trends in dementia prediction and then focus on recent
advances in fairness and bias mitigation within medical AI, particularly in the era of LLM-driven clinical
decision-making.</p>
      <sec id="sec-2-1">
        <title>2.1. Overview</title>
        <p>This section reviews two sets of research: one focusing on advancements in dementia prediction and the
other on fairness and bias mitigation in medical AI, with special attention to generative and LLM-driven
approaches.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Research Trends in Dementia Prediction</title>
        <p>
           Early approaches to dementia prediction mainly relied on single time-point laboratory assessments
or neuroimaging data such as MRI and PET scans, with genetic markers like the ApoE4 allele providing
additional risk information [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Although these methods have delivered foundational insights into
disease etiology, they typically fall short in capturing the progressive and multifactorial nature of
dementia. More recent studies have increasingly utilized longitudinal health records—such as those
available from South Korea’s National Health Information Database—to track changes in clinical
parameters over time, thereby significantly improving risk prediction [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. However, these models
largely rely on traditional approaches such as support vector machines or multilayer perceptrons and
struggle to adapt to individual-level variations, often resulting in differential predictive performance
across demographic groups [
          <xref ref-type="bibr" rid="ref13 ref2">2</xref>
          ]. This limitation motivates the emerging interest in advanced deep
learning techniques, such as LLM-based frameworks, which offer intrinsic mechanisms for assimilating
heterogeneous data and can be tailored via context-specific prompting to address inherent disparities
[
          <xref ref-type="bibr" rid="ref1 ref12">1</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Fairness and Bias in Medical AI</title>
        <p>
          Fairness and bias mitigation have become critical as medical AI systems are deployed in clinical
decision-making, especially given that biased training data and model architectures can amplify existing
disparities in healthcare outcomes [
          <xref ref-type="bibr" rid="ref15 ref3">3</xref>
          ] [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. For instance, models trained on predominantly Western or
white populations often underperform when applied to minority groups, resulting in lower diagnostic
accuracy and delayed treatment for underserved patients [
          <xref ref-type="bibr" rid="ref17 ref4">4</xref>
          ] [
          <xref ref-type="bibr" rid="ref13 ref2">2</xref>
          ]. Recent research on medical AI has
explored a spectrum of techniques to decrease such bias. Approaches such as adversarial debiasing,
counterfactual fairness testing, output recalibration, and threshold adjustment have been applied to
reduce disparate impacts by ensuring that model predictions remain accurate and equitable when
evaluated separately across demographic subgroups [
          <xref ref-type="bibr" rid="ref13 ref2">2</xref>
          ]. These technical strategies are often paired
with explainability frameworks—which utilize methods such as LIME, SHAP, or attention-based
visualizations—to provide transparent insight into the model’s decision-making process. Such interpretability
is essential for clinical trust and to diagnose hidden sources of bias because it allows stakeholders to
evaluate how patient-specific factors such as sex, race, or socioeconomic status influence predictions
[
          <xref ref-type="bibr" rid="ref15 ref3">3</xref>
          ].
        </p>
        <p>
          The integration of fairness-aware methods with LLMs is an emerging trend in healthcare AI. Recent
studies have demonstrated that providing LLMs with explicit demographic context through tailored
prompting strategies leads to substantially improved prediction accuracy and fairness compared to
generic, noncontextual approaches [
          <xref ref-type="bibr" rid="ref17 ref4">4</xref>
          ]. In one example, researchers used demographic perturbation
and prompt engineering to identify and mitigate bias in diagnostic outputs, highlighting that even
state-of-the-art LLMs may exhibit disparities unless calibrated with subgroup-specific information
[
          <xref ref-type="bibr" rid="ref11 ref7">7</xref>
          ]. Moreover, frameworks employing federated learning and fair representation learning have shown
promise in harnessing diverse datasets without compromising patient privacy, reinforcing the necessity
of interdisciplinary strategies that span technical, ethical, and regulatory domains [
          <xref ref-type="bibr" rid="ref1 ref12">1</xref>
          ] [
          <xref ref-type="bibr" rid="ref15 ref3">3</xref>
          ].
        </p>
        <p>
          In addition to these technical advances, the literature increasingly calls for standardized fairness
metrics and comprehensive evaluation frameworks. Benchmarking tools such as MEDFAIR and
DiversityMedQA have been introduced to assess algorithmic fairness under various conditions, providing
quantifiable measures of equity across racial, sex, and socioeconomic dimensions [
          <xref ref-type="bibr" rid="ref13 ref2">2</xref>
          ] [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. These efforts
are complemented by policy-driven initiatives which seek to embed fairness considerations into the
broader medical AI regulatory landscape, an evolution that is essential to promote both accountability
and trust in clinical settings [
          <xref ref-type="bibr" rid="ref13 ref2">2</xref>
          ] [
          <xref ref-type="bibr" rid="ref15 ref3">3</xref>
          ].
        </p>
        <p>
          By synthesizing these lines of research, our study adopts an LLM-based approach that directly
confronts the limitations of traditional one-size-fits-all models. Our framework leverages
demographically-aware prompting to provide personalized dementia risk predictions, thereby simultaneously enhancing
overall accuracy and reducing bias. This work contributes to the growing body of literature that seeks
to advance not only the technical performance but also the ethical underpinnings of AI applications in
healthcare [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Future research directions include expanding evaluations to additional demographic
subgroups and integrating state-of-the-art fairness mitigation techniques that continue to emerge in
response to evolving clinical standards [
          <xref ref-type="bibr" rid="ref11 ref7">7</xref>
          ].
        </p>
        <p>
           In summary, recent research in dementia prediction not only emphasizes the importance of
longitudinal and heterogeneous data integration but also necessitates a shift toward fairness-focused
AI methodologies. By harnessing the next-generation capabilities of LLMs combined with rigorous bias
auditing and explainability measures, our approach promises a fairer, more individualized method for
early dementia risk assessment—one that is responsive to the complex interplay of demographic factors
and clinical indicators [
          <xref ref-type="bibr" rid="ref1 ref12">1</xref>
          ] [
          <xref ref-type="bibr" rid="ref15 ref3">3</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>This study applied a novel LLM-based methodology to analyze bias in dementia prediction and prove
the need for a personalized approach. Instead of training distinct machine learning models, we utilized
a single LLM with a flexible prompting strategy to generate predictive insights.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>
          This study uses the National Health Information Database (NHID) of the National Health Insurance
Service in South Korea, specifically the NHIS-SC DB, to predict the risk of developing dementia [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
The NHIS-SC DB is a large-scale cohort of one million individuals over the age of 60, randomly sampled
from the total population. This database contains comprehensive health-related information for these
individuals from 2002 to 2013, including health insurance eligibility, medical treatments, diagnoses,
health check-up results, and long-term care insurance. We used the following key sub-databases:
• PIE-DB (Population-based Individualized Eligibility-DB): Contains demographic
information (sex, age, residential area) and health insurance eligibility.
• MT-DB (Medical Treatment-DB): Provides detailed records of outpatient visits, hospitalizations,
and prescriptions, coded using the Korean Standard Classification of Diseases (KCD), which is
based on the International Classification of Diseases (ICD).
• GHE-DB (General Health Examination-DB): Includes results from biennial health check-ups,
such as blood pressure, BMI, blood sugar levels, and cholesterol.
• MCI-DB (Medical Care Institution DB): Comprises data such as the type of medical institution, its
area and installation period, number of hospital beds, number of doctors, and equipment availability
status.
• LCI-DB (Long-term Care Insurance DB): Encompasses long-term care applications and decision
results, doctors’ opinions (e.g., assessments of recognized care necessity), and long-term care
facility data.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. LLM-Based Prediction System Architecture and Experimental Setup</title>
        <p>The LLM-based dementia prediction system proposed in this study has a fundamentally different
architecture from traditional machine learning models. Instead of individually training multiple models,
it leverages the powerful reasoning capabilities of a single LLM to derive personalized predictions
through data preprocessing and prompt engineering.</p>
        <p>System Architecture The overall flow of our LLM-based prediction system is illustrated below.
The architecture is modular and designed to maximize the LLM’s natural language understanding and
generative capabilities.</p>
        <p>Step 1. Data Preprocessing Stage:
• Input Data: The input for this stage is the raw data from the NHIS-SC DB. This data consists of
various forms of tabular data, including patient medical records, diagnoses, prescription details,
and health check-up results.
• Conversion Process: While traditional machine learning models would directly use this tabular
data as a numerical feature vector, our system maximizes the LLM’s natural language processing
capabilities by converting this data into narrative text.
- Medical History Summarization: The patient’s diagnoses (e.g., hypertension, diabetes,
depression) and their onset times are organized into a chronological narrative.
- Health Check-up Records: Changes in regular health check-up metrics like blood pressure,
BMI, blood sugar, and cholesterol are summarized into sentences that describe trends (e.g., "Blood
pressure has been consistently rising over the past three years").
- Demographic Information: Demographic details such as sex, age, and residential area are
explicitly stated.
• Output: After this stage, all medical information for each patient is generated as a single, unified
text block that is easily understandable by the LLM.</p>
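<p>As an illustration, the Step 1 conversion can be sketched as follows; the field names and sentence templates here are hypothetical stand-ins, not the actual NHIS-SC schema.</p>

```python
# Illustrative sketch of Step 1: converting tabular patient fields into a
# single narrative text block. Field names and templates are hypothetical.
def to_narrative(patient):
    demo = (f"This is a {patient['age']}-year-old {patient['sex']} "
            f"living in {patient['region']}.")
    history = "; ".join(f"{dx} diagnosed in {year}"
                        for dx, year in patient["diagnoses"])
    trends = " ".join(patient["checkup_trends"])
    return f"{demo} Medical history: {history}. Check-up trends: {trends}"

record = {
    "age": 65, "sex": "female", "region": "Seoul",
    "diagnoses": [("hypertension", 2005), ("depressive episode", 2009)],
    "checkup_trends": ["Blood pressure has been consistently rising "
                       "over the past three years."],
}
narrative = to_narrative(record)
```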
        <p>Step 2. LLM-Based Prediction Stage:
• Prompt Design: This stage is crucial for the LLM’s performance. We conducted our experiments
using two distinct prompting strategies. The core idea is to test how the presence of explicit
demographic context influences the LLM’s predictive reasoning.
- Integrated Prompt: This prompt provided the LLM with a patient’s structured medical history
without any specific demographic context. The instruction was simply "Based on the following
medical history, predict the likelihood of dementia and explain your reasoning."
- Demographically-Aware Prompt: This prompt included the same medical history but with
explicit demographic information. For example, "Based on the following medical history for a
65-year-old female, predict the likelihood of dementia and explain your reasoning."
• LLM’s Reasoning: The LLM does not simply classify the input text; instead, it performs complex
reasoning based on its vast, pre-trained medical knowledge base. For instance, when it receives a
demographically-aware prompt, it engages in an internal reasoning process akin to asking, "What
are the common risk factors for dementia in females?" This is a qualitatively different process
from the MLP model, which merely learns a weight-based function. The LLM can draw upon its
knowledge that "depression has a greater impact on dementia in women" to make its prediction
and provide a detailed explanation that goes beyond simple correlation.</p>
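<p>The two prompting strategies above can be sketched as a single prompt builder; the instruction wording follows the examples in the text, while the function itself is a hypothetical illustration.</p>

```python
# Sketch of the two prompting strategies described in Step 2.
def build_prompt(medical_history, demographics=None):
    if demographics:
        # Demographically-aware prompt: explicit demographic context
        intro = (f"Based on the following medical history for a "
                 f"{demographics}, predict the likelihood of dementia "
                 f"and explain your reasoning.")
    else:
        # Integrated prompt: no demographic context
        intro = ("Based on the following medical history, predict the "
                 "likelihood of dementia and explain your reasoning.")
    return intro + "\n" + medical_history

history = "Hypertension since 2005; depressive episode diagnosed in 2009."
generic_prompt = build_prompt(history)
aware_prompt = build_prompt(history, demographics="65-year-old female")
```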
        <p>Step 3. Output Analysis and Evaluation Stage:
• LLM’s Output: The LLM generates a prediction (e.g., "High likelihood of dementia") along with
a detailed natural language explanation for its reasoning.
• Evaluation Methods: We evaluated the LLM’s predictions using two key metrics:
- Predictive Consistency: This is a quantitative measure of how well the LLM’s prediction aligns
with the actual diagnosis in the test set.
- Reasoning Nuance: A panel of medical experts (N=5) qualitatively evaluated the LLM’s
explanations on a 1 (generic) to 5 (highly specific and contextual) scale. This qualitative assessment
helps us understand how well the LLM internalizes and applies sex-specific risk factors.</p>
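<p>Computationally, the two evaluation measures reduce to simple aggregates; the formulas below are our reading of the description above, not code from the study itself.</p>

```python
# Predictive consistency: fraction of LLM predictions that agree with the
# recorded diagnosis. Reasoning nuance: mean of the 1-5 expert panel scores.
def predictive_consistency(predictions, diagnoses):
    agree = sum(p == d for p, d in zip(predictions, diagnoses))
    return agree / len(diagnoses)

def reasoning_nuance(panel_scores):
    # panel_scores: one 1-5 rating per expert (N=5) for a single explanation
    return sum(panel_scores) / len(panel_scores)

consistency = predictive_consistency(["high", "low", "high", "high"],
                                     ["high", "high", "high", "low"])
nuance = reasoning_nuance([4, 5, 4, 3, 5])
```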
        <p>This architecture enables us to overcome the limitations of traditional modeling and demonstrate
that a single LLM can achieve personalized predictions while mitigating demographic bias.</p>
        <p>Experimental Setup</p>
        <p>In this study, we utilized the gemini-2.5-flash-preview-05-20 model for our experiments, chosen for
its strong performance in medical text analysis and its capacity for complex reasoning. To ensure stable
and consistent output suitable for an academic study, we meticulously tuned the LLM’s parameters.
The temperature was set to a low value of 0.2 to minimize creative or speculative responses and ensure
consistent, fact-based predictions. The max_tokens was set to 2000, allowing the LLM to provide detailed,
well-reasoned explanations for its predictions. Each patient’s preprocessed medical information text
block was capped at 10,000 characters to optimize its fit within the LLM’s context window.</p>
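<p>The setup above can be captured in a small configuration sketch; the constants mirror the values reported in the text, while the truncation helper is a hypothetical illustration of the 10,000-character cap.</p>

```python
# Experimental configuration as reported: model name, temperature,
# max_tokens, and the per-patient input cap.
MODEL_NAME = "gemini-2.5-flash-preview-05-20"
GENERATION_CONFIG = {
    "temperature": 0.2,         # low, for consistent fact-based output
    "max_output_tokens": 2000,  # room for detailed explanations
}
MAX_INPUT_CHARS = 10_000        # cap on each preprocessed text block

def prepare_input(text_block):
    # Truncate the narrative so it fits the context-window budget
    return text_block[:MAX_INPUT_CHARS]

capped = prepare_input("x" * 20_000)
```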
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and Results</title>
      <p>Our experiments compared the predictive performance of a traditional Machine Learning (MLP)
framework against a Large Language Model (LLM) utilizing a demographically-aware prompting strategy.
The results demonstrate the LLM’s ability to achieve more equitable and enhanced performance within
a single, unified system, overcoming the structural limitations inherent in traditional models.</p>
      <sec id="sec-4-1">
        <title>4.1. Sex Bias and Limitations of Traditional Machine Learning Models</title>
        <p>Traditional machine learning models often treat all patients as a single, uniform population, overlooking
subtle clinical differences. This "one-size-fits-all" approach, when applied to dementia prediction,
typically results in biased predictions. To address this bias, we analyzed the necessity of separate
modeling. We compared the feature importance rankings of risk factors identified by models trained
separately for men and women, as illustrated in Table 1 and Table 2. This analysis confirmed that
dementia risk factors differ significantly by sex:
• The male prediction model prioritized factors such as Parkinson’s disease and other mental
disorders due to brain damage.
• In contrast, the female prediction model gave higher priority to factors related to cerebral
infarction, depressive episode, and cerebrovascular conditions.</p>
        <p>This clear divergence confirms that information from one sex can act as ’noise’ in the prediction model
for the other sex. Therefore, to address this demographic bias, traditional methods required separate
modeling for men and women.</p>
        <p>Even after implementing sex-separated modeling, the MLP models exhibited persistent performance
disparity:
• The F-measure for the male prediction model was 78.3%.</p>
        <p>• The F-measure for the female prediction model was 72.8%.</p>
        <p>This result indicated that the traditional approach, even when meticulously separated, still struggled to
capture the necessary nuances, leading to significantly lower performance for women (72.8%) compared
to men (78.3%).</p>
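<p>For reference, the F-measure used throughout this comparison is the harmonic mean of precision and recall:</p>

```python
# F-measure (F1): harmonic mean of precision and recall.
def f_measure(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Balanced precision and recall yield the same F-measure
score = f_measure(0.8, 0.8)
```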
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Effectiveness of LLM-Based Personalized Prediction</title>
        <p>In sharp contrast to the traditional approach, the LLM-based methodology achieved superior predictive
accuracy for both sexes using a single, unified framework, eliminating the need for complex, separate
model training. The LLM (specifically the gemini-2.5-flash-preview-05-20 model) leverages its powerful
reasoning capabilities through a Demographically-Aware Prompt strategy. By explicitly providing
demographic context (e.g., sex and age) alongside the medical history, the LLM integrates sex-specific
risk factors into its internal reasoning process. For example, the LLM can draw upon its extensive
knowledge that "depression has a greater impact on dementia in women" to make a more nuanced
prediction. When evaluated using the F-measure (for comparative purposes):
• The F-measure for the male prediction model was 79.1%.</p>
        <p>• The F-measure for the female prediction model was 82.5%.</p>
        <p>These results demonstrate that the LLM successfully surpassed the performance of the traditional
MLP sex-separated models for both men (79.1% &gt; 78.3%) and women (82.5% &gt; 72.8%) (Table 3). Critically,
the LLM achieved its highest performance in the female subgroup (82.5%), effectively mitigating the
substantial bias and lower accuracy that plagued the traditional approach for women (72.8%). This
validates the LLM as an innovative methodology for developing personalized medical AI systems that
respect individual differences and reduce algorithmic bias without requiring separate modeling.</p>
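<p>The reported F-measures also imply a marked narrowing of the male-female performance gap; the short calculation below simply restates the scores from Table 3.</p>

```python
# Sex gap in F-measure for the two approaches, using the reported scores.
f_mlp = {"male": 0.783, "female": 0.728}  # sex-separated MLP models
f_llm = {"male": 0.791, "female": 0.825}  # single LLM, demographically-aware

def sex_gap(scores):
    return abs(scores["male"] - scores["female"])

mlp_gap = sex_gap(f_mlp)  # 0.055
llm_gap = sex_gap(f_llm)  # 0.034
```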
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This study proposed a new methodology for fair and personalized dementia prediction models by
leveraging the capabilities of a Large Language Model. By using a demographically-aware prompting
strategy with patient data from the NHIS-SC DB, we overcame the limitations of the one-size-fits-all
approach that traditional models suffer from. Our research shows that in sensitive medical fields like
dementia prediction, it is essential to move away from a universal approach and adopt a personalized
one that respects the unique characteristics of each individual, using the LLM’s reasoning abilities.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Future Plan</title>
      <p>In this study, the gemini-2.5-flash-preview-05-20 model showed encouraging performance in medical
text analysis and complex reasoning, effectively supporting our clinical data–driven predictive modeling
tasks. Nonetheless, to fully understand its relative strengths and limitations, it is necessary to compare
its performance with other state-of-the-art LLMs in future work.</p>
      <p>The proposed comparative analysis will include widely used state-of-the-art LLMs such as GPT-4,
LLaMA, and Claude, in addition to Gemini. Each of these models differs in terms of architecture,
training data sources, and reasoning capabilities, which are expected to result in varying performance
in processing clinical narratives combined with demographic information. This study aims to evaluate
how these models perform in terms of personalized dementia prediction—particularly in aspects of
fairness and accuracy.</p>
      <p>Future experiments will be conducted by applying the same longitudinal health dataset and unified
prompt engineering strategies across all models. The evaluation will incorporate a comprehensive
set of quantitative metrics, including F1-score, accuracy, recall, and precision. Furthermore, fairness
assessments will be performed across demographic subgroups such as ethnicity, sex, and age to measure
each model’s bias mitigation effectiveness. We will also consider analyzing the impact of intersectional
identities (e.g., age-gender combinations) on predictions and fairness in future work.</p>
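<p>A subgroup fairness audit of the kind planned here can be sketched as follows; the record format and metric choice are hypothetical placeholders for the actual evaluation pipeline.</p>

```python
from collections import defaultdict

# Sketch of a subgroup fairness audit: score a metric per demographic
# subgroup, then report the worst-case gap between subgroups.
def subgroup_scores(records, metric):
    # records: iterable of (subgroup, prediction, label) triples
    by_group = defaultdict(list)
    for group, pred, label in records:
        by_group[group].append((pred, label))
    return {g: metric(pairs) for g, pairs in by_group.items()}

def accuracy(pairs):
    return sum(p == y for p, y in pairs) / len(pairs)

data = [
    ("female", 1, 1), ("female", 0, 1),
    ("male", 1, 1), ("male", 0, 0),
]
scores = subgroup_scores(data, accuracy)
fairness_gap = max(scores.values()) - min(scores.values())
```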
      <p>In addition, explainability—a core focus of our research—will be systematically assessed. Expert
panels will conduct qualitative evaluations of the generated model explanations, and quantitative
measures such as the Explanation Satisfaction Scale (ESS) will be applied. To further contextualize
LLM explanation quality, we plan to compare these outputs against traditional explainable AI (XAI)
approaches such as SHAP and LIME, enabling an objective analysis of practical interpretability in clinical
decision-making. The results of this comparative work are expected to provide essential evidence for
developing an optimal LLM-based dementia prediction and explanation framework for real-world
clinical settings. By systematically considering model performance, fairness, and ethical factors, we
aim to enhance the fair and personalized use of AI in healthcare and offer practical guidelines for LLM
selection. Moreover, the findings will serve as a foundation for extending this research framework to
diverse applications in medical AI and beyond.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This research was supported by the Korea Institute of Science and Technology Information (KISTI)
(No. K25L4M2C4).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Gemini and Grammarly for grammar and
spelling checking and for paraphrasing and rewording. After using these tools, the authors reviewed
and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Aljohani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kommu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey on the trustworthiness of large language models in healthcare</article-title>
          ,
          <source>arXiv abs/2502.15871</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Chinta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Viet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kashif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Ai-driven healthcare: A survey on ensuring fairness and mitigating bias</article-title>
          ,
          <source>arXiv abs/2407.19655</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. F. K.</given-names>
            <surname>Williamson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lipkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sahai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mahmood</surname>
          </string-name>
          ,
          <article-title>Algorithmic fairness in artificial intelligence for medicine and healthcare</article-title>
          ,
          <source>Nature Biomedical Engineering</source>
          <volume>7</volume>
          (
          <year>2023</year>
          )
          <fpage>719</fpage>
          -
          <lpage>742</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Franklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stephens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Piracha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tiosano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lehouillier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Koppel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Elkin</surname>
          </string-name>
          ,
          <article-title>The sociodemographic biases in machine learning algorithms: A biomedical informatics perspective</article-title>
          ,
          <source>Life</source>
          <volume>14</volume>
          (
          <year>2024</year>
          )
          <fpage>652</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Omar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sorin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agbareia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. U.</given-names>
            <surname>Apakama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Soroush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sakhuja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Freeman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Horowitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Nadkarni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Klang</surname>
          </string-name>
          ,
          <article-title>Evaluating and addressing demographic disparities in medical large language models: a systematic review</article-title>
          ,
          <source>International Journal for Equity in Health</source>
          <volume>24</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bucholc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>James</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Khleifat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Badhwar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Clarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dehsarvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Madan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Marzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Schilder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tamburin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M.</given-names>
            <surname>Tantiangco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Lourida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Llewellyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Ranson</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence for dementia research methods optimization</article-title>
          ,
          <source>Alzheimer's &amp; Dementia</source>
          <volume>19</volume>
          (
          <year>2023</year>
          )
          <fpage>5934</fpage>
          -
          <lpage>5951</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rawat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>McBride</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nirmal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Moon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Alamuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>O'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>DiversityMedQA: Assessing demographic biases in medical diagnosis using large language models</article-title>
          ,
          <source>arXiv abs/2409.01497</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ahsan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Amir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Wallace</surname>
          </string-name>
          ,
          <article-title>Elucidating mechanisms of demographic bias in LLMs for healthcare</article-title>
          ,
          <source>arXiv abs/2502.13319</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Seong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Khang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Do</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-S.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Bang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.-J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <article-title>Data resource profile: The National Health Information Database of the National Health Insurance Service in South Korea</article-title>
          ,
          <source>International Journal of Epidemiology</source>
          <volume>46</volume>
          (
          <year>2016</year>
          )
          <fpage>799</fpage>
          -
          <lpage>800</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-W.</given-names>
            <surname>Chun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-Y.</given-names>
            <surname>Coh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.-J.</given-names>
            <surname>Kwon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Moon</surname>
          </string-name>
          ,
          <article-title>Longitudinal study-based dementia prediction for public health</article-title>
          ,
          <source>International Journal of Environmental Research and Public Health</source>
          <volume>14</volume>
          (
          <year>2017</year>
          )
          <fpage>983</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>7. Reviewer's comments and responses</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>1. Clarify Conceptual Framework: Please distinguish between sex (biological) and gender (social construct) in your analysis. Additionally, consider acknowledging that some "gender-specific risk factors" (e.g., differential schizophrenia diagnosis rates) may themselves reflect societal biases rather than inherent biological differences.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>2. Enhance Transparency: Include the exact prompts used in your experiments. This will improve reproducibility and help others build upon your work.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>- I am preparing another journal article on this future work, and it will contain the exact prompts. In the meantime, Section 3.2, "LLM-Based Prediction System Architecture and Experimental Setup", provides a detailed explanation of the process.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>3. Add Visual Comparisons: Consider adding figures or charts comparing model performances to make the results more accessible and impactful.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>- I added a table (Table 3) to present the performance comparison explicitly.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>4. Consider Intersectionality: While you mention evaluating across demographic subgroups, consider discussing how intersectional identities (e.g., age × gender) might affect predictions and fairness in future work.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>