<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLARITY AI: A Comprehensive Checklist Integrating Established Frameworks for Enhanced Research Quality in Medical AI Studies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luca Marconi</string-name>
          <email>luca.marconi@unimib.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Efrem Pirovano</string-name>
          <email>e.pirovano8@campus.unimib.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Cabitza</string-name>
          <email>federico.cabitza@unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IRCCS Ospedale Galeazzi - Sant'Ambrogio</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Milano-Bicocca</institution>
          ,
          <addr-line>20126 Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The medical field is constantly evolving, integrating the latest technologies to enhance patient care and treatment efficacy. While various methodologies are available to evaluate the quality of research studies, checklists are often favored for their efficiency and ease of use. In this study, we contribute to this area of research by 1) analyzing the components of the most widely used checklists, and 2) proposing a more comprehensive checklist, CLARITY AI, which synthesizes the strengths of existing tools. This study analyzed several established checklists (CLAIM, CONSORT, DECIDE, FUTURE, IJMEDI, PRISMA, SPIRIT, STARD, STARE-HI, and TRIPOD) with the goal of developing a comprehensive checklist for evaluating research studies. Each item in these checklists was carefully cataloged, labeled, and assessed. The analysis aimed to identify the most critical items for inclusion in a definitive checklist for research study evaluation. The final version of the checklist is a coherent integration of structural elements (such as Title, Abstract, and Introduction) and essential parameters like Study Identification and Data Handling. This synthesis results in a comprehensive tool for thorough study and research evaluation. By integrating the strengths of multiple established checklists, CLARITY offers a robust, systematic, and user-friendly framework for assessing research quality. This tool not only elevates research standards but also enhances transparency, reproducibility, and overall credibility in the field of medical AI studies. Its application has the potential to produce more reliable and effective healthcare solutions, ultimately improving patient outcomes and advancing medical research.</p>
      </abstract>
      <kwd-group>
        <kwd>AI in Healthcare</kwd>
        <kwd>Research Evaluation</kwd>
        <kwd>CLARITY Framework</kwd>
        <kwd>Medical AI Studies</kwd>
        <kwd>Reproducibility</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The integration of artificial intelligence (AI) into healthcare is transforming medical research and
practice, offering significant opportunities to improve diagnostic accuracy, personalize treatment,
and enhance patient outcomes. Despite these advancements, the rapid proliferation of AI technologies
also brings critical challenges, particularly in ensuring that AI research adheres to rigorous standards
of quality, transparency, and reproducibility.</p>
      <p>
        Existing frameworks, like CLAIM[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], CONSORT[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
         ][6], PRISMA[7][8][9], and OPTICA[10]
address specific aspects of AI research but often lack the breadth required to evaluate the full complexity
of AI in healthcare. The absence of a unified, adaptable framework hinders the reliable integration of
AI technologies in clinical practice.
      </p>
      <p>A critical gap in current evaluation methodologies is their limited scope, often neglecting key
dimensions such as ethical considerations, data management, and usability—factors that are essential
for the safe and effective deployment of AI in healthcare environments. Existing checklists such as
TRIPOD[11][12] and STARE-HI[13], though valuable, do not adequately account for the iterative nature
of AI models and their reliance on dynamic datasets. This shortcoming is further exacerbated by the
absence of standardized approaches to addressing ethical challenges, such as algorithmic bias and
patient privacy, which are increasingly recognized as fundamental concerns in AI research. Thus, a
comprehensive tool that integrates the strengths of existing frameworks while broadening their scope
is urgently needed to address these gaps and ensure the responsible development and deployment of AI
in healthcare.</p>
      <p>In response to these challenges, we propose CLARITY AI, a synthesized and adaptable checklist designed
for the comprehensive evaluation of AI-driven medical studies. CLARITY AI combines critical elements
from ten established checklists into a unified framework. This tool addresses both technical and
methodological rigor while also emphasizing data handling, ethical governance, and usability, ensuring
that AI studies are scientifically robust, ethically sound, and practically relevant. With its structured
yet flexible approach, CLARITY AI offers a more complete evaluation system that enhances research
quality, reproducibility, and ultimately supports the safe integration of AI technologies into clinical
practice.</p>
      <p>By providing a holistic solution for evaluating AI research, this paper presents CLARITY AI as a key
contribution to the field, aiming to establish a new standard for research quality in AI-driven medical
studies. The implications of its adoption extend beyond improved transparency and rigor, offering
the potential to accelerate the responsible deployment of AI tools in healthcare, ultimately advancing
patient care.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
      <p>The development of CLARITY proceeded through a structured, multifaceted process aimed at creating
a comprehensive checklist to address gaps in existing frameworks for evaluating AI-driven medical
research, particularly within healthcare. Our approach synthesized elements from existing checklists
while expanding their scope to meet the unique challenges posed by AI technologies, including data
handling, ethical considerations, and the dynamic nature of AI models. The methods used ensured that
CLARITY captured all critical aspects of AI-driven research, from technical rigor to ethical deployment
in real-world settings.</p>
      <p>The methodological approach underpinning CLARITY’s development was rooted in a detailed analysis
of key checklists, each contributing specific strengths to create a robust, flexible framework adaptable
to the evolving demands of AI in healthcare. We conducted a systematic review of widely used AI
checklists, and our guiding research question was how to design a checklist that addresses the technical
evaluation of AI models while integrating essential aspects of transparency, reproducibility, ethical
governance, and usability. This approach aligns directly with the goal of ensuring reliable and ethical
implementation of AI technologies in clinical practice.</p>
      <p>2.1. Identification and Contribution of Key Checklists</p>
      <p>Several established checklists were selected, analyzed, and categorized into macro topics and structural
items to shape the CLARITY framework, as detailed in [Tab. 1] and [Tab. 2]. Specifically, we selected
these checklists based on their widespread adoption and relevance to AI in healthcare.</p>
      <p>The CLAIM (Checklist for Artificial Intelligence in Medical Imaging) played a pivotal role in
developing the Model Details and Data Handling sections of CLARITY. CLAIM’s rigorous emphasis on data
validation and performance metrics ensured that CLARITY effectively captured key aspects of model
transparency and reproducibility, which are particularly crucial in medical AI, where model reliability
must be demonstrated through robust data management practices.</p>
      <p>Similarly, CONSORT (Consolidated Standards of Reporting Trials) provided a structured approach
to study design, particularly for randomized controlled trials. CONSORT’s focus on transparency and
participant flow informed CLARITY’s study design and methods categories, ensuring that AI studies
adhere to the highest standards of rigor and reproducibility. This framework was vital in shaping how
AI studies are documented and reported, creating a foundation for reliable implementation.</p>
      <p>[Tab. 1] and [Tab. 2] (not reproduced here) chart which of the ten checklists (CLAIM AI, CONSORT AI,
DECIDE AI, FUTURE AI, IJMEDI AI, PRISMA AI, SPIRIT AI, STARD AI, STARE-HI, and TRIPOD AI) cover
each structural item and each macro topic item, including Study Identification, Structured Summary,
Background and Objectives, Study Design and Methods, Data Handling, Model Details, Performance
Metrics, Results and Findings, Discussion and Implications, Ethics and Governance, Human Factors and
Usability, and Transparency and Reproducibility.</p>
      <sec id="sec-2-1">
        <title>Checklist Table Structural Items</title>
      </sec>
      <sec id="sec-2-2">
        <title>Checklist Table Macro Topic Items</title>
        <sec id="sec-2-2-2">
          <title>ABSTRACT</title>
          <p>X
X
X
?
X
X
X</p>
          <p>X
DISCUSSION</p>
          <p>X
X
X
?
X
X
X
X
X
accessible to both clinicians and researchers.
was central to shaping CLARITY’s Ethics and Governance sections, its contributions to the structural
organization of the framework were limited, as indicated by the use of the symbol "?" to denote its
minimal role in these areas [Tab. 2]. Thus, FUTURE’s input is more visible in the macro-level topics
rather than in the core structural elements [Tab. 1].</p>
          <p>Additional contributions came from IJMEDI [15], PRISMA (Preferred Reporting Items for Systematic
Reviews and Meta-Analyses), SPIRIT (Standard Protocol Items: Recommendations for Interventional
Trials)[16], STARD (Standards for Reporting Diagnostic Accuracy Studies)[17][18], STARE-HI (Standards
for Reporting of Health Informatics), and TRIPOD (Transparent Reporting of a Multivariable Prediction
Model for Individual Prognosis or Diagnosis). Each of these checklists provided insights, particularly in
data management and predictive model evaluation, enriching CLARITY’s scope.
2.2. Comprehensive Analysis and Integration of Checklists
The integration of these checklists was a complex and iterative process aimed at developing a unified
framework without redundancy. Each checklist’s structure and focus areas were carefully analyzed to
ensure that CLARITY incorporated the most valuable elements while eliminating duplicative or overly
narrow criteria. We used a comparative approach to analyze the strengths of each checklist, identifying
overlaps and gaps across them. For instance, CLAIM’s emphasis on model validation was harmonized
with CONSORT’s focus on participant flow and study design. This synthesis ensured that CLARITY
addressed both technical and methodological rigor, while also encompassing ethical governance and
usability.</p>
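          <p>The comparative step described above can be read, informally, as counting how many checklists cover
each macro topic: topics covered by many checklists are overlaps to be consolidated, while sparsely
covered topics are candidate gaps for CLARITY to fill. The Python sketch below illustrates this reading
under assumed inputs and an arbitrary threshold; it is not the authors’ actual procedure.</p>
          <preformat>
from collections import Counter

def overlaps_and_gaps(coverage, overlap_threshold=3):
    """Split macro topics into widely covered ones and sparsely covered ones.

    coverage maps each checklist to the set of macro topics it addresses,
    as in the coverage-map sketch earlier; the threshold is illustrative.
    """
    counts = Counter(t for topics in coverage.values() for t in topics)
    overlaps = {t: n for t, n in counts.items() if n >= overlap_threshold}
    gaps = {t: n for t, n in counts.items() if t not in overlaps}
    return overlaps, gaps

# Tiny made-up example: transparency is widely covered, usability is not.
demo = {
    "CLAIM AI": {"TRANSPARENCY AND REPRODUCIBILITY", "MODEL DETAILS"},
    "CONSORT AI": {"TRANSPARENCY AND REPRODUCIBILITY"},
    "PRISMA AI": {"TRANSPARENCY AND REPRODUCIBILITY"},
    "DECIDE AI": {"HUMAN FACTORS AND USABILITY"},
}
well_covered, sparse = overlaps_and_gaps(demo)
print(sorted(well_covered), sorted(sparse))
          </preformat>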
          <p>During the integration, common themes such as transparency, reproducibility, and ethical standards
were identified across multiple checklists and synthesized into CLARITY. This ensured that the
framework was not constrained by the limitations of any single checklist but rather offered a more versatile,
adaptable tool for AI research, particularly given the dynamic nature of AI models and evolving datasets.</p>
          <p>CLARITY’s flexibility was a key consideration in its development. Unlike traditional checklists,
which may be rigid, CLARITY was designed to evolve alongside advancements in AI technology and
emerging ethical challenges. This adaptability ensures that the framework remains relevant and useful
in a rapidly changing field. Furthermore, it was designed to be user-friendly, providing clear guidance
for applying the checklist in diverse research contexts, from diagnostic imaging to predictive modeling.</p>
          <p>One of the primary challenges in developing CLARITY was ensuring that the checklist remained
comprehensive without becoming overly burdensome. To address this, we consolidated overlapping
criteria while ensuring that no critical aspects were omitted. For example, although CLAIM and PRISMA
both emphasize transparency, their approaches differ significantly. We integrated the most relevant
elements from each, creating a unified guideline applicable to a broad range of AI research, ensuring a
thorough evaluation without unnecessary complexity.</p>
          <p>Another significant challenge was accommodating the iterative nature of AI models, which are often
refined in real-time as new data becomes available. Traditional checklists, designed for static research
studies, do not account for this iterative development. CLARITY, however, includes specific guidelines
for evaluating data integrity, scalability, and security throughout the model lifecycle, ensuring that AI
models are rigorously assessed over time.</p>
          <p>Lastly, usability was a key consideration in CLARITY’s development. The framework includes
guidelines for evaluating the human factors and usability of AI tools, ensuring that they are practical
and accessible to clinicians and researchers in real-world healthcare environments. This is particularly
important for the success of medical AI, where seamless integration into clinical workflows is essential.</p>
          <p>To further streamline the evaluation process, we developed tables summarizing both structural [Tab.
3] and macro topic items [Tab. 4] across the integrated checklists. These tables help researchers and
practitioners quickly identify overlaps and gaps, ensuring that all critical aspects of AI research are
addressed and facilitating the application of CLARITY in diverse research contexts.</p>
          <p>In conclusion, CLARITY is a comprehensive, adaptable framework that integrates the strengths of
multiple established checklists into a single tool. By addressing gaps in existing methodologies and
expanding the scope of evaluation to include ethical and practical considerations, CLARITY establishes
a new standard for evaluating AI-driven medical research. This framework ensures that AI technologies
are not only scientifically robust but also ethically sound and practically relevant, promoting their
responsible integration into healthcare.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Gap Analysis</title>
      <p>[Tab. 3] and [Tab. 4] (not reproduced here) list CLARITY’s structural and macro topic items together
with their sub-items: Structured Summary (Study Design and Methods; Key Outcomes and Results;
Conclusions), Background and Objectives (Background; Objectives), Study Design and Methods (Study
Design 1; Study Design 2; Data Collection; Test Methods; Data Preparation; Outcome and Predictors;
Model Development; Model Validation; Analysis), Data Handling (Data Management Plans; Data Security
and Storage; Data Quality Assurance), Model Details (Model Architecture; Training Data; Evaluation
Metrics), Results and Findings (Baseline Characteristics; Results of the AI Intervention; Estimation
Methods; Participant Flow), Discussion and Implications (Interpretation of Findings; Comparison with
Existing Studies; Implications for Practice; Future Research Directions), Ethics and Governance (Research
Ethics Approval; Confidentiality; Declaration of Interests; Access to Data), Human Factors and Usability
(User Engagement; Usability Testing), and Transparency and Reproducibility (Data Availability; Consent
or Assent; Data Sharing).</p>
      <p>The development of CLARITY began with a thorough examination of the limitations in existing
AI-focused medical research checklists. The ten checklists analyzed—CLAIM, CONSORT, DECIDE,
FUTURE, IJMEDI, PRISMA, SPIRIT, STARD, STARE-HI, and TRIPOD—each serve distinct purposes
within the medical research landscape. However, when assessed against the comprehensive needs
required to assess AI-driven studies in healthcare, several critical gaps became evident. These gaps
reveal significant shortcomings in existing methodologies, which must be addressed to effectively
evaluate the unique challenges posed by AI technologies in healthcare.</p>
      <p>3.1. Analysis of Gaps in Existing Checklists</p>
      <p>
One of the most prominent issues identified in the analysis was the limited scope and narrow focus of
many existing checklists. For instance, CLAIM[19] and STARE-HI, while valuable for specific areas
such as diagnostic accuracy and health informatics, do not offer comprehensive guidance on broader
aspects, particularly study design and ethical considerations. Due to its complexity, AI research requires
a multi-dimensional evaluation approach that considers both the technical robustness and the ethical
implications of these technologies in real-world healthcare settings. However, checklists like CONSORT
and SPIRIT, which are primarily designed for clinical trials, do not sufficiently address the transparency
and reproducibility challenges that are essential for the reliable deployment of AI systems. These
checklists may overlook the iterative nature of AI models, especially in managing dynamic datasets
that continuously evolve—a key characteristic that distinguishes AI from conventional technologies.
Additionally, while TRIPOD is comprehensive in reporting predictive models, it lacks sufficient coverage
in areas such as usability and human factors, which are critical for ensuring that AI technologies are
not only technically sound but also practical for end-users like clinicians and patients. Usability is
essential in determining whether an AI tool will be successfully integrated into clinical practice, yet
many checklists lack comprehensive guidance on this aspect. While DECIDE and FUTURE address
ethical and fairness considerations, they fall short of providing a systematic framework for the ongoing
governance of AI systems. The governance of AI in healthcare must address critical concerns such as
ensuring fairness in AI-driven decisions, managing biases, and safeguarding patient data. However, many
checklists lack structured approaches for evaluating these dimensions, particularly as AI technologies
are implemented in diverse healthcare environments with varying regulatory and ethical standards.
Additionally, existing checklists lack flexibility, making them less adaptable to the rapidly evolving
AI landscape in healthcare. Tools like CLAIM and PRISMA exemplify this rigidity, as they are often
focused on specific research methods or applications, leaving little room for integration with emerging
methodologies or newer AI technologies. This rigidity can limit the utility of such checklists when
researchers work with innovative AI systems that do not align with traditional evaluation frameworks.
For example, as AI models evolve toward more complex architectures, such as neural networks and
unsupervised learning systems, the ability to adapt evaluation frameworks becomes essential.
Checklists like DECIDE and FUTURE offer some decision-making structures, but even these tools lack the
adaptability required to keep pace with AI advancements, particularly in complex clinical applications
where AI systems are continuously evolving.</p>
      <p>Another significant gap involves the handling of ethical issues specific to AI. While some checklists,
such as CONSORT, SPIRIT[20], and PRISMA, provide minimal guidance on AI-specific ethical
concerns—such as bias mitigation, data privacy, and the long-term impact of AI on patient care—these
aspects are becoming increasingly critical as AI technologies become more pervasive in healthcare.
Without comprehensive ethical guidance, AI-driven medical research risks introducing biases,
compromising patient privacy, or deploying models that have unintended negative consequences on patient
outcomes. Although FUTURE and DECIDE emphasize fairness and ethical considerations, they lack
a structured framework that fully addresses the ethical governance of AI systems, particularly as it
relates to continuous monitoring and accountability for AI decisions. AI systems are not static, and
governance structures must be established to ensure their ongoing ethical performance.</p>
      <p>3.2. Addressing the Gaps and Developing CLARITY</p>
      <p>To address the gaps identified in the existing checklists, CLARITY was developed as a comprehensive and
adaptable framework. It not only integrates the strengths of established checklists but also expands their
scope, particularly by including detailed guidelines on ethical governance and data privacy. CLARITY
provides a thorough assessment of critical concerns such as bias mitigation, data handling, and ongoing
AI governance, ensuring that AI technologies are deployed responsibly and equitably across diverse
healthcare environments. By incorporating these elements, the framework facilitates continuous
monitoring of ethical risks, such as bias, which can directly impact patient care. Additionally, the
framework remains adaptable, evolving with new AI technologies and methodologies, which keeps it
relevant in the changing landscape of healthcare AI and able to accommodate new challenges. As AI
models advance, CLARITY’s modular structure allows updates while preserving its core principles.</p>
      <p>While addressing gaps in the ethical and technical evaluation of AI systems, the framework also
leverages the strengths of existing checklists. It incorporates key principles from these checklists, such
as transparency, reproducibility, and structured reporting. By requiring detailed documentation of AI
model development, data handling, and performance metrics, it ensures that studies can be independently
verified, thereby contributing to the integrity and transparency of AI research in healthcare.</p>
      <p>In addition to ethical and technical aspects, CLARITY emphasizes practical usability, ensuring that
AI tools are accessible and functional for end-users. By incorporating human factors and usability
assessments, CLARITY promotes the development of AI systems that are not only effective but also
user-friendly. In healthcare environments, where clinicians may have limited time or technical expertise,
the usability of AI tools can determine whether they are adopted into routine practice. CLARITY’s
inclusion of usability testing bridges the gap between technical innovation and practical application,
ensuring that AI technologies are truly beneficial in real-world clinical settings.</p>
      <p>By addressing these gaps, CLARITY provides a comprehensive, flexible, and user-friendly framework
for evaluating AI-driven research in healthcare. It not only fills the critical gaps identified in existing
checklists but also builds on their strengths, offering a unified tool that adapts to the evolving demands
of AI research in medical settings.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>After establishing the aforementioned categories, each was assigned a specific evaluation method and
scoring system. This design allows researchers to easily and intuitively assess the quality of the study
and determine whether it meets the necessary requirements [Tab. 5][Tab. 6].</p>
      <p>4.1. Scoring and Evaluation</p>
      <p>Each category in CLARITY was analyzed to establish criteria for evaluation. The scoring system, ranging
from 0 to 5 for each item, provides a straightforward mechanism for assessing study completeness
and quality. This approach identifies strengths and weaknesses, helping researchers focus on areas for
improvement.</p>
      <p>Study Identification: Assesses how clearly the study is identified and defined.</p>
      <p>Structured Summary: Evaluates the completeness and clarity of the study’s abstract and summary,
ensuring they provide a comprehensive overview.</p>
      <p>Background and Objectives: Examines the context and rationale behind the study, ensuring the
objectives are clear and justified.</p>
      <p>Study Design and Methods: Focuses on the robustness and appropriateness of the study design
and methodologies used.</p>
      <p>Data Handling: Evaluates the procedures for data collection, processing, and management to ensure
data integrity and reliability.</p>
      <p>Model Details: Analyzes the specifics of the AI model used, including its development, validation,
and any comparative analyses performed.</p>
      <p>Performance Metrics: Assesses the measures used to evaluate the AI model’s performance, including
accuracy, precision, and other metrics.</p>
      <p>Results and Findings: Reviews the presentation and interpretation of the study’s results, ensuring
they are clear and well-supported by data.</p>
      <p>Discussion and Implications: Evaluates the depth and breadth of the discussion, including the
implications of the findings for clinical practice and future research.</p>
      <p>Ethics and Governance: Ensures that ethical considerations and governance issues are thoroughly
addressed and documented.</p>
      <p>Human Factors and Usability: Assesses the involvement of end-users in the design and usability
testing of the AI tool, ensuring it meets user needs and expectations.</p>
      <p>Transparency and Reproducibility: Evaluates the study’s transparency in reporting and its
potential for reproducibility by other researchers.</p>
      <p>4.2. Visualization and Data Distribution</p>
      <p>To complement the scoring system, CLARITY employs bar charts to provide intuitive visualizations
of data distribution and study results. These visualizations enable researchers to quickly assess study
quality across various categories and compare overall scores.</p>
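      <p>To make the 0-to-5 scoring concrete, the sketch below records hypothetical sub-item scores for one
study, grouped by the categories listed above, and reports the per-category averages that feed the charts;
the sub-item names and score values are assumptions for illustration, not prescribed by CLARITY.</p>
      <preformat>
# Hypothetical CLARITY score sheet for one study: each sub-item gets an
# integer score from 0 to 5. Sub-item names and values are illustrative.
scores = {
    "Data Handling": {"Data Management Plans": 5,
                      "Data Security and Storage": 4,
                      "Data Quality Assurance": 3},
    "Model Details": {"Model Architecture": 4,
                      "Training Data": 2,
                      "Evaluation Metrics": 5},
    "Transparency and Reproducibility": {"Data Availability": 3,
                                         "Data Sharing": 4},
    # ... remaining categories omitted for brevity
}

def category_averages(scores):
    """Average the 0-5 sub-item scores within each category."""
    return {cat: sum(items.values()) / len(items) for cat, items in scores.items()}

for category, avg in category_averages(scores).items():
    print(f"{category}: {avg:.2f}")
      </preformat>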
      <p>The bar charts [Fig. 1] [Fig. 2] offer a multi-dimensional view of each study’s performance. Each axis
in the chart corresponds to a specific category (e.g., transparency, ethical governance, data handling),
allowing for a quick and comprehensive assessment of a study’s strengths and weaknesses. Structural
items are mapped in [Fig. 1], while macro topic items are depicted in [Fig. 2].</p>
      <p>In addition to these bar charts, a third chart [Fig. 3] represents the total score, combining both macro
topic items and structural items. This bar chart highlights the "Total Score," providing a clear comparison
of the completeness and quality of each evaluated study. The bar chart simplifies the comparative
analysis by offering a straightforward visual that juxtaposes the final scores, making it easy to identify
studies with stronger or weaker evaluations.</p>
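      <p>One possible way to produce bar charts of this kind, assuming matplotlib and category averages such as
those from the previous sketch, is shown below; the aggregation behind the "Total Score" is not spelled
out in the text, so summing category averages is used here purely as an assumption.</p>
      <preformat>
import matplotlib.pyplot as plt

# Illustrative category averages (e.g. produced by the scoring sketch above).
avgs = {"Study Identification": 3.5, "Data Handling": 4.0, "Model Details": 3.7}

fig, ax = plt.subplots(figsize=(8, 3))
ax.bar(list(avgs.keys()), list(avgs.values()))
ax.set_ylim(0, 5)                      # scores live on the 0-5 scale
ax.set_ylabel("Average score")
ax.set_title("CLARITY category scores (illustrative)")
ax.tick_params(axis="x", rotation=30)
fig.tight_layout()

# Assumed total: the sum of category averages, mirroring the combined chart.
print(f"Total score: {sum(avgs.values()):.2f}")
plt.show()
      </preformat>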
      <p>The integration of these categories, scoring techniques, and visualization tools has resulted in the
creation of CLARITY, a comprehensive framework for the scientific evaluation of AI research in the
medical field. By incorporating the strengths of various established checklists and refining them
into a unified tool, CLARITY aims to set a new standard for assessing the quality and completeness
of AI-related medical research. This holistic approach ensures that researchers have a reliable and
user-friendly method for evaluating their studies, ultimately advancing the field of medical AI research.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>The CLARITY AI checklist provides a much-needed, comprehensive framework for evaluating the
quality of AI-driven medical research, addressing gaps identified in previous checklists such as CLAIM,
PRISMA, and CONSORT. CLARITY was developed in response to the rapidly evolving field of AI in
healthcare, which requires rigorous and adaptable tools to capture the complexities inherent in AI
systems, including issues of transparency, data handling, and ethical governance.</p>
      <p>Key findings from the study indicate that CLARITY effectively bridges critical gaps by integrating
diverse evaluation elements, including study design, ethical considerations, human factors, and
usability. Its multi-dimensional approach ensures that AI studies are scientifically robust, practical, and
ethical—crucial in the clinical deployment of AI systems.</p>
      <p>The integration of established frameworks like CLAIM and STARE-HI underpins CLARITY’s emphasis
on transparent reporting and performance metrics, which are essential for reproducibility and effective
validation of AI models. However, CLARITY goes beyond these existing tools by offering a more holistic
approach that includes evaluating the usability of AI tools—an aspect often overlooked in traditional
checklists. This focus on usability ensures that AI models are tested for real-world practicality,
increasing their chances of successful adoption in clinical settings.</p>
      <p>An unexpected finding during the checklist’s development was the lack of comprehensive guidelines
addressing the ongoing governance and ethical oversight of AI tools, particularly for mitigating
algorithmic bias and ensuring patient safety. While frameworks like FUTURE have advanced fairness and
ethics, CLARITY’s structured focus on these elements sets it apart by embedding ethical considerations
into every stage of the research process. This approach helps prevent AI models that may perform well
technically but pose ethical risks from advancing to clinical implementation.</p>
      <p>Comparison with previous research reveals that CLARITY not only integrates best practices from
existing tools but also innovates by expanding the scope of evaluation. For instance, while TRIPOD
provides detailed guidance on predictive models, it does not address usability or the iterative nature
of AI tool development, which CLARITY incorporates. Similarly, CONSORT’s strength in clinical
trials is complemented by CLARITY’s broader applicability to AI studies, particularly in terms of data
management and ethical governance—areas where CONSORT is less focused.</p>
      <p>[Fig. 1 and Fig. 2: bar charts of average scores (0,00-5,00) per item; the visible axis labels cover the
structural items Title, Abstract, Introduction, Methods, Results, Discussion, Conclusion, and Other
Elements. Charts not reproduced here.]</p>
      <p>With its structured and flexible design, CLARITY is poised to make a significant impact on future AI
research by setting a new standard for comprehensive and systematic evaluation. The integration of
user feedback mechanisms allows CLARITY to evolve alongside advances in AI technology, ensuring
its continued relevance and effectiveness as AI becomes more embedded in healthcare. Additionally,
CLARITY’s emphasis on transparent reporting and reproducibility will likely enhance the overall
reliability of AI studies, facilitating more accurate meta-analyses and systematic reviews—essential for
the responsible scaling of AI technologies in healthcare.</p>
      <p>However, CLARITY’s current iteration has some limitations. The framework has not yet undergone
extensive empirical validation, and its real-world application across diverse research environments
remains untested. Future research should aim to address this by conducting empirical studies in various
clinical settings. Additionally, training and usability challenges pose potential barriers to widespread
adoption, particularly in resource-constrained environments. Future work should focus on simplifying
the checklist for ease of use and developing comprehensive training modules to support adoption by
researchers of varying levels of expertise. Limitations and recommendations for further studies will be
detailed in Section 6 and Section 7.</p>
      <p>In conclusion, CLARITY represents a significant advancement in evaluating AI-driven medical research.
By offering a rigorous, user-friendly, and adaptable tool, it addresses critical gaps in existing frameworks
and promotes high standards of research quality. As AI continues to transform healthcare, tools like
CLARITY will be essential to ensure that the supporting research is robust, reproducible, and ethically
sound. Future studies should focus on refining and validating CLARITY in practice to ensure it can
adapt to the evolving landscape of AI in medicine.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Limitations</title>
      <p>While CLARITY represents a significant advancement in evaluating AI-driven medical research, its
current form has certain limitations. These can be categorized into three main areas: lack of empirical
validation, complexity and usability challenges, and barriers to training and adoption. First, although
CLARITY is built on a solid theoretical foundation, it has yet to undergo extensive real-world testing,
leaving its practical utility unproven. The lack of empirical validation raises the possibility that some
elements may not function as intended across diverse research environments, underscoring the need
for studies that test the checklist in various contexts to ensure its broader applicability, as detailed in
Section 7.</p>
      <p>Secondly, while CLARITY’s comprehensive nature aims to ensure thorough evaluations, it may present
challenges for users unfamiliar with AI research or those in resource-limited settings. The detailed
criteria can be overwhelming, indicating a need for simplification or tiered complexity to accommodate
varying levels of expertise and resources. Finally, the specialized knowledge required for effective use of
CLARITY may hinder its adoption. The steep learning curve, particularly in teams lacking AI expertise,
could impede widespread implementation.</p>
      <p>Moreover, the current version does not incorporate user feedback, meaning practical challenges in its
application have not yet been addressed. To overcome these barriers, it would be beneficial to develop
training programs to facilitate the checklist’s use and gather feedback from early adopters to refine the
framework. In summary, while CLARITY shows promise as a robust tool for AI research evaluation,
further empirical validation, simplification, and user-focused improvements will be essential to fully
realize its potential in enhancing research quality in healthcare.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Recommendations</title>
      <p>Although CLARITY was meticulously developed by synthesizing existing frameworks, its adoption
and long-term utility in evaluating AI-driven medical research will benefit from rigorous empirical
validation. We recommend conducting pilot studies and case studies in diverse healthcare settings to
assess the checklist’s effectiveness in real-world applications. These studies should test CLARITY’s
comprehensiveness, usability, and adaptability across various research contexts and clinical
environments.</p>
      <p>A potential framework for empirical validation could involve the following steps:
1. Pilot Implementation: Researchers could retrospectively apply CLARITY to a range of AI-driven
medical studies, including diagnostic imaging, predictive modeling, and treatment planning.
These applications should cover diverse AI methodologies (e.g., supervised and unsupervised
learning) to ensure broad relevance.
2. Comparative Analysis: To assess CLARITY’s utility, comparisons with other established checklists
(e.g., CLAIM, CONSORT) could be conducted. Evaluating studies with both CLARITY and these
checklists will help determine if the integrated approach provides additional insights or uncovers
previously overlooked issues (a minimal scoring-comparison sketch follows this list).
3. User Feedback: Engaging researchers and clinicians is fundamental to understanding the checklist’s
usability. Structured interviews or surveys with users applying CLARITY can provide feedback
on ease of use, clarity, and the ability to inform research improvements.
4. Iterative Refinement: Based on pilot findings and feedback, the checklist can be adjusted.
Continuous validation across various clinical settings and AI applications will help ensure CLARITY
remains relevant and adaptable to evolving AI technologies.
5. Outcome Assessment: Ultimately, the checklist’s success should be measured by improvements
in research quality, transparency, and reproducibility. Meta-analyses of studies evaluated by
CLARITY could demonstrate whether its adoption correlates with higher standards in AI-driven
research.</p>
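      <p>As a minimal scoring-comparison sketch for step 2, assuming both CLARITY and a comparator checklist
yield a numeric score per study, rank agreement between the two instruments can be computed as below;
the study names, score values, and the choice of Spearman correlation are illustrative assumptions, not
a prescribed protocol.</p>
      <preformat>
# Hypothetical per-study scores from two instruments applied to the same
# studies; all values are invented for illustration.
clarity_scores = {"study_A": 42.0, "study_B": 35.5, "study_C": 48.0}
claim_scores = {"study_A": 30.0, "study_B": 27.0, "study_C": 33.0}

def rank(values):
    """Return the rank (1 = lowest) of each key by its value."""
    ordered = sorted(values, key=values.get)
    return {k: i + 1 for i, k in enumerate(ordered)}

def spearman(a, b):
    """Spearman rank correlation for two score dicts over the same studies."""
    ra, rb = rank(a), rank(b)
    n = len(a)
    d2 = sum((ra[k] - rb[k]) ** 2 for k in a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(f"Rank agreement (Spearman): {spearman(clarity_scores, claim_scores):.2f}")
      </preformat>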
      <p>By following these steps, future studies can provide critical insights into improving CLARITY and
ensure that it remains a valuable tool for evaluating AI research in healthcare.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Conclusions</title>
      <p>CLARITY represents a significant advancement in evaluating medical research involving artificial
intelligence. Its meticulously designed framework integrates best practices from various established
checklists. By extracting and refining the most valuable elements from these sources, CLARITY provides
a comprehensive tool that addresses both structural items and macro topic items, ensuring a thorough
assessment of AI studies in healthcare.</p>
      <p>The scoring system, along with visualization tools like spider charts and bar charts, enhances the
checklist’s usability and clarity, making it easier for researchers to evaluate their work. This systematic
approach highlights a study’s strengths and weaknesses while guiding researchers toward areas for
improvement, fostering a culture of continuous enhancement in AI medical research.</p>
      <p>CLARITY’s structured evaluation method ensures that all critical aspects of a study are thoroughly
assessed. This includes study identification clarity, completeness of structured summaries, relevance
and justification of background and objectives, robustness of study design and methods, integrity of
data handling, model details, accuracy of performance metrics, clarity of results and findings, depth
of discussion and implications, adherence to ethical standards, consideration of human factors and
usability, and transparency and reproducibility.</p>
      <p>By providing a detailed, standardized evaluation framework, CLARITY helps researchers produce
high-quality, transparent, and reproducible AI studies. This enhances the credibility of individual
studies and contributes to the overall integrity and advancement of AI research in the medical field.
CLARITY aims to set a new standard for evaluating AI studies in healthcare, promoting best practices
and high standards across the research community. Its comprehensive approach ensures all relevant
aspects of a study are considered, reducing the risk of oversight and enhancing the robustness of research
findings. By fostering rigorous evaluations, CLARITY helps ensure that AI technologies developed and
tested in healthcare settings are reliable, effective, and safe.</p>
      <p>Moreover, using CLARITY can lead to more consistent and comparable evaluations of AI studies,
facilitating meta-analyses and systematic reviews. This, in turn, can accelerate the adoption of effective
AI technologies in clinical practice, ultimately improving patient outcomes and advancing medicine.</p>
      <p>As AI evolves and its healthcare applications expand, CLARITY must adapt to incorporate new insights
and developments. Ongoing feedback from the research community will be crucial for refining and
updating the checklist to keep it relevant and effective. Future iterations of CLARITY may include
additional categories or refined scoring criteria to better capture emerging trends and technologies in
AI research.</p>
      <p>Overall, CLARITY provides a comprehensive, systematic, and user-friendly framework for evaluating
AI studies in the medical field.</p>
      <p>The success and evolution of CLARITY rely heavily on interdisciplinary collaboration between
researchers in AI, healthcare, ethics, and policy-making. As AI becomes more integrated into healthcare,
input from diverse fields—including medical professionals, data scientists, legal experts, and
ethicists—will be essential. CLARITY provides a robust starting point for evaluating AI research, but
its continued relevance will depend on contributions from these diverse fields to address emerging
challenges, such as patient safety, data privacy, and algorithmic fairness. By fostering interdisciplinary
partnerships, CLARITY can evolve into a universally accepted standard, guiding AI innovation to be
both scientifically rigorous and socially responsible. This collaborative approach will ensure CLARITY
remains adaptive to the complex and evolving landscape of AI in healthcare.</p>
      <p>[6] R. Shahzad, B. Ayub, M. A. R. Siddiqui, Quality of reporting of randomised controlled trials of
artificial intelligence in healthcare: a systematic review, BMJ Open 12 (2022). URL:
https://bmjopen.bmj.com/content/12/9/e061519. doi:10.1136/bmjopen-2022-061519.
[7] M. J. Page, et al., The PRISMA 2020 statement: an updated guideline for reporting systematic
reviews, BMJ 372 (2021).
[8] M. J. Page, J. E. McKenzie, P. M. Bossuyt, I. Boutron, T. C. Hoffmann, C. D. Mulrow, L. Shamseer,
J. M. Tetzlaff, E. A. Akl, S. E. Brennan, R. Chou, J. Glanville, J. M. Grimshaw, A. Hróbjartsson,
M. M. Lalu, T. Li, E. W. Loder, E. Mayo-Wilson, S. McDonald, L. A. McGuinness, L. A. Stewart,
J. Thomas, A. C. Tricco, V. A. Welch, P. Whiting, D. Moher, The PRISMA 2020 statement: an
updated guideline for reporting systematic reviews, Systematic Reviews 10 (2021) 89. URL:
https://doi.org/10.1186/s13643-021-01626-4. doi:10.1186/s13643-021-01626-4.
[9] A. C. Tricco, E. Lillie, W. Zarin, et al., PRISMA extension for scoping reviews (PRISMA-ScR):
Checklist and explanation, Annals of Internal Medicine 169 (2018) 467–473. URL:
https://doi.org/10.7326/M18-0850. doi:10.7326/M18-0850. PMID: 30178033.
[10] N. Dagan, et al., Evaluation of AI solutions in health care organizations — the OPTICA tool, NEJM
AI 0 (2024) AIcs2300269.
[11] G. S. Collins, et al., TRIPOD+AI statement: updated guidance for reporting clinical prediction
models that use regression or machine learning methods, BMJ 385 (2024). doi:10.1136/bmj-2023-078378.
[12] G. S. Collins, K. G. M. Moons, P. Dhiman, et al., TRIPOD+AI statement: updated guidance for
reporting clinical prediction models that use regression or machine learning methods, BMJ 385 (2024).
doi:10.1136/bmj-2023-078378.
[13] J. Talmon, E. Ammenwerth, J. Brender, N. de Keizer, P. Nykänen, M. Rigby, STARE-HI—statement on
reporting of evaluation studies in health informatics, International Journal of Medical Informatics
78 (2009) 1–9.
[14] B. Vasey, et al., Reporting guideline for the early stage clinical evaluation of decision support
systems driven by artificial intelligence: DECIDE-AI, BMJ 377 (2022) e070904.
[15] A. Iancu, I. Leb, H.-U. Prokosch, W. Rödle, Machine learning in medication prescription: A
systematic review, International Journal of Medical Informatics 180 (2023) 105241.
[16] S. C. Rivera, et al., Guidelines for clinical trial protocols for interventions involving artificial
intelligence: the SPIRIT-AI extension, The Lancet Digital Health 2 (2020) e549–e560.
[17] V. Sounderajah, et al., Developing a reporting guideline for artificial intelligence-centred diagnostic
test accuracy studies: the STARD-AI protocol, BMJ Open 11 (2021).
[18] D. A. Korevaar, J. F. Cohen, J. B. Reitsma, D. E. Bruns, C. A. Gatsonis, P. P. Glasziou, L. Irwig,
D. Moher, H. C. W. de Vet, D. G. Altman, L. Hooft, P. M. M. Bossuyt, Updating standards for
reporting diagnostic accuracy: the development of STARD 2015, Research Integrity and Peer Review
1 (2016) 7. URL: https://doi.org/10.1186/s41073-016-0014-7. doi:10.1186/s41073-016-0014-7.
[19] M. E. Klontzas, A. A. Gatti, A. S. Tejani, C. E. Kahn, AI reporting guidelines: How to select the
best one for your research, Radiology: Artificial Intelligence 5 (2023) e230055. URL:
https://doi.org/10.1148/ryai.230055. doi:10.1148/ryai.230055.
[20] H. Ibrahim, X. Liu, S. C. Rivera, D. Moher, A.-W. Chan, M. R. Sydes, M. J. Calvert, A. K. Denniston,
Reporting guidelines for clinical trials of artificial intelligence interventions: the SPIRIT-AI and
CONSORT-AI guidelines, Trials 22 (2021) 11. URL: https://doi.org/10.1186/s13063-020-04951-6.
doi:10.1186/s13063-020-04951-6.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mongan</surname>
          </string-name>
          , et al.,
          <article-title>Checklist for artificial intelligence in medical imaging(claim): A guide for authors and reviewers</article-title>
          ,
          <source>Radiology: Artificial Intelligence</source>
          <volume>2</volume>
          (
          <year>2020</year>
          )
          <fpage>2638</fpage>
          -
          <lpage>6100</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Tejani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Klontzas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Gatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mongan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Moy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. H.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Kahn</surname>
          </string-name>
          ,
          <article-title>Updating the checklist for artificial intelligence in medical imaging (claim) for reporting ai research</article-title>
          ,
          <source>Nature Machine Intelligence</source>
          <volume>5</volume>
          (
          <year>2023</year>
          )
          <fpage>950</fpage>
          -
          <lpage>951</lpage>
          . URL: https://doi.org/10.1038/s42256-023-00717-2. doi:10.1038/s42256-023-00717-2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhandari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weilbach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Marwah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lasocki</surname>
          </string-name>
          ,
          <article-title>Assessment of artificial intelligence (ai) reporting methodology in glioma mri studies using the checklist for ai in medical imaging (claim</article-title>
          ),
          <source>Neuroradiology</source>
          <volume>65</volume>
          (
          <year>2023</year>
          )
          <fpage>907</fpage>
          -
          <lpage>913</lpage>
          . URL: https://doi.org/10.1007/s00234-023-03126-9. doi:10.1007/s00234-023-03126-9.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Schulz</surname>
            <given-names>K. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Altman</surname>
            <given-names>D. G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moher</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <article-title>Consort 2010 statement: Updated guidelines for reporting parallel group randomised trials</article-title>
          .,
          <source>Journal of Pharmacology and Pharmacotherapeutics</source>
          .
          <volume>1</volume>
          (
          <issue>2</issue>
          ) (
          <year>2010</year>
          )
          <fpage>100</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. P. L.</given-names>
            <surname>Martindale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Llewellyn</surname>
          </string-name>
          ,
          <string-name>
            <surname>R. O. de Visser</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Ngai</surname>
            ,
            <given-names>A. U.</given-names>
          </string-name>
          <string-name>
            <surname>Kale</surname>
            ,
            <given-names>L. F. di Rufano</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Golub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. S.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moher</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. D. McCradden</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Oakden-Rayner</surname>
            ,
            <given-names>S. C.</given-names>
          </string-name>
          <string-name>
            <surname>Rivera</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Calvert</surname>
            ,
            <given-names>C. J.</given-names>
          </string-name>
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>C. S.</given-names>
          </string-name>
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Yau</surname>
            ,
            <given-names>A.-W.</given-names>
          </string-name>
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>P. A.</given-names>
          </string-name>
          <string-name>
            <surname>Keane</surname>
            ,
            <given-names>A. L.</given-names>
          </string-name>
          <string-name>
            <surname>Beam</surname>
            ,
            <given-names>A. K.</given-names>
          </string-name>
          <string-name>
            <surname>Denniston</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Concordance of randomised controlled trials for artificial intelligence interventions with the consort-ai reporting guidelines</article-title>
          ,
          <source>Nature Communications</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>1619</fpage>
          . URL: https://doi.org/10.1038/s41467-024-45355-3. doi:10.1038/s41467-024-45355-3.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>