=Paper= {{Paper |id=Vol-3823/13_nafis_design_200 |storemode=property |title=Design and Assessment of Representative Hybrid Clinical Trials using Health Recommender System |pdfUrl=https://ceur-ws.org/Vol-3823/13_nafis_design_200.pdf |volume=Vol-3823 |authors=Nafis Neehal,Vibha Anand,Kristin P. Bennett |dblpUrl=https://dblp.org/rec/conf/healthrecsys/NeehalAB24 }} ==Design and Assessment of Representative Hybrid Clinical Trials using Health Recommender System== https://ceur-ws.org/Vol-3823/13_nafis_design_200.pdf
                         Design and Assessment of Representative Hybrid Clinical
                         Trials using Health Recommender System
                         Nafis Neehal1,*, Vibha Anand2 and Kristin P. Bennett3
                         1 Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, 12180 USA
                         2 Biomedical AI-Healthcare and Life Sciences, IBM T.J. Watson Research Center, Cambridge, MA, 02141 USA
                         3 The Institute of Data Exploration and Applications and the Mathematical Sciences and Computer Science Departments,
                           Rensselaer Polytechnic Institute, Troy, NY, 12180 USA


                                     Abstract
                                     Incorporating real-world data (RWD) into clinical trials can enhance trial efficiency, diversity, and generalizability.
                                     This paper introduces the Framework for Research in Synthetic Control Arms (FRESCA), which uses a novel
                                     Recommender System combined with Equity Adjustment strategies to design and evaluate Representative Hybrid
                                     Clinical Trials (HCTs). FRESCA employs a novel matching algorithm through its recommendation system to
                                     select suitable patients from RWD while ensuring that the selected population is representative of the target
                                     demographic. This dual approach improves both patient selection and trial outcomes by balancing statistical
                                     appropriateness and equity. Simulations based on data from two existing randomized clinical trials (RCTs) show
                                     that using FRESCA to recommend patients from RWD and apply equity adjustments enhances internal validity
                                     and generalizability. Our analysis indicates that combining matching and equity adjustments yields more accurate
                                     treatment effect estimates and fair population representation, even with reduced RCT control group sizes. In
                                     contrast, using either method alone may result in biased outcomes. The flexibility of FRESCA to simulate various
                                     HCT scenarios makes it a valuable tool for advancing equitable and efficient clinical trial designs.

                                     Keywords
                                     Causal Inference, Equity, Hybrid Clinical Trials, Randomized Clinical Trial, Recommender Systems




                         1. Introduction
                         Enhancing the efficiency, diversity, and generalizability of clinical trials can be achieved by incorporating
                         real-world data (RWD) [1]. This study introduces the Framework for Research in Synthetic Control
                         Arms (FRESCA), which combines a novel Recommender System with strategies for Equity Adjustment
                         to design and assess representative Hybrid Clinical Trials (HCTs). Synthetic control patients are patients
                         created from pre-existing de-identified datasets, used to mimic the characteristics of a real control
                         group in clinical trials. Synthetic control arms (SCAs) are especially useful in trials for rare diseases,
                         where finding enough "in-trial" concurrent controls (CCs) can be difficult due to ethical and practical
                         concerns [2] [3]. To address these challenges, HCTs use hybrid control arms (HCAs) that combine both
                         concurrent and synthetic controls. FRESCA uses health recommender systems based on propensity
                         score matching to recommend patients from external RWD who are suitable for inclusion in the trial,
                         creating a hybrid population.
                            The health recommender system in FRESCA identifies patients from RWD who closely match the
                         characteristics of those in the trial, enhancing the statistical power and reducing variance without
                         extending the trial duration or increasing costs. However, integrating RWD with randomized clinical
                         trial (RCT) data is challenging due to differences in their distributions [4]. To overcome this, FRESCA first
                         uses its health recommender system to select appropriate patients and then applies equity adjustments
                         to ensure the trial population accurately represents the target demographic. This combined approach
                         improves patient selection and ensures trial results are both statistically robust and demographically
                         representative.

                          HealthRecSys’24: The 6th Workshop on Health Recommender Systems co-located with ACM RecSys 2024
                          * Corresponding author.
                          Email: neehan@rpi.edu (N. Neehal); anand@us.ibm.com (V. Anand); bennek@rpi.edu (K. P. Bennett)
                          URL: https://nafis-neehal.github.io/ (N. Neehal); https://research.ibm.com/people/vibha-anand (V. Anand);
                          https://faculty.rpi.edu/kristin-bennett (K. P. Bennett)
                          ORCID: 0000-0001-7015-8039 (N. Neehal); 0000-0001-8605-5712 (V. Anand); 0000-0002-8782-105X (K. P. Bennett)
                          © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                          CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

   Our research aims to develop methods for conducting equitable HCTs and to provide a framework
for evaluating them. In this context, equity means that the trial participants should represent a broader
target population [5]. While RCTs can provide unbiased estimates for their specific cohorts, they
often fail to represent larger, more diverse target populations. Researchers use data from these target
populations to adjust RCT samples, ensuring that all relevant subgroups are included [6] [7]. Ensuring
equity in hybrid trials is essential for generalizability and is a key focus for institutions like the NIH
and FDA [8]. Our approach demonstrates that combining a health recommender system with equity
adjustments creates a more balanced and representative trial population than using either method alone.

   Table 1
   Distribution of Protected Attributes in FRESCA Cohorts and Biased External Controls in ALLHAT and
   SPRINT trials along with NHANES Target Subgroup Rates
  Attributes                          ALLHAT                                       SPRINT                       NHANES
                          TA            CC          Biased EC          TA            CC          Biased EC      Target
                       (N=8116)       (N=4000)      (N=9762)        (N=4234)       (N=2000)      (N=2200)        Rate
  Age Group
  40-59               1549 (19.1%)    772 (19.3%)   5841 (59.8%)    923 (21.8%)    438 (21.9%)    334 (15.2%)    31.2%
  59+                 6567 (80.9%)   3228 (80.7%)   3921 (40.2%)   3311 (78.2%)   1562 (78.1%)   1866 (84.8%)    68.8%
  Gender
  Female              3728 (45.9%)   1875 (46.9%)   2752 (28.2%)   1499 (35.4%)    691 (34.6%)    631 (28.7%)    55.4%
  Male                4388 (54.1%)   2125 (53.1%)   7010 (71.8%)   2735 (64.6%)   1309 (65.5%)   1569 (71.3%)    44.6%
  Race or Ethnicity
  Hispanic            1389 (17.1%)    740 (18.5%)   1671 (17.1%)    479 (11.3%)    225 (11.3%)    389 (17.7%)    10.0%
  NH Asian              84 (1.0%)      41 (1.0%)     693 (7.1%)      42 (1.0%)      15 (0.8%)      138 (6.3%)     3.9%
  NH Black            2525 (31.1%)   1265 (31.6%)   1937 (19.8%)   1232 (29.1%)    616 (30.8%)    468 (21.3%)    12.0%
  NH White            4050 (49.9%)   1921 (48.0%)   5285 (54.1%)   2451 (57.9%)   1128 (56.4%)   1172 (53.3%)    69.3%
  Other                 68 (0.8%)      33 (0.8%)     176 (1.8%)      30 (0.7%)      16 (0.8%)      33 (1.5%)      4.8%


   FRESCA was developed to design and evaluate these methods [9]. Before FRESCA, existing methods
for HCTs did not explicitly address equity in patient selection. FRESCA uses real clinical trial data to
simulate hypothetical trials, applying its methods to scenarios based on real RCTs, such as Systolic
Blood Pressure Intervention Trial (SPRINT) [10] and Antihypertensive and Lipid-Lowering Treatment
to Prevent Heart Attack Trial (ALLHAT) [11]. By integrating health recommender systems and equity
adjustment techniques, FRESCA ensures that the selected patients not only meet statistical criteria but
also represent the demographic characteristics of the target population. We define protected subgroups
based on age, gender, and race/ethnicity, using NHANES data [12] to estimate the rates for these
subgroups in a simulated target population. We will explore the use of additional protected attributes in
future work. FRESCA includes five main functions: generating cohorts, simulating scenarios, calculating
target subgroup rates, estimating treatment effects and equity, and providing a final assessment. It can
evaluate any HCT method, including those that combine health recommender systems with equity
adjustments. Detailed descriptions of the FRESCA framework and the trial configurations are provided
in sections 2.4 and 2.5.
   This paper makes several key contributions:

    • It identifies equity issues in HCTs and proposes solutions to improve the representativeness of
      trial populations.
    • It introduces an enhanced FRESCA framework with a modular architecture that supports multi-
      trial, multi-outcome, and multi-metric assessments for creating and evaluating HCTs.
    • It evaluates HCT methods that combine health recommender systems based on a novel matching
      algorithm with equity adjustments, showing that the best results come from combining propensity
      score matching with IPF [13] weighting.
    • It demonstrates that using both health recommender systems and equity adjustments results in
      more equitable populations and precise estimates of the Population Hazard Ratio (PHR), even
      with smaller CC sizes.
    • It shows that variations in the sizes of treatment and control groups significantly affect the
      precision of treatment effect estimates, and that a balanced use of recommended synthetic
      controls is essential for accuracy.

   There are several strategies for integrating synthetic control populations into trial populations. Most
approaches use propensity score matching to select suitable external controls. Some methods focus
on matching based on treatment propensity [4, 14, 15], while others use propensity to predict trial
participation [3, 2, 1]. Bayesian approaches are also used to incorporate synthetic controls into trials
[16, 17], but they typically do not consider equity adjustments for a target population. Future work
with FRESCA could explore these and other methods to develop new strategies for equitable synthetic
control arms and assess their effectiveness. Further theoretical exploration of these and other HCT
algorithms is also planned.


2. Methodology


   [Figure 1: flowchart of the FRESCA pipeline. Cohort Generation: the RCT data are divided into TA, CC, and EC
   cohorts. Scenario Simulation: a TA Sample, a CC Sample, and a Biased EC Sample (formed using Biased Subgroup
   Rates) are drawn, and each scenario is repeated 50 times. Target Subgroup Rate Calculation: Target Subgroup
   Rates are computed from the Target Population. Treatment Effect and Equity Estimation: the Health Recommender
   System (Propensity Matching and HCT Formation) produces the TA and a Propensity Adjusted HC, Equity Adjustment
   is applied, and the adjusted treatment effect and log disparity are calculated, giving the scenario PHR and
   CTD. Assessment: the aggregated PHR and CTD are reported.
   Legend: TA = Treatment Arm; CC = Concurrent Controls; EC = External Controls; HC = Hybrid Controls;
   PHR = Population Hazard Ratio; CTD = Cohort-Target Disparity.]

   Figure 1: The FRESCA framework for hybrid clinical trials has five main functions. It utilizes a health
   recommender system for the “Propensity Matching and HC Formation” process and supports any
   standard method for distribution adjustment in the “Equity Adjustment” process.

2.1. Problem Definition
We define the problem using the potential outcomes framework from Neehal et al. [9]. Let 𝑌𝑠𝑡𝑖 represent
the potential outcome for subject 𝑖 in sample 𝑠 under treatment 𝑡, where 𝑠 = 0 is the target population,
𝑠 = 1 the RCT population, 𝑠 = 2 the RWD population, and 𝑠 = 3 an adjusted sample combining RCT
and RWD data. The Sample Hazard Ratio (SHR) in the RCT is 𝑆𝐻𝑅 = 𝐸(effect(𝑌11, 𝑌10) | 𝑆 = 1), where
effect(𝑌11, 𝑌10) is the difference in treatment effects between treated and control groups. The Population
Hazard Ratio (PHR) for the target population (𝑠 = 0) is defined as 𝑃𝐻𝑅 = 𝐸(effect(𝑌01, 𝑌00) | 𝑆 = 0),
representing the expected treatment effect in the target population. Two main issues arise: (1)
Equity: the RCT may not represent the target population, leading to biased estimates (𝑃𝐻𝑅 ≠ 𝑆𝐻𝑅),
and (2) Sample Size: insufficient patients in the control group may require synthetic controls (SCs)
from RWD. To accurately estimate PHR, we use a health recommender system with propensity score
matching to augment RCT data with SCs, and then perform appropriate equity adjustment, forming an
"equity-adjusted" sample (𝑠 = 3).

2.2. Data
We define the target population using the nationally representative hypertensive cohort from the
National Health and Nutrition Examination Survey (NHANES) 2015-2016 [12]. Representativeness in
the RCTs is assessed based on three protected attributes: Gender (Male, Female), Age Group (40-59,
59+), and Race/Ethnicity (Non-Hispanic Black, Non-Hispanic White, Non-Hispanic Asian, Hispanic,
Other). These age groups align with the inclusion criteria of the SPRINT and ALLHAT hypertension
studies. Target rates for each subgroup are calculated using survey-weighted analysis of US subjects
aged 40+ with hypertension.
   We use data from SPRINT and ALLHAT trials available from BioLINCC [18]. For ALLHAT, we focus
on the Amlodipine vs. Chlorthalidone group, as the results are similar for the Lisinopril vs. Chlorthalidone
group. After preprocessing, SPRINT includes 4,234 treated and 4,200 control patients, and ALLHAT
includes 8,116 treated and 13,762 control patients. The primary outcome for SPRINT is a composite
of major cardiovascular events, while for ALLHAT, the outcome is heart failure. Figure 1 illustrates
how FRESCA divides the RCT data into Treatment Arm (TA), Concurrent Controls (CC), and External
Controls (EC) cohorts. Table 1 presents the distribution of protected attributes in the RCT data and
the Biased External Controls for both SPRINT and ALLHAT, showing their differences from the target
NHANES population.
   The NHANES surveillance data and clinical trial data from BioLINCC were used with appropriate
approvals: BioLINCC approved ALLHAT and SPRINT data use under case 123537, and NHANES data
is freely available and exempt from human subjects research regulations per Rensselaer Polytechnic
Institute IRB 1863.

2.3. Adjustment Methods and Assessment Metrics
For balancing distributions between synthetic control and trial populations through the recommender
system, we employ propensity score matching using the “MatchIt” R package [19]. Iterative Proportional
Fitting (IPF) via the “IPFR” R package [20] is used for equity adjustment and Biased EC Cohort formation.
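
As an illustration of the matching step only, the following is a minimal Python sketch of greedy 1:1 nearest-neighbour matching on propensity scores that have already been estimated. It is not the MatchIt implementation and not necessarily the configuration used in FRESCA; the function name `greedy_match`, the patient identifiers, the visiting order, and the optional `caliper` are assumptions made for the example.

```python
from typing import Dict, Optional


def greedy_match(trial_scores: Dict[str, float],
                 external_scores: Dict[str, float],
                 caliper: Optional[float] = None) -> Dict[str, str]:
    """Greedy 1:1 nearest-neighbour matching on propensity scores, without replacement.

    Returns a mapping from trial patient id to matched external control id.
    """
    available = dict(external_scores)  # external controls not yet used
    matches: Dict[str, str] = {}
    # Visit trial patients from highest to lowest score (an illustrative ordering).
    for trial_id, score in sorted(trial_scores.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        best_id = min(available, key=lambda ext_id: abs(available[ext_id] - score))
        # Skip this patient if the closest control lies outside the caliper.
        if caliper is not None and abs(available[best_id] - score) > caliper:
            continue
        matches[trial_id] = best_id
        del available[best_id]
    return matches


# Hypothetical propensity scores, for illustration only.
trial = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
external = {"e1": 0.78, "e2": 0.50, "e3": 0.10, "e4": 0.33}
matched = greedy_match(trial, external, caliper=0.1)
print(matched)
```

Matching without replacement keeps each external patient from being recommended more than once; the caliper simply drops trial patients with no sufficiently close external counterpart.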
   Treatment effects are assessed using Cox’s Proportional Hazards Regression to estimate the Population
Hazard Ratio (PHR). The “ground-truth” Target PHR is estimated by equity adjustment on the entire
RCT dataset. For equity assessment, we use a variant of log disparity (LD) [5]:

                    \[ \log\left\{ \frac{\mathrm{odds}(g(x) = 1 \mid y' = 1)}{\mathrm{odds}(g(x) = 1 \mid y = 1)} \right\} \tag{1} \]

where 𝑔(𝑥) is the protected group, 𝑦′ is the observed cohort, and 𝑦 is the target population. Absolute LD
values between 0 and 0.22 are considered equitable [5]. We introduce Cohort-Target Disparity (CTD) as
the mean of median absolute LD values across simulated runs, calculated for subgroups defined by age,
race, and gender. This provides a comprehensive measure of demographic representativeness between
the study cohort and target population.
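
The metric can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' code: subgroup membership is summarized by a single rate per subgroup, the natural logarithm is used, and the "mean of median" is read as the per-run median of |LD| across subgroups, averaged over runs. The function names and the second example run are hypothetical; the rates in the first run are the ALLHAT CC and NHANES values from Table 1.

```python
import math
from statistics import mean, median
from typing import Dict, List

EQUITABLE_MAX = 0.22  # upper end of the equitable range for |LD| stated in the text


def odds(p: float) -> float:
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1.0 - p)


def log_disparity(cohort_rate: float, target_rate: float) -> float:
    """Log of the ratio of subgroup odds in the cohort to subgroup odds in the target."""
    return math.log(odds(cohort_rate) / odds(target_rate))


def cohort_target_disparity(runs: List[Dict[str, float]],
                            target: Dict[str, float]) -> float:
    """Mean over runs of the per-run median |LD| across subgroups (assumed reading)."""
    per_run = [median(abs(log_disparity(run[g], target[g])) for g in target)
               for run in runs]
    return mean(per_run)


target = {"Female": 0.554, "40-59": 0.312, "NH Black": 0.120}
runs = [
    {"Female": 0.469, "40-59": 0.193, "NH Black": 0.316},  # ALLHAT CC rates (Table 1)
    dict(target),                                          # hypothetical perfectly matched run
]
ld_female = log_disparity(0.469, 0.554)  # about -0.34, outside the equitable range
ctd = cohort_target_disparity(runs, target)
print(ld_female, ctd, ctd <= EQUITABLE_MAX)
```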

2.4. FRESCA Framework
FRESCA integrates a health recommender system to select suitable synthetic controls from external
data and combine them with RCT data, forming a hybrid control population. This framework uses
propensity score matching for patient selection and equity adjustments to ensure accurate estimates of
the Population Hazard Ratio (PHR) while maintaining representativeness for any target population.
FRESCA provides tools to assess the effectiveness of these methods, as shown in Figure 1, and comprises
five main functions. We demonstrate FRESCA’s application using the SPRINT and ALLHAT trials with
NHANES as the target population, but the framework is flexible and can be adapted to any RCT, RWD,
or target population.

2.4.1. Cohort Generation
In the Cohort Generation phase, FRESCA employs its health recommender system to generate three
cohorts: TA (Treatment Arm), CC (Concurrent Controls), and EC (External Controls). The TA and CC
cohorts are derived from RCT data, representing the treatment and control groups, respectively. The
EC cohort is sourced from real-world data, providing a pool of synthetic controls recommended by
the health recommender system to supplement the RCT. These cohorts collectively form the basis for
subsequent analyses.

2.4.2. Scenario Simulation
FRESCA facilitates the creation of diverse simulated trial scenarios to calculate adjusted PHRs and equity
metrics. This involves two stages: first, generating unbiased samples from the TA and CC cohorts to
simulate a randomized clinical trial; second, creating a Biased EC Sample from the EC cohort, reflecting
biased real-world data as the source of synthetic controls. The health recommender system plays a key
role in selecting these controls. Further details on simulation configurations are provided in section 2.5.

    Table 2
    Comparison of PHR and CTD across different trials, outcomes and methods. We show this for ALLHAT
    (𝑁_TA = 4000, 𝑁_CC = 2000) and SPRINT (𝑁_TA = 2000, 𝑁_CC = 1000) respectively. The symbol (†) in the
    Cohort-Target Disparity column indicates that the measured CTD is not within the equitable range
    (CTD > 0.22). Bold font indicates the best performing method.
    Trial (Study)           Outcome Examined            Control      Adjustment Method                      Target PHR          Estimated PHR       Cohort-Target Disparity
                                                        Population                                          [95% CI]            [95% CI]            [95% CI]
    ALLHAT (Hypertension)   Secondary (Heart Failure)   CC           None                                   1.38 [1.36, 1.41]   1.39 [1.36, 1.43]   0.89 [0.84, 0.94]†
    ALLHAT (Hypertension)   Secondary (Heart Failure)   Hybrid       NC Matching                            1.38 [1.36, 1.41]   1.42 [1.37, 1.48]   0.87 [0.81, 0.94]†
    ALLHAT (Hypertension)   Secondary (Heart Failure)   Hybrid       Propensity Matching + IPF Sampling     1.38 [1.36, 1.41]   1.43 [1.32, 1.49]   0.03 [0.02, 0.04]
    ALLHAT (Hypertension)   Secondary (Heart Failure)   Hybrid       Propensity Matching + IPF Weighting    1.38 [1.36, 1.41]   1.39 [1.33, 1.46]   0.04 [0.03, 0.05]
    SPRINT (Hypertension)   Primary                     CC           None                                   0.79 [0.77, 0.82]   0.75 [0.73, 0.78]   0.91 [0.86, 0.97]†
    SPRINT (Hypertension)   Primary                     Hybrid       NC Matching                            0.79 [0.77, 0.82]   0.74 [0.72, 0.77]   0.89 [0.84, 0.96]†
    SPRINT (Hypertension)   Primary                     Hybrid       Propensity Matching + IPF Sampling     0.79 [0.77, 0.82]   0.75 [0.67, 0.84]   0.01 [0.00, 0.01]
    SPRINT (Hypertension)   Primary                     Hybrid       Propensity Matching + IPF Weighting    0.79 [0.77, 0.82]   0.78 [0.74, 0.81]   0.04 [0.03, 0.05]




2.4.3. Target Subgroup Rates Calculation
The target rates for each subgroup are calculated using a survey-weighted analysis of the desired target
population from NHANES (e.g., adults with hypertension).
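
For a point estimate, a survey-weighted subgroup rate is the sum of the survey weights in the subgroup divided by the sum of all weights. The Python sketch below shows that calculation on made-up rows; the field name `survey_weight` and the function `weighted_rates` are hypothetical, and the sketch ignores the strata and sampling-unit information that a full survey analysis would use for variance estimation.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_rates(rows: List[dict], attr: str,
                   weight_key: str = "survey_weight") -> Dict[str, float]:
    """Weighted share of each level of `attr` among the given rows."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[attr]] += row[weight_key]
    grand_total = sum(totals.values())
    return {level: w / grand_total for level, w in totals.items()}


# Made-up respondents: each row stands for `survey_weight` people in the population.
rows = [
    {"gender": "Female", "survey_weight": 3.0},
    {"gender": "Male", "survey_weight": 1.0},
]
rates = weighted_rates(rows, "gender")
print(rates)
```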

2.4.4. Treatment Effect and Equity Estimation
Once the scenario samples are created and target subgroup rates are determined, the next step is to
construct an equity-adjusted HCT population and estimate the treatment effect. FRESCA utilizes a
health recommender system based on propensity score matching to recommend suitable synthetic
controls (SCs) from the Biased EC Sample. A binary logistic regression model, incorporating TA, CC,
and Biased EC samples, generates propensity scores to select SCs, thereby forming a matched Hybrid
Control Arm (HCA). Equity adjustments are then applied to both the TA and HCA cohorts using the
Iterative Proportional Fitting (IPF) technique to better align them with the target population. This
process results in specific weight vectors, 𝑊_IPF_TA for the TA and 𝑊_IPF_HCA for the HCA. Unlike the
previously used approach [9], where these weights were used to generate random samples, we now
directly compute the weighted and equity-adjusted treatment effect and equity value (LD) using these
weights.
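
To make the weighting step concrete, here is a minimal Python sketch of IPF (raking) that produces one weight per patient so that the weighted marginal rates match given targets. It is not the IPFR package and makes simplifying assumptions: every target level must appear at least once in the sample, only one-dimensional marginals are fitted, and the function names and the five example patients are hypothetical. The target rates are the NHANES gender and age-group rates from Table 1.

```python
from typing import Dict, List

Record = Dict[str, str]


def ipf_weights(records: List[Record],
                targets: Dict[str, Dict[str, float]],
                n_iter: int = 100, tol: float = 1e-8) -> List[float]:
    """Rake unit weights until each attribute's weighted shares match its target rates."""
    weights = [1.0] * len(records)
    for _ in range(n_iter):
        max_shift = 0.0
        for attr, target_rates in targets.items():
            total = sum(weights)
            # Current weighted share of each level of this attribute.
            current = {level: 0.0 for level in target_rates}
            for rec, w in zip(records, weights):
                current[rec[attr]] += w / total
            # Rescale each record by target share / current share of its level.
            for i, rec in enumerate(records):
                factor = target_rates[rec[attr]] / current[rec[attr]]
                weights[i] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:  # all marginals already match
            break
    return weights


def weighted_share(records: List[Record], weights: List[float],
                   attr: str, level: str) -> float:
    total = sum(weights)
    return sum(w for r, w in zip(records, weights) if r[attr] == level) / total


sample = [
    {"gender": "Female", "age": "40-59"},
    {"gender": "Male", "age": "40-59"},
    {"gender": "Male", "age": "59+"},
    {"gender": "Male", "age": "59+"},
    {"gender": "Female", "age": "59+"},
]
targets = {
    "gender": {"Female": 0.554, "Male": 0.446},
    "age": {"40-59": 0.312, "59+": 0.688},
}
w = ipf_weights(sample, targets)
print(weighted_share(sample, w, "gender", "Female"), weighted_share(sample, w, "age", "40-59"))
```

Weights of this kind can either be passed directly to a weighted estimator (the weighting variant) or used as sampling probabilities to draw an adjusted sample (the sampling variant).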

2.4.5. Assessment
In the Assessment phase, FRESCA evaluates various HCT construction methods by combining the
health recommender system for propensity matching with two types of IPF-based equity adjustments:
weighting and sampling. Baseline scenarios without any adjustments or inclusion of SCs are also evaluated
and compared with the NC Matching technique [14], with results summarized in Table 2. To assess the
precision of PHR estimations, FRESCA compares them with a “ground-truth” target PHR, derived from
equity adjustments applied to the complete RCT dataset (e.g., SPRINT/ALLHAT) using all treated and
control subjects. The data set is divided into treated and control cohorts, bootstrapped to match the sizes,
and adjusted to align with the NHANES population. The target PHR is calculated as an average across
all bootstrap samples and scenarios. Equity is evaluated by checking if the Cohort-Target Disparity
(CTD) falls within the [0, 0.22] range, adhering to the 80% rule [5].
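
One way to read the 0.22 bound, assuming the natural logarithm and that the threshold is the log of the 80% ratio, is the following worked relation between the LD range and the underlying odds ratio:

```latex
\[
  |\mathrm{LD}| \le \left|\log(0.8)\right| = \log(1.25) \approx 0.223
  \quad\Longleftrightarrow\quad
  0.8 \le \frac{\mathrm{odds}(g(x) = 1 \mid y' = 1)}{\mathrm{odds}(g(x) = 1 \mid y = 1)} \le 1.25
\]
```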

2.5. Simulation of HCT Scenarios
FRESCA simulates various clinical trial scenarios to evaluate the effects of different trial design
parameters on outcomes using a health recommender system for patient selection.

2.5.1. ALLHAT
For ALLHAT, FRESCA creates a Concurrent Control (CC) Cohort by selecting 4,000 individuals from the
ALLHAT trial’s original control group, leaving 9,762 in the External Control (EC) Cohort. The Treatment
Arm (TA) remains unchanged with 8,116 participants. During simulation, biases are introduced in the
EC Cohort using the Iterative Proportional Fitting (IPF) method to reflect biased subgroup rates for age,
gender, and race, as well as smoker status, depression history, and HDLC history (Table 1). FRESCA
explores different experimental scenarios with varying sample sizes for TA (𝑁_TA = 4000) and CC (𝑁_CC = 0,
500, 1000, 2000, 4000), conducting 50 bootstrap simulations for each scenario. The mean Population
Hazard Ratio (PHR) and Cohort-Target Disparity (CTD) are calculated for each scenario, along with a
95% Confidence Interval.
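
The cohort split described above can be sketched in Python as follows. The text says only that 4,000 controls are selected, so the uniformly random selection, the seed, and the function name `split_controls` are assumptions made for the example.

```python
import random
from typing import Iterable, List, Tuple


def split_controls(control_ids: Iterable[int], n_cc: int,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign n_cc controls to the CC cohort and the rest to the EC cohort."""
    rng = random.Random(seed)
    shuffled = list(control_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_cc], shuffled[n_cc:]


# ALLHAT: 13,762 control patients after preprocessing, 4,000 of them kept as CC.
cc_cohort, ec_cohort = split_controls(range(13762), n_cc=4000)
print(len(cc_cohort), len(ec_cohort))  # 4000 9762
```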

2.5.2. SPRINT
For SPRINT, FRESCA selects a CC Cohort of 2,000 from the control group (total N=4200), leaving 2,200
in the EC Cohort, with the TA consisting of 4,234 participants. IPF is used to adjust for biases in three
protected attributes (Table 1) and additional factors such as Framingham Risk Score and Cardiovascular
Disease (CVD) History. Simulations use a TA sample size (𝑁_TA = 2000) and vary CC sample sizes (𝑁_CC = 0,
500, 1000, 1500, 2000), running 50 bootstrap simulations per scenario. Results include mean PHR, CTD,
and 95% Confidence Intervals, as reported in the final assessment.
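
The scenario loop common to both trials can be sketched in Python as below. This is a toy under stated assumptions, not the FRESCA code: samples are drawn with replacement, the confidence interval is a normal approximation around the mean across runs (the text does not say how the interval is computed), and the estimator `event_rate_ratio` is a hypothetical stand-in for the Cox-based PHR. The 𝑁_CC = 0 scenario is omitted because it depends on external controls, which this sketch does not model.

```python
import random
from statistics import mean, stdev
from typing import Callable, Dict, List, Sequence, Tuple


def bootstrap_sample(cohort: Sequence[int], n: int, rng: random.Random) -> List[int]:
    """Draw n records with replacement."""
    return [rng.choice(cohort) for _ in range(n)]


def summarize(values: List[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Mean and normal-approximation 95% interval for the mean across runs."""
    m = mean(values)
    half_width = z * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


def simulate(ta_cohort: Sequence[int], cc_cohort: Sequence[int], n_ta: int,
             cc_sizes: List[int], estimate: Callable[[List[int], List[int]], float],
             n_runs: int = 50, seed: int = 0) -> Dict[int, Tuple[float, float, float]]:
    """Run n_runs bootstrap simulations per CC size and aggregate the estimates."""
    rng = random.Random(seed)
    results = {}
    for n_cc in cc_sizes:
        runs = [estimate(bootstrap_sample(ta_cohort, n_ta, rng),
                         bootstrap_sample(cc_cohort, n_cc, rng))
                for _ in range(n_runs)]
        results[n_cc] = summarize(runs)
    return results


def event_rate_ratio(ta: List[int], cc: List[int]) -> float:
    """Toy effect estimate: ratio of event rates (NOT a Cox hazard ratio)."""
    return (sum(ta) / len(ta)) / (sum(cc) / len(cc))


# Toy cohorts of 0/1 event indicators, for illustration only.
gen = random.Random(1)
ta_cohort = [1 if gen.random() < 0.15 else 0 for _ in range(1000)]
cc_cohort = [1 if gen.random() < 0.20 else 0 for _ in range(1000)]
results = simulate(ta_cohort, cc_cohort, n_ta=400, cc_sizes=[200, 400],
                   estimate=event_rate_ratio)
for n_cc, (m, lo, hi) in results.items():
    print(n_cc, round(m, 3), round(lo, 3), round(hi, 3))
```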

[Figure: ALLHAT results in two rows of four panels (Only CC, Propensity Matched, Equity Adjusted, Both).
y-axis: CC Size (0, 500, 1000, 2000, 4000); "No data" is shown at CC Size 0 in the Only CC and Equity Adjusted panels.
(a) PHR Estimates measured using ALLHAT trial data. x-axis: Effect Size (1.0 to 1.8).
(b) Cohort-Target Disparity measured using ALLHAT trial data. x-axis labelled Effect Size (0.00 to 1.00), with the Equitable Region marked.]
                                               Figure 2: Variation of PHR Estimates and Cohort-Target Disparity in ALLHAT Trial with Different CC
                                               Sizes. This figure illustrates the influence of various CC sizes (𝑁𝐶𝐶 = 0, 500, 1000, 2000, 4000) on the
                                               PHR estimates and equity measures for ALLHAT trial with a treatment arm size of 𝑁𝑇 𝐴 = 4000. In panel
                                               (a), the target PHR is demarcated by two solid red lines, encompassing the red dashed line representing
                                               the 95% confidence interval. Panel (b) features a green shaded area delineated by two black dashed
                                               lines, indicating the range considered equitable.


3. Results and Discussion
3.1. Performance comparison of different methods for creating HCTs
Our study evaluates methods for creating HCTs using the two metrics described above: PHR and
CTD. We analyze methods that combine propensity score matching with Iterative Proportional Fitting
(IPF) equity-adjustment methods (weighting or random sampling) and compare them to two baseline
scenarios: (i) no adjustments applied to the CC population, and (ii) SC added to the trial via the NC
Matching algorithm [14]. The results for the ALLHAT and SPRINT trials are detailed in Table 2. Our analysis
reveals two key findings: (i) Necessity of both Propensity and Equity Adjustments: PHR estimates obtained
without equity adjustments (either IPF weighting or sampling) are typically inequitable, as evidenced
by their high CTD values, and they may yield biased treatment effect estimates; and (ii) Superiority
of Weighting over Sampling: Using IPF Weighting for equity adjustment improves the accuracy and
consistency of PHR estimates compared to IPF Sampling. These findings, particularly the need for
comprehensive adjustments and the effectiveness of sample weighting, have been observed consistently
across multiple trials and outcomes, demonstrating the robustness of the FRESCA framework.
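
As a rough illustration of what IPF weighting does, the sketch below rakes per-patient weights until the weighted cohort margins match a set of target-population margins. It is a generic Deming-Stephan style raking loop, not the FRESCA implementation; the function name `ipf_weights`, the attributes (`sex`, `age`), the simulated cohort, and the target proportions are all hypothetical.

```python
import numpy as np

def ipf_weights(groups, targets, n_iter=100, tol=1e-10):
    """Rake patient weights so weighted margins match target proportions.

    groups:  dict attribute -> integer category code per patient
    targets: dict attribute -> target proportion per category (sums to 1)
    """
    n = len(next(iter(groups.values())))
    w = np.ones(n) / n  # start from uniform weights
    for _ in range(n_iter):
        max_gap = 0.0
        for attr, codes in groups.items():
            target = np.asarray(targets[attr], dtype=float)
            current = np.bincount(codes, weights=w, minlength=len(target))
            max_gap = max(max_gap, np.abs(current - target).max())
            # Scale each weight by target share / current share of its category.
            factor = np.divide(target, current,
                               out=np.zeros_like(target), where=current > 0)
            w = w * factor[codes]
            w = w / w.sum()
        if max_gap < tol:  # every margin already matched before this pass
            break
    return w

# Hypothetical cohort that over-represents category 0 of "sex".
rng = np.random.default_rng(0)
sex = rng.choice(2, size=1000, p=[0.7, 0.3])
age = rng.choice(3, size=1000, p=[0.2, 0.5, 0.3])
targets = {"sex": [0.5, 0.5], "age": [0.3, 0.4, 0.3]}
w = ipf_weights({"sex": sex, "age": age}, targets)
```

Under this sketch, IPF Weighting would use `w` directly as analysis weights, whereas IPF Sampling would draw patients with probability proportional to `w`; the extra randomness of that draw is one plausible reason sampling gives less consistent estimates.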

3.2. Examination of Variation in CC Size on PHR and CTD estimation
We studied the effect of varying CC population sizes on the estimated PHR and CTD in HCTs. We
examined four methods: (i) no adjustments, (ii) only propensity matching, (iii) only equity adjustment,
and (iv) both propensity and equity adjustments. Fig. 2 shows the results on ALLHAT for CC sizes
varying from 0 to 4000. Two main findings emerged: (i) Benefits of Synthetic Controls for Limited Data:
In scenarios with smaller CC sizes, missing subgroups were compensated for by incorporating SC. This
strategy, especially with both propensity and equity adjustments, yielded accurate estimates; and (ii)
PHR Accuracy and Acceptable Equity with Reduced CC Population Size: Reducing the CC population size
by 50% (from 4000 to 2000) still produced PHR estimates close to the target PHR. However, a larger CC
size is preferable for lower estimation variance, indicating a trade-off for trial designers. These patterns
were consistent across both SPRINT and ALLHAT trials, affirming the robustness of our findings. For
brevity, we show only the results for the weighting equity adjustment on the ALLHAT trial; additional
results are available in the Supplementary.

Table 3
The effect and significance of several trial design parameters in predicting the bias in PHR estimation. Here the
bias is defined to be the squared deviation of the estimated PHR from the target PHR. Star (*) symbol represents
a significant effect with p<0.05.
    Predictors of Linear Model          Estimate      2.5%     97.5%    P Value
    TA Size                                0.022     0.009     0.036     0.001*
    CC Size                               -0.134    -0.157    -0.110     0.000*
    Cohort-Target Disparity (CTD)         -0.062    -0.183     0.059     0.316
    Cohort-RCT Disparity (CRD)             0.160     0.021     0.298     0.024*
    Controls (Only Equity)                -0.132    -0.264    -0.001     0.048*
    Controls (Both)                       -0.089    -0.222     0.043     0.186
    Controls (Only Propensity)             0.035     0.020     0.049     0.000*

3.3. Examining effects of multiple factors for predicting PHR estimation accuracy
We analyzed factors affecting the accuracy of PHR estimates in ALLHAT using a linear model with
seven predictors. Bias in PHR estimation was quantified as the squared deviation from the target PHR.
Predictors included the size of the treatment arm (TA Size), the control group size (CC Size), Cohort-
Target Disparity (CTD), and Cohort-RCT Disparity (CRD), which measures the distribution differences
between control populations and the RCT population. Controls were categorized by adjustment type:
propensity, equity, both, or none. Results in Table 3 showed that TA Size, CC Size, and CRD significantly
predicted PHR bias. Key findings include: (i) Larger CC Size Reduces Bias: Increasing CC size lowers bias,
favoring a larger, directly recruited control group over synthetic controls; (ii) Impact of Adjustments:
"Only Equity" and "Both" (propensity and equity) adjustments have negative estimates, i.e., lower bias
than the "Only CC" category, although only the "Only Equity" effect is significant at p<0.05, while
"Only Propensity" adjustments significantly increase bias. This highlights the importance of equity
adjustments for accurate PHR estimates.
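
To make the setup of this analysis concrete, the sketch below computes the squared-deviation bias and fits an ordinary least-squares model with the same seven predictors, using "Only CC" as the reference level for the adjustment type. The function names, the simulated data, the size units (thousands), and the absence of any predictor scaling are assumptions for illustration only; the sketch does not reproduce the estimates in Table 3.

```python
import numpy as np

def phr_bias(phr_est, phr_target):
    """Bias as defined in the text: squared deviation from the target PHR."""
    return (np.asarray(phr_est, dtype=float) - phr_target) ** 2

def fit_bias_model(ta, cc, ctd, crd, control_type, bias):
    """OLS fit of bias on seven predictors; 'only_cc' is the reference level."""
    levels = ["only_equity", "both", "only_propensity"]
    control_type = np.asarray(control_type)
    # One dummy column per non-reference adjustment type.
    dummies = [(control_type == lv).astype(float) for lv in levels]
    X = np.column_stack([np.ones(len(bias)), ta, cc, ctd, crd, *dummies])
    coef, *_ = np.linalg.lstsq(X, bias, rcond=None)
    names = ["intercept", "ta_size", "cc_size", "ctd", "crd", *levels]
    return dict(zip(names, coef))

# Hypothetical simulation results (not data from the paper).
rng = np.random.default_rng(1)
n = 200
ta = rng.choice([2.0, 3.0, 4.0], size=n)        # sizes in thousands
cc = rng.choice([0.5, 1.0, 2.0, 4.0], size=n)
ctd = rng.uniform(0, 1, n)
crd = rng.uniform(0, 1, n)
ctype = rng.choice(["only_cc", "only_equity", "both", "only_propensity"], size=n)
phr_est = 1.4 + 0.05 * crd - 0.02 * cc + rng.normal(0, 0.05, n)

bias = phr_bias(phr_est, phr_target=1.4)
model = fit_bias_model(ta, cc, ctd, crd, ctype, bias)
```

In this encoding, a negative coefficient on an adjustment-type dummy means lower bias than the unadjusted "Only CC" baseline, which is how the Controls rows of Table 3 are read. Confidence intervals and p-values would need a statistics package and are omitted here.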

3.4. Examining the effect of TA and CC Size ratio for a Fixed Size Recruitment Trial
We examined the balance between TA and CC population sizes when supplemented by SC in clinical
trials with a fixed recruitment size. Using ALLHAT and SPRINT trial data, we maintained a total
recruited participant cap of 4000, varying TA and CC sizes with corresponding SC adjustments. Four
scenarios were considered, with TA sizes of 3500, 3000, 2500, and 2000, CC sizes of 500, 1000, 1500,
and 2000, and correspondingly decreasing SC sizes of 3000, 2000, 1000, and 0. The PHR estimates from
these scenarios are shown in Figure 3. Key findings include: (i) The variance of PHR estimates increases
with SC size, affecting the stability of treatment effect estimation, and (ii) The PHR estimate can shift
significantly when the ratio of real to synthetic data is highly imbalanced, as observed in some scenarios
with a substantially high SC size. This investigation therefore highlights the importance of carefully
balancing the ratio of CC and SC patients in HC to ensure accurate treatment effect estimates and avoid
erroneous conclusions about a trial’s efficacy.

Figure 3 panels: PHR (ALLHAT), x-axis: Effect Size (1.0 to 1.8); PHR (SPRINT), x-axis: Effect Size (0.6 to 1.1).
y-axis in both panels: SC Size (0, 1000, 2000, 3000).
Figure 3: Examining the influence of varied TA, CC, and SC Sizes on PHR Estimation in Fixed Sized Recruitment
Trials.
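
The four fixed-recruitment scenarios in Section 3.4 can be reproduced with simple arithmetic. The sketch below assumes that CC fills the recruitment budget left after the TA (CC = 4000 − TA) and that SC tops the control arm up to the TA size (SC = TA − CC). This rule is inferred from the listed sizes rather than stated explicitly in the text, and the function and field names are hypothetical.

```python
TOTAL_RECRUITED = 4000  # recruitment cap stated in the text

def fixed_recruitment_scenarios(ta_sizes, total=TOTAL_RECRUITED):
    """Derive CC and SC sizes for each TA size under a fixed recruitment cap."""
    scenarios = []
    for ta in ta_sizes:
        cc = total - ta   # concurrent controls use the remaining budget
        sc = ta - cc      # assumed rule: synthetic controls fill the arm to TA size
        if cc < 0 or sc < 0:
            raise ValueError(f"TA size {ta} is infeasible for a cap of {total}")
        scenarios.append({
            "TA": ta,
            "CC": cc,
            "SC": sc,
            # Fraction of the control arm that is synthetic.
            "SC_share": sc / (cc + sc),
        })
    return scenarios

scenarios = fixed_recruitment_scenarios([3500, 3000, 2500, 2000])
for s in scenarios:
    print(s)
```

Under this assumption the synthetic share of the control arm falls from about 86% (TA = 3500) to 0% (TA = 2000), which gives a sense of how imbalanced the real-to-synthetic ratio becomes in the scenarios with the largest SC sizes.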


4. Conclusion
FRESCA offers a major advancement in equitable HCT methods and serves as a valuable tool for future
research. It creates realistic HCT scenarios, using a health recommender system for propensity score
matching and equity adjustments to provide more precise and equitable PHR estimates. Our simulations
suggest that fewer patients may be needed to achieve results similar to full trials, but further research is
required to determine the optimal balance of synthetic and concurrent controls in fixed-size trials. Future
work will involve testing FRESCA with more realistic EHR data and additional protected attributes, and
exploring the optimal size for CC recruitment during RCT design. Additionally, developing strategies
that integrate matching and equity adjustments in a single step could enhance efficiency and reduce
variance. These areas present opportunities for further refinement, making FRESCA a significant step
forward in hybrid clinical research with potential for ongoing improvement.


Acknowledgments
This work was partially funded by IBM Research. This manuscript was prepared using SPRINT and
ALLHAT study research materials obtained from the NHLBI Biologic Specimen and Data Repository
Information Coordinating Center and does not necessarily reflect the opinions or views of SPRINT,
ALLHAT or NHLBI.


References
 [1] A. Sachdeva, R. C. Tiwari, S. Guha, A novel approach to augment single-arm clinical studies
     with real-world data, Journal of Biopharmaceutical Statistics 32 (2022) 141–157.
     doi:10.1080/10543406.2021.2011902.
 [2] J. Harton, B. Segal, R. Mamtani, N. Mitra, R. A. Hubbard, Combining real-world and randomized
     control trial data using data-adaptive weighting via the on-trial score, Statistics in
     Biopharmaceutical Research (2022) 1–13. doi:10.1080/19466315.2022.2071982.
 [3] X. Yin, P. S. Mishra-Kalyan, R. Sridhara, M. D. Stewart, E. A. Stuart, R. C. Davi, Exploring the
     potential of external control arms created from patient level data: a case study in non-small cell
     lung cancer, Journal of Biopharmaceutical Statistics 32 (2022) 204–218.
     doi:10.1080/10543406.2021.2011901.
 [4] E. A. Stuart, D. B. Rubin, Matching with multiple control groups with adjustment for group
     differences, Journal of Educational and Behavioral Statistics 33 (2008) 279–306.
     doi:10.3102/1076998607306.
 [5] M. Qi, O. Cahan, M. A. Foreman, D. M. Gruen, A. K. Das, K. P. Bennett, Quantifying
     representativeness in randomized clinical trials using machine learning fairness metrics,
     JAMIA Open 4 (2021) ooab077. doi:10.1093/jamiaopen/ooab077.
 [6] E. Hartman, R. Grieve, R. Ramsahai, J. S. Sekhon, From sample average treatment effect to
     population average treatment effect on the treated: combining experimental with observational
     studies to estimate population treatment effects, Journal of the Royal Statistical Society. Series A
     (Statistics in Society) (2015) 757–778. doi:10.1111/rssa.12094.
 [7] A. Y. Ling, M. E. Montez-Rath, P. Carita, K. J. Chandross, L. Lucats, Z. Meng, B. Sebastien,
     K. Kapphahn, M. Desai, An overview of current methods for real-world applications to generalize or
     transport clinical trial findings to target populations of interest, Epidemiology 34 (2023) 627–636.
     doi:10.1097/EDE.0000000000001633.
 [8] J. Petkovic, J. Jull, M. Yoganathan, O. Dewidar, S. Baird, J. M. Grimshaw, K. A. Johansson,
     E. Kristjansson, J. McGowan, D. Moher, et al., Reporting of health equity considerations in cluster
     and individually randomized trials, Trials 21 (2020) 1–12. doi:10.1186/s13063-020-4223-5.
 [9] N. Neehal, V. Anand, K. P. Bennett, Framework for research in equitable synthetic control arms, in:
     AMIA Annual Symposium Proceedings, volume 2023, American Medical Informatics Association,
     2023, p. 530.
[10] J. T. Wright Jr, et al., A randomized trial of intensive versus standard blood-pressure control, New
     England Journal of Medicine 373 (2015) 2103–2116. doi:10.1056/NEJMoa1511939.
[11] C. D. Furberg, et al., Major outcomes in high-risk hypertensive patients randomized to
     angiotensin-converting enzyme inhibitor or calcium channel blocker vs diuretic: The antihypertensive
     and lipid-lowering treatment to prevent heart attack trial (ALLHAT), JAMA 288 (2002) 2981–2997.
     doi:10.1001/jama.288.23.2981.
[12] Centers for Disease Control and Prevention, National health and nutrition examination survey
     (NHANES) data, U.S. Department of Health and Human Services, Centers for Disease Control and
     Prevention, 2023. URL: https://www.cdc.gov/nchs/nhanes/index.htm, accessed on: October 2023.
[13] W. E. Deming, F. F. Stephan, On a least squares adjustment of a sampled frequency table when the
     expected marginal totals are known, The Annals of Mathematical Statistics 11 (1940) 427–444.
     doi:10.1214/aoms/1177731829.
[14] J. Yuan, J. Liu, R. Zhu, Y. Lu, U. Palm, Design of randomized controlled confirmatory trials using
     historical control data to augment sample size for concurrent controls, Journal of Biopharmaceutical
     Statistics 29 (2019) 558–573. doi:10.1080/10543406.2018.1559853.
[15] Y. Liu, B. Lu, J. Foster, Y. Zhang, Z. J. Zhong, M.-H. Chen, P. Sun, Matching design for augmenting
     the control arm of a randomized controlled trial using real-world data, Journal of Biopharmaceutical
     Statistics 32 (2022) 124–140. doi:10.1080/10543406.2021.2011900.
[16] K. Viele, et al., Use of historical control data for assessing treatment effects in clinical trials,
     Pharmaceutical statistics 13 (2014) 41–54. doi:10.1002/pst.1589.
[17] X. Pang, et al., A Bayesian alternative to synthetic control for comparative case studies, Political
     Analysis 30 (2022) 269–288. doi:10.1017/pan.2021.22.
[18] National Heart, Lung, and Blood Institute, Biologic specimen and data repository information
     coordinating center, https://biolincc.nhlbi.nih.gov/home/, 2024. Accessed September 22, 2024.
[19] D. E. Ho, K. Imai, G. King, E. A. Stuart, MatchIt: Nonparametric preprocessing for parametric
     causal inference, Journal of Statistical Software 42 (2011) 1–28. doi:10.18637/jss.v042.i08.
[20] K. Ward, IPFR: List Balancing for Reweighting and Population Synthesis, 2020.
     URL: https://CRAN.R-project.org/package=ipfr, R package version 1.0.2.