<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>mendations through Patient Metadata Integration: A Persona-Based Evaluation Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Khushboo Thaker</string-name>
          <email>k.thaker@pitt.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Brusilovsky</string-name>
          <email>peterb@pitt.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daqing He</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Youjia Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Hassany</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Behnam Rahdari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heidi Donovan</string-name>
          <email>donovanh@pitt.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Young Ji Lee</string-name>
          <email>leeyoung@pitt.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Health Recommender Systems, Personalized Health Information</institution>
          ,
          <addr-line>Persona based Evaluation, Patient Metadata</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computing and Information, University of Pittsburgh</institution>
          ,
          <addr-line>Pittsburgh</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Nursing, University of Pittsburgh</institution>
          ,
          <addr-line>Pittsburgh</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Patients and caregivers managing chronic diseases frequently seek information to navigate symptoms and make informed decisions as their conditions evolve. Health recommender systems are crucial in delivering personalized and relevant content tailored to individual needs. This paper presents a novel approach to enhancing content-based health recommendations by integrating patient metadata and trajectories. We developed a system that leverages these elements and evaluated its effectiveness with the assistance of healthcare professionals, specifically nurses. To address diverse needs and varied patient scenarios, we employed personas that simulate different patient journeys, ensuring the system's robustness across multiple contexts. Our findings indicate that incorporating user profiles with trajectory-based features substantially enhances the recommender system's performance. Our model, enhanced with both profile and trajectory information, achieved a Mean Average Precision (MAP) of 0.7318, a 17-point increase. Our future research aims to utilize personas during the cold-start recommendation phase, where patient metadata is limited and a persona can work as a digital twin supporting the patient's journey.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Individuals with chronic illnesses such as diabetes, cancer, and heart disease are often deeply involved in
managing their health. They actively seek out information to make informed decisions and effectively
manage their conditions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Traditionally, reliable health information has been disseminated through
patient education materials and health literacy workshops [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. However, these resources are typically
designed for a general audience and often fail to meet the specific needs of individual patients [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        In pursuit of more tailored health information, many patients turn to online search platforms.
Unfortunately, these platforms frequently fall short in delivering the specialized, personalized content
that patients require [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. As a result, patients increasingly seek information in disease-specific online
health communities (OHCs), where they can connect with others who share similar experiences [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
While beneficial, OHCs also pose significant risks of spreading misinformation, lacking the necessary
infrastructure to ensure patients access the most relevant and verified information [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Consumer health search engines and recommender systems offer a promising solution to these
challenges by providing personalized and reliable health information. Recent research has shown
that enhancing the user experience on consumer health search engines, for example, through query
reformulation for specialized health terms, can significantly improve search outcomes [9, 10]. Similarly,
consumer health recommender systems can enhance user satisfaction by suggesting personalized
content that incorporates user feedback [11, 12].</p>
      <p>Despite these advancements, existing health recommender systems often focus on addressing users’
immediate needs without sufficiently integrating user metadata such as user profile and trajectory
information. This limitation is particularly problematic in cold-start situations, where the system lacks
adequate data to generate accurate recommendations. Additionally, current evaluation methods for
these systems frequently overlook the diversity of user experiences, failing to account for the range of
patient scenarios encountered in practice.</p>
      <p>The primary motivation for this research is to address these limitations by developing a
content-based recommendation system that integrates user profiles and trajectory information to enhance
recommendation accuracy. Specifically, we aim to investigate how the inclusion of trajectory-based
features can better address the diverse information needs of patients at different stages of their health
journey. The key contributions of this paper are as follows:</p>
      <list list-type="bullet">
        <list-item>
          <p>The design and implementation of a content-based health recommender system that effectively
integrates user profiles and trajectory data, substantially improving recommendation accuracy.</p>
        </list-item>
        <list-item>
          <p>The introduction of a novel evaluation framework that employs personas, generated through
clustering online health community data and refined via expert feedback, to simulate diverse
patient scenarios.</p>
        </list-item>
      </list>
      <p>The remainder of this paper is organized as follows: Section 2 details the system architecture and
the methodology employed in this study. Section 3 presents the experimental results and discusses
the findings. Section 4 offers implications, limitations, and potential areas for future research. Finally, Section 5
concludes the paper by summarizing the key findings and contributions.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Methodology</title>
      <sec id="sec-3-1">
        <title>2.1. Creating Personas for Comprehensive System Evaluation</title>
        <p>In designing a human-centered recommender system, it is critical to understand its target users, examine
their information needs, and understand in which respects users differ from one another.
In our case, it is hard to evaluate ongoing research on a large patient population,
which makes it essential to create a comprehensive user base that covers the broad spectrum of patient
needs. For this purpose, we created ten Ovarian Cancer (OvCa) patient and caregiver personas by triangulating
multiple sources, including online cancer forums, direct interviews with patients and caregivers, domain
expert input, and clinical notes. The detailed process of persona creation is discussed in [13, 14].
Figure 1 shows a comprehensive representation of one persona to illustrate its appearance and details.
The process involved clustering the user base, after which each cluster was enhanced by OvCa
health professionals to develop a general persona representing that cluster. Table 1 lists all the features
utilized for clustering.</p>
        <p>While creating the personas, we found that the disease trajectory and information needs play key
roles in differentiating groups of similar users. In contrast, demographics-related metadata, which serve
as a key factor in grouping users in many existing recommender systems, were scarce and did not
play a major role in forming the personas. This finding provided us with important design implications
for the development of personalized recommendation algorithms. First, the algorithms have to
adapt to both patients’ disease trajectory and expressed information needs to recommend the
most relevant documents. Since both factors can change over time, the system has to constantly
monitor the users and keep their profiles up to date. Second, the presence of stable user clusters enables
the system to perform efficient “cold-start” personalization at the start of the user’s interaction with
the system by matching the user to one of the clusters and offering a group-level adaptation. These
implications affected the design of the system algorithms discussed in Section 2.3.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Data Source</title>
        <p>The primary source of documents for this study is the HELPeR library<sup>1</sup>, a specialized repository
meticulously organized into a topic hierarchy known as OvCa TH (Ovarian Cancer Topic Hierarchy) [15].
The OvCa TH, developed by a dedicated team of oncology nurses, categorizes documents into 17 broad
categories and 90 specific sub-categories. This hierarchical structure ensures that documents are
well distributed across different categories. Such distribution is crucial for covering the diverse
informational needs of OvCa patients and caregivers, ensuring that the library provides comprehensive
and balanced content. The library consists of 2808 documents; detailed statistics about the library are
provided in Table 3, and the top 10 domains are listed in Table 2.</p>
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Recommendation Methods</title>
        <p>Our approach employs a content-based recommender system, which focuses on analyzing the content
of documents to generate personalized recommendations. The input to the system includes explicit
user information, such as the patient’s profile (in our case, a persona).</p>
        <p><sup>1</sup> https://sites.pitt.edu/ dah44/helper/</p>
        <p>The output is a ranked list of documents that are tailored to the user’s specific needs.</p>
        <p><bold>2.3.1. Candidate Retrieval</bold></p>
        <p>In the initial phase of our recommendation pipeline, we focused on identifying a suitable candidate
retrieval method. This step is crucial as it determines the pool of documents that will be further ranked
and refined. The input query to this retrieval method consists of the need topics mentioned in the patient
personas, such as nephropathy, financial assistance, cancer pain, and hormonal therapy.</p>
        <p>Our primary goal was not to find the absolute best model, but to select a reliable candidate ranking
method that would allow us to conduct further experiments effectively. Based on previous
benchmarks [16] on biomedical retrieval tasks, we evaluated five underlying retrieval models. BM25, a
well-established probabilistic model, served as our baseline [17]. We also explored BERT-based models,
including BERT [18], BioBERT [19], and MedCPT [16], which are designed to capture contextual
nuances in text, particularly in the biomedical domain [16], and ColBERT [20, 17, 16], which has been shown
to be a strong baseline that generalizes to the biomedical domain.</p>
        <p>To conduct all these experiments, we utilized Pyserini [21], which integrates PyTorch and FAISS-based
indexing for dense retrieval [22] and the Lucene engine for BM25 retrieval. For FAISS indexing, we used the
default parameters with the number of clusters set to 1, as our dataset is small and we are not interested in
optimizing performance. Further investigation is required to identify the best-performing models;
we leave these enhancements for future work, as discussed in Section 4.1. In our dataset, patient articles range
from fewer than 100 words to more than 11,000 words (mean 2069 ± 4091), so we indexed
the documents at the section level. In this study, we employed several re-ranking methods to refine the
initial retrieval results based on both patient profiles and their trajectories.</p>
      </sec>
      <sec id="sec-3-4">
        <title>2.4. Profile and Trajectory-Based Re-Ranking</title>
        <p>In this study, we implemented two primary re-ranking approaches, along with a combination of the two,
to refine the initial retrieval results by enhancing their relevance to the patient’s specific profile and trajectory:</p>
        <table-wrap id="tab-4">
          <label>Table 4</label>
          <table>
            <thead>
              <tr>
                <th>Trajectory</th>
                <th>Description</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>End of Life</td>
                <td>The patient is in an advanced stage of disease where treatment is no longer curative but palliative. Focus may shift to symptom management, including pain relief, managing symptoms, and emotional support. Advanced care planning, hospice care, and addressing emotional needs such as anxiety, grief, and fear of death become priorities.</td>
              </tr>
              <tr>
                <td>Survivorship After Treatment</td>
                <td>The patient has completed initial treatment and is transitioning to post-treatment care. This stage involves managing long-term side effects. Emotional challenges may include fear of recurrence and adjusting to a new normal. Patients focus on recovery, rehabilitation, and monitoring for recurrence.</td>
              </tr>
              <tr>
                <td>Survivorship</td>
                <td>The patient is currently receiving initial treatment for a new diagnosis. Information needs center around understanding treatment options, managing side effects, and coping with the emotional stress of a new cancer diagnosis. Patients may need support in navigating treatment decisions and maintaining quality of life during active treatment.</td>
              </tr>
              <tr>
                <td>Survivorship During Progression/Recurrence</td>
                <td>The patient is dealing with disease recurrence or progression, requiring new treatment strategies. Focus shifts to managing new or worsening symptoms, exploring second-line treatments, and addressing the emotional impact of disease recurrence. Information on advanced therapies, clinical trials, and symptom management becomes critical.</td>
              </tr>
              <tr>
                <td>Previvorship</td>
                <td>The patient is at high risk for developing ovarian cancer due to genetic predispositions but has not yet been diagnosed. Information needs include preventive strategies, such as risk-reducing surgeries, regular screening, and lifestyle modifications. Emotional challenges may include anxiety over potential diagnosis and decision-making regarding preventive measures.</td>
              </tr>
              <tr>
                <td>Survivorship After Treatment (Recurrence/Progression)</td>
                <td>The patient has completed treatment for recurrent or progressive disease and is now managing life post-treatment. There is a focus on managing long-term side effects of the second treatment, ongoing surveillance, and addressing fears related to further progression. Emotional support may be needed for coping with uncertainty, and physical rehabilitation may be required to manage lingering symptoms from aggressive treatments.</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>• Profile Re-Ranking (Keyphrase-Based):</p>
        <p>We extracted keyphrases from the patient’s profile<sup>2</sup>, which includes relevant attributes such as
medical history, demographics, and treatment details. These keyphrases were then appended to
the patient’s stated need to generate a refined rank list. This approach aims to align the retrieved
documents more closely with the patient’s overall profile. We initially tried using complete
profile embeddings as well as all keyphrases, but both of these methods performed poorly, so we
restricted our keyphrases to selected semantic types, including drug, disease, therapeutic
procedure, and laboratory test.</p>
        <p><sup>2</sup> Keyphrase extraction was done using scispaCy with the model en_core_sci_scibert. UMLS linking was utilized, and the semantic
type of each keyphrase was extracted: https://github.com/allenai/scispacy/tree/main</p>
        <p>• Trajectory-Based Re-Ranking: In this method, the patient’s trajectory, or their current stage
in the progression of their disease, was integrated into the recommendation process to provide
more personalized information. A patient’s disease trajectory refers to the stages they go through,
from diagnosis to treatment and beyond, which significantly impact their information needs.
For ovarian cancer patients, the type of information required can vary drastically depending on
whether they are in early treatment, post-treatment survivorship, or dealing with a recurrence of
the disease.</p>
        <p>To incorporate trajectory-based re-ranking, we created embeddings of the trajectory descriptions
as listed in Table 4, which were used as query information to re-rank the top results. This
approach ensured that documents related to the same information need but more specific to the
patient’s trajectory were ranked higher. For example, if the patient was in the “Survivorship After
Treatment” stage, their primary need might be managing side effects from chemotherapy. By
embedding the description of this trajectory, the system would prioritize documents that address
not only general chemotherapy side effects but those particularly relevant to patients who have
completed treatment and are focusing on recovery and long-term management.</p>
        <p>• Combining Profile and Trajectory Re-Rankings: We merged the rank lists generated from
both the profile-based and trajectory-based re-rankings to produce a final combined rank list.
This approach leverages the strengths of both methods to deliver a more comprehensive and
contextually relevant set of results.</p>
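        <p>As an illustration only, trajectory-based re-ranking can be sketched as sorting candidate documents by the cosine similarity between each document embedding and the embedding of the trajectory description. This is a minimal sketch and not the HELPeR implementation: the function names, the document ids, and the three-dimensional toy vectors are assumptions standing in for embeddings produced by a dense encoder.</p>

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank_by_trajectory(candidates, doc_vectors, trajectory_vector):
    """Reorder candidate ids by similarity to the trajectory embedding."""
    return sorted(
        candidates,
        key=lambda doc_id: cosine(doc_vectors[doc_id], trajectory_vector),
        reverse=True,
    )


# Toy vectors (hypothetical): real embeddings would come from a dense encoder.
doc_vectors = {
    "chemo_side_effects_general": [0.9, 0.1, 0.0],
    "recovery_after_chemo": [0.5, 0.8, 0.1],
    "hospice_planning": [0.0, 0.1, 0.9],
}
# Hypothetical embedding of the "Survivorship After Treatment" description.
survivorship_after_treatment = [0.4, 0.9, 0.0]

reranked = rerank_by_trajectory(
    list(doc_vectors), doc_vectors, survivorship_after_treatment
)
print(reranked)
```

        <p>In this toy setting, the recovery-oriented document moves ahead of the general side-effects document, which mirrors the example given for the “Survivorship After Treatment” stage.</p>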
        <p>To generate a final, more comprehensive ranking, we combined the results from the profile-based and
trajectory-based re-rankings using Reciprocal Rank Fusion (RRF). RRF is a technique used to merge
multiple ranking lists by assigning a score to each document based on its position in each list. The RRF
score for a document <italic>d</italic> over the ranking lists <italic>r</italic> = 1, ..., <italic>R</italic> is calculated as:</p>
        <disp-formula>
          <tex-math>\mathrm{RRF\_score}(d) = \sum_{r=1}^{R} \frac{1}{k + \mathrm{rank}_r(d)}</tex-math>
        </disp-formula>
        <p>where <italic>k</italic> is a constant (typically set to 60), <italic>R</italic> is the number of ranking lists, and rank<sub><italic>r</italic></sub>(<italic>d</italic>) is the rank of document <italic>d</italic> in ranking list <italic>r</italic>. For
our experiments, we gave equal weight to all ranking lists, ensuring a balanced integration of both
profile-based and trajectory-based relevance. This combined approach allows us to generate a final
ranking that takes into account both the patient’s detailed profile and their trajectory, leading to a
more personalized and contextually appropriate set of recommendations. We restricted RRF to the top 10
documents from each rank list.</p>
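        <p>The fusion step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used in the study: the function name, the toy document ids, and the tie-breaking by document id are illustrative choices, while the constant of 60, the equal weighting, and the cut-off of 10 documents per list follow the description in the text.</p>

```python
from collections import defaultdict


def reciprocal_rank_fusion(rank_lists, k=60, depth=10):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rank_lists:
        # Only the top `depth` documents of each list contribute to the score.
        for rank, doc_id in enumerate(ranking[:depth], start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document id (an assumption).
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rank lists from the two re-rankers.
profile_ranking = ["d3", "d1", "d2"]
trajectory_ranking = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([profile_ranking, trajectory_ranking])
print(fused)  # ['d1', 'd3', 'd4', 'd2']
```

        <p>Documents that appear near the top of both lists (here d1 and d3) accumulate two reciprocal-rank terms and therefore rise above documents that appear in only one list.</p>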
      </sec>
      <sec id="sec-3-5">
        <title>2.5. The Study Design</title>
        <p>The study was designed to evaluate the effectiveness of different document recommendation algorithms
within the HELPeR system, based on feedback from expert caregivers. By simulating real-world
decision-making processes, the study aimed to assess how well these algorithms supported the selection of
relevant documents for ovarian cancer patients and their caregivers. The study involved several key
steps, as illustrated in Figure 3, guiding participants through a structured sequence of tasks that included
document annotation, feedback, and selection.</p>
        <p><bold>Step 1: Pre-Survey</bold></p>
        <p>The study began with a pre-survey, which participants were required to complete. The survey collected
key demographic information and assessed participants’ caregiving backgrounds, as summarized in
[...] need of the patient. This limitation of the study is discussed in Section 4.1.</p>
        <p><bold>Step 2: Persona Familiarization</bold></p>
        <p>Following the pre-survey, participants were introduced to a randomly selected hypothetical patient
persona, as discussed in Section 2.1. Participants were given time to familiarize themselves with the
persona and were asked to confirm their understanding of the patient’s context. This step ensured that
participants felt comfortable and confident in making document selections on behalf of the persona.</p>
        <p><bold>Step 3: Initial Document Feedback (Subtask 1)</bold></p>
        <p>The document list was created by first selecting the top <italic>n</italic> (<italic>n</italic> = 10) documents generated by each of the
algorithms discussed in Section 2.3. These top documents were then combined to form a single list
of unique documents. To eliminate any positional bias [23], the combined list was randomly sorted.
Participants were subsequently presented with this randomized list of unique documents for their
review and feedback. Every time a participant reviewed an article, we ensured that the persona was
accessible to them (see Figure 2). This approach allowed participants to refer back to the persona as needed,
minimizing the risk of recall bias that could occur if they had to rely solely on their memory of the
persona’s details. For each document, participants were required to provide detailed feedback [24].
This feedback focused on the relevance of the document to the persona’s needs and trajectory and the
participant’s assessment of the document’s usefulness.</p>
        <p><bold>Step 4: Additional Document Feedback (Subtask 2)</bold></p>
        <p>Based on the feedback provided in Subtask 1, the HELPeR system generated a second set of documents.
From these additional documents, those not already in the initial set were selected to complement it.
These were shown to the participants for further feedback. This step was crucial in ensuring that the
study captured feedback on a comprehensive set of relevant documents.</p>
        <p><bold>Step 5: Document Selection for Patient</bold></p>
        <p>In this step, participants were explicitly asked to assume the role of a nurse responsible for the patient.
They were instructed to select up to three articles from the documents provided that they deemed most
appropriate for the patient to read. Participants were required to justify their selections by indicating
the specific qualities that made these documents appropriate and how understandable they judged each document
to be for the presented patient (see Table 5).</p>
        <p><bold>Step 6: Overall Feedback</bold></p>
        <p>Finally, participants were asked to provide overall feedback on the study. This included their impressions
of the persona they interacted with, the quality and relevance of the documents encountered, and
their thoughts on the HELPeR system. The design of this study was carefully structured to simulate a
realistic caregiving scenario, allowing us to gather valuable insights into the usability and effectiveness
of different recommendation algorithms within the HELPeR system, as judged by expert caregivers.</p>
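        <p>The construction of the review list in Step 3 (pooling the top 10 documents of every algorithm, removing duplicates, and shuffling to avoid positional bias) can be sketched as follows. This is a minimal sketch under assumptions: the function name, the algorithm labels, the document ids, and the optional seed are hypothetical and are not taken from the HELPeR system.</p>

```python
import random


def build_review_list(top_lists, n=10, seed=None):
    """Pool the top-n ids of each algorithm, drop duplicates, and shuffle."""
    pooled = []
    seen = set()
    for ranking in top_lists.values():
        for doc_id in ranking[:n]:
            if doc_id not in seen:
                seen.add(doc_id)
                pooled.append(doc_id)
    # Shuffle so that list position reveals neither the source algorithm nor its rank.
    random.Random(seed).shuffle(pooled)
    return pooled


# Hypothetical top lists from three retrieval models.
top_lists = {
    "bm25": ["d1", "d2", "d3"],
    "colbert": ["d2", "d4"],
    "medcpt": ["d4", "d5", "d1"],
}

review_list = build_review_list(top_lists, n=10, seed=7)
print(review_list)
```

        <p>Fixing the seed makes the shuffled order reproducible for a given persona; leaving it unset gives a fresh random order for each session.</p>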
      </sec>
      <sec id="sec-3-6">
        <title>2.6. Participants</title>
        <p>The study focused on assessing HELPeR from the perspective of expert caregivers. A total of eleven
nurses from the School of Nursing were recruited for the study. These participants were selected based
on their expertise and experience: all had at least one year of nursing
experience, and all had caregiving experience. Additionally, each participant confirmed their
understanding of ovarian cancer and their ability to provide care for ovarian cancer patients.</p>
        <p>This study serves as a preliminary investigation, preceding a larger study involving patients and
caregivers. The choice to use personas in the study was influenced by the fact that nurses typically
have limited information about a patient during a clinical visit, making the use of a persona a realistic
representation of a real-world scenario.</p>
        <p>To recruit participants, we employed a convenience sampling method due to the limited availability
of qualified candidates. However, we ensured that the selected participants covered
a range of ages: the group consisted of eleven participants aged between 20
and 44 years. The detailed statistics of the participants are provided in Table 6. Overall, this group of
expert caregivers provided a representative sample for the study, allowing us to gather a high-quality
gold standard for evaluating the recommender algorithms.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Experimental Results</title>
      <sec id="sec-4-1">
        <title>3.1. Baseline Candidate Retrieval Performance</title>
        <p>BM25 shows moderate performance, achieving a MAP of 0.4253 and an MRR of 0.2667 for patient
need, but its effectiveness declines for patient trajectory (MAP: 0.2271, MRR: 0.2292) and expert
recommendations (MAP: 0.1850, MRR: 0.1339). Among dense retrieval models, BioBERT outperforms
BERT, especially in expert recommendation alignment (BioBERT MAP: 0.2289, MRR: 0.1339 vs. BERT
MAP: 0.1736, MRR: 0.1250). However, both models fall short in trajectory relevance compared to
sparse retrieval. As shown in previous work [16], the fine-tuned models, ColBERT and MedCPT, lead in
performance. MedCPT, in particular, excels across all criteria, with the highest MAP and MRR values
(e.g., MAP: 0.5523 for need, 0.2815 for trajectory, and 0.4123 for expert recommendations), highlighting
the benefits of fine-tuning for retrieval tasks.</p>
        <p>The results demonstrate that fine-tuned dense retrieval models, especially MedCPT, offer superior
accuracy in retrieving information relevant to patient needs, trajectories, and expert recommendations,
surpassing both traditional sparse and standard dense retrieval approaches. For our further experiments,
we focus on BM25 as the sparse baseline and on ColBERT (general domain) and MedCPT (medical articles),
the fine-tuned models that have already demonstrated superior performance.</p>
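        <p>For readers less familiar with the two metrics reported here, the following minimal sketch shows one common way to compute Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) from binary relevance judgments. It is an illustration under assumptions, not the evaluation script used in the study: the function names and the toy runs are hypothetical, and average precision is normalized by the number of relevant documents, which is one of several conventions.</p>

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision values at the ranks where relevant documents occur."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0


def reciprocal_rank(ranked_ids, relevant_ids):
    """Inverse of the rank of the first relevant document (0 if none is found)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_over_queries(metric, runs):
    """Mean of a per-query metric over (ranking, relevant set) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Two toy queries (hypothetical): each pairs a ranking with its relevant set.
runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),
    (["d4", "d5", "d6"], {"d5"}),
]

map_score = mean_over_queries(average_precision, runs)
mrr_score = mean_over_queries(reciprocal_rank, runs)
print(round(map_score, 4), mrr_score)  # 0.6667 0.75
```

        <p>In the study, a separate set of relevance judgments exists for each criterion (need, trajectory, and expert recommendation), so each criterion yields its own MAP and MRR values.</p>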
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Effect of Integrating Patient Metadata</title>
        <p>The performance of the re-ranking strategies across the BM25, ColBERT, and MedCPT models is
summarized in Table 8. MedCPT consistently demonstrates superior performance across all criteria,
particularly in the combined profile and trajectory re-ranking approach, where it achieves a MAP
of 0.7318 for Need, 0.6018 for Trajectory, and 0.6314 for Expert Recommendations. ColBERT also
performs well, especially in the trajectory-based re-ranking, indicating its effectiveness in integrating
patient-specific trajectory information. In contrast, while BM25 shows reasonable performance, it falls
behind the more advanced models, highlighting the significant benefits of incorporating both profile
and trajectory data in enhancing retrieval relevance.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Implications</title>
      <p>The use of personas, which are generic representations of patients at specific trajectory stages, offers
a promising approach to addressing the cold-start problem in health recommender systems. These
systems often struggle with limited patient data due to privacy concerns or patient reluctance to
share information. Additionally, small population sizes in certain health contexts make collaborative
filtering data scarce. By leveraging personas formed through representative patient stereotypes, health
recommender systems can potentially provide personalized recommendations even in the absence of
direct patient data, offering a viable solution for enhancing system effectiveness in these challenging
scenarios.</p>
      <sec id="sec-5-1">
        <title>4.1. Limitations</title>
        <p>While hybrid retrieval and fine-tuning approaches have shown potential for improving retrieval
effectiveness [17], optimizing the retrieval pipeline was not the primary focus of this work. We utilized
a basic zero-shot retrieval pipeline, which, while sufficient for our study’s objectives, may not fully
leverage the potential of more advanced methods. In future work, we plan to explore and incorporate
more sophisticated retrieval pipelines, including hybrid methods and fine-tuning techniques, to enhance
the accuracy and relevance of health recommendations.</p>
        <p>Recent advancements in large language models (LLMs) have demonstrated significant potential
for enhancing personalization in recommender systems through zero-shot and few-shot prompting
techniques [25]. These techniques have been particularly effective for tasks such as query expansion
and user profile integration, allowing for more nuanced and context-aware recommendations [26, 27].
However, in this work, we did not explore these LLM-based approaches due to constraints on time and
computational resources. While our current methods are effective for the study’s scope, integrating LLM
techniques could potentially lead to more refined and personalized recommendations. In future research,
we intend to investigate the application of LLMs to our recommendation framework, leveraging their
capabilities to improve both query processing and user profile adaptation.</p>
        <p>One of the primary limitations of this study is the relatively small sample size, with only 11 experts
participating in the evaluation. This limited number of participants is insufficient to establish statistical
significance for the results. Consequently, the findings should be interpreted with caution. However, it
is important to note that this study serves as a preliminary investigation. Future research will involve a
more comprehensive study with a larger sample size, including actual patients and caregivers. This
expanded dataset will enable a more robust analysis and provide deeper insights into the effectiveness
and applicability of the proposed recommendation system.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>In this study, we discussed and demonstrated the significance of integrating persona meta-data into
health recommender systems. By leveraging personas, which are generic representations of patients at specific
trajectory stages, we showed how these meta-data points can effectively address challenges in
personalization, particularly in cold-start scenarios where patient data may be limited due to privacy concerns
or patient reluctance. Our findings suggest that even minimal persona meta-data can enhance the
relevance of recommendations, providing a strong foundation for further exploration in this area.</p>
      <p>While we primarily focused on two specific meta-data points in this study, we recognize the potential
of other persona attributes to further improve recommender systems. Our future research will explore
how additional persona meta-data can be integrated to enrich recommendations and support more
nuanced personalization.</p>
      <p>Additionally, we collected qualitative feedback from nurse experts, which revealed a preference for
documents that are easy for patients to understand. This insight aligns with our ongoing
research into knowledge adaptation [28], where we aim to tailor recommendations to the patient’s
level of comprehension.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by awards from the National Library of Medicine (NLM) of the National
Institutes of Health (NIH) (Award Number: R01-LM013038). The content is solely the responsibility of
the authors. NIH and NLM had no role in the design or conduct of the study; the analysis or interpretation
of data; or the preparation or review of the manuscript.</p>
      <p>[9] Jimmy, G. Zuccon, B. Koopman, Choices in knowledge-base retrieval for consumer health search,
in: G. Pasi, B. Piwowarski, L. Azzopardi, A. Hanbury (Eds.), Advances in Information Retrieval,
Springer International Publishing, Cham, 2018, pp. 72–85.
[10] D. He, Z. Wang, K. Thaker, N. Zou, Translation and expansion: Enabling laypeople access to the
COVID-19 academic collection, Data and Information Management 4 (2020) 177–190.
[11] R. D. Croon, L. V. Houdt, N. N. Htun, G. Štiglic, V. V. Abeele, K. Verbert, Health recommender
systems: Systematic review, Journal of Medical Internet Research 23 (2021).
[12] M. Etemadi, S. B. Abkenar, A. Ahmadzadeh, M. H. Kashani, P. Asghari, M. Akbari, E. Mahdipour,
A systematic review of healthcare recommender systems: Open issues, challenges, and techniques,
Expert Syst. Appl. 213 (2023) 118823.
[13] Y. Wang, K. Thaker, V. Hui, P. Brusilovsky, D. He, H. Donovan, Y. J. Lee, Utilizing digital twin to
create personas representing ovarian cancer patients and their families, in: Innovation in Applied
Nursing Informatics, IOS Press, 2024, pp. 754–756.
[14] K. Thaker, V. Hui, Z. Luo, Y. Wang, B. Rahdari, P. Brusilovsky, D. He, H. Donovan, Y. J. Lee, Walk
a mile in their shoes: Using personas to obtain health professional perspectives for an ovarian
cancer patient-focused recommender system, in: Proceedings of the AMIA Annual Symposium,
AMIA, New Orleans, USA, 2023.
[15] B. Rahdari, P. Brusilovsky, D. He, K. M. Thaker, Z. Luo, Y. J. Lee, Helper: An interactive
recommender system for ovarian cancer patients and caregivers, in: Proceedings of the 16th ACM
Conference on Recommender Systems, RecSys ’22, Association for Computing Machinery, New
York, NY, USA, 2022, pp. 644–647.
[16] Q. Jin, W. Kim, Q. Chen, D. C. Comeau, L. Yeganova, W. J. Wilbur, Z. Lu, MedCPT: Contrastive
Pretrained Transformers with large-scale PubMed search logs for zero-shot biomedical information
retrieval, Bioinformatics 39 (2023).
[17] N. Thakur, N. Reimers, A. Rücklé, A. Srivastava, I. Gurevych, BEIR: A heterogeneous benchmark
for zero-shot evaluation of information retrieval models, in: Thirty-fifth Conference on Neural
Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
[18] X. Jiao, Y. Yin, L. Shang, X. Jiang, X. Chen, L. Li, F. Wang, Q. Liu, TinyBERT: Distilling BERT for
natural language understanding, in: T. Cohn, Y. He, Y. Liu (Eds.), Findings of the Association for
Computational Linguistics: EMNLP 2020, Association for Computational Linguistics, Online, 2020,
pp. 4163–4174. URL: https://aclanthology.org/2020.findings-emnlp.372.
doi:10.18653/v1/2020.findings-emnlp.372.
[19] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, J. Kang, BioBERT: a pre-trained biomedical
language representation model for biomedical text mining, Bioinformatics (Oxford, England) 36
(2020) 1234–1240.
[20] K. Santhanam, O. Khattab, J. Saad-Falcon, C. Potts, M. Zaharia, ColBERTv2: Effective and efficient
retrieval via lightweight late interaction, in: M. Carpuat, M.-C. de Marneffe, I. V. Meza Ruiz (Eds.),
Proceedings of the 2022 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Association for Computational Linguistics,
Seattle, United States, 2022, pp. 3715–3734. URL: https://aclanthology.org/2022.naacl-main.272.
doi:10.18653/v1/2022.naacl-main.272.
[21] J. Lin, X. Ma, S.-C. Lin, J.-H. Yang, R. Pradeep, R. Nogueira, Pyserini: A Python toolkit for
reproducible information retrieval research with sparse and dense representations, in: Proceedings
of the 44th International ACM SIGIR Conference on Research and Development in Information
Retrieval, SIGIR ’21, Association for Computing Machinery, New York, NY, USA, 2021, pp. 2356–2362.
[22] J. Johnson, M. Douze, H. Jégou, Billion-scale similarity search with GPUs, IEEE Transactions on
Big Data 7 (2019) 535–547.
[23] Y. Yue, R. Patel, H. Roehrig, Beyond position bias: examining result attractiveness as a source of
presentation bias in clickthrough data, in: Proceedings of the 19th International Conference on
World Wide Web, WWW ’10, Association for Computing Machinery, New York, NY, USA, 2010,
pp. 1011–1018.
[24] M. Prince, Epidemiology, in: P. Wright, J. Stern, M. Phelan (Eds.), Core Psychiatry,
third ed., W.B. Saunders, Oxford, 2012, pp. 115–129.
[25] P. Liu, L. Zhang, J. A. Gulla, Pre-train, prompt, and recommendation: A comprehensive survey of
language modeling paradigm adaptations in recommender systems, Transactions of the Association
for Computational Linguistics 11 (2023) 1553–1571.
[26] A. Acharya, B. Singh, N. Onoe, LLM based generation of item-description for recommendation
system, in: Proceedings of the 17th ACM Conference on Recommender Systems, RecSys ’23,
Association for Computing Machinery, New York, NY, USA, 2023, pp. 1204–1207.
[27] Y. Wu, R. Xie, Y. Zhu, F. Zhuang, X. Zhang, L. Lin, Q. He, Personalized prompt for sequential
recommendation, IEEE Transactions on Knowledge and Data Engineering (2024).
[28] K. Thaker, Ka-recsys: Knowledge appropriate patient focused recommendation technologies, in:
Proceedings of the 16th ACM Conference on Recommender Systems, RecSys ’22, Association
for Computing Machinery, New York, NY, USA, 2022, pp. 720–721.
URL: https://doi.org/10.1145/3523227.3547422. doi:10.1145/3523227.3547422.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Anker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Reinhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. H.</given-names>
            <surname>Feeley</surname>
          </string-name>
          ,
          <article-title>Health information seeking: A review of measures and methods</article-title>
          ,
          <source>Patient Education and Counseling</source>
          <volume>82</volume>
          (
          <year>2011</year>
          )
          <fpage>346</fpage>
          -
          <lpage>354</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S0738399110007470. doi:10.1016/j.pec.2010.12.008, Methodology in Health Communication Research.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Susic</surname>
          </string-name>
          ,
          <article-title>NIHSeniorHealth classes for senior citizens at a public library in Louisiana</article-title>
          ,
          <source>Journal of Consumer Health on the Internet</source>
          <volume>13</volume>
          (
          <year>2009</year>
          )
          <fpage>417</fpage>
          -
          <lpage>419</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Stacey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Entwistle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Llewellyn-Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rovner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holmes-Rovner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tait</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tetroe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Fiset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Barry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>Decision aids for people facing health treatment or screening decisions</article-title>
          ,
          <source>The Cochrane Database of Systematic Reviews</source>
          <issue>2</issue>
          (
          <year>2003</year>
          )
          <elocation-id>CD001431</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Carter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nunlee-Bland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callender</surname>
          </string-name>
          ,
          <article-title>A patient-centric, provider-assisted diabetes telehealth self-management intervention for urban minorities</article-title>
          ,
          <source>Perspectives in Health Information Management</source>
          <volume>8</volume>
          (
          <year>2011</year>
          )
          <elocation-id>1b</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hoti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hughes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Emmerton</surname>
          </string-name>
          ,
          <article-title>Consumer use of “Dr Google”: A survey on health information-seeking behaviors and navigational needs</article-title>
          ,
          <source>Journal of Medical Internet Research</source>
          <volume>17</volume>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Thaker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Birkhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Brusilovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Donovan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Exploring resource sharing behaviors for finding relevant health resources: An analysis of an online ovarian cancer community</article-title>
          ,
          <source>JMIR Cancer (Preprint) Accepted</source>
          <volume>8</volume>
          (
          <year>2021</year>
          )
          <elocation-id>e33110</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kanthawala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vermeesch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Given</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huh</surname>
          </string-name>
          ,
          <article-title>Answers to health questions: Internet search results versus online health community responses</article-title>
          ,
          <source>Journal of Medical Internet Research</source>
          <volume>18</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Johnston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Worrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gangi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wasko</surname>
          </string-name>
          ,
          <article-title>Online health communities: An assessment of the influence of participation on patient empowerment outcomes</article-title>
          ,
          <source>Inf. Technol. People</source>
          <volume>26</volume>
          (
          <year>2013</year>
          )
          <fpage>213</fpage>
          -
          <lpage>235</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>