<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Translating Polygenic Risk Score Research to a Clinical Setting</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diana Martínez-Minguet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>PROS Group, Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>Polygenic Risk Score (PRS) research is an emerging and active area of medical genetics. A PRS estimates the extent to which an individual's genetics contributes to the development of a complex disease. PRS analyses are useful in many research areas, enabling personalized treatment and risk stratification of the population. To conduct a PRS analysis, a PRS Model is required, and clinicians face the critical task of selecting the most accurate PRS Model for their analysis. This step is crucial because an inadequate model reduces the effectiveness of the analysis results, which can compromise patient care. However, the variability of PRS Models, the heterogeneity in concept representation, and the difficulty of prioritizing PRS Models by relevance make PRS Model selection a demanding task. This hinders the effective application of PRS analyses outside their area of expertise and, in turn, their translation to the clinical setting. Following the Design Science methodology, we propose the design and validation of two artifacts to overcome these barriers: (i) a conceptual model that eases domain comprehension and the comparison of PRS Models, and (ii) a method that enables an adequate prioritization of PRS Models. This thesis aims to streamline the PRS Model selection process, assisting in the application of PRS analyses and thereby helping to bridge the gap between PRS research and clinical practice.</p>
      </abstract>
      <kwd-group>
        <kwd>Polygenic Risk Score</kwd>
        <kwd>Conceptual Modeling</kwd>
        <kwd>Precision Medicine</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR (ceur-ws.org), CAiSE 2024 Doctoral Consortium</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Precision medicine has revolutionized how diseases are diagnosed, treated, and prevented [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
An individual’s genomic, environmental, and lifestyle factors are now taken into account to provide
personalized care. Genomics plays an increasingly important role in medical research, providing insights
into how the human genome contributes to disease [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. One can differentiate between simple
and complex diseases. While simple diseases have a clear genetic cause that can be traced
back to a single change in the DNA (i.e., a genetic variant), complex diseases are caused by a
combination of several genetic variants, each contributing differently to the risk of developing
the disease [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Research on the genetic causes of complex diseases is of great interest because
these diseases account for the majority of common diseases and pose the greatest burden on health care.
      </p>
      <p>
        In recent years, Polygenic Risk Scores have emerged as a new tool to estimate the extent to which
the genetics of an individual contributes to a complex disease [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In fact, genetic risk
analyses using PRSs (PRS analyses) are one of the most promising approaches for improving
clinical decision-making, treatment choices, and risk stratification of the population
for complex diseases.
      </p>
      <p>
        To conduct a PRS analysis, a PRS Model is required. A PRS Model applies statistical methods
and algorithms to genomic population data in order to identify the variants associated with the
complex disease under study [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. PRS Models can be developed in a variety of ways, varying in
the ancestry of the population data or the statistical method used, among other factors. This
results in a wide range of available PRS Models, each with its unique characteristics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], that
need to be compared in order to find the most suitable one for a given PRS analysis.
      </p>
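      <p>For readers outside medical genetics, it may help to see the additive form in which a polygenic score is usually computed: an individual’s score is the sum, over the variants in the PRS Model, of the number of effect alleles the individual carries (0, 1 or 2) multiplied by the effect size (weight) the model assigns to that variant. The Python sketch below assumes that a PRS Model has already been reduced to such a variant-to-weight table; the function name, variant identifiers and weights are hypothetical, and variants without a genotype are simply skipped, whereas real PRS tools may handle missing genotypes differently.</p>
      <preformat preformat-type="code">
```python
# Minimal sketch of the additive polygenic score: PRS = sum_j beta_j * x_j.
# Assumptions (hypothetical): a PRS Model is a dict {variant_id: beta}, and an
# individual's genotype is a dict {variant_id: effect-allele dosage 0, 1 or 2}.


def polygenic_score(weights: dict[str, float], dosages: dict[str, int]) -> float:
    """Weighted sum of effect-allele dosages over the genotyped variants."""
    return sum(beta * dosages[variant]
               for variant, beta in weights.items()
               if variant in dosages)  # variants with no genotype are skipped


# Hypothetical three-variant PRS Model and one individual's genotype.
model_weights = {"var1": 0.12, "var2": -0.05, "var3": 0.30}
individual = {"var1": 2, "var2": 1, "var3": 0}

print(round(polygenic_score(model_weights, individual), 2))  # 0.19
```
      </preformat>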
      <p>
        Clinicians face the critical task of selecting the most accurate PRS Model for their analysis,
which is a crucial step given that an inadequate model can compromise the effectiveness of the
analysis results [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, the selection process is hindered by the complexity of the domain,
namely the lack of standardization in PRS Model reporting and the variability of PRS Models [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
The literature offers recommendations that can be leveraged to address these problems
and aid in the PRS Model selection process, but no methodology formally addresses
this issue in a systematic way. For this reason, we propose applying information
systems engineering techniques to systematize the PRS Model selection
process.
      </p>
      <p>
        In short, this PhD thesis aims to streamline the PRS Model selection process, helping
clinicians choose the most suitable PRS Model for their analysis. Following the Design Science
methodology [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], we will tackle these challenges by designing and validating two artifacts: (i)
a Conceptual Model that characterizes the PRS research domain, clarifying the domain
and easing the comparison between PRS Models, and (ii) a Method for the prioritization of PRS
Models, which orders PRS Models by their suitability for a given PRS analysis.
      </p>
      <p>The remainder of the paper is organized as follows: Section 2 presents the problem statement
and existing solutions, Section 3 defines the objectives and research questions, Section 4 describes
the methodology to be followed and the expected artifacts, Section 5 reports the current status
of the research, and Section 6 outlines the contributions of this work and draws the main
conclusions.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Problem Statement and Existing Solutions</title>
      <p>PRS analyses are used to estimate the genetic risk of developing a complex disease, and they
represent a useful tool for clinicians, epidemiologists, and other researchers. Our stakeholders
are clinicians and researchers who wish to perform a PRS analysis; their goal is to select the
most suitable PRS Model, since the effectiveness of the analysis results depends on the
accurate selection of the PRS Model.</p>
      <p>
        Two main problems prevent the stakeholders from achieving their goal. The first
is the heterogeneity in concept representation, which hampers domain comprehension and
PRS Model benchmarking. Standardized reporting criteria and terminology are lacking
not only in studies reporting PRS Models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] but also in studies explaining the
PRS research domain [
        <xref ref-type="bibr" rid="ref10 ref6 ref8">6, 8, 10</xref>
        ]. As a result, the first step of the selection process, comparing different PRS Models,
becomes a demanding task. The second issue concerns
the factors that must be taken into account when selecting an accurate PRS Model. Many
features need to be considered, for instance, the ancestry distribution of the model, the
statistical method used, or the performance metrics evaluated [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. These features do not provide
a straightforward criterion for including or discarding a PRS Model; instead, they influence
the suitability of a PRS Model for the desired analysis to varying degrees. Therefore, it is
challenging to order the PRS Models by relevance or suitability for an analysis,
i.e., to prioritize them, which complicates the selection process.
      </p>
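      <p>The heterogeneity problem can be made concrete with a small sketch: when two studies report the same performance metric under different labels, a naive comparison finds nothing in common, whereas mapping the labels onto shared terms first makes the models comparable. The synonym table, function names and metric values below are hypothetical and only illustrate the kind of harmonization that a shared vocabulary enables; the one domain fact the table relies on is that, for a binary outcome, the C-statistic equals the area under the ROC curve.</p>
      <preformat preformat-type="code">
```python
# Hypothetical synonym table mapping reported labels to one canonical term.
# For a binary outcome, the C-statistic equals the area under the ROC curve.
CANONICAL_METRIC = {
    "auc": "AUROC",
    "auroc": "AUROC",
    "area under the roc curve": "AUROC",
    "c-statistic": "AUROC",
    "or per sd": "OR per SD",
    "odds ratio per standard deviation": "OR per SD",
}


def harmonize(reported: dict[str, float]) -> dict[str, float]:
    """Rename reported metrics to canonical terms; unknown labels are kept as-is."""
    return {CANONICAL_METRIC.get(label.strip().lower(), label): value
            for label, value in reported.items()}


def comparable_metrics(a: dict[str, float], b: dict[str, float]) -> set[str]:
    """Metrics reported by both models once their labels are harmonized."""
    return set(harmonize(a)).intersection(harmonize(b))


# Two hypothetical model reports using different labels for the same metrics.
model_a = {"AUC": 0.63, "OR per SD": 1.4}
model_b = {"C-statistic": 0.65, "Odds ratio per standard deviation": 1.5}

print(sorted(set(model_a).intersection(model_b)))    # []
print(sorted(comparable_metrics(model_a, model_b)))  # ['AUROC', 'OR per SD']
```
      </preformat>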
      <p>
        The literature offers guidelines for improving the reporting of PRS Models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
as well as recommendations for choosing a suitable PRS Model for an analysis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However,
there is no systematic or methodological approach to the PRS Model selection
process, which is the gap we aim to address in this work. To this end, information
systems engineering techniques can provide the means to tackle the aforementioned issues and
streamline the PRS Model selection process.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Objectives and Research Questions</title>
      <p>The main goal of this thesis is to streamline the PRS Model selection process. The Objectives
(O) and Research Questions (RQs) are the following:</p>
      <p>O1. Review of the state of the art and existing barriers in translation of PRS research
into the clinical context.</p>
      <p>RQ1.1 Which barriers exist in the PRS research domain preventing their translation to a clinical
setting?
RQ1.2 Which solutions exist to mitigate the identified barriers?</p>
      <p>O2. Design artifacts to enhance understanding of the domain and assist in the PRS
analysis application.</p>
      <p>RQ2.1 How to improve domain comprehension?
RQ2.2 How to enable PRS Model prioritization?</p>
      <p>O3. Validate the designed artifacts and analyze contributions of this work.</p>
      <p>RQ3.1 To what extent does the proposed solution improve domain comprehension?
RQ3.2 To what extent does the proposed solution enable PRS Model prioritization?</p>
    </sec>
    <sec id="sec-5">
      <title>4. Methodology and Expected Artifacts</title>
      <p>
        The methodology to be followed is Design Science, proposed by Wieringa [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This methodology
consists of the design and investigation of artifacts in a context, in order to improve some aspect
of the context. In this case, the artifacts are the conceptual model and the method described
above, and the context is the PRS research field. The Design Cycle consists of three stages,
namely, Problem Investigation, Treatment Design, and Treatment Validation. The objectives
of the PhD thesis described in Section 3 are aligned with these three stages (see Figure 1).
      </p>
      <sec id="sec-5-1">
        <title>4.1. Problem Investigation</title>
        <p>In the first stage, the associated objective to be pursued is O1. Review of the state of the art
and existing barriers in translation of PRS research into the clinical context. We will
carry out an in-depth investigation of the context and the problem by answering two research
questions. The first, RQ1.1 Which barriers exist in the PRS research domain preventing their
translation to a clinical setting?, will delineate the problem to be addressed from the perspectives
relevant to the thesis research area. Here we will establish the stakeholders to be considered and
their goals, which will guide the development of the artifacts. For RQ1.2 Which solutions exist to
mitigate the identified barriers?, the existing solutions found in related works will be studied in
order to detect gaps or points of improvement. At this point, we will define the approach to be
followed to address the research problem.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Treatment Design</title>
        <p>In the second stage, we address O2. Design artifacts to enhance understanding of the
domain and assist in the PRS analysis application, developing the expected
artifacts that solve the identified problems.</p>
        <p>
          The first foreseen artifact will result from RQ2.1 How to improve domain comprehension?
We will rely on Conceptual Modeling (CM) techniques to provide sound knowledge
of the domain of application [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. CM has proven to be an effective solution for improving
understandability and standardization in the biological domain [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13, 14, 15</xref>
          ]. Therefore, the
anticipated outcome is a conceptual model representing the PRS research domain, which will
be specified as a Unified Modeling Language (UML) Class Diagram [16]. The nomenclature
will be chosen according to reporting guideline recommendations and the literature
review [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
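        <p>As a rough illustration of what representing the domain as classes, associations and constraints can look like, the sketch below mirrors a hypothetical fragment of a UML class diagram in Python dataclasses: a PRS Model composed of the ancestry groups of its development sample, with an invariant that their proportions sum to one. The class names, attributes, example values and the invariant are our assumptions; they are not the conceptual model under development in this thesis.</p>
        <preformat preformat-type="code">
```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AncestryGroup:
    """Hypothetical class: share of a model's development sample with one ancestry."""
    label: str
    proportion: float


@dataclass
class PRSModel:
    """Hypothetical class: a PRS Model with a few characterizing features."""
    identifier: str
    trait: str
    statistical_method: str
    # Composition PRSModel 1 -- 1..* AncestryGroup in UML terms.
    ancestry_distribution: list[AncestryGroup]

    def __post_init__(self) -> None:
        # Invariant a class diagram could attach as a constraint:
        # at least one ancestry group, and the proportions sum to 1.
        if not self.ancestry_distribution:
            raise ValueError("a PRS Model needs at least one ancestry group")
        total = sum(g.proportion for g in self.ancestry_distribution)
        if not math.isclose(total, 1.0):
            raise ValueError(f"ancestry proportions sum to {total}, expected 1")

    def share_of(self, label: str) -> float:
        """Proportion of the development sample with the given ancestry label."""
        return sum(g.proportion for g in self.ancestry_distribution if g.label == label)


# Hypothetical instance; identifier, trait and proportions are made up.
model = PRSModel(
    identifier="PRS-X",
    trait="example complex disease",
    statistical_method="clumping and thresholding",
    ancestry_distribution=[AncestryGroup("European", 0.8), AncestryGroup("East Asian", 0.2)],
)
print(model.share_of("European"))  # 0.8
```
        </preformat>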
        <p>Regarding the second research question, RQ2.2 How to enable PRS Model prioritization?,
we envisage the design of a method that will allow for an adequate ordering of PRS Models
in terms of relevance with respect to a given analysis. The method will be supported by the
conceptual model of the domain previously developed. The accurate characterization of the
constructs defining a PRS Model will guide the definition of the requirements and considerations
for developing the prioritization method. As a result, we will facilitate the prioritization of
models based on criteria established by the clinician or researcher conducting the PRS analysis.</p>
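        <p>To make the idea of ordering PRS Models by clinician-defined criteria more tangible, the sketch below uses a plain weighted-sum score, a common multi-criteria decision technique that we assume here purely for illustration; it is not the prioritization method the thesis will design. The criteria names, weights and per-criterion suitability scores (each in [0, 1]) are hypothetical.</p>
        <preformat preformat-type="code">
```python
# Minimal weighted-sum ranking; an illustrative assumption, not the thesis method.
# Each candidate carries hypothetical per-criterion suitability scores in [0, 1].


def prioritize(candidates: dict[str, dict[str, float]],
               weights: dict[str, float]) -> list[tuple[str, float]]:
    """Rank candidate PRS Models by the weighted mean of their criterion scores."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one criterion must have a positive weight")
    ranking = []
    for name, scores in candidates.items():
        # A criterion missing for a model contributes a score of 0.
        weighted = sum(w * scores.get(criterion, 0.0) for criterion, w in weights.items())
        ranking.append((name, round(weighted / total_weight, 3)))
    # Highest overall suitability first.
    return sorted(ranking, key=lambda item: item[1], reverse=True)


# Hypothetical criteria weights chosen by the clinician for a given analysis.
weights = {"ancestry_match": 0.5, "performance": 0.3, "method": 0.2}
candidates = {
    "PRS-A": {"ancestry_match": 0.9, "performance": 0.6, "method": 0.5},
    "PRS-B": {"ancestry_match": 0.4, "performance": 0.9, "method": 1.0},
    "PRS-C": {"ancestry_match": 1.0, "performance": 0.5, "method": 0.8},
}
for name, score in prioritize(candidates, weights):
    print(name, score)
```
        </preformat>
        <p>With these made-up values the ranking is PRS-C (0.81), PRS-A (0.73) and PRS-B (0.67). A weighted sum lets a strong score on one criterion compensate for a weak one and treats criteria as independent, so it cannot express dependencies between features; it only shows where clinician-defined criteria would enter the ordering.</p>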
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Treatment Validation</title>
        <p>The last stage of the design cycle is the validation of the generated artifacts. We will evaluate whether
the stakeholders’ goals are accomplished through the objective O3. Validate the designed
artifacts and analyze contributions of this work. The two artifacts to be validated are the
conceptual model of the PRS research domain and the method to enable the prioritization of
PRS Models.</p>
        <p>We plan to validate the artifacts following the validation approaches proposed by Wieringa. To
answer RQ3.1 To what extent does the proposed solution improve domain comprehension?, we
foresee an empirical validation of the conceptual model with users. For the second question,
RQ3.2 To what extent does the proposed solution enable PRS Model prioritization?, we foresee the
validation of the method through expert opinion and/or a case study in a real setting.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Current status</title>
      <p>The research is starting its second year. So far, we have addressed O1 in order
to answer RQ1.1 and RQ1.2. This has involved an exhaustive literature review of the
state of the art and the context of PRS research. As part of the context research, which involves the
genetics of complex diseases, we conducted a Systematic Mapping Review of databases of
genes associated with complex diseases, identifying a lack of consensus among the databases’ criteria for
the inclusion and classification of genes. Based on the state-of-the-art review of PRS research, we have
delimited the problem statement and the approach to be followed. This objective also allowed us to
structure the research plan, which served as the basis for this manuscript.</p>
      <p>We have started to tackle O2, in which the artifacts are designed. Regarding RQ2.1, we are
developing a preliminary conceptual model of the PRS research domain using CM techniques. During this
second year, we plan to keep enriching the model through additional iterations, generating new
versions.</p>
      <p>The next step is the design of the prioritization method. First, we will define the requirements
and considerations based on the literature review and discussions with domain experts,
i.e., which features are relevant for PRS Model selection, which dependencies exist among them, and how
features should be prioritized. Second, we will characterize the strategy for evaluating the features. We
will develop tool support to enable the instantiation of the method and the validation of the
artifacts. The validation of the artifacts is planned for the third year.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Contributions and final conclusions</title>
      <p>The artifacts proposed in this work represent the main contributions of the research: (i) the
conceptual model will improve comprehension of the PRS research domain by characterizing
the relevant constructs and features, thus facilitating the comparison between PRS Models,
and (ii) the prioritization method will enable the ordering of PRS
Models by suitability for a given analysis. More generally, this research will
contribute to the translation of PRS research into clinical practice by streamlining the PRS
Model selection process and thus aiding in the application of PRS analyses.</p>
      <p>Nowadays, medical research considers the genomic, environmental, and lifestyle factors of an
individual to provide tailored diagnosis and treatment. Since genetic information is unique to each
individual, PRS analyses can be highly valuable for personalized medical care for
complex diseases. Genomic research continuously produces data, tools, and new knowledge, and
clinical practice needs to keep pace with this evolution to benefit from it in real-world medical settings.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>I want to thank Prof. Oscar Pastor (opastor@dsic.upv.es) and Dr. Alberto García
(algarsi@vrain.upv.es) from the Universitat Politècnica de València for the supervision of this
Thesis. I also want to thank PhD students Mireia Costa and René Noel for their fruitful
discussions. This work was supported by the Generalitat Valenciana through the CoMoDiD project
(CIPROM/2021/023).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] <string-name><given-names>G.</given-names> <surname>Gameiro</surname></string-name>, et al., <article-title>Precision medicine: Changing the way we think about healthcare</article-title>, <source>Clinics</source> <volume>73</volume> (<year>2018</year>). doi:<pub-id pub-id-type="doi">10.6061/clinics/2017/e723</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] <string-name><given-names>V.</given-names> <surname>Pattan</surname></string-name>, et al., <article-title>Genomics in medicine: A new era in medicine</article-title>, <source>World Journal of Methodology</source> (<year>2021</year>). doi:<pub-id pub-id-type="doi">10.5662/wjm.v11.i5.231</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] <string-name><given-names>P.</given-names> <surname>Visscher</surname></string-name>, et al., <article-title>Discovery and implications of polygenicity of common diseases</article-title>, <source>Science</source> <volume>373</volume> (<year>2021</year>) <fpage>1468</fpage>–<lpage>1473</lpage>. doi:<pub-id pub-id-type="doi">10.1126/science.abi8206</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] <string-name><given-names>N.</given-names> <surname>Wray</surname></string-name>, et al., <article-title>From basic science to clinical application of polygenic risk scores: A primer</article-title>, <source>JAMA Psychiatry</source> <volume>78</volume> (<year>2020</year>). doi:<pub-id pub-id-type="doi">10.1001/jamapsychiatry.2020.3049</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] <string-name><given-names>Y.</given-names> <surname>Ma</surname></string-name>, et al., <article-title>Genetic prediction of complex traits with polygenic scores: a statistical review</article-title>, <source>Trends in Genetics</source> <volume>37</volume> (<year>2021</year>). doi:<pub-id pub-id-type="doi">10.1016/j.tig.2021.06.004</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] <string-name><given-names>J.</given-names> <surname>Collister</surname></string-name>, et al., <article-title>Calculating Polygenic Risk Scores (PRS) in UK Biobank: A practical guide for epidemiologists</article-title>, <source>Frontiers in Genetics</source> <volume>13</volume> (<year>2022</year>). doi:<pub-id pub-id-type="doi">10.3389/fgene.2022.818574</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] <string-name><given-names>H.</given-names> <surname>Wand</surname></string-name>, et al., <article-title>Improving reporting standards for polygenic scores in risk prediction studies</article-title>, <source>Nature</source> <volume>591</volume> (<year>2021</year>) <fpage>211</fpage>–<lpage>219</lpage>. doi:<pub-id pub-id-type="doi">10.1038/s41586-021-03243-6</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] <string-name><given-names>S. W. S.</given-names> <surname>Choi</surname></string-name>, et al., <article-title>Tutorial: a guide to performing polygenic risk score analyses</article-title>, <source>Nature Protocols</source> <volume>15</volume> (<year>2020</year>). doi:<pub-id pub-id-type="doi">10.1038/s41596-020-0353-1</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] <string-name><given-names>R. J.</given-names> <surname>Wieringa</surname></string-name>, <source>Design science methodology: For information systems and software engineering</source>, <publisher-name>Springer Berlin Heidelberg</publisher-name>, <year>2014</year>. doi:<pub-id pub-id-type="doi">10.1007/978-3-662-43839-8</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] <string-name><given-names>C.</given-names> <surname>Babb de Villiers</surname></string-name>, et al., <article-title>Understanding polygenic models, their development and the potential application of polygenic scores in healthcare</article-title>, <source>Journal of Medical Genetics</source> <volume>57</volume> (<year>2020</year>). doi:<pub-id pub-id-type="doi">10.1136/jmedgenet-2019-106763</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] <string-name><given-names>A.</given-names> <surname>Olivé</surname></string-name>, <source>Conceptual Modeling of Information Systems</source>, <year>2007</year>. doi:<pub-id pub-id-type="doi">10.1007/978-3-540-39390-0</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] <string-name><given-names>A.</given-names> <surname>Bernasconi</surname></string-name>, et al., <article-title>A comprehensive approach for the conceptual modeling of genomic data</article-title>, in: <source>Conceptual Modeling</source>, Lecture Notes in Computer Science, <publisher-name>Springer International Publishing</publisher-name>, <year>2022</year>, pp. <fpage>194</fpage>–<lpage>208</lpage>.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] <string-name><given-names>A.</given-names> <surname>García</surname></string-name>, et al., <article-title>The challenge of managing the evolution of genomics data over time: a conceptual model-based approach</article-title>, <source>BMC Bioinformatics</source> <volume>23</volume> (<year>2022</year>). doi:<pub-id pub-id-type="doi">10.1186/s12859-022-04944-z</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] <string-name><given-names>A.</given-names> <surname>García</surname></string-name>, et al., <article-title>A conceptual model-based approach to improve the representation and management of omics data in precision medicine</article-title>, <source>IEEE Access</source> PP (<year>2021</year>) 1–1. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2021.3128757</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] <string-name><given-names>M.</given-names> <surname>Costa</surname></string-name>, A. S., A. Palacio, A. Bernasconi, O. Pastor, <article-title>A Reference Meta-model to Understand DNA Variant Interpretation Guidelines</article-title>, <year>2023</year>, pp. <fpage>375</fpage>–<lpage>393</lpage>. doi:<pub-id pub-id-type="doi">10.1007/978-3-031-47262-6_20</pub-id>.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] <article-title>About the Unified Modeling Language Specification Version 2.5.1</article-title>, <ext-link ext-link-type="uri" xlink:href="https://www.omg.org/spec/UML/2.5.1/About-UML">https://www.omg.org/spec/UML/2.5.1/About-UML</ext-link>, <year>2017</year>. (Accessed on 03/06/2024).</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>