<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>of Disease: A Conceptual Modeling Perspective</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diana Martínez-Minguet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mireia Costa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alberto García</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Óscar Pastor</string-name>
          <email>opastor@dsic.upv.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Conceptual Modeling, Conceptual Schema of the Human Genome, Complex Disease, DNA Variant</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ER2025: Companion Proceedings of the 44th International Conference on Conceptual Modeling: Industrial Track, ER Forum</institution>
          ,
          <addr-line>8th</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>PROS Research Group, Valencian Research Institute for Artificial Intelligence (VRAIN), Universitat Politècnica de València</institution>
          ,
          <addr-line>Camí</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>SCME</institution>
          ,
          <addr-line>Doctoral Consortium, Tutorials</addr-line>
          ,
          <institution>Project Exhibitions</institution>
          ,
          <addr-line>Posters and Demos</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>de Vera</institution>
          <addr-line>s/n, Valencia, 46022</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Modern healthcare is shifting toward a more personalized approach, where treatments and diagnostics are tailored to the individual. Genetic testing is a key driver of this evolution, offering a powerful way to diagnose and assess health risks based on a person's unique genetic makeup. Traditionally, this analysis has focused on identifying a single, high-impact genetic variant as the primary cause of a patient's symptoms. However, advancements over the last few years have revealed that most diseases are far more complex, arising from the combined influence of numerous variants across the entire genome. This new perspective drives the daily generation of vast and complex data, creating an urgent need for genomic information systems to evolve in support of this new disease paradigm. The PROS Research Group specializes in creating genomic information systems built upon a strong conceptual modeling foundation. The cornerstone of these systems is the Conceptual Schema of the Human Genome (CSHG), which provides a standardized model for genomic data. However, this model remains grounded in the traditional, single-variant perspective. This paper presents an extension to the CSHG that integrates both the single-variant and the more complex multi-variant perspectives. Ultimately, this contribution provides the foundation for generating trustworthy and transparent information systems that can accurately reflect the full complexity of human disease.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The modern era of medicine is defined by the pursuit of precision: tailoring diagnostics and treatments
to the unique profile of each patient [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Because our DNA is our most characteristic individual feature,
understanding how this genetic blueprint contributes to disease lies at the heart of this effort. In the
clinical setting, this knowledge is put into practice through genetic testing, which analyzes a patient’s
DNA to identify relevant genetic variants.
      </p>
      <p>
        Traditionally, the goal of genetic testing has been to pinpoint a single, high-impact variant thought
to be responsible for a patient’s condition [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, this approach is insufficient to explain the
genetic architecture of most common diseases. Indeed, this single-variant perspective, often referred to
as a monogenic cause of disease, is primarily applicable to two scenarios: rare disorders (e.g., sickle cell
disease or Huntington’s disease), which collectively affect up to 7% of the population [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and the small
fraction of cases within common diseases that are driven by a single-gene cause. Inherited cancers are
a prime example, where well-known monogenic forms, such as those caused by high-impact variants in
the BRCA1 or BRCA2 genes, represent only 5–10% of all cases [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        In contrast, the genetic basis of most common diseases, clinically known as complex diseases, is far
more intricate. Rather than being caused by a single variant, these conditions arise when a particular
set of variants acts collectively to increase an individual’s risk, or predisposition, to developing the
disease [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This represents a fundamental shift in understanding, moving from a monogenic model of
direct causation to a polygenic model based on the cumulative effect of numerous variants.
      </p>
      <p>CEUR Workshop, ISSN 1613-0073</p>
      <p>
        Therefore, the future of clinical genetics depends on incorporating this polygenic perspective into
standard testing. This future is materializing through Polygenic Risk Scores (PRSs), which consolidate
an individual’s complex genetic information into a single, actionable metric [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A PRS is calculated by
a statistical model that aggregates the small effects of thousands, or even millions, of common genetic
variants from across the genome. The result is a single number that quantifies an individual’s inherited
predisposition to a specific disease.
      </p>
      <p>
        While this measure is not definitive, integrating such polygenic information into clinical practice has
the potential to enable earlier and more precise disease prevention strategies, tailor screening protocols
to an individual’s genetic profile, and guide lifestyle or therapeutic interventions more effectively [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
Furthermore, by moving beyond the single-variant paradigm, genetic testing will shift from a primarily
diagnostic tool for rare conditions and the small percentage of common conditions caused by a single
variant to a proactive instrument for managing health across the broader population.
      </p>
      <p>
        However, to achieve this transformation, the genomic information systems that support genetic
testing must evolve to manage and interpret the increased complexity of polygenic data. In addition,
the multidisciplinary teams involved (genetic counselors, bioinformaticians, and technical experts) will need
to adapt to new and rapidly evolving knowledge. In this context, conceptual modeling provides crucial
support, offering a way to represent domain-specific information that is intuitive, easy to understand,
and meaningful [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Along these lines, the PROS Research Group at the Polytechnic University of Valencia
has established a strong foundation with its model-based development of trustworthy and transparent
genomic information systems. At the core of these systems lies the Conceptual Schema of the Human
Genome (CSHG) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], a framework designed to represent everything from the genome itself to the way
genetic variants influence disease.
      </p>
      <p>As polygenic approaches gain importance in clinical settings, this work presents a natural evolution
of the CSHG to integrate this perspective of polygenic diseases while preserving its proven monogenic
capabilities. Our main contribution is a unified model that considers both monogenic and polygenic
disease paradigms. This evolution will enable the development of the next generation of
model-based information systems to capture the full spectrum of human genetic disease, allowing for more
comprehensive and clinically relevant applications.</p>
      <p>The remainder of this paper is structured as follows: Section 2 reviews the key genetic concepts
underlying our model extension. Section 3 then presents our proposed extension to the Conceptual
Schema of the Human Genome (CSHG). Finally, Section 4 summarizes our contributions and discusses
future research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>To provide context for our work, this section presents essential background on the modern understanding
of genetics in disease. First, we will cover the role of genetics in complex diseases. Then, we will explain
Polygenic Risk Scores (PRSs), the modern method used to measure the genetic risk for these conditions.</p>
      <sec id="sec-2-1">
        <title>2.1. Genetics in Complex Disease</title>
        <p>
          Complex diseases are conditions whose development is the result of the combined action of multiple
genes and environmental factors, rather than a single, high-impact genetic variant [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Most common
health conditions, including cardiovascular disease, type 2 diabetes, and many cancers, fall into this
category.
        </p>
        <p>
          At the genetic level, these diseases are shaped by the cumulative effect of thousands of common DNA
variants. To identify the variants that influence disease risk, researchers use Genome-Wide Association
Studies (GWAS) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. A GWAS effectively scans the entire genome of tens of thousands of individuals
and compares their DNA profiles to search for variants that occur more frequently in people with a
given condition than in those without it.
        </p>
        <p>
          While any single variant discovered through GWAS typically has only a small effect, these studies
have shown that the cumulative impact of these variants is significant. This insight, that many
small-effect variants together can explain a large part of an individual’s disease susceptibility, led
directly to the development of statistical models called Polygenic Risk Score (PRS) models [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. These
models aggregate the effects of numerous variants to quantify a person’s inherited predisposition to a
specific disease. The output is a single, comprehensive number which itself is also referred to as the
individual’s PRS.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Polygenic Risk Scores and their Clinical Potential</title>
        <p>The output of a Genome-Wide Association Study (GWAS) is a list of candidate variants statistically
associated with a given disease. However, these raw results are not a direct measure of an individual’s
overall risk. Instead, they serve as a starting point, a map highlighting regions of the genome that
warrant further investigation to understand their true contribution to disease.</p>
        <p>
          The goal of a PRS model is to transform the raw data from a GWAS into a single, clinically meaningful
measure of an individual’s genetic risk [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ]. To do so, the list of variants identified by GWAS is
curated. Statistical methods are used to filter out “noisy” or low-quality variants and to adjust for
variants that only appear to be associated with the disease due to their proximity to a truly causative
variant. As a result, a refined variant set is established, where each variant is assigned a “risk weight”
(its effect size) reflecting its specific contribution to the disease.
        </p>
        <p>This curated set of variants and their corresponding weights forms the core of the PRS model itself,
ready to be applied to an individual’s genetic data. An individual’s PRS is calculated by summing the
assigned risk weights of all the variants from this set that are present in the person’s DNA, resulting in
a single number that represents their overall inherited predisposition.</p>
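        <p>As a minimal sketch of this calculation, the following Python fragment computes a PRS as a weighted sum. It assumes an additive model in which each risk weight is multiplied by the number of copies of the effect allele that the person carries (0, 1, or 2). The function name, the variant identifiers, and the weights are hypothetical and serve only as an illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict


def polygenic_risk_score(weights: Dict[str, float], dosages: Dict[str, int]) -> float:
    """Weighted sum of effect-allele counts over the variants of a curated set.

    weights: variant identifier -> risk weight (effect size) from the PRS model.
    dosages: variant identifier -> copies of the effect allele carried (0, 1, or 2).
    Variants missing from `dosages` are treated as not carried (an assumption).
    """
    return sum(weight * dosages.get(variant_id, 0)
               for variant_id, weight in weights.items())


# Hypothetical model: three variants with made-up effect sizes.
example_weights = {"variant_A": 0.12, "variant_B": -0.05, "variant_C": 0.30}
# Hypothetical individual: two copies of variant_A, one copy of variant_C.
example_dosages = {"variant_A": 2, "variant_C": 1}

score = polygenic_risk_score(example_weights, example_dosages)
print(f"PRS = {score:.2f}")
```
]]></preformat>
        <p>In this sketch, a variant of the set that the individual does not carry simply contributes nothing to the sum.</p>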
        <p>
          The clinical interpretation of this individual’s PRS depends on two performance metrics of the variant
set that composes the PRS model: how much (risk association metrics) and how well (discriminatory
power metrics) the variant set predicts a trait or disease [
          <xref ref-type="bibr" rid="ref12 ref14">12, 14</xref>
          ]. A risk association metric describes
how the variant set relates to the likelihood of developing the disease. This is often expressed using
metrics like the Odds Ratio (OR), which compares the odds of disease in a high-scoring group (e.g., the top 10%) to
those in a reference group (e.g., the middle 40–60%). For instance, if a PRS model for heart disease has an OR of 2
for individuals in the top decile vs. the middle quintile, it means those in the top range have about twice
the odds of developing the condition compared to those with average scores.
        </p>
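        <p>To illustrate how such an OR is obtained, the sketch below derives it from a two-by-two table of counts and contrasts it with the ratio of risks. The counts are invented for illustration and do not come from any study, and the function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def odds_ratio(cases_high: int, controls_high: int,
               cases_ref: int, controls_ref: int) -> float:
    """Odds of disease in the high-scoring group divided by the odds in the reference group."""
    odds_high = cases_high / controls_high
    odds_ref = cases_ref / controls_ref
    return odds_high / odds_ref


def risk_ratio(cases_high: int, controls_high: int,
               cases_ref: int, controls_ref: int) -> float:
    """Proportion affected in the high-scoring group divided by the proportion in the reference group."""
    risk_high = cases_high / (cases_high + controls_high)
    risk_ref = cases_ref / (cases_ref + controls_ref)
    return risk_high / risk_ref


# Invented counts: top decile (high) versus middle quintile (reference).
counts = dict(cases_high=100, controls_high=400, cases_ref=100, controls_ref=800)

or_value = odds_ratio(**counts)  # (100 / 400) / (100 / 800)
rr_value = risk_ratio(**counts)  # (100 / 500) / (100 / 900)
print(f"OR = {or_value:.2f}, risk ratio = {rr_value:.2f}")
```
]]></preformat>
        <p>With these counts the OR is 2.0 while the ratio of risks is 1.8. The two measures move closer together as the disease becomes rarer in both groups.</p>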
        <p>The other relevant performance metric is the discriminatory power, which is a measure of the variant
set’s ability to differentiate between individuals at risk and those not at risk. An example is the Area
Under the Curve (AUC), which gives the probability that a randomly chosen case has a higher PRS than
a randomly chosen control. As an example, if a PRS model has an AUC of 0.9, it means that the model
classifies correctly (i.e., assigns a higher PRS to a case than a control) 90% of the time.</p>
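        <p>The pairwise reading of the AUC given above can be sketched directly: compare every case with every control and count how often the case has the higher score. The scores below are invented, and counting ties as half a win is a common convention that we assume here rather than something stated in the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from itertools import product
from typing import Sequence


def auc_from_scores(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Fraction of (case, control) pairs in which the case has the higher PRS.

    Ties count as half a win (assumed convention).
    """
    wins = 0.0
    for case_score, control_score in product(case_scores, control_scores):
        if case_score > control_score:
            wins += 1.0
        elif case_score == control_score:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Invented PRS values for three cases and four controls.
case_scores = [0.9, 0.7, 0.6]
control_scores = [0.8, 0.4, 0.3, 0.2]

auc = auc_from_scores(case_scores, control_scores)
print(f"AUC = {auc:.3f}")
```
]]></preformat>
        <p>Here 10 of the 12 case-control pairs are ordered correctly, so the AUC is about 0.83. A value of 0.5 would correspond to ordering the pairs no better than chance.</p>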
        <p>
          Using these metrics, the PRS computed for an individual can be interpreted [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. For instance, if
the resulting PRS of an individual falls within the top decile, the individual would have about twice the odds
of developing the condition, while if the individual falls in the middle range, there is no increased risk of
disease. This statement would further be supported by the model’s good discriminatory
ability (an AUC of 0.9).
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Evolution of the Conceptual Schema of the Human Genome</title>
      <p>At the PROS Research Group, genomic information systems are developed to facilitate the management
of genetic data for both clinical and research purposes. These systems are grounded in, and ontologically
supported by, the Conceptual Schema of the Human Genome (CSHG). This ontological foundation
enables the systems to be readily extended with new functionalities and adapted to evolving requirements
or changes in the genomic domain.</p>
      <sec id="sec-3-1">
        <title>3.1. Current state of the CSHG: single-variant approach</title>
        <p>In the current state of the CSHG (see Fig. 1), a genetic Variant is related to a Phenotype (a term that
encompasses any observable trait, such as eye color, or a specific disease) through a Significance.
This Significance represents the clinical interpretation a specific Variant has for a given Phenotype,
as provided by the scientific community. It is characterized by two relevant attributes: a
“ClinicalSignificance” and a “levelOfCertainty”.</p>
        <p>On the one hand, the “ClinicalSignificance” describes the variant’s impact on the manifestation of the
phenotype. Standard significances include Pathogenic, for variants known to cause the disease; Benign,
for those not believed to have a causal role; and Uncertain Significance (VUS), where there is insufficient
or conflicting evidence to determine the variant’s effect. On the other hand, the “levelOfCertainty”
represents the level of confidence in the assigned “ClinicalSignificance”. This certainty is directly
dependent on the quality and strength of the scientific evidence used for the assessment.</p>
        <p>
          It is important to note that a given variant may hold different Significances for different phenotypes.
Even for the same phenotype, this Significance can vary depending on the expert opinion or the
study from which the assessment originates. How these conflicting interpretations are managed by the
CSHG is explained in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>While this representation supports clear and detailed mapping between individual variants and
phenotypes, it cannot capture genetic predisposition to complex diseases, where risk arises from the
combined influence of many variants. Below, we describe how we have addressed this gap through an
extension of the CSHG.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Extension of the CSHG: multi-variant approach</title>
        <p>The limitations of the CSHG’s single-variant representation highlight the need for an extension capable
of capturing the polygenic nature of complex diseases. To address this, we have incorporated the
multi-variant perspective into the model. The additions to the CSHG that materialize this new perspective
are depicted in green in Figure 2.</p>
        <p>Our modeling strategy for this extension intends to mirror the structure of the single-variant
perspective. First, we shifted the focus from individual Variants to VariantSets. This approach allows
us to evaluate multiple Variants jointly, which is essential for representing the genetic basis of
complex diseases. In the VariantSet, each of the Variants is assigned an EffectSize that represents its
contribution to the disease risk. A single Variant may belong to multiple variant sets, with a distinct
effect size for each specific set.</p>
        <p>Each VariantSet originates from a GenomeWideAssociationStudy (GWAS). It is important to
clarify that a VariantSet represents the curated result after statistical processing (see Section 2.2), not
the raw GWAS output. However, this link to the original GWAS, though indirect, is explicitly maintained
in the model to ensure full data traceability. Furthermore, because different curation processes can
be applied to the same initial data, a single GenomeWideAssociationStudy can yield multiple, distinct
VariantSets. This one-to-many relationship is explicitly defined by the cardinalities between these
two classes in our model. The specific curation process and other relevant details for how a VariantSet
was obtained are captured in its “name”, “statisticalMethod” and “description” attributes.</p>
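        <p>One possible way to read this part of the model is sketched below with Python dataclasses. The sketch is not the CSHG implementation: the study identifier, the second curation method, the variant identifiers, and all effect sizes are hypothetical, and representing the EffectSize association as a dictionary is a simplification we assume for brevity. Only the set name T2D_PRSCS and the PRS-CS-auto method are taken from the example discussed later in the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class GenomeWideAssociationStudy:
    identifier: str  # hypothetical attribute used to tell studies apart


@dataclass
class VariantSet:
    name: str
    statistical_method: str
    description: str
    source_study: GenomeWideAssociationStudy  # traceability link to the original GWAS
    # Variant identifier -> effect size of that variant within THIS set.
    effect_sizes: Dict[str, float] = field(default_factory=dict)


def variant_sets_of(study: GenomeWideAssociationStudy,
                    variant_sets: List[VariantSet]) -> List[VariantSet]:
    """All curated sets derived from one study (the one-to-many relationship)."""
    return [vs for vs in variant_sets if vs.source_study == study]


study = GenomeWideAssociationStudy(identifier="example_gwas")  # hypothetical

set_one = VariantSet(
    name="T2D_PRSCS", statistical_method="PRS-CS-auto",
    description="illustrative entry", source_study=study,
    effect_sizes={"variant_A": 0.12, "variant_B": -0.05},  # made-up values
)
set_two = VariantSet(
    name="example_second_set", statistical_method="hypothetical_method",
    description="same study, different curation", source_study=study,
    effect_sizes={"variant_A": 0.08},  # same variant, distinct effect size
)

derived = variant_sets_of(study, [set_one, set_two])
print([vs.name for vs in derived])
```
]]></preformat>
        <p>Keeping the effect size on the membership rather than on the Variant itself is what allows the same Variant to carry a different weight in each set.</p>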
        <p>Second, with the focus now on VariantSets, we need to extend the analogy from the single-variant
model to represent their collective significance for a Phenotype. As detailed in Section 2.2, the clinical
interpretation of a PRS is guided by two performance metrics: risk association (measuring how much
risk the variant set confers) and discriminatory power (assessing how well it predicts the disease). These
metrics create a direct analogy to the single-variant model. A set’s RiskAssociation, which quantifies
the statistical link to the phenotype, is analogous to “ClinicalSignificance”. Its DiscriminatoryPower,
which assesses how well the set distinguishes cases from controls, is analogous to “levelOfCertainty”.
This relationship between the single-variant and multi-variant approaches is summarized in Table 1.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr>
                <th>Single-variant approach</th>
                <th>Description</th>
                <th>Multi-variant approach</th>
                <th>Description</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Clinical significance: qualitative measures. Example: pathogenic or benign.</td>
                <td>Strength of evidence that a variant is linked to a disease/phenotype. Example: pathogenic (i.e., disorder causing).</td>
                <td>Risk association metrics. Example: Odds Ratio (OR), Hazard Ratio (HR), β coefficient, etc.</td>
                <td>Quantify the statistical association between the aggregated variant set and the predicted phenotype. Example: OR = 2, i.e., individuals with a higher PRS have about twice the chance of developing the disease compared to those with an average PRS.</td>
              </tr>
              <tr>
                <td>Level of certainty: qualitative descriptors. Example: high, moderate, or low evidence.</td>
                <td>Confidence in the classification, based on strength, consistency, and quality of evidence. Example: moderate evidence.</td>
                <td>Discriminatory power metrics. Example: AUC (Area Under ROC Curve), C-index (Concordance index), etc.</td>
                <td>Measure of the variant set’s ability to differentiate between individuals at risk and not at risk. Example: AUC = 0.9, i.e., the probability that a randomly chosen case has a higher PRS than a randomly chosen control is 90%.</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Elaborating on these metrics, the RiskAssociation is defined by a “name” and a “description” that
characterize its specific implementation (e.g., as an Odds Ratio (OR), Hazard Ratio (HR), or β coefficient).
The actual value of the metric is represented by the “riskMagnitude”, and being a statistical measure, it
is accompanied by its corresponding “confidenceInterval”. The interpretation of the “riskMagnitude”
requires considering two factors: the “unitOfChange” and the “directionOfEffect”. The “unitOfChange”
represents the scale of comparison, specifying exactly which groups are being compared. For example,
a PRS model might compare individuals in the top decile (the top 10% of scores) against those in
the middle quintile (the middle 20% of scores). If the resulting Odds Ratio (OR) is 2.0, it specifically
means that people in that top 10% bracket have twice the odds of developing the disease compared
to those in the middle 20% reference group. Without defining these comparison groups, the number
itself is meaningless. The “directionOfEffect” represents whether the calculated “riskMagnitude” is
risk-conferring (the typical situation) or represents a protective effect. For instance, in the case of the
odds ratio type of RiskAssociation, an OR &gt; 1 indicates that the association is risk-conferring, while
an OR &lt; 1 indicates a protective effect.</p>
        <p>Similarly, the DiscriminatoryPower metric is defined by a “name” and “description” specifying the
metric used (e.g., AUC). The resulting value is stored in the “discriminatoryPerformance” attribute and,
like the “riskMagnitude”, is accompanied by its “confidenceInterval”.</p>
        <p>These two metrics are encapsulated within a PerformanceAssessment class. This class
completes the analogy with the single-variant perspective by linking a VariantSet to a Phenotype,
using the RiskAssociation and DiscriminatoryPower as its fundamental components. The
PerformanceAssessment also includes “covariates” (such as age or sex) that may influence the
results, along with any other important “details”.</p>
        <p>A performance assessment’s validity is strictly tied to a specific Population, making this context as
fundamental as the Phenotype for correct interpretation. For example, metrics derived from a study of
middle-aged European women cannot be reliably applied to an elderly Asian man, given their vastly
different genetic and environmental backgrounds. To model this, we extended the existing Population
class from the original CSHG. While it was previously defined by a “name” and “genomicRegion”, our
extension adds a “definition” attribute (to specify demographics like age or sex) and the number of
individuals (“nIndividuals”) in the study cohort.</p>
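        <p>The classes described in this subsection could be sketched in Python as follows. This is an illustrative reading of the extension, not the schema’s actual implementation: attribute names are transliterated to Python style, the example values reuse the running illustration from Section 2.2 (an OR of 2 for the top decile vs. the middle quintile and an AUC of 0.9), and the population, the set name, and the label returned for an OR of exactly 1 are assumptions. Confidence intervals and cohort size are left unset because no values are given in the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Population:
    name: str
    genomic_region: str
    definition: str                       # demographics such as age or sex
    n_individuals: Optional[int] = None   # size of the study cohort, if known


@dataclass
class RiskAssociation:
    name: str
    description: str
    risk_magnitude: float
    unit_of_change: str
    direction_of_effect: str
    confidence_interval: Optional[Tuple[float, float]] = None


@dataclass
class DiscriminatoryPower:
    name: str
    description: str
    discriminatory_performance: float
    confidence_interval: Optional[Tuple[float, float]] = None


@dataclass
class PerformanceAssessment:
    variant_set_name: str
    phenotype: str
    population: Population
    risk_association: RiskAssociation
    discriminatory_power: DiscriminatoryPower
    covariates: List[str] = field(default_factory=list)
    details: str = ""


def direction_from_odds_ratio(odds_ratio: float) -> str:
    """OR > 1 is risk-conferring, OR < 1 is protective; 'neutral' for OR == 1 is an assumed label."""
    if odds_ratio > 1:
        return "risk-conferring"
    if odds_ratio < 1:
        return "protective"
    return "neutral"


assessment = PerformanceAssessment(
    variant_set_name="example_set",  # hypothetical
    phenotype="heart disease",
    population=Population(name="example cohort", genomic_region="unspecified",
                          definition="hypothetical adult cohort"),
    risk_association=RiskAssociation(
        name="Odds Ratio", description="OR between score groups",
        risk_magnitude=2.0, unit_of_change="top decile vs. middle quintile",
        direction_of_effect=direction_from_odds_ratio(2.0)),
    discriminatory_power=DiscriminatoryPower(
        name="AUC", description="area under the ROC curve",
        discriminatory_performance=0.9),
    covariates=["age", "sex"],
)
print(assessment.risk_association.direction_of_effect)
```
]]></preformat>
        <p>Attaching the Population to the assessment, rather than to the variant set, reflects the point made above: the same set can have different assessments in different populations.</p>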
        <p>
          With this representation, we can precisely define complex statements such as the one introduced in
Section 2.2. For instance, consider the following example from a type 2 diabetes variant set developed in
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] (PGS Catalog ID PGS001781), and depicted in Figure 3: “A type 2 diabetes mellitus variant set has
an OR of 1.75 per standard deviation (SD) for European individuals aged 57 years. This indicates that
individuals whose PRS lies 1 SD above the mean of a distribution with these population characteristics
have 1.75 times the odds of developing the disease compared to people with an average score. In
addition, the variant set can correctly classify individuals with and without type 2 diabetes mellitus 73%
of the time”. As depicted in the figure, the variant set named T2D_PRSCS was obtained through
the PRS-CS-auto statistical method. For illustration, the figure shows two variants with their respective
effect sizes, although the set aggregates 1,091,673 variants in total.
        </p>
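        <p>As a hedged illustration of what a “per SD” unit of change implies, the sketch below scales the reported OR of 1.75 to other positions in the score distribution. It assumes that the log-odds of disease are linear in the standardized PRS, as in a logistic regression model; this assumption and the function name are ours and are not stated in the source.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def relative_odds(or_per_sd: float, z_score: float) -> float:
    """Odds of disease relative to an average score (z = 0).

    Assumes log-odds linear in the standardized PRS, so the per-SD odds
    ratio compounds multiplicatively with the distance from the mean.
    """
    return or_per_sd ** z_score


OR_PER_SD = 1.75  # value reported in the type 2 diabetes example

for z in (-1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f} SD -> relative odds = {relative_odds(OR_PER_SD, z):.2f}")
```
]]></preformat>
        <p>Under this assumption, a score 2 SD above the mean corresponds to roughly three times the odds of an average score, which shows why the “unitOfChange” must be recorded alongside the “riskMagnitude”.</p>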
        <p>This shift from a single-variant to a multi-variant perspective is essential for integrating polygenic
measures, such as Polygenic Risk Scores, into genomic information systems and for enabling their
clinical use.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future work</title>
      <p>Genetic testing offers major benefits for precision medicine, allowing for targeted
interventions and precise health risk assessments. While the traditional monogenic paradigm of disease is
routinely adopted, the majority of common diseases do not conform to this pattern. This work addresses
a pressing gap in genomic information systems, which need to accommodate the polygenic nature of
common complex diseases.</p>
      <p>We have introduced an extension to the Conceptual Schema of the Human Genome that unifies
monogenic (single-variant approach) and polygenic (multi-variant approach) perspectives on genetic
disease. The multi-variant approach is proposed as an analogy to the current single-variant approach,
thus facilitating its understanding and uptake through a known parallelism. With this innovation, we
enable the representation of the complexity of modern disease genetics within a coherent, model-driven
framework.</p>
      <p>Additionally, thanks to the standardized representation of information provided by the schema, we
can support and promote research through a variety of tasks. For instance, it is well established that
the same phenotype can be associated with multiple variant sets, depending on the initial GWAS data
and the statistical post-processing applied. The standardized representation enables straightforward
comparison of performance assessments across different variant sets, helping to determine which set is
most suitable for a given genomic analysis.</p>
      <p>At the variant level, the schema allows the study of intersections between different variant sets,
facilitating the identification of overlapping variants and the comparison of their effect sizes. Comparing
variant sets for the same phenotype can provide insights on their concordance, while comparing sets
across different phenotypes can uncover correlations between comorbid diseases, as seen in certain
mental disorders. Additionally, the unified representation supports analyses such as determining
whether a single high-impact variant linked to one phenotype may also contribute to another phenotype
with a smaller effect size.</p>
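      <p>A variant-level comparison of this kind can be sketched with plain dictionaries that map variant identifiers to effect sizes. The identifiers and values below are hypothetical, and treating two effect sizes as concordant when they share the same sign is a simplifying assumption made for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Tuple


def shared_variants(set_a: Dict[str, float],
                    set_b: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Variants present in both sets, with their effect sizes in each set."""
    common = sorted(set_a.keys() & set_b.keys())
    return {variant_id: (set_a[variant_id], set_b[variant_id]) for variant_id in common}


def concordant_variants(overlap: Dict[str, Tuple[float, float]]) -> List[str]:
    """Shared variants whose effect sizes point in the same direction (same sign)."""
    return [variant_id for variant_id, (effect_a, effect_b) in overlap.items()
            if (effect_a > 0) == (effect_b > 0)]


# Hypothetical variant sets for two phenotypes (made-up effect sizes).
set_x = {"variant_A": 0.12, "variant_B": -0.05, "variant_C": 0.30}
set_y = {"variant_B": -0.02, "variant_C": -0.10, "variant_D": 0.07}

overlap = shared_variants(set_x, set_y)
concordant = concordant_variants(overlap)
print(overlap)
print(concordant)
```
]]></preformat>
      <p>In this invented example two variants are shared, but only one of them has effect sizes with the same sign in both sets.</p>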
      <p>Regarding the schema itself, next steps include refining its semantics through
validation with experts’ feedback, testing it against concrete and validated PRS models that aim to be
included as part of genomic analyses for polygenic risk prediction, and integrating it into operational
genomic platforms as the main organizational data structure to manage PRS-related data and perform
reliable interpretations for polygenic risk predictions. This unified representation opens the door to
comprehensive analyses, such as evaluating the implications of an individual carrying both a high-impact
variant and a high polygenic risk for the same phenotype, or conversely, exhibiting complementary
genetic risks arising from a benign high-impact variant but a high risk from common variation.</p>
      <p>The advancement proposed in this study is fundamental because current genetic testing workflows do
not yet incorporate PRS information. By providing a clear and transparent way to represent PRS-related
data, our proposed extension lays the essential groundwork for future genetic information systems
incorporating PRS information. Moreover, because the model was derived through a deliberate parallel
with the well-established single-variant approach, it lowers barriers to understanding and adoption,
ensuring a smoother transition to the polygenic perspective in both research and clinical practice. By
combining rigorous conceptual modeling with emerging genomic needs, this work paves the way for
information systems that more faithfully reflect the true complexity of human disease.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the Generalitat Valenciana through the CoMoDiD project
(CIPROM/2021/023) and the predoctoral grant (ACIF/2021/117), and by the Universitat Politècnica
de València through grant PAID-06-24.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Gemini for grammar and spelling checks
and for paraphrasing and rewording. After using this tool, the authors reviewed and edited the content as needed
and take full responsibility for the publication’s content.</p>
      <p>improves polygenic risk scores for human coronary heart disease and type 2 diabetes,
Communications Biology 5 (2022) 158. doi:10.1038/s42003-021-02996-0.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Denny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Collins</surname>
          </string-name>
          ,
          <article-title>Precision medicine in 2030-seven ways to transform healthcare</article-title>
          ,
          <source>Cell</source>
          <volume>184</volume>
          (
          <year>2021</year>
          )
          <fpage>1415</fpage>
          -
          <lpage>1419</lpage>
          . doi:10.1016/j.cell.2021.01.015.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Franceschini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Frick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kopp</surname>
          </string-name>
          ,
          <article-title>Genetic testing in clinical settings</article-title>
          ,
          <source>American Journal of Kidney Diseases</source>
          <volume>72</volume>
          (
          <year>2018</year>
          ). doi:10.1053/j.ajkd.2018.02.351.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wakap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lambert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Olry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rodwell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gueydan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Valérie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rath</surname>
          </string-name>
          ,
          <article-title>Estimating cumulative point prevalence of rare diseases: analysis of the orphanet database</article-title>
          ,
          <source>European Journal of Human Genetics</source>
          <volume>28</volume>
          (
          <year>2019</year>
          ).
          doi:10.1038/s41431-019-0508-0.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hodgson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Foulkes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Maher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Turnbull</surname>
          </string-name>
          ,
          <article-title>Inherited susceptibility to cancer: Past, present and future</article-title>
          ,
          <source>Annals of Human Genetics</source>
          <volume>89</volume>
          (
          <year>2025</year>
          ). doi:10.1111/ahg.70013.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Visscher</surname>
          </string-name>
          , et al.,
          <article-title>Discovery and implications of polygenicity of common diseases</article-title>
          ,
          <source>Science</source>
          <volume>373</volume>
          (
          <year>2021</year>
          )
          <fpage>1468</fpage>
          -
          <lpage>1473</lpage>
          . doi:10.1126/science.abi8206.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ma</surname>
          </string-name>
          , et al.,
          <article-title>Genetic prediction of complex traits with polygenic scores: a statistical review</article-title>
          ,
          <source>Trends in Genetics</source>
          <volume>37</volume>
          (
          <year>2021</year>
          ). doi:10.1016/j.tig.2021.06.004.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Wray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Austin</surname>
          </string-name>
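          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McGrath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Hickie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Murray</surname>
          </string-name>
          ,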
          <string-name>
            <given-names>P.</given-names>
            <surname>Visscher</surname>
          </string-name>
          ,
          <article-title>From basic science to clinical application of polygenic risk scores: A primer</article-title>
          ,
          <source>JAMA Psychiatry</source>
          <volume>78</volume>
          (
          <year>2020</year>
          ). doi:10.1001/jamapsychiatry.2020.3049.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Vassos</surname>
          </string-name>
          ,
          <article-title>Polygenic risk scores: From research tools to clinical instruments</article-title>
          ,
          <source>Genome Medicine</source>
          <volume>12</volume>
          (
          <year>2020</year>
          ).
          doi:10.1186/s13073-020-00742-5.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Guarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mylopoulos</surname>
          </string-name>
          ,
          <article-title>On the philosophical foundations of conceptual models</article-title>
          ,
          in:
          <source>Information Modelling and Knowledge Bases XXXI, Frontiers in Artificial Intelligence and Applications</source>
          , IOS Press,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . doi:10.3233/FAIA200002.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>García S.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Palacio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Reyes</given-names>
            <surname>Román</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Casamayor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pastor</surname>
          </string-name>
          ,
          <article-title>A conceptual model-based approach to improve the representation and management of omics data in precision medicine</article-title>
          ,
          <source>IEEE Access</source>
          <volume>PP</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          . doi:10.1109/ACCESS.2021.3128757.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Uffelmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Munung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Okada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lappalainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Posthuma</surname>
          </string-name>
          ,
          <article-title>Genome-wide association studies</article-title>
          ,
          <source>Nature Reviews Methods Primers</source>
          <volume>1</volume>
          (
          <year>2021</year>
          ).
          doi:10.1038/s43586-021-00056-9.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Babb de Villiers</surname>
          </string-name>
          , et al.,
          <article-title>Understanding polygenic models, their development and the potential application of polygenic scores in healthcare</article-title>
          ,
          <source>Journal of Medical Genetics</source>
          <volume>57</volume>
          (
          <year>2020</year>
          ). doi:10.1136/jmedgenet-2019-106763.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kelemen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Parkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Inouye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lambert</surname>
          </string-name>
          ,
          <article-title>Recent advances in polygenic scores: translation, equitability, methods and FAIR tools</article-title>
          ,
          <source>Genome Medicine</source>
          <volume>16</volume>
          (
          <year>2024</year>
          ).
          doi:10.1186/s13073-024-01304-9.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vassy</surname>
          </string-name>
          ,
          <article-title>Polygenic risk scores in the clinic: Translating risk into action</article-title>
          ,
          <source>Human Genetics and Genomics Advances</source>
          <volume>2</volume>
          (
          <year>2021</year>
          )
          <elocation-id>100047</elocation-id>
          . doi:10.1016/j.xhgg.2021.100047.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Corpas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Megy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Metastasio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <article-title>Implementation of individualised polygenic risk score analysis: a test case of a family of four</article-title>
          ,
          <source>BMC Medical Genomics</source>
          <volume>15</volume>
          (
          <year>2022</year>
          ).
          doi:10.1186/s12920-022-01331-8.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Palacio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García S.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Costa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ribelles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pastor</surname>
          </string-name>
          ,
          <article-title>Evolution of an Adaptive Information System for Precision Medicine</article-title>
          ,
          <year>2021</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          . doi:10.1007/978-3-030-79108-7_1.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Tamlander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mars</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pirinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Palotie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Daly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Riley-Gillis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jacob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Paul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Runz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Plenge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Maranville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Okafo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lawless</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Salminen-Mankonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>McCarthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hunkapiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ehm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Auro</surname>
          </string-name>
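          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Southerington</surname>
          </string-name>
          ,
          <article-title>Integration of questionnaire-based risk factors improves polygenic risk scores for human coronary heart disease and type 2 diabetes</article-title>
          ,
          <source>Communications Biology</source>
          <volume>5</volume>
          (
          <year>2022</year>
          )
          <elocation-id>158</elocation-id>
          . doi:10.1038/s42003-021-02996-0.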
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>