<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predictive Modeling of Echocardiographic Parameters Using Electrocardiogram Features via Machine Learning Methods as a Tool for Assessing the Functional Status of Military Personnel</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anton Popov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasyl Stasiuk</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Illya Chaikovsky</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Glushkov Institute of Cybernetics</institution>
          ,
          <addr-line>40 Akademika Hlushkova Ave., Kyiv, 03187</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Igor Sikorsky Kyiv Polytechnic Institute</institution>
          ,
          <addr-line>37 Beresteiskiy Ave., Kyiv, 03056</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Defense University of Ukraine</institution>
          ,
          <addr-line>Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Ukrainian Catholic University</institution>
          ,
          <addr-line>17 Sventsitsky Str. Lviv, 79011</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>This study explores the feasibility of predicting echocardiographic (EchoCG) parameters from electrocardiogram (ECG) data using machine learning techniques. Two modeling approaches are investigated: regression for continuous parameter prediction and multi-class classification for clinically significant parameter ranges. A dataset of 37 patients with matched ECG and EchoCG data is used. Strong correlations between selected parameter pairs are identified. Results demonstrate that ensemble models such as Random Forest outperform linear models in most prediction tasks. Limitations due to data imbalance and potential improvements using balancing techniques are also discussed.</p>
      </abstract>
      <kwd-group>
        <kwd>ECG</kwd>
        <kwd>EchoCG</kwd>
        <kwd>Electrocardiogram</kwd>
        <kwd>Echocardiography</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Biosignal Analysis</kwd>
        <kwd>Random Forest</kwd>
        <kwd>Classification</kwd>
        <kwd>Regression</kwd>
        <kwd>Predictive Modeling</kwd>
        <kwd>Intelligent Healthcare</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Electrocardiography (ECG) and transthoracic echocardiography (EchoCG) are two fundamental
diagnostic tools in cardiology. While ECG provides information on the electrical activity of the heart,
EchoCG offers insights into its mechanical and structural function. These modalities are often used
together in clinical settings to diagnose and monitor cardiovascular diseases.</p>
      <p>
        In recent years, machine learning (ML) has demonstrated substantial potential in processing and
interpreting ECG data, enabling automatic detection of arrhythmias, structural abnormalities, and even
prediction of patient outcomes such as mortality [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Beyond diagnostic classification, some studies
have applied deep learning to ECG waveforms to predict structural parameters traditionally assessed
by EchoCG, such as left ventricular ejection fraction (LVEF) [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        Several recent efforts suggest that non-invasive ECG data may contain enough information to infer
certain echocardiographic abnormalities, especially when leveraged through advanced ML techniques
[
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ]. However, most existing studies focus on a limited number of EchoCG parameters or dichotomous
classification tasks (e.g., reduced vs. normal LVEF). Few works attempt comprehensive modeling of a
wide spectrum of EchoCG parameters from multivariate ECG data.
      </p>
      <p>
        At the same time, researchers acknowledge significant barriers to ML application in clinical
cardiology [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], particularly the limited availability of high-quality, paired ECG–EchoCG datasets and
the imbalance in class distribution, which hinders model generalizability [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]. Moreover, there is
little experience with applying such advanced analysis methods outside the cardiology clinic.
Yet assessment of the contractile function of the heart is an important component of a person's
functional state outside the hospital, including that of a service member preparing to
perform combat missions, because significant impairment of this function limits combat readiness.
      </p>
      <p>In this paper, we explore the feasibility of predicting a wide set of echocardiographic parameters
from ECG-derived features using machine learning techniques. We use a dataset of 37 military
personnel free of heart disease, with matched ECG and EchoCG measurements, comprising 172
ECG parameters and 134 EchoCG parameters. Our approach includes:
• Performing Pearson correlation analysis to identify strongly associated ECG–EchoCG parameter
pairs.
• Training regression models (linear regression and Random Forest) to predict quantitative EchoCG
values from ECG data.
• Formulating a multi-class classification problem based on clinically meaningful ranges of selected
EchoCG parameters (e.g., LV dimensions, LVEF).
• Evaluating classification performance and analyzing limitations due to data imbalance, with
suggestions for addressing them through oversampling (e.g., SMOTE), cost-sensitive learning,
and ensemble models.</p>
      <p>This research aims to evaluate the potential of ECG-based prediction models as a non-invasive tool for
estimating echocardiographic measurements. The results provide insight into the correlation between
electrical and mechanical cardiac markers and lay the groundwork for intelligent clinical decision
support systems.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Materials and Methods</title>
      <sec id="sec-2-1">
        <title>2.1. Dataset Description</title>
        <p>The dataset used in this study comprises paired records of electrocardiographic (ECG) and transthoracic
echocardiographic (EchoCG) parameters collected from a cohort of 43 patients. After preprocessing,
only 37 patients had both valid ECG and EchoCG records and were included in the analysis.</p>
        <p>In total, 126 unique ECG records and 64 unique EchoCG records were available. In the final
dataset, 112 ECG records and 64 EchoCG records corresponded to the 37 patients with complete data.</p>
        <p>The ECG dataset initially contained 189 features, while the EchoCG dataset had 124. After filtering,
172 ECG parameters and 134 EchoCG parameters were retained for analysis.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Preprocessing and Feature Selection</title>
        <p>Several preprocessing steps were performed:
• Parameter exclusion: Non-informative fields such as patient identifiers, timestamps, and
demographic attributes (e.g., gender, birthdate) were excluded. Additionally, features with only
one unique value were removed.
• Missing data handling: Features with missing values in more than 10% of patients were excluded.
Remaining missing values were imputed using feature-wise means.
• Encoding and normalization: Categorical variables were binary-encoded. All numeric features
were normalized to zero mean and unit variance.
• Aggregation: When multiple records per patient were available, measurements were averaged
to form a unified feature vector per patient.</p>
        <p>After these steps, the dataset included 163 ECG and 109 EchoCG parameters for each of the 37
patients.</p>
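        <p>The preprocessing steps listed in this section can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes pandas, uses hypothetical column names (patient_id, qrs_ms), skips categorical encoding, and averages repeated records first so that the 10% missing-value rule is applied per patient. That ordering is an assumption, since the text does not state it.</p>
        <preformat preformat-type="code">
```python
import numpy as np
import pandas as pd


def preprocess(records, id_col="patient_id", max_missing=0.10):
    """Return one standardized feature vector per patient."""
    # Average repeated records so each patient contributes one row.
    per_patient = records.groupby(id_col).mean(numeric_only=True)
    # Keep only features whose share of missing patients is at most max_missing.
    missing_share = per_patient.isna().mean()
    per_patient = per_patient.loc[:, missing_share.le(max_missing)]
    # Impute the remaining gaps with the feature-wise mean.
    per_patient = per_patient.fillna(per_patient.mean())
    # Drop features with a single unique value; they carry no information.
    per_patient = per_patient.loc[:, per_patient.nunique().gt(1)]
    # Standardize each feature to zero mean and unit variance.
    return (per_patient - per_patient.mean()) / per_patient.std(ddof=0)


# Toy records: patient 1 has two ECG recordings (hypothetical column names).
demo = pd.DataFrame({
    "patient_id": [1, 1, 2, 3],
    "qrs_ms": [90.0, 94.0, 100.0, 110.0],
    "constant": [1.0, 1.0, 1.0, 1.0],
    "mostly_missing": [np.nan, np.nan, np.nan, 2.0],
})
features = preprocess(demo)
```
        </preformat>
        <p>In this toy example only qrs_ms survives: the constant column has a single unique value and the mostly_missing column is absent for two of the three patients.</p>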
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Correlation Analysis</title>
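        <p>To evaluate the relationship between ECG and EchoCG features, Pearson correlation coefficients were
computed for all pairwise combinations. The Pearson correlation coefficient <italic>r</italic> between variables <italic>x</italic> and
<italic>y</italic> is defined as:
          <disp-formula><label>(1)</label><tex-math>r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}</tex-math></disp-formula>
where <italic>x̄</italic> and <italic>ȳ</italic> are the sample means of <italic>x</italic> and <italic>y</italic>, respectively, and <italic>n</italic> is the number of paired observations.</p>
        <p>Several pairs demonstrated strong correlations (<italic>r</italic> &gt; 0.90), such as:
• ECG parameter: Q/R amplitude ratio (lead aVF) and EchoCG parameter: Left ventricular systolic
sphericity index (<italic>r</italic> = 0.99),
• ECG parameter: T-wave symmetry (lead I) and EchoCG parameter: Mitral valve score (<italic>r</italic> = 0.99).</p>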
        <p>These findings provided insights into potential functional and structural relationships between
electrical and mechanical cardiac properties.</p>
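        <p>A pairwise screen of this kind can be sketched with NumPy as shown below. The data are synthetic stand-ins, and the function name and array shapes are assumptions made for illustration only; the sketch computes Pearson's <italic>r</italic> for every ECG column against every EchoCG column and then lists the pairs whose absolute value exceeds 0.90.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def cross_correlation(ecg, echo):
    """Pearson r between every ECG column and every EchoCG column."""
    # Center each feature on its sample mean.
    ecg_c = ecg - ecg.mean(axis=0)
    echo_c = echo - echo.mean(axis=0)
    # Numerator: sum of products of deviations for every column pair.
    numerator = ecg_c.T @ echo_c
    # Denominator: product of the root sums of squared deviations.
    denominator = np.outer(np.sqrt((ecg_c ** 2).sum(axis=0)),
                           np.sqrt((echo_c ** 2).sum(axis=0)))
    return numerator / denominator


# Synthetic stand-in data: 37 "patients", 3 ECG and 2 EchoCG features.
rng = np.random.default_rng(0)
ecg = rng.normal(size=(37, 3))
echo = np.column_stack([2.0 * ecg[:, 0] + 0.01 * rng.normal(size=37),
                        rng.normal(size=37)])

r = cross_correlation(ecg, echo)
# Row index = ECG feature, column index = EchoCG feature.
strong_pairs = np.argwhere(np.abs(r) > 0.90)
```
        </preformat>
        <p>Here the first EchoCG column is constructed as a near-linear function of the first ECG column, so that pair is flagged as strongly correlated.</p>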
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Regression Modeling</title>
        <p>To predict continuous EchoCG parameters from ECG data, two regression models were trained:
• Linear Regression: Assumes linear dependence between ECG features and each EchoCG
parameter.
• Random Forest Regressor: An ensemble of 200 decision trees trained using bootstrapped
samples and random feature selection at each split.</p>
        <p>Prior to training, features were normalized. The data were split into training (80%) and testing (20%)
subsets using a random shuffle split.</p>
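        <p>The setup described above could be reproduced along the following lines with scikit-learn. This is a sketch under stated assumptions, not the authors' code: the feature matrix and target are synthetic placeholders, and the random seeds are arbitrary. Only the 200 trees and the 80/20 shuffled split come from the text.</p>
        <preformat preformat-type="code">
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder for standardized ECG features (37 patients, 5 features).
X = rng.normal(size=(37, 5))
# Placeholder for one EchoCG parameter with a non-linear dependence.
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=37)

# Random shuffled split: 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, shuffle=True, random_state=0)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
# Fit each model on the training part and predict the held-out part.
predictions = {name: model.fit(X_train, y_train).predict(X_test)
               for name, model in models.items()}
```
        </preformat>
        <p>With 37 patients, a 20% hold-out contains only 8 samples, so test-set metrics from a single split of this size can vary considerably with the choice of seed.</p>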
        <p>Performance was evaluated using the following metrics, where <italic>y<sub>i</sub></italic> is the observed value, <italic>ŷ<sub>i</sub></italic> is the predicted value, and <italic>n</italic> is the number of test samples.</p>
        <sec id="sec-2-4-1">
          <title>2.4.1. Mean Absolute Error (MAE)</title>
          <p>Measures the average magnitude of errors between predicted and actual values:
            <disp-formula><label>(2)</label><tex-math>\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|</tex-math></disp-formula>
          </p>
        </sec>
        <sec id="sec-2-4-2">
          <title>2.4.2. Root Mean Square Error (RMSE)</title>
          <p>Emphasizes larger errors more than MAE:
            <disp-formula><label>(3)</label><tex-math>\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}</tex-math></disp-formula>
          </p>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <title>2.4.3. Coefficient of Determination (<italic>R</italic><sup>2</sup>)</title>
        <p>Represents the proportion of variance in the target variable explained by the model:
          <disp-formula><label>(4)</label><tex-math>R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}</tex-math></disp-formula>
where <italic>ȳ</italic> is the mean of the observed values.</p>
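        <p>The three regression metrics of Sections 2.4.1 to 2.4.3 can be computed directly from their definitions. The following sketch uses only the Python standard library; the function name and the four-point example are illustrative assumptions, not data from the study.</p>
        <preformat preformat-type="code">
```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    # MAE: mean of the absolute errors.
    mae = sum(abs(e) for e in errors) / n
    # RMSE: square root of the mean squared error.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # R^2: one minus residual sum of squares over total sum of squares.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return mae, rmse, 1.0 - ss_res / ss_tot


mae, rmse, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0],
                                   [1.5, 2.0, 2.5, 4.0])
```
        </preformat>
        <p>For this example the errors are -0.5, 0, 0.5 and 0, which gives MAE = 0.25, RMSE = sqrt(0.125) and <italic>R</italic><sup>2</sup> = 1 - 0.5/5 = 0.9.</p>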
      </sec>
      <sec id="sec-2-6">
        <title>2.5. Classification of Clinically Relevant Ranges</title>
        <p>For several clinically important EchoCG parameters (e.g., LV end-diastolic diameter, LV end-systolic
volume, LA diameter, LVEF), value ranges were discretized into 4–5 categorical classes reflecting clinical
thresholds. This formulation transformed the prediction task into multi-class classification.</p>
        <p>Due to class imbalance (e.g., most patients concentrated in one class), the analysis focused on the
parameter with the most balanced class distribution: Left Atrial Anteroposterior Dimension. Two-class
classification was performed.</p>
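        <p>Discretizing a continuous parameter into ordered classes can be sketched with NumPy as follows. The cut points and the LVEF values below are placeholders chosen purely for illustration; the thresholds actually used in the study are not stated in the text.</p>
        <preformat preformat-type="code">
```python
import numpy as np

# Placeholder cut points for LVEF in percent; NOT the study's thresholds.
lvef_cut_points = [30.0, 41.0, 52.0]
# Placeholder LVEF values for seven hypothetical patients.
lvef_values = np.array([25.0, 35.0, 45.0, 60.0, 58.0, 63.0, 55.0])

# np.digitize returns, for each value, the index of the range it falls into:
# 0 means below the first cut point, 3 means at or above the last one.
lvef_classes = np.digitize(lvef_values, lvef_cut_points)

# Count how many patients land in each class to expose any imbalance.
class_counts = np.bincount(lvef_classes, minlength=len(lvef_cut_points) + 1)
```
        </preformat>
        <p>In this toy cohort four of the seven values fall into the highest class, which mirrors the kind of imbalance that arises when most subjects are healthy.</p>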
        <p>A Random Forest classifier with 100 trees was used. Key settings included:
• Class balancing using inverse frequency weighting,
• Stratified train/test split to preserve class proportions,
• Fixed random seed for reproducibility.</p>
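        <p>These settings map onto scikit-learn roughly as shown below. The sketch assumes that inverse frequency weighting corresponds to class_weight="balanced"; the labels, the features, the 20% test fraction, and the seed value are placeholders, not values reported by the authors.</p>
        <preformat preformat-type="code">
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder ECG features and imbalanced binary labels (29 vs. 8).
X = rng.normal(size=(37, 5))
y = np.array([0] * 29 + [1] * 8)

# Stratified split keeps the class proportions in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# class_weight="balanced" weights classes inversely to their frequency.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# One F1 value per class; zero_division=0 avoids warnings when a class
# is never predicted.
f1_per_class = f1_score(y_test, y_pred, average=None, labels=[0, 1],
                        zero_division=0)
```
        </preformat>
        <p>Reporting F1 per class, rather than a single averaged value, makes it visible when the minority class is never predicted.</p>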
        <p>Classification performance was evaluated using accuracy and the F1 score.</p>
        <sec id="sec-2-6-1">
          <title>2.5.1. Accuracy</title>
          <p>The fraction of all predictions that are correct:
            <disp-formula><label>(5)</label><tex-math>\mathrm{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}</tex-math></disp-formula>
          </p>
        </sec>
        <sec id="sec-2-6-2">
          <title>2.5.2. F1 Score</title>
          <p>The harmonic mean of precision and recall, particularly useful for imbalanced datasets:
            <disp-formula><label>(6)</label><tex-math>F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}</tex-math></disp-formula>
Where:
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
Here, TP, FP, and FN denote true positives, false positives, and false negatives, respectively.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Regression Results</title>
        <p>The regression task aimed to predict continuous echocardiographic (EchoCG) parameters using
electrocardiographic (ECG) features. Two models were evaluated: linear regression and random forest
regression. The performance was assessed on the test set using MAE, RMSE, and <italic>R</italic><sup>2</sup> score.</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Linear Regression</title>
          <p>Linear regression was able to capture linear relationships between ECG and EchoCG features. However,
only a limited subset of parameters achieved satisfactory performance. Table 1 presents the top EchoCG
parameters predicted with the highest <italic>R</italic><sup>2</sup> scores.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Random Forest Regression</title>
          <p>Random forest models outperformed linear regression across most predicted parameters due to their
ability to capture non-linear relationships. Table 2 summarizes the best-performing predictions based
on <italic>R</italic><sup>2</sup> scores.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Classification Results</title>
        <p>The classification experiment targeted the prediction of clinically meaningful ranges of selected EchoCG
parameters. Due to data imbalance, we focused on the binary classification of the parameter Left Atrial
Anteroposterior Dimension (LAAPD), which had the most balanced class distribution.</p>
        <p>The random forest classifier achieved the following results:
• Accuracy: 78%
• F1 Score (majority class): 0.88
• F1 Score (minority class): 0.00</p>
        <p>Despite attempts to compensate for the imbalance using class weights and stratified sampling, the model failed to
correctly classify any minority class instances. This result highlights the difficulty of applying standard
classifiers to highly imbalanced clinical datasets.</p>
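        <p>The reported figures are what one would expect from a classifier that assigns every test sample to the majority class. The worked example below illustrates this with hypothetical counts (7 majority and 2 minority test samples, which are an assumption and not the study's actual test set); it uses only the Python standard library.</p>
        <preformat preformat-type="code">
```python
def f1_for_class(y_true, y_pred, positive):
    """F1 score treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical test set: 7 majority (0) and 2 minority (1) samples.
y_true = [0] * 7 + [1] * 2
# A classifier that always predicts the majority class.
y_pred = [0] * 9

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_majority = f1_for_class(y_true, y_pred, positive=0)
f1_minority = f1_for_class(y_true, y_pred, positive=1)
```
        </preformat>
        <p>With these counts the accuracy is 7/9 (about 78%), the majority-class F1 is 0.875, and the minority-class F1 is 0. Accuracy alone therefore says little about performance on the minority class.</p>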
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Correlation Findings</title>
        <p>Pearson correlation analysis identified several ECG–EchoCG parameter pairs with strong linear
relationships (<italic>r</italic> &gt; 0.9), suggesting high predictive potential. Notable examples include:
• ECG: Q/R amplitude ratio (lead aVF) ↔ EchoCG: LV systolic sphericity index (<italic>r</italic> = 0.99)
• ECG: T-wave symmetry (lead I) ↔ EchoCG: Mitral valve score (<italic>r</italic> = 0.99)
• ECG: Heart rhythm abnormality score ↔ EchoCG: Tricuspid regurgitation grade (<italic>r</italic> = 0.95)
These correlations support the feasibility of ECG-driven estimation of certain mechanical heart
characteristics.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Conclusions</title>
      <p>This study explored the feasibility of predicting echocardiographic parameters using
electrocardiographic features via machine learning methods. Both regression and classification tasks were evaluated
to assess the potential for non-invasive, ECG-based estimation of EchoCG measurements.</p>
      <p>Our regression results show that both linear and non-linear models can predict a subset of EchoCG
parameters with reasonable accuracy. However, Random Forest regression consistently outperformed
linear regression, especially for parameters with known non-linear relationships to ECG markers.</p>
      <p>The highest <italic>R</italic><sup>2</sup> scores (above 0.80) were achieved for:
• KST LeA, flow through the pulmonary artery (<italic>R</italic><sup>2</sup> = 0.896)
• VTLSh, aortic flow (<italic>R</italic><sup>2</sup> = 0.830)</p>
      <p>Other EchoCG parameters such as tricuspid regurgitation grade, LV hypertrophy markers, and LV
ejection fraction (FVLSh) were also predicted with acceptable performance (<italic>R</italic><sup>2</sup> &gt; 0.6), demonstrating
that ECG signals contain information reflective of structural and hemodynamic cardiac states.</p>
      <p>Linear regression, while interpretable, was limited in its predictive power for most parameters. It only
achieved moderate <italic>R</italic><sup>2</sup> (around 0.5) for a few features, indicating that non-linear modeling is essential
for capturing complex ECG–EchoCG relationships.</p>
      <p>In the classification setting, EchoCG parameters were discretized into clinically meaningful ranges.
The model’s performance was significantly limited by strong class imbalance in the dataset. In the
binary classification task (predicting the left atrial diameter class), the model achieved a high F1 score
for the majority class (0.88) but completely failed to identify the minority class (F1 = 0.00), despite class
weighting.</p>
      <sec>
        <title>4.1. Limitations</title>
        <p>• The dataset size (n = 37 patients) was small, limiting the generalizability and statistical power of
models.
• Many EchoCG parameters exhibited significant class imbalance, limiting the applicability of
standard classifiers.
• The features were extracted from structured ECG and EchoCG datasets; waveform-based deep
learning was not explored.
• Gradient-boosting ensembles such as XGBoost or LightGBM, which offer built-in options for
weighting imbalanced classes, were not employed.</p>
      </sec>
      <sec id="sec-4-1">
        <title>4.2. Conclusions</title>
        <p>This study suggests that machine learning models can predict several echocardiographic parameters
from ECG features with promising accuracy, particularly when using ensemble methods like Random
Forests. However, data limitations, especially in size and class distribution, currently constrain
the reliability and scope of these predictions. With further development and clinical validation,
ECG-driven estimation of EchoCG parameters could become a valuable, low-cost tool for cardiac screening
and monitoring. Moreover, the approach may have significant potential outside the
cardiology clinic, for example, for an objective assessment of the functional state of military personnel.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Funding</title>
      <p>Support for this research was provided by the National Research Foundation of Ukraine under project
No. 2023.04/0094, titled "Development of technology for objective monitoring of functional capabilities
and stress of military personnel based on miniature electrocardiographs and machine learning."</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used GPT-4o in order to: Grammar and spelling check.
After using this tool, the authors reviewed and edited the content as needed and take full responsibility
for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Opportunities and challenges of deep learning methods for electrocardiogram data: A review</article-title>
          , arXiv preprint arXiv:2001.01550 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Raghunath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Ulloa-Cerna</surname>
          </string-name>
          , et al.,
          <article-title>Deep neural networks can predict mortality from 12-lead electrocardiogram voltage data</article-title>
          ,
          <source>Nature Medicine</source>
          <volume>26</volume>
          (
          <year>2020</year>
          )
          <fpage>886</fpage>
          -
          <lpage>891</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Doe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Leveraging ECG images for predicting ejection fraction using machine learning</article-title>
          ,
          <source>Journal of Cardiovascular Informatics</source>
          (
          <year>2025</year>
          ).
          Accepted manuscript
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>Deep learning-based identification of echocardiographic abnormalities from ECG</article-title>
          ,
          <source>Computers in Biology and Medicine</source>
          <volume>158</volume>
          (
          <year>2024</year>
          )
          <fpage>106013</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Boyle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , et al.,
          <article-title>Machine learning-assisted echocardiography prediction in childhood cancer survivors</article-title>
          ,
          <source>Cardio-Oncology</source>
          <volume>10</volume>
          (
          <year>2024</year>
          )
          <fpage>23</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Molenaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zwart</surname>
          </string-name>
          , R. De Jong, et al.,
          <article-title>Explainable machine learning using echocardiography to improve risk prediction in chronic coronary syndrome</article-title>
          ,
          <source>European Heart Journal - Digital Health</source>
          <volume>5</volume>
          (
          <year>2024</year>
          )
          <fpage>189</fpage>
          -
          <lpage>198</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chaikovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popov</surname>
          </string-name>
          ,
          <article-title>Advances in the analysis of electrocardiogram in context of mass screening: Technological trends and application of AI anomaly detection</article-title>
          , in:
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Qaisar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Nisar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Subasi</surname>
          </string-name>
          (Eds.),
          <source>Advances in Non-Invasive Biomedical Signal Sensing and Processing with Machine Learning</source>
          , Springer, Cham,
          <year>2023</year>
          . doi:10.1007/978-3-031-23239-8_5.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chaikovsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fogel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kazmirchyk</surname>
          </string-name>
          ,
          <article-title>Development of AI-based method to detect the subtle ECG deviations from the population ECG norm</article-title>
          ,
          <source>European Journal of Preventive Cardiology</source>
          <volume>28</volume>
          (
          <year>2021</year>
          )
          <elocation-id>zwab061.229</elocation-id>
          . doi:10.1093/eurjpc/zwab061.229.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>B.</given-names>
            <surname>Cadaret</surname>
          </string-name>
          , K. Liu,
          <article-title>Machine learning in electrocardiography and echocardiography</article-title>
          ,
          <source>Current Cardiology Reports</source>
          <volume>22</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <collab>Wikipedia contributors</collab>
          ,
          <source>Artificial intelligence in healthcare: cardiovascular applications</source>
          , https://en.wikipedia.org/wiki/Artificial_intelligence_in_healthcare,
          <year>2025</year>
          . Accessed August 2025.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>