=Paper= {{Paper |id=Vol-3741/paper14 |storemode=property |title=Infantile Predictors of Functional Gastrointestinal Disorders: A Machine Learning Approach to Risk Assessment |pdfUrl=https://ceur-ws.org/Vol-3741/paper14.pdf |volume=Vol-3741 |authors=Flavia Indrio,Elio Masciari,Flavia Marchese,Matteo Rinaldi,Gianfranco Maffei,Enea Vincenzo Napolitano,Isadora Beghetti,Luigi Corvaglia,Arianna Aceti |dblpUrl=https://dblp.org/rec/conf/sebd/IndrioMMRMNBCA24 }} ==Infantile Predictors of Functional Gastrointestinal Disorders: A Machine Learning Approach to Risk Assessment== https://ceur-ws.org/Vol-3741/paper14.pdf
                                Infantile Predictors of Functional Gastrointestinal
                                Disorders: A Machine Learning Approach to Risk
                                Assessment
                                Flavia Indrio1 , Elio Masciari2 , Flavia Marchese3 , Matteo Rinaldi4 , Gianfranco Maffei4 ,
                                Enea Vincenzo Napolitano2,* , Isadora Beghetti5 , Luigi Corvaglia5 and Arianna Aceti5
                                1
                                  Department of Experimental Medicine School of Medicine, University of Salento, Lecce, Italy
                                2
                                  Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, Italy
                                3
                                  Department of Medical and Surgical Science Pediatric Section, University of Foggia, Foggia, Italy
                                4
                                  Department of Neonatology and NICU, Ospedali Riuniti Foggia, Foggia, Italy
                                5
                                  Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy


                                            Abstract
                                            This study examines the considerable impact of Functional Gastrointestinal Disorders (FGIDs) on children,
                                            their families and healthcare systems, and highlights the historic challenge of identifying children at
                                            risk due to unclear pathophysiology. The research aims to identify early-life risk factors for FGIDs,
                                            specifically infantile colic, regurgitation, and functional constipation, within the first year of life. Using a
                                            prospective observational cohort design, the study enrolled term and preterm infants from a tertiary care
                                            university hospital in Foggia, Italy, between 1 January 2020 and 31 December 2022, excluding infants
                                            with severe disease or major neonatal complications. By using conventional statistical methods and
                                            artificial intelligence, specifically a random forest classification model, this study identified birth weight,
                                            cord blood pH, and maternal age as significant predictors for FGIDs. A logistic regression predictive
                                            model also established an inverse relationship between these variables and the occurrence of FGIDs.
                                            Using these findings, the study created an AI-based predictive model and a practical, user-friendly web
                                            interface for risk assessment. This enables clinicians to identify infants at a higher risk for FGIDs. The
                                            approach is innovative and marks a pioneering step in FGID risk prediction.

                                            Keywords
                                            Neonatal Health, Early Diagnosis, Risk Factors, Health Informatics,




                                1. Introduction
                                Functional Gastrointestinal Disorders (FGIDs) are a significant challenge in pediatric healthcare
                                due to their prevalence and impact on infants. FGIDs refer to a range of conditions, including
                                infant colic, regurgitation, functional diarrhea, and functional constipation, that are defined

                                SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy
                                *
                                 Corresponding author.
                                $ flavia.indrio@unisalento.it (F. Indrio); elio.masciari@unina.it (E. Masciari); flavia.marchese@hotmail.it
                                (F. Marchese); mrinaldi@ospedaliriunitifoggia.it (M. Rinaldi); gmaffei@ospedaliriunitifoggia.it (G. Maffei);
                                eneavincenzo.napolitano@unina.it (E. V. Napolitano); isadora.beghetti@unibo.it (I. Beghetti);
                                luigi.corvaglia@unibo.it (L. Corvaglia); arianna.aceti2@unibo.it (A. Aceti)
                                 0000-0001-9789-7878 (F. Indrio); 0000-0002-1778-5321 (E. Masciari); 0000-0002-6384-9891 (E. V. Napolitano);
                                0000-0003-4819-1830 (I. Beghetti)
                                          © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
by the absence of identifiable biochemical or structural anomalies. These conditions affect
almost 50% of infants in their first year of life [1, 2, 3, 4, 5, 6, 7]. Functional gastrointestinal
disorders cause distress and discomfort in infants and place substantial burdens on families and
healthcare systems worldwide. Despite their classification based on the Rome IV criteria, the
underlying pathophysiology of FGIDs remains unclear. Potential contributing factors include
genetic predispositions, psychosocial stressors, and early life events such as delivery type
and feeding practices [8, 9, 10, 11, 12, 13, 14]. It is important to note that any evaluation of
contributing factors should be objective and clearly marked as such.
   In recent years, the use of artificial intelligence (AI) and machine learning (ML) has revolu-
tionised biomedical research, providing innovative tools for analysing complex health issues
[15]. This is particularly true in paediatric healthcare, where AI and ML have the potential to
transform early detection, risk assessment, and intervention strategies for FGIDs. A Machine
Learning Approach to Risk Assessment aims to utilise machine learning to identify early-life
predictors of FGIDs. The research analyses a comprehensive area-based cohort to identify
multifaceted risk factors present during the first year of life that predispose infants to FGIDs.
   Using an AI-based predictive model, our aim is to gain a detailed understanding of the
early-life factors that contribute to FGIDs. This will enable the development of a practical
risk assessment tool to help clinicians identify infants who are at a higher risk of developing
FGIDs. This, in turn, will facilitate early and targeted interventions. Proactive measures have
the potential to improve the immediate symptoms of disorders and mitigate their long-term
impact on children’s health and well-being.
   This study aims to contribute significantly to the field of pediatric gastroenterology by using
machine learning to analyze early-life factors associated with the development of FGIDs. The
statement represents progress towards a predictive and preventative approach to pediatric
healthcare, moving beyond the symptom-based classification of FGIDs.


2. Methods
2.1. Study Design, Participants, and Recruitment
This study, a prospective observational cohort study, was conducted at the Obstetric and
Neonatal Unit of the "Ospedale Riuniti" in Foggia, Italy, from 1 January 2020 to 31 December
2023. The study adhered strictly to institutional data protection requirements, and informed
consent was obtained from the legal representatives of the infants before their participation.
The study’s protocol was approved by the Institutional Review Board to ensure compliance
with the ethical standards outlined in the Declaration of Helsinki.
   The study included both term and preterm infants who met the eligibility criteria. Newborns
presenting with severe clinical conditions, major neonatal complications such as inherited
metabolic disorders, congenital anomalies, or those who died during hospitalization were
excluded. Newborns who met the inclusion and exclusion criteria were consecutively recruited
within their first few days of life, as shown in the patient enrollment flow chart 1.
   The aim of the data collection was to identify potential perinatal risk factors for FGIDs. The
risk factors included gestational age (GA), birth weight (BW), sex, mode of delivery, Apgar
score, venous cord blood pH, maternal demographic characteristics, Neonatal Intensive Care
Figure 1: Patient enrollment flow chart


Unit (NICU) admission, antibiotic administration, and feeding practices at discharge. The data
collection was performed using a randomised controlled trial design. Discharge feeding was
categorised into exclusive breastfeeding and non-exclusive breastfeeding, which included both
exclusive formula feeding and mixed feeding.
   During follow-up visits in the first year of life, a dedicated pediatrician diagnosed and classified
FGIDs according to the Rome IV criteria. Parents provided information on feeding practices at
discharge from the nursery or NICU, and at subsequent milestones of 3, 6, and 12 months. This
information included details on family history of allergic diseases and parental smoking habits.
   The aim of this study was to investigate the relationship between the development of FGIDs
(infantile colic, regurgitation, functional constipation) within the first year of life and various
perinatal/neonatal characteristics.

2.2. Data Analyses
Potential associations between the development of each FGIDs (infantile colic, regurgitation,
functional constipation) and perinatal characteristics were investigated through both conven-
tional statistics and AI. An AI-based predictive model and a practical risk assessment tool for
each FGIDs were then developed.
   Differences between infants developing or non-developing each FGID were evaluated using
the independent sample t test for continuous variables, and the chi-squared test for categorical
data. A p value <.05 was considered as statistically significant. Statistical analyses were
performed using IBM SPSS Statistics 28.0 (IBM Corp., Armonk, NY, USA).
   A machine learning (ML) process was implemented for the analysis of the dataset. An accurate
data preprocessing 2.2.1 was first performed to obtain a dataset suitable for ML analysis. After
a cleaned dataset had been produced, it was partitioned into a training set composed of 4.242
instances and a test set composed of 1.818 instances. A Feature Selection step was performed to
identify the most important variables for the output prediction. Finally, a Classification model
based on Random Forest was produced.
2.2.1. Data Preprocessing
During the dataset preparation phase of our study, we prioritised the integrity and usability of
the data. This required a comprehensive data cleaning process, which began with the exclusion
of variables that had missing values exceeding 30% following the collection phase. We set this
threshold to ensure the quality and reliability of the dataset for subsequent analysis.
   Our approach to treating missing values varied based on the nature of the variables involved.
Continuous variables with missing entries were imputed using the mean value of the respec-
tive variable to ensure a balanced representation of the data without introducing significant
bias. Discrete variables with missing values were imputed with zero. Classification variables
underwent a stricter process, where instances missing any values were entirely removed from
the dataset. This strategy ensured that only complete and accurate data were included in the
analysis.
   To standardize the dataset and facilitate analysis, continuous variables were normalized using
a standard scaler. This allowed for quantification on a uniform scale. Categorical variables
were transformed using one-hot coding, which converts categories into a binary representation,
simplifying their inclusion in statistical models.
   After completing thorough data preparation and preliminary analysis, we selected specific
variables to include in the predictive model. The variables selected for the study were chosen
based on their relevance to the study objectives. These variables included birth weight (BW),
term/preterm status, mode of delivery (vaginal vs. cesarean), Neonatal Intensive Care Unit
(NICU) admission, sex, occurrence of twin births, maternal age, parity, 5-minute Apgar score,
and feeding at discharge. The feeding at discharge variable was categorized into exclusive versus
non-exclusive breastfeeding. In addition, we considered the smoking status of both parents due
to its potential influence on the risk of FGIDs in infants. This selected set of variables will serve
as the foundation for the development of an AI-based predictive model, which aims to shed
light on the complex factors contributing to FGIDs during the first year of life.

2.3. Risk Prediction model development
Our study utilised a comprehensive approach to investigate the complex relationships between
various factors and the occurrence of three specific target conditions: colic, constipation, and
regurgitation. The classification models used in this study included logistic regression, support
vector machine, decision tree classifier, extra tree classifier and random forest classifier. The
Logistic Regression was found to be the most effective in terms of accurately representing the
relationships between the variables and the target conditions. However, to address the challenge
of dealing with an unbalanced dataset, both oversampling and undersampling techniques were
applied to achieve a more balanced data representation. The selection of variables critical for
prediction was conducted through a comparative analysis of various models under different
configurations. Notably, all models consistently prioritised three variables: birth weight (BW),
maternal age, and pH levels, due to their significant importance values. Depending on the
specific model configuration, additional variables such as sex, mode of delivery, or the 5-minute
Apgar score may have been included as either the fourth or fifth variable to improve the model’s
predictive accuracy. To construct the risk prediction model, we experimented with different sets
of variables. We maintained BW, maternal age, and pH as constant predictors while varying the
inclusion of additional variables such as sex, mode of delivery, and the 5-minute Apgar score to
create variable sets of sizes 4, 5, or 6. The model’s predictive capacity was refined by identifying
the most impactful combination of variables, thanks to this procedural flexibility.
   The logistic regression encapsulates the foundation of our predictive models, particularly the
relationship between the predictor variables (BW, maternal age, and pH) and the probability of
a disorder’s occurrence.


3. Results
3.1. Study Population and Clinical Outcomes
A total of 6060 infants participated in the study, with a slight male predominance (52.3%) and
a twin birth rate of 6.0%. Among these infants, 488 (8.1%) were born preterm, indicating a
significant proportion of the cohort experienced early birth. The delivery method statistics
indicate a preference for vaginal birth, with 60.8% of infants delivered this way. The clinical
evaluation conducted at birth showed a mean Apgar score of 8.9 (SD = 0.4, range 3-10) five
minutes after delivery. Additionally, the mean venous cord blood pH was recorded at 7.32 (SD =
0.08, range 6.86-7.55), indicating favourable initial health outcomes for the newborns.
   Within our cohort, 27.3% of infants experienced colic, 18.7% experienced regurgitation, and
10.2% experienced constipation. A deeper analysis revealed that preterm infants were signifi-
cantly more likely to develop gastrointestinal conditions compared to their term counterparts.
The incidences of colic, regurgitation, and constipation were higher in the preterm group (38.1%
vs 25.8%, 35.8% vs 17.2%, and 21.6% vs 9.2%, respectively; p<0.001). These findings highlight the
increased vulnerability of preterm infants to gastrointestinal disorders.

3.2. Model Insights
The analysis started by examining the correlation matrix to identify potential collinearity among
variables in relation to the target conditions of colic, constipation, and regurgitation. This step
was crucial to ensure the validity of the predictive model by excluding the possibility of collinear
variables that could skew the results. The correlation matrix shows no collinearity affecting the
target variables, indicating the reliability of the selected variables for further analysis.
   Moderately strong positive correlations were found between certain variables, such as mater-
nal age and parity (distinguished by nulliparous vs. multiparous mothers), as well as between
the 5-minute Apgar score and birth weight (BW). The correlations suggest a relationship be-
tween the variables that could be significant in understanding infant health outcomes. However,
some variables showed a moderately strong negative correlation, specifically between Neonatal
Intensive Care Unit (NICU) admission and both birth weight (BW) and term birth. These inverse
relationships highlight factors that may influence the likelihood of NICU admission and indicate
the complex interplay of variables that affect infant health.
   The results of our Random Forest Classifier provided additional insights. It identified BW,
cord blood pH, and maternal age as the most influential variables in classifying the three target
conditions. These variables are important in reflecting the health status of the infant and their
potential impact on the likelihood of developing colic, constipation, and regurgitation. The
identification of key factors aids in understanding the investigated conditions and highlights
critical variables for healthcare professionals to monitor closely. These findings offer valuable
insights for clinical practice and future research into the development and diagnosis of functional
gastrointestinal disorders in infants.

3.3. Risk Prediction Model
Our research took a novel approach to predicting three common infant conditions: colic,
regurgitation, and constipation. We used a unified predictive modeling framework and carefully
selected predictor variables. The selection process resulted in the identification of birth weight
(BW), maternal age, and cord blood pH as the primary predictors, distinguished by their
pronounced importance coefficients relative to other variables.
   Three distinct yet interconnected predictive models were developed, each tailored to one of
the conditions under study. Although each model is specific to a particular condition, a common
set of predictor variables was incorporated across all models to comprehensively examine
their interrelationships and impacts on different health outcomes. This approach ensured both
efficiency and effectiveness of the models, while also providing insights into the nuanced roles
played by the predictors in the context of infant health.

3.3.1. Model insights and findings
    • The Colic Prediction Model revealed that higher values in birth weight, maternal age,
      and cord blood pH have a protective effect against colic onset. The model emphasises
      the significant role of pH levels in colic risk assessment, suggesting its potential as a key
      factor in preventive strategies.
    • The Regurgitation Prediction Model showed a pronounced inverse relationship between
      the occurrence of regurgitation and the predictors, with birth weight and pH identified as
      the most substantial protective factors. It is important to closely monitor these variables
      in newborns to reduce the risk of regurgitation.
    • The Constipation Prediction Model identified birth weight and cord blood pH as crucial
      predictors negatively associated with the risk of constipation, with maternal age playing
      a lesser role. The consistent significance of pH levels across all conditions emphasizes its
      relevance in the early identification and management of gastrointestinal issues in infants.

   The use of a unified predictive modelling approach in our study is a significant advancement
in paediatric healthcare, especially in the early detection and management of colic, regurgi-
tation, and constipation in infants. Our models focus on a specific set of predictors, while
also simplifying the interpretation and application of the findings. This methodology enables
healthcare providers to gain a better understanding of the complex dynamics of these prevalent
conditions and to implement more targeted and effective intervention strategies, ultimately
resulting in improved infant health outcomes.
3.4. Risk prediction tool
The research team has developed a user-friendly web interface that translates insights from
predictive models for infant colic, regurgitation, and constipation into a practical tool 2. This
digital application streamlines the process of risk prediction by allowing users to input key
variables such as birth weight (BW), maternal age, and cord blood pH, which have been identified
as significant predictors of these conditions. The web application calculates a prediction score
to reflect the probability of an infant developing any of the three disorders.
   A risk stratification mechanism interprets the scores into three distinct risk categories. Scores
under 33% are classified as low risk, indicating a minimal likelihood of the disorders. Scores
between 33% and 66% are deemed medium risk, suggesting a moderate probability. Finally,
scores above 66% are categorised as high risk, indicating a significant chance of occurrence.
This categorisation helps to provide a clear and actionable evaluation of risk levels.
   The development of this web-based tool simplifies complex statistical models for risk assess-
ment and supports early identification and management of infant health issues. This approach
aims to enhance preventative and diagnostic processes, ultimately contributing to better health
outcomes for infants.




Figure 2: Risk prediction score web interface




4. Discussion and Conclusions
This study represents a significant step forward in the comprehension and treatment of func-
tional gastrointestinal disorders in neonates and toddlers. We identified key risk factors for
FGIDs using both conventional statistics and machine learning. The Functional Risk Index
for Pediatric Subjects (FRIPS) is a novel, machine learning-based predictive model for early
diagnosis of FGIDs in children. Healthcare practitioners can input patient data and receive
risk coefficients for colic, regurgitation, and constipation, empowering them to make informed
clinical decisions. This work should be validated on different populations.
   The FRIPS predictive scores reflect the probability of developing any of the three conditions,
providing a nuanced understanding of risk beyond simple binary outcomes. This approach
employs an equation that considers the interplay between various factors rather than isolated
variables, underscoring the complexity of FGIDs. The application of ML for feature selection
has identified birth weight, cord blood pH, and maternal age as critical variables, indicating
their significant cross influence on disease occurrence.
   The incidence of FGIDs in our study population, particularly among preterm infants, is
consistent with previous research. However, the reported incidence rates vary due to differences
in diagnostic criteria and population stratification. Our findings indicate that preterm infants
are especially susceptible to FGIDs, likely due to the crucial developmental processes that occur
during the perinatal period, which affect the brain-gut-microbiota axis.
   Furthermore, our research confirms that low venous cord blood pH is a risk factor for FGIDs.
This suggests that increased surveillance for infants with low pH at birth could enable early
detection and intervention, potentially mitigating the risk of FGIDs in the first year of life.
This insight is particularly relevant given the established link between neonatal acidaemia and
neurological issues, as well as the emerging evidence of its impact on gastrointestinal health.
   Despite the contributions made, we acknowledge the limitations of our study, particularly
the sample size, which may not fully represent the global population. Future research should
aim to validate and refine FRIPS across diverse populations to enhance its applicability and
accuracy in predicting FGIDs.
   In conclusion, this study sheds light on the complex etiology and risk factors associated
with functional gastrointestinal disorders (FGIDs) in neonates and toddlers. It also offers a
practical tool for early diagnosis and management. The study integrates Machine Learning with
traditional statistical methods, providing a robust framework for enhancing pediatric healthcare
outcomes and improving the quality of life for children and their families worldwide.


Acknowledgments
This work has been supported by the project "AN APP TO SHED THE LIGHT ON THE WINDOW
OF OPPORTUNITY OF THE FIRST 1000 DAYS OF LIFE" funded by the MIUR Progetti di Ricerca
di Rilevante Interesse Nazionale (PRIN) Bando 2022.


References
 [1] J. Hyams, C. Di Lorenzo, M. Saps, R. Shulman, A. Staiano, M. van Tilburg, Functional
     disorders: children and adolescents. gastroenterology, 2016.
 [2] Y. Vandenplas, B. Hauser, S. Salvatore, Functional gastrointestinal disorders in infancy:
     impact on the health of the infant and family, Pediatric gastroenterology, hepatology &
     nutrition 22 (2019) 207–216.
 [3] M. A. Benninga, S. Nurko, C. Faure, P. E. Hyman, I. S. J. Roberts, N. L. Schechter, Childhood
     functional gastrointestinal disorders: neonate/toddler, Gastroenterology 150 (2016) 1443–
     1455.
 [4] Y. Vandenplas, A. Abkari, M. Bellaiche, M. Benninga, J. P. Chouraqui, F. Çokura, T. Harb,
     B. Hegar, C. Lifschitz, T. Ludwig, et al., Prevalence and health outcomes of functional
     gastrointestinal symptoms in infants from birth to 12 months of age, Journal of pediatric
     gastroenterology and nutrition 61 (2015) 531–537.
 [5] L. A. Lestari, A. N. Rizal, W. Damayanti, Y. Wibowo, C. Ming, Y. Vandenplas, Prevalence
     and risk factors of functional gastrointestinal disorders in infants in indonesia, Pediatric
     Gastroenterology, Hepatology & Nutrition 26 (2023) 58.
 [6] N. F. Steutel, J. Zeevenhooven, E. Scarpato, Y. Vandenplas, M. M. Tabbers, A. Staiano, M. A.
     Benninga, Prevalence of functional gastrointestinal disorders in european infants and
     toddlers, The Journal of pediatrics 221 (2020) 107–114.
 [7] A. Chogle, C. A. Velasco-Benitez, I. J. Koppen, J. E. Moreno, C. R. R. Hernández, M. Saps, A
     population-based study on the epidemiology of functional gastrointestinal disorders in
     young children, The Journal of pediatrics 179 (2016) 139–143.
 [8] M. A. van Tilburg, P. E. Hyman, L. Walker, A. Rouster, O. S. Palsson, S. M. Kim, W. E.
     Whitehead, Prevalence of functional gastrointestinal disorders in infants and toddlers,
     The Journal of pediatrics 166 (2015) 684–689.
 [9] G. Holtmann, A. Shah, M. Morrison, Pathophysiology of functional gastrointestinal
     disorders: a holistic overview, Digestive Diseases 35 (2018) 5–13.
[10] I. Koppen, M. Benninga, M. Singendonk, Motility disorders in infants, Early human
     development 114 (2017) 1–6.
[11] R. Shamir, I. St James-Roberts, C. Di Lorenzo, A. J. Burns, N. Thapar, F. Indrio, G. Riezzo,
     F. Raimondi, A. Di Mauro, R. Francavilla, et al., Infant crying, colic, and gastrointestinal
     discomfort in early childhood: a review of the evidence and most plausible mechanisms,
     Journal of pediatric gastroenterology and nutrition 57 (2013) S1.
[12] S. Salvatore, M. E. Baldassarre, A. Di Mauro, N. Laforgia, S. Tafuri, F. P. Bianchi, E. Dattoli,
     L. Morando, L. Pensabene, F. Meneghin, et al., Neonatal antibiotics and prematurity are
     associated with an increased risk of functional gastrointestinal disorders in the first year
     of life, The Journal of pediatrics 212 (2019) 44–51.
[13] M. M. B. B. Gondim, A. L. Goulart, M. B. d. Morais, Prematurity and functional gastroin-
     testinal disorders in infancy: a cross-sectional study, Sao Paulo Medical Journal 140 (2022)
     540–546.
[14] D. Bi, H. Jiang, K. Yang, T. Guan, L. Hou, G. Shu, Neonatal risk factors for functional
     gastrointestinal disorders in preterm infants in the first year of life (2022).
[15] E. V. Napolitano, S. Fioretto, E. Masciari, A. Anniciello, How pandemic affected the adoption
     of e-health systems, in: Proceedings of the 27th International Database Engineered
     Applications Symposium, 2023, pp. 94–98.