=Paper= {{Paper |id=Vol-3651/HeDAI_paper3 |storemode=property |title=Non-invasive AI-powered Diagnostics: The case of Voice-Disorder Detection |pdfUrl=https://ceur-ws.org/Vol-3651/HeDAI-3.pdf |volume=Vol-3651 |authors=Gabriele Ciravegna,Alkis Koudounas,Marco Fantini,Tania Cerquitelli,Elena Baralis,Erika Crosetti,Giovanni Succo |dblpUrl=https://dblp.org/rec/conf/edbt/CiravegnaKFCBCS24 }} ==Non-invasive AI-powered Diagnostics: The case of Voice-Disorder Detection== https://ceur-ws.org/Vol-3651/HeDAI-3.pdf
                                Non-invasive AI-powered Diagnostics: The case of
                                Voice-Disorder Detection - Vision paper
                                Gabriele Ciravegna (1), Alkis Koudounas (1), Marco Fantini (2), Tania Cerquitelli (1), Elena Baralis (1),
                                Erika Crosetti (3) and Giovanni Succo (3)
                                (1) Politecnico di Torino, Corso Duca degli Abruzzi, Turin, Italy
                                (2) ENT Unit, San Feliciano Hospital, Rome, Italy
                                (3) ENT Clinic - Head and Neck Cancer Unit, San Giovanni Bosco Hospital, Turin, Italy


                                                                             Abstract
                                                                             This paper proposes a novel pipeline for non-invasive diagnosis and monitoring in healthcare, leveraging artificial intelligence
                                                                             (AI). The pipeline allows individuals to record various health data using everyday devices and analyze it via AI algorithms
                                                                             on a cloud-based platform. Experimental results on voice disorder detection demonstrate the effectiveness of the proposed
                                                                             approach when compared to existing solutions. Additionally, we discuss the positive impact of the pipeline on diagnosis,
                                                                             prognosis, and monitoring, emphasizing its non-invasive nature. Overall, we think the proposed pipeline might contribute to
                                                                             advancing AI-driven healthcare solutions with implications for global healthcare delivery.

                                                                             Keywords
                                                                             Artificial Intelligence, Non-invasive diagnostics, Voice disorder recognition, Voice analysis



1. Introduction

Artificial Intelligence (AI) is increasingly integrated into healthcare, offering opportunities to
improve diagnostics, treatment, and patient care. Through machine learning algorithms, AI systems
can analyze medical data, providing clinical decision support and personalized treatment
options [1]. Non-invasive diagnostics is a critical aspect of modern healthcare delivery, offering
patients a less intrusive and more comfortable care experience. This holds for both the diagnostic
and the monitoring processes, improving the overall quality of life for patients [2]. Beyond
enhancing patient comfort, these methods also improve accessibility to healthcare services,
particularly for underserved populations or those with limited access to specialized medical
facilities.

We propose an AI-based framework for non-invasive diagnostics that integrates various data
modalities, including voice recordings, self-pictures, typing patterns, ECG readings, and sleep
analysis, among others. These diverse sources of data are collected through everyday devices such
as computers, smartphones, and smartwatches, enabling convenient and continuous monitoring of
individual health metrics. Subsequently, these data are fed into cloud-based applications where
advanced AI algorithms analyze them, identifying patterns that may be indicative of underlying
health issues. Also, the integration of multiple data sources allows for a holistic understanding
of an individual’s health, facilitating early detection and personalized intervention strategies.

The envisioned framework has the potential to improve healthcare delivery as well as have economic
and technological impacts. First, by enabling large scale screening, it may increase early
detection, allowing for timely intervention and improved treatment outcomes. Second, by analyzing
the evolution of patient data in time, healthcare providers may tailor interventions, optimizing
treatment efficacy and patient satisfaction. From an economic perspective, the improved efficiency
of diagnosis and treatments may reduce overall healthcare costs. Finally, the developed models may
be employed or fine-tuned in related data-scarce contexts.

As part of our investigation, we conducted a preliminary study on voice disorder detection, a key
aspect of non-invasive diagnostics. We trained a deep learning model for analyzing voice recordings
that achieves very high precision in detecting the presence of pathology and accurately identifies
the type of pathology. We employed a transformer model [3], trained end-to-end (E2E) directly on
the raw data, outperforming traditional methods such as convolutional neural networks (CNN) trained
on frequency-transformed data. These preliminary results demonstrate the potential of our approach
in enabling accurate and efficient non-invasive diagnostics for voice disorders.

The paper begins with an introduction to AI and its medical applications (Section 2). It then
outlines the proposed idea, and it addresses foreseen challenges (Section 3). We then present the
preliminary results on a voice disorder detection case in Section 4, and conclude with a discussion
on the framework’s impact (Section 5).

Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy
Email: gabriele.ciravegna@polito.it (G. Ciravegna)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073




2. Background

Deep Learning and Transformers. Deep Learning is a robust method for uncovering patterns and
insights from extensive datasets [4]. Unlike conventional machine learning approaches, Deep
Learning models learn directly from raw data in an E2E manner, without the need for manual feature
engineering. Transformer models are a prominent class of Deep Learning architectures [3],
demonstrating the effectiveness of this approach in analyzing sequential and multimodal
data [5, 6]. Unlike traditional recurrent and convolutional neural networks, Transformers can
capture long-range dependencies and preserve contextual information over extended sequences, thanks
to their self-attention mechanisms. This characteristic enables Transformers to process raw
multimodal sequential data, such as text or time-series data, as well as images, making them highly
adaptable for various applications, including medical diagnostics [7, 8].

DL for Medicine. Deep Learning has emerged as a transformative technology in various healthcare
applications. First, DL can analyze electronic health records [9], enabling personalized treatment
recommendations and predictive analytics for patient outcomes. In medical imaging [10], DL models
can interpret radiological images, including X-rays, CT scans, and MRI images, with levels of
accuracy comparable to or even surpassing that of human experts. These models have been utilized
for tasks such as disease classification, lesion detection, and tumor segmentation, the latter also
performed in real time during surgery. In drug discovery, AI algorithms have been shown capable of
developing novel drugs through accurate predictions of protein structures [11]. Finally, DL models
have also shown increased capability in handling multi-omics and multi-modal data [12].

3. AI for non-invasive medicine

In this paper, we propose a framework for analyzing an individual’s health condition over time
through non-invasive AI-powered diagnostics. Our proposed pipeline embodies a user-centric
approach, empowering individuals to actively participate in their healthcare journey. From a
medical point of view, we consider the analysis of all pathologies detectable by human experts
through non-invasive diagnosis. Drawing parallels with advancements in other domains, we argue that
AI models can replicate and potentially surpass human diagnostic accuracy also in this domain. This
solution may enable large-scale screening through simple but accurate analysis of the patient data
collected remotely in a non-invasive way.

Data collection. In Figure 1 we report a visualization of the proposed framework. Leveraging
everyday devices such as smartphones, laptops, or wearables, users can effortlessly record diverse
types of data. Clearly, each type of device enables the collection of different data types. Laptops
and smartphones allow collecting typing and click patterns, texts, voice recordings, and pictures
(the last two particularly through smartphones). Wearables, on the other hand, allow (and are
already employed for) monitoring heart rate, blood pressure, athletic performance and sleep
conditions, to cite a few.

Figure 1: Outline of the proposed pipeline. A user records different types of data, e.g., pictures,
voice, texts, heart monitors, and sleep conditions. These data are uploaded to the cloud and
processed by an artificial intelligence method. The results are visualized on a user-controlled
application but can also be shared with a remote doctor who can require further exams.

Data storage and analysis. Upon collection, the data are uploaded to a cloud-based database and
analyzed by means of a single advanced artificial intelligence model. The employment of the
aforementioned transformer models accommodates diverse data modalities and sources. By integrating
multi-modal data processing capabilities, our framework aims to capture a holistic view of an
individual’s health status, enabling comprehensive and personalized diagnostics. Moreover, the
collection of continuous health data enables taking into consideration the evolution over time and
predictive modelling. The model may also evolve and improve over time by means of the new data.

Result visualization. The processed results are then presented to users through an intuitive and
user-controlled application interface. Through this interface, users gain valuable insights into
their health metrics, facilitating informed and proactive decisions regarding their healthcare.
Furthermore, a user can transmit the processed data and diagnostic results to their remote
healthcare professionals, allowing them to make informed clinical decisions and timely
interventions. Additionally, healthcare providers can utilize the data for population-level health
monitoring and epidemiological studies, facilitating the identification of emerging health trends
and proactive public health interventions.
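
To make the analysis step of the pipeline more concrete, the following is a minimal sketch of how a
single self-attention layer could jointly process tokens coming from two different devices. It is
an illustration only, not the model used in the paper: the token counts, the embedding size, the
random embeddings and weights, and all names (voice_tokens, heart_tokens, self_attention) are
assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: array of shape (n_tokens, d_model).
    Returns the attended tokens and the attention weights.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # pairwise token similarities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model = 4  # hypothetical shared embedding size

# Hypothetical, already-embedded inputs from two modalities.
voice_tokens = rng.normal(size=(5, d_model))  # e.g. frames of a voice recording
heart_tokens = rng.normal(size=(3, d_model))  # e.g. windows of a heart-rate trace

# One sequence for one model: every token can attend to every other token,
# regardless of which device produced it.
tokens = np.concatenate([voice_tokens, heart_tokens], axis=0)

w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
fused, weights = self_attention(tokens, w_q, w_k, w_v)

print(fused.shape)    # one fused vector per input token
print(weights.shape)  # attention from every token to every token
```

In this sketch the off-diagonal blocks of the attention matrix are where a voice token weighs
heart-rate tokens and vice versa, which is one simple way a single model can combine modalities. A
real system would additionally need learned embeddings per modality, positional information, and
multiple layers and heads.
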
3.1. Challenges

The effectiveness of the proposed framework also relies on our ability to address and resolve a
number of both technical and non-technical issues.

3.1.1. Data quality

The collection of data through personal devices introduces the risk of data noise, which
encompasses various factors such as sensor inaccuracies, environmental interference, and incorrect
user input. Sensor inaccuracies may lead to erroneous measurements, while environmental
interference, such as background noise or lighting conditions, can distort the recorded data.
Additionally, users may input incorrect data, such as taking pictures of the wrong part of their
body, which can further exacerbate data noise and impact the analysis model. These challenges can
hinder the model’s ability to accurately interpret and analyze the data, potentially leading to
erroneous conclusions and suboptimal performance.

Research direction: data augmentation + input data checking. To address the challenges posed by
sensor inaccuracies and environmental interference, a robust data augmentation pipeline can be
implemented to mitigate these sources of noise. By incorporating various data augmentation
techniques such as noise injection, signal filtering, and data synthesis, the pipeline can generate
diverse and representative training data that encapsulates the variability present in real-world
scenarios [13]. Specific preprocessing tailored to each data modality can help normalize and
enhance the quality of the collected data. Furthermore, to mitigate the issue of incorrect user
input, a dedicated model can be employed to validate the accuracy of the collected data. This model
can check each type of collected data, such as images or sensor readings, to verify their
correctness and flag any discrepancies.

3.1.2. Privacy preservation

The second problem arises from privacy concerns associated with collecting sensitive data and
transmitting it to a cloud-based database. Individuals may be hesitant to share sensitive
information due to concerns about data security and privacy breaches [14]. Transmitting such data
to a cloud-based database further exacerbates these concerns, as it involves relinquishing control
over personal information to third-party service providers. Moreover, regulatory compliance [15]
imposes stringent requirements on the handling of personal health information, adding complexity to
the data collection process.

Research direction: a federated learning approach. The privacy issue can be effectively addressed
through the adoption of a federated approach. Federated learning is an area of machine learning
studying how to coordinate and distribute training over several models without exchanging raw data.
The goal is to mitigate privacy concerns associated with transmitting sensitive data to a
centralized cloud-based database. Furthermore, personalized models generated through federated
learning may exhibit higher diagnostic accuracy, as they are trained on data specific to each
user’s health profile.

3.1.3. Human understanding & trustworthiness

The third issue concerns the explainability challenge associated with employing deep learning-based
black box models for predictions. While these models offer impressive performance, their inherent
complexity makes them opaque and difficult to interpret. This lack of explainability presents
challenges for clinicians and end-users who require insights into the model’s decision-making
process [16, 17, 18, 19]. Without transparent explanations, stakeholders and regulatory
institutions may hesitate to trust the diagnostic and monitoring framework.

Research direction: Concept-based XAI models. A possible solution is represented by the employment
of eXplainable AI (XAI) algorithms that can shed light on the decisions of the models [20, 21, 22].
Particularly, Concept-based XAI models offer intrinsic interpretability by mapping raw data to
interpretable high-level concepts before making class predictions, offering insights into the
model’s decision-making process [23, 24, 25]. This approach not only addresses the explainability
challenge but also fosters trust and confidence in the diagnostic and monitoring outcomes produced
by the system.

4. The case of voice disorder detection

Vocal disorders are prevalent pathologies affecting a significant portion of the population and
exerting a substantial impact on patients’ quality of life [26, 27, 28, 29]. These disorders may
originate from various causes, including both benign and malignant conditions, and
neurodegenerative disorders [30, 31, 32]. Diagnosis often relies on clinicians’ auditory
assessments of patients’ voices, highlighting the critical need for accurate and timely detection.
Here, a DL model is used to analyze the raw recordings and automatically detect patterns indicative
of vocal disorders and distinguish between various pathologies, including nodules, polyps, cysts,
spasmodic dysphonia or vocal cord paralysis.

Preliminary experiments. We demonstrate significant advancements over prior attempts in voice
disorder detection using AI models [33, 34] (we re-implemented these models for a fair comparison).
As reported in Figure 2, our approach achieves notable improvements in
accuracy, up to +30%, primarily attributed to the utilization of a Transformer model rather than a
Convolutional Neural Network (CNN). The Transformer’s inherent ability to process raw time-series
data E2E without any time-frequency preprocessing offers distinct advantages in analyzing voice
data [35]. Additionally, we introduce a robust data augmentation pipeline and consider both vowel
and sentence-based recordings, further enhancing performance by up to +10% compared to the
employment of a standard Transformer model only. As reported in Figure 3, similar considerations
also hold for the classification of the macro-category pathology. These enhancements underscore the
efficacy of our approach in diagnosing vocal disorders.

Figure 2: Comparative analysis of the test model performance in distinguishing Healthy individuals
from those afflicted with a pathological condition.

Figure 3: Comparative analysis of test model performance in identifying the macro pathology that
afflicts individuals.

5. Discussion

Overall, the proposed pipeline holds promise for improving healthcare delivery, leveraging AI to
enable non-invasive diagnostics and monitoring. The medical team involved in the development of
this pipeline also believes in its transformative impact from a social, technological, and economic
point of view.

Social Impact. By enabling individuals to record various types of data remotely using everyday
devices, the proposed pipeline facilitates non-invasive diagnostic procedures, eliminating the need
for expensive and invasive tests. The accessibility of the proposed pipeline extends beyond
traditional healthcare settings, allowing individuals in remote or underserved areas to access
diagnostic services conveniently. Individuals can be monitored remotely, allowing healthcare
providers to track their health status in real time and intervene promptly if abnormalities are
detected [36]. Early detection and intervention facilitated by the pipeline lead to improved
prognoses for patients, as healthcare providers can initiate treatment at earlier stages of disease
progression.

Technological Impact. The proposed framework has significant technological implications. By
leveraging the adaptable nature of the framework, models developed for one medical application can
be readily deployed and tested in related contexts, accelerating the pace of medical research and
innovation. As an example, the presented voice disorder detection model could be tested for
neurodegenerative patients. Additionally, the framework enables the fine-tuning of models for
specific cases, including those with limited data availability, such as rare diseases. Despite the
scarcity of data in such contexts, the model can still generalize due to its original training on a
larger and diverse dataset.

Economic Impact. From an economic point of view, the proposed pipeline has low operating costs due
to the utilization of personal devices for data collection and cloud-based analysis. Additionally,
the framework facilitates the collection of low-cost, virtuous-cycle data, enabling continuous
monitoring and feedback loops that enhance the accuracy and effectiveness of diagnostic and
monitoring processes over time. Moreover, the framework lowers the burden of diagnosis and
monitoring on public healthcare structures. This, in turn, enables healthcare professionals to
focus on more complex and critical medical issues, ultimately improving the efficiency and resource
allocation of the healthcare system.

References

 [1] P. Rajpurkar, E. Chen, O. Banerjee, E. J. Topol, AI in health and medicine, Nature Medicine 28
     (2022) 31–38.
 [2] J.-R. Rueda, I. Sola, A. Pascual, M. S. Casacuberta, Non-invasive interventions for improving
     well-being and quality of life in patients with lung cancer, Cochrane Database of Systematic
     Reviews (2011).
 [3] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser,
     I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30
     (2017).
 [4] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
 [5] A. Baevski, Y. Zhou, A. Mohamed, M. Auli, wav2vec 2.0: A framework for self-supervised learning of speech representations, Advances in Neural Information Processing Systems 33 (2020) 12449–12460.
 [6] W.-N. Hsu, B. Bolte, Y.-H. H. Tsai, K. Lakhotia, R. Salakhutdinov, A. Mohamed, HuBERT: Self-supervised speech representation learning by masked prediction of hidden units, IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021) 3451–3460.
 [7] H. Xiao, L. Li, Q. Liu, X. Zhu, Q. Zhang, Transformers in medical image segmentation: A review, Biomedical Signal Processing and Control 84 (2023) 104791.
 [8] M. La Quatra, L. Vaiani, A. Koudounas, L. Cagliero, P. Garza, E. Baralis, How much attention should we pay to mosquitoes?, in: Proceedings of the 30th ACM International Conference on Multimedia, MM ’22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 7135–7139. URL: https://doi.org/10.1145/3503161.3551594. doi:10.1145/3503161.3551594.
 [9] A. Rajkomar, E. Oren, K. Chen, A. M. Dai, N. Hajaj, M. Hardt, P. J. Liu, X. Liu, J. Marcus, M. Sun, et al., Scalable and accurate deep learning with electronic health records, NPJ Digital Medicine 1 (2018) 18.
[10] K. Suzuki, Overview of deep learning in medical imaging, Radiological Physics and Technology 10 (2017) 257–273.
[11] J. Jumper, R. Evans, A. Pritzel, T. Green, M. Figurnov, O. Ronneberger, K. Tunyasuvunakool, R. Bates, A. Žídek, A. Potapenko, et al., Highly accurate protein structure prediction with AlphaFold, Nature 596 (2021) 583–589.
[12] M. Lovino, V. Randazzo, G. Ciravegna, P. Barbiero, E. Ficarra, G. Cirrincione, A survey on data integration for multi-omics sample clustering, Neurocomputing 488 (2022) 494–508.
[13] L. Vaiani, A. Koudounas, M. La Quatra, L. Cagliero, P. Garza, E. Baralis, Transformer-based non-verbal emotion recognition: Exploring model portability across speakers’ genders, in: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, MuSe ’22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 89–94. URL: https://doi.org/10.1145/3551876.3554801. doi:10.1145/3551876.3554801.
[14] X. Liu, L. Xie, Y. Wang, J. Zou, J. Xiong, Z. Ying, A. V. Vasilakos, Privacy and security issues in deep learning: A survey, IEEE Access 9 (2020) 4566–4593.
[15] T. Madiega, Artificial intelligence act, European Parliament: European Parliamentary Research Service (2021).
[16] A. Vellido, The importance of interpretability and visualization in machine learning for applications in medicine and health care, Neural Computing and Applications 32 (2020).
[17] A. Koudounas, E. Pastor, G. Attanasio, V. Mazzia, M. Giollo, T. Gueudre, L. Cagliero, L. de Alfaro, E. Baralis, D. Amberti, Exploring subgroup performance in end-to-end speech models, in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1–5. doi:10.1109/ICASSP49357.2023.10095284.
[18] A. Koudounas, E. Pastor, G. Attanasio, V. Mazzia, M. Giollo, T. Gueudre, E. Reale, L. Cagliero, S. Cumani, L. de Alfaro, E. Baralis, D. Amberti, Towards comprehensive subgroup performance analysis in speech models, IEEE/ACM Transactions on Audio, Speech, and Language Processing (2024). doi:10.1109/TASLP.2024.3363447.
[19] A. Koudounas, F. Giobergia, E. Baralis, Bad exoplanet! explaining degraded performance when reconstructing exoplanets atmospheric parameters, in: NeurIPS 2023 AI for Science Workshop, 2023. URL: https://openreview.net/forum?id=9Z4XZOhwiz.
[20] G. Vilone, L. Longo, Explainable artificial intelligence: a systematic review, arXiv preprint arXiv:2006.00093 (2020).
[21] E. Pastor, A. Koudounas, G. Attanasio, D. Hovy, E. Baralis, Explaining speech classification models via word-level audio segments and paralinguistic features, in: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, 2024.
[22] G. Ciravegna, P. Barbiero, F. Giannini, M. Gori, P. Liò, M. Maggini, S. Melacci, Logic explained networks, Artificial Intelligence 314 (2023) 103822.
[23] P. Barbiero, G. Ciravegna, F. Giannini, M. Espinosa Zarlenga, L. C. Magister, A. Tonda, P. Liò, F. Precioso, M. Jamnik, G. Marra, Interpretable neural-symbolic concept reasoning, in: A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, J. Scarlett (Eds.), Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, PMLR, 2023, pp. 1801–1825.
[24] M. Espinosa Zarlenga, P. Barbiero, G. Ciravegna, G. Marra, F. Giannini, M. Diligenti, Z. Shams, F. Precioso, S. Melacci, A. Weller, et al., Concept embedding models: Beyond the accuracy-explainability trade-off, Advances in Neural Information Processing Systems 35 (2022) 21400–21413.
[25] E. Poeta, G. Ciravegna, E. Pastor, T. Cerquitelli, E. Baralis, Concept-based explainable artificial intelligence: A survey, arXiv preprint arXiv:2312.12936 (2023).
[26] N. Roy, R. M. Merrill, S. D. Gray, E. M. Smith, Voice disorders in the general population: prevalence, risk factors, and occupational impact, The Laryngoscope 115 (2005) 1988–1995.
[27] S. M. Cohen, Self-reported impact of dysphonia in a primary care population: An epidemiological study, The Laryngoscope 120 (2010) 2022–2032.
[28] N. Bhattacharyya, The prevalence of voice problems among adults in the United States, The Laryngoscope 124 (2014).
[29] N. Spantideas, E. Drosou, A. Karatsis, D. Assimakopoulos, Voice disorders in the general Greek population and in patients with laryngopharyngeal reflux. Prevalence and risk factors, Journal of Voice 29 (2015) 389–e27.
[30] E. Brunner, K. Eberhard, M. Gugatschka, Prevalence of benign vocal fold lesions: Long-term results from a single European institution, Journal of Voice (2023).
[31] I. Karabayir, S. M. Goldman, S. Pappu, O. Akbilgic, Gradient boosting for Parkinson’s disease diagnosis from voice recordings, BMC Medical Informatics and Decision Making 20 (2020).
[32] H. Vieira, N. Costa, T. Sousa, S. Reis, L. Coelho, Voice-based classification of amyotrophic lateral sclerosis: where are we and where are we going? A systematic review, Neurodegenerative Diseases 19 (2020) 163–170.
[33] R. Islam, E. Abdel-Raheem, M. Tarique, Voice pathology detection using convolutional neural networks with electroglottographic (EGG) and speech signals, Computer Methods and Programs in Biomedicine Update 2 (2022) 100074.
[34] X. Xie, H. Cai, C. Li, Y. Wu, F. Ding, A voice disease detection method based on MFCCs and shallow CNN, Journal of Voice (2023).
[35] M. Radfar, A. Mouchtaris, S. Kunzmann, End-to-End Neural Transformer Based Spoken Language Understanding, in: Proc. Interspeech 2020, 2020, pp. 866–870. doi:10.21437/Interspeech.2020-1963.
[36] D. Apiletti, E. Baralis, G. Bruno, T. Cerquitelli, Real-time analysis of physiological data to support medical applications, IEEE Transactions on Information Technology in Biomedicine 13 (2009) 313–321.