

Non-invasive AI-powered Diagnostics: The case of Voice-Disorder Detection - Vision paper

Gabriele Ciravegna

Alkis Koudounas

Marco Fantini

Tania Cerquitelli

Elena Baralis

Erika Crosetti

Giovanni Succo

ENT Clinic - Head and Neck Cancer Unit, San Giovanni Bosco Hospital, Turin, Italy
ENT Unit, San Feliciano Hospital, Rome, Italy
Politecnico di Torino, Corso Duca degli Abruzzi, Turin, Italy


This paper proposes a novel pipeline for non-invasive diagnosis and monitoring in healthcare, leveraging artificial intelligence (AI). The pipeline allows individuals to record various health data using everyday devices and analyze it via AI algorithms on a cloud-based platform. Experimental results on voice disorder detection demonstrate the effectiveness of the proposed approach when compared to existing solutions. Additionally, we discuss the positive impact of the pipeline on diagnosis, prognosis, and monitoring, emphasizing its non-invasive nature. Overall, we think the proposed pipeline might contribute to advancing AI-driven healthcare solutions with implications for global healthcare delivery.

Keywords: Artificial Intelligence, Non-invasive diagnostics, Voice disorder recognition, Voice analysis

1. Introduction

Artificial Intelligence (AI) is increasingly integrated into healthcare, offering opportunities to improve diagnostics, treatment, and patient care. Through machine learning algorithms, AI systems can analyze medical data, providing clinical decision support and personalized treatment options [1]. Non-invasive diagnostics is a critical aspect of modern healthcare delivery, offering patients a less intrusive and more comfortable care experience. This holds for both the diagnostic and the monitoring processes, improving the overall quality of life for patients [2]. Beyond enhancing patient comfort, these methods also improve accessibility to healthcare services, particularly for underserved populations or those with limited access to specialized medical facilities.

We propose an AI-based framework for non-invasive diagnostics that integrates various data modalities, including voice recordings, self-pictures, typing patterns, ECG readings, and sleep analysis, among others. These diverse sources of data are collected through everyday devices such as computers, smartphones, and smartwatches, enabling convenient and continuous monitoring of individual health metrics. Subsequently, these data are fed into cloud-based applications where advanced AI algorithms analyze them, identifying patterns that may be indicative of underlying health issues. Also, the integration of multiple data sources allows for a holistic understanding of an individual’s health, facilitating early detection and personalized intervention strategies.

The envisioned framework has the potential to improve healthcare delivery as well as have economic and technological impacts. First, by enabling large-scale screening, it may increase early detection, allowing for timely intervention and improved treatment outcomes. Second, by analyzing the evolution of patient data in time, healthcare providers may tailor interventions, optimizing treatment efficacy and patient satisfaction. From an economic perspective, the improved efficiency of diagnosis and treatments may reduce overall healthcare costs. Finally, the developed models may be employed or fine-tuned in related data-scarce contexts.

As part of our investigation, we conducted a preliminary study on voice disorder detection, a key aspect of non-invasive diagnostics. We trained a deep learning model for analyzing voice recordings that achieves very high precision in detecting the presence of pathology and accurately identifies the type of pathology. We employed a transformer model [3], trained end-to-end (E2E) directly on the raw data, outperforming traditional methods such as convolutional neural networks (CNN) trained on frequency-transformed data. These preliminary results demonstrate the potential of our approach in enabling accurate and efficient non-invasive diagnostics for voice disorders.

The paper begins with an introduction to AI and its medical applications (Section 2). It then outlines the proposed idea, and it addresses foreseen challenges (Section 3). We then present the preliminary results on a voice disorder detection case in Section 4, and conclude with a discussion on the framework’s impact (Section 5).

Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy.
Email: gabriele.ciravegna@polito.it (G. Ciravegna)

© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

2. Background

Deep Learning and Transformers. Deep Learning is a robust method for uncovering patterns and insights from extensive datasets [4]. Unlike conventional machine learning approaches, Deep Learning models learn directly from raw data in an E2E manner, without the need for manual feature engineering. Transformer models are a prominent class of Deep Learning architectures [3], demonstrating the effectiveness of this approach in analyzing sequential and multimodal data [5, 6]. Unlike traditional recurrent and convolutional neural networks, Transformers can capture long-range dependencies and preserve contextual information over extended sequences, thanks to their self-attention mechanisms. This characteristic enables Transformers to process raw multimodal sequential data, such as text or time-series data, as well as images, making them highly adaptable for various applications, including medical diagnostics [7, 8].

DL for Medicine. Deep Learning has emerged as a transformative technology in various healthcare applications. First, DL can analyze electronic health records [9], enabling personalized treatment recommendations and predictive analytics for patient outcomes. In medical imaging [10], DL models can interpret radiological images, including X-rays, CT scans, and MRI images, with levels of accuracy comparable to or even surpassing that of human experts. These models have been utilized for tasks such as disease classification, lesion detection, and tumor segmentation, the latter also performed in real time during surgery. In drug discovery, AI algorithms have been shown capable of developing novel drugs through accurate predictions of protein structures [11]. Finally, DL models have also shown increased capability in handling multi-omics and multi-modal data [12].

3. AI for non-invasive medicine

In this paper, we propose a framework for analyzing an individual’s health condition in time through non-invasive AI-powered diagnostics. Our proposed pipeline embodies a user-centric approach, empowering individuals to actively participate in their healthcare journey. From a medical point of view, we consider the analysis of all pathologies detectable by human experts through non-invasive diagnosis. Drawing parallels with advancements in other domains, we argue that AI models can replicate and potentially surpass human diagnostic accuracy also in this domain. This solution may enable large-scale screening through simple but accurate analysis of the patient data collected remotely in a non-invasive way.

Data collection. In Figure 1 we report a visualization of the proposed framework. Leveraging everyday devices such as smartphones, laptops, or wearables, users can effortlessly record diverse types of data. Clearly, each type of device enables the collection of different data types. Laptops and smartphones allow collecting typing and click patterns, texts, voice recordings, and pictures (the last two particularly through smartphones). Wearables, on the other hand, allow (and are already employed for) monitoring heart rate, blood pressure, athletic performance and sleep conditions, to cite a few.

Data storage and analysis. Upon collection, the data are uploaded to a cloud-based database and analyzed by means of a single advanced artificial intelligence model. The employment of the aforementioned transformer models accommodates diverse data modalities and sources. By integrating multi-modal data processing capabilities, our framework aims to capture a holistic view of an individual’s health status, enabling comprehensive and personalized diagnostics. Moreover, the collection of continuous health data enables taking into consideration in-time evolution and predictive modelling. The model may also evolve and improve over time by means of the new data.

Result visualization. The processed results are then presented to users through an intuitive and user-controlled application interface. Through this interface, users gain valuable insights into their health metrics, facilitating informed and proactive decisions regarding their healthcare. Furthermore, a user can transmit the processed data and diagnostic results to their remote healthcare professionals, allowing them to make informed clinical decisions and timely interventions. Additionally, healthcare providers can utilize the data for population-level health monitoring and epidemiological studies, facilitating the identification of emerging health trends and proactive public health interventions.

3.1. Challenges

The effectiveness of the proposed framework also relies on our ability to address and resolve a number of technical and non-technical issues.

3.1.1. Data quality

The collection of data through personal devices introduces the risk of data noise, which encompasses various factors such as sensor inaccuracies, environmental interference, and incorrect user input. Sensor inaccuracies may lead to erroneous measurements, while environmental interference, such as background noise or lighting conditions, can distort the recorded data. Additionally, users may input incorrect data, such as taking pictures of the wrong part of their body, which can further exacerbate data noise and impact the analysis model. These challenges can hinder the model’s ability to accurately interpret and analyze the data, potentially leading to erroneous conclusions and suboptimal performance.

Research direction: data augmentation + input data checking. To address the challenges posed by sensor inaccuracies and environmental interference, a robust data augmentation pipeline can be implemented to mitigate these sources of noise. By incorporating various data augmentation techniques such as noise injection, signal filtering, and data synthesis, the pipeline can generate diverse and representative training data that encapsulates the variability present in real-world scenarios [13]. Specific preprocessing tailored to each data modality can help normalize and enhance the quality of the collected data. Furthermore, to mitigate the issue of incorrect user input, a dedicated model can be employed to validate the accuracy of the collected data. This model can check each type of collected data, such as images or sensor readings, to verify their correctness and flag any discrepancies.

3.1.2. Privacy preservation

The second problem arises from privacy concerns associated with collecting sensitive data and transmitting it to a cloud-based database. Individuals may be hesitant to share sensitive information due to concerns about data security and privacy breaches [14]. Transmitting such data to a cloud-based database further exacerbates these concerns, as it involves relinquishing control over personal information to third-party service providers. Moreover, regulatory compliance [15] imposes stringent requirements on the handling of personal health information, adding complexity to the data collection process.

Research direction: a federated learning approach. The privacy issue can be effectively addressed through the adoption of a federated approach. Federated learning is an area of machine learning studying how to coordinate and distribute training over several models without exchanging raw data. The goal is to mitigate privacy concerns associated with transmitting sensitive data to a centralized cloud-based database. Furthermore, personalized models generated through federated learning may exhibit higher diagnostic accuracy, as they are trained on data specific to each user’s health profile.

3.1.3. Human understanding & trustworthiness

The third issue concerns the explainability challenge associated with employing deep learning-based black box models for predictions. While these models offer impressive performance, their inherent complexity makes them opaque and difficult to interpret. This lack of explainability presents challenges for clinicians and end-users who require insights into the model’s decision-making process [16, 17, 18, 19]. Without transparent explanations, stakeholders and regulatory institutions may hesitate to trust the diagnostic and monitoring framework.

Research direction: Concept-based XAI models. A possible solution is represented by the employment of eXplainable AI (XAI) algorithms that can shed light on the decisions of the models [20, 21, 22]. Particularly, Concept-based XAI models offer intrinsic interpretability by mapping raw data to interpretable high-level concepts before making class predictions, offering insights into the model’s decision-making process [23, 24, 25]. This approach not only addresses the explainability challenge but also fosters trust and confidence in the diagnostic and monitoring outcomes produced by the system.

4. The case of voice disorder detection

Vocal disorders are prevalent pathologies affecting a significant portion of the population and exerting a substantial impact on patients’ quality of life [26, 27, 28, 29]. These disorders may originate from various causes, including both benign and malignant conditions, and neurodegenerative disorders [30, 31, 32]. Diagnosis often relies on clinicians’ auditory assessments of patients’ voices, highlighting the critical need for accurate and timely detection. Here, a DL model is used to analyze the raw recordings and automatically detect patterns indicative of vocal disorders and distinguish between various pathologies, including nodules, polyps, cysts, spasmodic dysphonia or vocal cord paralysis.

Preliminary experiments. We demonstrate significant advancements over prior attempts in voice disorder detection using AI models [33, 34], which we re-implemented for a fair comparison. As reported in Figure 2, our approach achieves notable improvements in accuracy, up to +30%, primarily attributed to the use of a Transformer model rather than a Convolutional Neural Network (CNN). The Transformer’s inherent ability to process raw time-series data E2E without any time-frequency preprocessing offers distinct advantages in analyzing voice data [35]. Additionally, we introduce a robust data augmentation pipeline and consider both vowel and sentence-based recordings, further enhancing performance by up to +10% compared to the employment of a standard Transformer model only. As reported in Figure 3, similar considerations also hold for the classification of the macro-category pathology. These enhancements underscore the efficacy of our approach in diagnosing vocal disorders.
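As an illustration of the kind of noise-injection step that an augmentation pipeline for raw voice recordings might include, the following minimal sketch adds white Gaussian noise to a waveform at a chosen signal-to-noise ratio (SNR). It is not the pipeline used in the experiments: the helper names `signal_power` and `add_noise_at_snr`, the 20 dB SNR, the 16 kHz sampling rate, and the synthetic sine wave standing in for a sustained vowel are all assumptions made for illustration.

```python
import math
import random


def signal_power(samples):
    """Mean squared amplitude of a waveform given as a list of floats."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(samples, snr_db, rng=None):
    """Return a noisy copy of `samples` with approximately the requested SNR.

    Hypothetical helper: white Gaussian noise is scaled so that
    10 * log10(P_signal / P_noise) equals `snr_db` in expectation.
    """
    rng = rng or random.Random()
    noise_power = signal_power(samples) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# Synthetic stand-in for a sustained vowel: 1 s of a 220 Hz tone at 16 kHz.
SAMPLE_RATE = 16000
clean = [math.sin(2 * math.pi * 220 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

# Fixed seed so the example is reproducible.
noisy = add_noise_at_snr(clean, snr_db=20, rng=random.Random(0))

# Measure the SNR actually obtained on this particular noise draw.
noise = [n - c for n, c in zip(noisy, clean)]
measured_snr_db = 10 * math.log10(signal_power(clean) / signal_power(noise))
print(f"measured SNR: {measured_snr_db:.1f} dB")
```

In a training loop, one would typically draw `snr_db` at random from a range for each example, so that the model is exposed to varied recording conditions rather than a single noise level.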

5. Discussion

Overall, the proposed pipeline holds promise for improving healthcare delivery, leveraging AI to enable non-invasive diagnostics and monitoring. The medical team involved in the development of this pipeline also believes in its transformative impact from a social, technological, and economic point of view.

Social Impact. By enabling individuals to record various types of data remotely using everyday devices, the proposed pipeline facilitates non-invasive diagnostic procedures, eliminating the need for expensive and invasive tests. The accessibility of the proposed pipeline extends beyond traditional healthcare settings, allowing individuals in remote or underserved areas to access diagnostic services conveniently. Individuals can be monitored remotely, allowing healthcare providers to track their health status in real time and intervene promptly if abnormalities are detected [36]. Early detection and intervention facilitated by the pipeline lead to improved prognoses for patients, as healthcare providers can initiate treatment at earlier stages of disease progression.

Technological Impact. The proposed framework has significant technological implications. By leveraging the adaptable nature of the framework, models developed for one medical application can be readily deployed and tested in related contexts, accelerating the pace of medical research and innovation. As an example, the presented voice disorder detection model could be tested on patients with neurodegenerative disorders. Additionally, the framework enables the fine-tuning of models for specific cases, including those with limited data availability, such as rare diseases. Despite the scarcity of data in such contexts, the model can still generalize due to its original training on a larger and more diverse dataset.

Economic Impact. From an economic point of view, the proposed pipeline has low operating costs due to the utilization of personal devices for data collection and cloud-based analysis. Additionally, the framework facilitates the collection of low-cost, virtuous-cycle data, enabling continuous monitoring and feedback loops that enhance the accuracy and effectiveness of diagnostic and monitoring processes over time. Moreover, the framework lowers the burden of diagnosis and monitoring on public healthcare structures. This, in turn, enables healthcare professionals to focus on more complex and critical medical issues, ultimately improving the efficiency and resource allocation of the healthcare system.

[1] Rajpurkar, Chen, Banerjee, E. J. Topol, AI in health and medicine, Nature Medicine 28 (2022) 31-38.

[2] J.-R. Rueda, I. Sola, Pascual, M. S. Casacuberta, Non-invasive interventions for improving well-being and quality of life in patients with lung cancer, Cochrane Database of Systematic Reviews (2011).

[3] Vaswani, Shazeer, Parmar, Uszkoreit, Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).

[4] LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436-444.