<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1145/3503161.3551594</article-id>
      <title-group>
        <article-title>Non-invasive AI-powered Diagnostics: The case of Voice-Disorder Detection - Vision paper</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriele Ciravegna</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alkis Koudounas</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Fantini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tania Cerquitelli</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Baralis</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erika Crosetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Succo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ENT Clinic - Head and Neck Cancer Unit, San Giovanni Bosco Hospital</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ENT Unit, San Feliciano Hospital</institution>
          ,
          <addr-line>Rome</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Politecnico di Torino, Corso Duca degli Abruzzi</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>202</volume>
      <fpage>3451</fpage>
      <lpage>3460</lpage>
      <abstract>
        <p>This paper proposes a novel pipeline for non-invasive diagnosis and monitoring in healthcare, leveraging artificial intelligence (AI). The pipeline allows individuals to record various health data using everyday devices and analyze it via AI algorithms on a cloud-based platform. Experimental results on voice disorder detection demonstrate the effectiveness of the proposed approach when compared to existing solutions. Additionally, we discuss the positive impact of the pipeline on diagnosis, prognosis, and monitoring, emphasizing its non-invasive nature. Overall, we think the proposed pipeline might contribute to advancing AI-driven healthcare solutions with implications for global healthcare delivery.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Non-invasive diagnostics</kwd>
        <kwd>Voice disorder recognition</kwd>
        <kwd>Voice analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Artificial Intelligence (AI) is increasingly integrated into healthcare, offering opportunities to improve diagnostics, treatment, and patient care. Through machine learning algorithms, AI systems can analyze medical data, providing clinical decision support and personalized treatment options [<xref ref-type="bibr" rid="ref1">1</xref>]. Non-invasive diagnostics is a critical aspect of modern healthcare delivery, offering patients a less intrusive and more comfortable care experience. This holds for both the diagnostic and the monitoring processes, improving the overall quality of life for patients [<xref ref-type="bibr" rid="ref2">2</xref>]. Beyond enhancing patient comfort, these methods also improve accessibility to healthcare services, particularly for underserved populations or those with limited access to specialized medical facilities.</p>
      <p>We propose an AI-based framework for non-invasive diagnostics that integrates various data modalities, including voice recordings, self-pictures, typing patterns, ECG readings, and sleep analysis, among others. These diverse sources of data are collected through everyday devices such as computers, smartphones, and smartwatches, enabling convenient and continuous monitoring of individual health metrics. Subsequently, these data are fed into cloud-based applications where advanced AI algorithms analyze them, identifying patterns that may be indicative of underlying health issues. Also, the integration of multiple data sources allows for a holistic understanding of an individual’s health, facilitating early detection and personalized intervention strategies.</p>
      <p>The envisioned framework has the potential to improve healthcare delivery as well as have economic and technological impacts. First, by enabling large-scale screening, it may increase early detection, allowing for timely intervention and improved treatment outcomes. Second, by analyzing the evolution of patient data over time, healthcare providers may tailor interventions, optimizing treatment efficacy and patient satisfaction. From an economic perspective, the improved efficiency of diagnosis and treatments may reduce overall healthcare costs. Finally, the developed models may be employed or fine-tuned in related data-scarce contexts.</p>
      <p>As part of our investigation, we conducted a preliminary study on voice disorder detection, a key aspect of non-invasive diagnostics. We trained a deep learning model for analyzing voice recordings that achieves very high precision in detecting the presence of pathology and accurately identifies the type of pathology. We employed a transformer model [<xref ref-type="bibr" rid="ref3">3</xref>], trained end-to-end (E2E) directly on the raw data, outperforming traditional methods such as convolutional neural networks (CNN) trained on frequency-transformed data. These preliminary results demonstrate the potential of our approach in enabling accurate and efficient non-invasive diagnostics for voice disorders.</p>
      <p>The paper begins with an introduction to AI and its medical applications (Section 2). It then outlines the proposed idea and addresses foreseen challenges (Section 3). We then present the preliminary results on a voice disorder detection case in Section 4, and conclude with a discussion on the framework’s impact (Section 5).</p>
      <p>Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy.
gabriele.ciravegna@polito.it (G. Ciravegna)</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. AI for non-invasive medicine</title>
      <sec id="sec-2-1">
        <p>In this paper, we propose a framework for analyzing an individual’s health condition over time through non-invasive AI-powered diagnostics. Our proposed pipeline embodies a user-centric approach, empowering individuals to actively participate in their healthcare journey. From a medical point of view, we consider the analysis of all pathologies detectable by human experts through non-invasive diagnosis. Drawing parallels with advancements in other domains, we argue that AI models can replicate and potentially surpass human diagnostic accuracy in this domain as well. This solution may enable large-scale screening through simple but accurate analysis of the patient data collected remotely in a non-invasive way.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Background</title>
      <p>Deep Learning and Transformers. Deep Learning is a robust method for uncovering patterns and insights from extensive datasets [<xref ref-type="bibr" rid="ref4">4</xref>]. Unlike conventional machine learning approaches, Deep Learning models learn directly from raw data in an E2E manner, without the need for manual feature engineering. Transformer models are a prominent class of Deep Learning architectures [<xref ref-type="bibr" rid="ref3">3</xref>], demonstrating the effectiveness of this approach in analyzing sequential and multimodal data [5, 6]. Unlike traditional recurrent and convolutional neural networks, Transformers can capture long-range dependencies and preserve contextual information over extended sequences, thanks to their self-attention mechanisms. This characteristic enables Transformers to process raw multimodal sequential data, such as text or time-series data, as well as images, making them highly adaptable for various applications, including medical diagnostics [7, 8].</p>
      <p>Data storage and analysis. Upon collection, the data are uploaded to a cloud-based database and analyzed by means of a single advanced artificial intelligence model. Employing the aforementioned transformer models accommodates diverse data modalities and sources. By integrating multi-modal data processing capabilities, our framework aims to capture a holistic view of an individual’s health status, enabling comprehensive and personalized diagnostics. Moreover, the continuous collection of health data makes it possible to account for the evolution of a condition over time and to perform predictive modelling. The model may also evolve and improve over time by means of the new data.</p>
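      <p>As an illustration of the self-attention mechanism mentioned in the Deep Learning and Transformers paragraph, the following minimal sketch computes scaled dot-product attention over a toy sequence. It is not the model used in this paper: it assumes a single head with identity query, key, and value projections, and the names softmax, self_attention, and frames are hypothetical.</p>
      <preformat>
```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Single-head self-attention with identity projections (Q = K = V = x).

    x is a list of T feature vectors, each of dimension d (an assumption
    made for brevity; real Transformers learn separate projections).
    Returns (outputs, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    weights = []
    for q in x:
        # Dot product of the query with every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all value vectors.
    outputs = [
        [sum(w * v[c] for w, v in zip(w_row, x)) for c in range(d)]
        for w_row in weights
    ]
    return outputs, weights


# Toy sequence: the first and last frames are identical, the middle one differs.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(frames)
```
      </preformat>
      <p>In this toy input, the first frame attends to the last frame as strongly as to itself, regardless of the distance between them, which is the property that lets attention relate distant positions in a sequence.</p>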
      <sec id="sec-3-1">
        <p>Data collection. In Figure 1 we report a visualization of the proposed framework. Leveraging everyday devices such as smartphones, laptops, or wearables, users can effortlessly record diverse types of data. Clearly, each type of device enables the collection of different data types. Laptops and smartphones allow collecting typing and click patterns, texts, voice recordings, and pictures (the last two particularly through smartphones). Wearables, on the other hand, allow (and are already employed for) monitoring heart rate, blood pressure, athletic performance and sleep conditions, to cite a few.</p>
        <p>DL for Medicine. Deep Learning has emerged as a transformative technology in various healthcare applications. First, DL can analyze electronic health records [9], enabling personalized treatment recommendations and predictive analytics for patient outcomes. In medical imaging [10], DL models can interpret radiological images, including X-rays, CT scans, and MRI images, with levels of accuracy comparable to or even surpassing that of human experts. These models have been utilized for tasks such as disease classification, lesion detection, and tumor segmentation, the latter also performed in real-time during surgery. In drug discovery, AI algorithms have been shown capable of developing novel drugs through accurate predictions of protein structures [11]. Finally, DL models have also shown increased capability in handling multi-omics and multi-modal data [12].</p>
        <p>Result visualization. The processed results are then presented to users through an intuitive and user-controlled application interface. Through this interface, users gain valuable insights into their health metrics, facilitating informed and proactive decisions regarding their healthcare. Furthermore, a user can transmit the processed data and diagnostic results to their remote healthcare professionals, allowing them to make informed clinical decisions and timely interventions. Additionally, healthcare providers can utilize the data for population-level health monitoring and epidemiological studies, facilitating the identification of emerging health trends and proactive public health interventions.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.1. Challenges</title>
        <p>The effectiveness of the proposed framework also relies on our ability to address and resolve a number of both technical and non-technical issues.</p>
        <sec id="sec-3-2-1">
          <title>3.1.1. Data quality</title>
          <p>The collection of data through personal devices
introduces the risk of data noise, which encompasses various
factors such as sensor inaccuracies, environmental
interference, and incorrect user input. Sensor inaccuracies
may lead to erroneous measurements, while
environmental interference, such as background noise or lighting
conditions, can distort the recorded data. Additionally,
users may input incorrect data, such as taking pictures
of the wrong part of their body, which can further
exacerbate data noise and impact the analysis model. These
challenges can hinder the model’s ability to accurately
interpret and analyze the data, potentially leading to
erroneous conclusions and suboptimal performance.</p>
          <p>Research direction: data augmentation + input data checking. To address the challenges posed by sensor inaccuracies and environmental interference, a robust data augmentation pipeline can be implemented to mitigate these sources of noise. By incorporating various data augmentation techniques such as noise injection, signal filtering, and data synthesis, the pipeline can generate diverse and representative training data that encapsulates the variability present in real-world scenarios [13]. Specific preprocessing tailored to each data modality can help normalize and enhance the quality of the collected data. Furthermore, to mitigate the issue of incorrect user input, a dedicated model can be employed to validate the accuracy of the collected data. This model can check each type of collected data, such as images or sensor readings, to verify their correctness and flag any discrepancies.</p>
        </sec>
        <sec>
          <title>3.1.2. Privacy preservation</title>
          <p>The second problem arises from privacy concerns associated with collecting sensitive data and transmitting it to a cloud-based database. Individuals may be hesitant to share sensitive information due to concerns about data security and privacy breaches [14]. Transmitting such data to a cloud-based database further exacerbates these concerns, as it involves relinquishing control over personal information to third-party service providers. Moreover, regulatory compliance [15] imposes stringent requirements on the handling of personal health information, adding complexity to the data collection process.</p>
          <p>Research direction: a federated learning approach. The privacy issue can be effectively addressed through the adoption of a federated approach. Federated learning is an area of machine learning studying how to coordinate and distribute training over several models without exchanging raw data. The goal is to mitigate privacy concerns associated with transmitting sensitive data to a centralized cloud-based database. Furthermore, personalized models generated through federated learning may exhibit higher diagnostic accuracy, as they are trained on data specific to each user’s health profile.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.1.3. Human understanding &amp; trustworthiness</title>
          <p>The third issue concerns the explainability challenge associated with employing deep learning-based black box models for predictions. While these models offer impressive performance, their inherent complexity makes them opaque and difficult to interpret. This lack of explainability presents challenges for clinicians and end-users who require insights into the model’s decision-making process [16, 17, 18, 19]. Without transparent explanations, stakeholders and regulatory institutions may hesitate to trust the diagnostic and monitoring framework.</p>
          <p>Research direction: Concept-based XAI models. A possible solution is the employment of eXplainable AI (XAI) algorithms that can shed light on the decisions of the models [20, 21, 22]. In particular, Concept-based XAI models offer intrinsic interpretability by mapping raw data to interpretable high-level concepts before making class predictions, offering insights into the model’s decision-making process [23, 24, 25]. This approach not only addresses the explainability challenge but also fosters trust and confidence in the diagnostic and monitoring outcomes produced by the system.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. The case of voice disorder detection</title>
      <sec id="sec-4-1">
        <p>Vocal disorders are prevalent pathologies affecting a significant portion of the population and exerting a substantial impact on patients’ quality of life [26, 27, 28, 29]. These disorders may originate from various causes, including both benign and malignant conditions, and neurodegenerative disorders [30, 31, 32]. Diagnosis often relies on clinicians’ auditory assessments of patients’ voices, highlighting the critical need for accurate and timely detection. Here, a DL model is used to analyze the raw recordings and automatically detect patterns indicative of vocal disorders and distinguish between various pathologies, including nodules, polyps, cysts, spasmodic dysphonia or vocal cord paralysis.</p>
      </sec>
      <sec id="sec-4-2">
        <p>Preliminary experiments. We demonstrate significant advancements over prior attempts in voice disorder detection using AI models [33, 34]<sup>1</sup>. As reported in Figure 2, our approach achieves notable improvements in accuracy, up to +30%, primarily attributed to the use of a Transformer model rather than a Convolutional Neural Network (CNN). The Transformer’s inherent ability to process raw time-series data E2E without any time-frequency preprocessing offers distinct advantages in analyzing voice data [35]. Additionally, we introduce a robust data augmentation pipeline and consider both vowel and sentence-based recordings, further enhancing performance by up to +10% compared to using a standard Transformer model alone. As reported in Figure 3, similar considerations also hold for the classification of the macro-category pathology. These enhancements underscore the efficacy of our approach in diagnosing vocal disorders.</p>
        <p><sup>1</sup> We re-implemented these models for a fair comparison.</p>
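        <p>Noise injection, one of the augmentation techniques named in Section 3.1.1, can be sketched for raw audio as follows. This is an illustrative assumption and not the augmentation pipeline evaluated above: it adds white Gaussian noise at a chosen signal-to-noise ratio, and the function names, the 20 dB target, and the synthetic 220 Hz tone are hypothetical choices.</p>
        <preformat>
```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Return a noisy copy of `signal` with Gaussian noise at `snr_db` dB."""
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR (dB) = 10 * log10(signal_power / noise_power), solved for noise_power.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the augmentation is reproducible
# One second of a 220 Hz tone at 16 kHz stands in for a sustained vowel.
sample_rate = 16000
clean = [
    math.sin(2.0 * math.pi * 220.0 * t / sample_rate)
    for t in range(sample_rate)
]
noisy = add_noise_at_snr(clean, snr_db=20.0, rng=rng)
snr = measured_snr_db(clean, noisy)
```
        </preformat>
        <p>Drawing the target SNR at random for each training example, rather than fixing it, is one way such a step could expose a model to a range of recording conditions.</p>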
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <sec id="sec-5-1">
        <p>Overall, the proposed pipeline holds promise for improving healthcare delivery, leveraging AI to enable non-invasive diagnostics and monitoring. The medical team involved in the development of this pipeline also believes in its transformative impact from a social, technological, and economic point of view.</p>
        <p>Social Impact. By enabling individuals to record various types of data remotely using everyday devices, the proposed pipeline facilitates non-invasive diagnostic procedures, reducing the need for expensive and invasive tests. The accessibility of the proposed pipeline extends beyond traditional healthcare settings, allowing individuals in remote or underserved areas to access diagnostic services conveniently. Individuals can be monitored remotely, allowing healthcare providers to track their health status in real-time and intervene promptly if abnormalities are detected [36]. Early detection and intervention facilitated by the pipeline may lead to improved prognoses for patients, as healthcare providers can initiate treatment at earlier stages of disease progression.</p>
        <p>Technological Impact. The proposed framework has significant technological implications. By leveraging the adaptable nature of the framework, models developed for one medical application can be readily deployed and tested in related contexts, accelerating the pace of medical research and innovation. As an example, the presented voice disorder detection model could be tested on patients with neurodegenerative diseases. Additionally, the framework enables the fine-tuning of models for specific cases, including those with limited data availability, such as rare diseases. Despite the scarcity of data in such contexts, the model may still generalize thanks to its original training on a larger and more diverse dataset.</p>
        <p>Economic Impact. From an economic point of view, the proposed pipeline has low operating costs due to the utilization of personal devices for data collection and cloud-based analysis. Additionally, the framework facilitates the collection of low-cost, virtuous-cycle data, enabling continuous monitoring and feedback loops that enhance the accuracy and effectiveness of diagnostic and monitoring processes over time. Moreover, the framework lowers the burden of diagnosis and monitoring on public healthcare structures. This, in turn, enables healthcare professionals to focus on more complex and critical medical issues, ultimately improving the efficiency and resource allocation of the healthcare system.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rajpurkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Topol</surname>
          </string-name>
          ,
          <article-title>AI in health and medicine</article-title>
          ,
          <source>Nature Medicine</source>
          <volume>28</volume>
          (
          <year>2022</year>
          )
          <fpage>31</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Rueda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pascual</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Casacuberta</surname>
          </string-name>
          ,
          <article-title>Non-invasive interventions for improving well-being and quality of life in patients with lung cancer</article-title>
          ,
          <source>Cochrane Database of Systematic Reviews</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
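          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>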
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
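          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Deep learning</article-title>
          ,
          <source>Nature</source>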
          <volume>521</volume>
          (
          <year>2015</year>
          )
          <fpage>436</fpage>
          -
          <lpage>444</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>