<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Metaverse for Intelligent Healthcare: Opportunities and Challenges</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerio Guarrasi</string-name>
          <email>valerio.guarrasi@unicampus.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Tronchin</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Camillo Maria Caruso</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aurora Rofena</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guido Manni</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fatih Aksu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domenico Paolo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giulio Iannello</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rosa Sicilia</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ermanno Cordelli</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Soda</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Biomedical Sciences, Humanitas University</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Radiation Sciences, Radiation Physics, Biomedical Engineering, Umeå University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Unit of Computer Systems and Bioinformatics, Department of Engineering, University Campus Bio-Medico of Rome</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>This paper discusses the development of a metaverse for intelligent healthcare, which involves creating a virtual environment where healthcare professionals, patients, and researchers can interact and collaborate using digital technologies. The metaverse can improve the efficiency and effectiveness of healthcare services and provide new opportunities for research and innovation. AI models are necessary for analyzing patient data and providing personalized healthcare recommendations, but the data in a metaverse setting is inherently multimodal, unstructured, noisy, incomplete, limited, or partially inconsistent, which poses a challenge for AI models. Moreover, the integration of AI models becomes necessary for the development of virtual scanners to simulate image modalities, and of robotics to simulate surgical procedures within a virtual environment. The ultimate goal is to leverage the power of AI to enhance the quality of healthcare in a metaverse for intelligent healthcare, which has the potential to transform the way healthcare services are delivered and improve health outcomes for patients worldwide.</p>
      </abstract>
      <kwd-group>
        <kwd>Challenges</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The development of a metaverse for intelligent healthcare involves the creation of a virtual environment where healthcare professionals, patients, and researchers can interact and collaborate using digital technologies [1]. The metaverse can be thought of as a virtual world where users can engage with each other in real-time, create and manipulate digital objects, and perform complex tasks in a simulated environment. In the context of healthcare, the metaverse can be used to improve the efficiency and effectiveness of healthcare services, as well as provide new opportunities for research and innovation. For example, healthcare professionals can use the metaverse to perform virtual consultations with patients, monitor their health remotely, and collaborate with other healthcare providers in real-time.</p>
      <p>The development of a metaverse for intelligent healthcare requires a range of technologies and skills, including artificial intelligence (AI), virtual and augmented reality, natural language processing, and data analytics. These technologies can be used to create realistic simulations of healthcare scenarios, analyze patient data, and provide personalized healthcare recommendations. However, the data in such a setting is inherently multimodal, unstructured, noisy, incomplete, limited, or partially inconsistent, which poses a challenge for AI models. Therefore, there is a need for resilient AI, and it becomes necessary to integrate AI models into virtual scanners to generate virtual medical images for patients. These needs motivate the research directions overviewed in the remainder of this paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Multimodal Learning</title>
      <sec id="sec-2-1">
        <title>The healthcare metaverse can use multimodal data in var</title>
      </sec>
      <sec id="sec-2-2">
        <title>Multimodal data refers to data that is generated from</title>
        <p>new opportunities for research and innovation. For ex- limited, or partially inconsistent, which poses a challenge</p>
        <p>The development of a metaverse for intelligent health- ners to generate virtual medical images for patients. This
multiple sources [2].</p>
        <p>By combining data from diferent sources, such as
medical images, genetic information, medical history
available within the electronic health records, and lifestyle
data, the healthcare metaverse can exploit multimodal
data to create personalized patient health profiles that,
in turn, ofer the chance to tailor treatments and
interventions. This would help improve the accuracy and
eficiency of medical diagnoses and prognoses.</p>
        <p>Next sections overview research ongoing in our lab on
this topic.
2.1. When, which and how?
end-to-end model. It exploits Pareto multi-objective
optimization working with a performance metric and the
diversity score of multiple candidate unimodal neural
networks to be fused. We attain state-of-the-art results,
not only outperforming the baseline performance but
also being robust to external validation. Via this method,
we automatically understand “which” are the most suited
modalities and models for the task, “when” the fusion of
the modalities should occur, and “how” to optimally fuse
them.</p>
        <sec id="sec-2-2-1">
          <title>2.2. Multimodal Ensemble for Overall</title>
        </sec>
        <sec id="sec-2-2-2">
          <title>Survival</title>
          <p>Deep multimodal learning has shown promising perfor- In the context of lung cancer, multimodal learning is
mance also encompassing those attained by traditional becoming an increasingly important research area as
machine learning approaches. This happens because it can provide insights into the optimal treatment for
deep neural networks permit us to fuse the learners ex- aggressive tumors. In this particular work [5], we utilized
ploiting the loss backpropagation at diferent depths. the CLARO dataset [7], which includes CT images and
However, understanding “when”, “which”, and “how” clinical data from non-small-cell lung cancer patients, to
to fuse the modalities is the main open methodological investigate the use of multimodal learning for predicting
research question now opened. overall survival.</p>
          <p>The “when” question involves determining the optimal We employed a late fusion approach, which involves
depth in the network architecture to combine diferent training unimodal models on each modality separately
types of data, such as imaging and clinical data, to im- and then combining the results using an ensemble
prove accuracy and reliability. To determine the ideal method. We selected the optimal set of classifiers for
point of fusion among the diferent modalities in a joint the ensemble by solving a multi-objective optimization
fusion scenario, we have developed an iterative algorithm problem that maximizes both performance and diversity
that increases the number of fusion connections among of the unimodal models. The results of this study
demonconvolutional networks, enabling us to obtain optimal strate the potential of multimodal learning in improving
results. Our findings indicate that the gradual fusion the prediction of overall survival in lung cancer patients.
mechanism among modalities is highly efective, result- The proposed ensemble outperformed models trained on
ing in superior performance compared to traditional fu- a single modality and achieves state-of-the-art results on
sion techniques. the task at hand.</p>
          <p>The “which” question involves determining which
modalities and which models to fuse. Indeed, given a 2.3. Multimodal XAI
task characterized by a specific source of data available,
practitioners and researchers often need to find out the Explainable Artificial Intelligence (XAI) methods are
bemost relevant and useful models. In this respect in [3] coming increasingly important in the development of
we present a multi-objective optimized ensemble search intelligent healthcare systems, particularly in the context
that selects the best ensemble of networks to satisfy a of multimodal learning [8].
classification task, outperforming individual models. Our The use of multimodal learning can also make it more
method optimizes the search of this optimal ensemble by challenging to understand how the model arrived at its
deexploiting both evaluation and diversity metrics and it cision, particularly when the model is making decisions
was applied to various contexts like COVID-19 diagno- based on multiple inputs. This is where XAI methods
sis [4] and lung cancer overall survival [5] (deepened in become essential. XAI methods can help healthcare
prothe next subsection). fessionals and patients understand how the model arrived</p>
          <p>The “how” question deals with determining how to at its decision by providing interpretable and transparent
fuse the modalities. This involves determining the best explanations. This can help increase trust in the system
method for integrating diferent types of data, such as and ultimately improve patient outcomes.
using convolutional neural networks to analyze images In the context of a metaverse for intelligent healthcare,
and recurrent neural networks to analyze time-series XAI methods could be used to explain how the system
data. is diagnosing a patient based on their symptoms and</p>
          <p>To cope with these three questions in [6] we present medical history. This explanation could be provided in a
a novel approach optimizing the setup of a multimodal
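        <p>To make the selection step concrete, the snippet below is a minimal Python sketch of Pareto-front filtering over a performance metric and a diversity score; the candidate fusions and their scores are hypothetical placeholders, not results from [6]:</p>
        <preformat>
# Minimal sketch: Pareto selection over (performance, diversity).
# Candidate names and scores are hypothetical placeholders.
candidates = [
    ("cnn_ct + mlp_clinical", 0.86, 0.42),
    ("cnn_ct + cnn_xray", 0.84, 0.55),
    ("mlp_clinical + cnn_xray", 0.81, 0.61),
    ("cnn_ct + mlp_clinical + cnn_xray", 0.87, 0.38),
]

def dominates(a, b):
    """a dominates b if it is at least as good on both objectives
    and strictly better on at least one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])

# keep only candidates that no other candidate dominates
pareto_front = [c for c in candidates
                if not any(dominates(o, c) for o in candidates if o is not c)]
print([name for name, _, _ in pareto_front])
        </preformat>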
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Multimodal Ensemble for Overall Survival</title>
        <p>In the context of lung cancer, multimodal learning is becoming an increasingly important research area, as it can provide insights into the optimal treatment for aggressive tumors. In this particular work [5], we utilized the CLARO dataset [7], which includes CT images and clinical data from non-small-cell lung cancer patients, to investigate the use of multimodal learning for predicting overall survival.</p>
        <p>We employed a late fusion approach, which involves training unimodal models on each modality separately and then combining the results using an ensemble method. We selected the optimal set of classifiers for the ensemble by solving a multi-objective optimization problem that maximizes both the performance and the diversity of the unimodal models. The results of this study demonstrate the potential of multimodal learning in improving the prediction of overall survival in lung cancer patients. The proposed ensemble outperformed models trained on a single modality and achieves state-of-the-art results on the task at hand.</p>
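        <p>The following minimal sketch illustrates the late-fusion idea, combining the class-probability outputs of separately trained unimodal models by weighted averaging; the two stand-in predictors and their outputs are hypothetical, not the CLARO pipeline:</p>
        <preformat>
# Minimal late-fusion sketch: unimodal models are trained separately
# and their predicted probabilities are combined at inference time.
import numpy as np

def predict_imaging(ct_features):       # placeholder unimodal model
    return np.array([0.3, 0.7])         # P(class 0), P(class 1)

def predict_clinical(clinical_feats):   # placeholder unimodal model
    return np.array([0.45, 0.55])

def late_fusion(prob_list, weights=None):
    """Combine unimodal probability vectors by (weighted) averaging."""
    probs = np.stack(prob_list)
    w = np.ones(len(prob_list)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    return w @ probs

fused = late_fusion([predict_imaging(None), predict_clinical(None)])
print(fused, fused.argmax())            # ensemble probabilities and decision
        </preformat>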
      <sec id="sec-2-3">
        <title>Missing data is a common problem in healthcare datasets,</title>
        <p>occurring when some information is not available for
some patients or variables in a dataset. Missing data not
only could bias the results, but it often contrasts with the
needs of AI models, which often require complete data
to function properly.</p>
        <p>There are several methods for dealing with missing
data in healthcare datasets, including imputation or
deletion techniques. However, each approach has its own
strengths and limitations, and the appropriate method
depends on the specifics of the dataset and the research
2.4. Federated Learning question being addressed.</p>
        <p>To address such issues, we developed a Transformer
It is common practice in AI research to collect data from model that masks the missing features during
trainmultiple sources and send it to a central server for compu- ing [12], so that the model ignores any missing data
tation. However, this approach poses several challenges during training. This approach eliminates the need for
in healthcare where sensitive patient data must be pro- imputation or deletion techniques, as the model simply
tected to avoid privacy violations. does not consider the missing data. In practice, the
pro</p>
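        <p>A minimal sketch of the latent-shift idea follows, assuming simple linear stand-ins for the encoders, decoders, and classifier (the actual architectures of [9] differ): the joint embedding is shifted along the classifier gradient to simulate a counterfactual, and each modality is scored by how much its reconstruction changes.</p>
        <preformat>
# Minimal latent-shift sketch with stand-in modules (illustrative only).
import torch
import torch.nn as nn

def latent_shift_importance(enc_img, enc_tab, dec_img, dec_tab,
                            classifier, x_img, x_tab, lam=1.0):
    z = torch.cat([enc_img(x_img), enc_tab(x_tab)], dim=1)
    z = z.detach().requires_grad_(True)       # leaf tensor for autograd
    grad = torch.autograd.grad(classifier(z).sum(), z)[0]
    z_shift = z - lam * grad                  # counterfactual embedding
    d_img = (dec_img(z_shift) - dec_img(z)).abs().mean()
    d_tab = (dec_tab(z_shift) - dec_tab(z)).abs().mean()
    return d_img.item(), d_tab.item()         # per-modality importance

# toy usage with linear stand-ins for the autoencoders and the MLP head
enc_i, enc_t = nn.Linear(32, 8), nn.Linear(10, 8)
dec_i, dec_t = nn.Linear(16, 32), nn.Linear(16, 10)
clf = nn.Linear(16, 1)
print(latent_shift_importance(enc_i, enc_t, dec_i, dec_t, clf,
                              torch.randn(4, 32), torch.randn(4, 10)))
        </preformat>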
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Federated Learning</title>
        <p>It is common practice in AI research to collect data from multiple sources and send it to a central server for computation. However, this approach poses several challenges in healthcare, where sensitive patient data must be protected to avoid privacy violations.</p>
        <p>To address this issue, federated learning has emerged as a potential solution [11]. This is a critical requirement for the development of an efficient metaverse of intelligent healthcare, where patient data is protected and used to enhance patient care. Federated learning allows for the training of a shared global model with a central server while keeping the data private in local institutions, thereby promoting a greener and more secure practice.</p>
        <p>In this area, we propose a new paradigm as a variant of the most widely used approach. Instead of training individual client models independently for a number of epochs each turn and then sending their weights back to the server, which aggregates them into a single set of weights and redistributes them to the models, the concept of a token is introduced, passed at each epoch sequentially or randomly among the clients; it allows the weights to be sent to the server only by its owner, who redistributes them directly to all models. The absence of local training epochs and the immediate broadcast allow the entire paradigm to be restructured by building a single model passed between clients, even eliminating the role of the server and thus halving the number of parameters sent in each round.</p>
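        <p>The snippet below is a toy sketch of one possible reading of this token-passing scheme in its server-less form: a single shared model travels among clients and only the current token holder trains it each round, so no aggregation step is needed. Clients and data are random placeholders.</p>
        <preformat>
# Toy sketch of token-passing, server-less federated training.
import random
import torch
import torch.nn as nn

clients = [(torch.randn(16, 4), torch.randn(16, 1)) for _ in range(3)]
model = nn.Linear(4, 1)              # the one model passed between clients
opt = torch.optim.SGD(model.parameters(), lr=0.01)

order = list(range(len(clients)))
for rnd in range(5):
    random.shuffle(order)            # token handed randomly among clients
    for holder in order:             # only the token holder trains
        x, y = clients[holder]
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    print(f"round {rnd}: last holder loss {loss.item():.4f}")
        </preformat>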
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Resilient AI</title>
      <p>The problem of unstructured, noisy, incomplete, limited in number, or partially inconsistent data is a significant challenge for many areas of data-driven research, especially in healthcare. In AI, such situations could impact models’ accuracy and reliability, leading to incorrect or biased outcomes. Hence, developing resilient AI systems able to handle such types of data is crucial to ensure the ethical and responsible use of AI in various applications.</p>
      <sec id="sec-3-1">
        <title>3.1. Missing Features</title>
        <p>Missing data is a common problem in healthcare datasets, occurring when some information is not available for some patients or variables. Missing data not only could bias the results, but it often contrasts with the needs of AI models, which often require complete data to function properly.</p>
        <p>There are several methods for dealing with missing data in healthcare datasets, including imputation or deletion techniques. However, each approach has its own strengths and limitations, and the appropriate method depends on the specifics of the dataset and the research question being addressed.</p>
        <p>To address such issues, we developed a Transformer model that masks the missing features during training [12], so that the model ignores any missing data. This approach eliminates the need for imputation or deletion techniques, as the model simply does not consider the missing data. In practice, the proposed method leverages the idea of the mask inside the self-attention module to learn from incomplete input data, which signals to the model only the positional encoding, eliminating the input embeddings. Furthermore, to shift to the use of tabular data with a Transformer model, we introduce a novel type of positional encoding that identifies the feature and not the position.</p>
        <p>We experimentally validated the method on a classification task that aims to predict the overall survival of patients affected by non-small-cell lung cancer using clinical data from the CLARO [7] project. The results show that this approach overcomes the limitations of traditional imputation methods, reduces bias, and improves the accuracy of the final analysis.</p>
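        <p>A minimal sketch of the masking idea follows (an illustration under our assumptions, not the exact architecture of [12]): each tabular feature becomes a token, a learned per-feature embedding plays the role of the positional encoding, and missing features are excluded through the attention mask (src_key_padding_mask in PyTorch).</p>
        <preformat>
# Minimal sketch: Transformer over tabular tokens with missing features
# removed via the attention mask; per-feature embeddings identify the
# feature rather than the position. Illustrative, not the model of [12].
import torch
import torch.nn as nn

class MaskedTabularTransformer(nn.Module):
    def __init__(self, n_features, d_model=32):
        super().__init__()
        self.value_proj = nn.Linear(1, d_model)
        # encodes WHICH feature a token is, not its position
        self.feature_id = nn.Embedding(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x, missing_mask):
        # x: (batch, n_features); missing_mask: True where value is missing
        ids = torch.arange(x.shape[1], device=x.device)
        tokens = self.value_proj(x.nan_to_num().unsqueeze(-1)) + self.feature_id(ids)
        h = self.encoder(tokens, src_key_padding_mask=missing_mask)
        kept = (~missing_mask).unsqueeze(-1)           # pool observed tokens only
        pooled = (h * kept).sum(1) / kept.sum(1).clamp(min=1)
        return self.head(pooled)

x = torch.tensor([[0.5, float("nan"), 1.2, 0.0]])
model = MaskedTabularTransformer(n_features=4)
print(model(x, missing_mask=x.isnan()))
        </preformat>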
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Siamese Networks</title>
        <p>It is well known that AI’s power of analyzing vast amounts of data is an element lying behind models’ performance. However, data availability is a major barrier in many domains, healthcare and the metaverse included. To overcome this limitation, several works in the literature have studied how to learn in case of limited training data and, among them, Siamese networks are a viable alternative. They consist of two or more identical networks working in parallel, and triplet networks are an established subtype of this approach, where three identical networks are used. During training, two of the inputs of these three networks belong to the same class, while the third belongs to a different class. The main goal is to learn a feature space where each class forms a cluster, used in the inference step to classify new instances. Since triplet networks utilize inter-class diversities in addition to intra-class similarities, and the number of possible triplets is much higher than the number of instances, they can be a convenient option for applications where the data is limited.</p>
        <p>On a private dataset with 86 patients, representing a real case with limited data, we demonstrated that triplet networks outperform deep networks using the softmax classifier to predict histological NSCLC subtypes from CT scans.</p>
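        <p>The snippet below gives a minimal, self-contained triplet-network training loop with random placeholder data; the encoder and the margin are illustrative choices, not the configuration used in our study.</p>
        <preformat>
# Minimal triplet-network sketch: one weight-shared encoder applied three
# times, trained so same-class pairs end up closer than different-class
# pairs by a margin. Data are random placeholders.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 16))
loss_fn = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

anchor, positive = torch.randn(8, 64), torch.randn(8, 64)  # same class
negative = torch.randn(8, 64)                              # different class

for step in range(100):
    opt.zero_grad()
    loss = loss_fn(encoder(anchor), encoder(positive), encoder(negative))
    loss.backward()
    opt.step()
# At inference, a new instance is assigned to the nearest class cluster
# in the learned embedding space (e.g., by k-NN on class centroids).
        </preformat>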
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Named-entity Recognition</title>
        <p>The use of electronic health records (EHRs) can provide valuable data for medical research, since physicians register information that describes symptoms, the diagnosis, treatments, and the evolution of the patient over time, which represents an invaluable source of data to build a patient metaverse. However, the analysis of EHRs can be difficult due to the presence of a large amount of unstructured data and the complex clinical language used by physicians. Clinical named-entity recognition (NER) is the natural language processing task of identifying and categorizing medical information (entities) in clinical text.</p>
        <p>We investigate the use of clinical NER to extract relevant clinical information from Italian-language EHRs of NSCLC patients included in the CLARO project, investigating a transformer-based approach. In particular, we aim to fine-tune bioBIT, an Italian biomedical pretrained model that derives from BERT, for the task of clinical NER. To this end, we first identified a set of relevant entities, which are then used to annotate the data cohort.</p>
        <p>Class imbalance is a common issue for clinical NER, since some entities may appear much less frequently than others. To overcome it, we propose to use the Focal Loss, making the model more focused on rarer entities. On a cohort of patients affected by non-small-cell lung cancer, we obtained promising F1 score results in the recognition of clinical entities.</p>
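        <p>For concreteness, a minimal focal-loss sketch for token classification follows; the number of entity classes and the focusing parameter gamma are illustrative choices, not our exact setup.</p>
        <preformat>
# Minimal focal-loss sketch for token classification: down-weights easy,
# frequent entities so training focuses on rarer ones.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # logits: (n_tokens, n_entity_types); targets: (n_tokens,)
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()                       # probability of the true class
    return (-((1 - pt) ** gamma) * log_pt).mean()

logits = torch.randn(10, 5, requires_grad=True)   # 5 entity classes
targets = torch.randint(0, 5, (10,))
loss = focal_loss(logits, targets)
loss.backward()
print(loss.item())
        </preformat>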
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Virtual Scanner</title>
      <p>In a metaverse for intelligent healthcare, a virtual scanner refers to a computer-generated imaging device that uses virtual reality technology to create medical images of a patient’s body. Virtual scanners can be used to create detailed 3D images of internal organs, bones, tissues, and other structures without the need for invasive procedures, also minimizing patient discomfort as well as allowing medical professionals to view and manipulate images in ways that would not be possible with traditional imaging techniques. AI image translation models can be particularly useful in this context. These models use deep learning (DL) algorithms to translate medical images from one modality to another. In this context, we are working in three directions.</p>
      <p>In the first [13], we aim to develop and validate DL models to perform Virtual Contrast Enhancement (VCE) in Contrast Enhanced Spectral Mammography (CESM). CESM is a diagnostic imaging technique for breast cancer that, unlike standard mammography, involves the injection of an iodinated contrast medium, which diffuses into the tumor tissue and enhances lesion visibility. Although this results in improved diagnostic accuracy, especially in patients with dense parenchymal tissue, and although it is a more affordable and accessible alternative to contrast-enhanced MRI, CESM has two main weaknesses. One is a biological risk due to ionizing radiation, whilst the other refers to possible reactions to iodinated contrast agents, such as nephropathy. CESM acquires a low-energy (LE) and a high-energy (HE) image in quick succession for each breast: the LE image is equivalent to standard mammography, whereas the HE image is post-processed with the corresponding LE image to compute the recombined image, which suppresses parenchymal tissue so that only areas of contrast enhancement are visible. Our VCE task asks AI models to output a virtually recombined image having only the LE image as input. To this end, we compared three DL models: an autoencoder, a CycleGAN, and the Pix2Pix. Results on a public [14] and on a private dataset, with 1003 and 105 patients’ scans respectively, show that the CycleGAN is able to effectively perform the image-to-image translation task in the lesion areas, suggesting VCE is a viable alternative that deserves more research efforts. Therefore, VCE can be useful in a metaverse for intelligent healthcare, since it can reduce the need for the injection of an iodinated contrast medium and it can eliminate the need for undergoing double radiation. This would make imaging procedures safer, and it would reduce the time and cost associated with obtaining diagnostic images.</p>
      <p>Turning our attention to the second direction, we aim to develop and validate DL models to perform MR-to-CT image translation, i.e., to generate synthetic CT (sCT) images from real MR images. While CT is required for treatment planning, due to its excellent anatomic localisation, it is often complemented with MR imaging because of its superior soft-tissue contrast. However, taking multiple images can be cost-prohibitive, burdensome to the patient, and problematic in light of CT ionization risk. For these reasons, MR-only treatment planning has become an attractive alternative, and one of its main tasks is MR-to-CT image translation. Many DL models have been developed for this task, using convolutional neural networks to generate sCT images by minimizing pixel-wise differences to reference CT volumes. More recently, GAN models have been used to solve this task: the pix2pix model has obtained very promising results when co-registered MR and CT images are available, and the CycleGAN model has been used to handle the case with unpaired MR and CT images. One of the main issues, however, has consistently been the limited access to large datasets: most ML tasks in medical imaging suffer from a lack of data and/or a lack of manually annotated data. To manage this issue, data augmentation has been widely utilized to improve generalization and robustness when training deep neural networks. We developed a novel strategy for latent data augmentation, exploiting the learned internal latent representation of a StyleGAN2 model, to generate medical images of sufficiently high resolution and quality that can be used to augment the real patient data available to develop a model that generates sCT images from MR images. The StyleGAN2 model allows us to generate multiple imaging modalities (i.e., MR and CT) of the same underlying synthetic patient. A number of augmented/generated images are used together with a number of real images (the fraction decided by the analyst) to train the MR-to-CT translation model. The StyleGAN2 model is able to generate highly realistic synthetic images, but it does not provide any control over the generated images. We therefore developed a novel generative procedure that has two main components: 1) an improved inverse generator, which embeds real MR and CT images in the StyleGAN2 model’s learnt latent space by enforcing the resulting regenerated images to retain the semantic properties of the original real images at multiple levels; 2) a regularisation term controlling the distance in the latent space of a new purely generated image to the real images, which allows the analyst to tune the variability in the generated images. In particular, the proposed approach allows us to maximize the diversity of the data sampled from the GAN manifold while retaining their realness, i.e., the synthesized images will be plausible datapoints from the learned real manifold (of real CT-MR images). A comparative analysis between the performance of the pix2pix model trained with and without the data augmentation procedure shows a reduction of both scores when using latent augmentation. Thus, latent data augmentation can be used to develop better MR-to-CT translation models, allowing the doctor to inspect multiple image modalities without the need for invasive procedures.</p>
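      <p>As a simplified stand-in for the variability-control component (the inverse generator and our actual regularisation term are not shown, and interpolation toward the nearest real latent is swapped in purely for illustration), the sketch below draws a new latent and pulls it toward the latents of embedded real patients, with a coefficient tuning the distance to the real manifold.</p>
      <preformat>
# Illustrative sketch of regularised latent sampling for augmentation.
# The generator call is a placeholder for a pretrained StyleGAN2 that
# emits paired MR/CT images of one synthetic patient.
import torch

def sample_augmented_latent(real_w, alpha=0.3):
    """real_w: (n_real, w_dim) latents of real images embedded by an
    inverse generator; alpha in [0, 1] tunes proximity to the real
    manifold: 0 keeps the random sample, 1 snaps to the nearest real latent."""
    w_new = torch.randn(1, real_w.shape[1])
    nearest = real_w[torch.cdist(w_new, real_w).argmin()]
    return (1 - alpha) * w_new + alpha * nearest

real_w = torch.randn(100, 512)                # placeholder embedded latents
w_aug = sample_augmented_latent(real_w, alpha=0.5)
# mr_img, ct_img = stylegan2_generator(w_aug)  # hypothetical paired output
      </preformat>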
      <p>In the context of sCT generation, another pillar of our research focuses on developing denoising techniques with the aim of reducing the dose while maintaining the quality of the virtual scanner provided to the clinician. Indeed, under the ALARA (As Low As Reasonably Achievable) principle, the use of low-dose (LD) acquisition protocols has become the clinical practice to be preferred over high-dose (HD) protocols. However, if on the one side a reduction of radiation can be achieved, on the other, the overall quality of the reconstructed CT decreases. Thereby, a trade-off between dose and noise level must be tackled to ensure sufficient diagnostic image quality. This is the reason why a lot of effort has been put into investigating denoising strategies, trying to obtain high-quality CT images at the lowest cost in terms of radiation.</p>
      <p>To tackle this task, we developed a texture-based loss function to be included in the objective of the CycleGAN during training from LDCT to HDCT images. We move from the hypothesis that the noise due to LD protocols has a textural nature; thus, a texture-based loss is beneficial during training, allowing a better denoising quality and faster training.</p>
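      <p>Since the exact formulation is not spelled out here, the snippet below sketches one common way to encode texture statistics, via Gram matrices of feature maps, as a loss term that could be added to a CycleGAN objective; this is an assumption for illustration, not our loss.</p>
      <preformat>
# Sketch of a Gram-matrix texture loss: matching second-order feature
# statistics between denoised and high-dose images penalises residual
# textural noise. Feature maps would come from any fixed CNN.
import torch

def gram_matrix(feat):
    # feat: (batch, channels, h, w) feature maps
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(feat_denoised, feat_hd):
    return (gram_matrix(feat_denoised) - gram_matrix(feat_hd)).pow(2).mean()

fake_hd_feats = torch.randn(2, 16, 32, 32, requires_grad=True)
real_hd_feats = torch.randn(2, 16, 32, 32)
loss = texture_loss(fake_hd_feats, real_hd_feats)
loss.backward()
      </preformat>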
      <p>Overall, virtual scanners and AI image translation models have the potential to revolutionize healthcare by making it more efficient, less invasive, and more personalized.</p>
    </sec>
    <sec id="sec-3">
      <title>5. Virtual Surgery</title>
      <p>Robotic surgery has revolutionized complex medical
applications such as minimally invasive, orthopedic, brain,
and radiotherapy surgeries. With the adoption of robotic
surgery, automation in the surgical domain has become a
crucial topic, presenting potential benefits like improved
consistency, increased dexterity, and access to standard
techniques [15].</p>
      <p>Path planning of surgical robots’ end effectors is a promising area for automation, which can automate surgical tasks or provide suggestions to the surgeon. However, classical path planning algorithms have limitations in dealing with dynamic environments like the surgical one. In this respect, early attempts in this field have proven the feasibility and the potential impact of automation, but also the limitations of classical path planning algorithms. Reinforcement learning, even exploiting deep architectures, could be a viable alternative for path planning, helping to overcome such limitations.</p>
      <p>For this reason, we are developing a deep reinforcement learning (DRL) framework that can plan an optimal path from the entry point to the target point within the patient, and update it during surgery using the endoscopic camera feedback. Our study is divided into two subproblems: global and local. The global subproblem involves planning the optimal trajectory, given information about the entry point, the target point, and a digital twin of the patient obtained from CT/MRI scans. Conversely, the local subproblem focuses on updating the path based on the endoscopic video feed.</p>
      <p>To tackle both subproblems, we developed a simulation environment that generates a digital twin of the patient from CT/MRI scans and reconstructs a 3D model of the surgical scene from monocular endoscopic images. The DRL agent is then trained to plan a globally optimal trajectory and to update the local trajectory in real-time from the video feedback.</p>
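      <p>As an illustration of how the global subproblem can be cast as a reinforcement-learning task (a toy sketch, not our framework), the snippet below wraps a voxelized digital twin as an environment with collision penalties and a target-reaching reward; any DRL agent can then be plugged in.</p>
      <preformat>
# Toy environment: a CT/MRI-derived digital twin becomes a 3D occupancy
# grid, the agent steps between voxels, and the reward favours reaching
# the target while avoiding anatomy marked as forbidden.
import numpy as np

class DigitalTwinEnv:
    def __init__(self, occupancy, entry, target):
        self.grid, self.entry, self.target = occupancy, entry, target
        self.pos = np.array(entry)

    def reset(self):
        self.pos = np.array(self.entry)
        return self.pos.copy()

    def step(self, action):
        # action: one of 6 axis-aligned voxel moves
        moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]
        nxt = np.clip(self.pos + moves[action], 0, np.array(self.grid.shape) - 1)
        if self.grid[tuple(nxt)]:          # collision with anatomy
            return self.pos.copy(), -1.0, False
        self.pos = nxt
        done = bool((self.pos == self.target).all())
        return self.pos.copy(), 10.0 if done else -0.01, done

occ = np.zeros((16, 16, 16), dtype=bool)   # placeholder digital twin
env = DigitalTwinEnv(occ, entry=(0, 0, 0), target=(15, 15, 15))
obs = env.reset()
obs, reward, done = env.step(0)
      </preformat>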
        <p>Potential future applications of this DRL framework
into the metaverse are straightforward. For example, in
augmented reality, the DRL agent could be utilized to
provide suggestions to the surgeon on how to plan the
next move in the surgical environment, by overlaying the
DRL-generated path onto the surgeon’s field of vision.</p>
      <p>Furthermore, the simulation environment and the DRL agent can also be used to train surgeons on different cases in a safe and controlled environment.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We acknowledge: FONDO PER LA CRESCITA SOSTENIBILE (F.C.S.) Bando Accordo Innovazione DM 24/5/2017 (Ministero delle Imprese e del Made in Italy), under the project entitled “Piattaforma per la Medicina di Precisione. Intelligenza Artificiale e Diagnostica Clinica Integrata” (CUP B89J23000580005); the PNRR MUR project PE0000013-FAIR; the project n. F/130096/01-05/X38, Fondo per la Crescita Sostenibile - ACCORDI PER L’INNOVAZIONE DI CUI AL D.M. 24 MAGGIO 2017, Ministero dello Sviluppo Economico (Italy); Programma Operativo Nazionale (PON) “Ricerca e Innovazione” 2014-2020 CCI2014IT16M2OP005 Azione IV.4; Regione Lazio PO FSE 2014-2020 Avviso Pubblico “Contributi per la permanenza nel mondo accademico delle eccellenze” Asse III - Istruzione e formazione - Priorità di investimento 10 ii) - Obiettivo specifico 10.5 Azione Cardine 21.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref1"><label>[1]</label><mixed-citation>G. Wang, A. Badal, X. Jia, J. S. Maltz, K. Mueller, K. J. Myers, C. Niu, M. Vannier, P. Yan, Z. Yu, et al., Development of metaverse for intelligent healthcare, Nature Machine Intelligence (2022) 1-8.</mixed-citation></ref>
      <ref id="ref2"><label>[2]</label><mixed-citation>T. Baltrušaitis, et al., Multimodal machine learning: A survey and taxonomy, IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (2018) 423-443.</mixed-citation></ref>
      <ref id="ref3"><label>[3]</label><mixed-citation>V. Guarrasi, N. C. D’Amico, R. Sicilia, E. Cordelli, P. Soda, Pareto optimization of deep networks for covid-19 diagnosis from chest x-rays, Pattern Recognition 121 (2022) 108242.</mixed-citation></ref>
      <ref id="ref4"><label>[4]</label><mixed-citation>V. Guarrasi, N. C. D’Amico, R. Sicilia, E. Cordelli, P. Soda, A multi-expert system to detect covid-19 cases in x-ray images, in: 2021 IEEE 34th International Symposium on Computer-Based Medical Systems (CBMS), IEEE, 2021, pp. 395-400.</mixed-citation></ref>
      <ref id="ref5"><label>[5]</label><mixed-citation>C. M. Caruso, V. Guarrasi, E. Cordelli, R. Sicilia, S. Gentile, L. Messina, M. Fiore, C. Piccolo, B. Beomonte Zobel, G. Iannello, et al., A multimodal ensemble driven by multiobjective optimisation to predict overall survival in non-small-cell lung cancer, Journal of Imaging 8 (2022) 298.</mixed-citation></ref>
      <ref id="ref6"><label>[6]</label><mixed-citation>V. Guarrasi, P. Soda, Multi-objective optimization determines when, which and how to fuse deep networks: An application to predict covid-19 outcomes, Computers in Biology and Medicine 154 (2023) 106625.</mixed-citation></ref>
      <ref id="ref7"><label>[7]</label><mixed-citation>N. C. D’Amico, R. Sicilia, E. Cordelli, L. Tronchin, C. Greco, M. Fiore, A. Carnevale, G. Iannello, S. Ramella, P. Soda, Radiomics-based prediction of overall survival in lung cancer using different volumes-of-interest, Applied Sciences 10 (2020) 6425.</mixed-citation></ref>
      <ref id="ref8"><label>[8]</label><mixed-citation>G. Joshi, et al., A review on explainability in multimodal deep neural nets, IEEE Access 9 (2021) 59800-59821.</mixed-citation></ref>
      <ref id="ref9"><label>[9]</label><mixed-citation>V. Guarrasi, L. Tronchin, D. Albano, E. Faiella, D. Fazzini, D. Santucci, P. Soda, Multimodal explainability via latent shift applied to covid-19 stratification, arXiv preprint arXiv:2212.14084 (2022).</mixed-citation></ref>
      <ref id="ref10"><label>[10]</label><mixed-citation>P. Soda, N. C. D’Amico, J. Tessadori, G. Valbusa, V. Guarrasi, C. Bortolotto, M. U. Akbar, R. Sicilia, E. Cordelli, D. Fazzini, et al., AIforCOVID: Predicting the clinical outcomes in patients with covid-19 applying AI to chest-x-rays. An Italian multicentre study, Medical Image Analysis 74 (2021) 102216.</mixed-citation></ref>
      <ref id="ref11"><label>[11]</label><mixed-citation>N. Truong, et al., Privacy preservation in federated learning: An insightful survey from the GDPR perspective, Computers &amp; Security 110 (2021) 102402.</mixed-citation></ref>
      <ref id="ref12"><label>[12]</label><mixed-citation>C. M. Caruso, P. Soda, V. Guarrasi, A deep learning approach for overall survival analysis with missing data, submitted to: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS) (2023).</mixed-citation></ref>
      <ref id="ref13"><label>[13]</label><mixed-citation>A. Rofena, P. Soda, V. Guarrasi, A deep learning approach for virtual contrast enhancement in contrast enhanced spectral mammography (CESM), submitted to: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS) (2023).</mixed-citation></ref>
      <ref id="ref14"><label>[14]</label><mixed-citation>R. Khaled, et al., Categorized contrast enhanced mammography dataset for diagnostic and artificial intelligence research, Scientific Data 9 (2022) 122.</mixed-citation></ref>
      <ref id="ref15"><label>[15]</label><mixed-citation>P. Fiorini, Automation and autonomy in robotic surgery, 2021.</mixed-citation></ref>
    </ref-list>
  </back>
</article>