1. Introduction

Exploring Brain Tumor Segmentation and Patient Survival: An Interpretable Model Approach

Valerio Ponzi

0 1

Giorgio De Magistris

0 0 Department of Computer, Control and Management Engineering, Sapienza University of Rome , Via Ariosto 25, Roma, 00185 , Italy 1 Institute for Systems Analysis and Computer Science, Italian National Research Council , Via dei Taurini 19, Roma, 00185 , Italy

Detecting and delineating brain tumors from MRI images using artificial intelligence presents a complex challenge in medical AI. Recent progress has seen a variety of techniques employed to assist medical professionals in this task. Despite the efectiveness of machine learning algorithms in segmenting tumors, their lack of transparency in decision-making hinders trust and validation. In our project, we constructed an interpretable U-Net Model specifically tailored for brain tumor segmentation, leveraging both the Gradient-weighted Class Activation Mapping (Grad-CAM) Algorithm and the SHapley Additive exPlanations (SHAP) library. We relied on the BraTS2020 benchmark dataset for training and evaluation purposes. The U-Net model we employed yielded promising results. We then utilized Grad-CAM to visualize the crucial features attended to by the model within an image. Additionally, we enhanced interpretability by utilizing the SHAP library to elucidate the predictions made by various models (including Random Forest, KNN, SVC, and MLP) utilized for predicting patient survival days.

eol>Brain Tumor U-Net Segmentation Explainable Artificial Intelligence

1. Introduction

parency inherent in deep learning models poses a significant challenge. This opacity is problematic as docBrain tumors represent a significant challenge in health- tors need to understand how the model arrives at its care, afecting millions of individuals worldwide with conclusions to make informed decisions about patient their life-threatening implications. Accurate delineation care[ 2, 3 ]. Additionally, explainable artificial intelligence of these tumors is paramount for efective treatment (XAI) plays a crucial role in mitigating biases within AI strategies and ongoing monitoring of disease progres- models. Biases may arise if the model is trained on data sion. Over the past few years, deep learning techniques that doesn’t adequately represent the population it will have emerged as promising tools for brain tumor segmen- serve, leading to incorrect or skewed predictions. XAI tation, with the U-Net architecture gaining popularity can help identify and rectify these biases, thereby enfor its ability to capture intricate details within medical hancing the model’s reliability. Moreover, XAI fosters images. However, the inherent opacity of deep learning trust in AI systems by elucidating the decision-making models presents a hurdle, as it limits their interpretabil- process [ 4 ], thereby increasing the willingness of doctors ity and makes it dificult for clinicians to comprehend and patients to rely on these models [ 5, 6 ]. the rationale behind their decisions. Explainable Arti- In our project, we utilize the Gradient-weighted Class ifcial Intelligence (XAI) has garnered increasing impor- Activation Mapping (Grad-CAM) technique to imbue our tance, particularly in the medical domain, where precise segmentation UNET model with explainability. Gradtumor segmentation plays a crucial role. Tumor seg- CAM generates heatmaps highlighting the crucial rementation involves the identification and localization of gions of input images that the model focuses on when tumors within medical imaging data, such as MRI scans, making predictions. By visualizing these heatmaps, we CT scans, or X-rays. This process is indispensable in can- gain insights into the features guiding the model’s decicer diagnosis, treatment planning, and progress tracking. sions, facilitating better understanding of its behavior. XAI holds significance in tumor segmentation for several Furthermore, our project incorporates the SHAP reasons. Firstly, AI models often operate as "black boxes," (SHapley Additive exPlanations) approach, particularly meaning their decision-making processes are not readily relevant for tasks like predicting patient survival based transparent [ 1 ]. on medical imaging data in datasets like BRATS. SHAP In the context of medical imaging, the lack of trans- values elucidate the contributions of individual features to the model’s output, shedding light on the mechanisms underlying predictions. This transparency is vital in the medical context, where accurate predictions prosegmentation, augmented by Grad-CAM for heatmap the most interesting ones was [ 10 ]. The author implevisualization and SHAP for survival prediction analysis. mented a prototypical part network (ProtoPNet), which The segmentation accuracy stands at an impressive 99 dissects images by identifying prototypical parts and percent, with specific dice scores for necrotic, edema, and amalgamating evidence from these prototypes to derive enhancing regions. This comprehensive approach not a final classification. The operational principle of this only yields accurate predictions but also enhances inter- ProtoPNet involves comparing the latent features of f(x) pretability, trust, and confidence in AI-assisted medical with the learned prototypes. Specifically, for each class k, decision-making. the network seeks evidence for x belonging to class k by assessing its latent patch representations against every learned prototype p(j) associated with class k. 2. Related Works In another study focusing on interpretable machine learning, researchers introduced a method for "ClassifiBrain tumors are among the most perilous types of tu- cation of Mass Lesions in Digital Mammography" [ 11 ]. mors globally, with gliomas emerging as the predominant They employed a pixel-wise annotation technique to preprimary brain tumors. Gliomas stem from the aberrant cisely segment afected lesions, and the outcomes were proliferation of glial cells in the brain and spinal cord, subsequently depicted using GradCam and GradCam++ exhibiting varying degrees of malignancy and histologi- heatmaps. The findings demonstrated that pixel-wise cal classifications. Individuals diagnosed with glioblas- annotation improved the segmentation and localization toma, the most aggressive form of glioma, typically face of the afected area, with the generated heatmaps maina survival prognosis of fewer than 14 months on aver- taining focus on the impacted part of the skin, rather age. Medical professionals frequently utilize Magnetic than encompassing all image pixels. Resonance Imaging (MRI), a non-invasive technique, to The concept of utilizing GradCam for visual interprediagnose brain tumors because of its capability to gen- tation and explanation of model results originated from a erate a wide variety of tissue contrasts in each imaging related study, which utilized GradCam for visual explanamode [ 7 ]. However, analyzing and segmenting structural tions across a wide range of CNN-based models. This apMRI images of brain tumors is a challenging and time- proach combines Grad-CAM with fine-grained visualizaconsuming task that typically requires the expertise of tions to produce high-resolution, class-discriminative viprofessional neuroradiologists. Therefore, an automated sualizations. It was applied to various of-the-shelf image and dependable brain tumor segmentation method would classification, captioning, and visual question-answering greatly facilitate the diagnosis and treatment of brain tu- (VQA) models, including those based on ResNet architecmors. tures [ 12 ].

[ 8 ] An alternative approach is suggested to focus solely The domain of explainable Artificial Intelligence (xAI) on a small region of the image rather than processing the is relatively new but evolving rapidly, with the introducentire image, reducing computational time and address- tion of numerous libraries designed to elucidate the outing overfitting issues in a Cascade Deep Learning model. puts of opaque deep learning models. One such notable Additionally, a Cascade Convolutional Neural Network library is the SHAP (SHapley Additive exPlanations) li(C-ConvNet/C-CNN) is introduced, which extracts both brary. SHAP assigns an importance value to each feature local and global features through separate pathways. for a specific prediction. In their work [ 13 ], SHAP was Moreover, to enhance the accuracy of brain tumor seg- applied to the brats dataset. For each input feature, SHAP mentation beyond existing models, a new Distance-Wise calculates the importance value, ofering various calcuAttention (DWA) mechanism is employed. lation methods, including two model-agnostic methods

In another work [ 9 ], a novel design relying on a 3D applicable regardless of the trained network type, and U-Net model was developed, incorporating numerous four specific model methods, one of which is DeepExskip connections alongside cost-efective pre-trained 3D plainer.

MobileNetV2 blocks and attention modules. These pre- In this study, DeepExplainer was utilized to determine trained MobileNetV2 blocks aid the architecture by ofer- the importance values for a given combination of 3D MRI ing fewer parameters, ensuring a manageable model size voxel and age values. DeepExplainer eficiently approxiwithin our computational capacity, and facilitating faster mates SHAP values for a deep neural network model by convergence. Furthermore, additional skip connections recursively propagating DeepLIFT multipliers, thereby were introduced between the encoder and decoder blocks deriving an efective linearization technique from the to facilitate the transfer of extracted features, while at- SHAP values. By inputting an example data point into tention modules were employed to filter out irrelevant DeepExplainer, importance values for every pixel in the features transmitted through the skip connections. 3D voxel, as well as for the age value, are determined.

Further existing works on interpretable CNNs were These important values can then be visually represented examined during the execution of our project, One of by integrating them into a background image, which The U-Net architecture is a convolutional neural network initially devised for biomedical image segmentation but widely applicable across various computer vision segmentation tasks. It comprises a contracting path and an expanding path, forming a "U" shape. The contracting path functions akin to a traditional convolutional neural network, capturing image context through successive convolutional and max-pooling layers. These layers reduce spatial resolution while augmenting channel depth to extract broader image features. Conversely, the expanding path reconstructs spatial resolution and yields

3. Dataset

most pertinent tumor data after rigorous experimentation.

During data preprocessing, we identified irregular patterns in file 355, prompting its removal from the dataset to ensure the integrity of our results and model training. Additionally, we standardized the image size to 128*128 for training purposes. Our training data comprises stacked Flair and T1 images, while the model receives segmentation masks as labels, which are subsequently one-hot encoded for compatibility.

In this project, we utilized the BRATS2020 Dataset, a

commonly employed medical imaging dataset utilized for both brain tumor segmentation and classification tasks.

It represents an enhanced iteration of the BRATS2015 dataset and is made available by the Multimodal Brain Tumor Segmentation Challenge (BRATS).

The dataset comprises MRI scans of the brain obtained from patients diagnosed with diverse types of brain tu- 4. Our Methodology mors, including gliomas, meningiomas, and pituitary adenomas. These scans encompass four distinct modali- In this project, our approach is delineated into the folties: T1-weighted (T1), T1-weighted contrast-enhanced lowing steps: (T1ce), T2-weighted (T2), and fluid-attenuated inversion • Employing the Unet model for intricate segmenrecovery (FLAIR) images 1. Accompanying each MRI tation tasks, adept at delineating tumor contours scan is a ground truth segmentation map delineating the within MRI scans. tumor’s location and extent. Containing a total of 369 MRI scans, the BRATS2020 dataset designates 335 scans • Enacting an array of Machine Learning Algofor training and 34 for testing. These scans were sourced rithms to prognosticate patient survival, harnessfrom various medical institutions and meticulously an- ing extracted features from segmented tumor renotated by multiple experts. Furthermore, the dataset gions alongside ancillary clinical data. provides additional patient-related information such as • Deploying SHAP (SHapley Additive exPlanaage, gender, and tumor subtype. Widely recognized as tions) to furnish comprehensive elucidations of a benchmark dataset, BRATS2020 serves as a standard predictions rendered by machine learning modfor assessing the eficacy of algorithms in brain tumor els, facilitating an enhanced understanding of the segmentation and classification. Researchers leverage underlying determinants contributing to patient this dataset to innovate and validate new approaches survival prognostications. for automating these processes, aiming to enhance the accuracy and eficiency of diagnosis and treatment for 4.1. UNET for Tumour Segmentation individuals aflicted with brain tumors.

3.1. Data Preprocessing The dataset utilized in our project comprises MRI scans

of patients with various types of brain tumors, encompassing four modalities: T1-weighted (T1), T1-weighted contrast-enhanced (T1ce), T2-weighted (T2), and Flair Images, alongside corresponding ground truth segmentation masks. Each MRI image contains 155 slices, of which we selected slices ranging from 22 to 100, capturing the the final segmentation map. It employs convolutional analysis. and up-sampling layers to enhance spatial resolution Once the UNET Model is trained, the next phase inwhile reducing channel depth. Up-sampling methods volves the practical application of GradCam for explalike bilinear interpolation or transposed convolution are nation generation. We start by selecting an input MRI commonly used. image containing a brain tumor, which serves as the

Moreover, the U-Net incorporates skip connections subject for interpretation. This MRI image is then fed between corresponding layers of the contracting and ex- through the trained UNET Model to obtain the output panding paths. These connections enable the network segmentation map, which delineates the tumor region to circumvent spatial information loss from pooling op- within the image. erations and merge local and global image features ef- To delve deeper into understanding the model’s fectively. Skip connections concatenate feature maps decision-making process, we employ the concept of gradifrom corresponding layers, followed by a 1x1 convolu- ent computation. Specifically, we compute the gradients tional layer to decrease channel depth. The concatenated of the output segmentation map concerning the feature feature maps then feed subsequent convolutional and up- maps of the last convolutional layer. These gradients sampling layers in the expanding path. Training the U- provide valuable information regarding the importance Net model involves end-to-end optimization using pixel- of diferent regions within the input image in influencwise cross-entropy loss. This loss function compares ing the model’s segmentation decision. Leveraging these predicted segmentation maps with ground truth maps, gradients, we proceed to compute the GradCam heatmap guiding parameter adjustments of convolutional filters for the input MRI image. This heatmap efectively highto minimize the loss and generate accurate segmentation lights the regions within the image that exert the most maps [ 14 ]. significant influence on the segmentation decision made

Our UNET architecture operates on input images with by the UNET Model. By overlaying this heatmap onto dimensions of (128, 128, 2). Initially, a convolutional layer the input MRI image, we create a visually intuitive reprewith 64 filters, a 3x3 kernel size, and "same" padding is sentation that facilitates the interpretation of the model’s employed, followed by batch normalization and ReLU decision-making process. activation. Subsequently, the encoder phase comprises Through this approach, we gain valuable insights into multiple down-sampling blocks, each featuring two 3x3 the inner workings of the UNET Model for brain tumor convolutional layers with 64 filters, followed by batch segmentation. By visualizing the regions of the input normalization and ReLU activation. After each block, the image that contribute most significantly to the model’s iflter count doubles, and spatial resolution is halved via predictions, we enhance the interpretability of our model, max-pooling. The bottleneck layer is characterized by thereby fostering greater trust and understanding among four 3x3 convolutional layers with 1024 filters, alongside stakeholders in the medical domain. batch normalization and ReLU activation. Conversely, In summary, by integrating GradCam into our workthe decoder phase involves up-sampling blocks, consist- flow for brain tumor segmentation, we not only improve ing of 2x2 transpose convolutional layers with 512 filters. the transparency and interpretability of our model but These layers are concatenated with corresponding fea- also empower clinicians and researchers with actionable ture maps from the encoder part, followed by two 3x3 insights into the diagnostic process, ultimately leading convolutional layers with 64 filters, batch normalization, to more informed decision-making and better patient and ReLU activation. Finally, a 1x1 convolutional layer outcomes [ 12 ]. with 4 filters is employed in the final layer, succeeded by a softmax activation function to yield a probability 4.3. SHAP Explanations distribution across the four segmentation classes.

SHAP (SHapley Additive exPlanations) is a model

4.2. GradCam Algorithm agnostic method for interpreting the predictions of machine learning models. It can help to identify which GradCam, short for Gradient-weighted Class Activation features in the input data contributed the most to a parMapping, serves as a valuable tool in enhancing the in- ticular prediction [ 15 ]. terpretability of complex neural network models, par- In this project, we use SHAP to determine a patient ticularly in medical imaging tasks such as brain tumor survival. The patient survival data is already provided in segmentation using MRI images. To utilize GradCam ef- the survival info CSV file, which contains the following fectively for brain tumor segmentation, we first embark columns Brats20ID, Age, Survival days, Extent of Resecon training a UNET Model using BRATS Data, a well- tion. We preprocess the data and determine whether the known dataset extensively used in the field for brain extension of the tumor was short, medium, or large. Then tumor segmentation tasks. This initial step is crucial as the data is trained and tested using the various classificait lays the foundation for the subsequent interpretability tion algorithms including KNN, Random Forest, etc. The results of KNN, SVC, MLP, and Random Forest are then explained through the SHAP library. We used SHAP Kernel explainer and Tree Explainer to get the SHAP values and visualize them using the SHAP summary plot and SHAP force plot.

5. Results The results section includes the results obtained on the UNET Model train and test data, Visualizations using the GradCam Algorithm, and Results obtained on survival predictions data, and its explanations using SHAP. 5.1. Results on UNET Model

To interpret the model predictions, we use the GradCam technique. Our GradCam visualization function builds the gradient model using Unet model inputs, the last convolution layer of the model, and model outputs. The gradient model is then provided with a test image for which it computes the gradients of the output segmentation relating to the last convolution layer. These gradients are then used to compute the Heatmap, By visualizing the heatmap generated by Grad-CAM, we can gain insights into which parts of the input image are most important for the interpretable model to make its segmentation. Figure 5 shows the original and the GradCam heatmaps generated on that MRI Image by the model, we can see that the model focuses more on the Tumour area to predict the correct segmentation mask.

5.3. Patient Survival Prediction For predicting patient survival various ML algorithms

were used. Initially, we used a Random Forest Classifier with 3 trees to predict the extent of survival. The survival extent was categorized into three categories, small, medium, or long. For further experiments, we used the KNN classifier. Next, we used a Support Vector classifier on the same data in the context of getting better accuracy scores. Lastly, we experimented by training and testing the model using a Multi-Layer Perceptron (MLP) classiifer. The results of all those algorithms are included in the Table 1.

Our model was then validated on the test images to

visualize the output segmentations made by the model.

Figure 3 and figure 4 shows the results of the model. It perfectly segments all three classes namely "Neurotic/core", "Edema", and "Enhancing". The area comprising the tumour was perfectly identified by the model hence giving us perfect segmentations. understand feature importance and model behavior. The SHAP Tree Explainer is a method designed for interpreting tree-based machine learning models, such as decision trees or random forests. It computes the Shapley val- 6. Conclusion and Future Works ues by approximating the model with a set of additive tree-based models, enabling the attribution of feature con- Brain tumor segmentation through machine learning has tributions to individual predictions made by tree models. significantly assisted medical professionals in eficiently Figure 6 and Figure 7 display the results of our SHAP locating and resecting tumors. Various techniques have explanations. been explored to accurately segment tumors from MRI imAs we can notice from the above table, the results are not quite promising, with KNN and Random Forest being slightly better than our other experimented models. In future work, we would try to achieve better scores by trying various ensembles of models on the data. ages, including the utilization of the UNET Model in this project. Recent advancements in semantic segmentation have introduced several notable models: Mask R-CNN: This CNN architecture extends the Faster R-CNN object detection model to include a mask prediction branch, allowing it to perform object detection and instance segmentation simultaneously. DeepLab V3+: Designed for semantic segmentation of images, DeepLab V3+ employs dilated convolution to capture multi-scale context without increasing the number of parameters. PSPNet: Utilizing a pyramid pooling module, PSPNet captures global context at multiple scales, facilitating accurate predictions for objects of various sizes [ 18, 19 ]. FCN: Fully Convolutional Networks perform dense pixel-wise prediction of image labels, accommodating input images of arbitrary size and producing output images of the same size with predicted labels for each pixel [ 20 ]. Segment Anything Model (SAM): Facebook’s SAM, an open-source state-of-the-art computer vision model, is designed for image segmentation tasks [ 21 ]. These models, alongside UNET, have showcased state-of-the-art results in segmentation tasks. Our objective is to collaborate with these models on our data and visualize their outcomes on the BRATS2020 dataset.

In addition to tumor segmentation, we aimed to enhance the interpretability of the UNET model by employing GradCam. However, with advancements in the eXplainable Artificial Intelligence (XAI) field, several other visual interpretation techniques have emerged. We plan to explore these techniques, including GradCam++, SmoothGradCam++, Guided GradCam, and Score-CAM, to provide more precise and insightful model interpretations.

Moreover, in the realm of patient survival prediction, current models exhibit low accuracy and struggle with generalization. To address this, our future work will involve experimenting with sequential neural network models to achieve better results. Additionally, we will focus on tuning hyperparameters and exploring diferent parameter sets to improve model performance on both training and test data. These eforts aim to enhance the accuracy and reliability of patient survival predictions, thus advancing the impact of medical AI in clinical settings.

1–8

[1]

Saleem ,

A. R.

Shahid ,

Raza , Visual interpretability in 3d brain tumor segmentation network , Computers in Biology and Medicine 133 ( 2021 ) 104410 . URL: https://www.sciencedirect.com/science/article/pii/ S0010482521002043. doi:https://doi.org/10. 1016/j.compbiomed. 2021 . 104410 .

[2]

Yan ,

Chen ,

Xia ,

Wang ,

Xiao , An explainable brain tumor detection framework for mri analysis , Applied Sciences 13 ( 2023 ). URL: https://www.mdpi.com/2076-3417/13/6/3438. doi: 10 .3390/app13063438.

[3]

De Magistris ,

Caprari , G. Castro,

Russo ,

Iocchi ,

Nardi ,

Napoli , Vision-based holistic scene understanding for context-aware humanrobot interaction 13196 LNAI ( 2022 ) 310 - 325 . doi: 10 .1007/978-3- 031 -08421-8_ 21 .

[4]

Borowik ,

Woźniak ,

Fornaia ,

Giunta ,

Napoli ,

Pappalardo ,

Tramontana , A software architecture assisting workflow executions on cloud resources , International Journal of Electronics and Telecommunications 61 ( 2015 ) 17 - 23 . doi: 10 .1515/eletel-2015-0002.

[5]

Yang ,

Ye ,

Xia , Unbox the black-box for the medical explainable ai via multi-modal and multicentre data fusion: A mini-review, two showcases and beyond , Information Fusion 77 ( 2022 ) 29 - 52 . URL: https://www.sciencedirect.com/science/ article/pii/S1566253521001597. doi:https://doi. org/10.1016/j.inffus. 2021 . 07 .016.

[6]

Ponzi ,

Russo ,

Bianco ,

Napoli ,

Wajda , Psychoeducative social robots for an healthier lifestyle using artificial intelligence: a case-study , volume 3118 , 2021 , pp. 26 - 33 .

[7]

Khawaldeh ,

Pervaiz ,

Rafiq ,

R. S.

Alkhawaldeh , Noninvasive grading of glioma tumor using magnetic resonance imaging with convolutional neural networks , Applied Sciences 8 ( 2018 ). URL: https://www.mdpi.com/2076-3417/8/1/ 27. doi: 10 .3390/app8010027.

[8]

Ranjbarzadeh ,

A. Bagherian

Kasgari ,

S. Jafarzadeh

Ghoushchi ,

Anari ,

Naseri ,

Bendechache , Brain tumor segmentation based on deep learning and an attention mechanism using mri multi-modalities brain images ., Scientific reports ( 2021 ). URL: http://hdl.handle.net/10147/ 631439. doi: 10 .1038/s41598-021-90428-8.

[9]

Nodirov ,

A. B.

Abdusalomov ,

T. K.

Whangbo , Attention 3d u-net with multiple skip connections for segmentation of brain tumor images , Sensors 22 ( 2022 ). URL: https://www.mdpi.com/1424-8220/22/ 17/6501. doi: 10 .3390/s22176501.

[10]

Chen ,

Li ,

Barnett ,

Su ,

Rudin , This looks like that: deep learning for interpretable image recognition , CoRR abs/ 1806 .10574 ( 2018 ). URL: http: //arxiv.org/abs/ 1806 .10574. arXiv: 1806 .10574.

[11]

A. J.

Barnett ,

F. R.

Schwartz ,

Tao ,

Chen ,

Ren ,

J. Y.

Lo , C. Rudin, IAIA-BL: A case-based interpretable deep learning model for classification of mass lesions in digital mammography , CoRR abs/2103 .12308 ( 2021 ). URL: https://arxiv.org/abs/ 2103.12308. arXiv: 2103 . 12308 .

[12]

R. R.

Selvaraju , A. Das , R.

Vedantam , M.

Cogswell , D.

Parikh , D.

Batra , Grad-cam: Why did you say that ?, 2017 . arXiv: 1611 . 07450 .

[13]

Eder ,

Moser ,

Holzinger ,

JeanQuartier , F. Jeanquartier, Interpretable machine learning with brain image and survival data , BioMedInformatics 2 ( 2022 ) 492 - 510 . URL: https: //www.mdpi.com/2673-7426/2/3/31. doi: 10 .3390/ biomedinformatics2030031.

[14]

Ronneberger ,

Fischer ,

Brox , U-net: Convolutional networks for biomedical image segmentation , CoRR abs/1505 .04597 ( 2015 ). URL: http: //arxiv.org/abs/1505.04597. arXiv: 1505 . 04597 .

[15]

S. M.

Lundberg ,

Lee , A unified approach to interpreting model predictions , CoRR abs/1705 .07874 ( 2017 ). URL: http://arxiv.org/abs/ 1705.07874. arXiv: 1705 . 07874 .

[16]

Jadon , A survey of loss functions for semantic segmentation , in: 2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) , IEEE, 2020 . URL: https://doi.org/10.1109% 2Fcibcb48159 . 2020 . 9277638 . doi: 10 .1109/cibcb48159. 2020 . 9277638 .

[17]

Alfarano , G. De Magistris,

Mongelli ,

Russo ,

Starczewski ,

Napoli , A novel convmixer transformer based architecture for violent behavior detection 14126 LNAI ( 2023 ) 3 - 16 . doi: 10 .1007/ 978-3- 031 -42508- 0 _ 1 .

[18]

Zhao ,

Shi ,

Qi ,

Wang ,

Jia , Pyramid scene parsing network, 2017 . arXiv: 1612 . 01105 .

[19]

Marcotrigiano ,

G. D.

Stingi ,

Fregnan ,

Magarelli ,

Pasquale ,

Russo ,

G. B.

Orsi ,

M. T.

Montagna ,

Napoli ,

Napoli , An integrated control plan in primary schools: Results of a field investigation on nutritional and hygienic features in the apulia region (southern italy) , Nutrients 13 ( 2021 ). doi: 10 .3390/nu13093006.

[20]

Long , E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation , 2015 . arXiv: 1411 . 4038 .

[21]

Kirillov ,

Mintun ,

Ravi ,

Mao ,

Rolland ,

Gustafson ,

Xiao ,

Whitehead ,

A. C.

Berg , W.- Y. Lo,

Dollár ,

Girshick , Segment anything, 2023 . arXiv: 2304 . 02643 .