
Analysis Pre and Post COVID-19 Pandemic Rorschach Test Data of Using EM Algorithms and GMM Models

Valerio Ponzi

Samuele Russo

Agata Wajda

Rafał Brociek

Christian Napoli

0 Department of Computer, Control and Management Engineering, Sapienza University of Rome, Via Ariosto 25, Roma, 00185, Italy
1 Department of Mathematics Applications and Methods for Artificial Intelligence, Faculty of Applied Mathematics, Silesian University of Technology, Gliwice, 44-100, Poland
2 Department of Psychology, Sapienza University of Rome, Via dei Marsi 78, Roma, 00185, Italy
3 Institute for Systems Analysis and Computer Science, Italian National Research Council, Via dei Taurini 19, Roma, 00185, Italy
4 Institute of Energy and Fuel Processing Technology, Zabrze, 41-803, Poland


The global spread of the COVID-19 virus has become one of the greatest challenges that humanity has faced in recent years. The unprecedented circumstances of forced isolation and uncertainty that it has imposed on us continue to impact our mental well-being, whether or not we have been directly affected by the virus. Over a period of nearly three years (2017-2020), data was collected from multiple administrations of the Rorschach test, one of the most renowned and extensively studied psychological tests. This study involved the clustering of data, collected through the RAP3 software, to analyze the distinctive trends in data recorded before and after the pandemic. This was achieved through the implementation of the well-established machine learning algorithm, Expectation-Maximization. The proposed solution effectively identifies the key variables that significantly influence the subject's score and provides a reliable solution. Additionally, the solution offers an intuitive visualization that can assist psychologists in accurately interpreting shifts in trends and response distributions within a large amount of data in the two periods.

Keywords: Rorschach test, Gaussian Mixture Model (GMM), Expectation Maximization (EM), Principal component analysis (PCA)

1. Introduction

[...] of this test and gave it his name. In this test, 10 cards (or tables) are brought to the participant's attention, each of which has a symmetric inkblot. The cards are shown in Fig. 1, where it can be seen that there are 5 monochromatic, 2 two-tone, and 3 colorized versions of them. One by one, these cards are given to the subject, who is then asked to comment on every aspect the card represents from their perspective. It's crucial to note that the subject may take any amount of time to respond and that there are no "correct" or "wrong" answers because the responses are all subjective. Also, notice there can be more than one response for each card: the psychiatrist/psychologist encourages the subject to give many original replies. Typically, the psychologist notes every answer the subject gives and relates it to a specific part of the inkblot. It's important to add that there is no specific submission order for the cards.

Much more could be said about this test, but this brief excursion into the psychology world is enough to understand the utility of this test for the task. In fact, a large amount of data was collected in a dataset called 'COVID-19 Rorschach test dataset', which contains several samples of protocols and responses, from 2017-01-01 to 2020-09-15, available online for free. It includes some demographics-related variables and the codes of the Comprehensive System (Exner, 2001). The dataset contains more than 500,000 coded responses to the test inkblot stimuli. Each series of responses refers to the interpretation given by a certain subject. The data were collected by using the Rorschach Assistant Program (RAP3) software [1], which is one of the currently available examples of a system for online testing [2, 3, 4, 5, 6] and assessment [7, 8, 9, 10, 11, 12].

This dataset contains a significant amount of information for us to analyze. The final interpretation given by a psychiatrist depends on various factors, including the response's content, the visual stimuli, and the region of the inkblot that elicits the response. Our task in this paper is to identify which variables are responsible for eliciting the responses and examine how these values change in the data collected before and during the pandemic. It is worth noting that the dataset does not specify whether the subjects who underwent the test during the pandemic were infected or not.

The main idea is to perform a clustering of the data, isolating some important features and sketching and understanding the behavior of the responses in a probabilistic manner. Clustering is a good choice for this purpose since this is an unsupervised task, and the Expectation-Maximization algorithm is the tool we use to achieve it. This algorithm will be implemented using a Gaussian Mixture Model (GMM) according to the number of clusters we assume to have, which is one of the results we'd like to achieve with this work.

2. Related works

Since the pandemic began just a couple of years ago, the state-of-the-art research and literature about this specific task are quite scarce. The most popular Rorschach test-related research involved the use of deep learning and neural networks for image classification [13], but this is closely related to inkblots for computer interpretation. As specified before, in this case, we are interested in what kind of feature is the most effective for the responses of the subjects and the difference in trends of the responses between tests done pre-pandemic and during the pandemic.

One of the major problems was (and is) the submission of the Rorschach test to infected people: as the AIP¹ [14] said, remote test submission introduces significant complications in some assessments where the physical presence of the subject was needed (cognitive, neurodevelopment, work stress). However, remote submission was strongly recommended in the other cases (and mandatory if the patient was currently affected by Covid-19). In detail, psychologists take into account many stimuli from a subject: during the remote sessions, they may find some small alterations in the verbal activities of the subject, but the non-verbal stimuli and handling manipulation may be dramatically affected, mainly due to the brightness and sharpness of the screen. In-depth studies of the test results have not been published because the specialists need to preserve the privacy of the patients.

Regarding software applications created to help psychiatrists in the analysis of the signatures, it's worth mentioning the PRALP3 [15] software, developed by Pancheri, De Fidio and Corfiati from the University of Rome and the University of Bari and published in 1995, the RIAP5 [16] scoring program developed by the PAR company, the RAP3 program cited before, and the CHESSSS [17] program published in 2016. From this, it's clear that computer programs have already helped industry specialists in a relevant manner.

¹ Associazione Italiana di Psicologia (Italian Association of Psychology, in English). Notice that Italy is the country with the highest number of submissions of the Rorschach test registered in the dataset.

A clustering of this very same dataset was performed by Surekha Ramireddy and published on the Kaggle platform [18]. In that work, the clustering was performed with the KMeans algorithm [19, 20, 21, 22]. The choice of the KMeans algorithm is easy to explain: it is one of the simplest and most intuitive algorithms in unsupervised ML. Assuming that the samples in the dataset are generated by Gaussian distributions represented by K means (priors and covariance matrices are the same for each Gaussian distribution), the algorithm starts with a random initialization of the K means, then assigns each point to the closest mean (the means represent the centers of the clusters) by computing the distance between the point and all the means and taking the one with the lowest distance. Then, it repeats the process, updating the means (when a sample is added to a cluster). The algorithm stops when no sample changes cluster between the current and the previous step.

Convergence is always reached if the distance function used guarantees that the sum of distances decreases from one iteration to the next (as each mean moves toward the center of its cluster).

Some problems might be encountered: the choice of K is fundamental (usually, from 1 to 10), and the results mostly depend on the choice of the distance function. Also, this algorithm doesn't consider priors and covariance matrices as parameters, so we rely on the computation of the means only and we do not have any information about the covariance matrices of the distributions; namely, we cannot control the amplitude of the clusters. Moreover, the KMeans algorithm tends to cluster the data in a circular shape in 2D (spherical in 3D), and this may reduce accuracy if the dataset is strongly unbalanced.
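The procedure described above can be illustrated with a minimal sketch. It is not the code used in [18]: it assumes the Euclidean distance, uses the common batch variant (all means are updated after every full assignment pass), and the function name `kmeans` and the toy points are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Batch KMeans: assign each point to its nearest mean, then recompute the means."""
    rng = random.Random(seed)
    means = rng.sample(points, k)          # random initialization of the K means
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the closest mean (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, means[j]))
            for p in points
        ]
        if new_labels == labels:           # stop when no point changes cluster
            break
        labels = new_labels
        # Update step: each mean moves to the centroid of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                    # keep the old mean if a cluster is empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels

# Toy data (assumption): two well-separated groups of 2D points.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
means, labels = kmeans(points, k=2)
```

Only the means are estimated here, which is exactly the limitation discussed above: no prior or covariance is available to describe the extent or shape of each cluster.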

3. Dataset

Let's take a closer look at the dataset that will be used in this analysis. The dataset comprises 506,480 samples, each consisting of 24 features. The features are listed below for completeness [18]:
• User: user ID number;
• PQlevel: professional qualification level;
• Client: client ID number;
• Age: client age in years;
• Gender: client gender;
• Country: client country;
• Protocol: protocol ID number;
• Test Date: the date the RAP3 protocol was created;
• R: total number of responses in the protocol;
• ResponseOrder: the order of responses in the protocol;
• CardID: Rorschach card number, 1 to 10;
• Location: indicates which area of the blot the response refers to;
• LocationNumber: location normative number;
• Developmental Quality: quality of processing;
• Determinants: all the visual stimuli in the blot that shaped the reported objects in the response;
• Pair: two identical objects are reported, based on the symmetry of the blot;
• Form Quality: indicates how well the area of the blot fits the form of the object specified in the response;
• FQText: the normative text associated with the form quality;
• Contents: abbreviations for the category to which the reported object belongs;
• Popular: responses that occur frequently in a normative sample;
• ZCode: the relationship between distinct blot areas;
• ZScore: numerical value assigned to responses in which such organizational activity occurs;
• Special Scores: indicate the presence of special features in the response;
• Rejection: number of card rejections in the protocol.

Not all of these features are relevant to our task; in fact, the features we will take into account to perform the clustering are the ones stored in the Location, Determinants, and Contents columns. All of these features store information about the test that may be useful for the interpretation of the inkblots provided by the subject. In addition to being unnecessary for this task, the other features can be eliminated because the large number of missing values they contain would add to the computational burden and processing time.

The choice to cluster the data using the Location, Contents, and Determinants features is easily understandable. However, rather than using the categorical Location feature, we opt for the numerical LocationNumber feature. Despite containing some missing values, this feature can still be used as the latent variable for the EM algorithm. Thus, we simplify the dataset by condensing it into three primary features.

Our goal is to cluster the data based on the Determinants feature but, due to the limitations of clustering algorithms, it is not advisable to consider more than 30 centroids. Therefore, we need to group similar values together to reduce the number of possible centroids. To do this, we can categorize the values of the determinants into major groups based on their similar meanings:
• Determinants based exclusively on the form feature of the blot;
• Determinants based on the parts of the blot that either seem to reflect or are paired with other parts of the blot;
• Determinants based on movement features of the figure represented by the blot;
• Determinants based on the color features of the blot as the principal cause of response;
• Determinants based on the shading part in both achromatic and chromatic cards.

Table 1: Values recorded in the Contents column.
A, H, Cg, Sc, Hd, Hh, An, Na, (H), Ad, Art, Hx, Ls, Bt, Fi, Ay, Bl, Sx, Id, Fd, (A), (Hd), Ex, Cl, (Ad), Ge, Xy

Our hypothesis is to create a model with five clusters, each corresponding to the categories we identified earlier.
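As an illustration of this grouping, the following sketch maps a few determinant codes to the five categories. The lookup table is a hypothetical example: the codes listed and their assignment are assumptions made for illustration and are not the full mapping used in this work.

```python
# Hypothetical lookup table: the codes and their grouping are illustrative
# assumptions, not the complete mapping used in the paper.
CATEGORIES = ["form", "reflection_pair", "movement", "color", "shading"]

DETERMINANT_TO_CATEGORY = {
    "F": "form",
    "Fr": "reflection_pair", "rF": "reflection_pair",
    "M": "movement", "FM": "movement", "m": "movement",
    "FC": "color", "CF": "color", "C": "color",
    "FY": "shading", "YF": "shading", "Y": "shading",
}

def category_index(determinant):
    """Return the index (0..4) of the major group a determinant code belongs to."""
    return CATEGORIES.index(DETERMINANT_TO_CATEGORY[determinant])

# Example: a movement determinant falls in the third group (index 2).
example = category_index("FM")
```

With such a table, each response carries a category index instead of a raw determinant code, so the number of candidate clusters drops to five.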

To ensure that our data is coherent, we need to take additional steps. One such step is to exclude protocols with fewer than seven responses, as they are not considered useful for our purposes. This results in a slight reduction in the size of our dataset.

Another step is to split the values in the Contents and Determinants columns, because one sample may be recorded in the dataset with more than one determinant and/or content. Moreover, every protocol has many responses associated with it, so we need to split this data so that each row is considered as a single response.
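These two steps can be sketched as follows. The sketch makes several assumptions that the text does not confirm: rows are plain dictionaries, multiple values in a cell are separated by commas, the number of responses is counted as the number of rows per protocol, and a row with several contents and determinants is expanded into every (content, determinant) pair. The function name `filter_and_split` is hypothetical.

```python
from collections import Counter

def filter_and_split(rows, min_responses=7, sep=","):
    """Drop short protocols, then emit one row per (content, determinant) pair."""
    counts = Counter(row["Protocol"] for row in rows)
    out = []
    for row in rows:
        if counts[row["Protocol"]] < min_responses:
            continue                        # protocol has fewer than seven responses
        contents = [c.strip() for c in row["Contents"].split(sep) if c.strip()]
        determinants = [d.strip() for d in row["Determinants"].split(sep) if d.strip()]
        for c in contents:
            for d in determinants:
                # Copy the row, keeping a single content and a single determinant.
                out.append({**row, "Contents": c, "Determinants": d})
    return out
```

A row with two contents and two determinants therefore becomes four rows, which is why the dataset grows considerably after this operation.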

The splitting operation will increase the size of the dataset significantly. The complete list of all the values recorded in the Contents column is shown in Table 1. These are still categorical values and we could encode them as one-hot vectors, but a number from 0 to 26 is associated with each content to let the program run on a local machine (0 to A, 1 to H, and so on).

Moreover, it's useful to see what these values represent from the point of view of the specialist. The values refer to some reference classes each answer belongs to. For example: content A refers to animal or animal parts; contents H, Hd, Hh refer to human parts; contents Sx and Sc refer to sexual responses, etc.

Now we have a much larger dataset to work with, but we need to take a couple of steps before constructing the models.

The next step is to split not a single category but the brand-new dataset into two subsets: one containing only data from tests submitted before the pandemic, and the other one containing only data from tests submitted after the pandemic. In the first one, we'll have tests done from Jan 1, 2017, to Feb 29, 2020, while in the second one, we'll have tests submitted from Mar 1, 2020, to Sep 15, 2020. From now on, we will refer to these two subsets as the pre-pandemic dataset for the first one, and the post-pandemic dataset for the second one. It's easy to notice that the pre-pandemic dataset contains far more samples than the post-pandemic dataset because it covers tests done in more than 3 years.

The last step of the pre-processing is to deal with missing or null values for the LocationNumber feature. This is done by substituting the missing values with the mean value of the feature using the SimpleImputer class of the Sci-Kit Learn library for Python (this library will help us for the whole development of this project). Next, we will encode the Determinants feature using a 1-out-of-K encoding technique (not the one provided in the library), storing a 5-dimensional array for each determinant with a 1 in position i, where i = 0, 1, 2, 3, or 4, according to the category the determinant belongs to (as shown earlier).

Finally, we will perform a scaling operation on the data before fitting the model, transforming each feature to have a mean of 0 and a standard deviation of 1 (so that most values lie close to the interval [-1, 1]).

After completing the pre-processing steps, the pre-pandemic dataset will contain 1,177,056 samples, and the post-pandemic dataset will contain 136,408 samples, with each dataset having 7 features (1 for location numbers, 1 for contents, and 5 for storing the 5-dimensional one-hot encoding of the determinants). Since we are dealing with a large number of samples, we will randomly select 10,000 samples from each dataset to evaluate clustering. However, we can confidently rely on this choice because of the scaling operation conducted just before fitting the model.
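The pre-processing steps of this section can be summarized in a small sketch. It is written in plain Python rather than with the Sci-Kit Learn classes mentioned in the text; the helper names, the use of `None` for missing values, and the "Test Date" key holding a `datetime.date` are assumptions made for illustration.

```python
from datetime import date
from statistics import mean, pstdev

PANDEMIC_START = date(2020, 3, 1)   # first day of the post-pandemic subset

def split_by_date(rows):
    """Split rows into the pre-pandemic and post-pandemic subsets."""
    pre = [r for r in rows if r["Test Date"] < PANDEMIC_START]
    post = [r for r in rows if r["Test Date"] >= PANDEMIC_START]
    return pre, post

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def one_hot(index, size=5):
    """1-out-of-K encoding of a determinant category index."""
    vec = [0] * size
    vec[index] = 1
    return vec

def standardize(values):
    """Rescale a feature to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0   # avoid division by zero for constant features
    return [(v - mu) / sigma for v in values]
```

For instance, `impute_mean([1.0, None, 3.0])` fills the gap with 2.0, and `one_hot(2)` yields `[0, 0, 1, 0, 0]` for a determinant in the third category.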

4. Implementation

In this paper, the EM algorithm [23] has been chosen to cluster the data. Given a dataset, this algorithm computes the mean, the covariance matrix and the prior of each distribution (cluster) involved in the problem. This choice is reasonable: the algorithm is able to determine the cluster attributes in a smoother way, e.g. by considering the data to be distributed in an elliptical way. As a result, the clusters appear smoother than those produced by KMeans clustering, which allows us to overcome the drawbacks described before.

4.1. Gaussian Mixture Model

First, we have to make a strong assumption: the samples in the dataset we're dealing with are generated by a Gaussian distribution. Since we are also dealing with K types of clusters, we assume the samples are generated by K different Gaussian distributions. So the probability of having a specific sample x in the dataset is expressed as:

$$P(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k)$$

where x is a sample in the dataset $D = \{x_i\}$, $i = 1, \dots, N$, $\pi_k$ is the prior, $\mu_k$ is the mean and $\Sigma_k$ is the covariance matrix of the k-th distribution.

The model is composed of a combination of Gaussian distributions because all K distributions are handled simultaneously, hence the term Gaussian Mixture Model (GMM).

In this case, a good way to express the data is by introducing the so-called latent variables $z_k \in \{0, 1\}$, with $z = (z_1, \dots, z_K)$ following the 1-out-of-K encoding, where only one component is 1 (in the k-th position) and the other K-1 components are zeros. Using this representation, we are assigning each sample to one specific distribution, and each sample has a prior probability of being assigned to the k-th distribution equal to

$$P(z_k = 1) = \pi_k$$

thus the probability of having a specific set of latent variables is given by

$$P(z) = \prod_{k=1}^{K} \pi_k^{z_k}$$

so only the k-th prior is selected for any sample. For a given value of z, we have

$$P(x \mid z_k = 1) = \mathcal{N}(x; \mu_k, \Sigma_k)$$

thus

$$P(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x; \mu_k, \Sigma_k)^{z_k}$$

so now we are able to compute the joint distribution of samples and latent variables as

$$P(x, z) = P(z) \, P(x \mid z).$$

In this case, the z variables have the 1-out-of-K encoding property and the probability of having the dataset generated by the defined model can be computed by

$$P(x) = \sum_{z} P(z) \, P(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k)$$

One can notice that the GMM distribution P(x) can be seen as a marginalization of a distribution P(x, z) over the variables z.

Given a dataset of observations $D = \{x_n\}_{n=1}^{N}$, each data point $x_n$ is associated with the corresponding variable $z_n$, which is unknown. The analysis of latent variables allows for a better understanding of input data (e.g., dimensionality reduction).

4.2. Expectation Maximization

The Expectation Maximization (EM) algorithm is an approach for maximum likelihood estimation in the presence of latent variables. It's a general technique for finding maximum likelihood estimators in latent variable models. In detail, given a dataset $D = \{x_n\}_{n=1}^{N}$ and a GMM defined as P(x), the algorithm determines the estimates of the mean $\mu_k$, the covariance matrix $\Sigma_k$ and the prior $\pi_k$.

The EM algorithm is based on the estimation of the maximum likelihood: let's define the posterior probability after observation of x as

$$\gamma(z_k) = P(z_k = 1 \mid x) = \frac{P(z_k = 1) \, P(x \mid z_k = 1)}{P(x)} = \frac{\pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x; \mu_j, \Sigma_j)}$$

thus the maximum likelihood estimate is computed as

$$\arg\max_{\pi, \mu, \Sigma} P(D \mid \pi, \mu, \Sigma)$$

where, at the maximum, each parameter depends on the posteriors $\gamma(z_k)$, which in turn depend on the parameters, so the solution is found iteratively.
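To make the mixture density and the posterior concrete, the following sketch evaluates them for a one-dimensional mixture, where each covariance matrix reduces to a scalar variance. The one-dimensional restriction and the function names are assumptions made to keep the example short; the paper itself works with multivariate data.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a univariate Gaussian N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_density(x, priors, means, variances):
    """P(x) = sum_k pi_k N(x; mu_k, var_k)."""
    return sum(p * gaussian_pdf(x, m, v) for p, m, v in zip(priors, means, variances))

def responsibilities(x, priors, means, variances):
    """gamma(z_k) = pi_k N(x; mu_k, var_k) / P(x), one value per component."""
    weighted = [p * gaussian_pdf(x, m, v) for p, m, v in zip(priors, means, variances)]
    total = sum(weighted)
    return [w / total for w in weighted]

# A point halfway between two identical components is shared equally.
gamma = responsibilities(0.0, priors=[0.5, 0.5], means=[-1.0, 1.0], variances=[1.0, 1.0])
```

The posteriors always sum to one over the components, which is what allows them to be read as soft cluster assignments.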

The EM algorithm [23] is an iterative approach that cycles between two modes. The first mode, called the expectation step (or E-step), attempts to estimate the missing or latent variables. The second mode, called the maximization step (or M-step), attempts to optimize the parameters of the model to best explain the data. At first, the algorithm takes a random initialization of the parameters:

$$\pi_k^{(0)}, \quad \mu_k^{(0)}, \quad \Sigma_k^{(0)}$$

4.2.1. E-step

In this step, the model estimates the missing (latent) variables in the dataset. This is performed by the following:

$$\gamma(z_{nk})^{(t+1)} = \frac{\pi_k^{(t)} \, \mathcal{N}(x_n; \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_{j=1}^{K} \pi_j^{(t)} \, \mathcal{N}(x_n; \mu_j^{(t)}, \Sigma_j^{(t)})}$$

4.2.2. M-step

In this step, the model maximizes the likelihood with respect to the parameters given the data, and updates the parameters for further iterations. This is done by the following equations:

$$\mu_k^{(t+1)} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})^{(t+1)} \, x_n$$

$$\Sigma_k^{(t+1)} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})^{(t+1)} \, \left(x_n - \mu_k^{(t+1)}\right)\left(x_n - \mu_k^{(t+1)}\right)^T$$

$$\pi_k^{(t+1)} = \frac{N_k}{N}, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})^{(t+1)}$$
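The E-step and M-step can be put together in a short sketch. It is not the implementation used in this work: it handles one-dimensional data only (so each covariance is a scalar variance), runs a fixed number of iterations, and replaces the random initialization with a deterministic quantile-based one so that the result is reproducible. The function names and the synthetic data are hypothetical.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian N(x; mu, var)."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, k, n_iter=50):
    """Fit a 1D Gaussian mixture with k components by EM."""
    n = len(data)
    ordered = sorted(data)
    # Initialization (assumption): uniform priors, means at evenly spaced
    # quantiles of the data, and the overall variance for every component.
    priors = [1.0 / k] * k
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    for _ in range(n_iter):
        # E-step: posterior gamma[i][j] of component j for sample i.
        gamma = []
        for x in data:
            w = [priors[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            s = sum(w)
            gamma.append([wj / s for wj in w])
        # M-step: update mean, variance and prior of every component.
        for j in range(k):
            nk = sum(g[j] for g in gamma)
            means[j] = sum(g[j] * x for g, x in zip(gamma, data)) / nk
            var = sum(g[j] * (x - means[j]) ** 2 for g, x in zip(gamma, data)) / nk
            variances[j] = max(var, 1e-6)   # guard against a collapsing component
            priors[j] = nk / n
    return priors, means, variances

# Synthetic data (assumption): two Gaussian groups centered at 0 and 10.
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
priors, means, variances = em_gmm_1d(data, k=2)
```

In the multivariate case the variance update becomes the weighted outer product of the M-step above; in practice a library implementation such as the Gaussian mixture class of Sci-Kit Learn would normally be preferred over a hand-written loop.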

The quality of the clustering is assessed with the silhouette score, which is calculated by taking into account the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each data point. The silhouette score for a sample is (b − a) / max(a, b). The value is interpreted as follows:
• A silhouette score with a value near 1 means the data point is in the correct cluster;
• A silhouette score with a value near 0 means the data point might belong in some other cluster;
• A silhouette score with a value near -1 means the data point is in the wrong cluster.

The silhouette score is computed for K = 2, ..., 10 and, in general, the number of clusters is chosen as the one with the highest silhouette score.
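The definition of the silhouette score can be sketched in plain Python as follows. This is an illustrative version only, assuming the Euclidean distance and at least two clusters; the toy points and labels are hypothetical, and a library routine would be used on real data.

```python
import math

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all samples (needs at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)      # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy data (assumption): two tight groups, labeled correctly and incorrectly.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```

On this toy example the correct labeling scores close to 1, while the labeling that mixes the two groups scores below 0, in line with the interpretation given above.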

5. Results

Before going into the numerical results, we would like to underline that we compute the silhouette score to see what the suggested number of clusters is. Remember that we'd like to have k=5 clusters, so the silhouette scores will be used as a guide rather than a hard rule when deciding the number of clusters, since the GMM is a probabilistic model. The scores are sketched in Figure 2 for the pre-pandemic data and in Figure 3 for the post-pandemic data, and the numerical values are listed in Table 2. They have been computed by using the silhouette_score function provided by the Sci-Kit Learn library [24]. Furthermore, the average Silhouette score for the pre-pandemic GMM model is 0.4077, while the average Silhouette score for the post-pandemic GMM model is 0.2622.

One may notice that the two curves do not have a similar shape, but the highest Silhouette score for both datasets is obtained for k=2: this suggests that the best number of clusters for this dataset should be 2, but here the silhouette score is used to check that our choice of k=5 is reasonable. Despite the poor score for the post-pandemic data, we can keep the initial choice because we are assuming that the distribution of all data points is similar, regardless of when the tests were submitted. The choice of k=5 is also acceptable because its score is higher than the average Silhouette score (for the pre-pandemic data).

From the plots, we can say the choice of either k=3 or k=4 could be acceptable, too. We may anticipate that the clusters might be distinct but some data points may overlap in space.

In Figure 4 and Figure 5, the data points randomly chosen for clustering are shown, whereas the GMMs are shown in Figure 6 and Figure 7. In particular, Figures 4 and 6 are related to the pre-pandemic data points and distributions, while Figures 5 and 7 are related to the post-pandemic data. Please notice that in both Figure 4 and Figure 5, some points may overlap, so you might see fewer points for a cluster.

It's also interesting to see that some points are very far away from the others; this happens for points that have been collected with a location number equal to 99 (the mean of the location numbers is 10.7) and that have been predicted as belonging to the shading category (the yellow one). Those points may play an effective role in enlarging and stretching the Gaussian distributions, especially the one related to the movement category.

We plot the encoded location numbers on the horizontal axis and the encoded contents on the vertical axis. One can assert the following statements for these plots:
• For the pre-pandemic data, the plot shows a similar distribution of samples predicted as the other 4 categories along the horizontal axis, while it is very diverse and distinct along the vertical axis: it's clear that we don't have many cases in which the distributions share the same parts of the space;
• For the post-pandemic data, the plot shows a very different scenario with respect to the previous one: we see a big increase in samples predicted as belonging to the form category (the dark blue ones), but one can also see a decrease in the number of samples predicted as belonging to the movement category (the orange ones), while the samples of the reflection category (the green ones) and the color category (the light blue ones) change their trend slightly;
• For both pre- and post-pandemic models, some points are very far away from the others; this happens for points that have been collected with a location number equal to 99 and that have been predicted as belonging to the shading category (the yellow ones). In this case, the location number refers to this category only. We can also understand this trend by looking at the model plots: indeed, the yellow Gaussian distribution is very thin (namely, the minor semi-axis is really close to zero) and far away from the other Gaussian components.

So, in general, we notice a significant change in the distributions of almost all the categories. Because of that, we want to understand which feature is the most predominant one between the locations and the contents: to do this, we've performed the PCA (Principal Component Analysis) [25] by using the decomposition.PCA class of the Sci-Kit Learn library and we obtained the following results: [0.90091513 0.09491116] for the pre-pandemic model, and [0.88808373 0.10983465] for the post-pandemic model, so in both cases the most dominant component is the location number, and so the location of the inkblot which prompted the patient to respond in that way. One may conclude that the location number can be seen as a kind of scaling factor for the data points.

6. Conclusions

The project highlights the significant impact that the numerical value assigned to the location of a region on a blot has on the psychologist's final interpretation. Furthermore, Figures 6 and 7 demonstrate that the results produced by the KMeans algorithm can be quite chaotic. The proximity of the means and the elliptical shapes of the distributions computed by the GMM indicate that the KMeans algorithm may not be able to accurately fit the data. This is likely due to the algorithm's constraint of computing circular-shaped distributions [26].

Given the richness of the dataset and the subjectivity of the Rorschach test, other projects can be done. An interesting approach would be to cluster the data based on the location or contents of the blot. Another option would be to analyze the data distribution by grouping it according to the cards and determining the most frequent response provided by participants for each card.

Notwithstanding any potential future research, the main aim of this project is to help psychologists comprehend the variations in Rorschach tests conducted before and after the Covid-19 pandemic. The primary objective is to identify the most significant non-correlated variable that affects the overall subject choice in responses, thereby enabling an accurate interpretation of this phenomenon.
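The kind of explained-variance ratios reported for the PCA in the Results can be illustrated with a small sketch. It is restricted to two features, for which the eigenvalues of the covariance matrix have a closed form, and it does not reproduce the figures reported in the paper; the function name and the toy data are hypothetical.

```python
import math

def explained_variance_ratio_2d(xs, ys):
    """Share of the total variance carried by each principal component (2 features)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix in closed form.
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    total = lam1 + lam2
    return [lam1 / total, lam2 / total]

# Toy data (assumption): most of the spread lies along the first feature.
ratios = explained_variance_ratio_2d([-3.0, 3.0, 0.0, 0.0], [0.0, 0.0, -1.0, 1.0])
```

The two ratios sum to one, and the first value is the fraction of the variance captured by the dominant component, which is the quantity compared between the pre-pandemic and post-pandemic models.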

References

[1] J. E. Exner Jr, A rorschach workbook for the comprehensive system, in: Rorschach Workshops, 2001, pp. 171-187.

[2] Russo, Napoli, A comprehensive solution for psychological treatment and therapeutic path planning based on knowledge base and expertise sharing, volume 2472, 2019, pp. 41-47.

[3] Lo Sciuto, Russo, Napoli, A cloud-based flexible solution for psychometric tests validation, administration and evaluation, volume 2468, CEUR-WS, 2019, pp. 16-21.

[4] Pepe, Tedeschi, Brandizzi, Russo, Iocchi, Napoli, Human attention assessment using a machine learning approach with gan-based data augmentation technique trained using a custom dataset, OBM Neurobiology 6 (2022). doi:10.21926/obm.neurobiol.2204139.

[5] Napoli, Pappalardo, Tramontana, A hybrid neuro-wavelet predictor for qos control and stability, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8249 LNAI (2013) 527-538. doi:10.1007/978-3-319-03524-6_45.

[6] Brandizzi, Bianco, G. Castro, Russo, A. Wajda, Automatic rgb inference based on facial emotion recognition, in: CEUR Workshop Proceedings, volume 3092, CEUR-WS, 2021, pp. 66-74.

[7] Capizzi, Napoli, Russo, Woźniak, Lessening stress and anxiety-related behaviors by means of ai-driven drones for aromatherapy, in: CEUR Workshop Proceedings, volume 2594, CEUR-WS, 2020, pp. 7-12.

[8] Marcotrigiano, G. Stingi, Fregnan, P. Magarelli, P. Pasquale, Russo, G. Orsi, Montagna, Napoli, An integrated control plan in primary schools: Results of a field investigation on nutritional and hygienic features in the apulia region (southern italy), Nutrients 13 (2021). doi:10.3390/nu13093006.

[9] Brandizzi, Russo, Brociek, Wajda, First studies to apply the theory of mind theory to green and smart mobility by using gaussian area clustering, volume 3118, CEUR-WS, 2021, pp. 71-76.

[10] Illari, Russo, Avanzato, Napoli, A cloud-oriented architecture for the remote assessment and follow-up of hospitalized patients, in: CEUR Workshop Proceedings, volume 2694, CEUR-WS, 2020, pp. 29-35.

[11] Dat, Ponzi, Russo, Vincelli, Supporting impaired people with a following robotic assistant by means of end-to-end visual target navigation and reinforcement learning approaches, in: CEUR Workshop Proceedings, volume 3118, CEUR-WS, 2021, pp. 51-63.

[12] Russo, Illari, Avanzato, Napoli, Reducing the psychological burden of isolated oncological patients by means of decision trees, in: CEUR Workshop Proceedings, volume 2768, CEUR-WS, 2020, pp. 46-53.

[13] A. S. Charles, Interpreting deep learning: The machine learning rorschach test?, stat.ML (2018) 1-4.

[14] Alessandri, Aschieri, Bobbio, Daini, Lis, Nucci, L. Parolin, Documento associazione italiana di psicologia (aip) sulle linee guida per l'assessment ai tempi del coronavirus (2020).

[15] D. De Fidio, P. Pancheri, L. Corfiati, The automated rorschach test: an assessment of the pralp3 three years after publication, Journal of Psychopathology (1998).

[16] Exner, I. Weiner, PAR, Riap5: Scoring program, n.d. URL: https://www.parinc.com/Products/Pkey/363.

[17] J. M. Smith, E. E. Taylor, Chessss: An innovative rorschach scoring program, Journal of Personality Assessment 98 (2016) 660-662.

[18] Ramireddy, Covid-19 rorschach test dataset, 2021. URL: https://www.kaggle.com/surekharamireddy/covid-19-rorschach-test-dataset/data.

[19] K. P. Sinaga, M.-S. Yang, Unsupervised k-means clustering algorithm, IEEE Access 8 (2020) 80716-80727. doi:10.1109/ACCESS.2020.2988796.

[20] Capizzi, Bonanno, Napoli, Hybrid neural networks architectures for soc and voltage prediction of new generation batteries storage, in: 3rd International Conference on Clean Electrical Power: Renewable Energy Resources Impact, ICCEP 2011, 2011, pp. 341-344. doi:10.1109/ICCEP.2011.6036301.

[21] Nowak, Nowicki, Woźniak, Napoli, Multi-class nearest neighbour classifier for incomplete data handling, in: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), volume 9119, Springer Verlag, 2015, pp. 469-480. doi:10.1007/978-3-319-19324-3_42.

[22] Capizzi, Bonanno, Napoli, A wavelet based prediction of wind and solar energy for long-term simulation of integrated generation systems, in: SPEEDAM 2010 - International Symposium on Power Electronics, Electrical Drives, Automation and Motion, 2010, pp. 586-592. doi:10.1109/SPEEDAM.2010.5542259.

[23] Moon, The expectation-maximization algorithm, IEEE Signal Processing Magazine 13 (1996) 47-60. doi:10.1109/79.543975.

[24] K. R. Shahapure, C. Nicholas, Cluster quality analysis using silhouette score, 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) (2020) 747-748. doi:10.1109/DSAA49011.2020.00096.

[25] Kherif, Latypova, Chapter 12 - principal component analysis, in: A. Mechelli, S. Vieira (Eds.), Machine Learning, Academic Press, 2020, pp. 209-225. URL: https://www.sciencedirect.com/science/article/pii/B9780128157398000122. doi:10.1016/B978-0-12-815739-8.00012-2.

[26] Y. G. Jung, M. S. Kang, Heo, Clustering performance comparison using k-means and expectation maximization algorithms, Biotechnology & Biotechnological Equipment 28 (2014) S44-S48.