<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis of Pre and Post COVID-19 Pandemic Rorschach Test Data Using EM Algorithms and GMM Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valerio Ponzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Samuele Russo</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agata Wajda</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafał Brociek</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Napoli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer, Control and Management Engineering, Sapienza University of Rome</institution>
          ,
          <addr-line>Via Ariosto 25, Roma, 00185</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Mathematics Applications and Methods for Artificial Intelligence, Faculty of Applied Mathematics, Silesian University of Technology</institution>
          ,
          <addr-line>Gliwice, 44-100</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Psychology, Sapienza University of Rome</institution>
          ,
          <addr-line>Via dei Marsi 78, Roma, 00185</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Institute for Systems Analysis and Computer Science, Italian National Research Council</institution>
          ,
          <addr-line>Via dei Taurini 19, Roma, 00185</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Institute of Energy and Fuel Processing Technology</institution>
          ,
          <addr-line>Zabrze, 41-803</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <fpage>55</fpage>
      <lpage>63</lpage>
      <abstract>
        <p>The global spread of the COVID-19 virus has become one of the greatest challenges that humanity has faced in recent years. The unprecedented circumstances of forced isolation and uncertainty that it has imposed on us continue to impact our mental well-being, whether or not we have been directly affected by the virus. Over a period of nearly four years (2017-2020), data were collected from multiple administrations of the Rorschach test, one of the most renowned and extensively studied psychological tests. This study clusters the data, collected through the RAP3 software, to analyze the distinctive trends in data recorded before and after the pandemic. This was achieved with the well-established machine learning algorithm Expectation-Maximization. The proposed solution effectively and reliably identifies the key variables that significantly influence the subject's score. Additionally, the solution offers an intuitive visualization that can assist psychologists in accurately interpreting shifts in trends and response distributions within a large amount of data in the two periods.</p>
      </abstract>
      <kwd-group>
        <kwd>Rorschach test</kwd>
        <kwd>Gaussian Mixture Model (GMM)</kwd>
        <kwd>Expectation Maximization (EM)</kwd>
        <kwd>Principal component analysis (PCA)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The Rorschach test takes its name from its creator, Hermann Rorschach. The test presents the participant with 10 cards (or tables), each of which shows a symmetric inkblot. The cards are shown in Fig. 1, where it can be seen that 5 are monochromatic, 2 are two-tone, and 3 are colored. The cards are given to the subject one by one, and the subject is asked to comment on everything the card represents from their perspective. It is crucial to note that the subject may take any amount of time to respond and that there are no "correct" or "wrong" answers, because the responses are all subjective. There can also be more than one response for each card: the psychiatrist or psychologist encourages the subject to give many original replies. Typically, the psychologist notes every answer the subject gives and the specific part of the inkblot it relates to. There is no specific submission order for the cards.</p>
      <p>Much more could be said about this test, but this brief excursion into psychology is enough to understand its utility for our task. A large amount of data was collected in a dataset called ‘COVID-19 Rorschach test dataset’, which contains samples of protocols and responses from 2017-01-01 to 2020-09-15 and is freely available online. It includes some demographics-related variables and the codes of the Comprehensive System (Exner, 2001). The dataset contains more than 500,000 coded responses to the inkblot stimuli. Each series of responses refers to the interpretation given by a certain subject. The data were collected using the Rorschach Assistant Program (RAP3) software [<xref ref-type="bibr" rid="ref1">1</xref>], which is one of the currently available examples of a system for online testing [<xref ref-type="bibr" rid="ref2 ref3 ref4 ref5 ref6">2, 3, 4, 5, 6</xref>] and assessment [<xref ref-type="bibr" rid="ref10 ref11 ref12 ref7 ref8 ref9">7, 8, 9, 10, 11, 12</xref>].</p>
      <p>This dataset contains a significant amount of information for us to analyze. The final interpretation given by a psychiatrist depends on various factors, including the response's content, the visual stimuli, and the region of the inkblot that elicits the response. Our task in this paper is to identify which variables are responsible for eliciting the responses and to examine how these values change between the data collected before and during the pandemic. It is worth noting that the dataset does not specify whether the subjects who underwent the test during the pandemic were infected or not.</p>
      <p>The main idea is to cluster the data, isolating some important features and describing the behavior of the responses in a probabilistic manner. Clustering is a good choice because this is an unsupervised task, and the Expectation-Maximization algorithm is well suited to it. The algorithm is applied to a Gaussian Mixture Model (GMM) whose number of components matches the number of clusters we assume to have, which is one of the results we would like to obtain with this work.</p>
      <sec id="sec-1-1">
        <title>2. Related works</title>
        <p>Since the pandemic began only a few years ago, the research literature on this specific task is quite sparse. The most popular Rorschach test-related research involves deep learning and neural networks for image classification [<xref ref-type="bibr" rid="ref13">13</xref>], but this concerns the computer interpretation of inkblots. As specified before, we are instead interested in which kind of feature most affects the subjects' responses, and in how the trends of the responses differ between tests done before and during the pandemic.</p>
        <p>One of the major problems was (and is) the administration of the Rorschach test to infected people: as the AIP (Associazione Italiana di Psicologia, the Italian Association of Psychology) [<xref ref-type="bibr" rid="ref14">14</xref>] stated, remote test administration introduces significant complications in assessments where the physical presence of the subject is needed (cognitive, neurodevelopment, work stress). However, remote administration was strongly recommended in the other cases (and mandatory if the patient was currently affected by Covid-19). Note that Italy is the country with the highest number of Rorschach test submissions registered in the dataset. In detail, psychologists take into account many stimuli from a subject: during remote sessions, they may find small alterations in the verbal activity of the subject, but the non-verbal stimuli and the handling of the cards may be dramatically affected, mainly due to the brightness and sharpness of the screen. In-depth studies of the test results have not been published because the specialists need to preserve the privacy of the patients.</p>
        <p>Regarding software applications created to help psychiatrists in the analysis of the signatures, it is worth mentioning the PRALP3 [<xref ref-type="bibr" rid="ref15">15</xref>] software, developed by Pancheri, De Fidio and Corfiati from the University of Rome and the University of Bari and published in 1995, the RIAP5 [<xref ref-type="bibr" rid="ref16">16</xref>] scoring program developed by the PAR company, the RAP3 program cited before, and the CHESSSS [<xref ref-type="bibr" rid="ref17">17</xref>] program published in 2016. From this, it is clear that computer programs have already helped specialists in the field in a relevant manner.</p>
        <p>
          A form of clustering on this very same dataset was performed by Surekha Ramireddy and published
on the Kaggle platform [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. In that work, the
clustering was performed with the KMeans algorithm
[19, 20, 21, 22]. The choice of the KMeans algorithm is
easy to justify: it is one of the simplest and most intuitive
algorithms in unsupervised ML. It assumes that the
samples in the dataset are generated by
Gaussian distributions represented by K means
(with the same prior and covariance matrix for each
Gaussian distribution). The algorithm starts
with a random initialization of the K means, then assigns
each point to the closest mean (the means represent the centers
of the clusters) by computing the distance between the
point and all the means and taking the one with the lowest
distance. It then updates each mean from the points
assigned to its cluster and repeats the process. The algorithm stops
when no point changes cluster between the current
and the previous step.
        </p>
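        <p>The iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the implementation used in [18]: the function name kmeans, the Euclidean distance and the synthetic two-blob data are assumptions made for the example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration: returns (means, labels) for data X of shape (N, D)."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct samples as the starting means.
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment step: Euclidean distance from every point to every mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop when no point changes cluster between two consecutive steps.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each mean to the centroid of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                means[j] = X[labels == j].mean(axis=0)
    return means, labels


# Synthetic data (an assumption for illustration): two separated blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
means, labels = kmeans(X, k=2)
```
]]></preformat>
        <p>Only the means are estimated here; no prior or covariance is attached to a cluster, which is the limitation discussed next.</p>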
        <p>Convergence is always reached if the distance
function used guarantees that the sum of distances decreases from
one iteration to the next (as each mean moves to
the centroid of its cluster).</p>
        <p>Some problems might be encountered: the choice of
K is fundamental (usually, from 1 to 10), and the result
depends strongly on the choice of the distance function. Also, this
algorithm does not treat the priors and covariance
matrices as parameters, so it relies on the computation of the
means only and gives no information about
the covariance matrices of the distributions; namely, we
cannot control the amplitude of the clusters. Moreover,
the KMeans algorithm tends to produce circular clusters
in 2D (spherical in 3D), which may reduce
accuracy if the dataset is strongly unbalanced.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Dataset</title>
      <p>Let's take a closer look at the dataset used in this analysis. The dataset comprises 506,480 samples, each consisting of 24 features. The features are listed below for completeness [<xref ref-type="bibr" rid="ref18">18</xref>]:</p>
      <list list-type="bullet">
        <list-item><p>User: user ID number;</p></list-item>
        <list-item><p>PQlevel: professional qualification level;</p></list-item>
        <list-item><p>Client: client ID number;</p></list-item>
        <list-item><p>Age: client age in years;</p></list-item>
        <list-item><p>Gender: client gender;</p></list-item>
        <list-item><p>Country: client country;</p></list-item>
        <list-item><p>Protocol: protocol ID number;</p></list-item>
        <list-item><p>Test Date: the date the RAP3 protocol was created;</p></list-item>
        <list-item><p>R: total number of responses in the protocol;</p></list-item>
        <list-item><p>ResponseOrder: the order of responses in the protocol;</p></list-item>
        <list-item><p>CardID: Rorschach card number, 1 to 10;</p></list-item>
        <list-item><p>Location: the area of the blot the response refers to;</p></list-item>
        <list-item><p>LocationNumber: location normative number;</p></list-item>
        <list-item><p>Developmental Quality: quality of processing;</p></list-item>
        <list-item><p>Determinants: all the visual stimuli in the blot that shaped the objects reported in the response;</p></list-item>
        <list-item><p>Pair: two identical objects are reported, based on the symmetry of the blot;</p></list-item>
        <list-item><p>Form Quality: how well the area of the blot fits the form of the object specified in the response;</p></list-item>
        <list-item><p>FQText: the normative text associated with the form quality;</p></list-item>
        <list-item><p>Contents: abbreviations for the category to which the reported object belongs;</p></list-item>
        <list-item><p>Popular: responses that occur with high frequency in a normative sample;</p></list-item>
        <list-item><p>ZCode: the relationship between distinct blot areas;</p></list-item>
        <list-item><p>ZScore: numerical value assigned to responses in which such organizational activity occurs;</p></list-item>
        <list-item><p>Special Scores: indicate the presence of special features in the response;</p></list-item>
        <list-item><p>Rejection: number of card rejections in the protocol.</p></list-item>
      </list>
      <sec id="sec-2-1">
        <p>Not all of these features are relevant to our task; in
fact, the features we will take into account to perform
the clustering are the ones stored in the Location,
Determinants, and Contents columns. All of these features
store information about the test that may be useful for
interpreting the responses the subject gives to the inkblots.
In addition to being unnecessary for this task, the other
features can be eliminated because they would add to the
computational burden and processing time, given
the large number of missing values they contain.</p>
        <p>The choice to cluster the data using the Location,
Contents, and Determinants features is easily understandable.
However, rather than using the categorical Location
feature, we opt for the numerical LocationNumber feature.
Despite containing some missing values, this feature can
still be used as the latent variable for the EM algorithm.
Thus, we simplify the dataset by condensing it into three
primary features.</p>
        <p>Table 1. Values recorded in the Contents column: A, H, Cg, Sc, Hd, Hh, An, Na, (H), Ad, Art, Hx, Ls, Bt, Fi, Ay, Bl, Sx, Id, Fd, (A), (Hd), Ex, Cl, (Ad), Ge, Xy.</p>
        <p>Our goal is to cluster the data based on the
Determinants feature, but clustering
algorithms are not meant to handle more than 30
centroids. Therefore, we need to group similar values
together to reduce the number of possible centroids. To
do this, we categorize the values of the determinants
into major groups based on their similar meanings:</p>
        <list list-type="bullet">
          <list-item><p>Determinants based exclusively on the form feature of the blot;</p></list-item>
          <list-item><p>Determinants based on the parts of the blot that either seem to reflect or are paired with other parts of the blot;</p></list-item>
          <list-item><p>Determinants based on movement features of the figure represented by the blot;</p></list-item>
          <list-item><p>Determinants based on the color features of the blot as the principal cause of response;</p></list-item>
          <list-item><p>Determinants based on the shading part in both achromatic and chromatic cards.</p></list-item>
        </list>
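        <p>The grouping into five categories can be sketched as a lookup table. The paper does not list the exact assignment it used, so the mapping below is an illustrative assumption based on common Comprehensive System determinant codes; the names DETERMINANT_GROUPS and determinant_category, the placement of FD with the form group, and the placement of the achromatic color codes with the color group are all hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Illustrative grouping of determinant codes into the five categories above.
# The exact assignment used in the paper is not given; this one is an assumption.
DETERMINANT_GROUPS = {
    "form": {"F", "FD"},
    "reflection_pair": {"Fr", "rF", "2"},
    "movement": {"M", "FM", "m"},
    "color": {"C", "CF", "FC", "Cn", "C'", "C'F", "FC'"},
    "shading": {"T", "TF", "FT", "V", "VF", "FV", "Y", "YF", "FY"},
}
# Position of each category, usable later as the index of a 1-out-of-K vector.
CATEGORY_INDEX = {name: i for i, name in enumerate(DETERMINANT_GROUPS)}


def determinant_category(code):
    """Return the category name for one determinant code, or None if unknown."""
    base = code.strip()
    # Movement codes may carry an active/passive suffix (e.g. "Ma", "FMp").
    if base[:-1] in DETERMINANT_GROUPS["movement"] and base[-1] in "ap":
        base = base[:-1]
    for name, codes in DETERMINANT_GROUPS.items():
        if base in codes:
            return name
    return None


examples = {code: determinant_category(code) for code in ["F", "Ma", "FC", "Y", "Fr"]}
```
]]></preformat>
        <p>Unknown codes map to None so that they can be inspected or dropped explicitly instead of being assigned silently to a group.</p>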
      </sec>
      <sec id="sec-2-2">
        <p>Our hypothesis is to create a model with five clusters, each corresponding to one of the categories identified earlier.</p>
        <p>To ensure that our data is coherent, we need to take
additional steps. One such step is to exclude protocols
with fewer than seven responses, as they are considered
not useful for our purposes. This results in a slight
reduction in the size of our dataset.</p>
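        <p>A minimal pandas sketch of this filter is shown below. The toy table is made up for illustration; only the column names Protocol and ResponseOrder come from the feature list, and counting rows per protocol (rather than reading the R column) is an assumption.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import pandas as pd

# Toy stand-in for the real table: protocol 1 has 8 responses, protocol 2 has 3.
df = pd.DataFrame({
    "Protocol": [1] * 8 + [2] * 3,
    "ResponseOrder": list(range(1, 9)) + list(range(1, 4)),
})

# Count the coded responses of each protocol and keep protocols with at least seven.
counts = df.groupby("Protocol")["ResponseOrder"].transform("count")
filtered = df[counts >= 7].reset_index(drop=True)
```
]]></preformat>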
        <p>Another necessary step is to split the
values in the Contents and Determinants columns, because one sample
may be recorded in the dataset with more than one
determinant and/or content. Moreover, every protocol has
many responses associated with it, so we need to split
this data so that each row represents a single response.</p>
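        <p>The splitting step can be sketched with pandas as follows. The separators used in the real files are not stated in the text, so the "." between determinants and the "," between contents are assumptions, as are the toy values.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import pandas as pd

# Toy rows: the first response has two determinants and two contents.
df = pd.DataFrame({
    "Protocol": [1, 1],
    "Determinants": ["Ma.FC", "F"],
    "Contents": ["H,Cg", "A"],
})

# Assumed separators: "." between determinants, "," between contents.
df["Determinants"] = df["Determinants"].str.split(".")
df["Contents"] = df["Contents"].str.split(",")

# One row per (determinant, content) pair: explode each list column in turn.
long_df = (
    df.explode("Determinants").reset_index(drop=True)
      .explode("Contents").reset_index(drop=True)
)
```
]]></preformat>
        <p>Each original row yields as many rows as it has determinant and content combinations, which is why the dataset grows after this step.</p>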
        <p>The splitting operation increases the size of the dataset significantly. The complete list of all the values recorded in the Contents column is shown in Table 1. These are still categorical values and we could encode them as one-hot vectors, but instead a number from 0 to 26 is associated with each content so that the program can run on a local machine (0 for A, 1 for H, and so on).</p>
        <p>Moreover, it is useful to see what these values represent from the point of view of the specialist. The values refer to the reference classes each answer belongs to. For example: content A refers to animals and Ad to animal parts; contents H and Hd refer to human figures and human parts; content Sx refers to sexual responses, etc.</p>
        <p>Now we have a much larger dataset to work with, but we need to take a couple of steps before constructing the models.</p>
        <p>The next step is to split the brand-new dataset into two subsets: one containing only data from tests submitted before the pandemic, and the other containing only data from tests submitted after the pandemic began. The first one contains tests done from Jan 1, 2017, to Feb 29, 2020, while the second one contains tests submitted from Mar 1, 2020, to Sep 15, 2020. From now on, we will refer to these two subsets as the pre-pandemic dataset and the post-pandemic dataset, respectively. It is easy to notice that the pre-pandemic dataset contains far more samples than the post-pandemic dataset because it covers tests done over more than 3 years.</p>
        <p>The last step of the pre-processing is to deal with missing or null values for the LocationNumber feature. This is done by substituting the missing values with the mean value of the feature using the SimpleImputer class of the scikit-learn library for Python (this library is used throughout the development of this project). Next, we encode the Determinants feature using a 1-out-of-K encoding (not the one provided in the library), storing a 5-dimensional array for each determinant with a 1 in position i, where i = 0, 1, 2, 3, or 4, according to the category the determinant belongs to (as shown earlier). Finally, we scale the data before fitting the model, standardizing each feature to a mean of 0 and a standard deviation of 1.</p>
        <p>After completing the pre-processing steps, the pre-pandemic dataset contains 1,177,056 samples and the post-pandemic dataset contains 136,408 samples, with each dataset having 7 features (1 for location numbers, 1 for contents, and 5 for the one-hot encoding of the determinants). Since we are dealing with a large number of samples, we randomly select 10,000 samples from each dataset to evaluate clustering. We can confidently rely on this choice because of the scaling operation conducted just before fitting the model.</p>
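        <p>The pre-processing chain (date split, mean imputation, 1-out-of-K encoding, standardization, random sub-sampling) can be sketched as follows. SimpleImputer and StandardScaler are scikit-learn classes; everything else is an assumption for illustration, including the toy values, the helper name preprocess, the hypothetical DeterminantCategory column holding the group index 0 to 4, and the decision to standardize the one-hot columns together with the others.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy rows standing in for the exploded dataset (values are made up).
df = pd.DataFrame({
    "Test Date": pd.to_datetime(["2019-05-01", "2020-02-29", "2020-03-01", "2020-06-10"]),
    "LocationNumber": [1.0, np.nan, 99.0, 4.0],
    "Contents": ["A", "H", "Cg", "A"],
    "DeterminantCategory": [0, 2, 4, 1],  # hypothetical column: group index 0..4
})
CONTENT_CODES = ["A", "H", "Cg"]  # first entries of Table 1; the full list has 27 codes


def preprocess(frame, n_samples=None, seed=0):
    """Impute, encode and standardize one subset; returns an (N, 7) array."""
    # Replace missing location numbers with the mean of the feature.
    loc = SimpleImputer(strategy="mean").fit_transform(frame[["LocationNumber"]])
    # Integer code per content value (0 for A, 1 for H, ...).
    content = frame["Contents"].map(CONTENT_CODES.index).to_numpy(dtype=float).reshape(-1, 1)
    # 1-out-of-K encoding of the determinant category (K = 5).
    one_hot = np.eye(5)[frame["DeterminantCategory"].to_numpy()]
    # Standardize every column to zero mean and unit standard deviation.
    X = StandardScaler().fit_transform(np.hstack([loc, content, one_hot]))
    # Optional random sub-sample for clustering.
    if n_samples is not None and len(X) > n_samples:
        rng = np.random.default_rng(seed)
        X = X[rng.choice(len(X), size=n_samples, replace=False)]
    return X


cutoff = pd.Timestamp("2020-03-01")
pre = preprocess(df[df["Test Date"] < cutoff], n_samples=10_000)
post = preprocess(df[df["Test Date"] >= cutoff], n_samples=10_000)
```
]]></preformat>
        <p>Each subset is imputed and scaled on its own in this sketch; whether the original pipeline fitted the imputer and scaler per subset or on the whole dataset is not stated.</p>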
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Implementation</title>
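      <p>In this paper, the EM [23] algorithm has been chosen to cluster the data. Given a dataset, this algorithm computes the mean, the covariance matrix and the prior of each distribution (cluster) involved in the problem. This choice is reasonable: the algorithm determines the cluster attributes in a smoother way, e.g. by considering the data to be distributed elliptically. As a result, the clusters appear smoother than those produced by KMeans clustering, which allows us to overcome the drawbacks described before.</p>
      <sec id="sec-3-1">
        <title>4.1. Gaussian Mixture Model</title>
        <p>First, we have to make a strong assumption: the samples in the dataset we are dealing with are generated by a Gaussian distribution. Since we are also dealing with K types of clusters, we assume the samples are generated by K different Gaussian distributions. So the probability of having a specific sample x in the dataset is expressed as:</p>
        <disp-formula><tex-math><![CDATA[P(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k)]]></tex-math></disp-formula>
        <p>where x is a sample in the dataset D = {x<sub>i</sub>}, i = 1, ..., N, π<sub>k</sub> is the prior, μ<sub>k</sub> is the mean and Σ<sub>k</sub> is the covariance matrix of the k-th distribution.</p>
        <p>The model is composed of a combination of Gaussian distributions because all K distributions are handled simultaneously, hence the term Gaussian Mixture Model (GMM).</p>
        <p>In this case, a good way to express the data is by introducing the so-called latent variables z<sub>k</sub> ∈ {0, 1}, with z = (z<sub>1</sub>, ..., z<sub>K</sub>), where z is a 1-out-of-K encoding in which only one component is 1 (in the k-th position) and the other K-1 components are zeros. Using this representation, we assign each sample to one specific distribution, and each sample has a prior probability of being assigned to the k-th distribution equal to</p>
        <disp-formula><tex-math><![CDATA[P(z_k = 1) = \pi_k]]></tex-math></disp-formula>
        <p>thus the probability of having a specific set of latent variables is given by</p>
        <disp-formula><tex-math><![CDATA[P(z) = \prod_{k=1}^{K} \pi_k^{z_k}]]></tex-math></disp-formula>
        <p>so only the k-th prior is selected for any sample. For a given value of z, we have</p>
        <disp-formula><tex-math><![CDATA[P(x \mid z_k = 1) = \mathcal{N}(x; \mu_k, \Sigma_k)]]></tex-math></disp-formula>
        <p>thus</p>
        <disp-formula><tex-math><![CDATA[P(x \mid z) = \prod_{k=1}^{K} \mathcal{N}(x; \mu_k, \Sigma_k)^{z_k}]]></tex-math></disp-formula>
        <p>so now we are able to compute the joint distribution of samples and latent variables as</p>
        <disp-formula><tex-math><![CDATA[P(x, z) = P(z) \, P(x \mid z).]]></tex-math></disp-formula>
        <p>In this case, the z variables have the 1-out-of-K encoding property, and the probability of having the dataset generated by the defined model can be computed by</p>
        <disp-formula><tex-math><![CDATA[P(x) = \sum_{z} P(z) \, P(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k)]]></tex-math></disp-formula>
        <p>One can notice that the GMM distribution P(x) can be seen as a marginalization of the distribution P(x, z) over the variables z.</p>
        <p>Given a dataset of observations D = {x<sub>n</sub>}, n = 1, ..., N, each data point x<sub>n</sub> is associated with the corresponding variable z<sub>n</sub>, which is unknown. The analysis of latent variables allows for a better understanding of input data (e.g., dimensionality reduction).</p>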
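        <p>The latent-variable view of the mixture can be checked numerically. The sketch below draws samples by first picking a component z from the priors and then sampling x from that Gaussian, and evaluates the marginal P(x) as the sum over components. All parameter values and the helper name gaussian_pdf are assumptions made for the example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def gaussian_pdf(X, mean, cov):
    """Multivariate normal density N(x; mean, cov) for each row of X."""
    d = X.shape[1]
    diff = X - mean
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))


# Illustrative two-component mixture in 2-D (all numbers are made up).
pi = np.array([0.7, 0.3])
mu = np.array([[0.0, 0.0], [4.0, 4.0]])
Sigma = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[1.0, 0.8], [0.8, 1.0]]])

rng = np.random.default_rng(0)
N = 1000
# Generative view: z_n ~ Categorical(pi), then x_n ~ N(mu_z, Sigma_z).
z = rng.choice(len(pi), size=N, p=pi)
X = np.array([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])

# Joint P(x, z_k = 1) = pi_k N(x; mu_k, Sigma_k); summing over k gives P(x).
joint = np.column_stack([pi[k] * gaussian_pdf(X, mu[k], Sigma[k]) for k in range(len(pi))])
p_x = joint.sum(axis=1)
```
]]></preformat>
        <p>In a real clustering problem only X is observed; z is exactly the unknown quantity that the EM algorithm has to infer.</p>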
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Expectation Maximization</title>
        <p>The Expectation Maximization (EM) algorithm is an approach for maximum likelihood estimation in the presence of latent variables. It is a general technique for finding maximum likelihood estimators in latent variable models. In detail, given a dataset D = {x<sub>n</sub>}, n = 1, ..., N, and a GMM defined as P(x), the algorithm determines the estimates of the mean μ<sub>k</sub>, the covariance matrix Σ<sub>k</sub> and the prior π<sub>k</sub>.</p>
        <p>The EM algorithm is based on maximum likelihood estimation: let's define the posterior probability after observation of x as</p>
        <disp-formula><tex-math><![CDATA[\gamma(z_k) = P(z_k = 1 \mid x) = \frac{P(z_k = 1) \, P(x \mid z_k = 1)}{P(x)} = \frac{\pi_k \, \mathcal{N}(x; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x; \mu_j, \Sigma_j)}]]></tex-math></disp-formula>
        <p>thus the maximum likelihood is computed as</p>
        <disp-formula><tex-math><![CDATA[\operatorname*{argmax}_{\mu, \Sigma, \pi} \; P(D \mid \mu, \Sigma, \pi)]]></tex-math></disp-formula>
        <p>where, at the maximum, the parameters satisfy the update equations given in the M-step below.</p>
        <p>The EM algorithm [23] is an iterative approach that cycles between two modes. The first mode attempts to estimate the missing or latent variables and is called the estimation step (or E-step). The second mode attempts to optimize the parameters of the model to best explain the data and is called the maximization step (or M-step). At first, the algorithm takes a random initialization of the parameters:</p>
        <disp-formula><tex-math><![CDATA[\pi_k^{(0)}, \quad \mu_k^{(0)}, \quad \Sigma_k^{(0)}]]></tex-math></disp-formula>
        <p><bold>4.2.1. E-step</bold></p>
        <p>In this step, the model estimates the missing (latent) variables in the dataset. This is performed by the following:</p>
        <disp-formula><tex-math><![CDATA[\gamma(z_{nk})^{(t+1)} = \frac{\pi_k^{(t)} \, \mathcal{N}(x_n; \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_{j=1}^{K} \pi_j^{(t)} \, \mathcal{N}(x_n; \mu_j^{(t)}, \Sigma_j^{(t)})}]]></tex-math></disp-formula>
        <p><bold>4.2.2. M-step</bold></p>
        <p>In this step, the model maximizes the likelihood with respect to its parameters given the data and updates the parameters for further iterations. This is done by the following equations:</p>
        <disp-formula><tex-math><![CDATA[\mu_k^{(t+1)} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})^{(t+1)} \, x_n]]></tex-math></disp-formula>
        <disp-formula><tex-math><![CDATA[\Sigma_k^{(t+1)} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})^{(t+1)} \left(x_n - \mu_k^{(t+1)}\right) \left(x_n - \mu_k^{(t+1)}\right)^{T}]]></tex-math></disp-formula>
        <disp-formula><tex-math><![CDATA[\pi_k^{(t+1)} = \frac{N_k}{N}, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})^{(t+1)}]]></tex-math></disp-formula>
        <p>To assess the resulting clustering we use the silhouette score, which is calculated by taking into account the mean intra-cluster distance a and the mean nearest-cluster distance b for each data point. The silhouette score for a sample is (b − a) / max(a, b). The value is interpreted as follows:</p>
        <list list-type="bullet">
          <list-item><p>A silhouette score near 1 means the data point is in the correct cluster;</p></list-item>
          <list-item><p>A silhouette score near 0 means the data point might belong to some other cluster;</p></list-item>
          <list-item><p>A silhouette score near -1 means the data point is in the wrong cluster.</p></list-item>
        </list>
        <p>The silhouette score is computed for K = 2, ..., 10, and the number of clusters is, in general, chosen as the one with the highest silhouette score.</p>
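        <p>The E-step and M-step updates, followed by the silhouette-based comparison of several values of K, can be sketched as below. This is a from-scratch NumPy illustration, not the scikit-learn model used in the paper: the function names, the small regularization term reg added to the covariances, the stopping tolerance, and the synthetic three-blob data are assumptions. Only silhouette_score is taken from scikit-learn. The toy scan covers K = 2, 3, 4 to keep the example short, whereas the text scans K = 2, ..., 10.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np
from sklearn.metrics import silhouette_score


def gaussian_pdf(X, mean, cov):
    """Multivariate normal density N(x; mean, cov) for each row of X."""
    d = X.shape[1]
    diff = X - mean
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))


def em_gmm(X, K, n_iter=200, tol=1e-6, reg=1e-6, seed=0):
    """Fit a K-component GMM with EM; returns (pi, mu, Sigma, gamma)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Random initialization: K samples as means, data covariance, uniform priors.
    mu = X[rng.choice(N, size=K, replace=False)].astype(float)
    Sigma = np.array([np.cov(X, rowvar=False) + reg * np.eye(D)] * K)
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: gamma[n, k] = pi_k N(x_n; mu_k, Sigma_k) / sum_j pi_j N(x_n; mu_j, Sigma_j)
        weighted = np.column_stack(
            [pi[k] * gaussian_pdf(X, mu[k], Sigma[k]) for k in range(K)]
        )
        evidence = weighted.sum(axis=1, keepdims=True)
        gamma = weighted / evidence
        # M-step: re-estimate means, covariances and priors from the responsibilities.
        Nk = gamma.sum(axis=0)
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + reg * np.eye(D)
        pi = Nk / N
        # Stop once the log-likelihood no longer improves noticeably.
        ll = np.log(evidence).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, Sigma, gamma


# Synthetic data (an assumption for illustration): three blobs in 2-D.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 6.0]])
X = np.vstack([rng.normal(c, 0.6, size=(60, 2)) for c in centers])

scores = {}
for K in range(2, 5):
    pi, mu, Sigma, gamma = em_gmm(X, K)
    labels = gamma.argmax(axis=1)  # hard assignment: most responsible component
    scores[K] = silhouette_score(X, labels)
best_K = max(scores, key=scores.get)
```
]]></preformat>
        <p>The silhouette score needs hard labels, so each point is assigned to its most responsible component; the soft responsibilities in gamma remain available for a probabilistic reading of the clusters.</p>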
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Results</title>
      <p>Before going into the numerical results, we would like to underline that we compute the silhouette score to see what the suggested number of clusters is. Remember that we would like to have k=5 clusters, so the silhouette scores are used as a guide rather than a hard rule when deciding the number of clusters, since the GMM is a probabilistic model. The scores are plotted in Figure 2 for the pre-pandemic data and in Figure 3 for the post-pandemic data, and the numerical values are listed in Table 2. They have been computed by using the silhouette_score function provided by the scikit-learn library [24]. Furthermore, the average Silhouette score for the pre-pandemic GMM model is 0.4077, while the average Silhouette score for the post-pandemic GMM model is 0.2622.</p>
      <p>One may notice that the two curves do not have a similar shape, but the highest Silhouette score for both datasets occurs at k=2. This would suggest that the best number of clusters is 2 for this dataset, but here the silhouette score serves to check that our choice of k=5 is reasonable. Despite the poor score for the post-pandemic data, we keep the initial choice because we assume that the distribution of all the data points is similar regardless of when the tests were submitted. The choice of k=5 is also acceptable because its score is higher than the average Silhouette score (for the pre-pandemic data).</p>
      <p>From the plots, we can say that the choice of either k=3 or k=4 could be acceptable, too. We may anticipate that the clusters might be distinct but that some data points may overlap in space.</p>
      <p>In Figure 4 and Figure 5, the data points randomly chosen for clustering are shown, whereas the GMMs are shown in Figure 6 and Figure 7. In particular, Figures 4 and 6 relate to the pre-pandemic data points and distributions, while Figures 5 and 7 relate to the post-pandemic data. Please notice that in both Figure 4 and Figure 5, some points may overlap, so fewer points may be visible for a cluster.</p>
      <p>It is also interesting to see that some points are very far away from the others. This happens for points that have been collected with a location number equal to 99 (the mean of the location numbers is 10.7) and that have been predicted as belonging to the shading category (the yellow one). Those points may play a real role in enlarging and stretching the Gaussian distributions, especially the one related to the movement category.</p>
      <p>We plot the encoded location numbers on the horizontal axis and the encoded contents on the vertical axis. One can make the following statements about these plots:</p>
      <list list-type="bullet">
        <list-item><p>For the pre-pandemic data, the plot shows a similar distribution of samples predicted as the other 4 categories along the horizontal axis, while the distribution is very diverse and distinct along the vertical axis: it is clear that there are not many cases in which the distributions share the same parts of the space;</p></list-item>
        <list-item><p>For the post-pandemic data, the plot shows a very different scenario with respect to the previous one: we see a big increase in samples predicted as belonging to the form category (the dark blue ones), but also a decrease in the number of samples predicted as belonging to the movement category (the orange ones), while the samples of the reflection category (the green ones) and the color category (the light blue ones) change their trend slightly;</p></list-item>
        <list-item><p>For both pre- and post-pandemic models, some points are very far away from the others. This happens for points that have been collected with a location number equal to 99 and that have been predicted as belonging to the shading category (the yellow ones). In this case, the location number refers to this category only. We can understand this trend also by looking at the model plots: indeed, the yellow Gaussian distribution is very thin (namely, the minor semi-axis is really close to zero) and far away from the other Gaussian components.</p></list-item>
      </list>
      <p>So, in general, we notice a significant change in the distributions of almost all the categories. Because of that, we want to understand which feature is the most predominant one between the locations and the contents. To do this, we performed a PCA (Principal Component Analysis) [25] by using the decomposition.PCA class of the scikit-learn library and obtained the following results: [0.90091513 0.09491116] for the pre-pandemic model, and [0.88808373 0.10983465] for the post-pandemic model. In both cases, the most dominant component is the location number, and so the location of the inkblot which prompted the patient to respond in that way. One may conclude that the location number can be seen as a kind of scaling factor for the data points.</p>
      <sec id="sec-4-1">
        <title>6. Conclusions</title>
        <p>The project highlights the significant impact that the numerical value assigned to the location of a region on a blot has on the psychologist's final interpretation. Furthermore, Figures 6 and 7 demonstrate that the results produced by the KMeans algorithm can be quite chaotic. The proximity of the means and the elliptical shapes of the distributions computed by the GMM indicate that the KMeans algorithm may not be able to accurately fit the data. This is likely due to the algorithm's constraint of computing circular-shaped distributions [26].</p>
        <p>Given the richness of the dataset and the subjectivity of the Rorschach test, other projects can be carried out. An interesting approach would be to cluster the data based on the location or contents of the blot. Another option would be to analyze the data distribution by grouping it according to the cards and determining the most frequent response provided by participants for each card.</p>
        <p>Notwithstanding any potential future research, the main aim of this project is to help psychologists understand the variations in Rorschach tests conducted before and after the Covid-19 pandemic. The primary objective is to identify the most significant non-correlated variable that affects the subject's overall choice of responses, thereby enabling an accurate interpretation of this phenomenon.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Exner Jr</surname>
          </string-name>
          ,
          <article-title>A rorschach workbook for the comprehensive system</article-title>
          ,
          <source>in: Rorschach Workshops</source>
          ,
          <year>2001</year>
          , pp.
          <fpage>171</fpage>
          -
          <lpage>187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>A comprehensive solution for psychological treatment and therapeutic path planning based on knowledge base and expertise sharing</article-title>
          , volume
          <volume>2472</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lo Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>A cloud-based flexible solution for psychometric tests validation, administration and evaluation</article-title>
          , volume
          <volume>2468</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2019</year>
          , pp.
          <fpage>16</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pepe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tedeschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Iocchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Human attention assessment using a machine learning approach with GAN-based data augmentation technique trained using a custom dataset</article-title>
          ,
          <source>OBM Neurobiology</source>
          <volume>6</volume>
          (
          <year>2022</year>
          ). doi:10.21926/obm.neurobiol.2204139.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pappalardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Tramontana</surname>
          </string-name>
          ,
          <article-title>A hybrid neuro-wavelet predictor for QoS control and stability</article-title>
          ,
          <source>Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)</source>
          <volume>8249 LNAI</volume>
          (
          <year>2013</year>
          )
          <fpage>527</fpage>
          -
          <lpage>538</lpage>
          . doi:10.1007/978-3-319-03524-6_45.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bianco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wajda</surname>
          </string-name>
          ,
          <article-title>Automatic RGB inference based on facial emotion recognition</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3092</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>66</fpage>
          -
          <lpage>74</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <article-title>Lessening stress and anxiety-related behaviors by means of AI-driven drones for aromatherapy</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>2594</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2020</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Marcotrigiano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Stingi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fregnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Magarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pasquale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Orsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Montagna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>An integrated control plan in primary schools: Results of a field investigation on nutritional and hygienic features in the Apulia region (Southern Italy)</article-title>
          ,
          <source>Nutrients</source>
          <volume>13</volume>
          (
          <year>2021</year>
          ). doi:10.3390/nu13093006.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Brandizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brociek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wajda</surname>
          </string-name>
          ,
          <article-title>First studies to apply the theory of mind theory to green and smart mobility by using Gaussian area clustering</article-title>
          , volume
          <volume>3118</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>71</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Illari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Avanzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>A cloud-oriented architecture for the remote assessment and follow-up of hospitalized patients</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>2694</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2020</year>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Dat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ponzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vincelli</surname>
          </string-name>
          ,
          <article-title>Supporting impaired people with a following robotic assistant by means of end-to-end visual target navigation and reinforcement learning approaches</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>3118</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Russo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Illari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Avanzato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Reducing the psychological burden of isolated oncological patients by means of decision trees</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          , volume
          <volume>2768</volume>
          ,
          <publisher-name>CEUR-WS</publisher-name>
          ,
          <year>2020</year>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Charles</surname>
          </string-name>
          ,
          <article-title>Interpreting deep learning: The machine learning Rorschach test?</article-title>
          ,
          <source>stat.ML</source>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Alessandri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Aschieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bobbio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Daini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Parolin</surname>
          </string-name>
          ,
          <article-title>Documento associazione italiana di psicologia (aip) sulle linee guida per l'assessment ai tempi del coronavirus</article-title>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>De Fidio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pancheri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Corfiati</surname>
          </string-name>
          ,
          <article-title>The automated Rorschach test: an assessment of the pralp3 three years after publication</article-title>
          ,
          <source>Journal of Psychopathology</source>
          (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Exner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weiner</surname>
          </string-name>
          ,
          <collab>PAR</collab>
          ,
          <source>Riap5: Scoring program</source>
          , ???? URL: https://www.parinc.com/Products/Pkey/363.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <article-title>Chessss: An innovative Rorschach scoring program</article-title>
          ,
          <source>Journal of Personality Assessment</source>
          <volume>98</volume>
          (
          <year>2016</year>
          )
          <fpage>660</fpage>
          -
          <lpage>662</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramireddy</surname>
          </string-name>
          ,
          <source>COVID-19 Rorschach test dataset</source>
          ,
          <year>2021</year>
          . URL: https://www.kaggle.com/surekharamireddy/covid-19-rorschach-test-dataset/data.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Sinaga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-S.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Unsupervised k-means clustering algorithm</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>80716</fpage>
          -
          <lpage>80727</lpage>
          . doi:10.1109/ACCESS.2020.2988796.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonanno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Hybrid neural networks architectures for SOC and voltage prediction of new generation batteries storage</article-title>
          ,
          <source>in: 3rd International Conference on Clean Electrical Power: Renewable Energy Resources Impact, ICCEP 2011</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>341</fpage>
          -
          <lpage>344</lpage>
          . doi:10.1109/ICCEP.2011.6036301.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Nowak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Nowicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>Multi-class nearest neighbour classifier for incomplete data handling</article-title>
          ,
          <source>in: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)</source>
          , volume
          <volume>9119</volume>
          ,
          <publisher-name>Springer Verlag</publisher-name>
          ,
          <year>2015</year>
          , pp.
          <fpage>469</fpage>
          -
          <lpage>480</lpage>
          . doi:10.1007/978-3-319-19324-3_42.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonanno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <article-title>A wavelet based prediction of wind and solar energy for long-term simulation of integrated generation systems</article-title>
          ,
          <source>in: SPEEDAM 2010 - International Symposium on Power Electronics, Electrical Drives, Automation and Motion</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>586</fpage>
          -
          <lpage>592</lpage>
          . doi:10.1109/SPEEDAM.2010.5542259.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>T.</given-names>
            <surname>Moon</surname>
          </string-name>
          ,
          <article-title>The expectation-maximization algorithm</article-title>
          ,
          <source>IEEE Signal Processing Magazine</source>
          <volume>13</volume>
          (
          <year>1996</year>
          )
          <fpage>47</fpage>
          -
          <lpage>60</lpage>
          . doi:10.1109/79.543975.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Shahapure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Nicholas</surname>
          </string-name>
          ,
          <article-title>Cluster quality analysis using silhouette score</article-title>
          ,
          <source>2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)</source>
          (
          <year>2020</year>
          )
          <fpage>747</fpage>
          -
          <lpage>748</lpage>
          . doi:10.1109/DSAA49011.2020.00096.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>F.</given-names>
            <surname>Kherif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Latypova</surname>
          </string-name>
          ,
          <article-title>Chapter 12 - Principal component analysis</article-title>
          , in: A. Mechelli, S. Vieira (Eds.),
          <source>Machine Learning</source>
          ,
          <publisher-name>Academic Press</publisher-name>
          ,
          <year>2020</year>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>225</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/B9780128157398000122. doi:10.1016/B978-0-12-815739-8.00012-2.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y. G.</given-names>
            <surname>Jung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heo</surname>
          </string-name>
          ,
          <article-title>Clustering performance comparison using k-means and expectation maximization algorithms</article-title>
          ,
          <source>Biotechnology &amp; Biotechnological Equipment</source>
          <volume>28</volume>
          (
          <year>2014</year>
          )
          <fpage>S44</fpage>
          -
          <lpage>S48</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>