Understanding Parental Characteristics of Child Adoption Candidates using MMPI-2 and Evolutionary Clustering

Emanuele Iacobelli¹, Cristian Randieri², Paolo Roma³ and Samuele Russo⁴

¹ Department of Computer, Control and Management Engineering, Sapienza University of Rome, 00185 Roma, Italy
² Università degli Studi eCampus, Novedrate (CO), Italy
³ Department of Human Neuroscience, Sapienza University of Rome, Italy
⁴ Department of Psychology, Sapienza University of Rome, Italy


                                               Abstract
                                               In the context of adoption, evaluating prospective adoptive parents using psychometric assessments such as the Minnesota
                                               Multiphasic Personality Inventory (MMPI) questionnaire is essential for understanding their psychological profiles. However,
                                               interpreting such complex data can be both challenging and time-consuming. In this study, we propose a meta-analysis
                                               tool to assist psychologists in their initial interpretation and analysis of MMPI-2 results by providing a clear data-driven
                                               visualization of key psychometric scales. Our system employs unsupervised learning techniques to uncover meaningful
                                               patterns and relationships in the data with minimal prior input. Specifically, a genetic algorithm is used to optimize clustering
                                               quality by selecting the most relevant psychological scales, enhancing cluster separation, and improving data interpretability.
                                               We also explored and compared the effectiveness of several clustering algorithms, including K-Means, Gaussian Mixture
                                               Model, and Spectral Clustering, to maximize the capabilities of our tool.

                                               Keywords
Minnesota Multiphasic Personality Inventory (MMPI), Unsupervised Learning Algorithms, Genetic Algorithm, K-Means,
Gaussian Mixture Model, Spectral Clustering



1. Introduction

Adoption is the process whereby individuals or families assume the parenting of a child who is not biologically their own. According to specific studies [1, 2, 3, 4], adoptees can sometimes experience problems in psychological development, social relationships, and establishing a sense of identity. Therefore, finding suitable adoptive parents is crucial for the well-being of the child.

For that reason, standardized psychometric tests [5, 6, 7, 8] are used to assess the personality and psychopathology traits of prospective adoptive parents. An example of such a test is the Minnesota Multiphasic Personality Inventory (MMPI) psychological test [9], proposed in 1943. Over the years, several variations of the test have been developed. The most commonly used versions today include the MMPI-2 [10], published in 1989 specifically for adults; the MMPI-A [11], designed for adolescents and introduced in 1992; the MMPI-Restructured Form, a condensed version of the MMPI; and the recently released MMPI-3 [12], published in 2020.

For the evaluation of the results, the set of the most important psychometric scales to be analyzed is usually hand-picked by field experts, as it is highly task-dependent. For that reason, in this study we propose an unsupervised learning algorithm capable of clustering the data gathered with the MMPI-2 test using as little prior knowledge as possible during the preprocessing and postprocessing of the data.

Clustering [13] is an unsupervised learning technique designed to identify similarities within data without predefined categories. In our case, by analyzing the geometric properties of the data, the goal is to capture as many similarities as possible, even when the underlying distribution is not known a priori. Our approach involves the development of a machine-learning-based [14, 15, 16, 17] genetic algorithm [18, 19, 20, 21] aimed at optimizing both the minimum centroid distance and the minimum inter-cluster distance, enhancing the clustering quality. We also conducted experiments with three different clustering algorithms (K-Means [22, 23, 24, 25], Gaussian Mixture Model [26, 27], and Spectral Clustering [28]) to determine the most suitable one for our system. In particular, given that the number of clusters is not predetermined, careful interpretation of the results is necessary to attribute meaningful explanations to each cluster.

ICYRIME 2024: 9th International Conference of Yearly Reports on Informatics, Mathematics, and Engineering. Catania, July 29-August 1, 2024
iacobelli@diag.uniroma1.it (E. Iacobelli); cristian.randieri@uniecampus.it (C. Randieri); paolo.roma@uniroma1.it (P. Roma); samuele.russo@uniroma1.it (S. Russo)
ORCID: 0009-0003-1379-9106 (E. Iacobelli); 0000-0001-5300-3561 (C. Randieri); 0000-0002-1031-0948 (P. Roma); 0000-0002-1846-9996 (S. Russo)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
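As a concrete illustration of the pipeline outlined in this introduction, the following is a minimal Python sketch (not the authors' implementation) of a genetic algorithm that evolves binary masks over feature columns and scores each mask by the minimum centroid distance produced by a bare-bones K-Means run, so that larger fitness means better-separated clusters. The toy data matrix, population size, pool size, mutation rate, and all helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a scaled psychometric matrix: 60 subjects x 8 features.
# Only the first 3 columns carry cluster structure; the other 5 are noise,
# so a good feature mask should favour the informative columns.
X = np.hstack([
    np.vstack([rng.normal(0.2, 0.05, (30, 3)),
               rng.normal(0.8, 0.05, (30, 3))]),
    rng.uniform(0, 1, (60, 5)),
])

def kmeans_centroids(data, k=2, iters=20):
    """Bare-bones K-Means: random init, assign, recompute; returns centroids."""
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return centroids

def fitness(mask):
    """Minimum centroid distance over the features selected by the mask."""
    if mask.sum() == 0:
        return 0.0
    c = kmeans_centroids(X[:, mask.astype(bool)])
    return float(np.linalg.norm(c[0] - c[1]))

# GA over binary feature masks: random pools with fitness-based parent
# choice, single-point crossover, bit-flip mutation, and a halt after
# several consecutive generations without improvement.
pop = rng.integers(0, 2, size=(20, X.shape[1]))
best_mask, best_fit, stale = None, -1.0, 0
for _ in range(40):                                       # generation cap
    fits = np.array([fitness(ind) for ind in pop])
    if fits.max() > best_fit:
        best_fit = float(fits.max())
        best_mask, stale = pop[fits.argmax()].copy(), 0
    else:
        stale += 1
    if stale >= 5:                                        # stopping criterion
        break
    children = []
    while len(children) < len(pop):
        pool = rng.choice(len(pop), size=4, replace=False)  # random pool
        p1, p2 = pool[np.argsort(fits[pool])][-2:]          # two fittest
        cut = int(rng.integers(1, X.shape[1]))              # crossover index
        c1 = np.concatenate([pop[p1][:cut], pop[p2][cut:]])
        c2 = np.concatenate([pop[p2][:cut], pop[p1][cut:]])
        for c in (c1, c2):
            c[rng.random(X.shape[1]) < 0.05] ^= 1           # bit-flip mutation
            children.append(c)
    pop = np.array(children[:len(pop)])

print("selected mask:", best_mask, "fitness:", round(best_fit, 3))
```

In the setting of this paper, the columns would be the scaled MMPI-2 scale values rather than synthetic blobs, and the surviving mask would indicate which psychometric scales drive the cluster structure.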








CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Emanuele Iacobelli et al. CEUR Workshop Proceedings                                                                  69–77



1.1. Roadmap

This paper is organized as follows: first, an overview of the MMPI-2 questionnaire, its scales, and traditional MMPI clustering methods is presented in Section 2. Next, Section 3 provides a detailed description of the core techniques used in our algorithm. In Section 4, we describe the dataset employed in this experiment. Following this, Section 5 offers a comprehensive explanation of the system we developed and its evaluation process. The clustering results produced by our system are then presented in Section 6. Finally, Section 7 summarizes the article's content and outlines potential areas for future improvement.

2. State of the Art

2.1. MMPI-2 Overview

The MMPI-2 is used as a personality assessment tool in clinical and non-clinical contexts to discern psychopathologies and behavioral traits in individuals. It comprises a series of true/false questions, known as items, which are grouped into various scales designed to measure specific aspects of the subject's disposition.

Validity scales scrutinize the subject's approach to the test and demeanor, identifying inconsistencies or attempts to manipulate responses. Among them, the Lie scale (L) evaluates honesty during the test, while the K scale assesses defensive tendencies and reluctance to acknowledge personal issues.

In addition, the MMPI incorporates ten primary clinical scales designed to detect a spectrum of psychological disorders, encompassing Hypochondriasis (Hs), Depression (D), Hysteria (Hy), Psychopathic Deviate (Pd), Masculinity/Femininity (Mf), Paranoia (Pa), Psychasthenia (Pt), Schizophrenia (Sc), Hypomania (Ma), and Social Introversion (Si). Furthermore, content scales target specific personal attitudes, including anger issues (ANG), low self-esteem (LSE), family problems (FAM), and work-related challenges (WRK), among others.

Additionally, supplemental scales are used in combination with the content scales to determine whether some symptoms can be attributed to alternative potential causes, such as controlled hostility, alcoholism, and more.

Moreover, the Psy-5 scales measure dimensional traits of personality disorders, including Aggressiveness, Psychoticism, Constraint, Neuroticism, and Extraversion.

Finally, to ensure uniform interpretation across all scales, scores are transformed into T-scores, ranging from 30 to 120. Typically, scores exceeding 65 are considered significant and warrant further examination.

2.2. Traditional MMPI Clustering Methods

Following this concise overview of the MMPI-2 test, prior attempts to cluster datasets derived from this assessment have typically involved manually selecting sets of the aforementioned psychometric scales.

In [29], an algorithm very similar to K-Means (originally described in [30]) was applied to data obtained from MMPI-2 tests administered to women in their third trimester of pregnancy. The objective was to determine the personality characteristics of women who develop perinatal depression.

Similarly, in [31], clusters were generated to identify groups of chronic low-back pain patients based on personality traits identified through the MMPI-2 test.

Another notable study is presented in [32], where the authors investigated individuals trained to simulate Post-traumatic Stress Disorder (PTSD). They conducted cluster analysis on MMPI-2 clinical and validity scales, identifying two well-fitting cluster solutions. Discriminant and multivariate analyses of variance (MANOVAs) were employed to evaluate the clusters, revealing significant differences in MMPI-2 content scales. Specifically, demographic variables had minimal influence on cluster membership, but there were discrepancies in the reported clarity of PTSD education materials among clusters.

In [33], the authors investigated the MMPI-2-RF validity scales' effectiveness in profiling chronic pain patients. To identify clusters, a two-step exploratory cluster analysis was conducted, employing the auto-clustering selection feature in IBM SPSS 21 to select the optimal cluster solution. Cluster analysis revealed two distinct patient clusters. Cluster 1 displayed valid responses and exhibited elevations primarily on somatic and low positive emotion scales. In contrast, Cluster 2 comprised patients who overreported on validity scales and demonstrated elevations on multiple restructured clinical scales.

3. Core Techniques in Our Algorithm

3.1. Genetic Algorithm

All cited works in this paper employ clustering techniques with input from psychology experts to select relevant psychometric scales for analysis. In contrast, our system autonomously selects key scales using a genetic algorithm [34]. Genetic algorithms (GAs) are adaptive search procedures widely utilized in Artificial Intelligence since the 1970s [35, 36, 37]. Drawing inspiration from biological evolution, GAs simulate aspects of the process of natural selection proposed by Charles Darwin. They involve successive generations of candidate solutions undergoing reproduction, mutation, and selection to converge toward optimal or near-optimal solutions. Genetic algorithms have a broad range of applications [38, 39, 40]; any problem whose solutions can be encoded as strings of 0s and 1s can potentially be optimized using this approach.

In summary, a general genetic algorithm workflow is the following: first, an initial population of individuals (each represented as a string of 0s and 1s) is randomly generated. Next, a fitness value is assigned to each individual in the population according to a certain fitness function. Then, multiple pools of individuals are randomly selected, and from each pool a certain number of individuals are chosen, based on their fitness values, to serve as parents for the next population. For each pair of parents, two children are produced using the following criterion: a crossover index is randomly selected and determines how much of the first part of one parent's string is merged with the second part of the other parent's string, and vice versa. Finally, each bit of the generated children is flipped with a certain probability, simulating the mutation process. This algorithm continues until a specific number of consecutive iterations occur without any improvement in the best fitness value. When the algorithm halts, the latest best individual found is selected as the optimal solution discovered thus far.

3.2. Clustering Algorithms

Clustering algorithms belong to the unsupervised learning domain of artificial intelligence and are designed to unveil concealed patterns and organize data points into coherent clusters based on their intrinsic similarities. These algorithms rely on different distance metrics, such as Euclidean distance, cosine similarity, and the Jaccard coefficient, to quantify the resemblance between data points. The typical representation of each resulting cluster involves a centroid, acting as a central reference point summarizing the collective traits of its constituent data points. These algorithms can be broadly categorized into several methodologies. Partitioning methods, exemplified by K-Means, iteratively segment the dataset into non-overlapping clusters, ensuring each data point exclusively belongs to one cluster. Hierarchical methods, such as Agglomerative Clustering, construct a hierarchical arrangement of clusters by iteratively merging or dividing existing clusters based on similarity criteria, culminating in a tree-like structure. Model-based methods, on the other hand, assume that the data is generated by a probabilistic model, such as a Gaussian Mixture Model (GMM), allowing for the probabilistic modeling of clusters.

In our study, we focus on evaluating and comparing the performance of K-Means, Gaussian Mixture Model, and Spectral Clustering.

In detail, K-Means partitions samples into a predefined number of clusters through an iterative process: randomly selecting K samples as initial clusters (and centroids), assigning each sample to the cluster with the nearest centroid, recomputing centroids, and terminating the process if no data points have switched clusters or if the distance between new and old centroids falls below a certain threshold.

Gaussian Mixture Model (GMM) endeavors to fit a specified number (N) of normal distributions to distinct subsets of the original dataset by estimating their mean and variance parameters using the Expectation-Maximization (EM) algorithm [41].

Spectral Clustering, on the other hand, exploits the spectral properties of the affinity matrix to capture the underlying data structure, particularly in scenarios where traditional clustering techniques may struggle with non-linear or intricate relationships between data points. In particular, it leverages techniques such as spectral decomposition (eigenvalue decomposition) or singular value decomposition (SVD) to transform the data into a lower-dimensional space, and subsequently employs a standard clustering algorithm, such as K-Means, to partition the data points into clusters.

4. Dataset

In this study, we utilized a dataset comprising 202 entries and 813 features for each entry. These features encompass anamnestic information, boolean answers to the MMPI's questions, and T-scores. Figures 1, 2, and 3 provide an overview of the statistics regarding some of the anamnestic information and the clinical and content scales, calculated as T-scores, of the subjects in our dataset. For preprocessing, we removed features with either a single value or a predominant value (e.g., 'Citizenship') and those with high variability (e.g., 'Profession'). Additionally, we dropped the gender column, since MMPI scales have the same interpretation for both men and women. The boolean answers to the MMPI's questions were also discarded, as the normalized T-score values automatically encode this information.

To ensure data validity, according to the guidelines provided by the authors of the MMPI test, applicants with Lie scale scores exceeding 75 were excluded. Additionally, none of the test-takers reached the cutoff of 30 unanswered questions on the 'Cannot Say' scale that would invalidate the test. We also examined other validity scales, such as F, TRIN, and VRIN, but no entries were excluded based on these scales. Applicants with high values indicating alcohol or drug issues were marked as rejected in advance.

The remaining data, consisting of 191 entries with 120 feature columns, was scaled to ensure all features had the same magnitude within the range [0,1]. This scaling







was crucial to prevent the overwhelming importance of certain features, particularly the MMPI scales, compared to the boolean values.

Figure 1: This image displays some of the anamnestic information found in our dataset. From the top left to the bottom right, we have: 'Profession', 'Psychiatric Patients', 'Citizenship', 'Gender', 'Marital Status', 'Education', 'PMA', and 'Age'.

Figure 2: This image displays statistics for various clinical scales, calculated as T-values, found in the dataset.

Figure 3: This image displays statistics for various content scales, calculated as T-values, found in the dataset.

5. Methodology and System's Evaluation

For clustering the dataset using a genetic algorithm, each feature in our dataset has been encoded with a binary digit [0,1]. This encoding allows each individual to represent a unique combination of features: features assigned the value 1 are considered in the clustering process, while those denoted with 0 are discarded. Each individual is then evaluated using two different fitness functions: the minimum inter-cluster distance and the minimum centroid distance.

The minimum inter-cluster distance calculates the minimum distance between two data points belonging to different clusters through the following formula:

    V = \min_{s_i \in C_i,\; s_j \in C_j,\; i \neq j} d(s_i, s_j)    (1)

where s_i and s_j are two distinct data points belonging to the different clusters C_i and C_j, respectively, and d(·, ·) represents the Euclidean distance function.

The minimum centroid distance measures the distance between the centroids of different clusters through the following formula:

    V = \min_{i \neq j} d(c_i, c_j), \quad c_i = \frac{1}{|C_i|} \sum_{s_i \in C_i} s_i    (2)

where s_i and c_i represent a data point and the centroid of the cluster C_i, respectively, and d(·, ·) denotes the Euclidean distance function.

To determine the best combination of clustering algorithm and fitness function, we evaluated all their possible combinations on a synthetically generated dataset. This dataset was generated by sampling data points from three normal distributions with closely located centroids and large variance, making the clustering more challenging. Specifically, we used three 250-dimensional Gaussian distributions with random means in the range [-1.25, 1.25] and a standard deviation equal to 20. To visualize the synthetic dataset in two dimensions (refer to Fig. 4 for an example of the data that can be produced), we applied the Principal Component Analysis (PCA) dimensionality reduction algorithm [42]. The best results were achieved by combining K-Means with the minimum centroid distance, resulting in an accuracy of 81.4%. Results from the other combinations are presented in Table 1, while Fig. 5 offers a visual representation of the results.

Table 1
This table presents the results achieved by combining various clustering algorithms and fitness functions within our genetic algorithm applied to a synthetically generated dataset (an example is displayed in Fig. 4). The 'Accuracy' column represents the proportion of correctly classified data points, while the 'Iteration' column indicates the number of iterations the algorithm took to achieve the best result.

Clustering Algorithm    Fitness Function                  Accuracy    Iteration
K-Means                 Minimum Inter-Cluster Distance    79.5%           6
K-Means                 Minimum Centroid Distance         81.4%           5
GMM                     Minimum Inter-Cluster Distance    68.8%           5
GMM                     Minimum Centroid Distance         72.6%           7
Spectral Clustering     Minimum Inter-Cluster Distance    62.3%          23
Spectral Clustering     Minimum Centroid Distance         64.7%          25

Figure 4: Example of a 2D synthetically generated dataset that we have used to evaluate the best combination of clustering algorithm and fitness function for our genetic algorithm. This dataset was sampled from three 250-dimensional Gaussian distributions with random means in the range [-1.25, 1.25] and a standard deviation of 20. To visualize the data in 2D, we applied the Principal Component Analysis (PCA) algorithm to reduce the dimensionality.

Figure 5: This image represents the accuracy obtained by the different clustering algorithms tested on a synthetic dataset to determine the most suitable algorithm for our work. Specifically, we compared K-Means, Gaussian Mixture Model, and Spectral Clustering, all using the Minimum Centroid Distance as the fitness function. The results showed that K-Means was the best algorithm, achieving an accuracy of 81.4% compared to the ground truth.

6. Results

To determine the optimal number of clusters for the K-Means clustering algorithm on the analyzed dataset, we employed Silhouette Analysis. This technique involves computing the Silhouette Coefficient s for each element in the dataset, defined by:

    s = \frac{b - a}{\max(a, b)}    (3)

where a is the mean distance between a sample and all other points in the same cluster, and b is the mean distance between a sample and all points in the nearest cluster. The Silhouette Score, which is the average of the Silhouette Coefficients over all elements in the dataset, indicates the quality of the clustering. A higher mean Silhouette Score suggests denser and better-separated clusters. In our study, the optimal number of clusters found for our dataset was 2, as shown in Fig. 6. Fig. 7 provides a comprehensive overview of the Silhouette Coefficients for different numbers of clusters, demonstrating the decline in clustering quality as the number of clusters increases.

Figure 6: This image represents the Silhouette scores (y-axis) obtained by our algorithm using different numbers of clusters (x-axis). Higher Silhouette scores indicate denser and better-separated clusters. For our dataset, the optimal score was achieved using 2 clusters, as highlighted by the vertical red dashed line.

Applying PCA to the obtained clusters generates the plot displayed in Fig. 8. It can be seen that on the first principal component (x-axis) the two clusters are well distinguished, while on the second principal component (y-axis) they both spread homogeneously, even if the elements belonging to the green cluster are more concentrated around the zero value of that axis.

In a more detailed analysis, Fig. 9 illustrates the intra-cluster average values for the four main scale groups: Validity, Clinical, Content, and Supplemental. As observed, the elements in the green cluster consistently show lower average values compared to those in the red cluster, with the exception of the Validity scales. This reversal in trend may prompt psychologists to further examine these two clusters, as the scales within the Validity group are designed to indicate how reliable and truthful the test responses are. However, the differences between the clusters are minor, and both demonstrate a high level of reliability in responses, with few outliers. One of the key insights from this analysis is the notable difference in the Content scales, suggesting that individuals in the red cluster may exhibit more psychological issues compared to those in the green cluster.
                             max(𝑎, 𝑏)
                                                                   A similar trend, observed in Fig. 8, is also highlighted
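The cluster-number selection described above can be sketched in a few lines. The snippet below is a minimal, self-contained NumPy illustration of Eq. (3) on toy two-dimensional data; it is not the study's implementation, and the toy points merely stand in for the MMPI-2 feature vectors:

```python
import numpy as np

def silhouette_coefficients(X, labels):
    """Eq. (3): s = (b - a) / max(a, b) for every sample, where a is the
    mean distance to the other points of the sample's own cluster and b is
    the mean distance to the points of the nearest other cluster."""
    X = np.asarray(X, dtype=float)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    s = np.empty(len(X))
    for i, lab in enumerate(labels):
        same = labels == lab
        # D[i, same] includes the zero self-distance, so divide by |cluster| - 1.
        a = D[i, same].sum() / max(same.sum() - 1, 1)
        b = min(D[i, labels == other].mean()
                for other in set(labels) if other != lab)
        s[i] = (b - a) / max(a, b)
    return s

# Two well-separated toy blobs: the 2-cluster labelling scores highest.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
labels2 = np.array([0, 0, 0, 1, 1, 1])
labels3 = np.array([0, 0, 1, 2, 2, 2])
print(silhouette_coefficients(X, labels2).mean())  # high (close to 1): dense, well separated
print(silhouette_coefficients(X, labels3).mean())  # lower: the over-split labelling is penalised
```

Evaluating this mean score for each candidate number of clusters (2, 3, 4, …) and keeping the value with the highest score reproduces the selection criterion behind Fig. 6 and Fig. 7.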




Emanuele Iacobelli et al. CEUR Workshop Proceedings                                                                        69–77




Figure 7: Starting from the top, each plot in this image represents the Silhouette coefficients of all the elements in the dataset, obtained by our algorithm using 2, 3, and 4 clusters, respectively. The y-axis displays the dataset elements grouped by the cluster to which they belong, while the x-axis shows the Silhouette coefficient. The vertical red dashed line represents the Silhouette Score, and it is evident that the clustering quality declines as the number of clusters increases.

Figure 8: This image displays the results of the Principal Component Analysis (PCA) in two dimensions on the analyzed dataset, highlighting the two clusters (red and green) identified by our algorithm.

Figure 9: This image displays the four main group scales (Validity, Clinical, Content, and Supplemental) on the x-axis, and the y-axis presents the intra-cluster average values for each of these psychological scales for the two clusters (red and green) identified by our algorithm.

Finally, Fig. 10 provides a deeper analysis of the weights associated with the psychological scales for the first and second principal components of the PCA. From this plot, it is clear that for the elements of the green cluster, high values on scales related to the Content group correspond to strongly positive weights, while low values correspond to negative weights. In contrast, the red cluster exhibits the inverted trend. On the second principal component, the red cluster elements are more evenly distributed across the dimension, while the green cluster elements generally show lower values across the scales. From these graphs, psychology experts can gain insight into the most relevant psychological scales within the dataset, thereby speeding up and simplifying the initial data analysis.

7. Conclusion

In this study, we proposed a novel approach for analyzing MMPI-2 profiles of prospective adoptive parents using evolutionary clustering techniques. By employing a genetic algorithm that autonomously selects the most relevant psychometric scales, we streamlined the clustering process and reduced the reliance on manual selection by domain experts. Combining this scale selection with K-Means clustering, guided by the Minimum Centroid Distance fitness function and validated through Silhouette analysis, we determined that two clusters were the optimal choice to describe the analyzed dataset.

These clusters displayed distinct psychological profiles, with notable differences particularly in the Content and Clinical scales, which may serve as valuable insights for







psychologists when assessing potential adopters.

Figure 10: These two plots display, for the two clusters identified by our algorithm (red and green), the intra-cluster average value of each psychological scale in the dataset on the y-axis, and on the x-axis the weights associated with the first principal component (top plot) and with the second principal component (bottom plot).

Figure 11: This image shows the average values of the Content scale on the x-axis and of the Clinical scale on the y-axis for each element in the dataset. The division along the x-axis is clearly visible, while on the y-axis the elements of the green cluster tend to be more concentrated, whereas the red cluster elements are more evenly spread across the Clinical scale.

The implications of our approach are twofold: first, it offers a data-driven methodology that enhances the initial interpretation of complex MMPI-2 profiles, assisting psychologists in identifying meaningful patterns without prior assumptions. Second, it underscores the potential of pairing unsupervised learning with evolutionary techniques such as genetic algorithms to improve psychometric data analysis by automating feature selection and optimizing clustering quality.

Future work may involve expanding the dataset and further refining the genetic algorithm to handle larger and more diverse MMPI profiles. Additionally, exploring the integration of other clustering methods and incorporating newer versions of the MMPI test, such as the MMPI-3, may provide further improvements and adaptability in diverse psychological evaluations.
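As a concrete companion to the weight analysis of Fig. 10, the sketch below runs PCA on synthetic stand-in profiles and prints the per-scale weights of the first two principal components. The scale names and score values are illustrative assumptions, not the study's data, and the SVD-based PCA is a generic implementation rather than the authors' code:

```python
import numpy as np

# Hypothetical stand-ins for the four MMPI-2 group scales of Fig. 9 and Fig. 10.
scales = ["Validity", "Clinical", "Content", "Supplemental"]

rng = np.random.default_rng(0)
# Two synthetic clusters of profiles: the "red" one scores higher on the
# Clinical, Content, and Supplemental scales, mimicking the reported trend.
green = rng.normal([50.0, 45.0, 40.0, 44.0], 2.0, size=(30, 4))
red = rng.normal([48.0, 60.0, 65.0, 58.0], 2.0, size=(30, 4))
X = np.vstack([green, red])

# PCA via SVD of the centred data matrix: the rows of Vt are the principal
# axes, i.e. the per-scale weights plotted on the x-axis of Fig. 10.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
weights = Vt[:2]                  # PC1 and PC2 weights, one column per scale
projections = Xc @ weights.T      # sample coordinates, as in the Fig. 8 plot

for scale, w1, w2 in zip(scales, weights[0], weights[1]):
    print(f"{scale:12s} PC1 weight {w1:+.2f}  PC2 weight {w2:+.2f}")
```

Because the synthetic clusters differ mainly on the Clinical, Content, and Supplemental scales, the first principal component separates them cleanly, which mirrors the behaviour described for Fig. 8.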