A proposal for improving EEG microstate generation via interpretable deep clustering with
convolutional autoencoders

Arjun Vinayak Chikkankod∗,†, Luca Longo†
Artificial Intelligence and Cognitive Load Lab, the Applied Intelligence Research Centre,
Technological University Dublin, Dublin D07 H6K8, Ireland


Abstract
Electroencephalography-based microstates, characterised as quasi-stable states of mental activation,
encapsulate the spatio-temporal dynamics of brain signals. They are representative template topographic
maps for time intervals, usually in the order of 60-120 ms, extracted from the whole duration of
an EEG-based experiment. This extraction is currently performed with shallow clustering algorithms
such as k-means or hierarchical clustering, trained with many 1D vectors of scalp-electrode
activations, each representing a single point in time. However, this approach ignores the spatial
position of the electrodes, which we argue is essential information for improving cluster formation.
This study contributes to the body of knowledge by introducing deep clustering, which leverages recent
advancements in computer vision and autonomously learns feature representations from EEG data
during clustering, capturing the inherent structure and patterns more effectively. In addition, relevant
advances in eXplainable Artificial Intelligence have enabled various attribution methods to interpret
trained models, which can be exploited to understand the inherent mechanisms of microstate formation.

Keywords
EEG Microstates, Shallow clustering, Deep clustering, Convolutional autoencoders, Resting state.


Late-breaking work, Demos and Doctoral Consortium, colocated with The 2nd World Conference on
eXplainable Artificial Intelligence: July 17–19, 2024, Valletta, Malta
∗ Corresponding author.
† These authors contributed equally.
Email: arjunvinayak.chikkankod@tudublin.ie (A. V. Chikkankod); luca.longo@tudublin.ie (L. Longo)
ORCID: 0000-0002-6472-1339 (A. V. Chikkankod); 0000-0002-2718-5426 (L. Longo)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction
Microstates provide fine-grained temporal and spatial characteristics of multichannel,
non-stationary EEG signals. They are quasi-stable brain activation states that last for 60-120
ms [1]. Initially, microstate analysis was performed mainly on alpha-filtered EEG signals [2].
Nowadays, it is performed predominantly on the broadband signal, and on a few narrowband
signals, to analyse different spectral profiles in rest, task, and activity-evoked states [3, 4]. EEG
microstates can capture global patterns of dynamically varying brain representations over
time [5]. They comprise a few distinct scalp potential topographies that capture the spatiotemporal
aspects of EEG brain signals with varying degrees of global explained variance [5]. Typically,
four microstate maps are extracted using clustering methods, which can explain 64-84% of the
variance in brain dynamics [5, 6]. Although the EEG microstate is a powerful tool
for brain dynamics analysis, its computation still needs to address numerous uncertainties and
unanswered questions. Firstly, as multiple studies have pointed out, the four dominant cluster
maps often represent up to 80% of the global variance in an EEG recording. However, studies
conducted by Seitzman and colleagues could explain only 62-69% of the total variance with
four cluster maps [7]. In addition, recent studies have revealed that anywhere from five to fifteen
cluster maps are required to explain 80% of the global variance [7, 8, 9]. Secondly, the choice of
the algorithm to generate the cluster maps is completely arbitrary [10]. Generally, modified
k-means or agglomerative hierarchical clustering are used to render template maps. However,
to date, clustering as a research field has produced hundreds of algorithms, each addressing
specific challenges. In addition, both k-means and agglomerative hierarchical clustering are
highly sensitive to noisy data and outliers. Microstate theory assumes that a finite number of
non-overlapping, quasi-stable brain activation states give rise to microstates. However, a problem
with this theory is the lack of stability in the number of cluster maps learnt from shallow
clustering (modified k-means, hierarchical clustering) and in the resulting EEG microstates, which
can explain up to 80% of the variance of the input EEG signal. Here, stability refers to minimal
variance in cluster map formation across subjects, tasks, and datasets, which allows EEG
microstates to generalise. Given the above methodological gaps, this research aims to design a
unified framework that leads to higher stability among the generated microstates by employing
modern deep-clustering methods, which are expected to be robust to noisy data, outliers, and
varying cluster morphologies.
The Research Question (RQ) of our study is: to what extent do deep clustering methods improve
the identification and stability of cluster maps and enhance the efficiency of microstate sequences
across multiple resting and cognitive states, compared with shallow clustering? The following
section reviews relevant work in microstate theory and identifies knowledge gaps. Section 3 then
presents the design of a novel pipeline for microstate formation and analysis based on deep
clustering and on notions borrowed from the field of eXplainable Artificial Intelligence
(XAI) [11] for explainable microstate sequence formation. Finally, Section 4 presents the
preliminary findings of a comparative experiment contrasting current shallow methods with
the novel deep-clustering method under development.


2. State-of-the-art pipeline for microstate theory
The current state-of-the-art pipeline for microstate formation and analysis is depicted in Figure
1. It is based on four steps: [A] pre-processing the EEG recordings for artefact removal, which
facilitates the computations in the following steps; [B] computing the Global Field Power (GFP)
peaks; [C] clustering the vectors of $N$ electrode recordings at the time of each peak identified
in B, where the centroids of these clusters are the cluster maps or template maps; and [D] assigning
cluster labels to the time points outside the GFP peaks using backfitting procedures. Many methods
exist for phase A to pre-process and denoise EEG signals; we refer readers to Makoto’s popular
denoising pipeline. The GFP of phase B is essentially the standard deviation across all the
electrodes at a given time. Formally: $GFP = \sqrt{\sum_{i=1}^{N} (v_i - \bar{v})^2 / N}$, where
$N$ is the number of electrodes, $v_i$ is the potential at electrode $i$, and $\bar{v}$ is the mean
potential across electrodes at that time. The local maxima of the GFP are determined over all the
EEG recordings by computing the first derivative and then marking its zero-crossings. Phase C is
the clustering of the scalp-electrode activations at the time location of the peaks of the GFP.
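Phases B and C start from the GFP curve and its peaks. The following is a minimal NumPy sketch of that computation, not the authors' implementation; the function names and the toy recording are assumptions made for illustration.

```python
import numpy as np


def global_field_power(eeg):
    """GFP per time point: standard deviation across electrodes.

    eeg: array of shape (n_channels, n_times).
    """
    # Population standard deviation (divides by N), matching the formula.
    return eeg.std(axis=0)


def gfp_peaks(gfp):
    """Indices of local maxima: the first difference goes from positive to non-positive."""
    d = np.diff(gfp)
    return np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1


# Toy recording (assumed): one fixed 4-channel pattern scaled by a time profile.
profile = np.array([0.0, 1.0, 2.0, 1.0, 0.5, 1.5, 3.0, 1.0, 0.0])
pattern = np.array([1.0, -1.0, 2.0, -2.0])
eeg = np.outer(pattern, profile)

gfp = global_field_power(eeg)
peaks = gfp_peaks(gfp)
maps_at_peaks = eeg[:, peaks].T  # one n_channels vector per GFP peak
```

The rows of `maps_at_peaks` are the vectors that a shallow clustering algorithm would receive in phase C.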
Figure 1: Illustration of the current state-of-the-art microstate pipeline.


   Once clusters are formed with an underlying algorithm, their centroids are usually considered
the representatives of the clusters (the cluster maps or template maps). Finally, backfitting is
performed on the original signal by assigning to each time point the label of one of the previously
identified cluster maps. Isolated labels may appear within a long run of another label, so a
smoothing mechanism is applied [4]. This gives rise to microstates: a dimensionally reduced
sequence of topographies for the input EEG signal [4]. In this pipeline, clustering (phase
C) is the most critical step because it affects microstate formation and analysis, and hence the
development and advancement of microstate theory. Clustering is an unsupervised machine learning
technique whose primary goal is to assign instances to groups by minimising intra-cluster and
maximising inter-cluster dissimilarities [12]. It partitions data into groups based on their
features. Cluster formation comprises i) representation learning, whereby optimal features
are selected and extracted from the input dataset [13, 14]; ii) clustering method selection,
whereby an algorithm that best suits the nature of the problem and the characteristics of the
input data is chosen; iii) result evaluation, whereby the output clusters are evaluated on quality,
scalability, and constraint-based clustering parameters; and iv) result explanation, whereby the
clustering outcome is examined for pragmatic interpretability and usability [15].
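The backfitting step can be sketched as follows. This is an illustrative NumPy version under stated assumptions: each time point is assigned to the template map with the highest absolute spatial correlation, smoothing is omitted, and the global explained variance (GEV) is computed with the commonly used GFP-weighted formula. The function names and toy data are hypothetical, and the sketch assumes no time point is identically zero.

```python
import numpy as np


def backfit(eeg, template_maps):
    """Assign each time point to the best-correlated template map.

    eeg: (n_channels, n_times); template_maps: (n_maps, n_channels).
    Returns the labels (n_times,) and the absolute spatial correlation of
    each time point with its assigned map.
    """
    # Centre and normalise both, so that the dot product is a correlation.
    x = eeg - eeg.mean(axis=0, keepdims=True)
    x = x / np.linalg.norm(x, axis=0, keepdims=True)
    m = template_maps - template_maps.mean(axis=1, keepdims=True)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    corr = np.abs(m @ x)  # (n_maps, n_times); polarity is ignored
    return corr.argmax(axis=0), corr.max(axis=0)


def global_explained_variance(eeg, corr):
    """GEV = sum_t (GFP_t * corr_t)^2 / sum_t GFP_t^2."""
    gfp = eeg.std(axis=0)
    return float(np.sum((gfp * corr) ** 2) / np.sum(gfp ** 2))


# Toy example (assumed): two orthogonal 4-channel templates, five time points.
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
eeg = np.column_stack([2 * a, -a, 3 * b, -0.5 * b, a])
labels, corr = backfit(eeg, np.vstack([a, b]))
gev = global_explained_variance(eeg, corr)
```

Because the toy signal is built only from scaled copies of the two templates, every time point correlates perfectly with its assigned map.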

2.1. Shallow Clustering
Shallow clustering approaches are traditional clustering algorithms broadly categorised into
the following methodologies [12, 15].
   1. Distance-based partitioning methods divide the data into multiple clusters by assigning
      input instances to the closest cluster centroids. K-means, k-medians, and k-medoids are
      popular partitioning techniques.
   2. Distance-based hierarchical methods provide a hierarchy of associations among input
      instances. Agglomerative (bottom-up) or divisive (top-down) methods establish the
      hierarchical structure. The agglomerative method begins with every instance in its own
      singleton cluster and merges clusters over subsequent iterations to construct a bottom-up
      hierarchy. In contrast, the divisive method begins with all instances assigned to a single
      cluster and splits clusters over subsequent iterations to construct a top-down hierarchy.
      AGNES (AGglomerative NESting) and DIANA (DIvisive ANAlysis) are prominent
      agglomerative and divisive methods, respectively.
   3. Density-based methods treat regions with high density as potential clusters. Moreover,
      such dense regions are likely to take arbitrary shapes, which makes them suitable for
      data with non-convex shapes. DBSCAN is a well-known density-based algorithm.
   4. Grid-based methods quantise the data space into a grid data structure of fine granularity.
      Clusters are determined by computing the density of each grid region and then sorting the
      regions according to their densities. Grid-based methods are highly scalable, take minimal
      processing time, and are suitable for parallel implementation. STING and CLIQUE are
      typical grid-based methods.
   5. Probabilistic and generative models assume that the data in a particular cluster come
      from a single distribution within a generative model, such as a mixture of Gaussians. The
      model’s parameters can be estimated using the Expectation-Maximisation (EM) method,
      which iteratively increases the likelihood of the given set of data points.
      Subsequently, the model predicts the generative probability of the underlying data points.
   In microstate theory research, tools such as Cartool and those provided by EEGLAB are
often used. These offer k-means, agglomerative clustering, MST (minimum spanning tree) based
clustering, Non-Negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and
spectral clustering. Many other possibilities exist, but they are largely unexplored in the field,
especially those based on deep learning. Deep clustering methods have surpassed the performance
of shallow clustering methods on benchmark image datasets [10, 16]. We argue that integrating
these into microstate theory research can advance the field and support enhanced microstate
formation and analysis (the contribution highlighted in red in Figure 1C).
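Of the shallow methods above, the polarity-invariant (modified) k-means is the one most often used for microstates. The sketch below shows its two alternating steps in NumPy, assuming the standard formulation in which samples are assigned by absolute similarity and each template is updated to the dominant eigenvector of its members. The `init_idx` parameter and the toy data are hypothetical; practical implementations typically use random initialisation with several restarts.

```python
import numpy as np


def modified_kmeans(maps, init_idx, n_iter=20):
    """Polarity-invariant k-means on topographies.

    maps: (n_samples, n_channels), for example the EEG at the GFP peaks.
    init_idx: indices of the samples used as initial templates (hypothetical
    parameter chosen here to keep the example deterministic).
    """
    x = maps - maps.mean(axis=1, keepdims=True)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    templates = x[list(init_idx)].copy()
    for _ in range(n_iter):
        # Assignment: largest absolute similarity, so A and -A share a label.
        labels = np.abs(x @ templates.T).argmax(axis=1)
        for k in range(len(templates)):
            members = x[labels == k]
            if len(members) == 0:
                continue  # keep the previous template for an empty cluster
            # Update: dominant eigenvector of the members' scatter matrix.
            _, vecs = np.linalg.eigh(members.T @ members)
            templates[k] = vecs[:, -1]
    return templates, labels


# Toy data (assumed): two orthogonal templates with random polarity and noise.
rng = np.random.default_rng(0)
a = np.array([1.0, -1.0, 1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, -1.0])
signs = rng.choice([-1.0, 1.0], size=40)
clean = np.vstack([np.tile(a, (20, 1)), np.tile(b, (20, 1))]) * signs[:, None]
maps = clean + 0.05 * rng.standard_normal(clean.shape)
templates, labels = modified_kmeans(maps, init_idx=(0, 39))
```

The recovered templates are defined only up to sign, which is why comparisons with a reference map should use the absolute value of their similarity.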


3. Design and methods
Following the previously identified research directions for microstate theory advancement,
this section introduces an experiment where the described pipeline for microstate formation is
extended with a novel deep clustering method in phase C. The raw EEG recording is first
band-pass filtered between 1.0 and 30.0 Hz using a FIR filter designed with the firwin method.
Subsequently, the signal is downsampled from its original, usually higher, sampling rate to
around 120 Hz. This adjustment balances data resolution with computational efficiency, keeping
the data manageable while retaining sufficient detail for accurate analysis. The data are then
re-referenced using an average reference method, reducing signal-related bias. Independent
Component Analysis (ICA) is applied using the Infomax algorithm, configured to extract several
components with a maximum number of iterations that allows for convergence. This step is
pivotal for removing eye, muscle and other artefacts from the EEG data. After these components
are identified and removed, the signal is reconstructed and forwarded to the
shallow clustering. For deep clustering, the pre-processed EEG signal is used to construct a set
of two-dimensional (2D) topographic maps, one map per point in time. 2D topographic maps are
formed by projecting three-dimensional (3D) electrode positions onto 2D Cartesian coordinates
using an azimuthal equidistant projection. Electrode values are interpolated for areas
corresponding to non-electrode locations using the cubic method. This dataset is then used to
train a deep clustering architecture, as explained in the following section.
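The projection step can be illustrated with a short sketch. It assumes electrode positions are given as Cartesian coordinates on a head-centred sphere with the vertex on the +z axis; the function name is hypothetical and this is not necessarily the exact procedure used by the authors.

```python
import math


def azimuthal_equidistant(x, y, z):
    """Project a 3D electrode position onto the 2D plane.

    The vertex (top of the head, +z) maps to the origin, and the distance
    from the origin equals the angular distance from the vertex in radians.
    """
    azimuth = math.atan2(y, x)
    elevation = math.atan2(z, math.hypot(x, y))
    radius = math.pi / 2 - elevation
    return radius * math.cos(azimuth), radius * math.sin(azimuth)


# Assumed unit-sphere positions: the vertex and a point on the equator.
vertex_2d = azimuthal_equidistant(0.0, 0.0, 1.0)
equator_2d = azimuthal_equidistant(1.0, 0.0, 0.0)
```

The projected 2D positions and the electrode values at one time point can then be interpolated onto a regular pixel grid, for instance with `scipy.interpolate.griddata(points, values, grid, method="cubic")`, to obtain one image per time point.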

3.1. Deep Clustering
Deep clustering identifies cluster groups using features learned by deep neural networks [10].
We designed a novel deep clustering architecture based on a self-supervised autoencoder (AE) [17]
that iteratively learns representations and clusters from 2D topographic maps using a single
objective function. The AE attempts to learn an approximation of the identity function using the
weights $W$ and bias $b$, that is, $h_{W,b}(x) \approx x$. The designed AE uses clustering as an
additional learning mechanism. In detail, it uses an auxiliary distribution and optimises cluster
assignment by minimising the KL-divergence loss. In a nutshell, the loss of an autoencoder for
deep clustering can be represented as $L_{AEDC} = L_{AE} + L_{ST}$, where $L_{AE}$ corresponds to
the KL loss function, whereas $L_{ST}$ is the neighbourhood-constraint clustering loss. The model
uses the probabilistic outputs of the softmax function for the clustering assignments. This model,
subject to reconstruction and neighbourhood constraints, systematically generates class-level
explanations. The resulting explanations are then mapped to template (cluster) maps, rendering
the process transparent and the results justifiable within the context of XAI principles
[18, 19, 20]. After training such an architecture, the emerging clusters are analysed, and their
centroids are extracted and mapped to the original corresponding topographic maps, which become
the template maps. Therefore, the number of template maps equals the number of clusters. The
number of clusters can follow the outcome of other research studies [4] or can be established
via a grid search. The final phase is backfitting, which forms a sequence of microstates, as
described in Section 2.
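To make the role of the auxiliary distribution and the KL-divergence term concrete, the sketch below computes them for a handful of latent vectors. It is an illustration under assumptions, not the authors' architecture: soft assignments are taken as a softmax over negative squared distances to the cluster centroids, and the auxiliary (target) distribution uses the sharpening form common in deep embedded clustering methods. The latent vectors `z`, which would come from the encoder, are toy values.

```python
import numpy as np


def soft_assignments(z, centroids):
    """Softmax over negative squared distances: q[i, k] = P(cluster k | z_i)."""
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    q = np.exp(logits)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Auxiliary distribution that sharpens q: p proportional to q^2 / cluster frequency."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q), averaged over samples."""
    return float(np.mean(np.sum(p * np.log((p + eps) / (q + eps)), axis=1)))


# Toy latent vectors (assumed) around two centroids.
z = np.array([[0.0, 0.0], [0.2, 0.0], [2.0, 0.0], [1.8, 0.0]])
centroids = np.array([[0.0, 0.0], [2.0, 0.0]])
q = soft_assignments(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

Minimising this loss with respect to the encoder weights and the centroids pulls the soft assignments `q` towards the sharper targets `p`, which is what makes the cluster assignments progressively more confident during training.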
   A comparative experiment is designed, as depicted in Figure 2. Here, the traditional shallow
clustering method used within microstate theory research, namely a modified k-means, is
contrasted with the novel application of the deep-clustering autoencoder, as designed in section
3.1. In modified k-means, the input vectors are treated as polarity invariant, that is, the sign
of a topography is ignored when it is compared with a cluster map (absolute values are
used). The Test-Retest dataset was selected because it contains EEG recordings
for various tasks: eyes open and closed, subtraction, music, and memory tasks [21]. The
clustering phase (C) results are evaluated by assessing clustering effectiveness, whereas the
microstate formation phase (D) is evaluated by analysing the stability of the generated microstate
sequence. Clustering effectiveness was quantified using internal clustering measures, namely
the Silhouette score, the Davies-Bouldin Index (DBI), and the Calinski-Harabasz Index (CHI).
The stability of the microstates was assessed through the duration parameter of each
microstate. This parameter indicates how consistently each microstate is maintained over time,
providing insights into the temporal stability of the microstate sequences.
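The duration parameter can be computed directly from a backfitted label sequence. The following is a small sketch using only the standard library; the function name, the toy label sequence, and the 100 Hz sampling rate are assumptions chosen to keep the arithmetic easy to follow.

```python
from itertools import groupby


def mean_durations_ms(labels, sfreq):
    """Mean uninterrupted duration (ms) of each microstate label."""
    runs = {}
    for label, group in groupby(labels):
        # Length of one uninterrupted run of this label, in samples.
        runs.setdefault(label, []).append(sum(1 for _ in group))
    ms_per_sample = 1000.0 / sfreq
    return {k: ms_per_sample * sum(v) / len(v) for k, v in runs.items()}


# Toy label sequence (assumed), one label per sample at 100 Hz.
labels = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 2]
durations = mean_durations_ms(labels, sfreq=100.0)
```

Here label 0 occurs in runs of 3 and 5 samples, giving a mean of 4 samples, or 40 ms at 100 Hz.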
Figure 2: A diagram illustrating the comparative experiment.


4. Preliminary Results
Table 1 presents the results of clustering effectiveness. The Silhouette score (ranging from −1
to 1) gauges the compactness and separation of the clusters; a higher score indicates
well-defined clusters. The Davies-Bouldin Index (DBI) assesses cluster delineation, with a lower
score suggesting more distinct clusters (0 being the ideal). Conversely, a higher
Calinski-Harabasz Index (CHI) denotes better cluster separation and formation. Preliminary
analyses reveal that deep clustering using the designed AE outperforms the Modified K-Means
approach across all three metrics (Table 1).
The duration metric, the mean uninterrupted time a specific microstate label is maintained, is
expressed in milliseconds. This metric assesses the temporal stability of microstate sequences.
In the comparative analysis, the Modified K-Means algorithm yielded a microstate duration of
80 ms, whereas the designed AE deep clustering architecture achieved a slightly longer duration
of 85.5 ms. This increase suggests that microstates generated by the deep clustering method
remain stable for a longer duration before transitioning to the next state [5].
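For readers unfamiliar with these indices, the sketch below implements the Calinski-Harabasz and Davies-Bouldin definitions in NumPy on a toy dataset. It is meant only to show what the numbers measure; the function names and data are assumptions, and it is not the code used to produce Table 1.

```python
import numpy as np


def calinski_harabasz(x, labels):
    """Between-cluster over within-cluster dispersion, each per degree of freedom."""
    n, ks = len(x), np.unique(labels)
    overall = x.mean(axis=0)
    between = within = 0.0
    for k in ks:
        members = x[labels == k]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall) ** 2)
        within += np.sum((members - centroid) ** 2)
    return float((between / (len(ks) - 1)) / (within / (n - len(ks))))


def davies_bouldin(x, labels):
    """Mean, over clusters, of the worst (scatter_i + scatter_j) / centroid distance."""
    ks = np.unique(labels)
    centroids = np.array([x[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([
        np.linalg.norm(x[labels == k] - c, axis=1).mean()
        for k, c in zip(ks, centroids)
    ])
    worst = []
    for i in range(len(ks)):
        worst.append(max(
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ks)) if j != i
        ))
    return float(np.mean(worst))


# Toy data (assumed): two tight groups that are far apart.
x = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = np.array([0, 0, 1, 1])  # labels follow the two groups
poor = np.array([0, 1, 0, 1])  # labels cut across the two groups
chi_good, dbi_good = calinski_harabasz(x, good), davies_bouldin(x, good)
chi_poor, dbi_poor = calinski_harabasz(x, poor), davies_bouldin(x, poor)
```

The labelling that follows the two groups gives a higher CHI and a lower DBI than the one that cuts across them. In practice, scikit-learn provides `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score` in `sklearn.metrics`.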

Table 1
Preliminary results of clustering effectiveness for Mod K-Means and Deep Clustering.

 Metric                           Mod K-Means    Deep Clustering
 Silhouette score                       -0.03               0.29
 Davies-Bouldin Index (DBI)             33.57               1.67
 Calinski-Harabasz Index (CHI)           6.44            3727.14

5. Conclusion
Our study introduces a novel deep clustering method as a potential alternative to traditional
shallow clustering methods for generating EEG microstates. Preliminary findings indicate that the
deep clustering method, based on a convolutional autoencoder, outperforms the modified k-means
in cluster map identification and microstate stability, potentially leading to more
robust and reliable microstate analysis. With just four microstate templates, we could effectively
summarise the brain activity captured by 64 electrodes over 5 minutes. This dimensionality
reduction simplifies the analysis and makes the complex spatiotemporal dynamics of brain
signals more interpretable. By mapping these microstates to specific mental configurations,
we can gain insights into the neural correlates of various cognitive processes. However, the
evaluation was conducted on a single dataset, and further validation across diverse datasets
is required to generalise the findings. The study primarily focused on resting-state EEG data,
and future research should investigate the applicability of deep clustering to task-related EEG
data. Future research directions include exploring different deep clustering architectures to
improve microstate analysis further. Saliency and class activation maps can reveal neural
patterns influencing microstate generation, offering insights into brain function and cognition.


Acknowledgments
This work was conducted with the financial support of the Science Foundation Ireland Centre
for Research Training in Digitally-Enhanced Reality (d-real) under Grant No. 18/CRT/6224.
For the purpose of Open Access, the author has applied a CC BY public copyright licence to any
Author Accepted Manuscript version arising from this submission.


References
 [1] D. Lehmann, H. Ozaki, I. Pal, EEG alpha map series: brain micro-states by space-oriented
     adaptive segmentation, Electroencephalography and Clinical Neurophysiology 67 (1987)
     271–288. doi:10.1016/0013-4694(87)90025-3.
 [2] D. Lehmann, Multichannel topography of human alpha EEG fields, Electroencephalography
     and Clinical Neurophysiology 31 (1971) 439–449. doi:10.1016/0013-4694(71)90165-9.
 [3] W. Hu, Z. Zhang, L. Zhang, G. Huang, L. Li, Z. Liang, Microstate detection in
     naturalistic electroencephalography data: A systematic comparison of topographical
     clustering strategies on an emotional database, Frontiers in Neuroscience 16 (2022).
     doi:10.3389/fnins.2022.812624.
 [4] V. Férat, M. Seeber, C. Michel, T. Ros, Beyond broadband: Towards a spectral decomposition
     of electroencephalography microstates, Human Brain Mapping 43 (2022).
     doi:10.1002/hbm.25834.
 [5] C. M. Michel, T. Koenig, EEG microstates as a tool for studying the temporal dynamics of
     whole-brain neuronal networks: A review, NeuroImage 180 (2018) 577–593.
     doi:10.1016/j.neuroimage.2017.11.062, Brain Connectivity Dynamics.
 [6] T. Koenig, L. Prichep, D. Lehmann, P. V. Sosa, E. Braeker, H. Kleinlogel, R. Isenhart,
     E. R. John, Millisecond by millisecond, year by year: normative EEG microstates and
     developmental stages, NeuroImage 16 (2002) 41–48.
 [7] B. A. Seitzman, M. Abell, S. C. Bartley, M. A. Erickson, A. R. Bolbecker, W. P. Hetrick,
     Cognitive manipulation of brain electric microstates, NeuroImage 146 (2017) 533–543.
 [8] D. D’croz-Baron, L. Bréchet, M. Baker, K. Tanja, Auditory and visual tasks influence the
     temporal dynamics of EEG microstates during post-encoding rest, Brain Topography 34
     (2021) 3. doi:10.1007/s10548-020-00802-4.
 [9] A. Custo, D. Van De Ville, W. M. Wells, M. I. Tomescu, D. Brunet, C. M. Michel,
     Electroencephalographic resting-state networks: Source localization of microstates, Brain
     Connectivity 7 (2017) 671–682. doi:10.1089/brain.2016.0476, PMID: 28938855.
[10] S. Zhou, H. Xu, Z. Zheng, J. Chen, Z. Li, J. Bu, J. Wu, X. Wang, W. Zhu, M. Ester, A
     comprehensive survey on deep clustering: Taxonomy, challenges, and future directions,
     ArXiv abs/2206.07579 (2022).
[11] L. Longo, et al., Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges
     and interdisciplinary research directions, Information Fusion 106 (2024) 102301.
     doi:10.1016/j.inffus.2024.102301.
[12] J. Han, M. Kamber, J. Pei, Data mining concepts and techniques, third edition (2012).
[13] U. Lal, A. V. Chikkankod, L. Longo, A comparative study on feature extraction techniques
     for the discrimination of frontotemporal dementia and Alzheimer’s disease with
     electroencephalography in resting-state adults, Brain Sciences 14 (2024).
     doi:10.3390/brainsci14040335.
[14] U. Lal, A. V. Chikkankod, L. Longo, Fractal dimensions and machine learning for detection
     of Parkinson’s disease in resting-state electroencephalography, Neural Comput. Appl.
     (2024).
[15] D. Xu, Y. Tian, A comprehensive survey of clustering algorithms, Annals of Data Science
     2 (2015) 165–193. doi:10.1007/s40745-015-0040-1.
[16] Y. Li, P. Hu, Z. Liu, D. Peng, J. Zhou, X. Peng, Contrastive clustering, 2020.
[17] A. V. Chikkankod, L. Longo, On the dimensionality and utility of convolutional autoencoders
     latent space trained with topology-preserving spectral EEG head-maps, Machine
     Learning and Knowledge Extraction 4 (2022) 1042–1064. doi:10.3390/make4040053.
[18] X. Guo, X. Liu, E. Zhu, J. Yin, Deep clustering with convolutional autoencoders, 2017, pp.
     373–382. doi:10.1007/978-3-319-70096-0_39.
[19] C. A. Ellis, R. L. Miller, V. D. Calhoun, Improving explainability for single-channel EEG
     deep learning classifiers via interpretable filters and activation analysis*, in: 2023 IEEE
     International Conference on Bioinformatics and Biomedicine (BIBM), 2023, pp. 2474–2481.
     doi:10.1109/BIBM58861.2023.10385647.
[20] I. Hussain, R. Jany, R. Boyer, A. Azad, S. A. Alyami, S. J. Park, M. M. Hasan, M. A. Hossain,
     An explainable EEG-based human activity recognition model using machine-learning
     approach and LIME, Sensors 23 (2023). URL: https://www.mdpi.com/1424-8220/23/17/7452.
     doi:10.3390/s23177452.
[21] Y. Wang, W. Duan, D. Dong, L. Ding, X. Lei, A test-retest resting and cognitive state EEG
     dataset (2022). doi:10.18112/openneuro.ds004148.v1.0.1.