=Paper=
{{Paper
|id=Vol-3649/Paper6
|storemode=property
|title=Dance of the Neurons: Unraveling Sex from Brain Signals (short paper)
|pdfUrl=https://ceur-ws.org/Vol-3649/Paper6.pdf
|volume=Vol-3649
|authors=Mohammad-Javad Darvishi-Bayazi,Mohammad Sajjad Ghaemi,Jocelyn Faubert,Irina Rish
|dblpUrl=https://dblp.org/rec/conf/aaai/BayaziGFR24
}}
==Dance of the Neurons: Unraveling Sex from Brain Signals (short paper)==
Dance of the Neurons: Unraveling Sex from Brain Signals
Mohammad-Javad Darvishi-Bayazi (1,2,3,*), Mohammad Sajjad Ghaemi (4), Jocelyn Faubert (2,3) and Irina Rish (1,2)
(1) Mila - Québec AI Institute, Montréal, QC, Canada
(2) Université de Montréal, Montréal, QC, Canada
(3) Faubert Lab, Montréal, QC, Canada
(4) National Research Council Canada, Toronto, ON, Canada
Abstract
Previous studies have shown that machine learning can predict biological sex from EEG data with high accuracy. However,
the validity and generalizability of these findings across different datasets and tasks still need to be clarified. In this paper, we
investigated the robustness and transferability of sex-related patterns in EEG data using a convolutional neural network (CNN)
trained on several corpora of EEG recordings comprising 221 to 12,000 participants, including healthy and diseased subjects.
We evaluated the CNN on datasets from various sources and groups, with varying degrees of shift in their distributions. We
found that CNNs can detect sex from EEG data accurately on datasets without fine-tuning or adaptation when the shift is low.
However, performance drops where the shift is drastic. These results suggest that sex-related patterns in EEG data are robust
and transferable across diverse datasets and relevant tasks. We discuss the implications of these findings for EEG analysis,
machine learning applications, and best practices for avoiding sex biases, with the goal of enhancing personalized mental health interventions.
Keywords
EEG, Sex Prediction, Artificial Neural Network, Machine Learning, Robustness, Transfer Learning, Mental Health
Machine Learning for Cognitive and Mental Health Workshop (ML4CMH), AAAI 2024, Vancouver, BC, Canada
* Corresponding author: mohammad.bayazi@mila.quebec (M. Darvishi-Bayazi); https://www.linkedin.com/in/mjdarvishi/; ORCID 0000-0002-3251-8491
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Addressing sex biases in medicine and mental health is vital, as exemplified by the US Food and Drug Administration's suspension of ten prescription drugs, eight of which posed higher health risks in women. The root cause of this issue lies in a discernible bias towards males in various research stages [1]. Therefore, recognizing sex as a crucial biological variable in primary and preclinical research helps ensure accurate and replicable results.

Understanding the complex interplay between brain function and sex is vital to advancing mental health comprehension [2]. Electroencephalogram (EEG) signals, reflecting brain electrical activity, offer a unique avenue for exploring sex-related neural patterns. Combined with large datasets, machine learning has become a powerful tool in deciphering intricate neurological phenomena. This research endeavour holds significant implications for personalized medicine and mental health interventions, offering the potential to enhance early detection, diagnosis, and treatment of disorders [3]. The intersection of EEG analysis, machine learning, and large datasets opens new frontiers in mental health research, promising more precise and practical approaches to promoting mental well-being.

Most of the studies in the field have primarily focused on differences in brain size and static features [4, 5, 6, 7], ignoring the dynamic aspects of brain function. To address this gap, we propose using EEG, which provides insights into brain dynamics and activity patterns. However, one major challenge in utilizing EEG data is its inherent noise. To overcome this issue, we suggest employing a large number of samples to increase the signal-to-noise ratio, thus enhancing the reliability and accuracy of the findings. By incorporating EEG data into the analysis, we can better understand the brain's dynamic processes and their relationship to sex differences and behaviour.

Despite the promising potential of using brain imaging and machine learning in mental health research to classify sex-specific markers, a significant challenge arises from the often small and limited datasets employed in these studies. For instance, Bučková et al. [8] evaluated deep learning classifiers on a small number of participants with Major Depressive Disorder (MDD), while Jochmann et al. [9] and Van Putten et al. [10] applied them to mid-size datasets of healthy participants (see Table 1 for a comparison). The reliance on insufficient sample sizes can lead to incomplete and biased conclusions [4], hindering the generalizability and reliability of findings. This issue is particularly critical in understanding the intricate connections between brain function and mental health, where individual variations and complexities require comprehensive datasets.

To tackle these challenges, we leveraged machine learning techniques on functional brain imaging data, specifically EEG, across diverse datasets encompassing varying sample sizes and populations, including both normal and abnormal populations (see footnote 1). Also, we examined the performance of classifiers under distribution shift on unseen data. Ultimately, our findings contribute to more robust and applicable insights for targeted and personalized mental health interventions.

Footnote 1: The term “Normal/Abnormal” is used in the original datasets to describe whether EEG recordings contain pathological features, such as epileptic spikes, periodic discharges, or other abnormal patterns. It does not imply any value judgment or stigma but rather reflects the quality of the EEG signal. In this paper, we adhered to the same terminology.

Table 1
A comparison of previous studies on EEG sex detection. The table shows the name of the study, the number of participants and recordings in the dataset and in (train, test) splits, participants' conditions, and the dataset used together with its availability.
{| class="wikitable"
! Study !! # of Participants (train, test) !! # of Recordings !! Conditions !! Dataset (availability)
|-
| Van Putten et al. [10] || 1308 (1000, 308) || 1308 || All Normal || In Lab
|-
| Bučková et al. [8] || 144 || 144 || MDD || In Lab
|-
| Jochmann et al. [9] || 1282 (1140, 142) || 1282 || Only Normal || Split TUAB
|-
| Ours || 2417 || 2417 || Normal/Abnormal || Public-NMT
|-
| Ours || 2329 || 2978 || Normal/Abnormal || Public-TUAB
|-
| Ours || 14987 || 69000 || Unlabeled || Public-TUEG
|}

2. Material and Methods

2.1. Datasets

We used three publicly available EEG datasets with different sample sizes and conditions to investigate the effect of sex on EEG signals. The datasets are:

NMT (NUST-MH-TUKL EEG): This dataset contains 2,417 recordings from healthy and pathological subjects, with a total duration of 625 hours. The recordings are labelled as normal or abnormal by qualified neurologists and also include demographic information, such as sex and age [11].

TUAB (Temple University Hospital Abnormal EEG Corpus): This dataset is a subset of the TUEG corpus that contains 1,985 recordings from 1,652 subjects, with a total duration of 453 hours. The recordings are labelled as normal or abnormal by qualified neurologists [12, 13, 14].

TUEG (Temple University Hospital EEG Corpus): This dataset is a large open-source corpus of EEG data, containing over 69,000 recordings from 14,987 subjects, with a total duration of 27,062 hours. The recordings are de-identified and annotated with clinical information, such as age and sex [13, 14].

We utilize patient sex information, encoded as 0 or 1, as our neural network target. We focus on sex instead of gender due to the datasets' clinical origin, assuming patients' records reflected assigned birth sex rather than self-identified gender. We applied several preprocessing steps to the EEG data, including the selection of the 21 channels common to all datasets. We used Artifact Subspace Reconstruction (ASR) [15] to remove artifacts from the EEG. We z-scored the EEG signals using each channel's statistics. We used predefined test sets to report the accuracy of our models and 15% of the training splits for model selection.

2.2. Model

We used ShallowNet [16] as our model for all experiments, as it is a simple and efficient convolutional neural network (CNN) that can perform well on various EEG tasks. ShallowNet consists of only one convolutional layer followed by a fully connected layer, which reduces the number of parameters and the computational cost compared to deeper networks. We implemented ShallowNet in BrainDecode [17] and trained it using the AdamW optimizer with a learning rate of 0.000625, weight decay of 0, dropout probability of 0.5, and a batch size of 64. We used binary cross-entropy as the loss function and balanced accuracy (BAC) as the evaluation metric.

2.3. Training and Evaluation

The training and evaluation of the model were conducted with the primary objective of classifying sex from EEG signals. Our focus extended beyond the training dataset to include a comprehensive analysis of model performance on both the test split of the training dataset and other unseen datasets. The overarching goal was to assess the Sex Detectability (SD) from EEG signals and evaluate the model's robustness to distribution shifts in unseen data. We examined detectability and transferability under various conditions to probe the model's capabilities. Specifically, we explored the model's performance when trained and tested on subsets of the data, considering scenarios where only Normal participants were included, only Abnormal participants were included, or the entire dataset was utilized. This approach allowed us to understand how well the model generalizes across different participant profiles.
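The per-channel z-scoring from Section 2.1 and the balanced accuracy (BAC) metric from Section 2.2 can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' pipeline: the function names `zscore_channel`, `zscore_recording`, and `balanced_accuracy` are hypothetical, the toy values are made up, and the sketch assumes statistics are computed over the whole recording for each channel, which the paper does not specify.

```python
from statistics import fmean, pstdev


def zscore_channel(samples):
    """Standardize one channel to zero mean and unit variance."""
    mean = fmean(samples)
    std = pstdev(samples, mu=mean)
    if std == 0:
        # A flat channel carries no variance; map it to zeros (assumption).
        return [0.0 for _ in samples]
    return [(x - mean) / std for x in samples]


def zscore_recording(recording):
    """Apply z-scoring independently to every channel of a recording."""
    return [zscore_channel(channel) for channel in recording]


def balanced_accuracy(y_true, y_pred):
    """Average the recall of each class, for labels encoded as 0 or 1."""
    recalls = []
    for label in (0, 1):
        predictions = [p for t, p in zip(y_true, y_pred) if t == label]
        if predictions:  # skip a class that is absent from y_true
            recalls.append(sum(p == label for p in predictions) / len(predictions))
    return fmean(recalls)


# Toy recording with two channels (made-up values, not real EEG).
recording = [[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]]
normalized = zscore_recording(recording)

# Toy labels: class 0 recall is 3/4, class 1 recall is 1/2.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
bac = balanced_accuracy(y_true, y_pred)
```

Because BAC averages the recall of each class, it is not inflated when one sex is over-represented in a test set, which is presumably why it suits imbalanced corpora better than plain accuracy.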
[Figure 1 image: three panels, A) Normal participants, B) Abnormal participants, C) All participants. Axis: train dataset (TUAB, NMT, TUEG); legend: test dataset (TUAB, NMT, TUEG).]
Figure 1: SD from EEG in three populations: A) Normal, B) Abnormal, C) All participants. Error bars show the standard error of BAC across three random seeds. The TUEG dataset does not have pathology labels; therefore, results for TUEG are not available in A and B, and we only visualize them for all participants in C.
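The error bars described in the caption can be derived from per-seed scores as sketched here. This is a minimal sketch, assuming the standard error is the sample standard deviation divided by the square root of the number of seeds (the paper does not state which estimator was used). The function name `mean_and_standard_error`, the dataset keys, and the BAC values are placeholders for illustration, not results from the paper.

```python
from math import sqrt
from statistics import fmean, stdev


def mean_and_standard_error(scores):
    """Return (mean, standard error) of scores from repeated runs."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard error")
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return fmean(scores), stdev(scores) / sqrt(n)


# Placeholder BAC values (%) for three seeds, keyed by (train, test) dataset.
bac_by_pair = {
    ("dataset_A", "dataset_A"): [70.0, 72.0, 74.0],  # in-distribution
    ("dataset_A", "dataset_B"): [78.0, 79.0, 80.0],  # zero-shot
}

summary = {
    pair: mean_and_standard_error(scores)
    for pair, scores in bac_by_pair.items()
}

for (train, test), (mean, sem) in summary.items():
    print(f"train={train} test={test}: {mean:.2f} ± {sem:.2f}")
```

Keying the scores by (train, test) pair mirrors the grid in Figure 1: entries where both names match are in-distribution results, and the remaining entries are zero-shot results.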
Furthermore, we conducted experiments to understand the impact of SD on pathology detection. To achieve this, we trained the model on the NMT dataset, which features imbalances in different aspects. Our analysis focused on different subgroups within the dataset, including Male Normal, Female Normal, Male Abnormal, and Female Abnormal participants. We aimed to elucidate any potential associations between SD and pathology detection by examining the model's performance on these subgroups. We ran each experiment with three random seeds. All error bars show the standard error of the metrics over the three seeds.

3. Results

One of the objectives of this study is to investigate if the biological sex of the subjects is detectable from their scalp EEG recordings. This question is relevant for understanding the sex-specific differences in brain activity and their implications for the diagnosis and treatment of various neurological and psychiatric disorders. Moreover, this question is also essential for evaluating the potential biases and limitations of machine learning classifiers trained on EEG data.

3.1. Sex Detectability (SD) from EEG

To address SD from EEG, we experimented with several datasets of different sizes and compositions, including the TUEG EEG dataset, the world's most extensive open-source corpus of EEG data. We also considered the normal and abnormal populations of the subjects. We used a shallow convolutional neural network (CNN) as our classifier. Previous studies have shown that this CNN can achieve accuracy competitive with larger models in predicting pathology from EEG data [18, 19].

The results of our experiments are summarized in Figure 1 and Table 2, which show the BAC of the CNN classifier for each dataset. The figure shows the BAC when we train the model on a dataset and test it on its own test split, which is the in-distribution performance. It also shows the BAC of a model trained on one dataset and tested on another dataset, which is the out-of-distribution performance. The results indicate that the sex of the subjects is detectable from their EEG recordings in all of the distribution scenarios, with accuracy ranging from 60% to 80%. The results also show that the sex detection performance is slightly higher for the normal population than for the abnormal population. It is worth noting that the TUEG dataset does not have pathology (Normal/Abnormal) labels; therefore, we do not show its results for normal and abnormal participants.

3.2. Performance on Unseen Data (Zero-Shot)

To evaluate the model's adaptability to unseen data, we conducted zero-shot performance assessments across various datasets. Zero-shot performance means that the model predicts the class of a sample from an unseen dataset without having seen any examples from that dataset during training. Notably, the highest accuracy of 79.11% was achieved when training on TUEG and testing on the TUAB EEG dataset. Conversely, the lowest accuracies were observed when the model was evaluated across the TUH datasets and the NMT Scalp EEG Dataset.

Our investigation extended beyond the original training and testing datasets to explore out-of-distribution accuracy, mainly focusing on the abnormal population. Strikingly, the model exhibited higher accuracy in out-of-distribution scenarios when dealing with the abnormal population. To comprehensively gauge the generalization of learned features, each model was tested on other datasets to evaluate zero-shot performance. This analysis provided insights into how well the model leverages learned features when confronted with entirely new data.

In Table 2, we compare our models with previous work on sex detection on the TUAB dataset, which has a moderate sample size compared to the two other datasets in our study. The results show that ShallowNet achieves comparable results in the in-distribution scenario and outperforms previous work by a wide margin in the zero-shot scenario. The reason for this improvement might be that the TUEG dataset has seven times more unique participants, and the data
distribution is close to TUAB. Therefore, it improves the performance on the TUAB dataset by 7%.

Table 2
Comparison of BAC (%) between previous work on the TUAB dataset and ours.
{| class="wikitable"
! Method !! BAC on TUAB (%)
|-
| Clean [9] || 74.00 ± 2.00
|-
| Clean (Ours) || 72.89 ± 1.21
|-
| Zero-Shot (Pre-trained, Ours) || 79.11 ± 1.47
|}

These results demonstrate that the biological sex of the subjects is a significant factor that machine learning methods can capture. However, these results may also imply that the sex of the subjects should be taken into account when developing and evaluating machine learning classifiers for EEG pathology detection, as the sex distribution of the training and testing data may affect the generalization and robustness of the models.

3.3. Sex Imbalance's Impact on EEG Pathology Detection

As we saw in the previous sections (SD and Zero-Shot), sex is detectable from the EEG signals and is an important biological factor that can influence human brain activity and behaviour. Therefore, considering it in the analysis is essential, especially when the datasets are imbalanced. In this section, we aim to investigate the effect of sex on pathology detection from EEG signals using the NMT Scalp EEG Dataset. The NMT dataset has a significant sex imbalance, as men are two times more frequent than women in the dataset. This raises the question of whether the sex imbalance and the SD from the EEG signals can affect the performance of the pathology detection models.

To address this question, we conducted several experiments using different deep-learning architectures. We first verified that sex is detectable from the EEG signals using a simple convolutional neural network (CNN) that achieved good accuracy on the sex classification task on several EEG datasets with different sample sizes (see Table 2 and Figure 1). We then evaluated the pathology detection models on the NMT dataset for different subgroups.

[Figure 2 image: panel A, NMT data distribution (percentage relative to the entire dataset of Normal and Abnormal recordings in the train and test splits, for Female and Male); panel B, performance in subgroups (BAC for Female and Male).]
Figure 2: Sex imbalances in the NMT dataset and their effect on pathology detection: A) Data distribution of the NMT dataset; the number of male samples is two times higher than that of females. B) Accuracies of subgroups. The difference in the number of samples does not affect pathology detection.

Figure 2 shows that the sex imbalance in the NMT dataset does not affect the pathology detection performance. Although the NMT dataset has twice as many male samples as female samples, as shown in panel A, this does not lead to a significant difference in the accuracy of the pathology detection models for the male and female subgroups, as shown in panel B. This suggests that the sex imbalance in the NMT dataset does not hurt the pathology detection quality.

One possible reason for this finding is that the NMT dataset has a balanced ratio of normal and abnormal samples in each sex subgroup, as shown in Figure 2A. This means that the models can learn the features related to the pathology, not the sex, of the subjects. Therefore, even though the sex is detectable from the EEG signals, it does not interfere with the pathology detection task.

4. Discussion and conclusion

Historically documented sex differences in EEG patterns and the successful application of machine learning for automatic sex detection suggest that sex-related patterns can act as confounders in machine learning-based EEG assessments [8, 9]. In our experimentation on potential confounding factors within the NMT dataset, we explored a scenario involving an imbalance in male and female participants. Our findings indicate that, in this dataset, sex does not function as a confounder due to an equal distribution of abnormal participants in the male/female splits. However, as demonstrated in the SD section, sex remains detectable. Consequently, acknowledging sex as a factor is essential for precision medicine in mental health.

A key takeaway from an extensive review spanning three decades of research on human brain sex differences is that, despite evident behavioural distinctions between men and women, disparities in brain structure and function are minimal and inconsistent when adjusted for individual brain size and insufficient participant numbers [4]. In contrast, our study employs EEG, which has high temporal but low spatial resolution, to assess functional brain activity. Our findings reveal distinct patterns across datasets with varying subject numbers, highlighting the unique insights provided by EEG in uncovering differences.

Brain connectivity and topography research has yielded diverse perspectives, providing a rich field for future investigations. For instance, Ingalhalikar et al. [20] found that male brains exhibit enhanced connectivity between perception and coordinated action, while female brains are structured to facilitate communication between analytical and intuitive processing modes. Their study, involving 949 youths, demonstrated distinct patterns in supratentorial connections, with stronger intra-hemispheric connections in males and stronger inter-hemispheric connections in females. Jochmann et al. [9] highlighted the significance of EEG topographies in sex detection, revealing that even with disrupted waveforms, the sex could be accurately identified. On the other hand, Bučková et al. [8] observed that the incorporation of multivariate classification models did not consistently improve performance. Also, Eliot et al. [4] argue that despite decades of examining sex effects on lateralized brain function, there is no substantial evidence supporting the widely held belief that male brains are significantly more lateralized than female brains. The diversity of findings in the literature underscores the complexity of brain connectivity and topography, making it an intriguing and promising avenue for future research. One could examine where the trained neural network looks when classifying brain signals.

Frequency bands are widely recognized as critical features in quantitative EEG analysis. Despite their prominence, the significance of these features in sex detection remains unclear. Some studies assert that brain rhythms exhibit sex-specific patterns [21, 10], while others argue that none of the traditional frequency bands play a particularly crucial role in sex detection [9]. A potential avenue for future research would be to explore and substantiate these claims using an extensive dataset, such as TUEG.

In summary, our training and evaluation process thoroughly explored the model's performance in classifying sex from EEG signals. We systematically assessed its ability to generalize to unseen data, examined detectability and generalization under varying conditions, and investigated potential implications for pathology detection using a diverse and imbalanced dataset. The results of these analyses contribute to a nuanced understanding of the model's capabilities and potential applications in clinical settings.

5. Acknowledgements

We extend our sincere appreciation to Mathilde Besson for their valuable comments, which greatly contributed to the refinement of this paper. This work was funded by the Canada CIFAR AI Chair Program, the Canada Excellence Research Chairs (CERC) program, National Research Council Canada, Natural Sciences and Engineering Research Council (NSERC-CAE-CRIAC-CARIQ, NSERC discovery grant RGPIN-2022-05122), Doctoral Research Microsoft Diversity Award (Microsoft-Mila), Faculty of medicine-UdeM, and Faculté des études supérieures et postdoctorales. We thank Compute Canada for providing computational resources.

References

[1] S. K. Lee, Sex as an important biological variable in biomedical research, BMB Reports 51 (2018) 167.
[2] D. M. Christiansen, M. M. McCarthy, M. V. Seeman, Understanding the influences of sex and gender differences in mental disorders, Frontiers in Psychiatry 13 (2022) 984195.
[3] T. J. Sejnowski, P. S. Churchland, J. A. Movshon, Putting big data to good use in neuroscience, Nature Neuroscience 17 (2014) 1440–1441.
[4] L. Eliot, A. Ahmed, H. Khan, J. Patel, Dump the “dimorphism”: Comprehensive synthesis of human brain studies reveals few male-female differences beyond size, Neuroscience & Biobehavioral Reviews 125 (2021) 667–697.
[5] A. M. Chekroud, E. J. Ward, M. D. Rosenberg, A. J. Holmes, Patterns in the human brain mosaic discriminate males from females, Proceedings of the National Academy of Sciences 113 (2016) E1968–E1968.
[6] F. Sepehrband, K. M. Lynch, R. P. Cabeen, C. Gonzalez-Zacarias, L. Zhao, M. D'Arcy, C. Kesselman, M. M. Herting, I. D. Dinov, A. W. Toga, et al., Neuroanatomical morphometric characterization of sex differences in youth using statistical learning, NeuroImage 172 (2018) 217–227.
[7] C. Sanchis-Segura, M. V. Ibañez-Gual, N. Aguirre, Á. J. Cruz-Gómez, C. Forn, Effects of different intracranial volume correction methods on univariate sex differences in grey matter volume and multivariate sex prediction, Scientific Reports 10 (2020) 12953.
[8] B. Bučková, M. Brunovskỳ, M. Bareš, J. Hlinka, Predicting sex from EEG: validity and generalizability of deep-learning-based interpretable classifier, Frontiers in Neuroscience 14 (2020) 589303.
[9] T. Jochmann, M. S. Seibel, E. Jochmann, S. Khan, M. S. Hämäläinen, J. Haueisen, Sex-related patterns in the electroencephalogram and their relevance in machine learning classifiers, Human Brain Mapping 44 (2023) 4848–4858.
[10] M. J. Van Putten, S. Olbrich, M. Arns, Predicting sex from brain rhythms with deep learning, Scientific Reports 8 (2018) 3069.
[11] H. A. Khan, R. Ul Ain, A. M. Kamboh, H. T. Butt, S. Shafait, W. Alamgir, D. Stricker, F. Shafait, The NMT scalp EEG dataset: an open-source annotated dataset of healthy and pathological EEG recordings for predictive modeling, Frontiers in Neuroscience 15 (2022) 755817.
[12] S. López, I. Obeid, J. Picone, Automated interpretation of abnormal adult electroencephalograms, Ph.D. thesis, Temple University, 2017.
[13] N. Shawki, M. G. Shadin, T. Elseify, L. Jakielaszek, T. Farkas, Y. Persidsky, N. Jhala, I. Obeid, J. Picone, Correction to: The Temple University Hospital digital pathology corpus, in: Signal Processing in Medicine and Biology: Emerging Trends in Research and Applications, Springer, 2022, pp. C1–C1.
[14] I. Obeid, J. Picone, The Temple University Hospital EEG data corpus, Frontiers in Neuroscience 10 (2016) 196.
[15] S. Blum, N. S. Jacobsen, M. G. Bleichner, S. Debener, A Riemannian modification of artifact subspace reconstruction for EEG artifact handling, Frontiers in Human Neuroscience 13 (2019) 141.
[16] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, T. Ball, Deep learning with convolutional neural networks for EEG decoding and visualization, Human Brain Mapping 38 (2017) 5391–5420.
[17] R. T. Schirrmeister, J. T. Springenberg, L. D. J. Fiederer, M. Glasstetter, K. Eggensperger, M. Tangermann, F. Hutter, W. Burgard, T. Ball, Deep learning with convolutional neural networks for EEG decoding and visualization, Human Brain Mapping (2017). URL: http://dx.doi.org/10.1002/hbm.23730. doi:10.1002/hbm.23730.
[18] M.-J. Darvishi-Bayazi, M. S. Ghaemi, T. Lesort, M. R. Arefin, J. Faubert, I. Rish, Amplifying pathological detection in EEG signaling pathways through cross-dataset transfer learning, Computers in Biology and Medicine (2023) 107893.
[19] L. A. Gemein, R. T. Schirrmeister, P. Chrabąszcz, D. Wilson, J. Boedecker, A. Schulze-Bonhage, F. Hutter, T. Ball, Machine-learning-based diagnostics of EEG pathology, NeuroImage 220 (2020) 117021.
[20] M. Ingalhalikar, A. Smith, D. Parker, T. D. Satterthwaite, M. A. Elliott, K. Ruparel, H. Hakonarson, R. E. Gur, R. C. Gur, R. Verma, Sex differences in the structural connectome of the human brain, Proceedings of the National Academy of Sciences 111 (2014) 823–828.
[21] P. Kaushik, A. Gupta, P. P. Roy, D. P. Dogra, EEG-based age and gender prediction using deep BLSTM-LSTM network model, IEEE Sensors Journal 19 (2018) 2634–2641.