Gender Detection and Stylistic Differences and Similarities between Males and Females in a Dream Tales Blog

Raffaele Manna, Antonio Pascucci, Johanna Monti
UNIOR NLP Research Group, University L’Orientale, Naples, Italy
rmanna@unior.it, apascucci@unior.it, jmonti@unior.it

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

In this paper we present the results of a gender detection experiment carried out on a corpus we built by downloading dream tales from a blog. We also highlight stylistic differences and similarities in lexical choices between men and women. To carry out the experiment we built a feed-forward neural network with traditional sparse n-hot encoding using the Keras open source library.

1 Introduction

It is generally accepted that dreams are just an unconscious production, and that they represent a kind of event that cannot be manipulated. However, many people believe that dreams are premonitory of future events as well as representations and reworkings of past events. Humans tend to preserve all personal events, some of them in the form of a diary, namely the best method to tell an event and keep its aura of magic.

Until recently, dream reports were relegated to the pages of paper journals or revealed to familiar people. Earlier, dreams were gathered from sleep research labs, psycho-therapeutic and inpatient settings, personal dream journals and, occasionally, classroom settings where “most recent dreams” and “most vivid dreams” were collected, as in (Domhoff, 2003).

Social media have opened millions of pages where people feel at ease to confess their thoughts, their experiences and even their secret fantasies. Platforms such as Twitter, Facebook and web blogs are a good ground for computational text analysis research in social science and mental health assessment via language.

Diary narratives represent a field already investigated by researchers. The recent development of web communities focused on telling dreams allows researchers to access and discover new characteristics related to the language of dreams. Stylistic and linguistic features of dreams in blog reports are essential for detecting differences in writing style and content between men and women, and they also enable future research on the different types of personality and styles associated with mental health diagnoses and therapeutic outcomes. The aim of this paper is to show that, although dreams are just an unconscious production, there are several stylistic differences between the reports of dreams written by males and females on online blogs. The model we built is able to represent and classify these stylistic differences.

Moreover, this research represents a preliminary step in the field of dream tales, which will be followed by an attempt to find stylistic differences between dream tales and other forms of self-narration (e.g. travel tales).

The paper is organized as follows: in Section 2 we introduce Related Work, and in Section 3 we describe the blog and the corpus we built. Methodology is described in Section 4 and Results are in Section 5. In Section 6 we present our Conclusions and introduce Future Work.

2 Related Work

Textual analysis of dream reports is still not a completely investigated field in NLP. One of the purposes of computational dream report analysis lies in understanding how and why a dream narrative differs from a waking narrative (Hendrickx et al., 2016). For example, if a dream description contains more function words than a waking narrative, what is the relationship between the content of dreams and the use of more function words?

Earlier studies were conducted by Domhoff (2003) and Bulkeley (2009). In their research, dream reports are analyzed and a systematic category list of words is provided, which can be used for queries and word-frequency counts in DreamBank.net. The categories are related to the content of dreams and are used to retrieve mentions of emotions, characters, perception, movement and socio-cultural background.

On the basis of this approach, Bulkeley (2014) updates the category list and evaluates it on four datasets of the DreamBank corpus. It has been shown that this type of word analysis can be applied to detect the topics of dreams. In addition, this latter contribution provides evidence that it is possible to make inferences about a person’s life and activities, personal concerns and interests based on an individual dream collection.

Other works focus on identifying the emotions in dream reports. In particular, Razavi et al. (2014) use a machine learning method to assign emotion labels to dreams on a four-level negative/positive sentiment scale. In their research, dreams are represented as word vectors and dynamic features are included to represent sentiment changes in dream descriptions.

In a more fine-grained sentiment analysis, Frantova and Bergler (2009) train a classifier, based on semi-automatically compiled emotion word dictionaries, in order to assign five fuzzy emotion categories to dream reports. They then compare their results against a sample from the DreamBank that is manually labeled with emotion annotations.

In some non-computational studies aimed at highlighting gender differences (Schredl and Piel, 2005; Schredl et al., 2010), dream reports are used to spot gender differences in dream recall. The first study demonstrates that gender differences in dream recall and dream content are stable. Human judges are able to correctly match the dreamer’s gender based on a single dream report with a probability better than chance. Based on these findings, in the latter study the stability of gender differences in dream content is analyzed over time. Two dream themes (work-related dreams and dreams of deceased persons) were investigated, and gender differences proved to be quite stable over time. In (Mathes and Schredl, 2013) gender differences are associated with personality traits. The analysis indicates that some of the Big Five personality dimensions might be linked with some dream characteristics, such as characters and the occurrence of weapons or clothes in dreams.

In psychiatric studies, the gender variable is identified as a predictor of psychotic behaviors and disorders. In (Thorup et al., 2007), the authors showed that, in psychotic patients, the gender-related variable has a role in showing different psycho-pathological characteristics and different social functioning. However, no dream samples were analyzed in that study.

Dream diaries refine the research by uncovering connections between dreams and the dreamer’s socio-cultural background, mental conditions and neuro-physiological factors. The language of online dreams in relation to mental health conditions has yet to be analyzed; however, prior laboratory research suggests that dream content may differ according to clinical conditions.

In (Skancke et al., 2014), emotional tone, themes and actor focus in dream reports were associated with anxiety disorders, schizophrenia, personality disorders and eating disorders. However, it is not clear whether dream content can be predictive with respect to mental disorders.

In (Scarone et al., 2008), the hypothesis of the dreaming brain as a neurobiological model for psychosis is tested by focusing on cognitive bizarreness, a distinctive property of the dreaming mental state defined by discontinuities and incongruities in the dream report, thoughts and feelings. Cognitive bizarreness is measured in written reports of dreams and in verbal reports of waking fantasies in thirty schizophrenics and thirty normal controls. The differences between these two groups indicate that, under experimental conditions, the waking cognition of schizophrenic subjects shares a common degree of formal cognitive bizarreness with the dream reports of both normal controls and schizophrenics. These results support the hypothesis that the dreaming brain could be a useful experimental model for psychosis.

Building on the above considerations, and combining the psychiatric and neurobiological findings of the studies reviewed, the present research aims first of all to reveal the differences between genders in dreams. As a future goal, starting from the hypothesis of cognitive similarity between dreams and psychoses and using dreams as an experimental path, it aims to clarify the relationship between gender and psychosis.

3 Dataset Description

The web is full of blogs where people can share opinions, questions, and personal feelings and thoughts about their own life. Furthermore, people also share their dreams, one of the most personal and hidden aspects of life.

It is very easy to find a blog in which thousands of people share their “dream experiences”, sometimes discovering that other people have had similar experiences dictated by similar lifestyles.

We investigated a blog called SogniLucidi, on which every day thousands of people tell their dreams and nightmares, mixing their nightly fantasies with their unconscious writing style choices. SogniLucidi, which can be literally translated as LucidDreams, took its name from a term coined by the Dutch psychiatrist Frederik van Eeden in 1913: it describes the situation in which dreamers are aware that they are dreaming.

There are many techniques that, when correctly applied, allow dreamers to obtain a “Lucid Dream”. We report them for completeness: CAT (Cycle Adjustment Technique), MILD (Mnemonic Induction of Lucid Dreaming), WBTB (Wake Back To Bed), WILD (Wake Initiated Lucid Dreams), RCT (Reality Control Test) and ITES (Induction Through External Stimulus).

The corpus we built for the investigation is balanced by gender, and the number of authors analyzed is not randomly selected but represents the precise number of participants in the blog.

3.1 Dataset Statistics

In this section, we present the statistics obtained using the NLTK module, together with other statistical formulas, for the analysis of the corpus we built from the SogniLucidi blog. In Table 1 we report two important statistics about words: the number of tokens and the number of word types in texts written by men and women. We can notice that there is a big difference in the number of tokens used by Males (80629) and Females (57673).

                    Males    Females
Number of Tokens    80629    57673
Word Types          12254    11158

Table 1: Words’ statistics in the whole corpus in terms of Number of Tokens and Word Types.

In Tables 2 and 3 we present four lists: six exclusive nouns and six exclusive verbs used by men, and six exclusive nouns and six exclusive verbs used by women. The exclusive nouns and verbs listed are the most frequent ones in the Males and Females classes. Verbs are reported in their base form. The results indicate, without interpretative effort for a human reader, that for male dreamers the most relevant topics suggested by these high-frequency words are associated with activities and events that the dreamers want to happen, in adventurous settings and situations. Meanwhile, dreamers belonging to the Females class seem to set their dreams in a baleful scenario, where “transizione” (transition) and “trapasso” (passing away) suggest that they dream about a twilight state beyond death, or that they fantasize about surreal activities.

Males                        Females
destinazione (destination)   balzo (bound)
esplosione (explosion)       luce (light)
foresta (wood)               nuvola (cloud)
lenzuola (linens)            piscina (swimming pool)
spiaggia (beach)             transizione (transition)
terrazze (terraces)          trapasso (passing away)

Table 2: Most frequent Exclusive Nouns in the whole corpus.

Males                        Females
assomigliare (to resemble)   affrontare (to face)
baciare (to kiss)            cadere (to fall)
funzionare (to function)     ragionare (to reason)
ottenere (to obtain)         stringere (to tighten)
scomparire (to disappear)    succedere (to happen)
superare (to overcome)       volare (to fly)

Table 3: Most frequent Exclusive Verbs in the whole corpus.

Lastly, in Table 4 we report the average number of tokens per sentence.

Males Tokens AVG         Females Tokens AVG
18.74 tokens/sentence    10.01 tokens/sentence

Table 4: Average of tokens per sentence in texts written by men and women.

4 Methodology

The training corpus consists of dream text descriptions written by two groups of authors:

• 28 Male authors;

• 28 Female authors.

The corpus is balanced and labelled with gender. Gender annotation was done manually, based on the names of the users, their profile photos and their descriptions. For each author, a total of fifteen texts about dreams are provided. Authors are coded with an alpha-numeric author-ID.

For each author, the last fifteen texts about dreams were retrieved from the timeline of the personal web diary. As a result, the time frame of the dream reports might vary from days to months, depending on how frequently users report their dreams on the blog. To train our classification model, we used only the descriptions of dreams and not the comments (neither the comments of the authors nor the comments of other members of the SogniLucidi blog).

4.1 Preprocessing

For preprocessing we used the Python library BeautifulSoup along with some regex procedures. We performed the following preprocessing steps:

• Removing the HTML tags;

• Removing URLs;

• Removing @username mentions;

• Lower-casing the characters;

• Detecting stop-words by document frequency and removing them. Only n-grams that occurred in all documents were considered stop-words and ignored.

4.2 Features

Feature selection is a very critical step in any model. For feature selection we use the sklearn utility SelectKBest, which selects the k best features based on a given criterion. In our experiments, the features are selected with the f_classif criterion. This function performs an ANOVA F-test, a type of hypothesis test, on each feature individually and assigns that feature a score and a p-value. SelectKBest then ranks the features and keeps only the k best ones. The feature set for the dream dataset benefits from word trigrams in addition to other n-grams. In our final model, we use the following n-gram features: word unigrams, bigrams and trigrams.

Word-level n-grams used the following parameters:

• Minimum document frequency = 2. Terms with a document frequency lower than this threshold are ignored;

• Term frequency-inverse document frequency (tf-idf) weighting;

• Maximum document frequency = 1.0, that is, terms that occur in all documents are ignored.

4.2.1 Classification Model

We built a neural network to perform the gender detection task. We decided to run a feed-forward neural network with traditional sparse n-hot encoding, using the Keras open source library. After parameter selection, the model obtained the best performance with an Adam optimizer and a learning rate of 0.32, with a batch size of seventy and training for thirty epochs. The input layer has sixty-five neurons, with a normal kernel initialization. A ReLU activation function is then applied, followed by a dropout layer. During optimization, we found that a relatively large dropout rate of 0.5 outperformed smaller dropout rates. The output layer is a single neuron, followed by a linear activation function. The feature set provided to the model was an n-hot encoding of the uni-, bi- and trigrams.

5 Results

In this section we describe the results on the training data and the test data. The data we used was split into training and test data. The training set contains a known output, and the model learns on this data in order to generalize to other data later on. The test set (or subset) is used to test our model’s predictions. We calculated accuracy scores on a validation set (Dev set) with a split of 0.3 and on a Test set with a split of 0.2. The performances (both for the Dev set and the Test set) are shown in Table 5 in terms of Accuracy, Precision and F1 Score. We obtained roughly the same Accuracy on the Dev set and the Test set, 0.796 and 0.776, respectively.

             Dev set    Test set
Accuracy     0.796      0.776
Precision    0.937      0.917
F1 Score     0.803      0.786

Table 5: Performances in Dev set and Test set in terms of Accuracy, Precision and F1 Score.

Finally, in order to compare our approach, we considered two other baseline models, namely Multinomial Naive Bayes (MNB) and Linear Support Vector Machine (SVM), besides the feed-forward neural network, for performance comparisons on the Test set.

MNB      SVM
0.411    0.588

Table 6: Baseline Accuracy Comparisons.

To assess the performance of the model, the Root Mean Square Error (RMSE) was computed. RMSE measures the distance between the predicted value and the true value. It is a measure of error, so the lower the score, the better the performance. We show RMSE results in Table 7.

Dev set    Test set
0.233      0.224

Table 7: RMSE of the feed-forward model on the Dev set and when using Test set.

Using classification accuracy alone when evaluating the performance of a classification algorithm could be misleading, especially if the dataset, as in our case, is limited in size, or is unbalanced, or contains more than two classes. Hence, a confusion matrix is used to evaluate the results of the experiments. The confusion matrix M is an N × N matrix, where N is the number of classes, that summarizes the classification performance of a classifier with respect to the Test set and the Dev set (both, in our case). Each column of the matrix represents predicted classifications and each row represents actual classifications. As shown in Table 8, during the validation phase the classifier made a total of 108 predictions, while during the test phase it made a total of 107 predictions. Out of the 108 cases in validation, the classifier predicted “Females” forty-four times and “Males” sixty-four times. Actually, sixty of them belong to the “Females” class and forty-eight to the “Males” class.

           Males    Females
Males      45       3
Females    19       41

Table 8: Confusion Matrix on Dev set.

After this intermediate phase, and after tuning the parameters in order to optimize the model on the previous results, the classifier made a total of 107 predictions during the test phase. Out of these 107 predictions, the model predicted “Females” forty-three times and “Males” sixty-four times. In fact, fifty-nine of them belong to the “Females” class and, as in the validation phase, forty-eight to the “Males” class. We report gender prediction results on test data in the confusion matrix in Table 9.

           Males    Females
Males      44       4
Females    20       39

Table 9: Confusion Matrix on Test set.

6 Conclusions and Future Work

In this paper we have shown our results on gender detection in dream diaries and on differences and similarities in writing style between males and females in dream tales. First we explored the vocabulary of dream descriptions for both gender classes by listing some of the representative words for each gender. Then, we evaluated our gender detection model on the dream reports dataset. The model obtained good results, managing to correctly attribute a good part of the dreams to men or women. This research represents our preliminary step in the field, toward subsequent studies in which we are trying to detect stylistic differences between dream tales and personal descriptive narratives, such as travel tales and other forms of self-narration.

Acknowledgments

This project has been partially supported by the PON Ricerca e Innovazione 2014/20 and the POR Campania FSE 2014/2020 funds.

References

Altszyler, E., Sigman, M., Ribeiro, S., Slezak, D. F. (2016). Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database. arXiv preprint arXiv:1610.01520.

Altszyler, E., Ribeiro, S., Sigman, M., Slezak, D. F. (2017). The interpretation of dream meaning: Resolving ambiguity using Latent Semantic Analysis in a small corpus of text. Consciousness and Cognition, 56, 178-187.

Bulkeley, K. (2009). Seeking patterns in dream content: A systematic approach to word searches. Consciousness and Cognition, 18(4), 905-916.

Bulkeley, K. (2014). Digital dream analysis: A revised method. Consciousness and Cognition, 29, 159-170.

Coelho, H. (2010). Classification of dreams using machine learning. In ECAI: 19th European Conference on Artificial Intelligence: Including Prestigious Applications of Artificial Intelligence (PAIS-2010): Proceedings (Vol. 215, p. 169).

Domhoff, G. W. (2003). The scientific study of dreams: Neural networks, cognitive development, and content analysis. American Psychological Association.

Domhoff, G. W., Schneider, A. (2008). Similarities and differences in dream content at the cross-cultural, gender, and individual levels. Consciousness and Cognition, 17(4), 1257-1265.

Frantova, E., Bergler, S. (2009). Automatic emotion annotation of dream diaries. In Proceedings of the Analyzing Social Media to Represent Collective Knowledge Workshop at K-CAP 2009, The Fifth International Conference on Knowledge Capture.

Hawkins II, R. C., Boyd, R. L. (2017). Such stuff as dreams are made on: Dream language, LIWC norms, and personality correlates. Dreaming, 27(2), 102.

Hendrickx, I., Onrust, L., Kunneman, F., Hürriyetoğlu, A., Bosch, A. V. D., Stoop, W. (2016). Unraveling reported dreams with text analytics. arXiv preprint arXiv:1612.03659.

Koppel, M., Argamon, S., Shimoni, A. R. (2002). Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4), 401-412.

Mathes, J., Schredl, M. (2013). Gender differences in dream content: Are they related to personality? International Journal of Dream Research.

Mechti, S., Jaoua, M., Belguith, L. H., Faiz, R. (2013). Author profiling using style-based features. Notebook Papers of CLEF2.

McNamara, P., Duffy-Deno, K., Marsh, T. (2019). Dream content analysis using Artificial Intelligence. International Journal of Dream Research, 42-52.

Mukherjee, A., Liu, B. (2010, October). Improving gender classification of blog authors. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (pp. 207-217). Association for Computational Linguistics.

Niederhoffer, K., Schler, J., Crutchley, P., Loveys, K., Coppersmith, G. (2017, August). In your wildest dreams: the language and psychological features of dreams. In Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (pp. 13-25).

Nielsen, T. A., Stenstrom, P. (2005). What are the memory sources of dreaming? Nature, 437(7063), 1286.

Rangel, F., Rosso, P. (2013). Use of language and author profiling: Identification of gender and age. Natural Language Processing and Cognitive Science, 177.

Razavi, A. H., Matwin, S., De Koninck, J., Amini, R. R. (2014). Dream sentiment analysis using second order soft co-occurrences (SOSCO) and time course representations. Journal of Intelligent Information Systems, 42(3), 393-413.

Scarone, S., Manzone, M. L., Gambini, O., Kantzas, I., Limosani, I., D’Agostino, A., Hobson, J. A. (2008). The dream as a model for psychosis: an experimental approach using bizarreness as a cognitive marker. Schizophrenia Bulletin, 34(3), 515-522.

Scarpelli, S., Bartolacci, C., D’Atri, A., Gorgoni, M., De Gennaro, L. (2019). The functional role of dreaming in emotional processes. Frontiers in Psychology, 10.

Schredl, M., Sahin, V., Schäfer, G. (1998). Gender differences in dreams: do they reflect gender differences in waking life? Personality and Individual Differences, 25(3), 433-442.

Schredl, M., Ciric, P., Götz, S., Wittmann, L. (2004). Typical dreams: stability and gender differences. The Journal of Psychology, 138(6), 485-494.

Schredl, M., Piel, E. (2005). Gender differences in dreaming: Are they stable over time? Personality and Individual Differences, 39(2), 309-316.

Schredl, M., Becker, K., Feldmann, E. (2010). Predicting the dreamer’s gender from a single dream report: A matching study in a non-student sample. International Journal of Dream Research.

Schredl, M., Noveski, A. (2018). Lucid Dreaming: A Diary Study. Imagination, Cognition and Personality, 38(1), 517. https://doi.org/10.1177/0276236617742622

Siclari, F., et al. (2017). The neural correlates of dreaming. Nature Neuroscience, 20(6), 872.

Silberman, Y., Bentin, S., Miikkulainen, R. (2007). Semantic Boost on Episodic Associations: An Empirically-Based Computational Model. Cognitive Science, 31(4), 645-671.

Skancke, J. F., Holsen, I., Schredl, M. (2014). Continuity between waking life and dreams of psychiatric patients: a review and discussion of the implications for dream research. International Journal of Dream Research.

Thorup, A., Petersen, L., Jeppesen, P., Ohlenschlæger, J., Christensen, T., Krarup, G., Jorgensen, P., Nordentoft, M. (2007). Gender differences in young adults with first-episode schizophrenia spectrum disorders at baseline in the Danish OPUS study. The Journal of Nervous and Mental Disease, 195(5), 396-405.

Van Eeden, F. (1913, July). A study of dreams. In Proceedings of the Society for Psychical Research, Vol. 26, No. Part 47, pp. 431-461.