Gender Detection and Stylistic Differences and Similarities between Males
                  and Females in a Dream Tales Blog
Raffaele Manna, Antonio Pascucci, Johanna Monti
UNIOR NLP Research Group, University L'Orientale, Naples, Italy
rmanna@unior.it, apascucci@unior.it, jmonti@unior.it


Abstract

In this paper we present the results of a gender detection experiment carried out on a corpus we built by downloading dream tales from a blog. We also highlight stylistic differences and similarities concerning lexical choices between men and women. In order to carry out the experiment we built a feed-forward neural network with traditional sparse n-hot encoding using the Keras open source library.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

It is generally accepted that dreams are just an unconscious production and that they represent a type of non-manipulable happening. However, many people believe that dreams are premonitory of future events as well as representations and reworkings of past events. Humans tend to preserve all personal events, some of them in the form of a diary, namely the best method to tell an event and keep its aura of magic.

Until recently, dream reports were relegated to the pages of paper journals or revealed to familiar people. In the past, dreams were gathered from sleep research labs, psychotherapeutic and inpatient settings, personal dream journals and occasionally classroom settings where "most recent dreams" and "most vivid dreams" were collected, as in (Domhoff, 2003).

Social media have opened millions of pages where people feel at ease to confess their thoughts, their experiences and even their secret fantasies. Platforms such as Twitter, Facebook and web blogs are good ground for computational text analysis research in social science and for mental health assessment via language.

Diary narratives represent a field already investigated by researchers. The recent development of web communities focused on telling dreams allows researchers to access and discover new characteristics related to the language of dreams. Stylistic and linguistic features of dreams in blog reports are essential in order to detect writing style and content differences between men and women. They also enable future research on the different types of personality and styles associated with mental health diagnoses and therapeutic outcomes.

The aim of this paper is to show that, although dreams are just an unconscious production, there are several stylistic differences between the reports of dreams by males and females on online blogs. The model we built is able to represent and classify these stylistic differences.

Moreover, this research represents a preliminary step in the field of dream tales. It will be followed by an attempt to find stylistic differences between dream tales and other forms of self-narration (e.g. travel tales).

The paper is organized as follows. In Section 2 we introduce Related Work, and in Section 3 we describe the blog and the corpus we built. Methodology is described in Section 4 and Results are in Section 5. In Section 6 we present our Conclusions and introduce Future Work.

2 Related Work

Textual analysis of dream reports is still not a completely investigated field in NLP. One of the purposes of computational dream report analysis lies in understanding how and why a dream narrative differs from a waking narrative (Hendrickx et al., 2016). For example, if a dream description contains more function words than a waking narrative, what is the relationship between the content of dreams and the use of more function words?

Earlier studies were conducted by Domhoff (2003) and Bulkeley (2009). In their research, dream reports are analyzed and a systematic category list of words is provided. This list can be used for queries and word-frequency counts in DreamBank.net. The categories are related to the content of dreams and are used to retrieve mentions of emotions, characters, perception, movement and socio-cultural background.

On the basis of this approach, Bulkeley (2014) updates the category list and evaluates it on four datasets of the DreamBank corpus. It has been shown that this type of word analysis can be applied to detect the topics of dreams. In addition, this latter contribution provides evidence that it is possible to infer a person's life and activities, personal concerns and interests from an individual dream collection.

Other works focus on identifying the emotions in dream reports. In particular, Razavi et al. (2014) use a machine learning method to assign emotion labels to dreams on a four-level negative/positive sentiment scale. In their research, dreams are represented as word vectors, and dynamic features are included to represent sentiment changes in dream descriptions.

In a more detailed sentiment analysis, Frantova and Bergler (2009) train a classifier based on semi-automatically compiled emotion word dictionaries. The classifier assigns five fuzzy emotion categories to dream reports. They then compare their results against a sample from the DreamBank that is manually labeled with emotion annotations.

Some non-computational studies aimed at highlighting gender differences (Schredl and Piel, 2005; Schredl et al., 2010) use dream reports to spot gender differences in dream recall. The first study demonstrates that gender differences in dream recall and dream content are stable. Human judges are able to correctly match the dreamer's gender from a single dream report with a probability better than chance. Based on these findings, the latter study analyzes the stability of gender differences in dream content over time. Two dream themes (work-related dreams and dreams of deceased persons) were investigated, and gender differences turned out to be quite stable over time. In Mathes and Schredl (2013), gender differences are associated with personality traits. The analysis indicates that some of the Big Five personality dimensions might be linked with some dream characteristics, such as characters and the occurrence of weapons or clothes in dreams.

In psychiatric studies, gender is identified as a predictive variable for psychotic behaviors and disorders. Thorup et al. (2007) showed that, in psychotic patients, gender plays a role in showing different psychopathological characteristics and different social functioning, although no dream samples were taken as a subject in that study.

Dream diaries refine research in uncovering connections between dreams and the dreamer's socio-cultural background, mental conditions and neurophysiological factors. The language of online dreams in relation to mental health conditions has yet to be analyzed. However, prior laboratory research suggests that dream content may differ according to clinical conditions.

In Skancke et al. (2014), emotional tone, themes and actor focus in dream reports were associated with anxiety disorders, schizophrenia, personality disorders and eating disorders. However, it is not clear whether dream content can be predictive with respect to mental disorders.

In Scarone et al. (2008), the hypothesis of the dreaming brain as a neurobiological model for psychosis is tested by focusing on cognitive bizarreness. Cognitive bizarreness is a distinctive property of the dreaming mental state, defined by discontinuities and incongruities in the dream report, thoughts and feelings. It is measured in written reports of dreams and in verbal reports of waking fantasies in thirty schizophrenics and thirty normal controls. The differences between these two groups indicate that, under experimental conditions, the waking cognition of schizophrenic subjects shares a common degree of formal cognitive bizarreness with the dream reports of both normal controls and schizophrenics. These results support the hypothesis that the dreaming brain could be a useful experimental model for psychosis.

Taking advantage of all the above considerations, and combining the psychiatric and neurobiological information of the studies shown, the present research aims first of all to reveal the differences between genders in dreams. As a future goal, we aim to clarify the relationship between gender and psychosis. This goal starts from the hypothesis of cognitive similarity between dreams and psychoses and uses dreams as an experimental path.
3 Dataset Description

The web is full of blogs where people share opinions, questions, personal feelings and thoughts about their own lives. Furthermore, people also share their dreams, one of the most personal and hidden aspects of life.

It is very easy to find a blog in which thousands of people share their "dream experiences". They sometimes discover that other people have had similar experiences dictated by similar lifestyles.

We investigated a blog called SogniLucidi, on which thousands of people tell their dreams and nightmares every day, mixing their nightly fantasies with their unconscious writing style choices. SogniLucidi, which literally translates as "Lucid Dreams", took its name from a term coined by the Dutch psychiatrist Frederik van Eeden in 1913. The term describes the situation in which dreamers are aware that they are dreaming.

There are many techniques that, when correctly applied, allow dreamers to obtain a "Lucid Dream". We report them for completeness:
  • CAT (Cycle Adjustment Technique);
  • MILD (Mnemonic Induction of Lucid Dreaming);
  • WBTB (Wake Back To Bed);
  • WILD (Wake Initiated Lucid Dreams);
  • RCT (Reality Control Test);
  • ITES (Induction Through External Stimulus).

The corpus we built for the investigation is balanced by gender. The number of authors analyzed is not randomly selected but represents the precise number of participants in the blog.

3.1 Dataset Statistics

In this section, we present the statistics obtained for the corpus we built from the SogniLucidi blog, using the NLTK module together with other statistical formulas. In Table 1 we report two important statistics about words: the number of tokens and the number of word types in texts written by men and by women. There is a big difference between the number of tokens used by Males (80629) and by Females (57673).

                      Males    Females
Number of Tokens      80629    57673
Word Types            12254    11158

Table 1: Word statistics in the whole corpus in terms of Number of Tokens and Word Types.
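Counts like those in Table 1 can be sketched in a few lines of Python. This is not the authors' code. The paper used NLTK, while this sketch substitutes a simple regular-expression tokenizer, so its counts will not match NLTK exactly. Like NLTK's word_tokenize, it counts punctuation marks as tokens. The helper name token_stats is hypothetical.

```python
import re

# Assumption: a simplified tokenizer standing in for NLTK's word_tokenize.
# Words (optionally with one apostrophe, e.g. "l'acqua") or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def token_stats(texts):
    """Return (number of tokens, number of word types) over a list of texts."""
    tokens = [t for text in texts for t in TOKEN_RE.findall(text.lower())]
    return len(tokens), len(set(tokens))


# Usage on two hypothetical dream reports from the same gender class.
n_tokens, n_types = token_stats(["Ho sognato il mare.", "Il mare era calmo!"])
print(n_tokens, n_types)
```

Running token_stats separately on the male and female texts gives one column of Table 1 each. Lower-casing before counting types follows the preprocessing described in Section 4.1.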
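In Tables 2 and 3 we present four lists: six exclusive nouns and six exclusive verbs used by men or by women. These exclusive nouns and verbs are the most frequent ones for the Males and Females classes. Verbs are reported in their base form.

For male dreamers, the results suggest with little interpretative effort that the most relevant topics conveyed by these high-frequency words are activities and events the dreamers want to happen, set in adventurous situations. Meanwhile, dreamers belonging to the Females class seem to set their dreams in a baleful scenario. There, "transizione" (transition) and "trapasso" (passing away) suggest that they dream about a twilight state or what lies beyond death, or that they fantasize about surreal activities.

Males                          Females
destinazione (destination)     balzo (bound)
esplosione (explosion)         luce (light)
foresta (forest)               nuvola (cloud)
lenzuola (bed sheets)          piscina (swimming pool)
spiaggia (beach)               transizione (transition)
terrazze (terraces)            trapasso (passing away)

Table 2: Most frequent exclusive nouns in the whole corpus.

Males                          Females
assomigliare (to resemble)     affrontare (to face)
baciare (to kiss)              cadere (to fall)
funzionare (to function)       ragionare (to reason)
ottenere (to obtain)           stringere (to tighten)
scomparire (to disappear)      succedere (to happen)
superare (to overcome)         volare (to fly)

Table 3: Most frequent exclusive verbs in the whole corpus.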
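One way to obtain lists like those in Tables 2 and 3 is to keep the words that one class uses and the other never does, then rank them by frequency. The paper does not describe its exact procedure, so this is a sketch under assumptions. The inputs are assumed to be already lemmatized and restricted to nouns (or verbs) by a POS tagger, a step omitted here. The helper exclusive_top is hypothetical.

```python
from collections import Counter


def exclusive_top(tokens_a, tokens_b, n=6):
    """Return the n most frequent items of tokens_a that never occur in tokens_b."""
    counts_a = Counter(tokens_a)
    vocab_b = set(tokens_b)
    exclusive = Counter({w: c for w, c in counts_a.items() if w not in vocab_b})
    # most_common keeps first-seen order among items with equal counts.
    return [w for w, _ in exclusive.most_common(n)]


# Hypothetical lemmatized nouns from each class.
male_nouns = ["spiaggia", "spiaggia", "foresta", "luce", "terrazze"]
female_nouns = ["luce", "nuvola", "nuvola", "piscina"]
print(exclusive_top(male_nouns, female_nouns, n=2))
print(exclusive_top(female_nouns, male_nouns, n=2))
```

Because exclusivity is computed against the whole vocabulary of the other class, a single occurrence on the other side removes a word from the list. This makes such lists sensitive to corpus size.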
Lastly, in Table 4 we report the average number of tokens per sentence.

Males                    Females
18.74 tokens/sentence    10.01 tokens/sentence

Table 4: Average number of tokens per sentence in texts written by men and women.
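Each average in Table 4 is the total number of tokens divided by the number of sentences. Below is a minimal sketch under stated assumptions: a naive splitter that breaks sentences after ".", "!" or "?" followed by whitespace, and a regular-expression tokenizer that counts punctuation as tokens. Both stand in for NLTK, so the values will differ from NLTK's. The helper avg_tokens_per_sentence is hypothetical.

```python
import re

# Assumptions: naive sentence splitting and a regex tokenizer instead of NLTK.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def avg_tokens_per_sentence(texts):
    """Average number of tokens per sentence over a list of texts."""
    sentences = [s for text in texts
                 for s in SENTENCE_END_RE.split(text.strip()) if s]
    if not sentences:
        return 0.0
    return sum(len(TOKEN_RE.findall(s)) for s in sentences) / len(sentences)


print(avg_tokens_per_sentence(["Ho sognato il mare. Era calmo!"]))
```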

4 Methodology

The training corpus consists of dream text descriptions written by two groups of authors:
  • 28 Male authors;
  • 28 Female authors.

The corpus is balanced and labelled with gender. Gender annotation has been done manually, based on the users' names, profile photos and descriptions. For each author, a total of fifteen texts about dreams is provided, and authors are coded with an alphanumeric author-ID. These are the last fifteen dream texts retrieved from the timeline of the author's personal web diary. As a result, the time frame of the dream reports might vary from days to months, depending on how frequently users report their dreams on the blog. To train our classification model, we exploited only the descriptions of dreams. We did not use the comments, neither those by the authors nor those by other members of the SogniLucidi blog.

4.1 Preprocessing

For preprocessing we used the Python library BeautifulSoup along with some regex procedures. We performed the following preprocessing steps:
  • Removing the HTML tags;
  • Removing URLs;
  • Removing @username mentions;
  • Lower-casing the characters;
  • Detecting stop-words by document frequency and removing them. Only n-grams that occurred in all documents have been considered stop-words and ignored.
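The preprocessing steps listed above can be sketched as follows. This is not the authors' code. To stay dependency-free, it swaps BeautifulSoup for the standard-library html.parser. The regular expressions for URLs and mentions are our own assumptions. The stop-word helper works on unigrams only, whereas the paper applies the document-frequency rule to n-grams. The names clean_post and all_document_terms are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops tags (stand-in for BeautifulSoup's get_text)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


URL_RE = re.compile(r"(?:https?://|www\.)\S+")  # assumed URL pattern
MENTION_RE = re.compile(r"@\w+")                # assumed @username pattern


def clean_post(raw_html):
    """Strip HTML tags, URLs and @mentions, lower-case, collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.parts)
    text = URL_RE.sub(" ", text)      # URLs first: they may contain "@"
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.lower().split())


def all_document_terms(docs):
    """Unigrams present in every document, treated as corpus-specific stop-words."""
    vocabularies = [set(d.split()) for d in docs]
    return set.intersection(*vocabularies) if vocabularies else set()


print(clean_post('<p>Ho sognato <b>@Mario</b> su http://example.com/x ieri</p>'))
```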
4.2 Features

Feature selection is a critical step in building any model. For feature selection we use the sklearn utility SelectKBest, which selects the k best features according to a given scoring criterion. In our experiments, the features are selected with the f_classif criterion. This function performs an ANOVA F-test, a type of hypothesis test, on each feature on its own and assigns that feature a p-value. SelectKBest ranks the features by their F-score (equivalently, by p-value) and keeps only the k best features. The feature set for the dream dataset benefits from word trigrams in addition to other n-grams. In our final model, we use the following n-gram features: word unigrams, bigrams and trigrams.

Word-level n-grams used the following parameters:
  • Minimum document frequency = 2: terms with a document frequency lower than 2 are ignored;
  • Term frequency-inverse document frequency (tf-idf) weighting;
  • Maximum document frequency = 1.0, i.e. terms that occur in all documents are ignored.
4.2.1 Classification Model

We built a neural network to perform the gender detection task: a feed-forward neural network with traditional sparse one-hot encoding, implemented with the Keras open source library. After parameter selection, the model obtained the best performance with an Adam optimizer, a learning rate of 0.32, a batch size of seventy and thirty training epochs. The first layer has sixty-five neurons, initialized with a normal kernel initializer. A ReLU activation function is then applied, followed by a dropout layer. During optimization, we found that a relatively large dropout rate of 0.5 outperformed smaller dropout rates. The output layer is a single neuron, followed by a linear activation function. The feature set provided to the model was an n-hot encoding of the uni-, bi- and trigrams.
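The paper builds the model in Keras. To make the layer shapes explicit without a deep learning dependency, the numpy sketch below mirrors only the forward pass of the architecture described: a 65-unit dense layer with ReLU, dropout of 0.5 applied during training, and a single linear output unit. Several values are assumptions: the vocabulary size, the weight scale (0.05, the default standard deviation of Keras' RandomNormal initializer) and the random input batch. Training with Adam is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 1000  # hypothetical size of the n-hot n-gram vocabulary
HIDDEN = 65        # sixty-five units in the first dense layer
DROPOUT = 0.5      # dropout rate reported in the paper

# Normally distributed kernels (assumed stddev 0.05), zero biases.
W1 = rng.normal(0.0, 0.05, size=(N_FEATURES, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.05, size=(HIDDEN, 1))
b2 = np.zeros(1)


def forward(x, training=False):
    """x: (batch, N_FEATURES) n-hot matrix -> (batch,) linear output scores."""
    h = np.maximum(0.0, x @ W1 + b1)  # dense layer + ReLU
    if training:
        # Inverted dropout: zero about half the units, rescale the survivors.
        mask = rng.random(h.shape) >= DROPOUT
        h = h * mask / (1.0 - DROPOUT)
    return (h @ W2 + b2).ravel()      # single neuron, linear activation


# Hypothetical batch of seventy sparse n-hot vectors (the paper's batch size).
batch = (rng.random((70, N_FEATURES)) < 0.01).astype(float)
scores = forward(batch)
print(scores.shape)
```

At inference time dropout is disabled, which is why forward defaults to training=False and is deterministic for a given set of weights.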
5 Results

In this section we describe the results on the training data and on the test data. The data was split into training and test data. The training set contains the known outputs, and the model learns on this data in order to generalize to other data later on. The test set (or subset) is used to test our model's predictions. We calculated accuracy scores on a validation set (Dev set) of 0.3 and on a Test set of 0.2 of the data. The performances on both the Dev set and the Test set are shown in Table 5 in terms of Accuracy, Precision and F1 Score. Accuracy is roughly the same on the Dev set and the Test set, 0.796 and 0.776 respectively.

              Dev set    Test set
Accuracy      0.796      0.776
Precision     0.937      0.917
F1 Score      0.803      0.786

Table 5: Performances on the Dev set and Test set in terms of Accuracy, Precision and F1 Score.
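As a worked check, the Test-set metrics can be recomputed from the confusion matrix in Table 9. This assumes, as the text states, that rows are actual classes and columns predicted classes. Taking Males as the positive class is our assumption, since the paper does not name one. Under these choices, accuracy (83/107) and F1 (88/112) agree with Table 5 to within rounding. The value reported as Precision (0.917) coincides with the recall of the Males class (44/48), while Males precision is 44/64. The helper binary_metrics is hypothetical.

```python
def binary_metrics(cm, positive=0):
    """Metrics from a square confusion matrix cm[actual][predicted]."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    tp = cm[positive][positive]
    predicted_pos = sum(cm[i][positive] for i in range(n))  # column sum
    actual_pos = sum(cm[positive])                           # row sum
    precision = tp / predicted_pos
    recall = tp / actual_pos
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": correct / total, "precision": precision,
            "recall": recall, "f1": f1}


# Table 9 (Test set), class order [Males, Females]; rows actual, columns predicted.
test_cm = [[44, 4],
           [20, 39]]
m = binary_metrics(test_cm, positive=0)
print({name: round(value, 3) for name, value in m.items()})
```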
Finally, to put our approach in context, we compared the feed-forward neural network on the Test set with two baseline models: Multinomial Naive Bayes (MNB) and a Linear Support Vector Machine (SVM).

              MNB      SVM
Accuracy      0.411    0.588

Table 6: Baseline Accuracy Comparisons.
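The two baselines in Table 6 can be set up with scikit-learn's MultinomialNB and LinearSVC. The sketch below is a hypothetical toy setup: the documents, labels and split are invented for illustration. The vectorizer keeps its default settings apart from the uni- to trigram range, so the sketch will not reproduce the reported accuracies.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training and test data.
train_docs = [
    "ho sognato una spiaggia e una terrazza",
    "una esplosione nella foresta di notte",
    "ho raggiunto la destinazione sulla spiaggia",
    "una luce tra le nuvole sopra la piscina",
    "ho sognato di volare verso la luce",
    "cadere in una piscina piena di luce",
]
train_labels = ["M", "M", "M", "F", "F", "F"]
test_docs = ["una terrazza sulla spiaggia", "volare tra le nuvole"]
test_labels = ["M", "F"]

results = {}
for name, clf in [("MNB", MultinomialNB()), ("SVM", LinearSVC())]:
    # Same tf-idf n-gram features feed each baseline classifier.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), clf)
    model.fit(train_docs, train_labels)
    results[name] = accuracy_score(test_labels, model.predict(test_docs))

print(results)
```

Wrapping the vectorizer and the classifier in one pipeline ensures that the vocabulary is learned from the training documents only.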
To further assess the performance of the model, we computed the Root Mean Square Error (RMSE). RMSE measures the distance between the predicted values and the true values. Since it is a measure of error, the lower the score, the better the performance. We show RMSE results in Table 7.

              Dev set    Test set
RMSE          0.233      0.224

Table 7: RMSE of the feed-forward model on the Dev set and on the Test set.
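For n predictions ŷ_i with true values y_i, RMSE = sqrt((1/n) · Σ_i (ŷ_i - y_i)²). The paper does not specify which values entered the computation. The sketch below assumes continuous scores from the single linear output neuron, compared against labels encoded as 0 and 1. Both the encoding and the scores are hypothetical. Note that with hard 0/1 predictions, RMSE reduces to the square root of the error rate.

```python
import math


def rmse(predicted, actual):
    """Root Mean Square Error between two equal-length numeric sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical output scores vs. 0/1-encoded gender labels.
scores = [0.9, 0.2, 0.6, 0.1]
labels = [1, 0, 1, 0]
print(round(rmse(scores, labels), 3))
```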
Using classification accuracy alone to evaluate a classification algorithm can be misleading, especially if the dataset is limited in size (as in our case), unbalanced, or contains more than two classes. Hence, we also use a confusion matrix to evaluate the results of the experiments. The confusion matrix M is an N × N matrix, where N is the number of classes, that summarizes the performance of a classifier on a given set; in our case we compute it on both the Dev set and the Test set. Each column of the matrix represents predicted classifications and each row represents actual classifications.

As shown in Table 8, the classifier made a total of two hundred and sixteen predictions during the validation phase, and two hundred and fourteen during the test phase. Out of the two hundred and sixteen validation cases, the classifier predicted "Females" forty-four times and "Males" sixty-four times. Actually, sixty people in the sample belong to the "Females" class and forty-eight to the "Males" class.

Actual \ Predicted    Males    Females
Males                  45         3
Females                19        41

Table 8: Confusion Matrix on Dev set.
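A confusion matrix following the convention stated above (rows are actual classes, columns are predicted classes) can be built in a few lines. Its row sums give the number of items per actual class, and its column sums give the number of predictions per class. The labels below are hypothetical, and confusion_matrix here is our own helper, not the scikit-learn function of the same name.

```python
def confusion_matrix(actual, predicted, classes):
    """Return M where M[i][j] counts items of actual class i predicted as class j."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    index = {c: k for k, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m


actual = ["Males", "Males", "Females", "Females", "Females"]
predicted = ["Males", "Females", "Males", "Females", "Females"]
cm = confusion_matrix(actual, predicted, ["Males", "Females"])
actual_per_class = [sum(row) for row in cm]           # row sums
predicted_per_class = [sum(col) for col in zip(*cm)]  # column sums
print(cm, actual_per_class, predicted_per_class)
```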
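After this intermediate phase, we tuned the parameters to optimize the model on the previous results. The classifier then made a total of two hundred and fourteen predictions during the test phase. Out of these two hundred and fourteen predictions, the model predicted "Females" forty-three times and "Males" sixty-four times. In fact, fifty-nine people belong to the "Females" class and, as in the validation phase, forty-eight to the "Males" class. We report gender prediction results on the test data in the confusion matrix in Table 9.

Actual \ Predicted    Males    Females
Males                  44         4
Females                20        39

Table 9: Confusion Matrix on Test set.

6 Conclusions and Future Work

In this paper we have presented our results on gender detection in dream diaries, and on writing style differences and similarities between males and females in dream tales. First, we explored the vocabulary of dream descriptions for both gender classes by listing some representative words for each gender. Then, we evaluated our gender detection model on the dream reports dataset. The model obtained good results (an accuracy of 0.776 on the Test set), distinguishing a good part of the dreams written by men and by women. This research represents our preliminary step in the field. In subsequent studies, we are trying to detect stylistic differences between dream tales and personal descriptive narratives, such as travel tales and other forms of self-narration.

Acknowledgments

This project has been partially supported by the PON Ricerca e Innovazione 2014/20 and the POR Campania FSE 2014/2020 funds.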
References

Altszyler, E., Sigman, M., Ribeiro, S., Slezak, D. F. (2016). Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database. arXiv preprint arXiv:1610.01520.

Altszyler, E., Ribeiro, S., Sigman, M., Slezak, D. F. (2017). The interpretation of dream meaning: Resolving ambiguity using Latent Semantic Analysis in a small corpus of text. Consciousness and Cognition, 56, 178-187.

Bulkeley, K. (2009). Seeking patterns in dream content: A systematic approach to word searches. Consciousness and Cognition, 18(4), 905-916.

Bulkeley, K. (2014). Digital dream analysis: A revised method. Consciousness and Cognition, 29, 159-170.

Coelho, H. (2010). Classification of dreams using machine learning. In ECAI: 19th European Conference on Artificial Intelligence: Including Prestigious Applications of Artificial Intelligence (PAIS-2010): Proceedings (Vol. 215, p. 169).

Domhoff, G. W. (2003). The scientific study of dreams: Neural networks, cognitive development, and content analysis. American Psychological Association.

Domhoff, G. W., Schneider, A. (2008). Similarities and differences in dream content at the cross-cultural, gender, and individual levels. Consciousness and Cognition, 17(4), 1257-1265.

Frantova, E., Bergler, S. (2009). Automatic emotion annotation of dream diaries. In Proceedings of the Analyzing Social Media to Represent Collective Knowledge Workshop at K-CAP 2009, The Fifth International Conference on Knowledge Capture.

Hawkins II, R. C., Boyd, R. L. (2017). Such stuff as dreams are made on: Dream language, LIWC norms, and personality correlates. Dreaming, 27(2), 102.

Hendrickx, I., Onrust, L., Kunneman, F., Hürriyetoğlu, A., Bosch, A. V. D., Stoop, W. (2016). Unraveling reported dreams with text analytics. arXiv preprint arXiv:1612.03659.

Koppel, M., Argamon, S., Shimoni, A. R. (2002). Automatically categorizing written texts by author gender. Literary and Linguistic Computing, 17(4), 401-412.

Mathes, J., Schredl, M. (2013). Gender differences in dream content: Are they related to personality? International Journal of Dream Research.

McNamara, P., Duffy-Deno, K., Marsh, T. (2019). Dream content analysis using Artificial Intelligence. International Journal of Dream Research, 42-52.

Mechti, S., Jaoua, M., Belguith, L. H., Faiz, R. (2013). Author profiling using style-based features. Notebook Papers of CLEF2.

Mukherjee, A., Liu, B. (2010, October). Improving gender classification of blog authors. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (pp. 207-217). Association for Computational Linguistics.

Niederhoffer, K., Schler, J., Crutchley, P., Loveys, K., Coppersmith, G. (2017, August). In your wildest dreams: the language and psychological features of dreams. In Proceedings of the Fourth Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality (pp. 13-25).

Nielsen, T. A., Stenstrom, P. (2005). What are the memory sources of dreaming? Nature, 437(7063), 1286.

Rangel, F., Rosso, P. (2013). Use of language and author profiling: Identification of gender and age. Natural Language Processing and Cognitive Science, 177.

Razavi, A. H., Matwin, S., De Koninck, J., Amini, R. R. (2014). Dream sentiment analysis using second order soft co-occurrences (SOSCO) and time course representations. Journal of Intelligent Information Systems, 42(3), 393-413.

Scarone, S., Manzone, M. L., Gambini, O., Kantzas, I., Limosani, I., D'Agostino, A., Hobson, J. A. (2008). The dream as a model for psychosis: an experimental approach using bizarreness as a cognitive marker. Schizophrenia Bulletin, 34(3), 515-522.

Scarpelli, S., Bartolacci, C., D'Atri, A., Gorgoni, M., De Gennaro, L. (2019). The functional role of dreaming in emotional processes. Frontiers in Psychology, 10.

Schredl, M., Sahin, V., Schäfer, G. (1998). Gender differences in dreams: do they reflect gender differences in waking life? Personality and Individual Differences, 25(3), 433-442.

Schredl, M., Ciric, P., Götz, S., Wittmann, L. (2004). Typical dreams: stability and gender differences. The Journal of Psychology, 138(6), 485-494.

Schredl, M., Piel, E. (2005). Gender differences in dreaming: Are they stable over time? Personality and Individual Differences, 39(2), 309-316.

Schredl, M., Becker, K., Feldmann, E. (2010). Predicting the dreamer's gender from a single dream report: A matching study in a non-student sample. International Journal of Dream Research.

Schredl, M., Noveski, A. (2018). Lucid Dreaming: A Diary Study. Imagination, Cognition and Personality, 38(1), 5-17. https://doi.org/10.1177/0276236617742622

Siclari, F., et al. (2017). The neural correlates of dreaming. Nature Neuroscience, 20(6), 872.

Silberman, Y., Bentin, S., Miikkulainen, R. (2007). Semantic Boost on Episodic Associations: An Empirically-Based Computational Model. Cognitive Science, 31(4), 645-671.

Skancke, J. F., Holsen, I., Schredl, M. (2014). Continuity between waking life and dreams of psychiatric patients: a review and discussion of the implications for dream research. International Journal of Dream Research.

Thorup, A., Petersen, L., Jeppesen, P., Ohlenschlæger, J., Christensen, T., Krarup, G., Jorgensen, P., Nordentoft, M. (2007). Gender differences in young adults with first-episode schizophrenia spectrum disorders at baseline in the Danish OPUS study. The Journal of Nervous and Mental Disease, 195(5), 396-405.

Van Eeden, F. (1913, July). A study of dreams. In Proceedings of the Society for Psychical Research, Vol. 26, Part 47, pp. 431-461.