=Paper=
{{Paper
|id=Vol-3038/paper29
|storemode=property
|title=Modules for Mental and Physical Health States Analysis based on User Text Input
|pdfUrl=https://ceur-ws.org/Vol-3038/short11.pdf
|volume=Vol-3038
|authors=Artem Bashtovyi,Andriy Fechan,Vitaliy Yakovyna
|dblpUrl=https://dblp.org/rec/conf/iddm/BashtovyiFY21
}}
==Modules for Mental and Physical Health States Analysis based on User Text Input==
Artem Bashtovyi (a), Andriy Fechan (a), Vitaliy Yakovyna (a,b)

(a) Lviv Polytechnic National University, 12 Bandera str., Lviv, 79013, Ukraine
(b) University of Warmia and Mazury in Olsztyn, 2 Michała Oczapowskiego str., Olsztyn, 10-719, Poland

Abstract

Given the pandemic situation around the world, mental and physical wellbeing have become a crucial part of our lives, making health support one of the most discussed topics nowadays. To maintain mental stability, we have to recognize when actual help is needed, yet it is sometimes extremely hard to analyze our own mental and physical health and to know when to ask for help. In this work, we built a module for mental illness classification that identifies mental states based on human text input. The module is part of a platform for physical and mental state identification based on user journaling. The platform will allow daily journaling and will automatically identify the user's mental and physical states based on the texts.

Keywords: natural language processing, classification, multiclass analysis, mental health, journaling, BERT, physical state

1. Introduction

Human health is one of the most important factors for longevity and a happy life. Nowadays many people neglect the importance of mental health, since mental issues are not always accompanied by physical symptoms. People usually reach out to doctors for additional help and treatment; however, even when mental issues are related to the physical state, it is not always straightforward to identify them. Furthermore, supporting mental health is crucial during the pandemic, when lockdowns affect our social life. The COVID-19 pandemic has had a negative impact on people's mental well-being, which even gave rise to the term "covid depression".
With appropriate treatment and care, many individuals are able to quickly identify their issues and work on them, as long as they collaborate with specialists. Some people, however, resist seeing a specialist out of fear, embarrassment, or a lack of support and resources. At the same time, information technologies (IT) can help with a basic identification of one's mental state, creating a first step towards treatment. One example could be an application that helps analyze the mental state at home, without a specialist's help. Information technologies already make it easier to identify health problems, for instance through image and motion recognition or IoT sensors for monitoring physical health and supporting diagnostics. One approach that requires no additional equipment or sensors is the analysis of human-written texts, including texts published on the internet. In this work, we developed a model for multi-class text classification, which will be one of the components of a web application for identifying mental and physical states based on user text input. The main purpose is to provide an effective self-analysis platform for people who want to support their mental and physical well-being.

IDDM-2021: 4th International Conference on Informatics & Data-Driven Medicine, November 19–21, 2021, Valencia, Spain
EMAIL: artem.bashtovyi.mpz.2020@lpnu.ua (A. 1); andrii.v.fechan@lpnu.ua (A. 2); yakovyna@matman.uwm.edu.pl (A. 3)
ORCID: 0000-0003-4304-8605 (A. 1); 0000-0001-9970-5497 (A. 2); 0000-0003-0133-8591 (A. 3)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

The platform will allow classifying the user's mental state (anxiety, depression, happiness, etc.) and physical state (fatigue, pain, energy).

2. Machine learning and medicine

Numerous innovations and new technological solutions are already on the healthcare market; it is hard to imagine modern medicine without computer-supported information systems. Machine learning (ML) algorithms are especially useful for analyzing human health [1,2,3] and for predicting pandemic effects [4]. Artificial intelligence (AI) is used in healthcare to solve a wide range of tasks [5], such as:
1. Predicting suicide risk at hospitalization via ML algorithms based on social media data.
2. Identifying the correlation between antidepressant usage and deprivation based on behavior.
3. Analyzing data to identify drug abuse.

On the whole, AI is increasingly used in combination with precision medicine; ML algorithms could potentially help solve problems and improve the analysis of patients [6]. As the example in Figure 1 shows, several of these components could be improved and optimized. According to research in Clinical and Translational Science (CTS), AI methods extend the approaches available for patient treatment. AI is making medicine easier and more efficient; nonetheless, complete AI integration into healthcare requires solving issues such as data misclassification, external factors unsuitable for model training, and user privacy.

Figure 1: Precision medicine and AI combination

3. NLP for physical and mental states identification

Natural language processing (NLP) gives us the ability to process text and extract meaningful information from user input. It allows classification (including feature selection), tagging, and parsing of data in order to understand human intention. In fact, NLP has been widely used in the medical field, specifically for mental and physical health analysis.
Clinical data and even regular data from social media are a great source of useful information for mental illness detection [7]; in addition, specific data sources such as suicide notes can be used for algorithm development [8]. Classification approaches based on clinical data have shown success in predicting mental problems with a precision of about 71% [9]. Recently, a research group at MIT created a Long Short-Term Memory (LSTM) based model for detecting depression predisposition by processing sequences of user text and voice input. Classifying physical health issues is considerably harder than classifying mental health issues, due to the specific data processing involved and the huge diversity of physical states. Specific models are required for such classification; for example, one study uses dedicated models for low back pain detection [10]. It includes a comprehensive analysis of data from medical records written by doctors, and such tasks require substantial effort in terms of data selection. Another study describes a concrete approach for migraine classification [11]. Physical issues are hard to describe and detect from raw user text; consequently, physical state detection is a complex task that requires specific data selection and processing.

4. Platform architecture

The purpose of this work is to create a model for NLP multi-class text analysis. The model is the main component of a self-analysis platform for mental and physical state detection based on user input. Figure 2 shows the main components of the platform.

Figure 2: High-level platform architecture

Text processing module - the main module for text processing and tagging, based on the defined mental and physical state classes.
This module is responsible for the following steps and operations:
● Data handling for classification
● Text classification for the mental and physical state

User data management module - the basic component for storing, editing, and processing user records in the journals.
Web application - the main client for user interaction and journaling.

The text processing module consists of the main components described in Figure 3.

Figure 3: Text processing module structure

4.1. Dataset

Choosing the right dataset is crucial for NLP tasks, especially when processing health-related text that contains disease or state information. Protecting personal data is very important for medical companies: the data must be fully confidential, and although some sources provide open anonymized user information, finding relevant data is relatively hard. For the sake of simplicity and privacy-policy compliance, an open-source dataset from the Reddit network was chosen [12]. The dataset contains classified mental disease data extracted from the respective Reddit topics. It includes 13727 posts (records), with five main mental disorder classes: attention deficit hyperactivity disorder (adhd), post-traumatic stress disorder (ptsd), anxiety, depression, and bipolar disorder (bipolar). An additional class, undefined (none), is associated with topics not connected to mental state discussions (music, travel, science, politics). Table 1 shows the distribution of classes within the dataset.

Table 1: Dataset labels distribution
Class (subreddit) | Posts count | Avg. no. of words per post
adhd | 2465 | 152.74
anxiety | 2422 | 170.38
ptsd | 2001 | 233.55
bipolar | 2407 | 203.28
depression | 2450 | 152.74
undefined (none) | 1982 | 238.52

Figure 4: Dataset classes distribution

4.2. Initial model

The classification module is based on the BERT BASE [13] pre-trained model: a neural network with 12 layers, 768 hidden units, 12 attention heads, and 110M parameters, trained on lower-cased English text.
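As a quick sanity check, the per-class post counts in Table 1 can be totalled and turned into class proportions with a short script. This is a minimal sketch: the counts are copied from Table 1, and all variable names are ours.

```python
# Post counts per class, as reported in Table 1.
counts = {
    "adhd": 2465,
    "anxiety": 2422,
    "ptsd": 2001,
    "bipolar": 2407,
    "depression": 2450,
    "none": 1982,
}

total = sum(counts.values())  # should match the 13727 posts reported above
proportions = {label: round(n / total, 4) for label, n in counts.items()}

print(total)
print(proportions)
```

The proportions show that the dataset is roughly balanced (each class holds between ~14% and ~18% of the posts), which is convenient for multi-class training.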
The BERT approach is especially useful for this task because it supports fine-tuning on domain-specific data (in this case, mental and physical illness). Having proper datasets for physical and mental states allows us to fine-tune the model, with a softmax function applied for classification. Python, PyTorch, and Transformers are the core technologies used for the classification module. Tokenization plays a crucial role in NLP text processing; the BERT model uses a special tokenizer (BertTokenizer) based on reserved classes. BERT is a sequence-to-sequence model and therefore requires a fixed-length input sequence. Based on the conducted experiments, the sequence length was set to 512 tokens, with padding and truncation used to produce sequences of the proper length. After careful data examination and test runs, 10 epochs proved to be the most suitable value, with a learning rate of 2e-5. AdamW (one of the fastest algorithms [14]) served as the main optimization approach; it builds on ideas from the AdaGrad and RMSProp algorithms and has recently become popular in computer vision. To avoid overfitting, we used a dropout probability of 0.3. Table 2 shows the per-epoch details:

Table 2: Train epoch details
Epoch | Loss | Accuracy
1 | 0.8162 | 0.7653
2 | 0.5332 | 0.8420
3 | 0.4421 | 0.9355
4 | 0.2671 | 0.9184
5 | 0.1288 | 0.9577
6 | 0.0857 | 0.9612
7 | 0.0749 | 0.9692
8 | 0.0506 | 0.9702
9 | 0.0411 | 0.9734
10 | 0.0391 | 0.9790

Figure 5 shows the difference between training and validation accuracy. The last step shows almost no difference in training accuracy, so we decided to keep the base parameters.

Figure 5: Training and validation accuracy results

Table 3: General results
Precision | Recall | F1 | Accuracy
0.88 | 0.88 | 0.89 | 0.88

Table 4: BERT multiclass performance results
Class name | Precision | Recall | F1
adhd | 0.90 | 0.91 | 0.90
anxiety | 0.86 | 0.84 | 0.85
bipolar | 0.88 | 0.83 | 0.86
depression | 0.81 | 0.88 | 0.84
ptsd | 0.87 | 0.86 | 0.89
undefined (none) | 0.98 | 0.97 | 0.98

Table 3 contains the final results after all training stages; the general model precision is 0.88, which is a fairly high result for a dataset based on Reddit topics. The hardest part of multi-class health issue classification is the high correlation between different classes. The following definitions are used for precision, recall, F1, and accuracy:
● Precision - true positives divided by the sum of true positives and false positives
● Recall - true positives divided by the sum of true positives and false negatives
● F1 - weighted average of precision and recall
● Accuracy - computed on the validation dataset with provided targets

The results in Table 4 give detailed information about text analysis and per-class prediction. The precision of the "none" class is notably high, which suggests the model is relatively resistant to false-positive results. The best-performing classes are "ptsd" and "adhd". The worst results are for "anxiety" and "depression", and there are reasons for this: these two classes are highly correlated with the other classes. The "bipolar" class contains around 30% "depression"-related posts and about 11% "anxiety"-related posts, and other classes contain about 12% "depression"-related posts in the same manner. Consequently, the correlation between the classes impairs the precision results, which calls for more careful dataset classification. We observed text misclassification on user input due to the interconnection of mental states. A proper base dataset classification, including interaction with experts on medical records and diagnoses, will help to improve the situation.

4.3. Adjusting elements of the model

The pre-trained model used in the experiment performed relatively well; since BERT is based on the transformer architecture, it has achieved state-of-the-art performance on NLP tasks. In the task described above, we fine-tuned the model for mental disorder classification, which gives us a task-specific model. In spite of the good model precision, the testing steps took a lot of resources, specifically the time needed to process the test dataset. The module for mental health state prediction will require constant improvement in the future, with experiments on different states and higher classification precision, so we decided to modify the pre-trained model in the hope of reducing future testing time and changing the model size. Fortunately, there are many approaches [15] to tuning the BERT configuration ("BERTology") and the pre-trained model in order to gain performance and precision and to reduce the time and size of transformer-based models. Several studies have explored methods for improving the original BERT model on various tasks. The XLNet model [16] adds autoregressive capabilities to BERT, improving the quality of the original model, though at the cost of extra compute and time. The well-known RoBERTa model provides higher performance [17]. Recent studies describe a "head pruning" approach that saves testing time and model size without sacrificing much performance; this is useful for tasks that require constant model testing and fine-tuning. The approach described in [18] helps prune unimportant attention heads. Two main approaches are used during head pruning: the first prunes whole heads from the model, while the second prunes only the least useful weights within a given head.
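The first variant, whole-head pruning, can be sketched in a few lines: given an importance score for every (layer, head) pair, drop the lowest-scoring fraction of heads. This is an illustrative sketch with invented scores and names of our own; in the actual module, the scores come from the head-importance formula used in the paper.

```python
# Whole-head pruning sketch: keep the most important attention heads and
# drop the lowest-scoring fraction.  Scores here are invented for illustration.
def select_heads_to_prune(importance, prune_fraction):
    """importance: dict mapping (layer, head) -> score.
    Returns a dict layer -> sorted list of head indices to prune."""
    ranked = sorted(importance, key=importance.get)  # least important first
    n_prune = int(len(ranked) * prune_fraction)
    to_prune = {}
    for layer, head in ranked[:n_prune]:
        to_prune.setdefault(layer, []).append(head)
    return {layer: sorted(heads) for layer, heads in to_prune.items()}

# Toy example: 2 layers x 4 heads with made-up importance scores.
scores = {(l, h): abs(l - 0.5) + 0.1 * h for l in range(2) for h in range(4)}
pruned = select_heads_to_prune(scores, prune_fraction=0.38)
print(pruned)
```

A mapping of this shape ({layer index: [head indices]}) is what head-pruning utilities such as `prune_heads` in the HuggingFace Transformers library expect, although the sketch above does not depend on that library.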
Head pruning requires defining the importance of each head; the following formula, suggested in the study, was used:

I_h = E_{x~X} | Att_h(x)^T · ∂L(x)/∂Att_h(x) |    (1)

where X is the data distribution, L(x) is the loss for sample x, and Att_h(x) is the output of attention head h.

4.4. Results from the experiment

After conducting experiments with head pruning, we identified the best parameters for testing the configured model. The lowest head importance was found in the encoder multi-head attention (EMHA); hence the final pruning was performed mostly on the EMHA. Figure 6 shows the individual steps of head pruning in 10% increments and their effect on precision. The key observation is that model precision drops drastically only after 38% of the heads are pruned, which is an encouraging result. At that point, precision at the testing stage was reduced by almost 8%, while test data validation took about 25% less time. Consequently, the adjusted model configuration is suitable for testing the model quickly on the given dataset. The final result showed that 8 heads per layer, with a total precision of almost 80% and a time reduction of almost 24%, is the best option for the selected dataset and model architecture. The final decision was to keep the parameters of the initial model described in section 4.2 and to perform head pruning based on the results above. Furthermore, we adjusted the number of hidden layers in accordance with the pruned heads.

Figure 6: Head pruning stages with 10% step

Table 5: Test performance of the original and the pruned model with final parameters
Model | Precision | Recall | F1 | Accuracy
Original pre-trained | 0.88 | 0.88 | 0.89 | 0.88
Modified model with 38% heads pruned | 0.81 | 0.82 | 0.79 | 0.81

These positive results allow us to run the testing stage for different mental issue records faster. Additionally, we plan to use the approach described above for tuning and configuring the physical state classification model, thereby reducing testing time on the physical states dataset.
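Formula (1) can be illustrated with a toy computation. This is a pure-Python sketch with invented numbers; in a real setting, Att_h(x) and the gradient ∂L(x)/∂Att_h(x) would come from a forward and backward pass of the model.

```python
# Toy illustration of the head-importance score (1):
#   I_h = E_{x~X} | Att_h(x)^T · dL(x)/dAtt_h(x) |
# For each sample, take the dot product of the head's attention output with
# the loss gradient w.r.t. that output, then average the absolute values.
def head_importance(att_outputs, grads):
    """att_outputs, grads: lists of equal-length vectors (one pair per sample)."""
    scores = []
    for att, grad in zip(att_outputs, grads):
        dot = sum(a * g for a, g in zip(att, grad))  # Att_h(x)^T · dL/dAtt_h(x)
        scores.append(abs(dot))
    return sum(scores) / len(scores)  # empirical expectation over samples

# Two invented samples for a single head.
att = [[0.2, 0.5, 0.1], [0.4, 0.3, 0.2]]
grad = [[1.0, -2.0, 0.5], [0.5, 1.0, -1.0]]
print(head_importance(att, grad))
```

Heads whose score stays near zero contribute little to the loss and are the natural candidates for pruning, which is exactly how the 38% of heads removed above were selected.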
Future work concerns the development of the second module, for physical state classification. The physical diseases dataset requires careful processing; furthermore, the model development includes constant feedback and collaboration with doctors and medical companies. The final step is to combine the two models into a single text classification model and to build the system with a web application for the full journaling process.

5. References

[1] Y. Bashtyk, J. Campos, A. Fechan, S. Konstantyniv, V. Yakovyna, Computer monitoring of physical and chemical parameters of the environment using computer vision systems: Problems and prospects. CEUR Workshop Proceedings, 2020, 2753, pp. 437–442.
[2] Z. Mykytyuk, A. Fechan, V. Petryshak, G. Barylo, O. Boyko, Optoelectronic multi-sensor of SO2 and NO2 gases. Modern Problems of Radio Engineering, Telecommunications and Computer Science, Proceedings of the 13th International Conference on TCSET 2016, 2016, pp. 402–405.
[3] A. Kucher, O. Boyko, K. Ilkanych, A. Fechan, N. Shakhovska, Retrospective analysis by multifactor regression in the evaluation of the results of fine-needle aspiration biopsy of thyroid nodules. CEUR Workshop Proceedings, 2020, 2753, pp. 443–447.
[4] V. Yakovyna, N. Shakhovska, Modelling and predicting the spread of COVID-19 cases depending on restriction policy based on mined recommendation rules. Mathematical Biosciences and Engineering, 18(3), 2021, pp. 2789–2812.
[5] A. K. Ul haq, A. Khattak, N. Jamil, M. A. Naeem, F. Mirza, Data Analytics in Mental Healthcare. Scientific Programming, 2020, doi:10.1155/2020/2024160.
[6] K. B. Johnson, W. Q. Wei, D. Weeraratne, M. E. Frisse, K. Misulis, K. Rhee, J. Zhao, J. L. Snowdon, Precision medicine, AI, and the future of personalized health care. Clin. Transl. Sci., 14, 2020, doi:10.1111/cts.12884.
[7] A. Le Glaz, Y. Haralambous, D. Kim-Dufor, P. Lenca, R. Billot, T. Ryan, J. Marsh, J. DeVylder, M. Walter, S. Berrouiguet, C. Lemey, Machine Learning and Natural Language Processing in Mental Health: Systematic Review. J Med Internet Res, 2021, doi:10.2196/15708.
[8] R. Calvo, D. Milne, M. Hussain, H. Christensen, Natural language processing in mental health applications using non-clinical texts. Natural Language Engineering, 23(5), 2017, doi:10.1017/S1351324916000383.
[9] N. Viani, R. Botelle, J. Kerwin, et al., A natural language processing approach for identifying temporal disease onset information from mental healthcare text. Sci Rep 11, 757, 2021, doi:10.1038/s41598-020-80457-0.
[10] M. Judd, F. Zulkernine, B. Wolfrom, D. Barber, A. Rajaram, Detecting Low Back Pain from Clinical Narratives Using Machine Learning, 2018, doi:10.1007/978-3-319-99133-7_10.
[11] M. Katsuki, N. Narita, Y. Matsumori, N. Ishida, O. Watanabe, S. Cai, T. Tominaga, Preliminary development of a deep learning-based automated primary headache diagnosis model using Japanese natural language processing of medical questionnaire, 2020, doi:10.25259/SNI_827_2020.
[12] Reddit topics classified dataset. URL: https://github.com/amurark/mental-health-classification
[13] BERT base uncased pre-trained model. URL: https://huggingface.co/bert-base-uncased
[14] Gentle Introduction to the Adam Optimization Algorithm for Deep Learning. URL: https://machinelearningmastery.com/adam-optimization-algorithm-for-deep-learning
[15] A. Rogers et al., A Primer in BERTology: What We Know About How BERT Works. Transactions of the Association for Computational Linguistics, 8, 2020, pp. 842–866.
[16] Z. Yang et al., XLNet: Generalized Autoregressive Pretraining for Language Understanding. NeurIPS, 2019, arXiv:1906.08237.
[17] Y. Liu et al., RoBERTa: A Robustly Optimized BERT Pretraining Approach. ArXiv abs/1907.11692, 2019.
[18] P. Michel et al., Are Sixteen Heads Really Better than One? NeurIPS, 2019, arXiv:1905.10650.