<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>at eRisk 2022: Early Detection of Depression Based on Concatenating Representation of Multiple Hidden Layers of RoBERTa Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shih-Hung Wu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhao-Jun Qiu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chaoyang University of Technology</institution>
          ,
          <addr-line>Taichung</addr-line>
          ,
          <country country="TW">Taiwan, R.O.C</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Depression is a global crisis, with hundreds of millions of people around the world suffering from it. By analyzing people's writings on social media, a system has the opportunity to detect depression and alert the person to seek medical help. Our team participated in the CLEF 2022 eRisk Task 2: Early Detection of Depression, a task designed to detect depression tendencies in people early. Our research focuses on improving on the pre-trained RoBERTa model. We ran a total of five experiments this year. The first serves as a baseline using the pre-trained language model. Experiment two extracts the output of hidden layers as a new representation. Experiment three obtains keyword features by extracting single-word features for the two categories. Experiment four trains two models, one for the title and one for the text, and integrates their results to make predictions. Experiment five integrates the methods of experiments two and four. According to the task evaluation results, the method of experiment two is indeed better than using the pre-trained model alone. Experiments 4 and 5 performed well on the task's ranking-based evaluation after processing 1000 writings.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Learning</kwd>
        <kwd>RoBERTa</kwd>
        <kwd>Depression Detection</kwd>
        <kwd>Multiple Hidden Layers</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Comparing posts and BDI answers, they believed it should be noted that not all categories were discussed in posts [15].
The second approach is to classify posts into different topics and to find the most relevant topics through
word vectors over the corpus. The Bucur et al. and Spartalis et al. teams also used the pre-trained
model approach [16,17]; the difference is that one was trained to analyze post similarities and the
other performed feature-based transfer learning.</p>
      <p>
        Analyzing people's psychological conditions through the wide range of information available on social media
has drawn wide interest. CLEF eRisk offered three different tasks this year [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], namely Task 1: Early
Detection of Signs of Pathological Gambling, Task 2: Early Detection of Depression, and Task 3:
Measuring the Severity of the Signs of Eating Disorders. Our team participated in Task 2, a task designed
to detect users with depression tendencies. The eRisk server iteratively provides user writings to the
participating teams, releasing the data step by step. Diagnosing the tendency to depression as early as
possible is part of the evaluation; that is, the evaluation considers not only the
correctness of the system output, but also the time at which each decision is published.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Data and Pre-processing</title>
      <p>
        The data used in this paper is the dataset provided at eRisk 2022 Task 2: Early Detection of
Depression [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ][18]. The data contains text from multiple users, each of whom typically provides a large
amount of written text in the XML format shown in Figure 1. ID contains the anonymized ID of the user;
TITLE is the title of the post (left blank for comments); INFO is the source of the post; and TEXT is the content
of the post or comment.
      </p>
      <p>&lt;INDIVIDUAL&gt;
&lt;ID&gt; … &lt;/ID&gt;
&lt;WRITING&gt;
&lt;TITLE&gt; … &lt;/TITLE&gt;
&lt;DATE&gt; … &lt;/DATE&gt;
&lt;INFO&gt; … &lt;/INFO&gt;
&lt;TEXT&gt; … &lt;/TEXT&gt;
&lt;/WRITING&gt;
&lt;WRITING&gt;
&lt;TITLE&gt; … &lt;/TITLE&gt;
&lt;DATE&gt; … &lt;/DATE&gt;
&lt;INFO&gt; … &lt;/INFO&gt;
&lt;TEXT&gt; … &lt;/TEXT&gt;
&lt;/WRITING&gt;
……
&lt;/INDIVIDUAL&gt;</p>
      <p>The Early Detection of Depression datasets are listed in Figure 2. There are datasets from 2018 and
2017; each consists of social media posts collected in that year, divided into two categories:
depression (pos) and non-depression (neg). This paper uses the 2018 dataset for model training and the
2017 dataset for validation.</p>
      <sec id="sec-2-1">
        <title>Data</title>
        <p>Figure 2: the Early Detection of Depression datasets, 2018_cases (training set) and 2017_cases (test set), each divided into neg and pos subject files (subject121.xml, subject130.xml, …; test_subject25.xml, test_subject50.xml, …).</p>
        <p>Since the dataset was collected from a forum and has not been processed, it contains paths, URLs,
some special characters, and so on. Therefore, we use regular expressions to preprocess the
title and text of each document as shown in Figure 3: special characters, paths, URLs, parentheses, and
punctuation are removed. The numbers of training and validation posts after preprocessing are shown in
Table 1.</p>
        <p>Figure 3 flowchart: extract each title and text → preprocessing (delete URLs, special characters, and extra whitespace) → convert the file into TSV format (ID, title, text).</p>
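        <p>The preprocessing steps above can be sketched in Python. The exact regular expressions and file handling are not given in the paper, so the patterns, function names, and the tiny sample document below are illustrative assumptions.</p>

```python
import re
import xml.etree.ElementTree as ET

# Illustrative cleaning rules; the paper's exact patterns are not given.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s.,!?']")  # paths, special characters
SPACE_RE = re.compile(r"\s+")

def clean(text):
    """Remove URLs, special characters, and extra whitespace."""
    text = URL_RE.sub(" ", text or "")
    text = SPECIAL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()

def individual_to_rows(root):
    """Convert one INDIVIDUAL element into (ID, title, text) TSV rows."""
    subject_id = (root.findtext("ID") or "").strip()
    rows = []
    for writing in root.iter("WRITING"):
        title = clean(writing.findtext("TITLE") or "")
        text = clean(writing.findtext("TEXT") or "")
        rows.append((subject_id, title, text))
    return rows

# Build a tiny made-up document programmatically.
root = ET.Element("INDIVIDUAL")
ET.SubElement(root, "ID").text = "subject121"
writing = ET.SubElement(root, "WRITING")
ET.SubElement(writing, "TITLE").text = "hello"
ET.SubElement(writing, "TEXT").text = "see https://example.com ***great*** day"

rows = individual_to_rows(root)
tsv_line = "\t".join(rows[0])  # one TSV row: ID, title, cleaned text
```

        <p>Each (ID, title, text) tuple then becomes one row of the TSV file used for training.</p>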
        <p>
          The training material came from a total of 820 people, of whom the majority (741 people) were
non-depression users, which shows that the data is extremely unbalanced. This situation is often encountered
in real-world problems; how to effectively filter the posts is an important issue and the main
consideration of our research. According to a previous observation [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], there is a difference in
the length and number of words used. Figures 4 and 5 show, respectively, the
length of the text and the number of words. Blue represents posts by non-depression users, red
represents posts written by depression users, and the X axis indexes the 538,389 posts in total. The
Y axis indicates the length of the post and the number of words, respectively. Statistically, there are
indeed some posts showing that posts by non-depression users are longer and contain more
words than posts by depression users. Therefore, based on this data, we removed posts with
length over 1,000 or with more than 500 words. This distinction is still limited, however, and most
posts remain similar in both length and word count.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Our Approach</title>
      <p>We describe our system settings in sub-section 3.1 and how we evaluate our system in sub-section
3.2. The experiment settings of our five runs are described in the following sections.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1. Operating environment and model parameter settings</title>
      <p>The model is trained on Google Colab Pro. The training data, listed in Table 1, is divided into
80% for training and 20% for validation; the tokenizer and model are roberta-base. The hyper-parameter settings
are: max length 128, batch size 100, hidden size 768, learning rate
1e-5, weight decay 1e-2, and 2 epochs of fine-tuning.</p>
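      <p>For reference, the hyper-parameters above can be collected into a single configuration; the dict and its key names are ours, not taken from the authors' code.</p>

```python
# Hyper-parameter settings reported in Section 3.1, gathered as a config
# dict (key names are ours). "roberta-base" is the Hugging Face checkpoint.
CONFIG = {
    "model_name": "roberta-base",
    "max_length": 128,      # max tokens per input
    "batch_size": 100,
    "hidden_size": 768,     # RoBERTa-base hidden width
    "learning_rate": 1e-5,
    "weight_decay": 1e-2,
    "epochs": 2,            # fine-tuning epochs
    "train_fraction": 0.8,  # 80% train / 20% validation split
}
```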
    </sec>
    <sec id="sec-5">
      <title>3.2. Evaluation model method</title>
      <p>The evaluation process is shown in Figure 6; there are two evaluation modules. One predicts the
depression tendency of each post, and the other statistically aggregates the post-level predictions
per user to determine whether that person has a depression tendency. The
process gives the test set to the experimental model to predict whether each post shows a tendency to
depression, and calculates the model's Precision, Recall, and F1-score. The post-level results are then
aggregated per user: the threshold on the proportion of symptomatic posts is adjusted from 1% to 99%,
and Precision, Recall, and F1-score are calculated at each proportion to find the best
F1-score.</p>
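      <p>The user-level evaluation described above can be sketched as a threshold sweep; the helper names and the toy numbers are ours, not the paper's actual results.</p>

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision/recall/F1 with pos = 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def best_threshold(user_pos_ratio, user_labels):
    """Sweep the positive-post proportion from 1% to 99% and keep the
    threshold giving the best user-level F1-score."""
    best = (0.0, None)  # (f1, threshold)
    for pct in range(1, 100):
        thr = pct / 100.0
        preds = [1 if r >= thr else 0 for r in user_pos_ratio]
        _, _, f1 = precision_recall_f1(user_labels, preds)
        if f1 > best[0]:
            best = (f1, thr)
    return best

# Toy data: per-user fraction of posts predicted depressive, and gold labels.
ratios = [0.05, 0.20, 0.40, 0.10, 0.55]
labels = [0, 1, 1, 0, 1]
f1, thr = best_threshold(ratios, labels)
```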
      <p>Figure 6 flowchart: Start → Test Set → Experiment Model → Statistical Results → predict each person's depression tendency (the proportion of predicted pos data is adjusted from 1% to 99%) → calculate Precision, Recall and F1-score → End.</p>
      <sec id="sec-5-7">
        <title>Experiment 1: RoBERTa (Baseline)</title>
        <p>
          The pre-trained RoBERTa model [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] was used as a baseline for evaluating score
changes in the subsequent comparisons. The flowchart of experiment one is shown in Figure 7; the
only preprocessing addresses the data imbalance issue by reducing the number of posts
extracted from the documents of non-depression people in the training set (at most 500 posts per
person). The total number of training posts is 268,866, and the TEXT part is used for model training.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Experiment 2: RoBERTa (Extract output of hidden layers)</title>
      <p>
        The main idea of experiment 2 is to change the embedding representation of an input sentence in the
RoBERTa model. The first token of each of the last four hidden layers is extracted from the model
for improvement [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This token corresponds to the output vector of each layer, which means
it carries the model's representation at that hidden layer. In this experiment, these
vectors, rather than the last-layer vector alone, are fed to a linear classifier (see Figure 8). We want to
know whether the model prediction can be improved by extracting multiple output vectors.
      </p>
      <p>Figure 8: one sentence → RoBERTa hidden layers H2, H3, …, Hn; the first-token outputs of the last four layers (Hn-3, Hn-2, Hn-1, Hn) are concatenated and passed to a linear classifier, which outputs the answer (neg or pos).</p>
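      <p>The extraction in Figure 8 can be sketched as follows. In the real model the hidden states would come from RoBERTa with hidden-state outputs enabled; here they are mocked with random lists, so only the shapes are meaningful.</p>

```python
import random

random.seed(0)
NUM_LAYERS, SEQ_LEN, HIDDEN = 12, 8, 768  # RoBERTa-base: 12 layers, width 768

# Mocked hidden states: one list of SEQ_LEN token vectors per layer.
hidden_states = [
    [[random.random() for _ in range(HIDDEN)] for _ in range(SEQ_LEN)]
    for _ in range(NUM_LAYERS)
]

def concat_last_four_first_tokens(hidden_states):
    """Concatenate the first-token output vector of each of the last
    four hidden layers into one sentence representation."""
    vec = []
    for layer in hidden_states[-4:]:
        vec.extend(layer[0])  # first-token (CLS-like) vector of this layer
    return vec

sentence_vec = concat_last_four_first_tokens(hidden_states)
# The resulting 4 * 768 = 3072-dimensional vector feeds the linear classifier.
```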
    </sec>
    <sec id="sec-7">
      <title>Experiment 3: Building feature dictionary + RoBERTa</title>
      <p>
        The experiment 3 setting does not unconditionally discard the non-depression data, but filters the
data with a feature dictionary (retaining the information that can be matched by the dictionary).
According to previous work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], when people's thoughts and emotional reactions differ, their
word usage differs too, which is why negative-emotion dictionaries have been used in the
past. However, since it is easy to publish posts on social media, new buzzwords may be
generated at any time. Therefore, we try to extract a new feature dictionary by comparing the posts
of depression users and non-depression users.
      </p>
    </sec>
    <sec id="sec-8">
      <title>3.5.1. Build feature dictionary</title>
      <p>The extraction process is shown in Figure 9: the frequencies of words in the training data from depression
and non-depression users are counted separately. Some words appear only a few times (personal
names, place names, song names, etc.), so two threshold values (5 and 16) are set for the frequency of
occurrence of words from depression and non-depression users, respectively. Two feature dictionaries are extracted:
the non-depression dictionary contains 19,214 words, and the
depression dictionary contains 1,106 words.</p>
      <p>Figure 9: neg posts and pos posts are each split on whitespace, their word frequencies are compared, and the thresholds are applied to build the two feature dictionaries.</p>
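      <p>The dictionary construction can be sketched as follows; function and variable names are ours, and the rule of keeping only words that appear in a single category follows the later discussion of this experiment.</p>

```python
from collections import Counter

def build_feature_dicts(pos_posts, neg_posts, pos_min=5, neg_min=16):
    """Build two feature dictionaries from whitespace-split posts.
    The thresholds (5 for depression, 16 for non-depression) follow the
    paper; a word qualifies only if it is frequent enough in one class
    and absent from the other."""
    pos_counts = Counter(w for post in pos_posts for w in post.split())
    neg_counts = Counter(w for post in neg_posts for w in post.split())
    pos_dict = {w for w, c in pos_counts.items()
                if c >= pos_min and w not in neg_counts}
    neg_dict = {w for w, c in neg_counts.items()
                if c >= neg_min and w not in pos_counts}
    return pos_dict, neg_dict

# Toy usage: "sad" occurs 5 times only in pos posts, "happy" 16 times
# only in neg posts, "tired" is below the pos threshold.
pos_dict, neg_dict = build_feature_dicts(
    ["sad sad sad sad sad tired"], ["happy"] * 16)
```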
    </sec>
    <sec id="sec-9">
      <title>3.5.2. Experiment process</title>
      <p>The flow chart of experiment 3 is shown in Figure 10: the training data are screened by matching against the
feature dictionaries. After screening, a total of 129,544 non-depression posts remained; the
depression posts were also screened, in order to strengthen training on those posts, yielding
902 posts. The processed training data contains all posts from depression users (40,353 posts) plus the
more characteristic posts after screening (129,544 + 902 posts), for a total of 170,799 posts. This
training data is used to fine-tune the RoBERTa model.</p>
      <p>Figure 10 flowchart: Start → Training Set (pos posts, neg posts) → match pos dictionary / match neg dictionary → training data → RoBERTa → output (neg or pos) → End.</p>
    </sec>
    <sec id="sec-10">
      <title>Experiment 4: Combining Title and Text Prediction Models</title>
      <p>Experiment 4 trains two models, one for the title and one for the text. According to our
observation of the dataset, some of the data contains only a title and no text, and this experiment is
designed to deal with that situation. The experimental process is shown in Figure 11: the system extracts the title
and body of each post from the training data, the title and body each have a separate RoBERTa
model for training, and the results are integrated by a linear classifier to make the judgment.</p>
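      <p>A minimal sketch of the combination step: each branch (title model, text model) produces two class scores, and a linear classifier over the concatenated scores makes the final judgment. The weights below are hypothetical, not learned values from the paper.</p>

```python
def linear_classifier(features, weights, bias):
    """Score = w · features + b; a positive score means 'pos' (depression)."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def combine(title_scores, text_scores, weights, bias):
    """Concatenate the (neg, pos) scores of both branches and classify."""
    features = list(title_scores) + list(text_scores)
    return "pos" if linear_classifier(features, weights, bias) > 0 else "neg"

# Hypothetical learned weights: trust each branch's pos score positively
# and its neg score negatively.
weights, bias = [-1.0, 1.0, -1.0, 1.0], 0.0
answer = combine((0.2, 0.8), (0.6, 0.4), weights, bias)
```

      <p>With these made-up weights the combined score is -0.2 + 0.8 - 0.6 + 0.4, so the final judgment is pos.</p>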
    </sec>
    <sec id="sec-11">
      <title>Experiment 5: Combining experiments 2 and 4</title>
      <p>We observed from the validation results of experiment 2 (Table 2) that the method of
extracting information from the hidden layers is effective, so in experiment 5 we improved the process of
experiment 4 by applying the method of experiment 2.</p>
    </sec>
    <sec id="sec-12">
      <title>4. Results and Discussion on System Development and eRisk task 2</title>
      <p>Our five experimental models were evaluated according to the Section 3.2 methodology,
using the 2017 dataset as test data. The first assessment, determining whether a user has a depression
tendency, is shown in Table 2 for the 401 users. According to these results,
we find that the best decision proportion of predicted depression posts differs across models when judging whether a user
has a tendency to be depressed. Figure 12 and Table 3 give the metrics for all experiments at the proportion
with the best F1-score.</p>
      <p>The evaluation results show that extracting multiple output vectors can effectively improve
performance, being more accurate than using only the last layer to predict the results. The result of
experiment 2 in Table 2 is better than experiment 1 on most evaluation metrics. From Figure
12, the best F1-score is 60.19%, reached when the proportion of depression posts for this
experimental model is 13%. In the comparison of the Table 3 evaluation results, the F1-score of experiment
2 is 2% better than that of experiment 1.</p>
    </sec>
    <sec id="sec-13">
      <title>4.2. Experiment 3: Discussion</title>
      <p>The assessment results did not succeed in improving the prediction. A sharp increase in
Recall was accompanied by a sharp decline in Precision, which made it easy to misjudge
the predicted depression tendency. As shown in Table 3, the precision, recall, and F1-score were among
the worst of all experimental evaluations. The main reason for this situation is that the data was
over-screened: in building the feature dictionaries, too extreme a method was taken, retaining only words
that appear in one of the categories, which led to excessive exclusion of training
material and, in turn, to insufficient model training. From Figure 12, it can be observed that the
trained model performs very poorly: when the proportion of symptomatic posts is greater than 68%, the F1-score
drops sharply. This is abnormal; it means the model tends to predict a depression
tendency, whereas in fact the number of people without a depression tendency is much greater than
the number with one. This condition, as mentioned earlier, is due to
over-excluding the training data. Because very little depression data remained after
matching, all the depression-tendency posts were put back into the training data, which also biased
the trained model toward predicting depression.</p>
    </sec>
    <sec id="sec-14">
      <title>4.3. Experiment 4: Discussion</title>
      <p>The evaluation results did not improve significantly; it can be observed from Table 2 that the
results of experiment 4 and experiment 1 differ little, with only about a 0.5%
improvement in judging whether a person has a depression tendency. This is slightly helpful, but the effect
is not as strong as in experiment 2. However, compared with the previous experiments, this model can
predict a result from the title alone when the text is missing, so it has applications the previous
experimental models lack.</p>
    </sec>
    <sec id="sec-15">
      <title>4.4. Experiment 5: Discussion</title>
      <p>The results deteriorated instead of improving: as shown in Figure 12, the results of experiment 4
were better than the evaluation results of experiment 5. The reason
might be that there is too much disparate information, so the model has difficulty converging and makes wrong
judgments. From experiment 5 we found that the effect of this approach is limited: the
RoBERTa hidden-layer vector size is 768, so the last four hidden layers of the two models yield
6,144-dimensional vectors in total. This may make it difficult for a linear classifier to converge, so the model's
judgment ability is reduced.</p>
    </sec>
    <sec id="sec-16">
      <title>Formal Results in eRisk 2022 Task 2</title>
      <p>We ran the above five experimental models on Task 2, processing a total of 2,000 iterations of
user writings, which took 7 days and 12 hours to complete. The decision-based evaluation results were
not particularly strong (Table 4), and the Recall of each experimental model was on the high side.
We believe the reason for this result is the different mode of evaluation: during system
development, all of the writings are given at once, while the task releases one writing per user
at a time in an iterative way and asks systems to predict the user's depression tendency early. However,
we performed well in the ranking-based results; from Table 5 we can observe that the
more writings our model receives, the higher the evaluation scores, with P@10 reaching its best performance
after 1000 writings.</p>
    </sec>
    <sec id="sec-17">
      <title>5. Conclusions</title>
      <p>During the system development phase, we used all of each user's writings to train the model,
which differs from Task 2, where one post is given at a time in an iterative manner. Therefore, our
model is weaker in the early detection of user depression, but performs well in the ranking-based
results. Compared with the baseline of experiment one, the results of experiment three were not as
expected; we learned from this that using statistical common-word count ratios as classification features
might overfit. Extracting the output vectors of the hidden layers in experiment 2 as a new
representation did bring an effective improvement, predicting results more accurately than
experiment 1, which directly uses a pre-trained model. In experiment four, we combined the
model trained on the body text with the model trained on the titles. Although this method did not
significantly improve the evaluation results, compared to using only the body text, the combined model
can handle the special cases where the title or the body text is missing.</p>
      <p>6. References</p>
      <p>[13] Javier Parapar, Patricia Martín-Rodilla, David E. Losada, Fabio Crestani: Overview of eRisk at CLEF 2021: Early Risk Prediction on the Internet (Extended Overview), 2021. URL: http://ceur-ws.org/Vol-2936/paper-72.pdf</p>
      <p>[14] Hassan Alhuzali, Tianlin Zhang, Sophia Ananiadou: Predicting Sign of Depression via Using Frozen Pre-trained Models and Random Forest Classifier, 2021. URL: http://ceur-ws.org/Vol-2936/paper-73.pdf</p>
      <p>[15] Diana Inkpen, Ruba Skaik, Prasadith Buddhitha, Dimo Angelov, Maxwell Thomas Fredenburgh: uOttawa at eRisk 2021: Automatic Filling of the Beck's Depression Inventory Questionnaire using Deep Learning, 2021. URL: http://ceur-ws.org/Vol-2936/paper-79.pdf</p>
      <p>[16] Ana-Maria Bucur, Adrian Cosma, Liviu P. Dinu: Early Risk Detection of Pathological Gambling, Self-Harm and Depression Using BERT, 2021. URL: http://ceur-ws.org/Vol-2936/paper-77.pdf</p>
      <p>[17] Christoforos Spartalis, George Drosatos, Avi Arampatzis: Transfer Learning for Automated Responses to the BDI Questionnaire, 2022. URL: http://ceur-ws.org/Vol-2936/paper-84.pdf</p>
      <p>[18] Javier Parapar, Patricia Martín-Rodilla, David E. Losada, Fabio Crestani: Evaluation Report of eRisk 2022: Early Risk Prediction on the Internet. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. 13th International Conference of the CLEF Association, CLEF 2022. Springer International Publishing, Bologna, Italy, 2022.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Eichstaedt</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merchant</surname>
            ,
            <given-names>R.M.:</given-names>
          </string-name>
          <article-title>Facebook language predicts depression in medical records</article-title>
          .
          <source>Proceedings of the National Academy of Sciences (PNAS</source>
          )
          <volume>115</volume>
          (
          <issue>44</issue>
          ),
          <fpage>11203</fpage>
          -
          <lpage>11208</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Reece</surname>
            ,
            <given-names>A.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reagan</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lix</surname>
            ,
            <given-names>K.L.M.</given-names>
          </string-name>
          et al.:
          <article-title>Forecasting the onset and course of mental illness with Twitter data</article-title>
          .
          <source>Sci Rep</source>
          <volume>7</volume>
          ,
          <issue>13006</issue>
          (
          <year>2017</year>
          ). URL: https://doi.org/10.1038/s41598-017-12961-9.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[3] CLEF eRisk: Early risk prediction on the Internet</source>
          ,
          <year>2021</year>
          . URL: https://erisk.irlab.org/2021/index.html
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[4] CLEF 2022 Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2022</year>
          . URL: https://clef2022.clef-initiative.eu/index.php?page=Pages/labs.html#erisk
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>[5] eRisk 2022 Text Research Collection</source>
          ,
          <year>2022</year>
          . URL: https://erisk.irlab.org/eRisk2022.html
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Fidel</given-names>
            <surname>Cacheda</surname>
          </string-name>
          , Diego Fernández, Francisco J. Novoa, Víctor Carneiro.
          :
          <source>Analysis and Experiments on Early Detection of Depression</source>
          ,
          <year>2018</year>
          . URL: http://ceur-ws.org/Vol-2125/paper_69.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Yinhan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen,
          <string-name>
            <surname>Omer Levy</surname>
          </string-name>
          ,
          Mike Lewis, Luke Zettlemoyer and Veselin Stoyanov.
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach</article-title>
          . Computing Research Repository, (
          <year>2019</year>
          ). arXiv:1907.11692, version 1
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Chris</given-names>
            <surname>McCormick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nick</given-names>
            <surname>Ryan</surname>
          </string-name>
          .
          <source>: BERT Word Embeddings Tutorial</source>
          ,
          <year>2019</year>
          . URL: https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Yen-Shuan</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <surname>Wen-Hsiang Lu</surname>
          </string-name>
          .:
          <article-title>Predicting Web User's Tendency of Depression Using Negative Thought-Driven Depression Model</article-title>
          ,
          <year>2015</year>
          . URL: https://hdl.handle.net/11296/uskn27
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Shih-Hung</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <surname>Zhao-Jun Qiu</surname>
          </string-name>
          .
          <article-title>: A RoBERTa-based model on measuring the severity of the signs of depression, 2021</article-title>
          . URL: http://ceur-ws.org/Vol-2936/paper-86.pdf
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          , Ming-Wei Chang, Kenton Lee, Kristina Toutanova.:
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the</source>
          <year>2019</year>
          <article-title>Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis</article-title>
          , MN, USA, June 2-7, (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
          <string-name>
            <given-names>Aidan N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Łukasz Kaiser, Illia Polosukhin.:
          <article-title>Attention Is All You Need</article-title>
          .
          <source>arXiv:1706.03762v5 6 Dec</source>
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>