<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Linguistic Metadata Augmented Classifiers at the CLEF 2017 Task for Early Detection of Depression FHDO Biomedical Computer Science Group (BCSG)</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Marcel</forename><surname>Trotzek</surname></persName>
							<email>mtrotzek@stud.fh-dortmund.de</email>
							<affiliation key="aff0">
								<orgName type="department">University of Applied Sciences and Arts Dortmund (FHDO) Department of Computer Science</orgName>
								<address>
									<addrLine>Emil-Figge-Str. 42</addrLine>
									<postCode>44227</postCode>
									<settlement>Dortmund</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sven</forename><surname>Koitka</surname></persName>
							<email>sven.koitka@fh-dortmund.de</email>
							<affiliation key="aff0">
								<orgName type="department">University of Applied Sciences and Arts Dortmund (FHDO) Department of Computer Science</orgName>
								<address>
									<addrLine>Emil-Figge-Str. 42</addrLine>
									<postCode>44227</postCode>
									<settlement>Dortmund</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">TU Dortmund University</orgName>
								<address>
									<addrLine>Otto-Hahn-Str. 14</addrLine>
									<postCode>44227</postCode>
									<settlement>Dortmund</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christoph</forename><forename type="middle">M</forename><surname>Friedrich</surname></persName>
							<email>christoph.friedrich@fh-dortmund.de</email>
							<affiliation key="aff0">
								<orgName type="department">University of Applied Sciences and Arts Dortmund (FHDO) Department of Computer Science</orgName>
								<address>
									<addrLine>Emil-Figge-Str. 42</addrLine>
									<postCode>44227</postCode>
									<settlement>Dortmund</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Linguistic Metadata Augmented Classifiers at the CLEF 2017 Task for Early Detection of Depression FHDO Biomedical Computer Science Group (BCSG)</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E1A416FBE1ACB239B201CF820B576D64</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>depression</term>
					<term>early detection</term>
					<term>linguistic metadata</term>
					<term>paragraph vector</term>
					<term>latent semantic analysis</term>
					<term>long short term memory</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Methods for automatic early detection of depressed individuals based on written texts can help in research of this disorder and especially offer better assistance to those affected. FHDO Biomedical Computer Science Group (BCSG) has submitted results obtained from five models for the CLEF 2017 eRisk task for early detection of depression that are described in this paper. All models utilize linguistic meta information extracted from the texts of each evaluated user and combine them with classifiers based on Bag of Words (BoW) models, Paragraph Vector, Latent Semantic Analysis (LSA), and Recurrent Neural Networks (RNN) using Long Short Term Memory (LSTM). BCSG has achieved top performance according to ERDE5 and F1 score for this task.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>This paper describes the participation of FHDO Biomedical Computer Science Group (BCSG) at the Conference and Labs of the Evaluation Forum (CLEF) 2017 eRisk pilot task for early detection of depression <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b22">23]</ref>. BCSG submitted results obtained from four different approaches and a fifth, additionally optimized variation of one model for late submission. These models as well as the findings concerning the dataset are described in this paper and an outlook on possible improvements and future research is given.</p><p>It is known that depression often leads to a negative image of oneself, pessimistic views, and an overall dejected mood <ref type="bibr" target="#b1">[2]</ref>. Accordingly, previous studies have shown that depression can have certain effects on the language used by patients. A study among depressed, formerly-depressed, and never-depressed students <ref type="bibr" target="#b35">[36]</ref> came to the conclusion that depressed individuals more frequently used the word "I" as well as negatively connoted adjectives. Similarly, an analysis of Twitter messages has shown that users suffering from depression used the words "my" and "me" much more frequently than others <ref type="bibr" target="#b28">[29]</ref>, while a Russian speech study found an increased usage of past tense verbs and pronouns in general. 
Findings like these have been used, for example, to create the Linguistic Inquiry and Word Count (LIWC) tool <ref type="bibr" target="#b38">[39]</ref>, which can be used to analyse the psychological and social state of an individual based on written texts.</p><p>A similar task using Twitter posts was organized at the CLPsych 2015 conference <ref type="bibr" target="#b8">[9]</ref> without the early detection aspect: participants were asked to distinguish between users with depression and a control group, users with Post Traumatic Stress Disorder (PTSD) and a control group, as well as between users with depression and users with PTSD. Promising results were reported using topic modeling <ref type="bibr" target="#b34">[35]</ref> and rule-based approaches <ref type="bibr" target="#b30">[31]</ref>. It was also investigated how a set of user metadata features can be combined with a variety of document vectorizations in an ensemble <ref type="bibr" target="#b32">[33]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Dataset</head><p>The dataset presented in the CLEF 2017 eRisk pilot task consists of text contents written by users of www.reddit.com, which is a widely used communication platform for creating communities called subreddits that cover all kinds of topics 3 . Specifically, there is a very active community in the subreddit /r/depression 4 for people struggling with depression and similar subreddits for other mental disorders exist as well. The registration of a free account using a valid mail address and a public user name is necessary to create content, while reading is possible without registration, depending on the subreddit. Users can post content as link (using a title and either a URL or an image), as text content (using a title and optional text), or as comment (using only the text field and no title).</p><p>The given dataset contains all three kinds of content written by 887 users and 10 up to 2,000 documents per user. Table <ref type="table" target="#tab_0">1</ref> gives a summary of some basic characteristics of the training and test split. The task's goal is to classify which of these users show indications of depression by reading as few of their posts as possible in chronological order. Each document contains a timestamp of publication, the title, and the text content, while title or text can be empty. There also exist 91 cases of documents with both an empty title and text. The URL or image of link entries is not provided in the dataset. The number of unique n-grams contains all tokens with more than one alphabetical character (and the word "I") that occur in at least two documents, also including numbers, emoticons, and words that contain hyphens or apostrophes. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Corpus Analysis</head><p>After examining the general characteristics of the given dataset, a detailed analysis of the text contents is necessary to get an insight into promising features and the specific properties of the domain. In order to find the most interesting n-grams of the given corpus, Information Gain (IG) or expected Mutual Information (MI) was calculated. In case of binary classification tasks, the information contained in each feature is given as <ref type="bibr">[25, p. 272</ref>]:</p><formula xml:id="formula_0">I(U ; C) = et∈{0,1} ec∈{0,1} P (U = e t , C = e c ) log 2 P (U = e t , C = e c ) P (U = e t )P (C = e c ) ,<label>(1)</label></formula><p>with the random variable U taking values e t = 1 (the document contains term t) and e t = 0 (the document does not contain term t) and the random variable C taking values e c = 1 (the document is in class c) and e c = 0 (the document is not in class c). Similar to the previously described selection of unigrams in the corpus, IG was calculated without stopword removal for all uni-, bi-, and trigrams that can be found in at least two documents. The obtained scores were then used to find the 100 features with the highest IG of the corpus as well as the 100 features with the highest IG that occur more often in the depressed class, which is both shown in Fig.  Uni-, bi-, and trigrams with highest information gain for the whole corpus (left) and after excluding words that occur more often in the non-depressed class (right). In both text clouds, larger font size corresponds to higher information gain.</p><p>Both analyses give an interesting insight into the corpus that confirm previous research results described in the related work section. 
Comparing the two word clouds shows that the first person singular pronouns I, me, and my, which are frequently contained in documents of both classes, have the highest IG when seen individually and are then found in some of the most important bi- and trigrams of the depressed class. The most important features of this class are, as could be expected, centered around depression and anxiety, while especially relationships (e.g. boyfriend, husband, partner, best friend), treatment (e.g. therapist, psychiatrist, medication), and appearance (e.g. acne, skin, makeup, alpha hydrox) can easily be identified as frequent topics and are often combined with personal or possessive pronouns. Interestingly, although the sad emoticon :-( is part of the top features in the depressed class, the happy emoticons :-) and :) occur even more frequently in this class and have a higher IG. Frequent combinations such as "thank you :)" point to the conclusion that this is often a reaction to thoroughly helpful conversations.</p><p>When examining the text data further, it becomes evident that the posts sometimes include quotes taken from messages of other users. This could be misleading for classification tasks since the quoted user might show indications of depression while the actual author of the message might not, or vice versa. Luckily, quotes seem to be infrequent and can be identified to some extent because they are always indented by a single space, do not contain line breaks, and are preceded and followed by an empty line. There is, however, no way to distinguish them from similarly indented one-line paragraphs by the actual author. Using a regular expression, 4,266 quotes can be found in the training data and 4,423 in the test data. For all models described in this paper, the prefix quote_ was added to each token within a quote to make quoted words distinguishable from the same words written by the actual author.</p></div>
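This quote handling can be sketched as follows, assuming the formatting properties described above (one-space indentation, a single line, surrounded by empty lines); the authors' actual regular expression is not given, so this is one plausible formulation:

```python
import re

# A quoted line, per the paper's observation: indented by exactly one space,
# contains no line breaks, and is preceded and followed by an empty line.
# Hypothetical pattern; the authors' exact regex is not published.
QUOTE_RE = re.compile(r"(?<=\n\n) [^\n]+(?=\n\n)")

def mark_quotes(text):
    """Prefix every token inside a detected quote with 'quote_'."""
    def prefix(match):
        quoted = match.group(0)
        return " " + " ".join("quote_" + tok for tok in quoted.split())
    return QUOTE_RE.sub(prefix, text)
```

For example, the middle line of `"own words\n\n a quoted reply\n\nmore words"` is rewritten to `quote_a quote_quoted quote_reply`, so quoted vocabulary no longer collides with the author's own words in any downstream bag of words.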
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Hand-crafted User Features</head><p>In addition to different document vectorization methods, a set of hand-crafted features has been derived from the text data and was used in all approaches. Several text statistics have been calculated and compared between the class of depressed and non-depressed users in the given dataset. The most promising features are displayed in Fig. <ref type="figure" target="#fig_2">2</ref> as box plot for each class. All features have been calculated as mean over all texts of the same user. In addition to the already mentioned counts of personal and possessive pronouns, past tense verbs, and the word I in particular, four standard measures for text readability have been calculated for the text content, namely Gunning Fog Index (FOG) <ref type="bibr" target="#b13">[14]</ref>, Flesch Reading Ease (FRE) <ref type="bibr" target="#b11">[12]</ref>, Linsear Write Formula (LWF) 5 <ref type="bibr" target="#b7">[8]</ref>, and New Dale-Chall Readability (DCR) <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b6">7]</ref>. Interestingly, while FOG, LWF, and DCR calculate a higher complexity for texts by depressed users (with values based on school years in the United States), FRE also calculates a higher score, corresponding to lower complexity in this case.  The average of the months in which all texts of a user have been submitted was included based on the hypothesis that depressive symptoms can be intensified in the winter months. This is difficult to observe in the given dataset, since the age of the available texts depends on how frequently a user has posted due to the limitation to the last 2000 writings per user. Users with many and frequent writings therefore tend to have more samples from early summer 2015 (when the collection was created), while less frequent writers provide a more uniform distribution of texts over all months. 
Additionally, five features have been created for each user that simply count the occurrences of some very specific n-grams in all their documents. This ensures that some of the strongest indicators of depression can still be identified easily even when using averaged document vectors or just a large amount of documents. These features count the following terms without regard to case and were used in Boolean form by all described models:</p><p>- The chemical and brand names of common antidepressants available in the United States (e.g. Sertraline or Zoloft) obtained from WebMD<ref type="foot" target="#foot_0">6</ref>
- Explicit mentions of a diagnosis including the word depression (e.g. "I was diagnosed with depression" or "I've been diagnosed with anxiety and depression")
- The term "my depression"
- The term "my anxiety"
- The term "my therapist"</p><p>The mentioned terms have been picked carefully from the training documents only and have been designed to capture only statements referring to the personal situation of the author, with the exception of the antidepressants. They could be extended for future research to include a more comprehensive list of medications or more general expressions of diagnosis (e.g. also including the terms "major depressive disorder" or "MDD"). Although the selected terms are not primarily helpful for early predictions, they are strong indicators for finding already diagnosed individuals, which is important for the given task as well. It would also be interesting to include additional statistical features like the number of adjectives, adverbs, noun phrases, positive and negative emotions, and similar, as done for example by LIWC. Figure <ref type="figure" target="#fig_3">3</ref> displays the correlation of all user features without scaling and also includes the label information, where a higher value corresponds to the depressed class. 
It shows that all features are at least slightly correlated with whether a user is depressed or non-depressed. The findings for this specific dataset confirm that texts by individuals suffering from depression indeed contain more pronouns and especially the word "I". Their texts are also slightly longer and more complex according to three of the four text complexity measures. This likely reflects the difference between average users, who often post a large amount of short statements, and those who discuss problems and may even be looking for help.</p></div>
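The five phrase-based features above can be sketched as a small function; the antidepressant list here is a placeholder subset (the paper uses the full WebMD list), and the diagnosis pattern is an illustrative assumption based on the quoted examples:

```python
import re

# Hypothetical sketch of the five phrase-count features of Section 3.2, used
# in Boolean form. ANTIDEPRESSANTS is a placeholder subset of the WebMD list.
ANTIDEPRESSANTS = ["sertraline", "zoloft", "fluoxetine", "prozac"]
DIAGNOSIS_RE = re.compile(
    r"(?:i was|i've been|i have been) diagnosed with (?:\w+ and )?depression"
)

def phrase_features(documents):
    """Boolean features over all documents of one user, case-insensitive."""
    text = " ".join(documents).lower()
    return {
        "antidepressant": any(name in text for name in ANTIDEPRESSANTS),
        "diagnosis":      bool(DIAGNOSIS_RE.search(text)),
        "my_depression":  "my depression" in text,
        "my_anxiety":     "my anxiety" in text,
        "my_therapist":   "my therapist" in text,
    }
```

Applied to a user whose posts contain "I was diagnosed with depression" and "my therapist said", the `diagnosis` and `my_therapist` features fire while the others stay False.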
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Chosen Models</head><p>Two conventional document vectorization models as well as three models utilizing Long Short Term Memory (LSTM) <ref type="bibr" target="#b15">[16]</ref>, a layer architecture for Recurrent Neural Networks (RNN) <ref type="bibr" target="#b12">[13]</ref> specialized on sequences of data, have been used for the given task. One of these models also employs Latent Semantic Analysis (LSA) <ref type="bibr" target="#b10">[11]</ref> as dimensionality reduction step. All models have been optimized by 5-fold cross validation on the training data using F 1 score before the submission for the first chunk of test data and were not modified at a later point. The same applies for the described prediction thresholds that were also chosen by cross validation to submit predictions each week. The only exception is the final model BCSGE, which was used to get more time for optimization: For the first nine weeks, no predictions were submitted for this model, so only the predictions using all documents at once in the last week were scored. All models use a concatenation of the text and title field of each document as input and do not treat text and title separately. Identified quotes within text contents have been modified by adding a prefix to each quoted word as described earlier, while the tokenization step includes words, numbers, and emoticons as described in section 3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Bag of Words Ensemble -BCSGA</head><p>The first model utilizes an ensemble of Bag of Words (BoW) classifiers with different term weightings and n-grams. The term weighting for bags of words can generally be split into three components: a term frequency component or local weight, a document frequency component or global weight, and a normalization component <ref type="bibr" target="#b36">[37]</ref>. A general term weighting scheme can therefore be given as <ref type="bibr" target="#b39">[40]</ref>:</p><formula xml:id="formula_1">t t,d = l t,d • g t • n d , (<label>2</label></formula><formula xml:id="formula_2">)</formula><p>where t t,d is the calculated weight for term t in document d, l t,d is the local weight of term t in document d, g t is the global weight of term t for all documents, and n d is the normalization factor for document d. A common example would be using the term frequency (tf ) as local weight and the inverse document frequency (idf ) as global weight, resulting in tf -idf weighting <ref type="bibr" target="#b36">[37]</ref>. All ensemble models use cosine normalization (l 2 -norm) for n d but varying local and global weights. The first one uses a combination of uni-, bi-, tri-, and 4-grams obtained from the training data: the 200,000 [1 − 4]-grams with the highest IG as given by Equation 1 are selected and their raw term frequency is used as local weight, while their IG score is used as global weight. The second BoW utilizes a modified version of tf , namely augmented term frequency (atf ) <ref type="bibr" target="#b39">[40]</ref>, multiplied by idf :</p><formula xml:id="formula_3">atf -idf (t, d) = a + (1 − a) tf t max(tf ) • log n d df (d, t) ,<label>(3)</label></formula><p>with max(tf ) being the maximum frequency of any term in the document, the total number of documents n d , and the smoothing parameter a, which is set to 0.3 for this model. 
This BoW, as well as the third one, contains all unigrams of the training corpus. The local weight of the third model consists of the logarithmic term frequency (logtf) <ref type="bibr" target="#b29">[30]</ref> and the global weight is given by relevance frequency (rf) <ref type="bibr" target="#b19">[20]</ref>, which can be combined as:</p><formula xml:id="formula_4">logtf\text{-}rf(t, d) = (1 + \log(tf)) \cdot \log_2\left( 2 + \frac{df_{t,+}}{\max(1, df_{t,-})} \right),<label>(4)</label></formula><p>where df_{t,+} and df_{t,-} are the numbers of documents in the depressed and non-depressed class, respectively, that contain the term t. The final model of this ensemble uses the hand-crafted user features described in section 3.2.</p><p>All three bags of words and the hand-crafted features were each used as input for a separate logistic regression classifier. Due to the imbalanced class distribution, a modified class weight was used for these classifiers, similar to the original task paper <ref type="bibr" target="#b21">[22]</ref>, to increase the cost of false negatives. It was calculated for the non-depressed class as 1/(1 + w) and for the depressed class as w/(1 + w), with w_1 = 2, w_2 = 6, w_3 = 2, and w_4 = 4 in the order in which the different models have been described above. The final output probabilities were calculated as the unweighted mean of all four logistic regression probabilities. Each week, this ensemble predicted any user with a probability above 0.5 as depressed and users below 0.15 as non-depressed, while in the final week all users with a probability equal to or less than 0.5 were predicted as non-depressed.</p></div>
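The atf-idf weighting of Equation (3), including the cosine normalization used by all ensemble members, can be sketched for a single document as follows (a minimal NumPy sketch over raw frequency arrays; the paper's actual pipeline is not published):

```python
import numpy as np

# Sketch of the augmented-tf-idf weighting of Equation (3) with smoothing
# a = 0.3 as in the paper. tf: term frequencies within one document;
# df: document frequencies of the same terms; n_docs: corpus size.
def atf_idf(tf, df, n_docs, a=0.3):
    tf = np.asarray(tf, dtype=float)
    atf = a + (1.0 - a) * tf / tf.max()            # augmented term frequency
    idf = np.log(n_docs / np.asarray(df, float))   # inverse document frequency
    weights = atf * idf
    return weights / np.linalg.norm(weights)       # cosine (l2) normalization
```

The smoothing parameter a keeps rare terms from vanishing entirely: even a term with tf = 0 in the document retains the floor weight a times its idf before normalization.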
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Paragraph Vector -BCSGB</head><p>The second model is based on document vectorization by using Paragraph Vector <ref type="bibr" target="#b20">[21]</ref>, sometimes referred to as doc2vec, similar to the previously published word2vec <ref type="bibr" target="#b25">[26,</ref><ref type="bibr" target="#b26">27]</ref> on which it is based. While word2vec is used to train embedded word vectors from a large text corpus, Paragraph Vector learns vector representations for sentences, paragraphs, or whole documents. It was also found that Paragraph Vector can work better for smaller corpora than word2vec, which potentially makes it a viable option for this task. The two neural network architectures for each of these methods are all based on the probabilistic Neural Network Language Model <ref type="bibr" target="#b2">[3]</ref>.</p><p>For the Paragraph Vector classification of eRisk users, two separate models have been trained based on the training documents using the Python implementation in gensim 1.0.1 <ref type="bibr" target="#b33">[34]</ref>:</p><p>1. A Distributed Bag of Words model with 100 dimensional output, 10 training epochs, a context window of 10 words, negative sampling with 20 noise words, no downsampling, a learning rate from 0.025 to 1e−4, and all words contained in the documents. 2. A Distributed Memory model using the sum of input words with 100 dimensional output, 10 training epochs, a context window of 10, hierarchical softmax, downsampling of high-frequency words with 1e−4, a learning rate from 0.025 to 1e−4, and all words contained in the documents.</p><p>The output vectors of these two models were concatenated, as recommended by the developers <ref type="bibr" target="#b20">[21]</ref>, resulting in a 200 dimensional vector per document. Text content and title of the documents have again been concatenated and each of the resulting texts was used as separate input to Paragraph Vector. 
Test documents were vectorized by using an inference step that only outputs a new document vector and leaves all network weights fixed. Finally, the average of all documents by each user was calculated to obtain the average topic of everything the user has written. Figure <ref type="figure" target="#fig_4">4</ref> shows a two-dimensional representation of the averaged training document vectors calculated by t-SNE <ref type="bibr" target="#b23">[24]</ref>. Even after a reduction to only two dimensions, there is at least one clearly visible cluster of non-depressed users and a rather noisy cluster of depressed users. A logistic regression classifier was trained on the 200-dimensional averaged document vectors, using the same class weight equation as in the previous model with w = 4. The calculated class probabilities were again averaged with the probabilities obtained from the logistic regression based on the hand-crafted user features. Since this model depends more on the number of documents it has been trained on, the final predictions were based on the probability as well as the number of documents written by the user to prevent too many false positives. Depressed predictions were submitted for probabilities from 0.6 with at least 20 documents, 0.7 with at least 10 documents, and all probabilities above 0.9, while non-depressed predictions required a probability below 0.1 with at least 20 documents, 0.05 with at least 10 documents, or a probability below 0.01.</p></div>
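The weekly decision rule described above translates directly into code; in this sketch `prob` stands for the averaged logistic regression probability and `None` means no prediction is submitted yet and further chunks are awaited:

```python
# Sketch of BCSGB's weekly decision rule as described in the text.
# prob: averaged class probability for the user; n_docs: documents seen so far.
def user_decision(prob, n_docs):
    """Return 'depressed', 'non-depressed', or None (wait for more chunks)."""
    if prob > 0.9 or (prob >= 0.7 and n_docs >= 10) or (prob >= 0.6 and n_docs >= 20):
        return "depressed"
    if prob < 0.01 or (prob < 0.05 and n_docs >= 10) or (prob < 0.1 and n_docs >= 20):
        return "non-depressed"
    return None
```

Coupling the thresholds to the document count is what limits false positives early on: a probability of 0.65 triggers a depressed prediction only once at least 20 documents have been read, while the same probability after 5 documents defers the decision.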
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Class non-depressed depressed</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">LSTM with LSA Vectors -BCSGC</head><p>This and the following two models are based on a Tensorflow <ref type="bibr" target="#b0">[1]</ref> neural network approach using an LSTM layer. By using sequences of text documents as input, the LSTM network allows to learn a general context of each user's documents while processing them in chronological order. All three LSTM models also use the hand-crafted user features as an additional meta data input and merge them with the LSTM output in a fully connected layer. This again ensures that these features are not lost after document vectorization and averaging. A final softmax layer was used to produce the actual output probabilities, the softsign function <ref type="bibr" target="#b3">[4]</ref> was chosen as activation for the LSTM cell, and dropout was added to prevent overfitting. The training steps of this and the following two LSTM models utilized Adam <ref type="bibr" target="#b18">[19]</ref> to minimize the cross-entropy loss.</p><p>For this first LSTM approach, LSA was used to reduce the BoW vectorized documents to a viable number of dimensions based on Singular Value Decomposition (SVD). All documents were first transformed into a BoW by selecting only the 10,000 unigrams with the highest IG and using their term frequency multiplied by their IG as term weighting. LSA was then used to reduce these document vectors to 100 dimensions, which retained 90.32% of the original variance in the training dataset. To obtain an equal sequence length for all users that is viable as network input, the document sequences were modified to have a length of 25 documents: For users with fewer documents, zero vectors were appended, while two randomly selected consecutive document vectors were averaged for longer sequences, until the maximum length was reached. 
Adam was then used with a fixed learning rate of 1e−4, 64 units were added to the LSTM cell, a dropout keep probability of 80% was applied, and the network was trained for 300 epochs.</p><p>Similar to the previous model, prediction thresholds were based on the network's output probability and the number of documents. Depressed predictions required a probability above 0.5 and at least 20 documents, above 0.7 and at least five documents, or above 0.9, while non-depressed predictions were submitted for probabilities below 0.05.</p></div>
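The sequence-length unification described above (zero-padding short users, averaging randomly chosen consecutive vectors for long ones) can be sketched as:

```python
import random
import numpy as np

# Sketch of the sequence-length unification: pad short users with zero
# vectors; for long users, repeatedly average two randomly chosen consecutive
# document vectors until the target length remains.
def unify_sequence(vectors, length=25):
    vectors = [np.asarray(v, dtype=float) for v in vectors]
    dim = vectors[0].shape[0]
    while len(vectors) < length:
        vectors.append(np.zeros(dim))          # pad with zero vectors
    while len(vectors) > length:
        i = random.randrange(len(vectors) - 1)
        merged = (vectors[i] + vectors[i + 1]) / 2.0
        vectors[i:i + 2] = [merged]            # replace the pair by its mean
    return np.stack(vectors)
```

Averaging consecutive vectors (rather than dropping documents) keeps the chronological order intact while preserving some signal from every document, which matters for an LSTM that reads the sequence in order.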
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">LSTM with Paragraph Vectors -BCSGD</head><p>This fourth model utilized the same LSTM network as described for the previous one with identical parameters, except for a number of 128 hidden units in the LSTM cell and a training duration of 170 epochs. For the input sequences, documents were vectorized based on the two concatenated Paragraph Vector models of the second approach. Again, the resulting sequences of 200-dimensional document vectors were modified to have a unified length of 25. The model was configured to submit depressed predictions for any user with a probability above 0.3 and at least 50 documents, above 0.4 and at least 20 documents, or above 0.7, while probabilities below 0.01 resulted in a non-depressed prediction.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Late LSTM with Paragraph Vectors -BCSGE</head><p>To have some additional time for model optimization and to compare the impact on the ERDE score, the fifth model was not used to submit any predictions until the last week. It is identical to the fourth model but uses two new, 200dimensional Paragraph Vector models that were trained on both training and test documents. This is an unsupervised that uses only text documents without any label information. Also, this model uses a second fully connected layer before the softmax layer, Rectified Linear Unit (ReLU) activation <ref type="bibr" target="#b14">[15]</ref> for both fully connected layers, a weight decay factor of 0.001 for all weights in the network, exponential learning rate decay from 1e−4 to 1e−5, a dropout keep probability of 70% for LSTM outputs, 128 hidden units in the LSTM, and was trained using batches of 100 users over 130 epochs. The document sequence length was again unified to 25 and a minority oversampling that duplicates each depressed user in the training input was used to counter the class imbalance. The final network architecture for this model is displayed in Fig. <ref type="figure" target="#fig_5">5</ref>, where m u represents the meta data for a single user u and x u,t is the sequence of input documents written by this user. In the final week, predictions obtained from this model were submitted based on the same thresholds that were used for the previous one. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results</head><p>Before discussing the official task results, analyzing the number of correctly classified depressed individuals using the five BCSG models gives a first insight into the classification performance. The cumulative number of depressed predictions and actual true positives per model and week is shown in Fig. <ref type="figure" target="#fig_6">6</ref>. A horizontal line marks the total number of 52 depressed samples in the test set for reference. It becomes evident that there is still considerable room for improvement. Although each model detects a growing number of depressed users over the ten weeks, the proportion of false positives is large and the total number of true positives ranges between 24 and 38 of the 52 depressed users in the test set. Most true positives were found by the fifth model, but at the cost of nearly as many false positives. This could at least partially be mitigated by finding better prediction thresholds.</p><p>The final submissions to the CLEF 2017 early risk detection pilot task were scored using the ERDE_5 and ERDE_50 measures for early detection tasks defined by the organizers, as well as the F_1 score. The scores and the underlying precision and recall values of all models have been published <ref type="bibr" target="#b22">[23]</ref> and are visualized in Fig. <ref type="figure" target="#fig_7">7</ref>, which shows the evaluation results of all eight participants and their up to five models each. The highlighted models of BCSG consistently achieved top ranks, and even the fifth model was ranked in the top half according to both ERDE scores despite only submitting predictions in the last week. The achieved ERDE scores for this task cannot be compared to the results previously published by the organizers <ref type="bibr" target="#b21">[22]</ref>, since the documents had to be processed in weekly chunks for the task and it was not possible to submit predictions before processing a complete chunk. BCSG achieved its best results with the BoW model BCSGA (first in F_1 and second in ERDE_50) and the Paragraph Vector model BCSGB (first in ERDE_5), with the LSTM models close behind.</p><p>Since results for BCSGE are only available for the last week, it was evaluated again for all weeks after the golden truth file was published. For this ex post analysis, separate Paragraph Vector models were trained using the training data and the already released test data for each week. If BCSGE had been used from the first week, the results would have been 16.01% in ERDE_5, 9.78% in ERDE_50, and 0.46 in F_1. While this ERDE_50 score would have been the third best overall, the other scores show that this model is still not well optimized and produces too many false positives. Future work will examine the effect of hand-crafted features and preprocessing methods on the prediction results. A quick ex post analysis using the first two models BCSGA and BCSGB has shown that the selected hand-crafted features had at least a slightly positive effect (13.04% in ERDE_5, 9.75% in ERDE_50, and 0.63 in F_1 for BCSGA without hand-crafted features), with the exception of the ERDE_5 score for BCSGB, which would have been marginally better without hand-crafted features at the cost of a much worse F_1 score (12.67% in ERDE_5, 10.76% in ERDE_50, and 0.37 in F_1 for BCSGB).</p></div>
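<p>As a reading aid for the scores discussed above, the piecewise ERDE_o measure defined by the task organizers <ref type="bibr" target="#b21">[22]</ref> can be sketched as follows; the cost values used here are illustrative placeholders, not the official task parameters.</p>

```python
import math

# Sketch of the ERDE_o measure: false positives cost c_fp, false negatives
# cost c_fn, and a true positive emitted only after reading k documents is
# discounted by a latency cost lc_o(k) that grows with k. The cost values
# below are illustrative, not the official task parameters.
def erde(decision, truth, k, o, c_fp=0.13, c_fn=1.0, c_tp=1.0):
    if decision == 1 and truth == 0:
        return c_fp                                   # false positive
    if decision == 0 and truth == 1:
        return c_fn                                   # missed depressed user
    if decision == 1 and truth == 1:
        lc = 1.0 - 1.0 / (1.0 + math.exp(k - o))      # latency cost in (0, 1)
        return lc * c_tp                              # delayed true positive
    return 0.0                                        # true negative

# Under ERDE_50, a correct depressed prediction after 1 document is almost
# free, while the same prediction after 100 documents costs nearly as much
# as missing the user entirely.
early = erde(1, 1, k=1, o=50)
late = erde(1, 1, k=100, o=50)
```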
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions</head><p>The pilot task for early detection of depression has highlighted a variety of challenges posed by this area of research. These challenges go beyond distinguishing actual clinical depression from a normal dejected mood and from other, more or less related mental disorders such as anxiety disorders, PTSD, or bipolar disorder. In the context of online platforms, several other frequent sources of false positives could be observed in this task: relatives of depressed individuals and therapists offering advice can easily be mistaken for depressed cases when too much weight is given to single words or phrases. Drug users (drug use might indeed be an accompanying factor of depression <ref type="bibr" target="#b17">[18,</ref><ref type="bibr" target="#b5">6]</ref>) and authors posting fictional stories were also regularly spotted as false positives. On the other hand, there are individuals who post hundreds of very ordinary comments but suddenly start expressing their feelings and talking about their depression. Such cases would be easier to predict by models that treat each document separately instead of using the whole history of a user.</p><p>The final results show that all chosen approaches are generally suitable for early detection of depression and that all of them are of interest for future research. Due to the promising results using Paragraph Vector, optimizing these models and applying similar word and document embedding methods such as fastText <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b16">17]</ref> and GloVe <ref type="bibr" target="#b31">[32]</ref> could be a priority for future work. The introduced neural network approaches with LSTM cells have been shown to be viable as well and allow for a variety of possible extensions and optimizations. 
Better prediction thresholds optimized based on ERDE scores, or more specific signals for depressed predictions, could help make earlier predictions without too many false positives. Finally, the collected meta information on the user base can be extended to utilize emotion lexica <ref type="bibr" target="#b27">[28]</ref>, psychological and social insights obtained for example from LIWC, and additional statistical text features.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Fig. 1. Uni-, bi-, and trigrams with highest information gain for the whole corpus (left) and after excluding words that occur more often in the non-depressed class (right). In both text clouds, larger font size corresponds to higher information gain.</figDesc></figure>
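<p>The ERDE-driven threshold optimization suggested above can take the form of a simple grid search over decision thresholds; the probabilities, labels, delays, and cost values in this sketch are hypothetical illustrations.</p>

```python
import math

# Choose a prediction threshold by minimizing the mean ERDE over users
# instead of a plain accuracy criterion. Cost values are illustrative.
def erde(decision, truth, k, o=50, c_fp=0.13, c_fn=1.0, c_tp=1.0):
    if decision and not truth:
        return c_fp
    if not decision and truth:
        return c_fn
    if decision and truth:
        return (1.0 - 1.0 / (1.0 + math.exp(k - o))) * c_tp
    return 0.0

def best_threshold(probs, truths, ks, grid):
    """Return the grid threshold with the lowest mean ERDE."""
    def mean_erde(t):
        return sum(erde(p >= t, y, k)
                   for p, y, k in zip(probs, truths, ks)) / len(probs)
    return min(grid, key=mean_erde)

# Illustrative users: predicted probability, true label, and the number of
# documents read when the decision would be emitted.
probs = [0.9, 0.2, 0.6, 0.4]
truths = [1, 0, 1, 0]
ks = [3, 3, 3, 3]
threshold = best_threshold(probs, truths, ks, grid=[0.1, 0.5, 0.8])
```

A low threshold trades extra false positives for earlier true positives, so the ERDE costs, not accuracy, decide where the optimum lies.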
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Fig. 2 .</head><label>2</label><figDesc>Fig. 2. Boxplots of text features for both classes per user in the eRisk training dataset.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Fig. 3 .</head><label>3</label><figDesc>Fig. 3. Correlation matrix of all user features including the class information (nondepressed/depressed).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Fig. 4 .</head><label>4</label><figDesc>Fig. 4. Plot of the t-SNE reduced averaged document vectors per user for the Paragraph Vector model (BCSGB).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Fig. 5 .</head><label>5</label><figDesc>Fig. 5. Network architecture of the final LSTM model for BCSGE.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Fig. 6 .</head><label>6</label><figDesc>Fig. 6. Cumulative number of depressed predictions (blue plus gray bars) and proportion of true positives (blue bars only) per model after each week of the task. A horizontal line marks the 52 depressed samples in the test data.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Fig. 7 .</head><label>7</label><figDesc>Fig. 7. Official results of the eRisk pilot task in terms of ERDE5, ERDE50 and F1 score. The results of BCSG are highlighted. This plot is best viewed in electronic form.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Characteristics of the training and test datasets.</figDesc><table><row><cell>Training</cell><cell>Test</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>Terms of the text clouds in Fig. 1: general high-information-gain n-grams (e.g. "i'm not", "i don't", "depression", "feel like", "my life") on the left, and n-grams occurring more often in the depressed class (e.g. "anxiety", "my anxiety", "therapist", "meds", "my depression", "diagnosed with") on the right.</figDesc></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_0">http://www.webmd.com/depression/guide/depression-medications-antidepressants -Accessed on 2017-05-07</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems</title>
		<author>
			<persName><forename type="first">M</forename><surname>Abadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Barham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Brevdo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Citro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Davis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Devin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ghemawat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Harp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Irving</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Isard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jozefowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kudlur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Levenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mané</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Schuster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Monga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Moore</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Murray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Olah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shlens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Steiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Talwar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tucker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vanhoucke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vasudevan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Viégas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Warden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wattenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wicke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zheng</surname></persName>
		</author>
		<ptr target="https://www.tensorflow.org" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note>Software available from tensorflow.org</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">T</forename><surname>Beck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">A</forename><surname>Alford</surname></persName>
		</author>
		<title level="m">Depression: Causes and Treatment. Second Edition</title>
				<imprint>
			<publisher>University of Pennsylvania Press</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A Neural Probabilistic Language Model</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ducharme</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Vincent</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jauvin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="1137" to="1155" />
			<date type="published" when="2003-02">Feb. 2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Quadratic Polynomials Learn Better Image Features</title>
		<author>
			<persName><forename type="first">J</forename><surname>Bergstra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Desjardins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lamblin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<idno>1337</idno>
		<imprint>
			<date type="published" when="2009">2009</date>
		</imprint>
		<respStmt>
			<orgName>Département d&apos;Informatique et de Recherche Opérationnelle ; Université de Montréal</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.04606</idno>
		<title level="m">Enriching Word Vectors with Subword Information</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Drug Use and the Risk of Major Depressive Disorder, Alcohol Dependence, and Substance Use Disorders</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Brook</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Brook</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Whiteman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Arch Gen Psychiatry</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="1039" to="1044" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Readability Revisited: The New Dale-Chall Readability Formula</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Chall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Dale</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1995">1995</date>
			<publisher>Brookline Books</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Readability Helps the Level</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Christensen</surname></persName>
		</author>
		<ptr target="http://www.csun.edu/~vcecn006/read1.html" />
		<imprint>
			<date type="published" when="2000">2000. 2017-04-21</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">CLPsych 2015 Shared Task: Depression and PTSD on Twitter</title>
		<author>
			<persName><forename type="first">G</forename><surname>Coppersmith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dredze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Harman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hollingshead</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mitchell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality</title>
				<meeting>the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="31" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A Formula for Predicting Readability: Instructions</title>
		<author>
			<persName><forename type="first">E</forename><surname>Dale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Chall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Educational Research Bulletin</title>
		<imprint>
			<biblScope unit="page" from="37" to="54" />
			<date type="published" when="1948">1948</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Indexing by Latent Semantic Analysis</title>
		<author>
			<persName><forename type="first">S</forename><surname>Deerwester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Dumais</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">W</forename><surname>Furnas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">K</forename><surname>Landauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Harshman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Society for Information Science</title>
		<imprint>
			<biblScope unit="volume">41</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="391" to="407" />
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A New Readability Yardstick</title>
		<author>
			<persName><forename type="first">R</forename><surname>Flesch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Applied Psychology</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="221" to="233" />
			<date type="published" when="1948">1948</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Deep Learning</title>
		<author>
			<persName><forename type="first">I</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Courville</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">The Technique of Clear Writing</title>
		<author>
			<persName><forename type="first">R</forename><surname>Gunning</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1952">1952</date>
			<publisher>McGraw-Hill</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Permitted and Forbidden Sets in Symmetric Threshold-Linear Networks</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">H</forename><surname>Hahnloser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">S</forename><surname>Seung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Slotine</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Computation</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="621" to="638" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Long Short-Term Memory</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Computation</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1735" to="1780" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Bag of Tricks for Efficient Text Classification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.01759</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">The Self-Medication Hypothesis of Addictive Disorders: Focus on Heroin and Cocaine Dependence</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Khantzian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The American Journal of Psychiatry</title>
		<imprint>
			<biblScope unit="volume">142</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="1259" to="1264" />
			<date type="published" when="1985">1985</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Adam: A Method for Stochastic Optimization</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6980</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Conference on Learning Representations (ICLR)</title>
				<meeting>the 3rd International Conference on Learning Representations (ICLR)<address><addrLine>San Diego</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Proposing a New Term Weighting Scheme for Text Categorization</title>
		<author>
			<persName><forename type="first">M</forename><surname>Lan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-L</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-B</forename><surname>Low</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st National Conference on Artifical Intelligence (AAAI-06)</title>
				<meeting>the 21st National Conference on Artifical Intelligence (AAAI-06)</meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="763" to="768" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Distributed Representations of Sentences and Documents</title>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st International Conference on Machine Learning (ICML)</title>
				<meeting>the 31st International Conference on Machine Learning (ICML)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="1188" to="1196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">A Test Collection for Research on Depression and Language Use</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Losada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction: 7th International Conference of the CLEF Association</title>
				<meeting><address><addrLine>Évora, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="28" to="39" />
		</imprint>
	</monogr>
	<note>CLEF 2016</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">eRisk 2017: CLEF Lab on Early Risk Prediction on the Internet: Experimental Foundations</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Losada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Parapar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings Conference and Labs of the Evaluation Forum CLEF 2017</title>
				<meeting>Conference and Labs of the Evaluation Forum CLEF 2017<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Visualizing Data Using t-SNE</title>
		<author>
			<persName><forename type="first">L</forename><surname>van der Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="2579" to="2605" />
			<date type="published" when="2008-11">Nov. 2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Introduction to Information Retrieval</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Raghavan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
		<ptr target="https://nlp.stanford.edu/IR-book/pdf/irbookonlinereading.pdf" />
		<imprint>
			<date type="published" when="2009">2009. 2017-04-21</date>
			<publisher>Cambridge University Press</publisher>
		</imprint>
	</monogr>
	<note>Online Edition</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Efficient Estimation of Word Representations in Vector Space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Workshop at International Conference on Learning Representations ICLR 2013</title>
				<meeting>Workshop at International Conference on Learning Representations ICLR 2013</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Distributed Representations of Words and Phrases and their Compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="3111" to="3119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Crowdsourcing a Word-Emotion Association Lexicon</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Mohammad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Turney</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Intelligence</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="436" to="465" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">Identifying Depression on Twitter</title>
		<author>
			<persName><forename type="first">M</forename><surname>Nadeem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Horn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Coppersmith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.07384</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">A Study of Information Retrieval Weighting Schemes for Sentiment Analysis</title>
		<author>
			<persName><forename type="first">G</forename><surname>Paltoglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Thelwall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 48th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1386" to="1395" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Screening Twitter Users for Depression and PTSD with Lexical Decision Lists</title>
		<author>
			<persName><forename type="first">T</forename><surname>Pedersen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality</title>
				<meeting>the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="46" to="53" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">GloVe: Global Vectors for Word Representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
		<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Mental Illness Detection at the World Well-Being Project for the CLPsych 2015 Shared Task</title>
		<author>
			<persName><forename type="first">D</forename><surname>Preoţiuc-Pietro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Schwartz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ungar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality</title>
				<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="40" to="45" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Software Framework for Topic Modelling with Large Corpora</title>
		<author>
			<persName><forename type="first">R</forename><surname>Řehůřek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sojka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</title>
				<meeting>the LREC 2010 Workshop on New Challenges for NLP Frameworks</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="45" to="50" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">The University of Maryland CLPsych 2015 Shared Task System</title>
		<author>
			<persName><forename type="first">P</forename><surname>Resnik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Armstrong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Claudino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Nguyen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality</title>
				<meeting>the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="54" to="60" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Language Use of Depressed and Depression-Vulnerable College Students</title>
		<author>
			<persName><forename type="first">S</forename><surname>Rude</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E.-M</forename><surname>Gortner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pennebaker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cognition &amp; Emotion</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1121" to="1133" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Term-Weighting Approaches in Automatic Text Retrieval</title>
		<author>
			<persName><forename type="first">G</forename><surname>Salton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Buckley</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="513" to="523" />
			<date type="published" when="1988">1988</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Language Changes as an Important Psychopathological Phenomenon of Mild Depression</title>
		<author>
			<persName><forename type="first">D</forename><surname>Smirnova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sloeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kuvshinova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Krasnov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Romanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Nosachev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">European Psychiatry</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods</title>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">R</forename><surname>Tausczik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Pennebaker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Language and Social Psychology</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="24" to="54" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Reducing Over-Weighting in Supervised Term Weighting for Sentiment Analysis</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014)</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1322" to="1330" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
