<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">An End-to-End Set Transformer for User-Level Classification of Depression and Gambling Disorder</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ana-Maria</forename><surname>Bucur</surname></persName>
							<email>ana-maria.bucur@drd.unibuc.ro</email>
							<affiliation key="aff0">
								<orgName type="department">Interdisciplinary School of Doctoral Studies</orgName>
								<orgName type="institution">University of Bucharest</orgName>
								<address>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">PRHLT Research Center</orgName>
								<orgName type="institution">Universitat Politècnica de València</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Adrian</forename><surname>Cosma</surname></persName>
							<email>cosma.i.adrian@gmail.com</email>
							<affiliation key="aff2">
								<orgName type="institution">Politehnica University of Bucharest</orgName>
								<address>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Liviu</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
							<email>ldinu@fmi.unibuc.ro</email>
							<affiliation key="aff3">
								<orgName type="department">Faculty of Mathematics and Computer Science</orgName>
								<orgName type="institution">University of Bucharest</orgName>
								<address>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
							<affiliation key="aff4">
								<orgName type="department">Human Language Technologies Research Center</orgName>
								<orgName type="institution">University of Bucharest</orgName>
								<address>
									<country key="RO">Romania</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
							<email>prosso@dsic.upv.es</email>
							<affiliation key="aff1">
								<orgName type="department">PRHLT Research Center</orgName>
								<orgName type="institution">Universitat Politècnica de València</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">An End-to-End Set Transformer for User-Level Classification of Depression and Gambling Disorder</title>
					</analytic>
					<monogr>
						<meeting>Evaluation Forum
							<address>
								<addrLine>September 5-8, 2022</addrLine>
								<settlement>Bologna</settlement>
								<country key="IT">Italy</country>
							</address>
						</meeting>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">973F9F248C416677FB88F6373E06A926</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T03:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>set transformer</term>
					<term>sentence encoder</term>
					<term>gambling disorder detection</term>
					<term>depression detection</term>
					<term>social media</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This work proposes a transformer architecture for user-level classification of gambling addiction and depression that is trainable end-to-end. As opposed to other methods that operate at the post level, we process a set of social media posts from a particular individual to make use of the interactions between posts and to eliminate label noise at the post level. We exploit the fact that, without positional encodings, multi-head attention is permutation invariant, and we process randomly sampled sets of a user's texts after encoding them with a modern pretrained sentence encoder (RoBERTa / MiniLM). Moreover, our architecture is interpretable with modern feature attribution methods and allows for automatic dataset creation by identifying discriminating posts in a user's text-set. We perform ablation studies on hyper-parameters and evaluate our method on the eRisk 2022 Lab tasks on early detection of signs of pathological gambling and early risk detection of depression. The method proposed by our team BLUE obtained the best ERDE5 score of 0.015 and the second-best ERDE50 score of 0.009 for pathological gambling detection. For the early detection of depression, we obtained the second-best ERDE50 of 0.027.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>How much can one know about someone from their social media interactions? Billions of people use social media sites like Facebook, Instagram, Twitter, and Reddit every day. While sites like Facebook and Instagram encourage users to use their real names, websites such as Reddit are often praised for enabling users to hide behind a pseudonym, offering the illusion of privacy. Under the guise of anonymity, users tend to post more personal information related to their lives and everyday struggles, instead of striving to maintain an image and a persona as they do when their identities are open <ref type="bibr" target="#b0">[1]</ref>. Many aspects of a user's personal life can be uncovered in their posting history. No single post is all-encompassing; rather, the information is scattered across many unrelated comments and posts. For instance, on the r/relationship_advice<ref type="foot" target="#foot_0">2</ref> subreddit a user might reveal their gender and age when discussing intimate relationship struggles, while on r/depression<ref type="foot" target="#foot_1">3</ref> a user might provide clues to their internal conflicts and experiences.</p><p>In the task of detecting mental health disorders from social media text, many approaches operate at the post level <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4]</ref>, on the assumption that, for instance, if a user is depressed, then all their posts might contain some information regarding this issue. However, we posit that post-level classification is unsuitable: many posts are unrelated and uninformative for the particular task. Their interaction, however, might contain clues to the mental well-being of a user.</p><p>As such, we propose an architecture that performs user-level classification by processing a set of posts from a user. 
We exploit the fact that the multi-head attention operation in transformers is permutation invariant and feed multiple texts from a single user into the network, modeling their interaction and classifying the user. This approach has several advantages: (i) it is trainable end-to-end, mitigating the need for hand-crafted construction of global user features; (ii) it is robust to label noise, since some posts might be uninformative and the network learns to ignore them in the decision; and (iii) it is interpretable: using feature attribution methods <ref type="bibr" target="#b4">[5]</ref>, we can extract the most important posts for the decision.</p><p>The Early Risk Prediction on the Internet (eRisk) <ref type="foot" target="#foot_2">4</ref> Lab started in 2017 with one pilot task and has since tackled the early risk detection of several mental illnesses: depression, self-harm, eating disorders, and pathological gambling. This work showcases team BLUE's proposed approach for Tasks 1 and 2 of the eRisk 2022 Lab <ref type="bibr" target="#b5">[6]</ref>, on gambling and depression detection, respectively.</p><p>The paper makes the following contributions:</p><p>1. We propose a set-based transformer architecture for user-level classification, which makes a decision by processing multiple texts of a particular user. 2. We show that our architecture is robust to label noise and is interpretable with modern feature attribution methods, allowing it to be used as a dataset filtering tool. 3. We obtained promising results on the eRisk 2022 tasks on early risk detection of pathological gambling (best ERDE 5<ref type="foot" target="#foot_3">5</ref> score of 0.015 and the second-best ERDE 50 score of 0.009) and depression detection (second-best ERDE 50 of 0.027).</p></div>
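The permutation invariance relied on above is easy to verify numerically. Below is a minimal NumPy sketch (single-head self-attention, random vectors standing in for sentence embeddings; not the authors' implementation) showing that shuffling the posts in a text-set leaves the mean-pooled user representation unchanged:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention without positional encodings.
    X: (n_texts, d) matrix of text embeddings."""
    d = X.shape[1]
    A = softmax(X @ X.T / np.sqrt(d), axis=-1)  # post-to-post attention weights
    return A @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))   # 8 hypothetical post embeddings of dim 16
perm = rng.permutation(8)      # an arbitrary re-ordering of the posts

pooled = self_attention(X).mean(axis=0)          # user-level representation
pooled_perm = self_attention(X[perm]).mean(axis=0)

# Attention is permutation-equivariant, so mean pooling makes the
# user-level representation permutation-invariant.
assert np.allclose(pooled, pooled_perm)
```

Because re-ordering the inputs only permutes the rows of the attention output, any symmetric pooling (here, the mean) erases the ordering entirely.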
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Pathological Gambling For the detection of gambling disorder, the eRisk Lab is the first to use social media data for the assessment of gambling risk. Automated methods usually rely on data from behavioral markers <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8]</ref> or personality biomarkers <ref type="bibr" target="#b8">[9]</ref>. In the first iteration of the task for gambling addiction detection, the best-performing systems were developed by Maupomé et al. <ref type="bibr" target="#b9">[10]</ref> and Loyola et al. <ref type="bibr" target="#b10">[11]</ref>. Maupomé et al. <ref type="bibr" target="#b9">[10]</ref> used a user-level approach based on the similarity distance between the vector of topic probabilities of the texts of the users to be assessed for pathological gambling risk and testimonials or items from a self-evaluation questionnaire for compulsive gamblers. With this method, the authors obtained the best ERDE 5 of 0.048. Loyola et al. <ref type="bibr" target="#b10">[11]</ref> attained the best ERDE 50 (0.020) and latency-weighted F1 (0.693) through a post-level rule-based early alert policy on bag-of-words text representations classified with an SVM.</p><p>Depression Depression detection from social media data is an interdisciplinary topic, and efforts have been made by researchers from both NLP and Psychology to detect different markers of depression found in the online discourse of individuals. 
Some depression cues found in language are: greater use of the first-person singular pronoun "I" <ref type="bibr" target="#b11">[12]</ref>, lesser use of the first-person plural "we" <ref type="bibr" target="#b12">[13]</ref>, increased use of negative or absolutist terms (e.g., "never", "forever") <ref type="bibr" target="#b13">[14]</ref>, and greater use of past-tense verbs <ref type="bibr" target="#b14">[15]</ref>.</p><p>For the task of early detection of depression, the best systems from the first iteration of the task (eRisk 2017) used as input linguistic meta-information extracted from the texts, such as LIWC <ref type="bibr" target="#b15">[16]</ref>, readability, and hand-crafted features <ref type="bibr" target="#b16">[17]</ref>, obtaining the best ERDE 5 (12.70%), or a combination of linguistic information and the temporal variation of terms in users' posts <ref type="bibr" target="#b17">[18]</ref>, achieving the best ERDE 50 (9.68%). The best-performing systems from eRisk 2018 were those of Funez et al. <ref type="bibr" target="#b18">[19]</ref> and Trotzek et al. <ref type="bibr" target="#b19">[20]</ref>. Funez et al. <ref type="bibr" target="#b18">[19]</ref> proposed a user-level approach using an SVM classifier on semantic representations that take into account the temporal variation of terms between the users' posts, achieving an ERDE 5 of 8.78%. On the other hand, the best ERDE 50 (6.44%) was attained by Trotzek et al. <ref type="bibr" target="#b19">[20]</ref> with a chunk-level<ref type="foot" target="#foot_4">6</ref> approach using an ensemble of logistic regression classifiers on bag-of-words features. 
The dataset from the eRisk depression detection task became an important resource, later used in research articles tackling the detection problem with approaches such as a neural network architecture on topic modeling features <ref type="bibr" target="#b20">[21]</ref>, SVM or deep learning architectures using fine-grained emotion features <ref type="bibr" target="#b21">[22]</ref>, and deep learning methods using content, writing style, and emotion features <ref type="bibr" target="#b22">[23]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Method</head><p>The transformer encoder, as proposed by Vaswani et al. <ref type="bibr" target="#b23">[24]</ref>, essentially consists of multiple sequential layers of multi-head attention. Scaled dot-product attention of a query 𝑄 relative to a set of keys 𝐾 and a set of values 𝑉 is computed with the following equation, where 𝑑_𝑘 is the dimensionality of the queries and keys:</p><formula xml:id="formula_0">Attention(𝑄, 𝐾, 𝑉) = softmax(𝑄𝐾^𝑇 / √𝑑_𝑘) 𝑉<label>(1)</label></formula><p>Multi-head attention consists of multiple applications of the attention mechanism to the same input:</p><formula xml:id="formula_1">MultiHead(𝑄, 𝐾, 𝑉) = Concat(head_1, head_2, …, head_ℎ) 𝑊^𝑂, where head_𝑖 = Attention(𝑄𝑊_𝑖^𝑄, 𝐾𝑊_𝑖^𝐾, 𝑉𝑊_𝑖^𝑉)<label>(2)</label></formula><p>In this formulation, multi-head attention is permutation invariant; the standard way to inject temporal information into the input sequence is to employ positional encodings <ref type="bibr" target="#b24">[25]</ref>. This is useful when processing sequential data such as text. However, by omitting positional encodings, the transformer essentially acts as a set encoder. Lee et al. <ref type="bibr" target="#b25">[26]</ref> introduced the Set Transformer, proving that multi-head attention is permutation invariant and that the Set Transformer is a universal approximator of permutation-invariant functions. We make use of this fact to perform user-level classification by processing sets of texts (in the form of social media posts) from a particular user. The intuition behind processing a set of texts from a user is that no single social media post is sufficiently informative for a classifier decision; rather, it is their interaction and the user's behavior as a whole that are informative. 
Moreover, through mean pooling, the inevitable noise (in the form of unrelated posts) is dampened, which aids classification in weakly-supervised scenarios such as ours, in which the user is labeled rather than each individual post.</p><p>We consider a user 𝑖 to have multiple social media posts 𝑈_𝑖. A set of 𝐾 texts 𝑡 is randomly sampled from 𝑈_𝑖, which defines our text-set 𝑆_𝑖 = {𝑡_𝑗 ∼ 𝑈_𝑖, 𝑗 ∈ (1 … 𝐾)}. We sample 𝐾 posts from the user's history instead of processing all of them due to memory limitations: some individuals have thousands of posts, while others have only on the order of tens. Sampling also introduces stochasticity into the training procedure, which helps prevent overfitting. For training, an input batch of size 𝑛 is defined by the concatenation of 𝑛 such text-sets: 𝐵 = {𝑆_𝑏1, 𝑆_𝑏2, …, 𝑆_𝑏𝑛}. We do not consider the relative order of the texts of a particular user, and text-sets are fed into the transformer encoder without positional encodings. Since some users have fewer than 𝐾 texts in total, creating a batch of text-sets is impossible without padding and masking. To sidestep this problem, we trained with an effective batch size of 1 and employed gradient accumulation to simulate a larger batch size.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> showcases our proposed model architecture for user-level classification. Each text in a text-set is embedded into a fixed-size vector using an available pretrained sentence encoder (i.e., RoBERTa / MiniLM). The text embeddings are fed into the transformer encoder network, and after processing, we perform mean pooling and output the decision. We compute binary cross-entropy at the user level, for a text-set. The pretrained sentence encoder is frozen and not updated during training.</p><p>Baytas et al. <ref type="bibr" target="#b26">[27]</ref> proposed a T-LSTM to process social media posts sequentially as a time series. 
The authors modify the LSTM architecture to include a relative time component. In our case, however, it is unclear how to incorporate such a mechanism into the transformer architecture, aside from using a relative positional encoding <ref type="bibr" target="#b27">[28]</ref>, which ignores long-range dependencies between posts. As such, we chose to ignore the temporal order of the posts and process them directly as a set. The main reason for considering the posts as a set is that, in a user's post history, many posts are uninformative for the modeling task; by processing a set of texts, label noise is reduced naturally as a direct consequence of the attention mechanism, which assigns more importance to informative posts. Training with a sufficiently large dataset might achieve a similar effect, but previous attempts at post-level classification have proven ineffective <ref type="bibr" target="#b3">[4]</ref>.</p><p>To assess the impact of the sentence representations, we chose two different sentence encoders: RoBERTa <ref type="bibr" target="#b28">[29]</ref> and MiniLM <ref type="bibr" target="#b29">[30]</ref>. We chose RoBERTa since it is one of the best-performing English language models on downstream tasks <ref type="bibr" target="#b28">[29]</ref>, and MiniLM, a multilingual model, since some users have social media posts in languages other than English. Figure <ref type="figure" target="#fig_1">2</ref> showcases the performance gap between the two sentence encoders, averaged across multiple values of 𝐾. RoBERTa yields consistently superior performance across training steps. Similarly, to assess the impact of the text-set size 𝐾, we performed an ablation study, as shown in Figure <ref type="figure" target="#fig_2">3</ref>. We kept the sentence encoder fixed to RoBERTa and varied the number of texts per user, 𝐾 ∈ {4, 8, 16, 32, 64, 128}. 
The best performance was achieved with 𝐾 = 16 and 𝐾 = 32 for Tasks 1 and 2, respectively.</p><p>In our final submission, we chose RoBERTa as the sentence encoder and sampled 𝐾 = 16 texts per user for Task 1 and 𝐾 = 32 for Task 2. We used the standard formulation of the transformer network <ref type="bibr" target="#b23">[24]</ref>, with 4 encoder layers, 8 attention heads each, and a dimensionality of 256. Both networks were trained for 120 epochs with the AdamW optimizer <ref type="bibr" target="#b30">[31]</ref> and a cyclical learning rate <ref type="bibr" target="#b31">[32]</ref> ranging from 0.00001 to 0.0001 across 6 epochs, with a batch size of 128. To account for class imbalance, we computed balanced class weights with respect to each dataset and adjusted the loss function accordingly. Finally, we opted for a very high threshold when predicting the final decision.</p><p>Our proposed architecture is easily interpretable using modern explainability methods for feature attribution <ref type="bibr" target="#b32">[33,</ref><ref type="bibr" target="#b33">34,</ref><ref type="bibr" target="#b4">5]</ref>, such as Integrated Gradients <ref type="bibr" target="#b4">[5]</ref>. This allows us to automatically identify social media posts containing signs of mental health disorders and to filter out uninformative posts.</p></div>
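Putting the pieces of this section together, the forward pass (sample K posts from the user's history, embed each with a frozen sentence encoder, run the set through attention without positional encodings, mean-pool, classify) can be sketched as below. This is a deliberately simplified single-head, single-layer NumPy illustration with a hash-seeded stub in place of the frozen RoBERTa / MiniLM encoder; all weights and names are hypothetical, not the trained system:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 4                       # embedding dim, texts sampled per user

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encode(texts):
    """Stand-in for a frozen pretrained sentence encoder (RoBERTa / MiniLM):
    deterministic hash-seeded embeddings, one D-dim vector per text."""
    return np.stack([np.random.default_rng(abs(hash(t)) % 2**32).normal(size=D)
                     for t in texts])

def set_transformer_forward(E, Wq, Wk, Wv, w_out):
    """One attention layer over a text-set (no positional encodings),
    mean pooling, then a logistic user-level decision."""
    d = Wq.shape[1]
    Q, Km, V = E @ Wq, E @ Wk, E @ Wv
    H = softmax(Q @ Km.T / np.sqrt(d)) @ V       # post-to-post interactions
    u = H.mean(axis=0)                           # permutation-invariant pooling
    return float(1 / (1 + np.exp(-u @ w_out)))   # P(user in positive class)

posts = [f"post number {i}" for i in range(30)]      # a user's post history
idx = rng.choice(len(posts), size=K, replace=False)  # sample a text-set
sample = [posts[i] for i in idx]

Wq, Wk, Wv = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
w_out = rng.normal(size=D) * 0.1
p = set_transformer_forward(encode(sample), Wq, Wk, Wv, w_out)
assert 0.0 < p < 1.0
```

Since no positional information is injected and the set is mean-pooled, presenting the same K posts in any order yields the same user-level probability.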
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Interpretability</head><p>Since our model operates on sets of social media texts from a particular user, we can employ model explainability methods to assess the importance of a piece of text to the model's decision. Through this, the posts most indicative of a user's condition can be automatically filtered and selected for use in dataset creation. This idea is similar to that of Ríssola et al. <ref type="bibr" target="#b2">[3]</ref>, who employed a series of heuristics to recognize posts portraying depression symptoms, in order to construct a post-level training set from existing depression datasets annotated at the user level. As such, we use Integrated Gradients <ref type="bibr" target="#b4">[5]</ref> to compute attribution scores for a text-set. The integrated gradients method has been used in NLP to explore the contribution of individual words and phrases to a decision made by a classifier. Since we operate not on words but on whole texts, the method identifies the texts most important to the classifier's decision. Figure <ref type="figure" target="#fig_3">4</ref> showcases selected samples ordered by their attribution score from the validation set of each task. All samples belong to the same user for each task, and the attribution scores are computed within the respective text-set. Posts with a high positive contribution to the decision contain more explicit descriptions of symptoms, while posts with more negative contributions are mainly unrelated to the particular mental illness. We use the integrated gradients method in one of our runs to select the most important posts in the user's history. However, we emphasize that the best application of this approach is automatic dataset creation in scenarios of weak supervision, which we aim to explore in future work.</p></div>
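As a concrete illustration of the attribution computation, the sketch below implements Integrated Gradients by hand for a toy differentiable stand-in for the classifier (mean pooling followed by a logistic unit), using a zero baseline and a midpoint Riemann approximation of the path integral. The model, weights, and names are illustrative assumptions, not the actual trained network:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def model(S, w):
    """Toy differentiable stand-in for the classifier: mean-pool the
    text-set S of shape (K, D), then apply a logistic unit."""
    return sigmoid(w @ S.mean(axis=0))

def grad_model(S, w):
    """Analytic gradient of model(S, w) w.r.t. each text embedding."""
    p = model(S, w)
    K = S.shape[0]
    g_pool = p * (1 - p) * w            # d model / d pooled vector
    return np.tile(g_pool / K, (K, 1))  # each text receives a 1/K share

def integrated_gradients(S, w, steps=200):
    """Midpoint Riemann approximation of IG with an all-zeros baseline;
    returns one attribution score per text in the set."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean([grad_model(a * S, w) for a in alphas], axis=0)
    return ((S - 0.0) * avg_grad).sum(axis=1)   # sum over embedding dims

rng = np.random.default_rng(0)
S = rng.normal(size=(5, 8))                     # a text-set of 5 encoded posts
w = rng.normal(size=8)
attr = integrated_gradients(S, w)

# Completeness axiom: attributions sum to f(input) - f(baseline).
assert np.isclose(attr.sum(), model(S, w) - model(np.zeros_like(S), w), atol=1e-3)
```

Ranking the per-text scores in `attr` produces the kind of ordering shown in Figure 4: the texts with the largest positive attributions are the ones driving a positive decision.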
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Evaluation</head><p>Two kinds of evaluation are used to measure the performance of the systems: decision-based and ranking-based. The decision-based evaluation quantifies the capacity of a system to perform binary classification, predicting whether a user belongs to the positive class (i.e., pathological gambling or depression) or the negative one. It comprises standard classification measures (Precision, Recall, F1) and measures specific to early detection that consider the delay and speed of the decision. The early risk detection error (ERDE) <ref type="bibr" target="#b34">[35]</ref> scores correct predictions with a late-decision penalty (for predictions taken after the first 5 or 50 submissions of a user). To overcome the limitations of this metric <ref type="bibr" target="#b35">[36]</ref>, the latency-weighted F1 score <ref type="bibr" target="#b36">[37]</ref> was also proposed for measuring the performance of early risk detection. Latency measures the delay in detecting true positives based on the median number of submissions seen by the system before taking a decision. The speed of a system that correctly predicts true positives from the first submission is equal to 1, while the speed of a slow system that decides only after processing hundreds of texts approaches 0. The latency-weighted F1 combines the F1 score with the delay in decision-taking for true positives; a perfect system would achieve a latency-weighted F1 of 1. Besides the binary classification decisions, the participating teams were also asked to submit a score estimating each user's risk for the ranking-based evaluation. These scores are used to rank users by their risk of pathological gambling or depression. Standard IR metrics (P@10, NDCG@10, and NDCG@100) are used to measure the models' ranking-based performance after processing 1, 100, 500, or 1000 submissions.</p></div>
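To make the late-decision penalty concrete, the sketch below follows our reading of the ERDE formulation of Losada and Crestani [35]: false positives and false negatives pay fixed costs, while true positives pay a sigmoid latency cost lc_o(k) = 1 - 1/(1 + e^(k-o)) that grows with the number k of writings seen before the decision. The cost values used here are illustrative assumptions, not the official lab configuration:

```python
import math

def erde(decision, truth, k, o, c_fp=0.05, c_fn=1.0, c_tp=1.0):
    """Early risk detection error for a single user.
    decision/truth: predicted and true labels (booleans);
    k: number of writings seen before the decision; o: deadline (5 or 50).
    Cost values are illustrative, not the official lab settings."""
    if decision and not truth:
        return c_fp                           # false positive: fixed cost
    if not decision and truth:
        return c_fn                           # false negative: fixed cost
    if decision and truth:
        lc = 1 - 1 / (1 + math.exp(k - o))    # latency penalty in (0, 1)
        return lc * c_tp                      # true positive: cost grows with delay
    return 0.0                                # true negative: no cost

# A correct alert after 1 writing is cheap; the same alert after 10
# writings costs almost as much as missing the user entirely.
assert erde(True, True, k=1, o=5) < erde(True, True, k=10, o=5)
assert erde(True, True, k=10, o=5) < erde(False, True, k=10, o=5)
```

This monotone penalty is what rewards the perfect latency and speed scores discussed in the next sections: a true positive fired at the first writing incurs almost no cost.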
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Task 1: Early Detection of Signs of Pathological Gambling</head><p>The first task addresses the detection of gambling addiction from social media data. As this was the second edition of the task, the organizers provided last year's test data for training the systems. The dataset was collected from Reddit, following the methodology described by Losada and Crestani <ref type="bibr" target="#b34">[35]</ref>, and contains a chronological sequence of posts from each user. The training dataset comprised 164 pathological gamblers, with a total of 54,674 submissions, and 2,184 control users with 1,073,883 submissions. The test dataset contains 81 users with gambling addiction, totaling 14,627 posts, and 1,998 control users with a total of 1,014,122 posts. For the testing phase, the submissions of users were released sequentially: the systems proposed by the participating teams received one submission at a time from all the users. We submitted three runs for the early detection of pathological gambling: Run 0 uses the text-set transformer model on the most recent 𝐾 = 16 posts for prediction; Run 1 uses the same text-set transformer model with, as input, the set of 𝐾 = 16 texts that are most important in a user's history, selected with Integrated Gradients; Run 2 is a baseline run, using the proposed model architecture to predict at the post level, on one sample at a time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1</head><p>Decision-based evaluation on Task 1: Early Detection of Signs of Pathological Gambling. We show the performance of our systems compared to the best-performing run from each team. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2</head><p>Ranking-based evaluation on Task 1: Early Detection of Signs of Pathological Gambling. Each cell lists P@10 / NDCG@10 / NDCG@100.</p><p>Team, Run ID | 1 writing | 100 writings | 500 writings | 1000 writings
BLUE 0 | 1.00 / 1.00 / 0.76 | 1.00 / 1.00 / 0.81 | 1.00 / 1.00 / 0.89 | 1.00 / 1.00 / 0.89
BLUE 1 | 1.00 / 1.00 / 0.76 | 1.00 / 1.00 / 0.89 | 1.00 / 1.00 / 0.91 | 1.00 / 1.00 / 0.91
BLUE 2 | 1.00 / 1.00 / 0.69 | 1.00 / 1.00 / 0.40 | 0.00 / 0.00 / 0.02 | 0.00 / 0.00 / 0.01
UNED-NLP 4 | 1.00 / 1.00 / 0.56 | 1.00 / 1.00 / 0.88 | 1.00 / 1.00 / 0.95 | 1.00 / 1.00 / 0.95
UNSL 0 | 1.00 / 1.00 / 0.68 | 1.00 / 1.00 / 0.90 | 1.00 / 1.00 / 0.93 | 1.00 / 1.00 / 0.95</p><p>Table <ref type="table">1</ref> showcases the performance of the systems measured using the decision-based metrics. Regarding ERDE, our first run (Run 0), using the transformer architecture on the most recent texts from each user, achieves the best ERDE 5 score of 0.015 and the second-best ERDE 50 score of 0.009, demonstrating that the system could detect the true positive cases early. The perfect scores for latency TP and speed show that our models were successful at detecting the true positive cases after the first writing. As expected, the baseline run using a post-level approach (Run 2) has the lowest performance. Regarding Run 1, we expected it to achieve the best performance among our submitted runs, as this approach is more aggressive in taking decisions by using for classification the most informative posts from users' history. Furthermore, our best run from this year's task surpasses all the runs from our participation in the first iteration of the task in 2021 <ref type="bibr" target="#b3">[4]</ref>, showing that a user-level approach considering a set of texts from each individual is more suitable than a post-level one. In Table <ref type="table">2</ref> we show the results of the ranking-based evaluation, in which each team had to submit rankings of users' risk of pathological gambling. 
Our team obtained excellent results for P@10 and NDCG in all settings (after 1, 100, 500, and 1000 writings).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Task 2: Early Detection of Depression</head><p>This year marks the third iteration of the early detection of depression task, continuing the 2017 T1 and 2018 T2 tasks. The organizers provided the data from the previous two editions for training the models. Users from the depression class were labeled based on a mention of a diagnosis in their Reddit posts (e.g., "I was diagnosed with depression"). In contrast, users from the control class have no mention of a diagnosis in their posts <ref type="bibr" target="#b34">[35]</ref>. The training dataset comprises 214 users diagnosed with depression, with 270,666 submissions, and 1,493 control users with a total of 2,959,080 submissions. The test set contains 98 users with depression, with 35,332 posts, and 1,302 users in the control group with a total of 687,228 posts. The texts for the testing phase were released sequentially, and the systems of the participating teams had to decide between issuing a decision for a specific user or waiting for more data. We submitted three runs for the early detection of depression: Run 0 is the text-set transformer model using the most recent 𝐾 = 32 posts for prediction; for Run 1 we employ the same text-set transformer model with, as input, the set of 𝐾 = 32 texts that are most important in a user's history, selected with Integrated Gradients; Run 2 is a baseline run, using the proposed model architecture to predict at the post level, on one sample at a time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 3</head><p>Decision-based evaluation on Task 2: Early Detection of Depression. We show the performance of our systems compared to the best-performing run from each team. In Table <ref type="table">3</ref> we present the performance of the systems using the decision-based metrics. Our best-performing run is the transformer architecture using the most recent texts from users (Run 0), followed by the system that considers only the most informative submissions from each user for the model's decisions (Run 1). The post-level system (Run 2) has the worst performance. Our three submitted runs achieve high Recall at the expense of lower Precision scores. The precision of our models could be improved by incorporating a mechanism for weighting user posts according to the prevalence of signs of depression <ref type="bibr" target="#b37">[38]</ref>; with such a mechanism, a text-set containing only a few posts with signs of depression would not induce a positive prediction. Regarding the early detection evaluation, our team has the second-best score on the ERDE 50 metric (0.027), while our ERDE 5 score is close to the best one. Compared to the best metrics from the 2018 edition of this task, when the best ERDE 5 and ERDE 50 were 0.087 and 0.064, respectively, current systems surpass these scores due to more training data being available and to the advancements in the field of machine learning in the last few years. Regarding the standard classification metrics, a slight improvement was made in terms of F1 score, from 0.64 in 2018 to 0.71 in 2022. The ranking-based evaluation performance in Table <ref type="table">4</ref> shows that for 1 and 1000 writings, our systems attain some of the best scores for P@10 and NDCG.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 4</head><p>Ranking-based evaluation on Task 2: Early Detection of Depression.</p><p>1 writing 100 writings 500 writings 1000 writings Team Run ID P@10 NDCG@10 NDCG@100 P@10 NDCG@10 NDCG@100 P@10 NDCG@10 NDCG@100 P@10 NDCG@10 NDCG@100</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this work, we proposed a transformer architecture that performs user-level classification for gambling addiction and depression detection. For each individual, the transformer processes a set of texts encoded by a pretrained sentence encoder, modeling the interactions between posts and mitigating noise in the dataset. Our network is interpretable and allows for automatic dataset creation by filtering uninformative posts from a user's history. Our method is a promising approach, especially for social media text processing, where a user has many texts, some informative and some unrelated to the particular modeling task, yet whose interaction is indicative of the user's mental state. We attained the best ERDE 5 score of 0.015 and the second-best ERDE 50 score of 0.009 for pathological gambling detection. For the early detection of depression, we obtained the second-best ERDE 50 (0.027).</p><p>For future work, we aim to extend our method and construct a mechanism for encoding the relative order of a user's posts with a modified version of relative positional embeddings <ref type="bibr" target="#b38">[39]</ref>. While we chose an approach that ignores temporal ordering and processes posts as a set, preserving order is a natural way to increase the expressive power in modeling a user's entire social media history, similar to architectures such as the time-aware LSTM <ref type="bibr" target="#b26">[27]</ref>.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Proposed model architecture. We perform user-level classification by operating on a sample of K texts from a user. Texts are encoded with a pretrained sentence encoder and processed by a permutation-invariant transformer network. 
Binary cross-entropy loss is applied at the user level for a text-set.</figDesc><graphic coords="4,89.29,84.19,416.70,95.61" type="bitmap" /></figure>
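The user-level classification pipeline described above can be sketched in a few lines. This is a minimal, numpy-only illustration, not the authors' implementation: the attention-pooling weights, dimensions, and random embeddings are stand-ins for the pretrained sentence encoder and the full set transformer. It shows the key property the architecture relies on: pooling over the set makes the user-level score invariant to the order of the K posts.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def user_score(post_embeddings, w_attn, w_clf):
    """Score one user from a set of K post embeddings of shape (K, d).

    Each post gets a scalar relevance, the pooled vector is the
    relevance-weighted mean of the set, and a linear head produces the
    user-level logit. Because pooling sums over the set, the score is
    permutation-invariant, mirroring the set-transformer idea.
    """
    scores = post_embeddings @ w_attn          # (K,) per-post relevance
    alpha = softmax(scores)                    # attention weights over the set
    pooled = alpha @ post_embeddings           # (d,) weighted mean of posts
    return float(pooled @ w_clf)               # user-level logit

rng = np.random.default_rng(0)
K, d = 16, 8                                   # K texts per user, embedding dim
posts = rng.normal(size=(K, d))                # stand-in for encoder output
w_attn, w_clf = rng.normal(size=d), rng.normal(size=d)

logit = user_score(posts, w_attn, w_clf)
shuffled = posts[rng.permutation(K)]           # reorder the user's posts
assert np.isclose(logit, user_score(shuffled, w_attn, w_clf))
```

In training, this logit would be passed through a sigmoid and optimized with binary cross-entropy against the user-level label, as in the figure.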
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Performance of our model across training steps, in terms of F 1 score, for different sentence encoders (RoBERTa / MiniLM). We show the mean and standard deviation of the F 1 score across multiple values of 𝐾. For both tasks, RoBERTa yields consistently superior performance compared to MiniLM. Best viewed in color.</figDesc><graphic coords="5,89.29,255.96,416.69,111.73" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Performance of our model across training steps, in terms of validation F 1 score, for RoBERTa sentence embeddings, varying 𝐾, the number of texts per user. For Tasks 1 and 2, the best performance is attained with 𝐾 = 16 and 𝐾 = 32, respectively. Best viewed in color.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Texts from a particular user, relatively ranked by their attribution scores (contribution to a positive decision by the model), computed with the Integrated Gradients method. For each task, all texts belong to a single text-set of one user. The model is able to identify posts with clear discriminative information for each task. Best viewed in color. Examples have been paraphrased for anonymity.</figDesc><graphic coords="6,89.29,425.74,416.70,157.01" type="bitmap" /></figure>
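The attribution ranking in Figure 4 relies on Integrated Gradients <ref type="bibr" target="#b4">[5]</ref>. The following is a self-contained numpy illustration, not the paper's code: the quadratic model and its analytic gradient are hypothetical stand-ins for the real network and autograd. Attributions are the path integral of the gradient from a baseline to the input, approximated by a Riemann sum, and by the completeness axiom they sum to the difference in model output.

```python
import numpy as np

def model(x):
    # Hypothetical differentiable model: f(x) = sum(x_i^2)
    return np.sum(x ** 2)

def grad_model(x):
    # Analytic gradient of f (a real setup would use autograd)
    return 2.0 * x

def integrated_gradients(x, baseline, steps=200):
    """Riemann-sum approximation of Integrated Gradients:
    IG_i = (x_i - b_i) * integral_0^1 df(b + a*(x - b))/dx_i da."""
    alphas = (np.arange(steps) + 0.5) / steps   # midpoint rule over [0, 1]
    grads = np.stack([grad_model(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)  # per-feature attribution

x = np.array([1.0, -2.0, 3.0])
b = np.zeros(3)                                 # all-zeros baseline
attr = integrated_gradients(x, b)
# Completeness axiom: attributions sum to f(x) - f(baseline)
assert np.isclose(attr.sum(), model(x) - model(b), atol=1e-3)
```

In the paper's setting, each feature would instead be a post's embedding, and per-post attributions (summed over embedding dimensions) give the relative ranking shown in Figure 4.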
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://www.reddit.com/r/relationship_advice/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://www.reddit.com/r/depression/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://erisk.irlab.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">Early Risk Detection Error, introduced in Section 5.1</note>
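The ERDE scores reported in the conclusion can be sketched as follows. This is an illustrative implementation of the standard eRisk formulation (a sigmoid latency cost on late true positives); the cost constants here, in particular c_fp = 0.13, are assumptions, since eRisk sets c_fp from the proportion of positive users in each dataset.

```python
import math

def erde(decision, truth, k, o, c_fp=0.13, c_fn=1.0, c_tp=1.0):
    """Early Risk Detection Error for one user.

    decision/truth: 1 = at risk, 0 = not at risk
    k: number of writings seen before a positive decision
    o: deadline parameter (5 or 50 for ERDE 5 / ERDE 50)
    c_fp is an assumed constant; eRisk derives it from the data.
    """
    if decision == 1 and truth == 1:
        # True positive: penalise delay with a sigmoid latency cost
        latency_cost = 1.0 - 1.0 / (1.0 + math.exp(k - o))
        return latency_cost * c_tp
    if decision == 1 and truth == 0:
        return c_fp                              # false positive
    if decision == 0 and truth == 1:
        return c_fn                              # false negative
    return 0.0                                   # true negative

# An early correct alarm (k=2, o=5) costs far less than a late one (k=40)
assert erde(1, 1, k=2, o=5) < erde(1, 1, k=40, o=5)
```

The reported ERDE 5 and ERDE 50 values are the mean of this per-user cost over all test users.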
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">In 2018, the test data was released in chunks of posts, not one post at a time as is the case in this year's tasks.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The work of Ana-Maria Bucur was carried out in the framework of the research project NPRP13S-0206-200281. The work of Paolo Rosso was carried out in the framework of the research project PROMETEO/2019/121 (DeepPattern), funded by the Generalitat Valenciana. The authors also thank the EU-FEDER Comunitat Valenciana 2014-2020 grant IDIFEDER/2018/025.</p></div>
			</div>


			<div type="funding">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>CEUR Workshop Proceedings (CEUR-WS.org)</p><p>1 https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Mental health discourse on Reddit: Self-disclosure, social support, and anonymity</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">De</forename><surname>Choudhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>De</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Eighth International AAAI Conference on Weblogs and Social Media</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Detection of suicide ideation in social media forums using deep learning</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Tadesse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Algorithms</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="page">7</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A dataset for research on depression in social media</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">A</forename><surname>Ríssola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Bahrainian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization</title>
				<meeting>the 28th ACM Conference on User Modeling, Adaptation and Personalization</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="338" to="342" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Early risk detection of pathological gambling, self-harm and depression using BERT</title>
		<author>
			<persName><forename type="first">A.-M</forename><surname>Bucur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Working Notes)</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Axiomatic attribution for deep networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sundararajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Taly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Yan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on machine learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="3319" to="3328" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Overview of erisk 2022: Early risk prediction on the internet</title>
		<author>
			<persName><forename type="first">J</forename><surname>Parapar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">M</forename><surname>Rodilla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Losada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">A</forename><surname>Crestani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Experimental IR Meets Multilinguality, Multimodality, and Interaction. 13th International Conference of the CLEF Association</title>
				<meeting><address><addrLine>CLEF</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2022">2022. 2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Identifying high-risk online gamblers: A comparison of data mining procedures</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">S</forename><surname>Philander</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Gambling Studies</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="53" to="63" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Applying data science to behavioral analysis of online gambling</title>
		<author>
			<persName><forename type="first">X</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lesch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Clark</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Current Addiction Reports</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="159" to="164" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Personality biomarkers of pathological gambling: A machine learning study</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cerasa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lofaro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cavedini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Martino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bruni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sarica</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mauro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Merante</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Rossomanno</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Rizzuto</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of neuroscience methods</title>
		<imprint>
			<biblScope unit="volume">294</biblScope>
			<biblScope unit="page" from="7" to="14" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Early detection of signs of pathological gambling, self-harm and depression through topic extraction and neural networks</title>
		<author>
			<persName><forename type="first">D</forename><surname>Maupomé</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Armstrong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Rancourt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Soulas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-J</forename><surname>Meurs</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Working Notes)</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">UNSL at eRisk 2021: A comparison of three early alert policies for early risk detection</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Loyola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Burdisso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Thompson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cagnina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Errecalde</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Working Notes)</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Language use of depressed and depression-vulnerable college students</title>
		<author>
			<persName><forename type="first">S</forename><surname>Rude</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E.-M</forename><surname>Gortner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Pennebaker</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cognition &amp; Emotion</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="1121" to="1133" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A psychologically informed part-of-speech analysis of depression in social media</title>
		<author>
			<persName><forename type="first">A.-M</forename><surname>Bucur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">R</forename><surname>Podină</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP)</title>
				<meeting>the International Conference on Recent Advances in Natural Language Processing (RANLP)</meeting>
		<imprint>
			<date type="published" when="2021">2021. 2021</date>
			<biblScope unit="page" from="199" to="207" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">The internet-a new source of data on suicide, depression and anxiety: a preliminary study</title>
		<author>
			<persName><forename type="first">S</forename><surname>Fekete</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Archives of Suicide Research</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="351" to="361" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Language patterns discriminate mild depression from normal sadness and euthymic state</title>
		<author>
			<persName><forename type="first">D</forename><surname>Smirnova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cumming</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sloeva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Kuvshinova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Romanov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Nosachev</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Frontiers in psychiatry</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">105</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Linguistic inquiry and word count: Liwc 2001</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">W</forename><surname>Pennebaker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Francis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Booth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Mahwah, NJ</title>
		<imprint>
			<biblScope unit="volume">71</biblScope>
			<date type="published" when="2001">2001</date>
			<publisher>Lawrence Erlbaum Associates</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Linguistic metadata augmented classifiers at the CLEF 2017 task for early detection of depression</title>
		<author>
			<persName><forename type="first">M</forename><surname>Trotzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Koitka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Friedrich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Working Notes)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Temporal variation of terms as concept space for early risk prediction</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Errecalde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Villegas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Funez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J G</forename><surname>Ucelay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Cagnina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Working Notes)</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Funez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J G</forename><surname>Ucelay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">P</forename><surname>Villegas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Burdisso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">C</forename><surname>Cagnina</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes-Y Gómez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Errecalde</surname></persName>
		</author>
		<title level="m">Unsl&apos;s participation at erisk 2018 lab</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>CLEF (Working Notes)</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Word embeddings and linguistic metadata at the CLEF 2018 tasks for early detection of depression and anorexia</title>
		<author>
			<persName><forename type="first">M</forename><surname>Trotzek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Koitka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Friedrich</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF (Working Notes)</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Detecting early onset of depression from social media text using learned confidence scores</title>
		<author>
			<persName><forename type="first">A</forename><surname>Bucur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Dinu</surname></persName>
		</author>
		<ptr target="CEUR-WS.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting>the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020<address><addrLine>Bologna, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2020">March 1-3, 2021. 2020</date>
			<biblScope unit="volume">2769</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Detecting mental disorders in social media through emotional patterns-the case of anorexia and depression</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Aragon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Lopez-Monroy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-C</forename><forename type="middle">G</forename><surname>Gonzalez-Gurrola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Montes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Affective Computing</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">An emotion and cognitive based analysis of mental health disorders from social media data</title>
		<author>
			<persName><forename type="first">A.-S</forename><surname>Uban</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chulvi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rosso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Future Generation Computer Systems</title>
		<imprint>
			<biblScope unit="volume">124</biblScope>
			<biblScope unit="page" from="480" to="494" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ł</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Convolutional sequence to sequence learning</title>
		<author>
			<persName><forename type="first">J</forename><surname>Gehring</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Auli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Grangier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yarats</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">N</forename><surname>Dauphin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="1243" to="1252" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Set transformer: A framework for attention-based permutation-invariant neural networks</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kosiorek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Choi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">W</forename><surname>Teh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<meeting><address><addrLine>PMLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="3744" to="3753" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Patient subtyping via time-aware LSTM networks</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">M</forename><surname>Baytas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining</title>
				<meeting>the 23rd ACM SIGKDD international conference on knowledge discovery and data mining</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="65" to="74" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Kazemi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Goel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Eghbali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ramanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sahota</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Thakur</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Smyth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Poupart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brubaker</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.05321</idno>
		<title level="m">Time2vec: Learning a vector representation of time</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<monogr>
		<title level="m" type="main">RoBERTa: A robustly optimized BERT pretraining approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno>CoRR abs/1907.11692</idno>
		<ptr target="http://arxiv.org/abs/1907.11692" />
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers</title>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Bao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="5776" to="5788" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/1412.6980" />
		<title level="m">Adam: A method for stochastic optimization</title>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note>ICLR (Poster)</note>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Cyclical learning rates for training neural networks</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">N</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2017 IEEE Winter Conference on Applications of Computer Vision (WACV)</title>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="464" to="472" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">A unified approach to interpreting model predictions</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">M</forename><surname>Lundberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-I</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in neural information processing systems</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">&quot;Why should I trust you?&quot; Explaining the predictions of any classifier</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Ribeiro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</title>
				<meeting>the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1135" to="1144" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">A test collection for research on depression and language use</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Losada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="28" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<monogr>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Losada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">A</forename><surname>Crestani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Parapar</surname></persName>
		</author>
		<title level="m">Overview of eRisk at CLEF 2019: Early risk prediction on the Internet (extended overview)</title>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
	<note>CLEF (Working Notes)</note>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">Measuring the latency of depression detection in social media</title>
		<author>
			<persName><forename type="first">F</forename><surname>Sadeque</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bethard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining</title>
				<meeting>the Eleventh ACM International Conference on Web Search and Data Mining</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="495" to="503" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">A dataset for research on depression in social media</title>
		<author>
			<persName><forename type="first">E</forename><surname>Ríssola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Bahrainian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</author>
		<idno type="DOI">10.1145/3340631.3394879</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization</title>
				<meeting>the 28th ACM Conference on User Modeling, Adaptation and Personalization</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="338" to="342" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<analytic>
		<title level="a" type="main">Explore better relative position embeddings from encoding perspective for transformer models</title>
		<author>
			<persName><forename type="first">A</forename><surname>Qu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Niu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2021 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2989" to="2997" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
