<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Detecting Early Risk of Depression from Social Media User-generated Content</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Hayda</forename><surname>Almeida</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Quebec in Montreal (UQAM)</orgName>
								<address>
									<settlement>Montreal</settlement>
									<region>QC</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Antoine</forename><surname>Briand</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Quebec in Montreal (UQAM)</orgName>
								<address>
									<settlement>Montreal</settlement>
									<region>QC</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Marie-Jean</forename><surname>Meurs</surname></persName>
							<email>meurs.marie-jean@uqam.ca</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Quebec in Montreal (UQAM)</orgName>
								<address>
									<settlement>Montreal</settlement>
									<region>QC</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Detecting Early Risk of Depression from Social Media User-generated Content</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">2AA5EAD4683C9E46C0CDCFBF738483DF</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:29+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Information Retrieval</term>
					<term>Mental Health</term>
					<term>Natural Language Processing</term>
					<term>Supervised Learning</term>
					<term>Text Mining</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents the systems developed by the UQAM team for the CLEF eRisk Pilot Task 2017. The goal was to predict, as early as possible, the risk of mental health issues from user-generated content in social media. Several approaches based on supervised learning and information retrieval methods were used to estimate a user's risk of depression from the content of their posts on reddit. Among the five systems evaluated, the experiments show that combining information retrieval and machine learning approaches gives the best results in terms of F1 score and recall.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The Early Detection of Depression Pilot Task was part of the CLEF eRisk 2017 workshop <ref type="bibr" target="#b15">[16]</ref>. The challenge consists of performing early risk detection of depression by analyzing user-generated content from reddit<ref type="foot" target="#foot_0">1</ref>. To this end, a system receives user-generated content as input and outputs a prediction of the user's susceptibility to depression. The pilot task dataset contains user-generated content that is organized and processed chronologically, which makes it possible to monitor each user's progress and to detect risk as early as possible. Users are categorized as risk or non-risk (of depression), and each user produced a sequence of reddit posts written within a given period of time. The pilot task was organized in two stages, training and test, each with its own dataset divided into 10 chunks. During the training stage, a dataset containing the sequential set of posts of each user was provided along with the user's category; all training chunks, which together contain each user's complete post sequence, were made available. During the test stage, the test users' data was released sequentially, one release per week. Each release contained the part of each user's post sequence corresponding to one chunk, from the oldest to the newest posts. Participating systems had to output predictions for the users, based on all test chunks released so far, before the next chunk was released. A prediction could be either a category for the user or no decision, until the last week of the test stage, when every user had to be assigned a category.</p><p>We describe hereafter our prediction system, based on an ensemble classification approach that combines supervised learning, information retrieval, and feature selection methods. 
This report is organized as follows: related work is reviewed in Section 2; the system resources are described in Section 3; the system modules and the decision algorithm that merges their predictions are described in Section 4. Results are presented and discussed in Section 5, while conclusions and future work are discussed in Section 6.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Social media content has commonly been used to develop approaches that support mental health care. The latest CLPsych Shared Tasks <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b17">18]</ref> asked participants to identify users at imminent risk of depression or Post Traumatic Stress Disorder (PTSD), using tweets or mental health forum posts. In <ref type="bibr" target="#b10">[11]</ref>, a sentiment analysis model was built with a focus on user-generated social media content. It relies on highly relevant sentiment lexicons and sentiment intensity measurements, and the authors showed that it outperforms other commonly used lexicons as well as machine learning-based tools. The authors of <ref type="bibr" target="#b18">[19]</ref> evaluated different features to analyze user posts from LiveJournal<ref type="foot" target="#foot_1">2</ref>, and compared posts from depression-related online communities with posts from control (non-depression) communities. Another approach was proposed by <ref type="bibr" target="#b16">[17]</ref>: a statistical model built from the analysis of over 176 million tweets to identify communication patterns related to mental illness on Twitter, and to attempt to predict user behavioral patterns related to depression. The following subsections review studies from two research fields in particular: supervised learning and information retrieval.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Supervised Learning for Mental Health</head><p>Several studies have used supervised learning methods to identify mental health issues in social media. The choice of supervised algorithm varies with the task and the data at hand; however, the studies presented here generally rely on highly discriminative features to achieve state-of-the-art performance, which underlines the importance of attribute choice for such tasks. In <ref type="bibr" target="#b7">[8]</ref>, the authors predicted depression from tweets by analyzing over 2 million posts from 476 users. The best performance was obtained with an SVM classifier and a set of behavioral features, such as the occurrence of pronouns, the use of swear words and depression terms, tweet replies, and posting time and frequency. The work presented in <ref type="bibr" target="#b13">[14]</ref> identifies users' psychological stress in tweets. Features such as emotion words, smileys, tweet mentions, replies, and posting frequency were obtained from single tweets and from all of a user's tweets, and the best performance was obtained by a four-layer Deep Neural Network (DNN). Previous works have also used Twitter data to identify language differences in users potentially presenting PTSD <ref type="bibr" target="#b5">[6]</ref>, or who attempted suicide <ref type="bibr" target="#b6">[7]</ref>. In both studies, the authors evaluated user-generated content using word and character language models. The findings point to characteristics of tweets associated with mental health issues, such as heavier use of emotions, use of third-person pronouns, anxiety terms, and high posting frequency. The authors of <ref type="bibr" target="#b22">[23]</ref> analyzed Facebook<ref type="foot" target="#foot_2">3</ref> status updates to predict user satisfaction with life. 
Their approach used feature selection over n-grams and topic extraction, and built regression models at the message level and at the user level. The results indicate that a cascade model, in which message-level predictions inform user-level predictions, performed best.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Information Retrieval for Mental Health</head><p>Information retrieval techniques are widely used to support knowledge discovery in the biomedical field. Most approaches are designed to help researchers and practitioners find relevant documents to support experiments or diagnoses. In the field of mental health, <ref type="bibr" target="#b9">[10]</ref> reports a study supporting the mental health maintenance of U.S. Army soldiers. The goal is to help health practitioners perform efficient follow-ups on soldiers, whose suicide attempt rate is known to be high. The approach used the Veterans Informatics and Computing Infrastructure (VINCI) resource to process mostly unstructured health information, such as clinical notes. The authors built a search engine based on Apache Solr<ref type="foot" target="#foot_3">4</ref> that indexes these textual data to predict the risk of suicide attempt among soldiers. Although the system uses only a few pre-processing steps, it shows promising performance and covers a larger population than systems based on structured data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Resources</head><p>The following sections describe the resources used to build our systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Dictionaries</head><p>The supervised learning-based systems rely on a set of depression-related dictionaries. The dictionary keywords are used to provide discriminative attributes for automatic classification. The dictionaries we utilized are lists of relevant feelings, medicine, drugs, and diseases, which are assumed to be related to depression. The feeling dictionary is composed of feeling words used in mental status exams <ref type="foot" target="#foot_4">5</ref> , and a conceptual feature map obtained from SenticNet <ref type="bibr" target="#b3">[4]</ref>. The medicine dictionary lists antidepressant names or depression-related medicine, obtained from Wikipedia <ref type="foot" target="#foot_5">6</ref> . The disease dictionary is composed of depression-related disease names, from Wikipedia<ref type="foot" target="#foot_6">7</ref> .</p><p>The drug dictionary contains a list of psychoactive drug names, such as hallucinogens, psychedelics, anxiolytics, and sedatives, also obtained from Wikipedia<ref type="foot" target="#foot_7">8</ref> .</p></div>
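<div xmlns="http://www.tei-c.org/ns/1.0"><p>One plausible way to turn the dictionaries into classification attributes is to count, for each dictionary, how many tokens of a post match one of its entries. The sketch below assumes that counting scheme; the function name dictionary_features, the tokenizer, and every dictionary entry shown are hypothetical placeholders, not the actual lists described above.</p><ab type="code">
```python
import re
from collections import Counter

# Hypothetical placeholder entries only; the real dictionaries come from
# mental status exam feeling words, SenticNet, and Wikipedia lists.
DICTIONARIES = {
    "feelings": {"hopeless", "lonely", "worthless", "sad"},
    "medicine": {"sertraline", "fluoxetine", "prozac"},
    "drugs": {"lsd", "xanax", "valium"},
    "diseases": {"insomnia", "anxiety", "dysthymia"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def dictionary_features(post: str) -> dict[str, int]:
    """Return, for each dictionary, the number of tokens of `post` found in it."""
    tokens = Counter(TOKEN_RE.findall(post.lower()))
    return {
        name: sum(count for token, count in tokens.items() if token in words)
        for name, words in DICTIONARIES.items()
    }
```
</ab><p>For example, dictionary_features("I feel hopeless and lonely, my Prozac dose changed") returns 2 for feelings, 1 for medicine, and 0 for drugs and diseases.</p></div>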
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Open Source Software</head><p>Classification To support the development of the supervised learning method in our system, we used the open-source machine learning framework Weka <ref type="bibr" target="#b23">[24]</ref><ref type="foot" target="#foot_8">9</ref>. Weka provides standard implementations of several classification algorithms. It also provides modules to handle and process Attribute-Relation File Format (ARFF) files, which contain a matrix representation of the dataset in terms of instances versus features, making feature selection easy to perform.</p><p>Indexing The information retrieval method in our system relies on the open-source search platform Apache Solr. Solr makes it possible to build a search engine that performs full-text search over a document index. Both the Solr search and index modules are built on the Apache Lucene<ref type="foot" target="#foot_9">10</ref> library. A Solr index is designed based on a schema, composed of a set of fields that represent a document object. Several pre-processing steps are also available in Solr, and can be applied at indexing time as well as at query time.</p></div>
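<div xmlns="http://www.tei-c.org/ns/1.0"><p>An ARFF file consists of a header declaring the relation and its attributes, followed by an @data section with one comma-separated row per instance. The sketch below writes such a file from a small feature matrix; the function name to_arff, the relation name, the attribute names, and the class labels are hypothetical.</p><ab type="code">
```python
def to_arff(relation: str, feature_names: list[str],
            rows: list[list[float]], labels: list[str]) -> str:
    """Serialize a numeric feature matrix plus a nominal class as ARFF text."""
    if len(rows) != len(labels):
        raise ValueError("one label is required per instance")
    classes = sorted(set(labels))
    lines = [f"@relation {relation}", ""]
    # One numeric attribute per feature column (instances versus features).
    lines += [f"@attribute {name} numeric" for name in feature_names]
    # The class is declared last as a nominal attribute.
    lines.append("@attribute class {" + ",".join(classes) + "}")
    lines += ["", "@data"]
    # One row per instance: feature values followed by the class label.
    for row, label in zip(rows, labels):
        lines.append(",".join(str(value) for value in row) + "," + label)
    return "\n".join(lines) + "\n"
```
</ab><p>Calling to_arff("erisk", ["feelings", "posting_freq"], [[2, 0.5], [0, 3.0]], ["risk", "non_risk"]) produces a header that declares the class as {non_risk,risk} and an @data section with the rows 2,0.5,risk and 0,3.0,non_risk.</p></div>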
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Methodology</head><p>To detect users at risk of developing depression, we designed a multipronged approach that combines results obtained from both Information Retrieval (IR) and Supervised Learning (SL) based systems. The combination is performed by a decision algorithm. In Section 4.1, we explain how we used the CLEF eRisk training and test datasets in our experiments. The IR-based systems are described in Section 4.2, and the SL-based systems in Section 4.3. Details on the decision algorithm are presented in Section 4.4. Finally, Section 4.5 briefly describes the experiments performed to determine the best configuration for our approach.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Dataset</head><p>The CLEF eRisk training and test datasets are composed of user posts extracted from reddit. Each dataset is divided into 10 chronologically organized chunks, and each chunk contains a sequence of writings for a given user over a period of time. Table <ref type="table" target="#tab_0">1</ref> shows statistics on the eRisk 2017 pilot task datasets. We used the chronological aspect of the user writings when processing both training and test data. When processing the training data, we computed the user posting frequency, which is further described in Section 4.3. When processing the test data, we considered both single-chunk and multiple-chunk predictions: to output predictions in a given week, we used the test data in two different ways, first to obtain a list of predictions considering only the current test chunk, and second to obtain a list of predictions considering all test chunks released so far. Both lists of predictions are taken into account when merging outputs from different models and systems.</p></div>
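<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two uses of the test data in a given week can be sketched as one view restricted to the newest chunk and one view accumulating every chunk released so far. The data layout (a list of chunks ordered from oldest to newest, each mapping a user ID to the writings posted in that chunk) and the function name weekly_views are assumptions for illustration.</p><ab type="code">
```python
Chunk = dict[str, list[str]]  # user ID -> writings posted in that chunk


def weekly_views(released_chunks: list[Chunk]) -> tuple[Chunk, Chunk]:
    """Return (current-chunk view, cumulative view) for the latest release."""
    if not released_chunks:
        raise ValueError("no chunk has been released yet")
    # View 1: only the newest chunk.
    current = {user: list(posts) for user, posts in released_chunks[-1].items()}
    # View 2: every chunk released so far, concatenated from oldest to newest.
    cumulative: Chunk = {}
    for chunk in released_chunks:
        for user, posts in chunk.items():
            cumulative.setdefault(user, []).extend(posts)
    return current, cumulative
```
</ab><p>With the chunks [{"u1": ["a"], "u2": ["b"]}, {"u1": ["c"]}], the current view is {"u1": ["c"]} and the cumulative view is {"u1": ["a", "c"], "u2": ["b"]}; each view would then be passed to the IR-based and SL-based systems to produce one of the two prediction lists.</p></div>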
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Information Retrieval Based Systems</head><p>We used an IR-based approach in which a test document serves as a query to retrieve similar documents. The intuition is that using the full content of a user post as a query should allow a search engine to retrieve semantically similar documents (posts). In our context, the similar posts are retrieved from the training corpus, where they are already labeled according to the risk/non-risk state of the user who wrote them. We built two search engines relying on two different indexes created from the eRisk training corpus, one that keeps stop-words and one that removes them. We then submitted the eRisk test documents as queries to both search engines.</p><p>For each test document d submitted to the search engines, we used the class (risk or non-risk) of the top n retrieved documents to compute a score S<hi rend="subscript">IR</hi>(d) reflecting how likely it is that d was produced by a depressed user. This is comparable to a k-nearest neighbors approach, since we want to find the closest documents (neighbors) to a given document. The number of retrieved documents taken into account was set experimentally to n = 20. S<hi rend="subscript">IR</hi>(d) is computed as follows:</p><formula xml:id="formula_0">S_{IR}(d) = \frac{1}{n} \sum_{i=1}^{n} \delta(d_i)</formula><p>where d<hi rend="subscript">i</hi> is the document retrieved by query d at rank i, and</p><formula xml:id="formula_1">\delta(d_i) = \begin{cases} 1, &amp; \text{if } d_i \text{ is labeled as risk} \\ 0, &amp; \text{otherwise} \end{cases}</formula><p>The test documents are then ranked by their S<hi rend="subscript">IR</hi> score, and considered as risk candidates if their score reaches a threshold that was set experimentally.</p><p>The search engines created in this approach rely on Apache Solr and the BM25 probabilistic ranking algorithm <ref type="bibr" target="#b11">[12]</ref>. We first indexed all the fields in the training set. 
Two indexes, I and II, were generated from the same schema but with different pre-processing steps, described in Table <ref type="table">2</ref>; the indexed fields are listed in Table <ref type="table">3</ref>.</p><p>For Index I, we indexed all the data with little pre-processing. Index II uses the same schema with additional pre-processing steps: stop-word removal, stemming (using the Solr built-in Porter stemmer), and punctuation filtering.</p></div>
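<div xmlns="http://www.tei-c.org/ns/1.0"><p>The score S<hi rend="subscript">IR</hi>(d) amounts to a nearest-neighbour vote over the ranked hits. The sketch below assumes the search engine has already returned the labels of the retrieved training documents in rank order; the function name s_ir and the label string "risk" are hypothetical, while n = 20 follows the text.</p><ab type="code">
```python
def s_ir(retrieved_labels: list[str], n: int = 20) -> float:
    """Share of the top-n retrieved training documents labeled as risk.

    `retrieved_labels` holds the labels of the documents returned for the
    query document d, in rank order (best match first).
    """
    top = retrieved_labels[:n]
    # delta(d_i) is 1 for a risk document and 0 otherwise; the sum is divided
    # by n, as in the formula, even if fewer than n documents were returned.
    return sum(1 for label in top if label == "risk") / n
```
</ab><p>If 15 of the top 20 hits are labeled risk, the score is 15/20 = 0.75, which passes the 0.7 candidate threshold reported in Section 4.4. Dividing by n rather than by the number of hits follows the formula, so a query that returns fewer than n documents is scored conservatively.</p></div>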
<div xmlns="http://www.tei-c.org/ns/1.0"><table><head>Table 2. Pre-processing steps by index</head><row role="label"><cell>Index name</cell><cell>Pre-processing</cell></row><row><cell>Index I</cell><cell>Tokenization, lowercasing</cell></row><row><cell>Index II</cell><cell>As Index I + stemming, stop-word removal, punctuation filtering</cell></row></table><p>Table <ref type="table">3</ref> presents the fields used in the schema, i.e. all the fields available in the corpus (title, content, date, label). The Text field is a copy field that contains both content and title, and is used as the default search field. To better handle document-based queries, we used the built-in Solr MoreLikeThis (MLT) component<ref type="foot">11</ref>. Solr MLT retrieves documents that are similar to a given document, and is far more efficient for this purpose than other classical search endpoints.</p><p>Table <ref type="table">3</ref>. Indexed fields</p></div>
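<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two analysis chains of Index I and Index II can be approximated in Python as below. This is not a Solr configuration: whitespace tokenization for Index I, the short stop-word list, and especially crude_stem (a rough suffix stripper standing in for the Porter stemmer that Solr applies) are simplifying assumptions, and all function names are hypothetical.</p><ab type="code">
```python
import string

# Tiny illustrative stop-word list; Solr ships much larger ones.
STOPWORDS = {"a", "an", "and", "i", "is", "the", "to", "of", "my"}


def crude_stem(token: str) -> str:
    """Rough suffix stripping; a stand-in for the Porter stemmer, not Porter itself."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze_index_i(text: str) -> list[str]:
    """Index I: tokenization (assumed whitespace-based) and lowercasing only."""
    return [token.lower() for token in text.split()]


def analyze_index_ii(text: str) -> list[str]:
    """Index II: Index I steps plus punctuation filtering, stop-word removal, stemming."""
    tokens = []
    for token in analyze_index_i(text):
        token = token.strip(string.punctuation)  # punctuation filtering
        if token and token not in STOPWORDS:     # stop-word removal
            tokens.append(crude_stem(token))     # stemming (approximation)
    return tokens
```
</ab><p>For "Feeling lonely, again!", the Index I chain yields ["feeling", "lonely,", "again!"], whereas the Index II chain yields ["feel", "lone", "again"], which illustrates why the two indexes can return different neighbours for the same query.</p></div>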
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Supervised Learning Method</head><p>The SL-based approach combines the predictions of several classification models with different configurations. The SL models are designed using three classification algorithms and the various feature types described below.</p><p>Features To design models for the SL-based systems, we extracted discriminative features from the pilot task training dataset. Before extracting features, we performed pre-processing steps, including word stemming and the normalization of URLs, smiley characters, and punctuation. URL and smiley normalization helps to better process user-generated content and to capture the sentiment associated with a post. URLs can contain picture names or words that refer to specific subjects. Smiley symbols are often used to represent an emotion, so during pre-processing they are replaced by actual words (e.g., ":)" and ":-)" are replaced by "happy"). When present, these cues may help represent a user's state of mind.</p><p>After pre-processing, four feature types were extracted: n-grams, dictionary words, selected Part-Of-Speech (POS), and user posting frequency. N-gram features were extracted as Bag-Of-Words (BOW), bigrams, and trigrams. Dictionary words were extracted based on the depression-related dictionaries described in Section 3.1. POS features were extracted by selecting the words annotated by the Stanford POS Tagger<ref type="foot" target="#foot_10">12</ref> as either adjective (JJ), noun (NN), predeterminer (PDT), particle (RP), or verb (VB).</p><p>As an attempt to account for the temporal evolution of a user's psychological state, we computed the user posting frequency, which represents the user's activity pattern. 
The posting frequency of a user is computed as the time lapse between the oldest and the most recent writings, divided by the number of writings a user has generated in total. Statistics on features extracted from the training set are presented in Table <ref type="table" target="#tab_1">4</ref>.</p><p>Classifiers To build the SL models we have used three classification algorithms: Logistic Model Tree (LMT) <ref type="bibr" target="#b12">[13]</ref>, an Ensemble of Sequential Minimal Optimization (SMO) <ref type="bibr" target="#b19">[20]</ref> (ens SMO) classifiers, and an Ensemble of Random Forests <ref type="bibr" target="#b1">[2]</ref> (ens RF) classifiers. The ensembles are composed of 30 different classifiers each. The 30 Random Forest classifiers composing the ens RF were designed with iteration values from 10 to 50 (with increments of 10), and tree depth values from 2 to 10 (with increments of 2), as well as unlimited.</p><p>The 30 SMO classifiers composing the ens SMO were designed with tolerance parameter values from 0.001 to 0.005 (with increments of 0.001), and epsilon for round-off error values from 1 to 5 (with increments of 1).</p></div>
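<div xmlns="http://www.tei-c.org/ns/1.0"><p>The URL and smiley normalization and the posting frequency can be sketched as below. Only the mapping of ":)" and ":-)" to "happy" comes from the text; the other smiley entries, reducing a URL to the words it contains (dropping tokens such as "http" or "jpg"), measuring the time lapse in days, and the function names normalize_post and posting_frequency are assumptions. As defined above, the posting frequency is a time span per writing, so larger values correspond to less frequent posting.</p><ab type="code">
```python
import re
from datetime import datetime

# ":)" and ":-)" -> "happy" comes from the text; the other entries are hypothetical.
SMILEYS = {":-)": "happy", ":)": "happy", ":-(": "sad", ":(": "sad"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Hypothetical list of URL tokens that carry no subject information.
URL_NOISE = {"http", "https", "www", "com", "org", "net", "jpg", "png", "html"}


def normalize_post(text: str) -> str:
    """Reduce URLs to the words they contain and replace smileys with words."""
    def url_words(match: re.Match) -> str:
        words = re.findall(r"[a-z]+", match.group(0).lower())
        return " ".join(w for w in words if w not in URL_NOISE)

    text = URL_RE.sub(url_words, text)
    for smiley, word in SMILEYS.items():
        text = text.replace(smiley, f" {word} ")
    return " ".join(text.split())  # collapse the extra spaces


def posting_frequency(timestamps: list[datetime]) -> float:
    """Days between the oldest and newest writing, divided by the number of writings."""
    if not timestamps:
        raise ValueError("a user needs at least one writing")
    span = max(timestamps) - min(timestamps)
    return span.total_seconds() / 86400 / len(timestamps)
```
</ab><p>For instance, normalize_post("Look http://imgur.com/sad_cat.jpg :)") gives "Look imgur sad cat happy", and a user whose four writings span 10 days gets a posting frequency of 2.5.</p></div>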
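<div xmlns="http://www.tei-c.org/ns/1.0"><p>Crossing the Random Forest settings listed above gives exactly the 30 members of the ens RF: five iteration values times six depth settings. The sketch below enumerates that grid and, for comparison, the SMO grid implied by the stated tolerance and epsilon ranges. The dictionary keys, the function names, and the use of None for unlimited depth are illustrative choices, not Weka option names.</p><ab type="code">
```python
from itertools import product

ITERATIONS = range(10, 51, 10)   # 10, 20, 30, 40, 50
DEPTHS = [2, 4, 6, 8, 10, None]  # None stands for "unlimited" depth


def rf_grid() -> list[dict]:
    """Enumerate the Random Forest settings behind the ens RF members."""
    return [
        {"iterations": it, "max_depth": depth}
        for it, depth in product(ITERATIONS, DEPTHS)
    ]


def smo_grid() -> list[dict]:
    """Cross the SMO tolerance and epsilon ranges stated in the text."""
    tolerances = [round(0.001 * k, 3) for k in range(1, 6)]  # 0.001 to 0.005
    epsilons = list(range(1, 6))                             # 1 to 5
    return [{"tolerance": t, "epsilon": e} for t, e in product(tolerances, epsilons)]
```
</ab><p>rf_grid() returns 30 settings, starting with 10 iterations at depth 2. Crossing the SMO ranges in the same way yields 5 × 5 = 25 settings, so these two ranges alone do not account for all 30 SMO classifiers; the text does not say how the remaining members differ.</p></div>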
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Decision Algorithm</head><p>The decision algorithm merges the predictions of the IR-based and SL-based systems. The IR-based candidates are ranked by similarity, and each candidate is associated with an S<hi rend="subscript">IR</hi> score, as described in Section 4.2. Documents with the highest scores are considered as candidates for the risk class. For the eRisk task, the score threshold was experimentally set to 0.7, i.e. the candidates are all documents d such that S<hi rend="subscript">IR</hi>(d) ≥ 0.7.</p><p>The SL-based approaches are used to refine the list of candidates proposed by the IR-based systems. To be selected, a document from the IR-based list must be classified as risk by at least one of the SL-based systems. Candidates proposed by the SL-based systems are also ordered according to prediction confidence, and the first-ranked candidates are selected regardless of their presence in the IR list. The decision function ∆ can be formalized as follows:</p><formula xml:id="formula_2">\Delta(d) = \mathbb{1}_{IR}(d) + \mathbb{1}_{SL}(d) + \mathbb{1}_{SLf}(d)</formula><p>where d is a test document, and 1<hi rend="subscript">IR</hi>, 1<hi rend="subscript">SL</hi>, and 1<hi rend="subscript">SLf</hi> are the indicator functions associated respectively with the IR-based, SL-based, and SL-ranked-first lists of candidates. If ∆(d) ≥ 2, document d is assigned the risk class, i.e. the user who generated this content is considered susceptible to depression.</p></div>
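<div xmlns="http://www.tei-c.org/ns/1.0"><p>The decision rule can be sketched as a vote of three indicators. The sketch assumes that the SL-based predictions are already available as a set of document IDs labeled risk by at least one SL system, and that the first-ranked SL candidates are supplied as a separate set, since the text does not give the rank cutoff; the function and parameter names are hypothetical.</p><ab type="code">
```python
def decide(doc_ids: list[str], s_ir_scores: dict[str, float],
           sl_risk: set[str], sl_first: set[str],
           threshold: float = 0.7) -> list[str]:
    """Return the documents assigned the risk class, i.e. those with Delta(d) >= 2.

    s_ir_scores: document ID -> S_IR(d) score from the IR-based systems.
    sl_risk:     IDs classified as risk by at least one SL-based system.
    sl_first:    IDs among the first-ranked SL candidates.
    """
    risk = []
    for d in doc_ids:
        in_ir = s_ir_scores.get(d, 0.0) >= threshold  # 1_IR(d)
        delta = int(in_ir) + int(d in sl_risk) + int(d in sl_first)
        if delta >= 2:
            risk.append(d)
    return risk
```
</ab><p>With S<hi rend="subscript">IR</hi> scores {"a": 0.8, "b": 0.75, "c": 0.2, "d": 0.9}, SL risk set {"a", "c"}, and first-ranked set {"c"}, decide returns ["a", "c"]: document a is kept because the IR and SL lists agree, c because it is a first-ranked SL candidate (and therefore also in the SL list), while b and d are dropped since only the IR list proposes them.</p></div>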
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Experiments</head><p>To determine the most suitable configuration for the IR-based and SL-based systems, as well as the threshold for the decision algorithm, we performed several experiments on the pilot task training data. The classification models were selected after experimenting with all three classifiers using all feature types together or combinations of several feature types. Only the best-performing combinations of feature sets and classifiers were kept for the SL-based systems.</p><p>For the experimental evaluation, the pilot task training dataset was used as described in Section 4.1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results and Discussion</head><p>We submitted predictions on the test dataset obtained by five different systems. Four of these systems rely on different ensemble configurations: the ensembles merge results either from the SL-based and IR-based systems, or from a group of SL classifiers or IR-based systems. The five systems are described below:</p><list><item>UQAMA is based on an ensemble approach, merging the output candidates of all SL-based systems (considering three classifiers and all features) with the output candidates of the IR-based systems.</item><item>UQAMB is based only on candidates proposed by both IR-based systems. We considered UQAMB as our baseline system.</item><item>UQAMC is based on SL models built with an LMT classifier, using as features either BOW or bigrams separately, or BOW or bigrams together with all the dictionary features.</item><item>UQAMD is based on SL models built with an ens RF classifier, using as features either BOW or bigrams together with all the dictionary features.</item><item>Lastly, UQAME is based on SL models built with an ens SMO classifier, using bigrams both separately and together with all the dictionary features.</item></list><p>The user posting frequency was used as a feature by all five systems. Table <ref type="table" target="#tab_2">5</ref> presents the results obtained by the five systems on the metrics used in the CLEF eRisk pilot task. Besides F1, Precision, and Recall, the pilot task also evaluated systems using the early risk detection error (ERDE) <ref type="bibr" target="#b14">[15]</ref>. The ERDE metric accounts for the class imbalance problem in automatic classification, which could bias some classifiers. 
Additionally, it penalizes late detection through a latency cost that applies only to true positives, i.e. correctly detected risk users.</p><p>In total, 8 teams participated in the CLEF eRisk 2017 pilot task, submitting 30 different systems <ref type="bibr" target="#b15">[16]</ref>. In Table <ref type="table" target="#tab_2">5</ref>, we highlight in bold the most interesting results obtained by our systems. Among our five systems, the best overall performance was achieved by UQAMA, with the best F1 score and Recall. The best Precision was achieved by UQAMD, which is based on an ens RF classifier. The contribution of each method to the performance of UQAMA, as well as the impact of the various experimental settings, needs to be further evaluated. Finally, an interesting observation emerged from analyzing the posts of users predicted as risk by our systems: their content often presented two major topics, "video games" and "sexuality or relationship issues". The relationship between depression and these two topics has been studied from a clinical perspective in several recent works <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b21">22]</ref>. Interestingly, the co-occurrence of these topics with risk of depression was also picked up by our systems.</p></div>
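<div xmlns="http://www.tei-c.org/ns/1.0"><p>ERDE, as defined in <ref type="bibr" target="#b14">[15]</ref>, assigns each user a cost: c_fp for a false positive, c_fn for a false negative, 0 for a true negative, and lc_o(k) · c_tp for a true positive, where k is the number of writings seen before the decision and lc_o(k) = 1 - 1/(1 + e^(k - o)) grows towards 1 once k exceeds o. The ERDE 5 and ERDE 50 columns correspond to o = 5 and o = 50. The sketch below averages that cost over users; the cost values are left as parameters, with c_fn and c_tp defaulting to 1 as an assumption, and the function names are hypothetical.</p><ab type="code">
```python
import math


def latency_cost(k: int, o: int) -> float:
    """lc_o(k) = 1 - 1 / (1 + e^(k - o)), computed as a stable logistic function."""
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow when k is far below o
    return z / (1.0 + z)


def erde(decisions: list[bool], truths: list[bool], delays: list[int],
         o: int, c_fp: float, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Mean ERDE_o over users.

    decisions/truths: True means risk; delays: writings seen (k) when deciding.
    """
    total = 0.0
    for decided, truth, k in zip(decisions, truths, delays):
        if decided and not truth:
            total += c_fp                        # false positive
        elif truth and not decided:
            total += c_fn                        # false negative
        elif decided and truth:
            total += latency_cost(k, o) * c_tp   # true positive, delay-penalized
        # true negatives cost nothing
    return total / len(decisions)
```
</ab><p>A correct risk decision emitted after 5 writings with o = 5 costs 0.5, while the same decision emitted after 50 writings costs almost 1, about as much as a missed risk user.</p></div>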
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>This report describes the early risk prediction systems submitted to the CLEF eRisk 2017 pilot task. The best-performing system is based on a multipronged approach that combines predictions from SL-based and IR-based systems. The SL-based systems use four main feature types and three classification algorithms: LMT, ensemble SMO, and ensemble RF. The IR-based systems use two indexes, and users are ranked according to a similarity score based on the BM25 ranking algorithm <ref type="bibr" target="#b11">[12]</ref>. The predictions of the SL-based and IR-based systems are merged by a decision algorithm. In terms of F1 score and Recall, the results show that combining SL and IR approaches outperforms each approach applied separately.</p><p>Future work During our experimental phase, we performed preliminary tests of three other methods: (1) simple rule-based classification using a sentiment analysis library, (2) deep learning-based classification using a Recurrent Neural Network (RNN), and (3) topic extraction using Latent Dirichlet Allocation <ref type="bibr" target="#b0">[1]</ref>. Improving system performance will involve further investigation of these approaches, as well as enhancement of the system's IR-based resources.</p><p>Reproducibility Our system is publicly released as open-source software, and can be accessed at: https://github.com/BigMiners/eRisk2017</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>11 https://cwiki.apache.org/confluence/display/solr/</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>1 .</head><label>1</label><figDesc>The IR-based systems presented in Section 4.2 rank the users (writings) based on the S<hi rend="subscript">IR</hi>(d) score. This score is based on the categories of the top 20 similar documents retrieved. The number of documents in the top list was set through experiments on the training set: we ran several tests with values from 5 to 50, in increments of 5, and chose 20 since it maximized the F-measure.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>Statistics on the eRisk 2017 pilot task datasets. Training The training set was provided in full at the beginning of the task. It was manually annotated by experts. Users are categorized as either risk (depressed) or non-risk (non-depressed). To identify the most suitable models for both the IR and SL methods, we performed several experiments on the training data, which we used in two different ways: first, with cross-validation over training chunks 1 to 10; second, with training chunks 1 to 9 as the training set and training chunk 10 as the validation set.</figDesc><table><row><cell></cell><cell>Training dataset</cell><cell>Test dataset</cell></row><row><cell># users</cell><cell>486</cell><cell>401</cell></row><row><cell># writings</cell><cell>294,817</cell><cell>236,371</cell></row><row><cell># no-risk users</cell><cell>403</cell><cell>349</cell></row><row><cell># risk users</cell><cell>83</cell><cell>52</cell></row><row><cell># no-risk writings</cell><cell>263,966</cell><cell>217,665</cell></row><row><cell># risk writings</cell><cell>30,851</cell><cell>18,706</cell></row></table><note>Test The test set was provided gradually, with each test chunk released one week after the previous one. Our systems therefore provided predictions on the test set weekly.</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 4 .</head><label>4</label><figDesc>Number of unique features</figDesc><table><row><cell></cell><cell># Features</cell></row><row><cell>BOW</cell><cell>105,161</cell></row><row><cell>Bigrams</cell><cell>1,544,714</cell></row><row><cell>Trigrams</cell><cell>3,397,459</cell></row><row><cell>Selected POS</cell><cell>118,139</cell></row><row><cell>Feelings dic.</cell><cell>205</cell></row><row><cell>Medicine dic.</cell><cell>30</cell></row><row><cell>Drugs dic.</cell><cell>57</cell></row><row><cell>Diseases dic.</cell><cell>43</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 5.</head><label>5</label><figDesc>Performance results on the eRisk test set</figDesc><table><row><cell></cell><cell>ERDE<hi rend="subscript">5</hi></cell><cell>ERDE<hi rend="subscript">50</hi></cell><cell>F1</cell><cell>P</cell><cell>R</cell></row><row><cell>UQAMA</cell><cell>14.03%</cell><cell>12.29%</cell><cell>0.53</cell><cell>0.48</cell><cell>0.60</cell></row><row><cell>UQAMB</cell><cell>13.78%</cell><cell>12.78%</cell><cell>0.48</cell><cell>0.49</cell><cell>0.46</cell></row><row><cell>UQAMC</cell><cell>13.58%</cell><cell>12.83%</cell><cell>0.42</cell><cell>0.50</cell><cell>0.37</cell></row><row><cell>UQAMD</cell><cell>13.23%</cell><cell>11.98%</cell><cell>0.38</cell><cell>0.64</cell><cell>0.27</cell></row><row><cell>UQAME</cell><cell>13.68%</cell><cell>12.68%</cell><cell>0.39</cell><cell>0.45</cell><cell>0.35</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://www.reddit.com/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://www.livejournal.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://www.facebook.com</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://lucene.apache.org/solr/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://psychpage.com/learning/library/assess/feelings.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://en.wikipedia.org/wiki/List_of_antidepressants</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">https://en.wikipedia.org/wiki/Depression_(mood)</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">https://en.wikipedia.org/wiki/Psychoactive_drug</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">http://www.cs.waikato.ac.nz/ml/weka/citing.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="10" xml:id="foot_9">https://lucene.apache.org/core/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="12" xml:id="foot_10">https://nlp.stanford.edu/software/tagger.shtml</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Latent dirichlet allocation</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Blei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">I</forename><surname>Jordan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="993" to="1022" />
			<date type="published" when="2003-01">Jan. 2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Random forests</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="5" to="32" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Is video gaming, or video game addiction, associated with depression, academic achievement, heavy episodic drinking, or conduct problems?</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Brunborg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Mentzoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">R</forename><surname>Frøyland</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Behavioral Addictions</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="27" to="32" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis</title>
		<author>
			<persName><forename type="first">E</forename><surname>Cambria</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Olsher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Rajagopal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th AAAI Conference on Artificial Intelligence</title>
				<meeting>the 28th AAAI Conference on Artificial Intelligence</meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1515" to="1521" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">CLPsych 2015 shared task: Depression and PTSD on Twitter</title>
		<author>
			<persName><forename type="first">G</forename><surname>Coppersmith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dredze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Harman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Hollingshead</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mitchell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology (CLPsych): From Linguistic Signal to Clinical Reality</title>
				<meeting>the 2nd Workshop on Computational Linguistics and Clinical Psychology (CLPsych): From Linguistic Signal to Clinical Reality</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="31" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Measuring Post Traumatic Stress Disorder in Twitter</title>
		<author>
			<persName><forename type="first">G</forename><surname>Coppersmith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Harman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dredze</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (ICWSM)</title>
				<meeting>the 8th International AAAI Conference on Weblogs and Social Media (ICWSM)</meeting>
		<imprint>
			<date type="published" when="2014-06">June 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Exploratory analysis of social media prior to a suicide attempt</title>
		<author>
			<persName><forename type="first">G</forename><surname>Coppersmith</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ngo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Leary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Wood</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology (CLPsych)</title>
				<meeting>the 3rd Workshop on Computational Linguistics and Clinical Psychology (CLPsych)</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="106" to="117" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Predicting Depression via Social Media</title>
		<author>
			<persName><forename type="first">M</forename><surname>De Choudhury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gamon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Counts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Horvitz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM)</title>
				<meeting>the 7th International AAAI Conference on Weblogs and Social Media (ICWSM)</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">The benefits of playing video games</title>
		<author>
			<persName><forename type="first">I</forename><surname>Granic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lobel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Engels</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">American Psychologist</title>
		<imprint>
			<biblScope unit="volume">69</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page">66</biblScope>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Use of text search to effectively identify lifetime prevalence of suicide attempts among veterans</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">W</forename><surname>Hammond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Laundry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Oleary</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">P</forename><surname>Jones</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 46th Hawaii International Conference on System Sciences (HICSS)</title>
				<meeting>the 46th Hawaii International Conference on System Sciences (HICSS)</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="2676" to="2683" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">VADER: A parsimonious rule-based model for sentiment analysis of social media text</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Hutto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gilbert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (ICWSM)</title>
				<meeting>the 8th International AAAI Conference on Weblogs and Social Media (ICWSM)</meeting>
		<imprint>
			<date type="published" when="2014-06">June 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A probabilistic model of information retrieval: development and comparative experiments: Part 2</title>
		<author>
			<persName><forename type="first">K</forename><surname>Spärck Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Walker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">E</forename><surname>Robertson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Information Processing &amp; Management</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="809" to="840" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Logistic model trees</title>
		<author>
			<persName><forename type="first">N</forename><surname>Landwehr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frank</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page" from="161" to="205" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">User-level psychological stress detection from social media using deep neural network</title>
		<author>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Jia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Feng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd ACM International Conference on Multimedia</title>
				<meeting>the 22nd ACM International Conference on Multimedia</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="507" to="516" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A Test Collection for Research on Depression and Language Use</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Losada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference of the Cross-Language Evaluation Forum for European Languages</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="28" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">eRISK 2017: CLEF Lab on Early Risk Prediction on the Internet: Experimental foundations</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Losada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Crestani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Parapar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Conference and Labs of the Evaluation Forum (CLEF 2017)</title>
				<meeting>Conference and Labs of the Evaluation Forum (CLEF 2017)<address><addrLine>Dublin, Ireland</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Using social media to monitor mental health discussions-evidence from twitter</title>
		<author>
			<persName><forename type="first">C</forename><surname>McClellan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">M</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Mutter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kroutil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Landwehr</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Medical Informatics Association</title>
		<imprint>
			<biblScope unit="page">133</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note>JAMIA</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">CLPsych 2016 Shared Task: Triaging content in online peer-support forums</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">N</forename><surname>Milne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hachey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Calvo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology (CLPsych)</title>
				<meeting>the 3rd Workshop on Computational Linguistics and Clinical Psychology (CLPsych)</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="118" to="127" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Affective and content analysis of online depression communities</title>
		<author>
			<persName><forename type="first">T</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Phung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Venkatesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Berk</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Affective Computing</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="217" to="226" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">Sequential minimal optimization: A fast algorithm for training support vector machines</title>
		<author>
			<persName><forename type="first">J</forename><surname>Platt</surname></persName>
		</author>
		<idno>MSR-TR-98-14</idno>
		<imprint>
			<date type="published" when="1998-04">April 1998</date>
		</imprint>
		<respStmt>
			<orgName>Microsoft</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Tech. Rep.</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">The relationship between multiple sex partners and anxiety, depression, and substance dependence disorders: a cohort study</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ramrakha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Paul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Bell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Dickson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">E</forename><surname>Moffitt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Caspi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Archives of Sexual Behavior</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="863" to="872" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">The relationship between addictive use of social media and video games and symptoms of psychiatric disorders: A large-scale cross-sectional study</title>
		<author>
			<persName><forename type="first">C</forename><surname>Schou Andreassen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Billieux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">D</forename><surname>Griffiths</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Kuss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Demetrovics</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mazzoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Pallesen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Psychology of Addictive Behaviors</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">252</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Predicting individual well-being through the language of social media</title>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Schwartz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sap</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Kern</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Eichstaedt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kapelner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Blanco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Dziurzynski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Park</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Stillwell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Pacific Symposium on Biocomputing (PSB)</title>
				<imprint>
			<date type="published" when="2016-01">January 2016</date>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="page" from="516" to="527" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">The WEKA Workbench. Online Appendix for &quot;Data Mining: Practical Machine Learning Tools and Techniques&quot;</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">H</forename><surname>Witten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Frank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Hall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Pal</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>Morgan Kaufmann</publisher>
		</imprint>
	</monogr>
	<note>4th edn.</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
