Towards Predicting the Subscription Status of Twitch.tv Users
ECML-PKDD ChAT Discovery Challenge 2020

Konstantin Kobs¹, Martin Potthast², Matti Wiegmann³, Albin Zehe¹, Benno Stein³, Andreas Hotho¹

¹ Julius-Maximilians Universität Würzburg, {kobs,zehe,hotho}@informatik.uni-wuerzburg.de
² Leipzig University, martin.potthast@uni-leipzig.de
³ Bauhaus-Universität Weimar, {matti.wiegmann,benno.stein}@uni-weimar.de

https://events.professor-x.de/dc-ecmlpkdd-2020/

Abstract

We investigate whether the subscription status of active users of Twitch can be inferred from their activity patterns in the chats of streamers. To encourage a diversity of solutions to this problem, the task was run as an ECML-PKDD 2020 discovery challenge called Chat Analytics for Twitch (ChAT). Four participants submitted their working prediction models, which were evaluated at our site. The winning approach achieved an F1 score of 0.343, outperforming the baseline by a significant margin. The most salient conclusion that can be drawn at this time is that interaction behavior plays a crucial role in solving this task, meriting further analysis in this direction.

1 Introduction

The popularity of game streaming and corresponding platforms, such as Twitch,² for entertainment and professional e-sports is on the rise. The basic function of such platforms is to broadcast a screencast along with commentary to a live audience. While most streamers focus on streaming and commenting on the gameplay of games they are currently playing, other video and audio content is also streamed, though at a much smaller volume. The audience, in turn, can interact with the streamers in a channel’s chat and by other means. This enables streamers to engage with their audience in real time in order to build a following and, eventually, to monetize their channel.
The audience can donate to streamers and subscribe to the channel by paying a monthly fee, which is typically between five and twenty-five dollars per channel subscription at the time of writing. This comes with exclusive chat and channel features such as channel-specific emotes, which are still or moving images approximately the size of a standard emoticon. A channel’s earnings are split 50/50 between the streamer and Twitch, incentivizing both to convert watching users to subscribed users [12]. Targeted advertising can be a useful tool to accomplish this, but depends on identifying users open to subscribing to a channel. If a classifier can be developed that predicts the subscription status of a user-channel combination (based on the chat comments and activity patterns of the user in the channel), then applying this model to currently unsubscribed user-channel combinations can yield potential targets for advertisement. At the ECML-PKDD Discovery Challenge “Chat Analytics for Twitch” (ChAT), the task was to build such a binary classification system.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
² https://www.twitch.tv

In order to enable the task, we provide a large training dataset consisting of over 400 million public Twitch comments, published alongside this novel task. Additionally, we constructed a test dataset with certain characteristics that can be used both as a benchmark for comparison and as a basis for future research and analysis.

The training dataset was provided to participants, who developed their prediction models at their own sites and then deployed their trained models to our online evaluation platform TIRA [19]. Each team was assigned a virtual machine to facilitate the installation of required dependencies; at no point were participants given direct access to the test data.
TIRA enables blind evaluation by preventing outside access while software is executed. A participant’s software was executed remotely on the test data, and its outputs were recorded and evaluated. The virtual machines on which the participants deployed their software were archived for reproducibility. After the submission deadline, all data was made publicly available.

This paper describes the task and the evaluation data for this challenge, summarizes the submissions, and presents an analysis of their performance. Our contributions are threefold:
(1) An original task to predict the subscription status of Twitch users at given channels based on their interactions and chat messages.
(2) A training dataset with Twitch chat messages, and a test dataset with characteristics useful for further analysis.
(3) An overview and evaluation of the approaches that were submitted as part of the challenge.

The remainder of the paper is organized as follows: Section 2 gives an overview of related work. Section 3 introduces the task and the datasets as well as the evaluation metric and the baseline. Section 4 describes the approaches developed by the participants, including the employed features and models. Section 5 analyzes the results of the submitted approaches.

2 Related Work

As Twitch is one of the most popular streaming platforms for games, numerous studies have been conducted to understand the use, the impact, and the challenges of this new form of media. While many publications examine social, cultural, and economic dynamics of the platform [1, 5, 8, 14, 24], implications for the media and community landscape [7, 12, 13, 22, 23], and its language [15, 18], only a few also utilize machine learning methods to automatically process the chat and interaction data [2, 15]. Kobs et al. investigate the usefulness of emotes as indicators for the sentiment of chat messages [15].
They build an emote sentiment lexicon via crowdsourcing to improve sentiment analysis models based on dictionaries and convolutional neural networks, showing that common word-based dictionaries cannot capture the sentiment of chat messages on Twitch due to the platform-specific slang. Barbieri et al. define two Twitch-specific tasks: predicting emotes that are likely to be used in a message, and detecting troll messages, for which they also utilize emotes to generate ground truth labels [2]. Both tasks focus on the text of chat messages. Multiple experiments show that a Bidirectional LSTM [11] performs best for both tasks.

In this work we introduce a novel task: the prediction of the subscription status of users based on their channel interaction. This is valuable knowledge that can be used in a subscription recommendation setting. Many recommendation algorithms, such as collaborative filtering, try to predict whether a user is interested in an item (in our case a channel) by correlating her preferences with those of similar users [21]. In our setting, users who have subscribed to similar channels might be recommended the channels to which other similar users are subscribed. However, it is not trivial to include personal interactions in such algorithms. For example, Twitter relies on a graph-based recommendation system that models which users a given user follows [10]. YouTube recommends individual videos instead of entire channels. Twitch subscriptions cost a monthly fee, while other social media platforms such as Facebook or Instagram allow users to follow or subscribe to channels free of charge. Note that approaches outside of the domain of social media are also related: for example, in a digital newspaper setting, potential subscribers are identified using different user engagement features, such as the number of articles read, or the average time spent on an article [6].
Instead of a single newspaper, Twitch hosts hundreds of channels that a user can subscribe to. The direct interaction of users with their channels’ hosts based on the chats and comments hence plays an important role.

3 Subscription Prediction for Twitch

We define the task of subscription prediction for Twitch as follows: Given the chat messages of a user in a channel on Twitch (including metadata such as timestamps and the currently streamed game), predict whether or not the user is subscribed to the channel. We instructed the participants not to augment their models using data other than the training data we supplied. The only exception to this rule is the use of pretrained models or dictionaries of emotes and their text representations. We ensured that the training and test datasets are disjoint in terms of user-channel combinations in order to prevent leakage of ground truth.³

3.1 Twitch Crawl

A large dataset of nearly all publicly available Twitch comments in January 2020 was crawled using Twitch’s official API. Only channels labeled as English were considered. All user-channel combinations for which the subscription status changed during the recorded time period were omitted, i.e., if a user subscribed to or unsubscribed from the channel during January 2020. For each user-channel combination for which at least one comment has been recorded, the following metadata was recorded:
– Name of the channel (anonymized).
– Name of the commenting user (anonymized).
– Whether or not the user is subscribed to the channel.
– All public chat messages of the user in the channel in the recorded time period, each containing the timestamp when the user commented, the game that was played in the channel when the user commented in the form of a string label, and the chat comment/message itself.

Many messages contain so-called emotes, i.e., still or moving images approximately the size of a standard emoticon.
Emotes are very popular on Twitch and are used to express the emotional state of a user while watching [15]. Every emote has a text representation that is present in the text field. For example, in the message “awesome LUL”, ‘LUL’ is the text representation of an emote that indicates general laughter and amusement.

The crawled dataset was split into training and test datasets. The training dataset has a size of approximately 37 GB. In what follows, we describe the sampling strategy for the test dataset and report on a brief corpus analysis.

³ By mistake, emotes accessible to already subscribed users were not removed; one of the participants exploited this “feature” (without notifying us), rendering their approach infeasible in practice. Nevertheless, it provides for an interesting baseline.

3.2 Test Dataset

For the test dataset, we sampled 90,000 user-channel combinations with corresponding metadata, resulting in 636,452 messages. To ensure that different user and channel activities are covered in the test dataset, we employed the following sampling procedure. Given the number of messages in the dataset per user and channel, we categorized each user and channel into three activity classes based on the number of comments: the 25 % of the users/channels with the lowest message counts are considered to be of low activity, and the 25 % with the largest message counts are considered to be of high activity. All other users and channels are considered to be of normal activity. We sampled 10,000 user-channel combinations for each of the nine combinations of user and channel activity, yielding 90,000 user-channel combinations in total for the test dataset. These user-channel combinations were removed from the training dataset. Additionally, for a randomly sampled 50 % of users in the test set, we removed their comments in other channels as well, such that half of the users are not present in the training dataset at all. This allowed us to analyze whether models perform better for already known users, even though no messages in the desired channel were present.

3.3 Training Dataset

After removing the test data and the messages of 50 % of the test users in all other channels, the training dataset contained 29,539,420 user-channel combinations and a total of 410,686,442 public Twitch comments. Table 1 summarizes key figures of the data. The training dataset has records of 146,537 channels and 7,923,774 users. On average, each user has written comments in fewer than four channels, while being subscribed to an average of 1.5 channels. Among the 29,539,420 user-channel combinations in the training data, the user was subscribed in 2,368,323 cases (8.02 %) and not subscribed in 27,171,097 cases (91.98 %). Subscribed users have a higher mean comment count than non-subscribed users. The large difference in standard deviation between subscribed and non-subscribed users can be explained by the use of bots that are not subscribed to channels, but comment very often in order to engage viewers, or to notify users and the streamer about certain events.

Table 1. Statistics of the training dataset.

Statistic                                        Mean       Standard Deviation   Median
channels a user comments in                      3.73       14.91                2
channels a user is subscribed to                 1.50       1.33                 1
comments per user in channel                     51.83      2350.20              7
comments per user in channel (subscribed)        55.66      164.78               9
comments per user in channel (not subscribed)    43.08      2385.56              6
comments in a channel                            2802.61    23931.08             285

Figure 1 depicts the number of messages that a user has sent or a channel has received.
Both histograms show exceedingly active users and channels that have sent and received an extensive number of comments. The user with the most comments, “streamelements”, is a bot used by many streamers to send notifications about the channel to the chat. The channel that received the most comments, “xqcow”, belongs to a professional gamer and streamer.

Figure 1. Histogram of the number of messages per channel (top) and user (bottom). [Plot omitted; y-axis: messages on a logarithmic scale from 10^0 to 10^7; x-axes: channels (0 to 140,000) and users (0 to 8 million).]

Regarding the content, the word cloud in Figure 2 gives an impression of the words used in the comments. Emotes play an important role in Twitch comments, e.g., LUL and PogChamp are heavily used. The Twitch chat is case-sensitive and only displays emotes if they are typed correctly; emotes mostly contain capital letters. Therefore, most words in Figure 2 containing capital letters depict emotes; “normal” words are mostly written lower-case. Besides emotes, ASCII-style emoticons such as :) or :D are popular on Twitch, too. It is not surprising that gaming and online slang words, such as “u” as a short form for “you”, “stream”, “play”, and “lol”, are often used. Kobs et al. [15] provide a more detailed analysis of usage and activity patterns in Twitch comments.

3.4 Performance Measures and Baselines

Given the 90,000 user-channel combinations including their metadata from the test dataset, the participants’ models were supposed to predict whether or not the user is subscribed to the channel. Submissions were evaluated using the F1 measure, which is the harmonic mean of precision and recall with respect to subscribed user-channel combinations. Owing to the high class imbalance between subscribed and unsubscribed users, a majority baseline, which always predicts “not subscribed”, yields an F1 score of 0.0.
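To make the measure and the majority baseline concrete, the following minimal sketch computes precision, recall, and F1 for the positive class “subscribed” from lists of boolean labels. The function names `f1_from_pr` and `f1_subscribed` and the toy labels are illustrative assumptions and not part of the challenge's evaluation code; a zero denominator is mapped to 0.0 by convention.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_subscribed(y_true, y_pred):
    """Precision, recall and F1 with respect to the class "subscribed" (True)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# Toy data (assumed): 2 of 10 user-channel combinations are subscribed.
y_true = [True, True] + [False] * 8

# Majority baseline: always predict "not subscribed", so there are no
# true positives and F1 is zero.
majority = [False] * len(y_true)
print(f1_subscribed(y_true, majority))  # (0.0, 0.0, 0.0)

# Cross-check against the precision and recall reported for the winning
# submission in Table 4.
print(round(f1_from_pr(0.2796, 0.4446), 4))  # 0.3433
```

Because F1 is computed only for the subscribed class, a classifier that never predicts “subscribed” cannot score above zero regardless of how accurate it is on the majority class.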
We further provide a random baseline which assigns class labels according to their distribution found in the training dataset (8.02 % subscribed, 91.98 % not subscribed). Finally, the submission ItsBoshyTime provides a baseline based on the usage of subscriber-only emotes.

Figure 2. A word cloud of the training dataset (excluding common stop words) in the form of the Twitch logo.

4 Survey of Submitted Approaches

Of the 23 registered teams, only four submitted their approaches, three of which also submitted a notebook paper describing their approach.⁴ All approaches rely on machine learning methods (see Table 2 for an overview), but model the input data in different ways. Table 3 gives an overview of the features used, categorized into four groups: (1) stylometric features describing the writing style of the users, (2) user activity features modeling the behavior of the users, (3) channel activity features modeling the behavior of the channels, and (4) interaction features modeling the relationship between a user and a channel. In the following, the approaches are reviewed in greater detail.

⁴ Given the importance of emotes on Twitch, and for a bit of fun, we asked participants to choose one of the most common Twitch emotes as their team name.

Table 2. Overview of the submitted approaches by classification model used, employed data re-sampling, and model optimization.

Team            Model            Re-sampling                         Optimization
VoyTECH         CatBoost         downsampling to balance activity    feature selection, data sampling, parameter tuning
CoolStoryBob    XGBoost          downsampling to balance class       feature selection
StinkyCheese    Neural network   –                                   parameter tuning

Table 3. Overview of the hand-crafted features for each approach, separated by stylometric features of a user’s messages, activity features of the user and the channel separately, and interaction features of a user within a channel.

Features                       VoyTECH                   CoolStoryBob            StinkyCheese
Stylometric (user)             med. chars/message        num. emoji              avg. num. words/channel
                               max. chars/message        num. distinct emotes    –
                               num. chars/message        num. emotes             –
                               –                         num. allcaps            –
                               –                         num. single chars       –
                               –                         num. numericals         –
                               –                         num. stop word          –
                               –                         num. !, #, @            –
                               –                         num. emoticons          –
                               –                         avg. word length        –
                               –                         sentiment score         –
Activity (user)                activity group            num. word               num. words
                               num. channels             num. distinct games     num. channels
                               num. messages/game        num. games              –
                               max. message interval     –                       –
                               active days/month         –                       –
                               num. messages/game        –                       –
                               num. games                –                       –
                               top game                  –                       –
                               num. message/chatting     –                       –
                               % messages in top game    –                       –
Activity (channel)             id                        –                       num. messages
                               num. users                –                       avg. num. words per user
                               activity group            –                       avg. users
                               max. message interval     –                       –
                               active days/month         –                       –
                               num. messages/game        –                       –
                               num. games                –                       –
                               top game                  –                       –
                               num. message/chatting     –                       –
                               % messages in top game    –                       –
Interaction (user + channel)   time of first message     –                       sum. time spent
                               time of last message      –                       avg. message interval
                               days active               –                       std. message interval
                               –                         –                       min. message interval
                               –                         –                       max. message interval

VoyTECH by Bayer and Zouzias [3] is the winning approach. It is based on gradient boosting trees (CatBoost [20]) with hand-engineered features that model the user and channel behavior without considering the content of the chat text. Since the approach does not use the textual content, the chat messages are not preprocessed. Instead, the input is modeled completely by the 26 features shown in Table 3. Some features represent superficial stylometric information that encodes the message length, some interaction features encode interaction duration, but most features model the activity of users and channels. It is noteworthy that VoyTECH uses the game as an anchor to assess the relationship between unseen users and channels: eleven of the 26 features indicate the relationship between games, channels, and users.
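Several of the interaction features listed in Table 3 (time of first and last message, days active, message intervals) can be derived from message timestamps alone. The sketch below is a hypothetical illustration in plain Python; the function name, the dictionary keys, and the use of Unix timestamps in seconds are assumptions and do not reproduce any participant's code.

```python
from datetime import datetime, timezone
from statistics import mean


def interaction_features(timestamps):
    """Derive simple interaction features from the Unix timestamps (seconds)
    of one user's messages in one channel. Expects at least one timestamp."""
    ts = sorted(timestamps)
    # Gaps between consecutive messages; empty if there is only one message.
    intervals = [b - a for a, b in zip(ts, ts[1:])]
    # Distinct calendar days (UTC) on which the user wrote at least once.
    days = {datetime.fromtimestamp(t, tz=timezone.utc).date() for t in ts}
    return {
        "first_message": ts[0],
        "last_message": ts[-1],
        "days_active": len(days),
        "num_messages": len(ts),
        "avg_interval": mean(intervals) if intervals else 0.0,
        "max_interval": max(intervals) if intervals else 0.0,
    }


# Three messages: two on 1 January 2020 and one exactly a day later (UTC).
example = [1577836800, 1577840400, 1577923200]
features = interaction_features(example)
print(features["days_active"], features["max_interval"])  # 2 82800
```

Features of this kind need no text preprocessing, which matches the observation that the winning approach ignores message content entirely.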
The authors subsample a validation dataset from the training data that is structurally similar to the test data, having a balanced distribution over user and channel activity levels as well as a balanced number of known users. To find the optimal configuration of their model, Bayer and Zouzias carry out several experiments with varying features, differently sized subsets of the training data, and diverse hyperparameters. They conclude that the best model on their validation dataset uses as many features and as much data as possible, as opposed to using a specific subset of the data or using only a selection of features.

CoolStoryBob by Gärtner et al. [9] uses a feature-based gradient boosting model (XGBoost [4]), but focuses on representing the users’ texts rather than their activity or their interaction with the channels. The chat messages are preprocessed by lower-casing; removing the most and the least frequent words, common colloquial terms, stop words (NLTK’s stopword list [16]), and single-character tokens; replacing emojis and emoticons with corresponding text tokens; lemmatization (WordNet); and collapsing repetitions. The approach combines three sets of features: count vectors of the game titles, TF-IDF vectors computed from the chat messages with regard to subscription status, and hand-crafted numerical features primarily describing stylometric information as shown in Table 3. The authors subsample the original training dataset to balance the subscription status of the user-channel combinations. To find the optimal configuration of their model, the authors carry out a feature-value analysis and compare different model configurations with five-fold cross-validation.

ItsBoshyTime exploits some shortcomings of our dataset.
Since subscribers of channels can use channel-specific emotes, the usage of such emotes from the target channel reveals the ground truth about the user in question. While this approach is impractical, it provides an interesting baseline since not all subscribed users make use of the channel-specific subscriber-only emotes available to them. To extract subscriber emotes, a dictionary of channels and their emotes was constructed from the messages in the training dataset via a heuristic: if a word begins with a lowercase letter and contains either a capital letter or a number, it is assumed to be an emote. While most globally available Twitch emotes begin with a capital letter (e.g., LUL or PogChamp), subscriber emotes have a lower-case prefix based on the username, which is usually automatically generated by Twitch.⁵ Based on this heuristic, an emote list is built for each channel in the training data. If a new user-channel combination is to be predicted, it is checked whether the channel has already been seen. If the channel is unknown, the approach defaults to predicting “not subscribed”; if an emote list for the channel is available, it is matched against the user’s list of used emotes. In case of a match, the user is predicted to be subscribed to the respective channel.

⁵ https://help.twitch.tv/s/article/subscriber-emote-guide

StinkyCheese by Loures et al. [17] is based on a neural network, combining an LSTM [11] with hand-crafted features to model the verbosity, the participation, and the attendance of users towards channels. The chat messages of user-channel combinations are not preprocessed, but concatenated and fed to an LSTM layer for encoding. The resulting textual encoding is concatenated with hand-crafted features covering all four of our categories, but each less extensively than in the other submissions. The concatenated feature vector is fed through a fully connected layer for classification. In order to handle the large dataset, the training dataset is split into chunks of 100,000 user-channel combinations, and the model is trained on one of these chunks. To improve the model, the authors optimize the hyperparameters on a second chunk of 100,000 user-channel combinations.

Table 4. Results of the competition. Shown are precision, recall, and F1 as well as the runtime in H:M:S. For the random baseline, expected values are provided.

Rank   Team              Precision   Recall   F1       Runtime
1      VoyTECH           0.2796      0.4446   0.3433   00:07:39
2      CoolStoryBob      0.1904      0.4341   0.2647   00:05:34
3      ItsBoshyTime      0.4808      0.1775   0.2593   00:00:19
4      StinkyCheese      0.0817      0.5487   0.1422   00:13:06
–      Random Baseline   0.0689      0.0802   0.0741   –

5 Results and Discussion

The performance achieved by the participants and the random baseline is shown in Table 4. VoyTECH outperforms the competition by a fair margin.

Relevant Features. A general trend we identify is that activity and interaction between games, channels, and users are more important than textual features. Gärtner et al. [9] report that there is little difference in word usage between subscribed and not subscribed users. They also find that content features are of little significance in their model. In addition, the winning approach VoyTECH does not use content features at all; it models only interaction and, where they reflect activity, stylometrics. StinkyCheese, which relies most on content by using an LSTM to directly incorporate the chat message contents, achieves the weakest performance. The top two approaches VoyTECH and CoolStoryBob explore the influence of activity groups on their performance while using very similar models. The authors of VoyTECH additionally resampled their validation dataset based on activity.

Generalization to Unseen Users.
As described in Section 3.2, 50 % of the users in the test set do not appear in the training dataset, enabling an analysis of whether there are any differences in prediction performance between known and new users. Table 5 shows the F1 scores for all submissions on the test set, dependent on whether users are or are not part of the training data (Known Users and New Users, respectively). For each approach except StinkyCheese, users already present in the training data were more often classified correctly than new users. The drop in performance is the largest for VoyTECH, as it relies on many user-centered features. Given only the messages of a user in the target channel, and thus missing additional information from the user’s interactions with other channels, the extracted features are less representative. Still, VoyTECH achieves better performance than the other approaches.

Table 5. Performance difference (F1 scores) of the submitted approaches on different subsets of the test data: whether or not prior information about the user’s behavior in other channels is available, and dependent on channel and user activity classes.

Subset                           VoyTECH   CoolStoryBob   ItsBoshyTime   StinkyCheese
Users              Known Users   0.3670    0.2660         0.2660         0.1410
                   New Users     0.3210    0.2630         0.2530         0.1440
Channel Activity   low           0.2469    0.1827         0.0505         0.1232
                   normal        0.3640    0.2404         0.1718         0.1308
                   high          0.4046    0.3481         0.4220         0.1738
User Activity      low           0.2824    0.2420         0.1781         0.0961
                   normal        0.3452    0.2726         0.2689         0.1283
                   high          0.3744    0.2672         0.2973         0.1716

Results by User and Channel Activity. Table 5 also shows the performance of the submitted approaches for the different channel and user activity classes defined in Section 3.2. For the most part, the higher the activity of a user or a channel, the better the model can predict the subscription status of a user-channel combination.
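The activity classes used for this breakdown follow the quartile rule of Section 3.2: the 25 % of users or channels with the fewest messages are of low activity, the 25 % with the most messages are of high activity, and all others are of normal activity. A minimal sketch of that rule is given below. The function name and the toy counts are hypothetical, and ties at the class boundaries are broken by sort order, which is an assumption because the tie handling is not specified.

```python
def activity_classes(message_counts):
    """Map each entity (user or channel) to 'low', 'normal' or 'high' activity.

    message_counts: dict from entity name to its number of messages.
    The lowest 25 % of entities by message count are 'low', the highest
    25 % are 'high', all others 'normal'. Ties are broken by sort order
    (an assumption; the tie handling is not specified in the paper).
    """
    ranked = sorted(message_counts, key=message_counts.get)
    quarter = len(ranked) // 4
    classes = {}
    for rank, entity in enumerate(ranked):
        if rank < quarter:
            classes[entity] = "low"
        elif rank >= len(ranked) - quarter:
            classes[entity] = "high"
        else:
            classes[entity] = "normal"
    return classes


# Toy message counts for eight hypothetical users.
counts = {"u1": 1, "u2": 2, "u3": 3, "u4": 5,
          "u5": 8, "u6": 13, "u7": 210, "u8": 9000}
classes = activity_classes(counts)
print(classes["u1"], classes["u4"], classes["u8"])  # low normal high
```

Applying the same function once to per-user counts and once to per-channel counts gives the two labels whose nine combinations define the test-set strata.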
A more fine-grained activity analysis can be found in Table 6, which considers all combinations of activity classes of users and channels. Again, the performance is mostly best for highly active users and channels and worst for users and channels with low activity. Most extracted features are based on the interaction of users and channels as well as their content. Having few interactions leads to less data and thus less robust features for a given user-channel combination.

Table 6. F1 scores for different user and channel activity combinations. Best values per team are marked with an asterisk.

VoyTECH                  Channel Activity
User Activity            low      normal   high
low                      0.184    0.298    0.351
normal                   0.272    0.331    0.412
high                     0.264    0.411    0.430*

CoolStoryBob             Channel Activity
User Activity            low      normal   high
low                      0.156    0.156    0.354
normal                   0.187    0.213    0.391*
high                     0.189    0.272    0.320

ItsBoshyTime             Channel Activity
User Activity            low      normal   high
low                      0.028    0.088    0.294
normal                   0.057    0.131    0.449
high                     0.058    0.225    0.482*

StinkyCheese             Channel Activity
User Activity            low      normal   high
low                      0.086    0.079    0.121
normal                   0.115    0.107    0.168
high                     0.147    0.167    0.198*

Ensemble Approaches. Given that three of the approaches rely on different features and classifiers (excluding ItsBoshyTime), it is interesting to explore ensemble classification. We evaluated four different ensembles:
1. Majority vote, where users were classified as subscribed to a channel if at least two approaches say so,
2. An “any” ensemble, which classifies users as subscribed to a channel if at least one approach says so,
3. An “all” ensemble, which classifies users as subscribed to a channel if all approaches say so, and
4. A “VoyTECH or else” ensemble, which follows the classification of the best-performing approach, VoyTECH, unless both other approaches disagree with it.

All ensembles lead to overall worse F1 scores than the VoyTECH approach by itself. However, as can be expected, the “any” ensemble has a notably higher recall at a lower precision, while the “all” ensemble has higher precision at lower recall. Thus, these ensembles may still be relevant when optimizing for one of these metrics. Note that with three binary classifiers, the “VoyTECH or else” rule coincides with the majority vote: whenever both other approaches disagree with VoyTECH, they form the majority. This explains the identical scores of the two ensembles. The full results for all ensembles are given in Table 7.

Table 7. Ensembles built from the approaches submitted to the challenge. None of the ensembles outperforms the VoyTECH approach.

Method                         Precision   Recall   F1
VoyTECH                        0.28        0.44     0.34
majority vote                  0.20        0.46     0.28
“any” ensemble                 0.10        0.76     0.17
“all” ensemble                 0.38        0.21     0.27
“VoyTECH or else” ensemble     0.20        0.46     0.28

6 Conclusion

This paper presents the results of the ECML-PKDD ChAT Discovery Challenge 2020. It outlines the task, the datasets, the approaches, as well as the results achieved by the submissions. Our analysis of the models covers different user and channel activity groups, as well as the generalizability towards new users.

We are convinced that there is potential to further improve the predictions, for example by adding message contents to the winning submission VoyTECH to raise model fidelity, or by using ideas from StinkyCheese to better predict new users. While most approaches work best with highly active users and channels, CoolStoryBob seems to work particularly well with normally active users. Combining their features and ideas into future models may further improve the prediction quality. In addition, adding Twitch-specific features such as the sentiment of Twitch comments (e.g., extracted using the technique described by Kobs et al. [15]) appears promising. In this challenge, the channel and user names were anonymized for privacy. However, the name of a user may give hints on the subscription status, e.g., for users who include their favorite game in their screen name.
Altogether, our challenge takes a first step towards solving the task of predicting the subscription status of users at channels, giving rise to new opportunities for marketing on game streaming platforms.

Acknowledgments

We thank all participating teams for submitting their models and papers, and the ECML-PKDD organizers for hosting our shared task as a discovery challenge.

Bibliography

[1] S. L. Anderson. Watching People Is Not a Game: Interactive Online Corporeality, Twitch.tv and Videogame Streams. Game Studies, 17(1), July 2017. ISSN 1604-7982. URL http://gamestudies.org/1701/articles/anderson.
[2] F. Barbieri, L. Espinosa Anke, M. Ballesteros, J. Soler, and H. Saggion. Towards the Understanding of Gaming Audiences by Modeling Twitch Emotes. In Proceedings of the 3rd Workshop on Noisy User-generated Text, pages 11–20, Copenhagen, Denmark, 2017. Association for Computational Linguistics. doi: 10.18653/v1/W17-4402. URL http://aclweb.org/anthology/W17-4402.
[3] I. Bayer and A. Zouzias. Team voyTECH: User Activity Modeling with Boosting Trees. In K. Kobs, M. Potthast, M. Wiegmann, A. Zehe, B. Stein, and A. Hotho, editors, Proceedings of the ECML-PKDD Discovery Challenge: Chat Analytics for Twitch (ChAT 2020), 2020.
[4] T. Chen and C. Guestrin. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
[5] B. C. Churchill and W. Xu. The Modem Nation: A First Study on Twitch.TV Social Structure and Player/Game Relationships. In 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), pages 223–228, Oct. 2016. doi: 10.1109/BDCloud-SocialCom-SustainCom.2016.43.
[6] H. Davoudi, M. Zihayat, and A. An. Time-aware subscription prediction model for user acquisition in digital news media. In Proceedings of the 2017 SIAM International Conference on Data Mining, pages 135–143. SIAM, 2017.
[7] T. Faas, L. Dombrowski, A. Young, and A. D. Miller. Watch Me Code: Programming Mentorship Communities on Twitch.tv. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW):1–18, Nov. 2018. ISSN 2573-0142. doi: 10.1145/3274319. URL https://dl.acm.org/doi/10.1145/3274319.
[8] E. Gandolfi. To watch or to play, it is in the game: The game culture on Twitch.tv among performers, plays and audiences. Journal of Gaming & Virtual Worlds, 8(1):63–82, Mar. 2016. ISSN 1757191X, 17571928. doi: 10.1386/jgvw.8.1.63_1. URL http://openurl.ingenta.com/content/xref?genre=article&issn=1757-191X&volume=8&issue=1&spage=63.
[9] M. Gärtner, A. Theissler, and M. Fernandes. Detecting Potential Subscribers on Twitch: A Text Mining Approach with XGBoost – Discovery challenge ChAT: CoolStoryBob. In K. Kobs, M. Potthast, M. Wiegmann, A. Zehe, B. Stein, and A. Hotho, editors, Proceedings of the ECML-PKDD Discovery Challenge: Chat Analytics for Twitch (ChAT 2020), 2020.
[10] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh. WTF: The Who to Follow Service at Twitter. In Proceedings of the 22nd International Conference on World Wide Web, pages 505–514, 2013.
[11] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[12] M. R. Johnson and J. Woodcock. “It’s like the Gold Rush”: The Lives and Careers of Professional Video Game Streamers on Twitch.tv. Information, Communication & Society, 22(3):336–351, 2019.
[13] M. R. Johnson and J. Woodcock. The impacts of live streaming and Twitch.tv on the video game industry. Media, Culture & Society, 41(5):670–688, July 2019. ISSN 0163-4437. doi: 10.1177/0163443718818363. URL https://doi.org/10.1177/0163443718818363. Publisher: SAGE Publications Ltd.
[14] M. R. Johnson and J. Woodcock. “And Today’s Top Donator is”: How Live Streamers on Twitch.tv Monetize and Gamify Their Broadcasts. Social Media + Society, 5(4):2056305119881694, Oct. 2019. ISSN 2056-3051. doi: 10.1177/2056305119881694. URL https://doi.org/10.1177/2056305119881694. Publisher: SAGE Publications Ltd.
[15] K. Kobs, A. Zehe, A. Bernstetter, J. Chibane, J. Pfister, J. Tritscher, and A. Hotho. Emote-Controlled: Obtaining Implicit Viewer Feedback Through Emote-Based Sentiment Analysis on Comments of Popular Twitch.tv Channels. ACM Transactions on Social Computing, 3(2):1–34, 2020.
[16] E. Loper and S. Bird. NLTK: The Natural Language Toolkit. arXiv preprint cs/0205028, 2002.
[17] T. Loures, G. Fernandes, F. Araújo, K. Martins, and P. Vaz de Melo. StinkyCheese: Chat-Based Model for Subscription Classification. In K. Kobs, M. Potthast, M. Wiegmann, A. Zehe, B. Stein, and A. Hotho, editors, Proceedings of the ECML-PKDD Discovery Challenge: Chat Analytics for Twitch (ChAT 2020), 2020.
[18] J. Olejniczak. A Linguistic Study of Language Variety Used on Twitch.tv: Descriptive and Corpus-Based Approaches. page 6.
[19] M. Potthast, T. Gollub, M. Wiegmann, and B. Stein. TIRA Integrated Research Architecture. In N. Ferro and C. Peters, editors, Information Retrieval Evaluation in a Changing World, The Information Retrieval Series. Springer, Sept. 2019. ISBN 978-3-030-22948-1. doi: 10.1007/978-3-030-22948-1_5.
[20] L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin. CatBoost: Unbiased Boosting with Categorical Features. In Advances in Neural Information Processing Systems, pages 6638–6648, 2018.
[21] J. B. Schafer, D. Frankowski, J. Herlocker, and S. Sen. Collaborative filtering recommender systems. In The Adaptive Web, pages 291–324. Springer, 2007.
[22] J. Woodcock and M. R. Johnson. The Affective Labor and Performance of Live Streaming on Twitch.tv. Television & New Media, 20(8):813–823, Dec. 2019. ISSN 1527-4764. doi: 10.1177/1527476419851077.
URL https://doi.org/10.1177/1527476419851077. Publisher: SAGE Publications. [23] J. Woodcock and M. R. Johnson. Live Streamers on Twitch.tv as Social Media Influencers: Chances and Challenges for Strategic Communication. International Towards Predicting the Subscription Status of Twitch.tv Users 15 Journal of Strategic Communication, 13(4):321–335, Aug. 2019. ISSN 1553-118X. doi: 10.1080/1553118X.2019.1630412. URL https://doi.org/10.1080/1553118X.2019.1630412. Publisher: Routledge _eprint: https://doi.org/10.1080/1553118X.2019.1630412. [24] C. Zhang and J. Liu. On crowdsourced interactive live streaming: a Twitch.tv-based measurement study. In Proceedings of the 25th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video - NOSSDAV ’15, pages 55–60, Portland, Oregon, 2015. ACM Press. ISBN 978-1-4503-3352-8. doi: 10.1145/2736084.2736091. URL http://dl.acm.org/citation.cfm?doid=2736084.2736091.