A Natural Language Processing Based Framework for Early Detection of Anorexia via Sequential Text Processing

Notebook for the BioNLP-IISERB Lab at CLEF 2024

Prateek Sarangi1, Sumit Kumar1, Shraddha Agarwal1 and Tanmay Basu1

1 Department of Data Science and Engineering, Indian Institute of Science Education and Research, Bhopal, India

Abstract
Task 2 of the eRisk shared tasks at CLEF 2024 aims to develop text mining solutions for early prediction of anorexia from texts posted sequentially over social media. Anorexia is an eating disorder, a kind of mental illness, in which people have a distorted perception of their body weight and accordingly manipulate their food habits, often resulting in deficient body weight. The aim here is to identify anorexia by processing an individual's interactions over social media through text mining models. The organisers provided training data with ground truths for training the models and test data for evaluating their performance. The BioNLP research group at the Indian Institute of Science Education and Research Bhopal (IISERB) participated in Task 2 and submitted five runs corresponding to five text mining frameworks. The frameworks combine five different classifiers with individual feature engineering techniques. The bag-of-words model and transformer-based embeddings were used as features for the individual classifiers. The performance of multiple classifiers was evaluated on the training corpus, after which Random Forest, Adaptive Boosting, Logistic Regression, Support Vector Machine (SVM), and Longformer classifiers were chosen to run on the test set. Experimental results show that the SVM and AdaBoost classifiers using TF-IDF-based weighting schemes achieved the highest precision scores among all submitted runs in Task 2 of eRisk 2024. However, the performance of our models in terms of the other metrics, such as recall, F-score and ERDE, is not reasonably good compared to the other runs. Hence, we plan to develop transformer-based embeddings from scratch using data collected from multiple social media platforms.

Keywords
BioNLP, Information Extraction, Text Classification, Mental Health, Anorexia Detection

1. Introduction

The rise of social media has revolutionised communication and opened up new opportunities for sociological and psychological research, particularly in mental health [1]. Anorexia nervosa, a severe eating disorder characterised by an intense fear of weight gain and a distorted body image leading to self-starvation and significant health issues, is a crucial area of focus. Social media's widespread use provides a unique, real-time view into behaviours and expressions that may be useful to identify such mental disorders [2, 3, 4]. The vast amount of user-generated content on platforms like Facebook, Twitter, and Reddit offers an extensive data pool for detecting early signs of mental health conditions like anorexia [5]. Such content, including written posts, comments, shared photos, and likes, can be analysed for patterns indicating the onset of anorexia, which may be helpful for early interventions [6]. Recognising social media's potential as a public health tool, the Conference and Labs of the Evaluation Forum (CLEF) launched the eRisk initiative in 2018 to explore this potential. Initially focused on depression, the initiative later included anorexia, showcasing a broader commitment to using big data for mental health monitoring and intervention.
In 2024, the second shared task of CLEF eRisk highlights the importance of sequential text processing to identify anorexia markers as they appear in social media posts [7], much as mental health professionals analyse patient behaviours over a period of time while treating a mental illness [8]. This method aids not only in early detection but also in understanding the progression of anorexia through digital footprints. The BioNLP group at the Indian Institute of Science Education and Research Bhopal (IISERB) has developed comprehensive text mining frameworks for this task. These frameworks explore text feature extraction methods like transformer-based embeddings and classical bag-of-words models for semantic interpretation of social media text to identify the indicators of anorexia [9]. Subsequently, various text classifiers, viz. Random Forest, Adaptive Boosting, Logistic Regression, SVM, and Longformer, were trained on multiple features derived from the text data to categorise the posts into either the anorexia or the control group. The proposed frameworks are presented in Section 2 of the paper. The organisers provided a training corpus with ground truths for developing the models and a separate test corpus for evaluating the performance of the submitted runs. The empirical analysis of the proposed frameworks on both the training and test corpora is provided in Section 3.3 in terms of the evaluation metrics provided by the eRisk organisers, such as ERDE, latency and F-score [10, 11]. The experimental evaluation shows that two of our runs, using the Adaptive Boosting and SVM classifiers with the TF-IDF model, achieve the first and second ranks in precision among all the runs submitted for this task. Moreover, our Longformer model achieves the best latency and speed, and another run, using Logistic Regression with the entropy-based bag-of-words model, achieves the best speed among all 44 runs submitted for Task 2 of eRisk 2024. However, our models could not achieve reasonable recall, F1 and ERDE scores compared to many other runs submitted for this task. Our models also show poor performance in the ranking-based evaluations; hence, we need to investigate this direction to improve the performance. We plan to train a transformer-based model from scratch using data collected from various social media sites like Reddit and Twitter to identify the subtle nuances in the sequential texts processed over time. Furthermore, in future work, we need to consider the timestamp of the posts as a significant indicator in the model for semantic interpretation of the sequential texts.
2. Proposed Frameworks

In the effort to identify early signs of anorexia through sequential text processing of social media conversations, particularly from Reddit, we have developed various text classification frameworks. Our methodologies are designed to harness the vast amount of unstructured text in these digital interactions, formatted in XML and provided by the organisers of the eRisk 2024 challenge.

2.1. Feature Engineering

The feature engineering approaches we have explored are crafted to capture the intricate linguistic and semantic patterns of sequential social media texts in order to identify signs of anorexia. Both classical Bag of Words (BoW)-based features and recent transformer-based embeddings are used to train the classification models. The following three feature engineering techniques were used to generate features from the given training corpus.

2.1.1. TF-IDF Weighting Scheme of BoW

Each document, representing a user's aggregated posts, is converted into a vector with each unique term as a feature. This model provides a fundamental basis for more advanced analyses. The BoW approach [12, 13] considers each unique term, known as a unigram, as a feature; together these terms constitute the vocabulary of the text collection. Sometimes two, three, or more consecutive terms are treated as a single feature, known as bigrams, trigrams, or n-grams respectively, based on the significance of the text sequence. In the experimental analysis, we explored the classifiers' performance using unigrams, bigrams and trigrams. After generating the dictionary of a corpus, the Term Frequency-Inverse Document Frequency (TF-IDF) weighting is used to develop the vector of a given text [14, 15, 16, 17]. TF counts the number of occurrences of a term in a given text, whereas the IDF of a term, say t, is defined as

$$\mathrm{IDF}(t) = \log\frac{N}{DF(t)},$$

where $N$ is the total number of texts in the collection and $DF(t)$ is the number of texts in the corpus containing the term t.

2.1.2. Entropy Based Weighting Scheme of BoW

This scheme assigns weights to the terms in a document based on their entropy, which measures the amount of information or uncertainty associated with each term. Many researchers use the entropy-based term weighting technique to form a term-document matrix from a text collection [13, 15, 16, 17, 18]. This method is developed in the spirit that the more important term is the more frequent one that occurs in fewer documents, taking the distribution of the term over the corpus into account [15]. The weight of a term in a document is determined by the entropy of the term frequency of the term in that document [15]. The weight $W_{ij}$ of the $i$-th term in the $j$-th document is defined by the Entropy model [15, 16] (see https://radimrehurek.com/gensim/models/logentropy_model.html) as follows:

$$W_{ij} = \log\left(tf_{ij} + 1\right)\times\left(1 + \frac{\sum_{j=1}^{N} P_{ij}\,\log P_{ij}}{\log(N+1)}\right), \quad \text{where } P_{ij} = \frac{tf_{ij}}{\sum_{j=1}^{N} tf_{ij}}. \quad (1)$$

Here, $N$ is the total number of documents in the corpus, and $tf_{ij}$ is the frequency of the $i$-th term in the $j$-th document of the corpus. Generally, the BoW model generates many terms, making the term-document matrix sparse and high dimensional, which can badly affect the performance of the text classifiers [17]. Hence, the $\chi^2$-statistic-based term selection technique was used with both the TF-IDF and entropy-based term weighting schemes in the experiments to identify essential terms from the term-document matrix, as it is a widely used technique for term selection [13, 16, 19].
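As an illustration of this feature engineering pipeline, the following sketch builds unigram-to-trigram TF-IDF vectors, computes the log-entropy weighting of Eq. (1), and applies chi-square based term selection with scikit-learn and NumPy. It is a minimal sketch under assumed inputs; the placeholder documents, labels and parameter values (e.g. the number of selected terms) are illustrative and do not reproduce the exact configuration of the submitted runs.

```python
# Minimal sketch of the BoW feature engineering described above; values are illustrative.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["i skipped meals again today and felt dizzy",
        "went for a walk and had a normal dinner with friends"]   # placeholder posts
labels = [1, 0]                                                   # 1: anorexia, 0: control

# TF-IDF weighting over unigrams, bigrams and trigrams.
tfidf = TfidfVectorizer(ngram_range=(1, 3), lowercase=True)
X_tfidf = tfidf.fit_transform(docs)

# Log-entropy weighting following Eq. (1): local log(tf+1) scaled by a global entropy factor.
counts = CountVectorizer(ngram_range=(1, 3), lowercase=True)
tf = counts.fit_transform(docs).toarray().astype(float)           # (documents x terms)
col_sums = tf.sum(axis=0)
p = np.divide(tf, col_sums, out=np.zeros_like(tf), where=col_sums > 0)
with np.errstate(divide="ignore", invalid="ignore"):
    plogp = np.where(p > 0, p * np.log(p), 0.0)
global_weight = 1.0 + plogp.sum(axis=0) / np.log(tf.shape[0] + 1)  # one factor per term
X_entropy = np.log(tf + 1.0) * global_weight

# Chi-square based term selection keeps only the most informative terms.
selector = SelectKBest(chi2, k=min(1000, X_tfidf.shape[1]))
X_selected = selector.fit_transform(X_tfidf, labels)
```

In the experiments, the number of retained terms (500 to 10,000 depending on the classifier, as reported in Section 3.3) was treated as a tunable parameter of this selection step.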
2.1.3. Transformer-Based Embeddings

Utilising cutting-edge transformer architectures like Longformer (https://huggingface.co/allenai/longformer-base-4096), we create dense vector representations of text that capture deep contextual nuances far beyond traditional models. Bidirectional Encoder Representations from Transformers (BERT) is a contextualised word representation model based on a masked language model and pre-trained using bidirectional transformers on general domain corpora, i.e., English Wikipedia and books [20]. The Longformer model [21] was chosen because it performs better than BERT at understanding long-range relationships in texts [17]. It creates feature embeddings that help detect early signs of anorexia by recognising language patterns, from explicit mentions of body image issues to under-expressed signs of distress.

2.2. Text Classification Methods

Our approach to text classification combines the engineered features with several machine learning algorithms chosen for their ability to handle complex relationships within high-dimensional data. We use Support Vector Machines (SVM) with linear and RBF kernels for their efficiency in high-dimensional spaces and binary classification tasks, finding the optimal hyperplane that separates the classes while maximising the margin between them. This method is particularly effective for its robustness against overfitting [22]. Logistic Regression (LR), a probabilistic model, is also employed with L1 and L2 regularisation due to its interpretability and effectiveness for binary outcomes [23]. LR estimates the probability of a particular class by fitting a logistic function to the data, providing precise, interpretable coefficients for each feature, which helps in understanding the influence of different features on the likelihood of anorexia.

In addition, we have used Adaptive Boosting (AB), which improves classification accuracy by combining multiple weak classifiers, such as decision trees. AB sequentially applies a weak classifier to the data, adjusting the weights of misclassified instances so that subsequent classifiers focus more on difficult cases, enhancing performance on complex textual data [24]. Furthermore, we incorporate Longformer, a transformer-based model, which is pre-trained on large datasets and fine-tuned on our specific dataset [21]. This model is used to understand longer context and relationships between words in a text to capture language patterns in user interactions.

By training these classifiers on the features derived from the textual data, we aim to identify individual posts indicating the risk of anorexia. We systematically analyse the performance of these models on training data and validate them on unseen test data to develop a scalable and effective tool for the early detection of anorexia. The SVM, Logistic Regression, and AdaBoost models were implemented using Scikit-learn (http://scikit-learn.org/stable/supervised_learning.html), while the transformer-based models were fine-tuned using the Hugging Face Transformers library [25]. In Section 3.3, we present the experimental results, showcasing the efficacy of our frameworks for identifying signs of anorexia through social media interactions.
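To make the classification setup concrete, the sketch below trains the classical classifiers with a grid search over hyperparameters and shows a minimal Longformer fine-tuning setup from the Hugging Face Transformers library. It is a hedged illustration only: the synthetic placeholder data, the parameter grids, and the single-example tokenisation stand in for the real BoW matrix, tuning ranges and corpus used for the submitted runs, which may have differed.

```python
# Illustrative training of the classical classifiers with grid search, plus a minimal
# Longformer fine-tuning setup. Placeholder data stands in for the real BoW feature matrix.
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Placeholder features and labels; in practice these would be the chi-square-selected
# TF-IDF or entropy-weighted BoW features of the training corpus.
X_train, y_train = make_classification(n_samples=300, n_features=100, random_state=42)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
candidates = {
    "svm": (SVC(), {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}),
    "lr": (LogisticRegression(solver="liblinear", max_iter=1000),
           {"penalty": ["l1", "l2"], "C": [0.1, 1, 10]}),
    "rf": (RandomForestClassifier(random_state=42), {"n_estimators": [100, 300]}),
    "adaboost": (AdaBoostClassifier(random_state=42),
                 {"n_estimators": [50, 200], "learning_rate": [0.5, 1.0]}),
}

best_models = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="f1", cv=cv)
    search.fit(X_train, y_train)
    best_models[name] = search.best_estimator_  # best hyperparameters per classifier

# Minimal Longformer setup for binary classification (anorexia vs. control).
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
longformer = AutoModelForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096", num_labels=2
)
batch = tokenizer(["example merged user history"], truncation=True,
                  padding=True, max_length=4096, return_tensors="pt")
# `batch` would then be fed to a standard fine-tuning loop or the Transformers Trainer API.
```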
3. Experimental Analysis

The dataset provided by the eRisk 2024 challenge organisers includes 2,335 files, each containing a collection of Reddit posts from an individual user. These XML files include metadata such as post dates and titles. Among these, 2,062 files are labelled as class "0" (no signs of anorexia, i.e., the control group), while 273 are labelled as class "1" (potential signs of anorexia).

3.1. Data Preprocessing

For our analysis, we extracted and merged all conversations of each subject from the XML files. It was observed that in some files the text field was empty while the title field contained useful text; in such cases the title was used as well, instead of relying solely on the text field. This preprocessed data was the input for our feature engineering and classification pipelines.
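The merging step can be sketched as follows. The element names (WRITING, TITLE, TEXT) and the directory path are assumptions about the eRisk XML layout made purely for illustration; the snippet simply concatenates the title and text of every post of a subject into one document, falling back on whichever field is non-empty.

```python
# Illustrative merging of a subject's posts from an eRisk-style XML file (assumed element names).
import xml.etree.ElementTree as ET
from pathlib import Path

def merge_subject_posts(xml_path: Path) -> str:
    """Concatenate the title and text of every writing in one subject file."""
    root = ET.parse(xml_path).getroot()
    pieces = []
    for writing in root.iter("WRITING"):            # assumed tag name
        title = (writing.findtext("TITLE") or "").strip()
        text = (writing.findtext("TEXT") or "").strip()
        # Keep the title when the text field is empty, and vice versa.
        if title:
            pieces.append(title)
        if text:
            pieces.append(text)
    return " ".join(pieces)

# Example: one merged document per subject from a directory of XML files (assumed path).
corpus = {p.stem: merge_subject_posts(p) for p in Path("erisk2024_task2").glob("*.xml")}
```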
3.2. Experimental Settings

To assess the performance of our frameworks on the training corpus, we utilised several evaluation metrics commonly employed in binary classification tasks, including precision, recall and F1-score. Stratified 5-fold cross-validation was conducted on the training dataset to evaluate the models' performance and to tune the hyperparameters. Grid search (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) and random search techniques were used to identify the optimal hyperparameter settings for each model based on the cross-validation scores. The best-performing models from the cross-validation stage were then tested on the unseen test dataset. The results of this evaluation are presented and discussed in Section 3.3.

3.3. Results and Discussion

The BioNLP-IISERB team's participation in the eRisk 2024 challenge aimed to detect anorexia early via sequential text processing. The approach involved several experimental runs tailored to explore the optimal parameters and methodologies. Table 1 summarises the team-level details of our participation.

Table 1: BioNLP-IISERB team results for the eRisk 2024 Challenge Task 2

| Team Details | |
|---|---|
| Number of runs | 5 |
| Number of user writings processed | 10 |
| Processing time (from first to last response) | 09:39 |

Table 2 shows the experimental comparison of the various bag-of-words models across different classifiers on the training corpus.

Table 2: Performance of Various Bag of Words Models with Different Classifiers on the Training Corpus

| Classifier | Feature Weighting | # Terms | Recall | Precision | F1 |
|---|---|---|---|---|---|
| LR | Entropy | 2500 | 0.5128 | 0.7650 | 0.6140 |
| LR | TF-IDF | 10000 | 0.6777 | 0.8685 | 0.7613 |
| RF | Entropy | 500 | 0.1465 | 0.5714 | 0.2332 |
| RF | TF-IDF | 1000 | 0.5971 | 0.7689 | 0.6722 |
| SVM | Entropy | 1000 | 0.3626 | 0.9802 | 0.5294 |
| SVM | TF-IDF | 10000 | 0.6850 | 0.9639 | 0.8009 |
| AdaBoost | Entropy | 500 | 0.3993 | 0.7365 | 0.5178 |
| AdaBoost | TF-IDF | 1500 | 0.6410 | 0.8102 | 0.7157 |
| LongFormer | - | - | 0.0741 | 0.0606 | 0.0667 |

The Support Vector Machine (SVM) with TF-IDF weighting emerged as the top performer, achieving the highest F1 score (0.8009) together with a very high precision (0.9639). Logistic Regression (LR) also performed strongly, particularly with TF-IDF, achieving a high recall (0.6777) and the second-best F1 score (0.7613). The AdaBoost classifier showed moderate performance, while the Random Forest models generally underperformed. Surprisingly, the LongFormer model showed remarkably poor performance across all metrics and needs further investigation. A clear trend is the superiority of TF-IDF over entropy-based weighting across all traditional classifiers, often by a significant margin. This suggests that TF-IDF's ability to capture both term importance and distinctiveness is particularly valuable for this classification task. The optimal number of terms varied across classifiers, with SVM and LR performing best with 10,000 terms, while RF and AdaBoost achieved their best performance with fewer terms (1,000 and 1,500, respectively). These results highlight the importance of classifier selection and feature engineering in text classification tasks, with SVM and LR coupled with TF-IDF emerging as strong baseline models for similar problems.

Table 3: Decision-Based Evaluations of the Proposed Frameworks on Test Data

| Run | P | R | F1 | ERDE5 | ERDE50 | latency_TP | speed | latency-weighted F1 |
|---|---|---|---|---|---|---|---|---|
| BioNLP-IISERB 0 (Entropy+BoW+LR) | 0.53 | 0.23 | 0.32 | 0.10 | 0.09 | 2.00 | 1.00 | 0.32 |
| BioNLP-IISERB 1 (TF-IDF+BoW+LR) | 0.54 | 0.75 | 0.62 | 0.08 | 0.04 | 4.00 | 0.99 | 0.62 |
| BioNLP-IISERB 2 (Longformer) | 0.58 | 0.16 | 0.25 | 0.10 | 0.10 | 1.00 | 1.00 | 0.25 |
| BioNLP-IISERB 3 (TF-IDF+BoW+SVM) | 0.67 | 0.51 | 0.58 | 0.08 | 0.06 | 3.00 | 0.99 | 0.58 |
| BioNLP-IISERB 4 (TF-IDF+BoW+AB) | 0.73 | 0.62 | 0.67 | 0.08 | 0.05 | 4.00 | 0.99 | 0.66 |

The decision-based performance in Table 3 varies considerably across the experimental runs. Run 1 (TF-IDF+BoW+LR) demonstrated strong performance with a high F1 score (0.62), indicating a good balance between precision and recall. Run 3 and run 4 showed consistent performance, achieving F1 scores of 0.58 and 0.67, respectively. Notably, these runs also exhibited low ERDE5 values (0.08), suggesting enhanced early recognition capabilities. In contrast, run 0 (Entropy+BoW+LR) and run 2 (Longformer) obtained F1 scores of only 0.32 and 0.25, respectively, and had elevated ERDE50 values (0.09 and 0.10), indicating reduced effectiveness in making early correct decisions.

Table 3 further demonstrates several strengths of the proposed approaches: runs 1, 3 and 4 exhibited high precision balanced with satisfactory recall, resulting in robust F1 scores. Moreover, run 3 and run 4 achieved high speed scores (0.99 each), indicating efficient decision-making processes. The consistent performance of run 3 (TF-IDF+BoW+SVM) and run 4 (TF-IDF+BoW+AB) suggests a reliable and stable approach. However, certain limitations were observed: run 0 and run 2 displayed notably lower recall values (0.23 and 0.16), adversely affecting their overall F1 scores. Additionally, run 0 and run 2 exhibited higher latency, potentially due to inefficiencies in the early stages of model deployment.

Table 4: Ranking-Based Performance of the Proposed Frameworks on Test Data

| Writings | Metric | BioNLP-IISERB0 (Entropy + SVM) | BioNLP-IISERB1 (TFIDF + SVM) | BioNLP-IISERB2 (Longformer) | BioNLP-IISERB3 (Entropy + AdaBoost) | BioNLP-IISERB4 (TFIDF + AdaBoost) |
|---|---|---|---|---|---|---|
| 1 | P@10 | 0.10 | 0.00 | 0.00 | 0.10 | 0.20 |
| 1 | NDCG@10 | 0.19 | 0.00 | 0.00 | 0.06 | 0.21 |
| 1 | NDCG@100 | 0.06 | 0.07 | 0.05 | 0.09 | 0.10 |
| 100 | P@10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 100 | NDCG@10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 100 | NDCG@100 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 500 | P@10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 500 | NDCG@10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 500 | NDCG@100 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1000 | P@10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1000 | NDCG@10 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1000 | NDCG@100 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

The ranking-based evaluations in Table 4 show a decline in performance as the number of processed writings increases. The promising detection capability observed after the first writing drops sharply, with precision and NDCG scores diminishing to zero as the data volume grows. This suggests that while the models perform reasonably well on small numbers of writings, their effectiveness wanes significantly at larger scales, pointing to potential overfitting or the need for more robust generalisation capabilities in the models.

4. Conclusion

The BioNLP@IISERB team took part in Task 2 of eRisk 2024 to develop robust text mining frameworks for early prediction of anorexia by analysing social media texts.
The classical bag-of-words models and recent transformer-based methods were explored to generate potential features for identifying nuances in the given texts. Subsequently, different text classifiers were trained using these features to identify anorexia from the given social media texts. The experimental results suggest that these frameworks are capable of identifying textual patterns indicative of anorexia; however, there is room for further improvement. Some frameworks showed promising precision and recall scores after processing only a few texts, but we encountered substantial challenges in maintaining consistent performance as the volume of data increased. This suggests the need for continued research and refinement of our methods. By processing and analysing large-scale social media data, we can extract valuable insights into the onset and progression of conditions like anorexia, thus enabling timely and targeted support. In the future, more robust and dynamic models will be developed that can efficiently handle the growing volume of user-generated content while maintaining high accuracy in detection. Additionally, exploring multi-modal approaches that combine textual data with other types of information, such as images and social network data, could provide more comprehensive and nuanced insights into mental health conditions. In conclusion, successfully applying these techniques could have a significant positive impact, providing early support for individuals dealing with anorexia and other mental health challenges. Future work should focus on stabilising recall performance and reducing latency to consistently enhance decision-making speed and accuracy.

Acknowledgements

Prateek Sarangi and Tanmay Basu acknowledge the support of the seed funding (INST/DSE/2023-2024/18) provided by the Indian Institute of Science Education and Research Bhopal, India.

References

[1] W. Ragheb, J. Azé, S. Bringay, M. Servajean, Attentive multi-stage learning for early risk detection of signs of anorexia and self-harm on social media, in: CLEF (Working Notes), 2019.
[2] M. De Choudhury, M. Gamon, S. Counts, E. Horvitz, Predicting depression via social media, ICWSM 13 (2013) 1–10.
[3] M. De Choudhury, S. Counts, E. Horvitz, Social media as a measurement tool of depression in populations, in: Proceedings of the 5th Annual ACM Web Science Conference, ACM, 2013, pp. 47–56.
[4] L. G. et al., Machine learning and natural language processing in mental health: systematic review, Journal of Medical Internet Research 23 (2021) e15708.
[5] S. Paul, S. K. Jandhyala, T. Basu, Early detection of signs of anorexia and depression over social media using effective machine learning frameworks, in: CLEF (Working Notes), 2018.
[6] B. W. Eidem, F. Cetta, J. L. Webb, L. C. Graham, M. S. Jay, Early detection of cardiac dysfunction: use of the myocardial performance index in patients with anorexia nervosa, Journal of Adolescent Health 29 (2001) 267–270.
[7] J. Parapar, P. M. Rodilla, D. E. Losada, F. Crestani, Overview of eRisk 2024: Early risk prediction on the internet, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, 15th International Conference of the CLEF Association, CLEF 2024, Springer International, Grenoble, France, 2024.
[8] A. Ranganathan, A. Haritha, D. Thenmozhi, C. Aravindan, Early detection of anorexia using RNN-LSTM and SVM classifiers, in: CLEF (Working Notes), 2019.
[9] F. Galetta, F. Franzoni, A. Cupisti, E. Morelli, G. Santoro, F. Pentimone, Early detection of cardiac dysfunction in patients with anorexia nervosa by tissue Doppler imaging, International Journal of Cardiology 101 (2005) 33–37.
[10] J. Parapar, P. M. Rodilla, D. E. Losada, F. Crestani, Overview of eRisk 2024: Early risk prediction on the internet (extended overview), in: Working Notes of the Conference and Labs of the Evaluation Forum CLEF 2024, CEUR Workshop Proceedings, Grenoble, France, 2024.
[11] M. Marion, S. Lacroix, M. Caquard, L. Dreno, P. Scherdel, C. G. L. Guen, E. Caldagues, E. Launay, Earlier diagnosis in anorexia nervosa: better watch growth charts!, Journal of Eating Disorders 8 (2020) 1–9.
[12] G. Salton, M. J. McGill, Introduction to Modern Information Retrieval, McGraw Hill, 1983.
[13] T. Basu, S. Goldsworthy, G. V. Gkoutos, A sentence classification framework to identify geometric errors in radiation therapy from relevant literature, Information 12 (2021) 139.
[14] A. Selamat, S. Omatu, Web page feature selection and classification using neural networks, Information Sciences 158 (2004) 69–88.
[15] T. Sabbah, A. Selamat, M. H. Selamat, F. S. Al-Anzi, E. H. Viedma, O. Krejcar, H. Fujita, Modified frequency-based term weighting schemes for text classification, Applied Soft Computing 58 (2017) 193–206.
[16] T. Basu, G. V. Gkoutos, Exploring the performance of baseline text mining frameworks for early prediction of self harm over social media, in: Proceedings of the International Conference of the CLEF Association, 2021, pp. 928–937.
[17] H. Srivastava, N. S. Lijin, S. Sruthi, T. Basu, NLP-IISERB@eRisk2022: Exploring the potential of bag of words, document embeddings and transformer based framework for early prediction of eating disorder, depression and pathological gambling over social media, in: Proceedings of Experimental IR Meets Multilinguality, Multimodality, and Interaction: 13th International Conference of the CLEF Association, Bologna, Italy, 2022.
[18] S. Goswami, S. Pal, S. Goldsworthy, T. Basu, An effective machine learning framework for data elements extraction from the literature of anxiety outcome measures to build systematic review, in: Business Information Systems: 22nd International Conference, BIS 2019, Seville, Spain, June 26–28, 2019, Proceedings, Part I, Springer, 2019, pp. 247–258.
[19] T. Basu, C. Murthy, A supervised term selection technique for effective text categorization, International Journal of Machine Learning and Cybernetics 7 (2016) 877–892.
[20] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
[21] I. Beltagy, M. E. Peters, A. Cohan, Longformer: The long-document transformer, arXiv preprint arXiv:2004.05150 (2020).
[22] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (1995) 273–297.
[23] D. W. Hosmer, S. Lemeshow, R. X. Sturdivant, Applied Logistic Regression, volume 398, John Wiley & Sons, 2013.
[24] Y. Freund, R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences 55 (1997) 119–139.
[25] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, et al., Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 38–45.