=Paper=
{{Paper
|id=Vol-3617/paper-08
|storemode=property
|title=Forecasting Publications' Success Using Machine Learning Prediction Models
|pdfUrl=https://ceur-ws.org/Vol-3617/paper-08.pdf
|volume=Vol-3617
|authors=Rand Alchokr,Rayed Haider,Yusra Shakeel,Thomas Leich,Gunter Saake,Jacob Krüger
|dblpUrl=https://dblp.org/rec/conf/birws/AlchokrHSLSK23
}}
==Forecasting Publications' Success Using Machine Learning Prediction Models==
Forecasting Publications' Success Using Machine Learning Prediction Models

Rand Alchokr 1, Rayed Haider 3, Yusra Shakeel 1,2, Thomas Leich 3,4, Gunter Saake 1 and Jacob Krüger 5

1 Otto-von-Guericke University, Magdeburg, Germany
2 Karlsruhe Institute of Technology, Karlsruhe, Germany
3 Hochschule Harz, Wernigerode, Germany
4 METOP GmbH, Magdeburg, Germany
5 Eindhoven University of Technology, The Netherlands

Abstract

Measuring the success and impact of a scientific publication is an important, yet controversial matter. Despite all the criticism, citation counts are widely considered a popular indicator of a publication's success. Therefore, in this paper, we use a machine learning framework to test the ability of alternative metrics (altmetrics) to predict the future impact of papers as reflected in their citation counts. For this experiment, we extracted 7,588 papers from 10 computer science journals. To build the feature space for the prediction problem, 14 different altmetric indices were collected, and 3 feature selection approaches, namely Variance Threshold, Pearson's Correlation, and Mutual Information, were used to reduce the feature space and rank the features according to their contribution to the original dataset. To assess the classification performance of these features, three classifiers were used: Decision Tree, Random Forest, and Support Vector Machines. According to the experimental data, altmetrics can predict future citations, and the most useful altmetric indicators are social media count, tweets, news count, capture count, and full-text views, with Random Forest outperforming the other classifiers.

Keywords: Bibliometrics, alternative metrics, machine learning, computer science

1. Introduction

A successful publication is a desirable goal for any researcher, irrespective of their scientific field. However, judging how successful a published paper is and measuring that success is a critical issue. Furthermore, forecasting scientific impact and success is becoming an essential, regular task for hiring committees, funding agencies, and department heads for recruitment decisions and rewards [3, 7, 26]. Through this, a merit-based career advancement scheme is developed that forecasts an individual's performance based on past achievements and projects future performance. However, distilling the contents of each article into an appraisal of an individual's past, present, and future influence and determining an acceptable ranking of candidates is significantly challenging when presented with candidate pools ranging from a few hundred for tenure-track positions to thousands for fellowship and grant competitions.
In the past decades, researchers have relied heavily on quantitative indicators for evaluating the scientific success of a given research body. Citation frequency is a well-known criterion for research evaluation, and despite all the criticism that citations are not a perfect and objective means of measuring scientific quality, citation counts are still widely referred to as a foremost indicator of the impact and success of a publication in the scientific community. Recently, there has been extensive research investigating the link between a paper's citations and all possible factors correlated with it [5, 10, 25, 23, 24, 15, 6]. Additionally, research is becoming interested in forecasting the future success of a paper [7, 13, 38, 39, 2]. Among these factors, bibliometrics and altmetrics are deemed of the utmost relevance. While bibliometrics are the traditional indices reflecting the characteristics as well as the credibility of papers, authors, and publishing venues (e.g., citations, h-index of the author, cite score of the venue), altmetrics have been recently introduced to capture the spread of a publication on various online platforms (e.g., Wikipedia, Twitter, Facebook). The combination of both bibliometrics and altmetrics is recommended by researchers to complement their pros and cons [23]. Recent studies have looked at the relationship between bibliometric indicators and altmetrics, taking into consideration peer-reviewed quality evaluation methods [24, 8, 27, 33, 32, 30, 31]. The application of altmetrics in research assessment raises the question of whether the data collected by altmetrics is a good predictor of future success and whether it correlates with citations. On the other hand, the remarkable progress in the field of machine learning (ML) has produced a plethora of mature techniques that can efficiently handle various forecasting tasks. In the context of predicting papers' citations using bibliometrics and altmetrics, multiple studies formulate the problem as a regression task that considers continuous values of both features and output [13, 2, 17, 22, 29], whereas other studies consider classification algorithms that generate categorical outcomes [38, 39].

In this paper, we rely on both kinds of metrics to find out which altmetric features contribute to forecasting citation counts. We consider a paper successful if it achieves a high number of citations. We categorize the publications according to their citation counts; belonging to a class of higher ranking hints at a more successful paper. The goal of this study is to determine which altmetric features are useful in predicting future highly cited papers and which machine learning model is best suited for this prediction. In our experiments, we use Decision Trees, Random Forests, and Support Vector Machines. In detail, our main contributions in this paper are as follows:

• We collect an extensive dataset comprising papers from 10 computer science journals from 2010 to 2015. Further, we elicit the papers' citations and altmetrics, aiming to find the most promising altmetrics formula to predict the future success of a paper.
• We discuss multiple prediction models and compare their accuracy.

Through our experiments, we aim to provide a better understanding of the usefulness of altmetrics to indicate the future success of publications.

2. Background

Next, we present the background needed to understand this paper.
2.1. Evaluation Metrics

Peer reviewing during the scientific evaluation process of papers is an essential part of publishing academic research, representing an important quality assurance mechanism [34]. On the other hand, bibliometrics, which represent the traditional metrics, are common measures that the research community relies on when assessing the scientific impact and quality of a publication [10]. Such metrics have multiple advantages: they mainly facilitate the examination of large datasets and help decision-making on individuals, institutions, or research grants [21]. Citation counts, the h-index, and the impact factor are among the most important metrics used for assessing the impact and quality of publications, publishing venues, authors, or research in general. Citation-based metrics are assumed to directly reflect the impact and quality of a publication by implying credibility to the reader and reflecting the total impact of a publication on a research field [25]. Despite their potential benefits, bibliometrics have always been criticized in the context of measuring the impact or quality of research, which they do not necessarily capture [21]. However, many studies suggest that using bibliometrics is a helpful complement to mitigate potential biases during traditional peer review.

Altmetrics have been recently introduced as a means to assess the impact of a publication based on publicly available interfaces of various online platforms [18]. These metrics allow researchers to track the impact of publications beyond traditional bibliographic metrics and help them catch the buzz and spread of their research to a broader audience by calculating quantitative values of user interactions on social platforms, for instance, Wikipedia, Twitter, Facebook, or the number of downloads, views, or read times. It is known that altmetrics may not accurately represent scientific quality: they lack evidence, are difficult to measure, are commercialized, and are easily manipulated [36, 23]. However, based on the mentioned benefits, many researchers argue that altmetrics can serve as an impact indicator and a complement to traditional metrics [23, 24, 15]. Researchers recommend using both kinds of metrics when assessing the impact or quality of a publication to complement their pros and cons [23]. In conclusion, we rely on both kinds of metrics to measure the success of a publication. We consider a paper successful and impactful if it has achieved a high number of citations.

2.2. Predictive Algorithms

By definition, machine learning is a branch of computer science that grew out of artificial intelligence research into pattern recognition and computational learning theory [14]. It is the study and construction of algorithms that can learn from and make predictions on datasets. There are three types of machine learning algorithms: 1) supervised learning algorithms, with two types, classification and regression; 2) unsupervised learning algorithms, covering association, clustering, and dimensionality reduction; and 3) reinforcement learning.

Supervised learning is defined as learning from labeled training data. The training data is learned using a supervised learning algorithm, which then creates a prediction function. For unseen instances, the predictive function is used to determine the class label. Linear Regression, Logistic Regression, CART, Naïve Bayes, and K-Nearest Neighbors (KNN) are examples of supervised learning,
as are Bagging with Random Forests, Boosting with XGBoost, and the Multilayer Perceptron (a basic ANN). To start with, Naïve Bayes applies the assumption of independence between every pair of features, meaning that all features contribute independently to the probability of the target's outcome [16]. XGBoost is a scalable tree-boosting system that is widely used by data scientists nowadays [11]. A classifier is an example of a supervised learning algorithm: machine learning algorithms that tackle the categorization problem are known as classifiers, and a classification problem is the task of determining class labels for new observations based on a training set of data with known class labels. ANNs are helpful models for classification, clustering, pattern recognition, and prediction in many fields [1]. Random Forests with random inputs and random features produce good results in classification, less so in regression. Finally, K-Nearest Neighbors (KNN) has often been used in pattern recognition problems.

3. Related Work

According to the existing literature, various studies have investigated the factors that influence citations, while others have attempted to forecast and estimate future citations. Some of these studies utilized early citation counts to predict a publication's future success [35, 2, 29]; their results agree on the impact that early citations and other related factors have on predicting highly cited publications. Social media metrics have also started to gain interest in research. For instance, tweets had a weak ability to positively predict high citation counts across several disciplines [20]. In the computer science domain, multiple classification methods were used to check whether the future success of articles depends on bibliometrics or altmetrics, and the results show that both contribute equally, with PCA achieving the best performance [39]. Another study by Copiello [12] investigated altmetrics specifically using the "Altmetric Attention Score", but this time to predict the retraction of articles; the results show that roughly one-fourth of the retractions are properly predicted using five alternative metrics. Another study by Akella et al. [4] used altmetrics social media features to predict early and long-term citation counts using several classifiers and regressors; their main results indicate that Mendeley readership plays a crucial role in determining early citations. We built our experiments on theirs, but first by determining the most influential features.

We present an overview of the related work in Table 1, collected by conducting a literature search on the Scopus digital library (https://scopus.com). For each study, we display the type of prediction and the feature selection methods that provide the necessary background information to guide our experiments. Overall, the literature demonstrates that researchers have explored a variety of machine learning algorithms and features in their efforts to predict the academic influence of research publications.

Table 1. Overview of the related work.
• Wang et al. [39] (CS): Classification (Naïve Bayes, KNN, Random Forest); Relief-F, Principal Component Analysis (PCA), and an entropy-weighted method to find which better predict the future success of articles, bibliometrics or altmetrics. Result: PCA has the best performance with 0.947 precision.
• Poggi et al. [27] (CS): Classification; Correlation-based Feature Selection (CFS). Result: SVM outperforms the other classification methods with 0.894 precision.
• Bornmann et al. [8] (CS, WoS): Regression; calculated an adjusted R² for journal papers published in 1980; journal impact, number of authors, number of cited references, and number of pages. Result: Considering journal impact improves the prediction of long-term citation impact.
• Copiello [12] (CS): Classification; comparing a set of 100 retracted articles with high Altmetric Attention Scores with a sample of 100 randomly chosen articles retracted by PLoS ONE. Result: Roughly one-fourth of the retractions are properly predicted using five alternative metrics.
• Akella et al. [4] (CS): Classification, Multiple Linear Regression; altmetrics social media features to predict early and long-term citation counts. Result: Neural networks and ensemble models performed better, with high prediction accuracy and F1 scores; Mendeley readership plays a crucial role in determining early citations.
• Bai et al. [7] (CS, MS): Paper Potential Index (PPI) model and multi-feature model. Result: The PPI model outperforms the multi-feature model in terms of range-normalized RMSE and better interprets changes in citations without requiring parameter adjustments. In terms of Mean Absolute Percentage Error and Accuracy, the multi-feature model outperforms the PPI model; nevertheless, its predictive performance is more dependent on parameter modification.
• Yu et al. [40] (InfS, LibS): Stepwise multiple regression used to select appropriate features and to build a regression model explaining the relationship between citation impact and the chosen features (external features of a paper, authors, journal, citations). Result: The regression model works well in this situation, where bibliometrics have high predictability compared to other features.
• Hassan et al. [20] (CS): Linear regression; sentiment analysis of the publications' tweets (positive, negative, neutral) over 6,482,260 tweets from July 2011 to June 2016; user profiles, types of journals, citation counts, subjects. Result: A weak positive prediction of high citation counts across 16 broad disciplines in Scopus; the number of unique Twitter users improved the adjusted R-squared value of the regression analysis in several disciplines.
• Stegehuis et al. [35] (P): Quantile regression; citations used to predict a publication's future success, with the impact factor of the publication and the first-year citation counts used as predictors. Result: Both predictors (i.e., impact factor and early citations) contribute to the accurate prediction of long-term citation impact.
• Daud et al. [13] (CS): CART, Naïve Bayes, Maximum Entropy Markov; bibliometrics: author, co-author, venue of publication. Result: The Maximum Entropy Markov model had a better prediction of the average number of citations, whereas CART performed better for predicting an average relative increase in citations.
They concluded that an excellent paper will be cited regardless of the paper's publishing time and that a high-quality paper will have a high influence.
• Fu and Aliferis [17] (Bio): Logistic regression, support vector machine modules, cross-validation, AUC, HITON, and the Markov Blanket algorithm, alongside citation classifications; all features, only content features, bibliometrics, and only the impact factor. Result: It is feasible to accurately predict future citation counts with a mixture of content-based and bibliometric features using machine learning methods.
• Abramo et al. [2] (E, Ch): Linear regression; an 8-year citation window to evaluate the impact factor and early citations, comparing the correlation of metrics (peer review, bibliometrics) with the success of scholarly publications. Result: Both measures are not reliable and could be manipulated or biased when measuring the early impact three years after publication.
• Li et al. [22] (M): Deep learning CNN prediction models; biblio-features. Result: The proposed methodology outperforms the state-of-the-art models and gives accurate predictions of future citations.
• Ruan et al. [29] (Inf, Doc): XGBoost, linear regression, a four-layer Back Propagation (BP) neural network to predict the five-year citations of 49,834 papers, KNN, Random Forest, and Support Vector Regression. Result: The performance of the BP neural network is significantly better than the others; the accuracy of the model at predicting infrequently cited papers was higher than that for frequently cited ones. Five features have effects ('citations in the first two years', 'first-cited age', 'paper length', 'month of publication', and 'self-citations of journals').
• Thelwall and Nevill [37] (Mut): Regression analysis of Altmetric.com data from November 2015 and Scopus citation counts from October 2017 for articles in 30 narrow fields. Result: The main altmetric indicator of scholarly impact is Mendeley reader counts; journal impact factors can predict later citation counts better than Altmetric.com scores.

Field abbreviations: CS=Computer Science, E=Engineering, Bio=Biomedical, Lib=Library, Inf=Information, S=Science, M=Mathematics, Re=Rehabilitation, PM=Physical Medicine, Me=Medical, L=Life, WoS=Web of Science, APS=Applied Physics Statistics, CPS=Computational Science, AM=Applied Mathematics, P=Physics, Ch=Chemical, Mut=Multiple fields, Doc=Documentation.

Table 2. Overview of the chosen journals in our dataset from Scopus (#Papers from 2010 till 2015; CSc=CiteScore, SNIP=Source Normalized Impact per Paper, SJR=SCImago Journal Rank).

# | Journal | #Papers | SNIP | SJR | CSc | Publisher
1 | Advanced Engineering Informatics | 377 | 2.089 | 0.946 | 6.9 | Elsevier
2 | Computers and Education | 1494 | 4.28 | 3.047 | 12.7 | Elsevier
3 | Engineering with Computers | 254 | 2.014 | 0.663 | 7.2 | Springer Nature
4 | IEEE Transactions on Image Processing | 2302 | 4.182 | 2.893 | 15.6 | IEEE
5 | IEEE Transactions on Information Forensics & Security | 931 | 3.617 | 1.897 | 14.7 | IEEE
6 | Industrial Management and Data Systems | 441 | 2.502 | 1.39 | 7.9 | Emerald
7 | Information Processing and Management | 418 | 3.199 | 1.192 | 8.6 | Elsevier
8 | Journal of Informetrics | 493 | 2.146 | 2.079 | 8.4 | Elsevier
9 | Journal of Machine Learning Research | 1 | 3.147 | 2.219 | 9.3 | MIT Press
10 | Neural Networks | 877 | 2.246 | 1.718 | 10.0 | Elsevier
Total | | 7,588 | | | |

4. Experiments

In this section, we describe how we elicited and analyzed the data to achieve our goal. Our experiment consists of four main phases: (1) data collection, (2) feature selection, (3) classification predictive models, and (4) evaluation of the models.
4.1. Data Collection

From the Scopus digital library, we chose 10 computer science journals and extracted their publications for the period from 2010 to 2015, in order to give citation counts reasonable time to accumulate. The total number of extracted papers is 7,588. Table 2 displays the selected journals for our study and the number of papers extracted from each journal, in addition to their properties in the form of the metrics that were used to choose them. We used three journal-related metrics in Scopus to help us decide which journals to include: CiteScore (https://service.elsevier.com/app/answers/detail/a_id/14880/supporthub/scopus/), Source Normalized Impact per Paper (SNIP, https://service.elsevier.com/app/answers/detail/a_id/14884/supporthub/scopus/kw/SNIP/), and SCImago Journal Rank (SJR, https://service.elsevier.com/app/answers/detail/a_id/14883/supporthub/scopus/kw/sjr/). The altmetrics were collected from the PlumX tool (https://plumanalytics.com/), which we chose because it is integrated with Scopus; using the available APIs, we were able to extract the needed features. Since we were interested in determining whether altmetrics can predict citations, we used the citation counts for these articles as the target variable for our models. Using Scopus APIs, and by matching and merging the data for each article based on its DOI, citations were collected for each paper and our dataset was completed. Our target variable is the citation count, and the features we selected are 14 altmetric features, as listed and described in Table 3. The details of each feature are described on a separate webpage.

Social media count (https://plumanalytics.com/learn/about-metrics/social-media-metrics/) covers all interactions on social media platforms, such as likes and shares on Facebook and YouTube, and tweets on Twitter; among its many subcategories, we chose tweet count and Facebook count. Mentions count (https://plumanalytics.com/learn/about-metrics/mention-metrics/) is another PlumX category that includes blog posts, comments, reviews, and Wikipedia links about the publication from various resources such as Reddit, Slideshare, Vimeo, YouTube, and GitHub; its three most important subcategories are news, blog, and reference counts. The third category, Capture count (https://plumanalytics.com/learn/about-metrics/capture-metrics/), tracks user actions like bookmarking, marking as favorite, reading, and exporting the paper; it also includes multiple subcategories, such as reader count, which gathers its data from CiteULike, Goodreads, Mendeley, and SSRN, and export/saves count. The last PlumX category is Usage count (https://plumanalytics.com/learn/about-metrics/usage-metrics/), which reports usage statistics such as link counts, abstract or full-text view counts, and more.

The collected papers were ranked in descending order according to their citation counts and then categorized into 2 categories, highly cited papers (HCPs) and low cited papers (LCPs), using the following process (a code sketch follows below):

• Calculate the average of all citation counts.
• Divide the papers according to their citations compared to the average into the following:
  – HCPs: papers whose number of citations is greater than or equal to the average citation count of the venue.
  – LCPs: papers whose number of citations is less than the average citation count of all papers of that venue.

The goal of this categorization of the papers into two groups is to characterize the various stages of paper growth, with HCPs being assigned to the successful ones.
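To make the categorization concrete, the following is a minimal sketch of the labeling step. It assumes the merged dataset is available as a pandas DataFrame with hypothetical columns `journal` and `citations`; these column names and the per-venue averaging loop are our illustrative assumptions based on the description above, not the authors' published code.

```python
import pandas as pd

def label_hcp_lcp(papers: pd.DataFrame) -> pd.DataFrame:
    """Label each paper as HCP (1) or LCP (0) relative to the average
    citation count of its venue, as described in Section 4.1.
    Assumes hypothetical columns 'journal' and 'citations'."""
    # Average citation count per venue (journal)
    venue_avg = papers.groupby("journal")["citations"].transform("mean")
    # HCP if citations >= venue average, otherwise LCP
    papers = papers.copy()
    papers["label"] = (papers["citations"] >= venue_avg).astype(int)
    return papers

# Usage example with toy data
if __name__ == "__main__":
    toy = pd.DataFrame({
        "journal": ["A", "A", "A", "B", "B"],
        "citations": [10, 2, 6, 50, 5],
    })
    print(label_hcp_lcp(toy)[["journal", "citations", "label"]])
```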
Despite the simplicity of this two-class classification setting, it clearly captures the evaluation of which indicators better forecast the future success of papers.

Table 3. Altmetrics features.

Index | Feature | Description
X0 | Social media count | Number of times a paper has been mentioned or shared on any social network
X1 | Tweet count | Number of times a paper has been mentioned in a tweet on Twitter
X2 | FB count | Number of times a paper has been mentioned or shared on Facebook
X3 | Mention count | Number of users who have mentioned a particular paper online
X4 | News count | Number of times a paper has been mentioned in news outlets
X5 | Blog count | Number of times a paper has been mentioned or featured in a blog post
X6 | Reference count | Number of references of that particular paper
X7 | Capture count | Number of times interest in the paper has been captured on the internet
X8 | Reader count | A read is counted each time someone views the paper
X9 | Export/Saves count | Number of saves of the paper on external platforms
X10 | Usage count | A record of every action taken by all users
X11 | Links click count | Number of clicks on the link of the paper
X12 | Links outs count | Number of links that lead to the paper
X13 | Full-text view | Number of times a paper has been fully viewed online

4.2. Feature Selection

The term "feature selection" refers to the process of minimizing the number of input features that are used to describe the data and the interrelationships between them. It eliminates features that are redundant or useless: irrelevant features give no valuable information about the data, whereas redundant features deliver no additional information beyond the currently selected features. In this paper, three different feature selection techniques were used to measure the importance of each feature on the dataset (a code sketch is given at the end of this subsection):

Variance Threshold (VAR) is a fundamental baseline technique for feature selection. It removes features with low variance, that is, those whose variance is less than a particular threshold. The premise is easy to grasp: calculate the variance of each feature over the samples and, if the value is less than the threshold, filter and eliminate it. By default, all zero-variance features are removed, since a variance of 0 shows that the feature's value has remained unchanged across the samples. For a binary feature that takes the value 1 with probability p, the variance is

Var[x] = p(1 − p)

Pearson's Correlation (PC): Correlation-based Feature Selection (CFS) is a well-known similarity measure that evaluates the correlation between features and classes, as well as between features and other features, to determine the significance of the features. In this paper, the importance of the feature subsets was determined by CFS using Pearson's correlation. It can be used for binary classification and regression problems, with a range of (−1, 1) from a negative to a positive correlation, and it is a fast statistic that ranks features according to their absolute correlation coefficient with the target. Between a feature X and the target Y, the Pearson correlation coefficient is

ρ = cov(X, Y) / (σ_X σ_Y)

where cov(X, Y) is the covariance and σ_X, σ_Y are the standard deviations of X and Y.

Mutual Information Gain (MI) is a metric measuring how much information one random variable carries about another; the mutual information between X and Y may be thought of as a measure of the amount of knowledge X provides about Y (or Y about X). Therefore, it can be defined as

I(X; Y) = H(X) − H(X|Y)

where I(X; Y) represents the mutual information between X and Y, H(X) the entropy of X, and H(X|Y) the conditional entropy of X given Y.
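The following is a minimal sketch of how the three feature selection techniques could be applied with scikit-learn and pandas, which the paper names among its tools. The feature matrix `X` (the 14 altmetric columns), the binary label `y`, the variance threshold value, and keeping the top k = 9 features are our illustrative assumptions; the paper reports only the resulting subsets (Table 5).

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold, mutual_info_classif

def select_features(X: pd.DataFrame, y: pd.Series, k: int = 9,
                    var_threshold: float = 0.0) -> dict:
    """Rank the altmetric features with VAR, PC, and MI (Section 4.2)."""
    # Variance Threshold: drop features whose variance is below the threshold
    var = VarianceThreshold(threshold=var_threshold).fit(X)
    var_subset = X.columns[var.get_support()].tolist()

    # Pearson's Correlation: rank by absolute correlation with the target
    pc_scores = X.apply(lambda col: abs(np.corrcoef(col, y)[0, 1]))
    pc_subset = pc_scores.sort_values(ascending=False).head(k).index.tolist()

    # Mutual Information: rank by the estimated I(X; Y)
    mi_scores = pd.Series(mutual_info_classif(X, y, random_state=0),
                          index=X.columns)
    mi_subset = mi_scores.sort_values(ascending=False).head(k).index.tolist()

    return {"VAR": var_subset, "PC": pc_subset, "MI": mi_subset}
```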
4.3. Identification of the Prediction Algorithm

To evaluate the robustness of the feature subsets created using the three feature selection techniques, three classification-based machine learning algorithms were applied to the collected features. We tested the following supervised machine learning methods; a configuration sketch of these classifiers and the evaluation metrics of Section 4.4 is given at the end of this section.

Decision Trees (DT) are trees that classify instances by sorting them based on feature values [28]. The tree can be explained by two entities, decision nodes and leaves: each leaf in a decision tree indicates a value that the node might adopt, whereas each node represents a feature of an instance to be categorized. From the root to a leaf, a path is traced and sorted according to feature values. In this study, we have used 5 leaves.

Random Forest (RF) is a classifier made up of a collection of tree classifiers h(x, Θk), k = 1, ..., where the Θk are independent identically distributed random vectors, and each tree votes for the most popular class at input x with a single unit vote [9]. The RF classifier in this paper is made up of eight trees, each of which was developed using the classification and regression tree (CART) technique. Each case of a fresh dataset is handed down to each of the eight trees in order to categorize it, and the forest picks the class with the most votes out of eight as the case's final class label.

Support Vector Machines (SVM) are sparse kernel decision machines that build their learning model without calculating posterior probabilities; they are a relatively new supervised machine learning method. According to Gonzalez-Abril et al. [19], SVM conducts classification by creating an N-dimensional hyperplane that best divides the data into two groups. It has been demonstrated that increasing the margin, i.e., establishing the maximum feasible distance between the separating hyperplane and the instances on either side, reduces the expected generalization error.

4.4. Evaluation of Classification Models

We have three classifiers to select from to answer a specific classification problem; therefore, we need to assess the quality of each (prediction accuracy). To achieve that, we use a confusion matrix that describes the number of correctly and incorrectly predicted examples by the classification model. Table 4 depicts the binary classification problem's confusion matrix, which is a particular contingency table with two dimensions: actual and predicted.

Table 4. Definition of a confusion matrix.

 | Predicted Positive | Predicted Negative
Actual Positive | True Positive (TP) | False Negative (FN)
Actual Negative | False Positive (FP) | True Negative (TN)

Each metric is a critical indicator of how well a model performed in relation to a set of criteria. The percentage of valid predictions correctly categorized by the model is known as model accuracy (Eq. 1):

Accuracy (Acc) = (TP + TN) / (TP + TN + FP + FN)

Precision is the fraction of positive results predicted by the model that are really positive (Eq. 2):

Precision (Prc) = TP / (TP + FP)

Model recall is the proportion of relevant outcomes retrieved (Eq. 3):

Recall (Rcl) = TP / (TP + FN)
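As a minimal sketch, assuming scikit-learn (which the paper names among its libraries), the three classifiers can be configured to mirror the stated settings: a decision tree limited to 5 leaves, a random forest of 8 trees, and an SVM. The SVM kernel, random seeds, and any other unstated parameters are our assumptions, since the paper does not list them.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Classifiers approximating Section 4.3: 5 leaves for DT, 8 trees for RF;
# the SVM kernel is an illustrative assumption.
classifiers = {
    "DT": DecisionTreeClassifier(max_leaf_nodes=5, random_state=0),
    "RF": RandomForestClassifier(n_estimators=8, random_state=0),
    "SVM": SVC(kernel="rbf", random_state=0),
}

def evaluate(y_true, y_pred):
    """Compute Eqs. 1-3 (accuracy, precision, recall) directly from the
    confusion matrix defined in Table 4."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)
    prc = tp / (tp + fp)
    rcl = tp / (tp + fn)
    return acc, prc, rcl
```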
5. Results and Discussion

The first step was selecting the most promising features using the three different feature selection methods, namely Variance Threshold (VAR), Pearson's Correlation (PC), and Mutual Information (MI). The feature selection was done on the entire dataset. Our results show that 9 of the 14 features collected for the prediction task in Table 3 are the most significant for reflecting the original dataset. The selected feature subset for each feature selection technique is shown in Table 5.

Table 5. Feature selection results.

Selection | Feature subset
VAR | Social media (X0), Tweets (X1), News (X4), Capture (X7), Reader (X8), Export (X9), Usage (X10), Links out (X12), Text view (X13)
PC | Social media (X0), Tweets (X1), News (X4), Blog (X5), Capture (X7), Export (X9), Usage (X10), Links out (X12), Text view (X13)
MI | Social media (X0), Tweets (X1), News (X4), Blog (X5), Reference (X6), Capture (X7), Reader (X8), Links clicks (X11), Text view (X13)

The indices Social count (X0), Tweets (X1), News (X4), Capture (X7), and Text view (X13) appear in all three feature subsets among the nine features in Table 5, indicating that these five features constitute the dataset's fundamental characteristics and are the most representative of the original dataset. That is, these five features play the most important roles in deciding which papers will become highly cited. Social count (X0) and Tweets (X1) show how often a paper has been discussed and shared on social networks, which indicates how social media metrics help research dissemination. News (X4) represents the number of times a paper is referenced in the news media. Moreover, Capture (X7) captures the overall interest in the publication on the internet. Another useful feature is Text view (X13), which is the number of times a publication has been viewed in full detail.

The second step was splitting our dataset into training and test sets. We sample our training set while holding out 30% of the data for testing (evaluating) our classifiers; this method approximates how well our model will perform on new data. The performance of these features in predicting future highly cited papers was then tested using the three classification models mentioned previously: Decision Tree (DT), Random Forest (RF), and Support Vector Machines (SVM). A code project using Python and its libraries, such as scikit-learn and NumPy, was developed to test these models and evaluate their performance using the measures mentioned earlier (a sketch of this evaluation loop is given at the end of this section). Table 6 shows the final classification performance of each feature selection method's outcome under each of the three classifiers; the average classification accuracy (Acc), precision (Prc), and recall (Rcl) are shown in the last row.

Table 6. Classification model performance (Acc/Prc/Rcl per feature subset).

Model | VAR Acc | VAR Prc | VAR Rcl | PC Acc | PC Prc | PC Rcl | MI Acc | MI Prc | MI Rcl
DT | 0.88 | 0.86 | 0.91 | 0.87 | 0.81 | 0.86 | 0.87 | 0.91 | 0.84
RF | 0.97 | 0.96 | 0.99 | 0.97 | 0.96 | 0.98 | 0.97 | 0.96 | 0.98
SVM | 0.93 | 0.95 | 0.91 | 0.92 | 0.94 | 0.90 | 0.92 | 0.94 | 0.90
Average | 0.93 | 0.92 | 0.94 | 0.92 | 0.90 | 0.91 | 0.92 | 0.94 | 0.90

Evidently, each classifier achieves a substantial classification performance for each of the feature subsets. The feature subsets picked by Pearson's Correlation (PC) and Mutual Information (MI) obtain the best accuracy of 0.97 when trained with Random Forest, and in terms of precision, all three feature selection techniques reach the same value of 0.96, whereas the Variance Threshold (VAR) subset has a maximum recall of 0.99, also evaluated using Random Forest. Regardless of the classification model or feature selection approach, the average classification accuracy is equal to or greater than 0.9.
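Putting the pieces together, the following sketch reproduces the experimental procedure of this section under our assumptions: for each feature subset from Table 5, hold out 30% of the papers for testing, train the three classifiers, and report accuracy, precision, and recall as in Table 6. It reuses the hypothetical `classifiers` dictionary and `evaluate` helper sketched earlier; the stratified split and random seed are our assumptions, not stated in the paper.

```python
from sklearn.model_selection import train_test_split

def run_experiment(X, y, feature_subsets, classifiers, evaluate):
    """Evaluate every (feature subset, classifier) pair with a 30% holdout."""
    results = {}
    for subset_name, features in feature_subsets.items():
        X_train, X_test, y_train, y_test = train_test_split(
            X[features], y, test_size=0.3, random_state=0, stratify=y)
        for clf_name, clf in classifiers.items():
            clf.fit(X_train, y_train)
            acc, prc, rcl = evaluate(y_test, clf.predict(X_test))
            results[(subset_name, clf_name)] = (acc, prc, rcl)
    return results
```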
Although there is little variation in accuracies, the findings show that the features derived by the three feature selection approaches are stable and helpful for classifying and forecasting future highly cited papers. Furthermore, the results reveal that Random Forest, in particular, fared best compared to the Decision Tree and Support Vector Machines.

The current study's drawback is that it only looked at 10 journals in the field of computer science, and the findings from this small corpus may not apply to papers in other fields. Additionally, the current study is exclusively based on PlumX altmetrics correlated with Scopus, and we narrowed our attention to only three machine learning algorithms; other algorithms, such as neural networks and XGBoost, might be investigated in the future. However, the results serve as a point of reference for future evaluations of prediction-related studies. We provide the dataset along with the code of one prediction model as an example for further analysis (https://doi.org/10.5281/zenodo.7777785).

6. Conclusion

In this paper, we built several experiments based on previous research that investigated metrics and their potential power to predict citation counts. Focusing on the computer science domain and aiming to find the most promising formula of altmetrics to predict the future success of a paper measured by the number of citations, we first performed several feature selection techniques to choose the most important feature subset that best represents the original dataset. An extensive dataset comprising 7,588 papers from 10 computer science journals was collected, and the altmetrics and citation counts for each paper were extracted. Furthermore, the altmetrics were evaluated using a feature space with 14 feature indices to determine the most promising subset using Variance Threshold, Pearson's Correlation, and Mutual Information, and later the classification performance of the feature subsets was verified using three types of classifiers: Decision Tree, Random Forest, and Support Vector Machines. Finally, we evaluated these prediction models and compared their accuracy. The results show that Random Forest surpasses the other classification methods, and we conclude that altmetrics are a valuable predictor for highly cited papers, specifically these five altmetric features: social media count, tweets, news count, capture count, and full-text views.

References

[1] Abiodun, O.I., Jantan, A., Omolara, A.E., Dada, K.V., Mohamed, N.A., Arshad, H., 2018. State-of-the-art in artificial neural network applications: A survey. Heliyon.
[2] Abramo, G., D'Angelo, C., Felici, G., 2019. Predicting publication long-term impact through a combination of early citations and journal impact factor. Journal of Informetrics.
[3] Acuna, D.E., Allesina, S., Kording, K.P., 2012. Future impact: Predicting scientific success. Nature.
[4] Akella, A.P., Alhoori, H., Kondamudi, P.R., Freeman, C., Zhou, H., 2021. Early indicators of scientific impact: Predicting citations with altmetrics. Journal of Informetrics.
[5] Aksnes, D., Langfeldt, L., Wouters, P., 2019. Citations, citation indicators, and research quality: An overview of basic concepts and theories. SAGE Open.
[6] Alchokr, R., Krüger, J., Shakeel, Y., Saake, G., Leich, T., 2022. Peer-reviewing and submission dynamics around top software-engineering venues: A juniors' perspective, in: International Conference on Evaluation and Assessment in Software Engineering.
[7] Bai, X., Zhang, F., Lee, I., 2019. Predicting the citations of scholarly paper. Journal of Informetrics.
[8] Bornmann, L., Leydesdorff, L., Wang, J., 2014. How to improve the prediction based on citation impact percentiles for years shortly after the publication date? Journal of Informetrics.
[9] Breiman, L., 2001. Random forests. Machine Learning.
[10] Carlsson, H., 2009. Allocation of research funds using bibliometric indicators – asset and challenge to Swedish higher education sector.
[11] Chen, T., Guestrin, C., 2016. XGBoost: A scalable tree boosting system, in: International Conference on Knowledge Discovery and Data Mining.
[12] Copiello, S., 2020. Other than detecting impact in advance, alternative metrics could act as early warning signs of retractions: Tentative findings of a study into the papers retracted by PLoS ONE. Scientometrics.
[13] Daud, A., Ahmad, M., Malik, M., Che, D., 2014. Using machine learning techniques for rising star prediction in co-author network. Scientometrics.
[14] Edgar, T.W., Manz, D.O., 2017. Machine Learning. Syngress.
[15] Eysenbach, G., 2011. Can tweets predict citations? Metrics of social impact based on Twitter and correlation with traditional metrics of scientific impact. Journal of Medical Internet Research.
[16] Fan, J., Chen, M., Luo, J., Yang, S., Shi, J., Yao, Q., Zhang, X., Du, S., Qu, H., Cheng, Y., Ma, S., Zhang, M., Xu, X., Wang, Q., Zhan, S., 2021. The prediction of asymptomatic carotid atherosclerosis with electronic health records: A comparative study of six machine learning models. BMC Medical Informatics and Decision Making.
[17] Fu, L., Aliferis, C., 2008. Models for predicting and explaining citation count of biomedical articles. AMIA Symposium.
[18] Galligan, F., Dyas-Correia, S., 2013. Altmetrics: Rethinking the way we measure. Serials Review.
[19] Gonzalez-Abril, L., Angulo, C., Velasco-Morente, F., Català, A., 2005. Unified dual for bi-class SVM approaches. Pattern Recognition.
[20] Hassan, S.U., Aljohani, N., Idrees, N., Sarwar, R., Nawaz, R., Martínez-Cámara, E., Ventura, S., Herrera, F., 2020. Predicting literature's early impact with sentiment analysis in Twitter. Knowledge-Based Systems.
[21] Holden, G., Rosenberg, G., Barker, K., 2005. Tracing thought through time and space: A selective review of bibliometrics in social work. Social Work in Health Care.
[22] Li, M., Xu, J., Ge, B., Liu, J., Jiang, J., Zhao, Q., 2019. A deep learning methodology for citation count prediction with large-scale biblio-features.
[23] Lutz, B., 2014. Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics. Journal of Informetrics.
[24] Nuzzolese, A.G., Ciancarini, P., Gangemi, A., Peroni, S., Poggi, F., Presutti, V., 2019. Do altmetrics work for assessing research quality? Scientometrics.
[25] Patro, B., Aggarwal, A., 2011. How honest is the h-index in measuring individual research output? Journal of Postgraduate Medicine.
[26] Penner, O., Pan, R.K., Petersen, A.M., Kaski, K., Fortunato, S., 2013. On the predictability of future impact in science. Scientific Reports 3, 3052.
[27] Poggi, F., Ciancarini, P., Gangemi, A., Nuzzolese, A.G., Peroni, S., Presutti, V., 2019. Predicting the results of evaluation procedures of academics. PeerJ Computer Science.
[28] Quinlan, J.R., 1986. Induction of decision trees. Machine Learning.
[29] Ruan, X., Zhu, Y., Li, J., Cheng, Y., 2020. Predicting the citation counts of individual papers via a BP neural network. Journal of Informetrics.
[30] Shakeel, Y., Alchokr, R., Krüger, J., Leich, T., Saake, G., 2022a. Altmetrics and citation counts: An empirical analysis of the computer science domain, in: Joint Conference on Digital Libraries.
[31] Shakeel, Y., Alchokr, R., Krüger, J., Leich, T., Saake, G., 2022b. Are altmetrics useful for assessing scientific impact? A survey, in: International Conference on Management of Digital EcoSystems.
[32] Shakeel, Y., Alchokr, R., Krüger, J., Leich, T., Saake, G., 2022c. Incorporating altmetrics to support selection and assessment of publications during literature analyses, in: International Conference on Evaluation and Assessment in Software Engineering.
[33] Shakeel, Y., Alchokr, R., Krüger, J., Saake, G., Leich, T., 2021. Are altmetrics proxies or complements to citations for assessing impact in computer science?, in: Joint Conference on Digital Libraries.
[34] Siler, K., Lee, K., Bero, L., 2015. Measuring the effectiveness of scientific gatekeeping. Proceedings of the National Academy of Sciences.
[35] Stegehuis, C., Litvak, N., Waltman, L., 2015. Predicting the long-term citation impact of recent publications. Journal of Informetrics.
[36] Thelwall, M., 2020. The pros and cons of the use of altmetrics in research assessment. Scholarly Assessment Reports.
[37] Thelwall, M., Nevill, T., 2018. Could scientists use Altmetric.com scores to predict longer term citation counts? Journal of Informetrics.
[38] Wang, D., Song, C., Barabási, A.L., 2013. Quantifying long-term scientific impact. Science.
[39] Wang, M., Wang, Z., Chen, G., 2019. Which can better predict the future success of articles? Bibliometric indices or alternative metrics. Scientometrics.
[40] Yu, T., Yu, G., Li, P.Y., Wang, L., 2014. Citation impact prediction for scientific papers using stepwise regression analysis. Scientometrics 101, 1233–1252.