=Paper= {{Paper |id=Vol-2881/paper3 |storemode=property |title=Feature Extraction for Deep Neural Networks: A Case Study on the COVID-19 Retweet Prediction Challenge |pdfUrl=https://ceur-ws.org/Vol-2881/paper3.pdf |volume=Vol-2881 |authors=Daichi Takehara }} ==Feature Extraction for Deep Neural Networks: A Case Study on the COVID-19 Retweet Prediction Challenge== https://ceur-ws.org/Vol-2881/paper3.pdf
Feature Extraction for Deep Neural Networks: A Case Study on the COVID-19 Retweet Prediction Challenge

Daichi Takehara
Aidemy Inc.
takehara-d@aidemy.co.jp

ABSTRACT
This paper presents our solution for the COVID-19 Retweet Prediction Challenge, which is part of the CIKM 2020 AnalytiCup. The challenge was to predict the number of retweets of tweets related to COVID-19. We tackled this challenge with a deep neural network-based retweet prediction method that introduces feature extraction techniques useful for retweet prediction. Experiments confirmed the effectiveness of these techniques, especially for the two primary processes: numerical feature transformation and user modeling. Finally, the solution used a stacking-based ensemble method to produce the final predictions for the competition. The code for this solution is available at https://github.com/haradai1262/CIKM2020-AnalytiCup.

KEYWORDS
Information diffusion, Retweet prediction, Feature extraction, Deep learning, COVID-19

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
In: Dimitar Dimitrov, Xiaofei Zhu (eds.): Proceedings of the CIKM AnalytiCup 2020, 22 October, 2020, Galway (Virtual Event), Ireland, 2020, published at http://ceur-ws.org.

1 INTRODUCTION
Understanding the mechanisms of information diffusion is an active area of research with many practical applications. In a crisis like COVID-19, information diffusion directly influences people's behavior and becomes especially valuable [6]. Retweeting, i.e., sharing tweets directly with one's followers on Twitter, can be viewed as amplifying the diffusion of the original content. Retweet prediction is therefore beneficial for understanding the mechanisms of information diffusion.

Retweet prediction has been widely studied. In recent years, there has been growing interest in methods based on deep neural networks (DNNs), which have reported high performance [10, 15, 19]. DNNs have made it possible to skip much of the feature engineering, especially in image processing and natural language processing. However, for DNNs on tabular data, including retweet prediction, data pre-processing and feature engineering are still often necessary and significantly impact performance [9, 14].

In retweet prediction, the processing of numerical features related to tweets, such as the number of followers, strongly affects performance. To train DNNs effectively, it may be useful to transform the numerical features into different distributions [20]. Furthermore, it is crucial to learn a representation of the user who publishes a tweet. Although embedding the user id is common in DNN-based methods, it may not be easy to sufficiently learn representations of the infrequent users included in the training data [5]. As mentioned above, it is necessary to design the input features to the DNN according to the data and task, which can be difficult.

As a case study addressing these difficulties, this paper presents our solution to the COVID-19 Retweet Prediction Challenge, part of the CIKM 2020 AnalytiCup. The challenge's task was to predict the number of retweets for a given COVID-19-related tweet. We propose a DNN-based retweet prediction method. In the proposed method, we introduce a useful feature extraction method whose output serves as input to a DNN for retweet prediction. In the feature extraction, we transform numerical features into multiple different distributions to effectively utilize the metrics related to tweets. In addition, we cluster users based on multiple types of features so that user attributes can be represented even for infrequent users. Using the obtained features, we train a DNN model. In the experiments, we verify the effectiveness of the numerical feature transformation and the user data handling, which are essential issues in retweet prediction. Finally, as a solution for the competition, we introduce a stacking-based ensemble method to improve the performance and robustness of the prediction results.

2 CHALLENGE

2.1 Dataset
In the COVID-19 Retweet Prediction Challenge, the TweetsCOV19 dataset was provided. This dataset consists of 8,151,524 tweets concerning COVID-19 published on Twitter by 3,664,518 users from October 2019 until April 2020. For each tweet, the dataset provides metadata and some precalculated features. The contents of the dataset and the process of their generation are detailed in [3].

2.2 Task Description
Given a tweet from the TweetsCOV19 dataset, the task was to predict its number of retweets (#retweets). The test data for the evaluation are tweets published during May 2020, the month subsequent to the tweets included in the TweetsCOV19 dataset. The mean squared log error (MSLE) is used as the evaluation metric for the task.

3 METHOD

3.1 Overview
An overview of the proposed method is shown in Figure 1. First, we extract the features to be input to the DNN. The features are divided into numerical, categorical, and multi-hot categorical features; the categorical and multi-hot categorical features are converted into low-dimensional vectors through embedding layers. Using the extracted features, we train a multilayer perceptron (MLP) for retweet prediction.
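As a reference for the evaluation metric of section 2.2, MSLE can be written down directly; the following is a minimal sketch (the function name `msle` is ours, not from the challenge toolkit):

```python
import math

def msle(y_true, y_pred):
    # Mean squared log error: mean of (log(1 + y) - log(1 + y_hat))^2.
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)
```

Because of the log, MSLE penalizes relative rather than absolute errors, which suits the heavy-tailed distribution of retweet counts.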
[Figure 1: The numerical features (tweet metrics and their log/rank/CDF/z/binning transformations, user metrics, user dynamics, topic, count encoding, target encoding), categorical features (time, sentiment, user id, user cluster id, tweet metrics binning), and multi-hot categorical features (entities, hashtags, mentions, URLs, components of URLs) are embedded where needed, then flattened and concatenated. They pass through three fully-connected blocks of 2048, 512, and 128 units, each followed by batch normalization, ReLU, and dropout, and a final fully-connected layer of 1 unit with ReLU to produce the prediction. The MSE loss is computed against the log-transformed #retweets target.]

Figure 1: Overview of our proposed method. The notation of features corresponds to the name columns in Table 1.

3.2 Features
The features used in the proposed method are shown in Table 1. Numerical feature transformation and user modeling, the critical issues of retweet prediction, are discussed in the following subsections. Please refer to the published code¹ for the exact processing.

3.2.1 Numerical Feature Transformation. #retweets is strongly related to the metrics of a tweet, such as the number of followers (#followers) and favorites (#favorites), which are expected to have a significant impact on the performance of our prediction model. In the proposed method, we attempt to represent various distributions of these metrics and improve performance by combining them as input for the DNN. Specifically, we introduce the following five numerical feature transformations.

Z transformation. We transform each value x_i ∈ X by the following function, using the mean x̄ and standard deviation σ of the dataset X:

    F_z(x_i) = (x_i − x̄) / σ    (1)

CDF transformation. We derive a normal distribution from the mean and standard deviation observed in the dataset. Using this distribution, we transform the original values by its cumulative distribution function (CDF). To implement this function, we used the Python library SciPy².

Rank transformation. We transform each value x_i ∈ X by the following function:

    F_rank(x_i) = Σ_{x_j ∈ X} I[x_j < x_i],  where I[x_j < x_i] = 1 if x_j < x_i is true and 0 otherwise.    (2)

Log transformation. We transform each value x_i ∈ X by the following function:

    F_log(x_i) = log_e(x_i + 1)    (3)

Here we add one to x_i to avoid the output being negative infinity when x_i is zero.

Binning transformation. We separate the values into equal-sized buckets based on the quantiles of the sample. In the proposed method, the values are divided into ten quantiles. Unlike the other transformations, we use the transformed values as categorical features.

We apply these transformations to the tweet metrics (Table 1) and input the obtained values into the MLP.

3.2.2 User Modeling. Appropriately representing the user who published the tweet is essential for predicting #retweets. For DNN-based prediction models, a common and effective method is to learn by inputting the user id into embedding layers. However, the infrequent users included in the training data are not sufficiently trained [5]. By clustering users from various points of view and embedding them based on their cluster ids, we can learn user attributes even for infrequent users. Specifically, we introduced the following three types of user clustering.

User topic clustering. We clustered users using the topics contained in their tweets. Specifically, we combined the entities, hashtags, mentions, and URLs included in the tweets and set them as sequences for each user. Next, user topic features were extracted by applying term frequency-inverse document frequency (TFIDF) [2] to the sequences and reducing dimensionality with singular value decomposition (SVD) [4]. Using the extracted features, we applied K-means clustering [11] to the users.

User metric clustering. We clustered users using user-related metrics. The user metric features consist of the mean and standard deviation of the user's followers, friends, and likes, as well as the unique numbers of entities, hashtags, mentions, and URLs in the tweet log posted by each user. Using the obtained features, we applied K-means clustering to the users.

User topic and metric clustering. Using features that combine the user topic features and the user metric features, we applied K-means clustering to the users.

Note that the number of clusters is set to 1000 in each clustering.

3.3 Model
Using the extracted features, we trained the MLP. In the proposed method, the inputs of the MLP are divided into numerical, categorical, and multi-hot categorical features. Min-max scaling was applied to the numerical features, converting them to the range [0, 1]. We transformed the categorical features into low-dimensional vectors using embedding layers. Specifically, we represent a one-hot vector x_i, a categorical feature, with a low-dimensional vector e_i:

    e_i = E_c x_i,    (4)

where E_c is an embedding matrix for categorical feature c. We further modify this to represent a multi-hot vector x_i, a multi-hot categorical feature, in the following way:

    e_i = (1 / n_c) E_c x_i,    (5)

where n_c is the number of items that a sample has for categorical feature c. These processed values are concatenated and flattened before being input to the MLP.

As shown in Figure 1, the MLP uses ReLU [13] as the activation function and includes batch normalization [7] and dropout [17]. In the proposed method, the mean squared error (MSE) loss is calculated between the log-transformed ground truth of #retweets and the prediction of the MLP.

¹ https://github.com/haradai1262/CIKM2020-AnalytiCup/blob/master/src/feature_extraction.py
² https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html
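The five numerical transformations of section 3.2.1 can be sketched in plain Python as below. The function names are ours, the CDF is computed via the error function rather than SciPy, and details such as tie handling and bin-edge selection may differ from the published code:

```python
import math
from statistics import mean, pstdev
from bisect import bisect_left, bisect_right

def z_transform(xs):
    # Eq. (1): subtract the sample mean, divide by the sample std.
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def cdf_transform(xs):
    # Normal CDF fitted to the sample mean/std (SciPy's norm.cdf in the paper).
    m, s = mean(xs), pstdev(xs)
    return [0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2)))) for x in xs]

def rank_transform(xs):
    # Eq. (2): F_rank(x_i) = number of values strictly smaller than x_i.
    srt = sorted(xs)
    return [bisect_left(srt, x) for x in xs]

def log_transform(xs):
    # Eq. (3): log_e(x + 1) keeps x = 0 finite.
    return [math.log1p(x) for x in xs]

def binning_transform(xs, n_bins=10):
    # Equal-frequency binning: bucket edges taken from sample quantiles.
    # The resulting bucket ids are used as a categorical feature.
    srt = sorted(xs)
    edges = [srt[int(len(srt) * k / n_bins)] for k in range(1, n_bins)]
    return [bisect_right(edges, x) for x in xs]
```

For example, `rank_transform([3, 1, 2])` yields `[2, 0, 1]`, and the z and CDF transforms map the sample mean to 0 and 0.5, respectively.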
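Equations (4) and (5) of section 3.3 reduce to an embedding-table lookup and an average of looked-up rows. A minimal sketch, with the embedding matrix stored as a vocabulary-by-dimension table (function names are ours):

```python
def embed_categorical(E, x):
    """Eq. (4): e_i = E_c x_i for a one-hot x_i.
    With E stored as a (vocabulary x dimension) table, the
    matrix-vector product is just a row lookup."""
    return E[x.index(1)]

def embed_multihot(E, x):
    """Eq. (5): e_i = (1 / n_c) E_c x_i for a multi-hot x_i,
    i.e. the average of the embedding rows of the active items."""
    idxs = [j for j, v in enumerate(x) if v]
    n_c = len(idxs)
    dim = len(E[0])
    return [sum(E[j][d] for j in idxs) / n_c for d in range(dim)]
```

The 1/n_c scaling keeps the magnitude of a multi-hot embedding comparable regardless of how many items (e.g., hashtags) a tweet contains.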
Table 1: Feature table. Numerical, categorical, and multi-hot categorical features are denoted by N, C, and MC in the Type column, respectively. The values in the #Dim column are the numbers of dimensions of the features.

Name | Description | Type | #Dim
Tweet metrics | Metrics related to a tweet. Specifically, we use #followers, #friends, and #favorites, as well as the products of #followers and #favorites, of #friends and #favorites, and of #followers, #friends, and #favorites | N | 6
Tweet metrics z | Values obtained by applying the z transformation to "tweet metrics" | N | 6
Tweet metrics CDF | Values obtained by applying the CDF transformation to "tweet metrics" | N | 6
Tweet metrics rank | Values obtained by applying the rank transformation to "tweet metrics" | N | 6
Tweet metrics log | Values obtained by applying the log transformation to "tweet metrics" | N | 6
Tweet metrics binning | Values obtained by applying the binning transformation to "tweet metrics" | N | 6
Sentiment | Positive (1 to 5) and negative (−1 to −5) sentiment scores extracted from the text of a tweet by SentiStrength [18] | C | 2
Time | Features obtained from the timestamp of a tweet. Specifically, we use "weekday," "hour," "day," and "week of month" as categorical features, and the difference between the timestamp of the tweet and 2020/6/1 as a numerical feature | N, C | 5
Entities | Entities extracted from the text of a tweet by the Fast Entity Linker [1] | MC | 1
Hashtags | Hashtags included in a tweet | MC | 1
Mentions | Mentions included in a tweet | MC | 1
URLs | URLs included in a tweet | MC | 1
Components of URLs | Components of URLs included in a tweet. We extract the three components "protocol," "host," and "top-level domain" from each URL (e.g., "http," "www.youtube.com," and ".com" are extracted from http://www.youtube.com/) | MC | 3
User ID | User identifier | C | 1
User cluster ID | Identifiers assigned to a user by the three clustering methods described in section 3.2.2 | C | 3
User metrics | Metrics related to a user. Specifically, we use the mean and standard deviation of the followers, friends, favorites, and unique numbers of entities, hashtags, mentions, and URLs from the user's tweet history | N | 10
User dynamics | Metrics related to the dynamics of a user's #followers and #friends. We use the increase in #followers and #friends from the previous day, from the previous week, on the same day, and within the same week | N | 8
Topic | 5-dimensional features extracted by applying TFIDF [2] to sequences consisting of the entities, hashtags, mentions, and URLs included in tweets, followed by dimensionality reduction with SVD [4] | N | 5
Count encoding | Values obtained by applying count encoding [16] to the categorical features "sentiment" and "time" | N | 6
Target encoding | Values obtained by applying target encoding [12] to the categorical features "tweet metrics binning," "sentiment," "time," and "user ID" | N | 11

Note that, at the time of inference, the inverse transformation is applied to the output value to return it to the original scale.

3.4 Validation Strategy
The dataset was divided into training and validation data as follows. In this competition, the test data for evaluation are from May 2020, one month after the data included in the training data. To bring the distribution of the validation data closer to that of the test data, we need validation data that is as close to the test data as possible in the time series. Thus, we used the data from May 2020 as validation data. We also wanted to utilize the May 2020 data for training, since such fresh data is close to the test data. For this reason, the May 2020 data was divided into five validation folds, and five models were trained: when validating on one fold, the remaining four folds are used as training data. Finally, a prediction for the test data was calculated with each model, and the evaluation score was calculated from the average of these predictions.

4 EXPERIMENTS

4.1 Settings
The experimental results reported below are not scores on the test dataset, but the average of the 5-fold validation described in section 3.4. We empirically set the sizes of the three fully-connected layers to 2048, 512, and 128, respectively, the embedding dimension to 32, the dropout rate to 0.3, and the batch size to 256. We use Adam [8] to optimize all models. The other hyperparameters can be checked exactly in the published code.

Table 2: Comparison of numerical feature transformations.

Method | MSLE
Tweet metrics | 0.187028
Tweet metrics + z transformation | 0.173821
Tweet metrics + CDF transformation | 0.151882
Tweet metrics + log transformation | 0.129360
Tweet metrics + rank transformation | 0.130994
Tweet metrics + binning transformation | 0.174810
Tweet metrics + all transformations | 0.127761

4.2 Results
First, we verified the effectiveness of the numerical feature transformations introduced in section 3.2.1. In the experiment, we tried using the tweet metrics without transformation, with each transformation applied individually, and with all transformations applied. The experimental results are shown in Table 2. It was confirmed that the log, rank, CDF, z, and binning transformations contributed to improving the performance, in this order. Since the numbers of followers and favorites in the tweet metrics follow a power law, it is reasonable that the log transformation is useful. Also, the best MSLE was obtained when applying all transformations. This result shows the effectiveness of transforming the tweet metrics
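The splitting and averaging scheme of section 3.4 can be sketched as follows. `train_fn` and `predict_fn` are hypothetical placeholders for model training and inference, and for simplicity this sketch trains each model only on the four held-in May folds, whereas the full pipeline also uses the October 2019 to April 2020 data:

```python
def five_fold_ensemble(may_data, train_fn, predict_fn, test_data):
    # Split the May 2020 data into five validation folds.
    folds = [may_data[i::5] for i in range(5)]
    test_preds = []
    for i in range(5):
        # Fold i is held out for validation; the other four folds
        # are used for training.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(train)
        test_preds.append(predict_fn(model, test_data))
    # The final score is computed from the average of the five
    # models' predictions on the test data.
    return [sum(p) / len(p) for p in zip(*test_preds)]
```

Averaging five models trained on overlapping but distinct subsets both uses all of the fresh May data and reduces the variance of the final prediction.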
        Table 3: Comparison of user modeling features.                               method. To improve the performance, we introduced a feature ex-
                                                                                     traction method to be input into the DNN (mainly focusing on nu-
         Method                                           MSLE                       merical feature transformation and user modeling) and confirmed
         Both user ID and user cluster ID are unused      0.144809                   its effectiveness with experiments. As a solution for the competi-
         User ID                                          0.128004                   tion, we introduced a stacking-based ensemble method for multiple
         User cluster ID                                  0.137432                   models, which positioned us in the 3rd place.
         User ID and user cluster ID                      0.127761

Table 4: Models used for the ensemble of our solution. MAE
in the Loss column denotes mean absolute error loss.

    Embedding dim     Sizes of FC layers   Dropout rate    Loss      MSLE
    32                2048, 512, 128       0.1             MSE       0.128448
    32                2048, 512, 128       0.3             MSE       0.127761
    32                2048, 512, 128       0.5             MSE       0.128413
    40                4096, 1024, 128      0.1             MSE       0.127964
    40                4096, 1024, 128      0.3             MSE       0.127810
    40                4096, 1024, 128      0.5             MSE       0.128520
    40                4096, 1024, 128      0.1             MAE       0.132143
Table 5: Final submission results of the top six teams (semi-
finalists) in the competition.

           Rank     Team                    MSLE (Test dataset)
           1        vinayaka                0.120551
           2        mc-aida                 0.121094
           3        myaunraitau (ours)      0.136239
           4        parklize                0.149997
           5        JimmyChang              0.156876
           6        Thomary                 0.169047
power law, it is reasonable that log transformation is useful. Moreover,
the best MSLE was obtained when all the transformations were applied
together. This result shows the effectiveness of transforming tweet
metrics into different distributions before inputting them into the DNN
model for retweet prediction.
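The log transformation discussed above can be sketched as follows. The metric values here are made up for illustration and are not the competition's actual data; MSLE is written out to show why predicting in log space is natural for this task:

```python
import numpy as np

# Illustrative heavy-tailed tweet metrics (e.g. follower counts);
# these values are placeholders, not the competition's actual data.
followers = np.array([0, 10, 250, 3_400, 1_000_000], dtype=float)

# log1p compresses the power-law tail into a compact range and keeps
# zero-valued metrics well defined (log1p(0) == 0).
log_followers = np.log1p(followers)

# MSLE, the competition metric, is simply MSE computed on
# log1p-transformed values.
def msle(y_true, y_pred):
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)
```

Under MSLE, a model trained with plain MSE on log1p-transformed targets is optimizing the competition metric directly, which is one reason the log transformation pairs well with this task.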
   Next, we verified the effectiveness of the user modeling introduced
in section 3.2.2. In this experiment, we compared the embeddings of the
user ID and the user cluster ID in four settings: using neither, using
either one alone, and using both. The experimental results are shown in
Table 3. We found that performance improves when the user cluster ID is
used in addition to the user ID, compared with using the user ID alone.
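The user-clustering step can be illustrated with a minimal k-means implementation (the paper cites MacQueen's k-means [11]). The feature vectors below are random placeholders; the real cluster features come from the user modeling of section 3.2.2:

```python
import numpy as np

# Placeholder user feature vectors (e.g. transformed follower/friend
# counts); randomly generated here purely for illustration.
rng = np.random.default_rng(0)
user_features = rng.normal(size=(100, 4))

# A minimal k-means (Lloyd's algorithm) that assigns each user a
# cluster ID, usable as an extra categorical feature for embedding.
def kmeans_labels(X, k=8, iters=20, seed=0):
    r = np.random.default_rng(seed)
    centroids = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every centroid, then nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

user_cluster_id = kmeans_labels(user_features)  # one categorical ID per user
```

Both the raw user ID and this cluster ID would then feed separate embedding layers; the cluster ID shares statistical strength across similar users, which is consistent with the improvement seen in Table 3.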
5     SOLUTION
We used an ensemble of multiple models with varied hyperparameters
(embedding dimension, sizes of the fully-connected layers, and dropout
rate) and loss functions. Table 4 shows the seven models used for the
ensemble. Stacking ridge regression [2] was used as the ensemble method:
each model's predictions are blended as a linear combination whose
weights are learned by ridge regression. The final predicted value was
obtained by rounding the blended prediction to an integer, in accordance
with the competition rules. The final leaderboard is shown in Table 5.
Our solution placed 3rd.
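The stacking step can be sketched in closed form. The base-model predictions below are made up for illustration with three hypothetical models rather than the seven actually used, and the real solution may fit the blend on out-of-fold or log-transformed predictions:

```python
import numpy as np

# Toy out-of-fold predictions from three hypothetical base models
# (one column per model, one row per sample) and the true targets.
P = np.array([[2.1, 1.9, 2.3],
              [0.2, 0.1, 0.0],
              [5.0, 4.6, 5.2],
              [1.1, 0.8, 1.0]])
y = np.array([2.0, 0.0, 5.0, 1.0])

# Ridge regression in closed form: w = (P^T P + alpha * I)^-1 P^T y.
alpha = 1.0
w = np.linalg.solve(P.T @ P + alpha * np.eye(P.shape[1]), P.T @ y)

# Blend the model predictions with the learned weights, then round to
# an integer as the competition requires for the final submission.
blended = P @ w
final = np.rint(blended).astype(int)
```

Because the blend is a plain linear combination, the learned weights also indicate how much each base model contributes to the final prediction.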
6     CONCLUSION
This paper presented our solution for the COVID-19 Retweet Prediction
Challenge. We proposed a DNN-based retweet prediction method. To improve
its performance, we introduced a feature extraction method for the
inputs to the DNN (mainly focusing on numerical feature transformation
and user modeling) and confirmed its effectiveness with experiments. As
our solution for the competition, we introduced a stacking-based
ensemble of multiple models, which placed us 3rd.

REFERENCES
 [1] Roi Blanco, Giuseppe Ottaviano, and Edgar Meij. 2015. Fast and Space-Efficient
     Entity Linking for Queries. In Proceedings of the Eighth ACM Int. Conf. on Web
     Search and Data Mining. ACM, 179–188.
 [2] Leo Breiman. 1996. Stacked regressions. Machine Learning 24, 1 (1996), 49–64.
 [3] Dimitar Dimitrov, Erdal Baran, Pavlos Fafalios, Ran Yu, Xiaofei Zhu, Matthäus
     Zloch, and Stefan Dietze. 2020. TweetsCOV19 - A Knowledge Base of Semantically
     Annotated Tweets about the COVID-19 Pandemic. In Proceedings of the 29th ACM
     International Conference on Information & Knowledge Management. Association
     for Computing Machinery, New York, NY, USA, 2991–2998. https://doi.org/10.
     1145/3340531.3412765
 [4] Nathan Halko, Per-Gunnar Martinsson, and Joel A Tropp. 2011. Finding structure
     with randomness: Probabilistic algorithms for constructing approximate matrix
     decompositions. SIAM Review 53, 2 (2011), 217–288.
 [5] Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, and
     Christina Lioma. 2020. Content-aware Neural Hashing for Cold-start Recom-
     mendation. In Proceedings of the 43rd Int. ACM SIGIR Conf. on Research and
     Development in Information Retrieval. 971–980.
 [6] Cindy Hui, Yulia Tyshchuk, William A Wallace, Malik Magdon-Ismail, and Mark
     Goldberg. 2012. Information cascades in social media in response to a crisis: a
     preliminary model and a case study. In Proceedings of the 21st Int. Conf. on World
     Wide Web. 653–656.
 [7] Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: accelerating deep
     network training by reducing internal covariate shift. In Proceedings of the 32nd
     Int. Conf. on Machine Learning. 448–456.
 [8] Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Opti-
     mization. arXiv preprint arXiv:1412.6980.
 [9] Yuanfei Luo, Mengshuo Wang, Hao Zhou, Quanming Yao, Wei-Wei Tu, Yuqiang
     Chen, Wenyuan Dai, and Qiang Yang. 2019. AutoCross: Automatic feature crossing
     for tabular data in real-world applications. In Proceedings of the 25th ACM SIGKDD
     Int. Conf. on Knowledge Discovery & Data Mining. 1936–1945.
[10] Renfeng Ma, Xiangkun Hu, Qi Zhang, Xuanjing Huang, and Yu-Gang Jiang.
     2019. Hot Topic-Aware Retweet Prediction with Masked Self-attentive Model.
     In Proceedings of the 42nd Int. ACM SIGIR Conf. on Research and Development in
     Information Retrieval. 525–534.
[11] James MacQueen et al. 1967. Some methods for classification and analysis of
     multivariate observations. In Proceedings of the Fifth Berkeley Symposium on
     Mathematical Statistics and Probability, Vol. 1. 281–297.
[12] Daniele Micci-Barreca. 2001. A preprocessing scheme for high-cardinality cat-
     egorical attributes in classification and prediction problems. ACM SIGKDD
     Explorations Newsletter 3, 1 (2001), 27–32.
[13] Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted
     Boltzmann machines. In Proceedings of the 27th Int. Conf. on Machine Learning.
     807–814.
[14] Jean-François Puget. 2017. Feature Engineering For Deep Learning.
     https://medium.com/inside-machine-learning/feature-engineering-for-
     deep-learning-2b1fc7605ace. Accessed: 2020-09-28.
[15] Jiezhong Qiu, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, and Jie Tang.
     2018. DeepInf: Social influence prediction with deep learning. In Proceedings of the
     24th ACM SIGKDD Int. Conf. on Knowledge Discovery & Data Mining. 2110–2119.
[16] Shubham Singh. 2020. Categorical Variable Encoding Techniques.
     https://medium.com/analytics-vidhya/categorical-variable-encoding-
     techniques-17e607fe42f9. Accessed: 2020-09-28.
[17] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan
     Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from
     overfitting. The Journal of Machine Learning Research 15, 1 (2014), 1929–1958.
[18] Mike Thelwall, Kevan Buckley, Georgios Paltoglou, Di Cai, and Arvid Kappas.
     2010. Sentiment strength detection in short informal text. Journal of the American
     Society for Information Science and Technology 61, 12 (2010), 2544–2558.
[19] Qi Zhang, Yeyun Gong, Jindou Wu, Haoran Huang, and Xuanjing Huang. 2016.
     Retweet prediction with attention-based deep neural network. In Proceedings of
     the 25th ACM Int. Conf. on Information and Knowledge Management. 75–84.
[20] Honglei Zhuang, Xuanhui Wang, Michael Bendersky, and Marc Najork. 2020.
     Feature transformation for neural ranking models. In Proceedings of the 43rd Int.
     ACM SIGIR Conf. on Research and Development in Information Retrieval. 1649–1652.