<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Feature Extraction for Deep Neural Networks: A Case Study on the COVID-19 Retweet Prediction Challenge</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Information diffusion</institution>
          ,
          <addr-line>Retweet prediction, Feature extraction, Deep learning, COVID-19</addr-line>
        </aff>
      </contrib-group>
      <fpage>9</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>This paper presents our solution for the COVID-19 Retweet Prediction Challenge, which is part of the CIKM 2020 AnalytiCup. The challenge was to predict the number of times that tweets related to COVID-19 will be retweeted. We tackled this challenge using a deep neural network-based retweet prediction method. In this method, we introduced useful feature extraction techniques for retweet prediction. Experiments confirmed the effectiveness of the techniques, especially for the primary processes: numerical feature transformation and user modeling. Finally, the solution used a stacking-based ensemble method to produce the final predictions for the competition. The code for this solution is available at https://github.com/haradai1262/CIKM2020-AnalytiCup.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>
        Understanding the mechanisms of information diffusion is an active
area of research that has many practical applications. In a crisis like
COVID-19, information diffusion directly influences people’s
behavior and becomes especially valuable [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Retweeting, sharing tweets
directly to followers on Twitter, can be viewed as amplifying the
diffusion of original content. Thus retweet prediction is beneficial
for understanding the mechanisms of information diffusion.
      </p>
      <p>
        Retweet prediction has been widely studied. In recent years,
there has been growing interest in methods based on deep neural
networks (DNNs), which have reported high performance [
        <xref ref-type="bibr" rid="ref10 ref15 ref19">10, 15,
19</xref>
        ]. DNNs have made it possible to skip much of the feature engineering,
especially in image processing and natural language processing.
However, in DNNs for tabular data, including retweet prediction,
data pre-processing and feature engineering are still often necessary
and significantly impact performance [
        <xref ref-type="bibr" rid="ref14 ref9">9, 14</xref>
        ].
      </p>
      <p>
        In retweet prediction, the processing of numerical features
related to tweets, such as the number of followers, strongly affects
performance. To train DNNs effectively, it may be useful to transform
the numerical features to different distributions [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Furthermore,
it is crucial to learn a representation of the users who publish tweets.
Although embedding the user ID is a common
approach in DNN-based methods, it is difficult to sufficiently
learn the representations of users who appear infrequently in the
training data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. As these points show, the
input features to the DNN must be designed according to the data and task, which
can be difficult.
      </p>
      <p>
        As a case study to tackle these difficulties, in this paper, we
present our solution to the COVID-19 Retweet Prediction
Challenge, part of the CIKM 2020 AnalytiCup. The task in this challenge
was to predict the number of retweets for a given COVID-19-related
tweet. We propose a DNN-based retweet prediction method. In the
proposed method, we introduce a feature extraction method that produces the
inputs to a DNN for retweet prediction. In the feature
extraction, we transform numerical features into multiple different
distributions to effectively utilize the metrics related to tweets.
In addition, we cluster users based on multiple types of features so that
user attributes can be learned even for infrequent users. Using the obtained
features, we train a DNN model. In the experiments, we verify
the effectiveness of the transformation of numerical features and
user data handling, which are essential issues in retweet
prediction. In addition, as a solution for the competition, we introduce a
stacking-based ensemble method to improve the performance and
robustness of the predictions.
      </p>
      <p>
        In the COVID-19 Retweet Prediction Challenge, the TweetsCOV19
dataset was provided. This dataset consists of 8,151,524 tweets
concerning COVID-19 on Twitter published by 3,664,518 users from
October 2019 until April 2020. For each tweet, the dataset
provides metadata and some precalculated features. The contents of
the dataset and the process of their generation are detailed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2.2 Task Description</title>
      <p>Given a tweet from the TweetsCOV19 dataset, the task was to
predict its number of retweets (#retweets). The test data for the
evaluation are tweets published during May 2020, the month
following the period covered by the TweetsCOV19 dataset.
The mean squared log error (MSLE) is used as the evaluation metric
for the task.</p>
      <p>An overview of the proposed method is shown in Figure 1. First,
we extract the features to be input to the DNN. These features
are divided into numerical, categorical, and
multi-hot categorical features; the categorical and multi-hot categorical
features are converted into low-dimensional vectors through
embedding layers. Using the extracted features, we train a multilayer
perceptron (MLP) for retweet prediction.
</p>
      <p>[Figure 1: Overview of the proposed method. From input to output: flatten and concatenate the input features; Fully-Connected (2048), Fully-Connected (512), and Fully-Connected (128) layers, each followed by batch normalization, ReLU, and dropout; Fully-Connected (1) with ReLU; MSE loss.]</p>
      <p>
The features used in the proposed method are shown in Table 1.
Numerical feature transformation and user modeling, the critical issues
of retweet prediction, are discussed in the following subsections.
Please refer to the published code<sup>1</sup> for the exact processing.</p>
      <p>3.2.1 Numerical Feature Transformation. #retweets is strongly
related to the metrics of a tweet, such as the number of followers
(#followers) and favorites (#favorites), which are expected to have a
significant impact on the performance of our prediction model. In
the proposed method, we attempt to represent various distributions
of these metrics and improve the performance by combining them
as input for the DNN. Specifically, we introduce the following five
numerical feature transformations.</p>
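      <p>Z transformation. We transform each value <inline-formula><tex-math>x \in X</tex-math></inline-formula> by the
following function using the mean <inline-formula><tex-math>\mu</tex-math></inline-formula> and standard deviation <inline-formula><tex-math>\sigma</tex-math></inline-formula> of the
dataset <inline-formula><tex-math>X</tex-math></inline-formula>:</p>
      <p><disp-formula><label>(1)</label><tex-math>f_{\mathrm{z}}(x) = \frac{x - \mu}{\sigma}</tex-math></disp-formula></p>
      <p>CDF transformation. We derived a normal distribution from the
mean and standard deviation observed in the dataset. Using this
distribution, we transformed the original values by its cumulative
distribution function (CDF). To implement this function, we used
the Python library SciPy<sup>2</sup>.</p>
      <p>Rank transformation. We transform each value <inline-formula><tex-math>x \in X</tex-math></inline-formula> by the
following function:</p>
      <p><disp-formula><label>(2)</label><tex-math>f_{\mathrm{rank}}(x) = \sum_{x' \in X} \mathbb{I}_{x' &lt; x}, \qquad \mathbb{I}_{x' &lt; x} = \begin{cases} 1 &amp; \text{if } x' &lt; x \text{ is true} \\ 0 &amp; \text{otherwise} \end{cases}</tex-math></disp-formula></p>
      <p>Log transformation. We transform each value <inline-formula><tex-math>x \in X</tex-math></inline-formula> by the
following function:</p>
      <p><disp-formula><label>(3)</label><tex-math>f_{\mathrm{log}}(x) = \log(x + 1)</tex-math></disp-formula></p>
      <p>Here we add one to <inline-formula><tex-math>x</tex-math></inline-formula> to avoid an infinite output when <inline-formula><tex-math>x</tex-math></inline-formula> is
zero.</p>
      <p><sup>1</sup> https://github.com/haradai1262/CIKM2020-AnalytiCup/blob/master/src/feature_extraction.py</p>
      <p><sup>2</sup> https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html</p>
      <p>Binning transformation. We separate the values into buckets of
the same size based on the quantiles of the sample. In the proposed
method, the values are divided into ten quantiles. Unlike the other
transformations, we use the transformed values as categorical
features.</p>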
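      <p>As a minimal sketch of the five transformations, the following NumPy code applies each of them to a toy array of follower counts. The function names and the toy data are illustrative assumptions, not the published implementation. The standard deviation is the population form, which the text does not specify, and the normal CDF is computed with math.erf rather than scipy.stats.norm to keep the sketch dependency-light.</p>
      <preformat><![CDATA[
```python
import math

import numpy as np


def z_transform(x):
    """Standardize with the sample mean and (population) standard deviation."""
    return (x - x.mean()) / x.std()


def cdf_transform(x):
    """Normal CDF parameterized by the mean and std of x."""
    z = z_transform(x)
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])


def rank_transform(x):
    """For each value, count how many values in x are strictly smaller."""
    return np.searchsorted(np.sort(x), x, side="left")


def log_transform(x):
    """log(x + 1), which stays finite at x = 0."""
    return np.log1p(x)


def binning_transform(x, n_bins=10):
    """Map each value to a quantile bucket id in 0..n_bins-1 (categorical)."""
    inner_edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(inner_edges, x, side="right")


# Toy follower counts (hypothetical values).
followers = np.array([0.0, 3.0, 10.0, 10.0, 250.0, 12000.0])

z_values = z_transform(followers)
cdf_values = cdf_transform(followers)
rank_values = rank_transform(followers)
log_values = log_transform(followers)
bin_ids = binning_transform(followers)

# Numerical views are stacked as columns; bin ids would go to an embedding.
numeric_block = np.column_stack([z_values, cdf_values, rank_values, log_values])
```
]]></preformat>
      <p>Stacking the four numerical views side by side mirrors the idea of feeding several distributions of the same metric to the network, while the bucket ids are kept separate because they are treated as a categorical feature.</p>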
      <p>
        We apply these transformations to the tweet metrics (Table 1) and
input the obtained values into the MLP.
      </p>
      <p>
        3.2.2 User Modeling. Appropriately representing the user who
published the tweet is essential for predicting #retweets. For
DNN-based prediction models, a common and effective method is to learn
user representations by feeding the user ID into embedding layers. However, the
embeddings of users who appear infrequently in the training data are not sufficiently trained
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. By clustering users from various points of view and embedding
their cluster IDs, we can learn user attributes even for
infrequent users. Specifically, we introduced the following three
types of user clustering.
      </p>
      <p>
        User topic clustering. We clustered users using topics contained
in tweets. Specifically, we combined the entities, hashtags,
mentions, and URLs included in the tweet and set them as sequences
for each user. Next, user topic features were extracted by applying
the term frequency-inverse document frequency (TFIDF) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to
the sequences and dimensionality reduction using singular value
decomposition (SVD) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Using the extracted features, we applied
the K-means clustering [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to the users.
      </p>
      <p>User metric clustering. We clustered users using user-related
metrics. The user metric features consist of the mean and standard
deviation of the user’s follower, friend, and like counts, as well as the
numbers of unique entities, hashtags, mentions, and URLs in the tweet
log of each user. Using the obtained features, we applied
K-means clustering to the users.</p>
      <p>User topic and metric clustering. Using the features that
combine user topic features and user metric features, we applied the
K-means clustering to the users.</p>
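      <p>The user topic clustering pipeline can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the published code: the toy users and tokens are hypothetical, the sequences are tokenized by whitespace, and only 3 clusters are used so the example runs on 8 users, whereas the paper uses 1000 clusters and 5-dimensional topic features.</p>
      <preformat><![CDATA[
```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical per-user sequences of entities, hashtags, mentions, and URLs.
user_tokens = {
    "u1": ["#covid19", "@who", "who.int", "#lockdown"],
    "u2": ["#covid19", "#lockdown", "@cdcgov", "cdc.gov"],
    "u3": ["#vaccine", "@who", "who.int", "#covid19"],
    "u4": ["#vaccine", "#research", "nature.com", "@nature"],
    "u5": ["#stayhome", "#lockdown", "youtube.com"],
    "u6": ["#stayhome", "youtube.com", "@youtube", "#music"],
    "u7": ["#economy", "#markets", "ft.com", "@ft"],
    "u8": ["#economy", "#covid19", "ft.com", "#markets"],
}
user_ids = list(user_tokens)
documents = [" ".join(user_tokens[u]) for u in user_ids]

# TF-IDF over whitespace-separated tokens (keeps '#', '@', and dots intact).
tfidf = TfidfVectorizer(analyzer=str.split)
tfidf_matrix = tfidf.fit_transform(documents)

# Reduce the sparse TF-IDF matrix to 5 dense dimensions with truncated SVD.
svd = TruncatedSVD(n_components=5, random_state=0)
topic_features = svd.fit_transform(tfidf_matrix)

# Cluster users; the resulting id is what would be fed to an embedding layer.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(topic_features)
user_to_cluster = {u: int(c) for u, c in zip(user_ids, cluster_ids)}
```
]]></preformat>
      <p>The same K-means step could be applied to user metric features, or to the concatenation of both feature sets, to obtain the other two cluster IDs.</p>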
      <p>Note that the number of clusters is set to 1000 in each clustering.</p>
    </sec>
    <sec id="sec-3">
      <title>3.3 Model</title>
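      <p>
        Using the extracted features, we trained the MLP. In the proposed
method, the inputs of the MLP can be divided into numerical,
categorical, and multi-hot categorical features. Min-max scaling was
applied to the numerical features to convert them to the range [0, 1].
Categorical features were transformed
into low-dimensional vectors using the embedding layers.
Specifically, we represent a one-hot vector <inline-formula><tex-math>x_i</tex-math></inline-formula>, a categorical feature, with a
low-dimensional vector <inline-formula><tex-math>e_i</tex-math></inline-formula>,
      </p>
      <p><disp-formula><label>(4)</label><tex-math>e_i = W_i x_i,</tex-math></disp-formula></p>
      <p>where <inline-formula><tex-math>W_i</tex-math></inline-formula> is an embedding matrix for categorical feature <inline-formula><tex-math>i</tex-math></inline-formula>. We
further modify it and represent a multi-hot vector <inline-formula><tex-math>x_i</tex-math></inline-formula>, a multi-hot
categorical feature, in the following way:</p>
      <p><disp-formula><label>(5)</label><tex-math>e_i = \frac{1}{k} W_i x_i,</tex-math></disp-formula></p>
      <p>where <inline-formula><tex-math>k</tex-math></inline-formula> is the number of items that a sample has for categorical
feature <inline-formula><tex-math>i</tex-math></inline-formula>. These processed values are concatenated and flattened
before being input to the MLP.</p>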
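      <p>The embedding of one-hot and multi-hot features described above can be illustrated with a small NumPy sketch. The symbol names (W for the embedding matrix, k for the number of active items), the toy sizes, and the random weights are assumptions made for illustration; in the actual model the embedding matrix is learned, and the paper sets the embedding dimension to 32.</p>
      <preformat><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim = 6, 4  # toy sizes
W = rng.normal(size=(emb_dim, vocab_size))  # one column per category


def embed_one_hot(W, index):
    """Embed a categorical feature given as a single category index."""
    x = np.zeros(W.shape[1])
    x[index] = 1.0
    return W @ x  # identical to selecting column `index` of W


def embed_multi_hot(W, indices):
    """Embed a multi-hot feature: average the embeddings of its k items.

    Assumes `indices` is non-empty and contains distinct category ids.
    """
    x = np.zeros(W.shape[1])
    x[list(indices)] = 1.0
    k = len(indices)
    return (W @ x) / k


user_cluster_vec = embed_one_hot(W, 2)      # e.g. a user cluster id
hashtags_vec = embed_multi_hot(W, [1, 4])   # e.g. two hashtags in one tweet

# Embedded features are concatenated into one flat MLP input vector.
mlp_input = np.concatenate([user_cluster_vec, hashtags_vec])
```
]]></preformat>
      <p>Dividing by k keeps the scale of the multi-hot embedding comparable across tweets that contain different numbers of items.</p>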
      <p>
        As shown in Figure 1, the MLP is a structure that uses ReLU
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] as the activation function and includes batch normalization
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and dropout [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In the proposed method, the loss function is the mean squared error
(MSE) between the log-transformed ground truth
of #retweets and the prediction of the MLP.
      </p>
      <table-wrap>
        <label>Table 1</label>
        <caption><p>Features used in the proposed method. Type: N = numerical, C = categorical, MC = multi-hot categorical.</p></caption>
        <table>
          <thead>
            <tr><th>Description</th><th>Type</th></tr>
          </thead>
          <tbody>
            <tr><td>Metrics related to a tweet. Specifically, we use #followers, #friends, and #favorites, as well as the multiplication of #followers and #favorites, #friends and #favorites, and #followers and #friends and #favorites</td><td>N</td></tr>
            <tr><td>Values obtained by applying z transformation to “tweet metrics”</td><td>N</td></tr>
            <tr><td>Values obtained by applying CDF transformation to “tweet metrics”</td><td>N</td></tr>
            <tr><td>Values obtained by applying rank transformation to “tweet metrics”</td><td>N</td></tr>
            <tr><td>Values obtained by applying log transformation to “tweet metrics”</td><td>N</td></tr>
            <tr><td>Values obtained by applying binning transformation to “tweet metrics”</td><td>N</td></tr>
            <tr><td>Positive (1 to 5) and negative (-1 to -5) sentiment scores extracted from the text of a tweet by SentiStrength [<xref ref-type="bibr" rid="ref18">18</xref>]</td><td>C</td></tr>
            <tr><td>Features obtained from the timestamp of a tweet. Specifically, we use “weekday,” “hour,” “day,” and “week of month” as categorical features, and the difference between the timestamp of the tweet and 2020/6/1 as numerical features</td><td>N, C</td></tr>
            <tr><td>Entities extracted from the text of a tweet by the Fast Entity Linker [<xref ref-type="bibr" rid="ref1">1</xref>]</td><td>MC</td></tr>
            <tr><td>Hashtags included in a tweet</td><td>MC</td></tr>
            <tr><td>Mentions included in a tweet</td><td>MC</td></tr>
            <tr><td>URLs included in a tweet</td><td>MC</td></tr>
            <tr><td>Components of URLs included in a tweet. We extract the three components “protocol,” “host,” and “top level domain” from the URL (e.g., “http,” “www.youtube.com,” and “.com” are extracted from http://www.youtube.com/)</td><td>MC</td></tr>
            <tr><td>User identifier</td><td>C</td></tr>
            <tr><td>Identifier assigned to a user by three clustering methods described in section 3.2.2.</td><td>C</td></tr>
            <tr><td>Metrics related to a user. Specifically, we use the mean and standard deviation of the followers, friends, favorites, and unique numbers of entities, hashtags, mentions, and URLs from the user’s tweet history.</td><td>N</td></tr>
            <tr><td>Metrics related to the dynamics in the #followers and #friends of a user. We use the increase in #followers and #friends from the previous day, the previous week, on the same day, and within the same week</td><td>N</td></tr>
            <tr><td>5-dimensional features extracted by applying TFIDF [<xref ref-type="bibr" rid="ref2">2</xref>] to sequences consisting of entities, hashtags, mentions, and URLs included in tweets and dimensionality reduction by SVD [<xref ref-type="bibr" rid="ref4">4</xref>]</td><td>N</td></tr>
            <tr><td>Values obtained by applying count encoding [<xref ref-type="bibr" rid="ref16">16</xref>] to the categorical features “sentiment” and “time”</td><td>N</td></tr>
            <tr><td>Values obtained by applying target encoding [<xref ref-type="bibr" rid="ref12">12</xref>] to the categorical features “tweet metrics binning,” “sentiment,” “time,” and “user Id”</td><td>N</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Note that, at inference time, the inverse transformation is applied to
the output value to return it to the original scale.</p>
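      <p>The relationship between the log-scale MSE loss, the inverse transformation at inference, and the MSLE metric can be sketched as follows. This assumes the log transformation of the target is log(x + 1) with the natural logarithm, which the text does not state explicitly; the prediction values are hypothetical stand-ins for MLP outputs.</p>
      <preformat><![CDATA[
```python
import numpy as np


def to_log_scale(retweets):
    """Target transformation used for training: log(y + 1)."""
    return np.log1p(retweets)


def to_original_scale(pred_log):
    """Inverse transformation applied at inference: exp(p) - 1."""
    return np.expm1(pred_log)


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def msle(y_true, y_pred):
    """Mean squared log error, the competition metric."""
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))


y_true = np.array([0.0, 2.0, 15.0, 340.0])   # ground-truth #retweets
pred_log = np.array([0.1, 1.0, 2.5, 6.0])    # hypothetical MLP outputs

train_loss = mse(to_log_scale(y_true), pred_log)
y_pred = to_original_scale(pred_log)
metric = msle(y_true, y_pred)
```
]]></preformat>
      <p>Under these assumptions, minimizing MSE on the log scale is the same as minimizing MSLE on the original scale, which is one way to read the choice of loss.</p>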
    </sec>
    <sec id="sec-4">
      <title>3.4 Validation Strategy</title>
      <p>The dataset was divided into training and validation
data as follows. In this competition, the test data for evaluation
are from May 2020, one month after the period covered by the training data.
To bring the distributions of the validation data and the test data
closer, the validation data should be as close in time to the test
data as possible. Thus, we used the data from May
2020 as the validation data. We also wanted to train on the May 2020
data, since fresh data close to the test data should improve learning.
For this reason, the May 2020 data was divided into five folds,
and five models were trained. When one fold is used for
validation, the remaining four are used as training
data. Finally, the test data predictions were calculated
with each model, and the evaluation score was calculated from their
average.</p>
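      <p>A minimal sketch of this validation scheme is given below. It assumes a random split into folds (the text does not say how the five parts were formed) and uses a placeholder model that predicts the mean of its training targets in place of the MLP; all data are synthetic.</p>
      <preformat><![CDATA[
```python
import numpy as np


def make_folds(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and split them into disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


# Synthetic stand-ins for the May 2020 targets (log scale) and the test size.
rng = np.random.default_rng(1)
y_may = rng.uniform(0.0, 5.0, size=23)
n_test = 4

folds = make_folds(len(y_may))
valid_scores, test_preds = [], []
for i, valid_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # Placeholder "model": a constant equal to the mean training target.
    # A real run would train the MLP on these rows instead.
    constant = y_may[train_idx].mean()
    valid_scores.append(float(np.mean((y_may[valid_idx] - constant) ** 2)))
    test_preds.append(np.full(n_test, constant))

cv_score = float(np.mean(valid_scores))          # averaged validation error
final_test_pred = np.mean(test_preds, axis=0)    # average of the five models
```
]]></preformat>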
    </sec>
    <sec id="sec-5">
      <title>4 EXPERIMENTS</title>
    </sec>
    <sec id="sec-6">
      <title>4.1 Settings</title>
      <p>
        The experimental results are not scores on the test dataset but
averages over the 5-fold validation described in section 3.4. We
empirically set the sizes of the three fully-connected layers to 2048,
512, and 128, respectively, the embedding dimension to 32, the dropout
rate to 0.3, and the batch size to 256. We use Adam [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to optimize
all models. The remaining hyperparameters are given in the
published code.
      </p>
    </sec>
    <sec id="sec-7">
      <title>4.2 Results</title>
      <p>First, we verified the effectiveness of the numerical feature
transformation introduced in section 3.2.1. In the experiment, we tried
using the tweet metrics without any transformation, with
each transformation applied individually, and with all
transformations applied. The experimental results are shown in Table 2.
The log, rank, CDF, z, and binning
transformations contributed to improving the performance, in that order. Since
the numbers of followers and favorites in the tweet metrics follow a
power law, it is reasonable that the log transformation is useful. Also,
the best MSLE was obtained when applying all transformations.
This result shows the effectiveness of transforming tweet metrics
into different distributions and inputting them into the DNN model
for retweet prediction.</p>
      <p>[Tables 2 to 5: only fragments survive. Table 3 includes the row “Both user ID and user cluster ID are unused” and an MSLE column. A table note, apparently for Table 4, states that an abbreviation in the LOSS column denotes mean absolute error loss. Table 5 lists the teams vinayaka, myaunraitau (ours), parklize, JimmyChang, and Thomary under the column MSLE (Test dataset). The values 0.128448 and 0.127761 also appear, without their rows.]</p>
      <sec id="sec-7-3">
        <p>Next, we verified the effectiveness of the user modeling
introduced in section 3.2.2. In the experiment, regarding the embedding of
user ID and user cluster ID, we tried using neither, using either
one, and using both. The experimental results are shown in Table 3.
Performance improved when the user
cluster ID was used in addition to the user ID, compared with using only the user ID.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>5 SOLUTION</title>
      <p>
        We built an ensemble of multiple models with different
hyperparameters (embedding dimension, sizes of the fully-connected layers,
and dropout rate) and loss functions. Table 4 shows the seven
models used for the ensemble. Stacking ridge regression [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] was used
as the ensemble method. Stacking ridge regression
blends the predictions of the models by a linear sum whose
weights are learned by ridge regression. Following the
competition rules, the final predicted value was rounded to an
integer. The final leaderboard is shown in Table 5.
      </p>
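      <p>Stacking with ridge regression can be sketched as follows. This is an illustration under assumptions: three synthetic base models stand in for the seven models of the paper, the ridge weights are fitted in closed form on the log scale without an intercept, and the regularization strength alpha is arbitrary. None of these details are confirmed by the text.</p>
      <preformat><![CDATA[
```python
import numpy as np


def fit_ridge_weights(preds, target, alpha=1.0):
    """Closed-form ridge regression (no intercept): one weight per model."""
    n_models = preds.shape[1]
    gram = preds.T @ preds + alpha * np.eye(n_models)
    return np.linalg.solve(gram, preds.T @ target)


rng = np.random.default_rng(0)
y_log = rng.uniform(0.0, 5.0, size=200)  # synthetic log-scale targets

# Synthetic out-of-fold predictions of three base models of varying quality.
noise_levels = (0.3, 0.5, 0.8)
oof_preds = np.column_stack(
    [y_log + rng.normal(scale=s, size=y_log.size) for s in noise_levels]
)

weights = fit_ridge_weights(oof_preds, y_log, alpha=1.0)
blended_log = oof_preds @ weights  # linear sum of the model predictions

# Back to the original scale, clipped at zero and rounded to integers.
final_retweets = np.rint(np.clip(np.expm1(blended_log), 0.0, None)).astype(int)

single_mse = ((oof_preds - y_log[:, None]) ** 2).mean(axis=0)
blended_mse = float(((blended_log - y_log) ** 2).mean())
```
]]></preformat>
      <p>Fitting the weights on out-of-fold predictions, rather than on predictions for rows a model was trained on, is the usual way to keep the blender from rewarding overfitted base models.</p>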
      <p>Our solution placed 3rd.</p>
    </sec>
    <sec id="sec-9">
      <title>6 CONCLUSION</title>
      <p>This paper presented our solution for the COVID-19 Retweet
Prediction Challenge. We proposed a DNN-based retweet prediction
method. To improve the performance, we introduced a feature
extraction method for the DNN inputs (mainly focusing on
numerical feature transformation and user modeling) and confirmed
its effectiveness with experiments. As a solution for the
competition, we introduced a stacking-based ensemble method for multiple
models, which placed 3rd.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Roi</given-names>
            <surname>Blanco</surname>
          </string-name>
          , Giuseppe Ottaviano, and
          <string-name>
            <given-names>Edgar</given-names>
            <surname>Meij</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Fast and Space-Efficient Entity Linking for Queries</article-title>
          .
          <source>In Proceedings of the Eighth ACM Int. Conf. on Web Search and Data Mining. ACM</source>
          ,
          <fpage>179</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Leo</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <year>1996</year>
          .
          <article-title>Stacked regressions</article-title>
          .
          <source>Machine learning 24, 1</source>
          (
          <year>1996</year>
          ),
          <fpage>49</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Dimitar</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          , Erdal Baran, Pavlos Fafalios, Ran Yu, Xiaofei Zhu, Matthäus Zloch, and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Dietze</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>TweetsCOV19 - A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic</article-title>
          .
          <source>In Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management. Association for Computing Machinery</source>
          , New York, NY, USA,
          <fpage>2991</fpage>
          -
          <lpage>2998</lpage>
          . https://doi.org/10.1145/3340531.3412765
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Nathan</given-names>
            <surname>Halko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Per-Gunnar</given-names>
            <surname>Martinsson</surname>
          </string-name>
          , and Joel A Tropp.
          <year>2011</year>
          .
          <article-title>Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions</article-title>
          .
          <source>SIAM review 53</source>
          ,
          <issue>2</issue>
          (
          <year>2011</year>
          ),
          <fpage>217</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Casper</given-names>
            <surname>Hansen</surname>
          </string-name>
          , Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, and
          <string-name>
            <given-names>Christina</given-names>
            <surname>Lioma</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Content-aware Neural Hashing for Cold-start Recommendation</article-title>
          .
          <source>In Proceedings of the 43rd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval</source>
          .
          <fpage>971</fpage>
          -
          <lpage>980</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Cindy</given-names>
            <surname>Hui</surname>
          </string-name>
          , Yulia Tyshchuk, William A Wallace,
          <string-name>
            <given-names>Malik</given-names>
            <surname>Magdon-Ismail</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Goldberg</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Information cascades in social media in response to a crisis: a preliminary model and a case study</article-title>
          .
          <source>In Proceedings of the 21st Int. Conf. on World Wide Web</source>
          .
          <fpage>653</fpage>
          -
          <lpage>656</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Ioffe</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Batch normalization: accelerating deep network training by reducing internal covariate shift</article-title>
          .
          <source>In Proceedings of the 32nd Int. Conf. on Machine Learning</source>
          .
          <fpage>448</fpage>
          -
          <lpage>456</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Diederik P</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Yuanfei</given-names>
            <surname>Luo</surname>
          </string-name>
          , Mengshuo Wang,
          <string-name>
            <given-names>Hao</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Quanming Yao,
          <string-name>
            <given-names>Wei-Wei</given-names>
            <surname>Tu</surname>
          </string-name>
          , Yuqiang Chen, Wenyuan Dai, and
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Autocross: Automatic feature crossing for tabular data in real-world applications</article-title>
          .
          <source>In Proceedings of the 25th ACM SIGKDD Int. Conf. Knowledge Discovery &amp; Data Mining</source>
          .
          <fpage>1936</fpage>
          -
          <lpage>1945</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Renfeng</given-names>
            <surname>Ma</surname>
          </string-name>
          , Xiangkun Hu, Qi Zhang, Xuanjing Huang, and
          <string-name>
            <given-names>Yu-Gang</given-names>
            <surname>Jiang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Hot Topic-Aware Retweet Prediction with Masked Self-attentive Model</article-title>
          .
          <source>In Proceedings of the 42nd Int. ACM SIGIR Conf. on Research and Development in Information Retrieval</source>
          .
          <fpage>525</fpage>
          -
          <lpage>534</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>James</given-names>
            <surname>MacQueen</surname>
          </string-name>
          et al.
          <year>1967</year>
          .
          <article-title>Some methods for classification and analysis of multivariate observations</article-title>
          .
          <source>In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability</source>
          , Vol.
          <volume>1</volume>
          .
          <fpage>281</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Daniele</given-names>
            <surname>Micci-Barreca</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          <volume>3</volume>
          ,
          <issue>1</issue>
          (
          <year>2001</year>
          ),
          <fpage>27</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Vinod</given-names>
            <surname>Nair</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoffrey E</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Rectified linear units improve restricted boltzmann machines</article-title>
          .
          <source>In Proceedings of the 27th Int. Conf. on Machine Learning</source>
          .
          <fpage>807</fpage>
          -
          <lpage>814</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Jean-François</given-names>
            <surname>Puget</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Feature Engineering For Deep Learning</article-title>
          . https://medium.com/inside-machine-learning/feature-engineering-for-deep-learning-2b1fc7605ace. Accessed: 2020-09-28.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Jiezhong</given-names>
            <surname>Qiu</surname>
          </string-name>
          , Jian Tang, Hao Ma, Yuxiao Dong,
          <string-name>
            <given-names>Kuansan</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jie</given-names>
            <surname>Tang</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deepinf: Social influence prediction with deep learning</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD Int. Conf. on Knowledge Discovery &amp; Data Mining</source>
          .
          <fpage>2110</fpage>
          -
          <lpage>2119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Shubham</given-names>
            <surname>Singh</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Categorical Variable Encoding Techniques</article-title>
          . https://medium.com/analytics-vidhya/categorical-variable-encoding-techniques-17e607fe42f9. Accessed: 2020-09-28.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Nitish</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>15</volume>
          ,
          <issue>1</issue>
          (
          <year>2014</year>
          ),
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Mike</given-names>
            <surname>Thelwall</surname>
          </string-name>
          , Kevan Buckley, Georgios Paltoglou, Di Cai, and
          <string-name>
            <given-names>Arvid</given-names>
            <surname>Kappas</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Sentiment strength detection in short informal text</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>61</volume>
          ,
          <issue>12</issue>
          (
          <year>2010</year>
          ),
          <fpage>2544</fpage>
          -
          <lpage>2558</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Qi</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Yeyun Gong, Jindou Wu, Haoran Huang, and
          <string-name>
            <given-names>Xuanjing</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Retweet prediction with attention-based deep neural network</article-title>
          .
          <source>In Proceedings of the 25th ACM Int. on Conf. on Information and Knowledge Management</source>
          .
          <fpage>75</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Honglei</given-names>
            <surname>Zhuang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xuanhui</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Bendersky</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Marc</given-names>
            <surname>Najork</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Feature transformation for neural ranking models</article-title>
          .
          <source>In Proceedings of the 43rd Int. ACM SIGIR Conf. Research and Development in Information Retrieval</source>
          .
          <fpage>1649</fpage>
          -
          <lpage>1652</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>