<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Regression-enhanced Random Forests with Personalized Patching for COVID-19 Retweet Prediction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Guangyuan Piao</string-name>
          <email>guangyuan.piao@mu.ie</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Weipeng Huang</string-name>
          <email>weipeng.huang@insight-centre.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Insight Centre for Data Analytics, School of Computer Science, University College Dublin</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Maynooth International Engineering College, Department of Computer Science, Maynooth University</institution>
          ,
          <addr-line>Maynooth, Co Kildare</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <fpage>13</fpage>
      <lpage>16</lpage>
      <abstract>
        <p>In this report, we describe an ensemble approach with a set of enhanced random forest models for the COVID-19 retweet prediction challenge at CIKM AnalytiCup 2020, held with the 29th ACM International Conference on Information and Knowledge Management. The proposed approach is based on a global model and a set of personalized models. The global model consists of a set of random forests enhanced by three different types of models: linear regression, feed-forward neural networks, and factorization machines. In addition to this global model, we trained a number of personalized models for users that exist in both training and test sets and have a sufficient number of tweets for training. Our approach obtained an MSLE (Mean Squared Log Error) value of 0.149997 on the test set of the challenge and ranked 4th on the final leaderboard.</p>
      </abstract>
      <kwd-group>
        <kwd>COVID-19</kwd>
        <kwd>Random Forests</kwd>
        <kwd>Neural Networks</kwd>
        <kwd>Factorization Machines</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Retweet Prediction</kwd>
        <kwd>Twitter</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Retweeting or reposting, the function of resharing a post such as a tweet
with one's followers, is one of the most crucial functionalities of many
popular social media platforms such as Twitter (https://twitter.com) or
Weibo (https://weibo.com), as it
enables information spreading on those platforms. Understanding
retweet behavior is useful for many applications such as political
audience design [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or fake news spreading and tracking [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Therefore, understanding and modeling retweet behavior has been an
active research area and might be particularly helpful during times
of crisis, such as the current COVID-19 pandemic.
      </p>
      <p>In this regard, the COVID-19 retweet prediction challenge was launched
in conjunction with the 29th ACM International Conference On
Information and Knowledge Management to better
understand retweet behavior in the context of COVID-19. The
challenge has two phases, validation and testing, where 51
teams participated in the validation phase and 20 teams participated
in the testing phase. In this report, we present our proposed approach
for the retweet prediction task in the challenge, which ranked 4th
on the final leaderboard after the testing phase.</p>
    </sec>
    <sec id="sec-2">
      <title>COVID-19 Retweet Prediction Challenge</title>
      <p>
        The retweet prediction challenge is based on the TweetsCOV19
dataset [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] — a publicly available dataset containing more than 8
million tweets related to COVID-19, spanning the period October
2019 to April 2020. On top of the TweetsCOV19 dataset, we describe the
dataset provided by the challenge, the prediction problem, and the
evaluation metric as follows.
      </p>
      <p>
        Dataset. The dataset of the challenge consists of 8,151,524
COVID-19-related tweets for training, and 961,182 and 961,183 tweets for
validation and testing, respectively. In addition, the challenge also
provides a set of features for each tweet:
• Tweet ID for each tweet from Twitter
• Username, i.e., the author of a tweet
• Timestamp of a tweet in the UTC time zone
• #Followers (No. of followers), which indicates the number of
followers of the author of a tweet
• #Friends (No. of friends), which indicates the number of friends
of the author of a tweet
• #Favorites (No. of favorites), which indicates the number of
favorites of a tweet
• Entities and their scores extracted from each tweet using the FEL
library [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
• Sentiment scores of each tweet extracted from SentiStrength
(http://sentistrength.wlv.ac.uk/)
• Mentions of other user accounts in each tweet
• Hashtags in each tweet
• URLs in each tweet
• #Retweets (No. of retweets), which indicates the number of
retweets of a tweet. This is the target variable for prediction
on the validation and test datasets.
      </p>
      <p>Problem. Given the set of features for a tweet from TweetsCOV19,
the task is to predict the number of times it has been retweeted.</p>
      <p>Evaluation metric. Consider the predicted results ŷ and the actual
retweet counts y on the test set, which are both of length n. The
performance is evaluated by MSLE (Mean Squared Log Error):
MSLE(y, ŷ) = (1/n) Σᵢ₌₁ⁿ (ln(1 + yᵢ) − ln(1 + ŷᵢ))²   (1)</p>
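      <p>For reference, Eq. (1) can be computed as follows (a minimal sketch with NumPy; the function name is ours):</p>
      <preformat>
```python
import numpy as np

def msle(y_true, y_pred):
    """Mean Squared Log Error as in Eq. (1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

print(msle([0, 1, 3], [0, 1, 3]))  # 0.0 for perfect predictions
```
      </preformat>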
    </sec>
    <sec id="sec-3">
      <title>PROPOSED APPROACH</title>
      <p>Our approach consists of two main components, obtained by splitting users
into two groups based on whether a user exists in the training,
validation, and test sets. Figure 1 shows an overview of the approach.</p>
      <sec id="sec-3-1">
        <p>The first group of users consists of those who exist in both the
training and the test (and validation) sets with a sufficient number
of tweets for training. The rest of the users fall into the second group.</p>
        <p>First, for the second group of users, we build a global model
which is an ensemble of random forest models enhanced by
linear regression, feed-forward neural networks, and factorization
machines. Second, for each user in the first group, we build a
personalized model using a random forest enhanced
by a linear regression model. Next, we discuss the global and
personalized models in detail.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Global model</title>
      <p>
        The global model is a collection of regression-enhanced random
forests (RERF), introduced recently in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to cope
with the extrapolation problem of random forests, where predictions
on the test set are required at points outside the domain of the
training dataset. In contrast to the definition of RERF with a specific
regression model (Lasso) in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], we use a general definition of RERF
in this work as follows:
      </p>
      <p>Given a training dataset D = {dᵢ = (xᵢ, yᵢ) : i = 1, . . . , n}, where
n is the size of the training set, Y = {yᵢ : i = 1, . . . , n} is
the set of target values, and X = {xᵢ : i = 1, . . . , n}
refers to the final set of features (e.g., after manual engineering,
transformation, scaling, or adding high-order and interaction terms):
Step 1: Train a regression model f(X) using the training set, and
let rᵢ = yᵢ − f(xᵢ) be the residual from f(X). Here, f(X)
can be any regression model such as linear, Lasso, Ridge,
neural networks, or factorization machines, except a tree-based
regressor. We then create a new training dataset
Dᵣ = {dᵢ = (xᵢ, rᵢ) : i = 1, . . . , n}.</p>
      <p>Step 2: Train a random forest model g(X) using the new
training set Dᵣ. The hyper-parameters can be predefined or
determined with grid search and cross-validation.</p>
      <p>Step 3: Given the trained models f(·) and g(·), the RERF
prediction ŷ for the response at a new point x̂ is given by ŷ = f(x̂) + g(x̂).</p>
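      <p>The three steps above can be sketched with scikit-learn as follows (a minimal sketch on synthetic data, using plain linear regression as f(·); the hyper-parameter values here are illustrative only, not the ones used in the challenge):</p>
      <preformat>
```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                       # final feature matrix X
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

# Step 1: train a regression model f(X) and compute residuals r = y - f(X)
f = LinearRegression(fit_intercept=False).fit(X, y)
r = y - f.predict(X)

# Step 2: train a random forest g(X) on the new residual dataset
g = RandomForestRegressor(n_estimators=50, max_depth=5, random_state=0).fit(X, r)

# Step 3: the RERF prediction is f(x) + g(x)
y_hat = f.predict(X) + g.predict(X)
```
      </preformat>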
      <p>
        We use ★★RF to refer to a RERF according to which regression
model is used for enhancing the random forest. The global model
consists of three types of RERFs with 16 models in total, where the
final prediction is the mean of the predicted values from those models.
• An LRRF (Linear Regression-enhanced Random Forest), which
denotes a simple linear regression-enhanced random
forest model. We used a simple linear regression without an
intercept and regularization given the large number of
examples in the training set. The corresponding random forest
model has a maximum depth of 20 and consists of 500 estimators/trees.
• Ten NNRFs (Neural Network-enhanced Random Forests),
where each NNRF uses feed-forward neural networks with
different hyper-parameters (e.g., the number of hidden layers
and neurons) for enhancing the corresponding random forest
model. The corresponding random forest model has a maximum
depth of 18 and consists of 500 estimators.
• Five FMRFs (Factorization Machine-enhanced Random
Forests), where four of them are DeepFM (Deep
Factorization Machine) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] models with different hyper-parameters
(e.g., the number of iterations or the seed) and one is an xDeepFM [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] model
for enhancing the corresponding random forest model. The
random forest model consists of 500 estimators, has a
maximum depth of 16, and uses at most 50% of the features per split.
      </p>
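      <p>For illustration, the final prediction of the global model (the mean over the 16 fitted RERFs) can be sketched as follows; <monospace>models</monospace> and <monospace>ensemble_predict</monospace> are our own illustrative names, assuming each fitted RERF exposes a scikit-learn-style <monospace>predict</monospace> method:</p>
      <preformat>
```python
import numpy as np

def ensemble_predict(models, X):
    # Average the predictions of all RERFs in the global model.
    return np.mean([m.predict(X) for m in models], axis=0)
```
      </preformat>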
      <p>
        For training, the input of each RERF is the set of feature values (we
will discuss the features in Section 2.3) for a tweet together with its
number of retweets. Given MSLE as the evaluation metric of
the challenge, we further log-transformed the set of feature values
and the number of retweets of each tweet for training a RERF.
Those RERFs are implemented using the scikit-learn [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and DeepCTR
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] Python packages. The implementation details can be found in
our GitHub repository (https://github.com/parklize/cikm2020-analyticup).
      </p>
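      <p>Because MSLE is the squared error on log-transformed counts, the log transformation mentioned above turns the objective into an ordinary squared error; a minimal sketch of the target transform and its inverse (variable names are ours):</p>
      <preformat>
```python
import numpy as np

y = np.array([0, 2, 15, 1200])   # raw retweet counts
y_log = np.log1p(y)              # log-transformed targets used for training a RERF

# ... train a RERF on the log-transformed targets ...

y_pred = np.expm1(y_log)         # invert the transform to get retweet counts back
```
      </preformat>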
    </sec>
    <sec id="sec-5">
      <title>Patching personalized models</title>
      <p>
        Although the global model captures the overall relationship
between the set of features and the retweet count of a tweet, the
relationship can vary depending on the author of a tweet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Figure 2 shows an example of this variance in the relationship
between the number of favorites and the number of retweets for two
different users on a log scale. Therefore, for the first group of users,
who are in both the training and test sets and have at least 10 tweets
for training, a personalized LRRF model is trained for each user,
and the prediction of the global model is patched/updated
with the prediction from the personalized model.
      </p>
      <p>One challenge of training a personalized model is that the number
of tweets available for a user can be limited, and using all the features
used for training the global model can result in overfitting. To cope with
this problem, we only used #Favorites as a single feature to learn
a personalized model for each user. Also, as tweets having zero
values in either #Favorites or #Retweets are not useful for learning a
personalized model, we further limit this group to users who have more
than six tweets with nonzero values in both #Favorites and #Retweets.
Overall, 236,240 tweets in the test set belong to this category.</p>
      <p>On the one hand, the above-mentioned personalized LRRFs using
a single feature might resolve the problem of overfitting for users
with a small number of tweets. On the other hand, we found that
those LRRFs can result in underfitting for users who have a large
number of tweets for training. Therefore, for the group of users who
have more than τ tweets with nonzero values in both #Favorites
and #Retweets, we use RidgeRFs (or LRRFs with L2 regularization)
with all the features that have been used for the global model, where the
penalty term is set to 5. We empirically found that τ = 160 achieves
the best results. Overall, 70,821 tweets in the test set belong to this
category.</p>
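      <p>The patching rules above can be summarized as a small decision function (a sketch; the function and its arguments are our own names, with the thresholds of 10 tweets, six nonzero tweets, and τ = 160 taken from the text):</p>
      <preformat>
```python
def choose_model(n_tweets, n_nonzero, tau=160):
    """Pick which model produces the prediction for a user's tweets.

    n_tweets:  tweets of the user available for training
    n_nonzero: tweets with nonzero values in both #Favorites and #Retweets
    """
    # A personalized model requires at least 10 tweets for training.
    if n_tweets >= 10 and n_nonzero > tau:
        return "ridge_rf"        # RidgeRF with all global-model features
    if n_tweets >= 10 and n_nonzero > 6:
        return "personal_lrrf"   # LRRF on the single #Favorites feature
    return "global"              # fall back to the global ensemble

print(choose_model(5, 3))      # global
print(choose_model(50, 20))    # personal_lrrf
print(choose_model(500, 300))  # ridge_rf
```
      </preformat>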
    </sec>
    <sec id="sec-6">
      <title>Features</title>
      <p>On top of the features provided by the challenge for each tweet,
which have been introduced in Section 1, we extracted 30 features,
which are described in detail in Table 2. The features we used for
training the models in Sections 2.1 and 2.2 can be classified into four
categories: (1) user features, (2) content features, (3) time features,
and (4) sentiment features.</p>
      <p>User features denote a set of features related to the user/author
of a tweet. In addition to the number of followers and friends of a
user, we also included the ratio of those two numbers and the total
number of tweets posted by the user in the training, validation, and
test datasets. The total number of tweets shows the activity level
of a user and we found that it helped to improve the prediction
performance.</p>
      <p>Content features include a set of features related to tweet content
to capture different characteristics of the content: for example, the
number of favorites that a tweet has, and the popularity of entities,
hashtags, mentions, and URL domains in a tweet. The popularity
of an entity can be estimated by how many times an entity in a
tweet appeared in all tweets in the training, validation, and testing
datasets. We also noticed that a tweet could be retweeted more
when a popular account (e.g., @WHO) is mentioned in the tweet.
To incorporate the popularity of mentioned users in a tweet, we used
the maximum number of followers and friends of the mentioned users,
where the number of followers and friends for each mentioned user
has been obtained via the Twitter API.</p>
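      <p>As an illustration, the entity popularity described above can be computed by counting occurrences across all tweets; a minimal sketch (the data layout and names are ours):</p>
      <preformat>
```python
from collections import Counter

tweets = [
    {"entities": ["covid-19", "who"]},
    {"entities": ["covid-19"]},
    {"entities": ["lockdown", "covid-19"]},
]

# How many times each entity appears across all tweets
entity_counts = Counter(e for t in tweets for e in t["entities"])

def entity_popularity(tweet):
    # Take the maximum count over all entities in the tweet
    return max((entity_counts[e] for e in tweet["entities"]), default=0)

print(entity_popularity(tweets[0]))  # 3 (covid-19 appears in all three tweets)
```
      </preformat>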
      <p>Time features consist of features that capture relevant
information related to the time when a tweet is posted such as whether the
tweet is posted on a weekend, or on which day of the week.</p>
      <p>Sentiment features refer to the positive and negative sentiment
scores of a tweet provided by SentiStrength, and their
interaction (e.g., the sum of the positive and negative scores).</p>
    </sec>
    <sec id="sec-7">
      <title>RESULTS</title>
      <p>Table 2 describes the 30 extracted features, with the number of
features in each category given in the table. Patching the predictions
of the global model with personalized models for users who have
a sufficient number of tweets for training a personalized model
further improved the performance.</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>The 30 features used for training, with the number of features in each category.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Category</th><th>Feature</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>User (4)</td><td>No. of followers</td><td>Number of followers that a user has</td></tr>
            <tr><td/><td>No. of friends</td><td>Number of friends that a user has</td></tr>
            <tr><td/><td>No. of friends / No. of followers</td><td>The ratio of those two numbers</td></tr>
            <tr><td/><td>No. of tweets posted by a user</td><td>The total number of tweets posted by the user in the training, validation, and test datasets</td></tr>
            <tr><td>Content (20)</td><td>No. of favorites</td><td>Number of favorites that a tweet has</td></tr>
            <tr><td/><td>No. of favorites / No. of followers</td><td>The ratio of those two numbers</td></tr>
            <tr><td/><td>Has entity</td><td>1 or 0 to denote whether a tweet contains any entity</td></tr>
            <tr><td/><td>Has hashtag</td><td>1 or 0 to denote whether a tweet contains any hashtag</td></tr>
            <tr><td/><td>Has mention</td><td>1 or 0 to denote whether a tweet mentions other users</td></tr>
            <tr><td/><td>Has URL</td><td>1 or 0 to denote whether a tweet contains any URL</td></tr>
            <tr><td/><td>No. of entities</td><td>The total number of entities extracted from a tweet</td></tr>
            <tr><td/><td>No. of hashtags</td><td>The total number of hashtags in a tweet</td></tr>
            <tr><td/><td>No. of mentions</td><td>The total number of mentions in a tweet</td></tr>
            <tr><td/><td>No. of URLs</td><td>The total number of URLs in a tweet</td></tr>
            <tr><td/><td>Entity popularity</td><td>How many times an entity in a tweet appeared in all tweets (take the maximum value of all entities in a tweet)</td></tr>
            <tr><td/><td>Hashtag popularity</td><td>How many times a hashtag in a tweet appeared in all tweets</td></tr>
            <tr><td/><td>Mention popularity</td><td>How many times a mentioned user in a tweet appeared in all tweets</td></tr>
            <tr><td/><td>URL domain popularity</td><td>How many times the domain of a URL in a tweet appeared in all tweets</td></tr>
            <tr><td/><td>Tweet length</td><td>The total number of entities, hashtags, mentions, as well as URLs</td></tr>
            <tr><td/><td>No. of top 20 entities</td><td>Number of top 20 entities from all tweets of a day</td></tr>
            <tr><td/><td>No. of top 20 hashtags</td><td>Number of top 20 hashtags from all tweets of a day</td></tr>
            <tr><td/><td>No. of top 20 mentions</td><td>Number of top 20 mentioned users from all tweets of a day</td></tr>
            <tr><td/><td>Maximum No. of followers of mentioned users</td><td>The maximum number of followers of mentioned users in a tweet</td></tr>
            <tr><td/><td>Maximum No. of friends of mentioned users</td><td>The maximum number of friends of mentioned users in a tweet</td></tr>
            <tr><td>Time (3)</td><td>Time segment</td><td>The time segment of a tweet {1 · · · 24} indicating when it is posted</td></tr>
            <tr><td/><td>Weekend</td><td>1 or 0 to indicate whether a tweet is posted on a weekend or not</td></tr>
            <tr><td/><td>Day of week</td><td>A value from {1 · · · 7} to indicate the i-th day of a week</td></tr>
            <tr><td>Sentiment (3)</td><td>Positive sentiment</td><td>A score for positive (1 to 5) sentiment for a tweet</td></tr>
            <tr><td/><td>Negative sentiment</td><td>A score for negative (-1 to -5) sentiment for a tweet</td></tr>
            <tr><td/><td>Overall sentiment</td><td>The sum of the positive and negative sentiment of a tweet</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGEMENTS</title>
      <p>We pay our highest respect to numerous healthcare professionals
and volunteers battling the COVID-19 pandemic on the front lines.
W. Huang is supported by Science Foundation Ireland under grant
number SFI/12/RC/2289_P2.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Roi</given-names>
            <surname>Blanco</surname>
          </string-name>
          , Giuseppe Ottaviano, and
          <string-name>
            <given-names>Edgar</given-names>
            <surname>Meij</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Fast and Space-Efficient Entity Linking in Queries</article-title>
          .
          <source>In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining</source>
          (Shanghai, China) (
          <article-title>WSDM '15)</article-title>
          . ACM, New York, NY, USA,
          <volume>10</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Dimitar</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          , Erdal Baran, Pavlos Fafalios, Ran Yu, Xiaofei Zhu, Matthäus Zloch, and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Dietze</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>TweetsCOV19 - A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic</article-title>
          .
          <source>In Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management. Association for Computing Machinery</source>
          , New York, NY, USA,
          <fpage>2991</fpage>
          -
          <lpage>2998</lpage>
          . https://doi.org/10.1145/3340531.3412765
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Huifeng</given-names>
            <surname>Guo</surname>
          </string-name>
          , Ruiming Tang, Yunming Ye,
          <string-name>
            <given-names>Zhenguo</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiuqiang</given-names>
            <surname>He</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>DeepFM: a factorization-machine based neural network for CTR prediction</article-title>
          .
          <source>arXiv preprint arXiv:1703.04247</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Jianxun</given-names>
            <surname>Lian</surname>
          </string-name>
          , Xiaohuan Zhou, Fuzheng Zhang, Zhongxia Chen, Xing Xie, and
          <string-name>
            <given-names>Guangzhong</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>xDeepFM: Combining explicit and implicit feature interactions for recommender systems</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          .
          <fpage>1754</fpage>
          -
          <lpage>1763</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Guangyuan</given-names>
            <surname>Piao</surname>
          </string-name>
          and John G Breslin.
          <year>2018</year>
          .
          <article-title>Learning to Rank Tweets with AuthorBased Long Short-Term Memory Networks</article-title>
          .
          <source>In International Conference on Web Engineering</source>
          . Springer,
          <fpage>288</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Weichen</given-names>
            <surname>Shen</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>DeepCTR: Easy-to-use,Modular and Extendible package of deep-learning based CTR models</article-title>
          . https://github.com/shenweichen/deepctr.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Stieglitz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Linh</given-names>
            <surname>Dang-Xuan</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Political communication and influence through microblogging-An empirical analysis of sentiment in Twitter messages and retweet behavior</article-title>
          .
          <source>In 2012 45th Hawaii International Conference on System Sciences. IEEE</source>
          ,
          <fpage>3500</fpage>
          -
          <lpage>3509</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Soroush</given-names>
            <surname>Vosoughi</surname>
          </string-name>
          , Deb Roy, and
          <string-name>
            <given-names>Sinan</given-names>
            <surname>Aral</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>The spread of true and false news online</article-title>
          .
          <source>Science</source>
          <volume>359</volume>
          ,
          <issue>6380</issue>
          (
          <year>2018</year>
          ),
          <fpage>1146</fpage>
          -
          <lpage>1151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Haozhe</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Dan Nettleton, and
          <string-name>
            <given-names>Zhengyuan</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Regression-enhanced random forests</article-title>
          .
          <source>arXiv preprint arXiv:1904.10416</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>