<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>CIKM AnalytiCup 2020: COVID-19 Retweet Prediction with Personalized Attention</article-title>
      </title-group>
      <kwd-group>
        <kwd>Deep Learning</kwd>
        <kwd>Personalized attention</kwd>
        <kwd>COVID-19</kwd>
        <kwd>Pseudo-labelling</kwd>
      </kwd-group>
      <abstract>
        <p>This paper describes the first-place winning solution for the CIKM AnalytiCup 2020 COVID-19 retweet prediction challenge. The objective of the challenge is to predict the popularity of COVID-19 related tweets in terms of the number of retweets, and the submitted solutions are ranked by Mean Squared Logarithmic Error (MSLE) on the leaderboard. The proposed deep learning model uses minimal hand-engineered features and learns to predict the retweet count based on a personalized attention mechanism. As a tweet keyword may have different informativeness for different users, the personalized attention mechanism helps the deep learning model weigh the importance of tweet keywords based on a user's interest to retweet. Additional techniques such as adding external datasets to training and pseudo-labelling are also experimented with to further improve the MSLE score. The final solution comprises an ensemble of different personalized attention-based deep learning models, and the source code for the solution can be found at https://github.com/vinayakarajt/CIKM2020.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Understanding information diffusion in social networks is
imperative, as it helps to better comprehend social interactions among users.
Large-scale information spread in social
networks enables marketers and advertisers to design their campaigns
more effectively to target potential customers. In addition,
identifying influential users [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] in social media is also significant, as
these users contribute immensely to information diffusion during
viral marketing campaigns. Relationships between users on
social networks heavily affect the amount of information exchanged
among them. Furthermore, understanding how fake news
spreads in social networks is crucial to prevent the propagation
of misinformation during global pandemics such as COVID-19.
      </p>
      <p>
        Modeling information diffusion in social networks is a hot
research topic that has garnered increasing interest in the research
community of late. In CIKM AnalytiCup 2020, the competition objective is
to model the information spreading mechanism during COVID-19
by predicting the retweet count of tweets on Twitter. Retweeting is
a Twitter function that lets users quickly share their own tweets,
or the tweets of other users, with all of their followers.
Retweets can be seen as one of the ways information spreads on
Twitter and are crucial to understanding the information diffusion
mechanism on Twitter. Some practical applications of
information diffusion using tweets are political audience design [
        <xref ref-type="bibr" rid="ref15 ref9">9, 15</xref>
        ],
fake news spreading and tracking [
        <xref ref-type="bibr" rid="ref10 ref17">10, 17</xref>
        ] and health promotion
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>In this paper, all the techniques used to predict the retweet
count of tweets are discussed in detail. The first section provides
a summary of the dataset presented to solve the problem;
hand-engineered features and their preprocessing techniques are
also explained in that section. The model architecture and the
personalized attention mechanism are described in the next section, and
finally, in the last section, all the experiments carried out to improve
the model score on the test leaderboard are explained in detail.</p>
    </sec>
    <sec id="sec-2">
      <title>DATASET</title>
      <p>
        The TweetsCOV19 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] dataset provided in the competition is a large
collection of COVID-19 related tweets that are extracted, using a
seed list of 268 COVID-19 related keywords [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], from the large
anonymized and annotated TweetsKB [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] corpus. TweetsCOV19
contains all the COVID-19 related tweets from October 2019 to
April 2020; the dataset comprises around
8 million tweets posted by 3.7 million users. For each tweet, the dataset provides the
user who posted it, the time of the tweet, metadata
such as #Followers, #Favorites, and #Friends, and the text information
of the tweet, which is
split into entities, hashtags, mentions, and URLs. The entities of each
tweet are created using the Fast Entity Linker [
        <xref ref-type="bibr" rid="ref1 ref11">1, 11</xref>
        ]. The sentiment
of each tweet is also provided; it is extracted using SentiStrength
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which scores each tweet between -4 (very negative) and 4 (very
positive).
      </p>
      <p>In addition to the given metadata features, a few more features are
derived from the tweet metadata and used to
predict the retweet count. A full list of features and their preprocessing
techniques is provided in Table 1.</p>
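      <p>Most count-valued features in Table 1 are log transformed and then standardized. A minimal sketch of that transform (assuming log(x + 1) to handle zero counts; the actual competition code may differ):</p>
      <preformat>
```python
import numpy as np

def log_standardize(x):
    # "Log transformed and then standardized" as in Table 1:
    # log1p(x) = log(x + 1) handles zero counts (e.g. 0 favorites),
    # then the values are scaled to zero mean and unit variance.
    z = np.log1p(np.asarray(x, dtype=float))
    return (z - z.mean()) / z.std()
```
      </preformat>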
      <p>Both the original tweet keywords and their respective annotated
entities are extracted and considered for analysis. Numbers and
special characters are removed from hashtags and mentions, and
duplicate keywords are removed from entities, hashtags, and
mentions. URLs are split into two parts: the hostname of the tweet
URL is extracted as URL-1, and the path of the URL is taken as
URL-2. Numbers and special characters are also removed
from URL-2.</p>
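      <p>The URL splitting and keyword cleaning described above can be sketched as follows (a minimal illustration with hypothetical helper names; the actual competition code may differ):</p>
      <preformat>
```python
import re
from urllib.parse import urlparse

def split_url(url):
    # Hostname becomes URL-1; path becomes URL-2 with numbers and
    # special characters removed, as described above.
    parsed = urlparse(url)
    url_1 = parsed.netloc
    url_2 = re.sub(r"[^a-zA-Z ]", " ", parsed.path)
    return url_1, " ".join(url_2.split())

def clean_keywords(keywords):
    # Remove numbers/special characters and drop duplicate keywords,
    # preserving the original order.
    cleaned, seen = [], set()
    for kw in keywords:
        kw = re.sub(r"[^a-zA-Z]", "", kw)
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            cleaned.append(kw)
    return cleaned
```
      </preformat>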
    </sec>
    <sec id="sec-3">
      <title>MODEL ARCHITECTURE</title>
      <p>Figure 1 shows the network architecture of the retweet prediction
model and the attention mechanism. A full list of features, their
descriptions, and their preprocessing techniques is given in Table 1.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Features used to predict the retweet count, with their descriptions and preprocessing techniques.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Feature</th><th>Description</th><th>Preprocessing Technique</th></tr>
          </thead>
          <tbody>
            <tr><td>week</td><td>Week info extracted from timestamp</td><td>One-hot encoded</td></tr>
            <tr><td>time</td><td>Time info extracted from timestamp</td><td>Log transformed and then standardized</td></tr>
            <tr><td>year</td><td>Year info extracted from timestamp</td><td>Log transformed and then standardized</td></tr>
            <tr><td>no_entities</td><td>No. of entities in a tweet</td><td>Log transformed and then standardized</td></tr>
            <tr><td>keyword_entities</td><td>No. of COVID-related entities in a tweet</td><td>Log transformed and then standardized</td></tr>
            <tr><td>no_hashtags</td><td>No. of hashtags in a tweet</td><td>Log transformed and then standardized</td></tr>
            <tr><td>keyword_hashtags</td><td>No. of COVID-related hashtags in a tweet</td><td>Log transformed and then standardized</td></tr>
            <tr><td>no_mentions</td><td>No. of mentions in a tweet</td><td>Log transformed and then standardized</td></tr>
            <tr><td>no_urls</td><td>No. of URLs in a tweet</td><td>Log transformed and then standardized</td></tr>
            <tr><td>Sentiment</td><td>Sentiment score from SentiStrength (-4 to 4)</td><td>One-hot encoded</td></tr>
            <tr><td>#Favorites</td><td>Tweet favorites</td><td>Log transformed and then standardized</td></tr>
            <tr><td>#Followers</td><td>No. of followers of a user</td><td>Log transformed and then standardized</td></tr>
            <tr><td>#Friends</td><td>No. of friends of a user</td><td>Log transformed and then standardized</td></tr>
            <tr><td>#Followers/#Friends</td><td>Ratio of followers to friends</td><td>Log transformed and then standardized</td></tr>
            <tr><td>#Friends/#Favorites</td><td>Ratio of friends to favorites</td><td>Log transformed and then standardized</td></tr>
            <tr><td>#Favorites/#Followers</td><td>Ratio of favorites to followers</td><td>Log transformed and then standardized</td></tr>
            <tr><td>username</td><td>Encrypted username</td><td>Label encoded</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>[Figure 1: Network architecture with personalized attention. Entities, hashtags, mentions, URL host, and URL path pass through a shared word embedding layer and LSTM/CNN heads; their attention outputs are concatenated with the user dense vector and the numeric and one-hot encoded features, then fed through Dense Layer 1 (500) and Dense Layer 2 (150) to predict the retweet count.]</p>
      <p>A high-cardinality feature such as username is embedded as a
fixed-length vector using a user embedding layer, which is then
passed through a series of user dense layers to get the final
representation of users.</p>
      <p>The word embedding layer is shared by the preprocessed
keywords of tweet entities, mentions, hashtags, and URLs and can be
initialized by any pre-trained word vectors. For a tweet, the entity word
vectors are a sequence of word vectors queried from the embedding
layer, and the length of the sequence is equal to the number
of entity keywords extracted from that tweet. The sequence length
is fixed for the entire dataset and is a hyper-parameter of
the model. Entity keyword sequences shorter than this length are
padded with zeros on the left, and longer ones are trimmed on
the right. Word vectors of hashtags, mentions, and URLs are created
in the same way as the entity word vectors. These word
vectors are then fed to an LSTM/CNN layer, which
learns the representation of entities, mentions, hashtags, and
URLs from their respective word vector sequences.</p>
      <p>
        The user vector representation u and the LSTM/CNN output vectors v_i
are used to create the personalized attention mechanism [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The attention
weight a_i for position i is formulated as:
b_i = u^T tanh(W v_i + b),
a_i = exp(b_i) / Σ_{j=1}^{m} exp(b_j),
where W and b are the user projection parameters and m is the
sequence length. The final representation r of entities/hashtags/mentions/URLs
is given by:
r = Σ_{i=1}^{m} a_i v_i.
      </p>
      <p>The final representation vectors of entities, hashtags, mentions,
and URLs are then concatenated with the user vector and
other features such as tweet metadata and time-based features. The
final concatenated feature vector is then passed through a series of
dense layers to estimate the retweet count.</p>
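      <p>The personalized attention step described above can be sketched in numpy as follows (function and variable names are illustrative, not from the competition code):</p>
      <preformat>
```python
import numpy as np

def personalized_attention(v, u, W, b):
    # v: (m, d) LSTM/CNN output sequence, u: (k,) user dense vector,
    # W: (k, d) and b: (k,) are the user projection parameters.
    scores = u @ np.tanh(W @ v.T + b[:, None])  # b_i = u^T tanh(W v_i + b)
    a = np.exp(scores - scores.max())
    a = a / a.sum()                             # softmax over the m positions
    return a @ v                                # r = sum_i a_i v_i
```
      </preformat>
      <p>The user vector u drives the attention query, so two users retweeting tweets with the same keywords can still produce different representations r.</p>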
    </sec>
    <sec id="sec-4">
      <title>EXPERIMENTS</title>
    </sec>
    <sec id="sec-5">
      <title>Experiment Setting</title>
      <p>The dataset provided for the competition comprises all COVID-19
related tweets from 2019-09-30 to 2020-05-31. The entire
month of May 2020 is reserved for testing and is split into two
testing sets: testing set 1 and testing set 2. Testing set 1 is used for validating the
model on the leaderboard, and testing set 2 is used to rank the final
winners of the competition. The rest of the dataset, from 2019-09-30
to 2020-04-30, is used for training the model. The training dataset
is sorted in chronological order, and the most recent 5% of tweets in
the training dataset are filtered out and used as the validation
set. Information about the training, validation, and testing splits is
provided in Table 2.</p>
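      <p>The chronological 95/5 train/validation split above can be sketched as follows (the "timestamp" field name is hypothetical; the actual competition code may differ):</p>
      <preformat>
```python
def chronological_split(rows, frac_valid=0.05):
    # Sort tweets by time and hold out the most recent fraction
    # as the validation set, as described above.
    rows = sorted(rows, key=lambda r: r["timestamp"])
    cut = int(round(len(rows) * (1.0 - frac_valid)))
    return rows[:cut], rows[cut:]
```
      </preformat>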
      <p>Mean Squared Logarithmic Error (MSLE) is the evaluation metric
used in this competition. MSLE is given by:
MSLE = (1/N) Σ_{i=1}^{N} (log(y_i + 1) - log(ŷ_i + 1))^2,
where y_i and ŷ_i are the actual and predicted retweet counts,
respectively, and N is the number of tweets. MSLE penalizes
underestimation more than overestimation.</p>
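      <p>The metric can be computed directly from its definition; note the asymmetry between under- and over-prediction:</p>
      <preformat>
```python
import numpy as np

def msle(y_true, y_pred):
    # MSLE as defined above; np.log1p(x) computes log(x + 1).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))
```
      </preformat>
      <p>For example, predicting 5 retweets when the true count is 10 incurs a larger MSLE than predicting 15, which is why the metric penalizes underestimation more.</p>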
      <p>The model described in Figure 1 is trained on a Tesla V100 GPU
machine. The optimal hyper-parameter settings are selected based
on the model with the best MSLE score on the validation set, and the
tuned hyper-parameters of the model setup are provided in Table 3.</p>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>
        The performance of the models on the final testing dataset is shown
in Table 4. A single personalized attention model with fastText
embeddings and an LSTM head for learning tweet representations achieves
an MSLE score of 0.12860 on the test dataset. A large collection
of annotated tweets from March and April 2020
in the parent corpus TweetsKB is added to the training dataset,
and a deep learning model is trained on the whole dataset. This
addition of an external dataset decreases the MSLE score by 3.76%.
TweetsCOV19 is a subset of TweetsKB and hence includes only
users' COVID-related tweets rather than all of their tweets. Including all
tweets of a user not only helps the personalized attention mechanism
understand the relation between users and their tweets but also helps the
model learn a rich representation of users and tweet keywords.
To further improve the score, techniques such as ensembling and
pseudo-labelling are also tried.
      </p>
      <sec id="sec-6-1">
        <title>Ensembling</title>
        <p>
          In addition to initializing the Twitter keywords
with fastText embeddings [
          <xref ref-type="bibr" rid="ref2 ref7">2, 7</xref>
          ], pre-trained word vectors such as
glove840 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], glovetwitter [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], fasttext wiki [
          <xref ref-type="bibr" rid="ref2 ref7">2, 7</xref>
          ] and LexVec
[
          <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
          ] are also used to train the deep learning model. Among the
five models trained with different pre-trained vectors, fastText
embedding initialization provides the best score on the testing
leaderboard. Furthermore, another set of models is trained by replacing
the LSTM head with a CNN head, again with all five sets of pre-trained
word vectors. The individual MSLE scores of the models with a CNN head
are much worse than those of the models with LSTM heads, but ensembling
all the models together provides a significant improvement on the
testing leaderboard. In total, there are ten personalized attention
models, and the final solution is created by ensembling all ten
output predictions with simple averaging. Ensembling decreases
the MSLE score by 2.464%.
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>Pseudo-Labelling</title>
        <p>Pseudo-labelling is another trick tried to
decrease the MSLE score. The best output solutions on the leaderboard
for test set 1 and test set 2 are used as labels for the respective
data sets, which are then added to the training set for building the deep
learning models. As in the ensembling technique described
above, ten different models with different pre-trained word vectors
and LSTM/CNN heads are built on the new dataset and are then
averaged. Pseudo-labelling decreases the MSLE score by a very
small margin of 0.132%.</p>
      </sec>
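      <p>The ensembling and pseudo-labelling steps above can be sketched as follows (field names such as "retweet_count" are hypothetical; the actual competition code may differ):</p>
      <preformat>
```python
def ensemble_average(predictions):
    # Simple averaging of the output predictions of the models,
    # as used for the final solution.
    n = len(predictions)
    return [sum(vals) / n for vals in zip(*predictions)]

def add_pseudo_labels(train_rows, test_rows, pseudo_labels):
    # Pseudo-labelling: attach the best leaderboard predictions to the
    # unlabeled test rows and append them to the training set.
    pseudo = [dict(row, retweet_count=y)
              for row, y in zip(test_rows, pseudo_labels)]
    return train_rows + pseudo
```
      </preformat>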
    </sec>
    <sec id="sec-7">
      <title>CONCLUSION</title>
      <p>A methodology to estimate the retweet count of COVID-related tweets
is proposed in this paper. The personalized attention-based deep
learning model described here uses minimal hand-engineered
features and learns a rich representation of users and tweet
keywords to predict the retweet count. To further improve the performance
of the model, techniques such as adding external datasets,
ensembling, and pseudo-labelling are also tried. The final solution
is an ensemble of deep learning
models, which placed the team first on the testing leaderboard.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Roi</given-names>
            <surname>Blanco</surname>
          </string-name>
          , Giuseppe Ottaviano, and
          <string-name>
            <given-names>Edgar</given-names>
            <surname>Meij</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Fast and Space-Efficient Entity Linking in Queries</article-title>
          .
          <source>In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining</source>
          (Shanghai, China) (
          <article-title>WSDM '15)</article-title>
          . ACM, New York, NY, USA,
          <volume>10</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , Edouard Grave, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          .
          <source>CoRR abs/1607.04606</source>
          (
          <year>2016</year>
          ). arXiv:1607.04606 http://arxiv.org/abs/1607.04606
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Jae Eun</given-names>
            <surname>Chung</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Retweeting in health promotion: Analysis of tweets about Breast Cancer Awareness Month</article-title>
          .
          <source>Computers in Human Behavior</source>
          <volume>74</volume>
          (04
          <year>2017</year>
          ). https://doi.org/10.1016/j.chb.2017.04.025
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Dimitar</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          .
          <source>2020 (accessed August</source>
          <year>2020</year>
          ).
          <article-title>COVID-19 related keywords</article-title>
          . https://data.gesis.org/tweetscov19/keywords.txt
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Dimitar</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          , Erdal Baran, Pavlos Fafalios, Ran Yu, Xiaofei Zhu, Matthäus Zloch, and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Dietze</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>TweetsCOV19 - A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic</article-title>
          .
          <source>In Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management. Association for Computing Machinery</source>
          , New York, NY, USA,
          <fpage>2991</fpage>
          -
          <lpage>2998</lpage>
          . https://doi.org/10.1145/3340531.3412765
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Pavlos</given-names>
            <surname>Fafalios</surname>
          </string-name>
          , Vasileios Iosifidis, Eirini Ntoutsi, and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Dietze</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets</article-title>
          .
          <source>In European Semantic Web Conference</source>
          . Springer,
          <fpage>177</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Armand</given-names>
            <surname>Joulin</surname>
          </string-name>
          , Edouard Grave, Piotr Bojanowski, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Bag of Tricks for Efficient Text Classification</article-title>
          .
          <source>CoRR abs/1607.01759</source>
          (
          <year>2016</year>
          ). arXiv:1607.01759 http://arxiv.org/abs/1607.01759
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kafeza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kanavos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Makris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Vikatos</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>T-PICE: Twitter Personality Based Influential Communities Extraction System</article-title>
          .
          <source>In 2014 IEEE International Congress on Big Data</source>
          .
          <fpage>212</fpage>
          -
          <lpage>219</lpage>
          . https://doi.org/10.1109/BigData.Congress.2014.38
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Eunice</given-names>
            <surname>Kim</surname>
          </string-name>
          , Yongjun Sung, and
          <string-name>
            <given-names>Hamsu</given-names>
            <surname>Kang</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Brand followers' retweeting behavior on Twitter: How brand relationships influence brand electronic wordof-mouth</article-title>
          .
          <source>Computers in Human Behavior</source>
          <volume>37</volume>
          (
          <year>2014</year>
          ),
          <fpage>18</fpage>
          -
          <lpage>25</lpage>
          . https://doi.org/10.1016/j.chb.2014.04.020
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Cristian</given-names>
            <surname>Lumezanu</surname>
          </string-name>
          , Nick Feamster, and
          <string-name>
            <given-names>Hans</given-names>
            <surname>Klein</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title># bias: Measuring the Tweeting Behavior of Propagandists.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Aasish</given-names>
            <surname>Pappu</surname>
          </string-name>
          , Roi Blanco, Yashar Mehdad, Amanda Stent, and
          <string-name>
            <given-names>Kapil</given-names>
            <surname>Thadani</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Lightweight Multilingual Entity Extraction and Linking</article-title>
          .
          <source>In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining</source>
          (Cambridge, United Kingdom)
          <article-title>(WSDM '17). Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <fpage>365</fpage>
          -
          <lpage>374</lpage>
          . https://doi.org/10.1145/3018661.3018724
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>In Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          . http://www.aclweb.org/anthology/D14-1162
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Alexandre</given-names>
            <surname>Salle</surname>
          </string-name>
          , Marco Idiart, and
          <string-name>
            <given-names>Aline</given-names>
            <surname>Villavicencio</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory</article-title>
          .
          <source>CoRR abs/1606.01283</source>
          (
          <year>2016</year>
          ). arXiv:1606.01283 http://arxiv.org/abs/1606.01283
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Alexandre</given-names>
            <surname>Salle</surname>
          </string-name>
          , Aline Villavicencio, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Idiart</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume</source>
          <volume>2</volume>
          : Short Papers).
          <source>Association for Computational Linguistics</source>
          , Berlin, Germany,
          <fpage>419</fpage>
          -
          <lpage>424</lpage>
          . https://doi.org/10.18653/v1/P16-2068
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Stieglitz</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Dang-Xuan</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Political Communication and Influence through Microblogging-An Empirical Analysis of Sentiment in Twitter Messages and Retweet Behavior</article-title>
          .
          <source>In 2012 45th Hawaii International Conference on System Sciences</source>
          .
          <fpage>3500</fpage>
          -
          <lpage>3509</lpage>
          . https://doi.org/10.1109/HICSS.2012.476
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Mike</given-names>
            <surname>Thelwall</surname>
          </string-name>
          , Kevan Buckley, Georgios Paltoglou, Di Cai, and
          <string-name>
            <given-names>Arvid</given-names>
            <surname>Kappas</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Sentiment strength detection in short informal text</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>61</volume>
          ,
          <issue>12</issue>
          (
          <year>2010</year>
          ),
          <fpage>2544</fpage>
          -
          <lpage>2558</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Soroush</given-names>
            <surname>Vosoughi</surname>
          </string-name>
          , Deb Roy, and
          <string-name>
            <given-names>Sinan</given-names>
            <surname>Aral</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>The spread of true and false news online</article-title>
          .
          <source>Science</source>
          <volume>359</volume>
          ,
          <issue>6380</issue>
          (
          <year>2018</year>
          ),
          <fpage>1146</fpage>
          -
          <lpage>1151</lpage>
          . https://doi.org/10.1126/science.aap9559 arXiv:https://science.sciencemag.org/content/359/6380/1146.full.pdf
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Chuhan</given-names>
            <surname>Wu</surname>
          </string-name>
          , Fangzhao Wu,
          <string-name>
            <given-names>Mingxiao</given-names>
            <surname>An</surname>
          </string-name>
          , Jianqiang Huang,
          <string-name>
            <given-names>Yongfeng</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xing</given-names>
            <surname>Xie</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>NPA: Neural News Recommendation with Personalized Attention</article-title>
          . CoRR abs/1907.05559 (
          <year>2019</year>
          ). arXiv:1907.05559 http://arxiv.org/abs/1907.05559
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>