<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Corpus of news articles annotated with article-level sentiment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ahmet Aker, Hauke Gravenkamp, Sabrina J. Mayer</string-name>
<email>firstName.lastName@uni-due.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marius Hamacher, Anne Smets, Alicia Nti</string-name>
<email>firstName.lastName@stud.uni-due.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Erdmann, Julia Serong, Anna Welpinghus</string-name>
<email>firstName.lastName@tu.dortmund.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Marchi</string-name>
<email>firstName.lastName@rub.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ruhr University Bochum</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Technical University of Dortmund</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Duisburg-Essen</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Abstract</title>
<p>Research on sentiment analysis has reached
a mature status. Studies on this topic have
proposed various solutions and datasets to guide
machine-learning approaches. However, so far
sentiment scoring has been restricted to the level
of short textual units such as sentences. Our
comparison shows that there is a huge gap
between machines and human judges when
the task is to determine sentiment scores of
a longer text such as a news article. To close
this gap, we propose a new human-annotated
dataset containing 250 news articles with
sentiment labels at article level. Each article is
annotated by at least 10 people. The articles
are evenly divided into fake and non-fake
categories. Our investigation of this corpus shows
that fake articles are significantly more
sentimental than non-fake ones. The dataset will
be made publicly available.</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        Nowadays, the amount of online news content is
immense and its sources are very diverse. For the
readers and other consumers of online news who value
balanced, diverse, and reliable information, it is necessary
to have access to additional information to evaluate the
available news articles. For this purpose, Fuhr et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
propose to label every online news article with
information nutrition labels to describe the ingredients of
the article and thus give the reader a chance to
evaluate what she is reading. This concept is analogous
to food packages where nutrition labels help buyers in
their decision-making. The authors discuss 9 different
information nutrition labels, including sentiment. The
sentiment of a news article is subtly reflected by the
tone and effective content of a writer's words [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Fuhr
et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] conclude that knowing about an article's level
of sentiment could help the reader judge its
credibility and whether it tries to deceive the reader by
relying on emotional communication.
      </p>
      <p>
        Sentiment analysis is a mature research direction
and has been summarized by several overview papers
and books [
        <xref ref-type="bibr" rid="ref13 ref3 ref4">13, 3, 4</xref>
        ]. Commonly, sentiment is
computed on a small fraction of text such as a phrase or
sentence. Using this strategy, the authors of [
        <xref ref-type="bibr" rid="ref1 ref11 ref14">11, 14, 1</xref>
        ]
analyze, for instance, Twitter posts. To compute
sentiment over a text that spans several sentences, such
as a news article, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] use the aggregated
average sentiment score of the text's sentences. However,
our current study shows that this does not align with
the human perception of sentiment.
      </p>
      <p>[Table 1: Textual statistics about articles in the fake and non-fake categories: minimum, maximum, median, and mean of text length, number of sentences, and average words per sentence.]</p>
      <p>If, for example, only two sentences in an article are
sentimentally loaded and the remaining sentences are
neutral, a sentence-based sentiment scorer will label the
article as not sentimental or will assign it a low sentiment
score. On the contrary, our study shows that humans
may consider the entire article highly sentimental
even if it contains only one or two highly sentimental
sentences.</p>
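      <p>For illustration, the sentence-averaging strategy at issue can be written down as follows (a minimal sketch; sentence_sentiment is a hypothetical stand-in for any off-the-shelf sentence-level scorer returning an intensity between 0 and 100, not a function from the paper):</p>
      <preformat>
# Minimal sketch of the sentence-averaging strategy discussed above.
# `sentence_sentiment` is a hypothetical stand-in for any sentence-level
# scorer that returns an intensity in [0, 100].
def article_sentiment_by_averaging(sentences, sentence_sentiment):
    scores = [sentence_sentiment(s) for s in sentences]
    return sum(scores) / len(scores)

# Two highly sentimental sentences among 18 neutral ones are diluted:
# [90, 85] + 18 * [5] averages to 13.25, although human annotators may
# still rate the whole article as highly sentimental.
      </preformat>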
      <p>In this work, we propose to release a dataset
containing 250 news articles with article-level sentiment
labels (https://github.com/ahmetaker/newsArticlesWithSentimentScore).
These labels were assigned to each article by
at least 10 paid annotators. To our knowledge, this is
the first article-level sentiment-labeled corpus. We
believe this corpus will open new ways of addressing the
sentiment perception gap between humans and
machines. Over this corpus, we also run two automatic
sentiment assessors and show that their scores do not
correlate with human-assigned scores.</p>
      <p>In addition, our articles are split into fake (125)
and non-fake (125) articles. We show that at the
article level, fake articles are significantly more
sentimental than the non-fake ones. This finding supports the
assumption that sentiment will help readers to
distinguish between credible and non-credible articles.</p>
      <p>In the following, we will first describe the dataset
annotated with sentiment at article level (Section 2).</p>
      <p>In Section 3, we present inter-rater agreement among
the annotators, the analysis of sentiment provided for
fake and non-fake articles, as well as a qualitative
analysis of articles with low and high sentiment scores.</p>
      <p>
        In Section 4, we provide results about our correlation
analysis between human sentiment scores and those
obtained automatically. Finally, we discuss our
findings and conclude the paper in Section 5.
      </p>
    </sec>
    <sec id="sec-2a">
      <title>Dataset</title>
      <p>
We retrieved the news articles annotated in this work
from FakeNewsNet [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], a corpus of news stories
divided into fake and non-fake articles. To determine
whether a story is fake or not, the FakeNewsNet
authors extracted articles and veracity scores from two
prevalent fact-checking sites PolitiFact 2 and
GossipCop3. We sampled 125 fake and 125 non-fake articles
from this corpus. All articles are dealing with
political news, mostly the 2016 US presidential election.
      </p>
      <p>Table 1 lists textual statistics about the articles.</p>
      <p>Each news article was rated between 10 and
22 times (mean = 15.524; median = 15) and
each annotator rated 1 to 250 articles (mean =
42.185; median = 17).</p>
      <p>Annotators were recruited from colleagues and
friends and were encouraged to refer the annotation
project to their acquaintances. They were free to rate
as many articles as they liked and were compensated
with 3.5€ (or 3£ if they were residents of the UK) per
article. The recruitment method and relatively high
monetary compensation were chosen to ensure high
data quality.</p>
      <p>Sentiment was rated in two different ways. First,
annotators were asked to rate textual qualities of the
given article that indicate sentiment, for instance, "The
article contains many words that transport
particularly strong emotions." These qualities were measured
by five properties on a 5-point rating scale, labeled
"Strongly Disagree" to "Strongly Agree". Afterwards,
annotators were asked to rate sentiment directly on a
percentage scale ("Overall, how emotionally charged is
the article? Judge on a scale from 0-100"), 100
indicating high sentiment intensity and 0 indicating low
sentiment intensity.</p>
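      <p>For concreteness, one annotation could be stored as a record like the following (a hypothetical layout of our own choosing; the paper fixes the instrument, five 5-point items plus one 0-100 score, but not a storage format):</p>
      <preformat>
from dataclasses import dataclass
from typing import Tuple

@dataclass
class SentimentAnnotation:
    """One rating of one article; field names are illustrative only."""
    article_id: str
    annotator_id: str
    likert_items: Tuple[int, int, int, int, int]  # five properties, 1-5 each
    intensity: int  # "Overall, how emotionally charged ...", 0-100
      </preformat>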
      <p>We opted for the two-fold annotation approach to
generate sentiment scores that could be used to train
machine-learning models as well as sentiment
indicators that could provide insights as to why and how
people rate the level of sentiment of an article. In the
present work, however, we only analyze the percentage
scores for sentiment. When referring to annotations,
we refer to these sentiment scores. The other
sentiment variables are not discussed here
due to space constraints.</p>
      <p>Note that annotators did not annotate the
sentiment polarity, e.g. "highly positive" or "slightly
negative", but only the sentiment intensity, e.g. "high" or
"low". In this scheme, highly positive and highly
negative articles receive the same score. We chose this
annotation scheme since article-level polarity seems
less informative for an entire article: in cases where
a single article praises one position and condemns
another, an overall polarity score is ambiguous,
and sentence-level polarity scores may be more
informative.</p>
      <p>
        The notion of sentiment intensity is still different
from subjectivity. A subjective statement contains
personal views of an author, whereas an objective article
contains facts about a certain topic. Both subjective
and objective statements may or may not contain
sentiment [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. For example, "the man was killed"
expresses a negative sentiment in an objective fashion,
while "I believe the earth is flat" is a subjective
statement expressing no sentiment. For an investigation of
article-level subjectivity, see [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Analysis of Sentiment Scores</title>
      <p>First, we measure differences in inter-rater agreement
for fake and non-fake articles in order to see whether
the annotators agree on the judgments. We
also analyze the distribution of sentiment ratings to
see whether there are differences in sentiment scores
for fake and non-fake articles. Afterwards, we look at
articles with particularly high or low sentiment scores
to find differences in the writing of the articles that
could influence annotators in their ratings and
determine whether an article is perceived as sentimental.</p>
      <sec id="sec-3-1">
        <title>Inter-rater Reliability Analysis</title>
        <p>
          Inter-rater reliability is measured using the Intra Class
Correlation (ICC) Index. A one-way random effects
model for absolute agreement with average measures
as observation units is assumed (ICC(1,k)). (We
followed the guidelines of [
          <xref ref-type="bibr" rid="ref10 ref8">8, 10</xref>
          ] to select the ICC model
parameters.)
        </p>
        <p>Since not every annotator annotated every article,
annotators are assumed to be a random effect in the
model. We chose the minimum number of available
annotations per article (k = 10) as the basis for the
reliability analysis. In cases where more than 10
annotations were available for an article, we randomly
chose 10 annotations. Observational units are average
measures since the sentiment for each article is going
to be the average of all human annotations for the
given article.</p>
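        <p>A minimal NumPy sketch of ICC(1,k) computed from the one-way ANOVA mean squares (our own illustration, not the paper's analysis script; any statistics package that reports ICC estimates would serve equally well):</p>
        <preformat>
import numpy as np

def icc_1k(ratings):
    """ICC(1,k): one-way random-effects model, average measures.

    ratings: array of shape (n_articles, k); each row holds the k = 10
    annotations sampled for one article.
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    row_means = ratings.mean(axis=1)
    # One-way ANOVA mean squares: between articles and within articles.
    msb = k * ((row_means - grand_mean) ** 2).sum() / (n - 1)
    msw = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    # The single-measure ICC(1,1) would instead divide by
    # msb + (k - 1) * msw.
    return (msb - msw) / msb
        </preformat>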
        <p>
          The total Intra Class Correlation is 0.88, which
indicates good to excellent reliability [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Reliability
is slightly higher for real (ICC(1,10) = .90) than for
fake articles (ICC(1,10) = .76) (see Table 2).
        </p>
        <p>
          Note that there is a large discrepancy between
the average point estimates and the single-point
estimates for the same data (ICC(1,1) = .42, 95% CI =
[.37, .48]). While this is generally expected [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], we
considered the difference to be large enough to report.
        </p>
        <p>[Table 2: ICC(1,10) point estimates and 95% confidence intervals for all, fake, and non-fake articles.]</p>
      </sec>
      <p>
        The dataset contains 3788 sentiment score
annotations, ranging between 0 and 100. The mean score
is 49.92 with a standard deviation of 32.54. When
looking at all articles, scores are mostly uniformly
distributed, with minor peaks at the maximum and
minimum values (see Figure 1). The distribution changes
when dividing the articles into fake and real ones. Fake
articles receive higher scores (mean = 61.50) than
non-fake ones (mean = 38.69). Using a t-test, we found a
significant difference (t(3786) = 22.99, p &lt; .001) of medium
magnitude (Cohen's d = .75). In addition,
the percentage of fake articles with a sentiment score
of 50 or higher stands at 70.4, compared to real articles,
of which only 40.6 percent were rated with a score of
50 or higher. This shows that fake articles are indeed rated
significantly more sentimental than the non-fake ones.
A first qualitative analysis of the articles rated with
the highest and lowest mean sentiment scores indicates
differences in language use and sentence structure.
      </p>
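      <p>The group comparison can be reproduced along these lines (a sketch assuming SciPy and the pooled-standard-deviation form of Cohen's d):</p>
      <preformat>
import numpy as np
from scipy import stats

def compare_scores(fake, real):
    """Two-sample t-test plus Cohen's d for two arrays of ratings."""
    fake, real = np.asarray(fake, float), np.asarray(real, float)
    t, p = stats.ttest_ind(fake, real)  # Student's t, pooled variance
    n1, n2 = len(fake), len(real)
    pooled_sd = np.sqrt(((n1 - 1) * fake.var(ddof=1) +
                         (n2 - 1) * real.var(ddof=1)) / (n1 + n2 - 2))
    d = (fake.mean() - real.mean()) / pooled_sd
    return t, p, d
      </preformat>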
      <p>Articles with a low sentiment score are mostly
election reports and contain listings of facts and figures.
To give examples: "Solid Republican: Alabama (9),
Alaska (3), Arkansas (6), Idaho (4), Indiana (11),
Kansas (6), Kentucky (8), Louisiana (8) [...]", or
"Clinton's strength comes from the Atlanta area, where
she leads Trump 55% to 35%. But Trump leads her
51% to 33% elsewhere in the Peach State. She leads
88% to 4% among (..)."</p>
      <p>The last example also demonstrates the use of a
repetitive and simple sentence structure, for instance
the repeated use of the word "leads". "In Iowa Sept.
29. In Kansas Oct. 19. [...]" is another example
of repetitive language use. On the whole, the
language used seems unemotional, rather neutral and
without bias.</p>
      <p>Articles with the highest mean score seem to contain
a larger number of negative words. "Kill",
"murder", "guns", "shooting", "racism" and "dead and
bloodied" are a few specific examples of negative words
we observed in the articles. To some extent, offensive
language is used, which indicates a subjective view and
bias. Statements such as "[...] sick human being
unfit for any political office [...]", or "[...] nothing but a
bunch of idiot lowlifes" can be quoted as examples
of offensive language use.</p>
      <p>In some high-sentiment articles, we also found
rhetorical devices such as analogies, comparisons, and
rhetorical questions, which do not occur in the same
manner in the low-sentiment articles. Analogies and
comparisons are introduced by the word "like", as in
the following sentence: "Clinton speculated about this,
and like a predictable rube under the hot lights Trump
cracked under the pressure." The following sentence
gives an example of a rhetorical question found in
one of the articles: "Did Trump say he was interested
in paying higher taxes? No. Did Trump say he would
like to reform the tax code so that he would be forced
to pay higher taxes? No."</p>
    </sec>
    <sec id="sec-4">
      <title>Comparison between Model Predictions and Human Annotations</title>
      <p>
        To see how existing sentence-level sentiment analysis
models perform on the dataset, we used the Pattern3
Web Mining Package [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] (https://github.com/pattern3) and the Stanford
Core NLP Package (https://stanfordnlp.github.io/CoreNLP/).
      </p>
      <p>The Pattern3 package provides a dictionary-based
sentiment analyzer with a dictionary of adjectives and
their corresponding sentiment polarity and intensity.
The model determines the sentiment score of a
sentence by averaging the sentiment scores of all
adjectives in a sentence. Scores range between -1.0
(negative) and 1.0 (positive).</p>
      <p>The Stanford Core NLP package provides a
recursive neural network model for sentiment analysis.
It assigns sentiment labels based on the contents and
syntactic structure of a sentence. The output is one of
five labels (very negative, negative, neutral, positive,
very positive).</p>
      <p>Model predictions were obtained by processing the
articles in the dataset sentence by sentence and
averaging over the sentence scores. Since the models assign
sentiment values on different scales than the one used
by our annotators, we mapped the values to match our
scale. For the Pattern3 scores, we took the absolute
value and multiplied it by 100; for the Stanford scores,
we mapped the labels to intensity scores (very
negative = 100, negative = 50, neutral = 0, positive = 50,
very positive = 100).</p>
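      <p>The two mappings can be sketched as follows (a minimal sketch; the Stanford label spellings used as dictionary keys are assumptions and may differ between CoreNLP versions):</p>
      <preformat>
# Mapping model outputs onto the annotators' 0-100 intensity scale.
STANFORD_INTENSITY = {          # label spellings are assumed here
    "Very negative": 100,
    "Negative": 50,
    "Neutral": 0,
    "Positive": 50,
    "Very positive": 100,
}

def pattern3_to_intensity(polarity):
    # Pattern3 returns a signed polarity in [-1.0, 1.0]; drop the sign
    # and rescale, since the annotators rated intensity only.
    return abs(polarity) * 100

def article_prediction(sentence_intensities):
    # The article-level prediction is the average over sentence scores.
    return sum(sentence_intensities) / len(sentence_intensities)
      </preformat>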
      <p>Human ratings represent the average sentiment
score per article.</p>
      <p>In general, model predictions are lower than the
human ratings and span a narrower range of values.
Model predictions of the Pattern3 Sentiment Analyzer
range from 2.04 to 32.99 with a mean of 14.81 and a
standard deviation of 5.47. Predictions of the Stanford
Core NLP Analyzer range from 24.07 to 62.5 with a
mean of 43.18 and a standard deviation of 5.69. On
the other hand, human annotations span a wide range
of values, from 4.55 to 95.25, with a mean of 49.39 and
a standard deviation of 21.77.</p>
      <p>Figure 2 shows a scatter plot of the human
ratings and the model predictions. The correlations are
significant yet very small (r = .171, R² = .029, p &lt;
.001 for Pattern3; r = .139, R² = .019, p = .028
for Stanford Core NLP), and prediction errors are
high, with those of Pattern3 being larger (MSE =
1657.87, MAE = 35.05) than those of Stanford Core
NLP (MSE = 576, MAE = 20.42). We also looked at
the distribution of sentiment scores for the model
predictions. When comparing scores assigned to fake
articles (mean = 15.19) and scores assigned to real articles
(mean = 14.43), the predictions do not differ
significantly (t(248) = 1.09, p = .28, Cohen's d = 0.14). On
the other hand, analyzing the scores assigned by
human annotators on the article level, we found a
significant difference between fake articles (mean = 60.36)
and real articles (mean = 38.42) with a large
magnitude (t(248) = 9.21, p &lt; .001, Cohen's d = 1.17).
The results indicate that the computation of an
overall sentiment score based on sentence-level sentiment
scores is not useful for fake news detection. However,
human ratings at article level can indeed be used to
distinguish between fake and non-fake articles.</p>
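      <p>The agreement statistics reported above can be computed along these lines (a sketch; human and model stand for per-article score arrays on the common 0-100 scale):</p>
      <preformat>
import numpy as np
from scipy import stats

def agreement(human, model):
    """Pearson r, R^2, MSE and MAE between two per-article score arrays."""
    human, model = np.asarray(human, float), np.asarray(model, float)
    r, p = stats.pearsonr(human, model)
    err = model - human
    return {"r": r, "R2": r ** 2, "p": p,
            "MSE": float(np.mean(err ** 2)),
            "MAE": float(np.mean(np.abs(err)))}
      </preformat>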
    </sec>
    <sec id="sec-5">
      <title>Discussion and Conclusion</title>
      <p>A new human-annotated sentiment dataset is
presented in this paper. To the best of our knowledge,
it is the first dataset providing high-quality,
article-level sentiment ratings.</p>
      <p>Our analysis of model predictions shows that
sentence-level sentiment estimates are unable to match
human estimates for entire articles. Sentence-level
models underestimate the true sentiment scores, probably
because the results are averaged over the
sentiments of all sentences. The fact that the Pattern3
predictions are generally lower than those of
Stanford Core NLP supports this hypothesis, as Pattern3
averages over all adjectives in a sentence and all
sentences, whereas the Stanford model is only averaged
over all sentences in the article. If an article contains
mostly neutral sentences and only a few sentences with
strong emotional statements, these models will assign
the article a relatively low score. By contrast, for human
readers, even a few such emotionally charged
sentences can shape the perception of the entire article.
Sentiment analysis models should, therefore, operate
at the article level rather than at the sentence level.
Our dataset can be used to train such models and is
thus a valuable addition to the collection of available
sentiment datasets.</p>
      <p>Furthermore, fake and real articles differ in the
distribution of sentiment annotations. Real articles in
our dataset receive significantly lower sentiment scores
than fake ones. This qualifies sentiment as a potential
feature for fake news classification of political news
articles. Sentence-level models failed to generate scores
that reflect this relation. Models could be improved
by making predictions at the article level and by
using our dataset for training.</p>
      <p>Future research could be aimed at examining this
finding further by incorporating more articles,
potentially also from different topic domains, as our dataset
includes only political news articles.</p>
      <p>We started investigating where differences in
sentiment may come from and (unsurprisingly) find
that more extreme and emotionally charged
statements were used in high-sentiment articles. As
mentioned earlier, the interesting finding here is that even
a few such statements seem to affect the overall
impression of an article's sentiment.</p>
      <p>In future studies, this investigation could be
expanded, either by detecting which sentences have the
largest impact on the overall sentiment score of an
article or by identifying individual-level determinants that
affect people's perception of sentiment in an article.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This work was funded by the Global Young Faculty
(https://www.global-young-faculty.de/)
and the Deutsche Forschungsgemeinschaft (DFG,
German Research Foundation) - GRK 2167, Research
Training Group "User-Centred Social Media".</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vovsha</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rambow</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Passonneau</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Sentiment analysis of twitter data</article-title>
          .
          <source>In Proceedings of the Workshop on Language in Social Media (LSM</source>
          <year>2011</year>
          )
          <article-title>(</article-title>
          <year>2011</year>
          ), pp.
          <volume>30</volume>
          {
          <fpage>38</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Aker</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gravenkamp</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mayer</surname>
            , Sabrina,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamacher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smets</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erdmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Serong</surname>
            ,
            <given-names>Julia</given-names>
          </string-name>
          <string-name>
            <surname>Welpinghus</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Marchi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>Corpus of news articles annotated with article level subjectivity</article-title>
          .
          <source>In ROME 2019: Workshop on Reducing Online Misinformation Exposure</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bandyopadhyay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Feraco</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>A practical guide to sentiment analysis</article-title>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuller</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Havasi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>New avenues in opinion mining and sentiment analysis</article-title>
          .
          <source>IEEE Intelligent systems 28, 2</source>
          (
          <year>2013</year>
          ),
          <volume>15</volume>
          {
          <fpage>21</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Conroy</surname>
            ,
            <given-names>N. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>V. L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Automatic deception detection: Methods for nding fake news</article-title>
          .
          <source>Proceedings of the Association for Information Science and Technology 52</source>
          ,
          <issue>1</issue>
          (
          <year>2015</year>
          ),
          <volume>1</volume>
          {
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>De</given-names>
            <surname>Smedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            , and
            <surname>Daelemans</surname>
          </string-name>
          ,
          <string-name>
            <surname>W.</surname>
          </string-name>
          <article-title>Pattern for python</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          .
          <volume>13</volume>
          ,
          <issue>1</issue>
          (
          <year>June 2012</year>
          ),
          <year>2063</year>
          {
          <year>2067</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Fuhr</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nejdl</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giachanou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grefenstette</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanselowski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jarvelin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Liu,
          <string-name>
            <given-names>Y.</given-names>
            , and
            <surname>Mothe</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>An information nutritional label for online documents</article-title>
          .
          <source>ACM SIGIR Forum 51</source>
          ,
          <issue>3</issue>
          (feb
          <year>2018</year>
          ),
          <volume>46</volume>
          {
          <fpage>66</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Hallgren</surname>
            ,
            <given-names>K. A.</given-names>
          </string-name>
          <article-title>Computing inter-rater reliability for observational data: An overview and tutorial</article-title>
          .
          <source>Tutorials in quantitative methods for psychology 8</source>
          ,
          <issue>22833776</issue>
          (
          <year>2012</year>
          ),
          <volume>23</volume>
          {
          <fpage>34</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Kevin</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , Hogden,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Schwenger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Sahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Madan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Bangaru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Muradov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            , and
            <surname>Aker</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <article-title>Information nutrition labels: A plugin for online news evaluation</article-title>
          .
          <source>In Proceedings of the First Workshop on Fact Extraction and VERi cation (FEVER)</source>
          (Brussels, Belgium, Nov.
          <year>2018</year>
          ),
          <article-title>Association for Computational Linguistics</article-title>
          , pp.
          <volume>28</volume>
          {
          <fpage>33</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Koo</surname>
            ,
            <given-names>T. K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M. Y.</given-names>
          </string-name>
          <article-title>A guideline of selecting and reporting intraclass correlation coefcients for reliability research</article-title>
          .
          <source>Journal of chiropractic medicine 15</source>
          ,
          <issue>27330520</issue>
          (
          <year>June 2016</year>
          ),
          <volume>155</volume>
          {
          <fpage>163</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Kouloumpis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , Wilson,
          <string-name>
            <given-names>T.</given-names>
            , and
            <surname>Moore</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Twitter sentiment analysis: The good the bad and the omg! In Fifth International AAAI conference on weblogs and social media (</article-title>
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <article-title>Sentiment analysis and opinion mining</article-title>
          .
          <source>Synthesis lectures on human language technologies 5</source>
          ,
          <issue>1</issue>
          (
          <year>2012</year>
          ),
          <volume>1</volume>
          {
          <fpage>167</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Mejova</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <article-title>Sentiment analysis: An overview</article-title>
          . University of Iowa, Computer Science Department (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Pak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Paroubek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <article-title>Twitter as a corpus for sentiment analysis and opinion mining</article-title>
          . In LREc (
          <year>2010</year>
          ), vol.
          <volume>10</volume>
          , pp.
          <volume>1320</volume>
          {
          <fpage>1326</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Shu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mahudeswaran</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , and Liu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fakenewsnet</surname>
          </string-name>
          :
          <article-title>A data repository with news content, social context and dynamic information for studying fake news on social media</article-title>
          . CoRR abs/
          <year>1809</year>
          .01286 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>