<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Bias Bubbles: Using Semi-Supervised Learning to Measure How Many Biased News Articles Are Around Us</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bias Bubbles</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Insight Centre for Data Analytics</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science, University College Dublin</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Science Foundation Ireland Centre for Research Training in Machine Learning</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The proliferation of Web 2.0 technology allows us to easily create and share online content, but also leads to the rapid spread of misinformation and biased media, which has considerable negative effects on society. Deep learning-based classifiers are one common way of identifying media bias, but they suffer from a lack of large-scale labelled datasets. In this paper, we first explore the use of pseudo-labelling technology to mitigate this problem. Second, we exploit a masking method to identify biased sentences in news articles by iteratively masking each sentence from an article and observing the change in output of a bias detection model. These identified sentences not only contribute to evaluating the proposed model, but also enable end-users to understand where media bias arises in an article. Finally, we apply our well-trained bias detection model to a well-known news article dataset to show how widespread media bias is: the results show that it is rampant and has become a serious social problem that we cannot ignore.</p>
      </abstract>
      <kwd-group>
        <kwd>Media Bias</kwd>
        <kwd>Pseudo-labelling</kwd>
        <kwd>Semi-supervised Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Online news websites are effective news transmission platforms; however, studies
have shown that media bias is widespread in them, caused by inherent
flaws in the news production process [
        <xref ref-type="bibr" rid="ref14 ref2">2, 14</xref>
        ]. The side effects of media bias, such as distorting readers’ perception and negatively
influencing social decision-making, have been widely recognized by social scientists [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In computer science,
solutions have been explored to identify media bias automatically, from
traditional lexicon-based algorithms [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to more recent deep learning-based models
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, accurately detecting media bias in news articles and evaluating the
degree of media bias that exists in our society remain significant challenges.
      </p>
      <p>
        Some inherent characteristics of media bias are a major cause of these
challenges. First, the forms of media bias are varied, such as using a tendentious
or inflammatory vocabulary, adopting different writing styles, or reporting an
event only in favour of one side. (Copyright 2021 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).)
Second, bias is not a problem of honesty of
reporting but of journalists’ own preferred opinions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and usually the bias is subtle
rather than explicit because it is easier to affect unsuspecting readers that way
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These characteristics not only make media bias recognition more challenging
than other text classification tasks, but also increase the difficulty and cost of
manually labelling news articles for media bias. Therefore, the scale of datasets
released for media bias detection is usually quite small. For example, the
Annotated Data dataset [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] contains only 46 news articles and 1,235 sentences from
4 news events. The lack of large-scale labelled data prevents researchers from
adopting sophisticated models to improve classification accuracy.
      </p>
      <p>
        In this paper we address these challenges in three ways. First, we explore the
simultaneous use of unlabelled or machine labelled data (large-scale but with a
lot of noise) and human labelled data (small-scale with better quality) by
using two pseudo-labelling algorithms to augment training datasets containing the
latter with the former. Second, we verify the generalization ability of bias
detectors trained in the previous step by observing their performance on an unseen
dataset. We also exploit a masking approach [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] to identify biased sentences by
iteratively masking each sentence and making a comparison with human labels
to further evaluate the proposed models. Finally, we leverage two well-trained
bias detectors to analyze media bias in a large-scale news article dataset, MIND
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The results show that media bias is a widespread phenomenon and has
become a serious social problem that we cannot ignore. The workflow followed
in this paper is illustrated in Fig. 1.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Recently computational approaches based on natural language processing and
machine learning have been employed to detect biased news articles. A
common perspective is to regard biased news detection as a text classification task.
Therefore, feature mining and classifiers employed in text classification tasks can
also be applied to biased news recognition. Kiesel et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] noted that many
entries to the Semeval-2019 task on hyperpartisan news detection used standard
text mining methods, including word n-grams, word embeddings, stylometric
features, sentiment and emotion features, and recognition of named entities in
the news. However, in the Semeval-2019 task deep neural networks that adopt
the most current trends (largely based on transformer models [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]) were the
most common approaches used by competition teams [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Another type of approach deals with news bias at a finer granularity, from discovering
biased texts to locating biased information. Some approaches find potential opinions by
evaluating expressions of a “bias target”. For example, [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] designed a method called
“stakeholder mining” because they treat important entities as stakeholders in
the text. Similarly, [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] extract frame attributes and target words or phrases
related to frame topics from political news articles.
      </p>
      <p>
        Labelled training data is a prerequisite for applying machine learning to
media bias detection. In the past two years, a number of important manually
labelled biased news article datasets have been released, including the SemEval
2019 Task4 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Ukraine Crisis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], NewsWCL50 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Annotated Data [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], and
BASIL [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] datasets. These datasets cover news articles across different areas and
feature different kinds of bias; e.g., the SemEval 2019 Task 4 dataset includes
articles labelled from the perspective of political ideologies, and the Ukraine
Crisis dataset identifies bias derived from different countries.
      </p>
      <p>
        Based on the annotation granularity, these datasets can be mainly divided
into three groups: article level, sentence level, and word group level. For
example, the SemEval 2019 Task 4 dataset is the largest article-level news dataset,
and contains 1,273 manually labelled news articles, each categorised as biased
or unbiased. In the Ukraine Crisis dataset, Färber et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] extracted 90 news
articles with a total of 2,057 sentences and labelled the data at both article and
sentence level from multiple perspectives, such as subjectivity and the presence
of hidden assumptions. The Annotated Data dataset is another sentence-level
dataset including 46 news articles (made up of 1,235 sentences) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] covering 4
news events. However, the major limitation of these manually labelled datasets
is their small scale.
      </p>
      <p>
        Researchers have studied automated labelling technologies to address the
limited size of manually labelled datasets. Distant supervision is a popular
technique for annotating datasets in the context of media bias detection. In this
approach news articles are labelled not based on the detailed content of the
articles themselves but rather based on the characteristics (e.g., political leaning) of
their publisher. The SemEval 2019 Task4 dataset [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] also contains a large
corpus for identifying hyper-partisanship, which has 754,000 news articles labelled
via distant supervision. However, a recent study [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] shows that these types of
datasets are very noisy and it is not yet clear how they can best be utilized in
media bias detection tasks.
      </p>
      <p>
        Self-training methods form a branch of semi-supervised learning, and
leverage the probability output of a model to generate pseudo-labels for unlabelled
data [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. This approach can easily add more input data to help train a model.
Due to its simplicity and effectiveness, self-training has been successfully used
in various tasks. The entropy minimization (EntMin) method [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] encourages a
model to make low-entropy predictions on unlabelled data through entropy
regularization, and then employs qualified unlabelled data in standard supervised
learning settings. Another simple and effective way to train neural networks in
a semi-supervised way is pseudo-labelling [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. A neural network model trained
using labelled data through supervised learning directly predicts pseudo-labels
for instances in the unlabelled dataset and these are then used along with the
labelled data to retrain the model. Inspired by knowledge distillation, the noisy
student approach [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] is a semi-supervised method that transfers the knowledge of a
teacher model to an equivalent or larger student model. The teacher model is
first trained on labelled data to generate pseudo-labels for unlabelled examples.
Then the equivalent or larger student model uses the knowledge of the teacher
model to train on the labelled and pseudo-labelled data.
      </p>
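<p>The vanilla pseudo-labelling loop described above can be sketched as follows. This is an illustrative example, not the authors' code: the model is a stand-in scoring function, the threshold value is an assumption, and the "articles" are reduced to toy scalars.</p>

```python
# Hypothetical sketch of vanilla pseudo-labelling (self-training): a model
# trained on labelled data assigns pseudo-labels to the unlabelled samples
# it is confident about, and the union is used to retrain the model.

def pseudo_label_round(predict_proba, labelled, unlabelled, threshold=0.9):
    """predict_proba(x) returns P(biased) in [0, 1] for a sample x."""
    pseudo = []
    for x in unlabelled:
        p = predict_proba(x)
        if p >= threshold:            # confident "biased" prediction
            pseudo.append((x, 1))
        elif p <= 1 - threshold:      # confident "unbiased" prediction
            pseudo.append((x, 0))
    return labelled + pseudo          # augmented training set

# Toy usage with a stand-in scorer: pretend each "article" is its bias score.
scorer = lambda x: x
train = pseudo_label_round(scorer, [(0.0, 0), (1.0, 1)], [0.95, 0.02, 0.5])
```

<p>In a real training run, the augmented set would be fed back into supervised training and the procedure repeated until performance on a held-out set stops improving.</p>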
    </sec>
    <sec id="sec-3">
      <title>Pseudo-labelling Enhanced Bias Detectors</title>
      <p>
        In this section, we present and evaluate our two pseudo-labelling frameworks:
Overlap-checking and Meta-learning. We use the network from Jiang et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] as
our backbone model, as it is one of the best models submitted to the leaderboard
of SemEval 2019 Task 4. Baly et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] showed that the top models on this
leaderboard are trained purely on manual labelled articles. In this experiment,
we demonstrate how to utilize by-publisher data (through distant supervision)
in the training process via our pseudo-labelling frameworks and evaluate their
performance on the SemEval 2019 Task4 hyperpartisan news detection dataset.
3.1
      </p>
      <sec id="sec-3-1">
        <title>Pseudo-labelling Frameworks</title>
        <p>This section describes our two pseudo-labelling frameworks: Overlap-checking
and Meta-learning.</p>
        <p>Fig. 2: (a) Overlap Selection; (b) Meta-Learning. (Task 4 leaderboard: https://pan.webis.de/semeval19/semeval19-web/#results)</p>
        <p>
          Overlap-checking. An overview of the proposed overlap-checking
framework is presented in Fig. 2(a). The framework contains four steps: (1) the
network is first trained on manually labelled data until it converges; (2) the training
leverages the overlap-checking mechanism to select a batch of pseudo-labelled
data; (3) new data is generated using labelled data and pseudo-labelled data;
and (4) the model is re-trained on the new data. For unlabelled samples, the
pseudo-labelling method is used to label the data based on the probability distribution
of the model prediction [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          The overlap-checking method belongs to the branch of semi-supervised
learning. Using this simple and efficient method, the system can easily add more data
to help re-train the model. He &amp; Sun [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] showed that using a batch of
samples with the highest model prediction probability can help enhance the
performance of the model. In the overlap-checking framework, the vanilla
pseudo-labelling method selects the class with the highest predicted probability for each sample in the
completely unlabelled dataset as its pseudo-label. Assuming that
there are L classes, let y′_l be an indicator for class l, where a value of 1 indicates
that class l is selected and 0 that it is not:

y′_l = 1 if l = argmax_{l∈L} f(x), and y′_l = 0 otherwise.   (1)
        </p>
        <p>The system then combines both the pseudo-labelled annotation and the
distant supervision annotation on the by-publisher dataset by considering their
consistency. We denote the distant supervision dataset as A, the pseudo-labelled
dataset as P , and the intersection set of A and P as candidate set C = A ∩ P .
Eventually, the top N pseudo-labelled samples are returned on the basis of
descending order of the predicted probability value, where N represents the
expected number of pseudo-samples.</p>
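<p>The selection step above, combining Eq. (1) with the overlap check C = A ∩ P and the top-N cut, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: sample identifiers, the two-class setting, and the confidence measure are all hypothetical.</p>

```python
# Illustrative sketch of the overlap-checking selection step: keep only
# unlabelled samples whose pseudo-label (argmax of the model output)
# agrees with the distant-supervision (by-publisher) label, then return
# the top N candidates ranked by predicted confidence.

def select_overlap(samples, top_n):
    """samples: list of (sample_id, model_prob, distant_label) triples,
    where model_prob is the predicted probability of the 'biased' class."""
    candidates = []
    for sid, prob, distant in samples:
        pseudo = 1 if prob >= 0.5 else 0       # argmax for two classes (Eq. 1)
        if pseudo == distant:                   # C = A ∩ P: annotations agree
            confidence = max(prob, 1 - prob)    # confidence in the chosen class
            candidates.append((confidence, sid, pseudo))
    candidates.sort(reverse=True)               # most confident first
    return [(sid, pseudo) for _, sid, pseudo in candidates[:top_n]]

# e.g. select_overlap([("a", 0.97, 1), ("b", 0.60, 0), ("c", 0.05, 0)], 2)
# keeps "a" and "c" (labels agree, highest confidence) and drops "b".
```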
        <p>
          Meta-learning. The meta-learning framework takes inspiration from the
meta pseudo-labels approach [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. The workflow of the meta-learning framework
is shown in Fig. 2(b). The pseudo-labelling method maintains a network to be
trained sequentially on a clean dataset and a pseudo dataset. Unlike the vanilla
pseudo-labelling method, meta-pseudo-labelling trains the teacher network and
the student network in parallel. The teacher network updates its own information
from two aspects: a signal from the annotated data and feedback from the student
network. The teacher network’s signal is acquired through the standard process of training
a supervised learning model on annotated datasets. Getting feedback from
the student network requires that the student network inherits the same network
structure as the teacher network, but the update of the student network is based
on noisy data.
        </p>
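<p>A single round of the parallel teacher-student update can be sketched as below. This is a deliberately simplified toy, not the paper's training code: the one-parameter logistic models, the squared-error loss, the learning rate, and the scalar feedback term are all illustrative assumptions standing in for the full meta-pseudo-labels procedure.</p>

```python
import math

# Toy sketch of the meta-pseudo-labels idea: teacher and student (here,
# one-parameter logistic models on scalar inputs) update in parallel. The
# teacher learns from labelled data and from feedback: the change in the
# student's labelled-data loss after the student trains on the teacher's
# pseudo-label. All modelling choices here are illustrative assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w, x, y):
    # gradient of the logistic loss for a single (x, y) pair
    return (sigmoid(w * x) - y) * x

def mpl_round(teacher_w, student_w, labelled, x_unlab, lr=0.5):
    # (1) the teacher pseudo-labels the unlabelled sample
    pseudo = 1 if sigmoid(teacher_w * x_unlab) >= 0.5 else 0
    # (2) the student trains on the pseudo-labelled sample
    loss_before = sum((sigmoid(student_w * x) - y) ** 2 for x, y in labelled)
    student_w -= lr * grad(student_w, x_unlab, pseudo)
    loss_after = sum((sigmoid(student_w * x) - y) ** 2 for x, y in labelled)
    # (3) the teacher updates on labelled data plus student feedback:
    # if the pseudo-label hurt the student (feedback > 0), push the
    # teacher away from producing it; towards it otherwise
    feedback = loss_after - loss_before
    for x, y in labelled:
        teacher_w -= lr * grad(teacher_w, x, y)
    teacher_w += lr * feedback * grad(teacher_w, x_unlab, pseudo)
    return teacher_w, student_w
```

<p>Repeating such rounds lets the teacher adapt its pseudo-labels to what actually helps the student, which is the key difference from vanilla pseudo-labelling.</p>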
      </sec>
      <sec id="sec-3-2">
        <title>Experiments and Evaluation</title>
        <p>This section describes the evaluation experiment designed to assess the
performance of the Overlap-checking and Meta-learning methods.</p>
        <p>
          Semeval-2019 Task 4 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] focuses on detecting if a news article contains biased
information. Released along with the competition is an article-level
hyperpartisan news bias dataset. The dataset includes 1,273 manually labelled samples (of
which 628 are kept private for evaluation) and 754,000 automatically labelled
samples based on publisher attributes. We use the hyperpartisan dataset as
the training dataset for our models. We collect all published manually labelled
samples and 30,000 samples selected randomly from the automatically labelled
dataset. A summary of the training dataset is shown in Table 1.
        </p>
        <p>
          The solution of the Semeval-2019 Task 4 winning team [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] is employed as the
base detector in our approaches. It builds an ELMo-based sentence encoder
to encode sentences into high-dimensional semantic vectors, which are passed into
differently initialized convolutional layers and batch normalization layers in
parallel. The outputs of these layers are concatenated and passed to a dense layer
followed by a sigmoid function to produce the final output.
        </p>
        <p>
          The training details follow the same configuration for data processing and
the network. Unlike the approach by Jiang et al. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] that only uses manually
labelled data, we improve performance by adding a data enhancement module
to leverage the noisy by-publisher distant labels.
        </p>
        <p>
          We conduct detailed comparisons of different data strategies combined with
the bias detector, recording the results in Table 2. The results in Table 2 are based
on 10-fold cross-validation, and bias detection performance is measured using
accuracy, precision, recall, and the F1 score. The bias detector uses precisely the
same configuration as the original version [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>The overlap-checking and meta-learning methods are used with the addition of
distant supervised data to increase the size of the data by 1x, 2x, and 3x.
Overlap-checking has the best performance when using equal proportions, in
which accuracy, precision, and F1 score all exceed the baseline model.
Providing more distant supervised data for the meta-learning method leads to
better accuracy, recall, and F1 score. However, with the addition of more data,
the precision of the meta-learning model decreases.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluating Generalization</title>
      <p>
        In the previous section, we demonstrated the effectiveness of the proposed
solutions on the Semeval-2019 hyperpartisan news dataset. (We re-implemented
the Jiang et al. approach [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] in PyTorch; our accuracy score
is 0.0116 higher than what they reported, which we assume is due to different
initializations.) We are interested in
whether these trained detectors can generalise to other news article datasets to
assess the degree of media bias in them, and also whether these models have the
ability to recognize biased sentences within news articles. To address these two
questions, we conduct experiments to evaluate the trained biased news detectors
on a completely unseen dataset, the Annotated Data dataset [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], that contains
article-level and sentence-level manual annotations.
The Annotated Data Dataset is a fine-grained news bias dataset [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Annotators
have evaluated the degree of bias at the article and sentence level. Four to five
annotators provide a bias score for each sample. The scores range from one (not
biased) to four (very biased). To use the Annotated Data Dataset to conduct
a generalization experiment, we assign a bias score to each sample (article or
sentence) by aggregating the annotators’ scores using the mean.
Evaluating the performance of the model trained on the binary Semeval-2019
hyperpartisan news detection dataset directly on the Annotated Dataset is
complicated because the labels in the Annotated Data Dataset indicate a degree of
bias from one to four. The output of the trained bias detector, however, is a
continuous probability of bias value between zero and one. We can, therefore,
measure the generalization ability of the bias detection model by measuring the
correlation between the model outputs and the aggregated bias scores. The first
and third columns of Fig. 3 show scatter plots of aggregated human annotated
bias degree (horizontal axis) and the probability of bias predicted by a model
(vertical axis) for models trained using overlap-checking (OC) and meta-learning
(ML) with different degrees of data augmentation (1x, 2x, and 3x). These plots
show that humans score towards the biased direction for articles whose
predicted results exceed 0.5. Similarly, for articles that human annotators tend to
rate as unbiased, the predictions of the detectors are also between 0 and 0.5. There
      </p>
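<p>The generalization check above, aggregating each sample's annotator scores by their mean and correlating those means with the model's probability-of-bias outputs, can be sketched as follows. The annotator scores and model outputs here are made-up illustrative values, not data from the paper.</p>

```python
# Sketch of the correlation-based generalization measure: mean-aggregate
# each sample's annotator scores (1 = not biased, 4 = very biased), then
# compute the Pearson correlation with the model's bias probabilities.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

annotator_scores = [[1, 1, 2, 1], [2, 3, 2, 3], [4, 4, 3, 4]]  # per sample
human = [sum(s) / len(s) for s in annotator_scores]            # mean per sample
model = [0.10, 0.55, 0.90]                                     # P(biased) outputs
r = pearson(human, model)   # high r indicates good generalization
```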
      <p>
        The second and fourth columns of Fig. 3 show a trend analysis, where the
degree of bias annotated by humans (horizontal axis) and the number of articles
(left vertical axis) forms a histogram. The right vertical axis shows the mean and
standard deviation of the probability of bias outputs in each bin. We see that
as the biased score annotated by humans increases, the probability predicted by
models also increases, showing a strong positive correlation between these two.
A neural-network-based bias detector outputs a probability of bias, pinitial, when
presented with an article as input. Inspired by [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], to measure the impact of a
specific sentence, si, on the output of the model we can mask si out of the article,
recompute the output of the model, psi, and calculate pshift, the difference
between this and the original probability of bias: pshift = pinitial − psi. Fig. 4
illustrates this process.
      </p>
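<p>The masking procedure can be sketched as below. The detector here is a toy stand-in scoring function, assumed only for illustration; the real model is the trained neural bias detector.</p>

```python
# Illustrative sketch of the sentence-masking procedure: mask each sentence
# in turn, re-run the bias detector on the remaining text, and record the
# probability shift pshift = p_initial - p_si that the sentence causes.

def sentence_shifts(detect, sentences):
    """Return (sentence, pshift) pairs for every sentence in the article."""
    p_initial = detect(sentences)
    shifts = []
    for i in range(len(sentences)):
        masked = sentences[:i] + sentences[i + 1:]   # article with s_i masked
        p_si = detect(masked)
        shifts.append((sentences[i], p_initial - p_si))
    return shifts

# Toy detector: probability grows with how many flagged sentences remain.
flagged = {"a biased claim"}
detect = lambda sents: sum(s in flagged for s in sents) / max(len(sents), 1)
shifts = sentence_shifts(detect, ["a neutral fact", "a biased claim"])
# masking the biased sentence yields the larger positive pshift
```

<p>Sentences with large positive pshift values are the ones whose removal most reduces the predicted probability of bias, i.e., the model's candidate biased sentences.</p>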
      <p>The scatter plots in the first and third columns of Fig. 5 show the
relationship between pshift values (horizontal axis) and human-annotated bias values
for sentences (vertical axis). The second and fourth columns of Fig. 5
illustrate the changes in bias scores of sentences under different probability
intervals. These plots show that there is a certain positive correlation between the
human-annotated bias scores of the sentences and their pshift values.</p>
    </sec>
    <sec id="sec-5">
      <title>Analyzing Media Bias in the MIND Dataset</title>
      <p>
        This section analyzes Microsoft’s large-scale English-based news
recommendation dataset, MIND [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which includes one million users and more than 160k
news articles, and is widely employed in news-related academic research. Each
article in the MIND dataset has a wealth of information associated with it: news
id, title, summary, body URL, and category. The number of articles used in our
experiment is less than the total number in the MIND dataset, as we removed
articles for which the article body could not be retrieved. The overlap-checking method and meta-learning
method were used to infer the number of biased articles in the MIND dataset.
The number of biased articles detected by each of the approaches is presented
in Table 3. A more detailed analysis of the amount of bias in diferent news
categories from the MIND dataset is shown in Fig. 6. From Table 3, we observe
that the overlap-checking method identified 8.8804% of news as biased, while the
meta-learning method is more conservative. Fig. 6 shows the number of biased
articles detected by each approach—overlap-checking, and meta-learning—for
different categories of news article in the MIND dataset. In the weather,
music, and travel categories, the amount of biased news detected is relatively low,
while in health, food and drink, and lifestyle, the amount of biased
content detected is relatively high. In sports, there is quite a difference between the
numbers of biased articles detected, which requires further in-depth analysis.
      </p>
      <p>
        In this work, we propose two pseudo-labelling based solutions, overlap-checking
and meta-learning, to augment training sets with noisy automatically labelled
data when training media bias detection models. The experimental results show
that these data augmentation strategies have a positive effect on model
performance. To validate the generalization capability of the proposed models, we
re-evaluate them on a completely unseen dataset. The results show that the
probability-of-bias values output by the models are highly consistent with human
annotations. In addition, we exploit a masking method to identify essential
sentences that affect the model’s decision-making. The comparison results show a
partial overlap between the biased sentences recognized by annotators and those
identified by the models. Finally, we infer the amount of biased news in the large-scale
news article dataset MIND. This shows that media bias is widespread: over 8%
of all articles (and in some categories as much as 15%) are biased. In the future,
we plan to look more in-depth at the MIND dataset to further understand the
degree of bias that it contains.
      </p>
      <p>Acknowledgements. This publication has emanated from research conducted
with the financial support of Science Foundation Ireland under Grant number
18/CRT/6183. For the purpose of Open Access, the author has applied a CC
BY public copyright licence to any Author Accepted Manuscript version arising
from this submission.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Ramy</given-names>
            <surname>Baly</surname>
          </string-name>
          et al. “
          <article-title>We Can Detect Your Bias: Predicting the Political Ideology of News Articles”</article-title>
          .
          <source>In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          <year>2020</year>
          , pp.
          <fpage>4982</fpage>
          -
          <lpage>4991</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>David P.</given-names>
            <surname>Baron</surname>
          </string-name>
          “
          <article-title>Persistent media bias”</article-title>
          .
          <source>In: Journal of Public Economics</source>
          <volume>90</volume>
          .
          <fpage>1</fpage>
          -
          <lpage>2</lpage>
          (
          <year>2006</year>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Dan</given-names>
            <surname>Bernhardt</surname>
          </string-name>
          , Stefan Krasa, and Mattias Polborn. “
          <article-title>Political polarization and the electoral effects of media bias”</article-title>
          .
          <source>In: Journal of Public Economics</source>
          <volume>92</volume>
          .
          <fpage>5</fpage>
          -
          <lpage>6</lpage>
          (
          <year>2008</year>
          ), pp.
          <fpage>1092</fpage>
          -
          <lpage>1104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Yahui</given-names>
            <surname>Chen</surname>
          </string-name>
          . “
          <article-title>Convolutional neural network for sentence classification”</article-title>
          .
          <source>MA thesis</source>
          . University of Waterloo,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Lisa</given-names>
            <surname>Fan</surname>
          </string-name>
          et al. “In Plain Sight:
          <article-title>Media Bias Through the Lens of Factual Reporting”</article-title>
          .
          <source>In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          .
          <year>2019</year>
          , pp.
          <fpage>6343</fpage>
          -
          <lpage>6349</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Färber</surname>
          </string-name>
          et al. “
          <article-title>A Multidimensional Dataset Based on Crowdsourcing for Analyzing and Detecting News Bias”</article-title>
          .
          <source>In: Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management</source>
          .
          <year>2020</year>
          , pp.
          <fpage>3007</fpage>
          -
          <lpage>3014</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Yves</given-names>
            <surname>Grandvalet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          , et al. “
          <article-title>Semi-supervised learning by entropy minimization</article-title>
          .”
          <source>In: CAP</source>
          <volume>367</volume>
          (
          <year>2005</year>
          ), pp.
          <fpage>281</fpage>
          -
          <lpage>296</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Stephan</given-names>
            <surname>Greene</surname>
          </string-name>
          and
          <string-name>
            <given-names>Philip</given-names>
            <surname>Resnik</surname>
          </string-name>
          . “
          <article-title>More than words: Syntactic packaging and implicit sentiment”</article-title>
          . In:
          <source>Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          .
          <year>2009</year>
          , pp.
          <fpage>503</fpage>
          -
          <lpage>511</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Tim</given-names>
            <surname>Groseclose</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Milyo</surname>
          </string-name>
          . “
          <article-title>A measure of media bias”</article-title>
          .
          <source>In: The Quarterly Journal of Economics 120.4</source>
          (
          <year>2005</year>
          ), pp.
          <fpage>1191</fpage>
          -
          <lpage>1237</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Felix</given-names>
            <surname>Hamborg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anastasia</given-names>
            <surname>Zhukova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bela</given-names>
            <surname>Gipp</surname>
          </string-name>
          . “
          <article-title>Automated identification of media bias by word choice and labeling in news articles”</article-title>
          .
          <source>In: 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL)</source>
          .
          <source>IEEE</source>
          .
          <year>2019</year>
          , pp.
          <fpage>196</fpage>
          -
          <lpage>205</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Hangfeng</given-names>
            <surname>He</surname>
          </string-name>
          and
          <string-name>
            <given-names>Xu</given-names>
            <surname>Sun</surname>
          </string-name>
          . “
          <article-title>A unified model for cross-domain and semi-supervised named entity recognition in Chinese social media”</article-title>
          .
          <source>In: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          . Vol.
          <volume>31</volume>
          . 1.
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Ye</given-names>
            <surname>Jiang</surname>
          </string-name>
          et al. “
          <article-title>Team Bertha von Suttner at SemEval-2019 Task 4: Hyperpartisan news detection using ELMo sentence representation convolutional network”</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          .
          <year>2019</year>
          , pp.
          <fpage>840</fpage>
          -
          <lpage>844</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Kiesel</surname>
          </string-name>
          et al. “
          <article-title>SemEval-2019 Task 4: Hyperpartisan news detection”</article-title>
          .
          <source>In: Proceedings of the 13th International Workshop on Semantic Evaluation</source>
          .
          <year>2019</year>
          , pp.
          <fpage>829</fpage>
          -
          <lpage>839</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Juhi</given-names>
            <surname>Kulshrestha</surname>
          </string-name>
          et al. “
          <article-title>Search bias quantification: investigating political bias in social media and web search”</article-title>
          .
          <source>In: Information Retrieval Journal 22.1</source>
          (
          <year>2019</year>
          ), pp.
          <fpage>188</fpage>
          -
          <lpage>227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Dong-Hyun</given-names>
            <surname>Lee</surname>
          </string-name>
          et al. “
          <article-title>Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks”</article-title>
          .
          <source>In: Workshop on challenges in representation learning, ICML</source>
          . Vol.
          <volume>3</volume>
          . 2.
          <year>2013</year>
          , p.
          <fpage>896</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Sora</given-names>
            <surname>Lim</surname>
          </string-name>
          et al. “
          <article-title>Annotating and analyzing biased sentences in news articles using crowdsourcing”</article-title>
          .
          <source>In: Proceedings of the 12th Language Resources and Evaluation Conference</source>
          .
          <year>2020</year>
          , pp.
          <fpage>1478</fpage>
          -
          <lpage>1484</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Tatsuya</given-names>
            <surname>Ogawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Ma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Masatoshi</given-names>
            <surname>Yoshikawa</surname>
          </string-name>
          . “
          <article-title>News bias analysis based on stakeholder mining”</article-title>
          .
          <source>In: IEICE transactions on information and systems 94.3</source>
          (
          <year>2011</year>
          ), pp.
          <fpage>578</fpage>
          -
          <lpage>586</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Hieu</given-names>
            <surname>Pham</surname>
          </string-name>
          et al. “
          <article-title>Meta pseudo labels”</article-title>
          .
          <source>In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          .
          <year>2021</year>
          , pp.
          <fpage>11557</fpage>
          -
          <lpage>11568</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          et al. “
          <article-title>Attention is all you need”</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          .
          <year>2017</year>
          , pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Fangzhao</given-names>
            <surname>Wu</surname>
          </string-name>
          et al. “
          <article-title>MIND: A large-scale dataset for news recommendation”</article-title>
          .
          <source>In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          .
          <year>2020</year>
          , pp.
          <fpage>3597</fpage>
          -
          <lpage>3606</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Qizhe</given-names>
            <surname>Xie</surname>
          </string-name>
          et al. “
          <article-title>Self-training with noisy student improves ImageNet classification”</article-title>
          .
          <source>In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>
          .
          <year>2020</year>
          , pp.
          <fpage>10687</fpage>
          -
          <lpage>10698</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Linyi</given-names>
            <surname>Yang</surname>
          </string-name>
          et al. “
          <article-title>Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification”</article-title>
          .
          <source>In: Proceedings of the 28th International Conference on Computational Linguistics</source>
          .
          <year>2020</year>
          , pp.
          <fpage>6150</fpage>
          -
          <lpage>6160</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Xiangli</given-names>
            <surname>Yang</surname>
          </string-name>
          et al. “
          <article-title>A Survey on Deep Semi-supervised Learning”</article-title>
          .
          <source>In: arXiv preprint arXiv:2103.00550</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>