<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Clickbait Mitigation in Industrial Recommender System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Václav Blahut</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Karel Koupil</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michal Chudoba</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Radek Tomšů</string-name>
        </contrib>
        <aff>Seznam.cz, Praha</aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>43</fpage>
      <lpage>51</lpage>
      <abstract>
        <p>Many online platforms struggle to control the massive increase in clickbait content. Publishers and professional content creators strive to capture users' attention in a highly competitive online environment, and thus write catchy, superlative, and misleading titles. In our setting, we run a semi-open online content platform with hundreds of professional publishers and thousands of content creators. We examined several different approaches to systematically penalizing clickbait items in an industrial recommender system. To compare the efficiency of the selected approaches, we conducted large-scale online A/B experiments, report results indicating KPI shifts, and share several insights from applying the approach on our platform.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender systems</kwd>
        <kwd>Clickbait</kwd>
        <kwd>Debiasing</kwd>
        <kwd>Auxiliary task</kwd>
        <kwd>Industry</kwd>
        <kwd>Post-processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Our content platform serves millions of active users each day, delivering personalized recommendations
from a wide range of content creators. Users receive a customized mix of on-line media, including news,
entertainment articles, videos, and podcasts. Clickbait (CB) often employs sensationalist headlines
or descriptions that exaggerate the content’s value or importance to lure users into clicking. These
headlines often create a "curiosity gap" by providing just enough information to spur interest, but not
enough to satisfy it, forcing users to click to learn more [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Without adequate control mechanisms,
some publishers take advantage of the situation, leading to a surge in clickbait exposure and causing
individual items to become excessively dominant in the content stream. Clickbait preys on human
curiosity and emotional triggers, such as surprise, fear, or excitement. Using these psychological factors,
clickbait aims to maximize click-through rates (CTR) at the expense of content quality [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Therefore,
it has a massive impact on industrial recommenders: erosion of user trust [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], degradation of content
quality[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], challenges for content moderation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and also an economic impact, as clickbait is often
driven by the economic incentives of online advertising [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where platforms and content creators
are rewarded for maximizing clicks and engagement. This creates a conflict between the goals of
user satisfaction and revenue generation, and also between the short-term gains of publishers and the
long-term sustainability of the content platform.
      </p>
      <p>In this work, we investigate several approaches to integrate clickbait mitigation into an industrial
recommendation system, ranging from heuristic methods to reinforcement learning, validated through
extensive online A/B testing.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Clickbait mitigation</title>
      <p>The initial step in mitigating clickbait involves using a model that assesses the degree of clickbaitiness
in article titles. In our setting, we employ an internal BERT-like model trained on a large, carefully
annotated dataset. The model classifies each article title and outputs a continuous score between 0 and
1, where 0 means that there is no clickbait and 1 signals a clickbait title. This model is static and used in
all of our experiments.</p>
      <p>We explore several methods to integrate clickbait mitigation into the recommendation system. First,
we introduce a model-agnostic approach that requires no changes to the model architecture or the
training process. Next, we modify the model itself by incorporating clickbait signals, both as an input
feature and as an auxiliary learning objective, allowing the model to account for undesirable behavior
through its loss function. Finally, we apply an actor-critic framework from reinforcement learning to
guide the model’s output distribution to more desirable recommendations. Each of these approaches is
described in detail below.</p>
      <sec id="sec-2-1">
        <title>2.1. Heuristic post-processing</title>
        <p>One of the most straightforward, model-agnostic ways to apply any kind of promotion or
penalization logic based on an item attribute is to modify the score of the item that is used for
ranking. In our case, we decided to reduce the item score by multiplying it by a factor derived from the
clickbaitiness of the item. To compute this factor, we use a custom, adjustable sigmoid function:
f(c, h, l, d, w) = (1 − w) + w / (1 + ((1 − d) / d)^((2c − h − l) / (h − l))) (1)
where c is the clickbaitiness of the given item, h and l are the locations of the higher and lower control
points, d is the relative drop of the value at the control points, and w is the weight controlling the
lowest value of the curve and therefore the overall effect of clickbait penalization. The c parameter is
tied to a given item, and all the other parameters are treated as hyperparameters of the solution. The
function, including a visualization of the hyperparameters and their values used in the experiment, can
be seen in Figure 1.</p>
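<p>As a concrete illustration, the penalization factor of Equation (1) can be sketched in Python as follows (the hyperparameter values shown are illustrative defaults, not the production settings):</p>

```python
def penalization_factor(c, h=0.7, l=0.3, d=0.1, w=0.8):
    """Multiplicative score-penalization factor from Equation (1).

    c    -- clickbaitiness of the item, in [0, 1]
    h, l -- locations of the higher and lower control points
    d    -- relative drop of the factor at the control points
    w    -- weight bounding the lowest value of the curve at 1 - w
    """
    base = (1.0 - d) / d
    exponent = (2.0 * c - h - l) / (h - l)
    return (1.0 - w) + w / (1.0 + base ** exponent)

# The penalized ranking score is then simply: score * penalization_factor(c).
```

By construction the factor equals 1 − w·d at the lower control point l and (1 − w) + w·d at the higher control point h, so d directly controls how sharp the drop is between the two.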
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Debiasing via shallow tower</title>
        <p>
          The presence of clickbait in recommendations can be considered a bias. We decided to experiment with
an existing model debiasing technique that employs a shallow tower architecture [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], which bypasses
the main model to isolate biases from genuine user utility. We used the clickbaitiness of an item as a
dense feature connected to the shallow tower, bypassing the core of the model, and added the logit of a
single dense layer with 0.1 dropout rate to the logits of multitask prediction heads. At prediction time,
the clickbaitiness was zeroed. Previously, we also experimented with other baselines from [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] such as
debiasing via simple input feature or adversarial learning via gradient negation of auxiliary task, but
the shallow tower technique worked best for us, consistently with [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
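<p>A minimal sketch of the training- vs. serving-time behavior, assuming a single-feature shallow tower with illustrative (not learned) parameters and omitting the dropout:</p>

```python
# Hypothetical shallow-tower parameters; in the real model these are
# learned jointly with the multitask heads.
W_TOWER, B_TOWER = 1.5, -0.2

def final_logit(main_logit, clickbaitiness, training):
    """Add the shallow-tower logit to the main model's logit.

    At prediction time the clickbaitiness input is zeroed, so the tower
    contributes only its bias term and ranking is driven by the debiased
    main model.
    """
    cb = clickbaitiness if training else 0.0
    return main_logit + (W_TOWER * cb + B_TOWER)
```

During training the tower absorbs the clickbait-driven portion of the click signal; zeroing its input at serving removes that portion from the ranking score.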
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Auxiliary task – Clickbait reward</title>
        <p>
          An even more direct way to penalize the model for recommending clickbait is to handle the
clickbaitiness at the loss level. While we tried reducing the weight of training data rows for clickbait
items, the most promising approach turned out to be combining the predicted CTR and the item's
clickbaitiness into an additional auxiliary task loss [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>r = log(1 − ĉtr) · (2c − 1) (2)</p>
        <p>This allows the model to directly learn which items to promote over others based on their clickbaitiness.
The effect of this approach can be adjusted by modifying the loss weight.</p>
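<p>Assuming the reconstructed form of Equation (2), the auxiliary term can be sketched as:</p>

```python
import math

def clickbait_reward(predicted_ctr, clickbaitiness):
    """Auxiliary clickbait reward: log(1 - ctr_hat) * (2*c - 1).

    For a heavy-clickbait item (c near 1) the reward is highest when the
    predicted CTR is low; for a clean item (c near 0) it is highest when
    the predicted CTR is high. Its negation enters the training objective
    as an auxiliary loss term, scaled by a tunable weight.
    """
    return math.log(1.0 - predicted_ctr) * (2.0 * clickbaitiness - 1.0)
```

Note the sign flip at c = 0.5: the term is exactly zero there, so only clearly clean or clearly clickbait items exert auxiliary pressure on the CTR head.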
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Reinforcement learning</title>
        <p>
          While the auxiliary task solution properly penalizes high-clickbait articles, it still needs interactions to
learn such penalization from. We therefore looked for solutions that are closer to the realm of model
alignment than to standard supervised learning from implicit interactions. Inspired by RLHF
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and other methods used in LLMs [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], we came up with a method for recommendation models.
        </p>
        <p>We sample from a catalogue of available articles and perform an actor-critic cycle with KL-divergence
bounds over a frozen model with the previously mentioned clickbait loss to align the model:
L_critic = CE(r̂; −r) (3)
D_KL = KL(π_frozen; π_actor) (4)
L_actor = w_KL · D_KL − w_r · r̄ (5)
where r̂ is the critic's estimate of the clickbait reward r, π_frozen and π_actor are the output distributions
of the frozen reference model and the aligned model, and r̄ is the mean reward under the actor's distribution.</p>
        <p>The impact of this method can be regulated by adjusting the loss weight as well as by modifying the
way the items are sampled from the catalogue.</p>
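<p>A toy, pure-Python sketch of the KL-bounded actor objective over a handful of sampled items (the weights w_kl and w_r are illustrative hyperparameters, and the two-item catalogue stands in for a real sample):</p>

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def actor_loss(actor_logits, frozen_logits, rewards, w_kl=0.1, w_r=1.0):
    """KL-bounded actor objective.

    Minimizing this pushes probability mass toward high-reward
    (low-clickbait) items, while the KL term keeps the aligned policy
    close to the frozen reference model.
    """
    actor = softmax(actor_logits)
    frozen = softmax(frozen_logits)
    mean_reward = sum(p * r for p, r in zip(actor, rewards))
    return w_kl * kl(frozen, actor) - w_r * mean_reward
```

With a small w_kl, a policy that shifts mass toward the rewarded item attains a lower loss than one shifting away from it, which is the alignment pressure described above.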
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental setup</title>
      <p>
        The main pipeline of our recommender system follows the 4-stage framework described by Higley et
al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. At the ranking stage, the system utilizes the Deep and Cross Network V2 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] model from the TensorFlow
Recommenders library (https://www.tensorflow.org/recommenders/). We use multitask learning to predict the click-through rate ĉtr and the time
t̂ spent consuming the item. For final ranking, the predictions are then combined using the formula
S(ĉtr, t̂) = ĉtr · t̂^0.8,
as further described in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The model is trained incrementally every 5 minutes. An endless feed of
recommended items consists of 20-item slates, incrementally generated online as the user scrolls through
the feed. All hyperparameters of the system, including those of the variants discussed in this work,
were selected based on a preceding exhaustive series of experiments.
      </p>
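<p>The combination formula for the final ranking score amounts to a one-liner (the 0.8 exponent is the value stated in the text):</p>

```python
def ranking_score(predicted_ctr, predicted_time):
    """Final ranking score: predicted CTR times predicted spent time^0.8.

    The sublinear exponent damps the influence of very long consumption
    times relative to the click signal.
    """
    return predicted_ctr * predicted_time ** 0.8
```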
      <sec id="sec-3-1">
        <title>3.1. A/B testing</title>
        <p>In order to assess the performance of our recommender system, we perform randomized A/B tests on
live traffic. For each of the tested variants, we include an A/A variant. The size of each variant as well as the
test duration are chosen empirically w.r.t. statistical significance. The unit of randomization in our A/B
testing system is the user.</p>
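<p>User-level randomization of this kind can be implemented with deterministic hashing, sketched below (the experiment name used as a salt and the variant labels are illustrative):</p>

```python
import hashlib

def variant_for_user(user_id, variants=("A", "A2", "B"), salt="cb-exp"):
    """Deterministically assign a user to an experiment variant.

    Hashing the salted user id keeps assignments stable across sessions,
    and a per-experiment salt decorrelates splits between experiments.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

This sketch splits users into equal-probability buckets; unequal variant sizes would require weighting the modulo ranges accordingly.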
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Metrics</title>
        <p>To evaluate the effectiveness of the tested methods in reducing the amount of clickbait items in
recommendations, and to see the impact on the KPIs, we report these user-level metrics:
• Number of impressions per clickbait category - to show the shifts in the amount of recommended
CB, we categorize items into three categories according to their clickbait score, corresponding to
non-CB, light CB, and heavy CB.
• Click-through rate - the number of clicks divided by the number of item impressions.
• Number of clicks
• Spent time - the total amount of time the user spent consuming clicked items.
• Bounce rate - the ratio of clicks resulting in an immediate leave after the content of the article is displayed.</p>
        <p>Unlike other mentioned metrics, reducing bounce rate means improvement.</p>
        <p>All metrics are computed per user on the personalized portion of traffic, and their reported changes are
on a relative scale.</p>
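<p>For concreteness, the per-user metrics can be computed from a log of impression records as follows (the record fields are hypothetical; the production pipeline aggregates from event streams):</p>

```python
def user_metrics(impressions):
    """Compute per-user metrics from a list of impression records.

    Each record is a dict with keys: 'clicked' (bool), 'spent_time'
    (seconds, 0 if not clicked) and 'bounced' (bool).
    """
    n_impressions = len(impressions)
    clicks = [r for r in impressions if r["clicked"]]
    n_clicks = len(clicks)
    return {
        "ctr": n_clicks / n_impressions if n_impressions else 0.0,
        "clicks": n_clicks,
        # Spent time is summed over clicked items only.
        "spent_time": sum(r["spent_time"] for r in clicks),
        # Bounce rate is the share of clicks that ended in an immediate leave.
        "bounce_rate": sum(r["bounced"] for r in clicks) / n_clicks if n_clicks else 0.0,
    }
```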
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Clickbait mitigation can lead to various outcomes. In the short term, it most likely leads to a reduced
number of clicks in the system, as clickbait titles generally have a higher click-through rate than other
article titles. The same may not be true for other metrics that try to measure user satisfaction, such
as spent time, the overall number of user interactions, etc. Our findings suggest that some clickbait
titles are connected with a high number of bounces, i.e. users leaving the article in a short time and not
reading a significant part of it. However, there are also clickbait titles that keep the users interested
in the articles. Therefore, it is not straightforward to predict the outcomes of the mitigation on these
metrics.</p>
      <p>Examining the experimental results reveals that each tested approach exhibited distinguishably
different behavior. All of the methods successfully reduced the exposure of CB, some of them more
aggressively than others and with varying impacts on KPIs. Without any doubt, the heuristic
post-processing method is the overall favorite, substantially reducing the exposure of heavy CB similarly to the
most aggressive auxiliary task method, but with a much lower, manageable impact on KPIs. Another
positive aspect is that the method is easy to set up and fine-tune to the desired trade-off; it is model-agnostic
and therefore potentially applicable to scenarios other than personalized recommendation.</p>
      <p>The debiasing via shallow tower variant has also shown interesting results. It is not as powerful in
suppressing CB as the other methods; however, the effect is still not negligible, especially considering the
improvements in the majority of KPIs. The slight drop in the number of clicks is compensated by an increase
in spent time, meaning that users spent more time consuming the items rather than browsing the
recommendations, which is considered a positive change. One potential limitation of this method is the
inability to control the extent of clickbait suppression.</p>
      <p>The auxiliary task produced highly contradictory results. On the one hand, it reduced CB impressions
more effectively than the other approaches. On the other hand, it led to a significant decline in
click-related metrics. Interestingly, despite this drop, users spent more time consuming the content overall,
and the bounce rate decreased significantly more than with the other methods. These mixed outcomes
are intriguing and give us flexibility in choosing which KPIs to prioritize.</p>
      <p>Reinforcement learning resulted in a strong reduction in clickbait, but this came at the cost of an
overall decline in both click and time-based metrics. Its outcomes were somewhat similar to those of
the auxiliary task, with the added downside that even time spent on site decreased slightly. This is the
most challenging setup, and so far, it hasn’t delivered the level of performance we were hoping for.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Although we were open to using machine learning approaches to mitigate the presence of clickbait in
recommendations, the heuristic post-processing method demonstrated the best trade-off: it effectively
reduced the exposure of heavy clickbait and positively impacted the bounce rate with a rather minor
cost in terms of the performance in clicks and CTR. A natural continuation of this research will be to
explore a combination of multiple approaches, such as the shallow tower and post-processing methods, to
further enhance the results. This combination may leverage the strengths of both heuristic and ML
solutions to improve clickbait mitigation and user satisfaction.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>We thank Vít Líbal and Daniel Procházka for their extensive feedback on this paper. We thank Josef
Florian and Petr Zelenka for their assistance with conducting the experiments. We thank Minh Duc
Pham for sigmoid function reduction.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT, Microsoft Copilot and Perplexity AI in
order to: Paraphrase and reword, Improve writing style. Further, the authors used SciSpace in order to:
Citation management. After using these tools, the authors reviewed and edited the content as needed
and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , T.-S. Chua,
          <article-title>Clicks can be cheating: Counterfactual recommendation for mitigating clickbait issue</article-title>
          ,
          <source>in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '21</source>
          ,
          <string-name>
            <surname>ACM</surname>
          </string-name>
          ,
          <year>2021</year>
          , p.
          <fpage>1288</fpage>
          -
          <lpage>1297</lpage>
          . URL: http://dx.doi.org/10.1145/3404835.3462962. doi:10.1145/3404835.3462962.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Immorlica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jagadeesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lucier</surname>
          </string-name>
          ,
          <article-title>Clickbait vs. quality: How engagement-based optimization shapes the content landscape in online platforms</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2401.09804. arXiv:2401.09804.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jácobo-Morales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marino-Jiménez</surname>
          </string-name>
          ,
          <article-title>Clickbait: Research, challenges and opportunities - a systematic literature review</article-title>
          ,
          <year>2024</year>
          . URL: https://doi.org/10.30935/ojcmt/15267.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Andrews</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumthekar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sathiamoorthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yi</surname>
          </string-name>
          , E. Chi,
          <article-title>Recommending what video to watch next: a multitask ranking system</article-title>
          ,
          <source>in: Proceedings</source>
          , doi:10.1145/3298689.3346997.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Körner</surname>
          </string-name>
          ,
          <article-title>Auxiliary tasks in multi-task learning</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/1805.06334. arXiv:1805.06334.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Ziegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Stiennon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Amodei</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Christiano</surname>
          </string-name>
          , G. Irving,
          <article-title>Fine-tuning language models from human preferences</article-title>
          , arXiv preprint arXiv:1909.08593 (2019).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Rafailov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , E. Mitchell,
          <string-name>
            <surname>C. D. Manning</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Ermon</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Finn</surname>
          </string-name>
          ,
          <article-title>Direct preference optimization: Your language model is secretly a reward model</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>53728</fpage>
          -
          <lpage>53741</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Higley</surname>
          </string-name>
          , E. Oldridge,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rabhi</surname>
          </string-name>
          , G. de Souza Pereira Moreira,
          <article-title>Building and deploying a multi-stage recommender system with merlin</article-title>
          ,
          <source>in: Proceedings of the 16th ACM Conference on Recommender Systems</source>
          , RecSys '22,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , p.
          <fpage>632</fpage>
          -
          <lpage>635</lpage>
          . URL: https://doi.org/10.1145/3523227.3551468. doi:10.1145/3523227.3551468.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shivanna</surname>
          </string-name>
          , D. Cheng, S. Jain,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hong</surname>
          </string-name>
          , E. Chi, Dcn v2:
          <article-title>Improved deep &amp; cross network and practical lessons for web-scale learning to rank systems</article-title>
          ,
          <source>in: Proceedings of the Web Conference</source>
          <year>2021</year>
          , WWW '21,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , p.
          <fpage>1785</fpage>
          -
          <lpage>1797</lpage>
          . URL: https://doi.org/10.1145/3442381.3450078. doi:10.1145/3442381.3450078.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Florian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Drdák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tomsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Koupil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Blahut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kuchar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rehor</surname>
          </string-name>
          ,
          <article-title>Combining models for better user satisfaction in video recommendation</article-title>
          ., in: ORSUM@ RecSys,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>