<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Workshop on News Recommendation and Analytics, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Predicting Feature-based Similarity in the News Domain Using Human Judgments</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alain D. Starke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Øverhaug</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph Trattner</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Bergen</institution>
          ,
          <addr-line>P.O. Box 7800, 5020 Bergen</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Wageningen University &amp; Research</institution>
          ,
          <addr-line>Droevendaalsesteeg 4, 6708 PB Wageningen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>25</volume>
      <issue>2021</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>When reading an online news article, users are typically presented with 'more like this' recommendations by news websites. In this study, we assessed different similarity functions for news item retrieval by comparing them to human judgments of similarity. We asked 401 participants to assess the overall similarity of ten pairs of political news articles, which were compared to feature-specific similarity functions (e.g., based on body text or images). We found that users reported mostly using text-based features (e.g., title) for their similarity judgments, and that body text similarity was the most representative of their judgments. Moreover, we modeled similarity judgments using different regression techniques. Using data from another study, we contrasted our results across retrieval domains, revealing that similarity functions in news are less representative of user judgments than those in movies and recipes.</p>
      </abstract>
      <kwd-group>
        <kwd>news</kwd>
        <kwd>similarity</kwd>
        <kwd>similar-item retrieval</kwd>
        <kwd>recommender systems</kwd>
        <kwd>user study</kwd>
        <kwd>human judgment</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Problem Outline</title>
        <p>
          News retrieval faces several domain-specific challenges. Compared to leisure domains (e.g.,
movies), news articles are volatile, in the sense that they become obsolete quickly or may be
updated later [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Consequently, user preferences may strongly depend on contextual factors,
such as a user’s time of day or location [
          <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
          ].
        </p>
        <p>
          News websites typically present content-based recommendations [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. A common setup is to
present a list of articles that are similar to the story the user is currently reading, such as depicted
in Figure 1. These are often labeled ‘More on this Story’ (e.g., at BBC News), showcasing similar
articles in terms of their publication time or specific keywords.
        </p>
        <p>[Figure 1: Mock-up of a news article page with its features annotated: title, main image, author, date of publication, lead paragraph, body text, and item recommendations.]</p>
        <p>
          Whether two news articles are alike can be computed using similarity functions [
          <xref ref-type="bibr" rid="ref1 ref5">1, 5</xref>
          ].
Features (e.g., title) considered by such functions should to a large extent reflect a user’s
similarity assessment [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], while not being too similar to what a user is currently reading, for
it may lead to redundancy [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. However, research on feature-based similarity is limited and
rather domain-dependent. For example, users browsing on recipe websites tend to use titles
and header photos to assess similarity between recipes, while users of movie recommenders
use plot descriptions and genre [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. As a result, there is no consensus on which news article
features best represent a user’s similarity judgment. This may be problematic, as similarity
functions in recommender systems may be more effective if they reflect user perceptions.
        </p>
        <p>
          Hence, the current study assesses a set of similarity functions for news article retrieval,
particularly for the task of similar-item recommendation. We ask users of an online news
system to judge the similarity between pairs of news articles, which is used to develop a model
to predict news similarity. Subsequently, we perform cross-domain comparisons, comparing
which features are used for human similarity judgments in news, movies, and recipes, using
data from [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We posit the following research questions:
• RQ1: Which news article features are used by humans to judge similarity and to what
extent are different feature-specific similarity functions related to human similarity
judgments?
• RQ2: Which combination of news article features is best suited to predict user similarity
judgments?
• RQ3: How does the use of news features and their similarity functions compare to those
used in the recipe and movie domains?
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Contributions</title>
        <p>
          This paper makes the following contributions:
• We advance the understanding of how readers perceive similarity between news articles,
in terms of (i) which article cues or features are reported as important, (ii) how
features correlate with similarity ratings provided by users, and (iii) how user-reported
feature importance is not always consistent with the computed correlations.
• We show which news information features can predict a user’s similarity judgment.
• We juxtapose our news study with findings from the movie and recipe domains, using
data from [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], showing that feature-specific similarity functions in the news domain are
less representative of human judgment than functions in the movie and recipe domains.
• We present a reproducible data processing pipeline, available on Github1, and add a
benchmarking dataset for the publicly available Washington Post Corpus news article
database.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>We highlight work from the domains of Similar-item Retrieval and Semantic Similarity to craft
similarity functions. Moreover, we discuss specific challenges in news recommendation, and
explain how similarity functions are assessed by using human similarity judgments as ground
truth.</p>
      <sec id="sec-2-1">
        <title>2.1. Similar Item Retrieval</title>
        <p>
          Similar item retrieval seeks to identify unseen or novel items that are similar to what a user has
elicited preferences for [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In the recommender domain, this is referred to as a similar-item
recommendation problem. A fundamental question is how to compute similarity between
concepts [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ], which is examined in studies on semantic similarity [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], a field of research
that usually not only captures the similarity between two concepts, but also how different
they are [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This can be based on ontological relations, based on human knowledge, or on
co-occurrence metrics that stem from a hierarchical or annotated corpus of words [
          <xref ref-type="bibr" rid="ref12 ref2">2, 12</xref>
          ]. For
example, latent semantic analysis derives meaning and similarity from the text context itself,
by examining how and how often words are used [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>
          A traditional method is to compute similarity between items by deriving vectors from text
items. Although Term Frequency-Inverse Document Frequency (TF-IDF) has been outperformed
by other metrics, such as BM25 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], it remains one of the most commonly used IR methods
to create similarity vectors [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. It uses the term frequency per document and the inverse
appearance frequency across all documents [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], while similarity between the vectors of liked
and unseen items can be computed using cosine similarity [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
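        <p>
          As an illustration of this approach, the following self-contained Python sketch builds TF-IDF vectors and compares them with cosine similarity. It is a toy implementation with invented example sentences, not the code used in this study.
        </p>

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))               # document frequency per term
    idf = {t: math.log(n / df[t]) for t in df}  # standard log(N / df) weighting
    vectors = []
    for doc in docs:
        tf = Counter(doc)                 # raw term frequency
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Invented mini-corpus for illustration.
docs = [
    "senate passes budget bill".split(),
    "house debates budget bill".split(),
    "local team wins final".split(),
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shared terms -> positive similarity
print(cosine(vecs[0], vecs[2]))  # no shared terms -> 0.0
```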
        <p>
          A much simpler approach is to derive a set of keywords from each item [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. For example, a
book recommender could compute the similarity between two books represented as keyword
sets B1 and B2 through the Jaccard coefficient: J(B1, B2) = |B1 ∩ B2| / |B1 ∪ B2|.
        </p>
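        <p>
          A minimal Python sketch of this keyword-based approach (the keyword sets are invented for illustration):
        </p>

```python
def jaccard(a, b):
    """Jaccard coefficient between two keyword sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two invented keyword sets sharing 2 of 4 distinct keywords.
b1 = {"election", "senate", "budget"}
b2 = {"election", "budget", "tax"}
print(jaccard(b1, b2))  # -> 0.5
```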
        <sec id="sec-2-1-1">
          <title>1https://github.com/Overhaug/HuJuRecSys</title>
          <p>There are various other similarity metrics available, such as the Levenshtein distance (i.e., “edit
distance”), and LDA (Latent Dirichlet Allocation).</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Similarity Representations in the News Domain</title>
        <p>
          News recommender systems primarily focus on textual representations of news articles [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
Most approaches utilize the main text or title, ignoring most other textual features, such as the
author [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. A straightforward but less common approach in academic studies [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] is to
retrieve articles based on date-time, such as those that are published on the same day as the
article that is currently inspected. Other approaches include the use of (sub)categories, while
image-based similarity is more common in other domains [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], such as food [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
2.2.1. Text-based approaches
Most similarity functions relevant in news retrieval are text-based. TF-IDF is traditionally
combined with Cosine similarity and used as a news recommendation benchmark [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. In some
cases, its effectiveness can be improved by constraining it to a maximum number of words [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
TF-IDF can also be combined with a K-Nearest Neighbor algorithm to recommend short-term
interest news articles [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
        <p>
          Besides the aforementioned methods, a common approach is to derive latent topics from texts.
Although recent work uses Word2Vec and BERT [
          <xref ref-type="bibr" rid="ref22 ref23">22, 23</xref>
          ], this work considers Latent Dirichlet
Allocation (LDA) and Probabilistic Latent Semantic Indexing (PLSI) [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. LDA and PLSI can
cluster topically-similar news articles based on tags and named entities. News recommendations
can be refined afterwards based on recency scores.
        </p>
        <p>
          A final interesting text-based method is based on sentiment analysis. Sentiment analysis
mines a text’s opinions in terms of the underlying attitude, judgments, and beliefs. It has been
suggested that negativity in news has a large impact, triggering more vivid recall of news story
details among users [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
2.2.2. Other News Features
A news article’s date-time feature is also leveraged in the context of similar-item news
recommendation, either through pre-filtering, recency modeling, or post-filtering [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Pre-filtering
involves omitting outdated news articles before computation starts, while the more uncommon
post-filtering removes all non-recent articles from a Top-N set. Recency modeling is the most
common, which incorporates recency as one of the factors in an algorithm’s similarity
computation (e.g., by giving it a higher weight). Pon et al. [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] describe an approach that targets users
with multiple interests, by considering recency in conjunction with a ‘multiple topic tracking’
technique.
        </p>
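        <p>
          The three date-time strategies can be sketched in Python as follows. The field names, the exponential half-life weight, and the cutoff values are illustrative assumptions, not parameters taken from the cited systems.
        </p>

```python
from datetime import date, timedelta

def recency_weight(pub_date, today, half_life_days=7):
    """Exponential-decay recency score in (0, 1]: halves every week (assumed)."""
    age_days = (today - pub_date).days
    return 0.5 ** (age_days / half_life_days)

def recommend(candidates, today, max_age_days=30, top_n=3):
    # Pre-filtering: drop outdated articles before scoring starts.
    fresh = [c for c in candidates if (today - c["date"]).days <= max_age_days]
    # Recency modeling: blend content similarity with the recency weight.
    scored = sorted(((c["sim"] * recency_weight(c["date"], today), c["id"])
                     for c in fresh), reverse=True)
    return [cid for _, cid in scored[:top_n]]

# Invented candidates with precomputed content similarity scores.
today = date(2021, 9, 1)
candidates = [
    {"id": "a", "sim": 0.9, "date": today - timedelta(days=60)},  # pre-filtered out
    {"id": "b", "sim": 0.6, "date": today - timedelta(days=1)},
    {"id": "c", "sim": 0.8, "date": today - timedelta(days=21)},
]
print(recommend(candidates, today))  # -> ['b', 'c']
```

Post-filtering would instead apply the age cutoff to the Top-N list after scoring.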
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Assessing Similarity Functions Using Human Judgments</title>
        <p>
          Similar-item retrieval approaches, as also used in similar-item recommender systems, are
typically validated using human judgments [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. An important question is to what extent
similarity functions reflect a user’s similarity assessment of item pairs. This could lead to
problems if a user either ignores or overvalues different item features, compared to what is
being computed [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. This has been studied in the movie and recipe domains: Trattner and
Jannach [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] contrast user similarity assessments to a set of similarity functions, pointing out that
specific features (e.g., a recipe’s title or a movie’s genre) strongly correlate with user similarity
judgments. In a similar vein, Yao and Harper [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] assess to what extent different algorithms for
related item recommendations in music are consistent with user similarity judgments.
        </p>
        <p>
          However, assessing similarity between news articles might be harder than between movies.
Whereas similarity between movie pairs is usually attributed to the annotated metadata (e.g.,
genre), two news articles could be similar because they are recent, address a common topic,
or because a person appears in both stories. Although a few studies let humans assess the
overall similarity between news headlines [
          <xref ref-type="bibr" rid="ref2 ref28">2, 28</xref>
          ], none have done so across multiple features.
For example, users in the work of Tintarev and Masthoff [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] successfully judged the similarity
between news articles, but only based on their headlines.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Key Differences with Previous Work</title>
        <p>
          Novel to our approach is the use of feature-specific similarity representations and functions in
news, as well as grounding them in human similarity judgments. Most relevant to our approach
are the works of Trattner and Jannach [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], and Yao and Harper [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ], for they explore how
computational functions for similarity compare to users’ perception of similarity. In particular,
Trattner and Jannach [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] serve as an example for our approach, for they also present an online
study on similarity perceptions. However, these studies concerned retrieval in music, movies,
and recipes. Since the merit of such feature-specific similarity functions is unknown
for news, the goal of the current study is to assess their performance in that domain.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>We assess the utility of different feature-specific similarity functions by collecting human
judgments of similarity for pairs of news articles. In this section, we describe (1) the dataset
and its specific features, (2) the engineered similarity functions, and (3) the design of our user
study to determine the effectiveness of these functions.</p>
      <sec id="sec-3-1">
        <title>3.1. Dataset and Feature Engineering</title>
        <p>3.1.1. News Database
We employed a publicly available news article database. We focused on a scenario of a single
news source, as the use of multiple news websites could lead to ‘duplicate’ articles on the same
news event. To ensure reproducibility, we obtained news articles from the open Washington
Post Corpus [29]. The news items in the dataset comprised title, author (including a bio),
date of publication, section headers, and the main body text. In addition, we retrieved the
images associated with the news articles, 655,533 in total. After removing duplicates from
the original source, our remaining dataset contained 238,082 articles, which were originally
published between Jan’12 and Aug’18.
For our user study, we selected news articles categorized under ‘Politics’, as they covered
(inter)nationally relevant topics. Other categories were excluded because they focused more on
local events, which may not be familiar to users and could therefore bias similarity estimates.
We sampled a total of 2,400 ‘Politics’ news articles, 400 from each year between 2012
and 2017; descriptive statistics are reported in Table 1.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Modeling Similarity with Feature-Based Similarity Functions</title>
        <p>
          To model the similarity between two news articles, we used twenty similarity functions and
representations across seven dataset features. We designed functions in line with the field’s
current state-of-the-art, by exploiting specific cues that people may use to assess similarity
between two items – based on findings from the movie and recipe domains [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          Table 2 describes the developed similarity functions. For each pair of news articles, we
computed similarity scores based on seven main features: subcategory, title, presented images,
author, author bio, publication date, and body text (first 50 words and full text). For
text-based features, the similarity functions were either based on word mappings or distance
methods, while similarity based on subcategories and authors was computed using a Jaccard
coefficient. Moreover, we computed date-time similarity (i.e., recency modeling) through a linear
function of how many days apart two articles were published.
3.2.1. Title
Title-based similarity was computed using four string similarity functions and a topic-based
one. The string-based functions were based on distance metrics: the Levenshtein distance (LV)
[30], the Jaro-Winkler method (JW) [31], the longest common subsequence, and the bi-gram
distance method (BI) [32]. Similar to Trattner and Jannach [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], Latent Dirichlet Allocation (LDA)
topic-modeling was set to 100 topics.
3.2.2. Image Features
In line with the current state-of-the-art [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], we computed image-based similarity using six
different functions. These were an image’s brightness, sharpness (i.e., based on a pixel’s
intensity), contrast, colorfulness (i.e., based on the sRGB color space), entropy (i.e., amount of
information captured per image dot), and image embeddings. Mathematical details are available
in our Github repository.
3.2.3. Body Text
Body text similarity was computed using two string-based functions (i.e., TF-IDF variants), a topic-based
function (i.e., LDA), and a text sentiment-based metric (based on the research of [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]). TF-IDF
encodings were paired with cosine similarity, for which we discerned between similarity based
on an article’s first 50 words (i.e., an article’s first paragraph), which could be compared to the
average movie plot length in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], and similarity based on the entire body text.
        </p>
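        <p>
          As an illustration of the distance-based title functions and the linear date-time function described above, consider the following Python sketch. The normalization by title length and the 365-day window are illustrative choices, not the exact parameters of our functions.
        </p>

```python
def levenshtein(s, t):
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]

def title_similarity(a, b):
    """Normalize the edit distance to a [0, 1] similarity score."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def date_similarity(days_apart, max_days=365):
    """Linear decay: same-day pairs score 1.0, pairs a year or more apart score 0.0."""
    return max(0.0, 1.0 - days_apart / max_days)

print(levenshtein("kitten", "sitting"))  # -> 3
```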
      </sec>
      <sec id="sec-3-3">
        <title>3.3. User Study</title>
        <p>
          The similarity functions in Table 2 were assessed by computing similarity scores per news
article pair and comparing them to human judgments. We explain our sampling strategy and
how we collected human judgments of similarity.
3.3.1. Sampling News Article Pairs on Similarity
We compiled a set of news article pairs that were either strongly similar, dissimilar or in-between.
To ensure a good distribution, we employed a stratified sampling strategy that was in line with
previous work [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We computed the pairwise similarity across all 2400 news articles, averaging
the similarity values of all functions in Table 2. Pairs were ordered on their similarity levels and
divided into ten deciles, groups D1-D10 of equal size. We sampled a total of 6,000 news article
pairs: 2,000 dissimilar pairs from decile D1, 2,000 pairs from deciles D2-D9, and 2,000 similar
pairs from decile D10.
3.3.2. Procedure and Measures
The resulting 6,000 news article pairs were used to collect human judgments of similarity. Figure
2 depicts a mock-up of the main application, showing from top to bottom different news article
features (note: an author bio could also be inspected). Users could read the full text by clicking
‘read more’.
        </p>
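        <p>
          The stratified decile sampling step can be sketched in Python as follows (a simplified illustration; `pairs` holds precomputed average similarities, and the sample sizes are parameters):
        </p>

```python
import random

def sample_pairs_by_decile(pairs, n_low, n_mid, n_high, seed=42):
    """pairs: list of (pair_id, avg_similarity) tuples.

    Orders pairs by similarity, splits them into ten equal deciles
    D1-D10, then samples dissimilar pairs from D1, in-between pairs
    from D2-D9, and similar pairs from D10."""
    rng = random.Random(seed)
    ordered = sorted(pairs, key=lambda p: p[1])
    k = len(ordered) // 10
    deciles = [ordered[i * k:(i + 1) * k] for i in range(10)]
    middle = [p for d in deciles[1:9] for p in d]
    return (rng.sample(deciles[0], min(n_low, len(deciles[0])))
            + rng.sample(middle, min(n_mid, len(middle)))
            + rng.sample(deciles[9], min(n_high, len(deciles[9]))))

# Invented example: 100 pairs with similarities spread over [0, 1).
pairs = [(i, i / 100) for i in range(100)]
sample = sample_pairs_by_decile(pairs, 2, 4, 2)
print(len(sample))  # -> 8
```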
        <p>
          Users were presented ten news article pairs, of which one was an attention check.2 Much
like in the study by Tintarev and Masthoff [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], users were asked to assess the similarity of each
news article pair on a 5-point scale (cf., Figure 2). As an extension to other studies, users also
indicated their familiarity with each article and the level of confidence in their assessment (all
5-point scales). Moreover, we asked users to what extent they employed different features in
their similarity judgments (5-point scales). Finally, we inquired about a user’s frequency of news
consumption and their demographics.
3.3.3. Participants
Participants were recruited from Amazon MTurk. Since we used a database of news articles
that concerned American politics, we only recruited U.S.-based participants. They had a HIT
acceptance rate of at least 98% and at least 500 completed HITs. A total of 401 participants
completed our study, with a median completion time of 6 minutes and 35 seconds; they were
compensated with 0.5 USD.
        </p>
        <p>
          Only 241 participants (60.1%) passed our attention check, which was slightly higher than in
[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. This resulted in 2,169 usable similarity judgments; only 21 pairs were presented twice, to
different users. This final sample (53% male) mostly consisted of age groups 25-34 (33.2%) and
35-44 (30.3%); 66% of participants reported visiting news websites at least once a week (24.9% did so
daily), while 50 participants rarely read online news.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>For our analyses, we first examined the use of different news features, assessing different
similarity functions through human judgments (RQ1). Furthermore, we predicted human
similarity judgments using model-based approaches (RQ2). In addition, we compared our results
for RQ1-RQ2 with the movie and recipe domains (RQ3).</p>
      <sec id="sec-4-1">
        <title>4.1. News Features Usage</title>
        <p>We examined to what extent participants used different features to assess similarity between
news articles (RQ1). Figure 3A summarizes the results for participants who passed the attention
check. On average, an article’s title (M=4.2) and body text (M=4.4) were considered most often,
while sentiment (M=3.7) and an article’s subcategory (M=3.2) saw above-average use. In contrast,
author features, publication date, and an article’s image were rarely used to assess similarity. Figure
3B shows that all differences between features were significant (all p &lt; 0.01), based on a one-way
ANOVA on feature usage and Tukey’s HSD post-hoc analysis.</p>
        <p>
          With regard to [RQ3], most findings were compatible with the movie and recipe domains.
The use of title and body text was also observed for recipes (i.e., ingredients and directions),
while plot and genre features were used in movies [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The use of the genre cue in movies was
also more frequent than the use of a news article’s subcategory.
        </p>
        <sec id="sec-4-1-1">
          <title>2Users were asked for this pair to only answer ‘5’ on all answer scales.</title>
          <p>[Figure 3: (A) reported usage of each information cue (e.g., title, body text, author, image); (B) pairwise comparisons of cue usage.]</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Grounding Similarity Functions in Human Similarity Judgments</title>
        <p>
          4.2.1. Descriptive Statistics
To address [RQ1], we compared feature-specific similarity scores of presented news article pairs
to similarity ratings given by users. Figure 4 contrasts the similarity scores, averaged across
all similarity functions, with the users’ similarity judgments, averaged per user. As shown,
there was a discrepancy between the similarity inferred by the similarity functions, which was
distributed around a mean value of 0.39 (SD = 0.085), and the similarity judgments of users,
which were lower (M = 0.18, SD = 0.24). This suggests that users were less likely to judge
two news articles to be similar, compared to our similarity functions.
4.2.2. Feature-specific Comparison in News
Table 3 outlines the Spearman correlations between similarity functions and the similarity
judgments given by users. It differentiates between the results of our own user study (i.e.,
‘News Articles’), and that of [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] for the movie and recipe domains, allowing for cross-domain
comparisons (discussed later).
        </p>
        <p>
          We first discuss the results for the news domain and focus on users who passed the attention
check. Table 3 shows that most correlations were modest (all r &lt; 0.3), suggesting that the news
similarity functions did not fully reflect a user’s judgment. Among all features, we found that
full body text similarity (BodyText:TFIDF) correlated most strongly with user judgments (r = 0.29,
p &lt; 0.001); body text was also the most commonly used feature in earlier news recommendation
scenarios [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Although some users might have only inspected an article’s first 50 words (cf.,
the text visible in Figure 2; on average 15% of the full body text), the BodyText:50TFIDF metric
had a much lower correlation (r = 0.14, p &lt; 0.001).
        </p>
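        <p>
          Spearman correlations such as those reported in Table 3 can be computed from raw scores as follows. This is a plain-Python sketch of Spearman's rank correlation (Pearson correlation of average ranks, with ties shared), shown for illustration only.
        </p>

```python
def rank(values):
    """Assign 1-based average ranks, sharing ranks within tie groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

print(spearman([1, 2, 3, 4], [1, 2, 4, 3]))  # one swapped pair -> rho ≈ 0.8
```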
        <p>Among all image similarity metrics, embeddings (Image:EMB) had the highest correlation
with user judgments (r = 0.17*), which was nonetheless modest. This function, along with
BodyText:TFIDF, Author:Jacc, AuthorBio:TFIDF, and Subcat:Jacc, seemed to best represent user
similarity judgments in news.</p>
        <p>Table 3 highlights that other functions did not represent a user’s similarity judgment in news,
such as sentiment (BodyText:Sent, r = −0.02). Surprisingly, although most users considered
titles when assessing similarity, their judgments hardly correlated with any distance-based title
similarity function (all r &lt; 0.1). Note that Title:LDA and BodyText:LDA might have suffered
from insufficient latent topic information, as their correlations were close to zero.</p>
        <p>
          Finally, because similarity ratings correlated positively with familiarity scores (r = 0.27*),
we tested whether only including judgments for familiar news article pairs (i.e., with scores
of 4 or higher) affected the results in Table 3. Doing so increased correlations by 1 to 4
percentage points for most features, and most changes were statistically significant (e.g.,
BodyText:TFIDF would increase from 0.29 to 0.33).
4.2.3. Cross-domain Comparison
Using data from [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], we compared the results in Table 3 across the news, recipe, and movie
domains. Correlations between human judgments and similarity functions in the news domain
were shown to be much weaker than in the recipe domain and, to a lesser extent, the movie
domain. This applied to most features, including title, image, and body text.
        </p>
        <p>
          Two notable differences lie in title and image-based functions. Whereas the reported
correlations for title features were weak in news (r &lt; 0.1), the distance-based title metrics showed
strong correlations with user judgments for recipes (r ≈ 0.5). With regard to image-specific
similarity, functions in news were only weakly correlated with human judgments (r = 0.17),
while they were more representative for recipes (r = 0.44) and movies (r = 0.22).
        </p>
        <p>[Table 3: Spearman correlations between feature-specific similarity functions and human similarity judgments, for news articles (this study), and for recipes and movies (obtained from
          <xref ref-type="bibr" rid="ref7">7</xref>
          ). Correlations are reported separately for users who passed the attention check and for all users. *p &lt; 0.05; **p &lt; 0.01; ***p &lt; 0.001.]</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Predicting Human Similarity Judgments</title>
        <p>Going beyond simple correlation analyses, we also sought to predict similarities with these
functions using state-of-the-art machine learning methods (RQ2), as used in recommender
systems research. This helped us to understand each feature’s importance, beyond the
feature-specific correlations in Table 3.
4.3.1. Model Evaluation and Cross-Domain Comparison
To determine model performance, standard metrics such as Root Mean Square Error (RMSE), R²,
and Mean Absolute Error (MAE) were used. Five-fold cross-validation was used as an evaluation
protocol. Furthermore, by applying grid search on a validation set from the training data, the
optimal hyper-parameters for each model were found.</p>
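The three evaluation metrics can be made concrete with a short sketch (pure Python, hypothetical numbers; the authors' actual pipeline is not reproduced here):

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical similarity judgments (1-5 scale) and model predictions
y_true = [1.0, 2.0, 2.0, 3.0, 4.0]
y_pred = [1.4, 1.8, 2.5, 2.9, 3.6]
```

In five-fold cross-validation these metrics are computed on each held-out fold and averaged; grid search selects the hyper-parameters that minimize the validation error.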
        <p>
          The models performed significantly better than a random baseline (p &lt; 0.05). Table 4 (i) also compares our results to findings from
the recipe and movie domains (RQ3), adapted from [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Most notably, we found that Lasso was
the best performing model for news, while Ridge outperformed other models in the recipe and movie
domains. Moreover, the news model (R² = 0.33) was less accurate than the recipe model
(R² = 0.51), while its accuracy was comparable to that of the movie model (R² = 0.36).
This suggested that the similarity functions adapted from [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] were less representative for user
similarity judgments in the news domain.
4.3.2. Feature-specific Models and User Characteristics
To further explore (RQ2), Table 4 (ii) describes the performance of feature-specific models.
To compare our findings to other domains, Ridge regression was used to combine multiple
similarity functions per feature, while linear regression was used for features with a single
function. Although the representativeness of the different BodyText similarity functions varied
(cf., Table 3), BodyText was the best-predicting feature, even outperforming the All features model.
        </p>
        <p>Finally, we included user characteristics and demographics in our Ridge model. We tested
the impact of each additional feature separately, as well as simultaneously. Table 4 (iii) outlines
that the addition of user characteristics (e.g., news consumption frequency) hardly affected the
model’s predictive quality. A model that included the user’s age reported the lowest RMSE,
but this decrease (from 0.9141 in (i) to 0.9081 in (iii)) was not statistically significant
according to a Wilcoxon Rank-Sum test.</p>
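The Wilcoxon rank-sum comparison used above rests on the Mann-Whitney U statistic. A minimal sketch follows, assuming two samples of per-fold errors (illustrative only; a real test would still compare U against its null distribution to obtain a p-value):

```python
def rank_sum_u(a, b):
    """Mann-Whitney U statistic for sample a vs. b (ties get average ranks)."""
    combined = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg
        i = j + 1
    # Rank sum of sample a (its items occupy indices 0 .. len(a)-1)
    r_a = sum(ranks[i] for i in range(len(a)))
    return r_a - len(a) * (len(a) + 1) / 2
```

U ranges from 0 to len(a) * len(b); values near the middle of that range indicate the two error samples are statistically indistinguishable, as was the case for the age-augmented model here.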
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>This work contributes to the literature on similarity estimates, a central theme in
recommender systems research, with a particular focus on the news domain. It is among the
first to study news similarity representations in detail, making the following contributions:
1. Determining which features are considered by users when judging similarity between
news articles.
2. Assessing how feature-specific similarity functions relate to similarity judgments.
3. Predicting similarity judgments of users through machine learning models.
4. Comparing our results to findings from the movie and recipe domains.</p>
      <p>
        We have taken a first step towards designing representative feature-specific similarity functions
for news, going beyond other studies that focused on overall similarity or just a single feature
[
        <xref ref-type="bibr" rid="ref2 ref28">28, 2</xref>
        ].
      </p>
      <sec id="sec-5-1">
        <title>5.1. Feature-specific Similarity</title>
        <p>
          We have assessed the value of feature-specific similarity functions in the news domain, adapted
from recommender literature in the news, movie, and recipe domains [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We find that most
feature-specific similarity functions only partially reflect a user’s similarity judgment, yielding
modest correlations. To best reflect user perceptions, we suggest that content-based news
recommender systems should exploit the body text, supported by image embeddings, article
categories, and the author. The representativeness of body text is grounded in the reported
feature use, as well as consistent with previous studies on news retrieval [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In contrast,
although users used a news article’s title in their similarity judgments, we have found title-based
similarity functions to be hardly representative for these judgments. The weak correlations
could be attributed to the relatively ‘wordy’ titles of news articles (cf., Table 1), compared to
the other domains in scope. At the similarity function level, it is possible that the string-based
functions do not capture more subtle similarities between news articles, for example if two
headlines describe an identical news event, but from a different news angle. Moreover, the
insignificant correlation between Title:LDA and a user’s similarity judgment suggests that
word-based similarity is unrelated to how users perceive a pair of news articles.
        </p>
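The string-based title functions discussed above (e.g., Title:LV) are typically variants of a normalized edit distance. A minimal sketch of such a metric, assuming plain Levenshtein distance (the paper's exact normalization may differ):

```python
def levenshtein(s, t):
    """Edit distance between strings s and t via dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

def title_similarity(a, b):
    """Normalized Levenshtein similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

# Hypothetical headlines about the same event, phrased differently:
# character-level edit distance stays high, so the score stays low,
# illustrating the weakness of string-based title metrics noted above.
low = title_similarity("Senate passes budget bill", "Budget deal clears Senate")
```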
        <p>In terms of predicting similarity judgment, we have used machine learning to determine
model accuracy and feature importance, and to examine the predictive value of additional user
characteristics. We find that the addition of user characteristics and demographics in our models
does not significantly improve the accuracy indicators, indicating there is little variance across
users. In terms of similarity modeling, these findings suggest that the main focus should be on
leveraging a news article’s BodyText, while other features should only be used if their similarity
functions are better tailored to the news domain.</p>
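A BodyText similarity of the TF-IDF kind (cf. BodyText:TFIDF) can be sketched as follows; a minimal pure-Python version, assuming whitespace-tokenized article bodies (the authors' exact weighting scheme may differ):

```python
from collections import Counter
from math import log, sqrt

def tfidf_vectors(docs):
    """TF-IDF weight dict per tokenized document (raw tf x log(N/df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical tokenized article bodies
docs = [["tax", "reform", "vote"],
        ["tax", "reform", "debate"],
        ["football", "match"]]
vecs = tfidf_vectors(docs)
```

Pairs of articles sharing rare terms score high; articles with disjoint vocabularies score zero, which is what makes body text a strong signal for perceived similarity.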
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Cross-domain Comparisons</title>
        <p>
          We have also explored cross-domain differences. In line with [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], we have found further evidence
that different domains call for different similarity functions. For one, the Ridge regression model
for news is found to be somewhat less accurate than those for movies and recipes, although an R² of 0.33
is reasonable. However, the MAE of 0.75 for a measure that is scaled from 1 to 5 suggests that
there is room for improvement, which could be attributed to the generally low similarity scores given by users.
        </p>
        <p>It seems that text-based similarity (i.e., movie plot, recipe directions, news’ body text) is useful
in most domains in scope, given an appropriate similarity function. BodyText features are listed
among the strongest correlations, as well as among the strongest predictors. In contrast, the title
and image features are less representative of similarity judgments in news and movies, compared
to the recipe domain. Whereas only image embeddings seem to be somewhat representative of
news similarity assessments, image features are more useful in determining recipe similarity.</p>
        <p>
          We have observed that the model accuracy reported in Table 4 is comparable to findings
from the movie domain (cf., [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]). This is despite the differences in the similarity scores given across
domains (which are much lower for news; see Figure 4), and the weaker correlations reported in
Table 3. All in all, the news domain seems to require similarity functions that are less
‘taste-related’ than those for movies or recipes, but further research is needed to develop more accurate ones,
possibly by also using psychological theories on similarity [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Limitations &amp; Future Work</title>
        <p>A notable limitation of our approach is the use of a single dataset, which only comprises
political articles. It is possible that the relation between similarity judgments and
feature-specific similarity functions would be affected when employing additional main categories. For
example, ‘name-dropping’ sports teams in a news article title might result in a higher feature
importance for news article titles, compared to political articles. Furthermore, the news
articles shown to users were a few years old, which might have reduced familiarity levels and,
in turn, decreased similarity ratings.</p>
        <p>
          Another shortcoming is that it is not entirely clear on what grounds users have made their
similarity judgments. We have asked them a single question on similarity, while some other
studies have also used multiple questionnaire items [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. However, our inquiry into participants’ reported
feature use (RQ1) reveals part of the underlying cognitive process and suggests
which features are worth optimizing for; to our knowledge, this is a new finding in itself.
        </p>
        <p>
          For future studies, we suggest developing and assessing feature-specific similarity functions
that unambiguously apply to the news domain. For example, similarity functions that leverage
named entities (e.g., ‘Donald Trump’ or ‘France’) could help to manage user expectations about
inter-article similarity. Furthermore, it would be most useful to test our assertions in an online
study where news article recommendations are evaluated, much like the work of [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ].
        </p>
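The named-entity suggestion above could, for instance, take the shape of a Jaccard similarity over extracted entity sets. A minimal sketch with hypothetical entity sets (the entity extraction step itself is assumed):

```python
def jaccard(a, b):
    """Jaccard similarity between two sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical named-entity sets extracted from two article bodies
entities_a = {"Donald Trump", "France", "NATO"}
entities_b = {"France", "NATO", "Emmanuel Macron"}
sim = jaccard(entities_a, entities_b)  # 2 shared / 4 total = 0.5
```

Because named entities anchor articles to concrete actors and places, such a function could help manage user expectations about inter-article similarity more transparently than bag-of-words overlap.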
        <p>Above all, we would like to emphasize that the current study serves as a first step. Based on these
findings, future studies can further develop feature-specific similarity functions for the news
domain, as this paper provides insight into which types of functions and features are successful,
and which ones are not.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was supported by industry partners and the Research Council of Norway with
funding to MediaFutures: Research Centre for Responsible Media Technology and Innovation,
through the Centres for Research-based Innovation scheme, project number 309339.</p>
      <p>[28] (cont.) Society for Information Science 51 (2000) 793–804.
[29] NIST, TREC Washington Post Corpus, 2019. Data retrieved from https://trec.nist.gov/data/wapost/.
[30] L. Yujian, L. Bo, A normalized Levenshtein distance metric, IEEE Transactions on Pattern Analysis and Machine Intelligence (2007).
[31] M. A. Jaro, Advances in record-linkage methodology as applied to matching the 1985 census of Tampa, Florida, Journal of the American Statistical Association (1989).
[32] G. Kondrak, N-gram similarity and distance, in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2005.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Karimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jugovac</surname>
          </string-name>
          ,
          <article-title>News recommender systems-survey and roads ahead</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>54</volume>
          (
          <year>2018</year>
          )
          <fpage>1203</fpage>
          -
          <lpage>1227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tintarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Masthof</surname>
          </string-name>
          ,
          <article-title>Similarity for news recommender systems</article-title>
          ,
          <source>in: In Proceedings of the AH'06 Workshop on Recommender Systems and Intelligent User Interfaces, Citeseer</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>A. S. Das</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Datar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Garg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Rajaram</surname>
          </string-name>
          ,
          <article-title>Google news personalization: scalable online collaborative filtering</article-title>
          ,
          <source>in: Proceedings of the 16th international conference on World Wide Web</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>271</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Fortuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fortuna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mladenić</surname>
          </string-name>
          ,
          <article-title>Real-time news recommender system</article-title>
          ,
          <source>in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>
          , Springer,
          <year>2010</year>
          , pp.
          <fpage>583</fpage>
          -
          <lpage>586</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>T. De Pessemier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Courtois</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Vanhecke</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Van Damme</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Martens</surname>
          </string-name>
          , L. De Marez,
          <article-title>A user-centric evaluation of context-aware recommendations for a mobile news service</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          <volume>75</volume>
          (
          <year>2016</year>
          )
          <fpage>3323</fpage>
          -
          <lpage>3351</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Elbadrawy</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <article-title>Karypis, User-specific feature-based similarity models for top-n recommendation of new items</article-title>
          ,
          <source>ACM Transactions on Intelligent Systems and Technology (TIST) 6</source>
          (
          <issue>2015</issue>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Trattner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <article-title>Learning to recommend similar items from human judgments, User Modeling and User-Adapted Interaction 30 (</article-title>
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Özgöbek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Gulla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Erdur</surname>
          </string-name>
          ,
          <article-title>A survey on challenges and methods in news recommendation</article-title>
          ,
          <source>in: International Conference on Web Information Systems and Technologies</source>
          , volume
          <volume>2</volume>
          , SCITEPRESS,
          <year>2014</year>
          , pp.
          <fpage>278</fpage>
          -
          <lpage>285</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Winecof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brasoveanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Casavant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Washabaugh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Graham</surname>
          </string-name>
          ,
          <article-title>Users in the loop: a psychologically-informed approach to similar item retrieval</article-title>
          ,
          <source>in: Proceedings of the 13th ACM Conference on Recommender Systems</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>52</fpage>
          -
          <lpage>59</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Richardson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smeaton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <article-title>Using WordNet as a knowledge base for measuring semantic similarity between words</article-title>
          ,
          <source>Technical Report Working Paper CA-1294</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>An information-theoretic definition of similarity</article-title>
          , in: ICML, volume
          <volume>98</volume>
          ,
          <year>1998</year>
          , pp.
          <fpage>296</fpage>
          -
          <lpage>304</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Takale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Nandgaonkar</surname>
          </string-name>
          ,
          <article-title>Measuring semantic similarity between words using web documents</article-title>
          ,
          <source>International Journal of Advanced Computer Science and Applications (IJACSA) 1</source>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lv</surname>
          </string-name>
          , T. Moon,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kolari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>Learning to model relatedness for news recommendation</article-title>
          ,
          <source>in: Proceedings of the 20th International Conference on World Wide Web</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Billsus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          ,
          <article-title>User modeling for adaptive news access, User Modelling</article-title>
          and
          <string-name>
            <surname>User-Adapted Interaction</surname>
          </string-name>
          (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zanker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          , G. Friedrich,
          <source>Recommender systems: an introduction</source>
          , Cambridge University Press,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>I.</given-names>
            <surname>Cantador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <article-title>Semantic contextualisation in a news recommender system</article-title>
          ,
          <source>in: Workshop on Context-Aware Recommender Systems at the RecSys 2009: ACM Conference on Recommender Systems</source>
          , ACM, New York,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lommatzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kille</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hopfgartner</surname>
          </string-name>
          , L. Ramming, NewsREEL multimedia at MediaEval 2018:
          <article-title>News recommendation with image and text content</article-title>
          ,
          <source>in: CEUR Workshop Proceedings</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rorvig</surname>
          </string-name>
          ,
          <article-title>Images of similarity: A visual exploration of optimal similarity metrics and scaling properties of trec topic-document sets</article-title>
          ,
          <source>Journal of the American Society for Information Science</source>
          <volume>50</volume>
          (
          <year>1999</year>
          )
          <fpage>639</fpage>
          -
          <lpage>651</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>F.</given-names>
            <surname>Goossen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Ijntema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Frasincar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hogenboom</surname>
          </string-name>
          , U. Kaymak,
          <article-title>News personalization using the CF-IDF semantic recommender</article-title>
          , in: ACM International Conference Proceeding Series,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bogers</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. Van Den Bosch</surname>
          </string-name>
          ,
          <article-title>Comparing and evaluating information retrieval algorithms for news recommendation</article-title>
          ,
          <source>in: RecSys'07: Proceedings of the 2007 ACM Conference on Recommender Systems</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Billsus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          ,
          <article-title>Personal news agent that talks, learns and explains</article-title>
          ,
          <source>in: Proceedings of the International Conference on Autonomous Agents</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>B. P.</given-names>
            <surname>Chamberlain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shiebler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sedhain</surname>
          </string-name>
          ,
          <string-name>
            <surname>M. M. Bronstein</surname>
          </string-name>
          ,
          <article-title>Tuning word2vec for large scale recommendation systems</article-title>
          ,
          <source>in: Fourteenth ACM Conference on Recommender Systems</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>732</fpage>
          -
          <lpage>737</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yan</surname>
          </string-name>
          , T. Liu,
          <article-title>A bert-based ensemble model for chinese news topic prediction</article-title>
          ,
          <source>in: Proceedings of the 2020 2nd International Conference on Big Data Engineering</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Knox</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. Padmanabhan,</surname>
          </string-name>
          <article-title>SCENE: A scalable two-stage personalized news recommendation system</article-title>
          ,
          <source>in: SIGIR'11 - Proceedings of the 34th International ACM SIGIR Conference</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>S.</given-names>
            <surname>Soroka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Young</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Balmas</surname>
          </string-name>
          ,
          <article-title>Bad News or Mad News? Sentiment Scoring of Negativity, Fear, and Anger in News Content, Annals of the American Academy of Political and Social Science (</article-title>
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Pon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Cardenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Buttler</surname>
          </string-name>
          , T. Critchlow,
          <article-title>Tracking multiple topics for finding interesting articles</article-title>
          ,
          <source>in: Proceedings of the ACM SIGKDD International Conference</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Harper</surname>
          </string-name>
          ,
          <article-title>Judging similarity: a user-centric study of related item recommendations</article-title>
          ,
          <source>in: Proceedings of the 12th ACM Conference on Recommender Systems</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>288</fpage>
          -
          <lpage>296</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>C.</given-names>
            <surname>Watters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Rating news documents for similarity</article-title>
          ,
          <source>Journal of the American</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>