<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Knowledge-aware and Conversational Recommender Systems (KaRS) &amp; 5th Edition of Recommendation in Complex Environments (ComplexRec) Joint Workshop @ RecSys 2021</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Improving Media Content Recommendation with Automatic Annotations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ismail Harrando</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raphaël Troncy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>EURECOM</institution>, France
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>With the immense growth of media content production on the internet and increasing wariness about privacy, content-based recommendation systems offer the possibility of promoting media to users (e.g. posts, videos, podcasts) based solely on a representation of the content, i.e. without using any user-related data such as views and, more generally, interactions between users and items. In this work, we study the potential of using off-the-shelf automatic annotation tools from the Information Extraction literature to improve recommendation performance without any extra cost of training, data collection or annotation. We experiment with how these annotations can improve recommendations on two tasks: the traditional user history-based recommendation, as well as a purely content-based recommendation evaluation. We pair these automatic annotations with the manually created metadata and we show that Knowledge Graphs, through their embeddings, constitute a great modality to seamlessly integrate this extracted knowledge and provide better recommendations. The evaluation code, as well as the enrichment generation, is available at https://github.com/D2KLab/ka-recsys.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Content-based Recommendation</kwd>
        <kwd>Knowledge Graph</kwd>
        <kwd>Automatic Annotation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>As user engagement with content online has become a crucial element in most if not all content-providing multimedia platforms – i.e. retaining a user’s interest in the provided content and maximizing their time watching/reading/listening to the content – the role of recommender systems cannot be overstated in shaping and improving the user experience when it comes to consuming and interacting with said content, as it helps funneling the usually overwhelming amount of data into a condensed, targeted and interesting selection of items that the user is most likely to find enjoyable and interesting. Traditionally, recommendation systems either use collaborative filtering, i.e. leveraging user statistics and their implicit/explicit feedback (views, likes, watch time) to find items to recommend (the underlying assumption being that people who have similar interests interact with the same items), or provide content-based recommendations, which rely on the content of the item itself to find similar content without any input from the user.</p>
      <p>
        In this paper, we are interested in the second kind of recommendations, which are based solely on the content of the media to recommend. The “content” in content-based can refer to a variety of potential formats: text, image, video, metadata (e.g. tags and keywords) and so on. Typically, a representation of such content is extracted or learned, and the task of recommendation is then cast as a content similarity/retrieval task: given the representation of an item of interest (e.g. the video the user is currently watching) and the representations of all items already existing in the catalog, we want to find the items which have the highest similarity to the item of interest. While many varieties of this approach exist (including ones that target other metrics such as serendipity [<xref ref-type="bibr" rid="ref1">1</xref>]), most can be framed as finding the best content representation with which to compare items.
      </p>
      <p>
        Content-based recommendations are particularly interesting in the case of the cold start problem, where there is no feedback from users (no interactions to base the recommendations on), and in cases where it is hard to collect such feedback (anonymity, privacy). Since we aim at recommendations that rely only on the content, we leverage several Information Extraction techniques to automatically annotate each item; these annotations can then be used to generate a KG connecting all content in the media catalog. Given the versatility of Knowledge Graphs, they allow us to combine these automatic annotations with already existing metadata seamlessly. To
validate this approach, we focus on studying the TED
dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], an open-sourced multimedia dataset that
offers the unique possibility of evaluating
recommendations based on both the content only (“related videos”, as
curated by human editors) and the user preferences based
on their interactions history. We demonstrate that our
approach improves the recommendation performance
on both tasks, and that KGs are a reliable framework to
integrate external knowledge into the task of
recommendation.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The TED Dataset: The TED dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is a multimodal dataset which contains the audiovisual recordings of the TED talks downloaded from the official website1, which sums up to 1149 talks, alongside metadata fields and user profiles with rating and commenting interactions. The metadata fields are as follows: identifier, title, description, speaker name, TED event at which the talk is given, transcript, publication date, filming date, and number of views. For nearly every video, the dataset contains a list of user interactions (marked by the action of “Adding to favorites”), as well as up to three “related videos”, which are picked by the editorial staff to be recommended to the user to watch next. What is unique about this dataset is that it provides two sorts of ground truths for the recommender system use-case, which we can formulate in these
two tasks:
• Task 1 - Personalized (user-specific) recommendations: based on a user’s list of favorite talks, the task is to predict what they would watch next. An evaluation dataset can thus be created using a “leave one out” protocol, i.e. removing one interaction from the user’s list of favorites and measuring how successful a method is in predicting the omitted item. Most recommender system datasets contain similar information, i.e. what items a user has actually interacted with, based on their viewing/interaction history. This task is usually handled with collaborative filtering methods (e.g. [<xref ref-type="bibr" rid="ref5">5</xref>]), but it is still interesting for content-based recommendation in the case of the cold start problem: when a new talk is added to the platform, how can we recommend it to other users? The most common approach is to use its content to recommend it to users who previously liked similar content.
• Task 2 - General (content-based) recommendations: to the best of our knowledge, this is the only dataset which offers ground truth for multimedia recommendations based on content only, referred to as “related videos”, manually annotated by TED editorial staff. These are supposed to reflect subjective topical relatedness between talks in the corpus. Performance on this task reflects the model’s ability to recommend content either to users without an interaction history (new users, visitors without accounts) or for new videos (that have not yet received any interactions). We note that in the ground truth, some talks are associated with three related talks, some with two, and some with only one. We account for this in the evaluation metrics.
      </p>
      <p>
        Previous works have studied specific aspects of this
dataset such as sentiment analysis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], estimating trust
from comments polarity and ratings to improve
recommendation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], or studying hybrid recommender systems
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In this work, we focus our interest on this dataset as it offers a unique possibility of evaluating content-based recommendation using both real user feedback and hand-picked recommendations, as the latter has not been considered in any of the published works on this dataset to the best of our knowledge.
      </p>
      <p>
        We also note that, while the dataset is multimodal (TED
Talks Videos are also available), our work does not tackle
visual information extraction, mainly because TED Talks
are not visually diverse (mostly speakers and audience
wide shots). This is however a promising direction of
work that has been tackled in previous works [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Graph-based Recommender Systems: Given the recent growing interest in Knowledge Graphs and their applications, there is a growing literature on the techniques and models that can be leveraged to build “knowledge-aware” recommender systems. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] present such an approach to bring external knowledge to content-based recommendation, identifying two main approaches to what they call “Semantics-aware Recommender Systems” for tackling traditional problems of content-based recommender systems: Top-down Approaches, which incorporate knowledge from ontological resources such as WordNet [11] and encyclopedic knowledge sources such as Wikipedia2 to enrich the item representations with external world and linguistic knowledge, and Bottom-up Approaches, which use linguistic resources such as what we commonly refer to as distributional word representations, e.g. pretrained word embeddings, to avoid the issue of exact matching in traditional content-based systems. They also raise the potential of using a graph structure to discover latent connections among items, which we study in our experiments. [12] offers an extensive survey of Knowledge Graph-based Recommender System approaches, proposing a high-level taxonomy of methods that either use graph embeddings, connectivity patterns (common paths mining), or a combination of the two.
      </p>
      <sec id="sec-2-1">
        <title>1 https://www.ted.com</title>
      </sec>
      <sec id="sec-2-2">
        <title>2 https://en.wikipedia.org/wiki/Main_Page</title>
        <p>In this paper, we only focus on embedding-based methods to study the use of automatic annotations on the performance of recommender systems. Additionally, unlike some previous works, our work does not tackle the two tasks jointly as a learning problem [13], but attempts to show how the same approach can improve the performance on both at the same time.</p>
        <p>3. Approach
The proposed approach builds on using several Information Extraction techniques, namely Topic Modeling (3.1), Named Entity Recognition (3.2), and Keyword Extraction (3.3), to generate high-level descriptors – annotations – of the content of each video in the dataset. Once the annotations are generated for each video, we use them to build a Knowledge Graph connecting the talks by their annotations. This approach also allows us to integrate external metadata if such metadata is available (for our dataset, metadata such as “Tags” and “Themes” are available and will be used). Once the KG is generated, we can use a graph embedding method [14] to generate a fixed-dimensional embedding for each video in the dataset, such that videos having similar annotations are represented in proximity in the embedding space. As a result, we can measure the (cosine) similarity between any two videos’ embeddings as a proxy to their relatedness.</p>
        <p>3.1. Topic Modeling
Topic modeling is a ubiquitously used Information Extraction technique which attempts to find the latent topics in a text corpus. A topic can be roughly defined as a coherent set of vocabulary words that tend to co-appear with high probability in the same documents. When applied to natural language documents, topic models have the ability to find the underlying “themes” in the document collection, such as sport, technology, etc. The literature on topic modeling is rich and diverse, ranging from approaches relying solely on word counts, such as the commonly used LDA [15], to approaches using state-of-the-art representations to embed documents in more meaningful representational spaces [16, 17]. Topics are usually represented by their “top N words” (the N words most likely to appear given a topic). In our dataset, we find topics such as:
• Technology: network, online, computers, digital, google
• Environment: waste, plants, electrical, plastic, battery
• Gaming: games, online, virtual, gamers, penalty
• Health: aids, malaria, drugs, mortality, vaccine</p>
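          <p>The similarity-based retrieval described above can be sketched in a few lines of Python (a minimal illustration assuming precomputed embeddings; the talk IDs and vectors are hypothetical):
```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def most_similar(talk_id, embeddings, k=10):
    # Rank all other talks by cosine similarity to the given talk;
    # the top-ranked talks play the role of the recommendations.
    query = embeddings[talk_id]
    scores = [(other, cosine(query, emb))
              for other, emb in embeddings.items() if other != talk_id]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]
```
          </p>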
        <p>The approach is illustrated in Figure 1.</p>
        <p>We present a selection of automatic annotation techniques and how they are used in our approach in the following subsections.</p>
        <p>For our experiments, we use LDA as it is still commonly used and offers simple yet competitive performance [18].</p>
        <p>We test two aspects of topic modeling that can influence the structure of the graph (the number of nodes and relations added): the number of topics (i.e. the number of topic nodes in the final KG), and the cutoff threshold reflecting the topic model’s confidence in assigning a given topic to a given talk (which affects the number of relations to topic nodes). We report the results in Section 4. For a better performance of the topic modeling task, we preprocess our dataset as follows:</p>
      </sec>
      <sec id="sec-2-3">
        <title>Preprocessing steps</title>
        <p>1. Lowercase all words; 2. Remove short words (less than 3 characters); 3. Remove punctuation; 4. Remove the most frequent words (top 1%).</p>
      </sec>
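      <p>These preprocessing steps can be sketched as follows (a simplified Python illustration; the whitespace tokenization and the top-1% vocabulary cutoff are approximations of the actual pipeline):
```python
import string
from collections import Counter

def preprocess(documents, top_frac=0.01):
    # 1. lowercase, 2. drop words shorter than 3 characters,
    # 3. strip punctuation, 4. drop the most frequent words (top 1%).
    table = str.maketrans("", "", string.punctuation)
    tokenized = []
    for doc in documents:
        tokens = [w.translate(table) for w in doc.lower().split()]
        tokenized.append([w for w in tokens if len(w) >= 3])
    counts = Counter(w for doc in tokenized for w in doc)
    n_top = max(1, int(len(counts) * top_frac))
    frequent = {w for w, _ in counts.most_common(n_top)}
    return [[w for w in doc if w not in frequent] for doc in tokenized]
```
      </p>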
    </sec>
    <sec id="sec-3">
      <title>4. Experiments and Results</title>
      <sec id="sec-3-1">
        <sec id="sec-3-1-1">
          <title>3.2. Named Entity Recognition</title>
          <p>In this section, we explain the experimental protocol and describe the results for the different experiments done to study the impact of using automatic annotations on recommendation performance. We first reintroduce the dataset and how it is going to be used in the rest of this section. Then, we define the metrics we use to measure this performance (Hit Rate and Mean Reciprocal Rank), and the embedding method to use for the rest of the experiments.</p>
          <p>For each automatic annotation considered (i.e. Topics, Named Entities and Keywords), we consider several configurations, with and without the addition of the original metadata from the dataset. Finally, we observe the potential of combining the resulting automatically generated graph embeddings with the textual embeddings of the content, and show how the two complement each other to push the performance even higher.</p>
          <p>Named Entity Recognition is the task of extracting from unstructured text the terms or phrases that refer to named entities, i.e. real-world objects that have proper names and belong to one of several classes: persons, places, organizations, etc. Once extracted, these Named Entities can be used as high-level descriptors for a text content.</p>
          <p>For example, if two talks mention “Einstein” and “Newton”, they may have a similar topic. While this task used to rely on grammatical and hand-crafted features to designate what would constitute a Named Entity (e.g. starts with a capital letter), modern systems do without such hand-crafted features [19, 20], relying instead on combining the learning power of neural networks with annotated corpora of Named Entities.</p>
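          <p>The class and frequency filtering applied to the extracted entities can be sketched as follows (a toy Python illustration; the input pairs stand in for the output of an off-the-shelf NER model such as spaCy’s, and the threshold is the one used in our experiments):
```python
from collections import Counter

KEPT_LABELS = {"PERSON", "LOC", "ORG", "GPE", "FAC", "PRODUCT", "WORK_OF_ART"}

def filter_entities(mentions, min_mentions=10):
    # mentions: list of (surface_form, label) pairs.
    # Keep only the retained entity classes, then keep only entities
    # mentioned more than `min_mentions` times in the corpus.
    kept = [(text, label) for text, label in mentions if label in KEPT_LABELS]
    counts = Counter(text for text, _ in kept)
    return sorted({text for text, _ in kept if counts[text] > min_mentions})
```
          </p>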
          <p>In our experiments, we use SpaCy’s [21] NER model, which uses an architecture that combines a word embedding strategy using subword features with a deep convolutional neural network with residual connections, and which is “designed to give a good balance of efficiency, accuracy and adaptability”3.</p>
          <p>For our experiments, we keep the Named Entities belonging to the following classes: ’PERSON’, ’LOC’ (location), ’ORG’ (organization), ’GPE’ (geopolitical entity), ’FAC’ (facility), ’PRODUCT’, and ’WORK_OF_ART’. We also experiment with the impact of keeping all extracted Named Entities or filtering some out based on frequency, thus altering the number of nodes added to the graph and their relations to the existing talks. We report the results in Section 4.</p>
          <p>4.1. Dataset
As mentioned previously, the TED Talks dataset has two versions of ground truths (or prediction tasks) for recommendation, namely:
• User-specific recommendations that are based on actual users’ interaction history (henceforth referred to as T1)
• Content-based recommendations, which are hand-picked by editors for each talk (henceforth referred to as T2)
For our evaluation purposes, to unify the evaluation for both tasks, we proceed as follows:</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.3. Keyword Extraction</title>
          <p>Similarly to the two previous tasks, Keyword Extraction is the process of extracting terms or phrases that summarize, on a high level, the core themes of a textual document. Generally, the keywords (sometimes called tags) are the terms or phrases that are explicitly mentioned in the text with a high frequency or are somehow relevant to a big portion of it.</p>
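          <p>As a rough illustration of the candidate-generation step of such extractors (the subsequent ranking of candidates by embedding similarity to the document is omitted here), consider:
```python
from collections import Counter

def candidate_ngrams(text, n=2, top_k=5):
    # Collect frequent n-grams as keyword candidates; a KeyBERT-style
    # extractor would then score each candidate by the similarity of
    # its embedding to the embedding of the whole document.
    words = text.lower().split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return [g for g, _ in Counter(grams).most_common(top_k)]
```
          </p>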
          <p>For our experiments, we use KeyBERT [22], an off-the-shelf keyword extractor that is based on BERT [20], which extracts keywords by first finding the frequent n-grams, then measuring the similarity between their embeddings and the embedding of the whole document. We experiment with keeping all keywords or filtering out rare ones and report the results in Section 4.
3 https://spacy.io/universe/project/video-spacys-ner-model
• For T1, we create a test split using the leave-one-out protocol that is commonly used in the literature [23], thus having a “training” set which contains all but one talk that the user interacted with (users with fewer than two interactions are dropped). We create a user embedding by averaging the computed embeddings of all talks in the training set. The top recommendations are then generated by taking the talks which have the highest similarity score (in the same KG embedding space) to the user embedding. We note that no actual training takes place, but this method allows us to leverage actual “historical” user behavior to evaluate purely content-based recommendation.
• For T2, we consider all “related videos” as a test set. In other words, for each talk, we compute its similarity to all other talks in the dataset, and we recommend the talks which score the highest.</p>
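          <p>The T1 protocol pieces (leave-one-out split and averaged user embedding) can be sketched as follows (a minimal Python illustration with hypothetical talk IDs):
```python
def user_embedding(history, embeddings):
    # A user is represented as the average of the embeddings of the
    # talks in their (training) history.
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for talk in history:
        for j, v in enumerate(embeddings[talk]):
            total[j] += v
    return [v / len(history) for v in total]

def leave_one_out(history):
    # Hold out one interaction as the test item and train on the rest
    # (users with fewer than two interactions are dropped upstream).
    return history[:-1], history[-1]
```
          </p>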
        </sec>
        <sec id="sec-3-1-3">
          <title>4.2. Metrics</title>
          <p>To evaluate the performance of our method, we use two commonly used metrics in the recommender systems literature. In the following paragraphs, $N$ is the number of talks in the dataset, $U$ is the number of users with at least 2 interactions in their history, $K$ is the number of (ordered) model recommendations to consider (we picked $K = 10$ in our results), $t$ is a talk ID (which maps to its embedding), and $u$ is a user ID (which maps to its embedding, i.e. the average of the embeddings of all talks in the user’s history).</p>
          <p>$r_i(x)$ is the $i$-th recommendation by our model ($x$ being a user ID for T1 and a talk ID for T2). $hit(t, x) = 1$ if the talk $t$ is indeed in the ground truth for $x$, otherwise it is 0. $rel(t)$ is the number of related talks in T2 (which can be 1, 2 or 3). $rank(t, x)$ is the rank of talk $t$ in the suggested recommendations for talk/user $x$ by descending similarity score.</p>
          <p>Hit Rate (HR@K): A simple metric to quantify the probability of an item in the ground truth being among the top-K suggestions produced by the system. For T1, this means that the left-out item from the user history must be among the $K$ most similar talks to the user embedding (as defined above). For T2, this means that the talk that was manually picked by editors is among the $K$ most similar talks in the embedding space.</p>
          <p>For T1, denoting by $t_u$ the left-out talk of user $u$, we get the formula:</p>
          <p>$HR@K = \frac{1}{U} \sum_{u=1}^{U} \sum_{i=1}^{K} hit(t_u, r_i(u))$</p>
          <p>For T2, we normalize the counting of hits to account for the variance of the number of talks in the ground truth, so that the Hit Rate is 1 at best (i.e. when all related talks in the ground truth are included in the system’s recommendations):</p>
          <p>$HR@K = \frac{1}{N} \sum_{t=1}^{N} \frac{1}{rel(t)} \sum_{i=1}^{K} hit(r_i(t), t)$</p>
          <p>Mean Reciprocal Rank (MRR@K): Similarly to HR@K, this metric also measures the probability of having ground truth recommendations among the system’s predictions, but it also accounts for the rank (order) of the prediction: the closer it is to the top of the predictions, the better. For T1 we get the formula:</p>
          <p>$MRR@K = \frac{1}{U} \sum_{u=1}^{U} \sum_{i=1}^{K} \frac{hit(t_u, r_i(u))}{rank(t_u, u)}$</p>
          <p>For T2, and again to account for the varying number of talks in the ground truth, we slightly alter the previous formula so that it is equal to 1 if all related talks occupy the top spots in the system predictions:</p>
          <p>$MRR@K = \frac{1}{N} \sum_{t=1}^{N} \frac{1}{\sum_{j=1}^{rel(t)} 1/j} \sum_{i=1}^{K} \frac{hit(r_i(t), t)}{rank(r_i(t), t)}$</p>
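          <p>A small Python sketch of the T2 variants of these metrics, following the normalizations described above (the recommendation lists and ground truths are hypothetical):
```python
def hit_rate_at_k(recommendations, ground_truth, k=10):
    # Normalized so that the score is 1 when all related talks
    # appear among the top-k recommendations.
    hits = sum(1 for talk in recommendations[:k] if talk in ground_truth)
    return hits / len(ground_truth)

def mrr_at_k(recommendations, ground_truth, k=10):
    # Sum of reciprocal ranks of the hits, normalized so that the
    # score is 1 when the related talks occupy the very top spots.
    score = sum(1.0 / (i + 1)
                for i, talk in enumerate(recommendations[:k])
                if talk in ground_truth)
    ideal = sum(1.0 / j for j in range(1, len(ground_truth) + 1))
    return score / ideal
```
          </p>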
        </sec>
        <sec id="sec-3-1-4">
          <title>4.3. Evaluation Protocol</title>
          <p>The protocol is summarized in Figure 1. For each of the studied automatic annotations, we start by running our automatic annotation model (as described in Section 3). We then create a Knowledge Graph using, on one hand, the metadata provided in the dataset (each talk is labeled with a “tag” and a “theme”), and our automatically extracted descriptors on the other hand. Once we connect all the talks using these annotations, we run a Graph Embedding method (see Section 4.4) to generate an embedding for each talk in the dataset. These embeddings then serve as representations that we can use to measure similarities for both T1 and T2.</p>
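          <p>The KG-construction step can be sketched as follows (the annotation names and relation labels are illustrative, not the exact schema used in the paper):
```python
def build_triples(talks):
    # talks: mapping from talk id to its annotations, e.g.
    # {"talk1": {"tag": ["science"], "topic": ["topic_3"]}}.
    # Each annotation value becomes a node shared by all talks
    # carrying it, connecting the talks through the graph.
    triples = []
    for talk, annotations in talks.items():
        for relation, values in annotations.items():
            for value in values:
                triples.append((talk, "has_" + relation, value))
    return triples
```
          </p>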
        </sec>
        <sec id="sec-3-1-5">
          <title>4.4. Choice of embeddings</title>
          <p>Throughout the experiments section, we generate a graph
connecting the talks and their annotations. Next, we
compute node embeddings for each talk in our dataset. While
this choice is important for the overall performance of
the final recommendation system, our focus in this paper
is to demonstrate the utility of automatic annotations for
improving content recommendation.</p>
          <p>To bypass the need to select a proper graph embedding
technique and the expensive hyperparameter finetuning
that goes with it for each experiment, we simulate an
ideal scenario where we start from the KG containing the
talks and their manually annotated metadata from the
original TED dataset, i.e. tags and themes. This would
allow us to create a Knowledge Graph that does not contain
any noisy or extraneous annotations. We compute the node embeddings for each talk using a selection of embedding algorithms contained in the Pykg2vec package [24]4, a Python library for learning representations of entities and relations in Knowledge Graphs using state-of-the-art models. We finetune each representation using a small grid-search optimization over learning rate, embedding size and number of training epochs. We also add the One-hot encoding of each talk (each talk is represented by a binary vector which represents the presence or absence of each tag and theme in the metadata) to see if there is an advantage to using graph embeddings over a simple flat representation of the nodes, i.e. whether the graph embeddings encode some semantics between the annotations that a simple binary representation cannot pick up on (e.g. the presence of one tag may be related to some other tag/theme, in other words the annotations are not mutually orthogonal).</p>
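          <p>The One-hot baseline can be sketched as follows (the vocabulary and labels are illustrative):
```python
def one_hot(talk_annotations, vocabulary):
    # Flat baseline: a binary vector marking the presence or absence
    # of each tag/theme from the metadata vocabulary.
    present = set(talk_annotations)
    return [1 if label in present else 0 for label in vocabulary]
```
          </p>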
          <p>We report the results in Tables 1 and 2, for T1 and T2, respectively.</p>
          <p>Tables 1 and 2 report the results for the embedding methods ConvE, DistMult, NTN, Rescal, TransD, TransE, TransH, TransM, TransR, and the One-hot baseline, for T1 and T2 respectively.</p>
          <p>Over the studied configurations of hyperparameters, translation-based methods perform the best empirically, with TransD [25] performing the best (by quite a margin) in both sets of experiments, although further experiments may be needed to determine how much of this performance is due to the nature of the dataset (size, sparsity, etc.).</p>
        </sec>
        <sec id="sec-3-1-6">
          <title>4.5. Automatic annotations</title>
          <p>In this section, we observe the performance gain of the different automatic enrichment methods we have introduced in Section 3.
4.5.1. Topic Modeling
In Table 3, we report on the results of adding the output of the topic modeling annotations to the KG. We evaluate the results as we vary two parameters: the number of topics and the cutoff threshold (the confidence score above which we assign a talk to a given topic).</p>
          <p>Table 3: Results of enriching the metadata KG with topic nodes, varying the number of topics and the assignment threshold (HIT@10 / MRR@10).
T1:
No topics added: 0.0765 / 0.0315
10 topics, threshold 0.03: 0.0612 / 0.0246
10 topics, threshold 0.3: 0.0629 / 0.0262
40 topics, threshold 0.03: 0.0769 / 0.0317
40 topics, threshold 0.3: 0.0782 / 0.0326
100 topics, threshold 0.03: 0.0562 / 0.0220
100 topics, threshold 0.3: 0.0606 / 0.0230
T2:
No topics added: 0.2403 / 0.1542
10 topics, threshold 0.03: 0.2096 / 0.033
10 topics, threshold 0.3: 0.2135 / 0.1294
40 topics, threshold 0.03: 0.2365 / 0.1623
40 topics, threshold 0.3: 0.2475 / 0.1716
100 topics, threshold 0.03: 0.1921 / 0.1196
100 topics, threshold 0.3: 0.2074 / 0.1226</p>
          <p>From this small sample of hyperparameter values, we see that both the number of topics and the cutoff threshold impact the performance of the recommendation on both tasks. Performance improves when raising the cutoff threshold, which implies that assigning topics to talks only when the topic model is highly confident decreases the noisy relations in the graph and decreases the risk of accidentally connecting nodes that are not really topically similar. We also note that under the right configuration, we improve the performance on both metrics for both tasks, whereas in most other configurations the performance suffers. We note that for the number of topics, one should find a value that befits the studied corpus, as the value 40 (inspired by the ground-truth number of themes in the dataset) seems to give the best results. Topic modeling is a task that is generally very sensitive to the initial hyper-parameters and subject to inherent stochasticity, which means that with enough experiments, one is likely to find a configuration of hyperparameters (not only the number of topics and the cutoff threshold but also model-specific hyperparameters such as LDA’s alpha and beta) that yields an even better improvement over the reported results.</p>
          <p>4.5.2. Named Entity Recognition
In Table 4, we report on the results of adding the output of the Named Entity Recognition annotations to the KG. We evaluate the results as we switch between keeping all entities we extracted in the KG and keeping only the ones that appear with a high enough frequency: in our case, we only add nodes for entities that are mentioned more than 10 times in the corpus.</p>
          <p>4.5.3. Keywords Extraction
In Table 5, we report on the results of adding the output of the Keyword Extraction to the KG. We evaluate the results as we add either all extracted keywords or only the ones that the keyword extraction model assigned a high enough confidence score to. In our experiment, a confidence score above 0.3 has been chosen.
Table 5: The results of enriching the metadata KG with Keyword nodes, varying the confidence threshold (HIT@10 / MRR@10).
T1:
No KWs added: 0.0765 / 0.0315
All KWs added: 0.0732 / 0.0295
Only with conf &gt; 0.3: 0.0772 / 0.0322
T2:
No KWs added: 0.2403 / 0.1542
All KWs added: 0.2394 / 0.1523
Only with conf &gt; 0.3: 0.2498 / 0.1593</p>
          <p>4.5.4. Combining annotations
In Table 6, we summarize the results from previous experiments, and we see that the addition of the best configuration from each experimental setting into one KG further improves the results.</p>
          <p>Table 4: The results of enriching the metadata KG with Named Entity nodes (rows: No NEs added, All NEs added, More than 10 mentions; reported for both T1 and T2).</p>
          <p>From these results, we see that adding NEs improves the results of the recommender system, especially after removing rarely appearing Named Entities (either erroneous or superfluous mentions). We also notice that MRR increases significantly with this addition for T2, suggesting that the Named Entities are strong indicators of content relatedness.</p>
          <p>We observe that the automatic annotations overall improve the performance on purely content-based recommendations (T2), but surprisingly, they do so even for user preference-based ones (T1), although the overall performance is still significantly lower. One could argue that this is because users are usually interested in content similar to what they watched previously (in other words, all recommendation tasks are partially content-based). There is a possibility, however, that the user is likely to click on the suggested video in the “related” section, which creates a dependence between the two tasks that is impossible to untangle. This is beyond the scope of this paper, but it is interesting to study the feedback loop of recommendation in such a setting. Finally, the results suggest that Named Entity Recognition contributes the most to the overall performance improvement of the system, as it is the closest to the overall performance and still gives a better absolute MRR score.</p>
          <p>Acknowledgment
This work has been partially supported by the French National Research Agency (ANR) within the ANTRACT (ANR-17-CE38-0010) project, and by the European Union’s Horizon 2020 research and innovation program within the MeMAD (GA 780069) project.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion and future work</title>
      <p>In this work, we showed how combining the knowledge
extracted automatically using Information Extraction
techniques with the representational power of KGs and
their embeddings can improve the performance of
content-based media Recommender Systems without requiring
any supervision or external data collection. We
demonstrated a clear performance improvement as measured on
two tasks: making recommendations based on manually
curated recommendations, and based on actual users'
interaction history. Our results are reproducible using the
code published at https://github.com/D2KLab/ka-recsys.</p>
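The recommendation step this conclusion refers to can be sketched as nearest-neighbour retrieval in the graph-embedding space: items whose KG embeddings are closest (by cosine similarity) to a query item are recommended. The item ids and toy vectors below are made up for illustration, not drawn from our catalog:

```python
import numpy as np

def recommend(item_id, embeddings, k=2):
    """Rank catalog items by cosine similarity to a query item's
    graph embedding; return the top-k, excluding the item itself."""
    ids = list(embeddings)
    mat = np.stack([embeddings[i] for i in ids])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)  # unit vectors
    query = mat[ids.index(item_id)]
    scores = mat @ query                 # cosine similarity to the query
    order = np.argsort(-scores)          # best match first
    return [ids[i] for i in order if ids[i] != item_id][:k]

# Toy 3-d "graph embeddings" for four programs
emb = {"news_a": np.array([1.0, 0.1, 0.0]),
       "news_b": np.array([0.9, 0.2, 0.1]),
       "doc_c":  np.array([0.0, 1.0, 0.2]),
       "doc_d":  np.array([0.1, 0.9, 0.3])}
print(recommend("news_a", emb))  # news_b ranks first: closest in the space
```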
      <p>With these promising results showing actual
improvement over relying only on human annotations, there are
multiple paths for further exploration. First, other
techniques from the information extraction literature can
be investigated, such as entity linking, aspect extraction,
and concept mining, with more exploration to be done
on the techniques already presented (i.e. experimenting
with other approaches for Topic Modeling, Named Entity
Extraction and Keyword Extraction). What is more, as
shown experimentally, the results can vary depending on how
these automatic annotations are processed and filtered
(which changes the structure of the generated KG); this
calls for further study of how to balance the quantity of
automatic annotations against the noise that necessarily
comes with them. Another direction of work is to
further explore models that go beyond simple graph
embeddings. We should also consider combining the results
of such annotations with the original textual context, as
our early experiments suggest that combining both the
low-level features (text embeddings) and high-level ones
(graph embeddings) improves further upon the
performance. Furthermore, as these extracted annotations live
in a KG, multiple methods in the direction of Explainable
Recommendation can be explored in tandem.</p>
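The late-fusion idea mentioned above — combining low-level text embeddings with high-level graph embeddings — can be sketched as weighted concatenation of the two L2-normalised views. The dimensions, vectors, and the weighting scheme below are assumptions for illustration, not the setup used in our early experiments:

```python
import numpy as np

def fuse(text_vec, graph_vec, alpha=0.5):
    """Concatenate an L2-normalised text embedding and graph embedding,
    weighting the two views by alpha and (1 - alpha)."""
    t = text_vec / np.linalg.norm(text_vec)
    g = graph_vec / np.linalg.norm(graph_vec)
    return np.concatenate([alpha * t, (1 - alpha) * g])

text = np.array([0.2, 0.8, 0.1])   # e.g. an embedding of the item description
graph = np.array([0.7, 0.1])       # e.g. a KG embedding of the item node
fused = fuse(text, graph)
print(fused.shape)  # (5,)
```

Because both halves are normalised before weighting, neither view dominates the similarity computation purely by having a larger dimensionality or vector norm.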
      <p>Finally, we would like to test this approach on other
datasets to see if it can be as successful on other
content-centric recommendation problems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kotkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Veijalainen</surname>
          </string-name>
          ,
          <article-title>A survey of serendipity in recommender systems</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>111</volume>
          (
          <year>2016</year>
          )
          <fpage>180</fpage>
          -
          <lpage>192</lpage>
          . URL: https://www.sciencedirect.com/science/ article/pii/S0950705116302763.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kunaver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Požrl</surname>
          </string-name>
          ,
          <article-title>Diversity in recommender systems - a survey</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>123</volume>
          (
          <year>2017</year>
          )
          <fpage>154</fpage>
          -
          <lpage>162</lpage>
          . URL: https://www.sciencedirect. com/science/article/pii/S0950705117300680.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Explainable recommendation: A survey and new perspectives</article-title>
          ,
          <source>Found. Trends Inf. Retr</source>
          .
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pappas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu-Belis</surname>
          </string-name>
          ,
          <article-title>Combining content with user preferences for ted lecture recommendation</article-title>
          ,
          <source>in: 11th International Workshop on Content-Based Multimedia Indexing (CBMI)</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>47</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Schafer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Frankowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Herlocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <source>Collaborative Filtering Recommender Systems</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2007</year>
          , pp.
          <fpage>291</fpage>
          -
          <lpage>324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pappas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu-Belis</surname>
          </string-name>
          ,
          <article-title>Sentiment analysis of user comments for one-class collaborative filtering over ted talks</article-title>
          ,
          <source>in: 36th international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>773</fpage>
          -
          <lpage>776</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Merchant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <article-title>Hybrid trust-aware model for personalized top-n recommendation</article-title>
          ,
          <source>in: Fourth ACM IKDD Conferences on Data Sciences, Association for Computing Machinery</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Pappas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Popescu-Belis</surname>
          </string-name>
          ,
          <article-title>Combining content with user preferences for non-fiction multimedia recommendation: a study on ted lectures</article-title>
          ,
          <source>Multimedia Tools and Applications</source>
          <volume>74</volume>
          (
          <year>2013</year>
          )
          <fpage>1175</fpage>
          -
          <lpage>1197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>Multi-Modal Knowledge Graphs for Recommender Systems</article-title>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , p.
          <fpage>1405</fpage>
          -
          <lpage>1414</lpage>
          . URL: https://doi.org/10.1145/3340531. 3411947.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>de Gemmis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lops</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Musto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Narducci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          ,
          <source>Semantics-Aware Content-Based Recommender Systems</source>
          , Springer US, Boston, MA,
          <year>2015</year>
          , pp.
          <fpage>119</fpage>
          -
          <lpage>159</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] G. A. Miller, Wordnet: A lexical database for English, Commun. ACM 38 (1995) 39–41. URL: https://doi.org/10.1145/219717.219748.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Q. Guo, F. Zhuang, C. Qin, H. Zhu, X. Xie, H. Xiong, Q. He, A survey on knowledge graph-based recommender systems, 2020. URL: https://arxiv.org/abs/2003.00911.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Y. Cao, X. Wang, X. He, Z. Hu, C. Tat-seng, Unifying knowledge graph learning and recommendation: Towards a better understanding of user preference, in: WWW, 2019. URL: https://arxiv.org/abs/1906.04239.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] H. Cai, V. Zheng, K. Chang, A comprehensive survey of graph embedding: Problems, techniques, and applications, IEEE Transactions on Knowledge and Data Engineering 30 (2018) 1616–1637.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent dirichlet allocation 3 (2003) 993–1022.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] F. Bianchi, S. Terragni, D. Hovy, Pre-training is a hot topic: Contextualized document embeddings improve topic coherence, in: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Association for Computational Linguistics, Online, 2021, pp. 759–766. URL: https://aclanthology.org/2021.acl-short.96.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] T. Tian, Z. F. Fang, Attention-based autoencoder topic model for short texts, Procedia Computer Science 151 (2019) 1134–1139. URL: https://www.sciencedirect.com/science/article/pii/S1877050919306283. doi: https://doi.org/10.1016/j.procs.2019.04.161. The 10th International Conference on Ambient Systems, Networks and Technologies (ANT 2019) / The 2nd International Conference on Emerging Data and Industry 4.0 (EDI40 2019) / Affiliated Workshops.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] I. Harrando, P. Lisena, R. Troncy, Apples to apples: A systematic evaluation of topic models, in: RANLP, volume 260, 2021, pp. 488–498.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] I. Yamada, A. Asai, H. Shindo, H. Takeda, Y. Matsumoto, LUKE: Deep contextualized entity representations with entity-aware self-attention, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 2020, pp. 6442–6454. URL: https://aclanthology.org/2020.emnlp-main.523.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] M. Honnibal, I. Montani, S. Van Landeghem, A. Boyd, spaCy: Industrial-strength Natural Language Processing in Python, 2020. URL: https://doi.org/10.5281/zenodo.1212303.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] M. Grootendorst, KeyBERT: Minimal keyword extraction with BERT, 2020. URL: https://doi.org/10.5281/zenodo.4461265.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] S. Rendle, Factorization machines, in: IEEE International Conference on Data Mining, 2010, pp. 995–1000.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] S. Y. Yu, S. Rokka Chhetri, A. Canedo, P. Goyal, M. A. A. Faruque, Pykg2vec: A python library for knowledge graph embedding, 2019.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] G. Ji, S. He, L. Xu, K. Liu, J. Zhao, Knowledge graph embedding via dynamic mapping matrix, in: ACL, 2015.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>