=Paper=
{{Paper
|id=Vol-2283/MediaEval_18_paper_42
|storemode=property
|title=Baseline Algorithms for Predicting the Interest in News Based on Multimedia-Data
|pdfUrl=https://ceur-ws.org/Vol-2283/MediaEval_18_paper_42.pdf
|volume=Vol-2283
|authors=Andreas Lommatzsch,Benjamin Kille
|dblpUrl=https://dblp.org/rec/conf/mediaeval/LommatzschK18
}}
==Baseline Algorithms for Predicting the Interest in News Based on Multimedia-Data==
Andreas Lommatzsch (andreas.lommatzsch@dai-labor.de) and Benjamin Kille (benjamin.kille@dai-labor.de), DAI-Labor, TU Berlin, Berlin, Germany

ABSTRACT

The analysis of images in the context of recommender systems is a challenging research topic. NewsREEL Multimedia enables researchers to study new algorithms on a large dataset. The dataset comprises news items and their numbers of impressions as a proxy for interestingness. Each news article comes with textual and image features. This paper presents the data characteristics and baseline prediction models. We discuss the performance of these predictors and explain the detected patterns.

KEYWORDS

Multimedia, News, Recommender Systems, Image Analysis

1 INTRODUCTION

The NewsREEL Multimedia task supplies participants with different kinds of data. These include low-level features, image labels, and texts. Thus, participants may apply a broad spectrum of machine learning approaches. There is little existing work, as NewsREEL Multimedia represents the first task of its kind. The task's overview paper [3] presents an outline and a detailed description.

In this paper, we study ways to predict the popularity of news items relying on multimedia data. We analyze differences among the publishers and, in particular, how they affect the quality of the predictions.

The remainder of this paper is structured as follows: Section 2 analyzes the dataset. Subsequently, we introduce different predictors (Section 3). Section 4 discusses the baseline results. Finally, Section 5 concludes and suggests directions for future research.
2 DATA DESCRIPTION

The dataset covers thirteen weeks of four selected publishers. Three publishers (17614, 13554, and 39234) make up most of the impressions. Fig. 1 illustrates how the number of impressions is distributed. We recognize the downward trend on the log-log plots. This indicates power-law distributed quantities: few articles collect most of the attention, whereas the majority of articles receives little attention. As a result, the predictors must accurately pick the best articles to perform well.

The automatic annotators have assigned a frequent subset of labels to the articles. For publisher 17614, these include 'stage,' 'suit,' and 'wig.' The dataset provides the labels computed using six different labeler configurations. All annotators rely on ImageNet models trained on publicly available images.

[Figure 1: Distribution of impressions for the three publishers (13554, 17614, 39234) in the training set, shown per week of the training period. The publishers are color-coded according to the legend. The x-axis shows the number of impressions; the y-axis shows the proportion of articles. Both axes are plotted logarithmically.]
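The power-law pattern can be checked numerically: on a log-log rank/frequency plot, power-law data falls on a roughly straight, downward-sloping line. The following is a minimal sketch of such a check (using synthetic Zipf-like counts in place of the real impression data, which is not reproduced here):

```python
import math

def loglog_slope(impressions):
    """Least-squares slope of log(count) vs. log(rank).

    A clearly negative, roughly constant slope corresponds to the
    straight downward line on the log-log plot described above.
    """
    counts = sorted(impressions, reverse=True)
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Synthetic Zipf-like counts: a few top items dominate, as in Fig. 1.
demo = [1000 // r for r in range(1, 101)]
print(loglog_slope(demo))  # clearly negative, close to -1
```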
The annotators differ with respect to the frameworks used (TensorFlow, Keras) and the applied pre-trained networks (VGG16, VGG19, InceptionV3, ResNet50). The task incentivizes participants to find the relation between configuration and performance.

MediaEval '18, 29-31 October 2018, Sophia Antipolis, France. © 2018 Copyright held by the owner/author(s).

3 BASELINES

NewsREEL Multimedia asks the participants to find the news items which users will read most frequently. The participating teams must predict the number of impressions for each item listed in the test weeks. We introduce three baseline strategies for predicting the number of impressions: random, document-based, and feature-based.

3.1 Random

The random baseline assigns each item a random non-negative integer as its number of impressions. This random guessing should be the lower bound for all prediction strategies.

3.2 Document-based Approach

The document-based approach centers on the notion of document similarity. The algorithm employs the basic concept of the k-nearest-neighbor classifier [1, Chapter 4.4]. First, we represent each news item as a bag of words. We obtain the words either from the articles' texts or from the image annotations. Next, we determine the ten most similar news items by means of the cosine distance between their term vectors. The computation exhibits linear complexity in the number of news items; with the NewsREEL Multimedia dataset, it took several minutes. Finally, we estimate the number of impressions as the sum of the ten neighbors' impressions.
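The document-based baseline can be sketched in a few lines of Python. This is a toy illustration rather than the implementation used in the paper: the three-item corpus, the whitespace tokenization, and the field names (`text`, `impressions`) are assumptions made for the example.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_impressions(item_text, train, k=10):
    """Estimate impressions as the sum over the k most similar
    training items (the paper uses k = 10)."""
    query = Counter(item_text.lower().split())
    ranked = sorted(
        train,
        key=lambda doc: cosine(query, Counter(doc["text"].lower().split())),
        reverse=True,
    )
    return sum(doc["impressions"] for doc in ranked[:k])

# Hypothetical mini-corpus standing in for the NewsREEL training weeks.
train = [
    {"text": "new electric car model revealed", "impressions": 500},
    {"text": "car race results", "impressions": 300},
    {"text": "local weather report", "impressions": 20},
]
print(predict_impressions("electric car review", train, k=2))  # 800
```

Ranking every item against every other item is what gives the linear (per query) complexity mentioned above; an inverted index or vectorized similarity computation would speed this up considerably.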
3.3 Feature-based Approach

The feature-based approach considers features rather than documents. We derive the features from the terms occurring in the news article as well as from the labels assigned to its images. For each term and label, we compute the average number of impressions of all articles related to it. We then estimate the number of impressions for a given article by averaging the expected impressions of all its features.

The NewsREEL Multimedia dataset contains further information facilitating variations of this approach. Image labels carry a reference to their annotator's configuration. Thus, the baseline can focus on particular annotators' input or on combinations thereof. In addition, each label entails a confidence score. The score indicates how confident the annotator is that the label applies to the image. We can modify the baseline to use these scores as weights.
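A compact sketch of the feature-based baseline, including the optional confidence-score weighting mentioned above. The toy data and the representation of each article as a set of feature strings are assumptions made for the example:

```python
from collections import defaultdict

def train_feature_model(articles):
    """Average number of impressions per feature (term or image label)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for features, impressions in articles:
        for f in features:
            totals[f] += impressions
            counts[f] += 1
    return {f: totals[f] / counts[f] for f in totals}

def predict(features, model, weights=None):
    """Average the expected impressions of an article's known features.

    `weights` (e.g. annotator confidence scores per feature) is the
    optional variation described above; with None, features count equally.
    """
    known = [f for f in features if f in model]
    if not known:
        return 0.0
    weights = weights or {f: 1.0 for f in known}
    num = sum(model[f] * weights.get(f, 1.0) for f in known)
    den = sum(weights.get(f, 1.0) for f in known)
    return num / den

# Hypothetical training articles: (feature set, observed impressions).
model = train_feature_model([
    ({"car", "race"}, 400),
    ({"car", "roof"}, 200),
    ({"weather"}, 20),
])
print(predict({"car", "roof"}, model))  # (300 + 200) / 2 = 250.0
```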
4 EVALUATION

We have evaluated the implemented algorithms, paying attention to the configurations used to annotate the images. Table 1 shows that the results differ strongly between the domains. The random baseline performs at ≈10% for all three publishers. In contrast, the text feature-based method achieves 34.7% for publisher 13554, 19.2% for publisher 17614, and 22.5% for publisher 39234. The image-based methods exhibit noticeable differences as well. While they score 19.0% for publisher 13554 with configuration 7, they barely exceed the random baseline for publishers 17614 and 39234. The good performance of the image-based recommenders for domain 13554 ("cars") compared with the other domains ("world and local news") could be explained by the fact that articles on 13554 have a longer lifecycle and are less influenced by breaking news.

Table 1: Prec@10% for the baseline algorithms (per publisher domain).

recommender name               labeler config.   13554   17614   39234
doc. similarity using images          2          0.207   0.103   0.110
doc. similarity using images          3          0.223   0.109   0.104
doc. similarity using images          4          0.200   0.114   0.104
doc. similarity using images          5          0.224   0.112   0.104
doc. similarity using images          6          0.227   0.109   0.121
doc. similarity using images          7          0.232   0.109   0.091
doc. similarity using text            -          0.186   0.100   0.137
image feature-based                   2          0.159   0.097   0.123
image feature-based                   3          0.137   0.099   0.127
image feature-based                   4          0.091   0.108   0.113
image feature-based                   5          0.108   0.104   0.110
image feature-based                   6          0.129   0.109   0.110
image feature-based                   7          0.124   0.106   0.096
text feature-based                    -          0.347   0.192   0.225
random                                -          0.101   0.102   0.102

Comparing the text-based predictors with the image-based predictors, we find that the text feature-based method shows the better performance on average. The approach focusing on selected text features performs significantly better than the document similarity method based on text terms. The document similarity method which uses images obtains results similar to those of the image feature-based methods: for publisher 13554, it scores 23.2% with configuration 7, whereas it remains at the level of the random baseline for the remaining publishers. Specific terms appear to affect items' popularity more than the assigned images do; a suitable weighting scheme is of major importance. Comparing word features with image features, the results indicate that words are more suitable for forecasting the popularity of items than the computed image labels. An analysis of the correlation between image labels and text terms should be conducted. The use of different languages (English for the image labels, German for the news texts) introduces an additional difficulty.

We also analyze the differences between the feature-based and the document-based approaches. On average, the feature-based methods outperform the document-based approaches. This could be explained by the more robust data available when using features, instead of merely considering the documents most similar to the current news item. The top text terms in domain 13554 ("cars") are middle-class, unique, mar, and grand; the top image labels are snake (referring to cables), roof, and folding chair.

Comparing the influence of the image labeler configurations, we find that labeler 4, based on InceptionV3 [4], performs worse than the predictors using the VGG [2] component. Analyzing the labels computed by the algorithms, we found that the labels typically describe selected objects in the image but are not optimized for predicting interestingness. An additional challenge is raised by example ("stock") images used by the publishers for news items for which no recent photos exist.

Overall, the evaluation results differ between the configurations and domains. The underlying rules should be researched in detail to improve the prediction algorithms and to optimize the parameter configurations.
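Table 1 reports Prec@10%. One plausible reading of this metric, the overlap between the predicted and the observed top 10% of items, can be sketched as follows; the task overview paper [3] gives the authoritative definition:

```python
def prec_at_10pct(predicted, actual):
    """Fraction of the predicted top-10% items that are also in the
    actual top-10%, ranked by impression count.

    Both arguments map item ids to impression counts (predicted and
    observed, respectively).
    """
    n = max(1, len(actual) // 10)
    top_pred = set(sorted(predicted, key=predicted.get, reverse=True)[:n])
    top_true = set(sorted(actual, key=actual.get, reverse=True)[:n])
    return len(top_pred & top_true) / n

# A perfect predictor reaches 1.0; unrelated predictions land near 0.1,
# matching the random baseline's ~10% in Table 1.
actual = {i: 1000 - i for i in range(100)}  # item 0 is read most often
print(prec_at_10pct(actual, actual))  # 1.0
```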
5 CONCLUSION

In this paper, we have presented several ways to estimate, based on multimedia data, how popular news items will become. The results suggest that the performance strongly depends on the individual publisher. We have observed that text-based features perform better than image-based features. This could be due to terms being more closely linked to the events reported in the articles. While the text-based methods have outperformed the random baseline consistently, the image-based approaches overcome the random baseline only for some publishers. This indicates that, for some publishers, news articles' popularity may be disconnected from their images. Furthermore, we have seen that the quality of image-based recommendations depends on the annotator used to create the labels.

Future Work. We see several ways to extend this research:
(1) Our work has focused exclusively on "high-level" features such as image labels. Low-level features deserve further attention.
(2) In our experiments, we have analyzed the annotators' configurations and the token-based methods separately. A weighted combination of both might yield a performance boost for some publishers. For a live recommender, the context of the item should be considered as well.
(3) Our feature-based approach combines features linearly. More complex methods, such as neural networks or SVMs, should be tested. They could capture the underlying distributions more accurately.

REFERENCES

[1] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. 2nd Edition. Wiley, New York, 2001.
[2] A. Dutta, A. Gupta, and A. Zisserman. VGG Image Annotator (VIA). http://www.robots.ox.ac.uk/~vgg/software/via/, 2016.
[3] A. Lommatzsch, B. Kille, F. Hopfgartner, and L. Ramming. NewsREEL Multimedia at MediaEval 2018: News Recommendation with Image and Text Content. In Proc. of the MediaEval 2018 Workshop, 2018.
[4] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception Architecture for Computer Vision. CoRR, abs/1512.00567, 2015.