=Paper=
{{Paper
|id=Vol-2283/MediaEval_18_paper_42
|storemode=property
|title=Baseline Algorithms for Predicting the Interest in News Based on Multimedia-Data
|pdfUrl=https://ceur-ws.org/Vol-2283/MediaEval_18_paper_42.pdf
|volume=Vol-2283
|authors=Andreas Lommatzsch,Benjamin Kille
|dblpUrl=https://dblp.org/rec/conf/mediaeval/LommatzschK18
}}
==Baseline Algorithms for Predicting the Interest in News Based on Multimedia-Data==
Andreas Lommatzsch (andreas.lommatzsch@dai-labor.de) and Benjamin Kille (benjamin.kille@dai-labor.de), DAI-Labor, TU Berlin, Berlin, Germany

ABSTRACT

The analysis of images in the context of recommender systems is a challenging research topic. NewsREEL Multimedia enables researchers to study new algorithms on a large dataset. The dataset comprises news items and their numbers of impressions as a proxy for interestingness. Each news article comes with textual and image features. This paper presents the data characteristics and baseline prediction models. We discuss the performance of these predictors and explain the detected patterns.

KEYWORDS

Multimedia, News, Recommender Systems, Image Analysis

1 INTRODUCTION

The NewsREEL Multimedia task supplies participants with different kinds of data. These include low-level features, image labels, and texts. Thus, participants may apply a broad spectrum of machine learning approaches. There is little existing work, as NewsREEL Multimedia represents the first task of its kind. The task's overview paper [3] presents an outline and a detailed description.

In this paper, we study ways to predict the popularity of news items relying on multimedia data. We analyze differences among the publishers and, in particular, how they affect the quality of the predictions.

The remainder of this paper is structured as follows: Section 2 analyzes the dataset. Subsequently, we introduce different predictors (Section 3). Section 4 discusses the baseline results. Finally, Section 5 concludes and suggests directions for future research.
2 DATA DESCRIPTION

The dataset covers thirteen weeks of four selected publishers. Three publishers (17614, 13554, and 39234) make up most of the impressions. Fig. 1 illustrates how the number of impressions is distributed. We recognize the downward trend on the log-log plots. This indicates power-law distributed quantities: few articles collect most of the attention, whereas the majority of articles receives little attention. As a result, the predictors must accurately pick the best articles to perform well.

The automatic annotators have assigned a frequent subset of labels to the articles. For publisher 17614, these include 'stage,' 'suit,' and 'wig.' The dataset provides the labels computed using six different labeler configurations. All annotators rely on ImageNet models trained on publicly available images.

[Figure 1: Distribution of impressions for the three publishers (13554, 17614, 39234) in the training set, shown per week of the training period. The publishers are color-coded according to the legend. The x-axis shows the number of impressions; the y-axis shows the proportion of articles. Both axes are plotted logarithmically.]
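The power-law pattern can be checked numerically: on a log-log rank/frequency plot, power-law data falls on a roughly straight, downward-sloping line. The following is a minimal sketch of such a check (using synthetic Zipf-like counts in place of the real impression data, which is not reproduced here):

```python
import math

def loglog_slope(impressions):
    """Least-squares slope of log(count) vs. log(rank).

    A clearly negative, roughly constant slope corresponds to the
    straight downward line on the log-log plot described above.
    """
    counts = sorted(impressions, reverse=True)
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Synthetic Zipf-like counts: a few top items dominate, as in Fig. 1.
demo = [1000 // r for r in range(1, 101)]
print(loglog_slope(demo))  # clearly negative, close to -1
```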
The annotators differ with respect to the frameworks used (TensorFlow, Keras) and the applied pre-trained networks (VGG16, VGG19, InceptionV3, ResNet50). The task incentivizes participants to find the relation between configuration and performance.

MediaEval '18, 29-31 October 2018, Sophia Antipolis, France. © 2018 Copyright held by the owner/author(s).

3 BASELINES

NewsREEL Multimedia asks the participants to find the news items which users will read most frequently. The participating teams must predict the number of impressions for each item listed in the test weeks. We introduce three baseline strategies for predicting the number of impressions: random, document-based, and feature-based.

3.1 Random

The random baseline assigns each item a random non-negative integer as its number of impressions. This random guessing should be the lower bound for all prediction strategies.

3.2 Document-based Approach

The document-based approach centers on the notion of document similarity. The algorithm employs the basic concept of the k-nearest-neighbor classifier [1, Chapter 4.4]. First, we represent each news item as a bag of words. We obtain the words either from the articles' texts or from the image annotations. Next, we determine the ten most similar news items by means of the cosine distance between their term vectors. The computation exhibits linear complexity in the number of news items; with the NewsREEL Multimedia dataset, it took several minutes. Finally, we estimate the number of impressions as the sum of the ten neighbors' impressions.
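The document-based baseline can be sketched in a few lines of Python. This is a toy illustration rather than the implementation used in the paper: the three-item corpus, the whitespace tokenization, and the field names (`text`, `impressions`) are assumptions made for the example.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_impressions(item_text, train, k=10):
    """Estimate impressions as the sum over the k most similar
    training items (the paper uses k = 10)."""
    query = Counter(item_text.lower().split())
    ranked = sorted(
        train,
        key=lambda doc: cosine(query, Counter(doc["text"].lower().split())),
        reverse=True,
    )
    return sum(doc["impressions"] for doc in ranked[:k])

# Hypothetical mini-corpus standing in for the NewsREEL training weeks.
train = [
    {"text": "new electric car model revealed", "impressions": 500},
    {"text": "car race results", "impressions": 300},
    {"text": "local weather report", "impressions": 20},
]
print(predict_impressions("electric car review", train, k=2))  # 800
```

Ranking every item against every other item is what gives the linear (per query) complexity mentioned above; an inverted index or vectorized similarity computation would speed this up considerably.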
3.3 Feature-based Approach

The feature-based approach considers features rather than documents. We derive the features from the terms occurring in the news article as well as from the labels assigned to its images. For each term and label, we compute the average number of impressions of all articles related to it. We then estimate the number of impressions for a given article by averaging the expected impressions of all its features.

The NewsREEL Multimedia dataset contains further information facilitating variations of this approach. Image labels carry a reference to their annotator's configuration. Thus, the baseline can focus on particular annotators' input or on combinations thereof. In addition, each label entails a confidence score. The score indicates how confident the annotator is that the label applies to the image. We can modify the baseline to use these scores as weights.
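A compact sketch of the feature-based baseline, including the optional confidence-score weighting mentioned above. The toy data and the representation of each article as a set of feature strings are assumptions made for the example:

```python
from collections import defaultdict

def train_feature_model(articles):
    """Average number of impressions per feature (term or image label)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for features, impressions in articles:
        for f in features:
            totals[f] += impressions
            counts[f] += 1
    return {f: totals[f] / counts[f] for f in totals}

def predict(features, model, weights=None):
    """Average the expected impressions of an article's known features.

    `weights` (e.g. annotator confidence scores per feature) is the
    optional variation described above; with None, features count equally.
    """
    known = [f for f in features if f in model]
    if not known:
        return 0.0
    weights = weights or {f: 1.0 for f in known}
    num = sum(model[f] * weights.get(f, 1.0) for f in known)
    den = sum(weights.get(f, 1.0) for f in known)
    return num / den

# Hypothetical training articles: (feature set, observed impressions).
model = train_feature_model([
    ({"car", "race"}, 400),
    ({"car", "roof"}, 200),
    ({"weather"}, 20),
])
print(predict({"car", "roof"}, model))  # (300 + 200) / 2 = 250.0
```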
4 EVALUATION

We have evaluated the implemented algorithms, paying attention to the configurations used to annotate the images. Table 1 shows that the results differ strongly between the domains. The random baseline performs at ≈10% for all three publishers. In contrast, the text feature-based method achieves 34.7% for publisher 13554, 19.2% for publisher 17614, and 22.5% for publisher 39234. The image-based methods exhibit noticeable differences as well. While they score 19.0% for publisher 13554 with configuration 7, they barely exceed the random baseline for publishers 17614 and 39234. The good performance of the image-based recommenders for domain 13554 ("cars") compared with the other domains ("world and local news") could be explained by the fact that articles on 13554 have a longer lifecycle and are less influenced by breaking news.

Table 1: Prec@10% for the baseline algorithms (per publisher domain).

recommender name               labeler config.   13554   17614   39234
doc. similarity using images          2          0.207   0.103   0.110
doc. similarity using images          3          0.223   0.109   0.104
doc. similarity using images          4          0.200   0.114   0.104
doc. similarity using images          5          0.224   0.112   0.104
doc. similarity using images          6          0.227   0.109   0.121
doc. similarity using images          7          0.232   0.109   0.091
doc. similarity using text            -          0.186   0.100   0.137
image feature-based                   2          0.159   0.097   0.123
image feature-based                   3          0.137   0.099   0.127
image feature-based                   4          0.091   0.108   0.113
image feature-based                   5          0.108   0.104   0.110
image feature-based                   6          0.129   0.109   0.110
image feature-based                   7          0.124   0.106   0.096
text feature-based                    -          0.347   0.192   0.225
random                                -          0.101   0.102   0.102

Comparing the text-based predictors with the image-based predictors, we find that the text feature-based method shows the better performance on average. The approach focusing on selected text features performs significantly better than the document similarity method based on text terms. The document similarity method which uses images obtains results similar to those of the image feature-based methods: for publisher 13554, it scores 23.2% with configuration 7, whereas it remains at the level of the random baseline for the remaining publishers. Specific terms appear to affect items' popularity more than the assigned images do; a suitable weighting scheme is of major importance. Comparing word features with image features, the results indicate that words are more suitable for forecasting the popularity of items than the computed image labels. An analysis of the correlation between image labels and text terms should be conducted. The use of different languages (English for the image labels, German for the news texts) introduces an additional difficulty.

We also analyze the differences between the feature-based and the document-based approaches. On average, the feature-based methods outperform the document-based approaches. This could be explained by the more robust data available when using features, instead of merely considering the documents most similar to the current news item. The top text terms in domain 13554 ("cars") are middle-class, unique, mar, and grand; the top image labels are snake (referring to cables), roof, and folding chair.

Comparing the influence of the image labeler configurations, we find that labeler 4, based on InceptionV3 [4], performs worse than the predictors using the VGG [2] component. Analyzing the labels computed by the algorithms, we found that the labels typically describe selected objects in the image but are not optimized for predicting interestingness. An additional challenge is raised by example ("stock") images used by the publishers for news items for which no recent photos exist.

Overall, the evaluation results differ between the configurations and domains. The underlying rules should be researched in detail to improve the prediction algorithms and to optimize the parameter configurations.
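Table 1 reports Prec@10%. One plausible reading of this metric, the overlap between the predicted and the observed top 10% of items, can be sketched as follows; the task overview paper [3] gives the authoritative definition:

```python
def prec_at_10pct(predicted, actual):
    """Fraction of the predicted top-10% items that are also in the
    actual top-10%, ranked by impression count.

    Both arguments map item ids to impression counts (predicted and
    observed, respectively).
    """
    n = max(1, len(actual) // 10)
    top_pred = set(sorted(predicted, key=predicted.get, reverse=True)[:n])
    top_true = set(sorted(actual, key=actual.get, reverse=True)[:n])
    return len(top_pred & top_true) / n

# A perfect predictor reaches 1.0; unrelated predictions land near 0.1,
# matching the random baseline's ~10% in Table 1.
actual = {i: 1000 - i for i in range(100)}  # item 0 is read most often
print(prec_at_10pct(actual, actual))  # 1.0
```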
5 CONCLUSION

In this paper, we have presented several ways to estimate, based on multimedia data, how popular news items will become. The results suggest that the performance strongly depends on the individual publisher. We have observed that text-based features perform better than image-based features. This could be due to terms being more closely linked to the events reported in the articles. While the text-based methods have outperformed the random baseline consistently, the image-based approaches overcome the random baseline only for some publishers. This indicates that, for some publishers, news articles' popularity may be disconnected from their images. Furthermore, we have seen that the quality of image-based recommendations depends on the annotator used to create the labels.

Future Work. We see several ways to extend this research:
(1) Our work has focused exclusively on "high-level" features such as image labels. Low-level features deserve further attention.
(2) In our experiments, we have analyzed the annotators' configurations and the token-based methods separately. A weighted combination of both might yield a performance boost for some publishers. For a live recommender, the context of the item should be considered as well.
(3) Our feature-based approach combines features linearly. More complex methods, such as neural networks or SVMs, should be tested. They could capture the underlying distributions more accurately.

REFERENCES

[1] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. 2nd Edition. Wiley, New York, 2001.
[2] A. Dutta, A. Gupta, and A. Zisserman. VGG Image Annotator (VIA). http://www.robots.ox.ac.uk/~vgg/software/via/, 2016.
[3] A. Lommatzsch, B. Kille, F. Hopfgartner, and L. Ramming. NewsREEL Multimedia at MediaEval 2018: News Recommendation with Image and Text Content. In Proc. of the MediaEval 2018 Workshop, 2018.
[4] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the Inception Architecture for Computer Vision. CoRR, abs/1512.00567, 2015.