AdsISee: Advertisement Detection and Tracking for Sponsorship Evaluation in Soccer Matches

Alexander Westermann∗, Philipp Krayer∗, Andreas Weiler
Institute of Applied Information Technology, Zurich University of Applied Sciences, Winterthur, Switzerland
alw.westermann@gmail.com, philipp.krayer@protonmail.com, andreas.weiler@zhaw.ch

∗ Both authors contributed equally to this research.

© 2020 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2020 Joint Conference (March 30-April 2, 2020, Copenhagen, Denmark) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)

Figure 1: Example of AdsISee with three ads in the soccer match Sweden against Switzerland at the World Cup 2018.

ABSTRACT

In this work, we present AdsISee, a real-life application for the detection and tracking of advertisements in broadcasts of soccer matches, which supports business analysts in the task of sponsorship evaluation and reporting. Our approach is based on different combinations of several techniques for object detection and tracking in images. In contrast to other works, which use neural networks, we use alternative solutions that detect advertisements based on provided pre-defined image templates and without any training period. This made it possible to build an application which can be executed on standard hardware while still providing a feasible performance. Furthermore, our evaluations show that we can achieve results comparable to those of existing approaches for sponsorship evaluation which use neural networks.

1 INTRODUCTION AND MOTIVATION

The market of advertisements in sport events is tremendous. For example, advertisers had to pay US $5.25 million to air a 30-second commercial during the Super Bowl 2019 broadcast [12]. However, besides the advertisements which are explicitly shown to the television viewers, there are advertisements which are placed directly in the sport events themselves. These advertisements are shown on the margins of the playing field as perimeter advertising, on the clothes of the players, or somewhere else in the real-world environment of the sports event. For example, the FIFA World Cup 2018 brought an additional US $2.4 billion to the global advertising market and the sponsors paid up to US $200 million for a sponsorship package [3]. The target audience of these advertisements is primarily the on-site visitors. However, the television viewers are also important recipients of this kind of advertisement. For example, an average of about 191 million viewers watched the soccer matches at the World Cup 2018 as live broadcast [4]. Additionally, millions of viewers watch full or recapped recordings of soccer matches at any time. Given these numbers, every second in which the advertisements can be seen on screen by the viewers is important.

In this work, we present AdsISee, a real-life application for the detection and tracking of pre-defined advertisements in the frames of soccer match broadcasts. By using AdsISee, we are able to generate a sponsorship report to support business analysts in their task of sponsorship evaluation and further analysis. We use different combinations of several techniques for object detection and tracking, like the FLANN [9] matcher or the MOSSE [7] tracker. In contrast to other works, in which neural networks [1, 8] or the Haar cascade detector [11] are used and need to be trained on the detectable object beforehand, we use alternative solutions that detect advertisements based on provided templates and without any training period. One advantage of our implementation¹ is that it can be executed on standard hardware while still providing a feasible performance. Therefore, the main goal of this work is to evaluate various technologies in the field of visual computing and to apply them in a domain-specific area in order to detect objects without the use of machine learning methods. This offers the advantages of less configuration, no manual annotation of advertisements, and no learning process in order to successfully detect the advertisements. Our experiments (see Section 3) show that it is possible to detect and track advertisements in different quality levels. To evaluate our work, we created several case studies for different soccer matches. Furthermore, we compare our application against the solution of Orpix ComputerVision Inc. [5], which is the market leader in the area of sponsorship evaluation and uses trained neural networks to detect the advertisements in broadcast soccer matches. The detailed evaluations provide insights into the advantages and disadvantages of the technologies used to detect and track advertisements.

¹ https://github.com/AdsISee/AdsISee (November 20, 2019)

2 METHODOLOGY

Our solution detects advertisements on the margins of the playing field in broadcasts of soccer matches based on templates in the format of pre-defined images. The broadcast of the soccer match is divided into its individual frames, which are used as target images and are processed in a streaming fashion one after the other. In each of the target images our solution tries to detect the template image for the matching advertisements.

In a first step, relevant technologies for object detection in the domain of advertisements were evaluated. By analyzing a common set of advertisements in soccer matches, we were able to discover certain properties which can be used for the recognition of the ads in the target images. One of the main characteristics is the use of clearly distinguishable colors with high contrast, which helps both the viewers and our application to recognize the exposed brands. By analyzing and extracting the different color components of the template, it is possible to filter out large areas of the target image which do not correspond to the colors of the template. Accordingly, the search area for the advertisements can already be severely reduced.

Figure 2: The three possible perspectives of a banner ad.

Figure 3: Example of detected features in a template.

2.1 Feature Detection

Another characteristic of advertisements on banners is the simple geometric properties of the exposed logos. Edges, corners and flat surfaces can be extracted from the advertisement templates in order to match them in the target images. Therefore, two different methods have been applied to detect those unique features with their positions in the templates and target images. One of the main challenges with this approach is that in most cases the perspectives and sizes of the advertisements in the target images do not correspond to the advertisements in the templates. The perspectives of the advertisements in the target images depend on the angle and position of the camera recording the sport event. We solved this issue by using the Scale-Invariant Feature Transform (SIFT) in our implementation, a technology that was originally designed for panoramic image stitching [2]. By applying the SIFT algorithm to the advertisement templates and target images, unique features (cf. Figure 3) such as corners, edges and flat areas can be extracted regardless of their scaling and perspective. To improve the accuracy of the feature detection, each advertisement template is scaled to one of the three main perspectives in which the banners occur during the broadcasts. These perspectives are the directly visible frontal view and the positions on the left or right of the field (cf. Figure 2).

2.2 Feature Matching

After extracting the features from the template and target, a match with a feature from the target has to be searched for each feature from the template. An example can be seen in Figure 4. The exact position of the advertisement in the target image can be calculated as soon as the number of matches exceeds a threshold. In our implementation we use the Fast Library for Approximate Nearest Neighbors (FLANN) to calculate matches between the template and target image. The FLANN matcher calculates the nearest neighbors between the properties of the detected features, with similarity represented by a distance. By applying a threshold for acceptable distances, we are able to distinguish between correct matches, which belong to the advertisement, and incorrect matches.

Figure 4: Example of feature extraction and matching.

The Brute-Force matcher offers a faster alternative to the FLANN matcher: it compares each feature from the template with all the extracted features from the target image and matches the features with the smallest difference [6]. Our evaluations show that the Brute-Force matcher was able to perform faster than the FLANN matcher; however, the accuracy was slightly reduced. Accordingly, our solution offers the Brute-Force matcher in addition to the FLANN matcher for cases where the performance has to be maximized.

2.3 Matching Multiple Advertisements

During our evaluations we found that the matching has major problems when more than one instance of the same advertisement is visible in the target image. For example, in Figure 1 the application needs to detect the advertisements of McDonalds and Visa more than once. In this case the approach of using the
Brute-Force and the FLANN matcher detects identical features in every instance of the ad, and therefore feature matching alone cannot assign the individual features to the separate instances of the same advertisement. To solve this problem, we have developed our own solution, which makes it possible to differentiate the extracted features between instances of the same advertisement. After an advertisement has been detected, the search for the same advertisement is repeated, excluding the features of the already matched advertisement. This procedure is repeated until no new advertisements are detected. This allows our solution to allocate each feature from the template advertisement to multiple features in the target image, according to the number of identical advertisements that are visible.

2.4 Tracking

The prerequisite for feature detection and matching is that the searched object is sharp. However, due to the movement of the camera, the advertisements often appear blurry in the target image. As a result, no sharp edges or corners can be extracted from the target image and the application is not able to detect the advertisement. For these cases, our solution uses a combination of tracking technologies, the Median Flow [13] and the MOSSE [7] tracker. They follow the movement of an advertisement in the target image once it has been detected, so that blurred advertisements can still be located. After evaluating various tracking technologies, we decided to implement the Median Flow and the MOSSE tracker, which showed high performance and accuracy in our experiments.

Once an advertisement has been detected through SIFT and feature matching, it is registered in a tracker. For all following frames, the tracker is updated and determines the exact position of the tracked advertisement.

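The interplay of detection and tracking described here can be sketched as a simple control loop. The following is a minimal illustration only, not the AdsISee implementation: `stub_detect`, `StubTracker`, `locate_ad` and the dictionary-based frames are hypothetical stand-ins, and the `(ok, box)` return shape of `update` is an assumption loosely modeled on how OpenCV trackers report their result. Falling back to detection in the same frame in which tracking is lost is likewise an assumption.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


class StubTracker:
    """Hypothetical stand-in for a real tracker such as MOSSE or Median Flow."""

    def __init__(self, frame: dict, box: Box) -> None:
        self.box: Optional[Box] = box

    def update(self, frame: dict) -> Tuple[bool, Optional[Box]]:
        # A real tracker follows image content. This stub simply reads the
        # known ad position, which is available even for blurry frames.
        self.box = frame["ad_at"]
        return (self.box is not None, self.box)


def stub_detect(frame: dict) -> Optional[Box]:
    """Hypothetical detector: feature matching only succeeds on sharp frames."""
    return frame["ad_at"] if frame["sharp"] else None


def locate_ad(frames, detect=stub_detect, make_tracker=StubTracker) -> List[Optional[Box]]:
    """Return one ad position per frame, or None where the ad is not located."""
    tracker = None
    positions: List[Optional[Box]] = []
    for frame in frames:
        box = None
        if tracker is not None:
            # An ad is registered: updating the tracker replaces the costly search.
            ok, box = tracker.update(frame)
            if not ok:
                tracker, box = None, None  # ad disappeared, stop tracking
        if tracker is None:
            # Nothing is tracked: run detection and register a hit in a tracker.
            box = detect(frame)
            if box is not None:
                tracker = make_tracker(frame, box)
        positions.append(box)
    return positions


frames = [
    {"sharp": True, "ad_at": (10, 5, 40, 8)},   # detected by feature matching
    {"sharp": False, "ad_at": (12, 5, 40, 8)},  # blurry, located by the tracker
    {"sharp": False, "ad_at": (15, 5, 40, 8)},  # blurry, located by the tracker
    {"sharp": True, "ad_at": None},             # ad gone, tracker terminates
    {"sharp": False, "ad_at": (30, 5, 40, 8)},  # blurry and untracked: missed
    {"sharp": True, "ad_at": (31, 5, 40, 8)},   # detected again
]
positions = locate_ad(frames)
print(positions)
```

In this toy run the two blurry frames that follow a successful detection are still located through the tracker, while a blurry frame without a registered tracker is missed until the next sharp frame.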
By calculating a matrix for the perspective transformation from the previously detected advertisement, we were able to reconstruct the exact place and perspective of the tracked advertisement. As an added benefit, this approach increases the performance of the entire software. The detection of advertisements through feature matching is a relatively time-consuming process. Once an advertisement has been detected in a frame, it can be tracked in all subsequent frames and thus no longer has to be discovered again. Besides advertisement tracking, this technology can also be used to track advertisement-free areas. As soon as no advertisements have been detected in a frame, the complete target image is marked for tracking and is excluded from all following advertisement searches (cf. Figure 5). The impact of the advertisement and empty area tracking on accuracy and performance is tested in a detailed evaluation (cf. Section 3).

Figure 5: Visualization of empty area tracking.

3 EXPERIMENTS

In a detailed evaluation we tested both the accuracy and the performance of our application in advertisement detection in broadcasts of sport events. In the first series of experiments, the influence of the individual components was evaluated. All experiments were performed on an Ubuntu machine with 2.01 GHz, 8 CPU cores and 16 GB RAM.

3.1 Evaluation of Functionalities

In a first experiment we tested the impact of compressing the target in a pre-processing step. We reduced the resolution of the target image by 50% and compared it with a test run without target compression.

Target compression:   Yes         No
Average accuracy:     71%         88%
No. of errors:        1           0
Average time:         1.03 sec.   4.12 sec.

Table 1: Evaluation of target compression

The results in Table 1 show that reducing the resolution of the target image by 50% cuts the average processing time from 4.12 to 1.03 seconds, a speed-up by a factor of about four. However, the average accuracy drops from 88% to 71%. Reducing the resolution erases some detectable features, since small edges and corners disappear. Accordingly, fewer features have to be matched in order to detect the advertisement, which shortens the search process. If the advertisement is poorly visible, not enough features can be extracted for a successful match, which explains the reduction in accuracy.

In another experiment we tested the impact of each individual feature on the performance and accuracy of the advertisement detection.

Color       Matching      Tracking    Average       Average
filtering   algorithm     algorithm   time (sec.)   accuracy
Off         Brute-Force   MF          4.75          84%
On          Brute-Force   MF          4.65          82%
Off         Brute-Force   MOSSE       4.47          82%
On          Brute-Force   MOSSE       4.61          83%
Off         FLANN         MF          5.25          86%
On          FLANN         MF          4.78          85%
Off         FLANN         MOSSE       5.40          85%
On          FLANN         MOSSE       5.06          85%

Table 2: Comparison of various functionalities (MF = Median Flow tracker)

Figure 6: Accuracy with different thresholds

This evaluation shows that filtering out the irrelevant colors in a pre-processing step increases the performance slightly, at the cost of a slight decrease in accuracy (cf. Table 2). This can be explained by the color filter interfering with the features of advertisements in the target images. The evaluation of the two feature matchers shows that the FLANN matcher performs better than the Brute-Force matcher in terms of accuracy. However, the FLANN matcher took on average 0.5 seconds longer than the Brute-Force matcher to calculate the matches between the template and the advertisement in the target image. In addition to the matching algorithms, the two tracking methods were compared. The MOSSE tracker showed a slightly better performance than the Median Flow tracker without a significant reduction in accuracy. Based on these results, all features have been implemented in our solution, and the user can decide whether performance or accuracy should be prioritized for the advertisement evaluation.
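The aggregate figures quoted in this section can be recomputed directly from Tables 1 and 2. The snippet below is a small worked calculation under the assumption that the time columns are average seconds per frame. The helper `group_means` and all variable names are introduced here for illustration and are not part of AdsISee.

```python
from statistics import mean

# Table 1: average time per frame (sec.) and average accuracy (%).
time_compressed, time_full = 1.03, 4.12
acc_compressed, acc_full = 71, 88

speedup = time_full / time_compressed           # how many times faster with compression
acc_drop_points = acc_full - acc_compressed     # absolute drop in percentage points
acc_drop_relative = acc_drop_points / acc_full  # drop relative to the uncompressed run

# Table 2 rows: (color filtering, matcher, tracker, avg. time in sec., avg. accuracy in %).
TABLE_2 = [
    ("Off", "Brute-Force", "MF", 4.75, 84),
    ("On", "Brute-Force", "MF", 4.65, 82),
    ("Off", "Brute-Force", "MOSSE", 4.47, 82),
    ("On", "Brute-Force", "MOSSE", 4.61, 83),
    ("Off", "FLANN", "MF", 5.25, 86),
    ("On", "FLANN", "MF", 4.78, 85),
    ("Off", "FLANN", "MOSSE", 5.40, 85),
    ("On", "FLANN", "MOSSE", 5.06, 85),
]


def group_means(rows, column):
    """Average time and accuracy of all rows sharing a value in `column`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    return {
        key: (mean(r[3] for r in members), mean(r[4] for r in members))
        for key, members in groups.items()
    }


by_matcher = group_means(TABLE_2, 1)  # keyed by "Brute-Force" / "FLANN"
by_filter = group_means(TABLE_2, 0)   # keyed by "Off" / "On"
flann_extra_time = by_matcher["FLANN"][0] - by_matcher["Brute-Force"][0]

print(f"speed-up: x{speedup:.2f}, accuracy drop: {acc_drop_points} points "
      f"({acc_drop_relative:.1%} relative)")
print(f"FLANN needs {flann_extra_time:.2f} s more per frame than Brute-Force")
```

With the table values, the speed-up comes out at a factor of four and the accuracy drop at 17 percentage points (about 19% relative). The FLANN rows average roughly 0.5 seconds more per frame than the Brute-Force rows, in line with the difference stated in the text.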
3.2 Ideal Matching Difference

In order to successfully match the features from the advertisement templates with the extracted features from the target image, it is necessary to filter out the wrong matches. Each extracted feature from the template advertisement is matched with the most similar feature from the target image and the difference between the two features is calculated. If there are no advertisements in the target image, a non-existent advertisement would still be detected, but with features whose differences are much higher than those of a correctly matched advertisement. To prevent this, a threshold is defined for the highest acceptable difference between matched features. A threshold that is too high would admit matches which do not belong to an advertisement, while a threshold that is too low would filter out correct matches. The following experiment was performed to find the ideal threshold for acceptable matching differences. The results in Figure 6 show that the threshold should be between 0.7 and 0.75 to achieve the best results. This threshold is implemented accordingly in our solution.

3.3 Ground Truth Evaluation

In this evaluation phase we ran several tests to optimize the configuration of AdsISee for maximal accuracy in advertisement detection. The goal was to test the software on excerpts from live broadcasts of sport events and determine its overall accuracy. In a first run, the software was tested without the use of tracking. This determined the accuracy of the plain detection process. 30 frames with clearly visible advertisements and 30 frames without advertisements were selected and tested with the software. 71% of all ads were successfully detected, without any falsely detected non-existent advertisements. In some cases, the advertisements could be detected although they were partially covered (cf. Figure 7).

Figure 7: Partially covered ad could be detected

In a second run, AdsISee was evaluated on various video scenarios of live broadcast soccer matches. In each video clip the advertisements were visible with different properties, which tested the limitations of the software.

Camera      Sudden appearance   Sudden disappearance   Accuracy
movement    of ad               of ad
Slowly      No                  No                     98%
Slowly      Yes                 No                     97%
Slowly      Yes                 Yes                    98%
Fast        No                  No                     85%
Fast        No                  Yes                    91%
Fast        Yes                 Yes                    70%

Table 3: Comparison of various video scenarios

Table 3 shows that, on average, our solution detected the advertisements with a high accuracy of 90%. The tracker was able to follow the advertisements even during fast camera movements. Sudden appearances of the advertisements were always correctly recognized by the feature detection and matching component after a maximum of 3 frames. However, in the last test video, the advertisement was slowly faded out by an animation on the banner screen. Since the tracking technologies used cannot detect the disappearance of an object that slowly fades away, the position of the advertisement continued to be tracked even though the actual advertisement had already disappeared. This resulted in a sharp drop in the measured accuracy.

3.4 Comparative Evaluation

Orpix ComputerVision Inc. offers a cloud-based solution for evaluating advertisement occurrences during live broadcasts of sport events. This solution uses state-of-the-art convolutional neural networks [10] to detect the advertisements and processes the target images at a frame rate of 1 FPS. Neural networks provide excellent results in object recognition, but have the disadvantage that they need to be trained on the object beforehand in an elaborate process. This involves annotating advertisements in hundreds of example templates by hand.

Figure 8: Example of an advertisement that is tagged twice by Orpix.
In this test, the accuracy of the Orpix product is compared to that of our solution. The goal is to show the advantage of AdsISee: it can perform a sponsorship evaluation given only proper advertisement templates, without having to train any algorithm or make any configurations beforehand. Orpix provides one free online example of a sponsorship evaluation, covering the final game France versus Croatia at the FIFA World Cup 2018, which is used for the comparison against our solution. Orpix does not state the computational performance of its solution. Accordingly, no accurate comparison in performance can be made between AdsISee and the solution of Orpix.

Based on 5 different advertising templates, the software was tested on randomly selected video sequences of the World Cup 2018 final game. The tagged advertisements were compared with the individual frames from the example evaluation from Orpix.

Template        Accuracy Orpix   Accuracy AdsISee
Wanda           93.75 %          90.00 %
Hyundai         88.50 %          73.25 %
Qatar Airways   83.25 %          91.00 %
Hisense         99.50 %          92.75 %
Gazprom         96.25 %          40.00 %

Table 4: Comparison between Orpix and AdsISee.

All tests were performed on an Ubuntu system (2.01 GHz, 8 CPUs, 16 GB RAM). Our application needs 500 MB RAM and processes the frames at an average of 0.981 seconds per frame. The results (cf. Table 4) show that our software achieves an average accuracy of 77.40 %, which is lower than that of the solution from Orpix, which had an average accuracy of 92.25 %. This difference can be explained by the occurrence of animations on the advertisement banners, which could only be observed in this particular game. Normally, static banners are used in sport events. As a result, the tracker was not able to detect the change of advertisements on the banners and false advertisements were tracked in all subsequent frames. In addition, the advertisement templates were only available in reduced quality, which limited the number of extractable features (as explained in Section 3.5). Also, the solution of Orpix was, in contrast to AdsISee, able to detect advertisements that were far away from the camera and hardly recognizable even by the human eye.

Nevertheless, in some cases our software was able to detect the advertisements more precisely than Orpix. Often, the solution of Orpix detected the advertisements twice or more, resulting in an inaccurate tagging of the corresponding advertisement (cf. Figure 8). In some cases, several adjacent advertisements were marked as one single advertisement. Additionally, our solution is able to determine the positions of the advertisements with better precision than the solution of Orpix. Unfortunately, the report about the final match at the FIFA World Cup 2018 was the only report which Orpix provided, and therefore our comparative evaluation is based on this single event and report.

3.5 Problems Encountered

In order to extract as many different features as possible from the advertising template, the images have to be provided in good quality. During our search for advertisement templates, we were not able to find any high-quality templates that matched the commercials which appeared on the banners. Accordingly, we extracted the advertisements from broadcasts of sport events manually, acquiring templates with slightly reduced quality. Thus, the tests were not performed under the best possible conditions. Therefore, it can be assumed that AdsISee can achieve even better results when high-quality advertisement templates are provided as input.

Our applied tracking technologies are ideal for tracking the movement of detected advertisements across the screen. The tracker is able to recognize if the advertisement suddenly disappears and terminates the tracking phase accordingly. However, if the banner changes the advertisement by an animation, the tracker is not able to detect this and the wrong advertisement continues to be tracked. This reduced the accuracy in some of our experiments.

During the evaluation phase, we observed that in some rare cases non-existent advertisements were falsely detected (cf. Figure 9). We were able to partially solve this issue by implementing a filter which checks the positions and relations of the detected advertisements. This filter prevented most of the false detections we encountered during our evaluations. The filter validates the calculated frame of the advertisement before the detected object is registered for tracking. We defined the following conditions, which have to be met by the detected object for it to be recognized as an advertisement:

• The edges of the marked object must not cross each other.
• The aspect ratios of the detected object must match those of the template advertisement.

Figure 9: Example of a falsely detected advertisement.

4 DEMONSTRATION

To demonstrate the functionality and usefulness of AdsISee we created a video compilation² with examples of the detection and tracking of several advertisements in soccer and hockey matches. As templates we use a selection of 18 different advertisements (cf. Figure 10) of various companies. We tried to cover a wide range of colors and shapes with our template collection. Our demonstration provides examples for the detection and tracking of single or multiple advertisements in static and also fast-moving frames. We also demonstrate in the video that AdsISee occasionally has problems (see Section 3.5) with the detection and tracking of the correct objects. For example, the country name “USA” in combination with the blue color and the banner format (see timeframe 0:39 to 1:30 in the video) is detected and tracked as the Visa advertisement.

Figure 10: Advertisement templates used for the demonstration video.

In addition to the video output with the marked advertisements, AdsISee creates a report as a standard text file containing the information about the sequences, frames, and the detected ads. This report can be used to visualize (cf. Figure 1) and analyze complete broadcasts of sport events in a compact view. For example, the report can be grouped by time sequences, advertisements, or the number of detected advertisements. This can support analysts in finding the sequences with the most advertisements or in comparing the on-screen time of their own advertisement with that of others.

² https://youtu.be/KReFUcKiw4E (November 27, 2019)

5 CONCLUSIONS AND FUTURE WORK

In this work, we have demonstrated that it is possible to detect advertisements in broadcasts of sport events, especially soccer matches, for sponsorship evaluation and analysis. We were able to apply alternative technologies for object detection in a specific domain and show certain advantages compared to the state-of-the-art technologies. We have successfully implemented a prototype that can detect advertisements in broadcasts of sport events given only templates of the advertisements. AdsISee extracts unique features such as color, edges and corners from the template and matches those with each individual frame from the live broadcast. The implemented tracking technologies allow it to track the movement of the advertisements across the screen. We tested the prototype for accuracy and performance in an extensive evaluation phase. In addition, AdsISee was compared with a product of Orpix, which is the leader in the area of sponsorship evaluation in sport events, where similar results were measured. Our solution showed some advantages over the product of Orpix; for example, the tagged position of the advertisement was calculated with better precision than by the product of Orpix.

For future work we plan to extend our approach to further sport events and also to other areas in the frames where ads could be placed. For example, Figure 11 shows the result of our approach for detecting and tracking advertisements in hockey games on the margins of the playing field, as well as on the jerseys of the players. Additionally, many improvements can still be implemented and tested in the future. The filter for detecting false positives could be improved by comparing the colors of the matched advertisement with the templates. The advertisement would not be tagged if the colors do not match the template, which would also eliminate the issue with animations. Furthermore, since most advertisements contain a large proportion of text, adding a text recognition feature could improve the accuracy of AdsISee.

Figure 11: Detection and Tracking of ads in hockey games with AdsISee.

REFERENCES

[1] Khaled Almgren, Murali Krishnan, Fatima Aljanobi, and Jeongkyu Lee. 2018. AD or Non-AD: A Deep Learning Approach to Detect Advertisements from Magazines. Entropy 20, 12 (2018).
[2] Matthew Brown and David G. Lowe. 2006. Automatic Panoramic Image Stitching using Invariant Features. International Journal of Computer Vision 74, 1 (2006), 1–15.
[3] Marketing Charts. 2018. World Cup 2018 Stats. https://www.marketingcharts.com/industries/sports-industries-83790. [Online; accessed 25-October-2019].
[4] Fifa.com. 2019. More than half the world watched record-breaking 2018 World Cup. https://www.fifa.com/worldcup/news/more-than-half-the-world-watched-record-breaking-2018-world-cup. [Online; accessed 25-October-2019].
[5] Orpix ComputerVision Inc. [n.d.]. Sponsorship Analytics and Evaluation-Orpix-Computer Vision. http://www.orpix-inc.com/sponsorship-valuation-analytics/. [Online; accessed 25-October-2019].
[6] Amila Jakubovic and Jasmin Velagic. 2018. Image Feature Matching and Object Detection Using Brute-Force Matchers. 2018 International Symposium ELMAR (2018).
[7] Peter Janku, Karel Koplik, Tomas Dulik, and Istvan Szabo. 2016. Comparison of tracking algorithms implemented in OpenCV. MATEC Web of Conferences 76 (2016), 1–6.
[8] Shervin Minaee, Imed Bouazizi, Prakash Kolan, and Hossein Najafzadeh. 2018. Ad-Net: Audio-Visual Convolutional Neural Network for Advertisement Detection In Videos. CoRR (2018).
[9] Marius Muja and David G. Lowe. 2009. Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration. In Proc. of the Fourth International Conference on Computer Vision Theory and Applications (VISAPP 2009). 331–340.
[10] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2017. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 6 (2017), 1137–1149.
[11] Sander Soo. 2014. Object detection using Haar-cascade Classifier. Institute of Computer Science, University of Tartu.
[12] Statista.com. 2019. Super Bowl average costs of a 30-second TV advertisement from 2002 to 2019 (in million U.S. dollars). https://www.statista.com/statistics/217134/total-advertisement-revenue-of-super-bowls/. [Online; accessed 25-October-2019].
[13] Zdenek Kalal, Krystian Mikolajczyk, and Jiri Matas. 2010. Forward-Backward Error: Automatic Detection of Tracking Failures. In 20th International Conference on Pattern Recognition (ICPR). 2756–2759.