<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Providing brands visibility data in live sports videos using deep learning algorithms *</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Julius</forename><surname>Gudauskas</surname></persName>
							<email>julius.gudauskas@ktu.edu</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Applied Informatics</orgName>
								<orgName type="institution">Kaunas University of Technology</orgName>
								<address>
									<addrLine>Studentu 50</addrLine>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Information Society</orgName>
								<orgName type="institution">University Studies</orgName>
								<address>
									<addrLine>2024, May 17</addrLine>
									<settlement>Kaunas</settlement>
									<country key="LT">Lithuania</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Providing brands visibility data in live sports videos using deep learning algorithms *</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">7427189287C5C417EA73C2AC4058E201</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Brands visibility</term>
					<term>logo detection</term>
					<term>YOLO</term>
					<term>Faster R-CNN</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In the dynamic landscape of marketing and advertising, assessing brand visibility in live sports events plays a pivotal role in understanding brand exposure and impact. Traditional methods of manual annotation and analysis are time-consuming and subjective, necessitating automated solutions for efficient and objective evaluation. This study proposes a novel approach leveraging deep learning algorithms to evaluate brand visibility in live sports videos. The research employs state-of-the-art object detection models, such as YOLO (You Only Look Once) and Faster R-CNN, to detect and localize brand logos within video frames. By training these models on annotated open-source logo datasets, we can extract valuable insights about the brands. The experimental results demonstrate the effectiveness of the proposed methodology in detecting logos and providing brand owners with valuable data about logo positions.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In the world of live sports video streaming, a considerable number of brands try to get noticed by the audience through popular visibility materials: posters, stickers, billboards, etc. Research evaluating the impact and effectiveness of advertisements in sports arenas emphasizes that people notice at least some of the advertisements they are exposed to and usually remember those that were most noticeable <ref type="bibr" target="#b0">[1]</ref>. Typically, clients negotiate with advertising executives over the conditions governing brand placement in the arena, specifying factors such as coverage, frequency of display on advertisement billboards, and overall visibility strategy. A study conducted by Eventmarketer <ref type="bibr" target="#b1">[2]</ref> found that 72% of the audience are captivated by a brand when they see it during events such as music festivals or sports competitions. These occasions, characterized by heightened emotions and excitement, offer a unique opportunity for brands to connect with a vast and diverse audience, potentially converting them into new users. However, to reach a wider audience, the brand must be placed in a visible location. Studies have shown that locations such as boundary-line hoardings are considered ideal places to display brand logos, achieving maximum visibility without irritating viewers <ref type="bibr" target="#b2">[3]</ref>  <ref type="bibr" target="#b3">[4]</ref>. As sponsorship agreements come at a significant cost, brand owners want to know whether their investment is paying off. Measuring the effectiveness of different brand placements, however, can be challenging, time-consuming, and reliant on manual work, leaving brand owners uncertain about the true impact of their sponsorship deals.</p><p>A comprehensive understanding of the effectiveness of brand advertising through the integration of deep learning techniques into visual material analysis remains an evolving area, prompting the need for further research to refine methodologies and uncover insights that can inform strategic marketing decisions. To calculate brand visibility metrics, a logo detection algorithm capable of detecting all logos must be created. Current logo detection methods often focus on a limited set of logo categories and require extensive training data with annotated object bounding boxes <ref type="bibr" target="#b4">[5]</ref>. The main challenge, however, remains the rapid growth in the number of brands and changes to the imagery of existing ones. In this research, we address an open logo detection challenge and provide a unified brand logo detection and recognition approach using state-of-the-art machine learning algorithms.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related works</head><p>Early approaches to specific logo detection relied on manually engineered visual features and traditional classification models <ref type="bibr">[6] [7]</ref>. Such methods have major flaws: sliding-window region selection cannot deliver high accuracy at acceptable time complexity, and hand-crafted features lack robustness to logo diversity. Recent advancements in deep learning have revolutionized the field of visual material analysis, providing novel avenues for detecting and recognizing various objects, including logos. Many researchers have explored the application of deep neural networks to image recognition tasks, demonstrating their capability to extract complex features and patterns. In the context of logo detection, the solution is usually determined by the size of the logos to be detected. Compared to larger logos, smaller ones are more difficult to detect for several reasons: small, low-resolution logos contain little visual information, they cover a small area, their bounding boxes are more challenging to locate, and there are usually fewer small logo samples <ref type="bibr" target="#b7">[8]</ref>. In work on small vehicle logo detection <ref type="bibr" target="#b8">[9]</ref>, researchers solved the issue using the YOLO <ref type="bibr" target="#b9">[10]</ref> algorithm. In contrast to traditional methods relying on manual feature extraction, this system offers the benefits of self-learned features and direct image input, and can efficiently achieve both vehicle logo positioning and recognition. Researchers have also introduced the Fast R-CNN approach, which employs deep neural networks, using convolutional layers to progressively extract abstract feature representations learned from previous convolutions <ref type="bibr" target="#b10">[11]</ref>  <ref type="bibr" target="#b11">[12]</ref>.</p><p>Natural visual scenes are usually complex and diverse: logos face various challenges, including object interference, shape distortion, varying lighting, and limited perspective, all of which increase the difficulty of logo detection. In a recent development, researchers introduced a transfer learning approach leveraging Densely Connected Convolutional Networks (DenseNet) for logo recognition <ref type="bibr" target="#b12">[13]</ref>. They tested their method on the FlickrLogos-32 dataset and reached an accuracy higher than 92%. Visibility can also be impacted by bad weather conditions. For the challenge of detecting logos in bad weather, the authors presented an object proposal generation system, AttentionMask <ref type="bibr" target="#b13">[14]</ref>. The experimental findings indicate that the suggested approach has strong capabilities for identifying logos within intricate real-world settings. Nonetheless, data gathered from real-world scenarios may not match the quality of artificially augmented data, leading to a decline in the model's performance on images captured in such conditions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>This work proposes a model that can identify and predict the bounding box around any logo within an image, irrespective of brand. Given the constant influx of new businesses, maintaining an up-to-date model with sufficient data for each brand is challenging, potentially leading to difficulties in detecting logos of newly established brands during inference. To address this, the aim is to develop a model focused solely on detecting logos in images, regardless of brand affiliation. This approach eliminates the need to train the model on individual brands, ensuring accurate detection of logos irrespective of their origin. Because the goal is a model with the highest possible accuracy, the implementation uses two different object detection pipelines: a one-stage model, which processes the entire image in a single pass, and a two-stage model, which uses two passes, where the first pass generates a set of proposals or potential object locations and the second pass refines these proposals into final predictions <ref type="bibr" target="#b14">[15]</ref>. In the one-stage category, YOLOv7 <ref type="bibr" target="#b15">[16]</ref>, visualized in Figure <ref type="figure" target="#fig_0">1</ref>, was selected because of its advantage in detecting smaller objects compared to a single-shot detection approach; for the two-stage category, Faster R-CNN was selected, visualized in Figure <ref type="figure" target="#fig_1">2</ref>.</p></div>
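Both pipelines ultimately produce scored candidate boxes that are filtered by non-maximum suppression (NMS) before final predictions are reported. The following is a minimal pure-Python sketch of that shared post-processing step; the names `iou` and `nms` and the 0.5 threshold are illustrative choices, not taken from the paper's implementation.

```python
# Sketch of IoU and greedy non-maximum suppression, the post-processing
# both detector families apply to their scored candidate boxes.

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box of each overlapping cluster.

    Returns the indices of the boxes that survive suppression.
    """
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        # Keep box i only if it does not heavily overlap an already-kept box.
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep
```

In a real detector the threshold and scoring come from the framework's own NMS implementation; this sketch only shows the logic.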
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data</head><p>In surveying existing open-source logo datasets (Table <ref type="table" target="#tab_0">1</ref>), attention was dedicated to comprehensively evaluating the diversity of logos within these repositories. Given the diverse nature of logos, ranging from graphic designs to text-based representations, care was taken to ensure that the selected datasets contain a wide spectrum of logo types.</p><p>QMUL-OpenLogo <ref type="bibr" target="#b17">[18]</ref>: This dataset contains more than 27,000 images with 352 different logos. It is a benchmark dataset for logo detection, formed by combining seven existing datasets and establishing an open protocol for evaluating detection performance. It exhibits a significant class imbalance and notable variations in scale, both crucial aspects for evaluating the effectiveness of detection algorithms.</p><p>LogoDet-3K <ref type="bibr" target="#b17">[18]</ref>: This dataset contains more than 3,000 unique classes and 158,652 images with labeled logo symbols (dataset example in Figure <ref type="figure">3</ref>). It divides logos into nine sub-categories: food, clothes, necessities, electronics, transportation, leisure time equipment, sports, medicine, and other (described in Table <ref type="table" target="#tab_2">3</ref>). The main advantage of this dataset is the large number of variations of the same brands: positions, lighting conditions, and angles.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Data preprocessing</head><p>In object detection models, the training data encompasses crucial elements: the images themselves, the coordinates of object bounding boxes, and their respective labels. Brand logo datasets commonly feature annotations tailored to individual brands, with each logo labeled by its brand name. However, this approach presents a notable challenge: models must be trained separately to detect each brand and require additional fine-tuning to recognize newly introduced brands. Moreover, some brands have more training images than others, introducing a class imbalance problem that can impact the model's performance.</p><p>In response to these challenges, we introduced a data preprocessing step that categorizes brands into two distinct groups: logos with textual elements and graphic-based logos. The result is achieved with the workflow visualized in Figure <ref type="figure" target="#fig_2">4</ref>: each image is cropped by its bounding box coordinates; the crop is processed with pytesseract, one of the most popular Python libraries for optical character recognition; if characters are detected, the logo is assigned the "TextLogo" label, otherwise the "GraphicsLogo" label.</p><p>The newly created dataset (Table <ref type="table" target="#tab_3">4</ref>) contains a noticeable class imbalance. This could severely impair the model's ability to assign the correct labels, leading to poor overall predictions. To address this issue, more graphic logos from other datasets were added, leaving the final dataset with a similar number of samples in each category. By ensuring a balanced distribution of samples across both categories, the model can learn effectively from a diverse range of examples, enhancing its capacity to make accurate predictions across all classes. The final dataset was formed with the same number of graphics logos and text logos.</p></div>
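The relabeling workflow above can be sketched as follows. The names `crop_box` and `label_crop` are our illustrative choices (the paper does not publish its code), and the OCR engine is passed in as a callable so that `pytesseract.image_to_string`, or any other recognizer, can be plugged in:

```python
# Sketch of the text/graphics relabeling step from Section 3.2.
# Function names are illustrative, not the authors' code.

def crop_box(image, box):
    """Crop a PIL image to an (x_min, y_min, x_max, y_max) bounding box."""
    return image.crop(box)

def label_crop(crop, ocr):
    """Return "TextLogo" if the OCR callable finds any characters in the
    cropped logo, otherwise "GraphicsLogo"."""
    return "TextLogo" if ocr(crop).strip() else "GraphicsLogo"

# Example wiring (requires Pillow and pytesseract; paths are hypothetical):
#   from PIL import Image
#   import pytesseract
#   crop = crop_box(Image.open("frame.jpg"), (10, 20, 110, 60))
#   label = label_crop(crop, ocr=pytesseract.image_to_string)
```

Injecting the OCR callable keeps the labeling logic testable without a Tesseract installation.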
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results</head><p>This research presents a comparative analysis between two object detection frameworks: YOLO (You Only Look Once) and Faster R-CNN. Focusing specifically on their ability to detect logo bounding boxes, the aim is to compare the accuracy of one-stage and two-stage detection methodologies. This comparison is essential for understanding the trade-offs between accuracy and efficiency: while one-stage detectors are generally faster, they may sacrifice some accuracy compared to two-stage detectors. By quantitatively comparing the bounding box detection accuracy of the two methods, we can determine whether the sacrifice in accuracy is acceptable given the efficiency gains.</p><p>The experiments were done on the specifically crafted dataset (Section 3.2) with the model parameters that provided the best accuracy:</p><p>• Faster R-CNN with a ResNet-50 V1 FPN backbone, leveraging pre-trained weights, a learning rate of 10⁻⁴, a momentum of 0.9, a weight decay of 10⁻⁶, and a batch size of 64.</p><p>• YOLOv7 with a learning rate of 10⁻⁴, L2 regularization of 10⁻⁴, and a batch size of 64.</p><p>The deep learning models were evaluated using the mAP metric. Mean Average Precision (mAP) is a widely used metric for evaluating the performance of machine learning models, particularly in object detection and recognition tasks. It assesses a model's ability to accurately identify objects within an image dataset. The mAP metric calculates the average precision for each object class across all images, then averages these values to produce a single score. This score reflects both the precision (the ratio of true positive predictions to all positive predictions) and the recall (the ratio of true positive predictions to all actual positives) of the model. A higher mAP indicates better performance, with a score of 100% representing perfect detection accuracy. By utilizing mAP, we can quantify and compare the effectiveness of different models.</p><p>Both models' data were augmented using a random sized box crop with probability 0.5, preserving the integrity of logo bounding boxes during random cropping and resizing. Horizontal and vertical flips were applied independently, each with probability 0.3, introducing variations in viewpoint. Data augmentation increased the YOLO model's performance by 3% mAP and Faster R-CNN's by 2% mAP. Both models' accuracies over the last 70 epochs are presented in Figure <ref type="figure" target="#fig_3">5</ref>. Although Faster R-CNN achieved better accuracy, its speed cost is significant: the performance analysis showed that YOLOv7 runs 13% faster. Faster R-CNN is therefore preferable when more time is available, while YOLO is preferable for real-time tasks. A comparison of different Faster R-CNN FPN backbones showed that ResNet-50 V1 demonstrates a noticeable accuracy improvement over MobileNet-V3 and VGG16: evaluation by the mAP metric shows that ResNet-50 V1 provides up to 6% better accuracy than MobileNet and up to 5% better than VGG16. The accuracies of the different FPN backbones over the last 70 epochs are presented in Figure <ref type="figure" target="#fig_4">6</ref>. Real-world testing was done using 100 custom-annotated frames from sports video footage containing 664 logos. The model detected 523 logo bounding boxes, 78% of all logos in the test data. The results are presented in the confusion matrix in Figure <ref type="figure" target="#fig_5">7</ref>.
In this study, we explored deep learning algorithms for evaluating brand visibility in live sports videos, presenting a novel approach to the challenges of manual annotation and subjective analysis. Through the utilization of advanced object detection models such as YOLO and Faster R-CNN, the final model demonstrated the capability of automated methods to accurately detect and localize brand logos within the dynamic context of sports videos. The final model detected roughly eight out of ten logos in real-world video; this finding emphasizes the significant potential of automated solutions to overcome the limitations of manual annotation, offering a more objective and more efficient evaluation of brand positioning. Looking forward, the continued development and refinement of deep learning methodologies, coupled with advancements in real-time monitoring capabilities, holds promise for further enhancing the accuracy and effectiveness of brand visibility evaluation in live sports videos. Additionally, the availability of larger and more diverse annotated datasets will be essential for improving model performance and generalization.</p></div>
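As a concrete illustration of the mAP metric used in the results above, the following is a minimal single-class average precision computation in the standard area-under-the-precision-recall-curve form; the function names are ours, not the authors' evaluation code.

```python
# Minimal sketch of Average Precision / mAP, the metric used to compare
# the detectors. Names are illustrative, not the authors' code.

def average_precision(scored_hits, num_gt):
    """AP for one class.

    scored_hits: (confidence, is_true_positive) pairs, one per predicted
    box, where a prediction is a true positive if it matches a ground-truth
    box (e.g. by IoU). num_gt: number of ground-truth boxes for the class.
    """
    ap, prev_recall = 0.0, 0.0
    tp = fp = 0
    # Sweep the confidence threshold from high to low.
    for _, is_tp in sorted(scored_hits, key=lambda p: -p[0]):
        tp += is_tp
        fp += not is_tp
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += precision * (recall - prev_recall)  # area under the PR curve
        prev_recall = recall
    return ap

def mean_average_precision(ap_per_class):
    """mAP: the mean of per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

A perfect detector (all ground truths found, no false positives) scores 1.0; every false positive ranked above a true positive pulls the score down.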
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: YOLOv7 workflow [16].</figDesc><graphic coords="3,103.80,354.60,387.70,202.10" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Faster R-CNN workflow.</figDesc><graphic coords="3,100.65,591.75,394.10,153.05" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Data pre-processing pipeline. Rather than developing a new dataset, this approach leverages existing ones, which not only saves time and resources but also ensures that the model benefits from a diverse range of logo samples. Additionally, providing a minimum of two classes for training is essential for any object detection model to effectively learn and generalize. The final dataset for model training is stored in COCO (Common Objects in Context) format.</figDesc><graphic coords="6,88.55,78.75,419.20,79.85" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Faster R-CNN and YOLOv7 comparison.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Faster R-CNN FPN backbone comparison.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Testing results confusion matrix.</figDesc><graphic coords="8,157.10,368.95,271.50,144.50" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Open-source logo datasets. FlickrLogos-32 [15]: This dataset includes 32 different logo classes from various domains. Since it contains about 2240 images with marked boundary coordinates, it is well suited for building a model for brand detection and recognition (Table 2).</figDesc><table><row><cell>Dataset name</cell><cell>Logos</cell><cell>Images</cell><cell>Year</cell></row><row><cell>BelgaLogos [15]</cell><cell>37</cell><cell>10,000</cell><cell>2009</cell></row><row><cell>FlickrLogos-32 [16]</cell><cell>27</cell><cell>2240</cell><cell>2011</cell></row><row><cell>SportLogo [17]</cell><cell>31</cell><cell>2836</cell><cell>2020</cell></row><row><cell>QMUL-OpenLogo [18]</cell><cell>352</cell><cell>27,083</cell><cell>2018</cell></row><row><cell>FoodLogoDet-1500 [19]</cell><cell>1500</cell><cell>99,768</cell><cell>2021</cell></row><row><cell>LogoDet-3K [18]</cell><cell>3000</cell><cell>158,652</cell><cell>2020</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>FlickrLogos-32 analysis.</figDesc><table><row><cell>Dataset part</cell><cell>Description</cell><cell>Images</cell><cell>Total images count</cell></row><row><cell>P1</cell><cell>Selected pictures with clean backgrounds. One brand logo in each image.</cell><cell>10 for each logo</cell><cell>320</cell></row><row><cell>P2</cell><cell>Different images showing at least one brand logo in different conditions. Pictures without logos.</cell><cell>30 for each logo; 3000 without any logo</cell><cell>3960</cell></row><row><cell>P3</cell><cell>Different images showing at least one brand logo in different conditions. Pictures without logos.</cell><cell>30 for each logo; 3000 without any logo</cell><cell>3960</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>LogoDet-3k analysis.</figDesc><table><row><cell></cell><cell>Figure 3: LogoDet-3k visualisation.</cell><cell></cell></row><row><cell>Category</cell><cell>Images count</cell><cell>Total brands count</cell></row><row><cell>Food brands Clothes brands</cell><cell>53,350 31,266</cell><cell>64,276 37,601</cell></row><row><cell>Necessities brands</cell><cell>24,822</cell><cell>30,643</cell></row><row><cell>Electronics brands</cell><cell>9,675</cell><cell>12,139</cell></row><row><cell>Transportation brands</cell><cell>10,445</cell><cell>12,791</cell></row><row><cell>Leisure time equipment brands</cell><cell>5,685</cell><cell>6,573</cell></row><row><cell>Sports brands</cell><cell>3,953</cell><cell>5,041</cell></row><row><cell>Medicine brands</cell><cell>3,945</cell><cell>5,185</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Logo categories after preprocessing.</figDesc><table><row><cell>Dataset</cell><cell>Text logo</cell><cell>Graphics logo</cell></row><row><cell>LogoDet-3K</cell><cell>168,416</cell><cell>25,845</cell></row><row><cell>FlickrLogos-32 and others</cell><cell>113</cell><cell>3,154</cell></row><row><cell>Total</cell><cell>281,416</cell><cell>29,008</cell></row></table></figure>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The impact and effectiveness of advertisements in a sports arena</title>
		<author>
			<persName><forename type="first">L</forename><surname>Turley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">Richard</forename><surname>Shannon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of services marketing</title>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Shaz</surname></persName>
		</author>
		<title level="m">Experiential Marketing. A practical guide to interactive brand experiences</title>
				<meeting><address><addrLine>London and Philadelphia</addrLine></address></meeting>
		<imprint>
			<publisher>Kogan Page</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Impact of Elements of Ad&apos;s on Sports Fan Attitude during a Live Sporting Event</title>
		<author>
			<persName><forename type="first">G</forename><surname>Raghav</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Aradhana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scholarly Journal</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="867" to="876" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">An analytical study of consumer awareness</title>
		<author>
			<persName><forename type="first">R</forename><surname>Arora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chawla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Sachdeva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Advanced Research in Management and Social Sciences</title>
		<imprint>
			<biblScope unit="volume">13</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="137" to="153" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Mutual enhancement for detection of multiple logos in sports videos</title>
		<author>
			<persName><forename type="first">L</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Xiaoqing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Chengcui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yongtao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Computer Vision</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Elliptical ASIFT agglomeration in class prototype for logo detection</title>
		<author>
			<persName><forename type="first">R</forename><surname>Boia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Florea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Florea</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the British Machine Vision Conference</title>
				<meeting>the British Machine Vision Conference</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Tree-based Shape Descriptor for scalable logo detection</title>
		<author>
			<persName><forename type="first">C</forename><surname>Wan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Cai</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Visual Communications and Image Processing</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Deep Learning for Logo Detection: A Survey</title>
		<author>
			<persName><forename type="first">H</forename><surname>Sujuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jiacheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Weiqing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">Y Z</forename><surname>Qiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yuanjie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Shuqiang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Multimedia Comput. Commun</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">72</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A real-time vehicle logo detection method based on improved YOLOv2</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Kangning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Shaoqi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Guangqiang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Wireless Algorithms, Systems, and Applications</title>
				<meeting>the International Conference on Wireless Algorithms, Systems, and Applications</meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">You only look once: Unified, real-time object detection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Redmon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Divvala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Girshick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Farhadi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Improving Small Object Proposals for Company Logo Detection</title>
		<author>
			<persName><forename type="first">C</forename><surname>Eggert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zecha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brehm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lienhart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACM on International Conference on Multimedia Retrieval</title>
				<meeting><address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A closer look: Small object detection in faster R-CNN</title>
		<author>
			<persName><forename type="first">C</forename><surname>Eggert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brehm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Winschel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zecha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lienhart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Multimedia and Expo</title>
				<meeting><address><addrLine>Hong Kong</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Logo Recognition with the Use of Deep Convolutional Neural Networks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Alsheikhy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Said</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
		<respStmt>
			<orgName>Engineering, Technology and Applied Science Research</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Which airline is this? Airline logo detection in real-world weather conditions</title>
		<author>
			<persName><forename type="first">C</forename><surname>Wilms</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Heid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sadeghi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ribbrock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Frintrop</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th International Conference on Pattern Recognition</title>
				<meeting>the 25th International Conference on Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A Review of Object Detection Models based on Convolutional Neural Network</title>
		<author>
			<persName><forename type="first">F</forename><surname>Sultana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Abu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Paramartha</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Intelligent computing: image processing based applications</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1" to="16" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">PP-YOLO: An effective and efficient implementation of object detector</title>
		<author>
			<persName><forename type="first">X</forename><surname>Long</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2007.12099</idno>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Logo retrieval with a contrario visual query expansion</title>
		<author>
			<persName><forename type="first">A</forename><surname>Joly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Buisson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 17th ACM International Conference on Multimedia</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Scalable logo recognition in real-world images</title>
		<author>
			<persName><forename type="first">S</forename><surname>Romberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Garcia Pueyo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Lienhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Van Zwol</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st ACM International Conference on Multimedia Retrieval</title>
				<meeting>the 1st ACM International Conference on Multimedia Retrieval</meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">A new sport teams logo dataset for detection tasks</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kuznetsov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Savchenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Computer Vision and Graphics</title>
				<meeting>the International Conference on Computer Vision and Graphics</meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Open logo detection challenge</title>
		<author>
			<persName><forename type="first">H</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the British Machine Vision Conference</title>
				<meeting>the British Machine Vision Conference</meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">FoodLogoDet-1500: A dataset for large-scale food logo detection via multi-scale feature decoupling network</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jiang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 29th ACM International Conference on Multimedia</title>
				<meeting>the 29th ACM International Conference on Multimedia</meeting>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">LogoDet-3K: A large-scale image dataset for logo detection</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Min</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jiang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Multimedia</title>
		<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
