Automated Recognition of Sports Scores using PyTesseract OCR and CNN: SportsVideo Task at MediaEval 2023

Bhuvana Jayaraman†, Mirnalinee TT, Harshida Sujatha Palaniraj†, Mohith Adluru† and Sanjjit Sounderrajan†

Sri Sivasubramaniya Nadar College of Engineering, Chennai, Tamil Nadu, India

Abstract
In the dynamic realm of competitive swimming, timely and accurate dissemination of race results is paramount. The advent of digital display boards has streamlined this process, offering real-time performance metrics for each swimmer. However, extracting and recognizing the characters displayed on these boards poses unique challenges. Our proposed methodology addresses these intricacies, offering a robust solution for real-time score recognition amidst dynamic gameplay and varying camera angles. The systematic approach leverages advanced image preprocessing techniques to refine input images, employs PyTesseract OCR for real-time interpretation of textual information from swimming board images, and integrates post-processing validation to improve the accuracy of the extracted race results. Our findings not only contribute to the evolving field of sports analytics but also pave the way for an enhanced viewer experience and richer data accessibility in sports broadcasting.

MediaEval'23: Multimedia Evaluation Workshop, February 1–2, 2024, Amsterdam, The Netherlands and Online
* Corresponding author.
† These authors contributed equally.
Email: bhuvanaj@ssn.edu.in (B. Jayaraman); mirnalineett@ssn.edu.in (M. TT); harshida2110349@ssn.edu.in (H. S. Palaniraj); mohith2110799@ssn.edu.in (M. Adluru); sanjjit2110378@ssn.edu.in (S. Sounderrajan)
URL: https://www.ssn.edu.in/staff-members/dr-j-bhuvana/ (B. Jayaraman); https://www.ssn.edu.in/staff-members/dr-t-t-mirnalinee/ (M. TT)
ORCID: 0000-0002-9328-6989 (B. Jayaraman); 0000-0001-6403-3520 (M. TT)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Swimming competitions have long been a focal point of sporting events, attracting participants and enthusiasts alike. In this era of fast-paced sporting events, the real-time availability of race results is of paramount importance, offering athletes insights into their performance and spectators the thrill of immediate competition outcomes. Digital boards have emerged as indispensable tools in this context, acting as conduits for promptly showcasing race results. These boards, equipped with advanced display technology, present a visually rich representation of crucial information such as swimmer names, times, and rankings. However, manually extracting this information from the visually diverse and dynamic context of swimming events is arduous and prone to errors [1].

The motivation behind this research stems from the transformative potential of OCR technology in addressing these challenges. By automating the extraction of textual information from digital boards, OCR not only promises to expedite the process but also ensures a higher degree of accuracy. This, in turn, enhances the overall experience for both participants and spectators, fostering a more efficient and error-resistant mechanism for obtaining vital race results.
The primary objectives of this research, set within the context of Task 6 on recognizing race results [2], are to evaluate OCR's effectiveness in recognizing characters under the diverse conditions encountered in swimming environments and to develop a tailored preprocessing pipeline that improves its adaptability. By achieving these objectives, the study seeks to contribute to the efficient and accurate extraction of critical race information, offering potential applications beyond competitive swimming in other visually complex scenarios.

2. Related Work

Extracting and recognizing text from an image is an important step toward displaying and predicting results. Tesseract OCR has been applied to extract text from video frames or images, combined with additional image-processing filters implemented in OpenCV [3]. Commonly used processes include morphological operations and Gabor filters for preprocessing; edge detection, DWT, and SWT for feature extraction; and the Hough transform and k-means clustering for segmentation [4]. For an unconstrained image indexing and retrieval system based on a neural network, Misra et al. [5] extract a set of features from each region of interest (ROI) in a specific color plane and use them in a feature-based classifier to determine whether the ROI contains text or non-text blocks. Blocks identified as text are then given as input to an OCR engine, whose output, ASCII characters forming words, is stored in a database as keywords for future retrieval. Existing algorithms have also been modified to remain effective on blurred and noisy images; images with clear letter boundaries are filtered using OCR to detect the text [6].

Text extraction involves detection, localization, tracking, binarization, extraction, enhancement, and recognition of the text in a given image. Proposed methods have been based on morphological operators, wavelet transforms, artificial neural networks, skeletonization, edge-detection algorithms, histogram techniques, and related tools [7]. DWT combined with k-means clustering has been used to extract text regions for image segmentation [8, 9]. Moreover, texture-based text extraction approaches have been proposed [10] as alternatives to edge-based and connected-component (CC)-based algorithms, addressing variability in orientation, alignment, font, and size, as well as poor image contrast and complex backgrounds. Minaee and Wang [1] focus on text segmentation against textured backgrounds whose colors are similar to the text, outperforming recent work on challenging images.

3. Approach

Our comprehensive approach to the automated extraction of swimming race results integrates image preprocessing, Optical Character Recognition (OCR) with PyTesseract, and meticulous post-processing validation. Leveraging OpenCV and PyTesseract, we systematically optimize input images, configure OCR settings, and validate the extracted information. In parallel, our solution incorporates a hybrid methodology for score extraction in table tennis and swimming events, combining Convolutional Neural Network (CNN) feature extraction with Tesseract OCR to exploit both visual and textual cues. Initial benchmarks involved training the CNN on visual features while processing text in parallel with Tesseract OCR. Successes include accurate score extraction and additional context, while challenges involved varying fonts and layouts. A refined CNN architecture, preprocessing optimization, and continuous learning mechanisms addressed these challenges. Real-world feedback and validation on diverse datasets guided our iterative development, resulting in a robust solution for both swimming race results and sports event scores.

Figure 1: Architecture overview

3.1. Dataset Description

The dataset comprises a collection of images capturing swimming scoreboards with associated swimmer names and scores, sourced from various swimming competitions. It includes variations in lighting conditions, background settings, scoreboard designs, and perspectives to support the development and evaluation of algorithms for accurate score extraction from scoreboard images.

3.2. Image Preprocessing

In the initial step of image preprocessing, the swimming board image is loaded using the OpenCV library. The primary goal is to simplify the image's structure and thereby aid the subsequent OCR process. Grayscale conversion is applied to eliminate color complexities, as the focus is on textual information. Optional thresholding is considered to create a binary image, accentuating text while minimizing background noise. Additionally, denoising is employed to further improve the clarity of the text. These preprocessing steps collectively lay the groundwork for accurate OCR by optimizing the input image, as illustrated in the sketch below.
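A minimal sketch of this preprocessing stage using OpenCV is shown below. The specific denoising strength and the use of Otsu's method for thresholding are illustrative assumptions; the paper does not fix these parameters.

```python
import cv2

def preprocess_scoreboard(image_path):
    """Prepare a swimming scoreboard image for OCR."""
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    # Grayscale conversion removes color complexity; only the text matters.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Denoising sharpens text edges; the filter strength h=30 is an assumed value.
    denoised = cv2.fastNlMeansDenoising(gray, h=30)

    # Optional binarization: Otsu's method chooses the threshold automatically,
    # accentuating text while suppressing background noise.
    _, binary = cv2.threshold(denoised, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary
```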
3.3. OCR Text Extraction

Following image preprocessing, the PyTesseract library is employed for Optical Character Recognition (OCR), interpreting the text present on the swimming board image. It is crucial to configure the OCR settings appropriately, specifying the OCR Engine Mode (OEM) and Page Segmentation Mode (PSM). These configurations are selected based on the specific characteristics of the text on swimming race result boards. PyTesseract translates the visual information in the image into machine-readable text, forming the basis for subsequent analysis.

3.4. Post Processing

Post-processing steps are crucial for refining and organizing the extracted text into meaningful race results. The extracted text is split into relevant components based on the expected structure of the results, and custom validation checks are then applied to ensure that the extracted information matches the expected format of swimming race results. Any errors or inconsistencies introduced during the OCR step are addressed here. Post-processing thus serves as a critical phase for ensuring the accuracy and reliability of the extracted race results. A combined sketch of the OCR and validation steps follows.
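The following sketch ties Sections 3.3 and 3.4 together. The OEM/PSM values and the assumed result-line format (rank, swimmer name, time) are illustrative; the paper does not specify its exact configuration or validation rules.

```python
import re
import pytesseract

# OEM 3 selects the default LSTM engine; PSM 6 assumes a uniform block of text.
# Both values are assumptions chosen for scoreboard-like layouts.
TESSERACT_CONFIG = "--oem 3 --psm 6"

# Assumed line structure: rank, swimmer name, and a time such as 1:54.32 or 54.32.
RESULT_PATTERN = re.compile(r"^(\d+)\s+([A-Za-z .'-]+?)\s+((?:\d+:)?\d{1,2}\.\d{2})$")

def extract_race_results(preprocessed_image):
    """Run OCR on a preprocessed scoreboard and keep only validly formatted rows."""
    raw_text = pytesseract.image_to_string(preprocessed_image,
                                           config=TESSERACT_CONFIG)
    results = []
    for line in raw_text.splitlines():
        match = RESULT_PATTERN.match(line.strip())
        if match:  # discard lines that do not fit the expected result structure
            rank, name, time = match.groups()
            results.append({"rank": int(rank), "name": name.strip(), "time": time})
    return results
```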
3.5. Hybrid Approach: PyTesseract OCR + CNN-Based Score Extraction

Our streamlined approach starts with dataset curation, collecting diverse annotated scoreboards from swimming events. PyTesseract OCR extracts the initial text, focusing on visible scores and details. Preprocessing involves normalizing and resizing images for consistency and model generalization. VGG16, a pre-trained CNN, then captures the visual patterns associated with scores, and a custom score layer is added to enhance the model's ability to predict numerical scores. Validation optimizes hyperparameters for robust performance. Real-time deployment prioritizes CNN predictions, augmented by consolidation and confidence scoring for improved accuracy. Continuous learning, through periodic updates, ensures adaptability to evolving scoreboards and presentation styles. This efficient hybrid method integrates OCR with a CNN for automated score extraction in dynamic sports events.
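A minimal sketch of the CNN branch is given below, assuming a Keras implementation. The frozen VGG16 backbone matches the description above; the head width, input resolution, and mean-squared-error loss are assumptions, since the paper does not report them.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_score_model(input_shape=(224, 224, 3)):
    """VGG16 backbone with a custom regression head for numerical score prediction."""
    backbone = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
    backbone.trainable = False  # reuse pre-trained visual features as-is

    model = models.Sequential([
        backbone,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),  # head width is an assumed choice
        layers.Dropout(0.5),
        layers.Dense(1),  # single numerical output, i.e. the custom score layer
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model
```

At inference time, the CNN prediction and the OCR reading can then be consolidated, for instance by preferring the CNN output and falling back to the OCR value when the two disagree beyond a tolerance; the exact consolidation rule is not specified in the paper.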
4. Results and Analysis

In our research endeavor, SSN-MLRG-TEAM2 devised a hybrid PyTesseract OCR + CNN approach, which yielded promising results for score extraction in swimming events. The overall accuracy reached 88.05%, a notable improvement over using PyTesseract OCR or the CNN individually. Comparative analysis revealed the hybrid approach's distinctive advantages: adaptability to diverse sports, real-time processing, and improved accuracy. Future research will explore extensions to other sports and domains, emphasizing responsible AI practices. Overall, our results underscore the potential of the hybrid PyTesseract OCR + CNN approach in automating sports analytics.

5. Discussion and Outlook

Our methodology for swimming race result extraction has achieved a commendable accuracy of 88.05% in interpreting diverse swimming board images, providing a robust foundation for automated sports analytics. Despite these successes, challenges such as variations in text layout and font styles persist, warranting further refinement of the OCR configurations and potential integration of additional machine learning techniques. Looking forward, the broader applicability of our approach extends beyond swimming competitions to other sports with similar visual-data challenges. Future research avenues include integrating real-time data streaming and developing a more comprehensive sports analytics framework. Ethical considerations, particularly data privacy and bias mitigation, remain crucial focal points. As technological advancements continue, this work serves as a stepping stone for ongoing developments in automated sports analytics, offering valuable insights and avenues for future exploration and improvement.

References

[1] S. Minaee, Y. Wang, Text extraction from texture images using masked signal decomposition, in: 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE, 2017, pp. 1210–1214.
[2] A. Erades, P. Martin, R. V. B. Mansencal, R. Péteri, J. Morlier, S. Duffner, J. Benois-Pineau, SportsVideo: A multimedia dataset for event and position detection in table tennis and swimming, in: Working Notes Proceedings of the MediaEval 2023 Workshop, Amsterdam, The Netherlands and Online, 1–2 February 2024, CEUR Workshop Proceedings, CEUR-WS.org, 2023.
[3] R. Chatterjee, A. Mondal, Effects of different filters on text extractions from videos using Tesseract, 2021. doi:10.13140/RG.2.2.36025.08804.
[4] R. Manjunath, S. Guruswamy, A survey on text detection from document images, 2020, pp. 961–972. doi:10.1007/978-981-15-0633-8_98.
[5] C. Misra, P. K. Swain, J. K. Mantri, Text extraction and recognition from image using neural network, International Journal of Computer Applications 40 (2012) 13–19.
[6] N.-M. Chidiac, P. Damien, C. Yaacoub, A robust algorithm for text extraction from images, in: 2016 39th International Conference on Telecommunications and Signal Processing (TSP), IEEE, 2016, pp. 493–497.
[7] C. Sumathi, T. Santhanam, G. Devi, A survey on various approaches of text extraction in images, International Journal of Computer Science and Engineering Survey 3 (2012).
[8] D. Ghai, D. Gera, N. Jain, A new approach to extract text from images based on DWT and k-means clustering, International Journal of Computational Intelligence Systems 9 (2016) 900–916. doi:10.1080/18756891.2016.1237189.
[9] D. Ghai, N. Jain, Comparative analysis of multi-scale wavelet decomposition and k-means clustering based text extraction, Wireless Personal Communications 109 (2019) 1–36. doi:10.1007/s11277-019-06574-w.
[10] D. Ghai, N. Jain, Comparison of different text extraction techniques for complex color images, 2022, pp. 139–160. doi:10.1002/9781119861850.ch9.