<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Comparative Analysis of Camera Calibration Algorithms for Football Applications</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Oleksandr</forename><surname>Sorokivskyi</surname></persName>
							<email>sasha.sorokivski@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Department of Computer Science</orgName>
								<orgName type="department" key="dep2">Faculty of Computer Information Systems and Software Engineering</orgName>
								<orgName type="institution">Ternopil Ivan Puluj National Technical University</orgName>
								<address>
									<settlement>Ternopil</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Volodymyr</forename><surname>Hotovych</surname></persName>
							<email>gotovych@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Department of Computer Science</orgName>
								<orgName type="department" key="dep2">Faculty of Computer Information Systems and Software Engineering</orgName>
								<orgName type="institution">Ternopil Ivan Puluj National Technical University</orgName>
								<address>
									<settlement>Ternopil</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Oleg</forename><surname>Nazarevych</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Department of Computer Science</orgName>
								<orgName type="department" key="dep2">Faculty of Computer Information Systems and Software Engineering</orgName>
								<orgName type="institution">Ternopil Ivan Puluj National Technical University</orgName>
								<address>
									<settlement>Ternopil</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Grygorii</forename><surname>Shymchuk</surname></persName>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Department of Computer Science</orgName>
								<orgName type="department" key="dep2">Faculty of Computer Information Systems and Software Engineering</orgName>
								<orgName type="institution">Ternopil Ivan Puluj National Technical University</orgName>
								<address>
									<settlement>Ternopil</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Comparative Analysis of Camera Calibration Algorithms for Football Applications</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">C6CD7E2FEFCB91804C74BC44B5C78520</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:41+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>computer vision</term>
					<term>camera calibration</term>
					<term>homography estimation</term>
					<term>football application</term>
					<term>perspective projection</term>
					<term>semantic segmentation</term>
					<term>clustering</term>
					<term>machine learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In solving the problem of automated analysis of football match video recordings, special video cameras are currently used. This work presents a comparative characterization of known algorithms and methods for video camera calibration, including those utilizing machine learning and neural networks, with the aim of identifying their shortcomings and forming a theoretical foundation for developing modern, more effective methods and algorithms. Specifically, it examines both algorithms that require more input data but operate quickly [1, 2] and more accurate algorithms using machine learning <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6]</ref>. It is demonstrated that their main drawback is a trade-off between accuracy and speed: more accurate machine-learning algorithms often do not report operational speed, which precludes their use in real-time applications, while the examined works that emphasize speed frequently lack the accuracy necessary for practical use in real-life scenarios.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Football match analysis uses statistical data, tactics, and player performance metrics to help coaches, scouts, and media professionals understand games better and make data-driven decisions. In football analytics, determining players' positions on the field plays a crucial role. Based on such information, it is possible not only to analyze <ref type="bibr" target="#b6">[7]</ref> but also to predict <ref type="bibr" target="#b7">[8]</ref> the game outcome.</p><p>One of the most popular solutions is the use of location sensors attached to players' bodies. However, this solution is not always optimal. Body-attached sensors often cause discomfort to players, and the solution is costly, making it inaccessible for football clubs with limited budgets.</p><p>Currently, computer vision technologies are gaining popularity for solving the player localization problem, particularly through automatic analysis of match video recordings. Determining players' positions on the field occurs in two stages: camera calibration and parameter determination, followed by player localization in the camera image.</p><p>This work presents a comparative analysis of known computer vision and machine learning algorithms for camera calibration and parameter determination.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related works</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Camera Model</head><p>For the classic pinhole model, the basic formula of perspective projection is:</p><formula>λ_m m = K [R | T] M, (1)</formula><p>where M denotes a 3D point and m the corresponding 2D point in the image; both are expressed in homogeneous coordinates, and λ_m is an arbitrary scale factor.</p><p>R is a 3 × 3 rotation matrix describing the rotational mapping from the world coordinate system to the camera coordinate system.</p><p>T is a 3 × 1 vector describing the translational mapping from the world coordinate system to the camera coordinate system.</p><p>K is a 3 × 3 matrix of the internal camera parameters:</p><formula xml:id="formula_0">K = [ f s u_0 ; 0 βf v_0 ; 0 0 1 ] (2)</formula><p>where the scale factor f applies to both the u and v axes of the image, and s describes the skew between these two axes.</p><p>β accounts for non-isotropic scaling, and the coordinates (u_0, v_0) denote the principal point.</p><p>When the observed 3D points lie on a plane, this projection simplifies to a homography: a 3 × 3 matrix mapping between two planar surfaces. Under perspective projection, parallel lines in 3D space exhibit an interesting phenomenon: if they are not parallel to the image plane, their 2D projections converge to a single point in the image, termed the vanishing point. Notably, the line connecting this vanishing point to the optical center of the camera runs parallel to the corresponding 3D lines in space. Consequently, all sets of parallel lines in 3D space that share the same direction correspond to the same vanishing point in the 2D image. This principle is fundamental to understanding how 3D scenes are projected onto 2D images in perspective projection.</p></div>
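The projection model of Eqs. (1) and (2) can be sketched in a few lines of numpy. The function names and parameter values below are illustrative, not drawn from any of the cited works:

```python
import numpy as np

def intrinsic_matrix(f, beta=1.0, s=0.0, u0=0.0, v0=0.0):
    """Build K as in Eq. (2): focal scale f, aspect beta, skew s,
    principal point (u0, v0)."""
    return np.array([[f,   s,        u0],
                     [0.0, beta * f, v0],
                     [0.0, 0.0,      1.0]])

def project(K, R, T, M):
    """Eq. (1): lambda_m * m = K [R | T] M for a 3D point M (shape (3,)).
    Returns the inhomogeneous pixel coordinates."""
    m = K @ (R @ M + T)      # homogeneous image point, scaled by lambda_m
    return m[:2] / m[2]      # divide out the arbitrary scale factor

K = intrinsic_matrix(f=800.0, u0=320.0, v0=240.0)
R = np.eye(3)                     # camera aligned with the world axes
T = np.array([0.0, 0.0, 5.0])     # world origin 5 units in front of camera
p = project(K, R, T, np.array([1.0, 0.0, 0.0]))  # projects to (480, 240)
```

A point on the optical axis lands exactly at the principal point (u0, v0); off-axis points are displaced in proportion to f and inversely to depth, which is the perspective effect the vanishing-point construction exploits.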
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Problem statement</head><p>Camera calibration involves determining a camera's internal parameters (focal length, pixel ratio, projection center) and external parameters (the rotation and translation expressing the camera's position and orientation relative to the world coordinate system). Early approaches rely on matching local features in combination with the direct linear transformation (DLT) to estimate a homography. One of the first algorithms for determining these parameters is vanishing-point-based calibration (VPBC). The study <ref type="bibr" target="#b8">[9]</ref> presents a two-stage camera calibration method. In the first stage, the focal length and the location of the principal point (the intersection of the optical axis with the image plane) are determined from a single image of a calibration cube, an example of which is shown in Figure <ref type="figure">1</ref>. The second stage estimates the rotation matrix and translation vector between two cameras, using a stereo pair of images of a flat calibration pattern. This stage involves finding three corresponding vanishing points in both images, computing the rotation matrix from these points, and estimating the translation vector through triangulation. An improvement to this method was proposed in <ref type="bibr" target="#b9">[10]</ref>. Their approach uses only one image with two vanishing points, eliminating the need for a special calibration pattern. The method uses two lines to determine the vanishing points and the known length of one of these lines for the transformation and subsequent calculations. This enhanced method simplifies the calibration process, making it more practical for various applications. 
Both studies <ref type="bibr" target="#b8">[9]</ref> and <ref type="bibr" target="#b9">[10]</ref> made significant contributions to the development of camera calibration methods, improving both the accuracy and the convenience of the process. The first laid the foundation for using vanishing points in camera calibration, while the second proposed a more efficient approach requiring less input data. One of the first applications of these algorithms in football is described in <ref type="bibr" target="#b1">[2]</ref>. This method consists of two main stages. The first stage detects straight lines or their segments, using Hough methods <ref type="bibr" target="#b10">[11]</ref> or edge segmentation methods <ref type="bibr" target="#b11">[12]</ref>. The detected lines are grouped into vertical and horizontal sets. The second stage matches these two sets of image segments with segments in the football field model. Matching is done by identifying segments that intersect each other. Given two vanishing points, the algorithm selects the segments that best correspond to the field model constructed from these vanishing points. After this, the rotation (R) and translation (T) matrices are calculated to obtain the final camera model. An example of the algorithm's sequential operation on real images is shown in Figure <ref type="figure">2</ref>. However, all the aforementioned algorithms have limitations and work effectively only under certain conditions and with specific input data. This is insufficient for a fully functional analytics system, whose foundation is determining the homography matrix in each frame of the video stream.</p><p>Further discussion is devoted to methods that not only find key points but also continue calculating the homography matrix in subsequent frames. 
These methods allow for the creation of more reliable and flexible systems for analyzing football matches, capable of working in various conditions and with different types of input data. A comparison of the aforementioned algorithms in terms of their application in real conditions of modern football is presented in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
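The vanishing-point construction that underlies the VPBC-style methods above is a standard projective-geometry computation: two image line segments that image parallel 3D lines are intersected in homogeneous coordinates. A minimal generic sketch (not the implementation of [9], [10], or [2]):

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points, via the cross product
    of their homogeneous coordinates."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(seg_a, seg_b):
    """Intersect two segments (each a pair of (x, y) points). Projections
    of parallel 3D lines meet at this point; returns None if the segments
    are also parallel in the image (intersection at infinity)."""
    v = np.cross(line_through(*seg_a), line_through(*seg_b))
    if abs(v[2]) < 1e-9:
        return None
    return v[:2] / v[2]

# Two converging field lines: y = x/2 and y = 2 meet at (4, 2).
vp = vanishing_point(((0, 0), (2, 1)), ((0, 2), (1, 2)))
```

With two such vanishing points (e.g. from the touchlines and the goal lines), the orientation of the field plane relative to the camera is constrained, which is what the matching step in [2] exploits.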
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Dynamic camera calibration</head><p>The study <ref type="bibr" target="#b12">[13]</ref> is one of the pioneering works in not only determining the homography matrix for individual frames but also tracking its changes across a sequence of frames. The authors were the first to propose a combined approach for automatic computation of the homography during camera motion. This approach uses the KLT (Kanade-Lucas-Tomasi) tracker <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b15">16]</ref> for automatic detection of correspondences between frames by extracting characteristic features. Since these correspondences are not perfect and contain outliers, the RANSAC algorithm <ref type="bibr" target="#b16">[17]</ref> is applied to filter out incorrect matches. After this additional selection, the DLT algorithm is used to compute a new homography matrix for the current frame. An example of the algorithm's output is shown in Figure <ref type="figure" target="#fig_1">3</ref>.</p><p>However, this method has a significant drawback: with each new frame, the reprojection error accumulates, leading to inaccuracies in the homography matrix. To address this issue, the authors propose periodically adjusting the homography matrix using key points on the field lines.</p><p>It is important to note that the study does not specify the accuracy of the proposed algorithm. This approach laid the foundation for further research in dynamic homography determination in video sequences.</p><p>The study <ref type="bibr" target="#b17">[18]</ref> proposes one of the first approaches to determining not only the homography matrix but also the camera rotation angles. The authors developed a method that uses prior information about key points in the goal area to calculate these parameters with a fixed focal length. 
Experimental verification of the algorithm was conducted on a sample of 500 frames, and the researchers claim to have achieved reprojection accuracy within 2 pixels. It is important to note that the work focuses on the theoretical aspects of determining rotation angles without considering the practical application of this method to frames that do not contain the goal area. A visual representation of the reprojection results obtained using this algorithm can be seen in Figure <ref type="figure" target="#fig_2">4</ref>.</p><p>In addition to lines and camera parameters, <ref type="bibr" target="#b0">[1]</ref> proposes using multiple key frames, together with lines and ellipses, to find the homography matrix. The process begins with system initialization, where key frames, examples of which are shown in Figure <ref type="figure" target="#fig_3">5</ref>, are selected from the video sequence to cover the range of camera motion. Point correspondences between these key frames and the geometric model are manually selected to estimate homographies for all key frames. When processing new frames, the algorithm first identifies the closest key frame using local feature matching, applying SFOP key point detection <ref type="bibr" target="#b18">[19]</ref> with SIFT descriptors <ref type="bibr" target="#b19">[20]</ref>. This provides an initial estimate of the homography between the current frame and the geometric model. Feature finding is then performed by projecting the geometric model onto the current frame using the initial homography estimate. A model-driven approach is used to detect lines and ellipses in the frame, while point correspondences are obtained by back-projecting matches from the nearest key frame. The algorithm then estimates the homography in two ways. First, it combines feature matches (lines, points, and ellipses) to obtain a linear estimate of the homography (H_lin). 
Second, it computes a frame-to-frame homography using local feature matches and combines this with the previous frame's homography to obtain an alternative estimate (H_tr). For refinement, the algorithm chooses between H_lin and H_tr based on the residual error area. The selected estimate serves as the initial value for further geometric minimization. The algorithm requires a large amount of input data, which may be impossible or time-consuming to obtain, and the accuracy of this approach is likewise not specified.</p><p>The study by Zhang et al. <ref type="bibr" target="#b20">[21]</ref> describes a method for simplifying camera calibration and finding the transformation matrix by leveraging the specifics of a particular task. Traditionally, the DLT algorithm requires four non-collinear key points to obtain the homography matrix. Instead, the authors propose the PCC (pan-tilt camera calibration) algorithm, which exploits the specifics of game filming, where the camera remains stationary and only the pan-tilt parameters change. This reduces the number of key points required for calibration to two. The PCC calibration process consists of two stages: first, initial camera calibration is performed using four points to determine the fixed parameters, and then the Levenberg-Marquardt algorithm <ref type="bibr" target="#b21">[22]</ref> is applied to find the homography matrix using only two key points. An important innovation in this work is the use of the offset line as a source of key points for determining the transformation matrix. The authors conducted a comparative analysis of the accuracy of both algorithms using computer simulation, which showed that the PCC algorithm surpasses the DLT algorithm in accuracy.</p><p>A comparison of the aforementioned algorithms in terms of their application in real conditions of modern football is presented in Table <ref type="table" target="#tab_1">2</ref>.</p></div>
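The KLT-then-RANSAC-then-DLT pipeline of [13] can be illustrated with a heavily simplified toy reconstruction: we assume putative point matches between consecutive frames are already given (the role KLT would play), RANSAC discards outliers, and DLT fits the homography. This is a sketch under those assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_h(src, dst):
    """DLT fit of a homography H (up to scale) such that dst ~ H @ src,
    from n >= 4 correspondences: two rows of A per point, h = null(A)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def reproj_err(H, src, dst):
    """Per-point error of src mapped through H against dst."""
    src_h = np.column_stack([src, np.ones(len(src))])
    proj = src_h @ H.T
    return np.linalg.norm(proj[:, :2] / proj[:, 2:] - dst, axis=1)

def ransac_homography(src, dst, iters=200, thresh=2.0):
    """Sample 4 matches, fit H by DLT, keep the hypothesis with the most
    inliers, then refit on all inliers."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best = None
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_h(src[idx], dst[idx])
        inliers = reproj_err(H, src, dst) < thresh
        if best is None or inliers.sum() > best.sum():
            best = inliers
    return fit_h(src[best], dst[best]), best
```

The accumulation of reprojection error that [13] reports arises because each frame's homography is chained onto the previous one; the periodic re-anchoring to field-line key points the authors propose resets that drift.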
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Machine learning based camera calibration</head><p>Machine learning is a powerful approach in the field of artificial intelligence that uses statistical methods to analyze large volumes of data. This technology allows algorithms to detect complex patterns and make accurate predictions, finding applications in many areas, from speech recognition to autonomous vehicle control.</p><p>In the context of camera calibration and finding the homography matrix, <ref type="bibr" target="#b2">[3]</ref> proposes an innovative approach. The authors use a branch-and-bound method in a Markov random field, where the energy function is based on semantic features such as the field surface, lines, and circles. These features are obtained through semantic segmentation, one of the core tasks of machine learning. The process minimizes an energy function expressing that the field should predominantly consist of field-surface pixels and that the projections of field primitives should correspond to the primitives detected in the image. To optimize this function, the authors applied a Structured SVM algorithm trained on data from 9 unique stadiums. The accuracy of the algorithm, measured on 186 labeled images, reached an IOU score of 0.86. Examples of the algorithm's results are shown in Figure <ref type="figure" target="#fig_4">6</ref>.</p><p>An alternative machine-learning approach was proposed in <ref type="bibr" target="#b3">[4]</ref>. The authors developed their own camera simulator to create 75 labeled images that imitate field edges. These images and the corresponding transformation matrices are stored in a separate database. When processing a real match image, a KNN algorithm searches for the most similar image in the database using one of three strategies: Chamfer matching, HOG, or CNN-based. 
To extract field edges from real images, the stroke width transform (SWT) algorithm is applied, which demonstrates better noise resistance than traditional methods such as the Canny edge detector. Additionally, the authors remove the crowd from the image (using color-based field segmentation) and the players (applying the Faster-RCNN human detector) to obtain an edge map that predominantly contains only field lines with minimal noise.</p><p>In <ref type="bibr" target="#b22">[23]</ref>, a method is proposed that uses two neural networks: the first determines the initial homography matrix, while the second estimates the registration error. The process involves transforming the sports field template to the current perspective, combining the transformed image with the current one, and iteratively updating the homography parameters to minimize the error. The authors evaluated the accuracy of their method on WorldCup and hockey match data <ref type="bibr" target="#b3">[4]</ref>, achieving an IOU score of 89.8, which surpasses previous results.</p><p>A similar method is presented in <ref type="bibr" target="#b5">[6]</ref>, but with an important distinction. The authors first perform semantic segmentation of the field lines using DeepLabV3 ResNet <ref type="bibr" target="#b23">[24]</ref>, and then determine camera parameters through iterative optimization. This approach considers the reprojection error calculated from the found segments and their counterparts in the 2D image. The method was tested on the SoccerNetV3Calibration <ref type="bibr" target="#b24">[25]</ref> and WorldCup datasets, achieving scores of 76.9 Compound Score and 96.1 IOU_part, respectively.</p><p>The researchers in <ref type="bibr" target="#b25">[26]</ref> introduce a novel approach using evenly spaced keypoints as field-specific features, framing the task as an instance segmentation problem with dynamic filter learning. 
To validate their method, they created the TS-WorldCup dataset, which comprises 3,812 sequential images from 43 videos of the 2014 and 2018 FIFA World Cup tournaments, featuring precise field markings. The method employs a standard encoder-decoder architecture similar to U-Net, with a ResNet-34 backbone for the encoder. It introduces a keypoints-aware label condition, using 91 pre-defined keypoints and dynamically generated convolution kernels. The approach utilizes a keypoints-specific controller and dynamic head to predict keypoint heatmaps, which are then merged to estimate the final homography using DLT and RANSAC. The proposed method demonstrated strong performance when evaluated on the WC and TSWC datasets, achieving IOU_part and IOU_whole scores of 0.96 and 0.91 for WorldCup, and 0.97 and 0.93 for TSWC, respectively. In their study, <ref type="bibr" target="#b26">[27]</ref> introduce speed metrics as a measure of performance in keypoint detection. The authors propose a dual-model approach, employing separate deep learning models for point and line detection, both utilizing heatmap-based techniques. For keypoint extraction, they adopt the widely used HRNetV2-w48 model as their backbone architecture. The researchers report a processing time of 33.6 ms per image using a single Nvidia GeForce RTX 3090 GPU. While this performance approaches real-time capabilities, further optimization is necessary for widespread practical application.</p><p>In contrast, <ref type="bibr" target="#b4">[5]</ref> do not explicitly report processing times for their approach. They also use the HRNetV2-w48 backbone, but employ a single network for both keypoint and line detection, potentially offering computational advantages. Their model is trained on a less powerful NVIDIA GeForce RTX 2080 Ti GPU, which may impact processing speed. 
While both papers use heatmap-based techniques, the "No Bells, Just Whistles" approach integrates an additional boundary channel to enhance global information capture. This difference in architecture and the use of a single network for multiple tasks may lead to different performance characteristics, though direct speed comparisons are not possible without explicit timing data from the second paper. The effectiveness of this algorithm was evaluated on the SN23, WC14, and TSWC datasets, with the most significant improvement (98.6) achieved on the TSWC dataset.</p></div>
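The heatmap decoding step shared by the keypoint-based methods above can be illustrated schematically: each network output channel is a per-keypoint heatmap, and the channel's argmax (if its peak score clears a visibility threshold) becomes a 2D correspondence for the subsequent DLT/RANSAC stage. The decoder below is a hypothetical minimal version, not code from [26] or [27]:

```python
import numpy as np

def decode_heatmaps(heatmaps, min_score=0.5):
    """Decode a (K, H, W) stack of per-keypoint heatmaps into image
    coordinates. A keypoint whose peak score is below min_score is
    treated as not visible in the frame and skipped."""
    pts = {}
    for k, hm in enumerate(heatmaps):
        y, x = np.unravel_index(hm.argmax(), hm.shape)
        if hm[y, x] >= min_score:
            pts[k] = (float(x), float(y))
    return pts
```

Because the template keypoints (91 in [26]) have known positions on the pitch model, every decoded keypoint immediately yields a model-to-image correspondence, and any four or more visible ones suffice for the DLT.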
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Comparative analysis</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Calibration methods comparison</head><p>Static calibration methods paved the way for more accurate and powerful approaches. The methods of <ref type="bibr" target="#b8">[9]</ref> and <ref type="bibr" target="#b9">[10]</ref> are utilized in nearly every dynamic and machine-learning-based algorithm. However, without additional refinement, they cannot be employed to determine the homography matrix for frame coordinate transformation. In contrast, <ref type="bibr" target="#b1">[2]</ref> can be applied to certain frames containing a sufficient number of visible segments, with the accuracy of this approach primarily dependent on the quality and quantity of the identified segments. In typical football match video recordings, conditions are often suboptimal, with too few visible lines for this algorithm to function effectively.</p><p>What matters is not only the accuracy and speed of an algorithm but also its field coverage. A football field does not always contain enough lines to determine key points from their intersections. The field coverage issues of previous approaches are partially addressed in <ref type="bibr" target="#b12">[13]</ref>. However, the authors do not specify the algorithm's accuracy, only its speed: 1900 frames per hour (roughly 0.5 frames per second) on a 2.8 GHz Pentium IV processor. Modern cameras record at a minimum of 24 frames per second, so processing a 90-minute football match at this rate would take a very long time.</p><p>Camera parameter determination in <ref type="bibr" target="#b17">[18]</ref> does not resolve the accuracy or speed issues of such systems, focusing instead on adding content to existing broadcasts or videos, which does not require high precision. Accuracy assessment in <ref type="bibr" target="#b0">[1]</ref> is limited to visual evaluation by experts. 
However, neither accuracy nor speed metrics are provided, though the approaches in this work improve upon the results of <ref type="bibr" target="#b12">[13]</ref>. Field coverage is also increased by using not only lines but also the ellipses found in the central and goal areas of the field.</p><p>Yet these improvements in accuracy and coverage depend on the well-annotated key frames the algorithm requires. This condition precludes application to real football match videos, as new annotated key frames would need to be created for each new camera or stadium.</p><p>With the advancement of machine learning, camera calibration methods have also evolved. In <ref type="bibr" target="#b2">[3]</ref>, the reprojection accuracy after homography determination is 0.88 IOU, significantly surpassing all previous approaches. The algorithm also does not rely on field parts where many key elements are visible but works on all field areas. The authors indicate the speed of the homography matrix determination algorithm but not that of the segmentation model. Segmentation models are typically resource-intensive, which precludes real-time use and significantly increases the resources required for processing pre-recorded video. In <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b25">26]</ref>, the focus is mainly on improving accuracy and increasing field coverage. Moreover, with the achieved maximum full-field homography accuracy of 0.92 IOU_part, using these algorithms on offline video is quite feasible. Speed is mentioned only in <ref type="bibr" target="#b22">[23]</ref>, at 9.58 seconds per frame.</p><p>Recent advancements in real-time keypoint detection algorithms have demonstrated significant progress in processing speed and efficiency. Falaleev et al. <ref type="bibr" target="#b26">[27]</ref> and Gutierrez et al. 
<ref type="bibr" target="#b4">[5]</ref> have reported state-of-the-art performance using the HRNet keypoint detection model, achieving a processing time of 33 ms per frame on an Nvidia GeForce RTX 3090 GPU. These studies utilized the HRNetV2-w48 model, which belongs to the second-largest category within the HRNetV2 model family and comprises 67.1 million parameters. While the aforementioned research focused on larger models, smaller variants within the HRNet family exist, including HRNet-W40-C with 57.6 million parameters and the most compact version, HRNet-W18-C-Small-v1, with 13.2 million parameters.</p><p>However, the performance characteristics of these smaller models in real-time keypoint detection tasks remain unexplored in the current literature. Furthermore, the studies by Falaleev et al. <ref type="bibr" target="#b26">[27]</ref> and Gutierrez et al. <ref type="bibr" target="#b4">[5]</ref> did not investigate additional techniques for model size reduction or processing speed enhancement. This presents an opportunity for future research to explore optimization strategies that could improve the efficiency and applicability of keypoint detection models across various computational resources and real-time scenarios.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Future directions of research</head><p>An analysis of the current literature on machine learning for camera calibration and homography matrix determination reveals a significant shortcoming in how algorithms are described and evaluated: a considerable portion of the presented methods lacks comprehensive information on quality metrics or runtime performance. This limitation is particularly noticeable for machine learning algorithms, where information about operational speed is often absent. Such a situation creates serious obstacles to the objective assessment and comparison of algorithm efficiency.</p><p>The lack of performance data significantly limits the application of these algorithms in real-time systems, where data processing speed is a critical factor. Many algorithms that demonstrate high accuracy on test datasets may prove unsuitable for practical use due to low operational speed.</p><p>To overcome these limitations, it is necessary to focus on improving algorithm performance. This includes optimizing existing algorithms, developing new approaches with an emphasis on computational efficiency, and applying parallel computing and specialized hardware. It is important for researchers and developers to pay more attention to comprehensive algorithm evaluation, including both quality and performance metrics, which will expand the scope of application of these algorithms and increase their efficiency under real-world conditions.</p><p>It is worth noting that the authors of the aforementioned works do not consider a number of important optimization techniques for computer vision models. In particular, methods such as Pruning, Quantization, Knowledge Distillation, and Sparsity remain overlooked. 
These techniques have significant potential for substantially accelerating model performance while maintaining effectiveness.</p><p>The use of these methods allows large, powerful models to be optimized, reducing their computational requirements without significant loss of quality. For example, Pruning removes the least important weights in a neural network, Quantization reduces the precision of parameter representation, Knowledge Distillation transfers knowledge from a large model to a smaller one, and Sparsity introduces sparseness into the network architecture.</p><p>The absence of these techniques from the analyzed works indicates a potential direction for further research and improvement in computer vision model optimization.</p></div>
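As a toy illustration of two of these techniques, unstructured magnitude pruning and symmetric int8 quantization can each be expressed in a few lines of numpy. Real pipelines (e.g. PyTorch's pruning utilities or TensorRT calibration) add fine-tuning and per-layer calibration steps omitted from this sketch:

```python
import numpy as np

def magnitude_prune(W, sparsity=0.5):
    """Unstructured magnitude pruning: zero out the smallest-magnitude
    fraction `sparsity` of the weights in W (ties at the threshold may
    prune slightly more; fine for a sketch)."""
    k = int(W.size * sparsity)
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) <= thresh, 0.0, W)

def quantize_int8(W):
    """Symmetric linear quantization to int8: one scale per tensor,
    chosen so the largest-magnitude weight maps to +/-127."""
    scale = np.abs(W).max() / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize as q * scale
```

Pruned weights enable sparse kernels and smaller checkpoints, while int8 storage cuts memory traffic by 4x versus float32; both directly target the inference-speed gap that the surveyed calibration models leave unaddressed.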
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusions</head><p>In this work, studies related to camera calibration for determining the homography matrix used for the subsequent transformation of 2D coordinates into 3D coordinates were analyzed. The analysis encompassed both foundational works on general camera calibration algorithms and more contemporary developments utilizing machine learning. It was found that works that do not use machine learning typically do not report algorithm accuracy. Machine learning approaches demonstrate sufficient accuracy but lack the speed necessary for comfortable use on offline recordings, as well as for potential future real-time application during broadcasts. The study analyzed and presented the main shortcomings of these works, conducted a comparative analysis of solutions, and identified directions and ideas for future improvements in this field.</p><p>The research covered a range of approaches, from basic camera calibration methods to advanced machine learning techniques. It revealed that traditional methods often lack precise accuracy metrics, while machine learning approaches, though accurate, fall short in terms of processing speed. This speed limitation hinders their practical application in both offline video analysis and real-time broadcast scenarios.</p><p>A critical evaluation of existing methodologies highlighted their respective strengths and weaknesses. The comparative analysis provided insights into the effectiveness of various solutions, considering factors such as accuracy, field coverage, and computational efficiency. This comprehensive review served to pinpoint areas where current approaches fall short and where future research efforts should be concentrated.</p><p>Based on the findings, several directions for future research and development were identified. These include the need for improved algorithm speed optimization, especially for machine learning-based methods, without compromising accuracy. 
Additionally, the potential for incorporating advanced optimization techniques such as pruning, quantization, knowledge distillation, and sparsity in model architectures was emphasized as a promising avenue for enhancing both accuracy and computational efficiency.</p><p>The work underscores the importance of developing algorithms that not only achieve high accuracy but also demonstrate practical applicability in real-world scenarios, particularly in the context of sports analytics and broadcast technologies. By highlighting these areas for improvement, this analysis provides a valuable foundation for future research aimed at advancing the task of camera calibration and homography matrix determination.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1: Figure 2:</head><label>12</label><figDesc>Figure 1: The aluminium block used to calibrate intrinsic parameters of each camera</figDesc><graphic coords="3,117.15,187.95,360.90,267.75" type="bitmap" /></figure>
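To make the central operation discussed throughout this review concrete, the following minimal sketch (hypothetical, not taken from any of the analyzed works) shows how an estimated 3×3 homography H maps an image point to pitch-plane coordinates via the homogeneous division; the matrix values below are illustrative only.

```python
def apply_homography(H, x, y):
    """Map image point (x, y) to plane coordinates: multiply the
    homogeneous vector (x, y, 1) by H, then divide by the third
    component w to return to 2D coordinates."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w

# Toy homography: identity rotation plus a translation of (10, 5).
H = [[1.0, 0.0, 10.0],
     [0.0, 1.0,  5.0],
     [0.0, 0.0,  1.0]]
apply_homography(H, 2.0, 3.0)   # -> (12.0, 8.0)
```

Because H is defined only up to scale, multiplying every entry by a constant leaves the mapped point unchanged, which is why calibration methods can only recover the homography up to this scale factor.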
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 3:</head><label>3</label><figDesc>Figure 3: Results examples from [13]: each row corresponds to a different frame</figDesc><graphic coords="5,207.40,65.55,180.60,262.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4:</head><label>4</label><figDesc>Figure 4: Results examples from [18].</figDesc><graphic coords="6,94.55,330.70,406.30,71.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5:</head><label>5</label><figDesc>Figure 5: Key frames used in [1].</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 6:</head><label>6</label><figDesc>Figure 6: Examples of the obtained homographies and semantic segmentations in [3].</figDesc><graphic coords="8,94.55,92.00,406.00,286.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Comparison table for static calibration methods</figDesc><table><row><cell>Method</cell><cell>Input data</cell><cell>Accuracy</cell><cell>Applicability</cell></row><row><cell>Using Vanishing Points for Camera Calibration [9]</cell><cell>Specific calibration pattern required; Calibration images; 2 cameras</cell><cell>Translation errors were about ±3 mm for distances ranging from 13 to 45 cm</cell><cell>Base method for further usage</cell></row><row><cell>Using Vanishing Points for Camera Calibration and Coarse 3D Reconstruction from A Single Image [10]</cell><cell>A single image containing at least two vanishing points; Two sets of parallel lines selected by the user to determine the vanishing points; The length of one line segment in 3D space (to determine the translation vector); The principal point is assumed to be the center of the image; The aspect ratio is fixed by the user</cell><cell>Not provided</cell><cell>Base method for further usage</cell></row><row><cell>Fast 2D model-to-image registration using vanishing points for sports video analysis [2]</cell><cell>A sufficient number of segments of reasonable quality must be extractable from the images for the registration system to work</cell><cell>Not provided</cell><cell>Applicable only for frames where a sufficient number of segments are observable</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Comparison table for dynamic calibration methods</figDesc><table><row><cell>Method</cell><cell>Input data</cell><cell>Accuracy</cell><cell>Applicability</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Comparison table for machine learning based calibration methods</figDesc><table><row><cell>Method</cell><cell>Input data</cell><cell>Accuracy</cell><cell>Applicability</cell></row><row><cell>Sports Field Localization via Deep Structured Models [3]</cell><cell>No input data required</cell><cell>Authors collected 186 images from 10 games as a test set; based on that test set, accuracy is 0.88 IOU with the manually labeled data</cell><cell>For offline usage</cell></row><row><cell>Automated Top View Registration of Broadcast Football Videos [4]</cell><cell>Database with edge images and corresponding homography matrices</cell><cell>Authors manually annotated 500 images from 16 different matches; IOU measure is 0.86 for the best approach</cell><cell>For offline usage</cell></row><row><cell>Optimizing Through Learned Errors for Accurate Sports Field Registration [23]</cell><cell>Model weights</cell><cell>WorldCup IOU: 0.88; Hockey dataset IOU: 0.967</cell><cell>For offline usage</cell></row><row><cell>TVCalib: Camera Calibration for Sports Field Registration in Football [6]</cell><cell>Model weights</cell><cell>WorldCup IOU_par: 0.96; SoccerNetV3 CR: 76.9</cell><cell>For offline usage</cell></row><row><cell>Sports Field Registration via Keypoints-aware Label Condition [26]</cell><cell>Model weights</cell><cell>WorldCup IOU_par: 0.96; WorldCup IOU_whole: 0.91; TSWC IOU_par: 0.97; TSWC IOU_whole: 0.93</cell><cell>For offline usage</cell></row><row><cell>Enhancing Football Camera Calibration Through Keypoint Exploitation [27]</cell><cell>Model weights</cell><cell>Soccernet Camera Calibration Challenge 2023 [28] Acc@5: 0.73</cell><cell>For offline usage</cell></row><row><cell>No Bells, Just Whistles: Sports Field Registration by Leveraging Geometric Properties [5]</cell><cell>Model weights</cell><cell>WorldCup IOU_par: 0.96; WorldCup IOU_whole: 0.92; TSWC IOU_par: 0.98; TSWC IOU_whole: 0.96</cell><cell>For offline usage</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Using line and ellipse features for rectification of broadcast hockey video</title>
		<author>
			<persName><forename type="first">A</forename><surname>Gupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Little</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Woodham</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Canadian conference on computer and robot vision</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="32" to="39" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Fast 2d model-to-image registration using vanishing points for sports video analysis</title>
		<author>
			<persName><forename type="first">J.-B</forename><surname>Hayet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Piater</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">G</forename><surname>Verly</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Image Processing 2005</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page">417</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Sports field localization via deep structured models</title>
		<author>
			<persName><forename type="first">N</forename><surname>Homayounfar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Fidler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Urtasun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="5212" to="5220" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Automated top view registration of broadcast football videos</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bhat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Gandhi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Jawahar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Winter Conference on Applications of Computer Vision (WACV)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="305" to="313" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">No bells, just whistles: Sports field registration by leveraging geometric properties</title>
		<author>
			<persName><forename type="first">M</forename><surname>Gutiérrez-Pérez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Agudo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="3325" to="3334" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Tvcalib: Camera calibration for sports field registration in soccer</title>
		<author>
			<persName><forename type="first">J</forename><surname>Theiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ewerth</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision</title>
				<meeting>the IEEE/CVF Winter Conference on Applications of Computer Vision</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1166" to="1175" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Motion fields to predict play evolution in dynamic sport scenes</title>
		<author>
			<persName><forename type="first">K</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Grundmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Shamir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Matthews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">K</forename><surname>Hodgins</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Essa</surname></persName>
		</author>
		<ptr target="https://api.semanticscholar.org/CorpusID:8455859" />
	</analytic>
	<monogr>
		<title level="m">IEEE Computer Society Conference on Computer Vision and Pattern Recognition</title>
				<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="840" to="847" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Video analysis of hockey play in selected game situations</title>
		<author>
			<persName><forename type="first">F</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Woodham</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Image and Vision Computing</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page" from="45" to="58" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Using vanishing points for camera calibration</title>
		<author>
			<persName><forename type="first">B</forename><surname>Caprile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Torre</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Vision</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="127" to="139" />
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Using vanishing points for camera calibration and coarse 3d reconstruction from a single image</title>
		<author>
			<persName><forename type="first">E</forename><surname>Guillou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Meneveaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Maisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bouatouch</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Visual Computer</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="396" to="410" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Robust camera calibration for sport videos using court models</title>
		<author>
			<persName><forename type="first">D</forename><surname>Farin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Krabbe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Effelsberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Storage and Retrieval Methods and Applications for Multimedia</title>
				<imprint>
			<publisher>SPIE</publisher>
			<date type="published" when="2003">2004. 2003</date>
			<biblScope unit="volume">5307</biblScope>
			<biblScope unit="page" from="80" to="91" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Incremental rectification of sports fields in video streams with application to soccer</title>
		<author>
			<persName><forename type="first">J.-B</forename><surname>Hayet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Piater</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Verly</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advanced Concepts for Intelligent Vision Systems</title>
				<meeting><address><addrLine>ACIVS</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Automatic rectification of long image sequences</title>
		<author>
			<persName><forename type="first">K</forename><surname>Okuma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Little</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Lowe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Asian conference on computer vision</title>
				<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="volume">9</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Birchfield</surname></persName>
		</author>
		<title level="m">Depth and motion discontinuities</title>
				<imprint>
			<date type="published" when="1999">1999</date>
		</imprint>
		<respStmt>
			<orgName>Stanford University</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Good features to track</title>
		<author>
			<persName><forename type="first">J</forename><surname>Shi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tomasi</surname></persName>
		</author>
		<idno type="DOI">10.1109/CVPR.1994.323794</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)</title>
		<imprint>
			<date type="published" when="1994">1994</date>
			<biblScope unit="page" from="593" to="600" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Detection and tracking of point features</title>
		<author>
			<persName><forename type="first">C</forename><surname>Tomasi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kanade</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Vision</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">3</biblScope>
			<date type="published" when="1991">1991</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Fischler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Bolles</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="381" to="395" />
			<date type="published" when="1981">1981</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Camera pose estimation in soccer scenes based on vanishing points</title>
		<author>
			<persName><forename type="first">V</forename><surname>Babaee-Kashany</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">R</forename><surname>Pourreza</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Symposium on Haptic Audio Visual Environments and Games</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Detecting interpretable and accurate scale-invariant keypoints</title>
		<author>
			<persName><forename type="first">W</forename><surname>Förstner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Dickscheid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Schindler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 12th International Conference on Computer Vision</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="2256" to="2263" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Distinctive image features from scale-invariant keypoints</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Lowe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Vision</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="page" from="91" to="110" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Research on camera calibration in football broadcast videos</title>
		<author>
			<persName><forename type="first">S</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of u- and e-Service, Science and Technology</title>
		<imprint>
			<biblScope unit="volume">8</biblScope>
			<biblScope unit="page" from="89" to="98" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Semantic video shot segmentation based on color ratio feature and svm</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Niu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2008 International Conference on Cyberworlds</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="157" to="162" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Optimizing through learned errors for accurate sports field registration</title>
		<author>
			<persName><forename type="first">W</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C G</forename><surname>Higuera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Angles</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Javan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Yi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision</title>
				<meeting>the IEEE/CVF Winter Conference on Applications of Computer Vision</meeting>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="201" to="210" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<author>
			<persName><forename type="first">L.-C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Papandreou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Schroff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Adam</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.05587</idno>
		<title level="m">Rethinking atrous convolution for semantic image segmentation</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Scaling up soccernet with multi-view spatial localization and re-identification</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cioppa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Deliege</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Giancola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ghanem</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Van Droogenbroeck</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Scientific Data</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">355</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Sports field registration via keypoints-aware label condition</title>
		<author>
			<persName><forename type="first">Y.-J</forename><surname>Chu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-W</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K.-W</forename><surname>Hsiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Lien</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-H</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-C</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R.-R</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-Y</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-K</forename><surname>Chu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</title>
				<meeting>the IEEE/CVF Conference on Computer Vision and Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="3523" to="3530" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">S</forename><surname>Falaleev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2410.07401</idno>
		<title level="m">Enhancing soccer camera calibration through keypoint exploitation</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">SoccerNet 2023 challenges results</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cioppa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Giancola</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Somers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Magera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mkhallati</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Deliège</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Held</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hinojosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Mansourian</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sports Engineering</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page">24</biblScope>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
