<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multiple object tracking for video-based sports analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julius Gudauskas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Žygimantas Matusevičius</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kaunas University of Technology</institution>
          ,
          <addr-line>Studentų g. 50, Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>Multiple object tracking (MOT) is a challenging task in computer vision. Many algorithms have been proposed to track multiple targets for video surveillance, team-sport analysis, or human-computer interaction. Recent studies have already indicated that multiple object tracking can provide valuable information in team sports analysis. Therefore, in this paper, we investigate object tracking techniques for the Paralympic team sport goalball. Different tracking methods have been implemented and compared, evaluating prediction accuracy and processing speed in player and ball tracking.</p>
      </abstract>
      <kwd-group>
        <kwd>Multiple object tracking</kwd>
        <kwd>MOT</kwd>
        <kwd>SOT</kwd>
        <kwd>CNN</kwd>
        <kwd>ONNX</kwd>
        <kwd>Goalball</kwd>
        <kwd>Boosting</kwd>
        <kwd>CSR-DCF</kwd>
        <kwd>KCF</kwd>
        <kwd>MOSSE</kwd>
        <kwd>TLD</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License
Attribution 4.0 International (CC BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>Overview</title>
    </sec>
    <sec id="sec-3">
      <title>Related works</title>
      <p>Several object tracking strategies have been implemented, but the best solution has not yet been found.
Next, we examine the existing object tracking techniques and object tracking solutions used in sports.</p>
      <p>
        SAP develops solutions for video-based sports analytics. For the football
        <xref ref-type="bibr" rid="ref6">world championship 2014</xref>
        in Brazil, SAP, together with the German Football Association, successfully developed the Match
Insights analytical solution. It was decided to integrate it with Panasonic video and tracking software
[8] to improve the solution.
      </p>
      <p>Visualization in real time (Vizrt) provides content creation, control, and delivery tools for the digital
media business. The company's products include software for designing real-time 3D graphics and
maps, envisioning sports analyses, controlling media assets, and obtaining single workflow solutions
for the digital broadcast trade [9], [10].</p>
      <p>PITCHf/x data set is a free source granted by Major League Baseball Advanced Media (MLBAM)
and Sportvision. Brooks Baseball [11] performs methodical innovations to this data to increase its worth
and usability. They manually analyze the Pitch Info by using many parameters of each pitch's trajectory
and approve the parameters against some other sources such as video proof and direct interaction with
on-field personnel (e.g., pitching coaches, catchers, and the pitchers themselves). The trajectory data's
default values are somewhat altered to align them more nearly with the actual values.</p>
      <p>Sportradar [11], a Swiss corporation, concentrates on accumulating and examining data related to
sports results by cooperating with bookmakers, widespread football associations, and global football
associations. Their ongoing projects include collecting, processing, monitoring, and selling sports
data, appearing in a diverse collection of sports-related live data and digital content.</p>
    </sec>
    <sec id="sec-4">
      <title>Multiple object tracking based on single object tracking</title>
      <p>Multiple object tracking (MOT) is one of the most challenging tasks in computer vision. A reliable
and universal solution to this problem is not yet known - often, several objects are tracked using a single
object tracking (SOT) method. With this tracking method, each object is tracked separately and
independently of the other objects. The article [12] proposed a powerful real-time tracking method,
Boosting, which treats the tracking problem as a binary classification problem between object and
background. Most existing approaches build a representation of the targeted object before the tracking
function begins and therefore utilize an established representation to handle appearance adjustments
during tracking. However, this method does both - adjusting to the variations in appearance during
tracking and selecting suitable features that can learn any object and discriminate it from the
surrounding background. In Discriminative Correlation Filter with Channel and Spatial Reliability
(CSR-DCF), the reliability map adapts the filter support to the part of the object suitable for tracking,
overcoming both the problems of circular shift, enabling an arbitrary search range, and the
limitations of the rectangular shape assumption [13]. The CSR-DCF has the highest performance on standard benchmarks –
OTB100, VOT2015, and VOT2016 while running in real-time on a single CPU. Despite using basic
features like histogram of oriented gradient (HOG) and Colornames, the CSR-DCF performs parallel
with trackers that apply computationally complex deep Convolutional Networks but is noticeably faster.
In [14], the authors demonstrated that it is possible to analytically model natural image translations,
showing that the resulting data and kernel matrices become circulant under some conditions. The DFT
diagonalization provides a general blueprint, called Kernelized Correlation Filter (KCF), for creating fast
algorithms that deal with translations. This blueprint has been applied to linear and kernel ridge
regression, yielding top-performing trackers that run at hundreds of FPS and can be
implemented with a few code lines. The visual tracking problem, which is traditionally solved using
heavyweight classifiers, complex appearance models, and stochastic search methods, can be replaced
by effective and more straightforward Minimum Output Sum of Squared Error (MOSSE) correlation
filters [15]. However, there are several ways in which this tracker can be improved. For example, if the
target's appearance is relatively steady, drifting could be eased by occasionally recentring the filter
based on the initial frame. Also, the tracker can be extended to estimate scale and rotation changes by
filtering the tracking window's log-polar transform after an update. In paper [7], authors studied the
problem of tracking an object in a video stream, where the object changes appearance frequently moving
in and out of the camera view. They designed a new Tracking, Learning, and Detection (TLD)
framework. Many challenges have to be addressed to get a more trustworthy and general system based
on TLD. For example, TLD does not perform well in the case of full out-of-plane rotation. In that case,
the Median-Flow tracker drifts away from the target and can be re-initialized if the object comes back
with an appearance seen/learned before. The current implementation of TLD trains only the detector,
and the tracker stays fixed. As a result, the tracker always makes identical errors, and currently, it tracks
a single object. Multi-target tracking opens interesting questions about how to jointly train the models and
share features at scale.</p>
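      <p>The correlation-filter idea behind MOSSE and KCF can be illustrated with a minimal pure-Python sketch (purely illustrative and not any of the cited implementations, which work in the Fourier domain and learn the filter online): slide a template over a search region around the previous position and keep the offset with the highest correlation score.</p>
      <preformat>
```python
def correlate(patch, template):
    """Sum of element-wise products (unnormalized correlation score)."""
    score = 0.0
    for row_p, row_t in zip(patch, template):
        for a, b in zip(row_p, row_t):
            score += a * b
    return score

def crop(frame, top, left, h, w):
    """Cut an h-by-w patch out of a 2D intensity grid."""
    return [row[left:left + w] for row in frame[top:top + h]]

def track(frame, template, prev_top, prev_left, radius=2):
    """Search around the previous position for the best-matching offset."""
    h = len(template)
    w = len(template[0])
    best = None
    best_score = float("-inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            top = prev_top + dy
            left = prev_left + dx
            # Stay inside the frame bounds.
            if top >= 0 and left >= 0 and len(frame) >= top + h and len(frame[0]) >= left + w:
                s = correlate(crop(frame, top, left, h, w), template)
                if s > best_score:
                    best_score = s
                    best = (top, left)
    return best
```
      </preformat>
      <p>Real correlation trackers replace this exhaustive spatial search with a single multiplication in the Fourier domain, which is what makes MOSSE and KCF fast.</p>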
    </sec>
    <sec id="sec-5">
      <title>Multiple object tracking based on object detection and position forecasting</title>
      <p>We can also rely on recognition-based solutions to solve the problem of tracking multiple moving
objects. These algorithms' idea is to detect the tracked objects in each analyzed frame and classify them
into sets of moving objects. This problem is usually framed as a data association task, but several obstacles
can lead to poor tracking accuracy. Various neural-network-based and classical algorithms can be applied
to identify tracked objects in a frame. Classical methods such as the Viola-Jones
algorithm work in real-time by analyzing the image's pixels [16]. Although the algorithm is quite
primitive, it has pretty high accuracy and real-time speed. The algorithm can be trained to detect
different object classes (applied to different subtasks, such as pedestrians or cars), but due to its
favourable properties it is usually applied to face recognition [18]. The object detection
process can be established using HOG [14], scale-invariant feature transformation [16], Haar cascade
classifiers [16], etc. These algorithms are used to determine low-level feature information. More
complex tasks usually require obtaining higher-level information, and that is possible using deep
learning techniques. A convolutional neural network (CNN) is a class of deep neural networks most
commonly applied to image recognition tasks [16]. You Only Look Once (YOLO) is a deep learning
algorithm for object detection that is faster and more accurate than most other algorithms [16]. By
dividing the input image into areas and predicting the boundary box's coordinates and the class's
probability for each region, it converts object detection problems into regression issues to achieve
end-to-end detection. YOLO can work well for multiple objects where each object is associated with one
grid cell. However, in the case of overlap, in which one grid cell contains two different objects' centre
points, anchor boxes can be used to allow one grid cell to detect multiple objects. Common challenges
that complicate multiple object tracking and degrade the result include frequent occlusions, similar
appearances, interactions between tracked objects, and unstable object appearance in the video.</p>
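      <p>Associating detections with tracked objects, and judging detection quality, is commonly based on the intersection over union (IoU) of bounding boxes. A minimal sketch, assuming boxes are given as (x, y, width, height) tuples:</p>
      <preformat>
```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extents; max(0, ...) clamps to zero when the boxes are disjoint.
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```
      </preformat>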
    </sec>
    <sec id="sec-6">
      <title>Proposal</title>
    </sec>
    <sec id="sec-7">
      <title>MOT using SOT</title>
      <p>In the following part of the article, we provide proposals for multiple object tracking. The
presented algorithms are designed to solve the player tracking problem in the targeted video.</p>
      <p>Firstly, multiple object tracking (MOT) is developed by employing independent single object
trackers (SOT). The MOT model holds a list of tracked objects; each object has its own tracker, an id,
and a rectangle object that stores the metrics of the tracked object: the x and y coordinates, and the
height and width of the region of interest (ROI) (see Figure 1).</p>
      <p>The process of object tracking using Unified Modeling Language (UML) notation is provided in
Figure 2. First, a video file is selected, and MOT initialization is performed. After the initialization of
the model, the objects to be tracked are marked. This process is performed manually. Finally, it is
possible to start processing video frames, where each frame is used to update the MOT model.</p>
      <p>During the MOT model update (see Figure 3), the frame is used to update each tracked object.
Because objects are tracked entirely independently, this process can be parallelized. After updating
tracked objects, MOT removes from the list those objects that have not been successfully updated for a
certain period of time - it is assumed that the object has been lost. During the tracked object update
process (see Figure 4), the object tracker is updated. If this operation succeeds, the rectangle object
is updated and the failure counter is reset; otherwise, the failure counter is increased.</p>
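      <p>The update cycle of Figures 3 and 4 can be sketched as follows; this is an illustrative pure-Python skeleton, with the tracker standing in for whichever SOT implementation (Boosting, CSR-DCF, KCF, MOSSE, or TLD) is plugged in, and MAX_FAILS an assumed removal threshold:</p>
      <preformat>
```python
class TrackedObject:
    """One tracked player: a SOT tracker plus bookkeeping, as in Figure 1."""
    def __init__(self, obj_id, tracker, rect):
        self.obj_id = obj_id
        self.tracker = tracker   # any object with update(frame) returning (ok, rect)
        self.rect = rect         # (x, y, width, height) of the ROI
        self.fails = 0

    def update(self, frame):
        ok, rect = self.tracker.update(frame)
        if ok:
            self.rect = rect
            self.fails = 0       # success resets the failure counter
        else:
            self.fails += 1      # otherwise the failure counter grows
        return ok

MAX_FAILS = 30  # assumed threshold: drop an object lost for ~1 s at 30 fps

def update_mot(objects, frame):
    """Update every object independently, then drop the ones presumed lost."""
    for obj in objects:          # independent updates: could be parallelized
        obj.update(frame)
    return [obj for obj in objects if MAX_FAILS > obj.fails]
```
      </preformat>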
    </sec>
    <sec id="sec-8">
      <title>MOT using CNN object detection</title>
      <p>An alternative way to track multiple moving objects is to use continuous object detection and
classification of the detected objects. To solve this problem, an object recognition method that provides
the highest possible accuracy is required, as well as a method of classifying objects according to the
previous coordinates of each moving object. A Convolutional Neural Network (CNN) allows forming
a multi-layered model, which can provide an advantage in analyzing more than one feature without
compromising speed. For CNN model training, 117 different shots of a goalball match have been used.
All the training data has been marked with the bounding box required for prediction (see Figure 5).</p>
      <p>To evaluate the performance of the CNN model, three accuracy characteristics have been used:
precision, recall, and the mean average precision (mAP).</p>
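      <p>Given true-positive (TP), false-positive (FP), and false-negative (FN) counts, the three characteristics reduce to the standard definitions (a generic sketch, not tied to the training service used):</p>
      <preformat>
```python
def precision(tp, fp):
    """Share of the model's predictions that were actually correct."""
    return tp / (tp + fp) if tp + fp > 0 else 0.0

def recall(tp, fn):
    """Share of the ground-truth boxes the model managed to find."""
    return tp / (tp + fn) if tp + fn > 0 else 0.0

def mean_average_precision(ap_per_class):
    """mAP: the mean of per-class average precision values
    (each AP already averaged over the chosen IoU thresholds)."""
    return sum(ap_per_class) / len(ap_per_class) if ap_per_class else 0.0
```
      </preformat>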
      <p>The trained model can be exported and applied locally. Depending on the technology used, the model
format can also be selected in different ways. The Open Neural Network Exchange (ONNX) format
model is used for this study. ONNX provides definitions of an extensible computation graph model,
built-in operators, and standard data types focused on inferencing (evaluation). The model was
constructed using eight layers with the input image in BGR format. The trained CNN model provides
output composed of bounding boxes, class labels, and confidence levels (see Figure 6). Each player is detected
multiple times with different probabilities. To remove unwanted redundancies, a filter is used that
keeps only the bounding boxes satisfying a marginal probability. Recognition of players alone is not enough
to track them. There is a classification problem in how to assign a bounding box to a particular player.</p>
      <p>At the beginning of the analysis, the players being followed are marked. Having the start coordinate
of each player, we can go through all the CNN results and assign each player the best bounding box.
This step is repeated for each analyzed frame. When a player is not detected by the CNN
(this usually happens when players intersect), we keep the old coordinate and move to the next frame.</p>
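      <p>The classification step described above, assigning each player the best bounding box near the previous coordinate and keeping the old coordinate when the player is not detected, can be sketched as follows (an illustrative reconstruction, not the authors' code; positions are (x, y) centres and max_dist is an assumed gating radius):</p>
      <preformat>
```python
def assign_detections(players, detections, max_dist=50.0):
    """Greedily match each tracked player to the nearest unused detection.

    players:    dict of player id mapped to the last known (x, y) centre
    detections: list of (x, y) centres produced by the CNN for this frame
    Returns the updated id-to-(x, y) mapping; players with no detection
    nearby (e.g. when players intersect) keep their old coordinate.
    """
    used = set()
    updated = {}
    for pid, (px, py) in players.items():
        best_idx = None
        best_dist = max_dist
        for i, (dx, dy) in enumerate(detections):
            if i not in used:
                dist = ((px - dx) ** 2 + (py - dy) ** 2) ** 0.5
                if best_dist > dist:
                    best_dist = dist
                    best_idx = i
        if best_idx is None:
            updated[pid] = (px, py)       # no detection: keep old coordinate
        else:
            used.add(best_idx)
            updated[pid] = detections[best_idx]
    return updated
```
      </preformat>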
    </sec>
    <sec id="sec-9">
      <title>Experiment</title>
      <p>The experiment was done using five different single object tracking methods (Boosting, CSR-DCF,
KCF, MOSSE, and TLD) and one multiple object tracking method using CNN object detection. Three
different goalball videos, each up to 1 minute long, were used for testing. The experiment goal is to
track six different players on the playfield, marked from 1 to 6 (see Figure 7).</p>
      <p>The two most essential parameters in the evaluation of algorithms are:
• How accurately the player is tracked;
• How quickly the video is analyzed.</p>
      <p>To evaluate player tracking accuracy, the number of frames in which a player was detected and
tracked was counted. Because videos of known duration and frames-per-second ratio were used for
testing, the tracking accuracy of each player can be computed. However, not all algorithms can
determine whether a player's position has been adjusted or not, so this metric is only valid for specific
algorithms.</p>
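      <p>Because the duration and frames-per-second ratio of each video are known, per-player accuracy reduces to a simple ratio of frame counts (a sketch of the evaluation):</p>
      <preformat>
```python
def tracking_accuracy(tracked_frames, duration_s, fps):
    """Share of frames in which the player was successfully tracked."""
    total_frames = int(duration_s * fps)
    return tracked_frames / total_frames if total_frames > 0 else 0.0
```
      </preformat>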
      <p>We analyzed the tracking results of the KCF algorithm in more detail (see Figure 8). The
algorithm provides the best tracking results for the third and the sixth players (in some cases, the first).
The best results were obtained using the “Male BRA vs. SWE” video stream: the total accuracy of all
tracked players is 89%. From the results, it cannot be said that the algorithm gives a stable and similar
result under all conditions, because the result depends on many factors: the noise in the video material,
bystanders, the exchange of players, and the angle of the frames. This can be seen by comparing the “Male
BRA vs. SWE” and “Male LTU vs. USA” videos, where a clear difference is visible. Using the same
algorithm, total accuracy dropped from 89% to 81%. Observing the algorithm revealed that players
overlap more often in the less accurate video than in the more accurate one. Also, the tracked
player is more often lost when making very sudden movements.</p>
      <p>Another critical factor in the evaluation of algorithms is speed. Each algorithm is based on a different
computational strategy, in which the speed may depend on different factors. After performing an
experiment with each tracking algorithm and analyzing three videos for testing (see Figure 9), it was
noticed that the MOSSE algorithm copes with the task several times faster than the other
algorithms. The slowest algorithm in the experiment was TLD.</p>
      <p>In another experiment, a CNN was used for object detection and tracking. The model was trained using
Microsoft Azure Custom Vision Service. The training process was performed using 117 different shots
from the goalball game videos. In each frame, the players on the playing field and the ball were marked.
The obtained results after training the convolutional neural network are provided in the Table 1.</p>
      <sec id="sec-9-1">
        <title>Explanation</title>
        <p>Precision measures how many of the predictions that the model made were actually correct.</p>
        <p>Recall measures how well the model can find all the positive predicted boxes.</p>
        <p>mAP is calculated by taking the mean average precision over all classes and over all IoU
thresholds, depending on the different detection challenges that exist.</p>
        <p>From the results, we can conclude that the model predicts the players in the frames quite accurately.
On average, 82% of these predictions are accurate bounding boxes belonging to the hypothesized object.
When it comes to recognizing the ball in the frame, the model performs worse. While the ball is detected
in 89% of the shots on average, only an average of 39% is localized in the proper place. This
may be because the ball is relatively small in the frame, is partially blocked by the players, and
sometimes merges with the background.</p>
        <p>Additional experiments have been carried out to evaluate how the model performs in real test cases
using video stream. The main task of the tracking algorithm is to solve the predicted bounding box
classification. The prediction was made using CNN, and bounding box classification was performed
using an algorithm that depends on previously detected player coordinates. The same data set, including
three video streams, has been used in the experiment. Tracker provides quite a stable accuracy for each
video stream (see Table 2).</p>
      </sec>
      <sec id="sec-9-2">
        <title>Video streams used for evaluation</title>
        <p>
          Three video streams were used: Paralympic Games 2016 Goalball Male LTU vs USA,
          <xref ref-type="bibr" rid="ref4">Paralympic Games 2016</xref>
          Goalball Male BRA vs SWE, and Paralympic Games 2016 Goalball Female USA vs BRA.
        </p>
        <p>Since the result only determines in which part of the frame each player was detected and classified (see
Figure 10), it is difficult to evaluate automatically whether the classifier assigns each player the correct
bounding box. The accuracy of the analysis can only be confirmed by the person supervising the
analysis.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Conclusions</title>
      <p>In this paper, the research of object tracking techniques used for real-time goalball video analysis
has been performed. Our proposal's novelty is that we have adapted single object tracking algorithms
to solve the multiple object tracking task. We also applied multiple object tracking models to the analysis
of sports video material. We carried out different experiments to evaluate two multiple object tracking
task approaches: by employing a single object tracker (including Boosting, CSR-DCF, KCF, MOSSE,
TLD); and a CNN for multiple object tracking. For the first approach, we evaluated each method's
performance in terms of the number of frames and speed. Experiments have shown that only the KCF
algorithm can determine the adjustments of a player's position. The MOSSE algorithm outperforms other
algorithms in terms of speed and is three times faster than KCF and 9.8 times faster than TLD. CNN
results are promising for players' position prediction, with accuracy varying from 88.12% to 90.26%; the
accuracy was measured by calculating the total number of frames in which each player was predicted and
classified. However, CNN has shown poor performance for ball prediction, providing 39% average
accuracy of the ball position. An interesting direction for further research would be to combine neural
network-based object detection and single object tracking in order to get better tracking results.</p>
    </sec>
    <sec id="sec-11">
      <title>Acknowledgments</title>
      <p>We want to express our very great appreciation to Dr. Agnė Paulauskaitė-Tarasevičienė for her
insights and advice.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Martinez-Martin</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. P. d.</given-names>
            <surname>Pobil</surname>
          </string-name>
          , „
          <article-title>Object Detection and Recognition for Assistive Robots</article-title>
          ,“
          <source>Robotics &amp; Automation Magazine</source>
          , vol.
          <volume>24</volume>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>138</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Anisi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ihsan</given-names>
            <surname>Ali</surname>
          </string-name>
          , „
          <article-title>Object tracking sensor networks in smart cities: Taxonomy, architecture, applications, research challenges and future directions,“ Future Generation Computer Systems</article-title>
          , vol.
          <volume>107</volume>
          , pp.
          <fpage>909</fpage>
          -
          <lpage>923</lpage>
          ,
          <year>2020</year>
          , June.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>F.</given-names>
            <surname>Joy</surname>
          </string-name>
          and
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          , „
          <article-title>A review on multiple object detection and tracking in smart city video analytics</article-title>
          ,“ ResearchGate,
          <year>2018</year>
          , January.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Li</surname>
          </string-name>
          , „
          <article-title>Detecting, segmenting and tracking bio-medical objects</article-title>
          ,“
          <source>Scholars Mine Doctoral Dissertations</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Georgescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chen</surname>
          </string-name>
          , W. Wu,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ionasec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zheng</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Comaniciu</surname>
          </string-name>
          , „
          <article-title>Learning-Based Detection and Tracking in Medical Imaging: A Probabilistic Approach</article-title>
          ,“ M.
          <string-name>
            <surname>González</surname>
          </string-name>
          Hidalgo et al. (eds.),
          <source>Deformation Models, Lecture Notes in Computational Vision and Biomechanics 7</source>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>235</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Azad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Azad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. B.</given-names>
            <surname>Khalifa</surname>
          </string-name>
          and S. Jamali, „
          <article-title>Real-time human-computer interaction based on face and hand gesture recognition</article-title>
          ,“
          <source>International Journal in Foundations of Computer Science &amp; Technology (IJFCST)</source>
          ,
          vol.
          <volume>4</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>37</fpage>
          -
          <lpage>48</lpage>
          ,
          <year>2014</year>
          , July.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kalal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mikolajczyk</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          , „
          <article-title>Tracking-learning-detection</article-title>
          ,“
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          , pp.
          <fpage>1409</fpage>
          -
          <lpage>1422</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <article-title>„SAP and Panasonic Launch Joint Initiative for Video-Based Sports Analytics Solutions,“</article-title>
          <source>SAP News</source>
          ,
          <year>2014</year>
          , September 12.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Danelljan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. S.</given-names>
            <surname>Khan</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Felsberg</surname>
          </string-name>
          , „
          <article-title>Accurate scale estimation for robust visual tracking</article-title>
          ,“ Proc.
          <source>British Machine Vision Conference</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Danelljan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. S.</given-names>
            <surname>Khan</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Felsberg</surname>
          </string-name>
          , „
          <article-title>Learning spatially regularized correlation filters for visual tracking</article-title>
          ,“ pp.
          <fpage>4310</fpage>
          -
          <lpage>4318</lpage>
          , in
          <source>IEEE International Conference on Computer Vision</source>
          , Santiago, Chile,
          <year>2015</year>
          , December 7-13.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Danelljan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. S.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Felsberg</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. v. d.</given-names>
            <surname>Weijer</surname>
          </string-name>
          , „
          <article-title>Adaptive color attributes for realtime visual tracking</article-title>
          ,“ pp.
          <fpage>1090</fpage>
          -
          <lpage>1097</lpage>
          , in
          <source>IEEE Conference on Computer Vision and Pattern Recognition</source>
          , Columbus,
          OH, USA,
          <year>2014</year>
          , June 23 - 28.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Z.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Prokhorov</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Tao</surname>
          </string-name>
          , „
          <article-title>Multi-store tracker (MUSTer): A cognitive psychology inspired approach to object tracking</article-title>
          ,“
          <source>Comp. Vis. Patt. Recognition</source>
          , pp.
          <fpage>749</fpage>
          -
          <lpage>758</lpage>
          ,
          <year>June 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>H.</given-names>
            <surname>Grabner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grabner</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Bischof</surname>
          </string-name>
          , „
          <article-title>Real-time tracking via on-line boosting</article-title>
          ,“ BMVC, vol.
          <volume>1</volume>
          , p.
          <fpage>6</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>A.</given-names>
            <surname>Lukezic</surname>
          </string-name>
          , T. Vojir,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Zajc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kristan</surname>
          </string-name>
          , „
          <article-title>Discriminative correlation filter tracker with channel and spatial reliability</article-title>
          ,“
          <source>International Journal of Computer Vision</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Henriques</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Caseiro</surname>
          </string-name>
          , P. Martins and J. Batista, „
          <article-title>Exploiting the circulant structure of tracking-by-detection with kernels,“</article-title>
          <source>In proceedings of the European Conference on Computer Vision</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Bolme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Beveridge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Draper</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Yui</surname>
          </string-name>
          , „
          <article-title>Visual object tracking using adaptive correlation filters</article-title>
          ,“ in
          <source>Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>S. K. S. Anjali B Guptha</surname>
          </string-name>
          , „
          <article-title>Multiple Face Detection and Tracking using Viola-Jones Algorithm</article-title>
          ,“
          <source>International Research Journal of Engineering and Technology (IRJET)</source>
          , vol.
          <volume>07</volume>
          , no.
          <volume>04</volume>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>R.</given-names>
            <surname>Padilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Netto</surname>
          </string-name>
          and
          <string-name>
            <given-names>E. A. B. d.</given-names>
            <surname>Silva</surname>
          </string-name>
          , „
          <article-title>A Survey on Performance Metrics for ObjectDetection Algorithms</article-title>
          ,“
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>