<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>IUI Workshops'19, March 20, 2019, Los Angeles, USA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Video Scene Extraction Tool for Soccer Goalkeeper Performance Data Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yasushi Akiyama</string-name>
          <email>Yasushi.Akiyama@smu.ca</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rodolfo Garcia</string-name>
          <email>Rodolfo.Garcia.Barrantes@smu.ca</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tyson Hynes</string-name>
          <email>tyson@gkstopper.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kilo Communications</institution>
          ,
          <addr-line>Halifax, Nova Scotia</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Saint Mary's University</institution>
          ,
          <addr-line>Halifax, Nova Scotia</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Saint Mary's University</institution>
          ,
          <addr-line>Halifax, Nova Scotia</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
<p>We present a new approach to scene extraction for sport videos that incorporates user interactions to specify certain parameters during the extraction process, instead of relying on fully automated processes. It employs a scene search algorithm and a supporting user interface (UI). The UI allows users to visually investigate the scene search results and to specify key parameters, such as the reference frames and the sensitivity threshold values used by the template matching algorithm, in order to find relevant frames for the scene extraction. We show the results of this approach using two videos of youth soccer games. Our main focus in these case studies was to extract the segments of these videos in which the goalkeepers interacted with the ball. The resulting videos can then be exported for further player performance analyses enabled by Stopper, an app that tracks keeper performance and provides analytical data visualizations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>
        In this paper, we will present a new approach to
extracting segments of sport videos by incorporating
user interactions to specify certain parameters during the
extraction process, instead of relying on fully automatic
approaches. When coaches and players review videos of their
own games, or those of competing teams, for analysis
purposes, they typically fast forward through game footage until
they find the segments that show important plays within
these games. For example, Stopper [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] is a mobile app that
tracks soccer goalkeeper performance and provides
analytical data visualizations. The users of this app can record the
data while watching live games or while retrospectively watching
recorded videos. While a single soccer game is typically
90 minutes long, the amount of time a goalkeeper is involved
in plays is significantly less than the full duration of a game.
Thus, it would be ideal if a previously edited, shorter version
of the video that only shows the relevant plays (i.e., video
highlights) were provided to the users of Stopper so that they
do not need to skip irrelevant parts of a game.
      </p>
      <p>
        While some video segmentation and summary generation
algorithms exist and work in certain domains [
        <xref ref-type="bibr" rid="ref15 ref26 ref9">9, 15, 26</xref>
        ], to
our knowledge, there is no approach that can directly be
applied to our problem domain. Our system provides an
intuitive UI that allows the users to specify certain areas of a
video frame to be used for the template-matching algorithms.
The system will then find all the relevant frames based on
the template matching results. The tool also allows the users
to select the sensitivity of template-matching so as to
control how many false-positive frames are to be included in,
or false-negative frames excluded from, the resulting video
highlights.
      </p>
      <p>The rest of the paper is organized as follows. We will first
briefly describe Stopper in Section 2 to give more context
for the current research. In Section 3, we will give an
overview of past research and approaches to
addressing similar problems. Section 4 will describe the proposed
approach, together with the UI that is designed to provide
certain user interactions for selecting several parameters.
Section 5 will show the results of the case studies that test our
approach in different settings. Our main focus was to extract
segments of these videos, specifically when the goalkeepers
interacted with the ball. These videos can then be exported
for further player performance analyses enabled by Stopper.
We will finally provide conclusions and a discussion of the
implications for future work in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 STOPPER</title>
      <p>Stopper (shown in Fig. 1) is a mobile app developed to record
and visualize soccer goalkeeper game performance data.
Users can track the data in five key performance areas: (1) Saves
(a shot directed towards the goal that is intercepted by the
goalkeeper), (2) Goals Against (a shot that passes over the
goal line), (3) Crosses (a ball played into the centre of the
field), (4) Distribution (a pass by the goalkeeper using either
their hands or feet), and (5) Communication (how the
goalkeeper verbally and through gestures supports and organizes their
team), which collectively provide a framework for analysing
goalkeeper strengths and weaknesses.</p>
      <p>
        Commonly used metrics such as Goals Against Average
(GAA), Save Percentage (Sv%), and Expected Goals (xG)
show only limited correlation with goalkeeper ability [
        <xref ref-type="bibr" rid="ref30 ref7">7, 30</xref>
        ]. As a
result, analysing individual goalkeeper performance separately
from the overall team performance carries an inescapable
degree of subjectivity [
        <xref ref-type="bibr" rid="ref23 ref6">6, 23</xref>
        ]. The resulting data based on
Stopper’s five key components can establish a more
comparative benchmark for individual player performance, and it is
less likely to be influenced by the quality of the defensive play of
the keeper’s own team or by the attacking capability of opposing teams,
compared to traditional performance measurements.
      </p>
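      <p>For concreteness, these traditional metrics are simple ratios; the
following is a minimal sketch of their standard definitions (the function
names and sample numbers are ours, purely for illustration):</p>
      <preformat>
def goals_against_average(goals_against: int, minutes_played: float) -> float:
    """GAA: goals conceded, normalized to a 90-minute game."""
    return goals_against * 90.0 / minutes_played

def save_percentage(saves: int, goals_against: int) -> float:
    """Sv%: fraction of on-target shots that were saved."""
    return saves / (saves + goals_against)

print(goals_against_average(9, 810.0))  # 1.0 goal conceded per 90 minutes
print(save_percentage(27, 9))           # 0.75
      </preformat>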
      <p>While Stopper’s analytical data visualizations in these
performance areas can help with understanding overall keeper
performance, corresponding videos showing the tracked
actions provide a crucial component for more detailed
analyses as a training and coaching tool. Currently, the users
first log the goalkeeper performance using Stopper while
watching the game. Once the data is recorded, Stopper uses
the timestamps of the goalkeeper actions logged during a game
to generate video snippets for individual goalkeeper actions.
Our focus in the current research is somewhat the reverse
of this process. That is, we first extract video segments
that only contain the goalkeeper interactions and provide
the users with the extracted videos. In this way, they
do not need to watch the entire 90 minutes of a soccer game
in order to log the performance data.</p>
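      <p>Stopper’s internals are not described here, but to make the
timestamp-to-snippet step concrete, here is a minimal sketch of how
logged action timestamps could drive snippet extraction with ffmpeg
(the file names, padding value, and use of ffmpeg are our assumptions):</p>
      <preformat>
import subprocess

def export_snippets(video_path: str, timestamps_s: list, pad_s: float = 5.0):
    """Illustrative only: cut a short clip around each logged action."""
    for i, t in enumerate(timestamps_s):
        start = max(0.0, t - pad_s)
        subprocess.run([
            "ffmpeg", "-y",
            "-ss", str(start),          # seek to just before the action
            "-i", video_path,
            "-t", str(2 * pad_s),       # keep a window around it
            "-c", "copy",               # stream copy: fast, no re-encode
            f"snippet_{i:03d}.mp4",
        ], check=True)

export_snippets("game.mp4", [129.0, 415.5, 1210.2])
      </preformat>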
    </sec>
    <sec id="sec-3">
      <title>3 RELATED WORK</title>
    </sec>
    <sec id="sec-4">
      <title>Automatic Video Segmentation</title>
      <p>
        There have been studies in related problem domains. One
group of research has focused on video segmentation approaches,
specifically for sports videos. Oyama and Nakao [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]
proposed an approach to identifying different types of plays
(i.e., scrum, lineout, maul, ruck, place-kick) in a rugby video
based on image analysis of player interactions. Li and
Sezan [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] also proposed an approach to classifying
different plays in sport videos, using broadcast videos of baseball
and football. Ekin et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] used low-level analysis for
cinematic feature extraction for scene boundary detection
and scene classification in soccer videos. A slightly different
approach was proposed by Baillie and Jose [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], using
audio signal analysis to detect certain scenes by incorporating
Hidden Markov model classifiers in their algorithm. All of these
studies utilize broadcast videos, often of professional sports,
that were shot from multiple cameras positioned at different
locations in stadiums. Thus, switching between scenes, or
cuts, often gave sufficient cues for these approaches to
detect different plays in these games. Since our current work
is focused on the analysis of videos of youth players, the
videos are usually recorded by a single camera positioned
to align with the centre line of the field. Different plays are
recorded by panning the camera horizontally, so there are
no “cuts” to be detected in the recording.
      </p>
      <p>
        Video segmentation and scene detection approaches
outside of the sports video domain have also been investigated [
        <xref ref-type="bibr" rid="ref21 ref28">21,
28</xref>
        ]. These approaches detect scenes/segments based on cuts,
which typically produce abrupt changes at video boundaries,
or on video transitions that exhibit certain characteristics
in visual parameters, such as changes in colour and brightness.
However, these approaches suffer from the same issue as the
above ones that capitalize on cuts and switching between
cameras. Further, some approaches can work well in
certain sports (e.g., detecting scrums in rugby, which have distinct
player interactions/formations) but are not straightforwardly
applicable to other sports. For example, soccer games
typically have a variety of plays that do not necessarily exhibit
visual patterns in player interactions, with perhaps only a
few exceptions such as corner kicks or penalty kicks. In the
case of audio analysis [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the approach relies on a large
number of spectators to generate sufficiently salient audio
features. Most youth games may not even have any
spectators or audience (e.g., practice games and scrimmages) to
generate audible cues for detecting certain plays. Thus, none of
the above approaches work well in our specific problem
space.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Object Detection and Template Matching Algorithms</title>
      <p>
        Approaches to object detection in images and video can
broadly be divided into four categories [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]: feature-based,
motion-based, classifier-based [
        <xref ref-type="bibr" rid="ref11 ref18">11, 18</xref>
        ], and template-based.
Feature-based object detection utilizes object features such as
shapes [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and colours [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. These approaches, however, did
not work well in our preliminary investigation when we tried
to keep track of players (e.g., keepers) or the soccer ball, due to
several potential factors: objects (e.g., humans) often
change shape during play, the colours of uniforms can be
too similar to the background colours (e.g., green jerseys
on green grass), and target objects are frequently occluded.
Motion-based detection approaches often use static
background reference frame(s) and detect changes in the
foreground by eliminating the background images [
        <xref ref-type="bibr" rid="ref12 ref14 ref32 ref33">12, 14, 32, 33</xref>
        ].
These approaches typically require static background
images, but in our case, since the camera follows the ball,
the background keeps changing, making it difficult
to straightforwardly apply them to detect objects in our
videos. We also investigated the possibility of integrating
some classifier-based approaches into our framework;
however, we could not find suitable solutions that could detect
frames with target objects, especially when there are not
sufficient samples for model training. Therefore, we used
a simple template matching algorithm based on
normalized cross-correlation [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] in our framework. As will be
discussed in this paper, the object detection approach in
our framework can itself be switched to another, potentially better,
solution later. The focus of the current paper is to propose
a generic framework for video segmentation and to show the
early results of this proposed approach.
      </p>
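      <p>To make the normalized cross-correlation concrete, the following
is a minimal NumPy sketch of the metric between a template and one
same-sized window of a frame; a full search slides this computation
over every pixel, which is what OpenCV implements efficiently:</p>
      <preformat>
import numpy as np

def ncc(window: np.ndarray, template: np.ndarray) -> float:
    """Normalized cross-correlation of two equal-sized grayscale patches.

    Returns a value in [0, 1] for non-negative images; 1.0 means the
    window is a (scaled) copy of the template.
    """
    w = window.astype(np.float64).ravel()
    t = template.astype(np.float64).ravel()
    denom = np.sqrt(np.dot(t, t) * np.dot(w, w))
    return float(np.dot(t, w) / denom) if denom > 0 else 0.0
      </preformat>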
      <p>
        To this end, we have observed that generic automatic video
segmentation approaches can benefit from certain domain
knowledge. For instance, work such as that done by Kim et
al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and Oude Elberink and Kemboi [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] integrates user
interaction into object detection and tracking algorithms
for video. Our proposed approach also utilizes user
input in order to complement and improve the automatic
video segmentation algorithms. We describe our
approach in the next section.
      </p>
    </sec>
    <sec id="sec-6">
      <title>4 PROPOSED APPROACH</title>
      <p>
        Our proposed approach works in five basic steps,
interactively with the user’s input. This section describes each
of these steps in detail, together with the corresponding UI
modules, a prototype of which has been developed as a web-based
application.
      </p>
      <p>
        (1) The user uploads an original video and specifies a
reference frame.
Users first select a video, from which they want to create
video highlights, using the provided interface (shown in
Figure 2). They then specify what we call a reference
frame. Reference frames are the frames in which they specify
areas to be used to find relevant frames that contain certain
objects or backgrounds. The UI tool allows the users to skip
back and forth to find a frame that shows objects that most
likely appear when the target actions occur. For example, if
we are to find the segments that show the goalkeeper who
is on the right-hand side of the pitch in action, then the user
should select a frame where the camera has panned to the right so
that it includes the entire goal area (shown in Figure 3).
      </p>
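      <p>Our prototype UI is web-based, but as a rough illustration of the
underlying frame-seeking operation, OpenCV can jump to an arbitrary
frame as follows (the file name and timestamp are placeholders):</p>
      <preformat>
import cv2

cap = cv2.VideoCapture("game.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)

# Jump to, e.g., 131 seconds in and grab that frame as the reference frame.
cap.set(cv2.CAP_PROP_POS_FRAMES, int(131 * fps))
ok, reference_frame = cap.read()
assert ok, "could not read the requested frame"
      </preformat>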
      <p>
        (2) The user selects reference areas to be used for the
frame search.
The next step is to specify areas of the reference frame that
the users want to use for the relevant frame search. We call
these areas reference areas. This step is necessary to
reduce the chance of the algorithm detecting irrelevant frames
due to the overall similarity of the video frames. For example,
soccer videos often contain many frames that are considered
similar by most similarity metrics, due to the fact that certain
background images such as bleachers and grass on the pitch
appear in almost every single frame of the video. However,
we do not want to include these background areas because
they are too generic and are not good references for
finding relevant frames. Instead, we need to include only
portions of the reference frame that display salient objects or
features that can be used to identify relevant frames. For
example, if the users are to find video segments that contain the
goalkeeper’s interactions, then they may choose reference
areas that show the entire goal area and/or the goal itself.
This step of reference area selection is depicted in Figure 4
((a) after the user has selected one reference area; (b) after
the user has selected another; the user continues to add as
many reference areas as they wish).
      </p>
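      <p>Internally, a reference area is simply a rectangular crop of the
reference frame; a one-line sketch (the coordinates are placeholders
standing in for the rectangle the user draws in the UI):</p>
      <preformat>
# (x, y, w, h) would come from the rectangle the user draws in the UI.
x, y, w, h = 850, 200, 300, 180  # placeholder coordinates
reference_area = reference_frame[y:y + h, x:x + w]
      </preformat>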
      <p>
        (3) The system rates each frame in the original video
with the relevance metric.
For each frame in the original video, the algorithm
calculates the likelihood of the frame containing each of the reference
areas by using a template matching algorithm. It repeats this
process for all the reference areas and calculates the overall
likelihood of the reference areas appearing in that frame, as
the average likelihood over all the reference areas. The
template matching algorithm employed in our case
studies is provided by OpenCV (Open Source Computer Vision
Library), an open source library for computer vision,
machine learning, and image processing [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
The function matchTemplate in this library calculates the
cross-correlation [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] between a reference area and the
target frame. Conceptually, it scans the target frame by sliding the
reference area (i.e., the template) over the target frame pixel
by pixel, while calculating the correlation of the two images
at each location: the reference area and the portion of the
frame underneath it. This process is depicted in Fig. 5.
      </p>
      <p>Let C(x, y) be the cross-correlation of the two images at
a pixel (x, y), T(x, y) the pixel value of the target frame at
(x, y), and R(x, y) the pixel value of the reference area at
(x, y); the metric is calculated by the following formula:
C(x, y) = \sum_{x', y'} \big( T(x', y') \cdot R(x + x', y + y') \big) \quad (1)</p>
      <p>Further, based on the general observation that frames in
the video may have different lighting/intensity depending
on factors such as camera angles and exposure, we
use the normalized cross-correlation to mitigate these lighting
effects:</p>
      <p>C(x, y) = \frac{\sum_{x', y'} \big( T(x', y') \cdot R(x + x', y + y') \big)}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} R(x + x', y + y')^2}} \quad (2)</p>
      <p>We calculate C(x, y) for all the pixels using Eq. 2, and
then use the maximum value of C(x, y) (i.e., the highest
likelihood of the reference area being matched in the target frame)
as the relevance metric for this frame. We repeat this process
for each reference area and calculate the overall likelihood of
the reference areas appearing in the frame. The pseudocode
for this entire step of calculating the frame relevance metric
is given in Algorithm 1.</p>
      <p>Algorithm 1 Relevant frame search algorithm</p>
      <preformat>
for each frame f_i in the original video do
    sum ← 0
    n ← 0
    for each reference area a_j do
        p_ij ← likelihood of a_j appearing in f_i
        sum ← sum + p_ij
        n ← n + 1
    end for
    ave_i ← sum / n    (n &gt; 0)
    if ave_i &gt; threshold then
        relevantFrames.add(f_i)
    end if
end for
      </preformat>
      <p>
        We have experimented with other metrics commonly used
for template matching, such as the sum of squared
differences [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], but the normalized cross-correlation yielded the
best results overall. While the current paper presents our
entire framework for the video segmentation process, one that
can easily be used by novice users, the template
matching algorithm itself in our framework can also be replaced
by others (e.g., [
        <xref ref-type="bibr" rid="ref19 ref22 ref4">4, 19, 22</xref>
        ]) or by incorporating certain
machine learning algorithms such as deep learning models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
that may potentially improve the accuracy of the template
matching.
      </p>
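      <p>Putting Algorithm 1 together with OpenCV’s matchTemplate, the
following is a condensed sketch of the relevance-metric pass (the
grayscale conversion, function names, and default threshold are our
choices, not prescribed by the framework):</p>
      <preformat>
import cv2
import numpy as np

def relevant_frames(video_path, reference_areas, threshold=0.98):
    """Return indices of frames whose average NCC score exceeds threshold."""
    templates = [cv2.cvtColor(a, cv2.COLOR_BGR2GRAY) for a in reference_areas]
    cap = cv2.VideoCapture(video_path)
    relevant, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Best match of each reference area anywhere in this frame.
        scores = [cv2.matchTemplate(gray, t, cv2.TM_CCORR_NORMED).max()
                  for t in templates]
        if np.mean(scores) > threshold:
            relevant.append(i)
        i += 1
    cap.release()
    return relevant
      </preformat>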
      <p>
        (4) The user selects a threshold value.
The tool now displays the relevance metrics resulting
from the previous step, and, using this visualized data, the
user can select an ideal threshold to be used for the next step.
The UI allows the user to move the threshold line on the
visualized relevance metric data, so that they can control which
section(s) of the original video are to be included. The green
dots in Fig. 6 indicate the frames to be included in the final
extracted video highlights, while the frames indicated by the
red dots will be excluded. Naturally, lowering the
threshold may include false positive frames (i.e., irrelevant
frames), while raising it may produce false negative frames
(i.e., missed relevant frames).
      </p>
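      <p>The effect of moving the threshold line can also be previewed
numerically; a small sketch (the saved-metrics file is hypothetical,
standing in for the per-frame scores computed in the previous step):</p>
      <preformat>
import numpy as np

scores = np.load("relevance_metrics.npy")  # hypothetical per-frame scores
for th in (0.90, 0.95, 0.98):
    kept = int((scores > th).sum())
    print(f"threshold {th}: {kept} of {scores.size} frames included")
      </preformat>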
      <p>This visualization tool is also synchronized with the video
viewer. That is, as the user clicks on the data points in the
data plot, the video viewer’s time cursor is also moved to
that particular point in time, allowing the user to visually
inspect the corresponding plays in the original video. This
interaction is depicted in Fig. 7. Therefore, with this visual
aid, the user needs to spend less time scanning the original
video, as it allows them to get directly to the frames that will
likely include the keeper’s interactions with the ball. It is this
interactive visual investigation of the video data that allows
the users to minimize the time spent searching for the
relevant frames.
      </p>
      <p>
        (5) The system extracts video segments from the
frames with a relevance rating higher than the
threshold value.
The final step is to extract video segments that contain the
relevant frames. Our current approach is to take all the frames
with relevance metric values above the specified threshold
(i.e., all the green segments shown in Fig. 6). The video
segments are then created by sequencing all these relevant
frames.
      </p>
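      <p>A minimal sketch of this final sequencing step using OpenCV’s
VideoWriter (the codec choice and function name are ours):</p>
      <preformat>
import cv2

def export_highlights(video_path, frame_indices, out_path="highlights.mp4"):
    """Copy the selected frames, in order, into a new video file."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    wanted = set(frame_indices)
    i = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i in wanted:
            out.write(frame)
        i += 1
    cap.release()
    out.release()
      </preformat>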
    </sec>
    <sec id="sec-7">
      <title>5 CASE STUDIES</title>
      <p>In the case studies, we used video footage from two US
Soccer Development Academy league games. Both videos
were in the MPEG-4 (AAC, H.264 codec) format and had
dimensions of 1280 by 720 pixels, with a frame rate of 29.97
frames per second (fps).</p>
      <p>(1) Video #1: Contains 15 minutes of a soccer game, in
relatively clear weather and fair lighting.
(2) Video #2: Contains 10 minutes of a soccer game, in
rainy conditions with darker lighting.</p>
      <p>Both cameras were set a few metres above the ground,
looking slightly down at the pitch. They were
secured on tripods, and panning was used to
keep track of the ball; thus the videos only show a portion of
the pitch at any one time, and never the entire field.
All the cases were run on a MacBook Pro (13-inch, 2018)
with a 2.3GHz Intel Core i5 CPU and 8GB of 2133MHz LPDDR3
memory.</p>
    </sec>
    <sec id="sec-8">
      <title>Case 1: Detecting the keeper interactions on the right-hand side of the pitch on Video #1, with the reference area containing the goal</title>
      <p>In order for the template matching to work, we first need
to choose reference area(s) that are unique and static in
shape and colour for most of the video. For
example, choosing the goalkeepers themselves as references does not
typically produce ideal results, as they move around while the
shape of the object (i.e., a human) changes significantly. Also,
in the videos that we used in these case studies, the keepers
wore shirts in neon yellow and green colours, which often
blended in with the green colour of the grass, potentially
confusing the template matching algorithm. Therefore, we
chose a reference frame that shows the entire goal on the
right-hand side of the pitch (shown in Fig. 8) and selected
this goal as the reference area (shown in Fig. 9). After visual
inspection of the data, we used a correlation metric of 0.98
as the threshold to create the resulting videos.</p>
      <p>The results are shown in Fig. 10. Most of the anticipated
frames were detected as relevant, with the highest relevance
metric indeed occurring at the reference frame, around
the 131st second. However, the algorithm missed some
frames that should have been considered relevant in terms
of the plays in which the keeper was involved. For example,
consider the frame at the bottom left of Fig. 10, which
shows the play 129 seconds into the video. This play was
right before the reference frame, and the keeper is actually
holding the ball. However, this and some of the other frames
leading up to the reference frame were omitted from the
relevant frames. This omission was in fact inevitable, as the
reference area clearly shows the entire goal while the frame
at the 129th second is missing the right side of the goal. One
solution for including these frames is to lower the threshold,
but this would also include irrelevant frames that appear earlier in
the video. Therefore, while the template matching algorithm
itself seems to have worked properly, we probably did not
choose the most ideal reference frame/area(s).</p>
    </sec>
    <sec id="sec-9">
      <title>Case 2: Detecting the keeper interactions on the right-hand side of the pitch on Video #1, with the reference areas containing both the goal and the unique background area</title>
      <p>Given the above results, in addition to the goal, we also
experimented with an additional reference area, which shows
the unique background area shown in Fig. 11; thus we used
both the goal and this unique background from the
same reference frame to perform the template matching.</p>
      <p>As shown in Fig. 12, this additional reference area
improved the performance in that it included frames (e.g.,
the top left frame shown in Fig. 12) that did not show the
entire goal but were part of a play in which the keeper
interacted with the ball. This result illustrates the importance
of integrating user input into these processes instead of
relying on entirely automated approaches.</p>
    </sec>
    <sec id="sec-10">
      <title>Case 3: Detecting the keeper interactions on the right-hand side of the pitch on Video #2</title>
      <p>We also tested the approach on the video with visible noise
caused by the rain. The reference area used in this test is shown in
Fig. 13.</p>
      <p>As shown in Fig. 14, the expected relevant frames were still
appropriately detected, even though the visibility
was not as good as in the first two cases.</p>
    </sec>
    <sec id="sec-11">
      <title>Case 4: Detecting the keeper interactions on the right-hand side of the pitch on Down-sampled Video #2</title>
      <p>For all the above cases, we used the original frame rate of
29.97 fps and ran the relevant frame search algorithm on all
the frames. However, typical soccer plays do not require
such a high frame rate for our purposes, so we
experimented with first down-sampling the original video to lower
frame rates of 16 fps and 4 fps, in order to increase the
efficiency of our approach. Once we identified the relevant
frames, we used the original video to extract the
corresponding segments. As seen in Fig. 15, which
compares the three different frame rates, down-sampling
produced almost identical relevance metric curves.</p>
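      <p>Down-sampling amounts to scoring only every k-th frame and then
mapping hits back to the original frame numbers; a sketch of the
stride computation (the frame count is illustrative, roughly a
ten-minute 29.97 fps video):</p>
      <preformat>
src_fps, target_fps = 29.97, 4.0
stride = max(1, round(src_fps / target_fps))  # ~7: score every 7th frame

total_frames = 17982              # e.g., a ten-minute video at 29.97 fps
sampled = range(0, total_frames, stride)
# Score only the sampled frames; a hit at sampled frame k corresponds
# to the full-rate segment around frame k in the original video.
      </preformat>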
      <p>The results showed that this down-sampling significantly
accelerated the process without affecting the overall results.
To give a general idea of the processing time for the
relevance metric calculations, Table 1 shows the calculation time
for Video #2, which was ten minutes long. Based on this
observation, the tool was able to calculate the relevant frames
in about 1/16 of the length of the original video.
(Fig. 15 panels: (a) the original 29.97 fps; (b) 16 fps; (c) 4 fps.)</p>
      <p>Note that the calculation time itself can of course be improved
further in a few ways: for example, by calculating the frame
relevance metrics in parallel, since the relevance metric for each
frame does not depend on the other frames’ results; by
down-sampling the video resolution; or by skipping a number of pixels
during the template matching instead of checking against
every single pixel.</p>
    </sec>
    <sec id="sec-12">
      <title>6 CONCLUSIONS AND FUTURE WORK</title>
      <p>We have proposed a new framework for semi-automatic video
segmentation of sport videos, and a UI tool that implements
the proposed approach. Instead of relying on a fully
automatic method, our approach consists of five fundamental
steps that integrate user input and knowledge to help
reduce potential errors. The provided UI tool allows the users
to easily select a reference frame and reference areas that
are used to detect relevant video frames containing target
player actions, and it visualizes the relevance metrics to help
determine the optimal threshold value for the video extraction.
The users can interactively investigate the corresponding
video segments capitalizing on this visualization tool, thus
likely spending less time searching for important plays in the
videos. The case studies showed that our approach worked
well with certain videos, but there are several factors that
affected the performance of the approach, and we are currently
working to improve it in multiple respects.</p>
      <p>One such aspect is further investigation comparing
template matching and object detection algorithms. As
discussed throughout the paper, there are some algorithms
that may potentially improve the accuracy of the tool. Some
algorithms may be more suitable for certain conditions, such
as videos with specific image backgrounds or lighting
conditions. Some predictive models, such as those utilizing
deep learning algorithms, may become an option
once we obtain enough video data to train the
models. In this case, a potential approach is to first run a
clustering algorithm on videos based on certain parameters, such as
background types and lighting conditions, and then create
separate models for each of those types.</p>
      <p>
        As well, the threshold is currently determined by the users
before the system renders the video, but it may potentially
be estimated, for example, by integrating known threshold
estimation methods [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Finally, the framework itself can
potentially be applied to other, similar types of sports, such
as basketball, rugby, and field hockey. Our approach will of
course need to be modified to accommodate differences
between games. For example, one experiment that we conducted with
a basketball video revealed that, while the tool did work
relatively well at detecting plays near the hoop, since the
game moves much faster than soccer, there should be some
mechanism for including the frames leading up to and following
those plays, to show a more complete sequence
of actions. Solutions to these new challenges posed by other
types of sports will likely lead to further improvements of the
tool in general.
      </p>
    </sec>
    <sec id="sec-13">
      <title>7 ACKNOWLEDGEMENTS</title>
      <p>This research was supported by the National Research
Council (NRC) Canada Industrial Research Assistance Program
and the Nova Scotia Business Inc. Productivity and Innovation
Voucher Program. Special thanks to Jyothi Sethi, an M.Sc.
student at Saint Mary’s University, who contributed her skills
to implement the UI tool.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Baillie</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Jose</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>An Audio-Based Sports Video Segmentation and Event Detection Algorithm</article-title>
          .
          <source>In 2004 Conference on Computer Vision and Pattern Recognition Workshop</source>
          .
          <fpage>110</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bradski</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>The OpenCV Library</article-title>
          .
          <source>Dr. Dobb's Journal of Software Tools</source>
          (
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Davit</given-names>
            <surname>Buniatyan</surname>
          </string-name>
          , Thomas Macrina, Dodam Ih, Jonathan Zung, and
          <string-name>
            <given-names>H. Sebastian</given-names>
            <surname>Seung</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Deep Learning Improves Template Matching by Normalized Cross Correlation</article-title>
          .
          <source>CoRR abs/1705.08593</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Luigi</given-names>
            <surname>Di</surname>
          </string-name>
          <string-name>
            <surname>Stefano</surname>
          </string-name>
          , Stefano Mattoccia, and
          <string-name>
            <given-names>Federico</given-names>
            <surname>Tombari</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>ZNCC-based Template Matching Using Bounded Partial Correlation. Pattern Recogn</article-title>
          .
          <source>Lett</source>
          .
          <volume>26</volume>
          ,
          <issue>14</issue>
          (Oct.
          <year>2005</year>
          ),
          <fpage>2129</fpage>
          -
          <lpage>2134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ekin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Tekalp</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Automatic soccer video analysis and summarization</article-title>
          .
          <source>IEEE Transactions on Image Processing 12</source>
          , 7
          <issue>(</issue>
          <year>July 2003</year>
          ),
          <fpage>796</fpage>
          -
          <lpage>807</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Garry</given-names>
            <surname>Gelade</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Evaluating the ability of goalkeepers in English Premier League football</article-title>
          .
          <source>Journal of Quantitative Analysis in Sports 10</source>
          ,
          <issue>2</issue>
          (
          <year>2014</year>
          ),
          <fpage>279</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Sam</given-names>
            <surname>Gregory</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Goalkeepers’ save percentage an unreliable stat</article-title>
          . http://www.sportsnet.ca/soccer. Accessed: February,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Bruce E.</given-names>
            <surname>Hansen</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Sample Splitting and Threshold Estimation</article-title>
          .
          <source>Econometrica</source>
          <volume>68</volume>
          ,
          <issue>3</issue>
          (
          <year>2003</year>
          ),
          <fpage>575</fpage>
          -
          <lpage>603</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hashimoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shirota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Iizawa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Kitagawa</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Digest making method based on turning point analysis</article-title>
          .
          <source>In Proceedings of the Second International Conference on Web Information Systems Engineering</source>
          , Vol.
          <volume>1</volume>
          .
          <fpage>83</fpage>
          -
          <lpage>91</lpage>
          vol.
          <volume>1</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>M. B. Hisham</surname>
            ,
            <given-names>S. N.</given-names>
          </string-name>
          <string-name>
            <surname>Yaakob</surname>
            ,
            <given-names>R. A. A.</given-names>
          </string-name>
          <string-name>
            <surname>Raof</surname>
            ,
            <given-names>A. B. A.</given-names>
          </string-name>
          <string-name>
            <surname>Nazren</surname>
            , and
            <given-names>N. M.</given-names>
          </string-name>
          <string-name>
            <surname>Wafi</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Template Matching using Sum of Squared Difference and Normalized Cross Correlation</article-title>
          .
          <source>In 2015 IEEE Student Conference on Research and Development (SCOReD)</source>
          .
          <fpage>100</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Matroid</given-names>
            <surname>Inc</surname>
          </string-name>
          .
          <year>2019</year>
          . Matroid. https://www.matroid.com/. Accessed:
          <year>February 2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Johnsen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ashley</given-names>
            <surname>Tews</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Real-Time Object Tracking and Classification Using a Static Camera</article-title>
          .
          <source>In IEEE International Conference on Robotics and Automation - Workshop on People Detection and Tracking.</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Munchurl</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.G.</given-names>
            <surname>Jeon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.S.</given-names>
            <surname>Kwak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.H.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Ahn</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Moving object segmentation in video sequences by user interaction and automatic object tracking</article-title>
          .
          <source>Image and Vision Computing</source>
          <volume>19</volume>
          ,
          <issue>5</issue>
          (
          <year>2001</year>
          ),
          <fpage>245</fpage>
          -
          <lpage>260</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Rajshree</given-names>
            <surname>Lande</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Mulajkar</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Moving Object Detection using Foreground Detection for Video Surveillance System</article-title>
          .
          <source>International Research Journal of Engineering and Technology</source>
          <volume>5</volume>
          ,
          <issue>6</issue>
          (
          <year>June 2018</year>
          ),
          <fpage>517</fpage>
          -
          <lpage>519</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lertrusdachakul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Watanabe</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Yokota</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Automatic Digest Generation by Extracting Important Scenes from the Content of Presentations</article-title>
          .
          <source>In 2008 19th International Workshop on Database and Expert Systems Applications</source>
          .
          <fpage>590</fpage>
          -
          <lpage>594</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Lefevre</surname>
          </string-name>
          , E. Bouton,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brouard</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Vincent</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>A new way to use hidden Markov models for object tracking in video sequences</article-title>
          .
          <source>In Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429)</source>
          , Vol.
          <volume>3</volume>
          . III-117.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. Ibrahim</given-names>
            <surname>Sezan</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Event detection and summarization in sports video</article-title>
          .
          <source>In Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL</source>
          <year>2001</year>
          ).
          <fpage>132</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Yi</given-names>
            <surname>Liu and Yuan F. Zheng</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Video object segmentation and tracking using ψ -learning classification</article-title>
          .
          <source>Circuits and Systems for Video Technology, IEEE Transactions on 15</source>
          (
          <year>2005</year>
          ),
          <fpage>885</fpage>
          -
          <lpage>899</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>David</surname>
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Lowe</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Distinctive Image Features from Scale-Invariant Keypoints</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>60</volume>
          ,
          <issue>2</issue>
          (
          <issue>01</issue>
          <year>Nov 2004</year>
          ),
          <fpage>91</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Wei-Lwun Lu</surname>
            and
            <given-names>J. J.</given-names>
          </string-name>
          <string-name>
            <surname>Little</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Simultaneous Tracking and Action Recognition using the PCA-HOG Descriptor</article-title>
          .
          <source>In The 3rd Canadian Conference on Computer and Robot Vision (CRV'06)</source>
          .
          <fpage>6</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Zheng</given-names>
            <surname>Lu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kristen</given-names>
            <surname>Grauman</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Story-Driven Summarization for Egocentric Video</article-title>
          .
          <source>In Proceedings of the 2013 IEEE Conference on Computer Vision</source>
          and
          <article-title>Pattern Recognition (CVPR '13)</article-title>
          . IEEE Computer Society, Washington, DC, USA,
          <fpage>2714</fpage>
          -
          <lpage>2721</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Abdullah</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Moussa</surname>
            ,
            <given-names>M I. Habib</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Rawya</given-names>
            <surname>Rizk</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>FRoTeMa: Fast and Robust Template Matching</article-title>
          .
          <source>International Journal of Advanced Computer Science and Applications</source>
          <volume>6</volume>
          (
          <year>October 2015</year>
          ),
          <fpage>195</fpage>
          -
          <lpage>200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Joel</given-names>
            <surname>Oberstone</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Differentiating the Top English Premier League Football Clubs from the Rest of the Pack: Identifying the Keys to Success</article-title>
          .
          <source>Journal of Quantitative Analysis in Sports 5</source>
          ,
          <issue>3</issue>
          (
          <year>2009</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Sander</given-names>
            <surname>Oude Elberink</surname>
          </string-name>
          and
          <string-name>
            <given-names>B</given-names>
            <surname>Kemboi</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>User-assisted Object Detection by Segment Based Similarity Measures in Mobile Laser Scanner Data</article-title>
          .
          <source>In ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XL-3</source>
          .
          <fpage>239</fpage>
          -
          <lpage>246</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Oyama</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Nakao</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Automatic extraction of specific scene from sports video</article-title>
          .
          <source>In 2015 10th Asian Control Conference (ASCC)</source>
          .
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Vyacheslav</given-names>
            <surname>Parshin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Liming</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Video Summarization Based on User-defined Constraints and Preferences</article-title>
          .
          <source>In Coupling Approaches, Coupling Media and Coupling Languages for Information Retrieval (RIAO '04)</source>
          . Paris, France, France,
          <fpage>18</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Sarvaiya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Patnaik</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bombaywala</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Image Registration by Template Matching Using Normalized Cross-Correlation</article-title>
          .
          <source>In 2009 International Conference on Advances in Computing, Control, and Telecommunication Technologies</source>
          .
          <fpage>819</fpage>
          -
          <lpage>822</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Adel</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Sewisy</surname>
            ,
            <given-names>Khaled F.</given-names>
          </string-name>
          <string-name>
            <surname>Hussain</surname>
            , and
            <given-names>Amjad D.</given-names>
          </string-name>
          <string-name>
            <surname>Suleiman</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Speedup Video Segmentation via Dual Shot Boundary Detection (SDSBD)</article-title>
          .
          <source>International Research Journal of Engineering and Technology (IRJET) 3</source>
          ,
          <issue>12</issue>
          (
          <year>December 2016</year>
          ),
          <fpage>11</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Sanjivani</surname>
            <given-names>Shantaiya</given-names>
          </string-name>
          , Keshri Verma, and
          <string-name>
            <given-names>Kamal</given-names>
            <surname>Mehta</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Article: A Survey on Approaches of Object Detection</article-title>
          .
          <source>International Journal of Computer Applications</source>
          <volume>65</volume>
          ,
          <issue>18</issue>
          (March
          <year>2013</year>
          ),
          <fpage>14</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Colin</given-names>
            <surname>Trainor</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Goalkeepers: How repeatable are shot saving performances?</article-title>
          . http://www.statsbomb.com/2014/10/goalkeepers-how-repeatable-are-shot-saving-performances. Accessed: February
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <surname>GKStopper</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>PROFESSIONAL GOALKEEPER SOFTWARE: The app that tracks keeper performance</article-title>
          . http://gkstopper.com/. Accessed:
          <year>February 2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>L</given-names>
            <surname>Vibha</surname>
          </string-name>
          ,
          <string-name>
            <surname>Chetana Hegde</surname>
            ,
            <given-names>P Shenoy</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venugopal</surname>
            <given-names>K R</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Lalit</given-names>
            <surname>Patnaik</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Dynamic Object Detection, Tracking and Counting in Video Streams for Multimedia Mining</article-title>
          . IAENG
          <source>International Journal of Computer Science</source>
          (01
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Ngan</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Segmentation and Tracking Multiple Objects Under Occlusion From Multiview Video</article-title>
          .
          <source>IEEE Transactions on Image Processing</source>
          <volume>20</volume>
          ,
          <issue>11</issue>
          (Nov
          <year>2011</year>
          ),
          <fpage>3308</fpage>
          -
          <lpage>3313</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>