=Paper=
{{Paper
|id=Vol-2327/UISTDA2
|storemode=property
|title=Video Scene Extraction Tool for Soccer Goalkeeper Performance Data Analysis
|pdfUrl=https://ceur-ws.org/Vol-2327/IUI19WS-UISTDA-2.pdf
|volume=Vol-2327
|authors=Yasushi Akiyama,Rodolfo Garcia Barrantes,Tyson Hynes
|dblpUrl=https://dblp.org/rec/conf/iui/AkiyamaBH19
}}
==Video Scene Extraction Tool for Soccer Goalkeeper Performance Data Analysis==
Yasushi Akiyama, Saint Mary’s University, Halifax, Nova Scotia, Yasushi.Akiyama@smu.ca
Rodolfo Garcia, Saint Mary’s University, Halifax, Nova Scotia, Rodolfo.Garcia.Barrantes@smu.ca
Tyson Hynes, Kilo Communications, Halifax, Nova Scotia, tyson@gkstopper.com
ABSTRACT

We will present a new approach for the scene extraction of sport videos by incorporating user interactions to specify certain parameters during the extraction process, instead of relying on fully automated processes. It employs a scene search algorithm and a supporting user interface (UI). This UI allows the users to visually investigate the scene search results and specify key parameters, such as the reference frames and sensitivity threshold values to be used for the template matching algorithms, in order to find relevant frames for the scene extraction. We will show the results of this approach using two videos of youth soccer games. Our main focus in these case studies was to extract segments of these videos in which the goalkeepers interacted with the ball. The resulting videos can then be exported for further player performance analyses enabled by Stopper, an app that tracks keeper performance and provides analytical data visualizations.

KEYWORDS

UI tool for spatial and temporal data analyses, video analysis, video segmentation algorithms, interactive data processing, image processing, template-matching

ACM Reference Format:
Yasushi Akiyama, Rodolfo Garcia, and Tyson Hynes. 2019. Video Scene Extraction Tool for Soccer Goalkeeper Performance Data Analysis. In Joint Proceedings of the ACM IUI 2019 Workshops, Los Angeles, USA, March 20, 2019, 9 pages.

IUI Workshops’19, March 20, 2019, Los Angeles, USA
© 2019 Copyright for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 INTRODUCTION

In this paper, we will present a new approach to extracting segments of sport videos by incorporating user interactions to specify certain parameters during the extraction process, instead of relying on fully automatic approaches. When coaches and players review videos of their own games or those of competing teams for analysis purposes, they typically fast forward through game footage until they find the segments that show important plays within these games. For example, Stopper [31] is a mobile app that tracks soccer goalkeeper performance and provides analytical data visualizations. The users of this app can record the data while watching live games or retrospectively watching the recorded videos. While a single soccer game is typically 90 minutes long, the amount of time a goalkeeper is involved in plays is significantly less than the full duration of a game. Thus, it would be ideal if a previously edited shorter version of the video that only shows the relevant plays (i.e., video highlights) were provided for the users of Stopper so that they would not need to skip irrelevant parts within a game.

While some video segmentation and summary generation algorithms exist and work in certain domains [9, 15, 26], to our knowledge, there is no approach that can directly be applied to our problem domain. Our system provides an intuitive UI that allows the users to specify certain areas of a video frame to be used for the template-matching algorithms. The system will then find all the relevant frames based on the template matching results. The tool also allows the users to select the sensitivity of template-matching so as to control how many false-positive frames are to be included in, or false-negative frames excluded from, the resulting video highlights.

The rest of the paper is organized as follows. We will first briefly describe Stopper in Section 2 to give more context for the current research. In Section 3, we will discuss an overview of past research and approaches to addressing similar problems. Section 4 will describe the proposed approach, together with the UI that is designed to provide certain user interactions to select several parameters. Section 5 will show the results of the case studies that test our approach in different settings. Our main focus was to extract segments of these videos, specifically when the goalkeepers interacted with the ball. These videos can then be exported for further player performance analyses enabled by Stopper. We will finally provide conclusions and discussions of the implications for future work in Section 6.
2 STOPPER

Stopper (shown in Fig. 1) is a mobile app developed to record and visualize soccer goalkeeper game performance data. Users can track the data in five key performance areas: (1) Saves (a shot directed towards the goal that is intercepted by the goalkeeper), (2) Goals Against (a shot that passes over the goal line), (3) Crosses (a ball played into the centre of the field), (4) Distribution (a pass by the goalkeeper using either their hands or feet), and (5) Communication (how the goalkeeper verbally and in gestures supports and organizes their team), which collectively provide a framework for analysing goalkeeper strengths and weaknesses.

Figure 1: Stopper is a mobile app developed to record and visualize soccer goalkeeper game performance data.

Commonly used metrics such as Goals Against Average (GAA), Save Percentage (Sv%), and Expected Goals (xG) provide limited correlation to goalkeeper ability [7, 30]. As a result, analysing individual goalkeeper performance separately from the overall team performance carries an inescapable degree of subjectivity [6, 23]. The resulting data based on Stopper’s five key components can establish a more comparative benchmark for individual player performance, and it is less likely to be influenced by the quality of the defensive play of their own team or the attacking capability of opposing teams, compared to the traditional performance measurements.

While Stopper’s analytical data visualizations in these performance areas can help understand the overall keeper performance, corresponding videos showing the tracked actions will provide a crucial component for more detailed analyses as a training and coaching tool. Currently, the users first log the goalkeeper performance using Stopper while watching the game. Once the data is recorded, Stopper uses timestamps of the goalkeeper actions logged during a game to generate video snippets for individual goalkeeper actions. Our focus in the current research is somewhat the reverse of this process. That is, we will first extract video segments that only contain the goalkeeper interactions and provide the users with the extracted videos. In this way, they will not need to watch the entire 90 minutes of the soccer game in order to log the performance data.

3 RELATED WORK

Automatic Video Segmentation

There have been studies in related problem domains. One group of research focused on video segmentation approaches, specifically for videos of sports. Oyama and Nakao [25] proposed an approach to identifying different types of plays (i.e., scrum, lineout, maul, ruck, place-kick) in a rugby video based on the image analysis of player interactions. Li and Sezan [17] also proposed an approach to classifying different plays in sport videos, using broadcast videos of baseball and football. Ekin et al. [5] used low-level analysis for cinematic feature extraction for scene boundary detection and scene classification in soccer videos. A slightly different approach was proposed by Baillie and Jose [1], using an audio signal analysis to detect certain scenes by incorporating Hidden Markov model classifiers in their algorithm. All these studies utilize broadcast videos, often of professional sports, that were shot from multiple cameras positioned at different locations in stadiums. Thus, switching between scenes, or cuts, often gave sufficient cues for these approaches to detect different plays in these games. Since our current work is focused on the analyses of videos of youth players, the videos are usually recorded by a single camera, positioned to align with the centre line of a field. Different plays are recorded by panning the camera horizontally, so there are no “cuts” to be detected in the recording.

Video segmentation and scene detection approaches outside of the sport video domain have also been investigated [21, 28]. These approaches detect scenes/segments based on cuts that typically produce abrupt changes at video boundaries, or on video transitions that exhibit certain characteristics in visual parameters such as colour and brightness changes. However, these approaches suffer from the same issue as the above ones that capitalize on cuts and switching between cameras. Further, some approaches can work well in certain sports (e.g., detecting scrums in rugby that have distinct player interactions/formations) but are not straightforwardly applicable to other sports. For example, soccer games typically have a variety of plays that do not necessarily exhibit visual patterns in player interactions, perhaps with only very few exceptions such as corner kicks or penalty kicks. In the case of audio analysis [1], the approach relies on a large number of spectators to generate sufficiently salient audio features. Most youth games may not even have any spectators or audience (e.g., practice games and scrimmages) to generate audible cues to detect certain plays. Thus, none of the above approaches will work well in our specific problem space.
Object Detection and Template Matching Algorithms

Approaches to object detection in images and video can broadly be divided into four categories [29]: feature-based, motion-based, classifier-based [11, 18], and template-based. Feature-based object detection utilizes object features such as shapes [20] and colours [16]. These approaches, however, did not work well in our preliminary investigation when we tried to keep track of players (e.g., keepers) or a soccer ball, due to several potential factors such as objects (e.g., humans) often changing shapes during the play, colours of uniforms being too similar to the background colours (e.g., green jerseys on green grass), and frequent occlusions of target objects. Motion-based detection approaches often use static background reference frame(s) and detect changes in the foreground by eliminating the background images [12, 14, 32, 33]. These approaches typically require static background images, but in our case, since the camera follows soccer balls, the background images keep changing, making it difficult for us to straightforwardly apply them to detect objects in videos. We also investigated the possibility of integrating some classifier-based approaches into our framework; however, we could not find suitable solutions that could detect frames with target objects, especially when there are not sufficient samples for model training. Therefore, we used a simple template matching algorithm using the normalized cross-correlation [27] in our framework. As will be discussed in this paper, the object detection approach itself in our framework can be switched to another, potentially better, solution later. The focus of the current paper is to propose a generic framework of video segmentation and show the early results of this proposed approach.

To this end, we have observed that generic automatic video segmentation approaches can benefit from certain domain knowledge. For instance, Kim et al. [13] and Oude Elberink and Kemboi [24] integrate user interaction into their object detection and tracking algorithms for video. Our proposed approach also utilizes the user input in order to complement and improve the automatic video segmentation algorithms. We will now describe our approach in the next section.

4 PROPOSED APPROACH

Our proposed approach works in five basic steps, interactively with the users’ input. This section describes each of these steps in detail, together with the corresponding UI modules, a prototype of which is developed as a web-based application.

(1) The user uploads an original video and specifies a reference frame

Users will first select a video, from which they want to create video highlights, using the provided interface (shown in Figure 2). They will then specify what we call a reference frame. Reference frames are the frames in which they specify areas to be used to find relevant frames that contain certain objects or backgrounds. The UI tool allows the users to skip back and forth to find a frame that shows objects that most likely appear when the target actions occur. For example, if we are to find the segments that show the goalkeeper who is on the right-hand side of the pitch in action, then the user should select a frame when the camera pans to the right so that it includes the entire goal area (shown in Figure 3).

Figure 2: The UI tool allows users to select and preview the target video.

Figure 3: The film strip shows video frames, some of which are candidate reference frames (indicated by the red outline) to be used in the next step.
(2) The user selects reference areas to be used for the frame search

The next step is to specify areas of the reference frame that the users want to use for the relevant frame search. We call these areas reference areas. This step is necessary for reducing the chances of the algorithm detecting irrelevant frames due to the overall similarity of the video frames. For example, soccer videos often contain many frames that are considered similar by most similarity metrics due to the fact that certain background images such as bleachers and grass on the pitch appear in almost every single frame of the video. However, we do not want to include these background areas because they are too generic and are not great references in terms of finding relevant frames. Instead, we need to only include portions of the reference frame that display salient objects or features that can be used to identify relevant frames. For example, if the users are to find video segments that contain the goalkeeper’s interactions, then they may choose reference areas that show the entire goal area and/or the goal itself. This step of reference area selection is depicted in Figure 4; a small code sketch after the caption illustrates one possible representation of these areas.

Figure 4: The selection of reference areas to be used for the relevant frame search algorithm. (a) After the user selected one reference area. (b) After the user selected another reference area. The user continues to add as many reference areas as they wish to.
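The paper does not prescribe a concrete data structure for reference areas. As a minimal sketch, assuming the prototype stores each area as a pixel rectangle and uses OpenCV’s Python bindings (both of which are our assumptions, as are the file name and frame index below), the selected areas could be cropped from the reference frame with NumPy slicing:

import cv2

# Hypothetical representation: each reference area is a pixel rectangle
# (x, y, w, h) drawn by the user on the chosen reference frame.
reference_rects = [(850, 210, 300, 180),   # e.g., the goal on the right-hand side
                   (40, 60, 220, 140)]     # e.g., a unique background area

def crop_reference_areas(reference_frame, rects):
    """Cut the user-selected rectangles out of the reference frame.
    The resulting patches are the templates matched against every frame."""
    return [reference_frame[y:y + h, x:x + w] for (x, y, w, h) in rects]

# Example usage: read the chosen reference frame from the original video.
cap = cv2.VideoCapture("game1.mp4")          # hypothetical file name
cap.set(cv2.CAP_PROP_POS_FRAMES, 131 * 30)   # jump near the chosen reference frame
ok, reference_frame = cap.read()
cap.release()
templates = crop_reference_areas(reference_frame, reference_rects) if ok else []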
(3) The system rates each frame in the original video with the relevance metric

For each frame in the original video, the algorithm will calculate its likelihood of containing each of the reference areas by using a template matching algorithm. It will repeat the process for all the reference areas and calculate the overall likelihood of the reference areas appearing in that frame, as an average likelihood over all the reference areas. The template matching algorithm that was employed in our case studies is provided by OpenCV (Open Source Computer Vision Library) [2], an open source library for computer vision, machine learning, and image processing. The matchTemplate function in this library calculates the cross-correlation [27] between a reference area and the target frame. Conceptually, it scans the target frame by sliding a reference area (i.e., the template) over the target frame pixel by pixel, while calculating the correlation of the two images at each location: the reference area and the portion of the frame underneath it. This process is depicted in Fig. 5.

Figure 5: Each reference area is compared against an area in the target frame, by shifting it pixel by pixel, while calculating the correlation of the reference area and an area of the same size at each location within the target frame.

Let C(x, y) be the cross-correlation of the two images at a pixel (x, y), T(x, y) the pixel value of the target frame at (x, y), and R(x, y) the pixel value of the reference area at (x, y); the metric is calculated by the following formula:

C(x, y) = \sum_{x', y'} \left( T(x', y') \cdot R(x + x', y + y') \right)    (1)

Further, based on the general observation that frames in the video may have different lighting/intensity depending on factors such as camera angles and exposures, we use the normalized cross-correlation to mitigate the lighting effects:

C(x, y) = \frac{\sum_{x', y'} \left( T(x', y') \cdot R(x + x', y + y') \right)}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} R(x + x', y + y')^2}}    (2)
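To make Eq. (2) concrete, the following is a minimal NumPy sketch of the normalized cross-correlation at a single offset (x, y): the reference area is compared against the same-size patch of the target frame starting at that offset. The function name and the float conversion are our own choices, not part of the paper.

import numpy as np

def ncc_at_offset(frame_gray, ref_gray, x, y):
    """Normalized cross-correlation (Eq. 2) between the reference area
    and the equally sized patch of the target frame at offset (x, y)."""
    h, w = ref_gray.shape
    patch = frame_gray[y:y + h, x:x + w].astype(np.float64)
    ref = ref_gray.astype(np.float64)
    num = np.sum(patch * ref)
    den = np.sqrt(np.sum(patch ** 2) * np.sum(ref ** 2))
    return num / den if den > 0 else 0.0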
We calculate C(x, y) for all the pixels given by Eq. 2, and then use the maximum value in C(x, y) (i.e., the highest likelihood of the reference area being matched in the target frame) as the relevance metric for this frame. We repeat this process for each reference area and calculate the overall likelihood of the reference areas appearing in the frame. The pseudocode of this entire step of calculating the frame relevance metric is described in Algorithm 1.

Algorithm 1 Relevant frame search algorithm

for each frame f_i in the original video do
    sum ← 0
    n ← 0
    for each reference area a_j do
        p_ij ← likelihood of a_j appearing in f_i
        sum ← sum + p_ij
        n ← n + 1
    end for
    ave_i ← sum / n  (n > 0)
    if ave_i > threshold then
        relevantFrames.add(f_i)
    end if
end for
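As a rough Python rendering of Algorithm 1 (our own sketch: the paper only states that OpenCV’s matchTemplate with the normalized cross-correlation was used, and the function and variable names below are ours), each frame’s relevance metric can be computed as the average, over all reference areas, of the peak normalized cross-correlation score:

import cv2
import numpy as np

def frame_relevance(frame_gray, templates_gray):
    """Average over reference areas of the maximum normalized
    cross-correlation between each reference area and the frame."""
    scores = []
    for tmpl in templates_gray:
        # TM_CCORR_NORMED corresponds to the normalized cross-correlation of Eq. (2).
        result = cv2.matchTemplate(frame_gray, tmpl, cv2.TM_CCORR_NORMED)
        scores.append(float(result.max()))
    return float(np.mean(scores)) if scores else 0.0

def relevant_frames(video_path, templates, threshold):
    """Algorithm 1: collect the indices of frames whose relevance
    metric exceeds the user-selected threshold."""
    templates_gray = [cv2.cvtColor(t, cv2.COLOR_BGR2GRAY) for t in templates]
    relevant, index = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if frame_relevance(gray, templates_gray) > threshold:
            relevant.append(index)
        index += 1
    cap.release()
    return relevant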
We have experimented with other metrics commonly used for template matching, such as the sum of squared differences [10], but the normalized cross-correlation yielded the best results overall. While the current paper presents our entire framework of video segmentation processes that can easily be used by novice users, the template matching algorithm itself in our framework can also be replaced by others (e.g., [4, 19, 22]) or augmented by incorporating certain machine learning algorithms such as deep learning models [3] that may potentially improve the accuracy of the template matching.

(4) The user selects a threshold value

The tool will now display the resulting relevance metrics from the previous step, and using this visualized data, the user can select an ideal threshold to be used for the next step. The UI allows the user to move the threshold line on the visualized relevance metrics data, so that they can control which section(s) of the original video are to be included. The green dots in Fig. 6 indicate the frames to be included in the final extracted video highlights, while the frames indicated by the red dots will be excluded from them. Naturally, lowering the threshold may include false positive frames (i.e., irrelevant frames), while raising it may result in false negative frames (i.e., missed relevant frames).

Figure 6: The user moves the line on the visualized relevance metrics to control the threshold. The frames indicated by the green dots are to be included in the final extracted video segments. In this particular example, the threshold used in the bottom-left plot effectively separates the two groups of data points.

This visualization tool is also synchronized with the video viewer. That is, as the user clicks on the data points in the data plot, the video viewer’s time cursor is also moved to that particular point in time, allowing the user to visually inspect the corresponding plays in the original video. This interaction is depicted in Fig. 7. Therefore, with this visual aid, the user will need to spend less time scanning the original video, as it allows them to get directly to the frames that will likely include the keeper interactions with the ball. It is this interactive visual investigation of the video data that allows the users to minimize the time spent on searching for the relevant frames.

Figure 7: The corresponding video frames will be shown by clicking the data points in the visualization tool, allowing the users to interactively inspect the video based on the relevance metrics calculated by the tool. This interactive visual investigation of the video data presumably reduces the time spent searching for important plays in the video.

(5) The system extracts video segments containing the frames whose relevance rating is higher than the threshold value

The final step is to extract the video segments that contain the relevant frames. Our current approach is to take all the frames with relevance metric values above the specified threshold (i.e., all the green segments shown in Fig. 6). The video segments are then created by sequencing all these relevant frames. A minimal sketch of this step is shown below.
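The paper does not spell out how the relevant frames are stitched into output clips. As one possible sketch (our own, assuming OpenCV is also used for writing and that consecutive relevant frame indices can be grouped into contiguous segments), the step could look like this:

import cv2

def group_into_segments(relevant_indices):
    """Group consecutive frame indices into (start, end) segments, inclusive.
    Useful when one clip per contiguous play is preferred over a single reel."""
    segments = []
    for idx in relevant_indices:
        if segments and idx == segments[-1][1] + 1:
            segments[-1][1] = idx          # extend the current segment
        else:
            segments.append([idx, idx])    # start a new segment
    return [(s, e) for s, e in segments]

def export_highlights(video_path, relevant_indices, out_path="highlights.mp4"):
    """Write all relevant frames, in order, to a single highlight video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    wanted = set(relevant_indices)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index in wanted:
            writer.write(frame)
        index += 1
    cap.release()
    writer.release()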
5 CASE STUDIES

In the case studies, we used video footage from two US Soccer Development Academy league games. Both videos were in the MPEG-4 (AAC, H.264 codec) format and had dimensions of 1280 by 720 pixels with a frame rate of 29.97 frames per second (fps).

(1) Video #1: Contains 15 minutes of a soccer game, with relatively clear weather and fair lighting.
(2) Video #2: Contains 10 minutes of a soccer game, under rainy conditions with darker lighting.

Both cameras were set a few metres above the ground so they were looking slightly down at the pitch. They were secured on tripods and the panning mechanism was used to keep track of the ball; thus the videos only show a portion of the pitch at one time, and never the entire field at any given time. All the cases were run on a MacBook Pro (13-inch, 2018) with a 2.3GHz Intel Core i5 CPU and 8GB of 2133MHz LPDDR3 memory.

Case 1: Detecting the keeper interactions on the right-hand side of the pitch on Video #1, with the reference area containing the goal

In order for the template-matching to work, we first need to choose reference area(s) that are unique and static in shape and colour for most of the video. For example, choosing goalkeepers themselves as references does not typically produce ideal results, as they move around while the shape of the object (i.e., a human) changes significantly. Also, in the videos that we used in these case studies, the keepers wore shirts with neon yellow and green colours, which often blended in with the green colour of the grass, potentially confusing the template-matching algorithm. Therefore, we chose a reference frame that shows the entire goal on the right-hand side of the pitch (shown in Fig. 8) and selected this goal as a reference area (shown in Fig. 9). After visual inspection of the data, we used a correlation metric of 0.98 as the threshold to create the resulting videos.

Figure 8: The reference frame that shows the entire goal on the right-hand side of the pitch.

Figure 9: The reference area that shows the goal on the right-hand side of the pitch.

The results are shown in Fig. 10. Most of the anticipated frames were detected as relevant, and the highest relevance metric was indeed detected at the reference frame, around the 131st second. However, the algorithm missed some frames that should have been considered relevant in terms of the plays in which the keeper was involved. For example, the frame at the bottom left in Fig. 10 shows the play at 129 seconds into the video. This play was right before the reference frame and the keeper is actually holding the ball. However, this and some of the other frames leading up to the reference frame were omitted from the relevant frames. This omission was in fact inevitable, as the reference area clearly shows the entire goal while this frame at the 129th second is missing the right side of the goal. One solution to include these frames is to lower the threshold, but that would also include irrelevant frames that appear earlier in the video. Therefore, while the template matching algorithm itself seems to have worked properly, we probably did not choose the most ideal reference frame/area(s).

Figure 10: The results of Case 1.
Case 2: Detecting the keeper interactions on the right-hand side of the pitch on Video #1, with the reference areas containing both the goal and a unique background area

Given the above results, in addition to the goal, we also experimented with an additional reference area, which shows a unique background area (shown in Fig. 11); thus we used both the goal and this unique background from the same reference frame to perform the template-matching.

Figure 11: The reference area that shows a unique background area.

As shown in Fig. 12, this additional reference area improved the performance in that it included those frames (e.g., the top left frame shown in Fig. 12) that did not show the entire goal but were parts of the play in which the keeper interacted with the ball. This result illustrates the importance of integrating the user input into these processes instead of relying on entirely automated approaches.

Figure 12: The results of Case 2 (using the threshold value of 0.946). The resulting video included frames even when the goal is not shown but they were parts of the relevant play.

Case 3: Detecting the keeper interactions on the right-hand side of the pitch on Video #2

We also tested with the video that has some visible noise caused by the rain. The reference area used in this test is shown in Fig. 13. As shown in Fig. 14, the expected relevant frames were still appropriately detected, even though the visibility conditions were not as ideal as in the first two cases.

Figure 13: The reference area used to identify the relevant frames for the keeper on the right-hand side of the pitch.

Figure 14: The results of Case 3 (using the threshold value of 0.978).

Case 4: Detecting the keeper interactions on the right-hand side of the pitch on down-sampled Video #2

For all the above cases, we used the original frame rate of 29.97 fps and ran the relevant frame search algorithm on all the frames. However, typical plays in soccer do not require such a high frame rate for our purposes, so we experimented with first down-sampling the original video to lower frame rates of 16 fps and 4 fps, in order to increase the efficiency of our approach. Once we identified the relevant frames, we used the original video to extract the corresponding segments of the video. As seen in Fig. 15, which compares the three different frame rates, the down-sampling produced almost identical curves of the relevance metrics. The results showed that this down-sampling significantly accelerated the process without affecting the overall results. A minimal sketch of this frame-skipping idea is shown below.
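The paper does not give code for the down-sampling. One simple way to approximate it (our own sketch, reusing the hypothetical frame_relevance helper from the earlier sketch) is to score only every k-th frame and keep the surviving indices expressed against the original frame rate:

import cv2

def relevant_frames_downsampled(video_path, templates_gray, threshold, step=7):
    """Score only every `step`-th frame (e.g., step=7 approximates ~4 fps
    for a 29.97 fps video) and return matching original frame indices."""
    relevant, index = [], 0
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if frame_relevance(gray, templates_gray) > threshold:
                relevant.append(index)   # index refers to the original video
        index += 1
    cap.release()
    return relevant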
To give a general idea of the processing time for the relevance metric calculations, Table 1 shows the calculation time for Video #2, which was ten minutes long. Based on this observation, the tool was able to calculate the relevant frames in about 1/16 of the length of the original video.

Figure 15: The relevance metrics of the same video, using (a) the original 29.97 fps frame rate, (b) 16 fps, and (c) 4 fps. They all produced similar curves as well as similar extracted videos.

Table 1: The comparison of the relevance metric calculation times based on the different frame rates.

Frame rate (fps)    Time (seconds)
Full (29.97)        207.88
16                  106.68
4                   38.09

Note that the calculation time itself can, of course, be improved further in a few ways: for example, by calculating the frame relevance metrics in parallel (the relevance metric for each frame does not depend on the other frames' results), by down-sampling the video resolution, or by skipping a number of pixels during the template matching instead of checking against every single pixel. A rough sketch of the parallel variant is given below.
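Since each frame's relevance metric is independent of every other frame's, the scoring loop parallelizes naturally. The following is only a rough sketch (the paper does not describe an implementation); it assumes the hypothetical frame_relevance helper from the earlier sketch and uses Python's multiprocessing, with each worker opening its own video handle and scoring a contiguous chunk of frame indices.

import cv2
from multiprocessing import Pool

# frame_relevance(): the per-frame scoring helper sketched earlier (assumed available here).

def score_chunk(args):
    """Worker: open the video independently and score frames [start, end)."""
    video_path, templates_gray, start, end = args
    cap = cv2.VideoCapture(video_path)
    # Seeking by frame index is approximate for some codecs; acceptable for a sketch.
    cap.set(cv2.CAP_PROP_POS_FRAMES, start)
    scores = []
    for index in range(start, end):
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        scores.append((index, frame_relevance(gray, templates_gray)))
    cap.release()
    return scores

def score_all_frames_parallel(video_path, templates_gray, workers=4):
    """Split the frame index range into chunks and score them in parallel."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.release()
    chunk = (total + workers - 1) // workers
    jobs = [(video_path, templates_gray, i, min(i + chunk, total))
            for i in range(0, total, chunk)]
    with Pool(workers) as pool:
        results = pool.map(score_chunk, jobs)
    return sorted(score for part in results for score in part)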
6 CONCLUSIONS AND FUTURE WORK

We proposed a new framework for semi-automatic segmentation of sport videos, and the UI tool that implements the proposed approach. Instead of relying on a fully automatic method, our approach consists of five fundamental steps that integrate the user's input and knowledge to help reduce potential errors. The provided UI tool allows the users to easily select a reference frame and reference areas that are used to detect relevant video frames that contain target player actions, and visualizes the relevance metrics to determine the optimal threshold value for the video extraction. The users can interactively investigate the corresponding video segments capitalizing on this visualization tool, thus likely spending less time searching for important plays in the videos. The case studies showed that our approach worked well with certain videos, but there are several factors that affected the performance of the approach, and we are currently working to improve it in multiple aspects.

One such aspect is further investigation to compare the template-matching and object detection algorithms. As discussed throughout the paper, there are some algorithms that may potentially improve the accuracy of the tool. Some algorithms may be more suitable for certain conditions such as videos with specific image backgrounds or lighting conditions. Some predictive models, such as those utilizing deep learning algorithms, may potentially be an option once we obtain enough video data to train the models. In this case, a potential approach is to first run a clustering algorithm on videos based on certain parameters such as background types and lighting conditions, and then create separate models for each of those types.

As well, the threshold is currently determined by the users before the system renders the video, but this may potentially be estimated, for example, by integrating known threshold estimation methods [8]. Finally, the framework itself can potentially be applied to other similar types of sports such as basketball, rugby, and field hockey. Our approach will of course need to be modified to accommodate differences between games. For example, one experiment that we conducted with a basketball video revealed that, while the tool did work relatively well at detecting plays near the hoop, since the game moves much faster than soccer, there should be some mechanism to include the frames leading up to, and after, those plays so as to show a more complete sequence of actions. Solutions to these new challenges posed by other types of sports will likely lead to further improvement of the tool in general.

7 ACKNOWLEDGEMENT

This research was supported by the National Research Council of Canada (NRC) Industrial Research Assistance Program and the Nova Scotia Business Inc. Productivity and Innovation Voucher Program. A special thanks to Jyothi Sethi, an M.Sc. student at Saint Mary's University, who contributed her skills to implement the UI tool.

REFERENCES

[1] M. Baillie and J. M. Jose. 2004. An Audio-Based Sports Video Segmentation and Event Detection Algorithm. In 2004 Conference on Computer Vision and Pattern Recognition Workshop. 110–110.
[2] G. Bradski. 2000. The OpenCV Library. Dr. Dobb's Journal of Software Tools (2000).
[3] Davit Buniatyan, Thomas Macrina, Dodam Ih, Jonathan Zung, and H. Sebastian Seung. 2017. Deep Learning Improves Template Matching by Normalized Cross Correlation. CoRR abs/1705.08593 (2017).
[4] Luigi Di Stefano, Stefano Mattoccia, and Federico Tombari. 2005. ZNCC-based Template Matching Using Bounded Partial Correlation. Pattern Recognition Letters 26, 14 (Oct. 2005), 2129–2134.
[5] A. Ekin, A. M. Tekalp, and R. Mehrotra. 2003. Automatic soccer video analysis and summarization. IEEE Transactions on Image Processing 12, 7 (July 2003), 796–807.
[6] Garry Gelade. 2014. Evaluating the ability of goalkeepers in English Premier League football. Journal of Quantitative Analysis in Sports 10, 2 (2014), 279–286.
[7] Sam Gregory. 2015. Goalkeepers' save percentage an unreliable stat. http://www.sportsnet.ca/soccer. Accessed: February 2019.
[8] Bruce E. Hansen. 2003. Sample Splitting and Threshold Estimation. Econometrica 68, 3 (2003), 575–603.
[9] T. Hashimoto, Y. Shirota, A. Iizawa, and H. Kitagawa. 2001. Digest making method based on turning point analysis. In Proceedings of the Second International Conference on Web Information Systems Engineering, Vol. 1. 83–91.
[10] M. B. Hisham, S. N. Yaakob, R. A. A. Raof, A. B. A. Nazren, and N. M. Wafi. 2015. Template Matching using Sum of Squared Difference and Normalized Cross Correlation. In 2015 IEEE Student Conference on Research and Development (SCOReD). 100–104.
[11] Matroid Inc. 2019. Matroid. https://www.matroid.com/. Accessed: February 2019.
[12] S. Johnsen and Ashley Tews. 2009. Real-Time Object Tracking and Classification Using a Static Camera. In IEEE International Conference on Robotics and Automation - Workshop on People Detection and Tracking.
[13] Munchurl Kim, J. G. Jeon, J. S. Kwak, M. H. Lee, and C. Ahn. 2001. Moving object segmentation in video sequences by user interaction and automatic object tracking. Image and Vision Computing 19, 5 (2001), 245–260.
[14] Rajshree Lande and R. M. Mulajkar. 2018. Moving Object Detection using Foreground Detection for Video Surveillance System. International Research Journal of Engineering and Technology 5, 6 (June 2018), 517–519.
[15] H. H. Le, T. Lertrusdachakul, T. Watanabe, and H. Yokota. 2008. Automatic Digest Generation by Extracting Important Scenes from the Content of Presentations. In 2008 19th International Workshop on Database and Expert Systems Applications. 590–594.
[16] S. Lefevre, E. Bouton, T. Brouard, and N. Vincent. 2003. A new way to use hidden Markov models for object tracking in video sequences. In Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429), Vol. 3. III–117.
[17] B. Li and M. Ibrahim Sezan. 2001. Event detection and summarization in sports video. In Proceedings IEEE Workshop on Content-Based Access of Image and Video Libraries (CBAIVL 2001). 132–138.
[18] Yi Liu and Yuan F. Zheng. 2005. Video object segmentation and tracking using ψ-learning classification. IEEE Transactions on Circuits and Systems for Video Technology 15 (2005), 885–899.
[19] David G. Lowe. 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60, 2 (Nov. 2004), 91–110.
[20] Wei-Lwun Lu and J. J. Little. 2006. Simultaneous Tracking and Action Recognition using the PCA-HOG Descriptor. In The 3rd Canadian Conference on Computer and Robot Vision (CRV'06). 6–6.
[21] Zheng Lu and Kristen Grauman. 2013. Story-Driven Summarization for Egocentric Video. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13). IEEE Computer Society, Washington, DC, USA, 2714–2721.
[22] Abdullah M. Moussa, M. I. Habib, and Rawya Rizk. 2015. FRoTeMa: Fast and Robust Template Matching. International Journal of Advanced Computer Science and Applications 6 (October 2015), 195–200.
[23] Joel Oberstone. 2009. Differentiating the Top English Premier League Football Clubs from the Rest of the Pack: Identifying the Keys to Success. Journal of Quantitative Analysis in Sports 5, 3 (2009), 1–29.
[24] Sander Oude Elberink and B. Kemboi. 2014. User-assisted Object Detection by Segment Based Similarity Measures in Mobile Laser Scanner Data. In ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Vol. XL-3. 239–246.
[25] T. Oyama and D. Nakao. 2015. Automatic extraction of specific scene from sports video. In 2015 10th Asian Control Conference (ASCC). 1–4.
[26] Vyacheslav Parshin and Liming Chen. 2004. Video Summarization Based on User-defined Constraints and Preferences. In Coupling Approaches, Coupling Media and Coupling Languages for Information Retrieval (RIAO '04). Paris, France, 18–24.
[27] J. N. Sarvaiya, S. Patnaik, and S. Bombaywala. 2009. Image Registration by Template Matching Using Normalized Cross-Correlation. In 2009 International Conference on Advances in Computing, Control, and Telecommunication Technologies. 819–822.
[28] Adel A. Sewisy, Khaled F. Hussain, and Amjad D. Suleiman. 2016. Speedup Video Segmentation via Dual Shot Boundary Detection (SDSBD). International Research Journal of Engineering and Technology (IRJET) 3, 12 (December 2016), 11–14.
[29] Sanjivani Shantaiya, Keshri Verma, and Kamal Mehta. 2013. A Survey on Approaches of Object Detection. International Journal of Computer Applications 65, 18 (March 2013), 14–20.
[30] Colin Trainor. 2014. Goalkeepers: How repeatable are shot saving performances? http://www.statsbomb.com/2014/10/goalkeepers-how-repeatable-are-shot-saving-performances. Accessed: February 2019.
[31] GKStopper. 2019. PROFESSIONAL GOALKEEPER SOFTWARE: The app that tracks keeper performance. http://gkstopper.com/. Accessed: February 2019.
[32] L. Vibha, Chetana Hegde, P. Shenoy, Venugopal K. R., and Lalit Patnaik. 2008. Dynamic Object Detection, Tracking and Counting in Video Streams for Multimedia Mining. IAENG International Journal of Computer Science (2008).
[33] Q. Zhang and K. N. Ngan. 2011. Segmentation and Tracking Multiple Objects Under Occlusion From Multiview Video. IEEE Transactions on Image Processing 20, 11 (Nov. 2011), 3308–3313.