=Paper=
{{Paper
|id=Vol-3762/572
|storemode=property
|title=Video Analytics for Volleyball: Preliminary Results and Future Prospects of the 5VREAL Project
|pdfUrl=https://ceur-ws.org/Vol-3762/572.pdf
|volume=Vol-3762
|authors=Andrea Rosani,Ivan Donadello,Michele Calvanese,Alessandro Torcinovich,Giuseppe Di Fatta,Marco Montali,Oswald Lanz
|dblpUrl=https://dblp.org/rec/conf/ital-ia/RosaniDCTFML24
}}
==Video Analytics for Volleyball: Preliminary Results and Future Prospects of the 5VREAL Project==
Andrea Rosani, Ivan Donadello, Michele Calvanese*, Alessandro Torcinovich,
Giuseppe Di Fatta, Marco Montali and Oswald Lanz
Libera Università di Bolzano, Piazza Università 1, Bozen-Bolzano, 39100, Italy
Abstract
This paper introduces a real-time action recognition and tactical-behavior mining system
designed specifically for volleyball games. The system aims to support data augmentation, video
annotation and KPI extraction by accurately identifying the actions and sequential action
patterns performed during volleyball matches. Leveraging advanced computer vision
techniques, the system aims to automatically detect and recognize player actions and group
actions in real time. Process Mining techniques are then used to extract tactical behaviors, in the
form of temporal relations among player actions. By providing precise annotations, the system
offers an instrument for volleyball game analytics and tactical analysis. This
paper outlines the architecture and key components of the real-time action recognition and
tactical-behavior mining system and presents some preliminary results on the performance of
the proposed model.
Keywords
Video action recognition, data augmentation, video annotation, process mining, sports
Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* M. Calvanese contributed with work done during his Master Thesis project at UPC Barcelona with Prof. Carlos Andujar Gran.
ORCID: 0009-0008-2622-6776 (A. Rosani); 000-0002-0701-5729 (I. Donadello); 0009-0005-4103-0147 (M. Calvanese); 0000-0001-8110-1791 (A. Torcinovich); 0000-0003-3096-2844 (G. Di Fatta); 0000-0002-8021-3430 (M. Montali); 0000-0003-4793-4276 (O. Lanz)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
1. Introduction
Over the past decade, action recognition in professional sport activities has rapidly gained popularity as a tool for a variety of tasks such as player performance analytics, computer-aided game refereeing, and the like. In response to this interest, several action recognition systems have been devised for sports such as football, basketball, rugby, etc.
In this context, this paper presents an action recognition system for volleyball game analysis. The preliminary results obtained during the activity focus on the detection of actions, events, and tactical behaviors in volleyball, with the final objective of providing a reliable AI-powered data augmentation system that can be used for the TV broadcasting of volleyball games in a real-time scenario, as well as for off-line analytics activities, starting from the video collected by a multi-view source and shared using 5G transmission.
The document is structured into several sections that outline in detail the study process and the developments obtained. First, a review of some particularly relevant works in the specific field is proposed. Then, methods and algorithms are described, along with some results of preliminary experiments on a public dataset [7].
1.1. Context: the 5VREAL Project
This paper describes the preliminary results obtained during the activity related to the project 5VREAL – 5G Volley Reality Experience & Analytics Live, focused on the study and implementation of a system for the acquisition, analysis and transmission of video and analytics in the context of volleyball games and training sessions. The project aims to create a scalable solution, which can be used at all levels of competition, professional and amateur.
Two use cases are developed:
• Fun Engagement: This use case aims to use artificial intelligence algorithms to enrich the spectator’s experience while watching the match with augmented reality information displayed in real time on the broadcast videos.
• Coach: Use of the game & ‘rhythm’ for technical staff. After the game, the technical staff or the coach directly receives indications on positions, speed, trajectories, time intervals between touches and higher-level semantic information about the tactical behaviors of the team, which can support a more in-depth technical and tactical analysis.
The involvement in the project of industrial partners operating in the media production sector will enable a real application scenario to test the performance of the proposed solution. The project is funded by the Italian Ministry of Enterprises and Made in Italy (MIMIT) under the MIMIT FSC 2014-2020: Tecnologie 5G. Progetti di sperimentazione e ricerca – Piano di Sviluppo e Coesione 2014-2020.
2. State of art in action recognition and tactical behavior for volley
The task of action/pose estimation involves analyzing video content to track one or more persons of interest and identify their key anatomical features, typically defined as keypoints [14], [26]. When multiple actors interact, the task is usually referred to as Group Activity Recognition (GAR) [18], [19], [22].
GAR algorithms differ in how they model spatial and temporal information in videos. Some earlier approaches apply recurrent models: [7] develops a hierarchical model based on two long short-term memory (LSTM) models, [13] proposes a recurrent neural network (RNN) model with attention mechanisms and semantic graphs, [3] generates a map of candidate regions of interest and uses an RNN architecture for temporal processing, and [24] adopts a top-down approach using Gated Recurrent Units.
Other works focus on convolutional mechanisms: [2] develops a convolutional relational machine for GAR, and [19] works on individual poses using one-dimensional convolutional neural networks.
Newer models like graph-based networks and Transformers are also employed: [25] uses a graph-based model for spatio-temporal relationships, [10] designs a descriptor for crowded scenarios, and [12] proposes a Transformer-based solution for processing spatial and temporal information.
To recognize tactical behaviors, techniques like sequence mining algorithms and Inductive Logic Programming are used ([21], [19], [23]). Works in this field include [9] and [11] for predicting complex events from football matches using Answer Set Programming and Subgraph Discovery. In our work, temporal pattern mining algorithms based on Linear Temporal Logics will be used, offering a different approach compared to the mentioned works.
3. Methodology and algorithms
3.1. General architecture of the system
The AI block consists of a set of algorithms required to a) identify the position and trajectory of the ball, b) identify the position of individual players, and c) detect and identify actions performed within a specific timeframe.
The acquisition of images for AI occurs through three iPhone 14 Pro devices mounted on tripods with calibrated cameras, connected to a backend via 5G, producing synchronized SRT (Secure Reliable Transport) compressed video streams.
Figure 1: Overview of the architecture of the volleyball action recognition system.
The ball localization module starts the processing by producing a continuous data stream of the ball trajectory. When a change in its direction is detected, the player tracking and action detection modules are activated (Figure 1). This generates an output of the events that occurred in the selected timeframe. In the following, we analyze in detail the different steps. 3D ball tracking is described by a project partner in another submission to Ital-IA 2024.
3.2. Ball trajectories change detection
The general scheme for ball trajectory analysis can be subdivided into the following steps (Figure 2):
1. Identification of possible candidate ball positions.
2. Incremental interpolation of candidates with parabolic trajectories, producing a parabola for each frame.
3. Linking of trajectories from which to derive the motion of the ball.
4. Detection of trigger events when the ball undergoes an upward acceleration, such as a player touch or a bounce on the floor.
The algorithm, originally proposed in [5], requires as input the positions of the ball at each time step, which can be obtained with a ball tracking system [14]. The path of the ball is modelled by a piecewise parabolic trajectory. Initially, seed triplets are identified within a threshold distance (𝑟).
These triplets serve as initial anchors for parabolic fitting. Due to false positives, multiple seed triplets per frame may exist. Each triplet is used to fit a parabola, and candidate detections close to the
estimated position are added to a set of supporting points.
Figure 2: Ball trajectories analysis and trigger event detection [5].
The temporally furthest points within the support set are used to fit a new parabola. This iterative process continues until the set of supporting points ceases to grow. Parabolas with upward-pointing acceleration vectors are excluded as they violate physical constraints.
Figure 3: Action and Group Activity Recognition (images from [7]). The variation in the ball trajectory identifies an interaction that triggers the event.
To ensure a unique parabola per frame, trajectory distances are computed and used to construct a weighted graph. Dijkstra's algorithm [6] identifies the optimal path through this graph, yielding the final sequence of parabolas describing the ball's path.
Considering that the action mainly occurs around the ball's position, the proposed solution allows for detecting changes in the direction of the ball due to gameplay interactions. This trajectory variation triggers an analysis of the activities performed near the contact point, which activates the subsequent phase of recognizing the actions of individual players and teams (Figure 3).
3.3. Individual player action recognition
In the rapidly evolving field of action recognition, many datasets, structures, and architectures have been introduced to address the challenges and complexities associated with understanding human actions in different environments [4]. These studies focus on extracting meaningful information from videos, by detecting and recognizing what a subject is doing [15], [16], [17].
The posture detection occurs within the video stream, in the player's bounding box, that is, the area of interest of an object (the player, in this case) tracked in each video frame. The detection of the posture uses pose estimation technologies based on machine learning models [24] that identify key anatomical features of players, such as joints, extremities, center of mass, etc., commonly referred to as keypoints [8]. In the case of a volleyball player, the bounding box is used to locate the player's position within the video frame and subsequently extract keypoints on the players' bodies (Figures 4 and 5).
Figure 4: Example annotation from the Volleyball dataset showing the bounding box of each player divided by team (using different colors) and the action performed ("Left spike"). (Image from [7])
Starting from this information, it is possible to perform action recognition, as demonstrated effectively in [16], [17], which will be used as reference in the project for this specific task.
3.4. Team activity recognition
The challenge of Group Activity Recognition (GAR) requires addressing two main aspects. First, it demands a compositional understanding of the scene. Due to the relatively high number of people present in the scene, it is challenging to learn meaningful representations for GAR over the entire area. Since group activities often involve subgroups of actors and scene objects, the final label of the action depends on a compositional understanding of these entities. Secondly, GAR benefits from relational reasoning on scene elements to understand the relative importance of entities and their interactions [26].
4. Preliminary results
In the following, we present some preliminary results obtained using state-of-the-art techniques on publicly available datasets.
4.1. Dataset
The Volleyball dataset [7] represents a significant resource in the context of sports action recognition, specifically on volleyball. Although originally designed for athlete action recognition, the dataset
has been extended to include the task of 2D ball detection in the image. The dataset comprises a total of 4830 frames from 55 videos, offering a wide variety of actions and activities to analyze (Figure 4). In the dataset, there are nine annotations for individual player actions and eight group activities, detailed in Table 1.
Table 1: Individual player action classes and group activity classes, with the number of instances of each.
Action class   No. of instances   Group activity class   No. of instances
Waiting        3601               Right set              644
Setting        1332               Right spike            623
Digging        2333               Right pass             801
Falling        1241               Right winpoint         295
Spiking        1216               Left winpoint          367
Blocking       2458               Left pass              826
Jumping        341                Left spike             642
Moving         5121               Left set               633
Standing       38696
4.2. Group activity recognition
GAR is performed at different levels. Initially, the keypoints of the various players are extracted. Based on these, the action each player is performing is estimated and then related to the predicted level of person-to-person and person-to-group interaction.
4.2.1. Trigger event identification and GAR
The situation that activates the GAR mechanism is represented by the trigger, identified with the change of the ball direction (Figure 5).
Figure 5: Detailed schema for action and group activity recognition.
In Figure 6 we present some frames from [7], processed using the proposed algorithms, detailed in the following section, allowing for a comprehensive visualization of the keypoints of the various players combined with the trajectories of the ball.
Figure 6: Example 2D application of player identification and identification of ball trajectory changes ("trigger"). Keypoints can be observed on each player's silhouette, along with the corresponding arc of the ball trajectory.
4.2.2. Hierarchy of semantic events for GAR
Taking inspiration from the approach proposed in [26], composite learning of entities in the video and relational reasoning on these entities is established.
Figure 7: Our results on the Volleyball dataset considering the Olympic Split [7], [26]. In the first confusion matrix we represent GAR, in the second one the single player activities.
As in human perception, objects are represented at various granularities and their interactions are reasoned about, so as to transform sensory signals into high-level knowledge. GAR is addressed by modeling a video as a set of tokens representing multi-scale semantic concepts present in the video, thus allowing the described method to be easily adapted to understand any video with multi-actor multi-object interactions.
In the specific case of volleyball, the actors are represented by the players, while the object is represented by the ball. These tokens include keypoints, people, person-to-person interactions, person-to-group interactions, and object interactions.
The performance of this analysis, compared to previous techniques based on standard RGB analysis
(i.e., considering the entire images and not just the keypoints), shows significant accuracy (Figure 7).
4.3. Tactical behavior
By tactical behavior, we mean a set of temporal relationships among volleyball actions that can lead to an outcome of particular interest, such as scoring a point. In what follows we provide a conceptual framework to formally define tactical behaviors and use Process Mining (PM) techniques for mining tactical behaviors from annotated volleyball matches.
4.3.1. A conceptual model for tactical behaviors
A tactical behavior is a set of temporal relationships over events in a volleyball match. An event is the main action of a player on the ball, which has a start time, an end time, a set of players involved with information related to their pose, their bounding boxes, their unique identifiers, the quality of the action and the position of the ball. For example:
• A dunk by a player from area A1 is immediately followed by a point scored.
• A reception (with low quality) of a player is immediately followed by a point.
Our conceptual model for a volleyball event is shown in Figure 8.
Figure 8: The conceptual model for volleyball events.
A volleyball match is therefore a sequence of annotations of volleyball events in chronological order. Such events are annotated with the use of the computer vision techniques above or provided by scoutmen.
4.3.2. Process Mining for tactical behaviors
Process Mining [20] embraces Data Mining and Knowledge Representation and focuses on the analysis and improvement of business processes based on data collected from information systems. One of its key features is the availability of tools for mining information from temporal discrete data. We analyzed the matches of the Volleyball dataset (converted into a suitable format) with the Process Mining RuM (Rule Mining Made Simple) tool [1] to mine tactical behaviors.
RuM extracts temporal relations among actions of volleyball events through a list of templates defined with Linear Temporal Logic over finite traces (LTLf), one of the reference logics in the field [28]. Examples of such templates are the Chain Response between actions A and B, which means that action A must be immediately followed by action B, and the Alternate Precedence between A and B, which means that action B must be preceded by action A without any other occurrence of B in between (see [27], Table 2). In addition, RuM allows the selection of a numeric support that indicates the percentage of occurrence of a particular template in the set of matches and that can be used as a key process indicator. The 55 Volleyball matches were analyzed in less than 10 seconds, a suitable performance for an offline scenario. With a support of 20%, we obtained 50 tactical behaviors expressed using LTLf templates, automatically translated by the tool into natural language sentences for better human comprehension. An example of a mined tactical behavior is that in 47.73% of the matches, each jump (for a block) is preceded by a dunk without any other jump in between. In addition, RuM also allows us to link the tactical behaviors of actions to the other concepts of the above conceptual scheme.
Figure 9: The conformance checking analysis of predefined tactical behaviors.
RuM also supports the manual definition of tactical behaviors and the analysis of the matches according to such predefined behaviors. This task is called conformance checking and, as two examples of tactical behaviors, we defined that a jump is followed by a spike and that a spike is followed by a block. Figure 9 shows the results of the RuM conformance checking.
Each behavior is analyzed for each match and, on the right, the actions of match 5 are shown and highlighted in green if they conform to the tactical behavior, in red otherwise.
Acknowledgements
This work is supported by 5VREAL – 5G VOLLEY REALITY EXPERIENCE & ANALYTICS LIVE, CUP I53C23001340005, funded by the Italian Ministry of Enterprises and Made in Italy.
References
[1] Alman, A., Donadello, I., Maggi, F. M., Montali, M.: Declarative Process Mining for Software Processes: The RuM Toolkit and the Declare4Py Python Library. In Int. Conf. on Product-Focused Sw Process Improvement (2023).
[2] Azar S.M., Atigh M.G., Nickabadi A., Alahi A.: Convolutional relational machine for group activity recognition. In: IEEE CVPR. (2019)
[3] Bagautdinov, T., Alahi, A., Fleuret, F., Fua, P., Savarese, S.: Social scene understanding: End-to-end multi-person action localization and collective activity recognition. IEEE CVPR. (2017)
[4] Camarena F, Gonzalez-Mendoza M, Chang L, Cuevas-Ascencio R: An Overview of the Vision-Based Human Action Recognition Field, Math. Comput. Appl. 2023
[5] Calvanese M: Ball tracking in Padel Videos using Convolutional Neural Networks. [Laurea magistrale], Università di Bologna, Corso di Studio in Artificial intelligence, 2023
[6] Dijkstra E.W: A note on two problems in connexion with graphs. Numerische mathematik, 1959
[7] Ibrahim MS, Muralidharan S, Deng Z, Vahdat A, Mori G. A hierarchical deep temporal model for group activity recognition. CVPR, 2016
[8] Jiang T, Lu P, Zhang L, Ma N, Han R, Lyu C, Li Y, Chen K: RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose. ArXiv, 2023
[9] Khan, A., Bozzato, L., Serafini, L., Lazzerini, B. (2019). Visual reasoning on complex events in soccer videos using answer set programming. In GCAI 2019.
[10] Li J, Wang C, Zhu H, Mao Y, Fang H, Lu C.: CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark, CVPR, 2019
[11] Meerhoff, L. A., Goes, F. R., De Leeuw, A. W., Knobbe, A. (2020). Exploring successful team tactics in soccer tracking data. In Machine Learning and Knowledge Discovery in Databases: Int. Workshops of ECML PKDD 2019.
[12] Nabi, M., Bue, A., Murino, V.: Temporal poselets for collective activity detection and recognition. In: IEEE CVPR. pp. 500–507 (2013)
[13] Qi, M., Qin, J., Li, A., Wang, Y., Luo, J., Van Gool, L.: stagnet: An attentive semantic rnn for group activity recognition. In: Proc. of the ECCV. (2018)
[14] Rahimian P, Toka L: Optical tracking in team sports: A survey on player and ball tracking methods in soccer and other team sports. Journal of Quantitative Analysis in Sports, 2022
[15] Sudhakaran S, Escalera S, Lanz O: Gate-Shift Networks for Video Action Recognition, IEEE CVPR 2020
[16] Sudhakaran S, Escalera S, Lanz O: Gate-Shift-Fuse for Video Action Recognition, IEEE TPAMI, 2023
[17] Takahashi M, Ikeya K, Kano M, Ookubo H, Mishina T: Robust Volleyball Tracking System Using Multi-View Cameras. ICPR, 2016
[18] Thilakarathne H., Nibali A., He Z., Morgan S.: Pose is all you need: The pose only group activity recognition system (pogars). arXiv preprint arXiv:2108.04186 (2021)
[19] Van Haaren, J., Ben Shitrit, H., Davis, J., Fua, P. (2016, August). Analyzing volleyball match data from the 2014 world championships using machine learning techniques. In Proceedings of the 22nd ACM SIGKDD (pp. 627-634).
[20] Van Der Aalst, W. (2016). Data science in action (pp. 3-23). Springer Berlin Heidelberg.
[21] Wenninger, S., Link, D., Lames, M. (2019). Data mining in elite beach volleyball–detecting tactical patterns using market basket analysis. IJCSS, 18(2), 1-19.
[22] Wu L.F., Wang Q., Jian M., Qiao Y., Zhao, B.X.: A comprehensive review of group activity recognition in videos. International Journal of Automation and Computing pp. 1–17 (2021)
[23] Xia, H., Tracy, R., Zhao, Y., Fraisse, E., Wang, Y. F., Petzold, L. (2022, November). VREN: volleyball rally dataset with expression notation language. In 2022 IEEE ICKG (pp. 337-346).
[24] Xu D., Fu H., Wu L., Jian M., Wang D., Liu X.: Group activity recognition by using effective multiple modality relation representation with temporal-spatial attention. IEEE Access 8, (2020)
[25] Yan R., Xie L., Tang J., Shu X., Tian Q.: Higcin: hierarchical graph-based cross inference network for group activity recognition. IEEE TPAMI (2020)
[26] Zhou H, Kadav A, Shamsian A, Geng S, Lai F, Zhao L, Liu T, Kapadia M, Graf HP: COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality. ECCV, 2022
[27] Donadello, I., Di Francescomarino, C., Maggi, F. M., Ricci, F., Shikhizada, A. Outcome-oriented prescriptive process monitoring based on temporal logic patterns. Engineering Applications of Artificial Intelligence (2023).
[28] Claudio Di Ciccio, Marco Montali: Declarative Process Specifications: Reasoning, Discovery, Monitoring. Process Mining Handbook 2022.
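Supplementary sketch for Section 3.2. The parabola fitting and trigger detection described there can be illustrated with a few lines of Python; this is a simplified illustration under stated assumptions, not the algorithm of [5]. It assumes a single clean ball track (no false positives, so no seed triplets or graph linking), a vertical coordinate `z` that points upward, and hypothetical function names. A parabola is rejected when its fitted acceleration points upward, and a trigger is reported where the discrete vertical acceleration becomes positive, as happens at a touch or a bounce.

```python
import numpy as np


def fit_parabola(t, z):
    """Least-squares fit of z(t) = a*t**2 + b*t + c; returns (a, b, c)."""
    a, b, c = np.polyfit(t, z, 2)
    return a, b, c


def is_physical(a):
    """With z pointing up, free flight has downward acceleration (a < 0)."""
    return a < 0


def detect_triggers(t, z, min_accel=0.0):
    """Indices of samples whose discrete vertical acceleration is upward."""
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float)
    v = np.diff(z) / np.diff(t)        # velocity between consecutive samples
    t_mid = 0.5 * (t[1:] + t[:-1])     # time stamps of those velocities
    acc = np.diff(v) / np.diff(t_mid)  # acceleration at interior samples
    return [i + 1 for i, a in enumerate(acc) if a > min_accel]


# Synthetic track (assumed data): the ball falls, bounces at t = 1.0 s,
# then rises again. Both segments are parabolas with a = -4.9.
k = np.arange(21)
t = k * 0.1
before = 2.0 + 3.0 * t - 4.9 * t**2
after = 0.1 + 6.0 * (t - 1.0) - 4.9 * (t - 1.0) ** 2
z = np.where(k <= 10, before, after)

print(detect_triggers(t, z))                    # sample index of the bounce
print(is_physical(fit_parabola(t[:10], z[:10])[0]))     # free-flight segment
print(is_physical(fit_parabola(t[7:14], z[7:14])[0]))   # window across bounce
```

In image coordinates, where the vertical axis usually points down, the sign tests would be reversed. On noisy detections a positive `min_accel` threshold would be needed to avoid spurious triggers.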