=Paper=
{{Paper
|id=Vol-3762/572
|storemode=property
|title=Video Analytics for Volleyball: Preliminary Results and Future Prospects of the 5VREAL Project
|pdfUrl=https://ceur-ws.org/Vol-3762/572.pdf
|volume=Vol-3762
|authors=Andrea Rosani,Ivan Donadello,Michele Calvanese,Alessandro Torcinovich,Giuseppe Di Fatta,Marco Montali,Oswald Lanz
|dblpUrl=https://dblp.org/rec/conf/ital-ia/RosaniDCTFML24
}}
==Video Analytics for Volleyball: Preliminary Results and Future Prospects of the 5VREAL Project==
<pdf width="1500px">https://ceur-ws.org/Vol-3762/572.pdf</pdf>
<pre>
                                Video Analytics for Volleyball: Preliminary Results and
                                Future Prospects of the 5VREAL Project
                                Andrea Rosani, Ivan Donadello, Michele Calvanese*, Alessandro Torcinovich,
                                Giuseppe Di Fatta, Marco Montali and Oswald Lanz
                                Libera Università di Bolzano, Piazza Università 1, Bozen-Bolzano, 39100, Italy


                                                   Abstract
                                                   This paper introduces a real-time action recognition and tactical-behavior mining system
                                                   designed specifically for volleyball games. The system aims to support data augmentation, video
                                                   annotation and KPI extraction by accurately identifying the actions, and the sequential
                                                   patterns of actions, performed during volleyball matches. Leveraging advanced computer vision
                                                   techniques, the system aims to automatically detect and recognize player actions and group
                                                   actions in real time. Then, Process Mining techniques are used to extract tactical behaviors, in the
                                                   form of temporal relations, among player actions. By providing precise annotations, the system
                                                   offers an instrument for volleyball game analytics and tactical analysis. This
                                                   paper outlines the architecture and key components of the real-time action recognition and
                                                   tactical-behavior mining system and presents some preliminary results on the performance of
                                                   the proposed model.
                                                   Keywords
                                                   Video action recognition, data augmentation, video annotation, process mining, sports


                                1. Introduction                                                     developments obtained. First, a review of some
                                Over the past decade, action recognition in                         particularly relevant works in the specific field is
                                professional sport activities has rapidly gained                    proposed. Then, methods and algorithms are
                                popularity as a tool for a variety of tasks such as player          described, along with some results of preliminary
                                                                                                    experiments on a public dataset [7].
                                performance analytics, computer-aided game
                                refereeing, and the like. In response to this interest,             1.1. Context: the 5VREAL Project
                                several action recognition systems have been devised                This paper describes the preliminary results obtained
                                in the context of several sports, such as football,                 during the activity related to the project 5VREAL – 5G
                                basketball, rugby, etc.                                             Volley Reality Experience & Analytics Live, focused on
                                    In this context, this paper presents an action                  the study and implementation of a system for the
                                recognition system for volleyball game analysis. The                acquisition, analysis and transmission of video and
                                preliminary results obtained during the activity focus              analytics in the context of volleyball games and
                                on the detection of actions, events, and tactical                   training sessions. The project aims to create a scalable
                                behaviors in volleyball with the final objective of                 solution, which can be used at all levels of
                                providing a reliable AI-powered data augmentation                   competition, professional and amateur.
                                system that can be used for the TV broadcasting of                  Two use cases are developed:
                                volley games in a real time scenario, as well as for off-                •    Fun Engagement: This use case aims to use
                                line analytics activities, starting from the video                            artificial intelligence algorithms to enrich
                                collected by a multi view source and shared using 5G                          the spectator’s experience while watching
                                transmission.                                                                 the match with augmented reality
                                    The document is structured into several sections                          information displayed in real time on the
                                that outline in detail the study process and the                              broadcasted videos.


                                Ital-IA 2024: 4th National Conference on Artificial Intelligence,         0009-0008-2622-6776 (A. Rosani); 000-0002-0701-5729
                                organized by CINI, May 29-30, 2024, Naples, Italy                     (I. Donadello); 0009-0005-4103-0147 (M. Calvanese);
                                ∗ M. Calvanese contributed with work done during his Master               0000-0001-8110-1791 (A. Torcinovich); 0000-0003-3096-
                                Thesis project at UPC Barcelona with Prof. Carlos Andujar Gran.       2844 (Di Fatta), 0000-0002-8021-3430 (M. Montali), 0000-
                                                                                                      0003-4793-4276 (O. Lanz)
                                                                                                                  © 2024 Copyright for this paper by its authors. Use permitted under
                                                                                                                  Creative Commons License Attribution 4.0 International (CC BY 4.0).


    •     Coach: Use of the game & ‘rhythm’ for            3. Methodology and algorithms
          technical staff. After the game, the technical
          staff or directly the coach receives             3.1. General architecture of the system
          indications on positions, speed, trajectories,   The AI block consists of a set of algorithms required to
          time intervals between touches and higher-       a) identify the position and trajectory of the ball, b)
          level semantic information about the tactical    identify the position of individual players, and c)
          behaviors of the team that can favor a more      detect and identify actions performed within a
          in-depth technical and tactical analysis.        specific timeframe.
    The involvement in the project of industrial               The acquisition of images for AI occurs through
partners operating in the media production sector          three iPhone 14 Pro devices mounted on tripods with
will enable a real application scenario to test the        calibrated cameras, connected to a backend via 5G,
performances of the proposed solution. The project is
                                                           producing synchronized SRT (Secure Reliable
funded by the Italian Ministry of Enterprises and
Made in Italy, MIMIT under the MIMIT FSC 2014-2020:        Transport) compressed video streams.
Tecnologie 5G. Progetti di sperimentazione e ricerca –
Piano di Sviluppo e Coesione 2014-2020.

2. State of the art in action recognition
    and tactical behavior for volleyball
The task of action/pose estimation involves analyzing
video content to track one or more persons of interest
and identify their key anatomical features, typically
                                                           Figure 1: Overview of the architecture of the
defined as keypoints [14], [26]. When multiple actors
                                                           volleyball action recognition system.
interact, the task is usually referred to as Group Activity    The ball localization module starts the processing
Recognition (GAR) [18], [19], [22].                        by producing a continuous data stream of the ball
    GAR algorithms differ in how they model spatial        trajectory. When a change in its direction is detected,
and temporal information in videos. Some dated             the player tracking and action detection modules are
approaches apply recurrent models: [7] develops a          activated (Figure 1). This generates an output of the
hierarchical model based on two long short-term            events that occurred in the selected timeframe. In the
memory (LSTM) models, [13] proposes a recurrent            following, we analyze in detail the different steps. 3D
neural network (RNN) model with attention                  Ball tracking is described by a project partner in
                                                           another submission to Ital-IA 2024.
mechanisms and semantic graphs, [3] generates a
map of candidate regions of interest and uses an RNN       3.2. Ball trajectories change detection
architecture for temporal processing, and [24] adopts      The general scheme for ball trajectory analysis can be
                                                           subdivided into the following steps (Figure 2):
a top-down approach using Gated Recurrent Unit.
                                                                1. Identification of possible candidate ball
    Other works focus on convolutional mechanisms:
                                                                     positions.
[2] develops a convolutional relational machine for             2. Incremental interpolation of candidates with
GAR, [19] works on individual poses using one-                       parabolic trajectories, producing a parabola
dimensional convolutional neural networks.                           for each frame.
    Newer models like graph-based networks and                  3. Linking of trajectories from which to derive
Transformers are also employed: [25] uses a graph-                   the motion of the ball.
based model for spatio-temporal relationships,                  4. Detection of trigger events when the ball
designs a descriptor for crowded scenarios, and [10]                 undergoes an upward acceleration, such as a
[12] proposes a Transformer-based solution for                       player touching or a bounce on the floor.
processing spatial and temporal information.                    The algorithm, originally proposed in [5], requires
    To recognize tactical behaviors, techniques like       as input the positions of the ball at each time step, which
sequence mining algorithms and Inductive Logic             can be obtained from a ball tracking system [14].
Programming are used ([21], [19], [23]). Works in this     The path of the ball is modelled by a piecewise
field include [9] and [11] for predicting complex          parabolic trajectory. Initially, seed triplets are
events from football matches using Answer Set              identified within a threshold distance (𝑟).
Programming and Subgraph Discovery. In our work,                These triplets serve as initial anchors for parabolic
temporal pattern mining algorithms based on Linear         fitting. Due to false positives, multiple seed triplets
Temporal Logics will be used, offering a different         per frame may exist. Each triplet is used to fit a
approach compared to the mentioned works.                  parabola, and candidate detections close to the
estimated position are added to a set of supporting       actions in different environments [4]. These studies
points.                                                   focus on extracting meaningful information from
                                                          videos, by detecting and recognizing what a subject is
                                                          doing [15], [16], [17].
                                                              The posture detection occurs within the video
                                                          stream, in the player's bounding box, that is the area
                                                          of interest of an object (the player, in this case)
                                                          tracked in each video frame. The detection of the
                                                          posture uses pose estimation technologies based on
                                                          machine learning models [24], that identify key
                                                          anatomical features of players, such as joints,
                                                          extremities, center of mass, etc., commonly referred to
                                                          as keypoints [8]. In the case of a volleyball player, the
                                                          bounding box is used to locate the player's position
                                                          within the video frame and subsequently extract
Figure 2: Ball trajectories analysis and trigger event    keypoints on the players' bodies (Figures 4 and 5).
detection [5].
   The temporally furthest points within the support
set are used to fit a new parabola. This iterative
process continues until the set of supporting points
ceases to grow. Parabolas with upward-pointing
acceleration vectors are excluded as they violate
physical constraints.
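
The parabolic fitting described above can be illustrated with a minimal sketch. It is not the implementation of [5]: it assumes one-dimensional ball heights measured on an axis that points upward (in image coordinates, where the vertical axis points downward, the sign test would flip), it picks the first, middle and last support points as the three points defining each new parabola, and the names `parabola_through`, `grow_support`, `is_physical` and the tolerance `tol` are hypothetical.

```python
def parabola_through(p0, p1, p2):
    """Coefficients (a, b, c) of h(t) = a*t^2 + b*t + c through three (t, h) points."""
    (t0, h0), (t1, h1), (t2, h2) = p0, p1, p2
    # Lagrange interpolation, expanded into monomial coefficients.
    d0 = h0 / ((t0 - t1) * (t0 - t2))
    d1 = h1 / ((t1 - t0) * (t1 - t2))
    d2 = h2 / ((t2 - t0) * (t2 - t1))
    a = d0 + d1 + d2
    b = -(d0 * (t1 + t2) + d1 * (t0 + t2) + d2 * (t0 + t1))
    c = d0 * t1 * t2 + d1 * t0 * t2 + d2 * t0 * t1
    return a, b, c


def grow_support(points, seed, tol):
    """Grow the support set of a seed triplet until it stops changing.

    points: list of (time, height) candidate detections, sorted by time.
    seed:   three indices into `points` forming the seed triplet.
    tol:    maximum distance between a detection and the fitted parabola.
    """
    support = sorted(seed)
    while True:
        # Refit on the temporally furthest support points (plus a middle one).
        first, last = support[0], support[-1]
        middle = support[len(support) // 2]
        a, b, c = parabola_through(points[first], points[middle], points[last])
        grown = sorted(
            i for i, (t, h) in enumerate(points)
            if i in support or abs(a * t * t + b * t + c - h) <= tol
        )
        if grown == support:
            return (a, b, c), support
        support = grown


def is_physical(coeffs):
    """Reject parabolas whose acceleration points upward (height axis up)."""
    return coeffs[0] < 0
```

Refitting on the extreme points keeps each iteration cheap; a least-squares fit over the whole support set would be a reasonable alternative when detections are noisy.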

                                                          Figure 4: Example annotation from the Volleyball
                                                          dataset showing the bounding box of each player
                                                          divided by team (using different colors) and the action
                                                          performed ("Left spike"). (Image from [7])
                                                              Starting from this information, it is possible to
                                                          perform action recognition, as demonstrated
                                                          effectively in [16], [17], which will be used as reference
                                                          in the project for this specific task.
Figure 3: Action and Group Activity Recognition           3.4. Team activity recognition
(images from [7]). The variation in the ball trajectory   The challenge of Group Activity Recognition (GAR)
identifies an interaction that triggers the event.        requires addressing two main aspects. First, it
    To ensure a unique parabola per frame, trajectory     demands a compositional understanding of the scene.
distances are computed and used to construct a            Due to the relatively high number of people present in
                                                          the scene, it's challenging to learn meaningful
weighted graph. Dijkstra's algorithm [6] identifies the
                                                          representations for GAR over the entire area. Since
optimal path through this graph, yielding the final       group activities often involve subgroups of actors and
sequence of parabolas describing the ball's path.         scene objects, the final label of the action depends on
    Considering that the action mainly occurs around      a compositional understanding of these entities.
the ball's position, the proposed solution allows for     Secondly, GAR benefits from relational reasoning on
detecting changes in the direction of the ball due to     scene elements to understand the relative importance
gameplay interactions. This trajectory variation          of entities and their interactions [26].
triggers an analysis mechanism of the activities          4. Preliminary results
performed near the contact point to activate the          In the following, we present some preliminary results
subsequent phase of recognizing the actions of            obtained using state-of-the-art techniques on publicly
individual players and teams (Figure 3).                  available datasets.
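
The linking step of Section 3.2, in which a weighted graph over candidate parabolas is searched with Dijkstra's algorithm [6], can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: the graph is assumed to be layered by frame, with edges only between candidates of consecutive frames, and the `distance` function between two parabolas is a hypothetical placeholder supplied by the caller, since the text does not specify the trajectory distance.

```python
import heapq


def link_parabolas(candidates, distance):
    """Pick one candidate parabola per frame with minimum total linking cost.

    candidates: list over frames; candidates[f] is the list of candidate
                parabolas for frame f (any objects accepted by `distance`).
    distance:   non-negative cost between two parabolas (assumed, user-supplied).
    Returns (indices, cost): the chosen candidate index per frame and the cost.
    """
    last = len(candidates) - 1
    settled = {}  # (frame, index) -> (cost, predecessor node)
    # A virtual source reaches every candidate of the first frame at zero cost.
    heap = [(0.0, 0, i, None) for i in range(len(candidates[0]))]
    heapq.heapify(heap)
    while heap:
        cost, frame, idx, prev = heapq.heappop(heap)
        node = (frame, idx)
        if node in settled:
            continue
        settled[node] = (cost, prev)
        if frame == last:
            # First settled node in the last frame closes the shortest path.
            path = []
            while node is not None:
                path.append(node[1])
                node = settled[node][1]
            return path[::-1], cost
        for j, nxt in enumerate(candidates[frame + 1]):
            step = distance(candidates[frame][idx], nxt)
            heapq.heappush(heap, (cost + step, frame + 1, j, node))
    return [], float("inf")
```

Because the edge weights are non-negative, the first last-frame node popped from the heap is guaranteed to lie on a minimum-cost path, which yields exactly one parabola per frame.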
3.3. Individual player action recognition                 4.1. Dataset
In the rapidly evolving field of action recognition,      The Volleyball dataset [7] represents a significant
many datasets, structures, and architectures have         resource in the context of sports action recognition,
been introduced to address the challenges and             specifically on volleyball. Although originally
complexities associated with understanding human          designed for athlete action recognition, the dataset
has been extended to include the task of 2D ball
detection in the image. The dataset comprises a total
of 4830 frames from 55 videos, offering a wide variety
of actions and activities to analyze (Figure 4). In the
dataset, there are nine annotations for individual
player actions and eight group activities, detailed in
Table 1.
Table 1
Classes of individual player actions and group activities, including the number of instances.
   Action class   No. of instances   Group activity class   No. of instances
   Waiting        3601               Right set              644
   Setting        1332               Right spike            623
   Digging        2333               Right pass             801
   Falling        1241               Right winpoint         295
   Spiking        1216               Left winpoint          367
   Blocking       2458               Left pass              826
   Jumping        341                Left spike             642
   Moving         5121               Left set               633
   Standing       38696

Figure 6: Example 2D application of player identification and identification of ball trajectory
changes ("trigger"). Keypoints can be observed on each player's silhouette, along with the
corresponding arc of the ball trajectory.


4.2. Group activity recognition
GAR is performed at different levels. First, the
keypoints of the players are extracted. Based on
these, the action each player is performing is
estimated and then related to the predicted
person-to-person and person-to-group
interactions.
4.2.1. Trigger event identification and GAR
The GAR mechanism is activated by the trigger,
identified by a change in the direction of the
ball (Figure 5).

                                                          Figure 7: Our results on the Volleyball dataset
                                                          considering the Olympic Split [7], [26]. In the first
                                                          confusion matrix we represent GAR, in the second one
                                                          the single player activities.
                                                          As in human perception, objects are represented at
                                                          various granularities, and their interactions are reasoned
                                                          about, to transform sensory signals into high-level
                                                          knowledge. GAR is addressed by modeling a
                                                          video as a set of tokens representing multi-scale
Figure 5: Detailed schema for action and group            semantic concepts present in the video, thus allowing
activity recognition.                                     the described method to be easily adaptable to
In Figure 6 we present some frames from [7],              understand any video with multi-actor multi-object
processed using the proposed algorithms, detailed in      interactions.
the following section, allowing for a comprehensive           In the specific case of volleyball, the actors are
visualization of the keypoints of the various players     represented by the players, while the object is
combined with the trajectories of the ball.               represented by the ball. These tokens include
4.2.2. Hierarchy of semantic events for GAR               keypoints, people, person-to-person interactions,
 Taking inspiration from the approach proposed in         person-to-group interactions, and object interactions.
[26], composite learning of entities in the video and     The performance of this analysis, compared to
relational reasoning on these entities is established.    previous techniques based on standard RGB analysis
(i.e., considering the entire images and not just the       with Linear Temporal Logic over finite traces (LTLf),
keypoints), shows significant accuracy (Figure 7).          one of the reference logics in the field [28]. Examples
4.3. Tactical behavior                                      of such templates are the Chain Response between
By tactical behavior, we mean a set of temporal             actions A and B that means that action A must be
relationships among volleyball actions that can lead to     immediately followed by action B or the Alternate
an outcome of particular interest, such as scoring a        Precedence between A and B that means that action B
point. In what follows we provide a conceptual              must be preceded by action A without any other
framework to formally define tactical behaviors and         occurrence of B in between, see [27] Table 2. In
use Process Mining (PM) techniques for mining               addition, RuM provides the selection of a numeric
tactical behaviors from annotated volleyball matches.       support that indicates the percentage of occurrence of
4.3.1. A conceptual model for tactical behaviors            a particular template in the set of matches that can be
A tactical behavior is a set of temporal relationships      used as a key performance indicator (KPI). The 55 Volleyball
over events in a volleyball match. An event is the main     matches were analyzed in less than 10 seconds, a
action of a player on the ball which has a start time, an   suitable performance for an offline scenario. With a
end time, a set of players involved with information        support of 20%, we obtained 50 tactical behaviors
related to their pose, their bounding boxes, their          expressed using LTLf templates, automatically
unique identifiers, the quality of the action and the       translated by the tool in natural language sentences
position of the ball. For example:                          for a better human comprehension. An example of
     •    A dunk by a player from area A1 is                mined tactical behavior is that, in 47.73% of the
          immediately followed by a point scored.           matches, each jump (for a block) is preceded by a
     •    A reception (with low quality) of a player is     dunk without any other jump in between. In addition,
          immediately followed by a point.                  RuM also allows us to link the tactical behaviors of
                                                            actions to the other concepts of the above conceptual
   Our conceptual model for a volleyball event is           scheme.
shown in Figure 8.
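
The Chain Response and Alternate Precedence templates of Section 4.3.2, together with the notion of support, can be illustrated with a small sketch. It does not reproduce RuM's implementation: each match is assumed to be a plain list of action labels in chronological order, the labels in the usage example are hypothetical, and support is computed simply as the fraction of matches in which the template holds (including matches where it holds vacuously).

```python
def chain_response(trace, a, b):
    """Every occurrence of action `a` is immediately followed by action `b`."""
    return all(
        i + 1 < len(trace) and trace[i + 1] == b
        for i, action in enumerate(trace)
        if action == a
    )


def alternate_precedence(trace, a, b):
    """Every `b` is preceded by an `a` with no other `b` in between."""
    a_pending = False  # True when an `a` has occurred since the last `b`
    for action in trace:
        if action == b:
            if not a_pending:
                return False
            a_pending = False
        elif action == a:
            a_pending = True
    return True


def support(traces, template, a, b):
    """Fraction of traces (matches) in which `template(trace, a, b)` holds."""
    return sum(template(trace, a, b) for trace in traces) / len(traces)
```

For example, with three hypothetical matches `["serve", "dig", "set", "spike", "block"]`, `["serve", "set", "spike", "spike"]` and `["dig", "set", "spike"]`, Chain Response between `set` and `spike` holds in all three, whereas Alternate Precedence holds in only two, because the second match contains two consecutive spikes. A support threshold such as the 20% used in the text would then keep both templates.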


Figure 8: The conceptual model for volleyball events.       Figure 9: The conformance checking analysis of
   A volleyball match is therefore a sequence of            predefined tactical behaviors.
annotations of volleyball events in chronological               RuM also supports the manual definition of
order. Such events are annotated with the use of the        tactical behaviors and the analysis of the matches
computer vision techniques above or provided by             according to such predefined behaviors. This task is
scoutmen.                                                   called conformance checking and, as two examples of
4.3.2. Process Mining for tactical behaviors                tactical behaviors, we defined that a jump is followed
Process Mining [20] embraces Data Mining and                by a spike and that a spike is followed by a block.
Knowledge Representation and focuses on the                 Figure 9 shows the results of the RuM conformance
analysis and improvement of business processes              checking.
based on data collected from the information systems.         Each behavior is analyzed for each match and, on
One of its key features is the availability of tools for    the right, the actions of match 5 are shown and
mining information from temporal discrete data. We          highlighted in green if they conform to the tactical
analyzed the matches of the Volleyball dataset              behavior, in red otherwise.
(converted into a suitable format) with the Process         Acknowledgements
Mining RuM (Rule Mining Made Simple) tool [1] to            This work is supported by 5VREAL – 5G VOLLEY
mine tactical behaviors.                                    REALITY EXPERIENCE & ANALYTICS LIVE, CUP
    RuM extracts temporal relations among actions of        I53C23001340005, funded by Italian Ministry of
volleyball events through a list of templates defined       Enterprises and Made in Italy.
References
[1]  Alman, A., Donadello, I., Maggi, F. M., Montali, M. Declarative Process Mining for Software Processes: The RuM Toolkit and the Declare4Py Python Library. In Int. Conf. on Product-Focused Sw Process Improvement (2023).
[2]  Azar S.M., Atigh M.G., Nickabadi A., Alahi A.: Convolutional relational machine for group activity recognition. In: IEEE CVPR. (2019)
[3]  Bagautdinov, T., Alahi, A., Fleuret, F., Fua, P., Savarese, S.: Social scene understanding: End-to-end multi-person action localization and collective activity recognition. IEEE CVPR. (2017)
[4]  Camarena F, Gonzalez-Mendoza M, Chang L, Cuevas-Ascencio R: An Overview of the Vision-Based Human Action Recognition Field, Math. Comput. Appl. 2023
[5]  Calvanese M: Ball tracking in Padel Videos using Convolutional Neural Networks. [Master's thesis], Università di Bologna, Degree Programme in Artificial Intelligence, 2023
[6]  Dijkstra E.W: A note on two problems in connexion with graphs. Numerische Mathematik, 1959
[7]  Ibrahim MS, Muralidharan S, Deng Z, Vahdat A, Mori G. A hierarchical deep temporal model for group activity recognition. CVPR, 2016
[8]  Jiang T, Lu P, Zhang L, Ma N, Han R, Lyu C, Li Y, Chen K: RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose. ArXiv, 2023
[9]  Khan, A., Bozzato, L., Serafini, L., Lazzerini, B. (2019). Visual reasoning on complex events in soccer videos using answer set programming. In GCAI 2019.
[10] Li J, Wang C, Zhu H, Mao Y, Fang H, Lu C.: CrowdPose: Efficient Crowded Scenes Pose Estimation and A New Benchmark, CVPR, 2019
[11] Meerhoff, L. A., Goes, F. R., De Leeuw, A. W., Knobbe, A. (2020). Exploring successful team tactics in soccer tracking data. In Machine Learning and Knowledge Discovery in Databases: Int. Workshops of ECML PKDD 2019.
[12] Nabi, M., Bue, A., Murino, V.: Temporal poselets for collective activity detection and recognition. In: IEEE CVPR. pp. 500–507 (2013)
[13] Qi, M., Qin, J., Li, A., Wang, Y., Luo, J., Van Gool, L.: stagnet: An attentive semantic rnn for group activity recognition. In: Proc. of the ECCV. (2018)
[14] Rahimian P, Toka L: Optical tracking in team sports: A survey on player and ball tracking methods in soccer and other team sports. Journal of Quantitative Analysis in Sports, 2022
[15] Sudhakaran S, Escalera S, Lanz O: Gate-Shift Networks for Video Action Recognition, IEEE CVPR 2020
[16] Sudhakaran S, Escalera S, Lanz O: Gate-Shift-Fuse for Video Action Recognition, IEEE TPAMI, 2023
[17] Takahashi M, Ikeya K, Kano M, Ookubo H, Mishina T: Robust Volleyball Tracking System Using Multi-View Cameras. ICPR, 2016
[18] Thilakarathne H., Nibali A., He Z., Morgan S.: Pose is all you need: The pose only group activity recognition system (pogars). arXiv preprint arXiv:2108.04186 (2021)
[19] Van Haaren, J., Ben Shitrit, H., Davis, J., Fua, P. (2016, August). Analyzing volleyball match data from the 2014 world championships using machine learning techniques. In Proceedings of the 22nd ACM SIGKDD (pp. 627-634).
[20] Van Der Aalst, W. (2016). Data science in action (pp. 3-23). Springer Berlin Heidelberg.
[21] Wenninger, S., Link, D., Lames, M. (2019). Data mining in elite beach volleyball–detecting tactical patterns using market basket analysis. IJCSS, 18(2), 1-19.
[22] Wu L.F., Wang Q., Jian M., Qiao Y., Zhao, B.X.: A comprehensive review of group activity recognition in videos. International Journal of Automation and Computing pp. 1–17 (2021)
[23] Xia, H., Tracy, R., Zhao, Y., Fraisse, E., Wang, Y. F., Petzold, L. (2022, November). VREN: volleyball rally dataset with expression notation language. In 2022 IEEE ICKG (pp. 337-346).
[24] Xu D., Fu H., Wu L., Jian M., Wang D., Liu X.: Group activity recognition by using effective multiple modality relation representation with temporal-spatial attention. IEEE Access 8, (2020)
[25] Yan R., Xie L., Tang J., Shu X., Tian Q.: Higcin: hierarchical graph-based cross inference network for group activity recognition. IEEE TPAMI (2020)
[26] Zhou H, Kadav A, Shamsian A, Geng S, Lai F, Zhao L, Liu T, Kapadia M, Graf HP: COMPOSER: Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality. ECCV, 2022
[27] Donadello, I., Di Francescomarino, C., Maggi, F. M., Ricci, F., Shikhizada, A. Outcome-oriented prescriptive process monitoring based on temporal logic patterns. Engineering Applications of Artificial Intelligence (2023).
[28] Claudio Di Ciccio, Marco Montali: Declarative Process Specifications: Reasoning, Discovery, Monitoring. Process Mining Handbook 2022.

</pre>