<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>SportsVideo: A Multimedia Dataset for Sports Event and Position Detection in Table Tennis and Swimming</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aymeric Erades</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre-Etienne Martin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Romain Vuillemot</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boris Mansencal</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Renaud Péteri</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julien Morlier</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Duffner</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jenny Benois-Pineau</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CCP Department, Max Planck Institute for Evolutionary Anthropology</institution>
          ,
          <addr-line>D-04103 Leipzig</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ecole Centrale de Lyon</institution>
          ,
          <addr-line>LIRIS</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>INSA Lyon</institution>
          ,
          <addr-line>LIRIS</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Bordeaux</institution>
          ,
          <addr-line>LaBRI</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Position and action detection/classification are among the main challenges in visual content analysis and mining. Sports videos offer such challenges due to the variety of scenes and actions they contain. Sports also enable a wide range of analyses related to athletes' performances and tactics. We propose a series of 6 sports-related tasks, each divided into 2 sub-tasks for two sports, table tennis and swimming. These tasks are a follow-up to the Sport Task and SwimTrack task from the 2022 MediaEval Benchmark.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        We present SportsVideo, a series of six sports-related multimedia tasks, divided into sub-tasks for
table tennis and swimming. The dataset merges the Sport Task (Table Tennis) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] with
SwimTrack [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] (Swimming) task from the MediaEval 2022 edition [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] into a single benchmark
dataset. The main motivation is to provide a more complete and challenging benchmark for
video analysis with complex scenes and actions. The first four tasks are related to image and
video analysis, the fifth to sound analysis and the last one to textual information extraction.
      </p>
      <p>
        By combining two different sports, table tennis and swimming, we aim to encourage
approaches that generalize beyond a single sport, since the two sports are very different in
terms of the type of video, the type of events and the type of analysis required. Table tennis is a
fast-paced sport with a lot of action and a small field of view. Swimming is a slower sport with
a large field of view and many occlusions. We expect participants to develop approaches that
can generalize to both, and also to other sports. These tasks have been identified and designed
to be as independent as possible, so that participants can choose to participate in one or more
tasks; combined, they provide a more complete analysis of sports videos. Participants
are encouraged to release their code publicly with their submission. This year, similarly to
the Sport Task 2022 edition [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a baseline for both subtasks 2.1 and 3.1 is shared publicly [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
Background on both sports is provided in two PhD theses: in swimming [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and in table
tennis [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Tasks Description</title>
      <sec id="sec-2-1">
        <title>Task 1 - position detection</title>
        <p>The main information in sports relates to the positions of players, here in videos featuring different
numbers of lanes in a swimming pool and both sides of a table tennis table (seen from various angles).
Participants need to provide bounding boxes for identified players, and their results are evaluated
using Average Precision (AP) at an IoU threshold of 0.25, counting true positives and negatives
across the dataset.</p>
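        <p>For concreteness, the IoU test that underlies the AP metric can be sketched as follows. This is a minimal illustration in Python, not the official evaluation code; the function names and the axis-aligned (x1, y1, x2, y2) box format are our assumptions.</p>

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_true_positive(pred_box, gt_box, threshold=0.25):
    # A detection matching a ground-truth box with IoU >= 0.25 counts
    # as a true positive when accumulating AP.
    return iou(pred_box, gt_box) >= threshold
```

        <p>At the permissive 0.25 threshold, a predicted box needs to cover only a modest fraction of the ground truth, which accommodates the frequent partial occlusions of swimmers and of doubles players.</p>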
        <p>Subtask 1.1 (table tennis) – To detect 2 or 4 players (depending on whether the game is singles or
doubles) and track them throughout the video, especially during doubles games, where players
overlap frequently, from videos recorded from various angles (e.g., side, corner).</p>
        <p>
          Subtask 1.2 (swimming) – To detect up to 8 swimmers in the pool from static videos (recorded
from the side of the pool). A baseline is provided in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Task 2 - event detection</title>
        <p>Another key piece of information in sports video relates to events, in particular strokes, which are
key for performance evaluation. In swimming, stroke events indicate the pace of the swim: for
Freestyle, Backstroke, and Butterfly they are triggered when the right hand enters the water, while
for Breaststroke, when the head reaches its highest point.</p>
        <p>Subtask 2.1 (table tennis) – To detect when a player is performing a stroke (i.e., hitting the ball
with the racket) using close-up videos. The goal is to detect the exact frame when the ball is hit by
the racket. Videos are provided with the ball annotated. Evaluation is based on the F1-score,
which is the harmonic mean of precision and recall.</p>
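        <p>To make the metric concrete, here is a minimal Python sketch of event matching and F1 computation. The greedy one-to-one matching and the tolerance parameter are our assumptions for illustration (the subtask asks for the exact frame, i.e., a tolerance of 0); this is not the official evaluation code.</p>

```python
def match_events(predicted, ground_truth, tolerance=0):
    # Greedily match each predicted frame to an unused ground-truth frame
    # within `tolerance` frames. Unmatched predictions are false positives;
    # unmatched ground-truth events are false negatives.
    unused = sorted(ground_truth)
    tp = 0
    for p in sorted(predicted):
        hit = next((g for g in unused if abs(g - p) <= tolerance), None)
        if hit is not None:
            unused.remove(hit)
            tp += 1
    return tp, len(predicted) - tp, len(unused)

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```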
        <p>
          Subtask 2.2 (swimming) – To detect each time a swimmer completes a repeated motion for
each swimming style (for freestyle, backstroke, and butterfly, once the swimmer’s right
hand enters the water; for breaststroke, once the head is at its highest point). Swimmers’ strokes
are identified after the underwater phase and until the swimmer finishes the race, excluding
underwater phases for races longer than 50 m. Video clips of cropped swimmers are provided,
and evaluation is based on Off-By-One Accuracy, which measures the proportion of stroke
counts estimated within a tolerated error of one stroke across the dataset. A baseline is
provided in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
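        <p>The Off-By-One Accuracy described above reduces to a few lines; this Python sketch is our illustration of the stated definition, not the released evaluation script.</p>

```python
def off_by_one_accuracy(predicted_counts, true_counts):
    # Fraction of clips whose estimated stroke count differs from the
    # annotated count by at most one stroke.
    assert len(predicted_counts) == len(true_counts) and true_counts
    hits = sum(abs(p - t) <= 1 for p, t in zip(predicted_counts, true_counts))
    return hits / len(true_counts)
```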
      </sec>
      <sec id="sec-2-3">
        <title>Task 3 - event classification</title>
        <p>The goal of this task is to classify the type of stroke performed by a player in table tennis and
swimming. Participants need to categorize a collection of trimmed table tennis stroke videos,
each containing either a single stroke or no stroke. There are 20 potential stroke categories and
an extra category for non-strokes. Two annotated sets are given: a training set with 807
videos and a validation set with 230 videos. The challenge is to classify a non-annotated test set
comprising 118 videos, with the assurance that the trimmed videos in each set are derived from
the same untrimmed videos but captured at different time instances without temporal overlap.</p>
        <p>Subtask 3.1 (table tennis) – To classify different strokes in table tennis from trimmed videos in
which only one stroke is present. There are 3 different categories of strokes: services, forehand
and backhand. For services there are 6 different classes; for forehand and backhand, 5 classes
each, for a total of 16 classes and one non-stroke class.</p>
        <p>Subtask 3.2 (swimming) – To classify different swimming styles (Freestyle, Backstroke,
Breaststroke, Butterfly).</p>
      </sec>
      <sec id="sec-2-4">
        <title>Task 4 - field/table registration</title>
        <p>Sports videos in general, and the ones we provide, are usually recorded from the side. This task
asks participants to find the absolute homography matrix for each frame in the dataset. The
precision of this projection is evaluated using Intersection over Union (IoU), with two metrics:
IoU for the visible pool parts and IoU for the entire pool, including parts outside the camera’s
field of view.</p>
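        <p>To illustrate how a predicted homography is used, the sketch below projects model-space points into the image with NumPy. The model-to-image projection direction, the helper name, and the use of the standard 2.74 m by 1.525 m table dimensions are our illustrative assumptions; the projected polygon can then be rasterized and compared to the ground-truth mask to compute IoU.</p>

```python
import numpy as np

def project_points(h, points):
    # Map Nx2 points through a 3x3 homography via homogeneous coordinates.
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    projected = pts @ np.asarray(h, dtype=float).T
    return projected[:, :2] / projected[:, 2:3]

# Corners of a standard 2.74 m x 1.525 m table-tennis table in model space;
# projecting them with the predicted homography yields the table polygon
# in image coordinates.
table_corners = [[0.0, 0.0], [2.74, 0.0], [2.74, 1.525], [0.0, 1.525]]
```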
        <p>Subtask 4.1 (table tennis) – To detect the table position for a given video frame showing a whole
table tennis table. The dataset contains 54 annotated images with homography matrices from TV
broadcasts of table tennis matches.</p>
        <p>
          Subtask 4.2 (swimming) – To detect the pool position for a given video frame. The dataset
contains 500 annotated images with homography matrices from [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>Task 5 - sound detection</title>
        <p>Sports are highly multi-modal events. Sound is an important modality that can be used to detect
events, either as official cues (e.g., the buzzer sound in swimming) or as additional cues (e.g., the
ball bounce in table tennis). In this task, participants are asked to detect sound events
in table tennis and swimming videos. In table tennis, the ball bounces on the table at each
stroke.</p>
        <p>Subtask 5.1 (table tennis) – Ball hits indicate the pace of the game. The goal is to detect the
exact frame when the ball bounces on the table. Videos are provided with the ball annotated
and evaluation is based on the F1-score.</p>
        <p>Subtask 5.2 (swimming) – A buzzer sound (preceded by an "on your mark") signals the start.
Participants are asked to find the time of this sound within audio files extracted from live videos.
These files may or may not include the buzzer sound, which can occur at various points during
the recording. This task is challenging because the sound might be recorded from a considerable
distance and could be accompanied by significant background noise.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Task 6 - score and results extraction</title>
        <p>In most sports, the outcome is presented on a scoreboard, featuring the race time of each swimmer
(and possibly extra data such as reaction time) or the current score of the game. These scoreboards
are typically either physical LCD screens (on the wall or close to the referee) or digital versions
shown on TV broadcasts.</p>
        <p>Subtask 6.1 (table tennis) – To recognise the score of the match. In table tennis, the score of a
match can be embedded in the broadcast video or shown by referees on scoreboards.
When the score is embedded in the video stream, the names of the players are also displayed.</p>
        <p>Subtask 6.2 (swimming) – To extract swimmers’ names, lane numbers, and race results (times)
from screenshots of scoreboards. During swimming competitions, results are displayed on digital
boards after each race. The images and scoreboard coordinates are provided, so the localization
aspect is already addressed; the goal is to recognise the characters on these boards to obtain
the results of the races.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Acknowledgement</title>
      <p>We thank the people involved in the previous versions of this challenge. This project was
partially funded by the ANR NePTUNE project, grant number ANR-19-STHP-0004, and the FFTT
partnership convention.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Calandre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mansencal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Benois-Pineau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Péteri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mascarilla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Morlier</surname>
          </string-name>
          ,
          <article-title>Sport task: Fine grained action detection and classification of table tennis strokes from videos for MediaEval 2022</article-title>
          , in: [3],
          <year>2022</year>
          . URL: https://ceur-ws.org/Vol-3583/paper26.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jacquelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jaunet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vuillemot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Duffner</surname>
          </string-name>
          ,
          <source>SwimTrack: Swimmers and Stroke Rate Detection in Elite Race Videos</source>
          ,
          <year>2023</year>
          . URL: https://hal.science/hal-03936053.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>S.</given-names> <surname>Hicks</surname></string-name>
          ,
          <string-name><given-names>A. G. S.</given-names> <surname>de Herrera</surname></string-name>
          ,
          <string-name><given-names>J.</given-names> <surname>Langguth</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Lommatzsch</surname></string-name>
          ,
          <string-name><given-names>S.</given-names> <surname>Andreadis</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Dao</surname></string-name>
          ,
          <string-name><given-names>P.</given-names> <surname>Martin</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Hürriyetoglu</surname></string-name>
          ,
          <string-name><given-names>V.</given-names> <surname>Thambawita</surname></string-name>
          ,
          <string-name><given-names>T. S.</given-names> <surname>Nordmo</surname></string-name>
          ,
          <string-name><given-names>R.</given-names> <surname>Vuillemot</surname></string-name>
          ,
          <string-name><given-names>M. A.</given-names> <surname>Larson</surname></string-name>
          (Eds.),
          <source>Working Notes Proceedings of the MediaEval 2022 Workshop</source>
          , Bergen, Norway and Online, 12-13 January
          <year>2023</year>
          , volume
          <volume>3583</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          . URL: https://ceur-ws.org/Vol-3583.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <article-title>Baseline method for the sport task of MediaEval 2022 with 3D CNNs using attention mechanism</article-title>
          , in: [3],
          <year>2022</year>
          . URL: https://ceur-ws.org/Vol-3583/paper19.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <article-title>Baseline method for the sport task of MediaEval 2023: 3D CNNs using attention mechanisms for table tennis stroke detection and classification</article-title>
          , in:
          <source>Working Notes Proceedings of the MediaEval 2023 Workshop</source>
          , Amsterdam, The Netherlands and Online, 1-2 February
          <year>2024</year>
          , CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jacquelin</surname>
          </string-name>
          ,
          <source>Automatic Analysis of Elite Swimming Race Videos</source>
          , Thèse de doctorat, Ecully, Ecole Centrale de Lyon,
          <year>2022</year>
          . URL: https://www.theses.fr/2022ECDL0017.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.-E.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <article-title>Fine-grained action detection and classification from videos with spatio-temporal convolutional neural networks: Application to Table Tennis</article-title>
          , Thèse de doctorat, Université de Bordeaux,
          <year>2020</year>
          . URL: https://theses.hal.science/tel-03128769.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jacquelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vuillemot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Duffner</surname>
          </string-name>
          ,
          <article-title>Detecting Swimmers in Unconstrained Videos with Few Training Data</article-title>
          ,
          <source>8th Workshop on Machine Learning and Data Mining for Sports Analytics</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jacquelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vuillemot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Duffner</surname>
          </string-name>
          ,
          <article-title>Periodicity Counting in Videos with Unsupervised Learning of Cyclic Embeddings</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          (
          <year>2022</year>
          ). URL: https://hal.archives-ouvertes.fr/hal-03738161. doi:10.1016/j.patrec.2022.07.013.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jacquelin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Duffner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vuillemot</surname>
          </string-name>
          ,
          <article-title>Efficient One-Shot Sports Field Image Registration With Arbitrary Keypoint Segmentation</article-title>
          , in: IEEE International Conference on Image Processing, Bordeaux, France,
          <year>2022</year>
          . URL: https://hal.archives-ouvertes.fr/hal-03738153.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>