<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sports Video Classification: Classification of Strokes in Table Tennis for MediaEval 2020</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierre-Etienne Martin</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jenny Benois-Pineau</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Boris Mansencal</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Renaud Péteri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laurent Mascarilla</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jordan Calandre</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julien Morlier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IMS, University of Bordeaux</institution>
          ,
          <addr-line>Talence</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>MIA, La Rochelle University</institution>
          ,
          <addr-line>La Rochelle</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Univ. Bordeaux</institution>
          ,
          <addr-line>CNRS, Bordeaux INP, LaBRI, Talence</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Fine-grained action classification raises new challenges compared to the classical action classification problem. Sports video analysis is a very popular research topic, due to the variety of application areas, ranging from multimedia intelligent devices with user-tailored digests up to analysis of athletes' performances. Running since 2019 as part of MediaEval, this task consists in classifying table tennis strokes from videos recorded in natural conditions at the University of Bordeaux. The aim is to build tools for teachers, coaches and players to analyse table tennis games. Such tools could lead to an automatic profiling of the players, so that training sessions can be adapted to improve sports skills more efficiently.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Action detection and classification is one of the main challenges in
visual content analysis and mining [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Over the last few years, the
number of datasets for action classification has drastically increased
in terms of video content, resolution, localization and number of
classes. However, recent research shows that classification
performed with deep neural networks often focuses on the whole
scene and the background rather than on the action itself.
      </p>
      <p>
        Sport video analysis has been a very popular research topic, due
to the variety of application areas, ranging from multimedia
intelligent devices with user-tailored digests, up to analysis of athletes’
performance [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The Sports Video Classification project was
initiated by the Faculty of Sports (STAPS) and the computer science
laboratory (LaBRI) of the University of Bordeaux, and by the MIA
laboratory of La Rochelle University1. The goal of this project is
to develop artificial intelligence and multimedia indexing methods
for the recognition of table tennis activities. The ultimate
goal is to evaluate the performance of athletes, with a particular
focus on students, in order to develop optimal training strategies.
To that aim, a video corpus named TTStroke-21 was recorded with
volunteer players. These data are of great scientific interest for the
multimedia community participating in the MediaEval campaign.
1This work was supported by the Nouvelle-Aquitaine Region through the CRISP project
(ComputeR vIsion for Sport Performance) and the MIRES federation.
      </p>
      <p>
        Several datasets such as UCF-101 [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], HMDB [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and AVA [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
have been used for many years as benchmarks for action
classification methods. In [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], spatio-temporal dependencies are learned
from the video using only RGB images for classification. This
method is promising but its scores are still below those of multi-modal
methods such as I3D [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. More recently, datasets have been enriched, like
JHMDB [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and Kinetics [
        <xref ref-type="bibr" rid="ref2 ref3 ref9">2, 3, 9</xref>
        ] or fused like AVA_Kinetics [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Some
also focus on intra-class dissimilarity, such as the
Something-Something dataset. Others, such as the Olympic Sports dataset [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ],
focus on sport actions only. However, those datasets are not
dedicated to a specific sport and its associated rules. Few datasets
focus on fine-grained classification. We can cite FineGym [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ],
introduced recently, which focuses on gymnastics videos, and our
TTStroke-21 [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] comprising table tennis strokes.
      </p>
      <p>
TTStroke-21 is manually annotated by professional table tennis
players or teachers, which makes the annotation process more
time-consuming, but also more temporally and qualitatively accurate.
Classification methods such as the I3D model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or the LTC model [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], which perform
well on the UCF-101 dataset, inspired the work done in [
        <xref ref-type="bibr" rid="ref18 ref21">18, 21</xref>
        ], which
introduces a TSTCNN (Twin Spatio-Temporal Convolutional Neural
Network). Here, the video stream and the optical flow computed from it
are passed through the two branches of the TSTCNN. In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] the
normalization of the flow is also investigated to improve the classification
score, while in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] an attention block is introduced to improve the
performance and the speed of convergence. The similarity between
actions - strokes - in TTStroke-21 makes the classification task
challenging, and the multi-modal method seemed to improve
performance. To better understand the learned features and the classification
process taking place in the TSTCNN, we also developed a new
visualization technique [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Recent work focusing on table tennis [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] tries to infer the tactics
of the players from their performance during matches using a
Markov chain model. In [
        <xref ref-type="bibr" rid="ref14 ref27 ref32">14, 27, 32</xref>
        ] stroke recognition is performed
using sensors. In [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] segmentation of the player, ball coordinates
and event detection are explored, while [
        <xref ref-type="bibr" rid="ref13 ref31">13, 31</xref>
        ] focus solely on the
trajectory of the ball.
      </p>
      <p>In this task overview paper, in section 2, we introduce the specific
conditions of usage of this data, then describe TTStroke-21 and
the task respectively in sections 3 and 4. The evaluation method is
explained in section 5. Supplementary notes are shared in section 6.
More information can be found on the dedicated GitHub web page2.
2https://multimediaeval.github.io/2020-Sports-Video-Classification-Task/</p>
      <p>Figure 1: a. Video acquisition. b. Annotation platform.</p>
    </sec>
    <sec id="sec-2">
      <title>SPECIFIC CONDITIONS OF USAGE</title>
      <p>TTStroke-21 consists of videos of players playing table
tennis in natural conditions. Even though we use an automatic
tool for blurring players’ faces, some faces are missed in a few
frames and thus some players remain identifiable. In order to
respect the personal data and privacy of the players, this dataset
is subject to a usage agreement, referred to as Special Conditions.
These Special Conditions apply to the use of the videos, referred to as
Images, generated in the framework of the program Sports video
classification: classification of strokes in table tennis, for the
implementation of the MediaEval program. They correspond to the
specific usage agreement referred to in the Usage agreement for the
MediaEval 2020 Research Collections, signed between the User and
the University of Delft. The full and complete acceptance, without
any reservation, of these Special Conditions is a mandatory
prerequisite for the provision of the Images as part of the MediaEval
2020 evaluation campaign. A complete reading of these conditions
is necessary; they require the user, for example, to obscure the
faces (blurring, black banner, etc.) in the videos before use in any
publication and to destroy the data by October 1st, 2021.</p>
    </sec>
    <sec id="sec-3">
      <title>DATASET DESCRIPTION</title>
      <p>In the MediaEval 2020 campaign, we release the same subset of the
TTStroke-21 dataset as last year. The only differences are the
blurring of the faces and the specification of whether the player is right-handed
or left-handed. The dataset was recorded in a sports faculty
facility using lightweight equipment, such as GoPro cameras. It
consists of player-centred videos recorded in natural conditions
without markers or sensors, see Fig 1. It comprises 20 table tennis
stroke classes, i.e. 8 services: Serve Forehand Backspin, Serve
Forehand Loop, Serve Forehand Sidespin, Serve Forehand
Topspin, Serve Backhand Backspin, Serve Backhand Loop,
Serve Backhand Sidespin, Serve Backhand Topspin; 6
offensive strokes: Offensive Forehand Hit, Offensive Forehand
Loop, Offensive Forehand Flip, Offensive Backhand Hit,
Offensive Backhand Loop, Offensive Backhand Flip; and
6 defensive strokes: Defensive Forehand Push, Defensive
Forehand Block, Defensive Forehand Backspin, Defensive
Backhand Push, Defensive Backhand Block, Defensive
Backhand Backspin. Also, all the strokes can be divided into two
super-classes: Forehand and Backhand. This taxonomy was
designed with professional table tennis teachers.</p>
      <p>
        All videos are recorded in MPEG-4 format. Unlike the task at
MediaEval 2019 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], most of the faces are blurred. To do so, faces
are detected with the OpenCV deep learning face detector, based on the
Single Shot Detector (SSD) framework with a ResNet base network,
for each frame of the original video. The detected faces are blurred
and the frames are re-encoded into a video.
      </p>
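The detection-and-blur step described above can be sketched with OpenCV's DNN module. This is a minimal illustration, not the exact pipeline used for TTStroke-21: the model file names, confidence threshold and blur kernel size are assumptions.

```python
import numpy as np

def scale_clip_box(box, w, h):
    """Scale a normalized [x1, y1, x2, y2] detection box to pixel
    coordinates and clip it to the frame borders."""
    x1, y1, x2, y2 = (np.asarray(box) * [w, h, w, h]).astype(int)
    return max(x1, 0), max(y1, 0), min(x2, w), min(y2, h)

def blur_faces(frame, net, conf_threshold=0.5):
    """Blur, in place, every face the SSD detector finds in a BGR frame."""
    import cv2  # imported here so the helper above stays usable without OpenCV
    h, w = frame.shape[:2]
    # The OpenCV res10 SSD face model expects 300x300 input with these channel means.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()  # shape (1, 1, N, 7): [.., .., confidence, x1, y1, x2, y2]
    for i in range(detections.shape[2]):
        if detections[0, 0, i, 2] < conf_threshold:
            continue
        x1, y1, x2, y2 = scale_clip_box(detections[0, 0, i, 3:7], w, h)
        face = frame[y1:y2, x1:x2]
        if face.size:
            frame[y1:y2, x1:x2] = cv2.GaussianBlur(face, (51, 51), 0)
    return frame

# Hypothetical model files for the detector mentioned in the text:
# net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
#                                "res10_300x300_ssd_iter_140000.caffemodel")
```

The per-frame results would then be written back with a video encoder, as the re-encoding step above describes.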
      <p>The organisation of the delivered data is as follows:
• The provided dataset is split into two subsets: i) a training
set and ii) a test set;
• In each directory, there are several videos (in MPEG-4
format) and each video may contain several actions;
• Each video file is provided with an XML file describing the
actions present in the video and whether the player is right-handed
or left-handed;
• Each action has 3 attributes: the starting frame, the ending
frame, and the stroke class;
• In the train set XML files, all the attributes are specified.</p>
      <p>In the test set XML files, only the starting and ending
frames are specified. The stroke class attribute is purposely
set to the value “Unknown”, and should be updated by the
participants to one of the 20 valid classes.</p>
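Filling in the test-set annotations can be sketched as below. The actual element and attribute names of the task's XML files are not reproduced here, so the &lt;video&gt;/&lt;action begin end move&gt; layout and the toy classifier are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

def label_actions(xml_text, classify):
    """Replace each action's 'Unknown' stroke class with the prediction
    returned by `classify(start_frame, end_frame)`."""
    root = ET.fromstring(xml_text)
    for action in root.iter("action"):
        begin, end = int(action.get("begin")), int(action.get("end"))
        if action.get("move") == "Unknown":
            action.set("move", classify(begin, end))
    return ET.tostring(root, encoding="unicode")

# Toy test-set file: one action between frames 120 and 215, class unknown.
test_xml = '<video><action begin="120" end="215" move="Unknown"/></video>'
labelled = label_actions(test_xml, lambda b, e: "Serve Forehand Topspin")
```

A participant's system would plug its actual stroke classifier in place of the lambda and write one such labelled file per test video.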
    </sec>
    <sec id="sec-4">
      <title>TASK DESCRIPTION</title>
      <p>The Sport Video Annotation task consists, for each action of each
test video, in assigning a label using a given taxonomy of 20 classes
of table tennis strokes.</p>
      <p>Participants may submit up to five runs. For each run, they must
provide one XML file per video file, containing the actions
with their recognised stroke class. Runs may be submitted
as an archive (zip or tar.gz file) with each run in a different directory.
Participants should also indicate if any external data, such as other
datasets or pretrained networks, was used to compute their runs.
The task is considered fully automatic: once the videos are provided
to the system, results should be produced without any human
intervention.</p>
    </sec>
    <sec id="sec-5">
      <title>EVALUATION</title>
      <p>For MediaEval 2020, we propose a lightweight classification task.
It consists in the classification of table tennis strokes whose temporal
borders are supplied in the XML files accompanying each video file.
Hence, for each test video, the participants are invited to produce
an XML file in which each stroke is labelled according to the
given taxonomy. This means that the default label “Unknown” has
to be replaced by the label of the stroke class that the participant’s
system has assigned. All submissions will be evaluated in terms of
per-class accuracy and global accuracy.</p>
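The two reported metrics can be sketched from paired (ground truth, prediction) label lists; the stroke names below are examples from the taxonomy, not evaluation data.

```python
from collections import defaultdict

def accuracies(truths, preds):
    """Return (global accuracy, per-class accuracy) for paired label lists."""
    correct = 0
    per_class = defaultdict(lambda: [0, 0])  # class -> [correct, total]
    for t, p in zip(truths, preds):
        per_class[t][1] += 1
        if t == p:
            correct += 1
            per_class[t][0] += 1
    global_acc = correct / len(truths)
    per_class_acc = {c: ok / n for c, (ok, n) in per_class.items()}
    return global_acc, per_class_acc

g, pc = accuracies(
    ["Offensive Forehand Hit", "Offensive Forehand Hit", "Defensive Backhand Push"],
    ["Offensive Forehand Hit", "Defensive Backhand Push", "Defensive Backhand Push"])
```

Global accuracy averages over all strokes, while per-class accuracy weighs every class equally, which matters when class frequencies are unbalanced.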
      <p>The organizers will also provide to the participants different
confusion matrices: one considering all the classes, and others
considering the type of the stroke (serve, offensive or defensive)
and/or the forehand and backhand superclasses of the
strokes.</p>
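As an illustration, a confusion matrix over the full taxonomy can be collapsed onto the Forehand/Backhand superclasses by grouping labels on the word each stroke name contains. This is a minimal sketch, not the organizers' evaluation code.

```python
def superclass(stroke):
    """Map a stroke name to its superclass; every class name in the
    taxonomy contains either 'Forehand' or 'Backhand'."""
    return "Forehand" if "Forehand" in stroke else "Backhand"

def confusion(truths, preds, classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(truths, preds):
        m[idx[t]][idx[p]] += 1
    return m

classes = ["Forehand", "Backhand"]
truths = ["Serve Forehand Loop", "Offensive Backhand Hit", "Defensive Forehand Push"]
preds = ["Offensive Forehand Flip", "Serve Backhand Topspin", "Defensive Backhand Block"]
m = confusion([superclass(t) for t in truths], [superclass(p) for p in preds], classes)
```

The same `confusion` helper works unchanged for the 20-class matrix or for a serve/offensive/defensive grouping.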
    </sec>
    <sec id="sec-6">
      <title>DISCUSSION</title>
      <p>
        Participants from previous years reached a maximum accuracy
of 22.9% [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], 14.1% [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and 11.3% [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], leaving room for
improvement. Participants are welcome to share their difficulties and their
results even if they do not seem sufficiently good.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Jordan</given-names>
            <surname>Calandre</surname>
          </string-name>
          , Renaud Péteri, and
          <string-name>
            <given-names>Laurent</given-names>
            <surname>Mascarilla</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Optical Flow Singularities for Sports Video Annotation: Detection of Strokes in Table Tennis</article-title>
          , See [
          <volume>11</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>João</given-names>
            <surname>Carreira</surname>
          </string-name>
          , Eric Noland, Andras Banki-Horvath,
          <string-name>
            <given-names>Chloe</given-names>
            <surname>Hillier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2018</year>
          . A Short Note about Kinetics-600. CoRR abs/1808.01340 (2018).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>João</given-names>
            <surname>Carreira</surname>
          </string-name>
          , Eric Noland, Chloe Hillier, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>A Short Note on the Kinetics-700 Human Action Dataset</article-title>
          . CoRR abs/1907.06987 (2019).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>João</given-names>
            <surname>Carreira</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset</article-title>
          . (
          <year>2017</year>
          ),
          <fpage>4724</fpage>
          -
          <lpage>4733</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Moritz</given-names>
            <surname>Einfalt</surname>
          </string-name>
          , Dan Zecha, and
          <string-name>
            <given-names>Rainer</given-names>
            <surname>Lienhart</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>ActivityConditioned Continuous Human Pose Estimation for Performance Analysis of Athletes Using the Example of Swimming</article-title>
          .
          <source>In IEEE WACV</source>
          <year>2018</year>
          , Lake Tahoe, NV, USA, March 12-15, 2018.
          <fpage>446</fpage>
          -
          <lpage>455</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Kazi</given-names>
            <surname>Ahmed Asif Fuad</surname>
          </string-name>
          ,
          <string-name>
            <surname>Pierre-Etienne</surname>
            <given-names>Martin</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romain Giot</surname>
          </string-name>
          , Romain Bourqui, Jenny Benois-Pineau, and
          <string-name>
            <given-names>Akka</given-names>
            <surname>Zemmari</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Feature Understanding in 3D CNNs for Actions Recognition in Video</article-title>
          .
          <source>In Tenth International Conference on Image Processing Theory, Tools and Applications</source>
          ,
          <string-name>
            <surname>IPTA</surname>
          </string-name>
          <year>2020</year>
          , Paris, France, November 9-12, 2020,
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Chunhui</given-names>
            <surname>Gu</surname>
          </string-name>
          , Chen Sun,
          <string-name>
            <given-names>David A.</given-names>
            <surname>Ross</surname>
          </string-name>
          , Carl Vondrick, Caroline Pantofaru,
          <string-name>
            <given-names>Yeqing</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sudheendra</given-names>
            <surname>Vijayanarasimhan</surname>
          </string-name>
          , George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, and
          <string-name>
            <given-names>Jitendra</given-names>
            <surname>Malik</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>AVA: A Video Dataset of Spatio-Temporally Localized Atomic Visual Actions</article-title>
          . (
          <year>2018</year>
          ),
          <fpage>6047</fpage>
          -
          <lpage>6056</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Hueihan</given-names>
            <surname>Jhuang</surname>
          </string-name>
          , Juergen Gall, Silvia Zuffi, Cordelia Schmid, and
          <string-name>
            <given-names>Michael J.</given-names>
            <surname>Black</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Towards Understanding Action Recognition</article-title>
          .
          <source>In IEEE ICCV</source>
          <year>2013</year>
          , Sydney, Australia, December 1-8, 2013
          . IEEE Computer Society,
          <fpage>3192</fpage>
          -
          <lpage>3199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Will</given-names>
            <surname>Kay</surname>
          </string-name>
          , João Carreira, Karen Simonyan, Brian Zhang, Chloe Hillier, Sudheendra Vijayanarasimhan, Fabio Viola, Tim Green, Trevor Back, Paul Natsev, Mustafa Suleyman, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The Kinetics Human Action Video Dataset</article-title>
          .
          <source>CoRR abs/1705</source>
          .06950 (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Hildegard</surname>
            <given-names>Kuehne</given-names>
          </string-name>
          , Hueihan Jhuang, Estíbaliz Garrote, Tomaso A.
          <string-name>
            <surname>Poggio</surname>
            , and
            <given-names>Thomas</given-names>
          </string-name>
          <string-name>
            <surname>Serre</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>HMDB: A large video database for human motion recognition</article-title>
          .
          <source>In IEEE ICCV</source>
          <year>2011</year>
          , Barcelona, Spain, November 6-13, 2011
          ,
          <string-name>
            <given-names>Dimitris N.</given-names>
            <surname>Metaxas</surname>
          </string-name>
          , Long Quan, Alberto Sanfeliu, and Luc Van Gool (Eds.).
          <source>IEEE Computer Society</source>
          ,
          <fpage>2556</fpage>
          -
          <lpage>2563</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Martha</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Larson</surname>
            , Steven Alexander Hicks, Mihai Gabriel Constantin, Benjamin Bischke, Alastair Porter,
            <given-names>Peijian</given-names>
          </string-name>
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Mathias</given-names>
          </string-name>
          <string-name>
            <surname>Lux</surname>
          </string-name>
          , Laura Cabrera Quiros,
          <string-name>
            <surname>Jordan Calandre</surname>
          </string-name>
          , and Gareth Jones (Eds.).
          <source>2020. Working Notes Proceedings of the MediaEval 2019 Workshop</source>
          , Sophia Antipolis, France,
          <fpage>27</fpage>
          -
          <lpage>30</lpage>
          October
          <year>2019</year>
          .
          <source>CEUR Workshop Proceedings</source>
          , Vol.
          <volume>2670</volume>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Ang</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Meghana</given-names>
            <surname>Thotakuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David A.</given-names>
            <surname>Ross</surname>
          </string-name>
          , João Carreira, Alexander Vostrikov, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>The AVA-Kinetics Localized Human Actions Video Dataset</article-title>
          . CoRR abs/2005.00214 (2020).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Hsien-I Lin</surname>
          </string-name>
          , Zhangguo
          <string-name>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <surname>Yi-Chen Huang</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Ball Tracking and Trajectory Prediction for Table-Tennis Robots</article-title>
          .
          <source>Sensors</source>
          <volume>20</volume>
          ,
          <issue>2</issue>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Ruichen</surname>
            <given-names>Liu</given-names>
          </string-name>
          , Zhelong Wang, Xin Shi,
          <string-name>
            <given-names>Hongyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sen</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jie</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Ning</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Table Tennis Stroke Recognition Based on Body Sensor Network</article-title>
          .
          <source>In IDCS</source>
          <year>2019</year>
          , Naples, Italy,
          <source>October 10-12</source>
          ,
          <year>2019</year>
          ,
          <source>Proceedings (Lecture Notes in Computer Science)</source>
          , Raffaele Montella, Angelo Ciaramella, Giancarlo Fortino, Antonio Guerrieri, and Antonio Liotta (Eds.), Vol.
          <volume>11874</volume>
          . Springer,
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Zheng</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Haifeng</given-names>
            <surname>Hu</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Spatiotemporal Relation Networks for Video Action Recognition</article-title>
          .
          <source>IEEE Access</source>
          <volume>7</volume>
          (
          <year>2019</year>
          ),
          <fpage>14969</fpage>
          -
          <lpage>14976</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Pierre-Etienne</surname>
            <given-names>Martin</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jenny</surname>
          </string-name>
          Benois-Pineau, Boris Mansencal, Renaud Péteri, Laurent Mascarilla,
          <string-name>
            <surname>Jordan Calandre</surname>
            , and
            <given-names>Julien</given-names>
          </string-name>
          <string-name>
            <surname>Morlier</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Sports Video Annotation: Detection of Strokes in Table Tennis Task for MediaEval 2019</article-title>
          , See [
          <volume>11</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Pierre-Etienne</surname>
            <given-names>Martin</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jenny</surname>
            Benois-Pineau, Boris Mansencal, Renaud Péteri, and
            <given-names>Julien</given-names>
          </string-name>
          <string-name>
            <surname>Morlier</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Siamese Spatio-Temporal Convolutional Neural Network for Stroke Classification in Table Tennis Games</article-title>
          , See [
          <volume>11</volume>
          ].
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Pierre-Etienne</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jenny</given-names>
            <surname>Benois-Pineau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Renaud</given-names>
            <surname>Péteri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Julien</given-names>
            <surname>Morlier</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Sport Action Recognition with Siamese Spatio-Temporal CNNs: Application to Table Tennis</article-title>
          .
          <source>In CBMI 2018, La Rochelle, France, September 4-6, 2018</source>
          . IEEE,
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Pierre-Etienne</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jenny</given-names>
            <surname>Benois-Pineau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Renaud</given-names>
            <surname>Péteri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Julien</given-names>
            <surname>Morlier</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Optimal Choice of Motion Estimation Methods for Fine-Grained Action Classification with 3D Convolutional Networks</article-title>
          .
          <source>In IEEE ICIP 2019, Taipei, Taiwan, September 22-25, 2019</source>
          . IEEE,
          <fpage>554</fpage>
          -
          <lpage>558</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Pierre-Etienne</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jenny</given-names>
            <surname>Benois-Pineau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Renaud</given-names>
            <surname>Péteri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Julien</given-names>
            <surname>Morlier</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>3D attention mechanisms in Twin Spatio-Temporal Convolutional Neural Networks. Application to action classification in videos of table tennis games</article-title>
          .
          <source>In ICPR 2020, MiCo Milano Congress Center, Italy, 10-15 January 2021</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Pierre-Etienne</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jenny</given-names>
            <surname>Benois-Pineau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Renaud</given-names>
            <surname>Péteri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Julien</given-names>
            <surname>Morlier</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Fine grained sport action recognition with Twin spatiotemporal convolutional neural networks</article-title>
          .
          <source>Multim. Tools Appl</source>
          .
          <volume>79</volume>
          ,
          <issue>27-28</issue>
          (
          <year>2020</year>
          ),
          <fpage>20429</fpage>
          -
          <lpage>20447</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Juan Carlos</given-names>
            <surname>Niebles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chih-Wei</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Fei-Fei</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification</article-title>
          . In Computer Vision - ECCV 2010, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part II (Lecture Notes in Computer Science), Kostas Daniilidis, Petros Maragos, and Nikos Paragios (Eds.), Vol.
          <volume>6312</volume>
          . Springer,
          <fpage>392</fpage>
          -
          <lpage>405</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Dian</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yue</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bo</given-names>
            <surname>Dai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Dahua</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding</article-title>
          .
          <source>CoRR abs/2004.06704</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Khurram</given-names>
            <surname>Soomro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Amir Roshan</given-names>
            <surname>Zamir</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mubarak</given-names>
            <surname>Shah</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild</article-title>
          .
          <source>CoRR abs/1212.0402</source>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Siddharth</given-names>
            <surname>Sriraman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Srinath</given-names>
            <surname>Srinivasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Vishnu K.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Bhuvana</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T. T.</given-names>
            <surname>Mirnalinee</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>MediaEval 2019: LRCNs for Stroke Detection in Table Tennis</article-title>
          . See [11].
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Andrei</given-names>
            <surname>Stoian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marin</given-names>
            <surname>Ferecatu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jenny</given-names>
            <surname>Benois-Pineau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michel</given-names>
            <surname>Crucianu</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Fast Action Localization in Large-Scale Video Archives</article-title>
          .
          <source>IEEE Trans. Circuits Syst. Video Techn</source>
          .
          <volume>26</volume>
          ,
          <issue>10</issue>
          (
          <year>2016</year>
          ),
          <fpage>1917</fpage>
          -
          <lpage>1930</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Tabrizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pashazadeh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Javani</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Comparative Study of Table Tennis Forehand Strokes Classification Using Deep Learning and SVM</article-title>
          .
          <source>IEEE Sensors Journal</source>
          (
          <year>2020</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>1</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Gül</given-names>
            <surname>Varol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ivan</given-names>
            <surname>Laptev</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Cordelia</given-names>
            <surname>Schmid</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Long-Term Temporal Convolutions for Action Recognition</article-title>
          .
          <source>IEEE Trans. Pattern Anal. Mach. Intell</source>
          .
          <volume>40</volume>
          ,
          <issue>6</issue>
          (
          <year>2018</year>
          ),
          <fpage>1510</fpage>
          -
          <lpage>1517</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Roman</given-names>
            <surname>Voeikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nikolay</given-names>
            <surname>Falaleev</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Baikulov</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>TTNet: Real-time temporal and spatial video analysis of table tennis</article-title>
          .
          <source>CoRR abs/2004.09927</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Jiachen</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kejian</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dazhen</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anqi</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiao</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zheng</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hui</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yingcai</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Tac-Simur: Tactic-based Simulative Visual Analytics of Table Tennis</article-title>
          .
          <source>IEEE Trans. Vis. Comput. Graph</source>
          .
          <volume>26</volume>
          ,
          <issue>1</issue>
          (
          <year>2020</year>
          ),
          <fpage>407</fpage>
          -
          <lpage>417</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Erwin</given-names>
            <surname>Wu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hideki</given-names>
            <surname>Koike</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>FuturePong: Real-time Table Tennis Trajectory Forecasting using Pose Prediction Network</article-title>
          .
          <source>In CHI 2020, Honolulu, HI, USA</source>
          , Regina Bernhaupt, Florian 'Floyd' Mueller, David Verweij, Josh Andres, Joanna McGrenere, Andy Cockburn, Ignacio Avellino, Alix Goguey, Pernille Bjørn, Shengdong Zhao, Briane Paul Samson, and Rafal Kocielnik (Eds.). ACM,
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Kun</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hanyu</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Menghan</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zheng</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sheng</given-names>
            <surname>He</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yusong</given-names>
            <surname>Tang</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Racquet Sports Recognition Using a Hybrid Clustering Model Learned from Integrated Wearable Sensor</article-title>
          .
          <source>Sensors</source>
          <volume>20</volume>
          ,
          <issue>6</issue>
          (
          <year>2020</year>
          ),
          <fpage>1638</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>