On the Impact of Computer Vision Algorithms on
Sport Training Automation: Proof of Concept for
Shadow Boxing Virtual Instructor
Ilya Makarov (1, 2), Stanislav Petrov (1)
(1) HSE University, Moscow, Russia, Pokrovsky boulevard 11, 109028 Moscow, Russian Federation
(2) Artificial Intelligence Research Institute (AIRI), Moscow, Russia, Nizhny Susalny lane 5 p. 19, 105064 Moscow, Russian Federation


Abstract
We overview several XR applications of deep convolutional neural networks that open an opportunity
for automating the sports training process. From smart soccer balls and basketballs to automated
fitness training programs, the progress of computer vision methods combined with personalized
recommender systems and specific data science algorithms allows non-contact training to be conducted
in a semi-automated manner with virtual mirrors and XR devices. We overview modern progress in this
area and present our own prototype of a Shadow Boxing simulator for a Virtual Mirror, which aims to
cover part of a boxer's training with automated control and gamification. We show that the trend of
automating training instructors in sport leads to a positive shift in how sportsmen and trainees view
artificial intelligence in everyday life.

Keywords
Computer Vision, Mixed Reality, Automated Sports Instructor, Boxing Simulator


1. Introduction
Nowadays, mixed reality (MR) and artificial intelligence (AI) allow humans to engage in different
types of sport activities that previously required human-to-human interaction but can now
be replaced with training in virtual, augmented, or mixed reality. The idea behind this is
quite simple: usually, the instructor repeats sequences of commands that are limited in
number and length, and such a small world of possible actions allows AI specialists to train a
model for sport and fitness exercises while automatically personalizing the training
program based on each person's progress. An overview of existing applications is
presented in Table 1. Although there are commercial products in this field, it is still open to
market changes and new startups, while awaiting significant improvements in the computer
vision and life-long learning algorithms that support automated sports training systems.


AISMA-2021: International Workshop on Advanced in Information Security Management and Applications, Stavropol,
Krasnoyarsk, Russia, October 1, 2021
Email: iamakarov@hse.ru (I. Makarov); stasdp@mail.ru (S. Petrov)
ORCID: 0000-0002-3308-8825 (I. Makarov)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073


Table 1
Table of Smart Fitness Simulators based on Computer Vision (CV) or Data Science (DS) techniques.
       Name                                                 CV/DS   Product/Prototype   Reference
       Smartspot (fitness virtual mirror)                   CV      Prototype           [1]
       D.Gym (gym tracking and analytics)                   CV      None                [2]
       Fitbod (recommending exercises)                      DS      Product             [3]
       Phormatics (exercise pose evaluation)                CV      Prototype           [4]
       Perch (barbell exercises)                            CV      Prototype           [5]
       EPAM Smart Gym (barbell)                             CV      Prototype           [6]
       Smart Balls by DU (soccer/basketball training)       CV      Product             [7]
       Just Dance (dancing video game)                      CV      Product             [8]


2. Core Concepts in Automated Instructors
To implement a working prototype of a virtual avatar that trains a sportsman, one needs to
solve the following problems:

    • to develop RGBD body and skeleton motion segmentation [9] and tracking [10], usually
      used for aerobics/dance exercises (see [8] and footnote 1; in the video, the upper left
      corner shows the Human Figure Segmentation, based on which each body part movement is
      scored and then visualized in-game);
    • to implement visualization in computer graphics engines, for example, Unreal Engine 4
      or Unity [11];
    • to analyze data from wearable smart devices for exercises on gym equipment, for example, [5],
      to track heartbeat, velocity, and weight (see the simple prototype in footnote 2), and to
      visualize right and wrong points using human pose estimation (footnote 3) [12, 13, 14];
    • to train a multi-track recommendation system similar to [3] (footnote 4) that personalizes
      the training experience [15].

  In what follows, we describe one such system (work in progress) designed for a particular
type of boxing training called ‘Shadow Boxing’, which requires imagining a virtual
enemy and working on boxing techniques without haptic interaction with a partner or instructor.


3. Proposal of Shadow Boxing MR App
Every boxer uses shadow boxing during training. Beginners should start warming up with
shadow boxing for at least 15 minutes. Usually, it takes 30 minutes for regular boxers and 60
minutes for professionals.

   1 https://www.youtube.com/watch?v=PvZA8NKgrBI
   2 https://youtu.be/3CSzFhabZ3g
   3 https://github.com/cbsudux/awesome-human-pose-estimation
   4 https://www.youtube.com/watch?v=yOizW6130QY


Figure 1: Concept of Boxing in VR


   One of the main conditions for evaluating Shadow Boxing is that sportsmen should
not get tired while shadow boxing. Another reason to train with shadow boxing
is to develop weak back muscles, whose low endurance may become crucial if a
boxer who has trained only with a bag is poorly adapted to missing many punches in a real fight. The
main idea is that a boxer has to work on improving their imagination of the fight, developing
so-called “muscle memory”, and pursuing specific goals during shadow boxing, which can be
done anywhere at any time.
   The main objectives of Shadow Boxing can be divided into improvements of several key
parameters:
   1. boxing technique, offense, and defense (complex):
        a) correct hits (tracking, CV)
        b) balance (tracking, CV)
        c) combinations of hits as simple techniques (tracking, CV)

   2. overall fighting abilities:
        a) strength (sensor)
        b) power (acceleration, CV)
        c) speed (speed, CV)
        d) endurance (time series, CV & DS)
        e) footwork (tracking, CV)
        f) rhythm (complex CV, unknown)


4. Related Work
4.1. Existing Boxing Applications
In the first part, we overview existing products and prototypes in the area of Boxing simulators.


4.1.1. Patents and Existing Prototypes in Boxing
Robotic boxing punching bags were presented in a patent [16] and as a real-world prototype [17].
Both rely on interaction with a physical bag to hit. The scientific explanation of the suggested
models can be found in the corresponding patent and in work on punch measurement [18].

4.1.2. Sensor application using electrodes for imitating hit feeling
In [19], a similar idea of haptic feedback was used to reconstruct hand and finger
movement from a myogram, tracking muscle activity in the middle of the forearm, between the
hand and the elbow.

4.1.3. VR applications
Currently, there are two popular video games: [20] (footnote 5) and [21] (footnote 6). The first
provides the experience of a virtual punching bag, ordered hit sequences, slow motion, and other
gamified moments simulating a boxing match; the second offers mitt-hitting training without a
partner. Both applications lack realism in punching training and interaction with both a virtual
enemy and a virtual trainer. Moreover, they prioritize the entertainment model of immersion in
video games over the training of professional and novice boxers via human-computer interaction in VR.

4.2. Computer Vision Research on Boxing Recognition
Existing research on boxing action recognition and related topics is mostly connected to
human pose estimation and segmentation, as well as boxing hit recognition.
   In [22], the authors presented a punch recognition system based on prior knowledge of
boxing punches. The authors of [23] presented a robust framework to recognize fine-grained
boxing punches from specifically posed depth images taken from a head/ceiling view.
   An overview of the application of deep learning to RGB-D-based motion recognition (RGB-based,
depth-based, skeleton-based, and RGB+D-based) can be found in [24]. The game and
sports action recognition dataset closest to our task was presented in [25]. A survey on
deep learning for sport-specific movement recognition describes advances in the
field of sports action recognition [26].
   In short, there are datasets that allow recognition of certain types of the simplest
punches and human movements, while also estimating parameters of such punches, such as
speed, acceleration, and exact 3D locations of joints. However, there is no comprehensive study
or available dataset of boxing techniques, combinations, and parameters of such training. In this
situation, boxing instructors and CV specialists will need to work together to combine
the technologies with expert domain knowledge.


   5 https://www.youtube.com/watch?v=7zuKLu7dOng
   6 https://www.youtube.com/watch?v=n51DqCpqGSU


5. Our Vision of Shadow Boxing Application
We now describe in detail our vision of Shadow Boxing gamification and the implementation of
human-computer interaction with the Virtual Boxing Instructor.
   We aim to create a simple Boxer avatar visualized on a virtual mirror that authenticates the
person (possibly based on biometrics). First, the system must be calibrated by asking
the user to make simple movements, single hits, and short combinations in order to adapt the hit
scoring and tracking systems to the user.
   We visualize the animation of the correct combination and the user’s performance, showing the
differences and mistakes in the process of training, as follows. An animated avatar shows
movements and the user repeats them. After a certain sequence is executed, the sportsman sees
his score, and a slow-motion replay of his action is compared with a virtual avatar executing
the same motion sequence. The differences between the recorded movements of the avatar and the
user are visualized on the screen and turned into advice on how to overcome them.
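   A minimal sketch of how the avatar-versus-user difference could be quantified is given here.
It assumes that both motions are available as arrays of 3D joint positions with shape
(frames, joints, 3); the function names, the linear resampling, and the per-joint mean
Euclidean distance are illustrative assumptions, not the method of the prototype.

```python
import numpy as np

def resample(seq: np.ndarray, n_frames: int) -> np.ndarray:
    """Linearly resample a (T, J, 3) joint sequence to n_frames frames."""
    t_old = np.linspace(0.0, 1.0, seq.shape[0])
    t_new = np.linspace(0.0, 1.0, n_frames)
    flat = seq.reshape(seq.shape[0], -1)
    cols = [np.interp(t_new, t_old, flat[:, k]) for k in range(flat.shape[1])]
    return np.stack(cols, axis=1).reshape(n_frames, *seq.shape[1:])

def per_joint_error(user: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Mean Euclidean distance per joint between user and reference motion."""
    user = resample(user, reference.shape[0])  # align lengths (assumption)
    return np.linalg.norm(user - reference, axis=-1).mean(axis=0)

def worst_joints(errors: np.ndarray, names: list, k: int = 2) -> list:
    """Names of the k joints that deviate most from the reference."""
    order = np.argsort(errors)[::-1][:k]
    return [names[i] for i in order]
```

The joints returned by `worst_joints` could then be highlighted in the slow-motion replay; a
time-warping alignment might be preferable to plain resampling when the user's tempo varies.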
   Depending on the boxer's level, to master certain combinations we may divide the sequence into
well-executed moves and problem moves; the latter may be trained separately in slower motion and
later combined into one sequence after each sub-part is mastered. Each simple and combined
boxing sequence should be defined together with a boxing instructor or be based on a
boxing training manual.
   Every task is considered as a sequence of simple moves and punches initiated by a timer or sound
and visualized as a sequence of cheat sheets while the boxer performs it. For
example, the combination “hit left - hit left - jab right - evade left - move right” is evaluated
based on the position, speed, acceleration, and depth parameters of each movement, in parallel with
recognition of each movement from one or several RGBD cameras, which can be placed in front of,
around, or above the sportsman. The collection of a new dataset will be required, depending on the
complexity of the prototype and the different boxing punch modalities.
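   As an illustration of the speed and acceleration parameters mentioned above, the following
sketch estimates them from a tracked joint trajectory by finite differences. It assumes
wrist positions in metres sampled at a constant 30 fps; the function name and these units are
assumptions made for the example.

```python
import numpy as np

def speed_and_acceleration(positions: np.ndarray, fps: float = 30.0):
    """Estimate per-frame speed (m/s) and acceleration magnitude (m/s^2).

    positions: (T, 3) array of one joint's 3D positions, one row per frame.
    """
    dt = 1.0 / fps                                    # assumed constant frame interval
    velocity = np.gradient(positions, dt, axis=0)     # central differences in time
    acceleration = np.gradient(velocity, dt, axis=0)
    return np.linalg.norm(velocity, axis=1), np.linalg.norm(acceleration, axis=1)

# Usage: peak speed of a synthetic straight punch moving at 2 m/s along x.
t = np.arange(10) / 30.0
trajectory = np.stack([2.0 * t, np.zeros_like(t), np.zeros_like(t)], axis=1)
speed, accel = speed_and_acceleration(trajectory)
peak_speed = float(speed.max())
```

Real recordings with dropped or frozen frames would need per-frame timestamps instead of a
fixed `dt`, and some smoothing before differentiation.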
   Based on the boxer's data, we can record his performance, evaluate his parameters (endurance,
speed, balance, transitions from defense to offense, etc.), and recommend that he train the
sequences in which he lacks skills. At the first step, these will be hardcoded rules based on boxing
theory; later, we will apply data mining techniques to the collected data to provide
self-supervised recommendations.
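   The first, rule-based step could look like the sketch below. The parameter names, the
normalised 0 to 1 scores, the thresholds, and the drill names are all hypothetical placeholders;
real rules would come from a boxing instructor or a training manual.

```python
from dataclasses import dataclass

@dataclass
class SessionStats:
    """Normalised scores in [0, 1] for one training session (hypothetical)."""
    speed: float
    balance: float
    endurance: float

# Hypothetical hardcoded rules: (parameter, minimum score, drill to recommend).
RULES = [
    ("speed", 0.6, "fast single jabs"),
    ("balance", 0.7, "footwork with slow combinations"),
    ("endurance", 0.5, "long low-intensity rounds"),
]

def recommend(stats: SessionStats, rules=RULES) -> list:
    """Return the drills for every parameter that falls below its threshold."""
    return [drill for attr, threshold, drill in rules
            if getattr(stats, attr) < threshold]
```

For example, `recommend(SessionStats(speed=0.5, balance=0.9, endurance=0.3))` selects the speed
and endurance drills and skips the balance drill.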
   As for implementation, we start with person tracking and identification and with training deep
learning models for human 3D skeleton reconstruction (pose estimation), so that the reconstruction
is coherent with the boxer's movement on video and the avatar can be shown repeating these actions.
Then we deal with boxing hit recognition (hook, jab, etc., for which there are no open datasets
available) from RGBD video and with evaluation of the speed and power parameters of how the hits
should be done, so that the system will be able to recognize simple hits and their combinations,
score each hit based on its internal parameters, and show how it should have been properly done.
Finally, we aim to develop a virtual enemy avatar against which the training boxer should develop
his counter hits and movement, first with virtual hints and later just by observing the enemy's
movements. The resulting application will support these two modes of training: repeating hits and
knowing what hits to throw against the virtual enemy.
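   One simple way to score a recognized combination against the target one is an edit distance
over action labels, sketched here. Using Levenshtein distance and normalising by the target
length are assumptions for illustration; the text does not prescribe a scoring rule.

```python
def edit_distance(a: list, b: list) -> int:
    """Levenshtein distance between two sequences of action labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # action missing from b
                            curr[j - 1] + 1,          # extra action in b
                            prev[j - 1] + (x != y)))  # substituted action
        prev = curr
    return prev[-1]

def combination_score(recognized: list, target: list) -> float:
    """Score in [0, 1]: 1.0 for an exact match, lower for each wrong move."""
    if not target:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1.0 - edit_distance(recognized, target) / len(target))

# Usage with the example combination from the text, one move skipped.
target = ["hit left", "hit left", "jab right", "evade left", "move right"]
performed = ["hit left", "jab right", "evade left", "move right"]
score = combination_score(performed, target)
```

A per-hit quality score based on speed or acceleration could be combined with this
sequence-level score.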


6. Development and Release Description for Shadow Boxing Training Application
Below, we describe the structure of the input information, the problems in AI and CV to be solved,
and the gradual increase in functionality of the Shadow Boxing application. We also envision
the development of subsequent updates to Shadow Boxing Training based on increased use of
existing boxing fight datasets and MR head-mounted displays.

6.1. Input Channels
A combination of different inputs should be presented for a successful Demo integrated
into a working prototype. The user has to be properly guided to the current goals, see the results
of training, receive feedback, and not be disappointed by mistakes of the recognition module,
which requires very robust and precise algorithms to be developed and applied to the boxing domain.
   We consider the combination of the following input channels:

    • Graphical avatar, tracking the position of the human and providing personal training based
      on the input parameters of the training mode and the current level;
    • Sound input into headphones for instructor commands, especially during combination
      movements, during which the boxer cannot properly see the whole picture but has to
      respond fast to possible motion errors or react to virtual enemy actions;
    • AR interface to visualize the virtual enemy and make training for a real fight possible;
    • Possibly, electrode-based sensors for simulating recoil feedback and the sensory feeling of
      touching/punching, which may be tested.

6.2. Problems Pipeline to be Solved
We formulate the core computer vision and machine learning problems to be solved in
order to make the prototype of Shadow Boxing training sufficient for evaluation in real-world
scenarios with professional boxers, together with a complexity estimate for each based on
state-of-the-art results in the related fields.
   1. Computer Vision
        a) Person Identification (simple)
        b) Person Tracking, including reidentification in occluded scenarios (middle)
        c) Depth Person Segmentation (middle)
        d) Depth Person Skeleton Reconstruction (middle-hard)
        e) Boxer Action Recognition (middle-hard)
        f) Generating Avatar Movement based on collected data (hard)
        g) Parsing boxing TV videos and Enemy reconstruction (very hard)
        h) Programming game AI for Boxing sparring (state-of-art)

   2. Machine Learning
        a) Robust measurement of punch parameters (simple)
        b) Measuring human performance during series of training (middle)
        c) Recommending simple moves to work on (simple)
        d) Recommending long sequences for skills improvement (middle)
        e) Boxing Technique learning as a list of sequences to be learned combined with proper
           evaluation of sub-part movements (hard, state-of-art)
  In addition, Reinforcement Learning methods may be applied to reverse engineer agents’
behavior, thus training virtual agents to perform actions by receiving rewards for correct
actions recognized by a computer vision encoder, similar to environments for training agents in
3D shooter games [27, 28, 29, 30, 31, 32] or 2D interaction games [33, 34, 35].

6.3. AR Helmet Training (Demo v1.5)
As a continuation of the previous work, we aim to work on a prototype of a Mixed Reality
application in which the boxer wears an optical see-through head-mounted display allowing
him to see a virtual enemy or training targets in Augmented Reality. We plan to add the following
elements based on the previous description.
    • Implementation of the simplest training exercises, similar to the VR Creed Boxing game:
      hitting certain areas in front of the boxer in the corresponding sequence, hitting a virtual
      bag to hit other targets, evading, moving quickly in and out of the enemy zone, etc.
    • Virtual bots boxing against a person, with health bar measures, and training of certain
      boxing techniques to counter them, such as a training mode against a Southpaw.

6.4. TV highlighting mode (Demo v2.0)
This is a standalone application that recognizes and parses boxing matches, visualizes highlights,
and recognizes the sequence of actions in order to later map them to predefined short scripted
avatar scenes for learning in the AR boxing demo. It could be used for TV parsing and slow-motion
reconstructions of highlighted events.
  The p

6.5. Personalized enemy for Shadow Boxing Sparring in AR (Demo v2.5)
After we learn how to parse TV videos of boxing matches, we could collect data on a specific
person and implement his skills and techniques in an in-game AI prototype, which will later
be used as an opponent in AR for training against an exact type of opponent with known skills,
parameters, and techniques.
  This last task is challenging and, at the moment, nobody is attempting it, but the
current progress in neural networks and human action recognition allows us to state that
this task could be solved within the Machine Learning and Game Artificial Intelligence fields.


7. Example of implementation from scratch
One of the main goals is to create a neural network that can distinguish actions that differ
only slightly from each other. Also, the domain of our work, boxing, is a very narrow area for
action recognition. Thus, there is no dataset with different boxing punches where the boxer stands
right in front of the camera.
   We aim to create a domain-specific network that could help boxers in their training. Given
that, we can assume some limitations on model usage:

    • The presence of only one person in the video
    • This person is 2-3 meters away in front of the camera
    • Boxing is performed in direction of the camera

   In this work, we wanted to create a dataset with as many features as possible. There are
several ways of doing that: recording video and using different neural networks to
extract pose, mask, depth, etc., or recording video with a special camera that writes depth,
IR, pose, and mask data. The first approach is straightforward and accurate: using video
and modern neural networks, we can collect the pose estimation, depth, and mask of the human,
but this approach is computationally expensive for detailed annotation of streaming data, despite
the usefulness of the methods proposed in [36, 37]. It can be done efficiently without solving the
mask problem if only depth information is required [38, 39, 40, 41, 42, 43, 44, 45, 46, 47]. In
addition, pose estimation may be solved efficiently by graph neural networks, which are of great
use for extracting and preserving structural dependencies. Our works in the domain of graph
feature engineering can be found in [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63].
   The latter approach is more expensive than the former but gives a human pose, the mask of
a person, and a depth map in near real-time and does not require the sportsman to wear additional
sensors. As a trade-off for our task, we chose the latter approach.
   For dataset collection purposes, we used an Orbbec Astra Pro camera. It can capture RGB image
data at 1280×780 resolution and depth images at 640×480, both at 30 fps. It connects to a
computer via USB 2.0.
   Orbbec developed the Astra SDK for body tracking. We used it to develop a program that, in
separate threads, reads and writes to the corresponding files: an RGB video of the action, a depth
video, a mask of the person's body together with a mask of the floor, and pose keypoints.
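   The threaded recording layout described above can be sketched as one writer thread per stream,
each fed through its own queue. This is only a structural sketch: it makes no camera SDK calls,
the frames are stand-in byte strings, and the stream names, file names, and function names are
hypothetical.

```python
import os
import queue
import threading

STOP = object()  # sentinel telling a writer thread to finish

def writer(stream_queue: queue.Queue, path: str) -> None:
    """Consume encoded frames from one stream and append them to its file."""
    with open(path, "wb") as f:
        while True:
            item = stream_queue.get()
            if item is STOP:
                break
            f.write(item)

def record(frames, out_dir: str, streams=("rgb", "depth", "mask", "pose")) -> dict:
    """frames: iterable of dicts mapping stream name -> bytes for one capture."""
    queues = {name: queue.Queue() for name in streams}
    paths = {name: os.path.join(out_dir, name + ".bin") for name in streams}
    threads = [threading.Thread(target=writer, args=(queues[n], paths[n]))
               for n in streams]
    for t in threads:
        t.start()
    for frame in frames:            # the reading loop stays in the calling thread
        for name in streams:
            queues[name].put(frame[name])
    for name in streams:
        queues[name].put(STOP)
    for t in threads:
        t.join()
    return paths
```

Keeping one queue per stream means a slow disk write for one stream does not block capture of
the others, at the cost of buffering frames in memory.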
   For this work, we collected video recordings of four basic boxer actions: Defense, Cross
punch, Hook, and Uppercut. Each class contains 200 samples of the author boxing 2-3 meters
away in front of the camera. The actions were captured in one location, in different clothes, and
with different techniques (with and without wrist rotation). All punches were performed with the
right hand.
   Data preparation can be split into 3 stages:
    • The input video is split manually into 32-frame clips, each representing one boxing punch
    • Each clip is cropped to 480×480 resolution by cutting from the right and left sides
    • The cropped clip is resized to 224×224 resolution
   Data augmentation. Three types of data augmentation were performed. At the second step of
data preparation, the video was randomly cropped from the right by an offset in the 60 to 100
pixel range. Temporal augmentation. For each 32-frame clip, a random bias was drawn from the
ranges -8 to -4 and 4 to 8. This bias was used to produce another 32-frame clip whose starting
point is the original starting point plus the bias.
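   The preparation and augmentation steps can be sketched with NumPy as follows. The sketch assumes
640-pixel-wide frames stored as a (frames, height, width, channels) array, treats the pixel and
bias ranges as inclusive, and uses nearest-neighbour resizing for brevity; the function names
are illustrative.

```python
import numpy as np

def crop_width(clip: np.ndarray, right_cut: int, size: int = 480) -> np.ndarray:
    """Cut right_cut pixels from the right and the remainder from the left."""
    left = clip.shape[2] - size - right_cut
    return clip[:, :, left:left + size]

def resize_nearest(clip: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize of every frame to size x size."""
    rows = np.arange(size) * clip.shape[1] // size
    cols = np.arange(size) * clip.shape[2] // size
    return clip[:, rows][:, :, cols]

def prepare_clip(clip: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random right crop in the 60 to 100 pixel range, then resize to 224x224."""
    right_cut = int(rng.integers(60, 101))   # upper bound is exclusive
    return resize_nearest(crop_width(clip, right_cut))

def shifted_start(start: int, rng: np.random.Generator) -> int:
    """Temporal augmentation: shift the clip start by a bias in [-8,-4] or [4,8]."""
    bias = int(rng.integers(4, 9)) * int(rng.choice([-1, 1]))
    return start + bias   # caller must keep the 32-frame window inside the video
```

With a 640-pixel width, cutting 60 to 100 pixels from the right leaves 100 to 60 pixels to cut
from the left, so the 480-pixel window always fits.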


   In fact, the Astra camera recorded video with some freezes, and not all samples were captured
at 30 fps, but since the actions were performed at different speeds, this had no effect on the
collected data.
   In this work, we chose the multi-stream I3D network [36]: I3D performs well
on the Kinetics dataset, is not as large as C3D, and does not need to sample frames from the video.
We omitted optical flow as an input, using depth flow instead, and did not use
an encoder pretrained on ImageNet. Training was performed with a batch size
of 2 for 1 epoch. The test dataset contained 30% of all data. The size of the train dataset
was 2750. Training was end-to-end with a Cross-Entropy loss on the sum of the logits of all
streams. We trained the network in 3 setups: all four streams; depth, pose, and mask streams;
depth and pose streams. After training, we obtained an average accuracy of around 95% on trimmed
videos for the test and validation sets. That could be explained by overfitting on a small dataset.
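   The loss on the sum of stream logits can be written out explicitly. The NumPy sketch below is
framework-independent and only illustrates the fusion and loss computation for a batch of 2 and
4 classes; it is not the training code, and the function name is an assumption.

```python
import numpy as np

def fused_cross_entropy(stream_logits, labels):
    """Cross-entropy on the element-wise sum of per-stream logits.

    stream_logits: list of (batch, classes) arrays, one per input stream.
    labels: (batch,) array of integer class indices.
    Returns the mean loss and the predicted class per sample.
    """
    fused = np.sum(stream_logits, axis=0)                  # late fusion by summation
    shifted = fused - fused.max(axis=1, keepdims=True)     # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(labels)), labels].mean()
    return float(loss), fused.argmax(axis=1)

# Usage: two streams, batch of 2, four classes (Defense, Cross, Hook, Uppercut).
labels = np.array([0, 3])
uninformative = [np.zeros((2, 4)), np.zeros((2, 4))]
loss_uniform, _ = fused_cross_entropy(uninformative, labels)   # equals ln(4)
```

Because the logits are summed before the softmax, a single loss value drives all streams
jointly, which is what end-to-end training of the fused network relies on.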
   The results demonstrate that the I3D net successfully classified all actions from the test
dataset. To improve the results, an augmented extended dataset was recorded and tested, providing
more robust and believable results for shadow boxing simulation [64].


8. Conclusion
We overviewed state-of-the-art research and production projects in the field of automating
fitness and sports training. We showed that, despite the current advances in the deep learning
field, the technological aspects of creating a production-ready prototype remain quite
challenging.
   We further discussed the particular problem of developing a computer vision based simulator
for Shadow Boxing and the problems arising during this development. We envision the future
progress of such a simulator based on current deep learning and computer vision techniques
and suggest a concrete scenario for increasing the complexity and usability of such systems in the
boxer training process.
   We aim to show the first version of the prototype during the conference and to collect feedback
both in terms of a user study and in terms of the algorithm accuracy affecting the whole simulator.
We also aim to measure human perception of whether such a system may substitute for a human boxing
instructor in non-contact sports training.


References
 [1] Smartspot, https://smartspot.io, 2017.
 [2] D.Gym, http://dohyungkim.com/dgym/, 2017.
 [3] Fitbod, https://fitbod.me, 2018.
 [4] Phormatics, https://github.com/jrobchin/phormatics, 2018.
 [5] Perch, https://www.perch.fit/, 2018.
 [6] EPAM, https://bit.ly/2uWmxUm, 2019.
 [7] Smart Balls, https://buy.dribbleup.com/collection/smart-balls, 2019.
 [8] Just Dance, https://www.ubisoft.com/en-us/game/just-dance-2019, 2019.


 [9] L. L. Presti, M. La Cascia, 3d skeleton-based human action classification: A survey, Pattern
     Recognition 53 (2016) 130–147.
[10] Z.-Q. Cheng, Y. Chen, R. R. Martin, T. Wu, Z. Song, Parametric modeling of 3d human
     body shape—a survey, Computers & Graphics 71 (2018) 88–100.
[11] C. George, M. Spitzer, H. Hussmann, Training in ivr: investigating the effect of instructor
     design on social presence and performance of the vr user, in: Proceedings of the 24th
     ACM Symposium on Virtual Reality Software and Technology, ACM, New York,
     USA, 2018, p. 27.
[12] D. Mehta, S. Sridhar, O. Sotnychenko, H. Rhodin, M. Shafiei, H.-P. Seidel, W. Xu, D. Casas,
     C. Theobalt, Vnect: Real-time 3d human pose estimation with a single rgb camera, ACM
     Transactions on Graphics (TOG) 36 (2017) 44.
[13] H. Fang, S. Xie, Y.-W. Tai, C. Lu, Rmpe: Regional multi-person pose estimation, in: The
     IEEE International Conference on Computer Vision (ICCV), volume 2, IEEE, New York,
     USA, 2017, pp. 1–10.
[14] G. Varol, D. Ceylan, B. Russell, J. Yang, E. Yumer, I. Laptev, C. Schmid, Bodynet: Volumetric
     inference of 3d human body shapes, arXiv preprint arXiv:1804.04875 (2018) 1–27.
[15] T. T. Tran, J. W. Choi, C. Van Dang, G. SuPark, J. Y. Baek, J. W. Kim, Recommender
     system with artificial intelligence for fitness assistance system, in: 2018 15th International
     Conference on Ubiquitous Robots (UR), IEEE, New York, USA, 2018, pp. 489–492.
[16] Boxing buddy system, https://patents.google.com/patent/US9586120B1/en, 2014.
[17] BotBoxer, http://botboxer.com/, 2018.
[18] S. Chadli, N. Ababou, A. Ababou, A new instrument for punch analysis in boxing, Procedia
     Engineering 72 (2014) 411 – 416. URL: http://www.sciencedirect.com/science/article/pii/
     S187770581400589X. doi:https://doi.org/10.1016/j.proeng.2014.06.073, the
     Engineering of Sport 10.
[19] P. Lopes, A. Ion, P. Baudisch, Impacto: Simulating physical impact by combining tactile
     stimulation with electrical muscle stimulation, in: Proceedings of the 28th Annual ACM
     Symposium on User Interface Software & Technology, UIST ’15, ACM, New York, NY,
     USA, 2015, pp. 11–19. URL: http://doi.acm.org/10.1145/2807442.2807443.
     doi:10.1145/2807442.2807443.
[20] Creed: Rise to Glory, https://survios.com/creed/, 2018.
[21] The Fastest Fist, https://store.steampowered.com/app/544540/The_Fastest_Fist/, 2018.
[22] S. Kasiri-Bidhendi, C. Fookes, S. Morgan, D. T. Martin, S. Sridharan, Combat sports
     analytics: Boxing punch classification using overhead depth imagery, in: 2015 IEEE
     International Conference on Image Processing (ICIP), IEEE, New York, USA, 2015, pp.
     4545–4549. doi:10.1109/ICIP.2015.7351667.
[23] S. Kasiri, C. Fookes, S. Sridharan, S. Morgan, Fine-grained action recognition of boxing
     punches from depth imagery, Computer Vision and Image Understanding 159 (2017)
     143 – 153. URL: http://www.sciencedirect.com/science/article/pii/S1077314217300668.
     doi:https://doi.org/10.1016/j.cviu.2017.04.007, computer Vision in Sports.
[24] P. Wang, W. Li, P. Ogunbona, J. Wan, S. Escalera, Rgb-d-based human motion recognition
     with deep learning: A survey, Computer Vision and Image Understanding 171 (2018)
     118 – 139. URL: http://www.sciencedirect.com/science/article/pii/S1077314218300663.
     doi:https://doi.org/10.1016/j.cviu.2018.04.007.
[25] V. Bloom, D. Makris, V. Argyriou, G3d: A gaming action dataset and real time action
     recognition evaluation framework, in: Computer Vision and Pattern Recognition Workshops
     (CVPRW), 2012 IEEE Computer Society Conference on, IEEE, New York, USA, 2012,
     pp. 7–12.
[26] E. E. Cust, A. J. Sweeting, K. Ball, S. Robertson, Machine and deep learning for sport-specific
     movement recognition: a systematic review of model development and performance,
     Journal of Sports Sciences 0 (2018) 1–33. doi:10.1080/02640414.2018.1521769, PMID:
     30307362.
[27] I. Makarov et al., First-person shooter game for virtual reality headset with advanced
     multi-agent intelligent system, in: Proceedings of the 24th ACM international conference
     on Multimedia, 2016, pp. 735–736.
[28] I. Makarov, M. Tokmakov, L. Tokmakova, Imitation of human behavior in 3d-shooter game,
     AIST’2015 Analysis of Images, Social Networks and Texts (2015) 64.
[29] I. Makarov et al., Modelling human-like behavior through reward-based approach in a
     first-person shooter game, in: EEML Proceedings, 2016, pp. 24–33.
[30] I. Makarov, P. Polyakov, R. Karpichev, Voronoi-based path planning based on visibility
     and kill/death ratio tactical component, in: Proceedings of AIST, 2018, pp. 129–140.
[31] I. Makarov, P. Polyakov, Smoothing voronoi-based path with minimized length and
     visibility using composite bezier curves, in: AIST (Supplement), 2016, pp. 191–202.
[32] I. Makarov, O. Konoplia, P. Polyakov, M. Martynov, P. Zyuzin, O. Gerasimova, V. Bodish-
     tianu, Adapting first-person shooter video game for playing with virtual reality headsets.,
     in: FLAIRS Conference, 2017, pp. 412–416.
[33] I. Makarov, D. Savostyanov, B. Litvyakov, D. I. Ignatov, Predicting winning team and
     probabilistic ratings in “dota 2” and “counter-strike: Global offensive” video games, in:
     Proceedings of AIST, Springer, 2017, pp. 183–196.
[34] I. Kamaldinov, I. Makarov, Deep reinforcement learning in match-3 game, in: Proceedings
     of CoG, IEEE, 2019, pp. 1–4.
[35] I. Kamaldinov, I. Makarov, Deep reinforcement learning methods in match-3 game, in:
     Proceedings of AIST, Springer, 2019, pp. 51–62.
[36] J. Hong, B. Cho, Y. W. Hong, H. Byun, Contextual action cues from camera sensor for
     multi-stream action recognition, Sensors 19 (2019) 1382.
[37] K. Lomotin, I. Makarov, Automated image and video quality assessment for computational
     video editing, in: International Conference on Analysis of Images, Social Networks and
     Texts, Springer, 2020, pp. 243–256.
[38] I. Makarov, V. Aliev, O. Gerasimova, Semi-dense depth interpolation using deep
     convolutional neural networks, in: Proceedings of the 2017 ACM on Multimedia Conference,
     ACM, 2017, pp. 1407–1415.
[39] I. Makarov, D. Maslov, O. Gerasimova, V. Aliev, A. Korinevskaya, U. Sharma, H. Wang,
     On reproducing semi-dense depth map reconstruction using deep convolutional neural
     networks with perceptual loss, in: Proceedings of the 27th ACM International Conference
     on Multimedia, 2019, pp. 1080–1084.
[40] D. Maslov, I. Makarov, Online supervised attention-based recurrent depth estimation from
     monocular video, PeerJ Computer Science 6 (2020) e317.


[41] I. Makarov, V. Aliev, O. Gerasimova, P. Polyakov, Depth map interpolation using
     perceptual loss, in: Mixed and Augmented Reality (ISMAR-Adjunct), 2017 IEEE International
     Symposium on, IEEE, 2017, pp. 93–94.
[42] A. Korinevskaya, I. Makarov, Fast depth map super-resolution using deep neural network,
     in: 2018 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-
     Adjunct), IEEE, 2018, pp. 117–122.
[43] I. Makarov, A. Korinevskaya, V. Aliev, Fast semi-dense depth map estimation, in:
     Proceedings of the 2018 ACM Workshop on Multimedia for Real Estate Tech, ACM, NY,
     USA, 2018, pp. 18–21.
[44] I. Makarov, A. Korinevskaya, V. Aliev, Sparse depth map interpolation using deep
     convolutional neural networks, in: 2018 41st International Conference on
     Telecommunications and Signal Processing (TSP), IEEE, NY, USA, 2018, pp. 1–5.
[45] I. Makarov, A. Korinevskaya, V. Aliev, Super-resolution of interpolated downsampled
     semi-dense depth map, in: Proceedings of the 23rd International ACM Conference on 3D
     Web Technology, ACM, NY, USA, 2018, p. 27.
[46] D. Maslov, I. Makarov, Fast depth reconstruction using deep convolutional neural networks,
     in: International Work-Conference on Artificial Neural Networks, Springer, 2021,
     pp. 456–467.
[47] I. Makarov, I. Guschenko-Cheverda, Learning loss for active learning in depth
     reconstruction problem, in: Proceedings of CINTI’21, IEEE, 2021, pp. 1–6.
[48] K. Tikhomirova, I. Makarov, Community detection based on the nodes role in a network:
     The Telegram platform case, in: International Conference on Analysis of Images, Social
     Networks and Texts, Springer, 2020, pp. 294–302.
[49] I. Makarov, A. Savchenko, A. Korovko, L. Sherstyuk, N. Severin, A. Mikheev, D. Babaev,
     Temporal graph network embedding with causal anonymous walks representations, arXiv
     preprint arXiv:2108.08754 (2021).
[50] P. Zolnikov, M. Zubov, N. Nikitinsky, I. Makarov, Efficient algorithms for constructing
     multiplex networks embedding, in: Proceedings of EEML conference, 2019, pp. 57–67.
[51] I. Makarov, D. Kiselev, N. Nikitinsky, L. Subelj, Survey on graph embeddings and their
     applications to machine learning problems on graphs, PeerJ Computer Science (2021).
[52] I. Makarov, M. Makarov, D. Kiselev, Fusion of text and graph information for machine
     learning problems on networks, PeerJ Computer Science 7 (2021).
[53] I. Makarov, K. Korovina, D. Kiselev, JONNEE: Joint network nodes and edges embedding,
     IEEE Access (2021) 1–14.
[54] I. Makarov, O. Bulanov, L. E. Zhukov, Co-author recommender system, in: International
     Conference on Network Analysis, Springer, 2016, pp. 251–257.
[55] M. K. Rustem, I. Makarov, L. E. Zhukov, Predicting psychology attributes of a social
     network user, in: Proceedings of the Fourth Workshop on Experimental Economics and
     Machine Learning (EEML’17), Dresden, Germany, September 17-18, 2017, CEUR WP, 2017,
     pp. 1–7.
[56] I. Makarov, O. Bulanov, O. Gerasimova, N. Meshcheryakova, I. Karpov, L. E. Zhukov,
     Scientific matchmaker: Collaborator recommender system, in: International Conference
     on Analysis of Images, Social Networks and Texts, Springer, 2017, pp. 404–410.
[57] I. Makarov, O. Gerasimova, P. Sulimov, L. E. Zhukov, Recommending co-authorship via
     network embeddings and feature engineering: The case of National Research University
     Higher School of Economics, in: Proceedings of the 18th ACM/IEEE on Joint Conference
     on Digital Libraries, ACM, 2018, pp. 365–366.
[58] I. Makarov, O. Gerasimova, P. Sulimov, K. Korovina, L. E. Zhukov, Joint node-edge network
     embedding for link prediction, in: International Conference on Analysis of Images, Social
     Networks and Texts, Springer, 2018, pp. 20–31.
[59] I. Makarov, O. Gerasimova, P. Sulimov, L. E. Zhukov, Co-authorship network embedding
     and recommending collaborators via network embedding, in: International Conference on
     Analysis of Images, Social Networks and Texts, Springer, 2018, pp. 32–38.
[60] I. Makarov, O. Gerasimova, P. Sulimov, L. E. Zhukov, Dual network embedding for
     representing research interests in the link prediction problem on co-authorship networks,
     PeerJ Computer Science 5 (2019) e172.
[61] I. Makarov, O. Gerasimova, Predicting collaborations in co-authorship network, in: 2019
     14th International Workshop on Semantic and Social Media Adaptation and Personalization
     (SMAP), IEEE, 2019, pp. 1–6.
[62] I. Makarov, O. Gerasimova, Link prediction regression for weighted co-authorship
     networks, in: International Work-Conference on Artificial Neural Networks, Springer, 2019,
     pp. 667–677.
[63] I. Makarov, A. Oborevich, Network embedding for cluster analysis, in: Proceedings of
     CINTI’21, IEEE, 2021, pp. 1–6.
[64] A. Broilovskiy, I. Makarov, Human action recognition for boxing training simulator, in:
     International Conference on Analysis of Images, Social Networks and Texts, Springer,
     2020, pp. 331–343.

