=Paper= {{Paper |id=Vol-3658/paper16 |storemode=property |title=Optimizing Sperm Detection and Tracking in Fluids with Equalize Class Representation Augmentation |pdfUrl=https://ceur-ws.org/Vol-3658/paper16.pdf |volume=Vol-3658 |authors=Trong-Hieu Nguyen-Mau,Quoc-Huy Trinh,Ngoc-Linh Nguyen-Ha,Tuong-Vy Truong-Thuy,Tuan-Anh Yang,Hai-Dang Nguyen,Ngoc-Thao Nguyen,Minh-Triet Tran |dblpUrl=https://dblp.org/rec/conf/mediaeval/MauTNTYNNT23 }} ==Optimizing Sperm Detection and Tracking in Fluids with Equalize Class Representation Augmentation== https://ceur-ws.org/Vol-3658/paper16.pdf
                                Optimizing Sperm Detection and Tracking in Fluids
                                with Equalize Class Representation Augmentation
                                Trong-Hieu Nguyen-Mau1,2 , Quoc-Huy Trinh1,2 , Ngoc-Linh Nguyen-Ha1,2 ,
                                Tuong-Vy Truong-Thuy1,2 , Tuan-Anh Yang1,2 , Hai-Dang Nguyen1,2,* ,
                                Ngoc-Thao Nguyen1,2,* and Minh-Triet Tran1,2,*
                                1 University of Science, VNU-HCM
                                2 Vietnam National University, Ho Chi Minh City, Vietnam


                                                                         Abstract
                                                                         The Transparent Tracking of Spermatozoa task aims to detect and track sperm in a fluid environment.
                                                                         To address this challenge, we propose a framework that uses YOLOv8 and BoT-SORT to reduce
                                                                         failures in detecting small objects. Additionally, we introduce an equalization augmentation
                                                                         method to tackle problems related to imbalanced data. Our analysis indicates that these methods
                                                                         mitigate the imbalance between data classes and improve the detection of small objects,
                                                                         which in turn improves the overall detection results.




                                1. Introduction
                                Traditional manual sperm quality assessment through microscopy faces challenges like time
                                consumption, the need for expert skills, and variability in results. Computer-Aided-Sperm-
                                Analysis (CASA) systems, introduced to automate sperm identification, tracking, and counting,
                                offer an efficient alternative for male fertility evaluation. Despite their growing popularity,
                                CASA systems often struggle with inaccuracies. Previous deep learning approaches, including
                                those using YOLO-based models [1, 2], have shown promise in enhancing detection and tracking.
                                Yet, these methods still grapple with detecting small objects and addressing data imbalance,
                                leading to reduced precision in tracking spermatozoa.
                                   To address these shortcomings, we propose a novel approach in this challenge. Our work
                                employs YOLOv8, a supervised model capable of detecting small objects effectively, and
                                applies equalization augmentation to address the imbalanced dataset. Additionally,
                                we assess the performance of this model with a simple tracking pipeline to underscore the
                                crucial role of the detection model in this task.
                                   In the 2023 MediaEval challenge [3], our focus is on the Medical Multimedia Task - Transparent
                                Tracking of Spermatozoa. The Medico 2023 task [3] is centered on the effective tracking of
                                sperm cells in video recordings [4]. Our participation is geared towards resolving the primary
                                challenges in the accurate detection and tracking of sperm cells, which involves tackling both
                                Subtask 1 and Subtask 2 of the Medico 2023 challenge.
                                MediaEval’23: Multimedia Evaluation Workshop, February 1–2, 2024, Amsterdam, The Netherlands and Online
                                *
                                  Corresponding author.
                                †
                                  These authors contributed equally.
                                Email: nmthieu@selab.hcmus.edu.vn (T. Nguyen-Mau); 20120013@student.hcmus.edu.vn (Q. Trinh);
                                nhnlinh20@apcs.fitus.edu.vn (N. Nguyen-Ha); tttvy20@apcs.fitus.edu.vn (T. Truong-Thuy);
                                ytanh21@apcs.fitus.edu.vn (T. Yang); nhdang@selab.hcmus.edu.vn (H. Nguyen); nnthao@fit.hcmus.edu.vn
                                (N. Nguyen); tmtriet@fit.hcmus.edu.vn (M. Tran)
                                ORCID: 0000-0003-2823-3861 (T. Nguyen-Mau); 0000-0002-7205-3211 (Q. Trinh); 0000-0003-0888-8908 (H. Nguyen);
                                0000-0003-0888-8908 (N. Nguyen); 0000-0003-3046-3041 (M. Tran)
                                                                       © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073


2. Method
2.1. Detection model
YOLOv8 [5] is a recent advancement in the YOLO series of object detection models. Its
architecture incorporates a Feature Pyramid Network (FPN) and a Path Aggregation Network
(PAN). The FPN in YOLOv8 [5] progressively reduces the spatial resolution of the input image
while increasing the number of feature channels, which produces feature maps suited to
detecting objects across various scales and resolutions. The PAN complements the FPN by
integrating features from different network levels through skip connections [6]. This
strengthens the multi-scale and multi-resolution features needed to accurately identify
objects of diverse sizes and shapes. In the detection stage, we employed YOLOv8 [5] in its
various scaled versions: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x.

2.2. Equalize Class Representation Augmentation in Sperm Detection
Figure 1 illustrates the augmentation process designed to balance class representation in a
sperm detection dataset. The first column presents the original microscopic images. The
second column shows these images annotated with bounding boxes: blue for sperm, green for
clusters, and red for small or pinhead spermatozoa. The third column shows the images after
augmentation, which increases dataset diversity. The final column displays the augmented
images with both the retained and the newly added annotations, so that every pasted object
remains correctly labeled.




Figure 1: Visualization of the Equalize Class Representation Augmentation in Sperm Detection.



Table 1
Frequency of Different Classes
    Class               Samples    Percentage
    sperm                612377      93.30%
    cluster               22112       3.37%
    small or pinhead      21846       3.33%

   Table 1 shows that the "sperm" class dominates our dataset at 93.30% of all samples, while
the "cluster" and "small or pinhead" classes are underrepresented at 3.37% and 3.33%,
respectively. This imbalance motivates our Equalize Class Representation Augmentation
method, which augments the underrepresented classes until their frequency equals that of
the dominant class. The method crops regions from the original images and pastes them at
random, non-overlapping locations, thereby increasing the presence of the rarer classes. The
annotations are updated accordingly to preserve dataset integrity. The result is a more
balanced class representation, which is intended to improve the accuracy and generalization
of the detection model, particularly for the less frequent classes.
   Our Equalize Class Representation Augmentation method builds on the copy-paste
augmentation of [7], which places our work within the broader field of data augmentation for
instance segmentation. That work also showed that pasting objects at random locations is
sufficient and can provide solid gains on top of strong baselines.
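
   The crop-and-paste step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `(x1, y1, x2, y2)` box format, the retry limit `max_tries`, and the use of plain nested lists for images (chosen only to keep the sketch dependency-free) are all hypothetical. The class counts are taken from Table 1.

```python
import random

def boxes_overlap(a, b):
    """Return True if two (x1, y1, x2, y2) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def paste_rare_objects(image, boxes, labels, rare_classes, n_pastes,
                       rng=None, max_tries=50):
    """Copy crops of rare-class boxes to random free locations.

    image is a list of pixel rows; boxes and labels are parallel lists.
    Returns a new image plus the updated boxes and labels.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]          # leave the input untouched
    new_boxes, new_labels = list(boxes), list(labels)
    sources = [(b, l) for b, l in zip(boxes, labels) if l in rare_classes]
    if not sources:
        return out, new_boxes, new_labels
    for _ in range(n_pastes):
        (x1, y1, x2, y2), label = rng.choice(sources)
        bw, bh = x2 - x1, y2 - y1
        for _ in range(max_tries):
            nx = rng.randint(0, width - bw)
            ny = rng.randint(0, height - bh)
            candidate = (nx, ny, nx + bw, ny + bh)
            # Reject locations that would cover any existing annotation.
            if any(boxes_overlap(candidate, b) for b in new_boxes):
                continue
            for dy in range(bh):
                out[ny + dy][nx:nx + bw] = image[y1 + dy][x1:x2]
            new_boxes.append(candidate)      # keep annotations in sync
            new_labels.append(label)
            break
    return out, new_boxes, new_labels

# Number of extra instances each class needs to match the dominant class.
counts = {"sperm": 612377, "cluster": 22112, "small or pinhead": 21846}
target = max(counts.values())
deficits = {name: target - n for name, n in counts.items()}
```

   Under these assumptions, `deficits` gives the total number of pasted instances each rare class would need across the dataset; how that total is distributed over individual images is a design choice the paper does not specify.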

2.3. Tracking method
To track the detected sperm, we use BoT-SORT [8]. In the YOLOv8 framework, it is implemented
as an extension of the BYTETracker class, designed for object tracking with ReID
(re-identification) and the GMC (global motion compensation) algorithm. One advantage of this
tracker over the one it extends is that it captures motion and integrates it into a more
effective Kalman filter state vector.
   The tracker is configured with an initial association threshold of 0.5, a secondary
association threshold of 0.1, an initialization threshold for new tracks of 0.6, a track buffer
duration of 30, and a track matching threshold of 0.8. The BoT-SORT-specific settings are a
global motion compensation method named sparseOptFlow, a proximity threshold of 0.5, an
appearance threshold of 0.25, and the ReID model disabled.
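
   For reference, the tracker settings listed above can be collected in a single configuration. The sketch below is an assumption-laden illustration: the values come from the text, while the key names follow the layout of the Ultralytics `botsort.yaml` tracker file as we understand it and should be checked against the installed version. The helper `to_yaml` is hypothetical and only handles this flat structure.

```python
# Values are from the paper; key names are assumed to match the
# Ultralytics tracker configuration format.
BOTSORT_CONFIG = {
    "tracker_type": "botsort",
    "track_high_thresh": 0.5,    # initial association threshold
    "track_low_thresh": 0.1,     # secondary association threshold
    "new_track_thresh": 0.6,     # threshold to initialize a new track
    "track_buffer": 30,          # buffer for keeping lost tracks
    "match_thresh": 0.8,         # track matching threshold
    "gmc_method": "sparseOptFlow",  # global motion compensation method
    "proximity_thresh": 0.5,
    "appearance_thresh": 0.25,
    "with_reid": False,          # ReID model not enabled
}

def to_yaml(config):
    """Serialize a flat dict of scalars as simple `key: value` YAML lines."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"
```

   Writing the output of `to_yaml(BOTSORT_CONFIG)` to a file would give a tracker configuration that can, in principle, be supplied to the tracking call of the detection framework.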


3. Experiment
3.1. Implementation Detail
During both the training and inference stages, input images are resized to 640, following the
instructions from YOLO [5]. As for the hyperparameters, we use a batch size of 64 and the
SGD optimizer with a learning rate of 0.001. Additionally, online augmentation techniques
such as flip, rotation, mixup, translation, and mosaic are applied. The final model is
obtained after training for 300 epochs.
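
   The training setup above could be expressed roughly as follows. This is a sketch, not the authors' script: the argument names are assumed to match the Ultralytics training interface, the dataset file `sperm.yaml` and the default weights are hypothetical placeholders, and the magnitudes of the online augmentations are not reported in the paper, so they are left at the framework defaults.

```python
def build_train_args(data_yaml):
    """Collect the hyperparameters reported in Section 3.1."""
    return {
        "data": data_yaml,    # dataset description file (hypothetical path)
        "imgsz": 640,         # input size for training and inference
        "batch": 64,          # batch size
        "optimizer": "SGD",   # optimizer
        "lr0": 0.001,         # learning rate
        "epochs": 300,        # number of training epochs
    }

def train(weights="yolov8x.pt", data_yaml="sperm.yaml"):
    """Launch training; requires the `ultralytics` package to be installed."""
    from ultralytics import YOLO  # imported lazily so the sketch loads without it
    model = YOLO(weights)
    return model.train(**build_train_args(data_yaml))
```

   Keeping the hyperparameters in one function makes it straightforward to reuse the same settings for each model size (YOLOv8n through YOLOv8x) by changing only the `weights` argument.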

3.2. Experimental result
Table 2 presents the results of our experiments with different sizes of the pre-trained
YOLOv8 model on the detection task, evaluated on the validation set. The validation set has
5850 images, in which the classes "sperm", "cluster", and "small or pinhead" appear 159305,
9606, and 5149 times, respectively.
   Across the model sizes we tested, the largest model, YOLOv8x, achieves the best overall
results, with a noticeable exception for the "cluster" class. YOLOv8x outperforms the smaller
models on the "sperm" class, with a mAP50 of 0.719 and a mAP50-95 of 0.271. It also performs
best on the "small or pinhead" class, with a mAP50 of 0.0919 and a mAP50-95 of 0.0361.
For the "cluster" class, however, the smallest model performs best. Without augmentation,
YOLOv8n has the highest precision and recall for clusters, 0.253 and 0.112, respectively.
With data augmentation, YOLOv8n achieves the best cluster detection, with a mAP50 of 0.14
and a mAP50-95 of 0.0384. Given these per-class differences in model performance, ensembling
is a possible choice.

Table 2
Quantitative results of our methods on the validation set

    Model      Data Augmentation          Class           P          R       mAP50    mAP50-95
  YOLOv8n              No                   all          0.288      0.254     0.219     0.0717
                                          sperm          0.558       0.63      0.49      0.166
                                         cluster         0.253      0.112    0.0797    0.0185
                                     small or pinhead   0.0542     0.0202    0.0281     0.0101
  YOLOv8n              Yes                  all          0.274      0.262     0.236     0.0861
                                          sperm          0.628      0.637      0.6       0.227
                                         cluster         0.139     0.0999     0.14     0.0384
                                     small or pinhead   0.0546     0.0478    0.0288     0.0129
  YOLOv8s              Yes                  all           0.23      0.236     0.191    0.0705
                                          sperm           0.59      0.628      0.52      0.199
                                         cluster         0.0734     0.038    0.0387    0.00739
                                     small or pinhead   0.0262     0.0406    0.0139    0.00503
  YOLOv8m              Yes                  all          0.208      0.182     0.155     0.0533
                                          sperm          0.522      0.505     0.411      0.141
                                         cluster        0.00257   0.000104   0.0013    0.00013
                                     small or pinhead      0.1     0.0404    0.0523     0.019
  YOLOv8l              Yes                  all          0.234       0.23     0.206    0.0768
                                          sperm          0.582      0.635     0.558      0.216
                                         cluster         0.0805    0.0237    0.0412    0.00595
                                     small or pinhead   0.0398      0.033    0.0206    0.00853
  YOLOv8x              Yes                  all          0.317      0.272     0.28      0.108
                                          sperm          0.727      0.742    0.719      0.271
                                         cluster         0.0586    0.00271   0.0294     0.0159
                                     small or pinhead    0.166     0.0701    0.0919    0.0361
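
   The per-class comparison that motivates ensembling can be reproduced from the augmented rows of Table 2. The snippet below is only an illustrative sketch: the mAP50 values are copied from the table, while the selection rule (pick the model with the highest mAP50 for each class) and the function name are assumptions rather than a procedure described in the paper.

```python
# mAP50 per class for the models trained with data augmentation (Table 2).
MAP50_AUGMENTED = {
    "YOLOv8n": {"sperm": 0.6, "cluster": 0.14, "small or pinhead": 0.0288},
    "YOLOv8s": {"sperm": 0.52, "cluster": 0.0387, "small or pinhead": 0.0139},
    "YOLOv8m": {"sperm": 0.411, "cluster": 0.0013, "small or pinhead": 0.0523},
    "YOLOv8l": {"sperm": 0.558, "cluster": 0.0412, "small or pinhead": 0.0206},
    "YOLOv8x": {"sperm": 0.719, "cluster": 0.0294, "small or pinhead": 0.0919},
}

def best_model_per_class(scores):
    """Return, for each class, the model name with the highest score."""
    classes = next(iter(scores.values())).keys()
    return {c: max(scores, key=lambda m: scores[m][c]) for c in classes}

best = best_model_per_class(MAP50_AUGMENTED)
```

   With these values, `best` maps "sperm" and "small or pinhead" to YOLOv8x and "cluster" to YOLOv8n, which matches the observation in Section 3.2 that no single model size is best for every class.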



4. Discussion and Outlook
In this challenge, we introduced a framework that combines YOLOv8 with equalization
augmentation to tackle the issues related to sperm shape and class imbalance observed in
previous work. The experimental results indicate that our model mitigates weaknesses in
detecting small objects, which in turn benefits the tracking stage. Furthermore, adding our
offline augmentation method to the dataset helps the model partially address class imbalance.
These results show the promise of our method for further research in sperm detection and for
improving the performance of the tracking pipeline in general.


Acknowledgment
This research is funded by Viet Nam National University Ho Chi Minh City (VNU-HCM) under
grant number DS2020-42-01.
References
[1] T.-L. Huynh, H.-H. Nguyen, X.-N. Hoang, T. T. P. Dao, T.-P. Nguyen, V.-T. Huynh, H.-
    D. Nguyen, T.-N. Le, M.-T. Tran, Tail-aware sperm analysis for transparent tracking of
    spermatozoa (2022).
[2] M. Kosela, J. Aszyk, M. Jarek, J. Klimek, T. Prokop, Tracking of spermatozoa by YOLOv5
    detection and StrongSORT with OSNet tracker (2022).
[3] V. Thambawita, A. M. Storås, T.-L. Huynh, H.-D. Nguyen, M.-T. Tran, T.-N. Le, P. Halvorsen,
    M. A. Riegler, S. Hicks, Medico Multimedia Task at MediaEval 2023: Transparent Tracking
    of Spermatozoa, in: Proceedings of MediaEval 2023 CEUR Workshop, 2023.
[4] T. B. Haugen, S. A. Hicks, J. M. Andersen, O. Witczak, H. L. Hammer, R. Borgli, P. Halvorsen,
    M. Riegler, VISEM: A multimodal video dataset of human spermatozoa, in: MMSys, 2019,
    pp. 261–266.
[5] G. Jocher, A. Chaurasia, J. Qiu, YOLO by Ultralytics, 2023.
[6] J. Terven, D. Cordova-Esparza, A comprehensive review of YOLO: From YOLOv1 to YOLOv8
    and beyond, arXiv preprint arXiv:2304.00501 (2023).
[7] G. Ghiasi, Y. Cui, A. Srinivas, R. Qian, T.-Y. Lin, E. D. Cubuk, Q. V. Le, B. Zoph, Simple
    copy-paste is a strong data augmentation method for instance segmentation, in: CVPR,
    IEEE, 2021.
[8] N. Aharon, R. Orfaig, B.-Z. Bobrovsky, BoT-SORT: Robust associations multi-pedestrian
    tracking, arXiv preprint arXiv:2206.14651 (2022).