=Paper=
{{Paper
|id=Vol-3658/paper16
|storemode=property
|title=Optimizing Sperm Detection and Tracking in Fluids with Equalize Class
Representation Augmentation
|pdfUrl=https://ceur-ws.org/Vol-3658/paper16.pdf
|volume=Vol-3658
|authors=Trong-Hieu Nguyen-Mau,Quoc-Huy Trinh,Ngoc-Linh Nguyen-Ha,Tuong-Vy Truong-Thuy,Tuan-Anh Yang,Hai-Dang Nguyen,Ngoc-Thao Nguyen,Minh-Triet Tran
|dblpUrl=https://dblp.org/rec/conf/mediaeval/MauTNTYNNT23
}}
==Optimizing Sperm Detection and Tracking in Fluids with Equalize Class Representation Augmentation==
Trong-Hieu Nguyen-Mau1,2, Quoc-Huy Trinh1,2, Ngoc-Linh Nguyen-Ha1,2,
Tuong-Vy Truong-Thuy1,2, Tuan-Anh Yang1,2, Hai-Dang Nguyen1,2,*,
Ngoc-Thao Nguyen1,2,* and Minh-Triet Tran1,2,*
1 University of Science, VNU-HCM
2 Vietnam National University, Ho Chi Minh City, Vietnam
Abstract
The Transparent Tracking of Spermatozoa task aims to detect and track sperm in a fluid environment.
We propose a framework that combines YOLOv8 and BoT-SORT to reduce missed detections of small
objects, and we introduce an equalization augmentation method to counter class imbalance in the
data. Our analysis indicates that these methods mitigate the imbalance between data classes and
improve the detection of small objects, which in turn improves the overall detection results.
1. Introduction
Traditional manual sperm quality assessment through microscopy is time-consuming, requires
expert skills, and produces variable results. Computer-Aided Sperm Analysis (CASA) systems,
introduced to automate sperm identification, tracking, and counting, offer an efficient
alternative for male fertility evaluation. Despite their growing popularity, CASA systems often
suffer from inaccuracies. Previous deep learning approaches, including those using YOLO-based
models [1, 2], have shown promise in improving detection and tracking. Yet these methods still
struggle to detect small objects and to handle data imbalance, which reduces precision in
tracking spermatozoa.
To address these shortcomings, we propose a novel approach for this challenge. Our work
employs YOLOv8, a supervised model capable of detecting small objects effectively, and
applies equalization augmentation to counter the imbalance in the dataset. Additionally,
we evaluate this model within a simple tracking pipeline to underscore the crucial role of
the detection model in this task.
In the 2023 MediaEval challenge [3], our focus is on the Medical Multimedia Task - Transparent
Tracking of Spermatozoa. The Medico 2023 task [3] is centered on the effective tracking of
sperm cells in video recordings [4]. Our participation is geared towards resolving the primary
challenges in the accurate detection and tracking of sperm cells, which involves tackling both
Subtask 1 and Subtask 2 of the Medico 2023 challenge.
MediaEval’23: Multimedia Evaluation Workshop, February 1–2, 2024, Amsterdam, The Netherlands and Online
* Corresponding author.
† These authors contributed equally.
Email: nmthieu@selab.hcmus.edu.vn (T. Nguyen-Mau); 20120013@student.hcmus.edu.vn (Q. Trinh);
nhnlinh20@apcs.fitus.edu.vn (N. Nguyen-Ha); tttvy20@apcs.fitus.edu.vn (T. Truong-Thuy);
ytanh21@apcs.fitus.edu.vn (T. Yang); nhdang@selab.hcmus.edu.vn (H. Nguyen);
nnthao@fit.hcmus.edu.vn (N. Nguyen); tmtriet@fit.hcmus.edu.vn (M. Tran)
ORCID: 0000-0003-2823-3861 (T. Nguyen-Mau); 0000-0002-7205-3211 (Q. Trinh); 0000-0003-0888-8908 (H. Nguyen);
0000-0003-0888-8908 (N. Nguyen); 0000-0003-3046-3041 (M. Tran)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (http://ceur-ws.org), ISSN 1613-0073
2. Method
2.1. Detection model
YOLOv8 [5], the most recent model in the YOLO series at the time of writing, incorporates a
Feature Pyramid Network (FPN) and a Path Aggregation Network (PAN). The FPN progressively
reduces the spatial resolution of the input image while increasing the number of feature
channels, producing feature maps suited to detecting objects at various scales and resolutions.
The PAN complements it by integrating features from different network levels through skip
connections [6], which strengthens the multi-scale and multi-resolution features needed to
identify objects of diverse sizes and shapes. In the detection stage, we employed YOLOv8 in
its various scaled versions: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x.
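To make the multi-scale idea concrete, the following minimal sketch computes the grid sizes of the feature maps on which detection takes place. It assumes a 640-pixel square input and output strides of 8, 16, and 32, which are typical of YOLOv8 configurations but are not stated in this paper; the 12-pixel object size is likewise a hypothetical value chosen for illustration.

```python
# Illustrative sketch: grid sizes of multi-scale detection feature maps.
# ASSUMPTION: strides 8, 16, 32 (typical for YOLOv8); not stated in the paper.
STRIDES = (8, 16, 32)


def grid_sizes(image_size: int, strides=STRIDES) -> dict:
    """Return the side length of the feature grid at each stride."""
    return {s: image_size // s for s in strides}


def cells_covered(object_px: float, stride: int) -> float:
    """Approximate number of grid cells an object spans along one axis."""
    return object_px / stride


if __name__ == "__main__":
    for stride, side in grid_sizes(640).items():
        # A hypothetical 12-pixel object spans 1.5 cells at stride 8
        # but less than half a cell at stride 32.
        print(stride, side, cells_covered(12, stride))
```

Under these assumptions, small objects are resolved mainly on the finest grid, which is why fusing fine and coarse features matters for tiny targets such as sperm heads.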
2.2. Equalize Class Representation Augmentation in Sperm Detection
Figure 1 illustrates the augmentation process designed to balance class representation in the
sperm detection dataset. The first column shows the original microscopic images. The second
column shows the same images annotated with bounding boxes: blue for sperm, green for
clusters, and red for small or pinhead spermatozoa. The third column shows the images after
augmentation, which increases dataset diversity. The final column shows the augmented images
with both the retained and the updated annotations, so that every instance, original or
pasted, remains correctly labeled.
Figure 1: Visualization of the Equalize Class Representation Augmentation in Sperm Detection.
Table 1
Frequency of Different Classes
Class               Samples   Percentage
sperm               612377    93.30%
cluster             22112     3.37%
small or pinhead    21846     3.33%
Table 1 shows that the "sperm" class dominates our dataset at 93.30%, while the "cluster" and
"small or pinhead" classes are underrepresented at 3.37% and 3.33%, respectively. This
imbalance motivates our Equalize Class Representation Augmentation method, which aims to
balance the dataset for better model training and to improve detection accuracy for the less
frequent classes.
The method combats class imbalance by augmenting the underrepresented classes until their
frequency equals that of the dominant class. It crops regions containing rare-class instances
from the original images and pastes them at random, non-overlapping locations, thereby
increasing the presence of the rarer classes. The annotations are updated accordingly to
preserve dataset integrity, leading to a more balanced class representation and improved
accuracy and generalization of the detection model.
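The procedure can be sketched as follows. This is a minimal illustration of the annotation bookkeeping only, not the authors' implementation: the class counts come from Table 1, while the function names, the box format (x1, y1, x2, y2), the retry limit, and the omission of pixel copying are all assumptions made for the example.

```python
import random

# Class counts from Table 1.
COUNTS = {"sperm": 612377, "cluster": 22112, "small or pinhead": 21846}


def copies_needed(counts: dict) -> dict:
    """Extra instances each class needs to match the most frequent class."""
    target = max(counts.values())
    return {name: target - n for name, n in counts.items()}


def overlaps(a, b) -> bool:
    """True if boxes a and b, given as (x1, y1, x2, y2), share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def paste_location(crop_w, crop_h, img_w, img_h, boxes, rng, max_tries=100):
    """Sample a position where the crop fits without overlapping any box.

    Returns the new box (x1, y1, x2, y2), or None if no free spot is found.
    """
    if crop_w > img_w or crop_h > img_h:
        return None
    for _ in range(max_tries):
        x1 = rng.randint(0, img_w - crop_w)
        y1 = rng.randint(0, img_h - crop_h)
        candidate = (x1, y1, x1 + crop_w, y1 + crop_h)
        if not any(overlaps(candidate, b) for b in boxes):
            return candidate
    return None


def augment_annotations(annotations, crops, img_w, img_h, seed=0):
    """Add one pasted instance per crop and return the updated annotations.

    annotations: list of (class_name, box); crops: list of (class_name, w, h).
    Only the annotation update is shown; copying the pixels is omitted.
    """
    rng = random.Random(seed)
    updated = list(annotations)
    for name, w, h in crops:
        box = paste_location(w, h, img_w, img_h, [b for _, b in updated], rng)
        if box is not None:
            updated.append((name, box))
    return updated
```

With the counts in Table 1, `copies_needed(COUNTS)` shows that roughly 590,000 additional instances of each rare class would be required to match the "sperm" class exactly.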
Our Equalize Class Representation Augmentation method draws on the copy-paste augmentation of
[7], which places our work within the broader field of data augmentation for instance
segmentation. [7] also showed that pasting objects at random locations is sufficient to
provide solid gains on top of strong baselines.
2.3. Tracking method
To track the detected sperm, we use BoT-SORT [8]. In the YOLOv8 implementation, it extends the
BYTETracker class and adds re-identification (ReID) and global motion compensation (GMC).
One advantage over its predecessor is that it estimates motion and integrates it into a more
effective Kalman filter state vector.
The tracker is configured with an initial association threshold of 0.5, a secondary
association threshold of 0.1, an initialization threshold for new tracks of 0.6, a track
buffer duration of 30, and a track matching threshold of 0.8. The BoT-SORT-specific settings
are the sparseOptFlow method for global motion compensation, a proximity threshold of 0.5,
and an appearance threshold of 0.25, with the ReID model disabled.
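For reference, these settings can be written out as a configuration, together with a small sketch of how the three score thresholds partition detections in a ByteTrack-style two-stage association. The key names are an assumption that follows the naming used in the Ultralytics tracker configuration files; the paper reports only the values. The helper `split_by_score` is hypothetical and illustrates the thresholding step only, not the full tracker.

```python
# Tracker settings reported in the text. ASSUMPTION: key names follow the
# Ultralytics tracker configuration convention; the paper lists only values.
BOTSORT_CONFIG = {
    "tracker_type": "botsort",
    "track_high_thresh": 0.5,    # initial association threshold
    "track_low_thresh": 0.1,     # secondary association threshold
    "new_track_thresh": 0.6,     # minimum score to start a new track
    "track_buffer": 30,          # track buffer duration (unit not stated)
    "match_thresh": 0.8,         # track matching threshold
    "gmc_method": "sparseOptFlow",
    "proximity_thresh": 0.5,
    "appearance_thresh": 0.25,
    "with_reid": False,          # ReID model not enabled
}


def split_by_score(scores, cfg=BOTSORT_CONFIG):
    """Partition detection indices by confidence score.

    Returns (high, low, can_start): indices used in the first association,
    indices used in the second association, and indices confident enough
    to initialize a new track if they remain unmatched.
    """
    high = [i for i, s in enumerate(scores) if s >= cfg["track_high_thresh"]]
    low = [
        i for i, s in enumerate(scores)
        if cfg["track_low_thresh"] < s < cfg["track_high_thresh"]
    ]
    can_start = [i for i in high if scores[i] >= cfg["new_track_thresh"]]
    return high, low, can_start
```

For example, scores of 0.9, 0.55, 0.3, and 0.05 would place the first two detections in the first association, the third in the second association, and discard the fourth; only the first could start a new track.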
3. Experiment
3.1. Implementation Detail
During both training and inference, input images are resized to 640, following the
instructions from YOLO [5]. We use a batch size of 64 and the SGD optimizer with a learning
rate of 0.001. Additionally, online augmentation techniques such as flip, rotation, mixup,
translation, and mosaic are applied. The model is trained for 300 epochs.
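The reported settings can be collected in one place as shown below. The key names are an assumption that mirrors common Ultralytics training arguments; the paper gives only the values. The augmentation strengths are not reported, so none are filled in, and `iterations_per_epoch` is a hypothetical helper added for illustration.

```python
# Hyperparameters reported in the text. ASSUMPTION: key names mirror the
# Ultralytics training arguments; the paper reports only the values.
TRAIN_CONFIG = {
    "imgsz": 640,        # input size used for training and inference
    "batch": 64,         # batch size
    "optimizer": "SGD",
    "lr0": 0.001,        # learning rate
    "epochs": 300,
}

# Online augmentations named in the text; their strengths are not reported,
# so no values are given here.
ONLINE_AUGMENTATIONS = ("flip", "rotation", "mixup", "translation", "mosaic")


def iterations_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every image once."""
    return -(-num_images // batch_size)  # ceiling division
```

As a usage example with a hypothetical training set of 10,000 images (the actual training set size is not given in the text), `iterations_per_epoch(10000, TRAIN_CONFIG["batch"])` yields 157 steps per epoch.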
3.2. Experimental result
Table 2 presents the results of our experiments with different sizes of pre-trained YOLOv8
models on the detection task, evaluated on the validation set. The validation set contains
5850 images, in which the classes "sperm", "cluster", and "small or pinhead" appear 159305,
9606, and 5149 times, respectively.
Comparing different sizes of pre-trained YOLOv8 models for detection, we found that the
largest model, YOLOv8x, generalizes best overall, with a notable exception for the "cluster"
class. YOLOv8x performed best on the "sperm" class, with a mAP50 of 0.719 and a mAP50-95 of
0.271, and on the "small or pinhead" class, with a mAP50 of 0.0919 and a mAP50-95 of 0.0361.
For the "cluster" class, however, the smallest model performed best. Without augmentation,
YOLOv8n had the highest precision and recall for clusters, 0.253 and 0.112 respectively.
With data augmentation, YOLOv8n achieved the best cluster detection, with a mAP50 of 0.14 and
a mAP50-95 of 0.0384. Given these differences between models, ensembling is a possible
choice.
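One simple way to act on this observation would be to select, for each class, the model that scores best on it. The sketch below does this with the mAP50 values of the augmented runs in Table 2. It is a hypothetical illustration only: the paper does not implement ensembling, and the function name is an assumption.

```python
# mAP50 on the validation set, copied from Table 2 (augmented runs only).
MAP50 = {
    "YOLOv8n": {"sperm": 0.6, "cluster": 0.14, "small or pinhead": 0.0288},
    "YOLOv8s": {"sperm": 0.52, "cluster": 0.0387, "small or pinhead": 0.0139},
    "YOLOv8m": {"sperm": 0.411, "cluster": 0.0013, "small or pinhead": 0.0523},
    "YOLOv8l": {"sperm": 0.558, "cluster": 0.0412, "small or pinhead": 0.0206},
    "YOLOv8x": {"sperm": 0.719, "cluster": 0.0294, "small or pinhead": 0.0919},
}


def best_model_per_class(table: dict) -> dict:
    """Map each class to the model with the highest score for that class."""
    classes = next(iter(table.values())).keys()
    return {c: max(table, key=lambda m: table[m][c]) for c in classes}


if __name__ == "__main__":
    for cls, model in best_model_per_class(MAP50).items():
        print(f"{cls}: {model}")
```

On these numbers the selection picks YOLOv8x for "sperm" and "small or pinhead" and YOLOv8n for "cluster", which matches the per-class pattern described above.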
Table 2
Quantitative results of our methods on the validation set
Model     Data Augmentation   Class              P         R          mAP50    mAP50-95
YOLOv8n   No                  all                0.288     0.254      0.219    0.0717
YOLOv8n   No                  sperm              0.558     0.63       0.49     0.166
YOLOv8n   No                  cluster            0.253     0.112      0.0797   0.0185
YOLOv8n   No                  small or pinhead   0.0542    0.0202     0.0281   0.0101
YOLOv8n   Yes                 all                0.274     0.262      0.236    0.0861
YOLOv8n   Yes                 sperm              0.628     0.637      0.6      0.227
YOLOv8n   Yes                 cluster            0.139     0.0999     0.14     0.0384
YOLOv8n   Yes                 small or pinhead   0.0546    0.0478     0.0288   0.0129
YOLOv8s   Yes                 all                0.23      0.236      0.191    0.0705
YOLOv8s   Yes                 sperm              0.59      0.628      0.52     0.199
YOLOv8s   Yes                 cluster            0.0734    0.038      0.0387   0.00739
YOLOv8s   Yes                 small or pinhead   0.0262    0.0406     0.0139   0.00503
YOLOv8m   Yes                 all                0.208     0.182      0.155    0.0533
YOLOv8m   Yes                 sperm              0.522     0.505      0.411    0.141
YOLOv8m   Yes                 cluster            0.00257   0.000104   0.0013   0.00013
YOLOv8m   Yes                 small or pinhead   0.1       0.0404     0.0523   0.019
YOLOv8l   Yes                 all                0.234     0.23       0.206    0.0768
YOLOv8l   Yes                 sperm              0.582     0.635      0.558    0.216
YOLOv8l   Yes                 cluster            0.0805    0.0237     0.0412   0.00595
YOLOv8l   Yes                 small or pinhead   0.0398    0.033      0.0206   0.00853
YOLOv8x   Yes                 all                0.317     0.272      0.28     0.108
YOLOv8x   Yes                 sperm              0.727     0.742      0.719    0.271
YOLOv8x   Yes                 cluster            0.0586    0.00271    0.0294   0.0159
YOLOv8x   Yes                 small or pinhead   0.166     0.0701     0.0919   0.0361
4. Discussion and Outlook
In this challenge, we introduced a novel framework that combines YOLOv8 with equalization
augmentation to tackle the issues related to sperm shape and class imbalance observed in
previous work. The experimental results indicate that our model mitigates weaknesses in
detecting small objects, which in turn benefits the tracking stage. Furthermore, adding our
offline augmentation method to the dataset helps the model partially address class
imbalance. These results show the promise of our method for further research in sperm
detection and for improving the performance of the tracking pipeline in general.
Acknowledgment
This research is funded by Viet Nam National University Ho Chi Minh City (VNU-HCM) under
grant number DS2020-42-01.
References
[1] T.-L. Huynh, H.-H. Nguyen, X.-N. Hoang, T. T. P. Dao, T.-P. Nguyen, V.-T. Huynh, H.-D. Nguyen, T.-N. Le, M.-T. Tran, Tail-aware sperm analysis for transparent tracking of spermatozoa (2022).
[2] M. Kosela, J. Aszyk, M. Jarek, J. Klimek, T. Prokop, Tracking of spermatozoa by YOLOv5 detection and StrongSORT with OSNet tracker (2022).
[3] V. Thambawita, A. M. Storås, T.-L. Huynh, H.-D. Nguyen, M.-T. Tran, T.-N. Le, P. Halvorsen, M. A. Riegler, S. Hicks, Medico Multimedia Task at MediaEval 2023: Transparent Tracking of Spermatozoa, in: Proceedings of MediaEval 2023 CEUR Workshop, 2023.
[4] T. B. Haugen, S. A. Hicks, J. M. Andersen, O. Witczak, H. L. Hammer, R. Borgli, P. Halvorsen, M. Riegler, VISEM: A multimodal video dataset of human spermatozoa, in: MMSys, 2019, pp. 261–266.
[5] G. Jocher, A. Chaurasia, J. Qiu, YOLO by Ultralytics, 2023.
[6] J. Terven, D. Cordova-Esparza, A comprehensive review of YOLO: From YOLOv1 to YOLOv8 and beyond, arXiv preprint arXiv:2304.00501 (2023).
[7] G. Ghiasi, Y. Cui, A. Srinivas, R. Qian, T.-Y. Lin, E. D. Cubuk, Q. V. Le, B. Zoph, Simple copy-paste is a strong data augmentation method for instance segmentation, in: CVPR, IEEE, 2021.
[8] N. Aharon, R. Orfaig, B.-Z. Bobrovsky, BoT-SORT: Robust associations multi-pedestrian tracking, arXiv preprint arXiv:2206.14651 (2022).