<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Optimizing Sperm Detection and Tracking in Fluids with Equalize Class Representation Augmentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Trong-Hieu Nguyen-Mau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Quoc-Huy Trinh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ngoc-Linh Nguyen-Ha</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tuong-Vy Truong-Thuy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tuan-Anh Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hai-Dang Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ngoc-Thao Nguyen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Triet Tran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Science</institution>
          ,
          <addr-line>VNU-HCM</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vietnam National University</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Transparent Tracking of Spermatozoa task aims to detect and track sperm in a fluid environment. To address this challenge, we propose a framework that combines YOLOv8 and BoT-SORT to mitigate failures in detecting small objects. We also incorporate an equalization augmentation method to tackle imbalanced data. Our analysis indicates that these methods can effectively address the imbalance in each data class and accurately detect small objects. This improvement significantly enhances the overall detection results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Traditional manual assessment of sperm quality through microscopy is time-consuming,
requires expert skill, and yields variable results.
Computer-Aided Sperm Analysis (CASA) systems, introduced to automate sperm identification, tracking, and counting,
offer an efficient alternative for male fertility evaluation. Despite their growing popularity,
CASA systems often struggle with inaccuracies. Previous deep learning approaches, including
those using YOLO-based models [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], have shown promise in enhancing detection and tracking.
Yet these methods still struggle to detect small objects and to handle data imbalance,
which reduces precision in tracking spermatozoa.
      </p>
      <p>To address these shortcomings, we propose a new approach for this challenge. Our work
employs YOLOv8, a supervised model capable of effectively detecting small objects, and
applies equalization augmentation to address the imbalanced dataset. Additionally,
we assess the performance of this model using a simple tracking pipeline to underscore the
crucial role of the detection model in this task.</p>
      <p>
        In the 2023 MediaEval challenge [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], our focus is on the Medical Multimedia Task - Transparent
Tracking of Spermatozoa. The Medico 2023 task [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is centered on the effective tracking of
sperm cells in video recordings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Our participation is geared towards resolving the primary
challenges in the accurate detection and tracking of sperm cells, which involves tackling both
Subtask 1 and Subtask 2 of the Medico 2023 challenge.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Method</title>
      <sec id="sec-2-1">
        <title>2.1. Detection model</title>
        <p>
          YOLOv8 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is a recent advancement in the YOLO series of object detection
models that incorporates a Feature Pyramid Network (FPN) and a Path Aggregation Network
(PAN). The FPN in YOLOv8 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] progressively reduces the spatial resolution of the
input image while increasing the number of feature channels. This process
generates feature maps suited to detecting objects across various scales and resolutions. The PAN
architecture, in turn, integrates features from different network levels using skip connections [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which strengthens the model’s ability to capture the multi-scale and multi-resolution
features needed to identify objects of diverse sizes and shapes. We employed YOLOv8 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and
its scaled versions, namely YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x,
in the detection stage.
        </p>
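        <p>As a rough illustration of the multi-scale feature maps described above, the following Python sketch computes the grid size at each detection scale for a 640-pixel input. The strides of 8, 16, and 32 are an assumption based on common YOLO-style detection heads and are not stated in this paper; the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
def feature_map_sizes(input_size, strides=(8, 16, 32)):
    """Return the side length of each square feature map for a square input.

    The default strides are an assumption (typical of YOLO-style heads),
    not a value reported in the paper.
    """
    return {stride: input_size // stride for stride in strides}


def cells_covered(object_side_px, stride):
    """Approximate number of grid cells an object spans along one axis."""
    return object_side_px / stride


if __name__ == "__main__":
    print(feature_map_sizes(640))  # {8: 80, 16: 40, 32: 20}
    # A hypothetical 16-pixel object on the finest and coarsest maps.
    print(cells_covered(16, 8), cells_covered(16, 32))  # 2.0 0.5
```
]]></preformat>
        <p>Under these assumed strides, a hypothetical 16-pixel object spans two cells on the finest map but only half a cell on the coarsest one, which suggests why the high-resolution levels matter for small objects.</p>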
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Equalize Class Representation Augmentation in Sperm Detection</title>
        <p>Figure 1 illustrates the augmentation process designed to balance
class representation in the sperm detection dataset. The first column presents the original
microscopic images. The second column shows these images annotated with bounding
boxes: blue for sperm, green for clusters, and red for small or pinhead spermatozoa. The
third column shows the images after augmentation, which increases
dataset diversity. The final column displays the augmented images with retained and updated
annotations, so that the labels remain accurate for the newly pasted instances.</p>
        <p>Table 1 shows that the "sperm" class dominates our dataset at 93.30%, while the
"cluster" and "small or pinhead" classes are underrepresented at 3.37% and 3.33%, respectively. This imbalance
motivates our Equalize Class Representation Augmentation method, which aims to
balance the dataset for better model training and to improve detection accuracy for the less
frequent classes. The method
counters class imbalance by augmenting underrepresented classes until they match the dominant class’s
frequency. It crops regions from the original images and pastes them at random,
non-overlapping locations, thereby increasing the presence of the rarer classes. The annotations
are updated accordingly to preserve dataset integrity, leading to a more balanced class representation and improved
accuracy and generalization of the detection model.</p>
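        <p>The crop-and-paste procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not our exact implementation: it handles only the bounding-box annotations (the pixel copy is omitted), the class counts are toy values, and all function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import random


def copies_needed(class_counts):
    """Extra instances each class needs to match the most frequent class."""
    target = max(class_counts.values())
    return {name: target - count for name, count in class_counts.items()}


def overlaps(a, b):
    """True if axis-aligned boxes a and b, given as (x1, y1, x2, y2), intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_free_location(width, height, patch_w, patch_h, boxes, rng, max_tries=100):
    """Sample a location where the pasted patch overlaps no existing box."""
    for _ in range(max_tries):
        x = rng.randint(0, width - patch_w)
        y = rng.randint(0, height - patch_h)
        candidate = (x, y, x + patch_w, y + patch_h)
        if not any(overlaps(candidate, box) for box in boxes):
            return candidate
    return None  # image too crowded; the caller skips this paste


def paste_instances(width, height, annotations, patch, n_copies, seed=0):
    """Paste up to n_copies of patch = (label, w, h); return updated annotations."""
    rng = random.Random(seed)
    label, patch_w, patch_h = patch
    updated = list(annotations)
    for _ in range(n_copies):
        boxes = [box for _, box in updated]
        location = find_free_location(width, height, patch_w, patch_h, boxes, rng)
        if location is None:
            break
        updated.append((label, location))
    return updated


if __name__ == "__main__":
    # Toy counts for illustration only, not the dataset statistics.
    print(copies_needed({"sperm": 100, "cluster": 5, "small or pinhead": 3}))
    annotations = [("sperm", (10, 10, 50, 50))]
    print(paste_instances(640, 480, annotations, ("cluster", 30, 30), n_copies=3))
```
]]></preformat>
        <p>Rejection sampling keeps the sketch short; a denser image may need a smarter placement strategy or a cap on pasted instances per image.</p>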
        <p>
          Our Equalize Class Representation Augmentation method builds on the copy-paste augmentation of
[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], which places our work within the broader field of data augmentation for instance
segmentation. The authors of [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] also showed that pasting objects randomly is sufficient and can provide solid
gains on top of strong baselines.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Tracking method</title>
        <p>
          To track the detected sperm, we use BoT-SORT [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], implemented as an extension of the BYTETracker
class for YOLOv8 and designed for object tracking with ReID and the global motion compensation (GMC)
algorithm. One advantage of this tracking method over the tracker it extends is its
ability to capture motion and integrate it into the Kalman filter state vector
more effectively.
        </p>
        <p>The tracking system is configured with an initial association
threshold of 0.5, a secondary association threshold of 0.1, an initialization threshold for new
tracks of 0.6, a track buffer of 30, and a track matching threshold of 0.8. For
BoT-SORT, the settings comprise the sparseOptFlow global motion compensation method, a
proximity threshold of 0.5, an appearance threshold of 0.25, and a disabled ReID model.</p>
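        <p>For reference, the settings above can be collected in a single configuration, shown here as a Python dictionary. The key names are an assumption modelled on the tracker configuration files shipped with the YOLOv8 codebase and should be checked against the installed version. The helper function is a hypothetical sketch of how the two association thresholds split detections in a ByteTrack-style two-stage matching.</p>
        <preformat><![CDATA[
```python
# Key names are assumed; the values are those reported in the text.
TRACKER_CONFIG = {
    "tracker_type": "botsort",
    "track_high_thresh": 0.5,   # initial association threshold
    "track_low_thresh": 0.1,    # secondary association threshold
    "new_track_thresh": 0.6,    # minimum score to initialize a new track
    "track_buffer": 30,         # how long lost tracks are kept
    "match_thresh": 0.8,        # track matching threshold
    "gmc_method": "sparseOptFlow",
    "proximity_thresh": 0.5,
    "appearance_thresh": 0.25,
    "with_reid": False,         # ReID model not enabled
}


def split_by_confidence(detections, config=TRACKER_CONFIG):
    """Split (box, score) detections into high- and low-confidence groups.

    High-confidence detections are matched to tracks first; low-confidence
    ones are used in a second pass. Anything at or below the low threshold
    is discarded.
    """
    high, low = [], []
    for box, score in detections:
        if score >= config["track_high_thresh"]:
            high.append((box, score))
        elif score > config["track_low_thresh"]:
            low.append((box, score))
    return high, low


if __name__ == "__main__":
    detections = [((0, 0, 10, 10), 0.9), ((5, 5, 15, 15), 0.3), ((1, 1, 2, 2), 0.05)]
    high, low = split_by_confidence(detections)
    print(len(high), len(low))  # 1 1
```
]]></preformat>
        <p>Keeping low-confidence detections for a second matching pass is what lets faint or partially occluded sperm stay attached to an existing track instead of being dropped.</p>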
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiment</title>
      <sec id="sec-3-1">
        <title>3.1. Implementation details</title>
        <p>
          During both the training and inference stages, input images are resized to 640 pixels, following the
instructions from YOLO [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. As for the hyperparameters, we use a batch size of 64 and
the SGD optimizer with a learning rate of 0.001. Additionally, online augmentation
techniques such as flip, rotation, mixup, translation, and mosaic are applied. The final model is
obtained after 300 training epochs.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experimental results</title>
        <p>Table 2 presents the results of our experiments with different sizes of the pre-trained YOLOv8 model on the detection
task, evaluated on the validation set. The validation set has 5850 images, in which
the classes "sperm", "cluster", and "small or pinhead" appear 159305, 9606, and
5149 times, respectively.</p>
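        <p>To make the imbalance in the validation set concrete, the following short Python sketch converts the instance counts above into class shares. The counts are taken from the text; the function and variable names are hypothetical.</p>
        <preformat><![CDATA[
```python
def class_shares(counts):
    """Return each class's share of all annotated instances, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


# Validation-set instance counts reported in the text.
VAL_COUNTS = {"sperm": 159305, "cluster": 9606, "small or pinhead": 5149}

if __name__ == "__main__":
    for name, share in class_shares(VAL_COUNTS).items():
        print(f"{name}: {share:.2f}%")
    # sperm: 91.52%
    # cluster: 5.52%
    # small or pinhead: 2.96%
```
]]></preformat>
        <p>The validation set therefore contains 174060 annotated instances in total, with the two minority classes together making up under 9% of them.</p>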
        <p>Experimenting with different sizes of pre-trained YOLOv8 models for detection,
we found that larger models generalize better on the task, with a notable exception for the
"cluster" class. YOLOv8x outperformed the smaller models on the "sperm" class, with a
mAP50 of 0.719 and a mAP50-95 of 0.271. YOLOv8x also performed best
at detecting "small or pinhead" sperm, with a mAP50 of 0.0919 and a mAP50-95 of 0.0361.
However, the smaller the model, the better it detects "cluster" sperm. Most notably, YOLOv8n
had the highest precision and recall for the "cluster" class, at 0.253 and 0.112, respectively. With
data augmentation applied, YOLOv8n detected the "cluster" class best, with a mAP50 of 0.14 and
a mAP50-95 of 0.0384. Given these differences in model performance, ensembling is a possible
choice.</p>
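        <p>The mAP50 metric counts a prediction as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP50-95 averages over IoU thresholds from 0.50 to 0.95. The following Python sketch, with hypothetical function names and made-up boxes, shows how the same 2-pixel localization error is far more costly for a small box, which may help explain the low scores on the "small or pinhead" class.</p>
        <preformat><![CDATA[
```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, truth, threshold=0.5):
    """A prediction matches a ground-truth box if IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


if __name__ == "__main__":
    # Illustrative boxes only: both predictions are shifted by 2 pixels.
    large_truth, large_pred = (0, 0, 10, 10), (2, 0, 12, 10)
    small_truth, small_pred = (0, 0, 4, 4), (2, 0, 6, 4)
    print(round(iou(large_pred, large_truth), 3))  # 0.667
    print(round(iou(small_pred, small_truth), 3))  # 0.333
    print(is_true_positive(large_pred, large_truth))  # True
    print(is_true_positive(small_pred, small_truth))  # False
```
]]></preformat>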
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Outlook</title>
      <p>In conclusion, we introduce a framework for this challenge that employs YOLOv8 and
equalization augmentation to tackle issues related to sperm shape and class
imbalance observed in prior work. The experimental results indicate that our
model effectively mitigates weaknesses in detecting small objects, ultimately yielding improved
results in the tracking stage. Furthermore, incorporating our offline augmentation method
into the dataset helps the model partially address class imbalance.
The experimental results show the promise of our method for further
research in sperm detection and for improving the performance of the tracking pipeline in
general.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgment</title>
      <p>This research is funded by Viet Nam National University Ho Chi Minh City (VNU-HCM) under
grant number DS2020-42-01.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.-L.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-H.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-N.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. T. P.</given-names>
            <surname>Dao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.-T.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-N.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <article-title>Tail-aware sperm analysis for transparent tracking of spermatozoa</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kosela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Aszyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Klimek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Prokop</surname>
          </string-name>
          ,
          <article-title>Tracking of spermatozoa by yolov5 detection and strongsort with osnet tracker</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Storås</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-L.</given-names>
            <surname>Huynh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.-D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-T.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-N.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <article-title>Medico Multimedia Task at MediaEval 2023: Transparent Tracking of Spermatozoa</article-title>
          ,
          <source>in: Proceedings of MediaEval 2023 CEUR Workshop</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Haugen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Witczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Hammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Borgli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riegler</surname>
          </string-name>
          ,
          <article-title>Visem: A multimodal video dataset of human spermatozoa</article-title>
          ,
          <source>in: MMSys</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>261</fpage>
          -
          <lpage>266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Jocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chaurasia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <source>YOLO by Ultralytics</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Terven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cordova-Esparza</surname>
          </string-name>
          ,
          <article-title>A comprehensive review of yolo: From yolov1 to yolov8 and beyond</article-title>
          ,
          <source>arXiv preprint arXiv:2304.00501</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ghiasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Cubuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <article-title>Simple copy-paste is a strong data augmentation method for instance segmentation</article-title>
          , in: CVPR, IEEE,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Aharon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Orfaig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-Z.</given-names>
            <surname>Bobrovsky</surname>
          </string-name>
          ,
          <article-title>Bot-sort: Robust associations multi-pedestrian tracking</article-title>
          ,
          <source>arXiv preprint arXiv:2206.14651</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>