<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Medico Multimedia Task at MediaEval 2023: Transparent Tracking of Spermatozoa</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vajira Thambawita</string-name>
          <email>vajira@simula.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea M. Storås</string-name>
          <email>andrea@simula.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tuan-Luc Huynh</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thien-Phuc Tran</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hai-Dang Nguyen</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Triet Tran</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Trung-Nghia Le</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pål Halvorsen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael A. Riegler</string-name>
          <email>michael@simula.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven Hicks</string-name>
          <email>steven@simula.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>OsloMet</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SimulaMet</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Science</institution>
          ,
          <addr-line>VNU-HCM</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Vietnam National University</institution>
          ,
          <addr-line>Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Medico Multimedia Task returns for its seventh iteration as part of MediaEval 2023. The challenge comprises three main tasks: sperm tracking, sperm detection, and sperm motility prediction. Additionally, this year we have broadened our focus by incorporating new graph data derived from sperm bounding boxes and unique identifiers taken from manually annotated data. We invite participants to employ innovative methods, diverging from traditional ones, to study sperm using machine learning. The dataset includes video recordings of spermatozoa, complemented with annotations and graph data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The 2023 Medico task builds on the previous Medico edition about transparent tracking of
spermatozoa in videos [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. While infertility is increasing globally, optimizing the
techniques used for treating this medical condition is becoming increasingly important.
A central part of selecting the appropriate treatment is the analysis of semen samples through
a microscope, which is performed by a medical expert. However, manual examinations are
time-consuming, and the results are subjective and highly dependent on the experience of the
medical expert. Computer-aided systems for automatic analysis have been developed, but they
do not work well in clinical settings. Consequently, there is a need for improved methods for
identifying, tracking, and counting spermatozoa in fresh semen samples.
      </p>
      <p>The goal of the 2023 Medico task is to encourage the participants to track individual
spermatozoa in real time and to combine different data sources to predict common measurements
used for sperm quality assessment, specifically the motility of the spermatozoa. Solving this task
successfully might pave the way for improved computer-aided systems that assist
medical experts in the fertility clinic.</p>
      <p>
        Annotated videos from the VISEM-Tracking dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are used in the task. The provided
development dataset contains 20 videos, each 30 seconds long, with frame-by-frame bounding
box annotations. In addition, we provide a set of sperm characteristics (hormone
levels, fatty acid data, etc.), anonymized study participant data, and motility and morphology data
aligned with World Health Organization (WHO) recommendations. Finally, we provide graph
data, i.e., data containing nodes and edges that represent the spatial and temporal relationships
between the sperm, extracted from the original VISEM-Tracking dataset. Based on this data,
the participants will be asked to solve the following four subtasks, where Subtask 4 is optional:
      </p>
      <p>Subtask 1: The goal of this subtask is localization and tracking of sperm cells in a given
semen video. Specifically, the subtask focuses on examining microscopic videos of sperm, where
experts have manually annotated spermatozoa. Participants are tasked with detecting individual
sperm cells by providing bounding box coordinates and tracking them by assigning unique IDs.
The required format for the bounding box coordinates should align with the structure used in
the development datasets.</p>
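      <p>As an illustration of this output format, the following minimal Python sketch writes one frame's tracked detections in a YOLO-style layout with an appended track ID. The exact column order and file naming are assumptions here and must be checked against the development dataset.</p>
      <preformat>
# Hypothetical sketch: one frame's tracked detections written in a YOLO-style
# format with a trailing track ID. Verify the required column order against
# the development data before submitting.

def write_tracked_frame(path, detections, img_w=640, img_h=480):
    """detections: iterable of (track_id, class_id, x0, y0, x1, y1) in pixels."""
    with open(path, "w") as f:
        for track_id, class_id, x0, y0, x1, y1 in detections:
            # YOLO stores normalized center coordinates and box width/height
            xc = (x0 + x1) / 2 / img_w
            yc = (y0 + y1) / 2 / img_h
            w = (x1 - x0) / img_w
            h = (y1 - y0) / img_h
            f.write(f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f} {track_id}\n")

write_tracked_frame("frame_000001.txt", [(7, 0, 100, 120, 130, 150)])
      </preformat>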
      <p>Subtask 2: This subtask requires the participants to further refine the methodologies
employed in Subtask 1, emphasizing not only high prediction accuracy but also computational
efficiency and inference time. Participants must provide reports on the average Frames Per
Second (FPS) and Floating-Point Operations Per Second (FLOPS) while conducting inference
with a batch size of 1.</p>
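      <p>A minimal sketch, assuming a PyTorch detector, of how the average FPS at batch size 1 could be measured follows; FLOPS counting is omitted since it depends on the chosen profiling library.</p>
      <preformat>
# Minimal sketch, assuming a PyTorch model, of measuring average FPS at
# batch size 1. FLOPS counting is left to an external profiling library.
import time
import torch

def average_fps(model, input_shape=(1, 3, 480, 640), n_runs=100, warmup=10):
    model.eval()
    x = torch.randn(*input_shape)  # batch size 1, as required by Subtask 2
    with torch.no_grad():
        for _ in range(warmup):  # warm-up iterations are excluded from timing
            model(x)
        # on a GPU, call torch.cuda.synchronize() before reading the clock
        start = time.perf_counter()
        for _ in range(n_runs):
            model(x)
        elapsed = time.perf_counter() - start
    return n_runs / elapsed
      </preformat>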
      <p>Subtask 3: The goal of this subtask is to predict sperm motility (the ability to move
independently, where a progressive spermatozoon is able to move forward while a
non-progressive one moves, for example, in circles without any forward progression) in terms
of the percentage of progressive and non-progressive spermatozoa. The prediction needs to be
performed patient-wise, resulting in a single value for each patient pertaining to the predicted
attribute. To address this subtask, the sperm tracks or bounding boxes obtained from Subtasks 1
and/or 2 are indispensable. Participants are strongly encouraged to consider the temporal
dimension of the videos, since temporal information propagated from previous frames is crucial
for extrapolating properties such as sperm motility in subsequent frames. An analysis based
solely on individual frames is insufficient to capture the movement or motility of sperm, which
contains vital information necessary for accurate predictions.</p>
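      <p>To illustrate why temporal information matters, the following sketch computes a crude per-track straightness feature from tracked center points. This is an illustrative proxy only, not the official motility definition used in the ground truth.</p>
      <preformat>
# Illustrative proxy only: net displacement over path length per track.
# A progressively motile cell moves forward (ratio near 1), while a cell
# circling in place has a long path but little net displacement (ratio near 0).
import math

def straightness(points):
    """points: chronologically ordered (x, y) centers of one tracked cell."""
    path = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    net = math.dist(points[0], points[-1])
    return net / path if path > 0 else 0.0

tracks = {7: [(0.50, 0.50), (0.52, 0.50), (0.54, 0.51), (0.57, 0.51)]}
features = {tid: straightness(pts) for tid, pts in tracks.items()}
      </preformat>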
      <p>Subtask 4: This task is experimental in nature and asks the participants to generate graphs
representing the predicted tracks for the spermatozoa in order to assess the sperm motility.
The participants are asked to employ graph data structures as input to a model that predicts
the level of motility in sperm samples. The construction of graph structures can be facilitated
using the predicted bounding boxes. Graphs for training models to predict sperm motility are
provided, while the graphs required for testing the final models must be generated from the
prediction models in Subtasks 1 and/or 2.</p>
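      <p>A minimal sketch, using networkx, of how a spatial graph for one frame could be constructed from predicted bounding-box centers; the node and edge attributes assumed here may differ from those used in the provided graphs.</p>
      <preformat>
# Minimal sketch of building a per-frame spatial graph with networkx from
# predicted bounding-box centers; the attributes used in the provided graphs
# may differ from the ones assumed here.
import math
import networkx as nx

def frame_graph(centers, threshold=0.2):
    """centers: dict mapping node id to an (x, y) normalized center."""
    g = nx.Graph()
    for node_id, (x, y) in centers.items():
        g.add_node(node_id, x=x, y=y)
    ids = list(centers)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            # spatial edge: connect nodes closer than the threshold
            if math.dist(centers[a], centers[b]) > threshold:
                continue
            g.add_edge(a, b, kind="spatial")
    return g

g = frame_graph({0: (0.10, 0.20), 1: (0.15, 0.25), 2: (0.90, 0.90)})
nx.write_graphml(g, "frame_graph.graphml")  # same file format as the test graphs
      </preformat>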
    </sec>
    <sec id="sec-2">
      <title>2. Dataset Details</title>
      <p>
        As in the 2022 edition [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the 2023 Medico task uses the VISEM-Tracking dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
VISEM-Tracking is based on the VISEM dataset [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where males aged 18 years and older were examined
with respect to fertility. The participants provided written informed consent to participate in
the study. The project was approved by the Norwegian data authority and the Regional Medical
Ethics Committee of South-East Norway (REK). For this task, we include a development set
consisting of 20 videos from VISEM-Tracking. Each video is 30 seconds long and contains
detailed frame-by-frame annotations of individual spermatozoa using bounding boxes. Five
additional videos without annotations are provided for testing.
      </p>
      <p>
        For each patient, we include a video of live sperm (video and extracted frames), manually
annotated bounding box details for each spermatozoon (sample frames are presented in Figure 1),
a set of measurements from a standard semen analysis for the whole sample, a sperm fatty acid
profile, the fatty acid composition of serum phospholipids, study participants-related data, and
WHO analysis data. The bounding box coordinates are provided in two separate folders: one
folder has bounding box coordinates in YOLO format [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and the other folder contains feature
identifiers in addition to the bounding box coordinates. These feature identifiers can be used to
identify the same bounding box in different frames of a video. Each video has a resolution of
640 × 480 pixels and runs at 50 frames per second (FPS). The dataset contains six CSV files in
total, where one row in each CSV file represents a participant. The six CSV files are:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>semen_analysis_data: the results of a standard semen analysis.</p>
        </list-item>
        <list-item>
          <p>fatty_acids_spermatozoa: the levels of several fatty acids in the sperm of the participants.</p>
        </list-item>
        <list-item>
          <p>fatty_acids_serum: the serum levels of the fatty acids of the phospholipids (measured from the blood of the participants).</p>
        </list-item>
        <list-item>
          <p>sex_hormones: the serum levels of sex hormones measured in the blood of the participants.</p>
        </list-item>
        <list-item>
          <p>study_participant_related_data: general information about the participants, such as age, abstinence time, and Body Mass Index (BMI).</p>
        </list-item>
        <list-item>
          <p>videos: an overview of which video file belongs to which participant.</p>
        </list-item>
      </list>
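      <p>As a sketch of working with the YOLO-format folder described above, the following Python function reads one label file back into pixel coordinates, assuming the 640 × 480 resolution of the videos.</p>
      <preformat>
# Sketch of reading one YOLO-format label file back into pixel coordinates,
# assuming the 640 x 480 resolution of the VISEM-Tracking videos.
def read_yolo_labels(path, img_w=640, img_h=480):
    boxes = []
    with open(path) as f:
        for line in f:
            class_id, xc, yc, w, h = line.split()[:5]
            xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
            # convert normalized center/size to pixel corner coordinates
            x0 = (xc - w / 2) * img_w
            y0 = (yc - h / 2) * img_h
            boxes.append((int(class_id), x0, y0, w * img_w, h * img_h))
    return boxes
      </preformat>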
      <p>Regarding the optional Subtask 4, graph data is provided at
https://huggingface.co/datasets/SimulaMet-HOST/visem-tracking-graphs. The structure of the
graphs is depicted in Figure 2. The graphs represent spatial and temporal relationships between
sperm cells in a video: spatial edges connect sperm cells within the same frame, while temporal
edges connect sperm cells across different frames. The graphs have been generated with varying
spatial threshold values, where each threshold determines the maximum distance between
two nodes for them to be connected in the graph. The following threshold values are used: 0.1,
0.2, 0.3, 0.4, and 0.5. The graph data contains a separate folder of graphs for each spatial
threshold, i.e., there are five folders in total. In each threshold folder, there is a subfolder
including separate graphs for the individual frames of the video and a GraphML file containing
the graph for the complete video.</p>
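      <p>The provided GraphML files can be loaded directly with networkx; the folder and file names in this sketch are hypothetical and should be checked against the downloaded dataset layout.</p>
      <preformat>
# Loading one of the provided GraphML graphs with networkx; the path below
# is hypothetical and should match the downloaded dataset layout.
import networkx as nx

g = nx.read_graphml("threshold_0.2/full_video.graphml")
print(g.number_of_nodes(), g.number_of_edges())
      </preformat>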
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
      <p>In order to evaluate the proposed solutions thoroughly, several evaluation metrics are computed
for each subtask.</p>
      <p>
        Subtask 1: We use the widely recognized COCO evaluation metrics to evaluate the
performance of sperm detection methods. This assessment comprises precision, recall,
mAP@50 (mean average precision at an IoU threshold of 0.5), and mAP@50-95, providing a
multidimensional perspective on the accuracy of the methods. Furthermore, we leverage
Jonathon Luiten’s TrackEval library [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which encompasses crucial metrics such as the Higher Order Tracking Accuracy (HOTA) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and a spectrum of other multi-object tracking (MOT) evaluation criteria, to offer a
deeper analysis of tracking performance.
      </p>
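      <p>All of the COCO detection metrics above build on the intersection over union (IoU) between predicted and annotated boxes; a minimal reference implementation for corner-format boxes:</p>
      <preformat>
# Minimal IoU for two boxes given as (x0, y0, x1, y1) corner coordinates;
# the COCO detection metrics above are thresholded on this quantity.
def iou(a, b):
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # prints about 0.143
      </preformat>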
      <p>Subtask 2: We use the same evaluation metrics as in Subtask 1. However, the
performance evaluation additionally considers the inference speed of the methods as a
weighted factor, weighing accuracy against the efficiency of inference in sperm detection
and tracking.</p>
      <p>Subtask 3 and Subtask 4: Evaluation of the regression performance for sperm motility
prediction involves the Mean Squared Error (MSE) and the Mean Absolute Percentage Error
(MAPE). These metrics provide insights into the accuracy of the predictions.</p>
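      <p>For reference, minimal implementations of the two regression metrics:</p>
      <preformat>
# Minimal reference implementations of the two metrics named above.
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # expressed as a percentage; assumes no true value is exactly zero
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([50.0, 30.0], [48.0, 33.0]))   # 6.5
print(mape([50.0, 30.0], [48.0, 33.0]))  # 7.0
      </preformat>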
    </sec>
    <sec id="sec-4">
      <title>4. Discussion and Outlook</title>
      <p>This year, we introduced supplementary graph data derived from bounding box details
and feature IDs. These IDs facilitate the identification of the same sperm cell across multiple
frames of a video. We hope that participants will explore innovative machine learning
approaches to sperm analysis that diverge from conventional methods. Alongside the
experimental Subtask 4, we retained the foundational tasks associated with sperm tracking and
the prediction of sperm motility, denoted as Subtasks 1, 2, and 3.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Storås</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Witczak</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Haugen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Hammer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Halvorsen</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <string-name>
            <surname>Riegler</surname>
          </string-name>
          , Medico Multimedia Task at MediaEval 2022:
          <article-title>Transparent Tracking of Spermatozoa</article-title>
          ,
          <source>in: Proceedings of MediaEval 2022 CEUR Workshop</source>
          ,
          <year>2022</year>
          . URL: https: //2022.multimediaeval.com/paper5501.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.-L.</given-names>
            <surname>Huynh</surname>
          </string-name>
          , H.
          <string-name>
            <surname>-H. Nguyen</surname>
            ,
            <given-names>X.-N.</given-names>
          </string-name>
          <string-name>
            <surname>Hoang</surname>
            ,
            <given-names>T. T. P.</given-names>
          </string-name>
          <string-name>
            <surname>Dao</surname>
            ,
            <given-names>T.-P.</given-names>
          </string-name>
          <string-name>
            <surname>Nguyen</surname>
          </string-name>
          , V.-T. Huynh, H.
          <string-name>
            <surname>-D. Nguyen</surname>
            ,
            <given-names>T.-N.</given-names>
          </string-name>
          <string-name>
            <surname>Le</surname>
          </string-name>
          , M.-T. Tran,
          <article-title>Tail-Aware Sperm Analysis for Transparent Tracking of Spermatozoa</article-title>
          ,
          <source>in: Proceedings of MediaEval 2022 CEUR Workshop</source>
          ,
          <year>2023</year>
          . URL: https://2022.multimediaeval.com/paper6101.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kosela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Aszyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Klimek</surname>
          </string-name>
          , T. Prokop,
          <article-title>Tracking of Spermatozoa by YOLOv5 Detection and StrongSORT with OSNet Tracker</article-title>
          ,
          <source>in: Proceedings of MediaEval 2022 CEUR Workshop</source>
          ,
          <year>2023</year>
          . URL: https://2022.multimediaeval.com/paper7367.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Thambawita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Storås</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Witczak</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Haugen</surname>
            ,
            <given-names>H. L.</given-names>
          </string-name>
          <string-name>
            <surname>Hammer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Halvorsen</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <string-name>
            <surname>Riegler</surname>
          </string-name>
          ,
          <article-title>VISEM-Tracking, a human spermatozoa tracking dataset</article-title>
          ,
          <source>Scientific Data</source>
          <volume>10</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . doi:
          <volume>10</volume>
          .1038/s41597-023-02173-4.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Haugen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Witczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. L.</given-names>
            <surname>Hammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Borgli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Riegler, VISEM: A Multimodal Video Dataset of Human Spermatozoa</article-title>
          ,
          <source>in: Proceedings of the 10th ACM Multimedia Systems Conference, MMSys '19</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery,
          <year>2019</year>
          , p.
          <fpage>261</fpage>
          -
          <lpage>266</lpage>
          . doi:
          <volume>10</volume>
          .1145/3304109.3325814.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Redmon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Divvala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Farhadi</surname>
          </string-name>
          , You Only Look Once: Unified,
          <string-name>
            <surname>Real-Time Object</surname>
          </string-name>
          Detection,
          <source>in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>779</fpage>
          -
          <lpage>788</lpage>
          . doi:
          <volume>10</volume>
          .1109/CVPR.
          <year>2016</year>
          .
          <volume>91</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Jonathon Luiten</surname>
          </string-name>
          , TrackEval, https://github.com/JonathonLuiten/TrackEval,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Luiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Osep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dendorfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Torr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Geiger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Leal-Taixé</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. Leibe,</surname>
          </string-name>
          <article-title>HOTA: A Higher Order Metric for Evaluating Multi-Object Tracking</article-title>
          ,
          <source>International Journal of Computer Vision</source>
          (
          <year>2021</year>
          )
          <fpage>548</fpage>
          -
          <lpage>578</lpage>
          . doi:
          <volume>10</volume>
          .1007/s11263-020-01375-2.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>