<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of ImageCLEF Lifelog 2020: Lifelog Moment Retrieval and Sport Performance Lifelog</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Van-Tu Ninh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tu-Khiem Le</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liting Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Piras</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Riegler</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pal Halvorsen</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathias Lux</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Triet Tran</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cathal Gurrin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Duc-Tien Dang-Nguyen</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dublin City University</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ITEC, Klagenfurt University</institution>
          ,
          <addr-line>Klagenfurt</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Pluribus One &amp; University of Cagliari</institution>
          ,
          <addr-line>Cagliari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Simula Metropolitan Center for Digital Engineering</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Bergen</institution>
          ,
          <addr-line>Bergen</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Science</institution>
          ,
          <addr-line>VNU-HCM, Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes the fourth edition of the Lifelog task at ImageCLEF 2020. In this edition, the Lifelog task consists of two challenges: Lifelog Moment Retrieval (LMRT) and Sport Performance Lifelog (SPLL). While the Lifelog Moment Retrieval challenge follows the same format as the previous edition, its data is a larger multimodal dataset based on the merger of three previous NTCIR Lifelog datasets, containing approximately 191,439 images with corresponding visual concepts and other related metadata. The Sport Performance Lifelog, a brand-new challenge, is composed of three subtasks that focus on predicting the expected performance of athletes who trained for a sport event. In summary, ImageCLEF Lifelog 2020 received 50 runs from six teams in total, with competitive results.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Due to the widespread use of wearable devices, digital sensors, and smartphones,
which passively capture photos, biometric signals, and location information, a
huge amount of daily-life data is recorded by many people every day. As a
result, there is an ever-increasing research effort into developing methodologies for
exploiting the potential of this data. Such lifelog data has been used for many
retrieval and analytics challenges since the inaugural NTCIR-12 Lifelog task [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
in 2016.
      </p>
      <p>
There have been many research tasks addressed by these challenges, such as
lifelog retrieval, data segmentation, data enhancement/annotation, and
interactive retrieval. Specifically in the ImageCLEF lifelog challenge, we note a number
of different tasks, such as the Solve My Life Puzzle task in 2019 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Activity
of Daily Living Understanding task in 2018 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], or Lifelog Summarization task
in 2017 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Therefore, in the fourth edition of ImageCLEFlifelog tasks hosted
in ImageCLEF 2020 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the organizers propose a brand-new task which
monitors the wellbeing and predicts the expected performance of athletes
training for a sporting event, while continuing to maintain the core Lifelog
Moment Retrieval task with a dataset enriched in terms of visual concepts,
annotations, and scale.
      </p>
      <p>Details of the two challenges and the data employed are provided in
Section 2. In Section 3, submissions and results are presented and discussed. In the
final Section 4, the paper is concluded with a discussion of final remarks and future
work.</p>
    </sec>
    <sec id="sec-2">
      <title>Overview of the Task</title>
      <sec id="sec-2-1">
        <title>Motivation and Objectives</title>
        <p>Personal lifelog data is continually increasing in volume due to the popularity
of personal wearable/portable devices for health monitoring and life recording,
such as smartphones, smart watches, fitness bands, video cameras, biometric
data devices, and GPS or location devices. As a huge amount of data is created
daily, there is a need for systems that can analyse, index, categorize, and
summarize the data to gain deep insights from it and support a user in some
positive way.</p>
        <p>Although many lifelogging-related workshops have been held successfully for
years, such as three editions of NTCIR, the annual Lifelog Search Challenge (LSC),
and ImageCLEFlifelog 2019, we still aim to bring lifelogging to the attention of
not only research groups but also diverse audiences. Nevertheless, we continue
to maintain the core task to encourage research groups to propose creative
retrieval approaches to lifelog data, as well as nominating a new task to introduce
a new challenge to the research community.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Challenge Description</title>
      </sec>
      <sec id="sec-2-3">
        <title>Lifelog Moment Retrieval Task (LMRT)</title>
        <p>
          In this task, the participants were required to retrieve a number of specific
moments in a lifelogger's life. Moments are defined as "semantic events, or activities
that happened throughout the day" [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. For example, a participant would have
been required to find and return relevant moments for the query "Find the
moment(s) when the lifelogger was having an ice cream on the beach". In this
edition, particular attention was to be paid to the diversification of the selected
moments with respect to the target scenario. The ground truth for this subtask
was created using a manual annotation process and aimed towards complete
relevance judgements. Figure 1 illustrates some examples of the moments when the
lifelogger was shopping in a toy shop. In addition, Listings 1 and 2 show all the
queries used in the challenge.
        </p>
        <sec id="sec-2-3-0">
          <title>T.001 Enjoying Beers in a Bar</title>
          <p>Description: Find the moment(s) in 2015 or 2016 when u1 enjoyed beers in the bar.
Narrative: To be considered relevant, u1 must be clearly in a bar and drinking beers.</p>
        </sec>
        <sec id="sec-2-3-1">
          <title>T.002 Building Personal Computer</title>
          <p>Description: Find the moment(s) when u1 built his personal computer from scratch.
Narrative: To be considered relevant, u1 must be clearly in the office with the PC
parts or uncompleted PCs on the table.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>T.003 In A Toyshop</title>
          <p>Description: Find the moment(s) when u1 was looking at items in a toyshop.
Narrative: To be considered relevant, u1 must be clearly in a toyshop where various
toys are being examined.</p>
        </sec>
        <sec id="sec-2-3-3">
          <title>T.004 Television Recording</title>
          <p>Description: Find the moment(s) when u1 was being recorded for a television show.
Narrative: To be considered relevant, there must clearly be a television camera in
front of u1. The moments the interviewer/cameramen is interviewing/recording u1
are also considered relevant.</p>
        </sec>
        <sec id="sec-2-3-4">
          <title>T.005 Public Transport In Home Country</title>
          <p>Description: Find the moment(s) in 2015 and 2018 when u1 was using public
transport in his home country (Ireland).</p>
          <p>Narrative: Taking any form of public transport in Ireland is considered relevant,
such as bus, taxi, train and boat.</p>
        </sec>
        <sec id="sec-2-3-5">
          <title>T.006 Seaside Moments</title>
          <p>Description: Find moment(s) in which u1 was walking by the sea taking photos or
eating ice-cream.</p>
          <p>Narrative: To be considered relevant, u1 must be taking a walk by the sea or eating
ice-cream with the sea clearly visible.</p>
        </sec>
        <sec id="sec-2-3-6">
          <title>T.007 Grocery Stores</title>
          <p>Description: Find moment(s) in 2016 and 2018 when u1 was doing grocery shopping
on the weekends.</p>
          <p>Narrative: To be considered relevant, u1 must clearly buy or visibly interact
with products in a grocery store on the weekends.</p>
        </sec>
        <sec id="sec-2-3-7">
          <title>T.008 Photograph of The Bridge</title>
          <p>Description: Find the moment(s) when u1 was taking a photo of a bridge.
Narrative: Moments when u1 was walking on a street without stopping to take a
photo of a bridge are not relevant. Any other moments showing a bridge when a
photo was not being taken are also not considered to be relevant.</p>
        </sec>
        <sec id="sec-2-3-8">
          <title>T.009 Car Repair</title>
          <p>Description: Find the moment(s) when u1 was repairing his car in the garden.
Narrative: Moments when u1 was repairing his car in the garden with gloves on
his hands are relevant. Sometimes he also held a hammer and his phone, and these
moments are also considered relevant.</p>
        </sec>
        <sec id="sec-2-3-9">
          <title>T.010 Monsters</title>
          <p>Description: Find the moment(s) when u1 was looking at an old clock, with flowers
visible, with a small monster watching u1.</p>
          <p>Narrative: Moments when u1 was at home, looking at an old clock, with flowers
visible, with a lamp and two small monsters watching u1 are considered relevant.</p>
          <p>Listing 1: Description of topics for the development set in LMRT.</p>
        </sec>
        <sec id="sec-2-3-10">
          <title>T.001 Praying Rite</title>
          <p>Description: Find the moment when u1 was attending a praying rite with other
people in the church.</p>
          <p>Narrative: To be relevant, the moment must show u1 inside the church,
attending a praying rite with other people.</p>
        </sec>
        <sec id="sec-2-3-11">
          <title>T.002 Lifelog data on touchscreen on the wall</title>
          <p>Description: Find the moment when u1 was looking at lifelog data on a large
touchscreen on the wall.</p>
          <p>Narrative: To be relevant, the moment must show u1 was looking at his lifelog data
on the touchscreen wall (not desktop monitor).</p>
        </sec>
        <sec id="sec-2-3-12">
          <title>T.003 Bus to work - Bus to home</title>
          <p>Description: Find the moment when u1 was getting a bus to his office at Dublin
City University or was going home by bus.</p>
          <p>Narrative: To be relevant, u1 was on the bus (not waiting for the bus) and the
destination was his home or his workplace.</p>
        </sec>
        <sec id="sec-2-3-13">
          <title>T.004 Bus at the airport</title>
          <p>Description: Find the moment when u1 was getting on a bus in the aircraft landing
deck in the airport.</p>
          <p>Narrative: To be relevant, u1 was walking out from the airplane to the bus parking
in the aircraft landing deck with many airplanes visible.</p>
        </sec>
        <sec id="sec-2-3-14">
          <title>T.005 Medicine cabinet</title>
          <p>Description: Find the moment when u1 was looking inside the medicine cabinet in
the bathroom at home.</p>
          <p>Narrative: To be considered relevant, u1 must be at home, looking inside the open
medicine cabinet beside a mirror in the bathroom.</p>
        </sec>
        <sec id="sec-2-3-15">
          <title>T.006 Order Food in the Airport</title>
          <p>Description: Find the moment when u1 was ordering fast food in the airport.
Narrative: To be relevant, u1 must be at the airport and ordering food. The moments
that u1 was queuing to order food are also considered relevant.</p>
        </sec>
        <sec id="sec-2-3-16">
          <title>T.007 Seafood at Restaurant</title>
          <p>Description: Find moments when u1 was eating seafood in a restaurant in the
evening time.</p>
          <p>Narrative: Moments showing u1 eating seafood or seafood parts in any
restaurant in the evening are considered relevant.</p>
        </sec>
        <sec id="sec-2-3-17">
          <title>T.008 Meeting with people</title>
          <p>Description: Find the moments when u1 was at a round-table meeting with many
people and there were pink (not red) name-cards for each person.</p>
          <p>Narrative: Moments showing u1 at a round-table meeting with
many people, with pink name-cards visible, are considered relevant.</p>
        </sec>
        <sec id="sec-2-3-18">
          <title>T.009 Eating Pizza</title>
          <p>Description: Find the moments when u1 was eating a pizza while talking to a man.
Narrative: To be considered relevant, u1 must eat or hold a pizza with a man in the
background.</p>
        </sec>
        <sec id="sec-2-3-19">
          <title>T.010 Socialising</title>
          <p>Description: Find the moments when u1 was talking to a lady in a red top, standing
directly in front of a poster hanging on a wall.</p>
          <p>Narrative: To be relevant, u1 must be talking with a woman in red, who was standing
right in front of a scientific research poster.</p>
          <p>Listing 2: Description of topics for the test set in LMRT.</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>Sport Performance Lifelog (SPLL)</title>
        <p>Given a dataset from 16 people who trained for a 5km run (e.g.,
daily sleeping patterns, daily heart rate, sport activities, and image logs of all
food consumed during the training period), participants are required to predict
the expected performance (e.g., estimated finishing time, average heart rate, and
calorie consumption) of the trained athletes.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Datasets</title>
        <p>
          LMRT Task: The data is a combination of three previously released datasets
of NTCIR-Lifelog Tasks: NTCIR-12 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], NTCIR-13 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], and NTCIR-14 [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. It
is a large multimodal lifelog dataset covering 114 days of one lifelogger's life, with
dates ranging from 2015 to 2018. It contains five main data types: multimedia
content, biometric data, location and GPS, human activity data, and visual
concepts and annotations of non-text multimedia content. Details of each type
of data are as follows:
- Multimedia Content: Most of this data consists of non-annotated egocentric
photos captured passively by two wearable digital cameras: the OMG Autographer
and the Narrative Clip1. The lifelogger wore a device for 16-18 hours per day to
capture a complete visual trace of daily life, with about 2-3 photos captured
per minute during waking hours. The photo data was manually redacted to
remove identifiable content and faces [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
1 Narrative Clip and Narrative Clip 2 - http://getnarrative.com
- Biometric Data: This data contains heart rate, calories, and movement
speed, recorded using a Fitbit fitness tracker2. The lifelogger wore the Fitbit
device 24 hours every day so as to record continuous biometric data.
- Location and GPS: 166 semantic locations as well as GPS data (with and
without location names) were recorded using both the Moves app and smartphones.
The GPS data plays an important role in inferring the time zone of the lifelogger's
current location, so that the times of different wearable devices can be converted
into one standard timestamp.
- Human Activity Data: This data was recorded by the Moves app, which
also provides some annotated semantic locations. It consists of four types of
activities: walking, running, transport, and airplane.
- Visual Concepts and Annotations: The passively auto-captured images
were passed through two deep neural networks to extract visual concepts
describing scenes and visual objects. For scene identification, we still employ
the PlacesCNN [25] as in the latest edition. For visual object detection, we
employed Mask R-CNN [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] pre-trained on the 80 object categories of the MSCOCO dataset
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] to provide the categories of visual objects in the image as
well as their bounding boxes.
        </p>
        <p>Format of the metadata. The metadata was stored in CSV files, together
referred to as the metadata table. The structure and meaning of each field in the
table are described in Table 3. Additionally, visual categories and concept
descriptors are also provided in the visual concepts table, whose format can
be found in Table 4.</p>
        <p>SPLL Task: The data was gathered using three different approaches:
wearable devices (the Fitbit Fitness Tracker (Fitbit Versa)), Google Forms, and PMSYS.
Biometric data of an individual (training athlete) was recorded using the Fitbit
Fitness Tracker, including 13 different fields of information such as daily heart rate,
calories, daily sleeping patterns, sport activities, etc. Google Forms were used to
collect information on meals, drinks, medications, etc. At the same time,
information on subjective wellness, injuries, and training load was recorded by the PMSYS
system. In addition, image logs of food consumed during the training period from
at least 2 participants and self-reported data like mood, stress, fatigue, readiness
2 Fitbit Fitness Tracker (Fitbit Versa) - https://www.fitbit.com</p>
        <sec id="sec-2-5-1">
          <title>Categories</title>
          <p>Table 2. Statistics of the ImageCLEFlifelog 2020 SPLL data
(category | source | logging frequency | number of entries):
Calories | Fitbit | Per minute | 3377529
Steps | Fitbit | Per minute | 1534705
Distance | Fitbit | Per minute | 1534705
Sleep | Fitbit | When it happens (usually daily) | 2064
Lightly active minutes | Fitbit | Per day | 2244
Moderately active minutes | Fitbit | Per day | 2396
Very active minutes | Fitbit | Per day | 2396
Sedentary minutes | Fitbit | Per day | 2396
Heart rate | Fitbit | Per 5 seconds | 20991392
Time in heart rate zones | Fitbit | Per day | 2178
Resting heart rate | Fitbit | Per day | 1803
Exercise | Fitbit | When it happens | 2440
Sleep score | Fitbit | 100 entries per file | 1836
Google Forms reporting | Google Form | When it happens (usually daily) | 1569
Wellness | PMSYS | Per day | 1747
Injury | PMSYS | Per day | 225
SRPE | PMSYS | Per day | 783</p>
          <p>
to train and other measurements also used for professional soccer teams [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ].
The data was approved by the Norwegian Center for Research Data, with proper
copyright and ethical approval for release. Statistics of the ImageCLEFlifelog
2020 SPLL data are shown in Table 2.
          </p>
        </sec>
        <sec id="sec-2-5-2">
          <title>Evaluation Methodology</title>
          <p>
LMRT: Classic metrics are employed to assess the performance of the LMRT
task. These metrics include:
- Cluster Recall at X (CR@X): a metric that assesses how many different
clusters from the ground truth are represented among the top X results;
- Precision at X (P@X): the number of relevant photos among the
top X results;
- F1-measure at X (F1@X): the harmonic mean of the previous two.
          </p>
          <p>Various cut-off points are considered, e.g., X = 5, 10, 20, 30, 40, 50. The official
ranking metric is the F1-measure@10, which gives equal importance to
diversity (via CR@10) and relevance (via P@10). In particular, the final score used to rank
participants' submissions is the average F1-measure@10 over the ten queries, which
reflects the general performance of each system across all 10 queries.</p>
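As an illustration, the metrics above can be sketched as follows; `retrieved` (a run's ranked image ids), `relevant` (the ground-truth relevant set), and `cluster_of` (the image-to-cluster mapping) are hypothetical names, not part of the official evaluation code.

```python
def precision_at(retrieved, relevant, x):
    """P@X: fraction of the top-X retrieved photos that are relevant."""
    return sum(1 for img in retrieved[:x] if img in relevant) / x

def cluster_recall_at(retrieved, cluster_of, n_clusters, x):
    """CR@X: fraction of ground-truth clusters represented in the top X."""
    found = {cluster_of[img] for img in retrieved[:x] if img in cluster_of}
    return len(found) / n_clusters

def f1_at(retrieved, relevant, cluster_of, n_clusters, x):
    """F1@X: harmonic mean of P@X and CR@X (official metric at X=10)."""
    p = precision_at(retrieved, relevant, x)
    cr = cluster_recall_at(retrieved, cluster_of, n_clusters, x)
    return 0.0 if p + cr == 0 else 2 * p * cr / (p + cr)
```

The per-topic F1@10 values would then be averaged over the ten topics to obtain the final ranking score.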
          <p>Participants were allowed to undertake the sub-tasks in an interactive or
automatic manner. For interactive submissions, a maximum of five minutes of
search time was allowed per topic. In particular, methods that allowed
interaction with real users (via Relevance Feedback (RF), for example) were
encouraged; besides raw performance, the mode of interaction (such as the number of
iterations using RF) and the level of innovation of the method (for example, a new
way to interact with real users) were also taken into account.</p>
          <p>SPLL: For this task, we employ two evaluation metrics to rank the submissions
of participants. The primary score checks how accurately the participants
can predict whether there was an improvement or a deterioration after the training
process, by comparing the sign of the actual change value to the predicted one.
The secondary score is the absolute difference between the actual change and
the predicted one. The primary score is ranked in descending order, and if there
is a draw in the primary score, the secondary score is used to re-rank the teams.</p>
        </sec>
      </sec>
      <sec id="sec-2-6">
        <title>Ground Truth Format</title>
        <p>LMRT Task. The development ground truth for the LMRT task was provided
in two individual txt files: one file for the cluster ground truth and one file for
the relevant image ground truth.</p>
        <p>In the cluster ground-truth file, each line corresponded to a cluster, where
the first value was the topic id, followed by the cluster id number. Lines were
separated by an end-of-line character (carriage return). An example is presented
below:
- 1, 1
- 1, 2
- ...
- 2, 1
- 2, 2
- ...</p>
        <p>In the relevant ground-truth file, the first value on each line was the topic id,
followed by a unique photo id (the image name without the extension), and
then by the cluster id number (corresponding to the values in the
cluster ground-truth file), separated by commas. Each line corresponded to the
ground truth of one image, and lines were separated by an end-of-line character
(carriage return). An example is presented below:
- 1, b00001216_21i6bq_20150306_174552e, 1
- 1, b00001217_21i6bq_20150306_174713e, 1
- 1, b00001218_21i6bq_20150306_174751e, 1
- 1, b00002953_21i6bq_20150316_203635e, 2
- 1, b00002954_21i6bq_20150316_203642e, 2
- ...
- 2, b00000183_21i6bq_20150313_072410e, 1
- 2, b00000184_21i6bq_20150313_072443e, 1
- 2, b00000906_21i6bq_20150312_171852e, 2
- 2, b00000908_21i6bq_20150312_172005e, 2
- 2, b00000909_21i6bq_20150312_172040e, 2
- ...</p>
        <p>SPLL Task. The ground truth was provided in one txt file. On each line in
this file, the first value was the id of the sub-task, which is 1, 2 or 3 (since the
SPLL task is split into three sub-tasks), followed by the id of the individual (p01,
p02, ..., p16), followed by the actual change in the status of the individual after
the training period. Although the three sub-tasks have different requirements,
their output format is the same: a number indicating the change before
and after training, with a preceding '+' sign if the change is an increase, or a '-' sign
if the change is a decrease. If there is no change after the training process, a 0
value without a preceding sign is also allowed. Values in each line were separated
by commas. Lines were separated by an end-of-line character (carriage return).
An example is shown below:
- 1, p01, +8
- 1, p10, +86
- ...</p>
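The signed-change format above can be parsed with a one-record-per-line reader such as this sketch (the helper name is hypothetical):

```python
def parse_spll_line(line):
    """Parse one SPLL ground-truth/submission line of the form
    'subtask, person_id, change'. The change keeps its sign:
    '+8' -> 8, '-3' -> -3, '0' -> 0."""
    subtask, person, change = (part.strip() for part in line.split(","))
    return int(subtask), person, int(change)
```

Python's int() accepts an explicit leading '+' or '-', so the signed values can be read directly.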
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Evaluation Results</title>
      <p>This year, we obtained 50 valid submissions across the two
ImageCLEFlifelog tasks from 6 teams, which is not as high as in the previous year. However,
the results of these submissions show a significant improvement in the final scores
compared to ImageCLEFlifelog 2019. In particular, there were 38 submissions
in LMRT with 6 teams participating in the task, while only one non-organizer
team submitted 10 runs in the SPLL task. The submitted runs and their results are
summarised in Tables 5 and 6.</p>
      <sec id="sec-3-1">
        <title>Results</title>
        <p>In this section, we provide a short description of all submitted approaches, followed
by the official results of the task.</p>
        <p>The Organizer team continued to provide a baseline approach for the LMRT
task with a web-based interactive search engine, which is an improved version</p>
        <sec id="sec-3-1-3">
          <title>Submitted Approaches</title>
          <p>
Notes: * submissions from the organizer teams are for reference only. The results in
this paper are the official version of the ImageCLEFlifelog 2020 tasks.
of LIFER 2.0 system [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ] which was used at ImageCLEFlifelog 2019. The
interactive elements of this system comprise three features: free-text querying and
filtering, visual similarity image search, and elastic sequencing to view nearby
moments. The system, which focuses on evaluating the efficiency of the free-text
query features, is an early version of the LifeSeeker 2.0 interactive search engine
[
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. For the query processing procedure, the authors use natural language
processing to parse the query into meaningful terms and employ a Bag-of-Words model to
retrieve and rank relevant documents. The Bag-of-Words dictionary is split
into three dictionaries for filtering by term matching: time, location, and
visual concepts. The authors extract more detailed visual concepts inferred from
deep neural networks pre-trained on the Visual Genome dataset [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] which were shown to be extremely useful for the retrieval process. For the SPLL
task, the organizer team provided baseline approaches for all three sub-tasks, using
only the exercise data from the Fitbit tracker, self-reporting, and food images. The
authors propose a naive solution which computes the difference between
consecutive rows of data from exercise activities and self-reporting (including distance,
exercise duration, calories, and weight), then categorises the differences into positive and
negative groups based on the sign of the value ('+' or '-') and calculates the
average of each group. Finally, they sum the two averages to obtain the result.
In addition, they also tried to build a Linear Regression model to predict the
pace change and a Convolutional Neural Network to detect the type of food for
manual calorie inference.
          </p>
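The naive baseline described above (sign-split averages of consecutive differences) might look like the following sketch; the function name and the flat list-of-measurements input are our assumptions.

```python
def naive_change_estimate(series):
    """Baseline sketch: difference consecutive measurements, split the
    deltas by sign, average each group, and sum the two averages."""
    deltas = [b - a for a, b in zip(series, series[1:])]
    pos = [d for d in deltas if d > 0]
    neg = [d for d in deltas if d < 0]
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return avg(pos) + avg(neg)
```

The sign of the returned value then serves directly as the predicted improvement/deterioration for the primary SPLL score.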
          <p>
            The REGIM-Lab approaches the LMRT Task with the same strategies as
their work in ImageCLEFlifelog 2019 [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], which used the ground truth of the
development sets from both LMRT 2019 and LMRT 2020 to automatically
categorise images for deep neural network fine-tuning with
MobileNet v2 and DenseNet, and for visual concept clustering. However, the difference
is that they use Elasticsearch and the Kibana Query Language (KQL) to perform
retrieval on image concepts and metadata instead of Apache Cassandra and the
Cassandra Query Language (CQL). Moreover, they attempt to enrich the visual
concepts using YOLO v3 [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ] trained on the OpenImages dataset. They also process
the textual queries with three word-embedding models built from scratch:
Word2vec, FastText, and GloVe.
          </p>
          <p>
            HCMUS focused on the LMRT task only this year. Their retrieval system
has three components which are query by caption, query by similar image, and
query by place and time from the metadata. For query by caption, they encoded
images using Faster R-CNN [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ] to extract object-level features, then applied
self-attention to learn interactions between them. For the query sentence, they used
the RoBERTa model [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] to encode sentences. Finally, two feed-forward networks were
deployed to map image and text features into a common space.
Therefore, when a sentence is given, their model ranks all images based on the
cosine distance between the encoded images and the encoded query sentence
to find the images most relevant to the description. For query
by similar image, the same strategy was applied with a ResNet152 image encoder
[
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] instead of Faster R-CNN. For query by place and time from metadata, they
simply find all moments matching the given semantic locations and view the
images immediately before and after a specific moment.
          </p>
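The ranking step common to HCMUS's query-by-caption and query-by-similar-image components can be sketched as follows, assuming precomputed embeddings in a shared space (all names here are illustrative, not from their system):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rank_images(query_vec, image_vecs):
    """Rank image ids by similarity of their embedding to the query
    embedding, most similar first."""
    return sorted(image_vecs,
                  key=lambda img: cosine(query_vec, image_vecs[img]),
                  reverse=True)
```

In practice the image embeddings would come from the Faster R-CNN (or ResNet152) encoder and the query embedding from the RoBERTa-based text encoder, both mapped into the common space by the feed-forward networks.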
          <p>
            The DCU-DDTeam interactive search engine is an improved version of their
Mysceal system from LSC'20 [24] and follows the same pipeline. The visual
concepts of each image are the combination of the given metadata, the outputs of
DeepLabv3+ [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ] and the enriched metadata extracted from the Microsoft Computer
Vision API. These annotations, along with other information such as locations
and time, were then indexed in Elasticsearch. The input query is analyzed
to extract the main information and expanded by their query expansion mechanism.
These terms are then matched against the indexed database to find matching images,
which are ordered by their ranking function. In this version, they
introduced three changes to the previous system: visual similarity, the user
interface, and the summary panel. The visual similarities between images were
measured using the cosine distance between visual features composed of SIFT
[
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] and VGG16 features. For the user interface, the authors remove the triad
of searching bars as in the original version and reorganised the interface to
explore cluster events more e ciently. The summary panel consists of the \Word
List" panel which is the area on the screen showing the results of their query
expansion with adjustable scores allowing the user to emphasize the concepts
that they need to retrieve.
          </p>
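The concept indexing and weighted matching described above can be illustrated with a toy in-memory inverted index. This is a simplification of what Elasticsearch provides out of the box; the names and structure are our assumptions, not the Mysceal implementation:

```python
from collections import defaultdict

def build_index(annotations):
    # annotations: {image_id: set of concept terms extracted for that image}
    index = defaultdict(set)
    for image_id, terms in annotations.items():
        for term in terms:
            index[term].add(image_id)
    return index

def search(index, query_terms, weights=None):
    # Score each image by the summed weights of matched query terms;
    # adjustable weights mimic the user emphasising "Word List" concepts.
    weights = weights or {}
    scores = defaultdict(float)
    for term in query_terms:
        for image_id in index.get(term, ()):
            scores[image_id] += weights.get(term, 1.0)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Raising the weight of a term pushes images annotated with that concept up the ranking, which is the effect the adjustable scores in the summary panel are designed to give the user.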
          <p>
            The BIDAL team was the only non-organizer team participating in both the
LMRT and SPLL tasks. For the LMRT task, the authors generated clusters by
employing a scene-recognition model trained with Google's FixMatch method
[
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]. They then used an attention mechanism to match the input query with the
correct samples, which were in turn used to find other relevant moments. For
the SPLL task, they summarised information from various interval attributes,
removed several unnecessary attributes, and generated some new ones. They then
trained several typical time-series neural network structures, including a
Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN), using
the generated attributes, a subset of attributes, or some pre-defined seed
attributes.
          </p>
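Feeding interval attributes into a CNN or RNN typically means slicing the per-interval attribute series into fixed-length windows. A minimal sketch of that preprocessing step (our illustration, assuming a `(time, features)` array; not BIDAL's actual code):

```python
import numpy as np

def sliding_windows(series, window, step=1):
    # series: (T, F) array, one row of F attributes per time interval.
    # Returns (N, window, F): N overlapping windows ready for a CNN/RNN.
    series = np.asarray(series)
    return np.stack([series[i:i + window]
                     for i in range(0, len(series) - window + 1, step)])
```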
          <p>
            The UA.PT Bioinformatics team continued to employ the approaches from
last year's challenge [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ] to test the performance of their automatic lifelog search
engine, attempting to enrich visual concepts and labels by utilising several
different object-detection networks, including Faster R-CNN and YOLOv3 [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]
pretrained on the COCO dataset [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. Image retrieval was then performed in
a text-based vector space by computing a similarity score between the text labels
extracted from the images and the visual concepts. Finally, a threshold was
set to choose the results for each topic. As the results showed that this automatic
approach did not work, the authors developed a web-based interactive search
engine with a timestamp-clustering visualisation to select the moments instead
of defining a threshold to choose the results automatically. The algorithms for
finding relevant moments are mostly the same as in the automatic approach, except
for three new features, including narrowing the search by text matching
between the manually analysed query and the indexed database containing the concepts
of each image.
          </p>
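The threshold-based selection over text-label similarity can be sketched as follows. This is illustrative only: the Jaccard measure and all names here are our assumptions, not the team's exact scoring function:

```python
def jaccard(a, b):
    # Overlap between two sets of text labels, in [0, 1].
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_by_threshold(images, query_labels, threshold):
    # images: {image_id: label set}. Keep images whose label overlap
    # with the query meets the cut-off, ranked by score.
    scored = [(img, jaccard(labels, query_labels))
              for img, labels in images.items()]
    kept = [(img, s) for img, s in scored if s >= threshold]
    return sorted(kept, key=lambda kv: -kv[1])
```

The difficulty the team ran into is visible here: a single global `threshold` per topic is brittle, which motivated replacing it with interactive selection.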
          <p>The official results are summarised in Tables 5 and 6. Six teams
participated in the LMRT task, with the highest F1@10 score of 0.81 achieved
by HCMUS (Table 5). Most of the teams tried to enrich the visual concepts by
deploying different CNNs for object and place detection, then performing text
analysis on the query followed by text matching. Additional features were also
included in most interactive systems, such as searching for visually similar images,
term weighting for result re-ranking, and context understanding before performing the
search. The highest-scoring approach, by the HCMUS team, compared
visual feature vectors extracted from CNNs to find relevant moments.</p>
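The F1@10 measure balances the precision of the top-10 results against recall over the distinct relevant moments (clusters). A sketch of one common formulation (our illustration; the official evaluation script may differ in details):

```python
def f1_at_k(ranked_clusters, relevant_clusters, k=10):
    # ranked_clusters: cluster id of each retrieved image, best first.
    # Precision@k counts relevant images in the top k (divided by k);
    # cluster recall counts distinct relevant clusters found in the top k.
    top = ranked_clusters[:k]
    precision = sum(1 for c in top if c in relevant_clusters) / k
    hit_clusters = {c for c in top if c in relevant_clusters}
    recall = len(hit_clusters) / len(relevant_clusters)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because recall is computed over clusters, submitting ten near-duplicate images from one moment scores poorly even if all ten are relevant, which rewards the event-clustering strategies used by the teams.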
          <p>In the SPLL task, only one non-organizer team participated, and they
managed to achieve good scores. For the prediction of performance change, their
approach achieved a prediction accuracy of 0.82 and an L1 distance of 128.0
between the predicted and actual change.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussions and Conclusions</title>
      <p>In ImageCLEFlifelog 2020, most of the submitted results managed to achieve
high scores in both tasks. Although the set of topics for the LMRT task differs
from the previous edition, participants managed to search for the relevant moments
in a large-scale dataset while still achieving higher scores than in the
previous edition. This shows that the proposed features and query mechanisms
actually enhance the performance of their retrieval systems. Most of the teams
enriched semantic visual concepts using several different CNNs pretrained on
different datasets, such as COCO, OpenImage, and Visual Genome, before indexing
and querying; retrieved relevant images using text-matching and text-retrieval
algorithms; and performed visually similar image search. We also note many interesting
approaches from teams to enhance the affordance and interactivity of the retrieval
systems, including integrating a filter mechanism into free-text search, adding
visual feature vectors to the final encoded vector, and clustering images into
events.</p>
      <p>Regarding the number of teams and submitted runs, only six teams
participated in the LMRT task, including an organizer team, producing 50
submissions in total. Each team was allowed to submit up to 10 runs. Among the
five teams which had participated in ImageCLEFlifelog 2019 (including
the organizer team), four managed to obtain better results, with the
highest F1-score reaching 0.81. The mean (SD) increase in final F1-score across these five
teams is 0.25 (0.18). The new team from Dublin City University also managed
to achieve 4th rank with an F1-score of 0.48. For the SPLL task, as the task
is new, only one team, from The Big Data Analytics Laboratory, submitted 10
runs. Their best submission achieved an accuracy of 0.82 on performance change and an
absolute difference of 128 between the predicted and actual change,
which is a good result.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgement</title>
      <p>This publication has emanated from research supported in part by research
grants from the Irish Research Council (IRC) under Grant Number GOIPG/2016/741
and Science Foundation Ireland under grant numbers SFI/12/RC/2289 and
SFI/13/RC/2106.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abdallah</surname>
            ,
            <given-names>F.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feki</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ammar</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amar</surname>
            ,
            <given-names>C.B.</given-names>
          </string-name>
          :
          <article-title>Big data for lifelog moments retrieval improvement</article-title>
          .
          <source>In: CLEF</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Papandreou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schroff</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adam</surname>
          </string-name>
          , H.:
          <article-title>Encoder-decoder with atrous separable convolution for semantic image segmentation</article-title>
          .
          <source>In: ECCV</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boato</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Overview of imagecleflifelog 2017: Lifelog retrieval and summarization</article-title>
          .
          <source>In: CLEF</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Overview of imagecleflifelog 2018: Daily living understanding and lifelog moment retrieval</article-title>
          .
          <source>In: CLEF</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Overview of imagecleflifelog 2019: Solve my life puzzle and lifelog moment retrieval</article-title>
          .
          <source>In: CLEF</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joho</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hopfgartner</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Albatal</surname>
          </string-name>
          , R.:
          <article-title>Overview of ntcir-12 lifelog task</article-title>
          .
          <source>In: NTCIR</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joho</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hopfgartner</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Albatal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Dang-Nguyen, D.T.:
          <article-title>Overview of ntcir-13 lifelog-2 task</article-title>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joho</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hopfgartner</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Albatal</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Healy</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the ntcir-14 lifelog-3 task</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smeaton</surname>
            ,
            <given-names>A.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doherty</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          : Lifelogging:
          <article-title>Personal big data</article-title>
          .
          <source>Found. Trends Inf. Retr. 8</source>
          ,
          <issue>1</issue>
          -
          <fpage>125</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkioxari</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
          </string-name>
          , R.B.:
          <article-title>Mask R-CNN</article-title>
          .
          <source>2017 IEEE International Conference on Computer Vision</source>
          (ICCV) pp.
          <volume>2980</volume>
          -
          <issue>2988</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          pp.
          <volume>770</volume>
          -
          <issue>778</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Peteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datla</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DemnerFushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozlovski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Halvorsen,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.T.</given-names>
            ,
            <surname>Lux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gurrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Dang-Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.T.</given-names>
            ,
            <surname>Chamberlain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Campello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Fichou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Berari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Brie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Stefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.D.</given-names>
            ,
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.G.</surname>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2020: Multimedia retrieval in lifelogging, medical, nature, and internet applications</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 11th International Conference of the CLEF Association (CLEF</source>
          <year>2020</year>
          ), vol.
          <volume>12260</volume>
          .
          <source>LNCS Lecture Notes in Computer Science</source>
          , Springer, Thessaloniki, Greece (September 22-25,
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Krishna</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
          </string-name>
          , J.,
          <string-name>
            <surname>Hata</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kravitz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalantidis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shamma</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fei-Fei</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Visual genome: Connecting language and vision using crowdsourced dense image annotations</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>123</volume>
          , 32-
          <fpage>73</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>H.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Healy</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Lifeseeker 2.0: Interactive lifelog search engine at lsc 2020</article-title>
          .
          <source>Proceedings of the Third Annual Workshop on Lifelog Search Challenge</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>T.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belongie</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hays</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perona</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zitnick</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          :
          <article-title>Microsoft coco: Common objects in context</article-title>
          .
          <source>ArXiv abs/1405.0312</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          .
          <source>ArXiv abs/1907.11692</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Lowe</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Distinctive image features from scale-invariant keypoints</article-title>
          .
          <source>International Journal of Computer Vision</source>
          <volume>60</volume>
          ,
          <fpage>91</fpage>
          &#8211;
          <lpage>110</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          :
          <article-title>Lifer 2.0: Discovering personal lifelog insights using an interactive lifelog retrieval system</article-title>
          .
          <source>In: CLEF</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Redmon</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farhadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Yolov3: An incremental improvement</article-title>
          .
          <source>ArXiv abs/1804.02767</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Faster R-CNN: Towards real-time object detection with region proposal networks</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>39</volume>
          ,
          <fpage>1137</fpage>
          &#8211;
          <lpage>1149</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ribeiro</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neves</surname>
            ,
            <given-names>A.J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          :
          <article-title>UA.PT Bioinformatics at ImageCLEF 2019: Lifelog moment retrieval based on image annotation and natural language processing</article-title>
          .
          <source>In: CLEF</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Sohn</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berthelot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carlini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cubuk</surname>
            ,
            <given-names>E.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurakin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raffel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Fixmatch: Simplifying semi-supervised learning with consistency and confidence</article-title>
          .
          <source>ArXiv abs/2001.07685</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Thambawita</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hicks</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgli</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pettersen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johansen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johansen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kupka</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stensland</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jha</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grønli</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fredriksen</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eg</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fagernes</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biorn-Hansen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hammer</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halvorsen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Pmdata: A sports logging dataset</article-title>
          (02
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>