<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>UA.PT Bioinformatics at ImageCLEF 2020: Lifelog Moment Retrieval Web based Tool</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ricardo Ribeiro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Silva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alina Trifan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose Luis Oliveira</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio J. R. Neves</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IEETA/DETI, University of Aveiro</institution>
          ,
          <addr-line>3810-193 Aveiro</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of the Bioinformatics group of the Institute of Electronics and Engineering Informatics of University of Aveiro in the ImageCLEF lifelog task, more specifically in the Lifelog Moment Retrieval (LMRT) sub-task. In our first participation last year we tackled the LMRT challenge with an automatic approach. Following the same steps, we improved our results, while introducing a new interactive approach. For the automatic approach, two submissions were made. We started by processing all images in the lifelog dataset using object detection and scene recognition algorithms. Afterwards, we processed the query topics with Natural Language Processing (NLP) algorithms in order to extract relevant words related to the desired moment. Finally, we compared the visual concepts of the image with the textual concepts of the query topic with the goal of computing a confidence score that relates the image to the topic. For the interactive approach, we developed a web application that provides users with an interactive visualization tool. The application is divided into three stages. In the first one, the user uploads the images from the dataset, as well as the textual data annotations. In the second stage, the user interacts with the application, assigning the extracted words to the several topics. The application then retrieves the images associated with the topic, each with a certain confidence. In the last stage, we provide a visual environment with two different views, in the form of an image gallery or data tables organized into timestamp clusters. Similarly to our previous participation, the results of the automatic approach are still far from being competitive. We conclude that an automatic approach might not be the best solution for the LMRT task, since the currently available state-of-the-art technology is still not able to yield better results.
However, our interactive approach with relevance feedback obtained better and competitive results, achieving an F1-measure@10 score of 0.52.</p>
      </abstract>
      <kwd-group>
        <kwd>lifelog</kwd>
        <kwd>moment retrieval</kwd>
        <kwd>image processing</kwd>
        <kwd>web application</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The number of workshops and tasks for research has increased over the last
few years, and among them are the main fields of the ImageCLEF 2020 lab [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]:
multimedia retrieval in lifelogging, medical, nature, and internet applications.
Multimedia retrieval in lifelogging has received significant attention from
both research and commercial communities. The increasing number of mobile
and wearable devices is dramatically changing the way we collect data about a
person's life.
      </p>
      <p>
        Lifelogging is defined as a form of pervasive computing consisting of a
unified digital record of the totality of an individual's experiences, captured
multimodally through digital sensors and stored permanently as a personal
multimedia archive. Put simply, lifelogging is the process of tracking and recording
personal data created through our activities and behaviour [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Personal lifelogs have great potential in numerous applications, including
memory and moment retrieval, daily living understanding, diet monitoring, or
disease diagnosis, as well as other emerging application areas [9]. For example,
people with memory problems caused by Alzheimer's disease can use a lifelog application
to help a specialist follow the progress of the disease, or to remember certain
moments from the last days or months.</p>
      <p>
        One of the greatest challenges of lifelog applications is the large amount of
lifelog data that a person can generate. The lifelog datasets, for example the
ImageCLEFlifelog dataset [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], are rich multimodal datasets which consist of
one or more months of data from multiple lifeloggers. Therefore, an important
aspect is the organization of the lifelog data in order to improve the search and
retrieval of information. To organize the lifelog data, useful information
has to be extracted from it. Other important aspects are the visualization and
user interface of the application.
      </p>
      <p>
        With the purpose of improving the results obtained in the previous year's
challenge [7], we developed a first version of a web application to provide a visual
and interactive environment to the user. In last year's work [7], the approach
was fully automatic, using an exhaustive method to retrieve data, and there was
no tool for visualization and interaction with the user. This year, a
significant improvement has been made with regard to data retrieval by using a
dynamic and faster method. Initially, only the data provided by the organization
is used; it is stored in the database for further use in the retrieval stage of our
application. We divided this approach into three stages: upload,
retrieval and visualization. At each stage, there is an interaction with the user,
which is encouraged by the organizers of the ImageCLEFlifelog [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The web
application is still in an early stage but is the baseline of our current work.
      </p>
      <p>The remainder of this paper is organized as follows.
Section 2 provides a brief introduction to the ImageCLEF lifelog task and the
Lifelog Moment Retrieval sub-task. The proposed methods are described in Section
3. In Section 4, the results of all submitted runs obtained in the LMRT sub-task
are described. Finally, a summary of the work presented in this paper, concluding
remarks, and future work can be found in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Task Description</title>
      <p>
        The ImageCLEFlifelog 2020 task [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is divided into two different sub-tasks: the
Lifelog Moment Retrieval (LMRT) sub-task and the Sport Performance Lifelog (SPLL)
sub-task. In this work, as in the previous year's challenge [7], we only addressed the
LMRT sub-task, as part of continuing research that we intend to develop with
the aim of contributing to real-world problems
that can benefit from this technology.
      </p>
      <p>In the LMRT sub-task, the main objective is to create a system capable
of retrieving a number of predefined moments in a lifelogger's day-to-day life
from a set of images. Moments can be defined as semantic events or activities
that happen at any given time during the day. For example, given the query
"Find the moment(s) when the lifelogger was having an icecream on the beach",
the participants should return the corresponding relevant images that show the
moments of the lifelogger having ice cream at the beach. Like last year, particular
attention should be paid to the diversification of the selected moments with
respect to the target scenario.</p>
      <p>
        The ImageCLEFlifelog dataset is a new rich multimodal dataset which consists of
4.5 months of data from three lifeloggers, namely: images (1,500-2,500 per day),
visual concepts (automatically extracted visual concepts with varying rates of
accuracy), semantic content (locations and activities) based on sensor readings
on mobile devices (via the Moves App), biometrics information (heart rate,
galvanic skin response, calories burned, steps, continual blood glucose, etc.), music
listening history and computer usage [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, in this work we only use the
images, the visual concepts and the semantic content of the dataset.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Proposed Method</title>
      <p>We submitted a total of 3 runs in the LMRT sub-task. This year's work
improved significantly on our previous work [7], due
to the interactive and visual approach with the user that we chose to apply.
In this section, we present the proposed approach of our submissions. The first
two runs follow the same approach as last year [7], where we aimed at building a
fully automatic process for image retrieval. The improvement lies in our
last submission, for which a web application was developed that provides a visual and
interactive environment to the user. This web application is a first prototype,
far from a final version, but we consider it a baseline for our work.</p>
      <sec id="sec-3-1">
        <title>Automatic approach (Run 1 and 2)</title>
        <p>
          Initially, the images of the dataset were processed using algorithms for label
detection, such as objects and scenes. The information provided by the
organizers, such as locations, activities and local time, is also used. In both runs,
for scene recognition we used a pretrained model provided by Zhou et al. [10],
trained on the Places365 standard dataset. For the first run, the method used to
extract objects from the images is a combination of ResNeXt-101 and Feature
Pyramid Network architectures in a basic Faster Region-based Convolutional
Network (Faster R-CNN) pretrained on the COCO dataset that was proposed
by Mahajan et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>
          In the second run, the object detection algorithm used is the YoloV3 [6] model
pretrained on the COCO dataset. Subsequently, we proceed to the extraction
of relevant words from the query topics and the computation of the semantic
similarity between word vectors, both done with a Natural Language Processing library
called SpaCy [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. From the topic title, description and narrative, relevant words
were extracted and organized into different categories, such as relevant things,
negative things, activities, dates, locations and environment.
        </p>
        <p>Using topic 1 as an example:
<list list-type="bullet">
<list-item><p>Title: "Praying Rite."</p></list-item>
<list-item><p>Description: "Find the moment when u1 was attending a praying rite with
other people in the church."</p></list-item>
<list-item><p>Narrative: "To be relevant, the moment must show u1 is currently inside
the church, attending a praying rite with other people. The moments that
u1 is outside with the church visible or inside the church but is not attending
the praying rite are not considered relevant."</p></list-item>
</list></p>
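        <p>The extracted textual data is as follows:
<list list-type="bullet">
<list-item><p>relevant things: "rite", "people".</p></list-item>
<list-item><p>activities: "praying", "praying rite", "attending".</p></list-item>
<list-item><p>locations: "church".</p></list-item>
<list-item><p>dates: empty.</p></list-item>
<list-item><p>user inside: "true".</p></list-item>
<list-item><p>user outside: "false".</p></list-item>
<list-item><p>negative relevant things: "church visible".</p></list-item>
<list-item><p>negative locations: empty.</p></list-item>
<list-item><p>negative activities: empty.</p></list-item>
<list-item><p>negative dates: empty.</p></list-item>
</list></p>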
        <p>Afterwards, a confidence score is computed for each image in the dataset.
The score is obtained by comparing the words extracted from the
topic with the labels extracted from the images. This score is influenced by the
scores of the image concepts obtained in the object detection phase and
by the different weights assigned to each category. The weight for each category is
obtained from two different factors: a factor of importance and a computed
factor.</p>
        <p>In Run 1, the importance factor for all categories is the same. This means
that each category has the same weight in the computation of the confidence
score.</p>
        <p>For Run 2, we decided to define the importance factor differently for each
category. We give greater importance to specific categories like "relevant things"
in order to improve results, since we compute the similarity of this textual
category with the image labels extracted by our object detection. Categories like
"activities" and "locations" get a smaller importance factor, since they are
compared to the organizers' label data, which is limited and less accurate. The
sum of the importance factors of all categories is equal to 1, which represents
100%.</p>
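        <p>As an illustration only, the role of the importance factors can be sketched as a weighted sum of per-category match scores. The combination rule, the function name <monospace>confidence</monospace>, the reduced set of categories and all numeric weights below are assumptions made for this sketch and are not taken from the submitted system; only the constraint that the factors sum to 1 comes from the text.</p>
        <preformat>
```python
from typing import Dict

# Hypothetical importance factors; each set sums to 1 as stated in the text.
RUN1_WEIGHTS = {"relevant things": 0.25, "activities": 0.25,
                "locations": 0.25, "dates": 0.25}
RUN2_WEIGHTS = {"relevant things": 0.5, "activities": 0.2,
                "locations": 0.2, "dates": 0.1}


def confidence(category_scores: Dict[str, float],
               weights: Dict[str, float]) -> float:
    """Weighted sum of per-category match scores (assumed combination rule).

    category_scores maps a category name to a match score in [0, 1], for
    example the best similarity between a topic word and an image label
    multiplied by the detector score of that label.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("importance factors must sum to 1")
    return sum(weights[c] * category_scores.get(c, 0.0) for c in weights)


# An image that matches the objects well but the activity only weakly.
scores = {"relevant things": 0.9, "activities": 0.2, "locations": 1.0}
run1 = confidence(scores, RUN1_WEIGHTS)  # equal weights
run2 = confidence(scores, RUN2_WEIGHTS)  # "relevant things" weighted higher
```
        </preformat>
        <p>With these toy numbers, the same image receives a higher confidence under the Run 2 style weights, because its strongest evidence lies in the category that is given the most importance.</p>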
        <p>The computed factor is obtained by distributing the factor of
importance of empty categories to all other categories. If we don't extract any
textual data from a query topic for the category "activities", this category will
be empty; we therefore apportion the importance factor of the "activities"
category to all other categories, increasing their importance factors, in order to
maintain the sum of 1. The added value is not the same for each category: we
keep the ratio of the distribution the same as the ratio of the importance
factors between all categories. To make this clear: if the importance factor for
"relevant things" is 0.5, which is half of the sum of all importance factors, and
the "activities" category is worth 0.2 and has no extracted textual data, then
half of 0.2 is distributed to "relevant things", which increases its importance to
0.6, and the remaining 0.1 is distributed in the same way to the other categories,
ensuring that the sum of all importance factors is 1.</p>
        <p>The negative categories work the same way, but instead of contributing to
the confidence score, they decrease the value of the confidence.</p>
        <p>A general threshold was previously defined in order to remove images with low
concept scores or a low confidence score; images above the threshold are selected
for the query topic. The threshold was set by trial and
error during the test phases, and it merely serves the purpose of saving some
computational time.</p>
        <p>Run 2 differs from Run 1 not only in the image processing step, where
different image processing algorithms were used, but also in the retrieval step, where
all factors of importance were altered in order to give more importance to some
categories than others, as previously discussed. Another difference is that the
negative categories were discarded from the calculation of the confidence score
in Run 2.</p>
        <p>Finally, a script runs through all the selected confidence scores for a given
query topic and stores the fifty highest in the CSV file. As expected from the
previous year's results, this automatic and exhaustive approach is not the most
suitable for a lifelog application.</p>
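        <p>The final selection step can be sketched as follows. This is a minimal illustration rather than the script used for the submissions: the function name <monospace>write_top_images</monospace>, the column order (topic, image, confidence) and the toy identifiers are assumptions.</p>
        <preformat>
```python
import csv
import heapq
import io
from typing import Dict, TextIO


def write_top_images(topic_id: int, confidences: Dict[str, float],
                     out: TextIO, limit: int = 50) -> int:
    """Write the `limit` highest-scoring images for one topic as CSV rows."""
    best = heapq.nlargest(limit, confidences.items(), key=lambda kv: kv[1])
    writer = csv.writer(out)
    for image_id, score in best:
        writer.writerow([topic_id, image_id, f"{score:.4f}"])
    return len(best)


# Toy usage with made-up image identifiers and scores.
buffer = io.StringIO()
toy = {f"img_{i:03d}": i / 100 for i in range(80)}
written = write_top_images(1, toy, buffer)
rows = buffer.getvalue().splitlines()
```
        </preformat>
        <p>Using <monospace>heapq.nlargest</monospace> avoids sorting every score when only the fifty best are needed.</p>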
      </sec>
      <sec id="sec-3-2">
        <title>Web application (Run 3)</title>
        <p>To improve our results in this challenge, we developed a web application that
provides visualization and an interactive tool for our lifelog system. As encouraged by
the organizers, in this run we used a method that allows interaction with users.
As a first approach, we only consider the data provided by the challenge
organizers. We divided the web application into three stages:
<list list-type="bullet">
<list-item><p>Upload: the user uploads the images from the lifelog dataset into the
application. The textual data annotations provided by the organizers are
automatically uploaded and organized in the application database, associated
with the uploaded images.</p></list-item>
<list-item><p>Retrieval: the user enters the input words extracted from the query
topic into several word categories, together with date and time. The retrieval process
starts by comparing these inputs with the information in the application database. Finally, a
confidence for the query topic is assigned to each retrieved image.</p></list-item>
<list-item><p>Visualization: the user visualizes the retrieved images and scores, in the form
of an image gallery or data tables, divided into timestamp clusters. The user
manually chooses the relevant clusters for the query topic.</p></list-item>
</list></p>
        <p>Figure 1 shows a general representation of our lifelog application. In the first
stage, the user uploads the images into the application, where they are stored in
the database together with the data provided by the organizers for each image
of the lifelog dataset. The retrieval stage begins when the user requests the image
retrieval for a query topic by manually introducing relevant words in the application.
These relevant words are divided into several categories, such
as objects, locations, activities, irrelevant words, date and time, and they are
compared with the labels stored in the database. This comparison is made using
the similarity of word vectors. Images with labels similar to the relevant words of
the topic are selected. Subsequently, the confidence for each selected image is
computed from the similarity value of the labels and the score of each similar
label in the database. In order to reduce the number of images retrieved by the
system, images with low confidence are excluded from the output. At the
end, the retrieved images are clustered by timestamp intervals and the user can
visualize the images in the form of an image gallery or data tables.</p>
        <p>A more detailed explanation is provided in the following paragraphs for each
stage of our lifelog application.</p>
        <p><bold>Upload.</bold> In an initial stage, the user uploads the image dataset into the lifelog
application, where the images are organized and stored in the database together with some of
the data provided by the organizers, such as visual concepts and metadata. The
data is organized in our database into different tables/models, such as images,
concepts, locations, activities, scenes, attributes, among others. In our
application, each model maps to a single database table. Figure 2 shows a diagram of
these data models in the database. The relationships between models make our
system faster and more efficient compared to an exhaustive approach.</p>
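        <p>A relational layout of this kind can be sketched with an in-memory SQLite database. The sketch is deliberately reduced to images, labels and one linking table; all table names, column names and sample rows are hypothetical and do not reproduce the actual models of the application.</p>
        <preformat>
```python
import sqlite3

# Hypothetical, reduced schema: one table per model, plus a linking table
# that lets one image carry many labels and one label tag many images.
SCHEMA = """
CREATE TABLE image   (id INTEGER PRIMARY KEY, file_name TEXT, taken_at TEXT);
CREATE TABLE label   (id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
CREATE TABLE concept (
    image_id INTEGER REFERENCES image(id),
    label_id INTEGER REFERENCES label(id),
    score    REAL,
    PRIMARY KEY (image_id, label_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO image VALUES (?, ?, ?)", [
    (1, "a.jpg", "2018-05-03 08:00"),
    (2, "b.jpg", "2018-05-03 08:01"),
])
conn.executemany("INSERT INTO label VALUES (?, ?)", [(1, "cup"), (2, "laptop")])
conn.executemany("INSERT INTO concept VALUES (?, ?, ?)", [
    (1, 1, 0.91), (2, 1, 0.40), (2, 2, 0.88),
])


def images_with_label(tag: str):
    """Return (file_name, score) pairs for images carrying the given label."""
    query = """
        SELECT image.file_name, concept.score
        FROM concept
        JOIN image ON image.id = concept.image_id
        JOIN label ON label.id = concept.label_id
        WHERE label.tag = ?
        ORDER BY concept.score DESC
    """
    return conn.execute(query, (tag,)).fetchall()


cup_images = images_with_label("cup")
```
        </preformat>
        <p>Looking up images through a label in this way touches only the rows linked to that label, which is the kind of saving over an exhaustive scan of every image that the relationships between models are meant to provide.</p>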
        <p>The image model has a many-to-many relationship with the concept,
location, category, activity and attribute models. For example, an image can contain
several concepts, and a concept can be found in several images. The tag field
of the label model holds the label's name extracted from the visual concepts and
metadata. The label model has a one-to-many relationship with the other models; in other
words, one label may be connected to several images and can be
associated with several models, such as the concept and category models, depending
on the type of label and the number of times it appears in the image. Usually,
the names of the labels are in their base or dictionary form, called the
word's lemma; labels in other forms are transformed to the base form
for further use. This transformation is called lemmatization.</p>
        <p><bold>Retrieval.</bold> Unlike the exhaustive approach of runs 1 and 2, which computes the
confidence of every image, this approach (run 3) only computes the confidence of
the images that are selected in a first step for the specific topic by using the
similarity of word vectors, which makes this retrieval method more efficient and
reduces processing time.</p>
        <p>The topics are manually analysed by the user, who extracts relevant words
from them. The retrieval step begins when the user enters these words in the
application, divided into several categories, such as
objects, locations, activities and irrelevant words. If a topic contains time ranges,
years or days of the week, the user
can also insert that data in our application to further filter the retrieved images.
Figure 3 shows the retrieval view of the web application.</p>
        <p>In the retrieval stage, the input arguments are: objects that appear in the
images; activities that the user was practicing; locations or places where the
user was; negatives, that is, irrelevant things, activities or locations that should not
appear in the images; time ranges, years and days of the week (Monday, Tuesday,
Wednesday, Thursday, Friday, Saturday, and Sunday).</p>
        <p>
          The SpaCy library [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is used for two different tasks: to obtain the base forms of
words (lemmatization) and to compare word vectors (cosine similarity). As in the
upload stage, the input words are reduced to their lemmas, which improves
and facilitates the comparison between word vectors. Afterwards, the similarity
between the processed input words and the labels stored in the database is
computed. Images that contain labels that are similar or equal to the words entered
by the user are selected to compute the confidence. If the user enters negative
words in our application, images with labels similar or equal to these negative
words are automatically excluded. In order to improve the processing time of the
retrieval stage, the similarity of each word pair is stored in the database so that
it is not necessary to compute the similarity of the same word pair more than
once.
        </p>
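        <p>The idea of computing each word-pair similarity only once can be sketched as below. This is an illustration under stated assumptions: the three-dimensional vectors are toy values standing in for the word vectors of an NLP library, and an in-memory dictionary replaces the database table that stores the pairs; the names <monospace>VECTORS</monospace>, <monospace>similarity</monospace> and <monospace>computed</monospace> are hypothetical.</p>
        <preformat>
```python
import math
from typing import Dict, Sequence, Tuple

# Toy word vectors standing in for the vectors of an NLP library.
VECTORS: Dict[str, Sequence[float]] = {
    "church": (0.9, 0.1, 0.0),
    "chapel": (0.8, 0.2, 0.1),
    "car": (0.0, 0.2, 0.9),
}

_cache: Dict[Tuple[str, ...], float] = {}
computed = 0  # counts real computations, to show the effect of the cache


def similarity(word_a: str, word_b: str) -> float:
    """Cosine similarity of two words, computed once per unordered pair."""
    global computed
    key = tuple(sorted((word_a, word_b)))
    if key not in _cache:
        a, b = VECTORS[word_a], VECTORS[word_b]
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        _cache[key] = dot / norm
        computed += 1
    return _cache[key]


first = similarity("church", "chapel")
second = similarity("chapel", "church")  # served from the cache
unrelated = similarity("church", "car")
```
        </preformat>
        <p>Sorting the two words before building the key makes the cache independent of argument order, so a pair entered as input word and label, or the other way round, is only ever computed once.</p>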
        <p>The confidence of the selected images is computed using the similarity
calculated previously and the score of the labels. For labels without a score field,
only the similarity is used to calculate the confidence. As a last filtering step of the
retrieval stage, the images are selected based on the confidence threshold.</p>
        <p><bold>Visualization.</bold> The selected images are organized into different clusters based
on the image timestamps provided by the organizers, and the retrieved images are
displayed in our application organized into these timestamp clusters. The
application provides an easy way for users to visualize and identify the clusters that
are associated with the specific topic. Figure 4 shows the user view of the clustered
images in the form of an image gallery. We provide another view in the
form of a data table, as shown in Figure 5.</p>
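        <p>The confidence filtering and the timestamp clustering can be sketched together as follows. Several details are assumptions made for illustration: the confidence is taken as the product of similarity and label score, the threshold is 0.5, and a new cluster starts whenever two consecutive images are more than five minutes apart. The actual combination rule, threshold and interval used by the application are not specified here.</p>
        <preformat>
```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (image id, timestamp, similarity, optional label score)
Match = Tuple[str, datetime, float, Optional[float]]


def confidence(similarity: float, label_score: Optional[float]) -> float:
    """Similarity weighted by the label score, or similarity alone if absent."""
    return similarity if label_score is None else similarity * label_score


def cluster_by_time(matches: List[Match], threshold: float = 0.5,
                    max_gap: timedelta = timedelta(minutes=5)) -> List[List[str]]:
    """Drop low-confidence images, then group the rest by timestamp gaps."""
    kept = sorted(
        (m for m in matches if confidence(m[2], m[3]) >= threshold),
        key=lambda m: m[1],
    )
    clusters: List[List[str]] = []
    previous: Optional[datetime] = None
    for image_id, taken_at, _, _ in kept:
        if previous is None or taken_at - previous > max_gap:
            clusters.append([])  # gap too large: start a new cluster
        clusters[-1].append(image_id)
        previous = taken_at
    return clusters


day = datetime(2018, 5, 3)
toy = [
    ("a", day.replace(hour=8, minute=0), 0.9, 0.8),    # kept
    ("b", day.replace(hour=8, minute=2), 0.7, None),   # kept, no label score
    ("c", day.replace(hour=8, minute=3), 0.6, 0.5),    # below threshold
    ("d", day.replace(hour=12, minute=30), 0.8, 0.9),  # kept, later moment
]
clusters = cluster_by_time(toy)
```
        </preformat>
        <p>In this toy example, images a and b fall into one morning cluster and image d forms a second cluster, while image c is removed by the threshold before clustering.</p>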
        <p>In order to improve the results, the user can exclude irrelevant images
from the selected clusters. To improve the cluster recall of the run, the user can
set the confidence of one relevant image in each selected timestamp cluster to
the maximum confidence, which consequently increases the F1-measure of this run.</p>
        <p>We submitted a total of 3 runs to the LMRT sub-task. In this task, the final score
is the arithmetic mean of the results over all query topics. The ranking
metric was the F1-measure@10, which gives equal importance to diversity (via
CR@10) and relevance (via P@10), that is, Cluster Recall and Precision at the top 10 results,
respectively.</p>
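        <p>For reference, the metric for a single topic can be sketched as the harmonic mean of P@10 and CR@10. The function name <monospace>f1_at_k</monospace>, the ground-truth representation and the toy ranking are assumptions for illustration; the official evaluation is performed by the task organizers.</p>
        <preformat>
```python
from typing import Dict, List, Set


def f1_at_k(ranked: List[str], relevant_cluster: Dict[str, int],
            total_clusters: int, k: int = 10) -> float:
    """Harmonic mean of precision@k and cluster recall@k for one topic.

    relevant_cluster maps each relevant image id to the id of the
    ground-truth cluster (moment) it belongs to.
    """
    top = ranked[:k]
    hits = [img for img in top if img in relevant_cluster]
    precision = len(hits) / k
    found: Set[int] = {relevant_cluster[img] for img in hits}
    cluster_recall = len(found) / total_clusters
    if precision + cluster_recall == 0:
        return 0.0
    return 2 * precision * cluster_recall / (precision + cluster_recall)


# Toy topic with two ground-truth clusters; only cluster 0 is retrieved.
truth = {"i1": 0, "i2": 0, "i3": 1}
ranking = ["i1", "x1", "i2", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
score = f1_at_k(ranking, truth, total_clusters=2)
```
        </preformat>
        <p>Here P@10 is 0.2 and CR@10 is 0.5, so the topic score is 2/7, roughly 0.29. Averaging such per-topic scores over all query topics gives the final score of a run.</p>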
        <p>We described the three submissions in Section 3. The first two submissions
follow an automatic approach, as in our previous work [7]. Given the results of
this automatic approach, we considered the development of a system
that allows interaction with real users, as emphasized by the organizers.</p>
        <p>
          Comparing the automatic with the interactive approach, a significant
improvement can be seen. This improvement is due not only to the new
retrieval approach, but also to the interactive and visual approach. We consider
visualization and user interaction to be among the most important tools in a lifelog
application.
The results obtained are shown in Table 1, along with the best result in this task,
for comparison. The results of all of the participating teams can be found in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
We can observe that our last submission (run 3) is still not the best in this
task, but we made a considerable improvement compared with the automatic
approach from this year and the previous year [7], and we are also closing the
gap to the best teams, such as the HCMUS team, which obtained the best F1-measure@10
in the LMRT task, with the ambition of obtaining much better results.
        </p>
        <p>Considering the results shown in Table 1, we are convinced that the
interactive approach is better suited to the LMRT challenge: the user's
visualization of and interaction with the application allow for much more
accurate results. Creating a fully automatic system is complicated because it
requires a lot of processing power, as every image has to be fully processed in order
to extract labels. However, assuming that computing time is not a problem,
a few ways in which we could improve the results of our automatic approach in the
future would be to implement activity recognition algorithms, color recognition
algorithms and better scene recognition algorithms.</p>
        <p>For an initial lifelog application, the results show that we are on a good path
to solving some of the problems that exist in these challenges, which could help to
improve the daily lives of many people. In this work we solved some of the problems
of our previous work [7], such as the identification of bigrams,
trigrams or n-grams, which allows us to compute the similarity between n-grams
or sentences.</p>
        <p>In our application, we only use the information provided by the organizers,
which leaves us somewhat limited as to the visual concepts in the lifelog images.
We believe that, using the most recent state-of-the-art algorithms, a richer
description of the images can be obtained, resulting in a performance increase. In
the future, we intend to integrate into our application features that have already
been developed in previous work, such as selecting images in the upload stage based
on low-level properties [8]. We also think that using more of the metadata
provided by the organizers can improve the results, for example by making use
of the GPS coordinates (latitude and longitude) to trace the lifelogger's routes,
such as the way from home to work and vice versa.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future Work</title>
      <p>The Lifelog Moment Retrieval (LMRT) sub-task of ImageCLEF lifelog 2020 was
the baseline for a new web application that aims to help people to improve their
quality of life.</p>
      <p>We obtained exactly the same results for the automatic approach (run 1 and
run 2) even when using different state-of-the-art object detection algorithms
and different weights for each category. One of the reasons for this is
that much of the information used was provided by the organizers, such as
activities and locations. In addition, the scene recognition labels obtained
were not accurate enough.</p>
      <p>In our interactive approach, using the application developed, we were able to
obtain an F1-measure@10 score of 0.52, which is our best to date. This makes
us believe that an approach with visualization and user interaction is a more
suitable method for a lifelog application. Although the results are already better
than those of the previous work, our application is a baseline version which still
requires improvements and new tools.</p>
      <p>For future improvements to our approaches, we intend to implement better
scene recognition, object detection, activity and color detection algorithms, since
color was a relevant element in some of the topics in the LMRT task. We will
also use other data provided by the organizers, such as GPS coordinates, and
integrate features that have already been implemented in previous work.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Supported by the Integrated Programme of SR&amp;TD SOCA (Ref.
CENTRO-01-0145-FEDER-000010), co-funded by Centro 2020 program, Portugal 2020,
European Union, through the European Regional Development Fund.</p>
      <p>6. Redmon, J., Farhadi, A.: YOLOv3: An incremental improvement. arXiv preprint
arXiv:1804.02767 (2018)</p>
      <p>7. Ribeiro, R., Neves, A.J., Oliveira, J.L.: UA.PT Bioinformatics at ImageCLEF 2019:
Lifelog moment retrieval based on image annotation and natural language
processing. In: CLEF (Working Notes) (2019)</p>
      <p>8. Ribeiro, R.F., Neves, A.J., Oliveira, J.L.: Image selection based on low level
properties for lifelog moment retrieval. In: Twelfth International Conference on Machine
Vision (ICMV 2019). vol. 11433, p. 1143303. International Society for Optics and
Photonics (2020)</p>
      <p>9. Wang, P., Sun, L., Smeaton, A.F., Gurrin, C., Yang, S.: Computer vision for
lifelogging: Characterizing everyday activities based on visual semantics. In: Computer
Vision for Assistive Healthcare, pp. 249–282. Elsevier (2018)</p>
      <p>10. Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: A 10 million
image database for scene recognition. IEEE Transactions on Pattern Analysis and
Machine Intelligence 40(6), 1452–1464 (2017)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dodge</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kitchin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>'Outlines of a world coming into existence': pervasive computing and the ethics of forgetting</article-title>
          .
          <source>Environment and Planning B: Planning and Design</source>
          <volume>34</volume>
          (
          <issue>3</issue>
          ),
          <fpage>431</fpage>
          –
          <lpage>445</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Honnibal</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montani</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing</article-title>
          (
          <year>2017</year>
          ), to appear
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Péteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datla</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozlovski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halvorsen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chamberlain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fichou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berari</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dogariu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stefan</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantin</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2020: Multimedia retrieval in lifelogging, medical, nature, and internet applications</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020)</source>
          , vol.
          <volume>12260</volume>
          . LNCS Lecture Notes in Computer Science, Springer, Thessaloniki, Greece (September 22-25,
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Mahajan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanathan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paluri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bharambe</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Maaten</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Exploring the limits of weakly supervised pretraining</article-title>
          .
          <source>In: Proceedings of the European Conference on Computer Vision (ECCV)</source>
          . pp.
          <fpage>181</fpage>
          -
          <lpage>196</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halvorsen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          :
          <article-title>Overview of ImageCLEF Lifelog 2020: Lifelog Moment Retrieval and Sport Performance Lifelog</article-title>
          .
          <source>In: CLEF2020 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;, Thessaloniki, Greece (September 22-25,
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>