<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leveraging Content and Context in Understanding Activities of Daily Living</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Minh-Son Dao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asem Kasem</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohamed Saleem Haja Nazmudeen</string-name>
          <email>mohamed.saleem@utb.edu.bn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universiti Teknologi Brunei</institution>
          ,
          <country country="BN">Brunei</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Information Technology</institution>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces a content-context-based method that automatically summarizes lifelog data based on selected concepts of Activities of Daily Living (ADL). The main idea of the proposed method is to create (1) a so-called Daily-Normal Environment Panorama (DNEP) image and (2) a Daily-Abnormal Environment (DAE) taxonomy. The former is used to detect events that happen in known environments, such as in a house, in an office, and on the way from a parking lot to an office. The latter is used to detect events whose concepts can be detected by a predefined taxonomy, such as in a restaurant, in a church, and on a street. The proposed method is evaluated using the data and evaluation tool offered by the organizers of imageCLEFlifelog2018 - subtask Activities of Daily Living understanding (ADLT). The experiments show that the proposed method works better than the methods proposed by other participants of imageCLEFlifelog2018.</p>
      </abstract>
      <kwd-group>
        <kwd>lifelog data analysis</kwd>
        <kwd>image alignment and stitching</kwd>
        <kwd>semantic taxonomy</kwd>
        <kwd>heterogeneous data segmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Lifelogging is an interesting topic that can help to understand daily living
activities and to recall moments of interest that happened in the past. The scope of
applications that utilize lifelog data extends to human-supported scenarios such
as personal healthcare and personal assistants [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Since lifelog data is
time-series data, event detection in lifelog data can be treated like scene detection
in video data, where data segmentation plays an essential role [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Therefore, the
task of segmentation in lifelog data analysis is crucial, and many recent works
have focused on lifelog data segmentation [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ][
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Most of these methods
utilize only visual features for segmentation. However, lifelog data contains
not only visual information but also heterogeneous data such as textual data
(e.g. tags, comments), geo data (e.g. GPS, place names), and physiological data
(e.g. heartbeat, skin temperature). Hence, it is essential to have a method that
can analyze such heterogeneous and big data to extract the information people
may need.
      </p>
      <p>
        The imageCLEFlifelog2018 (the organizer hereafter) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ][
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] organizes a
challenge to encourage participants to propose methods that automatically analyze
such data for categorizing, summarizing, and retrieving information of
interest. We joined the challenge in the subtask "Activities of Daily Living
understanding" (ADLT); this paper reports our work and discusses its
evaluation on this subtask.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>
        In this section, we introduce what we call the DNEP image and the DAE taxonomy
and show how to leverage them to detect events in lifelog data. The idea behind
these image and taxonomy concepts is based on the principle "Content without Context is
meaningless" [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. That is, we use not only the content of lifelog data to
understand an event but also the context in which that event happened. This is
expected to improve event detection and to reduce missing or redundant
information at event boundaries.
      </p>
      <p>2.1 Daily-Abnormal Environment Taxonomy (Contexts and Activities)</p>
      <p>Most of the events that happen daily have their own unspoken or spoken rules, on
which we can build a suitable taxonomy. For example, when visiting a church,
the environment must contain salient and typical symbols of a church, such as
the cross and statues of saints, while the activities could be slowly walking,
quietly sitting, or kneeling. Therefore, based on the event's concept, we can build
a taxonomy for the specific event to further detect that event in lifelog data.
Another example is socialising in a restaurant. The visual taxonomy of a meal
table, especially a menu and a counter (with or without a queue), can be
integrated with GPS and/or the restaurant's name tagged by people to distinguish
whether the event happened in a restaurant or in a relative's house.</p>
      <p>In fact, each daily activity can be determined once the environment
in which it happens is known. These environments can in turn be determined
by "scene recognition" and "visual concept detection" tasks. The former names
a place, and the latter labels all objects that appear in the place.</p>
      <p>
        The work carried out by Zhou et al. [13] is a good example of scene
recognition. In this work, the scene hierarchy defined by the authors has two levels,
and each scene is assigned to a suitable slot in this hierarchy. For example,
the conference room scene is located at (level 1: indoor → level 2: workplace
(office building, factory, lab, etc.)) (see http://places2.csail.mit.edu/download.html).
The organizer also offered the scene ontology
described in the NTCIR-13 Lifelog Ontology [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This ontology summarizes
the scenes that appear in the dataset. By integrating the scene ontology and the scene
hierarchy, we build the DAE taxonomy from the root to level 3, as illustrated in Fig. 1.
      </p>
      <p>
        Once scene A (in level 3) has been located, the next question is "what
kind of activities normally happen, and which visual concepts always appear,
in scene A?". This question leads to the need to build level 4 of the DAE
taxonomy. In this level, each level-3 scene has two same-level categories: activities
and visual concepts. The former is built from the Activities/Facets of Life
activity defined in the NTCIR-13 Lifelog Ontology; the latter is constructed by
utilizing the visual concepts, food-logs, and drink-logs described in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. For
example, in a kitchen, preparing a meal often happens, and there must definitely
be kitchen appliances and food, as illustrated in Fig. 1.
      </p>
      <p>Algorithm 1: DAE taxonomy building
1. Repeat Alg.1-step.1 on the ground-truth data to generate the training data.
2. Use the context, physical activity tags, physiological data, and image/visual
concepts provided by the organizer and the scene hierarchy offered by [13] to
build the DAE taxonomy. In our case, the physical activity tags and
physiological data are utilized to determine two categories: (1) moving and (2)
no-moving.
3. Build a lookup location table (DAE-LT) for determining the geofencing of
the event using GPS and place-name tags.</p>
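      <p>The lookup location table in step 3 is not specified further in this paper. As a minimal sketch, assuming each entry stores a centre coordinate and a radius, a geofence test could combine the place-name tag and the GPS fix as shown below. The table layout, the coordinates, the radii, and the function names are hypothetical and serve only as an illustration.</p>
      <preformat><![CDATA[
```python
import math

# Hypothetical lookup table: place-name tag -> (latitude, longitude, radius in metres).
# The coordinates and radii are made-up illustration values, not data from the paper.
LOOKUP_TABLE = {
    "home": (53.3498, -6.2603, 50.0),
    "office": (53.3850, -6.2570, 80.0),
}


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    earth_radius = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius * math.asin(math.sqrt(a))


def in_geofence(place, lat=None, lon=None, name_tag=None, table=LOOKUP_TABLE):
    """True if the name tag matches the place or the GPS fix lies inside its radius."""
    if place not in table:
        return False
    # A matching place-name tag is accepted even when no GPS fix is available.
    if name_tag is not None and name_tag.strip().lower() == place:
        return True
    if lat is None or lon is None:
        return False
    plat, plon, radius = table[place]
    return haversine_m(lat, lon, plat, plon) <= radius
```
]]></preformat>
      <p>In this sketch a call such as in_geofence("home", lat, lon) answers whether a lifelog image was taken inside the geofence of a known place; a circular fence is only one possible choice.</p>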
      <p>The movement of people inside a scene can also give rich semantic cues for
understanding their activities. For example, the loop of long standing (no-moving)
and short walking (moving) inside a kitchen can differ from the loop of long
sitting (no-moving) and short walking/turning (moving) inside an office. This
information can be captured by using the biometrics:(grs, steps) entry in the
metadata of the dataset with the human activity recognizer developed in [15].</p>
      <p>Algorithm 1 summarizes the content of this subsection.</p>
      <p>2.2 Daily-Normal Environment Panorama Image (Visual Background)</p>
      <p>One of the characteristics of lifelog data is the repeated routine. People usually
have at least one place that they visit almost every day and whose environment
rarely changes, such as their home, a relative's house, an office, a favorite restaurant,
or a familiar supermarket. Therefore, if we can accumulate all images captured
in those places, we can build a panorama image. Consequently, if we can
successfully project a lifelog image onto a panorama image with a known concept (e.g. using image
alignment, object detection, or image segmentation), we can
assign the right event label to that image and further detect the boundary of
that event.</p>
      <p>
        Image alignment and stitching have been researched for over a decade
now [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Two popular approaches are applied to align and stitch images: (1)
feature-based methods and (2) direct (or global) methods. While the former use
image features (e.g. points, edges), the latter utilize the whole image to estimate
the transformation between images. We utilized two methods, introduced by
Meneghetti et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and by Poleg and Peleg [12], to create our DNEP images. The
former can deal with sparsely structured environments where not enough
distinct features can be detected, such as uniform walls, floors, and
ceilings in indoor scenarios, and sky and sea in outdoor scenarios. The latter
can deal with the non-overlapping images that arise from the non-continuous
recording of lifelog data.
      </p>
      <p>After a DNEP image is created, it is placed in the DAE
taxonomy by using scene recognition tools [13]. Fig. 2 illustrates the DNEP image
of a living room scene, created by aligning and stitching all development
data containing living room images.</p>
      <p>Algorithm 2 describes how we create a DNEP image.</p>
      <p>
        Algorithm 2: DNEP image building
1. Use the ground-truth data to build the training data of a given event.
2. Assign the suitable hierarchical context (e.g. indoor → in a house → in a
kitchen, indoor → in a house → in a living room, indoor → in a working
place → in an office) to the training data.
3. Build a lookup location table (DNEP-LT) for determining the geofencing of
the event using GPS and place-name tags.
4. Utilize the algorithms introduced in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [12] to build the event's DNEP image.
      </p>
      <p>In fact, a DNEP image is a set of panorama images created by aligning and
stitching all images contained in the training data. In the beginning, there could
be several panorama images because there are not enough images to reflect the
surrounding environment. Nevertheless, these panorama images will be merged
as more images are added to the training dataset. This can be done either
by manually adding more ground-truth data or by automatically importing the
images generated by Algorithm 3.</p>
      <p>Fig. 3 and Fig. 4 illustrate one example of a DNEP image. In this case, there
are two DNEP images that express the same environment, an office. Due to a lack
of suitable images, they have not yet been merged. Nevertheless,
there is a ghost laptop monitor that could be a good cue for merging the two DNEP
images. In this paper, this problem has not yet been solved and is left for future
work.</p>
      <p>2.3 Activities of Daily Living Understanding Using DNEP image and DAE taxonomy</p>
      <p>The idea of our content-context-based segmentation algorithm is quite simple.
We divide the environment into two distinct categories: (1) daily-normal
environment and (2) occasionally-abnormal environment. For example, the "in a
home" and "in an office" environments are assigned to the former, and the "in a
publicly-accessible building" and "in an open space" environments are classified as the latter.</p>
      <p>For the daily-normal environment category, we create DNEP images,
treat them as the visual background of activities, and traverse the DAE taxonomy
top-down. Given an image a, we simply use object detection or
template matching to decide whether this image belongs to the DNEP image of
the required environment A. We then use the DAE taxonomy to check how many
visual concepts and activities of the current scene A the image a satisfies, in order to
conclude what activity it shows.</p>
      <p>For the occasionally-abnormal environment, we apply the DAE taxonomy
bottom-up. That is, we first try to extract as many visual
concepts as possible from a given image to fill the visual concepts of level 4. Then,
we recognize the scene of the given image to match level 3 up to the root. Other tasks,
such as location name extraction and activity detection, are carried out in parallel
to fill the activities of level 4. Finally, we check whether the new taxonomy (just
created) matches the DAE taxonomy of the required task.</p>
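      <p>The bottom-up check described above can be sketched as follows. This is only an illustration under our own assumptions: the dictionary layout, the example labels, and the min_ratio threshold (the fraction of level-4 visual concepts that must be found) are hypothetical and are not taken from the taxonomy actually used in the experiments.</p>
      <preformat><![CDATA[
```python
# Hypothetical DAE taxonomy entry for one level-3 scene.
# All labels are illustrative, not the taxonomy used in the paper.
KITCHEN = {
    "path": ["indoor", "home", "kitchen"],              # root side down to level 3
    "activities": {"preparing meals", "no-moving"},     # level 4: activities
    "visual_concepts": {"kitchen appliance", "food"},   # level 4: visual concepts
}


def match_bottom_up(entry, concepts, scene_path, activities, min_ratio=1.0):
    """Bottom-up check: visual concepts first, then the scene path, then activities.

    min_ratio is an assumed threshold on the share of level-4 visual
    concepts that must be detected in the image.
    """
    wanted = entry["visual_concepts"]
    found = wanted & set(concepts)
    if len(found) / len(wanted) < min_ratio:
        return False
    # The recognized scene must match level 3 up to the root.
    if list(scene_path) != entry["path"]:
        return False
    # At least one detected activity must belong to the level-4 activities.
    return bool(entry["activities"] & set(activities))
```
]]></preformat>
      <p>For the top-down direction used with DNEP images, the same entry could be read in the opposite order: the scene path is known from the panorama, and only the level-4 sets remain to be checked.</p>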
      <p>Algorithm 3 outlines how we carry out the
content-context-based segmentation to meet the requirements of the challenge.</p>
      <p>Algorithm 3: Content-Context-based Segmentation
1. Prepare DNEP images and DAE taxonomies based on the query.
2. Pick an event. Based on the event's concept, it could have a DNEP image
and/or a DAE taxonomy that can be used for segmentation.
3. IF using a DNEP image
(a) create the CANDIDATE-A buffer as {(image-id, event/non-event label)}.
(b) load the related DNEP image.
(c) for each lifelog image that falls in the temporal frame defined by the
event (e.g. watching TV before 7am), determine whether this image
belongs to the DNEP image by applying object detection (with the occlusion
option).
i. if successful, assign the event label to this image and add it to the
buffer CANDIDATE-A.
ii. if this image does not appear in the DNEP image but its location is
still inside the DNEP image's geofencing, then use the algorithm in [12]
to align and stitch this image to the DNEP image. Next, assign the
event label to this image and add it to the buffer CANDIDATE-A.
iii. if neither this image nor its location belongs to the DNEP image, then
assign it the non-event label and add it to the buffer CANDIDATE-A.
(d) repeat (c) until the whole temporal frame is scanned.
(e) merge consecutive (image-id, event label) entries of CANDIDATE-A when
only a small number of non-event labels lies between two event clusters. In
our case, this number must be less than 1/10 of the total time of the
clusters merged together.
(f) images that carry the non-event label before merging and
the event label after merging are sent to the CANDIDATE-A set.
Further, they are manually confirmed and automatically aligned and
stitched to the related DNEP image. This helps to decrease the
number of panorama images as well as to increase the coverage of the
DNEP image.
4. IF using a DAE taxonomy
(a) create the CANDIDATE-B buffer as {(image-id, event/non-event label)}.
(b) load the related DAE taxonomy.
(c) for each lifelog image that falls in the temporal frame defined by the
event,
i. detect all objects and scenes defined in the DAE taxonomy. NOTE:
In our case, we used the visual concepts provided by the organizer for
this task, which contain both object and scene names.
ii. extract information about locations (GPS, place name) and physical
activities (action names, moving/no-moving).
iii. check whether this information satisfies the DAE taxonomy. If yes,
add the image to the CANDIDATE-B buffer with the event label. If not, add it with
the non-event label.
(d) merge consecutive (image-id, event label) entries of CANDIDATE-B as
in 3(e) above.
5. IF both DNEP images and DAE taxonomies are used
(a) since we treat a DAE taxonomy as the foreground and a DNEP image as
the background, we merge CANDIDATE-A and CANDIDATE-B so
that CANDIDATE-A covers CANDIDATE-B.
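      </p>
      <p>The merging rule of steps 3(e) and 4(d) can be sketched as follows. This is one possible reading, not the exact implementation: we assume that images are evenly spaced in time so that image counts stand in for durations, and that a gap is filled when it is shorter than 1/10 of the merged span (left cluster, gap, and right cluster). The function name and the max_gap_ratio parameter are hypothetical.</p>
      <preformat><![CDATA[
```python
def merge_clusters(labels, max_gap_ratio=0.1):
    """Merge event clusters that are separated by short non-event gaps.

    labels: list of booleans in temporal order (True = event label).
    A gap is filled when its length is below max_gap_ratio times the
    length of the merged span (left cluster + gap + right cluster).
    Image counts are used as a proxy for time (an assumption).
    """
    # Collect runs of equal labels as [value, start, end) triples.
    runs = []
    for i, value in enumerate(labels):
        if runs and runs[-1][0] == value:
            runs[-1][2] = i + 1
        else:
            runs.append([value, i, i + 1])

    merged = list(labels)
    i = 1
    while i < len(runs) - 1:
        prev, gap, nxt = runs[i - 1], runs[i], runs[i + 1]
        if not gap[0] and prev[0] and nxt[0]:
            gap_len = gap[2] - gap[1]
            span_len = nxt[2] - prev[1]
            if gap_len < max_gap_ratio * span_len:
                for j in range(gap[1], gap[2]):
                    merged[j] = True
                # Fuse the three runs into one and re-check at the same position.
                runs[i - 1:i + 2] = [[True, prev[1], nxt[2]]]
                continue
        i += 1
    return merged


def relabelled(before, after):
    """Indices that changed from non-event to event, i.e. candidates for step 3(f)."""
    return [i for i, (b, a) in enumerate(zip(before, after)) if a and not b]
```
]]></preformat>
      <p>The indices returned by relabelled correspond to the images that step 3(f) sends on for manual confirmation and stitching.</p>
      <p>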
The data and metrics offered by imageCLEFlifelog2018 - subtask Activities
of Daily Living understanding (ADLT) are utilized to evaluate the proposed
method. Ten events with given concepts must be detected; each
detected event must be reported as a triple (topic-id, number-of-times,
number-of-minutes), where topic-id is the number of the queried topic,
number-of-times reports how many times the event occurred, and number-of-minutes
reports how long (in minutes) the event lasted. Equation (1) is the metric used to
evaluate our results.</p>
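      <p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>ADL_{score} = \frac{1}{2}\left(\max\left(0,\, 1 - \frac{|n - n_{gt}|}{n_{gt}}\right) + \max\left(0,\, 1 - \frac{|m - m_{gt}|}{m_{gt}}\right)\right)</tex-math>
        </disp-formula>
      </p>
      <p>where (n, n_gt) and (m, m_gt) are the (submitted value, ground-truth value) for
how many times the events occurred and for how long (in minutes) the events
lasted, respectively.</p>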
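      <p>To make Equation (1) concrete, the score of a single topic can be computed as in the short sketch below. The function name is ours, and the sketch assumes that both ground-truth values are positive, since Equation (1) divides by them.</p>
      <preformat><![CDATA[
```python
def adl_score(n, n_gt, m, m_gt):
    """ADL score of one topic following Equation (1).

    n, m: submitted number of times and number of minutes.
    n_gt, m_gt: ground-truth values, assumed to be positive.
    """
    times_term = max(0.0, 1.0 - abs(n - n_gt) / n_gt)
    minutes_term = max(0.0, 1.0 - abs(m - m_gt) / m_gt)
    return 0.5 * (times_term + minutes_term)
```
]]></preformat>
      <p>For instance, with hypothetical values n = 1, n_gt = 2, m = 15, and m_gt = 30, each term equals 0.5 and the score is 0.5; a submission that overshoots a ground-truth value by 100% or more receives 0 for that term.</p>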
      <p>Based on the test set provided by the organizer, we assigned the categories
DNEP image and DAE taxonomy to the ten required queries, as shown in
Table 1.</p>
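      <p>We kept the same parameters for the methods we utilized. Table 2 reports
the results of participants in this subtask.</p>
      <table-wrap>
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>Query ID</th><th>Query (ADL, context)</th><th>Category</th></tr>
          </thead>
          <tbody>
            <tr><td>1</td><td>(Drinking coffee, in an Office)</td><td>DNEP image, DAE taxonomy</td></tr>
            <tr><td>2</td><td>(Shopping, outside Office)</td><td>DAE taxonomy</td></tr>
            <tr><td>3</td><td>(Preparing meals, Home)</td><td>DNEP image, DAE taxonomy</td></tr>
            <tr><td>4</td><td>(Watching TV, Home)</td><td>DNEP image, DAE taxonomy</td></tr>
            <tr><td>5</td><td>(Listening/Watching Presentations, at Work)</td><td>DAE taxonomy</td></tr>
            <tr><td>6</td><td>(Using mobile device, In a vehicle)</td><td>DAE taxonomy</td></tr>
            <tr><td>7</td><td>(Not using computers, In an office)</td><td>DNEP image, DAE taxonomy</td></tr>
            <tr><td>8</td><td>(Walking, on the street)</td><td>DAE taxonomy</td></tr>
            <tr><td>9</td><td>(, In a church)</td><td>DAE taxonomy</td></tr>
            <tr><td>10</td><td>(Socialising/Eating/Drinking, In a restaurant)</td><td>DAE taxonomy</td></tr>
          </tbody>
        </table>
      </table-wrap>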
      <p>In fact, the proposed method works case by case, since it depends heavily on
the content and context of a given event. The training phase is vital and needs
manual intervention to build a suitable taxonomy for the given activities
and contexts. While the DNEP image can generally be generated with little or
no manual support, the DAE taxonomy is established using people's knowledge
about the event and depends on how many objects the system can detect and recognize in
images. Thus, if the person building the DAE taxonomy lacks knowledge of the event,
the result of event segmentation could degrade.</p>
      <p>For some events, neither the DNEP image nor the DAE taxonomy worked well,
e.g. in the case of "Find how many times and how long the user is having coffee in
the office. Having coffee at the bars at the workplace is not considered." (topic 1
- subtask ADLT). The DNEP image successfully detects an office scene, and the
DAE taxonomy can recognize a coffee cup on a table. Nevertheless, it is hard to
find a suitable hint for exactly when the person drinks the coffee. Although
that person already tagged the time for drinking coffee, visual and other cues
say nothing about that activity. In this case, our method almost failed to detect
the right boundary of this event.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>In this paper, we introduce a content-context-based method to automatically
detect events with given concepts from lifelog data. The data and metrics offered
by imageCLEFlifelog2018 are used to evaluate the proposed method. The
daily-normal environment panorama image and the daily-abnormal environment
taxonomy are created to detect events. The event's content (e.g. visual, textual,
physiological, and GPS features) and context (e.g. concepts and taxonomy) are
carefully taken into account to create the DNEP image and the DAE taxonomy as well
as to detect events. Both the DNEP image and the DAE taxonomy can
evolve as lifelog data accumulates over time: the more data
is recorded, the larger the scope of events that the DNEP image and the DAE
taxonomy can cover. In the future, post-processing to refine event boundaries will be
investigated. Moreover, fusion of features [14] will be evaluated in search of better
results. Currently, we use only the features provided by the organizer. These
features are not sufficient for building a strong DAE taxonomy or
for successfully projecting an image onto DNEP images. Consequently, we will
develop our own feature extractors to fulfil our requirements.</p>
      <p>12. Poleg, Y., Peleg, S.: Alignment and Mosaicing of Non-Overlapping Images. In:
Procs. IEEE Int. Conf. on Computational Photography (ICCP) (2012)</p>
      <p>13. Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: A 10 million
Image Database for Scene Recognition. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 1–14 (2017)</p>
      <p>14. Dao, M.S., Pham, Q.N.M., Kasem, A., Nazmudeen, M.S.: A Context-Aware
Late-Fusion Approach for Disaster Image Retrieval from Social Media. In: ACM ICMR,
Yokohama, Japan (2018)</p>
      <p>15. Dao, M.S., Dang-Nguyen, D.T., Riegler, M., Gurrin, C.: Smart Lifelogging:
Recognizing Human Activities using PHASOR. ICPRAM 2017 (2017)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smeaton</surname>
            ,
            <given-names>A.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doherty</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          :
          <article-title>LifeLogging: Personal Big Data</article-title>
          .
          <source>Journal of Foundations and Trends® in Information Retrieval</source>
          .
          <volume>8</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>125</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Del Fabro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Böszörmenyi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>State-of-the-art and future challenges in video scene detection: a survey</article-title>
          .
          <source>Journal of Multimedia Systems</source>
          .
          <volume>19</volume>
          (
          <issue>5</issue>
          ),
          <fpage>427</fpage>
          –
          <lpage>454</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Doherty</surname>
            ,
            <given-names>A.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smeaton</surname>
            ,
            <given-names>A.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ellis</surname>
            ,
            <given-names>D.P.W.</given-names>
          </string-name>
          :
          <article-title>Multimodal segmentation of lifelog data</article-title>
          . In: Procs.
          <source>Large Scale Semantic Access to Content (Text, Image, Video, and Sound) (RIAO '07)</source>
          , pp.
          <fpage>21</fpage>
          –
          <lpage>38</lpage>
          . Paris, France (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dimiccoli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al.:
          <article-title>SR-clustering: Semantic regularized clustering for egocentric photo streams segmentation</article-title>
          .
          <source>Journal of Computer Vision and Image Understanding</source>
          .
          <volume>155</volume>
          ,
          <fpage>55</fpage>
          –
          <lpage>69</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Approaches for Event Segmentation of Visual Lifelog Data</article-title>
          . In:
          <source>MultiMedia Modeling</source>
          , pp.
          <fpage>581</fpage>
          –
          <lpage>593</lpage>
          (
          <year>2018</year>
          ). https://doi.org/10.1007/978-3-319-73603-7_47
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Furnari</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Battiato</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farinella</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          :
          <article-title>Personal-Location-Based Temporal Segmentation of Egocentric Videos for Lifelogging Applications</article-title>
          .
          <source>Journal of Visual Communication and Image Representation</source>
          .
          <volume>12</volume>
          ,
          <fpage>1</fpage>
          –
          <lpage>12</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eickhoff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andrearczyk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farri</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lungren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Overview of ImageCLEF 2018: Challenges, Datasets and Evaluation</article-title>
          . In:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction, CLEF 2018</source>
          , Avignon, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Overview of ImageCLEFlifelog 2018: Daily Living Understanding and Lifelog Moment Retrieval</article-title>
          . In: Procs. CEUR Workshop, CLEF2018 Working Notes, Avignon, France (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sinha</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Content without Context is Meaningless</article-title>
          . In:
          <source>Procs. ACM MM 2010</source>
          , pp.
          <fpage>1</fpage>
          –
          <lpage>10</lpage>
          . ACM, Firenze, Italy (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Szeliski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Image Alignment and Stitching: A Tutorial</article-title>
          .
          <source>Journal of Foundations and Trends® in Computer Graphics and Vision</source>
          .
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>104</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Meneghetti</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Danelljan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Felsberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nordberg</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Image Alignment for Panorama Stitching in Sparsely Structured Environments</article-title>
          . In: R.R. Paulsen and K.S. Pedersen (Eds.)
          <source>SCIA 2015</source>
          , LNCS 9127, pp.
          <fpage>428</fpage>
          –
          <lpage>439</lpage>
          (
          <year>2015</year>
          ). https://doi.org/10.1007/978-3-319-19665-7_36
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>