<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LIFER 2.0: Discovering Personal Lifelog Insights using an Interactive Lifelog Retrieval System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Van-Tu Ninh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tu-Khiem Le</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liting Zhou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Piras</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Riegler</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathias Lux</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh-Triet Tran</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cathal Gurrin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Duc-Tien Dang-Nguyen</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dublin City University</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Klagenfurt University</institution>
          ,
          <addr-line>Klagenfurt</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Pluribus One &amp; University of Cagliari</institution>
          ,
          <addr-line>Cagliari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Simula Research Laboratory</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Bergen</institution>
          ,
          <addr-line>Bergen</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>University of Science</institution>
          ,
          <addr-line>VNU-HCM, Ho Chi Minh City</addr-line>
          ,
          <country country="VN">Vietnam</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This paper describes the participation of the Organiser Team in the ImageCLEFlifelog 2019 Solve My Life Puzzle (Puzzle) and Lifelog Moment Retrieval (LMRT) tasks. We proposed to use LIFER 2.0, an enhanced version of LIFER, an interactive retrieval system for personal lifelog data. We utilised LIFER 2.0 with some additional visual features, obtained using a traditional visual bag-of-words approach, to solve the Puzzle task, while for the LMRT task we applied LIFER 2.0 using only the provided information. The results on both tasks confirmed that, by using the faceted filter and context browsing, a user can gain insights from their personal lifelog through very simple interactions. These results also serve as baselines against which other approaches in the ImageCLEFlifelog 2019 challenge can be compared.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>An increasingly wide range of personal devices, such as smartphones, video
cameras, and wearable devices, allows individuals to capture pictures, videos, and
audio clips for every moment of their lives. Considering the huge amount of data
created, questions of how to design and develop automatic systems for fast and
accurate data retrieval and understanding are becoming increasingly important.</p>
      <p>
        In this work, we highlight the techniques we adopted for
ImageCLEFlifelog 2019 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] at ImageCLEF 2019 [7], which includes the Solve My Life Puzzle
(Puzzle) and Lifelog Moment Retrieval (LMRT) tasks. For the ImageCLEF LMRT task,
considering the multi-modality of lifelog data, we pre-processed the images to
remove noisy data as a first step and then focused on the exploitation of the
associated metadata (time, activities, location, etc.) from moments of daily life.
Inheriting the structure of the interactive search engine from [15], we developed
a new facets filter and context browsing interface, with additional visual
concepts and criteria expansion, for the ImageCLEF2019 LMRT task. For the Puzzle task,
we interpreted the task as a clustering problem and applied the well-known
visual Bag-of-Words [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] method both to reorder the lifelogger's moments and to
predict the part-of-day.
      </p>
<p>Building on prior research, we extended the retrieval system and optimised
it for the domain of lifelogging. The main contributions of this paper are thus:
- A short survey of current and state-of-the-art work in the relevant domain.
- An introduction and discussion of the schema and functions of our baseline
interactive search engine.
- A presentation, analysis, and discussion of the results obtained from the
official competition.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
<p>Interactive Lifelog Retrieval System: In recent years, a large volume of
work has been performed on developing information retrieval approaches for
increasingly commonplace personal digital collections, such as lifelogs. This has
been supported by a number of international benchmarking efforts, the most
recent of which is the Lifelog Search Challenge (LSC) [5], a multi-annual,
real-time retrieval challenge that evaluates different approaches to interactive
retrieval from lifelog collections.</p>
      <p>
        For benchmarking systems, Zhou et al. [15] provided an efficient retrieval
system in 2018, based primarily on faceted querying over captured metadata, which
served as a baseline for other systems and provided the basis for the LIFER 2.0
system presented in this paper. For interactive retrieval, the LEMORE [11]
system integrates classical image descriptors with high-level semantic concepts and
provides a graphical user interface that uses natural language to process a user's
query. For a more complete review of interactive retrieval systems, we refer the
reader to [5], which highlights six different interactive lifelog retrieval systems.
More recently, we have noted the development of novel retrieval approaches that
transcend the desktop, such as the Virtual Reality interactive retrieval system [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
which combines visual concepts and dates/times as the basis for a faceted filtering
mechanism and presents results in a novel VR interface.
      </p>
      <p>
        Image Retrieval: The Puzzle task asks participants to rearrange a
large set of images (without timestamps) in chronological order and to predict
the correct day (Monday or Sunday) and part of day (morning, afternoon, or
evening). One possible computer vision-based approach is to detect and extract
features for efficient image retrieval. Visual Bag-of-Words is a well-known
approach of this kind. Many visual features can be used
for visual Bag-of-Words, such as SIFT [9], root-SIFT, SURF [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], etc. Another
suitable approach is to use deep features from a deep neural network such as ResNet [6]
to classify the part of day of an image and to retrieve the most similar images in order to
rearrange them.
      </p>
    </sec>
    <sec id="sec-3">
      <title>ImageCLEFlifelog2019 LMRT Task: LIFER 2.0</title>
    </sec>
    <sec id="sec-4">
      <title>Baseline Interactive Retrieval Search Engine</title>
<p>For the ImageCLEFlifelog2019 LMRT task, we exploit the LIFER 2.0 baseline
interactive search engine, which was initiated in [15] and improved in [10]. In this
section, we provide a description of the interactive retrieval system and how it
can be used to solve information needs. Our system, as described in [10], is a
criteria-matching engine built mainly from five categories: date/time,
location, activity, biometrics, and visual concepts.
1. Date/Time: Date/time is an important feature in our search engine
because it can narrow down the scope of moment searching. For instance,
time is specifically useful in query 6: "Having breakfast at home" (the lifelogger must
have breakfast at home from 5:00 am to 9:00 am). It is also useful for
result filtering and for inferring the lifelogger's behaviour. In our system, the date/time
criterion includes weekday, date, and time.
2. Location: The location criterion contains location categories and location names,
which are also advantageous for users retrieving the relevant images in topics
1, 5, and 6. These topics depend mostly on location filtering to find the
proper moments and to increase the variety of chosen images.
3. Activity: Although the activity metadata in the ImageCLEFlifelog 2019 dataset
is not diverse, it is a promising criterion to integrate into our system to
enrich the search engine with user actions/behaviours when such data become available.
4. Biometrics: Given the limited activity information, biometric data provide
us with the means to infer the moments when the lifelogger is eating, walking, or moving,
based on changes in heart rate and calorie expenditure.
5. Visual Concepts: These concepts play a key role in finding the proper
images for the topics owing to the diversity of concepts, annotations, and
keywords. They include place attributes, place categories, and object names.
Place attributes and categories are extracted with Places365-CNN [14], taking the
top 10 extracted attributes and the top 5 place category predictions. Objects in
images are detected using Faster R-CNN [12] trained on the MSCOCO dataset [8].
These five sources of information are instantiated in the user interface as facets
of a user query, as shown in Figure 1.</p>
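      <p>To make the schema concrete, the following is a minimal sketch, in Python, of how a single image could be indexed under the five criteria. Every field name and value is an illustrative assumption, not the actual LIFER 2.0 index format.</p>
      <preformat>
# Hypothetical indexed record for one lifelog image; the field names and
# values below are illustrative assumptions, not the real LIFER 2.0 schema.
record = {
    "image_id": "u1_2019-01-07_081503",
    "datetime": {"weekday": "Monday", "date": "2019-01-07", "time": "08:15"},
    "location": {"category": "kitchen", "name": "home"},
    "activity": "stationary",
    "biometrics": {"heart_rate": 72, "calories": 1.3},
    "visual_concepts": {
        # Top-10 attributes and top-5 categories from Places365-CNN.
        "place_attributes": ["no horizon", "enclosed area", "eating"],
        "place_categories": ["kitchen", "dining_room"],
        # Object labels from Faster R-CNN trained on MSCOCO.
        "objects": ["cup", "bowl", "dining table"],
    },
}
      </preformat>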
<p>The interface of our system is divided into two parts: the facets filter and
context browsing. In the facets filter, a user can adjust his/her choice of the
five aforementioned criteria to retrieve the desired moments. In each criterion,
except for location, the keywords and tags are combined into a query condition
using the OR operator to expand the diversity of the returned results. Finally, all
the conditions from the criteria are merged into one final query using
the AND operator, as sketched below. For context browsing, the keywords and annotations from
location, visual concepts, and activity are added to an auto-complete search bar.
The user then types and chooses the tags which suit the current
context of each topic. The query processing of this function is the same as for the
facets filter. The interface of the LIFER 2.0 baseline interactive search engine is
shown in Figure 2.</p>
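      <p>The following minimal sketch illustrates the query logic just described: tags are OR-ed within a criterion and the criteria are AND-ed together. The function and field names are assumptions for illustration, not the actual LIFER 2.0 implementation.</p>
      <preformat>
# Sketch of the facets-filter matching logic: OR within a criterion,
# AND across criteria. Names and data layout are assumed for illustration.

def matches(image_meta, facets):
    """Return True if the image satisfies every non-empty facet."""
    for criterion, selected_tags in facets.items():
        if not selected_tags:
            continue  # facet unused: imposes no constraint
        annotated = image_meta.get(criterion, set())
        # OR within a criterion: at least one selected tag must be present.
        if annotated.isdisjoint(selected_tags):
            return False  # AND across criteria: every facet must pass
    return True

# Hypothetical usage over a tiny collection.
collection = [
    {"id": "img_001", "meta": {"time": {"morning"}, "location": {"home"},
                               "visual_concepts": {"food", "table"}}},
    {"id": "img_002", "meta": {"time": {"evening"}, "location": {"office"},
                               "visual_concepts": {"screen"}}},
]
facets = {"time": {"morning"}, "location": {"home"},
          "visual_concepts": {"food", "cup"}}
print([img["id"] for img in collection if matches(img["meta"], facets)])
# -> ['img_001']
      </preformat>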
<p>[Figure 1: Schema of the LIFER 2.0 search engine: a set of criteria (visual concepts, activity, location, time, biometrics) selected by the user is matched against the images on the indexed server through the API/interface.]</p>
    </sec>
    <sec id="sec-5">
      <title>ImageCLEFlifelog2019 Puzzle Task: Lifelogger's</title>
    </sec>
    <sec id="sec-6">
      <title>Activity Mining Approach</title>
      <p>
        In the ImageCLEFlifelog2019 Puzzle task, by utilising our baseline interactive search
engine, we could review the provided training data and study the lifelogger's
activity. Because the lifelogger's habits and daily routine do not change much,
we use only visual information to reconstruct the order of the
images in the test set. We propose to utilise the visual Bag-of-Words [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] method to
retrieve the most plausible time for the images in each query and to predict the part-of-day based on
the retrieved time. For this, we employ SIFT feature extraction [9] and conduct
experiments on the number of visual clusters k using the K-Means algorithm.
The aim is to measure the effect of our proposed method while increasing the
parameter k. The remaining steps are similar to the Bag-of-Words algorithm for
text retrieval [13]. How we process the ranked list to choose the final time
for each image in the test set is presented in Section 5. A sketch of the pipeline follows.
      </p>
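      <p>As a concrete illustration, the following is a minimal sketch of such a visual Bag-of-Words pipeline, using OpenCV SIFT features and a scikit-learn K-Means codebook. It reflects the general technique rather than our exact submitted configuration; the default k and the data layout are placeholders.</p>
      <preformat>
# Minimal visual Bag-of-Words sketch: SIFT descriptors, K-Means codebook,
# per-image word histograms, and similarity ranking. Placeholder settings.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def sift_descriptors(path):
    sift = cv2.SIFT_create()
    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(image, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

def build_codebook(image_paths, k=512):
    # Stack descriptors from all training images and cluster into k words.
    all_desc = np.vstack([sift_descriptors(p) for p in image_paths])
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(all_desc)

def bow_histogram(path, codebook):
    # Quantise each descriptor to its nearest visual word, then count.
    desc = sift_descriptors(path)
    if len(desc) == 0:
        return np.zeros(codebook.n_clusters)
    words = codebook.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()  # L1-normalise

def rank_by_similarity(query_hist, train_hists):
    # Dot-product similarity; returns training-image indices, best first.
    sims = np.asarray(train_hists) @ query_hist
    return np.argsort(-sims)
      </preformat>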
    </sec>
    <sec id="sec-7">
      <title>Experiment and Results</title>
<p>LMRT Task: For the Lifelog Moment Retrieval task, we conducted an interactive search
experiment with the participation of two novice users. Each person was trained
to use the search engine for 10 minutes and was then given a further 10
minutes to get used to the system by performing 2 sample queries. Following
this, the experiment began and the user executed 10 queries from the test set. Table
1 displays the results of our two runs from the participants. As can be seen from
the table, we achieved 41% in terms of precision, with a cluster recall of 31% and
29% in F1 score.</p>
      <p>Puzzle Task: In order to obtain the timestamp of each image in the test set, we established
a majority vote among the Top-N retrieved images from the returned ranking.
The final time is the average time of the Top-N images. The accuracy of the
Lifelogger's Activity Mining approach also depends closely on the configuration
of the Bag-of-Words model, especially the number of K clusters for visual features
extracted with the SIFT detector. Therefore, we submitted 8 runs in total, with 2
configurations of the majority vote (Top-1 and Top-3) and 4 configurations of K
clusters (512, 1024, 2048, 4096), which are summarised in Table 2.</p>
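      <p>The following is a minimal sketch of one possible reading of this majority-vote scheme: vote on the part of day among the Top-N retrieved training images, then average the in-day times of the winning neighbours. The hour boundaries for the parts of day are assumptions for illustration.</p>
      <preformat>
# Sketch of Top-N majority voting for timestamp estimation. The hour
# boundaries for morning/afternoon/evening are illustrative assumptions.
from collections import Counter
from datetime import datetime, timedelta

def part_of_day(t):
    if t.hour in range(0, 12):
        return "morning"
    if t.hour in range(12, 18):
        return "afternoon"
    return "evening"

def estimate_time(ranked_times, n=3):
    """ranked_times: datetimes of retrieved training images, best first."""
    top = ranked_times[:n]
    # Majority vote on the part of day among the Top-N neighbours.
    winner = Counter(part_of_day(t) for t in top).most_common(1)[0][0]
    voters = [t for t in top if part_of_day(t) == winner]
    # Final time: average in-day time of the voting neighbours.
    secs = sum(t.hour * 3600 + t.minute * 60 + t.second for t in voters)
    return winner, timedelta(seconds=secs // len(voters))

times = [datetime(2019, 1, 7, 8, 30), datetime(2019, 1, 7, 9, 10),
         datetime(2019, 1, 7, 19, 0)]
print(estimate_time(times, n=3))  # ('morning', timedelta of ~08:50)
      </preformat>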
<p>We achieved an overall score of 26.8%, which shows that the best configuration
uses the highest number of clusters and takes the time of the most
relevant image as the final time.</p>
      <sec id="sec-7-1">
        <title>RunID</title>
      </sec>
      <sec id="sec-7-2">
        <title>Puzzle Run 1</title>
        <p>Puzzle Run 2
Puzzle Run 3
Puzzle Run 4
Puzzle Run 5
Puzzle Run 6
Puzzle Run 7
Puzzle Run 8</p>
      <p>For the LMRT task, the analysis demonstrates that our search engine
increased the F1 score by increasing cluster recall through valid experiment
criteria. However, for novice users, the system still needs more annotation data
on activities and object names in order to increase the effectiveness of the search
engine.</p>
      <p>For the Puzzle task, it can be inferred that our proposed method segments
the images into the correct part-of-day clusters. However, our method
could not solve the problem of re-ranking the moments in each cluster to increase
the Kendall's Tau score. This shows that reconstructing the moments in each
part of day remains a challenge and requires further study.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgement</title>
      <p>This publication has emanated from research supported in part by research
grants from the Irish Research Council (IRC) under Grant Number GOIPG/2016/741
and Science Foundation Ireland under grant numbers SFI/12/RC/2289 and
SFI/13/RC/2106.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bay</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ess</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tuytelaars</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Gool</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Speeded-up robust features (surf)</article-title>
          .
          <source>Comput. Vis. Image Underst</source>
          .
          <volume>110</volume>
          (
          <issue>3</issue>
          ),
          <volume>346</volume>
          {359 (Jun
          <year>2008</year>
          ), http://dx.doi.org/10.1016/j.cviu.
          <year>2007</year>
          .
          <volume>09</volume>
          .014
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Csurka</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dance</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Willamowski</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bray</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Visual categorization with bags of keypoints</article-title>
          . In: Workshop on statistical learning in
          <source>computer vision</source>
          , ECCV. vol.
          <volume>1</volume>
          , pp.
          <volume>1</volume>
          {
          <fpage>2</fpage>
          .
          <string-name>
            <surname>Prague</surname>
          </string-name>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          : Overview of ImageCLEFlifelog 2019:
          <article-title>Solve my life puzzle and Lifelog Moment Retrieval</article-title>
          .
          <source>In: CLEF2019 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.
          <source>org&gt;</source>
          , Lugano,
          <source>Switzerland (September</source>
          <volume>09</volume>
          -12
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Duane</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>User interaction for visual lifelog retrieval in a virtual environment</article-title>
          .
          <source>In: International Conference on Multimedia Modeling</source>
          . pp.
          <volume>239</volume>
          {
          <fpage>250</fpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Gurrin, C., Schoeffmann, K., Joho, H., Leibetseder, A., Zhou, L., Duane, A., Dang-Nguyen, D.T., Riegler, M., Piras, L., Tran, M.T., et al.: [Invited papers] Comparing approaches to interactive lifelog search at the Lifelog Search Challenge (LSC2018). ITE Transactions on Media Technology and Applications 7(2), 46-59 (2019)</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015), http://arxiv.org/abs/1512.03385</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Ionescu, B., Müller, H., Péteri, R., Cid, Y.D., Liauchuk, V., Kovalev, V., Klimuk, D., Tarasau, A., Abacha, A.B., Hasan, S.A., Datla, V., Liu, J., Demner-Fushman, D., Dang-Nguyen, D.T., Piras, L., Riegler, M., Tran, M.T., Lux, M., Gurrin, C., Pelka, O., Friedrich, C.M., de Herrera, A.G.S., Garcia, N., Kavallieratou, E., del Blanco, C.R., Rodríguez, C.C., Vasillopoulos, N., Karampidis, K., Chamberlain, J., Clark, A., Campello, A.: ImageCLEF 2019: Multimedia retrieval in medicine, lifelogging, security and nature. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 10th International Conference of the CLEF Association (CLEF 2019), LNCS Lecture Notes in Computer Science, Springer, Lugano, Switzerland (September 9-12, 2019)</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. pp. 740-755 (2014)</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91-110 (Nov 2004), https://doi.org/10.1023/B:VISI.0000029664.99615.94</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Ninh, V.T., Le, T.K., Zhou, L., Healy, G., Tran, M.T., Dang-Nguyen, D.T., Smyth, S., Gurrin, C.: A baseline interactive retrieval engine for the NTCIR-14 Lifelog-3 Semantic Access Task. In: The Fourteenth NTCIR Conference (NTCIR-14) (2019)</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. de Oliveira Barra, G., Cartas Ayala, A., Bolaños, M., Dimiccoli, M., Giró Nieto, X., Radeva, P.: LEMoRe: A lifelog engine for moments retrieval at the NTCIR-Lifelog LSAT task. In: Proceedings of the 12th NTCIR Conference on Evaluation of Information Access Technologies (2016)</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1. pp. 91-99. NIPS'15, MIT Press, Cambridge, MA, USA (2015), http://dl.acm.org/citation.cfm?id=2969239.2969250</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Robertson, S.E., Jones, K.S.: Simple, proven approaches to text retrieval. Tech. rep. (1997)</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. Zhou, B., Lapedriza, A., Khosla, A., Oliva, A., Torralba, A.: Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (2017)</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. Zhou, L., Hinbarji, Z., Dang-Nguyen, D.T., Gurrin, C.: LIFER: An interactive lifelog retrieval system. In: Proceedings of the 2018 ACM Workshop on The Lifelog Search Challenge. pp. 9-14. ACM (2018)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>