A User Study on User Attention for an Interactive Content-based
Image Search System
Mahmoud Artemi and Haiming Liu

     University of Bedfordshire, University Square, Luton, LU1 3JU, UK


                                  Abstract
                                  User attention is one of the fundamental indications of users’ interests in search. For content-based image search systems, it is important to understand what users pay attention to, in order to engage users more in the search process. It remains a big challenge to design a user-centered interactive interface that serves both user interaction and the search model well, and that can bridge the unsolved problem in content-based image search known as the Semantic Gap. In an effort to address this problem, we designed an interactive content-based image search interface called Search Strategy (SS) based on Vakkari’s model. SS enables users to engage in three stages (pre-focus, focus-formulation, and post-focus) during the search process. We carried out a user study to observe which interface attracts more user attention. The user study was conducted in a lab-based setting using a screen-based eye tracker (Tobii Pro Nano) and Galvanic Skin Response (GSR) on the iMotions platform. The preliminary results show that participant attention is noticeably higher on the SS interface. This finding highlights the need for a well-designed interface that enables user interaction at all stages of the image search process and, at the same time, allows users to manipulate the search model effectively.

                                  Keywords
                                  Content-based image retrieval, user interface, active learning, Vakkari model, eye tracking,
                                  query formulation.

1. Introduction

Most image search systems rely on text-based retrieval. It is often challenging for users to describe their
search intents with keywords, which may lead to unsatisfactory retrieval results containing images
irrelevant to those intents [1]. To preserve the users’ intent visually and improve search performance,
content-based image retrieval (CBIR) has emerged [2-6]. Since CBIR uses the representation of visual
features (such as color, shape, and texture), it allows users to express their intents more precisely [6].
Although CBIR helps cope with the ambiguity in text-based image search systems, it presents a new
challenge called the Semantic Gap: a gap between the low-level visual features that a computer
understands and the high-level semantics that users understand.
    CBIR mainly works on the representation of visual image features to identify the similarity of images
to the users’ visual queries. Sustained effort has been made to cope with two essential challenges in
CBIR systems, namely the intention and semantic gaps. As Figure 1 illustrates, the intention gap lies
between the user’s search intent and the desired query [6, 7], whilst the semantic gap refers to the
difficulty of mapping high-level concepts to low-level image features [4].

BIRDS 2021: Bridging the Gap between Information Science, Information Retrieval and Data Science, March 19, 2021. Online Event.
EMAIL: Mahmoud.Artemi@study.beds.ac.uk (M. Artemi); haiming.liu@beds.ac.uk (H. Liu)
ORCID: 0000-0002-5177-8977 (M. Artemi); 0000-0002-0390-3657 (H. Liu)
                               © 2021 Copyright for this paper by its authors.
                               Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

                               CEUR Workshop Proceedings (CEUR-WS.org)




 Figure 1: Involvement of intention gap and semantic gap in content-based image retrieval (CBIR): the
 intention gap lies between the user and the query, whilst the semantic gap lies between the search
 system and the image collection. Figure adapted from [7]

The basic framework of a CBIR search system is shown in Figure 2. It comprises four main components:
query formulation / relevance feedback, feature extraction, similarity matching, and results presentation.
A minimal code sketch of the feature extraction and similarity matching steps follows the list.
       • Query Formulation: from the user perspective, the user can use various query formulation
           schemas to express their intention.
       • Feature Extraction: also known as content representation; an image is represented by an
           array of pixel distributions containing low-level visual features such as shape, color, and texture.
       • Retrieval Model: also known as similarity matching; the CBIR search model returns a set of
           ranked images by applying similarity metrics between the image query and the database images.
       • Relevance Feedback (RF): due to the lack of sufficient semantics in a given query, RF
           provides a mechanism to formulate and modify the query, aiming to capture user intents
           more precisely.
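
As a concrete illustration of the feature extraction and similarity matching components, the following
Python sketch ranks database images against an example query using a toy colour-histogram feature and
Euclidean distance. This is a minimal sketch under simplifying assumptions; the feature extractor and
similarity metric actually used in our system are those described in [3].

    import numpy as np

    def extract_features(image, bins=16):
        # Toy content representation: a normalised per-channel colour histogram.
        # `image` is an H x W x 3 uint8 array.
        hist = [np.histogram(image[..., c], bins=bins, range=(0, 255))[0]
                for c in range(3)]
        feat = np.concatenate(hist).astype(float)
        return feat / (feat.sum() + 1e-9)

    def rank_images(query_image, database_images, top_k=10):
        # Similarity matching: return the indices of the top_k database images
        # closest to the query in feature space (smallest Euclidean distance).
        q = extract_features(query_image)
        feats = np.stack([extract_features(img) for img in database_images])
        distances = np.linalg.norm(feats - q, axis=1)
        return np.argsort(distances)[:top_k]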


 Figure 2: CBIR system flowchart. In the offline process (system end), features are extracted from the
 image database to build its representation in feature space; in the online process (user end), the user’s
 intention is expressed through query formulation / relevance feedback, features are extracted from the
 query, similarity matching is performed against the database representation, and retrieved images are
 returned; if the user is not satisfied, another feedback iteration follows.


Relevance feedback has been an effective way to bring users into the CBIR search loop, allowing them
to provide feedback to obtain improved results. Most of the research on relevance feedback focuses on
enabling users to provide feedback at the result assessment stage [1, 3]. However, the underlying
machine learning mechanisms in many CBIR systems often need user feedback at the query formulation
stage for better training and search performance [6]. There is a need to design an interactive CBIR
search system that not only allows users to interact with the retrieved image results but also allows them
to explore the image collection visually and to train the underlying search model through a user-centered
interactive search interface, thereby improving search performance as well as users’ search experience
and satisfaction [1, 8, 9].
    In this paper, we present our CBIR system, developed based on the concept of Vakkari’s three-stage
model in [6]. We also report a user study that we carried out on our interactive CBIR system design.
The user study investigates the advantages of the proposed interfaces detailed in Section 3. The
preliminary results enable us to better understand the users’ information needs and the influence of user
attention.


2. Related work

The work presented in this paper is shaped by prior studies in the area of interactive information
retrieval, especially relevant feedback approaches through user interface design, and task-based
information retrieval for a better search experience.

2.1.    Active learning paradigm

Active learning is a machine learning mechanism that requests labels of data instances to train a model.
Various active learning algorithms have been introduced for different applications [10-12]. Active
learning is a form of semi-supervised learning in which the learner model takes an active role in selecting
the data points to be labelled by an oracle (e.g., a user) [11]. The process starts with proposing the images
to be labelled; the newly obtained labels are then added to the training set to train the learning model.
That is, active learning iteratively requests the user to label items (such as documents or images) to
obtain new data points, with the aim of reaching the desired results. In the active learning paradigm, the
training data are not selected beforehand, unlike in other machine learning problems, which treat the
training data as a fixed, pre-selected set. During the training process, the active learner chooses the data
to be acquired for training; in this training loop, the learner takes actions that enable it to gain more
information [13].
    One of the major obstacles in CBIR is the intention gap. In order to bridge the intention gap,
Relevance Feedback (RF) is used to capture semantic information about the user intention and thus
improve the search system. Different RF mechanisms have been introduced in CBIR to enable users to
steer the CBIR system during the retrieval process by interacting with the image search results [4, 14-16].
RF lets users mark the returned results as relevant or irrelevant to a given query in an iterative schema;
the search model then performs another search iteration to improve system performance and return a
more relevant set of images, and iterations continue until the user is satisfied with the search results.
Although RF has been introduced in CBIR, the search results can still be unsatisfactory [17]: the amount
of data provided as feedback may be too small or unreliable to improve system performance, since the
system already knows about the selected images, which in fact sometimes confuses the system [11].
    Due to the limitations of conventional RF approaches, we propose applying active learning at the
query learning stage (query formulation / focus formulation). Although an active learning procedure still
requires users to judge the relevance of the presented items to the query, there is a significant difference
between conventional RF approaches and the proposed active learning approach in terms of which
images need to be labelled [17]. In the RF scenario, the user typically labels the top-ranked items of the
retrieval results to improve the system performance of the next iteration. In this paper, this RF scenario
is applied as the baseline system, named Information Goal (IG). In the proposed active learning scenario,
the learner model actively asks the user to label items the search system is uncertain about, in order to
improve the system accuracy. In this paper, the proposed active learning mechanism is called Search
Strategy (SS). It has been reported that the learning rate of an active learning mechanism is faster than
that of relevance feedback, and thus active learning achieves better accuracy than RF [6, 11]. In this
paper, pool-based active learning is used together with a support vector machine in our CBIR system to
enable the user’s engagement in the query formulation stage (query learning). Our approach employs a
Support Vector Machine (SVM) [18], which uses a kernel model for classification [16]. The users’ needs
can again be understood as the users’ intents. In order to capture the users’ needs and attention, the users
are asked to provide feedback expressing their preferences on given images. Those images are then used
to train the learner model during the query formulation stage. This is different from asking the user to
give feedback on result images that are already recognized by the search model (learner).
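
To make this distinction concrete, the following Python sketch contrasts the two labelling strategies
under simple assumptions: conventional RF presents the top-ranked pool items to the user for labelling,
whereas pool-based active learning presents the items the SVM is most uncertain about (closest to its
decision boundary). The feature vectors, the pool, and how labels are obtained are placeholders; this is
an illustrative sketch rather than a description of our exact implementation, which follows [6, 18].

    import numpy as np
    from sklearn.svm import SVC

    def train_query_model(labeled_feats, labels):
        # Kernel SVM learner trained on the user-labelled (relevant/irrelevant) examples.
        model = SVC(kernel="rbf", gamma="scale")
        model.fit(labeled_feats, labels)   # labels must contain both classes
        return model

    def relevance_feedback_round(model, pool_feats, n_label=5):
        # Conventional RF: show the user the top-ranked (most confidently relevant) items.
        scores = model.decision_function(pool_feats)
        return np.argsort(-scores)[:n_label]

    def active_learning_round(model, pool_feats, n_label=5):
        # Pool-based active learning: show the user the most uncertain items,
        # i.e. those closest to the SVM decision boundary.
        scores = model.decision_function(pool_feats)
        return np.argsort(np.abs(scores))[:n_label]

In each iteration, the selected images would be shown in the feedback panel, the user’s relevant/irrelevant
judgements appended to the training set, and the SVM retrained; iterations continue until the user is
satisfied.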




2.2.     Vakkari’s three-stage model

   In this paper, our focal point is the task-based model introduced by Vakkari [19]. It describes a
three-stage information seeking process: pre-focus covers the initial actions performed by users, who
may initiate the search by selecting a query image before or after exploring the image collection;
focus-formulation is where users may refine or change the search activity; and post-focus, at the end of
the search process, is where users collect and save results of value to their needs. According to Vakkari’s
model, the exploratory search process begins with pre-focus, as the user typically starts with broad
knowledge of a topic-based task, and then moves to focus-formulation to narrow the query formulation
[20]. Decision-making may occur throughout the search process and becomes most clearly visible at the
assessment stage (post-focus). Users assess a set of returned images to find not only relevant images but
also the images that best fit the given task and are useful (utility) for the user’s needs (intents). The
effects of applying Vakkari’s three search stages are summarized in Figure 3.

 Figure 3: Effects of applying Vakkari’s three search stages. Across the pre-focus, focus-formulation
 and post-focus stages, the information gain level moves from broad to narrow to specific knowledge,
 while the relevance and utility levels (and user involvement) increase from low to high.


Vakkari’s model is applied in our search system design to support the evaluation of user interaction in
task performance. Artemi and Liu [6] concluded that a better search system design is needed in order to
capture user intention in the early stages of the task search process. In other words, it is insufficient to
grasp the user’s search intention by asking the user to provide feedback only on returned images that the
search model already knows about. Furthermore, in most existing image query schemas the end-user’s
visual query takes the form of a single image, which in some cases might be insufficient to indicate the
user’s search intention.

2.3.    User search interface

Search interfaces play a vital role as the intermediary between search systems and end users. In the
context of information retrieval, various approaches have been presented to design effective interfaces
that fit user needs and, more importantly, improve user interaction. Recent studies have concluded that
different aspects should be considered in user interface design, such as cognitive aspects and task
complexity, which might impede information seeking [21]. As noted in Section 2.2, Vakkari [19] studied
the natural process of information seeking but gave no guidelines for designing and implementing the
user interface aspects of search systems. This issue has also been raised in [22]. Few studies have
examined the role of low-level user interface functionalities at different stages of the information
seeking process.
    Huurdeman and Kamps [22] designed a multistage information search system to support the
information seeking process; the system was built upon the concept of task-based information seeking
theory. Artemi and Liu [6] proposed a three-stage interface based on Vakkari’s model for content-based
image retrieval to capture users’ intents during the focus-formulation stage. White et al. [23] investigated
the usability of implicit and explicit relevance feedback; their findings were that implicit feedback was
used more in the early search stage while explicit feedback was used at the end of the search process.
Niu and Kelly [24] found that query suggestions were used for complex and difficult search tasks in the
final stage of the search process. Kules et al. [25] conducted an eye tracking study in which exploratory
search tasks were performed on a faceted search interface; the findings showed that users’ attention
started on the facets, then moved to the query and later to the results. Huurdeman et al. [26] proposed a
multistage simulated task approach, in which three distinct tasks were performed in a way that represents
Vakkari’s three-stage model. In this paper, a three-stage user interface [6] is used with eye tracking to
look further into the impact of user engagement on user attention.

3. Three-stage-based search interface design

Here, we consider the workflow of the Search Strategy (SS) interface along with the baseline Information
Goal (IG) search interface (Figure 4-b), as presented in [6]. The SS interface enables the CBIR system,
built on the active learning paradigm, to capture the users’ preferences during the query formulation
stage, where the users can provide additional image examples within the training stage. The SS interface
has three panels (Figure 4-a): the upper left panel is for exploring and selecting N random images; the
upper right panel is the feedback window, where a user marks images in the pool query set as relevant
or irrelevant over selected iterations; and in the bottom panel, the CBIR system returns a diverse result
set considered to match the learned concept, which the user assesses for relevance and usefulness. The
Explicit Searcher Model (ESM) from [6] represents the sequence of interactions between a searcher and
the CBIR system over the course of a search session.




(a) Search Strategy (SS) interface [6]                           (b) Information Goal (IG) interface [27]
Figure 4: User interfaces used in this study

Using an eye tracker to record user interactions enabled us to investigate the effectiveness of user
engagement/attention in the focus stage and the exploratory search process. The SS system used the
active learning mechanism, which is suited to settings where unlabelled data are abundant [28]. It enabled
the users to provide feedback expressing their intents or preferences; this method has been shown to
accelerate learning [29]. The feature extraction parameters used in these experiments were those
presented in [3]. The experiments were conducted using two interfaces, the SS and IG interfaces
(Figure 4). The relevance feedback mechanism [9] was applied to the IG system. Figure 5 shows the
study boundary settings of the system proposed in this paper. The query type used was query by example,
to find a target image through an interactive paradigm; visual image features were applied for the image
matching process.




 Figure 5: The boundary settings in our study, positioned along six dimensions: application (specific vs.
 general), query method (by keyword, by sketch, by concept layout, by example), feature representation
 (visual vs. semantic), matching (visual vs. semantic), search schema (classic vs. interactive), and search
 approach (clustering, browsing, target).

4. Evaluation

A controlled lab-based user study was conducted on the SS and IG systems using an eye tracker and a
galvanic skin response (GSR) device, with respect to Vakkari’s three-stage model of the information
seeking process. This offers additional insight into how participants’ engagement in query formulation
influences user attention at the result assessment stage.

4.1.      Experimental setup

The experiments were conducted in a UX lab. The aim of the experiments was to find out at which stage
the users’ attention was high, and why, by collecting and analyzing eye gaze activity and galvanic skin
response (GSR) data capturing emotional arousal.
   The eye tracker used in this setting was a Tobii Pro Nano, selected to capture eye gaze fixation
activity at a sampling rate of 60 Hz and to detect visual attention. The GSR was used to record the level
of emotional response that users experienced with the system. The devices used, together with their
output metrics, are listed in Table 1.

Table 1
Devices used in this study with output metrics
     Device          Model                   Tool                                    Output metric              Output metric type
     Eye tracker     Tobii Pro Nano          Heatmaps and areas of interest (AOI)    Fixation and time spent    Attention
     GSR             Wireless GSR Shimmer    Automated peak detection                Peak detection             Emotional arousal
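
For readers unfamiliar with these metrics, the following Python sketch shows how fixation count and
time spent inside a rectangular AOI could in principle be derived from a list of fixations. In our study
these metrics were produced by the iMotions platform; the data structure and field names below are
hypothetical.

    from dataclasses import dataclass

    @dataclass
    class Fixation:
        x: float            # horizontal gaze position in pixels
        y: float            # vertical gaze position in pixels
        duration_ms: float  # fixation duration in milliseconds

    def aoi_metrics(fixations, aoi):
        # `aoi` is a rectangle (left, top, right, bottom) in pixels.
        left, top, right, bottom = aoi
        inside = [f for f in fixations
                  if left <= f.x <= right and top <= f.y <= bottom]
        fixation_count = len(inside)
        time_spent_ms = sum(f.duration_ms for f in inside)
        return fixation_count, time_spent_ms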

Figure 6 shows the experimental setup, in which the eye tracker and GSR devices are connected. Both
devices were synchronized with the iMotions platform, which creates users’ behavior data from the
biosensor recordings.




 Figure 6: The experimental setup. The Tobii Pro Nano eye tracker and the connected biosensors (GSR
 and EEG-EMOTIV) stream real-time data while the stimulus is presented on screen.

4.2.      Experimental design

We investigated the effects of the users’ interaction with the three-stage interface, since users pay a
certain amount of attention when examining the image results; this amount of attention helps us to
differentiate on which interface or system the results potentially meet user needs. To obtain evidence
from the post-focus stage of what influences user attention behavior when users contribute to all search
aspects, visualizations of the users’ gaze paths, such as heat maps and fixation patterns, are needed. We
designed a controlled user study to obtain eye tracking data along with explicit feedback on search
satisfaction from participants. Each of our participants performed two exploratory image search tasks
on each search interface. The GSR and eye tracker recorded the user activities. Table 2 shows the two
image search tasks which participants were asked to perform using the SS and IG interfaces.

Table 2
Exploratory search tasks
 Task 1    Background: Imagine you intend to enter a photo competition on the topic of “Good variety food
           guide”, where you could win £50. This photo competition is being run by BBC Good Food: they are
           all about good recipes, and about quality home cooking that everyone can enjoy. The images you
           intend to present in this guide would show a variety of healthy and delicious inspiration, including
           a decadent dessert. It would also present trustworthy guidance for even some foodie needs. In order
           to get ideas for the competition, you want to look for already existing photographs conveying a
           similar subject. Your task is to find as many diverse images as possible that you think best fit the
           topic “Good variety food guide”.

 Task 2    Background: Imagine you are an interior designer specializing in lighting, with responsibility for the
           design of a leaflet that informs customers about chandelier options in terms of colors and shapes,
           which can be designed for practical or relaxing uses, or both. Customers do not have knowledge or
           experience of lighting their homes. Your task is to find diverse chandelier images from a large
           collection of images that can be included in the leaflet. The leaflet is intended to raise customers’
           interest and to present a variety of chandelier shapes for matching customer requirements, style
           and budget.

Twelve participants were recruited through a mailing list: 9 postgraduate students and 3 undergraduate
students, comprising 4 females and 8 males. Only 8 participants had adequate technical knowledge of
search system design. All participants were familiar with text-based search, but not with search by
example (CBIR). The experiment lasted about 55 minutes. In the first step the GSR device was attached,
and the eye tracker was calibrated before each task was performed. The experiment was conducted in a
UX lab, and data were collected using the iMotions platform. Post-task questionnaires were presented
as stimulated recall after each task was performed. The experimental procedure is depicted in Figure 7.
In order to avoid learning and fatigue effects, the stimulus order of the search tasks was not fixed.

 Figure 7: Experimental procedure: introduction to the research study; consent; background survey;
 training; exploratory task on the Information Goal interface followed by a post-task questionnaire
 (5-point Likert scale); exploratory task on the Search Strategy interface followed by a post-task
 questionnaire (5-point Likert scale); and a researcher-administered survey.

As shown in the experimental procedure, each participant was informed of the study objectives and
their consent obtained; they completed a background survey. Before performing any tasks, we provided
each participant with training on each system, since the quality of query formulation has a significant
impact on search results, and it can be beneficial to involve users in all retrieval processes [5]. The
questions addressed here are:
   Q1: To what extent can user engagement in query formulation improve the user involvement during
         post-focus stage?
   Q2: To what extent can user engagement in focus-formulation stage affect user perceptions?

5. Results and discussion

Visual analysis of fixation and heat map patterns is presented, including heat maps on objects of the
interface, followed by eye gaze fixation path activities. Heat maps were generated during the two
exploratory tasks that participants performed using the IG and SS interfaces. Eye fixation is one of the
most widely used indicators in eye tracking studies [30]; fixations can precisely illustrate the visual
attention given to the interaction activities occurring with the search system. To address the first research
question (Q1), using the eye fixation data and heat maps generated by the eye tracker, we observed
across the participant data that a high fixation rate denotes high attention paid by participants to a target
image. The heat maps are objective attributes, representing the time spent on a certain object (image);
they are therefore useful for observing potential issues related to user perception, for instance interface
usability, task completion, task performance, and task complexity. Here, we look at how users assess
the result panel on two different interface designs. In this context the eye tracker helps us to spot
additional insights from the image search elements. The recorded data were aggregated to enable static
visualization, and heat maps were then generated. In this analysis we look at both the aggregated and
individual levels; eye tracking data of area of interest (AOI) metrics per participant and AOI fixations
per participant were therefore exported to statistical software (SPSS). To highlight the importance of
user engagement in the focus-formulation stage, Figure 8 shows the customized heat maps, which
indicate that the most heavily focused element was where the image results were presented on the SS
and IG interfaces. Clearly, more attention can be observed on the SS interface, which means more images
were selected, unlike the attention rate on the IG interface result panel (Figure 8-b), where the user’s
attention moved to the bottom right corner with the intent to change the search system. It is also
noticeable that high heat values were recorded on images that have a similar color background but
different object textures.
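
The heat maps themselves were produced by the iMotions platform; the following Python sketch only
illustrates the general idea behind such a map, by accumulating fixation durations on a screen-sized grid
and blurring them with a Gaussian kernel (the resolution and kernel width are arbitrary assumptions).

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def fixation_heatmap(fixations, screen_w=1920, screen_h=1080, sigma=40):
        # `fixations` is an iterable of (x, y, duration_ms) tuples in screen pixels.
        heat = np.zeros((screen_h, screen_w))
        for x, y, duration_ms in fixations:
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < screen_w and 0 <= yi < screen_h:
                heat[yi, xi] += duration_ms        # weight each fixation by its duration
        return gaussian_filter(heat, sigma=sigma)  # smooth point masses into heat blobs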




 Figure 8: Heat maps generated on the result panels (scenes) of (a) SS and (b) IG interfaces

In order to quantify visual attention at both individual and aggregated levels, we first aggregated
multiple dynamic events in the recorded video stimuli. Within the recording sensor, we created a scene
as a segment. The segment was defined with a fixed size allocated manually along the timeline for each
participant. The created scenes were treated as static stimuli at the individual level. Four static AOIs of
eye tracking matrices were generated and used for analyzing the created scene. The drawn AOIs are to
quantify the visual attention on the result panel. The heat maps indicate highlights where the
participant’s attention was focused, in Figures 9-a and 9-b of the SS and IG interfaces, respectively.




 Figure 9: Created areas of interest (AOIs) on the result panels of the (a) SS and (b) IG interfaces

Figures 10-a and 10-b show the gaze path activity at the individual level, with the option to observe the
duration of each fixation, unlike a dynamic or static gaze path. The gaze patterns shown are for Task 2
on both interfaces. The circle size indicates the fixation time: the radius increases with longer fixations.
For further insight, the fixation values can be compared across the AOIs presented in Figure 9. The
number of fixations per image is related to how long a participant engages with interface elements or
with useful images they might have seen. The eye tracking data provide evidence that the users’ actions
are not taken randomly: when the results do not fit user needs, the gaze fixation path illustrates how
participants try to find alternative search methods to find a desired image (see the bottom right corner).




Figure 10: Individual gaze maps on the result panels of the (a) SS and (b) IG interfaces

The Mann–Whitney U-test, a non-parametric alternative to the independent-samples t-test, was
performed on the ordinal data recorded on the result panels of the IG and SS interfaces, to test the
significance of the difference in time spent and fixation counts between the result panels of the two
interfaces. We found a significant difference between the IG and SS interfaces in the time spent (ms) in
the AOI not restricted to fixations (based on raw data), at p = 0.00032. Moreover, the Mann–Whitney
U-test shows a significant difference at p < 0.05 (p = 0.0031) in the total duration spent in the AOI over
all participants’ fixations (excluding data points between fixations). The significant difference in the
time spent in the AOI based on raw data for both interfaces can be seen in Figure 11.
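
The same comparison can be reproduced with standard statistical software; a minimal SciPy sketch is
shown below. Our analysis was run in SPSS on the recorded eye tracking data, so the numbers here are
placeholder values, not our measurements.

    from scipy.stats import mannwhitneyu

    # Hypothetical per-participant time spent (ms) in the result-panel AOI on each
    # interface; placeholder values only, not the recorded study data.
    time_spent_ig = [3200, 4100, 2800, 3900, 3500, 3000, 4200, 3100, 2900, 3600, 3300, 3800]
    time_spent_ss = [6100, 7400, 5800, 9000, 6600, 7200, 8100, 6900, 7500, 8800, 6400, 7000]

    u_stat, p_value = mannwhitneyu(time_spent_ig, time_spent_ss, alternative="two-sided")
    print(f"U = {u_stat:.1f}, p = {p_value:.5f}")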


 Figure 11: Time spent in AOI (not fixation based) of the result panel on the IG and SS interfaces; time
 spent (ms) is plotted per participant (R1–R12) for each interface.


Figure 12 shows the number of fixations recorded inside the AOI of the result panels using the IG and
SS interfaces. The fixation counts on the SS results are distributed with higher scores than on the IG
results. The higher the total fixation count inside an AOI, the more time a participant spent on that AOI,
indicating high interest. Similarly, as seen in Figure 13, the total time spent in the AOI over all
participants’ fixations (excluding data points between fixations) was significantly higher at the
result-assessment stage for the SS interface than for the IG interface.




 Figure 12: Number of fixations recorded inside the AOI of the result panels on the SS and IG
 interfaces; fixation counts are plotted per participant (R1–R12) for each interface.

 Figure 13: Total time spent in AOI of all participants’ fixations (excluding data points between
 fixations) of the result panels on the SS and IG interfaces; time spent (ms) is plotted per participant
 (R1–R12) for each interface.

Meanwhile, eye tracking data indicate how participants experienced the exploratory image search tasks;
the participants’ perceptions of both search systems (i.e., IG and SS) are important to address the second
research question (Q2).
    To report the participants’ perceptions, we aggregated and exported the per-participant survey data
from the 5-point Likert scale stimuli. The survey data come from the post-task questionnaires. Figure 14
shows the averages of four elements of users’ perceptions during the second task performed on the IG
and SS interfaces. The results show that the SS interface outperforms the IG interface on all four factors:
task performance rate, the approach to task handling, the number of relevant images returned, and the
overall user satisfaction rate.
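
Figure 14 reports, for each perception factor, the mean Likert response with standard-deviation error
bars. A minimal Python sketch of that aggregation is given below; the ratings are made-up placeholders,
not our survey data.

    import numpy as np

    # Hypothetical 5-point Likert ratings (one value per participant) for one
    # perception factor on each interface; placeholder values only.
    satisfaction_ig = np.array([3, 2, 4, 3, 3, 2, 3, 4, 3, 2, 3, 3])
    satisfaction_ss = np.array([4, 5, 4, 4, 5, 3, 4, 5, 4, 4, 5, 4])

    for name, ratings in (("IG", satisfaction_ig), ("SS", satisfaction_ss)):
        # The mean gives the bar height, the sample standard deviation the error bar.
        print(f"{name}: mean = {ratings.mean():.2f}, std = {ratings.std(ddof=1):.2f}")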
    Figure 15 shows the individual heat maps from the post-task questionnaires (PTQ) for SS and IG. It
highlights the benefit of eye tracker data, which can complement the survey stimuli evaluation, as the
participant’s attention focused more on the areas that received mouse clicks.




Figure 14: Comparison of user perceptions towards IG and SS interfaces; error bars represent standard
deviation




Figure 15: Individual heat maps on the PTQ panels of the (a) SS and (b) IG interfaces

6. Conclusion and future work

We used an eye tracking sensor and a GSR sensor in the user study to determine at which stage users
pay the most attention. The evaluation of the user search experience comprised task performance, task
handling, returned relevant images, and overall satisfaction. The aggregated eye tracking data were
helpful for identifying at which stage more attention was paid. The findings show that when participants
engaged in the focus-formulation stage (i.e., the SS interface) during both search tasks, the aggregated
heat maps and gaze fixations at the end of the search process (on the result panel) were noticeably higher
for SS than when the IG interface was applied. The analysis of the recorded eye tracking data revealed
that gaze behavior patterns can complement the survey stimuli evaluation by examining gaze navigation
behavior and fixations. A limitation of this paper is that the recorded GSR data were not integrated with
the eye tracking data. Further investigation will be devoted to shedding light on the key factors of
CBIR-approach design by increasing the number of participants and integrating the GSR and eye
tracking data, in order to pave the way towards a better image search paradigm.




7. References
[1]    V. Tyagi, Content-Based Image Retrieval: Ideas, Influences, and Current Trends. Springer, 2018.
[2]    A. W. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the
       end of the early years," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 12, pp.
       1349-1380, 2000.
[3]    H. Liu, S. Zagorac, V. Uren, D. Song, and S. Rüger, "Enabling effective user interactions in content-
       based image retrieval," in Asia Information Retrieval Symposium, 2009: Springer, pp. 265-276.
[4]    A. Mohanan and S. Raju, "A Survey on Different Relevance Feedback Techniques in Content Based
       Image Retrieval," International Research Journal of Engineering and Technology, vol. 4, no. 02, pp.
       582-585, 2017.
[5]    W. Zhou, H. Li, and Q. Tian, "Recent advance in content-based image retrieval: A literature survey,"
       arXiv preprint arXiv:1706.06064, 2017.
M. Artemi and H. Liu, "Content-based Image Search System Design for Capturing User Preferences
       during Query Formulation," in Proceedings of BIRDS 2020, Xi'an, China, July 2020.
       http://ceur-ws.org/Vol-2741/paper-12.pdf
[7]    H. Zhang, Z.-J. Zha, Y. Yang, S. Yan, Y. Gao, and T.-S. Chua, "Attribute-augmented semantic
       hierarchy: towards bridging semantic gap and intention gap in image retrieval," in Proceedings of the
       21st ACM international conference on Multimedia, 2013, pp. 33-42.
[8]    L. Piras and G. Giacinto, "Information fusion in content based image retrieval: A comprehensive
       overview," Information Fusion, vol. 37, pp. 50-60, 2017.
[9]    H. Liu, P. Mulholland, D. Song, V. Uren, and S. Rüger, "Applying information foraging theory to
       understand user interaction with content-based image retrieval," in Proceedings of the third symposium
       on Information interaction in context, 2010: ACM, pp. 135-144.
[10]   B. Settles, "Active learning literature survey," University of Wisconsin-Madison Department of
       Computer Sciences, 2009.
[11]   X.-D. Zhang, "Machine learning," in A Matrix Algebra Approach to Artificial Intelligence: Springer,
       2020, pp. 223-440.
B. Settles, M. Craven, and L. Friedland, "Active learning with real annotation costs," in Proceedings of
       the NIPS workshop on cost-sensitive learning, Vancouver, CA, 2008, pp. 1-10.
[13]   C. Sammut and G. I. Webb, Encyclopedia of machine learning. Springer Science & Business Media,
       2011.
[14]   X. S. Zhou and T. S. Huang, "Relevance feedback in image retrieval: A comprehensive review,"
       Multimedia systems, vol. 8, no. 6, pp. 536-544, 2003.
[15]   P. B. Patil and M. B. Kokare, "Relevance Feedback in Content Based Image Retrieval: A Review,"
       Journal of Applied Computer Science & Mathematics, no. 10, 2011.
[16]   D.-p. Tian, "A Review on Relevance Feedback for Content-based Image Retrieval," J. Inf. Hiding
       Multim. Signal Process., vol. 9, pp. 108-119, 2018.
[17]   S. Jones, L. Shao, and K. Du, "Active learning for human action retrieval using query pool selection,"
       Neurocomputing, vol. 124, pp. 89-96, 2014.
[18]   M. Artemi and H. Liu, "Image optimization using improved gray-scale quantization for content-based
       image retrieval," in 2020 IEEE 6th International Conference on Optimization and Applications
       (ICOA), 2020: IEEE, pp. 1-6.
[19]   P. Vakkari, "A theory of the task-based information retrieval process: a summary and generalisation of
       a longitudinal study," Journal of documentation, vol. 57, no. 1, pp. 44-60, 2001.
[20]   K. Athukorala, A. Oulasvirta, D. Głowacka, J. Vreeken, and G. Jacucci, "Narrow or broad?: Estimating
       subjective specificity in exploratory search," in Proceedings of the 23rd ACM International Conference
       on Conference on Information and Knowledge Management, 2014: ACM, pp. 819-828.
[21]   M. Hearst, Search user interfaces. Cambridge university press, 2009.
[22]   H. C. Huurdeman and J. Kamps, "Designing multistage search systems to support the information
       seeking process," in Understanding and Improving Information Search: Springer, 2020, pp. 113-137.
[23]   R. W. White, I. Ruthven, and J. M. Jose, "A study of factors affecting the utility of implicit relevance
       feedback," in Proceedings of the 28th annual international ACM SIGIR conference on Research and
       development in information retrieval, 2005, pp. 35-42.
[24]   X. Niu and D. Kelly, "The use of query suggestions during information search," Information
       Processing & Management, vol. 50, no. 1, pp. 218-234, 2014.
[25]   B. Kules, R. Capra, M. Banta, and T. Sierra, "What do exploratory searchers look at in a faceted search
       interface?," in Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, 2009, pp.
       313-322.


[26]   H. C. Huurdeman, M. L. Wilson, and J. Kamps, "Active and passive utility of search interface features
       in different information seeking task stages," in Proceedings of the 2016 ACM on Conference on
       Human Information Interaction and Retrieval, 2016, pp. 3-12.
[27]   H. Liu, D. Song, and P. Mulholland, "Exploration of Applying a Theory-Based User Classification
       Model to Inform Personalised Content-Based Image Retrieval System Design," in Proceedings of HCI
       Korea, 2016: Hanbit Media, Inc., pp. 61-68.
[28]   I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical machine learning tools and
       techniques. Morgan Kaufmann, 2016.
[29]   S. Amershi, M. Cakmak, W. B. Knox, and T. Kulesza, "Power to the people: The role of humans in
       interactive machine learning," AI Magazine, vol. 35, no. 4, pp. 105-120, 2014.
[30]   K. Holmqvist, M. Nyström, R. Andersson, R. Dewhurst, H. Jarodzka, and J. Van de Weijer, Eye
       tracking: A comprehensive guide to methods and measures. OUP Oxford, 2011.



