=Paper=
{{Paper
|id=None
|storemode=property
|title=Using Semantic Differentials for an Evaluative View of the Search Engine as an Interactive System
|pdfUrl=https://ceur-ws.org/Vol-909/paper2.pdf
|volume=Vol-909
|dblpUrl=https://dblp.org/rec/conf/eurohcir/Johnson12
}}
==Using Semantic Differentials for an Evaluative View of the Search Engine as an Interactive System==
Frances Johnson
Department of Languages, Information & Communications
Manchester Metropolitan University
Geoffrey Manton
+44 161 247 6156
F.Johnson@mmu.ac.uk
ABSTRACT
In this paper, we investigate the use of semantic differentials in obtaining the evaluative view held by users of the search engine. The completed scales of bipolar adjectives were analysed to suggest the dimensions of the user judgment formed when asked to characterize a search engine. These were then used to obtain a comparative evaluation of two engines potentially offering different types of support (or assistance) during a search. We consider the value of using the semantic differential as a technique in the toolkit for assessing the user experience during information interactions in exploratory search tasks.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Search process. H.5.2 [User Interfaces]: Evaluation/methodology

General Terms
Measurement, Performance, Design, Human Factors

Keywords
Semantic Differentials, User Evaluation, Exploratory Search, Information Interaction, User Interface Design

Presented at EuroHCIR2012. Copyright © 2012 for the individual papers by the papers' authors. Copying permitted only for private and academic purposes. This volume is published and copyrighted by its editors.

1. INTRODUCTION
The design of interfaces to support exploratory search seeks to provide users with the tools for, and the experience of, an interactive and engaging search. This is a departure from the classic model of information retrieval, wherein the user submits a keyword query to the system and scans the list of retrieved results for relevance, either stopping with relevant results or refining the query to get results that are closer to the information need. Exploratory search does not necessarily assume that the user has a well-defined information need (at least one that can be articulated as a keyword query) or indeed that the query will be ‘static’ and thus satisfied by a single list of retrieved results.

Accordingly, search engine developments have focused on providing query assistance drawing on contextual aspects of the search, such as personal history and/or current context [9]. At the interface, developments focus on improving the search process via richer information representations and interactions, such as previews and facets, through to tools that allow the user to view and explore connections in the results, for example the Relation Browser data analysis tool [3]. These shifts into HCIR are intended to help in the various stages of search: starting the task and understanding the query topic, deciding what to do next throughout the search, and stopping with a sense of confidence. In short, developments aim to support true exploration in search and, whilst many efforts may fall short, they will provide some form of user support in query assistance and in improving the search process as an interactive experience.

The context for evaluation is predicated on White and Roth’s [10] model of the exploratory search process. This involves the searcher in a dynamic interplay between the cognition of their ‘problem space’ and their exploratory activities in the iterative search process, including query formulation, results examination and information extraction. Data collected on the searcher’s information interactions may confirm this model [7] as well as attempt to systematically evaluate the effectiveness of exploratory search systems. In evaluation, a framework is used to attempt to assess performance during the search stages and to relate aspects of the system to its role in supporting information exploration, including sense-making or query visualisation [5]. The challenge for the evaluation of exploratory search is the assumption that the user is willing or able to make an evaluative judgment throughout the search, or that valid measures can be found through their actions, for example in their usage of query terms.

In general, evaluation draws on established HCI measures of effectiveness (can people complete their tasks?), efficiency (how long do people take?), and an assessment of the user’s overall satisfaction or other affective responses. Where possible, and increasingly so, the user actions are observed and recorded as dependent on the system and/or its interface. In this study we focus on an attempt to obtain the user’s evaluative view of the search engine, based on criteria which may be affected by the developments for new and richer interactive designs. It is assumed that this would be part of an assessment which, when taken with others, will build a picture of the ‘user experience’ of the system used in exploratory search.
2. USER EVALUATION
In developing an instrument to collect the user assessment, effort goes into ensuring that the evaluation is made in the task context. It means little to know that the user is ‘satisfied’ with the interface without gaining insight into why this assessment has been formed. A variety of questionnaires have been developed for assessing the usability of interactive systems, such as search engines. Two well known examples are the SUS (System Usability Scale), developed at the Digital Equipment Corporation [2], and the QUIS (Questionnaire for User Interaction Satisfaction) from the University of Maryland [4]. Both assess usability from the user perspective, with 10 statements and rating scales in the SUS and 27 questions in the QUIS. The QUIS asks the user to respond on a rating scale to statements which address specific usability aspects of the system, such as “use of the terms were consistent throughout the website”. The SUS, on the other hand, focuses on collecting the users’ overall reaction to the site/system on statements such as “I found the website unnecessarily complex”. Arguably the QUIS focuses on the concerns that a developer might have when assessing usability, whilst the SUS assumes that the user’s overall assessment is a reflection of the extent to which their goal-directed tasks were facilitated by the system and its design.
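For concreteness, the SUS is scored by the fixed rule published with the scale [2]: each odd-numbered (positively worded) statement contributes its rating minus 1, each even-numbered (negatively worded) statement contributes 5 minus its rating, and the sum is scaled to 0-100. A minimal sketch in Python (the function name is illustrative):

<pre>
def sus_score(responses: list[int]) -> float:
    """Score ten SUS ratings (1-5). Odd-numbered statements are
    positively worded (rating - 1); even-numbered statements are
    negatively worded (5 - rating). The sum is scaled to 0-100."""
    assert len(responses) == 10, "SUS has exactly ten statements"
    contributions = [
        (r - 1) if i % 2 == 0 else (5 - r)  # 0-based i, so even i = odd item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5
</pre>

A respondent rating every statement 3, for example, scores 50, the conventional midpoint.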
Questionnaires such as SUS are used in an experimental set-up when an explanation of the user’s overall assessment is sought. However, the limitations of the questionnaire in capturing and providing insight into the complexity of the user’s assessment have led to alternative tools, for example Microsoft’s Product Reaction Cards in the "Desirability Toolkit". This invites participants in a usability test to select as many, or as few, words from a list of 118 which best describe their reaction to and/or interaction with the system they have just used. Benedek and Miner [1] include a list of the words used and point out that the approach helps elicit negative comments as well as positive, thus overcoming a problem with questionnaires biased towards positive responses.

Given the potential scope of the users’ response (represented in the reaction cards with some 100+ terms), this study sets out to investigate the value of assembling these into a framework (of sorts) for the collection of the users’ evaluative judgment of an interactive system, based on the technique known as ‘semantic differentials’. Specifically, the aim of this small preliminary investigation was to begin to determine the extent to which users hold an evaluative view of a ‘search engine’ and what the dimensions (traits or criteria) are on which this view is formed. If it can be found that this view is strongly held (that is, an attitude is formed which may influence how we behave and interact with the search engine) then it may be feasible to investigate the influence, if any, of a design for information interaction on the evaluative view. In this study the technique of semantic differentials is used to describe the evaluative view held by its participants. This is then employed to assess two quite different search engines following the completion of two query-based searches.

3. SEMANTIC DIFFERENTIALS
Semantic Differentials (SDs) originate from the work of Osgood [8] as a technique for attitude measurement, scaling people on their responses to adjectives in respect of a concept. Typically individuals respond to several pairs of bipolar adjectives scored on a continuum from + to –, and in doing so differentiate their meaning of the concept in intensity and in direction (in a ‘semantic space’).

The assumption made here, in the use of SDs on ‘search engines’, is that users hold an evaluative view which is formed when using the engine to find and/or explore information. The SD is used to investigate the adjectives that best ‘conceptualise’ the search engine from the user perspective. Factorial analysis is also used to identify the dimensions of the judgment: in a sense, the packaging of the components of the judgment into smaller units of meaning reflecting what is important when responding to the concept ‘search engine’.

The design of the SD aims to allow a degree of abstraction in the evaluation so that participants can reflect the complexity of their response. In this study, the adjectives to include on the SD scale were chosen from Microsoft’s Product Reaction Cards, these having been collected in previous research, usability studies and the marketing of web sites and systems. The majority of the terms formed pairs on some continuum, and 40 terms (20 pairs) were selected to present in the SD. The selection was subject to the judgment of the researcher. This is a limitation of this exploratory study; however, some steps were taken to formalise the selection. A loose grouping of the adjective pairs was made as relating to appearance (such as ‘attractive’), judgment (‘relevant’), emotive (‘boring’) and use (‘fast’). Five pairs from each of these groupings were made. The pairs were mixed on the SD to avoid having all the positive terms on one side of the scale, and only intervals were shown on the scales, with the numerical values used only for data entry. This allowed participants to focus on how an adjective pair related to the engine and its characteristics, rather than on ‘scoring’ it in some way.

3.1 Implementation
The study was conducted on our undergraduates studying BSc Web Development and on a postgraduate cohort studying on the MA Library and Information Management or the MSc Information Management. A total of 89 students participated in the study. At the start of the class each participant was asked to think about a search engine and the adjectives they would use to describe the engine (in other words, “what it means to them”). Each participant was then given the SD to complete. This is referred to as the ‘baseline’ and the data were analysed to gauge user perceptions of search engines.

In the following lab sessions (about one hour later) each participant was required to perform two search tasks on each of the two search engines: Google, an engine with which we can assume some familiarity, and a second, clustering engine (Yippy, formerly Clusty). The two tasks were as follows:

1. Find information on the symptoms for diabetes type II
2. Find information to help write an assignment on the debate ‘nurture vs nature’

These were selected to give the participants experience of using the engines for a closed question (find symptoms) and for a more open ‘informational’ type of query (on the ‘nature nurture’ debate). A measure of search success was not taken, as the aim was simply to get the participants using the engines. The order of use of the two sites was randomized so that approximately half of the participants worked on Google first and half on the clustering engine. All were told to spend no longer than 10 minutes searching on each engine and to complete the SD for each engine immediately after each use.
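Since only intervals were shown on the scales and the pairs were mixed in polarity, the data-entry step has to align every pair before analysis. A minimal sketch of that coding, assuming responses are recorded as interval positions 1-7 from left to right (the column names and polarity flags below are illustrative, not the study’s actual data file):

<pre>
import pandas as pd

# Illustrative subset of the 20 pairs. True means the positive adjective
# sits on the LEFT of the printed scale (e.g. attractive ... unattractive),
# so a mark in the left-most interval should code as 7 rather than 1.
POSITIVE_ON_LEFT = {
    "attractive_unattractive": True,
    "impersonal_personal": False,
    "fast_slow": True,
    "difficult_easy": False,
}

def code_responses(raw: pd.DataFrame) -> pd.DataFrame:
    """Map raw interval positions (1 = left-most) to 7-1 scores in which
    7 always corresponds to the positive adjective of the pair."""
    coded = raw.copy()
    for item, positive_left in POSITIVE_ON_LEFT.items():
        if positive_left:
            coded[item] = 8 - raw[item]  # flip so the positive pole scores high
    return coded
</pre>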
4. FINDINGS
4.1 Evaluative views
The responses to the baseline (think of an engine) were entered into SPSS with the scales coded 7-1 so that the positive adjectives corresponded to the higher numbers. Descriptive statistics of mean, mode and standard deviation were calculated for each of the adjectives. Those with a mean greater than 4 or less than 3 were taken to suggest the adjective pairs that best characterise the participants’ view, as follows:

attractive – unattractive
powerful – simplistic
valuable – not valuable
relevant – irrelevant
satisfying – frustrating
fast – slow
predictable – unpredictable
intuitive – rigid
easy – difficult
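The study ran this descriptive step in SPSS; as a sketch of the same computation in Python with pandas (assuming a coded response table as in the earlier sketch, one column per adjective pair):

<pre>
import pandas as pd

def characterising_pairs(coded: pd.DataFrame) -> pd.DataFrame:
    """Mean, mode and standard deviation per adjective pair, keeping
    those pairs whose mean lies above 4 or below 3 on the 7-1 coding."""
    stats = pd.DataFrame({
        "mean": coded.mean(),
        "mode": coded.mode().iloc[0],  # first value if several modes tie
        "std": coded.std(),
    })
    return stats[(stats["mean"] > 4) | (stats["mean"] < 3)]
</pre>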
Factor analysis investigates the correlations among subsets of the responses to the bipolar pairs and groups the correlated variables such that each group is largely independent of the others. Exploratory factor analysis was employed to identify the groups which might explain most of the variance in the data. With 20 pairs of adjectives, to perform Principal Components Analysis (PCA) in SPSS it is recommended that a minimum of 100 responses be obtained, whilst others recommend a sample of approximately 5-10 times as many respondents as scale pairs [6]. With 89 responses we should use a reduced number of pairs; however, the Kaiser-Meyer-Olkin measure of sampling adequacy (.616) is greater than the 0.6 needed to indicate that the correlation matrix may be able to factorise. So with this, PCA was run (with varimax rotation to force items to ‘load’ on only one factor group) to identify the possible ‘factors’ or subsets derived from patterns of correlation of the adjective pairs. The following five subsets were obtained (the adjectives from the list above having a low or high mean are shown in bold). The labels were assigned to suggest the evaluative dimension.

Factor 1, labelled USE – Utility: effective, '''valuable''', '''satisfying''', '''relevant''', '''predictable''', intimidating, inspiring, stimulating

Factor 2, labelled QUALITY – Affective: engaging, fun, connected

Factor 3, labelled QUALITY – Appearance: high quality, personal, meaningful, '''rigid''', '''attractive'''

Factor 4, labelled USE – Efficient: '''easy''', '''intuitive''', '''fast''', '''powerful'''

Factor 5, labelled USE – Control: controllable
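The extraction reported above was run in SPSS. As a rough open-source equivalent, a sketch of the same KMO check and varimax-rotated principal-components extraction using the Python factor_analyzer package (an assumption; this is not the study’s script):

<pre>
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo

def extract_factors(coded: pd.DataFrame, n_factors: int = 5) -> pd.DataFrame:
    """KMO sampling-adequacy check followed by a principal-components
    extraction with varimax rotation; returns the loadings table."""
    _, kmo_overall = calculate_kmo(coded)  # the paper reports .616
    if kmo_overall < 0.6:
        raise ValueError(f"KMO {kmo_overall:.3f} below 0.6: the correlation "
                         "matrix may not factorise")
    fa = FactorAnalyzer(n_factors=n_factors, method="principal",
                        rotation="varimax")
    fa.fit(coded)
    return pd.DataFrame(fa.loadings_, index=coded.columns,
                        columns=[f"F{i + 1}" for i in range(n_factors)])
</pre>

Each adjective pair would then be assigned to the factor on which it loads most heavily, giving subsets such as the five listed above.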
4.2 Comparative evaluations
Using the same SDs, participants scaled their responses post search using Google and the clustering search engine. These were entered into a worksheet to obtain basic statistics. The mode for each adjective is shown in Figure 1, with a note of those with mode > 4 or < 3 suggesting a positive or negative response.

Figure 1. Responses to the adjectives for both engines (numbers give each adjective’s position on the SD scale in the Appendix; in bold where the mean is also > 4 or < 3):
Google (mode > 4): 1 attractive, 6 valuable, 8 relevant, 15 satisfying, 16 fast, 17 predictable, 18 controllable; (mode < 3): 19 rigid
Clustering search engine (mode > 4): 14 engaging, 19 intuitive; (mode < 3): 13 intimidating, 17 unpredictable

Using the suggested dimensions or aspects of the user evaluation from the factor analysis of the ‘baseline’ data, we can compare the participants’ responses on the high or low scoring adjectives across the engines. On QUALITY – Appearance, Google was rated rigid and attractive; and whereas Google was neutral on the factor QUALITY – Affective, the clustering search engine obtained a positive score towards the adjective engaging. On the factor labelled USE – Utility, Google was scored as predictable, valuable, relevant and satisfying, whereas the clustering engine scored as unpredictable and towards intimidating. On USE – Efficient, Google was rated as fast, and the clustering engine appears more intuitive. Google was also rated as controllable.
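A sketch of this per-engine comparison (assuming the post-search responses sit in one coded table with an added ‘engine’ column; the names are illustrative):

<pre>
import pandas as pd

def modes_by_engine(post_search: pd.DataFrame) -> pd.DataFrame:
    """Per-engine mode for each adjective pair, keeping only the
    positive (mode > 4) or negative (mode < 3) responses, as in Figure 1."""
    modes = post_search.groupby("engine").agg(lambda col: col.mode().iloc[0])
    return modes.where((modes > 4) | (modes < 3))  # NaN marks neutral pairs
</pre>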
5. DISCUSSION
This is an exploratory study and it has its limitations. It is open to question whether the selection of the adjectives used in the SD influenced the results. In particular, there is uncertainty over whether ‘intuitive’ to ‘rigid’ lies on a single continuum. There is also some unease at accepting one factor with 8 of the 20 pairs and another with only one. Perhaps the sample size was too small to attempt factoring. The results also raise questions about how some of the adjectives were interpreted by the participants. These limitations notwithstanding, the participants in this study did appear to hold an evaluative judgment of the concept ‘search engine’, and the traits represented in the scale were grouped to suggest the aspects on which an assessment may be formed.

It is of particular interest that, upon using the search engine Google to conduct a search task, the ratings on the SD on the whole altered only on the factors of ‘controllable’ and USE – Efficient (easy, intuitive and powerful). Perhaps we can assume that Google was the typical engine participants had in mind when asked to think of an engine in the baseline and that, when it came to using Google, they shifted their perception with regard to some of the adjectives. Perhaps this is not surprising, but it may suggest that we hold an implicit view of search engines, and that this view will be influenced by actual use (and the experience). Our participants may have had less familiarity with the clustering engine, and in the evaluation this appears to have prompted an ‘affective’ response in finding the engine to be ‘engaging’, whilst also indicating shifts in the ‘use’ factors (towards an assessment of the engine as ‘unpredictable’). Again the fallibility of some of the terms is highlighted: an ‘unpredictable’ system may be regarded as a negative judgment, but if the system is also considered to be engaging the assessment could be highly desirable, depending on the user’s goals. This study of the use of semantic differentials indicates that it is worth running the test with a new cohort of students to determine the extent to which a consistent view is obtained. As an exploratory study it also suggests that further research on users’ perceptions and mental models of search engines is worthwhile. With regard to the challenge of providing an evaluation of exploratory search, this study falls short, as no behavioural data were obtained. However, with further design of the SD and use in an experimental set-up with honed tasks, a user assessment of the interface may perhaps be obtained as dependent on the search interface development and design.

6. REFERENCES
[1] Benedek, J. and Miner, T. Measuring Desirability: New Methods for Evaluating Desirability in a Usability Lab Setting. Redmond, WA: Microsoft Corporation, 2002. http://www.microsoft.com/usability/UEPostings/DesirabilityToolkit.doc

[2] Brooke, J. SUS: A Quick and Dirty Usability Scale. In: P.W. Jordan, B. Thomas, B.A. Weerdmeester & I.L. McClelland (Eds.), Usability Evaluation in Industry. London: Taylor & Francis, 1996. www.itu.dk/courses/U/E2005/litteratur/sus.pdf

[3] Capra, R. and Marchionini, G. The Relation Browser tool for faceted exploratory search. Proceedings of the 2008 Conference on Digital Libraries, Pittsburgh, Pennsylvania, June 2008.

[4] Chin, J.P., Diehl, V.A. & Norman, K. Development of an instrument measuring user satisfaction of the human-computer interface. Proceedings of ACM SIGCHI, 1988, pp. 213-218. http://www.cs.umd.edu/hcil/quis/

[5] He, D., et al. An evaluation of adaptive filtering in the context of realistic task-based information exploration. Information Processing & Management, 44(2), 2008, pp. 511-533.

[6] Gable, R.K. & Wolf, M.E. Instrument Development in the Affective Domain (2nd ed.). Boston: Kluwer Academic, 1993.

[7] Kules, B. and Capra, R. Visualizing stages during an exploratory search. Proceedings of HCIR 2011, October 2011.

[8] Osgood, C.E., Suci, G. & Tannenbaum, P. The Measurement of Meaning. University of Illinois Press, 1957.

[9] Teevan, J., Dumais, S.T. and Horvitz, E. Potential for Personalization. ACM Transactions on Computer-Human Interaction, 17(1), 2010. http://people.csail.mit.edu/teevan/work/publications/papers/tochi10.pdf

[10] White, R.W. & Roth, R.A. Exploratory Search: Beyond the Query-Response Paradigm. San Rafael, CA: Morgan & Claypool, 2009.

Appendix: The Semantic Differential scale
attractive _ _ _ _ _ _ _ unattractive
impersonal _ _ _ _ _ _ _ personal
dull _ _ _ _ _ _ _ fun
powerful _ _ _ _ _ _ _ simplistic
disconnected _ _ _ _ _ _ _ connected
valuable _ _ _ _ _ _ _ not valuable
high quality _ _ _ _ _ _ _ low quality
irrelevant _ _ _ _ _ _ _ relevant
effective _ _ _ _ _ _ _ ineffective
incomprehensible _ _ _ _ _ _ _ meaningful
stimulating _ _ _ _ _ _ _ confusing
boring _ _ _ _ _ _ _ inspiring
intimidating _ _ _ _ _ _ _ empowering
stressful _ _ _ _ _ _ _ engaging
satisfying _ _ _ _ _ _ _ frustrating
fast _ _ _ _ _ _ _ slow
predictable _ _ _ _ _ _ _ unpredictable
controllable _ _ _ _ _ _ _ uncontrollable
intuitive _ _ _ _ _ _ _ rigid
difficult _ _ _ _ _ _ _ easy