=Paper=
{{Paper
|id=Vol-1178/CLEF2012wn-CLEFeHealth-KellyEt2012
|storemode=property
|title=Considering Subjects and Scenarios in Large-Scale User-Centered Evaluation of a Multilingual Multimodal Medical Search System
|pdfUrl=https://ceur-ws.org/Vol-1178/CLEF2012wn-CLEFeHealth-KellyEt2012.pdf
|volume=Vol-1178
}}
==Considering Subjects and Scenarios in Large-Scale User-Centered Evaluation of a Multilingual Multimodal Medical Search System==
<pdf width="1500px">https://ceur-ws.org/Vol-1178/CLEF2012wn-CLEFeHealth-KellyEt2012.pdf</pdf>
<pre>
Considering Subjects and Scenarios in Large-Scale User-Centered
   Evaluation of a Multilingual Multimodal Medical Search System

     Liadh Kelly¹, Lorraine Goeuriot¹, Gareth J. F. Jones¹, and Allan Hanbury²

¹ Centre for Next Generation Localisation, School of Computing, Dublin City University,
  Dublin 9, Ireland
² Department of Software Technology and Interactive Systems, Vienna University of
  Technology, 1040 Vienna, Austria
             {lkelly, lgoeuriot, gjones}@computing.dcu.ie,
                            hanbury@ifs.tuwien.ac.at


      Abstract. Khresmoi aims to provide a multilingual and multimodal search and
      access system for biomedical information and documents that targets three
      classes of user. Two of these are groups with general medical interests: the
      general public and general medical practitioners. The other is an example
      clinician group with specific expertise: radiologists. Khresmoi thus targets
      diverse users, who vary in native language, language skills, medical
      knowledge, information needs and querying behaviour. The system seeks to
      provide these groups with innovative and effective services for searching
      through the large amount of available medical information. It combines several
      innovative technologies linked to medical information (e.g. information
      extraction and retrieval, machine translation) to provide a comprehensive tool
      adapted to all users. In parallel with the development of the search system, a
      global evaluation strategy is being designed to enable an assessment of the
      efficiency and effectiveness of the developed technologies and of the quality
      of the support provided to the target users. These evaluation plans consist of
      both empirical and user-centred evaluations; we focus here on the user-centred
      evaluations. Holistic user-centred evaluation approaches allow the search
      application to be evaluated comprehensively while remaining mindful of the
      diversity of users. The first steps in this strategy relate to the experimental
      subjects and the evaluation scenarios. Firstly, the demographic features of
      each class of user have to be defined in order to obtain representative
      groups. Recruiting such subjects can be problematic, especially when medical
      professionals are required, and it raises issues such as the payment of
      subjects and the geographic distance between subjects and investigators.
      Moreover, the Khresmoi system is intended for use by individuals with
      different native languages and different levels of English. Again, this raises
      issues such as obtaining a good spread of subjects with different language
      skills, and assessing subjects' language skills. To determine the demographic
      spread of subjects, subjects should complete standard questionnaires covering
      their age, gender, medical condition, prior use of medical search engines,
      computer skills, etc. Secondly, a robust methodology for scenario creation has
      to be developed. User studies conducted within the project have given us
      insight into the types of search tasks that users perform, and would perform,
      in the medical space; a method to model this task information will be defined.
      The results of these evaluations will guide further research and development
      in the Khresmoi project.

       Keywords: Scenarios; Subjects; User-Centered Evaluation Strategy


1      Introduction
Medical search applications may have to serve the differing information needs of
multiple classes of users, who vary in medical knowledge, language skills and
querying behaviour. The precise nature of these users' needs has to be understood in
order to develop effective applications. Evaluating such search applications
requires holistic user-centred evaluation approaches that allow comprehensive
evaluation while remaining mindful of the diversity of users.
   This paper describes plans for evaluating the effectiveness of the Khresmoi
system, a large-scale multilingual eHealth system being developed across 12
institutions in the EU (http://khresmoi.eu/). Khresmoi aims to provide a multilingual
and multimodal search and access system for biomedical information and documents
that targets three classes of users. Two of these are groups with general medical
interests: the general public and general medical practitioners. The other group is
an example clinician group with specific expertise: radiologists. The system seeks to
provide these groups with innovative and effective services for searching through the
very large amount of medical information available. Relevant information may often
not be available in the searcher's native language, as scientific content is assumed
to be available more often in English. Thus one of the major features of our system
is translation support, providing cross-lingual access to English-language medical
information. A key requirement here is to provide support appropriate to the varying
language skills of the users. The system combines several innovative technologies
linked to medical information (e.g. text and image information extraction and
retrieval, machine translation) to provide a comprehensive tool adapted to all users.
   The overall aim of our evaluations is to enable an assessment of the efficiency and
effectiveness of the developed technologies and of the quality of the support provided
to the target users. The results of these evaluations will guide further research and
development in the Khresmoi project. These evaluation plans consist of both empirical
and user-centred evaluations. In this paper we focus on elements of the design of the
user-centred evaluations.


2      Materials and Methods
Users and their requirements are key to creating useful medical information retrieval
systems. While user behaviour and satisfaction have been studied for general web
search [1] and health-related search [2], further investigation is needed to define a
proper evaluation strategy for multifunctional medical search systems such as
Khresmoi. Surveys and interviews with representative potential users have been
conducted and have guided us in understanding the requirements of the end users of
the Khresmoi system. The surveys were conducted with 385 members of the general
public [3], 556 physicians [4] and 34 radiologists [5]. Questions covered their use of
search systems to obtain medical information, as well as the features they would like
in a new medical information system. Based on these requirements, two prototype
systems have been developed: one for the general public and general practitioners,
and the other for radiologists. Detailed evaluation plans are being developed for
these prototypes. The first steps in this strategy relate to the experimental subjects
and the evaluation scenarios. In this section we highlight how these elements are
considered in our large-scale user-centered evaluation strategy.
   Firstly, we define the demographic features of each class of user in order to
obtain representative groups. This will allow us to recruit suitable subjects, as well
as to explore any relationships between demographics and users' satisfaction with the
system features. To conduct a thorough user-centered evaluation of our multilingual
medical system, which serves different categories of users, each of these user
categories (general public, general practitioners and radiologists) should be evenly
represented in the evaluations. Recruiting such subjects can be problematic,
especially when professionals are required. Members of the general public will be
recruited through patients' organisations, general practitioners through a Society of
Physicians, and radiologists through two hospitals. Using real subjects of this kind
raises issues such as whether subjects should be paid; the extent to which
evaluations can be conducted online; and practical issues related to the geographic
distance between subjects and investigators, namely the practicality and cost of
subjects travelling to the investigator, or of the investigator travelling to many
distant locations. Since our system is intended for use by individuals with different
native languages (English, German, French, Spanish and Czech as test cases) and
different levels of English (ranging from none to fluent), for whom translation
support is provided, the subjects used in the experiments should have varying levels
of English. This will enable us to evaluate the utility of the translation support
functionality for individuals with different English levels. Obtaining a good spread
of subjects with different language skills, or indeed determining the language skills
of the recruited subjects, is nontrivial. Lessons on how to approach this could be
taken from Marlow et al.'s exploration of the effects of language skills on
multilingual web search [6]. As with language skills, other varying characteristics
of the users, e.g. medical knowledge level and computer skills, may have a strong
impact on their use of, and satisfaction with, the system. To determine the
demographic spread of subjects, subjects should complete standard questionnaires
covering their age, gender, medical condition, prior use of medical search engines,
computer skills, etc.
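As an illustration of how questionnaire responses might be recorded and checked for the even spread described above, the following Python sketch tallies subjects per (user class, English level) cell and lists the cells that still lack subjects. It is a hypothetical sketch, not part of the Khresmoi evaluation tooling: the `Subject` fields mirror the questionnaire items listed above, while the four-point English scale, the 1-5 computer skill rating and the names `spread` and `missing_cells` are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List, Optional, Tuple

# Assumed English proficiency scale: the paper only states "none to fluent",
# so the two intermediate levels are hypothetical.
ENGLISH_LEVELS = ("none", "basic", "intermediate", "fluent")


@dataclass
class Subject:
    """One participant's questionnaire answers (hypothetical field names)."""
    user_class: str                   # "general public", "general practitioner" or "radiologist"
    native_language: str              # e.g. "English", "German", "French", "Spanish", "Czech"
    english_level: str                # one of ENGLISH_LEVELS
    age: int
    gender: str
    medical_condition: Optional[str]  # None if no condition reported
    used_medical_search: bool         # prior use of medical search engines
    computer_skill: int               # assumed self-rating on a 1-5 scale


def spread(subjects: Iterable[Subject], key: Callable[[Subject], Hashable]) -> Counter:
    """Count subjects per value of one attribute, e.g. lambda s: s.native_language."""
    return Counter(key(s) for s in subjects)


def missing_cells(
    subjects: Iterable[Subject],
    user_classes: Iterable[str],
    levels: Tuple[str, ...] = ENGLISH_LEVELS,
    minimum: int = 1,
) -> List[Tuple[str, str]]:
    """List (user class, English level) cells holding fewer than `minimum` subjects."""
    counts = spread(subjects, lambda s: (s.user_class, s.english_level))
    return [
        (cls, lvl)
        for cls in user_classes
        for lvl in levels
        if counts[(cls, lvl)] < minimum
    ]
```

Under these assumptions, recruitment for a user class could continue until `missing_cells` returns an empty list for it; `spread` can likewise summarise native language, prior search engine use or any other questionnaire item.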
   Our second consideration for a large-scale user-centered evaluation strategy is the
development of a robust methodology for scenario creation. The user studies mentioned
earlier in this section gave us insight into the types of search tasks that users
perform, and would perform, in the medical space. A method to model this task
information will be defined for use in developing experimental search scenarios. This
scenario creation process is complicated by the diversity of users. In creating
scenarios for the general public, for example, we need to consider in a standardised
way the different characteristics of users within this population, e.g. different
levels of English. In doing so, we also need to ensure that the scenarios include
situations examining both monolingual and multilingual search.
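The requirement that scenarios cover both monolingual and multilingual search can be made checkable once scenarios are represented as data. The Python sketch below is purely illustrative and is not the task-modelling method the paper plans to define: it assumes, hypothetically, that the target collection is English-only, so a scenario counts as multilingual whenever its query language is not English. The names `Scenario`, `build_scenarios` and `covers_both_modes` are invented for this example.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Set

USER_GROUPS = ("general public", "general practitioner", "radiologist")
QUERY_LANGUAGES = ("English", "German", "French", "Spanish", "Czech")  # test-case languages
COLLECTION_LANGUAGE = "English"  # assumption: target documents are in English


@dataclass(frozen=True)
class Scenario:
    user_group: str
    query_language: str
    task: str  # short, hypothetical description of the search task

    @property
    def is_multilingual(self) -> bool:
        # Cross-lingual whenever the query language differs from the collection's.
        return self.query_language != COLLECTION_LANGUAGE


def build_scenarios(task_templates: Dict[str, List[str]]) -> List[Scenario]:
    """Instantiate every task template of a group once per query language."""
    return [
        Scenario(group, lang, task)
        for group, tasks in task_templates.items()
        for task, lang in product(tasks, QUERY_LANGUAGES)
    ]


def covers_both_modes(scenarios: List[Scenario]) -> Dict[str, bool]:
    """For each user group, report whether it has monolingual AND multilingual scenarios."""
    modes: Dict[str, Set[bool]] = {g: set() for g in USER_GROUPS}
    for s in scenarios:
        modes[s.user_group].add(s.is_multilingual)
    return {g: m == {True, False} for g, m in modes.items()}
```

For instance, one hypothetical task template per user group instantiated over the five test-case languages yields 15 scenarios, 12 of them multilingual, and `covers_both_modes` then reports both modes for every group.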


3      Conclusion
In this paper we described the importance of subject and scenario considerations in
developing a user-centered evaluation strategy for the Khresmoi multilingual
multimodal biomedical information system, which is being developed for three target
user groups: the general public, general practitioners and radiologists.


Acknowledgements
The research leading to these results has received funding from the European Union Seventh
Framework Programme (FP7/2007-2013) under grant agreement n°257528 (KHRESMOI).


References
1. D. Rose and D. Levinson (2004). Understanding User Goals in Web Search. In WWW,
   pp. 13-19.
2. R. Cline and K. Haynes (2001). Consumer health information seeking on the Internet: the
   state of the art. Health Education Research, 16(6).
3. N. Pletneva and A. Vargas (2011). D8.1.1: Requirements for the general public health
   search. Technical report, Khresmoi Project.
4. M. Gschwandtner, M. Kritz, and C. Boyer (2011). D8.1.2: Requirements of the health
   professional search. Technical report, Khresmoi Project.
5. H. Müller (2011). D9.1: Report on image use behaviour and requirements. Technical
   report, Khresmoi Project.
6. J. Marlow, P. Clough, J. C. Recuero, and J. Artiles (2008). Exploring the Effects of
   Language Skills on Multilingual Web Search. In ECIR 2008.

</pre>