=Paper=
{{Paper
|id=Vol-2167/short13
|storemode=property
|title=Building Interaction Profiles for Better Search Tools in DLs
|pdfUrl=https://ceur-ws.org/Vol-2167/short13.pdf
|volume=Vol-2167
|authors=Maram Barifah
|dblpUrl=https://dblp.org/rec/conf/desires/Barifah18
}}
==Building Interaction Profiles for Better Search Tools in DLs==
Building Interaction Profiles for Better Search Tools in DLs
Maram Barifah
Università della Svizzera italiana (USI)
Faculty of Informatics, Lugano, Switzerland
maram.barifah@usi.ch
This research starts by considering users of a digital library (DL) and aims at using the data extracted from log files, including search strategies and queries, to build effective user-interaction profiles and to use them to guide designers and system developers in the production of more usable, useful and effective interaction profiles. It is important to stress that the proposed interaction profiles are built by extracting a number of features from the log files. Thus, they contain information about real searching experiences, including usage patterns, user familiarity with the system, and time intervals.

This study is conducted in collaboration with the RERO Doc digital library (https://doc.rero.ch). RERO Doc is the network of the libraries of Western Switzerland. Users from different parts of the world can search across different domains: Nursing, Economics, Computer Science and others. RERO Doc provides various document types such as books, articles, theses, periodicals, etc. Thus, the research questions are:
(1) What are the most suitable techniques to produce rich and realistic groups of data extracted from log files in order to build interaction profiles?
(2) What are the main features to characterise interaction profiles?
(3) What is the minimum size of data needed to produce robust groups to use for building interaction profiles?

Data preparing and processing:

Interface analysis: the interface is inspected in order to understand the different search options.

Preprocessing phase: we follow the framework of [2], as follows:

Data loading: the dataset consists of 59 million records (20 GB) collected over a six-month period.

Data cleaning: this includes hiding user identifications and eliminating erroneous and corrupted records.

Data parsing: this consists of recognising sessions and removing non-human sessions, e.g. Googlebot, SemanticScholarBot. The session "is a common unit of interaction that is used in search log analysis" [3]. Session recognition depends on the user interactions and on the features of the interface. This phase is crucial for identifying distinct classes of searching patterns [2, 3]. We identify a session by the combination of user IP, time stamp, and user agent extracted from the log files. The non-human requests are also removed in this stage, which reduces the data to 9 GB.

Data coding: in this phase, the URL requests were analysed and divided into meaningful parts including user IP, time stamp, request, referrer, user agent, session IP. Researchers follow different strategies to analyse the URLs recorded in the log files. For example, [1] build a hierarchical taxonomy of the website, and [2] define a coding schema for all types of interactions based on analysing and understanding the structure of the website. In this research we consider both strategies.

Features engineering: based on the interface and log file analysis, the meaningful features are identified.

Mining user behaviour: the remaining sessions are further analysed and grouped based on the available variables in the records. We started with the analysis of the first million records, which comprise 125,000 sessions; 72,095 of these sessions contain only one record. The aim of this phase is to identify different usage patterns among user interactions and group them accordingly. Two main grouping techniques were used: topic modelling and K-Means. For a six-topic model, the coherence score of the topic modelling is 0.35. For K-Means, the estimated number of clusters is 6, with a Silhouette Coefficient of 0.95. So far, six different usage patterns have been identified and interpreted qualitatively:
(1) Single sessions or known-item sessions, where searchers visit RERO to download documents without any other interaction.
(2) Complicated sessions, where the usage pattern is characterised by heavy interaction, including submitting queries, browsing, and using different functions.
(3) Light navigators, who navigate the library without using the different functions of the interface.
(4) Advanced navigators, whose navigation is characterised by the use of different functions and many iterations.
(5) Light browsers, whose searching is simple and short, without the use of different functions.
(6) Advanced browsers, whose interactions are long and include advanced search functions.

In conclusion, log file analysis is an unobtrusive method to detect usage patterns in a digital library. The aim of this research in progress is to build interaction profiles in the digital library context in order to gain more insight into users' searching experiences. We plan more experiments to test and compare the effectiveness of the techniques used to group data for preparing interaction profiles. Then, we will involve experts to assess the quality of these interaction profiles and how effective they are in assisting designers and system developers in the production of more usable, useful and effective tools to support searchers.

REFERENCES
[1] Hui-Min Chen and Michael D. Cooper. 2002. Stochastic modeling of usage patterns in a web-based information system. Journal of the Association for Information Science and Technology 53, 7 (2002), 536–548.
[2] Yu Chi, Tingting Jiang, Daqing He, and Rui Meng. 2017. Towards an integrated clickstream data analysis framework for understanding web users' information behavior. iConference 2017 Proceedings (2017).
[3] Tony Russell-Rose, Paul Clough, and Elaine G. Toms. 2014. Categorising search sessions: some insights from human judgments. In Proceedings of the 5th Information Interaction in Context Symposium. 251–254.

DESIRES, August 2018, Bertinoro, Italy
© 2018 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.
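
As an illustration of the session recognition described in the data parsing step, the following Python fragment is a minimal sketch and not the implementation used in this study. It assumes that each log record has already been parsed into a tuple of user IP, time stamp, user agent and request, that crawlers can be recognised by a substring of their user agent, and that a session ends after 30 minutes of inactivity. The record layout, the timeout value and the names `BOT_MARKERS`, `SESSION_TIMEOUT`, `is_bot` and `build_sessions` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumptions (not taken from the study): the record layout, the way
# crawlers are matched, and the 30-minute inactivity timeout.
BOT_MARKERS = ("googlebot", "semanticscholarbot")
SESSION_TIMEOUT = timedelta(minutes=30)


def is_bot(user_agent):
    """Return True if the user agent contains a known crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def build_sessions(records):
    """Group (ip, timestamp, user_agent, request) records into sessions.

    Records sharing the same IP and user agent stay in one session until
    the gap between consecutive requests exceeds SESSION_TIMEOUT.
    """
    by_user = defaultdict(list)
    for ip, ts, ua, request in records:
        if not is_bot(ua):  # drop non-human requests
            by_user[(ip, ua)].append((ts, request))

    sessions = []
    for (ip, ua), hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])  # order by time stamp
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > SESSION_TIMEOUT:
                sessions.append({"ip": ip, "user_agent": ua, "hits": current})
                current = []
            current.append(hit)
        sessions.append({"ip": ip, "user_agent": ua, "hits": current})
    return sessions


# Toy records, invented for illustration only.
records = [
    ("10.0.0.1", datetime(2018, 1, 1, 9, 0), "Mozilla/5.0", "/search?q=nursing"),
    ("10.0.0.1", datetime(2018, 1, 1, 9, 5), "Mozilla/5.0", "/record/123"),
    ("10.0.0.1", datetime(2018, 1, 1, 11, 0), "Mozilla/5.0", "/record/456"),
    ("10.0.0.2", datetime(2018, 1, 1, 9, 1), "Googlebot/2.1", "/record/789"),
]
sessions = build_sessions(records)
single_record = [s for s in sessions if len(s["hits"]) == 1]
print(len(sessions), len(single_record))
```

On these toy records the crawler request is discarded and the remaining three requests are split into two sessions, one of which holds a single record. Counting such single-record sessions is the kind of figure reported in the mining step, although the real log format and session rule may differ from the ones assumed here.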