=Paper= {{Paper |id=Vol-2167/short13 |storemode=property |title=Building Interaction Profiles for Better Search Tools in DLs |pdfUrl=https://ceur-ws.org/Vol-2167/short13.pdf |volume=Vol-2167 |authors=Maram Barifah |dblpUrl=https://dblp.org/rec/conf/desires/Barifah18 }} ==Building Interaction Profiles for Better Search Tools in DLs== https://ceur-ws.org/Vol-2167/short13.pdf
Building Interaction Profiles for Better Search Tools in DLs
Maram Barifah
Università della Svizzera italiana (USI)
Faculty of Informatics, Lugano, Switzerland
maram.barifah@usi.ch
This research considers the users of a digital library (DL). It aims to use data extracted from log files, including search strategies and queries, to build effective user-interaction profiles. These profiles can then guide designers and system developers in producing more usable, useful and effective search tools. It is important to stress that the proposed interaction profiles are built by extracting a number of features from the log files. Thus, they contain information about real searching experiences, including usage patterns, user familiarity with the system, and time intervals.
This study is conducted in collaboration with the RERO Doc digital library¹. RERO Doc is the network of the libraries of Western Switzerland. Users from different parts of the world can search across domains such as Nursing, Economics and Computer Science. RERO Doc provides various document types, such as books, articles, theses and periodicals. The research questions are:
    (1) What are the most suitable techniques for producing rich, realistic groups of data extracted from log files in order to build interaction profiles?
    (2) What are the main features that characterise interaction profiles?
    (3) What is the minimum amount of data needed to produce robust groups for building interaction profiles?
Data preparation and processing:
Interface analysis: the interface is inspected in order to understand the different search options it offers.
Preprocessing phase: we follow the framework of [2], as follows:
Data loading: the dataset consists of 59 million records (20 GB) collected over a six-month period.
Data cleaning: user identifications are hidden, and erroneous and corrupted records are eliminated.
Data parsing: this step consists of session recognition and the removal of non-human sessions, e.g. those of Googlebot and SemanticScholarBot. A session "is a common unit of interaction that is used in search log analysis" [3]. Session recognition depends on the user interactions and on the features of the interface. This phase is crucial for identifying distinct classes of searching patterns [2, 3]. We identify a session by the combination of user IP, time stamp and user agent extracted from the log files. Non-human requests are also removed at this stage, which reduces the data to 9 GB.
Data coding: in this phase, the URL requests were analysed and divided into meaningful parts, including user IP, time stamp, request, referrer, user agent and session IP. Researchers follow different strategies to analyse the URLs embedded in log files. For example, [1] build a hierarchical taxonomy of the website, and [2] define a coding schema for all types of interactions based on analysing and understanding the structure of the website. In this research we consider both strategies.
Feature engineering: based on the interface and log-file analysis, the meaningful features are identified.
Mining user behaviour: the remaining sessions are further analysed and grouped based on the variables available in the records. We started by analysing the first million records, which contain 125,000 sessions. Of these, 72,095 sessions (about 58%) consist of only one record. The aim of this phase is to identify different usage patterns among user interactions and to group them accordingly. Two main grouping techniques were used: topic modelling and K-Means. For a 6-topic model, the coherence score of the topic modelling is 0.35. For K-Means, the estimated number of clusters is 6, with a Silhouette Coefficient of 0.95. So far, six different usage patterns have been identified and interpreted qualitatively:
    (1) Single or known-item sessions, where searchers visit RERO Doc to download documents without any further interaction.
    (2) Complicated sessions, characterised by heavy interaction, including submitting queries, browsing and using different functions.
    (3) Light navigators, who navigate the library without using the different functions of the interface.
    (4) Advanced navigators, whose navigation is characterised by the use of different functions and many iterations.
    (5) Light browsers, whose searching is simple and short and does not use different functions.
    (6) Advanced browsers, whose interactions are long and include advanced search functions.
In conclusion, log-file analysis is an unobtrusive method for detecting usage patterns in a digital library. The aim of this research in progress is to build interaction profiles in the digital library context in order to gain more insight into users' searching experiences. We plan further experiments to test and compare the effectiveness of the techniques used to group data for preparing interaction profiles. We will then involve experts to assess the quality of these interaction profiles and how effectively they assist designers and system developers in producing more usable, useful and effective tools to support searchers.

¹ https://doc.rero.ch

DESIRES, August 2018, Bertinoro, Italy
© 2018 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.

REFERENCES
[1] Hui-Min Chen and Michael D. Cooper. 2002. Stochastic modeling of usage patterns in a web-based information system. Journal of the Association for Information Science and Technology 53, 7 (2002), 536–548.
[2] Yu Chi, Tingting Jiang, Daqing He, and Rui Meng. 2017. Towards an integrated clickstream data analysis framework for understanding web users' information behavior. In iConference 2017 Proceedings.
[3] Tony Russell-Rose, Paul Clough, and Elaine G. Toms. 2014. Categorising search sessions: some insights from human judgments. In Proceedings of the 5th Information Interaction in Context Symposium. 251–254.
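The Data coding and Data parsing steps described above split each logged request into fields (user IP, time stamp, request, referrer, user agent) and discard corrupted lines and non-human traffic. The following is a minimal Python sketch of those two steps. It assumes the logs use the common Apache/NGINX "combined" format, which the paper does not state. The crawler list `BOT_MARKERS`, the helper names and the sample lines are hypothetical. A production filter would more likely rely on a maintained crawler list than on a handful of substrings.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional

# ASSUMPTION: Apache/NGINX "combined" log format (not stated in the paper).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# HYPOTHETICAL, deliberately short list of crawler markers (lower-case).
BOT_MARKERS = ("googlebot", "semanticscholarbot", "bingbot", "crawler", "spider")


class Record(NamedTuple):
    ip: str
    ts: datetime
    request: str
    referrer: str
    agent: str


def parse_line(line: str) -> Optional[Record]:
    """Split one log line into fields; return None if it is corrupted."""
    m = COMBINED.match(line)
    if m is None:
        return None
    try:
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:  # malformed time stamp
        return None
    return Record(m["ip"], ts, m["request"], m["referrer"], m["agent"])


def is_bot(agent: str) -> bool:
    """Crude user-agent test for non-human traffic."""
    agent = agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def clean(lines: Iterable[str]) -> Iterator[Record]:
    """Yield parsed records, dropping corrupted lines and bot requests."""
    for line in lines:
        rec = parse_line(line)
        if rec is not None and not is_bot(rec.agent):
            yield rec


# Hypothetical sample: one human request, one Googlebot request, one corrupted line.
sample = [
    '203.0.113.7 - - [12/Mar/2018:10:15:02 +0100] "GET /record/1234 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '66.249.66.1 - - [12/Mar/2018:10:15:05 +0100] "GET /record/99 HTTP/1.1" '
    '200 812 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    "this line is corrupted",
]
records = list(clean(sample))
print(len(records), records[0].ip)  # 1 203.0.113.7
```

Only the first sample line survives. The Googlebot request is dropped by `is_bot`, and the corrupted line is dropped by `parse_line`.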
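The paper identifies a session by combining user IP, time stamp and user agent, but it does not give the exact rule. One common heuristic groups requests by (IP, user agent) and starts a new session whenever two consecutive requests are more than a fixed interval apart. The sketch below implements that heuristic under stated assumptions. The 30-minute cut-off, the `sessionize` and `single_record_share` helpers and the sample events are all hypothetical. The share of single-record sessions it computes is the quantity reported in the Mining user behaviour step: 72,095 / 125,000 ≈ 0.577.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, NamedTuple, Tuple

# ASSUMPTION: a 30-minute inactivity gap ends a session (not given in the paper).
TIMEOUT = timedelta(minutes=30)


class Event(NamedTuple):
    ip: str
    agent: str
    ts: datetime
    request: str


def sessionize(events: List[Event], timeout: timedelta = TIMEOUT) -> List[List[Event]]:
    """Group events by (IP, user agent), then cut at gaps longer than timeout."""
    by_user: Dict[Tuple[str, str], List[Event]] = defaultdict(list)
    for ev in events:
        by_user[(ev.ip, ev.agent)].append(ev)

    sessions: List[List[Event]] = []
    for user_events in by_user.values():
        user_events.sort(key=lambda ev: ev.ts)
        current = [user_events[0]]
        for prev, ev in zip(user_events, user_events[1:]):
            if ev.ts - prev.ts > timeout:  # inactivity gap: close the session
                sessions.append(current)
                current = []
            current.append(ev)
        sessions.append(current)
    return sessions


def single_record_share(sessions: List[List[Event]]) -> float:
    """Fraction of sessions that contain exactly one request."""
    return sum(1 for s in sessions if len(s) == 1) / len(sessions)


# Hypothetical events: one user with a 2-hour break, and one single-request user.
t0 = datetime(2018, 3, 12, 10, 0)
events = [
    Event("203.0.113.7", "Firefox", t0, "/search?q=nursing"),
    Event("203.0.113.7", "Firefox", t0 + timedelta(minutes=5), "/record/12"),
    Event("203.0.113.7", "Firefox", t0 + timedelta(hours=2), "/record/40"),
    Event("198.51.100.2", "Chrome", t0, "/record/7"),
]
sessions = sessionize(events)
print(len(sessions), round(single_record_share(sessions), 3))  # 3 0.667
print(round(72_095 / 125_000, 3))  # ratio reported in the paper: 0.577
```

Keying on the user agent as well as the IP keeps different browsers behind one shared IP (for example, a library terminal network) from being merged into a single session. The timeout is the parameter that most affects the share of single-record sessions.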
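The K-Means result reported above is an estimated 6 clusters with a Silhouette Coefficient of 0.95. One common way to reach such an estimate is to run K-Means for several candidate k and keep the k with the highest mean silhouette. The paper does not say that this is how its estimate was made, so the dependency-free Python sketch below only illustrates the idea. It combines three parts:

- Plain Lloyd's K-Means with a deterministic farthest-first initialisation. This seeding is a swapped-in choice, since the paper does not describe one.
- The mean silhouette s(i) = (b - a) / max(a, b).
- A search over candidate k.

The per-session features and toy points are hypothetical, and the topic-modelling branch is not covered.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's K-Means with farthest-first seeding; returns one label per point."""
    centroids = [list(points[0])]
    while len(centroids) < k:  # next seed: the point farthest from all seeds
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):  # move each centroid to the mean of its members
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b)."""
    clusters = set(labels)
    if len(clusters) < 2:  # undefined for one cluster: treat as worst score
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points: List[Point], candidates: range) -> Tuple[int, float, List[int]]:
    """Run K-Means for each candidate k and keep the highest mean silhouette."""
    best: Optional[Tuple[int, float, List[int]]] = None
    for k in candidates:
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    assert best is not None, "candidates must not be empty"
    return best


# HYPOTHETICAL per-session features: (requests, queries submitted).
sessions = [(1, 0), (1, 1), (2, 0),
            (10, 5), (11, 6), (10, 6),
            (30, 20), (31, 21), (29, 20)]
best_k, best_score, labels = choose_k(sessions, range(2, 6))
print(best_k, round(best_score, 2), labels)
```

With these three well-separated toy groups, the loop selects k = 3. On real session features, each feature would normally be standardised first so that a single large-valued feature (such as session duration) does not dominate the Euclidean distance.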