<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Stochastic modeling of usage patterns
in a web-based information system. Journal of the Association for Information
Science and Technology</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Building Interaction Profiles for Better Search Tools in DLs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maram Barifah</string-name>
          <email>maram.barifah@usi.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università della Svizzera italiana (USI) Faculty of Informatics</institution>
          ,
          <addr-line>Lugano</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>53</volume>
      <issue>7</issue>
      <abstract>
        <p>This research starts by considering users of a digital library (DL) and aims to use data extracted from log files, including search strategies and queries, to build effective user-interaction profiles and to use them to guide designers and system developers in the production of more usable, useful and effective interaction profiles. It is important to stress that the proposed interaction profiles are built by extracting a number of features from the log files. Thus, they contain information about real searching experiences, including usage patterns, user familiarity with the system, and time intervals. This study is conducted in collaboration with the RERO Doc digital library. RERO Doc is the network of the libraries of Western Switzerland. Users from different parts of the world can search across different domains, such as Nursing, Economics, Computer Science and others. RERO Doc provides various document types such as books, articles, theses and periodicals. Thus, the research questions are:</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>(1) What are the most suitable techniques to produce rich/realistic
groups of data extracted from log files in order to build
interaction profiles?
(2) What are the main features to characterise interaction profiles?
(3) What is the minimum size of data to produce robust groups to
use for building interaction profiles?</p>
      <p>Data preparation and processing:
Interface analysis: the interface is inspected in order to understand
the different search options.</p>
      <p>Preprocessing phase: we follow the framework of [2], as
follows:
Data loading: the dataset consists of 59 million records (20 GB)
collected over a six-month period.</p>
      <p>Data cleaning: user identifiers are hidden, and erroneous and
corrupted records are eliminated.</p>
      <p>Data parsing: consists of session recognition and the removal of
non-human sessions, e.g. Googlebot and SemanticScholarBot. The session "is
a common unit of interaction that is used in search log analysis"
[3]. Session recognition depends on the user interactions and on
the features of the interface. This phase is crucial for identifying
distinct classes of searching patterns [2, 3]. We identify a session by
the combination of user IP, time stamp, and user agent extracted
from the log files. Non-human requests are also removed at
this stage, which reduces the data to 9 GB.</p>
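      <p>As an illustration only, session recognition and the removal of non-human requests could be sketched as follows. The 30-minute inactivity timeout, the generic bot markers and the example records are assumptions made for this sketch; the study does not state the exact rule or threshold that was applied.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from datetime import datetime, timedelta

# Assumed markers of non-human traffic. The study names only Googlebot and
# SemanticScholarBot; the generic tokens are illustrative additions.
BOT_MARKERS = ("googlebot", "semanticscholarbot", "bot", "crawler", "spider")

# Assumed inactivity timeout; the study does not state one.
SESSION_GAP = timedelta(minutes=30)


def is_bot(user_agent):
    """Return True if the user agent contains one of the bot markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def sessionize(records, gap=SESSION_GAP):
    """Group (ip, timestamp, user_agent) records into sessions.

    Records sharing the same IP and user agent stay in one session until
    the pause between two consecutive requests exceeds `gap`.
    """
    human = [r for r in records if not is_bot(r[2])]
    # Sort by visitor key, then by time, so each visitor's requests are adjacent.
    human.sort(key=lambda r: (r[0], r[2], r[1]))
    sessions = []
    for rec in human:
        ip, ts, ua = rec
        last = sessions[-1][-1] if sessions else None
        if last and last[0] == ip and last[2] == ua and ts - last[1] <= gap:
            sessions[-1].append(rec)
        else:
            sessions.append([rec])
    return sessions


# Hypothetical records: one human visitor with a long pause, plus one crawler.
records = [
    ("10.0.0.1", datetime(2018, 1, 1, 9, 0), "Mozilla/5.0"),
    ("10.0.0.1", datetime(2018, 1, 1, 9, 10), "Mozilla/5.0"),
    ("10.0.0.1", datetime(2018, 1, 1, 11, 0), "Mozilla/5.0"),
    ("10.0.0.2", datetime(2018, 1, 1, 9, 5), "Googlebot/2.1"),
]
sessions = sessionize(records)
print([len(s) for s in sessions])
```
]]></preformat>
      <p>In this sketch the crawler request is dropped, and the human visitor is split into two sessions because the third request arrives after the assumed timeout.</p>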
      <p>Data coding: in this phase, the URL requests were analysed and
divided into meaningful parts including user IP, time stamp, request,
referrer, user agent, session IP. Researchers follow different
strategies to analyse the URLs embedded in the log files. For example, [1]
build a hierarchical taxonomy of the website, and [2] define a coding
schema of all types of interactions based on analysing and
understanding the structure of the website. In this research we
consider both strategies.</p>
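      <p>A minimal sketch of the coding step is given below. It assumes that the logs follow the Apache "combined" format, which the study does not confirm, and it uses a hypothetical coding schema: the URL paths and the interaction codes "query", "browse", "view" and "download" are invented for illustration and are not the schema used in this research.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from datetime import datetime
from urllib.parse import parse_qs, urlsplit

# Assumed layout: Apache "combined" log format. The actual RERO Doc log
# format is not specified in the study.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Split one log line into named fields, or return None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


def code_request(request):
    """Map a request line to a coarse interaction code (hypothetical schema)."""
    parts = request.split()
    if len(parts) < 2:
        return "other"
    url = urlsplit(parts[1])
    if url.path.startswith("/search"):
        # A non-empty query parameter "p" is treated as a submitted query.
        return "query" if parse_qs(url.query).get("p") else "browse"
    if url.path.startswith("/record") and "/files/" in url.path:
        return "download"
    if url.path.startswith("/record"):
        return "view"
    return "other"


# Hypothetical log line, not taken from the dataset.
line = (
    '10.0.0.1 - - [01/Jan/2018:09:00:00 +0100] '
    '"GET /search?ln=en&p=nursing HTTP/1.1" 200 5123 '
    '"http://example.org/" "Mozilla/5.0"'
)
parsed = parse_line(line)
print(parsed["ip"], parsed["user_agent"], code_request(parsed["request"]))
```
]]></preformat>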
      <p>Feature engineering: based on the interface and log file
analysis, the meaningful features are identified.</p>
      <p>Mining user behaviour: The remaining sessions are further
analysed and grouped based on the available variables in the records.
We started with the analysis of the first million records, which
comprise 125,000 sessions. Of these, 72,095 sessions contain only
one record. The aim of this phase is to identify different usage
patterns among user interactions and group them accordingly. Two
main grouping techniques were used: topic modelling
and K-Means. For a six-topic model, the coherence score of the topic
modelling is 0.35. For K-Means, the estimated number of clusters
is 6, with a Silhouette Coefficient of 0.95. So far, six different usage
patterns have been identified and interpreted qualitatively:
(1) Single sessions or known-item sessions, where searchers visit RERO
to download documents without any further interaction.
(2) Complicated sessions, where the usage pattern is
characterised by heavy interaction, including submitting queries,
browsing, and using different functions.
(3) Light navigators, who simply navigate the library
without using the different functions of the interface.
(4) Advanced navigators, whose navigation is characterised by
the use of different functions and many iterations.
(5) Light browsers, whose searching is simple and short, without
using different functions.
(6) Advanced browsers, whose interactions are long and include
advanced search functions.</p>
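      <p>To make the K-Means grouping and the Silhouette Coefficient concrete, the following is a small pure-Python sketch. The two session features (number of requests and duration in minutes), the toy data and the choice of two clusters are assumptions for illustration; the study does not report its feature set or implementation, and this is not the code that produced the reported values.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(sum(d) / len(d) for d in dists.values())  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical session features: (number of requests, duration in minutes).
points = [(1, 0.0), (1, 0.1), (2, 0.5), (20, 15.0), (22, 16.0), (21, 14.0)]
labels = kmeans(points, k=2)
score = silhouette(points, labels)
print(labels, round(score, 2))
```
]]></preformat>
      <p>A coefficient close to 1 indicates compact, well-separated groups. In practice the features would normally be scaled before clustering, and the number of clusters would be chosen by comparing the coefficient across several values of k.</p>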
      <p>In conclusion, log file analysis is an unobtrusive method to
detect usage patterns in a digital library. The aim of this research in
progress is to build interaction profiles in the digital library context
in order to gain more insight into users' searching experiences. We
plan more experiments to test and compare the effectiveness of the
techniques used to group data for preparing interaction profiles.
Then, we will involve experts to assess the quality of these
interaction profiles and how effective they are in assisting designers
and system developers in the production of more usable, useful and
effective tools to support searchers.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>