<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Closed Itemsets for Implicit User Authentication in Web Browsing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>O. Coupelon</string-name>
          <email>olivier.coupelon@almerys.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>D. Dia</string-name>
          <email>diye.dia@almerys.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>F. Labernia</string-name>
          <email>fabien.labernia@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Y. Loiseau</string-name>
          <email>loiseau@isima.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>O. Raynaud</string-name>
          <email>raynaud@isima.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Almerys</institution>
          ,
          <addr-line>46 rue du Ressort, 63967 Clermont-Ferrand</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Blaise Pascal University</institution>
          ,
          <addr-line>24 Avenue des Landais, 63170 Aubière</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Faced with both identity theft and the theft of means of authentication, users of digital services are starting to look rather suspiciously at online systems. Behavior is made up of a series of observable actions of an Internet user and, taken as a whole, the most frequent of these actions amount to habit. Habit and reputation offer ways of recognizing the user. The introduction of an implicit means of authentication based upon the user's behavior allows web sites and businesses to rationalize the risks they take when authorizing access to critical functionalities. In this paper, we propose a new model for implicit authentication of web users based on the extraction of closed patterns. On a data set of web navigation connection logs of 3,000 users over a six-month period, we follow the experimental protocol described in [1] to compute the performance of our model.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In order to achieve productivity gains, companies are encouraging their
customers to access their services via the Internet. It is accepted that online
services are more immediate and more user-friendly than accessing these services
via a brick-and-mortar agency, which involves going there and, more often than
not, waiting around [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Nevertheless, access to these services poses
security problems. Certain services provide access to sensitive data such as banking
data, for which it is essential to authenticate the users concerned.
However, identity thefts are becoming more and more numerous [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We can
distinguish two paradigms for increasing access security. The first consists of
making access protocols stronger by relying, for example, on external devices for
transmitting access codes that supplement the login/password pair.
However, these processes are detrimental to the user-friendliness and
usability of the services. The number of transactions abandoned before reaching the
end of the process is increasing and exchange volumes are decreasing. The
second paradigm consists, on the contrary, of simplifying the identification processes
in order to increase the exchange volumes. By way of example, we can mention
single-click payment [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] or the use of RFID chips for contactless payments. Implicit means of
authentication are found where these two paradigms meet.
      </p>
      <p>
        A means of authentication is a process that makes it possible to ensure that the
identity declared at the time of access is indeed the user’s identity. Traditionally,
a user authenticates himself or herself by providing proof of identity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This
process is called explicit authentication. In contrast, implicit authentication
does not require anything from the user but instead studies his or her behavior,
that is, the trail left by the user’s actions, and then either validates or rejects the
declared identity. An implicit means of authentication cannot replace traditional
means of authentication, since the user must already have access to the
service before his or her behavior can be studied and the declared identity
validated or rejected. On the other hand, if it is effective, it makes it possible
to avoid stronger authentication modes (such as chip cards and PIN
numbers), which are detrimental to the usability of services. The challenge is to
detect identity theft as quickly as possible and, conversely, to validate a
legitimate identity for as long as possible.
      </p>
      <p>
        This contribution is organized as follows: in Section 2 we present the
state of the art on implicit authentication and user profiles in web browsing. In
Section 3 we propose a learning model for implicit authentication of web users.
In Section 4, we compare several methods for building
the profile of each user. We faithfully reproduce the experimental study conducted
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and we analyze all of our results. Finally, in Section 5, we summarize our
results and discuss our future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related works</title>
      <p>
        In his survey of implicit authentication for mobile devices [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the author describes
an authentication system as implicit if the system makes no demands
of the user (see Table 1).
      </p>
      <p>
        Implicit authentication systems were studied early on for mobile phones.
In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the authors studied behaviour based on variables specific to
smartphones such as calls, SMS messages, browsing between applications, location, and the
time of day. Experiments were conducted on data from 50 users over
a period of 12 days. The data were gathered using an application installed by
volunteer users. The users’ profiles were built from the frequency of
positive or negative events and from location. Within this context, a
positive event is an event consistent with the information gathered upstream.
For example, calling a number which is in the phone’s directory is a
positive event. The results of this study show that, based on ten or so actions,
fraudulent use of a smartphone can be detected with an accuracy of 95%. In a
quite different context, the authors of [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] relied on a Bayesian classification in
order to associate a behaviour class with each video streaming user. The data
set is simulated and consists of 1,000 users over 100 days. The variables taken
into account are the quality of the flow, the type of program, the duration of the
session, the type of user, and the popularity of the video. The results are mixed,
because the proposed model achieves an accuracy rate of only 50%.
      </p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Comparison of different authentication methods</p>
        </caption>
        <table>
          <thead>
            <tr><th>Feature</th><th>Capturing method</th><th>Implicit/Explicit</th><th>Spoofing threats</th><th>Problems</th></tr>
          </thead>
          <tbody>
            <tr><td>Passcode</td><td>Keyboard input</td><td>Explicit</td><td>Keyloggers, shoulder surfing</td><td>Guessable passwords</td></tr>
            <tr><td>Token</td><td>Hardware device</td><td>Mainly explicit, implicit possible</td><td>None</td><td>Easily stolen or lost</td></tr>
            <tr><td>Face &amp; Iris</td><td>Camera</td><td>Both</td><td>Picture of the legitimate user</td><td>Lighting situation and make-up</td></tr>
            <tr><td>Keystroke</td><td>Keyboard</td><td>Implicit, explicit possible</td><td>Typing imitation (difficult)</td><td>Long training phase, reliability</td></tr>
            <tr><td>Location</td><td>GPS, infrastructure</td><td>Implicit</td><td>Informed strangers</td><td>Traveling, precision</td></tr>
            <tr><td>Network</td><td>Software protocol (e.g. WireShark)</td><td>Implicit</td><td>Informed strangers</td><td>Precision</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The particular context of implicit authentication for web browsing was studied
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the author adopted the domain name, the
number of pages viewed, the session start time, and its duration as characteristic
variables. The data set, which was gathered by a service provider, consisted of
the first 300 connections of 2,798 users over a period of 12 months. The user profiles
consisted of patterns of size 1. The author compares several pattern
selection approaches, such as the support and the lift approaches. The study shows that
for small anonymous behaviours (involving up to twenty or so sites
visited), the most effective models are still traditional classification models such as
decision trees. On the other hand, whenever the anonymous behaviour exceeds 70 or
so sites, the support-based and lift-based classification models are more accurate. The
study conducted in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] states that the size of the data set remains a determining
parameter. That study, conducted on 10 users over a one-month period, did not
enable its authors to build a significant model for distinguishing users. The authors
also concluded that no variable taken individually enables a user to be
authenticated. Drawing inspiration from the study conducted in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], the authors of [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
studied several techniques for tracking a user who holds a dynamic IP address,
based on behavioural models. The methods compared are pattern mining, the
nearest neighbours technique, and the multinomial Bayesian classifier. The data
set consisted of DNS requests from 3,600 users over a two-month period. In this
study, only the most significant variables and the most popular host names were
considered. The accuracy rates for the models proposed were satisfactory.
      </p>
      <p>
        The study that we conduct in this paper is also a continuation of the
work by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. We faithfully reproduce his experimental protocol on our data and
we compare the performance of our classification algorithm to his specific models.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Models</title>
      <p>We propose an intuitive learning model architecture for user authentication.
From a data set of web browsing logs we compute a set of own patterns for
each user. A pattern is a set of frequently visited sites, and the size of a pattern
may vary. Using these profiles we are able to authenticate
anonymous sessions. We then compute confusion matrices and report the
precision of the models. In the present study, we compare the performance of a naive
Bayes classifier with variations on the k-nearest neighbors algorithm. More precisely,
the parameters studied are the selection process for each user’s own patterns, the computation
of user profiles, and the distance functions used in the classification stage.
Figure 1 outlines the framework of the machine learning process.</p>
      <p>[Figure 1: framework of the machine learning process. Diagram labels: Past Behaviour, Anonymous Behaviour, Learning Algorithms, User Profile, Score Computation, Authentication.]</p>
      <sec id="sec-3-3">
        <title>Learning Algorithms</title>
        <sec id="sec-3-3-1">
          <title>User ?</title>
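          <p>We call a session a set of web sites visited at a specific time by a given user <inline-formula><tex-math>u_i</tex-math></inline-formula>, where <inline-formula><tex-math>i \in [1, n]</tex-math></inline-formula> and <inline-formula><tex-math>n</tex-math></inline-formula> is the number of users. The size of a session is limited and equal to 10. The learning database of each user <inline-formula><tex-math>u_i</tex-math></inline-formula> takes the form of a set of sessions denoted <inline-formula><tex-math>\mathcal{S}_{u_i}</tex-math></inline-formula> and is built from log data<sup>3</sup>. We call <inline-formula><tex-math>\mathcal{S} = \bigcup_i \mathcal{S}_{u_i}</tex-math></inline-formula> the whole set of sessions of the database.</p>
          <p>We call <inline-formula><tex-math>W_{u_i}</tex-math></inline-formula> the whole set of web sites visited at least once by user <inline-formula><tex-math>u_i</tex-math></inline-formula> and we call <inline-formula><tex-math>W = \bigcup_i W_{u_i}</tex-math></inline-formula> the whole set of visited sites. The order in which web sites are visited is not taken into account by this model.</p>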
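          <p>Definition 1 (<italic>k</italic>-pattern). Let <inline-formula><tex-math>W</tex-math></inline-formula> be a set of visited web sites and <inline-formula><tex-math>\mathcal{S}</tex-math></inline-formula> be a set of sessions on <inline-formula><tex-math>W</tex-math></inline-formula>. A subset <inline-formula><tex-math>P</tex-math></inline-formula> of <inline-formula><tex-math>W</tex-math></inline-formula> is called a <italic>k</italic>-pattern, where <inline-formula><tex-math>k</tex-math></inline-formula> is the size of <inline-formula><tex-math>P</tex-math></inline-formula>. A session <inline-formula><tex-math>S</tex-math></inline-formula> in <inline-formula><tex-math>\mathcal{S}</tex-math></inline-formula> is said to contain a <italic>k</italic>-pattern <inline-formula><tex-math>P</tex-math></inline-formula> if <inline-formula><tex-math>P \subseteq S</tex-math></inline-formula>.</p>
          <p>Definition 2 (Support and relative support (lift)). We define the support of a pattern <inline-formula><tex-math>P</tex-math></inline-formula> as the percentage of sessions in <inline-formula><tex-math>\mathcal{S}</tex-math></inline-formula> containing <inline-formula><tex-math>P</tex-math></inline-formula> (by extension we give the support of a pattern in the set of sessions of a given user <inline-formula><tex-math>u_i</tex-math></inline-formula>):
            <disp-formula><tex-math>\mathrm{support}_{\mathcal{S}}(P) = \frac{\|\{S \in \mathcal{S} \mid P \subseteq S\}\|}{\|\mathcal{S}\|}</tex-math></disp-formula>
            <disp-formula><tex-math>\mathrm{support}_{\mathcal{S}_{u_i}}(P) = \frac{\|\{S \in \mathcal{S}_{u_i} \mid P \subseteq S\}\|}{\|\mathcal{S}_{u_i}\|}</tex-math></disp-formula>
          </p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3 Cf. section 4.1</title>
          <p>For a given user, the relative strength of a pattern is equivalent to the lift in the context of association rules (i.e. the support of the pattern within this user’s sessions divided by the support of the pattern across all users). More formally:
            <disp-formula><tex-math>\mathrm{lift}_{\mathcal{S}_{u_i},\mathcal{S}}(P) = \frac{\mathrm{support}_{\mathcal{S}_{u_i}}(P)}{\mathrm{support}_{\mathcal{S}}(P)}</tex-math></disp-formula>
          </p>
          <p>The support measures the strength of a pattern in the behavioral description of a given user. The relative support tempers the support measure by taking into account the pattern’s support over the whole set of sessions. The higher the global support of a pattern, the less characteristic it is of a specific user.</p>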
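          <p>As a hedged illustration of Definition 2, the following minimal Python sketch computes the support and the lift of a pattern. Sessions are modelled as frozensets of domain names; the function names, the toy site names, and the choice of returning 0 when the global support is 0 are assumptions made for this example and are not part of the original model.</p>
          <preformat>
```python
from typing import FrozenSet, List

Session = FrozenSet[str]


def support(pattern: FrozenSet[str], sessions: List[Session]) -> float:
    """Fraction of the sessions that contain every site of the pattern."""
    if not sessions:
        return 0.0
    hits = sum(1 for s in sessions if pattern.issubset(s))
    return hits / len(sessions)


def lift(pattern: FrozenSet[str],
         user_sessions: List[Session],
         all_sessions: List[Session]) -> float:
    """Support within one user's sessions divided by the global support."""
    global_support = support(pattern, all_sessions)
    if global_support == 0.0:
        return 0.0  # assumption: a pattern never observed gets a lift of 0
    return support(pattern, user_sessions) / global_support


# Toy data with hypothetical site names.
alice = [frozenset({"a.com", "b.com"}), frozenset({"a.com", "c.com"})]
bob = [frozenset({"c.com", "d.com"}), frozenset({"a.com", "d.com"})]
everyone = alice + bob
p = frozenset({"a.com"})

print(support(p, alice))         # 1.0
print(support(p, everyone))      # 0.75
print(lift(p, alice, everyone))  # greater than 1
```
          </preformat>
          <p>On this toy data the pattern {a.com} occurs in every session of the first user but in only three of the four sessions overall, so its lift for that user is greater than 1.</p>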
          <p>
            The tf-idf is a numerical statistic intended to reflect how relevant a word
is to a document in a corpus. The tf-idf value increases proportionally to the
number of times a word appears in the document, but is offset by the frequency
of the word in the whole corpus [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ]. In our context, a word becomes a pattern,
a document becomes the set of sessions <inline-formula><tex-math>\mathcal{S}_{u_i}</tex-math></inline-formula> of a given user, and the corpus becomes
the whole set <inline-formula><tex-math>\mathcal{S}</tex-math></inline-formula> of all sessions.
          </p>
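          <p>Definition 3 (tf×idf). Let <inline-formula><tex-math>P</tex-math></inline-formula> be a pattern, let <inline-formula><tex-math>U</tex-math></inline-formula> be a set of users and <inline-formula><tex-math>U_P \subseteq U</tex-math></inline-formula> such that <inline-formula><tex-math>\forall u_i \in U_P,\ \mathrm{support}_{\mathcal{S}_{u_i}}(P) \neq 0</tex-math></inline-formula>. Let <inline-formula><tex-math>\mathcal{S}_{u_i}</tex-math></inline-formula> be the set of sessions of a given user <inline-formula><tex-math>u_i</tex-math></inline-formula> and <inline-formula><tex-math>\mathcal{S}</tex-math></inline-formula> the whole set of sessions. The normalized term frequency, denoted <inline-formula><tex-math>\mathrm{tf}(P)</tex-math></inline-formula>, is equal to <inline-formula><tex-math>\mathrm{support}_{\mathcal{S}_{u_i}}(P)</tex-math></inline-formula> and the inverse document frequency, denoted <inline-formula><tex-math>\mathrm{idf}(P)</tex-math></inline-formula>, is equal to <inline-formula><tex-math>\log(\|U\| / \|U_P\|)</tex-math></inline-formula>. We have:
            <disp-formula><tex-math>\mathrm{tf}\times\mathrm{idf}(P) = \mathrm{support}_{\mathcal{S}_{u_i}}(P) \times \log\frac{\|U\|}{\|U_P\|}</tex-math></disp-formula>
          </p>
          <p>Definition 4 (Closure system). Let <inline-formula><tex-math>\mathcal{S}</tex-math></inline-formula> be a collection of sessions on the set <inline-formula><tex-math>W</tex-math></inline-formula> of web sites. We denote by <inline-formula><tex-math>\mathcal{S}^c</tex-math></inline-formula> the closure under intersection of <inline-formula><tex-math>\mathcal{S}</tex-math></inline-formula>. By adding <inline-formula><tex-math>W</tex-math></inline-formula> to <inline-formula><tex-math>\mathcal{S}^c</tex-math></inline-formula>, <inline-formula><tex-math>\mathcal{S}^c</tex-math></inline-formula> is called a closure system.</p>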
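          <p>Definition 5 (Closure operator). Let <inline-formula><tex-math>W</tex-math></inline-formula> be a set. A map <inline-formula><tex-math>C : 2^W \rightarrow 2^W</tex-math></inline-formula> is a closure operator on <inline-formula><tex-math>W</tex-math></inline-formula> if for all subsets <inline-formula><tex-math>A</tex-math></inline-formula> and <inline-formula><tex-math>B</tex-math></inline-formula> of <inline-formula><tex-math>W</tex-math></inline-formula> we have: <inline-formula><tex-math>A \subseteq C(A)</tex-math></inline-formula>, <inline-formula><tex-math>A \subseteq B \Longrightarrow C(A) \subseteq C(B)</tex-math></inline-formula> and <inline-formula><tex-math>C(C(A)) = C(A)</tex-math></inline-formula>.</p>
          <p>Theorem 1. Let <inline-formula><tex-math>\mathcal{S}^c</tex-math></inline-formula> be a closure system on <inline-formula><tex-math>W</tex-math></inline-formula>. Then the map <inline-formula><tex-math>C_{\mathcal{S}^c}</tex-math></inline-formula> defined on <inline-formula><tex-math>2^W</tex-math></inline-formula> by <inline-formula><tex-math>\forall A \in 2^W,\ C_{\mathcal{S}^c}(A) = \bigcap \{S \in \mathcal{S}^c \mid A \subseteq S\}</tex-math></inline-formula> is a closure operator on <inline-formula><tex-math>W</tex-math></inline-formula><sup>4</sup>.</p>
          <p>Definition 6 (Closed pattern<sup>5</sup>). Let <inline-formula><tex-math>\mathcal{S}^c</tex-math></inline-formula> be a closure system on <inline-formula><tex-math>W</tex-math></inline-formula> and <inline-formula><tex-math>C_{\mathcal{S}^c}</tex-math></inline-formula> its corresponding closure operator. Let <inline-formula><tex-math>P</tex-math></inline-formula> be a pattern (i.e. a set of visited sites). We say that <inline-formula><tex-math>P</tex-math></inline-formula> is a closed pattern if <inline-formula><tex-math>C_{\mathcal{S}^c}(P) = P</tex-math></inline-formula>.</p>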
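          <p>The closure operator of Theorem 1 can be sketched in a few lines of Python. This is only an illustrative sketch: it intersects the original sessions that contain the pattern, which gives the same result as intersecting the members of the closure system, and it starts from the universe W so that a pattern contained in no session is mapped to W. The function names and the single-letter site names are hypothetical.</p>
          <preformat>
```python
from typing import FrozenSet, List

Pattern = FrozenSet[str]


def closure(pattern: Pattern, sessions: List[Pattern], universe: Pattern) -> Pattern:
    """Intersect every session that contains the pattern.

    Starting from the universe W means that a pattern contained in no
    session is mapped to W, the element added to the closure system.
    """
    closed = universe
    for session in sessions:
        if pattern.issubset(session):
            closed = closed.intersection(session)
    return closed


def is_closed(pattern: Pattern, sessions: List[Pattern], universe: Pattern) -> bool:
    """A pattern is closed when it is equal to its own closure."""
    return closure(pattern, sessions, universe) == pattern


# Toy sessions over the hypothetical sites a, b, c, d, e.
sessions = [frozenset("abc"), frozenset("abd"), frozenset("ae")]
universe = frozenset("abcde")

print(sorted(closure(frozenset("b"), sessions, universe)))  # ['a', 'b']
print(is_closed(frozenset("ab"), sessions, universe))       # True
print(is_closed(frozenset("b"), sessions, universe))        # False
```
          </preformat>
          <p>Here {b} is not closed because every session containing b also contains a, whereas {a, b} is closed.</p>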
        </sec>
        <sec id="sec-3-3-3">
          <title>4 Refer to the book of [15].</title>
          <p>
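            5 This definition is equivalent to a concept of the formal context <inline-formula><tex-math>K = (\mathcal{S}, W, I)</tex-math></inline-formula>, where
<inline-formula><tex-math>\mathcal{S}</tex-math></inline-formula> is a set of objects, <inline-formula><tex-math>W</tex-math></inline-formula> a set of attributes and <inline-formula><tex-math>I</tex-math></inline-formula> a binary relation between <inline-formula><tex-math>\mathcal{S}</tex-math></inline-formula> and
<inline-formula><tex-math>W</tex-math></inline-formula> [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ].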
          </p>
        </sec>
        <sec id="sec-3-3-4">
          <title>Own patterns selection</title>
          <p>
            The first and most important step of our model, called own patterns selection,
is to calculate the set of own patterns for each user <inline-formula><tex-math>u_i</tex-math></inline-formula>. This set of patterns is
denoted <inline-formula><tex-math>P_{u_i} = \{P_{i,1}, P_{i,2}, \ldots, P_{i,p}\}</tex-math></inline-formula>. In [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], the author states that <inline-formula><tex-math>p = 10</tex-math></inline-formula> should be
a reference value and that beyond this value model performance is stable. We
shall follow that recommendation. In [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ], 10 frequent 1-patterns are selected
for each user. The aim of our study is to show that it can be more efficient
to select closed <italic>k</italic>-patterns. However, the number of closed patterns can be
large, so we compare three heuristics (H1, H2 and H3) to select the closed
patterns of each user. For each heuristic, closed patterns are computed with
the Charm algorithm [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ] provided on the Coron platform [
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. Only closed
patterns with a size lower than or equal to 7 are considered. The selection methods compared are
the following:
            <list list-type="order">
              <list-item><p>10 1-patterns with the largest support values (as in [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]).</p></list-item>
              <list-item><p>H1: 40 closed <italic>k</italic>-patterns with the largest tf-idf values.</p></list-item>
              <list-item><p>H2: 10 filtered closed <italic>k</italic>-patterns with the largest support values that are maximal with respect to set inclusion.</p></list-item>
              <list-item><p>H3: 10 filtered closed <italic>k</italic>-patterns with the largest tf-idf values that are minimal with respect to set inclusion.</p></list-item>
            </list>
          </p>
          <p>Algorithm 1 describes how H1 selects the 40 own patterns of a given user.
With H1, model performance improves as <inline-formula><tex-math>p</tex-math></inline-formula> increases up to 40, whereas
<inline-formula><tex-math>p = 10</tex-math></inline-formula> is the best choice for H2 and H3. The best results are obtained with H1.</p>
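          <p><bold>Algorithm 1.</bold> H1: 40 closed <italic>k</italic>-patterns with the largest tf-idf values.</p>
          <p>Data: <inline-formula><tex-math>C_{u_i}</tex-math></inline-formula>: the set of closed itemsets of user <inline-formula><tex-math>u_i</tex-math></inline-formula> from Charm; <inline-formula><tex-math>p</tex-math></inline-formula>: the number of selected own patterns.</p>
          <p>Result: <inline-formula><tex-math>P_{u_i}</tex-math></inline-formula>: the set of own patterns of user <inline-formula><tex-math>u_i</tex-math></inline-formula>.</p>
          <list list-type="order">
            <list-item><p>Compute the tf×idf value of each pattern from Charm;</p></list-item>
            <list-item><p>Sort the list of patterns in descending order of tf×idf value;</p></list-item>
            <list-item><p>Return the top <inline-formula><tex-math>p</tex-math></inline-formula> patterns.</p></list-item>
          </list>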
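          <p>The three steps of Algorithm 1 can be illustrated with the following Python sketch. It assumes that the closed patterns have already been mined (the paper uses Charm; here a hand-written candidate list stands in for its output), that the logarithm is the natural logarithm (the base is not specified in the text), and that ties keep their input order. All function and variable names are hypothetical.</p>
          <preformat>
```python
import math
from typing import Dict, FrozenSet, List

Pattern = FrozenSet[str]


def support(pattern: Pattern, sessions: List[Pattern]) -> float:
    """Fraction of the sessions that contain the pattern."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if pattern.issubset(s)) / len(sessions)


def tf_idf(pattern: Pattern, user: str,
           sessions_by_user: Dict[str, List[Pattern]]) -> float:
    """tf is the user's support; idf is log(number of users / users showing the pattern)."""
    tf = support(pattern, sessions_by_user[user])
    users_with_pattern = sum(
        1 for sessions in sessions_by_user.values() if support(pattern, sessions) > 0
    )
    if users_with_pattern == 0:
        return 0.0
    # Assumption: natural logarithm.
    return tf * math.log(len(sessions_by_user) / users_with_pattern)


def select_own_patterns(closed_patterns: List[Pattern], user: str,
                        sessions_by_user: Dict[str, List[Pattern]],
                        p: int = 40) -> List[Pattern]:
    """Score, sort in descending order, and keep the top p patterns."""
    ranked = sorted(
        closed_patterns,
        key=lambda pat: tf_idf(pat, user, sessions_by_user),
        reverse=True,
    )
    return ranked[:p]


# Toy data: three users, hypothetical sites a..e.
sessions_by_user = {
    "u1": [frozenset("ab"), frozenset("ac")],
    "u2": [frozenset("cd"), frozenset("ad")],
    "u3": [frozenset("de")],
}
candidates = [frozenset("a"), frozenset("ab"), frozenset("ac")]
top = select_own_patterns(candidates, "u1", sessions_by_user, p=2)
print([sorted(pat) for pat in top])
```
          </preformat>
          <p>In this toy example the 1-pattern {a} is frequent for u1 but also appears for u2, so the two 2-patterns, which only u1 exhibits, are ranked above it.</p>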
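          <p><bold>3.3 User profiles computation.</bold>
We denote by <inline-formula><tex-math>P_{all} = \bigcup_i P_{u_i}</tex-math></inline-formula> the whole set of own patterns. The set
<inline-formula><tex-math>P_{all}</tex-math></inline-formula> allows us to define a common space in which all users can be embedded.
More formally, <inline-formula><tex-math>P_{all}</tex-math></inline-formula> defines a vector space <inline-formula><tex-math>V</tex-math></inline-formula> of size <inline-formula><tex-math>all = \|P_{all}\|</tex-math></inline-formula> in which a given
user <inline-formula><tex-math>u_i</tex-math></inline-formula> is represented as a vector <inline-formula><tex-math>V_{u_i} = (m_{i,1}, m_{i,2}, \ldots, m_{i,all})</tex-math></inline-formula>.</p>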
          <p>
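            The second step of our model, called user profile computation, is to compute, for
each user <inline-formula><tex-math>u_i</tex-math></inline-formula>, a numerical value for each component <inline-formula><tex-math>m_{i,j}</tex-math></inline-formula> of the vector <inline-formula><tex-math>V_{u_i}</tex-math></inline-formula>, where <inline-formula><tex-math>i</tex-math></inline-formula> is
the user id, <inline-formula><tex-math>j \in [1, all]</tex-math></inline-formula> is a pattern id and <inline-formula><tex-math>m</tex-math></inline-formula> stands for a given measure. In this
paper, we compare two measures proposed in [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]: the support and the lift.
            <disp-formula><tex-math>m_{i,j} = \mathrm{support}_{\mathcal{S}_{u_i}}(P_j)</tex-math></disp-formula>
and
            <disp-formula><tex-math>m_{i,j} = \mathrm{lift}_{\mathcal{S}_{u_i},\mathcal{S}}(P_j)</tex-math></disp-formula>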
          </p>
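          <p>To make the profile construction concrete, here is a minimal Python sketch of the support-based variant. It builds the common pattern space as the union of all users' own patterns and then fills each user's vector with the support of every pattern in that space. The helper names, the first-seen ordering of the pattern space and the toy data are assumptions of this sketch.</p>
          <preformat>
```python
from typing import Dict, FrozenSet, List

Pattern = FrozenSet[str]


def support(pattern: Pattern, sessions: List[Pattern]) -> float:
    """Fraction of the sessions that contain the pattern."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if pattern.issubset(s)) / len(sessions)


def build_pattern_space(own_patterns: Dict[str, List[Pattern]]) -> List[Pattern]:
    """Union of all users' own patterns, kept in first-seen order."""
    space: List[Pattern] = []
    seen = set()
    for patterns in own_patterns.values():
        for pat in patterns:
            if pat not in seen:
                seen.add(pat)
                space.append(pat)
    return space


def profile(sessions: List[Pattern], space: List[Pattern]) -> List[float]:
    """One component per pattern of the common space (support-based)."""
    return [support(pat, sessions) for pat in space]


# Toy data with hypothetical sites a..d.
own_patterns = {
    "u1": [frozenset("a"), frozenset("ab")],
    "u2": [frozenset("d"), frozenset("a")],
}
sessions_by_user = {
    "u1": [frozenset("ab"), frozenset("ac")],
    "u2": [frozenset("cd"), frozenset("ad")],
}
space = build_pattern_space(own_patterns)
profiles = {u: profile(s, space) for u, s in sessions_by_user.items()}
print(len(space))      # 3
print(profiles["u1"])  # [1.0, 0.5, 0.0]
print(profiles["u2"])  # [0.5, 0.0, 1.0]
```
          </preformat>
          <p>A lift-based profile would be obtained in the same way by dividing each component by the pattern's support over the whole set of sessions.</p>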
          <p><bold>3.4 Authentication stage.</bold>
In our model, the authentication step is based on identification. To that
end, our model guesses the user corresponding to an anonymous set of
sessions, then checks whether the guessed identity matches the real identity. From
this set of sessions we have to build a test profile and find the nearest user
profile defined during the learning step.</p>
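          <p><bold>Test sessions.</bold> The performance of our models is calculated on anonymous data
sets of growing size. The more information available, the better the classification
will be. The first data set consists of only one session, the second consists of 10
sessions, the third consists of 20 sessions, and the last consists of 30
sessions. For the test phase, all sessions have the same size of 10 sites.</p>
          <p><bold>Building the test profile.</bold> Let <inline-formula><tex-math>\mathcal{S}</tex-math></inline-formula> be the whole set of sessions from the learning
data set. Let <inline-formula><tex-math>\mathcal{S}_{u_t}</tex-math></inline-formula> be an anonymous set of sessions and <inline-formula><tex-math>V_{u_t} = (m_{t,1}, m_{t,2}, \ldots, m_{t,all})</tex-math></inline-formula>
its corresponding profile vector. We compare two approaches to building the
anonymous test profile, the support and the lift:
            <disp-formula><tex-math>\forall i,\ m_{t,i} = \mathrm{support}_{\mathcal{S}_{u_t}}(P_i)</tex-math></disp-formula>
and
            <disp-formula><tex-math>\forall i,\ m_{t,i} = \mathrm{lift}_{\mathcal{S}_{u_t},\mathcal{S}}(P_i) = \frac{\mathrm{support}_{\mathcal{S}_{u_t}}(P_i)}{\mathrm{support}_{\mathcal{S}}(P_i)}</tex-math></disp-formula>
          </p>
          <p><bold>Distance functions.</bold> Let <inline-formula><tex-math>V_{u_i} = (m_{i,1}, m_{i,2}, \ldots, m_{i,all})</tex-math></inline-formula> and <inline-formula><tex-math>V_{u_t} = (m_{t,1}, m_{t,2}, \ldots, m_{t,all})</tex-math></inline-formula>
be two profiles. We denote by <inline-formula><tex-math>Dis_{Euclidean}(V_{u_i}, V_{u_t})</tex-math></inline-formula> the Euclidean distance and
by <inline-formula><tex-math>Sim_{Cosine}(V_{u_i}, V_{u_t})</tex-math></inline-formula> the cosine similarity function. We have:
            <disp-formula><tex-math>Dis_{Euclidean}(V_{u_i}, V_{u_t}) = \sqrt{\sum_j (m_{t,j} - m_{i,j})^2}</tex-math></disp-formula>
            <disp-formula><tex-math>Sim_{Cosine}(V_{u_i}, V_{u_t}) = \frac{\sum_j (m_{t,j} \times m_{i,j})}{\sqrt{\sum_j (m_{t,j})^2} \times \sqrt{\sum_j (m_{i,j})^2}}</tex-math></disp-formula>
          </p>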
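          <p>The two distance functions and the nearest-profile decision can be sketched as follows. This is an illustrative Python sketch under stated assumptions: profiles are plain lists of floats of equal length, a zero-norm vector is given a cosine similarity of 0, and the helper name <monospace>identify</monospace> and the toy profiles are hypothetical.</p>
          <preformat>
```python
import math
from typing import Dict, List


def euclidean_distance(v: List[float], w: List[float]) -> float:
    """Square root of the sum of squared component differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))


def cosine_similarity(v: List[float], w: List[float]) -> float:
    """Dot product divided by the product of the two Euclidean norms."""
    dot = sum(a * b for a, b in zip(v, w))
    norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    # Assumption: an all-zero profile is treated as dissimilar to everything.
    return dot / norm if norm > 0 else 0.0


def identify(test_profile: List[float],
             user_profiles: Dict[str, List[float]],
             use_cosine: bool = True) -> str:
    """Return the user whose profile is nearest to the test profile."""
    if use_cosine:
        return max(user_profiles,
                   key=lambda u: cosine_similarity(user_profiles[u], test_profile))
    return min(user_profiles,
               key=lambda u: euclidean_distance(user_profiles[u], test_profile))


# Toy profiles over a three-pattern space.
user_profiles = {"u1": [1.0, 0.5, 0.0], "u2": [0.5, 0.0, 1.0]}
anonymous = [0.8, 0.4, 0.1]
print(identify(anonymous, user_profiles, use_cosine=True))   # u1
print(identify(anonymous, user_profiles, use_cosine=False))  # u1
```
          </preformat>
          <p>Authentication then amounts to checking whether the identified user matches the declared identity.</p>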
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental results</title>
      <sec id="sec-4-1">
        <title>Data set</title>
        <p>
          Our data set comprises the web navigation connection logs of 3,000 users
over a six-month period. We have at our disposal the domain name visited and
each user ID. From the day and time of connection we have
constructed connection sessions for each user. A session is therefore a set of web
sites visited. The number of visited web sites per session is limited and equal
to 10. For the relevance of our study we used Adblock<sup>6</sup> filters to remove all
domains regarded as advertising. The majority of users in this data set are not
sufficiently active to be relevant. Therefore, as in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], we have limited our
study to the 2% most active users and obtained significant session sets for
52 users. The 30 most active users (those with the largest number of sessions) among
these 52 users are used in this paper. Table 2 gives the detailed statistics for this
data set.
        </p>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Statistics of the data set (7698 sessions)</p>
          </caption>
          <table>
            <thead>
              <tr><th/><th>Minimum</th><th>Maximum</th><th>Mean</th><th>Standard deviation</th></tr>
            </thead>
            <tbody>
              <tr><td>Size</td><td>10</td><td>10</td><td>10</td><td>0</td></tr>
              <tr><td>#sessions/users</td><td>101</td><td>733</td><td>257</td><td>289</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Algorithm 2 (see appendix) describes our experimental protocol. The first loop
sets the size of the set of users among which a group of anonymous sessions will
be classified. The second one sets the size of this group of sessions. Finally, the third
loop sets the number of iterations used to compute the average accuracy rate.
The loop on line 10 computes the specific patterns of each user and establishes
the profile vector. The loop on line 13 computes the vector’s components for
each user. The nested loops on lines 16 and 18 classify the test data and compute
the accuracy rate.</p>
        <p>From the own patterns of each user we compute the set <inline-formula><tex-math>P_{all}</tex-math></inline-formula> as the whole set of
own patterns, which defines the profile vector of each user. We use the support
of a pattern as the numerical value of each component (cf. Section 3.3).
Table 3 provides the size of the profile vector and the distribution of own patterns
according to size for each heuristic. With 30 users and 10 own patterns per user,
the maximal size of the profile is 300.</p>
        <p>
          <bold>Comparative performance with [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].</bold>
In [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], the author compares, in particular, two methods for computing the profile vector.
In both cases, the own patterns are of size 1 and are chosen among the most
frequent. The first method, named support-based profiling, uses the
support of the corresponding pattern as the numerical value of each component of the profile vector.
The second method, called lift-based profiling, uses the lift measure. In order to
compare the performance of the H1 model with the support-based and lift-based
profiling models, we have replicated the experimental
protocol described in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] on our own data set. The results are given in Table 4.
        </p>
        <p>
          The data in Table 4 show that the H1 heuristic yields accuracy rates that are
noticeably better than those of the two models proposed in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] in all
scenarios. Nevertheless, the Bayes classifier remains the most efficient when the
session group is of size 1, in agreement with [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Figure 3 gives a clearer
view of the point at which the Bayes curve crosses the H1 heuristic curve.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>6 http://adblock-listefr.com/</title>
        <sec id="sec-4-2-1">
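          <title>Comparative performance of distance functions</title>
          <table-wrap id="tab4">
            <label>Table 4</label>
            <caption>
              <p>Accuracy rates (%) of the support-based, lift-based, Charm H1 and Bayes models, by number of users and number of anonymous sessions</p>
            </caption>
            <table>
              <thead>
                <tr><th>Number of users</th><th>Number of sessions</th><th>Support</th><th>Lift</th><th>Charm H1</th><th>Bayes</th></tr>
              </thead>
              <tbody>
                <tr><td>2</td><td>1</td><td>65</td><td>67</td><td>72</td><td>85</td></tr>
                <tr><td>2</td><td>10</td><td>89</td><td>90</td><td>98</td><td>99</td></tr>
                <tr><td>2</td><td>20</td><td>95</td><td>97</td><td>99</td><td>73</td></tr>
                <tr><td>5</td><td>1</td><td>40</td><td>41</td><td>49</td><td>67</td></tr>
                <tr><td>5</td><td>10</td><td>74</td><td>78</td><td>90</td><td>96</td></tr>
                <tr><td>5</td><td>20</td><td>83</td><td>86</td><td>95</td><td>56</td></tr>
                <tr><td>10</td><td>1</td><td>27</td><td>29</td><td>37</td><td>54</td></tr>
                <tr><td>10</td><td>10</td><td>66</td><td>64</td><td>83</td><td>91</td></tr>
                <tr><td>10</td><td>20</td><td>79</td><td>77</td><td>92</td><td>51</td></tr>
                <tr><td>20</td><td>1</td><td>19</td><td>21</td><td>30</td><td>43</td></tr>
                <tr><td>20</td><td>10</td><td>55</td><td>58</td><td>76</td><td>87</td></tr>
                <tr><td>20</td><td>20</td><td>68</td><td>68</td><td>86</td><td>48</td></tr>
                <tr><td>30</td><td>1</td><td>16</td><td>17</td><td>26</td><td>39</td></tr>
                <tr><td>30</td><td>10</td><td>53</td><td>54</td><td>72</td><td>83</td></tr>
                <tr><td>30</td><td>20</td><td>64</td><td>64</td><td>83</td><td>46</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>[Figure 3: accuracy curves of the Bayes classifier and of the H1 heuristic. Vertical axis: Accuracy.]</p>
          <p>Figure 4 shows the impact of the choice of distance function on the performance
of the models.</p>
          <p>Figure 4 illustrates how much the distance function matters for
performance. Indeed, when used with the Euclidean distance, the H1 method is a bit
more precise than the lift one (about 3%). However, performance improves
when the cosine similarity is used, and the relative ranking is even reversed. The H1
method’s performance is then better than that of lift by 10%.</p>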
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and future work</title>
      <p>
        In this study, we proposed a learning model for implicit authentication of web
users. We proposed a simple and original algorithm (cf. Algorithm 1) to obtain a set
of own patterns that characterizes each web user. The selected patterns have
different sizes and are closed patterns of the closure system generated by
the set of sessions (cf. Table 3). By reproducing the experimental protocol described
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], we showed that the performance of our model is significantly better
than that of some models proposed in the literature (cf. Table 4). We also showed the
key role of the distance function (cf. Figure 4).
      </p>
      <p>This study should be extended in order to improve the results obtained. In
particular, for a very small flow of sites, the solution should be made to
outperform the Bayes method. Another way to improve the results would be to select other types
of variables and to add them to our current data set. The selection of data has an
undeniable impact on the results.</p>
      <p><bold>Appendix</bold></p>
      <sec id="sec-5-1">
        <title>Algorithm 2: Experiment procedure</title>
        <p>Data: <inline-formula><tex-math>\bigcup_i \mathcal{S}_{u_i}</tex-math></inline-formula>: all sessions from <inline-formula><tex-math>n</tex-math></inline-formula> users; <inline-formula><tex-math>X</tex-math></inline-formula>: number of successive executions.</p>
        <p>Result: the mean accuracy of the selected models.</p>
        <preformat>
 1  begin
 2    for N in {2, 5, 10, 20, 30} do
 3      for S in {1, 10, 20, 30} do
 4        for z = 1, ..., X do
 5          Select N random users;
 6          For each user, select S_N = min(|S_ui|, i = 1, ..., N);
 7          Take 2/3 of the S_N sessions from each user to form the training set;
 8          Take the rest of the S_N sessions to form the validation set;
 9          P_all^k ← ∅ (the global profile vector for each model k);
10          for each (u_i, i = 1, ..., N) do
11            Compute the own patterns P_ui^k (1 ≤ |P_ui^k| ≤ 10);
12            P_all^k ← P_all^k ∪ P_ui^k;
13          for each (u_i, i = 1, ..., N) do
14            Compute the vector V_ui^k with support or lift;
15          Initialize to 0 the confusion matrix M^k of the method k;
16          for each (u_i, i = 1, ..., N) do
17            Compute the test stream T_ui (|T| is fixed; T ∈ T_ui);
18            while (T_ui ≠ ∅) do
19              Take S_W sessions from T_ui to compute V_T^k;
20              u_a ← max(simil(V_ui^k, V_T^k)) or min(dist(V_ui^k, V_T^k));
21              M^k[u_i][u_a] ← M^k[u_i][u_a] + 1;
22          Compute the mean accuracy of k from M^k;
        </preformat>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.C.</given-names>
          </string-name>
          :
          <article-title>Web user behavioral profiling for user identification</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>49</volume>
          (
          <year>2010</year>
          ) pp.
          <fpage>261</fpage>–<lpage>271</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Guvence-Rodoper</surname>
            ,
            <given-names>C.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benbasat</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cenfetelli</surname>
          </string-name>
          , R.T.:
          <article-title>Adoption of B2B Exchanges: Effects of IT-Mediated Website Services, Website Functionality, Benefits, and Costs</article-title>
          .
          <source>ICIS 2008 Proceedings</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lagier</surname>
          </string-name>
          , F.:
          <article-title>Cybercriminalité : 120.000 victimes d'usurpation d'identité chaque année en France</article-title>
          .
          <source>Le Populaire du Centre (in French)</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Filson</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>The impact of e-commerce strategies on firm value: Lessons from Amazon.com and its early competitors</article-title>
          .
          <source>The Journal of Business</source>
          <volume>77</volume>
          (
          <issue>S2</issue>
          ) (
          <year>2004</year>
          ) pp.
          <fpage>S135</fpage>–<lpage>S154</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A novel service-oriented AAA architecture</article-title>
          .
          <volume>3</volume>
          (
          <year>2003</year>
          ) pp.
          <fpage>2833</fpage>–<lpage>2837</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Stockinger</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Implicit authentication on mobile devices</article-title>
          .
          <source>The Media Informatics Advanced Seminar on Ubiquitous Computing</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jakobsson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chow</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Implicit authentication through learning user behavior</article-title>
          . In:
          <string-name>
            <surname>Burmester</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          et al. (Eds.):
          <source>ISC 2010</source>
          , LNCS
          <volume>6531</volume>
          (
          <year>2011</year>
          ) pp.
          <fpage>99</fpage>–<lpage>113</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Jakobsson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Golle</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chow</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Implicit authentication for mobile devices</article-title>
          .
          <source>HotSec'09: Proceedings of the 4th USENIX Conference on Hot Topics in Security</source>
          (
          <year>2009</year>
          ) pp.
          <fpage>99</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ullah</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doyen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaïti</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Un classifieur du comportement des utilisateurs dans les applications pair-à-pair de streaming vidéo</article-title>
          .
          <source>CFIP 2011 - Colloque Francophone sur l'Ingénierie des Protocoles (in French)</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Goel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hofman</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirer</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Who does what on the web: A large-scale study of browsing behavior</article-title>
          .
          In:
          <source>ICWSM</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomkins</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A characterization of online browsing behavior</article-title>
          .
          In:
          <source>Proceedings of the 19th International Conference on World Wide Web</source>
          , ACM
          (
          <year>2010</year>
          ) pp.
          <fpage>561</fpage>–<lpage>570</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Abramson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aha</surname>
            ,
            <given-names>D.W.</given-names>
          </string-name>
          :
          <article-title>User authentication from web browsing behavior</article-title>
          .
          <source>Proceedings of the Twenty-Sixth International Florida Artificial Intelligence Research Society Conference</source>
          pp.
          <fpage>268</fpage>–<lpage>273</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Herrmann</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banse</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Federrath</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Behavior-based tracking: Exploiting characteristic patterns in DNS traffic</article-title>
          .
          <source>Computers &amp; Security</source>
          (
          <year>2013</year>
          ) pp.
          <fpage>117</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Automatic text processing: The transformation, analysis and retrieval of information by computer</article-title>
          .
          <source>Addison Wesley</source>
          (
          <year>1989</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Davey</surname>
            ,
            <given-names>B.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Priestley</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          :
          <article-title>Introduction to lattices and orders</article-title>
          . Cambridge University Press (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Formal concept analysis: Mathematical foundations</article-title>
          . Berlin-Heidelberg-New York et al.: Springer (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zaki</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hsiao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Efficient algorithms for mining closed itemsets and their lattice structure</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>17</volume>
          (
          <issue>4</issue>
          )
          (
          <year>2002</year>
          ) pp.
          <fpage>462</fpage>–<lpage>478</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Szathmary</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Symbolic Data Mining Methods with the Coron Platform</article-title>
          .
          <source>PhD Thesis</source>
          in Computer Science, University Henri Poincaré Nancy 1, France
          (
          Nov.
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>