<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Event and anomaly detection using Tucker3 decomposition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hadi Fanaee Tork</string-name>
          <email>hadi.fanaee@fe.up.pt</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Márcia Oliveira</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João Gama</string-name>
          <email>jgama@fep.up.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simon Malinowski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Morla</string-name>
          <email>ricardo.morla@inescporto.pt</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>INESC TEC, FEUP- University of Porto</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIAAD-INESC TEC, FEP- University of Porto</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LIAAD-INESC TEC, FEUP- University of Porto</institution>
        </aff>
      </contrib-group>
      <fpage>8</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>Failure detection in telecommunication networks is a vital task. So far, several supervised and unsupervised solutions have been proposed for discovering failures in such networks. Among them, unsupervised approaches have attracted more attention since no labeled data is required [1]. Often, network devices are not able to provide information about the type of failure. In such cases, an unsupervised setting is more appropriate for diagnosis. Among unsupervised approaches, Principal Component Analysis (PCA) has been widely used in the anomaly detection literature and can be applied to matrix data (e.g. Users-Features). However, one of the important properties of network data is its temporal, sequential nature. Considering the interaction of dimensions over a third dimension, such as time, may therefore provide better insights into the nature of network failures. In this paper we demonstrate the power of three-way analysis to detect events and anomalies in time-evolving network data. Event detection can be briefly described as the task of discovering unusual behavior of a system during a specific period of time. Anomaly detection, on the other hand, concentrates on the detection of abnormal points. It therefore differs from event detection in that it considers individual points rather than a group of points. Our work takes both issues into account using multi-way data analysis. Our methodology comprises the following steps: 1) anomaly detection: detection of individual abnormal users; 2) generation of user trajectories (i.e. the behavior of users over time); 3) clustering of users' trajectories to discover abnormal trajectories; and 4) detection of events: groups of users who show abnormal behavior during specific time periods. Although there is a rich body of research on these two issues, to the best of our knowledge we are the first to apply multi-way analysis to the anomaly and event detection problem.
In the remainder of this section we explain some basic and related concepts and works. Afterwards, we define the problem and then discuss three-way analysis methods. We then introduce the dataset and experiments. Finally, we discuss the results and point out possible future directions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
    </sec>
    <sec id="sec-2">
      <title>ANOMALY DETECTION</title>
      <p>
        An anomaly is a pattern in the data that does not conform to the
expected behavior [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Anomaly detection has a wide range of
applications in computer network intrusion detection, medical
informatics, and credit card fraud detection. A significant amount
of research has been devoted to solving this problem. However, our
focus is on unsupervised methods. Anomaly detection techniques
can be classified into six groups [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]: classification-based,
clustering-based, nearest neighbor-based, statistical,
information theory-based and spectral methods. Based on
this classification, our method belongs to the group of spectral
methods. These approaches first project the high-dimensional
data onto a lower-dimensional space and then assume that normal and
abnormal data points appear significantly different from each other in that space.
This has some benefits: 1) they can be employed in both unsupervised
and supervised settings; 2) they can detect anomalies in
high-dimensional data; and 3) unlike clustering techniques, they do not
require complicated manual parameter estimation. So far, most of
the work related to spectral anomaly detection has been based on
Principal Component Analysis (PCA) and Singular Value
Decomposition (SVD). Two of the most important applications of
PCA in recent years have been in the domains of intrusion
detection [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and traffic anomaly detection [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
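      <p>To make the spectral idea concrete, the following is a minimal sketch of PCA-based anomaly scoring. It is an illustration only and not the method used in this paper: the function name pca_anomaly_scores, the toy data and the use of the squared reconstruction error as the score are all assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import numpy as np

def pca_anomaly_scores(X, n_components):
    """Score each row by its squared reconstruction error outside the
    top principal subspace (a larger score means a more unusual row)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                  # shape: (features, n_components)
    residual = Xc - Xc @ V @ V.T             # part not explained by the kept PCs
    return (residual ** 2).sum(axis=1)

# Made-up toy data: nine points on the line y = x plus one point far off it.
X = np.array([[i, i] for i in range(9)] + [[4, -4]], dtype=float)
scores = pca_anomaly_scores(X, n_components=1)
most_anomalous = int(np.argmax(scores))      # index of the highest-scoring row
```
]]></preformat>
      <p>In this toy example the off-line point receives the largest score. In practice the number of retained components and the score threshold would have to be chosen for the data at hand.</p>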
    </sec>
    <sec id="sec-3">
      <title>EVENT DETECTION</title>
      <p>
        Due to the huge amount of sequential data being generated by sensors,
event detection has become an emerging issue with several
real-world applications. An event is a significant occurrence or pattern that
is unusual compared to the normal patterns of behavior of a
system [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. It can be caused by natural phenomena or by manual
interaction with the system. Examples of events include an attack on the
network, bioterrorist activities, epidemic disease, damage in an
aircraft, pipe breaks, forest fires, etc. A real system behaves
normally most of the time, until an anomaly occurs that may cause
damage to the system. Since the effects of an event on the system
are not known a priori, detecting and characterizing abnormal
events is challenging. This is also why, most of the time, we
cannot evaluate different algorithms against each other. One solution might be to
inject artificial events into the normal data. However,
the construction of a realistic event pattern is not trivial [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>HIDDEN MARKOV MODELS</title>
      <p>
        Hidden Markov Models (HMMs) have been used for at least the
last three decades in signal processing, especially in the domain of
speech recognition. They have also been applied in many other
domains, such as bioinformatics (e.g. biological sequence analysis),
environmental studies (e.g. earthquake and wind detection), and
finance (e.g. financial time series). HMMs became popular for their
simplicity and general mathematical tractability [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        HMMs are widely used to describe complex probability
distributions in time series and are well suited to modeling time
dependencies in such series. HMMs assume that the distribution of the
observations does not follow a normal distribution and that the observations are generated
by different processes. Each process depends on the state of an
underlying, unobserved Markov process [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The Markov property
states that the value of the process X at time t depends only on its
value at the previous time step. Using the notation of [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], let:
      </p>
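      <list list-type="simple">
        <list-item><p>T = length of the observation sequence</p></list-item>
        <list-item><p>N = number of states</p></list-item>
        <list-item><p>Q = {q0, q1, …, q(N-1)} = distinct states of the Markov process</p></list-item>
        <list-item><p>A = state transition probabilities</p></list-item>
        <list-item><p>B = set of N observation probability distributions</p></list-item>
        <list-item><p>π = initial state distribution</p></list-item>
        <list-item><p>O = (O0, O1, …, O(T-1)) = observation sequence</p></list-item>
      </list>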
      <p>An HMM is defined by the triple λ = (A, B, π). It
assumes that observations are drawn from the observation
probability distribution associated with the current state. The
transition probabilities between states are given in matrix A.</p>
      <p>
        There are three main problems related to HMMs.
The first problem consists in computing the probability P(O | λ) that a
given observation sequence O is generated by a given HMM λ. The
second problem consists in finding the most probable sequence of
hidden states given an observation sequence O and a model λ. The third
problem is related to parameter inference: it consists in estimating
the parameters of the HMM λ that best fit a given observation
sequence O. The algorithms most commonly used to solve these problems
are given in the last column of Table 1. More details about these
algorithms can be found in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In this paper, we deal with the
third problem, estimating the HMM parameters that best describe
the time series, as will be explained in Section 2.
      </p>
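      <p>As an illustration of the first problem, the sketch below computes P(O | λ) with the forward algorithm. It assumes a discrete HMM with a finite set of observation symbols, which is a simplification chosen for the example. The function name forward_probability and the numbers in the 2-state model are hypothetical and are not taken from the data used in this paper.</p>
      <preformat><![CDATA[
```python
def forward_probability(A, B, pi, observations):
    """P(O | lambda) for a discrete HMM lambda = (A, B, pi), forward algorithm.

    A[i][j]: probability of moving from state i to state j
    B[i][o]: probability of emitting symbol o while in state i
    pi[i]:   probability of starting in state i
    """
    n_states = len(pi)
    # alpha[i] = P(O_0, ..., O_t, state at time t is i)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs]
            for j in range(n_states)
        ]
    return sum(alpha)

# Hypothetical 2-state model with two observation symbols (0 and 1).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
p = forward_probability(A, B, pi, [0, 1])
```
]]></preformat>
      <p>The cost grows linearly with the sequence length T and quadratically with the number of states N, which is what makes the first problem tractable compared with enumerating all state sequences.</p>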
      <p>Table 1. The three main problems related to HMMs (Problem 1, Problem 2 and Problem 3) and the algorithms used to solve them.</p>
      <p>
        Traditional data analysis techniques such as PCA, clustering,
regression, etc. can only model two-dimensional data and
do not consider the interaction between more than two
dimensions. However, in several real-world phenomena there is a
mutual relationship between more than two dimensions (e.g. a 3D
tensor (Users×Features×Time)), and such data should therefore be analyzed
from a three-way perspective. Three-way analysis considers all
mutual dependencies between the different dimensions and
provides a compact representation of the original tensor in
lower-dimensional spaces. The most common three-way analysis models
are Tucker2, Tucker3, and PARAFAC [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which generalize
the two-mode principal component model or, more
specifically, the SVD. In the following, we briefly introduce the Tucker3 model
as the best-known method for the analysis of three-way data.
      </p>
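      <p>For reference, the Tucker3 model is commonly written element-wise as shown below. The symbols are assumptions following the usual convention in the three-way literature: x is an entry of the I×J×K data tensor, a, b and c are entries of the component matrices of the three modes (with P, Q and R components, respectively), g is an entry of the P×Q×R core tensor, and e is the residual.</p>
      <preformat><![CDATA[
```latex
x_{ijk} \;=\; \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R}
              a_{ip}\, b_{jq}\, c_{kr}\, g_{pqr} \;+\; e_{ijk},
\qquad i = 1,\dots,I,\quad j = 1,\dots,J,\quad k = 1,\dots,K .
```
]]></preformat>
      <p>With users, features and hours as the three modes, P, Q and R are the numbers of components retained for users, features and hours, respectively.</p>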
      <p>The modeling of such behavior is not straightforward because the
number of notification messages is not equal for each user during
the time period under analysis. For instance, one user may face 40
connection problems in an hour, hence generating 40 messages,
while others may face 5 or even no problems at all. In standard
event detection problems, for each time point there is a
measurement via one or multiple sensors. In the context of our
application, such measurements do not take place at regular time
points, since user modems (or sensors) only send messages to the
server when something unexpected occurs. Figure 2 illustrates two
sample users. Each circle represents the time stamp at which a
notification relative to the given user is received, while ΔT
represents the inter-arrival time between two consecutive
messages. As can be seen, two messages were related to user 1 in
that period, while four were related to user 2 during the same period.
Also, the ΔT between messages is larger for user 1 than for user 2,
which means that user 2 sent messages more frequently than user 1.
As in many other event detection problems, we could simply use the
number of events per hour (the measurement) at different users
(the sensors) to detect events, but this way we would lose the
information provided by the ΔTs.</p>
      <p>
        As the number of ΔT values is not the same for each user, this feature
cannot be directly integrated into our model, since it would cause
some vectors to have different lengths, which is not supported by
the Tucker3 analysis. To solve this, every time series of ΔT
relative to a given user is modeled by a 2-state HMM estimated with
the Baum-Welch algorithm [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Six parameters are extracted from
the HMM and are used to describe the time series of ΔT of the
user. Using this approach we obtain the same number of features
for each user, and we then include this information in our feature
vectors.
      </p>
      <p>The dataset is extracted from the usage log of a European IP/TV
service provider. The raw dataset lists the users' notification
messages, one per line, together with their occurrence times. As
previously mentioned, it is not possible to use this data directly in
our modeling approach, so some pre-processing steps were
performed. In addition to the HMM parameters obtained for each
hour and for each user, we included other features, such as the mean,
variance, entropy and number of messages per hour, in our feature
vector. We generated two separate datasets, each spanning a
time period of one month, which is equivalent to 720 hours. In one
set we selected 102 users and in the other we selected 909 users. The
latter dataset is an extended version of the former. We then
transformed both datasets to the tensor format. These datasets are
shown in the format of the Tucker3 input tensor (Figure 1) in Table 2,
where I, J and K represent the users, features and hours modes,
respectively.</p>
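      <p>The sketch below illustrates how a variable number of message timestamps can be turned into a fixed-length feature vector for one user and one hour. It covers only the simple summary features (count, mean, variance and entropy of ΔT), not the HMM parameters. The function name hourly_features, the bin width and the definition of entropy as the Shannon entropy of binned ΔT values are assumptions made for the example, since the text does not specify them.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter

def hourly_features(timestamps, bin_width=60.0):
    """Summarize one user's message timestamps (in seconds) within one hour."""
    ts = sorted(timestamps)
    deltas = [b - a for a, b in zip(ts, ts[1:])]   # inter-arrival times (delta T)
    n = len(deltas)
    mean = sum(deltas) / n if n else 0.0
    variance = sum((d - mean) ** 2 for d in deltas) / n if n else 0.0
    # Assumption: entropy = Shannon entropy (bits) of delta T values binned
    # into intervals of `bin_width` seconds.
    counts = Counter(int(d // bin_width) for d in deltas)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {"count": len(ts), "mean": mean, "variance": variance, "entropy": entropy}

# Hypothetical hour with four messages at 0 s, 30 s, 60 s and 600 s.
features = hourly_features([0, 30, 60, 600])
```
]]></preformat>
      <p>Whatever the number of messages, the result always has the same four entries, which is the property needed to stack users, features and hours into a regular tensor.</p>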
    </sec>
    <sec id="sec-5">
      <title>EXPERIMENTS</title>
      <p>This section is divided into three subsections, according to the
steps mentioned in the Introduction. In the first subsection, we
explain how we detect the abnormal users. In the next subsection,
we describe how we generate user trajectories. In the last
subsection, we explain how we cluster the trajectories using
hierarchical clustering and detect events using the user trajectories.</p>
    </sec>
    <sec id="sec-6">
      <title>Abnormal Users</title>
      <p>
        We applied the Tucker3 model to both datasets, X102 and X909, by
employing a MATLAB package called Three-mode component
analysis (Tucker3) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Before that, we performed an ANOVA test
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] to assess the significance of the three-way and two-way interactions
in the data. The results of this test are presented in Table 3. ANOVA
Max 2D represents the maximum value obtained over the different
combinations of two-way modeling (e.g. I-J, J-K, I-K). As can be
seen, larger values are obtained for the three-dimensional interaction
(ANOVA 3D), which reveals that there is a mutual interaction
between the three dimensions in both datasets that can be explained
better with three-way modeling, like Tucker3, than with two-way
modeling, like PCA.
      </p>
      <p>[Table: one row per dataset (X102 and X909)]</p>
      <p>
        The next step is to estimate the best parameters P, Q and R of
Equation 1. Choosing P-Q-R is similar to choosing the number of
components in PCA. In PCA we determine the number of PCs for one
dimension only, whereas here we
need to determine the number of principal components for each
of the three modes. P, Q and R can take values from 1 to the maximum
number of entities in the corresponding mode. For example, for
X102, P-Q-R can range from 1-1-1 to 102-10-720. These
parameters are chosen based on a trade-off between model
parsimony, or complexity, and goodness of fit. For instance,
for this dataset, model 1-1-1 gives about a 28% fit (less
complete and less complex) and model 102-10-720 gives a 100% fit
(most complete and most complex). With parameters 3-2-2 the
model has a 42% fit, which can be a more reasonable choice because
it offers a good compromise between complexity and fit. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] the
scree test method is proposed as a guideline for choosing these
parameters. We used this test to determine the best model for both
datasets. The selected model parameters and their corresponding
fits are presented in Table 3. This means that, for example, for
dataset X102, if we choose a Tucker3 model with 3, 2 and 2
components to summarize the users, features and hours modes,
respectively, the model is able to explain 42% of the total variance
contained in the raw data. After the estimation of the model parameters,
we used the selected model to decompose the raw data into a
lower-dimensional subspace, as illustrated in Figure 1 and Equation 1.
After the decomposition we obtained three component matrices (one each
for the users, features and hours modes), a small
core tensor and a tensor of residual errors.
      </p>
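      <p>The following sketch shows one way to obtain a Tucker3-style decomposition and its percentage of fit for a given P-Q-R. It uses a truncated higher-order SVD, a non-iterative approximation, and not the MATLAB package used in this paper, so it would not be expected to reproduce the reported fits. The names unfold and tucker3_hosvd, the symbols A, B, C for the component matrices and the random test tensor are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import numpy as np

def unfold(X, mode):
    """Matricize a 3-way array so that `mode` indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def tucker3_hosvd(X, ranks):
    """Truncated higher-order SVD.

    Returns the three component matrices, the core tensor and the
    percentage of the total sum of squares explained by the model.
    """
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode's subspace.
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :rank])
    A, B, C = factors
    core = np.einsum("ijk,ip,jq,kr->pqr", X, A, B, C)       # project onto subspaces
    X_hat = np.einsum("pqr,ip,jq,kr->ijk", core, A, B, C)   # model reconstruction
    fit = 100.0 * (1.0 - np.sum((X - X_hat) ** 2) / np.sum(X ** 2))
    return A, B, C, core, fit

# Small random tensor standing in for a (users x features x hours) dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4, 5))
A, B, C, core, fit = tucker3_hosvd(X, ranks=(3, 2, 2))
_, _, _, _, full_fit = tucker3_hosvd(X, ranks=(6, 4, 5))    # full-size model
```
]]></preformat>
      <p>As in the text, the full-size model reproduces the data exactly (100% fit), while smaller P-Q-R values trade fit for parsimony. Any centering or scaling of the tensor before decomposition is left out of this sketch.</p>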
      <p>In order to detect the abnormal users we simply projected the users
onto the component space yielded by the component matrix of the users mode. This projection is
presented in Figure 3 for dataset X102. The three-dimensional
subspace is given by the three components obtained by the model
for the first mode (users). As mentioned earlier, this number of
components is one of the parameters of the model, namely P = 3,
which corresponds to the first mode.</p>
      <p>In order to evaluate the reliability of the model we used the same
procedure and applied a Tucker3 model to dataset X909, which
includes all users of X102. Our idea was to see whether the model
identifies the same abnormal users in both datasets. For this purpose, we
computed the Euclidean distance between each user in the
projection space (see Figure 3) and the corresponding center (0, 0,
0), for both datasets X102 and X909. Then we normalized the
distances for each dataset and computed the Pearson correlation between
the distances to the center of the subspace for the users common to the
two datasets. We obtained a correlation of 68.44%.
Although for X909 we took only 3 of the 40 main components
and the model fit was different for the two datasets (42% for X102 and
51.01% for X909), users that were abnormal or normal in X102
appeared in approximately the same way in X909, as reflected by the 68.44%
correlation. This suggests that Tucker3 is a robust model for detecting
abnormal users.</p>
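      <p>The reliability check described above can be sketched as follows: compute each user's Euclidean distance to the origin of the projection space, normalize the distances per dataset, and correlate the values of the common users. The coordinates below are hypothetical, and min-max scaling is an assumption, since the normalization scheme is not specified in the text.</p>
      <preformat><![CDATA[
```python
import math

def distances_to_origin(points):
    """Euclidean distance of each point to the origin (0, 0, 0)."""
    return [math.sqrt(sum(c * c for c in p)) for p in points]

def normalize(values):
    """Assumption: min-max scaling to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical coordinates of the same four users in two projection spaces.
users_small = [(0.1, 0.0, 0.0), (0.0, 0.2, 0.0), (0.0, 0.0, 0.3), (3.0, 4.0, 0.0)]
users_large = [(0.2, 0.0, 0.0), (0.0, 0.4, 0.0), (0.0, 0.0, 0.6), (6.0, 8.0, 0.0)]
d_small = normalize(distances_to_origin(users_small))
d_large = normalize(distances_to_origin(users_large))
r = pearson(d_small, d_large)
```
]]></preformat>
      <p>A user far from the origin in both spaces, such as the last one here, is a candidate abnormal user in both datasets. A correlation close to 1 indicates that the two models rank the common users consistently.</p>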
    </sec>
    <sec id="sec-7">
      <title>User Trajectories</title>
      <p>Visualization methods like the one presented in Figure 3 are
not able to show the evolving behavior of users over time. We
need another solution that enables us to understand the behavior of
users over time. One solution is to project the users onto the
decomposed feature space (the features-mode component matrix of Figure 1) for each time
point. Since both of our selected models have Q equal to 2,
after projecting the users onto the feature space we obtain a
two-dimensional coordinate for each time point and for each user. The
process of generating these coordinates is presented in Figure 4.
The two columns of the features-mode component matrix represent the two components that summarize the
original entities of the features mode, and the data are represented as a
third-order tensor (see Figure 1). The rows of the front matrix are the
users, the columns correspond to the features and the third mode
represents the hours. If we compute the dot product
between each row of a frontal slice of the tensor and the columns of the features-mode component
matrix yielded by the Tucker3 model, we obtain the
coordinate of a user for a given time point. If we repeat this procedure
for all time points (e.g. hours), we are able to generate the
coordinates of each user for the 720 hours. The user trajectories are</p>
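      <p>The projection described above can be sketched in a few lines. Here X is a (users × features × hours) tensor and B is the features-mode component matrix with Q columns. The names X, B and user_trajectories, as well as the toy values, are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import numpy as np

def user_trajectories(X, B):
    """Project every user onto the feature components at every time point.

    X: array of shape (users, features, hours)
    B: array of shape (features, Q)
    Returns an array of shape (users, hours, Q): one Q-dimensional
    coordinate per user and per hour.
    """
    # For each hour k this equals the matrix product X[:, :, k] @ B.
    return np.einsum("ijk,jq->ikq", X, B)

# Toy tensor: 2 users, 3 features, 4 hours.
X = np.zeros((2, 3, 4))
X[0, 0, :] = [1, 2, 3, 4]      # user 0: feature 0 grows over time
X[1, 2, :] = 1.0               # user 1: feature 2 is constant
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])     # hypothetical component matrix with Q = 2
traj = user_trajectories(X, B)
```
]]></preformat>
      <p>In this toy case user 0 moves along the first axis over the four hours, while user 1 stays at the same point, which is the kind of difference a static projection cannot show.</p>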
      <p>To cluster the user trajectories, we employed the agglomerative hierarchical
clustering toolbox in MATLAB. We
defined the Euclidean distance between the points of the trajectories as
our distance function and Ward's criterion as the linkage criterion.
We tested different cut-off values from 0.6 to 1.2 to examine the
clustering structure. The most suitable clustering structure was
obtained for a dendrogram distance of 1, which cuts the tree at the
level that corresponds to three clusters. The average trajectory of
these clusters is shown in Figure 6. The red cluster has 1 user (0.1%),
the blue cluster comprises 866 users (97.4%) and the green cluster
includes 22 users (2.5%). As can be seen, no specific pattern can
be recognized in the green and red clusters. The users in these
two clusters show abnormal behavior at almost all time points.
Such behavior may be due to a persistent problem, such as a
problem in the user device. Regarding the blue cluster, it is
possible to detect three events. The first significant event occurs
between hours 350 and 400. The second and third events occur
between hours 450 and 480 and between hours 520 and 560, respectively. However, the
occurrence of the second and third events should be assessed
with hypothesis testing, since they may be due to an accidental
change.</p>
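      <p>As an illustration of the clustering step, the sketch below implements a naive agglomerative clustering with Ward's criterion on flattened trajectories. It is not the MATLAB toolbox used in this paper: it stops at a requested number of clusters instead of cutting the dendrogram at a distance threshold, and the function name ward_clusters and the toy trajectories are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import numpy as np

def ward_clusters(vectors, n_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    Repeatedly merges the two clusters whose union increases the
    within-cluster sum of squares the least, until n_clusters remain.
    Returns one integer label per input vector.
    """
    vectors = np.asarray(vectors, dtype=float)
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = vectors[clusters[a]].mean(axis=0)
                cb = vectors[clusters[b]].mean(axis=0)
                na, nb = len(clusters[a]), len(clusters[b])
                # Ward merge cost: increase in within-cluster sum of squares.
                cost = na * nb / (na + nb) * np.sum((ca - cb) ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(vectors), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels

# Hypothetical trajectories: 5 users x 4 hours x 2 coordinates.
traj = np.zeros((5, 4, 2))
traj[1] += 0.1
traj[2] -= 0.1
traj[3] += 5.0
traj[4] += 5.1
# Each user's trajectory is flattened into one vector before clustering.
labels = ward_clusters(traj.reshape(len(traj), -1), n_clusters=2)
```
]]></preformat>
      <p>This version recomputes all pairwise costs at every step, so it is only meant for small examples. A library implementation would be preferable for hundreds of users and 720 hours.</p>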
    </sec>
    <sec id="sec-8">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>In this paper, we presented a study on using the Tucker3
decomposition to discover abnormal users in an IP/TV network.
Our results indicate that Tucker3 is a robust method for detecting
abnormal users in situations where interactions between the three
dimensions are present. From the tensor decomposition, we can
define user trajectories, which allow us to observe the
behavior of the users over time. We were able to identify two
kinds of abnormal users: those who show frequent abnormal
behavior over the whole time period and those who are associated
with one or a few severe abnormal behaviors over the time period.
Without resorting to the analysis of the users' temporal trajectories it
would have been harder to uncover such facts. Furthermore, from
the clusters of the users' trajectories, we identified three
events that occurred during three time periods in the network. The
results of this work can be used in a real network surveillance
system to identify failures as quickly as possible. In this
work, we did not consider the spatial relations between users. Taking
spatial relationships between network nodes into account could lead to
a better clustering of users. Since some users might show similar
behavior with some delay, other distance measures for clustering
should be tested. We are currently employing another distance
function based on dynamic time warping, which assigns two users
with the same behavior but with a time shift to the same cluster. The
solution we presented for the detection of events was based on
the clustering of trajectories. We plan to apply a sliding window to the
trajectories to find the time periods that have the most compact
trajectories, which would lead to the discovery of events in a more
accurate and reliable way.</p>
    </sec>
    <sec id="sec-9">
      <title>ACKNOWLEDGEMENTS</title>
      <p>This work was supported by the Institute for Systems and Computer
Engineering of Porto (INESC TEC) under projects TNET with reference
BI/120033/PEst/LIAAD and project KDUS with reference
PTDC/EIAEIA/098355/2008. The authors are grateful for the financial support. This
work is also financed by the ERDF European Regional Development Fund
through the COMPETE Programme (operational programme for
competitiveness) and by National Funds through the FCT Fundação para a
Ciência e a Tecnologia (Portuguese Foundation for Science and
Technology) within project CMU-PT/RNQ/0029/2009. We'd like to
acknowledge the support of J.A. at the European IP/TV service provider.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
            <surname>Chandola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <article-title>"Anomaly detection: a survey,"</article-title>
          <source>ACM Computing Surveys</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Battiti</surname>
          </string-name>
          ,
          <article-title>"Identifying intrusions in computer networks with principal component analysis,"</article-title>
          <source>in International Conference on Availability, Reliability and Security</source>
          , Vienna, Austria,
          <year>2006</year>
          , pp.
          <fpage>270</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Guan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>"A novel intrusion detection method based on principal component analysis,"</article-title>
          <source>in IEEE Symposium on Neural Networks</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.-L.</given-names>
            <surname>Shyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sarinnapakorn</surname>
          </string-name>
          , and
          <string-name>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>"A novel anomaly detection scheme based on principal component classifier,"</article-title>
          <source>in 3rd IEEE International Conference on Data Mining</source>
          , Melbourne, Florida,
          <year>2003</year>
          , pp.
          <fpage>353</fpage>
          -
          <lpage>365</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Callegari</surname>
          </string-name>
          ,
          <article-title>"A novel PCA-based Network Anomaly Detection,"</article-title>
          <source>in IEEE International Conference on Communications (ICC)</source>
          , Pisa, Italy ,
          <year>2011</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Kerman</surname>
          </string-name>
          et al.,
          <article-title>"Event Detection Challenges, Methods, and Applications in Natural and Artificial Systems,"</article-title>
          <source>in 14th International Command and Control Research and Technology Symposium</source>
          , Washington, DC,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Shmueli</surname>
          </string-name>
          ,
          <article-title>"Wavelet-based Monitoring in Modern Biosurveillance,"</article-title>
          University of Maryland, College Park,
          <source>Report Working Paper RHS-06-002</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
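          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Zucchini</surname>
          </string-name>
          and
          <string-name>
            <given-names>I. L.</given-names>
            <surname>MacDonald</surname>
          </string-name>
          ,
          <source>Hidden Markov Models for Time Series</source>
          . USA/FL: Chapman &amp; Hall/CRC,
          <year>2009</year>
          .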
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stamp</surname>
          </string-name>
          . (
          <year>2004</year>
          , Jan) Department of Computer Science, San Jose State University. [Online]. http://www.cs.sjsu.edu/faculty/stamp/RUA/HMM.pdf
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H. A. L.</given-names>
            <surname>Kiers</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>van Mechelen</surname>
          </string-name>
          ,
          <article-title>"Three-way component analysis: Principles and illustrative application,"</article-title>
          <source>Psychological Methods</source>
          , vol.
          <volume>6</volume>
          , pp.
          <fpage>84</fpage>
          -
          <lpage>110</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L. R.</given-names>
            <surname>Rabiner</surname>
          </string-name>
          ,
          <article-title>"A tutorial on hidden Markov models and selected applications in speech recognition,"</article-title>
          <source>Proceedings of the IEEE</source>
          , vol.
          <volume>77</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>257</fpage>
          -
          <lpage>286</lpage>
          ,
          <year>Feb 1989</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>