<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Series</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Modeling and Clustering the Behavior of Animals Using Hidden Markov Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tomáš Šabata</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomáš Borovička</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Information Technology, Czech Technical University in Prague</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country>The Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science, Czech Academy of Sciences</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country>The Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1649</volume>
      <fpage>172</fpage>
      <lpage>178</lpage>
      <abstract>
        <p>The objectives of this article are to model the behavior of individual animals and to cluster the resulting models in order to group animals with similar behavior patterns. Hidden Markov models are considered suitable for this purpose. Their clustering is well studied, however, only under the assumption that the observable variables are Gaussian mixtures, which does not hold in our case. Therefore, we use the Kullback-Leibler divergence to cluster hidden Markov models whose observable variables have an arbitrary distribution. Hierarchical and spectral clustering are applied. To evaluate the modeling approach, an experiment was performed in which an accuracy of 83.86% was reached in predicting behavioral sequences of individual animals. The results of the clustering were evaluated by means of statistical descriptors of the animals and by a domain expert; both evaluations confirm that the results of the clustering are meaningful.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>A mathematical model that describes behavior is called
a behavioral model. In particular, we assume that patterns of
animals’ behavior are reflected in their behavioral models
and that the differences can be identified and analyzed on the
individual level.</p>
      <p>For each animal, a model that represents its behavior in
a certain time period is created. A comparison of
models of one animal from different periods of time can show
changes in its behavior over time. Rapid changes in the
behavior can indicate an important event or disorder.</p>
      <p>Moreover, different animals can be compared based on
their behavioral models. The behavioral model as a
descriptor of the behavioral patterns can be used to group and
classify the animals. Furthermore, characteristics of each
group can be extracted and changes of behavioral patterns
can be tracked.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related work</title>
      <p>
        The field of behavioral analysis is very wide [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
This paper, however, focuses on a specific kind of behavioral
analysis: the analysis of animal behavior as a sequence of
states.
      </p>
      <p>To create a sufficiently accurate behavioral model, an
abstraction and simplification of the behavior is used. The
most common abstraction of an animal’s behavior is the
abstraction as a sequence of states.</p>
      <p>
        The description of behavior as a sequence of states is a
commonly used abstraction and simplification in
behavioral modeling [
        <xref ref-type="bibr" rid="ref13 ref14 ref6">6, 13, 14</xref>
        ]. It allows creating a generalised
and sufficiently accurate behavioral model. Each state of
the sequence corresponds to an action, and the actions are
organized in a finite sequence [
        <xref ref-type="bibr" rid="ref13 ref14 ref17 ref20 ref6">6, 13, 14, 17, 20</xref>
        ]. The model is
expected to be more accurate if the actions are easily
separable. Therefore, the actions should be mutually disjoint and
cover all activities that an animal can perform, so that the
animal is doing exactly one action at a time.
      </p>
      <p>An abstraction of an animal living in a closed
environment is much simpler than, for example, an abstraction of
the behavior of a human. Such animals can perform only a
limited number of actions, which are reactions to the
internal state of the animal (hunger, thirst, ...), to
other animals’ behavior, or to the environment.</p>
      <p>
        The most commonly used methods to model sequences
are hidden Markov models [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], dynamic Bayesian
networks [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], conditional random fields [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], neural
networks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], linear regression models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and structured
support vector machines [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. All of these are
supervised learning methods, although hidden Markov models
can also be estimated in a semi-supervised
way [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. There are also approaches that model behavior
with unsupervised learning methods. Fuzzy Q-state learning is
an example of an unsupervised method that was used to model
human behavior; in this approach, labels are created by the
iterative Bayesian fuzzy clustering algorithm [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Preliminaries</title>
      <sec id="sec-3-1">
        <title>3.1 Hidden Markov Models</title>
        <p>
          Influenced by [
          <xref ref-type="bibr" rid="ref14 ref20 ref6">6, 14, 20</xref>
          ], we have decided to use hidden
Markov models for behavioral modeling. In this
subsection, the principles of a hidden Markov model (HMM) will
be recalled.
        </p>
        <p>Each HMM is associated with a random process indexed by
time, which is assumed to be in exactly one of a set of
N distinct states at any time. At regularly spaced discrete
times, the system changes its state according to the
probabilities of transitions between states. The time steps
are denoted t = 1, 2, 3, ..., and the actual
state at a time step t is denoted qt.</p>
        <p>The process itself is assumed to be a Markov chain,
usually a first-order Markov chain, although a Markov chain
of any order can be used. It is usually assumed that the
Markov chain is homogeneous. This assumption allows
the Markov chain to be described as a matrix of transition
probabilities A = {ai j}, which is formally defined in the
Equation (1).</p>
        <p>aij = P(qt = yj | qt−1 = yi), 1 ≤ i, j ≤ N (1)</p>
        <p>The simple observable Markov chain is too restrictive to
describe reality; however, it can be extended. Denoting
Y the variable recording the states of the Markov chain, an
HMM is obtained by completing Y with a
multivariate random variable X. In the context of that HMM, X is
called the ’observation variable’ or ’output variable’, whereas
Y is called the ’hidden variable’. The hidden variable takes
values in the set {y1, y2, ..., yN} and the observable variable X
takes values in the set {x1, x2, ..., xM}.</p>
        <p>We assume to have an observation sequence O =
o1 o2 ... oT and a state sequence Q = q1 q2 ... qT which
corresponds to the observation sequence. An HMM can be
characterized using three probability distributions:
1. A state transition probability distribution A = {aij},
formally defined by Equation (1).
2. A probability distribution of the observation variables,
B = {bi,j}, where bi,j is the probability of observing xj
in state yi, formally defined as
bi,j = P(ot = xj | qt = yi). (2)
For a discrete variable, B can be described by an
N × M stochastic matrix, where M denotes the number of
possible values of X.
3. An initial state distribution π = {πi}, defined by
πi = P(q1 = yi). (3)
When the initial state distribution is assumed to be
stationary, it is usually computed as the stationary
distribution of the Markov chain described by the matrix A,
i.e., by solving the equation
πA = π, that is, ∑i πi aij = πj for each j. (4)</p>
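A stationary distribution satisfying Equation (4) can be obtained as the left eigenvector of A for eigenvalue 1. A minimal sketch, assuming a hypothetical two-state transition matrix (not the matrices estimated in this paper):

```python
import numpy as np

def stationary_distribution(A):
    """Solve pi @ A = pi with sum(pi) = 1 via the left eigenvector
    of A associated with the eigenvalue closest to 1."""
    vals, vecs = np.linalg.eig(A.T)
    k = np.argmin(np.abs(vals - 1.0))   # eigenvalue nearest 1
    pi = np.real(vecs[:, k])
    return pi / pi.sum()                # normalize to a distribution

# hypothetical 2-state transition matrix
A = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = stationary_distribution(A)
```

For this chain the balance equation 0.1·π1 = 0.5·π2 gives π = (5/6, 1/6).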
        <p>With these three elements, the HMM is fully defined.
The model is denoted λ = (A, B, π). An HMM can be
graphically depicted by a trellis diagram, which is shown in
Figure 1. The joint probability of O and Q, which
corresponds to the trellis diagram, is described by Equation (5).
By means of the above notation, it can be formally
written as in Equation (6).</p>
        <p>P(o1, ..., oT, q1, ..., qT) = P(q1) P(o1|q1) ∏(k=2..T) P(qk|qk−1) P(ok|qk) (5)</p>
        <p>P(o1, ..., oT, q1, ..., qT) = πq1 bq1,o1 ∏(k=2..T) aqk−1,qk bqk,ok (6)</p>
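Equation (6) is a simple product over the sequence; a sketch with hypothetical 2-state, 2-symbol parameters (illustrative values only):

```python
import numpy as np

# Hypothetical model lambda = (A, B, pi)
pi = np.array([1.0, 0.0])
A = np.array([[0.5, 0.5],
              [0.5, 0.5]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def joint_prob(pi, A, B, states, obs):
    """Equation (6): pi_q1 * b_q1,o1 * prod_k a_{q_{k-1},q_k} * b_{q_k,o_k}."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for k in range(1, len(states)):
        p *= A[states[k - 1], states[k]] * B[states[k], obs[k]]
    return p

p = joint_prob(pi, A, B, states=[0, 1], obs=[0, 1])
```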
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Clustering approaches</title>
        <p>
          Hidden Markov models are often used in cluster
analysis of sequences [
          <xref ref-type="bibr" rid="ref1 ref18 ref9">1, 9, 18</xref>
          ]. Clustering sequences is a more
complex task than clustering feature vectors, since infinite
sequences are not in a Euclidean space and sufficiently
long finite sequences are in a high-dimensional Euclidean
space. Approaches exist to measure the
distance between two sequences (Levenshtein distance,
Hamming distance, longest common subsequence, etc.),
but they may not always be meaningful. It may be
more reasonable to learn an HMM for each sequence and
cluster the HMMs, an HMM being the random process that
produces the sequence.
        </p>
        <p>
          Because HMMs’ parameters lie on a
nonlinear manifold, applying the k-means algorithm directly
would not succeed [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. In addition, the parameters of a
particular generative model are not unique; a
permutation of states may correspond to the same model [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
This means that the same sequences may be generated by
differently parametrized HMMs.
        </p>
        <p>
          There are already a few approaches to the cluster
analysis by means of HMMs. One of them uses the
spectral clustering with Bhattacharyya divergence [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Another
commonly used dissimilarity measure in spectral
clustering of HMMs is the Kullback-Leibler divergence [
          <xref ref-type="bibr" rid="ref10 ref21 ref24">10, 21,
24</xref>
          ].
        </p>
        <p>
          The approach called variational hierarchical
expectation maximization (VHEM) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is also based on spectral
clustering. It directly uses the probability distributions of
HMMs instead of constructing an initial embedding and
besides clustering, it generates new HMMs. The newly
generated HMMs are representatives of each cluster.
        </p>
        <p>A disadvantage of all the mentioned methods is the
assumption that observations are normally distributed and
can be described using a Gaussian mixture model. A
distance measure between two HMMs with normally
distributed observations can be computed in polynomial time.
However, that assumption can be too restrictive for many
real-life applications.</p>
        <p>
          Dissimilarity measures of HMMs. The definition of a
distance between two HMMs is challenging, since an HMM
consists of a Markov chain and a joint distribution of
observations. In [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], the authors considered two distance
measures applicable to HMMs λ and λ′. The first of them
is a normalized Euclidean distance of the corresponding
matrices B, B′,
dec(λ, λ′) = sqrt( (1/N) ∑(i=1..N) ∑(k=1..M) ||bik − b′ik||² ), (7)
where N denotes the number of states and M denotes the
number of observations. The number of states of both models
has to be identical, N = N′. The second distance proposed
in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is defined by
dmec(λ, λ′) = sqrt( (1/N) ∑(i=1..N) minj ∑(k=1..M) ||bik − b′jk||² ). (8)
        </p>
        <p>
          A disadvantage of the considered distances is that they
ignore the Markov chain. Therefore, there exist HMMs
λ and λ′ whose distance is zero although the generated
probability distributions Pλ and Pλ′ are different.
Consequently, both distance measures are not metrics, but
only pseudometrics [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Although the distances ignore
the Markov chain, it is agreed that the observation
probability distribution matrix B is, in most cases, a more
sensitive set of parameters with respect to the closeness of HMMs
than the initial distribution vector π or the state transition
matrix A [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>Another distance between HMMs λ, λ′ is the
Kullback-Leibler divergence (KL divergence)</p>
        <p>dKL(λ, λ′) = ∫O (1/G(O)) log( P(O|λ) / P(O|λ′) ) P(O|λ) dO, (9)</p>
        <p>where G(O) is a function weighting the length of the
sequence O. Two kinds of such weighting functions are
commonly used: G(O) is equal to the length of the sequence
if we measure the divergence between two HMMs that
generate equally long sequences, and it is equal to the
expected value of the lengths of the sequences if we measure
the divergence between two HMMs that generate sequences
of various lengths.</p>
        <p>
          An analytic solution of the integral in Equation (9)
exists for HMMs with Gaussian-distributed observations [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ];
otherwise it has to be calculated numerically. The time
complexity of the numerical computation grows exponentially
with the length of the sequence. Since we want to use it for
long sequences, we use an approximation described in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
We assume to have an ordered set of output sequences F =
{O1, ..., OL}. Any sequence O could become the l-th member
of the set with a probability proportional to P(O|λ). Using the
Viterbi algorithm, the most likely sequence Qopt of states
can be computed by P(Qopt, O|λ) = maxQ P(Q, O|λ). This
leads to the divergence dVit.
        </p>
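The Viterbi computation of Qopt mentioned above can be sketched as a standard log-space dynamic program (the model parameters below are hypothetical):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence Q_opt = argmax_Q P(Q, O | lambda)."""
    N, T = A.shape[0], len(obs)
    logd = np.log(pi) + np.log(B[:, obs[0]])   # delta_1(i)
    back = np.zeros((T, N), dtype=int)         # backpointers
    for t in range(1, T):
        scores = logd[:, None] + np.log(A)     # scores[i, j] = delta(i) + log a_ij
        back[t] = scores.argmax(axis=0)
        logd = scores.max(axis=0) + np.log(B[:, obs[t]])
    q = [int(logd.argmax())]
    for t in range(T - 1, 0, -1):              # trace the backpointers
        q.append(int(back[t, q[-1]]))
    return q[::-1]

# hypothetical near-deterministic 2-state model
pi = np.array([0.9, 0.1])
A = np.array([[0.1, 0.9],
              [0.9, 0.1]])
B = np.array([[0.9, 0.1],
              [0.1, 0.9]])
q_opt = viterbi(pi, A, B, [0, 1, 0])
```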
        <sec id="sec-3-2-1">
          <title>We assume that:</title>
          <p>dVit(λ, λ′) = ∫O (1/G(O)) log( P(Qopt, O|λ) / P(Q′opt, O|λ′) ) P(O|λ) dO (10)</p>
          <p>• The most probable state sequences are equal for both
HMMs. This assumption is reasonable if the two HMMs are
not too dissimilar.</p>
          <p>• The Markov chain is ergodic. A Markov chain is
called ergodic if there is a nonzero probability of getting
from any state q to any other state q′ (not necessarily
in one move). A Markov chain with an ergodic subset
(in particular, an ergodic Markov chain) generates a
sufficiently long sequence.</p>
          <p>
            With these assumptions, the KL divergence can be
computed up to an approximation error ε [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. According to
Subsection 3.1, it can be rewritten using the
notation of HMMs as shown in Equation (12).
          </p>
          <p>dVit(λ, λ′) − ε = (1/G(O)) ∑(t=1..T−1) (log ayt,yt+1 − log a′yt,yt+1) + (1/G(O)) ∑(t=1..T) (log byt,xt − log b′yt,xt) (12)</p>
          <p>
            If a sequence O is long enough, the law of large numbers
allows us to approximate Equation (12) as in (13) [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
dVit(λ, λ′) ≈ δVit = ∑(i,j) aij πi (log aij − log a′ij) + ∑(i,k) bik πi (log bik − log b′ik) (13)
          </p>
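Equation (13) can be evaluated directly from the parameter matrices. A sketch, assuming hypothetical models whose stationary distribution π has already been computed:

```python
import numpy as np

def approx_kl(pi, A, B, A2, B2):
    """delta_Vit of Equation (13); pi is the stationary distribution of A."""
    term_a = np.sum(A * pi[:, None] * (np.log(A) - np.log(A2)))
    term_b = np.sum(B * pi[:, None] * (np.log(B) - np.log(B2)))
    return term_a + term_b

# hypothetical models; pi is the stationary distribution of A
pi = np.array([0.5, 0.5])
A  = np.array([[0.6, 0.4], [0.4, 0.6]])
B  = np.array([[0.7, 0.3], [0.3, 0.7]])
A2 = np.array([[0.5, 0.5], [0.5, 0.5]])
B2 = np.array([[0.6, 0.4], [0.4, 0.6]])
d12 = approx_kl(pi, A, B, A2, B2)   # divergence of lambda' from lambda
d21 = approx_kl(pi, A2, B2, A, B)   # reversed arguments differ: not symmetric
```

The last two lines illustrate the asymmetry noted in the text.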
          <p>A disadvantage of the KL divergence is that it is not
symmetric.</p>
          <p>
            The Bhattacharyya divergence is also a commonly used
metric for HMMs. For two probability distributions f and
g, it is defined by Equation (14). Similarly to the KL
divergence, it can be easily computed if the HMM has
normally distributed observations [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. In contrast to the KL
divergence, the Bhattacharyya divergence is symmetric.
dB(f, g) = − ln( ∫ sqrt( f(x) g(x) ) dx ) (14)
          </p>
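For discrete distributions, the integral in Equation (14) becomes a sum; a minimal sketch:

```python
import numpy as np

def bhattacharyya(f, g):
    """Equation (14) for discrete distributions:
    d_B(f, g) = -ln(sum_x sqrt(f(x) * g(x)))."""
    return -np.log(np.sum(np.sqrt(f * g)))

# two hypothetical distributions over two symbols
f = np.array([0.5, 0.5])
g = np.array([0.9, 0.1])
```

Note that d_B(f, f) = 0 and d_B(f, g) = d_B(g, f), the symmetry claimed in the text.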
          <p>Using a chosen dissimilarity measure, a dissimilarity
matrix can be created and used for clustering, in
particular in hierarchical and spectral clustering algorithms.</p>
          <p>Spectral clustering uses similarities between objects
represented as a weighted graph. There are three ways
to create such a graph:
1. The ε-neighborhood graph. All pairs of points with
distances smaller than ε are connected. The graph is often
considered unweighted, because its edges are based on the
dichotomous property of having distance at most ε.
2. The k-nearest-neighbor graph. Vertex vi is connected
with vertex vj if vj is among the k nearest neighbors of vi.
This construction is usually used in image segmentation,
where each pixel has a neighbourhood defined by 4 or 8 pixels.
3. The fully connected graph. All points whose pairwise
similarity is positive are connected. This construction is
usually chosen if the similarity function itself already
encodes mainly local neighborhoods. An example of such a
similarity function is the Gaussian similarity function
s(xi, xj) = exp(−||xi − xj||² / (2σ²)). The parameter σ
controls the width of the neighbourhood.</p>
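Building the fully connected graph from a dissimilarity matrix with the Gaussian similarity function can be sketched as follows (the distance matrix here is hypothetical):

```python
import numpy as np

def gaussian_similarity_graph(D, sigma):
    """Fully connected similarity graph: W[i, j] = exp(-D[i, j]^2 / (2 sigma^2)).
    The diagonal is zeroed so the graph has no self-loops."""
    W = np.exp(-D**2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    return W

# hypothetical 2x2 distance matrix
D = np.array([[0.0, 1.0],
              [1.0, 0.0]])
W = gaussian_similarity_graph(D, sigma=1.0)
```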
          <p>
            Spectral clustering can be performed by three different
algorithms [
            <xref ref-type="bibr" rid="ref19">19</xref>
            ]:
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>1. Unnormalized spectral clustering</title>
          <p>2. Normalized spectral clustering according to Shi and
Malik</p>
          <p>3. Normalized spectral clustering according to Ng,
Jordan, and Weiss</p>
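For the two-cluster case, the unnormalized variant reduces to splitting on the sign of the second eigenvector of the graph Laplacian L = D − W; a sketch on a toy graph (not the paper's data):

```python
import numpy as np

def spectral_bipartition(W):
    """Unnormalized spectral clustering with k = 2: split on the sign of the
    eigenvector of L = D - W belonging to the second-smallest eigenvalue."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)       # eigh orders eigenvalues ascending
    return (vecs[:, 1] > 0).astype(int)

# toy graph: two tightly connected pairs, weakly linked to each other
W = np.array([[0.00, 1.00, 0.01, 0.01],
              [1.00, 0.00, 0.01, 0.01],
              [0.01, 0.01, 0.00, 1.00],
              [0.01, 0.01, 1.00, 0.00]])
labels = spectral_bipartition(W)
```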
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Proposed approach</title>
      <p>This article focuses on behavioral modeling and analysis
of animals that live in a closed environment. This
simplifies the abstraction of the behavior since the number of
possible actions of animals in a closed environment is
limited. The proposed approach is illustrated on a herd
of cows containing one hundred individuals.</p>
      <p>The transition and emission matrices of the hidden
Markov model are estimated using a sequence of states
and a sequence of observable values; therefore, a set of
possible states and a set of possible values of the observable
variables have to be predefined. Each element of the
sequence describes one minute of the animal’s behavior.</p>
      <p>Five possible states that represent actions which an
animal can perform are defined, denoted S1, S2, S3, S4, S5.
The states S1, S2, S3 and S4 correspond to eating, drinking,
resting and being milked, respectively; an animal is
considered to be in the state S5 if it is not in any of the
states S1-S4.</p>
      <p>Two observable variables are calculated for each state
except the state S5:
1. the duration of the last occurrence of the state,
2. the time elapsed since the last occurrence of the state.
These eight variables are discretized by equal-frequency
binning into four or five bins. A ninth observable variable
represents the time of day, discretized into six blocks of
four hours. After discretization, the Cartesian product of
the value sets of the observable variables contains
1,500,000 combinations of values.</p>
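Equal-frequency binning as used above can be sketched with empirical quantiles (a generic illustration, not the paper's code):

```python
import numpy as np

def equal_frequency_bins(x, n_bins):
    """Discretize x into n_bins bins of (approximately) equal frequency,
    using empirical quantiles as the inner bin edges."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side='right')

x = np.arange(100)                 # hypothetical variable values
bins = equal_frequency_bins(x, 4)  # four bins of 25 values each
```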
      <p>To avoid zero probabilities in the emission and
transition matrices, empirical transition and emission matrices
averaged over all models of the respective animal are used as
initial estimates. Zero probabilities in those initial
estimates are replaced with small probabilities estimated as
one over the number of elements of the matrix.</p>
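The zero-replacement step can be sketched as follows; renormalizing the rows afterwards (an added step here, assumed rather than stated in the text) keeps the matrices stochastic:

```python
import numpy as np

def smooth_zeros(M):
    """Replace zero entries with 1/(number of elements of M), then
    renormalize each row so the matrix remains stochastic."""
    M = np.where(M == 0.0, 1.0 / M.size, M)
    return M / M.sum(axis=1, keepdims=True)

# hypothetical initial estimate with zero entries
M = np.array([[0.5, 0.5, 0.0],
              [1.0, 0.0, 0.0]])
M_s = smooth_zeros(M)
```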
      <p>Since the assumption of normally distributed
observations is not valid in our case, we use the approximation
of the Kullback-Leibler divergence, recalled in
Subsection 3.2, as the dissimilarity measure for HMM clustering.
Using this approximation, a distance matrix D for the set
of HMMs constructed for the individual animals is
computed, where Di,j = dVit(λi, λj) is the approximated
divergence of the j-th model from the i-th model.
The properties of the KL divergence imply that the matrix
is real-valued and non-symmetric and that its diagonal is
zero.</p>
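Spectral and hierarchical clustering usually expect a symmetric dissimilarity matrix; one common remedy, assumed here since the paper does not state how the asymmetry is handled, is to average D with its transpose:

```python
import numpy as np

def symmetrize(D):
    """Symmetric dissimilarity matrix from a non-symmetric divergence matrix."""
    return 0.5 * (D + D.T)

# hypothetical non-symmetric divergence matrix
D = np.array([[0.0, 2.0],
              [1.0, 0.0]])
D_sym = symmetrize(D)
```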
      <p>Hierarchical and spectral clustering were applied to
cluster the models. A parameter of hierarchical
clustering is the kind of linkage; since HMMs are not points in
a Euclidean space, the single, average or complete
linkage can be used. An optimal number of clusters can be
determined by the domain expert from the dendrogram of
the hierarchical clustering.</p>
      <p>For the spectral clustering, a fully connected graph
based on the Gaussian similarity function is used as the
similarity graph. The parameter σ of the similarity
function is estimated by optimizing the balance of the
clusters.</p>
      <p>The results of clustering were subsequently analysed
using descriptive characteristics (e.g., age and weight of the
animal) that were not among the observable variables.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Experimental results</title>
      <p>The experiment is divided into two parts: behavioral
modeling and clustering of the models. Section 5.1 describes
the results of the modeling using HMMs and Section 5.2
presents the results of the clustering.</p>
      <sec id="sec-5-1">
        <title>5.1 Hidden Markov modeling</title>
        <p>The dataset, which consists of ten consecutive days, was
split into training and test sets in a ratio of 9:1. The
parameters of a model were estimated from the data of nine
consecutive days and the model was evaluated on the data of
the tenth day. For each animal, the transition and emission
matrices were estimated using the Baum–Welch algorithm.</p>
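When both the state sequence and the observations are available, the maximum-likelihood estimates of the transition and emission matrices reduce to normalized counts; the sketch below shows this simplified case (the paper itself uses the Baum–Welch algorithm, which handles hidden states):

```python
import numpy as np

def count_estimate(states, obs, N, M):
    """Count-based MLE of A (N x N) and B (N x M) from an observed
    state sequence and the corresponding observation sequence."""
    A = np.zeros((N, N))
    B = np.zeros((N, M))
    for t in range(len(states)):
        B[states[t], obs[t]] += 1          # emission count
        if t > 0:
            A[states[t - 1], states[t]] += 1  # transition count
    A /= np.maximum(A.sum(axis=1, keepdims=True), 1)  # row-normalize
    B /= np.maximum(B.sum(axis=1, keepdims=True), 1)
    return A, B

# hypothetical toy sequences
A_hat, B_hat = count_estimate([0, 0, 1, 1, 0], [0, 1, 1, 0, 0], N=2, M=2)
```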
        <p>The hidden Markov models were evaluated in two different
ways: visually by a domain expert and using the Viterbi
sequence. The domain expert checks whether the real animals’
behavior is consistent with the stationary distribution of
the resulting Markov chain, which describes the behavior
to which the Markov chain converges. Consequently, the
stationary distribution describes how much time an animal
spends on an activity in comparison with all
activities.</p>
        <p>Stationary distributions were computed for each animal
separately. A histogram of the probabilities of individual
states in the stationary distributions is shown in Figure 2.</p>
        <p>Probabilities of the stationary distributions averaged
over all animals are in Table 1.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Clustering</title>
        <p>Table 1: Average probabilities of the states in the
stationary distributions.
S1: 8.5%, S2: 0.4%, S3: 44.8%, S4: 1.8%, S5: 44.5%.</p>
        <p>The second way of evaluation uses the Viterbi
sequence. The Viterbi sequence determines the most likely
sequence of states given a sequence of observations and
is denoted v1 v2 ... vT. To obtain the accuracy of the
prediction of a sequence of states, the real sequences of states
are compared element-wise with the Viterbi sequences.
The accuracy is calculated as the number of elements that do
not differ divided by the length of the sequence; it was
calculated separately for each state as well as for the whole
sequence. The resulting models are able to predict animals’
actions with an average accuracy of 83.86%. The state S5
has the worst accuracy of all states, which is caused by a
delay: a model can relatively precisely detect that an action
has started, but the detection of the termination of the
action is delayed. Figure 3 visualizes the results of the
evaluation of the Viterbi sequences.</p>
        <p>In this subsection, we discuss the results and evaluation
of the cluster analysis. The visualisation of the distance
matrix is shown in Figure 4. The color of the point (i, j)
represents the distance between the i-th and the j-th animal’s
model; the lighter the color, the less similar the models are.
The distance matrix is used by both clustering
methods.</p>
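The element-wise accuracy described above can be sketched as:

```python
import numpy as np

def sequence_accuracy(true_seq, viterbi_seq):
    """Fraction of positions where the Viterbi sequence matches the real one."""
    t = np.asarray(true_seq)
    v = np.asarray(viterbi_seq)
    return float(np.mean(t == v))

# hypothetical sequences: one mismatch in four positions
acc = sequence_accuracy([0, 1, 1, 0], [0, 1, 0, 0])
```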
        <p>The linkages of hierarchical clustering can be visualized
by dendrograms. A dendrogram is used to visually estimate an
optimal number of clusters. The dendrogram of the
complete linkage is shown in Figure 5. According to it, the
animals can be assigned to several approximately equally
sized clusters.</p>
        <p>The evaluation of clustering results is a difficult task,
since there are no references for validation. However,
clusters can be validated by a domain expert based on
descriptive characteristics of the animals (e.g., age, weight).
These characteristics were not used for modeling, so
they cannot influence the clustering. Figures 6, 7 and 8
show results of the spectral clustering according to Shi and
Malik with a fully connected similarity graph with σ
equal to 0.65. The figures show differences in the
descriptive statistics of such characteristics
between individual clusters; for example, animals assigned
to cluster 1 have significantly higher values of the
characteristic than animals of other clusters. This indicates
that the clustering is reasonable and meaningful.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>Hidden Markov models proved to be applicable to
modeling animal behavior represented as a sequence of
states. According to the evaluation, the models are able
to predict animals’ actions with an average accuracy of
83.86%. Furthermore, the stationary distributions of the
resulting models were validated by a domain expert.
Cluster analysis was performed by classical clustering
algorithms using the approximation of the KL divergence. The
clustering produces meaningful results and groups the
animals into interpretable clusters.</p>
      <p>Prediction can be further improved with a better
definition of the states; in particular, the state S5 can be
divided into several disjoint states, which may also
positively influence the results of the clustering. Moreover,
we intend to investigate other modeling approaches
commonly used for sequences, such as dynamic Bayesian
networks and conditional random fields, as well as their
clustering possibilities.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Alon</surname>
          </string-name>
          , Stan Sclaroff, George Kollios, and
          <string-name>
            <given-names>Vladimir</given-names>
            <surname>Pavlovic</surname>
          </string-name>
          .
          <article-title>Discovering clusters in motion time-series data</article-title>
          .
          <source>In Computer Vision and Pattern Recognition</source>
          ,
          <year>2003</year>
          . Proceedings. 2003 IEEE Computer Society Conference on, volume
          <volume>1</volume>
          ,
          , pages I-
          <fpage>375</fpage>
          . IEEE,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Longbing</given-names>
            <surname>Cao</surname>
          </string-name>
          and
          <string-name>
            <surname>Philip S Yu</surname>
          </string-name>
          . Behavior computing. Springer, London,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Emanuele</given-names>
            <surname>Coviello</surname>
          </string-name>
          ,
          <string-name>
            <surname>Gert R Lanckriet</surname>
          </string-name>
          , and
          <string-name>
            <surname>Antoni</surname>
            <given-names>B Chan.</given-names>
          </string-name>
          <article-title>The variational hierarchical EM algorithm for clustering hidden Markov models</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>404</fpage>
          -
          <lpage>412</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Qi</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <surname>Xiao-Qing</surname>
            <given-names>Liu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tian-Ming</surname>
            <given-names>Wang</given-names>
          </string-name>
          , and
          <string-name>
            <given-names>Damir</given-names>
            <surname>Vukicevic</surname>
          </string-name>
          .
          <article-title>Linear regression model of dna sequences and its application</article-title>
          .
          <source>Journal of Computational Chemistry</source>
          ,
          <volume>28</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1434</fpage>
          -
          <lpage>1445</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Markus</given-names>
            <surname>Falkhausen</surname>
          </string-name>
          , Herbert Reininger, and
          <string-name>
            <given-names>Dietrich</given-names>
            <surname>Wolf</surname>
          </string-name>
          .
          <article-title>Calculation of distance measures between hidden Markov models</article-title>
          .
          <source>In In Proc. Eurospeech</source>
          , pages
          <fpage>1487</fpage>
          -
          <lpage>1490</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Poulton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Corke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Bishop-Hurley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wark</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Swain</surname>
          </string-name>
          .
          <article-title>Using accelerometer, high sample rate GPS and magnetometer data to develop a cattle movement and behaviour model</article-title>
          .
          <source>Ecological Modelling</source>
          ,
          <volume>220</volume>
          (
          <issue>17</issue>
          ):
          <fpage>2068</fpage>
          -
          <lpage>2075</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>John R.</given-names>
            <surname>Hershey</surname>
          </string-name>
          and
          <string-name>
            <given-names>Peder A.</given-names>
            <surname>Olsen</surname>
          </string-name>
          .
          <article-title>Variational Bhattacharyya divergence for hidden Markov models</article-title>
          .
          <source>In Acoustics, Speech and Signal Processing (ICASSP), 2008 IEEE International Conference on</source>
          , pages
          <fpage>4557</fpage>
          -
          <lpage>4560</lpage>
          . IEEE,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>John R.</given-names>
            <surname>Hershey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peder A.</given-names>
            <surname>Olsen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Steven J.</given-names>
            <surname>Rennie</surname>
          </string-name>
          .
          <article-title>Variational Kullback-Leibler divergence for hidden Markov models</article-title>
          .
          <source>In Automatic Speech Recognition &amp; Understanding (ASRU), 2007 IEEE Workshop on</source>
          , pages
          <fpage>323</fpage>
          -
          <lpage>328</lpage>
          . IEEE,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Tony</given-names>
            <surname>Jebara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yingbo</given-names>
            <surname>Song</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kapil</given-names>
            <surname>Thadani</surname>
          </string-name>
          .
          <article-title>Spectral clustering and embedding with hidden Markov models</article-title>
          .
          <source>In Machine Learning: ECML 2007</source>
          , pages
          <fpage>164</fpage>
          -
          <lpage>175</lpage>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Biing-Hwang Fred</given-names>
            <surname>Juang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lawrence R.</given-names>
            <surname>Rabiner</surname>
          </string-name>
          .
          <article-title>A probabilistic distance measure for hidden Markov models</article-title>
          .
          <source>AT&amp;T technical journal</source>
          ,
          <volume>64</volume>
          (
          <issue>2</issue>
          ):
          <fpage>391</fpage>
          -
          <lpage>408</lpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>John L.</given-names>
            <surname>Kelley</surname>
          </string-name>
          .
          <source>General Topology (Graduate Texts in Mathematics)</source>
          . Springer,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>John D.</given-names>
            <surname>Lafferty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>McCallum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Fernando C. N.</given-names>
            <surname>Pereira</surname>
          </string-name>
          .
          <article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          . pages
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Sang Wan</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yong Soo</given-names>
            <surname>Kim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Zeungnam</given-names>
            <surname>Bien</surname>
          </string-name>
          .
          <article-title>A nonsupervised learning framework of human behavior patterns based on sequential actions</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>22</volume>
          (
          <issue>4</issue>
          ):
          <fpage>479</fpage>
          -
          <lpage>492</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Maja</given-names>
            <surname>Matetić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Slobodan</given-names>
            <surname>Ribarić</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ivo</given-names>
            <surname>Ipšić</surname>
          </string-name>
          .
          <article-title>Qualitative modelling and analysis of animal behaviour</article-title>
          .
          <source>Applied Intelligence</source>
          ,
          <volume>21</volume>
          (
          <issue>1</issue>
          ):
          <fpage>25</fpage>
          -
          <lpage>44</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Bill</given-names>
            <surname>O'Brien</surname>
          </string-name>
          ,
          <string-name>
            <given-names>John</given-names>
            <surname>Dooley</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Thomas J.</given-names>
            <surname>Brazil</surname>
          </string-name>
          .
          <article-title>RF power amplifier behavioral modeling using a globally recurrent neural network</article-title>
          .
          .
          <source>In Microwave Symposium Digest, 2006 IEEE MTT-S International</source>
          , pages
          <fpage>1089</fpage>
          -
          <lpage>1092</lpage>
          . IEEE,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Vladimir</given-names>
            <surname>Pavlović</surname>
          </string-name>
          ,
          <string-name>
            <given-names>James M.</given-names>
            <surname>Rehg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tat-Jen</given-names>
            <surname>Cham</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kevin P.</given-names>
            <surname>Murphy</surname>
          </string-name>
          .
          <article-title>A dynamic Bayesian network approach to figure tracking using learned dynamic models</article-title>
          .
          <source>In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>94</fpage>
          -
          <lpage>101</lpage>
          . IEEE,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Amy L.</given-names>
            <surname>Sliva</surname>
          </string-name>
          .
          <article-title>Scalable techniques for behavioral analysis and forecasting</article-title>
          .
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Padhraic</given-names>
            <surname>Smyth</surname>
          </string-name>
          et al.
          <article-title>Clustering sequences with hidden Markov models</article-title>
          .
          <source>Advances in neural information processing systems</source>
          , pages
          <fpage>648</fpage>
          -
          <lpage>654</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Ulrike</given-names>
            <surname>von Luxburg</surname>
          </string-name>
          .
          <article-title>A tutorial on spectral clustering</article-title>
          .
          <source>Statistics and computing</source>
          ,
          <volume>17</volume>
          (
          <issue>4</issue>
          ):
          <fpage>395</fpage>
          -
          <lpage>416</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Nelleke</given-names>
            <surname>de Weerd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Frank</given-names>
            <surname>van Langevelde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Herman</given-names>
            <surname>van Oeveren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bart A.</given-names>
            <surname>Nolet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Kölzsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Herbert H. T.</given-names>
            <surname>Prins</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. F.</given-names>
            <surname>de Boer</surname>
          </string-name>
          .
          <article-title>Deriving animal behaviour from high-frequency GPS: Tracking cows in open and forested habitat</article-title>
          .
          <source>PLoS One</source>
          ,
          <volume>10</volume>
          (
          <issue>6</issue>
          ),
          <year>June 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Jie</given-names>
            <surname>Yin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Qiang</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>Integrating hidden Markov models and spectral analysis for sensory time series clustering</article-title>
          .
          <source>In Data Mining, Fifth IEEE International Conference on</source>
          ,
          <fpage>8</fpage>
          pp. IEEE,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.-X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Structured Support Vector Machines for Speech Recognition</article-title>
          .
          <source>PhD thesis</source>
          , Cambridge University,
          <year>March 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Shi</given-names>
            <surname>Zhong</surname>
          </string-name>
          .
          <article-title>Semi-supervised sequence classification with HMMs</article-title>
          .
          <source>In Valerie Barr and Zdravko Markov</source>
          , editors,
          <source>FLAIRS Conference</source>
          , pages
          <fpage>568</fpage>
          -
          <lpage>574</lpage>
          . AAAI Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Shi</given-names>
            <surname>Zhong</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joydeep</given-names>
            <surname>Ghosh</surname>
          </string-name>
          .
          <article-title>A unified framework for model-based clustering</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <volume>4</volume>
          :
          <fpage>1001</fpage>
          -
          <lpage>1037</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>