<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Framework for Study of Caching Algorithm Efficiency.</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>supervisor: Michael V. Grankov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Mosab Bassam Y. Al Zgool Don State Technical University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Proceedings of the Spring Young Researcher's Colloquium On Database and Information Systems SYRCoDIS</institution>
          ,
          <addr-line>St.-Petersburg, Russia, 2008</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Thanh Hung Ngo Don State Technical University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we offer several models of reference sequences (traces of references) using Markov chains for testing of the replacement policies in caching systems. These models enable the generations of traces with the repeated subsequences of references, which are of great interest for study of forecasting methods in caching systems. Furthermore, we offer the scheme of the program stand, where these models have been realized, and result of the experiments, which have been carried out with its help. Index terms: program stand, model of reference sequences, Markov chains, traces with repeated subsequences.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>1 Introduction
The most popular method to study the replacement
policies in caching systems is its testing by the program
simulating caching system. In this approach different
traces are given to the input of the program and the
cache-hit rate is registered as the result provided
cachesystem. The more the value of this rate is, the more
efficient the investigated policy performs.</p>
      <p>Specific characters of the functioning of the
information system often render a big influence upon
behavior of the reference sequences. So testing the
policy, have been specially designed for a certain
information system, requires the traces, reflecting
specifics of the system. There are two ways of achieving
the reference sequences: by means of logging the
accesses being executed in the system for a long-lasting
time (e.g. collecting reference sequences) [1, 2] and by
means of trace generating programs [3, 4]. The traces
having been achieved using the first method more really
reflect the specifics of the system, but their achievement
requires a significant computing and material resources.
The achievement of traces using program-generator is
faster and cheaper. The main problem of the last method
is the development of mathematical model, describing
the information system.</p>
      <p>In this paper we offer several models of reference
sequences using Markov chains [5]. Besides, we offer
the scheme of the program stand, where these models
have been realized, and the result of the experiments,
which have been carried out with its help.
2 Models of reference sequences
2.1 Model with One Markov Chain (MOMC)
specified by the triple (S , A, π ) , where:
The model is widely used for modeling of different
queuing systems, including caching system. MOMC is
1. S = {s1 , s 2 , … , s N } - the set of N objects in
the system. Objects may be the papers in paper caching
systems or the objects in the object caching systems;
2. A = {a ij } - the object transition probabilities.
The value of element aij of matrix equals to the
probability of event, when the system, having accessed
the object si , accesses the object s j . If mark the
object having been accessed at the moment t as qt ,</p>
      <p>N
aij ≥ 0, ∑ a ij = 1, 1 ≤ i, j ≤ N ;</p>
      <p>j=1
3.</p>
      <p>π = (π 1 , π 2 ,… , π N )
- the initial object
distribution.</p>
      <p>The generation of traces by MOMC follows the
algorithm:
1. Input S , A , π .
2. Input the count of references L (length of trace).
3. Initialize value for the variable O , which is the
string logging the references to the object, as an empty
string.</p>
      <p>4. Initialize value for the variable t , which is the
current count of references:</p>
      <p>t = 1 .</p>
      <p>5. Choose the initial object qt from the set S
according to the initial object distribution π .
6. WHILE t ≤ L</p>
      <p>Let qt = sk , append sk to the string O :</p>
      <p>O = O + sk .</p>
      <p>8. Choose the next object according to the matrix
of object transition probabilities A and the current
object sk .</p>
      <p>9. Increase the value of the current count of
references t by one:
λ
which references to the objects obey the uniform
law (as in occasion 1). Thus the first and the
second occasions are just the special cases of the
third occasion.</p>
      <p>
        For example we have brought into the matrix in
table 1 the noise with the following parameters
(λ = 7, x = 0.0198) and have generated traces
with its help. The matrix of transition
probabilities now looks like in table 2. Some of
generated traces are as follow:
(
        <xref ref-type="bibr" rid="ref1 ref1 ref2 ref2 ref2 ref2 ref3 ref3 ref3 ref4 ref4 ref4 ref4 ref4 ref4">1, 4, 2, 3, 4, 2, 3, 4, 4, 2, 3, 1, 4, 2, 4</xref>
        );
ln(N )
      </p>
      <p>Because of one’s simplicity the MOMC is not able
to describe the functioning principles of the real
systems. To specify the more complicated mechanisms
it is used the hidden Markov model.
2.2 Model with Hidden Markov Chain (MHMC)
MHMC is specified by the quintuple (S , V , A, B , π ) ,
where:</p>
      <p>1. S = {s1 , s 2 ,… , sU } - the set from U users of
the information system;</p>
      <p>2. A = {a ij } - the user transition probabilities. If
mark the user, which has got access to the system at the
moment t , by qt , then:
of the system;</p>
      <p>4. B = {bij } - probabilities of choosing object for
the users. The i -th row is the object choice probability
distribution for the i -th user. The value of element bij
equals to the probability of event, when the current user
qt = si chooses the object v j . If mark this object as ot ,
then:
bij = P (o t = v j | q t = s i ) ,</p>
      <p>N
bij ≥ 0, ∑ bij = 1, 1 ≤ i ≤ U , 1 ≤ j ≤ N ;</p>
      <p>j=1
5.</p>
      <p>π = (π 1 , π 2 , … , π N
)
- the initial user
distribution.</p>
      <p>The generation of traces by MHMC follows the
algorithm:
1. Input S , V , A , B , π .
2. Input the count of references L (length of trace).
3. Initialize value for the variable O , which is the
string logging the references to the object, as an empty
string.</p>
      <p>4. Initialize value for the variable t , which is the
current count of references: t = 1 .</p>
      <p>5. Choose the initial user qt from the set S
according to the initial user distribution π .</p>
      <p>6. WHILE t ≤ L
7. Let qt = sk .</p>
      <p>8. Choose the current object ot according to the
matrix B and the current user sk .</p>
      <p>9. Let ot = vl , append vl to the string O :</p>
      <p>O = O + vl .</p>
      <p>10. Choose the next user according to the matrix of
user transition probabilities A and the current user sk .</p>
      <p>11. Increase the value of the current count of
references t by one: t = t + 1 .</p>
      <p>12. END-WHILE.</p>
      <p>13. Save the achieved reference string O as the
desired trace.</p>
      <p>The second model more naturally reflects the
relationship between the elements of the information
system: each user accesses to the objects due to his logic
(his distribution law).</p>
      <p>Both of the described models have disadvantages.
The first one is not taken into account the presence of
the users, and the second one is not able to model the
cyclic traces. For elimination of these restrictions we
offer the two-level model of Markov chains.
2.3 Two-Level Model of Markov Chains (TLMMC)
TLMMC
(S , V , AS , AV , π S
is
specified by
, π V ) , where:
1. S = {s1 , s 2 , … , sU } - the set from U
users of
the information system;</p>
      <p>2. V = {v1 , v 2 , … , v N } - the set from
of the system;</p>
      <p>3. AS = {a ij } - the user transition probabilities. If
mark the user, which has got access to the system at the
a ij = P (q t +1 = s j | q t = s i ) ,
4.</p>
      <p>AV = (A 1 , A 2 , … , A U ) - matrix of object
transition probabilities for the users, where:</p>
      <p>A k = {a ikj } - matrix of
object
transition
probabilities for the k -th user. Value of the element a kij
equals to the probability of event, when the k -th user,
having chosen the i -th object, chooses the j -th object.
If mark the object, being chosen by the k -th user at his
r -th choice, as or , then</p>
      <p>a ikj = P (o r +1 = v j | ((o r = v i ) ∧ (q t = s k ))) ,</p>
      <p>π S = (π1 , π 2 , … , πU )</p>
      <p>N
a ikj ≥ 0, ∑j=1 a ikj = 1, 1 ≤ i , j ≤ N , 1 ≤ k ≤ U ;
- the
initial
user
distribution;</p>
      <p>6. π V = (π 1 ,π 2 , … ,π U ) - the initial object
distributions for the users, where:
π k = (π 1k ,π 2k , … ,π kN ) - the initial object
distribution for the k -th user. 1 ≤ k ≤ U .</p>
      <p>The generation of traces by MHMC follows the
algorithm:
1. Input: S , V , AS , A 1 , A 2 , … , A U , and
πS , π1 , π2 , …, πU .
2. Input the count of references L (length of trace).
3. Initialize value for the variable O , which is the
string logging the references to the object, as an empty
string.</p>
      <p>4. Initialize value for the variable l , which is the
current count of references: l = 1 .</p>
      <p>5. Initialize value for the variable t , which is the
count of the user turns: t = 1 .
according to the initial user distribution π .</p>
      <p>S
6. Choose the initial user qt from the set S
7. WHILE l ≤ L
8.</p>
      <p>Let qt = sk .</p>
      <p>One of the traces achieved with help of the model
looks like: 4, 2, 3, 1, 3, 1, 4, 2, 4, 2, 3, 1, 1, 2, 2, 3, 1, 4,
1, 1, 2, 3, 1, 4, 2, 1, 4, 2. In this trace the absolute cyclic
trace (…, 1, 4, 2, 3, 1, 4, 2, 3, …) has been distorted and
fragmented into subsequences due to the random turns
of the users and the random number of references
having been made in their turns. These distortions create
more difficulties for the methods identifying the
subsequences in the references sequence. So the model
TLMMC is of great interest to investigate the prediction
methods in caching system [6, 7].
3 The scheme of program stand
The program stand has been written in programming
environment Delphi 7. It has following components (fig.
1):
1. Trace - generator.</p>
      <p>Generate traces by the one of different models.
2. Realization block of replacement policies.
3. Cache simulator.</p>
      <p>Simulate the performance of cache-system with the
chosen policy and chosen traces model.
4. Analysis block
Analyze the result of performance of cache-system.
Functioning of the program stand has been illustrated by
benchmark analysis of performance of different
replacement policies: Random, LRU and LFU. The
result of the experiment is showed in fig. 2.</p>
      <p>In the investigation it has been used the model
TLMMC with the following parameters:</p>
      <p>S = {1, 2}; V = {1, 2, … , 100 }; π S = (.5, .5 );
π1 = (.01, .01, … , .01 ); π 2 = (.01, .01, … , .01 );</p>
      <p>Matrix A1 is specified as matrix of cyclic traces
with a random “noise” (looks like the matrix in tab. 2).
When x = 0 the absolute cyclic trace, being generated
according to this matrix, looks like:
⎛1,2,3,4,87,35,7,99,9,76,13,12,11,14,15,34,78, ⎞
⎜ ⎟
⎜30,74,20,29,5,23,64,80,100,79,28,21,53,6,26,⎟
⎜⎜33,32,44,17,81,65,51,73,41,42,43,31,48,46, ⎟⎟
⎜ 47,22,49,45,96,52,18,54,55,94,19,60,90,25, ⎟
⎜ ⎟
⎜92,62,16,24,40,63,67,68,69,75,71,72,56,61, ⎟
⎜ 70,10,77,86,38,58,37,82,83,84,27,36,50,88, ⎟
⎜⎜⎝89,59,57,91,93,39,95,85,97,98,8,66 ⎟⎟⎠
Matrix A2 determines for the second user the
equiprobable transition from an object to any objects,
including the transition to it self, i.e.
ai2j = 1 / 100, ∀i, j : 1 ≤ i, j ≤ 100 .</p>
      <p>As magnitude of the “noise” coefficient it has been
used three values: x = 0 ; x = 0.1054 and x = 4.6052 .
In all cases λ = 1 . When x = 0 , in generated traces
present the repeated subsequences. This fact has
negatively influenced upon performance of Random and
LRU, but not for LFU. In this case it is registered the
slight advantage of Random over LRU and significant
advantage of LFU over the rest. Increasing the
magnitude of x , the indeterminacy in generation of the
traces rapidly increases. This change positively
influences upon performance of Random and LRU, and
equalizes the performance of all policies.
The result of the demonstration experiment shows
urgency and relevance of the offered traces models and
also program stand for study of different replacement
policies. The TLMMC presents a great interest to study
the forecasting method in cache-system. The traces
generated by this model were applied to testing the
decomposition method of the relations in database
systems with the criteria of increasing the cache-hit rate
[8]. Herein, when the magnitude of x reaches its
maximum value, the experimental cache-hit rate
completely agrees with the corresponding rate
theoretically has been achieved in that paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Min</given-names>
            <surname>Xu</surname>
          </string-name>
          , Vyacheslav Malyugin, Jeffrey Sheldon, Ganesh Venkitachalam, Boris Weissman.
          <article-title>ReTrace: Collecting Execution Trace with Virtual Machine Deterministic Replay</article-title>
          . // Third Annual Workshop on Modeling,
          <source>Benchmarking and Simulation, held in conjunction with the 34th Annual International Symposium on Computer Architecture</source>
          ,
          <year>June 2007</year>
          . http://www.xhfamily.com/x/files/MoBS07_
          <article-title>replay_i trace</article-title>
          .pdf
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Hervé</given-names>
            <surname>Touati</surname>
          </string-name>
          ,
          <source>Alan Jay Smith. Reducing and Manipulating Complex Trace Data. // Software - Practice and Experience</source>
          . Vol.
          <volume>21</volume>
          ,
          <year>June 1991</year>
          ,
          <fpage>639</fpage>
          -
          <lpage>655</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Eeckhout</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Bosschere</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neefs</surname>
            <given-names>H</given-names>
          </string-name>
          .
          <article-title>Performance analysis through synthetic trace generation</article-title>
          .
          <source>// Performance Analysis of Systems and Software</source>
          ,
          <year>2000</year>
          . ISPASS. IEEE International Symposium on Volume,
          <source>Issue</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Germán</given-names>
            <surname>Galeano</surname>
          </string-name>
          <string-name>
            <given-names>Gil</given-names>
            , Juan A.
            <surname>Gómez Pulido</surname>
          </string-name>
          , and
          <string-name>
            <surname>Juan</surname>
            <given-names>M. Sánchez</given-names>
          </string-name>
          <string-name>
            <surname>Pérez</surname>
          </string-name>
          .
          <article-title>Tool for the Analysis and Memory-Trace Generation of DOS Executable Files</article-title>
          . // Microprocessors and Microsystems, Volume
          <volume>22</volume>
          ,
          <string-name>
            <surname>Issue</surname>
            <given-names>7</given-names>
          </string-name>
          ,
          <issue>25</issue>
          <year>January 1999</year>
          , Pages
          <fpage>389</fpage>
          -
          <lpage>393</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Lawrence</surname>
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Rabiner</surname>
          </string-name>
          .
          <article-title>A tutorial on Hidden Markov Models and selected applications in speech recognition</article-title>
          .
          <source>// Proceedings of the IEEE</source>
          , VOL.
          <volume>77</volume>
          , NO. 2,
          <string-name>
            <surname>Feb</surname>
          </string-name>
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Fei</given-names>
            <surname>Guo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yan</given-names>
            <surname>Solihin</surname>
          </string-name>
          .
          <article-title>A Prediction Model for Alternative Cache Replacement Policies</article-title>
          . // http://www.ece.ncsu.edu
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Thomas</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Kroeger</surname>
            ,
            <given-names>Darrell D. E.</given-names>
          </string-name>
          <string-name>
            <surname>Long</surname>
          </string-name>
          .
          <source>Predicting File System Actions from Prior Events. // Proceedings of the USENIX 1996 Annual Technical Conference</source>
          , pages
          <fpage>319</fpage>
          -
          <lpage>328</lpage>
          ,
          <year>January 1996</year>
          . http://citeseer.ist.psu.edu/kroeger96predicting.html
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Thanh</given-names>
            <surname>Hung</surname>
          </string-name>
          <string-name>
            <surname>Ngo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael V.</given-names>
            <surname>Grankov</surname>
          </string-name>
          .
          <article-title>New Object Function for Vertical Partitioning in Database System</article-title>
          . // the present colloquium.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>