
The Framework for Study of Caching Algorithm Efficiency.

Supervisor: Michael V. Grankov

Mosab Bassam Y. Al Zgool, Don State Technical University
Thanh Hung Ngo, Don State Technical University
Proceedings of the Spring Young Researcher's Colloquium on Database and Information Systems (SYRCoDIS), St. Petersburg, Russia, 2008

In this paper we propose several models of reference sequences (traces of references) based on Markov chains for testing replacement policies in caching systems. These models enable the generation of traces with repeated subsequences of references, which are of great interest for studying forecasting methods in caching systems. Furthermore, we present the scheme of the program stand in which these models have been implemented, and the results of the experiments carried out with it.

Index terms: program stand, model of reference sequences, Markov chains, traces with repeated subsequences.

1 Introduction

The most popular method of studying replacement policies in caching systems is to test them with a program that simulates the caching system. In this approach, different traces are fed to the input of the program, and the cache-hit rate achieved by the cache system is recorded as the result. The higher this rate, the more efficient the investigated policy.

Specific characteristics of how an information system functions often strongly influence the behavior of its reference sequences. Therefore, testing a policy that has been specially designed for a certain information system requires traces that reflect the specifics of that system. There are two ways of obtaining reference sequences: logging the accesses executed in the system over a long period (i.e. collecting reference sequences) [1, 2], and using trace-generating programs [3, 4]. Traces obtained with the first method reflect the specifics of the system more faithfully, but obtaining them requires significant computing and material resources. Obtaining traces with a program generator is faster and cheaper. The main problem of the latter method is developing a mathematical model that describes the information system.

In this paper we propose several models of reference sequences based on Markov chains [5]. We also present the scheme of the program stand in which these models have been implemented, and the results of the experiments carried out with it.

2 Models of reference sequences

2.1 Model with One Markov Chain (MOMC)

This model is widely used for modeling various queuing systems, including caching systems. MOMC is specified by the triple $(S, A, \pi)$, where:
1. $S = \{s_1, s_2, \dots, s_N\}$ is the set of $N$ objects in the system. Objects may be pages in page caching systems or objects in object caching systems;
2. $A = \{a_{ij}\}$ is the matrix of object transition probabilities. Element $a_{ij}$ equals the probability that the system, having accessed object $s_i$, next accesses object $s_j$. If $q_t$ denotes the object accessed at moment $t$, then
$$a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i), \quad a_{ij} \ge 0, \quad \sum_{j=1}^{N} a_{ij} = 1, \quad 1 \le i, j \le N;$$
3. $\pi = (\pi_1, \pi_2, \dots, \pi_N)$ is the initial object distribution.
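Because each reference in MOMC depends only on the previous one, the probability of a particular trace factorizes into the initial probability and a product of transition probabilities (the standard Markov-chain identity; see also [5]). Writing $q_t = s_{i_t}$:

$$P(q_1 = s_{i_1}, q_2 = s_{i_2}, \dots, q_L = s_{i_L}) = \pi_{i_1} \prod_{t=1}^{L-1} a_{i_t i_{t+1}}.$$

For example, if every row of $A$ contains a single 1, the chain is deterministic and, for each initial object, exactly one trace has nonzero probability.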

The generation of traces by MOMC follows the algorithm:
1. Input $S$, $A$, $\pi$.
2. Input the number of references $L$ (the length of the trace).
3. Initialize the variable $O$, the string that logs the references to objects, as an empty string.
4. Initialize the variable $t$, the current count of references: $t = 1$.
5. Choose the initial object $q_t$ from the set $S$ according to the initial object distribution $\pi$.
6. WHILE $t \le L$
7. Let $q_t = s_k$; append $s_k$ to the string $O$: $O = O + s_k$.
8. Choose the next object according to the matrix of object transition probabilities $A$ and the current object $s_k$.
9. Increase the current count of references $t$ by one: $t = t + 1$.

… $\lambda$, with which the references to the objects obey the uniform law (as in case 1). Thus the first and second cases are just special cases of the third.

For example, we introduced noise with the parameters $\lambda = 7$, $x = 0.0198$ into the matrix in Table 1 and generated traces with it. The resulting matrix of transition probabilities is shown in Table 2. Some of the generated traces are as follows: (1, 4, 2, 3, 4, 2, 3, 4, 4, 2, 3, 1, 4, 2, 4).
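The MOMC trace-generation algorithm described in this section can be sketched in Python as follows. This is a minimal illustration, not the program stand's Delphi implementation: the function name `generate_momc_trace` and its parameters are hypothetical, objects are represented by integers, and row/column indices of `A` and positions in `pi` follow the order of `objects`.

```python
import random
from typing import List, Optional, Sequence


def generate_momc_trace(
    objects: Sequence[int],
    A: Sequence[Sequence[float]],
    pi: Sequence[float],
    L: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Generate a trace of L references from the MOMC triple (objects, A, pi)."""
    if rng is None:
        rng = random.Random()
    indices = range(len(objects))
    trace: List[int] = []                          # the reference string O
    k = rng.choices(indices, weights=pi)[0]        # step 5: initial object drawn from pi
    for _ in range(L):                             # steps 6 and 9: t = 1, ..., L
        trace.append(objects[k])                   # step 7: O = O + s_k
        k = rng.choices(indices, weights=A[k])[0]  # step 8: next object drawn from row k of A
    return trace


# Illustrative deterministic cycle 1 -> 4 -> 2 -> 3 -> 1 (row i holds the successor of object i + 1).
A_cycle = [[0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0]]
print(generate_momc_trace([1, 2, 3, 4], A_cycle, [1, 0, 0, 0], 8))  # prints [1, 4, 2, 3, 1, 4, 2, 3]
```

With a one-hot matrix such as `A_cycle` the trace is a pure repetition of the cycle; spreading probability mass over other entries of each row (as the noise does) makes the repetitions irregular.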

Because of its simplicity, MOMC cannot describe the functioning principles of real systems. To specify more complicated mechanisms, the hidden Markov model is used.

2.2 Model with Hidden Markov Chain (MHMC)

MHMC is specified by the quintuple $(S, V, A, B, \pi)$, where:

1. $S = \{s_1, s_2, \dots, s_U\}$ is the set of $U$ users of the information system;
2. $A = \{a_{ij}\}$ is the matrix of user transition probabilities. If $q_t$ denotes the user who accesses the system at moment $t$, then $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$;
3. $V = \{v_1, v_2, \dots, v_N\}$ is the set of $N$ objects of the system;
4. $B = \{b_{ij}\}$ is the matrix of probabilities with which users choose objects. The $i$-th row is the object-choice probability distribution of the $i$-th user. Element $b_{ij}$ equals the probability that the current user $q_t = s_i$ chooses the object $v_j$. If $o_t$ denotes this object, then
$$b_{ij} = P(o_t = v_j \mid q_t = s_i), \quad b_{ij} \ge 0, \quad \sum_{j=1}^{N} b_{ij} = 1, \quad 1 \le i \le U,\ 1 \le j \le N;$$
5. $\pi = (\pi_1, \pi_2, \dots, \pi_U)$ is the initial user distribution.
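For MHMC, the joint probability of a user sequence and the trace it produces factorizes as in a standard hidden Markov model (see [5]). With $q_t = s_{i_t}$ and $o_t = v_{j_t}$:

$$P(o_1, \dots, o_L, q_1, \dots, q_L) = \pi_{i_1}\, b_{i_1 j_1} \prod_{t=2}^{L} a_{i_{t-1} i_t}\, b_{i_t j_t}.$$

The probability of the trace alone is obtained by summing this expression over all $U^L$ possible user sequences.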

The generation of traces by MHMC follows the algorithm:
1. Input $S$, $V$, $A$, $B$, $\pi$.
2. Input the number of references $L$ (the length of the trace).
3. Initialize the variable $O$, the string that logs the references to objects, as an empty string.
4. Initialize the variable $t$, the current count of references: $t = 1$.
5. Choose the initial user $q_t$ from the set $S$ according to the initial user distribution $\pi$.
6. WHILE $t \le L$
7. Let $q_t = s_k$.
8. Choose the current object $o_t$ according to the matrix $B$ and the current user $s_k$.
9. Let $o_t = v_l$; append $v_l$ to the string $O$: $O = O + v_l$.
10. Choose the next user according to the matrix of user transition probabilities $A$ and the current user $s_k$.
11. Increase the current count of references $t$ by one: $t = t + 1$.
12. END-WHILE.
13. Save the obtained reference string $O$ as the desired trace.
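The MHMC algorithm above can be sketched in Python in the same way. This is an illustrative sketch under stated assumptions, not the stand's implementation: `generate_mhmc_trace` and its parameters are hypothetical names, users are identified by their row index in `A` and `B`, and objects by their position in `objects`.

```python
import random
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def generate_mhmc_trace(
    objects: Sequence[int],
    A: Matrix,
    B: Matrix,
    pi: Sequence[float],
    L: int,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Generate L references from MHMC: users follow A, each user picks objects from his row of B."""
    if rng is None:
        rng = random.Random()
    users = range(len(pi))
    obj_indices = range(len(objects))
    trace: List[int] = []                              # the reference string O
    k = rng.choices(users, weights=pi)[0]              # step 5: initial user drawn from pi
    for _ in range(L):                                 # steps 6 and 11: t = 1, ..., L
        j = rng.choices(obj_indices, weights=B[k])[0]  # step 8: object chosen by user k (row k of B)
        trace.append(objects[j])                       # step 9: O = O + v_j
        k = rng.choices(users, weights=A[k])[0]        # step 10: next user drawn from row k of A
    return trace
```

For instance, with two users who alternate deterministically (`A = [[0, 1], [1, 0]]`) and always pick objects 10 and 30 respectively (`B = [[1, 0, 0], [0, 0, 1]]` over `objects = [10, 20, 30]`), the sketch returns `[10, 30, 10, 30, 10]` for `L = 5`. Any order among objects comes only from the switching of users, since each object is drawn from the current user's row of `B` alone.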

The second model reflects the relationship between the elements of the information system more naturally: each user accesses objects according to his own logic (his own distribution law).

Both described models have disadvantages. The first does not take the presence of users into account, and the second cannot model cyclic traces, because in MHMC the object $o_t$ depends only on the current user $q_t$ and not on the previously chosen object. To eliminate these restrictions we propose the two-level model of Markov chains.

2.3 Two-Level Model of Markov Chains (TLMMC)

TLMMC is specified by the sextuple $(S, V, A_S, A_V, \pi_S, \pi_V)$, where:
1. $S = \{s_1, s_2, \dots, s_U\}$ is the set of $U$ users of the information system;
2. $V = \{v_1, v_2, \dots, v_N\}$ is the set of $N$ objects of the system;
3. $A_S = \{a_{ij}\}$ is the matrix of user transition probabilities. If $q_t$ denotes the user who has access to the system at moment $t$, then $a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$;
4. $A_V = (A^1, A^2, \dots, A^U)$ is the set of object transition probability matrices of the users, where $A^k = \{a^k_{ij}\}$ is the matrix of object transition probabilities of the $k$-th user. Element $a^k_{ij}$ equals the probability that the $k$-th user, having chosen the $i$-th object, next chooses the $j$-th object. If $o_r$ denotes the object chosen by the $k$-th user at his $r$-th choice, then
$$a^k_{ij} = P\big(o_{r+1} = v_j \mid (o_r = v_i) \wedge (q_t = s_k)\big), \quad a^k_{ij} \ge 0, \quad \sum_{j=1}^{N} a^k_{ij} = 1, \quad 1 \le i, j \le N,\ 1 \le k \le U;$$
5. $\pi_S = (\pi_1, \pi_2, \dots, \pi_U)$ is the initial user distribution;
6. $\pi_V = (\pi^1, \pi^2, \dots, \pi^U)$ is the set of initial object distributions of the users, where $\pi^k = (\pi^k_1, \pi^k_2, \dots, \pi^k_N)$ is the initial object distribution of the $k$-th user, $1 \le k \le U$.

The generation of traces by TLMMC follows the algorithm:
1. Input $S$, $V$, $A_S$, $A^1, A^2, \dots, A^U$, and $\pi_S, \pi^1, \pi^2, \dots, \pi^U$.
2. Input the number of references $L$ (the length of the trace).
3. Initialize the variable $O$, the string that logs the references to objects, as an empty string.
4. Initialize the variable $l$, the current count of references: $l = 1$.
5. Initialize the variable $t$, the count of user turns: $t = 1$.
6. Choose the initial user $q_t$ from the set $S$ according to the initial user distribution $\pi_S$.
7. WHILE $l \le L$
8. Let $q_t = s_k$.
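The rule that determines how many references a user makes during one turn is not specified here, so the following Python sketch of TLMMC takes it as a hypothetical parameter `turn_length`. It also assumes that each user resumes his own object chain where his previous turn ended, consistent with $o_r$ being indexed by the $k$-th user's own choices, and that every turn contains at least one reference. The function and parameter names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def generate_tlmmc_trace(
    objects: Sequence[int],
    A_S: Matrix,
    A_V: Sequence[Matrix],
    pi_S: Sequence[float],
    pi_V: Sequence[Sequence[float]],
    L: int,
    turn_length: Callable[[random.Random], int],  # hypothetical: references made in one turn
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sketch of TLMMC: a user chain (A_S, pi_S) selects whose turn it is; each user k
    walks his own object chain (A_V[k], pi_V[k]) and resumes it on his next turn."""
    if rng is None:
        rng = random.Random()
    users, obj_indices = range(len(pi_S)), range(len(objects))
    last: List[Optional[int]] = [None] * len(pi_S)  # last object index chosen by each user (o_r)
    trace: List[int] = []                            # the reference string O; l = len(trace) + 1
    k = rng.choices(users, weights=pi_S)[0]          # initial user drawn from pi_S
    while len(trace) < L:                            # WHILE l <= L
        for _ in range(max(1, turn_length(rng))):    # assumption: at least one reference per turn
            if len(trace) == L:
                break
            prev = last[k]
            if prev is None:                         # user's first choice: drawn from pi^k
                j = rng.choices(obj_indices, weights=pi_V[k])[0]
            else:                                    # o_{r+1} drawn from row o_r of A^k
                j = rng.choices(obj_indices, weights=A_V[k][prev])[0]
            last[k] = j
            trace.append(objects[j])
        k = rng.choices(users, weights=A_S[k])[0]    # next user turn (t = t + 1)
    return trace
```

As an illustration, take two alternating users (`A_S = [[0, 1], [1, 0]]`, `pi_S = [1, 0]`), turns of exactly two references, a first user on the cycle 1 → 4 → 2 → 3 → 1 starting at object 1, and a second user who always references object 3. The sketch then returns `[1, 4, 3, 3, 2, 3, 3, 3, 1, 4]` for `L = 10`: the first user's cycle survives only as fragments interleaved with the other user's references.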

One of the traces obtained with this model is: 4, 2, 3, 1, 3, 1, 4, 2, 4, 2, 3, 1, 1, 2, 2, 3, 1, 4, 1, 1, 2, 3, 1, 4, 2, 1, 4, 2. In this trace the absolute cyclic trace (…, 1, 4, 2, 3, 1, 4, 2, 3, …) has been distorted and fragmented into subsequences by the random turns of the users and the random number of references made in each turn. These distortions make it harder for methods that identify subsequences in the reference sequence. Therefore TLMMC is of great interest for investigating prediction methods in caching systems [6, 7].

3 The scheme of the program stand

The program stand has been written in the Delphi 7 programming environment. It has the following components (fig. 1):
1. Trace generator: generates traces using one of the models described above.
2. Replacement policy block: implements the replacement policies.
3. Cache simulator: simulates the performance of the cache system with the chosen policy and the chosen trace model.
4. Analysis block: analyzes the results of the cache system's performance.

The functioning of the program stand has been illustrated by a benchmark comparison of the performance of different replacement policies: Random, LRU and LFU. The result of the experiment is shown in fig. 2.
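The cache simulator and the three policies compared in the experiment can be sketched in Python as follows. This is not the stand's Delphi code; the function `hit_rate`, the `capacity` parameter (the cache size is not stated in the text) and the LFU variant used here (reference counts kept only while an object is cached, ties broken in LRU order) are assumptions. The cache-hit rate is computed as the number of hits divided by the length of the trace.

```python
import random
from collections import Counter, OrderedDict
from typing import Optional, Sequence


def hit_rate(
    trace: Sequence[int],
    capacity: int,
    policy: str,
    rng: Optional[random.Random] = None,
) -> float:
    """Simulate a cache of `capacity` objects under Random, LRU or LFU; return hits / len(trace)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if policy not in ("Random", "LRU", "LFU"):
        raise ValueError(f"unknown policy: {policy}")
    if rng is None:
        rng = random.Random()
    cache: "OrderedDict[int, None]" = OrderedDict()  # iteration order: least to most recently used
    freq: Counter = Counter()                        # reference counts of cached objects (for LFU)
    hits = 0
    for obj in trace:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)                   # obj becomes the most recently used
        else:
            if len(cache) == capacity:               # cache full: choose a victim
                if policy == "LRU":
                    victim = next(iter(cache))       # least recently used object
                elif policy == "LFU":
                    victim = min(cache, key=lambda o: freq[o])  # fewest references; ties in LRU order
                else:
                    victim = rng.choice(list(cache)) # Random: any cached object
                del cache[victim]
                del freq[victim]                     # assumption: LFU count resets on eviction
            cache[obj] = None
        freq[obj] += 1
    return hits / len(trace) if trace else 0.0
```

For example, on the trace `[1, 1, 2, 3, 1]` with `capacity=2`, LRU evicts object 1 when object 3 arrives and scores 0.2, while LFU keeps the twice-referenced object 1 and scores 0.4.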

The investigation used TLMMC with the following parameters:

$$S = \{1, 2\};\quad V = \{1, 2, \dots, 100\};\quad \pi_S = (0.5, 0.5);\quad \pi^1 = (0.01, 0.01, \dots, 0.01);\quad \pi^2 = (0.01, 0.01, \dots, 0.01).$$

Matrix $A^1$ is specified as a matrix of cyclic traces with random “noise” (it looks like the matrix in Table 2). When $x = 0$, the absolute cyclic trace generated according to this matrix is:

(1, 2, 3, 4, 87, 35, 7, 99, 9, 76, 13, 12, 11, 14, 15, 34, 78, 30, 74, 20, 29, 5, 23, 64, 80, 100, 79, 28, 21, 53, 6, 26, 33, 32, 44, 17, 81, 65, 51, 73, 41, 42, 43, 31, 48, 46, 47, 22, 49, 45, 96, 52, 18, 54, 55, 94, 19, 60, 90, 25, 92, 62, 16, 24, 40, 63, 67, 68, 69, 75, 71, 72, 56, 61, 70, 10, 77, 86, 38, 58, 37, 82, 83, 84, 27, 36, 50, 88, 89, 59, 57, 91, 93, 39, 95, 85, 97, 98, 8, 66)

Matrix $A^2$ gives the second user equiprobable transitions from any object to any object, including the transition to itself, i.e. $a^2_{ij} = 1/100$ for all $i, j$ with $1 \le i, j \le 100$.
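For the case $x = 0$, the two object transition matrices of this experiment can be built as in the following Python sketch. The noise term is not modeled here, the helper names `cyclic_matrix` and `uniform_matrix` are hypothetical, and it is assumed that the last object of the cycle (66) is followed by the first (1).

```python
from typing import List, Sequence

# Cycle followed by user 1 when x = 0, copied from the text above.
CYCLE = [
    1, 2, 3, 4, 87, 35, 7, 99, 9, 76, 13, 12, 11, 14, 15, 34, 78,
    30, 74, 20, 29, 5, 23, 64, 80, 100, 79, 28, 21, 53, 6, 26,
    33, 32, 44, 17, 81, 65, 51, 73, 41, 42, 43, 31, 48, 46,
    47, 22, 49, 45, 96, 52, 18, 54, 55, 94, 19, 60, 90, 25,
    92, 62, 16, 24, 40, 63, 67, 68, 69, 75, 71, 72, 56, 61,
    70, 10, 77, 86, 38, 58, 37, 82, 83, 84, 27, 36, 50, 88,
    89, 59, 57, 91, 93, 39, 95, 85, 97, 98, 8, 66,
]


def cyclic_matrix(cycle: Sequence[int], n: int) -> List[List[float]]:
    """Deterministic transition matrix for objects 1..n that follows `cycle`.
    Assumption: the last object of the cycle is followed by the first."""
    A = [[0.0] * n for _ in range(n)]
    for pos, obj in enumerate(cycle):
        successor = cycle[(pos + 1) % len(cycle)]
        A[obj - 1][successor - 1] = 1.0
    return A


def uniform_matrix(n: int) -> List[List[float]]:
    """Equiprobable transitions a_ij = 1/n, including self-transitions."""
    return [[1.0 / n] * n for _ in range(n)]


A1 = cyclic_matrix(CYCLE, 100)  # user 1, x = 0 (no noise in this sketch)
A2 = uniform_matrix(100)        # user 2
```

Each row of `A1` contains a single 1, so a user following it reproduces the cycle exactly, while `A2` makes every next object equally likely regardless of the previous one.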

Three values of the “noise” coefficient were used: $x = 0$, $x = 0.1054$ and $x = 4.6052$; in all cases $\lambda = 1$. When $x = 0$, the generated traces contain repeated subsequences. This negatively influenced the performance of Random and LRU, but not of LFU: in this case Random showed a slight advantage over LRU, and LFU a significant advantage over both. As $x$ increases, the indeterminacy in trace generation grows rapidly. This positively influences the performance of Random and LRU and equalizes the performance of all policies. The results of the demonstration experiment show the relevance of the proposed trace models and of the program stand for studying different replacement policies. TLMMC is of particular interest for studying forecasting methods in caching systems. The traces generated by this model were applied to testing the method of decomposing relations in database systems with the criterion of increasing the cache-hit rate [8]. When $x$ reaches its maximum value, the experimental cache-hit rate completely agrees with the corresponding rate obtained theoretically in that paper.

References

[1] Min Xu, Vyacheslav Malyugin, Jeffrey Sheldon, Ganesh Venkitachalam, Boris Weissman. ReTrace: Collecting Execution Trace with Virtual Machine Deterministic Replay. // Third Annual Workshop on Modeling, Benchmarking and Simulation, held in conjunction with the 34th Annual International Symposium on Computer Architecture, June 2007. http://www.xhfamily.com/x/files/MoBS07_replay_itrace.pdf

[2] Hervé Touati, Alan Jay Smith. Reducing and Manipulating Complex Trace Data. // Software: Practice and Experience, Vol. 21, June 1991, pp. 639-655.

[3] Eeckhout, De Bosschere, Neefs. Performance analysis through synthetic trace generation. // IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2000.

[4] Germán Galeano Gil, Juan A. Gómez Pulido, Juan M. Sánchez Pérez. Tool for the Analysis and Memory-Trace Generation of DOS Executable Files. // Microprocessors and Microsystems, Vol. 22, 25 January 1999, pp. 389-393.

[5] Lawrence Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. // Proceedings of the IEEE, Vol. 77, No. 2, February 1989.

[6] Fei Guo, Yan Solihin. A Prediction Model for Alternative Cache Replacement Policies. // http://www.ece.ncsu.edu

[7] Thomas Kroeger, Darrell D. E. Long. Predicting File System Actions from Prior Events. // Proceedings of the USENIX 1996 Annual Technical Conference, pp. 319-328, January 1996. http://citeseer.ist.psu.edu/kroeger96predicting.html

[8] Thanh Hung Ngo, Michael V. Grankov. New Object Function for Vertical Partitioning in Database System. // The present colloquium.