=Paper=
{{Paper
|id=Vol-1343/paper4
|storemode=property
|title=Parallel Approach to Context Transformations
|pdfUrl=https://ceur-ws.org/Vol-1343/paper4.pdf
|volume=Vol-1343
|dblpUrl=https://dblp.org/rec/conf/dateso/VasinekP15
}}
==Parallel Approach to Context Transformations==
Parallel Approach to Context Transformations

Michal Vašinek, Jan Platoš

Department of Computer Science, FEECS, VŠB-Technical University of Ostrava, 17. listopadu, 708 33, Ostrava - Poruba
michal.vasinek@vsb.cz, jan.platos@vsb.cz

Abstract. Context transformation is a process that turns input data into data with lower zero-order entropy. The main weakness of the algorithms presented so far is the complexity of the replacing procedure. In this paper we describe properties related to the parallelization of the replacing procedure and we propose a modified version of the basic context transformation algorithm that uses a parallel approach.

Keywords: compression, context, transformation, entropy, parallel computing

1 Introduction

There are two main classes of algorithms employed in data compression. The first class deals directly with compression, and its purpose is to decrease the size of the input message. Examples [3] of such algorithms are Huffman coding, arithmetic coding, the Lempel-Ziv family of algorithms, PPM and many others. The second class behaves more like preprocessors for the first class; these algorithms are usually called transformations. Examples are the Burrows-Wheeler transformation [1] or the Move-to-Front transformation [2], both used in the bzip2 file compressor. The purpose of a transformation is not to decrease the message size but to change the internal message structure so that it can be handled more easily by some algorithm of the first class. In [5] and [6] we proposed a reversible transformation method called a 'context transformation' that we use to reduce the zero-order entropy of input messages. The transformed data are then compressed using an entropy coding algorithm, like Huffman coding.
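The Move-to-Front transformation mentioned above is simple enough to sketch in a few lines. The following is an illustrative Python implementation (our own, not part of the proposed method), operating on bytes with the initial symbol ranking 0-255:

```python
def move_to_front(data: bytes) -> bytes:
    """Replace each byte by its current rank, then move it to the front."""
    alphabet = list(range(256))  # initial ranking: byte value = rank
    out = bytearray()
    for b in data:
        i = alphabet.index(b)    # current rank of the symbol
        out.append(i)
        alphabet.pop(i)          # move the symbol to the front
        alphabet.insert(0, b)
    return bytes(out)
```

On runs of repeated symbols the output collapses to small indices, which a zero-order entropy coder then compresses well.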
In this paper we describe properties related to the parallelization of context transformation algorithms. We use several notation conventions: Σ denotes the set of message symbols; characters in equations are Greek symbols from Σ and, unless stated otherwise, they can represent any member of Σ, i.e. both α = β and α ≠ β are possible and it should be clear from the context. When we present examples of transformations we use symbols from the English alphabet to denote particular transformations, like ab → ac; there each of the characters a, b, c, . . . is distinct, a ≠ b ≠ c ≠ . . .

The rest of the paper is organized as follows. Section 2 contains a description of the proposed context transformations and their properties. Section 3 analyses the complexity of the transformation with respect to the number of symbols in the alphabet for both proposed transformation types. Section 4 describes the ability of the transformations to work in parallel. Section 5 contains the design of the parallel algorithm and presents results achieved on the selected dataset. The last Section 6 concludes the paper and discusses the achieved results.

M. Nečaský, J. Pokorný, P. Moravec (Eds.): Dateso 2015, pp. 40-51, CEUR-WS.org/Vol-1343.

2 Context Transformations

A context transformation is a method that takes two digrams: a digram αβ that is present in the input message and a digram αγ that is not present in the input message. The transformation replaces all occurrences of αβ with αγ. The symbol α is called the context symbol and it is present in both digrams.

Definition 1 Context transformation (CT) is a mapping CT(αβ → αγ, w): Σⁿ → Σⁿ, where Σ is the alphabet of the input message w and n is the length of the input message, that replaces all digrams αβ with the digram αγ, where p(α, γ) = 0 and β ≠ γ.
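A minimal Python sketch of Definition 1 may help; the function names are our own illustration. The replacement scans the message from right to left, which is what makes the inverse transformation (replacing αγ back with αβ from left to right) well defined:

```python
def ct_apply(w: str, a: str, b: str, c: str) -> str:
    """CT(ab -> ac): replace every digram ab with ac, scanning right to left."""
    s = list(w)
    for i in range(len(s) - 2, -1, -1):
        if s[i] == a and s[i + 1] == b:
            s[i + 1] = c
    return "".join(s)

def ct_inverse(w: str, a: str, b: str, c: str) -> str:
    """Inverse of ct_apply: replace ac back with ab, scanning left to right."""
    s = list(w)
    for i in range(len(s) - 1):
        if s[i] == a and s[i + 1] == c:
            s[i + 1] = b
    return "".join(s)
```

For w = 'kakaoo', ct_apply(w, 'k', 'a', 'k') yields 'kkkkoo', the example string used later in the paper, and ct_inverse recovers 'kakaoo' exactly.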
We may also consider a transformation of the form βα → γα; such a transformation corresponds to the transformation of Definition 1 performed on the mirrored message. There is one rule that must be followed: the transformation must be applied to the input message from right to left (respectively from left to right in the case of βα → γα). This rule ensures that for each context transformation there exists an inverse transformation that maps the new message back to the input message. The proof of this rule can be found in [5] and [7]. Context transformations are a subset of the more general set of generalized context transformations:

Definition 2 Generalized context transformation (GCT) is a mapping GCT(αβ ↔ αγ, w): Σⁿ → Σⁿ, where Σ is the alphabet of the input message w and n is the length of the input message, that exchanges all digrams αβ for the digram αγ and vice versa.

A GCT can be applied in either direction, but CT ⊂ GCT only when both are applied in the same direction. In this paper we describe GCTs applied from the right, so that they are consistent with the former notion of context transformations.

The main difference between a context and a generalized context transformation is that the latter can swap any two digrams beginning with α; hence we are no longer restricted to cases where one of the digrams is not present in the message. An example of each transformation is presented in Fig. 1.

The following paragraphs contain a brief discussion of the context transformation process and its implications for the zero-order Shannon entropy [4]. The reader can find a detailed description in [5] and [7]. When we speak about entropy we mean the Shannon entropy over the alphabet Σ of individual symbols, defined by:

Fig. 1.
Examples of two different types of transformations, with p(b) > p(c) > p(d), p(a, c) > p(a, b), p(b, c) > p(a, b): a) a sequence of context transformations ab → ad, ac → ab; b) a generalized context transformation ab ↔ ac.

  H = − ∑_{x∈Σ} p(x) log(p(x))    (1)

where p(x) is the probability of symbol x.

Context transformations are based on the fact that structured data, like human-written texts or program codes, have an inner structure: most of the information is not captured by the entropy of the input alphabet alone but depends more on the occurrences of digrams, trigrams, words, and so on.

Consider the example string 'kakaoo'. The frequencies of its digrams can be represented by a matrix; we call such a matrix a 'context matrix'. An entry of the context matrix is a non-negative integer that gives the frequency of the digram formed by the row symbol followed by the column symbol. For our example string, the context matrix is shown in Table 1.

Table 1. The context matrix that represents the string 'kakaoo'

      k  a  o
  k   0  2  0
  a   1  0  1
  o   0  0  1

Since the probability distribution of symbols is uniform, the entropy of our example string is maximal for the given alphabet. From the context matrix we see that several transformations are accessible. First we select a context transformation that replaces all digrams 'ka' with the digram 'kk', so the condition of one zero and one non-zero entry is fulfilled. The resulting string is 'kkkkoo' and its context matrix is shown in Table 2. We can see two different consequences of this transformation:

- the alphabet size decreased,
- the entropy decreased.

Table 2. The context matrix that represents the transformed string 'kkkkoo'

      k  a  o
  k   3  0  1
  a   0  0  0
  o   0  0  1

When we use a generalized context transformation instead of a simple context transformation, we arrive at two different strings depending on which transformation direction was applied.
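The 'kakaoo' example can be checked numerically. The sketch below (our own illustrative code; the helper names are not from the paper) builds the context matrix, computes the zero-order entropy of Eq. (1), and applies the generalized context transformation ka ↔ kk in both directions:

```python
from collections import Counter
from math import log2

def context_matrix(w: str) -> Counter:
    """Digram frequencies: cm[(x, y)] counts occurrences of the digram xy."""
    return Counter(zip(w, w[1:]))

def entropy(w: str) -> float:
    """Zero-order Shannon entropy of w in bits per symbol (Eq. 1)."""
    n = len(w)
    return -sum(c / n * log2(c / n) for c in Counter(w).values())

def gct(w: str, a: str, b: str, c: str, from_right: bool = True) -> str:
    """GCT(ab <-> ac): exchange the two digrams; the scan direction matters."""
    s = list(w)
    idx = range(len(s) - 2, -1, -1) if from_right else range(len(s) - 1)
    for i in idx:
        if s[i] == a:
            if s[i + 1] == b:
                s[i + 1] = c
            elif s[i + 1] == c:
                s[i + 1] = b
    return "".join(s)
```

Applied from the right, gct("kakaoo", "k", "a", "k") gives 'kkkkoo'; from the left it gives 'kkaaoo', reproducing the two outcomes discussed in the text, and the entropy drops from log₂ 3 ≈ 1.585 to ≈ 0.918 bits per symbol.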
When the GCT is applied from the left, GCT→(w) = kkaaoo; when applied from the right, as in the context transformation case, GCT←(w) = kkkkoo. We may select another transformation, for example 'oo' for 'ok', leaving w in the state 'kkkkok', which has even lower entropy. What we examine in this paper is whether we can run these two transformations simultaneously, or more precisely, under what conditions they can be performed in parallel.

2.1 Transformations and their consequences for Huffman coding

A compression schema based on CT or GCT consists of two phases: in the first phase the input message is transformed using (G)CT, and in the second phase the message is compressed using some zero-order entropy coder.

Fig. 2. Compression schema based on context transformations.

Suppose static Huffman coding is used as the zero-order entropy coder. If p(β) > p(γ), then for the lengths of the corresponding Huffman codes it holds that |c(β)| ≤ |c(γ)|. If p(α, γ) ≠ 0 and p(α, β) = 0 (resp. p(α, β) < p(α, γ)) and the transformation CT(αγ → αβ) (resp. GCT(αβ ↔ αγ)) is applied, then every symbol γ that is part of a digram αγ will be coded by a code of length |c(β)| instead of a code of length |c(γ)|.

3 Complexity analysis

The presented algorithms are all greedy approaches: they iteratively search for the best entropy-reducing transformation and then apply it to the source. Formally, we divide each iteration into two steps:

1. Search procedure - find the best transformation according to the actual context matrix,
2. Replacing procedure - apply the selected transformation to the input string w.

  n ← input size
  w ← input string
  compute_matrix()
  repeat
    transformation = search_procedure()
    H0 = entropy()
    H1 = entropy(transformation)
    if n * (H0 - H1) < LIMIT then
      return
    end if
    replacing_procedure(transformation, w)
  until true

Fig. 3. Basic structure of context transformation algorithms

The infinite loop in Fig.
3 terminates when no remaining transformation can reduce the entropy by more than the limit given by the LIMIT variable. The LIMIT variable is set to the size of the transformation description; in our algorithms LIMIT = 24, because we store the description of each transformation as a triplet of byte values αβγ.

Both parts of the algorithm can be parallelized, but the most time-consuming operation is the replacing procedure, as depicted in Table 3. In Section 5 we present a modified version of the basic algorithm that exploits the fact that some transformations can be performed simultaneously, and we try to parallelize the replacing procedure.

Table 3. Average time per file needed for the search and replacement procedures. Dataset: assasin (6097 files).

  Search (ms)  Replacement (ms)
  1.144        28.813

In the next sections we analyse the complexities of both parts and we show that for a source string w of length |w| → ∞ the only significant part remains the replacing procedure.

3.1 Search procedure

The algorithm in Fig. 3 is a basic greedy algorithm that searches for, and eventually performs, context transformations such that the resulting entropy of the alphabet Σ is lower than in the untransformed case. The search procedure operates on the context matrix, whose rows and columns are sorted according to their frequencies. The algorithm iterates through the rows of the matrix from the most probable to the least probable. A simplified version of the search procedure is presented in the following pseudocode:

  cm ← context matrix
  ΔH = 0 ← the change of zero-order entropy
  T ← searched transformation
  for row = 1 to dim(cm) do
    for col_1 = 1 to dim(cm) do
      for col_2 = col_1 + 1 to dim(cm) do
        ΔH_temp = compute_dH(row, col_1, col_2)
        if ΔH_temp − ΔH < 0 then
          ΔH = ΔH_temp; T = [row, col_1, col_2]
        end if
      end for
    end for
  end for

Fig. 4.
Outline of the search procedure

The function compute_dH has constant complexity because it computes only the change of entropy: the probabilities are modified only for the symbols β and γ, so we do not need to recompute the entropy over all symbols; it is sufficient to recompute the entropy terms of β and γ. The complexity of the search procedure depends only on the size of the alphabet, and its worst-case complexity is O(t|Σ|³), because we have to perform |Σ|³/2 searches for each of the t transformations. In our algorithm design the maximum allowable size of the alphabet is |Σ| = 256, since we interpret only one-byte values as letters. Consequently, the upper limit on the number of operations needed for one search is fixed and in the worst case is given by |Σ|³. There are several techniques by which the search procedure can be designed with lower complexity; for instance, precomputing and storing the n best transformations at the beginning and then updating only the modified ones leads to a number of entropy recomputations given by |Σ|³ + 2t|Σ|².

3.2 Replacing procedure

The second step, the replacing procedure, is very simple: it passes over the data t times and exchanges both digrams. Let t be the number of performed GCTs and n the length of the input; then the complexity of the replacing procedure is O(tn). The inverse transformation consists only of the replacing procedure, so it also has complexity O(tn). If the generalized context transformation is of the form αβ ↔ αγ, then its inverse transformation is of the same form but is taken in the opposite direction. The experimentally observed t was in the range of 100-1000 transformations.

3.3 Comparison

When we set the complexity functions equal, we arrive at the limit where one computation of the replacement procedure becomes more significant than one computation of the search procedure.
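The constant-time ΔH update of Section 3.1 and the triple loop of the search procedure can be sketched in Python as follows. All names are our own illustration; the sketch assumes non-overlapping digram occurrences (the self-overlap cases, where β or γ equals the context symbol and occurrences of the digram overlap, need extra care in a full implementation):

```python
from collections import Counter
from math import log2

def delta_h(counts, cm, n, a, b, c):
    """Entropy change (bits/symbol) if GCT(ab <-> ac) were applied.
    Only the entropy terms of b and c change, so this is O(1) -- the key
    to the search procedure's O(t|Sigma|^3) total cost."""
    def term(k):
        return -(k / n) * log2(k / n) if k > 0 else 0.0
    moved = cm.get((a, b), 0) - cm.get((a, c), 0)  # net symbols b -> c
    old = term(counts[b]) + term(counts[c])
    new = term(counts[b] - moved) + term(counts[c] + moved)
    return new - old

def search(w):
    """Greedy search: the best entropy-reducing GCT over all (a, b, c)."""
    n, counts = len(w), Counter(w)
    cm = Counter(zip(w, w[1:]))
    sigma = sorted(counts)
    best, best_dh = None, 0.0
    for a in sigma:
        for i, b in enumerate(sigma):
            for c in sigma[i + 1:]:           # col_2 = col_1 + 1, ...
                dh = delta_h(counts, cm, n, a, b, c)
                if dh < best_dh:
                    best, best_dh = (a, b, c), dh
    return best, best_dh
```

On the paper's example, search("kakaoo") selects the GCT ka ↔ kk with ΔH = −2/3 bits per symbol, matching the drop from log₂ 3 to the entropy of 'kkkkoo'.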
We must arrive at such a limit because the search procedure is independent of the input size:

  c1·t·n = c2·|Σ|³ + c3·t·|Σ|²    (2)

The constants ci represent implementation-specific coefficients. The number of transformations t on both sides of the equation can be rearranged, leaving us with:

  n = (c2·|Σ|³ + c3·t·|Σ|²) / (c1·t) = c2·|Σ|³ / (c1·t) + c3·|Σ|² / c1    (3)

As the number of transformations t → ∞, the first term on the right side goes to zero:

  lim_{t→∞} n = c3·|Σ|² / c1    (4)

When |Σ| is finite, then for all source strings of length m, where m > n, the replacing procedure is more computationally intensive than the search procedure:

  c1·t·m > c2·|Σ|³ + c3·t·|Σ|²    (5)

4 Parallel transformations

In this section we describe the conditions needed for parallelization of context and generalized context transformations.

4.1 Commuting transformations

Let us consider two different transformations T1 and T2. We say that these two transformations commute if they satisfy the following definition:

Definition 1. Two different transformations T1 and T2 commute if:

  T1(T2(w)) = T2(T1(w))    (6)

In our model example of the string w = kakaoo the two presented transformations commute: if T1 is the 'ka' to 'kk' transformation and T2 is the 'oo' to 'ok' transformation, then T2(T1('kakaoo')) = T1(T2('kakaoo')) = kkkkok. As an example of non-commuting transformations, consider transformation T1 again and a transformation T3 that replaces the digram 'ao' with 'aa'. Applying these two transformations in the two possible orders leads to different results: T3(T1(w)) = 'kkkkoo' but T1(T3(w)) = 'kkkkao', so we see that T1(T3(w)) ≠ T3(T1(w)).

4.2 Parallel transformations

Commutativity alone is not a sufficient property to decide whether two transformations can be run in parallel; as we will see, the inverse transformations must be taken into account as well.
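The commuting and non-commuting examples of Section 4.1 can be verified mechanically. Here ct is our own illustrative right-to-left replacement, and T1, T2, T3 are the transformations from the text:

```python
def ct(w, a, b, c):
    """CT(ab -> ac), applied right to left."""
    s = list(w)
    for i in range(len(s) - 2, -1, -1):
        if s[i] == a and s[i + 1] == b:
            s[i + 1] = c
    return "".join(s)

w = "kakaoo"
T1 = lambda w: ct(w, "k", "a", "k")   # 'ka' -> 'kk'
T2 = lambda w: ct(w, "o", "o", "k")   # 'oo' -> 'ok'
T3 = lambda w: ct(w, "a", "o", "a")   # 'ao' -> 'aa'
```

T1 and T2 commute (both orders give 'kkkkok'), while T1 and T3 do not: the two orders yield 'kkkkoo' and 'kkkkao'.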
Let us again consider our example word w = kakaoo and two transformations, T4 representing ak → aa and T5 representing ao → aa. These two transformations commute and together transform the word 'kakaoo' into the word 'kaaaao', but when we perform the inverse transformation we can get different results, since we do not know which of the two inverse transformations will replace the digram 'aa' first.

Before we show how to handle inverse transformations, we introduce two sets, an abstract set Di and a set Ki, that will later be used to prove a theorem about parallel transformations:

Definition 2. Let Di be the set of all unique digrams that are modified (created or destroyed) by a transformation Ti(αβ ↔ αγ), and let Ki be the set of the transformation digrams, Ki = {αβ, αγ}.

The set Di contains the digrams αβ, αγ and all digrams of the form βX and γX, where X ∈ Σ. Suppose two sets D1 and D2 with D1 ∩ D2 ≠ ∅; these two sets share at least one common digram d. Suppose that d ∈ K1 ∪ K2. This means that the transformations T1 and T2 interfere on this particular digram, i.e. when transformation T1 modifies a digram αβ to αγ, the second transformation will no longer be able to modify that digram αβ as it would if transformation T1 were not present. From the above reasoning we form a lemma about parallel transformations:

Lemma 1. Two transformations T1 and T2 can be run in parallel if:

  D1 ∩ D2 = ∅    (7)

Proof. Since D1 ∩ D2 = ∅, no digram modified by both transformations T1 and T2 exists, and so the two transformations can be applied together. □

Lemma 1 gives a simple condition that we use in our algorithms to construct a set of parallel transformations, but it is a weak condition: there exist transformations that are parallel even though D1 ∩ D2 ≠ ∅. For instance, for T1(ab ↔ ac) and T2(db ↔ dc) we have D1 ∩ D2 = {bX, cX}.

Theorem 1.
Two transformations T1 and T2 can be run in parallel if:

  D1 ∩ K2 = ∅    (8)

and

  D2 ∩ K1 = ∅    (9)

Proof. Suppose that transformation T1 is of the form T1(αβ ↔ αγ); it has the corresponding sets D1 = {αβ, αγ, βX, γX} and K1 = {αβ, αγ}. If K2 contains one of the elements of D1, then transformation T1 can modify some of the elements that would otherwise be transformed (replaced) by T2, so the two transformations can be run in parallel only if D1 ∩ K2 = ∅. Similar reasoning proves the condition of equation (9): if D2 contains one of the elements of K1, say αβ, then if T2 changes some occurrence of αβ first, the first transformation will no longer be able to modify it to αγ; such transformations cannot be parallelized, so for parallel transformations D2 ∩ K1 = ∅ must hold.

Now suppose that D1 ∩ K2 = ∅ but D2 ∩ K1 ≠ ∅. Then some element, say αβ, is in the set D2. We know that the transformations are not parallel when αβ ∈ K1 ∧ αβ ∈ K2; we now show that they also cannot be parallelized if αβ ∈ D2\K2, because then T2 is of the form Xα ↔ XY, and when α is changed to Y, the digram αβ becomes Yβ and transformation T1 cannot modify it. The same holds in the case D1 ∩ K2 ≠ ∅ but D2 ∩ K1 = ∅. So both conditions of Theorem 1 must hold together. □

Because our parallel algorithm is based on Lemma 1, we show several other properties that parallel transformations based on Lemma 1 have.

Theorem 2. Two different transformations T1 and T2 can be run in parallel if:

  T1(T2(w)) = T2(T1(w)) = wT    (10)

and

  T1⁻¹(T2⁻¹(wT)) = T2⁻¹(T1⁻¹(wT)) = w    (11)

Proof. Suppose the transformation T←(αβ ↔ αγ), where the arrow in the index denotes the transformation direction, i.e.
from right to left ←. It has the corresponding set of modified digrams D← = {αβ, αγ, βX, γX}. Next suppose the transformation T→(αβ ↔ αγ) and its corresponding set of digrams D→ = {αβ, αγ, βX, γX}. We see that D← = D→, and we know that T→(T←(w)) = w, so the transformations T and T⁻¹ share the same set D.

If T1(T2(w)) = T2(T1(w)) = wT, then there are no interfering digrams, because if there were such digrams the equality would not hold; therefore D←,1 ∩ D←,2 = ∅. But we showed that D← = D→, so the intersection of the sets of the inverse transformations is also empty, and T1⁻¹ and T2⁻¹ will likewise recover w correctly. □

Lemma 1 and Theorem 2 describe the same phenomenon, and they can be generalized to an arbitrary set of parallel transformations:

Theorem 3. A set of transformations T can be run in parallel if for every pair of transformations Ti and Tj it holds that:

  Di ∩ Dj = ∅    (12)

Proof. Suppose the set T can be run in parallel, and suppose two transformations Ti, Tj ∈ T such that Di ∩ Dj ≠ ∅. Then there exists some digram αβ common to Ti and Tj, and these two transformations cannot be run in parallel; but this is in contradiction with the hypothesis that T is parallel, so Di ∩ Dj = ∅. □

There is one important consequence of Theorem 2 to emphasize, a statement about inverse transformations:

Corollary 1. Let T = {T1, T2, . . . , Tn} be a set of parallel transformations; then the set T⁻¹ = {T1⁻¹, T2⁻¹, . . . , Tn⁻¹} is parallel as well.

Proof. In Theorem 2 we proved that Di = Di⁻¹, and because for the set T it holds that Di ∩ Dj = ∅ for all pairs of sets Di, Dj, also Di⁻¹ ∩ Dj⁻¹ = ∅, and the transformation set T⁻¹ can be run in parallel. □

Corollary 1 is a very important result because it tells us that we may parallelize not only the set T but also its inverse T⁻¹, so the inverse transformation algorithm is parallelizable as well.
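Based on Lemma 1 and Theorem 3, a greedy construction of a set of mutually parallel transformations can be sketched as follows. The D sets of Definition 2 are expanded over a concrete alphabet; all names are our own illustration, not the paper's implementation:

```python
def d_set(t, sigma):
    """D_i from Definition 2 for a GCT(ab <-> ac): the digrams it can
    create or destroy -- ab, ac, and every bX, cX over the alphabet."""
    a, b, c = t
    d = {a + b, a + c}
    for x in sigma:
        d |= {b + x, c + x}
    return d

def parallel_set(candidates, sigma):
    """Greedily collect transformations whose D sets are pairwise
    disjoint (Lemma 1 / Theorem 3), so they can run in parallel."""
    chosen, used = [], set()
    for t in candidates:
        d = d_set(t, sigma)
        if used.isdisjoint(d):
            chosen.append(t)
            used |= d
    return chosen
```

On the example from Theorem 5, T0(ab ↔ ac) and T1(ad ↔ ae) are accepted, while T2(ac ↔ af) is rejected because D0 ∩ D2 ≠ ∅ (they share the digram ac and the row cX).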
With the knowledge of Lemma 1 we know how to construct the set T; now we explore how large this set can be for a particular alphabet Σ:

Theorem 4. The maximal size M_T of a set of parallel transformations T for a particular alphabet Σ is:

  M_T = ⌊|Σ| / 2⌋    (13)

Proof. There are two basic types of the set D: one corresponds to transformations of the form αα ↔ αβ and the second to transformations αβ ↔ αγ. The first set, D1 = {αβ, αα, αX, βX} = {αX, βX}, influences two rows of the context matrix, while the second set, D2 = {αβ, αγ, βX, γX}, influences three rows. So when only transformations of the first type are selected into T, at most ⌊|Σ|/2⌋ transformations can be run in parallel. □

Theorem 5. The relation "to be parallel" between transformations is not transitive.

Proof. Suppose sets of digrams Di, i ∈ {0, 1, 2}. Let D0 ∩ D1 = ∅ and D1 ∩ D2 = ∅. If transitivity held, then D0 ∩ D2 = ∅, but this is not generally true. Consider the following example: let T0(ab ↔ ac) and T1(ad ↔ ae); these transformations are clearly parallelizable. Now let T2(ac ↔ af); then D1 ∩ D2 = ∅, but D0 ∩ D2 ≠ ∅. □

5 Parallel version of the basic context transformation algorithm

As we saw in the section about parallel transformations, there is a distinct number of rows affected by each transformation. This knowledge allows us, based on Lemma 1, to collect parallel transformations in a first step and afterwards perform them simultaneously. The individual transformations are performed simultaneously in shared memory. The process is outlined in Fig. 5.

  cm ← context matrix
  t_array = [] ← parallel transformations
  t_size ← array size
  for row = 1 to dim(cm) do
    for col = 1 to dim(cm) do
      i = 0
      while parallel_transformation(t_array, row, col) and i < t_size do
        append_transformation(t_array, get_transformation())
        i = i + 1
      end while
      for transformation in t_array do
        perform_parallel(transformation)
      end for {This section is run in parallel}
    end for
  end for

Fig. 5.
Parallel modification of the basic transformation algorithm

We tested the algorithm on a machine equipped with four processors. The parallel algorithm was more than three times faster than the serial one. The results are shown in Table 4.

Table 4. Average processing time per file. Dataset: assasin.

  Serial (ms)  Parallel (ms)  Speed-up
  45.383       13.397         3.387

The serial and parallel algorithms can follow different transformation paths (i.e. different transformations, or a different order in which the transformations are performed). In the serial version of the algorithm, transformations inaccessible to the parallel algorithm can also be performed, so the resulting entropy is lower in the serial case, as presented in Table 5.

Table 5. Entropy comparison. Dataset: assasin.

  No transformation  Serial  Parallel
  4.938              4.054   4.106

6 Conclusion

We showed the basic properties that have to be fulfilled to run the replacing procedure of context transformation algorithms in parallel. The presented parallel version of the basic context transformation algorithm significantly increases the speed of processing of individual data files. On the other hand, the use of the parallel algorithm can lead to a minor increase in the resulting entropy. There are two directions in which we would like to continue our research: the first is to prepare parallel algorithms according to Theorem 1 instead of Lemma 1, and since the algorithms presented so far perform transformations using only digrams, in future work we will also focus on the development of algorithms operating on different context lengths.

Acknowledgment

This work was supported by the SGS in VSB - Technical University of Ostrava, Czech Republic, under grant No. SP2015/146.

References

1. Burrows, M. and Wheeler, D. J., A block sorting lossless data compression algorithm, 1994.
2. Bentley, J. L., Sleator, D. D., Tarjan, R. E. and Wei, V. K., A locally adaptive data compression scheme, Commun. ACM, vol. 29, no. 4, pp.
320-330, 1986.
3. Salomon, D., Data Compression: The Complete Reference, Springer, New York, 2007.
4. Shannon, C. E., A mathematical theory of communication, Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, 1948.
5. Vašinek, M., Kontextové mapy a jejich aplikace [Context maps and their applications], Master Thesis, VŠB - Technical University of Ostrava, 2013.
6. Vašinek, M. and Platoš, J., Entropy reduction using context transformations, Data Compression Conference (DCC), pp. 431-431, 2014.
7. Vašinek, M., Context Transformations, Ph.D. Workshop of Faculty of Electrical Engineering and Computer Science, 2014.