=Paper=
{{Paper
|id=Vol-1343/paper4
|storemode=property
|title=Parallel Approach to Context Transformations
|pdfUrl=https://ceur-ws.org/Vol-1343/paper4.pdf
|volume=Vol-1343
|dblpUrl=https://dblp.org/rec/conf/dateso/VasinekP15
}}
==Parallel Approach to Context Transformations==
Parallel Approach to Context Transformations

Michal Vašinek, Jan Platoš

Department of Computer Science, FEECS, VŠB-Technical University of Ostrava, 17. listopadu, 708 33, Ostrava - Poruba
michal.vasinek@vsb.cz, jan.platos@vsb.cz

Abstract. Context transformation is a process that turns input data into data with lower zero-order entropy. The main weakness of the algorithms presented so far is the complexity of the replacing procedure. In this paper we describe properties related to the parallelization of the replacing procedure and we propose a modified version of the basic context transformation algorithm that uses a parallel approach.

Keywords: compression, context, transformation, entropy, parallel computing

1 Introduction

There are two main classes of algorithms employed in data compression. The first class deals directly with compression, and its purpose is to decrease the size of the input message. Examples [3] of such algorithms are Huffman coding, arithmetic coding, the Lempel-Ziv family of algorithms, PPM and many others. The second class behaves more like preprocessors for the first class; these algorithms are usually called transformations. Examples are the Burrows-Wheeler transformation [1] or the Move-to-Front transformation [2], both used in the bzip2 file compressor. The purpose of a transformation is not to decrease the message size but to change the internal message structure so that it can be handled more easily by some algorithm of the first class. In [5] and [6] we proposed a reversible transformation method called a 'context transformation' that we use to reduce the zero-order entropy of input messages. The transformed data are then compressed using an entropy coding algorithm, like Huffman coding.
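The Move-to-Front transformation mentioned above is simple enough to sketch in a few lines. The following is an illustrative Python implementation (our own, not part of the proposed method), operating on bytes with the initial symbol ranking 0-255:

```python
def move_to_front(data: bytes) -> bytes:
    """Replace each byte by its current rank, then move it to the front."""
    alphabet = list(range(256))  # initial ranking: byte value = rank
    out = bytearray()
    for b in data:
        i = alphabet.index(b)    # current rank of the symbol
        out.append(i)
        alphabet.pop(i)          # move the symbol to the front
        alphabet.insert(0, b)
    return bytes(out)
```

On runs of repeated symbols the output collapses to small indices, which a zero-order entropy coder then compresses well.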
In this paper we describe properties related to the parallelization of context transformation algorithms. We use several notation conventions: Σ denotes the set of message symbols; characters in equations are Greek symbols from Σ and, unless stated otherwise, they can represent any member of Σ, i.e. both α = β and α ≠ β are possible and it should be clear from the context. When we present examples of transformations we use symbols from the English alphabet to denote particular transformations, like ab → ac; there each of the characters a, b, c, . . . is distinct, a ≠ b ≠ c ≠ . . .

The rest of the paper is organized as follows. Section 2 contains a description of the proposed context transformations and their properties. Section 3 analyses the complexity of the transformation with respect to the number of symbols in the alphabet for both proposed transformation types. Section 4 describes the ability of the transformations to work in parallel. Section 5 contains the design of the parallel algorithm and presents results achieved on the selected dataset. The last Section 6 concludes the paper and discusses the achieved results.

M. Nečaský, J. Pokorný, P. Moravec (Eds.): Dateso 2015, pp. 40-51, CEUR-WS.org/Vol-1343.

2 Context Transformations

A context transformation is a method that takes two digrams: a digram αβ that is present in the input message and a digram αγ that is not present in the input message. The transformation replaces all occurrences of αβ with αγ. The symbol α is called the context symbol and it is present in both digrams.

Definition 1 Context transformation (CT) is a mapping CT(αβ → αγ, w): Σⁿ → Σⁿ, where Σ is the alphabet of the input message w and n is the length of the input message, that replaces all digrams αβ with the digram αγ, where p(α, γ) = 0 and β ≠ γ.
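A minimal Python sketch of Definition 1 may help; the function names are our own illustration. The replacement scans the message from right to left, which is what makes the inverse transformation (replacing αγ back with αβ from left to right) well defined:

```python
def ct_apply(w: str, a: str, b: str, c: str) -> str:
    """CT(ab -> ac): replace every digram ab with ac, scanning right to left."""
    s = list(w)
    for i in range(len(s) - 2, -1, -1):
        if s[i] == a and s[i + 1] == b:
            s[i + 1] = c
    return "".join(s)

def ct_inverse(w: str, a: str, b: str, c: str) -> str:
    """Inverse of ct_apply: replace ac back with ab, scanning left to right."""
    s = list(w)
    for i in range(len(s) - 1):
        if s[i] == a and s[i + 1] == c:
            s[i + 1] = b
    return "".join(s)
```

For w = 'kakaoo', ct_apply(w, 'k', 'a', 'k') yields 'kkkkoo', the example string used later in the paper, and ct_inverse recovers 'kakaoo' exactly.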
We may also consider a transformation of the form βα → γα; such a transformation corresponds to the transformation of Definition 1 performed on the mirrored message. There is one rule that must be followed: the transformation must be applied to the input message from right to left (respectively from left to right in the case of βα → γα). This rule ensures that for each context transformation there exists an inverse transformation that maps the new message back to the input message. The proof of this rule can be found in [5] and [7]. Context transformations are a subset of the more general set of generalized context transformations:

Definition 2 Generalized context transformation (GCT) is a mapping GCT(αβ ↔ αγ, w): Σⁿ → Σⁿ, where Σ is the alphabet of the input message w and n is the length of the input message, that exchanges all digrams αβ for the digram αγ and vice versa.

A GCT can be applied in either direction, but CT ⊂ GCT only when both are applied in the same direction. In this paper we describe GCTs applied from the right, so that they are consistent with the former notion of context transformations.

The main difference between a context and a generalized context transformation is that the latter can swap any two digrams beginning with α; hence we are no longer restricted to cases where one of the digrams is not present in the message. An example of each transformation is presented in Fig. 1.

The following paragraphs contain a brief discussion of the context transformation process and its implications for the zero-order Shannon entropy [4]. The reader can find a detailed description in [5] and [7]. When we speak about entropy we mean the Shannon entropy over the alphabet Σ of individual symbols, defined by:

Fig. 1.
Examples of two different types of transformations, with p(b) > p(c) > p(d), p(a, c) > p(a, b), p(b, c) > p(a, b): a) a sequence of context transformations ab → ad, ac → ab; b) a generalized context transformation ab ↔ ac.

  H = − ∑_{x∈Σ} p(x) log(p(x))    (1)

where p(x) is the probability of symbol x.

Context transformations are based on the fact that structured data, like human-written texts or program codes, have an inner structure: most of the information is not captured by the entropy of the input alphabet alone but depends more on the occurrences of digrams, trigrams, words, and so on.

Consider the example string 'kakaoo'. The frequencies of its digrams can be represented by a matrix; we call such a matrix a 'context matrix'. An entry of the context matrix is a non-negative integer that gives the frequency of the digram formed by the row symbol followed by the column symbol. For our example string, the context matrix is shown in Table 1.

Table 1. The context matrix that represents the string 'kakaoo'

      k  a  o
  k   0  2  0
  a   1  0  1
  o   0  0  1

Since the probability distribution of symbols is uniform, the entropy of our example string is maximal for the given alphabet. From the context matrix we see that several transformations are accessible. First we select a context transformation that replaces all digrams 'ka' with the digram 'kk', so the condition of one zero and one non-zero entry is fulfilled. The resulting string is 'kkkkoo' and its context matrix is shown in Table 2. We can see two different consequences of this transformation:

- the alphabet size decreased,
- the entropy decreased.

Table 2. The context matrix that represents the transformed string 'kkkkoo'

      k  a  o
  k   3  0  1
  a   0  0  0
  o   0  0  1

When we use a generalized context transformation instead of a simple context transformation, we arrive at two different strings depending on which transformation direction was applied.
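The 'kakaoo' example can be checked numerically. The sketch below (our own illustrative code; the helper names are not from the paper) builds the context matrix, computes the zero-order entropy of Eq. (1), and applies the generalized context transformation ka ↔ kk in both directions:

```python
from collections import Counter
from math import log2

def context_matrix(w: str) -> Counter:
    """Digram frequencies: cm[(x, y)] counts occurrences of the digram xy."""
    return Counter(zip(w, w[1:]))

def entropy(w: str) -> float:
    """Zero-order Shannon entropy of w in bits per symbol (Eq. 1)."""
    n = len(w)
    return -sum(c / n * log2(c / n) for c in Counter(w).values())

def gct(w: str, a: str, b: str, c: str, from_right: bool = True) -> str:
    """GCT(ab <-> ac): exchange the two digrams; the scan direction matters."""
    s = list(w)
    idx = range(len(s) - 2, -1, -1) if from_right else range(len(s) - 1)
    for i in idx:
        if s[i] == a:
            if s[i + 1] == b:
                s[i + 1] = c
            elif s[i + 1] == c:
                s[i + 1] = b
    return "".join(s)
```

Applied from the right, gct("kakaoo", "k", "a", "k") gives 'kkkkoo'; from the left it gives 'kkaaoo', reproducing the two outcomes discussed in the text, and the entropy drops from log₂ 3 ≈ 1.585 to ≈ 0.918 bits per symbol.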
When the GCT is applied from the left, GCT→(w) = kkaaoo; when applied from the right, as in the context transformation case, GCT←(w) = kkkkoo. We may select another transformation, for example 'oo' for 'ok', leaving w in the state 'kkkkok', which has even lower entropy. What we examine in this paper is whether we can run these two transformations simultaneously, or more precisely, under what conditions they can be performed in parallel.

2.1 Transformations and their consequences for Huffman coding

A compression schema based on CT or GCT consists of two phases: in the first phase the input message is transformed using (G)CT, and in the second phase the message is compressed using some zero-order entropy coder.

Fig. 2. Compression schema based on context transformations.

Suppose static Huffman coding is used as the zero-order entropy coder. If p(β) > p(γ), then for the lengths of the corresponding Huffman codes it holds that |c(β)| ≤ |c(γ)|. If p(α, γ) ≠ 0 and p(α, β) = 0 (resp. p(α, β) < p(α, γ)) and the transformation CT(αγ → αβ) (resp. GCT(αβ ↔ αγ)) is applied, then every symbol γ that is part of a digram αγ will be coded by a code of length |c(β)| instead of a code of length |c(γ)|.

3 Complexity analysis

The presented algorithms are all greedy approaches: they iteratively search for the best entropy-reducing transformation and then apply it to the source. Formally, we divide each iteration into two steps:

1. Search procedure - find the best transformation according to the actual context matrix,
2. Replacing procedure - apply the selected transformation to the input string w.

  n ← input size
  w ← input string
  compute_matrix()
  repeat
    transformation = search_procedure()
    H0 = entropy()
    H1 = entropy(transformation)
    if n * (H0 - H1) < LIMIT then
      return
    end if
    replacing_procedure(transformation, w)
  until true

Fig. 3. Basic structure of context transformation algorithms

The infinite loop in Fig.
3 terminates when no remaining transformation can reduce the entropy by more than the limit given by the LIMIT variable. The LIMIT variable is set to the size of the transformation description; in our algorithms LIMIT = 24, because we store the description of each transformation as a triplet of byte values αβγ.

Both parts of the algorithm can be parallelized, but the most time-consuming operation is the replacing procedure, as depicted in Table 3. In Section 5 we present a modified version of the basic algorithm that exploits the fact that some transformations can be performed simultaneously, and we try to parallelize the replacing procedure.

Table 3. Average time per file needed for the search and replacement procedures. Dataset: assasin (6097 files).

  Search (ms)  Replacement (ms)
  1.144        28.813

In the next sections we analyse the complexities of both parts and we show that for a source string w of length |w| → ∞ the only significant part remains the replacing procedure.

3.1 Search procedure

The algorithm in Fig. 3 is a basic greedy algorithm that searches for, and eventually performs, context transformations such that the resulting entropy of the alphabet Σ is lower than in the untransformed case. The search procedure operates on the context matrix, whose rows and columns are sorted according to their frequencies. The algorithm iterates through the rows of the matrix from the most probable to the least probable. A simplified version of the search procedure is presented in the following pseudocode:

  cm ← context matrix
  ΔH = 0 ← the change of zero-order entropy
  T ← searched transformation
  for row = 1 to dim(cm) do
    for col_1 = 1 to dim(cm) do
      for col_2 = col_1 + 1 to dim(cm) do
        ΔH_temp = compute_dH(row, col_1, col_2)
        if ΔH_temp − ΔH < 0 then
          ΔH = ΔH_temp; T = [row, col_1, col_2]
        end if
      end for
    end for
  end for

Fig. 4.
Outline of the search procedure

The function compute_dH has constant complexity because it computes only the change of entropy: the probabilities are modified only for the symbols β and γ, so we do not need to recompute the entropy over all symbols; it is sufficient to recompute the entropy terms of β and γ. The complexity of the search procedure depends only on the size of the alphabet, and its worst-case complexity is O(t|Σ|³), because we have to perform |Σ|³/2 searches for each of the t transformations. In our algorithm design the maximum allowable size of the alphabet is |Σ| = 256, since we interpret only one-byte values as letters. Consequently, the upper limit on the number of operations needed for one search is fixed and in the worst case is given by |Σ|³. There are several techniques by which the search procedure can be designed with lower complexity; for instance, precomputing and storing the n best transformations at the beginning and then updating only the modified ones leads to a number of entropy recomputations given by |Σ|³ + 2t|Σ|².

3.2 Replacing procedure

The second step, the replacing procedure, is very simple: it passes over the data t times and exchanges both digrams. Let t be the number of performed GCTs and n the length of the input; then the complexity of the replacing procedure is O(tn). The inverse transformation consists only of the replacing procedure, so it also has complexity O(tn). If the generalized context transformation is of the form αβ ↔ αγ, then its inverse transformation is of the same form but is taken in the opposite direction. The experimentally observed t was in the range of 100-1000 transformations.

3.3 Comparison

When we set the complexity functions equal, we arrive at the limit where one computation of the replacement procedure becomes more significant than one computation of the search procedure.
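The constant-time ΔH update of Section 3.1 and the triple loop of the search procedure can be sketched in Python as follows. All names are our own illustration; the sketch assumes non-overlapping digram occurrences (the self-overlap cases, where β or γ equals the context symbol and occurrences of the digram overlap, need extra care in a full implementation):

```python
from collections import Counter
from math import log2

def delta_h(counts, cm, n, a, b, c):
    """Entropy change (bits/symbol) if GCT(ab <-> ac) were applied.
    Only the entropy terms of b and c change, so this is O(1) -- the key
    to the search procedure's O(t|Sigma|^3) total cost."""
    def term(k):
        return -(k / n) * log2(k / n) if k > 0 else 0.0
    moved = cm.get((a, b), 0) - cm.get((a, c), 0)  # net symbols b -> c
    old = term(counts[b]) + term(counts[c])
    new = term(counts[b] - moved) + term(counts[c] + moved)
    return new - old

def search(w):
    """Greedy search: the best entropy-reducing GCT over all (a, b, c)."""
    n, counts = len(w), Counter(w)
    cm = Counter(zip(w, w[1:]))
    sigma = sorted(counts)
    best, best_dh = None, 0.0
    for a in sigma:
        for i, b in enumerate(sigma):
            for c in sigma[i + 1:]:           # col_2 = col_1 + 1, ...
                dh = delta_h(counts, cm, n, a, b, c)
                if dh < best_dh:
                    best, best_dh = (a, b, c), dh
    return best, best_dh
```

On the paper's example, search("kakaoo") selects the GCT ka ↔ kk with ΔH = −2/3 bits per symbol, matching the drop from log₂ 3 to the entropy of 'kkkkoo'.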
We must arrive at such a limit because the search procedure is independent of the input size:

  c1·t·n = c2·|Σ|³ + c3·t·|Σ|²    (2)

The constants ci represent implementation-specific coefficients. The number of transformations t on both sides of the equation can be rearranged, leaving us with:

  n = (c2·|Σ|³ + c3·t·|Σ|²) / (c1·t) = c2·|Σ|³ / (c1·t) + c3·|Σ|² / c1    (3)

As the number of transformations t → ∞, the first term on the right side goes to zero:

  lim_{t→∞} n = c3·|Σ|² / c1    (4)

When |Σ| is finite, then for all source strings of length m, where m > n, the replacing procedure is more computationally intensive than the search procedure:

  c1·t·m > c2·|Σ|³ + c3·t·|Σ|²    (5)

4 Parallel transformations

In this section we describe the conditions needed for parallelization of context and generalized context transformations.

4.1 Commuting transformations

Let us consider two different transformations T1 and T2. We say that these two transformations commute if they satisfy the following definition:

Definition 1. Two different transformations T1 and T2 commute if:

  T1(T2(w)) = T2(T1(w))    (6)

In our model example of the string w = kakaoo the two presented transformations commute: if T1 is the 'ka' to 'kk' transformation and T2 is the 'oo' to 'ok' transformation, then T2(T1('kakaoo')) = T1(T2('kakaoo')) = kkkkok. As an example of non-commuting transformations, consider transformation T1 again and a transformation T3 that replaces the digram 'ao' with 'aa'. Applying these two transformations in the two possible orders leads to different results: T3(T1(w)) = 'kkkkoo' but T1(T3(w)) = 'kkkkao', so we see that T1(T3(w)) ≠ T3(T1(w)).

4.2 Parallel transformations

Commutativity alone is not a sufficient property to decide whether two transformations can be run in parallel; as we will see, the inverse transformations must be taken into account as well.
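The commuting and non-commuting examples of Section 4.1 can be verified mechanically. Here ct is our own illustrative right-to-left replacement, and T1, T2, T3 are the transformations from the text:

```python
def ct(w, a, b, c):
    """CT(ab -> ac), applied right to left."""
    s = list(w)
    for i in range(len(s) - 2, -1, -1):
        if s[i] == a and s[i + 1] == b:
            s[i + 1] = c
    return "".join(s)

w = "kakaoo"
T1 = lambda w: ct(w, "k", "a", "k")   # 'ka' -> 'kk'
T2 = lambda w: ct(w, "o", "o", "k")   # 'oo' -> 'ok'
T3 = lambda w: ct(w, "a", "o", "a")   # 'ao' -> 'aa'
```

T1 and T2 commute (both orders give 'kkkkok'), while T1 and T3 do not: the two orders yield 'kkkkoo' and 'kkkkao'.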
Let us again consider our example word w = kakaoo and two transformations, T4 representing ak → aa and T5 representing ao → aa. These two transformations commute and together transform the word 'kakaoo' into the word 'kaaaao', but when we perform the inverse transformation we can get different results, since we do not know which of the two inverse transformations will replace the digram 'aa' first.

Before we show how to handle inverse transformations, we introduce two sets, an abstract set Di and a set Ki, that will later be used to prove a theorem about parallel transformations:

Definition 2. Let Di be the set of all unique digrams that are modified (created or destroyed) by a transformation Ti(αβ ↔ αγ), and let Ki be the set of the transformation digrams, Ki = {αβ, αγ}.

The set Di contains the digrams αβ, αγ and all digrams of the form βX and γX, where X ∈ Σ. Suppose two sets D1 and D2 with D1 ∩ D2 ≠ ∅; these two sets share at least one common digram d. Suppose that d ∈ K1 ∪ K2. This means that the transformations T1 and T2 interfere on this particular digram, i.e. when transformation T1 modifies a digram αβ to αγ, the second transformation will no longer be able to modify that digram αβ as it would if transformation T1 were not present. From the above reasoning we form a lemma about parallel transformations:

Lemma 1. Two transformations T1 and T2 can be run in parallel if:

  D1 ∩ D2 = ∅    (7)

Proof. Since D1 ∩ D2 = ∅, no digram modified by both transformations T1 and T2 exists, and so the two transformations can be applied together. □

Lemma 1 gives a simple condition that we use in our algorithms to construct a set of parallel transformations, but it is a weak condition: there exist transformations that are parallel even though D1 ∩ D2 ≠ ∅. For instance, for T1(ab ↔ ac) and T2(db ↔ dc) we have D1 ∩ D2 = {bX, cX}.

Theorem 1.
Two transformations T1 and T2 can be run in parallel if:

  D1 ∩ K2 = ∅    (8)

and

  D2 ∩ K1 = ∅    (9)

Proof. Suppose that transformation T1 is of the form T1(αβ ↔ αγ); it has the corresponding sets D1 = {αβ, αγ, βX, γX} and K1 = {αβ, αγ}. If K2 contains one of the elements of D1, then transformation T1 can modify some of the elements that would otherwise be transformed (replaced) by T2, so the two transformations can be run in parallel only if D1 ∩ K2 = ∅. Similar reasoning proves the condition of equation (9): if D2 contains one of the elements of K1, say αβ, then if T2 changes some occurrence of αβ first, the first transformation will no longer be able to modify it to αγ; such transformations cannot be parallelized, so for parallel transformations D2 ∩ K1 = ∅ must hold.

Now suppose that D1 ∩ K2 = ∅ but D2 ∩ K1 ≠ ∅. Then some element, say αβ, is in the set D2. We know that the transformations are not parallel when αβ ∈ K1 ∧ αβ ∈ K2; we now show that they also cannot be parallelized if αβ ∈ D2\K2, because then T2 is of the form Xα ↔ XY, and when α is changed to Y, the digram αβ becomes Yβ and transformation T1 cannot modify it. The same holds in the case D1 ∩ K2 ≠ ∅ but D2 ∩ K1 = ∅. So both conditions of Theorem 1 must hold together. □

Because our parallel algorithm is based on Lemma 1, we show several other properties that parallel transformations based on Lemma 1 have.

Theorem 2. Two different transformations T1 and T2 can be run in parallel if:

  T1(T2(w)) = T2(T1(w)) = wT    (10)

and

  T1⁻¹(T2⁻¹(wT)) = T2⁻¹(T1⁻¹(wT)) = w    (11)

Proof. Suppose the transformation T←(αβ ↔ αγ), where the arrow in the index denotes the transformation direction, i.e.
from right to left ←. It has the corresponding set of modified digrams D← = {αβ, αγ, βX, γX}. Next suppose the transformation T→(αβ ↔ αγ) and its corresponding set of digrams D→ = {αβ, αγ, βX, γX}. We see that D← = D→, and we know that T→(T←(w)) = w, so the transformations T and T⁻¹ share the same set D.

If T1(T2(w)) = T2(T1(w)) = wT, then there are no interfering digrams, because if there were such digrams the equality would not hold; therefore D←,1 ∩ D←,2 = ∅. But we showed that D← = D→, so the intersection of the sets of the inverse transformations is also empty, and T1⁻¹ and T2⁻¹ will likewise recover w correctly. □

Lemma 1 and Theorem 2 describe the same phenomenon, and they can be generalized to an arbitrary set of parallel transformations:

Theorem 3. A set of transformations T can be run in parallel if for every pair of transformations Ti and Tj it holds that:

  Di ∩ Dj = ∅    (12)

Proof. Suppose the set T can be run in parallel, and suppose two transformations Ti, Tj ∈ T such that Di ∩ Dj ≠ ∅. Then there exists some digram αβ common to Ti and Tj, and these two transformations cannot be run in parallel; but this is in contradiction with the hypothesis that T is parallel, so Di ∩ Dj = ∅. □

There is one important consequence of Theorem 2 to emphasize, a statement about inverse transformations:

Corollary 1. Let T = {T1, T2, . . . , Tn} be a set of parallel transformations; then the set T⁻¹ = {T1⁻¹, T2⁻¹, . . . , Tn⁻¹} is parallel as well.

Proof. In Theorem 2 we proved that Di = Di⁻¹, and because for the set T it holds that Di ∩ Dj = ∅ for all pairs of sets Di, Dj, also Di⁻¹ ∩ Dj⁻¹ = ∅, and the transformation set T⁻¹ can be run in parallel. □

Corollary 1 is a very important result because it tells us that we may parallelize not only the set T but also its inverse T⁻¹, so the inverse transformation algorithm is parallelizable as well.
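Based on Lemma 1 and Theorem 3, a greedy construction of a set of mutually parallel transformations can be sketched as follows. The D sets of Definition 2 are expanded over a concrete alphabet; all names are our own illustration, not the paper's implementation:

```python
def d_set(t, sigma):
    """D_i from Definition 2 for a GCT(ab <-> ac): the digrams it can
    create or destroy -- ab, ac, and every bX, cX over the alphabet."""
    a, b, c = t
    d = {a + b, a + c}
    for x in sigma:
        d |= {b + x, c + x}
    return d

def parallel_set(candidates, sigma):
    """Greedily collect transformations whose D sets are pairwise
    disjoint (Lemma 1 / Theorem 3), so they can run in parallel."""
    chosen, used = [], set()
    for t in candidates:
        d = d_set(t, sigma)
        if used.isdisjoint(d):
            chosen.append(t)
            used |= d
    return chosen
```

On the example from Theorem 5, T0(ab ↔ ac) and T1(ad ↔ ae) are accepted, while T2(ac ↔ af) is rejected because D0 ∩ D2 ≠ ∅ (they share the digram ac and the row cX).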
With the knowledge of Lemma 1 we know how to construct the set T; now we explore how large this set can be for a particular alphabet Σ:

Theorem 4. The maximal size M_T of a set of parallel transformations T for a particular alphabet Σ is:

  M_T = ⌊|Σ| / 2⌋    (13)

Proof. There are two basic types of the set D: one corresponds to transformations of the form αα ↔ αβ and the second to transformations αβ ↔ αγ. The first set, D1 = {αβ, αα, αX, βX} = {αX, βX}, influences two rows of the context matrix, while the second set, D2 = {αβ, αγ, βX, γX}, influences three rows. So when only transformations of the first type are selected into T, at most ⌊|Σ|/2⌋ transformations can be run in parallel. □

Theorem 5. The relation "to be parallel" between transformations is not transitive.

Proof. Suppose sets of digrams Di, i ∈ {0, 1, 2}. Let D0 ∩ D1 = ∅ and D1 ∩ D2 = ∅. If transitivity held, then D0 ∩ D2 = ∅, but this is not generally true. Consider the following example: let T0(ab ↔ ac) and T1(ad ↔ ae); these transformations are clearly parallelizable. Now let T2(ac ↔ af); then D1 ∩ D2 = ∅, but D0 ∩ D2 ≠ ∅. □

5 Parallel version of the basic context transformation algorithm

As we saw in the section about parallel transformations, there is a distinct number of rows affected by each transformation. This knowledge allows us, based on Lemma 1, to collect parallel transformations in a first step and afterwards perform them simultaneously. The individual transformations are performed simultaneously in shared memory. The process is outlined in Fig. 5.

  cm ← context matrix
  t_array = [] ← parallel transformations
  t_size ← array size
  for row = 1 to dim(cm) do
    for col = 1 to dim(cm) do
      i = 0
      while parallel_transformation(t_array, row, col) and i < t_size do
        append_transformation(t_array, get_transformation())
        i = i + 1
      end while
      for transformation in t_array do
        perform_parallel(transformation)
      end for {This section is run in parallel}
    end for
  end for

Fig. 5.
Parallel modification of the basic transformation algorithm

We tested the algorithm on a machine equipped with four processors. The parallel algorithm was more than three times faster than the serial one. The results are shown in Table 4.

Table 4. Average processing time per file. Dataset: assasin.

  Serial (ms)  Parallel (ms)  Speed-up
  45.383       13.397         3.387

The serial and parallel algorithms can follow different transformation paths (i.e. different transformations, or a different order in which the transformations are performed). In the serial version of the algorithm, transformations inaccessible to the parallel algorithm can also be performed, so the resulting entropy is lower in the serial case, as presented in Table 5.

Table 5. Entropy comparison. Dataset: assasin.

  No transformation  Serial  Parallel
  4.938              4.054   4.106

6 Conclusion

We showed the basic properties that have to be fulfilled to run the replacing procedure of context transformation algorithms in parallel. The presented parallel version of the basic context transformation algorithm significantly increases the speed of processing of individual data files. On the other hand, the use of the parallel algorithm can lead to a minor increase in the resulting entropy. There are two directions in which we would like to continue our research: the first is to prepare parallel algorithms according to Theorem 1 instead of Lemma 1, and since the algorithms presented so far perform transformations using only digrams, in future work we will also focus on the development of algorithms operating on different context lengths.

Acknowledgment

This work was supported by the SGS in VSB - Technical University of Ostrava, Czech Republic, under grant No. SP2015/146.

References

1. Burrows, M. and Wheeler, D. J., A block sorting lossless data compression algorithm, 1994.
2. Bentley, J. L., Sleator, D. D., Tarjan, R. E. and Wei, V. K., A locally adaptive data compression scheme, Commun. ACM, vol. 29, no. 4, pp.
320-330, 1986.
3. Salomon, D., Data Compression: The Complete Reference, Springer, New York, 2007.
4. Shannon, C. E., A mathematical theory of communication, Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, 1948.
5. Vašinek, M., Kontextové mapy a jejich aplikace [Context maps and their applications], Master Thesis, VŠB - Technical University of Ostrava, 2013.
6. Vašinek, M. and Platoš, J., Entropy reduction using context transformations, Data Compression Conference (DCC), pp. 431-431, 2014.
7. Vašinek, M., Context Transformations, Ph.D. Workshop of Faculty of Electrical Engineering and Computer Science, 2014.