=Paper= {{Paper |id=Vol-1454/ramics15-st_9-14 |storemode=property |title=RLE-based Algorithm for Testing Biorders |pdfUrl=https://ceur-ws.org/Vol-1454/ramics15-st_9-14.pdf |volume=Vol-1454 }} ==RLE-based Algorithm for Testing Biorders== https://ceur-ws.org/Vol-1454/ramics15-st_9-14.pdf
           RLE-based Algorithm for Testing Biorders

                                     Oliver Lanzerath

 Department of Computer Sciences, Bonn-Rhein-Sieg University of Applied Sciences,
               Grantham-Allee 20, 53757 Sankt Augustin, Germany
    oliver.lanzerath@smail.inf.hochschule-bonn-rhein-sieg.de
                        http://www.inf.h-brs.de


         Abstract. Binary relations with certain properties such as biorders, equiv-
         alences or difunctional relations can be represented as particular matri-
         ces. In order for these properties to be identified usually a rearrange-
         ment of rows and columns is required in order to reshape it into a recog-
         nisable normal form. Most algorithms performing these transformations
         are working on binary matrix representations of the underlying relations.
         This paper presents an approach to use the RLE-compressed matrix rep-
         resentation as a data structure for storing relations to test whether they
         are biorders in a hopefully more efficient way.

         Keywords: RLE-XOR · RLE-permutation · biorder


1      Introduction
The matrix representation of a binary relation can be interpreted as a bitmap
image, that is, a bit sequence. In many cases the usage of a run length encoding
(RLE) technique results in a smaller representation of such pictures by shorter
codes for lengthy bit strings. Hence, algorithms which use RLE-compressed
binary matrices as input may have better runtime complexity on average. A
bitvector x consists of alternating series of 0- and 1-sequences. Referring to [3],
a bitvector can be represented by the lengths of the single sequences as follows.

Run Length Encoding. Let seqi ∈ 0j |j ∈ N ∪ 1j |j ∈ N , i ∈ N, be a sequence
                                                

with value(seqi ) = 0, seqi ∈ 0 |j ∈ N and value(seqi ) = 1, seqi ∈ 1j |j ∈ N .
                               j                                    
                                          n
Then, a bitvector x = x0 ...xn−1 ∈ {0, 1} can be represented as x = seq1 ...seqk ,
                                          Pk
1 ≤ k ≤ n, value(seqi ) 6= value(seqi+1 ), i=1 |seqi | = n. The RLE-compression
of a vector x is given by the vector
                                xrle = x0 [|seq1 |, ..., |seqk |]                       (1)
Figure 1 shows an example for the RLE-compression of a bitvector. Further-
more we make use of a notation similar to arrays for accessing elements of
the RLE-compressed vector, where xrle [0] refers to the leading element x0 and
xrle [i], 1 ≤ i ≤ k, to the following length specifications.1
 1
     E.g. the RLE-compression of the vector x = 1100001 is given by xrle = 1 [2, 4, 1] with
     xrle [0] = 1, xrle [1] = 2, xrle [2] = 4 and xrle [3] = 1.
10      O. Lanzerath


                                 11111  000 11111111111
                                 | {z } |{z} |   {z   }
                                     5     3           11
                                 |             {z           }
                                           1[5,3,11]



                Fig. 1. Example for the RLE-compression of a bitvector.


Biorders. Biorders can be defined in different ways. First we introduce the for-
mal definition[4]: Let R ⊆ X × X be a homogeneous binary relation. R is called
biorder (or: Ferrers relation in heterogeneous cases), iff
                              aRb ∧ cRd ∧ ¬aRd → cRb
holds ∀ a, b, c, d ∈ X.
    In the context of this paper the following definiton is helpful. A binary rela-
tion is called a biorder, iff the matrix can be represented in (upper left) echelon
block form2 by rearranging rows and columns independently [4]. Thus, it is suffi-
cient to test relations for being biorders by using an algorithm that computes the
echelon block form if possible and returns an error otherwise. Figure 2 shows
the results of the group stage of group B during the recent FIFA World Cup,
where ARB , iff A won the match against B, A, B ∈ {ESP , CHL, NLD, AUS },
A 6= B. By rearranging rows and then columns the matrix for the relation R
               NLD




                                               NLD




                                                                       NLD
               AUS




                                               AUS




                                                                       AUS
               CHL




                                               CHL




                                                                       CHL
               ESP




                                               ESP




                                                                       ESP
            R                              R                       R
          ESP 0 0 0 1                    NLD 1 1 0 1             NLD 1 1 1 0
          CHL 1 0 0 1                    CHL 1 0 0 1             CHL 1 1 0 0
          NLD 1 1 0 1                    ESP 0 0 0 1             ESP 1 0 0 0
          AUS 0 0 0 0                    AUS 0 0 0 0             AUS 0 0 0 0

                  Fig. 2. Group stage FIFA World Cup 2014 (group B).


is transformed into echelon block form proving that R is a biorder.3 If a given
relation is a biorder, the echelon block form can be achieved in two steps [1]:
 1. Sort the rows by their Hamming weight4 in descending order.
 2. Sort the columns by their Hamming weight in descending order.
A corresponding algorithm would perform these two steps and check if the
result is in echelon block form. The time complexity of such an algorithm is
in O(2n log n + n2 ) ⊆ O(n2 ). The following lemma states, that with respect to
biorder tests the second step is not needed.
 2
   Visually speaking, a binary matrix is in upper left echelon block form, if all 1-entries
   are placed in the upper left corner and the cardinality of the 1-entries monotonically
   decreases from top to bottom.
 3
   Soccer results do not induce biorders in general. The example is well-chosen here.
 4                             Pn−1
   We use the notation kxk = i=0     xi to note the Hamming weight of a bitvector x (cf.
   [2]).
                                    RLE-based Algorithm for Testing Biorders       11

Lemma 1. Let A be a binary n × n-matrix, and let A0 be the result of sorting the rows
                                                        a biorder if and only if A0 is
of A by their Hamming weights in descending order. A is 
exclusively composed of column vectors x∗i ∈ {0 , 1 } ∪ 1j 0n−j |1 ≤ j ≤ n .
                                               n n



Proof. First, we show the if-part by contradiction. Assume that a given n × n-
matrix resp. binary relation is a biorder, and after having sorted the rows by
their Hamming weights there exists a column vector that does not fulfil the
condition. Then there exists at least one column vector in which a 1-sequence
follows
n                                                    i, 1 ≤ i ≤ n, such that x∗i ∈
        a 0-sequence, i.e. there exists at least one o
                  j            n−j−2
 υ01ω|υ ∈ {0, 1} , ω ∈ {0, 1}               , 0 ≤ j ≤ n − 2 . Now we sort the columns
by their Hamming weights. Let xki = 0 and xli = 1 with k < l, then kxk∗ k ≥
kxl∗ k. After having sorted rows and columns the matrix should be in echelon
block form because it is a biorder. Then xl0 = xl1 = · · · = xli = 1 and xk0 =
xk1 = · · · = xki = 1 because of kxk∗ k ≥ kxl∗ k. This contradicts the assumption.
   The other direction is obvious. If the rows are ordered by Hamming weight
and all column vectors are of the desired type, there must be an ordering
hx∗j1 , x∗j2 , ..., x∗jn i, ji ∈ {0, ..., n − 1} of the column vectors with kx∗j1 k ≥
kx∗j2 k≥ ... ≥ kx∗jn k. The echelon block form is accomplished by sorting the
columns according to this ordering.



2   Algorithm

The complexity of checking all column vectors of a binary matrix for a certain
form is still in O(n2 ). If the column vectors are RLE-compressed, the checking
can be done in linear time. We only need to verify that all RLE-compressed col-
umn vectors have a length of 1 or 2 and start with a 1-sequence. We assume
the relation is represented in RLE-compressed form, i.e. column vectors as well
as row vectors are RLE-compressed (c.f. running example in Fig. 3). A biorder-
               NLD
               AUS
               CHL
               ESP




                                                                NLD
                                                                AUS
                                                                CHL
                                                                ESP




         ESP 0 0 0 1                ESP 0[3,1]
                                                                0[1,2,1]
                                                                0[2,1,1]
                                                                0[4]
                                                                1[3,1]




         CHL 1 0 0 1                CHL 1[1,2,1]
         NLD 1 1 0 1                NLD 1[2,1,1]
         AUS 0 0 0 0                AUS 0[4]

                      Fig. 3. RLE-compressed rows and columns



checking algorithm is defined as follows. First one adjusts the definition of the
Hamming weight for RLE-compressed vectors with respect to the new sorting
algorithm. If the leading element xrle [0] is 1, all odd positions in the follow-
ing array refer to 1-entries in the corresponding binary matrix, and vice versa.
12         O. Lanzerath

Hence, the Hamming weight for RLE-compressed vectors can be defined as:5
                            rle
                             |xP |
                                        xrle [i], xrle [0] = 1
                           
                           
                           
                           
                            i=1,i∈O
                 kxrle k =      rle
                              |xP   |
                                      +
                                                                       (2)
                           
                                         rle       rle
                                        x [i], x [0] = 0
                           
                           
                           
                           
                                      i=2,i∈E+


where O+ and E+ are the positive odd and even numbers, respectively.
    Sorting RLE-compressed row vectors by the Hamming weight causes
changes in the column vectors, too. In binary-coded cases this is no point of
interest because this change occurs automatically. But, if the vectors are RLE-
compressed, it is much more complicated. A permutation of the row vectors
xrle                                         rle
  j∗ can have effects on all column vectors x∗i . Assume xj∗ and xk∗ are swapped
then xji and xki must be inverted iff xji 6= xki . Algorithm 1 implements a bit-
compare function for RLE-compressed vectors.


      Data: A RLE-compressed n-bitvector xrle and two integers
              j, k, 1 ≤ j ≤ n, 1 ≤ k ≤ n, j < k
      Result: Boolean: xk 6= xj
      dist := 0;
      counter := xrle [1];
      for i := 2 to |xrle | do
           if counter ≥ j && counter ≤ k then
                dist + +;
           end
           counter+ = xrle [i];
      end
      if dist == 1 mod 2 then
           return true;
      else
           return f alse;
      end
       Algorithm 1: bitCompare-function for RLE-compressed n-bitvectors


To ensure that in case xji 6= xki the corresponding bits must be switched, we
use an XOR-operation with a special bit mask:

                       x x1 · · · xj−1 xj xj+1 · · · xk−1 xk xk+1 · · · xn
                    mask 0 · · · 0 1 0 · · · 0 1 0 · · · 0
                    XOR x1 · · · xj−1 xj xj+1 · · · xk−1 xk xk+1 · · · xn

The logical XOR is the essential part of the whole procedure. Algorithm 2 shows
how the XOR-operation can be computed on RLE-compressed vectors.
5
     |xrle | is the length of the array or the RLE-compressed vector, respectively.
                                         RLE-based Algorithm for Testing Biorders   13

   First the biorder checking algorithm sorts the row vectors xrle
                                                               j∗ by their Ham-
ming weights. For each pairwise permutation of rows all column vectors xrle  ∗i
must be adjusted if necessary. After that the resulting column vectors must be
checked for meeting the conditions of Lemma 1. If each column vector meets
the conditions, the underlying relation is a biorder.



3   Examination

As is well known, sorting row vectors of a binary matrix by the Hamming
weight is in O(n log n), as well as sorting RLE-compressed vectors. But, to check
the conditions of Lemma 1 in later process, the column vectors need to be
adjusted, too. In the worst case all n column vectors must be modified for
each change of rows and, therefore, n XOR-operations have to be computed.
The required time for each XOR-operation depends on the length of the RLE-
compressed vectors. Hence, the time complexity of the sorting procedure is
bounded from above by O(n3 log n). Now, the test for meeting the conditions
of Lemma 1 can be done in O(n).



    Data: Two RLE-compressed copies of n-bitvectors xrle , y rle
    Result: Result of the XOR z rle
    xpointer := 1;
    ypointer := 1;
    zpointer := 1;
    z rle [0] := |xrle [0] − y rle [0]|;
             zpointer
                      z rle [i] < n do
                P
    while
            i:=1
       if xrle [xpointer ] == y rle [ypointer ] then
           z rle [zpointer ]+ = xrle [xpointer ];
           xpointer + +;
           ypointer + +;
       else if xrle [xpointer ] < y rle [ypointer ] then
           z rle [zpointer ]+ = xrle [xpointer ];
           y rle [ypointer ]− = xrle [xpointer ];
           xpointer + +;
           zpointer + +;
       else
           z rle [zpointer ]+ = y rle [ypointer ];
           xrle [xpointer ]− = y rle [ypointer ];
           ypointer + +;
           zpointer + +;

    end
              Algorithm 2: XOR on RLE-compressed n-bitvectors
14      O. Lanzerath

    To speed up the runtime of the algorithm one could use a more specific op-
eration than the XOR. Furthermore, generating customized bitmasks for each
column during the sorting process could reduce the number of necessary XOR-
operations to n. In this paper the logical XOR on RLE-compressed vectors is
focused because using it in context of checking relations for being biorders is
only one possible use case. An other one could be a simple compare function
that returns true if two RLE-compressed vectors are equal and f alse otherwise.
Premising two n-bitvectors xrle , y rle are equal, the bitwise XOR requires n steps
and the RLE-XOR requires |xrle | · 32 steps (assuming that natural numbers are
saved as 32-bit integers). Hence, such an algorithm can reach much better run-
times for sparsely populated or sorted vectors than a bitwise one. A conceivable
use case for such an algorithm could be an automated error correction for par-
ticular kinds of databases.


4    Conclusion

Advantages and disadvantages of using RLE codes in the context of relational
algebraic methods have not been well studied yet. The main contribution of this
paper is to motivate further research and analysis of RLE encodings for binary
relations. Actually, the presented algorithm is in O(n3 ) but clearly conveys the
advantages of using RLE-codes. Our current work focuses on the development
of further efficient algorithms for logical operators on RLE codes. Once a suf-
ficient repository is available, these operations can be used for more efficient
row-to-row and column-to-column comparison and, hence, sorting. Together
with more test procedures for difunctionality and transitivity, we hope to find
an algorithm that comes close to the “magic” runtime complexity of O(n2.3 ).

Acknowledgements. The author wish to express his thanks to Dr. Martin E.
Müller and Professor Dr. Kurt-Ulrich Witt for careful reading and useful com-
ments.


References
1. Müller, M.E.: Towards Finding Maximal Subrelations with Desired Properties. In:
   Höfner, P., Jipsen, P., Kahl, W., Müller, M.E. (eds.) RAMiCS 2014. LNCS, vol. 8428,
   pp. 344–361. Springer, Heidelberg (2014)
2. Reed, I.: A class of multiple-error-correcting codes and the decoding scheme. Trans-
   actions of the IRE Professional Group on Information Theory, vol. 4, no. 4, pp. 38–49
   (1954)
3. Salomon, D.: Data compression - The Complete Reference, 4th Edition. Springer, Lon-
   don (2007)
4. Schmidt, G.: Relational Mathematics, Encyclopedia of Mathematics and its Applica-
   tions, vol. 132. Cambridge University Press, Cambridge (2011)