RLE-based Algorithm for Testing Biorders Oliver Lanzerath Department of Computer Sciences, Bonn-Rhein-Sieg University of Applied Sciences, Grantham-Allee 20, 53757 Sankt Augustin, Germany oliver.lanzerath@smail.inf.hochschule-bonn-rhein-sieg.de http://www.inf.h-brs.de Abstract. Binary relations with certain properties such as biorders, equiv- alences or difunctional relations can be represented as particular matri- ces. In order for these properties to be identified usually a rearrange- ment of rows and columns is required in order to reshape it into a recog- nisable normal form. Most algorithms performing these transformations are working on binary matrix representations of the underlying relations. This paper presents an approach to use the RLE-compressed matrix rep- resentation as a data structure for storing relations to test whether they are biorders in a hopefully more efficient way. Keywords: RLE-XOR · RLE-permutation · biorder 1 Introduction The matrix representation of a binary relation can be interpreted as a bitmap image, that is, a bit sequence. In many cases the usage of a run length encoding (RLE) technique results in a smaller representation of such pictures by shorter codes for lengthy bit strings. Hence, algorithms which use RLE-compressed binary matrices as input may have better runtime complexity on average. A bitvector x consists of alternating series of 0- and 1-sequences. Referring to [3], a bitvector can be represented by the lengths of the single sequences as follows. Run Length Encoding. Let seqi ∈ 0j |j ∈ N ∪ 1j |j ∈ N , i ∈ N, be a sequence   with value(seqi ) = 0, seqi ∈ 0 |j ∈ N and value(seqi ) = 1, seqi ∈ 1j |j ∈ N .  j  n Then, a bitvector x = x0 ...xn−1 ∈ {0, 1} can be represented as x = seq1 ...seqk , Pk 1 ≤ k ≤ n, value(seqi ) 6= value(seqi+1 ), i=1 |seqi | = n. The RLE-compression of a vector x is given by the vector xrle = x0 [|seq1 |, ..., |seqk |] (1) Figure 1 shows an example for the RLE-compression of a bitvector. Further- more we make use of a notation similar to arrays for accessing elements of the RLE-compressed vector, where xrle [0] refers to the leading element x0 and xrle [i], 1 ≤ i ≤ k, to the following length specifications.1 1 E.g. the RLE-compression of the vector x = 1100001 is given by xrle = 1 [2, 4, 1] with xrle [0] = 1, xrle [1] = 2, xrle [2] = 4 and xrle [3] = 1. 10 O. Lanzerath 11111 000 11111111111 | {z } |{z} | {z } 5 3 11 | {z } 1[5,3,11] Fig. 1. Example for the RLE-compression of a bitvector. Biorders. Biorders can be defined in different ways. First we introduce the for- mal definition[4]: Let R ⊆ X × X be a homogeneous binary relation. R is called biorder (or: Ferrers relation in heterogeneous cases), iff aRb ∧ cRd ∧ ¬aRd → cRb holds ∀ a, b, c, d ∈ X. In the context of this paper the following definiton is helpful. A binary rela- tion is called a biorder, iff the matrix can be represented in (upper left) echelon block form2 by rearranging rows and columns independently [4]. Thus, it is suffi- cient to test relations for being biorders by using an algorithm that computes the echelon block form if possible and returns an error otherwise. Figure 2 shows the results of the group stage of group B during the recent FIFA World Cup, where ARB , iff A won the match against B, A, B ∈ {ESP , CHL, NLD, AUS }, A 6= B. By rearranging rows and then columns the matrix for the relation R NLD NLD NLD AUS AUS AUS CHL CHL CHL ESP ESP ESP R R R ESP 0 0 0 1 NLD 1 1 0 1 NLD 1 1 1 0 CHL 1 0 0 1 CHL 1 0 0 1 CHL 1 1 0 0 NLD 1 1 0 1 ESP 0 0 0 1 ESP 1 0 0 0 AUS 0 0 0 0 AUS 0 0 0 0 AUS 0 0 0 0 Fig. 2. Group stage FIFA World Cup 2014 (group B). is transformed into echelon block form proving that R is a biorder.3 If a given relation is a biorder, the echelon block form can be achieved in two steps [1]: 1. Sort the rows by their Hamming weight4 in descending order. 2. Sort the columns by their Hamming weight in descending order. A corresponding algorithm would perform these two steps and check if the result is in echelon block form. The time complexity of such an algorithm is in O(2n log n + n2 ) ⊆ O(n2 ). The following lemma states, that with respect to biorder tests the second step is not needed. 2 Visually speaking, a binary matrix is in upper left echelon block form, if all 1-entries are placed in the upper left corner and the cardinality of the 1-entries monotonically decreases from top to bottom. 3 Soccer results do not induce biorders in general. The example is well-chosen here. 4 Pn−1 We use the notation kxk = i=0 xi to note the Hamming weight of a bitvector x (cf. [2]). RLE-based Algorithm for Testing Biorders 11 Lemma 1. Let A be a binary n × n-matrix, and let A0 be the result of sorting the rows a biorder if and only if A0 is of A by their Hamming weights in descending order. A is  exclusively composed of column vectors x∗i ∈ {0 , 1 } ∪ 1j 0n−j |1 ≤ j ≤ n . n n Proof. First, we show the if-part by contradiction. Assume that a given n × n- matrix resp. binary relation is a biorder, and after having sorted the rows by their Hamming weights there exists a column vector that does not fulfil the condition. Then there exists at least one column vector in which a 1-sequence follows n i, 1 ≤ i ≤ n, such that x∗i ∈ a 0-sequence, i.e. there exists at least one o j n−j−2 υ01ω|υ ∈ {0, 1} , ω ∈ {0, 1} , 0 ≤ j ≤ n − 2 . Now we sort the columns by their Hamming weights. Let xki = 0 and xli = 1 with k < l, then kxk∗ k ≥ kxl∗ k. After having sorted rows and columns the matrix should be in echelon block form because it is a biorder. Then xl0 = xl1 = · · · = xli = 1 and xk0 = xk1 = · · · = xki = 1 because of kxk∗ k ≥ kxl∗ k. This contradicts the assumption. The other direction is obvious. If the rows are ordered by Hamming weight and all column vectors are of the desired type, there must be an ordering hx∗j1 , x∗j2 , ..., x∗jn i, ji ∈ {0, ..., n − 1} of the column vectors with kx∗j1 k ≥ kx∗j2 k≥ ... ≥ kx∗jn k. The echelon block form is accomplished by sorting the columns according to this ordering. 2 Algorithm The complexity of checking all column vectors of a binary matrix for a certain form is still in O(n2 ). If the column vectors are RLE-compressed, the checking can be done in linear time. We only need to verify that all RLE-compressed col- umn vectors have a length of 1 or 2 and start with a 1-sequence. We assume the relation is represented in RLE-compressed form, i.e. column vectors as well as row vectors are RLE-compressed (c.f. running example in Fig. 3). A biorder- NLD AUS CHL ESP NLD AUS CHL ESP ESP 0 0 0 1 ESP 0[3,1] 0[1,2,1] 0[2,1,1] 0[4] 1[3,1] CHL 1 0 0 1 CHL 1[1,2,1] NLD 1 1 0 1 NLD 1[2,1,1] AUS 0 0 0 0 AUS 0[4] Fig. 3. RLE-compressed rows and columns checking algorithm is defined as follows. First one adjusts the definition of the Hamming weight for RLE-compressed vectors with respect to the new sorting algorithm. If the leading element xrle [0] is 1, all odd positions in the follow- ing array refer to 1-entries in the corresponding binary matrix, and vice versa. 12 O. Lanzerath Hence, the Hamming weight for RLE-compressed vectors can be defined as:5  rle  |xP | xrle [i], xrle [0] = 1     i=1,i∈O kxrle k = rle |xP | + (2)  rle rle x [i], x [0] = 0     i=2,i∈E+ where O+ and E+ are the positive odd and even numbers, respectively. Sorting RLE-compressed row vectors by the Hamming weight causes changes in the column vectors, too. In binary-coded cases this is no point of interest because this change occurs automatically. But, if the vectors are RLE- compressed, it is much more complicated. A permutation of the row vectors xrle rle j∗ can have effects on all column vectors x∗i . Assume xj∗ and xk∗ are swapped then xji and xki must be inverted iff xji 6= xki . Algorithm 1 implements a bit- compare function for RLE-compressed vectors. Data: A RLE-compressed n-bitvector xrle and two integers j, k, 1 ≤ j ≤ n, 1 ≤ k ≤ n, j < k Result: Boolean: xk 6= xj dist := 0; counter := xrle [1]; for i := 2 to |xrle | do if counter ≥ j && counter ≤ k then dist + +; end counter+ = xrle [i]; end if dist == 1 mod 2 then return true; else return f alse; end Algorithm 1: bitCompare-function for RLE-compressed n-bitvectors To ensure that in case xji 6= xki the corresponding bits must be switched, we use an XOR-operation with a special bit mask: x x1 · · · xj−1 xj xj+1 · · · xk−1 xk xk+1 · · · xn mask 0 · · · 0 1 0 · · · 0 1 0 · · · 0 XOR x1 · · · xj−1 xj xj+1 · · · xk−1 xk xk+1 · · · xn The logical XOR is the essential part of the whole procedure. Algorithm 2 shows how the XOR-operation can be computed on RLE-compressed vectors. 5 |xrle | is the length of the array or the RLE-compressed vector, respectively. RLE-based Algorithm for Testing Biorders 13 First the biorder checking algorithm sorts the row vectors xrle j∗ by their Ham- ming weights. For each pairwise permutation of rows all column vectors xrle ∗i must be adjusted if necessary. After that the resulting column vectors must be checked for meeting the conditions of Lemma 1. If each column vector meets the conditions, the underlying relation is a biorder. 3 Examination As is well known, sorting row vectors of a binary matrix by the Hamming weight is in O(n log n), as well as sorting RLE-compressed vectors. But, to check the conditions of Lemma 1 in later process, the column vectors need to be adjusted, too. In the worst case all n column vectors must be modified for each change of rows and, therefore, n XOR-operations have to be computed. The required time for each XOR-operation depends on the length of the RLE- compressed vectors. Hence, the time complexity of the sorting procedure is bounded from above by O(n3 log n). Now, the test for meeting the conditions of Lemma 1 can be done in O(n). Data: Two RLE-compressed copies of n-bitvectors xrle , y rle Result: Result of the XOR z rle xpointer := 1; ypointer := 1; zpointer := 1; z rle [0] := |xrle [0] − y rle [0]|; zpointer z rle [i] < n do P while i:=1 if xrle [xpointer ] == y rle [ypointer ] then z rle [zpointer ]+ = xrle [xpointer ]; xpointer + +; ypointer + +; else if xrle [xpointer ] < y rle [ypointer ] then z rle [zpointer ]+ = xrle [xpointer ]; y rle [ypointer ]− = xrle [xpointer ]; xpointer + +; zpointer + +; else z rle [zpointer ]+ = y rle [ypointer ]; xrle [xpointer ]− = y rle [ypointer ]; ypointer + +; zpointer + +; end Algorithm 2: XOR on RLE-compressed n-bitvectors 14 O. Lanzerath To speed up the runtime of the algorithm one could use a more specific op- eration than the XOR. Furthermore, generating customized bitmasks for each column during the sorting process could reduce the number of necessary XOR- operations to n. In this paper the logical XOR on RLE-compressed vectors is focused because using it in context of checking relations for being biorders is only one possible use case. An other one could be a simple compare function that returns true if two RLE-compressed vectors are equal and f alse otherwise. Premising two n-bitvectors xrle , y rle are equal, the bitwise XOR requires n steps and the RLE-XOR requires |xrle | · 32 steps (assuming that natural numbers are saved as 32-bit integers). Hence, such an algorithm can reach much better run- times for sparsely populated or sorted vectors than a bitwise one. A conceivable use case for such an algorithm could be an automated error correction for par- ticular kinds of databases. 4 Conclusion Advantages and disadvantages of using RLE codes in the context of relational algebraic methods have not been well studied yet. The main contribution of this paper is to motivate further research and analysis of RLE encodings for binary relations. Actually, the presented algorithm is in O(n3 ) but clearly conveys the advantages of using RLE-codes. Our current work focuses on the development of further efficient algorithms for logical operators on RLE codes. Once a suf- ficient repository is available, these operations can be used for more efficient row-to-row and column-to-column comparison and, hence, sorting. Together with more test procedures for difunctionality and transitivity, we hope to find an algorithm that comes close to the “magic” runtime complexity of O(n2.3 ). Acknowledgements. The author wish to express his thanks to Dr. Martin E. Müller and Professor Dr. Kurt-Ulrich Witt for careful reading and useful com- ments. References 1. Müller, M.E.: Towards Finding Maximal Subrelations with Desired Properties. In: Höfner, P., Jipsen, P., Kahl, W., Müller, M.E. (eds.) RAMiCS 2014. LNCS, vol. 8428, pp. 344–361. Springer, Heidelberg (2014) 2. Reed, I.: A class of multiple-error-correcting codes and the decoding scheme. Trans- actions of the IRE Professional Group on Information Theory, vol. 4, no. 4, pp. 38–49 (1954) 3. Salomon, D.: Data compression - The Complete Reference, 4th Edition. Springer, Lon- don (2007) 4. Schmidt, G.: Relational Mathematics, Encyclopedia of Mathematics and its Applica- tions, vol. 132. Cambridge University Press, Cambridge (2011)