Multiway Blockwise In-place Merging Viliam Geffert and Jozef Gajdoš Institute of Computer Science, P.J.Šafárik University, Faculty of Science Jesenná 5, 041 54 Košice, Slovak Republic viliam.geffert@upjs.sk, jozef.gajdos@upjs.sk Abstract. We present an algorithm for asymptotically ef- array A. Moreover, the lengths n1 , . . . , nk of input se- ficient multiway blockwise in-place merging. Given an ar- quences are positive integer multiples of s, and hence, ray A containing sorted subsequences Pk A1 , . . . , Ak of respec- there is always a block boundary between the last el- tive lengths n1 , . . . , nk , where n = n, we assume i=1 i ement of Ai and the first element of Ai+1 , for each that extra k·s elements (so called buffer elements) are po- i ∈ 1, . . . , k−1. We shall also assume, that before the sitioned at the very end of array A, and that the lengths merging starts, blocks can be mixed up quite arbitrar- n1 , . . . , nk are positive integer multiples of some param- ily, so we no longer know the original membership of eter s (i.e., multiples of a given block of length s). The number of input sequences k is a fixed constant parameter, blocks in the input sequences A1 , . . . , Ak . not dependent on the lengths of input sequences. Then our So far, the problem has been resolved for two-way algorithm merges the subsequences A1 , . . . , Ak into a single merging, i.e., for k = 2 [4]. This algorithm uses 2n + sorted sequence, performing Θ(log k·n) + O((n/s)2 ) + O(s· o(n) comparisons, 3n + o(n) element moves and O(1) log s) element comparisons and 3 · n + O(s · log s) element extra locations for storing elements, in the worst case. moves.1 Thus, by repeated application of this algorithm, we Then, for s = dn2/3 /(log n)1/3 e, this gives an algorithm could carry out k-way merging in linear time, for ar- performing Θ(log k ·n) + O((n·log n)2/3 ) comparisons and bitrary k ≥ 2. However, implemented this way, the k- 3·n + O((n·log n)2/3 ) moves. That is, our algorithm runs way merging would perform 3·dlog ke·n+o(n) element in linear time, with an asymptotically optimal number of moves and 2·dlog ke·n+o(n) element comparisons. We comparisons and with the number of moves independent shall show that the number of moves does not depend on the number of input sequences. Moreover, our algo- rithm is “almost in-place”, it requires only k extra blocks on k, if the lengths n1 , . . . , nk are integer multiples of size s = o(n). of the block size s. Namely, using the algorithm of Geffert et. al [4] as our starting point, we show that multiway blockwise in-place merging is possible with 1 Introduction dlog ke·n + O((n/s)2 ) + O(s·log s) element comparisons and 3·n + O(s·log s) moves. For s = dn2/3 /(log n)1/3 e, this gives an algorithm with dlog ke·n + O((n·log n)2/3 ) Given an array A[1..n] consisting of sorted subsequen- comparisons and 3·n + O((n·log n)2/3 ) moves, and the ces A1 , . . . , Ak each Pcontaining n1 , . . . , nk elements re- k number of element moves independent on the number spectively, where i=1 ni = n, the classical multiway of input sequences. (It is also easy to show that the in-place merging problem is to rearrange these ele- number of comparisons cannot be improved.) ments to form a single sorted sequence of n elements, assuming that only one extra storage location (in ad- dition to the array A) is available for storing elements. 2 Comparisons in a simple multiway To store array indices, counters, etc. only O(1) stor- merging age locations are available. The efficiency of a merg- ing algorithm is given by two quantities: the number To explain how elements are compared, we first solve of pairwise element comparisons and the number of a simpler task. Assume that we are given an array A, element moves carried out in the worst case, both ex- consisting of k sorted subsequences A1 , A2 , . . . , Ak , pressed as a function of n. In merging, these are the that are to be merged into a single sorted sequence. only operations permitted for elements. The lengths of these P subsequences are n1 , n2 , . . . , nk In this paper we study the computational complex- respectively, with k ni = n. i=1 ity of the multiway blockwise in-place merging prob- Assume also that, together with the given array A, lem. More precisely, we assume that the entire array A we are also given an extra array B of the same size n, is divided into blocks of equal size s, and that k ex- which will be used as an output zone. tra blocks of size s are positioned at the very end of During the computation, the algorithm uses aux- 1 Throughout the paper, log x denotes the binary loga- iliary index variables i1 , . . . , ik and oc , where ij , for rithm of x. j ∈ {1, . . . , k}, points to the smallest element of the sequence Aj not yet processed. This element will be Before passing further, we define the following rel- called the current input element of the j-th sequence, ative order of blocks in the array A. Let X be a block or simply the j-th input element. The index oc points with the leftmost and the rightmost elements denoted to the leftmost empty position in the array B. by xL and xR , respectively. Such block can be repre- Then the straightforward implementation of the sented in the form X = hxL , xR i. Similarly, let Y = merge routine proceeds as follows. We find the small- hyL , yR i be an another block. We say that the block X est element not yet processed, by comparing elements is smaller than or equal to Y , if xL < yL , or xL = yL at the positions i1 , . . . , ik , and move this element to and xR ≤ yR . Otherwise, X is greater than Y . In other the output zone in B. After that, we update the nec- words, the blocks are ordered according to their left- essary index variables and repeat the process until all most elements and, in the case of equal leftmost el- the elements have been merged. Implemented this way, ements, the elements at the rightmost positions are each element will be moved just once and the number used as the second order criterion. of comparisons, per each element, will be k −1. This gives us (k−1)·n comparisons and n element moves in total. Now the modified merging algorithm proceeds as follows. First, using the above block ordering, find the The number of comparisons can be reduced by im- smallest k blocks in the array A. These blocks will ini- plementing a selection tree of depth dlog ke above the tially become the k current input blocks, their leftmost k current input elements. Initially, to build a selec- elements becoming the k current input elements. The tion tree, k − 1 comparisons are required. Then the j-th current input block will be denoted by Xj , simi- smallest element, not yet processed, can be moved to larly, the j-th current input element by xj . The posi- the output zone. After this, the element following the tions of current input elements are kept in index vari- smallest element in the same subsequence is inserted ables i1 , . . . , ik . Above the k current input elements, in the tree and the selection tree is updated. To do we build a selection tree. All blocks that are not input this, only dlog ke comparisons are needed. To avoid el- blocks are called common blocks. ement moves, only pointers to elements are stored in After that, the merging process can proceed in the the selection tree. (For more details concerning this same way as described in Section 2. That is, using data structure, see [1–3].) The number of moves re- the selection tree, determine ij , the position of the mains unchanged, but now we have k − 1 compar- smallest input element not yet processed, among the isons for the first element and only dlog ke compar- k current input elements, and move this element to isons per each other element. This gives us a total of the output zone in the array B. Then the element po- (k−1) + dlog ke·(n−1) ≤ dlog ke·n + O(1) comparisons. sitioned immediately on the right of xj , within the same block Xj , becomes a new j-th current input el- 3 Comparisons in a blockwise merging ement, its index pointer is inserted into the selection tree, and the tree is updated, with dlog ke compar- This section describes one of the cardinal tricks used in isons. This can be repeated until one of the current our algorithm. Again, we are given the array A con- input blocks becomes empty. sisting of the sorted subsequences A1 , . . . , Ak , to be When this happens, i.e., each time the element xj , merged together. We still have the extra array B, used just moved to the output zone, was the last (right- as an output zone. most) element in the corresponding input block Xj , However, now the entire array A is divided into the block Xj is “discarded” and the smallest (accord- blocks of equal size s (the exact value of s will be ing to our relative block ordering) common block not determined later, so that the number of comparisons yet processed will be used as the new j-th current and moves is minimized) and, before the merging can input block. The leftmost element in this block will start, these blocks are mixed up quite arbitrarily. Be- become the new j-th current input element. Since the cause of the permutation of blocks in A, we no longer blocks are mixed up in the array A, we need to scan know the original membership of blocks in the input sequentially all blocks (actually, all blocks not yet pro- sequences A1 , . . . , Ak . cessed only) to determine which one of them is the Still, the relative order of elements inside individ- smallest. This search for a new input block consumes ual blocks is preserved. Moreover, we shall also assume O((n/s)2 ) additional comparisons: there are at most that n1 , . . . , nk , the respective lengths of input se- n/s blocks and such search is activated only if one of quences, are positive integer multiples of s, and hence, the input blocks has been discarded as empty, i.e., at before mixing the blocks up, there was always a block most n/s times. (For the time being, just assume that boundary between the last element of Ai and the first we can distinguish discarded blocks from those not yet element of Ai+1 , for each i ∈ 1, . . . , k−1. processed, at no extra cost.) However, before merging, the blocks have been mi- a so-called escape block. We also maintain a current xed up quite arbitrarily and hence their origin in the escape position ec , which is initially the position of input subsequences A1 , . . . , Ak cannot be recovered. the first (leftmost) element in the escape block. We The proof that the above algorithm behaves correctly, create a hole here by putting the buffer element at that is, the elements are transported to the output this position aside. zone in sorted order, will be published in the full ver- Now, find the smallest k blocks2 in the area oc- sion of the paper. (** The proof can also be found in cupied by A1 , . . . , Ak , according to the relative block the Appendix for the Program Committee. **) ordering defined in Section 3. This can be done with The number of element moves remains unchanged, O(k 2 ) ≤ O(1) comparisons, by the use of some k cur- but now we use dlog ke · n + O((n/s)2 ) comparisons, sors (index variables) moving along in A, since each under assumption that we can distinguish discarded of the sequences A1 , . . . , Ak is sorted. The smallest k blocks from those not yet processed at no extra cost. blocks will initially become the k current input blocks X1 , . . . , Xk . For each j = 1, . . . , k, the first element xj in the block Xj becomes a j-th current input element, 4 In-place merging, simplified case and its position is kept in the index variable ij . Above the k input elements, we build a selection tree of depth Now we shall convert the above merging algorithm dlog ke. To do that, k −1 ≤ O(1) initial comparisons into a procedure working “almost” in-place. More pre- are needed. cisely, we are again given the array A containing the The very first block of the array A becomes an sorted subsequences Pk A1 , . . . , Ak , of respective lengths output block and a position oc = 1 pointing there n1 , . . . , nk , with i=1 ni = n. All these lengths are becomes a current output position. The initial out- positive integer multiples of the given parameter s. put position may — quite likely — coincide with a po- However, we no longer have a separate array B of sition of some current input element. Observe that size n. Instead, we have some extra k·s elements po- ec mod s = oc mod s, which is an invariant we shall sitioned at the very end of the array A, behind Ak . keep in the course of the entire computation. All other The elements in this small additional area are greater blocks are called common blocks. than any of the elements in A1 , . . . , Ak . During the computation, they can be mixed with other elements, but their original contents cannot be destroyed. These In general, the algorithm maintains current posi- elements will be called buffer elements. To let the el- tions of the following special blocks: free blocks, the ements ever move, we have also one extra location number of which ranges between 0 and k, their left- where we can put a single element aside. most positions are stored in the free block stack; ex- The sorted output should be formed within the actly k input blocks, the current input positions inside same array A, in the locations occupied by the input the respective blocks are stored in the index variables sequences A1 , . . . , Ak . (As a consequence, the buffer el- i1 , . . . , ik ; one output block with the current output ements should also end up in their original locations.) position oc inside this block; and one escape block with Therefore, the moves are performed in a different way, the current escape position ec inside. The values of oc based on the idea of internal buffering, used in a two- and ec are synchronized modulo s. way in-place merging [4]. Nevertheless, the compar- Usually, the optional free blocks, the k current in- isons are performed in the same way as described in put blocks, the output block, and the escape block are Section 3. all disjoint, and the merging proceeds as described in Section 4.2. However, after the initiation, the output block may overlay one of the current input blocks, if 4.1 Initiation the leftmost block in A1 has been selected as an in- Divide the entire array A into blocks of equal size s. put block. If this happens, the current output position Since the lengths of all input sequences A1 , . . . , Ak are coincides with a position of one of the current input positive integer multiples of s, there is always a block elements, and the computation starts in a very special boundary between the last element of Ai and the first mode of Section 4.9. element of Ai+1 , for each i ∈ 1, . . . , k−1. Similarly, the buffer elements, positioned in the small additional 2 Picking simply the leftmost blocks in the sequences area at the very end, form the last k blocks. A1 , . . . , Ak would do no harm. In addition, this would Initially, the last k blocks will be used as free blocks, not require any initial element comparisons. However, their starting positions are stored in a free block stack we are presenting the algorithm in a form that is suitable of height k. After that, the position of one free block for application in the general case, comparing elements is picked out of the stack and this block is used as in accordance with the strategy presented in Section 3. 4.2 Standard situation Step B. The smallest input element xj not yet pro- cessed is moved from the position ij to its final The standard situation is illustrated by Fig. 1. Dur- position at oc . ing the computation, the k·s buffer elements can be Step C. A new hole is created at the position ec +1 by found at the following locations: to the left of the j- moving its buffer element to the place released by th input element xj in the j-th input block Xj , for the smallest input element just moved. After that, j ∈ {1, . . . , k}, to the right of ec in the escape block, all necessary index variables are incremented and with the hole at the position ec , and also in free blocks, the selection tree is updated. consisting of buffer elements only. The elements merged already, from all the input This gives 3 moves and dlog ke comparisons per each blocks, form a contiguous output zone at the very be- element transported to its final location. Now there ginning of A, ending at position oc −1. Hence, the next are various special cases that should be detected and element to be output will go to the position oc in the handled with care. All exceptions are checked up on output block. after the execution of Step B, in the order of their All elements not merged yet are scattered in blocks appearance, unless stated otherwise. Most of the ex- between the output zone and the end of the array A. ception handling routines replace Step C by a different The permutation of these blocks is allowed, however, action. elements to be merged keep their relative positions within each block. On the other hand, the origin of the blocks in the subsequences A1 , . . . , Ak cannot be 4.3 Escape block becomes full recovered. So optional free blocks, input blocks, es- cape block, and common blocks can reside anywhere If the rightmost element of the output block is moved between the output zone and the end of the array A. to the last position of the escape block, the new hole cannot be created at the position ec +1 in Step C. Instead, one free block at the top of the stack becomes the new escape block and a new hole is created at k-th input block the beginning of this block. This is accomplished by i-th input block common block common block j-th input block common block common block common block escape block output block removing its starting position from the free block stack free block and assigning it to ec . The subsequent move of the buffer element from the new position of ec to the place released by the output smallest input element does not increase the number zone ... of moves; it replaces the move in Step C. The selection oc xi xj ec xk tree is updated in the standard way. It should be pointed out that, at this moment, there does exist at least one free block in the stack. As- Fig. 1. Standard situation sume, for example, that the j-th input element xj has just been transported to the output zone. After that, The output block spans across the current output we have rj ∈ {1, . . . , s} buffer elements in the j-th in- position oc , so its left part belongs to the output zone. put block Xj , including the hole, but rh ∈ {0, . . . , s−1} As the output grows to the right, the elements lying buffer elements in other input blocks Xh , for each to the right of oc are moved from the output block to h ∈ {1, . . . , k}, h 6= j, since each input block, except the corresponding positions in the escape block, i.e., for Xj , contains at least one input element. Moreover, to the right of ec . The positions of oc and ec are syn- the escape block is full, and hence it does not contain chronized, i.e., we have always oc mod s = ec mod s. any buffer elements at all. Assuming there is no free Hence, the relative positions of escaping elements are block available, this gives at most s + (k−1)·(s−1) < preserved within the blocks. Moreover, oc and ec reach k · s buffer elements in total. But this is a contradic- their respective block boundaries at the same time. tion, since the number of buffer elements, including the hole, is always equal to k·s. Now we are ready for merging. Using the selection tree, we determine xj , the smallest element among the k current input elements in the blocks X1 , . . . , Xk , and 4.4 Current input block becomes empty move this element to the output zone as follows: We check next whether the smallest element xj , just Step A. The element at the position oc in the output moved from the position ij to the output zone, was block escapes to the hole at the position ec . the last element of the corresponding input block Xj . If so, we have an entire block consisting of buffer el- 4.5 One of the input blocks overlays the ements only, with hole at the end after Step B. This escape block hole is filled in the standard way, described in Step C, If the common block that should be processed next but the old input block Xj becomes a free block and is the logical block composed of the left part of the its starting position is saved in the stack. Since we escape block and the right part of the output block, have k·s buffer elements in total, a stack of height k then both the new current input block Xj and the es- is sufficient. cape block are located within the same physical block. Here xj is always positioned to the left of ec and the Next, we have to find a new j-th input block Xj , buffer elements are both to the left of xj and to the and assign a new value to ij . Since the blocks are right of ec . mixed up, we scan sequentially the remaining com- Once the position of xj is properly initiated, all mon blocks to determine which common block should actions are performed in the standard way described in become the new j-th current input block. The smallest Section 4.2. That is, the elements are transported from common block, according to the block ordering intro- the output block to the position of ec , from the input duced in Section 3, is the next one to be processed. blocks to the position of oc , and buffer elements from As already shown in Section 3, the elements are trans- ec +1 to locations released in the input blocks. Since ec ported to the output zone in sorted order even though moves to the right “faster” than does ij , this special this strategy does not necessarily pick up the j-th in- case returns automatically to the standard mode as put block from the j-th input sequence Aj . soon as ec reaches a block boundary. Then the escape Free blocks, as well as all remaining current input block separates from the current input block Xj as blocks, are ignored in this scanning. Moreover, the el- described in Section 4.3. ements to the left of ec in the escape block (if not empty) together with the elements to the right of oc in 4.6 Output block overlays the escape block the output block are viewed as a single logical block. In a practical implementation, we can start with the left- Next we check whether the output zone, crossing a most escape-block element and the rightmost output- block boundary, does not bump into any “special” block element as a starting key and search the rest block. It is easy to see that this may happen only if of the array for a common block with a smaller key.3 ec points to the beginning of the escape block that is If the logical block composed of the left part of the empty, using the fact that the positions of oc and ec escape block and the right part of the output block are synchronized and that the special handling of Sec- should be processed next, the program control will be tion 4.3 is performed first. switched to the mode described in Section 4.5. Now consider that the output block overlays the escape block, i.e., they are both located within the If the escape block is empty, then both ec and oc same physical block. In this mode, we always have point to the beginning of their respective blocks. Then o = e . The element movement corresponds now to a c c the escape block is skipped and the output block is more efficient scheme: handled as a common block, so we may even find out that the new input block should be located at the same Step B’. The smallest input element xj not yet pro- position as the output block. This special mode is ex- cessed is moved to the hole at the position oc = ec . plained in Section 4.9. Step C’. A new hole is created at the position oc+1 = The search for new input blocks costs O((n/s)2 ) ec + 1 by moving its buffer element to the place additional comparisons: there are O(n/s) blocks in released by xj . Then all necessary index variables total and such search is activated only if one of the are incremented and the selection tree is updated. input blocks is exhausted, i.e., at most O(n/s) times. Step A is eliminated, since oc = ec . This mode is ter- The same upper bound holds for arithmetic operations minated as soon as oc and ec reach a block boundary. with indexes as well. We also need a slightly modified version of the routine described in Section 4.4. If one of the input blocks be- 3 comes empty, it becomes free as usual, but the com- It is quite straightforward to detect whether a block be- bined output/escape block is skipped in the search for ginning at a given position ` is common: the value of ` must not be saved in the free block stack, and b`/sc must the next input block. be different from bi1 /sc, . . . , bik /sc (excluding bij /sc), and also from bec /sc. For each given block, this can 4.7 Output block overlays a free block be verified in O(k) ≤ O(1) time, performing auxiliary arithmetic operations with indexes only, but no element If the output zone crosses a block boundary and the comparisons or moves. value of oc is equal to some f` , the leftmost position of a block stored in the free block stack, the new output tion 4.4, the combined output/escape block is not block and the corresponding free block are overlaid. disposed of as free, moreover, it is skipped out dur- This can be verified in O(k) ≤ O(1) time. By the same ing the search. The program control is switched to argument as in Section 4.6, we have that ec must point the mode of Section 4.6 as the output and escape to the beginning of an empty escape block. blocks are still overlaid. Therefore, we can easily swap the free block with (2) Let us now consider that this combined block be- the escape block by swapping the pointers stored in comes full. This may happen only if, for some f` and ec , since both these blocks contain buffer ele- h 6= j, an element xh from another input block Xh ments only. Second, one move suffices to transport the is moved to the output zone and, after Step B’, the hole from one block to another. Note that this element output position oc “bumps” into xj . In this case, move is for free, we actually save some moves because we take one free block from the top of the stack the next s transports to the output zone will require and change it into a new escape block. We defi- only 2s moves, instead of 3s as in the standard case. nitely have at least one free block available, since Thus, the program control is switched to the mode we disposed one block as free at the very beginning described in Section 4.6. of this mode. The hole, located in Xh at the posi- tion of the last element transported to the output, 4.8 Output block overlays a current input jumps to a position ec in the new escape block, block so that ec mod s = oc mod s. This move replaces Step C’ for the last element just merged. Hence, it If the output position oc points to some Xj after cross- does not increase the total number of moves. Then ing a block boundary, the output block overlays the j- we follow the instructions of Section 4.9. th input block Xj . Again, by the argument presented in Section 4.6, oc can point to the beginning of an input block only if ec points to the beginning of an 4.9 Output zone bumps into a current input empty escape block. There are now two cases to con- element sider. First, if the j-th current input element xj is the The program control can jump to this special mode leftmost element of Xj , the program control is switch- from several different places (Sections 4.1, 4.4, and two ed immediately to the special mode to be described in different places in Section 4.8). In any case, we have an Section 4.9. empty escape block, containing the hole and buffer ele- Second, if xj is not the leftmost element of Xj , we ments only. The output block and a block Xj , which is dispose of the empty escape block as free by storing one of the input blocks, are overlaid. Moreover, there its starting position ec in the stack, create a hole at oc is no room in between, the output position oc is point- by moving a single buffer element from the position oc ing to the current input element xj . The position of to ec , and overlay the output block by a new escape hole in the escape block is synchronized with oc , i.e., block, by assigning the value of oc to ec . The additional we have ec mod s = oc mod s. transportation of the hole is for free, not increasing the As long as the elements to be output are selected in total number of moves, because we can charge it as the input block Xj , they can be moved to the output (nonexistent) Step A for the next element that will be zone. This needs no actual transportation, just the transported to the output zone. Since xj is not placed positions of oc and ij are moved synchronously to the at the beginning of the block, we can guarantee that right. To keep ec synchronized with oc , we move the at least one transport to the output will use only two hole along the escape block in parallel, which gives us moves in the next future. one move per element. There are two ways out of this This special mode can be viewed as if three blocks loop. were overlaid, namely, the output, escape, and the cur- (1) If oc and ij reach the block boundary, we sim- rent input block Xj . The buffer elements are between ply search for the next input block to be pro- the hole at ec = oc and the current input element xj . cessed; the current configuration is the same as The elements are moved according to Step B’ and if, in the standard mode, oc , ec , and ij reached Step C’ of Section 4.6. However, there is a different the block boundaries at the same time (with the exception handling here. old input block Xj disposed of as free, by Sec- (1) If the rightmost input element of this combined tion 4.4). Thus, unless something “exceptional” block has been transported to the output zone, happens, the program control returns to the stan- then the input block Xj separates from the out- dard mode. (The possible exceptions are those dis- put/escape block, since we search for the next in- cussed in Sections 4.6–4.8, and 4.10.) The single put block to be processed. But here, unlike in Sec- move required to place the hole back to the begin- ning of the escape block is for free, it substitutes sists of the subsequences A1 , . . . , Ak merged into a sin- Step C for the last element merged. gle sorted sequence, followed by a sorted sequence of (2) If the element to be transported to the output zone buffer elements. is an element xh from another input block Xh , for some h 6= j, some rearrangements are neces- 4.11 Summary sary. Recall that the hole position ec in the es- cape block is synchronized with oc , i.e., we have Summing up the costs paid for maintaining the se- ec mod s = oc mod s. First, the input element xj lection tree, transporting the elements to the output is moved from position oc to position ec . Now we zone, searching for smallest input blocks, and for sort- can transport xh to the output position oc . Fi- ing the residual zone, it is easy to see that the above nally, a new hole is created4 at the position ec +1 algorithm uses dlog ke·n + O((n/s)2 ) + O(s·log s) el- by moving its buffer element to the place released ement comparisons and 3 · n + O(s · log s) moves. For by xh . s = dn2/3 /(log n)1/3 e, this gives an algorithm with The result is that the current input block Xj , over- dlog ke · n + O((n · log n)2/3 ) comparisons and 3 · n + laid by the output block, jumps and overlays the O((n·log n)2/3 ) moves. escape block. Thus, the control is switched to the mode of Section 4.5. Clearly, this rearrangement needs only three mo- 5 Conclusion ves. Since one more element has been transported to the output zone, the number of moves is the In this paper we have shown that k-way blockwise same as in the standard case. in-place merging can be accomplished efficiently with almost optimal number of element comparisons and moves. Moreover, the number of element moves is in- 4.10 Common blocks are exhausted dependent on k, the number of input sequences. Note that this algorithm does not merge stably, that is, the If one of the current input blocks becomes empty, but relative order of equal elements may not be preserved. there is no common block to become a new input Whether there exist a stable multiway blockwise in- block, the above procedure is stopped. At this point, place merging algorithm is left as an open problem. the output zone, consisting of the elements merged al- We conjecture that, using the algorithm described ready in their final locations, is followed by a residual here as a subroutine, it is possible to devise an asymp- zone of size n0 starting at the position oc . This zone totically efficient multiway in-place merging algorithm. consists of the right part of the output block, k−1 un- We dare to formulate this conjecture since the work on merged input blocks, at most k free blocks, and one such algorithm is currently in progress. escape block. Thus, the total length of this residual zone is n0 ≤ s + (k−1)·s + k·s + s = (2k+1)·s. The residual zone can be sorted by the use of Heap- References sort (including also the buffer element put aside at the very beginning of the computation, to create a hole). 1. Katajainen, J., Pasanen, T.: In-Place Sorting with This will cost only O(k·s·log(k·s)) ≤ O(s·log s) com- Fewer Moves. Inform. Process. Lett. 70 (1999) 31–37 parisons and the same number of moves [5–9]. Alterna- 2. Katajainen, J., Pasanen, T., Teuhola, J.: Practical In- tively, we could also use an algorithm sorting in-place Place Mergesort. Nordic J. Comput. 3 (1996) 27–40 with O(s·log s) comparisons but only O(s) moves [10]. 3. Katajainen, J., Träff, J. L.: A Meticulous Analysis of Now we are done: the buffer elements are greater Mergesort Programs. Lect. Notes Comput. Sci. 1203 than any other element, and hence the array now con- (1997) 217–28 4. Geffert, V., Katajainen, J., Pasanen, T.: Asymptotically 4 Unless the position ec +1 itself is across the block bound- Efficient In-Place Merging. Theoret. Comput. Sci. 237 ary. If xj is moved to the rightmost position in the escape (2000) 159–81 block, the escape block jumps immediately and one free 5. Carlsson, S.: A Note on Heapsort. Comput. J. 35 (1992) block becomes a new escape block. This nested excep- 410–11 tion thus returns the algorithm to the standard mode; all 6. Knuth, D. E.: The Art of Computer Programming, “special” blocks now reside in pairwise disjoint regions. Vol. 3: Sorting and Searching. Addison-Wesley, Second However, we jump to the point where the standard rou- edition (1998) tine checks the exceptions of Sections 4.4–4.10. Among 7. Schaffer, R., Sedgewick, R.: The Analysis of Heapsort. others, we have to check whether the input block Xh J. Algorithms 15 (1993) 76–100 has not become empty, or if the output zone, just cross- 8. Wegener, I.: Bottom-Up-Heapsort, a New Variant of ing a block boundary, has not bumped into any other Heapsort Beating, on an Average, Quicksort (If n Is Not “special” block again. Very Small). Theoret. Comput. Sci. 118 (1993) 81–98 9. Williams, J. W. J.: Heapsort (Algorithm 232). Comm. Appendix for the Program Committee Assoc. Comput. Mach. 7 (1964) 347–48 10. Franceschini, G., Geffert, V.: An In-Place Sorting with In this appendix, we give a proof of correctness of the O(n·log n) Comparisons and O(n) Moves. J. Assoc. blockwise merging algorithm described in Section 3. Comput. Mach. 52 (2005) 515–37 Recall that the algorithm starts with the smallest k blocks of the array A and, each time one of the k in- put blocks becomes empty, the smallest common block not yet processed becomes the new input block, not taking into account its origin in one of the sequences A1 , . . . , Ak . Let us assume, for contradiction, that an element xC has just been transported to the output zone, but there still exists an unprocessed element y, such that y < xC . Clearly, the element xC was a current input element in some current input block. Originally, this input block was represented in the form X = hxL , xR i, which gave its relative block order. Here xL repre- sents the original leftmost element of X (transported to the output even earlier than xC ) and xR the right- most element (still residing in X). Since all blocks are sorted, we have that xL ≤ xC ≤ xR (not excluding the possibility that xC coincides with xL or xR ). Simi- larly, the element y resides in a block characterized by Y = hyL , yR i, with yL ≤ y ≤ yR . (The current status of the original leftmost element yL is not known: it can still reside in Y but, potentially, it may be a part of the output zone already.) Now there are the following possibilities to consider: (a) The block Y is one of the current input blocks (not excluding the possibility that Y coincides with X). This case is straightforward. Let yC denotes the cur- rent input element in the input block Y . Clearly, yC ≤ y, since y has not been processed yet. But then yC ≤ y < xC , that is, yC < xC , which is a contradiction, since xC has just been moved to the output, and hence determined to be the smallest one among all current input elements. (b) The block Y is one of the common blocks (not yet being processed) and, at the present moment, all k current input blocks have their origin in different input sequences A1 , . . . , Ak . This does not necessarily mean that the i-th input block Xi originated from Ai , since it could have its origin in a different sequence. Nevertheless, this does imply that one current input block Z ∈ {X1 , . . . , Xk } originated from the same se- quence A` as did the block Y . Let Z = hzL , zR i represents the original character- ization of this current input block, and let zC be its current input element. Clearly, zL ≤ zC ≤ zR . Using the fact that the block Z has been selected as an in- put block prior to selecting the block Y = hyL , yR i, and that these two blocks originated from the same sorted sequence A` , we have that zR ≤ yL . (For, if zR > yL , the sequence A` could not be sorted.) But this gives that zC ≤ zR ≤ yL ≤ y < xC . Therefore, zC < xC , which is a contradiction, since the element xC has been determined to be the smallest one among all current input elements. (c) Finally, let Y be one of the common blocks and, at the present moment, at least two current in- put blocks have their origin in the same input se- quence. That is, we have V, W ∈ {X1 , . . . , Xk }, com- ing from A` , for some ` ∈ {1, . . . , k}. Let the respective characterizations of these two blocks be V = hvL , vR i and W = hwL , wR i, their respective current input ele- ments be vC and wC . Clearly, vL ≤ vC ≤ vR and wL ≤ wC ≤ wR . Without loss of generality, we can also as- sume that the block V was smaller than or equal to W . But then vR ≤ wL , since these two blocks came from the same sorted sequence A` . Moreover, at the present moment, the element xC has just been selected as the smallest one from among all current input elements, and hence we also have that xC ≤ vC . Putting these facts together, we get yL ≤ y < xC ≤ vC ≤ vR ≤ wL . Therefore, yL < wL , which contradicts the fact that the block W had been selected as an input block prior to selecting the block Y . Summing up, all cases have led us to contradic- tions, and hence the elements are always transported to the output zone in sorted order.5 5 Notice that the algorithm would behave correctly even if the number of used current input blocks exceeded k, the number of original input sequences. This only eliminates Case (b) in the proof of correctness.