=Paper= {{Paper |id=Vol-414/paper-3 |storemode=property |title=Multiway blockwise in-place merging |pdfUrl=https://ceur-ws.org/Vol-414/paper3.pdf |volume=Vol-414 |dblpUrl=https://dblp.org/rec/conf/itat/GeffertG08 }} ==Multiway blockwise in-place merging== https://ceur-ws.org/Vol-414/paper3.pdf

Multiway Blockwise In-place Merging

Viliam Geffert and Jozef Gajdoš

Institute of Computer Science, P.J.Šafárik University, Faculty of Science
Jesenná 5, 041 54 Košice, Slovak Republic
viliam.geffert@upjs.sk, jozef.gajdos@upjs.sk

Abstract. We present an algorithm for asymptotically ef- array A. Moreover, the lengths n1 , . . . , nk of input se-
ficient multiway blockwise in-place merging. Given an ar- quences are positive integer multiples of s, and hence,
ray A containing sorted subsequences Pk A1 , . . . , Ak of respec-
there is always a block boundary between the last el-
tive lengths n1 , . . . , nk , where n = n, we assume
i=1 i ement of Ai and the first element of Ai+1 , for each
that extra k·s elements (so called buffer elements) are po- i ∈ 1, . . . , k−1. We shall also assume, that before the
sitioned at the very end of array A, and that the lengths
merging starts, blocks can be mixed up quite arbitrar-
n1 , . . . , nk are positive integer multiples of some param-
ily, so we no longer know the original membership of
eter s (i.e., multiples of a given block of length s). The
number of input sequences k is a fixed constant parameter, blocks in the input sequences A1 , . . . , Ak .
not dependent on the lengths of input sequences. Then our So far, the problem has been resolved for two-way
algorithm merges the subsequences A1 , . . . , Ak into a single merging, i.e., for k = 2 [4]. This algorithm uses 2n +
sorted sequence, performing Θ(log k·n) + O((n/s)2 ) + O(s· o(n) comparisons, 3n + o(n) element moves and O(1)
log s) element comparisons and 3 · n + O(s · log s) element extra locations for storing elements, in the worst case.
moves.1 Thus, by repeated application of this algorithm, we
Then, for s = dn2/3 /(log n)1/3 e, this gives an algorithm could carry out k-way merging in linear time, for ar-
performing Θ(log k ·n) + O((n·log n)2/3 ) comparisons and bitrary k ≥ 2. However, implemented this way, the k-
3·n + O((n·log n)2/3 ) moves. That is, our algorithm runs way merging would perform 3·dlog ke·n+o(n) element
in linear time, with an asymptotically optimal number of moves and 2·dlog ke·n+o(n) element comparisons. We
comparisons and with the number of moves independent
shall show that the number of moves does not depend
on the number of input sequences. Moreover, our algo-
rithm is “almost in-place”, it requires only k extra blocks
on k, if the lengths n1 , . . . , nk are integer multiples
of size s = o(n). of the block size s. Namely, using the algorithm of
Geffert et. al [4] as our starting point, we show that
multiway blockwise in-place merging is possible with
1 Introduction dlog ke·n + O((n/s)2 ) + O(s·log s) element comparisons
and 3·n + O(s·log s) moves. For s = dn2/3 /(log n)1/3 e,
this gives an algorithm with dlog ke·n + O((n·log n)2/3 )
Given an array A[1..n] consisting of sorted subsequen-
comparisons and 3·n + O((n·log n)2/3 ) moves, and the
ces A1 , . . . , Ak each
Pcontaining n1 , . . . , nk elements re-
k number of element moves independent on the number
spectively, where i=1 ni = n, the classical multiway
of input sequences. (It is also easy to show that the
in-place merging problem is to rearrange these ele-
number of comparisons cannot be improved.)
ments to form a single sorted sequence of n elements,
assuming that only one extra storage location (in ad-
dition to the array A) is available for storing elements. 2 Comparisons in a simple multiway
To store array indices, counters, etc. only O(1) stor- merging
age locations are available. The efficiency of a merg-
ing algorithm is given by two quantities: the number To explain how elements are compared, we first solve
of pairwise element comparisons and the number of a simpler task. Assume that we are given an array A,
element moves carried out in the worst case, both ex- consisting of k sorted subsequences A1 , A2 , . . . , Ak ,
pressed as a function of n. In merging, these are the that are to be merged into a single sorted sequence.
only operations permitted for elements. The lengths of these P subsequences are n1 , n2 , . . . , nk
In this paper we study the computational complex- respectively, with k ni = n.
i=1
ity of the multiway blockwise in-place merging prob- Assume also that, together with the given array A,
lem. More precisely, we assume that the entire array A we are also given an extra array B of the same size n,
is divided into blocks of equal size s, and that k ex- which will be used as an output zone.
tra blocks of size s are positioned at the very end of
During the computation, the algorithm uses aux-
1
Throughout the paper, log x denotes the binary loga- iliary index variables i1 , . . . , ik and oc , where ij , for
rithm of x. j ∈ {1, . . . , k}, points to the smallest element of the
sequence Aj not yet processed. This element will be Before passing further, we define the following rel-
called the current input element of the j-th sequence, ative order of blocks in the array A. Let X be a block
or simply the j-th input element. The index oc points with the leftmost and the rightmost elements denoted
to the leftmost empty position in the array B. by xL and xR , respectively. Such block can be repre-
Then the straightforward implementation of the sented in the form X = hxL , xR i. Similarly, let Y =
merge routine proceeds as follows. We find the small- hyL , yR i be an another block. We say that the block X
est element not yet processed, by comparing elements is smaller than or equal to Y , if xL < yL , or xL = yL
at the positions i1 , . . . , ik , and move this element to and xR ≤ yR . Otherwise, X is greater than Y . In other
the output zone in B. After that, we update the nec- words, the blocks are ordered according to their left-
essary index variables and repeat the process until all most elements and, in the case of equal leftmost el-
the elements have been merged. Implemented this way, ements, the elements at the rightmost positions are
each element will be moved just once and the number used as the second order criterion.
of comparisons, per each element, will be k −1. This
gives us (k−1)·n comparisons and n element moves in
total. Now the modified merging algorithm proceeds as
follows. First, using the above block ordering, find the
The number of comparisons can be reduced by im- smallest k blocks in the array A. These blocks will ini-
plementing a selection tree of depth dlog ke above the tially become the k current input blocks, their leftmost
k current input elements. Initially, to build a selec- elements becoming the k current input elements. The
tion tree, k − 1 comparisons are required. Then the j-th current input block will be denoted by Xj , simi-
smallest element, not yet processed, can be moved to larly, the j-th current input element by xj . The posi-
the output zone. After this, the element following the tions of current input elements are kept in index vari-
smallest element in the same subsequence is inserted ables i1 , . . . , ik . Above the k current input elements,
in the tree and the selection tree is updated. To do we build a selection tree. All blocks that are not input
this, only dlog ke comparisons are needed. To avoid el- blocks are called common blocks.
ement moves, only pointers to elements are stored in
After that, the merging process can proceed in the
the selection tree. (For more details concerning this
same way as described in Section 2. That is, using
data structure, see [1–3].) The number of moves re-
the selection tree, determine ij , the position of the
mains unchanged, but now we have k − 1 compar-
smallest input element not yet processed, among the
isons for the first element and only dlog ke compar-
k current input elements, and move this element to
isons per each other element. This gives us a total of
the output zone in the array B. Then the element po-
(k−1) + dlog ke·(n−1) ≤ dlog ke·n + O(1) comparisons.
sitioned immediately on the right of xj , within the
same block Xj , becomes a new j-th current input el-
3 Comparisons in a blockwise merging ement, its index pointer is inserted into the selection
tree, and the tree is updated, with dlog ke compar-
This section describes one of the cardinal tricks used in isons. This can be repeated until one of the current
our algorithm. Again, we are given the array A con- input blocks becomes empty.
sisting of the sorted subsequences A1 , . . . , Ak , to be When this happens, i.e., each time the element xj ,
merged together. We still have the extra array B, used just moved to the output zone, was the last (right-
as an output zone. most) element in the corresponding input block Xj ,
However, now the entire array A is divided into the block Xj is “discarded” and the smallest (accord-
blocks of equal size s (the exact value of s will be ing to our relative block ordering) common block not
determined later, so that the number of comparisons yet processed will be used as the new j-th current
and moves is minimized) and, before the merging can input block. The leftmost element in this block will
start, these blocks are mixed up quite arbitrarily. Be- become the new j-th current input element. Since the
cause of the permutation of blocks in A, we no longer blocks are mixed up in the array A, we need to scan
know the original membership of blocks in the input sequentially all blocks (actually, all blocks not yet pro-
sequences A1 , . . . , Ak . cessed only) to determine which one of them is the
Still, the relative order of elements inside individ- smallest. This search for a new input block consumes
ual blocks is preserved. Moreover, we shall also assume O((n/s)2 ) additional comparisons: there are at most
that n1 , . . . , nk , the respective lengths of input se- n/s blocks and such search is activated only if one of
quences, are positive integer multiples of s, and hence, the input blocks has been discarded as empty, i.e., at
before mixing the blocks up, there was always a block most n/s times. (For the time being, just assume that
boundary between the last element of Ai and the first we can distinguish discarded blocks from those not yet
element of Ai+1 , for each i ∈ 1, . . . , k−1. processed, at no extra cost.)
However, before merging, the blocks have been mi- a so-called escape block. We also maintain a current
xed up quite arbitrarily and hence their origin in the escape position ec , which is initially the position of
input subsequences A1 , . . . , Ak cannot be recovered. the first (leftmost) element in the escape block. We
The proof that the above algorithm behaves correctly, create a hole here by putting the buffer element at
that is, the elements are transported to the output this position aside.
zone in sorted order, will be published in the full ver- Now, find the smallest k blocks2 in the area oc-
sion of the paper. (** The proof can also be found in cupied by A1 , . . . , Ak , according to the relative block
the Appendix for the Program Committee. **) ordering defined in Section 3. This can be done with
The number of element moves remains unchanged, O(k 2 ) ≤ O(1) comparisons, by the use of some k cur-
but now we use dlog ke · n + O((n/s)2 ) comparisons, sors (index variables) moving along in A, since each
under assumption that we can distinguish discarded of the sequences A1 , . . . , Ak is sorted. The smallest k
blocks from those not yet processed at no extra cost. blocks will initially become the k current input blocks
X1 , . . . , Xk . For each j = 1, . . . , k, the first element xj
in the block Xj becomes a j-th current input element,
4 In-place merging, simplified case and its position is kept in the index variable ij . Above
the k input elements, we build a selection tree of depth
Now we shall convert the above merging algorithm dlog ke. To do that, k −1 ≤ O(1) initial comparisons
into a procedure working “almost” in-place. More pre- are needed.
cisely, we are again given the array A containing the The very first block of the array A becomes an
sorted subsequences Pk A1 , . . . , Ak , of respective lengths output block and a position oc = 1 pointing there
n1 , . . . , nk , with i=1 ni = n. All these lengths are becomes a current output position. The initial out-
positive integer multiples of the given parameter s. put position may — quite likely — coincide with a po-
However, we no longer have a separate array B of sition of some current input element. Observe that
size n. Instead, we have some extra k·s elements po- ec mod s = oc mod s, which is an invariant we shall
sitioned at the very end of the array A, behind Ak . keep in the course of the entire computation. All other
The elements in this small additional area are greater blocks are called common blocks.
than any of the elements in A1 , . . . , Ak . During the
computation, they can be mixed with other elements,
but their original contents cannot be destroyed. These In general, the algorithm maintains current posi-
elements will be called buffer elements. To let the el- tions of the following special blocks: free blocks, the
ements ever move, we have also one extra location number of which ranges between 0 and k, their left-
where we can put a single element aside. most positions are stored in the free block stack; ex-
The sorted output should be formed within the actly k input blocks, the current input positions inside
same array A, in the locations occupied by the input the respective blocks are stored in the index variables
sequences A1 , . . . , Ak . (As a consequence, the buffer el- i1 , . . . , ik ; one output block with the current output
ements should also end up in their original locations.) position oc inside this block; and one escape block with
Therefore, the moves are performed in a different way, the current escape position ec inside. The values of oc
based on the idea of internal buffering, used in a two- and ec are synchronized modulo s.
way in-place merging [4]. Nevertheless, the compar- Usually, the optional free blocks, the k current in-
isons are performed in the same way as described in put blocks, the output block, and the escape block are
Section 3. all disjoint, and the merging proceeds as described in
Section 4.2. However, after the initiation, the output
block may overlay one of the current input blocks, if
4.1 Initiation
the leftmost block in A1 has been selected as an in-
Divide the entire array A into blocks of equal size s. put block. If this happens, the current output position
Since the lengths of all input sequences A1 , . . . , Ak are coincides with a position of one of the current input
positive integer multiples of s, there is always a block elements, and the computation starts in a very special
boundary between the last element of Ai and the first mode of Section 4.9.
element of Ai+1 , for each i ∈ 1, . . . , k−1. Similarly,
the buffer elements, positioned in the small additional 2
Picking simply the leftmost blocks in the sequences
area at the very end, form the last k blocks. A1 , . . . , Ak would do no harm. In addition, this would
Initially, the last k blocks will be used as free blocks, not require any initial element comparisons. However,
their starting positions are stored in a free block stack we are presenting the algorithm in a form that is suitable
of height k. After that, the position of one free block for application in the general case, comparing elements
is picked out of the stack and this block is used as in accordance with the strategy presented in Section 3.
4.2 Standard situation Step B. The smallest input element xj not yet pro-
cessed is moved from the position ij to its final
The standard situation is illustrated by Fig. 1. Dur- position at oc .
ing the computation, the k·s buffer elements can be Step C. A new hole is created at the position ec +1 by
found at the following locations: to the left of the j- moving its buffer element to the place released by
th input element xj in the j-th input block Xj , for the smallest input element just moved. After that,
j ∈ {1, . . . , k}, to the right of ec in the escape block, all necessary index variables are incremented and
with the hole at the position ec , and also in free blocks, the selection tree is updated.
consisting of buffer elements only.
The elements merged already, from all the input This gives 3 moves and dlog ke comparisons per each
blocks, form a contiguous output zone at the very be- element transported to its final location. Now there
ginning of A, ending at position oc −1. Hence, the next are various special cases that should be detected and
element to be output will go to the position oc in the handled with care. All exceptions are checked up on
output block. after the execution of Step B, in the order of their
All elements not merged yet are scattered in blocks appearance, unless stated otherwise. Most of the ex-
between the output zone and the end of the array A. ception handling routines replace Step C by a different
The permutation of these blocks is allowed, however, action.
elements to be merged keep their relative positions
within each block. On the other hand, the origin of
the blocks in the subsequences A1 , . . . , Ak cannot be 4.3 Escape block becomes full
recovered. So optional free blocks, input blocks, es-
cape block, and common blocks can reside anywhere If the rightmost element of the output block is moved
between the output zone and the end of the array A. to the last position of the escape block, the new hole
cannot be created at the position ec +1 in Step C.
Instead, one free block at the top of the stack becomes
the new escape block and a new hole is created at
k-th input block

the beginning of this block. This is accomplished by
i-th input block
common block

common block

j-th input block

common block

common block
escape block
output block

removing its starting position from the free block stack
free block

and assigning it to ec .
The subsequent move of the buffer element from
the new position of ec to the place released by the
output
smallest input element does not increase the number
zone ...
of moves; it replaces the move in Step C. The selection
oc xi xj ec xk tree is updated in the standard way.
It should be pointed out that, at this moment,
there does exist at least one free block in the stack. As-
Fig. 1. Standard situation
sume, for example, that the j-th input element xj has
just been transported to the output zone. After that,
The output block spans across the current output we have rj ∈ {1, . . . , s} buffer elements in the j-th in-
position oc , so its left part belongs to the output zone. put block Xj , including the hole, but rh ∈ {0, . . . , s−1}
As the output grows to the right, the elements lying buffer elements in other input blocks Xh , for each
to the right of oc are moved from the output block to h ∈ {1, . . . , k}, h 6= j, since each input block, except
the corresponding positions in the escape block, i.e., for Xj , contains at least one input element. Moreover,
to the right of ec . The positions of oc and ec are syn- the escape block is full, and hence it does not contain
chronized, i.e., we have always oc mod s = ec mod s. any buffer elements at all. Assuming there is no free
Hence, the relative positions of escaping elements are block available, this gives at most s + (k−1)·(s−1) <
preserved within the blocks. Moreover, oc and ec reach k · s buffer elements in total. But this is a contradic-
their respective block boundaries at the same time. tion, since the number of buffer elements, including
the hole, is always equal to k·s.
Now we are ready for merging. Using the selection
tree, we determine xj , the smallest element among the
k current input elements in the blocks X1 , . . . , Xk , and 4.4 Current input block becomes empty
move this element to the output zone as follows:
We check next whether the smallest element xj , just
Step A. The element at the position oc in the output moved from the position ij to the output zone, was
block escapes to the hole at the position ec . the last element of the corresponding input block Xj .
If so, we have an entire block consisting of buffer el- 4.5 One of the input blocks overlays the
ements only, with hole at the end after Step B. This escape block
hole is filled in the standard way, described in Step C,
If the common block that should be processed next
but the old input block Xj becomes a free block and
is the logical block composed of the left part of the
its starting position is saved in the stack. Since we
escape block and the right part of the output block,
have k·s buffer elements in total, a stack of height k
then both the new current input block Xj and the es-
is sufficient.
cape block are located within the same physical block.
Here xj is always positioned to the left of ec and the
Next, we have to find a new j-th input block Xj , buffer elements are both to the left of xj and to the
and assign a new value to ij . Since the blocks are right of ec .
mixed up, we scan sequentially the remaining com- Once the position of xj is properly initiated, all
mon blocks to determine which common block should actions are performed in the standard way described in
become the new j-th current input block. The smallest Section 4.2. That is, the elements are transported from
common block, according to the block ordering intro- the output block to the position of ec , from the input
duced in Section 3, is the next one to be processed. blocks to the position of oc , and buffer elements from
As already shown in Section 3, the elements are trans- ec +1 to locations released in the input blocks. Since ec
ported to the output zone in sorted order even though moves to the right “faster” than does ij , this special
this strategy does not necessarily pick up the j-th in- case returns automatically to the standard mode as
put block from the j-th input sequence Aj . soon as ec reaches a block boundary. Then the escape
Free blocks, as well as all remaining current input block separates from the current input block Xj as
blocks, are ignored in this scanning. Moreover, the el- described in Section 4.3.
ements to the left of ec in the escape block (if not
empty) together with the elements to the right of oc in 4.6 Output block overlays the escape block
the output block are viewed as a single logical block. In
a practical implementation, we can start with the left- Next we check whether the output zone, crossing a
most escape-block element and the rightmost output- block boundary, does not bump into any “special”
block element as a starting key and search the rest block. It is easy to see that this may happen only if
of the array for a common block with a smaller key.3 ec points to the beginning of the escape block that is
If the logical block composed of the left part of the empty, using the fact that the positions of oc and ec
escape block and the right part of the output block are synchronized and that the special handling of Sec-
should be processed next, the program control will be tion 4.3 is performed first.
switched to the mode described in Section 4.5. Now consider that the output block overlays the
escape block, i.e., they are both located within the
If the escape block is empty, then both ec and oc same physical block. In this mode, we always have
point to the beginning of their respective blocks. Then o = e . The element movement corresponds now to a
c c
the escape block is skipped and the output block is more efficient scheme:
handled as a common block, so we may even find out
that the new input block should be located at the same Step B’. The smallest input element xj not yet pro-
position as the output block. This special mode is ex- cessed is moved to the hole at the position oc = ec .
plained in Section 4.9. Step C’. A new hole is created at the position oc+1 =
The search for new input blocks costs O((n/s)2 ) ec + 1 by moving its buffer element to the place
additional comparisons: there are O(n/s) blocks in released by xj . Then all necessary index variables
total and such search is activated only if one of the are incremented and the selection tree is updated.
input blocks is exhausted, i.e., at most O(n/s) times. Step A is eliminated, since oc = ec . This mode is ter-
The same upper bound holds for arithmetic operations minated as soon as oc and ec reach a block boundary.
with indexes as well. We also need a slightly modified version of the routine
described in Section 4.4. If one of the input blocks be-
3 comes empty, it becomes free as usual, but the com-
It is quite straightforward to detect whether a block be-
bined output/escape block is skipped in the search for
ginning at a given position ` is common: the value of `
must not be saved in the free block stack, and b`/sc must the next input block.
be different from bi1 /sc, . . . , bik /sc (excluding bij /sc),
and also from bec /sc. For each given block, this can 4.7 Output block overlays a free block
be verified in O(k) ≤ O(1) time, performing auxiliary
arithmetic operations with indexes only, but no element If the output zone crosses a block boundary and the
comparisons or moves. value of oc is equal to some f` , the leftmost position of
a block stored in the free block stack, the new output tion 4.4, the combined output/escape block is not
block and the corresponding free block are overlaid. disposed of as free, moreover, it is skipped out dur-
This can be verified in O(k) ≤ O(1) time. By the same ing the search. The program control is switched to
argument as in Section 4.6, we have that ec must point the mode of Section 4.6 as the output and escape
to the beginning of an empty escape block. blocks are still overlaid.
Therefore, we can easily swap the free block with (2) Let us now consider that this combined block be-
the escape block by swapping the pointers stored in comes full. This may happen only if, for some
f` and ec , since both these blocks contain buffer ele- h 6= j, an element xh from another input block Xh
ments only. Second, one move suffices to transport the is moved to the output zone and, after Step B’, the
hole from one block to another. Note that this element output position oc “bumps” into xj . In this case,
move is for free, we actually save some moves because we take one free block from the top of the stack
the next s transports to the output zone will require and change it into a new escape block. We defi-
only 2s moves, instead of 3s as in the standard case. nitely have at least one free block available, since
Thus, the program control is switched to the mode we disposed one block as free at the very beginning
described in Section 4.6. of this mode. The hole, located in Xh at the posi-
tion of the last element transported to the output,
4.8 Output block overlays a current input jumps to a position ec in the new escape block,
block so that ec mod s = oc mod s. This move replaces
Step C’ for the last element just merged. Hence, it
If the output position oc points to some Xj after cross- does not increase the total number of moves. Then
ing a block boundary, the output block overlays the j- we follow the instructions of Section 4.9.
th input block Xj . Again, by the argument presented
in Section 4.6, oc can point to the beginning of an
input block only if ec points to the beginning of an 4.9 Output zone bumps into a current input
empty escape block. There are now two cases to con- element
sider.
First, if the j-th current input element xj is the The program control can jump to this special mode
leftmost element of Xj , the program control is switch- from several different places (Sections 4.1, 4.4, and two
ed immediately to the special mode to be described in different places in Section 4.8). In any case, we have an
Section 4.9. empty escape block, containing the hole and buffer ele-
Second, if xj is not the leftmost element of Xj , we ments only. The output block and a block Xj , which is
dispose of the empty escape block as free by storing one of the input blocks, are overlaid. Moreover, there
its starting position ec in the stack, create a hole at oc is no room in between, the output position oc is point-
by moving a single buffer element from the position oc ing to the current input element xj . The position of
to ec , and overlay the output block by a new escape hole in the escape block is synchronized with oc , i.e.,
block, by assigning the value of oc to ec . The additional we have ec mod s = oc mod s.
transportation of the hole is for free, not increasing the As long as the elements to be output are selected in
total number of moves, because we can charge it as the input block Xj , they can be moved to the output
(nonexistent) Step A for the next element that will be zone. This needs no actual transportation, just the
transported to the output zone. Since xj is not placed positions of oc and ij are moved synchronously to the
at the beginning of the block, we can guarantee that right. To keep ec synchronized with oc , we move the
at least one transport to the output will use only two hole along the escape block in parallel, which gives us
moves in the next future. one move per element. There are two ways out of this
This special mode can be viewed as if three blocks loop.
were overlaid, namely, the output, escape, and the cur-
(1) If oc and ij reach the block boundary, we sim-
rent input block Xj . The buffer elements are between
ply search for the next input block to be pro-
the hole at ec = oc and the current input element xj .
cessed; the current configuration is the same as
The elements are moved according to Step B’ and
if, in the standard mode, oc , ec , and ij reached
Step C’ of Section 4.6. However, there is a different
the block boundaries at the same time (with the
exception handling here.
old input block Xj disposed of as free, by Sec-
(1) If the rightmost input element of this combined tion 4.4). Thus, unless something “exceptional”
block has been transported to the output zone, happens, the program control returns to the stan-
then the input block Xj separates from the out- dard mode. (The possible exceptions are those dis-
put/escape block, since we search for the next in- cussed in Sections 4.6–4.8, and 4.10.) The single
put block to be processed. But here, unlike in Sec- move required to place the hole back to the begin-
ning of the escape block is for free, it substitutes sists of the subsequences A1 , . . . , Ak merged into a sin-
Step C for the last element merged. gle sorted sequence, followed by a sorted sequence of
(2) If the element to be transported to the output zone buffer elements.
is an element xh from another input block Xh ,
for some h 6= j, some rearrangements are neces-
4.11 Summary
sary. Recall that the hole position ec in the es-
cape block is synchronized with oc , i.e., we have Summing up the costs paid for maintaining the se-
ec mod s = oc mod s. First, the input element xj lection tree, transporting the elements to the output
is moved from position oc to position ec . Now we zone, searching for smallest input blocks, and for sort-
can transport xh to the output position oc . Fi- ing the residual zone, it is easy to see that the above
nally, a new hole is created4 at the position ec +1 algorithm uses dlog ke·n + O((n/s)2 ) + O(s·log s) el-
by moving its buffer element to the place released ement comparisons and 3 · n + O(s · log s) moves. For
by xh . s = dn2/3 /(log n)1/3 e, this gives an algorithm with
The result is that the current input block Xj , over- dlog ke · n + O((n · log n)2/3 ) comparisons and 3 · n +
laid by the output block, jumps and overlays the O((n·log n)2/3 ) moves.
escape block. Thus, the control is switched to the
mode of Section 4.5.
Clearly, this rearrangement needs only three mo- 5 Conclusion
ves. Since one more element has been transported
to the output zone, the number of moves is the In this paper we have shown that k-way blockwise
same as in the standard case. in-place merging can be accomplished efficiently with
almost optimal number of element comparisons and
moves. Moreover, the number of element moves is in-
4.10 Common blocks are exhausted dependent on k, the number of input sequences. Note
that this algorithm does not merge stably, that is, the
If one of the current input blocks becomes empty, but
relative order of equal elements may not be preserved.
there is no common block to become a new input
Whether there exist a stable multiway blockwise in-
block, the above procedure is stopped. At this point,
place merging algorithm is left as an open problem.
the output zone, consisting of the elements merged al-
We conjecture that, using the algorithm described
ready in their final locations, is followed by a residual
here as a subroutine, it is possible to devise an asymp-
zone of size n0 starting at the position oc . This zone
totically efficient multiway in-place merging algorithm.
consists of the right part of the output block, k−1 un-
We dare to formulate this conjecture since the work on
merged input blocks, at most k free blocks, and one
such algorithm is currently in progress.
escape block. Thus, the total length of this residual
zone is n0 ≤ s + (k−1)·s + k·s + s = (2k+1)·s.
The residual zone can be sorted by the use of Heap- References
sort (including also the buffer element put aside at the
very beginning of the computation, to create a hole). 1. Katajainen, J., Pasanen, T.: In-Place Sorting with
This will cost only O(k·s·log(k·s)) ≤ O(s·log s) com- Fewer Moves. Inform. Process. Lett. 70 (1999) 31–37
parisons and the same number of moves [5–9]. Alterna- 2. Katajainen, J., Pasanen, T., Teuhola, J.: Practical In-
tively, we could also use an algorithm sorting in-place Place Mergesort. Nordic J. Comput. 3 (1996) 27–40
with O(s·log s) comparisons but only O(s) moves [10]. 3. Katajainen, J., Träff, J. L.: A Meticulous Analysis of
Now we are done: the buffer elements are greater Mergesort Programs. Lect. Notes Comput. Sci. 1203
than any other element, and hence the array now con- (1997) 217–28
4. Geffert, V., Katajainen, J., Pasanen, T.: Asymptotically
4
Unless the position ec +1 itself is across the block bound- Efficient In-Place Merging. Theoret. Comput. Sci. 237
ary. If xj is moved to the rightmost position in the escape (2000) 159–81
block, the escape block jumps immediately and one free 5. Carlsson, S.: A Note on Heapsort. Comput. J. 35 (1992)
block becomes a new escape block. This nested excep- 410–11
tion thus returns the algorithm to the standard mode; all 6. Knuth, D. E.: The Art of Computer Programming,
“special” blocks now reside in pairwise disjoint regions. Vol. 3: Sorting and Searching. Addison-Wesley, Second
However, we jump to the point where the standard rou- edition (1998)
tine checks the exceptions of Sections 4.4–4.10. Among 7. Schaffer, R., Sedgewick, R.: The Analysis of Heapsort.
others, we have to check whether the input block Xh J. Algorithms 15 (1993) 76–100
has not become empty, or if the output zone, just cross- 8. Wegener, I.: Bottom-Up-Heapsort, a New Variant of
ing a block boundary, has not bumped into any other Heapsort Beating, on an Average, Quicksort (If n Is Not
“special” block again. Very Small). Theoret. Comput. Sci. 118 (1993) 81–98
9. Williams, J. W. J.: Heapsort (Algorithm 232). Comm. Appendix for the Program Committee
Assoc. Comput. Mach. 7 (1964) 347–48
10. Franceschini, G., Geffert, V.: An In-Place Sorting with In this appendix, we give a proof of correctness of the
O(n·log n) Comparisons and O(n) Moves. J. Assoc.
blockwise merging algorithm described in Section 3.
Comput. Mach. 52 (2005) 515–37
Recall that the algorithm starts with the smallest
k blocks of the array A and, each time one of the k in-
put blocks becomes empty, the smallest common block
not yet processed becomes the new input block, not
taking into account its origin in one of the sequences
A1 , . . . , Ak .
Let us assume, for contradiction, that an element
xC has just been transported to the output zone, but
there still exists an unprocessed element y, such that
y < xC . Clearly, the element xC was a current input
element in some current input block. Originally, this
input block was represented in the form X = hxL , xR i,
which gave its relative block order. Here xL repre-
sents the original leftmost element of X (transported
to the output even earlier than xC ) and xR the right-
most element (still residing in X). Since all blocks are
sorted, we have that xL ≤ xC ≤ xR (not excluding
the possibility that xC coincides with xL or xR ). Simi-
larly, the element y resides in a block characterized by
Y = hyL , yR i, with yL ≤ y ≤ yR . (The current status of
the original leftmost element yL is not known: it can
still reside in Y but, potentially, it may be a part of
the output zone already.) Now there are the following
possibilities to consider:
(a) The block Y is one of the current input blocks
(not excluding the possibility that Y coincides with X).
This case is straightforward. Let yC denotes the cur-
rent input element in the input block Y . Clearly, yC ≤
y, since y has not been processed yet. But then yC ≤
y < xC , that is, yC < xC , which is a contradiction,
since xC has just been moved to the output, and hence
determined to be the smallest one among all current
input elements.
(b) The block Y is one of the common blocks (not
yet being processed) and, at the present moment, all
k current input blocks have their origin in different
input sequences A1 , . . . , Ak . This does not necessarily
mean that the i-th input block Xi originated from Ai ,
since it could have its origin in a different sequence.
Nevertheless, this does imply that one current input
block Z ∈ {X1 , . . . , Xk } originated from the same se-
quence A` as did the block Y .
Let Z = hzL , zR i represents the original character-
ization of this current input block, and let zC be its
current input element. Clearly, zL ≤ zC ≤ zR . Using
the fact that the block Z has been selected as an in-
put block prior to selecting the block Y = hyL , yR i, and
that these two blocks originated from the same sorted
sequence A` , we have that zR ≤ yL . (For, if zR > yL ,
the sequence A` could not be sorted.) But this gives
that zC ≤ zR ≤ yL ≤ y < xC . Therefore, zC < xC ,
which is a contradiction, since the element xC has been
determined to be the smallest one among all current
input elements.
(c) Finally, let Y be one of the common blocks
and, at the present moment, at least two current in-
put blocks have their origin in the same input se-
quence. That is, we have V, W ∈ {X1 , . . . , Xk }, com-
ing from A` , for some ` ∈ {1, . . . , k}. Let the respective
characterizations of these two blocks be V = hvL , vR i
and W = hwL , wR i, their respective current input ele-
ments be vC and wC . Clearly, vL ≤ vC ≤ vR and wL ≤
wC ≤ wR . Without loss of generality, we can also as-
sume that the block V was smaller than or equal to W .
But then vR ≤ wL , since these two blocks came from
the same sorted sequence A` . Moreover, at the present
moment, the element xC has just been selected as the
smallest one from among all current input elements,
and hence we also have that xC ≤ vC . Putting these
facts together, we get yL ≤ y < xC ≤ vC ≤ vR ≤ wL .
Therefore, yL < wL , which contradicts the fact that
the block W had been selected as an input block prior
to selecting the block Y .
Summing up, all cases have led us to contradic-
tions, and hence the elements are always transported
to the output zone in sorted order.5

5
Notice that the algorithm would behave correctly even if
the number of used current input blocks exceeded k, the
number of original input sequences. This only eliminates
Case (b) in the proof of correctness.