Multiway Blockwise In-place Merging

                                               Viliam Geffert and Jozef Gajdoš

                          Institute of Computer Science, P.J.Šafárik University, Faculty of Science
                                          Jesenná 5, 041 54 Košice, Slovak Republic
                                      viliam.geffert@upjs.sk, jozef.gajdos@upjs.sk

Abstract. We present an algorithm for asymptotically ef-        array A. Moreover, the lengths n1 , . . . , nk of input se-
ficient multiway blockwise in-place merging. Given an ar-       quences are positive integer multiples of s, and hence,
ray A containing sorted subsequences Pk A1 , . . . , Ak of respec-
                                                                there is always a block boundary between the last el-
tive lengths n1 , . . . , nk , where      n = n, we assume
                                      i=1 i                     ement of Ai and the first element of Ai+1 , for each
that extra k·s elements (so called buffer elements) are po-     i ∈ 1, . . . , k−1. We shall also assume, that before the
sitioned at the very end of array A, and that the lengths
                                                                merging starts, blocks can be mixed up quite arbitrar-
n1 , . . . , nk are positive integer multiples of some param-
                                                                ily, so we no longer know the original membership of
eter s (i.e., multiples of a given block of length s). The
number of input sequences k is a fixed constant parameter,      blocks in the input sequences A1 , . . . , Ak .
not dependent on the lengths of input sequences. Then our            So far, the problem has been resolved for two-way
algorithm merges the subsequences A1 , . . . , Ak into a single merging, i.e., for k = 2 [4]. This algorithm uses 2n +
sorted sequence, performing Θ(log k·n) + O((n/s)2 ) + O(s·      o(n) comparisons, 3n + o(n) element moves and O(1)
log s) element comparisons and 3 · n + O(s · log s) element     extra locations for storing elements, in the worst case.
moves.1                                                         Thus, by repeated application of this algorithm, we
Then, for s = dn2/3 /(log n)1/3 e, this gives an algorithm      could carry out k-way merging in linear time, for ar-
performing Θ(log k ·n) + O((n·log n)2/3 ) comparisons and       bitrary k ≥ 2. However, implemented this way, the k-
3·n + O((n·log n)2/3 ) moves. That is, our algorithm runs       way merging would perform 3·dlog ke·n+o(n) element
in linear time, with an asymptotically optimal number of        moves and 2·dlog ke·n+o(n) element comparisons. We
comparisons and with the number of moves independent
                                                                shall show that the number of moves does not depend
on the number of input sequences. Moreover, our algo-
rithm is “almost in-place”, it requires only k extra blocks
                                                                on k, if the lengths n1 , . . . , nk are integer multiples
of size s = o(n).                                               of the block size s. Namely, using the algorithm of
                                                                Geffert et. al [4] as our starting point, we show that
                                                                multiway blockwise in-place merging is possible with
1 Introduction                                                  dlog ke·n + O((n/s)2 ) + O(s·log s) element comparisons
                                                                and 3·n + O(s·log s) moves. For s = dn2/3 /(log n)1/3 e,
                                                                this gives an algorithm with dlog ke·n + O((n·log n)2/3 )
Given an array A[1..n] consisting of sorted subsequen-
                                                                comparisons and 3·n + O((n·log n)2/3 ) moves, and the
ces A1 , . . . , Ak each
                       Pcontaining n1 , . . . , nk elements re-
                         k                                      number of element moves independent on the number
spectively, where i=1 ni = n, the classical multiway
                                                                of input sequences. (It is also easy to show that the
in-place merging problem is to rearrange these ele-
                                                                number of comparisons cannot be improved.)
ments to form a single sorted sequence of n elements,
assuming that only one extra storage location (in ad-
dition to the array A) is available for storing elements. 2 Comparisons in a simple multiway
To store array indices, counters, etc. only O(1) stor-                merging
age locations are available. The efficiency of a merg-
ing algorithm is given by two quantities: the number To explain how elements are compared, we first solve
of pairwise element comparisons and the number of a simpler task. Assume that we are given an array A,
element moves carried out in the worst case, both ex- consisting of k sorted subsequences A1 , A2 , . . . , Ak ,
pressed as a function of n. In merging, these are the that are to be merged into a single sorted sequence.
only operations permitted for elements.                         The lengths of these  P subsequences are n1 , n2 , . . . , nk
     In this paper we study the computational complex- respectively, with k ni = n.
                                                                                         i=1
ity of the multiway blockwise in-place merging prob-                 Assume also that, together with the given array A,
lem. More precisely, we assume that the entire array A we are also given an extra array B of the same size n,
is divided into blocks of equal size s, and that k ex- which will be used as an output zone.
tra blocks of size s are positioned at the very end of
                                                                     During the computation, the algorithm uses aux-
 1
   Throughout the paper, log x denotes the binary loga- iliary index variables i1 , . . . , ik and oc , where ij , for
   rithm of x.                                                  j ∈ {1, . . . , k}, points to the smallest element of the
sequence Aj not yet processed. This element will be                Before passing further, we define the following rel-
called the current input element of the j-th sequence,        ative order of blocks in the array A. Let X be a block
or simply the j-th input element. The index oc points         with the leftmost and the rightmost elements denoted
to the leftmost empty position in the array B.                by xL and xR , respectively. Such block can be repre-
    Then the straightforward implementation of the            sented in the form X = hxL , xR i. Similarly, let Y =
merge routine proceeds as follows. We find the small-         hyL , yR i be an another block. We say that the block X
est element not yet processed, by comparing elements          is smaller than or equal to Y , if xL < yL , or xL = yL
at the positions i1 , . . . , ik , and move this element to   and xR ≤ yR . Otherwise, X is greater than Y . In other
the output zone in B. After that, we update the nec-          words, the blocks are ordered according to their left-
essary index variables and repeat the process until all       most elements and, in the case of equal leftmost el-
the elements have been merged. Implemented this way,          ements, the elements at the rightmost positions are
each element will be moved just once and the number           used as the second order criterion.
of comparisons, per each element, will be k −1. This
gives us (k−1)·n comparisons and n element moves in
total.                                                            Now the modified merging algorithm proceeds as
                                                              follows. First, using the above block ordering, find the
    The number of comparisons can be reduced by im-           smallest k blocks in the array A. These blocks will ini-
plementing a selection tree of depth dlog ke above the        tially become the k current input blocks, their leftmost
k current input elements. Initially, to build a selec-        elements becoming the k current input elements. The
tion tree, k − 1 comparisons are required. Then the           j-th current input block will be denoted by Xj , simi-
smallest element, not yet processed, can be moved to          larly, the j-th current input element by xj . The posi-
the output zone. After this, the element following the        tions of current input elements are kept in index vari-
smallest element in the same subsequence is inserted          ables i1 , . . . , ik . Above the k current input elements,
in the tree and the selection tree is updated. To do          we build a selection tree. All blocks that are not input
this, only dlog ke comparisons are needed. To avoid el-       blocks are called common blocks.
ement moves, only pointers to elements are stored in
                                                               After that, the merging process can proceed in the
the selection tree. (For more details concerning this
                                                           same way as described in Section 2. That is, using
data structure, see [1–3].) The number of moves re-
                                                           the selection tree, determine ij , the position of the
mains unchanged, but now we have k − 1 compar-
                                                           smallest input element not yet processed, among the
isons for the first element and only dlog ke compar-
                                                           k current input elements, and move this element to
isons per each other element. This gives us a total of
                                                           the output zone in the array B. Then the element po-
(k−1) + dlog ke·(n−1) ≤ dlog ke·n + O(1) comparisons.
                                                           sitioned immediately on the right of xj , within the
                                                           same block Xj , becomes a new j-th current input el-
3 Comparisons in a blockwise merging ement, its index pointer is inserted into the selection
                                                           tree, and the tree is updated, with dlog ke compar-
This section describes one of the cardinal tricks used in isons. This can be repeated until one of the current
our algorithm. Again, we are given the array A con- input blocks becomes empty.
sisting of the sorted subsequences A1 , . . . , Ak , to be     When this happens, i.e., each time the element xj ,
merged together. We still have the extra array B, used just moved to the output zone, was the last (right-
as an output zone.                                         most) element in the corresponding input block Xj ,
    However, now the entire array A is divided into the block Xj is “discarded” and the smallest (accord-
blocks of equal size s (the exact value of s will be ing to our relative block ordering) common block not
determined later, so that the number of comparisons yet processed will be used as the new j-th current
and moves is minimized) and, before the merging can input block. The leftmost element in this block will
start, these blocks are mixed up quite arbitrarily. Be- become the new j-th current input element. Since the
cause of the permutation of blocks in A, we no longer blocks are mixed up in the array A, we need to scan
know the original membership of blocks in the input sequentially all blocks (actually, all blocks not yet pro-
sequences A1 , . . . , Ak .                                cessed only) to determine which one of them is the
    Still, the relative order of elements inside individ- smallest. This search for a new input block consumes
ual blocks is preserved. Moreover, we shall also assume O((n/s)2 ) additional comparisons: there are at most
that n1 , . . . , nk , the respective lengths of input se- n/s blocks and such search is activated only if one of
quences, are positive integer multiples of s, and hence, the input blocks has been discarded as empty, i.e., at
before mixing the blocks up, there was always a block most n/s times. (For the time being, just assume that
boundary between the last element of Ai and the first we can distinguish discarded blocks from those not yet
element of Ai+1 , for each i ∈ 1, . . . , k−1.             processed, at no extra cost.)
    However, before merging, the blocks have been mi-               a so-called escape block. We also maintain a current
xed up quite arbitrarily and hence their origin in the              escape position ec , which is initially the position of
input subsequences A1 , . . . , Ak cannot be recovered.             the first (leftmost) element in the escape block. We
The proof that the above algorithm behaves correctly,               create a hole here by putting the buffer element at
that is, the elements are transported to the output                 this position aside.
zone in sorted order, will be published in the full ver-                Now, find the smallest k blocks2 in the area oc-
sion of the paper. (** The proof can also be found in               cupied by A1 , . . . , Ak , according to the relative block
the Appendix for the Program Committee. **)                         ordering defined in Section 3. This can be done with
   The number of element moves remains unchanged,                   O(k 2 ) ≤ O(1) comparisons, by the use of some k cur-
but now we use dlog ke · n + O((n/s)2 ) comparisons,                sors (index variables) moving along in A, since each
under assumption that we can distinguish discarded                  of the sequences A1 , . . . , Ak is sorted. The smallest k
blocks from those not yet processed at no extra cost.               blocks will initially become the k current input blocks
                                                                    X1 , . . . , Xk . For each j = 1, . . . , k, the first element xj
                                                                    in the block Xj becomes a j-th current input element,
4     In-place merging, simplified case                             and its position is kept in the index variable ij . Above
                                                                    the k input elements, we build a selection tree of depth
Now we shall convert the above merging algorithm                    dlog ke. To do that, k −1 ≤ O(1) initial comparisons
into a procedure working “almost” in-place. More pre-               are needed.
cisely, we are again given the array A containing the                   The very first block of the array A becomes an
sorted subsequences    Pk A1 , . . . , Ak , of respective lengths   output block and a position oc = 1 pointing there
n1 , . . . , nk , with  i=1 ni = n. All these lengths are           becomes a current output position. The initial out-
positive integer multiples of the given parameter s.                put position may — quite likely — coincide with a po-
     However, we no longer have a separate array B of               sition of some current input element. Observe that
size n. Instead, we have some extra k·s elements po-                ec mod s = oc mod s, which is an invariant we shall
sitioned at the very end of the array A, behind Ak .                keep in the course of the entire computation. All other
The elements in this small additional area are greater              blocks are called common blocks.
than any of the elements in A1 , . . . , Ak . During the
computation, they can be mixed with other elements,
but their original contents cannot be destroyed. These                    In general, the algorithm maintains current posi-
elements will be called buffer elements. To let the el-             tions of the following special blocks: free blocks, the
ements ever move, we have also one extra location                   number of which ranges between 0 and k, their left-
where we can put a single element aside.                            most positions are stored in the free block stack; ex-
     The sorted output should be formed within the                  actly k input blocks, the current input positions inside
same array A, in the locations occupied by the input                the respective blocks are stored in the index variables
sequences A1 , . . . , Ak . (As a consequence, the buffer el-       i1 , . . . , ik ; one output block with the current output
ements should also end up in their original locations.)             position oc inside this block; and one escape block with
Therefore, the moves are performed in a different way,              the current escape position ec inside. The values of oc
based on the idea of internal buffering, used in a two-             and ec are synchronized modulo s.
way in-place merging [4]. Nevertheless, the compar-                     Usually, the optional free blocks, the k current in-
isons are performed in the same way as described in                 put blocks, the output block, and the escape block are
Section 3.                                                          all disjoint, and the merging proceeds as described in
                                                                    Section 4.2. However, after the initiation, the output
                                                                    block may overlay one of the current input blocks, if
4.1    Initiation
                                                                    the leftmost block in A1 has been selected as an in-
Divide the entire array A into blocks of equal size s.              put block. If this happens, the current output position
Since the lengths of all input sequences A1 , . . . , Ak are        coincides with a position of one of the current input
positive integer multiples of s, there is always a block            elements, and the computation starts in a very special
boundary between the last element of Ai and the first               mode of Section 4.9.
element of Ai+1 , for each i ∈ 1, . . . , k−1. Similarly,
the buffer elements, positioned in the small additional             2
                                                                        Picking simply the leftmost blocks in the sequences
area at the very end, form the last k blocks.                           A1 , . . . , Ak would do no harm. In addition, this would
    Initially, the last k blocks will be used as free blocks,           not require any initial element comparisons. However,
their starting positions are stored in a free block stack               we are presenting the algorithm in a form that is suitable
of height k. After that, the position of one free block                 for application in the general case, comparing elements
is picked out of the stack and this block is used as                    in accordance with the strategy presented in Section 3.
4.2     Standard situation                                                                                                                                                                           Step B. The smallest input element xj not yet pro-
                                                                                                                                                                                                        cessed is moved from the position ij to its final
The standard situation is illustrated by Fig. 1. Dur-                                                                                                                                                   position at oc .
ing the computation, the k·s buffer elements can be                                                                                                                                                  Step C. A new hole is created at the position ec +1 by
found at the following locations: to the left of the j-                                                                                                                                                 moving its buffer element to the place released by
th input element xj in the j-th input block Xj , for                                                                                                                                                    the smallest input element just moved. After that,
j ∈ {1, . . . , k}, to the right of ec in the escape block,                                                                                                                                             all necessary index variables are incremented and
with the hole at the position ec , and also in free blocks,                                                                                                                                             the selection tree is updated.
consisting of buffer elements only.
   The elements merged already, from all the input                                                                                                                                                  This gives 3 moves and dlog ke comparisons per each
blocks, form a contiguous output zone at the very be-                                                                                                                                               element transported to its final location. Now there
ginning of A, ending at position oc −1. Hence, the next                                                                                                                                             are various special cases that should be detected and
element to be output will go to the position oc in the                                                                                                                                              handled with care. All exceptions are checked up on
output block.                                                                                                                                                                                       after the execution of Step B, in the order of their
   All elements not merged yet are scattered in blocks                                                                                                                                              appearance, unless stated otherwise. Most of the ex-
between the output zone and the end of the array A.                                                                                                                                                 ception handling routines replace Step C by a different
The permutation of these blocks is allowed, however,                                                                                                                                                action.
elements to be merged keep their relative positions
within each block. On the other hand, the origin of
the blocks in the subsequences A1 , . . . , Ak cannot be                                                                                                                                            4.3   Escape block becomes full
recovered. So optional free blocks, input blocks, es-
cape block, and common blocks can reside anywhere                                                                                                                                                   If the rightmost element of the output block is moved
between the output zone and the end of the array A.                                                                                                                                                 to the last position of the escape block, the new hole
                                                                                                                                                                                                    cannot be created at the position ec +1 in Step C.
                                                                                                                                                                                                    Instead, one free block at the top of the stack becomes
                                                                                                                                                                                                    the new escape block and a new hole is created at
                                                                                                                                             k-th input block


                                                                                                                                                                                                    the beginning of this block. This is accomplished by
                                             i-th input block
                              common block


                                                                             common block

                                                                                            j-th input block


                                                                                                                              common block


                                                                                                                                                                common block

                                                                                                                                                                               common block
                                                                                                               escape block
               output block


                                                                                                                                                                                                    removing its starting position from the free block stack
                                                                free block


                                                                                                                                                                                                    and assigning it to ec .
                                                                                                                                                                                                        The subsequent move of the buffer element from
                                                                                                                                                                                                    the new position of ec to the place released by the
      output
                                                                                                                                                                                                    smallest input element does not increase the number
       zone                                                                                                                                                                                   ...
                                                                                                                                                                                                    of moves; it replaces the move in Step C. The selection
                 oc                              xi                                                   xj          ec                         xk                                                     tree is updated in the standard way.
                                                                                                                                                                                                         It should be pointed out that, at this moment,
                                                                                                                                                                                                    there does exist at least one free block in the stack. As-
                                    Fig. 1. Standard situation
                                                                                                                                                                                                    sume, for example, that the j-th input element xj has
                                                                                                                                                                                                    just been transported to the output zone. After that,
    The output block spans across the current output                                                                                                                                                we have rj ∈ {1, . . . , s} buffer elements in the j-th in-
position oc , so its left part belongs to the output zone.                                                                                                                                          put block Xj , including the hole, but rh ∈ {0, . . . , s−1}
As the output grows to the right, the elements lying                                                                                                                                                buffer elements in other input blocks Xh , for each
to the right of oc are moved from the output block to                                                                                                                                               h ∈ {1, . . . , k}, h 6= j, since each input block, except
the corresponding positions in the escape block, i.e.,                                                                                                                                              for Xj , contains at least one input element. Moreover,
to the right of ec . The positions of oc and ec are syn-                                                                                                                                            the escape block is full, and hence it does not contain
chronized, i.e., we have always oc mod s = ec mod s.                                                                                                                                                any buffer elements at all. Assuming there is no free
Hence, the relative positions of escaping elements are                                                                                                                                              block available, this gives at most s + (k−1)·(s−1) <
preserved within the blocks. Moreover, oc and ec reach                                                                                                                                              k · s buffer elements in total. But this is a contradic-
their respective block boundaries at the same time.                                                                                                                                                 tion, since the number of buffer elements, including
                                                                                                                                                                                                    the hole, is always equal to k·s.
    Now we are ready for merging. Using the selection
tree, we determine xj , the smallest element among the
k current input elements in the blocks X1 , . . . , Xk , and 4.4 Current input block becomes empty
move this element to the output zone as follows:
                                                             We check next whether the smallest element xj , just
 Step A. The element at the position oc in the output moved from the position ij to the output zone, was
    block escapes to the hole at the position ec .           the last element of the corresponding input block Xj .
If so, we have an entire block consisting of buffer el-             4.5   One of the input blocks overlays the
ements only, with hole at the end after Step B. This                      escape block
hole is filled in the standard way, described in Step C,
                                                                    If the common block that should be processed next
but the old input block Xj becomes a free block and
                                                                    is the logical block composed of the left part of the
its starting position is saved in the stack. Since we
                                                                    escape block and the right part of the output block,
have k·s buffer elements in total, a stack of height k
                                                                    then both the new current input block Xj and the es-
is sufficient.
                                                                    cape block are located within the same physical block.
                                                                    Here xj is always positioned to the left of ec and the
    Next, we have to find a new j-th input block Xj ,               buffer elements are both to the left of xj and to the
and assign a new value to ij . Since the blocks are                 right of ec .
mixed up, we scan sequentially the remaining com-                       Once the position of xj is properly initiated, all
mon blocks to determine which common block should                   actions are performed in the standard way described in
become the new j-th current input block. The smallest               Section 4.2. That is, the elements are transported from
common block, according to the block ordering intro-                the output block to the position of ec , from the input
duced in Section 3, is the next one to be processed.                blocks to the position of oc , and buffer elements from
As already shown in Section 3, the elements are trans-              ec +1 to locations released in the input blocks. Since ec
ported to the output zone in sorted order even though               moves to the right “faster” than does ij , this special
this strategy does not necessarily pick up the j-th in-             case returns automatically to the standard mode as
put block from the j-th input sequence Aj .                         soon as ec reaches a block boundary. Then the escape
    Free blocks, as well as all remaining current input             block separates from the current input block Xj as
blocks, are ignored in this scanning. Moreover, the el-             described in Section 4.3.
ements to the left of ec in the escape block (if not
empty) together with the elements to the right of oc in             4.6   Output block overlays the escape block
the output block are viewed as a single logical block. In
a practical implementation, we can start with the left- Next we check whether the output zone, crossing a
most escape-block element and the rightmost output-     block boundary, does not bump into any “special”
block element as a starting key and search the rest     block. It is easy to see that this may happen only if
of the array for a common block with a smaller key.3    ec points to the beginning of the escape block that is
If the logical block composed of the left part of the   empty, using the fact that the positions of oc and ec
escape block and the right part of the output block     are synchronized and that the special handling of Sec-
should be processed next, the program control will be   tion 4.3 is performed first.
switched to the mode described in Section 4.5.              Now consider that the output block overlays the
                                                        escape block, i.e., they are both located within the
    If the escape block is empty, then both ec and oc same physical block. In this mode, we always have
point to the beginning of their respective blocks. Then o = e . The element movement corresponds now to a
                                                         c     c
the escape block is skipped and the output block is more efficient scheme:
handled as a common block, so we may even find out
that the new input block should be located at the same Step B’. The smallest input element xj not yet pro-
position as the output block. This special mode is ex-      cessed is moved to the hole at the position oc = ec .
plained in Section 4.9.                                  Step C’. A new hole is created at the position oc+1 =
    The search for new input blocks costs O((n/s)2 )        ec + 1 by moving its buffer element to the place
additional comparisons: there are O(n/s) blocks in          released by xj . Then all necessary index variables
total and such search is activated only if one of the       are  incremented and the selection tree is updated.
input blocks is exhausted, i.e., at most O(n/s) times. Step A is eliminated, since oc = ec . This mode is ter-
The same upper bound holds for arithmetic operations minated as soon as oc and ec reach a block boundary.
with indexes as well.                                        We also need a slightly modified version of the routine
                                                             described in Section 4.4. If one of the input blocks be-
 3                                                           comes empty, it becomes free as usual, but the com-
   It is quite straightforward to detect whether a block be-
                                                             bined output/escape block is skipped in the search for
   ginning at a given position ` is common: the value of `
   must not be saved in the free block stack, and b`/sc must the next input block.
  be different from bi1 /sc, . . . , bik /sc (excluding bij /sc),
  and also from bec /sc. For each given block, this can             4.7   Output block overlays a free block
  be verified in O(k) ≤ O(1) time, performing auxiliary
  arithmetic operations with indexes only, but no element           If the output zone crosses a block boundary and the
  comparisons or moves.                                             value of oc is equal to some f` , the leftmost position of
a block stored in the free block stack, the new output         tion 4.4, the combined output/escape block is not
block and the corresponding free block are overlaid.           disposed of as free, moreover, it is skipped out dur-
This can be verified in O(k) ≤ O(1) time. By the same          ing the search. The program control is switched to
argument as in Section 4.6, we have that ec must point         the mode of Section 4.6 as the output and escape
to the beginning of an empty escape block.                     blocks are still overlaid.
    Therefore, we can easily swap the free block with (2) Let us now consider that this combined block be-
the escape block by swapping the pointers stored in            comes full. This may happen only if, for some
f` and ec , since both these blocks contain buffer ele-        h 6= j, an element xh from another input block Xh
ments only. Second, one move suffices to transport the         is moved to the output zone and, after Step B’, the
hole from one block to another. Note that this element         output position oc “bumps” into xj . In this case,
move is for free, we actually save some moves because          we take one free block from the top of the stack
the next s transports to the output zone will require          and change it into a new escape block. We defi-
only 2s moves, instead of 3s as in the standard case.          nitely have at least one free block available, since
Thus, the program control is switched to the mode              we disposed one block as free at the very beginning
described in Section 4.6.                                      of this mode. The hole, located in Xh at the posi-
                                                               tion of the last element transported to the output,
4.8 Output block overlays a current input                      jumps to a position ec in the new escape block,
       block                                                   so that ec mod s = oc mod s. This move replaces
                                                               Step C’ for the last element just merged. Hence, it
If the output position oc points to some Xj after cross-       does not increase the total number of moves. Then
ing a block boundary, the output block overlays the j-         we follow the instructions of Section 4.9.
th input block Xj . Again, by the argument presented
in Section 4.6, oc can point to the beginning of an
input block only if ec points to the beginning of an 4.9 Output zone bumps into a current input
empty escape block. There are now two cases to con-               element
sider.
    First, if the j-th current input element xj is the The program control can jump to this special mode
leftmost element of Xj , the program control is switch- from several different places (Sections 4.1, 4.4, and two
ed immediately to the special mode to be described in different places in Section 4.8). In any case, we have an
Section 4.9.                                               empty escape block, containing the hole and buffer ele-
    Second, if xj is not the leftmost element of Xj , we ments only. The output block and a block Xj , which is
dispose of the empty escape block as free by storing one of the input blocks, are overlaid. Moreover, there
its starting position ec in the stack, create a hole at oc is no room in between, the output position oc is point-
by moving a single buffer element from the position oc ing to the current input element xj . The position of
to ec , and overlay the output block by a new escape hole in the escape block is synchronized with oc , i.e.,
block, by assigning the value of oc to ec . The additional we have ec mod s = oc mod s.
transportation of the hole is for free, not increasing the     As long as the elements to be output are selected in
total number of moves, because we can charge it as         the input block Xj , they can be moved to the output
(nonexistent) Step A for the next element that will be     zone.  This needs no actual transportation, just the
transported to the output zone. Since xj is not placed     positions  of oc and ij are moved synchronously to the
at the beginning of the block, we can guarantee that       right. To  keep ec synchronized with oc , we move the
at least one transport to the output will use only two     hole along  the escape block in parallel, which gives us
moves in the next future.                                  one  move  per  element. There are two ways out of this
    This special mode can be viewed as if three blocks     loop.
were overlaid, namely, the output, escape, and the cur-
                                                           (1) If oc and ij reach the block boundary, we sim-
rent input block Xj . The buffer elements are between
                                                               ply search for the next input block to be pro-
the hole at ec = oc and the current input element xj .
                                                               cessed; the current configuration is the same as
The elements are moved according to Step B’ and
                                                               if, in the standard mode, oc , ec , and ij reached
Step C’ of Section 4.6. However, there is a different
                                                               the block boundaries at the same time (with the
exception handling here.
                                                               old input block Xj disposed of as free, by Sec-
(1) If the rightmost input element of this combined            tion 4.4). Thus, unless something “exceptional”
     block has been transported to the output zone,            happens, the program control returns to the stan-
     then the input block Xj separates from the out-           dard mode. (The possible exceptions are those dis-
     put/escape block, since we search for the next in-        cussed in Sections 4.6–4.8, and 4.10.) The single
     put block to be processed. But here, unlike in Sec-       move required to place the hole back to the begin-
    ning of the escape block is for free, it substitutes          sists of the subsequences A1 , . . . , Ak merged into a sin-
    Step C for the last element merged.                           gle sorted sequence, followed by a sorted sequence of
(2) If the element to be transported to the output zone           buffer elements.
    is an element xh from another input block Xh ,
    for some h 6= j, some rearrangements are neces-
                                                                  4.11    Summary
    sary. Recall that the hole position ec in the es-
    cape block is synchronized with oc , i.e., we have            Summing up the costs paid for maintaining the se-
    ec mod s = oc mod s. First, the input element xj              lection tree, transporting the elements to the output
    is moved from position oc to position ec . Now we             zone, searching for smallest input blocks, and for sort-
    can transport xh to the output position oc . Fi-              ing the residual zone, it is easy to see that the above
    nally, a new hole is created4 at the position ec +1           algorithm uses dlog ke·n + O((n/s)2 ) + O(s·log s) el-
    by moving its buffer element to the place released            ement comparisons and 3 · n + O(s · log s) moves. For
    by xh .                                                       s = dn2/3 /(log n)1/3 e, this gives an algorithm with
    The result is that the current input block Xj , over-         dlog ke · n + O((n · log n)2/3 ) comparisons and 3 · n +
    laid by the output block, jumps and overlays the              O((n·log n)2/3 ) moves.
    escape block. Thus, the control is switched to the
    mode of Section 4.5.
    Clearly, this rearrangement needs only three mo-              5    Conclusion
    ves. Since one more element has been transported
    to the output zone, the number of moves is the                In this paper we have shown that k-way blockwise
    same as in the standard case.                                 in-place merging can be accomplished efficiently with
                                                                  almost optimal number of element comparisons and
                                                                  moves. Moreover, the number of element moves is in-
4.10     Common blocks are exhausted                              dependent on k, the number of input sequences. Note
                                                                  that this algorithm does not merge stably, that is, the
If one of the current input blocks becomes empty, but
                                                                  relative order of equal elements may not be preserved.
there is no common block to become a new input
                                                                  Whether there exist a stable multiway blockwise in-
block, the above procedure is stopped. At this point,
                                                                  place merging algorithm is left as an open problem.
the output zone, consisting of the elements merged al-
                                                                      We conjecture that, using the algorithm described
ready in their final locations, is followed by a residual
                                                                  here as a subroutine, it is possible to devise an asymp-
zone of size n0 starting at the position oc . This zone
                                                                  totically efficient multiway in-place merging algorithm.
consists of the right part of the output block, k−1 un-
                                                                  We dare to formulate this conjecture since the work on
merged input blocks, at most k free blocks, and one
                                                                  such algorithm is currently in progress.
escape block. Thus, the total length of this residual
zone is n0 ≤ s + (k−1)·s + k·s + s = (2k+1)·s.
    The residual zone can be sorted by the use of Heap-           References
sort (including also the buffer element put aside at the
very beginning of the computation, to create a hole).             1. Katajainen, J., Pasanen, T.: In-Place Sorting with
This will cost only O(k·s·log(k·s)) ≤ O(s·log s) com-                Fewer Moves. Inform. Process. Lett. 70 (1999) 31–37
parisons and the same number of moves [5–9]. Alterna-             2. Katajainen, J., Pasanen, T., Teuhola, J.: Practical In-
tively, we could also use an algorithm sorting in-place              Place Mergesort. Nordic J. Comput. 3 (1996) 27–40
with O(s·log s) comparisons but only O(s) moves [10].             3. Katajainen, J., Träff, J. L.: A Meticulous Analysis of
    Now we are done: the buffer elements are greater                 Mergesort Programs. Lect. Notes Comput. Sci. 1203
than any other element, and hence the array now con-                 (1997) 217–28
                                                                  4. Geffert, V., Katajainen, J., Pasanen, T.: Asymptotically
4
    Unless the position ec +1 itself is across the block bound-      Efficient In-Place Merging. Theoret. Comput. Sci. 237
    ary. If xj is moved to the rightmost position in the escape      (2000) 159–81
    block, the escape block jumps immediately and one free        5. Carlsson, S.: A Note on Heapsort. Comput. J. 35 (1992)
    block becomes a new escape block. This nested excep-             410–11
    tion thus returns the algorithm to the standard mode; all     6. Knuth, D. E.: The Art of Computer Programming,
    “special” blocks now reside in pairwise disjoint regions.        Vol. 3: Sorting and Searching. Addison-Wesley, Second
    However, we jump to the point where the standard rou-            edition (1998)
    tine checks the exceptions of Sections 4.4–4.10. Among        7. Schaffer, R., Sedgewick, R.: The Analysis of Heapsort.
    others, we have to check whether the input block Xh              J. Algorithms 15 (1993) 76–100
    has not become empty, or if the output zone, just cross-      8. Wegener, I.: Bottom-Up-Heapsort, a New Variant of
    ing a block boundary, has not bumped into any other              Heapsort Beating, on an Average, Quicksort (If n Is Not
    “special” block again.                                           Very Small). Theoret. Comput. Sci. 118 (1993) 81–98
9. Williams, J. W. J.: Heapsort (Algorithm 232). Comm.        Appendix for the Program Committee
   Assoc. Comput. Mach. 7 (1964) 347–48
10. Franceschini, G., Geffert, V.: An In-Place Sorting with   In this appendix, we give a proof of correctness of the
   O(n·log n) Comparisons and O(n) Moves. J. Assoc.
                                                              blockwise merging algorithm described in Section 3.
   Comput. Mach. 52 (2005) 515–37
                                                                   Recall that the algorithm starts with the smallest
                                                              k blocks of the array A and, each time one of the k in-
                                                              put blocks becomes empty, the smallest common block
                                                              not yet processed becomes the new input block, not
                                                              taking into account its origin in one of the sequences
                                                              A1 , . . . , Ak .
                                                                   Let us assume, for contradiction, that an element
                                                              xC has just been transported to the output zone, but
                                                              there still exists an unprocessed element y, such that
                                                              y < xC . Clearly, the element xC was a current input
                                                              element in some current input block. Originally, this
                                                              input block was represented in the form X = hxL , xR i,
                                                              which gave its relative block order. Here xL repre-
                                                              sents the original leftmost element of X (transported
                                                              to the output even earlier than xC ) and xR the right-
                                                              most element (still residing in X). Since all blocks are
                                                              sorted, we have that xL ≤ xC ≤ xR (not excluding
                                                              the possibility that xC coincides with xL or xR ). Simi-
                                                              larly, the element y resides in a block characterized by
                                                              Y = hyL , yR i, with yL ≤ y ≤ yR . (The current status of
                                                              the original leftmost element yL is not known: it can
                                                              still reside in Y but, potentially, it may be a part of
                                                              the output zone already.) Now there are the following
                                                              possibilities to consider:
                                                                  (a) The block Y is one of the current input blocks
                                                              (not excluding the possibility that Y coincides with X).
                                                              This case is straightforward. Let yC denotes the cur-
                                                              rent input element in the input block Y . Clearly, yC ≤
                                                              y, since y has not been processed yet. But then yC ≤
                                                              y < xC , that is, yC < xC , which is a contradiction,
                                                              since xC has just been moved to the output, and hence
                                                              determined to be the smallest one among all current
                                                              input elements.
                                                                  (b) The block Y is one of the common blocks (not
                                                              yet being processed) and, at the present moment, all
                                                              k current input blocks have their origin in different
                                                              input sequences A1 , . . . , Ak . This does not necessarily
                                                              mean that the i-th input block Xi originated from Ai ,
                                                              since it could have its origin in a different sequence.
                                                              Nevertheless, this does imply that one current input
                                                              block Z ∈ {X1 , . . . , Xk } originated from the same se-
                                                              quence A` as did the block Y .
                                                                  Let Z = hzL , zR i represents the original character-
                                                              ization of this current input block, and let zC be its
                                                              current input element. Clearly, zL ≤ zC ≤ zR . Using
                                                              the fact that the block Z has been selected as an in-
                                                              put block prior to selecting the block Y = hyL , yR i, and
                                                              that these two blocks originated from the same sorted
                                                              sequence A` , we have that zR ≤ yL . (For, if zR > yL ,
                                                              the sequence A` could not be sorted.) But this gives
that zC ≤ zR ≤ yL ≤ y < xC . Therefore, zC < xC ,
which is a contradiction, since the element xC has been
determined to be the smallest one among all current
input elements.
    (c) Finally, let Y be one of the common blocks
and, at the present moment, at least two current in-
put blocks have their origin in the same input se-
quence. That is, we have V, W ∈ {X1 , . . . , Xk }, com-
ing from A` , for some ` ∈ {1, . . . , k}. Let the respective
characterizations of these two blocks be V = hvL , vR i
and W = hwL , wR i, their respective current input ele-
ments be vC and wC . Clearly, vL ≤ vC ≤ vR and wL ≤
wC ≤ wR . Without loss of generality, we can also as-
sume that the block V was smaller than or equal to W .
But then vR ≤ wL , since these two blocks came from
the same sorted sequence A` . Moreover, at the present
moment, the element xC has just been selected as the
smallest one from among all current input elements,
and hence we also have that xC ≤ vC . Putting these
facts together, we get yL ≤ y < xC ≤ vC ≤ vR ≤ wL .
Therefore, yL < wL , which contradicts the fact that
the block W had been selected as an input block prior
to selecting the block Y .
    Summing up, all cases have led us to contradic-
tions, and hence the elements are always transported
to the output zone in sorted order.5


5
    Notice that the algorithm would behave correctly even if
    the number of used current input blocks exceeded k, the
    number of original input sequences. This only eliminates
    Case (b) in the proof of correctness.