=Paper=
{{Paper
|id=Vol-2300/Paper30
|storemode=property
|title=Development and Research of Conveyor Structures of Binary Number Sorting Algorithms
|pdfUrl=https://ceur-ws.org/Vol-2300/Paper30.pdf
|volume=Vol-2300
|authors=Volodymyr Gryga,Yaroslav Nykolaichuk,Nataliia Vozna, Artur Voronych,Boris Krulikovskyi
|dblpUrl=https://dblp.org/rec/conf/acit4/GrygaNVVK18
}}
==Development and Research of Conveyor Structures of Binary Number Sorting Algorithms==
123
Development and Research of Conveyor Structures of
Binary Number Sorting Algorithms
Volodymyr Gryga1, Yaroslav Nykolaichuk2, Nataliia Vozna2, Artur Voronych3, Boris
Krulikovskyi4
1. Department of Computer Engineering and Electronics, Vasyl Stefanyk Precarpathian National University, UKRAINE, Ivano-Frankivsk,
57 Shevchenko str., email: v.dr_2000@ukr.net
2. Department of Specialized Computer Systems, Ternopil National Economic University, UKRAINE, Ternopil, 8 Chekhova str., email:
ya.nykolaichuk@tneu.edu.ua, n.vozna@ukr.net
3. Department of Computer Systems and Networks, Ivano-Frankivsk National Technical University of Oil and Gas, UKRAINE, Ivano-
Frankivsk, 15 Karpatska str., email: a.voronych@it.nung.edu.ua
4. Department of Computer Engineering, National University of Water Management and Nature Resources Use, UKRAINE, Rivne, 11
Soborna str., email: kboris@ukr.net
Abstract: The characteristics of the structures the "pairwise-odd" permutation method, the fusion method
complexity of conveyor sorting devices, constructed on and other [3].
the basis of parallel algorithms for sorting binary arrays If we use the hardware method, then there is ensured
are analyzed. The improved structure of the conveyor implementation of the binary number sorting operation in real
sorting device, which compared to the known one had 1.9 time and the possibility of applying new circuit design
times less hardware complexity and 1.5 times higher solutions with the use of a modern element base with an
performance, was developed and researched. Modeling of orientation towards implementation in the FPGA form.
VHDL models of conveyor sorting devices was carried The implementation of highly efficient conveyor devices
out, their synthesis and implementation was made in the for sorting binary arrays requires extensive use of a modern
Xilinx FPGA by automated designing Vivado system element base, the improvement of existing and the
applications. The received practical results of the development of new methods, algorithms and device
complexity characteristic of the developed conveyor structures.
devices coincide with theoretical calculations. There are In this regard, an important t ask is to develop efficient
also determined the relevant areas of application of the structures for conveyor sorting devices (CSDs), which will
developed devices. improve the time characteristics of multithreaded data
Keywords: special device for sorting arrays of data, processing. The implementation of such devices in the
conveyor sorting device, basic sorting elements, languages of the hardware description and their synthesis in
comparison scheme. FPGA will enable the designer to choose the optimal
hardware cost and time-efficient characteristics of the
I. INTRODUCTION structure of the CSD.
Sorting is one of the common problems of data processing
and it is generally understood as a problem of placing
II. CONVEYOR STRUCTURES ANALYSIS OF PARALLEL
elements of disordered set of data sets values in order of ALGORITHMS FOR BINARY NUMBERS SORTING
monotonic increase or decrease [1]. The sorting operation The conveyor processing principle involves the alignment
takes on average 25% of machine time [2] and it is most in time operators of the algorithms sorting on different data
often used in the tasks of digital processing of signals and [5-8].
images, similar to convolution operations, fast Fourier Conveyor structures of sorting devices of binary arrays is
transform, and others [3]. We know many methods of developed and analyzed below. They are based on the known
sequential and parallel sorting of binary numbers [1-9]. There parallel sorting algorithms presented in a graphical form, and
is a large number of algorithms to organize sorting in modern their hardware and time complexity are estimated. (Their
digital signal processors, and each of these algorithms has its system characteristics)
advantages and disadvantages. Performing a software sorting The hardware implementation of the known parallel
operation is time-consuming and generally used to perform sorting algorithms in the conveyor structures involves the full
consistent data sorting methods. Particularly effective is the reflection of their flow graphs algorithm (tiered-parallel
parallel performance of algorithm sorting operations using algorithm forms) [4-8,10,11,13] into the structure of the
hardware methods that significantly accelerate the execution operating device. The vertices of the graph (functional
time of the algorithm. Such methods of parallel sorting operators) will correspond to the hardware block (operation)
algorithms, in which the sequence of executed operations and the arcs will correspond to lines for the transmission of
only depends on the number of input data, are the following: input data of intermediate and final results. The conveyor
the Batcher’s method [2], the modified "bubbles" method [1], registers are placed on the lines that connect the operating
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic
124
units of one tier with the operating units of the previous tier. x1 x2 x3 x4 x5 x6
Tier-parallel algorithm forms determines the degree of
Rg1 Rg2 Rg3 Rg4 Rg5 Rg6
parallelism of the graph (the maximum number of vertices in
a single tier or the width of the graph), as well as the BSE1 BSE2 BSE3
minimum-possible time of calculation of the given algorithm
(number of tiers or height of the graph) [6-8]. Rg7 Rg8 Rg9 Rg10 Rg11 Rg12
It is shown the structure of the CSD in Fig. 1, which is
based on the numbers sorting by modified “bubbles” sorting BSE4 BSE5
method [1,13].
x1 x2 x3 x4 Rg13 Rg14 Rg15 Rg16 Rg17 Rg18
Rg1 Rg2 Rg3 Rg4 BSE6 BSE7 BSE8
Rg19 Rg20 Rg21 Rg22 Rg23 Rg24
BSE1
BSE9 BSE10
Rg5 Rg6 Rg7 Rg8
Rg25 Rg26 Rg27 Rg28 Rg29 Rg30
BSE2
BSE11 BSE12 BSE13
Rg9 Rg10 Rg11 Rg12
Rg31 Rg32 Rg33 Rg34 Rg35 Rg36
BSE3 BSE4 BSE14 BSE15
Rg13 Rg14 Rg15 Rg16 Rg37 Rg38 Rg39 Rg40 Rg41 Rg42
BSE5
y1 y2 y3 y4 y5 y6
Fig.2. The structure of Conveyor sorting device for 6 values by
Rg17 Rg18 Rg19 Rg20
"pairwise-odd" permutation method.
x1 x2 x3 x4 x5 x6 x7 x8
BSE6
Rg1 Rg2 Rg3 Rg4 Rg5 Rg6 Rg7 Rg8
Rg21 Rg22 Rg3 Rg24 BSE1 BSE2 BSE3 BSE4
y1 y2 y3 y4 Rg9 Rg10 Rg11 Rg12 Rg13 Rg14 Rg15 Rg16
Fig.1. The structure of Conveyor sorting device for 4 values by the
modified “bubbles” sorting method. BSE5 BSE6 BSE7 BSE8
The basis for the above-presented structure of conveyor Rg17 Rg18 Rg19 Rg20 Rg21 Rg22 Rg23 Rg24
sorting device is the basic sorting elements (BSE) and
conveyor register (Rg). BSE9 BSE10
The structure of this conveyor device consists of the same
Rg25 Rg26 Rg27 Rg28 Rg29 Rg30 Rg31 Rg32
basic sorting operations (max/min). CSD structure (Fig. 1)
contains N(N-1)/2 basic sorting operations and N(2N-3)
BSE11 BSE12 BSE13 BSE14
conveyor registers for input N values.
The time complexity of this conveyor device structure is Rg33 Rg34 Rg35 Rg36 Rg37 Rg38 Rg39 Rg40
determined by the critical distribution of the signal through
(2N-3) basic operations of sorting and BSE15 BSE16 BSE17
(2N-3)+1 conveyor registers.
Rg41 Rg42 Rg43 Rg44 Rg45 Rg46 Rg47 Rg48
It is shown the structure of the CSD in Fig. 2, which is
constructed on the basis of sorting algorithm by "pairwise-
BSE18 BSE19
odd" permutation method [2,13].
CSD structure (Fig. 2) contains N(N-1)/2 basic sorting Rg49 Rg50 Rg51 Rg52 Rg53 Rg54 Rg55 Rg56
operations and (N × N) conveyor registers for N input values.
The time complexity of this conveyor device structure is y1 y2 y3 y4 y5 y6 y7 y8
determined by the critical distribution of the signal through N
Fig.3. The structure of Conveyor sorting device for 8 values by
basic operations of sorting and (N+1) conveyor registers. Batcher`s method.
It is shown the CSD structure in Fig. 3, which is
constructed on the basis of the sorting algorithm graph by CSD structure (Fig. 3) contains 0, 48 N ln 2 N basic sorting
Batcher’s method [3,13].
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic
125
1 τ CS - time complexity of comparison scheme,
operations and ( log 2 N )([log 2 N + 1]) × N conveyor
2 τ - time complexity of multiplexer,
MP
registers for N input values.
The time complexity of this conveyor device structure is τ REG - time complexity of register.
determined by the critical distribution of the signal through
1
III. DEVELOPMENT OF AN IMPROVED STRUCTURE OF
( log 2 N )( log 2 N + 1) basic operations of sorting and (N-1) THE CONVEYOR SORTING DEVICE
2
conveyor registers. The bubble CSD (Fig. 1) can be improved by performing
The conveyor structures of the sorting devices (Fig. 1-3) independent base-sorting operations separately for the first
consist of the same type of basic sorting elements that and second half of the input data on the two conveyor
compare two numbers and more of them are sent to the first structures.
output and less to the second output. The internal structure of Values X are already ordered by recession and need to be
sorting basic element is presented in Fig. 4. i
X1 X2 2
further applied to (( N 2) − N 2) / 2 + N 2 basic sorting
elements and 4N conveyor registers.
It is shown an improved CSD for 4-bit binary numbers
(n = 4) for 8 input values (N =8) in Fig. 5.
x8 x7 x6 x5 x4 x3 x2 x1
Comparison
Rg8 Rg7 Rg6 Rg5 Rg4 Rg3 Rg2 Rg1
Scheme
BSE2 BSE1
М1 1 М2 Rg16 Rg15 Rg14 Rg13 Rg12 Rg11 Rg10 Rg9
BSE4 BSE3
Y1(Min) Y2 (Max) Rg24 Rg23 Rg22 Rg21 Rg20 Rg19 Rg18 Rg17
Fig.4. Internal structure of the basic sorting element BSE8 BSE7 BSE6 BSE5
When comparing two numbers (X 1 , X 2 ) in the comparison
Rg32 Rg31 Rg30 Rg29 Rg28 Rg27 Rg26 Rg25
scheme "for more" (X 1 > X 2 ), the output of the scheme is
formed a comparison sign, the direct value of which controls
BSE10 BSE9
the multiplexer M1, to issue a smaller number to the output
(Y 1 ) and the inverse value controls the multiplexer M2 to Rg40 Rg39 Rg38 Rg37 Rg36 Rg35 Rg34 Rg33
issue a larger number at the output (Y 2 ) [5,13].
The internal structure of the comparison scheme can be BSE12 BSE11
constructed on the basis of logical elements [5] or accelerated Rg48 Rg47 Rg46 Rg45 Rg44 Rg43 Rg42 Rg41
carry adders [13].
The calculation of the hardware and time complexity of
different comparison schemes and 2-input multiplexers is
given in paper [6]. BSE16 BSE15 BSE14 BSE13
With the account to these data, the hardware complexity of
Rg56 Rg55 Rg54 Rg53 Rg52 Rg51 Rg50 Rg49
the CSD by the modified "bubbles" method for N = 8 and
n= 4 equals:
BSE19 BSE18 BSE17
ACCSD = N × (N -1)/2 × (A CS + 2 × A MP )+(n × A REG ×
×LREG ) = 28 × (24 + 56)+(4 × 4 × 112) = 4032 ( gates ), Rg64 Rg63 Rg62 Rg61 Rg60 Rg59 Rg58 Rg57
where ACCSD - hardware complexity of classical CSD, BSE20 BSE21
A CS - hardware complexity of comparison scheme, Rg72 Rg71 Rg70 Rg69 Rg68 Rg67 Rg66 Rg65
A MP - hardware complexity of multiplexer,
BSE22
A REG - hardware complexity of register,
Rg80 Rg79 Rg78 R77 Rg76 Rg75 Rg74 Rg73
LREG -quantity of conveyor register.
The time complexity of this conveyor device equals: y8 y7 y6 y5 y4 y3 y2 y1
τ CCSD = (τ CS + τ MP ) + (14 ×τ REG ) = Fig.5. An improved structure of the CSD
= (6 + 3) + (14 × 2) = 37ν ( micro − cycles ), The number of basic sorting elements for this CSD for the
where τ - time complexity of classical CSD, binary numbers is equal to 3(( N 2) 2 − N 2)) / 2 + N 2 , and
CCSD
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic
126
the number of conveyor registers is 6 N + 4 N for N input
values.
The hardware complexity of an improved CSD for N = 8
and n = 4 is equal to:
AICSD = 22 × (A CS + 2 × A MP )+(n × A REG × LREG ) =
=22 × (16 + 24)+(4 × 4 × 80) =2160 ( gates ),
where AICSD - hardware complexity of improved CSD,
A CS - hardware complexity of improved comparison scheme
[13],
A - hardware complexity of improved multiplexer [13],
MP
A REG - hardware complexity of register,
LREG -quantity of conveyor register.
The time complexity of such CSD is equal to: Fig.7. The dependence graph of the time complexity on the
τ ICSD = (τ CS + τ MP ) + (10 ×τ REG ) = number of the input data for the CSDs
= (3 + 2) + (10 × 2) = 25ν ( micro − cycles ), It is shown in the graph that the time complexity of an
where τ - time complexity of classical CSD; improved CSD requires about 1.5 times less microtacts than
CCSD the classical CSD.
τ CS - time complexity of comparison scheme;
V. RESULTS OF CSD’S SIMULATION AND SYNTHESIS
τ - time complexity of multiplexer;
MP ON FPGA
τ REG - time complexity of register.
Structures of classical and improved CSD for 8 input –one-
So, in comparison with the classic device, we get a byte numbers are described in VHDL (Virtual Hardware
=
reduction in the hardware complexity in K =
4032 2160 1,9 Description Language). The simulation of the developed
A
CSDs on the functional level was carried out, their RTL
times and increase the speed in= =
K 37 25 1,5 time. circuits were obtained and the Xilinx FPGA synthesis was
τ
IV. RESULTS OF THE CONVEYOR SORTING DEVICES performed.
RESEARCH It is shown in Fig. 8 a functional diagram of the improved
CSD operation for 8 input one-byte numbers.
It is shown a dependence graph of logical elements
(valves) number on the input data number for the conveyor
structures of the known (classical) and improved sorting
devices by the modified "bubbles" method.
Fig.8. Functional simulation diagram for CSD of 8 one-byte
numbers
Fig.6. Graph of the dependence of the number of valves on the The diagram shows that an array of unsorted 8-bit values is
input data number for the CSDs given at the inputs of the CSD (D_in1, ..., D_in8) on the 1st
The graph shows that the number of logical elements for cycle. Then we get a sorted descending order at the outputs
an improved CSD is 1.9 times less than for a classic CSD. (D_out1, ..., D_out8) at the 9th cycle.
It is shown a graph of the dependence of the time It is shown in Fig. 9 the topology view of improved CSD
complexity in Fig. 7 expressed in the microtacts on the input with the VHDL implementation model on the
data value for the structures of the known (classical) CSD xc7a100tcsg324-1 crystal (Artix-7 family) of the Xilinx
and the improved CSD with the modified "bubbles" method. firmware by Vivado CAD.
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic
127
[2] Thomas Braunl,. Parallel Programming: Transl. from
german ,К.: Vyscha Shkola., pp. 358, 1997.
[3] S.Y. Kung, VLSI array processors. Trans. from English -
М.: “Peace”, pp. 672, 1991.
[4] Thomas H. Cormen, Charles E. Leiserson, Ronald L.
Rivest and Clifford Stein. Introduction to algorithms, 3rd
edition, Massachusetts Institute of Technology, pp.1313,
2009.
[5] A. Melnyk, Memory with ordered access: monograph,
Lviv: Press Lviv Polytechnic, pp. 296, 2014.
[6] A. Melnyk, Computer Architecture, Lutsk:Regional
publisher, pp. 470, 2008.
[7] V. Voevodin, Parallel Computing. Textbook for high
schools, BHV:Peter, pp. 608, 2002.
[8] Andrew S. Tanenbaum Computer Architecture Austin.
6th ed., St. Petersburg: “Peter”, pp. 816, 2013.
[9] O. Kehret, A. Walz and A. Sikora, “Integration of
hardware security modules into a deeply embedded TLS
stack” International Journal of Computing, vol. 15,
issue 1, pp. 24-32, 2016.
[10] V. Zadiraka, Ya. Nykolaichuk. Methods of effective
Fig.9. View of the crystal topology of CSD during implementation protection of information flows, Ternopil: Terno-graf,
at the FPGA pp.308, 2014.
Vivado CAD tools placed the implemented VHDL project [11] S. Melnychuk, A. Voronych, L. Nykolaichuk and B.
of the improved CSD almost in the center of the crystal. Krulikovskyi “The Structure and Components of
Table 1 presents the results of the synthesis of Embedded Special Processors for Determination of
implemented CSDs for sorting 8 one-byte numbers on the Entropy Signals and Random Massages” Perspective
FPGA of the Xilinx firmware. technologies and methods in MEMS design.
Proceedings of XIIIth International Conference.
TABLE 1. RESULTS OF THE CSDS SYNTHESIS ON FPGA MEMSTECH 2017, Lviv-Svalyava, Ukraine, pp.81-84,
Classic CSD Improved CSD February, 2017.
Blocks Blocks [12] B. Krulikovskyi, A. Davletova, V. Gryga and Y.
Clock Clock Nykolaichuk “Synthesis of Components of High
FPGA quantity quantity
frequency frequency Performance Special Proccessors of Execution of
FPGA FPGA
(MHz) (MHz) Arithmetic and Logical Operations Dsts Proccessing in
(CLB) (CLB)
Theoretical and Numerical Basis Rademacher”,
1 Artix 7 950 109,9 475 174,7 Proceedings of XIVth International Conference.
As can be seen from Table 1, experimental results coincide CADSM’2017, Lviv-Poljana, Ukraine, pp. 214-217,
with analytical calculations. February,2017.
[13] V. Gryga, Y. Nykolaichuk, N. Vozna and B.
VI. CONCLUSION Krulikovskyi “Synthesis of a microelectronic structure
During the research of CSDs it is determined that they are of a specialized processor for sorting an array of binary
widely used in multithreaded data processing, and are often numbers” Perspective technologies and methods in
used in digital processing of signals, images and sorting MEMS design. Proceedings of XIIIth International
networks for the rapid transfer of large data arrays. Conference. MEMSTECH 2017, Lviv-Svalyava,
The improved structure of the CSD of binary arrays by the Ukraine, pp.170-173, April, 2017.
modified "bubble" method was developed, the hardware and [14] V. Gryga, I. Kolosov and O. Danyluk “The development
time complexity calculation was performed. of a fast iterative algorithm structure of cosine
As a result of the comparison with the classical CSD for transform” The Experience of Designing and
binary number array by the modified bubble method it is Application of CAD System in Microelectronics.
received decreasing the cost of the equipment in 1.9 times Proceedings of XIIIth International Conference.
and increasing of speed in 1.5 times, which is confirmed by TCSET’2016, Lviv-Poljana, Ukraine, pp.506-509,
the results of the practical implementation of the Xilinx February,2016.
FPGA. [15] R. Dunets and V. Gryga “Spatio-temporal synthesis of
transformation matrix of reverse fast cosine
REFERENCES transformation”, Proceedings of XIIIth International
[1] D. Knuth, The art of computer programming, V.3: Transl. Conference. CADSM’2015, Lviv-Poljana, Ukraine,
from eng. М.: “Peace”, pp. 841, 1978. pp.45-49, February, 2015.
ACIT 2018, June 1-3, 2018, Ceske Budejovice, Czech Republic